#### Task

Create a schema and a streaming dataframe for the JSON files in the following path:  
"/mnt/training/gaming_data/mobile_streaming_events_b/"
  
  
Use the following as the basis for creating your schema:  
 |-- eventName: string (nullable = true)  
 |-- eventParams: struct (nullable = true)  
 |    |-- amount: double (nullable = true)  
 |    |-- app_name: string (nullable = true)  
 |    |-- app_version: string (nullable = true)  
 |    |-- client_event_time: string (nullable = true)  
 |    |-- device_id: string (nullable = true)  
 |    |-- game_keyword: string (nullable = true)  
 |    |-- platform: string (nullable = true)  
 |    |-- scoreAdjustment: long (nullable = true)  
  
Read in 2 files per trigger.
  
Create a new, modified dataframe:
* keep only rows where eventName is "scoreAdjustment"
* select the *game_keyword*, *platform* and *scoreAdjustment* columns from the eventParams struct.  
* when writing the stream, set the trigger to run every 5 seconds.

Write the stream to a Delta table called score_adjustments.  
Check that the table contains some data.  
Then stop the stream.
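
To make the filter and projection steps concrete, here is a minimal plain-Python sketch (no Spark involved) of what the modified dataframe should contain. The two sample records, including the `purchaseEvent` name and all values, are made up for illustration, and the helper `project_score_adjustments` is hypothetical; only the field names come from the schema above.

```python
import json

# Hypothetical sample events; only the field names come from the task schema.
SAMPLE_LINES = [
    '{"eventName": "scoreAdjustment", "eventParams": {"game_keyword": "cherries", '
    '"platform": "android", "scoreAdjustment": 3}}',
    '{"eventName": "purchaseEvent", "eventParams": {"amount": 1.99, '
    '"game_keyword": "cherries", "platform": "ios"}}',
]

KEPT_FIELDS = ("game_keyword", "platform", "scoreAdjustment")


def project_score_adjustments(lines):
    """Keep scoreAdjustment events and flatten the three requested fields."""
    rows = []
    for line in lines:
        event = json.loads(line)
        if event.get("eventName") != "scoreAdjustment":
            continue  # same effect as the eventName filter
        params = event.get("eventParams") or {}
        # Missing fields become None, mirroring nullable columns.
        rows.append({name: params.get(name) for name in KEPT_FIELDS})
    return rows


rows = project_score_adjustments(SAMPLE_LINES)
```

Each surviving row is flat: the three struct fields become top-level columns, and the `eventName` column itself is no longer present.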

In [0]:
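# Import only the types the schema uses instead of a wildcard import
from pyspark.sql.types import DoubleType, LongType, StringType, StructField, StructType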
schema = StructType([
  StructField('eventName', StringType(), True), 
  StructField('eventParams', StructType([
    StructField('amount', DoubleType(), True), 
    StructField('app_name', StringType(), True), 
    StructField('app_version', StringType(), True), 
    StructField('client_event_time', StringType(), True), 
    StructField('device_id', StringType(), True), 
    StructField('game_keyword', StringType(), True), 
    StructField('platform', StringType(), True), 
    StructField('scoreAdjustment', LongType(), True)
  ]), True)
])

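# Streaming source: read the JSON files with the explicit schema, 2 files per micro-batch
df = (spark.readStream
      .schema(schema)
      .option("maxFilesPerTrigger", 2)
      .json("/mnt/training/gaming_data/mobile_streaming_events_b/")
     )

# Keep only scoreAdjustment events and flatten the three requested struct fields
newdf = (df
         .filter("eventName == 'scoreAdjustment'")
         .select("eventParams.game_keyword", "eventParams.platform", "eventParams.scoreAdjustment")
        )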

# Append to the Delta table, starting a new micro-batch every 5 seconds
(newdf.writeStream
  .format("delta")
  .outputMode("append")
  .trigger(processingTime="5 seconds")
  .option("checkpointLocation", "/tmp/score_adjustments/_checkpoints/")
  .table("score_adjustments"))

In [0]:
spark.table("score_adjustments").count()
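
If the count runs before the first micro-batch has been committed, it may return 0, or the table may not exist yet. One way to handle this is to poll until data shows up. The sketch below is a generic helper, not part of any Spark API: `wait_for_rows` and its parameters are hypothetical names, and it takes any zero-argument callable that returns a row count.

```python
import time


def wait_for_rows(count_rows, timeout_s=60.0, poll_s=1.0):
    """Poll count_rows() until it returns a positive number or the timeout expires.

    count_rows is any zero-argument callable. Exceptions are treated as
    "not ready yet", since the table may not exist before the first commit.
    Returns the positive count, or 0 if the timeout was reached.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            n = count_rows()
        except Exception:
            n = 0
        if n > 0:
            return n
        if time.monotonic() >= deadline:
            return 0
        time.sleep(poll_s)
```

In this notebook, it could be called as `wait_for_rows(lambda: spark.table("score_adjustments").count())`, assuming the stream from the previous cell is still running.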

In [0]:
# Stop every streaming query that is still active in this Spark session
for stream in spark.streams.active:
  stream.stop()