# Digital Activity Schema Demo Notebook

This notebook demonstrates how the OEA schema standardization process applies to the supported modules (e.g. Education Insights Module, Graph Reports API Module, Clever Module, i-Ready Module).

__It is highly recommended that you review and pull the test data from the Insights, Graph, Clever, and i-Ready modules before testing these schema standardization notebooks.__

The notebook executes the following steps:

 - First, the OEA and Digital Activity Schema Standard class notebooks are initialized.
 - Next, the Insights module data ingested to stage 2 is processed by rewriting its schema so that only digital activity data is pulled.
 - The same process is executed for the Graph API module data, specifically for the M365 and Teams queries from the module.
 - The same process is executed for the Clever module data.
 - The same process is executed for the i-Ready module data, specifically for the Comprehensive Student Lesson Activity with Standards tables from the module.
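
Every module in this notebook is processed with the same pattern: a list of `[standard_column, source]` pairs is applied to a source table. The sketch below illustrates that pattern on plain Python dictionaries instead of Spark DataFrames. It is not the `DigitalActivity` implementation. The helper name `apply_schema_mapping`, the shortened column list, and the sample row are hypothetical, and the sketch assumes that a mapping value naming no source column (such as `'MS_Insights'`) is written as a literal, which is consistent with the `event_object` values in the result cells of this notebook.

```python
# Illustrative sketch only; this is not the DigitalActivity implementation.
# Subset of the standard columns, shortened for brevity.
STANDARD_COLUMNS = ['event_id', 'event_type', 'event_actor', 'event_object', 'entity_type']

def apply_schema_mapping(row, schema_mapping, standard_columns=STANDARD_COLUMNS):
    """Map one source row (a dict) onto the standard columns (hypothetical helper)."""
    # Standard columns with no mapping entry stay null (None).
    out = {column: None for column in standard_columns}
    for standard_column, source in schema_mapping:
        # Assumption: a value that names no source column is written as a literal.
        out[standard_column] = row[source] if source in row else source
    return out

# Made-up sample row using Insights column names from this notebook.
sample_row = {'SignalId': 'abc-123', 'SignalType': 'FileAccessed', 'AppName': 'SharePoint Online'}
sample_mapping = [['event_id', 'SignalId'],
                  ['event_type', 'SignalType'],
                  ['event_object', 'MS_Insights'],
                  ['entity_type', 'AppName']]

standardized = apply_schema_mapping(sample_row, sample_mapping)
print(standardized)
```

In this sketch `event_actor` has no mapping entry, so it stays `None`, much as unmapped columns appear as `null` in the result cells.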

In [None]:
write_location=''
data_sources='{}'

In [1]:
%run /OEA_py

StatementMeta(, 34, -1, Finished, Available)

2022-08-24 17:46:29,669 - OEA - DEBUG - OEA initialized.
OEA initialized.


In [23]:
%run /Schema_DigitalActivity_py

StatementMeta(, 34, -1, Finished, Available)

In [24]:
# 0) Initialize the OEA framework and modules needed.
oea = OEA()
digAct = DigitalActivity()

StatementMeta(sparkMed, 34, 24, Finished, Available)

2022-08-24 17:50:38,768 - OEA - DEBUG - OEA initialized.
OEA initialized.


In [25]:
digAct.get_digital_activity_schema()

StatementMeta(sparkMed, 34, 25, Finished, Available)

OEA Standard Digital Activity Schema:

Columns and data types:

['event_id', 'string', 'no-op']
['event_type', 'string', 'no-op']
['event_actor', 'string', 'no-op']
['event_object', 'string', 'no-op']
['event_eventTime', 'string', 'no-op']
['entity_type', 'string', 'no-op']
['softwareApplication_version', 'string', 'no-op']
['generated_aggregateMeasure_metric_timeOnTaskSec', 'string', 'no-op']
['generated_aggregateMeasure_metric_numAccess', 'string', 'no-op']
['generated_aggregateMeasure_metric_used', 'string', 'no-op']
['generated_aggregateMeasure_metric_activityReportPeriod', 'string', 'no-op']

Column descriptions:

['schema_source', 'https://www.imsglobal.org/spec/caliper/v1p2#tooluseevent']
['event_id', 'unique ID used as a signal key']
['event_type', 'type of activity event']
['event_actor', 'student or teacher that created the signal']
['event_object', 'entity that comprises the object of the interaction']
['event_eventTime', 'date/timestamp of the activity signal']
['entity_typ
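
A small pre-check can catch a misspelled standard column name before a mapping is passed to `process_digital_activity`. The sketch below compares the first element of each mapping pair against the column names printed above. `unknown_mapping_columns` is a hypothetical helper, not part of the `DigitalActivity` class, and the two sample mappings are made up for illustration.

```python
# Column names copied from the schema listing above.
STANDARD_COLUMNS = {
    'event_id', 'event_type', 'event_actor', 'event_object', 'event_eventTime',
    'entity_type', 'softwareApplication_version',
    'generated_aggregateMeasure_metric_timeOnTaskSec',
    'generated_aggregateMeasure_metric_numAccess',
    'generated_aggregateMeasure_metric_used',
    'generated_aggregateMeasure_metric_activityReportPeriod',
}

def unknown_mapping_columns(schema_mapping):
    """Return mapping targets that are not standard columns (hypothetical helper)."""
    return [target for target, _source in schema_mapping
            if target not in STANDARD_COLUMNS]

good_mapping = [['event_actor', 'sis_id_pseudonym'], ['event_eventTime', 'date']]
bad_mapping = [['event_actor', 'sis_id_pseudonym'], ['event_time', 'date']]  # typo on purpose

print(unknown_mapping_columns(good_mapping))  # []
print(unknown_mapping_columns(bad_mapping))   # ['event_time']
```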

In [26]:
digAct.reset_digital_activity_processing()

StatementMeta(sparkMed, 34, 26, Finished, Available)

2022-08-24 17:50:40,349 - OEA - INFO - Deleted abfss://stage2p@stoeacisd3v06kw2.dfs.core.windows.net/<__main__.OEA object at 0x7ff2c02f8700>digital_activity
Deleted abfss://stage2p@stoeacisd3v06kw2.dfs.core.windows.net/<__main__.OEA object at 0x7ff2c02f8700>digital_activity


## Education Insights Premium Module Processing

In [6]:
dfActivity = oea.load("M365","TechActivity_pseudo")
dfActivity.printSchema()
dfActivity.show(1,vertical=True)

StatementMeta(sparkMed, 34, 7, Finished, Available)

root
 |-- SignalType: string (nullable = true)
 |-- StartTime: timestamp (nullable = true)
 |-- UserAgent: string (nullable = true)
 |-- SignalId: string (nullable = true)
 |-- SisClassId: string (nullable = true)
 |-- ClassId: string (nullable = true)
 |-- ChannelId: string (nullable = true)
 |-- AppName: string (nullable = true)
 |-- ActorId_pseudonym: string (nullable = true)
 |-- ActorRole: string (nullable = true)
 |-- SchemaVersion: string (nullable = true)
 |-- AssignmentId: string (nullable = true)
 |-- SubmissionId: string (nullable = true)
 |-- Action: string (nullable = true)
 |-- DueDate: timestamp (nullable = true)
 |-- ClassCreationDate: timestamp (nullable = true)
 |-- Grade: string (nullable = true)
 |-- SourceFileExtension: string (nullable = true)
 |-- MeetingDuration: integer (nullable = true)
 |-- year: integer (nullable = true)
 |-- month: integer (nullable = true)

-RECORD 0-----------------------------------
 SignalType          | OneNotePageChanged   
 StartTime

In [27]:
schemaMapping = [['event_id', 'SignalId'],
                        ['event_type', 'SignalType'], 
                        ['event_actor', 'ActorId_pseudonym'],
                        ['event_object', 'MS_Insights'],
                        ['event_eventTime', 'StartTime'],
                        ['entity_type', 'AppName'],
                        ['softwareApplication_version', 'SchemaVersion'],
                        ['generated_aggregateMeasure_metric_timeOnTaskSec', 'MeetingDuration']]

source_path = 'stage2p/M365/TechActivity_pseudo'
digAct.process_digital_activity(source_path, schemaMapping)

StatementMeta(sparkMed, 34, 27, Finished, Available)

2022-08-24 17:51:03,435 - OEA - INFO - Processing digital activity data from: stage2p/M365/TechActivity_pseudo
Processing digital activity data from: stage2p/M365/TechActivity_pseudo
2022-08-24 17:51:07,201 - OEA - INFO - Complete processing from: stage2p/M365/TechActivity_pseudo
Complete processing from: stage2p/M365/TechActivity_pseudo


## Graph Reports API Module Processing

### M365 Query Processing

In [28]:
dfGraph_M365 = oea.load("graph_api","m365_app_user_detail_pseudo")

StatementMeta(sparkMed, 34, 28, Finished, Available)

In [29]:
import pandas as pd
# Unpivot the per-app usage columns into (m365_app_name, used) rows.
dfPand = dfGraph_M365.toPandas()
dfPandMelt = dfPand.melt(
    id_vars=['userPrincipalName_pseudonym', 'reportRefreshDate', 'reportPeriod'],
    value_vars=['excel', 'oneNote', 'outlook', 'powerPoint', 'teams', 'word'],
    var_name='m365_app_name', value_name='used')
dfGraph_M365 = spark.createDataFrame(dfPandMelt)

StatementMeta(sparkMed, 34, 29, Finished, Available)
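
The `melt` call in the cell above reshapes the Graph report from one column per app into one row per user and app, so that the app name and its usage flag can each be mapped to a single standard column. The pure-Python sketch below mirrors that reshaping on one made-up row. The function `melt_rows` and the sample values are illustrative assumptions and are not part of the OEA code.

```python
def melt_rows(rows, id_vars, value_vars, var_name, value_name):
    """Unpivot value_vars into one output row per (input row, value column)."""
    melted = []
    # Like pandas.melt, emit every row for the first value column, then the next.
    for column in value_vars:
        for row in rows:
            out = {key: row[key] for key in id_vars}
            out[var_name] = column
            out[value_name] = row[column]
            melted.append(out)
    return melted

# Made-up wide row: one boolean column per app.
wide_rows = [{'userPrincipalName_pseudonym': 'user-a',
              'reportRefreshDate': '2022-01-01',
              'excel': True, 'teams': False}]

long_rows = melt_rows(wide_rows,
                      id_vars=['userPrincipalName_pseudonym', 'reportRefreshDate'],
                      value_vars=['excel', 'teams'],
                      var_name='m365_app_name', value_name='used')
for row in long_rows:
    print(row)
```

Note that `toPandas()` collects the whole DataFrame onto the driver, so this reshaping approach is best suited to tables that fit in driver memory.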

In [30]:
dfGraph_M365.write.save(oea.path('stage2p', directory_path="temp/M365"), format='delta', mode='append', mergeSchema='true')

StatementMeta(sparkMed, 34, 30, Finished, Available)

In [31]:
schemaMapping = [['event_actor', 'userPrincipalName_pseudonym'],
                        ['event_object', 'MS_GraphAPI_M365'],
                        ['event_eventTime', 'reportRefreshDate'],
                        ['entity_type', 'm365_app_name'],
                        ['generated_aggregateMeasure_metric_used', 'used'],
                        ['generated_aggregateMeasure_metric_activityReportPeriod', 'reportPeriod']]

source_path = 'stage2p/temp/M365'
digAct.process_digital_activity(source_path, schemaMapping)

StatementMeta(sparkMed, 34, 31, Finished, Available)

2022-08-24 17:51:19,444 - OEA - INFO - Processing digital activity data from: stage2p/temp/M365
Processing digital activity data from: stage2p/temp/M365
2022-08-24 17:51:22,874 - OEA - INFO - Complete processing from: stage2p/temp/M365
Complete processing from: stage2p/temp/M365


In [32]:
oea.rm_if_exists(oea.path('stage2p', directory_path="temp"))

StatementMeta(sparkMed, 34, 32, Finished, Available)

In [33]:
# Check results
schema_path = 'stage2p/digital_activity'
dfDigAct = oea.load_delta(schema_path)
dfDigAct.filter(dfDigAct['event_object'] == "MS_Insights").show(1,vertical=True)
dfDigAct.filter(dfDigAct['event_object'] == "MS_GraphAPI_M365").show(1,vertical=True)

StatementMeta(sparkMed, 34, 33, Finished, Available)

-RECORD 0----------------------------------------------------------------------
 event_id                                               | e1434d07-2fc4-4e2... 
 event_type                                             | FileAccessed         
 event_actor                                            | 972f5667178017916... 
 event_eventTime                                        | 2021-10-06 21:06:58  
 entity_type                                            | SharePoint Online    
 softwareApplication_version                            | 1.12                 
 generated_aggregateMeasure_metric_timeOnTaskSec        | null                 
 event_object                                           | MS_Insights          
 year                                                   | 2021                 
 month                                                  | 10                   
 generated_aggregateMeasure_metric_used                 | null                 
 generated_aggregateMeasure_metric_activ

### Teams Query Processing

In [34]:
dfGraph_Teams = oea.load("graph_api","teams_activity_user_detail_pseudo")

StatementMeta(sparkMed, 34, 34, Finished, Available)

In [35]:
import pandas as pd
# Unpivot the Teams count columns into (meetings_and_messages, counts) rows; videoDuration stays an id column.
dfPand = dfGraph_Teams.toPandas()
dfPandMelt = dfPand.melt(
    id_vars=['userPrincipalName_pseudonym', 'reportRefreshDate', 'reportPeriod', 'videoDuration'],
    value_vars=['callCount', 'meetingCount', 'meetingsAttendedCount', 'meetingsOrganizedCount', 'privateChatMessageCount', 'teamChatMessageCount'],
    var_name='meetings_and_messages', value_name='counts')
dfGraph_Teams_counts = spark.createDataFrame(dfPandMelt)

StatementMeta(sparkMed, 34, 35, Finished, Available)

In [36]:
dfGraph_Teams_counts.write.save(oea.path('stage2p', directory_path="temp"), format='delta', mode='append', mergeSchema='true')

StatementMeta(sparkMed, 34, 36, Finished, Available)

In [37]:
schemaMapping = [['event_type', 'meetings_and_messages'],
                        ['event_actor', 'userPrincipalName_pseudonym'],
                        ['event_object', 'MS_GraphAPI_Teams'],
                        ['event_eventTime', 'reportRefreshDate'],
                        ['generated_aggregateMeasure_metric_timeOnTaskSec', 'videoDuration'],
                        ['generated_aggregateMeasure_metric_numAccess', 'counts'],
                        ['generated_aggregateMeasure_metric_activityReportPeriod', 'reportPeriod']]

source_path = 'stage2p/temp'
digAct.process_digital_activity(source_path, schemaMapping)

StatementMeta(sparkMed, 34, 37, Finished, Available)

2022-08-24 17:51:42,122 - OEA - INFO - Processing digital activity data from: stage2p/temp
Processing digital activity data from: stage2p/temp
2022-08-24 17:51:45,612 - OEA - INFO - Complete processing from: stage2p/temp
Complete processing from: stage2p/temp


In [38]:
oea.rm_if_exists(oea.path('stage2p', directory_path="temp"))

StatementMeta(sparkMed, 34, 38, Finished, Available)

In [39]:
## Check results
schema_path = 'stage2p/digital_activity'
dfDigAct = oea.load_delta(schema_path)
dfDigAct.filter(dfDigAct['event_object'] == "MS_Insights").show(1,vertical=True)
dfDigAct.filter(dfDigAct['event_object'] == "MS_GraphAPI_Teams").show(1,vertical=True)

StatementMeta(sparkMed, 34, 39, Finished, Available)

-RECORD 0----------------------------------------------------------------------
 event_id                                               | e1434d07-2fc4-4e2... 
 event_type                                             | FileAccessed         
 event_actor                                            | 972f5667178017916... 
 event_eventTime                                        | 2021-10-06 21:06:58  
 entity_type                                            | SharePoint Online    
 softwareApplication_version                            | 1.12                 
 generated_aggregateMeasure_metric_timeOnTaskSec        | null                 
 event_object                                           | MS_Insights          
 year                                                   | 2021                 
 month                                                  | 10                   
 generated_aggregateMeasure_metric_used                 | null                 
 generated_aggregateMeasure_metric_activ

## Clever Module Processing

### Daily Participation Table Processing

In [20]:
dfClever_Daily_Participation = oea.load("clever","daily_participation_pseudo")

StatementMeta(sparkMed, 34, 21, Finished, Available)

In [40]:
schemaMapping = [['event_actor', 'sis_id_pseudonym'],
                        ['event_object', 'Clever_Daily_Participation'],
                        ['event_eventTime', 'date'],
                        ['generated_aggregateMeasure_metric_used', 'active'],
                        ['generated_aggregateMeasure_metric_numAccess', 'num_logins']]

source_path = 'stage2p/clever/daily_participation_pseudo'
digAct.process_digital_activity(source_path, schemaMapping)

StatementMeta(sparkMed, 34, 40, Finished, Available)

2022-08-24 17:52:09,339 - OEA - INFO - Processing digital activity data from: stage2p/clever/daily_participation_pseudo
Processing digital activity data from: stage2p/clever/daily_participation_pseudo
2022-08-24 17:52:12,632 - OEA - INFO - Complete processing from: stage2p/clever/daily_participation_pseudo
Complete processing from: stage2p/clever/daily_participation_pseudo


In [41]:
## Check results
schema_path = 'stage2p/digital_activity'
dfDigAct = oea.load_delta(schema_path)
dfDigAct.filter(dfDigAct['event_object'] == "MS_Insights").show(1,vertical=True)
dfDigAct.filter(dfDigAct['event_object'] == "Clever_Daily_Participation").show(1,vertical=True)

StatementMeta(sparkMed, 34, 41, Finished, Available)

-RECORD 0----------------------------------------------------------------------
 event_id                                               | e1434d07-2fc4-4e2... 
 event_type                                             | FileAccessed         
 event_actor                                            | 972f5667178017916... 
 event_eventTime                                        | 2021-10-06 21:06:58  
 entity_type                                            | SharePoint Online    
 softwareApplication_version                            | 1.12                 
 generated_aggregateMeasure_metric_timeOnTaskSec        | null                 
 event_object                                           | MS_Insights          
 year                                                   | 2021                 
 month                                                  | 10                   
 generated_aggregateMeasure_metric_used                 | null                 
 generated_aggregateMeasure_metric_activ

### Resource Usage Table Processing

In [42]:
dfClever_Resource_Usage = oea.load("clever","resource_usage_pseudo")

StatementMeta(sparkMed, 34, 42, Finished, Available)

In [43]:
schemaMapping = [['event_type', 'resource_type'],
                        ['event_actor', 'sis_id_pseudonym'],
                        ['event_object', 'Clever_Resource_Usage'],
                        ['event_eventTime', 'date'],
                        ['entity_type', 'resource_name'],
                        ['generated_aggregateMeasure_metric_numAccess', 'num_access']]

source_path = 'stage2p/clever/resource_usage_pseudo'
digAct.process_digital_activity(source_path, schemaMapping)

StatementMeta(sparkMed, 34, 43, Finished, Available)

2022-08-24 17:52:22,502 - OEA - INFO - Processing digital activity data from: stage2p/clever/resource_usage_pseudo
Processing digital activity data from: stage2p/clever/resource_usage_pseudo
2022-08-24 17:52:25,858 - OEA - INFO - Complete processing from: stage2p/clever/resource_usage_pseudo
Complete processing from: stage2p/clever/resource_usage_pseudo


In [44]:
## Check results
schema_path = 'stage2p/digital_activity'
dfDigAct = oea.load_delta(schema_path)
dfDigAct.filter(dfDigAct['event_object'] == "Clever_Daily_Participation").show(1,vertical=True)
dfDigAct.filter(dfDigAct['event_object'] == "Clever_Resource_Usage").show(1,vertical=True)

StatementMeta(sparkMed, 34, 44, Finished, Available)

-RECORD 0----------------------------------------------------------------------
 event_id                                               | null                 
 event_type                                             | null                 
 event_actor                                            | 45a712af06f93abe4... 
 event_eventTime                                        | 2022-01-01           
 entity_type                                            | null                 
 softwareApplication_version                            | null                 
 generated_aggregateMeasure_metric_timeOnTaskSec        | null                 
 event_object                                           | Clever_Daily_Part... 
 year                                                   | 2022                 
 month                                                  | 1                    
 generated_aggregateMeasure_metric_used                 | false                
 generated_aggregateMeasure_metric_activ

## i-Ready Module Processing

### Comprehensive Student Lesson Activity with Standards (ELA) Table Processing

In [45]:
dfiReady_Comp_ELA = oea.load("iready","comprehensive_student_lesson_activity_with_standards_ela_pseudo")

StatementMeta(sparkMed, 34, 45, Finished, Available)

In [46]:
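# Convert "time on lesson" from minutes to seconds, then rename the column to match.
# F may already be in scope via %run /OEA_py; importing it here keeps the cell self-contained.
from pyspark.sql import functions as F
dfiReady_Comp_ELA = dfiReady_Comp_ELA.withColumn('TotalTimeonLesson_min_', F.col('TotalTimeonLesson_min_')*60)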
dfiReady_Comp_ELA = dfiReady_Comp_ELA.withColumnRenamed('TotalTimeonLesson_min_', 'TotalTimeonLesson_sec_')
dfiReady_Comp_ELA.printSchema()
dfiReady_Comp_ELA.show(1, vertical=True)

StatementMeta(sparkMed, 34, 46, Finished, Available)

root
 |-- LastName: string (nullable = true)
 |-- FirstName: string (nullable = true)
 |-- StudentID_pseudonym: string (nullable = true)
 |-- StudentGrade: string (nullable = true)
 |-- AcademicYear: string (nullable = true)
 |-- School: string (nullable = true)
 |-- Subject: string (nullable = true)
 |-- Domain: string (nullable = true)
 |-- LessonGrade: string (nullable = true)
 |-- LessonLevel: string (nullable = true)
 |-- LessonID: string (nullable = true)
 |-- LessonName: string (nullable = true)
 |-- LessonObjective: string (nullable = true)
 |-- CompletionDate: date (nullable = true)
 |-- TotalTimeonLesson_sec_: integer (nullable = true)
 |-- Score: integer (nullable = true)
 |-- PassedorNotPassed: string (nullable = true)
 |-- Teacher-AssignedLesson: string (nullable = true)
 |-- StateStandards: string (nullable = true)
 |-- TypeofStandard: string (nullable = true)
 |-- StandardCode: string (nullable = true)
 |-- StandardText: string (nullable = true)
 |-- year: integer (nulla
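
The standard schema records time on task in seconds, while the i-Ready column is in minutes, which is why the cell above multiplies the column by 60 and renames it. The plain-Python sketch below shows the same conversion on made-up rows, including how a missing value is carried through. `minutes_to_seconds` is a hypothetical helper, and the sample rows are not taken from the i-Ready test data.

```python
def minutes_to_seconds(row, minutes_col='TotalTimeonLesson_min_',
                       seconds_col='TotalTimeonLesson_sec_'):
    """Return a copy of row with the minutes column replaced by a seconds column."""
    out = dict(row)
    minutes = out.pop(minutes_col)
    # Mirror Spark's null semantics: null * 60 is still null.
    out[seconds_col] = None if minutes is None else minutes * 60
    return out

lesson = {'LessonID': 'lesson-1', 'TotalTimeonLesson_min_': 12}
missing = {'LessonID': 'lesson-2', 'TotalTimeonLesson_min_': None}

print(minutes_to_seconds(lesson))   # {'LessonID': 'lesson-1', 'TotalTimeonLesson_sec_': 720}
print(minutes_to_seconds(missing))  # {'LessonID': 'lesson-2', 'TotalTimeonLesson_sec_': None}
```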

In [47]:
dfiReady_Comp_ELA.write.save(oea.path('stage2p', directory_path="temp"), format='delta', mode='append', mergeSchema='true')

StatementMeta(sparkMed, 34, 47, Finished, Available)

In [48]:
schemaMapping = [['event_type', 'Subject'],
                        ['event_actor', 'StudentID_pseudonym'],
                        ['event_object', 'iReady_Comprehensive_Student_Lesson_Activity_with_Standards_ELA'],
                        ['event_eventTime', 'CompletionDate'],
                        ['entity_type', 'Domain'],
                        ['generated_aggregateMeasure_metric_timeOnTaskSec', 'TotalTimeonLesson_sec_']]

source_path = 'stage2p/temp'
digAct.process_digital_activity(source_path, schemaMapping)

StatementMeta(sparkMed, 34, 48, Finished, Available)

2022-08-24 17:52:56,398 - OEA - INFO - Processing digital activity data from: stage2p/temp
Processing digital activity data from: stage2p/temp
2022-08-24 17:52:59,673 - OEA - INFO - Complete processing from: stage2p/temp
Complete processing from: stage2p/temp


In [49]:
oea.rm_if_exists(oea.path('stage2p', directory_path="temp"))

StatementMeta(sparkMed, 34, 49, Finished, Available)

In [51]:
## Check results
schema_path = 'stage2p/digital_activity'
dfDigAct = oea.load_delta(schema_path)
dfDigAct.filter(dfDigAct['event_object'] == "MS_Insights").show(1,vertical=True)
dfDigAct.filter(dfDigAct['event_object'] == "iReady_Comprehensive_Student_Lesson_Activity_with_Standards_ELA").show(1,vertical=True)

StatementMeta(sparkMed, 34, 51, Finished, Available)

-RECORD 0----------------------------------------------------------------------
 event_id                                               | e1434d07-2fc4-4e2... 
 event_type                                             | FileAccessed         
 event_actor                                            | 972f5667178017916... 
 event_eventTime                                        | 2021-10-06 21:06:58  
 entity_type                                            | SharePoint Online    
 softwareApplication_version                            | 1.12                 
 generated_aggregateMeasure_metric_timeOnTaskSec        | null                 
 event_object                                           | MS_Insights          
 year                                                   | 2021                 
 month                                                  | 10                   
 generated_aggregateMeasure_metric_used                 | null                 
 generated_aggregateMeasure_metric_activ

### Comprehensive Student Lesson Activity with Standards (Math) Table Processing

In [52]:
dfiReady_Comp_Math = oea.load("iready","comprehensive_student_lesson_activity_with_standards_math_pseudo")

StatementMeta(sparkMed, 34, 52, Finished, Available)

In [53]:
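# Convert "time on lesson" from minutes to seconds, then rename the column to match.
# F may already be in scope via %run /OEA_py; importing it here keeps the cell self-contained.
from pyspark.sql import functions as F
dfiReady_Comp_Math = dfiReady_Comp_Math.withColumn('TotalTimeonLesson_min_', F.col('TotalTimeonLesson_min_')*60)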
dfiReady_Comp_Math = dfiReady_Comp_Math.withColumnRenamed('TotalTimeonLesson_min_', 'TotalTimeonLesson_sec_')
dfiReady_Comp_Math.printSchema()
dfiReady_Comp_Math.show(1, vertical=True)

StatementMeta(sparkMed, 34, 53, Finished, Available)

root
 |-- LastName: string (nullable = true)
 |-- FirstName: string (nullable = true)
 |-- StudentID_pseudonym: string (nullable = true)
 |-- StudentGrade: string (nullable = true)
 |-- AcademicYear: string (nullable = true)
 |-- School: string (nullable = true)
 |-- Subject: string (nullable = true)
 |-- Domain: string (nullable = true)
 |-- LessonGrade: string (nullable = true)
 |-- LessonLevel: string (nullable = true)
 |-- LessonID: string (nullable = true)
 |-- LessonName: string (nullable = true)
 |-- LessonObjective: string (nullable = true)
 |-- CompletionDate: date (nullable = true)
 |-- TotalTimeonLesson_sec_: integer (nullable = true)
 |-- Score: integer (nullable = true)
 |-- PassedorNotPassed: string (nullable = true)
 |-- Teacher-AssignedLesson: string (nullable = true)
 |-- StateStandards: string (nullable = true)
 |-- TypeofStandard: string (nullable = true)
 |-- StandardCode: string (nullable = true)
 |-- StandardText: string (nullable = true)
 |-- year: integer (nulla

In [54]:
dfiReady_Comp_Math.write.save(oea.path('stage2p', directory_path="temp"), format='delta', mode='append', mergeSchema='true')

StatementMeta(sparkMed, 34, 54, Finished, Available)

In [55]:
schemaMapping = [['event_type', 'Subject'],
                        ['event_actor', 'StudentID_pseudonym'],
                        ['event_object', 'iReady_Comprehensive_Student_Lesson_Activity_with_Standards_Math'],
                        ['event_eventTime', 'CompletionDate'],
                        ['entity_type', 'Domain'],
                        ['generated_aggregateMeasure_metric_timeOnTaskSec', 'TotalTimeonLesson_sec_']]

source_path = 'stage2p/temp'
digAct.process_digital_activity(source_path, schemaMapping)

StatementMeta(sparkMed, 34, 55, Finished, Available)

2022-08-24 17:54:15,374 - OEA - INFO - Processing digital activity data from: stage2p/temp
Processing digital activity data from: stage2p/temp
2022-08-24 17:54:18,656 - OEA - INFO - Complete processing from: stage2p/temp
Complete processing from: stage2p/temp


In [56]:
oea.rm_if_exists(oea.path('stage2p', directory_path="temp"))

StatementMeta(sparkMed, 34, 56, Finished, Available)

In [57]:
## Check results
schema_path = 'stage2p/digital_activity'
dfDigAct = oea.load_delta(schema_path)
dfDigAct.filter(dfDigAct['event_object'] == "Clever_Resource_Usage").show(1,vertical=True)
dfDigAct.filter(dfDigAct['event_object'] == "iReady_Comprehensive_Student_Lesson_Activity_with_Standards_Math").show(1,vertical=True)

StatementMeta(sparkMed, 34, 57, Finished, Available)

-RECORD 0----------------------------------------------------------------------
 event_id                                               | null                 
 event_type                                             | app                  
 event_actor                                            | 41c2a72a717d28f75... 
 event_eventTime                                        | 2022-01-15           
 entity_type                                            | Big Ideas Math       
 softwareApplication_version                            | null                 
 generated_aggregateMeasure_metric_timeOnTaskSec        | null                 
 event_object                                           | Clever_Resource_U... 
 year                                                   | 2022                 
 month                                                  | 1                    
 generated_aggregateMeasure_metric_used                 | null                 
 generated_aggregateMeasure_metric_activ