# Digital Activity Schema Demo Notebook

This notebook explores how the OEA schema standardization process applies to supported modules (e.g., the Education Insights module and the Graph Reports API module).

__It is highly recommended that you review and pull the test data from both the Insights and Graph modules before testing these schema standardization notebooks.__

The notebook executes as follows:

 - First, it initializes the OEA and Digital Activity Schema Standard class notebooks.
 - Next, it processes the Insights module data ingested to stage 2, rewriting the schema so that only digital activity data is pulled.
 - Finally, it runs the same process on the Graph API module data, specifically the M365 and Teams queries from the module.
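
The schema rewrite in the steps above is driven by lists of `[target, source]` pairs. As a rough illustration only, the sketch below applies such a list to plain Python dictionaries. `apply_schema_mapping` is a hypothetical helper, not part of the OEA or `DigitalActivity` API, and the rule that a source name missing from the row is kept as a literal label is an assumption based on how `'MS_Insights'` appears in the results later in this notebook.

```python
from typing import Any, Dict, List


def apply_schema_mapping(rows: List[Dict[str, Any]],
                         mapping: List[List[str]]) -> List[Dict[str, Any]]:
    """Rename source fields to standard column names (hypothetical helper)."""
    standardized = []
    for row in rows:
        out = {}
        for target, source in mapping:
            # Assumption: a source name that is not a field of the row is
            # treated as a constant label rather than a column lookup.
            out[target] = row[source] if source in row else source
        standardized.append(out)
    return standardized


# Toy input row; the values are made up for illustration.
rows = [{"SignalId": "a1", "SignalType": "OneNotePageChanged", "AppName": "OneNote"}]
mapping = [["event_id", "SignalId"],
           ["event_type", "SignalType"],
           ["event_object", "MS_Insights"],
           ["entity_type", "AppName"]]

result = apply_schema_mapping(rows, mapping)
print(result[0])
```

The real processing runs on Spark DataFrames and Delta tables, so this only conveys the shape of the transformation, not its implementation.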

In [68]:
%run /OEA_py

StatementMeta(, 4, -1, Finished, Available)

2022-06-23 15:49:49,485 - OEA - DEBUG - OEA initialized.
OEA initialized.


In [69]:
%run /Schema_DigitalActivity_py

StatementMeta(, 4, -1, Finished, Available)

In [70]:
# 0) Initialize the OEA framework and modules needed.
oea = OEA()
digAct = DigitalActivity()

StatementMeta(sparkMed, 4, 3, Finished, Available)

2022-06-23 15:49:57,892 - OEA - DEBUG - OEA initialized.
OEA initialized.


In [48]:
digAct.get_digital_activity_schema()

StatementMeta(sparkMed, 3, 30, Finished, Available)

OEA Standard Digital Activity Schema:

Columns and data types:

['event_id', 'string', 'no-op']
['event_type', 'string', 'no-op']
['event_actor', 'string', 'no-op']
['event_object', 'string', 'no-op']
['event_eventTime', 'string', 'no-op']
['entity_type', 'string', 'no-op']
['softwareApplication_version', 'string', 'no-op']
['generated_aggregateMeasure_metric_timeOnTask', 'string', 'no-op']
['generated_aggregateMeasure_metric_numAccess', 'string', 'no-op']
['generated_aggregateMeasure_metric_used', 'string', 'no-op']
['generated_aggregateMeasure_metric_activityReportPeriod', 'string', 'no-op']

Column descriptions:

['schema_source', 'https://www.imsglobal.org/spec/caliper/v1p2#tooluseevent']
['event_id', 'unique ID used as a signal key']
['event_type', 'type of activity event']
['event_actor', 'student or teacher that created the signal']
['event_object', 'entity that comprises the object of the interaction']
['event_eventTime', 'date/timestamp of the activity signal']
['entity_type',
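
Because every mapping in this notebook targets the column names printed above, a quick check for misspelled targets before processing may save a run. The following is a minimal sketch: the column list is copied from the output above, and `unknown_targets` is a hypothetical helper rather than something the `DigitalActivity` class is known to provide.

```python
# Standard column names, copied from the schema output above.
STANDARD_COLUMNS = [
    "event_id",
    "event_type",
    "event_actor",
    "event_object",
    "event_eventTime",
    "entity_type",
    "softwareApplication_version",
    "generated_aggregateMeasure_metric_timeOnTask",
    "generated_aggregateMeasure_metric_numAccess",
    "generated_aggregateMeasure_metric_used",
    "generated_aggregateMeasure_metric_activityReportPeriod",
]


def unknown_targets(mapping):
    """Return mapping targets that are not standard column names (hypothetical helper)."""
    return [target for target, _source in mapping if target not in STANDARD_COLUMNS]


# 'event_time' is a deliberate typo for 'event_eventTime'.
mapping_with_typo = [["event_actor", "userPrincipalName_pseudonym"],
                     ["event_time", "reportRefreshDate"]]

bad = unknown_targets(mapping_with_typo)
print(bad)  # → ['event_time']
```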

In [49]:
digAct.reset_digital_activity_processing()

StatementMeta(sparkMed, 3, 31, Finished, Available)

2022-06-23 00:48:56,329 - OEA - INFO - Deleted abfss://stage2p@stoeacisd3v06kw2.dfs.core.windows.net/<__main__.OEA object at 0x7f318c26dfd0>digital_activity
Deleted abfss://stage2p@stoeacisd3v06kw2.dfs.core.windows.

## Education Insights Premium Module Processing

In [50]:
dfActivity = oea.load("M365","TechActivity_pseudo")
dfActivity.printSchema()
dfActivity.show(1,vertical=True)

StatementMeta(sparkMed, 3, 32, Finished, Available)

root
 |-- SignalType: string (nullable = true)
 |-- StartTime: timestamp (nullable = true)
 |-- UserAgent: string (nullable = true)
 |-- SignalId: string (nullable = true)
 |-- SisClassId: string (nullable = true)
 |-- ClassId: string (nullable = true)
 |-- ChannelId: string (nullable = true)
 |-- AppName: string (nullable = true)
 |-- ActorId_pseudonym: string (nullable = true)
 |-- ActorRole: string (nullable = true)
 |-- SchemaVersion: string (nullable = true)
 |-- AssignmentId: string (nullable = true)
 |-- SubmissionId: string (nullable = true)
 |-- ProbablyDateOfAssignmentActivity: timestamp (nullable = true)
 |-- Action: string (nullable = true)
 |-- DueDate: timestamp (nullable = true)
 |-- ClassCreationDate: timestamp (nullable = true)
 |-- Grade: string (nullable = true)
 |-- SourceFileExtension: string (nullable = true)
 |-- MeetingDuration: integer (nullable = true)
 |-- ToBeChanged: string (nullable = true)
 |-- ToBeUpdated: string (nullable = true)
 |-- year: integer (nul

In [51]:
# Each pair is [standard column, source column].
schemaMapping = [['event_id', 'SignalId'],
                 ['event_type', 'SignalType'],
                 ['event_actor', 'ActorId_pseudonym'],
                 ['event_object', 'MS_Insights'],  # constant label identifying the data source
                 ['event_eventTime', 'StartTime'],
                 ['entity_type', 'AppName'],
                 ['softwareApplication_version', 'SchemaVersion'],
                 ['generated_aggregateMeasure_metric_timeOnTask', 'MeetingDuration']]

source_path = 'stage2p/M365/TechActivity_pseudo'
digAct.process_digital_activity(source_path, schemaMapping)

StatementMeta(sparkMed, 3, 33, Finished, Available)

2022-06-23 00:48:57,781 - OEA - INFO - Processing digital activity data from: stage2p/M365/TechActivity_pseudo
Processing digital activity data from: stage2p/M365/TechActivity_pseudo
2022-06-23 00:49:01,256 - OEA - INFO - Complete processing from: stage2p/M365/TechActivity_pseudo

## Graph Reports API Module Processing

### M365 Query Processing

In [52]:
dfGraph_M365 = oea.load("graph_api","m365_app_user_detail_pseudo")
dfGraph_M365.printSchema()
dfGraph_M365.show(1,vertical=True)

StatementMeta(sparkMed, 3, 34, Finished, Available)

root
 |-- lastActivationDate: string (nullable = true)
 |-- lastActivityDate: string (nullable = true)
 |-- reportRefreshDate: string (nullable = true)
 |-- userPrincipalName_pseudonym: string (nullable = true)
 |-- reportPeriod: string (nullable = true)
 |-- mobile: boolean (nullable = true)
 |-- web: boolean (nullable = true)
 |-- mac: boolean (nullable = true)
 |-- windows: boolean (nullable = true)
 |-- excel: boolean (nullable = true)
 |-- excelMobile: boolean (nullable = true)
 |-- excelWeb: boolean (nullable = true)
 |-- excelMac: boolean (nullable = true)
 |-- excelWindows: boolean (nullable = true)
 |-- oneNote: boolean (nullable = true)
 |-- oneNoteMobile: boolean (nullable = true)
 |-- oneNoteWeb: boolean (nullable = true)
 |-- oneNoteMac: boolean (nullable = true)
 |-- oneNoteWindows: boolean (nullable = true)
 |-- outlook: boolean (nullable = true)
 |-- outlookMobile: boolean (nullable = true)
 |-- outlookWeb: boolean (nullable = true)
 |-- outlookMac: boolean (nullable = 

In [None]:
import pandas as pd
dfPand = dfGraph_M365.toPandas()
# Reshape the wide per-app usage flags into one row per (user, app).
dfPandMelt = dfPand.melt(
    id_vars=['userPrincipalName_pseudonym', 'reportRefreshDate', 'reportPeriod'],
    value_vars=['excel', 'oneNote', 'outlook', 'powerPoint', 'teams', 'word'],
    var_name='m365_app_name',
    value_name='used')
dfGraph_M365 = spark.createDataFrame(dfPandMelt)
dfGraph_M365.printSchema()
dfGraph_M365.show(1, vertical=True)
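
The `melt` call in the cell above turns one wide row per user into one long row per user and app. To make the resulting shape concrete, here is a small plain-Python sketch of the same reshaping on made-up data. `melt_rows` is a hypothetical stand-in written for illustration; the notebook itself uses `pandas.DataFrame.melt`.

```python
def melt_rows(rows, id_vars, value_vars, var_name, value_name):
    """Wide-to-long reshape over a list of dicts (illustrative stand-in for melt)."""
    long_rows = []
    # Like pandas melt, emit every row for the first value column,
    # then every row for the next value column, and so on.
    for col in value_vars:
        for row in rows:
            long_row = {key: row[key] for key in id_vars}
            long_row[var_name] = col
            long_row[value_name] = row[col]
            long_rows.append(long_row)
    return long_rows


# Toy data; the user names and flags are invented for this example.
wide_rows = [
    {"user": "u1", "excel": True, "word": False},
    {"user": "u2", "excel": False, "word": True},
]

long_rows = melt_rows(wide_rows, ["user"], ["excel", "word"], "m365_app_name", "used")
for long_row in long_rows:
    print(long_row)
```

Two users and two app columns yield four long rows, each carrying the app name in `m365_app_name` and its flag in `used`, which is the shape the M365 schema mapping expects.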

In [54]:
dfGraph_M365.write.save(oea.path('stage2p', directory_path="temp/M365"), format='delta', mode='append', mergeSchema='true')

StatementMeta(sparkMed, 3, 36, Finished, Available)

In [56]:
schemaMapping = [['event_actor', 'userPrincipalName_pseudonym'],
                 ['event_object', 'MS_GraphAPI_M365'],
                 ['event_eventTime', 'reportRefreshDate'],
                 ['entity_type', 'm365_app_name'],
                 ['generated_aggregateMeasure_metric_used', 'used'],
                 ['generated_aggregateMeasure_metric_activityReportPeriod', 'reportPeriod']]

source_path = 'stage2p/temp/M365'
digAct.process_digital_activity(source_path, schemaMapping)

StatementMeta(sparkMed, 3, 38, Finished, Available)

2022-06-23 00:49:21,430 - OEA - INFO - Processing digital activity data from: stage2p/temp
Processing digital activity data from: stage2p/temp
2022-06-23 00:49:26,418 - OEA - INFO - Complete processing from: stage2p/temp

In [57]:
oea.rm_if_exists(oea.path('stage2p', directory_path="temp"))

StatementMeta(sparkMed, 3, 39, Finished, Available)

In [59]:
# Check results
schema_path = 'stage2p/digital_activity'
dfDigAct = oea.load_delta(schema_path)
dfDigAct.printSchema()
dfDigAct.show(1, vertical=True)

StatementMeta(sparkMed, 3, 41, Finished, Available)

root
 |-- event_id: string (nullable = true)
 |-- event_type: string (nullable = true)
 |-- event_actor: string (nullable = true)
 |-- event_eventTime: string (nullable = true)
 |-- entity_type: string (nullable = true)
 |-- softwareApplication_version: string (nullable = true)
 |-- generated_aggregateMeasure_metric_timeOnTask: string (nullable = true)
 |-- event_object: string (nullable = true)
 |-- year: integer (nullable = true)
 |-- month: integer (nullable = true)
 |-- generated_aggregateMeasure_metric_numAccess: string (nullable = true)
 |-- generated_aggregateMeasure_metric_activityReportPeriod: string (nullable = true)
 |-- generated_aggregateMeasure_metric_used: string (nullable = true)

-RECORD 0----------------------------------------------------------------------
 event_id                                               | 4366da81-d797-4ba... 
 event_type                                             | OneNotePageChanged   
 event_actor                                          

In [60]:
dfDigAct.filter(dfDigAct['event_object'] == "MS_Insights").show(1,vertical=True)
dfDigAct.filter(dfDigAct['event_object'] == "MS_GraphAPI_M365").show(1,vertical=True)

StatementMeta(sparkMed, 3, 42, Finished, Available)

-RECORD 0----------------------------------------------------------------------
 event_id                                               | 4366da81-d797-4ba... 
 event_type                                             | OneNotePageChanged   
 event_actor                                            | 40f4ce5e38d197a40... 
 event_eventTime                                        | 2021-10-06 15:15:51  
 entity_type                                            | OneNote              
 softwareApplication_version                            | 1.12                 
 generated_aggregateMeasure_metric_timeOnTask           | null                 
 event_object                                           | MS_Insights          
 year                                                   | 2021                 
 month                                                  | 10                   
 generated_aggregateMeasure_metric_numAccess            | null                 
 generated_aggregateMeasure_metric_activ

### Teams Query Processing

In [71]:
dfGraph_Teams = oea.load("graph_api","teams_activity_user_detail_pseudo")
dfGraph_Teams.printSchema()
dfGraph_Teams.show(1,vertical=True)

StatementMeta(sparkMed, 4, 4, Finished, Available)

root
 |-- adHocMeetingsAttendedCount: long (nullable = true)
 |-- adHocMeetingsOrganizedCount: long (nullable = true)
 |-- assignedProducts: string (nullable = true)
 |-- audioDuration: integer (nullable = true)
 |-- callCount: long (nullable = true)
 |-- deletedDate: string (nullable = true)
 |-- hasOtherAction: boolean (nullable = true)
 |-- isDeleted: boolean (nullable = true)
 |-- isLicensed: boolean (nullable = true)
 |-- lastActivityDate: string (nullable = true)
 |-- meetingCount: long (nullable = true)
 |-- meetingsAttendedCount: long (nullable = true)
 |-- meetingsOrganizedCount: long (nullable = true)
 |-- privateChatMessageCount: long (nullable = true)
 |-- reportPeriod: string (nullable = true)
 |-- reportRefreshDate: string (nullable = true)
 |-- scheduledOneTimeMeetingsAttendedCount: long (nullable = true)
 |-- scheduledOneTimeMeetingsOrganizedCount: long (nullable = true)
 |-- scheduledRecurringMeetingsAttendedCount: long (nullable = true)
 |-- scheduledRecurringMeetings

In [73]:
import pandas as pd
dfPand = dfGraph_Teams.toPandas()
# Reshape the wide per-metric counts into one row per (user, metric).
dfPandMelt = dfPand.melt(
    id_vars=['userPrincipalName_pseudonym', 'reportRefreshDate', 'reportPeriod', 'videoDuration'],
    value_vars=['callCount', 'meetingCount', 'meetingsAttendedCount',
                'meetingsOrganizedCount', 'privateChatMessageCount', 'teamChatMessageCount'],
    var_name='meetings_and_messages',
    value_name='counts')
dfGraph_Teams_counts = spark.createDataFrame(dfPandMelt)
dfGraph_Teams_counts.printSchema()
dfGraph_Teams_counts.show(1, vertical=True)

StatementMeta(sparkMed, 4, 6, Finished, Available)

  'JavaPackage' object is not callable
Attempting non-optimization as 'spark.sql.execution.arrow.pyspark.fallback.enabled' is set to true.
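
The message above is PySpark reporting that the Arrow-optimized pandas conversion failed and that it fell back to the non-optimized path. One possible way to avoid the pandas round trip altogether is Spark SQL's `stack` generator, which unpivots columns inside Spark. The sketch below only builds the expression string. `build_stack_expr` is a hypothetical helper, and the commented `selectExpr` call has not been run against this workspace. `stack` requires the value columns to share a data type; the count columns visible in the schema above are all `long`.

```python
def build_stack_expr(value_cols, var_name, value_name):
    """Build a Spark SQL stack() expression that unpivots value_cols (hypothetical helper)."""
    # Each column contributes a (label, value) pair: 'name', `name`.
    pairs = ", ".join(f"'{col}', `{col}`" for col in value_cols)
    return f"stack({len(value_cols)}, {pairs}) as ({var_name}, {value_name})"


# Same value columns as the melt call in the cell above.
count_cols = ["callCount", "meetingCount", "meetingsAttendedCount",
              "meetingsOrganizedCount", "privateChatMessageCount", "teamChatMessageCount"]

expr = build_stack_expr(count_cols, "meetings_and_messages", "counts")
print(expr)

# Assumed usage inside the notebook (requires a live Spark session):
# dfGraph_Teams_counts = dfGraph_Teams.selectExpr(
#     "userPrincipalName_pseudonym", "reportRefreshDate",
#     "reportPeriod", "videoDuration", expr)
```

Staying in Spark also avoids collecting the full table to the driver, which `toPandas()` does.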


In [74]:
dfGraph_Teams_counts.write.save(oea.path('stage2p', directory_path="temp"), format='delta', mode='append', mergeSchema='true')

StatementMeta(sparkMed, 4, 7, Finished, Available)

In [75]:
schemaMapping = [['event_type', 'meetings_and_messages'],
                 ['event_actor', 'userPrincipalName_pseudonym'],
                 ['event_object', 'MS_GraphAPI_Teams'],
                 ['event_eventTime', 'reportRefreshDate'],
                 ['generated_aggregateMeasure_metric_timeOnTask', 'videoDuration'],
                 ['generated_aggregateMeasure_metric_numAccess', 'counts'],
                 ['generated_aggregateMeasure_metric_activityReportPeriod', 'reportPeriod']]

source_path = 'stage2p/temp'
digAct.process_digital_activity(source_path, schemaMapping)

StatementMeta(sparkMed, 4, 8, Finished, Available)

2022-06-23 16:31:49,067 - OEA - INFO - Processing digital activity data from: stage2p/temp
Processing digital activity data from: stage2p/temp
2022-06-23 16:31:55,349 - OEA - INFO - Complete processing from: stage2p/temp
Complete processing from: stage2p/temp


In [76]:
oea.rm_if_exists(oea.path('stage2p', directory_path="temp"))

StatementMeta(sparkMed, 4, 9, Finished, Available)

In [77]:
# Check results
schema_path = 'stage2p/digital_activity'
dfDigAct = oea.load_delta(schema_path)
dfDigAct.printSchema()
dfDigAct.show(1, vertical=True)

StatementMeta(sparkMed, 4, 10, Finished, Available)

root
 |-- event_id: string (nullable = true)
 |-- event_type: string (nullable = true)
 |-- event_actor: string (nullable = true)
 |-- event_eventTime: string (nullable = true)
 |-- entity_type: string (nullable = true)
 |-- softwareApplication_version: string (nullable = true)
 |-- generated_aggregateMeasure_metric_timeOnTask: string (nullable = true)
 |-- event_object: string (nullable = true)
 |-- year: integer (nullable = true)
 |-- month: integer (nullable = true)
 |-- generated_aggregateMeasure_metric_numAccess: string (nullable = true)
 |-- generated_aggregateMeasure_metric_activityReportPeriod: string (nullable = true)
 |-- generated_aggregateMeasure_metric_used: string (nullable = true)

-RECORD 0----------------------------------------------------------------------
 event_id                                               | 4366da81-d797-4ba... 
 event_type                                             | OneNotePageChanged   
 event_actor                                          

In [78]:
dfDigAct.filter(dfDigAct['event_object'] == "MS_Insights").show(1,vertical=True)
dfDigAct.filter(dfDigAct['event_object'] == "MS_GraphAPI_Teams").show(1,vertical=True)

StatementMeta(sparkMed, 4, 11, Finished, Available)

-RECORD 0----------------------------------------------------------------------
 event_id                                               | 4366da81-d797-4ba... 
 event_type                                             | OneNotePageChanged   
 event_actor                                            | 40f4ce5e38d197a40... 
 event_eventTime                                        | 2021-10-06 15:15:51  
 entity_type                                            | OneNote              
 softwareApplication_version                            | 1.12                 
 generated_aggregateMeasure_metric_timeOnTask           | null                 
 event_object                                           | MS_Insights          
 year                                                   | 2021                 
 month                                                  | 10                   
 generated_aggregateMeasure_metric_numAccess            | null                 
 generated_aggregateMeasure_metric_activ