# Hybrid Engagement Package Enrichment Notebook
This notebook explores the capabilities of the OEA Hybrid Engagement package by combining Microsoft Education Insights and Contoso SIS student data into a single student table.

**Review and run the Microsoft Education Insights and Contoso SIS module pipelines before testing this Hybrid Engagement enrichment notebook.**

The notebook runs as follows:
* First, run the OEA framework class notebook (`/OEA_py`) and initialize the `OEA` class.
* Next, the notebook processes the Insights module roster data and the Contoso SIS module `studentattendance` data ingested into stage 2, reshaping the schema so that it includes only the student data that the package's Power BI dashboard maps to. The tables used are:
    * stage2p/
        * contoso_sis/studentattendance_pseudo,
        * M365/TechActivity_pseudo,
        * M365/AadUser_pseudo,
        * M365/AadUserPersonMapping_pseudo,
        * M365/Person_pseudo,
        * M365/PersonOrganizationRole_pseudo,
        * M365/Organization_pseudo, and
        * M365/RefDefinition_pseudo.
    * stage2np/M365/
        * AadUser_lookup, and
        * Person_lookup.
* The final resulting tables are written to stage 3:
    * stage3p/hybrid_engagement/Student_pseudo
    * stage3np/hybrid_engagement/Student_lookup

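The split between stage 3p and stage 3np means that `Student_pseudo` carries only pseudonymized identifiers, while `Student_lookup` holds the mapping back to real identifiers. Both tables share the `StudentId_internal_pseudonym` column, so a user with access to stage 3np could re-identify rows by joining on it. The plain-Python sketch below illustrates that idea with made-up rows; the `reidentify` helper and the data are hypothetical, and in Synapse the same step would be a Spark DataFrame join.

```python
# Minimal sketch: re-identify pseudonymized rows by joining them with a lookup
# table on the shared pseudonym column. Plain dicts stand in for the
# stage3p Student_pseudo and stage3np Student_lookup tables (toy data only).

def reidentify(pseudo_rows, lookup_rows, key="StudentId_internal_pseudonym"):
    """Return pseudo rows enriched with the lookup columns (inner join on `key`)."""
    lookup_by_key = {row[key]: row for row in lookup_rows}
    joined = []
    for row in pseudo_rows:
        match = lookup_by_key.get(row[key])
        if match is None:
            continue  # inner join: rows without a lookup entry are dropped
        joined.append({**row, **match})
    return joined


student_pseudo = [
    {"StudentId_internal_pseudonym": "p1", "AverageAttendance_inPerson": 0.9},
    {"StudentId_internal_pseudonym": "p2", "AverageAttendance_inPerson": 0.5},
]
student_lookup = [
    {"StudentId_internal_pseudonym": "p1", "StudentId_internal": "1001"},
]

print(reidentify(student_pseudo, student_lookup))
# [{'StudentId_internal_pseudonym': 'p1', 'AverageAttendance_inPerson': 0.9, 'StudentId_internal': '1001'}]
```

Keeping the lookup in a separate, non-pseudonymized stage means access to real identifiers can be granted independently of access to the engagement data.
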
In [1]:
%run /OEA_py

2022-10-06 15:36:32,478 - OEA - DEBUG - OEA initialized.
OEA initialized.


In [2]:
# 0) Initialize the OEA framework.
oea = OEA()

2022-10-06 15:36:40,393 - OEA - DEBUG - OEA initialized.
OEA initialized.


## Data Aggregations

### SIS Student_pseudo table creation

In [3]:
dfContosoSIS_studentattendance = oea.load('contoso_sis', 'studentattendance_pseudo')
dfInsights_techActivity = oea.load_delta('stage2p/digital_activity')
dfInsights_aaduserpersonmapping = oea.load('M365', 'AadUserPersonMapping_pseudo')
dfInsights_person = oea.load('M365', 'Person_pseudo')
dfInsights_personOrgRole = oea.load('M365', 'PersonOrganizationRole_pseudo')
dfInsights_organization = oea.load('M365', 'Organization_pseudo')
dfInsights_refDefinition = oea.load('M365', 'RefDefinition_pseudo')

In [4]:
dfInsights = dfInsights_personOrgRole.join(dfInsights_person, dfInsights_personOrgRole.PersonId_pseudonym == dfInsights_person.Id_pseudonym, how='inner')
dfInsights = dfInsights.select('PersonId_pseudonym', 'Surname', 'GivenName', 'MiddleName', 'RefRoleId', 'RefGradeLevelId', 'OrganizationId')
display(dfInsights.limit(10))


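Every join in this notebook uses `how='inner'`, so a person with no matching row on the other side is dropped, and a person with several matching rows (for example, several `PersonOrganizationRole` entries) appears once per match. The plain-Python sketch below, with a hypothetical `inner_join` helper and toy rows, shows both effects; it only mimics Spark's join behavior for illustration.

```python
# Plain-Python illustration of inner-join semantics (hypothetical helper, toy rows).
# It mirrors dfInsights_personOrgRole.join(dfInsights_person, ..., how='inner').

def inner_join(left, right, left_key, right_key):
    """Return one merged row for every (left, right) pair whose keys are equal."""
    right_index = {}
    for right_row in right:
        right_index.setdefault(right_row[right_key], []).append(right_row)
    joined = []
    for left_row in left:
        # A left row with no match contributes nothing; several matches yield several rows
        for right_row in right_index.get(left_row[left_key], []):
            joined.append({**left_row, **right_row})
    return joined


person_org_role = [
    {"PersonId_pseudonym": "a", "OrganizationId": "school1"},
    {"PersonId_pseudonym": "a", "OrganizationId": "school2"},  # same person, second role
    {"PersonId_pseudonym": "z", "OrganizationId": "school1"},  # no matching Person row
]
person = [
    {"Id_pseudonym": "a", "Surname": "Doe"},
    {"Id_pseudonym": "b", "Surname": "Roe"},  # no matching role row
]

for row in inner_join(person_org_role, person, "PersonId_pseudonym", "Id_pseudonym"):
    print(row["PersonId_pseudonym"], row["OrganizationId"], row["Surname"])
# a school1 Doe
# a school2 Doe
```

If a student can hold roles in more than one organization, the resulting `Student_pseudo` table can therefore contain more than one row for that student.
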
In [5]:
dfInsights = dfInsights.join(dfInsights_organization, dfInsights.OrganizationId == dfInsights_organization.Id, how='inner')
dfInsights = dfInsights.withColumnRenamed('Name', 'OrganizationName')
dfInsights = dfInsights.select('PersonId_pseudonym', 'Surname', 'GivenName', 'MiddleName', 'RefRoleId', 'RefGradeLevelId', 'OrganizationId', 'OrganizationName')
display(dfInsights.limit(10))


In [6]:
dfInsights = dfInsights.join(dfInsights_refDefinition, dfInsights.RefRoleId == dfInsights_refDefinition.Id, how='inner')
dfInsights = dfInsights.withColumnRenamed('Code', 'PersonRole')
dfInsights = dfInsights.select('PersonId_pseudonym', 'Surname', 'GivenName', 'MiddleName', 'PersonRole', 'RefGradeLevelId', 'OrganizationId', 'OrganizationName')
display(dfInsights.limit(10))


In [7]:
dfInsights = dfInsights.join(dfInsights_refDefinition, dfInsights.RefGradeLevelId == dfInsights_refDefinition.Id, how='inner')
dfInsights = dfInsights.withColumnRenamed('Code', 'StudentGrade')

dfInsights = dfInsights.select('PersonId_pseudonym', 'Surname', 'GivenName', 'MiddleName', 'PersonRole', 'StudentGrade', 'OrganizationId', 'OrganizationName')
display(dfInsights.limit(10))


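The `In [6]` and `In [7]` cells join the same `RefDefinition` table twice, once on `RefRoleId` and once on `RefGradeLevelId`, and keep its `Code` column as the readable value each time; the `select` after each join drops the extra `Id` column so the second join does not produce duplicate names. Because both are inner joins, a person whose `RefGradeLevelId` is null or unmatched (plausibly staff without a grade level) drops out at this step. The hypothetical plain-Python sketch below performs the two lookups on toy IDs and codes.

```python
# Minimal sketch: resolve two foreign keys against the same reference table,
# as the In [6] and In [7] cells do with RefDefinition (Id -> Code). Toy data only.

ref_definition = [
    {"Id": "r1", "Code": "Student"},
    {"Id": "r2", "Code": "Teacher"},
    {"Id": "g9", "Code": "9"},
]
code_by_id = {row["Id"]: row["Code"] for row in ref_definition}

people = [
    {"PersonId_pseudonym": "a", "RefRoleId": "r1", "RefGradeLevelId": "g9"},
    {"PersonId_pseudonym": "t", "RefRoleId": "r2", "RefGradeLevelId": None},
]

resolved = []
for person in people:
    role = code_by_id.get(person["RefRoleId"])
    grade = code_by_id.get(person["RefGradeLevelId"])
    if role is None or grade is None:
        continue  # an inner join drops rows whose key has no match (including nulls)
    resolved.append({"PersonId_pseudonym": person["PersonId_pseudonym"],
                     "PersonRole": role, "StudentGrade": grade})

print(resolved)
# [{'PersonId_pseudonym': 'a', 'PersonRole': 'Student', 'StudentGrade': '9'}]
```
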
In [9]:
dfInsights_aaduserpersonmapping = dfInsights_aaduserpersonmapping.withColumnRenamed('PersonId_pseudonym', 'StudentId_internal_pseudonym')
dfInsights = dfInsights.join(dfInsights_aaduserpersonmapping, dfInsights.PersonId_pseudonym == dfInsights_aaduserpersonmapping.StudentId_internal_pseudonym, how='inner')
dfInsights = dfInsights.withColumnRenamed('ObjectId_pseudonym', 'StudentId_external_pseudonym')
dfInsights = dfInsights.select('StudentId_internal_pseudonym', 'StudentId_external_pseudonym', 'Surname', 'GivenName', 'MiddleName', 'PersonRole', 'StudentGrade', 'OrganizationId', 'OrganizationName')
display(dfInsights.limit(10))


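The `In [9]` cell uses `AadUserPersonMapping` as a bridge: the Insights `PersonId_pseudonym` becomes `StudentId_internal_pseudonym`, and the mapped AAD `ObjectId_pseudonym` becomes `StudentId_external_pseudonym`. The later attendance join matches that external ID against the SIS `student_id_pseudonym`, so the notebook implicitly assumes that the Contoso SIS student IDs are AAD object IDs pseudonymized the same way. The sketch below walks toy IDs through that chain; the `bridge` helper and all values are hypothetical.

```python
# Hypothetical toy rows illustrating the internal -> external ID bridge.
aad_user_person_mapping = [
    {"PersonId_pseudonym": "person-a", "ObjectId_pseudonym": "aad-1"},
    {"PersonId_pseudonym": "person-b", "ObjectId_pseudonym": "aad-2"},
]
# Internal (Insights Person) ID -> external (AAD object) ID
external_by_internal = {
    row["PersonId_pseudonym"]: row["ObjectId_pseudonym"] for row in aad_user_person_mapping
}
# Hypothetical student_id_pseudonym values present in the SIS attendance data
sis_student_ids = {"aad-1"}


def bridge(internal_ids):
    """Return (internal_id, external_id, has_sis_attendance) for each internal ID."""
    rows = []
    for internal_id in internal_ids:
        external_id = external_by_internal.get(internal_id)
        rows.append((internal_id, external_id, external_id in sis_student_ids))
    return rows


for row in bridge(["person-a", "person-b", "person-c"]):
    print(row)
# ('person-a', 'aad-1', True)
# ('person-b', 'aad-2', False)
# ('person-c', None, False)
```

With inner joins throughout, only `person-a` in this toy example would survive into `Student_pseudo`.
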
### StudentAttendance Enrichment

In [10]:
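from pyspark.sql import functions as F  # imported here in case /OEA_py does not already provide F

dfContosoSIS_studentattendance = dfContosoSIS_studentattendance.select('student_id_pseudonym', 'AttendanceCode')
# Score each attendance record: 1 if the student was present ('P'), otherwise 0
dfContosoSIS_studentattendance = dfContosoSIS_studentattendance.withColumn('AttendanceCode_value', F.when(F.col('AttendanceCode') == "P", 1).otherwise(0))
# The per-student mean is the share of records marked present; Spark names the
# result column 'avg(AttendanceCode_value)', which the next cell renames
dfAttendance = dfContosoSIS_studentattendance.groupBy("student_id_pseudonym").mean("AttendanceCode_value")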
display(dfAttendance.limit(10))


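The `In [10]` cell scores each SIS attendance record as 1 when `AttendanceCode` is `"P"` and 0 otherwise, then averages per student, so `AverageAttendance_inPerson` is the fraction of a student's attendance records marked present. Every other code, including any tardy or excused code, counts as not present. The plain-Python sketch below reproduces that arithmetic on toy records; the `in_person_attendance` helper and the codes other than `"P"` are hypothetical.

```python
from collections import defaultdict


def in_person_attendance(records):
    """records: iterable of (student_id_pseudonym, AttendanceCode) pairs.

    Returns the share of each student's records whose code is 'P'.
    """
    counts = defaultdict(lambda: [0, 0])  # student -> [present records, all records]
    for student_id, code in records:
        counts[student_id][0] += 1 if code == "P" else 0
        counts[student_id][1] += 1
    return {student: present / total for student, (present, total) in counts.items()}


records = [("s1", "P"), ("s1", "A"), ("s1", "P"), ("s2", "T")]
print(in_person_attendance(records))
# {'s1': 0.6666666666666666, 's2': 0.0}
```
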
In [11]:
dfAttendance = dfAttendance.withColumnRenamed('avg(AttendanceCode_value)', 'AverageAttendance_inPerson')
dfInsights = dfInsights.join(dfAttendance, dfInsights.StudentId_external_pseudonym == dfAttendance.student_id_pseudonym, how='inner')
dfInsights = dfInsights.select('StudentId_internal_pseudonym', 'StudentId_external_pseudonym', 'Surname', 'GivenName', 'MiddleName', 'PersonRole', 'StudentGrade', 'AverageAttendance_inPerson', 'OrganizationId', 'OrganizationName')
display(dfInsights.limit(10))


### TechActivity Enrichment
Digital activity data from the Microsoft Education Insights module.

In [12]:
# Keep only digital-activity events that originate from Microsoft Education Insights
dfInsights_techActivity = dfInsights_techActivity.filter(dfInsights_techActivity['event_object'] == 'MS_Insights')
dfInsights_techActivity = dfInsights_techActivity.select('event_actor', 'event_eventTime')
# Total number of distinct event_eventTime values (days) with any recorded Insights activity
number_of_days_recorded = dfInsights_techActivity.select('event_eventTime').distinct().count()
# For each student, the fraction of recorded days on which that student had Insights activity
dfDigitalAttendance_insights = (
    dfInsights_techActivity
    .distinct()  # one row per student per day
    .groupBy('event_actor')
    .count()
    .withColumn('AverageAttendance_digital', F.col('count') / number_of_days_recorded)
    .drop('count')
)
display(dfDigitalAttendance_insights.limit(10))


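The `In [12]` cell defines `AverageAttendance_digital` as the number of distinct days on which a student had Insights activity, divided by the number of distinct days on which anyone had Insights activity. That reading holds only if `event_eventTime` contains one value per calendar day; if it held full timestamps, the counts would be of distinct timestamps instead, so that granularity is an assumption here. The denominator is also days with any recorded activity, not the school calendar. A plain-Python sketch of the same calculation, with a hypothetical `digital_attendance` helper and toy dates:

```python
def digital_attendance(events):
    """events: iterable of (event_actor, event_eventTime) pairs.

    Returns, per actor, the fraction of distinct recorded days on which that actor
    had at least one event. Assumes event_eventTime holds one value per day.
    """
    events = list(events)
    all_days = {day for _, day in events}  # denominator: days with any activity
    days_by_actor = {}
    for actor, day in events:
        days_by_actor.setdefault(actor, set()).add(day)  # repeat events on a day count once
    return {actor: len(days) / len(all_days) for actor, days in days_by_actor.items()}


events = [
    ("a", "2022-09-01"), ("a", "2022-09-01"), ("a", "2022-09-02"),
    ("b", "2022-09-02"), ("c", "2022-09-03"), ("a", "2022-09-04"),
]
print(digital_attendance(events))
# {'a': 0.75, 'b': 0.25, 'c': 0.25}
```
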
In [13]:
# Attach digital attendance by matching the Insights event_actor to the student's internal (Person) ID
dfInsights = dfInsights.join(dfDigitalAttendance_insights, dfInsights.StudentId_internal_pseudonym == dfDigitalAttendance_insights.event_actor, how='inner')
dfInsights = dfInsights.select('StudentId_internal_pseudonym', 'StudentId_external_pseudonym', 'Surname', 'GivenName', 'MiddleName', 'PersonRole', 'StudentGrade', 'AverageAttendance_inPerson', 'AverageAttendance_digital', 'OrganizationId', 'OrganizationName')
display(dfInsights.limit(10))


### Write SIS Student_pseudo table to Stage 3p

In [19]:
# coalesce(1) produces a single data file; 'header' is a CSV-only option with no effect on Delta, so it is omitted
dfInsights.coalesce(1).write.format('delta').mode('overwrite').save(oea.stage3p + '/hybrid_engagement/Student_pseudo')


## Data Aggregations

### SIS Student_lookup table creation

In [15]:
dfInsights_person_np = oea.load('M365', 'Person_lookup', stage=oea.stage2np)
dfInsights_aaduser_np = oea.load('M365', 'AadUser_lookup', stage=oea.stage2np)
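# dfInsights_aaduserpersonmapping had PersonId_pseudonym renamed to StudentId_internal_pseudonym in the In [9] cell, so run that cell first
dfInsights_np = dfInsights_person_np.join(dfInsights_aaduserpersonmapping, dfInsights_person_np.Id_pseudonym == dfInsights_aaduserpersonmapping.StudentId_internal_pseudonym, how='inner')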
dfInsights_np = dfInsights_np.withColumnRenamed('Id', 'StudentId_internal').withColumnRenamed('ObjectId_pseudonym', 'StudentId_external_pseudonym')
dfInsights_np = dfInsights_np.select('StudentId_internal_pseudonym', 'StudentId_internal', 'StudentId_external_pseudonym', 'Surname', 'GivenName', 'MiddleName')
display(dfInsights_np.limit(10))


In [16]:
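# Person_lookup already supplies Surname and GivenName; rename AadUser's copies so the join does not create ambiguous columns
dfInsights_aaduser_np = dfInsights_aaduser_np.withColumnRenamed('Surname', 'Surname2').withColumnRenamed('GivenName', 'GivenName2')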
dfInsights_np = dfInsights_np.join(dfInsights_aaduser_np, dfInsights_np.StudentId_external_pseudonym == dfInsights_aaduser_np.ObjectId_pseudonym, how='inner')
dfInsights_np = dfInsights_np.withColumnRenamed('ObjectId', 'StudentId_external')
dfInsights_np = dfInsights_np.select('StudentId_internal_pseudonym', 'StudentId_internal', 'StudentId_external_pseudonym', 'StudentId_external', 'Surname', 'GivenName', 'MiddleName')
display(dfInsights_np.limit(10))


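The `In [16]` cell renames the `AadUser_lookup` name columns to `Surname2` and `GivenName2` before the join because the Person data already supplies `Surname` and `GivenName`; in Spark, two same-named columns after a join make a later `select('Surname', ...)` ambiguous. The plain-Python sketch below mimics that renaming on toy rows; the `merge_rows` helper and its `suffix` parameter are hypothetical.

```python
def merge_rows(left_row, right_row, suffix="2"):
    """Merge two rows, renaming right-side columns whose names already exist on the left."""
    merged = dict(left_row)
    for column, value in right_row.items():
        # Colliding names get a suffix instead of silently overwriting the left value
        new_name = column + suffix if column in left_row else column
        merged[new_name] = value
    return merged


person_row = {"StudentId_external_pseudonym": "aad-1", "Surname": "Doe", "GivenName": "Ann"}
aad_row = {"ObjectId_pseudonym": "aad-1", "ObjectId": "obj-1", "Surname": "Doe", "GivenName": "Ann"}
print(sorted(merge_rows(person_row, aad_row)))
```

Renaming before the join, as the notebook does, keeps the final `select` unambiguous while leaving the AAD copies available for comparison if needed.
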
### Write SIS Student_lookup table to Stage 3np

In [21]:
# coalesce(1) produces a single data file; 'header' is a CSV-only option with no effect on Delta, so it is omitted
dfInsights_np.coalesce(1).write.format('delta').mode('overwrite').save(oea.stage3np + '/hybrid_engagement/Student_lookup')
