# Hybrid Engagement Package Enrichment Notebook
This notebook explores the capabilities of the OEA Hybrid Engagement package by processing SIS student data into a single table.

**Review and run the Microsoft Education Insights and Contoso SIS module pipelines before testing this Hybrid Engagement enrichment notebook.**

The notebook executes as follows:
* First, it initializes the OEA framework class notebook.
* It then processes the Insights module roster data and the Contoso SIS module studentattendance data ingested into stage 2, rewriting the schema to include only the student data that the package's Power BI dashboard maps. The tables used are:
    * stage2p/
        * contoso_sis/studentattendance_pseudo,
        * M365/AadUser_pseudo,
        * M365/AadUserPersonMapping_pseudo,
        * M365/Person_pseudo,
        * M365/PersonOrganizationRole_pseudo,
        * M365/Organization_pseudo, and
        * M365/RefDefinition_pseudo.
    * stage2np/M365/
        * AadUser_lookup, and
        * Person_lookup.
* The final resulting tables are written to stage 3:
    * stage3p/hybrid_engagement/Student_pseudo
    * stage3np/hybrid_engagement/Student_lookup
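
As a quick checklist of the tables listed above, the sketch below collects the input and output locations in two dictionaries and flattens them into `stage/table` strings. The names `INPUT_TABLES`, `OUTPUT_TABLES`, and `table_paths` are hypothetical and are not part of the OEA framework. How to test whether a path exists depends on your workspace, so that step is left out.

```python
# Table locations copied from the list above.
# INPUT_TABLES, OUTPUT_TABLES and table_paths are illustrative names, not OEA APIs.
INPUT_TABLES = {
    "stage2p": [
        "contoso_sis/studentattendance_pseudo",
        "M365/AadUser_pseudo",
        "M365/AadUserPersonMapping_pseudo",
        "M365/Person_pseudo",
        "M365/PersonOrganizationRole_pseudo",
        "M365/Organization_pseudo",
        "M365/RefDefinition_pseudo",
    ],
    "stage2np": [
        "M365/AadUser_lookup",
        "M365/Person_lookup",
    ],
}

OUTPUT_TABLES = {
    "stage3p": ["hybrid_engagement/Student_pseudo"],
    "stage3np": ["hybrid_engagement/Student_lookup"],
}


def table_paths(tables):
    """Flatten a {stage: [table, ...]} mapping into 'stage/table' strings."""
    return [f"{stage}/{table}" for stage, names in tables.items() for table in names]


# Print every table the notebook expects to find before it runs.
for path in table_paths(INPUT_TABLES):
    print(path)
```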

In [1]:
%run /OEA_py

StatementMeta(, 57, -1, Finished, Available)

2022-09-29 16:30:12,903 - OEA - DEBUG - OEA initialized.
OEA initialized.


In [2]:
# 0) Initialize the OEA framework.
oea = OEA()

StatementMeta(sparkMed, 57, 3, Finished, Available)

2022-09-29 16:30:19,754 - OEA - DEBUG - OEA initialized.
2022-09-29 16:30:19,754 - OEA - DEBUG - OEA initialized.
OEA initialized.


## Data Aggregations

### SIS Student_pseudo table creation

In [20]:
dfContosoSIS_studentattendance = oea.load('contoso_sis', 'studentattendance_pseudo')
dfInsights_aaduserpersonmapping = oea.load('M365', 'AadUserPersonMapping_pseudo')
dfInsights_person = oea.load('M365', 'Person_pseudo')
dfInsights_personOrgRole = oea.load('M365', 'PersonOrganizationRole_pseudo')
dfInsights_organization = oea.load('M365', 'Organization_pseudo')
dfInsights_refDefinition = oea.load('M365', 'RefDefinition_pseudo')

StatementMeta(sparkMed, 57, 21, Finished, Available)

In [4]:
dfInsights = dfInsights_personOrgRole.join(dfInsights_person, dfInsights_personOrgRole.PersonId_pseudonym == dfInsights_person.Id_pseudonym, how='inner')
dfInsights = dfInsights.select('PersonId_pseudonym', 'Surname', 'GivenName', 'MiddleName', 'RefRoleId', 'RefGradeLevelId', 'OrganizationId')
display(dfInsights.limit(10))

StatementMeta(sparkMed, 57, 5, Finished, Available)

SynapseWidget(Synapse.DataFrame, 18bcdf1c-3ec1-4f39-b3d4-6aae399b48f1)

In [5]:
dfInsights = dfInsights.join(dfInsights_organization, dfInsights.OrganizationId == dfInsights_organization.Id, how='inner')
dfInsights = dfInsights.withColumnRenamed('Name', 'OrganizationName')
dfInsights = dfInsights.select('PersonId_pseudonym', 'Surname', 'GivenName', 'MiddleName', 'RefRoleId', 'RefGradeLevelId', 'OrganizationId', 'OrganizationName')
display(dfInsights.limit(10))

StatementMeta(sparkMed, 57, 6, Finished, Available)

SynapseWidget(Synapse.DataFrame, 58cb8c18-6624-4d45-9c71-c6d67a887f80)

In [6]:
dfInsights = dfInsights.join(dfInsights_refDefinition, dfInsights.RefRoleId == dfInsights_refDefinition.Id, how='inner')
dfInsights = dfInsights.withColumnRenamed('Code', 'PersonRole')
dfInsights = dfInsights.select('PersonId_pseudonym', 'Surname', 'GivenName', 'MiddleName', 'PersonRole', 'RefGradeLevelId', 'OrganizationId', 'OrganizationName')
display(dfInsights.limit(10))

StatementMeta(sparkMed, 57, 7, Finished, Available)

SynapseWidget(Synapse.DataFrame, 7569d290-41b1-4eea-8ab4-99059728d559)

In [7]:
dfInsights = dfInsights.join(dfInsights_refDefinition, dfInsights.RefGradeLevelId == dfInsights_refDefinition.Id, how='inner')
dfInsights = dfInsights.withColumnRenamed('Code', 'StudentGrade')
dfInsights = dfInsights.select('PersonId_pseudonym', 'Surname', 'GivenName', 'MiddleName', 'PersonRole', 'StudentGrade', 'OrganizationId', 'OrganizationName')
display(dfInsights.limit(10))

StatementMeta(sparkMed, 57, 8, Finished, Available)

SynapseWidget(Synapse.DataFrame, 27f23c91-78ee-4fea-8948-211f34dafffe)
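
The two cells above resolve `RefRoleId` and `RefGradeLevelId` against the same RefDefinition table with inner joins, so any row whose reference ID has no match (including a null ID) is dropped. The plain-Python sketch below mimics that behaviour on a few made-up rows. The sample values and the helper `join_code` are hypothetical and only illustrate the pattern; they are not the notebook's Spark code.

```python
# Made-up sample rows; only the column names come from the notebook.
ref_definition = [
    {"Id": "r1", "Code": "Student"},
    {"Id": "r2", "Code": "Teacher"},
    {"Id": "g9", "Code": "9"},
]
person_org_role = [
    {"PersonId_pseudonym": "p1", "RefRoleId": "r1", "RefGradeLevelId": "g9"},
    {"PersonId_pseudonym": "p2", "RefRoleId": "r2", "RefGradeLevelId": None},
]


def join_code(rows, ref_rows, key, new_name):
    """Inner-join rows to ref_rows on rows[key] == ref['Id'], keeping Code as new_name."""
    codes = {ref["Id"]: ref["Code"] for ref in ref_rows}
    joined = []
    for row in rows:
        ref_id = row[key]
        if ref_id in codes:  # inner join: rows without a match are dropped
            new_row = {k: v for k, v in row.items() if k != key}
            new_row[new_name] = codes[ref_id]
            joined.append(new_row)
    return joined


with_role = join_code(person_org_role, ref_definition, "RefRoleId", "PersonRole")
with_grade = join_code(with_role, ref_definition, "RefGradeLevelId", "StudentGrade")
print(with_grade)
```

In this sample, the second person has no grade level and therefore disappears after the grade join. If such rows should be kept in the real tables, a left join would be the usual alternative.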

In [8]:
dfInsights = dfInsights.join(dfInsights_aaduserpersonmapping, dfInsights.PersonId_pseudonym == dfInsights_aaduserpersonmapping.PersonId_pseudonym, how='inner')
dfInsights = dfInsights.withColumnRenamed('ObjectId_pseudonym', 'StudentId_pseudonym')
dfInsights = dfInsights.select('StudentId_pseudonym', 'Surname', 'GivenName', 'MiddleName', 'PersonRole', 'StudentGrade', 'OrganizationId', 'OrganizationName')
display(dfInsights.limit(10))

StatementMeta(sparkMed, 57, 9, Finished, Available)

SynapseWidget(Synapse.DataFrame, d6ed1bd5-02fe-4a5f-afcb-a8e804f2a9af)

### StudentAttendance Enrichment

In [21]:
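from pyspark.sql import functions as F  # already available after %run /OEA_py; repeated so the cell stands alone
# Keep only the student key and the attendance code.
dfContosoSIS_studentattendance = dfContosoSIS_studentattendance.select('student_id_pseudonym', 'AttendanceCode')
# Flag present ("P") records as 1 and every other code as 0.
dfContosoSIS_studentattendance = dfContosoSIS_studentattendance.withColumn('AttendanceCode_value', F.when(F.col('AttendanceCode') == "P", 1).otherwise(0))
# Mean of the flag per student; Spark names the result column 'avg(AttendanceCode_value)' (renamed in the next cell).
dfAttendance = dfContosoSIS_studentattendance.groupBy("student_id_pseudonym").mean("AttendanceCode_value")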
display(dfAttendance.limit(10))

StatementMeta(sparkMed, 57, 22, Finished, Available)

SynapseWidget(Synapse.DataFrame, 0ab3acef-3f52-4ea0-8881-c43b3892274c)
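
To make the attendance metric concrete, the sketch below reproduces the same arithmetic in plain Python: each record counts as 1 when its code is "P" and 0 otherwise, and the per-student mean is the fraction of records marked present. The sample records, the "A" code, and the helper `average_attendance` are hypothetical; only the "P" rule comes from the cell above.

```python
from collections import defaultdict

# Made-up (student_id_pseudonym, AttendanceCode) records.
records = [
    ("s1", "P"), ("s1", "A"), ("s1", "P"), ("s1", "P"),
    ("s2", "A"), ("s2", "P"),
]


def average_attendance(rows):
    """Return {student: share of records whose code is 'P'}."""
    totals = defaultdict(lambda: [0, 0])  # student -> [present count, record count]
    for student_id, code in rows:
        totals[student_id][0] += 1 if code == "P" else 0
        totals[student_id][1] += 1
    return {student: present / count for student, (present, count) in totals.items()}


print(average_attendance(records))
```

Here `s1` is present for 3 of 4 records (0.75) and `s2` for 1 of 2 (0.5). Note that every code other than "P" counts as absent under this rule.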

In [22]:
dfAttendance = dfAttendance.withColumnRenamed('avg(AttendanceCode_value)', 'AverageAttendance')
dfInsights = dfInsights.join(dfAttendance, dfInsights.StudentId_pseudonym == dfAttendance.student_id_pseudonym, how='inner')
dfInsights = dfInsights.select('StudentId_pseudonym', 'Surname', 'GivenName', 'MiddleName', 'PersonRole', 'StudentGrade', 'AverageAttendance', 'OrganizationId', 'OrganizationName')
display(dfInsights.limit(10))

StatementMeta(sparkMed, 57, 23, Finished, Available)

SynapseWidget(Synapse.DataFrame, 5b2ff650-214e-43c4-add7-22376e4b3b8f)

### Write SIS Student_pseudo table to Stage 3p

In [19]:
# The 'header' option applies to CSV output and has no effect on Delta, so it is omitted.
dfInsights.coalesce(1).write.format('delta').mode('overwrite').save(oea.stage3p + '/hybrid_engagement/Student_pseudo')

StatementMeta(sparkMed, 55, 20, Finished, Available)

## Data Aggregations

### SIS Student_lookup table creation

In [26]:
dfInsights_person_np = oea.load('M365', 'Person_lookup', stage=oea.stage2np)
dfInsights_aaduser_np = oea.load('M365', 'AadUser_lookup', stage=oea.stage2np)
dfInsights_np = dfInsights_person_np.join(dfInsights_aaduserpersonmapping, dfInsights_person_np.Id_pseudonym == dfInsights_aaduserpersonmapping.PersonId_pseudonym, how='inner')
dfInsights_np = dfInsights_np.withColumnRenamed('Id', 'PersonId').withColumnRenamed('ObjectId_pseudonym', 'StudentId_pseudonym')
dfInsights_np = dfInsights_np.select('PersonId_pseudonym', 'PersonId', 'StudentId_pseudonym', 'Surname', 'GivenName', 'MiddleName')
display(dfInsights_np.limit(10))

StatementMeta(spark3p1sm, 64, 27, Finished, Available)

SynapseWidget(Synapse.DataFrame, a482389c-8164-444a-b4c1-639461999a21)

In [27]:
dfInsights_aaduser_np = dfInsights_aaduser_np.withColumnRenamed('Surname', 'Surname2').withColumnRenamed('GivenName', 'GivenName2')
dfInsights_np = dfInsights_np.join(dfInsights_aaduser_np, dfInsights_np.StudentId_pseudonym == dfInsights_aaduser_np.ObjectId_pseudonym, how='inner')
dfInsights_np = dfInsights_np.withColumnRenamed('ObjectId', 'StudentId')
dfInsights_np = dfInsights_np.select('PersonId_pseudonym', 'PersonId', 'StudentId_pseudonym', 'StudentId', 'Surname', 'GivenName', 'MiddleName')
display(dfInsights_np.limit(10))

StatementMeta(spark3p1sm, 64, 28, Finished, Available)

SynapseWidget(Synapse.DataFrame, f0701c06-2934-4924-b8c7-ef618e46a05e)
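
The lookup table built above pairs each `StudentId_pseudonym` with its `StudentId`. Assuming it is intended for mapping pseudonymized rows back to real identifiers, the plain-Python sketch below shows that join on a few made-up rows. The sample values and the helper `reidentify` are hypothetical and are not part of the OEA framework.

```python
# Made-up rows shaped like Student_pseudo and Student_lookup.
student_pseudo = [
    {"StudentId_pseudonym": "hash-a", "AverageAttendance": 0.75},
    {"StudentId_pseudonym": "hash-b", "AverageAttendance": 0.5},
]
student_lookup = [
    {"StudentId_pseudonym": "hash-a", "StudentId": "id-001"},
    {"StudentId_pseudonym": "hash-b", "StudentId": "id-002"},
]


def reidentify(pseudo_rows, lookup_rows):
    """Attach StudentId to each pseudonymized row that has a lookup entry."""
    ids = {row["StudentId_pseudonym"]: row["StudentId"] for row in lookup_rows}
    return [
        {**row, "StudentId": ids[row["StudentId_pseudonym"]]}
        for row in pseudo_rows
        if row["StudentId_pseudonym"] in ids
    ]


for row in reidentify(student_pseudo, student_lookup):
    print(row)
```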

### Write SIS Student_lookup table to Stage 3np

In [21]:
# The 'header' option applies to CSV output and has no effect on Delta, so it is omitted.
dfInsights_np.coalesce(1).write.format('delta').mode('overwrite').save(oea.stage3np + '/hybrid_engagement/Student_lookup')

StatementMeta(sparkMed, 55, 22, Finished, Available)