# ContosoISD Example
This example demonstrates how to use the OEA framework and modules to process incoming data, perform data prep, and view the data in an example Power BI dashboard.

# Running the example
1) Select your Spark pool in the "Attach to" dropdown list above.

2) Click on "Publish" in the top nav bar (and wait a few seconds for the notification that says "Publishing completed").

3) Click on "Run all" at the top of this tab and wait for the processing to complete (this can take around 5 to 10 minutes).

4) Open the dashboard in Power BI Desktop and point it to your newly set up data lake. You can download the pbix file here: [techInequityDashboardContoso v2.pbix](https://github.com/microsoft/OpenEduAnalytics/blob/main/packages/ContosoISD/power_bi/techInequityDashboardContoso%20v2.pbix)

# More info
See [OEA Solution Guide](https://github.com/microsoft/OpenEduAnalytics/blob/main/docs/OpenEduAnalyticsSolutionGuide.pdf) for more details on this example.

In [12]:
%run /OEA_py

StatementMeta(, 62, -1, Finished, Available)

In [13]:
%run /example_modules_py

StatementMeta(, 62, -1, Finished, Available)

In [14]:
# 0) Initialize the OEA framework and modules needed.
oea = OEA()
m365 = M365(oea)
contoso_sis = ContosoSIS(oea, 'contoso_sis', False)

StatementMeta(spark3p1sm, 62, 14, Finished, Available)

2021-10-20 15:07:53,616 - OEA - DEBUG - OEA initialized.
OEA initialized.

In [15]:
# 1) Land data into stage1 of your data lake, from multiple source systems (this example copies in test data sets that came with the OEA installation).
contoso_sis.copy_test_data_to_stage1()
m365.copy_test_data_to_stage1()

StatementMeta(spark3p1sm, 62, 15, Finished, Available)

In [16]:
# 2) Process the raw data (csv format) from stage1 into stage2 (adds schema details and writes out in parquet format).
#    [Note: we're not performing pseudonymization in this example, so everything is written to container stage2np.]
m365.process_roster_data_from_stage1()
contoso_sis.process_data_from_stage1()
m365.process_activity_data_from_stage1()

StatementMeta(spark3p1sm, 62, 16, Finished, Available)

Processing ms_insights roster data from: abfss://stage1np@stoeahybriddev4.dfs.core.windows.net/m365
Processing roster entity: path=abfss://stage1np@stoeahybriddev4.dfs.core.windows.net/m365/DIPData/Roster/Calendar.csv, entity=Calendar
Processing roster entity: path=abfss://stage1np@stoeahybriddev4.dfs.core.windows.net/m365/DIPData/Roster/Course.csv, entity=Course
Processing roster entity: path=abfss://stage1np@stoeahybriddev4.dfs.core.windows.net/m365/DIPData/Roster/Org.csv, entity=Org
Processing roster entity: path=abfss://stage1np@stoeahybriddev4.dfs.core.windows.net/m365/DIPData/Roster/Person.csv, entity=Person
Processing roster entity: path=abfss://stage1np@stoeahybriddev4.dfs.core.windows.net/m365/DIPData/Roster/PersonIdentifier.csv, entity=PersonIdentifier
Processing roster entity: path=abfss://stage1np@stoeahybriddev4.dfs.core.windows.net/m365/DIPData/Roster/RefDefinition.csv, entity=RefDefinition
Processing roster entity: path=abfss://stage1np@stoeahybriddev4.dfs.core.windows.n
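
The log lines above pair each roster file path with an entity name. As a rough illustration only (the OEA module's own parsing is not shown in this notebook), the sketch below splits such an `abfss://` path into container, storage host, and entity, assuming the entity is simply the file name without its extension. `describe_abfss_path` is a hypothetical helper and not part of OEA.

```python
from pathlib import PurePosixPath
from urllib.parse import urlsplit


def describe_abfss_path(url):
    """Split an abfss:// URL into container, storage host, and entity name.

    Assumption: the entity name is the file stem (e.g. Calendar.csv -> Calendar).
    """
    parts = urlsplit(url)
    return {
        "container": parts.username,      # text before the '@'
        "account_host": parts.hostname,   # storage account endpoint
        "entity": PurePosixPath(parts.path).stem,
    }


info = describe_abfss_path(
    "abfss://stage1np@stoeahybriddev4.dfs.core.windows.net/m365/DIPData/Roster/Calendar.csv"
)
print(info)
```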

In [17]:
# 3) Run additional prep on the data to create a unified dataset that can be used in a Power BI report

# Process sectionmark data. Convert id values to use the Person.Id and Section.Id values set in the m365 data.
sqlContext.registerDataFrameAsTable(spark.read.format('parquet').load(oea.stage2np + '/contoso_sis/studentsectionmark'), 'SectionMark')
sqlContext.registerDataFrameAsTable(spark.read.format('parquet').load(oea.stage2np + '/m365/Person'), 'Person')
sqlContext.registerDataFrameAsTable(spark.read.format('parquet').load(oea.stage2np + '/m365/Section'), 'Section')
df = spark.sql("select sm.id Id, p.Id PersonId, s.Id SectionId, cast(sm.numeric_grade_earned as int) NumericGrade, \
sm.alpha_grade_earned AlphaGrade, sm.is_final_grade IsFinalGrade, cast(sm.credits_attempted as int) CreditsAttempted, cast(sm.credits_earned as int) CreditsEarned, \
sm.grad_credit_type GraduationCreditType, sm.id ExternalId, CURRENT_TIMESTAMP CreateDate, CURRENT_TIMESTAMP LastModifiedDate, true IsActive \
from SectionMark sm, Person p, Section s \
where sm.student_id = p.ExternalId \
and sm.section_id = s.ExternalId")
#df.write.format('parquet').mode('overwrite').save(oea.stage2np + '/ContosoISD/SectionMark')

# Repeat the above process, this time for student attendance
# Convert id values to use the Person.Id, Org.Id and Section.Id values
sqlContext.registerDataFrameAsTable(spark.read.format('parquet').load(oea.stage2np + '/contoso_sis/studentattendance'), 'Attendance')
sqlContext.registerDataFrameAsTable(spark.read.format('parquet').load(oea.stage2np + '/m365/Org'), 'Org')
df = spark.sql("select att.id Id, p.Id PersonId, att.school_year SchoolYear, o.Id OrgId, att.attendance_date AttendanceDate, \
att.all_day AllDay, att.Period Period, s.Id SectionId, att.AttendanceCode AttendanceCode, att.PresenceFlag PresenceFlag, \
att.attendance_status AttendanceStatus, att.attendance_type AttendanceType, att.attendance_sequence AttendanceSequence \
from Attendance att, Org o, Person p, Section s \
where att.student_id = p.ExternalId \
and att.school_id = o.ExternalId \
and att.section_id = s.ExternalId")
#df.write.format('parquet').mode('overwrite').save(oea.stage2np +'/ContosoISD/Attendance')

# Add 'Department' column to Course (hardcoded to "Math" for this Contoso example)
sqlContext.registerDataFrameAsTable(spark.read.format('parquet').load(oea.stage2np + '/m365/Course'), 'Course')
df = spark.sql("select Id, Name, Code, Description, ExternalId, CreateDate, LastModifiedDate, IsActive, CalendarId, 'Math' Department from Course")
#df.write.format('parquet').mode('overwrite').save(oea.stage2np + '/ContosoISD/Course')

StatementMeta(spark3p1sm, 62, 17, Finished, Available)
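
The queries in the cell above replace the SIS identifiers (`student_id`, `section_id`) with the `Id` values from the m365 `Person` and `Section` tables by joining on `ExternalId`. The same idea can be sketched in plain Python with a few made-up rows. The sample data and the `remap_section_marks` helper are hypothetical, and the dictionary lookup assumes each `ExternalId` is unique (a SQL join would instead produce one row per match).

```python
# Hypothetical sample rows; column names mirror those used in the SQL above.
section_marks = [
    {"id": "sm1", "student_id": "stu-1", "section_id": "sec-A", "numeric_grade_earned": "91"},
    {"id": "sm2", "student_id": "stu-2", "section_id": "sec-A", "numeric_grade_earned": "78"},
    {"id": "sm3", "student_id": "stu-9", "section_id": "sec-A", "numeric_grade_earned": "85"},
]
persons = [
    {"Id": "p-100", "ExternalId": "stu-1"},
    {"Id": "p-200", "ExternalId": "stu-2"},
]
sections = [{"Id": "s-10", "ExternalId": "sec-A"}]


def remap_section_marks(marks, persons, sections):
    """Inner-join marks to persons and sections on ExternalId, keeping the m365 Ids."""
    person_id_by_external = {p["ExternalId"]: p["Id"] for p in persons}
    section_id_by_external = {s["ExternalId"]: s["Id"] for s in sections}
    result = []
    for sm in marks:
        person_id = person_id_by_external.get(sm["student_id"])
        section_id = section_id_by_external.get(sm["section_id"])
        if person_id is None or section_id is None:
            continue  # inner-join semantics: rows without a match are dropped
        result.append({
            "Id": sm["id"],
            "PersonId": person_id,
            "SectionId": section_id,
            "NumericGrade": int(sm["numeric_grade_earned"]),  # mirrors cast(... as int)
        })
    return result


rows = remap_section_marks(section_marks, persons, sections)
for row in rows:
    print(row)
```

The third sample mark has no matching person, so it is left out of the result, just as the `where sm.student_id = p.ExternalId` condition would exclude it.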

In [18]:
# 4) Create Spark databases that point to the data in the data lake, so that Power BI can connect through the serverless SQL endpoint.
contoso_sis.create_stage2_db('PARQUET')
m365.create_stage2_db('PARQUET')

#spark.sql('CREATE DATABASE IF NOT EXISTS s2_ContosoISD')
#spark.sql("create table if not exists s2_ContosoISD.Activity using PARQUET location '" + oea.stage2np + "/m365/TechActivity'")
#spark.sql("create table if not exists s2_ContosoISD.Calendar using PARQUET location '" + oea.stage2np + "/m365/Calendar'")
#spark.sql("create table if not exists s2_ContosoISD.Org using PARQUET location '" + oea.stage2np + "/m365/Org'")
#spark.sql("create table if not exists s2_ContosoISD.Person using PARQUET location '" + oea.stage2np + "/m365/Person'")
#spark.sql("create table if not exists s2_ContosoISD.PersonIdentifier using PARQUET location '" + oea.stage2np + "/m365/PersonIdentifier'")
#spark.sql("create table if not exists s2_ContosoISD.RefDefinition using PARQUET location '" + oea.stage2np + "/m365/RefDefinition'")
#spark.sql("create table if not exists s2_ContosoISD.Section using PARQUET location '" + oea.stage2np + "/m365/Section'")
#spark.sql("create table if not exists s2_ContosoISD.Session using PARQUET location '" + oea.stage2np + "/m365/Session'")
#spark.sql("create table if not exists s2_ContosoISD.StaffOrgAffiliation using PARQUET location '" + oea.stage2np + "/m365/StaffOrgAffiliation'")
#spark.sql("create table if not exists s2_ContosoISD.StaffSectionMembership using PARQUET location '" + oea.stage2np + "/m365/StaffSectionMembership'")
#spark.sql("create table if not exists s2_ContosoISD.StudentOrgAffiliation using PARQUET location '" + oea.stage2np + "/m365/StudentOrgAffiliation'")
#spark.sql("create table if not exists s2_ContosoISD.StudentSectionMembership using PARQUET location '" + oea.stage2np + "/m365/StudentSectionMembership'")
#spark.sql("create table if not exists s2_ContosoISD.Course using PARQUET location '" + oea.stage2np + "/ContosoISD/Course'")
#spark.sql("create table if not exists s2_ContosoISD.Attendance using PARQUET location '" + oea.stage2np + "/ContosoISD/Attendance'")
#spark.sql("create table if not exists s2_ContosoISD.SectionMark using PARQUET location '" + oea.stage2np + "/ContosoISD/SectionMark'")
#spark.sql("create table if not exists s2_ContosoISD.Students using PARQUET location '" + oea.stage2np + "/ContosoISD/Students'")

print(f"Created spark db's.\nYou can now open the 'techInequityDashboardContoso v2.pbix' dashboard and change the datasource to point to: {oea.serverless_sql_endpoint}")

StatementMeta(spark3p1sm, 62, 18, Finished, Available)

[OEA] Could not get list of folders in specified path: abfss://stage2p@stoeahybriddev4.dfs.core.windows.net/contoso_sis
This may be because the path does not exist.
Database created: s2_contoso_sis
Database created: s2_contoso_sis
[OEA] Could not get list of folders in specified path: abfss://stage2p@stoeahybriddev4.dfs.core.windows.net/m365
This may be because the path does not exist.
Database created: s2_m365
Database created: s2_m365
Created spark db's.
You can now open the 'techInequityDashboardContoso v2.pbix' dashboard and change the datasource to point to: syn-oea-hybriddev4-ondemand.sql.azuresynapse.net
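
The commented-out `create table` statements in the cell above all follow one pattern: a table name in `s2_ContosoISD` mapped to a folder under the stage2np container. As a sketch only, that pattern could be generated from a mapping instead of being written out by hand. `build_create_table_statements` is a hypothetical helper, and the base path below is a placeholder; in the notebook the base path would come from `oea.stage2np` and each statement would be passed to `spark.sql`.

```python
def build_create_table_statements(db_name, base_path, tables):
    """Return the SQL strings that create db_name and one table per mapping entry.

    tables maps a table name to its folder relative to base_path.
    """
    statements = [f"CREATE DATABASE IF NOT EXISTS {db_name}"]
    for table_name, relative_path in tables.items():
        statements.append(
            f"create table if not exists {db_name}.{table_name} "
            f"using PARQUET location '{base_path}/{relative_path}'"
        )
    return statements


# Placeholder base path (assumption); table names and folders are taken from the cell above.
base_path = "abfss://stage2np@example.dfs.core.windows.net"
tables = {
    "Activity": "m365/TechActivity",  # table name and folder name differ here
    "Person": "m365/Person",
    "Course": "ContosoISD/Course",
}

statements = build_create_table_statements("s2_ContosoISD", base_path, tables)
for statement in statements:
    print(statement)  # in the notebook: spark.sql(statement)
```

A mapping is used rather than a plain list of names because at least one table (`Activity`) reads from a folder with a different name (`TechActivity`).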

# Reset everything
To reset everything and walk through the process again from the top, uncomment line 11 (`#reset_all_processing()`) in the cell below and run the cell.

Note: remember to comment out line 11 again afterwards to prevent accidentally resetting the example.

In [19]:
def reset_all_processing():
    contoso_sis.delete_all_stages()
    m365.delete_all_stages()
    oea.rm_if_exists(oea.stage2np + '/ContosoISD')

    oea.drop_db('s2_contoso_sis')
    #oea.drop_db('s2_contosoisd')
    oea.drop_db('s2_m365')

# Uncomment the following line and run this cell to reset everything if you want to walk through the process again.
#reset_all_processing()

StatementMeta(spark3p1sm, 62, 19, Finished, Available)
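
The reset cell above relies on commenting and uncommenting a line to avoid accidental resets. One possible alternative, sketched here with hypothetical names, is to wrap the destructive call in a guard that only runs when an explicit flag is passed. `guarded_reset` and `fake_reset` are illustrative stand-ins; in the notebook, `reset_all_processing` would be passed in place of `fake_reset`.

```python
def guarded_reset(reset_fn, confirm=False):
    """Call reset_fn only when confirm is True; return whether it ran."""
    if not confirm:
        print("Reset skipped. Pass confirm=True to run it.")
        return False
    reset_fn()
    return True


# Stand-in for reset_all_processing so the sketch runs outside the notebook.
calls = []


def fake_reset():
    calls.append("reset")


skipped = guarded_reset(fake_reset)                # does nothing by default
executed = guarded_reset(fake_reset, confirm=True)  # runs the reset once
```

With this shape, running the cell unchanged is harmless, and the reset happens only when `confirm=True` is typed deliberately.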