# Graph API Module Example 
This example demonstrates how to use the Graph API module to process incoming Microsoft Graph data, perform data prep, and view the data in an example Power BI dashboard.
## Preface
a.) Before running this notebook, change the storage account URLs in every cell to match your own storage account.

b.) Above each of the data processing cells in each step, there is "file viewer" code commented out. You can uncomment this for debugging purposes, or to see the contents within each file.
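
Because the same storage account name appears in every cell, one way to reduce the editing described in (a) is to build the URLs from a single variable. The sketch below is only an illustration: `STORAGE_ACCOUNT` and `lake_path` are hypothetical names that do not exist in this notebook, and the cells below still use hard-coded URLs.

```python
# Hypothetical helper: the notebook itself hard-codes each URL.
STORAGE_ACCOUNT = "stoeahybriddev2"  # replace with your own storage account name


def lake_path(container: str, relative_path: str, account: str = STORAGE_ACCOUNT) -> str:
    """Build an abfss:// URL for a folder in the given data lake container."""
    return f"abfss://{container}@{account}.dfs.core.windows.net/{relative_path.lstrip('/')}"


# Example: the source and destination folders used in step 1.
users_stage1 = lake_path("stage1np", "GraphAPI/users")
users_stage2 = lake_path("stage2np", "GraphAPI/users")
print(users_stage1)
print(users_stage2)
```

With a helper like this, switching to another storage account would mean changing one line instead of every cell.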
## Running the example (LAST STEP NEEDS TO BE EDITED)
1.) Select your Spark pool in the "Attach to" dropdown list above.

2.) Click on "Publish" in the top nav bar (and wait a few seconds for the notification that says "Publishing completed").

3.) Click on "Run all" at the top of this tab (and wait for the processing to complete - which can take around 5 to 10 minutes).

4.) Open the dashboard in Power BI Desktop and point it to your newly set up data lake (you can download the pbix file here: [techInequityDashboardContoso v2.pbix](https://github.com/microsoft/OpenEduAnalytics/blob/main/packages/ContosoISD/power_bi/techInequityDashboardContoso%20v2.pbix)).

## 1.) Processing "users" raw data from Graph API - stage1np to stage2np
The "users" data in stage1np of the data lake is cleaned, written as Parquet, and landed in stage2np.

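The cleaning in this step consists mainly of flattening: each raw JSON document holds its records in a `value` array, and the processing cell turns every array element into its own row. The plain-Python sketch below mimics that reshaping without Spark so the effect is easy to see. The sample records and the `explode_value` helper are made up for illustration and are not part of the notebook.

```python
# Made-up sample shaped like the schema used in this step: each JSON document
# holds its user records in an array under the "value" key.
raw_documents = [
    {"value": [
        {"surname": "Doe", "givenName": "Jane",
         "userPrincipalName": "jane.doe@contoso.example", "id": "1"},
        {"surname": "Roe", "givenName": "Rick",
         "userPrincipalName": "rick.roe@contoso.example", "id": "2"},
    ]},
    {"value": [
        {"surname": "Poe", "givenName": "Pat",
         "userPrincipalName": "pat.poe@contoso.example", "id": "3"},
    ]},
]


def explode_value(documents):
    """Return one flat row per element of each document's "value" array."""
    # Documents with a missing, null, or empty "value" contribute no rows.
    return [dict(record) for doc in documents for record in doc.get("value") or []]


rows = explode_value(raw_documents)
for row in rows:
    print(row)
```

Three user records spread over two documents become three flat rows, which is the shape written to stage2np.
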

In [None]:
# View "users" JSON
#%%pyspark
#df = spark.read.load('abfss://stage1np@stoeahybriddev2.dfs.core.windows.net/GraphAPI/users', format='json')
#display(df.limit(10))

In [3]:
%%pyspark
from pyspark.sql.types import StructType, StructField, StringType, IntegerType, DoubleType, ArrayType
from pyspark.sql.functions import explode

user_schema = StructType(fields=[
    StructField('value', ArrayType(
        StructType([
            StructField('surname', StringType(), False),
            StructField('givenName', StringType(), False),
            StructField('userPrincipalName', StringType(), False),
            StructField('id', StringType(), False)
        ])
    ))
])

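# Read the raw "users" JSON from stage1np using the explicit schema.
df = spark.read.load('abfss://stage1np@stoeahybriddev2.dfs.core.windows.net/GraphAPI/users', format='json', schema=user_schema)
# Flatten the "value" array so that each user becomes one row.
df = df.select(explode('value').alias('exploded_values')).select("exploded_values.*")
display(df.limit(10))
# Write the flattened data to stage2np as Parquet, replacing any previous output.
df.write.format('parquet').mode('overwrite').save('abfss://stage2np@stoeahybriddev2.dfs.core.windows.net/GraphAPI/users')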

# Create spark db graphapi to allow for access to the data in the data lake via SQL on-demand, and create the table "users".
spark.sql('CREATE DATABASE IF NOT EXISTS GRAPHAPI')
spark.sql("create table if not exists GraphAPI.users using PARQUET location 'abfss://stage2np@stoeahybriddev2.dfs.core.windows.net/GraphAPI/users'")

## 2.) Processing "Microsoft 365 app user detail" raw data from Graph API - stage1np to stage2np (NEEDS EDITING)
The "m365_app_user_detail" data in stage1np of the data lake is cleaned, written as Parquet, and landed in stage2np.

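In the schema used by this step, each record carries a nested `details` array, which is still an array column after the `value` array has been flattened. If one flat row per user and report period is wanted, a second flattening pass is one option (in PySpark, a further `explode` on `details`). The plain-Python sketch below shows the idea only: the sample values and the `flatten_details` helper are made up, and nothing here is part of the notebook's processing.

```python
# Made-up record shaped like one element of "value" in this step's schema.
# Only a few of the per-app fields are shown to keep the sample short.
record = {
    "reportRefreshDate": "2021-09-20",
    "userPrincipalName": "jane.doe@contoso.example",
    "lastActivityDate": "2021-09-19",
    "details": [
        {"reportPeriod": "7", "excel": "true", "teams": "false"},
        {"reportPeriod": "30", "excel": "true", "teams": "true"},
    ],
}


def flatten_details(user_row):
    """Return one row per "details" entry, repeating the parent fields."""
    parent = {key: value for key, value in user_row.items() if key != "details"}
    return [{**parent, **detail} for detail in user_row.get("details") or []]


flat_rows = flatten_details(record)
for row in flat_rows:
    print(row)
```

Each output row repeats the user-level fields and adds the fields of one `details` entry, so a user with two report periods yields two rows.
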
In [4]:
# View "m365_app_user_detail" JSON
#%%pyspark
#df = spark.read.load('abfss://stage1np@stoeahybriddev2.dfs.core.windows.net/GraphAPI/m365_app_user_detail', format='json')
#display(df.limit(10))

In [15]:
%%pyspark
from pyspark.sql.types import StructType, StructField, StringType, IntegerType, DoubleType, ArrayType
from pyspark.sql.functions import explode

m365_app_user_detail_schema = StructType(fields=[
    StructField('value', ArrayType(
        StructType([
            StructField('reportRefreshDate', StringType(), False),
            StructField('userPrincipalName', StringType(), False),
            StructField('lastActivityDate', StringType(), False),
            StructField('details', ArrayType(
                StructType([
                    StructField('reportPeriod', StringType(), False),
                    StructField('excel', StringType(), False),
                    StructField('excelWeb', StringType(), False),
                    StructField('outlook', StringType(), False),
                    StructField('outlookWeb', StringType(), False),
                    StructField('powerPoint', StringType(), False),
                    StructField('powerPointWeb', StringType(), False),
                    StructField('teams', StringType(), False),
                    StructField('teamsWeb', StringType(), False),
                    StructField('word', StringType(), False),
                    StructField('wordWeb', StringType(), False),
                ])
            ), False),
        ])
    ))
])

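# Read the raw "m365_app_user_detail" JSON from stage1np using the explicit schema.
df = spark.read.load('abfss://stage1np@stoeahybriddev2.dfs.core.windows.net/GraphAPI/m365_app_user_detail', format='json', schema=m365_app_user_detail_schema)
# Flatten the "value" array so that each record becomes one row; the nested "details" array stays an array column.
df = df.select(explode('value').alias('exploded_values')).select("exploded_values.*")
display(df.limit(10))
# Write the flattened data to stage2np as Parquet, replacing any previous output.
df.write.format('parquet').mode('overwrite').save('abfss://stage2np@stoeahybriddev2.dfs.core.windows.net/GraphAPI/m365_app_user_detail')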

# Create table "m365_app_user_detail" in spark db "graphapi" to allow for access to the data in the data lake.
spark.sql("create table if not exists GraphAPI.m365_app_user_detail using PARQUET location 'abfss://stage2np@stoeahybriddev2.dfs.core.windows.net/GraphAPI/m365_app_user_detail'")

## 3.) Processing "Teams activity user details" raw data from Graph API - stage1np to stage2np
The "teams_activity_user_details" data in stage1np of the data lake is cleaned, written as Parquet, and landed in stage2np.

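When debugging this step, it can help to check a few raw records against the fields the schema expects before running the Spark cell, since a declared schema may not by itself reject incomplete records at read time. The sketch below is a plain-Python check under that assumption. The field names and types are copied from `teams_activity_user_details_schema`; the `find_problems` helper and the sample records are made up for illustration.

```python
# Field names and Python types mirroring teams_activity_user_details_schema.
EXPECTED_FIELDS = {
    "reportRefreshDate": str,
    "reportPeriod": str,
    "userPrincipalName": str,
    "privateChatMessageCount": int,
    "teamChatMessageCount": int,
    "meetingsAttendedCount": int,
    "meetingCount": int,
    "audioDuration": str,
}


def find_problems(record):
    """Return a list of human-readable problems found in one raw record."""
    problems = []
    for name, expected_type in EXPECTED_FIELDS.items():
        value = record.get(name)
        if value is None:
            problems.append(f"{name}: missing")
        # bool is a subclass of int in Python, so reject it explicitly.
        elif isinstance(value, bool) or not isinstance(value, expected_type):
            problems.append(
                f"{name}: expected {expected_type.__name__}, got {type(value).__name__}"
            )
    return problems


# Made-up records: the second has a count stored as text and lacks meetingCount.
good_record = {
    "reportRefreshDate": "2021-09-20", "reportPeriod": "7",
    "userPrincipalName": "jane.doe@contoso.example",
    "privateChatMessageCount": 12, "teamChatMessageCount": 3,
    "meetingsAttendedCount": 4, "meetingCount": 1, "audioDuration": "sample",
}
bad_record = {
    "reportRefreshDate": "2021-09-20", "reportPeriod": "7",
    "userPrincipalName": "rick.roe@contoso.example",
    "privateChatMessageCount": "12", "teamChatMessageCount": 3,
    "meetingsAttendedCount": 4, "audioDuration": "sample",
}

print(find_problems(good_record))
print(find_problems(bad_record))
```

A record with no problems returns an empty list, so the helper can be used to filter a small sample of the raw JSON down to the entries worth inspecting.
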
In [1]:
# View "teams_activity_user_details" JSON
#%%pyspark
#df = spark.read.load('abfss://stage1np@stoeahybriddev2.dfs.core.windows.net/GraphAPI/teams_activity_user_details', format='json')
#display(df.limit(10))

In [7]:
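%%pyspark
from pyspark.sql.types import StructType, StructField, StringType, IntegerType, ArrayType
from pyspark.sql.functions import explode

teams_activity_user_details_schema = StructType(fields=[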
    StructField('value', ArrayType(
        StructType([
            StructField('reportRefreshDate', StringType(), False),
            StructField('reportPeriod', StringType(), False),
            StructField('userPrincipalName', StringType(), False),
            StructField('privateChatMessageCount', IntegerType(), False),
            StructField('teamChatMessageCount', IntegerType(), False),
            StructField('meetingsAttendedCount', IntegerType(), False),
            StructField('meetingCount', IntegerType(), False),
            StructField('audioDuration', StringType(), False),
        ])
    ))
])

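# Read the raw "teams_activity_user_details" JSON from stage1np using the explicit schema.
df = spark.read.load('abfss://stage1np@stoeahybriddev2.dfs.core.windows.net/GraphAPI/teams_activity_user_details', format='json', schema=teams_activity_user_details_schema)
# Flatten the "value" array so that each record becomes one row.
df = df.select(explode('value').alias('exploded_values')).select("exploded_values.*")
display(df.limit(10))
# Write the flattened data to stage2np as Parquet, replacing any previous output.
df.write.format('parquet').mode('overwrite').save('abfss://stage2np@stoeahybriddev2.dfs.core.windows.net/GraphAPI/teams_activity_user_details')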

# Create table "teams_activity_user_details" in spark db "graphapi" to allow for access to the data in the data lake.
spark.sql("create table if not exists GraphAPI.teams_activity_user_details using PARQUET location 'abfss://stage2np@stoeahybriddev2.dfs.core.windows.net/GraphAPI/teams_activity_user_details'")

## Reset everything
To reset everything so you can walk through the whole process again, including the pipeline integration trigger, uncomment line 8 in the last cell below and run that cell.

Note: Remember to comment out line 8 again afterward to prevent accidentally resetting the example.

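As an alternative to uncommenting and re-commenting a line, the reset could be guarded by an explicit flag so that running the cell by accident does nothing. The sketch below is an assumption-laden illustration, not the notebook's implementation: `FakeOEA` is a hypothetical stand-in that only records calls, the `confirm` parameter is invented here, and only the `rm_if_exists`, `drop_db`, `stage1np`, and `stage2np` names come from the reset cell below.

```python
class FakeOEA:
    """Hypothetical stand-in that records calls instead of touching the data lake."""

    stage1np = "abfss://stage1np@example.dfs.core.windows.net"
    stage2np = "abfss://stage2np@example.dfs.core.windows.net"

    def __init__(self):
        self.calls = []

    def rm_if_exists(self, path):
        self.calls.append(("rm_if_exists", path))

    def drop_db(self, name):
        self.calls.append(("drop_db", name))


def reset_all_processing(oea, confirm=False):
    """Delete the GraphAPI folders and drop the database only when confirm is True."""
    if not confirm:
        print("Reset skipped: pass confirm=True to delete the GraphAPI data.")
        return False
    oea.rm_if_exists(oea.stage1np + '/GraphAPI')
    oea.rm_if_exists(oea.stage2np + '/GraphAPI')
    oea.drop_db('graphapi')
    return True


demo = FakeOEA()
reset_all_processing(demo)                # does nothing without confirmation
reset_all_processing(demo, confirm=True)  # performs the three reset calls
print(demo.calls)
```

With this pattern the cell can stay uncommented under "Run all", because the destructive calls only run when `confirm=True` is passed deliberately.
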
In [10]:
%run /OEA_py_updated

In [11]:
oea = OEA()

2021-09-22 22:17:20,706 - OEA - DEBUG - OEA initialized.
OEA initialized.

In [14]:
def reset_all_processing():
    oea.rm_if_exists(oea.stage1np + '/GraphAPI')
    oea.rm_if_exists(oea.stage2np + '/GraphAPI')

    oea.drop_db('graphapi')

# Uncomment the following line and run this cell to reset everything if you want to walk through the process again, INCLUDING THE PIPELINE INTEGRATION TRIGGER.
#reset_all_processing()
