# Insights Module - Ingest

This notebook demonstrates how the OEA_py class notebook speeds up the process of ingesting the Microsoft Education Insights data.

This notebook ingests the Microsoft Education Insights module tables in the following steps:

- Set the workspace where the tables are located.
- Define and run one function:
   1. **ingest_insights_dataset**: identifies the primary key for each table and ingests each table from Insights (except PersonRelationship and RefTranslation, which currently have no test data).
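
The primary-key selection described above can be sketched as a dry run that only plans the ingest calls. This is a minimal sketch, not part of OEA_py: `plan_ingest` and the three constants are hypothetical names, the key columns and skipped items are copied from the function defined later in this notebook, and `SomeOtherTable` is a placeholder folder name used only to show the default case.

```python
# Hypothetical dry-run helper; it does not call OEA_py or touch the data lake.
SKIPPED_ITEMS = {'metadata.csv', 'PersonRelationship', 'RefTranslation'}
PRIMARY_KEY_OVERRIDES = {'activity': '_c3', 'AadGroupMembership': '_c5'}
DEFAULT_PRIMARY_KEY = '_c0'

def plan_ingest(items, version):
    """Return a list of (entity_path, primary_key) pairs, one per table to ingest."""
    plan = []
    for item in items:
        if item in SKIPPED_ITEMS:
            continue  # not a table, or no test data available
        entity_path = 'M365/v' + version + '/' + item
        primary_key = PRIMARY_KEY_OVERRIDES.get(item, DEFAULT_PRIMARY_KEY)
        plan.append((entity_path, primary_key))
    return plan

# 'SomeOtherTable' is a placeholder name, not a real Insights table.
for entity_path, primary_key in plan_ingest(['activity', 'SomeOtherTable', 'RefTranslation'], '1.14'):
    print(entity_path, primary_key)
```

Keeping the exceptions in a dictionary means a new table with a non-default key column would need one new entry rather than another `elif` branch.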

In [None]:
workspace = 'dev'
version = '1.14'

In [None]:
%run OEA_py

In [None]:
# 1) set the workspace (this determines where in the data lake you'll be writing to and reading from).
# You can work in 'dev', 'prod', or a sandbox with any name you choose.
# For example, Sam the developer can create a 'sam' workspace and expect to find his datasets in the data lake under oea/sandboxes/sam
oea.set_workspace(workspace)

In [None]:
items = oea.get_folders('stage1/Transactional/M365/v' + version)

In [None]:
print(items)

In [None]:
# 2) this step ingests each Insights table from stage1, using the primary key column that applies to that table.
from pyspark.sql.utils import AnalysisException  # may already be in scope via OEA_py; re-importing is harmless

def ingest_insights_dataset(tables_source):
    items = oea.get_folders(tables_source)
    # The source files have no header row, so Spark names the columns _c0, _c1, ...
    options = {'header': False}
    for item in items:
        table_path = tables_source + '/' + item
        entity_path = 'M365/v' + version + '/' + item
        try:
            if item == 'metadata.csv':
                logger.info('ignore metadata processing, since this is not a table to be ingested')
            elif item == 'activity':
                oea.ingest(entity_path, '_c3', options)
            elif item == 'AadGroupMembership':
                oea.ingest(entity_path, '_c5', options)
            elif item in ('PersonRelationship', 'RefTranslation'):
                logger.info('No test data')
            else:
                oea.ingest(entity_path, '_c0', options)
        except AnalysisException as e:
            # The table may not have been processed properly, e.g. because the primary key does not align with the expected columns.
            logger.warning('Could not ingest ' + table_path + ': ' + str(e))

In [None]:
metadata = oea.get_metadata_from_url('https://raw.githubusercontent.com/microsoft/OpenEduAnalytics/main/modules/module_catalog/Microsoft_Education_Insights/test_data/metadata.csv')
ingest_insights_dataset('stage1/Transactional/M365/v' + version)
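
`ingest_insights_dataset` handles errors one table at a time. As a rough sketch of how failures could instead be collected into a summary after the loop, the standalone example below uses hypothetical names throughout: `run_all` and `fake_ingest` are not part of OEA_py, `fake_ingest` stands in for `oea.ingest`, and `BadTable` is a made-up folder name.

```python
def run_all(items, action):
    """Apply action to each item; return (succeeded, failed), where failed maps item -> error message."""
    succeeded, failed = [], {}
    for item in items:
        try:
            action(item)
            succeeded.append(item)
        except Exception as e:  # in the notebook this would be the narrower AnalysisException
            failed[item] = str(e)
    return succeeded, failed

def fake_ingest(item):
    # Stand-in for oea.ingest; fails for one made-up table name to show the failure path.
    if item == 'BadTable':
        raise ValueError('primary key column not found')

succeeded, failed = run_all(['activity', 'BadTable'], fake_ingest)
print('ingested:', succeeded)
print('failed:', failed)
```

Returning the failures rather than discarding them makes it possible to see, after a full run, which tables still need attention.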