# Moodle Module - Refine

This notebook demonstrates how the OEA_py class notebook can be used to speed up the process of refining (pseudonymizing) the Moodle data.

The steps below describe how this notebook refines tables that originally come from the Moodle data source:

- Set the workspace in which the tables are located.
- Two functions are used:
   1. **oea.refine**: the standard refine function from the OEA_py class notebook, used as-is.
   2. **refine_moodle_dataset**: a simple helper, defined in this notebook, that iterates through the Moodle tables currently contained in ```stage2/Ingested/moodle``` of the data lake and refines each one with **oea.refine**.
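The only per-table decision **refine_moodle_dataset** makes is which primary key to pass to **oea.refine**. That dispatch can be summarized as a small lookup. The plain-Python sketch below mirrors the notebook's logic, assuming the `moodle/v0.1/` refine path prefix used further down. The helper name `refine_args` is hypothetical and is not part of OEA_py, and `course` is just an example table name.

```python
from typing import Optional, Tuple

# Hypothetical constants mirroring refine_moodle_dataset: these tables are refined
# with 'id_pseudonym' as the primary key; every other table uses 'id'.
PSEUDONYM_KEYED_TABLES = {'assign', 'user'}
REFINE_PREFIX = 'moodle/v0.1/'


def refine_args(item: str) -> Optional[Tuple[str, str]]:
    """Return (refine path, primary key) for a folder name, or None if it should be skipped."""
    if item == 'metadata.csv':
        return None  # not a table, so there is nothing to refine
    primary_key = 'id_pseudonym' if item in PSEUDONYM_KEYED_TABLES else 'id'
    return REFINE_PREFIX + item, primary_key


if __name__ == '__main__':
    for name in ['assign', 'user', 'course', 'metadata.csv']:
        print(name, '->', refine_args(name))
```

Keeping the special cases in one set means adding another pseudonym-keyed table is a one-word change rather than a new `elif` branch.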

In [None]:
workspace = 'dev'

In [None]:
%run OEA_py

In [None]:
# 1) set the workspace (this determines where in the data lake you'll be writing to and reading from).
# You can work in 'dev', 'prod', or a sandbox with any name you choose.
# For example, Sam the developer can create a 'sam' workspace and expect to find his datasets in the data lake under oea/sandboxes/sam
oea.set_workspace(workspace)

In [None]:
# 2) this step refines the data through the use of metadata (this is where the pseudonymization of the data occurs).
from pyspark.sql.utils import AnalysisException

def refine_moodle_dataset(tables_source):
    """Refine every Moodle table found under tables_source, using the global `metadata` dict."""
    items = oea.get_folders(tables_source)
    for item in items:
        table_path = tables_source + '/' + item
        if item == 'metadata.csv':
            logger.info('Skipping metadata.csv, since it is not a table to be refined')
            continue
        # 'assign' and 'user' are refined with 'id_pseudonym' as the primary key; all other tables use 'id'.
        primary_key = 'id_pseudonym' if item in ('assign', 'user') else 'id'
        try:
            oea.refine('moodle/v0.1/' + item, metadata[item], primary_key)
        except AnalysisException as e:
            # The table may not have been refined, e.g. because the primary key does not
            # align with the columns expected in the lookup table. Log it instead of failing silently.
            logger.warning('Failed to refine table: ' + item + ' from: ' + table_path + ' (' + str(e) + ')')
        else:
            logger.info('Refined table: ' + item + ' from: ' + table_path)
    logger.info('Finished refining Moodle tables')
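The comment at the top of the function cell notes that pseudonymization happens during refinement, driven by the metadata. OEA_py's own implementation is not shown in this notebook, so the sketch below is only a rough, hypothetical illustration of the idea: columns whose (assumed) metadata operation is `hash` are replaced by a salted SHA-256 digest written to a `<column>_pseudonym` column, and all other columns pass through unchanged. The operation names, the salt, the `_pseudonym` naming and the helper functions are assumptions for illustration.

```python
import hashlib
from typing import Dict


def pseudonymize_value(value: str, salt: str) -> str:
    """Return the hex SHA-256 digest of the salted value (a one-way pseudonym)."""
    return hashlib.sha256((salt + value).encode('utf-8')).hexdigest()


def pseudonymize_row(row: Dict[str, str], ops: Dict[str, str], salt: str) -> Dict[str, str]:
    """Apply a per-column operation to one row.

    Columns marked 'hash' are replaced by '<column>_pseudonym'; columns with any
    other operation, or with no entry in `ops`, are copied unchanged.
    """
    out: Dict[str, str] = {}
    for column, value in row.items():
        if ops.get(column) == 'hash':
            out[column + '_pseudonym'] = pseudonymize_value(value, salt)
        else:
            out[column] = value
    return out


if __name__ == '__main__':
    row = {'id': '42', 'course': 'Math'}
    ops = {'id': 'hash', 'course': 'no-op'}
    print(pseudonymize_row(row, ops, salt='example-salt'))
```

Because the same salted input always yields the same digest, a pseudonymized key column can still be used to join tables, while the original identifier is not recoverable from the hash alone.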

In [None]:
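# 3) load the Moodle table metadata, then refine every ingested Moodle table.
# The commented-out URL points to the metadata.csv in the main OpenEduAnalytics repo; the active one points to the oea-moodle-module repo.
#metadata = oea.get_metadata_from_url('https://raw.githubusercontent.com/microsoft/OpenEduAnalytics/main/modules/module_catalog/Moodle/test_data/metadata.csv')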
metadata = oea.get_metadata_from_url('https://raw.githubusercontent.com/cstohlmann/oea-moodle-module/main/test_data/metadata.csv')
refine_moodle_dataset('stage2/Ingested/moodle/v0.1')
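`refine_moodle_dataset` indexes the loaded metadata by table name (`metadata[item]`). As a rough illustration of that shape only, the sketch below groups the rows of a metadata CSV into a per-table dict. The column layout (table name first, then attribute name, data type and pseudonymization operation, with no header row) is an assumption, not the documented format of the Moodle `metadata.csv`, and `group_metadata` is a hypothetical helper rather than an OEA_py function.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, List


def group_metadata(csv_text: str) -> Dict[str, List[List[str]]]:
    """Group metadata rows by the table name in their first column (assumed layout, no header)."""
    grouped: Dict[str, List[List[str]]] = defaultdict(list)
    for row in csv.reader(io.StringIO(csv_text)):
        if not row or not row[0].strip():
            continue  # skip blank lines and rows without a table name
        table, *attribute = row
        grouped[table.strip()].append([field.strip() for field in attribute])
    return dict(grouped)


sample = """assign,id,string,hash
assign,course,string,no-op
user,id,string,hash
"""
metadata_sketch = group_metadata(sample)
print(sorted(metadata_sketch))  # ['assign', 'user']
```

With this shape, `metadata_sketch['assign']` holds only the attribute rows for the `assign` table, which matches how the notebook passes `metadata[item]` to `oea.refine` one table at a time.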