# Canvas Module - Ingest

This notebook demonstrates how the OEA_py class notebook can be used to speed up the ingestion of Canvas data.

This notebook ingests the Canvas module tables in the following steps:

- Set the workspace, which determines where the tables are read from and written to.
- Define and call one function:
  - ```ingest_canvas_dataset```: selects the primary key for each table and ingests each of the following Canvas tables:
    1. **accounts**
    2. **assignments**
    3. **content_tags**
    4. **context_modules**
    5. **courses**
    6. **course_sections**
    7. **enrollments**
    8. **enrollment_terms**
    9. **quiz_submissions**
    10. **quizzes**
    11. **roles**
    12. **submissions**
    13. **users**
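
The primary-key selection described above can be sketched in plain Python, independent of OEA and Spark. This is only an illustration: `plan_ingestion`, `PRIMARY_KEY_OVERRIDES`, `DEFAULT_PRIMARY_KEY`, and `SKIPPED_ITEMS` are hypothetical names that do not appear in OEA_py. The only details taken from this notebook are that `content_tags` is keyed on `content_id`, every other table is keyed on `id`, and `metadata.csv` is skipped.

```python
# Hypothetical names for illustration; none of these exist in OEA_py.
PRIMARY_KEY_OVERRIDES = {'content_tags': 'content_id'}  # tables not keyed on 'id'
DEFAULT_PRIMARY_KEY = 'id'
SKIPPED_ITEMS = {'metadata.csv'}  # items in the source folder that are not tables


def plan_ingestion(items, version='2.0'):
    """Return (table_path, primary_key) pairs for every item that should be ingested."""
    plan = []
    for item in items:
        if item in SKIPPED_ITEMS:
            continue  # not a table, so nothing to ingest
        primary_key = PRIMARY_KEY_OVERRIDES.get(item, DEFAULT_PRIMARY_KEY)
        plan.append((f'canvas/v{version}/{item}', primary_key))
    return plan


for table_path, primary_key in plan_ingestion(['accounts', 'content_tags', 'metadata.csv']):
    print(table_path, primary_key)
```

Keeping the exceptions in a dictionary means a table with a different primary key can be supported by adding one entry rather than another `elif` branch.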

In [None]:
workspace = 'dev'
version = '2.0'

In [None]:
%run OEA_py

In [None]:
# 1) set the workspace (this determines where in the data lake you'll be writing to and reading from).
# You can work in 'dev', 'prod', or a sandbox with any name you choose.
# For example, Sam the developer can create a 'sam' workspace and expect to find his datasets in the data lake under oea/sandboxes/sam
oea.set_workspace(workspace)

In [None]:
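# Imported explicitly in case OEA_py does not already expose it.
from pyspark.sql.utils import AnalysisException

# This function ingests each Canvas table found under stage1/Transactional/{tables_source}.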
def ingest_canvas_dataset(tables_source):
    items = oea.get_folders(f'stage1/Transactional/{tables_source}')
    for item in items: 
        table_path = f'canvas/v{version}/{item}'
        try:
            # metadata.csv is not a table; content_tags is keyed on content_id; every other table is keyed on id.
            if item == 'metadata.csv':
                logger.info('Ignoring metadata.csv - not a table to be ingested')
            elif item == 'content_tags':
                oea.ingest(table_path, 'content_id')
            else:
                oea.ingest(table_path, 'id')
        except AnalysisException as e:
            # The table could not be ingested, most likely because the primary key does not align with the expected columns.
            # Log the failure instead of silently skipping it, then continue with the next table.
            logger.warning(f'Could not ingest {table_path}: {e}')
    logger.info('Finished ingesting the most recent Canvas data')

In [None]:
# ingest the canvas dataset
ingest_canvas_dataset(f'canvas/v{version}')