# Test Data Generation: Canvas Tables Demonstration Notebook

**Affiliation**: *Kwantum Edu Analytics*. **Last Modified**: *5/18/2023*.

This OEA test data generation notebook shows how to use the OEA_py, canvas_roster_test_data_gen_py, and canvas_activity_test_data_gen_py Python classes to create Canvas test data tables in stage1.

Use `genCanvasRoster(startdate, enddate, reportgendate, use_general_module_base_truth)`, the main function of the canvas_roster_test_data_gen_py class notebook, to create test data for **7** roster tables.

Use `genCanvasActivity(startdate, enddate, reportgendate, canvas_roster_tables_source_path, max_num_activities_per_class)`, the main function of the canvas_activity_test_data_gen_py class notebook, to create test data for **9** activity tables.

Parameter descriptions, method details, and notes on the test data generation process are given in the class notebooks.

*These methods currently create only higher-education Canvas module test data; they can be adapted to generate K-12 test data.*

In [None]:
%run OEA_py

In [None]:
# set the workspace (this determines where in the data lake you'll be writing to and reading from).
# You can work in 'dev', 'prod', or a sandbox with any name you choose.
# For example, Sam the developer can create a 'sam' workspace and expect to find his datasets in the data lake under oea/sandboxes/sam
oea.set_workspace('dev')
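
The comment above says that a sandbox workspace named `sam` keeps its datasets under `oea/sandboxes/sam`. A minimal sketch of that naming rule, using hypothetical helpers `sandbox_root` and `sandbox_path` (not part of OEA_py). It covers only the sandbox case, because the lake layout for `dev` and `prod` is not described here, and it assumes the stage folders (e.g. `stage1/...`) sit directly under the sandbox root:

```python
from posixpath import join


def sandbox_root(workspace: str) -> str:
    """Hypothetical helper: data lake root for a sandbox workspace.

    Mirrors the convention in the comment above ('sam' -> oea/sandboxes/sam).
    'dev' and 'prod' are rejected because their layout is not described here.
    """
    if workspace in ('dev', 'prod'):
        raise ValueError(f"'{workspace}' is not a sandbox workspace")
    if not workspace or '/' in workspace:
        raise ValueError('sandbox name must be non-empty and contain no slashes')
    return join('oea', 'sandboxes', workspace)


def sandbox_path(workspace: str, relative_path: str) -> str:
    """Hypothetical helper: join a stage-relative path onto the sandbox root (assumed layout)."""
    return join(sandbox_root(workspace), relative_path)


print(sandbox_path('sam', 'stage1/Transactional/test_data/v0.1'))
# oea/sandboxes/sam/stage1/Transactional/test_data/v0.1
```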

## Generate Canvas Roster Test Data

The functions below create the 7 tables described in the canvas_roster_test_data_gen_py class notebook.

In [None]:
%run /canvas_roster_test_data_gen_py

In [None]:
rosterdatagen = CanvasRosterDataGen()

In [None]:
# depending on the sizes of the base tables, roster generation can take up to 5 minutes
# refer to the canvas_roster_test_data_gen_py class notebook for additional details

start_date = '2022-01-01T00:00:00' # roster start date
end_date = '2022-06-01T00:00:00' # roster end date
report_gen_date = '2022-02-02T00:00:00' # date the tables/reports were (fictitiously) generated
# True: import and use the general module base-truth tables (links this data with other OEA module test datasets)
# False: generate test data from user-generated base-truth tables
use_general_module_base_truth_tables = True

rosterdatagen.genCanvasRoster(start_date, end_date, report_gen_date, use_general_module_base_truth_tables)
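
Because a generation run can take several minutes, it can help to catch malformed or out-of-order dates first. A minimal sketch, assuming the generator takes ISO 8601 strings like the ones above and that `report_gen_date` should fall inside the roster window (the example values satisfy this, but it is an assumption, not a documented requirement). `check_generation_dates` is a hypothetical helper, not part of the generator classes:

```python
from datetime import datetime


def check_generation_dates(start_date: str, end_date: str, report_gen_date: str) -> None:
    """Hypothetical pre-flight check for the generator's date arguments.

    Assumes ISO 8601 strings and that the report date lies in the roster window.
    """
    # fromisoformat raises ValueError on a malformed date string
    start = datetime.fromisoformat(start_date)
    end = datetime.fromisoformat(end_date)
    report = datetime.fromisoformat(report_gen_date)
    if not start < end:
        raise ValueError(f'start_date {start_date} must be before end_date {end_date}')
    if not start <= report <= end:
        raise ValueError(f'report_gen_date {report_gen_date} is outside the roster window')


# the values used in the cell above pass the check
check_generation_dates('2022-01-01T00:00:00', '2022-06-01T00:00:00', '2022-02-02T00:00:00')
```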

In [None]:
import pandas as pd
# alternative source paths used during development:
#canvas_roster_tables_source_path = 'stage1/Transactional/test_data/v0.1/canvas_gen3'
#canvas_roster_tables_source_path = 'stage1/Transactional/test_data/v0.1/test'
canvas_roster_tables_source_path = 'stage1/Transactional/test_data/v0.1'

# NOTE: reading the accounts folder with pandas (line below) throws an AttributeError
#dfAccounts = pd.read_json(oea.to_url(f'{canvas_roster_tables_source_path}/accounts/*.json'), lines=True)

# NOTE: the line below runs, but returns only one record of the JSON.
# Possible cause: multiline=True parses each file as a single JSON document; if the files are
# JSON Lines (one record per line), multiline=False should return every record.
#dfAccounts = oea.load_json(f'{canvas_roster_tables_source_path}/accounts/*.json', multiline=True)

# load the single accounts file and preview the first 10 rows
dfAccounts = oea.load_json(f'{canvas_roster_tables_source_path}/accounts.json', multiline=True)
display(dfAccounts.limit(10))
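
Spark's `multiLine` JSON option parses each file as one JSON document, so a newline-delimited (JSON Lines) file yields only its first object; that is consistent with the one-record result noted in the cell above, assuming `oea.load_json` forwards its `multiline` argument to Spark (check OEA_py). The later `pd.read_json(..., lines=True)` cells suggest the generated files are JSON Lines. The stdlib sketch below mimics the two parsing modes on a made-up, in-memory sample:

```python
import json

# Made-up sample in JSON Lines format: one JSON object per line.
sample = (
    '{"id": 1, "name": "Account A"}\n'
    '{"id": 2, "name": "Account B"}\n'
    '{"id": 3, "name": "Account C"}\n'
)


def parse_as_single_document(text: str) -> list:
    """Mimic multiline=True: decode one JSON value and ignore the rest of the text."""
    value, _end = json.JSONDecoder().raw_decode(text)
    # a top-level array yields one record per element; an object yields one record
    return value if isinstance(value, list) else [value]


def parse_as_json_lines(text: str) -> list:
    """Mimic multiline=False: decode every non-empty line as its own record."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


print(len(parse_as_single_document(sample)))  # 1
print(len(parse_as_json_lines(sample)))       # 3
```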

In [None]:
# NOTE: the same AttributeError is thrown here
import pandas as pd
canvas_roster_tables_source_path = 'stage1/Transactional/test_data/v0.1/canvas_gen3'
#df = pd.read_json(f'{canvas_roster_tables_source_path}/accounts/*.json', orient='records', dtype={"A": str, "B": list})
df = pd.read_json(oea.to_url(f'{canvas_roster_tables_source_path}/accounts/*.json'), orient='records')
display(df)
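
The pandas documentation describes the `path_or_buf` argument of `pd.read_json` as a single path, URL, or file-like object, so whether a `*.json` pattern works depends on the filesystem layer underneath. One way to avoid relying on glob handling is to expand the pattern yourself and read each file. A minimal stdlib sketch, assuming the folder is reachable as a local or mounted filesystem path (not an `abfss://` URL) and holds JSON Lines files; `read_json_lines_folder` is a hypothetical helper, and this does not diagnose the AttributeError itself:

```python
import glob
import json
import os


def read_json_lines_folder(folder: str, pattern: str = '*.json') -> list:
    """Hypothetical helper: read every JSON Lines file in `folder` into one list of records.

    Assumes `folder` is a local or mounted filesystem path, not an abfss:// URL.
    """
    records = []
    # sort the matches so the record order is deterministic across runs
    for path in sorted(glob.glob(os.path.join(folder, pattern))):
        with open(path, encoding='utf-8') as f:
            records.extend(json.loads(line) for line in f if line.strip())
    return records
```

If pandas is available, `pd.DataFrame(records)` turns the combined list of dicts into a DataFrame.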

In [None]:
# print pandas and dependency versions (useful when debugging the read_json errors above)
pd.show_versions()

## Generate Canvas Activity Test Data

The functions below create the 9 tables described in the canvas_activity_test_data_gen_py class notebook.

**Note**: Generating Canvas activity test data requires that the Canvas roster test data tables have already been created.
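
Since the activity generator depends on existing roster tables, a quick existence check before calling it can save a failed run. A minimal sketch, assuming the roster tables are reachable on a local or mounted filesystem path and that each table is stored either as a folder (as with `users/*.json`) or as a single `<name>.json` file (as with `accounts.json`), both layouts being read in this notebook. `missing_roster_tables` is a hypothetical helper, and the caller supplies the table names; the full list of 7 is in the class notebook:

```python
import os


def missing_roster_tables(roster_dir: str, table_names: list) -> list:
    """Hypothetical check: return the names of roster tables not found under roster_dir.

    A table counts as present if roster_dir contains a folder named after it
    (e.g. users/) or a single '<name>.json' file (e.g. accounts.json).
    """
    missing = []
    for name in table_names:
        as_folder = os.path.isdir(os.path.join(roster_dir, name))
        as_file = os.path.isfile(os.path.join(roster_dir, f'{name}.json'))
        if not (as_folder or as_file):
            missing.append(name)
    return missing
```

An empty list from, for example, `missing_roster_tables(roster_dir, ['accounts', 'users'])` means both tables were found; `roster_dir` here is a placeholder for your own path.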

In [None]:
%run /canvas_activity_test_data_gen_py

In [None]:
activitydatagen = CanvasActivityDataGen()

In [None]:
start_date = '2022-01-01T00:00:00' # roster start date
end_date = '2022-06-01T00:00:00' # roster end date
report_gen_date = '2022-02-02T00:00:00' # date the tables/reports were (fictitiously) generated
canvas_roster_tables_source_path = 'stage1/Transactional/test_data/v0.1/canvas_gen2' # <- directory path of the Canvas roster tables
max_num_activities_per_class = 3 # <- maximum number of activities (assignments, lessons, etc.) to generate per class; students with activities are chosen at random

activitydatagen.genCanvasActivity(start_date, end_date, report_gen_date, canvas_roster_tables_source_path, max_num_activities_per_class)
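
The comment on `max_num_activities_per_class` says that students with activities are chosen at random. The class notebook holds the actual logic; the sketch below only illustrates one way such a selection could work, using hypothetical class and student IDs. For each class it draws between 1 and the maximum number of activities, then a random non-empty subset of enrolled students for each activity. `plan_activities` is a hypothetical helper, and the seeded `random.Random` keeps runs reproducible:

```python
import random


def plan_activities(enrollments: dict, max_num_activities_per_class: int, seed: int = 0) -> dict:
    """Hypothetical sketch: map each class ID to a list of (activity_index, student_ids).

    `enrollments` maps a class ID to the list of its enrolled student IDs.
    """
    if max_num_activities_per_class < 1:
        raise ValueError('max_num_activities_per_class must be at least 1')
    rng = random.Random(seed)
    plan = {}
    for class_id, students in enrollments.items():
        # randint bounds are inclusive: 1..max activities for this class
        num_activities = rng.randint(1, max_num_activities_per_class)
        activities = []
        for i in range(num_activities):
            # random non-empty subset of the class's students (empty if nobody is enrolled)
            k = rng.randint(1, len(students)) if students else 0
            activities.append((i, sorted(rng.sample(students, k))))
        plan[class_id] = activities
    return plan


plan = plan_activities({'class_1': ['s1', 's2', 's3'], 'class_2': ['s4', 's5']},
                       max_num_activities_per_class=3)
for class_id, activities in plan.items():
    print(class_id, activities)
```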

In [None]:
# preview the generated Canvas users table
canvas_roster_tables_source_path = 'stage1/Transactional/test_data/v0.1/canvas_gen2'
#canvas_roster_tables_source_path = 'stage1/Transactional/test_data/v0.1/test'
dfUsers = pd.read_json(oea.to_url(f'{canvas_roster_tables_source_path}/users/*.json'), lines=True)
display(dfUsers)

In [None]:
# preview the generated Canvas assignments table
canvas_activity_tables_source_path = 'stage1/Transactional/test_data/v0.1/canvas_activity_gen'
#canvas_activity_tables_source_path = 'stage1/Transactional/test_data/v0.1/test'
dfAssign = pd.read_json(oea.to_url(f'{canvas_activity_tables_source_path}/assignments/*.json'), lines=True)
display(dfAssign)

In [None]:
# convert to a Spark DataFrame and check that the generated assignment IDs are unique
dfAssign_spark = spark.createDataFrame(dfAssign)
num_assigns = dfAssign_spark.count()
print(f'number of assignments generated: {num_assigns}')
# dropDuplicates expects a list of column names
num_unique_assigns = dfAssign_spark.dropDuplicates(['id']).count()
print(f'number of assignments with unique IDs: {num_unique_assigns}')
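
The cell above compares the total number of assignment rows with the number of distinct `id` values. When the two counts differ, it helps to see which IDs repeat. A stdlib sketch of the same check on a plain list of records (the sample data is made up); in Spark, `groupBy('id').count()` followed by a filter on the resulting `count` column gives the equivalent list:

```python
from collections import Counter


def duplicate_ids(records: list, key: str = 'id') -> dict:
    """Return {id: occurrences} for every ID that appears more than once."""
    counts = Counter(record[key] for record in records)
    return {value: n for value, n in counts.items() if n > 1}


assignments = [{'id': 101}, {'id': 102}, {'id': 101}, {'id': 103}]
print(len(assignments), len({r['id'] for r in assignments}))  # 4 3
print(duplicate_ids(assignments))  # {101: 2}
```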

In [None]:
display(dfAssign_spark)