# Test Data Generation: Moodle Tables Demonstration Notebook

**Affiliation**: *Kwantum Edu Analytics*. **Last Modified**: *3/21/2023*.

This OEA test data generation notebook shows how to use the OEA_py and moodle_test_data_gen_py Python classes to generate Moodle table test data in stage1.

Use `genMoodle(startdate, enddate, use_general_module_base_truth, gen_activity, num_activities)`, the main function defined in the moodle_test_data_gen_py class notebook, to create test data for the **26** SIS/activity tables.

Parameter descriptions, along with further details on the methods and the test data generation process, are given in the moodle_test_data_gen_py class notebook.
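
The date arguments are ISO 8601 strings such as `'2022-01-01T00:00:00'`. Because a generation run can take several minutes, it may be worth sanity-checking the arguments first. The sketch below uses a hypothetical helper, `check_gen_moodle_args`, which is not taken from moodle_test_data_gen_py; the rules it enforces (start before end, boolean flags, a non-negative integer activity count) are assumptions about sensible input, not documented requirements of `genMoodle`.

```python
from datetime import datetime


def check_gen_moodle_args(start_date: str, end_date: str,
                          use_general_module_base_truth: bool,
                          gen_activity: bool,
                          num_activities: int) -> None:
    """Hypothetical pre-flight check for genMoodle arguments (not part of the library)."""
    # fromisoformat parses strings such as '2022-01-01T00:00:00'.
    start = datetime.fromisoformat(start_date)
    end = datetime.fromisoformat(end_date)
    if start >= end:
        raise ValueError(f"start_date {start_date} must be earlier than end_date {end_date}")
    for name, flag in (("use_general_module_base_truth", use_general_module_base_truth),
                       ("gen_activity", gen_activity)):
        if not isinstance(flag, bool):
            raise TypeError(f"{name} must be True or False")
    # bool is a subclass of int, so exclude it explicitly.
    if isinstance(num_activities, bool) or not isinstance(num_activities, int) or num_activities < 0:
        raise ValueError("num_activities must be a non-negative integer")


# Same values as the generation cell in this notebook; passes silently.
check_gen_moodle_args('2022-01-01T00:00:00', '2022-06-01T00:00:00', True, True, 5)
```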

```python
%run OEA_py
```

```
2023-03-21 18:40:25,807 - OEA - INFO - Now using workspace: dev
2023-03-21 18:40:25,808 - OEA - INFO - OEA initialized.
```


```python
# Set the workspace (this determines where in the data lake you'll be writing to and reading from).
# You can work in 'dev', 'prod', or a sandbox with any name you choose.
# For example, Sam the developer can create a 'sam' workspace and expect to find his datasets in the data lake under oea/sandboxes/sam.
oea.set_workspace('dev')
```

```
2023-03-21 18:40:26,745 - OEA - INFO - Now using workspace: dev
```


```python
%run /moodle_test_data_gen_py
```

```python
# Initializing the tables to be generated may take some time.
datagen = MoodleDataGen()
```

```python
# Depending on the sizes of the base tables, Moodle generation can take up to 5 minutes
# (approx. 2 minutes when using the general module base tables).
# Refer to the moodle_test_data_gen_py class notebook for additional details.

start_date = '2022-01-01T00:00:00'  # roster start date
end_date = '2022-06-01T00:00:00'    # roster end date

# True: import and use the general module base-truth tables (to link with other OEA module test datasets).
# False: generate test data from user-generated base-truth tables.
use_general_module_base_truth_tables = True

# Whether to generate Moodle activity data (NOTE: may need further work to accurately mirror Moodle activity data).
generate_activity_data = True

# Number of rows of student activity data to generate (NOTE: students/classes are chosen at random).
number_of_activity_signals = 5

datagen.genMoodle(start_date, end_date, use_general_module_base_truth_tables,
                  generate_activity_data, number_of_activity_signals)

# NOTE: attempt numbers do not necessarily respect the maximum number of attempts allowed; this may need debugging.
# This affects the following generated Moodle tables: assign_submission, forum_discussions, lesson_attempts and quiz_attempts.
```

```
2023-03-21 19:04:36,303 - OEA - INFO - General module base-truth tables already exist - delete the "base_general_modules" folder/directory if you want to replace these.
2023-03-21 19:04:38,752 - OEA - INFO - Generating Moodle test data based on general module base-truth tables...
2023-03-21 19:05:44,729 - OEA - INFO - Successfully generated Moodle SIS/rostering tables.
2023-03-21 19:05:44,729 - OEA - INFO - Now generating Moodle activity tables...
2023-03-21 19:07:00,284 - OEA - INFO - Successfully generated Moodle activity tables (for assignments, quizzes, forums, lessons and messages).
2023-03-21 19:07:00,284 - OEA - INFO - Finished Moodle generation.
```
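
The generation cell notes that attempt numbers may exceed the maximum attempts allowed in assign_submission, forum_discussions, lesson_attempts and quiz_attempts. Once each generated attempt has been paired with its activity's limit, such rows could be surfaced with a simple filter. A minimal sketch, assuming a hypothetical record layout with `attempt` and `max_attempts` keys, and assuming that a limit of 0 or less means "unlimited" (check the Moodle schema for each activity type before relying on that).

```python
from typing import Dict, Iterable, List


def rows_exceeding_max_attempts(rows: Iterable[Dict[str, int]]) -> List[Dict[str, int]]:
    """Return rows whose attempt number exceeds the allowed maximum.

    Assumes hypothetical 'attempt' and 'max_attempts' keys, and treats a
    max_attempts value of 0 or less as "unlimited".
    """
    flagged = []
    for row in rows:
        limit = row["max_attempts"]
        if limit > 0 and row["attempt"] > limit:
            flagged.append(row)
    return flagged


# Toy rows for illustration only (not generated output).
sample = [
    {"id": 1, "attempt": 1, "max_attempts": 3},
    {"id": 2, "attempt": 4, "max_attempts": 3},  # exceeds the limit
    {"id": 3, "attempt": 7, "max_attempts": 0},  # 0 treated as unlimited
]
print([row["id"] for row in rows_exceeding_max_attempts(sample)])  # [2]
```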


```python
# Preview the generated Moodle enrol table.
dfEnrol = oea.load_csv('stage1/Transactional/test_data/v0.1/moodle_gen/enrol/snapshot_batch_data/*/*.csv', header=True)
display(dfEnrol.limit(10))
```
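
Whether the first CSV row is read as column names or as data depends on the `header` flag. The course table loaded later in this notebook with `header=True` comes back with named columns (`id`, `category`, `fullname`, ...), which suggests the generated CSVs include a header row. The stdlib `csv` sketch below is only an analogy for the Spark reader's behavior, using a made-up CSV.

```python
import csv
import io

# A made-up two-column CSV that starts with a header row.
text = "id,fullname\nc1,Color Theory 247\nc2,Intro to Blacksmithing 230\n"

# header=False analogue: every line, including the header, becomes a data row.
rows_no_header = list(csv.reader(io.StringIO(text)))
print(len(rows_no_header))  # 3; the first "row" is ['id', 'fullname']

# header=True analogue: the first line supplies the column names.
rows_with_header = list(csv.DictReader(io.StringIO(text)))
print(len(rows_with_header))            # 2
print(rows_with_header[0]["fullname"])  # Color Theory 247
```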

## Tests

```python
import random

# Test the generated context table.
dfContext = spark.createDataFrame(datagen.moodle_context)

# Pick a random course from the generated course table (randint is inclusive on both ends).
random_course = random.randint(0, len(datagen.moodle_course.index) - 1)
course = datagen.moodle_course['id'].iloc[random_course]

# Enrollment rows for that course (section).
dfBT_enroll = spark.createDataFrame(datagen.enrollment)
dfEnroll = dfBT_enroll.filter(dfBT_enroll['SectionID'] == str(course))
num_students_in_course = dfEnroll.count()

# Find the course name.
course_name = dfEnroll.groupBy('SectionName').count().collect()[0][0]

# Find the school that owns the section (6th column of the sections table).
dfBT_sections = spark.createDataFrame(datagen.sections)
school_id = dfBT_sections.filter(dfBT_sections['SectionID'] == str(course)).collect()[0][5]

# Find the matching context row (school instance, context level 2, path ending in the course name) and its ID.
dfContext = dfContext.filter(dfContext['instanceid'] == str(school_id)).filter(dfContext['contextlevel'] == 2)
dfContext = dfContext.filter(dfContext['path'].endswith(course_name))
context_id = dfContext.collect()[0][0]
print(course_name)
print(context_id)
display(dfContext)
```

```
Intro to Blacksmithing 230
220aa96a-b657-4ba7-bc71-65ce1cfa5865
```
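
The context test chains four lookups: course ID to enrollment rows, enrollment rows to the section name, section to school ID, and finally the context row whose `instanceid` is that school, whose `contextlevel` is 2 and whose `path` ends with the course name. The same chain in plain Python over toy records makes the join logic explicit. The records (including the path format) are invented, `SchoolID` is a hypothetical name for the sections column the cell reads by position (`[5]`), and `id` is a hypothetical name for the first context column.

```python
from typing import Dict, List, Optional


def find_course_context_id(course_id: str,
                           enrollment: List[Dict],
                           sections: List[Dict],
                           context: List[Dict]) -> str:
    """Plain-Python mirror of the notebook's context lookup (toy sketch)."""
    # 1. Enrollment rows for the chosen course (section).
    enrolled = [r for r in enrollment if r["SectionID"] == course_id]
    if not enrolled:
        raise LookupError(f"no enrollment rows for {course_id}")
    # 2. Course name, taken from the first enrollment row.
    course_name = enrolled[0]["SectionName"]
    # 3. School that owns the section ('SchoolID' is a hypothetical key).
    school_id: Optional[str] = next(
        (s["SchoolID"] for s in sections if s["SectionID"] == course_id), None)
    if school_id is None:
        raise LookupError(f"no section row for {course_id}")
    # 4. Context row for that school at level 2 whose path ends with the course name.
    matches = [c for c in context
               if c["instanceid"] == school_id
               and c["contextlevel"] == 2
               and c["path"].endswith(course_name)]
    if not matches:
        raise LookupError(f"no context row for {course_name}")
    # 'id' is a hypothetical name for the first context column.
    return matches[0]["id"]


# Invented toy records for illustration only.
enrollment = [
    {"SectionID": "sec-1", "SectionName": "Intro to Blacksmithing 230"},
    {"SectionID": "sec-1", "SectionName": "Intro to Blacksmithing 230"},
]
sections = [{"SectionID": "sec-1", "SchoolID": "school-9"}]
context = [
    {"id": "ctx-a", "instanceid": "school-9", "contextlevel": 2,
     "path": "/school-9/Intro to Blacksmithing 230"},
    {"id": "ctx-b", "instanceid": "school-9", "contextlevel": 3,
     "path": "/school-9/Intro to Blacksmithing 230"},
]
print(find_course_context_id("sec-1", enrollment, sections, context))  # ctx-a
```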

```python
# Test the generated forum table.
gen_forum = datagen.moodle_forum
print(gen_forum)
```

```
Empty DataFrame
Columns: [id, course, type, name, intro, introformat, duedate, cutoffdate, assessed, assesstimestart, assesstimefinish, scale, grade_forum, grade_forum_notify, maxbytes, maxattachments, forcesubscribe, trackingtype, rsstype, rssarticles, timemodified, warnafter, blockafter, blockperiod, completiondiscussions, completionreplies, completionposts, displaywordcount, lockdiscussionafter]
Index: []

[0 rows x 29 columns]
```
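
The forum table above printed all 29 of its columns but 0 rows. A test could flag empty tables automatically rather than relying on reading printed output. A minimal sketch with a hypothetical helper, `find_empty_tables`; it relies only on `len()`, which works on plain lists and on pandas DataFrames (where it counts rows).

```python
from collections.abc import Sized
from typing import Dict, List


def find_empty_tables(tables: Dict[str, Sized]) -> List[str]:
    """Return, in sorted order, the names of tables that contain no rows."""
    # len() works on lists and on pandas DataFrames (where it counts rows).
    return sorted(name for name, rows in tables.items() if len(rows) == 0)


# Toy stand-ins for generated tables.
print(find_empty_tables({"course": [1, 2], "forum": [], "context": [1]}))  # ['forum']
```

In the notebook, the dictionary could hold `datagen.moodle_context`, `datagen.moodle_course` and `datagen.moodle_forum`, which appear to be pandas DataFrames.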


```python
# Test enrollment: load the base-truth enrollment table and the generated Moodle enrol table.
dfTest_BTT = oea.load_csv('stage1/Transactional/test_data/v0.1/base_enrollment/snapshot_batch_data/*/*.csv', header=True)
dfTest_moodleGen_enrol = oea.load_csv('stage1/Transactional/test_data/v0.1/moodle_gen/enrol/snapshot_batch_data/*/*.csv', header=True)
```
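
The enrollment cell loads the base-truth enrollment table and the generated Moodle enrol table but stops short of comparing them. One possible check finds keys present in one table and missing from the other. A minimal sketch using plain sets; which columns form a matching key in each table is not stated here, so the `(student, section)` tuples below are hypothetical. With Spark DataFrames, the corresponding columns would first need to be selected and collected.

```python
from typing import Hashable, Iterable, Set, Tuple


def compare_keys(base: Iterable[Hashable],
                 generated: Iterable[Hashable]) -> Tuple[Set, Set]:
    """Return (missing_from_generated, unexpected_in_generated)."""
    base_set, gen_set = set(base), set(generated)
    return base_set - gen_set, gen_set - base_set


# Hypothetical (student, section) keys, for illustration only.
base_keys = [("s1", "sec-1"), ("s2", "sec-1"), ("s3", "sec-2")]
gen_keys = [("s1", "sec-1"), ("s3", "sec-2"), ("s4", "sec-2")]
missing, unexpected = compare_keys(base_keys, gen_keys)
print(sorted(missing))     # [('s2', 'sec-1')]
print(sorted(unexpected))  # [('s4', 'sec-2')]
```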

```python
import random

# Pick a random course ID from the generated course table (randint is inclusive on both ends).
moodle_course = oea.load_csv('stage1/Transactional/test_data/v0.1/moodle_gen/course/snapshot_batch_data/*/*.csv', header=True).toPandas()
random_course = random.randint(0, len(moodle_course.index) - 1)
course = moodle_course['id'].iloc[random_course]
print(course)
```

```
299679cf-f1eb-4bcc-abf4-4386f71f4a2b
```
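
`random.randint(a, b)` includes both endpoints, which is why these cells subtract 1 from the row count before drawing an index. `random.choice` picks directly from a sequence and avoids that bookkeeping. The course IDs below are placeholders; in the notebook the list could come from `moodle_course['id'].tolist()`.

```python
import random

course_ids = ["course-a", "course-b", "course-c"]  # placeholder IDs

# Index-based draw, as in the notebook: randint is inclusive on both ends.
idx = random.randint(0, len(course_ids) - 1)
course_by_index = course_ids[idx]

# Equivalent draw without the off-by-one bookkeeping.
course_by_choice = random.choice(course_ids)

print(course_by_index in course_ids, course_by_choice in course_ids)  # True True
```

Calling `random.seed(...)` first would make the chosen course reproducible between test runs.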


```python
# Show the full row for the selected course.
course_row = moodle_course.iloc[[random_course]]
print(course_row)
```

```
                                      id  \
14  299679cf-f1eb-4bcc-abf4-4386f71f4a2b   

                                category sortorder          fullname  \
14  39fda5e1-549d-497e-a608-68501c61e94a         0  Color Theory 247   

       shortname idnumber summary summaryformat  format showgrades  ... theme  \
14  Color Theory     None    null             0  topics          1  ...  None   

            timecreated timemodified requested enablecompletion  \
14  2022-01-01T00:00:00            0         0                0   

   completionnotify cacherev                      originalcourseid  \
14                0        0  39fda5e1-549d-497e-a608-68501c61e94a   

   showactivitydates showcompletionconditions  
14                 1                        0  

[1 rows x 36 columns]
```
