# Test Data Generation: Moodle Tables Class

**Affiliation**: *Kwantum Edu Analytics*. **Last Modified**: *3/21/2023*.

This OEA test data generation class notebook generates fictitious Moodle tables, as seen in the Moodle module. It is required to successfully run the moodle_test_data_gen_demo notebook.

For reference on all Moodle tables outlined below, see the Moodle table schemas here: https://www.examulator.com/er/4.0/

This class notebook primarily relies on the OEA_py class notebook, the ```Faker``` and ```random``` Python packages, and already-generated base-truth tables to generate **26** Moodle module tables (8 SIS/rostering tables and 18 activity tables). The full list is given below; **course_sections** is listed but not currently generated:

 1. **assign**
 2. **assign_grades**
 3. **assign_submission**
 4. **assignsubmission_file**
 5. **assign_user_mapping**
 6. **context**
 7. **course**
 8. **course_categories**
 9. **course_sections** (not currently generated)
 10. **enrol**
 11. **forum**
 12. **forum_discussions**
 13. **forum_grades**
 14. **lesson**
 15. **lesson_answers**
 16. **lesson_attempts**
 17. **lesson_grades**
 18. **messages**
 19. **message_conversations**
 20. **message_conversation_members**
 21. **quiz**
 22. **quiz_attempts**
 23. **quiz_grades**
 24. **role**
 25. **role_assignments**
 26. **user** 
 27. **user_enrolments**
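As a quick cross-check of the counts above, the generated tables can be grouped in code. This is a minimal sketch: the constant and function names are hypothetical (not part of this notebook), and the per-activity grouping is inferred from the table names and the `genAssign_tables`, `genQuiz_tables`, `genForum_tables`, `genLesson_tables` and `genMessage_tables` calls in `genMoodle`.

```python
# Hypothetical inventory of the Moodle tables this notebook writes out.
SIS_ROSTERING_TABLES = [
    "context", "course", "course_categories", "enrol",
    "role", "role_assignments", "user", "user_enrolments",
]

# Activity tables, grouped by the activity type that produces them (assumed grouping).
ACTIVITY_TABLES = {
    "assign": ["assign", "assign_grades", "assign_submission",
               "assignsubmission_file", "assign_user_mapping"],
    "quiz": ["quiz", "quiz_attempts", "quiz_grades"],
    "forum": ["forum", "forum_discussions", "forum_grades"],
    "lesson": ["lesson", "lesson_answers", "lesson_attempts", "lesson_grades"],
    "message": ["messages", "message_conversations", "message_conversation_members"],
}

# Listed in the schema list but not currently generated.
NOT_GENERATED = ["course_sections"]


def generated_tables():
    """Return a flat list of every table name that is generated."""
    activity = [name for group in ACTIVITY_TABLES.values() for name in group]
    return SIS_ROSTERING_TABLES + activity
```

Here `len(generated_tables())` is 26 (8 SIS/rostering plus 18 activity tables), and adding `len(NOT_GENERATED)` gives the 27 entries listed above, which makes it easy to spot drift if tables are added later.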

The main method, ```genMoodle(startdate, enddate, use_general_module_base_truth, gen_activity, num_activities)```, generates the tables described above. Its parameters are:
  - *startdate*: roster start date.
  - *enddate*: roster end date.
  - *use_general_module_base_truth*: boolean argument indicating whether to use the general-module base-truth tables (i.e., base-truth tables that link students, courses, etc. across OEA modules)
    * If ```True``` - lands the general-module base-truth tables if they don't already exist, and generates Moodle test data based on these tables.
    * If ```False``` - uses the default, user-generated base-truth tables to generate Moodle test data.
  - *gen_activity*: boolean argument indicating whether to generate activity data.
  - *num_activities*: number of instances of each course-level activity to generate. Specifically, it sets the:
    * Total \# of Assignments (1 assignment per randomly chosen course).
    * Total \# of Quizzes (1 quiz per randomly chosen course).
    * Total \# of Forums (1 forum per randomly chosen course).
    * Total \# of Lessons (1 lesson per randomly chosen course).
    * Total \# of Message conversations (1 conversation per randomly chosen course).
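The *num_activities* behaviour described above can be pictured as sampling courses at random for each activity type. The sketch below is an illustration under stated assumptions, not the notebook's implementation: `plan_activities` and `ACTIVITY_TYPES` are hypothetical names, courses are assumed to be sampled without replacement (so no course gets two activities of the same type), and each activity type is assumed to draw its own independent sample. The course ids would correspond to the base-truth `SectionID` values, since `genCourse` uses those as Moodle course ids.

```python
import random

# Hypothetical activity types, one per genX_tables call in genMoodle.
ACTIVITY_TYPES = ["assign", "quiz", "forum", "lesson", "message_conversation"]


def plan_activities(course_ids, num_activities, seed=None):
    """Pick `num_activities` distinct courses per activity type (assumed behaviour).

    Returns a dict mapping each activity type to the list of course ids that
    each receive one instance of that activity.
    """
    if num_activities > len(course_ids):
        raise ValueError("num_activities cannot exceed the number of courses")
    rng = random.Random(seed)  # seeded generator so test data is reproducible
    return {
        activity: rng.sample(list(course_ids), num_activities)
        for activity in ACTIVITY_TYPES
    }
```

With the default `num_activities=5`, this yields 5 instances of each of the 5 activity types, i.e. 25 activity instances in total. Passing a fixed `seed` makes repeated runs produce the same course selection.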

In [1]:
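import logging
import random, decimal
from faker import Faker
import pandas as pd
import datetime as dt
import numpy as np
import requests # used by genMoodle() to download the general-module base-truth CSVs
from pyspark.sql import functions as F
# NOTE: `oea`, `logger` and `spark` are not defined in this notebook; they are expected to come from
# the OEA_py class notebook and the active Spark session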

class MoodleDataGen():
    def __init__(self, startdate='2022-01-03T00:00:00', enddate='2022-06-03T00:00:00'):
        self.startdate = startdate
        self.enddate = enddate
        
        self.faker = Faker('en_US')

        # set current datetime for rundate folder for writing out files
        currentDate = dt.datetime.now()
        self.currentDateTime = currentDate.strftime("%Y-%m-%d %H-%M-%S")

        # initialize dfs for each Moodle table to be generated
        assign = {
            'id':[],
            'course':[],
            'name':[],
            'intro':[],
            'introformat':[],
            'alwaysshowdescription':[],
            'nosubmissions':[],
            'submissiondrafts':[],
            'sendnotifications':[],
            'sendlatenotifications':[],
            'duedate':[],
            'allowsubmissionsfromdate':[],
            'grade':[],
            'timemodified':[],
            'requiresubmissionstatement':[],
            'completionsubmit':[],
            'cutoffdate':[],
            'gradingduedate':[],
            'teamsubmission':[],
            'requireallteammemberssubmit':[],
            'teamsubmissiongroupingid':[],
            'blindmarking':[],
            'hidegrader':[],
            'revealidentities':[],
            'attemptreopenmethod':[],
            'maxattempts':[],
            'markingworkflow':[],
            'markingallocation':[],
            'sendstudentnotifications':[],
            'preventsubmissionnotingroup':[],
            'activity':[],
            'activityformat':[],
            'timelimit':[],
            'submissionattachments':[]
        }
        self.moodle_assign = pd.DataFrame(assign, dtype=object)
        assign_grades = {
            'id':[],
            'assignment':[],
            'userid':[],
            'timecreated':[],
            'timemodified':[],
            'grader':[],
            'grade':[],
            'attemptnumber':[]
        }
        self.moodle_assign_grades = pd.DataFrame(assign_grades, dtype=object)
        assign_submission = {
            'id':[],
            'assignment':[],
            'userid':[],
            'timecreated':[],
            'timemodified':[],
            'timestarted':[],
            'status':[],
            'groupid':[],
            'attemptnumber':[],
            'latest':[]
        }
        self.moodle_assign_submission = pd.DataFrame(assign_submission, dtype=object)
        assignsubmission_file = {
            'id':[],
            'assignment':[],
            'submission':[],
            'numfiles':[]
        }
        self.moodle_assignsubmission_file = pd.DataFrame(assignsubmission_file, dtype=object)
        assign_user_mapping = {
            'id':[],
            'assignment':[],
            'userid':[]
        }
        self.moodle_assign_user_mapping = pd.DataFrame(assign_user_mapping, dtype=object)
        context = {
            'id':[],
            'contextlevel':[],
            'instanceid':[],
            'path':[],
            'depth':[],
            'locked':[]
        }
        self.moodle_context = pd.DataFrame(context, dtype=object)
        course = {
            'id':[],
            'category':[],
            'sortorder':[],
            'fullname':[],
            'shortname':[],
            'idnumber':[],
            'summary':[],
            'summaryformat':[],
            'format':[],
            'showgrades':[],
            'newsitems':[],
            'startdate':[],
            'enddate':[],
            'relativedatesmode':[],
            'marker':[],
            'maxbytes':[],
            'legacyfiles':[],
            'showreports':[],
            'visible':[],
            'visibleold':[],
            'downloadcontent':[],
            'groupmode':[],
            'groupmodeforce':[],
            'defaultgroupingid':[],
            'lang':[],
            'calendartype':[],
            'theme':[],
            'timecreated':[],
            'timemodified':[],
            'requested':[],
            'enablecompletion':[],
            'completionnotify':[],
            'cacherev':[],
            'originalcourseid':[],
            'showactivitydates':[],
            'showcompletionconditions':[]
        }
        self.moodle_course = pd.DataFrame(course, dtype=object)
        course_categories = {
            'id':[],
            'name':[],
            'idnumber':[],
            'description':[],
            'descriptionformat':[],
            'parent':[],
            'sortorder':[],
            'coursecount':[],
            'visible':[],
            'visibleold':[],
            'timemodified':[],
            'depth':[],
            'path':[],
            'theme':[]
        }
        self.moodle_course_categories = pd.DataFrame(course_categories, dtype=object)
        course_sections = {
            'id':[],
            'course':[],
            'section':[],
            'name':[],
            'summary':[],
            'summaryformat':[],
            'sequence':[],
            'visible':[],
            'availability':[],
            'timemodified':[]
        }
        self.moodle_course_sections = pd.DataFrame(course_sections, dtype=object)
        enrol = {
            'id':[],
            'enrol':[],
            'status':[],
            'courseid':[],
            'sortorder':[],
            'name':[],
            'enrolperiod':[],
            'enrolstartdate':[],
            'enrolenddate':[],
            'expirynotify':[],
            'expirythreshold':[],
            'notifyall':[],
            'password':[],
            'cost':[],
            'currency':[],
            'roleid':[],
            'customint1':[],
            'customint2':[],
            'customint3':[],
            'customint4':[],
            'customint5':[],
            'customint6':[],
            'customint7':[],
            'customint8':[],
            'customchar1':[],
            'customchar2':[],
            'customchar3':[],
            'customdec1':[],
            'customdec2':[],
            'customtext1':[],
            'customtext2':[],
            'customtext3':[],
            'customtext4':[],
            'timecreated':[],
            'timemodified':[]
        }
        self.moodle_enrol = pd.DataFrame(enrol, dtype=object)
        forum = {
            'id':[],
            'course':[],
            'type':[],
            'name':[],
            'intro':[],
            'introformat':[],
            'duedate':[],
            'cutoffdate':[],
            'assessed':[],
            'assesstimestart':[],
            'assesstimefinish':[],
            'scale':[],
            'grade_forum':[],
            'grade_forum_notify':[],
            'maxbytes':[],
            'maxattachments':[],
            'forcesubscribe':[],
            'trackingtype':[],
            'rsstype':[],
            'rssarticles':[],
            'timemodified':[],
            'warnafter':[],
            'blockafter':[],
            'blockperiod':[],
            'completiondiscussions':[],
            'completionreplies':[],
            'completionposts':[],
            'displaywordcount':[],
            'lockdiscussionafter':[]
        }
        self.moodle_forum = pd.DataFrame(forum, dtype=object)
        forum_discussions = {
            'id':[],
            'course':[],
            'forum':[],
            'name':[],
            'firstpost':[],
            'userid':[],
            'groupid':[],
            'assessed':[],
            'timemodified':[],
            'usermodified':[],
            'timestart':[],
            'timeend':[],
            'pinned':[],
            'timelocked':[]
        }
        self.moodle_forum_discussions = pd.DataFrame(forum_discussions, dtype=object)
        forum_grades = {
            'id':[],
            'forum':[],
            'itemnumber':[],
            'userid':[],
            'grade':[],
            'timecreated':[],
            'timemodified':[]
        }
        self.moodle_forum_grades = pd.DataFrame(forum_grades, dtype=object)
        lesson = {
            'id':[],
            'course':[],
            'name':[],
            'intro':[],
            'introformat':[],
            'practice':[],
            'modattempts':[],
            'usepassword':[],
            'password':[],
            'dependency':[],
            'conditions':[],
            'grade':[],
            'custom':[],
            'ongoing':[],
            'usemaxgrade':[],
            'maxanswers':[],
            'maxattempts':[],
            'review':[],
            'nextpagedefault':[],
            'feedback':[],
            'minquestions':[],
            'maxpages':[],
            'timelimit':[],
            'retake':[],
            'activitylink':[],
            'mediafile':[],
            'mediaheight':[],
            'mediawidth':[],
            'mediaclose':[],
            'slideshow':[],
            'width':[],
            'height':[],
            'bgcolor':[],
            'displayleft':[],
            'displayleftif':[],
            'progressbar':[],
            'available':[],
            'deadline':[],
            'timemodified':[],
            'completionendreached':[],
            'completiontimespent':[],
            'allowofflineattempts':[]
        }
        self.moodle_lesson = pd.DataFrame(lesson, dtype=object)
        lesson_answers = {
            'id':[],
            'lessonid':[],
            'pageid':[],
            'jumpto':[],
            'grade':[],
            'score':[],
            'flags':[],
            'timecreated':[],
            'timemodified':[],
            'answer':[],
            'answerformat':[],
            'response':[],
            'responseformat':[]
        }
        self.moodle_lesson_answers = pd.DataFrame(lesson_answers, dtype=object)
        lesson_attempts = {
            'id':[],
            'lessonid':[],
            'pageid':[],
            'userid':[],
            'answerid':[],
            'retry':[],
            'correct':[],
            'useranswer':[],
            'timeseen':[]
        }
        self.moodle_lesson_attempts = pd.DataFrame(lesson_attempts, dtype=object)
        lesson_grades = {
            'id':[],
            'lessonid':[],
            'userid':[],
            'grade':[],
            'late':[],
            'completed':[]
        }
        self.moodle_lesson_grades = pd.DataFrame(lesson_grades, dtype=object)
        messages = {
            'id':[],
            'useridfrom':[],
            'conversationid':[],
            'subject':[],
            'fullmessage':[],
            'fullmessageformat':[],
            'fullmessagehtml':[],
            'smallmessage':[],
            'timecreated':[],
            'fullmessagetrust':[],
            'customdata':[]
        }
        self.moodle_messages = pd.DataFrame(messages, dtype=object)
        message_conversations = {
            'id':[],
            'type':[],
            'name':[],
            'convhash':[],
            'component':[],
            'itemtype':[],
            'itemid':[],
            'contextid':[],
            'enabled':[],
            'timecreated':[],
            'timemodified':[]
        }
        self.moodle_message_conversations = pd.DataFrame(message_conversations, dtype=object)
        message_conversation_members = {
            'id':[],
            'conversationid':[],
            'userid':[],
            'timecreated':[]
        }
        self.moodle_message_conversation_members = pd.DataFrame(message_conversation_members, dtype=object)
        quiz = {
            'id':[],
            'course':[],
            'name':[],
            'intro':[],
            'introformat':[],
            'timeopen':[],
            'timeclose':[],
            'timelimit':[],
            'overduehandling':[],
            'graceperiod':[],
            'preferredbehavior':[],
            'canredoquestions':[],
            'attempts':[],
            'attemptonlast':[],
            'grademethod':[],
            'decimalpoints':[],
            'questiondecimalpoints':[],
            'reviewattempt':[],
            'reviewcorrectness':[],
            'reviewremarks':[],
            'reviewspecificfeedback':[],
            'reviewgeneralfeedback':[],
            'reviewrightanswer':[],
            'reviewoverallfeedback':[],
            'questionsperpage':[],
            'navmethod':[],
            'shuffleanswers':[],
            'sumgrades':[],
            'grade':[],
            'timecreated':[],
            'timemodified':[],
            'password':[],
            'subnet':[],
            'browsersecurity':[],
            'delay1':[],
            'delay2':[],
            'showuserpicture':[],
            'showblocks':[],
            'completionattemptsexhausted':[],
            'completionminattempts':[],
            'allowofflineattempts':[]
        }
        self.moodle_quiz = pd.DataFrame(quiz, dtype=object)
        quiz_attempts = {
            'id':[],
            'quiz':[],
            'userid':[],
            'attempt':[],
            'uniqueid':[],
            'layout':[],
            'currentpage':[],
            'preview':[],
            'state':[],
            'timestart':[],
            'timefinish':[],
            'timemodified':[],
            'timemodifiedoffline':[],
            'timecheckstate':[],
            'sumgrades':[],
            'gradednotificationsenttime':[]
        }
        self.moodle_quiz_attempts = pd.DataFrame(quiz_attempts, dtype=object)
        quiz_grades = {
            'id':[],
            'quiz':[],
            'userid':[],
            'grade':[],
            'timemodified':[]
        }
        self.moodle_quiz_grades = pd.DataFrame(quiz_grades, dtype=object)
        role = {
            'id':[],
            'name':[],
            'shortname':[],
            'description':[],
            'sortorder':[],
            'archetype':[]
        }
        self.moodle_role = pd.DataFrame(role, dtype=object)
        role_assignments = {
            'id':[],
            'roleid':[],
            'contextid':[],
            'userid':[],
            'timemodified':[],
            'modifierid':[],
            'component':[],
            'itemid':[],
            'sortorder':[]
        }
        self.moodle_role_assignments = pd.DataFrame(role_assignments, dtype=object)
        user = {
            'id':[],
            'auth':[],
            'confirmed':[],
            'policyagreed':[],
            'deleted':[],
            'suspended':[],
            'mnethostid':[],
            'username':[],
            'password':[],
            'idnumber':[],
            'firstname':[],
            'lastname':[],
            'email':[],
            'emailstop':[],
            'phone1':[],
            'phone2':[],
            'institution':[],
            'department':[],
            'address':[],
            'city':[],
            'country':[],
            'lang':[],
            'calendartype':[],
            'theme':[],
            'timezone':[],
            'firstaccess':[],
            'lastaccess':[],
            'lastlogin':[],
            'currentlogin':[],
            'lastip':[],
            'secret':[],
            'picture':[],
            'description':[],
            'descriptionformat':[],
            'mailformat':[],
            'maildigest':[],
            'maildisplay':[],
            'autosubscribe':[],
            'trackforums':[],
            'timecreated':[],
            'timemodified':[],
            'trustbitmask':[],
            'imagealt':[],
            'lastnamephonetic':[],
            'firstnamephonetic':[],
            'middlename':[],
            'alternatename':[],
            'moodlenetprofile':[]
        }
        self.moodle_user = pd.DataFrame(user, dtype=object)
        user_enrolments = {
            'id':[],
            'status':[],
            'enrolid':[],
            'userid':[],
            'timestart':[],
            'timeend':[],
            'modifierid':[],
            'timecreated':[],
            'timemodified':[]
        }
        self.moodle_user_enrolments = pd.DataFrame(user_enrolments, dtype=object)

        self.studentroleid = self.faker.uuid4()
        self.modifieruserid = self.faker.uuid4() # NOTE: simulate single IT dept admin. modifying the Moodle data

    def genMoodle(self,startdate='2022-01-01T00:00:00',enddate='2022-06-01T00:00:00',use_general_module_base_truth=False,gen_activity=True,num_activities=5):
        self.startdate = startdate
        self.enddate = enddate
        self.use_general_module_base_truth = use_general_module_base_truth
        if use_general_module_base_truth:
            sourcepath = 'stage1/Transactional/test_data/v0.1/base_general_modules'
            if oea.path_exists(sourcepath):
                logger.info('General module base-truth tables already exist - delete the "base_general_modules" folder/directory if you want to replace these.')
            else:
                # manually delete and replace the general module base_truth_tables CSVs as needed
                logger.info('General module base-truth tables do not currently exist - landing in stage1/.../test_data/v0.1/base_general_modules/')
                base_url = 'https://raw.githubusercontent.com/microsoft/OpenEduAnalytics/main/modules/module_test_data_generation_kit/test_data/base_truth_tables'
                for table in ['students', 'schools', 'courses', 'sections', 'enrollment']:
                    # download each base-truth CSV and land it in its own base_<table> folder
                    data = requests.get(f'{base_url}/{table}.csv').text
                    oea.land(data, f'test_data/v0.1/base_general_modules/base_{table}', f'general_module_base_truth_{table}.csv', oea.SNAPSHOT_BATCH_DATA)
            self.students = oea.load_csv(sourcepath + '/base_students/', header=True).toPandas()
            self.schools = oea.load_csv(sourcepath + '/base_schools/', header=True).toPandas()
            self.courses = oea.load_csv(sourcepath + '/base_courses/', header=True).toPandas()
            self.sections = oea.load_csv(sourcepath + '/base_sections/', header=True).toPandas()
            self.enrollment = oea.load_csv(sourcepath + '/base_enrollment/', header=True).toPandas()
            logger.info('Generating Moodle test data based on general module base-truth tables...')
        else:
            # expectation is that base_truth_tables exist
            sourcepath = 'stage1/Transactional/test_data/v0.1/'
            self.students = oea.load_csv(sourcepath + 'base_students/', header=True).toPandas()
            self.schools = oea.load_csv(sourcepath + 'base_schools/', header=True).toPandas()
            self.courses = oea.load_csv(sourcepath + 'base_courses/', header=True).toPandas()
            self.sections = oea.load_csv(sourcepath + 'base_sections/', header=True).toPandas()
            self.enrollment = oea.load_csv(sourcepath + 'base_enrollment/', header=True).toPandas()
            logger.info('Generating Moodle test data based on user-generated base-truth tables...')
        # generate Moodle test data tables, based on base-truth tables
        self.genUser()
        self.genCourse()
        self.genCourseCategories()
        self.genRole()
        self.genContext()
        self.genRoleAssignments()
        self.genEnrol()
        self.genUserEnrolments()
        logger.info('Successfully generated Moodle SIS/rostering tables.')
        if gen_activity:
            logger.info('Now generating Moodle activity tables...')
            self.genAssign_tables(num_assigns=num_activities)
            self.genQuiz_tables(num_quizzes=num_activities)
            self.genForum_tables(num_forums=num_activities)
            self.genLesson_tables(num_lessons=num_activities)
            self.genMessage_tables(num_convos=num_activities)
            logger.info('Successfully generated Moodle activity tables (for assignments, quizzes, forums, lessons and messages).')
        logger.info('Finished Moodle generation.')

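    def __get_daterange(self):
        # NOTE: returns every day in a fixed window (2022-01-03 up to, but not including, 2022-01-28),
        # independent of self.startdate/self.enddate
        daterange = []
        startdate = dt.datetime(2022,1,3)
        enddate = dt.datetime(2022,1,28)
        while startdate < enddate:
            daterange.append(startdate)
            startdate = startdate + dt.timedelta(days=1)
        return daterange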
    
    def genUser(self):
        # set base date for "lastaccess" field
        base_lastaccess = dt.datetime(2022,1,28)
        for index, student in self.students.iterrows():
            id = student['StudentID']
            auth = ''
            confirmed = 1
            policyagreed = 1
            deleted = 0
            suspended = 0
            mnethostid = 0 
            firstname = student['FirstName']
            lastname = student['LastName']
            username = f'{firstname}{lastname}'
            password = self.faker.uuid4()
            idnumber = '' # NOTE: unsure; likely should use this field for base-truth StudentID
            email = student['Email']
            emailstop = 0
            phone1 = student['Phone']
            phone2 = ''
            institution = student['SchoolName']
            department = student['Grade'] # NOTE: temp. using student grade
            if self.use_general_module_base_truth:
                address = '' # blank 
            else:
                address = student['Address']
            city = student['City']
            country = 'US'
            lang = 'en'
            calendartype = 'gregorian'
            theme = ''
            timezone = 'PST'
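            firstaccess = pd.Timestamp(self.startdate).to_pydatetime() + dt.timedelta(days=random.randint(0,5),hours=random.randint(0,23),minutes=random.randint(0,59)) # NOTE: self.startdate may be an ISO string, so it is parsed before adding a timedelta; Moodle stores this field as a BIGINT Unix timestamp (seconds), but a datetime is used here for now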
            lastaccess = base_lastaccess + dt.timedelta(days=random.randint(0,5),hours=random.randint(0,23),minutes=random.randint(0,59)) # ^
            lastlogin = lastaccess - dt.timedelta(hours=random.randint(0,20),minutes=random.randint(0,59)) # ^
            currentlogin = 0 # ^
            lastip = ''
            secret = ''
            picture = 0
            description = f'Student at {institution}'
            descriptionformat = 1
            mailformat = 1
            maildigest = 0
            maildisplay = 2
            autosubscribe = 1
            trackforums = 1
            timecreated = self.startdate
            timemodified = 0
            trustbitmask = 0
            imagealt = 'null'
            lastnamephonetic = 'null'
            firstnamephonetic = 'null'
            middlename = student['MiddleName']
            alternatename = 'null'
            moodlenetprofile = f'{firstname}{lastname}@moodle.net'
            self.moodle_user.loc[len(self.moodle_user.index)] = [id,auth,confirmed,policyagreed,deleted,suspended,mnethostid,username,password,idnumber,firstname,lastname,email,emailstop,phone1,phone2, \
                                                                institution,department,address,city,country,lang,calendartype,theme,timezone,firstaccess,lastaccess,lastlogin,currentlogin,lastip,secret, \
                                                                picture,description,descriptionformat,mailformat,maildigest,maildisplay,autosubscribe,trackforums,timecreated,timemodified,trustbitmask, \
                                                                imagealt,lastnamephonetic,firstnamephonetic,middlename,alternatename,moodlenetprofile]
        # NOTE: add the general modifier user (e.g. IT dept leader); not functional at the moment, since most other tables depend on this table being only students
        #id = self.modifieruserid
        #firstname = 'Jackson'
        #lastname = 'Burmeister'
        #username = f'{firstname}{lastname}'
        #email = f'{lastname}001@contoso.edu'
        #description = 'IT Department Leader for Contoso University'
        #self.moodle_user.loc[len(self.moodle_user.index)] = [self.modifieruserid,auth,confirmed,policyagreed,deleted,suspended,mnethostid,username,password,idnumber,firstname,lastname,email,emailstop,phone1,phone2, \
        #                                                        institution,department,address,city,country,lang,calendartype,theme,timezone,firstaccess,lastaccess,lastlogin,currentlogin,lastip,secret, \
        #                                                        picture,description,descriptionformat,mailformat,maildigest,maildisplay,autosubscribe,trackforums,timecreated,timemodified,trustbitmask, \
        #                                                        imagealt,lastnamephonetic,firstnamephonetic,middlename,alternatename,moodlenetprofile]
        self.writetofile('user', self.moodle_user)

    def genCourse(self):
        for index, section in self.sections.iterrows():
            id = section['SectionID']
            category = section['CourseID'] # NOTE: currently using this field for mapping to school/session/course(_category)
            sortorder = 0
            fullname = section['SectionName']
            shortname = section['CourseName']
            idnumber = '' # NOTE: unsure
            summary = 'null'
            summaryformat = 0
            format = 'topics' # NOTE: unsure
            showgrades = 1
            newsitems = 1
            startdate = self.startdate
            enddate = self.enddate
            relativedatesmode = 0 # NOTE: unsure
            marker = 0
            maxbytes = 0
            legacyfiles = 0
            showreports = 0
            visible = 1
            visibleold = 1
            downloadcontent = 'null'
            groupmode = 0 # NOTE: unsure
            groupmodeforce = 0
            defaultgroupingid = 0 # NOTE: unsure
            lang = 'en'
            calendartype = 'gregorian'
            theme = ''
            timecreated = self.startdate
            timemodified = 0
            requested = 0
            enablecompletion = 0 # NOTE: unsure
            completionnotify = 0
            cacherev = 0
            originalcourseid = section['CourseID'] # NOTE: unsure
            showactivitydates = 1
            showcompletionconditions = 0
            self.moodle_course.loc[len(self.moodle_course.index)] = [id,category,sortorder,fullname,shortname,idnumber,summary,summaryformat,format,showgrades,newsitems,startdate,enddate,relativedatesmode, \
                                                                    marker,maxbytes,legacyfiles,showreports,visible,visibleold,downloadcontent,groupmode,groupmodeforce,defaultgroupingid,lang,calendartype, \
                                                                    theme,timecreated,timemodified,requested,enablecompletion,completionnotify,cacherev,originalcourseid,showactivitydates,showcompletionconditions]
        self.writetofile('course', self.moodle_course)

    def genCourseCategories(self):
        # use the base-truth sections table to count number of classes per school, and per course category
        dfBT_sections = spark.createDataFrame(self.sections)
        dfBT_sections = dfBT_sections.select('SectionID','CourseID','SchoolID')
        dfSchool_classescount = dfBT_sections.groupBy('SchoolID').count()
        dfCourseCategory_classescount = dfBT_sections.groupBy('CourseID').count()
        # set a static idnumber for Spring 2022 semester session(s) per school
        spring2022_idnum = self.faker.uuid4()
        for index, school in self.schools.iterrows():
            # base level: one per school
            id = school['SchoolID']
            name = school['SchoolName']
            idnumber = school['SchoolID']
            description = 'School in the education system'
            descriptionformat = 1
            parent = ''
            sortorder = 0
            school_coursecount = dfSchool_classescount.filter(dfSchool_classescount['SchoolID'] == f"{school['SchoolID']}").collect()[0]['count'] # number of classes at this school; reused for both level 0 and level 1
            coursecount = school_coursecount
            visible = 1
            visibleold = 0
            timemodified = self.startdate
            depth = 0 # NOTE: currently using this field as a tracker for depth with respect to the path-level (school => level 0)
            path = f"{school['SchoolID']}"
            theme = ''
            self.moodle_course_categories.loc[len(self.moodle_course_categories.index)] = [id,name,idnumber,description,descriptionformat,parent,sortorder,coursecount,visible,visibleold,timemodified, \
                                                                    depth,path,theme]
            # level 1: one per school, per session (e.g. semester, trimester, etc.)
            id = self.faker.uuid4()
            name = 'Spring 2022'
            idnumber = f'{spring2022_idnum}'
            description = 'Session/Semester of the school in the education system'
            descriptionformat = 1
            parent = school['SchoolID']
            sortorder = 0
            coursecount = school_coursecount
            visible = 1
            visibleold = 0
            timemodified = self.startdate
            depth = 1 # NOTE: currently using this field as a tracker for depth with respect to the path-level (session => level 1)
            path = f"{school['SchoolID']}/Spring 2022"
            theme = ''
            self.moodle_course_categories.loc[len(self.moodle_course_categories.index)] = [id,name,idnumber,description,descriptionformat,parent,sortorder,coursecount,visible,visibleold,timemodified, \
                                                                    depth,path,theme]
        # collect the ids of the session-level (depth 1) categories created so far, so course categories can link to them via the parent field
        dfSessionIDs = spark.createDataFrame(self.moodle_course_categories)
        dfSessionIDs = dfSessionIDs.select('id','depth','parent').filter(dfSessionIDs['depth']==1)
        for index, course in self.courses.iterrows():
            # level 2: one per course (i.e. the category grouping that course's sections/classes in the moodle course table)
            id = course['CourseID']
            name = course['CourseName']
            idnumber = course['CourseID']
            description = 'Course category of a school session in the education system'
            descriptionformat = 1
            parent = dfSessionIDs.filter(dfSessionIDs['parent'] == f"{course['SchoolID']}").collect()[0]['id']
            sortorder = 0
            course_counts = dfCourseCategory_classescount.filter(dfCourseCategory_classescount['CourseID'] == f"{course['CourseID']}").collect()
            coursecount = course_counts[0]['count'] if course_counts else 0
            visible = 1
            visibleold = 0
            timemodified = self.startdate
            depth = 2 # NOTE: currently using this field as a tracker for depth with respect to the path-level (course_category => level 2)
            path = f"{course['SchoolID']}/Spring 2022/{course['CourseID']}"
            theme = ''
            self.moodle_course_categories.loc[len(self.moodle_course_categories.index)] = [id,name,idnumber,description,descriptionformat,parent,sortorder,coursecount,visible,visibleold,timemodified, \
                                                                    depth,path,theme]
        self.writetofile('course_categories', self.moodle_course_categories)

    def genRole(self):
        id = self.studentroleid
        name = 'Student'
        shortname = '' # NOTE: unsure for this field
        description = 'Student in the education system'
        sortorder = 0 # NOTE: unsure
        archetype = '' # NOTE: unsure
        self.moodle_role.loc[len(self.moodle_role.index)] = [id,name,shortname,description,sortorder,archetype]
        # NOTE: instructor users are not generated yet; only the Instructor role row is created.
        id = self.faker.uuid4()
        name = 'Instructor'
        shortname = ''
        description = 'Instructor in the education system'
        sortorder = 0
        archetype = ''
        self.moodle_role.loc[len(self.moodle_role.index)] = [id,name,shortname,description,sortorder,archetype]
        self.writetofile('role', self.moodle_role)
    
    def genContext(self):
        # NOTE: unsure whether this table generation process accurately reflects expected production data.
        for index, school in self.schools.iterrows():
            # base level: one per school
            id = self.faker.uuid4()
            contextlevel = 0 
            instanceid = school['SchoolID'] 
            schoolname = school['SchoolName']
            path = f'{schoolname}' 
            depth = 2
            locked = 0
            self.moodle_context.loc[len(self.moodle_context.index)] = [id,contextlevel,instanceid,path,depth,locked]
            # level 1: one per school, per session (e.g. semester, trimester, etc.)
            id = self.faker.uuid4()
            contextlevel = 1 
            instanceid = school['SchoolID'] 
            path = f'{schoolname}/Spring 2022' 
            depth = 2
            locked = 0
            self.moodle_context.loc[len(self.moodle_context.index)] = [id,contextlevel,instanceid,path,depth,locked]
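        # map each SchoolID to its name so every section's path uses its own school's name
        # (previously the last school from the loop above was reused for every section)
        school_names = dict(zip(self.schools['SchoolID'], self.schools['SchoolName']))
        for index, section in self.sections.iterrows():
            # level 2: one per school, per session, per section (i.e. class in moodle course table)
            id = self.faker.uuid4()
            contextlevel = 2
            instanceid = section['SchoolID']
            schoolname = school_names.get(section['SchoolID'], '')
            sectionname = section['SectionName']
            path = f'{schoolname}/Spring 2022/{sectionname}'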
            depth = 2
            locked = 0
            self.moodle_context.loc[len(self.moodle_context.index)] = [id,contextlevel,instanceid,path,depth,locked]
        self.writetofile('context', self.moodle_context)

    def genRoleAssignments(self):
        itemid = self.faker.uuid4() # NOTE: unsure if this is stagnant per role assignment; may need to use enrollment id per student per course instead
        # use context table to find context ID per student based on the school they attend
        dfContext = spark.createDataFrame(self.moodle_context)
        for index, student in self.students.iterrows():
            id = self.faker.uuid4()
            roleid = self.studentroleid
            contextid = dfContext.filter(dfContext['instanceid']==student['SchoolID']).filter(dfContext['contextlevel'] == 1).collect()[0][0]
            userid = student['StudentID']
            timemodified = self.startdate
            modifierid = self.modifieruserid
            component = '' # NOTE: unsure
            sortorder = 0
            self.moodle_role_assignments.loc[len(self.moodle_role_assignments.index)] = [id,roleid,contextid,userid,timemodified,modifierid,component,itemid,sortorder]
        self.writetofile('role_assignments', self.moodle_role_assignments)
    
    def genEnrol(self):
        for index, section in self.sections.iterrows():
            id = self.faker.uuid4()
            enrol = '' # NOTE: unsure about this field.
            status = 0
            courseid = section['SectionID']
            sortorder = 0
            name = 'Enrolled student' # NOTE: unsure
            enrolperiod = 'Spring 2022' # NOTE: unsure
            enrolstartdate = self.startdate
            enrolenddate = self.enddate
            expirynotify = 0
            expirythreshold = 0
            notifyall = 0
            password = self.faker.uuid4()
            cost = 'null' 
            currency = 'null'
            roleid = self.studentroleid
            customint1 = ''
            customint2 = ''
            customint3 = ''
            customint4 = ''
            customint5 = ''
            customint6 = ''
            customint7 = ''
            customint8 = ''
            customchar1 = ''
            customchar2 = ''
            customchar3 = ''
            customdec1 = 0
            customdec2 = 0
            customtext1 = 'null'
            customtext2 = 'null'
            customtext3 = 'null'
            customtext4 = 'null'
            timecreated = self.startdate
            timemodified = self.startdate
            self.moodle_enrol.loc[len(self.moodle_enrol.index)] = [id,enrol,status,courseid,sortorder,name,enrolperiod,enrolstartdate,enrolenddate,expirynotify,expirythreshold,notifyall,password, \
                                                                    cost,currency,roleid,customint1,customint2,customint3,customint4,customint5,customint6,customint7,customint8,customchar1, \
                                                                    customchar2,customchar3,customdec1,customdec2,customtext1,customtext2,customtext3,customtext4,timecreated,timemodified]
        self.writetofile('enrol', self.moodle_enrol)

    def genUserEnrolments(self):
        # cast both base-truth enrollment and moodle_enrol tables to Spark, then join together for enrollment id per student in a class
        dfBT_enroll = spark.createDataFrame(self.enrollment)
        dfMoodle_enrol = spark.createDataFrame(self.moodle_enrol)
        dfMoodle_enrol = dfMoodle_enrol.select('id', 'courseid').withColumnRenamed('courseid', 'class_id')
        dfEnroll = dfMoodle_enrol.join(dfBT_enroll, dfMoodle_enrol.class_id == dfBT_enroll.SectionID, how='inner').drop('class_id')
        enroll_joined = dfEnroll.toPandas()
        for index, enroll in enroll_joined.iterrows():
            id = self.faker.uuid4()
            status = 0 # NOTE: unsure - 0 means active participation according to doc.
            enrolid = enroll['id']
            userid = enroll['StudentID']
            timestart = self.startdate
            timeend = self.enddate
            modifierid = self.modifieruserid
            timecreated = self.startdate
            timemodified = self.startdate
            self.moodle_user_enrolments.loc[len(self.moodle_user_enrolments.index)] = [id,status,enrolid,userid,timestart,timeend,modifierid,timecreated,timemodified]
        self.writetofile('user_enrolments', self.moodle_user_enrolments)

    def genAssign_tables(self,num_assigns=5):
        """This method generates 5 assign tables: assign_user_mapping, assign, assign_submission, assignsubmission_file, assign_grades"""
        n = num_assigns
        date_range = self.__get_daterange()
        while n > 0:
            # choose the day the assignment was assigned
            assign_day = random.choice(date_range)
            # randomly choose which course to generate the assignment for
            course = random.choice(self.moodle_course['id'].tolist())
            # find number of students in course
            dfBT_enroll = spark.createDataFrame(self.enrollment)
            dfEnroll = dfBT_enroll.filter(dfBT_enroll['SectionID'] == f'{course}')
            num_students_in_course = dfEnroll.count()
            # generate a fake assignment ID
            assign_id = self.faker.uuid4()
            # finally generate the assign tables
            self._genAssignUserMapping(assign_id,dfEnroll) 
            num_students_not_submitted,max_attempts = self._genAssign(assign_id,course,assign_day,num_students_in_course)
            num_students_submit = num_students_in_course - num_students_not_submitted # calculate num students to have submitted
            self._genAssignSubmission(assign_id,assign_day,num_students_submit,dfEnroll,max_attempts)
            self._genAssignSubmissionFile(assign_id)
            self._genAssignGrades(assign_id,assign_day)
            n = n - 1
        self.writetofile('assign_user_mapping', self.moodle_assign_user_mapping)
        self.writetofile('assign', self.moodle_assign)
        self.writetofile('assign_submission', self.moodle_assign_submission)
        self.writetofile('assignsubmission_file', self.moodle_assignsubmission_file)
        self.writetofile('assign_grades', self.moodle_assign_grades)

    def _genAssignUserMapping(self,assignid,dfEnroll):
        df_enroll = dfEnroll.toPandas()
        for index, enroll in df_enroll.iterrows():
            id = self.faker.uuid4()
            assignment = assignid
            userid = enroll['StudentID']
            self.moodle_assign_user_mapping.loc[len(self.moodle_assign_user_mapping.index)] = [id,assignment,userid]

    def _genAssign(self,assignid,courseid,assignday,num_students_in_course):
        # NOTE: unsure whether there should be one row per assignment or one row
        # per student linked to an assignment; this code assumes the former.
        id = assignid
        course = courseid
        name = 'Assignment for Course' # NOTE: can be modified for unique names
        intro = 'This is an assignment for a course'
        introformat = 0
        alwaysshowdescription = 1
        nosubmissions = random.randint(0,num_students_in_course)
        submissiondrafts = 0
        sendnotifications = 0
        sendlatenotifications = 0
        allowsubmissionsfromdate = 0
        grade = 100
        timemodified = assignday
        duedate = timemodified + dt.timedelta(days=7)
        requiresubmissionstatement = 0
        completionsubmit = 0 # NOTE: unsure
        cutoffdate = timemodified + dt.timedelta(days=7)
        gradingduedate = timemodified + dt.timedelta(days=14)
        teamsubmission = 0
        requireallteammemberssubmit = 0
        teamsubmissiongroupingid = 0
        blindmarking = 0
        hidegrader = 0
        revealidentities = 0
        attemptreopenmethod = 'none'
        maxattempts = random.randint(1,3)
        markingworkflow = 0
        markingallocation = 0
        sendstudentnotifications = 0
        preventsubmissionnotingroup = 0
        activity = 'Assignment Progress for the Course'
        activityformat = 0
        timelimit = 0
        submissionattachments = 3
        self.moodle_assign.loc[len(self.moodle_assign.index)] = [id,course,name,intro,introformat,alwaysshowdescription,nosubmissions,submissiondrafts,sendnotifications,sendlatenotifications, \
                                                                    duedate,allowsubmissionsfromdate,grade,timemodified,requiresubmissionstatement,completionsubmit,cutoffdate,gradingduedate,teamsubmission, \
                                                                    requireallteammemberssubmit,teamsubmissiongroupingid,blindmarking,hidegrader,revealidentities,attemptreopenmethod,maxattempts,markingworkflow, \
                                                                    markingallocation,sendstudentnotifications,preventsubmissionnotingroup,activity,activityformat,timelimit,submissionattachments]
        return nosubmissions,maxattempts

    def _genAssignSubmission(self,assignid,assignday,num_students_submit,dfEnroll,maxattempts):
        # randomly sample the students enrolled in the course for assignment submissions
        df_enroll = dfEnroll.toPandas()
        df_students_submit = df_enroll.sample(n=num_students_submit)
        for index, student in df_students_submit.iterrows():
            # assign static variables per student submission(s)
            userid = student['StudentID']
            random_num_attempts = random.randint(1,maxattempts)
            assignment = assignid
            latest = random_num_attempts
            timecreated = assignday + dt.timedelta(days=random.randint(1,6),hours=random.randint(0,23),minutes=random.randint(0,59))
            if random_num_attempts == 1:
                # if only one attempt for the student, then simply generate that single submission
                timemodified = timecreated
                timestarted = timecreated - dt.timedelta(hours=random.randint(0,4),minutes=random.randint(0,59))
                id = self.faker.uuid4()
                status = 'SUBMITTED'
                groupid = 0
                attemptnumber = 1
                latest = random_num_attempts
                self.moodle_assign_submission.loc[len(self.moodle_assign_submission.index)] = [id,assignment,userid,timecreated,timemodified,timestarted,status,groupid,attemptnumber,latest]
            else:
                # track the final attempt's time and the previous attempt's start time for this student
                last_attempt_daytime = timecreated + dt.timedelta(days=random.randint(0,3),hours=random.randint(0,23),minutes=random.randint(0,59))
                previous_attempt = None
                # iterate through every attempt, including the final one (range() excludes its end, hence + 1)
                for n in range(1, random_num_attempts + 1):
                    id = self.faker.uuid4()
                    status = 'SUBMITTED'
                    groupid = 0
                    attemptnumber = n
                    timemodified = last_attempt_daytime
                    if n == random_num_attempts:
                        timestarted = last_attempt_daytime
                    else:
                        if isinstance(previous_attempt, type(None)):
                            timestarted = timecreated + dt.timedelta(days=random.randint(0,2),hours=random.randint(0,23),minutes=random.randint(0,59))
                            while timestarted > timemodified:
                                timestarted = timecreated + dt.timedelta(days=random.randint(0,2),hours=random.randint(0,23),minutes=random.randint(0,59))
                        else:
                            timestarted = previous_attempt + dt.timedelta(days=random.randint(0,2),hours=random.randint(0,23),minutes=random.randint(0,59))
                            while timestarted > timemodified:
                                timestarted = previous_attempt + dt.timedelta(days=random.randint(0,2),hours=random.randint(0,23),minutes=random.randint(0,59))
                    previous_attempt = timestarted
                    self.moodle_assign_submission.loc[len(self.moodle_assign_submission.index)] = [id,assignment,userid,timecreated,timemodified,timestarted,status,groupid,attemptnumber,latest]

    def _genAssignSubmissionFile(self,assignid):
        # only attach files to the submissions of the current assignment
        df_submissions = self.moodle_assign_submission[self.moodle_assign_submission['assignment'] == assignid]
        for index, submission_row in df_submissions.iterrows():
            id = self.faker.uuid4()
            assignment = assignid
            submission = submission_row['id']
            numfiles = random.randint(1,3)
            self.moodle_assignsubmission_file.loc[len(self.moodle_assignsubmission_file.index)] = [id,assignment,submission,numfiles]
    
    def _genAssignGrades(self,assignid,assignday):
        # grab the latest submission for each student that submitted and grade them
        df_submissions = self.moodle_assign_submission.copy()
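        # restrict to the current assignment, then keep each student's latest attempt
        df_submissions = df_submissions.loc[(df_submissions['assignment']==assignid) & (df_submissions['attemptnumber']==df_submissions['latest']),['userid','latest']]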
        # set any static fields
        assign_graded_datetime = assignday + dt.timedelta(days=random.randint(9,14),hours=random.randint(6,22),minutes=random.randint(0,59))
        for index, submission in df_submissions.iterrows():
            id = self.faker.uuid4()
            assignment = assignid
            userid = submission['userid']
            timecreated = assign_graded_datetime
            timemodified = assign_graded_datetime
            grader = 0 # NOTE: unsure
            #grade = '{}'.format(decimal.Decimal(random.randrange(4000, 10000))/100)
            grade = round(random.triangular(40.00,100.00,73.50))
            attemptnumber = submission['latest']
            self.moodle_assign_grades.loc[len(self.moodle_assign_grades.index)] = [id,assignment,userid,timecreated,timemodified,grader,grade,attemptnumber]
    
    def genQuiz_tables(self,num_quizzes=5):
        """This method generates 3 quiz tables: quiz, quiz_attempts and quiz_grades"""
        n = num_quizzes
        date_range = self.__get_daterange()
        while n > 0:
            # choose the day the quiz was assigned
            quiz_day = random.choice(date_range)
            # randomly choose which course to generate the quiz for
            course = random.choice(self.moodle_course['id'].tolist())
            # find number of students in course
            dfBT_enroll = spark.createDataFrame(self.enrollment)
            dfEnroll = dfBT_enroll.filter(dfBT_enroll['SectionID'] == f'{course}')
            num_students_in_course = dfEnroll.count()
            # generate a fake quiz ID
            quiz_id = self.faker.uuid4()
            # finally generate the quiz tables
            max_attempts = self._genQuiz(quiz_id,course,quiz_day) 
            self._genQuizAttempts(quiz_id,quiz_day,num_students_in_course,dfEnroll,max_attempts)
            self._genQuizGrades(quiz_id)
            n = n - 1
        self.writetofile('quiz', self.moodle_quiz)
        self.writetofile('quiz_attempts', self.moodle_quiz_attempts)
        self.writetofile('quiz_grades', self.moodle_quiz_grades)

    def _genQuiz(self,quizid,courseid,quizday):
        id = quizid
        course = courseid
        name = 'Quiz for Course' # NOTE: can be modified for unique names
        intro = 'This is a quiz for a course'
        introformat = 0
        timeopen = quizday
        timeclose = timeopen + dt.timedelta(days=1)
        timelimit = 0
        overduehandling = 'autoabandon' # NOTE: unsure
        graceperiod = 0
        preferredbehavior = ''
        canredoquestions = 0
        attempts = random.randint(1,3)
        attemptonlast = 0
        grademethod = 4 # NOTE: can be changed; 4 grades each student on their last quiz attempt
        decimalpoints = 2
        questiondecimalpoints = 0
        reviewattempt = 0
        reviewcorrectness = 0
        reviewmarks = 0
        reviewspecificfeedback = 0
        reviewgeneralfeedback = 0
        reviewrightanswer = 0
        reviewoverallfeedback = 0
        questionsperpage = 20 # NOTE: unsure - currently using this to say there are a total of 20 questions
        navmethod = 'free'
        shuffleanswers = 1
        sumgrades = 100.00 # NOTE: unsure
        grade = 100.00
        timecreated = quizday - dt.timedelta(days=1,hours=random.randint(8,12))
        timemodified = quizday
        password = ''
        subnet = ''
        browsersecurity = 'securewindow'
        delay1 = 120
        delay2 = 120
        showuserpicture = 0 
        showblocks = 0
        completionattemptsexhausted = 0
        completionminattempts = 0
        allowofflineattempts = 0
        self.moodle_quiz.loc[len(self.moodle_quiz.index)] = [id,course,name,intro,introformat,timeopen,timeclose,timelimit,overduehandling,graceperiod,preferredbehavior, \
                                                            canredoquestions,attempts,attemptonlast,grademethod,decimalpoints,questiondecimalpoints,reviewattempt,reviewcorrectness, \
                                                            reviewmarks,reviewspecificfeedback,reviewgeneralfeedback,reviewrightanswer,reviewoverallfeedback,questionsperpage,navmethod, \
                                                            shuffleanswers,sumgrades,grade,timecreated,timemodified,password,subnet,browsersecurity,delay1,delay2,showuserpicture, \
                                                            showblocks,completionattemptsexhausted,completionminattempts,allowofflineattempts]
        return attempts
    
    def _genQuizAttempts(self,quizid,quizday,num_students_in_course,dfEnroll,maxattempts):
        # randomly sample the students enrolled in the course for quiz submissions
        num_students_submit = random.randint(0,num_students_in_course)
        df_enroll = dfEnroll.toPandas()
        df_students_submit = df_enroll.sample(n=num_students_submit)
        for index, student in df_students_submit.iterrows():
            # assign static variables per student submission(s)
            quiz = quizid
            userid = student['StudentID']
            layout = ''
            currentpage = 0
            preview = 0
            timemodifiedoffline = 0
            timecheckstate = 0
            gradednotificationsenttime = ''
            random_num_attempts = random.randint(1,maxattempts)
            if random_num_attempts == 1:
                # if only one attempt for the student, then simply generate that single submission
                id = self.faker.uuid4()
                uniqueid = self.faker.uuid4()
                state = 'finished'
                attempt = 1
                timestart = quizday + dt.timedelta(hours=random.randint(0,23),minutes=random.randint(0,59))
                timefinish = timestart + dt.timedelta(hours=random.randint(0,1),minutes=random.randint(0,20))
                timemodified = timefinish
                sumgrades = round(random.triangular(35.00,100.00,73.50), 2)
                self.moodle_quiz_attempts.loc[len(self.moodle_quiz_attempts.index)] = [id,quiz,userid,attempt,uniqueid,layout,currentpage,preview,state,timestart,timefinish,timemodified,timemodifiedoffline,timecheckstate,sumgrades,gradednotificationsenttime]
            else:
                # track the final attempt's time and the previous attempt's finish time for this student
                last_attempt_daytime = quizday + dt.timedelta(hours=random.randint(20,23),minutes=random.randint(0,59))
                previous_attempt = None
                # iterate through every attempt, including the final one (range() excludes its end, hence + 1)
                for n in range(1, random_num_attempts + 1):
                    id = self.faker.uuid4()
                    attempt = n
                    uniqueid = self.faker.uuid4()
                    state = 'finished'
                    timemodified = last_attempt_daytime
                    sumgrades = round(random.triangular(35.00,100.00,73.50), 2)
                    if n == random_num_attempts:
                        timestart = last_attempt_daytime - dt.timedelta(minutes=random.randint(0,20))
                        timefinish = last_attempt_daytime
                    else:
                        if isinstance(previous_attempt, type(None)):
                            timestart = quizday + dt.timedelta(hours=random.randint(6,20),minutes=random.randint(0,59))
                            timefinish = timestart + dt.timedelta(hours=random.randint(0,1),minutes=random.randint(0,20))
                        else:
                            timestart = previous_attempt + dt.timedelta(hours=random.randint(0,2),minutes=random.randint(0,59))
                            timefinish = timestart + dt.timedelta(hours=random.randint(0,1),minutes=random.randint(0,20))
                    previous_attempt = timefinish
                    self.moodle_quiz_attempts.loc[len(self.moodle_quiz_attempts.index)] = [id,quiz,userid,attempt,uniqueid,layout,currentpage,preview,state,timestart,timefinish,timemodified,timemodifiedoffline,timecheckstate,sumgrades,gradednotificationsenttime]

    def _genQuizGrades(self,quizid):
        # grab the latest quiz attempt for each student that submitted and use that grade as the overall student grade
        df_attempts = self.moodle_quiz_attempts.copy()
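        # restrict to the current quiz; a student's final attempt is the one whose timefinish equals timemodified
        df_attempts = df_attempts.loc[(df_attempts['quiz']==quizid) & (df_attempts['timemodified']==df_attempts['timefinish']),['userid','sumgrades','timemodified']]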
        for index, attempt in df_attempts.iterrows():
            id = self.faker.uuid4()
            quiz = quizid
            userid = attempt['userid']
            grade = attempt['sumgrades']
            timemodified = attempt['timemodified']
            self.moodle_quiz_grades.loc[len(self.moodle_quiz_grades.index)] = [id,quiz,userid,grade,timemodified]

    def genForum_tables(self,num_forums=5):
        """This method generates 3 forum tables: forum, forum_discussions and forum_grades"""
        n = num_forums
        date_range = self.__get_daterange()
        while n > 0:
            # choose the day the forum was assigned
            forum_day = random.choice(date_range)
            # randomly choose which course to generate the forum for
            course = random.choice(self.moodle_course['id'].tolist())
            # find number of students in course
            dfBT_enroll = spark.createDataFrame(self.enrollment)
            dfEnroll = dfBT_enroll.filter(dfBT_enroll['SectionID'] == f'{course}')
            num_students_in_course = dfEnroll.count()
            # generate a fake forum ID
            forum_id = self.faker.uuid4()
            # finally generate the forum tables
            time_modified,complete_discuss,complete_replies = self._genForum(forum_id,course,forum_day,num_students_in_course)
            self._genForumDiscussions(forum_id,course,dfEnroll,time_modified,complete_discuss,complete_replies,date_range)
            self._genForumGrades(forum_id,time_modified)
            n = n - 1
        self.writetofile('forum', self.moodle_forum)
        self.writetofile('forum_discussions', self.moodle_forum_discussions)
        self.writetofile('forum_grades', self.moodle_forum_grades)

    def _genForum(self,forumid,courseid,forumgradedate,num_students_in_course):
        id = forumid
        course = courseid
        type = 'general'
        name = 'Forum for Course' # NOTE: can be modified for unique names
        intro = 'This is a forum for a course'
        introformat = 0
        duedate = forumgradedate # NOTE: due date is the forum day; the cutoff date below is the end of the semester
        cutoffdate = self.enddate
        assessed = num_students_in_course # NOTE: unsure if this should be representing how many are graded
        assesstimestart = forumgradedate + dt.timedelta(days=random.randint(0,5),hours=random.randint(0,23),minutes=random.randint(0,59))
        assesstimefinish = assesstimestart + dt.timedelta(hours=random.randint(0,4),minutes=random.randint(0,59))
        scale = 1 # NOTE: unsure
        grade_forum = 1 # NOTE: unsure
        grade_forum_notify = 0
        maxbytes = 0
        maxattachments = 1
        forcesubscribe = 0
        trackingtype = 1
        rsstype = 0
        rssarticles = 0
        timemodified = assesstimefinish
        warnafter = duedate - dt.timedelta(hours=12)
        blockafter = self.enddate
        blockperiod = 0
        completiondiscussions = random.randint(1,3)
        completionreplies = random.randint(1,3)
        completionposts = completiondiscussions + completionreplies # NOTE: total number of posts and replies required to be marked as complete
        displaywordcount = 0
        lockdiscussionafter = cutoffdate
        self.moodle_forum.loc[len(self.moodle_forum.index)] = [id,course,type,name,intro,introformat,duedate,cutoffdate,assessed,assesstimestart,assesstimefinish,scale,grade_forum, \
                                                            grade_forum_notify,maxbytes,maxattachments,forcesubscribe,trackingtype,rsstype,rssarticles,timemodified,warnafter,blockafter, \
                                                            blockperiod,completiondiscussions,completionreplies,completionposts,displaywordcount,lockdiscussionafter]
        return timemodified,completiondiscussions,completionreplies

    def _genForumDiscussions(self,forumid,courseid,dfEnroll,time_modified,complete_discuss,complete_replies,daterange):
        # currently assumes every student has completed the required posts/replies
        df_enroll = dfEnroll.toPandas()
        for index, student in df_enroll.iterrows():
            # assign static variables per student submission(s)
            course = courseid
            forum = forumid
            firstpost = self.faker.uuid4()
            userid = student['StudentID']
            groupid = -1
            assessed = 1
            timemodified = time_modified
            pinned = 0
            timelocked = self.enddate
            # randomly set variable for the datetime the student added a post or reply
            last_post = random.choice(daterange)
            # add the student's posts based on the completion requirement (range() excludes its end, hence + 1);
            # posts no longer depend on the reply branch below
            for n in range(1, complete_discuss + 1):
                name = 'post'
                timestart = last_post + dt.timedelta(hours=random.randint(0,23),minutes=random.randint(0,59))
                timeend = timestart + dt.timedelta(hours=random.randint(0,1),minutes=random.randint(0,59))
                usermodified = timeend
                # the first post reuses the discussion's firstpost id
                if n == 1:
                    id = firstpost
                else:
                    id = self.faker.uuid4()
                last_post = timeend
                self.moodle_forum_discussions.loc[len(self.moodle_forum_discussions.index)] = [id,course,forum,name,firstpost,userid,groupid,assessed,timemodified,usermodified,timestart,timeend,pinned,timelocked]
            if complete_replies == 1:
                id = self.faker.uuid4()
                name = 'reply'
                timestart = last_post + dt.timedelta(days=random.randint(0,2),hours=random.randint(0,23),minutes=random.randint(0,59))
                timeend = timestart + dt.timedelta(hours=random.randint(0,1),minutes=random.randint(0,59))
                usermodified = timeend
                self.moodle_forum_discussions.loc[len(self.moodle_forum_discussions.index)] = [id,course,forum,name,firstpost,userid,groupid,assessed,timemodified,usermodified,timestart,timeend,pinned,timelocked]
            else:
                # iterate through adding student replies based on the completion requirement
                for n in range (1,complete_replies):
                    id = self.faker.uuid4()
                    name = 'reply'
                    timestart = last_post + dt.timedelta(hours=random.randint(0,23),minutes=random.randint(0,59))
                    timeend = timestart + dt.timedelta(hours=random.randint(0,1),minutes=random.randint(0,59))
                    usermodified = timeend
                    self.moodle_forum_discussions.loc[len(self.moodle_forum_discussions.index)] = [id,course,forum,name,firstpost,userid,groupid,assessed,timemodified,usermodified,timestart,timeend,pinned,timelocked]
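
    # Hypothetical helper (a sketch, not used by the notebook): it isolates the
    # timestamp logic of _genForumDiscussions. Assumption: each post or reply
    # starts 0 to max_gap_hours hours (plus up to 59 minutes) after the previous
    # one ends and lasts at most 1 hour 59 minutes. Chaining each start on the
    # previous end time keeps every student's activity in chronological order.
    @staticmethod
    def _sketch_chain_activity_times(first_time, num_items, max_gap_hours=23, seed=None):
        import datetime as dt
        import random
        rng = random.Random(seed)  # local RNG so the sketch is reproducible
        times = []
        last_end = first_time
        for _ in range(num_items):
            # the next item starts some time after the previous one ended
            start = last_end + dt.timedelta(hours=rng.randint(0, max_gap_hours), minutes=rng.randint(0, 59))
            # and ends up to 1 hour 59 minutes later
            end = start + dt.timedelta(hours=rng.randint(0, 1), minutes=rng.randint(0, 59))
            times.append((start, end))
            last_end = end
        return times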

    def _genForumGrades(self,forumid,time_modified):
        # grade each student post/reply belonging to this forum in the moodle_forum_discussions table
        df_discuss = self.moodle_forum_discussions[self.moodle_forum_discussions['forum'] == forumid]
        for index, discuss in df_discuss.iterrows():
            id = self.faker.uuid4()
            forum = forumid
            itemnumber = discuss['id']
            userid = discuss['userid']
            grade = round(random.triangular(60.00,100.00,77.50), 2)
            timecreated = discuss['timeend']
            timemodified = time_modified
            self.moodle_forum_grades.loc[len(self.moodle_forum_grades.index)] = [id,forum,itemnumber,userid,grade,timecreated,timemodified]
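
    # Hypothetical helper (a sketch, not used by the notebook): forum grades are
    # drawn with random.triangular(60.00, 100.00, 77.50), so scores cluster near
    # the 77.50 mode while staying inside [60, 100]. The mean of a triangular
    # distribution is (low + high + mode) / 3, which is about 79.17 here.
    @staticmethod
    def _sketch_triangular_grades(n, low=60.00, high=100.00, mode=77.50, seed=None):
        import random
        rng = random.Random(seed)  # local RNG so the sketch is reproducible
        grades = [round(rng.triangular(low, high, mode), 2) for _ in range(n)]
        expected_mean = (low + high + mode) / 3
        return grades, expected_mean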

    def genLesson_tables(self,num_lessons=5):
        """This method generates the 4 lesson tables: lesson, lesson_attempts, lesson_answers and lesson_grades"""
        n = num_lessons
        date_range = self.__get_daterange()
        while n > 0:
            # choose the day the lesson was assigned
            lesson_day = random.choice(date_range)
            # randomly choose which course to generate the lesson for
            num_courses = len(self.moodle_course.index) - 1
            random_course = random.randint(0,num_courses)
            course = self.moodle_course.filter(items=[random_course], axis=0).at[random_course,'id']
            # find number of students in course
            dfBT_enroll = spark.createDataFrame(self.enrollment)
            dfEnroll = dfBT_enroll.filter(dfBT_enroll['SectionID'] == f'{course}')
            num_students_in_course = dfEnroll.count()
            # generate a fake lesson ID and static lesson page ID (currently set to one page per lesson)
            lesson_id = self.faker.uuid4()
            lesson_page_id = self.faker.uuid4()
            # finally generate the lesson tables
            max_attempts,min_questions = self._genLesson(lesson_id,course,lesson_day) 
            self._genLessonAnswers(lesson_id,lesson_page_id,min_questions,lesson_day)
            self._genLessonAttempts(lesson_id,lesson_page_id,dfEnroll,num_students_in_course,max_attempts,lesson_day)
            self._genLessonGrades(lesson_id,min_questions,dfEnroll)
            n = n - 1
        self.writetofile('lesson', self.moodle_lesson)
        self.writetofile('lesson_attempts', self.moodle_lesson_attempts)
        self.writetofile('lesson_answers', self.moodle_lesson_answers)
        self.writetofile('lesson_grades', self.moodle_lesson_grades)
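
    # Hypothetical helper (a sketch, not used by the notebook): genLesson_tables
    # and genMessage_tables pick a random course row and then filter the
    # base-truth enrollment on SectionID. Assuming enrollment rows are dicts with
    # 'SectionID' and 'StudentID' keys, the same selection can be written with
    # random.choice and a list comprehension, without index-based row lookups.
    @staticmethod
    def _sketch_pick_course_roster(course_ids, enrollment_rows, seed=None):
        import random
        rng = random.Random(seed)  # local RNG so the sketch is reproducible
        course = rng.choice(list(course_ids))
        # compare as strings, mirroring the f'{course}' filter used above
        roster = [row['StudentID'] for row in enrollment_rows if str(row['SectionID']) == str(course)]
        return course, roster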

    def _genLesson(self,lesson_id,courseid,lessonday):
        # NOTE: generally unsure whether there's supposed to be one row per lesson,
            # or whether there's one row per student connected to a lesson.
            # this code assumes the former.
        id = lesson_id
        course = courseid
        name = 'Lesson for Course' # NOTE: can be modified for unique names
        intro = 'This is a lesson for a course'
        introformat = 0
        practice = 0
        modattempts = 0 # NOTE: unsure
        usepassword = 0
        password = ''
        dependency = 0
        conditions = 'null'
        grade = 100.00
        custom = 0
        ongoing = 0
        usemaxgrade = 1 # NOTE: unsure
        maxanswers = 20
        maxattempts = random.randint(1,5)
        review = 0 # NOTE: unsure
        nextpagedefault = 0
        feedback = 1
        minquestions = random.randint(5,20)
        maxpages = 1
        timelimit = 0
        retake = 1 # NOTE: unsure
        activitylink = 0
        mediafile = '' # NOTE: unsure; local file path or full external URL
        mediaheight = 100
        mediawidth = 650
        mediaclose = 0
        slideshow = 0
        width = 640
        height = 480
        bgcolor = '#FFFFFF'
        displayleft = 0
        displayleftif = 0
        progressbar = 0
        available = lessonday
        deadline = lessonday + dt.timedelta(days=random.randint(7,21))
        timemodified = lessonday
        completionendreached = 0 # NOTE: unsure
        completiontimespent = 0 # NOTE: unsure
        allowofflineattempts = 0
        self.moodle_lesson.loc[len(self.moodle_lesson.index)] = [id,course,name,intro,introformat,practice,modattempts,usepassword,password,dependency,conditions,grade,custom,ongoing, \
                                                            usemaxgrade,maxanswers,maxattempts,review,nextpagedefault,feedback,minquestions,maxpages,timelimit,retake,activitylink,mediafile, \
                                                            mediaheight,mediawidth,mediaclose,slideshow,width,height,bgcolor,displayleft,displayleftif,progressbar,available,deadline, \
                                                            timemodified,completionendreached,completiontimespent,allowofflineattempts]
        return maxattempts,minquestions

    def _genLessonAnswers(self,lesson_id,page_id,min_questions,lessonday):
        # generally unsure whether the minquestions field can be used this way, and whether this table looks like this in production data
        for n in range(min_questions):
            id = self.faker.uuid4()
            lessonid = lesson_id
            pageid = page_id
            jumpto = 0
            grade = 1 # NOTE: unsure
            score = 1 # NOTE: unsure
            flags = 0
            timecreated = lessonday + dt.timedelta(hours=6,minutes=random.randint(0,59))
            timemodified = timecreated + dt.timedelta(hours=2,minutes=random.randint(0,59))
            answer = 'sample instructor answer for this lesson question'
            answerformat = 0
            response = 'sample correct response for this lesson question' # NOTE: unsure
            responseformat = 0
            self.moodle_lesson_answers.loc[len(self.moodle_lesson_answers.index)] = [id,lessonid,pageid,jumpto,grade,score,flags,timecreated,timemodified,answer,answerformat,response,responseformat]

    def _genLessonAttempts(self,lesson_id,page_id,dfEnroll,num_students_in_course,maxattempts,lessonday):
        # randomly sample the students enrolled in the course for lesson attempts
        num_students_submit = random.randint(0,num_students_in_course)
        df_enroll = dfEnroll.toPandas()
        df_students_attempt = df_enroll.sample(n=num_students_submit)
        # only use the questions/answers generated for this lesson
        df_lesson_answers = self.moodle_lesson_answers[self.moodle_lesson_answers['lessonid'] == lesson_id]
        for index, student in df_students_attempt.iterrows():
            # assign static variables per student attempt(s)
            lessonid = lesson_id
            pageid = page_id
            userid = student['StudentID']
            retry = 0
            useranswer = 'sample student response/answer to this lesson question'
            timeseen = lessonday + dt.timedelta(days=random.randint(0,8),hours=random.randint(10,23),minutes=random.randint(0,59))
            # randomly set the number of attempts this student made on the lesson (1 to maxattempts)
            num_student_attempts = random.randint(1,maxattempts)
            for n in range(num_student_attempts):
                id = self.faker.uuid4() # NOTE: assuming that id is static per student lesson attempt, with varied answerids per specific question answered
                for _, answer in df_lesson_answers.iterrows():
                    # iterate through all questions/answers of this lesson
                    answerid = answer['id']
                    chance_of_correct = random.choices([0,1], weights=[0.22,0.78])
                    correct = chance_of_correct[0]
                    self.moodle_lesson_attempts.loc[len(self.moodle_lesson_attempts.index)] = [id,lessonid,pageid,userid,answerid,retry,correct,useranswer,timeseen]
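
    # Hypothetical helper (a sketch, not used by the notebook): each answered
    # question is marked correct with random.choices([0,1], weights=[0.22,0.78]),
    # i.e. roughly a 78% chance of a correct answer, independently per question.
    # random.choices returns a list, hence the [0] in _genLessonAttempts.
    @staticmethod
    def _sketch_attempt_correctness(num_questions, p_correct=0.78, seed=None):
        import random
        rng = random.Random(seed)  # local RNG so the sketch is reproducible
        return [rng.choices([0, 1], weights=[1 - p_correct, p_correct])[0] for _ in range(num_questions)]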
    
    def _genLessonGrades(self,lesson_id,minquestions,dfEnroll):
        # compute each student's best grade on this lesson (out of 100.00) from the lesson_attempts table
        dfMoodle_lattempts = spark.createDataFrame(self.moodle_lesson_attempts)
        # only consider attempts that belong to this lesson
        dfMoodle_lattempts = dfMoodle_lattempts.filter(F.col('lessonid') == lesson_id)
        dfMoodle_lattempts = dfMoodle_lattempts.withColumn('numQuestions', F.lit(1))
        # grade per attempt = questions answered correctly / questions answered * 100
        dfMoodle_lattempts = dfMoodle_lattempts.groupBy('id','lessonid','userid').sum('correct','numQuestions')
        dfMoodle_lattempts = dfMoodle_lattempts.withColumnRenamed('sum(correct)','total_correct').withColumnRenamed('sum(numQuestions)','total_questions').withColumn('grade', F.col('total_correct')/F.col('total_questions')*100)
        dfMoodle_lattempts = dfMoodle_lattempts.withColumn('grade', F.round(F.col('grade'), 2))
        # keep each student's highest-scoring attempt
        dfMoodle_lattempts = dfMoodle_lattempts.groupBy('userid').max('grade')
        # left join onto the course roster so students without any attempt receive a grade of 0
        dfFinal = dfEnroll.join(dfMoodle_lattempts, dfEnroll.StudentID == dfMoodle_lattempts.userid,how='left')
        dfFinal = dfFinal.na.fill(value=0,subset=["max(grade)"])
        df_lesson_grades = dfFinal.toPandas()
        for index, student in df_lesson_grades.iterrows():
            id = self.faker.uuid4()
            lessonid = lesson_id
            userid = student['StudentID']
            grade = student['max(grade)']
            late = 0
            # a student with a grade of 0 (e.g. no attempts) is marked as not having completed the lesson
            completed = 0 if grade == 0 else 1
            self.moodle_lesson_grades.loc[len(self.moodle_lesson_grades.index)] = [id,lessonid,userid,grade,late,completed]
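
    # Hypothetical helper (a sketch, not used by the notebook): a plain-Python
    # version of the aggregation in _genLessonGrades, assuming attempt rows are
    # (attempt_id, userid, correct) tuples for a single lesson. Each attempt is
    # graded as correct answers / questions answered * 100, each student keeps
    # their best attempt, and roster students with no attempts get 0.
    @staticmethod
    def _sketch_best_lesson_grades(attempt_rows, roster):
        from collections import defaultdict
        totals = defaultdict(lambda: [0, 0])  # (attempt_id, userid) -> [correct, answered]
        for attempt_id, userid, correct in attempt_rows:
            totals[(attempt_id, userid)][0] += correct
            totals[(attempt_id, userid)][1] += 1
        best = {}
        for (attempt_id, userid), (num_correct, num_answered) in totals.items():
            grade = round(num_correct / num_answered * 100, 2)
            best[userid] = max(best.get(userid, 0.0), grade)
        # left join onto the roster: students without attempts are filled with 0
        return {userid: best.get(userid, 0.0) for userid in roster}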

    def genMessage_tables(self,num_convos=5):
        """This method generates 3 class-wide message tables: messages, message_conversations and message_conversation_members"""
        n = num_convos
        date_range = self.__get_daterange()
        while n > 0:
            # choose the day the message was sent to the class
            message_day = random.choice(date_range)
            # randomly choose which course to generate the message for
            num_courses = len(self.moodle_course.index) - 1
            random_course = random.randint(0,num_courses)
            course = self.moodle_course.filter(items=[random_course], axis=0).at[random_course,'id']
            # find number of students in course
            dfBT_enroll = spark.createDataFrame(self.enrollment)
            dfEnroll = dfBT_enroll.filter(dfBT_enroll['SectionID'] == f'{course}')
            num_students_in_course = dfEnroll.count()
            # find course name
            course_name = dfEnroll.groupBy('SectionName').count().collect()[0][0]
            # find context ID for message
            dfBT_sections = spark.createDataFrame(self.sections)
            school_id = dfBT_sections.filter(dfBT_sections['SectionID']==f'{course}').collect()[0][5]
            dfContext = spark.createDataFrame(self.moodle_context)
            dfContext = dfContext.filter(dfContext['instanceid']==f'{school_id}').filter(dfContext['contextlevel']==2).filter(dfContext['path'].endswith(f'{course_name}'))
            context_id = dfContext.collect()[0][0]
            # generate a fake message ID
            message_id = self.faker.uuid4()
            # finally generate the message tables
            first_message_datetime = self._genMessageConversations(message_id,course_name,course,message_day,context_id) 
            self._genMessageConversationMembers(message_id,dfEnroll,first_message_datetime)
            self._genMessages(message_id,dfEnroll,first_message_datetime,course_name)
            n = n - 1
        self.writetofile('message_conversations', self.moodle_message_conversations)
        self.writetofile('message_conversation_members', self.moodle_message_conversation_members)
        self.writetofile('messages', self.moodle_messages)
    
    def _genMessageConversations(self,messageid,coursename,courseid,messageday,contextid):
        # NOTE: generally unsure whether there's supposed to be one row per message thread,
            # or whether there's one row per student message to the conversation thread.
            # this code assumes the former.
        id = f'{messageid}'
        type = 1 # NOTE: unsure
        name = f'Message to {coursename}' # NOTE: unsure
        convhash = ''
        component = 'conversations' # NOTE: unsure
        itemtype = 'Message conversation in a course'
        itemid = f'{courseid}' # NOTE: unsure
        contextid = f'{contextid}'
        enabled = 0
        timecreated = messageday + dt.timedelta(hours=random.randint(0,23),minutes=random.randint(0,59))
        timemodified = timecreated + dt.timedelta(days=random.randint(0,7),hours=random.randint(0,23),minutes=random.randint(0,59))
        self.moodle_message_conversations.loc[len(self.moodle_message_conversations.index)] = [id,type,name,convhash,component,itemtype,itemid,contextid,enabled,timecreated,timemodified]
        return timecreated

    def _genMessageConversationMembers(self,messageid,dfEnroll,first_message_datetime):
        df_convo_members = dfEnroll.toPandas()
        for index, student in df_convo_members.iterrows():
            id = self.faker.uuid4()
            conversationid = f'{messageid}'
            userid = student['StudentID']
            timecreated = first_message_datetime
            self.moodle_message_conversation_members.loc[len(self.moodle_message_conversation_members.index)] = [id,conversationid,userid,timecreated]

    def _genMessages(self,messageid,dfEnroll,first_message_datetime,coursename):
        df_enroll = dfEnroll.toPandas()
        # randomly choose how many messages will be sent in this conversation thread,
        # capped at the class size because sample() cannot draw more rows than exist
        num_messages = min(random.randint(1,5), len(df_enroll))
        df_students_messaging = df_enroll.sample(n=num_messages)
        # add dynamic variables per message added to a conversation
        counter = 0
        for index, student in df_students_messaging.iterrows():
            id = self.faker.uuid4()
            useridfrom = student['StudentID']
            conversationid = f'{messageid}'
            subject = f'Question for {coursename}' 
            if counter == 0:
                fullmessage = 'Sample question for the course by a student'
                timecreated = first_message_datetime
                smallmessage = 'Question by student'
            else:
                fullmessage = 'Sample response or answer to other students in the course'
                timecreated = last_message_sent + dt.timedelta(hours=random.randint(0,6),minutes=random.randint(0,59))
                smallmessage = 'Response by another student'
            fullmessageformat = 0
            fullmessagehtml = '' # NOTE: unsure
            fullmessagetrust = 0
            customdata = ''
            # update dynamic variables
            counter = counter + 1
            last_message_sent = timecreated
            self.moodle_messages.loc[len(self.moodle_messages.index)] = [id,useridfrom,conversationid,subject,fullmessage,fullmessageformat,fullmessagehtml,smallmessage,timecreated,fullmessagetrust,customdata]
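
    # Hypothetical helper (a sketch, not used by the notebook): it mirrors the
    # thread logic of _genMessages with the number of messages capped at the
    # roster size, since pandas' sample(n=...) raises a ValueError when n exceeds
    # the number of rows. The first message opens the thread at first_time and
    # each later message follows the previous one by up to 6 hours 59 minutes.
    @staticmethod
    def _sketch_plan_message_thread(student_ids, first_time, max_messages=5, seed=None):
        import datetime as dt
        import random
        rng = random.Random(seed)  # local RNG so the sketch is reproducible
        num_messages = min(rng.randint(1, max_messages), len(student_ids))
        senders = rng.sample(list(student_ids), num_messages)  # distinct senders, like DataFrame.sample
        thread = []
        last_sent = first_time
        for i, sender in enumerate(senders):
            if i > 0:
                last_sent = last_sent + dt.timedelta(hours=rng.randint(0, 6), minutes=rng.randint(0, 59))
            thread.append((sender, last_sent))
        return thread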

    def writetofile(self,filename,dfout):
        # turns the pandas df into a pyspark df, and then writes out the generated tables to stage1
        genfilepath = 'stage1/Transactional/test_data/v0.1/moodle_gen/' + filename + '/snapshot_batch_data/rundate='+self.currentDateTime
        dfOutfile = spark.createDataFrame(dfout)
        dfOutfile = dfOutfile.na.drop('all')
        dfOutfile.coalesce(1).write.save(oea.to_url(f'{genfilepath}'), format='csv', mode='overwrite', header='true', mergeSchema='true')
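
    # Hypothetical helper (a sketch, not used by the notebook): it reproduces how
    # writetofile builds the stage1 output path, so each table lands in its own
    # folder under a rundate partition. The rundate argument stands in for
    # self.currentDateTime, whose exact format is not shown in this section.
    @staticmethod
    def _sketch_snapshot_path(filename, rundate):
        base = 'stage1/Transactional/test_data/v0.1/moodle_gen/'
        return f'{base}{filename}/snapshot_batch_data/rundate={rundate}'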