# Test Data Generation: Moodle Tables Class

**Affiliation**: *Kwantum Edu Analytics*. **Last Modified**: *3/9/2023*.

This OEA test data generation class notebook generates fictitious Moodle tables, as seen in the Moodle module. It is required to run the moodle_test_data_gen_demo notebook.

For the schemas of all Moodle tables outlined below, see: https://www.examulator.com/er/4.0/

This class notebook relies primarily on the OEA_py class notebook, the ```Faker``` and ```random``` Python packages, and already-generated base-truth tables to generate **27** Moodle module tables:

**CONSIDER ADDING MOODLE log TABLE? PER UNSW's USE**

 1. **assign** <- *method drafted*
 2. **assign_grades** <- *method drafted*
 3. **assign_submission** <- *method drafted*
 4. **assignsubmission_file** <- *method drafted*
 5. **assign_user_mapping** <- *method drafted*
 6. **context** <- *method drafted*
 7. **course** <- *method drafted*
 8. **course_categories** <- *method drafted*
 9. **course_sections** ... may remove.
 10. **enrol** <- *method drafted*
 11. **forum**
 12. **forum_discussions**
 13. **forum_grades**
 14. **lesson**
 15. **lesson_answers**
 16. **lesson_attempts**
 17. **lesson_grades**
 18. **messages**
 19. **message_conversations**
 20. **message_conversation_members**
 21. **quiz** <- *method started*
 22. **quiz_attempts**
 23. **quiz_grades**
 24. **role** <- *method drafted*
 25. **role_assignments** <- *method drafted*
 26. **user** <- *method drafted*
 27. **user_enrolments** <- *method drafted*

There is one main method, ```genMoodle(startdate, enddate, ed_level, gen_activity, num_activities)```, that generates the tables described. Its parameters are:
  - *startdate*: roster start date.
  - *enddate*: roster end date.
  - *ed_level*: accepts k12 or hed; used for activity data generation.
  - *gen_activity*: boolean indicating whether to generate activity data.
  - *num_activities*: number of activities to generate; it is passed to the assign-table generator as the number of assignments.
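
The *startdate* and *enddate* values are ISO-8601 strings, while the Moodle schema types its time columns as BIGINT, which Moodle uses for Unix epoch seconds. The following is a minimal sketch of how such strings could be converted and expanded into a list of days. The helper names `to_epoch` and `day_range` are hypothetical and not part of the class, and treating the strings as UTC is an assumption.

```python
from datetime import datetime, timedelta, timezone

def to_epoch(iso_string):
    """Convert an ISO-8601 string to Unix epoch seconds, treating it as UTC (an assumption)."""
    parsed = datetime.fromisoformat(iso_string)
    return int(parsed.replace(tzinfo=timezone.utc).timestamp())

def day_range(startdate, enddate):
    """Return one datetime per day from startdate up to, but not including, enddate."""
    current = datetime.fromisoformat(startdate)
    end = datetime.fromisoformat(enddate)
    days = []
    while current < end:
        days.append(current)
        current += timedelta(days=1)
    return days

# Hypothetical usage with the default roster start date used by the class
start_epoch = to_epoch('2022-01-03T00:00:00')
days = day_range('2022-01-03T00:00:00', '2022-01-28T00:00:00')
print(start_epoch, len(days))
```

The end date is excluded from the range, which mirrors the `while(startdate < enddate)` loop the class uses to pick assignment days.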


In [1]:
import logging
import random, decimal
from faker import Faker
import pandas as pd
import datetime as dt
import numpy as np
from pyspark.sql import functions as F

class MoodleDataGen():
    def __init__(self, startdate='2022-01-03T00:00:00', enddate='2022-06-03T00:00:00'):
        self.startdate = startdate
        self.enddate = enddate
        
        self.faker = Faker('en_US')

        # set current datetime for rundate folder for writing out files
        currentDate = dt.datetime.now()
        self.currentDateTime = currentDate.strftime("%Y-%m-%d %H-%M-%S")

        # initialize dfs for each Moodle table to be generated
        assign = {
            'id':[],
            'course':[],
            'name':[],
            'intro':[],
            'introformat':[],
            'alwaysshowdescription':[],
            'nosubmissions':[],
            'submissiondrafts':[],
            'sendnotifications':[],
            'sendlatenotifications':[],
            'duedate':[],
            'allowsubmissionsfromdate':[],
            'grade':[],
            'timemodified':[],
            'requiresubmissionstatement':[],
            'completionsubmit':[],
            'cutoffdate':[],
            'gradingduedate':[],
            'teamsubmission':[],
            'requireallteammemberssubmit':[],
            'teamsubmissiongroupingid':[],
            'blindmarking':[],
            'hidegrader':[],
            'revealidentities':[],
            'attemptreopenmethod':[],
            'maxattempts':[],
            'markingworkflow':[],
            'markingallocation':[],
            'sendstudentnotifications':[],
            'preventsubmissionnotingroup':[],
            'activity':[],
            'activityformat':[],
            'timelimit':[],
            'submissionattachments':[]
        }
        self.moodle_assign = pd.DataFrame(assign, dtype=object)
        assign_grades = {
            'id':[],
            'assignment':[],
            'userid':[],
            'timecreated':[],
            'timemodified':[],
            'grader':[],
            'grade':[],
            'attemptnumber':[]
        }
        self.moodle_assign_grades = pd.DataFrame(assign_grades, dtype=object)
        assign_submission = {
            'id':[],
            'assignment':[],
            'userid':[],
            'timecreated':[],
            'timemodified':[],
            'timestarted':[],
            'status':[],
            'groupid':[],
            'attemptnumber':[],
            'latest':[]
        }
        self.moodle_assign_submission = pd.DataFrame(assign_submission, dtype=object)
        assignsubmission_file = {
            'id':[],
            'assignment':[],
            'submission':[],
            'numfiles':[]
        }
        self.moodle_assignsubmission_file = pd.DataFrame(assignsubmission_file, dtype=object)
        assign_user_mapping = {
            'id':[],
            'assignment':[],
            'userid':[]
        }
        self.moodle_assign_user_mapping = pd.DataFrame(assign_user_mapping, dtype=object)
        context = {
            'id':[],
            'contextlevel':[],
            'instanceid':[],
            'path':[],
            'depth':[],
            'locked':[]
        }
        self.moodle_context = pd.DataFrame(context, dtype=object)
        course = {
            'id':[],
            'category':[],
            'sortorder':[],
            'fullname':[],
            'shortname':[],
            'idnumber':[],
            'summary':[],
            'summaryformat':[],
            'format':[],
            'showgrades':[],
            'newsitems':[],
            'startdate':[],
            'enddate':[],
            'relativedatesmode':[],
            'marker':[],
            'maxbytes':[],
            'legacyfiles':[],
            'showreports':[],
            'visible':[],
            'visibleold':[],
            'downloadcontent':[],
            'groupmode':[],
            'groupmodeforce':[],
            'defaultgroupingid':[],
            'lang':[],
            'calendartype':[],
            'theme':[],
            'timecreated':[],
            'timemodified':[],
            'requested':[],
            'enablecompletion':[],
            'completionnotify':[],
            'cacherev':[],
            'originalcourseid':[],
            'showactivitydates':[],
            'showcompletionconditions':[]
        }
        self.moodle_course = pd.DataFrame(course, dtype=object)
        course_categories = {
            'id':[],
            'name':[],
            'idnumber':[],
            'description':[],
            'descriptionformat':[],
            'parent':[],
            'sortorder':[],
            'coursecount':[],
            'visible':[],
            'visibleold':[],
            'timemodified':[],
            'depth':[],
            'path':[],
            'theme':[]
        }
        self.moodle_course_categories = pd.DataFrame(course_categories, dtype=object)
        course_sections = {
            'id':[],
            'course':[],
            'section':[],
            'name':[],
            'summary':[],
            'summaryformat':[],
            'sequence':[],
            'visible':[],
            'availability':[],
            'timemodified':[]
        }
        self.moodle_course_sections = pd.DataFrame(course_sections, dtype=object)
        enrol = {
            'id':[],
            'enrol':[],
            'status':[],
            'courseid':[],
            'sortorder':[],
            'name':[],
            'enrolperiod':[],
            'enrolstartdate':[],
            'enrolenddate':[],
            'expirynotify':[],
            'expirythreshold':[],
            'notifyall':[],
            'password':[],
            'cost':[],
            'currency':[],
            'roleid':[],
            'customint1':[],
            'customint2':[],
            'customint3':[],
            'customint4':[],
            'customint5':[],
            'customint6':[],
            'customint7':[],
            'customint8':[],
            'customchar1':[],
            'customchar2':[],
            'customchar3':[],
            'customdec1':[],
            'customdec2':[],
            'customtext1':[],
            'customtext2':[],
            'customtext3':[],
            'customtext4':[],
            'timecreated':[],
            'timemodified':[]
        }
        self.moodle_enrol = pd.DataFrame(enrol, dtype=object)
        forum = {
            'id':[],
            'course':[],
            'type':[],
            'name':[],
            'intro':[],
            'introformat':[],
            'duedate':[],
            'cutoffdate':[],
            'assessed':[],
            'assesstimestart':[],
            'assesstimefinish':[],
            'scale':[],
            'grade_forum':[],
            'grade_forum_notify':[],
            'maxbytes':[],
            'maxattachments':[],
            'forcesubscribe':[],
            'trackingtype':[],
            'rsstype':[],
            'rssarticles':[],
            'timemodified':[],
            'warnafter':[],
            'blockafter':[],
            'blockperiod':[],
            'completiondiscussions':[],
            'completionreplies':[],
            'completionposts':[],
            'displaywordcount':[],
            'lockdiscussionafter':[]
        }
        self.moodle_forum = pd.DataFrame(forum, dtype=object)
        forum_discussions = {
            'id':[],
            'course':[],
            'forum':[],
            'name':[],
            'firstpost':[],
            'userid':[],
            'groupid':[],
            'assessed':[],
            'timemodified':[],
            'usermodified':[],
            'timestart':[],
            'timeend':[],
            'pinned':[],
            'timelocked':[]
        }
        self.moodle_forum_discussions = pd.DataFrame(forum_discussions, dtype=object)
        forum_grades = {
            'id':[],
            'forum':[],
            'itemnumber':[],
            'userid':[],
            'grade':[],
            'timecreated':[],
            'timemodified':[]
        }
        self.moodle_forum_grades = pd.DataFrame(forum_grades, dtype=object)
        lesson = {
            'id':[],
            'course':[],
            'name':[],
            'intro':[],
            'introformat':[],
            'practice':[],
            'modattempts':[],
            'usepassword':[],
            'password':[],
            'dependency':[],
            'conditions':[],
            'grade':[],
            'custom':[],
            'ongoing':[],
            'usemaxgrade':[],
            'maxanswers':[],
            'maxattempts':[],
            'review':[],
            'nextpagedefault':[],
            'feedback':[],
            'minquestions':[],
            'maxpages':[],
            'timelimit':[],
            'retake':[],
            'activitylink':[],
            'mediafile':[],
            'mediaheight':[],
            'mediawidth':[],
            'mediaclose':[],
            'slideshow':[],
            'width':[],
            'height':[],
            'bgcolor':[],
            'displayleft':[],
            'displayleftif':[],
            'progressbar':[],
            'available':[],
            'deadline':[],
            'timemodified':[],
            'completionendreached':[],
            'completiontimespent':[],
            'allowofflineattempts':[]
        }
        self.moodle_lesson = pd.DataFrame(lesson, dtype=object)
        lesson_answers = {
            'id':[],
            'lessonid':[],
            'pageid':[],
            'jumpto':[],
            'grade':[],
            'score':[],
            'flags':[],
            'timecreated':[],
            'timemodified':[],
            'answer':[],
            'answerformat':[],
            'response':[]
        }
        self.moodle_lesson_answers = pd.DataFrame(lesson_answers, dtype=object)
        lesson_attempts = {
            'id':[],
            'lessonid':[],
            'pageid':[],
            'userid':[],
            'answerid':[],
            'retry':[],
            'correct':[],
            'useranswer':[],
            'timeseen':[]
        }
        self.moodle_lesson_attempts = pd.DataFrame(lesson_attempts, dtype=object)
        lesson_grades = {
            'id':[],
            'lessonid':[],
            'userid':[],
            'grade':[],
            'late':[],
            'completed':[]
        }
        self.moodle_lesson_grades = pd.DataFrame(lesson_grades, dtype=object)
        messages = {
            'id':[],
            'useridfrom':[],
            'conversationid':[],
            'subject':[],
            'fullmessage':[],
            'fullmessageformat':[],
            'fullmessagehtml':[],
            'smallmessage':[],
            'timecreated':[],
            'fullmessagetrust':[],
            'customdata':[]
        }
        self.moodle_messages = pd.DataFrame(messages, dtype=object)
        message_conversations = {
            'id':[],
            'type':[],
            'name':[],
            'convhash':[],
            'component':[],
            'itemtype':[],
            'itemid':[],
            'contextid':[],
            'enabled':[],
            'timecreated':[],
            'timemodified':[]
        }
        self.moodle_message_conversations = pd.DataFrame(message_conversations, dtype=object)
        message_conversation_members = {
            'id':[],
            'conversationid':[],
            'userid':[],
            'timecreated':[]
        }
        self.moodle_message_conversation_members = pd.DataFrame(message_conversation_members, dtype=object)
        quiz = {
            'id':[],
            'course':[],
            'name':[],
            'intro':[],
            'introformat':[],
            'timeopen':[],
            'timeclose':[],
            'timelimit':[],
            'overduehandling':[],
            'graceperiod':[],
            'preferredbehavior':[],
            'canredoquestions':[],
            'attempts':[],
            'attemptonlast':[],
            'grademethod':[],
            'decimalpoints':[],
            'questiondecimalpoints':[],
            'reviewattempt':[],
            'reviewcorrectness':[],
            'reviewremarks':[],
            'reviewspecificfeedback':[],
            'reviewgeneralfeedback':[],
            'reviewrightanswer':[],
            'reviewoverallfeedback':[],
            'questionsperpage':[],
            'navmethod':[],
            'shuffleanswers':[],
            'sumgrades':[],
            'grade':[],
            'timecreated':[],
            'timemodified':[],
            'password':[],
            'subnet':[],
            'browsersecurity':[],
            'delay1':[],
            'delay2':[],
            'showuserpicture':[],
            'showblocks':[],
            'completionattemptsexhausted':[],
            'completionminattempts':[],
            'allowofflineattempts':[]
        }
        self.moodle_quiz = pd.DataFrame(quiz, dtype=object)
        quiz_attempts = {
            'id':[],
            'quiz':[],
            'userid':[],
            'attempt':[],
            'uniqueid':[],
            'layout':[],
            'currentpage':[],
            'preview':[],
            'state':[],
            'timestart':[],
            'timefinish':[],
            'timemodified':[],
            'timemodifiedoffline':[],
            'timecheckstate':[],
            'sumgrades':[],
            'gradednotificationsenttime':[]
        }
        self.moodle_quiz_attempts = pd.DataFrame(quiz_attempts, dtype=object)
        quiz_grades = {
            'id':[],
            'quiz':[],
            'userid':[],
            'grade':[],
            'timemodified':[]
        }
        self.moodle_quiz_grades = pd.DataFrame(quiz_grades, dtype=object)
        role = {
            'id':[],
            'name':[],
            'shortname':[],
            'description':[],
            'sortorder':[],
            'archetype':[]
        }
        self.moodle_role = pd.DataFrame(role, dtype=object)
        role_assignments = {
            'id':[],
            'roleid':[],
            'contextid':[],
            'userid':[],
            'timemodified':[],
            'modifierid':[],
            'component':[],
            'itemid':[],
            'sortorder':[]
        }
        self.moodle_role_assignments = pd.DataFrame(role_assignments, dtype=object)
        user = {
            'id':[],
            'auth':[],
            'confirmed':[],
            'policyagreed':[],
            'deleted':[],
            'suspended':[],
            'mnethostid':[],
            'username':[],
            'password':[],
            'idnumber':[],
            'firstname':[],
            'lastname':[],
            'email':[],
            'emailstop':[],
            'phone1':[],
            'phone2':[],
            'institution':[],
            'department':[],
            'address':[],
            'city':[],
            'country':[],
            'lang':[],
            'calendartype':[],
            'theme':[],
            'timezone':[],
            'firstaccess':[],
            'lastaccess':[],
            'lastlogin':[],
            'currentlogin':[],
            'lastip':[],
            'secret':[],
            'picture':[],
            'description':[],
            'descriptionformat':[],
            'mailformat':[],
            'maildigest':[],
            'maildisplay':[],
            'autosubscribe':[],
            'trackforums':[],
            'timecreated':[],
            'timemodified':[],
            'trustbitmask':[],
            'imagealt':[],
            'lastnamephonetic':[],
            'firstnamephonetic':[],
            'middlename':[],
            'alternatename':[],
            'moodlenetprofile':[]
        }
        self.moodle_user = pd.DataFrame(user, dtype=object)
        user_enrolments = {
            'id':[],
            'status':[],
            'enrolid':[],
            'userid':[],
            'timestart':[],
            'timeend':[],
            'modifierid':[],
            'timecreated':[],
            'timemodified':[]
        }
        self.moodle_user_enrolments = pd.DataFrame(user_enrolments, dtype=object)

        sourcepath = 'stage1/Transactional/test_data/v0.1/'
        self.students = oea.load_csv(sourcepath + 'base_students/', header=True).toPandas()
        self.schools = oea.load_csv(sourcepath + 'base_schools/', header=True).toPandas()
        self.courses = oea.load_csv(sourcepath + 'base_courses/', header=True).toPandas()
        self.sections = oea.load_csv(sourcepath + 'base_sections/', header=True).toPandas()
        self.enrollment = oea.load_csv(sourcepath + 'base_enrollment/', header=True).toPandas()
        self.studentroleid = self.faker.uuid4()
        self.modifieruserid = self.faker.uuid4() # NOTE: simulate single IT dept admin. modifying the Moodle data
        # currently, only one instance/row of seasonal context (i.e. Spring) - check if this is valid
        self.contextid = self.faker.uuid4()

    def genMoodle(self,startdate='2022-01-01T00:00:00',enddate='2022-06-01T00:00:00', ed_level='k12', gen_activity=True,num_activities=5):
        self.edlevel = ed_level
        self.startdate = startdate
        self.enddate = enddate
        self.genUser()
        self.genCourse()
        self.genCourseCategories()
        self.genRole()
        self.genContext()
        self.genRoleAssignments()
        self.genEnrol()
        self.genUserEnrolments()
        if gen_activity:
            self.genAssign_tables(num_assigns=num_activities)
            # TODO: the remaining activity generators (forum, lesson, message, quiz) are not yet implemented
        logger.info('Finished Moodle generation.')

    def __get_daterange(self):
        daterange = []
        startdate = dt.datetime(2022,1,3)
        enddate = dt.datetime(2022,1,28)
        while(startdate < enddate):
            daterange.append(startdate)
            startdate = startdate + dt.timedelta(days=1)
        return daterange
    
    def genUser(self):
        for index, student in self.students.iterrows():
            id = student['StudentID']
            auth = ''
            confirmed = '1'
            policyagreed = '1'
            deleted = '0'
            suspended = '0'
            mnethostid = 0 
            firstname = student['FirstName']
            lastname = student['LastName']
            username = f'{firstname}{lastname}'
            password = self.faker.uuid4()
            idnumber = '' # NOTE: unsure
            email = student['Email']
            emailstop = 0
            phone1 = student['Phone']
            phone2 = ''
            institution = student['SchoolName']
            department = student['Grade'] # NOTE: temp. using student grade
            address = student['Address']
            city = student['City']
            country = 'US'
            lang = 'en'
            calendartype = 'gregorian'
            theme = ''
            timezone = 'PST'
            firstaccess = 0 # NOTE: schema says BIGINT columntype; Moodle stores these as Unix epoch seconds, so 0 is a placeholder
            lastaccess = 0 # ^
            lastlogin = 0 # ^
            currentlogin = 0 # ^
            lastip = ''
            secret = ''
            picture = 0
            description = 'null'
            descriptionformat = 1
            mailformat = 1
            maildigest = 0
            maildisplay = 2
            autosubscribe = 1
            trackforums = 1
            timecreated = self.startdate
            timemodified = 0
            trustbitmask = 0
            imagealt = 'null'
            lastnamephonetic = 'null'
            firstnamephonetic = 'null'
            middlename = student['MiddleName']
            alternatename = 'null'
            moodlenetprofile = f'{firstname}{lastname}@moodle.net'
            self.moodle_user.loc[len(self.moodle_user.index)] = [id,auth,confirmed,policyagreed,deleted,suspended,mnethostid,username,password,idnumber,firstname,lastname,email,emailstop,phone1,phone2, \
                                                                institution,department,address,city,country,lang,calendartype,theme,timezone,firstaccess,lastaccess,lastlogin,currentlogin,lastip,secret, \
                                                                picture,description,descriptionformat,mailformat,maildigest,maildisplay,autosubscribe,trackforums,timecreated,timemodified,trustbitmask, \
                                                                imagealt,lastnamephonetic,firstnamephonetic,middlename,alternatename,moodlenetprofile]
        self.writetofile('user', self.moodle_user)

    def genCourse(self):
        for index, section in self.sections.iterrows():
            id = section['SectionID']
            category = section['CourseID'] # NOTE: unsure about this field. Could also use SectionSubject
            sortorder = 0
            fullname = section['SectionName']
            shortname = section['CourseName']
            idnumber = '' # NOTE: unsure
            summary = 'null'
            summaryformat = 0
            format = 'topics' # NOTE: unsure
            showgrades = 1
            newsitems = 1
            startdate = self.startdate
            enddate = self.enddate
            relativedatesmode = 0 # NOTE: unsure
            marker = 0
            maxbytes = 0
            legacyfiles = 0
            showreports = 0
            visible = 1
            visibleold = 1
            downloadcontent = 'null'
            groupmode = 0 # NOTE: unsure
            groupmodeforce = 0
            defaultgroupingid = 0 # NOTE: unsure
            lang = 'en'
            calendartype = 'gregorian'
            theme = ''
            timecreated = self.startdate
            timemodified = 0
            requested = 0
            enablecompletion = 0 # NOTE: unsure
            completionnotify = 0
            cacherev = 0
            originalcourseid = section['CourseID'] # NOTE: unsure
            showactivitydates = 1
            showcompletionconditions = 0
            self.moodle_course.loc[len(self.moodle_course.index)] = [id,category,sortorder,fullname,shortname,idnumber,summary,summaryformat,format,showgrades,newsitems,startdate,enddate,relativedatesmode, \
                                                                    marker,maxbytes,legacyfiles,showreports,visible,visibleold,downloadcontent,groupmode,groupmodeforce,defaultgroupingid,lang,calendartype, \
                                                                    theme,timecreated,timemodified,requested,enablecompletion,completionnotify,cacherev,originalcourseid,showactivitydates,showcompletionconditions]
        self.writetofile('course', self.moodle_course)

    def genCourseCategories(self):
        for index, course in self.courses.iterrows():
            id = course['CourseID']
            name = course['CourseName']
            idnumber = 'null' # NOTE: unsure about this field
            description = ''
            descriptionformat = 0
            parent = 0 # NOTE: unsure
            sortorder = 0
            coursecount = 0 # NOTE: find way to aggregate the sections base-truth table to count
            visible = 1 # NOTE: unsure
            visibleold = 0
            timemodified = self.startdate
            depth = 0 # NOTE: unsure
            path = '' # NOTE: unsure
            theme = 'null' 
            self.moodle_course_categories.loc[len(self.moodle_course_categories.index)] = [id,name,idnumber,description,descriptionformat,parent,sortorder,coursecount,visible,visibleold,timemodified, \
                                                                    depth,path,theme]
        self.writetofile('course_categories', self.moodle_course_categories)

    def genRole(self):
        id = self.studentroleid
        name = 'Student'
        shortname = '' # NOTE: unsure for this field
        description = 'Student in the education system'
        sortorder = 0 # NOTE: unsure
        archetype = '' # NOTE: unsure
        self.moodle_role.loc[len(self.moodle_role.index)] = [id,name,shortname,description,sortorder,archetype]
        # instructor data is not generated as of now.
        id = self.faker.uuid4()
        name = 'Instructor'
        shortname = ''
        description = 'Instructor in the education system'
        sortorder = 0
        archetype = ''
        self.moodle_role.loc[len(self.moodle_role.index)] = [id,name,shortname,description,sortorder,archetype]
        self.writetofile('role', self.moodle_role)
    
    def genContext(self):
        # GENERALLY UNSURE IF THIS TABLE GENERATION PROCESS ACCURATELY REFLECTS EXPECTED PROD. DATA.
        for index, school in self.schools.iterrows():
            # base level: one per school
            id = self.faker.uuid4()
            contextlevel = 0 
            instanceid = school['SchoolID'] 
            schoolname = school['SchoolName']
            path = f'{schoolname}' 
            depth = 1 
            locked = 0
            self.moodle_context.loc[len(self.moodle_context.index)] = [id,contextlevel,instanceid,path,depth,locked]
            # level 1: one per school /session (e.g. semester, trimester, etc.)
            id = self.faker.uuid4()
            contextlevel = 1 
            instanceid = school['SchoolID'] 
            path = f'{schoolname}/Spring 2022' 
            depth = 1 
            locked = 0
            self.moodle_context.loc[len(self.moodle_context.index)] = [id,contextlevel,instanceid,path,depth,locked]
        self.writetofile('context', self.moodle_context)

    def genRoleAssignments(self):
        itemid = self.faker.uuid4() # NOTE: unsure if this is constant across role assignments; may need to use enrollment id per student per course instead
        # use context table to find context ID per student based on the school they attend
        dfContext = spark.createDataFrame(self.moodle_context)
        for index, student in self.students.iterrows():
            id = self.faker.uuid4()
            roleid = self.studentroleid
            contextid = dfContext.filter(dfContext['instanceid']==student['SchoolID']).filter(dfContext['contextlevel'] == 1).collect()[0][0]
            userid = student['StudentID']
            timemodified = self.startdate
            modifierid = self.modifieruserid
            component = '' # NOTE: unsure
            sortorder = 0
            self.moodle_role_assignments.loc[len(self.moodle_role_assignments.index)] = [id,roleid,contextid,userid,timemodified,modifierid,component,itemid,sortorder]
        self.writetofile('role_assignments', self.moodle_role_assignments)
    
    def genEnrol(self):
        for index, section in self.sections.iterrows():
            id = self.faker.uuid4()
            enrol = '' # NOTE: unsure about this field.
            status = 0
            courseid = section['SectionID']
            sortorder = 0
            name = 'Enrolled student' # NOTE: unsure
            enrolperiod = 'null' # NOTE: unsure
            enrolstartdate = self.startdate
            enrolenddate = self.enddate
            expirynotify = 0
            expirythreshold = 0
            notifyall = 0
            password = self.faker.uuid4()
            cost = 'null' 
            currency = 'null'
            roleid = self.studentroleid
            customint1 = ''
            customint2 = ''
            customint3 = ''
            customint4 = ''
            customint5 = ''
            customint6 = ''
            customint7 = ''
            customint8 = ''
            customchar1 = ''
            customchar2 = ''
            customchar3 = ''
            customdec1 = 0
            customdec2 = 0
            customtext1 = 'null'
            customtext2 = 'null'
            customtext3 = 'null'
            customtext4 = 'null'
            timecreated = self.startdate
            timemodified = self.startdate
            self.moodle_enrol.loc[len(self.moodle_enrol.index)] = [id,enrol,status,courseid,sortorder,name,enrolperiod,enrolstartdate,enrolenddate,expirynotify,expirythreshold,notifyall,password, \
                                                                    cost,currency,roleid,customint1,customint2,customint3,customint4,customint5,customint6,customint7,customint8,customchar1, \
                                                                    customchar2,customchar3,customdec1,customdec2,customtext1,customtext2,customtext3,customtext4,timecreated,timemodified]
        self.writetofile('enrol', self.moodle_enrol)

    def genUserEnrolments(self):
        # cast both base-truth enrollment and moodle_enrol tables to Spark, then join together for enrollment id per student in a class
        dfBT_enroll = spark.createDataFrame(self.enrollment)
        dfMoodle_enrol = spark.createDataFrame(self.moodle_enrol)
        dfMoodle_enrol = dfMoodle_enrol.select('id', 'courseid').withColumnRenamed('courseid', 'course_id')
        dfEnroll = dfMoodle_enrol.join(dfBT_enroll, dfMoodle_enrol.course_id == dfBT_enroll.SectionID, how='inner').drop('course_id')
        enroll_joined = dfEnroll.toPandas()
        for index, enroll in enroll_joined.iterrows():
            id = self.faker.uuid4()
            status = 0 # NOTE: unsure - 0 means active participation according to doc.
            enrolid = enroll['id']
            userid = enroll['StudentID']
            timestart = self.startdate
            timeend = self.enddate
            modifierid = self.modifieruserid
            timecreated = self.startdate
            timemodified = self.startdate
            self.moodle_user_enrolments.loc[len(self.moodle_user_enrolments.index)] = [id,status,enrolid,userid,timestart,timeend,modifierid,timecreated,timemodified]
        self.writetofile('user_enrolments', self.moodle_user_enrolments)

    def genAssign_tables(self,num_assigns=5):
        """This method generates the 5 assign tables: assign_user_mapping, assign, assign_submission, assignsubmission_file, assign_grades"""
        n = num_assigns
        date_range = self.__get_daterange()
        while n > 0:
            # choose the day the assignment was assigned
            assign_day = random.choice(date_range)
            # randomly choose which course to generate the assignment for
            num_courses = len(self.moodle_course.index) - 1
            random_course = random.randint(0,num_courses)
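            # extract the scalar course id so it can be compared against SectionID below
            # NOTE: assumes moodle_course has an 'id' column that matches the base-truth SectionID
            course = self.moodle_course.loc[random_course, 'id']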
            # find number of students in course
            dfBT_enroll = spark.createDataFrame(self.enrollment)
            dfEnroll = dfBT_enroll.filter(dfBT_enroll['SectionID'] == course)
            num_students_in_course = dfEnroll.count()
            # generate a fake assignment ID
            assign_id = self.faker.uuid4()
            # finally generate the assign tables
            self._genAssignUserMapping(assign_id,dfEnroll) 
            num_students_not_submitted,max_attempts = self._genAssign(assign_id,course,assign_day,num_students_in_course)
            num_students_submit = num_students_in_course - num_students_not_submitted # calculate num students to have submitted
            self._genAssignSubmission(assign_id,assign_day,num_students_submit,dfEnroll,max_attempts)
            self._genAssignSubmissionFile(assign_id)
            self._genAssignGrades(assign_id,assign_day)
            n = n - 1
        self.writetofile('assign_user_mapping', self.moodle_assign_user_mapping)
        self.writetofile('assign', self.moodle_assign)
        self.writetofile('assign_submission', self.moodle_assign_submission)
        self.writetofile('assignsubmission_file', self.moodle_assignsubmission_file)
        self.writetofile('assign_grades', self.moodle_assign_grades)

    def _genAssignUserMapping(self,assignid,dfEnroll):
        df_enroll = dfEnroll.toPandas()
        for index, enroll in df_enroll.iterrows():
            id = self.faker.uuid4()
            assignment = assignid
            userid = enroll['StudentID']
            self.moodle_assign_user_mapping.loc[len(self.moodle_assign_user_mapping.index)] = [id,assignment,userid]

    def _genAssign(self,assignid,courseid,assignday,num_students_in_course):
        # NOTE: generally unsure whether there's supposed to be one row per assignment
        # or whether there's one row per student connected to an assignment.
        # This code assumes the former.
        
        id = assignid
        course = courseid
        name = 'Assignment for Course' # NOTE: can be modified for unique names
        intro = 'This is an assignment for a course'
        introformat = 0
        alwaysshowdescription = 1
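        # NOTE: used here as the number of students who do not submit (returned to the caller);
        # unsure whether Moodle's own column is a 0/1 flag rather than a count.
        nosubmissions = random.randint(0,num_students_in_course)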
        submissiondrafts = 0
        sendnotifications = 0
        sendlatenotifications = 0
        allowsubmissionsfromdate = 0
        grade = 100
        timemodified = assignday
        duedate = timemodified + dt.timedelta(days=7)
        requiresubmissionstatement = 0
        completionsubmit = 0 # NOTE: unsure
        cutoffdate = timemodified + dt.timedelta(days=7)
        gradingduedate = timemodified + dt.timedelta(days=14)
        teamsubmission = 0
        requireallteammemberssubmit = 0
        teamsubmissiongroupingid = 0
        blindmarking = 0
        hidegrader = 0
        revealidentities = 0
        attemptreopenmethod = 'none'
        maxattempts = random.randint(1,3)
        markingworkflow = 0
        markingallocation = 0
        sendstudentnotifications = 0
        preventsubmissionnotingroup = 0
        activity = 'Assignment Progress for the Course'
        activityformat = 0
        timelimit = 0
        submissionattachments = 3
        self.moodle_assign.loc[len(self.moodle_assign.index)] = [id,course,name,intro,introformat,alwaysshowdescription,nosubmissions,submissiondrafts,sendnotifications,sendlatenotifications, \
                                                                    duedate,allowsubmissionsfromdate,grade,timemodified,requiresubmissionstatement,completionsubmit,cutoffdate,gradingduedate,teamsubmission, \
                                                                    requireallteammemberssubmit,teamsubmissiongroupingid,blindmarking,hidegrader,revealidentities,attemptreopenmethod,maxattempts,markingworkflow, \
                                                                    markingallocation,sendstudentnotifications,preventsubmissionnotingroup,activity,activityformat,timelimit,submissionattachments]
        return nosubmissions,maxattempts

    def _genAssignSubmission(self,assignid,assignday,num_students_submit,dfEnroll,maxattempts):
        # randomly sample the students enrolled in the course for assignment submissions
        df_enroll = dfEnroll.toPandas()
        df_students_submit = df_enroll.sample(n=num_students_submit)
        for index, student in df_students_submit.iterrows():
            # assign static variables per student submission(s)
            userid = student['StudentID']
            random_num_attempts = random.randint(1,maxattempts)
            assignment = assignid
            latest = random_num_attempts
            timecreated = assignday + dt.timedelta(days=random.randint(1,6),hours=random.randint(0,23),minutes=random.randint(0,59))
            if random_num_attempts == 1:
                # if only one attempt for the student, then simply generate that single submission
                timemodified = timecreated
                timestarted = timecreated
                id = self.faker.uuid4()
                status = 'SUBMITTED'
                groupid = 0
                attemptnumber = 1
                latest = random_num_attempts
                self.moodle_assign_submission.loc[len(self.moodle_assign_submission.index)] = [id,assignment,userid,timecreated,timemodified,timestarted,status,groupid,attemptnumber,latest]
            else:
                # multiple attempts: the final attempt is submitted at last_attempt_daytime
                last_attempt_daytime = timecreated + dt.timedelta(days=random.randint(0,3),hours=random.randint(0,23),minutes=random.randint(0,59))
                previous_attempt = None # start time of the previous attempt from the same student
                # iterate through total number of attempts (inclusive of the final attempt)
                for n in range(1,random_num_attempts + 1):
                    id = self.faker.uuid4()
                    status = 'SUBMITTED'
                    groupid = 0
                    attemptnumber = n
                    timemodified = last_attempt_daytime
                    if n == random_num_attempts:
                        timestarted = last_attempt_daytime
                    else:
                        # earlier attempts start after the previous attempt (or after creation), but no later than the final attempt
                        base_time = timecreated if previous_attempt is None else previous_attempt
                        timestarted = base_time + dt.timedelta(days=random.randint(0,2),hours=random.randint(0,23),minutes=random.randint(0,59))
                        while timestarted > timemodified:
                            timestarted = base_time + dt.timedelta(days=random.randint(0,2),hours=random.randint(0,23),minutes=random.randint(0,59))
                    previous_attempt = timestarted
                    self.moodle_assign_submission.loc[len(self.moodle_assign_submission.index)] = [id,assignment,userid,timecreated,timemodified,timestarted,status,groupid,attemptnumber,latest]

    def _genAssignSubmissionFile(self,assignid):
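        # only add file records for submissions that belong to this assignment
        df_submissions = self.moodle_assign_submission.loc[self.moodle_assign_submission['assignment']==assignid]
        for index, submission_row in df_submissions.iterrows():
            id = self.faker.uuid4()
            assignment = assignid
            submission = submission_row['id']
            numfiles = random.randint(1,3)
            self.moodle_assignsubmission_file.loc[len(self.moodle_assignsubmission_file.index)] = [id,assignment,submission,numfiles]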
    
    def _genAssignGrades(self,assignid,assignday):
        # grab the latest submission for each student that submitted this assignment and grade them
        df_submissions = self.moodle_assign_submission.copy()
        df_submissions = df_submissions.loc[(df_submissions['assignment']==assignid) & (df_submissions['attemptnumber']==df_submissions['latest']),['userid','latest']]
        # set any static fields: grading happens 9-14 days after the assignment day
        assign_graded_datetime = assignday + dt.timedelta(days=random.randint(9,14),hours=random.randint(6,22),minutes=random.randint(0,59))
        for index, submission in df_submissions.iterrows():
            id = self.faker.uuid4()
            assignment = assignid
            userid = submission['userid']
            timecreated = assign_graded_datetime
            timemodified = assign_graded_datetime
            grader = 0 # NOTE: unsure
            grade = '{}'.format(decimal.Decimal(random.randrange(4000, 10000))/100)
            attemptnumber = submission['latest']
            self.moodle_assign_grades.loc[len(self.moodle_assign_grades.index)] = [id,assignment,userid,timecreated,timemodified,grader,grade,attemptnumber]
    

    def genQuiz_tables(self,num_quizzes=5):
        """This method generates the 3 quiz tables: quiz, quiz_attempts and quiz_grades"""
        n = num_quizzes
        date_range = self.__get_daterange()
        while n > 0:
            # choose the day the quiz was assigned
            quiz_day = random.choice(date_range)
            # randomly choose which course to generate the quiz for
            num_courses = len(self.moodle_course.index) - 1
            random_course = random.randint(0,num_courses)
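            # extract the scalar course id so it can be compared against SectionID below
            # NOTE: assumes moodle_course has an 'id' column that matches the base-truth SectionID
            course = self.moodle_course.loc[random_course, 'id']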
            # find number of students in course
            dfBT_enroll = spark.createDataFrame(self.enrollment)
            dfEnroll = dfBT_enroll.filter(dfBT_enroll['SectionID'] == course)
            num_students_in_course = dfEnroll.count()
            # generate a fake quiz ID
            quiz_id = self.faker.uuid4()
            # finally generate the quiz tables
            self._genQuiz(quiz_id,course)
            # TODO: generate quiz_attempts and quiz_grades once those methods are finished
            n = n - 1
        self.writetofile('quiz', self.moodle_quiz)
        self.writetofile('quiz_attempts', self.moodle_quiz_attempts)
        self.writetofile('quiz_grades', self.moodle_quiz_grades)

    def _genQuiz(self,quizid,courseid):
        # NEEDS TO BE FINISHED
        # NOTE: every field set to None below is a placeholder so the class parses;
        # replace each with a generated value before relying on the quiz table.
        id = quizid
        course = courseid
        name = None
        intro = None
        introformat = None
        timeopen = None
        timeclose = None
        timelimit = None
        overduehandling = None
        graceperiod = None
        preferredbehavior = None
        canredoquestions = None
        attempts = None
        attemptonlast = None
        grademethod = None
        decimalpoints = None
        questiondecimalpoints = None
        reviewattempt = None
        reviewcorrectness = None
        reviewmarks = None
        reviewspecificfeedback = None
        reviewgeneralfeedback = None
        reviewrightanswer = None
        reviewoverallfeedback = None
        questionsperpage = None
        navmethod = None
        shuffleanswers = None
        sumgrades = None
        grade = None
        timecreated = None
        timemodified = None
        password = None
        subnet = None
        browsersecurity = None
        delay1 = None
        delay2 = None
        showuserpicture = None
        showblocks = None
        completionattemptsexhausted = None
        completionminattempts = None
        allowofflineattempts = 0
        self.moodle_quiz.loc[len(self.moodle_quiz.index)] = [id,course,name,intro,introformat,timeopen,timeclose,timelimit,overduehandling,graceperiod,preferredbehavior, \
                                                            canredoquestions,attempts,attemptonlast,grademethod,decimalpoints,questiondecimalpoints,reviewattempt,reviewcorrectness, \
                                                            reviewmarks,reviewspecificfeedback,reviewgeneralfeedback,reviewrightanswer,reviewoverallfeedback,questionsperpage,navmethod, \
                                                            shuffleanswers,sumgrades,grade,timecreated,timemodified,password,subnet,browsersecurity,delay1,delay2,showuserpicture, \
                                                            showblocks,completionattemptsexhausted,completionminattempts,allowofflineattempts]
    
    def _genQuizAttempts(self):
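        # NEEDS TO BE FINISHED
        # NOTE: the loop source and field list below are still drafts and may not match the
        # quiz_attempts schema; None values are placeholders so the class parses.
        for index, assign in self.moodle_assign.iterrows():
            id = None
            assignment = None
            userid = None
            timecreated = None
            timemodified = None
            grader = None
            grade = None
            attemptnumber = None
            self.moodle_quiz_attempts.loc[len(self.moodle_quiz_attempts.index)] = [id,assignment,userid,timecreated,timemodified,grader,grade,attemptnumber]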

    def writetofile(self,filename,dfout):
        # turns the pandas df into a pyspark df, and then writes out the generated tables to stage1
        genfilepath = 'stage1/Transactional/test_data/v0.1/moodle_gen/' + filename + '/snapshot_batch_data/rundate='+self.currentDateTime
        dfOutfile = spark.createDataFrame(dfout)
        dfOutfile = dfOutfile.na.drop('all')
        dfOutfile.coalesce(1).write.save(oea.to_url(genfilepath), format='csv', mode='overwrite', header='true', mergeSchema='true')