# Test Data Generation: Moodle Roster Tables Class

**Affiliation**: *Kwantum Edu Analytics*. **Last Modified**: *5/9/2023*.

This OEA test data generation class notebook generates fictitious Moodle tables, as seen in the Moodle module. It is required to run the moodle_test_data_gen_demo notebook successfully.

For the schemas of all Moodle tables outlined below, see: https://www.examulator.com/er/output/index.html

This class notebook relies primarily on the OEA_py class notebook, the ```Faker``` and ```random``` Python packages, and already-generated base-truth tables to generate **8** Moodle module SIS/rostering tables (items 9 and 10 in the list below are defined but not yet generated):

 1. **cohort**
 2. **course**
 3. **course_categories**
 4. **enrol**
 5. **role**
 6. **role_assignments**
 7. **user** 
 8. **user_enrolments**
 9. **user_info_data** <- Not created at the moment
 10. **user_info_field** <- Not created at the moment

The main method, ```genMoodleRoster(startdate, enddate, reportgendate, use_general_module_base_truth)```, generates the tables described above. Its parameters are:
  - *startdate*: roster start date, as a string in ```YYYY-MM-DDTHH:MM:SS``` format.
  - *enddate*: roster end date, in the same format.
  - *reportgendate*: date the report(s) were generated (i.e., the fictitious date on which all tables were landed in the data lake), in the same format.
  - *use_general_module_base_truth*: boolean argument indicating whether to use the general-module base-truth tables (i.e., base-truth tables that link students, courses, etc. across OEA modules)
    * If ```True``` - lands the general-module base-truth tables if they don't already exist, and generates Moodle test data based on these tables.
    * If ```False``` - uses the default, user-generated base-truth tables to generate Moodle test data.
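
The three date arguments are parsed with the format string ```%Y-%m-%dT%H:%M:%S```. As a minimal sketch, the strings could be checked before they are passed to ```genMoodleRoster```. The helper name ```parse_roster_dates``` is hypothetical, and the ordering check is an assumption of this sketch, not something the class itself enforces.

```python
import datetime as dt

# Same format string that genMoodleRoster passes to strptime.
ROSTER_DATE_FORMAT = "%Y-%m-%dT%H:%M:%S"


def parse_roster_dates(startdate, enddate, reportgendate):
    """Hypothetical helper: parse the three date strings and check their order."""
    start = dt.datetime.strptime(startdate, ROSTER_DATE_FORMAT)
    end = dt.datetime.strptime(enddate, ROSTER_DATE_FORMAT)
    report = dt.datetime.strptime(reportgendate, ROSTER_DATE_FORMAT)
    # Assumption: the report date is expected to fall inside the roster window.
    if not start <= report <= end:
        raise ValueError("expected startdate <= reportgendate <= enddate")
    return start, end, report


# These are the default argument values of genMoodleRoster.
start, end, report = parse_roster_dates(
    "2022-01-01T00:00:00", "2022-06-01T00:00:00", "2022-02-02T00:00:00"
)
```

In an OEA session, the same strings would then be passed on, for example ```MoodleRosterDataGen().genMoodleRoster(startdate, enddate, reportgendate, use_general_module_base_truth=True)```.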

In [1]:
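import logging
import random, decimal
import requests  # used in genMoodleRoster to download the general-module base-truth CSVs
from faker import Faker
import pandas as pd
import datetime as dt
import numpy as np
from pyspark.sql import functions as F

# NOTE: `oea`, `spark` and `logger` are not defined in this notebook; they are expected to
# already exist in the notebook session (e.g., after running the OEA_py class notebook).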

class MoodleRosterDataGen():
    def __init__(self, startdate='2022-01-03T00:00:00', enddate='2022-06-03T00:00:00'):
        #self.startdate = startdate
        #self.enddate = enddate
        
        self.faker = Faker('en_US')

        # set current datetime for rundate folder for writing out files
        currentDate = dt.datetime.now()
        self.currentDateTime = currentDate.strftime("%Y-%m-%d %H-%M-%S")

        # initialize dfs for each Moodle table to be generated
        cohort = {
            'id':[],
            'contextid':[],
            'name':[],
            'idnumber':[],
            'description':[],
            'descriptionformat':[],
            'visible':[],
            'component':[],
            'timecreated':[],
            'timemodified':[],
            'theme':[]
        }
        self.moodle_cohort = pd.DataFrame(cohort, dtype=object)
        course = {
            'id':[],
            'category':[],
            'sortorder':[],
            'fullname':[],
            'shortname':[],
            'idnumber':[],
            'summary':[],
            'summaryformat':[],
            'format':[],
            'showgrades':[],
            'newsitems':[],
            'startdate':[],
            'enddate':[],
            'relativedatesmode':[],
            'marker':[],
            'maxbytes':[],
            'legacyfiles':[],
            'showreports':[],
            'visible':[],
            'visibleold':[],
            'downloadcontent':[],
            'groupmode':[],
            'groupmodeforce':[],
            'defaultgroupingid':[],
            'lang':[],
            'calendartype':[],
            'theme':[],
            'timecreated':[],
            'timemodified':[],
            'requested':[],
            'enablecompletion':[],
            'completionnotify':[],
            'cacherev':[],
            'originalcourseid':[],
            'showactivitydates':[],
            'showcompletionconditions':[]
        }
        self.moodle_course = pd.DataFrame(course, dtype=object)
        course_categories = {
            'id':[],
            'name':[],
            'idnumber':[],
            'description':[],
            'descriptionformat':[],
            'parent':[],
            'sortorder':[],
            'coursecount':[],
            'visible':[],
            'visibleold':[],
            'timemodified':[],
            'depth':[],
            'path':[],
            'theme':[]
        }
        self.moodle_course_categories = pd.DataFrame(course_categories, dtype=object)
        enrol = {
            'id':[],
            'enrol':[],
            'status':[],
            'courseid':[],
            'sortorder':[],
            'name':[],
            'enrolperiod':[],
            'enrolstartdate':[],
            'enrolenddate':[],
            'expirynotify':[],
            'expirythreshold':[],
            'notifyall':[],
            'password':[],
            'cost':[],
            'currency':[],
            'roleid':[],
            'customint1':[],
            'customint2':[],
            'customint3':[],
            'customint4':[],
            'customint5':[],
            'customint6':[],
            'customint7':[],
            'customint8':[],
            'customchar1':[],
            'customchar2':[],
            'customchar3':[],
            'customdec1':[],
            'customdec2':[],
            'customtext1':[],
            'customtext2':[],
            'customtext3':[],
            'customtext4':[],
            'timecreated':[],
            'timemodified':[]
        }
        self.moodle_enrol = pd.DataFrame(enrol, dtype=object)
        role = {
            'id':[],
            'name':[],
            'shortname':[],
            'description':[],
            'sortorder':[],
            'archetype':[]
        }
        self.moodle_role = pd.DataFrame(role, dtype=object)
        role_assignments = {
            'id':[],
            'roleid':[],
            'contextid':[],
            'userid':[],
            'timemodified':[],
            'modifierid':[],
            'component':[],
            'itemid':[],
            'sortorder':[]
        }
        self.moodle_role_assignments = pd.DataFrame(role_assignments, dtype=object)
        user = {
            'id':[],
            'auth':[],
            'confirmed':[],
            'policyagreed':[],
            'deleted':[],
            'suspended':[],
            'mnethostid':[],
            'username':[],
            'password':[],
            'idnumber':[],
            'firstname':[],
            'lastname':[],
            'email':[],
            'emailstop':[],
            'phone1':[],
            'phone2':[],
            'institution':[],
            'department':[],
            'address':[],
            'city':[],
            'country':[],
            'lang':[],
            'calendartype':[],
            'theme':[],
            'timezone':[],
            'firstaccess':[],
            'lastaccess':[],
            'lastlogin':[],
            'currentlogin':[],
            'lastip':[],
            'secret':[],
            'picture':[],
            'description':[],
            'descriptionformat':[],
            'mailformat':[],
            'maildigest':[],
            'maildisplay':[],
            'autosubscribe':[],
            'trackforums':[],
            'timecreated':[],
            'timemodified':[],
            'trustbitmask':[],
            'imagealt':[],
            'lastnamephonetic':[],
            'firstnamephonetic':[],
            'middlename':[],
            'alternatename':[],
            'moodlenetprofile':[]
        }
        self.moodle_user = pd.DataFrame(user, dtype=object)
        user_enrolments = {
            'id':[],
            'status':[],
            'enrolid':[],
            'userid':[],
            'timestart':[],
            'timeend':[],
            'modifierid':[],
            'timecreated':[],
            'timemodified':[]
        }
        self.moodle_user_enrolments = pd.DataFrame(user_enrolments, dtype=object)
        user_info_data = {
            'id':[],
            'userid':[],
            'fieldid':[],
            'data':[],
            'dataformat':[]
        }
        self.moodle_user_info_data = pd.DataFrame(user_info_data, dtype=object)
        user_info_field = {
            'id':[],
            'shortname':[],
            'name':[],
            'datatype':[],
            'description':[],
            'descriptionformat':[],
            'categoryid':[],
            'sortorder':[],
            'required':[],
            'locked':[],
            'visible':[],
            'forceunique':[],
            'signup':[],
            'defaultdata':[],
            'defaultdataformat':[],
            'param1':[],
            'param2':[],
            'param3':[],
            'param4':[],
            'param5':[]
        }
        self.moodle_user_info_field = pd.DataFrame(user_info_field, dtype=object)

        self.studentroleid = self.faker.uuid4()
        self.instructorroleid = self.faker.uuid4()
        self.contextid = self.faker.uuid4() # NOTE: single overarching context ID for the fictitious education system
        self.modifieruserid = self.faker.uuid4() # NOTE: simulate single IT dept admin. modifying the Moodle data

    def genMoodleRoster(self,startdate='2022-01-01T00:00:00',enddate='2022-06-01T00:00:00',reportgendate='2022-02-02T00:00:00',use_general_module_base_truth=False):
        self.startdate = dt.datetime.strptime(startdate, "%Y-%m-%dT%H:%M:%S")
        self.enddate = dt.datetime.strptime(enddate, "%Y-%m-%dT%H:%M:%S")
        self.reportdate = dt.datetime.strptime(reportgendate, "%Y-%m-%dT%H:%M:%S")
        self.use_general_module_base_truth = use_general_module_base_truth
        if use_general_module_base_truth:
            sourcepath = 'stage1/Transactional/test_data/v0.1/base_general_modules'
            if oea.path_exists(sourcepath):
                logger.info('General module base-truth tables already exist - delete the "base_general_modules" folder/directory if you want to replace these.')
            else:
                # manually delete and replace the general module base_truth_tables CSVs as needed
                logger.info('General module base-truth tables do not currently exist - landing in stage1/.../test_data/v0.1/base_general_modules/')
                data = requests.get('https://raw.githubusercontent.com/microsoft/OpenEduAnalytics/main/modules/module_test_data_generation_kit/test_data/base_truth_tables/students.csv').text
                oea.land(data, 'test_data/v0.1/base_general_modules/base_students', 'general_module_base_truth_students.csv', oea.SNAPSHOT_BATCH_DATA)
                data = requests.get('https://raw.githubusercontent.com/microsoft/OpenEduAnalytics/main/modules/module_test_data_generation_kit/test_data/base_truth_tables/schools.csv').text
                oea.land(data, 'test_data/v0.1/base_general_modules/base_schools', 'general_module_base_truth_schools.csv', oea.SNAPSHOT_BATCH_DATA)
                data = requests.get('https://raw.githubusercontent.com/microsoft/OpenEduAnalytics/main/modules/module_test_data_generation_kit/test_data/base_truth_tables/courses.csv').text
                oea.land(data, 'test_data/v0.1/base_general_modules/base_courses', 'general_module_base_truth_courses.csv', oea.SNAPSHOT_BATCH_DATA)
                data = requests.get('https://raw.githubusercontent.com/microsoft/OpenEduAnalytics/main/modules/module_test_data_generation_kit/test_data/base_truth_tables/sections.csv').text
                oea.land(data, 'test_data/v0.1/base_general_modules/base_sections', 'general_module_base_truth_sections.csv', oea.SNAPSHOT_BATCH_DATA)
                data = requests.get('https://raw.githubusercontent.com/microsoft/OpenEduAnalytics/main/modules/module_test_data_generation_kit/test_data/base_truth_tables/enrollment.csv').text
                oea.land(data, 'test_data/v0.1/base_general_modules/base_student_enrollment', 'general_module_base_truth_student_enrollment.csv', oea.SNAPSHOT_BATCH_DATA)
                data = requests.get('https://raw.githubusercontent.com/microsoft/OpenEduAnalytics/main/modules/module_test_data_generation_kit/test_data/base_truth_tables/instructors.csv').text
                oea.land(data, 'test_data/v0.1/base_general_modules/base_instructors', 'general_module_base_truth_instructors.csv', oea.SNAPSHOT_BATCH_DATA)
                data = requests.get('https://raw.githubusercontent.com/microsoft/OpenEduAnalytics/main/modules/module_test_data_generation_kit/test_data/base_truth_tables/instructors_enroll.csv').text
                oea.land(data, 'test_data/v0.1/base_general_modules/base_instructors_enroll', 'general_module_base_truth_instructors_enroll.csv', oea.SNAPSHOT_BATCH_DATA)
            # NOTE: if tables are not read in properly - you may need to rename the rundate folder to replace colons with hyphens
            self.students = oea.load_csv(sourcepath + '/base_students/', header=True).toPandas()
            self.schools = oea.load_csv(sourcepath + '/base_schools/', header=True).toPandas()
            self.courses = oea.load_csv(sourcepath + '/base_courses/', header=True).toPandas()
            self.sections = oea.load_csv(sourcepath + '/base_sections/', header=True).toPandas()
            self.enrollment = oea.load_csv(sourcepath + '/base_student_enrollment/', header=True).toPandas()
            self.instructors = oea.load_csv(sourcepath + '/base_instructors/', header=True).toPandas()
            self.instructors_enroll = oea.load_csv(sourcepath + '/base_instructors_enroll/', header=True).toPandas()
            logger.info('Generating Moodle test data based on general module base-truth tables...')
        else:
            # expectation is that base_truth_tables exist
            sourcepath = 'stage1/Transactional/test_data/v0.1/'
            self.students = oea.load_csv(sourcepath + 'base_students/', header=True).toPandas()
            self.schools = oea.load_csv(sourcepath + 'base_schools/', header=True).toPandas()
            self.courses = oea.load_csv(sourcepath + 'base_courses/', header=True).toPandas()
            self.sections = oea.load_csv(sourcepath + 'base_sections/', header=True).toPandas()
            self.enrollment = oea.load_csv(sourcepath + 'base_student_enrollment/', header=True).toPandas()
            self.instructors = oea.load_csv(sourcepath + 'base_instructors/', header=True).toPandas()
            self.instructors_enroll = oea.load_csv(sourcepath + 'base_instructors_enroll/', header=True).toPandas()
            logger.info('Generating Moodle test data based on user-generated base-truth tables...')
        # generate Moodle test data tables, based on base-truth tables
        self.genUser()
        self.genCourse()
        self.genCourseCategories()
        self.genRole()
        self.genCohort()
        self.genEnrol()
        self.genUserEnrolments()
        self.genRoleAssignments()
        logger.info('Successfully generated Moodle SIS/rostering tables.')
        logger.info('Finished Moodle generation.')

    def __get_daterange(self):
        daterange = []
        startdate = dt.datetime(2022,1,3)
        enddate = dt.datetime(2022,1,28)
        while(startdate < enddate):
            daterange.append(startdate)
            startdate = startdate + dt.timedelta(days=1)
        return daterange
    
    def genUser(self):
        # set base date for "lastaccess" field
        base_lastaccess = dt.datetime(2022,2,2)
        for index, student in self.students.iterrows():
            id = student['StudentID']
            auth = ''
            confirmed = 1
            policyagreed = 1
            deleted = 0
            suspended = 0
            mnethostid = 0 
            firstname = student['FirstName']
            lastname = student['LastName']
            username = f'{firstname}{lastname}'
            password = self.faker.uuid4()
            idnumber = student['StudentID'] # NOTE: unsure; likely should use this field for base-truth StudentID
            email = student['Email']
            emailstop = 0
            phone1 = student['Phone']
            phone2 = ''
            institution = student['SchoolName']
            department = student['Grade'] # NOTE: temp. using student grade
            if self.use_general_module_base_truth:
                address = '' # blank 
            else:
                address = student['Address']
            city = student['City']
            country = 'US'
            lang = 'en'
            calendartype = 'gregorian'
            theme = ''
            timezone = 'PST'
            firstaccess = self.startdate + dt.timedelta(days=random.randint(0,5),hours=random.randint(0,23),minutes=random.randint(0,59)) # NOTE: schema says BIGINT columntype, but unsure how this should be formatted
            lastaccess = base_lastaccess + dt.timedelta(days=random.randint(0,5),hours=random.randint(0,23),minutes=random.randint(0,59)) # ^
            lastlogin = lastaccess - dt.timedelta(hours=random.randint(0,20),minutes=random.randint(0,59)) # ^
            currentlogin = 0 # ^
            lastip = ''
            secret = ''
            picture = 0
            description = f'Student at {institution}'
            descriptionformat = 1
            mailformat = 1
            maildigest = 0
            maildisplay = 2
            autosubscribe = 1
            trackforums = 1
            timecreated = self.startdate
            timemodified = 0
            trustbitmask = 0
            imagealt = ''
            lastnamephonetic = ''
            firstnamephonetic = ''
            middlename = student['MiddleName']
            alternatename = ''
            moodlenetprofile = f'{firstname}{lastname}@moodle.net'
            self.moodle_user.loc[len(self.moodle_user.index)] = [id,auth,confirmed,policyagreed,deleted,suspended,mnethostid,username,password,idnumber,firstname,lastname,email,emailstop,phone1,phone2, \
                                                                institution,department,address,city,country,lang,calendartype,theme,timezone,firstaccess,lastaccess,lastlogin,currentlogin,lastip,secret, \
                                                                picture,description,descriptionformat,mailformat,maildigest,maildisplay,autosubscribe,trackforums,timecreated,timemodified,trustbitmask, \
                                                                imagealt,lastnamephonetic,firstnamephonetic,middlename,alternatename,moodlenetprofile]
        for index, instructor in self.instructors.iterrows():
            id = instructor['InstructorID']
            auth = ''
            confirmed = 1
            policyagreed = 1
            deleted = 0
            suspended = 0
            mnethostid = 0 
            firstname = instructor['FirstName']
            lastname = instructor['LastName']
            username = f'{firstname}{lastname}'
            password = self.faker.uuid4()
            idnumber = instructor['InstructorID'] # NOTE: unsure; likely should use this field for base-truth InstructorID
            email = instructor['Email']
            emailstop = 0
            phone1 = instructor['Phone']
            phone2 = ''
            institution = 'Contoso University'
            department = '' # NOTE: unsure
            if self.use_general_module_base_truth:
                address = '' # blank 
            else:
                address = instructor['Address']
            city = instructor['City']
            country = 'US'
            lang = 'en'
            calendartype = 'gregorian'
            theme = ''
            timezone = 'PST'
            firstaccess = self.startdate + dt.timedelta(hours=random.randint(0,20),minutes=random.randint(0,59))
            lastaccess = base_lastaccess + dt.timedelta(days=random.randint(0,5),hours=random.randint(0,23),minutes=random.randint(0,59)) # ^
            lastlogin = lastaccess - dt.timedelta(hours=random.randint(0,20),minutes=random.randint(0,59)) # ^
            currentlogin = 0 # ^
            lastip = ''
            secret = ''
            picture = 0
            description = f'Professor at {institution}'
            descriptionformat = 1
            mailformat = 1
            maildigest = 0
            maildisplay = 2
            autosubscribe = 1
            trackforums = 1
            timecreated = self.startdate
            timemodified = 0
            trustbitmask = 0
            imagealt = ''
            lastnamephonetic = ''
            firstnamephonetic = ''
            middlename = instructor['MiddleName']
            alternatename = ''
            moodlenetprofile = f'{firstname}{lastname}@moodle.net'
            self.moodle_user.loc[len(self.moodle_user)] = [id,auth,confirmed,policyagreed,deleted,suspended,mnethostid,username,password,idnumber,firstname,lastname,email,emailstop,phone1,phone2, \
                                                                institution,department,address,city,country,lang,calendartype,theme,timezone,firstaccess,lastaccess,lastlogin,currentlogin,lastip,secret, \
                                                                picture,description,descriptionformat,mailformat,maildigest,maildisplay,autosubscribe,trackforums,timecreated,timemodified,trustbitmask, \
                                                                imagealt,lastnamephonetic,firstnamephonetic,middlename,alternatename,moodlenetprofile]
        self.writetofile('user', self.moodle_user)
        

    def genCourse(self):
        for index, section in self.sections.iterrows():
            id = section['SectionID']
            category = section['CourseID'] # NOTE: currently using this field for mapping to school/session/course(_category)
            sortorder = 0
            fullname = section['SectionName']
            shortname = section['CourseName']
            idnumber = '' # NOTE: unsure
            summary = 'Course in education system'
            summaryformat = 0
            format = 'topics' # NOTE: unsure
            showgrades = 1
            newsitems = 1
            startdate = self.startdate
            enddate = self.enddate
            relativedatesmode = 0 # NOTE: unsure
            marker = 0
            maxbytes = 0
            legacyfiles = 0
            showreports = 0
            visible = 1
            visibleold = 1
            downloadcontent = 'null'
            groupmode = 0 # NOTE: unsure
            groupmodeforce = 0
            defaultgroupingid = 0 # NOTE: unsure
            lang = 'en'
            calendartype = 'gregorian'
            theme = ''
            timecreated = self.startdate
            timemodified = 0
            requested = 0
            enablecompletion = 0 # NOTE: unsure
            completionnotify = 0
            cacherev = 0
            originalcourseid = section['SectionID'] # NOTE: unsure
            showactivitydates = 1
            showcompletionconditions = 0
            self.moodle_course.loc[len(self.moodle_course.index)] = [id,category,sortorder,fullname,shortname,idnumber,summary,summaryformat,format,showgrades,newsitems,startdate,enddate,relativedatesmode, \
                                                                    marker,maxbytes,legacyfiles,showreports,visible,visibleold,downloadcontent,groupmode,groupmodeforce,defaultgroupingid,lang,calendartype, \
                                                                    theme,timecreated,timemodified,requested,enablecompletion,completionnotify,cacherev,originalcourseid,showactivitydates,showcompletionconditions]
        self.writetofile('course', self.moodle_course)

    def genCourseCategories(self):
        # use the base-truth sections table to count number of classes per school, and per course category
        dfBT_sections = spark.createDataFrame(self.sections)
        dfBT_sections = dfBT_sections.select('SectionID','CourseID','SchoolID')
        dfSchool_classescount = dfBT_sections.groupBy('SchoolID').count()
        dfCourseCategory_classescount = dfBT_sections.groupBy('CourseID').count()
        # set a static idnumber for Spring 2022 semester session(s) per school
        spring2022_idnum = self.faker.uuid4()
        for index, course in self.courses.iterrows():
            # level 2: one per school per section/class (i.e. class in moodle course table)
            id = course['CourseID']
            name = course['CourseName']
            idnumber = course['CourseID']
            description = 'Course category of a school session in the education system'
            descriptionformat = 1
            parent = spring2022_idnum
            sortorder = 0
            coursecount = dfCourseCategory_classescount.filter(dfCourseCategory_classescount['CourseID'] == idnumber).collect()[0][1]
            visible = 1 
            visibleold = 0
            timemodified = self.startdate
            depth = 2 # NOTE: currently using this field as a tracker for depth with respect to the path-level (course_category => level 2)
            path = course['SchoolID']+'/Spring 2022/'+course['CourseID']
            theme = '' 
            self.moodle_course_categories.loc[len(self.moodle_course_categories.index)] = [id,name,idnumber,description,descriptionformat,parent,sortorder,coursecount,visible,visibleold,timemodified, \
                                                                    depth,path,theme]
        self.writetofile('course_categories', self.moodle_course_categories)

    def genRole(self):
        id = self.studentroleid
        name = 'Student'
        shortname = 'S' # NOTE: unsure for this field
        description = 'Student in the education system'
        sortorder = 0 # NOTE: unsure
        archetype = '' # NOTE: unsure
        self.moodle_role.loc[len(self.moodle_role.index)] = [id,name,shortname,description,sortorder,archetype]
        # then add the instructor role
        id = self.instructorroleid
        name = 'Instructor'
        shortname = 'I'
        description = 'Instructor in the education system'
        sortorder = 0
        archetype = ''
        self.moodle_role.loc[len(self.moodle_role)] = [id,name,shortname,description,sortorder,archetype]
        self.writetofile('role', self.moodle_role)
    
    def genCohort(self):
        # GENERALLY UNSURE IF THIS TABLE GENERATION PROCESS ACCURATELY REFLECTS EXPECTED PROD. DATA.
        # That is, uncertain as to whether this is intended to be one row per school
        for index, school in self.schools.iterrows():
            id = self.faker.uuid4()
            contextid = self.contextid
            name = school['SchoolName']
            idnumber = school['SchoolID']
            description = f'School: {name}' 
            descriptionformat = ''
            visible = 1
            component = ''
            timecreated = self.startdate
            timemodified = self.reportdate
            theme = ''
            self.moodle_cohort.loc[len(self.moodle_cohort.index)] = [id,contextid,name,idnumber,description,descriptionformat,visible,component,timecreated,timemodified,theme]
        self.writetofile('cohort', self.moodle_cohort)
    
    def genEnrol(self):
        customint1 = ''
        customint2 = ''
        customint3 = ''
        customint4 = ''
        customint5 = ''
        customint6 = ''
        customint7 = ''
        customint8 = ''
        customchar1 = ''
        customchar2 = ''
        customchar3 = ''
        customdec1 = 0
        customdec2 = 0
        customtext1 = 'null'
        customtext2 = 'null'
        customtext3 = 'null'
        customtext4 = 'null'
        for index, section in self.sections.iterrows():
            id = self.faker.uuid4()
            enrol = '' # NOTE: unsure about this field.
            status = 0
            courseid = section['SectionID']
            sortorder = 0
            name = 'Enrolled student' # NOTE: unsure
            enrolperiod = 'Spring 2022' # NOTE: unsure
            enrolstartdate = self.startdate
            enrolenddate = self.enddate
            expirynotify = 0
            expirythreshold = 0
            notifyall = 0
            password = self.faker.uuid4()
            cost = '' 
            currency = ''
            roleid = self.studentroleid
            timecreated = self.startdate
            timemodified = self.startdate
            self.moodle_enrol.loc[len(self.moodle_enrol.index)] = [id,enrol,status,courseid,sortorder,name,enrolperiod,enrolstartdate,enrolenddate,expirynotify,expirythreshold,notifyall,password, \
                                                                    cost,currency,roleid,customint1,customint2,customint3,customint4,customint5,customint6,customint7,customint8,customchar1, \
                                                                    customchar2,customchar3,customdec1,customdec2,customtext1,customtext2,customtext3,customtext4,timecreated,timemodified]
        # now iterate through adding teachers
        for index, section in self.sections.iterrows():
            id = self.faker.uuid4()
            enrol = '' # NOTE: unsure about this field.
            status = 0
            courseid = section['SectionID']
            sortorder = 0
            name = 'Enrolled instructor' # NOTE: unsure
            enrolperiod = 'Spring 2022' # NOTE: unsure
            enrolstartdate = self.startdate
            enrolenddate = self.enddate
            expirynotify = 0
            expirythreshold = 0
            notifyall = 0
            password = self.faker.uuid4()
            cost = '' 
            currency = ''
            roleid = self.instructorroleid
            timecreated = self.startdate
            timemodified = self.startdate
            self.moodle_enrol.loc[len(self.moodle_enrol)] = [id,enrol,status,courseid,sortorder,name,enrolperiod,enrolstartdate,enrolenddate,expirynotify,expirythreshold,notifyall,password, \
                                                                    cost,currency,roleid,customint1,customint2,customint3,customint4,customint5,customint6,customint7,customint8,customchar1, \
                                                                    customchar2,customchar3,customdec1,customdec2,customtext1,customtext2,customtext3,customtext4,timecreated,timemodified]
        self.writetofile('enrol', self.moodle_enrol)

    def genUserEnrolments(self):
        # cast both base-truth enrollment and moodle_enrol tables to Spark, then join together for enrollment id per student in a class
        dfBT_enroll = spark.createDataFrame(self.enrollment)
        dfMoodle_enrol = spark.createDataFrame(self.moodle_enrol)
        dfMoodle_enrol = dfMoodle_enrol.filter(dfMoodle_enrol['roleid'] == self.studentroleid)
        dfMoodle_enrol = dfMoodle_enrol.select('id', 'courseid').withColumnRenamed('courseid', 'class_id')
        dfEnroll = dfMoodle_enrol.join(dfBT_enroll, dfMoodle_enrol.class_id == dfBT_enroll.SectionID, how='inner').drop('class_id')
        enroll_joined = dfEnroll.toPandas()
        for index, enroll in enroll_joined.iterrows():
            id = self.faker.uuid4()
            status = 0 # NOTE: unsure - 0 means active participation according to doc.
            enrolid = enroll['id']
            userid = enroll['StudentID']
            timestart = self.startdate
            timeend = self.enddate
            modifierid = self.modifieruserid
            timecreated = self.startdate
            timemodified = self.startdate
            self.moodle_user_enrolments.loc[len(self.moodle_user_enrolments.index)] = [id,status,enrolid,userid,timestart,timeend,modifierid,timecreated,timemodified]
        # do the same for instructors
        dfBT_i_enroll = spark.createDataFrame(self.instructors_enroll)
        dfMoodle_enrol = spark.createDataFrame(self.moodle_enrol)
        dfMoodle_enrol = dfMoodle_enrol.filter(dfMoodle_enrol['roleid'] == self.instructorroleid)
        dfMoodle_enrol = dfMoodle_enrol.select('id', 'courseid').withColumnRenamed('courseid', 'class_id')
        dfEnroll = dfMoodle_enrol.join(dfBT_i_enroll, dfMoodle_enrol.class_id == dfBT_i_enroll.InstructsClass_SectionId, how='inner').drop('class_id')
        enroll_joined = dfEnroll.toPandas()
        for index, enroll in enroll_joined.iterrows():
            id = self.faker.uuid4()
            status = 0 
            enrolid = enroll['id']
            userid = enroll['InstructorId']
            timestart = self.startdate
            timeend = self.enddate
            modifierid = self.modifieruserid
            timecreated = self.startdate
            timemodified = self.startdate
            self.moodle_user_enrolments.loc[len(self.moodle_user_enrolments)] = [id,status,enrolid,userid,timestart,timeend,modifierid,timecreated,timemodified]
        self.writetofile('user_enrolments', self.moodle_user_enrolments)

    def genRoleAssignments(self):
        # enrol and user_enrolments table must be created first
        # add students first
        dfEnrol = spark.createDataFrame(self.moodle_user_enrolments)
        dfBT_students = spark.createDataFrame(self.students).select('StudentID')
        dfStudent_enroll = dfEnrol.join(dfBT_students, dfEnrol.userid == dfBT_students.StudentID,how='inner')
        pdfStudents_enroll = dfStudent_enroll.toPandas()
        for index, student in pdfStudents_enroll.iterrows():
            id = self.faker.uuid4()
            roleid = self.studentroleid
            contextid = self.contextid
            userid = student['StudentID']
            timemodified = self.startdate
            modifierid = self.modifieruserid
            component = '' # NOTE: unsure
            itemid = student['id']
            sortorder = 0
            self.moodle_role_assignments.loc[len(self.moodle_role_assignments.index)] = [id,roleid,contextid,userid,timemodified,modifierid,component,itemid,sortorder]
        # then instructors
        dfEnrol = spark.createDataFrame(self.moodle_user_enrolments)
        dfBT_instruct = spark.createDataFrame(self.instructors).select('InstructorId')
        dfInstructors_enroll = dfEnrol.join(dfBT_instruct, dfEnrol.userid == dfBT_instruct.InstructorId,how='inner')
        pdfInstructors_enroll = dfInstructors_enroll.toPandas()
        for index, instructor in pdfInstructors_enroll.iterrows():
            id = self.faker.uuid4()
            roleid = self.instructorroleid
            contextid = self.contextid
            userid = instructor['InstructorId']
            timemodified = self.startdate
            modifierid = self.modifieruserid
            component = '' # NOTE: unsure
            itemid = instructor['id']
            sortorder = 0
            self.moodle_role_assignments.loc[len(self.moodle_role_assignments)] = [id,roleid,contextid,userid,timemodified,modifierid,component,itemid,sortorder]
        self.writetofile('role_assignments', self.moodle_role_assignments)

    def writetofile(self,filename,dfout):
        # turns the pandas df into a pyspark df, and then writes out the generated tables to stage1
        genfilepath = 'stage1/Transactional/test_data/v0.1/moodle_gen/' + filename + '/snapshot_batch_data/rundate='+self.currentDateTime
        dfOutfile = spark.createDataFrame(dfout)
        dfOutfile = dfOutfile.na.drop('all')
        dfOutfile.coalesce(1).write.save(oea.to_url(f'{genfilepath}'), format='csv', mode='overwrite', header='true', mergeSchema='true')
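
The comments in ```genUser``` note that the Moodle schema types fields such as ```firstaccess``` and ```lastaccess``` as BIGINT, while the class currently writes ```datetime``` objects. Moodle stores such time fields as Unix epoch seconds, so one option would be to convert each value before it is written. The sketch below is only an illustration: the helper name ```to_moodle_timestamp``` is hypothetical, and treating naive datetimes as UTC is an assumption.

```python
import datetime as dt


def to_moodle_timestamp(value):
    """Hypothetical helper: convert a datetime to whole Unix epoch seconds."""
    if value.tzinfo is None:
        # Assumption: naive datetimes (as produced by strptime here) are treated as UTC.
        value = value.replace(tzinfo=dt.timezone.utc)
    return int(value.timestamp())


# Parse a roster start date the same way genMoodleRoster does, then convert it.
startdate = dt.datetime.strptime("2022-01-03T00:00:00", "%Y-%m-%dT%H:%M:%S")
timecreated = to_moodle_timestamp(startdate)
```

If this approach were adopted, fields such as ```timecreated```, ```timemodified```, ```timestart``` and ```timeend``` would hold integers rather than datetimes, which may affect downstream notebooks that expect the current output.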