# Test Data Generation: Moodle Activity Tables Class

**Affiliation**: *Kwantum Edu Analytics*. **Last Modified**: *5/9/2023*.

This OEA test data generation class notebook generates fictitious Moodle tables, as seen in the Moodle module. It is required to run the moodle_test_data_gen_demo notebook successfully.

For reference on all Moodle tables outlined below, see the Moodle table schemas here: https://www.examulator.com/er/output/index.html

This class notebook relies primarily on the OEA_py class notebook, the ```Faker``` and ```random``` Python packages, and already-generated base-truth tables to generate **16** of the 34 Moodle activity tables listed below (activity tables only; SIS/rostering tables are expected to already have been created). Tables listed without a note are the ones currently generated:

 1. **assign**
 2. **assign_grades**
 3. **assign_submission**
 4. **assignsubmission_file**
 5. **assign_user_mapping**
 6. **attendance** <- *unsure about this table; Analytikus listed in tables used*
 7. **feedback** <- Not created at the moment
 8. **feedback_item** <- Not created at the moment
 9. **feedback_value** <- Not created at the moment
 10. **forum** <- *Method drafted*
 11. **forum_discussions** <- *Method drafted*
 12. **forum_grades** <- *Method drafted*
 13. **forum_posts** <- *Method drafted*
 14. **grade_grades**
 15. **groups** <- Not created at the moment
 16. **lesson**
 17. **lesson_answers**
 18. **lesson_attempts**
 19. **lesson_grades**
 20. **lesson_pages** 
 21. **lesson_timer** 
 22. **logstore_standard_log** <- Not created at the moment
 23. **messages** <- *Method drafted*; needs editing
 24. **message_conversations** <- *Method drafted*; needs editing
 25. **message_conversation_members** <- *Method drafted*; needs editing
 26. **page** 
 27. **question** <- Not created at the moment
 28. **question_answers** <- Not created at the moment
 29. **question_attempts** <- Not created at the moment
 30. **question_categories** <- Not created at the moment
 31. **question_usages** <- Not created at the moment
 32. **quiz**
 33. **quiz_attempts**
 34. **quiz_grades**

One main method, ```genMoodleActivity(startdate, enddate, reportgendate, moodle_roster_tables_source_path, max_num_activities_per_class)```, generates the tables described. Its parameters are:
  - *startdate*: semester start date.
  - *enddate*: semester end date.
  - *reportgendate*: date the report(s) were generated (i.e., the fictitious date on which all tables were landed in the data lake).
  - *moodle_roster_tables_source_path*: source of Moodle roster/SIS tables previously generated.
  - *max_num_activities_per_class*: upper bound on the number of activities generated per course. The generator goes through every course in the Moodle roster data and randomly selects how many activities to generate for it, from 0 up to this value. For example:
    * ```max_num_activities_per_class = 3``` means that, when generating assignment test data, each course is randomly given between 0 and 3 assignments.
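
The per-course sampling described for *max_num_activities_per_class* can be sketched as follows. This is a minimal illustration under stated assumptions, not the class's actual implementation: the helper name `sample_activity_counts`, its `seed` argument, and the course ids are hypothetical, and the sketch assumes a uniform draw with `random.randint`, which includes both endpoints.

```python
import random

def sample_activity_counts(course_ids, max_num_activities_per_class, seed=None):
    """Hypothetical helper: draw an activity count in [0, max] for every course."""
    rng = random.Random(seed)  # dedicated generator so a seeded draw is repeatable
    # randint is inclusive on both ends, so both 0 and the maximum can be drawn
    return {course_id: rng.randint(0, max_num_activities_per_class) for course_id in course_ids}

# Made-up course ids, using the value from the example above
counts = sample_activity_counts(['course_a', 'course_b', 'course_c'],
                                max_num_activities_per_class=3, seed=42)
for course_id, num_assignments in counts.items():
    print(course_id, num_assignments)
```

A course that draws 0 simply gets no activity of that type, so some courses may have no assignments, quizzes, or lessons at all.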

In [1]:
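# NOTE: `oea`, `logger`, and `spark` are not defined in this notebook; they are expected to come from the OEA_py class notebook and the active Spark session
import logging
import random, decimal
import requests  # used in genMoodleActivity to download the base-truth CSVs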
from faker import Faker
import pandas as pd
import datetime as dt
import numpy as np
from pyspark.sql import functions as F

class MoodleActivityDataGen():
    def __init__(self, startdate='2022-01-03T00:00:00', enddate='2022-06-03T00:00:00'):
        #self.startdate = startdate
        #self.enddate = enddate
        
        self.faker = Faker('en_US')

        # set current datetime for rundate folder for writing out files
        currentDate = dt.datetime.now()
        self.currentDateTime = currentDate.strftime("%Y-%m-%d %H-%M-%S")

        # initialize dfs for each Moodle table to be generated
        assign = {
            'id':[],
            'course':[],
            'name':[],
            'intro':[],
            'introformat':[],
            'alwaysshowdescription':[],
            'nosubmissions':[],
            'submissiondrafts':[],
            'sendnotifications':[],
            'sendlatenotifications':[],
            'duedate':[],
            'allowsubmissionsfromdate':[],
            'grade':[],
            'timemodified':[],
            'requiresubmissionstatement':[],
            'completionsubmit':[],
            'cutoffdate':[],
            'gradingduedate':[],
            'teamsubmission':[],
            'requireallteammemberssubmit':[],
            'teamsubmissiongroupingid':[],
            'blindmarking':[],
            'hidegrader':[],
            'revealidentities':[],
            'attemptreopenmethod':[],
            'maxattempts':[],
            'markingworkflow':[],
            'markingallocation':[],
            'sendstudentnotifications':[],
            'preventsubmissionnotingroup':[],
            'activity':[],
            'activityformat':[],
            'timelimit':[],
            'submissionattachments':[]
        }
        self.moodle_assign = pd.DataFrame(assign, dtype=object)
        assign_grades = {
            'id':[],
            'assignment':[],
            'userid':[],
            'timecreated':[],
            'timemodified':[],
            'grader':[],
            'grade':[],
            'attemptnumber':[]
        }
        self.moodle_assign_grades = pd.DataFrame(assign_grades, dtype=object)
        assign_submission = {
            'id':[],
            'assignment':[],
            'userid':[],
            'timecreated':[],
            'timemodified':[],
            'timestarted':[],
            'status':[],
            'groupid':[],
            'attemptnumber':[],
            'latest':[]
        }
        self.moodle_assign_submission = pd.DataFrame(assign_submission, dtype=object)
        assignsubmission_file = {
            'id':[],
            'assignment':[],
            'submission':[],
            'numfiles':[]
        }
        self.moodle_assignsubmission_file = pd.DataFrame(assignsubmission_file, dtype=object)
        assign_user_mapping = {
            'id':[],
            'assignment':[],
            'userid':[]
        }
        self.moodle_assign_user_mapping = pd.DataFrame(assign_user_mapping, dtype=object)
        feedback_value = {
            'id':[],
            'course_id':[],
            'item':[],
            'completed':[],
            'tmp_completed':[],
            'value':[]
        }
        self.moodle_feedback_value = pd.DataFrame(feedback_value, dtype=object)
        forum = {
            'id':[],
            'course':[],
            'type':[],
            'name':[],
            'intro':[],
            'introformat':[],
            'duedate':[],
            'cutoffdate':[],
            'assessed':[],
            'assesstimestart':[],
            'assesstimefinish':[],
            'scale':[],
            'grade_forum':[],
            'grade_forum_notify':[],
            'maxbytes':[],
            'maxattachments':[],
            'forcesubscribe':[],
            'trackingtype':[],
            'rsstype':[],
            'rssarticles':[],
            'timemodified':[],
            'warnafter':[],
            'blockafter':[],
            'blockperiod':[],
            'completiondiscussions':[],
            'completionreplies':[],
            'completionposts':[],
            'displaywordcount':[],
            'lockdiscussionafter':[]
        }
        self.moodle_forum = pd.DataFrame(forum, dtype=object)
        forum_discussions = {
            'id':[],
            'course':[],
            'forum':[],
            'name':[],
            'firstpost':[],
            'userid':[],
            'groupid':[],
            'assessed':[],
            'timemodified':[],
            'usermodified':[],
            'timestart':[],
            'timeend':[],
            'pinned':[],
            'timelocked':[]
        }
        self.moodle_forum_discussions = pd.DataFrame(forum_discussions, dtype=object)
        forum_grades = {
            'id':[],
            'forum':[],
            'itemnumber':[],
            'userid':[],
            'grade':[],
            'timecreated':[],
            'timemodified':[]
        }
        self.moodle_forum_grades = pd.DataFrame(forum_grades, dtype=object)
        forum_posts = {
            'id':[],
            'discussion':[],
            'parent':[],
            'userid':[],
            'created':[],
            'modified':[],
            'mailed':[],
            'subject':[],
            'message':[],
            'messageformat':[],
            'messagetrust':[],
            'attachment':[],
            'totalscore':[],
            'mailnow':[],
            'deleted':[],
            'privatereplyto':[],
            'wordcount':[],
            'charcount':[]
        }
        self.moodle_forum_posts = pd.DataFrame(forum_posts, dtype=object)
        grade_grades = {
            'id':[],
            'itemid':[],
            'userid':[],
            'rawgrade':[],
            'rawgrademax':[],
            'rawgrademin':[],
            'rawscaleid':[],
            'usermodified':[],
            'finalgrade':[],
            'hidden':[],
            'locked':[],
            'locktime':[],
            'exported':[],
            'overridden':[],
            'excluded':[],
            'feedback':[],
            'feedbackformat':[],
            'information':[],
            'informationformat':[],
            'timecreated':[],
            'timemodified':[],
            'aggregationstatus':[],
            'aggregationweight':[]
        }
        self.moodle_grade_grades = pd.DataFrame(grade_grades, dtype=object)
        groups = {
            'id':[],
            'courseid':[],
            'idnumber':[],
            'name':[],
            'description':[],
            'descriptionformat':[],
            'enrolmentkey':[],
            'picture':[],
            'timecreated':[],
            'timemodified':[]
        }
        self.moodle_groups = pd.DataFrame(groups, dtype=object)
        lesson = {
            'id':[],
            'course':[],
            'name':[],
            'intro':[],
            'introformat':[],
            'practice':[],
            'modattempts':[],
            'usepassword':[],
            'password':[],
            'dependency':[],
            'conditions':[],
            'grade':[],
            'custom':[],
            'ongoing':[],
            'usemaxgrade':[],
            'maxanswers':[],
            'maxattempts':[],
            'review':[],
            'nextpagedefault':[],
            'feedback':[],
            'minquestions':[],
            'maxpages':[],
            'timelimit':[],
            'retake':[],
            'activitylink':[],
            'mediafile':[],
            'mediaheight':[],
            'mediawidth':[],
            'mediaclose':[],
            'slideshow':[],
            'width':[],
            'height':[],
            'bgcolor':[],
            'displayleft':[],
            'displayleftif':[],
            'progressbar':[],
            'available':[],
            'deadline':[],
            'timemodified':[],
            'completionendreached':[],
            'completiontimespent':[],
            'allowofflineattempts':[]
        }
        self.moodle_lesson = pd.DataFrame(lesson, dtype=object)
        lesson_answers = {
            'id':[],
            'lessonid':[],
            'pageid':[],
            'jumpto':[],
            'grade':[],
            'score':[],
            'flags':[],
            'timecreated':[],
            'timemodified':[],
            'answer':[],
            'answerformat':[],
            'response':[],
            'responseformat':[]
        }
        self.moodle_lesson_answers = pd.DataFrame(lesson_answers, dtype=object)
        lesson_attempts = {
            'id':[],
            'lessonid':[],
            'pageid':[],
            'userid':[],
            'answerid':[],
            'retry':[],
            'correct':[],
            'useranswer':[],
            'timeseen':[]
        }
        self.moodle_lesson_attempts = pd.DataFrame(lesson_attempts, dtype=object)
        lesson_grades = {
            'id':[],
            'lessonid':[],
            'userid':[],
            'grade':[],
            'late':[],
            'completed':[]
        }
        self.moodle_lesson_grades = pd.DataFrame(lesson_grades, dtype=object)
        lesson_pages = {
            'id':[],
            'lessonid':[],
            'prevpageid':[],
            'nextpageid':[],
            'qtype':[],
            'qoption':[],
            'layout':[],
            'display':[],
            'timecreated':[],
            'timemodified':[],
            'title':[],
            'contents':[],
            'contentsformat':[]
        }
        self.moodle_lesson_pages = pd.DataFrame(lesson_pages, dtype=object)
        lesson_timer = {
            'id':[],
            'lessonid':[],
            'userid':[],
            'starttime':[],
            'lessontime':[],
            'completed':[],
            'timemodifiedoffline':[]
        }
        self.moodle_lesson_timer = pd.DataFrame(lesson_timer, dtype=object)
        logstore_standard_log = {
            'id':[],
            'eventname':[],
            'component':[],
            'action':[],
            'target':[],
            'objecttable':[],
            'objectid':[],
            'crud':[],
            'edulevel':[],
            'contextid':[],
            'contextlevel':[],
            'contextinstanceid':[],
            'userid':[],
            'courseid':[],
            'relateduserid':[],
            'anonymous':[],
            'other':[],
            'timecreated':[],
            'origin':[],
            'ip':[],
            'realuserid':[]
        }
        self.moodle_logstore_standard_log = pd.DataFrame(logstore_standard_log, dtype=object)
        messages = {
            'id':[],
            'useridfrom':[],
            'conversationid':[],
            'subject':[],
            'fullmessage':[],
            'fullmessageformat':[],
            'fullmessagehtml':[],
            'smallmessage':[],
            'timecreated':[],
            'fullmessagetrust':[],
            'customdata':[]
        }
        self.moodle_messages = pd.DataFrame(messages, dtype=object)
        message_conversations = {
            'id':[],
            'type':[],
            'name':[],
            'convhash':[],
            'component':[],
            'itemtype':[],
            'itemid':[],
            'contextid':[],
            'enabled':[],
            'timecreated':[],
            'timemodified':[]
        }
        self.moodle_message_conversations = pd.DataFrame(message_conversations, dtype=object)
        message_conversation_members = {
            'id':[],
            'conversationid':[],
            'userid':[],
            'timecreated':[]
        }
        self.moodle_message_conversation_members = pd.DataFrame(message_conversation_members, dtype=object)
        page = {
            'id':[],
            'course':[],
            'name':[],
            'intro':[],
            'introformat':[],
            'content':[],
            'contentformat':[],
            'legacyfiles':[],
            'legacyfileslast':[],
            'display':[],
            'displayoptions':[],
            'revision':[],
            'timemodified':[]
        }
        self.moodle_page = pd.DataFrame(page, dtype=object)
        question = {
            'id':[],
            'parent':[],
            'name':[],
            'questiontext':[],
            'questiontextformat':[],
            'generalfeedback':[],
            'generalfeedbackformat':[],
            'defaultmark':[],
            'penalty':[],
            'qtype':[],
            'length':[],
            'stamp':[],
            'timecreated':[],
            'timemodified':[],
            'createdby':[],
            'modifiedby':[]
        }
        self.moodle_question = pd.DataFrame(question, dtype=object)
        question_answers = {
            'id':[],
            'question':[],
            'answer':[],
            'answerformat':[],
            'fraction':[],
            'feedback':[],
            'feedbackformat':[]
        }
        self.moodle_question_answers = pd.DataFrame(question_answers, dtype=object)
        question_attempts = {
            'id':[],
            'questionusageid':[],
            'slot':[],
            'behaviour':[],
            'questionid':[],
            'variant':[],
            'maxmark':[],
            'minfraction':[],
            'maxfraction':[],
            'flagged':[],
            'questionsummary':[],
            'rightanswer':[],
            'responsesummary':[],
            'timemodified':[]
        }
        self.moodle_question_attempts = pd.DataFrame(question_attempts, dtype=object)
        question_categories = {
            'id':[],
            'name':[],
            'contextid':[],
            'info':[],
            'infoformat':[],
            'stamp':[],
            'parent':[],
            'sortorder':[],
            'idnumber':[]
        }
        self.moodle_question_categories = pd.DataFrame(question_categories, dtype=object)
        question_usages = {
            'id':[],
            'contextid':[],
            'component':[],
            'preferredbehaviour':[]
        }
        self.moodle_question_usages = pd.DataFrame(question_usages, dtype=object)
        quiz = {
            'id':[],
            'course':[],
            'name':[],
            'intro':[],
            'introformat':[],
            'timeopen':[],
            'timeclose':[],
            'timelimit':[],
            'overduehandling':[],
            'graceperiod':[],
            'preferredbehaviour':[],
            'canredoquestions':[],
            'attempts':[],
            'attemptonlast':[],
            'grademethod':[],
            'decimalpoints':[],
            'questiondecimalpoints':[],
            'reviewattempt':[],
            'reviewcorrectness':[],
            'reviewmarks':[],
            'reviewspecificfeedback':[],
            'reviewgeneralfeedback':[],
            'reviewrightanswer':[],
            'reviewoverallfeedback':[],
            'questionsperpage':[],
            'navmethod':[],
            'shuffleanswers':[],
            'sumgrades':[],
            'grade':[],
            'timecreated':[],
            'timemodified':[],
            'password':[],
            'subnet':[],
            'browsersecurity':[],
            'delay1':[],
            'delay2':[],
            'showuserpicture':[],
            'showblocks':[],
            'completionattemptsexhausted':[],
            'completionminattempts':[],
            'allowofflineattempts':[]
        }
        self.moodle_quiz = pd.DataFrame(quiz, dtype=object)
        quiz_attempts = {
            'id':[],
            'quiz':[],
            'userid':[],
            'attempt':[],
            'uniqueid':[],
            'layout':[],
            'currentpage':[],
            'preview':[],
            'state':[],
            'timestart':[],
            'timefinish':[],
            'timemodified':[],
            'timemodifiedoffline':[],
            'timecheckstate':[],
            'sumgrades':[],
            'gradednotificationsenttime':[]
        }
        self.moodle_quiz_attempts = pd.DataFrame(quiz_attempts, dtype=object)
        quiz_grades = {
            'id':[],
            'quiz':[],
            'userid':[],
            'grade':[],
            'timemodified':[]
        }
        self.moodle_quiz_grades = pd.DataFrame(quiz_grades, dtype=object)

    def genMoodleActivity(self,startdate='2022-01-01T00:00:00',enddate='2022-06-01T00:00:00',reportgendate='2022-02-02T00:00:00',moodle_roster_tables_source_path='stage1/Transactional/test_data/v0.1/moodle_gen',max_num_activities_per_class=5):
        self.startdate = dt.datetime.strptime(startdate, "%Y-%m-%dT%H:%M:%S")
        self.enddate = dt.datetime.strptime(enddate, "%Y-%m-%dT%H:%M:%S")
        self.reportdate = dt.datetime.strptime(reportgendate, "%Y-%m-%dT%H:%M:%S")
        use_general_module_base_truth = True
        if use_general_module_base_truth:
            sourcepath = 'stage1/Transactional/test_data/v0.1/base_general_modules'
            if oea.path_exists(sourcepath):
                logger.info('General module base-truth tables already exist - delete the "base_general_modules" folder/directory if you want to replace these.')
            else:
                # manually delete and replace the general module base_truth_tables CSVs as needed
                logger.info('General module base-truth tables do not currently exist - landing in stage1/.../test_data/v0.1/base_general_modules/')
                base_url = 'https://raw.githubusercontent.com/microsoft/OpenEduAnalytics/main/modules/module_test_data_generation_kit/test_data/base_truth_tables'
                # (remote CSV name, landed table name) pairs; note that enrollment.csv is landed as student_enrollment
                base_truth_files = [('students', 'students'), ('schools', 'schools'), ('courses', 'courses'), ('sections', 'sections'),
                                    ('enrollment', 'student_enrollment'), ('instructors', 'instructors'), ('instructors_enroll', 'instructors_enroll')]
                for remote_name, table_name in base_truth_files:
                    data = requests.get(f'{base_url}/{remote_name}.csv').text
                    oea.land(data, f'test_data/v0.1/base_general_modules/base_{table_name}', f'general_module_base_truth_{table_name}.csv', oea.SNAPSHOT_BATCH_DATA)
            # NOTE: if tables are not read in properly - you may need to rename the rundate folder to replace colons with hyphens
            self.students = oea.load_csv(sourcepath + '/base_students/', header=True).toPandas()
            self.schools = oea.load_csv(sourcepath + '/base_schools/', header=True).toPandas()
            self.courses = oea.load_csv(sourcepath + '/base_courses/', header=True).toPandas()
            self.sections = oea.load_csv(sourcepath + '/base_sections/', header=True).toPandas()
            self.enrollment = oea.load_csv(sourcepath + '/base_student_enrollment/', header=True).toPandas()
            self.instructors = oea.load_csv(sourcepath + '/base_instructors/', header=True).toPandas()
            self.instructors_enroll = oea.load_csv(sourcepath + '/base_instructors_enroll/', header=True).toPandas()
            logger.info('Generating Moodle test data based on general module base-truth tables...')
        else:
            # expectation is that base_truth_tables exist
            sourcepath = 'stage1/Transactional/test_data/v0.1/'
            self.students = oea.load_csv(sourcepath + 'base_students/', header=True).toPandas()
            self.schools = oea.load_csv(sourcepath + 'base_schools/', header=True).toPandas()
            self.courses = oea.load_csv(sourcepath + 'base_courses/', header=True).toPandas()
            self.sections = oea.load_csv(sourcepath + 'base_sections/', header=True).toPandas()
            self.enrollment = oea.load_csv(sourcepath + 'base_student_enrollment/', header=True).toPandas()
            self.instructors = oea.load_csv(sourcepath + 'base_instructors/', header=True).toPandas()
            self.instructors_enroll = oea.load_csv(sourcepath + 'base_instructors_enroll/', header=True).toPandas()
            logger.info('Generating Moodle test data based on user-generated base-truth tables...')
        # load in Moodle SIS/roster tables (NOTE: these are expected to already have been created)
        moodle_cohort = oea.load_csv(f'{moodle_roster_tables_source_path}/cohort/snapshot_batch_data/*/*.csv', header=True)
        moodle_course = oea.load_csv(f'{moodle_roster_tables_source_path}/course/snapshot_batch_data/*/*.csv', header=True)
        moodle_course_categories = oea.load_csv(f'{moodle_roster_tables_source_path}/course_categories/snapshot_batch_data/*/*.csv', header=True)
        moodle_enrol = oea.load_csv(f'{moodle_roster_tables_source_path}/enrol/snapshot_batch_data/*/*.csv', header=True)
        moodle_role = oea.load_csv(f'{moodle_roster_tables_source_path}/role/snapshot_batch_data/*/*.csv', header=True)
        moodle_role_assignments = oea.load_csv(f'{moodle_roster_tables_source_path}/role_assignments/snapshot_batch_data/*/*.csv', header=True)
        moodle_user = oea.load_csv(f'{moodle_roster_tables_source_path}/user/snapshot_batch_data/*/*.csv', header=True)
        moodle_user_enrolments = oea.load_csv(f'{moodle_roster_tables_source_path}/user_enrolments/snapshot_batch_data/*/*.csv', header=True)
        # extract pre-existing static values (student and instructor role ids, context id, and tech admin modifier user id)
        self.studentroleid = moodle_role.filter(moodle_role['name'] == 'Student').collect()[0][0]
        self.instructorroleid = moodle_role.filter(moodle_role['name'] == 'Instructor').collect()[0][0]
        self.modifieruserid = moodle_user_enrolments.select('modifierid').collect()[0][0]
        self.contextid = moodle_cohort.select('contextid').collect()[0][0]
        # then turn the moodle roster spark dfs into pandas dfs
        self.moodle_cohort = moodle_cohort.toPandas()
        self.moodle_course = moodle_course.toPandas()
        self.moodle_course_categories = moodle_course_categories.toPandas()
        self.moodle_enrol = moodle_enrol.toPandas()
        self.moodle_role = moodle_role.toPandas()
        self.moodle_role_assignments = moodle_role_assignments.toPandas()
        self.moodle_user = moodle_user.toPandas()
        self.moodle_user_enrolments = moodle_user_enrolments.toPandas()
        logger.info('Successfully loaded Moodle SIS/rostering tables. Now generating Moodle activity tables...')
        self.genAssign_tables(max_num_assigns=max_num_activities_per_class)
        self.genQuiz_tables(max_num_quizzes=max_num_activities_per_class)
        #self.genForum_tables(max_num_forums=max_num_activities_per_class)
        self.genLesson_tables(max_num_lessons=max_num_activities_per_class)
        self.genPage()
        self.genGradeGrades()
        #self.genMessage_tables(num_convos=num_courses_to_gen_activity)
        logger.info('Successfully generated Moodle activity tables (for assignments, quizzes, lessons, pages and grades; forums and messages are currently disabled).')
        logger.info('Finished Moodle generation.')

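    def __get_daterange(self):
        # NOTE: returns a hard-coded range of days (2022-01-03 up to, but excluding, 2022-01-28); self.startdate/self.enddate are not used here
        daterange = []
        startdate = dt.datetime(2022,1,3)
        enddate = dt.datetime(2022,1,28)
        while startdate < enddate: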
            daterange.append(startdate)
            startdate = startdate + dt.timedelta(days=1)
        return daterange
    
    def genPage(self):
        # NOTE: the lesson_pages table must be generated first; the page table only holds content derived from it
        dfLesson = spark.createDataFrame(self.moodle_lesson).select('id','course').withColumnRenamed('id','lesson_id')
        dfLessonPages = spark.createDataFrame(self.moodle_lesson_pages)
        dfLessonPages = dfLessonPages.join(dfLesson, dfLessonPages.lessonid == dfLesson.lesson_id,how='inner').drop('lesson_id')
        pdf_lesson_pages = dfLessonPages.toPandas()
        for index, page in pdf_lesson_pages.iterrows():
            id = page['id']
            course = page['course']
            name = page['title']
            intro = 'Page of a lesson for a course'
            introformat = 1
            content = 'Lesson questions for students'
            contentformat = 1
            legacyfiles = 0
            legacyfileslast = 0
            display = 1
            displayoptions = ''
            revision = 1
            timemodified = page['timemodified']
            self.moodle_page.loc[len(self.moodle_page.index)] = [id,course,name,intro,introformat,content,contentformat,legacyfiles,legacyfileslast,display,displayoptions,revision,timemodified]
        self.writetofile('page', self.moodle_page)

    def genGradeGrades(self):
        # NOTE: May need to update if this does not process properly
        # iterate through moodle_assign_grades table
        for index, assign_grade in self.moodle_assign_grades.iterrows():
            id = assign_grade['id']
            itemid = assign_grade['assignment']
            userid = assign_grade['userid']
            rawgrade = assign_grade['grade']
            rawgrademax = 100
            rawgrademin = 0
            rawscaleid = '' # NOTE: unsure
            usermodified = assign_grade['grader'] 
            finalgrade = assign_grade['grade']
            hidden = 0
            locked = 0
            locktime = 0
            exported = 0
            overridden = 0
            excluded = 0
            feedback = 0
            feedbackformat = 0
            information = assign_grade['attemptnumber']
            informationformat = 'attempt number of the assignment'
            timecreated = assign_grade['timecreated']
            timemodified = assign_grade['timemodified']
            aggregationstatus = 'unknown'
            aggregationweight = ''
            self.moodle_grade_grades.loc[len(self.moodle_grade_grades.index)] = [id,itemid,userid,rawgrade,rawgrademax,rawgrademin,rawscaleid,usermodified,finalgrade,hidden,locked,locktime, \
                                                                                exported,overridden,excluded,feedback,feedbackformat,information,informationformat,timecreated,timemodified,aggregationstatus,aggregationweight]
        # iterate through moodle_quiz_grades table
        for index, quiz_grade in self.moodle_quiz_grades.iterrows():
            id = quiz_grade['id']
            itemid = quiz_grade['quiz']
            userid = quiz_grade['userid']
            rawgrade = quiz_grade['grade']
            rawgrademax = 100
            rawgrademin = 0
            rawscaleid = '' # NOTE: unsure
            usermodified = '' # NOTE: unsure
            finalgrade = quiz_grade['grade']
            hidden = 0
            locked = 0
            locktime = 0
            exported = 0
            overridden = 0
            excluded = 0
            feedback = 0
            feedbackformat = 0
            information = 0
            informationformat = ''
            timecreated = ''
            timemodified = quiz_grade['timemodified']
            aggregationstatus = 'unknown'
            aggregationweight = ''
            self.moodle_grade_grades.loc[len(self.moodle_grade_grades)] = [id,itemid,userid,rawgrade,rawgrademax,rawgrademin,rawscaleid,usermodified,finalgrade,hidden,locked,locktime, \
                                                                                exported,overridden,excluded,feedback,feedbackformat,information,informationformat,timecreated,timemodified,aggregationstatus,aggregationweight]
        # iterate through moodle_forum_grades table
        for index, forum_grade in self.moodle_forum_grades.iterrows():
            id = forum_grade['id']
            itemid = forum_grade['itemnumber']
            userid = forum_grade['userid']
            rawgrade = forum_grade['grade']
            rawgrademax = 100
            rawgrademin = 0
            rawscaleid = '' # NOTE: unsure
            usermodified = '' # NOTE: unsure
            finalgrade = forum_grade['grade']
            hidden = 0
            locked = 0
            locktime = 0
            exported = 0
            overridden = 0
            excluded = 0
            feedback = 0
            feedbackformat = 0
            information = forum_grade['forum']
            informationformat = 'forum id of the graded discussion'
            timecreated = forum_grade['timecreated']
            timemodified = forum_grade['timemodified']
            aggregationstatus = 'unknown'
            aggregationweight = ''
            self.moodle_grade_grades.loc[len(self.moodle_grade_grades)] = [id,itemid,userid,rawgrade,rawgrademax,rawgrademin,rawscaleid,usermodified,finalgrade,hidden,locked,locktime, \
                                                                                exported,overridden,excluded,feedback,feedbackformat,information,informationformat,timecreated,timemodified,aggregationstatus,aggregationweight]
        # iterate through moodle_lesson_grades table
        for index, lesson_grade in self.moodle_lesson_grades.iterrows():
            id = lesson_grade['id']
            itemid = lesson_grade['lessonid']
            userid = lesson_grade['userid']
            rawgrade = lesson_grade['grade']
            rawgrademax = 100
            rawgrademin = 0
            rawscaleid = '' # NOTE: unsure
            usermodified = '' # NOTE: unsure
            finalgrade = lesson_grade['grade']
            hidden = 0
            locked = 0
            locktime = 0
            exported = 0
            overridden = 0
            excluded = 0
            feedback = 0
            feedbackformat = 0
            information = str(lesson_grade['late'])
            informationformat = 'flag of whether the lesson completed was late'
            timecreated = ''
            timemodified = ''
            aggregationstatus = 'unknown'
            aggregationweight = ''
            self.moodle_grade_grades.loc[len(self.moodle_grade_grades)] = [id,itemid,userid,rawgrade,rawgrademax,rawgrademin,rawscaleid,usermodified,finalgrade,hidden,locked,locktime, \
                                                                                exported,overridden,excluded,feedback,feedbackformat,information,informationformat,timecreated,timemodified,aggregationstatus,aggregationweight]
        self.writetofile('grade_grades', self.moodle_grade_grades)

    def genAssign_tables(self,max_num_assigns=3):
        """This method generates 5 assign tables: assign_user_mapping, assign, assign_submission, assignsubmission_file and assign_grades"""
        date_range = self.__get_daterange()
        for index, course in self.moodle_course.iterrows():
            # find course id
            course_id = course['id']
            # find number of students in course
            dfBT_enroll = spark.createDataFrame(self.enrollment)
            dfEnroll = dfBT_enroll.filter(dfBT_enroll['SectionID'] == f'{course_id}')
            num_students_in_course = dfEnroll.count()
            # randomly choose how many assignments have been assigned in this class
            n = random.randint(0,max_num_assigns)
            while n > 0:
                # choose the day this assignment was assigned
                assign_day = random.choice(date_range)
                # generate the assign id
                assign_id = self.faker.uuid4()
                # finally generate the assign tables
                self._genAssignUserMapping(assign_id,dfEnroll) 
                num_students_not_submitted,max_attempts = self._genAssign(assign_id,course_id,assign_day,num_students_in_course)
                num_students_submit = num_students_in_course - num_students_not_submitted # calculate num students to have submitted
                self._genAssignSubmission(assign_id,assign_day,num_students_submit,dfEnroll,max_attempts)
                self._genAssignSubmissionFile(assign_id)
                self._genAssignGrades(assign_id,course_id,assign_day)
                n = n - 1
        self.writetofile('assign_user_mapping', self.moodle_assign_user_mapping)
        self.writetofile('assign', self.moodle_assign)
        self.writetofile('assign_submission', self.moodle_assign_submission)
        self.writetofile('assignsubmission_file', self.moodle_assignsubmission_file)
        self.writetofile('assign_grades', self.moodle_assign_grades)

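    def _genAssign(self,assignid,courseid,assignday,num_students_in_course):
        """Appends one row to the assign table; returns (number of students who did not submit, max attempts allowed)."""
        # NOTE: this code assumes there's supposed to be one row per assignment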
        id = assignid
        course = courseid
        name = 'Assignment for Course' # NOTE: can be modified for unique names
        intro = 'This is an assignment for a course'
        introformat = 0
        alwaysshowdescription = 1
        nosubmissions = random.randint(0,num_students_in_course) # NOTE: used here as the count of students who did not submit
        submissiondrafts = 0
        sendnotifications = 0
        sendlatenotifications = 0
        allowsubmissionsfromdate = 0
        grade = 100
        timemodified = assignday
        duedate = timemodified + dt.timedelta(days=7)
        requiresubmissionstatement = 0
        completionsubmit = 0 # NOTE: unsure
        cutoffdate = timemodified + dt.timedelta(days=7)
        gradingduedate = timemodified + dt.timedelta(days=14)
        teamsubmission = 0
        requireallteammemberssubmit = 0
        teamsubmissiongroupingid = 0
        blindmarking = 0
        hidegrader = 0
        revealidentities = 0
        attemptreopenmethod = 'none'
        maxattempts = random.randint(1,3)
        markingworkflow = 0
        markingallocation = 0
        sendstudentnotifications = 0
        preventsubmissionnotingroup = 0
        activity = 'Assignment Progress for the Course'
        activityformat = 0
        timelimit = 0
        submissionattachments = 3
        self.moodle_assign.loc[len(self.moodle_assign)] = [id,course,name,intro,introformat,alwaysshowdescription,nosubmissions,submissiondrafts,sendnotifications,sendlatenotifications, \
                                                                    duedate,allowsubmissionsfromdate,grade,timemodified,requiresubmissionstatement,completionsubmit,cutoffdate,gradingduedate,teamsubmission, \
                                                                    requireallteammemberssubmit,teamsubmissiongroupingid,blindmarking,hidegrader,revealidentities,attemptreopenmethod,maxattempts,markingworkflow, \
                                                                    markingallocation,sendstudentnotifications,preventsubmissionnotingroup,activity,activityformat,timelimit,submissionattachments]
        return nosubmissions,maxattempts

    def _genAssignUserMapping(self,assignid,dfEnroll):
        df_enroll = dfEnroll.toPandas()
        for index, enroll in df_enroll.iterrows():
            id = self.faker.uuid4()
            assignment = assignid
            userid = enroll['StudentID']
            self.moodle_assign_user_mapping.loc[len(self.moodle_assign_user_mapping)] = [id,assignment,userid]

    def _genAssignSubmission(self,assignid,assignday,num_students_submit,dfEnroll,maxattempts):
        # randomly sample the students enrolled in the course for assignment submissions
        df_enroll = dfEnroll.toPandas()
        df_students_submit = df_enroll.sample(n=num_students_submit)
        for index, student in df_students_submit.iterrows():
            # assign static variables per student submission(s)
            userid = student['StudentID']
            random_num_attempts = random.randint(1,maxattempts)
            assignment = assignid
            latest = random_num_attempts
            timecreated = assignday + dt.timedelta(days=random.randint(1,6),hours=random.randint(0,23),minutes=random.randint(0,59))
            if random_num_attempts == 1:
                # if only one attempt for the student, then simply generate that single submission
                timemodified = timecreated
                timestarted = timecreated - dt.timedelta(hours=random.randint(0,4),minutes=random.randint(0,59))
                id = self.faker.uuid4()
                status = 'SUBMITTED'
                groupid = 0
                attemptnumber = 1
                latest = random_num_attempts
                self.moodle_assign_submission.loc[len(self.moodle_assign_submission)] = [id,assignment,userid,timecreated,timemodified,timestarted,status,groupid,attemptnumber,latest]
            else:
                # create variables for keeping track of the last attempt from the same student
                last_attempt_daytime = timecreated + dt.timedelta(days=random.randint(0,3),hours=random.randint(0,23),minutes=random.randint(0,59))
                previous_attempt = None
                # iterate through total number of attempts
                for n in range(random_num_attempts):
                    id = self.faker.uuid4()
                    status = 'SUBMITTED'
                    groupid = 0
                    attemptnumber = n + 1
                    timemodified = last_attempt_daytime
                    if n == (random_num_attempts - 1):
                        timestarted = last_attempt_daytime
                    else:
                        if previous_attempt is None:
                            timestarted = timecreated + dt.timedelta(days=random.randint(0,2),hours=random.randint(0,23),minutes=random.randint(0,59))
                            while timestarted > timemodified:
                                timestarted = timecreated + dt.timedelta(days=random.randint(0,2),hours=random.randint(0,23),minutes=random.randint(0,59))
                        else:
                            timestarted = previous_attempt + dt.timedelta(days=random.randint(0,2),hours=random.randint(0,23),minutes=random.randint(0,59))
                            while timestarted > timemodified:
                                timestarted = previous_attempt + dt.timedelta(days=random.randint(0,2),hours=random.randint(0,23),minutes=random.randint(0,59))
                    previous_attempt = timestarted
                    self.moodle_assign_submission.loc[len(self.moodle_assign_submission)] = [id,assignment,userid,timecreated,timemodified,timestarted,status,groupid,attemptnumber,latest]

    def _genAssignSubmissionFile(self,assignid):
        # grab only the rows from assign_submission that pertain to this assignment
        dfSubmissions = spark.createDataFrame(self.moodle_assign_submission)
        dfSubmissions = dfSubmissions.filter(dfSubmissions['assignment'] == f'{assignid}')
        pdfSubmissions = dfSubmissions.toPandas()
        for index, submission_row in pdfSubmissions.iterrows():
            id = self.faker.uuid4()
            assignment = assignid
            # keep the loop variable distinct from the submission id stored in the row
            submission = submission_row['id']
            numfiles = random.randint(1,3)
            self.moodle_assignsubmission_file.loc[len(self.moodle_assignsubmission_file)] = [id,assignment,submission,numfiles]
    
    def _genAssignGrades(self,assignid,courseid,assignday):
        # grab the latest submission for each student that submitted that assignment and grade them
        dfSubmissions = spark.createDataFrame(self.moodle_assign_submission)
        dfSubmissions = dfSubmissions.filter(dfSubmissions['assignment'] == f'{assignid}')
        dfSubmissions = dfSubmissions.filter(dfSubmissions['attemptnumber']==dfSubmissions['latest'])
        pdfSubmissions = dfSubmissions.toPandas()
        # find the instructor ID for that course
        dfInstructor = spark.createDataFrame(self.instructors_enroll)
        instructor_id = dfInstructor.select('InstructorId', 'InstructsClass_SectionId').filter(dfInstructor['InstructsClass_SectionId'] == f'{courseid}').collect()[0][0]
        # set any static fields
        assign_graded_datetime = assignday + dt.timedelta(days=random.randint(9,14),hours=random.randint(6,22),minutes=random.randint(0,59))
        for index, submission in pdfSubmissions.iterrows():
            id = self.faker.uuid4()
            assignment = assignid
            userid = submission['userid']
            timecreated = assign_graded_datetime
            timemodified = assign_graded_datetime
            grader = instructor_id
            #grade = '{}'.format(decimal.Decimal(random.randrange(4000, 10000))/100)
            grade = round(random.triangular(40.00,100.00,73.50))
            attemptnumber = submission['latest']
            self.moodle_assign_grades.loc[len(self.moodle_assign_grades)] = [id,assignment,userid,timecreated,timemodified,grader,grade,attemptnumber]
    
    def genQuiz_tables(self,max_num_quizzes=5):
        """This method generates 3 quiz tables: quiz, quiz_attempts and quiz_grades"""
        date_range = self.__get_daterange()
        for index, course in self.moodle_course.iterrows():
            # find course id
            course_id = course['id']
            # find number of students in course
            dfBT_enroll = spark.createDataFrame(self.enrollment)
            dfEnroll = dfBT_enroll.filter(dfBT_enroll['SectionID'] == f'{course_id}')
            num_students_in_course = dfEnroll.count()
            # randomly choose the final grading method per quiz (0=highest quiz grade, 1=average grade, 2=first quiz attempt, 3=last quiz attempt)
            # one method is kept static per class; _genQuizGrades has no "first attempt" branch, so a draw of 2 or 3 maps to "last attempt" (3)
            random_grade_method = random.randint(0,3)
            if random_grade_method == 0:
                grade_method = 0
            elif random_grade_method == 1:
                grade_method = 1
            else:
                grade_method = 3
            # randomly choose how many quizzes have been assigned in this class
            n = random.randint(0,max_num_quizzes)
            while n > 0:
                # choose the day the quiz was assigned
                quiz_day = random.choice(date_range)
                # generate the quiz ID
                quiz_id = self.faker.uuid4()
                # finally generate the quiz tables
                max_attempts = self._genQuiz(quiz_id,course_id,quiz_day, grade_method) 
                self._genQuizAttempts(quiz_id,quiz_day,num_students_in_course,dfEnroll,max_attempts)
                self._genQuizGrades(quiz_id, grade_method)
                n = n - 1
        self.writetofile('quiz', self.moodle_quiz)
        self.writetofile('quiz_attempts', self.moodle_quiz_attempts)
        self.writetofile('quiz_grades', self.moodle_quiz_grades)

    def _genQuiz(self,quizid,courseid,quizday,grade_method):
        id = quizid
        course = courseid
        name = 'Quiz for Course' # NOTE: can be modified for unique names
        intro = 'This is a quiz for a course'
        introformat = 0
        timeopen = quizday
        timeclose = timeopen + dt.timedelta(days=1)
        timelimit = 0
        overduehandling = 'autoabandon' # NOTE: unsure
        graceperiod = 0
        preferredbehavior = ''
        canredoquestions = 0
        attempts = random.randint(1,3)
        attemptonlast = 0
        grademethod = grade_method
        decimalpoints = 0
        questiondecimalpoints = 0
        reviewattempt = 0
        reviewcorrectness = 0
        reviewmarks = 0
        reviewspecificfeedback = 0
        reviewgeneralfeedback = 0
        reviewrightanswer = 0
        reviewoverallfeedback = 0
        questionsperpage = 20 # NOTE: unsure - currently using this to say there are a total of 20 questions
        navmethod = 'free'
        shuffleanswers = 1
        sumgrades = 100 # NOTE: unsure
        grade = 100
        timecreated = quizday - dt.timedelta(days=1,hours=random.randint(8,12))
        timemodified = quizday
        password = ''
        subnet = ''
        browsersecurity = 'securewindow'
        delay1 = 120
        delay2 = 120
        showuserpicture = 0 
        showblocks = 0
        completionattemptsexhausted = 0
        completionminattempts = 0
        allowofflineattempts = 0
        self.moodle_quiz.loc[len(self.moodle_quiz)] = [id,course,name,intro,introformat,timeopen,timeclose,timelimit,overduehandling,graceperiod,preferredbehavior, \
                                                            canredoquestions,attempts,attemptonlast,grademethod,decimalpoints,questiondecimalpoints,reviewattempt,reviewcorrectness, \
                                                            reviewmarks,reviewspecificfeedback,reviewgeneralfeedback,reviewrightanswer,reviewoverallfeedback,questionsperpage,navmethod, \
                                                            shuffleanswers,sumgrades,grade,timecreated,timemodified,password,subnet,browsersecurity,delay1,delay2,showuserpicture, \
                                                            showblocks,completionattemptsexhausted,completionminattempts,allowofflineattempts]
        return attempts
    
    def _genQuizAttempts(self,quizid,quizday,num_students_in_course,dfEnroll,maxattempts):
        # randomly sample the students enrolled in the course for quiz submissions
        half_of_students_in_class = round(num_students_in_course / 2)
        num_students_submit = random.randint(half_of_students_in_class, num_students_in_course)
        df_enroll = dfEnroll.toPandas()
        df_students_submit = df_enroll.sample(n=num_students_submit)
        for index, student in df_students_submit.iterrows():
            # assign static variables per student submission(s)
            quiz = quizid
            userid = student['StudentID']
            layout = ''
            currentpage = 0
            preview = 0
            timemodifiedoffline = 0
            timecheckstate = 0
            gradednotificationsenttime = ''
            random_num_attempts = random.randint(1,maxattempts)
            if random_num_attempts == 1:
                # if only one attempt for the student, then simply generate that single submission
                id = self.faker.uuid4()
                uniqueid = self.faker.uuid4()
                state = 'finished'
                attempt = 1
                timestart = quizday + dt.timedelta(hours=random.randint(0,23),minutes=random.randint(0,59))
                timefinish = timestart + dt.timedelta(hours=random.randint(0,1),minutes=random.randint(0,20))
                timemodified = timefinish
                sumgrades = round(random.triangular(35.00,100.00,73.50))
                self.moodle_quiz_attempts.loc[len(self.moodle_quiz_attempts)] = [id,quiz,userid,attempt,uniqueid,layout,currentpage,preview,state,timestart,timefinish,timemodified,timemodifiedoffline,timecheckstate,sumgrades,gradednotificationsenttime]
            else:
                # create variables for keeping track of the last attempt from the same student
                last_attempt_daytime = quizday + dt.timedelta(hours=random.randint(20,23),minutes=random.randint(0,59))
                previous_attempt = None
                # iterate through total number of attempts
                for n in range(0,random_num_attempts):
                    id = self.faker.uuid4()
                    attempt = n + 1
                    uniqueid = self.faker.uuid4()
                    state = 'finished'
                    timemodified = last_attempt_daytime
                    sumgrades = round(random.triangular(35.00,100.00,73.50))
                    if n == (random_num_attempts - 1):
                        timestart = last_attempt_daytime - dt.timedelta(minutes=random.randint(0,20))
                        timefinish = last_attempt_daytime
                    else:
                        if previous_attempt is None:
                            timestart = quizday + dt.timedelta(hours=random.randint(6,20),minutes=random.randint(0,59))
                            timefinish = timestart + dt.timedelta(hours=random.randint(0,1),minutes=random.randint(0,20))
                        else:
                            timestart = previous_attempt + dt.timedelta(hours=random.randint(0,2),minutes=random.randint(0,59))
                            timefinish = timestart + dt.timedelta(hours=random.randint(0,1),minutes=random.randint(0,20))
                    previous_attempt = timefinish
                    self.moodle_quiz_attempts.loc[len(self.moodle_quiz_attempts)] = [id,quiz,userid,attempt,uniqueid,layout,currentpage,preview,state,timestart,timefinish,timemodified,timemodifiedoffline,timecheckstate,sumgrades,gradednotificationsenttime]

    def _genQuizGrades(self,quizid, grade_method):
        # grab quiz attempts
        dfAttempts = spark.createDataFrame(self.moodle_quiz_attempts)
        dfAttempts = dfAttempts.filter(dfAttempts['quiz'] == f'{quizid}')
        # choose how each attempt is graded
        if grade_method == 0:
            # grab highest quiz grade
            dfGrades = dfAttempts.groupBy('userid').max('sumgrades')
            dfGrades = dfGrades.withColumnRenamed('max(sumgrades)', 'grade')
            df_final = dfAttempts.filter(dfAttempts['timemodified'] == dfAttempts['timefinish']).select('userid', 'timemodified').withColumnRenamed('userid', 'uid')
            df_final = dfGrades.join(df_final, dfGrades.userid == df_final.uid, how='inner').drop('uid')
        elif grade_method == 1:
            # grab average quiz grade over all attempts
            dfGrades = dfAttempts.groupBy('userid').avg('sumgrades')
            dfGrades = dfGrades.withColumnRenamed('avg(sumgrades)', 'grade')
            dfGrades = dfGrades.withColumn('grade', F.round('grade'))
            df_final = dfAttempts.filter(dfAttempts['timemodified'] == dfAttempts['timefinish']).select('userid', 'timemodified').withColumnRenamed('userid', 'uid')
            df_final = dfGrades.join(df_final, dfGrades.userid == df_final.uid, how='inner').drop('uid')
        else:
            # grab latest quiz attempt
            dfGrades = dfAttempts.filter(dfAttempts['timemodified'] == dfAttempts['timefinish'])
            df_final = dfGrades.select('userid', 'sumgrades', 'timemodified').withColumnRenamed('sumgrades', 'grade')
        pdf_final = df_final.toPandas()
        for index, attempt in pdf_final.iterrows():
            id = self.faker.uuid4()
            quiz = f'{quizid}'
            userid = attempt['userid']
            grade = attempt['grade']
            timemodified = attempt['timemodified']
            self.moodle_quiz_grades.loc[len(self.moodle_quiz_grades)] = [id,quiz,userid,grade,timemodified]

    def genForum_tables(self,max_num_forums=5):
        """This method generates 4 forum tables: forum, forum_discussions, forum_grades and forum_posts"""
        date_range = self.__get_daterange()
        for index, course in self.moodle_course.iterrows():
            # find course id
            course_id = course['id']
            # find number of students in course
            dfBT_enroll = spark.createDataFrame(self.enrollment)
            dfEnroll = dfBT_enroll.filter(dfBT_enroll['SectionID'] == f'{course_id}')
            num_students_in_course = dfEnroll.count()
            # randomly choose how many forums have been assigned in this class
            n = random.randint(0,max_num_forums)
            while n > 0:
                # choose the day the forum was assigned
                forum_day = random.choice(date_range)
                # generate the forum ID
                forum_id = self.faker.uuid4()
                # finally generate the forum tables
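                # _genForum returns three values; completionreplies is not used by the methods below
                time_modified,complete_discuss,_complete_replies = self._genForum(forum_id,course_id,forum_day,num_students_in_course)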
                self._genForumDiscussions(forum_id,course_id,dfEnroll,time_modified,complete_discuss,date_range)
                self._genForumGrades(forum_id,time_modified)
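                n = n - 1
        # build forum_posts once, after every forum has been generated: _genForumPosts reads the full
        # forum_discussions table, so calling it once per forum would re-add posts for earlier forums
        if not self.moodle_forum_discussions.empty:
            self._genForumPosts()
        self.writetofile('forum', self.moodle_forum)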
        self.writetofile('forum_discussions', self.moodle_forum_discussions)
        self.writetofile('forum_grades', self.moodle_forum_grades)
        self.writetofile('forum_posts', self.moodle_forum_posts)

    def _genForum(self,forumid,courseid,forumgradedate,num_students_in_course):
        id = forumid
        course = courseid
        type = 'general'
        name = 'Forum for Course' # NOTE: can be modified for unique names
        intro = 'This is a forum for a course'
        introformat = 0
        duedate = forumgradedate 
        cutoffdate = self.enddate # NOTE: currently set to end of semester
        assessed = num_students_in_course # NOTE: unsure if this should be representing how many are graded
        assesstimestart = forumgradedate + dt.timedelta(days=random.randint(0,5),hours=random.randint(0,23),minutes=random.randint(0,59))
        assesstimefinish = assesstimestart + dt.timedelta(hours=random.randint(0,4),minutes=random.randint(0,59))
        scale = 1 # NOTE: unsure
        grade_forum = 1 # NOTE: unsure
        grade_forum_notify = 0
        maxbytes = 0
        maxattachments = 1
        forcesubscribe = 0
        trackingtype = 1
        rsstype = 0
        rssarticles = 0
        timemodified = assesstimefinish
        warnafter = duedate - dt.timedelta(hours=12)
        blockafter = self.enddate
        blockperiod = 0
        completiondiscussions = random.randint(1,3)
        completionreplies = 0
        completionposts = completiondiscussions + completionreplies # NOTE: generated by the num posts and replies needed to be marked as complete
        displaywordcount = 0
        lockdiscussionafter = cutoffdate
        self.moodle_forum.loc[len(self.moodle_forum)] = [id,course,type,name,intro,introformat,duedate,cutoffdate,assessed,assesstimestart,assesstimefinish,scale,grade_forum, \
                                                            grade_forum_notify,maxbytes,maxattachments,forcesubscribe,trackingtype,rsstype,rssarticles,timemodified,warnafter,blockafter, \
                                                            blockperiod,completiondiscussions,completionreplies,completionposts,displaywordcount,lockdiscussionafter]
        return timemodified,completiondiscussions,completionreplies

    def _genForumDiscussions(self,forumid,courseid,dfEnroll,time_modified,complete_discuss,daterange):
        # currently assumes every enrolled student has completed the required posts/replies
        df_enroll = dfEnroll.toPandas()
        for index, student in df_enroll.iterrows():
            # assign static variables per student submission(s)
            course = courseid
            forum = forumid
            firstpost = self.faker.uuid4()
            userid = student['StudentID']
            groupid = -1
            assessed = 1
            timemodified = time_modified
            pinned = 0
            timelocked = self.enddate
            # randomly set variable for the datetime the student added a post or reply
            last_post = random.choice(daterange)
            if complete_discuss == 1:
                id = firstpost
                name = 'post'
                timestart = last_post + dt.timedelta(hours=random.randint(0,23),minutes=random.randint(0,59))
                timeend = timestart + dt.timedelta(hours=random.randint(0,1),minutes=random.randint(0,59))
                usermodified = timeend
                self.moodle_forum_discussions.loc[len(self.moodle_forum_discussions)] = [id,course,forum,name,firstpost,userid,groupid,assessed,timemodified,usermodified,timestart,timeend,pinned,timelocked]
            else:
                # iterate through adding student posts based on the completion requirement
                for n in range(complete_discuss):
                    name = 'post'
                    timestart = last_post + dt.timedelta(hours=random.randint(0,23),minutes=random.randint(0,59))
                    timeend = timestart + dt.timedelta(hours=random.randint(0,1),minutes=random.randint(0,59))
                    usermodified = timeend
                    if n == 0:
                        id = firstpost
                    else:
                        id = self.faker.uuid4()
                    last_post = timeend
                    self.moodle_forum_discussions.loc[len(self.moodle_forum_discussions)] = [id,course,forum,name,firstpost,userid,groupid,assessed,timemodified,usermodified,timestart,timeend,pinned,timelocked]

    def _genForumGrades(self,forumid,time_modified):
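        # grade each student forum post/reply in moodle_forum_discussions that belongs to this forum only;
        # without the filter, discussions from previously generated forums would be graded again under this forum id
        df_discuss = self.moodle_forum_discussions[self.moodle_forum_discussions['forum'] == forumid].copy()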
        for index, discuss in df_discuss.iterrows():
            id = self.faker.uuid4()
            forum = forumid
            itemnumber = discuss['id']
            userid = discuss['userid']
            grade = round(random.triangular(60,100,85))
            timecreated = discuss['timeend']
            timemodified = time_modified
            self.moodle_forum_grades.loc[len(self.moodle_forum_grades)] = [id,forum,itemnumber,userid,grade,timecreated,timemodified]

    def _genForumPosts(self):
        # holds all forum posts; joins tables together to make data look realistic
        dfUsers = spark.createDataFrame(self.moodle_user).select('id', 'firstname', 'lastname')
        dfUsers = dfUsers.withColumnRenamed('id', 'uid').withColumnRenamed('firstname', 'user_firstname').withColumnRenamed('lastname', 'user_lastname')
        dfCourse = spark.createDataFrame(self.moodle_course).select('id', 'fullname', 'shortname')
        dfCourse = dfCourse.withColumnRenamed('id', 'cid').withColumnRenamed('fullname', 'course_fullname').withColumnRenamed('shortname', 'course_shortname')
        dfDiscussGrades = spark.createDataFrame(self.moodle_forum_grades).select('itemnumber', 'grade')
        dfDiscussGrades = dfDiscussGrades.withColumnRenamed('itemnumber', 'discuss_id')
        dfDiscuss = spark.createDataFrame(self.moodle_forum_discussions)
        dfDiscuss = dfDiscuss.join(dfUsers, dfDiscuss.userid == dfUsers.uid,how='inner').drop('uid')
        dfDiscuss = dfDiscuss.join(dfCourse, dfDiscuss.course == dfCourse.cid,how='inner').drop('cid')
        dfDiscuss = dfDiscuss.join(dfDiscussGrades, dfDiscuss.id == dfDiscussGrades.discuss_id,how='inner').drop('discuss_id')
        # then only extract the posts, convert back to pandas df, and fill table
        dfDiscuss = dfDiscuss.filter(dfDiscuss['name']=='post')
        df_discuss = dfDiscuss.toPandas()
        for index, discuss in df_discuss.iterrows():
            id = self.faker.uuid4()
            discussion = discuss['id']
            parent = f'{id}'
            userid = discuss['userid']
            created = discuss['timestart']
            modified = discuss['usermodified']
            mailed = 0
            # extract useful items
            user_firstname = discuss['user_firstname']
            user_lastname = discuss['user_lastname']
            course_fullname = discuss['course_fullname']
            # continue generating fields
            subject = f'Post by {user_firstname} {user_lastname}'
            message = f'This post is for the course {course_fullname}. This is a sample discussion-post message.'
            messageformat = 0
            messagetrust = 0
            attachment = '' # unsure
            totalscore = discuss['grade']
            mailnow = 0
            deleted = 0
            privatereplyto = 0
            wordcount = len(message.split()) # NOTE: whitespace-separated word count of the generated message
            charcount = len(message) # NOTE: character count of the generated message, spaces included
            self.moodle_forum_posts.loc[len(self.moodle_forum_posts)] = [id,discussion,parent,userid,created,modified,mailed,subject,message,messageformat,messagetrust,attachment, \
                                                                            totalscore,mailnow,deleted,privatereplyto,wordcount,charcount]

    def genLesson_tables(self,max_num_lessons=5):
        """This method generates the 6 lesson tables: lesson, lesson_pages, lesson_attempts, lesson_answers, lesson_grades and lesson_timer"""
        date_range = self.__get_daterange()
        for index, course in self.moodle_course.iterrows():
            # find course id
            course_id = course['id']
            # find number of students in course
            dfBT_enroll = spark.createDataFrame(self.enrollment)
            dfEnroll = dfBT_enroll.filter(dfBT_enroll['SectionID'] == f'{course_id}')
            num_students_in_course = dfEnroll.count()
            # randomly choose how many lessons have been assigned in this class
            n = random.randint(0,max_num_lessons)
            while n > 0:
                # choose the day the lesson was assigned
                lesson_day = random.choice(date_range)
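                # generate a lesson ID and static lesson page ID (currently set to one page per lesson)
                # NOTE: _genLessonPages generates its own page IDs, so lesson_page_id does not match the id column in lesson_pages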
                lesson_id = self.faker.uuid4()
                lesson_page_id = self.faker.uuid4()
                # finally generate the lesson tables
                max_attempts,min_questions,max_pages = self._genLesson(lesson_id,course_id,lesson_day) 
                self._genLessonPages(lesson_id,lesson_day,max_pages)
                self._genLessonAnswers(lesson_id,lesson_page_id,min_questions,lesson_day)
                self._genLessonAttempts(lesson_id,lesson_page_id,dfEnroll,num_students_in_course,max_attempts,lesson_day,min_questions)
                self._genLessonGrades(lesson_id,min_questions,dfEnroll)
                self._genLessonTimer(lesson_id,lesson_day)
                n = n - 1
        self.writetofile('lesson', self.moodle_lesson)
        self.writetofile('lesson_pages', self.moodle_lesson_pages)
        self.writetofile('lesson_attempts', self.moodle_lesson_attempts)
        self.writetofile('lesson_answers', self.moodle_lesson_answers)
        self.writetofile('lesson_grades', self.moodle_lesson_grades)
        self.writetofile('lesson_timer', self.moodle_lesson_timer)

    def _genLesson(self,lesson_id,courseid,lessonday):
        # NOTE: generally unsure whether there's supposed to be one row per lesson,
            # or whether there's one row per student connected to a lesson.
            # this code assumes the former.
        id = lesson_id
        course = courseid
        name = 'Lesson for Course' # NOTE: can be modified for unique names
        intro = 'This is a lesson for a course'
        introformat = 0
        practice = 0
        modattempts = 0 # NOTE: unsure
        usepassword = 0
        password = ''
        dependency = 0
        conditions = 'null'
        grade = 100
        custom = 0
        ongoing = 0
        usemaxgrade = 1 # NOTE: unsure
        maxanswers = 20
        maxattempts = random.randint(1,5)
        review = 0 # NOTE: unsure
        nextpagedefault = 0
        feedback = 1
        minquestions = random.randint(5,20)
        maxpages = 1 # NOTE: currently set to 1 page per lesson; to generate more pages, adjust the lesson answers and lesson attempts methods. 
        timelimit = 0
        retake = 1 # NOTE: unsure
        activitylink = 0
        mediafile = '' # NOTE: unsure; local file path or full external URL
        mediaheight = 100
        mediawidth = 650
        mediaclose = 0
        slideshow = 0
        width = 640
        height = 480
        bgcolor = '#FFFFFF'
        displayleft = 0
        displayleftif = 0
        progressbar = 0
        available = lessonday
        deadline = lessonday + dt.timedelta(days=random.randint(7,21))
        timemodified = lessonday
        completionendreached = 0 # NOTE: unsure
        completiontimespent = 0 # NOTE: unsure
        allowofflineattempts = 0
        self.moodle_lesson.loc[len(self.moodle_lesson)] = [id,course,name,intro,introformat,practice,modattempts,usepassword,password,dependency,conditions,grade,custom,ongoing, \
                                                            usemaxgrade,maxanswers,maxattempts,review,nextpagedefault,feedback,minquestions,maxpages,timelimit,retake,activitylink,mediafile, \
                                                            mediaheight,mediawidth,mediaclose,slideshow,width,height,bgcolor,displayleft,displayleftif,progressbar,available,deadline, \
                                                            timemodified,completionendreached,completiontimespent,allowofflineattempts]
        return maxattempts,minquestions,maxpages

    def _genLessonPages(self,lesson_id,lesson_day,max_pages):
        # develops data for the pages associated with each lesson
        if max_pages == 1:
            id = self.faker.uuid4()
            lessonid = lesson_id
            prevpageid = ''
            nextpageid = ''
            qtype = 0 # NOTE: unsure
            qoption = 0
            layout = 1
            display = 1
            timecreated = lesson_day - dt.timedelta(days=random.randint(1,7))
            timemodified = lesson_day
            title = 'Example of a single page for a lesson of a course'
            contents = 'Questions pertaining to a particular study in the course' # NOTE: unsure
            contentsformat = 0
            self.moodle_lesson_pages.loc[len(self.moodle_lesson_pages)] = [id,lessonid,prevpageid,nextpageid,qtype,qoption,layout,display,timecreated,timemodified,title,contents,contentsformat]
        else:  
            first_page_id = self.faker.uuid4()
            # instantiate dynamic variables per iteration for tracking ids to link between row info
            previous_page_id = ''
            old_nextpageid = ''
            for n in range(max_pages):
                lessonid = lesson_id
                qtype = 0 # NOTE: unsure
                qoption = 0
                layout = 1
                display = 1
                timecreated = lesson_day - dt.timedelta(days=random.randint(1,7))
                timemodified = lesson_day
                contents = 'Questions pertaining to a particular study in a course' # NOTE: unsure
                contentsformat = 0
                if n == 0:
                    id = first_page_id
                    prevpageid = ''
                    nextpageid = self.faker.uuid4()
                    title = 'Example of the first page for a lesson'
                    previous_page_id = id
                    old_nextpageid = nextpageid
                elif n == (max_pages-1):
                    id = old_nextpageid
                    prevpageid = previous_page_id
                    nextpageid = ''
                    title = 'Example of the last page for a lesson'
                else:
                    id = old_nextpageid
                    prevpageid = previous_page_id
                    nextpageid = self.faker.uuid4()
                    title = 'Example of a page for a lesson'
                    previous_page_id = id
                    old_nextpageid = nextpageid
                self.moodle_lesson_pages.loc[len(self.moodle_lesson_pages)] = [id,lessonid,prevpageid,nextpageid,qtype,qoption,layout,display,timecreated,timemodified,title,contents,contentsformat]

    def _genLessonAnswers(self,lesson_id,page_id,min_questions,lessonday):
        # generally unsure whether minquestion field can be used in this way, and whether this table looks this way in production data
        timecreated = lessonday + dt.timedelta(hours=6,minutes=random.randint(0,59))
        timemodified = timecreated + dt.timedelta(hours=2,minutes=random.randint(0,59))
        for n in range(min_questions):
            id = self.faker.uuid4()
            lessonid = lesson_id
            pageid = page_id
            jumpto = 0
            grade = 1 # NOTE: unsure
            score = 1 # NOTE: unsure
            flags = n+1 # NOTE: currently set to the corresponding question number in the lesson
            answer = 'sample instructor answer for this lesson question'
            answerformat = 0
            response = 'sample correct response for this lesson question' # NOTE: unsure
            responseformat = 0
            self.moodle_lesson_answers.loc[len(self.moodle_lesson_answers)] = [id,lessonid,pageid,jumpto,grade,score,flags,timecreated,timemodified,answer,answerformat,response,responseformat]
        for m in range(min_questions):
            id = self.faker.uuid4()
            lessonid = lesson_id
            pageid = page_id
            jumpto = 0
            grade = 0 # NOTE: unsure
            score = 1 # NOTE: unsure
            flags = m+1 # NOTE: currently set to the corresponding question number in the lesson
            answer = 'sample instructor answer for this lesson question'
            answerformat = 0
            response = 'sample incorrect response for this lesson question' # NOTE: unsure
            responseformat = 0
            self.moodle_lesson_answers.loc[len(self.moodle_lesson_answers)] = [id,lessonid,pageid,jumpto,grade,score,flags,timecreated,timemodified,answer,answerformat,response,responseformat]

    def _genLessonAttempts(self,lesson_id,page_id,dfEnroll,num_students_in_course,maxattempts,lessonday,min_questions):
        # randomly sample the students enrolled in the course for lesson attempts
        half_of_students_in_class = round(num_students_in_course / 2)
        num_students_submit = random.randint(half_of_students_in_class,num_students_in_course)
        df_enroll = dfEnroll.toPandas()
        df_students_attempt = df_enroll.sample(n=num_students_submit)
        # use lesson_answers table to map to answerid (built once, since it does not vary per student)
        dfLessonAnswers = spark.createDataFrame(self.moodle_lesson_answers)
        dfLessonAnswers = dfLessonAnswers.filter(dfLessonAnswers['lessonid'] == f'{lesson_id}')
        for index, student in df_students_attempt.iterrows():
            # assign static variables per student attempt(s)
            lessonid = lesson_id
            pageid = page_id
            userid = student['StudentID']
            useranswer = 'sample student response/answer to this lesson question'
            timeseen = lessonday + dt.timedelta(days=random.randint(0,14),hours=random.randint(10,23),minutes=random.randint(0,59))
            # randomly set variable for the number of attempts this student had on the lesson
            num_student_attempts = random.randint(1,maxattempts)
            if num_student_attempts == 1:
                id = self.faker.uuid4() # NOTE: assuming that id is static per student lesson attempt, with varied answerids per specific question answered
                retry = 0
                for m in range(0,min_questions):
                    # iterate through all lesson questions and generate answers with attempt values
                    chance_of_correct = random.choices([0,1], weights=[0.22,0.78])
                    correct = chance_of_correct[0]
                    answerid = dfLessonAnswers.filter(dfLessonAnswers['flags'] == m+1).filter(dfLessonAnswers['grade'] == correct).select('id').collect()[0][0]
                    self.moodle_lesson_attempts.loc[len(self.moodle_lesson_attempts)] = [id,lessonid,pageid,userid,answerid,retry,correct,useranswer,timeseen]
            else:
                # iterate through number of student lesson attempts
                for n in range(0,num_student_attempts):
                    id = self.faker.uuid4() # NOTE: assuming that id is static per student lesson attempt, with varied answerids per specific question answered
                    retry = n # 0 for the first attempt, incremented by 1 for each retake
                    for m in range(0,min_questions):
                        # iterate through all lesson question/answers
                        chance_of_correct = random.choices([0,1], weights=[0.2,0.8])
                        correct = chance_of_correct[0]
                        answerid = dfLessonAnswers.filter(dfLessonAnswers['flags'] == m+1).filter(dfLessonAnswers['grade'] == correct).select('id').collect()[0][0]
                        self.moodle_lesson_attempts.loc[len(self.moodle_lesson_attempts)] = [id,lessonid,pageid,userid,answerid,retry,correct,useranswer,timeseen]
    
    def _genLessonGrades(self,lesson_id,minquestions,dfEnroll):
        # manipulate the lesson_attempts table to extract the max lesson grade of each student (out of 100)
        dfMoodle_lattempts = spark.createDataFrame(self.moodle_lesson_attempts)
        dfMoodle_lattempts = dfMoodle_lattempts.filter(dfMoodle_lattempts['lessonid'] == f'{lesson_id}')
        dfMoodle_lattempts = dfMoodle_lattempts.withColumn('numQuestions', F.lit(1))
        dfMoodle_lattempts = dfMoodle_lattempts.groupBy('id','lessonid','userid').sum('correct','numQuestions')
        dfMoodle_lattempts = dfMoodle_lattempts.withColumnRenamed('sum(correct)','total_correct').withColumnRenamed('sum(numQuestions)','total_questions').withColumn('grade', F.col('total_correct')/F.col('total_questions')*100)
        dfMoodle_lattempts = dfMoodle_lattempts.groupBy('userid').max('grade')
        dfFinal = dfEnroll.join(dfMoodle_lattempts, dfEnroll.StudentID == dfMoodle_lattempts.userid,how='left')
        dfFinal = dfFinal.na.fill(value=0,subset=["max(grade)"])
        df_lesson_attempts_summary = dfFinal.toPandas()
        for index, student in df_lesson_attempts_summary.iterrows():
            id = self.faker.uuid4()
            lessonid = lesson_id
            userid = student['StudentID'] # NOTE: userid is null after the left join for students with no attempts
            grade = round(student['max(grade)'])
            late = 0
            if grade == 0:
                completed = 0
            else:
                completed = 1 
            self.moodle_lesson_grades.loc[len(self.moodle_lesson_grades)] = [id,lessonid,userid,grade,late,completed]

    def _genLessonTimer(self,lesson_id,lesson_day):
        # creates the lesson timer table - may need updating to look realistic
        dfMoodle_lattempts = spark.createDataFrame(self.moodle_lesson_attempts)
        dfMoodle_lattempts = dfMoodle_lattempts.filter(dfMoodle_lattempts['lessonid'] == f'{lesson_id}')
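        # keep one row per student attempt; a bare groupBy() returns GroupedData, which has no toPandas()
        dfMoodle_lattempts = dfMoodle_lattempts.select('id', 'lessonid', 'pageid', 'userid').distinct()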
        df_lesson_attempts_summary = dfMoodle_lattempts.toPandas()
        for index, student_attempt in df_lesson_attempts_summary.iterrows():
            id = self.faker.uuid4()
            lessonid = lesson_id
            userid = student_attempt['userid']
            starttime = lesson_day + dt.timedelta(days=random.randint(0,5),hours=random.randint(0,23),minutes=random.randint(0,59),seconds=random.randint(0,59))
            lessontime = starttime + dt.timedelta(hours=random.randint(0,2),minutes=random.randint(0,59),seconds=random.randint(0,59)) # NOTE: time lesson was completed
            completed = 1
            timemodifiedoffline = lessontime + dt.timedelta(days=random.randint(0,7),hours=random.randint(0,23),minutes=random.randint(0,59),seconds=random.randint(0,59))
            self.moodle_lesson_timer.loc[len(self.moodle_lesson_timer)] = [id,lessonid,userid,starttime,lessontime,completed,timemodifiedoffline]

    def genMessage_tables(self,num_convos=5):
        """This method generates 3 class-wide message tables: messages, message_conversations and message_conversation_members"""
        n = num_convos
        date_range = self.__get_daterange()
        while n > 0:
            # choose the day the message was sent to the class
            message_day = random.choice(date_range)
            # randomly choose which course to generate the message for
            num_courses = len(self.moodle_course.index) - 1
            random_course = random.randint(0,num_courses)
            course = self.moodle_course.filter(items=[random_course], axis=0).at[random_course,'id']
            # find number of students in course
            dfBT_enroll = spark.createDataFrame(self.enrollment)
            dfEnroll = dfBT_enroll.filter(dfBT_enroll['SectionID'] == f'{course}')
            num_students_in_course = dfEnroll.count()
            # find course name
            course_name = dfEnroll.groupBy('SectionName').count().collect()[0][0]
            # generate a fake message ID
            message_id = self.faker.uuid4()
            # finally generate the message tables
            first_message_datetime = self._genMessageConversations(message_id,course_name,course,message_day,self.contextid)
            self._genMessageConversationMembers(message_id,dfEnroll,first_message_datetime)
            self._genMessages(message_id,dfEnroll,first_message_datetime,course_name)
            n = n - 1
        self.writetofile('message_conversations', self.moodle_message_conversations)
        self.writetofile('message_conversation_members', self.moodle_message_conversation_members)
        self.writetofile('messages', self.moodle_messages)
    
    def _genMessageConversations(self,messageid,coursename,courseid,messageday,contextid):
        # NOTE: generally unsure whether there's supposed to be one row per message thread,
            # or whether there's one row per student message to the conversation thread.
            # this code assumes the former.
        id = f'{messageid}'
        type = 1 # NOTE: unsure
        name = f'Message to {coursename}' # NOTE: unsure
        convhash = ''
        component = 'conversations' # NOTE: unsure
        itemtype = 'Message conversation in a course'
        itemid = f'{courseid}' # NOTE: unsure
        contextid = self.contextid
        enabled = 0
        timecreated = messageday + dt.timedelta(hours=random.randint(0,23),minutes=random.randint(0,59))
        timemodified = timecreated + dt.timedelta(days=random.randint(0,7),hours=random.randint(0,23),minutes=random.randint(0,59))
        self.moodle_message_conversations.loc[len(self.moodle_message_conversations.index)] = [id,type,name,convhash,component,itemtype,itemid,contextid,enabled,timecreated,timemodified]
        return timecreated

    def _genMessageConversationMembers(self,messageid,dfEnroll,first_message_datetime):
        df_convo_members = dfEnroll.toPandas()
        for index, student in df_convo_members.iterrows():
            id = self.faker.uuid4()
            conversationid = f'{messageid}'
            userid = student['StudentID']
            timecreated = first_message_datetime
            self.moodle_message_conversation_members.loc[len(self.moodle_message_conversation_members.index)] = [id,conversationid,userid,timecreated]

    def _genMessages(self,messageid,dfEnroll,first_message_datetime,coursename):
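        # randomly choose how many messages will be sent in this conversation thread
        df_enroll = dfEnroll.toPandas()
        # cap at the class size, since sample() cannot draw more rows than exist without replacement
        num_messages = min(random.randint(1,5), len(df_enroll))
        df_students_messaging = df_enroll.sample(n=num_messages)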
        # add dynamic variables per message added to a conversation
        counter = 0
        for index, student in df_students_messaging.iterrows():
            id = self.faker.uuid4()
            useridfrom = student['StudentID']
            conversationid = f'{messageid}'
            subject = f'Question for {coursename}' 
            if counter == 0:
                fullmessage = 'Sample question for the course by a student'
                timecreated = first_message_datetime
                smallmessage = 'Question by student'
            else:
                fullmessage = 'Sample response or answer to other students in the course'
                timecreated = last_message_sent + dt.timedelta(hours=random.randint(0,6),minutes=random.randint(0,59))
                smallmessage = 'Response by another student'
            fullmessageformat = 0
            fullmessagehtml = '' # NOTE: unsure
            fullmessagetrust = 0
            customdata = ''
            # update dynamic variables
            counter = counter + 1
            last_message_sent = timecreated
            self.moodle_messages.loc[len(self.moodle_messages.index)] = [id,useridfrom,conversationid,subject,fullmessage,fullmessageformat,fullmessagehtml,smallmessage,timecreated,fullmessagetrust,customdata]

    def writetofile(self,filename,dfout):
        # turns the pandas df into a pyspark df, and then writes out the generated tables to stage1
        genfilepath = 'stage1/Transactional/test_data/v0.1/moodle_activity_gen/' + filename + '/snapshot_batch_data/rundate='+self.currentDateTime
        dfOutfile = spark.createDataFrame(dfout)
        dfOutfile = dfOutfile.na.drop('all')
        dfOutfile.coalesce(1).write.save(oea.to_url(f'{genfilepath}'), format='csv', mode='overwrite', header='true', mergeSchema='true')