# Test Data Generation: Canvas Activity Tables Class

**Affiliation**: *Kwantum Edu Analytics*. **Last Modified**: *5/26/2023*.

This OEA test data generation class notebook generates fictitious Canvas tables, as seen in the Canvas module. It is required to run the canvas_test_data_gen_demo notebook.

For the schemas of all Canvas tables outlined below, see:
 - v1: https://canvas.instructure.com/doc/api/all_resources.html
 - v2: https://docs.google.com/spreadsheets/d/1kqCXAD9K45L0QeEtbuuMAFp2fW8o0oC8EBzJf58SjrY/edit#gid=527545099

This class notebook relies primarily on the OEA_py class notebook, the ```Faker``` and ```random``` Python packages, and already-generated base-truth tables to generate the Canvas module activity tables listed below: 7 tables for v1 and 6 tables for v2. Only the activity tables are generated; rostering tables are expected to already exist.

v1
 1. **assignments_v1**
 2. **assignment_submissions_v1**
 3. **assignment_submission_summary_v1**
 4. **modules_v1** 
 5. **module_items_v1** 
 6. **quizzes_v1** 
 7. **quiz_submissions_v1**

v2
 1. **assignments_v2**
 2. **context_modules_v2** 
 3. **content_tags_v2** 
 4. **quizzes_v2** 
 5. **quiz_submissions_v2**
 6. **submissions_v2**

There is one main method, ```genCanvasActivity(startdate, enddate, reportgendate, canvas_version, canvas_roster_tables_source_path, max_num_activities_per_class)```, which generates the tables described above. Its parameters are:
  - *startdate*: semester start date.
  - *enddate*: semester end date.
  - *reportgendate*: date the report(s) were generated (i.e., the fictitious date when all tables were landed in the data lake).
  - *canvas_version*: version of Canvas test data desired to be generated (accepted values are v1 or v2).
  - *canvas_roster_tables_source_path*: source of Canvas roster/SIS tables previously generated.
  - *max_num_activities_per_class*: upper bound on the number of activities generated per class. The generator goes through every class in the Canvas roster data and randomly selects the number of activities for it, from 0 up to this value. For example:
    * ```max_num_activities_per_class = 3``` means that when generating assignment test data, each class is randomly given between 0 and 3 assignments.
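
The sampling behaviour described for *max_num_activities_per_class* can be sketched in isolation as follows. This is a minimal illustration, not the class's own code: the helper name `sample_activity_counts`, the `seed` argument, and the course identifiers are hypothetical. The date format shown is the one used by the default date strings of `genCanvasActivity`.

```python
import datetime as dt
import random

# Format of the date strings passed to genCanvasActivity (e.g. '2022-01-03T00:00:00')
DATE_FORMAT = "%Y-%m-%dT%H:%M:%S"

def sample_activity_counts(class_ids, max_num_activities_per_class, seed=None):
    """Hypothetical helper: draw an activity count for each class.

    Each count is drawn uniformly from 0..max_num_activities_per_class,
    both ends inclusive, which is the behaviour of random.randint.
    """
    rng = random.Random(seed)
    return {cid: rng.randint(0, max_num_activities_per_class) for cid in class_ids}

# Parse semester boundaries the same way the date parameters are parsed
startdate = dt.datetime.strptime("2022-01-03T00:00:00", DATE_FORMAT)
enddate = dt.datetime.strptime("2022-06-03T00:00:00", DATE_FORMAT)

# Illustrative class identifiers (assumed, not taken from real roster data)
counts = sample_activity_counts(["class_a", "class_b", "class_c"], 3, seed=42)
for class_id, n in counts.items():
    print(f"{class_id}: {n} activities")
```

Because 0 is a valid draw, some classes may end up with no activities at all for a given table.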

In [1]:
import logging
import random, decimal
import requests  # used to download the base-truth CSVs in genCanvasActivity
from faker import Faker
import pandas as pd
import datetime as dt
import numpy as np
from pyspark.sql import functions as F

class CanvasActivityDataGen():
    def __init__(self, startdate='2022-01-03T00:00:00', enddate='2022-06-03T00:00:00'):
        #self.startdate = startdate
        #self.enddate = enddate
        
        self.faker = Faker('en_US')

        # set current datetime for rundate folder for writing out files
        currentDate = dt.datetime.now()
        self.currentDateTime = currentDate.strftime("%Y-%m-%d %H-%M-%S")

        # initialize dfs for each Canvas table to be generated
        assignments_v1 = {
            "id":[],
            "name":[],
            "description":[],
            "created_at":[],
            "updated_at":[],
            "due_at":[],
            "lock_at":[],
            "unlock_at":[],
            "has_overrides":[],
            "course_id":[],
            "html_url":[],
            "submissions_download_url":[],
            "assignment_group_id":[],
            "due_date_required":[],
            "allow_extensions":[],
            "max_name_length":[],
            "turnitin_enabled":[],
            "verticite_enabled":[],
            "turnitin_settings":[],
            "grade_group_students_individually":[],
            #"external_tool_tag_attributes":[],
            "peer_reviews":[],
            "automatic_peer_reviews":[],
            "peer_review_count":[],
            "peer_reviews_assign_at":[],
            "intra_group_peer_reviews":[],
            "group_category_id":[],
            "needs_grading_count":[],
            "needs_grading_count_by_section":[],
            "position":[],
            "points_possible":[],
            "submission_types":[],
            "has_submitted_submissions":[],
            "grading_type":[],
            "grading_standard_id":[],
            "published":[],
            "unpublishable":[],
            "only_visible_to_overrides":[],
            "locked_for_user":[],
            #"submission":[],
            "moderated_grading":[],
            "grader_count":[],
            "final_grader_id":[],
            "grader_comments_visible_to_graders":[],
            "grader_anonymous_to_graders":[],
            "grader_names_visible_to_final_grader":[],
            "anonymous_grading":[],
            "allowed_attempts":[],
            "post_manually":[],
            #"score_statistics":[],
            "annotatable_attachment_id":[]
        }
        self.canvas_assignments_v1 = pd.DataFrame(assignments_v1, dtype=object)
        assignment_submissions_v1 = {
            "assignment_id":[],
            "assignment":[],
            "course":[],
            "attempt":[],
            "body":[],
            "grade":[],
            "grade_matches_current_submission":[],
            "html_url":[],
            "preview_url":[],
            "score":[],
            #"submission_comments":[],
            "submission_type":[],
            "submitted_at":[],
            "url":[],
            "user_id":[],
            "grader_id":[],
            "graded_at":[],
            "user":[],
            "late":[],
            "assignment_visible":[],
            "excused":[],
            "missing":[],
            "late_policy_status":[],
            "points_deducted":[],
            "seconds_late":[],
            "workflow_state":[],
            "extra_attempts":[],
            "anonymous_id":[],
            "posted_at":[],
            #"read_status":[],
            "redo_request":[]
        }
        self.canvas_assignment_submissions_v1 = pd.DataFrame(assignment_submissions_v1, dtype=object)
        assignment_submission_summary_v1 = {
            'assignment_id':[], # NOTE: this field is not in production data - will need to be updated
            'graded':[],
            'ungraded':[],
            'not_submitted':[]
        }
        self.canvas_assignment_submission_summary_v1 = pd.DataFrame(assignment_submission_summary_v1, dtype=object)
        modules_v1 = {
            "id":[],
            "name":[],
            "position":[],
            "unlock_at":[],
            "require_sequential_progress":[],
            "publish_final_grade":[],
            #"workflow_state":[],
            "prerequisite_module_ids":[],
            "state":[],
            "completed_at":[],
            "items_count":[],
            "items_url":[]
        }
        self.canvas_modules_v1 = pd.DataFrame(modules_v1, dtype=object)
        module_items_v1 = {
            "id":[],
            "title":[],
            "position":[],
            "indent":[],
            "quiz_lti":[],
            "type":[],
            "module_id":[],
            "html_url":[],
            "external_url":[],
            "new_tab":[]
        }
        self.canvas_module_items_v1 = pd.DataFrame(module_items_v1, dtype=object)
        quiz_submissions_v1 = {
            "id":[],
            "quiz_id":[],
            "user_id":[],
            "submission_id":[],
            "started_at":[],
            "finished_at":[],
            "end_at":[],
            "attempt":[],
            "extra_attempts":[],
            "extra_time":[],
            "manually_unlocked":[],
            "time_spent":[],
            "score":[],
            "score_before_regrade":[],
            "kept_score":[],
            "fudge_points":[],
            "has_seen_results":[],
            "workflow_state":[],
            "overdue_and_needs_submission":[]
        }
        self.canvas_quiz_submissions = pd.DataFrame(quiz_submissions_v1, dtype=object)
        quizzes_v1 = {
            "id":[],
            "title":[],
            "html_url":[],
            "mobile_url":[],
            #"preview_url":[],
            "description":[],
            "quiz_type":[],
            "assignment_group_id":[],
            "time_limit":[],
            "shuffle_answers":[],
            "hide_results":[],
            "show_correct_answers":[],
            "show_correct_answers_last_attempt":[],
            "show_correct_answers_at":[],
            "hide_correct_answers_at":[],
            "one_time_results":[],
            "scoring_policy":[],
            "allowed_attempts":[],
            "one_question_at_a_time":[],
            "question_count":[],
            "points_possible":[],
            "cant_go_back":[],
            "access_code":[],
            "ip_filter":[],
            "due_at":[],
            "lock_at":[],
            "unlock_at":[],
            "published":[],
            "unpublishable":[],
            "locked_for_user":[],
            #"lock_info":[],
            #"lock_explanation":[],
            "speedgrader_url":[],
            "quiz_extensions_url":[],
            "permissions":[],
            "all_dates":[],
            "version_number":[],
            "question_types":[],
            "anonymous_submissions":[]
        }
        self.canvas_quizzes_v1 = pd.DataFrame(quizzes_v1, dtype=object)
        # NOTE: start of v2 tables
        assignments_v2 = {
            "id":[],
            "title":[],
            "description":[],
            "context_id":[],
            "assignment_group_id":[],
            "due_at":[],
            "unlock_at":[],
            "lock_at":[],
            "created_at":[],
            "updated_at":[],
            "points_possible":[],
            "grading_type":[],
            "submission_types":[],
            "workflow_state":[],
            "peer_reviews":[],
            "peer_review_count":[],
            "peer_reviews_due_at":[],
            "peer_reviews_assigned":[],
            "automatic_peer_reviews":[],
            "all_day":[],
            "all_day_date":[],
            "could_be_locked":[],
            "grade_group_students_individually":[],
            "anonymous_peer_reviews":[],
            "position":[],
            "visibility":[]
        }
        self.canvas_assignments_v2 = pd.DataFrame(assignments_v2, dtype=object)
        # modules in v1 is converted to context_modules according to the schema spec: https://docs.google.com/spreadsheets/d/1kqCXAD9K45L0QeEtbuuMAFp2fW8o0oC8EBzJf58SjrY/edit#gid=1857332749
        context_modules_v2 = {
            "id":[],
            "name":[],
            "context_id":[],
            "position":[],
            "require_sequential_progress":[],
            "workflow_state":[],
            "created_at":[],
            "deleted_at":[],
            "unlock_at":[],
            "updated_at":[]
        }
        self.canvas_context_modules_v2 = pd.DataFrame(context_modules_v2, dtype=object)
        # module_items in v1 is converted to content_tags according to the schema spec: https://docs.google.com/spreadsheets/d/1kqCXAD9K45L0QeEtbuuMAFp2fW8o0oC8EBzJf58SjrY/edit#gid=1715262273
        content_tags_v2 = {
            "content_id":[],
            "context_id":[],
            "context_module_id":[],
            "content_type":[],
            "workflow_state":[],
            "position":[],
            "title":[],
            "url":[],
            "created_at":[],
            "updated_at":[]
        }
        self.canvas_content_tags_v2 = pd.DataFrame(content_tags_v2, dtype=object)
        quizzes_v2 = {
            "id":[],
            "title":[],
            "context_id":[],
            "assignment_id":[],
            "points_possible":[],
            "description":[],
            "quiz_type":[],
            "workflow_state":[],
            "scoring_policy":[],
            "anonymous_submissions":[],
            "shuffle_answers":[],
            "cant_go_back":[],
            "could_be_locked":[],
            "require_lockdown_browser":[],
            "require_lockdown_browser_for_results":[],
            "require_lockdown_browser_monitor":[],
            "ip_filter":[],
            "hide_results":[],
            "show_correct_answers":[],
            "show_correct_answers_at":[],
            "hide_correct_answers_at":[],
            "created_at":[],
            "updated_at":[],
            "published_at":[],
            "unlock_at":[],
            "lock_at":[],
            "due_at":[],
            "deleted_at":[],
            "time_limit":[],
            "allowed_attempts":[],
            "unpublished_question_count":[],
            "question_count":[]
        }
        #self.canvas_quizzes_v2 = pd.DataFrame(quizzes_v2, dtype=object)
        quiz_submissions_v2 = {
            "id":[],
            "quiz_id":[],
            "user_id":[],
            "submission_id":[],
            "workflow_state":[],
            "manually_unlocked":[],
            "was_preview":[],
            "has_seen_results":[],
            "temporary_user_code":[],
            "created_at":[],
            "updated_at":[],
            "started_at":[],
            "finished_at":[],
            "end_at":[],
            "score":[],
            "kept_score":[],
            "quiz_points_possible":[],
            "score_before_regrade":[],
            "fudge_points":[],
            "attempt":[],
            "extra_attempts":[],
            "extra_time":[]
        }
        #self.canvas_quiz_submissions_v2 = pd.DataFrame(quiz_submissions_v2, dtype=object)
        submissions_v2 = {
            "id":[],
            "body":[],
            "url":[],
            "assignment_id":[],
            "group_id":[],
            "quiz_submission_id":[],
            "user_id":[], # unsure if this column is expected to be in the table. Check doc: https://docs.google.com/spreadsheets/d/1kqCXAD9K45L0QeEtbuuMAFp2fW8o0oC8EBzJf58SjrY/edit#gid=1148566159
            "score":[],
            "published_score":[],
            "grade":[],
            "published_grade":[],
            "graded_anonymously":[],
            "grader_id":[],
            "graded_at":[],
            "posted_at":[],
            "submitted_at":[],
            "submission_type":[],
            "workflow_state":[],
            "created_at":[],
            "updated_at":[],
            "processed":[],
            "grade_matches_current_submission":[],
            "attempt":[],
            "excused":[],
            "student_entered_score":[],
            "submission_comments_count":[]
        }
        self.canvas_submissions_v2 = pd.DataFrame(submissions_v2, dtype=object)

    def genCanvasActivity(self,startdate='2022-01-01T00:00:00',enddate='2022-06-01T00:00:00',reportgendate='2022-02-02T00:00:00',canvas_version='v2',canvas_roster_tables_source_path='stage1/Transactional/test_data/v0.1/canvas_gen',max_num_activities_per_class=5):
        self.startdate = dt.datetime.strptime(startdate, "%Y-%m-%dT%H:%M:%S")
        self.enddate = dt.datetime.strptime(enddate, "%Y-%m-%dT%H:%M:%S")
        self.reportdate = dt.datetime.strptime(reportgendate, "%Y-%m-%dT%H:%M:%S")
        use_general_module_base_truth = True
        if use_general_module_base_truth:
            sourcepath = 'stage1/Transactional/test_data/v0.1/base_general_modules'
            if oea.path_exists(sourcepath):
                logger.info('General module base-truth tables already exist - delete the "base_general_modules" folder/directory if you want to replace these.')
            else:
                # manually delete and replace the general module base_truth_tables CSVs as needed
                logger.info('General module base-truth tables do not currently exist - landing in stage1/.../test_data/v0.1/base_general_modules/')
                base_url = 'https://raw.githubusercontent.com/microsoft/OpenEduAnalytics/main/modules/module_test_data_generation_kit/test_data/base_truth_tables/'
                # (source CSV name, landed table name) pairs; only the enrollment table is named differently once landed
                base_truth_tables = [('students', 'students'), ('schools', 'schools'), ('courses', 'courses'), ('sections', 'sections'),
                                     ('enrollment', 'student_enrollment'), ('instructors', 'instructors'), ('instructors_enroll', 'instructors_enroll')]
                for csv_name, table_name in base_truth_tables:
                    data = requests.get(f'{base_url}{csv_name}.csv').text
                    oea.land(data, f'test_data/v0.1/base_general_modules/base_{table_name}', f'general_module_base_truth_{table_name}.csv', oea.SNAPSHOT_BATCH_DATA)
            # NOTE: if tables are not read in properly - you may need to rename the rundate folder to replace colons with hyphens
            self.students = oea.load_csv(sourcepath + '/base_students/', header=True).toPandas()
            self.schools = oea.load_csv(sourcepath + '/base_schools/', header=True).toPandas()
            self.courses = oea.load_csv(sourcepath + '/base_courses/', header=True).toPandas()
            self.sections = oea.load_csv(sourcepath + '/base_sections/', header=True).toPandas()
            self.enrollment = oea.load_csv(sourcepath + '/base_student_enrollment/', header=True).toPandas()
            self.instructors = oea.load_csv(sourcepath + '/base_instructors/', header=True).toPandas()
            self.instructors_enroll = oea.load_csv(sourcepath + '/base_instructors_enroll/', header=True).toPandas()
            logger.info('Generating Canvas test data based on general module base-truth tables...')
        else:
            # expectation is that base_truth_tables exist
            sourcepath = 'stage1/Transactional/test_data/v0.1/'
            self.students = oea.load_csv(sourcepath + 'base_students/', header=True).toPandas()
            self.schools = oea.load_csv(sourcepath + 'base_schools/', header=True).toPandas()
            self.courses = oea.load_csv(sourcepath + 'base_courses/', header=True).toPandas()
            self.sections = oea.load_csv(sourcepath + 'base_sections/', header=True).toPandas()
            self.enrollment = oea.load_csv(sourcepath + 'base_student_enrollment/', header=True).toPandas()
            self.instructors = oea.load_csv(sourcepath + 'base_instructors/', header=True).toPandas()
            self.instructors_enroll = oea.load_csv(sourcepath + 'base_instructors_enroll/', header=True).toPandas()
            logger.info('Generating Canvas test data based on user-generated base-truth tables...')
        if canvas_version == 'v1':
            # load in Canvas SIS/roster tables (NOTE: these are expected to already have been created, and using v1 Canvas roster data)
            self.canvas_accounts = pd.read_json(oea.to_url(f'{canvas_roster_tables_source_path}/accounts/snapshot_batch_data/*/*.json'), lines=True)
            self.canvas_courses = pd.read_json(oea.to_url(f'{canvas_roster_tables_source_path}/courses/snapshot_batch_data/*/*.json'), lines=True)
            self.canvas_enrollments = pd.read_json(oea.to_url(f'{canvas_roster_tables_source_path}/enrollments/snapshot_batch_data/*/*.json'), lines=True)
            self.canvas_roles = pd.read_json(oea.to_url(f'{canvas_roster_tables_source_path}/roles/snapshot_batch_data/*/*.json'), lines=True)
            self.canvas_sections = pd.read_json(oea.to_url(f'{canvas_roster_tables_source_path}/sections/snapshot_batch_data/*/*.json'), lines=True)
            self.canvas_users = pd.read_json(oea.to_url(f'{canvas_roster_tables_source_path}/users/snapshot_batch_data/*/*.json'), lines=True)
        elif canvas_version == 'v2':
            # load in Canvas SIS/roster tables (NOTE: these are expected to already have been created, and using v2 Canvas roster data)
            self.canvas_accounts = pd.read_json(oea.to_url(f'{canvas_roster_tables_source_path}/accounts/snapshot_batch_data/*/*.json'), lines=True)
            self.canvas_courses = pd.read_json(oea.to_url(f'{canvas_roster_tables_source_path}/courses/snapshot_batch_data/*/*.json'), lines=True)
            self.canvas_enrollments = pd.read_json(oea.to_url(f'{canvas_roster_tables_source_path}/enrollments/snapshot_batch_data/*/*.json'), lines=True)
            self.canvas_roles = pd.read_json(oea.to_url(f'{canvas_roster_tables_source_path}/roles/snapshot_batch_data/*/*.json'), lines=True)
            self.canvas_sections = pd.read_json(oea.to_url(f'{canvas_roster_tables_source_path}/course_sections/snapshot_batch_data/*/*.json'), lines=True)
            self.canvas_users = pd.read_json(oea.to_url(f'{canvas_roster_tables_source_path}/users/snapshot_batch_data/*/*.json'), lines=True)
        logger.info('Successfully loaded Canvas SIS/rostering tables. Now generating Canvas activity tables...')
        if canvas_version == 'v1':
            self.genModule_v1_tables(max_num_modules=max_num_activities_per_class)
            #self.genQuiz_v1_tables(max_num_quizzes=max_num_activities_per_class)
            #self.genAssignment_v1_tables(max_num_assigns=max_num_activities_per_class)
        elif canvas_version == 'v2':
            self.genModule_v2_tables(max_num_modules=max_num_activities_per_class)
            self.genQuiz_v2_tables(max_num_quizzes=max_num_activities_per_class)
            self.genAssignment_v2_tables(max_num_assigns=max_num_activities_per_class)
        logger.info('Successfully generated Canvas activity tables (for assignments, quizzes and modules).')
        logger.info('Finished Canvas generation.')

    def __get_daterange(self,start_date=dt.datetime(2022,1,3),end_date=dt.datetime(2022,1,28)):
        daterange = []
        startdate = start_date
        enddate = end_date
        while(startdate < enddate):
            daterange.append(startdate)
            startdate = startdate + dt.timedelta(days=1)
        return daterange
    
    def __get_lettergrade(self,percentage):
        if percentage >= 0.93:
            lettergrade = "A"
        elif percentage >= 0.9:
            lettergrade = "A-"
        elif percentage >= 0.87:
            lettergrade = "B+" 
        elif percentage >= 0.83:
            lettergrade = "B" 
        elif percentage >= 0.8:
            lettergrade = "B-" 
        elif percentage >= 0.77:
            lettergrade = "C+"
        elif percentage >= 0.73:
            lettergrade = "C" 
        elif percentage >= 0.7:
            lettergrade = "C-"
        elif percentage >= 0.67:
            lettergrade = "D+" 
        elif percentage >= 0.63:
            lettergrade = "D" 
        elif percentage >= 0.6:
            lettergrade = "D-"
        else:
            lettergrade = "F"
        return lettergrade
    
    def genAssignment_v1_tables(self,max_num_assigns=3):
        """This method generates 3 assignment Canvas v1 tables: assignments, assignment_submissions and assignment_submission_summary"""
        date_range = self.__get_daterange(start_date=dt.datetime(2022,1,3),end_date=dt.datetime(2022,1,21))
        # count the total number of assignments
        m = 1
        for index, section in self.canvas_sections.iterrows():
            # find section id and associated course id
            section_id = section['id']
            sis_section_id = section['sis_section_id']
            course_id = section['course_id']
            section_name = section['name']
            # find number of students in section
            dfBT_enroll = spark.createDataFrame(self.enrollment)
            dfEnroll = dfBT_enroll.filter(dfBT_enroll['SectionID'] == f'{sis_section_id}')
            num_students_in_section = dfEnroll.count()
            # find instructor ID for the section
            dfBT_instructor_enroll = spark.createDataFrame(self.instructors_enroll)
            instructor_sis_id = dfBT_instructor_enroll.filter(dfBT_instructor_enroll['InstructsClass_SectionId'] == f'{sis_section_id}').collect()[0][0]
            df_users = spark.createDataFrame(self.canvas_users)
            instructor_id = df_users.filter(df_users['sis_user_id'] == f'{instructor_sis_id}').collect()[0][0]
            # randomly choose how many assignments have been assigned in this class
            n = random.randint(0,max_num_assigns)
            l = 1
            while n > 0:
                # choose the day this assignment was assigned
                assign_day = random.choice(date_range)
                # generate the assignment id
                assign_id = self.faker.unique.random_int(min=100000, max=999999)
                # finally generate the assignment tables
                allowed_attempts,duedate = self._genAssignments_v1(assign_id,course_id,l,m,section_name,assign_day,num_students_in_section,instructor_id)
                # an assignment counts as graded once its due date is on or before the report generation date
                graded = duedate <= self.reportdate
                self._genAssignmentSubmissions_v1(assign_id,course_id,l,dfEnroll,allowed_attempts,graded,duedate,section_name,assign_day,num_students_in_section,instructor_id)
                self._genAssignmentSubmissionSummary_v1(assign_id,graded,num_students_in_section,dfEnroll)
                n = n - 1
                m = m + 1
                l = l + 1
        self.writetojsonfile('assignments_v1', self.canvas_assignments_v1)
        self.writetojsonfile('assignment_submissions_v1', self.canvas_assignment_submissions_v1)
        self.writetojsonfile('assignment_submission_summary_v1', self.canvas_assignment_submission_summary_v1)

    def _genAssignments_v1(self,assignid,courseid,assignnumber_in_section,assignnumber_in_system,sectionname,assignday,num_students_in_section,instructor_id):
        # NOTE: this code assumes there's supposed to be one row per assignment
        id = assignid
        name = f"Assignment {str(assignnumber_in_section)} for {sectionname}" # NOTE: can be modified for more unique names
        description = "<p>Do the following:</p>..."
        created_at = f"{assignday - dt.timedelta(days=random.randint(0,1),hours=random.randint(0,23),minutes=random.randint(0,59))}"
        updated_at = f"{created_at}"
        duedate = assignday + dt.timedelta(days=random.randint(7,14))
        due_at = f"{duedate}"
        lock_at = f"{self.enddate}"
        unlock_at = f"{assignday}"
        has_overrides = True
        course_id = courseid
        html_url = f"https://canvas.cu.edu/courses/{str(courseid)}/assignments/{str(assignid)}"
        submissions_download_url = f"https://canvas.cu.edu/courses/{str(courseid)}/assignments/{str(assignid)}/submissions?zip=1"
        assignment_group_id = self.faker.random_int(min=1000, max=9999)
        due_date_required = True
        allow_extensions = "[docx, pptx, xlsx, pdf]" # NOTE: this is supposed to be an array, but temporarily creating this field as a string
        max_name_length = 20
        turnitin_enabled = True
        verticite_enabled = False 
        turnitin_settings = ""
        grade_group_students_individually = None # NOTE: typically boolean for group assignments - temp assumption that all assignments are individual
        #external_tool_tag_attributes =
        peer_reviews = False
        automatic_peer_reviews = False 
        peer_review_count = 0
        peer_reviews_assign_at = ""
        intra_group_peer_reviews = False 
        group_category_id = None
        if duedate < self.reportdate:
            needs_grading_count = 0
        else:
            needs_grading_count = num_students_in_section
        needs_grading_count_by_section = "" # NOTE: supposed to be an array representing section_ids with the number of assignments needed to be graded
        position = assignnumber_in_system
        points_possible = 100
        submission_types = ["on_paper"] # NOTE: array of submission types. Accepted values: discussion_topic, online_quiz, on_paper, none, external_tool, online_text_entry, online_url, online_upload, media_recording, student_annotation
        has_submitted_submissions = True
        grading_type = "points" # NOTE: can be: pass_fail, percent, letter_grade, gpa_scale or points
        grading_standard_id = None
        published = True
        unpublishable = False
        only_visible_to_overrides = False 
        locked_for_user = False
        moderated_grading = True
        grader_count = 1
        final_grader_id = instructor_id
        grader_comments_visible_to_graders = True
        grader_anonymous_to_graders = False
        grader_names_visible_to_final_grader = True
        anonymous_grading = True
        allowed_attempts = random.randint(1,3)
        post_manually = True
        annotatable_attachment_id = None
        self.canvas_assignments_v1.loc[len(self.canvas_assignments_v1)] = [id,name,description,created_at,updated_at,due_at,lock_at,unlock_at,has_overrides,course_id,html_url,submissions_download_url,assignment_group_id, \
                                                                    due_date_required,allow_extensions,max_name_length,turnitin_enabled,verticite_enabled,turnitin_settings,grade_group_students_individually, \
                                                                    peer_reviews,automatic_peer_reviews,peer_review_count,peer_reviews_assign_at,intra_group_peer_reviews,group_category_id,needs_grading_count,needs_grading_count_by_section, \
                                                                    position,points_possible,submission_types,has_submitted_submissions,grading_type,grading_standard_id,published,unpublishable,only_visible_to_overrides,locked_for_user, \
                                                                    moderated_grading,grader_count,final_grader_id,grader_comments_visible_to_graders,grader_anonymous_to_graders,grader_names_visible_to_final_grader,anonymous_grading, \
                                                                    allowed_attempts,post_manually,annotatable_attachment_id]
        return allowed_attempts,duedate

    def _genAssignmentSubmissions_v1(self,assignid,courseid,assignnumber_in_section,dfEnroll,allowed_attempts,graded,duedate,sectionname,assignday,num_students_in_section,instructor_id):
        # randomly sample the students enrolled in the course for assignment submissions
        half_of_students_in_class = round(num_students_in_section / 2)
        num_students_submit = random.randint(half_of_students_in_class, num_students_in_section)
        df_enroll = dfEnroll.toPandas()
        df_students_submit = df_enroll.sample(n=num_students_submit)
        dfStudents_submit = spark.createDataFrame(df_students_submit).select('StudentID').withColumnRenamed('StudentID','submit_studentID')
        dfStudents_not_submit = dfEnroll.join(dfStudents_submit, dfEnroll.StudentID == dfStudents_submit.submit_studentID, how='left_anti') # this table is used to add students that haven't submitted assignments
        df_students_not_submit = dfStudents_not_submit.toPandas()
        # set users roster table to extract user IDs
        df_users = spark.createDataFrame(self.canvas_users)
        for index, student in df_students_submit.iterrows():
            # assign static variables per student submission(s)
            assignment_id = assignid
            assignment = f"Assignment {str(assignnumber_in_section)} for {sectionname}" # NOTE: can be modified for more unique names
            course = courseid
            body = "" # NOTE: assumption is that the assignment was uploaded rather than a filled field
            user_id = df_users.filter(df_users['sis_user_id'] == student['StudentID']).collect()[0][0]
            html_url = f"https://canvas.cu.edu/courses/{str(courseid)}/assignments/{str(assignid)}/submissions/{str(user_id)}"
            submission_type = "online_upload" # NOTE: accepted values - online_text_entry, online_url, online_upload, media_recording or student_annotation
            url = "" # NOTE: only for online_url submission type
            user = df_users.filter(df_users['sis_user_id'] == student['StudentID']).select('name').collect()[0][0]
            assignment_visible = True
            excused = False # NOTE: students who submitted are not excused from the assignment
            missing = False
            late_policy_status = "none" # NOTE: accepted values - late, missing, extended, none, or null
            points_deducted = 0
            workflow_state = "submitted"
            extra_attempts = 0
            redo_request = False
            # optional table attributes currently left out: submission_comments, read_status
            # randomly create the number of attempts/assignment submissions for this student
            if allowed_attempts == 1:
                random_num_attempts = 1
            else:
                random_num_attempts = random.randint(1,allowed_attempts)
            for n in range(0,random_num_attempts):
                # generate one submission row per attempt; html_url keeps pointing at the submission itself
                attempt = n + 1
                preview_url = f"https://canvas.cu.edu/courses/{str(courseid)}/assignments/{str(assignid)}/submissions/{str(user_id)}?preview={str(attempt)}"
                submitted_daytime = assignday + dt.timedelta(days=random.randint(1,10),hours=random.randint(0,23),minutes=random.randint(0,59))
                submitted_at = f"{submitted_daytime}"
                anonymous_id = self.faker.uuid4()
                if graded == True:
                    grader_id = instructor_id
                    score = round(random.triangular(40.00,100.00,78.00))
                    # calculate letter grade associated
                    percent = score/100
                    grade = self.__get_lettergrade(percent)
                    graded_daytime = submitted_daytime + dt.timedelta(days=random.randint(0,3),hours=random.randint(0,23),minutes=random.randint(0,59))
                    graded_at = f"{graded_daytime}"
                    grade_matches_current_submission = True # NOTE: manual work will be needed for correcting this
                    posted_at = f"{graded_daytime}"
                else: 
                    grader_id = None
                    score = None
                    grade = ""
                    graded_at = ""
                    grade_matches_current_submission = False
                    posted_at = ""
                if submitted_daytime > duedate:
                    late = True
                    late_policy_status = "late"
                    # both values are datetime objects, so the difference is a timedelta
                    delta = submitted_daytime - duedate
                    seconds_late = round(delta.total_seconds())
                else:
                    late = False
                    late_policy_status = "none"
                    seconds_late = 0
                self.canvas_assignment_submissions_v1.loc[len(self.canvas_assignment_submissions_v1)] = [assignment_id,assignment,course,attempt,body,grade,grade_matches_current_submission,html_url,preview_url,score, \
                                                                                                submission_type,submitted_at,url,user_id,grader_id,graded_at,user,late,assignment_visible,excused,missing,late_policy_status, \
                                                                                                points_deducted,seconds_late,workflow_state,extra_attempts,anonymous_id,posted_at,redo_request]
        # if assignment has been graded, then add students without submissions
        if graded == True:
            for index, student in df_students_not_submit.iterrows():
                # assign static variables per student submission(s)
                # optional table attributes currently left out: submission_comments, read_status
                assignment_id = assignid
                assignment = f"Assignment {str(assignnumber_in_section)} for {sectionname}" # NOTE: can be modified for more unique names
                course = courseid
                attempt = None
                body = "" 
                grade = "F"
                grade_matches_current_submission = True
                html_url = ""
                preview_url = ""
                score = 0
                submission_type = "online_upload" # NOTE: accepted values - online_text_entry, online_url, online_upload, media_recording or student_annotation
                submitted_at = ""
                url = "" # NOTE: only for online_url submission type
                user_id = df_users.filter(df_users['sis_user_id'] == student['StudentID']).collect()[0][0]
                grader_id = instructor_id
                graded_daytime = duedate + dt.timedelta(days=random.randint(0,2),hours=random.randint(0,23),minutes=random.randint(0,59))
                graded_at = f"{graded_daytime}"
                user = df_users.filter(df_users['sis_user_id'] == student['StudentID']).select('name').collect()[0][0]
                late = False
                assignment_visible = False
                excused = False
                missing = True
                late_policy_status = "missing" # NOTE: accepted values - late, missing, extended, none, or null
                points_deducted = 100
                seconds_late = 0
                workflow_state = "not submitted"
                extra_attempts = 0
                anonymous_id = self.faker.uuid4()
                posted_at = f"{graded_daytime}"
                redo_request = False
                self.canvas_assignment_submissions_v1.loc[len(self.canvas_assignment_submissions_v1)] = [assignment_id,assignment,course,attempt,body,grade,grade_matches_current_submission,html_url,preview_url,score, \
                                                                                                submission_type,submitted_at,url,user_id,grader_id,graded_at,user,late,assignment_visible,excused,missing,late_policy_status, \
                                                                                                points_deducted,seconds_late,workflow_state,extra_attempts,anonymous_id,posted_at,redo_request]
    
    def _genAssignmentSubmissionSummary_v1(self,assignid,bool_graded,num_students_in_section,dfEnroll):
        # NOTE: This table is not supposed to include the assignment_id in this base level structure - this will need updating
        df_assignment_subs = spark.createDataFrame(self.canvas_assignment_submissions_v1)
        df_assignment_subs = df_assignment_subs.filter(df_assignment_subs['assignment_id'] == f'{assignid}').groupBy('user_id','workflow_state').count()
        
        assignment_id = assignid
        if bool_graded == True:
            graded = num_students_in_section
            ungraded = 0
            notsubmitted = df_assignment_subs.filter(df_assignment_subs['workflow_state'] == "not submitted").count()
        else:
            graded = 0
            ungraded = num_students_in_section
            num_students_submitted = df_assignment_subs.filter(df_assignment_subs['workflow_state'] == "submitted").count()
            notsubmitted = num_students_in_section - num_students_submitted
        self.canvas_assignment_submission_summary_v1.loc[len(self.canvas_assignment_submission_summary_v1)] = [assignment_id,graded,ungraded,notsubmitted]

    
    def genQuiz_v1_tables(self,max_num_quizzes=5):
        """This method generates 2 quiz Canvas v1 tables: quizzes and quiz_submissions"""
        date_range = self.__get_daterange(start_date=dt.datetime(2022,1,3),end_date=dt.datetime(2022,1,21))
        # count the total number of quizzes
        m = 1
        for index, section in self.canvas_sections.iterrows():
            # find section id and associated course id
            section_id = section['id']
            sis_section_id = section['sis_section_id']
            course_id = section['course_id']
            section_name = section['name']
            # find number of students in section
            dfBT_enroll = spark.createDataFrame(self.enrollment)
            dfEnroll = dfBT_enroll.filter(dfBT_enroll['SectionID'] == f'{sis_section_id}')
            num_students_in_section = dfEnroll.count()
            # find instructor ID for the section
            dfBT_instructor_enroll = spark.createDataFrame(self.instructors_enroll)
            instructor_sis_id = dfBT_instructor_enroll.filter(dfBT_instructor_enroll['InstructsClass_SectionId'] == f'{sis_section_id}').collect()[0][0]
            df_users = spark.createDataFrame(self.canvas_users)
            instructor_id = df_users.filter(df_users['sis_user_id'] == f'{instructor_sis_id}').collect()[0][0]
            # randomly choose how many quizzes have been assigned in this class
            n = random.randint(0,max_num_quizzes)
            l = 1
            while n > 0:
                # choose the day this quiz was assigned
                quiz_day = random.choice(date_range)
                # generate the quiz id
                quiz_id = self.faker.unique.random_int(min=100000, max=999999)
                # finally generate the quiz tables
                allowed_attempts,duedate = self._genQuizzes_v1(quiz_id,course_id,l,section_name,quiz_day) 
                if duedate > dt.datetime(2022,2,2):
                    graded = False
                else:
                    graded = True
                # arguments follow the _genQuizSubmissions_v1 signature: quizid, quizday, num_students_in_course, dfEnroll, maxattempts
                self._genQuizSubmissions_v1(quiz_id,quiz_day,num_students_in_section,dfEnroll,allowed_attempts)
                n = n - 1
                m = m + 1
                l = l + 1
        self.writetojsonfile('quizzes_v1', self.canvas_quizzes_v1)
        self.writetojsonfile('quiz_submissions_v1', self.canvas_quiz_submissions_v1)

    def _genQuizzes_v1(self,quizid,courseid,quiznumber_in_section,sectionname,quizday):
        id = quizid
        title = f"Quiz {str(quiznumber_in_section)} for {sectionname}" # NOTE: can be modified for more unique names
        html_url = f"https://canvas.cu.edu/courses/{str(courseid)}/quizzes/{str(quizid)}"
        mobile_url = f"https://canvas.cu.edu/courses/{str(courseid)}/quizzes/{str(quizid)}?persist_headless=1&force_user=1"
        description = "Sample quiz description for a section."
        quiz_type = "assignment" # NOTE: options - practice_quiz, assignment, graded_survey or survey
        assignment_group_id = self.faker.random_int(min=1000, max=9999) # does not mean anything at the moment
        time_limit = 10 # NOTE: in minutes
        shuffle_answers = False
        hide_results = "always" # NOTE: options - null, always or until_after_last_attempt
        show_correct_answers = False # NOTE: only valid if hide_results = null
        show_correct_answers_last_attempt = True
        duedate = quizday + dt.timedelta(days=random.randint(0,2))
        due_at = f"{duedate}"
        show_correct_answers_at = f"{duedate + dt.timedelta(days=1)}"
        hide_correct_answers_at = f"{self.enddate}"
        one_time_results = False
        allowed_attempts = random.randint(1,3)
        if allowed_attempts == 1:
            scoring_policy = ""
        else:
            scoring_policy = "keep_highest" # NOTE: options - keep_highest or keep_latest
        one_question_at_a_time = False
        question_count = random.randint(10,20)
        points_possible = 20
        cant_go_back = False # only true if one_question_at_a_time is true
        access_code = ""
        ip_filter = ""
        lock_at = f"{duedate}"
        unlock_at = f"{quizday}"
        published = True
        unpublishable = False
        locked_for_user = False
        speedgrader_url = ""
        quiz_extensions_url = f"https://canvas.cu.edu/courses/{str(courseid)}/quizzes/{str(quizid)}/quiz_extensions"
        permissions = ""
        all_dates = ""
        version_number = 1 
        question_types = ["multiple_choice"]
        anonymous_submissions = False
        self.canvas_quizzes_v1.loc[len(self.canvas_quizzes_v1)] = [id,title,html_url,mobile_url,description,quiz_type,assignment_group_id,time_limit,shuffle_answers,hide_results, \
                                                            show_correct_answers,show_correct_answers_last_attempt,show_correct_answers_at,hide_correct_answers_at,one_time_results,scoring_policy, \
                                                            allowed_attempts,one_question_at_a_time,question_count,points_possible,cant_go_back,access_code,ip_filter,due_at,lock_at,unlock_at,published, \
                                                            unpublishable,locked_for_user,speedgrader_url,quiz_extensions_url,permissions,all_dates,version_number,question_types,anonymous_submissions]
        return allowed_attempts,duedate
    
    def _genQuizSubmissions_v1(self,quizid,quizday,num_students_in_course,dfEnroll,maxattempts):
        # randomly sample the students enrolled in the course for quiz submissions
        half_of_students_in_class = round(num_students_in_course / 2)
        num_students_submit = random.randint(half_of_students_in_class, num_students_in_course)
        df_enroll = dfEnroll.toPandas()
        df_students_submit = df_enroll.sample(n=num_students_submit)
        for index, student in df_students_submit.iterrows():
            # assign static variables per student submission(s)
            quiz_id = quizid
            user_id = student['StudentID'] # NOTE: this is the SIS student ID, not the Canvas user ID
            end_at = "" # NOTE: the quiz due date is not passed to this method, so this is left blank
            extra_attempts = 0
            extra_time = 0
            manually_unlocked = False
            score_before_regrade = None
            fudge_points = 0
            has_seen_results = True
            workflow_state = "complete"
            overdue_and_needs_submission = False
            random_num_attempts = random.randint(1,maxattempts)
            if random_num_attempts == 1:
                # if only one attempt for the student, then simply generate that single submission
                id = self.faker.uuid4()
                submission_id = self.faker.uuid4()
                attempt = 1
                timestart = quizday + dt.timedelta(hours=random.randint(0,23),minutes=random.randint(0,59))
                timefinish = timestart + dt.timedelta(hours=random.randint(0,1),minutes=random.randint(0,20))
                started_at = f"{timestart}"
                finished_at = f"{timefinish}"
                time_spent = round((timefinish - timestart).total_seconds()) # NOTE: in seconds
                score = round(random.triangular(35.00,100.00,73.50))
                kept_score = score
                self.canvas_quiz_submissions_v1.loc[len(self.canvas_quiz_submissions_v1)] = [id,quiz_id,user_id,submission_id,started_at,finished_at,end_at,attempt,extra_attempts,extra_time,manually_unlocked,time_spent,score,score_before_regrade, \
                                                                                    kept_score,fudge_points,has_seen_results,workflow_state,overdue_and_needs_submission]
            else:
                # create variables for keeping track of the last attempt from same student
                last_attempt_daytime = quizday + dt.timedelta(hours=random.randint(20,23),minutes=random.randint(0,59))
                previous_attempt = None
                # pre-generate one score per attempt; kept_score assumes the keep_highest scoring policy set in _genQuizzes_v1
                attempt_scores = [round(random.triangular(35.00,100.00,73.50)) for _ in range(random_num_attempts)]
                kept_score = max(attempt_scores)
                # iterate through total number of attempts
                for n in range(0,random_num_attempts):
                    id = self.faker.uuid4()
                    submission_id = self.faker.uuid4()
                    attempt = n + 1
                    score = attempt_scores[n]
                    if n == (random_num_attempts - 1):
                        # the last attempt always finishes at last_attempt_daytime
                        timestart = last_attempt_daytime - dt.timedelta(minutes=random.randint(0,20))
                        timefinish = last_attempt_daytime
                    elif previous_attempt is None:
                        timestart = quizday + dt.timedelta(hours=random.randint(6,20),minutes=random.randint(0,59))
                        timefinish = timestart + dt.timedelta(hours=random.randint(0,1),minutes=random.randint(0,20))
                    else:
                        timestart = previous_attempt + dt.timedelta(hours=random.randint(0,2),minutes=random.randint(0,59))
                        timefinish = timestart + dt.timedelta(hours=random.randint(0,1),minutes=random.randint(0,20))
                    previous_attempt = timefinish
                    started_at = f"{timestart}"
                    finished_at = f"{timefinish}"
                    time_spent = round((timefinish - timestart).total_seconds()) # NOTE: in seconds
                    self.canvas_quiz_submissions_v1.loc[len(self.canvas_quiz_submissions_v1)] = [id,quiz_id,user_id,submission_id,started_at,finished_at,end_at,attempt,extra_attempts,extra_time,manually_unlocked,time_spent,score,score_before_regrade, \
                                                                                        kept_score,fudge_points,has_seen_results,workflow_state,overdue_and_needs_submission]

    def genModule_v1_tables(self,max_num_modules=5):
        """This method generates 2 module Canvas v1 tables: modules_v1 and module_items_v1"""
        date_range = self.__get_daterange(start_date=dt.datetime(2022,1,5),end_date=dt.datetime(2022,1,21))
        # count the total number of modules
        m = 1
        for index, section in self.canvas_course_sections.iterrows():
            # find section id and associated course id
            section_id = section['id']
            #sis_section_id = section['sis_section_id'] # <- uncomment if generating on Canvas v1 roster data
            sis_section_id = section['sis_source_id'] # <- comment-out this if generating on Canvas v1 roster data
            course_id = section['course_id']
            section_name = section['name']
            # find number of students in section
            dfBT_enroll = spark.createDataFrame(self.enrollment)
            dfEnroll = dfBT_enroll.filter(dfBT_enroll['SectionID'] == f'{sis_section_id}')
            num_students_in_section = dfEnroll.count()
            # find instructor ID for the section
            dfBT_instructor_enroll = spark.createDataFrame(self.instructors_enroll)
            instructor_sis_id = dfBT_instructor_enroll.filter(dfBT_instructor_enroll['InstructsClass_SectionId'] == f'{sis_section_id}').collect()[0][0]
            df_users = spark.createDataFrame(self.canvas_users)
            #instructor_id = df_users.filter(df_users['sis_user_id'] == f'{instructor_sis_id}').collect()[0][0] # <- uncomment this if generating on Canvas v1 roster data
            instructor_id = df_users.filter(df_users['global_canvas_id'] == f'{instructor_sis_id}').collect()[0][0] # <- comment-out this if generating on Canvas v1 roster data
            # randomly choose how many modules have been assigned in this class
            n = random.randint(0,max_num_modules)
            l = 1
            while n > 0:
                # choose the day this module was assigned
                mod_day = random.choice(date_range)
                # generate the module id
                mod_id_start = self.faker.unique.random_int(min=10000, max=99999)
                mod_id_end = self.faker.unique.random_int(min=1000, max=9999)
                mod_id = str(mod_id_start) + '000000000' + str(mod_id_end)
                mod_id = int(mod_id)
                # finally generate the module tables
                num_mod_items,due_date = self._genModules_v1(mod_id,course_id,m,l,section_name,mod_day,mod_id_start,mod_id_end) 
                self._genModuleItems_v1(mod_id,course_id,num_mod_items,l,section_name,mod_day,due_date) 
                n = n - 1
                m = m + 1
                l = l + 1
        self.writetojsonfile('modules_v1', self.canvas_modules_v1)
        self.writetojsonfile('module_items_v1', self.canvas_module_items_v1)

    def _genModules_v1(self,modid,courseid,modnumber_in_system,modnumber_in_section,sectionname,modday,modid_start,modid_end):
        id = modid
        name = f"Module {str(modnumber_in_section)} for {sectionname}" # NOTE: can be modified for more unique names
        position = modnumber_in_system
        unlock_at = f"{modday}"
        require_sequential_progress = False
        publish_final_grade = False
        prerequisite_module_ids = []
        state = "completed"
        duedate = modday + dt.timedelta(days=random.randint(0,2),hours=random.randint(0,23))
        completed_at = f"{duedate}"
        items_count = self.faker.random_int(min=1, max=4)
        items_url = f"https://cu.instructure.com/api/v1/courses/{str(courseid)}/modules/{str(modid_start)}~{str(modid_end)}/items"
        self.canvas_modules_v1.loc[len(self.canvas_modules_v1)] = [id,name,position,unlock_at,require_sequential_progress,publish_final_grade,prerequisite_module_ids,state,completed_at,items_count,items_url]
        return items_count,duedate

    def _genModuleItems_v1(self,modid,courseid,num_mod_items,modnumber_in_section,sectionname,modday,duedate):
        for n in range(num_mod_items):
            id = self.faker.unique.random_int(min=1000000, max=9999999)
            title = f"Module Item {str(n + 1)} from Module {str(modnumber_in_section)} for {sectionname}" # NOTE: can be modified for more unique names
            position = n + 1
            indent = 0
            quiz_lti = False
            type = "Page"
            module_id = modid
            html_url = f"https://cu.instructure.com/api/v1/courses/{str(courseid)}/modules/items/{str(id)}"
            external_url = "https://www.youtube.com/watch?example_fake_url"
            new_tab = False
            self.canvas_module_items_v1.loc[len(self.canvas_module_items_v1)] = [id,title,position,indent,quiz_lti,type,module_id,html_url,external_url,new_tab]

    def genQuiz_v2_tables(self,max_num_quizzes=5):
        """This method generates 2 quiz Canvas v2 tables: quizzes and quiz_submissions"""
        date_range = self.__get_daterange(start_date=dt.datetime(2022,1,5),end_date=dt.datetime(2022,1,28))
        # count the total number of quizzes
        m = 1
        for index, section in self.canvas_sections.iterrows():
            # find section id and associated course id
            section_id = section['id']
            sis_section_id = section['sis_source_id']
            course_id = section['course_id']
            section_name = section['name']
            # find students in section
            dfBT_enroll = spark.createDataFrame(self.enrollment)
            dfEnroll = dfBT_enroll.filter(dfBT_enroll['SectionID'] == f'{sis_section_id}')
            df_users = spark.createDataFrame(self.canvas_users)
            dfEnroll = dfEnroll.join(df_users, dfEnroll.StudentID == df_users.global_canvas_id,how='inner')
            num_students_in_section = dfEnroll.count()
            # find instructor ID for the section
            dfBT_instructor_enroll = spark.createDataFrame(self.instructors_enroll)
            instructor_sis_id = dfBT_instructor_enroll.filter(dfBT_instructor_enroll['InstructsClass_SectionId'] == f'{sis_section_id}').collect()[0][0]
            df_users = spark.createDataFrame(self.canvas_users)
            instructor_id = df_users.filter(df_users['global_canvas_id'] == f'{instructor_sis_id}').collect()[0][0]
            # randomly choose how many quizzes have been assigned in this class
            n = random.randint(0,max_num_quizzes)
            l = 1
            while n > 0:
                # choose the day this quiz was assigned
                quiz_day = random.choice(date_range)
                # generate the quiz id
                quiz_id = self.faker.unique.random_int(min=100000, max=999999)
                # finally generate the quiz tables
                allowed_attempts,due_date,question_count = self._genQuizzes_v2(quiz_id,course_id,l,section_name,quiz_day)
                self._genQuizSubmissions_v2(quiz_id,num_students_in_section,dfEnroll,allowed_attempts,due_date,quiz_day,question_count)
                n = n - 1
                m = m + 1
                l = l + 1
        self.writetojsonfile('quizzes_v2', self.canvas_quizzes_v2)
        self.writetojsonfile('quiz_submissions_v2', self.canvas_quiz_submissions_v2)

    def _genQuizzes_v2(self,quizid,courseid,quiznumber_in_section,sectionname,quizday):
        id = quizid
        title = f"Quiz {str(quiznumber_in_section)} for {sectionname}" # NOTE: can be modified for more unique names
        context_id = courseid
        assignment_id = self.faker.unique.random_int(min=1000000, max=9999999)
        points_possible = 20
        description = "Sample quiz description for a section."
        quiz_type = "assignment" # NOTE: options - practice_quiz, assignment, graded_survey or survey
        workflow_state = "published"
        allowed_attempts = random.randint(1,3)
        if allowed_attempts == 1:
            scoring_policy = ""
        else:
            scoring_policy = "keep_highest" # NOTE: options - keep_highest or keep_latest
        anonymous_submissions = False
        shuffle_answers = False
        cant_go_back = False 
        could_be_locked = True
        require_lockdown_browser = True
        require_lockdown_browser_for_results = True
        require_lockdown_browser_monitor = False
        ip_filter = ""
        hide_results = "until_after_last_attempt" # NOTE: options - null, always or until_after_last_attempt
        show_correct_answers = False # NOTE: only valid if hide_results = null
        duedate = quizday + dt.timedelta(days=1)
        due_at = f"{duedate}"
        show_correct_answers_at = f"{duedate + dt.timedelta(days=1)}"
        hide_correct_answers_at = f"{self.enddate}"
        created_dt = quizday - dt.timedelta(days=random.randint(1,3),hours=random.randint(0,23),minutes=random.randint(0,59))
        created_at = f"{created_dt}"
        updated_at = f"{created_dt}"
        published_at = f"{created_dt}"
        unlock_at = f"{quizday}"
        lock_at = f"{duedate}"
        deleted_at = f"{self.enddate}"
        time_limit = 80 # NOTE: in minutes
        unpublished_question_count = random.randint(0,5)
        question_count = random.randint(10,20)
        self.canvas_quizzes_v2.loc[len(self.canvas_quizzes_v2)] = [id,title,context_id,assignment_id,points_possible,description,quiz_type,workflow_state,scoring_policy,anonymous_submissions,shuffle_answers, \
                                                            cant_go_back,could_be_locked,require_lockdown_browser,require_lockdown_browser_for_results,require_lockdown_browser_monitor,ip_filter, \
                                                            hide_results,show_correct_answers,show_correct_answers_at,hide_correct_answers_at,created_at,updated_at,published_at,unlock_at,lock_at,due_at,deleted_at, \
                                                            time_limit,allowed_attempts,unpublished_question_count,question_count]
        return allowed_attempts,duedate,question_count
    
    def _genQuizSubmissions_v2(self,quizid,num_students_in_section,dfEnroll,maxattempts,duedate,quizday,questioncount):
        # randomly sample the students enrolled in the course for quiz submissions
        three_fourths_of_students_in_class = round(num_students_in_section * 3 / 4)
        num_students_submit = random.randint(three_fourths_of_students_in_class, num_students_in_section)
        df_enroll = dfEnroll.toPandas()
        df_students_submit = df_enroll.sample(n=num_students_submit)
        for index, student in df_students_submit.iterrows():
            # assign static variables per student submission(s)
            quiz_id = quizid
            user_id = student['id']
            workflow_state = "complete"
            manually_unlocked = False
            was_preview = False
            has_seen_results = True
            temporary_user_code = ""
            quiz_points_possible = 20 # defaulted
            fudge_points = 0
            extra_attempts = 0
            extra_time = 0
            random_num_attempts = random.randint(1,maxattempts)
            if random_num_attempts == 1:
                # if only one attempt for the student, then simply generate that single submission
                id = self.faker.unique.random_int(min=10000000, max=99999999)
                submission_id = self.faker.unique.random_int(min=100000000, max=999999999)
                attempt = 1
                started_dt = quizday + dt.timedelta(hours=random.randint(0,23),minutes=random.randint(0,59))
                started_at = f"{started_dt}"
                finished_dt = started_dt + dt.timedelta(hours=random.randint(0,1),minutes=random.randint(0,20))
                finished_at = f"{finished_dt}"
                created_at = f"{finished_dt}"
                updated_at = f"{finished_dt}"
                end_at = f"{duedate}"
                score = round(random.randint(3,questioncount)/20, 2)
                kept_score = score
                score_before_regrade = None
                self.canvas_quiz_submissions_v2.loc[len(self.canvas_quiz_submissions_v2)] = [id,quiz_id,user_id,submission_id,workflow_state,manually_unlocked,was_preview,has_seen_results,temporary_user_code,created_at,updated_at,started_at,finished_at, \
                                                                                        end_at,score,kept_score,quiz_points_possible,score_before_regrade,fudge_points,attempt,extra_attempts,extra_time]
            else:
                # create variables for keeping track of the last attempt from same student
                last_attempt_daytime = quizday + dt.timedelta(hours=random.randint(20,23),minutes=random.randint(0,59))
                highest_attempt_numcorrect = random.randint(9,questioncount) 
                previous_attempt_time = None
                previous_attempt_score = None
                # iterate through total number of attempts
                for n in range(0,random_num_attempts):
                    id = self.faker.unique.random_int(min=10000000, max=99999999)
                    submission_id = self.faker.unique.random_int(min=100000000, max=999999999)
                    attempt = n + 1
                    updated_at = f"{last_attempt_daytime}"
                    end_at = f"{duedate}"
                    kept_score = round(highest_attempt_numcorrect/20, 2)
                    score_before_regrade = previous_attempt_score
                    if n == 0:
                        started_dt = quizday + dt.timedelta(hours=random.randint(0,20),minutes=random.randint(0,59))
                        started_at = f"{started_dt}"
                        finished_dt = started_dt + dt.timedelta(hours=random.randint(0,1),minutes=random.randint(0,20))
                        finished_at = f"{finished_dt}"
                        created_at = f"{finished_dt}"
                        score = round(random.randint(3,highest_attempt_numcorrect-1)/20, 2)
                    elif n == (random_num_attempts - 1):
                        # currently defaulted to last attempt being the highest score per student quiz submission
                        started_dt = last_attempt_daytime - dt.timedelta(hours=random.randint(0,1),minutes=random.randint(0,20))
                        started_at = f"{started_dt}"
                        finished_at = f"{last_attempt_daytime}"
                        created_at = f"{last_attempt_daytime}"
                        score = kept_score
                    else:
                        started_dt = previous_attempt_time + dt.timedelta(hours=random.randint(0,2),minutes=random.randint(0,59))
                        started_at = f"{started_dt}"
                        finished_dt = started_dt + dt.timedelta(hours=random.randint(0,1),minutes=random.randint(0,20))
                        finished_at = f"{finished_dt}"
                        created_at = f"{finished_dt}"
                        score = round(random.randint(3,highest_attempt_numcorrect-1)/20, 2)
                    previous_attempt_time = finished_dt
                    previous_attempt_score = score
                    self.canvas_quiz_submissions_v2.loc[len(self.canvas_quiz_submissions_v2)] = [id,quiz_id,user_id,submission_id,workflow_state,manually_unlocked,was_preview,has_seen_results,temporary_user_code,created_at,updated_at,started_at,finished_at, \
                                                                                        end_at,score,kept_score,quiz_points_possible,score_before_regrade,fudge_points,attempt,extra_attempts,extra_time]
    
    def genModule_v2_tables(self,max_num_modules=5):
        """This method generates 2 module Canvas v2 tables: context_modules_v2 and content_tags_v2"""
        date_range = self.__get_daterange(start_date=dt.datetime(2022,1,5),end_date=dt.datetime(2022,1,21))
        # count the total number of modules
        m = 1
        for index, section in self.canvas_course_sections.iterrows():
            # find section id and associated course id
            section_id = section['id']
            sis_section_id = section['sis_source_id'] 
            course_id = section['course_id']
            section_name = section['name']
            # find number of students in section
            dfBT_enroll = spark.createDataFrame(self.enrollment)
            dfEnroll = dfBT_enroll.filter(dfBT_enroll['SectionID'] == f'{sis_section_id}')
            num_students_in_section = dfEnroll.count()
            # find instructor ID for the section
            dfBT_instructor_enroll = spark.createDataFrame(self.instructors_enroll)
            instructor_sis_id = dfBT_instructor_enroll.filter(dfBT_instructor_enroll['InstructsClass_SectionId'] == f'{sis_section_id}').collect()[0][0]
            df_users = spark.createDataFrame(self.canvas_users)
            instructor_id = df_users.filter(df_users['global_canvas_id'] == f'{instructor_sis_id}').collect()[0][0] 
            # randomly choose how many modules have been assigned in this class
            n = random.randint(0,max_num_modules)
            l = 1
            while n > 0:
                # choose the day this module was assigned
                mod_day = random.choice(date_range)
                # generate the module id
                mod_id_start = self.faker.unique.random_int(min=10000, max=99999)
                mod_id_end = self.faker.unique.random_int(min=1000, max=9999)
                mod_id = str(mod_id_start) + '000000000' + str(mod_id_end)
                mod_id = int(mod_id)
                # finally generate the module tables
                created_date,due_date = self._genContextModules_v2(mod_id,course_id,m,l,section_name,mod_day) 
                self._genContentTags_v2(mod_id,course_id,created_date,l,section_name,due_date) 
                n = n - 1
                m = m + 1
                l = l + 1
        self.writetojsonfile('context_modules_v2', self.canvas_context_modules_v2)
        self.writetojsonfile('content_tags_v2', self.canvas_content_tags_v2)

    def _genContextModules_v2(self,modid,courseid,modnumber_in_system,modnumber_in_section,sectionname,modday):
        id = modid
        name = f"Module {str(modnumber_in_section)} for {sectionname}" # NOTE: can be modified for more unique names
        context_id = courseid # from documentation: https://docs.google.com/spreadsheets/d/1kqCXAD9K45L0QeEtbuuMAFp2fW8o0oC8EBzJf58SjrY/edit#gid=1857332749
        position = modnumber_in_system
        require_sequential_progress = False
        workflow_state = "completed"
        createddate = modday - dt.timedelta(days=random.randint(1,3),hours=random.randint(0,23),minutes=random.randint(0,59))
        created_at = f"{createddate}"
        deleted_at = ""
        unlock_at = f"{modday}"
        duedate = modday + dt.timedelta(days=random.randint(0,2),hours=random.randint(0,23))
        updated_at = f"{duedate}"
        self.canvas_context_modules_v2.loc[len(self.canvas_context_modules_v2)] = [id,name,context_id,position,require_sequential_progress,workflow_state,created_at,deleted_at,unlock_at,updated_at]
        return createddate, duedate

    def _genContentTags_v2(self,modid,courseid,created_date,modnumber_in_section,sectionname,duedate):
        num_mod_items = random.randint(1,4)
        for n in range(num_mod_items):
            content_id = self.faker.unique.random_int(min=1000000, max=9999999)
            context_id = courseid
            context_module_id = modid
            content_type = "Attachment" # unsure if this is accurate, see doc: https://docs.google.com/spreadsheets/d/1kqCXAD9K45L0QeEtbuuMAFp2fW8o0oC8EBzJf58SjrY/edit#gid=1715262273
            workflow_state = "completed"
            position = n + 1
            title = f"Module Item {str(position)} from Module {str(modnumber_in_section)} for {sectionname}" # NOTE: can be modified for more unique names
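            url = f"https://cu.instructure.com/courses/{str(courseid)}/modules/items/{str(content_id)}" # NOTE: uses content_id; no local id is defined here, so the bare name id resolved to the Python builtin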
            created_at = f"{created_date}"
            updated_at = f"{duedate}"
            self.canvas_content_tags_v2.loc[len(self.canvas_content_tags_v2)] = [content_id,context_id,context_module_id,content_type,workflow_state,position,title,url,created_at,updated_at]

    def genAssignment_v2_tables(self,max_num_assigns=3):
        """This method generates 2 assignment Canvas v2 tables: assignments and submissions"""
        date_range = self.__get_daterange(start_date=dt.datetime(2022,1,3),end_date=dt.datetime(2022,1,21))
        # count the total number of assignments
        m = 1
        # start by adding the previously generated quizzes and quiz_submissions to assignments and submissions, respectively
        m = self._addQuizzes_to_Assignments_v2(m)
        self._addQuizSubmissions_to_Submissions_v2()
        for index, section in self.canvas_sections.iterrows():
            # find section id and associated course id
            section_id = section['id']
            sis_section_id = section['sis_source_id']
            course_id = section['course_id']
            section_name = section['name']
            # find number of students in section
            dfBT_enroll = spark.createDataFrame(self.enrollment)
            dfEnroll = dfBT_enroll.filter(dfBT_enroll['SectionID'] == f'{sis_section_id}')
            num_students_in_section = dfEnroll.count()
            # find instructor ID for the section
            dfBT_instructor_enroll = spark.createDataFrame(self.instructors_enroll)
            instructor_sis_id = dfBT_instructor_enroll.filter(dfBT_instructor_enroll['InstructsClass_SectionId'] == f'{sis_section_id}').collect()[0][0]
            df_users = spark.createDataFrame(self.canvas_users)
            instructor_id = df_users.filter(df_users['global_canvas_id'] == f'{instructor_sis_id}').collect()[0][0]
            # randomly choose how many assignments have been assigned in this class
            n = random.randint(0,max_num_assigns)
            l = 1
            while n > 0:
                # choose the day this assignment was assigned
                assign_day = random.choice(date_range)
                # generate the assignment id
                assign_id = self.faker.unique.random_int(min=1000000, max=9999999)
                # finally generate the assignment tables
                duedate = self._genAssignments_v2(assign_id,course_id,l,m,section_name,assign_day,num_students_in_section)
                # assignments due after 2022-02-02 are treated as not yet graded
                graded = duedate <= dt.datetime(2022,2,2)
                self._genSubmissions_v2(assign_id,course_id,l,dfEnroll,graded,duedate,section_name,assign_day,num_students_in_section,instructor_id)
                n = n - 1
                m = m + 1
                l = l + 1
        self.writetojsonfile('assignments_v2', self.canvas_assignments_v2)
        self.writetojsonfile('submissions_v2', self.canvas_submissions_v2)
    
    def _addQuizzes_to_Assignments_v2(self,assignnumber_in_system):
        # NOTE: canvas_quizzes is expected to already have been generated
        for index, quiz in self.canvas_quizzes_v2.iterrows():
            id = quiz['assignment_id']
            quiz_title = quiz['title'] # NOTE: can be modified for more unique names
            title = f"Assignment Info for {quiz_title}"
            description = quiz['description']
            context_id = quiz['context_id']
            assignment_group_id = None
            duedate = quiz['due_at']
            due_at = f"{duedate}"
            unlockdate = quiz['unlock_at']
            unlock_at = f"{unlockdate}"
            lock_at = f"{self.enddate}"
            createddate = quiz['created_at']
            created_at = f"{createddate}"
            updateddate = quiz['updated_at']
            updated_at = f"{updateddate}"
            points_possible = quiz['points_possible']
            grading_type = "letter_grade" # NOTE: can be: gpa_scale, pass_fail, percent, points, not_graded or letter_grade
            submission_types = ["online_quiz"] # NOTE: should be an array; stored as a list here. can be: online_url, media_recording, online_upload, online_quiz, external_tool, online_text_entry or online_file_upload
            workflow_state = "published"
            peer_reviews = False
            peer_review_count = 0
            peer_reviews_due_at = ""
            peer_reviews_assigned = False
            automatic_peer_reviews = False 
            all_day = False
            all_day_date = ""
            could_be_locked = True
            has_overrides = True
            grade_group_students_individually = None # NOTE: typically boolean for group assignments - temp assumption that all assignments are individual
            anonymous_peer_reviews = False
            position = assignnumber_in_system
            visibility = "everyone" # accepted values: everyone or only_visible_to_overrides
            self.canvas_assignments_v2.loc[len(self.canvas_assignments_v2)] = [id,title,description,context_id,assignment_group_id,due_at,unlock_at,lock_at,created_at,updated_at,points_possible,grading_type,submission_types, \
                                                                            workflow_state,peer_reviews,peer_review_count,peer_reviews_due_at,peer_reviews_assigned,automatic_peer_reviews,all_day,all_day_date,could_be_locked, \
                                                                            grade_group_students_individually,anonymous_peer_reviews,position,visibility]
            assignnumber_in_system = assignnumber_in_system + 1
        return assignnumber_in_system
    
    def _addQuizSubmissions_to_Submissions_v2(self):
        # NOTE: canvas_quiz_submissions is expected to already have been generated
        # join quizzes table to get the context_id and assignment_id of the quiz_submission
        df_quizzes = spark.createDataFrame(self.canvas_quizzes_v2)
        df_quiz_submissions = spark.createDataFrame(self.canvas_quiz_submissions_v2)
        df_quizzes = df_quizzes.select('id', 'context_id', 'assignment_id').withColumnRenamed('id', 'qid').withColumn('assignment_id', df_quizzes['assignment_id'].cast(LongType()))
        df_quiz_submissions = df_quiz_submissions.join(df_quizzes, df_quiz_submissions.quiz_id == df_quizzes.qid, how='left').drop('qid')
        # then get the instructor_id for the Instructor of the course (for the grader_id field in the submissions table)
        df_enrollments = spark.createDataFrame(self.canvas_enrollments)
        df_enrollments = df_enrollments.select('type', 'course_section_id', 'user_id').filter(df_enrollments['type'] == "TeacherEnrollment")
        df_sections = spark.createDataFrame(self.canvas_sections)
        df_sections = df_sections.select('id', 'course_id')
        df_i_enroll = df_enrollments.join(df_sections, df_enrollments.course_section_id == df_sections.id, how='inner').drop('id').drop('type').drop('course_section_id')
        df_i_enroll = df_i_enroll.withColumnRenamed('user_id', 'instructor_id')
        df_quiz_submissions = df_quiz_submissions.join(df_i_enroll, df_quiz_submissions.context_id == df_i_enroll.course_id, how='left').drop('course_id')
        df_quiz_submissions = df_quiz_submissions.toPandas()
        for index, sub in df_quiz_submissions.iterrows():
            id = self.faker.unique.random_int(min=10000000, max=99999999)
            body = "Graded quiz results" # NOTE: assumption is that the assignment was uploaded rather than a filled field
            courseid = sub['context_id']
            assignment_id = sub['assignment_id']
            group_id = None 
            quiz_submission_id = sub['submission_id']
            user_id = sub['user_id']
            url = f"https://canvas.cu.edu/courses/{str(courseid)}/assignments/{str(assignment_id)}/submissions/{str(user_id)}/{str(quiz_submission_id)}"
            score = sub['score']
            published_score = sub['kept_score']
            grade = self.__get_lettergrade(score)
            published_grade = self.__get_lettergrade(published_score)
            graded_anonymously = ['not_graded_anonymously']
            grader_id = sub['instructor_id']
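            duedate = sub['end_at']
            # end_at is stored as a formatted string, so parse it back to a datetime before adding a timedelta
            gradeddate = dt.datetime.fromisoformat(str(duedate)) + dt.timedelta(days=1)
            graded_at = f"{gradeddate}"
            posted_at = f"{gradeddate}"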
            submitteddate = sub['finished_at']
            submitted_at = f"{submitteddate}"
            submission_type = "online_quiz" # NOTE: accepted values - discussion_topic, external_tool, media_recording, online_file_upload, online_quiz, online_text_entry, online_upload or online_url
            workflow_state = "graded"
            createddate = sub['created_at']
            created_at = f"{createddate}"
            updateddate = sub['updated_at']
            updated_at = f"{updateddate}"
            processed = False # NOTE: only valid when there's a file/attachment associated
            grade_matches_current_submission = bool(score == published_score)
            attempt = sub['attempt']
            excused = ['regular_submission']
            student_entered_score = 0.00
            submission_comments_count = 0
            self.canvas_submissions_v2.loc[len(self.canvas_submissions_v2)] = [id,body,url,assignment_id,group_id,quiz_submission_id,user_id,score,published_score,grade,published_grade,graded_anonymously, \
                                                                                grader_id,graded_at,posted_at,submitted_at,submission_type,workflow_state,created_at,updated_at,processed,grade_matches_current_submission, \
                                                                                attempt,excused,student_entered_score,submission_comments_count]
        # now add students without quiz submissions
        df_submissions = spark.createDataFrame(df_quiz_submissions)
        df_enrollments = spark.createDataFrame(self.canvas_enrollments)
        df_s_enrollments = df_enrollments.select('type', 'course_section_id', 'user_id').filter(df_enrollments['type'] == "StudentEnrollment").drop('type').withColumnRenamed('user_id', 'student_not_submit_id')
        df_students_not_submit = df_s_enrollments.join(df_submissions, df_s_enrollments.student_not_submit_id == df_submissions.user_id, how='left_anti') # this table is used to add students that haven't submitted assignments
        df_sections = spark.createDataFrame(self.canvas_sections)
        df_sections = df_sections.select('id', 'course_id')
        df_students_not_submit = df_students_not_submit.join(df_sections, df_students_not_submit.course_section_id == df_sections.id, how='inner').drop('id')
        df_quiz_info = spark.createDataFrame(self.canvas_quizzes_v2)
        df_students_not_submit = df_students_not_submit.join(df_quiz_info, df_students_not_submit.course_id == df_quiz_info.context_id, how='inner')
        df_i_enroll = df_i_enroll.withColumnRenamed('course_id', 'cid')
        df_students_not_submit = df_students_not_submit.join(df_i_enroll, df_students_not_submit.course_id == df_i_enroll.cid, how='inner').drop('cid')
        df_students_not_submit = df_students_not_submit.toPandas()
        for index, not_sub in df_students_not_submit.iterrows():
            id = self.faker.unique.random_int(min=10000000, max=99999999)
            body = "Graded quiz results" # NOTE: assumption is that the assignment was uploaded rather than a filled field
            courseid = not_sub['context_id']
            assignment_id = not_sub['assignment_id']
            group_id = None 
            quiz_submission_id = 0
            user_id = not_sub['student_not_submit_id']
            url = ""
            score = 0.00
            published_score = 0.00
            grade = "F"
            published_grade = "F"
            graded_anonymously = ['not_graded_anonymously']
            grader_id = not_sub['instructor_id']
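            duedate = not_sub['due_at']
            # due_at may arrive as a formatted string, so parse it back to a datetime before adding a timedelta
            gradeddate = dt.datetime.fromisoformat(str(duedate)) + dt.timedelta(days=1)
            graded_at = f"{gradeddate}"
            posted_at = f"{gradeddate}"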
            submitted_at = ""
            submission_type = "online_quiz" # NOTE: accepted values - discussion_topic, external_tool, media_recording, online_file_upload, online_quiz, online_text_entry, online_upload or online_url
            workflow_state = "graded"
            created_at = ""
            updated_at = ""
            processed = False # NOTE: only valid when there's a file/attachment associated
            grade_matches_current_submission = True
            attempt = 0
            excused = ['regular_submission']
            student_entered_score = 0.00
            submission_comments_count = 1
            self.canvas_submissions_v2.loc[len(self.canvas_submissions_v2)] = [id,body,url,assignment_id,group_id,quiz_submission_id,user_id,score,published_score,grade,published_grade,graded_anonymously, \
                                                                                grader_id,graded_at,posted_at,submitted_at,submission_type,workflow_state,created_at,updated_at,processed,grade_matches_current_submission, \
                                                                                attempt,excused,student_entered_score,submission_comments_count]

    def _genAssignments_v2(self,assignid,courseid,assignnumber_in_section,assignnumber_in_system,sectionname,assignday,num_students_in_section):
        # NOTE: this code assumes there's supposed to be one row per assignment
        id = assignid
        title = f"Assignment {str(assignnumber_in_section)} for {sectionname}" # NOTE: can be modified for more unique names
        description = "<p>Do the following:</p>..."
        context_id = courseid
        assignment_group_id = None
        duedate = assignday + dt.timedelta(days=random.randint(7,14))
        due_at = f"{duedate}"
        unlock_at = f"{assignday}"
        lock_at = f"{self.enddate}"
        created_at = f"{assignday - dt.timedelta(days=random.randint(0,1),hours=random.randint(0,23),minutes=random.randint(0,59))}"
        updated_at = f"{created_at}"
        points_possible = 100
        grading_type = "letter_grade" # NOTE: can be: pass_fail, percent, letter_grade, gpa_scale or points
        submission_types = ["online_upload", "online_file_upload"] # NOTE: should be an array; stored as a list here. can be: online_url, media_recording, online_upload, online_quiz, external_tool, online_text_entry or online_file_upload
        workflow_state = "published"
        peer_reviews = False
        peer_review_count = 0
        peer_reviews_due_at = ""
        peer_reviews_assigned = False
        automatic_peer_reviews = False 
        all_day = False
        all_day_date = ""
        could_be_locked = False
        has_overrides = True
        grade_group_students_individually = None # NOTE: typically boolean for group assignments - temp assumption that all assignments are individual
        anonymous_peer_reviews = False
        position = assignnumber_in_system
        visibility = "everyone"
        self.canvas_assignments_v2.loc[len(self.canvas_assignments_v2)] = [id,title,description,context_id,assignment_group_id,due_at,unlock_at,lock_at,created_at,updated_at,points_possible,grading_type,submission_types, \
                                                                    workflow_state,peer_reviews,peer_review_count,peer_reviews_due_at,peer_reviews_assigned,automatic_peer_reviews,all_day,all_day_date,could_be_locked, \
                                                                    grade_group_students_individually,anonymous_peer_reviews,position,visibility]
        return duedate

    def _genSubmissions_v2(self,assignid,courseid,assignnumber_in_section,dfEnroll,graded,duedate,sectionname,assignday,num_students_in_section,instructor_id):
        # randomly sample the students enrolled in the course for assignment submissions
        half_of_students_in_class = round(num_students_in_section / 2)
        num_students_submit = random.randint(half_of_students_in_class, num_students_in_section)
        df_enroll = dfEnroll.toPandas()
        df_students_submit = df_enroll.sample(n=num_students_submit)
        dfStudents_submit = spark.createDataFrame(df_students_submit).select('StudentID').withColumnRenamed('StudentID','submit_studentID')
        dfStudents_not_submit = dfEnroll.join(dfStudents_submit, dfEnroll.StudentID == dfStudents_submit.submit_studentID, how='left_anti') # this table is used to add students that haven't submitted assignments
        df_students_not_submit = dfStudents_not_submit.toPandas()
        # set users roster table to extract user IDs
        df_users = spark.createDataFrame(self.canvas_users)
        for index, student in df_students_submit.iterrows():
            # currently only assuming 1 student submission per assignment
            id = self.faker.unique.random_int(min=10000000, max=99999999)
            body = "" # NOTE: assumption is that the assignment was uploaded rather than a filled field
            assignment_id = assignid
            group_id = None # assuming all assignments are submitted individually
            quiz_submission_id = None # none of these assignments will be quizzes
            user_id = df_users.filter(df_users['global_canvas_id'] == student['StudentID']).collect()[0][0]
            url = f"https://canvas.cu.edu/courses/{str(courseid)}/assignments/{str(assignment_id)}/submissions/{str(user_id)}/{str(id)}"
            graded_anonymously = ['not_graded_anonymously']
            grader_id = instructor_id
            submit_datetime = assignday + dt.timedelta(days=random.randint(0,11),hours=random.randint(0,23),minutes=random.randint(0,59))
            submitted_at = f"{submit_datetime}"
            submission_type = "online_upload"
            if graded:
                workflow_state = "graded"
                score_calc = random.triangular(60.00,100.00,85.00)/100
                score = round(score_calc, 2)
                published_score = score
                grade = self.__get_lettergrade(score)
                published_grade = self.__get_lettergrade(score)
                graded_datetime = duedate + dt.timedelta(days=random.randint(1,2),hours=random.randint(0,23),minutes=random.randint(0,59))
                graded_at = f"{graded_datetime}"
                posted_at = f"{graded_datetime}"
            else:
                workflow_state = "submitted"
                score = None
                published_score = None
                grade = ""
                published_grade = ""
                graded_at = ""
                posted_at = ""
            created_at = f"{submit_datetime}"
            updated_at = f"{submit_datetime}"
            processed = True
            grade_matches_current_submission = True
            attempt = 1
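            excused = ['regular_submission'] # NOTE: matches the value used for quiz-based submissions; True would mark every submission as excused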
            student_entered_score = 0.00
            submission_comments_count = 0
            self.canvas_submissions_v2.loc[len(self.canvas_submissions_v2)] = [id,body,url,assignment_id,group_id,quiz_submission_id,user_id,score,published_score,grade,published_grade,graded_anonymously, \
                                                                                grader_id,graded_at,posted_at,submitted_at,submission_type,workflow_state,created_at,updated_at,processed,grade_matches_current_submission, \
                                                                                attempt,excused,student_entered_score,submission_comments_count]
        # if assignment has been graded, then add students without submissions
        if graded:
            for index, student in df_students_not_submit.iterrows():
                # assign static variables per student submission(s)
                # optional table attributes currently left out: submission_comments, read_status
                id = self.faker.unique.random_int(min=10000000, max=99999999)
                body = "" # NOTE: assumption is that the assignment was uploaded rather than a filled field
                assignment_id = assignid
                group_id = None # assuming all assignments are submitted individually
                quiz_submission_id = None # none of these assignments will be quizzes
                user_id = df_users.filter(df_users['global_canvas_id'] == student['StudentID']).collect()[0][0]
                url = f"https://canvas.cu.edu/courses/{str(courseid)}/assignments/{str(assignment_id)}/submissions/{str(user_id)}"
                score = 0.00
                published_score = 0.00
                grade = "F"
                published_grade = "F"
                graded_anonymously = ['not_graded_anonymously']
                grader_id = instructor_id
                graded_datetime = duedate + dt.timedelta(days=random.randint(1,2),hours=random.randint(0,23),minutes=random.randint(0,59))
                graded_at = f"{graded_datetime}"
                posted_at = f"{graded_datetime}"
                submitted_at = ""
                submission_type = "online_upload"
                workflow_state = "unsubmitted"
                created_at = ""
                updated_at = ""
                processed = False
                grade_matches_current_submission = True
                attempt = 0
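                excused = ['regular_submission'] # NOTE: matches the value used for quiz-based submissions; True would mark every submission as excused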
                student_entered_score = 0.00
                submission_comments_count = 0
                self.canvas_submissions_v2.loc[len(self.canvas_submissions_v2)] = [id,body,url,assignment_id,group_id,quiz_submission_id,user_id,score,published_score,grade,published_grade,graded_anonymously, \
                                                                                grader_id,graded_at,posted_at,submitted_at,submission_type,workflow_state,created_at,updated_at,processed,grade_matches_current_submission, \
                                                                                attempt,excused,student_entered_score,submission_comments_count]

    def writetojsonfile(self,filename,dfOutfile):
        """This method writes a generated pandas table to the stage1 test data path as a JSON Lines file (one record per line)"""
        finalgenfilepath = 'stage1/Transactional/test_data/v0.1/canvas_activity_gen/' + filename + '/' + filename + '.json'
        dfOutfile.to_json(oea.to_url(finalgenfilepath), orient='records', force_ascii=False, lines=True)

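The multi-attempt branch of the quiz submission generator above is easier to follow in isolation. The following is a minimal standalone sketch of that logic, not the class's actual method: `sketch_attempts`, its parameters, and the returned dict keys are hypothetical names chosen for illustration. It assumes at least two attempts and at least 9 questions, and it divides by `question_count` where the class code divides by a fixed 20.

```python
import datetime as dt
import random


def sketch_attempts(quizday, num_attempts, question_count=20, rng=None):
    """Return one dict per quiz attempt; the final attempt carries the kept score.

    Hypothetical helper for illustration only. Assumes num_attempts >= 2
    and question_count >= 9.
    """
    if rng is None:
        rng = random.Random()
    # Fix the end of the final attempt late on the quiz day.
    last_finish = quizday + dt.timedelta(hours=rng.randint(20, 23), minutes=rng.randint(0, 59))
    # The final attempt is assumed to hold the highest (kept) score.
    best_correct = rng.randint(9, question_count)
    kept_score = round(best_correct / question_count, 2)

    rows = []
    prev_finish = None
    prev_score = None
    for n in range(num_attempts):
        if n == num_attempts - 1:
            # Final attempt: ends at the fixed time and receives the kept score.
            finished = last_finish
            started = finished - dt.timedelta(hours=rng.randint(0, 1), minutes=rng.randint(0, 20))
            score = kept_score
        else:
            if n == 0:
                # First attempt starts at some point during the quiz day.
                started = quizday + dt.timedelta(hours=rng.randint(0, 20), minutes=rng.randint(0, 59))
            else:
                # Middle attempts start shortly after the previous one finished.
                started = prev_finish + dt.timedelta(hours=rng.randint(0, 2), minutes=rng.randint(0, 59))
            finished = started + dt.timedelta(hours=rng.randint(0, 1), minutes=rng.randint(0, 20))
            # Earlier attempts always score below the kept score.
            score = round(rng.randint(3, best_correct - 1) / question_count, 2)
        rows.append({
            "attempt": n + 1,
            "started_at": f"{started}",
            "finished_at": f"{finished}",
            "score": score,
            "kept_score": kept_score,
            "score_before_regrade": prev_score,
        })
        prev_finish, prev_score = finished, score
    return rows


# Example call with a seeded generator so repeated runs give the same rows.
attempts = sketch_attempts(dt.datetime(2022, 1, 10), num_attempts=3, rng=random.Random(0))
```

As in the class code, earlier attempts are not clamped to finish before the final attempt, so the timestamps are not guaranteed to be strictly increasing. Passing a seeded `random.Random` keeps the sketch reproducible without touching the global `random` state.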