In this notebook, we will finally create predictive features using the logs we cleaned in notebook 2.2.

Our focus will be to obtain the Temporal Data. We consider this to be the number of daily clicks made by each student in each course.

#### 1. Importing the relevant packages, setting global variables and importing the relevant files

In [1]:
#import libs
import pandas as pd
import numpy as np
from pandas.tseries.offsets import Week

#viz related tools
import matplotlib.pyplot as plt
import matplotlib.colors as mcolors
from matplotlib.colors import LogNorm, Normalize
from matplotlib.ticker import MaxNLocator
import matplotlib as mpl
from matplotlib import cm
import seaborn as sns

#tqdm to monitor progress
from tqdm.notebook import tqdm, trange
tqdm.pandas(desc="Progress")

#time related features
from datetime import timedelta
from copy import copy, deepcopy

#starting with other tools
sns.set()

import warnings
warnings.filterwarnings('ignore')
%matplotlib inline

#to save
import xlsxwriter

In [2]:
#global variables that may come in handy
#course threshold sets the % duration that will be considered (1 = 100%)
duration_threshold = [0.1, 0.25, 0.33, 0.5, 1]

#colors for visualizations
nova_ims_colors = ['#BFD72F', '#5C666C']

#standard color for student aggregates
student_color = '#474838'

#standard color for course aggregates
course_color = '#1B3D2F'

#standard continuous colormap
standard_cmap = 'viridis_r'

#Function designed to deal with multiindex and flatten it
def flattenHierarchicalCol(col, sep = '_'):
    '''converts multiindex columns into single index columns while retaining the hierarchical components'''
    #single-level column names are returned untouched
    if not isinstance(col, tuple):
        return col
    #join the non-empty levels, casting to str so non-string levels do not raise
    return sep.join(str(level) for level in col if level != '')

In [None]:
#loading student log data 
student_logs = pd.read_csv('../Data/Modeling Stage/NovaIMS_cleaned_logs.csv',
                           dtype = {
                                   'cd_curso': float,
                                   'userid': float,
                                   'courseid': float,
                                   },
                                   parse_dates = ['time']).drop('Unnamed: 0', axis = 1) #logs

#converting to object
student_logs['userid'], student_logs['cd_curso'], student_logs['courseid'] = student_logs['userid'].astype(object), student_logs['cd_curso'].astype(object), student_logs['courseid'].astype(object)

#other tables with support information
support_table = pd.read_csv('../Data/Nova_IMS_support_table.csv',
                             dtype = {
                                 'cd_curso' : float,
                                 'courseid' : float,
                                 'userid' : float,
                                 'assign_id': float,
                             }, parse_dates = ['startdate', 'end_date']).drop('Unnamed: 0', axis = 1)

#converting to object
support_table['userid'], support_table['cd_curso'], support_table['courseid'], support_table['assign_id'] = support_table['userid'].astype(object), support_table['cd_curso'].astype(object), support_table['courseid'].astype(object), support_table['assign_id'].astype(object)

#class list with course start and end dates
class_list = pd.read_csv('../Data/Modeling Stage/NovaIMS_class_duration.csv', 
                         dtype = {
                                   'cd_curso': float,
                                   'courseid': float,                                   
                                   },
                        parse_dates = ['Start Date','End Date', 'cuttoff_point']).rename(columns = {'cuttoff_point' : 'Week before start'})

#converting to object
class_list['cd_curso'], class_list['courseid'] = class_list['cd_curso'].astype(object), class_list['courseid'].astype(object)

#targets tables 
targets_table = pd.read_csv('../Data/Modeling Stage/Nova_IMS_targets_table.csv',
                           dtype = {
                                   'cd_curso': float,
                                   'userid': float,
                                   'courseid': float,
                                   },).drop('Unnamed: 0', axis = 1)

#converting to object
targets_table['userid'], targets_table['courseid'] = targets_table['userid'].astype(object), targets_table['courseid'].astype(object)

We'll start with the general verification of the different datasets we've imported. 

**Starting with the targets table, which includes all valid student-course logs with Final-Grade.**

In [None]:
#get info
targets_table.info()

In [None]:
targets_table.describe(include = 'all', datetime_is_numeric = True)

In [None]:
targets_table

Then, we repeat the same for the list of courses and their respective start and end dates. The number of students attending each course is the number found in the logs. We will need to make further cuts.

In [None]:
class_list.info()

In [None]:
class_list.describe(include = 'all', datetime_is_numeric = True)

In [None]:
class_list

We still note a significant presence of courses with small numbers of students. The first step we will take is the removal of all courses whose number of attending students is below 50.
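
The filter in the next cell checks each key column independently with `isin`, so a row passes whenever each of its three values appears somewhere in `class_list`, even if that exact combination does not. A minimal sketch of a stricter alternative, on toy data with invented values, matches the (`cd_curso`, `semestre`, `courseid`) triple as a whole through an inner merge. The names `toy_classes` and `toy_logs` are hypothetical stand-ins, not objects from this notebook.

```python
import pandas as pd

# toy stand-ins for class_list and student_logs (values are invented for illustration)
toy_classes = pd.DataFrame({
    'cd_curso': [1, 2],
    'semestre': ['S1', 'S2'],
    'courseid': [10, 20],
})
toy_logs = pd.DataFrame({
    'cd_curso': [1, 1, 2],
    'semestre': ['S1', 'S2', 'S2'],
    'courseid': [10, 10, 20],
    'action': ['viewed', 'viewed', 'submitted'],
})

keys = ['cd_curso', 'semestre', 'courseid']

# column-by-column isin keeps the (1, 'S2', 10) row, although that triple is not a listed course
independent = toy_logs[toy_logs['cd_curso'].isin(toy_classes['cd_curso'])
                       & toy_logs['semestre'].isin(toy_classes['semestre'])
                       & toy_logs['courseid'].isin(toy_classes['courseid'])]

# an inner merge on the key columns keeps only rows whose full triple exists
composite = toy_logs.merge(toy_classes[keys].drop_duplicates(), on=keys, how='inner')

print(len(independent), len(composite))
```

Whether the two approaches differ on the real data depends on how often identifiers repeat across programs and semesters.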

In [None]:
class_list = class_list[class_list['Users per course'] >= 50]

#updating student logs
student_logs = student_logs[student_logs['courseid'].isin(class_list['courseid']) & 
                            student_logs['cd_curso'].isin(class_list['cd_curso']) &
                            student_logs['semestre'].isin(class_list['semestre'])].reset_index(drop = True)

#additionally updating targets_table
targets_table = targets_table[targets_table['courseid'].isin(class_list['courseid']) & 
                              targets_table['cd_curso'].isin(class_list['cd_curso']) &
                              targets_table['semestre'].isin(class_list['semestre'])].reset_index(drop = True)

#additionally updating support_table
support_table = support_table[support_table['courseid'].isin(class_list['courseid']) & 
                              support_table['cd_curso'].isin(class_list['cd_curso']) &
                              support_table['semestre'].isin(class_list['semestre'])].reset_index(drop = True)

class_list.describe(include = 'all', datetime_is_numeric = True)

In [None]:
targets_table

In [None]:
student_logs

We'll follow up by taking a closer look at the logs we cleaned in the previous section.

In [None]:
student_logs.keys()

In [None]:
student_logs.describe(include = 'all', datetime_is_numeric = True)

In [None]:
#then we plot a histogram of students per course for the remaining courses (those with at least 50 students)
sns.set_theme(context='paper', style='whitegrid', font='Calibri', rc={"figure.figsize":(16, 10)}, font_scale=2)
hist4 = sns.histplot(data=class_list, x='Users per course', kde=True, color= student_color, binwidth = 5,)

fig = hist4.get_figure()
fig.savefig('../Images/Nova_students_per_course_bin_5, filtered.png', transparent=True, dpi=300)

#delete to remove from memory
del fig, hist4


Likewise, courses with an abnormally high number of attending students for a face-to-face context (over 200) deserve attention. To inspect them, we select every course with at least 100 students.

In [None]:
#create df only with high-attendance courses
most_affluent_courses = class_list[class_list['Users per course'] >= 100]

#separate logs accordingly
high_attendance_logs = student_logs[student_logs['courseid'].isin(most_affluent_courses['courseid']) & 
                            student_logs['cd_curso'].isin(most_affluent_courses['cd_curso']) &
                            student_logs['semestre'].isin(most_affluent_courses['semestre'])].reset_index(drop = True)
high_attendance_logs

In [None]:
high_attendance_logs.describe(include = 'all', datetime_is_numeric = True)

We can plot the weekly interactions of these courses.
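
As a small illustration of the weekly aggregation, the sketch below uses invented toy logs. With `freq='W'`, `pd.Grouper` bins timestamps into weeks ending on Sunday and labels each bin with that Sunday. The names `toy_logs`, `weekly`, and `weekly_wide` are hypothetical.

```python
import pandas as pd

# toy logs: three clicks in course 10, one in course 20 (values invented)
toy_logs = pd.DataFrame({
    'courseid': [10, 10, 10, 20],
    'time': pd.to_datetime(['2020-09-14 09:00', '2020-09-16 18:30',
                            '2020-09-21 10:00', '2020-09-15 12:00']),
    'action': ['viewed', 'viewed', 'submitted', 'viewed'],
})

# count the interactions of each course in each week
weekly = (toy_logs
          .groupby([pd.Grouper(key='time', freq='W'), 'courseid'])['action']
          .count()
          .reset_index(name='clicks'))

# wide layout for a heatmap: one row per course, one column per week
weekly_wide = weekly.pivot(index='courseid', columns='time', values='clicks')
print(weekly_wide)
```

Weeks in which a course has no interactions stay as NaN in the wide table, which a heatmap renders as empty cells.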

In [None]:
#Then, when it comes to logs, we aggregate by week
grouped_data = high_attendance_logs.groupby([pd.Grouper(key='time', freq='W'), 'cd_curso', 'semestre', 'courseid']).agg({
                                                                             'action': 'count',
                                                                             }).reset_index().sort_values('time')

#change for better reading
grouped_data['Date (week)'] = grouped_data['time'].astype(str)

#creating pivot table to create heatmap
grouped_data = grouped_data.pivot_table(index =['cd_curso', 'semestre', 'courseid'], 
                       columns = 'Date (week)',
                        values = 'action', 
                       aggfunc =np.sum,
                        fill_value=np.nan)

#now, we will sort the courses according to the starting date
grouped_data = grouped_data.reset_index().rename(columns = {'courseid': 'Course',
                                                            'cd_curso': 'Program',
                                                            'semestre': 'Semester',
                                                           })

grouped_data['Course'] = pd.to_numeric(grouped_data['Course']).astype(int)

#finally we create the pivot_table that we will use to create our heatmap
grouped_data = grouped_data.set_index(['Program', 'Semester', 'Course'], drop = True)
grouped_data.T.describe(include = 'all').T

In [None]:
sns.set_theme(context='paper', style='whitegrid', font='Calibri', rc={"figure.figsize":(20, 12)}, font_scale=2)

#here, we plot the heatmap of weekly interactions per course
heat4 = sns.heatmap(grouped_data, robust=True, norm=LogNorm(), xticklabels = 2, yticklabels= 1,
            cmap = standard_cmap, cbar_kws={'label': 'Weekly interactions'})

fig = heat4.get_figure()
fig.savefig('../Images/Nova_IMS_highest_attendance_weekly_clicks_heat.png', transparent=True, dpi=300)

#delete to remove from memory
del fig, heat4

After consideration, we find that the student interactions seem to be consistent with the course duration. 

**1. First, we filter by our current list of valid courses.**

In [None]:
#pairwise representation of the different targets
g = sns.PairGrid(targets_table, diag_sharey=False, corner=True)
g.map_diag(sns.histplot)
g.map_lower(sns.scatterplot)
g.add_legend()

In [None]:
#a larger overlook at the different courses
targets_table.groupby(['cd_curso', 'semestre', 'courseid']).agg({
                                    'userid' : 'count', 
                                    'exam_mark' : ['min', 'mean', 'max'],                                    
                                    'final_mark' : ['min', 'mean', 'max'],
                                    }).describe(include = 'all')

**We will finish by taking a look at our support table**. This table associates all students attending a specific course with the partial grades obtained by each student.

At Nova IMS, these grades are not timestamped (i.e., we know neither which assignment, quiz, or event a partial grade refers to nor when that assignment took place).

In [None]:
#get info
support_table.info()

In [None]:
support_table.describe(include = 'all', datetime_is_numeric = True)

In [None]:
support_table

In [None]:
#a larger overlook at the different courses
support_table.groupby(['cd_curso', 'semestre', 'courseid']).agg({
                                    'userid' : 'nunique',
                                    'assign_id' : 'nunique', 
#                                     'mandatory_status' : 'mean',
#                                     'delivered' : 'mean',                                    
                                    'assignment_mark' : 'mean',
                                    }).describe(include = 'all')

In [None]:
#pairwise representation of the numeric columns of the support table
g = sns.PairGrid(support_table, diag_sharey=False, corner=True)
g.map_diag(sns.histplot)
g.map_lower(sns.scatterplot)
g.add_legend()

**Going forward**.

After this preliminary look, we will go forward with extracting features from the Moodle logs. 

In this notebook, we will use a temporal representation in which each row is a student-course pair and each column represents a day in the course.

An important distinction is that, when working in this manner, we do not have to perform multiple pre-processing steps. Instead we get, for each row, a sequence of the daily number of clicks.
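
To make the representation concrete, here is a minimal sketch on invented toy logs: clicks are counted per student per day and then pivoted so that each row is a student-course pair and each column a day. The names `toy_logs`, `daily`, and `click_matrix` are hypothetical.

```python
import pandas as pd

# toy logs for one course and two students (values invented)
toy_logs = pd.DataFrame({
    'course_encoding': [0, 0, 0, 0],
    'userid': [101, 101, 102, 101],
    'time': pd.to_datetime(['2020-09-14 09:00', '2020-09-14 17:45',
                            '2020-09-14 11:20', '2020-09-15 08:05']),
    'action': ['viewed', 'viewed', 'viewed', 'submitted'],
})

# count the clicks of each student on each calendar day
daily = (toy_logs
         .groupby(['course_encoding', 'userid', pd.Grouper(key='time', freq='D')])['action']
         .count()
         .reset_index(name='clicks'))

# one row per student-course pair, one column per day, 0 where a student did not click
click_matrix = daily.pivot_table(index=['course_encoding', 'userid'],
                                 columns='time', values='clicks',
                                 aggfunc='sum', fill_value=0)

# replace the dates with consecutive day numbers
click_matrix.columns = range(1, click_matrix.shape[1] + 1)
print(click_matrix)
```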

To retain the same set of rows we obtained previously, we will follow the preprocessing steps we determined for the non-temporal representation. Again, this is done exclusively to keep the same rows as before.

In [None]:
#correct objecttable
other_objects = ['tag_instance', 'badge', 'feedback_completed', 'feedback', 'course_modules_completion']

#badges on target id
badges_on_target = ['badge_listing', 'badge', 'recent_activity']

#grades 
grading_objects = ['gradereport_overview', 'gradereport_user']

#assignment from elements in the component column
assign_objects = ['assignsubmission_onlinetext', 'assignsubmission_comments', 'mod_assign', 'assignsubmission_file']

#workshop
workshops = ['workshop_submissions', 'workshop']

#course
courses_on_target = ['course', 'course_resources_list', 'course_user_report']

#corrections on the objecttable column
student_logs['objecttable'] = np.where(student_logs['objecttable'].isin(other_objects),
                                      'other',
                                       np.where(student_logs['target'].isin(badges_on_target),
                                      'other',
                                       np.where(student_logs['target'].isin(courses_on_target),
                                      'course',
                                      np.where(student_logs['component'] == 'mod_forum',
                                      'forum',
                                      np.where(student_logs['component'].isin(grading_objects),
                                      'grade_grades',
                                       np.where(student_logs['component'].isin(assign_objects),
                                      'assignments',
                                       np.where(student_logs['objecttable'].isin(workshops),
                                      'workshop', 
                                       np.where(student_logs['objecttable'] == 'book_chapters',
                                      'book',
                                       np.where(student_logs['component'] == 'mod_choice',
                                      'choice',
                                       np.where(student_logs['component'] == 'mod_choicegroup',
                                      'groups',
                                      np.where((student_logs['component'] == 'mod_quiz') & (student_logs['objecttable'].isna()),'quiz',
                                      student_logs['objecttable'])))))))))))

del other_objects, badges_on_target, grading_objects, assign_objects, workshops, courses_on_target

In [None]:
# #uncomment to verify pairings
# with pd.option_context('display.max_rows', None,):
#      display(student_logs['objecttable'].value_counts())

In [None]:
# #uncomment to verify pairings
# with pd.option_context('display.max_rows', None,):
#      display(student_logs['component'].value_counts())

In [None]:
# #uncomment to verify pairings
# with pd.option_context('display.max_rows', None,):
#      display(student_logs['target'].value_counts())

In [None]:
# #uncomment to verify pairings
# with pd.option_context('display.max_rows', None,):
#      display(student_logs['action'].value_counts())

Likewise, we will need to take a look at the different actions in order to understand how common these may be. 

Again, we will look at different actions and see how we can group them together in a way that, at least, makes intuitive sense. There is use in keeping the distinction between different types of view.
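
One compact way to express this kind of grouping is a lookup dictionary, sketched below on a toy series. The labels mirror the ones used in the next cell; `action_groups`, `toy_actions`, and `grouped_actions` are hypothetical names, and this is an alternative formulation rather than the code the notebook runs.

```python
import pandas as pd

# raw action -> grouped label (labels follow the grouping cell of this notebook)
action_groups = {
    'created': 'added', 'added': 'added',
    'deleted': 'delete', 'removed': 'delete',
    'awarded': 'other actions', 'printed': 'other actions',
    'abandoned': 'other actions', 'searched': 'other actions',
    'submitted': 'submission', 'submission': 'submission', 'submissions': 'submission',
}

toy_actions = pd.Series(['created', 'viewed', 'removed', 'submitted', 'searched'])

# map known actions to their group; actions missing from the dict keep their original label
grouped_actions = toy_actions.map(action_groups).fillna(toy_actions)
print(grouped_actions.tolist())
```

Keeping the mapping in a single dictionary makes it easier to audit which raw actions end up in which group.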

In [None]:
#grouping actions that add, delete or otherwise edit the presented information:

#additions 
addition = ['created', 'added']

#deletion
deletion = ['deleted', 'removed']

#other actions
other_actions = ['awarded', 'printed', 'abandoned', 'searched']

#submissions
submission = ['submitted', 'submission', 'submissions']

#groups equivalent actions under a single label; any other action keeps its original label
student_logs['action'] = np.where(student_logs['action'].isin(addition), 'added', #addition list
                                  np.where(student_logs['action'].isin(deletion), 'delete', #deletion list
                                  np.where(student_logs['action'].isin(other_actions), 'other actions', #other actions
                                  np.where(student_logs['action'].isin(submission), 'submission', #submissions to submission
                                  student_logs['action']
                                 ))))

#we finish by deleting the lists we've created
del addition, deletion, other_actions, submission

In [None]:
# # #uncomment to verify pairings
# with pd.option_context('display.max_rows', None,):
#      display(student_logs.groupby(['objecttable', 'target', 'action']).size().to_frame())

We do not know, yet, whether we can make an effective association between the partial grades and the logs. We can explore the submissions made in each discipline and calculate the likely dates of submission.

For that, for every course, we will look at submissions performed by the different students.

We will count the number of assignments assigned to each discipline, that is, the number of distinct graded assignments recorded for each curricular unit.

In [None]:
#then we get all unique entries and assign them an index number
courses = class_list[['cd_curso', 'semestre', 'courseid']].drop_duplicates().reset_index(drop = True).reset_index()

#then, we create a dict using the combination of cd_curso, semestre and courseid as keys and the index number as value
courses = courses.set_index(['cd_curso', 'semestre', 'courseid']).to_dict()['index']

#set index of df to match same index
class_list.set_index(['cd_curso', 'semestre', 'courseid'], drop = True, inplace = True)

#set index of df to match same index
student_logs.set_index(['cd_curso', 'semestre', 'courseid'], drop = True, inplace = True)

#set index of support_df to match same index
support_table.set_index(['cd_curso', 'semestre', 'courseid'], drop = True, inplace = True)

#set index of targets_table to match same index
targets_table.set_index(['cd_curso', 'semestre', 'courseid'], drop = True, inplace = True)

#use index as key for dict
student_logs['course_encoding'] = student_logs.index.map(courses).astype(object)
class_list['course_encoding'] = class_list.index.map(courses).astype(object)
support_table['course_encoding'] = support_table.index.map(courses).astype(object)
targets_table['course_encoding'] = targets_table.index.map(courses).astype(object)

#resetting index
student_logs = student_logs.dropna(subset = ['course_encoding']).reset_index()
class_list.reset_index(inplace = True)
support_table = support_table.dropna(subset = ['course_encoding']).reset_index()
targets_table = targets_table.dropna(subset = ['course_encoding']).reset_index()

In [None]:
#first, get count of assignments as defined in the support_table 
assignments_per_course = support_table.groupby('course_encoding').agg({'assign_id' : 'nunique'})
assignments_per_course = assignments_per_course.to_dict()['assign_id']

#next, we filter by only keeping submissions, avoiding the first 2 weeks of submissions
submission_logs = student_logs[(student_logs['action'] == 'submission') & (student_logs['time'] >= '2020-09-18')].sort_values(by = 'time').reset_index(drop = True)

In [None]:
#second filtering condition - keep add, assessable and attempts -> edits and deletes are not relevant to determine number of submissions
submission_logs = submission_logs[submission_logs['target'].isin(target_to_keep := ['add',
                                                                                    'assessable',
                                                                                    'attempt',
                                                                                   ])]
#add number of assignments of submission_logs
submission_logs['nbr_assignments'] = submission_logs['course_encoding'].map(assignments_per_course)

#we give each submission made by each student in the context of certain curricular units
submission_logs['course_student_submission_number'] = submission_logs.groupby(['course_encoding', 'userid']).cumcount() + 1
submission_logs = submission_logs.filter(['course_encoding', 'userid', 'time', 'course_student_submission_number', 'nbr_assignments'])

#then, we get the number of submissions made by each student and the average date for each submission
submission_logs = submission_logs.groupby(['course_encoding', 'course_student_submission_number']).agg(
                                                                                {
                                                                                    'userid': 'count',
                                                                                    'time': ['mean', 'median'],
                                                                                    'nbr_assignments': 'mean' #constant within each course, so the mean returns that same value
                                                                                })

#applies the function that removes multiindex
submission_logs.columns = submission_logs.columns.map(flattenHierarchicalCol)
submission_logs.reset_index(inplace = True)

#then, we only keep the number of submissions that is in line with the number on the support_table
submission_logs = submission_logs[submission_logs['course_student_submission_number'] <= submission_logs['nbr_assignments_mean']].reset_index(drop = True)
submission_logs['course_student_submission_number'] = submission_logs['course_student_submission_number'].astype(object)
submission_logs['course_encoding'] = submission_logs['course_encoding'].astype(object)

#then, in order to make the proper merge, we'll need to make the proper adjustments - namely label the submission date of each discipline
support_table['course_student_submission_number'] = pd.to_numeric(support_table['statusAvaliacao'].str.extract(r'(\d+)', expand=False)).astype(object)

In this next step, we will timestamp each assignment recorded in the support table. It is likely that, in group assignments, only one student submits on behalf of all colleagues. Therefore, it is not possible to make a one-to-one match between submissions and grades.

In general, partial grade 1 refers to a student's first submission, partial grade 2 to the second submission, and so on.
We will assume the deadline for each partial grade to be the median delivery date of the class's ith submission.
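
The estimate described above can be sketched on invented toy data: each student's submissions are numbered chronologically within the course, and the median date of all ith submissions is taken as the deadline estimate. The names `toy_submissions`, `submission_number`, and `estimated_deadlines` are hypothetical.

```python
import pandas as pd

# toy submissions: three students, two submissions each, in a single course (values invented)
toy_submissions = pd.DataFrame({
    'course_encoding': [0] * 6,
    'userid': [101, 102, 103, 101, 102, 103],
    'time': pd.to_datetime(['2020-10-01', '2020-10-02', '2020-10-06',
                            '2020-11-10', '2020-11-12', '2020-11-11']),
}).sort_values('time')

# number each student's submissions within the course in chronological order
toy_submissions['submission_number'] = (
    toy_submissions.groupby(['course_encoding', 'userid']).cumcount() + 1)

# estimated deadline of the i-th assignment: median date of the class's i-th submissions
estimated_deadlines = (toy_submissions
                       .groupby(['course_encoding', 'submission_number'])['time']
                       .median())
print(estimated_deadlines)
```

The median is less sensitive than the mean to a few very early or very late submissions, which is one reason to prefer it here.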

In [None]:
#filtering columns before merge
submission_logs = submission_logs.filter(['course_encoding', 'course_student_submission_number',
                                         'time_median'])

#making a left merge of the support_table with the submission dates on course_encoding and course_student_submission_number
support_table = pd.merge(support_table, submission_logs, on = ['course_encoding', 'course_student_submission_number'], how = 'left')

#at this stage, we can start dropping columns that are ultimately unnecessary and recalibrating the assign_id column
support_table['assign_id'] = support_table.groupby(['course_encoding', 'statusAvaliacao']).ngroup()
support_table['sup_time'] = pd.to_datetime(support_table['time_median'].dt.date)
support_table.drop(['statusAvaliacao', 'course_student_submission_number', 'time_median'], axis = 1, inplace = True)

del submission_logs, assignments_per_course

support_table.describe(include = 'all', datetime_is_numeric = True)

We have addressed the most obvious possible aggregations. Now, we will go forward with our intended feature extraction and selection.

We can start by removing all of the unnecessary columns that we will not be using going forward and, then, create 5 distinct dicts of dataframes. Each dict refers to a certain course duration threshold.
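
Each threshold translates into a cutoff date per course. As a rough sketch of how such a date can be derived, the example below adds a share of the course duration to the start date and then moves the result forward to the next Friday, leaving dates that already fall on a Friday unchanged. It uses plain weekday arithmetic instead of offset objects; the toy values and the names `toy_classes`, `fraction`, `raw_cutoff`, and `friday_cutoff` are hypothetical.

```python
import pandas as pd

# toy class list with the two columns needed for the computation (values invented)
toy_classes = pd.DataFrame({
    'Start Date': pd.to_datetime(['2020-09-14', '2020-09-14']),
    'Course duration days': [100, 40],
})
fraction = 0.25  # one of the duration thresholds

# raw cutoff: start date plus the chosen share of the duration, truncated to midnight
raw_cutoff = (toy_classes['Start Date']
              + pd.to_timedelta(toy_classes['Course duration days'] * fraction, unit='D')
              ).dt.normalize()

# days left until Friday (weekday 4); 0 when the cutoff already falls on a Friday
days_to_friday = (4 - raw_cutoff.dt.dayofweek) % 7
friday_cutoff = raw_cutoff + pd.to_timedelta(days_to_friday, unit='D')
print(friday_cutoff.tolist())
```

In this toy case the first cutoff (2020-10-09) is already a Friday and stays put, while the second (Thursday 2020-09-24) moves to 2020-09-25.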

In [None]:
#keeping only the columns we will use going forward
student_logs = student_logs.filter(['cd_curso', 'semestre', 'courseid', 'objecttable', 'action', 'target', 'component',
                                   'userid', 'time', 'course_encoding'])

In [None]:
#additionally, we will look at our estimated course duration
for i in tqdm(duration_threshold):
    #create, for each desired threshold, the appropriate cutoff date 
    class_list[f'Date_threshold_{int(i*100)}'] = pd.to_datetime((class_list['Start Date'] + pd.to_timedelta(class_list['Course duration days'] * i, unit = 'D')).dt.date)
    
    #setting up the duration threshold to fall on a Friday (dates already on a Friday are kept)
    class_list[f'Date_threshold_{int(i*100)}'] = class_list[f'Date_threshold_{int(i*100)}'].where( class_list[f'Date_threshold_{int(i*100)}'] == (( class_list[f'Date_threshold_{int(i*100)}'] + Week(weekday=4)) - Week()), class_list[f'Date_threshold_{int(i*100)}'] + Week(weekday=4))

#then, we will create a dictionary of dictionaries, each main dictionary storing a version of the logs for one threshold
logs_dict = {}

for i in tqdm(duration_threshold):
    #create, for each desired threshold, a different dictionary of dataframes wherein we will perform the different operations
    print(f'Date_threshold_{int(i*100)}\n' +
          f'Logs')
    logs_dict[f'Date_threshold_{int(i*100)}'] = {course: student_logs.loc[student_logs['course_encoding'] == course].reset_index(drop = True) for course in tqdm(student_logs['course_encoding'].unique())}

Now, we have a nested dictionary with different dataframes inside it. We will use this data structure to perform most of the operations we are interested in.

**First, we will add, to each dataframe, a column with the corresponding threshold date**

After this cleaning procedure, we will, for each course get the daily clicks and put them on a pivot_table. We will finish the procedure by exporting updated versions of the class list as these will also need to be used going forward.
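
A pivot built from the logs only has columns for dates on which at least one student in the course clicked, so numbering the columns 1 to n implicitly assumes that no calendar day is entirely silent. The sketch below shows one possible guard that is not part of the original pipeline: reindexing the columns over the full date range before numbering them. The toy matrix and the names `course_start`, `cutoff`, and `full_matrix` are hypothetical.

```python
import pandas as pd

# toy daily click matrix with gaps: nobody clicked on 2020-09-14, 2020-09-16 or 2020-09-18
click_matrix = pd.DataFrame(
    [[2, 1], [0, 3]],
    index=pd.Index([101, 102], name='userid'),
    columns=pd.to_datetime(['2020-09-15', '2020-09-17']),
)

course_start = pd.Timestamp('2020-09-14')  # hypothetical course start
cutoff = pd.Timestamp('2020-09-18')        # hypothetical threshold date

# add a zero-filled column for every calendar day missing from the pivot
all_days = pd.date_range(course_start, cutoff, freq='D')
full_matrix = click_matrix.reindex(columns=all_days, fill_value=0)

# only now replace the dates with day numbers, so column k is always the k-th day
full_matrix.columns = range(1, len(all_days) + 1)
print(full_matrix)
```

With this step, column numbers are comparable across courses because they always count calendar days from the course start.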

In [None]:
#for each intended course duration threshold
for i in tqdm(logs_dict):
    #start with creating a dictionary of course and intended cutoff date
    cut = class_list.set_index('course_encoding').to_dict()[i] 
    
    #for each dataframe
    for j in tqdm(logs_dict[i]):
        #where the course is the same as in the class_list, get the corresponding value of the appropriate column,
        logs_dict[i][j]['Date Threshold'] = logs_dict[i][j]['course_encoding'].map(cut)
        logs_dict[i][j] = logs_dict[i][j][logs_dict[i][j]['time'] <= logs_dict[i][j]['Date Threshold']].reset_index(drop = True).drop('Date Threshold', axis = 1)

        #Aggregate by day
        logs_dict[i][j] = logs_dict[i][j].groupby(['course_encoding', 'cd_curso', 'semestre', 'courseid', pd.Grouper(key='time', freq='D'), 'userid']).agg({
                                                                                                                'action': 'count', 
                                                                                                                }).reset_index().sort_values('time')
        
        #then, we create a pivot_table
        logs_dict[i][j] = pd.pivot_table(logs_dict[i][j], index=['course_encoding', 'cd_curso', 'semestre', 'courseid', 'userid'], columns = 'time', values = 'action',
                    aggfunc='sum').fillna(0)
        
        #and rename columns to fit with the number of days
        logs_dict[i][j] = logs_dict[i][j].rename(columns={x:y for x,y in zip(logs_dict[i][j].columns,range(1,len(logs_dict[i][j].columns) + 1))})
        logs_dict[i][j].columns.name = None
        
        #joining final grade for target
        logs_dict[i][j] = logs_dict[i][j].merge(targets_table.filter(['course_encoding', 'userid', 'exam_mark', 'final_mark']), on = ['course_encoding', 'userid'], how = 'right')
    
    #after the end of the loops:
    logs_dict[i] = pd.concat(logs_dict[i], ignore_index=True)
    logs_dict[i] = logs_dict[i].sort_values(by = ['course_encoding', 'userid', 'final_mark']).reset_index(drop = True)

In [None]:
logs_dict['Date_threshold_100'][71]

In [None]:
#create backup of logs dict, we will need it for later
backup = deepcopy(logs_dict)

In [None]:
#restore logs dict from the backup (useful when re-running the cells below)
logs_dict = deepcopy(backup)

In order to account for situations where registered students only access Moodle later in the course, we will make an additional, but necessary adaptation. 

We will start by looking at the complete set of valid students/courses in our 100% dataset. From these, we get the indexes of the rows that are valid (i.e. have a valid click count at 100% duration) and retain only those rows in every dataset.
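
The idea can be sketched with two toy datasets, assuming (as positional slicing requires) that every threshold dataset lists the same rows in the same order. A boolean mask computed on the 100% dataset is applied to all of them. The names `toy_dict`, `valid_rows`, and `aligned` are hypothetical.

```python
import numpy as np
import pandas as pd

# toy versions of two threshold datasets sharing the same row order (values invented)
toy_dict = {
    'Date_threshold_50': pd.DataFrame({1: [3, np.nan, np.nan], 'final_mark': [14, 12, 9]}),
    'Date_threshold_100': pd.DataFrame({1: [3, 5, np.nan], 'final_mark': [14, 12, 9]}),
}

# rows with a valid click count on day 1 of the 100% dataset
valid_rows = toy_dict['Date_threshold_100'][1].notna().to_numpy()

# apply the same mask to every threshold so all datasets keep identical rows
aligned = {name: frame[valid_rows].reset_index(drop=True)
           for name, frame in toy_dict.items()}

print({name: len(frame) for name, frame in aligned.items()})
```

The second student has no clicks by the 50% cutoff but does by the end of the course, so the row is retained in both datasets and simply stays empty in the earlier one.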

In [None]:
#we gather the index number of valid rows in the 100% df
rows_to_keep = logs_dict['Date_threshold_100'][~logs_dict['Date_threshold_100'][1].isna()].index

# #then slice accordingly
for i in tqdm(logs_dict):
    logs_dict[i] = logs_dict[i].iloc[rows_to_keep, :].reset_index(drop = True)

#### Almost Done.

We will finish this step momentarily. Before we do, we need to save all dfs in an easily accessible Excel File.

In [None]:
#the context manager saves and closes the file when the block ends
with pd.ExcelWriter('../Data/Modeling Stage/Nova_IMS_Temporal_Datasets.xlsx', engine='xlsxwriter') as writer:
    #now loop through and put each dataframe on its own sheet
    for sheet, frame in logs_dict.items():
        frame.to_excel(writer, sheet_name = sheet)

#also saving additional info on class list
class_list.to_csv('../Data/Modeling Stage/Nova_IMS_updated_classlist.csv')