# Deep Learning
Train a deep learning model on enrollment trends and patterns, then use it to predict course demand in upcoming semesters based on trends from previous semesters.

This model predicts the actual demand for every course in future semesters. The ultimate goal is to extract behavioral enrollment trends and features for each course, and then use those features to bring the predicted demand as close to reality as possible. The work is iterative: it begins with a base model that uses only past enrollment history, and more features are added over time as they are discovered. A key advantage of a deep learning model is that it can learn which features matter on its own, along with intricate patterns and relationships between features. This matters here because many complex factors affect the enrollment of a course. Some of the major areas to focus on are:
- Course Information
    - Course Contents
    - Course Prerequisites
    - Course Difficulty
    - Course Requirement (Core / Concentration Core / Elective / Capstone)
    - Course Offered by (Department)
    - Course Offered in (Fall / Spring / Summer)
    - Course Cluster (Use clustering algorithm to group similar courses together)
- Course Instructor Information
    - Instructor Ratings
    - Instructor Popularity (based on which sections are filled first)
- Student Information (Aggregated at the course level)
    - Student Program
    - Student Admit Term
    - Student Enrollment Term (1st Sem / 2nd Sem / ...)
    - Student Status (Domestic / International)
    - Prior Waitlisted (whether the student was waitlisted for this course in the prior semesters)
    - Prior Enrolled (whether the student was enrolled for this course in the prior semesters)
    - Prior Dropped (whether the student dropped this course in the prior semesters)
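As a rough illustration of what "aggregated at the course level" could look like, the sketch below collapses student-level rows into one feature row per course and term with a pandas `groupby`. The column names come from `EnrollmentFinalStatus`; the toy values, the `"New"` / `"Returning"` labels for `stu_new_ret`, the helper name `aggregate_student_features`, and the three output features are assumptions made for illustration only.

```python
import pandas as pd

# Toy records using EnrollmentFinalStatus column names; all values are made up.
records = pd.DataFrame({
    "crs": ["AIT 512", "AIT 512", "AIT 512", "AIT 524", "AIT 524"],
    "reg_term_code": [201870, 201870, 201870, 201870, 201870],
    "stu_id": [1, 2, 3, 1, 4],
    "stu_prog_code": ["P1", "P1", "P2", "P1", "P3"],
    # Assumed labels; the real encoding of stu_new_ret may differ.
    "stu_new_ret": ["New", "Returning", "New", "New", "Returning"],
})

def aggregate_student_features(df):
    """Collapse student-level rows into one feature row per (course, term)."""
    grouped = df.groupby(["crs", "reg_term_code"])
    return grouped.agg(
        n_students=("stu_id", "nunique"),          # distinct students
        n_programs=("stu_prog_code", "nunique"),   # distinct programs represented
        share_new=("stu_new_ret", lambda s: (s == "New").mean()),
    ).reset_index()

course_features = aggregate_student_features(records)
print(course_features)
```

Features built this way can be joined to the per-course enrollment history on `crs` and the term code, so each added feature group stays a separate, testable step.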

## Initialization

In [2]:
# importing the required libraries
import os
import pandas as pd

os.chdir( os.path.join("..", "..") )

# importing custom modules
from Code.src.modules.db_ops import *
from Code.src.modules.dataManager import DataManager
from Code.src.modules.eda import *

# initializing the DataManager
DM = DataManager()

In [3]:
# importing the data for analysis
df_finalEnrollment = DM.get_data('EnrollmentFinalStatus', 'pkl', 'processed')
db_finalEnrollment = DM.get_data('EnrollmentFinalStatus', 'db', 'processed')

In [11]:
# Testing
df_finalEnrollment.columns

Index(['rec_id', 'rec_ext_date', 'file_name', 'file_index', 'reg_term_code',
       'reg_term_year', 'reg_term_name', 'reg_term_desc', 'stu_id',
       'stu_deg_level', 'stu_college', 'stu_res', 'stu_visa', 'stu_bam',
       'stu_new_ret', 'stu_dept', 'stu_dept_desc', 'stu_prog_code',
       'stu_prog_level', 'stu_prog_desc', 'stu_admit_term_code',
       'stu_admit_term_year', 'stu_admit_term_name', 'stu_admit_term_desc',
       'crs', 'crs_type', 'crs_credits', 'crs_hours', 'crs_sect',
       'crs_sect_clg', 'crs_sect_modality', 'crs_sect_wiley_ind', 'reg_status',
       'reg_status_date', 'stu_act_reg_ind'],
      dtype='object')

In [15]:
df_finalEnrollment.crs_sect_modality.value_counts()

F2F                   40111
Online                17738
Mix F2F and Online     1560
Name: crs_sect_modality, dtype: int64

## Data Preparation
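Here, we prepare the data for deep learning: the query below counts registered and wait-listed enrollments per course, with one column per term.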

In [27]:
raw_df = db_finalEnrollment.runQuery("""
    SELECT
        crs,
        SUM(CASE WHEN reg_term_code = 201820 THEN 1 ELSE 0 END) AS Spring2018,
        SUM(CASE WHEN reg_term_code = 201870 THEN 1 ELSE 0 END) AS Fall2018,
        SUM(CASE WHEN reg_term_code = 201920 THEN 1 ELSE 0 END) AS Spring2019,
        SUM(CASE WHEN reg_term_code = 201970 THEN 1 ELSE 0 END) AS Fall2019,
        SUM(CASE WHEN reg_term_code = 202020 THEN 1 ELSE 0 END) AS Spring2020,
        SUM(CASE WHEN reg_term_code = 202070 THEN 1 ELSE 0 END) AS Fall2020,
        SUM(CASE WHEN reg_term_code = 202120 THEN 1 ELSE 0 END) AS Spring2021,
        SUM(CASE WHEN reg_term_code = 202170 THEN 1 ELSE 0 END) AS Fall2021,
        SUM(CASE WHEN reg_term_code = 202220 THEN 1 ELSE 0 END) AS Spring2022,
        SUM(CASE WHEN reg_term_code = 202270 THEN 1 ELSE 0 END) AS Fall2022,
        SUM(CASE WHEN reg_term_code = 202320 THEN 1 ELSE 0 END) AS Spring2023
    FROM EnrollmentFinalStatus
    WHERE
        reg_status IN ('**Web Registered**', 'Wait Listed', '**Registered**')
    GROUP BY
        crs
""")

raw_df

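|     | crs      | Spring2018 | Fall2018 | Spring2019 | Fall2019 | Spring2020 | Fall2020 | Spring2021 | Fall2021 | Spring2022 | Fall2022 | Spring2023 |
|-----|----------|------------|----------|------------|----------|------------|----------|------------|----------|------------|----------|------------|
| 0   | ACCT 672 | 0          | 0        | 0          | 1        | 0          | 0        | 0          | 0        | 0          | 0        | 0          |
| 1   | AIT 502  | 0          | 11       | 0          | 10       | 0          | 2        | 0          | 0        | 0          | 7        | 0          |
| 2   | AIT 512  | 0          | 24       | 0          | 47       | 0          | 35       | 0          | 29       | 0          | 49       | 0          |
| 3   | AIT 524  | 0          | 121      | 0          | 98       | 0          | 101      | 0          | 129      | 0          | 128      | 0          |
| 4   | AIT 526  | 0          | 0        | 0          | 0        | 0          | 0        | 0          | 21       | 0          | 83       | 0          |
| ... | ...      | ...        | ...      | ...        | ...      | ...        | ...      | ...        | ...      | ...        | ...      | ...        |
| 665 | TECM 746 | 0          | 0        | 0          | 0        | 0          | 0        | 0          | 1        | 0          | 1        | 0          |
| 666 | TECM 747 | 0          | 0        | 0          | 0        | 0          | 0        | 0          | 1        | 0          | 1        | 0          |
| 667 | TECM 749 | 0          | 0        | 0          | 0        | 0          | 0        | 0          | 1        | 0          | 1        | 0          |
| 668 | TECM 761 | 0          | 0        | 0          | 0        | 0          | 0        | 0          | 1        | 0          | 0        | 0          |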
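The same course-by-term pivot can also be sketched directly in pandas, which may be convenient when working from `df_finalEnrollment` instead of the database handle. This is only an illustrative equivalent, not the notebook's method: the helper name `pivot_enrollment`, the toy rows, and the `"Dropped"` status value are assumptions, and `TERM_LABELS` lists just three of the term codes from the query above.

```python
import pandas as pd

# Term codes and labels taken from the SQL query; only three are listed here.
TERM_LABELS = {201820: "Spring2018", 201870: "Fall2018", 201920: "Spring2019"}
COUNTED_STATUSES = ["**Web Registered**", "Wait Listed", "**Registered**"]

# Toy rows with made-up values; "Dropped" is a hypothetical status.
toy = pd.DataFrame({
    "crs": ["AIT 512", "AIT 512", "AIT 524", "AIT 524", "AIT 524"],
    "reg_term_code": [201870, 201870, 201870, 201920, 201870],
    "reg_status": ["**Registered**", "Wait Listed", "**Web Registered**",
                   "**Registered**", "Dropped"],
})

def pivot_enrollment(df):
    """Count kept registrations per course, one column per term."""
    kept = df[df["reg_status"].isin(COUNTED_STATUSES)]
    counts = pd.crosstab(kept["crs"], kept["reg_term_code"])
    # Make sure every expected term appears, even if no toy row uses it.
    counts = counts.reindex(columns=list(TERM_LABELS), fill_value=0)
    counts = counts.rename(columns=TERM_LABELS)
    counts.columns.name = None
    return counts.reset_index()

wide = pivot_enrollment(toy)
print(wide)
```

As in the SQL version, a course with no counted registrations does not appear in the result, because the status filter is applied before grouping.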


In [37]:

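# trend_length is assigned in a later cell (In [64]); repeated here so this cell runs on its own
trend_length = 4
# number of sliding windows of trend_length + 1 consecutive terms (12 columns - 5 = 7)
raw_df.shape[1] - (trend_length+1)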

7

In [45]:
raw_df.columns[1:1+5]

Index(['Spring2018', 'Fall2018', 'Spring2019', 'Fall2019', 'Spring2020'], dtype='object')

In [64]:
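trend_length = 4  # each window spans trend_length + 1 consecutive terms

def generateTrends(trend_length):
    """Return sliding windows of term column names taken from raw_df."""
    # column 0 of raw_df is 'crs', so the term columns start at index 1
    n_windows = raw_df.shape[1] - (trend_length + 1)
    sem_list = []
    for i in range(n_windows):
        sem_list.append(raw_df.columns[i+1:i+2+trend_length].to_list())
    return sem_list

df = pd.DataFrame()
for i in generateTrends(trend_length):
    print(i)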

['Spring2018', 'Fall2018', 'Spring2019', 'Fall2019', 'Spring2020']
['Fall2018', 'Spring2019', 'Fall2019', 'Spring2020', 'Fall2020']
['Spring2019', 'Fall2019', 'Spring2020', 'Fall2020', 'Spring2021']
['Fall2019', 'Spring2020', 'Fall2020', 'Spring2021', 'Fall2021']
['Spring2020', 'Fall2020', 'Spring2021', 'Fall2021', 'Spring2022']
['Fall2020', 'Spring2021', 'Fall2021', 'Spring2022', 'Fall2022']
['Spring2021', 'Fall2021', 'Spring2022', 'Fall2022', 'Spring2023']
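One possible next step, sketched here under assumptions, is to turn each window into supervised samples: the first `trend_length` terms become the inputs and the term that follows becomes the target. The notebook does not state this split, so treat it as one plausible reading. The helper name `build_samples` and the toy frame `toy_raw` (with placeholder term names `T1` to `T6`) are hypothetical.

```python
import numpy as np
import pandas as pd

# Toy stand-in for raw_df: 'crs' plus one count column per term (values made up).
toy_raw = pd.DataFrame({
    "crs": ["A 100", "B 200"],
    "T1": [10, 0], "T2": [12, 5], "T3": [11, 6],
    "T4": [15, 4], "T5": [14, 7], "T6": [18, 9],
})

def build_samples(wide, trend_length):
    """Stack every (course, window) pair into feature rows X and targets y."""
    term_cols = wide.columns[1:]  # skip the 'crs' column
    X_parts, y_parts = [], []
    for start in range(len(term_cols) - trend_length):
        window = term_cols[start:start + trend_length + 1]
        values = wide[window].to_numpy(dtype=float)
        X_parts.append(values[:, :-1])  # first trend_length terms
        y_parts.append(values[:, -1])   # the term that follows
    return np.vstack(X_parts), np.concatenate(y_parts)

X, y = build_samples(toy_raw, trend_length=4)
print(X.shape, y.shape)
```

With 6 term columns and `trend_length=4` there are 2 windows, so the 2 toy courses yield 4 samples. The window count matches the `raw_df.shape[1] - (trend_length+1)` expression used in `generateTrends`.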
