# Rule-based Constrained Proportional Projection

As the problem is framed, there are two broad approaches to predicting future course demand:
1. Realistic Models
2. Idealistic Models

**Realistic Models** focus on the randomness in the enrollment process: they aggregate the student enrollment data and account for that randomness to make predictions that are as close to reality as possible.
**Idealistic Models**, on the other hand, focus on the department recommendations and graduation pathway rules, and make predictions that satisfy these constraints.
Later, the two could be combined into a prediction that is built on the Idealistic Model but also includes the variability that comes from the random nature of enrollment.

The **RbCPP (Rule-based Constrained Proportional Projection)** lies closer to the idealistic model. It makes predictions for international students, mainly for the core and capstone courses. The elective courses are predicted using probabilities derived from previous enrollment data.

## Rules:
1. The core courses must be registered before any elective courses.
2. The capstone course, which is also a core course in certain degrees, is treated as a separate category because it must be taken in the final semester only.
3. Some core requirements offer options from which students choose. In such cases, the model treats the options as elective courses.
4. Some elective courses (elementary courses) tend to be taken in the first two semesters, whereas some advanced courses tend to be taken in later semesters. (Inferred from past enrollment trends.)
5. The proportions of the weights remain roughly the same across semesters.

Algorithms:
1. Create and record the catalog data (core, elective and capstone courses) for each program.
2. Create a Pathway to Graduation for International Students based on the catalog data.
3. Get the Course Demand History for each program, course, student admit term, and registration term.

Calculations:
1. Course Requirements:

In [2]:
# importing the required libraries
import os
import pandas as pd
import matplotlib.pyplot as plt

os.chdir( os.path.join("..", "..", "..") )

from Code.src.modules.db_ops import *
from Code.src.modules.dataManager import DataManager
from Code.src.modules.eda import *

DM = DataManager()

In [3]:
DM.get_data_info()

{'course': {'d_desc': 'Course information obtained from Web Scraping the '
                      'course catalog.',
            'd_name': 'Courses Information',
            'd_state': {'processed': {'db': 'Data\\02_processed\\course.db'}}},
 'coursecatalog': {'d_desc': 'Program Requirements Information obtained from '
                             'the Course Catalog.',
                   'd_name': 'Course Catalog Data',
                   'd_state': {'processed': {'json': 'Code\\src\\prop\\course_catalog.json'}}},
 'enrollment': {'d_desc': 'Student Enrollment Status Information processed by '
                          'merging 110 CSV files obtained from Enrollment '
                          'Management at George Mason University.',
                'd_name': 'Student Enrollment Status Data',
                'd_state': {'processed': {'csv': 'Data\\02_processed\\enrollment.csv',
                                          'db': 'Data\\02_processed\\CECData.db',
                           

In [4]:
course_catalog              = DM.get_data('coursecatalog', 'json')
db                          = DM.get_data('enrollmentfinalstatus', 'db')
df_enrollmentFinalStatus    = DM.get_data('enrollmentfinalstatus', 'pkl')

# Automating the RbCPP

Assumptions:
- The model covers only International Students, as they follow a fixed 3-3-3-1 enrollment pattern.
- The core courses are registered within the first two semesters of the degree program.
    - x% of the core courses are registered in the first semester.
    - The remaining (100 - x)% of the core courses are registered in the second semester.
    - The remaining course registrations are elective courses.
- The elective course registrations are similar for the first two semesters.
- The third semester has different enrollment proportions for elective courses than the first two semesters.
- The capstone courses (also core) are registered in the final semester.

Algorithm:
1. Get User Inputs:
    - A program is selected for analysis.
2. Course Requirements are extracted for the selected program.\
~~An algorithm defines the Pathway to Graduation for International Student course enrollments based on the number of core courses required in the program.~~
3. Get the Course Demand History from either the `Final Snapshot` or the `Final Enrollment Status`.
4. The residual course enrollment is calculated for each semester by subtracting the number of core courses registered in that semester from the total number of courses to register in that semester.
    - The residuals are calculated for each semester as follows:
        $$\text{Residual} = \text{Total number of courses to register} - \text{Number of core courses registered in that semester}$$
5. Weight Calculation:
    - The weights are calculated for each elective course enrolled in the first and second semesters, relative to the total number of elective courses enrolled in the first and second semesters.
    - Fall 2021 data is used to calculate the weights.
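
As a minimal sketch of the residual and weight calculations described above, the snippet below assumes a 3-3-3-1 pathway and counts the capstone together with the core courses. The helper names `residual` and `proportional_weights`, the course labels, and all numbers are illustrative assumptions, not values from the enrollment data.

```python
# Illustrative sketch only: helper names and sample numbers are assumptions,
# not part of the RbCPP code base.

def residual(total_courses, core_courses):
    """Course slots left for electives in one semester."""
    return total_courses - core_courses


def proportional_weights(demand):
    """Share of each course in the total demand of its group.

    `demand` maps course -> number of registrations.
    """
    total = sum(demand.values())
    if total == 0:
        # No registrations at all: every weight is zero.
        return {crs: 0.0 for crs in demand}
    return {crs: n / total for crs, n in demand.items()}


# A program with 4 core courses and 1 capstone (counted with core here).
totals = {"Sem1": 3, "Sem2": 3, "Sem3": 3, "Sem4": 1}
core = {"Sem1": 3, "Sem2": 1, "Sem3": 0, "Sem4": 1}
residuals = {sem: residual(totals[sem], core[sem]) for sem in totals}

# Hypothetical elective registrations pooled over Sem1 and Sem2.
elective_demand = {"ELEC A": 30, "ELEC B": 15, "ELEC C": 5}
weights = proportional_weights(elective_demand)

print(residuals)
print(weights)
```

Dividing by the pooled total keeps the weights of a group summing to one, so they can be read as the probability that an elective slot goes to a given course.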

## International Student Pathway to Graduation

There is considerable scope for improvement in the algorithm that defines the Pathway to Graduation for International Students. The current algorithm hard-codes a handful of cases and needs to be improved.

In [6]:
# Defining the International Student Pathway to Graduation for all recorded programs
dict_gradPathway = {}
for i_prog in course_catalog.keys():
    # Set total course registrations for international students in each semester
    dict_gradPathway[(i_prog, 'Total', 'Sem1')] = 3
    dict_gradPathway[(i_prog, 'Total', 'Sem2')] = 3
    dict_gradPathway[(i_prog, 'Total', 'Sem3')] = 3
    dict_gradPathway[(i_prog, 'Total', 'Sem4')] = 1

    # Set course registrations to zero for all course types in each semester
    for i_crs_req in ['Core', 'Capstone', 'Elective']:
        for i_sem in ['Sem1', 'Sem2', 'Sem3', 'Sem4']:
            dict_gradPathway[(i_prog, i_crs_req, i_sem)] = 0

    # Assigning core courses to initial semesters
    # # Automatic
    # i = n_core
    # i_sem = 1
    # str_sem = 'Sem' + str(i_sem)
    # while i >= 3:
    #     if i = 
    # Manual assignment: at most 3 core courses per semester
    n_core = len(course_catalog[i_prog]['Core'])
    if n_core == 6:
        dict_gradPathway[(i_prog, 'Core', 'Sem1')] = 3
        dict_gradPathway[(i_prog, 'Core', 'Sem2')] = 3
    elif n_core > 3:
        # 4 or 5 core courses: 3 in Sem1, the rest in Sem2
        dict_gradPathway[(i_prog, 'Core', 'Sem1')] = 3
        dict_gradPathway[(i_prog, 'Core', 'Sem2')] = n_core % 3
    else:
        # 3 or fewer core courses all fit in Sem1
        dict_gradPathway[(i_prog, 'Core', 'Sem1')] = n_core
        dict_gradPathway[(i_prog, 'Core', 'Sem2')] = 0
    
    n_capstone = len(course_catalog[i_prog]['Capstone'])
    if n_capstone == 2:
        dict_gradPathway[(i_prog, 'Capstone', 'Sem3')] = 1
        dict_gradPathway[(i_prog, 'Capstone', 'Sem4')] = 1
    else:
        dict_gradPathway[(i_prog, 'Capstone', 'Sem4')] = n_capstone
    
    for i_sem in ['Sem1', 'Sem2', 'Sem3', 'Sem4']:
        dict_gradPathway[(i_prog, 'Elective', i_sem)] = dict_gradPathway[(i_prog, 'Total', i_sem)] - \
            dict_gradPathway[(i_prog, 'Core', i_sem)] - \
            dict_gradPathway[(i_prog, 'Capstone', i_sem)]
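
The manual branches above cover only a few core-course counts. One possible generalization is a greedy fill, sketched here under two assumptions: capstone courses are placed one per semester starting from the final semester, and core courses are placed as early as the per-semester totals allow. The function name `build_pathway` and its arguments are hypothetical.

```python
# Hypothetical generalization of the manual pathway rules (a sketch, not the
# project's implementation). Assumption: capstone courses fill the latest
# semesters, core courses fill the earliest, electives take what is left.

def build_pathway(n_core, n_capstone, totals=(3, 3, 3, 1)):
    """Return {(course_type, 'SemN'): count} for one program."""
    n_sems = len(totals)
    free = list(totals)              # open slots per semester
    core = [0] * n_sems
    capstone = [0] * n_sems

    # Capstone: one course per semester, starting from the final semester.
    remaining = n_capstone
    for i in reversed(range(n_sems)):
        if remaining == 0:
            break
        if free[i] > 0:
            capstone[i] = 1
            free[i] -= 1
            remaining -= 1
    if remaining:
        raise ValueError("capstone courses do not fit into the pathway")

    # Core: fill each semester front to back until none remain.
    remaining = n_core
    for i in range(n_sems):
        take = min(free[i], remaining)
        core[i] = take
        free[i] -= take
        remaining -= take
    if remaining:
        raise ValueError("core courses do not fit into the pathway")

    pathway = {}
    for i in range(n_sems):
        sem = f"Sem{i + 1}"
        pathway[("Total", sem)] = totals[i]
        pathway[("Core", sem)] = core[i]
        pathway[("Capstone", sem)] = capstone[i]
        pathway[("Elective", sem)] = free[i]   # whatever is still open
    return pathway


# 4 core courses and 1 capstone course.
pathway = build_pathway(n_core=4, n_capstone=1)
print([pathway[("Core", f"Sem{i}")] for i in range(1, 5)])
print([pathway[("Elective", f"Sem{i}")] for i in range(1, 5)])
```

Because the elective count is simply the number of slots still open, the per-semester totals always add up without a separate subtraction step.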

## Latest Semester to fetch data
Need an algorithm here to automatically fetch the latest semester data.

In [7]:
CoreSems = {
    'Sem1'          : 'Fall 2022',
    'Sem2'          : 'Spring 2023',
    'Fall 2022'     : 'Sem1',
    'Spring 2023'   : 'Sem2'
}

ElecSems = {
    'Sem1'          : 'Fall 2021',
    'Sem2'          : 'Spring 2022',
    'Sem3'          : 'Fall 2022',
    'Sem4'          : 'Spring 2023',
    'Fall 2021'     : 'Sem1',
    'Spring 2022'   : 'Sem2',
    'Fall 2022'     : 'Sem3',
    'Spring 2023'   : 'Sem4'
}
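
The two dictionaries above are written by hand. As a hedged sketch of how they might be derived instead, assume that term labels always start with a season and a four-digit year (as in `Fall 2022` or `Spring 2020 - COVID-19`) and that only Fall and Spring terms count. The helpers `term_key`, `latest_term`, `next_term` and `build_sem_map` are hypothetical names.

```python
# Sketch under stated assumptions: only Fall/Spring terms exist and every
# label starts with "<Season> <Year>". Helper names are hypothetical.

SEASON_ORDER = {"Spring": 0, "Fall": 1}   # Spring precedes Fall within a year


def term_key(term):
    """Sortable (year, season) key for labels such as 'Fall 2022'."""
    season, year = term.split()[:2]
    return int(year), SEASON_ORDER[season]


def latest_term(terms):
    """Most recent term among the given labels."""
    return max(terms, key=term_key)


def next_term(term):
    """Label of the term that follows `term`."""
    year, season = term_key(term)
    return f"Fall {year}" if season == 0 else f"Spring {year + 1}"


def build_sem_map(admit_term, n_sems):
    """Two-way map between 'SemN' and term labels for one cohort."""
    mapping = {}
    term = admit_term
    for i in range(1, n_sems + 1):
        mapping[f"Sem{i}"] = term
        mapping[term] = f"Sem{i}"
        term = next_term(term)
    return mapping


terms = ["Fall 2021", "Spring 2022", "Fall 2022", "Spring 2023"]
print(latest_term(terms))
print(build_sem_map("Fall 2021", 4))
```

With these helpers, `build_sem_map("Fall 2022", 2)` yields the same pairs as `CoreSems`, and `build_sem_map("Fall 2021", 4)` the same pairs as `ElecSems`.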

## Getting the Course Demand History

In [21]:
df_CoreCrsDemHist = db.runQuery(f""" --sql
    SELECT
        stu_prog_desc, crs_req, crs, stu_admit_term_desc, reg_term_desc, COUNT(*) AS demand
    FROM EnrollmentFinalStatus
    WHERE
                stu_admit_term_desc = "{CoreSems.get('Sem1')}"
        AND     reg_status IN ("**Web Registered**", "Wait Listed", "**Registered**")
        AND     stu_visa = "F1 Visa"
        AND     crs_req = "Core"
        AND     stu_prog_desc IN {tuple(course_catalog.keys())}
    GROUP BY
        stu_prog_desc, crs_req, crs, stu_admit_term_desc, reg_term_desc
    ORDER BY
        stu_prog_desc, crs_req, crs, stu_admit_term_code, reg_term_code
""")

df_ElecCrsDemHist = db.runQuery(f""" --sql
    SELECT
        stu_prog_desc, crs_req, crs, stu_admit_term_desc, reg_term_desc, COUNT(*) AS demand
    FROM EnrollmentFinalStatus
    WHERE
                stu_admit_term_desc = "{ElecSems.get('Sem1')}"
        AND     reg_status IN ("**Web Registered**", "Wait Listed", "**Registered**")
        AND     stu_visa = "F1 Visa"
        AND     crs_req = "Elective"
        AND     stu_prog_desc IN {tuple(course_catalog.keys())}
    GROUP BY
        stu_prog_desc, crs_req, crs, stu_admit_term_desc, reg_term_desc
    ORDER BY
        stu_prog_desc, crs_req, crs, stu_admit_term_code, reg_term_code
""")

df_CrsDemHist = pd.concat([df_CoreCrsDemHist, df_ElecCrsDemHist], ignore_index=True)
dict_CrsDemHist = df_CrsDemHist.set_index(['stu_prog_desc', 'crs_req', 'crs', 'stu_admit_term_desc', 'reg_term_desc']).to_dict('index')
df_CrsDemHist

| | stu_prog_desc | crs_req | crs | stu_admit_term_desc | reg_term_desc | demand |
|---|---|---|---|---|---|---|
| 0 | MS Applied Info Technology | Core | AIT 524 | Fall 2022 | Fall 2022 | 5 |
| 1 | MS Applied Info Technology | Core | AIT 524 | Fall 2022 | Spring 2023 | 1 |
| 2 | MS Applied Info Technology | Core | AIT 542 | Fall 2022 | Fall 2022 | 6 |
| 3 | MS Applied Info Technology | Core | AIT 542 | Fall 2022 | Spring 2023 | 1 |
| 4 | MS Applied Info Technology | Core | AIT 664 | Fall 2022 | Fall 2022 | 2 |
| ... | ... | ... | ... | ... | ... | ... |
| 297 | MS Information Systems | Elective | SWE 645 | Fall 2021 | Spring 2022 | 3 |
| 298 | MS Information Systems | Elective | SWE 645 | Fall 2021 | Fall 2022 | 1 |
| 299 | MS Information Systems | Elective | SWE 645 | Fall 2021 | Spring 2023 | 1 |
| 300 | MS Information Systems | Elective | SYST 530 | Fall 2021 | Spring 2022 | 1 |
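
The two queries above interpolate Python values directly into the SQL text. In particular, a `tuple(...)` holding a single program renders as `('X',)`, and that trailing comma is typically a SQL syntax error. A hedged alternative, sketched with the standard-library `sqlite3` module on a tiny in-memory table, builds one `?` placeholder per program. The project's `db.runQuery` wrapper is not used here, and the sample rows are invented.

```python
import sqlite3

# Invented sample rows; only the column names follow the queries above.
rows = [
    ("MS Computer Science", "Core", "CS 530", "Fall 2022", "Fall 2022"),
    ("MS Computer Science", "Core", "CS 530", "Fall 2022", "Fall 2022"),
    ("MS Computer Science", "Core", "CS 530", "Fall 2022", "Spring 2023"),
    ("MS Other Program", "Core", "XX 500", "Fall 2022", "Fall 2022"),
]

con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE EnrollmentFinalStatus ("
    "stu_prog_desc TEXT, crs_req TEXT, crs TEXT, "
    "stu_admit_term_desc TEXT, reg_term_desc TEXT)"
)
con.executemany("INSERT INTO EnrollmentFinalStatus VALUES (?, ?, ?, ?, ?)", rows)

programs = ["MS Computer Science"]           # works for one or many programs
placeholders = ", ".join("?" for _ in programs)

sql = f"""
    SELECT stu_prog_desc, crs, reg_term_desc, COUNT(*) AS demand
    FROM EnrollmentFinalStatus
    WHERE stu_admit_term_desc = ?
      AND crs_req = ?
      AND stu_prog_desc IN ({placeholders})
    GROUP BY stu_prog_desc, crs, reg_term_desc
    ORDER BY stu_prog_desc, crs, reg_term_desc
"""
# Values are passed separately, so quoting is handled by the driver.
result = con.execute(sql, ["Fall 2022", "Core", *programs]).fetchall()
con.close()
print(result)
```

Only the placeholder string is formatted into the SQL text; the term, course type and program names travel as bound parameters.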


In [22]:
dict_CrsDemHist

{('MS Applied Info Technology',
  'Core',
  'AIT 524',
  'Fall 2022',
  'Fall 2022'): {'demand': 5},
 ('MS Applied Info Technology',
  'Core',
  'AIT 524',
  'Fall 2022',
  'Spring 2023'): {'demand': 1},
 ('MS Applied Info Technology',
  'Core',
  'AIT 542',
  'Fall 2022',
  'Fall 2022'): {'demand': 6},
 ('MS Applied Info Technology',
  'Core',
  'AIT 542',
  'Fall 2022',
  'Spring 2023'): {'demand': 1},
 ('MS Applied Info Technology',
  'Core',
  'AIT 664',
  'Fall 2022',
  'Fall 2022'): {'demand': 2},
 ('MS Applied Info Technology',
  'Core',
  'AIT 664',
  'Fall 2022',
  'Spring 2023'): {'demand': 3},
 ('MS Computer Science',
  'Core',
  'CS 530',
  'Fall 2022',
  'Fall 2022'): {'demand': 392},
 ('MS Computer Science',
  'Core',
  'CS 530',
  'Fall 2022',
  'Spring 2023'): {'demand': 42},
 ('MS Computer Science',
  'Core',
  'CS 531',
  'Fall 2022',
  'Fall 2022'): {'demand': 407},
 ('MS Computer Science',
  'Core',
  'CS 531',
  'Fall 2022',
  'Spring 2023'): {'demand': 21},
 ('MS 

In [92]:
('MS Information Systems', 'Elective', 'Sem4') == (i_key[0], i_key[1], latestSem.get(i_key[3]))

True

## Weights from Excel

In [19]:
df_weights = pd.read_csv( os.path.join('Data', '02_processed', 'weights_rbcpp.csv') )
dict_weights = df_weights.set_index(['stu_prog_desc', 'crs_req', 'crs', 'Sem']).to_dict('index')
df_weights

| | crs | crs_req | stu_prog_desc | Sem | weight |
|---|---|---|---|---|---|
| 0 | AIT 580 | Core | MS Data Analytics Engineering | Sem1 | 0.294900 |
| 1 | AIT 580 | Core | MS Data Analytics Engineering | Sem2 | 0.038500 |
| 2 | AIT 580 | Core | MS Data Analytics Engineering | Sem3 | 0.000000 |
| 3 | AIT 580 | Core | MS Data Analytics Engineering | Sem4 | 0.000000 |
| 4 | CS 504 | Core | MS Data Analytics Engineering | Sem1 | 0.142600 |
| ... | ... | ... | ... | ... | ... |
| 383 | SWE 681 | Elective | MS Computer Science | Sem4 | 0.013459 |
| 384 | SWE 795 | Elective | MS Computer Science | Sem1 | 0.000000 |
| 385 | SWE 795 | Elective | MS Computer Science | Sem2 | 0.000000 |
| 386 | SWE 795 | Elective | MS Computer Science | Sem3 | 0.007402 |


## Weights Calculation Skipped

In [91]:
for i_key in dict_CrsDemHist.keys():
    if i_key[1] == 'Core':
        print(
            (i_key[0], i_key[1], latestSem.get(i_key[3])),
            dict_CrsDemHist[i_key],
            dict_gradPathway[i_key[0], i_key[1], latestSem.get(i_key[3])]
        )

('MS Applied Info Technology', 'Core', 'Sem3') {'demand': 5} 0
('MS Applied Info Technology', 'Core', 'Sem4') {'demand': 1} 0
('MS Applied Info Technology', 'Core', 'Sem3') {'demand': 6} 0
('MS Applied Info Technology', 'Core', 'Sem4') {'demand': 1} 0
('MS Applied Info Technology', 'Core', 'Sem3') {'demand': 2} 0
('MS Applied Info Technology', 'Core', 'Sem4') {'demand': 3} 0
('MS Computer Science', 'Core', 'Sem3') {'demand': 392} 0
('MS Computer Science', 'Core', 'Sem4') {'demand': 42} 0
('MS Computer Science', 'Core', 'Sem3') {'demand': 407} 0
('MS Computer Science', 'Core', 'Sem4') {'demand': 21} 0
('MS Data Analytics Engineering', 'Core', 'Sem3') {'demand': 184} 0
('MS Data Analytics Engineering', 'Core', 'Sem4') {'demand': 16} 0
('MS Data Analytics Engineering', 'Core', 'Sem3') {'demand': 89} 0
('MS Data Analytics Engineering', 'Core', 'Sem4') {'demand': 118} 0
('MS Data Analytics Engineering', 'Core', 'Sem3') {'demand': 160} 0
('MS Data Analytics Engineering', 'Core', 'Sem4') {'de

In [60]:
dict_gradPathway

{('MS Data Analytics Engineering', 'Total', 'Sem1'): 3,
 ('MS Data Analytics Engineering', 'Total', 'Sem2'): 3,
 ('MS Data Analytics Engineering', 'Total', 'Sem3'): 3,
 ('MS Data Analytics Engineering', 'Total', 'Sem4'): 1,
 ('MS Data Analytics Engineering', 'Core', 'Sem1'): 3,
 ('MS Data Analytics Engineering', 'Core', 'Sem2'): 1,
 ('MS Data Analytics Engineering', 'Core', 'Sem3'): 0,
 ('MS Data Analytics Engineering', 'Core', 'Sem4'): 0,
 ('MS Data Analytics Engineering', 'Capstone', 'Sem1'): 0,
 ('MS Data Analytics Engineering', 'Capstone', 'Sem2'): 0,
 ('MS Data Analytics Engineering', 'Capstone', 'Sem3'): 0,
 ('MS Data Analytics Engineering', 'Capstone', 'Sem4'): 1,
 ('MS Data Analytics Engineering', 'Elective', 'Sem1'): 0,
 ('MS Data Analytics Engineering', 'Elective', 'Sem2'): 2,
 ('MS Data Analytics Engineering', 'Elective', 'Sem3'): 3,
 ('MS Data Analytics Engineering', 'Elective', 'Sem4'): 0,
 ('MS Computer Science', 'Total', 'Sem1'): 3,
 ('MS Computer Science', 'Total', 'Sem

## Incoming Student Enrollment

In [90]:
# Querying the incoming students using SQL
dict_incommingEnrollment = db.runQuery(f""" --sql
    SELECT
        stu_prog_desc, stu_admit_term_desc, COUNT(DISTINCT stu_id) AS incoming_students
    FROM EnrollmentFinalStatus
    WHERE
                stu_visa = "F1 Visa"
        AND     stu_prog_desc IN {tuple(course_catalog.keys())}
        AND     stu_admit_term_year > 2017
    GROUP BY
        stu_prog_desc, stu_admit_term_desc
    ORDER BY
        stu_prog_desc, stu_admit_term_code
""") \
        .set_index(['stu_prog_desc', 'stu_admit_term_desc']).to_dict('index')
dict_incommingEnrollment

{('MS Applied Info Technology', 'Spring 2018'): {'incoming_students': 7},
 ('MS Applied Info Technology', 'Fall 2018'): {'incoming_students': 14},
 ('MS Applied Info Technology', 'Spring 2019'): {'incoming_students': 11},
 ('MS Applied Info Technology', 'Fall 2019'): {'incoming_students': 8},
 ('MS Applied Info Technology',
  'Spring 2020 - COVID-19'): {'incoming_students': 14},
 ('MS Applied Info Technology', 'Fall 2020'): {'incoming_students': 5},
 ('MS Applied Info Technology', 'Spring 2021'): {'incoming_students': 5},
 ('MS Applied Info Technology', 'Fall 2021'): {'incoming_students': 10},
 ('MS Applied Info Technology', 'Spring 2022'): {'incoming_students': 5},
 ('MS Applied Info Technology', 'Fall 2022'): {'incoming_students': 7},
 ('MS Applied Info Technology', 'Spring 2023'): {'incoming_students': 5},
 ('MS Computer Science', 'Spring 2018'): {'incoming_students': 3},
 ('MS Computer Science', 'Fall 2018'): {'incoming_students': 26},
 ('MS Computer Science', 'Spring 2019'): {'inc

## Predictions

In [None]:
# Defining Semester pairing algorithm
str_Sem = "Fall 2021"

def getSem(str_Sem):
    if "Fall" in str_Sem:
        return "Fall"
    elif "Spring" in str_Sem:
        return "Spring"
    else:
        raise ValueError("Semester not recognized")

def getYear(str_Sem):
    return int(str_Sem.split(" ")[1])

def nextSem(str_Sem):
    if getSem(str_Sem) == "Fall":
        return f"Spring {getYear(str_Sem) + 1}"
    elif getSem(str_Sem) == "Spring":
        return f"Fall {getYear(str_Sem)}"
    else:
        raise ValueError("Semester not recognized")
    
def prevSem(str_Sem):
    if getSem(str_Sem) == "Fall":
        return f"Spring {getYear(str_Sem)}"
    elif getSem(str_Sem) == "Spring":
        return f"Fall {getYear(str_Sem) - 1}"
    else:
        raise ValueError("Semester not recognized")

def getCohorts(str_Sem, n_cohorts):
    return [f"{getSem(str_Sem)} {getYear(str_Sem) + i}" for i in range(n_cohorts)]
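
The cells shown do not yet combine the pathway, the weights and the incoming enrollment into a forecast. One way the projection could work, offered purely as an assumption, is to multiply each active cohort's head count by the number of course slots the pathway gives a course type in the cohort's current semester, and then by the course's weight. The function `project_demand` and all inputs below are hypothetical.

```python
# Assumed projection rule (not confirmed by the notebook):
#   demand(course, term) = sum over cohorts of
#       students(cohort) * slots(course type, SemN) * weight(course, SemN)
# where SemN is the cohort's semester number in `term`.

def project_demand(term_index, cohorts, slots, weights):
    """Projected registrations per course for one registration term.

    term_index : position of the registration term on a common timeline
    cohorts    : {admit term index: number of incoming students}
    slots      : {(course type, sem number): courses taken that semester}
    weights    : {(course, course type, sem number): share of those slots}
    """
    demand = {}
    for admit_index, n_students in cohorts.items():
        sem = term_index - admit_index + 1      # 1 = first semester
        for (crs, crs_req, w_sem), weight in weights.items():
            if w_sem != sem:
                continue
            n_slots = slots.get((crs_req, sem), 0)
            demand[crs] = demand.get(crs, 0.0) + n_students * n_slots * weight
    return demand


# Hypothetical inputs: two cohorts admitted in consecutive terms 0 and 1.
cohorts = {0: 100, 1: 80}
slots = {("Core", 1): 3, ("Core", 2): 1, ("Elective", 2): 2}
weights = {
    ("CORE A", "Core", 1): 0.5,
    ("CORE A", "Core", 2): 0.25,
    ("ELEC X", "Elective", 2): 0.5,
}

forecast = project_demand(term_index=1, cohorts=cohorts, slots=slots, weights=weights)
print(forecast)
```

In this toy case the first cohort is in its second semester and the second cohort in its first, so both contribute to the demand for `CORE A` in the same term.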


# Testing

## Future Work:
- The proportions can be calculated for each semester, and their relative change integrated into the model, so that new predictions are made with projected weights.
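
As one hedged reading of "projected weights", the sketch below assumes that the relative change observed between the two most recent cohorts simply repeats, and then renormalizes so the weights sum to one. The function `project_weights` and the sample weights are hypothetical.

```python
# Sketch of projected weights. Assumption: the ratio between the current and
# the previous weight of a course carries over to the next cohort.

def project_weights(prev, curr):
    """Extrapolate weights one cohort ahead and renormalize them."""
    projected = {}
    for crs, w in curr.items():
        w_prev = prev.get(crs, 0.0)
        # Without a usable history, carry the current weight forward.
        ratio = w / w_prev if w_prev > 0 else 1.0
        projected[crs] = w * ratio
    total = sum(projected.values())
    if total == 0:
        return projected
    return {crs: w / total for crs, w in projected.items()}


prev = {"ELEC A": 0.5, "ELEC B": 0.5}
curr = {"ELEC A": 0.6, "ELEC B": 0.4}
projected = project_weights(prev, curr)
print(projected)
```

A course whose share has been growing receives a larger projected weight, and the renormalization keeps the set usable as proportions.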