# **Rule-based Constrained Proportional Projections**

As the problem is framed, there are two broad ways to predict future course demand:
1. Realistic Models
2. Idealistic Models

**Realistic Models** focus on the randomness in the enrollment process: they aggregate the student enrollment data and account for that randomness to make predictions that are as close to reality as possible.
**Idealistic Models**, on the other hand, focus on the department recommendations and graduation pathway rules, and make predictions that satisfy these constraints.
Later, the two could be combined into predictions that are built on the Idealistic Model but also include the variability that comes from the random nature of enrollment.

The **RbCPP (Rule-based Constrained Proportional Projection)** model lies closer to the idealistic model. It makes predictions for international students, mainly for the core and capstone courses. The elective courses are predicted using probabilities derived from previous enrollment data.

### **Rules:**
1. The core courses must be registered in the first two semesters.
2. The core courses must be registered prior to any elective courses.
3. The capstone course must be registered in the final semester.
4. An international student must register for 10 courses in total.
5. An international student registers for 3 courses in each semester, unless it is their final semester.
6. International students must graduate within 4 semesters.
7. International students do not register for any courses in the summer semester.
8. International students register only for courses that count towards their degree program.
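
Rules 1 to 6 can be expressed as a small consistency check on one student's plan. The sketch below is illustrative only: the `check_plan` helper and the plan layout (one dict of registration counts per Fall/Spring semester) are assumptions, not part of the notebook's modules. Rules 7 and 8 are not checked because they need term and catalog data.

```python
from typing import Dict, List


def check_plan(plan: List[Dict[str, int]]) -> List[str]:
    """Return the rule violations found in one student's plan.

    `plan` is a hypothetical layout: one dict per Fall/Spring semester,
    in order, holding 'Core', 'Elective' and 'Capstone' counts.
    """
    problems = []
    totals = [sum(sem.values()) for sem in plan]

    if any(sem.get("Core", 0) for sem in plan[2:]):
        problems.append("Rule 1: core course after the second semester")
    # Rule 2 is checked at semester granularity: an elective may not be
    # taken in a semester that precedes a semester with a core course.
    for i, sem in enumerate(plan):
        if sem.get("Elective", 0) and any(s.get("Core", 0) for s in plan[i + 1:]):
            problems.append("Rule 2: elective taken before a later core course")
            break
    if any(sem.get("Capstone", 0) for sem in plan[:-1]):
        problems.append("Rule 3: capstone before the final semester")
    if sum(totals) != 10:
        problems.append("Rule 4: total is not 10 courses")
    if any(t != 3 for t in totals[:-1]):
        problems.append("Rule 5: a non-final semester does not have 3 courses")
    if len(plan) > 4:
        problems.append("Rule 6: more than 4 semesters")
    return problems


# A 3-3-3-1 plan with two core courses taken first
good_plan = [
    {"Core": 2, "Elective": 1},
    {"Elective": 3},
    {"Elective": 3},
    {"Capstone": 1},
]
# Same totals, but one core course is postponed behind electives
bad_plan = [
    {"Core": 1, "Elective": 2},
    {"Core": 1, "Elective": 2},
    {"Elective": 3},
    {"Capstone": 1},
]
print(check_plan(good_plan))
print(check_plan(bad_plan))
```

The check works on counts per semester, so it cannot see the order of registrations within a single semester.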


### **Assumptions:**
- The proportions of the weights remain relatively stable across semesters.
- The model applies only to international students, as they follow a fixed 3-3-3-1 enrollment pattern.
- The core courses are registered within the first two semesters of the degree program.
    - x% of the core courses are registered in the first semester.
    - 100% - x% of the core courses are registered in the second semester.
    - Remaining course registrations are elective courses.
- The capstone course, which is also a core course in certain degrees, is treated as a separate category because it must be taken in the final semester only.
- Some core requirements offer options, and students choose among them. In such cases, the model treats the options as elective courses.
- Some elective courses (elementary courses) tend to be taken in the first two semesters, whereas some advanced courses tend to be taken in later semesters. (Inferred from past enrollment trends.)
    - The elective course registrations are similar for the first two semesters.
    - The third semester has different enrollment proportions for elective courses than the first two semesters.
- The capstone courses (also core) are registered in the final semester.

### **Methodology:**
1. Get the Incoming Enrollment for previous semesters as well as expected incoming enrollment for the future semesters.
2. Generate Pathway to Graduation for International Students based on the catalog data.
3. Get the Course Demand History for the latest 4 semesters.
4. Calculate the weights for each course based on the course demand history.
5. Calculate Expected Course Demand for the future semesters based on the incoming enrollment and the weights for each cohort.
6. Calculate the Cumulative Course Demand for the future semesters by adding the expected course demand for each cohort.
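
Steps 5 and 6 can be sketched for a single course as follows. This is a minimal illustration under stated assumptions: a weight is read here as the expected number of registrations per student of a cohort, a cohort with no known enrollment counts as zero students, and all numbers and the `expected_demand` helper are hypothetical.

```python
# Hypothetical inputs for one program and one course; illustrative numbers only.
incoming = {"Fall 2022": 100, "Spring 2023": 80, "Fall 2023": 120}  # students per admit term
weights = {"Sem1": 0.25, "Sem2": 0.125, "Sem3": 0.0, "Sem4": 0.0}   # assumed registrations per student

# Admit term of the cohort that is in its 1st..4th semester during Fall 2023
cohorts = {
    "Sem1": "Fall 2023",
    "Sem2": "Spring 2023",
    "Sem3": "Fall 2022",
    "Sem4": "Spring 2022",
}


def expected_demand(cohorts, incoming, weights):
    """Return (demand per cohort, cumulative demand) for one course."""
    per_cohort = {
        # Step 5: cohort size times the weight for that semester of study.
        # Assumption: an admit term missing from `incoming` contributes 0.
        sem: incoming.get(admit_term, 0) * weights.get(sem, 0.0)
        for sem, admit_term in cohorts.items()
    }
    # Step 6: add the expected demand of all cohorts
    return per_cohort, sum(per_cohort.values())


per_cohort, total = expected_demand(cohorts, incoming, weights)
print(per_cohort)  # {'Sem1': 30.0, 'Sem2': 10.0, 'Sem3': 0.0, 'Sem4': 0.0}
print(total)       # 40.0
```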

# Initialization
### Importing libraries, custom modules and data

In [2]:
# importing the required libraries
import os
import pandas as pd
import matplotlib.pyplot as plt

os.chdir( os.path.join("..", "..", "..") )

from Code.src.modules.db_ops import *
from Code.src.modules.dataManager import DataManager
from Code.src.modules.eda import *

DM = DataManager()

In [3]:
DM.get_data_info()

{'course': {'d_desc': 'Course information obtained from Web Scraping the '
                      'course catalog.',
            'd_name': 'Courses Information',
            'd_state': {'processed': {'db': 'Data\\02_processed\\course.db'}}},
 'coursecatalog': {'d_desc': 'Program Requirements Information obtained from '
                             'the Course Catalog.',
                   'd_name': 'Course Catalog Data',
                   'd_state': {'processed': {'json': 'Code\\src\\prop\\course_catalog.json'}}},
 'enrollment': {'d_desc': 'Student Enrollment Status Information processed by '
                          'merging 110 CSV files obtained from Enrollment '
                          'Management at George Mason University.',
                'd_name': 'Student Enrollment Status Data',
                'd_state': {'processed': {'csv': 'Data\\02_processed\\enrollment.csv',
                                          'db': 'Data\\02_processed\\CECData.db',
                           

In [4]:
course_catalog              = DM.get_data('coursecatalog', 'json')
db                          = DM.get_data('enrollmentfinalstatus', 'db')
df_enrollmentFinalStatus    = DM.get_data('enrollmentfinalstatus', 'pkl')

---
## Automating the RbCPP

### Classes and Functions
- Semester (Module)
    - Functions
        - getSem
        - getYear
        - getNextSem
        - getPrevSem
        - getSemRange
        - getCohorts
- Program (Class)
    - Attributes
        - catalog
        - pathway
        - weights
    - Methods
        - getCatalog
        - getPathway
        - getWeights
        - getCoreCourses
        - getCapsCourses
        - getElecCourses
        - getCourseWeight
- Weight (Module)
    - Functions
        - calSem1Weight
        - calSem2Weight
        - calSem3Weight
        - calSem4Weight
        - checkWeights
        - calSem1WeightAlt

### Semester Module

In [5]:


def codetosem(code):
    if code%100 == 10:
        sem = 'Spring'
    elif code%100 == 70:
        sem = 'Fall'
    elif code%100 == 40:
        raise ValueError("Detected Summer term in the data. Something's wrong. You shall not pass!!")
    else:
        raise ValueError("Invalid term code.")
    return sem + ' ' + str(code//100)


def __getSem__(str_sem):
    return str_sem.split(' ')[0]


def __getYear__(str_sem):
    return int(str_sem.split(' ')[1])


def getNextSem(str_sem, n=1):
    # Returns the semester that lies n Fall/Spring terms after str_sem
    # (or before it when n is negative); n=0 returns str_sem unchanged.
    sem = __getSem__(str_sem)
    year = __getYear__(str_sem)
    if n%2 == 0:
        return __getSem__(str_sem) + ' ' + str(year+n//2)
    else:
        if sem == 'Fall':
            return 'Spring ' + str(year+n//2+1)
        elif sem == 'Spring':
            return 'Fall ' + str(year+n//2)


def getSemRange(str_sem_start, str_sem_end):
    # Check if str_sem_start is before str_sem_end
    if __getYear__(str_sem_start) >= __getYear__(str_sem_end):
        if __getYear__(str_sem_start) == __getYear__(str_sem_end):
            if __getSem__(str_sem_start) == 'Fall' and __getSem__(str_sem_end) == 'Spring':
                raise Exception('Invalid Semester Range. You messed up brah...')
        else:
            raise Exception('Invalid Semester Range. You messed up brah...')
    # Get the range of semesters
    sem_range = []
    str_sem_curr = str_sem_start
    sem_range.append(str_sem_curr)
    while str_sem_curr != str_sem_end:
        str_sem_curr = getNextSem(str_sem_curr)
        sem_range.append(str_sem_curr)
    return sem_range


def getCohorts(str_sem):
    sem_range = {}
    str_sem_curr = str_sem
    SemN = 1
    while SemN <= 4:
        sem_range["Sem"+str(SemN)] = str_sem_curr
        str_sem_curr = getNextSem(str_sem_curr, -1)
        SemN += 1
    return sem_range



In [6]:
getCohorts('Spring 2023')

{'Sem1': 'Spring 2023',
 'Sem2': 'Fall 2022',
 'Sem3': 'Spring 2022',
 'Sem4': 'Fall 2021'}
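
The same semester arithmetic can also be written with an integer index, which avoids the even/odd case split. This is an alternative sketch, not the module's implementation: it assumes only Spring and Fall terms exist, and the names `sem_to_index`, `index_to_sem`, `shift_sem` and `cohorts_for` are hypothetical.

```python
TERMS = ("Spring", "Fall")  # order within a calendar year


def sem_to_index(str_sem):
    """Map e.g. 'Spring 2023' to a running integer (two terms per year)."""
    name, year = str_sem.split(" ")
    return int(year) * 2 + TERMS.index(name)  # ValueError for unknown terms


def index_to_sem(index):
    """Inverse of sem_to_index."""
    year, pos = divmod(index, 2)
    return f"{TERMS[pos]} {year}"


def shift_sem(str_sem, n=1):
    """Semester n terms after str_sem (before it when n is negative)."""
    return index_to_sem(sem_to_index(str_sem) + n)


def cohorts_for(str_sem, n_sems=4):
    """Admit terms of the cohorts in their 1st..n-th semester during str_sem."""
    return {f"Sem{k}": shift_sem(str_sem, -(k - 1)) for k in range(1, n_sems + 1)}


print(shift_sem("Fall 2022", 3))   # Spring 2024
print(cohorts_for("Spring 2023"))
```

With this representation, a range check such as the one in `getSemRange` reduces to comparing two integers.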

---
# **Rb-CPP**
### Defining Prediction Ranges 

In [7]:
# Getting the latest semester in the data
latest_sem = df_enrollmentFinalStatus \
    .loc[
        df_enrollmentFinalStatus.stu_admit_term_name.isin(["Fall", "Spring"]),
        'stu_admit_term_code' ] \
    .max()

latest_sem = int(latest_sem)
latest_sem = codetosem(latest_sem)
latest_sem

'Spring 2023'

In [8]:
# Defining the range of semester to be considered for the analysis
HistTerms = getSemRange( getNextSem(latest_sem, 0), latest_sem )
AdmitTerms = getSemRange( getNextSem(latest_sem, 1), getNextSem(latest_sem, 2) )
HistTerms, AdmitTerms

(['Spring 2023'], ['Fall 2023', 'Spring 2024'])

In [9]:
# Getting the cohorts for each registration term
a = HistTerms.copy()
HistTerms = {}
for sem in a:
    HistTerms[sem] = getCohorts(sem)
a = AdmitTerms.copy()
AdmitTerms = {}
for sem in a:
    AdmitTerms[sem] = getCohorts(sem)
HistTerms, AdmitTerms

({'Spring 2023': {'Sem1': 'Spring 2023',
   'Sem2': 'Fall 2022',
   'Sem3': 'Spring 2022',
   'Sem4': 'Fall 2021'}},
 {'Fall 2023': {'Sem1': 'Fall 2023',
   'Sem2': 'Spring 2023',
   'Sem3': 'Fall 2022',
   'Sem4': 'Spring 2022'},
  'Spring 2024': {'Sem1': 'Spring 2024',
   'Sem2': 'Fall 2023',
   'Sem3': 'Spring 2023',
   'Sem4': 'Fall 2022'}})

## 1. Get the Incoming Enrollment for previous semesters as well as expected incoming enrollment for the future semesters


In [41]:
# Storing the incoming international students for all programs in each semester
df_incomingEnrollment = db.runQuery(""" --sql
    SELECT stu_prog_desc, stu_admit_term_desc, COUNT(DISTINCT stu_id) AS 'IncomingEnrollment'
    FROM enrollmentfinalstatus
    WHERE
            stu_visa = 'F1 Visa'
        AND stu_admit_term_name IN ('Fall', 'Spring')
        AND stu_admit_term_year >= 2018
    GROUP BY
        stu_prog_desc, stu_admit_term_desc
    ORDER BY
        stu_prog_desc, stu_admit_term_code
""")
df_incomingEnrollment

| | stu_prog_desc | stu_admit_term_desc | IncomingEnrollment |
|---|---|---|---|
| 0 | MENG GeoConStruct Engineering | Spring 2018 | 2 |
| 1 | MS Applied Info Technology | Spring 2018 | 7 |
| 2 | MS Applied Info Technology | Fall 2018 | 14 |
| 3 | MS Applied Info Technology | Spring 2019 | 11 |
| 4 | MS Applied Info Technology | Fall 2019 | 8 |
| ... | ... | ... | ... |
| 137 | MS Telecommunications | Spring 2021 | 3 |
| 138 | MS Telecommunications | Fall 2021 | 3 |
| 139 | MS Telecommunications | Spring 2022 | 2 |
| 140 | MS Telecommunications | Fall 2022 | 6 |


TODO: Add the code for predicting the incoming enrollment in future semesters.

In [51]:
df_incomingEnrollment.loc[df_incomingEnrollment.stu_prog_desc == 'MS Data Analytics Engineering']

| | stu_prog_desc | stu_admit_term_desc | IncomingEnrollment |
|---|---|---|---|
| 58 | MS Data Analytics Engineering | Spring 2018 | 44 |
| 59 | MS Data Analytics Engineering | Fall 2018 | 90 |
| 60 | MS Data Analytics Engineering | Spring 2019 | 50 |
| 61 | MS Data Analytics Engineering | Fall 2019 | 85 |
| 62 | MS Data Analytics Engineering | Spring 2020 - COVID-19 | 69 |
| 63 | MS Data Analytics Engineering | Fall 2020 | 54 |
| 64 | MS Data Analytics Engineering | Spring 2021 | 85 |
| 65 | MS Data Analytics Engineering | Fall 2021 | 149 |
| 66 | MS Data Analytics Engineering | Spring 2022 | 121 |
| 67 | MS Data Analytics Engineering | Fall 2022 | 190 |


In [42]:
# Predicting the future incoming enrollment for each program


## 2. Generate Pathway to Graduation for International Students based on the catalog data.
The Graduation Pathway must be defined by program, Sem, CourseReq, and N_Reg. It is stored in a dictionary because it is used in many calculations, and it is faster to access the data from a dictionary than from a dataframe.

``` python
{
    'program': {
        'sem1': {
            'Core'      : 2,
            'Elective'  : 1,
            'Capstone'  : 0
        },
        'sem2': {
            'Core'      : 0,
            'Elective'  : 3,
            'Capstone'  : 0
        },
    }
}
```
and so on....
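
One way such a pathway could be generated from the number of core courses is sketched below. It is an assumption-laden illustration, not the notebook's implementation: the `build_pathway` helper is hypothetical, core courses are placed greedily in the first two semesters, every program is assumed to end with one capstone, and the 3-3-3-1 pattern is taken from the rules.

```python
def build_pathway(n_core, n_capstone=1, per_sem=(3, 3, 3, 1)):
    """Return {'sem1': {'Core': .., 'Elective': .., 'Capstone': ..}, ...}.

    Assumptions: core courses fill the first two semesters greedily,
    the capstone sits in the final semester, electives fill the rest.
    """
    if n_core > per_sem[0] + per_sem[1]:
        raise ValueError("Core courses do not fit in the first two semesters.")

    pathway = {}
    core_left = n_core
    last = len(per_sem)
    for i, slots in enumerate(per_sem, start=1):
        core = min(core_left, slots) if i <= 2 else 0
        caps = min(n_capstone, slots) if i == last else 0
        pathway[f"sem{i}"] = {
            "Core": core,
            "Elective": slots - core - caps,
            "Capstone": caps,
        }
        core_left -= core
    return pathway


# Hypothetical program with 2 core courses
dict_example = {"program": build_pathway(n_core=2)}
print(dict_example["program"]["sem1"])  # {'Core': 2, 'Elective': 1, 'Capstone': 0}
print(dict_example["program"]["sem4"])  # {'Core': 0, 'Elective': 0, 'Capstone': 1}
```

The greedy split is only one choice; a split of core registrations by the x% assumption would distribute them differently across the first two semesters.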

In [49]:
dict_GradPathway = {}

In [47]:
for i_prog in course_catalog:
    print(i_prog, "\t", len(course_catalog[i_prog]['Core']))

    n_core = len(course_catalog[i_prog]['Core'])

MS Data Analytics Engineering 	 4
MS Computer Science 	 2
MS Information Systems 	 5
MS Applied Info Technology 	 3


## Pathway, History, and Weights (Skipped)

## Weights (Pre-calculated)

In [127]:
df_Weights = pd.read_csv( os.path.join('Data', '02_processed', 'weights_rbcpp.csv') )
dict_Weights = df_Weights.set_index(['stu_prog_desc', 'crs', 'Sem']).to_dict('index')
df_Weights

| | crs | crs_req | stu_prog_desc | Sem | weight |
|---|---|---|---|---|---|
| 0 | AIT 580 | Core | MS Data Analytics Engineering | Sem1 | 0.294900 |
| 1 | AIT 580 | Core | MS Data Analytics Engineering | Sem2 | 0.038500 |
| 2 | AIT 580 | Core | MS Data Analytics Engineering | Sem3 | 0.000000 |
| 3 | AIT 580 | Core | MS Data Analytics Engineering | Sem4 | 0.000000 |
| 4 | CS 504 | Core | MS Data Analytics Engineering | Sem1 | 0.142600 |
| ... | ... | ... | ... | ... | ... |
| 383 | SWE 681 | Elective | MS Computer Science | Sem4 | 0.013459 |
| 384 | SWE 795 | Elective | MS Computer Science | Sem1 | 0.000000 |
| 385 | SWE 795 | Elective | MS Computer Science | Sem2 | 0.000000 |
| 386 | SWE 795 | Elective | MS Computer Science | Sem3 | 0.007402 |
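
Because `dict_Weights` is built with `set_index([...]).to_dict('index')`, its keys are `(stu_prog_desc, crs, Sem)` tuples and each value is a dict of the remaining columns. A minimal lookup sketch follows. The `get_weight` helper is hypothetical, the dict is written out by hand from two of the rows shown above, and returning `0.0` for a missing combination is an assumption.

```python
# Hand-written stand-in with the same shape as dict_Weights
dict_Weights_example = {
    ("MS Data Analytics Engineering", "AIT 580", "Sem1"): {"crs_req": "Core", "weight": 0.2949},
    ("MS Data Analytics Engineering", "AIT 580", "Sem2"): {"crs_req": "Core", "weight": 0.0385},
}


def get_weight(weights, prog, crs, sem, default=0.0):
    """Look up one weight; combinations without a row fall back to `default`."""
    row = weights.get((prog, crs, sem))
    return default if row is None else row["weight"]


print(get_weight(dict_Weights_example, "MS Data Analytics Engineering", "AIT 580", "Sem1"))  # 0.2949
print(get_weight(dict_Weights_example, "MS Data Analytics Engineering", "AIT 580", "Sem4"))  # 0.0
```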



## Future Work:
- The proportions can be calculated for each semester, and the relative change can be integrated into the model, such that the new predictions are made with projected weights.