# Rule-based Constrained Proportional Projections (RbCPP)

## Rules:
1. The core courses must be registered before any elective courses.
2. The capstone course, which is also a core course in certain degrees, is treated as a separate category because it must be taken in the final semester only.
3. Some elective courses (elementary courses) tend to be taken in the first two semesters, whereas some advanced courses tend to be taken in later semesters. (Inferred from past enrollment trends.)
4. The proportions of the weights remain roughly the same across semesters.

Algorithms:
1. Create and record the catalog data (core, elective and capstone courses) for each program.
2. Create a Pathway to Graduation for International Students based on the catalog data.
3. Get the Course Demand History for each program, course, student admit term, and registration term.

Calculations:
1. Course Requirements:

## Future Work:
- The proportions can be calculated for each semester, and the relative change can be integrated into the model, such that the new predictions are made with projected weights.
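
As a rough illustration of this idea, the sketch below computes per-term course proportions and their relative change from one term to the next. The `demand` table and its numbers are made up for the example; they are not taken from the enrollment data.

```python
import pandas as pd

# Hypothetical elective demand per registration term (illustrative numbers only).
demand = pd.DataFrame(
    {'CS 580': [30, 40], 'CS 584': [50, 40], 'CS 550': [20, 20]},
    index=['Fall 2021', 'Fall 2022'],
)

# Proportion of each course within its term (each row sums to 1).
proportions = demand.div(demand.sum(axis=1), axis=0)

# Relative change of each proportion from one term to the next;
# the first term has no predecessor, so it is dropped.
relative_change = proportions.pct_change().iloc[1:]

print(proportions)
print(relative_change)
```

How the relative change would feed back into the projected weights is left open here.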

In [1]:
# importing the required libraries
import os
import pandas as pd
import matplotlib.pyplot as plt

os.chdir( os.path.join("..", "..") )

from Code.src.modules.db_ops import *
from Code.src.modules.dataManager import DataManager
from Code.src.modules.eda import *

DM = DataManager()

In [2]:
DM.get_data_info()

{'course': {'d_desc': 'Course information obtained from Web Scraping the '
                      'course catalog.',
            'd_name': 'Courses Information',
            'd_state': {'processed': {'db': 'Data\\02_processed\\course4EDA.db'}}},
 'enrollment': {'d_desc': 'Student Enrollment Status Information processed by '
                          'merging 110 CSV files obtained from Enrollment '
                          'Management at George Mason University.',
                'd_name': 'Student Enrollment Status Data',
                'd_state': {'processed': {'csv': 'Data\\02_processed\\enrollment.csv',
                                          'db': 'Data\\02_processed\\enrollment4EDA.db',
                                          'pkl': 'Data\\02_processed\\enrollment.pkl'}}},
 'enrollmentfinalstatus': {'d_desc': 'Student Final Enrollment Status '
                                     'Information processed from the '
                                     'Enrollment Data.',
     

In [3]:
# Connecting to the main database to get the enrollment data
db_enrollment = DM.get_data('enrollment', 'db', 'processed')
df_enrollment = DM.get_data('enrollment', 'pkl', 'processed')

# Creating a SQLite database to store all the derived tables
db_EnrollmentStatus = ConnectDB( os.path.join('Data', '02_processed', 'RbCPP', 'EnrollmentStatus.db') )

## Adding Course Requirements to the data

Recording the course requirements for all programs in a Dictionary Format as follows:
```
dict_course = {
    'stu_prog_desc' : ['crs xxx', 'crs xxx', 'crs xxx']
}
```
NOTE:
- The capstone courses are considered separate from the core courses, as they are supposed to be taken in the final semester.
- In reality, each concentration has its own core requirements, but since we do not have data on concentrations, the core courses are recorded at the program level only.

In [4]:
dict_core_courses = {
    #'MENG GeoConStruct Engineering' : ['CEIE 524', 'CEIE 525', 'CEIE 531', 'CEIE 575', 'CEIE 605'],
    'MS Applied Info Technology' :
        ['AIT 524', 'AIT 542', 'AIT 664'],
    'MS Bioengineering' :
        ['BENG 520', 'BENG 521', 'BENG 541', 'BENG 526', 'BENG 537', 'BENG 538', 'BENG 501', 'BENG 514', 'BENG 517', 'BENG 601', 'BENG 602', 'STAT 535', 'STAT 560'],
    'MS Biostatistics' :
        ['STAT 544', 'STAT 554', 'STAT 654', 'GCH 712', 'BENG 501', 'BINF 630', 'BENG 538', 'STAT 560', 'STAT 634', 'STAT 798'],
    'MS Civil & Infrastructure Engr':
        ['CEIE 601', 'CEIE 605'],
    'MS Computer Engineering':
        ['CS 530', 'CS 531'],
    'MS Computer Science':
        ['CS 530', 'CS 531'],
    'MS Cyber Security Engineering':
        ['CYSE 550', 'CYSE 570', 'CYSE 580', 'CYSE 610', 'CYSE 587', 'CYSE 690', 'CYSE 799'],
    'MS Data Analytics Engineering':
        ['AIT 580', 'STAT 515', 'OR 531', 'CS 504'],
    'MS Digital Forensics':
        ['DFOR 510','DFOR 660','DFOR 661','DFOR 663','DFOR 664','DFOR 670','DFOR 671','DFOR 672','DFOR 790'],
    'MS Electrical Engineering':
        ['ECE 511', 'ECE 521', 'ECE 527', 'ECE 528', 'ECE 535', 'ECE 539', 'ECE 542', 'ECE 552', 'ECE 580', 'ECE 584', 'ECE 586', 'ECE 587', 'ECE 621', 'ECE 630', 'ECE 799'],
    'MS Information Systems':
        ['COMP 502', 'CS 550', 'INFS 622', 'INFS 580', 'INFS 611'],
    'MS Infrmatn Security & Assrnce':
        ['ISA 562', 'ISA 656', 'INFS 612', 'CS 555'],
    'MS Operations Research':
        ['OR 541', 'OR 542', 'OR 568', 'OR 635'],
    'MS Software Engineering':
        ['SWE 619', 'SWE 621', 'SWE 632', 'SWE 637'],
    'MS Statistical Science':
        ['STAT 544', 'STAT 554', 'STAT 634', 'STAT 652', 'STAT 654'],
    'MS Systems Engineering':
        ['SYST 505', 'SYST 510', 'SYST 520', 'SYST 530', 'SYST 611'],
    'MS Telecommunications':
        ['TCOM 500', 'TCOM 514', 'TCOM 515', 'TCOM 535', 'TCOM 570', 'TCOM 610', 'TCOM 750']
}

dict_capstone_courses = {
    'MS Computer Science' : [None],
    'MS Data Analytics Engineering' : ['DAEN 690'],
    'MS Information Systems' : [None]
}

dict_elective_courses = {
    'MS Data Analytics Engineering' :
        [
            'AIT 524', 'AIT 526', 'AIT 582', 'AIT 590',
            'AIT 614', 'AIT 622', 'AIT 624', 'AIT 636', 'AIT 664',
            'AIT 722', 'AIT 724', 'AIT 736', 'AIT 736', 'AIT 746',
            'DAEN 698',
            'DFOR 510', 'DFOR 660', 'DFOR 661', 'DFOR 663', 'DFOR 664', 'DFOR 698', 'DFOR 761',
            'DFOR 767', 'DFOR 768', 'DFOR 780',
            'BENG 501', 'BENG 526', 'BENG 538', 'BENG 550', 'BENG 575',
            'CS 550', 'CS 580', 'CS 650', 'CS 657', 'CS 688', 'CS 775', 'CS 782', 'CS 787',
            'ECE 508', 'ECE 527', 'ECE 528', 'ECE 530', 'ECE 535', 'ECE 537', 'ECE 612',
            'GBUS 720', 'GBUS 721', 'GBUS 738', 'GBUS 739', 'GBUS 740', 'GBUS 744',
            'HAP 671', 'HAP 719', 'HAP 720', 'HAP 725', 'HAP 730', 'HAP 770', 'HAP 780', 'HAP 819', 'HAP 823', 'HAP 880',
            'INFS 623', 'INFS 740',
            'LING 650', 'LING 675', 'LING 685', 'LING 687', 'LING 689', 'LING 775',
            'ME 551', 'ME 552', 'ME 553', 'ME 554', 'ME 620', 'ME 621',
            'ME 714', 'ME 721', 'ME 742', 'ME 745', 'ME 750', 'ME 751', 'ME 753', 'ME 754', 'ME 755', 'ME 762',
            'OR 538', 'OR 541', 'OR 542', 'OR 568', 'OR 588',
            'OR 603', 'OR 604', 'OR 610', 'OR 645', 'OR 670', 'OR 688',
            'STAT 544', 'STAT 654', 'STAT 662', 'STAT 663', 'STAT 672',
            'SYST 508', 'SYST 538', 'SYST 542', 'SYST 568', 'SYST 573',
            'SYST 584', 'SYST 588', 'SYST 618', 'SYST 664', 'SYST 670', 'SYST 688',
        ],
    'MS Computer Science' :
        [
            'CS 540', 'CS 550', 'CS 551', 'CS 555', 'CS 571', 'CS 580', 'CS 583', 'CS 584', 'CS 587','CS 595',
            'CS 600','CS 630', 'CS 633', 'CS 635', 'CS 640',
            'CS 650', 'CS 655', 'CS 657', 'CS 658','CS 662', 'CS 663', 'CS 667', 
            'CS 672', 'CS 673', 'CS 675', 'CS 678', 
            'CS 681', 'CS 682', 'CS 683', 'CS 684', 'CS 685', 'CS 686', 'CS 687', 'CS 688', 'CS 689',
            'CS 695', 'CS 697', 
            'CS 706','CS 719', 'CS 747', 'CS 752','CS 756','CS 773', 'CS 774', 'CS 777', 'CS 779',
            'CS 782', 'CS 787', 'CS 788', 'CS 795', 'CS 798', 'CS 799',
            'CS 895',
            'INFS 623', 'INFS 740', 'INFS 760', 'INFS 772', 'INFS 774',
            'ISA 562', 'ISA 564', 'ISA 656', 'ISA 673', 'ISA 674', 'ISA 681', 'ISA 697', 'ISA 763', 'ISA 764', 'ISA 785',
            'SWE 619', 'SWE 620', 'SWE 621', 'SWE 622', 'SWE 631', 'SWE 632', 'SWE 637', 'SWE 642',
            'SWE 645', 'SWE 681','SWE 699', 'SWE 721', 'SWE 737', 'SWE 760', 'SWE 795', 'SWE 796'
        ],
    'MS Information Systems' :
        [
            'AIT 526', 'AIT 646', 'AIT 642', 'AIT 660', 'AIT 664', 'AIT 670', 'AIT 684',
            'AIT 716', 'AIT 724', 'AIT 726', 'AIT 734', 'AIT 736', 'AIT 746',
            'COMP 642', 'COMP 505', 'COMP 522',
            'ECE 611', 'ECE 612', 'ECE 642', 'ECE 643', 'ECE 646', 'ECE 732', 'ECE 746',
            'CS 531', 'CS 540', 'CS 580', 'CS 583', 'CS 584',
            'CS 635', 'CS 640', 'CS 650', 'CS 657', 'CS 662', 'CS 663', 'CS 672', 'CS 673', 'CS 678',
            'CS 681', 'CS 682', 'CS 683', 'CS 684', 'CS 685', 'CS 686', 'CS 687', 'CS 688',
            'CS 706', 'CS 752', 'CS 755', 'CS 756', 'CS 773', 'CS 777', 'CS 779', 'CS 782', 'CS 787', 'CS 795',
            'INFS 623', 'INFS 640', 'INFS 697', 'INFS 740', 'INFS 760', 'INFS 770', 'INFS 772', 'INFS 774', 'INFS 796', 'INFS 797', 'INFS 799',
            'ISA 562', 'ISA 564', 'ISA 650', 'ISA 652', 'ISA 656', 'ISA 673', 'ISA 674', 'ISA 681', 'ISA 697',
            'ISA 763', 'ISA 764', 'ISA 785', 'ISA 797',
            'OR 541', 'OR 542', 'OR 635', 'OR 640', 'OR 641', 'OR 642', 'OR 643', 'OR 644', 'OR 645', 'OR 647', 'OR 681', 'OR 690',
            'PSYC 734',
            'STAT 544', 'STAT 554', 'STAT 652', 'STAT 656', 'STAT 662', 'STAT 663', 'STAT 674',
            'SWE 620', 'SWE 622', 'SWE 625', 'SWE 626', 'SWE 631', 'SWE 632', 'SWE 642', 'SWE 645', 'SWE 681', 'SWE 699',
            'SWE 721', 'SWE 681', 'SWE 763', 'SWE 795', 'SWE 796', 'SWE 798',
            'SYST 520', 'SYST 530', 'SYST 542', 'SYST 560', 'SYST 573', 'SYST 611', 'SYST 620', 'SYST 659', 'SYST 671', 'SYST 680', 'SYST 683'
        ],
    'MS Applied Info Technology' :
        [
            'AIT 512', 'AIT 526', 'AIT 580', 'AIT 582', 'AIT 590',
            'AIT 602', 'AIT 614', 'AIT 622', 'AIT 624', 'AIT 636', 'AIT 655', 'AIT 660', 'AIT 665', 'AIT 670', 'AIT 672',
            'AIT 677', 'AIT 678', 'AIT 679', 'AIT 682', 'AIT 684', 'AIT 685', 'AIT 690', 'AIT 697', 'AIT 699',
            'AIT 701', 'AIT 702', 'AIT 711', 'AIT 712', 'AIT 716', 'AIT 722', 'AIT 724',
            'AIT 726','AIT 734', 'AIT 736','AIT 746', 'AIT 790', 'AIT 799'
        ]
}
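
Hand-typed course lists like these are easy to get wrong: in Python, two adjacent string literals without a comma between them are silently joined into one string. The sketch below is one possible sanity check for malformed or duplicated codes. The helper `check_course_dict`, the `sample_courses` data, and the assumed code format (2 to 4 uppercase letters, a space, 3 digits) are illustrative and not part of the project modules.

```python
import re
from collections import Counter

# Hypothetical sample in the same shape as the course dictionaries.
# 'CS 551' 'CS 555' has no comma, so Python fuses it into 'CS 551CS 555'.
sample_courses = {
    'MS Computer Science': ['CS 540', 'CS 551' 'CS 555', 'CS 571', 'CS 571'],
}

# Assumed code format: 2-4 uppercase letters, a space, then 3 digits.
CODE_PATTERN = re.compile(r'[A-Z]{2,4} \d{3}')

def check_course_dict(course_dict):
    """Return malformed and duplicated course codes for each program."""
    report = {}
    for prog, courses in course_dict.items():
        codes = [c for c in courses if c is not None]
        malformed = [c for c in codes if not CODE_PATTERN.fullmatch(c)]
        duplicated = sorted(c for c, n in Counter(codes).items() if n > 1)
        if malformed or duplicated:
            report[prog] = {'malformed': malformed, 'duplicated': duplicated}
    return report

report = check_course_dict(sample_courses)
print(report)
```

The same function could be called on `dict_core_courses`, `dict_capstone_courses`, and `dict_elective_courses` before the labeling step.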

Warning: the labeling loop below is dense, so read it with care. Later assignments overwrite earlier ones, which means the order of the assignments matters.

In [5]:
for i_prog in dict_core_courses:
    # Rows belonging to the current program
    in_prog = df_enrollment['stu_prog_desc'] == i_prog

    # Default label: "Not Required" when elective courses are recorded for the program, otherwise "Elective"
    def_val = "Not Required" if i_prog in dict_elective_courses else "Elective"
    df_enrollment.loc[in_prog, 'crs_req'] = def_val

    # Apply the labels in order; later labels overwrite earlier ones.
    # Programs missing from a dictionary get an empty list, so nothing is relabeled for them.
    for label, dict_courses in [
        ("Core", dict_core_courses),
        ("Capstone", dict_capstone_courses),
        ("Elective", dict_elective_courses),
    ]:
        df_enrollment.loc[
            in_prog & df_enrollment['crs'].isin(dict_courses.get(i_prog, [])),
            'crs_req'
        ] = label

df_enrollment

Unnamed: 0,rec_id,rec_ext_date,file_name,file_index,reg_term_code,reg_term_year,reg_term_name,reg_term_desc,stu_id,stu_deg_level,...,crs_credits,crs_hours,crs_sect,crs_sect_clg,crs_sect_modality,crs_sect_wiley_ind,reg_status,reg_status_date,stu_act_reg_ind,crs_req
0,0,2017-05-01,Data/01_raw/EnrollmentData/CEC Graduate Regist...,5,201770,2017,Fall,Fall 2017,CEC3286,Master,...,3,3.0,INFS 640 001,VSE,F2F,No Value,**Web Registered**,2017-04-11,Y,Elective
1,1,2017-05-01,Data/01_raw/EnrollmentData/CEC Graduate Regist...,6,201770,2017,Fall,Fall 2017,CEC3289,Master,...,3,3.0,SWE 619 002,VSE,F2F,No Value,**Web Registered**,2017-04-11,Y,Elective
2,2,2017-05-01,Data/01_raw/EnrollmentData/CEC Graduate Regist...,10,201770,2017,Fall,Fall 2017,CEC865,Master,...,3,3.0,CEIE 639 001,VSE,F2F,No Value,**Web Registered**,2017-04-11,Y,Elective
3,3,2017-05-01,Data/01_raw/EnrollmentData/CEC Graduate Regist...,11,201770,2017,Fall,Fall 2017,CEC865,Master,...,3,3.0,CEIE 679 001,VSE,F2F,No Value,**Web Registered**,2017-04-11,Y,Elective
4,4,2017-05-01,Data/01_raw/EnrollmentData/CEC Graduate Regist...,12,201770,2017,Fall,Fall 2017,CEC901,Master,...,3,3.0,SWE 645 001,VSE,F2F,No Value,**Web Registered**,2017-04-11,Y,Elective
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
344114,344114,2022-12-15,Data/01_raw/EnrollmentData/CEC Graduate Regist...,612,202310,2023,Spring,Spring 2023,CEC23433,Master,...,1-3,1.0,CEC 794 001,CEC,F2F,N,**Registered**,2023-02-09,Y,Not Required
344115,344115,2022-12-15,Data/01_raw/EnrollmentData/CEC Graduate Regist...,842,202310,2023,Spring,Spring 2023,CEC26826,Master,...,3,3.0,CS 678 002,CEC,F2F,N,**Registered**,2023-02-09,Y,Elective
344116,344116,2022-12-15,Data/01_raw/EnrollmentData/CEC Graduate Regist...,1091,202310,2023,Spring,Spring 2023,CEC27967,Master,...,3,3.0,CS 504 006,CEC,Online,Y,**Web Registered**,2023-02-09,Y,Core
344117,344117,2022-12-15,Data/01_raw/EnrollmentData/CEC Graduate Regist...,1278,202310,2023,Spring,Spring 2023,CEC27498,Master,...,3,3.0,INFS 580 DL1,CEC,Online,N,**Registered**,2023-02-09,Y,Core


## Exporting the data

In [6]:
# Exporting all the Enrollment Data
df_enrollment.to_csv( os.path.join('Data', '02_processed', 'RbCPP', 'EnrollmentStatus.csv') )
df_enrollment.to_pickle( os.path.join('Data', '02_processed', 'RbCPP', 'EnrollmentStatus.pkl') )
df_enrollment.to_sql('EnrollmentStatus', db_EnrollmentStatus.connection, if_exists='replace', index=False)

344119

### Final Snapshot Data

Filtering the data for the latest snapshot. Information on enrolled courses may be missing, as the data was not extracted properly. Moreover, the waitlist information might have been dropped by then.
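
For readers who prefer pandas over SQL, the snapshot filter can be sketched as follows. It assumes the snapshots of interest are the ones extracted on February 1 and September 1, for registration terms from 2018 onward; the `toy` records and the helper `final_snapshot` are made up for illustration.

```python
import pandas as pd

# Toy records; column names follow the notebook, the values are made up.
toy = pd.DataFrame({
    'rec_ext_date': pd.to_datetime(
        ['2018-02-01', '2018-02-15', '2018-09-01', '2017-09-01', '2019-01-01']),
    'reg_term_year': [2018, 2018, 2018, 2017, 2019],
    'crs': ['CS 530', 'CS 530', 'CS 531', 'CS 530', 'CS 580'],
})

def final_snapshot(df):
    """Keep records extracted on Feb 1 or Sep 1 for terms from 2018 onward."""
    ext = df['rec_ext_date']
    on_snapshot_day = (ext.dt.day == 1) & ext.dt.month.isin([2, 9])
    return df[(df['reg_term_year'] >= 2018) & on_snapshot_day]

snapshot = final_snapshot(toy)
print(snapshot)
```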

In [7]:
# Keep only the snapshots extracted on February 1 or September 1, for registration terms from 2018 onward
df = db_EnrollmentStatus.runQuery(""" --sql
    SELECT *
    FROM EnrollmentStatus
    WHERE
        (reg_term_year >= 2018) AND
        ((
            strftime('%d', rec_ext_date) = '01' AND
            strftime('%m', rec_ext_date) = '02'
        ) OR
        (
            strftime('%d', rec_ext_date) = '01' AND
            strftime('%m', rec_ext_date) = '09'
        ))
""")

df.to_csv(os.path.join('Data', '02_processed', 'RbCPP', 'FinalSnapshot.csv'), index=False)
df.to_pickle(os.path.join('Data', '02_processed', 'RbCPP', 'FinalSnapshot.pkl'))
df.to_sql('FinalSnapshot', db_EnrollmentStatus.connection, if_exists='replace', index=False)
df

Unnamed: 0,rec_id,rec_ext_date,file_name,file_index,reg_term_code,reg_term_year,reg_term_name,reg_term_desc,stu_id,stu_deg_level,...,crs_credits,crs_hours,crs_sect,crs_sect_clg,crs_sect_modality,crs_sect_wiley_ind,reg_status,reg_status_date,stu_act_reg_ind,crs_req
0,29067,2018-02-01 00:00:00,Data/01_raw/EnrollmentData/CEC Graduate Regist...,634,201810,2018,Spring,Spring 2018,CEC9690,Master,...,3,3.0,CS 755 001,VSE,F2F,No Value,**Web Registered**,2018-01-01 00:00:00,Y,Elective
1,29068,2018-02-01 00:00:00,Data/01_raw/EnrollmentData/CEC Graduate Regist...,49,201810,2018,Spring,Spring 2018,CEC994,Master,...,3,0.0,AIT 524 003,VSE,F2F,No Value,Drop-Course Cancelled,2018-01-10 00:00:00,Y,Core
2,29069,2018-02-01 00:00:00,Data/01_raw/EnrollmentData/CEC Graduate Regist...,69,201810,2018,Spring,Spring 2018,CEC6082,Master,...,3,3.0,SYST 618 DL1,VSE,Online,N,**Web Registered**,2018-01-10 00:00:00,Y,Elective
3,29070,2018-02-01 00:00:00,Data/01_raw/EnrollmentData/CEC Graduate Regist...,131,201810,2018,Spring,Spring 2018,CEC1309,Master,...,3,3.0,OR 671 001,VSE,F2F,No Value,**Web Registered**,2018-01-10 00:00:00,Y,Elective
4,29071,2018-02-01 00:00:00,Data/01_raw/EnrollmentData/CEC Graduate Regist...,521,201810,2018,Spring,Spring 2018,CEC4508,Master,...,3,3.0,ECE 630 001,VSE,F2F,No Value,**Web Registered**,2018-01-10 00:00:00,Y,Core
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
37993,318574,2023-02-01 00:00:00,Data/01_raw/EnrollmentData/CEC Graduate Regist...,3375,202310,2023,Spring,Spring 2023,CEC30889,Master,...,3,3.0,AIT 614 002,CEC,F2F,N,**Web Registered**,2022-12-09 00:00:00,Y,Elective
37994,318575,2023-02-01 00:00:00,Data/01_raw/EnrollmentData/CEC Graduate Regist...,4294,202310,2023,Spring,Spring 2023,CEC11679,Master,...,3,3.0,CEIE 605 001,CEC,F2F,N,**Web Registered**,2022-12-09 00:00:00,Y,Core
37995,318576,2023-02-01 00:00:00,Data/01_raw/EnrollmentData/CEC Graduate Regist...,4295,202310,2023,Spring,Spring 2023,CEC11679,Master,...,3,3.0,CEIE 639 003,CEC,F2F,N,**Web Registered**,2022-12-09 00:00:00,Y,Elective
37996,318577,2023-02-01 00:00:00,Data/01_raw/EnrollmentData/CEC Graduate Regist...,4296,202310,2023,Spring,Spring 2023,CEC11679,Master,...,0,0.0,CEIE 795 001,CEC,F2F,N,**Web Registered**,2022-12-09 00:00:00,Y,Elective


In [8]:
# Get aggregated Course Demand History for each course in each registration semester and student admit term
df = db_EnrollmentStatus.runQuery(""" --sql
    SELECT stu_prog_desc, stu_visa, reg_status, stu_admit_term_desc, reg_term_desc, crs, crs_req, COUNT(*) AS demand
    FROM FinalSnapshot
    GROUP BY
        stu_prog_desc, stu_visa, reg_status, stu_admit_term_desc, reg_term_desc, crs, crs_req
    ORDER BY
        stu_prog_desc, stu_admit_term_code, reg_term_code, crs_req, crs, stu_visa, reg_status
""")

df.to_csv( os.path.join('Data', '02_processed', 'RbCPP', 'Dem_FinalSnapshot.csv'), index=False)
df.to_pickle( os.path.join('Data', '02_processed', 'RbCPP', 'Dem_FinalSnapshot.pkl'))
df.to_sql('Dem_FinalSnapshot', db_EnrollmentStatus.connection, if_exists='replace', index=False)
df

Unnamed: 0,stu_prog_desc,stu_visa,reg_status,stu_admit_term_desc,reg_term_desc,crs,crs_req,demand
0,MENG GeoConStruct Engineering,Not Relevent,**Web Registered**,Fall 2013,Spring 2018,CEIE 690,,1
1,MENG GeoConStruct Engineering,Not Relevent,**Web Registered**,Fall 2013,Spring 2018,CEIE 795,,1
2,MENG GeoConStruct Engineering,Not Relevent,**Web Registered**,Fall 2013,Fall 2018,CEIE 639,,1
3,MENG GeoConStruct Engineering,Not Relevent,**Web Registered**,Fall 2013,Fall 2018,CEIE 795,,1
4,MENG GeoConStruct Engineering,Not Relevent,**Web Registered**,Fall 2013,Spring 2019,CEIE 526,,1
...,...,...,...,...,...,...,...,...
14327,MS Telecommunications,Not Relevent,**Web Registered**,Spring 2023,Spring 2023,TCOM 500,Core,2
14328,MS Telecommunications,Not Relevent,**Web Registered**,Spring 2023,Spring 2023,TCOM 514,Core,1
14329,MS Telecommunications,F1 Visa,**Web Registered**,Spring 2023,Spring 2023,TCOM 535,Core,1
14330,MS Telecommunications,Not Relevent,**Web Registered**,Spring 2023,Spring 2023,TCOM 535,Core,2


### FinalEnrollmentStatus

For each student, program, semester, and course, the most recently extracted record is taken as the final enrollment status. Some data on dropped courses may be missing; this is accepted as a limitation.
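
The "latest record per group" idea can be sketched in pandas as below. The `toy` data and the helper `latest_status` are illustrative only. Like a join on the maximum extraction date, this keeps every row that ties for the latest date within a group.

```python
import pandas as pd

# Toy extraction history; column names follow the notebook, values are made up.
toy = pd.DataFrame({
    'stu_prog_desc': ['MS Computer Science'] * 4,
    'stu_id': ['S1', 'S1', 'S1', 'S2'],
    'reg_term_desc': ['Fall 2021'] * 4,
    'crs': ['CS 580', 'CS 580', 'CS 530', 'CS 580'],
    'rec_ext_date': pd.to_datetime(
        ['2021-08-01', '2021-09-01', '2021-08-01', '2021-08-15']),
    'reg_status': ['Wait Listed', '**Web Registered**',
                   '**Web Registered**', 'Wait Listed'],
})

keys = ['stu_prog_desc', 'stu_id', 'reg_term_desc', 'crs']

def latest_status(df, keys):
    """Keep the rows whose extraction date is the latest within each group."""
    latest = df.groupby(keys)['rec_ext_date'].transform('max')
    return df[df['rec_ext_date'] == latest]

final = latest_status(toy, keys)
print(final[['stu_id', 'crs', 'rec_ext_date', 'reg_status']])
```

In the toy data, student `S1` first appears as wait-listed for `CS 580` and later as registered; only the later record survives.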

In [9]:
# Get the latest extracted record for each student, program, registration term, and course
df = db_EnrollmentStatus.runQuery(""" --sql
    SELECT
        a.rec_id, a.rec_ext_date, a.file_name, a.file_index, a.reg_term_code,
        a.reg_term_year, a.reg_term_name, a.reg_term_desc, a.stu_id,
        a.stu_deg_level, a.stu_college, a.stu_res, a.stu_visa, a.stu_bam,
        a.stu_new_ret, a.stu_dept, a.stu_dept_desc, a.stu_prog_code,
        a.stu_prog_level, a.stu_prog_desc, a.stu_admit_term_code,
        a.stu_admit_term_year, a.stu_admit_term_name, a.stu_admit_term_desc,
        a.crs, a.crs_req, a.crs_type, a.crs_credits, a.crs_hours, a.crs_sect,
        a.crs_sect_clg, a.crs_sect_modality, a.crs_sect_wiley_ind, a.reg_status,
        a.reg_status_date, a.stu_act_reg_ind
    FROM EnrollmentStatus AS a
    INNER JOIN 
        (SELECT 
            stu_prog_desc, stu_id, reg_term_desc, crs, MAX(rec_ext_date) AS max_rec_ext_date
        FROM EnrollmentStatus
        WHERE
            reg_term_year >= 2018
        GROUP BY
            stu_prog_desc, stu_id, reg_term_desc, crs
        ORDER BY
            stu_prog_desc, stu_id, crs, reg_term_desc) AS b
    ON
        a.stu_prog_desc = b.stu_prog_desc AND
        a.stu_id = b.stu_id AND
        a.reg_term_desc = b.reg_term_desc AND
        a.crs = b.crs AND
        a.rec_ext_date = b.max_rec_ext_date
    WHERE
        reg_term_year >= 2018

""")

df.to_csv(os.path.join('Data', '02_processed', 'RbCPP', 'FinalEnrollmentStatus.csv'), index=False)
df.to_pickle(os.path.join('Data', '02_processed', 'RbCPP', 'FinalEnrollmentStatus.pkl'))
df.to_sql('FinalEnrollmentStatus', db_EnrollmentStatus.connection, if_exists='replace', index=False)
df

Unnamed: 0,rec_id,rec_ext_date,file_name,file_index,reg_term_code,reg_term_year,reg_term_name,reg_term_desc,stu_id,stu_deg_level,...,crs_type,crs_credits,crs_hours,crs_sect,crs_sect_clg,crs_sect_modality,crs_sect_wiley_ind,reg_status,reg_status_date,stu_act_reg_ind
0,24379,2018-01-01 00:00:00,Data/01_raw/EnrollmentData/CEC Graduate Regist...,2,201810,2018,Spring,Spring 2018,CEC2514,Master,...,Lecture,3,3.0,ECE 699 001,VSE,F2F,No Value,**Web Registered**,2017-11-10 00:00:00,Y
1,24442,2018-01-01 00:00:00,Data/01_raw/EnrollmentData/CEC Graduate Regist...,312,201810,2018,Spring,Spring 2018,CEC8917,Master,...,Seminar,3,3.0,ECE 747 001,VSE,F2F,No Value,**Web Registered**,2017-11-12 00:00:00,Y
2,24450,2018-01-01 00:00:00,Data/01_raw/EnrollmentData/CEC Graduate Regist...,728,201810,2018,Spring,Spring 2018,CEC10148,Master,...,Lecture,3,3.0,SYST 584 001,VSE,F2F,No Value,**Web Registered**,2017-11-12 00:00:00,Y
3,24457,2018-01-01 00:00:00,Data/01_raw/EnrollmentData/CEC Graduate Regist...,1250,201810,2018,Spring,Spring 2018,CEC41,Master,...,Lecture,3,0.0,CS 580 001,VSE,F2F,No Value,Wait Listed,2017-11-12 00:00:00,Y
4,24462,2018-01-01 00:00:00,Data/01_raw/EnrollmentData/CEC Graduate Regist...,1363,201810,2018,Spring,Spring 2018,CEC2648,Master,...,Lecture,3,3.0,OR 604 001,VSE,F2F,No Value,**Web Registered**,2017-11-12 00:00:00,Y
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
56916,323682,2023-02-15 00:00:00,Data/01_raw/EnrollmentData/CEC Graduate Regist...,612,202310,2023,Spring,Spring 2023,CEC23433,Master,...,Internship,1-3,1.0,CEC 794 001,CEC,F2F,N,**Registered**,2023-02-09 00:00:00,Y
56917,323683,2023-02-15 00:00:00,Data/01_raw/EnrollmentData/CEC Graduate Regist...,842,202310,2023,Spring,Spring 2023,CEC26826,Master,...,Lecture,3,3.0,CS 678 002,CEC,F2F,N,**Registered**,2023-02-09 00:00:00,Y
56918,323684,2023-02-15 00:00:00,Data/01_raw/EnrollmentData/CEC Graduate Regist...,1091,202310,2023,Spring,Spring 2023,CEC27967,Master,...,Lecture,3,3.0,CS 504 006,CEC,Online,Y,**Web Registered**,2023-02-09 00:00:00,Y
56919,323685,2023-02-15 00:00:00,Data/01_raw/EnrollmentData/CEC Graduate Regist...,1278,202310,2023,Spring,Spring 2023,CEC27498,Master,...,Lecture,3,3.0,INFS 580 DL1,CEC,Online,N,**Registered**,2023-02-09 00:00:00,Y


In [10]:
df = db_EnrollmentStatus.runQuery(""" --sql
    SELECT stu_prog_desc, stu_visa, reg_status, stu_admit_term_desc, reg_term_desc, crs, crs_req, COUNT(*) AS demand
    FROM FinalEnrollmentStatus
    GROUP BY
        stu_prog_desc, stu_visa, reg_status, stu_admit_term_desc, reg_term_desc, crs
    ORDER BY
        stu_prog_desc, stu_admit_term_code, reg_term_code, crs_req, crs, stu_visa, reg_status
""")

df.to_csv( os.path.join('Data', '02_processed', 'RbCPP', 'Dem_FinalEnrollmentStatus.csv'), index=False)
df.to_pickle( os.path.join('Data', '02_processed', 'RbCPP', 'Dem_FinalEnrollmentStatus.pkl'))
df.to_sql('Dem_FinalEnrollmentStatus', db_EnrollmentStatus.connection, if_exists='replace', index=False)
df

Unnamed: 0,stu_prog_desc,stu_visa,reg_status,stu_admit_term_desc,reg_term_desc,crs,crs_req,demand
0,MENG GeoConStruct Engineering,Not Relevent,**Web Registered**,Fall 2013,Spring 2018,CEIE 690,,1
1,MENG GeoConStruct Engineering,Not Relevent,**Web Registered**,Fall 2013,Spring 2018,CEIE 795,,1
2,MENG GeoConStruct Engineering,Not Relevent,**Web Registered**,Fall 2013,Fall 2018,CEIE 623,,1
3,MENG GeoConStruct Engineering,Not Relevent,**Web Registered**,Fall 2013,Fall 2018,CEIE 639,,1
4,MENG GeoConStruct Engineering,Not Relevent,**Web Registered**,Fall 2013,Fall 2018,CEIE 795,,1
...,...,...,...,...,...,...,...,...
20221,MS Telecommunications,Not Relevent,**Web Registered**,Spring 2023,Spring 2023,TCOM 500,Core,2
20222,MS Telecommunications,Not Relevent,**Web Registered**,Spring 2023,Spring 2023,TCOM 514,Core,1
20223,MS Telecommunications,F1 Visa,**Web Registered**,Spring 2023,Spring 2023,TCOM 535,Core,1
20224,MS Telecommunications,Not Relevent,**Web Registered**,Spring 2023,Spring 2023,TCOM 535,Core,2


# Automating the RbCPP
Note: this route is not actually being pursued.

Assumptions:
- Only international students are considered, as they follow a fixed 3-3-3-1 enrollment pattern.
- The core courses are registered within the first two semesters of the degree program.
    - x% of the core courses are registered in the first semester.
    - The remaining (100 - x)% of the core courses are registered in the second semester.
    - The remaining course registrations are elective courses.
- The elective course registrations are similar for the first two semesters.
- The third semester has different enrollment proportions for elective courses than the first two semesters.
- The capstone courses (also core) are registered in the final semester.

Algorithm:
1. Get User Inputs:
    - A program is selected for analysis.
2. Course requirements are extracted for the selected program.\
~~An algorithm defines the Pathway to Graduation for international student course enrollments based on the number of core courses required in the program.~~
3. Get the Course Demand History from either the `Final Snapshot` or the `Final Enrollment Status`.
4. The residual course enrollment is calculated for each semester by subtracting the number of core courses registered in that semester from the total number of courses to register.
    - The residuals are calculated for each semester as follows:
        $$\text{Residual} = \text{Total number of courses to register} - \text{Number of core courses registered in that semester}$$
- Weight Calculation:
    - The weight of each elective course enrolled in the first and second semesters is its share of the total elective enrollments in those semesters.
    - Fall 2021 data is used to calculate the weights.
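
A minimal sketch of the residual and weight calculations, under the assumptions listed above. The 3-3-3-1 pattern comes from the assumptions; the core registrations, the elective demand numbers, and the helper names `residuals` and `elective_weights` are hypothetical. The weight is read here as each elective's share of the total elective demand, which is one interpretation of the description.

```python
import pandas as pd

# Courses to register per semester, following the 3-3-3-1 pattern.
COURSES_PER_SEMESTER = [3, 3, 3, 1]

def residuals(core_registered, plan=COURSES_PER_SEMESTER):
    """Residual = courses to register - core courses registered, per semester."""
    return [total - core for total, core in zip(plan, core_registered)]

def elective_weights(demand):
    """Share of each elective course in the total elective demand."""
    return demand / demand.sum()

# Hypothetical inputs: core courses registered per semester (the capstone is
# counted as core in the final semester) and elective demand for one cohort.
core_registered = [2, 1, 0, 1]
elective_demand = pd.Series({'CS 580': 30, 'CS 584': 50, 'CS 550': 20})

res = residuals(core_registered)
weights = elective_weights(elective_demand)

print(res)
print(weights)
```

The residuals give the number of elective slots left in each semester, and the weights sum to 1 across the elective courses.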