# Comparing Enrollment Of Previous Semester With Current Semester

The main barometer of the academic advising staff's success is whether *current* students reenroll for the next semester. At most, though not all, Higher Ed institutions, marketing and admissions are responsible for recruitment. Academic Advisors are responsible for making sure students understand school policies, are enrolled in the appropriate classes for their major, and take courses strategically so that prerequisites are knocked out first; they also help students learn how to balance personal, professional, and collegiate obligations. Advisors typically have no direct impact on the intake of *new students*. For that reason, I wrote the following algorithm to take the students who were enrolled in the previous semester and compare them against the students who are enrolled in the current semester. When I first wrote the algorithm several years ago, I quickly realized that we needed to be clear about which students from the previous semester we were reaching out to for enrollment in the current semester. 

**Eligible Students**

I realized that we needed to be clear about who the *eligible* students from the previous semester were. If I were not clear about that in my report to the Director of Advising and the VP of Enrollment Management, the report would incorrectly communicate that we had, say, 2,700 students from one semester who had not enrolled when in reality we had only 1,500 *eligible* students who had not enrolled. Some students should not be in our list of *eligible* students. 

First, many colleges have an office that works exclusively with high school students, such as an office of High School Partnerships. Because that office carries out the task, high school students are removed from the list; the academic advising staff is not responsible for them. Second, colleges typically have a specific set of advisors who work with International Students. Given the myriad laws these advisors are aware of and must follow, the general advising staff does not reach out to or interact with these students, so international students are taken out of the mix. Third, some students graduate every semester. Most of the time these students are not continuing with the college, so to make sure the outreach efforts reflect their accomplishment, they are removed from the eligible students who could enroll in the current semester. Finally, some students have holds on their accounts that prevent them from enrolling in the current semester: AR holds, Academic Suspension, TB Test Holds, and VP of Student Services holds. These students are also removed from the eligible students. 

Once these sets of filters are run on the students from the previous semester as compared to those enrolled for the current semester, the final list can be pulled and distributed to advisors for outreach. This list shows the *eligible* students who were enrolled in the previous semester but have not yet enrolled for the current semester; the advising staff uses it to encourage *current* students to reenroll for the new semester. These kinds of targeted outreach efforts have increased the persistence of continuing students by up to 7% in some semesters. 
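
The filters described above can be sketched on a toy dataset. Everything here is hypothetical: the six students, the `GRADUATED` flag, and the single `HOLDS` column are stand-ins, and the code values ('H' for high school, 'I' for international) simply mirror the ones used later in this notebook. It is an illustration of the idea, not the report itself.

```python
import pandas as pd

# Hypothetical students from the previous semester.
previous = pd.DataFrame({
    'ID':        [1, 2, 3, 4, 5, 6],
    'STYPE':     ['C', 'H', 'C', 'C', 'C', 'C'],   # 'H' = high school student
    'RESCODE':   ['R', 'R', 'I', 'R', 'R', 'R'],   # 'I' = international student
    'HOLDS':     ['', '', '', 'A/R Hold', '', ''],  # empty string = no hold
    'GRADUATED': [False, False, False, False, True, False],
})

# Hypothetical IDs already enrolled for the current semester.
upcoming_ids = {6}

# Step 1: keep students from the previous semester who have not reenrolled.
not_enrolled = previous[~previous['ID'].isin(upcoming_ids)]

# Step 2: drop everyone the general advising staff should not contact.
eligible = not_enrolled[
    (not_enrolled['STYPE'] != 'H')
    & (not_enrolled['RESCODE'] != 'I')
    & (not_enrolled['HOLDS'] == '')
    & (~not_enrolled['GRADUATED'])
]

outreach_ids = eligible['ID'].tolist()
print(outreach_ids)
```

Only the students who survive every filter end up on the outreach list, which is why the eligible count can be far smaller than the raw not-enrolled count.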

### Data Pulled At hh:mm on MM.DD.YY For Previous And Current Semester

Best practice is to always record when data is pulled from the database, because the database is dynamic. The timestamp helps answer questions when someone else pulls the data later in the day or on a different day altogether. It also gives me a record of the last time I pulled the data and ran the report. 
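
One way to keep that record consistent is to generate the timestamp in code. The sketch below assumes a hypothetical helper, `pull_stamp`, and a made-up pull time; it formats the time in the hh:mm on MM.DD.YY pattern used in the heading of this section.

```python
from datetime import datetime

def pull_stamp(pulled_at):
    """Format a pull time as 'hh:mm on MM.DD.YY' (24-hour clock)."""
    return pulled_at.strftime('%H:%M on %m.%d.%y')

# Hypothetical pull time, for illustration only.
example_pull = datetime(2024, 1, 8, 14, 30)
print('Data Pulled At ' + pull_stamp(example_pull))
```

In practice, `datetime.now()` could be passed in at the moment the csv files are exported.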

The data is pulled directly from the Banner database, which runs on Oracle, using PL/SQL that I write. The resulting csv files are then uploaded here. 

In [None]:
import pandas as pd
import numpy as np
import openpyxl
import xlsxwriter

In [None]:
upcoming = (pd.read_csv('202380 Enrollment.csv')
              .rename(columns = {'STDTNO':'ID', 'CURTRM':'TERM', 'MIDDLE':'MI',
                                 'DEGR':'DEGCODE','PROGR':'PROGRAM','STYP':'STYPE',
                                 'RESD':'RESCODE'})
           )

previous = (pd.read_csv('202310 Enrollment.csv', encoding = 'cp1252')
              .rename(columns = {'STDTNO':'ID', 'CURTRM':'TERM', 'MIDDLE':'MI',
                                 'DEGR':'DEGCODE','PROGR':'PROGRAM','STYP':'STYPE',
                                 'RESD':'RESCODE'})
           )

#Major Descriptions
majr_desc = pd.read_csv('Major Descriptions.csv')

#Major Description Dictionary
majr_desc_d = dict(zip(list(majr_desc['MAJR']), list(majr_desc['MAJR_DESC'])))

def enrolled_for_upcoming(previous, upcoming):
    #Work on a copy with lowercase column headings so the
    #dataframes passed in are left unchanged.

    df = previous.copy()
    df.columns = [i.lower() for i in df.columns]

    #Compare upcoming semester enrolled to previous semester enrolled, account 
    #for who is missing from upcoming that was enrolled for previous.

    up_ids = upcoming.rename(columns = str.lower)['id']

    #Add enrolled column to the copy {df}. isin() checks every ID in one
    #vectorized pass instead of searching a list once per student.

    df['enrolled'] = np.where(df['id'].isin(up_ids), 'Enrolled', 'Not Enrolled')
    
    return df

In [None]:
#Combine dataframes

df = enrolled_for_upcoming(previous, upcoming)
df.columns = [i.upper() for i in df.columns]

In [None]:
df.head()

In [None]:
#Remove International Students
df = df[df['RESCODE'] != 'I']

In [None]:
df = df[['ID', 'TERM', 'FNAME', 'LNAME', 'STYPE', 'DEGCODE', 'MAJR', 'ENROLLED']].copy()

#Add major descriptions. Majors with no matching description are
#labeled "WD Before EOT".

df['MAJR_DESC'] = df['MAJR'].map(majr_desc_d).fillna("WD Before EOT")

In [None]:
#Create dataframe that shows count of number of students
#from previous semester who are enrolled in upcoming semester.

tot_enrolled = (pd.DataFrame(df.groupby('ENROLLED')['ID'].count())
                  .rename(columns = {'ID':'COUNT'})
                  .reset_index()
               )

tot_enrolled

In [None]:
#This data is pulled from the ZSHOLDS report in Argos for the current term.

phone = pd.read_csv('202310 Holds and Phone Numbers.csv')
phone_cols = [i.upper() for i in list(phone.columns)]
phone.columns = phone_cols
phone = phone.rename(columns = {'STUDENTID':'ID'})

def final_df(main_df, phone):
    
    #The hold columns are referenced in uppercase because all of the
    #phone column headings were converted to uppercase above.
    #Each student gets the first hold that applies, checked in this order.

    conditions = [phone['AR_HOLDS'] == 'Y', phone['ACADSUSHOLDS'] == 'Y',
                  phone['TBHOLDS'] == 'Y', phone['VPSTUDSERVHOLDS'] == 'Y']
    labels = ['A/R Hold', 'Acad Sus Hold', 'TB Test Hold', 'VP Stdt Svcs']

    phone = phone.assign(HOLDS = np.select(conditions, labels, default = ''))

    phone = phone[['ID', 'PRPHONE', 'BRPHONE', 'CARPHONE', 'EMAIL', 'OTHEREMAIL', 'HOLDS']]

    #Left merge keeps every student in main_df, one row per ID.

    final = main_df.merge(phone, how = 'left', on = 'ID').drop_duplicates('ID')\
                   .reset_index(drop = True)
    return final

final = final_df(df, phone)

final = final[final['ENROLLED'] == 'Not Enrolled'].sort_values('MAJR', axis = 0)\
                                                  .reset_index(drop = True)

#Out of the students who are not enrolled from Fall 2023, ### 
#have some form of a hold on their accounts. This leaves #### students from 
#Fall 2023 who are not enrolled for Spring 2024.

#The labels must match the ones assigned in final_df.

final = final[~final['HOLDS'].isin(['A/R Hold', 'TB Test Hold', 'Acad Sus Hold', 'VP Stdt Svcs'])]

In [None]:
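#Look up the 'Not Enrolled' count by label rather than by row position.
not_enrolled_total = tot_enrolled.loc[tot_enrolled['ENROLLED'] == 'Not Enrolled', 'COUNT'].iloc[0]
print(f"There are {not_enrolled_total - len(final)} students with holds on their accounts.")
print(len(final))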

### Retrieving Graduates

For some reason, I was not able to retrieve the ZSGRAD1 report from the Registrar Office > Graduation folder in Firefox. I had to open Microsoft Edge, log into Argos from there, access the folder, set the beginning date to the start of the semester, the end date to the end of the semester, and the effective term to the current term, and then choose "Run" from the "Reports" drop-down. Even though nothing populated in the Argos Web Viewer, the csv had the graduates in it. 

A second note: the filter .isin(['AW', 'RT', 'PN']) includes 'PN', which stands for "Pending," because we assume the majority of these students will graduate. 'AW' is "Awarded" and 'RT' is "Reverse Transfer."
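
As a small sketch of how the status filter and the removal step fit together, consider the toy data below. The IDs are made up, and 'XX' is a hypothetical stand-in for any status outside the filter. This version uses `isin` to drop graduates directly, a slightly different route from the merge used in the next cell but with the same result.

```python
import pandas as pd

# Hypothetical graduation records and outreach list.
grads = pd.DataFrame({'ID': [10, 11, 12, 13], 'STS': ['AW', 'RT', 'PN', 'XX']})
not_enrolled = pd.DataFrame({'ID': [10, 12, 13, 14]})

# Keep only awarded, reverse-transfer, and pending graduates.
counted_as_grads = grads[grads['STS'].isin(['AW', 'RT', 'PN'])]

# Drop those graduates from the outreach list.
remaining = not_enrolled[~not_enrolled['ID'].isin(counted_as_grads['ID'])]
print(remaining['ID'].tolist())
```

Student 13 stays on the list because the 'XX' status is not treated as a graduation, and student 14 stays because there is no graduation record at all.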

In [None]:
#Import graduates

grads = pd.read_csv('FA23 Graduates.csv')
grads.columns = [i.upper() for i in grads.columns]
grads = grads.rename(columns = {'STUDENTID':'ID'})
grads = grads[grads['STS'].isin(['AW', 'RT', 'PN'])]
grads = grads[['ID', 'STS']]

#Merge Graduates with final

final = final.merge(grads, how = 'left', on = 'ID').drop_duplicates('ID')

#There are ### students out of the FA23 cohort that graduated. We remove them with the 
#code below. This leaves #### students who have not enrolled for Spring 2024 but have no holds,
#did not graduate, and are not international students.

final = final[final['STS'].isnull()].reset_index(drop = True)

In [None]:
print("There are " + str(len(final[final['STYPE'] == "H"])) + " High School Students unenrolled.")
print("And there are " + str(len(grads)) + " graduates from the previous semester in the graduation file.")

#There are ### High School Students unenrolled.
#And there are ### graduates from the previous semester in the graduation file.

In [None]:
len(final)

In [None]:
#Filter out High School Students
final = final[final['STYPE'] != 'H'].reset_index(drop = True)

In [None]:
print("There are " + str(len(final)) + " students who have no holds, have not graduated, \
and are not high school students that still have not enrolled for the upcoming semester.")

#There are ### students who have no holds, have not graduated, 
#and are not high school students that still have not enrolled for the upcoming semester.

### Remove Unusable Marks

Excel worksheet names cannot contain a '/' or ':', so a major will not export to its own sheet if either character appears in the major description. Therefore, I replace both characters with a space. 
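
Excel also rejects a few other characters in worksheet names and limits names to 31 characters. As a minimal sketch, not the method used in this notebook, a hypothetical helper named `safe_sheet_name` could handle those cases in one place. The sample description is made up.

```python
import re

def safe_sheet_name(name, max_len=31):
    """Replace characters Excel rejects in sheet names, then trim to the length limit."""
    # Excel does not allow \ / * ? : [ ] in worksheet names.
    cleaned = re.sub(r'[\\/*?:\[\]]', ' ', name)
    return cleaned[:max_len]

# Hypothetical major description, for illustration only.
print(safe_sheet_name('Hotel/Restaurant Mgmt: AAS'))
```

Trimming can make two long descriptions collide on the same sheet name, so the results are worth checking for duplicates before exporting.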


In [None]:
#Replace '/' and ':' with a space in a single pass.
final['MAJR_DESC'] = final['MAJR_DESC'].str.replace(r'[/:]', ' ', regex = True)

In [None]:
# Export data into several worksheets (https://www.easytweaks.com/pandas-save-to-excel-mutiple-sheets/)

# First, split the data into a list of dataframes, one per major description,
# similar to the list you would pass to pd.concat().

dfls = []

for i in sorted(final['MAJR_DESC'].unique()):
    temp = final[final['MAJR_DESC'] == i]
    dfls.append(temp)

# The context manager saves and closes the workbook on exit;
# ExcelWriter.save() was removed in pandas 2.0.

with pd.ExcelWriter('202380 Not Enrolled in 202410.xlsx', engine = 'xlsxwriter') as Excelwriter:
    # The loop variable is major_df so the notebook-level df is not overwritten.
    for major_df in dfls:
        sheet = major_df['MAJR_DESC'].iloc[0]
        major_df.to_excel(Excelwriter, sheet_name = sheet, index = False)

In [None]:
#I also want to print off a master sheet to excel that I can merge into the workbook
#with all of the majors split into separate sheets.

final.to_excel(r'202380 Not Enrolled in 202410 Master DF.xlsx', index = False, header=True)


In [None]:
#filter by major

def filter_by_major(df, column, select):
    
    filt = df[column].isin(select)
    new_df = df[filt].reset_index(drop = True)
    
    return new_df

col = 'MAJR_DESC'

selection = ['Culinary Arts', 'Hotel Management', 'Restaurant Management']

filter_by_major(final, col, selection)