**`Data Pulled at 11:00am 1.8.25 Comparing Those Present In Fall 2024 With Those Enrolled in Spring 2025`**

# Comparing Enrollment Of Previous Semester With Current Semester

The main barometer of the academic advising staff's success is whether *current* students reenroll for the next semester. Marketing and admissions are typically responsible for recruitment; this is not true at every higher-ed institution, but it is true at most. Academic advisors are responsible for making sure students understand school policies, are enrolled in the appropriate classes for their major, and take courses in a strategic order so that prerequisite courses are knocked out first. They also help students learn how to balance personal, professional, and collegiate obligations. Typically, advisors have no direct impact on the intake of *new students*. For that reason, I wrote the following algorithm to take the students who were enrolled in the previous semester and compare them against the students who are enrolled in the current semester. When I first wrote the algorithm several years ago, I quickly realized that we needed to be clear about which previous-semester students we were reaching out to about enrolling in the current semester.
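As a rough illustration of that comparison, the sketch below labels each previous-semester student as 'Enrolled' or 'Not Enrolled' for the upcoming term. The toy data and the helper name `flag_reenrollment` are made up for this example; the notebook's own `enrolled_for_upcoming` helper lives in `processing` and may work differently.

```python
import pandas as pd

def flag_reenrollment(previous: pd.DataFrame, upcoming: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: label previous-term students by upcoming-term enrollment."""
    out = previous.copy()
    # True where the previous-term ID also appears in the upcoming term
    is_enrolled = out['ID'].isin(upcoming['ID'])
    out['ENROLLED'] = is_enrolled.map({True: 'Enrolled', False: 'Not Enrolled'})
    return out

# Toy data (assumed): four continuing students, plus one new student (999)
previous_toy = pd.DataFrame({'ID': [101, 102, 103, 104]})
upcoming_toy = pd.DataFrame({'ID': [102, 104, 999]})

flagged_toy = flag_reenrollment(previous_toy, upcoming_toy)
print(flagged_toy)
```

New students such as ID 999 never appear in the result, because only previous-semester students are labeled.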

**Eligible Students**

Specifically, we needed to be clear about who the *eligible* students from the previous semester were. If I were not clear about that in my report to the Director of Advising and the VP of Enrollment Management, the report would incorrectly communicate that we had, say, 2,700 students from one semester who had not enrolled when in reality we had only 1,500 *eligible* students who had not enrolled. Some students should not be in our list of *eligible* students.

First, many colleges have an office that works exclusively with high school students: an office of High School Partnerships. Because that office handles those students, high school students should be removed from the list; the academic advising staff is not responsible for them. Every college also has a specific set of advisors who work with international students. Given the myriad laws these advisors must know and follow, the general advising staff does not reach out to or interact with these students. Therefore, international students are taken out of the mix. Every semester, some students graduate, and most of them do not continue with the college. So that the outreach list reflects these students' accomplishment, graduates are removed from the eligible students who could enroll in the current semester. Finally, some students have holds on their accounts that prevent them from enrolling in the current semester: AR holds, Academic Suspension, TB Test holds, and VP of Student Services holds. These students are also removed from the eligible students.

Once these filters are run on the previous-semester students, compared against those enrolled for the current semester, the final list can be pulled and distributed to advisors for outreach. This list shows the *eligible* students who were enrolled in the previous semester but have not yet enrolled for the current one; the advising staff uses it to encourage *current* students to reenroll for the new semester. These kinds of targeted outreach efforts have increased the persistence of continuing students by up to 7% in some semesters.
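The eligibility rules above can be sketched as a single boolean mask on a toy table. This is only an illustration under assumptions: the column names `RESCODE`, `STYPE`, `HOLDS`, and `ENROLLED`, the codes 'I' and 'H', and the hold labels mirror the cells later in this notebook, while the `GRADUATED` column, the codes 'R' and 'C', and all of the rows are made up.

```python
import pandas as pd

# Holds that block enrollment (labels as they appear in the holds data)
BLOCKING_HOLDS = ['A/R Hold', 'TB Test Hold', 'Acad Sus Hold', 'Hold-VP Stdt Svcs']

# Toy data (assumed): one row per previous-semester student
toy = pd.DataFrame({
    'ID':        [1, 2, 3, 4, 5, 6],
    'RESCODE':   ['R', 'I', 'R', 'R', 'R', 'R'],   # 'I' = international
    'STYPE':     ['C', 'C', 'H', 'C', 'C', 'C'],   # 'H' = high school
    'HOLDS':     [None, None, None, 'A/R Hold', None, None],
    'GRADUATED': [False, False, False, False, True, False],  # hypothetical flag
    'ENROLLED':  ['Not Enrolled'] * 5 + ['Enrolled'],
})

# A student is on the outreach list only if every condition holds
eligible_mask = (
    (toy['ENROLLED'] == 'Not Enrolled')
    & (toy['RESCODE'] != 'I')
    & (toy['STYPE'] != 'H')
    & ~toy['HOLDS'].isin(BLOCKING_HOLDS)
    & ~toy['GRADUATED']
)
outreach_list = toy[eligible_mask].reset_index(drop=True)
print(outreach_list['ID'].tolist())
```

The notebook applies the same rules one step at a time rather than in one mask, which makes it easier to report how many students each rule removes.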

In [None]:
import pandas as pd
import numpy as np

from pathlib import Path
import os

import openpyxl
import xlsxwriter

from processing import (
    select_term,
    enrolled_for_upcoming,
    final_df
)

import warnings
warnings.filterwarnings('ignore')

## Parameters

In [None]:
# File directory path
DATA_PATH = Path.cwd().parent

# ----------------------------------------------------------------
# Excel file of last five semesters (fall or spring) created with 
# data cleaning and set up of credit hour report for BI reporting
# ----------------------------------------------------------------
SEM_DATAFRAME = 'SP21 - SP25 CrHr Enrollment.xlsx'

# Upcoming semester code (six digit integer code)
UPCOMING_TERM = 202510

# csv of previous semester or semester that is in progress pulled from Argos
PREVIOUS_SEM_ENROLLMENT = '202480 Enrollment.csv'

# Major List
MAJR_LIST = 'Major Descriptions.csv'

# Holds data pulled from Argos. Should be holds for semester that is 
# currently in progress or just completed
HOLDS_DATA = '202480 Holds and Phone Numbers.csv'

# Semester that just completed or is currently in progress graduation data
# This is pulled from Banner through PL/SQL
GRAD_DATA = 'FA24 Graduates.csv'

# Create saved file names
ELIGIBLE_STUDENTS_XLSX_FILE = '202480 Not Enrolled in 202510.xlsx'
ELIGIBLE_STUDENTS_XLSX_FILE_MASTER = '202480 Not Enrolled in 202510 Master DF.xlsx'

In [None]:
# Download last five fall or spring semesters from daily dashboard update
enrollments = (pd.read_excel(f'{DATA_PATH}/{SEM_DATAFRAME}')
                 .rename(columns = str.upper)
              )

In [None]:
# Isolate the unique IDs from Fall or Spring dataframe
unique_enroll = pd.DataFrame(enrollments.groupby('TERMID').first()).reset_index()

# load semester that is about to start ('upcoming')
upcoming = select_term(unique_enroll, UPCOMING_TERM)

# Load previous semester that just ended or is still in progress
previous = (pd.read_csv(f'{DATA_PATH}/{PREVIOUS_SEM_ENROLLMENT}', encoding = 'cp1252')
              .rename(columns = {'STDTNO':'ID', 'CURTRM':'TERM', 'MIDDLE':'MI',
                                'DEGR':'DEGCODE','PROGR':'PROGRAM','STYP':'STYPE',
                                'RESD':'RESCODE'})
           )

#Major Descriptions
majr_desc = pd.read_csv(f'{DATA_PATH}/{MAJR_LIST}')

#Major Description Dictionary
majr_desc_d = dict(zip(list(majr_desc['MAJR']), list(majr_desc['MAJR_DESC'])))

In [None]:
#Combine dataframes
compared_enrollment = (enrolled_for_upcoming(previous, upcoming)
                           .rename(columns = str.upper)
                      )

In [None]:
#Remove International Students
compared_enrollment = (compared_enrollment[compared_enrollment['RESCODE'] != 'I']
                       [['ID', 'TERM', 'FNAME', 'LNAME', 'STYPE', 'DEGCODE', 'MAJR', 'ENROLLED']]
                      )

In [None]:
#Add major descriptions
compared_enrollment['MAJR_DESC'] = [majr_desc_d.get(i) for i in compared_enrollment['MAJR']]
compared_enrollment['MAJR_DESC'] = compared_enrollment['MAJR_DESC'].fillna("WD Before EOT")

In [None]:
#Create dataframe that shows count of number of students
#from previous semester who are enrolled in upcoming semester.
tot_enrolled = (pd.DataFrame(compared_enrollment.groupby('ENROLLED')['ID'].count())
                  .rename(columns = {'ID':'COUNT'})
                  .reset_index()
               )

print('Total # of Students = ', tot_enrolled['COUNT'].sum(), 
      f"\n% of Prev Sem Enrolled = {round((tot_enrolled.iloc[0, 1] / tot_enrolled['COUNT'].sum()) * 100, 2)}\n", sep = "")

print(tot_enrolled)

##    enrolled     count
## 0  Enrolled      4110
## 1  Not Enrolled  2600
##    Total         6710 (no International)

In [None]:
#This data is pulled from the ZSHOLDS in Argos from the current term.
holds = (pd.read_csv(f'{DATA_PATH}/{HOLDS_DATA}')
           .rename(columns = str.upper)
           .rename(columns = {"STUDENTID":"ID"})
        )

final = final_df(compared_enrollment, holds).sort_values('MAJR').reset_index(drop = True)

final = (final[final['ENROLLED'] == 'Not Enrolled']
             .sort_values('MAJR', axis = 0)
             .reset_index(drop = True)
        )
# Of the Fall 2024 students who are not enrolled for Spring 2025, 492
# have some form of hold on their accounts. This leaves 2159 students from
# Fall 2024 who are not enrolled for Spring 2025.
final = final[final['HOLDS'].isin(['A/R Hold', 'TB Test Hold', 'Acad Sus Hold', 'Hold-VP Stdt Svcs']) == False]


In [None]:
# This only produces the correct result if you switch the "final" object in the cell above from == False to == True

#final[final['MAJR'].isin(['CLAR', 'HOMG', 'HORM'])].sort_values('HOLDS').reset_index(drop = True)\
#  .to_csv('ELED, BEST Students With Holds FA23.csv', index = False)

In [None]:
final
Output = None
print("There are " + str(tot_enrolled.iloc[1, 1] - len(final)) + " students with holds on their accounts.")
print(len(final))

### Retrieving Graduates

The graduate filter keeps three status codes: 'AW' ("Awarded"), 'RT' ("Reverse Transfer"), and 'PN' ("Pending"). I added 'PN' to the filter because we are going to assume the majority of pending students will graduate.
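One way to picture the graduate removal is as an anti-join: keep only the students whose ID does not appear among the graduates. The sketch below uses the `indicator` option of `DataFrame.merge` on made-up data; it is an alternative illustration, not the exact steps the notebook takes, and the 'XX' status is a hypothetical non-graduating code.

```python
import pandas as pd

# Toy data (assumed)
students_toy = pd.DataFrame({'ID': [1, 2, 3, 4]})
grads_toy = pd.DataFrame({
    'ID':  [2, 3, 4, 5],
    'STS': ['AW', 'PN', 'XX', 'RT'],   # 'XX' is a made-up status that is not a graduation
})

# Keep only IDs whose status counts as graduating
grad_ids = (grads_toy.loc[grads_toy['STS'].isin(['AW', 'RT', 'PN']), 'ID']
                     .drop_duplicates())

# A left merge with indicator=True marks rows that have no match among the graduates
merged = students_toy.merge(grad_ids.to_frame(), how='left', on='ID', indicator=True)
non_grads_toy = (merged[merged['_merge'] == 'left_only']
                     .drop(columns='_merge')
                     .reset_index(drop=True))
print(non_grads_toy['ID'].tolist())
```

Student 4 stays on the list because the 'XX' status is filtered out before the merge, so only awarded, reverse-transfer, and pending students are removed.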

In [None]:
#Import graduates
grads = (pd.read_csv(f'{DATA_PATH}/{GRAD_DATA}')
           .rename(columns = str.upper)
           .rename(columns = {'STUDENTID':'ID'})
           .query("STS in ['AW', 'RT','PN']")
           [['ID', 'STS']]
        )

#Merge Graduates with final
final = (final.merge(grads, how = 'left', on = 'ID')
              .drop_duplicates('ID')
              .query("STS.isnull()")
              .reset_index(drop = True)
        )

In [None]:
print("There are " + str(len(final[final['STYPE'] == "H"])) + " High School Students unenrolled.")
print("And there are " + str(len(grads)) + " graduates for Fall 2024 that were removed.")

# There are 403 High School Students unenrolled.
# And there are 412 graduates for Fall 2024 that were removed.

In [None]:
# Filter out High School Students
final = final[final['STYPE'] != 'H'].reset_index(drop = True)

In [None]:
print("There are " + str(len(final)) + " students who have no holds, have not graduated, \
and are not high school students that still have not enrolled for Spring 2025.", '\n\n',
     'This means there are ', str(round(len(final)/tot_enrolled['COUNT'].sum() * 100, 2)), '% of Fall students \
that are eligible to reenroll but have not.', sep = '')

# There are 1412 students who have no holds, have not graduated, 
# and are not high school students that still have not enrolled for Spring 2025.

# This means there are 21.04% of Fall students that are eligible to reenroll but have not.

### Remove Unusable Marks

Excel will not export the file if a worksheet name contains a '/' or ':', and each major description becomes a worksheet name. Worksheet names are also limited to 31 characters. Therefore, I replace both characters with a space and abbreviate the longest descriptions, all in a single .replace() call that takes a dictionary of patterns (see the next code cell).
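A more general alternative to hand-written replacements is a small helper that cleans any string into a valid worksheet name. The sketch below is hypothetical and is not what the notebook does: the name `safe_sheet_name` is invented here, and it truncates long names instead of using the readable abbreviations chosen in the cell that follows.

```python
import re

# Characters Excel rejects in worksheet names: \ / * ? : [ ]
INVALID_SHEET_CHARS = re.compile(r'[\\/*?:\[\]]')
MAX_SHEET_NAME_LEN = 31  # Excel's limit on worksheet name length

def safe_sheet_name(name: str) -> str:
    """Hypothetical helper: make a string usable as an Excel worksheet name."""
    # Swap each invalid character for a space, then cut to the length limit
    cleaned = INVALID_SHEET_CHARS.sub(' ', name)
    return cleaned[:MAX_SHEET_NAME_LEN].rstrip()

# Made-up description containing both problem characters
example = safe_sheet_name('Early Childhood: Business/Administration')
print(example)
```

Truncation can make two long descriptions collide on the same name, so hand-picked abbreviations remain the safer choice when several majors share a long prefix.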

In [None]:
final['MAJR_DESC'] = final['MAJR_DESC'].replace(
    {
        r':': ' ',
        r'/': ' ',
        r'Early Childhood Business Administration': 'Early Child Bus Admin',
        r'Secondary Education Social Science': 'Sec Ed Soc Sci',
        r'eSports Digital Events Management': 'eSports'
    }, regex=True)

In [None]:
# Write one worksheet per major description, in alphabetical order.
# The context manager closes (and saves) the workbook automatically.
with pd.ExcelWriter(ELIGIBLE_STUDENTS_XLSX_FILE, engine = 'xlsxwriter') as writer:
    for majr_name, majr_df in final.groupby('MAJR_DESC', sort = True):
        majr_df.to_excel(writer, sheet_name = majr_name, index = False)

In [None]:
# I also want to print off a master sheet to excel that I can merge into the workbook
# with all of the majors split into separate sheets.

final.to_excel(ELIGIBLE_STUDENTS_XLSX_FILE_MASTER, index = False, header=True)


In [None]:
# Filter by major
def filter_by_major(df, column, select):
    
    filt = df[column].isin(select)
    new_df = df[filt].reset_index(drop = True)
    
    return new_df

# Parameters
col = 'MAJR_DESC'
selection = ['Culinary Arts', 'Hotel Management', 'Restaurant Management', 'Cul Arts (ACF Cert  Sous Chef)', 
             'Cul Arts (ACT Cert  Cul)']

# Run function
(filter_by_major(final, col, selection)
     .fillna('')
)