# Major Change Analysis

This was an interesting request from a program Director. One of the things we are always interested in in business analytics is customer retention. Sometimes a customer (in this case, a student at a college) will stay with the organization but switch to a different product (i.e., change majors). Such a customer is not *lost*, so to speak; they have been retained while switching to a new product. For Directors whose jobs may depend partly on retention, it is important to identify the students they brought in who, although they did not stay with the original product, still stayed with the organization overall. Likewise, it is useful to see which products customers *start with* before switching to the Director's product (i.e., which majors students leave in order to switch *to* the Director's majors). This kind of analysis helps reveal trends in which products customers switch away to and which products they switch from, all while staying within the organization.

A secondary part of this analysis was to track customer churn. From one period to the next there will always be some customers lost and some new customers gained. A sloppy analysis might simply count how many customers were in a given pipeline in the previous period and how many are in it this period, then subtract the previous count from the current one to find the total "gain." That is not a proper analysis because it ignores the churn of customers between the two periods. For instance, one set of programs at this college loses about 30 students from the college altogether every semester and gains about 45 new students. The high-level analysis shows a gain of only 15 students, when in reality 45 new students (i.e., customers) arrived. Moreover, if there were a loss of only 20 students but a gain of only 35, it would still look like a net gain of 15, even though new enrollment had declined from 45 to 35. Consequently, both of these analyses are important.
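The arithmetic behind the two scenarios above can be spelled out in a few lines. This is a minimal sketch using the approximate figures from the paragraph; `churn_summary` is a hypothetical helper, not part of the notebook.

```python
def churn_summary(left: int, new: int) -> dict:
    """Hypothetical helper: gross outflow, gross inflow, and the net change they imply."""
    return {"left": left, "new": new, "net_change": new - left}

# Scenario 1: about 30 students leave the college and about 45 new students arrive.
s1 = churn_summary(left=30, new=45)
# Scenario 2: only 20 students leave, but only 35 new students arrive.
s2 = churn_summary(left=20, new=35)

print(s1)  # {'left': 30, 'new': 45, 'net_change': 15}
print(s2)  # {'left': 20, 'new': 35, 'net_change': 15}

# Same net change, yet new enrollment fell by 10 students.
print(s2["new"] - s1["new"])  # -10
```

Both scenarios report a net gain of 15, so only the separate inflow and outflow counts reveal that recruitment dropped.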

In [None]:
import pandas as pd
import numpy as np

In [None]:
# Load the previous and current semester enrollment files
previous = pd.read_csv('202380 Enrollment.csv')
current = pd.read_csv('202410 Enrollment.csv')

# Rename columns to shared names; current-semester fields get a CUR_ prefix
# so they remain distinct after the two semesters are merged
previous = previous.rename(columns={'STDTNO': 'ID', 'DEGR': 'DEGREE', 'CURTRM': 'TERM', 'STYP': 'STYPE',
                                    'RESD': 'RESCODE', 'MIDDLE': 'MI'})

current = current.rename(columns={'STDTNO': 'ID', 'DEGR': 'CUR_DEGREE', 'CURTRM': 'CUR_TERM', 'STYP': 'CUR_STYPE',
                                  'RESD': 'CUR_RESCODE', 'MIDDLE': 'MI'})

# Upper-case all column headings so both files follow the same convention
previous.columns = [c.upper() for c in previous.columns]
current.columns = [c.upper() for c in current.columns]
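By default `DataFrame.rename` ignores keys it cannot find, so if an export ever arrives with different header casing or a renamed field, the calls above leave the old names in place without complaint. Below is a minimal sketch of a stricter alternative; `harmonize_columns` is a hypothetical helper, and the lowercase toy headers are an assumption for illustration. It upper-cases headers *before* renaming, so the mapping keys always match, and raises an error when an expected column is missing.

```python
import pandas as pd

def harmonize_columns(df, mapping, required):
    """Hypothetical helper: upper-case headers, rename them, and verify required columns."""
    out = df.copy()
    # Upper-case first so the (upper-case) mapping keys always match.
    out.columns = [str(c).strip().upper() for c in out.columns]
    out = out.rename(columns=mapping)
    missing = [c for c in required if c not in out.columns]
    if missing:
        raise KeyError(f"Missing expected columns: {missing}")
    return out

# Toy export with lowercase headers (an assumption for illustration).
raw = pd.DataFrame({"stdtno": [1, 2], "curtrm": [202380, 202380], "majr": ["CUL", "0"]})
prev = harmonize_columns(raw, {"STDTNO": "ID", "CURTRM": "TERM"}, ["ID", "TERM", "MAJR"])
print(list(prev.columns))  # ['ID', 'TERM', 'MAJR']
```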

In [None]:
# Load table of major codes with their descriptions
majr_desc = pd.read_csv('Major Descriptions.csv')

# Create dictionary of majors with their descriptions
d_majrs = dict(zip(majr_desc['MAJR'], majr_desc['MAJR_DESC']))

# Map each major code in the previous and current semesters to its
# description; code '0' means the student is undeclared
previous['MAJR_DESC1'] = ['Undeclared' if i == '0' else d_majrs.get(i) for i in previous['MAJR']]

current['CUR_MAJR_DESC1'] = ['Undeclared' if i == '0' else d_majrs.get(i) for i in current['MAJR']]

# Select only the columns needed for the analysis
current = current[['ID', 'CUR_TERM', 'LNAME', 'FNAME', 'MI', 'CUR_STYPE', 'CUR_MAJR_DESC1']]

previous = previous[['ID', 'TERM', 'LNAME', 'FNAME', 'MI', 'STYPE', 'MAJR_DESC1']]
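The list comprehensions above can also be written with vectorized pandas calls. This is a sketch on toy data (the codes `CUL`, `HTL`, and `XYZ` are made up). It assumes the `MAJR` column is read as text, which `pd.read_csv(..., dtype={'MAJR': str})` would guarantee; otherwise the comparison with `'0'` never matches. Unknown codes become `NaN` instead of `None`, and both count as missing for `isnull()`.

```python
import pandas as pd

# Hypothetical lookup table; the real codes come from 'Major Descriptions.csv'.
majr_desc = pd.DataFrame({"MAJR": ["CUL", "HTL"],
                          "MAJR_DESC": ["Culinary Arts", "Hotel Management"]})
d_majrs = dict(zip(majr_desc["MAJR"], majr_desc["MAJR_DESC"]))

codes = pd.Series(["CUL", "0", "HTL", "XYZ"])

# Look up each code; codes missing from the dictionary become NaN.
desc = codes.map(d_majrs)
# Keep the lookup where the code is not '0'; otherwise use 'Undeclared'.
desc = desc.where(codes != "0", "Undeclared")
print(desc.tolist())  # ['Culinary Arts', 'Undeclared', 'Hotel Management', nan]
```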

**Creating a Dataframe of Previous and Current Semesters**

In [None]:
# Merge the semesters: the left join keeps every previous-semester student,
# and SAME_MAJR flags students whose major did not change
newdf = previous.merge(current, how = 'left', on = ['ID', 'LNAME', 'FNAME', 'MI'])
newdf['SAME_MAJR'] = newdf['MAJR_DESC1'] == newdf['CUR_MAJR_DESC1']

# The function below filters to the selected majors, compares each student's
# previous-semester major with their current-semester major, and returns a
# pivot table showing where those students went.

def new_majr(df, majr_lst):
    """
    df (pd.DataFrame): The previous semester left joined with the current semester (newdf).
    majr_lst (list): List of major names by which you wish to filter.

    Returns a table with current majors as rows, previous majors as columns, and a TOTAL
    column. Missing current-semester values (chiefly students who did not return) are
    labeled 'Did Not Return'.
    """
    filt_majr = df[df['MAJR_DESC1'].isin(majr_lst)].reset_index(drop=True).copy()

    # Students absent from the current semester have no current-semester values
    cur_cols = ['CUR_TERM', 'CUR_STYPE', 'CUR_MAJR_DESC1']
    filt_majr[cur_cols] = filt_majr[cur_cols].fillna('Did Not Return')

    filt_majr = (filt_majr.groupby(['MAJR_DESC1', 'CUR_MAJR_DESC1'])['ID'].count()
                 .reset_index(name='COUNT'))

    filt_majr = filt_majr.pivot_table(values='COUNT', index='CUR_MAJR_DESC1', columns='MAJR_DESC1')

    # Combinations with no students come out as NaN; report them as 0
    filt_majr = filt_majr.fillna(0).astype(int)

    filt_majr['TOTAL'] = filt_majr.sum(axis=1)

    return filt_majr

# The function below counts current-semester students in the selected majors,
# broken down by the major they held in the previous semester. It shows which
# majors students SWITCHED TO the selected majors from, whereas new_majr shows
# which majors students SWITCHED AWAY to.

def switched_to_majrs(df, majr_lst):
    """
    df (pd.DataFrame): The *current* semester left joined with the previous semester (unlike
                       newdf, which left joins the *previous* semester with the current one).
    majr_lst (list): The same list of majors used by the other functions; the majors by which
                     you wish to filter.
    """
    # Work on a copy so the caller's dataframe is not modified
    df = df.copy()

    # Students with no previous-semester record are new to the college
    prev_cols = ['MAJR_DESC1', 'STYPE', 'TERM']
    df[prev_cols] = df[prev_cols].fillna('New Students')

    df['SAME_MAJR'] = df['CUR_MAJR_DESC1'] == df['MAJR_DESC1']

    filt_majr = df[df['CUR_MAJR_DESC1'].isin(majr_lst)].reset_index(drop=True)
    filt_majr = (filt_majr.groupby(['CUR_MAJR_DESC1', 'MAJR_DESC1'])['ID'].count()
                 .reset_index(name='COUNT'))

    filt_majr = filt_majr.pivot_table(values='COUNT', index='MAJR_DESC1', columns='CUR_MAJR_DESC1')

    # Combinations with no students come out as NaN; report them as 0
    filt_majr = filt_majr.fillna(0).astype(int)

    filt_majr['TOTAL'] = filt_majr.sum(axis=1)

    return filt_majr
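Both functions above build a two-way count table from the merged data. As a sketch of an alternative (not the notebook's method), `pd.crosstab` produces the same kind of table in one call, and its `margins` option adds the TOTAL row and column at once. The toy records below are made up for illustration.

```python
import pandas as pd

# Hypothetical merged records: previous major and current major
# (None = no current-semester record).
df = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5],
    "MAJR_DESC1": ["Culinary Arts", "Culinary Arts", "Hotel Management",
                   "Culinary Arts", "Hotel Management"],
    "CUR_MAJR_DESC1": ["Culinary Arts", "Nursing", None, None, "Culinary Arts"],
})
majors = ["Culinary Arts", "Hotel Management"]

# Switched-away direction: rows are where students went, columns where they started.
sub = df[df["MAJR_DESC1"].isin(majors)]
table = pd.crosstab(
    sub["CUR_MAJR_DESC1"].fillna("Did Not Return"),
    sub["MAJR_DESC1"],
    margins=True,          # add a totals row and column...
    margins_name="TOTAL",  # ...labeled TOTAL instead of the default 'All'
)
print(table)
```

For the switched-to direction, swap the roles: filter on `CUR_MAJR_DESC1`, fill missing `MAJR_DESC1` values with `'New Students'`, and pass the previous major as the row index.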

In [None]:
# Count previous-semester students in the selected majors who are absent from the current semester

def left_college(df, majr_lst, curr_year):
    """
    df (pd.DataFrame): The previous semester left joined with the current semester (newdf).
    majr_lst (list): List of majors by which you wish to filter.
    curr_year (int): Six-digit term code of the current semester (e.g. 202410).
    """
    # A student who left has no current-semester record, so CUR_TERM is missing
    # (and therefore differs from the current term code)
    left = df[(df['SAME_MAJR'] == False) & (df['CUR_TERM'] != curr_year)]

    filt_majr2 = left[left['MAJR_DESC1'].isin(majr_lst)].reset_index(drop=True)

    filt_majr2 = (filt_majr2.groupby('MAJR_DESC1')['ID'].count()
                  .reset_index(name='COUNT')
                  .rename(columns={'MAJR_DESC1': 'LEFT COLLEGE'}))
    return filt_majr2
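`left_college` infers leavers from missing current-semester values after a left join. A sketch of another way to label every student in one pass uses `merge(..., how='outer', indicator=True)`, which adds a `_merge` column recording which side each row came from. The toy rosters are hypothetical, and merging on `ID` alone is an assumption: the notebook also merges on name columns, so a student whose name changed between terms would show up there as both a leaver and a new student.

```python
import pandas as pd

# Hypothetical toy rosters for two terms.
previous = pd.DataFrame({"ID": [1, 2, 3],
                         "MAJR_DESC1": ["Culinary Arts", "Hotel Management", "Culinary Arts"]})
current = pd.DataFrame({"ID": [1, 4],
                        "CUR_MAJR_DESC1": ["Nursing", "Culinary Arts"]})

# indicator=True adds '_merge': 'left_only' rows exist only in the previous term,
# 'right_only' rows only in the current term, and 'both' rows in both.
status = previous.merge(current, on="ID", how="outer", indicator=True)
status["STATUS"] = status["_merge"].astype(str).map(
    {"left_only": "Left College", "right_only": "New Student", "both": "Returned"}
)
print(status[["ID", "MAJR_DESC1", "CUR_MAJR_DESC1", "STATUS"]])
```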


In [None]:
# newdf2 compares *current* semester enrollees with previous-semester enrollees
newdf2 = current.merge(previous, how='left', on=['ID', 'LNAME', 'FNAME', 'MI'])
newdf2['SAME_MAJR'] = newdf2['CUR_MAJR_DESC1'] == newdf2['MAJR_DESC1']

# Count current-semester students in the selected majors who were not enrolled last semester
def new_student(df, majr_lst, prev_year):
    """
    df (pd.DataFrame): The *current* semester left joined with the previous semester (newdf2).
    majr_lst (list): List of majors by which you wish to filter.
    prev_year (int): Six-digit term code of the previous semester (e.g. 202380).
    """
    # A new student has no previous-semester record, so TERM is missing
    # (and therefore differs from the previous term code)
    new = df[(df['SAME_MAJR'] == False) & (df['TERM'] != prev_year)]

    filt_majr3 = new[new['CUR_MAJR_DESC1'].isin(majr_lst)]

    filt_majr3 = (filt_majr3.groupby('CUR_MAJR_DESC1')['ID'].count()
                  .reset_index(name='COUNT')
                  .rename(columns={'CUR_MAJR_DESC1': 'NEW STUDENTS'}))
    return filt_majr3
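Together with the switching tables, `left_college` and `new_student` cover every flow behind the churn discussion in the introduction. The sketch below, a hypothetical `enrollment_flow` helper run on made-up records, checks that the flows add up: current headcount equals previous headcount minus leavers and switchers-out, plus switchers-in and new students. It assumes one row per student per term.

```python
import pandas as pd

def enrollment_flow(previous, current, majors):
    """Hypothetical helper: decompose the headcount change for a group of majors.

    Assumes one row per student per term, keyed by 'ID', with the major in
    'MAJR_DESC1' (previous term) and 'CUR_MAJR_DESC1' (current term).
    """
    prev_in = previous[previous["MAJR_DESC1"].isin(majors)]
    cur_in = current[current["CUR_MAJR_DESC1"].isin(majors)]
    prev_ids, cur_ids = set(previous["ID"]), set(current["ID"])

    left = int((~prev_in["ID"].isin(cur_ids)).sum())   # left the college
    new = int((~cur_in["ID"].isin(prev_ids)).sum())     # new to the college
    # Returning students who moved out of, or into, the group of majors.
    switched_out = int((prev_in["ID"].isin(cur_ids) & ~prev_in["ID"].isin(cur_in["ID"])).sum())
    switched_in = int((cur_in["ID"].isin(prev_ids) & ~cur_in["ID"].isin(prev_in["ID"])).sum())

    flow = {"previous": len(prev_in), "left": left, "switched_out": switched_out,
            "switched_in": switched_in, "new": new, "current": len(cur_in)}
    # Sanity check of the accounting identity.
    assert flow["current"] == flow["previous"] - left - switched_out + switched_in + new
    return flow

previous = pd.DataFrame({"ID": [1, 2, 3, 4, 5],
                         "MAJR_DESC1": ["Culinary Arts", "Culinary Arts", "Hotel Management",
                                        "Nursing", "Culinary Arts"]})
current = pd.DataFrame({"ID": [1, 2, 4, 6, 7],
                        "CUR_MAJR_DESC1": ["Culinary Arts", "Nursing", "Hotel Management",
                                           "Culinary Arts", "Hotel Management"]})
flow = enrollment_flow(previous, current, ["Culinary Arts", "Hotel Management"])
print(flow)
```

In this toy case the headcount is unchanged (4 and 4), yet three students moved out and three moved in, which is exactly the pattern a net-change-only view would hide.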

In [None]:
# Exploration: list the new students in the selected majors (previous term 202380)
new = newdf2[(newdf2['SAME_MAJR'] == False) & (newdf2['TERM'] != 202380)]

filt_majr3 = new[new['CUR_MAJR_DESC1'].isin(['Culinary Arts', 'Hotel Management', 'Restaurant Management'])].reset_index(drop=True)

filt_majr3


In [None]:
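# Exploration: list the students in the selected majors who left the college (current term 202410)
left = newdf[(newdf['SAME_MAJR'] == False) & (newdf['CUR_TERM'] != 202410)]

filt_majr2 = left[left['MAJR_DESC1'].isin(['Culinary Arts', 'Hotel Management', 'Restaurant Management'])]\
                                        .reset_index(drop=True)

# Sanity check: no student should count both as new and as having left (this can
# happen if, e.g., a student's name changed and the merge keys no longer match).
# Each new student's ID is echoed if it also appears among the leavers, otherwise 'Not Present'.
left_ids = set(filt_majr2['ID'])
present = [i if i in left_ids else 'Not Present' for i in filt_majr3['ID']]

present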


In [None]:
# Top 50 current majors by number of enrolled students

pd.DataFrame(newdf2.groupby('CUR_MAJR_DESC1')['ID'].count()).rename(columns = {'ID':'COUNT'}).reset_index()\
                                                   .sort_values('COUNT', ascending = False)[:50].reset_index(drop = True)

In [None]:
# Type in the majors you want to filter
majors = ['Culinary Arts', 'Hotel Management', 'Restaurant Management']

# newdf keeps every previous-semester student (previous left joined with current),
# so students who left have no current-semester values

# curr_year must be the *current* semester's term code
left_college(newdf, majors, 202410)

In [None]:
# newdf2 keeps every current-semester student (current left joined with previous),
# so new students have no previous-semester values

# prev_year must be the *previous* semester's term code
new_student(newdf2, majors, 202380)

In [None]:
# Where previous-semester students in the selected majors went, with a TOTAL row
n = new_majr(newdf, majors)
n.loc['TOTAL'] = n.sum(axis=0)
n

In [None]:
# Which majors current-semester students in the selected majors came from, with a TOTAL row
sm_mjr = current.merge(previous, how='left', on=['ID', 'LNAME', 'FNAME', 'MI'])
n2 = switched_to_majrs(sm_mjr, majors)
n2.loc['TOTAL'] = n2.sum(axis=0)
n2