# Retention calculator, simplifying retention for academic institutions

## Author : Eric Ríos Soderman 
## Introduction to the Problem: Time and Human Resources

#### Greetings. I designed this project to address the difficulty that various academic units face in calculating cohort retention under time and resource constraints. For a quick overview of the terminology, a cohort is a group of students defined by a set of specific criteria and usually identified by entry year. A cohort's retention is evaluated by counting the members of the cohort who enrolled in the following semester and/or graduated during the academic year. That number of retained members is then divided by the cohort's total headcount.
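
#### As a minimal sketch of the arithmetic just described, the snippet below computes a retention percentage from three counts. The function name and its parameters are illustrative only and are not part of the tool; the counts are taken from the mock results shown later in this notebook.

```python
def retention_rate(enrolled, graduated, cohort_headcount):
    """Return the percentage of a cohort retained in a given term."""
    retained = enrolled + graduated  # enrolled and graduated members both count as retained
    return (retained / cohort_headcount) * 100


# 74 members enrolled and 17 graduated, out of a cohort of 100
print(retention_rate(74, 17, 100))
```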

#### As for the problem itself, there are four statisticians in the institution, spread across three different units. One of the more difficult reports to generate is the retention data for 7 or 8 cohorts along a timeline. Despite using a combination of Microsoft Excel and Access, it remains a very time-consuming task: it takes 2 weeks to a full month to identify the semesters in which the students of multiple cohorts enrolled or graduated, all while juggling our other reports. The second problem is a lack of human resources. Since we statisticians are spread across different units, we don't have the flexibility to help each other when we work on this report.

## Summary of Solution: Retention Calculator

#### First, the user's cohort file is turned into a list of lists. Then a CohortCenter class is instantiated, and its attributes record the positions of the student identification and cohort columns in the user's cohort file. The following steps, which depend on user-submitted enrollment and graduation data, record a 1, 0, or 2 in the user's cohort file for enrolled, not enrolled, and graduated respectively. In addition, the user names the terms of the retention timeline and defines the graduation dates for them in a graduation dictionary. The reasoning is to provide flexibility; for example, my institution has two graduation dates for the first academic term, September and December or January. The final step is the calculator itself. It uses a nested index finder function and a dictionary-based function to tally the numbers in the column that corresponds to a given term. The calculator then uses these functions in tandem to keep updating a final dictionary with all the tallies for a given cohort's terms. Lastly, the user has several options: seeing the results as raw numbers or percentages, separating or combining the graduated and enrolled tallies, and generating the data for a single cohort or for multiple cohorts.

#### With this tool, my goal is for the retention report to be a matter of hours, not days or weeks.


### Step 1: Supply cohort student list(s) as csv file(s)

In [1]:
#This function will turn a csv file into a list of lists. It is assumed the file has a header.

from csv import reader

def csv_to_list_opener(dataset_csv):
    with open(dataset_csv, newline='') as opened_file: #The file is closed automatically after reading
        read_file = reader(opened_file)
        list_mode = list(read_file)
    return list_mode
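
#### To illustrate the list-of-lists shape this step produces, here is a small sketch that reads csv text from memory instead of from a file. The sample text is made up for the illustration; the point is that the first inner list is the header and every value arrives as a string.

```python
import csv
import io

# In-memory stand-in for a small cohort file (hypothetical content)
sample_csv = io.StringIO("Example Student ID,Cohort_Yr\n1,2012\n2,2012\n")

rows = list(csv.reader(sample_csv))

header = rows[0]    # first inner list holds the column names
records = rows[1:]  # remaining inner lists hold one student each

print(header)
print(records)
```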

In [2]:
#The purpose of this class's method will be to record the future start point of the first semester in the timeline.
#The second purpose is to record the indexes of the student_id_column and the cohort_identifier_column.

class CohortCenter ():
    
    def __init__ (self, cohort_list_form, number_of_cohort_identifier_column, number_of_student_id_column):
        
        self.start = len(cohort_list_form[0])
        
        self.cohort_identifier = number_of_cohort_identifier_column - 1 #My target user knows nothing about python.
        
        self.sid_column_index = number_of_student_id_column - 1 #Subtracting 1 will help calculate the index number.
   

In [3]:
#Here is my example mock-data
#The user will have the flexibility to rename the file, and it will be mentioned to them in markdown.

test_cohort = csv_to_list_opener('Cohort2012.csv')

In [4]:
#This part is to show the first few rows

for row in test_cohort[:6]:
    print(row)
    print('\n')

['Example Student ID', 'Cohort_Yr']


['1', '2012']


['2', '2012']


['3', '2012']


['4', '2012']


['5', '2012']




In [5]:
#The user places the cohort's name inside the CohortCenter's parentheses, same as the one used for the csv function
#This later becomes crucial for the retention calculator itself

retention_marker = CohortCenter (test_cohort,2,1)

retention_start_point = retention_marker.start #For the retention calculator function

cohort_point = retention_marker.cohort_identifier #This is for a file with multiple cohorts

sid_point = retention_marker.sid_column_index #For semester attacher function

In [6]:
print(retention_start_point)
print(cohort_point)
print(sid_point)

2
1
0


### Step 2: Supply enrollment data

#### The semester_attacher function isolates the enrollment_id in a list and verifies if a given cohort record's student_id exists in said list. If it exists, the value appended under the term the user defines is a '1', indicating a term in which a student enrolled. A '0' is appended when the opposite is true, a student who is not enrolled. 

In [7]:
def semester_attacher(dataset_csv_with_header, semester_name_assigned_by_user, student_id_column_in_enrollment_data,
                      your_cohort_file, student_id_column_in_your_cohort = sid_point):
    
    semester_data = csv_to_list_opener(dataset_csv_with_header)
    
    your_cohort_file[0].append(semester_name_assigned_by_user) # The user names the semester to be attached
    
    semester_data_no_header = semester_data[1:] 
    
    new_semester_data_with_id_only = [] #Isolate all student ids to make a search easier
    
    student_index = 1 # The index of the first student record in the cohort data will be 1, 0 being the header
    
    
    for row in semester_data_no_header:
        
        student_id_enrollment = row[student_id_column_in_enrollment_data - 1] #SID location in semester file
        
        new_semester_data_with_id_only.append(student_id_enrollment)
            
    
    for row in your_cohort_file[1:]:
    
        
        student_id = row[student_id_column_in_your_cohort] # sid_point from the CohortCenter instantiation
        
        #A membership test also copes with enrollment files containing duplicate ids
        if student_id in new_semester_data_with_id_only:
            your_cohort_file[student_index].append(1) 
            
        else:
            your_cohort_file[student_index].append(0)
                
        student_index+=1 # If someone enrolled, a '1' is appended to the cohort record thanks to the student_index
                         #Then a student_index increases in increments of 1.
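
#### For larger enrollment files, the same marking idea can be sketched with a set, which makes each lookup fast and ignores duplicate ids. This is only an illustrative variant: the function name `mark_enrollment` and its parameters are assumptions, and it takes the enrollment ids as a plain list rather than reading a csv file.

```python
def mark_enrollment(cohort_rows, enrolled_ids, term_name, sid_index=0):
    """Append the term to the header and a 1 or 0 to every student record."""
    enrolled = set(enrolled_ids)         # duplicates collapse automatically
    cohort_rows[0].append(term_name)     # the header gains the new term
    for row in cohort_rows[1:]:
        row.append(1 if row[sid_index] in enrolled else 0)
    return cohort_rows


cohort = [['Example Student ID', 'Cohort_Yr'],
          ['1', '2012'], ['2', '2012'], ['3', '2012']]

# Student 3 appears twice in the enrollment ids but is marked only once
mark_enrollment(cohort, ['1', '3', '3'], '12-A')

for row in cohort:
    print(row)
```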

In [8]:
#Running the semester_attacher with mock example files

#semester_attacher(dataset_csv_with_header, semester_name_assigned_by_user, student_id_column_in_enrollment_data,
#your_cohort_file) 

semester_attacher('Cohort2012 - 12A.csv','12-A',1,test_cohort)

semester_attacher('Cohort2012 - 12B.csv','12-B',1,test_cohort)

semester_attacher('Cohort2012 - 13A.csv','13-A',1,test_cohort)

semester_attacher('Cohort2012 - 13B.csv','13-B',1,test_cohort)

In [9]:
for row in test_cohort[:11]:
    print(row)
    print('\n')

['Example Student ID', 'Cohort_Yr', '12-A', '12-B', '13-A', '13-B']


['1', '2012', 1, 1, 1, 0]


['2', '2012', 1, 1, 1, 0]


['3', '2012', 1, 1, 0, 0]


['4', '2012', 1, 1, 0, 0]


['5', '2012', 1, 1, 0, 0]


['6', '2012', 1, 1, 0, 0]


['7', '2012', 1, 1, 0, 0]


['8', '2012', 1, 1, 0, 0]


['9', '2012', 1, 1, 0, 0]


['10', '2012', 1, 1, 0, 1]




#### One of the ways in which I made sure the semester_attacher function was working properly consisted of remembering that every cohort is enrolled as a whole typically in a first year or semester. Since the members of my test_cohort were all enrolled in the first semester, all of the values corresponding to '12-A' should be '1'.
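
#### That sanity check can also be automated. The sketch below uses a hypothetical helper, `column_is_all_ones`, and a few made-up rows in the same shape as the cohort file; it confirms that every record holds a 1 under a given term.

```python
def column_is_all_ones(cohort_rows, term_name):
    """Return True when every student record holds a 1 under term_name."""
    column = cohort_rows[0].index(term_name)  # position of the term in the header
    return all(row[column] == 1 for row in cohort_rows[1:])


sample = [['Example Student ID', 'Cohort_Yr', '12-A', '12-B'],
          ['1', '2012', 1, 1],
          ['2', '2012', 1, 0]]

print(column_is_all_ones(sample, '12-A'))  # every member enrolled in the first term
print(column_is_all_ones(sample, '12-B'))  # one member did not return
```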

### Step 4: Define the graduation timelines 

#### This step is where the user defines the graduation timelines in Python dictionaries. The keys are the semester codes used for the semester_attacher function, and the values are the graduation terms, dates, or codes as strings; dates are the most common choice. Having the user add the dates manually accommodates different institutional criteria for assigning degree conferred dates, whether at 2-year or 4-year institutions.

In [10]:
#The user adds semester codes used in semester_attacher function as the dictionary keys.
#The dates conferred or the graduation codes will be values stored as strings.
#Three slots should accommodate most graduation date or code variations for a given term.

graduation_time_slots_1 = {'12-B' : '5/10/2013', '13-A' : '12/10/2013', '13-B' : '5/10/2014'}

graduation_time_slots_2 = {'12-B' : '6/10/2013','13-A' : '12/15/2013'}

graduation_time_slots_3 = {'13-B' : '6/10/2014'} #For example, my institution has two dates for the first semester.
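
#### One way to picture how the three dictionaries work together is to flip them into a single date-to-term lookup. The sketch below is only an illustration and not how the tool stores the data; the name `date_to_term` is an assumption. It shows that several dates can point to the same term.

```python
graduation_time_slots_1 = {'12-B': '5/10/2013', '13-A': '12/10/2013', '13-B': '5/10/2014'}
graduation_time_slots_2 = {'12-B': '6/10/2013', '13-A': '12/15/2013'}
graduation_time_slots_3 = {'13-B': '6/10/2014'}

date_to_term = {}
for slots in (graduation_time_slots_1, graduation_time_slots_2, graduation_time_slots_3):
    for term, date in slots.items():
        date_to_term[date] = term  # a date listed under two terms would keep the last one

print(date_to_term['5/10/2013'])
print(date_to_term['6/10/2013'])
print(len(date_to_term))
```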


### Step 5: Supply graduation data and run the graduation_marker function 

#### The following function "graduation_marker" focuses on identifying the term in which a cohort member graduates and changing the corresponding value of that point in time to 2. The general idea is now becoming clearer. A retention timeline of sorts is being created. 

#### A nested function called "graduation_roster_creator" first focuses on isolating the student id, the graduation date or code, and the dictionary key from graduation_time_slots_1/2/3 in a separate list. Each graduate record will have a timeline stamp of sorts, which is the dictionary key. Since the user defines both the terms (keys) and the dates or codes of graduation (values), a second nested function called "graduation_semester_index_finder" will take these keys and compare them to the semesters in the header. Upon finding a matching term, it will return the index number of that term (key) in the header. 

#### The logic of the entire function is the following. If Alex is in the custom graduate roster (thanks to "graduation_roster_creator") and happens to graduate in the term '12-B', the "graduation_semester_index_finder" function will return the index number of '12-B' in the header. This indicates the correct column to be updated for the given cohort record. In this case, it would be index number 3, and the value stored at that position in Alex's row will be updated to 2, denoting a graduation in that term.
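
#### The update can be traced by hand on a single record. The sketch below uses the header from the mock cohort file, where '12-B' sits at index 3, and a made-up row for Alex; it uses the built-in `list.index` in place of the nested finder function.

```python
header = ['Example Student ID', 'Cohort_Yr', '12-A', '12-B', '13-A', '13-B']
alex = ['1', '2012', 1, 1, 1, 0]   # enrolled in 12-A, 12-B and 13-A

graduation_term = '12-B'           # the dictionary key stamped on Alex's graduate record
column = header.index(graduation_term)

alex[column] = 2                   # 2 marks a graduation in that term

print(column)
print(alex)
```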

In [11]:
#The function that will cull graduation data in a specific format and use it to update the cohort file's values


def graduation_marker (dataset_csv_with_header, your_cohort_file, student_id_column_in_graduation_data, 
                       graduation_date_column, graduation_time_slots_A = graduation_time_slots_1, 
                       student_id_column_in_your_cohort = sid_point,
                       second_graduation_slots = False, graduation_time_slots_B = graduation_time_slots_2,
                       third_graduation_slots = False, graduation_time_slots_C = graduation_time_slots_3):
    
#Phase 1: Preparing for the creation of custom graduation data for ease of use
    
    student_id_index_in_user_cohort = student_id_column_in_your_cohort #sid_point from CohortCenter method
    
    student_id_index_in_graduation_data = student_id_column_in_graduation_data - 1 
    
    graduation_date_index = graduation_date_column - 1
    
    graduation_data = csv_to_list_opener(dataset_csv_with_header)
    
    graduation_data_no_header = graduation_data[1:]
    
    graduation_time_list = [] #Will hold most important data from graduation file for identification of graduates
    
    
    def graduation_roster_creator (graduation_data_no_header, student_id_in_graduation_data, 
                                   graduation_date_index_number, graduation_time_slots):
        
        for key in graduation_time_slots:
            
            for row in graduation_data_no_header:
                
                student_id = row[student_id_in_graduation_data]
            
                graduation_date = row[graduation_date_index_number]
                
                if graduation_date == graduation_time_slots[key]:
                    
                    graduation_time_list.append([key, student_id, graduation_date])
                    
        return graduation_time_list #Creates a custom graduation record list with the essentials only
    

#Phase 2: Running the function to create a custom graduation roster from user defined parameters

    graduation_roster_creator(graduation_data_no_header, student_id_index_in_graduation_data, graduation_date_index, 
                              graduation_time_slots_A)
    
    if second_graduation_slots: #Covers for alternate degree dates allotted during the same semester
        
        graduation_roster_creator(graduation_data_no_header, student_id_index_in_graduation_data, graduation_date_index, 
                                  graduation_time_slots_B)
        
    if third_graduation_slots: #Up to three date variation scenarios per semester may cover most potential needs
        
        graduation_roster_creator(graduation_data_no_header, student_id_index_in_graduation_data, graduation_date_index, 
                                  graduation_time_slots_C) #All three graduation_time_slots were given default values
        
        
#Phase 3: Writing a function to aid the update of the right values to '2' based on the correct semester index number

    def graduation_semester_index_finder (a_list_with_header,semester_code):
        
        header = a_list_with_header[0]
        
        target_semester_index = []
        
        for i, element in enumerate(header):
            
            if element == semester_code:
                
                target_semester_index.append(i) #Appends the index
        
        return target_semester_index[0] #Assumes target_index occurs once and calls the value at index 0
    
#Phase 4: Writing the update itself
    
    for row in your_cohort_file[1:]:
        
        sid = row [student_id_index_in_user_cohort] 
        
        for graduation_record in graduation_time_list:
            
            semester_code_key = graduation_record[0] 
            graduate_id = graduation_record[1]
            
            if sid == graduate_id:
                
                row[graduation_semester_index_finder(your_cohort_file,semester_code_key)] = 2
                
            #The finder will return the index of the semester code in the cohort file.
            #Then the value at that given row's index is updated to become 2, to denote a person who graduated at 
            #that point in time.
    

In [12]:
#For this example, I will use all three graduation time slots.
#The user provides the file, the column numbers, and the values as true if necessary.

graduation_marker ('Grad Data All.csv', test_cohort, 1, 2, second_graduation_slots = True,
                   third_graduation_slots = True)


#Arguments

#graduation_marker (dataset_csv_with_header, your_cohort_file, student_id_column_in_graduation_data, 
#graduation_date_column, graduation_time_slots_A = graduation_time_slots_1, 
#student_id_column_in_your_cohort = sid_point,
#second_graduation_slots = False, graduation_time_slots_B = graduation_time_slots_2,
#third_graduation_slots = False, graduation_time_slots_C = graduation_time_slots_3)

In [13]:
#Now to observe the results and the emergence of 2s

for row in test_cohort[:11]:
    print(row)
    print('\n')

['Example Student ID', 'Cohort_Yr', '12-A', '12-B', '13-A', '13-B']


['1', '2012', 1, 2, 1, 0]


['2', '2012', 1, 1, 1, 0]


['3', '2012', 1, 1, 0, 2]


['4', '2012', 1, 1, 0, 0]


['5', '2012', 1, 2, 0, 0]


['6', '2012', 1, 1, 2, 0]


['7', '2012', 1, 1, 0, 0]


['8', '2012', 1, 1, 0, 0]


['9', '2012', 1, 1, 0, 0]


['10', '2012', 1, 1, 2, 1]




### Step 6: Generating the user's csv cohort file with its retention timeline 

#### The user may save the file at this point. The file now holds a series of 1s, 2s, and 0s spread across an academic timeline, so the user can tell when a given student enrolled and/or graduated. For the user's ease, I highly recommend creating the file in the Jupyter notebook and then downloading it to another folder on the user's computer.

In [14]:
def save_my_file (your_cohort_file, name_of_file): #User will be told to put name_of_file in quotes
    
    from csv import writer 
    
    name_of_file = str (name_of_file)
    
    with open(name_of_file, 'w', newline='') as file: #The file is closed automatically once writing is done
        
        file_to_write = writer(file)
        
        for row in your_cohort_file:
            
            file_to_write.writerow(row)
    
    user_instructions = "For you to download it, right-click the orange symbol next to the word 'jupyter'. You should land on the home page. Scroll up or down. As soon as you see your file, click the empty box next to it. Make sure you have only one item checkmarked. Go to the top of the page and click download. You got your file. Don't close the page yet. Now, hold on for the best part, the retention calculator is the next step."
    
    return user_instructions

In [15]:
save_my_file (test_cohort, 'Savemyfiletestfunction')

"For you to download it, right-click the orange symbol next to the word 'jupyter'. You should land on the home page. Scroll up or down. As soon as you see your file, click the empty box next to it. Make sure you have only one item checkmarked. Go to the top of the page and click download. You got your file. Don't close the page yet. Now, hold on for the best part, the retention calculator is the next step."
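
#### One detail worth knowing before reloading a saved file: the csv format stores everything as text, so the integers 1, 0, and 2 come back as strings. The round trip below is a sketch with made-up rows and hypothetical helper names (`save_rows`, `load_rows`); it writes to a temporary folder so that no real file is touched.

```python
import csv
import os
import tempfile


def save_rows(rows, path):
    # newline='' is what the csv module documentation recommends for csv files
    with open(path, 'w', newline='') as handle:
        csv.writer(handle).writerows(rows)


def load_rows(path):
    with open(path, newline='') as handle:
        return list(csv.reader(handle))


rows = [['Example Student ID', 'Cohort_Yr', '12-A'],
        ['1', '2012', 1],
        ['2', '2012', 0]]

path = os.path.join(tempfile.mkdtemp(), 'cohort_timeline.csv')
save_rows(rows, path)
reloaded = load_rows(path)

print(reloaded[1])  # the integer 1 is now the string '1'
```

#### This is why the retention calculator converts each value with int() before comparing it to 1 or 2.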

### Step 7A: The retention calculator, the single cohort solution

#### Before we continue, let's observe the attribute that was recorded when the CohortCenter class was instantiated. I assigned the value of retention_marker.start to retention_start_point. Let's use it to slice the header of test_cohort.

In [16]:
test_cohort[0][retention_start_point:]

['12-A', '12-B', '13-A', '13-B']

#### As can be seen, the retention timeline was defined at the beginning in order to accurately lock where the academic timeline would begin for the cohort. This attribute is the cornerstone for the entire retention_calculator function and its nested functions.

#### Next, I will give an overview of the retention_calculator function. The retention_start_point is used to slice the header, so that the user_timeline variable holds every term in the timeline, and a headcount of the cohort is calculated. For the single cohort scenario, a calculator function is run with the user's updated cohort file. 

#### This function starts with an empty dictionary and loops over every term in the user_timeline. A nested semester_index_finder function retrieves the index number of that term in the header of the user's cohort file. A second nested function, cohort_retention_counter, then focuses on that column, tallies all the 1s and 2s, and separates them into subgroups of those who enrolled and those who graduated. The user can choose to have the tallies combined or separate, as well as expressed as percentages.

#### Lastly, the dictionary is updated with every semester tallied until the timeline is finished. The function then prints the cohort's headcount and the finished dictionary of tallies. 
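
#### The tallying of a single term can be sketched with `collections.Counter`. This is an illustrative alternative, not the code used by the calculator; `tally_term` is a hypothetical name, and the rows are a handful of records in the same shape as the cohort file.

```python
from collections import Counter


def tally_term(cohort_rows, term):
    """Count enrolled (1) and graduated (2) members in one term's column."""
    column = cohort_rows[0].index(term)
    counts = Counter(int(row[column]) for row in cohort_rows[1:])
    return {term + 'M': counts[1], term + 'G': counts[2]}  # M = enrolled, G = graduated


sample = [['Example Student ID', 'Cohort_Yr', '12-A', '12-B', '13-A', '13-B'],
          ['1', '2012', 1, 2, 1, 0],
          ['2', '2012', 1, 1, 1, 0],
          ['3', '2012', 1, 1, 0, 2],
          ['4', '2012', 1, 1, 0, 0],
          ['5', '2012', 1, 2, 0, 0]]

timeline = sample[0][2:]   # every term from the start point onward
table = {}
for term in timeline:
    table.update(tally_term(sample, term))

print(table)
```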

### Step 7B: The retention calculator, the multiple cohort problem solver

#### What if the user wanted to analyze multiple cohorts in one file? For this to work, the student ids must be in one column and the cohort ids in another, and the multiple_cohorts argument must be set to True. For example, I can have records from cohorts 2012 and 2013. 

#### The multiple cohort part of the function begins by redefining the cohort headcount as a dictionary full of the cohorts as keys and their headcounts as values. It then uses the cohorts in a for-loop. This loop also includes an isolate_cohort function, which takes a cohort as argument and uses it to store only records of that cohort in a separate list. 

#### The rest works in a similar manner. A message identifying the cohort is printed, and the calculator function is then run with the custom list previously produced to generate the tallies. The final result is the tallies for the different cohorts, shown through the print command.
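
#### The isolation of each cohort can be sketched in a single pass with `collections.defaultdict`. This is an illustration under assumed names (`group_by_cohort` is hypothetical) and made-up rows, not the tool's own isolate_cohort function; each group keeps the original header so the timeline stays attached.

```python
from collections import defaultdict


def group_by_cohort(cohort_rows, cohort_index):
    """Split the records by cohort label, keeping the header with every group."""
    groups = defaultdict(list)
    for row in cohort_rows[1:]:
        groups[str(row[cohort_index])].append(row)
    return {cohort: [cohort_rows[0]] + members for cohort, members in groups.items()}


sample = [['Example Student ID', 'Cohort_Yr', '12-A'],
          ['1', '2012', 1],
          ['2', '2013', 0],
          ['3', '2012', 1]]

groups = group_by_cohort(sample, 1)

for cohort, rows in groups.items():
    print(cohort, 'headcount:', len(rows) - 1)  # subtract the header row
```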

In [17]:
# Single - Multiple cohort scenarios together



def retention_calculator (your_cohort_file, pct = False, timeline_start = retention_start_point, 
                          combine_graduates_and_enrolled = False, multiple_cohorts = False, 
                          cohort_column = cohort_point):
    
    
    user_timeline = your_cohort_file[0][timeline_start:] #Every term from the first attached term onward
        
    cohort_head_count = len (your_cohort_file[1:]) #If the user wants percentages, headcounts are needed
    
    
    def semester_index_finder (a_list_with_header,semester_code):#Assuming target_semester_index occurs once.
        
        header = a_list_with_header[0]
        target_semester_index = []
        
        for i, element in enumerate(header):
            
            if element == semester_code:
                target_semester_index.append(i) #Appends the index
                
        return target_semester_index[0]
    

    def cohort_retention_counter (a_list_with_header,index_from_semester_finder,semester_key): 
        
        dictionary = {}
        dictionary[semester_key+'M']=0 #M means enrolled in that semester
        dictionary[semester_key+'G']=0 #G means graduated in that semester
        
        combined_numbers_dictionary = {} #A dictionary for when the user wants the subgroups added together
        combined_numbers_dictionary[semester_key]=0 #One key instead of two keys, as opposed to before
        
        for row in a_list_with_header[1:]:

            column_semester = int(row[index_from_semester_finder])
            
            if column_semester == 1:
                dictionary[semester_key+'M']+=1
                combined_numbers_dictionary[semester_key]+=1 
                
            elif column_semester == 2:
                dictionary[semester_key+'G']+=1
                combined_numbers_dictionary[semester_key]+=1 #All dictionary variants updated
        
        if combine_graduates_and_enrolled:
            
            return combined_numbers_dictionary
        
        else:
            
            return dictionary
    
    
 
    def calculator(your_cohort_file, head_count = cohort_head_count):
        
        cohort_ret_freq_table = {} #The final dictionary that will house all of the terms and their tallies
        
        for semester in user_timeline:
            
            target_index = semester_index_finder(your_cohort_file,semester) #Returns index of a semester in header
            
            freq_table_semester = cohort_retention_counter(your_cohort_file,target_index,semester) 
            
            #freq_table_Semester generates a dictionary for semester and knows on what column to focus on
            #because of the semester_index_finder.
            
            if pct:#Percentage option
                
                for key in freq_table_semester:
                    
                    freq_table_semester[key] = (freq_table_semester[key] / head_count) * 100
                    
            cohort_ret_freq_table.update(freq_table_semester)#Updates until last semester in timeline is tallied
                    
        print('Cohort headcount:',head_count)
        print('\n')
        print(cohort_ret_freq_table)
                
    
    #This function is to ensure the success of a multiple cohort calculation
    #It isolates the records of a specific cohort in a list
    
    def isolate_cohort (your_cohort_file, cohort_key, cohort_column):
        
        isolated_cohort_list = []
        isolated_cohort_list.append(your_cohort_file[0]) #The header holds the retention timeline
        
        for row in your_cohort_file[1:]:
                
            cohort_id = row[cohort_column]#cohort_column = cohort_point, attribute of the CohortCenter instantiation
                
            if cohort_id == cohort_key: #cohort_key is a dictionary key defined later on.
                    
                isolated_cohort_list.append(row) #Besides a student_id, a cohort label is an id as well.
                    
        return isolated_cohort_list #Result is the user file's original header and records of the specific cohort.
    
    
    #The following is the code specifically tailored for a file with multiple cohorts.
    

    if multiple_cohorts:
        
        cohort_head_count = {} #This becomes redefined to allow for the headcounts of multiple cohorts
        
        for row in your_cohort_file[1:]:
            
            cohort_name = row[cohort_column]#cohort_point, attribute
            
            cohort_name = str(cohort_name)#making sure that the label of that cohort is registered as a string
            
            if cohort_name in cohort_head_count:
                
                cohort_head_count[cohort_name]+= 1
                    
            else:
                
                cohort_head_count[cohort_name]= 1
                
        #Now all cohorts exist as keys in the cohort_head_count dictionary with their headcounts as the values.
                
                
        for individual_cohort in cohort_head_count:
            
            individual_head_count = cohort_head_count[individual_cohort] #Retrieves the headcount of that cohort
            
            target_cohort_records = isolate_cohort (your_cohort_file, individual_cohort, cohort_column)
            
            string_identifier = 'The retention for cohort {}:'
            
            print(string_identifier.format(individual_cohort)) #It is a simple message to differentiate multiple
            print('\n')                                        #cohorts in the results and create more spacing.
            
            calculator(target_cohort_records, head_count = individual_head_count)#The head_count argument changed.
            
            print('\n')
            
            #The calculator returns a printing of the results and continues until finishing the last cohort.
    
    else:#single cohort 
    
        calculator(your_cohort_file)#Often times, this is the default. 
    

### Step 8A: The results of the retention calculator, single cohort scenario

In [18]:
#Running the calculator itself without additional conditions

retention_calculator(test_cohort)

Cohort headcount: 100


{'12-AM': 100, '12-AG': 0, '12-BM': 74, '12-BG': 17, '13-AM': 65, '13-AG': 11, '13-BM': 35, '13-BG': 12}


In [19]:
#Running the calculator with percentage as condition

retention_calculator(test_cohort, pct = True)

Cohort headcount: 100


{'12-AM': 100.0, '12-AG': 0.0, '12-BM': 74.0, '12-BG': 17.0, '13-AM': 65.0, '13-AG': 11.0, '13-BM': 35.0, '13-BG': 12.0}


In [20]:
#Using the combined condition

retention_calculator(test_cohort, combine_graduates_and_enrolled = True)

Cohort headcount: 100


{'12-A': 100, '12-B': 91, '13-A': 76, '13-B': 47}


In [21]:
#Using the combined condition and percentage
retention_calculator(test_cohort, pct = True, combine_graduates_and_enrolled = True)

Cohort headcount: 100


{'12-A': 100.0, '12-B': 91.0, '13-A': 76.0, '13-B': 47.0}


### Step 8B: The results of the retention calculator, multiple cohort scenario

In [22]:
#Here is my example mock-data

multi_test_cohort = csv_to_list_opener('MultiCohort2012-13.csv')

#The user places the cohort's name inside the CohortCenter's parentheses, same as the one used for the csv function
#This later becomes crucial for the retention calculator itself

retention_marker = CohortCenter (multi_test_cohort,2,1)

retention_start_point = retention_marker.start

cohort_point = retention_marker.cohort_identifier

sid_point = retention_marker.sid_column_index

semester_attacher('MCohort2012&13 - 12A.csv','12-A',1,multi_test_cohort)

semester_attacher('MCohort2012&13 - 12B.csv','12-B',1,multi_test_cohort)

semester_attacher('MCohort2012&13 - 13A.csv','13-A',1,multi_test_cohort)

semester_attacher('MCohort2012&13 - 13B.csv','13-B',1,multi_test_cohort)

#User adds semester codes used in semester_attacher function and dates conferred as string values
#Three slots should accommodate most graduation date or code variations.

graduation_time_slots_1 = {'12-B' : '5/10/2013', '13-A' : '12/10/2013', '13-B' : '5/10/2014'}

graduation_time_slots_2 = {'12-B' : '6/10/2013','13-A' : '12/15/2013'}

graduation_time_slots_3 = {'13-B' : '6/10/2014'}

graduation_marker ('MultiGrad Data All.csv', multi_test_cohort, 1, 2, 
                   second_graduation_slots = True,third_graduation_slots = True)
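
#### A caveat about the cell above, offered as a general Python note rather than a change to the tool: default values such as `student_id_column_in_your_cohort = sid_point` are evaluated once, when the function is defined, so reassigning `sid_point` afterward does not change the default. In this run the cohort and student id columns are again given as 2 and 1, and the graduation dictionaries are redefined with the same values, so the defaults still fit. The toy function below, which is not part of the tool, illustrates the behavior.

```python
start_point = 2


def show_start(timeline_start=start_point):  # the default is captured here, at definition time
    return timeline_start


start_point = 5  # reassigning the name later does not touch the stored default

print(show_start())                            # still uses the original default
print(show_start(timeline_start=start_point))  # passing the value explicitly uses the new one
```

#### When a cohort file has a different layout, passing the arguments explicitly, as the retention_calculator calls below do with timeline_start, is the safer route.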

In [23]:
#First Test, no conditions except Multi
retention_calculator (multi_test_cohort, pct = False, timeline_start = retention_start_point,
                      combine_graduates_and_enrolled = False, multiple_cohorts = True, 
                      cohort_column = 1)

The retention for cohort 2012:


Cohort headcount: 100


{'12-AM': 100, '12-AG': 0, '12-BM': 74, '12-BG': 17, '13-AM': 65, '13-AG': 11, '13-BM': 35, '13-BG': 12}


The retention for cohort 2013:


Cohort headcount: 110


{'12-AM': 58, '12-AG': 0, '12-BM': 46, '12-BG': 24, '13-AM': 80, '13-AG': 20, '13-BM': 48, '13-BG': 23}




In [24]:
#Second test without multi condition added, but the numbers from both cohorts were combined
retention_calculator (multi_test_cohort, pct = False, timeline_start = retention_start_point,
                      combine_graduates_and_enrolled = False, multiple_cohorts = False, 
                      cohort_column = cohort_point)

Cohort headcount: 210


{'12-AM': 158, '12-AG': 0, '12-BM': 120, '12-BG': 41, '13-AM': 145, '13-AG': 31, '13-BM': 83, '13-BG': 35}


In [25]:
#Third, percent activated
retention_calculator (multi_test_cohort, pct = True, timeline_start = retention_start_point,
                      combine_graduates_and_enrolled = False, multiple_cohorts = True, 
                      cohort_column = cohort_point)

The retention for cohort 2012:


Cohort headcount: 100


{'12-AM': 100.0, '12-AG': 0.0, '12-BM': 74.0, '12-BG': 17.0, '13-AM': 65.0, '13-AG': 11.0, '13-BM': 35.0, '13-BG': 12.0}


The retention for cohort 2013:


Cohort headcount: 110


{'12-AM': 52.72727272727272, '12-AG': 0.0, '12-BM': 41.81818181818181, '12-BG': 21.818181818181817, '13-AM': 72.72727272727273, '13-AG': 18.181818181818183, '13-BM': 43.63636363636363, '13-BG': 20.909090909090907}
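
#### The long decimals in the percentages can be trimmed for a report. The sketch below rounds a few of the values printed for cohort 2013 to one decimal place; the variable names are illustrative, and the rounding is applied to a copy of the results instead of inside the calculator.

```python
cohort_2013_pct = {'12-AM': 52.72727272727272, '12-AG': 0.0,
                   '12-BM': 41.81818181818181, '12-BG': 21.818181818181817}

# Build a new dictionary so the unrounded tallies stay available
rounded = {term: round(value, 1) for term, value in cohort_2013_pct.items()}

print(rounded)
```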




In [26]:
#Fourth, combined active
retention_calculator (multi_test_cohort, pct = False, timeline_start = retention_start_point,
                      combine_graduates_and_enrolled = True, multiple_cohorts = True, 
                      cohort_column = cohort_point)

The retention for cohort 2012:


Cohort headcount: 100


{'12-A': 100, '12-B': 91, '13-A': 76, '13-B': 47}


The retention for cohort 2013:


Cohort headcount: 110


{'12-A': 58, '12-B': 70, '13-A': 100, '13-B': 71}




In [27]:
#Fifth, pct and combined 
retention_calculator (multi_test_cohort, pct = True, timeline_start = retention_start_point,
                      combine_graduates_and_enrolled = True, multiple_cohorts = True, 
                      cohort_column = cohort_point)

The retention for cohort 2012:


Cohort headcount: 100


{'12-A': 100.0, '12-B': 91.0, '13-A': 76.0, '13-B': 47.0}


The retention for cohort 2013:


Cohort headcount: 110


{'12-A': 52.72727272727272, '12-B': 63.63636363636363, '13-A': 90.9090909090909, '13-B': 64.54545454545455}




# Conclusion

#### This tool's main purpose is to reduce the time and resources required to generate a retention report across the cohorts' timeline. The user's cohort file is turned into a list of lists, and a series of 1s, 0s, and 2s is added to it to identify retention in a given term. This information is then tallied by the retention_calculator function, which combines several nested functions to produce tallies for every semester of each cohort. The user can then see the results in percentages or raw numbers, in enrolled and graduated subgroups or as a combined tally, and can run the calculator for multiple cohorts at once, which saves further time.

#### I hope this tool can better assist the other three statisticians when this type of report is produced, as this is often the time when we need each other's help the most. I also hope that this tool reduces time consumption from weeks to hours. The time saved can be better spent on more pressing questions the many other stakeholders may have about the nature of retention itself as well as the students we are trying to retain.