# 15011c: CE Student Supports

## Part 1: Cleaning Survey Data
#### _Programmer_: Bonnie Brooks

##### **Purpose**: Cleaning the CE Student Supports survey data downloaded from Qualtrics. 
##### **Deliverables**: clean Excel spreadsheet unique at the college level.

###### Before running this code:
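* First, download the survey data from Qualtrics: Data & Analysis tab > Data > Export & Import button > Export
* Save it in the R: drive folder as "Survey Responses (most recent date of download).csv", e.g. "Survey Responses (1.13.2020).csv", and update the file name in the `read_csv` call below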



In [None]:
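# import the libraries needed

import pandas as pd
import numpy as np

# display numbers with no decimal places ('display.precision' is the full option name;
# the bare 'precision' pattern matches more than one option in recent pandas versions and raises an error)
pd.set_option('display.precision', 0)

# import survey data (update the file name to the most recent download)

data = pd.read_csv("Survey Responses (1.13.2020).csv")

data.head()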

In [None]:
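# initial data cleaning: drop the two extra Qualtrics header rows (question text and import IDs),
# drop responses with no answer to Q1, and fill remaining blanks with the placeholder string "NaN"
# (the answer dictionaries below code "NaN" as 999)

data = data.drop(index=[0, 1])
data = data.dropna(subset=['Q1'])
data = data.fillna("NaN")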
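
###### Qualtrics CSV exports typically carry two extra header rows under the column names, which is what the header-row drop above removes. A minimal sketch, assuming that three-row header layout, skips those rows at read time with `read_csv`'s `skiprows` parameter instead; the CSV text below is made up for illustration.

```python
import io

import pandas as pd

# Made-up CSV in the assumed Qualtrics layout:
# line 0 = column names, line 1 = question text, line 2 = import IDs, then one line per response
raw_csv = (
    "ResponseId,Q1\n"
    "Response ID,Which program are you answering for?\n"
    "{ImportId:_recordId},{ImportId:QID1}\n"
    "R_1,EOPS/CARE\n"
    "R_2,CalWORKs\n"
)

# skiprows=[1, 2] skips file lines 1 and 2 (0-based) and keeps line 0 as the header
survey = pd.read_csv(io.StringIO(raw_csv), skiprows=[1, 2])
print(survey)
```

###### Unlike dropping rows after reading, this leaves a fresh 0-based index.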

In [None]:
# sampling questions as a check
data[['QID11_1', 'QID11_2', 'QID11_3', 'Q1', 'E11', 'E18']].head(10)

In [None]:
# the cleaned data frame starts with the identifying columns, renamed
data_cleaned = data[['ResponseId', 'RecipientFirstName', 'RecipientLastName', 'QID13', 'QID11_3', 'QID158']].copy()
data_cleaned.columns = ['ResponseId', 'FIRST_NAME', 'LAST_NAME', 'COLLEGE_NAME', 'EMAIL_ADDRESS', 'TITLE']

# the remaining survey columns (without Qualtrics metadata, identifying columns and QID141) are kept for the later merge on ResponseId
data_to_merge = data.drop(['StartDate', 'EndDate', 'Status', 'IPAddress', 'Progress', 'Duration (in seconds)', 'Finished', 'RecordedDate', 'ExternalReference', 'LocationLatitude', 'LocationLongitude', 'DistributionChannel', 'UserLanguage', 'QID141', 'QID158', 'RecipientFirstName', 'RecipientLastName', 'RecipientEmail', 'QID13', 'QID11_3'], axis=1)

In [None]:
# check that data_to_merge looks right (data_cleaned can be checked the same way)
data_to_merge.head(10)

### _Problem_: 
###### Several questions (e.g. E5) let respondents select multiple answers, which appear as comma-separated text in a single cell for each respondent. We need one column per question and answer choice, coded 1 if the respondent selected that choice and 0 otherwise (e.g. column E5_1 for "Case-management/coordination of on-campus services").
### _Answer_: 
###### - create dictionaries that map each possible answer to a numeric code
###### - write a function that lists every answer chosen for a given multiple-choice question
###### - write a function that creates a 0/1 column for each question and answer
###### - apply both functions to all multiple-choice questions
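
###### For comparison, pandas has a built-in `Series.str.get_dummies(sep=...)` that turns a delimited multi-select column into 0/1 indicator columns in one call. A minimal sketch on made-up answers, assuming choices are separated by a bare comma (as the splitting below assumes); the columns come out named by answer text, so a recode like the dictionaries below is still needed to get names such as `E5_6`. The `codes` mapping here is a hypothetical excerpt.

```python
import pandas as pd

# Made-up multi-select answers; choices within a cell are separated by a bare comma
e5 = pd.Series(["Tutoring,Transportation", "Tutoring", "NaN"], name="E5")

# One 0/1 column per distinct choice, named by the choice text
dummies = e5.str.get_dummies(sep=",")

# Hypothetical excerpt of an answer-to-code dictionary, used to rename columns to E5_<code>
codes = {"Tutoring": "6", "Transportation": "9", "NaN": "999"}
dummies = dummies.rename(columns=lambda answer: "E5_" + codes[answer])
print(dummies)
```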

In [None]:
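# dictionaries mapping each answer's text (after punctuation clean-up) to its numeric code;
# the outer keys number the answer sets, and the comments list the questions that use each set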

dictionaries = {
    
# E1

1 : {
    "Financial aid application information" : "1",
    "Students who are placed in developmental/remedial education courses" : "2",
    "Referrals from high school counselors or programs"  : "3",
    "Information provided by county" : "4",
    "Other information provided by students" : "5",
    "Other" : "6",
    "NaN" : "999"
},

# E2, E3
### MANUALLY CHANGE E2_7 E3_7 E2_8 E3_8

2 : {
    "Program information sent to all students at the college" : "1",
    "EOPS marketing department/division materials or events" : "2",
    "Automatic district-generated email sent to students whose financial aid information suggested they would be eligible" : "3",
    "Contacted by EOPS staff based on information from the students financial aid application" : "4",
    "Contacted directly by the financial aid office" : "5",
    "High school counselor or other advertising in high schools" : "6",
    "Referrals from college staff or faculty e.g. on-campus organizations professors" : "7",
    "Referrals from off-campus service providers e.g. community based organizations county departments" : "8",
    "Word-of-mouth" : "9",
    "Other" : "10",
    "NaN" : "999"
},
    
# E5, E14, C2, C9, NU2, CAL5 (the same answer set is also used for NU8 and CAL12)
# Questions that are contingent on the above being answered:
# E6, E7, E8, E9, E15, E16, C3, C4, C5, C10, NU3, NU4, NU5, NU8, NU9, CAL6, CAL7, CAL8, CAL12, CAL13, 
            
3: {
    "Case-management/coordination of on-campus services" : "1",
    "Referrals to county-provided services" : "2",
    "Referrals to community based organizations" : "3",
    "Academic counseling": "4",
    "Personal counseling separate from academic counseling": "5", 
    "Tutoring" : "6",
    "Textbook assistance" : "7", 
    "School supplies" : "8", 
    "Transportation": "9", 
    "Food pantry or meal tickets" : "10", 
    "Work study job" : "11",
    "For-credit college courses e.g. College Success course" : "12", 
    "Computer lab not available for use by the general student body" : "13", 
    "Laptop loan program" : "14",
    "Summer bridge program" : "15", 
    "Child care center": "16", 
    "Child care subsidies" : "17", 
    "Vocational board exam and certification fees" : "18", 
    "Transfer application fees to CSUs and UCs" : "19", 
    "Housing assistance" : "20", 
    "Mental health services": "21", 
    "Unmet needs grants" : "22", 
    "Emergency aid financial" : "23",
    "Clothing" : "24",
    "Tools" : "25",
    "Others" : "26", 
    "NaN" : "999"
},

### manually change E5_12, E14_12, C2_12, C9_12, NU2_12, CAL5_12
    
# E17, C11, NU10, CAL14

4: {
    "First-come first-serve" : "1",
    "Need-based by income" : "2",
    "Need-based holistic" : "3",
    "CalWORKs students" : "4",
    "Age" : "5",
    "Students children are at risk for foster-care" : "6",
    "Students children at risk for foster-care" : "7",
    "Current foster care status" : "8",
    "Other" : "9",
    "EOPS/CARE students" : "10",
    "NaN" : "999"
},

# CAL1
    
5: {
    "Automatic district-generated emails sent to based on the financial aid information in a students college application" : "1",
    "Advertising in high schools" : "2",
    "Communicating with high school counselors" : "3",
    "Information provided by students through EOPS application" : "4",
    "County referral" : "5",
    "Other information provided by students" : "6",
    "Other" : "7",
    'NaN': "999"
},
    
# E11, C7, NU7, CAL10
    
6 : {
    "Phone" : "1",
    "Text message" : "2",
    "Email" : "3",
    "In-person appointment" : "4",
    "Other:" : "5",
    "NaN" : "999"
},
    

# E13
    
7 : {
    "Limited capacity in the overall program a program cap" : "1",
    "Lack of awareness of the program" : "2",
    "Thinking the program wont be helpful" : "3",
    "Thinking the program is inconvenient/will take too much time" : "4",
    "Being enrolled in too few credits" : "5",
    "Already having participated in EOPS for maximum number of terms" : "6",
    "Having completed more than 70 units" : "7",
    "Applying too late in the semester" : "8",
    "Forms are too complicated or too much of a hassle" : "9",
    "Not being a CA resident or Dream Act eligible" : "10",
    "Being above the low-income threshold have too much income" : "11",
    "Age restrictions" : "12",
    "Not being considered academically disadvantaged" : "13",
    "Being generally uninterested in the program" : "14",
    "Other program eligibility requirements" : "15",
    "Others" : "16",
    "None" : "17",
    "NaN" : "999"
    
},
  
# E18, C12, NU11
    
8 : {
    "No longer feel like they are benefiting from the program" : "1",
    "Dont have time for the program" : "2",
    "Enrolling in too few credits" : "3",
    "Maxed out terms allowed to be enrolled in EOPS/CARE" : "4",
    "Not meeting with case manager/counselor" : "5",
    "Income increases over low-income threshold" : "6",
    "Student age requirements" : "7",
    "Low GPA 1st offense" : "8",
    "Low GPA repeat offense had chance to remedy GPA after a warning" : "9",
    "Not following educational plan" : "10",
    "Other measures of inadequate academic progress besides GPA" : "11",
    "Other" : "12",
    "None" : "13",
    "Parent is no longer a single head of household" : "14",
    "Income increases such that a parent is no longer meets the eligibility criteria for welfare" : "15",
    "Childs age" : "16",
    "NaN" : "999"
    
},

# E20
    
9 : {
    "Not aware of the program" : "1",
    "Would not benefit from the program or find the program unnecessary" : "2",
    "Do not have the time or resources to apply for the program" : "3",
    "Do not have the time or resources to participate in the program" : "4",
    "Do not want to be enrolled in the program due to stigma" : "5",
    "Do not want to be enrolled in the program for other reasons" : "6",
    "Other" : "7",
    "NaN" : "999"
},

# C8

10 : {
    "Limited capacity in CARE a program cap" : "1",
    "Not interested in the services CARE provides" : "2",
    "Not a single parent household not a one-parent CalWORKs assitance unit as verified by county" : "3",
    "Do not meet eligibility criteria for TANF/CalWORKs e.g. income is too high" : "4",
    "Cannot show/verify welfare eligibility even though they are likely eligible" : "5",
    "Children are older than 14" : "6",
    "Parent is younger than 18" : "7",
    "Other program eligibility requirements" : "8",
    "Others" : "9",
    "NaN" : "999"
},

# CAL2
# Questions that are contingent on the above being answered:
# CAL3

11 : {
    "Automatic email to all students" : "1",
    "Automatic district-generated email sent only to students whose financial aid information suggested they would be eligible" : "2",
    "Other use of financial aid information" : "3",
    "High school counselor" : "4",
    "Other advertising in high schools" : "5",
    "Flyers/brochures posted in community based organizations off-campus" : "6",
    "Flyers/brochures posted in on-campus organizations" : "7",
    "Referrals from professors" : "8",
    "Referrals from EOPS/CARE advisors or staff" : "9",
    "Referrals from other school administrators" : "10",
    "Referrals from other campus service providers" : "11",
    "Referrals from county welfare department" : "12",
    "Word-of-mouth" : "13",
    "Not sure" : "14",
    "Other" : "15",
    "NaN" : "999"
},
    
# CAL15

12 : {
    "Enrolling in too few credits" : "1",
    "Not meeting with case manager/counselor" : "2",
    "Income increases over the CalWORKs/TANF threshold" : "3",
    "Age requirements parent or child" : "4",
    "Not enough hours of combined education and work to meet Welfare-to-Work requirements" : "5",
    "Timed-out of eligibility to use education as a Welfare-to-Work activity" : "6",
    "Timed-out of CalWORKs/TANF in general" : "7",
    "Sanctioned from CalWORKs/TANF for another reason" : "8",
    "No longer feel like they are benefiting from the program" : "9",
    "Dont have time for the program" : "10",
    "Low GPA 1st offense" : "11",
    "Low GPA repeat offense had chance to remedy GPA after a warning" : "12",
    "Not following educational plan" : "13",
    "Other measures of inadequate academic progress besides GPA" : "14",
    "Other" : "15",
    "NaN" : "999"
},
    
# O1
    
13 : {
    "Ujoma" : "1",
    "Puente" : "2",
    "Fresh Success part of SNAP E&T" : "3",
    "Other SNAP E&T Program" : "4",
    "Guardian Scholars" : "5",
    "DSPS" : "6",
    "Student Support Services program SSS" : "7",
    "Formerly Incarcerated Students in Transition FIST or other programs targeting formerly incarcerated students" : "8",
    "Other please list all applicable programs" : "9",
    "None of the above" : "10",
    "NaN" : "999"
},
    
# Q2

14 : {
    "Yes" : "1",
    "No" : "2",
    "No but we offer a similar program for foster youth called:" : "3",
    "NaN" : "999"
}
}
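
###### The dictionary keys have to match the answer text exactly as it looks after the punctuation clean-up used in this notebook (", " collapsed to " ", then quotes and parentheses removed), which is why keys read like "Dont have time for the program". A sketch of that clean-up on an assumed raw Qualtrics cell (the original survey wording is not shown in this notebook); `strip_answer_punctuation` is a hypothetical helper name.

```python
import pandas as pd

def strip_answer_punctuation(series):
    """Mirror the notebook's clean-up: ', ' inside an answer becomes ' ',
    then double quotes, apostrophes and parentheses are dropped."""
    return (series
            .str.replace(", ", " ", regex=False)
            .str.replace('"', "", regex=False)
            .str.replace("'", "", regex=False)
            .str.replace("(", "", regex=False)
            .str.replace(")", "", regex=False))

# Assumed raw multi-select cell: two choices separated by a bare comma
raw = pd.Series(["Tutoring,Referrals from college staff or faculty (e.g., on-campus organizations, professors)"])
choices = strip_answer_punctuation(raw).str.split(",").iloc[0]
print(choices)
```

###### Because the comma-space inside an answer becomes a plain space first, only the bare commas that separate selected choices are left for `split(',')`, and the second choice comes out identical to the key coded "7" in answer set 2.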

In [None]:
# defining lists of questions pertaining to different dictionaries of possible answers

questions_for_dict_1 = ['E1']
questions_for_dict_2 = ['E2', 'E3']
questions_for_dict_3 = ['E5', 'E14', 'C2', 'C9', 'CAL5', 'CAL12', 'NU2', 'NU8']
questions_for_dict_4 = ['E17', 'C11', 'NU10', 'CAL14']
questions_for_dict_5 = ['CAL1']
questions_for_dict_6 = ['E11', 'C7', 'NU7', 'CAL10']
questions_for_dict_7 = ['E13']
questions_for_dict_8 = ['E18', 'C12', 'NU11']
questions_for_dict_9 = ['E20']
questions_for_dict_10 = ['C8']
questions_for_dict_11 = ['CAL2', 'CAL3']
questions_for_dict_12 = ['CAL15']
questions_for_dict_13 = ['O1']
             
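# function that returns every distinct answer respondents chose for a given multiple-choice question
# (', ' inside an answer is collapsed to ' ' first, so only the bare commas between choices remain as separators;
# all responses are scanned, since the Qualtrics header rows were already dropped in the initial cleaning step)
    
def uniqueanswer(question):

    cleaned = (data[question]
               .str.replace(', ', ' ', regex=False)
               .str.replace('"', '', regex=False)
               .str.replace("'", '', regex=False)
               .str.replace('(', '', regex=False)
               .str.replace(')', '', regex=False))

    newlist = []
    for cell in cleaned.str.split(','):
        newlist.extend(cell)

    # de-duplicate once, after every cell has been read
    return list(set(newlist))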

In [None]:
# testing uniqueanswer function
uniqueanswer('NU8') # dict(list(enumerate(uniqueanswer('E5'))))
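
###### A more compact way to list the distinct choices in a multi-select column is to split each cell and flatten the lists with `Series.explode` (available since pandas 0.25). A sketch on made-up, already-cleaned answers; `unique()` keeps first-seen order, whereas `list(set(...))` has no guaranteed order.

```python
import pandas as pd

# Made-up, already-cleaned multi-select answers (bare commas separate choices)
nu8 = pd.Series(["Tutoring,Transportation", "Tutoring", "NaN"])

# Split each cell into a list, flatten to one choice per row, keep distinct values
all_choices = nu8.str.split(",").explode().unique().tolist()
print(all_choices)
```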

In [None]:
# function that turns each possible answer to a multiple-choice question into a 0/1 variable
# and adds it to the cleaned data frame "data_cleaned"

# map each question to the key of its answer dictionary (Q2 uses dictionary 14)
question_groups = [questions_for_dict_1, questions_for_dict_2, questions_for_dict_3, questions_for_dict_4,
                   questions_for_dict_5, questions_for_dict_6, questions_for_dict_7, questions_for_dict_8,
                   questions_for_dict_9, questions_for_dict_10, questions_for_dict_11, questions_for_dict_12,
                   questions_for_dict_13]
question_to_dict = {q: key for key, group in enumerate(question_groups, start=1) for q in group}
question_to_dict['Q2'] = 14

def newvars(question, column_names_list=None, allanswers=None):

    # None defaults, so the lists are not shared (and appended to) across calls
    if column_names_list is None:
        column_names_list = ['ResponseId', 'FIRST_NAME', 'LAST_NAME', 'COLLEGE_NAME', 'EMAIL_ADDRESS', 'TITLE']
    if allanswers is None:
        allanswers = []

    # each cell as the list of choices the respondent selected
    chosen = data[question].str.split(',')

    for answer in allanswers:

        # mask is True where the respondent selected exactly this answer
        # (str.contains would also match e.g. "Other" inside "Other information provided by students")
        mask_allanswers = chosen.apply(lambda cells: isinstance(cells, list) and answer in cells)

        if question == 'Q1':
            # Q1 answers become columns named after the answer itself, e.g. 'EOPS/CARE'
            column_name = answer
        elif question in question_to_dict:
            column_name = question + "_" + dictionaries[question_to_dict[question]][answer]
        else:
            raise ValueError("No answer dictionary defined for question " + question)

        data_cleaned[column_name] = 0
        data_cleaned.loc[mask_allanswers, column_name] = 1
        column_names_list.append(column_name)

    return data_cleaned[column_names_list].head(10)
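
###### Matching answers with `str.contains` tests for substrings (and treats the answer as a regular expression by default), so a short answer such as "Other" also matches longer ones such as "Other information provided by students". A sketch on made-up E1-style cells comparing that with an exact check against each cell's split list of choices:

```python
import pandas as pd

# Made-up E1-style cells after the punctuation clean-up (bare commas separate choices)
e1 = pd.Series([
    "Other information provided by students",
    "Other",
    "Financial aid application information,Other",
])

# Substring (and, by default, regex) match: "Other" is also found inside the first cell
substring_mask = e1.str.contains("Other")

# Whole-choice match: split each cell into its choices and test membership
exact_mask = e1.str.split(",").apply(lambda cells: "Other" in cells)

print(pd.DataFrame({"contains": substring_mask, "exact": exact_mask}))
```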


In [None]:
# list of all multiple-choice questions; each one is cleaned and turned into 0/1 columns below

#EOPS_questions_list = ['Q1', 'E1', 'E4_2', 'E4_5', 'E6_9', 'E7_9', 'E8_9', 'E9_9', 'E10', 'E12', 'E17', 'E19', 'C1', 'C3_9']
#CALWORKS_questions_list = ['Q1', 'CAL1', 'CAL4_2', 'CAL4_5', 'CAL7_9', 'CAL8_9', 'CAL9', 'CAL11', 'E&CAL1_4']

list_of_columns = ['Q1', 'Q2', 'E1', 'E2', 'E3', 'E5', 'E11', 'E13', 'E14', 'E17', 'E18', 'E20', 'C2', 'C7', 'C8', 'C9', 'C11', 'C12', 'NU2', 'NU7', 'NU8', 'NU10', 'NU11', 'CAL1', 'CAL2', 'CAL3', 'CAL5', 'CAL10', 'CAL12', 'CAL14', 'CAL15', 'O1']

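for column in list_of_columns:
    # same punctuation clean-up as in uniqueanswer, so answers match the dictionary keys
    # (regex=False: literal replacements, and the default changed to regex=False in pandas 2.0)
    data[column] = (data[column]
                    .str.replace(', ', ' ', regex=False)
                    .str.replace('"', '', regex=False)
                    .str.replace("'", '', regex=False)
                    .str.replace('(', '', regex=False)
                    .str.replace(')', '', regex=False))
    unique_answers = uniqueanswer(column)
    newvars(column, column_names_list=['ResponseId', 'FIRST_NAME', 'LAST_NAME', 'COLLEGE_NAME', 'EMAIL_ADDRESS', 'TITLE'], allanswers=unique_answers)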

In [None]:
data_cleaned['E5_1'].head(10)

In [None]:
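# cleaning Q2 (asks whether the college offers the NextUp/CAFYES program for foster youth):
# NextUp_CAFYES = 1 if yes, 0 if no (including "No but we offer a similar program"), 999 if Q2 was left blank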

data_cleaned['NextUp_CAFYES'] = data_cleaned['Q2_1']
mask = data_cleaned['Q2_999'] == 1
data_cleaned.loc[mask, 'NextUp_CAFYES'] = 999

In [None]:
data_cleaned[['FIRST_NAME', 'LAST_NAME', 'COLLEGE_NAME', 'EOPS/CARE', 'CalWORKs', 'Q2_1', 'Q2_2', 'Q2_3', 'Q2_999', 'NextUp_CAFYES']].head(15)

In [None]:
# merging dataframe of cleaned questions with dataframe of rest of questions

data_final = pd.merge(data_cleaned,
                 data_to_merge,
                 on='ResponseId')

data_final[['FIRST_NAME', 'LAST_NAME', 'COLLEGE_NAME', 'EOPS/CARE', 'CalWORKs', 'E1', 'E12', 'E2_7', 'E5_5', 'E2_9','E17_2','E18_11', 'CAL2_13', 'E&CAL1_1', 'O1_1']].head(10)
# E18_2: Don't have time for the program
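
###### Each response should appear once on each side of the merge on `ResponseId`, and pandas can enforce that with `validate="one_to_one"`, which raises `pandas.errors.MergeError` when either side has duplicate keys. A sketch on made-up frames (the values are illustrative only):

```python
import pandas as pd

left = pd.DataFrame({"ResponseId": ["R_1", "R_2"], "E5_1": [1, 0]})
right = pd.DataFrame({"ResponseId": ["R_1", "R_2"], "E10": ["Yes", "No"]})

# validate="one_to_one" makes the merge fail loudly on duplicated ResponseIds
merged = pd.merge(left, right, on="ResponseId", validate="one_to_one")

duplicated = pd.concat([right, right.iloc[[0]]])  # R_1 now appears twice
try:
    pd.merge(left, duplicated, on="ResponseId", validate="one_to_one")
    duplicate_caught = False
except pd.errors.MergeError:
    duplicate_caught = True

print(merged)
print("duplicate caught:", duplicate_caught)
```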

In [None]:
cols = data_final.filter(regex=r'^E&CAL\d+_\d+.*(?<!TEXT)$').columns  # E&CAL indicator columns, excluding _TEXT columns
data_final[cols].head(10)

In [None]:
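# creating list of the columns to keep: all names starting with E, C, NU or O
# (note that '^C' also matches CAL*, COLLEGE_NAME and CalWORKs, and '^E' also matches E&CAL*, EMAIL_ADDRESS and EOPS/CARE)

cols = data_final.filter(regex='^(E|C|NU|O)').columns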
cols_add = pd.Index(['FIRST_NAME', 'LAST_NAME', 'NextUp_CAFYES'])
cols_final = cols.append(cols_add)
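
###### The patterns passed to `DataFrame.filter` use a negative lookbehind, `(?<!TEXT)$`, to keep indicator columns while skipping free-text columns that end in `TEXT`; writing them as raw strings (`r'...'`) avoids Python's invalid-escape warning for `\d`. A bare prefix pattern is broader than it may look. A sketch on made-up column names:

```python
import pandas as pd

# Made-up column names in the notebook's naming scheme
df = pd.DataFrame(columns=["E&CAL1_1", "E&CAL1_4", "E&CAL1_4_TEXT", "E5_1", "CAL5_1", "COLLEGE_NAME"])

# Indicator columns only: the lookbehind rejects names ending in TEXT
indicator_cols = df.filter(regex=r'^E&CAL\d+_\d+.*(?<!TEXT)$').columns.tolist()

# '^(E|C)' matches every name here, including CAL5_1 and COLLEGE_NAME
prefix_cols = df.filter(regex=r'^(E|C)').columns.tolist()

print(indicator_cols)
print(prefix_cols)
```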

In [None]:
the_columns = list(cols_final)
the_columns.sort()

In [None]:
# very long list generated! 

the_columns

### _Problem_: 
###### Need to code valid missings. Valid missings occur when:
###### - a respondent answered for only one program (e.g. EOPS/CARE), so they would not answer the questions for the other programs (e.g. CalWORKs)
###### - an answer choice wasn't selected, so the carry-forward questions about that choice didn't need to be answered (e.g. if "Transportation" isn't a service offered at a given college, the questions about "Transportation" would not be filled out)
### _Answer_: 
###### - group the questions that pertain to each program
###### - write code that leaves truly missing values as missing (NaN) and codes valid missings as 999
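
###### The recoding rule can be sketched on a toy frame. This is a simplified illustration, not the notebook's exact code: the rows and the `CAL5_1`/`CAL5_6` values are made up, and a single question stands in for "all of the program's questions". Respondents answering for CalWORKs who left CAL5 blank get `NaN` (true missing); respondents not answering for CalWORKs get 999 (valid missing).

```python
import numpy as np
import pandas as pd

# Toy frame: program flag, raw CAL5 answer ("NaN" placeholder = left blank), two 0/1 indicators
toy = pd.DataFrame({
    "CalWORKs": [1, 1, 0],
    "CAL5": ["Tutoring", "NaN", "NaN"],
    "CAL5_1": [0.0, 0.0, 0.0],
    "CAL5_6": [1.0, 0.0, 0.0],
})
indicators = ["CAL5_1", "CAL5_6"]

# True missing: answering for CalWORKs but CAL5 left blank
true_missing = (toy["CalWORKs"] == 1) & (toy["CAL5"] == "NaN")
toy.loc[true_missing, indicators] = np.nan

# Valid missing: not answering for CalWORKs at all
toy.loc[toy["CalWORKs"] == 0, indicators] = 999

print(toy)
```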

In [None]:
# code that groups questions by program and codes missing values:
#  - respondent answers for a program but left every one of its questions blank: the placeholder
#    string "NaN" in those questions becomes np.nan, and so do their 0/1 indicator columns (true missing)
#  - respondent does not answer for a program: that program's indicator columns become 999 (valid missing)
# (_TEXT columns are left unchanged)

EOPS_INDICATORS = r'(E|C)\d+_\d+.*(?<!TEXT)$'
CAL_INDICATORS = r'^CAL\d+_\d+.*(?<!TEXT)$'

def flag_true_missing(question_regex, indicator_regex, program):
    question_cols = data_final.filter(regex=question_regex).columns
    indicator_cols = data_final.filter(regex=indicator_regex).columns
    in_program = data_final[program] == 1
    all_blank = (data_final[question_cols] == "NaN").all(axis=1)
    data_final.loc[all_blank & in_program, question_cols] = np.nan
    now_missing = data_final[question_cols].isnull().all(axis=1)
    data_final.loc[now_missing & in_program, indicator_cols] = np.nan

# EOPS/CARE (the question pattern also covers the NextUp/CAFYES "NU" questions)
flag_true_missing(r'(E|C|NU)\d+$', EOPS_INDICATORS, 'EOPS/CARE')
data_final.loc[data_final['CalWORKs'] == 0, data_final.filter(regex=CAL_INDICATORS).columns] = 999
data_final_EOPS = data_final.loc[data_final['EOPS/CARE'] == 1]  # EOPS/CARE respondents at this stage

# CalWORKs
flag_true_missing(r'^CAL\d+$', CAL_INDICATORS, 'CalWORKs')
data_final.loc[data_final['EOPS/CARE'] == 0, data_final.filter(regex=EOPS_INDICATORS).columns] = 999

# NextUp/CAFYES (CAL indicators for respondents not answering for CalWORKs were already set to 999 above)
flag_true_missing(r'NU\d+$', r'NU\d+_\d+.*(?<!TEXT)$', 'NextUp_CAFYES')

In [None]:
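# O1: respondents with no O1 answer (O1_999 == 1) and no other O1 choice selected are true missing
o1_cols = data_final.filter(regex=r'O\d+_\d+.*(?<!TEXT)$').columns
o1_choices = o1_cols.drop(['O1_999'])
data_final.loc[(data_final['O1_999'] == 1) & ((data_final[o1_choices] == 0).all(axis=1)), o1_choices] = np.nan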

In [None]:
nans = data_final.filter(regex=r'E\d+_999$').columns

In [None]:
nans

In [None]:
# testing the valid missing code

data_final[['FIRST_NAME', 'LAST_NAME', 'COLLEGE_NAME', 'EOPS/CARE', 'CalWORKs', 'NextUp_CAFYES', 'E1_999', 'E5_1', 'E6_1', 'E5_999', 'C2_1', 'C3_1', 'NU2_1', 'NU3_1', 'CAL5_1', 'CAL6_1', 'E&CAL1_1', 'O1_4', 'O1_999']].head(10)

In [None]:
# list of carry-forward questions related to a given question

# questions_E5 = ['E6', 'E7', 'E8', 'E9']
# questions_E14 = ['E15', 'E16']
# questions_C2 = ['C3', 'C4', 'C5']
# questions_C9 = ['C10']
# questions_NU2 = ['NU3', 'NU4', 'NU5', 'NU8', 'NU9']
# questions_CAL5 = ['CAL6', 'CAL7', 'CAL8', 'CAL12', 'CAL13']
# questions_CAL2 = ['CAL3'] 

In [None]:


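# collecting every column that belongs to a carry-forward question (for coding their valid missings; see Next steps)
list_of_questions = ['E6', 'E7', 'E8', 'E9', 'E15', 'E16', 'C3', 'C4', 'C5', 'C10', 'NU3', 'NU4', 'NU5', 'NU9', 'CAL6', 'CAL7', 'CAL8', 'CAL13']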
new_list = []
for i in the_columns:
    for x in list_of_questions:
        if x in i:
            new_list.append(i)
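
###### The `x in i` test above is a substring check, so a short question code would also pick up longer ones (for example, 'E1' is a substring of 'E15_2'). Anchoring the code at the start of the name and requiring an underscore or the end of the name after it avoids that. A sketch with made-up column names; `carry_forward_columns` is a hypothetical helper.

```python
import re

def carry_forward_columns(columns, questions):
    """Return columns whose question code is exactly one of `questions` (e.g. 'E6', 'E6_1', 'E6_1_TEXT')."""
    pattern = re.compile(r"^(?:" + "|".join(map(re.escape, questions)) + r")(?:_|$)")
    return [c for c in columns if pattern.match(c)]

cols = ["E1", "E1_2", "E15_2", "E6_1", "E6_1_TEXT", "CAL13_4"]
print(carry_forward_columns(cols, ["E1", "E6"]))
```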

In [None]:
new_list

In [None]:
# recoding "Empty" as "999" in single-answer text questions

new_list_2 = ['E10', 'E12', 'E19', 'C1', 'C6', 'NU1', 'NU6', 'CAL11', 'CAL9']
for x in new_list_2:
    data_final.loc[data_final[x] == "Empty", x] = "999"

In [None]:
from natsort import natsorted
the_columns_final = natsorted(the_columns)
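
###### `natsorted` orders names so that numeric parts compare as numbers (E2_1 before E10_1), whereas plain `sorted` puts 'E10_1' before 'E2_1'. If the `natsort` dependency is unwanted, a simple natural-sort key can be written with `re.split`; this is a sketch, and it does not reproduce `natsort`'s other options.

```python
import re

def natural_key(name):
    """Split 'E10_1' into ['E', 10, '_', 1, ''] so digit runs compare as integers."""
    return [int(part) if part.isdigit() else part for part in re.split(r"(\d+)", name)]

cols = ["E10_1", "E2_1", "E5_12", "E5_2"]
print(sorted(cols))                   # plain string order
print(sorted(cols, key=natural_key))  # natural order
```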

In [None]:
data_final = data_final[the_columns_final]

In [None]:
data_final[['FIRST_NAME', 'LAST_NAME', 'COLLEGE_NAME', 'E5_1','E6_1', 'E1_6_TEXT']].head(10)

In [None]:
data_final.to_excel("Data_Cleaned_Detail_Final.xlsx", sheet_name="ALL")

In [None]:
cols = [c for c in data_final.columns if '_TEXT' not in c]
data_final = data_final[cols]

In [None]:
cols_2 = data_final.filter(regex='^(E|C|NU|CAL|E&CAL|O)\d+_\d+.*(?<!TEXT)$').columns

In [None]:
cols_add = pd.Index(['FIRST_NAME', 'LAST_NAME', 'NextUp_CAFYES', 'COLLEGE_NAME', 'EMAIL_ADDRESS', 'EOPS/CARE', 'CalWORKs', 'E10', 'C1', 'C6', 'CAL11', 'CAL9', 'E12', 'E19', 'NU1', 'NU6'])
cols_3 = cols_2.append(cols_add)
cols_final = list(cols_3)

In [None]:
from natsort import natsorted
the_cols_final = natsorted(cols_final)

In [None]:
data_final = data_final[the_cols_final]

In [None]:
data_final.head(10)

In [None]:
data_final.to_excel("Data_Cleaned_Final.xlsx", sheet_name="ALL")

# _Notes_
#### **Missing values**:
###### - 999: valid missing due to program (e.g. the respondent is only answering questions for EOPS/CARE, so would not answer the questions for CalWORKs)
###### - NaN: true missing (i.e. respondent should have responded to this question but did not, which means subsequent answers to linked questions may also be missing)
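
###### Because 999 is stored as a number, it has to be excluded before computing averages or shares, or it will inflate them. A sketch on a made-up indicator column: converting 999 to `NaN` lets `mean()` (which skips `NaN` by default) report the share among respondents who actually answered.

```python
import numpy as np
import pandas as pd

# Made-up indicator: 1 = offers the service, 0 = does not, 999 = valid missing, NaN = true missing
e5_1 = pd.Series([1, 0, 1, 999, np.nan])

naive_mean = e5_1.mean()                            # distorted by the 999 code
share_offering = e5_1.replace(999, np.nan).mean()   # share among respondents who answered

print(naive_mean, share_offering)
```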

#### **Next steps**:
###### - Code valid missings (in Stata) for carry-forward questions about services that respondents indicate their college does NOT offer
###### - Split EOPS/CARE and CalWORKs responses into separate data files for analysis
###### - Manually review each question for colleges that have multiple respondents for the same program