# College Scorecard Analysis

Last updated: April 26, 2024

## Load data

I downloaded the most recent institution-level and field-level [data](https://collegescorecard.ed.gov/data/), unzipped it and saved it on my local machine as CSV files. The data was last updated on October 10, 2023. Full descriptions of variables are in the [technical documentation](https://collegescorecard.ed.gov/assets/InstitutionDataDocumentation.pdf). The most recent year is 2021-22.

In [182]:
# load packages
import pandas as pd
import numpy as np

In [2]:
# institution level data
df = pd.read_csv('data/data/Most-Recent-Cohorts-Institution.csv')
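Many Scorecard columns mix numbers with the string `PrivacySuppressed`, so pandas reads them as text. A minimal sketch of one option, using a small in-memory sample (hypothetical rows, not the real file): pass the marker to `na_values` so the column parses as floats at load time.

```python
import io
import pandas as pd

# hypothetical three-row sample standing in for the institution-level CSV
sample_csv = io.StringIO(
    "UNITID,INSTNM,DEBT_MDN\n"
    "1,School A,27000\n"
    "2,School B,PrivacySuppressed\n"
    "3,School C,\n"
)

# treat 'PrivacySuppressed' as NaN on read; low_memory=False parses the file
# in one pass so pandas infers a single dtype per column
sample = pd.read_csv(sample_csv, na_values=['PrivacySuppressed'], low_memory=False)

print(sample['DEBT_MDN'].dtype)         # float64
print(sample['DEBT_MDN'].isna().sum())  # 2
```

The rest of this notebook loads the file without this option, so the later cells that count `PrivacySuppressed` values apply to the data as loaded above.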



Create dataframes for the different combinations of schools I want to compare. The variable <b>CONTROL</b> comes from IPEDS; code 3 identifies private for-profit schools.

In [5]:
# filter for just Illinois for profits
ilfp = df[(df['CONTROL'] == 3) & (df['ST_FIPS'] == 17)].copy()

In [6]:
len(ilfp)

96

In [7]:
# filter for all Illinois schools
il = df[df['ST_FIPS'] == 17].copy()

The field-level data does not include school state, so I need to merge it with the institution-level set to create a dataframe of just Illinois schools. A few schools in the institution-level data are not in the field-level data:<br>
- First Institute of Travel Inc.
- Bexley Hall Seabury Western Theological Seminary Federation Inc.
- Zen Shiatsu Chicago
- Larry's Barber College-Joliet (Larry's Barber College is in the field-level data)
- Flashpoint Chicago A Campus of Columbia College Hollywood
- Triton College - Intl Union of Operating Engr Local 399 Trning Fac.
- University of Notre Dame -
- Columbia College Crystal Lake, Lake County, Freeport, Elgin (Columbia College is in the field-level data)
- Stellar Career College - Chicago IL
- Rasmussen University - Aurora, Romeoville, Tinley Park (Rasmussen is in)
- Networks Barber College
- Relay Graduate School of Education - Chicago

In [None]:
# full field level data
ff = pd.read_csv('data/data/Most-Recent-Cohorts-Field-of-Study.csv')

In [92]:
ilff = pd.merge(ff,il[['UNITID']],on='UNITID',how='right')
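One way to list institutions that have no field-level rows is a merge with `indicator=True`. This is a minimal sketch on hypothetical stand-in tables (`il_demo` and `ff_demo` are made up for illustration), not the exact step used to build the list above.

```python
import pandas as pd

# hypothetical stand-ins for the Illinois institution and field-level tables
il_demo = pd.DataFrame({'UNITID': [1, 2, 3],
                        'INSTNM': ['School A', 'School B', 'School C']})
ff_demo = pd.DataFrame({'UNITID': [1, 1, 3],
                        'CIPDESC': ['Nursing', 'Cosmetology', 'Barbering']})

# left merge from the institution side; the _merge column marks rows
# that found no match in the field-level table
check = pd.merge(il_demo, ff_demo, on='UNITID', how='left', indicator=True)
missing = check.loc[check['_merge'] == 'left_only', 'INSTNM'].tolist()
print(missing)  # ['School B']
```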

## Percent of revenue spent teaching each student

I would like to model the analysis after what the Student Borrower Protection Center did in their 2021 [report](https://protectborrowers.org/for-profit-mapping/). <br>

I divided <b>TUITFTE</b>, the net tuition revenue per full-time equivalent student, by <b>INEXPFTE</b>, the instructional expenditures per full-time equivalent student. Both include undergrad and grad students and come from the IPEDS Finance component; FTE enrollment comes from the IPEDS 12-Month Enrollment component. <br>

<font color='red'>Check with expert:</font> Which variables should I be using to calculate this? In IPEDS under Finance, would dividing <b>Total revenues and investment return</b> by <b>Instruction - Total amount</b> be better for for-profits? The percentages vary widely and some are above 100%, which makes this tricky.

In [10]:
ilfp['pct_rev_instruction'] = ilfp['TUITFTE']/ilfp['INEXPFTE']
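The cell above divides tuition revenue by instructional spending. If the quantity of interest is instead the share of tuition revenue that goes to instruction, the ratio runs the other way. A minimal sketch with illustrative values (the `demo` frame and the `instruction_share` name are assumptions, and this is not necessarily the SBPC method):

```python
import numpy as np
import pandas as pd

# illustrative values only, including a zero and a missing revenue figure
demo = pd.DataFrame({'TUITFTE': [3625.0, 8770.0, 0.0, np.nan],
                     'INEXPFTE': [5018.0, 8628.0, 1200.0, 4000.0]})

# instruction spending as a share of net tuition revenue;
# a zero denominator becomes NaN instead of inf
tuition = demo['TUITFTE'].replace(0, np.nan)
demo['instruction_share'] = demo['INEXPFTE'] / tuition

print(demo['instruction_share'].round(3).tolist())
```

A share above 1 here would mean instructional spending per FTE exceeds net tuition revenue per FTE, which can happen when a school has other revenue sources.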

In [112]:
ilfp[['INSTNM', 'TUITFTE','INEXPFTE', 'pct_rev_instruction']].sort_values('pct_rev_instruction')

Unnamed: 0,INSTNM,TUITFTE,INEXPFTE,pct_rev_instruction
852,Cannella School of Hair Design-Chicago,3625.0,5018.0,0.722399
853,Cannella School of Hair Design-Chicago,4086.0,4646.0,0.879466
851,Cannella School of Hair Design-Villa Park,10943.0,11284.0,0.969780
957,Professional's Choice Hair Design Academy,8780.0,8914.0,0.984967
5344,Empire Beauty School-Stone Park,8770.0,8628.0,1.016458
...,...,...,...,...
6413,Rasmussen University-Aurora,,,
6414,Rasmussen University-Romeoville/Joliet,,,
6415,Rasmussen University-Mokena/Tinley Park,,,
6489,Networks Barber College,,,


## Cost

Two different variables are used for cost (under the dev-category, cost): <br>
1. Average Cost of Attendance, Tuition and Fees: this is the average annual total cost of attendance, including tuition and fees, books and supplies, and living expenses for all full-time, first-time, degree/certificate-seeking undergraduates who receive Title IV aid. It's calculated from IPEDS. (COSTT4_A, COSTT4_P)
2. Average Net Price: this is the average annual total cost of attendance minus the average grant/scholarship aid given. (NPT4_PRIV, NPT4_PUB)

Average cost is split into program-year and academic-year variables, so I combined the two into a new variable, <b>combined_cost</b>. Average net price is split into public and private, so I combined them into a new variable called <b>combined_price</b>.

I calculated a weighted average for each institution type, weighted by <b>UG12MN</b>, which is the unduplicated count of undergraduate students enrolled during a 12-month period.

In [160]:
# combine columns

# use the academic-year cost (COSTT4_A) where it exists; otherwise fall back to the program-year cost (COSTT4_P)
il['combined_cost'] = il['COSTT4_A'].fillna(il['COSTT4_P'])
df['combined_cost'] = df['COSTT4_A'].fillna(df['COSTT4_P'])

# same for net price: public (NPT4_PUB) where it exists, otherwise private (NPT4_PRIV)
il['combined_price'] = il['NPT4_PUB'].fillna(il['NPT4_PRIV'])
df['combined_price'] = df['NPT4_PUB'].fillna(df['NPT4_PRIV'])

In [158]:
# inspect data
il[['INSTNM','CONTROL','UGDS','G12MN','UG12MN','NPT4_PRIV','NPT4_PROG','NPT4_PUB','combined_price','COSTT4_A','COSTT4_P','combined_cost']].sort_values('NPT4_PUB')

Unnamed: 0,INSTNM,CONTROL,UGDS,G12MN,UG12MN,NPT4_PRIV,NPT4_PROG,NPT4_PUB,combined_price,COSTT4_A,COSTT4_P,combined_cost
939,Moraine Valley Community College,1,7443.0,,17693.0,,,1610.0,1610.0,10929.0,,10929.0
868,City Colleges of Chicago-Wilbur Wright College,1,4537.0,,11885.0,,,3151.0,3151.0,9412.0,,9412.0
855,Carl Sandburg College,1,1108.0,,2443.0,,,3232.0,3232.0,11190.0,,11190.0
866,City Colleges of Chicago-Richard J Daley College,1,2203.0,,8361.0,,,3552.0,3552.0,9418.0,,9418.0
878,Elgin Community College,1,6347.0,,11788.0,,,3628.0,3628.0,8745.0,,8745.0
...,...,...,...,...,...,...,...,...,...,...,...,...
6414,Rasmussen University-Romeoville/Joliet,3,,,,,,,,,,
6415,Rasmussen University-Mokena/Tinley Park,3,,,,,,,,,,
6471,Relay Graduate School of Education - Chicago,2,,,,,,,,,,
6489,Networks Barber College,3,,,,,,,,,,


In [159]:
def weighted_average_cost(group):
    # calculate the weighted average using the formula:
    # sum(value * weight) / sum(weight)
    weighted_sum = (group['combined_cost'] * group['UG12MN']).sum()
    total_weight = group['UG12MN'].sum()
    return weighted_sum / total_weight

def weighted_average_price(group):
    # calculate the weighted average using the formula:
    # sum(value * weight) / sum(weight)
    weighted_sum = (group['combined_price'] * group['UG12MN']).sum()
    total_weight = group['UG12MN'].sum()
    return weighted_sum / total_weight
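One thing to watch in the two functions above: pandas skips NaN products in the numerator, but a school with enrollment and no cost value still adds its `UG12MN` to `total_weight`, which pulls the average down. A minimal sketch of a variant that drops incomplete rows first (the generic `weighted_average` helper and the `demo` frame are hypothetical, not part of the original analysis):

```python
import numpy as np
import pandas as pd

def weighted_average(group, value_col, weight_col):
    # keep only rows where both the value and the weight are present,
    # so the denominator covers the same rows as the numerator
    valid = group[[value_col, weight_col]].dropna()
    total_weight = valid[weight_col].sum()
    if total_weight == 0:
        return np.nan
    return (valid[value_col] * valid[weight_col]).sum() / total_weight

# hypothetical group: the third school has enrollment but no cost figure
demo = pd.DataFrame({'CONTROL': [1, 1, 1],
                     'combined_cost': [10000.0, 20000.0, np.nan],
                     'UG12MN': [100.0, 300.0, 600.0]})

print(weighted_average(demo, 'combined_cost', 'UG12MN'))  # 17500.0
```

On this sample the functions above would return 7000.0, because the 600 students at the school with no cost value are counted in the denominator. With the real data it could be applied as `il.groupby('CONTROL').apply(lambda g: weighted_average(g, 'combined_cost', 'UG12MN'))`.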

The average sticker price for one year at a for-profit in Illinois is about $30k, and the average net price is about $23k, so students pay roughly three quarters of the sticker price. At private non-profits the sticker price is higher (about $51k), but the net price is less than half of it (about $23k). On average, the net price of a for-profit is roughly 3x that of a public school in Illinois.

In [163]:
# average cost for il schools 
# 1 = public, 2 = private non-profit, 3 = private for-profit
il.groupby('CONTROL').apply(weighted_average_cost).reset_index(name='weighted_avg_cost')

Unnamed: 0,CONTROL,weighted_avg_cost
0,1,15622.116757
1,2,51383.011963
2,3,29845.993446


In [164]:
# average price for il schools
il.groupby('CONTROL').apply(weighted_average_price).reset_index(name='weighted_avg_price')

Unnamed: 0,CONTROL,weighted_avg_price
0,1,7946.992349
1,2,23077.501405
2,3,23067.973687
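The comparison between sticker price and net price can be worked through directly from the two Illinois tables above. A short sketch that copies those weighted averages into Series (the variable names are only for illustration):

```python
import pandas as pd

# weighted averages copied from the two Illinois tables above
# 1 = public, 2 = private non-profit, 3 = private for-profit
il_cost = pd.Series({1: 15622.116757, 2: 51383.011963, 3: 29845.993446})
il_price = pd.Series({1: 7946.992349, 2: 23077.501405, 3: 23067.973687})

# share of the sticker price that remains after grants and scholarships
net_share = (il_price / il_cost).round(2)
print(net_share.to_dict())  # {1: 0.51, 2: 0.45, 3: 0.77}

# for-profit net price relative to public net price
print(round(il_price[3] / il_price[1], 1))  # 2.9
```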


In [165]:
# average cost for us schools
df.groupby('CONTROL').apply(weighted_average_cost).reset_index(name='weighted_avg_cost')

Unnamed: 0,CONTROL,weighted_avg_cost
0,1,17667.304001
1,2,44457.576189
2,3,25285.161952


In [166]:
# average price for us schools
df.groupby('CONTROL').apply(weighted_average_price).reset_index(name='weighted_avg_price')

Unnamed: 0,CONTROL,weighted_avg_price
0,1,10070.134924
1,2,22702.072365
2,3,19825.508062


## Median debt

Cumulative median student debt represents the sum of all undergraduate federal loans over a student's college education at the institution. An individual borrower's debt can be counted in more than one institution's median debt calculation. Overall median debt is DEBT_MDN (DEBT_N is the number of students in the median debt cohort); it is also split out by gender, first-generation status, graduated/withdrew, and income level.<br>

<font color='red'>Check with expert:</font> The median debt levels in College Scorecard seem low relative to the average cumulative student loan debt reported in this [TICAS study](https://ticas.org/wp-content/uploads/2023/12/Quick-Facts-About-Student-Loan-Debt-2023.pdf). This is definitely an undercount because it's just federal loans for undergrad.

In [None]:
# inspect relevant variables
il[['INSTNM','DEBT_N','DEBT_MDN','FEMALE_DEBT_MDN', 'MALE_DEBT_MDN','FIRSTGEN_DEBT_MDN','NOTFIRSTGEN_DEBT_MDN','GRAD_DEBT_MDN','WDRAW_DEBT_MDN','LO_INC_DEBT_MDN','MD_INC_DEBT_MDN','HI_INC_DEBT_MDN']].head()

Unnamed: 0,INSTNM,DEBT_N,DEBT_MDN,FEMALE_DEBT_MDN,MALE_DEBT_MDN,FIRSTGEN_DEBT_MDN,NOTFIRSTGEN_DEBT_MDN,GRAD_DEBT_MDN,WDRAW_DEBT_MDN,LO_INC_DEBT_MDN,MD_INC_DEBT_MDN,HI_INC_DEBT_MDN
837,Adler University,PrivacySuppressed,PrivacySuppressed,PrivacySuppressed,PrivacySuppressed,PrivacySuppressed,PrivacySuppressed,PrivacySuppressed,PrivacySuppressed,PrivacySuppressed,PrivacySuppressed,PrivacySuppressed
838,American Academy of Art College,119,27000,27000,26931,26982,27000,27000,9500,28338,27000,19000
839,American Islamic College,,,,,,,,,,,
840,School of the Art Institute of Chicago,1055,19000,19500,18500,20701,18495,27000,11000,21363,19500,18456
841,Augustana College,1126,26000,26000,26000,26000,26000,27000,6625,25000,25280,26000


In [208]:
# create new debt dataframes so i can replace 'PrivacySuppressed'
debt_mdn_cols = ['DEBT_MDN', 'FEMALE_DEBT_MDN', 'MALE_DEBT_MDN', 'FIRSTGEN_DEBT_MDN', 'NOTFIRSTGEN_DEBT_MDN',
                 'GRAD_DEBT_MDN', 'WDRAW_DEBT_MDN', 'LO_INC_DEBT_MDN', 'MD_INC_DEBT_MDN', 'HI_INC_DEBT_MDN']

il_debt = il[['INSTNM', 'CONTROL', 'DEBT_N'] + debt_mdn_cols].copy()
df_debt = df[['INSTNM', 'CONTROL', 'DEBT_N'] + debt_mdn_cols].copy()

il_debt = il_debt.replace('PrivacySuppressed', np.nan)
df_debt = df_debt.replace('PrivacySuppressed', np.nan)

# convert the median debt columns to floats, which work with np.nan
il_debt[debt_mdn_cols] = il_debt[debt_mdn_cols].astype(float)
df_debt[debt_mdn_cols] = df_debt[debt_mdn_cols].astype(float)
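An alternative to replacing the marker and then casting is `pd.to_numeric` with `errors='coerce'`, which does both in one step. A minimal sketch on a hypothetical sample (`debt_demo` and `num_cols` are made-up names):

```python
import pandas as pd

# hypothetical sample with the same mix of numeric strings and suppressed values
debt_demo = pd.DataFrame({'INSTNM': ['School A', 'School B', 'School C'],
                          'DEBT_MDN': ['27000', 'PrivacySuppressed', None],
                          'GRAD_DEBT_MDN': ['27000', '19500', 'PrivacySuppressed']})

num_cols = ['DEBT_MDN', 'GRAD_DEBT_MDN']

# errors='coerce' turns anything that cannot be parsed as a number into NaN
debt_demo[num_cols] = debt_demo[num_cols].apply(pd.to_numeric, errors='coerce')

print(debt_demo['DEBT_MDN'].tolist())
print(debt_demo.dtypes['GRAD_DEBT_MDN'])  # float64
```

The trade-off is that `errors='coerce'` also silently converts any other unexpected string to NaN, whereas the explicit replace-then-cast approach raises an error if an unknown marker appears.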

Overall, there do not appear to be any surprising differences between school types across the different median debt measures.

In [None]:
# group by for il
il_debt.groupby('CONTROL')[['DEBT_MDN', 'FEMALE_DEBT_MDN', 'MALE_DEBT_MDN','FIRSTGEN_DEBT_MDN','NOTFIRSTGEN_DEBT_MDN','GRAD_DEBT_MDN','WDRAW_DEBT_MDN','LO_INC_DEBT_MDN','MD_INC_DEBT_MDN','HI_INC_DEBT_MDN']].mean()

Unnamed: 0_level_0,DEBT_MDN,FEMALE_DEBT_MDN,MALE_DEBT_MDN,FIRSTGEN_DEBT_MDN,NOTFIRSTGEN_DEBT_MDN,GRAD_DEBT_MDN,WDRAW_DEBT_MDN,LO_INC_DEBT_MDN,MD_INC_DEBT_MDN,HI_INC_DEBT_MDN
CONTROL,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1
1,7628.298246,7988.471698,7242.660377,7984.333333,7689.313725,10815.472727,5689.090909,7949.321429,7696.734694,7876.571429
2,16534.878788,17779.036364,16070.309091,17024.35,16939.066667,22126.0,8449.0,17068.112903,17187.649123,16736.140351
3,8993.917647,9842.837838,10093.864865,9471.509434,9434.962264,10966.901235,4970.053333,9122.768116,9498.647059,8353.098039


In [209]:
# group by for us
df_debt.groupby('CONTROL')[['DEBT_MDN', 'FEMALE_DEBT_MDN', 'MALE_DEBT_MDN','FIRSTGEN_DEBT_MDN','NOTFIRSTGEN_DEBT_MDN','GRAD_DEBT_MDN','WDRAW_DEBT_MDN','LO_INC_DEBT_MDN','MD_INC_DEBT_MDN','HI_INC_DEBT_MDN']].mean()

Unnamed: 0_level_0,DEBT_MDN,FEMALE_DEBT_MDN,MALE_DEBT_MDN,FIRSTGEN_DEBT_MDN,NOTFIRSTGEN_DEBT_MDN,GRAD_DEBT_MDN,WDRAW_DEBT_MDN,LO_INC_DEBT_MDN,MD_INC_DEBT_MDN,HI_INC_DEBT_MDN
CONTROL,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1
1,9932.099769,10427.933709,9503.49531,10270.524516,9983.832903,14539.668522,6918.532268,10307.945718,10060.746239,9736.595535
2,15828.529292,16601.890308,15263.160781,16195.860119,16184.795387,22212.307143,8443.658501,15669.332607,16552.048632,16245.017557
3,9229.516231,10074.509677,10002.898925,9707.955398,10155.577246,12649.894632,5683.637487,9436.672441,10397.955428,9872.008671


In [211]:
df_debt.sort_values('DEBT_MDN', ascending=False).dropna().head()

Unnamed: 0,INSTNM,CONTROL,DEBT_N,DEBT_MDN,FEMALE_DEBT_MDN,MALE_DEBT_MDN,FIRSTGEN_DEBT_MDN,NOTFIRSTGEN_DEBT_MDN,GRAD_DEBT_MDN,WDRAW_DEBT_MDN,LO_INC_DEBT_MDN,MD_INC_DEBT_MDN,HI_INC_DEBT_MDN
3729,Platt College-Aurora,3,173,33443.0,34192.0,31683.0,34192.0,31424.0,42125.0,9928.0,39742.0,34096.0,27160.0
4258,American University of Health Sciences,3,221,32484.0,34666.0,27500.0,33641.0,29834.0,40326.0,16333.0,37467.0,31036.0,27166.0
4659,Gnomon,3,129,28332.0,28332.0,28332.0,28332.0,28332.0,28332.0,20000.0,28332.0,28144.0,27000.0
2924,Providence College,2,1469,27000.0,27000.0,27000.0,27000.0,27000.0,27000.0,14830.0,25943.0,27000.0,27000.0
2868,Saint Francis University,2,945,27000.0,27000.0,26000.0,26205.0,27000.0,27000.0,10610.0,20750.0,27000.0,27000.0


Median debt is lower at for-profits than at some public schools and many private non-profits in Illinois, possibly because their programs are shorter. <br>
<br>
<font color='red'>TODO:</font> see if I can track down length of program in IPEDS and merge.

In [219]:
il_debt[il_debt['CONTROL'] == 3].sort_values('DEBT_MDN', ascending=False).dropna().head()

Unnamed: 0,INSTNM,CONTROL,DEBT_N,DEBT_MDN,FEMALE_DEBT_MDN,MALE_DEBT_MDN,FIRSTGEN_DEBT_MDN,NOTFIRSTGEN_DEBT_MDN,GRAD_DEBT_MDN,WDRAW_DEBT_MDN,LO_INC_DEBT_MDN,MD_INC_DEBT_MDN,HI_INC_DEBT_MDN
838,American Academy of Art College,3,119,27000.0,27000.0,26931.0,26982.0,27000.0,27000.0,9500.0,28338.0,27000.0,19000.0
4783,Chamberlain University-Illinois,3,25881,16458.0,16411.0,16657.0,16405.0,16594.0,20919.0,10922.0,16577.0,15795.0,17250.0
882,Fox College,3,615,13625.0,13625.0,15363.0,13623.0,15791.0,16209.0,4504.0,16000.0,12825.0,13623.0
5931,Stautzenberger College-Rockford Career College,3,2436,13127.0,13292.0,11839.0,13000.0,13367.0,14302.0,7125.0,13160.0,13000.0,12076.0
6415,Rasmussen University-Mokena/Tinley Park,3,25107,13000.0,13000.0,12500.0,12834.0,13845.0,20899.0,6334.0,12500.0,14203.0,13586.0


In [221]:
il_debt[il_debt['CONTROL'] == 1].sort_values('DEBT_MDN', ascending=False).dropna().head()

Unnamed: 0,INSTNM,CONTROL,DEBT_N,DEBT_MDN,FEMALE_DEBT_MDN,MALE_DEBT_MDN,FIRSTGEN_DEBT_MDN,NOTFIRSTGEN_DEBT_MDN,GRAD_DEBT_MDN,WDRAW_DEBT_MDN,LO_INC_DEBT_MDN,MD_INC_DEBT_MDN,HI_INC_DEBT_MDN
859,Chicago State University,1,1689,21500.0,22045.0,17500.0,21735.0,19250.0,30625.0,14250.0,23000.0,19000.0,15000.0
994,Western Illinois University,1,4090,19762.0,20998.0,18750.0,20500.0,18054.0,25251.0,12000.0,23496.0,20350.0,17500.0
981,Southern Illinois University-Carbondale,1,5185,17750.0,18113.0,17379.0,18500.0,15750.0,21543.0,10500.0,19750.0,17500.0,15000.0
897,University of Illinois Urbana-Champaign,1,10350,16500.0,16000.0,16871.0,15650.0,17500.0,19500.0,7500.0,13613.0,16985.0,17500.0
948,Northern Illinois University,1,7480,16250.0,17125.0,15000.0,16750.0,15000.0,22162.0,9500.0,18000.0,16000.0,15000.0


## Default and repayment

This section uses the three-year cohort default rate (CDR3), which is produced annually as an institutional accountability metric and represents a snapshot in time. For example, FY 2016 rates were calculated using the cohort of borrowers who entered repayment on their federal student loans between October 1, 2015 and September 30, 2016, and who defaulted before September 30, 2018.
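The cohort window in the FY 2016 example can be written out as date arithmetic. A minimal sketch, assuming every fiscal year follows the same pattern as that example; the function names are hypothetical and the simple ratio ignores any special rules the official calculation may apply to small cohorts:

```python
from datetime import date

def cdr3_window(fiscal_year):
    # federal fiscal year N runs from October 1 of year N-1 to September 30 of year N
    cohort_start = date(fiscal_year - 1, 10, 1)
    cohort_end = date(fiscal_year, 9, 30)
    # defaults are counted through the end of the second following fiscal year
    default_cutoff = date(fiscal_year + 2, 9, 30)
    return cohort_start, cohort_end, default_cutoff

def cohort_default_rate(defaulted, entered_repayment):
    # share of the cohort that defaulted within the window
    return defaulted / entered_repayment

print([d.isoformat() for d in cdr3_window(2016)])
# ['2015-10-01', '2016-09-30', '2018-09-30']

# hypothetical counts: 30 defaulters out of 1,000 borrowers entering repayment
print(cohort_default_rate(30, 1000))  # 0.03
```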

In [235]:
il[['INSTNM','CDR3','BBRR2_FED_UGCOMP_DFLT_SUPP','BBRR4_FED_UG_DFLT','BBRR4_FED_UG_MAKEPROG','BBRR3_FED_GRCOMP_DFLT']]

Unnamed: 0,INSTNM,CDR3,BBRR2_FED_UGCOMP_DFLT_SUPP,BBRR4_FED_UG_DFLT,BBRR4_FED_UG_MAKEPROG,BBRR3_FED_GRCOMP_DFLT
837,Adler University,0.002,PrivacySuppressed,PrivacySuppressed,PrivacySuppressed,<=0.05
838,American Academy of Art College,0.019,PrivacySuppressed,0.05-0.09,0.15-0.19,PrivacySuppressed
839,American Islamic College,,PrivacySuppressed,PrivacySuppressed,PrivacySuppressed,PrivacySuppressed
840,School of the Art Institute of Chicago,0.008,PrivacySuppressed,0.07-0.08,0.11-0.12,<=0.20
841,Augustana College,0.011,PrivacySuppressed,0.04,0.17,PrivacySuppressed
...,...,...,...,...,...,...
6414,Rasmussen University-Romeoville/Joliet,0.016,0.01831016986543,0.11,0.06,PrivacySuppressed
6415,Rasmussen University-Mokena/Tinley Park,0.016,0.01831016986543,0.11,0.06,PrivacySuppressed
6471,Relay Graduate School of Education - Chicago,0.023,PrivacySuppressed,PrivacySuppressed,PrivacySuppressed,<=0.10
6489,Networks Barber College,0.030,PrivacySuppressed,0.40-0.59,PrivacySuppressed,PrivacySuppressed


In [251]:
var_list = ['CDR3',
            'BBRR1_FED_UG_DFLT','BBRR1_FED_UG_DLNQ','BBRR1_FED_UG_MAKEPROG','BBRR1_FED_UG_NOPROG','BBRR1_FED_UG_PAIDINFULL',
            'BBRR2_FED_UG_DFLT','BBRR2_FED_UG_DLNQ','BBRR2_FED_UG_MAKEPROG','BBRR2_FED_UG_NOPROG','BBRR2_FED_UG_PAIDINFULL',
            'BBRR3_FED_UG_DFLT','BBRR3_FED_UG_DLNQ','BBRR3_FED_UG_MAKEPROG','BBRR3_FED_UG_NOPROG','BBRR3_FED_UG_PAIDINFULL',
            'BBRR4_FED_UG_DFLT','BBRR4_FED_UG_DLNQ','BBRR4_FED_UG_MAKEPROG','BBRR4_FED_UG_NOPROG','BBRR4_FED_UG_PAIDINFULL',
            'BBRR4_FED_GR_DFLT','BBRR4_FED_GR_DLNQ','BBRR4_FED_GR_MAKEPROG','BBRR4_FED_GR_NOPROG','BBRR4_FED_GR_PAIDINFULL']

Check for completeness of variables. Undergrad is more reliable across 1-4 years after entering repayment, but the repayment percentages are inconsistent: some are ranges, others exact percentages. Grad has too many privacy-suppressed values. The 3-year cohort default rate <b>CDR3</b> is the most reliable and the only fully numeric one that I can do calculations on.
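If the range-valued repayment columns are needed later, one option is to parse each cell into lower and upper bounds. A rough sketch (`parse_rate` is a hypothetical helper) that handles only the formats visible in the table above: exact values, `low-high` ranges, `<=x`, and `PrivacySuppressed`. Any other format would raise a `ValueError`.

```python
import math

def parse_rate(value):
    """Return (low, high) bounds for a repayment-rate cell, or (nan, nan)."""
    if value is None or (isinstance(value, float) and math.isnan(value)):
        return (math.nan, math.nan)
    text = str(value).strip()
    if text == 'PrivacySuppressed' or text == '':
        return (math.nan, math.nan)
    if text.startswith('<='):
        # only an upper bound is given; assume the rate cannot fall below zero
        return (0.0, float(text[2:]))
    if '-' in text:
        low, high = text.split('-', 1)
        return (float(low), float(high))
    exact = float(text)
    return (exact, exact)

print(parse_rate('0.05-0.09'))  # (0.05, 0.09)
print(parse_rate('<=0.05'))     # (0.0, 0.05)
print(parse_rate('0.04'))       # (0.04, 0.04)
```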

In [253]:
for var in var_list:
    privacy = len(ilfp[ilfp[var] == 'PrivacySuppressed'])
    nas = len(ilfp[ilfp[var].isnull()])
    print(var)
    print('total privacy/nulls: ', privacy+nas)
    print('pct incomplete: ',(privacy+nas)/len(ilfp))

CDR3
total privacy/nulls:  7
pct incomplete:  0.07291666666666667
BBRR1_FED_UG_DFLT
total privacy/nulls:  19
pct incomplete:  0.19791666666666666
BBRR1_FED_UG_DLNQ
total privacy/nulls:  19
pct incomplete:  0.19791666666666666
BBRR1_FED_UG_MAKEPROG
total privacy/nulls:  19
pct incomplete:  0.19791666666666666
BBRR1_FED_UG_NOPROG
total privacy/nulls:  19
pct incomplete:  0.19791666666666666
BBRR1_FED_UG_PAIDINFULL
total privacy/nulls:  19
pct incomplete:  0.19791666666666666
BBRR2_FED_UG_DFLT
total privacy/nulls:  17
pct incomplete:  0.17708333333333334
BBRR2_FED_UG_DLNQ
total privacy/nulls:  21
pct incomplete:  0.21875
BBRR2_FED_UG_MAKEPROG
total privacy/nulls:  18
pct incomplete:  0.1875
BBRR2_FED_UG_NOPROG
total privacy/nulls:  20
pct incomplete:  0.20833333333333334
BBRR2_FED_UG_PAIDINFULL
total privacy/nulls:  21
pct incomplete:  0.21875
BBRR3_FED_UG_DFLT
total privacy/nulls:  18
pct incomplete:  0.1875
BBRR3_FED_UG_DLNQ
total privacy/nulls:  21
pct incomplete:  0.21875
BBRR3_FED_UG
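The same completeness check can be collected into one table instead of printed line by line. A minimal sketch on a hypothetical four-row sample (in the notebook, `demo` would be `ilfp` and `cols` would be `var_list`):

```python
import numpy as np
import pandas as pd

# hypothetical sample with one numeric column and one mixed column
demo = pd.DataFrame({'CDR3': [0.184, np.nan, 0.09, 0.068],
                     'BBRR2_FED_UG_DFLT': ['0.11', 'PrivacySuppressed', None, '0.40-0.59']})

cols = ['CDR3', 'BBRR2_FED_UG_DFLT']

# a cell is incomplete if it is null or privacy suppressed
incomplete = demo[cols].isna() | demo[cols].eq('PrivacySuppressed')
summary = pd.DataFrame({'n_incomplete': incomplete.sum(),
                        'pct_incomplete': incomplete.mean()})

print(summary.sort_values('pct_incomplete'))
```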

In [255]:
test = il[['INSTNM', 'CONTROL'] + var_list].copy()
test.to_csv('test.csv')

Nationally, the average default rate across all Title IV eligible higher ed institutions is 3%. It's highest at for-profit schools, at 3.7%, compared with 3.1% at public schools and 2.1% at private non-profits. Illinois schools roughly follow that trend, although public schools and for-profits are about even at 3.5%.

In [265]:
df['CDR3'].describe()

count    5673.000000
mean        0.030427
std         0.032918
min         0.000000
25%         0.011000
50%         0.023000
75%         0.040000
max         0.428000
Name: CDR3, dtype: float64

In [266]:
df.groupby('CONTROL')['CDR3'].mean()

CONTROL
1    0.031335
2    0.021488
3    0.036775
Name: CDR3, dtype: float64

In [267]:
il.groupby('CONTROL')['CDR3'].mean()

CONTROL
1    0.035140
2    0.018464
3    0.035011
Name: CDR3, dtype: float64

But some for-profit schools in Illinois have default rates two to six times higher than the national average across all schools, and have among the highest default rates of all schools in the country. Nearly all are cosmetology or barbering schools.
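The "two to six times" figure comes from dividing each school's rate by the national mean. A short sketch using the national mean from `df['CDR3'].describe()` above and three of the CDR3 values from the table below (variable names are for illustration only):

```python
import pandas as pd

# national mean from df['CDR3'].describe() above
national_mean = 0.030427

# CDR3 values for three of the Illinois for-profits in the table below
rates = pd.Series({'Cannella School of Hair Design-Chicago': 0.184,
                   "Larry's Barber College": 0.13,
                   'Steven Papageorge Hair Academy': 0.068})

# each school's default rate as a multiple of the national mean
multiples = (rates / national_mean).round(1)
print(multiples.tolist())  # [6.0, 4.3, 2.2]
```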

In [276]:
# top 15 highest default rates amongst Illinois for-profits
ilfp[['INSTNM','CDR3','CIPTITLE1','CIPTITLE2','CIPTITLE3', 'G12MN','UG12MN','DEBT_MDN', 'MD_EARN_WNE_P10','NPT4_PRIV']].sort_values('CDR3', ascending=False).head(15)

Unnamed: 0,INSTNM,CDR3,CIPTITLE1,CIPTITLE2,CIPTITLE3,G12MN,UG12MN,DEBT_MDN,MD_EARN_WNE_P10,NPT4_PRIV
852,Cannella School of Hair Design-Chicago,0.184,Cosmetology/Cosmetologist General,,,,68.0,2533,20933.0,9225.0
5430,Creative Touch Cosmetology School,0.177,Cosmetology/Cosmetologist General,Cosmetology Barber/Styling and Nail Instructor,,,21.0,PrivacySuppressed,,14014.0
983,Taylor Business Institute,0.177,,,,,201.0,4554,24348.0,16555.0
986,Tri-County Beauty Academy,0.137,Cosmetology/Cosmetologist General,Cosmetology Barber/Styling and Nail Instructor,,,16.0,PrivacySuppressed,,12104.0
5217,Larry's Barber College,0.13,Barbering/Barber,Cosmetology Barber/Styling and Nail Instructor,,,133.0,3723,,13785.0
6024,Larry's Barber College-Joliet,0.13,Barbering/Barber,Cosmetology Barber/Styling and Nail Instructor,,,11.0,3723,,8276.0
5969,Larry's Barber College,0.13,Barbering/Barber,Cosmetology Barber/Styling and Nail Instructor,,,49.0,3723,,14064.0
3804,Hairmasters Institute of Cosmetology,0.09,Cosmetology/Cosmetologist General,Barbering/Barber,Cosmetology Barber/Styling and Nail Instructor,,139.0,7048,22942.0,8575.0
5204,Reflections Academy of Beauty,0.09,Cosmetology/Cosmetologist General,Cosmetology Barber/Styling and Nail Instructor,Cosmetology Barber/Styling and Nail Instructor,,46.0,9500,,11733.0
929,Steven Papageorge Hair Academy,0.068,Cosmetology/Cosmetologist General,Cosmetology Barber/Styling and Nail Instructor,,,81.0,PrivacySuppressed,17031.0,15374.0


Why are public schools' default rates just as high as those of for-profits?

In [271]:
il[il['CONTROL']==1][['INSTNM','CDR3','CIPTITLE1','CIPTITLE2','CIPTITLE3', 'DEBT_MDN', 'MD_EARN_WNE_P10','NPT4_PUB']].sort_values('CDR3', ascending=False).head()

Unnamed: 0,INSTNM,CDR3,CIPTITLE1,CIPTITLE2,CIPTITLE3,DEBT_MDN,MD_EARN_WNE_P10,NPT4_PUB
911,Kaskaskia College,0.1,,,,3500,34989.0,6064.0
906,John A Logan College,0.086,,,,3500,31782.0,5616.0
959,Rend Lake College,0.081,,,,4475,35169.0,8227.0
978,Spoon River College,0.075,,,,6750,36442.0,5552.0
912,Kishwaukee College,0.07,,,,6542,40064.0,5472.0


For context, here are some of the schools with the lowest default rates (non-zero) in Illinois. Chamberlain appears to be an exception to for-profit trends.

In [282]:
il[il['CDR3'] > 0.0][['INSTNM','CDR3','CIPTITLE1','CIPTITLE2','CIPTITLE3', 'DEBT_MDN', 'MD_EARN_WNE_P10','NPT4_PUB','combined_cost','combined_price']].sort_values('CDR3', ascending=True).head(10)

Unnamed: 0,INSTNM,CDR3,CIPTITLE1,CIPTITLE2,CIPTITLE3,DEBT_MDN,MD_EARN_WNE_P10,NPT4_PUB,combined_cost,combined_price
893,Rosalind Franklin University of Medicine and S...,0.001,,,,PrivacySuppressed,,,,
949,Northwestern University,0.001,,,,14000,85796.0,,81058.0,28230.0
837,Adler University,0.002,,,,PrivacySuppressed,,,,
879,Elmhurst University,0.002,,,,15000,58657.0,,50133.0,23036.0
6149,University of Notre Dame -,0.002,,,,19000,93220.0,,,
935,Methodist College,0.004,,,,27000,66066.0,,38345.0,28547.0
993,Oak Point University,0.005,,,,25000,84630.0,,,
943,National University of Health Sciences,0.005,,,,12289,46611.0,,,
4783,Chamberlain University-Illinois,0.005,,,,16458,82055.0,,32437.0,23638.0
861,University of Chicago,0.005,,,,13368,78439.0,,81531.0,22690.0


In [285]:
# create table for viz
dflt_viz = il[['INSTNM','CONTROL','CDR3','CIPTITLE1','G12MN','UG12MN','DEBT_MDN', 'MD_EARN_WNE_P6','combined_price','combined_cost']].sort_values('CDR3', ascending=False).copy()

In [286]:
dflt_viz.to_csv('output/default_viz.csv', index=False)

## Earnings

College Scorecard reports mean and median earnings for students 6, 8 and 10 years after entry. The figures cover students who are working and not enrolled at the date of measurement. Earnings are based on wages and deferred compensation reported on IRS Form W-2 and on Schedule SE. They reflect 2020 earnings and may be affected by the pandemic. <br>

From College Scorecard: "2008-09 and 2009-10 pooled award year cohort measured in calendar year 2019 and 2020. Earnings are inflation adjusted to 2021 dollars." <br>

I calculated the average of median earnings across schools by <b>CONTROL</b>.<br>

These are the median earnings figures reported in each school's profile on College Scorecard and are what [Bloomberg](https://www.bloomberg.com/graphics/2024-college-return-on-investment/) used for a story that cited data provided by Georgetown University's Center on Education and the Workforce.

In [83]:
# inspect median and mean data
il[['INSTNM','CONTROL','MD_EARN_WNE_P10','MN_EARN_WNE_P10','MD_EARN_WNE_P6','MD_EARN_WNE_P8']].dropna().sort_values('MD_EARN_WNE_P10', ascending=False)

Unnamed: 0,INSTNM,CONTROL,MD_EARN_WNE_P10,MN_EARN_WNE_P10,MD_EARN_WNE_P6,MD_EARN_WNE_P8
6149,University of Notre Dame -,2,93220.0,98400,84235.0,87008.0
949,Northwestern University,2,85796.0,93400,72370.0,78939.0
965,Rush University,2,84906.0,125000,70482.0,70868.0
993,Oak Point University,2,84630.0,72900,76137.0,76457.0
902,Illinois Institute of Technology,2,82793.0,78000,68517.0,73894.0
...,...,...,...,...,...,...
4815,Tricoci University of Beauty Culture-Chicago NW,3,20011.0,PrivacySuppressed,25520.0,25923.0
3731,Educators of Beauty College of Cosmetology-Roc...,3,18827.0,19800,21896.0,19726.0
980,Educators of Beauty College of Cosmetology-Ste...,3,18827.0,19800,21896.0,19726.0
872,Tricoci University of Beauty Culture,3,18311.0,22200,15573.0,16339.0


The average time it takes to graduate is 5 years. At 6, 8, and 10 years after entering school, median earnings for Illinoisans who attended for-profits are consistently lower than those of their counterparts who went to public and private nonprofit schools. The gap with private nonprofits is roughly $20k to $25k.

In [33]:
il.groupby('CONTROL')[['MD_EARN_WNE_P6','MD_EARN_WNE_P8','MD_EARN_WNE_P10']].mean()

Unnamed: 0_level_0,MD_EARN_WNE_P6,MD_EARN_WNE_P8,MD_EARN_WNE_P10
CONTROL,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
1,34585.716667,37910.233333,40839.233333
2,46890.909091,51234.19697,55169.333333
3,26399.52439,28702.408451,30639.063492


This trend is reflected nationally as well.

In [34]:
df.groupby('CONTROL')[['MD_EARN_WNE_P6','MD_EARN_WNE_P8','MD_EARN_WNE_P10']].mean()

Unnamed: 0_level_0,MD_EARN_WNE_P6,MD_EARN_WNE_P8,MD_EARN_WNE_P10
CONTROL,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
1,36597.315234,39669.291561,42691.477261
2,43251.767458,47349.144309,51338.654483
3,27615.410573,29091.836176,30968.139363


We do not have median earnings data for about one third of Illinois for-profit schools at 10 years after enrolling (33 of 96) or for about one quarter at 8 years (25 of 96). The data is more complete at 6 years after enrolling (14 of 96 missing), so that's the data I'll show by school.

In [71]:
il[il['MD_EARN_WNE_P10'].isnull()].groupby('CONTROL').size()

CONTROL
1     3
2    24
3    33
dtype: int64

In [79]:
il[il['MD_EARN_WNE_P6'].isnull()].groupby('CONTROL').size()

CONTROL
1     3
2    24
3    14
dtype: int64

In [81]:
il[il['MD_EARN_WNE_P8'].isnull()].groupby('CONTROL').size()

CONTROL
1     3
2    24
3    25
dtype: int64

In [73]:
len(ilfp)

96
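The three null counts above can also be expressed as shares of each sector in a single table. A minimal sketch on a hypothetical six-row sample (in the notebook, `demo` would be the `il` dataframe):

```python
import numpy as np
import pandas as pd

earn_cols = ['MD_EARN_WNE_P6', 'MD_EARN_WNE_P8', 'MD_EARN_WNE_P10']

# hypothetical sample: two public schools and four for-profits
demo = pd.DataFrame({'CONTROL': [1, 1, 3, 3, 3, 3],
                     'MD_EARN_WNE_P6': [34000.0, 36000.0, 25000.0, np.nan, 27000.0, 26000.0],
                     'MD_EARN_WNE_P8': [37000.0, 39000.0, np.nan, np.nan, 29000.0, 28000.0],
                     'MD_EARN_WNE_P10': [40000.0, np.nan, np.nan, np.nan, 31000.0, np.nan]})

# share of schools in each sector with no earnings figure, per time horizon
missing_share = demo[earn_cols].isna().groupby(demo['CONTROL']).mean()
print(missing_share)
```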

The for-profit schools with the highest median earnings 6 years after enrolling are mostly schools that award degrees in health professions and related programs; <b>PCIP51</b> is the percentage of degrees awarded in the health professions. Most of those with the lowest earnings award 100% of their degrees in personal and culinary services (<b>PCIP12</b>).

In [105]:
# top 10 highest earning schools
ilfp[['INSTNM','CONTROL','MD_EARN_WNE_P10','MN_EARN_WNE_P10','MD_EARN_WNE_P6','MD_EARN_WNE_P8','PCIP51','PCIP12']].sort_values('MD_EARN_WNE_P6', ascending=False).head(10)

Unnamed: 0,INSTNM,CONTROL,MD_EARN_WNE_P10,MN_EARN_WNE_P10,MD_EARN_WNE_P6,MD_EARN_WNE_P8,PCIP51,PCIP12
4783,Chamberlain University-Illinois,3,82055.0,60400,76330.0,81287.0,1.0,0.0
5306,Verve College,3,,,57941.0,,1.0,0.0
3834,Worsham College of Mortuary Science,3,53900.0,48100,56713.0,53766.0,0.0,1.0
4927,Ambria College of Nursing,3,67488.0,PrivacySuppressed,50066.0,59625.0,1.0,0.0
3786,Universal Technical Institute of Illinois Inc,3,51889.0,48100,46572.0,47765.0,0.0,0.0
3666,ETI School of Skilled Trades,3,44410.0,44900,43814.0,48865.0,0.0,0.0
5371,DeVry University-Illinois,3,45217.0,50300,38520.0,40307.0,0.5015,0.0
4339,MDT College of Health Sciences,3,46801.0,36100,38110.0,38502.0,1.0,0.0
6415,Rasmussen University-Mokena/Tinley Park,3,37168.0,34300,35866.0,34181.0,,
6414,Rasmussen University-Romeoville/Joliet,3,37168.0,34300,35866.0,34181.0,,


In [106]:
# 10 lowest earning schools
ilfp[['INSTNM','CONTROL','MD_EARN_WNE_P10','MN_EARN_WNE_P10','MD_EARN_WNE_P6','MD_EARN_WNE_P8','PCIP51','PCIP12']].sort_values('MD_EARN_WNE_P6', ascending=True).head(10)

Unnamed: 0,INSTNM,CONTROL,MD_EARN_WNE_P10,MN_EARN_WNE_P10,MD_EARN_WNE_P6,MD_EARN_WNE_P8,PCIP51,PCIP12
852,Cannella School of Hair Design-Chicago,3,20933.0,18000,15448.0,15695.0,0.0,1.0
4930,Innovations Design Academy,3,,,15515.0,,0.0,1.0
872,Tricoci University of Beauty Culture,3,18311.0,22200,15573.0,16339.0,0.0,1.0
3833,Rosel School of Cosmetology,3,15858.0,18700,16379.0,20898.0,0.0,1.0
983,Taylor Business Institute,3,24348.0,25200,16625.0,24254.0,0.1928,0.0
5969,Larry's Barber College,3,,,17652.0,,0.0,1.0
5217,Larry's Barber College,3,,,17652.0,,0.0,1.0
6024,Larry's Barber College-Joliet,3,,,17652.0,,0.0,1.0
986,Tri-County Beauty Academy,3,,PrivacySuppressed,17970.0,18285.0,0.0,1.0
3768,Bell Mar Beauty College,3,26062.0,21500,18169.0,24914.0,0.0,1.0


<font color='red'>TO DO: </font> Look at the earnings by program. For-profits may have lower median earnings because the professions they serve are lower-earning professions, but are the same programs at public/nonprofits higher earning than they are at for-profits? What is the program-level breakdown in for-profits, and are there some programs, like cosmetology and nursing, that are predominantly served by for-profit schools?

In [107]:
ilff[['INSTNM','CIPDESC','CREDDESC','CONTROL','EARN_MDN_HI_1YR','EARN_COUNT_NWNE_HI_1YR']]

Unnamed: 0,INSTNM,CIPDESC,CREDDESC,CONTROL,EARN_MDN_HI_1YR,EARN_COUNT_NWNE_HI_1YR
0,Adler University,Communication and Media Studies.,Master's Degree,"Private, nonprofit",PrivacySuppressed,0
1,Adler University,Gerontology.,Master's Degree,"Private, nonprofit",PrivacySuppressed,PrivacySuppressed
2,Adler University,Health and Physical Education/Fitness.,Master's Degree,"Private, nonprofit",,
3,Adler University,"Psychology, General.",Master's Degree,"Private, nonprofit",37452,4
4,Adler University,"Clinical, Counseling and Applied Psychology.",Master's Degree,"Private, nonprofit",40358,6
...,...,...,...,...,...,...
8475,,,,,,
8476,,,,,,
8477,,,,,,
8478,,,,,,
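A possible starting point for the program-level question is a pivot of median earnings by program and sector. This is a rough sketch on hypothetical rows shaped like the field-level columns above (the `prog` frame, its values, and the `Public` and `Private, for-profit` labels are assumptions); averaging program-level medians is only a first look, not a true median across programs.

```python
import pandas as pd

# hypothetical rows shaped like the field-level columns shown above
prog = pd.DataFrame({
    'CIPDESC': ['Cosmetology', 'Cosmetology', 'Nursing', 'Nursing', 'Nursing'],
    'CONTROL': ['Public', 'Private, for-profit', 'Public',
                'Private, for-profit', 'Private, nonprofit'],
    'EARN_MDN_HI_1YR': ['21000', '18000', '60000', 'PrivacySuppressed', '62000'],
})

# suppressed values become NaN so the column can be averaged
prog['earn'] = pd.to_numeric(prog['EARN_MDN_HI_1YR'], errors='coerce')

# mean of program-level medians: one row per program, one column per sector
by_program = prog.pivot_table(index='CIPDESC', columns='CONTROL',
                              values='earn', aggfunc='mean')
print(by_program)

# number of programs offered per sector, to see which sectors dominate a field
print(prog.groupby(['CIPDESC', 'CONTROL']).size())
```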


## Drop out and completion rates