# College Scorecard Analysis

Last updated: May 21, 2024

## Load data

I downloaded the most recent institution-level and field-level [data](https://collegescorecard.ed.gov/data/), unzipped it and saved it on my local machine as CSVs. The data was last updated on October 10, 2023. Full descriptions of variables are in the [technical documentation](https://collegescorecard.ed.gov/assets/InstitutionDataDocumentation.pdf). The most recent year is 2021-22.

In [66]:
# load packages
import pandas as pd
import numpy as np

In [2]:
# institution level data
df = pd.read_csv('data/data/Most-Recent-Cohorts-Institution.csv')


Create dataframes for different combinations of schools I want to compare. The variable <b>CONTROL</b> is from IPEDS and code 3 means for-profits.

In [3]:
# filter for just Illinois for profits
ilfp = df[(df['CONTROL'] == 3) & (df['ST_FIPS'] == 17)].copy()

In [4]:
len(ilfp)

96

In [10]:
ilfp.to_csv('ilfp.csv')

In [5]:
# filter for all Illinois schools
il = df[df['ST_FIPS'] == 17].copy()

The field-level data does not include school state, so I need to merge with the institution-level set to create a dataframe of just Illinois schools. A few schools in the institution-level data are not in the field-level data:<br>
- First Institute of Travel Inc.
- Bexley Hall Seabury Western Theological Seminary Federation Inc.
- Zen Shiatsu Chicago
- Larry's Barber College-Joliet (Larry's Barber College is in the field-level data)
- Flashpoint Chicago A Campus of Columbia College Hollywood
- Triton College - Intl Union of Operating Engr Local 399 Trning Fac.
- University of Notre Dame -
- Columbia College Crystal Lake, Lake County, Freeport, Elgin (Columbia College is in the field-level data)
- Stellar Career College - Chicago IL
- Rasmussen University - Aurora, Romeoville, Tinley Park (Rasmussen is in)
- Networks Barber College
- Relay Graduate School of Education - Chicago

In [6]:
# full field level data
ff = pd.read_csv('data/data/Most-Recent-Cohorts-Field-of-Study.csv')

In [7]:
ilff = pd.merge(ff,il[['UNITID']],on='UNITID',how='right')
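The list of schools missing from the field-level data could be reproduced with a merge indicator. This is a minimal sketch on made-up rows, not the check actually used for this notebook; `inst`, `field`, and the school names are hypothetical stand-ins for `il` and `ff`.

```python
import pandas as pd

# Toy stand-ins for the institution-level and field-level tables (made-up rows)
inst = pd.DataFrame({
    'UNITID': [1, 2, 3],
    'INSTNM': ['A College', 'B Institute', 'C Academy'],
})
field = pd.DataFrame({
    'UNITID': [1, 1, 3],
    'CIPDESC': ['Nursing', 'Business', 'Barbering'],
})

# indicator=True adds a _merge column that says where each row was found
check = pd.merge(
    inst,
    field[['UNITID']].drop_duplicates(),
    on='UNITID',
    how='left',
    indicator=True,
)

# 'left_only' rows exist in the institution table but not in the field table
missing = check.loc[check['_merge'] == 'left_only', 'INSTNM'].tolist()
print(missing)
```

With `how='right'` in the merge above, these unmatched schools stay in `ilff` as rows whose field-level columns are all NaN.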

## Percent of revenue spent teaching each student

I would like to model the analysis after what the Student Borrower Protection Center did in their 2021 [report](https://protectborrowers.org/for-profit-mapping/). <br>

I divided <b>TUITFTE</b>, the net tuition revenue per full-time equivalent student, by <b>INEXPFTE</b>, the instructional expenditures per full-time equivalent student. Both include undergrad and grad students and come from the IPEDS Finance component; FTE enrollment comes from the IPEDS 12-Month Enrollment component. <br>

<font color='red'>Check with expert:</font> Which variables should I be using to calculate this? In IPEDS under Finance, for for-profits is dividing <b>Total revenues and investment return</b> by <b> Instruction - Total amount</b> better? The percentages really range and some are above 100% which makes this tricky.

<font color='red'>TODO:</font> Look at instructional expenditure per FTE only

In [None]:
ilfp[[]]

In [281]:
ilfp['pct_rev_instruction'] = ilfp['TUITFTE']/ilfp['INEXPFTE']
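The direction of the ratio matters when reading the result. The sketch below uses two rows copied from the INEXPFTE table in this section and computes both directions; the column names `instr_share_of_tuition` and `tuition_per_instr_dollar` are hypothetical. It assumes the share of revenue spent on instruction is meant as instruction spending divided by tuition revenue, which is worth confirming with the expert.

```python
import numpy as np
import pandas as pd

# Two rows copied from the INEXPFTE table in this section
rows = pd.DataFrame({
    'INSTNM': ['Northwestern University', 'Chicago State University'],
    'TUITFTE': [31160.0, 6719.0],
    'INEXPFTE': [37609.0, 29476.0],
})

# Instruction spending as a share of net tuition revenue (hypothetical name).
# Zeros are replaced with NaN so a zero denominator does not produce inf.
rows['instr_share_of_tuition'] = rows['INEXPFTE'] / rows['TUITFTE'].replace(0, np.nan)

# The ratio as computed in the cell above: tuition revenue per instructional dollar
rows['tuition_per_instr_dollar'] = rows['TUITFTE'] / rows['INEXPFTE'].replace(0, np.nan)

print(rows.round(2))
```

Both schools spend more on instruction per FTE than they collect in net tuition per FTE, so the share is above 100%. That can happen when a school has revenue other than tuition, which is one possible reason the percentages range so widely.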

In [282]:
il[['INSTNM', 'TUITFTE','INEXPFTE']].sort_values('INEXPFTE').tail(30)

Unnamed: 0,INSTNM,TUITFTE,INEXPFTE
974,University of Saint Mary of the Lake,31877.0,24459.0
847,Blessing Rieman College of Nursing and Health ...,16788.0,25761.0
4475,Bexley Hall Seabury Western Theological Semina...,11611.0,27040.0
859,Chicago State University,6719.0,29476.0
856,Catholic Theological Union at Chicago,19808.0,29916.0
894,University of Illinois Chicago,11753.0,30679.0
886,Graham Hospital School of Nursing,7672.0,35787.0
927,Lutheran School of Theology at Chicago,7361.0,37020.0
956,Principia College,8606.0,37206.0
949,Northwestern University,31160.0,37609.0


## Cost

Two different variables are used for cost (under the dev-category, cost): <br>
1. Average Cost of Attendance, Tuition and Fees: this is the average annual total cost of attendance, including tuition and fees, books and supplies, and living expenses for all full-time, first-time, degree/certificate-seeking undergraduates who receive Title IV aid. It's calculated from IPEDS. (COSTT4_A, COSTT4_P)
2. Average Net Price: this is the total annual average cost of attendance minus the average grant/scholarship aid given. (NPT4_PRIV, NPT4_PUB)

Average cost is split into program-year and academic-year institutions, so I combined the two into a new variable, <b>combined_cost</b>. Average net price is split into public and private, so I combined them into a new variable called <b>combined_price</b>.

I calculated a weighted average for each institution type, weighted by <b>UG12MN</b> which is the unduplicated count of undergraduate students enrolled during a 12 month period.

In [8]:
# combine columns

# use 'COSTT4_A' where it is present, otherwise fall back to 'COSTT4_P'
il['combined_cost'] = il['COSTT4_A'].fillna(il['COSTT4_P'])
df['combined_cost'] = df['COSTT4_A'].fillna(df['COSTT4_P'])

# same for net price: use 'NPT4_PUB' where it is present, otherwise 'NPT4_PRIV'
il['combined_price'] = il['NPT4_PUB'].fillna(il['NPT4_PRIV'])
df['combined_price'] = df['NPT4_PUB'].fillna(df['NPT4_PRIV'])

In [12]:
# inspect data
il[['INSTNM','CONTROL','UGDS','G12MN','UG12MN','NPT4_PRIV','NPT4_PROG','NPT4_PUB','combined_price','COSTT4_A','COSTT4_P','combined_cost']].sort_values('UG12MN', ascending=False)

Unnamed: 0,INSTNM,CONTROL,UGDS,G12MN,UG12MN,NPT4_PRIV,NPT4_PROG,NPT4_PUB,combined_price,COSTT4_A,COSTT4_P,combined_cost
875,College of DuPage,1,14801.0,,36245.0,,,5519.0,5519.0,13150.0,,13150.0
897,University of Illinois Urbana-Champaign,1,33889.0,23759.0,35932.0,,,15483.0,15483.0,31102.0,,31102.0
5371,DeVry University-Illinois,3,19729.0,5320.0,27424.0,28883.0,,,28883.0,35990.0,,35990.0
4783,Chamberlain University-Illinois,3,13101.0,20685.0,26294.0,23638.0,,,23638.0,32437.0,,32437.0
894,University of Illinois Chicago,1,22011.0,13067.0,23643.0,,,11329.0,11329.0,24382.0,,24382.0
...,...,...,...,...,...,...,...,...,...,...,...,...
6414,Rasmussen University-Romeoville/Joliet,3,,,,,,,,,,
6415,Rasmussen University-Mokena/Tinley Park,3,,,,,,,,,,
6471,Relay Graduate School of Education - Chicago,2,,,,,,,,,,
6489,Networks Barber College,3,,,,,,,,,,


In [159]:
def weighted_average_cost(group):
    # calculate the weighted average using the formula:
    # sum(value * weight) / sum(weight)
    weighted_sum = (group['combined_cost'] * group['UG12MN']).sum()
    total_weight = group['UG12MN'].sum()
    return weighted_sum / total_weight

def weighted_average_price(group):
    # calculate the weighted average using the formula:
    # sum(value * weight) / sum(weight)
    weighted_sum = (group['combined_price'] * group['UG12MN']).sum()
    total_weight = group['UG12MN'].sum()
    return weighted_sum / total_weight
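One thing to watch in the two functions above: pandas skips NaN products in the numerator, but the denominator still sums the weight of every row, including rows whose cost or price is missing. The sketch below shows a variant that drops incomplete rows first. The helper name `weighted_average` and the toy numbers are hypothetical, and switching to it may change the figures reported in this section.

```python
import numpy as np
import pandas as pd

def weighted_average(group, value_col, weight_col):
    # keep only rows where both the value and the weight are present
    valid = group[[value_col, weight_col]].dropna()
    total_weight = valid[weight_col].sum()
    if total_weight == 0:
        return np.nan
    return (valid[value_col] * valid[weight_col]).sum() / total_weight

# Made-up rows: the largest school has no cost reported
toy = pd.DataFrame({
    'CONTROL': [3, 3, 3],
    'combined_cost': [30000.0, 20000.0, np.nan],
    'UG12MN': [100.0, 100.0, 800.0],
})

# Same arithmetic as weighted_average_cost: the 800 students still count in the denominator
naive = (toy['combined_cost'] * toy['UG12MN']).sum() / toy['UG12MN'].sum()

# NaN-aware version: only the two schools with a reported cost are averaged
masked = weighted_average(toy, 'combined_cost', 'UG12MN')

print(naive, masked)
```

On the real data this could be called per school type with `il.groupby('CONTROL').apply(weighted_average, 'combined_cost', 'UG12MN')`.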

The average sticker price for one year of attending a for-profit in Illinois is about $30k. The net price at for-profits stays fairly close to that, at about $23k, while the net price at private non-profits is cut by more than half (from about $51k to about $23k) even though their sticker price is higher. On average, the net price of a for-profit is about 3x that of a public school in Illinois.

In [163]:
# average cost for il schools 
# 1 = public, 2 = private non-profit, 3 = private for-profit
il.groupby('CONTROL').apply(weighted_average_cost).reset_index(name='weighted_avg_cost')

Unnamed: 0,CONTROL,weighted_avg_cost
0,1,15622.116757
1,2,51383.011963
2,3,29845.993446


In [164]:
# average price for il schools
il.groupby('CONTROL').apply(weighted_average_price).reset_index(name='weighted_avg_price')

Unnamed: 0,CONTROL,weighted_avg_price
0,1,7946.992349
1,2,23077.501405
2,3,23067.973687


In [165]:
# average cost for us schools
df.groupby('CONTROL').apply(weighted_average_cost).reset_index(name='weighted_avg_cost')

Unnamed: 0,CONTROL,weighted_avg_cost
0,1,17667.304001
1,2,44457.576189
2,3,25285.161952


In [166]:
# average price for us schools
df.groupby('CONTROL').apply(weighted_average_price).reset_index(name='weighted_avg_price')

Unnamed: 0,CONTROL,weighted_avg_price
0,1,10070.134924
1,2,22702.072365
2,3,19825.508062


## Median debt

Cumulative median student debt represents the sum of all undergraduate federal loans over students’ college education at the institution. An individual borrower's debt could be counted in multiple institutions' median debt calculations. Overall median debt is DEBT_MDN (DEBT_N is the number of students behind it); it's also split out by gender, first-gen status, graduated/withdrew, and income level.<br>

The median debt levels in College Scorecard seem low relative to the average cumulative student loan debt reported in this [TICAS study](https://ticas.org/wp-content/uploads/2023/12/Quick-Facts-About-Student-Loan-Debt-2023.pdf). This is definitely an undercount because it's just federal loans for undergrad. The TICAS study uses a different data source that takes into account private loans, according to Peter Granville at TCF. For the aggregate debt by school type (CONTROL), I use the median debt of graduates, <b>GRAD_DEBT_MDN</b>, weighted by the number of borrowers, <b>GRAD_DEBT_N</b>. Grad debt is better because it represents students who finished their degrees in their entirety. It includes all the federal loans a student takes out (minus Parent PLUS loans). <br>

I also split up debt levels by school type under the Carnegie classification into bachelors/doctorates/masters programs and everything else (many for-profit cosmo schools aren't classified through Carnegie).

In [None]:
# inspect relevant variables
il[['INSTNM','DEBT_N','DEBT_MDN','FEMALE_DEBT_MDN', 'MALE_DEBT_MDN','FIRSTGEN_DEBT_MDN','NOTFIRSTGEN_DEBT_MDN','GRAD_DEBT_MDN','WDRAW_DEBT_MDN','LO_INC_DEBT_MDN','MD_INC_DEBT_MDN','HI_INC_DEBT_MDN']].head()

Unnamed: 0,INSTNM,DEBT_N,DEBT_MDN,FEMALE_DEBT_MDN,MALE_DEBT_MDN,FIRSTGEN_DEBT_MDN,NOTFIRSTGEN_DEBT_MDN,GRAD_DEBT_MDN,WDRAW_DEBT_MDN,LO_INC_DEBT_MDN,MD_INC_DEBT_MDN,HI_INC_DEBT_MDN
837,Adler University,PrivacySuppressed,PrivacySuppressed,PrivacySuppressed,PrivacySuppressed,PrivacySuppressed,PrivacySuppressed,PrivacySuppressed,PrivacySuppressed,PrivacySuppressed,PrivacySuppressed,PrivacySuppressed
838,American Academy of Art College,119,27000,27000,26931,26982,27000,27000,9500,28338,27000,19000
839,American Islamic College,,,,,,,,,,,
840,School of the Art Institute of Chicago,1055,19000,19500,18500,20701,18495,27000,11000,21363,19500,18456
841,Augustana College,1126,26000,26000,26000,26000,26000,27000,6625,25000,25280,26000


In [232]:
# create new debt dataframes so i can replace 'PrivacySuppressed'
id_cols = ['INSTNM','OPEID6','CONTROL','CCBASIC','CCUGPROF','CCSIZSET','PREDDEG']
debt_cols = ['DEBT_N','GRAD_DEBT_N','DEBT_MDN','FEMALE_DEBT_MDN','MALE_DEBT_MDN','FIRSTGEN_DEBT_MDN','NOTFIRSTGEN_DEBT_MDN','GRAD_DEBT_MDN','WDRAW_DEBT_MDN','LO_INC_DEBT_MDN','MD_INC_DEBT_MDN','HI_INC_DEBT_MDN']

il_debt = il[id_cols + debt_cols].copy()
df_debt = df[id_cols + debt_cols].copy()

il_debt = il_debt.replace('PrivacySuppressed', np.nan)
df_debt = df_debt.replace('PrivacySuppressed', np.nan)

# convert to floats which work with np.nan
il_debt[debt_cols] = il_debt[debt_cols].astype(float)
df_debt[debt_cols] = df_debt[debt_cols].astype(float)

Data cleaning note for median debt from Peter Granville: One important thing to know is that student debt variables in the College Scorecard are organized at the level of the six-digit OPEID, but the rows represent one per eight-digit OPEID. Because of that, I always remove rows with duplicated OPEID6 values before running numbers on student debt, so that every unique OPEID6 value is found in only one row. 

In [233]:
# delete duplicate 6-digit OPEIDs which have the same debt medians/means and number of students
il_debt.drop_duplicates(subset=['OPEID6'], keep='first', inplace=True)
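To see why the de-duplication matters for the weighted averages, here is a small sketch using the two Empire Beauty School campuses that are inspected later in this notebook. Both rows carry the same DEBT_MDN and DEBT_N. The OPEID6 derivation below (dropping the last two digits of the 8-digit OPEID) is an assumption for illustration only; the real data already ships an OPEID6 column.

```python
import pandas as pd

# Values copied from the Empire Beauty School rows shown later in this notebook
rows = pd.DataFrame({
    'INSTNM': ['Empire Beauty School-Vernon Hills', 'Empire Beauty School-Stone Park'],
    'OPEID': [2079405, 2079406],
    'DEBT_MDN': [7050.0, 7050.0],
    'DEBT_N': [1541.0, 1541.0],
})

# Assumption: OPEID6 is the 8-digit OPEID without its two-digit branch suffix
rows['OPEID6'] = rows['OPEID'] // 100

# Without de-duplication the same borrowers are counted once per campus
borrowers_before = rows['DEBT_N'].sum()

deduped = rows.drop_duplicates(subset=['OPEID6'], keep='first')
borrowers_after = deduped['DEBT_N'].sum()

print(borrowers_before, borrowers_after)
```

Only `il_debt` is de-duplicated in the cell above; the same step could be applied to `df_debt` before computing the national figures.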

Overall, there do not appear to be surprising differences between school types across the different median debt measures.

In [None]:
# group by for il
il_debt.groupby('CONTROL')[['DEBT_MDN', 'FEMALE_DEBT_MDN', 'MALE_DEBT_MDN','FIRSTGEN_DEBT_MDN','NOTFIRSTGEN_DEBT_MDN','GRAD_DEBT_MDN','WDRAW_DEBT_MDN','LO_INC_DEBT_MDN','MD_INC_DEBT_MDN','HI_INC_DEBT_MDN']].mean()

Unnamed: 0_level_0,DEBT_MDN,FEMALE_DEBT_MDN,MALE_DEBT_MDN,FIRSTGEN_DEBT_MDN,NOTFIRSTGEN_DEBT_MDN,GRAD_DEBT_MDN,WDRAW_DEBT_MDN,LO_INC_DEBT_MDN,MD_INC_DEBT_MDN,HI_INC_DEBT_MDN
CONTROL,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1
1,7628.298246,7988.471698,7242.660377,7984.333333,7689.313725,10815.472727,5689.090909,7949.321429,7696.734694,7876.571429
2,16534.878788,17779.036364,16070.309091,17024.35,16939.066667,22126.0,8449.0,17068.112903,17187.649123,16736.140351
3,8993.917647,9842.837838,10093.864865,9471.509434,9434.962264,10966.901235,4970.053333,9122.768116,9498.647059,8353.098039


In [209]:
# group by for us
df_debt.groupby('CONTROL')[['DEBT_MDN', 'FEMALE_DEBT_MDN', 'MALE_DEBT_MDN','FIRSTGEN_DEBT_MDN','NOTFIRSTGEN_DEBT_MDN','GRAD_DEBT_MDN','WDRAW_DEBT_MDN','LO_INC_DEBT_MDN','MD_INC_DEBT_MDN','HI_INC_DEBT_MDN']].mean()

Unnamed: 0_level_0,DEBT_MDN,FEMALE_DEBT_MDN,MALE_DEBT_MDN,FIRSTGEN_DEBT_MDN,NOTFIRSTGEN_DEBT_MDN,GRAD_DEBT_MDN,WDRAW_DEBT_MDN,LO_INC_DEBT_MDN,MD_INC_DEBT_MDN,HI_INC_DEBT_MDN
CONTROL,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1
1,9932.099769,10427.933709,9503.49531,10270.524516,9983.832903,14539.668522,6918.532268,10307.945718,10060.746239,9736.595535
2,15828.529292,16601.890308,15263.160781,16195.860119,16184.795387,22212.307143,8443.658501,15669.332607,16552.048632,16245.017557
3,9229.516231,10074.509677,10002.898925,9707.955398,10155.577246,12649.894632,5683.637487,9436.672441,10397.955428,9872.008671


In [211]:
# schools in the US with the highest median debt
df_debt.sort_values('DEBT_MDN', ascending=False).dropna().head()

Unnamed: 0,INSTNM,CONTROL,DEBT_N,DEBT_MDN,FEMALE_DEBT_MDN,MALE_DEBT_MDN,FIRSTGEN_DEBT_MDN,NOTFIRSTGEN_DEBT_MDN,GRAD_DEBT_MDN,WDRAW_DEBT_MDN,LO_INC_DEBT_MDN,MD_INC_DEBT_MDN,HI_INC_DEBT_MDN
3729,Platt College-Aurora,3,173,33443.0,34192.0,31683.0,34192.0,31424.0,42125.0,9928.0,39742.0,34096.0,27160.0
4258,American University of Health Sciences,3,221,32484.0,34666.0,27500.0,33641.0,29834.0,40326.0,16333.0,37467.0,31036.0,27166.0
4659,Gnomon,3,129,28332.0,28332.0,28332.0,28332.0,28332.0,28332.0,20000.0,28332.0,28144.0,27000.0
2924,Providence College,2,1469,27000.0,27000.0,27000.0,27000.0,27000.0,27000.0,14830.0,25943.0,27000.0,27000.0
2868,Saint Francis University,2,945,27000.0,27000.0,26000.0,26205.0,27000.0,27000.0,10610.0,20750.0,27000.0,27000.0


Median debt is lower at for-profits than at some public schools and many private non-profits in Illinois, but that may be because the length of study is shorter. <br>
<br>
I need a way of classifying schools and calculating their debt. There are two ways I can do this, detailed below.

### Carnegie classification method

The 2021 Carnegie classifications are used in College Scorecard. I'm classifying everything 14 and above as Bachelors/Masters/Doctorate level institutions (which I assume are 4+ years) and everything else as "all others": certificates, cosmetology degrees, associate's degrees, etc.

Bachelors/Masters/Doctorate:
- 14: Baccalaureate/Associate's Colleges: Associate's Dominant
- 15: Doctoral Universities: Very High Research Activity
- 16: Doctoral Universities: High Research Activity
- 17: Doctoral/Professional Universities
- 18: Master's Colleges & Universities: Larger Programs
- 19: Master's Colleges & Universities: Medium Programs
- 20: Master's Colleges & Universities: Small Programs
- 21: Baccalaureate Colleges: Arts & Sciences Focus
- 22: Baccalaureate Colleges: Diverse Fields
- 23: Baccalaureate/Associate's Colleges: Mixed Baccalaureate/Associate's
- 24: Special Focus Four-Year: Faith-Related Institutions
- 25: Special Focus Four-Year: Medical Schools & Centers
- 26: Special Focus Four-Year: Other Health Professions Schools
- 27: Special Focus Four-Year: Research Institution
- 28: Special Focus Four-Year: Engineering and Other Technology-Related Schools
- 29: Special Focus Four-Year: Business & Management Schools
- 30: Special Focus Four-Year: Arts, Music & Design Schools
- 31: Special Focus Four-Year: Law Schools
- 32: Special Focus Four-Year: Other Special Focus Institutions

All others:
- -2: Not applicable
- 0: Not classified
- 1: Associate's Colleges: High Transfer-High Traditional
- 2: Associate's Colleges: High Transfer-Mixed Traditional/Nontraditional
- 3: Associate's Colleges: High Transfer-High Nontraditional
- 4: Associate's Colleges: Mixed Transfer/Career & Technical-High Traditional
- 5: Associate's Colleges: Mixed Transfer/Career & Technical-Mixed Traditional/Nontraditional
- 6: Associate's Colleges: Mixed Transfer/Career & Technical-High Nontraditional
- 7: Associate's Colleges: High Career & Technical-High Traditional
- 8: Associate's Colleges: High Career & Technical-Mixed Traditional/Nontraditional
- 9: Associate's Colleges: High Career & Technical-High Nontraditional
- 10: Special Focus Two-Year: Health Professions
- 11: Special Focus Two-Year: Technical Professions
- 12: Special Focus Two-Year: Arts & Design
- 13: Special Focus Two-Year: Other Fields
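The two groups above can be expressed as a label column. This is a minimal sketch on made-up rows; the column name `level` and its labels are hypothetical. Rows with a missing CCBASIC are labeled separately, because NaN fails both the `>= 14` and the `< 14` comparison and would otherwise fall out of both groups without notice.

```python
import numpy as np
import pandas as pd

# Made-up rows covering both groups, "not applicable" (-2), and a missing code
toy = pd.DataFrame({
    'INSTNM': ['School A', 'School B', 'School C', 'School D', 'School E'],
    'CCBASIC': [30.0, 14.0, 9.0, -2.0, np.nan],
})

# 14 and above -> bachelors/masters/doctorate, everything else -> all others
toy['level'] = np.where(toy['CCBASIC'] >= 14, 'bachelors_plus', 'all_others')

# NaN >= 14 is False, so flag missing codes explicitly instead of lumping them in
toy.loc[toy['CCBASIC'].isna(), 'level'] = 'missing'

print(toy)
```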

<font color='red'>TODO:</font> ask Peter if the Carnegie classifications are the right ones!

In [169]:
il_debt[il_debt['CCBASIC'] > 13][['INSTNM','CCBASIC','CCUGPROF','CCSIZSET','GRAD_DEBT_MDN','DEBT_MDN']]

Unnamed: 0,INSTNM,CCBASIC,CCUGPROF,CCSIZSET,GRAD_DEBT_MDN,DEBT_MDN
837,Adler University,26.0,0.0,18.0,,
838,American Academy of Art College,30.0,6.0,6.0,27000.0,27000.0
840,School of the Art Institute of Chicago,30.0,13.0,10.0,27000.0,19000.0
841,Augustana College,21.0,12.0,11.0,27000.0,26000.0
842,Aurora University,17.0,11.0,12.0,20318.0,15500.0
...,...,...,...,...,...,...
4525,Toyota Technological Institute at Chicago,28.0,0.0,18.0,,
4641,Rasmussen University-Illinois,14.0,5.0,9.0,20899.0,13000.0
4783,Chamberlain University-Illinois,26.0,5.0,15.0,20919.0,16458.0
4927,Ambria College of Nursing,26.0,5.0,6.0,15438.0,9500.0


In [161]:
def weighted_average_debt(group):
    # calculate the weighted average using the formula:
    # sum(value * weight) / sum(weight)
    weighted_sum = (group['DEBT_MDN'] * group['DEBT_N']).sum()
    total_weight = group['DEBT_N'].sum()
    return weighted_sum / total_weight

In [162]:
il_debt.groupby('CONTROL').apply(weighted_average_debt).reset_index(name='weighted_avg_debt')

Unnamed: 0,CONTROL,weighted_avg_debt
0,1,12141.376535
1,2,17413.615432
2,3,12561.544728


In [163]:
def weighted_average_grad_debt(group):
    # calculate the weighted average using the formula:
    # sum(value * weight) / sum(weight)
    weighted_sum = (group['GRAD_DEBT_MDN'] * group['GRAD_DEBT_N']).sum()
    total_weight = group['GRAD_DEBT_N'].sum()
    return weighted_sum / total_weight

In [173]:
# filter for just 4 year plus
il_debt[il_debt['CCBASIC']>=14].groupby('CONTROL').apply(weighted_average_grad_debt).reset_index(name='weighted_avg_grad_debt')

Unnamed: 0,CONTROL,weighted_avg_grad_debt
0,1,20214.162063
1,2,23042.337251
2,3,21737.891445


In [174]:
# filter for less than 4 years
il_debt[il_debt['CCBASIC'] < 14.0].groupby('CONTROL').apply(weighted_average_grad_debt).reset_index(name='weighted_avg_grad_debt')

Unnamed: 0,CONTROL,weighted_avg_grad_debt
0,1,8588.10101
1,2,16795.507692
2,3,10706.614889


In [140]:
il[il['INSTNM'].str.contains('Empire Beauty')][['INSTNM','OPEID','G12MN','UG12MN','DEBT_MDN','DEBT_N']].sort_values('OPEID')

Unnamed: 0,INSTNM,OPEID,G12MN,UG12MN,DEBT_MDN,DEBT_N
5343,Empire Beauty School-Vernon Hills,2079405.0,,115.0,7050,1541
5344,Empire Beauty School-Stone Park,2079406.0,,240.0,7050,1541


### Predominant degree granted method

First, look at the distribution of schools by their predominant degree level granted, <b>PREDDEG</b>. Predominant undergraduate award (PREDDEG) identifies the type of award that the institution primarily confers; for instance, an institution that awards 40 percent bachelor’s degrees, 30 percent associate degrees, and 30 percent certificate programs would be classified as predominantly bachelor’s degree awarding. <br>

* 0 = Not classified
* 1 = Predominantly certificate-degree granting
* 2 = Predominantly associate's-degree granting
* 3 = Predominantly bachelor's-degree granting
* 4 = Entirely graduate-degree granting

<br> CONTROL codes: 3 = for-profit, 2 = non-profit, 1 = public

In [263]:
# il distribution of schools by control and preddeg
il_dist = pd.pivot_table(il,
              index='PREDDEG',
              columns='CONTROL',
              values='OPEID6',
              aggfunc='count')

# what percentage of each school type is each predominant degree?
il_dist['1_pct'] = il_dist[1].div(il_dist[1].sum())
il_dist['2_pct'] = il_dist[2].div(il_dist[2].sum())
il_dist['3_pct'] = il_dist[3].div(il_dist[3].sum())

il_dist

CONTROL,1,2,3,1_pct,2_pct,3_pct
PREDDEG,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
0,1.0,7.0,6.0,0.015873,0.077778,0.0625
1,25.0,4.0,80.0,0.396825,0.044444,0.833333
2,25.0,3.0,7.0,0.396825,0.033333,0.072917
3,12.0,56.0,2.0,0.190476,0.622222,0.020833
4,,20.0,1.0,,0.222222,0.010417
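The column shares can also be computed in one step with `pd.crosstab` and `normalize='columns'`. This is a sketch on made-up rows rather than a replacement for the cell above. One difference to keep in mind: `crosstab` counts rows, while the pivot table counts non-null OPEID6 values, so the two could differ if OPEID6 is ever missing.

```python
import pandas as pd

# Made-up rows: one school per row with its PREDDEG and CONTROL codes
toy = pd.DataFrame({
    'PREDDEG': [1, 1, 1, 2, 3, 3],
    'CONTROL': [3, 3, 1, 1, 2, 2],
})

# normalize='columns' divides each count by its column total,
# giving the share of each school type in each predominant degree
shares = pd.crosstab(toy['PREDDEG'], toy['CONTROL'], normalize='columns')

print(shares)
```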


The majority of for-profits in IL (about 83%) are predominantly certificate-granting, compared with about 40% of public schools and about 4% of private non-profits, so maybe the best comparison is against all other non-profit and public schools. The national distribution is similar. For a better comparison with a larger sample size, we could also compare for-profits in IL with the national pool of for-profits, non-profits and public schools.

In [264]:
# national distribution of schools by control and preddeg
us_dist = pd.pivot_table(df,
              index='PREDDEG',
              columns='CONTROL',
              values='OPEID6',
              aggfunc='count')

# what percentage of each school type is each predominant degree?
us_dist['1_pct'] = us_dist[1].div(us_dist[1].sum())
us_dist['2_pct'] = us_dist[2].div(us_dist[2].sum())
us_dist['3_pct'] = us_dist[3].div(us_dist[3].sum())

us_dist

CONTROL,1,2,3,1_pct,2_pct,3_pct
PREDDEG,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
0,149,139,210,0.072015,0.0711,0.084034
1,591,179,1909,0.285645,0.09156,0.763906
2,735,149,177,0.355244,0.076215,0.070828
3,580,1248,171,0.280329,0.638363,0.068427
4,14,240,32,0.006767,0.122762,0.012805


In [266]:
# IL certificate degree
il_debt[il_debt['PREDDEG'] == 1].groupby('CONTROL').apply(weighted_average_grad_debt).reset_index(name='weighted_avg_grad_debt')

Unnamed: 0,CONTROL,weighted_avg_grad_debt
0,1,7918.031891
1,2,11161.577236
2,3,14522.14253


In [267]:
# IL associate degree
il_debt[il_debt['PREDDEG'] == 2].groupby('CONTROL').apply(weighted_average_grad_debt).reset_index(name='weighted_avg_grad_debt')

Unnamed: 0,CONTROL,weighted_avg_grad_debt
0,1,8977.764781
1,2,15168.92
2,3,19848.386351


In [269]:
# IL bachelors degree - just 2 for profits: American Academy of Art College and Chamberlain University
il_debt[il_debt['PREDDEG'] == 3].groupby('CONTROL').apply(weighted_average_grad_debt).reset_index(name='weighted_avg_grad_debt')

Unnamed: 0,CONTROL,weighted_avg_grad_debt
0,1,20214.162063
1,2,23110.422946
2,3,20947.942743


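Among for-profits, the IL certificate-level average debt (about $14.5k) is larger than the national certificate level (about $10.6k); for public and non-profit schools it is lower than the national level.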

In [270]:
# NATL certificate degree
df_debt[df_debt['PREDDEG'] == 1].groupby('CONTROL').apply(weighted_average_grad_debt).reset_index(name='weighted_avg_grad_debt')

Unnamed: 0,CONTROL,weighted_avg_grad_debt
0,1,11006.361767
1,2,12665.637738
2,3,10636.139597


IL for-profit associate-degree average debt is in line with for-profits nationally. Associate-degree graduates of IL public schools have about half the debt of their national counterparts.

In [272]:
# NATL associate degree
df_debt[df_debt['PREDDEG'] == 2].groupby('CONTROL').apply(weighted_average_grad_debt).reset_index(name='weighted_avg_grad_debt')

Unnamed: 0,CONTROL,weighted_avg_grad_debt
0,1,16332.24103
1,2,18752.972414
2,3,18925.282892


In [273]:
# NATL bachelors degree 
df_debt[df_debt['PREDDEG'] == 3].groupby('CONTROL').apply(weighted_average_grad_debt).reset_index(name='weighted_avg_grad_debt')

Unnamed: 0,CONTROL,weighted_avg_grad_debt
0,1,20503.632651
1,2,22791.729602
2,3,27435.523646


In [274]:
# NATL graduate degree 
df_debt[df_debt['PREDDEG'] == 4].groupby('CONTROL').apply(weighted_average_grad_debt).reset_index(name='weighted_avg_grad_debt')

Unnamed: 0,CONTROL,weighted_avg_grad_debt
0,1,26814.0
1,2,20703.477473
2,3,32946.0


Let's go with predominant degree awarded. Export school-by-school data as a CSV.

In [277]:
il_debt.to_csv('output/il_median_debt.csv', index=False)

## Default and repayment

I'm using the three-year cohort default rate, which is produced annually as an institutional accountability metric. The three-year cohort default rate (CDR3) represents a snapshot in time. For example, FY 2016 rates were calculated using the cohort of borrowers who entered repayment on their federal student loans between October 1, 2015 and September 30, 2016, and who defaulted before September 30, 2018.
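The cohort window can be written out as date arithmetic. The sketch below generalizes the FY 2016 example to any fiscal year, assuming the same October-to-September pattern holds; the function name `cdr3_window` is hypothetical.

```python
from datetime import date

def cdr3_window(fiscal_year):
    """Return (cohort_start, cohort_end, tracking_end) for a three-year CDR."""
    # borrowers enter repayment during the federal fiscal year (Oct 1 to Sep 30)
    cohort_start = date(fiscal_year - 1, 10, 1)
    cohort_end = date(fiscal_year, 9, 30)
    # defaults are tracked through the end of the second following fiscal year
    tracking_end = date(fiscal_year + 2, 9, 30)
    return cohort_start, cohort_end, tracking_end

window = cdr3_window(2016)
print(window)
```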

In [235]:
il[['INSTNM','CDR3','BBRR2_FED_UGCOMP_DFLT_SUPP','BBRR4_FED_UG_DFLT','BBRR4_FED_UG_MAKEPROG','BBRR3_FED_GRCOMP_DFLT']]

Unnamed: 0,INSTNM,CDR3,BBRR2_FED_UGCOMP_DFLT_SUPP,BBRR4_FED_UG_DFLT,BBRR4_FED_UG_MAKEPROG,BBRR3_FED_GRCOMP_DFLT
837,Adler University,0.002,PrivacySuppressed,PrivacySuppressed,PrivacySuppressed,<=0.05
838,American Academy of Art College,0.019,PrivacySuppressed,0.05-0.09,0.15-0.19,PrivacySuppressed
839,American Islamic College,,PrivacySuppressed,PrivacySuppressed,PrivacySuppressed,PrivacySuppressed
840,School of the Art Institute of Chicago,0.008,PrivacySuppressed,0.07-0.08,0.11-0.12,<=0.20
841,Augustana College,0.011,PrivacySuppressed,0.04,0.17,PrivacySuppressed
...,...,...,...,...,...,...
6414,Rasmussen University-Romeoville/Joliet,0.016,0.01831016986543,0.11,0.06,PrivacySuppressed
6415,Rasmussen University-Mokena/Tinley Park,0.016,0.01831016986543,0.11,0.06,PrivacySuppressed
6471,Relay Graduate School of Education - Chicago,0.023,PrivacySuppressed,PrivacySuppressed,PrivacySuppressed,<=0.10
6489,Networks Barber College,0.030,PrivacySuppressed,0.40-0.59,PrivacySuppressed,PrivacySuppressed


In [251]:
var_list = ['CDR3',
            'BBRR1_FED_UG_DFLT','BBRR1_FED_UG_DLNQ','BBRR1_FED_UG_MAKEPROG','BBRR1_FED_UG_NOPROG','BBRR1_FED_UG_PAIDINFULL',
            'BBRR2_FED_UG_DFLT','BBRR2_FED_UG_DLNQ','BBRR2_FED_UG_MAKEPROG','BBRR2_FED_UG_NOPROG','BBRR2_FED_UG_PAIDINFULL',
            'BBRR3_FED_UG_DFLT','BBRR3_FED_UG_DLNQ','BBRR3_FED_UG_MAKEPROG','BBRR3_FED_UG_NOPROG','BBRR3_FED_UG_PAIDINFULL',
            'BBRR4_FED_UG_DFLT','BBRR4_FED_UG_DLNQ','BBRR4_FED_UG_MAKEPROG','BBRR4_FED_UG_NOPROG','BBRR4_FED_UG_PAIDINFULL',
            'BBRR4_FED_GR_DFLT','BBRR4_FED_GR_DLNQ','BBRR4_FED_GR_MAKEPROG','BBRR4_FED_GR_NOPROG','BBRR4_FED_GR_PAIDINFULL']

Check for completeness of variables. The undergrad variables are more reliable across 1-4 years after entering repayment, but the repayment percentages are inconsistent: some are ranges, others are exact percentages. The grad variables have too many privacy-suppressed values. The 3-year cohort default rate <b>CDR3</b> is the most reliable and the only fully numeric one that I can do calculations on.

In [253]:
for var in var_list:
    privacy = len(ilfp[ilfp[var] == 'PrivacySuppressed'])
    nas = len(ilfp[ilfp[var].isnull()])
    print(var)
    print('total privacy/nulls: ', privacy+nas)
    print('pct incomplete: ',(privacy+nas)/len(ilfp))

CDR3
total privacy/nulls:  7
pct incomplete:  0.07291666666666667
BBRR1_FED_UG_DFLT
total privacy/nulls:  19
pct incomplete:  0.19791666666666666
BBRR1_FED_UG_DLNQ
total privacy/nulls:  19
pct incomplete:  0.19791666666666666
BBRR1_FED_UG_MAKEPROG
total privacy/nulls:  19
pct incomplete:  0.19791666666666666
BBRR1_FED_UG_NOPROG
total privacy/nulls:  19
pct incomplete:  0.19791666666666666
BBRR1_FED_UG_PAIDINFULL
total privacy/nulls:  19
pct incomplete:  0.19791666666666666
BBRR2_FED_UG_DFLT
total privacy/nulls:  17
pct incomplete:  0.17708333333333334
BBRR2_FED_UG_DLNQ
total privacy/nulls:  21
pct incomplete:  0.21875
BBRR2_FED_UG_MAKEPROG
total privacy/nulls:  18
pct incomplete:  0.1875
BBRR2_FED_UG_NOPROG
total privacy/nulls:  20
pct incomplete:  0.20833333333333334
BBRR2_FED_UG_PAIDINFULL
total privacy/nulls:  21
pct incomplete:  0.21875
BBRR3_FED_UG_DFLT
total privacy/nulls:  18
pct incomplete:  0.1875
BBRR3_FED_UG_DLNQ
total privacy/nulls:  21
pct incomplete:  0.21875
BBRR3_FED_UG
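The loop above counts nulls and `PrivacySuppressed`, but values reported as ranges (for example `0.05-0.09` or `<=0.05`) are also unusable for arithmetic. A minimal sketch of a stricter check with `pd.to_numeric`, on a made-up series that mirrors the value formats shown in the table earlier in this section:

```python
import pandas as pd

# Made-up series mixing the formats seen in the BBRR columns
s = pd.Series(['0.04', '0.05-0.09', '<=0.05', 'PrivacySuppressed', None, '0.11'])

# errors='coerce' turns anything that is not an exact number into NaN
numeric = pd.to_numeric(s, errors='coerce')

# share of values that can be used directly in calculations
share_usable = numeric.notna().mean()

print(numeric.notna().sum(), round(share_usable, 3))
```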

In [255]:
test = il[['INSTNM', 'CONTROL'] + var_list].copy()
test.to_csv('test.csv')

Nationally, the average default rate across all Title IV eligible higher ed institutions is 3%. It's highest at for-profit schools, where the default rate is 3.7%. It's 2.1% at non-profits and 3.1% at public schools. Illinois schools roughly follow that trend. 

In [265]:
df['CDR3'].describe()

count    5673.000000
mean        0.030427
std         0.032918
min         0.000000
25%         0.011000
50%         0.023000
75%         0.040000
max         0.428000
Name: CDR3, dtype: float64

In [85]:
il['CDR3'].describe()

count    230.000000
mean       0.029000
std        0.031236
min        0.000000
25%        0.010000
50%        0.020000
75%        0.035750
max        0.184000
Name: CDR3, dtype: float64

In [266]:
df.groupby('CONTROL')['CDR3'].mean()

CONTROL
1    0.031335
2    0.021488
3    0.036775
Name: CDR3, dtype: float64

In [267]:
il.groupby('CONTROL')['CDR3'].mean()

CONTROL
1    0.035140
2    0.018464
3    0.035011
Name: CDR3, dtype: float64

But some for-profit schools in Illinois have default rates two to six times higher than the national average across all schools, and are among the highest default rates of all schools in the country. Nearly all are cosmetology or barbering schools.

In [276]:
# top 15 highest default rates amongst Illinois for-profits
ilfp[['INSTNM','CDR3','CIPTITLE1','CIPTITLE2','CIPTITLE3', 'G12MN','UG12MN','DEBT_MDN', 'MD_EARN_WNE_P10','NPT4_PRIV']].sort_values('CDR3', ascending=False).head(15)

Unnamed: 0,INSTNM,CDR3,CIPTITLE1,CIPTITLE2,CIPTITLE3,G12MN,UG12MN,DEBT_MDN,MD_EARN_WNE_P10,NPT4_PRIV
852,Cannella School of Hair Design-Chicago,0.184,Cosmetology/Cosmetologist General,,,,68.0,2533,20933.0,9225.0
5430,Creative Touch Cosmetology School,0.177,Cosmetology/Cosmetologist General,Cosmetology Barber/Styling and Nail Instructor,,,21.0,PrivacySuppressed,,14014.0
983,Taylor Business Institute,0.177,,,,,201.0,4554,24348.0,16555.0
986,Tri-County Beauty Academy,0.137,Cosmetology/Cosmetologist General,Cosmetology Barber/Styling and Nail Instructor,,,16.0,PrivacySuppressed,,12104.0
5217,Larry's Barber College,0.13,Barbering/Barber,Cosmetology Barber/Styling and Nail Instructor,,,133.0,3723,,13785.0
6024,Larry's Barber College-Joliet,0.13,Barbering/Barber,Cosmetology Barber/Styling and Nail Instructor,,,11.0,3723,,8276.0
5969,Larry's Barber College,0.13,Barbering/Barber,Cosmetology Barber/Styling and Nail Instructor,,,49.0,3723,,14064.0
3804,Hairmasters Institute of Cosmetology,0.09,Cosmetology/Cosmetologist General,Barbering/Barber,Cosmetology Barber/Styling and Nail Instructor,,139.0,7048,22942.0,8575.0
5204,Reflections Academy of Beauty,0.09,Cosmetology/Cosmetologist General,Cosmetology Barber/Styling and Nail Instructor,Cosmetology Barber/Styling and Nail Instructor,,46.0,9500,,11733.0
929,Steven Papageorge Hair Academy,0.068,Cosmetology/Cosmetologist General,Cosmetology Barber/Styling and Nail Instructor,,,81.0,PrivacySuppressed,17031.0,15374.0
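
As a rough check on the "two to six times" range, each school's rate can be divided by the national mean from `df['CDR3'].describe()`. This is only a sketch: it hard-codes a few values copied from the outputs above instead of reading the full file, and `national_mean` and `times_national` are names made up for illustration.

```python
import pandas as pd

# National mean CDR3, copied from the describe() output above.
national_mean = 0.030427

# A few Illinois for-profit default rates copied from the table above.
rates = pd.DataFrame({
    "INSTNM": [
        "Cannella School of Hair Design-Chicago",
        "Tri-County Beauty Academy",
        "Steven Papageorge Hair Academy",
    ],
    "CDR3": [0.184, 0.137, 0.068],
})

# How many times the national mean each school's default rate is.
rates["times_national"] = (rates["CDR3"] / national_mean).round(1)
print(rates)
```

With the real data, the same division could be applied to the whole `ilfp['CDR3']` column.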


Why are public schools' default rates just as high as for-profits'?

In [271]:
il[il['CONTROL']==1][['INSTNM','CDR3','CIPTITLE1','CIPTITLE2','CIPTITLE3', 'DEBT_MDN', 'MD_EARN_WNE_P10','NPT4_PUB']].sort_values('CDR3', ascending=False).head()

Unnamed: 0,INSTNM,CDR3,CIPTITLE1,CIPTITLE2,CIPTITLE3,DEBT_MDN,MD_EARN_WNE_P10,NPT4_PUB
911,Kaskaskia College,0.1,,,,3500,34989.0,6064.0
906,John A Logan College,0.086,,,,3500,31782.0,5616.0
959,Rend Lake College,0.081,,,,4475,35169.0,8227.0
978,Spoon River College,0.075,,,,6750,36442.0,5552.0
912,Kishwaukee College,0.07,,,,6542,40064.0,5472.0


For context, here are some of the schools with the lowest default rates (non-zero) in Illinois. Chamberlain appears to be an exception to for-profit trends.

In [282]:
il[il['CDR3'] > 0.0][['INSTNM','CDR3','CIPTITLE1','CIPTITLE2','CIPTITLE3', 'DEBT_MDN', 'MD_EARN_WNE_P10','NPT4_PUB','combined_cost','combined_price']].sort_values('CDR3', ascending=True).head(10)

Unnamed: 0,INSTNM,CDR3,CIPTITLE1,CIPTITLE2,CIPTITLE3,DEBT_MDN,MD_EARN_WNE_P10,NPT4_PUB,combined_cost,combined_price
893,Rosalind Franklin University of Medicine and S...,0.001,,,,PrivacySuppressed,,,,
949,Northwestern University,0.001,,,,14000,85796.0,,81058.0,28230.0
837,Adler University,0.002,,,,PrivacySuppressed,,,,
879,Elmhurst University,0.002,,,,15000,58657.0,,50133.0,23036.0
6149,University of Notre Dame -,0.002,,,,19000,93220.0,,,
935,Methodist College,0.004,,,,27000,66066.0,,38345.0,28547.0
993,Oak Point University,0.005,,,,25000,84630.0,,,
943,National University of Health Sciences,0.005,,,,12289,46611.0,,,
4783,Chamberlain University-Illinois,0.005,,,,16458,82055.0,,32437.0,23638.0
861,University of Chicago,0.005,,,,13368,78439.0,,81531.0,22690.0


In [285]:
# create table for viz
dflt_viz = il[['INSTNM','CONTROL','CDR3','CIPTITLE1','G12MN','UG12MN','DEBT_MDN', 'MD_EARN_WNE_P6','combined_price','combined_cost']].sort_values('CDR3', ascending=False).copy()

In [286]:
dflt_viz.to_csv('output/default_viz.csv', index=False)

In [75]:
il[['INSTNM','CONTROL','PLUS_DEBT_INST_MD','DEBT_MDN','LPGPLUS_AMT','LPGPLUS_CNT']]

Unnamed: 0,INSTNM,CONTROL,PLUS_DEBT_INST_MD,DEBT_MDN,LPGPLUS_AMT,LPGPLUS_CNT
837,Adler University,2,PrivacySuppressed,PrivacySuppressed,237319111,3071
838,American Academy of Art College,3,58474,27000,PrivacySuppressed,PrivacySuppressed
839,American Islamic College,2,,,PrivacySuppressed,PrivacySuppressed
840,School of the Art Institute of Chicago,2,61555,19000,45617222,1270
841,Augustana College,2,30000,26000,338855,14
...,...,...,...,...,...,...
6414,Rasmussen University-Romeoville/Joliet,3,12900,13000,PrivacySuppressed,PrivacySuppressed
6415,Rasmussen University-Mokena/Tinley Park,3,12900,13000,PrivacySuppressed,PrivacySuppressed
6471,Relay Graduate School of Education - Chicago,2,PrivacySuppressed,PrivacySuppressed,PrivacySuppressed,PrivacySuppressed
6489,Networks Barber College,3,PrivacySuppressed,15900,PrivacySuppressed,PrivacySuppressed


## Earnings

College Scorecard reports mean and median earnings for students 6, 8 and 10 years after entry. The figures cover students who are working and not enrolled at the date of measurement. Earnings are based on wages and deferred compensation reported via IRS Form W-2 and Schedule SE. The figures are based on 2020 earnings and may be affected by the pandemic. <br>

From College Scorecard: "2008-09 and 2009-10 pooled award year cohort measured in calendar year 2019 and 2020. Earnings are inflation adjusted to 2021 dollars." <br>

I calculated the average of median earnings across schools by <b>CONTROL</b>.<br>

These are the median earnings figures reported in each school's profile on College Scorecard and are what [Bloomberg](https://www.bloomberg.com/graphics/2024-college-return-on-investment/) used for a story that cited data provided by Georgetown University's Center on Education and the Workforce.

In [83]:
# inspect median and mean data
il[['INSTNM','CONTROL','MD_EARN_WNE_P10','MN_EARN_WNE_P10','MD_EARN_WNE_P6','MD_EARN_WNE_P8']].dropna().sort_values('MD_EARN_WNE_P10', ascending=False)

Unnamed: 0,INSTNM,CONTROL,MD_EARN_WNE_P10,MN_EARN_WNE_P10,MD_EARN_WNE_P6,MD_EARN_WNE_P8
6149,University of Notre Dame -,2,93220.0,98400,84235.0,87008.0
949,Northwestern University,2,85796.0,93400,72370.0,78939.0
965,Rush University,2,84906.0,125000,70482.0,70868.0
993,Oak Point University,2,84630.0,72900,76137.0,76457.0
902,Illinois Institute of Technology,2,82793.0,78000,68517.0,73894.0
...,...,...,...,...,...,...
4815,Tricoci University of Beauty Culture-Chicago NW,3,20011.0,PrivacySuppressed,25520.0,25923.0
3731,Educators of Beauty College of Cosmetology-Roc...,3,18827.0,19800,21896.0,19726.0
980,Educators of Beauty College of Cosmetology-Ste...,3,18827.0,19800,21896.0,19726.0
872,Tricoci University of Beauty Culture,3,18311.0,22200,15573.0,16339.0


The average time it takes to graduate is 5 years. At 6, 8, and 10 years after entering school, median earnings for Illinoisans who attended for-profits are consistently lower than for their counterparts who went to public and private nonprofit schools. The gap with private nonprofits is roughly $20k to $25k.

In [33]:
il.groupby('CONTROL')[['MD_EARN_WNE_P6','MD_EARN_WNE_P8','MD_EARN_WNE_P10']].mean()

Unnamed: 0_level_0,MD_EARN_WNE_P6,MD_EARN_WNE_P8,MD_EARN_WNE_P10
CONTROL,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
1,34585.716667,37910.233333,40839.233333
2,46890.909091,51234.19697,55169.333333
3,26399.52439,28702.408451,30639.063492
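
To make the size of the gap explicit, the for-profit row can be subtracted from the other rows. This is a minimal sketch that re-types the means from the output above (rounded to cents) rather than recomputing them; `means` and `gap_vs_for_profit` are illustrative names.

```python
import pandas as pd

# Illinois means by CONTROL, copied (rounded) from the groupby output above.
# 1 = public, 2 = private nonprofit, 3 = private for-profit
means = pd.DataFrame(
    {
        "MD_EARN_WNE_P6": [34585.72, 46890.91, 26399.52],
        "MD_EARN_WNE_P8": [37910.23, 51234.20, 28702.41],
        "MD_EARN_WNE_P10": [40839.23, 55169.33, 30639.06],
    },
    index=pd.Index([1, 2, 3], name="CONTROL"),
)

# Subtracting a row (a Series indexed by column name) from the frame
# broadcasts across rows, giving each sector's gap over for-profits in dollars.
gap_vs_for_profit = means - means.loc[3]
print(gap_vs_for_profit.round(0))
```

On the real data, the same subtraction should work directly on the result of the `groupby(...).mean()` call.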


In [285]:
# IL associate degree
il[il['PREDDEG'] == 2].groupby('CONTROL')[['MD_EARN_WNE_P6','MD_EARN_WNE_P8','MD_EARN_WNE_P10']].mean()

Unnamed: 0_level_0,MD_EARN_WNE_P6,MD_EARN_WNE_P8,MD_EARN_WNE_P10
CONTROL,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
1,31751.818182,34783.909091,37278.590909
2,37335.333333,38323.333333,40936.666667
3,35924.714286,39709.857143,41185.142857


In [286]:
# IL certificate degree
il[il['PREDDEG'] == 1].groupby('CONTROL')[['MD_EARN_WNE_P6','MD_EARN_WNE_P8','MD_EARN_WNE_P10']].mean()

Unnamed: 0_level_0,MD_EARN_WNE_P6,MD_EARN_WNE_P8,MD_EARN_WNE_P10
CONTROL,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
1,31776.44,34485.04,37032.92
2,41893.0,45881.666667,47145.0
3,24244.073529,26157.689655,27373.714286


This trend is reflected nationally as well.

In [34]:
df.groupby('CONTROL')[['MD_EARN_WNE_P6','MD_EARN_WNE_P8','MD_EARN_WNE_P10']].mean()

Unnamed: 0_level_0,MD_EARN_WNE_P6,MD_EARN_WNE_P8,MD_EARN_WNE_P10
CONTROL,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
1,36597.315234,39669.291561,42691.477261
2,43251.767458,47349.144309,51338.654483
3,27615.410573,29091.836176,30968.139363


For about one third of Illinois for-profit schools (33 of 96), we do not have median earnings data 10 years after enrolling, and 25 are missing it 8 years after enrolling. The data is more complete 6 years after enrolling (14 missing), so that's the data I'll show by school.

In [71]:
il[il['MD_EARN_WNE_P10'].isnull()].groupby('CONTROL').size()

CONTROL
1     3
2    24
3    33
dtype: int64

In [79]:
il[il['MD_EARN_WNE_P6'].isnull()].groupby('CONTROL').size()

CONTROL
1     3
2    24
3    14
dtype: int64

In [81]:
il[il['MD_EARN_WNE_P8'].isnull()].groupby('CONTROL').size()

CONTROL
1     3
2    24
3    25
dtype: int64

In [73]:
len(ilfp)

96
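
The missing counts above can also be expressed as shares in a single step, since the mean of a True/False column is the fraction that is True. A small sketch on made-up data (`toy` stands in for `il`; the values are invented):

```python
import numpy as np
import pandas as pd

# Toy stand-in for `il`: two sectors, some earnings missing.
toy = pd.DataFrame({
    "CONTROL": [1, 1, 3, 3, 3, 3],
    "MD_EARN_WNE_P6": [30000.0, np.nan, 20000.0, 21000.0, np.nan, 25000.0],
    "MD_EARN_WNE_P10": [35000.0, np.nan, np.nan, 26000.0, np.nan, np.nan],
})

earn_cols = ["MD_EARN_WNE_P6", "MD_EARN_WNE_P10"]

# isna() gives True/False; averaging booleans within each CONTROL group
# gives the share of schools with no value for that column.
share_missing = toy[earn_cols].isna().groupby(toy["CONTROL"]).mean()
print(share_missing)
```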

The for-profit schools with the highest median incomes 6 years after enrolling appear to be mostly schools that award degrees in health professions and related programs (<b>PCIP51</b> is the percentage of degrees awarded in the health professions), while nearly all of those with the lowest earnings award 100% of their degrees in personal and culinary services (<b>PCIP12</b>).

In [105]:
# top 10 highest earning schools
ilfp[['INSTNM','CONTROL','MD_EARN_WNE_P10','MN_EARN_WNE_P10','MD_EARN_WNE_P6','MD_EARN_WNE_P8','PCIP51','PCIP12']].sort_values('MD_EARN_WNE_P6', ascending=False).head(10)

Unnamed: 0,INSTNM,CONTROL,MD_EARN_WNE_P10,MN_EARN_WNE_P10,MD_EARN_WNE_P6,MD_EARN_WNE_P8,PCIP51,PCIP12
4783,Chamberlain University-Illinois,3,82055.0,60400,76330.0,81287.0,1.0,0.0
5306,Verve College,3,,,57941.0,,1.0,0.0
3834,Worsham College of Mortuary Science,3,53900.0,48100,56713.0,53766.0,0.0,1.0
4927,Ambria College of Nursing,3,67488.0,PrivacySuppressed,50066.0,59625.0,1.0,0.0
3786,Universal Technical Institute of Illinois Inc,3,51889.0,48100,46572.0,47765.0,0.0,0.0
3666,ETI School of Skilled Trades,3,44410.0,44900,43814.0,48865.0,0.0,0.0
5371,DeVry University-Illinois,3,45217.0,50300,38520.0,40307.0,0.5015,0.0
4339,MDT College of Health Sciences,3,46801.0,36100,38110.0,38502.0,1.0,0.0
6415,Rasmussen University-Mokena/Tinley Park,3,37168.0,34300,35866.0,34181.0,,
6414,Rasmussen University-Romeoville/Joliet,3,37168.0,34300,35866.0,34181.0,,


In [106]:
# top 10 lowest earning schools
ilfp[['INSTNM','CONTROL','MD_EARN_WNE_P10','MN_EARN_WNE_P10','MD_EARN_WNE_P6','MD_EARN_WNE_P8','PCIP51','PCIP12']].sort_values('MD_EARN_WNE_P6', ascending=True).head(10)

Unnamed: 0,INSTNM,CONTROL,MD_EARN_WNE_P10,MN_EARN_WNE_P10,MD_EARN_WNE_P6,MD_EARN_WNE_P8,PCIP51,PCIP12
852,Cannella School of Hair Design-Chicago,3,20933.0,18000,15448.0,15695.0,0.0,1.0
4930,Innovations Design Academy,3,,,15515.0,,0.0,1.0
872,Tricoci University of Beauty Culture,3,18311.0,22200,15573.0,16339.0,0.0,1.0
3833,Rosel School of Cosmetology,3,15858.0,18700,16379.0,20898.0,0.0,1.0
983,Taylor Business Institute,3,24348.0,25200,16625.0,24254.0,0.1928,0.0
5969,Larry's Barber College,3,,,17652.0,,0.0,1.0
5217,Larry's Barber College,3,,,17652.0,,0.0,1.0
6024,Larry's Barber College-Joliet,3,,,17652.0,,0.0,1.0
986,Tri-County Beauty Academy,3,,PrivacySuppressed,17970.0,18285.0,0.0,1.0
3768,Bell Mar Beauty College,3,26062.0,21500,18169.0,24914.0,0.0,1.0


<font color='red'>TO DO: </font> Look at the earnings by program. For-profits may have lower median earnings because the professions they serve are lower-earning professions, but are the same programs at public/nonprofits higher earning than they are at for-profits? What is the program-level breakdown in for-profits, and are there some programs, like cosmo and nursing, that are predominately served by only for-profit schools?

In [107]:
ilff[['INSTNM','CIPDESC','CREDDESC','CONTROL','EARN_MDN_HI_1YR','EARN_COUNT_NWNE_HI_1YR']]

Unnamed: 0,INSTNM,CIPDESC,CREDDESC,CONTROL,EARN_MDN_HI_1YR,EARN_COUNT_NWNE_HI_1YR
0,Adler University,Communication and Media Studies.,Master's Degree,"Private, nonprofit",PrivacySuppressed,0
1,Adler University,Gerontology.,Master's Degree,"Private, nonprofit",PrivacySuppressed,PrivacySuppressed
2,Adler University,Health and Physical Education/Fitness.,Master's Degree,"Private, nonprofit",,
3,Adler University,"Psychology, General.",Master's Degree,"Private, nonprofit",37452,4
4,Adler University,"Clinical, Counseling and Applied Psychology.",Master's Degree,"Private, nonprofit",40358,6
...,...,...,...,...,...,...
8475,,,,,,
8476,,,,,,
8477,,,,,,
8478,,,,,,


## Completion rates

I looked at the completion rate for first-time, full-time students at four-year and less-than-four-year schools who completed within 150% of the expected time to complete. Completion rates by race/ethnicity are really sparse in terms of data availability.<br>

Graduation rates at for-profit schools in Illinois are not as abysmal as I thought. Among less-than-four-year schools, for-profit and non-profit grad rates are on average much higher than public schools', nationally and in Illinois. Public school grad rates are pulled down by community colleges with less than 30% graduating. Most IL for-profits are less-than-four-year schools. The average grad rate for a for-profit school in IL is much higher for less-than-four-year schools (68%) than for four-year schools (51% mean, 58% median), but rates vary a lot. SAE Institute of Technology-Chicago has one of the lowest grad rates at 31%, while G Skin has a grad rate of 70%.

In [40]:
ilfp[['INSTNM','CONTROL','G12MN','UG12MN','C150_4','C150_L4','C150_4_POOLED','C150_L4_POOLED','C150_4_WHITE','C150_L4_WHITE','C150_4_BLACK','C150_L4_BLACK']].sort_values('C150_L4').head(50)

Unnamed: 0,INSTNM,CONTROL,G12MN,UG12MN,C150_4,C150_L4,C150_4_POOLED,C150_L4_POOLED,C150_4_WHITE,C150_L4_WHITE,C150_4_BLACK,C150_L4_BLACK
5397,Networks Barber College,3,,66.0,,0.0833,,0.325,,,,
5478,SAE Institute of Technology-Chicago,3,,228.0,,0.3125,,0.4801,,0.3636,,0.1875
5344,Empire Beauty School-Stone Park,3,,240.0,,0.4219,,0.3935,,,,
4339,MDT College of Health Sciences,3,,434.0,,0.4286,,0.2692,,,,
850,Cameo Beauty Academy,3,,143.0,,0.4286,,0.5,,,,
5356,Trenz Beauty Academy,3,,505.0,,0.4545,,0.4792,,,,
874,Cosmetology & Spa Academy,3,,574.0,,0.4551,,0.4682,,,,
5143,State Career College,3,,179.0,,0.4576,,0.5221,,,,
4973,Paul Mitchell the School-Chicago,3,,334.0,,0.4667,,0.5842,,,,
890,Hair Professionals School of Cosmetology,3,,179.0,,0.4737,,0.3871,,,,


In [35]:
il[['INSTNM','CONTROL','C150_4','C150_L4','C150_4_POOLED','C150_L4_POOLED','C150_4_WHITE','C150_L4_WHITE','C150_4_BLACK','C150_L4_BLACK']].sort_values('C150_L4').head(20)

Unnamed: 0,INSTNM,CONTROL,C150_4,C150_L4,C150_4_POOLED,C150_L4_POOLED,C150_4_WHITE,C150_L4_WHITE,C150_4_BLACK,C150_L4_BLACK
5397,Networks Barber College,3,,0.0833,,0.325,,,,
985,South Suburban College,1,,0.1563,,0.1675,,0.3636,,0.0987
864,City Colleges of Chicago-Olive-Harvey College,1,,0.1627,,0.2039,,0.5,,0.1441
955,Prairie State College,1,,0.169,,0.1675,,0.2105,,0.125
908,Joliet Junior College,1,,0.1855,,0.1838,,0.2227,,0.0879
930,Generations College,2,,0.2,,0.3387,,0.0,,0.2
863,City Colleges of Chicago-Malcolm X College,1,,0.2137,,0.2239,,0.2222,,0.113
941,Morton College,1,,0.2148,,0.2104,,0.3125,,0.0
875,College of DuPage,1,,0.2163,,0.2167,,0.2547,,0.1275
989,Triton College,1,,0.2253,,0.222,,0.2948,,0.0976


In [25]:
# 1 = public, 2 = private non-profit, 3 = private for-profit
il.groupby('CONTROL')[['C150_4','C150_L4']].mean()

Unnamed: 0_level_0,C150_4,C150_L4
CONTROL,Unnamed: 1_level_1,Unnamed: 2_level_1
1,0.479542,0.3561
2,0.594906,0.54515
3,0.50885,0.678308


In [34]:
il.groupby('CONTROL')[['C150_4','C150_L4']].median()

Unnamed: 0_level_0,C150_4,C150_L4
CONTROL,Unnamed: 1_level_1,Unnamed: 2_level_1
1,0.50825,0.3404
2,0.6046,0.4903
3,0.5845,0.6789


In [26]:
# national average grad rate
df.groupby('CONTROL')[['C150_4','C150_L4']].mean()

Unnamed: 0_level_0,C150_4,C150_L4
CONTROL,Unnamed: 1_level_1,Unnamed: 2_level_1
1,0.481738,0.426934
2,0.55888,0.657649
3,0.44865,0.673239


In [30]:
# national median grad rate
df.groupby('CONTROL')[['C150_4','C150_L4']].median()

Unnamed: 0_level_0,C150_4,C150_L4
CONTROL,Unnamed: 1_level_1,Unnamed: 2_level_1
1,0.47735,0.3608
2,0.5813,0.6938
3,0.4444,0.68615


## Student demographics

Variables:
- Average age of entry
- Share of dependent students (assuming this is based on federal financial aid definitions)
- Median family income
- Average family income for independent students
- Share of first-generation students <br>

Analysis also in this Google Sheet: https://docs.google.com/spreadsheets/d/1ict9t77xUymD0tzH4rLpsiZ6SIjTOCC4VkCFnsvMyDo/edit?usp=sharing

In [60]:
# create a new dataframe with all the demographic cols I want
demos = il[['INSTNM','CONTROL','MD_FAMINC','FIRST_GEN','AGE_ENTRY','DEPENDENT','MARRIED','FEMALE','FAMINC_IND','PCT_WHITE','PCT_BLACK','PCT_HISPANIC','PCT_BA']].copy()

In [64]:
# for variables I care about, how many for-profit, non-profit and public schools have their privacy suppressed or nan?
demos[demos['DEPENDENT'] == 'PrivacySuppressed'].groupby('CONTROL').size() # 7 fp don't

demos[demos['MD_FAMINC'] == 'PrivacySuppressed'].groupby('CONTROL').size() # all fp have this

demos[demos['AGE_ENTRY'] == 'PrivacySuppressed'].groupby('CONTROL').size() # all fp have this

# first gen is too spotty
demos[demos['FIRST_GEN'] == 'PrivacySuppressed'].groupby('CONTROL').size() # 18 fp don't have this

CONTROL
2    18
3    18
dtype: int64
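
Only the last expression in a cell is displayed, so the four checks above show just the `FIRST_GEN` result. One way to see every column at once is to compare the whole frame against the marker and sum within each sector. A sketch on invented data (`toy` stands in for `demos`):

```python
import pandas as pd

# Toy stand-in for `demos`; the values are invented.
toy = pd.DataFrame({
    "CONTROL": [2, 2, 3, 3],
    "FIRST_GEN": ["0.21", "PrivacySuppressed", "PrivacySuppressed", "0.37"],
    "DEPENDENT": ["0.86", "0.98", "PrivacySuppressed", "PrivacySuppressed"],
})

cols = ["FIRST_GEN", "DEPENDENT"]

# eq() marks suppressed cells as True; summing booleans per CONTROL
# counts how many schools are suppressed for each column.
suppressed = toy[cols].eq("PrivacySuppressed").groupby(toy["CONTROL"]).sum()
print(suppressed)
```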

In [65]:
demos.head()

Unnamed: 0,INSTNM,CONTROL,MD_FAMINC,FIRST_GEN,AGE_ENTRY,DEPENDENT,MARRIED,FEMALE,FAMINC_IND,PCT_WHITE,PCT_BLACK,PCT_HISPANIC,PCT_BA
837,Adler University,2,,,,,,,,PrivacySuppressed,PrivacySuppressed,PrivacySuppressed,PrivacySuppressed
838,American Academy of Art College,3,41179.0,0.375,20.006622517,PrivacySuppressed,PrivacySuppressed,0.6887417219,PrivacySuppressed,71.1399993896484,14.5,16.9699993133544,15.7399997711181
839,American Islamic College,2,,,,,,,,,,,
840,School of the Art Institute of Chicago,2,50784.0,0.212636695,21.148105626,0.8645235362,0.0195177956,0.7221584386,18061.771186,77.370002746582,9.57999992370605,12.0699996948242,22.2399997711181
841,Augustana College,2,90216.0,0.2224352828,19.553406998,0.9861878453,PrivacySuppressed,0.5883977901,12334.533333,88.2399978637695,4.78000020980835,6.3600001335144,19


In [68]:
# replace 'PrivacySuppressed' with NaN so the columns can be converted to numbers
demos = demos.replace('PrivacySuppressed', np.nan)

# convert to floats, which work with np.nan
num_cols = ['MD_FAMINC','FIRST_GEN','AGE_ENTRY','DEPENDENT','MARRIED','FEMALE','FAMINC_IND','PCT_WHITE','PCT_BLACK','PCT_HISPANIC','PCT_BA']
demos[num_cols] = demos[num_cols].astype(float)
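
An alternative to replacing the marker and then casting is `pd.to_numeric` with `errors="coerce"`, which turns anything that is not a number into NaN in one pass. This is a sketch on invented data, not what the cell above does. Note that coercion would also silently blank out any other unexpected strings, which may or may not be what you want.

```python
import pandas as pd

# Toy stand-in for `demos`; the values are invented.
toy = pd.DataFrame({
    "INSTNM": ["School A", "School B", "School C"],
    "MD_FAMINC": ["41179.0", "PrivacySuppressed", None],
    "AGE_ENTRY": ["20.0", "27.5", "PrivacySuppressed"],
})

num_cols = ["MD_FAMINC", "AGE_ENTRY"]

# errors="coerce" converts non-numeric strings (and None) to NaN
# instead of raising an error.
toy[num_cols] = toy[num_cols].apply(pd.to_numeric, errors="coerce")
print(toy.dtypes)
```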

FINDING: On average, the majority of Illinois for-profit students are not dependent on the financial support of their parents, whereas the majority of students at public and non-profit schools are. For-profit students are also older and have lower median family incomes. The average age at entry at an Illinois for-profit school is 27, and the average family income for students not dependent on their parents’ support at a for-profit school in Illinois is about $18,800, according to WBEZ’s analysis.

In [70]:
demos.groupby('CONTROL')[['MD_FAMINC', 'FAMINC_IND','AGE_ENTRY', 'DEPENDENT']].mean()

Unnamed: 0_level_0,MD_FAMINC,FAMINC_IND,AGE_ENTRY,DEPENDENT
CONTROL,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
1,25684.880952,19531.624087,23.899297,0.626831
2,44819.463768,25306.454799,23.803207,0.673504
3,19234.55914,18799.860503,27.256838,0.364276


In [79]:
il['rate not working'] = il['COUNT_NWNE_3YR'] / il['COUNT_WNE_3YR']

In [80]:
il[['INSTNM','COUNT_NWNE_3YR','COUNT_WNE_3YR','rate not working']]

Unnamed: 0,INSTNM,COUNT_NWNE_3YR,COUNT_WNE_3YR,rate not working
837,Adler University,1.0,,
838,American Academy of Art College,4.0,121.0,0.033058
839,American Islamic College,,,
840,School of the Art Institute of Chicago,82.0,626.0,0.130990
841,Augustana College,18.0,625.0,0.028800
...,...,...,...,...
6414,Rasmussen University-Romeoville/Joliet,435.0,7125.0,0.061053
6415,Rasmussen University-Mokena/Tinley Park,435.0,7125.0,0.061053
6471,Relay Graduate School of Education - Chicago,12.0,483.0,0.024845
6489,Networks Barber College,8.0,,
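
The column above divides the not-working count by the working count, which is a ratio of the two groups rather than a share of a total. If a share is what's wanted, one possible version divides by the sum of both counts. The sketch below assumes, as my reading of the variable names, that the two counts together cover all not-enrolled students in the cohort. It uses three rows copied from the output above, and `share_not_working` is an illustrative name.

```python
import pandas as pd

# Three rows copied from the output above.
toy = pd.DataFrame({
    "INSTNM": [
        "American Academy of Art College",
        "School of the Art Institute of Chicago",
        "Augustana College",
    ],
    "COUNT_NWNE_3YR": [4.0, 82.0, 18.0],
    "COUNT_WNE_3YR": [121.0, 626.0, 625.0],
})

# Assumed denominator: everyone not enrolled, working or not.
not_enrolled = toy["COUNT_NWNE_3YR"] + toy["COUNT_WNE_3YR"]
toy["share_not_working"] = toy["COUNT_NWNE_3YR"] / not_enrolled
print(toy[["INSTNM", "share_not_working"]])
```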


FINDING for story: share of Black and Latino students at SAE, based on <b>UGDS_WHITE</b> (total share of enrollment of undergraduate degree-seeking students who are white) and the equivalent variables for other races/ethnicities. <br>

FINDING for story: Out of 142 Title IV students, 59, or 42%, have a family income below $30k. This only covers "the number of full-time, first-time, degree/certificate-seeking undergraduates who received Title IV aid (see NUM4_PUB, NUM4_PRIV), by income category, included in the IPEDS Student Financial Aid component."

In [93]:
ilfp[ilfp['INSTNM'].str.contains('SAE')][['INSTNM','UGDS','G12MN','UG12MN','UGDS_BLACK','UGDS_HISP','UGDS_WHITE','UGDS_ASIAN','NUM4_PRIV','NUM41_PRIV']]

Unnamed: 0,INSTNM,UGDS,G12MN,UG12MN,UGDS_BLACK,UGDS_HISP,UGDS_WHITE,UGDS_ASIAN,NUM4_PRIV,NUM41_PRIV
5478,SAE Institute of Technology-Chicago,262.0,,228.0,0.5992,0.2176,0.1298,0.0038,142.0,59.0


In [95]:
59/142

0.4154929577464789

Percent of undergraduates who receive federal Pell Grants, which are awarded to students with high levels of financial need.

In [101]:
ilfp[['INSTNM','PCTPELL']]

Unnamed: 0,INSTNM,PCTPELL
838,American Academy of Art College,0.5089
849,Paul Mitchell The School Tinley Park,0.5108
850,Cameo Beauty Academy,0.5594
851,Cannella School of Hair Design-Villa Park,0.1148
852,Cannella School of Hair Design-Chicago,0.6618
...,...,...
6413,Rasmussen University-Aurora,
6414,Rasmussen University-Romeoville/Joliet,
6415,Rasmussen University-Mokena/Tinley Park,
6489,Networks Barber College,


In [102]:
il.groupby('CONTROL')['PCTPELL'].mean()

CONTROL
1    0.317695
2    0.377417
3    0.505257
Name: PCTPELL, dtype: float64

In [103]:
il.groupby('CONTROL')['PCTPELL'].median()

CONTROL
1    0.30175
2    0.36230
3    0.51670
Name: PCTPELL, dtype: float64

In [104]:
df.groupby('CONTROL')['PCTPELL'].median()

CONTROL
1    0.32310
2    0.35205
3    0.56060
Name: PCTPELL, dtype: float64

## Location analysis

In [None]:
# on pause - less of a priority 

In [83]:
loc = ilfp[['INSTNM', 'LATITUDE', 'LONGITUDE','ADDR']].copy()

In [84]:
loc.to_csv('test.csv')

## Data for lookup table

In [307]:
lookup = pd.merge(il_debt,
                   il[['OPEID6','combined_cost','MD_EARN_WNE_P6','COUNT_NWNE_P6','MD_EARN_WNE_P8','MD_EARN_WNE_P10','MD_FAMINC','DEPENDENT','G12MN','UG12MN','UGDS_BLACK','UGDS_HISP','UGDS_WHITE','UGDS_ASIAN','PCTPELL']],
                  on='OPEID6',
                  how='outer')

lookup.to_csv('output/data_for_lookup.csv', index=False)

<b>PREDDEG</b> codes (predominant degree awarded):

* 0 = Not classified
* 1 = Predominantly certificate-degree granting
* 2 = Predominantly associate's-degree granting
* 3 = Predominantly bachelor's-degree granting
* 4 = Entirely graduate-degree granting

In [288]:
# find comparisons between nonprofit + public certificate vs for profits
il_dist

CONTROL,1,2,3,1_pct,2_pct,3_pct
PREDDEG,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
0,1.0,7.0,6.0,0.015873,0.077778,0.0625
1,25.0,4.0,80.0,0.396825,0.044444,0.833333
2,25.0,3.0,7.0,0.396825,0.033333,0.072917
3,12.0,56.0,2.0,0.190476,0.622222,0.020833
4,,20.0,1.0,,0.222222,0.010417


In [297]:
# avg median earnings for associates + certificate nonprofit + public schools
lookup[((lookup['PREDDEG'] == 1) | (lookup['PREDDEG'] == 2)) & (lookup['CONTROL'] < 3)]['MD_EARN_WNE_P6'].mean()

32664.166666666668

In [305]:
# calculate weighted average graduate debt for associates + certificate nonprofit + public schools
# formula: (val * wt).sum() / wt.sum()
val  = lookup[((lookup['PREDDEG'] == 1) | (lookup['PREDDEG'] == 2)) & (lookup['CONTROL'] < 3)]['GRAD_DEBT_MDN']
wt  = lookup[((lookup['PREDDEG'] == 1) | (lookup['PREDDEG'] == 2)) & (lookup['CONTROL'] < 3)]['GRAD_DEBT_N']
(val * wt).sum() / wt.sum()

8779.950248207391
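
One caveat with `(val * wt).sum() / wt.sum()`: pandas skips NaN products in the numerator, but a school with a missing value and a present weight still counts in the denominator, which pulls the average down. A hedged sketch of a helper that drops incomplete pairs first (`weighted_mean` is a hypothetical name and the toy numbers are invented):

```python
import numpy as np
import pandas as pd

def weighted_mean(frame, value_col, weight_col):
    """Weighted average using only rows where both value and weight are present."""
    pairs = frame[[value_col, weight_col]].dropna()
    return (pairs[value_col] * pairs[weight_col]).sum() / pairs[weight_col].sum()

# Toy stand-in for the filtered `lookup` rows; the middle school has no debt value.
toy = pd.DataFrame({
    "GRAD_DEBT_MDN": [9000.0, np.nan, 12000.0],
    "GRAD_DEBT_N": [100.0, 500.0, 300.0],
})

print(weighted_mean(toy, "GRAD_DEBT_MDN", "GRAD_DEBT_N"))
```

The same helper could be reused with other value and weight pairs by passing the filtered `lookup` rows once. Which count column is the right weight for each measure is a judgment call that the helper does not settle.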

In [306]:
# calculate weighted average cost for associates + certificate nonprofit + public schools
# formula: (val * wt).sum() / wt.sum()
val  = lookup[((lookup['PREDDEG'] == 1) | (lookup['PREDDEG'] == 2)) & (lookup['CONTROL'] < 3)]['combined_cost']
wt  = lookup[((lookup['PREDDEG'] == 1) | (lookup['PREDDEG'] == 2)) & (lookup['CONTROL'] < 3)]['UG12MN']
(val * wt).sum() / wt.sum()

11466.774026114195

In [308]:
# calculate weighted average median earnings for associates + certificate nonprofit + public schools
# formula: (val * wt).sum() / wt.sum()
val  = lookup[((lookup['PREDDEG'] == 1) | (lookup['PREDDEG'] == 2)) & (lookup['CONTROL'] < 3)]['MD_EARN_WNE_P6']
wt  = lookup[((lookup['PREDDEG'] == 1) | (lookup['PREDDEG'] == 2)) & (lookup['CONTROL'] < 3)]['COUNT_NWNE_P6']
(val * wt).sum() / wt.sum()

31148.478549964835

In [None]:
# format lookup table for flourish!

# TK

## Data by field overview

4-digit CIP code for cosmetology: 12.04 (stored as 1204 in the field-level data)

In [309]:
ilff[ilff['CONTROL'] == "Private, for-profit"].head()

Unnamed: 0,UNITID,OPEID6,INSTNM,CONTROL,MAIN,CIPCODE,CIPDESC,CREDLEV,CREDDESC,IPEDSCOUNT1,...,BBRR4_FED_COMP_N,BBRR4_FED_COMP_DFLT,BBRR4_FED_COMP_DLNQ,BBRR4_FED_COMP_FBR,BBRR4_FED_COMP_DFR,BBRR4_FED_COMP_NOPROG,BBRR4_FED_COMP_MAKEPROG,BBRR4_FED_COMP_PAIDINFULL,BBRR4_FED_COMP_DISCHARGE,DISTANCE
475,143376.0,30653.0,Paul Mitchell The School Tinley Park,"Private, for-profit",1.0,1204.0,Cosmetology and Related Personal Grooming Serv...,1.0,Undergraduate Certificate or Diploma,116.0,...,226,0.10 - 0.14,<=0.05,0.50 - 0.54,0.05 - 0.09,0.05 - 0.09,0.10 - 0.14,<=0.05,<=0.05,1.0
476,143376.0,30653.0,Paul Mitchell The School Tinley Park,"Private, for-profit",1.0,1313.0,Teacher Education and Professional Development...,1.0,Undergraduate Certificate or Diploma,,...,PrivacySuppressed,PrivacySuppressed,PrivacySuppressed,PrivacySuppressed,PrivacySuppressed,PrivacySuppressed,PrivacySuppressed,PrivacySuppressed,PrivacySuppressed,0.0
477,143464.0,30784.0,Cameo Beauty Academy,"Private, for-profit",1.0,1204.0,Cosmetology and Related Personal Grooming Serv...,1.0,Undergraduate Certificate or Diploma,24.0,...,57,<=0.20,<=0.20,0.60 - 0.79,<=0.20,<=0.20,<=0.20,<=0.20,<=0.20,1.0
478,143464.0,30784.0,Cameo Beauty Academy,"Private, for-profit",1.0,1313.0,Teacher Education and Professional Development...,1.0,Undergraduate Certificate or Diploma,,...,PrivacySuppressed,PrivacySuppressed,PrivacySuppressed,PrivacySuppressed,PrivacySuppressed,PrivacySuppressed,PrivacySuppressed,PrivacySuppressed,PrivacySuppressed,0.0
479,143473.0,22026.0,Cannella School of Hair Design-Villa Park,"Private, for-profit",1.0,1204.0,Cosmetology and Related Personal Grooming Serv...,1.0,Undergraduate Certificate or Diploma,50.0,...,PrivacySuppressed,PrivacySuppressed,PrivacySuppressed,PrivacySuppressed,PrivacySuppressed,PrivacySuppressed,PrivacySuppressed,PrivacySuppressed,PrivacySuppressed,1.0


In [107]:
cip_descs = ff.groupby('CIPDESC').size().reset_index()

In [109]:
cip_descs.tail()

Unnamed: 0,CIPDESC,0
433,"Visual and Performing Arts, Other.",179
434,Wildlife and Wildlands Science and Management.,162
435,Woodworking.,128
436,Work and Family Studies.,11
437,Zoology/Animal Biology.,294


In [108]:
cip_descs.to_csv('cip_descs.csv')

In [178]:
ff[ff['CREDLEV'] == 4][['INSTNM','CREDLEV','CIPDESC']]

Unnamed: 0,INSTNM,CREDLEV,CIPDESC
272,University of Alabama in Huntsville,4,Computer/Information Technology Administration...
279,University of Alabama in Huntsville,4,Teaching English or French as a Second or Fore...
292,University of Alabama in Huntsville,4,Computer Engineering.
313,University of Alabama in Huntsville,4,Rhetoric and Composition/Writing Studies.
324,University of Alabama in Huntsville,4,"Multi/Interdisciplinary Studies, Other."
...,...,...,...
233106,Bath Spa University,4,Teacher Education and Professional Development...
233332,University of Lincoln,4,History.
233767,University of Chester,4,Teacher Education and Professional Development...
233784,University of Chester,4,"Business, Management, Marketing, and Related S..."


In [None]:
#  match state from inst level data

In [193]:
ff = pd.merge(ff, df[['OPEID6','STABBR']], on='OPEID6', how='left')
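
Because several institution-level rows can share one `OPEID6` (branch campuses), a left merge on that key can repeat each field-level row once per matching branch. The sketch below shows the effect on toy data and one possible guard: reduce the lookup to one row per key, then let `validate="many_to_one"` raise an error if duplicates remain. It assumes every branch under an `OPEID6` is in the same state, which may not hold for every school.

```python
import pandas as pd

# Toy institution-level data: two branches share one OPEID6.
inst = pd.DataFrame({
    "OPEID6": [111, 111, 222],
    "STABBR": ["IL", "IL", "AL"],
})

# Toy field-level data: one row per program.
fields = pd.DataFrame({
    "OPEID6": [111, 222],
    "CIPCODE": [1204, 1313],
})

# Merging on the raw key repeats the program row once per matching branch.
naive = pd.merge(fields, inst, on="OPEID6", how="left")

# Keep one state per OPEID6 first; validate raises MergeError if the
# right-hand side still has duplicate keys.
state_lookup = inst.drop_duplicates(subset="OPEID6")
safe = pd.merge(fields, state_lookup, on="OPEID6", how="left",
                validate="many_to_one")

print(len(fields), len(naive), len(safe))
```

Comparing `len(ff)` before and after the merge is a quick way to check whether this happened in the real data.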

In [196]:
# IL cosmo schools broken down by control
ff[(ff['CIPCODE']==1204) & (ff['STABBR'] == 'IL')].groupby('CONTROL').size()

CONTROL
Private, for-profit    110
Private, nonprofit       2
Public                  19
dtype: int64

In [202]:
# filter for just il cosmo
il_cosmo = ff[(ff['CIPCODE']==1204) & (ff['STABBR'] == 'IL')].copy()

In [200]:
il_cosmo.to_csv('test.csv')

In [None]:
# how many IL cosmo are for-profits? plus tabula of other cosmo schools
# median income, debt, cost of for-profit cosmo school
# median income, debt, cost of for-profit cosmo school compared to others (maybe nationally)
# total enrollment at cosmo in IL

# how many cosmo schools aren't captured in federal data - cellini: https://www.peerresearchproject.org/peer/research/body/PEER_Cosmetology_B.pdf

# the data so-what of cosmo schools: disproportionately female, poc, worst outcomes out of all other program types

# the program types with the most for-profit penetration 
# earnings by program type

In [205]:
il_cosmo[['INSTNM','IPEDSCOUNT1','IPEDSCOUNT2','CIPCODE','EARN_COUNT_NE_3YR','EARN_NE_MDN_3YR']]

Unnamed: 0,INSTNM,IPEDSCOUNT1,IPEDSCOUNT2,CIPCODE,EARN_COUNT_NE_3YR,EARN_NE_MDN_3YR
68373,Tricoci University of Beauty Culture-Urbana,51.0,47.0,1204,40,11193
68686,Paul Mitchell The School Tinley Park,116.0,115.0,1204,243,19182
68688,Cameo Beauty Academy,24.0,19.0,1204,64,20696
68690,Cannella School of Hair Design-Villa Park,50.0,13.0,1204,PrivacySuppressed,PrivacySuppressed
68691,Cannella School of Hair Design-Chicago,28.0,6.0,1204,PrivacySuppressed,PrivacySuppressed
...,...,...,...,...,...,...
404387,Larry's Barber College,,0.0,1204,PrivacySuppressed,PrivacySuppressed
404388,Larry's Barber College,,0.0,1204,PrivacySuppressed,PrivacySuppressed
404389,Larry's Barber College,,0.0,1204,PrivacySuppressed,PrivacySuppressed
404447,Tricoci University of Beauty Culture-Normal,,37.0,1204,120,19899


In [218]:
# list of cip codes under cosmo 4-digit CIP (12.04)
# based on this: https://nces.ed.gov/ipeds/cipcode/cipdetail.aspx?y=56&cipid=90389
cosmo_cip_list = [12.0401,12.0402,12.0404,12.0406,12.0407,12.0408,12.0409,12.0410,12.0411,12.0412,12.0413,12.0414,12.0499]
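
Matching floats against a hand-typed list works as long as both sides parse to the same value, but it needs updating whenever a new 6-digit code is added. An alternative sketch derives the 4-digit family arithmetically, so that any 12.04xx code matches. `in_cip_family` is a hypothetical helper, and it assumes the codes are stored as floats like 12.0409, as in `CIPCODE1`.

```python
import pandas as pd

def in_cip_family(codes: pd.Series, family: int = 1204) -> pd.Series:
    """True where a 6-digit CIP code like 12.0409 falls under the 4-digit family."""
    # 12.0409 -> 120409 as a whole number, so float noise is rounded away.
    six_digit = (codes * 10000).round()
    # Dropping the last two digits leaves the 4-digit family; NaN compares as False.
    return (six_digit // 100) == family

codes = pd.Series([12.0401, 12.0409, 12.0410, 12.0499, 51.3801, None])
print(in_cip_family(codes).tolist())
```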

In [224]:
ilfp[ilfp['CIPCODE1'].isin(cosmo_cip_list)][['INSTNM','CIPTITLE1','CIPTITLE2','CIPTITLE3']].sort_values('INSTNM').head(50)

Unnamed: 0,INSTNM,CIPTITLE1,CIPTITLE2,CIPTITLE3
4860,Aveda Institute-Chicago,Cosmetology/Cosmetologist General,Aesthetician/Esthetician and Skin Care Specialist,
3768,Bell Mar Beauty College,Cosmetology/Cosmetologist General,Cosmetology Barber/Styling and Nail Instructor,Cosmetology Barber/Styling and Nail Instructor
850,Cameo Beauty Academy,Cosmetology/Cosmetologist General,Aesthetician/Esthetician and Skin Care Specialist,Cosmetology Barber/Styling and Nail Instructor
852,Cannella School of Hair Design-Chicago,Cosmetology/Cosmetologist General,,
853,Cannella School of Hair Design-Chicago,Cosmetology/Cosmetologist General,,
851,Cannella School of Hair Design-Villa Park,Cosmetology/Cosmetologist General,,
854,Capri Beauty College,Cosmetology/Cosmetologist General,Cosmetology Barber/Styling and Nail Instructor,Cosmetology Barber/Styling and Nail Instructor
4898,Capri Beauty College,Cosmetology/Cosmetologist General,Cosmetology Barber/Styling and Nail Instructor,Cosmetology Barber/Styling and Nail Instructor
874,Cosmetology & Spa Academy,Cosmetology/Cosmetologist General,Barbering/Barber,Cosmetology Barber/Styling and Nail Instructor
944,Cosmetology Concepts Niles,Cosmetology/Cosmetologist General,Cosmetology Barber/Styling and Nail Instructor,


In [217]:
ilfp[['INSTNM','CONTROL','CIPTITLE1','CIPCODE1','CIPTITLE2','CIPCODE2','CIPTITLE3','CIPCODE3','CIPTITLE4','CIPCODE4']][ilfp['INSTNM'].str.contains('Estelle')]

Unnamed: 0,INSTNM,CONTROL,CIPTITLE1,CIPCODE1,CIPTITLE2,CIPCODE2,CIPTITLE3,CIPCODE3,CIPTITLE4,CIPCODE4
4565,Estelle Medical Academy,3,,,,,,,,
5108,Estelle Skin Care and Spa Institute,3,Aesthetician/Esthetician and Skin Care Specialist,12.0409,,,,,,


In [None]:
# filter for just IL
ilff = ff[ff['STABBR'] == 'IL'].copy()

## Cosmetology enrollment

I am defining the universe of cosmetology schools as schools where the most popular CIP program is one of the [6-digit CIPs](https://nces.ed.gov/ipeds/cipcode/cipdetail.aspx?y=56&cipid=90389) under Cosmetology and Related Personal Grooming Services. Compared to the list of schools with any cosmetology programs, it only leaves out Midwest Technical Institute, for which cosmetology is the third-most popular program.

Load IPEDS enrollment data for the following (more info in ipeds-analysis.ipynb):
- 12 month unduplicated enrollment by race/ethnicity, gender, degree-seeking status
- 2021-22 academic year
- all students, undergrad vs grad total

In [335]:
ipeds = pd.read_csv('data/ipeds_pulls/12 month enrollment by race gender natl 2022 w control merged.csv')

Filter for just "All students total" under <b>EFFY2022.Level and degree/certificate-seeking status of student</b> to avoid double counting enrollments.

In [336]:
ipeds = ipeds[ipeds['EFFY2022.Level and degree/certificate-seeking status of student'] == 'All students total'].copy()
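
To see why this filter matters, here is a small sketch on made-up data. The column names `level` and `grand_total` are stand-ins for the long EFFY2022 column names, and the numbers are invented.

```python
import pandas as pd

level_col = "level"  # stand-in for the EFFY2022 level/status column
toy = pd.DataFrame({
    "unitid": [1, 1, 1, 2, 2, 2],
    level_col: ["All students total", "Undergraduate", "Graduate"] * 2,
    "grand_total": [100, 80, 20, 50, 50, 0],
})

# Summing every row counts each student twice: once in the total row
# and once in the undergraduate or graduate row.
naive_sum = toy["grand_total"].sum()

# Keeping only the total rows gives one count per student.
filtered_sum = toy.loc[toy[level_col] == "All students total", "grand_total"].sum()
```

In this toy case the unfiltered sum is exactly double the filtered one, because the undergraduate and graduate rows add up to the total row for each school.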

Create a list of cosmetology school <b>UNITIDs</b> based on the most popular CIP program being one of the cosmetology program codes.

In [314]:
cosmo_school_unitids = ilfp[ilfp['CIPCODE1'].isin(cosmo_cip_list)]['UNITID'].to_list()

Filter IPEDS data to just cosmo schools and get the total EFFY2022 enrollment at those schools.

Enrollment at Illinois for-profit cosmetology schools was around 10,600 students in 2021-22.

In [320]:
ipeds[ipeds['unitid'].isin(cosmo_school_unitids)]['EFFY2022.Grand total'].sum()

10639

Compare that to the 12 month undergraduate enrollment in College Scorecard for these schools as a check.

In [321]:
ilfp[ilfp['UNITID'].isin(cosmo_school_unitids)]['UG12MN'].sum()

10849.0
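
A quick way to size up the gap between the two sources, using the totals printed above (the variable names are mine):

```python
ipeds_total = 10639      # EFFY2022 grand total from IPEDS (output above)
scorecard_total = 10849  # UG12MN sum from College Scorecard (output above)

diff = scorecard_total - ipeds_total
pct_diff = diff / ipeds_total
print(f"{diff} students, {pct_diff:.1%} of the IPEDS total")
```

The two sources differ by 210 students, about 2%, so they roughly agree.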

Compare that to enrollment at the other (non-cosmetology) Illinois for-profit schools, and at DeVry and Chamberlain specifically.

In [322]:
ilfp[~ilfp['UNITID'].isin(cosmo_school_unitids)]['UG12MN'].sum()

71847.0

In [323]:
ilfp[ilfp['INSTNM'].str.contains('DeVry')]['UG12MN'].sum()

27424.0

In [324]:
ilfp[ilfp['INSTNM'].str.contains('Chamberlain')]['UG12MN'].sum()

26294.0

## Cosmetology enrollment demographics

Are students enrolled in these IL cosmo schools overwhelmingly women of color?

In [326]:
ipeds[ipeds['unitid'].isin(cosmo_school_unitids)]['EFFY2022.Black or African American women'].sum()

3430

In [327]:
ipeds[ipeds['unitid'].isin(cosmo_school_unitids)]['EFFY2022.Hispanic or Latino women'].sum() 

2431

In [329]:
ipeds[ipeds['unitid'].isin(cosmo_school_unitids)]['EFFY2022.White women'].sum()

3167

Black and Latino women make up 55% of Illinois cosmetology school enrollment. 

In [330]:
# sum of black and latino women in IL cosmo schools
woc = ipeds[ipeds['unitid'].isin(cosmo_school_unitids)]['EFFY2022.Hispanic or Latino women'].sum() + ipeds[ipeds['unitid'].isin(cosmo_school_unitids)]['EFFY2022.Black or African American women'].sum()

In [331]:
woc/ipeds[ipeds['unitid'].isin(cosmo_school_unitids)]['EFFY2022.Grand total'].sum()

0.5508976407557101

More than 90% are women.

In [334]:
ipeds[ipeds['unitid'].isin(cosmo_school_unitids)]['EFFY2022.Grand total women'].sum()/ipeds[ipeds['unitid'].isin(cosmo_school_unitids)]['EFFY2022.Grand total'].sum()

0.9221731365729862
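
The cells above repeat the same `isin` mask several times. A possible way to tidy this up is a small helper; `share_of_total` is hypothetical (not in this notebook) and the toy columns and numbers are made up.

```python
import pandas as pd

def share_of_total(frame, unitids, numerator_cols, total_col):
    """Share of total enrollment at the given schools that falls in numerator_cols."""
    subset = frame[frame["unitid"].isin(unitids)]
    return subset[numerator_cols].sum().sum() / subset[total_col].sum()

# Toy data standing in for the IPEDS frame
toy = pd.DataFrame({
    "unitid": [1, 2, 3],
    "black_women": [30, 10, 5],
    "latina_women": [20, 15, 5],
    "total": [100, 50, 40],
})

# (30 + 10 + 20 + 15) / (100 + 50) for schools 1 and 2
woc_share = share_of_total(toy, [1, 2], ["black_women", "latina_women"], "total")
```

With the real data, the same call would take `cosmo_school_unitids` and the EFFY2022 column names used above.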

Compare WOC enrollment at cosmo schools versus non-cosmo schools at all school types, not just for-profits.

In [337]:
ipeds[(~ipeds['unitid'].isin(cosmo_school_unitids)) & (ipeds['state'] == 'Illinois')]['EFFY2022.Hispanic or Latino women'].sum()

114812

In [338]:
ipeds[(~ipeds['unitid'].isin(cosmo_school_unitids)) & (ipeds['state'] == 'Illinois')]['EFFY2022.Black or African American women'].sum()

78466

In [340]:
woc = ipeds[ipeds['state'] == 'Illinois']['EFFY2022.Black or African American women'].sum() + ipeds[ipeds['state'] == 'Illinois']['EFFY2022.Hispanic or Latino women'].sum()

In [341]:
woc

199139

This is the share of all IL postsecondary enrollment (cosmetology schools included) that is Black or Latino women: about 21%, compared with 55% at cosmetology schools. Black and Latino women make up a disproportionate share of cosmetology enrollment.

In [342]:
woc/ipeds[ipeds['state'] == 'Illinois']['EFFY2022.Grand total'].sum()

0.20804738079840698
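
To put the two shares side by side, a simple representation ratio can be computed from the outputs above (the variable names are mine):

```python
cosmo_share = 0.5508976407557101   # Black and Latino women, IL cosmo schools (output above)
state_share = 0.20804738079840698  # Black and Latino women, all IL schools (output above)

# How many times larger the cosmo-school share is than the statewide share
ratio = cosmo_share / state_share
print(f"{ratio:.2f}x")
```

The ratio comes out to roughly 2.6, meaning Black and Latino women are represented at cosmetology schools at more than two and a half times their share of statewide enrollment.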

## Cosmetology programs by school control

In [325]:
# IL cosmo programs (field-level rows) broken down by control
ff[(ff['CIPCODE']==1204) & (ff['STABBR'] == 'IL')].groupby('CONTROL').size()

CONTROL
Private, for-profit    110
Private, nonprofit       2
Public                  19
dtype: int64

More than 80% of Illinois cosmetology programs are offered by for-profit schools. 

In [343]:
110/len(ff[(ff['CIPCODE']==1204) & (ff['STABBR'] == 'IL')])

0.8396946564885496
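
The same share can be computed for every control type at once instead of hard-coding 110. A sketch using the counts from the output above:

```python
import pandas as pd

# Counts of IL cosmetology programs by control, copied from the groupby output above
counts = pd.Series({
    "Private, for-profit": 110,
    "Private, nonprofit": 2,
    "Public": 19,
})

# Divide each count by the total to get shares that sum to 1
shares = counts / counts.sum()
```

On the real data, dividing the `groupby('CONTROL').size()` result by its own sum should give the same shares.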

In [347]:
ff[(ff['CIPCODE']==1204) & (ff['STABBR'] == 'IL') & (ff['CONTROL'] == 'Private, nonprofit')].head()

Unnamed: 0,UNITID,OPEID6,INSTNM,CONTROL,MAIN,CIPCODE,CIPDESC,CREDLEV,CREDDESC,IPEDSCOUNT1,...,BBRR4_FED_COMP_DFLT,BBRR4_FED_COMP_DLNQ,BBRR4_FED_COMP_FBR,BBRR4_FED_COMP_DFR,BBRR4_FED_COMP_NOPROG,BBRR4_FED_COMP_MAKEPROG,BBRR4_FED_COMP_PAIDINFULL,BBRR4_FED_COMP_DISCHARGE,DISTANCE,STABBR
72582,146676.0,1709,Lincoln College,"Private, nonprofit",1,1204,Cosmetology and Related Personal Grooming Serv...,1,Undergraduate Certificate or Diploma,,...,PrivacySuppressed,PrivacySuppressed,PrivacySuppressed,PrivacySuppressed,PrivacySuppressed,PrivacySuppressed,PrivacySuppressed,PrivacySuppressed,0,IL
72583,146676.0,1709,Lincoln College,"Private, nonprofit",1,1204,Cosmetology and Related Personal Grooming Serv...,2,Associate's Degree,,...,PrivacySuppressed,PrivacySuppressed,PrivacySuppressed,PrivacySuppressed,PrivacySuppressed,PrivacySuppressed,PrivacySuppressed,PrivacySuppressed,0,IL


In [350]:
ff[ff['INSTNM'].str.contains('Chamberlain')].groupby('CIPDESC').size()

CIPDESC
Bioethics/Medical Ethics.                                                              15
Business Administration, Management and Operations.                                    30
Public Health.                                                                         30
Registered Nursing, Nursing Administration, Nursing Research and Clinical Nursing.    300
Social Work.                                                                           15
dtype: int64

In [351]:
ff[ff['INSTNM'].str.contains('DeVry')].groupby('CIPDESC').size()

CIPDESC
Accounting and Related Services.                                                      736
Allied Health Diagnostic, Intervention, and Treatment Professions.                     16
Business Administration, Management and Operations.                                   784
Business/Commerce, General.                                                           368
Clinical/Medical Laboratory Science/Research and Allied Professions.                   48
Communication and Media Studies.                                                      176
Computer Engineering Technologies/Technicians.                                        192
Computer Engineering.                                                                 144
Computer Software and Media Applications.                                             464
Computer Systems Analysis.                                                            272
Computer Systems Networking and Telecommunications.                                   848
Co

In [354]:
# top ten most common for-profit programs
ff[(ff['CONTROL'] == 'Private, for-profit') & (ff['STABBR'] == 'IL')].groupby('CIPDESC').size().reset_index(name='count of programs').sort_values('count of programs', ascending=False).head(10)

Unnamed: 0,CIPDESC,count of programs
0,Accounting and Related Services.,127
55,Health and Medical Administrative Services.,127
28,Cosmetology and Related Personal Grooming Serv...,110
11,"Business Administration, Management and Operat...",107
76,"Registered Nursing, Nursing Administration, Nu...",102
61,Human Resources Management and Services.,90
29,Criminal Justice and Corrections.,86
67,Management Information Systems and Services.,77
62,"Human Services, General.",72
33,Design and Applied Arts.,72


In [360]:
# top ten most common non-profit programs
ff[(ff['CONTROL'] == 'Private, nonprofit') & (ff['STABBR'] == 'IL')].groupby('CIPDESC').size().reset_index(name='count of programs').sort_values('count of programs', ascending=False).head(10)

Unnamed: 0,CIPDESC,count of programs
37,"Business Administration, Management and Operat...",111
222,"Registered Nursing, Nursing Administration, Nu...",110
250,Teacher Education and Professional Development...,78
51,"Clinical, Counseling and Applied Psychology.",76
159,"Liberal Arts and Sciences, General Studies and...",75
251,Teacher Education and Professional Development...,72
212,"Psychology, General.",72
255,Theological and Ministerial Studies.,62
31,"Biology, General.",61
107,"English Language and Literature, General.",57


In [361]:
# top ten most common public programs
ff[(ff['CONTROL'] == 'Public') & (ff['STABBR'] == 'IL')].groupby('CIPDESC').size().reset_index(name='count of programs').sort_values('count of programs', ascending=False).head(10)

Unnamed: 0,CIPDESC,count of programs
48,"Business Administration, Management and Operat...",110
84,Criminal Justice and Corrections.,105
1,Accounting and Related Services.,103
170,"Liberal Arts and Sciences, General Studies and...",102
154,"Human Development, Family Studies, and Related...",91
144,Health and Medical Administrative Services.,87
15,"Allied Health Diagnostic, Intervention, and Tr...",86
49,Business Operations Support and Assistant Serv...,82
260,Vehicle Maintenance and Repair Technologies.,76
221,Precision Metal Working.,71
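
The three top-ten cells above differ only in the `CONTROL` value, so they could be folded into one function. This is a sketch: `top_programs` is a hypothetical helper (not in this notebook) and the toy frame stands in for `ff`.

```python
import pandas as pd

def top_programs(frame, control, state="IL", n=10):
    """Most common CIPDESC values for one control type in one state."""
    mask = (frame["CONTROL"] == control) & (frame["STABBR"] == state)
    return (frame[mask]
            .groupby("CIPDESC").size()
            .reset_index(name="count of programs")
            .sort_values("count of programs", ascending=False)
            .head(n))

# Toy data standing in for the field-level frame
toy = pd.DataFrame({
    "CONTROL": ["Public", "Public", "Public", "Private, for-profit"],
    "STABBR": ["IL", "IL", "IL", "IL"],
    "CIPDESC": ["Accounting.", "Accounting.", "Nursing.", "Nursing."],
})
public_top = top_programs(toy, "Public")
```

Calling `top_programs(ff, 'Private, for-profit')` and so on would then reproduce the three tables above.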


In [None]:
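# count of IL programs by CIP description and school control
# (assumed intent: the original call left `values` blank)
pd.pivot_table(ff[ff['STABBR'] == 'IL'],
               index='CIPDESC',
               columns='CONTROL',
               aggfunc='size',
               fill_value=0)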