# Estimating Labor Market Returns to Education

<span style="color: #008080">*Bárbara Flores*</span>

In this exercise, we're going to use data from the [American Community Survey (ACS)](https://usa.ipums.org/usa/acs.shtml) to study the relationship between educational attainment and wages. The ACS is a survey conducted by the United States Census Bureau (though it is not "The Census," which is a counting of every person in the United States that takes place every 10 years) to measure numerous features of the US population. The data we will be working with includes about 100 variables from the 2017 ACS, and is a 10% sample of the ACS (which itself is a 1% sample of the US population, so we're working with about a 0.1% sample of the United States). 

This data comes from [IPUMS](https://usa.ipums.org/usa/), which provides a very useful tool for getting subsets of major survey datasets, not just from the US, but [from government statistical agencies the world over](https://international.ipums.org/international-action/sample_details).

This is *real* data, meaning that you are being provided the data as it is provided by IPUMS. Documentation for all variables used in this data can be found [here](https://usa.ipums.org/usa-action/variables/group) (you can either search by variable name to figure out the meaning of a variable in this data, or search for something you want to see if a variable with the right name is in this data). 

Within this data is information on both the educational background and current earnings of a representative sample of Americans. We will now use this data to estimate the labor-market returns to graduating high school and college, and to learn something about the meaning of an educational degree. 

## Gradescope Autograding

Please follow [all standard guidance](https://www.practicaldatascience.org/html/autograder_guidelines.html) for submitting this assignment to the Gradescope autograder, including storing your solutions in a dictionary called `results` and ensuring your notebook runs from the start to completion without any errors.

For this assignment, please name your file `exercise_dataframes.ipynb` before uploading.

You can check that you have answers for all questions in your `results` dictionary with this code:

```python
assert set(results.keys()) == {
    "ex2_num_obs",
    "ex3_num_vars",
    "ex8_updated_num_obs",
    "ex9_updated_num_obs",
    "ex11_grade12_income",
    "ex12_college_income",
    "ex12_college_income_pct",
    "ex14_high_school_dropout",
    "ex15_grade_9",
    "ex15_grade_10",
    "ex15_grade_11",
    "ex15_grade_12",
    "ex15_4_years_of_college",
    "ex15_graduate",
}
```

### Submission Limits

Please remember that you are **only allowed three submissions to the autograder.** Your last submission (if you submit 3 or fewer times), or your third submission (if you submit more than 3 times), will determine your grade. Submissions that error out will **not** count against this total.


```python
results = dict()
```

## Exercises

### Exercise 1

Data for these [exercises can be found here](https://github.com/nickeubank/MIDS_Data/tree/master/US_AmericanCommunitySurvey). 

Import `US_ACS_2017_10pct_sample.dta` into a pandas DataFrame (read it directly from a URL to help the autograder, please). 

This can be done with the command `pd.read_stata`, which will read in files created in the program Stata (and which uses the file suffix `.dta`). This is a format commonly used by social scientists.
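
To see what `pd.read_stata` does with Stata's labeled variables, here is a minimal sketch that writes a small, made-up DataFrame to a temporary `.dta` file with `DataFrame.to_stata` and reads it back. The values and the file name `toy.dta` are hypothetical. The point is that, by default, `read_stata` turns Stata value labels into *ordered* pandas categoricals, which is why columns such as `educ` show up with a `category` dtype in the real data.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical toy data standing in for the ACS file (not real survey records).
toy = pd.DataFrame(
    {
        "age": [17, 63, 66],
        "educ": pd.Categorical(
            ["grade 11", "4 years of college", "grade 12"],
            categories=["grade 11", "grade 12", "4 years of college"],
            ordered=True,
        ),
    }
)

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "toy.dta"
    # Categorical columns are written as integer codes plus Stata value labels.
    toy.to_stata(path, write_index=False)
    # By default, read_stata converts value labels back into ordered categoricals.
    round_trip = pd.read_stata(path)

print(round_trip.dtypes)
print(list(round_trip["educ"].cat.categories))
```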

```python
import pandas as pd

acs = pd.read_stata(
    "https://github.com/nickeubank/MIDS_Data/raw/master/US_AmericanCommunitySurvey/US_ACS_2017_10pct_sample.dta"
)
acs.head()
```

|  | year | datanum | serial | cbserial | numprec | subsamp | hhwt | hhtype | cluster | adjust | ... | migcounty1 | migmet131 | vetdisab | diffrem | diffphys | diffmob | diffcare | diffsens | diffeye | diffhear |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2017 | 1 | 177686 | 2017001000000.0 | 9 | 64 | 55 | female householder, no husband present | 2017002000000.0 | 1.011189 | ... | 0 | not in identifiable area |  |  |  |  |  | no vision or hearing difficulty | no | no |
| 1 | 2017 | 1 | 1200045 | 2017001000000.0 | 6 | 79 | 25 | male householder, no wife present | 2017012000000.0 | 1.011189 | ... | 0 | not in identifiable area |  | no cognitive difficulty | no ambulatory difficulty | no independent living difficulty | no | no vision or hearing difficulty | no | no |
| 2 | 2017 | 1 | 70831 | 2017000000000.0 | 1 person record | 36 | 57 | male householder, living alone | 2017001000000.0 | 1.011189 | ... | 0 | not in identifiable area |  | has cognitive difficulty | no ambulatory difficulty | no independent living difficulty | no | no vision or hearing difficulty | no | no |
| 3 | 2017 | 1 | 557128 | 2017001000000.0 | 2 | 10 | 98 | married-couple family household | 2017006000000.0 | 1.011189 | ... | 0 | not in identifiable area |  | no cognitive difficulty | no ambulatory difficulty | no independent living difficulty | no | no vision or hearing difficulty | no | no |
| 4 | 2017 | 1 | 614890 | 2017001000000.0 | 4 | 96 | 54 | married-couple family household | 2017006000000.0 | 1.011189 | ... | 0 | not in identifiable area |  |  |  |  |  | no vision or hearing difficulty | no | no |


## Getting to Know Your Data

When you get a new dataset like this, it's good to start by trying to get a feel for its contents and organization. Toy datasets you sometimes get in classes are often very small, and easy to look at, but this is a pretty large dataset, so you can't just open it up and get a good sense of it. Here are some ways to get to know your data. 

### Exercise 2

How many observations are in your data? Store the answer in your `results` dictionary with the key `"ex2_num_obs"`.

```python
ex2_num_obs = acs.shape[0]
results["ex2_num_obs"] = ex2_num_obs

print(f"We have {ex2_num_obs:,} observations in the dataset")
```

```
We have 319,004 observations in the dataset
```


### Exercise 3

How many variables are in your data? Store the answer in your `results` dictionary with the key `"ex3_num_vars"`.
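
Both counts come from the same attribute: `DataFrame.shape` is a `(rows, columns)` tuple, and `len()` gives the same numbers. A quick sketch on a hypothetical three-row DataFrame (not the ACS data):

```python
import pandas as pd

# Hypothetical toy DataFrame with 3 rows and 2 columns.
toy = pd.DataFrame({"age": [17, 63, 66], "inctot": [6000, 6150, 14000]})

n_obs, n_vars = toy.shape  # shape is a (rows, columns) tuple
print(n_obs, n_vars, len(toy), len(toy.columns))
```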

```python
ex3_num_vars = acs.shape[1]
results["ex3_num_vars"] = ex3_num_vars

print(f"We have {ex3_num_vars} variables in the dataset")
```

```
We have 104 variables in the dataset
```


### Exercise 4

Let's see what variables are in this dataset. First, try to see them all using the command:


```python
acs.columns
```

As you will see, pandas doesn't print out all the different variable names when there are this many in a dataset. 

To get everything printed out, we can loop over all the columns and print them one at a time with the command:

```python
for c in acs.columns: print(c)
```

It's definitely a bit of a hack, but honestly a pretty useful one!
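
The truncation comes from the pandas display option `display.max_seq_items` (100 by default), and this extract has 104 columns. As an alternative to the loop, here is a sketch that lifts the limit temporarily with `pd.option_context`. The 104 column names `var_0` to `var_103` are made up for illustration; `print(list(acs.columns))` is another option, since a plain Python list is never truncated.

```python
import pandas as pd

# Hypothetical empty DataFrame with 104 columns, the same width as the ACS extract.
wide = pd.DataFrame(columns=[f"var_{i}" for i in range(104)])

# By default, an Index longer than display.max_seq_items is shown with "...".
truncated = repr(wide.columns)

# Temporarily remove the limit so every column name is included.
with pd.option_context("display.max_seq_items", None):
    full = repr(wide.columns)

print("..." in truncated, "..." in full)
```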

```python
acs.columns
```

```
Index(['year', 'datanum', 'serial', 'cbserial', 'numprec', 'subsamp', 'hhwt',
       'hhtype', 'cluster', 'adjust',
       ...
       'migcounty1', 'migmet131', 'vetdisab', 'diffrem', 'diffphys', 'diffmob',
       'diffcare', 'diffsens', 'diffeye', 'diffhear'],
      dtype='object', length=104)
```

```python
for c in acs.columns:
    print(c)
```

```
year
datanum
serial
cbserial
numprec
subsamp
hhwt
hhtype
cluster
adjust
cpi99
region
stateicp
statefip
countyicp
countyfip
metro
city
citypop
strata
gq
farm
ownershp
ownershpd
mortgage
mortgag2
mortamt1
mortamt2
respmode
pernum
cbpernum
perwt
slwt
famunit
sex
age
marst
birthyr
race
raced
hispan
hispand
bpl
bpld
citizen
yrnatur
yrimmig
language
languaged
speakeng
hcovany
hcovpriv
hinsemp
hinspur
hinstri
hcovpub
hinscaid
hinscare
hinsva
hinsihs
school
educ
educd
gradeatt
gradeattd
schltype
degfield
degfieldd
degfield2
degfield2d
empstat
empstatd
labforce
occ
ind
classwkr
classwkrd
looking
availble
inctot
ftotinc
incwage
incbus00
incss
incwelfr
incinvst
incretir
incsupp
incother
incearn
poverty
migrate1
migrate1d
migplac1
migcounty1
migmet131
vetdisab
diffrem
diffphys
diffmob
diffcare
diffsens
diffeye
diffhear
```


### Exercise 5

That's a *lot* of variables, and definitely more than we need. In general, life is easier when working with these kinds of huge datasets if you can narrow down the number of variables a little. Since we will be looking at the relationship between education and wages, we need variables for: 

- Age
- Income
- Education
- Employment status (is the person actually working)

These quantities of interest correspond to the following variables in our data: `age`, `inctot`, `educ`, and `empstat`. 

Subset your data to just those variables. 
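
A minimal sketch of the difference between single and double brackets, using a hypothetical toy DataFrame (the extra `hhwt` column stands in for the variables we drop). Passing a *list* of names returns a DataFrame; a single name returns a Series. Adding `.copy()` is optional, but it makes the subset independent of the original and helps avoid pandas' `SettingWithCopyWarning` if you modify the subset later.

```python
import pandas as pd

# Hypothetical toy rows, not the ACS data.
toy = pd.DataFrame(
    {
        "age": [17, 63],
        "inctot": [6000, 6150],
        "educ": ["grade 11", "4 years of college"],
        "empstat": ["employed", "employed"],
        "hhwt": [25, 57],
    }
)

# Double brackets (a list of column names) return a DataFrame.
subset = toy[["age", "inctot", "educ", "empstat"]].copy()

# Single brackets with one name return a Series.
ages = toy["age"]

print(type(subset).__name__, list(subset.columns), type(ages).__name__)
```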

```python
acs_subset = acs[["age", "inctot", "educ", "empstat"]]
acs_subset.head()
```

|  | age | inctot | educ | empstat |
|---|---|---|---|---|
| 0 | 4 | 9999999 | nursery school to grade 4 |  |
| 1 | 17 | 6000 | grade 11 | employed |
| 2 | 63 | 6150 | 4 years of college | employed |
| 3 | 66 | 14000 | grade 12 | not in labor force |
| 4 | 1 | 9999999 | n/a or no schooling |  |


### Exercise 6 

Now that we have a more manageable number of variables, it's often very useful to look at a handful of rows of your data. The easiest way to do this is probably the `.head()` method (which will show you the first five rows), or the `.tail()` method, which will show you the last five rows. 

But to get a good sense of your data, it's often better to use the `.sample()` method, which returns a random set of rows. As the first and last rows are sometimes not representative, a random set of rows can be very helpful. Try looking at a random sample of 20 rows (note: you don't have to run `.sample()` ten times to get ten rows; look at the `.sample` help file if you're stuck). 
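
`.sample()` draws different rows each time it runs. If you want the same "random" rows every time (handy when you are writing up what you saw), you can pass `random_state`. A sketch on a hypothetical 100-row DataFrame:

```python
import pandas as pd

# Hypothetical toy data with 100 rows.
toy = pd.DataFrame({"age": range(100), "inctot": range(0, 100_000, 1_000)})

# n sets how many rows to draw; a fixed random_state makes the draw reproducible.
draw_a = toy.sample(n=20, random_state=42)
draw_b = toy.sample(n=20, random_state=42)

print(len(draw_a), draw_a.equals(draw_b))
```

`frac=0.1` (a share of rows) can be used instead of `n` if you prefer.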

```python
acs_subset.sample(20)
```

|  | age | inctot | educ | empstat |
|---|---|---|---|---|
| 93018 | 36 | 92000 | 4 years of college | employed |
| 297455 | 10 | 9999999 | nursery school to grade 4 |  |
| 179198 | 40 | 30000 | grade 12 | employed |
| 137578 | 29 | 0 | grade 12 | unemployed |
| 116741 | 94 | 36900 | 1 year of college | not in labor force |
| 116132 | 21 | 17200 | 1 year of college | employed |
| 148598 | 3 | 9999999 | n/a or no schooling |  |
| 37460 | 60 | 107000 | 4 years of college | employed |
| 299209 | 14 | 9999999 | grade 5, 6, 7, or 8 |  |
| 273610 | 65 | 78400 | 5+ years of college | employed |


### Exercise 7

Do you see any immediate problems? What issues do you see? (Please do answer in markdown)

><span style="color: #008080">*For more information, we will also look at the output of the **info** method to see whether there are null values and what the data type of each column is.*</span>

```python
acs_subset.info()
```

```
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 319004 entries, 0 to 319003
Data columns (total 4 columns):
 #   Column   Non-Null Count   Dtype   
---  ------   --------------   -----   
 0   age      319004 non-null  category
 1   inctot   319004 non-null  int32   
 2   educ     319004 non-null  category
 3   empstat  319004 non-null  category
dtypes: category(3), int32(1)
memory usage: 2.1 MB
```


><span style="color: #008080">*While observing a sample of the data, we can notice some issues that we might encounter when analyzing the information:*</span>
>
><span style="color: #008080">*- Regarding the variable **'inctot'**, in many cases we have the value '9999999', which is stored as an integer. This value likely indicates that the data is missing for the respective record. The problem is that if we calculate statistics such as the mean or median, we will obtain a misleading value that is inflated by these records.*</span>
>
><span style="color: #008080">*- The **'educ'** variable, which represents educational levels, is stored as a categorical variable with text labels rather than as a number of years of schooling, and its labels follow different formats (e.g. 'grade 12', '1 year of college', 'n/a or no schooling').*</span>
>
><span style="color: #008080">*- The **'empstat'** variable has values marked as 'n/a', which are treated as an additional category. Depending on the analysis we want to perform or the assumptions we make, we may wish to either remove these observations or transform the variable.*</span>


### Exercise 8 

One problem is that many people seem to have incomes of $9,999,999. Moreover, people with those incomes seem to be very young children. 

What you are seeing is one method (a relatively old one) for representing missing data. In this case, the value 9999999 is being used as a **sentinel value**: a way to denote missing data that was used back in the day when there was no way to add a special data type for missing data. Here, it identifies observations where the person is too young to work, so their income value is missing. 
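
To see why a sentinel value is dangerous, here is a sketch with three hypothetical incomes, one of which is the 9999999 code. Treating the code as a real income inflates the mean enormously; dropping those rows, or converting the code to `NaN` (which `mean()` skips by default), gives a sensible answer.

```python
import numpy as np
import pandas as pd

SENTINEL = 9_999_999  # the missing-income code used in this data

# Hypothetical toy incomes: two real values and one sentinel.
inctot = pd.Series([30_000, 50_000, SENTINEL])

naive_mean = inctot.mean()  # sentinel wrongly treated as a real income
dropped_mean = inctot[inctot != SENTINEL].mean()  # drop sentinel rows
nan_mean = inctot.replace(SENTINEL, np.nan).mean()  # mark as missing; NaN is skipped

print(round(naive_mean), dropped_mean, nan_mean)
```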

So let's begin by dropping anyone who has `inctot` equal to 9999999.

After dropping, how many observations do you have? Save your answer in your `results` dictionary under the key `"ex8_updated_num_obs"`

```python
# We created a new dataset that does not include values for anyone with inctot equal to 9999999.
acs_subset_no_missing_income = acs_subset[acs_subset["inctot"] != 9999999]
ex8_updated_num_obs = acs_subset_no_missing_income.shape[0]
results["ex8_updated_num_obs"] = ex8_updated_num_obs

print(
    f"After removing observations with 'inctot' equal to 9999999 from the dataset, we obtain {ex8_updated_num_obs:,} observations"
)
```

```
After removing observations with 'inctot' equal to 9999999 from the dataset, we obtain 265,103 observations
```


### Exercise 9

OK, the other potential problem is that our data includes lots of people who are unemployed and people who are not in the labor force (this means they not only don't have a job, but also aren't looking for one). For this analysis, we want to focus on the wages of people who are currently employed. So subset the dataset to the people for whom `empstat` is equal to "employed". 

Note that our decision to only look at people who are employed impacts how we should interpret the relationship we estimate between education and income. Because we are only looking at employed people, we will be estimating the relationship between education and income *for people who are employed*. That means that if education affects the *likelihood* someone is employed, we won't capture that in this analysis.

(You might also want to run `.sample()` after this just to make sure you were successful in your subsetting).

After this subsetting, how many observations do you have? Save your answer in your `results` dictionary under the key `"ex9_updated_num_obs"`
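
Both filters (a valid income and being employed) can also be written as a single boolean mask combined with `&`, or as a `.query()` string. A sketch on hypothetical toy rows; each comparison needs its own parentheses in the mask version because Python's `&` binds more tightly than `==` and `!=`.

```python
import pandas as pd

# Hypothetical toy rows, not the ACS data.
toy = pd.DataFrame(
    {
        "inctot": [9_999_999, 6_000, 0, 14_000, 92_000],
        "empstat": ["n/a", "employed", "unemployed", "not in labor force", "employed"],
    }
)

# Boolean mask: keep rows with a real income AND employed status.
employed_with_income = toy[(toy["inctot"] != 9_999_999) & (toy["empstat"] == "employed")]

# The same filter expressed as a query string.
via_query = toy.query('inctot != 9999999 and empstat == "employed"')

print(employed_with_income)
```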

```python
acs_subset_employed = acs_subset_no_missing_income[
    acs_subset_no_missing_income["empstat"] == "employed"
]
print(acs_subset_employed.sample(10))
```

```
       age  inctot                 educ   empstat
170763  64   10000  5+ years of college  employed
222466  56   20000   4 years of college  employed
281611  58   24000  5+ years of college  employed
272511  41  150000  5+ years of college  employed
76431   58   28500    1 year of college  employed
297742  43   12000  n/a or no schooling  employed
186597  60   60000   4 years of college  employed
64329   25   30000             grade 12  employed
243668  52   64000   2 years of college  employed
5888    28   48000             grade 12  employed
```


```python
ex9_updated_num_obs = acs_subset_employed.shape[0]
results["ex9_updated_num_obs"] = ex9_updated_num_obs

print(
    f"After selecting only employed individuals from our dataset, we obtain {ex9_updated_num_obs:,} observations."
)
```

```
After selecting only employed individuals from our dataset, we obtain 148,758 observations.
```


### Exercise 10

Now let's turn to education. The `educ` variable seems to have a lot of discrete values. Let's see what values exist, and their distribution, using the `value_counts()` method. This is an *extremely* useful tool you'll use a lot! Try the following code (modified for the name of your dataset, of course):

```python
acs["educ"].value_counts()
```
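
Because `educ` is a categorical column, `value_counts()` also reports levels with zero observations, and `.sort_index()` puts the levels in their defined (educational) order rather than by frequency. Passing `normalize=True` returns shares instead of counts. A sketch with hypothetical toy values:

```python
import pandas as pd

# Hypothetical ordered levels and toy observations (no one at "grade 11").
levels = ["grade 11", "grade 12", "4 years of college"]
educ = pd.Series(
    pd.Categorical(
        ["grade 12", "grade 12", "4 years of college", "grade 12"],
        categories=levels,
        ordered=True,
    )
)

counts = educ.value_counts().sort_index()  # unobserved levels appear with 0
shares = educ.value_counts(normalize=True).sort_index()  # proportions instead of counts

print(counts)
print(shares)
```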

```python
print(
    "If we observe the number of records per educational level in our dataset (after removing individuals without income and employment), we obtain:"
)
acs_subset_employed["educ"].value_counts().sort_index()
```

```
If we observe the number of records per educational level in our dataset (after removing individuals without income and employment), we obtain:
```

```
educ
n/a or no schooling           1291
nursery school to grade 4      468
grade 5, 6, 7, or 8           2092
grade 9                       1290
grade 10                      1910
grade 11                      2747
grade 12                     47815
1 year of college            22899
2 years of college           14077
4 years of college           33174
5+ years of college          20995
Name: count, dtype: int64
```

### Exercise 11

There are a lot of values in here, so let's just check a couple. What is the average value of `inctot` for people whose highest grade level is "grade 12" (in the US, that is someone who has graduated high school)?

Save your answer in your `results` dictionary under the key `"ex11_grade12_income"`.
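
One way to compute a conditional mean is to select the rows and the column in a single `.loc` call, which avoids the chained `df[...][...]` pattern. A sketch with hypothetical incomes:

```python
import pandas as pd

# Hypothetical toy rows, not the ACS data.
toy = pd.DataFrame(
    {
        "educ": ["grade 12", "grade 12", "4 years of college", "grade 11"],
        "inctot": [30_000, 40_000, 80_000, 20_000],
    }
)

# Rows where educ is "grade 12", column "inctot", then the mean.
mask = toy["educ"] == "grade 12"
grade12_mean = toy.loc[mask, "inctot"].mean()

print(grade12_mean)
```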

```python
ex11_grade12_income = acs_subset_employed[acs_subset_employed["educ"] == "grade 12"][
    "inctot"
].mean()
results["ex11_grade12_income"] = ex11_grade12_income

print(
    f"The average income for people whose highest grade level is 'grade 12' in this dataset, considering employed individuals with income, is: {round(ex11_grade12_income):,}"
)
print(
    "Note that 'educ' is an ordered categorical with defined levels, so its levels can also be compared with operators such as <, <=, and >"
)
```

```
The average income for people whose highest grade level is 'grade 12' in this dataset, considering employed individuals with income, is: 38,958
Note that 'educ' is an ordered categorical with defined levels, so its levels can also be compared with operators such as <, <=, and >
```


### Exercise 12

What is the average income of someone who has completed an undergraduate degree but not done any postgraduate education ("4 years of college")? 

Save your answer in your `results` dictionary under the key `"ex12_college_income"`.

In percentage terms, how much does an employed college graduate earn as compared to someone who is only a high school graduate? Use the reference category that gives an answer above 100.

Store your answer in `"ex12_college_income_pct"`. Put your answer in percentage terms (so 100 implies they earn the same amount).

*Make sure to interpret your result in words when you print it out!*
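
It is easy to mix up "percent of" and "percent more than" here. Using the two averages stored in this notebook's `results` dictionary, a value of about 194 means college graduates earn 194% *of* what high school graduates earn, which is about 94% *more*, or roughly 1.94 times *as much* (not "1.94 times more"). The arithmetic:

```python
# Averages copied from this notebook's results dictionary.
college_mean = 75485.05293301983  # results["ex12_college_income"]
high_school_mean = 38957.76068179442  # results["ex11_grade12_income"]

pct_of = 100 * college_mean / high_school_mean  # "percent of" (100 = same income)
pct_more = pct_of - 100  # "percent more than"
times_as_much = pct_of / 100  # ratio of the two averages

print(f"{pct_of:.1f}% of, {pct_more:.1f}% more than, {times_as_much:.2f} times as much")
```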

```python
ex12_college_income = acs_subset_employed[
    acs_subset_employed["educ"] == "4 years of college"
]["inctot"].mean()

ex12_college_income_pct = 100 * ex12_college_income / ex11_grade12_income

results["ex12_college_income"] = ex12_college_income
results["ex12_college_income_pct"] = ex12_college_income_pct

print(
    f"The average income of people who have completed an undergraduate degree but no postgraduate education in this dataset,\nconsidering employed individuals with income, is: {round(ex12_college_income):,}"
)
print(
    f"\nAn employed college graduate earns {ex12_college_income_pct:.0f}% of what an employed high school graduate earns,\ni.e., about {ex12_college_income_pct / 100:.2f} times as much, in this dataset."
)
print(
    "\nWe can see that in this sample, there is a clear association between an individual's level of education and their average income."
)
print(
    "Individuals with a completed undergraduate degree have an average income nearly twice that of those whose highest level is grade 12 (high school graduates)."
)
```

```
The average income of people who have completed an undergraduate degree but no postgraduate education in this dataset,
considering employed individuals with income, is: 75,485

An employed college graduate earns 194% of what an employed high school graduate earns,
i.e., about 1.94 times as much, in this dataset.

We can see that in this sample, there is a clear association between an individual's level of education and their average income.
Individuals with a completed undergraduate degree have an average income nearly twice that of those whose highest level is grade 12 (high school graduates).
```



### Exercise 13
What does that suggest is the value of getting a college degree after graduating high school?

><span style="color: #008080">*As mentioned in the previous question, there seems to be a clear association between a person's level of education and the income they earn. Individuals with a completed undergraduate degree have an average income nearly twice that of those whose highest level is grade 12 (high school graduates).*</span>
>
><span style="color: #008080">*However, it's important to keep in mind that correlation does not necessarily imply causation. For example, individuals with access to university education often come from families with a higher socioeconomic status, and may therefore have more opportunities when it comes to finding employment.*</span>

### Exercise 14

What is the average income for someone who has not finished high school? What does that suggest is the value of a high school diploma? (Treat `n/a or no schooling` as having no formal schooling, not as missing).

**Hint:** You may find the [.isin()](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.isin.html) method to be really helpful here.

Save your answer in your `results` dictionary under the key `"ex14_high_school_dropout"`.
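
Two equivalent ways to flag "did not finish high school" are an explicit `.isin()` list and, because `educ` is an *ordered* categorical, a comparison such as `< "grade 12"`. A sketch on hypothetical toy values, using the level names that appear in this dataset's `educ` column; the explicit list is more verbose but does not depend on the category order being correct.

```python
import pandas as pd

# Level names as they appear in the ACS educ column, in their defined order.
levels = [
    "n/a or no schooling", "nursery school to grade 4", "grade 5, 6, 7, or 8",
    "grade 9", "grade 10", "grade 11", "grade 12", "1 year of college",
    "2 years of college", "4 years of college", "5+ years of college",
]
# Hypothetical toy observations.
educ = pd.Series(
    pd.Categorical(
        ["grade 9", "grade 12", "n/a or no schooling", "4 years of college", "grade 5, 6, 7, or 8"],
        categories=levels,
        ordered=True,
    )
)

# Explicit list of every level below a high school diploma.
below_hs_isin = educ.isin(
    ["n/a or no schooling", "nursery school to grade 4", "grade 5, 6, 7, or 8",
     "grade 9", "grade 10", "grade 11"]
)
# Ordered comparison: relies on the category order.
below_hs_order = educ < "grade 12"

print(below_hs_isin.tolist(), below_hs_order.tolist())
```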

```python
ex14_high_school_dropout = acs_subset_employed[
    acs_subset_employed["educ"] < "grade 12"
]["inctot"].mean()
results["ex14_high_school_dropout"] = ex14_high_school_dropout
print(
    "As mentioned before, 'educ' is an ordered categorical whose lowest level is 'n/a or no schooling'.\nSo, when using the '<' sign, those cases are already included."
)
print(
    f"\nThe average income for someone who has not finished high school in this dataset,\nconsidering employed individuals with income, is: {round(ex14_high_school_dropout):,}"
)

print("Not completing high school is also associated with a lower average income.")
```

```
As mentioned before, 'educ' is an ordered categorical whose lowest level is 'n/a or no schooling'.
So, when using the '<' sign, those cases are already included.

The average income for someone who has not finished high school in this dataset,
considering employed individuals with income, is: 26,226
Not completing high school is also associated with a lower average income.
```


### Exercise 15 

Complete the following table (storing values under the provided keys where listed):

- Average income for someone who only completed 9th grade (`ex15_grade_9`): _________
- Average income for someone who only completed 10th grade (`ex15_grade_10`): _________
- Average income for someone who only completed 11th grade (`ex15_grade_11`): _________
- Average income for someone who finished high school (12th grade) but never started college (`ex15_grade_12`): _________
- Average income for someone who completed 4 years of college (in the US, this corresponds to getting an undergraduate degree), but has no post-graduate education (no more than 4 years, `ex15_4_years_of_college`): _________
- Average income for someone who has some graduate education (more than 4 years, `ex15_graduate`): _________
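
Rather than filtering once per level, `groupby` can compute the average income for every education level in one pass. A sketch on hypothetical toy data; `observed=True` drops levels that have no rows (here `grade 11`), and the result comes out in the categorical's defined order.

```python
import pandas as pd

levels = ["grade 9", "grade 10", "grade 11", "grade 12", "4 years of college", "5+ years of college"]

# Hypothetical toy rows, not the ACS data.
toy = pd.DataFrame(
    {
        "educ": pd.Categorical(
            ["grade 9", "grade 12", "grade 12", "4 years of college", "5+ years of college", "grade 10"],
            categories=levels,
            ordered=True,
        ),
        "inctot": [20_000, 30_000, 50_000, 80_000, 120_000, 25_000],
    }
)

# Mean income for each observed education level.
means = toy.groupby("educ", observed=True)["inctot"].mean()
print(means)
```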

```python
ex15_grade_9 = acs_subset_employed[acs_subset_employed["educ"] == "grade 9"][
    "inctot"
].mean()
ex15_grade_10 = acs_subset_employed[acs_subset_employed["educ"] == "grade 10"][
    "inctot"
].mean()
ex15_grade_11 = acs_subset_employed[acs_subset_employed["educ"] == "grade 11"][
    "inctot"
].mean()
ex15_grade_12 = acs_subset_employed[acs_subset_employed["educ"] == "grade 12"][
    "inctot"
].mean()
ex15_4_years_of_college = acs_subset_employed[
    acs_subset_employed["educ"] == "4 years of college"
]["inctot"].mean()
ex15_graduate = acs_subset_employed[acs_subset_employed["educ"] > "4 years of college"][
    "inctot"
].mean()


print(
    f"- Average income for someone who only completed 9th grade (`ex15_grade_9`): {round(ex15_grade_9):,}"
)
print(
    f"- Average income for someone who only completed 10th grade (`ex15_grade_10`): {round(ex15_grade_10):,}"
)
print(
    f"- Average income for someone who only completed 11th grade (`ex15_grade_11`): {round(ex15_grade_11):,}"
)
print(
    f"- Average income for someone who finished high school (12th grade) but never started college (`ex15_grade_12`): {round(ex15_grade_12):,}"
)
print(
    f"- Average income for someone who completed 4 years of college (in the US, this corresponds to getting an undergraduate degree),\nbut has no post-graduate education (no more than 4 years, `ex15_4_years_of_college`): {round(ex15_4_years_of_college):,}"
)
print(
    f"- Average income for someone who has some graduate education (more than 4 years, `ex15_graduate`): {round(ex15_graduate):,}"
)

results["ex15_grade_9"] = ex15_grade_9
results["ex15_grade_10"] = ex15_grade_10
results["ex15_grade_11"] = ex15_grade_11
results["ex15_grade_12"] = ex15_grade_12
results["ex15_4_years_of_college"] = ex15_4_years_of_college
results["ex15_graduate"] = ex15_graduate
```

```
- Average income for someone who only completed 9th grade (`ex15_grade_9`): 27,172
- Average income for someone who only completed 10th grade (`ex15_grade_10`): 23,019
- Average income for someone who only completed 11th grade (`ex15_grade_11`): 21,542
- Average income for someone who finished high school (12th grade) but never started college (`ex15_grade_12`): 38,958
- Average income for someone who completed 4 years of college (in the US, this corresponds to getting an undergraduate degree),
but has no post-graduate education (no more than 4 years, `ex15_4_years_of_college`): 75,485
- Average income for someone who has some graduate education (more than 4 years, `ex15_graduate`): 110,013
```


### Exercise 16 

Why do you think there is no benefit from moving from grade 9 to grade 10, or grade 10 to grade 11, but there is a huge benefit to moving from grade 11 to graduating high school (grade 12)?

(Think carefully before reading ahead!)

><span style="color: #008080">*As the question notes, one more year of schooling per se does not seem to be associated with higher income, but completing high school or completing four years of college is associated with a large jump.*</span>
>
><span style="color: #008080">*From my perspective, completing high school is related to the type of job you can access. There are jobs that require, at a minimum, a high school diploma. Therefore, having this milestone completed allows you to access more job opportunities.*</span>
>
><span style="color: #008080">*For example, in my country, even to work in a cleaning position in an organization such as a public school, it is required to have completed 12 years of schooling. This implies that many people without a completed high school education end up accessing informal jobs that are not regulated and have lower wages. At least, that is the reality I am familiar with in my country, and I would expect that something similar might happen in the United States.*</span>

## Take-aways

Congratulations! You just discovered "the sheepskin effect": people with degrees tend to earn substantially more than people who have *almost* as much education but don't have an actual degree. 

In economics, this is viewed as evidence that the reason employers pay people with high school degrees more than those without one is *not* that they think those who graduated high school have learned specific, useful skills. If that were the case, we would expect employee earnings to rise with every year of high school, since students learn more in each year of high school. 
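
One quick way to see this pattern is to look at the change in average income from one grade to the next. The sketch below uses the rounded averages for employed people reported in Exercise 15 and `Series.diff()`: the steps from grade 9 to 10 and from 10 to 11 are negative, while the step to grade 12 is large and positive.

```python
import pandas as pd

# Rounded averages reported in Exercise 15 of this notebook (employed people only).
avg_income = pd.Series(
    [27_172, 23_019, 21_542, 38_958],
    index=["grade 9", "grade 10", "grade 11", "grade 12"],
)

# Change from the previous grade; the first entry is NaN (no previous grade).
step_change = avg_income.diff()
print(step_change)
```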

Instead, this suggests employers pay high school graduates more because they think *the kind of people* who can finish high school are the *kind of people* who are likely to succeed at their jobs. Finishing high school, in other words, isn't about accumulating specific knowledge; it's about showing that you *are the kind of person* who can rise to the challenge of finishing high school, which suggests you are also the kind of person who can succeed as an employee. 

(Obviously, this does not tell us whether that is an *accurate* inference, just that this seems to be how employers think.) 

In other words, in the eyes of employers, a high school degree is a *signal* about the kind of person you are, not certification that you've learned a specific set of skills (an idea that earned [Michael Spence](https://en.wikipedia.org/wiki/Michael_Spence) a Nobel Prize in Economics). 

## Check results

```python
results
```

```
{'ex2_num_obs': 319004,
 'ex3_num_vars': 104,
 'ex8_updated_num_obs': 265103,
 'ex9_updated_num_obs': 148758,
 'ex11_grade12_income': 38957.76068179442,
 'ex12_college_income': 75485.05293301983,
 'ex12_college_income_pct': 193.7612727527617,
 'ex14_high_school_dropout': 26226.45692998571,
 'ex15_grade_9': 27171.907751937986,
 'ex15_grade_10': 23018.795811518325,
 'ex15_grade_11': 21541.68693119767,
 'ex15_grade_12': 38957.76068179442,
 'ex15_4_years_of_college': 75485.05293301983,
 'ex15_graduate': 110013.2213384139}
```

```python
assert set(results.keys()) == {
    "ex2_num_obs",
    "ex3_num_vars",
    "ex8_updated_num_obs",
    "ex9_updated_num_obs",
    "ex11_grade12_income",
    "ex12_college_income",
    "ex12_college_income_pct",
    "ex14_high_school_dropout",
    "ex15_grade_9",
    "ex15_grade_10",
    "ex15_grade_11",
    "ex15_grade_12",
    "ex15_4_years_of_college",
    "ex15_graduate",
}
```