In [None]:
# Initialize Otter
import otter
grader = otter.Notebook("lab07.ipynb")

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats
import warnings
warnings.filterwarnings("ignore")

# Lab 07: Justice40

Original lecture notebook developed by Dan Hammer. \
Lab 07 for ECON 148 (Spring 2023) by Peter F.G Hollevik.

Some parts of this lab repeat material from lecture, but they serve as important introductions for your own research on potential flaws of the Justice40 Initiative.

---
## Part 0: Lead Paint Exposure

The Biden Administration [released a memo](https://www.whitehouse.gov/briefing-room/statements-releases/2021/01/27/fact-sheet-president-biden-takes-executive-actions-to-tackle-the-climate-crisis-at-home-and-abroad-create-jobs-and-restore-scientific-integrity-across-federal-government/) within the first week of their term, which addressed environmental justice. The two most important items for this class are:

> The order creates a government-wide Justice40 Initiative with the goal of delivering 40 percent of the overall benefits of relevant federal investments to disadvantaged communities and tracks performance toward that goal through the establishment of an Environmental Justice Scorecard.

> The order initiates the development of a Climate and Environmental Justice Screening Tool, building off EPA’s EJSCREEN, to identify disadvantaged communities, support the Justice40 Initiative, and inform equitable decision making across the federal government
    
It is very, very difficult - maybe even impossible - to **quantify environmental justice**, or even what constitutes a *disadvantaged community*. The EPA has tried. There are **fundamental issues** with their data and their math. We can begin to show why and how.

First, read in the data from a local directory. We have cleaned the data for you.

In [None]:
df = pd.read_csv("EJSCREEN_demo.csv")
df.head()

This snippet of data contains:

> `P_LDPNT`: Percentile for % pre-1960 housing (lead paint indicator)

> `P_LDPNT_D2`: Percentile for EJ Index for % pre-1960 housing (lead paint indicator)

> `VULEOPCT`: Demographic Index (based on 2 factors, % low-income and % people of color)

The index that the EPA uses to **prioritize communities** for Federal funding is based on a simple, algebraic expression:

$$\mbox{justice} = \mbox{environmental quality } \times \mbox{ demographic index } \times \mbox{ population}$$

where 

$$\mbox{demographic index} = \frac{[\% \mbox{minority}] + [\% \mbox{living below 2x federal poverty line}]}{2} - \mbox{ [national average]}$$

Just looking at this expression, you can imagine that there are some strange edge cases - communities that are **on the border of the national average of demographics** that have very different prioritization outcomes.  The data dictionary can be [downloaded directly from the EPA's FTP servers](https://gaftp.epa.gov/EJSCREEN/2020/2020_EJSCREEEN_columns-explained.xlsx).
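
To make the edge case concrete, here is a minimal sketch with made-up numbers. The national average of 0.35, the populations, and the environmental readings are all hypothetical, and `toy_ej_index` is an illustrative helper, not an EPA function. Two communities with the same extreme environmental reading end up with scores of opposite sign, simply because they sit one percentage point on either side of the average.

```python
def toy_ej_index(env_quality, demo_share, national_avg, population):
    """Toy version of the expression above: env x (demo - national avg) x pop."""
    demo_index = demo_share - national_avg
    return env_quality * demo_index * population

NATIONAL_AVG = 0.35  # hypothetical national average of the demographic measure

# Same extreme environmental reading and population; demographics differ by 2 points.
just_above = toy_ej_index(env_quality=95, demo_share=0.36,
                          national_avg=NATIONAL_AVG, population=1500)
just_below = toy_ej_index(env_quality=95, demo_share=0.34,
                          national_avg=NATIONAL_AVG, population=1500)

print(f"just above average: {just_above:.1f}")
print(f"just below average: {just_below:.1f}")
```

The magnitudes match but the signs differ, which is the kind of jump that the plots later in this part make visible in the real data.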

First, create a demographic index in line with the EPA's equation, as well as a variable that indicates whether the Census block has particularly high values of lead paint exposure (above the 90th percentile).

We define our `demo_index` as follows:

> `VULEOPCT` $-$ mean(`VULEOPCT`)

In [None]:
df["demo_index"] = df.VULEOPCT - df.VULEOPCT.mean()
df["index_thresh"] = df.P_LDPNT > 90
df.head()

### Raw Lead Paint Exposure vs. EPA Lead Paint Exposure Index

We use `matplotlib` to plot the percentile of raw lead paint values for Census blocks against the EPA index based on lead paint exposure. Color the values for Census blocks with the highest levels of exposure to environmental harm.  A lot of the spread is driven by the demographic index (which isn't presented in this graph, but drives the value of the EPA index.)

> Census blocks with the highest levels of lead paint are highlighted in yellow.


In [None]:
plt.figure(figsize=(16, 8))
plt.scatter(
    df.P_LDPNT, 
    df.P_LDPNT_D2, 
    s=0.01, 
    c=df.index_thresh
);
plt.xlabel('Percentile for % pre-1960 housing (lead paint indicator)')
plt.ylabel('Percentile for EJ Index for % pre-1960 housing (lead paint indicator)')
plt.title('Raw Lead Paint Exposure vs. EPA Index Lead Paint Exposure');

### EPA Lead Paint Exposure Index vs. Demographic Index

We plot the lead paint index against the demographic index. The majority of variation in the Lead Paint EJ Index (y-axis) is driven by the **Demographic Index (x-axis) used by EJSCREEN**, with a sharp discontinuity that is driven by whether a census block falls above or below the national average. 

The horizontal bar near the 60th percentile in this case is for **census blocks with zero lead paint exposure.** Census blocks with the highest levels of lead paint are highlighted in yellow.

In [None]:
plt.figure(figsize=(16, 8))
plt.scatter(
    df.demo_index, 
    df.P_LDPNT_D2, 
    s=0.005, 
    c=df.index_thresh
)
plt.xlabel('EPA Demographic Index')
plt.ylabel('Percentile for EJ Index for % pre-1960 housing (lead paint indicator)')
plt.title('EPA Index Lead Paint Exposure vs. EPA Demographic Index');

The punchline is that, even if a Census block has an **extreme amount of lead paint exposure** (an environmental catastrophe), it won't be considered as part of the Justice40 funding if it falls (even slightly) on the privileged side of the National Average.  There is a sharp discontinuity.

> Remember, a higher demographic index indicates a more marginalized community. Yellow indicates above the 90th percentile in raw lead paint exposure. Several of these communities will not receive EJ support due to a low demographic index.

There are no sharp discontinuities in the real world, especially in justice-related work. We don't know what the right answer is, but it can't be this.

---
## Part 1: Urban, Suburban, vs. Rural

The EJSCREEN indices are used to prioritize federal funding in order to mitigate environmental injustice. 

Here, we examine the rural and urban representation in the prioritization data.

> We ask: Does this reflect the composition of the United States?  

First, read in a dataset that associates each U.S. county with a CDC assessment of urban/suburban/rural. The counties are uniquely identified by a Federal Information Processing Standards (FIPS) code.

In [None]:
# Source: https://www.cdc.gov/nchs/data_access/urban_rural.htm#Data_Files_and_Documentation
nchs = pd.read_excel("NCHSURCodes2013.xlsx")
nchs = nchs[["FIPS code", "State Abr.", "County name", "2013 code"]]
nchs.columns = ["fips", "state", "county", "classification"]
nchs.head()

In [None]:
# Associate each of the classifications to one of three categories, rather 
# than one of six - just to simplify our quick analysis
remap_dict = {
    1: "urban",
    2: "suburban",
    3: "suburban",
    4: "rural",
    5: "rural",
    6: "rural"
}

# replace the values of the `classification` column based on the key-value 
# pair in `remap_dict`
nchs = nchs.replace({'classification': remap_dict})
nchs.head()

**Question 1.1:** Now, read in the EJ indices and *merge* the dataset with the urban/suburban/rural dataset, so that each of the Census Blocks in the EJSCREEN data now has the CDC categorization. What merge type do you choose, and what are you merging on?
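
As a reminder of how merge types differ, here is a small sketch on two hypothetical toy tables (the column names `key`, `value`, and `label` are made up and do not come from the lab data). It uses the `how` and `indicator` parameters of `DataFrame.merge`; which merge type suits the EJSCREEN data is still yours to decide.

```python
import pandas as pd

# Hypothetical toy tables; the column names and values are made up for illustration.
left = pd.DataFrame({"key": [1, 2, 3], "value": [10, 20, 30]})
right = pd.DataFrame({"key": [2, 3, 4], "label": ["a", "b", "c"]})

# Inner merge keeps only keys present in both tables.
inner = left.merge(right, on="key", how="inner")

# Left merge keeps every row of `left`; unmatched rows get NaN in `label`.
# indicator=True adds a `_merge` column recording where each row came from.
left_joined = left.merge(right, on="key", how="left", indicator=True)

print(inner)
print(left_joined)
```

Checking the row count before and after a merge, or inspecting the `_merge` column, is a quick way to spot rows that failed to match.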

In [None]:
ejdf = ... # read in data.
ejdf = ... # merge. 
ejdf.head()

In [None]:
grader.check("q1_1")

Next, we create a dictionary to translate opaque variable names to something that is human-readable.

In [None]:
ejvars_dict = {
    'P_LDPNT_D2': 'Lead Paint',
    'P_DSLPM_D2': 'Diesel Particulate Matter',
    'P_CANCR_D2': 'Air Toxics Cancer Risk',
    'P_RESP_D2':  'Respiratory Hazard',
    'P_PTRAF_D2': 'Traffic Proximity',
    'P_PWDIS_D2': 'Water Discharge',
    'P_PNPL_D2':  'National Priority List',
    'P_PRMP_D2':  'Risk Management Plan',
    'P_PTSDF_D2': 'Treatment Storage and Disposal',
    'P_OZONE_D2': 'Ozone Proximity',
    'P_PM25_D2':  'PM25'
}

Each of the variables in the dataset indicates **the percentile** where that Census Block falls in prioritization. 

Those census blocks with high percentiles are prioritized for federal funding.  

Here, we examine the urban/suburban/rural breakdown of the population living within these high-priority Census Blocks.
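
The percentile columns arrive precomputed in the EJSCREEN data, but it can help to see what a percentile rank means on a small scale. The sketch below uses made-up scores and a plain, unweighted rank; EPA's own calculation may differ (for example, by weighting by population), so treat this only as an illustration of the idea.

```python
import pandas as pd

# Hypothetical index scores for five made-up blocks.
scores = pd.Series([3.0, 8.0, 1.0, 9.5, 6.0], index=["A", "B", "C", "D", "E"])

# rank(pct=True) returns each value's rank divided by the number of values;
# multiplying by 100 expresses it on a 0-100 scale.
percentile = scores.rank(pct=True) * 100

# Blocks above the 90th percentile would be the "prioritized" ones.
prioritized = percentile[percentile > 90]
print(percentile)
print(prioritized)
```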

In [None]:
v = 'P_LDPNT_D2'
temp_df = ejdf[ejdf[v] > 90]
temp_df = temp_df.groupby('classification')['ACSTOTPOP'].sum() # ACSTOTPOP: Total population

# add back in the classification variable for future reference
temp_dict = dict(temp_df)
temp_dict['classification'] = v
temp_dict

Instead of just displaying the results for one variable, we collect the results for all in a list.
Please make sure you understand what each line of code below actually does.

In [None]:
res = []

# Note that iterating over ejvars_dict.keys() yields the EJ variable names
for v in ejvars_dict.keys():
    temp_df = ejdf[ejdf[v] > 90] # filter for prioritized communities.
    temp_df = temp_df.groupby('classification')['ACSTOTPOP'].sum()
    temp_dict = dict(temp_df)
    temp_dict['classification'] = v
    res.append(temp_dict)

# Create a dataframe of the results
graphing_df = pd.DataFrame(res)
graphing_df
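
To see what each step of the loop does, here is the same filter, group, and sum pattern on a tiny hypothetical frame. The column `P_TOY_D2` and all values are made up; only the `classification` and `ACSTOTPOP` names mirror the lab data.

```python
import pandas as pd

# Hypothetical stand-in for `ejdf`: one percentile column, a classification, a population.
toy = pd.DataFrame({
    "P_TOY_D2": [95, 50, 92, 99, 91],
    "classification": ["urban", "urban", "rural", "suburban", "urban"],
    "ACSTOTPOP": [1000, 2000, 500, 800, 1200],
})

# Step 1: keep only rows above the 90th percentile.
high = toy[toy["P_TOY_D2"] > 90]

# Step 2: total population per classification among those rows.
pop_by_class = high.groupby("classification")["ACSTOTPOP"].sum()

# Step 3: turn the Series into a dict and tag it with the variable name,
# so that each loop iteration yields one row of the final DataFrame.
row = dict(pop_by_class)
row["classification"] = "P_TOY_D2"
print(row)
```

Note that the `classification` key in the resulting dict holds the name of the EJ variable, not urban/suburban/rural; those three categories become the other keys.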

Here's a snippet from the [codebook](https://gaftp.epa.gov/EJSCREEN/2020/2020_EJSCREEEN_columns-explained.xlsx) on the meaning of each variable above.

| GDB Fieldname | Description |
| ----------- | ----------- |
| P_LDPNT_D2 | Percentile for EJ Index for % pre-1960 housing (lead paint indicator) |
| P_DSLPM_D2 | Percentile for EJ Index for Diesel particulate matter level in air |
| P_CANCR_D2 | Percentile for EJ Index for Air toxics cancer risk |
| P_RESP_D2 | Percentile for EJ Index for Air toxics respiratory hazard index |
| P_PTRAF_D2 | Percentile for EJ Index for Traffic proximity and volume |
| P_PWDIS_D2 | Percentile for EJ Index for Indicator for major direct dischargers to water |
| P_PNPL_D2 | Percentile for EJ Index for Proximity to National Priorities List (NPL) sites |
| P_PRMP_D2 | Percentile for EJ Index for Proximity to Risk Management Plan (RMP) facilities |
| P_PTSDF_D2 | Percentile for EJ Index for Proximity to Treatment Storage and Disposal (TSDF) facilities |
| P_OZONE_D2 | Percentile for EJ Index for Ozone level in air |
| P_PM25_D2 | Percentile for EJ Index for PM2.5 level in air |



We now have a dataframe with the total population in rural, suburban, and urban areas **that would be prioritized for EJ funding for each of the 11 indicators.** 

**Question 1.2:** Now, convert this into a percentage, noting that we can't just divide by the total population of the United States across all of these cell values due to missing values - and the fact that those missing values aren't *uniformly distributed* across the indicators.

In [None]:
pop_df = graphing_df[["rural", "suburban", "urban"]]


# first sum over the columns, horizontally
total_by_class = ...

# then divide, within columns, vertically
pop_df = ...

# add back the classification column
pop_df["classification"] = ...

# add an `indicators` column containing human-readable names of each 
# classification using `ejvars_dict`
pop_df["indicators"] = ...
pop_df

In [None]:
grader.check("q1_2")

Below, we plot the results as we did in lecture. Make sure you feel comfortable with the matplotlib plotting setup below.

In [None]:
# Plot the results. 
pop_df = pop_df.set_index(pop_df.indicators)

pop_df.plot(figsize=(12, 7),
    kind="barh", 
    stacked=True
);

plt.xlabel('Percentage distribution of Prioritized Communities')
plt.ylabel('EPA EJ Indicator')
plt.title('EPA EJ Indicator Distribution for Different types of Communities');

Below we replot the results, this time with the bars sorted by rural values. Think of what story we might want to tell here!

In [None]:
pop_df.sort_values(by=['rural']).plot(figsize=(12, 7),
    kind="barh", 
    stacked=True
);

plt.xlabel('Percentage distribution of Prioritized Communities')
plt.ylabel('EPA EJ Indicator')
plt.title('EPA EJ Indicator Distribution for Different types of Communities');

Does this plot reflect the composition of the rest of the country? We can check by pulling in our original dataset, which contains the population of each Census block in the country.

In [None]:
ejdf.groupby("classification")["ACSTOTPOP"].sum() / ejdf["ACSTOTPOP"].sum()

<!-- BEGIN QUESTION -->

**Question 1.3:** Do our findings reflect the composition of the country as a whole? What do you think could be the cause of this? Answer in 3 to 5 sentences.

_Type your answer here, replacing this text._

<!-- END QUESTION -->

---
## Part 2: Predicting the EJ Score

Next, we examine how well the different raw environmental data predict the final EJ scores. To do so, we use OLS with the setup as follows:

The objective is to fit a line that is defined by $y=mx+b$ so that the coefficient $m$ and the intercept $b$ are chosen to minimize the errors between the line and the actual observations.  We use ordinary least squares to find the slope and intercept of this line - the linear model.  There may be other types of models that better represent the data than a linear model; but a linear model is a good first start, just to get a sense of the data.

Once we have found the $m$ and $b$ that minimize the errors, which we call $\hat{m}$ and $\hat{b}$, we get a predicted $y$ value for each $x$.  We denote and calculate this predicted $y$ value as $\hat{y} = \hat{m} x + \hat{b}$. Note that the $x$ value isn't estimated.  It's just the data, and therefore has no hat on it.

For a particular record, with index $i$, calculate the predicted $y$ value for that observation as $\hat{y}_i = \hat{m} x_i + \hat{b}$.  Note that the values of $\hat{m}$ and $\hat{b}$ are constant across all observations and, as a result, don't need the $i$ index.

Finally, we denote $\bar{y}$ as the mean for the $y$ variable, across all observations.
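
For a single $x$ variable, the minimizing values have a well-known closed form: $\hat{m} = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sum_i (x_i - \bar{x})^2}$ and $\hat{b} = \bar{y} - \hat{m}\bar{x}$, where $\bar{x}$ is the mean of the $x$ variable. The sketch below checks these formulas against `stats.linregress` on synthetic data; the data (a noisy line with slope 2 and intercept 1) are made up purely for illustration.

```python
import numpy as np
from scipy import stats

# Synthetic data (made up for illustration): a noisy line with slope 2 and intercept 1.
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=200)
y = 2.0 * x + 1.0 + rng.normal(0, 1, size=200)

# Closed-form OLS estimates for a single regressor.
x_bar, y_bar = x.mean(), y.mean()
m_hat = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
b_hat = y_bar - m_hat * x_bar

# Predicted values use the same m_hat and b_hat for every observation.
y_hat = m_hat * x + b_hat

# scipy's linregress should agree with the closed-form values.
fit = stats.linregress(x, y)
print(m_hat, fit.slope)
print(b_hat, fit.intercept)
```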

Below, we proceed with the analysis for [`PM 2.5`](https://www.epa.gov/air-trends/particulate-matter-pm25-trends). You'll be asked to do the same for [`OZONE`](https://www.epa.gov/air-trends/ozone-trends) in just a second. The pollutants' names link to the EPA's description of the emitters. Check them out if you're unfamiliar.

In [None]:
df = pd.read_csv("EJSCREEN_sample2.csv")
df.describe()

Run the `stats.linregress` command to gather the coefficient and intercept of the relationship between our `PM2.5` raw data and the final EJ score for `PM2.5` in that community, `D_PM25_2`.

In [None]:
m, b, _, _, _ = stats.linregress(df.PM25, df.D_PM25_2)
print(f"slope: {m}")
print(f"intercept: {b}")

We plot the relationship together with the line of best fit computed above.

In [None]:
plt.figure(figsize=(16, 8))
plt.scatter(
    df.PM25,
    df.D_PM25_2,
    s=0.01
)

plt.plot(df.PM25, m*df.PM25 + b, color="orange")
plt.ylim([-20000, 20000])
plt.xlabel('PM2.5 levels in air')
plt.ylabel('EJ Index for PM2.5 level in air')
plt.title('EJ Index for PM2.5  vs. PM2.5 level in air');

Now, we repeat the previous exercise but with our `demographic_index` discussed in lecture and earlier in this lab. Make sure you understand how it is calculated and how it might influence our final EJ score for our environmental indicators.

In [None]:
m, b, _, _, _ = stats.linregress(df.demographic_index, df.D_PM25_2)
print(f"slope: {m}")
print(f"intercept: {b}")

Plot the relationship together with the line of best fit computed above.

In [None]:
plt.figure(figsize=(16, 8))
plt.scatter(
    df.demographic_index,
    df.D_PM25_2,
    s=0.01
)
plt.plot(df.demographic_index, m*df.demographic_index + b, color="orange")
plt.ylim([-20000, 20000])
plt.xlabel('EPA EJ Demographic Index')
plt.ylabel('EJ Index for PM2.5 level in air')
plt.title('EJ Index for PM2.5  vs. EPA EJ Demographic Index');

**Question 2.1:** Now, repeat the exercise above for the environmental indicator `OZONE`. Your solution should include 2 plots with appropriate titles, labels, ranges for x and y, and the line of best fit.

In [None]:
# YOUR CODE FOR PLOT 1 HERE.

# No prompt. Free code!

In [None]:
# YOUR CODE FOR PLOT 2 HERE.

# No prompt. Free code!

<!-- BEGIN QUESTION -->

**Question 2.2:** What do your findings suggest about the relationship between the environmental raw data on OZONE, the EJ index for OZONE, and the demographic index used by the EPA? What are potential 'flaws' with this model? Answer in 3 to 5 sentences, at a minimum.

_Type your answer here, replacing this text._

<!-- END QUESTION -->

---
## Part 3: Focus on $R^2$

The R-squared, or coefficient of determination, is a measure of **how much of the variation can be explained by the linear model**.  (It is not the same as the coefficient of correlation $r$, although for a regression with a single $x$ variable $R^2$ equals $r^2$.)  A higher $R^2$ value indicates that the linear model explains more of the variation - there is less variation relegated to the residuals, relative to the total variation.

$$ R^2 = 1 - \frac{RSS}{TSS} = 1 - \frac{\sum_{i} (y_i - \hat{y}_{i})^{2}}{\sum_{i} (y_i - \bar{y})^{2}}$$



Let's calculate the $R^2$ for `PM2.5` manually. You'll be asked to do the same for `OZONE` in just a second, so pay attention! ;) 

In [None]:
m, b, _, _, _ = stats.linregress(df.PM25, df.D_PM25_2)
df["yhat"] = m*df.PM25 + b
rss = np.sum((df.D_PM25_2 - df.yhat)**2)
tss = np.sum((df.D_PM25_2 - df.D_PM25_2.mean())**2)

1 - rss/tss

We can do exactly the same by squaring the r parameter gathered from our call to `stats.linregress`.

In [None]:
m, b, r, _, _ = stats.linregress(df.PM25, df.D_PM25_2)

r**2
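
The two results agree because, for a regression with a single $x$ variable and an intercept, $R^2$ is exactly the squared sample correlation. Here is a short sketch of why, using the shorthand $S_{xx} = \sum_i (x_i - \bar{x})^2$, $S_{yy} = \sum_i (y_i - \bar{y})^2$, and $S_{xy} = \sum_i (x_i - \bar{x})(y_i - \bar{y})$ (notation introduced here for convenience), together with the standard OLS solutions $\hat{m} = S_{xy}/S_{xx}$ and $\hat{b} = \bar{y} - \hat{m}\bar{x}$.

$$ y_i - \hat{y}_i = (y_i - \bar{y}) - \hat{m}(x_i - \bar{x}) $$

$$ RSS = \sum_i (y_i - \hat{y}_i)^2 = S_{yy} - 2\hat{m}S_{xy} + \hat{m}^2 S_{xx} = S_{yy} - \frac{S_{xy}^2}{S_{xx}} $$

$$ R^2 = 1 - \frac{RSS}{TSS} = 1 - \frac{S_{yy} - S_{xy}^2/S_{xx}}{S_{yy}} = \frac{S_{xy}^2}{S_{xx}S_{yy}} = r^2 $$

The last equality uses $TSS = S_{yy}$ and the definition $r = S_{xy}/\sqrt{S_{xx}S_{yy}}$.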

> Raw PM2.5 data explains approximately 7% of variation in EJ PM2.5 score. 

In [None]:
m, b, _, _, _ = stats.linregress(df.demographic_index, df.D_PM25_2)
df["yhat"] = m*df.demographic_index + b
rss = np.sum((df.D_PM25_2 - df.yhat)**2)
tss = np.sum((df.D_PM25_2 - df.D_PM25_2.mean())**2)

1 - rss/tss

In [None]:
m, b, r, _, _ = stats.linregress(df.demographic_index, df.D_PM25_2)

r**2

> The Demographic Index data explains approximately 67% of variation in EJ PM2.5 score. 

Should it?  We don't know.  But this very apparent mismatch is not being adequately addressed at the highest levels of government, as they discuss how to identify "underserved communities."

**Question 3.1:** Repeat the exercise above for `OZONE`, but build a function that takes in the independent and dependent variables of an OLS regression and returns its $R^2$. Call it once for each of your two regressions and save the results as `r2_1` and `r2_2`.

In [None]:
def r2(x,y):
    ...
    return ...

r2_1 = ...
r2_2 = ...
print(r2_1, r2_2)

In [None]:
grader.check("q3_1")

<!-- BEGIN QUESTION -->

**Question 3.2:** Comment on your findings: how much of the variation in the EJ Ozone Score is explained by the raw data vs. the demographic index? Why could this be the case? What are potential pitfalls of an EJ Ozone Score like this? Can you think of an example where funding may or may not be diverted to communities with extreme levels of Ozone in the air? Answer in 3 to 5 sentences, at a minimum. 

_Type your answer here, replacing this text._

<!-- END QUESTION -->

---
## Bonus Part 4: A Quick Look at the Raw Data
The next 2 parts are for your own 'enjoyment' and offer insight into how real-world data scientists do their EDA. Thank you again to Cal grad [Dan Hammer](https://www.danham.me/r/) for his awesome insight into the Justice40 program and its potential flaws.


Let's get a more detailed look at the raw data used above than the `describe()` method affords.

In [None]:
# Evolution of parameters fed into the plot
#     plt.hist(df.ACSTOTPOP)
#     plt.hist(df.ACSTOTPOP, range=[0,10000])

# Set figure size
plt.figure(figsize=(16, 8))

# Assign the plot to a variable, just to suppress output in Notebooks
fig = plt.hist(df["ACSTOTPOP"], range=[0,10000], bins=1000)

# Add a vertical line at the mean value, and standard deviations
plt.axvline(
    df.ACSTOTPOP.mean(), 
    linewidth=3, 
    color="orange"
)

plt.axvline(
    df.ACSTOTPOP.mean()-df.ACSTOTPOP.std(), 
    linewidth=1, 
    color="orange",
    linestyle="dashed"
)

plt.axvline(
    df.ACSTOTPOP.mean()+df.ACSTOTPOP.std(), 
    linewidth=1, 
    color="orange",
    linestyle="dashed"
)

print(int(df.ACSTOTPOP.mean()))

plt.title('Distribution of Total Census Population')
plt.ylabel('Frequency')
plt.xlabel('Total Census Population')

In [None]:
# Evolution of parameters fed into the plot
#     plt.hist(df.PM25)
#     plt.hist(df.PM25, range=[0,16])

# Set figure size
plt.figure(figsize=(16, 8))

# Assign the plot to a variable, just to suppress output in Notebooks
fig = plt.hist(df.PM25, range=[0,16], bins=1000)

# Add a vertical line at the mean value, and standard deviations
plt.axvline(
    df.PM25.mean(), 
    linewidth=3, 
    color="orange"
)

plt.axvline(
    df.PM25.mean()-df.PM25.std(), 
    linewidth=1, 
    color="orange",
    linestyle="dashed"
)

plt.axvline(
    df.PM25.mean()+df.PM25.std(), 
    linewidth=1, 
    color="orange",
    linestyle="dashed"
)

plt.title('Distribution of PM2.5 Concentrations in the Air')
plt.ylabel('Frequency')
plt.xlabel('PM2.5 Concentrations in the Air')

This exercise can easily be repeated for any other numerical variable in the dataset. Feel free to keep exploring!
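
One way to avoid copy-pasting the plotting cells is to wrap the pattern in a small helper. The function below, `hist_with_mean`, is a hypothetical sketch (its name and parameters are not part of the lab) and is demonstrated on synthetic data; to use it on the lab data, pass in a column of `df` and a sensible range.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

def hist_with_mean(series, value_range, bins=100, title=""):
    """Histogram with a solid line at the mean and dashed lines at +/- 1 std."""
    mean, std = series.mean(), series.std()
    fig, ax = plt.subplots(figsize=(16, 8))
    ax.hist(series, range=value_range, bins=bins)
    ax.axvline(mean, linewidth=3, color="orange")
    for offset in (-std, std):
        ax.axvline(mean + offset, linewidth=1, color="orange", linestyle="dashed")
    ax.set_title(title)
    ax.set_ylabel("Frequency")
    return ax

# Synthetic data standing in for a column of `df` (made up for illustration).
toy = pd.Series(np.random.default_rng(0).normal(50, 10, size=5000))
ax = hist_with_mean(toy, value_range=[0, 100], title="Distribution of a toy variable")
```

Returning the axes object lets you add an x-axis label or adjust limits afterwards without touching the helper.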

In [None]:
# YOUR EXPLORATION HERE. Note: We will check this out, but you are not at all graded on this.

---
## Bonus Part 5: Community Type and Demographic Factors.

The next part brings together everything we have done above and delves into how the distributions of different indicators vary by community type (urban, suburban, and rural).

In [None]:
# Source: https://www.cdc.gov/nchs/data_access/urban_rural.htm#Data_Files_and_Documentation
nchs = pd.read_excel("NCHSURCodes2013.xlsx")
nchs = nchs[["FIPS code", "State Abr.", "County name", "2013 code"]]
nchs.columns = ["fips", "state", "county", "classification"]

In [None]:
# Associate each of the classifications to one of three categories, rather 
# than one of six - just to simplify our quick analysis
remap_dict = {
    1: "urban",
    2: "suburban",
    3: "suburban",
    4: "rural",
    5: "rural",
    6: "rural"
}

# replace the values of the `classification` column based on the key-value 
# pair in `remap_dict`
nchs = nchs.replace({'classification': remap_dict})

In [None]:
ejdf_merged = df.merge(nchs, on="fips")

In [None]:
urban    = ejdf_merged[ejdf_merged.classification == "urban"].LOWINCPCT
suburban = ejdf_merged[ejdf_merged.classification == "suburban"].LOWINCPCT
rural    = ejdf_merged[ejdf_merged.classification == "rural"].LOWINCPCT

plt.figure(figsize=(20,6))
fig = plt.hist(urban, bins=100, alpha=0.5, label="urban")
fig = plt.hist(suburban, bins=100, alpha=0.5, label="suburban")
fig = plt.hist(rural, bins=100, alpha=0.5, label="rural")

plt.xlabel("% low income", size=14)
plt.legend(loc='upper right')

print(f"urban: {urban.mean()}")
print(f"suburban: {suburban.mean()}")
print(f"rural: {rural.mean()}")

Stop here for a moment and reflect on what this tells us about the potential misalignments in Part 1.

In [None]:
urban    = ejdf_merged[ejdf_merged.classification == "urban"].MINORPCT
suburban = ejdf_merged[ejdf_merged.classification == "suburban"].MINORPCT
rural    = ejdf_merged[ejdf_merged.classification == "rural"].MINORPCT

plt.figure(figsize=(20,6))
fig = plt.hist(urban, bins=100, alpha=0.5, label="urban")
fig = plt.hist(suburban, bins=100, alpha=0.5, label="suburban")
fig = plt.hist(rural, bins=100, alpha=0.5, label="rural")

plt.xlabel("% people of color", size=14)
plt.legend(loc='upper right')

print(f"urban: {urban.mean()}")
print(f"suburban: {suburban.mean()}")
print(f"rural: {rural.mean()}")

As above, stop here for a moment and reflect on what this tells us about the potential misalignments in Part 1. These 2 variables are the prime drivers of the demographic index developed by the EPA. Now you should be one step closer to understanding both the 'power' and the 'pitfalls' of the way EJ is handled in the White House.

---
## Feedback

**Question 6:** Please fill out this short [feedback form](https://forms.gle/xpJ4wA9aR7yUdnae8) to let us know your thoughts about this lab! We really appreciate your opinions and feedback! At the end of the Google form, you should see a codeword. Assign the codeword to the variable `codeword` below. 

In [None]:
codeword = ...

In [None]:
grader.check("q6")

---

To double-check your work, the cell below will rerun all of the autograder tests.

In [None]:
grader.check_all()

## Submission

Make sure you have run all cells in your notebook in order before running the cell below, so that all images/graphs appear in the output. The cell below will generate a zip file for you to submit. **Please save before exporting!**

In [None]:
# Save your notebook first, then run this cell to export your submission.
grader.export()