# BUDS Report 09.5: Putting It All Together

### Table of Contents

1. <a href='#section 1'>An Actual Last Look at the CES Data</a>

2. <a href='#section 2'>You've Done a Lot!</a>

In [None]:
# run this cell
from datascience import *
import numpy as np
import math
import matplotlib.pyplot as plt
plt.style.use("fivethirtyeight")
%matplotlib inline

## 1. An Actual Last Look at the CES Data <a id='section 1'></a>

Run the next cell so that we can load the CalEnviroScreen dataset one last time.

In [None]:
ces_data = Table.read_table("ces_data_v2.csv")

# this does a bit of data cleaning
# don't worry about understanding these next few lines of code
for i in np.arange(ces_data.num_columns):
    if i != 3 and i != 11:
        ces_data = ces_data.where(i, are.above_or_equal_to(0))
ces_data
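For anyone curious about the cleaning loop above: it keeps a row only if every column other than columns 3 and 11 holds a value of at least 0. The sketch below shows the same filtering idea in plain NumPy on a small made-up array (not the CES data). The array values, and treating column 3 as the only skipped column, are assumptions for illustration only.

```python
import numpy as np

# Made-up stand-in for a table: each row is a tract, each column a measurement.
data = np.array([
    [10.0,    3.2, 55.0, -1.0],
    [12.0, -999.0, 40.0,  7.0],
    [ 8.0,    1.5, 61.0,  2.0],
])

# Columns NOT checked for negative values (assumption: only column 3 here;
# the notebook's loop skips columns 3 and 11 of the real table).
skip = {3}

# Start by keeping every row, then drop rows that fail any checked column.
keep = np.ones(data.shape[0], dtype=bool)
for i in range(data.shape[1]):
    if i not in skip:
        # Same idea as ces_data.where(i, are.above_or_equal_to(0)):
        # a row survives only if column i is >= 0.
        keep &= data[:, i] >= 0

cleaned = data[keep]
print(cleaned)
```

The second row is dropped because of its negative value in column 1, while the first row survives even though its skipped column holds -1.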

It seems plausible that inland counties feel the effects of pollution more than coastal counties do: the Central Valley has less air circulation than the coast, and many pesticides are applied there. Let's examine this possibility by comparing the most populous Central Valley counties with the Central Coast counties.

<div class="alert alert-warning">
    <b>PRACTICE:</b> First, let's create a table called <code>central_coast</code> that contains some of the Central Coast counties. Similarly, we'll create another table called <code>central_valley</code> with a few Central Valley counties.
    
Use the following counties for these tables.
<ul>
    <li><b>Central Coast:</b> Ventura, Santa Barbara, San Luis Obispo, Monterey, San Benito, and Santa Cruz
    <li><b>Central Valley:</b> Kern, Fresno, Sacramento, and Shasta
    </ul>

Make arrays containing these counties and assign them to the corresponding names. You can use these arrays in your <code>where</code> predicate.
    
Be sure to find a predicate that allows you to keep any row whose county is <i>contained</i> in the corresponding array.
    </div>
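The idea behind a "contained in" test can be sketched in plain NumPy: build a True/False mask marking which rows have a county that appears in the list, then use the mask to keep only those rows. This uses `np.isin` on made-up data (not the CES table) and is not the `datascience` predicate the exercise asks for; the county list here is shortened for illustration.

```python
import numpy as np

# Hypothetical toy data: a county name and a score for each tract.
counties = np.array(["Fresno", "Monterey", "Alameda", "Santa Cruz", "Kern"])
scores = np.array([45.1, 30.2, 22.7, 18.9, 50.3])

# A shortened list of counties to keep, for illustration only.
central_coast_counties = np.array(["Monterey", "Santa Cruz", "Ventura"])

# np.isin returns True for each element of `counties` that appears in
# `central_coast_counties`; indexing with the mask keeps only those rows.
mask = np.isin(counties, central_coast_counties)
print(counties[mask])
print(scores[mask])
```

Note that "Ventura" is in the list but not in the data, so it simply matches nothing.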

In [None]:
central_coast_counties = make_array(...)
central_coast = ces_data.where(...)
central_coast

In [None]:
central_valley_counties = ...
central_valley = ...
central_valley

As usual, we don't need all of the columns in this table. We're primarily interested in the CES score (or perhaps the pollution burden) of these tracts and not much else.

<div class="alert alert-warning">
    <b>PRACTICE:</b> Select the column(s) we need for our analysis and assign the resulting tables back to <code>central_coast</code> and <code>central_valley</code>. Although we won't use the county column in the analysis, keep it as well; it helps us keep track of which rows come from the Central Coast and which come from the Central Valley.
    </div>

In [None]:
central_coast = ...
central_coast

In [None]:
central_valley = ...
central_valley

In Report 09, Section 4, we added a column containing only the string "female" to the table of female data, `fem_str`. Similarly, we added a column containing only the string "male" to the table of male data, `male_str`.

<div class="alert alert-warning">
    <b>PRACTICE:</b> Look at the code in that section and use it as a model to add a column called "Location" to each table: it should contain only the string "Central Coast" in the <code>central_coast</code> table and only the string "Central Valley" in the <code>central_valley</code> table.
    </div>
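Why a label column helps: once the two tables are stacked into one, the label is the only thing that records which group each row came from. The sketch below shows the same idea in plain NumPy with made-up scores (not CES values): `np.full` repeats a label once per row, and `np.concatenate` stacks the two groups, roughly as `with_rows` does for tables.

```python
import numpy as np

# Made-up scores for two groups (not real CES values).
coast_scores = np.array([12.0, 18.5, 9.3])
valley_scores = np.array([41.2, 37.8])

# One copy of the label string per row in each group.
coast_labels = np.full(len(coast_scores), "Central Coast")
valley_labels = np.full(len(valley_scores), "Central Valley")

# Stack the groups; the labels stay aligned with their scores.
scores = np.concatenate([coast_scores, valley_scores])
locations = np.concatenate([coast_labels, valley_labels])

for location, score in zip(locations, scores):
    print(location, score)
```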

In [None]:
central_coast = ...
central_valley = ...
central = central_coast.with_rows(central_valley.rows)
central

Finally, let's take a look at the difference between the two categories by visualizing the distribution of CES scores (or pollution burden scores) for the Central Coast and Central Valley. Because we want to compare these two categories, we need to find a way to group the Central Coast data together and separate it from the Central Valley data.

<div class="alert alert-warning">
    <b>PRACTICE:</b> Look back at the BUDS Reference Sheet and read the documentation notes on <code>hist</code>. Do you see any arguments that could separate these two groups for us? Think about how our table is set up.
    
Don't forget to use the argument <code>normed=False</code>!
    </div>
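Conceptually, comparing two groups on one histogram means counting each group's scores in the same bins, so the bars line up. The sketch below shows that counting step with `np.histogram` on made-up scores; it illustrates the idea and is not how `hist` is implemented in the `datascience` library. The bin edges are an arbitrary choice for illustration.

```python
import numpy as np

# Made-up combined data: one score and one group label per row.
scores = np.array([12.0, 18.5, 9.3, 41.2, 37.8, 44.0])
locations = np.array(["Central Coast"] * 3 + ["Central Valley"] * 3)

# Shared bin edges 0, 10, 20, 30, 40, 50 (five bins) for every group.
bins = np.arange(0, 60, 10)

counts_by_group = {}
for group in np.unique(locations):
    # Count only this group's scores, using the shared bins.
    counts, _ = np.histogram(scores[locations == group], bins=bins)
    counts_by_group[str(group)] = counts
    print(group, counts)
```

Using the same bins for both groups is what makes the bar heights directly comparable.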

In [None]:
central.hist(...)

What do you notice about this histogram? Does it agree with what you expected?

_Written Answer:_

## 2. You've Done a Lot! <a id='section 2'></a>

Congratulations on combining a multitude of function/method calls on the CalEnviroScreen dataset! You've conducted so much research on pollution and the CalEnviroScreen data, explored many of its different characteristics, and successfully answered a number of questions about the data.

Given that it has only been *two weeks* since this program started, you have accomplished so much. We're very proud of you!

If you want to explore the data further, feel free to experiment with the dataset in the following cells. Otherwise, great job!

### Downloading as PDF

Download this notebook as a PDF by clicking <b><code>File > Download as > PDF via LaTeX (.pdf)</code></b>. Then submit the PDF to bCourses under the corresponding assignment.