# BUDS Report 06: Table Practice

### Table of Contents
1. <a href='#section 1'>Creating Tables</a>
2. <a href='#section 2'>Accessing Columns</a>
3. <a href='#section 3'>Column Arithmetic</a>
4. <a href='#section 4'>The CES Data Set Again</a>

In [None]:
# run this cell
from datascience import *
import numpy as np
import math
import matplotlib.pyplot as plt
plt.style.use('fivethirtyeight')
%matplotlib inline

## 1. Creating Tables <a id='section 1'></a>

Arrays represent individual columns, but tables allow us to compare items across rows. You can organize arrays into tables to make comparisons easier. Let's break down the first example (which you see in the following two code cells):
- It creates an empty table using the expression `Table()`,
- adds two columns by calling `with_columns` with four arguments (separated by commas),
- assigns the result to the name `fruits`, and finally
- evaluates `fruits` so that you can see the table.

The function `with_columns` takes in alternating strings (denoting column labels) and arrays (representing the data in those columns). The strings "fruit names" and "count" are the column labels that have been chosen, and the variables `fruit_names` and `count` are two arrays of the same length. 

In [None]:
fruit_names = make_array("apple","orange", "pineapple")
count = make_array(4, 3, 3)

In [None]:
fruits = Table().with_columns(
    "fruit names", fruit_names,
    "count", count)
fruits

<div class="alert alert-warning">
    <b>PRACTICE:</b> You can add more to this table by referencing the existing table <code>fruits</code>. Let's add a column named "price" using the <code>prices</code> array below. Name the new table <code>fruits_price</code>.
    </div>
    
Notice that the `prices` array has 3 items, which matches the number of rows in the `fruits` table.

In [None]:
prices = make_array(0.79, 1.10, 1.59)
prices

In [None]:
fruits_price = fruits.with_column(
    "...", ...)
fruits_price

You can also create a column's array directly *within* the call to `with_columns`, instead of assigning it to a name first.

<div class="alert alert-warning">
    <b>PRACTICE:</b> Fill in the missing code, so that the table called <code>fruit_availability</code> has 4 columns. Its columns should be "fruit names", "count", "price", and "available". The column "available" represents whether the fruit is available at Store X.
    </div>

In [None]:
fruit_availability = ...with_columns(
    "...", make_array(True, False, True)) 
fruit_availability

<div class="alert alert-warning">
    <b>PRACTICE:</b> Recall that you can perform other operations on this new table. Sort the table by price, from least expensive to most expensive, and call the resulting table <code>sorted_fruits</code>.
    </div>

In [None]:
sorted_fruits = ...
sorted_fruits

## 2. Accessing Columns <a id='section 2'></a>

The table method `column` takes a column label and returns the values in that column as an array.

For example, the code below extracts the "available" column from the `fruit_availability` table as an array and assigns it to the name `availability_array`.

In [None]:
availability_array = fruit_availability.column("available")
availability_array

Extracting columns from tables is useful because it allows us to perform calculations on columns *in* our tables.

<div class="alert alert-warning">
    <b>PRACTICE:</b> Assuming all fruits are available at Store X, you want to buy one of each fruit. How much will this cost? Use the <code>fruits_price</code> table, not the <code>prices</code> array. Assign the sum to the name <code>total_cost</code>.
</div>

In [None]:
total_cost = ...
total_cost

## 3. Column Arithmetic <a id='section 3'></a>

When you combine an array and a number with an arithmetic operator, Python applies the operation to each element of the array individually and returns a new array of the results. For example, suppose a new 40-cent tax is added to the price of each fruit. You can add the tax to every price at once:

In [None]:
new_prices = fruit_availability.column("price") + 0.40
new_prices
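
Element-wise arithmetic also works between two arrays of the same length, pairing elements by position. The sketch below uses plain NumPy arrays and hypothetical values (`quantities`, `unit_prices`) that are not part of this report's tables; it assumes arrays built with `make_array` behave the same way.

```python
import numpy as np

# Hypothetical data, not taken from the tables in this report.
quantities = np.array([2, 5, 1])
unit_prices = np.array([0.50, 1.20, 3.00])

# Array with a number: the number is applied to every element.
discounted = unit_prices - 0.10

# Array with an array: elements are paired up by position.
line_totals = quantities * unit_prices

print(discounted)
print(line_totals)
```

Neither operation changes `unit_prices`; each one returns a new array.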

<div class="alert alert-warning">
    <b>PRACTICE:</b> Store X just received a new shipment of fruit and has doubled the amount of each fruit. Create an array called <code>new_count</code>, which has double the number of fruits from <code>count</code>. To do so, get the array from the <code>fruit_availability</code> table.
    </div>

In [None]:
new_count = fruit_availability...("...") * 2
new_count

<div class="alert alert-warning">
    <b>PRACTICE:</b> Let's add this array to the <code>fruit_availability</code> table. Add the column and call it "new count". Assign this new table to the name <code>updated_fruits</code>.
    </div>

In [None]:
updated_fruits = ...
updated_fruits

<div class="alert alert-warning">
    <b>PRACTICE:</b> For the final table, you only want 3 columns: "fruit names", "new count", and "price". Once you've selected only those columns, sort the prices from most expensive to least expensive.
    </div>

In [None]:
final_fruits = updated_fruits...("...", "new count", "...").sort("...", descending = ...)
final_fruits

## 4. The CES Data Set Again <a id='section 4'></a>

Now that you have an idea of how to update columns within a table, you can do some exploration with real-world data. Here, you'll revisit the CalEnviroScreen dataset. Reference the shared document in which your team collected background information. Feel free to look back at [Report 04](https://highschool.datahub.berkeley.edu/hub/user-redirect/git-pull?repo=https%3A%2F%2Fgithub.com%2Fds-modules%2FBUDS-SU22&urlpath=tree%2FBUDS-SU22%2FWeek-1%2F4_Tables-Pt-2.ipynb&branch=main) or the more in-depth [CalEnviroScreen report](https://oehha.ca.gov/media/downloads/calenviroscreen/report/calenviroscreen40reportf2021.pdf).

In [None]:
ces_data = Table.read_table("ces_data_v2.csv")
ces_data

Previously, you looked at asthma and some other indicators that you felt had some ties to asthma. Today, you'll make conversions with different measurements.

<div class="alert alert-warning">
    <b>PRACTICE:</b> Keep only the "California.County", "Total.Population", "Asthma", "Poverty", and "Unemployment" columns from the <code>ces_data</code> table. Assign this new table to the name <code>measurements</code> and be sure that the columns appear in the order listed.
    </div>

In [None]:
measurements = ...
measurements

This next cell focuses on cleaning the data that you have. Recall that some values could not be obtained and are denoted as `nan`. To get rid of these rows, some filtering must be done. As in Report 05, you do not need to know what is happening in this cell; it's enough to know that it is getting rid of the tracts that do not have sufficient information.

In [None]:
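# Indices 1 through 4 refer to every column after "California.County".
# A missing value (nan) fails the comparison with 0, so its row is dropped.
for label in np.arange(1, 5):
    measurements = measurements.where(label, are.above_or_equal_to(0))
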
measurements
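
If you are curious why a "greater than or equal to 0" filter removes missing values, here is a minimal sketch using plain NumPy instead of the table method `where`. The array `values` is hypothetical, and the sketch assumes that `are.above_or_equal_to(0)` behaves like the comparison `>= 0`.

```python
import numpy as np

# Hypothetical column of measurements with one missing value.
values = np.array([12.5, np.nan, 0.0, 48.1])

# Any comparison with nan evaluates to False, so nan never passes the test.
keep = values >= 0

# Keep only the entries that passed the comparison.
cleaned = values[keep]

print(keep)
print(cleaned)
```

Because every real measurement in these columns is 0 or larger, the only rows that fail the test are the ones with missing values.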

<div class="alert alert-warning">
    <b>PRACTICE:</b> The first change to make is within the "Asthma" column. It is currently measured in asthma-related emergency-department (ED) visits per 10,000 ED visits. To make the values easier to interpret, you'll convert them to the <i>percentage</i> of ED visits that are asthma-related, that is, the number of asthma ED visits per 100 ED visits. Consider the following expressions.

<ul>
    <li>Current measurement: $\frac{\#\,\text{asthma}}{\#\,\text{total}} = \frac{\#\,\text{asthma}}{10{,}000}$</li>
    <li>Percentage: $\%\,\text{asthma} = \frac{\#\,\text{asthma}}{100}$</li>
</ul>

Work out the calculation needed to convert these values to percentages. Then, assign the resulting array to the name <code>asthma_percent</code>.
    </div>

In [None]:
asthma_percent = ...
asthma_percent

<div class="alert alert-warning">
    <b>PRACTICE:</b> You can now add this array back to the <code>measurements</code> table using the methods from earlier. Name the new column "Asthma.Percent" so that you can distinguish between the old and new values.
    </div>

In [None]:
measurements = ...
measurements

Well done converting the original measurements into something more intuitive! Conversions like these are common and often necessary when presenting findings to others. They may seem tedious, but they get easier with practice.

In this next section, you'll explore measurements for California as a whole. One characteristic of interest is the poverty rate. CalEnviroScreen reports the percentage of people living below two times the federal poverty level, *but* each percentage is specific to its tract. Because these percentages differ and tracts have different population sizes, you cannot simply average them. Instead, find the total number of people living in poverty and the total population, then divide. Consider the following equation.

$\text{poverty rate} = \frac{\#\,\text{poverty}}{\#\,\text{total}}$
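
To see why tract sizes matter, consider a small sketch with two made-up tracts. The names `tract_population` and `tract_percent` and all of the numbers are hypothetical and are not taken from the CalEnviroScreen data; the point is only to compare a plain average of percentages with the ratio of totals.

```python
import numpy as np

# Two hypothetical tracts: a small one with a low rate, a large one with a high rate.
tract_population = np.array([1000, 9000])
tract_percent = np.array([10.0, 50.0])

# Averaging the percentages treats both tracts as if they were the same size.
naive_average = np.mean(tract_percent)

# Converting to counts first lets the larger tract carry more weight.
tract_count = tract_population * tract_percent / 100
overall_rate = 100 * np.sum(tract_count) / np.sum(tract_population)

print(naive_average)  # about 30
print(overall_rate)   # about 46
```

The two results differ because most of the people in this toy example live in the larger tract.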

<div class="alert alert-warning">
    <b>PRACTICE:</b> Let's start by finding the total number of people living in poverty (denoted '# poverty' in the above equation). Since "Poverty" is measured in terms of <i>percentages</i>, think of a way you can get it back to <i>counts</i>.
    </div>
    
*Hint:* You may need to look at more than one column in the table.

In [None]:
poverty_count = ...
poverty_count

<div class="alert alert-warning">
    <b>PRACTICE:</b> Now that you have the number of people living in poverty in each tract, you can add this data to the table. Recall the table method that adds a column to your table, and call this column "Poverty.Count".
    </div>

In [None]:
measurements = ...
measurements

<div class="alert alert-warning">
    <b>PRACTICE:</b> Finally, you can make the calculation to find the percentage of California residents living in poverty. The <code>measurements</code> table should now have the number of people living in poverty and the number of residents <i>per tract</i>. Before you make your calculation, remember that you are trying to divide the total number of California residents living in poverty by the total population of California.
    </div>
    
*Hint:* All of the tracts make up California, so you can aggregate all of those numbers to get the percentage.

In [None]:
poverty_ca = ...
population_ca = ...

poverty_percentage = ...
poverty_percentage

Look at the output. Is it what you expected? If not, what is surprising about this value?

_Written Answer:_

### Extra Exploration (Optional)

If you would like to explore further, the "Unemployment" data may be interesting to look at. Some (but not all) of the poverty rate could be explained by unemployment levels, so try comparing the two measurements. Feel free to copy and paste the code from above.

In [None]:
unemp_count = ...
unemp_count

In [None]:
measurements = ...
measurements

In [None]:
unemp_ca = ...
population_ca = ...

unemp_percentage = ...
unemp_percentage

### Downloading as PDF

Download this notebook as a PDF by clicking <b><code>File > Download as > PDF via LaTeX (.pdf)</code></b>. Submit the PDF to bCourses under the corresponding assignment.

Adapted from Data 8, Spring 2020 Lab 03 and Homework 02