# Minilab 4
In this lab you will further explore the Traffic Analysis Zone (TAZ) data that we began to explore in minilab 3. In part 1 of this lab we will look at some of the population data stored in 'data/tazData.csv' including employment, age, and income data.

In part 2 you will use the Table join() method to join the population data in tazData with the travel skims data that we looked at in minilab 3. You will then estimate the effect that a large event in San Francisco (e.g. a Giants game) can have on the transportation network.

In [None]:
from datascience import *
import numpy as np
import warnings
warnings.filterwarnings("ignore")

## Part 1 - Exploring TAZ population data

In [None]:
tazData = Table.read_table('data/tazData.csv')
tazData

Below is the data dictionary for tazData, which explains the meaning of each column header. This information comes from http://analytics.mtc.ca.gov/foswiki/Main/TazData

    ZONE	    Transportation analysis zone	Integer, 1 to 1454	Origins, destinations, shape file
    DISTRICT	Superdistrict geographic designation	Integer, 1 to 34	Shape file
    SD	        Superdistrict geographic designation (duplicate)	Integer, 1 to 34	 
    COUNTY	    County	Integer, 1 to 9	
                                 1 - San Francisco;
                                 2 - San Mateo;
                                 3 - Santa Clara;
                                 4 - Alameda;
                                 5 - Contra Costa;
                                 6 - Solano;
                                 7 - Napa;
                                 8 - Sonoma;
                                 9 - Marin
    TOTHH	    Total households	Integer, 0 and up	 
    HHPOP	    Population living in households (as opposed to group quarters)	Integer, 0 and up	 
    TOTPOP	    Total population	Integer, 0 and up	 
    EMPRES	    Employed residents	Integer, 0 and up	 
    SFDU	    Number of occupied single-family dwelling units	Integer, 0 and up	 
    MFDU	    Number of occupied multi-family dwelling units	Integer, 0 and up	 
    HHINCQ1	    Households in the lowest income quartile (less than $25,000 annually in $1989)	Integer, 0 and up	 
    HHINCQ2	    Households in the second lowest income quartile (between $25,000 and $45,000 in $1989)	Integer, 0 and up	 
    HHINCQ3	    Households in the second highest income quartile (between $45,000 and $75,000 in $1989)	Integer, 0 and up	 
    HHINCQ4	    Households in the highest income quartile (more than $75,000 in $1989)	Integer, 0 and up	 
    TOTACRE	    Total acres	Float, 0.0 and up	 
    RESACRE	    Acres occupied by residential development	Integer, 0 and up	 
    CIACRE	    Acres occupied by commercial or industrial development	Integer, 0 and up	 
    SHPOP62P	Share of the population age 62 or older	Float, 0.0 to 1.00	 
    TOTEMP	    Total employment	Integer, 0 and up	 
    AGE0004	    Persons age 0 to 4	Integer, 0 and up	 
    AGE0519	    Persons age 5 to 19	Integer, 0 and up	 
    AGE2044	    Persons age 20 to 44	Integer, 0 and up	 
    AGE4564	    Persons age 45 to 64	Integer, 0 and up	 
    AGE65P	    Persons age 65 and older	Integer, 0 and up	 
    RETEMPN	    Retail trade employment (NAICS-based)	Integer, 0 and up	 
    FPSEMPN	    Financial and professional services employment (NAICS-based)	Integer, 0 and up	 
    HEREEMPN	Health, educational and recreational service employment (NAICS-based)	Integer, 0 and up	 
    AGREMPN	    Agricultural and natural resources employment (NAICS-based)	Integer, 0 and up	 
    MWTEMPN	    Manufacturing, wholesale trade and transportation employment (NAICS-based)	Integer, 0 and up	 
    OTHEMPN	    Other employment (NAICS-based)	Integer, 0 and up	 
    PRKCST	    Hourly parking rate paid by long-term (8-hours) parkers (year 2000 cents)	Float, 0.0 and up	 
    OPRKCST	    Hourly parking rate paid by short-term parkers (year 2000 cents)	Float, 0.0 and up	 
    AREATYPE	Area type designation	Integer, 0 - regional core, 1 - central business district, 2 - urban business, 3 - urban, 4 - suburban, 5 - rural	 
    HSENROLL	High school students enrolled at schools in this TAZ	Float, 0.0 and up	 
    COLLFTE	    College students enrolled full-time at colleges in this TAZ	Float, 0.0 and up	 
    COLLPTE	    College students enrolled part-time at colleges in this TAZ	Float, 0.0 and up	 
    TERMINAL	Average time to travel from automobile storage location to origin/destination	Float, 0.0 and up	 
    TOPOLOGY	Topology (steepness) indicator	Integer, 1 - flat, 2 - in between, 3 - steep	 
    ZERO	    Placeholder (always zero)	Integer, 0	 
    HHLDS	    Repeat of the TOTHH variable with a different name for software compatibility	Integer, 0 and up	 
    SFTAZ	    Repeat of the ZONE variable with a different name for software compatibility	Integer, 1 to 1454	 
    GQPOP	    Population living in group quarters rather than households	Integer, 0 and up	 

## Normalizing the data
Because the populations vary significantly by TAZ, we probably want to <a href="https://en.wikipedia.org/wiki/Normalization_(statistics)">normalize</a> the data first. For example, rather than looking at the *count* of people in each income bracket, we may care more about the *percent* of people who fall into each income bracket. Say we want the percent of the population that is employed in each TAZ: we divide the number of employed residents by the total population of that TAZ.

If we take a close look at the data, we notice that some TAZs do not have any residents. Dividing by zero is undefined (with NumPy arrays it yields `nan` or `inf` rather than a usable value), so first let's select only the TAZs where the total population is greater than 0. We create a new table called tazData_new.
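
As a minimal sketch of why the filter matters, the snippet below uses plain NumPy arrays with made-up values standing in for the EMPRES and TOTPOP columns (the numbers are hypothetical, not taken from tazData). It builds a boolean mask and divides only where the population is nonzero.

```python
import numpy as np

# Hypothetical toy values standing in for the EMPRES and TOTPOP columns.
empres = np.array([120, 0, 45, 300])
totpop = np.array([400, 0, 90, 500])

# Boolean mask: True where a zone has at least one resident.
has_residents = totpop != 0

# Dividing only the masked entries avoids the undefined 0/0 case.
share_employed = empres[has_residents] / totpop[has_residents]
print(share_employed)
```

The table-level filter in the next cell plays the same role as `has_residents` here: it drops the empty zones before any division happens.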

In [None]:
tazData_new = tazData.where(tazData.column('TOTPOP') != 0)

## Create a table for normalized data
Let's create a new table called tazData_norm, where we will store the normalized values.

In [None]:
tazData_norm = Table()

## Adding percent employed to tazData_norm
The statement `tazData_norm['PCTEMP'] = tazData_new['EMPRES']/tazData_new['TOTPOP']` creates a column called 'PCTEMP' in the tazData_norm table if it does not already exist and assigns it the values of `tazData_new['EMPRES']/tazData_new['TOTPOP']`.

`tazData_norm.hist(overlay=False)` draws a separate histogram for each column in the table.

In [None]:
tazData_norm['PCTEMP']=tazData_new['EMPRES']/tazData_new['TOTPOP']
tazData_norm.select('PCTEMP').hist(bins = [0,.1,.2,.3,.4,.5,.6,.7,.8,.9,1.], overlay=False, normed=False)

## Adding other normalized variables
**Task:** Add the following normalized columns to the table.

Income: (Note for income, we want to normalize by number of households rather than total population.)
* PCTHHINCQ1
* PCTHHINCQ2
* PCTHHINCQ3 
* PCTHHINCQ4

Age:
* PCTAGE0004
* PCTAGE0519
* PCTAGE2044
* PCTAGE4564
* PCTAGE65P
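
One possible pattern for building several normalized columns at once is sketched below with plain NumPy and made-up data. The names `LOW`, `HIGH`, and `households` are hypothetical placeholders, not columns of tazData. The final check relies on the assumption that the count columns add up to the total, in which case the shares in each row should sum to 1.

```python
import numpy as np

# Hypothetical toy data: three zones, two count columns, and a households total.
counts = {
    'LOW': np.array([10, 30, 25]),
    'HIGH': np.array([30, 30, 75]),
}
households = np.array([40, 60, 100])

# Build each share column under a 'PCT'-prefixed name.
shares = {'PCT' + name: values / households for name, values in counts.items()}

# Sanity check: when the count columns partition the total, shares sum to 1 per zone.
row_sums = sum(shares.values())
print(row_sums)
```

A check like `row_sums` is a cheap way to catch a column divided by the wrong total.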

In [None]:
# Your code here


## Create histograms of the normalized data
**Task:** Create histograms of the tazData_norm columns and use them to answer the following questions:
* About how many TAZs have more than 20% of the population over 65 years old?
* About how many TAZs have a median income less than \$25k in \$1989?
* About how many TAZs have a median income greater than \$75k in \$1989?
* About how many TAZs have more than 50% employment?
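
To read a count off a histogram, add up the heights of the bars to the right of the threshold. The sketch below illustrates this with `np.histogram` on made-up shares (hypothetical values, not tazData), using the same bin edges as the histogram cell earlier, and compares the result with a direct count.

```python
import numpy as np

# Hypothetical shares for eight zones (not taken from tazData).
shares = np.array([0.05, 0.12, 0.18, 0.22, 0.25, 0.31, 0.09, 0.41])

# Same bin edges as the histogram cell earlier in the lab.
bins = [0, .1, .2, .3, .4, .5, .6, .7, .8, .9, 1.]
counts, edges = np.histogram(shares, bins=bins)

# Bars from the 0.2 edge onward are counts[2], counts[3], ...
above_from_bars = counts[2:].sum()

# Direct count for comparison.
above_exact = np.count_nonzero(shares > 0.2)
print(above_from_bars, above_exact)
```

The two numbers agree here because the threshold falls exactly on a bin edge and no value sits on that edge. When the threshold falls inside a bin, the histogram only supports an approximate answer, which is why the questions ask "about how many".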



In [None]:
#Type your answers here


## Part 2 - Giants Game Impact
### Joining datatables & calculating VHT
Imagine that 5% of the total population of the SF, Oakland, Berkeley, and Marin area travels to AT&T Park (uniformly, i.e. 5% from each TAZ). Compute the **total vehicle hours traveled (VHT)**, assuming every traveler drives alone. We'll imagine that the traffic is similar to the AM commute, so you can use the data/sf_oak_TimeSkims_AM.csv file from minilab 3.

You can use the Table.join() method to [join](http://data8.org/datascience/_autosummary/datascience.tables.Table.join.html?highlight=join#datascience.tables.Table.join) the tables.

Note that the Giants' stadium is located in TAZ 110, so we need the travel time from each origin TAZ to the stadium, i.e. the rows of the skims table with dest = 110.
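
The idea behind the join and the VHT sum can be sketched in plain Python with made-up numbers. Everything here is a labeled assumption: the zone ids, populations, and travel times are hypothetical, and the times are assumed to be in minutes (check the units of the skims file before converting). Like Table.join, the dictionary comprehension keeps only keys that appear in both tables.

```python
# Hypothetical toy tables; the values are assumptions, not the lab's data.
population = {1: 1000, 2: 2000, 3: 500}        # zone -> total population
time_to_stadium = {1: 30.0, 2: 12.0, 4: 45.0}  # origin zone -> minutes to the stadium

ATTENDANCE_SHARE = 0.05  # 5% of each zone travels to the game

# Inner join on zone id: keep only zones present in both tables.
joined = {
    zone: (population[zone], time_to_stadium[zone])
    for zone in population
    if zone in time_to_stadium
}

# One vehicle per traveler; dividing minutes by 60 converts to hours.
vht = sum(ATTENDANCE_SHARE * pop * minutes / 60 for pop, minutes in joined.values())
print(round(vht, 2))
```

Zones 3 and 4 drop out of `joined` because each appears in only one table, so it is worth comparing row counts before and after a join to see how many zones were lost.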

In [None]:
#your code here


In [None]:
#Your answer here:
# Total VHT = 

### Joining datatables & calculating VMT
Using the same scenario as above (5% of the total population of the SF, Oakland, Berkeley, and Marin area travels to AT&T Park, with 5% coming from each TAZ), compute the **total vehicle miles traveled (VMT)** to get to AT&T Park, assuming every traveler drives alone.

In [None]:
#your code here


In [None]:
#Your answer here:
# Total VMT = 
