# Mini-lab 2

### Before we start
Keep in mind that *code is read much more often than it is written*. 
Your code should be optimized for readability and follow the [Python style guide (PEP 8)](https://www.python.org/dev/peps/pep-0008/).

# Looking at Census Data
In this lab we will take a look at some census data. A [census](https://en.wikipedia.org/wiki/Census) is the procedure of systematically acquiring and recording information about the members of a given population. The U.S. is required to take a census every 10 years. Information on race, ethnicity, age, household size, family size, etc. is recorded per [census tract](https://en.wikipedia.org/wiki/Census_tract).

In this lab we will look at the distribution of population age for a few Berkeley census tracts.

### First some imports
Click in the box below and press 'Shift'+'Enter' to run the code.

In [None]:
import numpy as np
from datascience import *

%matplotlib inline

### Next, reading the data

Sites like http://census.ire.org/ provide a nice interface for downloading census data, but we have already downloaded and cleaned the relevant data for you. Read in the CSV file below to see what the data looks like.

In [None]:
data = Table.read_table('bay_area_census_age.csv')
 
data

### About the data
In the table above, we have total population, male population, female population, and population by age group for all the census tracts in the Bay Area.

In addition we have some **geographic properties** of each census tract including the land area, water area, and latitude and longitude coordinates for a point inside the census tract.

Below is a map with the Bay Area census tracts highlighted in blue.

<img src="CA_census_tracts.jpg" width="500">

### Using a function

Don't be scared by the code below; we will get into the details later in the course.
For now, just recognize that distance_on_sphere() is a *function* that computes the distance between two (latitude, longitude) coordinates. rotate_table() is another function we will need later in the lab. Press 'Shift' + 'Enter' to run the code in the box below. We will use these functions shortly.

#### Computing the distance on a sphere aka great circle distance
For more detail please see https://en.wikipedia.org/wiki/Great-circle_distance
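
For reference, the function in the next cell appears to use the spherical law of cosines. As a sketch in the code's own notation, take $\phi$ to be the polar angle (90° minus latitude) and $\theta$ the longitude, both in radians. The symbols $c$ (the central angle between the two points), $d$ (the distance), and $R$ (the Earth's radius, taken as 3960 miles in the code) are names introduced here for illustration:

$$
\cos c = \sin\phi_1 \sin\phi_2 \cos(\theta_1 - \theta_2) + \cos\phi_1 \cos\phi_2, \qquad d = R\,c
$$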


In [None]:
def distance_on_sphere(lat1, lon1, lat2, lon2):
    """ Computes distance (in miles) on the surface of the Earth
        between two locations
        
        Args:
            lat1 (float): latitude of the first location
            lon1 (float): longitude of the first location
            lat2 (float): latitude of the second location
            lon2 (float): longitude of the second location
        
        Returns:
            Distance in miles (float)
    """
    # Convert latitude and longitude to spherical coordinates in radians.
    degrees_to_radians = np.pi/180.0
        
    # phi = 90 - latitude
    phi1 = (90.0 - lat1)*degrees_to_radians
    phi2 = (90.0 - lat2)*degrees_to_radians
        
    # theta = longitude
    theta1 = lon1*degrees_to_radians
    theta2 = lon2*degrees_to_radians
        
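    # Spherical law of cosines: cosine of the arc between the two points.
    # Clip to [-1, 1] so rounding error cannot push arccos out of its domain.
    cos_arc = (np.sin(phi1)*np.sin(phi2)*np.cos(theta1-theta2)+
               np.cos(phi1)*np.cos(phi2))
    arc = np.arccos(np.clip(cos_arc, -1.0, 1.0))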

    # Multiply arc by the radius of the earth to get length.
    return 3960.*arc  # to get distance in miles

def rotate_table(table):
    '''Transforms a one-row table with n columns into an n-row table with two columns, "Columns" and "Values"'''
    return Table().with_columns(['Columns', list(table.labels),
                                 'Values', list(table.to_array()[0])])
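
A quick way to build trust in a distance function is to test it on cases where the answer is known. The cell below is a minimal sketch, not part of the lab's required code: `great_circle_miles` is a hypothetical, compact stand-in for the same formula, written so the cell runs on its own with only NumPy. It checks that the North Pole to the equator is a quarter of a great circle and that a point is essentially zero miles from itself.

```python
import numpy as np

EARTH_RADIUS_MILES = 3960.0  # same radius the lab's function uses


def great_circle_miles(lat1, lon1, lat2, lon2):
    """Hypothetical compact version of the spherical law of cosines."""
    # Polar angles (90 degrees minus latitude) and longitudes, in radians.
    phi1, phi2 = np.radians(90.0 - lat1), np.radians(90.0 - lat2)
    theta1, theta2 = np.radians(lon1), np.radians(lon2)
    cos_arc = (np.sin(phi1) * np.sin(phi2) * np.cos(theta1 - theta2)
               + np.cos(phi1) * np.cos(phi2))
    # Clip so rounding error cannot push the value outside arccos's domain.
    return EARTH_RADIUS_MILES * np.arccos(np.clip(cos_arc, -1.0, 1.0))


# North Pole to the equator is a quarter of a great circle.
pole_to_equator = great_circle_miles(90.0, 0.0, 0.0, 0.0)
quarter_circle = EARTH_RADIUS_MILES * np.pi / 2

# A point compared with itself should be (essentially) zero miles away.
same_point = great_circle_miles(37.867495, -122.257617, 37.867495, -122.257617)

print(pole_to_equator, quarter_circle, same_point)
```

The first two printed values should agree closely, and the third should be very close to zero.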

### Find the census tract closest to the Channing-Bowditch apartments (just South of Campus)
Now we will use the distance_on_sphere() function to find the census tract closest to the Channing-Bowditch apartments. From [Google Maps](https://goo.gl/maps/5xudrVbixun) we learn that the apartments are located at 37.867495, -122.257617 (lat, lon). We use the .apply() method to calculate the distance between each census tract and the Channing-Bowditch apartments.

In [None]:
# Find the tract closest to 37.867495, -122.257617 (Channing-Bowditch apartments): https://goo.gl/maps/5xudrVbixun
lat1, lon1 = 37.867495, -122.257617

# calculate the distance from the Channing-Bowditch apartments to each tract. Save this in the data table 
# in a column labeled 'distance to Channing'
data['distance to Channing'] = data.apply(lambda lat2, lon2 : distance_on_sphere(lat1, lon1, lat2, lon2), 
                                          ['INTPTLAT10', 'INTPTLON10'])

#select the row where 'distance to Channing' is minimum. 
# This is the closest census tract to the Channing Apartments
channing_tract = data.where(data['distance to Channing'] == min(data['distance to Channing']))

#let's take a look at what this looks like.
channing_tract
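
The "keep the row with the smallest distance" step can be illustrated on a tiny example. This is a sketch with made-up numbers: `tract_ids` and `distances` are hypothetical and do not come from the census file. It compares the boolean-mask approach used above with NumPy's `argmin`.

```python
import numpy as np

# Hypothetical tract labels and made-up distances in miles.
tract_ids = np.array(['A', 'B', 'C', 'D'])
distances = np.array([2.4, 0.3, 1.1, 0.9])

# Boolean mask: True wherever the distance equals the minimum.
is_closest = distances == distances.min()
closest_by_mask = tract_ids[is_closest]

# argmin: the position of the first minimum.
closest_by_index = tract_ids[np.argmin(distances)]

print(is_closest)
print(closest_by_mask, closest_by_index)
```

One difference to keep in mind: the mask keeps every row that ties for the minimum, while `argmin` returns only the first one.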

### Create a horizontal bar graph of population vs. age group
We can use the barh function to create a horizontal bar graph. The function needs the values to be in a single column, but right now they are all in one row. We will use the rotate_table() function (defined above) to rotate the table, and save the result in a variable called tograph.

In [None]:
tograph = channing_tract.select(['Under 5 years', '5 to 9 years', '10 to 14 years',
                                 '15 to 19 years','20 to 24 years','25 to 29 years',
                                 '30 to 34 years','35 to 39 years','40 to 44 years',
                                 '45 to 49 years','50 to 54 years','55 to 59 years',
                                 '60 to 64 years','65 to 69 years','70 to 74 years',
                                 '75 to 79 years','80 to 84 years','85 years and over'])
tograph = rotate_table(tograph)
tograph
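
If it is unclear what "rotating" a one-row table means, the idea can be sketched with plain Python lists. The labels below are three of the age groups, and the counts in `row` are made up for illustration, not taken from the census file.

```python
# One "wide" record: a list of column labels and a single row of values.
labels = ['Under 5 years', '5 to 9 years', '10 to 14 years']
row = [12, 8, 5]  # hypothetical counts

# Rotating pairs each label with its value: one (label, value) row per column.
long_rows = list(zip(labels, row))

for label, value in long_rows:
    print(f'{label:>15}: {value}')
```

Each original column becomes one row, which is the layout a bar graph needs: one bar per row.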

### Run the code below to create the bar graph.

In [None]:
tograph.relabel('Columns', 'Age group')
tograph.relabel('Values', 'Count')
tograph.barh('Age group')

**Question 1:** What can we say about the data plotted above? Which age groups have the highest population? Do you think this is representative of the population of the rest of the Bay Area?

In [None]:
# Answer here:



### Another South Berkeley Location
Let's see what the population looks like farther south in Berkeley, near the Oakland-Berkeley border. There's a Whole Foods at the [corner of Ashby and Telegraph](https://goo.gl/maps/xNXp4XgtbN12). Let's repeat the procedure above to find the closest census tract to the Whole Foods and create a bar graph of the population at this location.

In [None]:
#Whole Foods at Ashby and Telegraph, 37.858636,-122.2620359 https://goo.gl/maps/xNXp4XgtbN12
lat1, lon1 = 37.858636,-122.2620359

# calculate the distance from the Whole Foods to each census tract. Save this in the data table 
# in a column labeled 'distance to Whole Foods'
data['distance to Whole Foods'] = data.apply(lambda lat2, lon2 : distance_on_sphere(lat1, lon1, lat2, lon2), 
                                             ['INTPTLAT10', 'INTPTLON10'])

# select the row where 'distance to Whole Foods' is minimum. This is the closest census tract to the Whole Foods
wholefood_tract = data.where(data['distance to Whole Foods'] == min(data['distance to Whole Foods']))

# create a bar graph of the population by age.
tograph = wholefood_tract.select(['Under 5 years', '5 to 9 years', '10 to 14 years',
                                 '15 to 19 years','20 to 24 years','25 to 29 years',
                                 '30 to 34 years','35 to 39 years','40 to 44 years',
                                 '45 to 49 years','50 to 54 years','55 to 59 years',
                                 '60 to 64 years','65 to 69 years','70 to 74 years',
                                 '75 to 79 years','80 to 84 years','85 years and over'])

tograph = rotate_table(tograph)
tograph.relabel('Columns', 'Age group')
tograph.relabel('Values', 'Count')
tograph.barh('Age group')

**Question 2:** Comment on how the population data for this census tract looks different from the population data for the Channing-Bowditch apartments census tract. What might explain the differences in the age demographics of the two census tracts?

In [None]:
#Answer here


### What about the Berkeley Hills?
Let's look at one more Berkeley census tract. [Remillard Park](https://goo.gl/maps/3kCQkTDHjb32) is located in the Berkeley Hills, northeast of campus.

**Question 3:** How do you expect the population data in this census tract to compare to the others?

In [None]:
# Answer here


Run the code below to find out.

In [None]:
# Remillard Park, Berkeley Hills: 37.8892735,-122.2616268 https://goo.gl/maps/3kCQkTDHjb32
lat1, lon1 = 37.8892735,-122.2616268

data['distance to Berkeley Hills'] = data.apply(lambda lat2, lon2 : distance_on_sphere(lat1, lon1, lat2, lon2),
                                                ['INTPTLAT10', 'INTPTLON10'])

berkeleyhills_tract = data.where(data['distance to Berkeley Hills'] == min(data['distance to Berkeley Hills']))

tograph = berkeleyhills_tract.select(['Under 5 years', '5 to 9 years', '10 to 14 years',
                                      '15 to 19 years','20 to 24 years','25 to 29 years',
                                      '30 to 34 years','35 to 39 years','40 to 44 years',
                                      '45 to 49 years','50 to 54 years','55 to 59 years',
                                      '60 to 64 years','65 to 69 years','70 to 74 years',
                                      '75 to 79 years','80 to 84 years','85 years and over'])

tograph = rotate_table(tograph)
tograph.relabel('Columns', 'Age group')
tograph.relabel('Values', 'Count')
tograph.barh('Age group')

**Question 4:** How does the age distribution in this census tract compare to the age distribution in the previous two census tracts? Does this conform to your expectations?

In [None]:
# Answer here


### If time allows
Pick another Bay Area location, look up the lat, lon coordinates and see if you can replicate the procedure above to find the closest census tract and plot the population vs. age for this census tract. Comment on your findings.

In [None]:
# Your code here
