## Week 1 Class activities
This notebook is a starting point for the exercises and activities that we'll do in class.

Before you attempt any of these activities, make sure to watch the Week 1 video lectures.

### Using the `requests` library to query an API
Here's the code that we saw in the video lecture that queries BART for real-time arrivals.

In [None]:
import json
import pandas as pd
import requests

APIkey = 'MW9S-E7SL-26DU-VV8V'  # the key posted on BART's website
station = 'CIVC'
requestString = 'http://api.bart.gov/api/etd.aspx?cmd=etd&orig={}&json=y&key={}'.format(station, APIkey)
r = requests.get(requestString)
d = json.loads(r.text)
etd = d['root']['station'][0]['etd']
print('Trains from {} to {}'.format(station, etd[0]['destination']))
df = pd.DataFrame(etd[0]['estimate'])
df

<div class="alert alert-block alert-info">
<strong>Exercise:</strong> Explore the different objects. What are <strong>r</strong>, <strong>d</strong>, and <strong>etd</strong>? What can you do with them?
</div>

Hint: Use `type()` to find out the type of an object (e.g. `type(r)`), and `?` to pull up the help screen (e.g. `r?`).

You can also tab autocomplete to discover an object's attributes and methods (e.g. `r.` and then `TAB`). 
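
As a rough illustration of the kind of exploration the hint suggests, here is a sketch that uses a small hand-written JSON string rather than a live BART response. Its nesting imitates the `root`, `station`, `etd` path used above, but the values are invented for illustration.

```python
import json

# Hand-written stand-in for r.text; the values are invented, not real BART data
sample_text = '''
{"root": {"station": [{"abbr": "CIVC",
                        "etd": [{"destination": "Daly City",
                                 "estimate": [{"minutes": "3", "platform": "1"},
                                              {"minutes": "12", "platform": "1"}]}]}]}}
'''

d = json.loads(sample_text)          # parse the JSON text into Python objects
etd = d['root']['station'][0]['etd']

print(type(d))                       # the outermost object
print(list(d.keys()))                # keys available at the top level
print(type(etd), len(etd))           # a list, with one entry per destination
print(list(etd[0].keys()))           # keys available for each destination

# dir() lists attributes and methods, much like TAB completion does
public_methods = [name for name in dir(d) if not name.startswith('_')]
print(public_methods)
```

The same calls (`type()`, `.keys()`, `len()`, `dir()`) should work on the real `d` and `etd` objects from the cell above.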

In [None]:
# your code here
r.

Now let's explore the other options and API commands that BART offers. 

The API documentation for the `etd` (real-time information) command is [here](https://api.bart.gov/docs/etd/etd.aspx). 

<div class="alert alert-block alert-info">
<strong>Exercise:</strong> Write a command to retrieve real-time departures for southbound trains at Civic Center station (code: CIVC). Hint: You'll need to add another <strong>&</strong> to <strong>requestString</strong>.
</div>

In [None]:
# suggested solution
import json
import pandas as pd
import requests

APIkey = 'MW9S-E7SL-26DU-VV8V'  # the key posted on BART's website
station = 'CIVC'
requestString = 'http://api.bart.gov/api/etd.aspx?cmd=etd&orig={}&json=y&key={}&dir=s'.format(station, APIkey)
r_2 = requests.get(requestString)
d_2 = json.loads(r_2.text)
etd_2 = d_2['root']['station'][0]['etd'][0]['estimate']
pd.DataFrame(etd_2)



<div class="alert alert-block alert-info">
    <strong>Exercise:</strong> Use the <strong>elev</strong> command to obtain the elevator status at each station, and put it in a dataframe. Optional extension: pass the parameters as a dictionary to requests, as we saw in the video lecture.
</div>

See the API docs [here](https://api.bart.gov/docs/bsa/elev.aspx) for details of that command.
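
For the optional extension, here is a minimal sketch of holding the query parameters in a dictionary. It uses the standard library's `urlencode` only to show the resulting URL without sending a request; `requests.get()` accepts the same dictionary through its `params` argument. The key value is a placeholder.

```python
from urllib.parse import urlencode

base_url = 'http://api.bart.gov/api/bsa.aspx'

# One dictionary entry per query parameter, instead of a long format string
params = {
    'cmd': 'elev',
    'json': 'y',
    'key': 'YOUR_KEY_GOES_HERE',   # placeholder, substitute the APIkey variable
}

# Show the URL that these parameters produce, without touching the network
full_url = base_url + '?' + urlencode(params)
print(full_url)

# With requests, the equivalent live call would be:
# r = requests.get(base_url, params=params)
```

A dictionary makes it easier to add or change one parameter without editing a long string by hand.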

In [None]:
requestString = 'http://api.bart.gov/api/bsa.aspx?cmd=elev&json=y&key={}'.format(APIkey)
r = requests.get(requestString)
d = json.loads(r.text)
pd.DataFrame(d['root']['bsa'])

# Note that there isn't a station-by-station response, so this is a less satisfying
# dataframe than before.
# We could go further, but would need to extract the stations from the description column.
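
One possible way to go further, sketched under heavy assumptions: if each description mentions stations by name, a list of known station names can be matched against the text. The description string and the short station list below are invented for illustration, and the helper name `stations_mentioned` is hypothetical. The real response may be worded differently.

```python
# Hypothetical description text; the real BART wording may differ
description = ('There are two elevators out of service at this time: '
               'Civic Center: Platform Elevator, and Lake Merritt: Street Elevator.')

# A short, hand-picked list of station names to look for (not the full system)
station_names = ['Civic Center', 'Lake Merritt', 'Embarcadero', 'Fruitvale']


def stations_mentioned(text, names):
    """Return the station names that appear in the text, ignoring case."""
    lowered = text.lower()
    return [name for name in names if name.lower() in lowered]


affected = stations_mentioned(description, station_names)
print(affected)
```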

### Accessing census data

Recall that we have seen two ways to access census data:
* The Census Bureau API
* The `cenpy` library

Let's try them both and map patterns of race for Los Angeles County. 

Here's the relevant code that we saw in the video lecture to get the 5-year ACS estimates for total population (variable `B01001_001E`).

In [None]:
import json
import requests
import pandas as pd

r = requests.get('https://api.census.gov/data/2015/acs/acs5?get=B01001_001E&for=county')
censusdata = r.json()
df = pd.DataFrame(censusdata[1:], columns=censusdata[0])
df.head()
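
The code above splits `censusdata[0]` from `censusdata[1:]` because the response is a list of lists: the first inner list holds the column names and the remaining lists hold the rows. Here is a minimal sketch of that structure in plain Python, using a tiny hand-written sample (the numbers are invented, not real estimates).

```python
# Hand-written sample in the same shape as r.json(); the numbers are invented
censusdata = [
    ['B01001_001E', 'state', 'county'],   # first row: column names
    ['1000', '06', '037'],                # remaining rows: one per geography
    ['250', '06', '041'],
]

header, rows = censusdata[0], censusdata[1:]

# Pair each value with its column name
records = [dict(zip(header, row)) for row in rows]
print(records[0])

# pd.DataFrame(rows, columns=header) builds the same table as a dataframe
```

Note that every value, including the population count, arrives as a string.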

<div class="alert alert-block alert-info">
<strong>Exercise:</strong> Retrieve population data at the census tract level for LA County, and put it in a pandas dataframe. (You can use the 5-year ACS if you like.)
</div>

Some examples are given [here](https://api.census.gov/data/2015/acs/acs5/examples.html). 

Note that you don't need the API key for a small number of queries, so you can delete `&key=YOUR_KEY_GOES_HERE` from the examples. 

The FIPS code for California is `06` and for Los Angeles County `037`.
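
To see how the pieces of a tract-level query fit together, here is a sketch that assembles the URL from a dictionary using only the standard library. The variable names `state_fips` and `county_fips` are just illustrative choices. Nothing is sent over the network.

```python
from urllib.parse import quote, urlencode

base_url = 'https://api.census.gov/data/2015/acs/acs5'

state_fips = '06'     # California
county_fips = '037'   # Los Angeles County

params = {
    'get': 'NAME,B01001_001E',
    'for': 'tract:*',                                     # * means every tract
    'in': 'state:{} county:{}'.format(state_fips, county_fips),
}

# quote_via=quote encodes the space as %20, as in the Census examples;
# safe=':*,' leaves the colons, asterisk, and commas readable
query = urlencode(params, safe=':*,', quote_via=quote)
url = base_url + '?' + query
print(url)
```

You can paste the printed URL into a browser to inspect the raw response before loading it into pandas.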

In [None]:
# This is one of the examples from the page linked above:
# https://api.census.gov/data/2015/acs/acs5?get=NAME,B01001_001E&for=tract:020500&in=state:01%20county:001&key=YOUR_KEY_GOES_HERE
# here, we adapt it to get all tracts (*) in state 06 and county 037

r = requests.get('https://api.census.gov/data/2015/acs/acs5?get=NAME,B01001_001E&for=tract:*&in=state:06%20county:037')
censusdata = r.json()
df = pd.DataFrame(censusdata[1:], columns=censusdata[0])
df.head()

<div class="alert alert-block alert-info">
<strong>Exercise:</strong> Retrieve the census data for race/ethnicity for tracts in Los Angeles County, and put it in a pandas dataframe.
</div>

Hints:
* The list of variables is [here](https://api.census.gov/data/2015/acs/acs5/variables.html).
* The data are cross-tabulated by race, age, and gender. If you just want race/ethnicity, then look at the `Estimate!!Total:` variables. For example, `B01001H_001E` gives the total number of non-Hispanic white people, without further disaggregating by gender and age.
* Start with the simplest measure of race. For example, you could calculate the proportion of Black people or non-Hispanic white people in each census tract, by dividing the relevant variable by the total population (which you already retrieved above).
* You can request multiple variables at once - just separate them with commas. For example, `get=NAME,B01001_001E,B01001H_001E`. 
* `NAME` means that you are requesting the description of the geography - this is optional.


In [None]:
# let's calculate the % of residents who are non-Hispanic White
# looking at the list of variables, we need B01001_001E (total) and B01001H_001E (non-Hispanic white)

r = requests.get('https://api.census.gov/data/2015/acs/acs5?get=NAME,B01001_001E,B01001H_001E&for=tract:*&in=state:06%20county:037')
censusdata = r.json()
df = pd.DataFrame(censusdata[1:], columns=censusdata[0])

# as we saw in class, the data are strings
df.info()


In [None]:
# so to calculate the percentage, we first convert to a float.
# integer will not work here as it cannot hold NaN (missing data)

df['pc_nonHispanicWhite'] = df.B01001H_001E.astype(float) / df.B01001_001E.astype(float) * 100 
df.head()
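
The comment above notes that integers cannot hold NaN. A plain-Python sketch of the same arithmetic makes the reasoning concrete. The helper name `percent` is hypothetical, and treating zero-population tracts as NaN is an assumption about how you want to handle them.

```python
import math


def percent(part, total):
    """Percentage of part in total, or NaN when it cannot be computed."""
    if part is None or total is None:
        return float('nan')          # missing value (None after JSON parsing)
    part, total = float(part), float(total)
    if total == 0:
        return float('nan')          # tract with no residents
    return part / total * 100


print(percent('250', '1000'))
print(percent('0', '0'))
print(percent(None, '1000'))
print(math.isnan(percent('0', '0')))
```

NaN is a floating-point value, so a column that may contain it has to be stored as floats rather than integers.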

<div class="alert alert-block alert-info">
    <strong>Exercise:</strong> Now do the same using <strong>cenpy</strong>.
</div>

Here's the relevant example from the lecture. Note that if you want multiple variables, you can pass them as a list. For example: `variables=['B25035_001E','B01001H_001E']`.


In [None]:
# example from lecture
import cenpy
from cenpy import products

# create a connection to the American Community Survey
acs = cenpy.products.ACS()
riverside = products.ACS(2017).from_county('Riverside, CA', level='tract',
                                        variables='B25035_001E')
riverside.head()

In [None]:
# adapting the example to answer the question
import cenpy
from cenpy import products

# create a connection to the American Community Survey
# make it 2015 to match our census API example above
acs = cenpy.products.ACS()
la = products.ACS(2015).from_county('Los Angeles, CA', level='tract',
                                     variables=['B01001H_001E','B01001_001E'])
la['pc_nonHispanicWhite'] = la.B01001H_001E.astype(float) / la.B01001_001E.astype(float) * 100 


la.head()

<div class="alert alert-block alert-info">
<strong>Exercise:</strong> Can you write a function that retrieves population by race for all census tracts in a specified county? (Or a simplified measure of race, such as the proportion of Black people.) 
</div>

Hint: use the code you wrote, but replace the county FIPS code `037` with a variable. Your function can take a single argument, e.g. `countyFIPS`.

In [None]:
# using the census API (not cenpy)
# we could do a cenpy version and pass the county name instead of the FIPS code

def get_nhWhite(countyFIPS):
    # same as above, except 037 is replaced by our countyFIPS variable
    r = requests.get('https://api.census.gov/data/2015/acs/acs5?get=NAME,B01001_001E,B01001H_001E&for=tract:*&in=state:06%20county:{}'.format(countyFIPS))
    censusdata = r.json()
    df = pd.DataFrame(censusdata[1:], columns=censusdata[0])
    df['pc_nonHispanicWhite'] = df.B01001H_001E.astype(float) / df.B01001_001E.astype(float) * 100 

    return df

# for Marin County
get_nhWhite('041')
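
One thing to watch when turning the FIPS code into a function argument: the codes are strings with leading zeros (`06`, `037`), so an integer such as `41` would produce `county:41`, which would likely not match. Here is a small sketch of a helper that pads the code before it goes into the URL. The name `normalize_fips` is hypothetical.

```python
def normalize_fips(code, width):
    """Return a FIPS code as a zero-padded string of the given width."""
    text = str(code).strip()
    if not text.isdigit():
        raise ValueError('FIPS codes contain digits only: {!r}'.format(code))
    if len(text) > width:
        raise ValueError('{!r} is longer than {} digits'.format(code, width))
    return text.zfill(width)


print(normalize_fips(41, 3))      # county code passed as an integer
print(normalize_fips('037', 3))   # already padded, returned unchanged
print(normalize_fips(6, 2))       # state code
```

With this in place, a call like `get_nhWhite(normalize_fips(41, 3))` would behave the same as `get_nhWhite('041')`.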

<div class="alert alert-block alert-info">
<strong>Exercise:</strong> Map your results!
</div>

In [None]:
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
la.plot('pc_nonHispanicWhite', ax = ax, legend=True)

# remove the tick labels
ax.set_xticks([])
ax.set_yticks([])

# set the limits to remove Catalina
# I eyeballed this from the version before removing the tick labels
ax.set_ylim([3.97*1e6, 4.15*1e6])

ax.set_title('Percent non-Hispanic white residents')


### Using Socrata

Here's the example that we saw in the lecture.

In [None]:
import geopandas as gpd
url = 'https://data.lacity.org/resource/mymu-zi3s.geojson'
gdf = gpd.read_file(url)
gdf.plot()

<div class="alert alert-block alert-info">
<strong>Exercise:</strong> Choose another dataset on Socrata, download it using the API, and map the results. 
</div>

The City of Los Angeles datasets are [here](https://data.lacity.org). Feel free to choose another city or county if you prefer.

Some possible datasets of planning-related interest:
* [DACA/DAPA workshops](https://data.lacity.org/Community-Economic-Development/Map2-DACA-DAPA-Workshops/icwt-9z3e) (seems a bit dated)
* [Solar PV permits](https://data.lacity.org/A-Prosperous-City/Solar-PV-Permits-in-LA/bdt7-w2xr)
* [Parks](https://data.lacity.org/Community-Economic-Development/Department-of-Recreation-and-Parks-Facility-and-Pa/ax8j-dhzm)
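
In the links above, the last part of each dataset page URL (for example `bdt7-w2xr`) is the dataset ID that also appears in the `/resource/` URL used with geopandas. Here is a sketch of a helper that converts one into the other, assuming the page URL ends with the ID as it does here. The function name is hypothetical, and the optional `$limit` parameter is Socrata's row-limit setting, which may be useful if a dataset comes back truncated.

```python
from urllib.parse import urlencode, urlparse


def socrata_geojson_url(page_url, limit=None):
    """Build a GeoJSON resource URL from a Socrata dataset page URL."""
    parsed = urlparse(page_url)
    dataset_id = parsed.path.rstrip('/').split('/')[-1]   # e.g. 'bdt7-w2xr'
    url = '{}://{}/resource/{}.geojson'.format(parsed.scheme, parsed.netloc, dataset_id)
    if limit is not None:
        # keep the $ unescaped so the parameter reads as $limit
        url += '?' + urlencode({'$limit': limit}, safe='$')
    return url


page = 'https://data.lacity.org/A-Prosperous-City/Solar-PV-Permits-in-LA/bdt7-w2xr'
print(socrata_geojson_url(page))
print(socrata_geojson_url(page, limit=5000))
```

The resulting string can be passed straight to `gpd.read_file()`.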

In [None]:
# your code here

# let's do solar
url = 'https://data.lacity.org/resource/bdt7-w2xr.geojson'
gdf = gpd.read_file(url)
gdf.plot()

In [None]:
# clean it up a bit
import matplotlib.pyplot as plt
import contextily as ctx

# easier if we pre-create an axis object
fig, ax = plt.subplots(figsize=(10,10))

# project to 3857 to match the basemap
# use the markersize keyword to make the points smaller
gdf.to_crs('EPSG:3857').plot(ax=ax, markersize=2)

# add a basemap
# for more options, see https://contextily.readthedocs.io/en/latest/providers_deepdive.html
# I chose OpenStreetMap.Mapnik
ctx.add_basemap(ax=ax, zoom=12, source=ctx.providers.OpenStreetMap.Mapnik)

# drop the tick labels
ax.set_xticks([])
ax.set_yticks([])

# add a title
ax.set_title('Solar PV permits, City of Los Angeles')

<div class="alert alert-block alert-info">
<h3>What you should have learned</h3>
<ul>
  <li>Gain confidence in experimenting with code: exploring different objects, writing functions, and so on.</li>
  <li>Learn how to read API documentation and adapt the examples to create your own queries.</li>
  <li>Gain confidence in mapping the results. We'll practice this much more throughout the quarter.</li>
</ul>
</div>