# Getting census data

## Lecture objectives

1. Gain more experience with APIs and `requests`
2. Learn specialized ways to access census data
3. Practice with `pandas`, `geopandas`, and plotting


Traditionally, if you wanted census data, you had to download .csv or other files and decipher them. More recently, the Census Bureau introduced an API. [See the documentation here](https://www.census.gov/data/developers/guidance/api-user-guide.Example_API_Queries.html).

If you make more than 500 queries a day, you'll need to register for a (free) [API key](https://www.census.gov/data/developers/guidance/api-user-guide.Help_&_Contact_Us.html) from the Census Bureau.

Let's download population by county from the 2015 American Community Survey five-year estimates. We see from the documentation that the API call takes the following form:

`https://api.census.gov/data/YEAR/acs/DATASET?get=VARIABLE&for=GEOGRAPHY`

So getting the population (variable `B01001_001E`) for all counties is:

`https://api.census.gov/data/2015/acs/acs5?get=B01001_001E&for=county`

Try this in your browser.

Now let's get it into Python.

In [None]:
import requests
r = requests.get('https://api.census.gov/data/2015/acs/acs5?get=B01001_001E&for=county')
print(type(r.text))
# This time, it looks like the data come in as a string
print(r.text)
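
The URL pattern above can also be assembled from its parts, which helps when you want to change the year or variable without retyping the whole string. The following is a minimal sketch: `build_census_url` is a hypothetical helper written for this lecture, not part of `requests` or any census library, and it assumes the `/acs/` path shown in the pattern. The optional `key` parameter is how the API accepts a registered API key.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.census.gov/data"

def build_census_url(year, dataset, variables, geography, api_key=None):
    """Assemble a Census API query URL (hypothetical helper, not a library function)."""
    params = {"get": ",".join(variables), "for": geography}
    if api_key is not None:
        params["key"] = api_key
    # keep commas, colons and asterisks readable instead of percent-encoding them
    query = urlencode(params, safe=",:*")
    return f"{BASE_URL}/{year}/acs/{dataset}?{query}"

url = build_census_url(2015, "acs5", ["B01001_001E"], "county")
print(url)
```

The resulting string can be passed straight to `requests.get(url)`.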

In [None]:
# But it turns out that it's actually JSON
censusdata = r.json()
type(censusdata)

In [None]:
# The JSON format is a list of lists. The first sublist holds the column headers
censusdata[:5] # show the header plus the first four rows

In [None]:
# So we can also convert this to a pandas dataframe, if we use the first list as the column names
# Note that the state and county are shown by their FIPS codes
import pandas as pd
df = pd.DataFrame(censusdata[1:], columns=censusdata[0])
df
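
One thing to watch for: the values in this kind of response typically arrive as strings, so the population column needs converting before you can do arithmetic on it. Here is a small sketch of that step, using made-up placeholder rows in the same shape as the API response (they are not real census figures) so that it runs without a network connection. The `fips` column name is just an illustrative choice.

```python
import pandas as pd

# made-up placeholder rows in the same shape as the API response (not real census figures)
sample = [
    ["B01001_001E", "state", "county"],
    ["1000", "01", "001"],
    ["2500", "01", "003"],
]
df_sample = pd.DataFrame(sample[1:], columns=sample[0])

# the counts are strings here, so convert them before doing arithmetic
df_sample["B01001_001E"] = pd.to_numeric(df_sample["B01001_001E"])

# state + county FIPS codes concatenate into a 5-digit county identifier
df_sample["fips"] = df_sample["state"] + df_sample["county"]

print(df_sample.dtypes)
print(df_sample["B01001_001E"].sum())
```

Keeping the FIPS codes as strings preserves their leading zeros, which matters when you later join to other datasets.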

Let's rename the population column to something more meaningful. `pandas` dataframes have a helpful `rename` method.

In [None]:
df.rename?

In [None]:
# note the inplace keyword changes the dataframe in place, rather than returning a copy
df.rename(columns = {'B01001_001E':'population'}, inplace=True)
df
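
To see what `inplace` changes, here is a short sketch on a tiny stand-in dataframe (the values are placeholders). Without `inplace=True`, `rename` returns a new dataframe and leaves the original untouched, so you need to assign the result to a name.

```python
import pandas as pd

# a tiny stand-in frame with placeholder values
toy = pd.DataFrame({"B01001_001E": [1000, 2500]})

# without inplace=True, rename returns a new dataframe
renamed = toy.rename(columns={"B01001_001E": "population"})

print(list(toy.columns))      # the original keeps its column name
print(list(renamed.columns))  # the copy has the new name
```

Either style works; assigning the result is often easier to follow when you chain several steps together.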

### Using cenpy
It turns out that there is an easier way to get census data. Rather than building the API calls ourselves, we can let the `cenpy` package make them for us.

In [None]:
import cenpy
from cenpy import products

# create a connection to the American Community Survey
acs = cenpy.products.ACS()

The [online documentation](https://cenpy-devs.github.io/cenpy/api.html#product-american-community-survey) is helpful in showing the functions that are available. We could also call `help(acs)` or just `acs?`.

The `tables` attribute seems useful, as do the `filter_tables` and `from_county` functions.

In [None]:
# what tables are available?
acs.tables?

In [None]:
# Let's map the age of the housing stock
# get all the tables that have "BUILT" in their description
acs.filter_tables('BUILT', by='description')

In [None]:
# It looks like table B25035 and variable B25035_001E are promising. Let's see what is here in Riverside County
riverside = products.ACS(2017).from_county('Riverside, CA', level='tract',
                                        variables='B25035_001E')

# you might get a bunch of FutureWarnings, but you can ignore these

In [None]:
# It looks like cenpy gives us a geopandas dataframe
type(riverside)

In [None]:
riverside.head()

In [None]:
# let's rename the census column to something more memorable
riverside.rename(columns={'B25035_001E':'Median year built'}, inplace=True)

In [None]:
riverside.head()

`GEOID` gives the standard census FIPS code, formatted as 2-digit state + 3-digit county + 6-digit tract. Read more about FIPS codes [here](https://www.policymap.com/2012/08/tips-on-fips-a-quick-guide-to-geographic-place-codes-part-iii/).
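
As a quick illustration of that layout, the sketch below slices a tract `GEOID` into its three parts. `split_tract_geoid` is a hypothetical helper, not something `cenpy` provides, and the tract digits in the example are a placeholder rather than a real tract.

```python
def split_tract_geoid(geoid):
    """Split an 11-digit tract GEOID into its parts (hypothetical helper)."""
    if len(geoid) != 11 or not geoid.isdigit():
        raise ValueError(f"expected 11 digits, got {geoid!r}")
    return {"state": geoid[:2], "county": geoid[2:5], "tract": geoid[5:]}

# 06 is California and 065 is Riverside County; the tract digits are a placeholder
parts = split_tract_geoid("06065000000")
print(parts)
```

Note that the `GEOID` must be handled as a string, since converting it to an integer would drop the leading zero.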

`cenpy` also returns the geographic boundaries of each census tract as a polygon. This is helpful! And it means that we can plot the data pretty simply.

Here, we use the standard `geopandas` plotting function. We first create a matplotlib figure and `ax` object, then tell `geopandas` to plot the `Median year built` column on that `ax`.

In [None]:
import matplotlib.pyplot as plt 

# create a matplotlib figure and axis object
fig, ax = plt.subplots(1,1,figsize=(20,10))

riverside.plot('Median year built', ax=ax, cmap='plasma', legend=True, 
               legend_kwds={'orientation': 'horizontal'})
ax.set_facecolor('k')

There is much that we could do to improve this map, but let's save that for another time. In general, the best course is to follow the numerous examples for `geopandas` that you'll find online.

<div class="alert alert-block alert-info">
<h3>Key Takeaways</h3>
<ul>
  <li>Getting census data is one of the most common tasks you'll do in this course.</li>
  <li>The Census Bureau has a well-documented API that may be useful for more specialized queries.</li>
  <li>For simple queries, cenpy is a good alternative.</li>
</ul>
</div>