In [1]:
import glob
import json
import requests
import pandas as pd
from pprint import pprint

# Census Examples 

This notebook pulls data from the US Census in Python using DataMade's Census API wrapper:
- https://github.com/datamade/census
- https://pypi.org/project/census/

> 💡 note: You may also want to check out [tidycensus in R](https://walker-data.com/tidycensus/). I might make a notebook later that does the same with that library.

### Step 1 | Get a Census API key and paste it into the cell below

In [2]:
from us import states # US state abbreviations, and a few other things

# datamade's Census package
from census import Census
c = Census("YOUR-CENSUS-API-KEY-HERE")
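Rather than pasting the key into the notebook, one option is to read it from an environment variable so it never ends up in version control. A minimal sketch, assuming you have exported the key under the name `CENSUS_API_KEY`; both that variable name and the `load_census_key` helper are my own choices for illustration, not something the census package requires:

```python
import os

def load_census_key(env_var="CENSUS_API_KEY"):
    """Return the Census API key stored in an environment variable.

    The helper and the default variable name are hypothetical.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable to your Census API key")
    return key

# Usage in the notebook, once the variable is set:
# c = Census(load_census_key())
```

Failing loudly when the variable is missing is a deliberate choice here: it is easier to debug than a rejected request later on.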

### Step 2 | Figure out what tables you want data from

Use https://censusreporter.org/ to figure out which tables you want. 
- Scroll to the bottom of the page to see the tables. 
- If you already know the table ID, stick that in the "Explore" section to learn more about that table.

In [3]:
TABLE = 'B01003' #population

# Here I use data from the 5-year American Community Survey (ACS).
# In this Python package, that is "c.acs5".
# Check DataMade's documentation for other options like acs1, pl, sf1, etc.
# https://pypi.org/project/census/
for t in c.acs5.tables():
    if TABLE in t['name']:
        pprint(t)
        print("\n")

        variables_url = t['variables']
        response = requests.get(variables_url).json()
        print(f"Variables for table {t['name']}, {t['description']}:")
        variables = pd.DataFrame(response['variables'])
        display(variables)

{'description': 'TOTAL POPULATION',
 'name': 'B01003',
 'universe ': 'TOTAL_POP',
 'variables': 'http://api.census.gov/data/2020/acs/acs5/groups/B01003.json'}


Variables for table B01003, TOTAL POPULATION:


| | B01003_001E | B01003_001M | B01003_001MA | B01003_001EA |
|---|---|---|---|---|
| label | Estimate!!Total | Margin of Error!!Total | Annotation of Margin of Error!!Total | Annotation of Estimate!!Total |
| concept | TOTAL POPULATION | TOTAL POPULATION | TOTAL POPULATION | TOTAL POPULATION |
| predicateType | int | int | string | string |
| group | B01003 | B01003 | B01003 | B01003 |
| limit | 0 | 0 | 0 | 0 |
| predicateOnly | True | True | True | True |
| universe | TOTAL_POP | TOTAL_POP | TOTAL_POP | TOTAL_POP |
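
The loop above matches any table whose name merely contains `TABLE`, so an ID like `B01001` would also pick up `B01001A` and its siblings. If you only want one table, an exact comparison is a small change. A sketch, assuming `c.acs5.tables()` returns a list of dictionaries shaped like the one printed above; `find_table` is a hypothetical helper, not part of the census package, and the sample entries are made up for illustration:

```python
def find_table(tables, table_id):
    """Return the first table whose 'name' equals table_id, or None."""
    for table in tables:
        if table.get("name") == table_id:
            return table
    return None

# Sample entries in the shape shown in the output above (the second one is invented)
sample_tables = [
    {"name": "B01003", "description": "TOTAL POPULATION"},
    {"name": "B01003X", "description": "HYPOTHETICAL LOOKALIKE"},
]
match = find_table(sample_tables, "B01003")
print(match["description"])

# Usage in the notebook:
# t = find_table(c.acs5.tables(), TABLE)
```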


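In the cell below, I get the population by ZIP code (technically, ZIP Code Tabulation Area, or ZCTA) in New York State from the 2019 5-year ACS. The variable list above was fetched from the 2020 endpoint (see the URL in the output), while this query asks for 2019. That works here because the same B01003 variables come back for 2019, but double-check it when you use other tables.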

If you want the data for a different geography, such as county, census tract or "place" (NYC is a census "place"), see the documentation for the Census package:
https://pypi.org/project/census/

You may want to use `state_place`, `state_county` or another of the package's geography functions instead.
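
As a rough illustration of swapping the geography, here is a sketch that asks for every county in a state through `state_county`. It assumes `client` is a `census.Census` instance like `c` above; the `county_population` wrapper is a hypothetical helper of mine, and I hard-code the two B01003 fields rather than reading them from `variables`:

```python
def county_population(client, state_fips, year=2019):
    """Fetch total population (estimate and margin of error) for every county in a state.

    `client` is expected to be a census.Census instance; this wrapper is hypothetical.
    """
    return client.acs5.state_county(
        fields=["NAME", "B01003_001E", "B01003_001M"],
        state_fips=state_fips,
        county_fips="*",  # wildcard: all counties in the state
        year=year,
    )

# Usage, reusing `c`, `states` and `pd` from the cells above:
# counties = pd.DataFrame(county_population(c, states.NY.fips))
```

Passing the client in as an argument keeps the helper easy to try with a stand-in object before spending real API calls.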

In [17]:
year = 2019
state = states.NY
population = pd.DataFrame(
    c.acs5.state_zipcode(
        fields = ['NAME'] + list(variables.columns),
        state_fips = state.fips, 
        year = year,
        zcta='*',
        table=[TABLE]))\
    .rename(columns={'zip code tabulation area':'zip'})

population['state'] = population.state.apply(lambda x: states.lookup(x).name)
population = population[['zip', 'state', 'NAME'] + list(variables.columns)]
population

| | zip | state | NAME | B01003_001E | B01003_001M | B01003_001MA | B01003_001EA |
|---|---|---|---|---|---|---|---|
| 0 | 10924 | New York | ZCTA5 10924 | 13508.0 | 462.0 | | |
| 1 | 11366 | New York | ZCTA5 11366 | 14360.0 | 857.0 | | |
| 2 | 11364 | New York | ZCTA5 11364 | 36215.0 | 1308.0 | | |
| 3 | 13803 | New York | ZCTA5 13803 | 4243.0 | 329.0 | | |
| 4 | 11374 | New York | ZCTA5 11374 | 43507.0 | 1626.0 | | |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 1789 | 14063 | New York | ZCTA5 14063 | 14302.0 | 474.0 | | |
| 1790 | 13807 | New York | ZCTA5 13807 | 1163.0 | 199.0 | | |
| 1791 | 12507 | New York | ZCTA5 12507 | 250.0 | 64.0 | | |
| 1792 | 12858 | New York | ZCTA5 12858 | 116.0 | 61.0 | | |


Great! We have the data. But the column headers are Census variable codes, and I want to replace them with more human-readable labels.

Let's grab those from the variables response we got earlier.

In [18]:
labels = dict(variables.loc['label'])
labels

{'B01003_001E': 'Estimate!!Total',
 'B01003_001M': 'Margin of Error!!Total',
 'B01003_001MA': 'Annotation of Margin of Error!!Total',
 'B01003_001EA': 'Annotation of Estimate!!Total'}

In [19]:
# rename() returns a new DataFrame; assign the result if you want to keep these headers
population.rename(columns=labels)

| | zip | state | NAME | Estimate!!Total | Margin of Error!!Total | Annotation of Margin of Error!!Total | Annotation of Estimate!!Total |
|---|---|---|---|---|---|---|---|
| 0 | 10924 | New York | ZCTA5 10924 | 13508.0 | 462.0 | | |
| 1 | 11366 | New York | ZCTA5 11366 | 14360.0 | 857.0 | | |
| 2 | 11364 | New York | ZCTA5 11364 | 36215.0 | 1308.0 | | |
| 3 | 13803 | New York | ZCTA5 13803 | 4243.0 | 329.0 | | |
| 4 | 11374 | New York | ZCTA5 11374 | 43507.0 | 1626.0 | | |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 1789 | 14063 | New York | ZCTA5 14063 | 14302.0 | 474.0 | | |
| 1790 | 13807 | New York | ZCTA5 13807 | 1163.0 | 199.0 | | |
| 1791 | 12507 | New York | ZCTA5 12507 | 250.0 | 64.0 | | |
| 1792 | 12858 | New York | ZCTA5 12858 | 116.0 | 61.0 | | |
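
The renamed headers still carry the Census `!!` separator between label levels. If you would like something friendlier, one option is to tidy the labels before renaming. A small sketch; `tidy_label` is a hypothetical helper, and the `": "` joiner is just my preference:

```python
def tidy_label(label):
    """Turn a Census label such as 'Estimate!!Total' into 'Estimate: Total'."""
    # Split on the '!!' level separator and drop any trailing colon on a part
    parts = [part.strip().rstrip(":") for part in label.split("!!")]
    return ": ".join(part for part in parts if part)

# Two of the labels from the `labels` dictionary shown earlier
sample_labels = {
    "B01003_001E": "Estimate!!Total",
    "B01003_001M": "Margin of Error!!Total",
}
tidy_labels = {code: tidy_label(text) for code, text in sample_labels.items()}
print(tidy_labels)

# Usage in the notebook:
# population.rename(columns={code: tidy_label(text) for code, text in labels.items()})
```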


# Hope that helps!