# US Census Bureau API - Population Walkthrough

## Overview
| Detail Tag            | Information                                                                                        |
|-----------------------|----------------------------------------------------------------------------------------------------|
| Originally Created By | Ariel Herrera arielherrera@analyticsariel.com                                                      |
| External References   | <a href="https://www.census.gov/data/developers/data-sets/popest-popproj/popest.html" target="_blank">(1) US Census API - Population</a>|
| Input Datasets        | API key, Address params                                                                                    |
| Output Datasets       | Table with population                                              |
| Input Data Source     | API                                                                                                |
| Output Data Source    | Dataframe                                                                                                   |

## History
| Date         | Developed By  | Reason                                                |
|--------------|---------------|-------------------------------------------------------|
| 25th Jan 2021 | Ariel Herrera | Notebook created to get population data from Census API. |

## Other Details
This Notebook is a prototype.

## <font color="blue">Imports</font>

In [67]:
import requests  # HTTP requests to the API
import pandas as pd  # tabular data
import plotly.express as px  # interactive charts

## <font color="blue">Local and Constant Variables</font>

In [3]:
pd.options.display.max_columns = None # show all columns in display

In [4]:
# read the API keys file and assign variables
df = pd.read_csv('../data/input/api_keys.csv')
census_api_key = df.loc[df['API'] == 'census', 'KEY'].iloc[0]
rapid_api_key = df.loc[df['API'] == 'rapid', 'KEY'].iloc[0]
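The lookup above implies a keys file with an `API` column and a `KEY` column. A minimal sketch of that layout follows, assuming the file has exactly those two columns; the in-memory CSV, the placeholder key values, and the helper `get_api_key` are hypothetical and only illustrate the lookup with a readable error when a key is missing.

```python
import io

import pandas as pd

# Hypothetical stand-in for ../data/input/api_keys.csv; the keys are placeholders.
sample_csv = io.StringIO(
    "API,KEY\n"
    "census,DEMO_CENSUS_KEY\n"
    "rapid,DEMO_RAPID_KEY\n"
)
df_keys = pd.read_csv(sample_csv)


def get_api_key(df_keys, api_name):
    """Return the key stored for api_name, or raise a readable error."""
    matches = df_keys.loc[df_keys["API"] == api_name, "KEY"]
    if matches.empty:
        raise KeyError(f"No API key found for '{api_name}'")
    return matches.iloc[0]


census_key = get_api_key(df_keys, "census")
print(census_key)
```

Selecting the row and the column in a single `.loc` call avoids chained indexing, and the explicit check replaces the bare `IndexError` that `.iloc[0]` would raise on an empty match.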

## <font color="blue">Functions</font>

In [9]:
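def json_to_dataframe(response):
    """
    Convert a Census API response to a dataframe.
    The first row of the JSON payload holds the column names.
    """
    data = response.json()  # parse the payload once
    return pd.DataFrame(data[1:], columns=data[0])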
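To see what the helper does without calling the API, the sketch below feeds it a small payload shaped like the responses shown later in this notebook: a list of rows whose first row holds the column names. `FakeResponse` is a hypothetical stand-in for `requests.Response` and implements only `.json()`.

```python
import pandas as pd


class FakeResponse:
    """Hypothetical stand-in for requests.Response; only .json() is implemented."""

    def __init__(self, payload):
        self._payload = payload

    def json(self):
        return self._payload


def json_to_dataframe(response):
    """Convert a Census-style payload (header row first) to a dataframe."""
    data = response.json()
    return pd.DataFrame(data[1:], columns=data[0])


# Payload in the same shape the Census API returns: header row, then data rows.
payload = [["NAME", "POP", "us"], ["United States", "328239523", "1"]]
df_demo = json_to_dataframe(FakeResponse(payload))
print(df_demo)
```

Every value arrives as a string, including `POP`, so numeric columns need an explicit conversion before sorting or plotting.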

## <font color="blue">Data</font>

### <font color="purple">Population Estimates: Estimates by Age, Sex, Race, and Hispanic Origin</font>
https://api.census.gov/data/2019/pep/population/examples.html

In [7]:
# total population of the United States
url = "https://api.census.gov/data/2019/pep/charage?get=NAME,POP&for=us:*&key={0}"\
    .format(census_api_key)

response = requests.request("GET", url)
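The query string above is assembled by hand with `.format`. As an alternative sketch, the same URL can be built from a dictionary with the standard library, which keeps each parameter on its own line. The helper `build_census_url` and the placeholder key `DEMO_KEY` are hypothetical; the `safe` argument leaves `:`, `*` and `,` unescaped so the result matches the hand-written form.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.census.gov/data/2019/pep/charage"


def build_census_url(params, base_url=BASE_URL):
    """Join the base URL and an encoded query string (hypothetical helper)."""
    # Keep ":", "*" and "," literal so the URL reads like the hand-written one.
    return base_url + "?" + urlencode(params, safe=":*,")


url_demo = build_census_url({
    "get": "NAME,POP",   # variables to return
    "for": "us:*",       # geography
    "key": "DEMO_KEY",   # placeholder, not a real key
})
print(url_demo)
```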

In [95]:
response

<Response [200]>

In [96]:
response.text

'[["NAME","POP","HISP","DATE_CODE","DATE_DESC","state"],\n["New York","16010245","1","7","7/1/2014 Population Estimate","36"],\n["New York","3640804","2","7","7/1/2014 Population Estimate","36"],\n["New York","15977166","1","8","7/1/2015 Population Estimate","36"],\n["New York","3677500","2","8","7/1/2015 Population Estimate","36"],\n["New York","15925768","1","9","7/1/2016 Population Estimate","36"],\n["New York","3707660","2","9","7/1/2016 Population Estimate","36"],\n["New York","15864453","1","10","7/1/2017 Population Estimate","36"],\n["New York","3725119","2","10","7/1/2017 Population Estimate","36"],\n["New York","15786810","1","11","7/1/2018 Population Estimate","36"],\n["New York","3743541","2","11","7/1/2018 Population Estimate","36"],\n["New York","15702503","1","12","7/1/2019 Population Estimate","36"],\n["New York","3751058","2","12","7/1/2019 Population Estimate","36"]]'

In [10]:
json_to_dataframe(response)

|   | NAME          | POP       | us |
|---|---------------|-----------|----|
| 0 | United States | 328239523 | 1  |


In [17]:
# population of the United States by year (last 5 years on record)
url = "https://api.census.gov/data/2019/pep/charage?get=NAME,POP&DATE_CODE=8,9,10,11,12&DATE_DESC&for=us:*&key={0}"\
    .format(census_api_key)

response = requests.request("GET", url)

In [18]:
json_to_dataframe(response)

|   | NAME          | POP       | DATE_CODE | DATE_DESC                    | us |
|---|---------------|-----------|-----------|------------------------------|----|
| 0 | United States | 320635163 | 8         | 7/1/2015 Population Estimate | 1  |
| 1 | United States | 322941311 | 9         | 7/1/2016 Population Estimate | 1  |
| 2 | United States | 324985539 | 10        | 7/1/2017 Population Estimate | 1  |
| 3 | United States | 326687501 | 11        | 7/1/2018 Population Estimate | 1  |
| 4 | United States | 328239523 | 12        | 7/1/2019 Population Estimate | 1  |
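The yearly rows above become easier to work with once the year is pulled out of `DATE_DESC` and `POP` is numeric. The sketch below rebuilds the table from the values shown above and adds two columns; the column names `YEAR` and `POP_CHANGE` are assumptions introduced for illustration.

```python
import pandas as pd

# Rows copied from the output above (all values are strings, as returned by the API).
rows = [
    ["United States", "320635163", "8", "7/1/2015 Population Estimate", "1"],
    ["United States", "322941311", "9", "7/1/2016 Population Estimate", "1"],
    ["United States", "324985539", "10", "7/1/2017 Population Estimate", "1"],
    ["United States", "326687501", "11", "7/1/2018 Population Estimate", "1"],
    ["United States", "328239523", "12", "7/1/2019 Population Estimate", "1"],
]
df_year = pd.DataFrame(rows, columns=["NAME", "POP", "DATE_CODE", "DATE_DESC", "us"])

# Convert population to integer so arithmetic works.
df_year["POP"] = df_year["POP"].astype(int)

# The date is the first token of DATE_DESC, e.g. "7/1/2015".
date_text = df_year["DATE_DESC"].str.split().str[0]
df_year["YEAR"] = pd.to_datetime(date_text, format="%m/%d/%Y").dt.year

# Year-over-year change; the first row has no previous year, so it is NaN.
df_year["POP_CHANGE"] = df_year["POP"].diff()
print(df_year[["YEAR", "POP", "POP_CHANGE"]])
```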


In [43]:
# population of the United States by state
url = "https://api.census.gov/data/2019/pep/charage?get=NAME,POP&for=state:*&key={0}"\
    .format(census_api_key)

response = requests.request("GET", url)

In [98]:
# NOTE: `response` must still hold the state-level query from the previous cell
df_pop_by_state = json_to_dataframe(response)

# convert population from string to integer so it sorts numerically
df_pop_by_state['POP'] = df_pop_by_state['POP'].astype(int)

# top 5 states with the largest population
df_pop_by_state_sort = df_pop_by_state\
    .sort_values(by=['POP'], ascending=False)\
    .head(5)
print('Top 5 states with largest population estimates:')
df_pop_by_state_sort

Top 5 states with largest population estimates:


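|   | NAME     | POP      | HISP | DATE_CODE | DATE_DESC                    | state |
|---|----------|----------|------|-----------|------------------------------|-------|
| 0 | New York | 16010245 | 1    | 7         | 7/1/2014 Population Estimate | 36    |
| 2 | New York | 15977166 | 1    | 8         | 7/1/2015 Population Estimate | 36    |
| 4 | New York | 15925768 | 1    | 9         | 7/1/2016 Population Estimate | 36    |
| 6 | New York | 15864453 | 1    | 10        | 7/1/2017 Population Estimate | 36    |
| 8 | New York | 15786810 | 1    | 11        | 7/1/2018 Population Estimate | 36    |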
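The integer conversion in the cell above matters because the API returns `POP` as text, and text sorts character by character. The sketch below shows the difference on a few hypothetical values chosen only to expose the pitfall; they are not Census figures. It also shows `nlargest` as a shorter way to take the top rows once the column is numeric.

```python
import pandas as pd

# Hypothetical values, chosen only to show how string sorting goes wrong.
df_demo = pd.DataFrame({"NAME": ["A", "B", "C"], "POP": ["9", "10", "200"]})

# Sorting the raw strings compares characters, so "9" ranks above "200".
order_as_text = df_demo.sort_values(by="POP", ascending=False)["NAME"].tolist()

# After conversion the sort is numeric.
df_demo["POP"] = df_demo["POP"].astype(int)
order_as_int = df_demo.sort_values(by="POP", ascending=False)["NAME"].tolist()

# nlargest combines the sort and the head() in one call.
top_2 = df_demo.nlargest(2, "POP")["NAME"].tolist()

print(order_as_text, order_as_int, top_2)
```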


In [52]:
top_5_state_list = df_pop_by_state_sort['state'].tolist()
print('list of top 5 states:', top_5_state_list)

state_str = ','.join(top_5_state_list)
print('list of states as one string:', state_str)

list of top 5 states: ['06', '48', '12', '36', '42']
list of states as one string: 06,48,12,36,42
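The state codes are two-character FIPS strings, so California is `'06'`, not `6`. If the codes ever pass through a step that reads them as integers (for example a CSV round trip), the leading zero is lost. A minimal sketch of restoring it before building the query string, assuming the codes arrive as integers:

```python
# Hypothetical input: state codes that were read back as integers.
state_codes = [6, 48, 12, 36, 42]

# zfill(2) pads each code back to two characters before joining.
state_str = ",".join(str(code).zfill(2) for code in state_codes)
print(state_str)
```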


In [107]:
# population of the top 5 states at age 16
url = "https://api.census.gov/data/2019/pep/charage?get=NAME,POP&AGE=16&for=state:{0}&key={1}"\
    .format(state_str,census_api_key)

response = requests.request("GET", url)


In [108]:
json_to_dataframe(response)

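|   | NAME         | POP    | AGE | state |
|---|--------------|--------|-----|-------|
| 0 | New York     | 227902 | 16  | 36    |
| 1 | Pennsylvania | 154083 | 16  | 42    |
| 2 | Texas        | 414872 | 16  | 48    |
| 3 | California   | 501846 | 16  | 06    |
| 4 | Florida      | 234530 | 16  | 12    |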


In [90]:
# New York population year over year, split by Hispanic origin (HISP)
url = "https://api.census.gov/data/2019/pep/charage?get=NAME,POP&HISP=1,2&DATE_CODE=7,8,9,10,11,12&DATE_DESC&for=state:36&key={0}"\
    .format(census_api_key)

response = requests.request("GET", url)

In [91]:
df_hisp_by_year = json_to_dataframe(response)
df_hisp_by_year.head()

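|   | NAME     | POP      | HISP | DATE_CODE | DATE_DESC                    | state |
|---|----------|----------|------|-----------|------------------------------|-------|
| 0 | New York | 16010245 | 1    | 7         | 7/1/2014 Population Estimate | 36    |
| 1 | New York | 3640804  | 2    | 7         | 7/1/2014 Population Estimate | 36    |
| 2 | New York | 15977166 | 1    | 8         | 7/1/2015 Population Estimate | 36    |
| 3 | New York | 3677500  | 2    | 8         | 7/1/2015 Population Estimate | 36    |
| 4 | New York | 15925768 | 1    | 9         | 7/1/2016 Population Estimate | 36    |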
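The long format above has one row per year and `HISP` code. Reshaping it to one row per year makes it easy to compare the two groups. The sketch below uses the first four rows shown above. The label mapping (`1` = Non-Hispanic, `2` = Hispanic) is an assumption based on the usual coding of this variable and should be checked against the dataset's variable list; the column names `HISP_LABEL` and `HISPANIC_SHARE` are introduced here for illustration.

```python
import pandas as pd

# First four rows of the output above.
rows = [
    ["New York", "16010245", "1", "7", "7/1/2014 Population Estimate", "36"],
    ["New York", "3640804", "2", "7", "7/1/2014 Population Estimate", "36"],
    ["New York", "15977166", "1", "8", "7/1/2015 Population Estimate", "36"],
    ["New York", "3677500", "2", "8", "7/1/2015 Population Estimate", "36"],
]
columns = ["NAME", "POP", "HISP", "DATE_CODE", "DATE_DESC", "state"]
df_demo = pd.DataFrame(rows, columns=columns)
df_demo["POP"] = df_demo["POP"].astype(int)

# ASSUMPTION: HISP code meanings; verify against the API's variable definitions.
hisp_labels = {"1": "Non-Hispanic", "2": "Hispanic"}
df_demo["HISP_LABEL"] = df_demo["HISP"].map(hisp_labels)

# One row per date, one column per group.
df_wide = df_demo.pivot(index="DATE_DESC", columns="HISP_LABEL", values="POP")
df_wide["HISPANIC_SHARE"] = df_wide["Hispanic"] / (
    df_wide["Hispanic"] + df_wide["Non-Hispanic"]
)
print(df_wide)
```

A labeled column such as `HISP_LABEL` can also be passed to a chart's `color` argument in place of the raw code, which gives a more readable legend.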


In [94]:
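# POP arrives as strings; convert to integers so the y-axis is numeric
df_hisp_by_year['POP'] = df_hisp_by_year['POP'].astype(int)
fig = px.line(df_hisp_by_year, x="DATE_DESC", y="POP", color="HISP",
              title='New York Population Estimates by Hispanic Origin, Year over Year')
fig.show()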

## Looking for an <font color="green">EASIER</font> way to access the <B>CENSUS DATA</B>? Check out the AnalyticsAriel wrapper API on RapidAPI: https://rapidapi.com/arielherrera/api/us-census-bureau/endpoints

Tutorial videos are available on the AnalyticsAriel YouTube channel.

# End Notebook