# Accessing Data From The US Census

This notebook contains code to access data from the US Census Bureau's API. Numerous datasets are available ([*link*](https://www.census.gov/data/developers/data-sets.html)).

The **American Community Survey** (ACS, [*link*](https://www.census.gov/programs-surveys/acs)) contains various demographic data collected across the country. There are four versions of this survey based on the timeframe of the collected data (*1 to 5 years of data*) and the granularity of location (*city, state, ZIP Code*). The website offers guidelines for when to use which survey ([*link*](https://www.census.gov/programs-surveys/acs/guidance/estimates.html)).

The schedule for the release of the data is located [here](https://www.census.gov/programs-surveys/acs/news/data-releases/2020/release-schedule.html). As of March 12, 2022, the 2019 ACS was the most recent release, so the available data does not account for the effects of the lockdowns. The 2020 survey is scheduled for release on March 17, 2022.

The [examples](https://api.census.gov/data/2019/acs/acs5/examples.html) page contains the formatting necessary to make calls and lists the geographic area of data that is available.

In [24]:
# Import the necessary libraries
import pandas as pd
import geopandas as gpd
import numpy as np
import requests
import json
from datetime import date
import os
from tqdm import tqdm
from sodapy import Socrata

import folium

# I. Get All Data

## I.A. Function To Get State Level Data
Example call: https://api.census.gov/data/2019/acs/acs1?get=NAME,B01001_001E&for=state:01


To get data for the state a ZIP Code belongs to, you must first look up that state's [state code](https://api.census.gov/data/2019/acs/acs1?get=NAME&for=state:*) and pass it in the call.

For instance, *36* is the code for New York.

In [2]:
# A function for the API call
# https://www.w3schools.com/python/ref_requests_response.asp

def obtain_census_data(year, codes, state):
    """Query the ACS 1-year API and return the parsed JSON: a header row followed by data rows."""
    state_code_url = 'https://api.census.gov/data/{}/acs/acs1?get=NAME,{}&for=state:{}'.format(year, codes, state)
    response = requests.get(state_code_url)
    response.raise_for_status()  # surface HTTP errors instead of failing inside .json()
    return response.json()
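
The URL pattern used above can also be assembled by a small helper, which makes it easier to switch survey versions or pass a list of codes. The sketch below is only an illustration: `build_acs_url`, `ACS_BASE_URL`, and the `survey` parameter are hypothetical names that do not appear elsewhere in this notebook, and the helper builds the URL string without calling the API.

```python
from urllib.parse import urlencode

# Assumed endpoint pattern, taken from the example URLs in this notebook
ACS_BASE_URL = "https://api.census.gov/data/{year}/acs/{survey}"


def build_acs_url(year, codes, state, survey="acs1"):
    """Build a state-level ACS query URL (hypothetical helper)."""
    # Accept either a ready-made comma-separated string or a list of codes
    if not isinstance(codes, str):
        codes = ",".join(codes)
    # Keep ',', ':' and '*' unescaped so the URL matches the documented format
    query = urlencode(
        {"get": f"NAME,{codes}", "for": f"state:{state}"},
        safe=",:*",
    )
    return f"{ACS_BASE_URL.format(year=year, survey=survey)}?{query}"


print(build_acs_url("2019", ["B01001_001E"], "01"))
```

The resulting string can be passed to `requests.get` in the same way as the formatted URL inside `obtain_census_data`.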

In [3]:
# An example call for median income
obtain_census_data(year = '2019', codes = 'B19326_001E', state = '36')

[['NAME', 'B19326_001E', 'state'], ['New York', '36165', '36']]

The survey's variable list is [here](https://api.census.gov/data/2019/acs/acs1/variables.html). Variable codes are not always the same from year to year or between survey versions (*e.g. the 5-year and 1-year surveys*). There are over 31,000 variables in this dataset, so loading and searching through them may take a while.
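
Because the variable list is so long, it can help to search it in code rather than in the browser. The sketch below assumes the list has already been loaded into a dictionary that maps each code to a record with `label` and `concept` fields (the shape the API's `variables.json` counterpart of the page above is assumed to have). `find_variables` is a hypothetical helper, and the sample entries are hand-written for illustration, not copied from the API.

```python
def find_variables(variables, keyword):
    """Return {code: label} for entries whose label or concept mentions keyword."""
    keyword = keyword.lower()
    matches = {}
    for code, info in variables.items():
        # Search the label and the concept together, ignoring case
        text = f"{info.get('label', '')} {info.get('concept', '')}".lower()
        if keyword in text:
            matches[code] = info.get("label", "")
    return matches


# Hand-written sample in the assumed shape of the variable list
sample_variables = {
    "B01001_001E": {"label": "Estimate!!Total:", "concept": "SEX BY AGE"},
    "B01002_001E": {
        "label": "Estimate!!Median age --!!Total:",
        "concept": "MEDIAN AGE BY SEX",
    },
}

print(find_variables(sample_variables, "median age"))
```

Matching codes can then be copied into the `census_codes` dictionary under readable names.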

In [4]:
census_codes = {
    "Total_Pop": "B01001_001E",
    "Total_Pop_Male": "B01001_002E",
    "Total_Pop_Female": "B01001_026E",
    "Median_Age": "B01002_001E",
    "Median_Age_Male": "B01002_002E",
    "Median_Age_Female": "B01002_003E",
}

In [5]:
inv_census_codes = {v: k for k, v in census_codes.items()}

In [6]:
# An inverted dictionary
inv_census_codes

{'B01001_001E': 'Total_Pop',
 'B01001_002E': 'Total_Pop_Male',
 'B01001_026E': 'Total_Pop_Female',
 'B01002_001E': 'Median_Age',
 'B01002_002E': 'Median_Age_Male',
 'B01002_003E': 'Median_Age_Female'}

In [7]:
# Invert the dictionary so the columns can be renamed. Keys from the forthcoming index dictionary can be added to it as well.
inv_census_codes = {v: k for k, v in census_codes.items()}
# inv_census_codes.update({'District_Name':'District_Name', 'CD': 'CD', 'State_Id': 'State_Id','State': 'State', 'CD_Id_Year': 'CD_Id_Year'})

# Create a comma-separated string of codes to be used in the API call
columns_url = ','.join(census_codes.values())

In [8]:
obtain_census_data(year = '2019', codes = columns_url, state = '36')

[['NAME',
  'B01001_001E',
  'B01001_002E',
  'B01001_026E',
  'B01002_001E',
  'B01002_002E',
  'B01002_003E',
  'state'],
 ['New York', '19453561', '9450810', '10002751', '39.2', '37.6', '40.8', '36']]

In [9]:
# https://api.census.gov/data/{}/acs/acs1?get=NAME,{}&for=state:{}
# https://support.socrata.com/hc/en-us/articles/360051168614-US-Census-Gateway-Plugin

In [10]:
all_states_census_raw = obtain_census_data(year = '2019', codes = columns_url, state = '*')

In [11]:
# The first list is the columns
all_states_census_raw[0]

['NAME',
 'B01001_001E',
 'B01001_002E',
 'B01001_026E',
 'B01002_001E',
 'B01002_002E',
 'B01002_003E',
 'state']

In [12]:
# Map the codes back to their readable names so the DataFrame's columns are easy to read
columns = [inv_census_codes.get(item, item) for item in all_states_census_raw[0]]

In [13]:
all_states_census_df = pd.DataFrame(all_states_census_raw[1:],columns = columns)

In [17]:
all_states_census_df.head()

| | NAME | Total_Pop | Total_Pop_Male | Total_Pop_Female | Median_Age | Median_Age_Male | Median_Age_Female | state |
|---|---|---|---|---|---|---|---|---|
| 0 | Mississippi | 2976149 | 1434957 | 1541192 | 38.3 | 36.9 | 39.5 | 28 |
| 1 | Missouri | 6137428 | 3008169 | 3129259 | 38.9 | 37.6 | 40.2 | 29 |
| 2 | Montana | 1068778 | 537170 | 531608 | 40.5 | 39.3 | 41.7 | 30 |
| 3 | Nebraska | 1934408 | 966650 | 967758 | 36.8 | 35.7 | 37.8 | 31 |
| 4 | Nevada | 3080156 | 1544779 | 1535377 | 38.4 | 37.6 | 39.0 | 32 |
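
The API returns every value as a string, so the numbers shown above are text until they are converted. One way to handle this is to convert the rows before building the DataFrame. The sketch below is a minimal illustration: `parse_number` and `rows_to_records` are hypothetical helpers, and the `NAME` and `state` columns are deliberately left as text so that state codes keep any leading zeros.

```python
def parse_number(value):
    """Convert an API string to int or float; pass through missing values."""
    if value is None:
        return None
    try:
        return int(value)
    except ValueError:
        return float(value)


def rows_to_records(raw, text_columns=("NAME", "state")):
    """Turn [header, row, row, ...] into a list of dicts with numeric values."""
    header, *rows = raw
    return [
        {
            column: value if column in text_columns else parse_number(value)
            for column, value in zip(header, row)
        }
        for row in rows
    ]


# Sample rows copied from the output above, using the readable column names
sample_raw = [
    ["NAME", "Total_Pop", "Median_Age", "state"],
    ["Mississippi", "2976149", "38.3", "28"],
    ["Missouri", "6137428", "38.9", "29"],
]

records = rows_to_records(sample_raw)
print(records[0])
```

A list of dictionaries like `records` can be passed straight to `pd.DataFrame`. Applying `pd.to_numeric` to the relevant columns of an existing DataFrame is an alternative that achieves the same result.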


## I.B. Load the Geo Files

- https://www.census.gov/geographies/mapping-files/time-series/geo/carto-boundary-file.html

In [15]:
usmap_df = gpd.read_file('us_map_data/cb_2018_us_state_500k/cb_2018_us_state_500k.shp')

In [16]:
usmap_df.head()

| | STATEFP | STATENS | AFFGEOID | GEOID | STUSPS | NAME | LSAD | ALAND | AWATER | geometry |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 28 | 1779790 | 0400000US28 | 28 | MS | Mississippi | 0 | 121533519481 | 3926919758 | MULTIPOLYGON (((-88.50297 30.21523, -88.49176 ... |
| 1 | 37 | 1027616 | 0400000US37 | 37 | NC | North Carolina | 0 | 125923656064 | 13466071395 | MULTIPOLYGON (((-75.72681 35.93584, -75.71827 ... |
| 2 | 40 | 1102857 | 0400000US40 | 40 | OK | Oklahoma | 0 | 177662925723 | 3374587997 | POLYGON ((-103.00257 36.52659, -103.00219 36.6... |
| 3 | 51 | 1779803 | 0400000US51 | 51 | VA | Virginia | 0 | 102257717110 | 8528531774 | MULTIPOLYGON (((-75.74241 37.80835, -75.74151 ... |
| 4 | 54 | 1779805 | 0400000US54 | 54 | WV | West Virginia | 0 | 62266474513 | 489028543 | POLYGON ((-82.64320 38.16909, -82.64300 38.169... |


## I.C. Combine The Survey and Map Data

In [18]:
geoJSON_df = pd.merge(left = usmap_df,
                       right = all_states_census_df,
                       how = "left", 
                       left_on = ["GEOID", "NAME"],
                       right_on = ["state", "NAME"]
                       )

In [22]:
geoJSON_df.head()

| | STATEFP | STATENS | AFFGEOID | GEOID | STUSPS | NAME | LSAD | ALAND | AWATER | geometry | Total_Pop | Total_Pop_Male | Total_Pop_Female | Median_Age | Median_Age_Male | Median_Age_Female | state |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 28 | 1779790 | 0400000US28 | 28 | MS | Mississippi | 0 | 121533519481 | 3926919758 | MULTIPOLYGON (((-88.50297 30.21523, -88.49176 ... | 2976149 | 1434957 | 1541192 | 38.3 | 36.9 | 39.5 | 28 |
| 1 | 37 | 1027616 | 0400000US37 | 37 | NC | North Carolina | 0 | 125923656064 | 13466071395 | MULTIPOLYGON (((-75.72681 35.93584, -75.71827 ... | 10488084 | 5094327 | 5393757 | 39.1 | 37.6 | 40.5 | 37 |
| 2 | 40 | 1102857 | 0400000US40 | 40 | OK | Oklahoma | 0 | 177662925723 | 3374587997 | POLYGON ((-103.00257 36.52659, -103.00219 36.6... | 3956971 | 1962477 | 1994494 | 37.0 | 35.6 | 38.3 | 40 |
| 3 | 51 | 1779803 | 0400000US51 | 51 | VA | Virginia | 0 | 102257717110 | 8528531774 | MULTIPOLYGON (((-75.74241 37.80835, -75.74151 ... | 8535519 | 4201799 | 4333720 | 38.5 | 37.2 | 40.0 | 51 |
| 4 | 54 | 1779805 | 0400000US54 | 54 | WV | West Virginia | 0 | 62266474513 | 489028543 | POLYGON ((-82.64320 38.16909, -82.64300 38.169... | 1792147 | 885861 | 906286 | 42.9 | 41.7 | 44.3 | 54 |
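
Because this is a left join, any area in the map file without a matching row in the survey data is kept with empty survey columns. A quick way to spot such gaps is to compare the join keys on both sides before or after merging. The sketch below is a minimal illustration: `unmatched_keys` is a hypothetical helper, and the sample codes are made up to show the behavior.

```python
def unmatched_keys(left_keys, right_keys):
    """Return the keys found only on the left and only on the right, sorted."""
    left, right = set(left_keys), set(right_keys)
    return sorted(left - right), sorted(right - left)


# Made-up sample: '60' stands in for an area present in the map file only
map_codes = ["28", "37", "60"]
survey_codes = ["28", "37"]

only_in_map, only_in_survey = unmatched_keys(map_codes, survey_codes)
print("Only in map file:", only_in_map)
print("Only in survey data:", only_in_survey)
```

With the real data, the two inputs would be `usmap_df["GEOID"]` and `all_states_census_df["state"]`. Passing `indicator=True` to `pd.merge` is another option: it adds a `_merge` column that records which side each row came from.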


# II. Basic Choropleth Map

In [40]:
# url = (
#     "https://raw.githubusercontent.com/python-visualization/folium/master/examples/data"
# )
# state_geo = f"{url}/us-states.json"
# state_unemployment = f"{url}/US_Unemployment_Oct2012.csv"
# state_data = pd.read_csv(state_unemployment)

m = folium.Map(location=[38, -98], zoom_start=2,tiles=None)
folium.TileLayer('CartoDB positron',name="Light Map",control=False).add_to(m)
m



In [43]:

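m = folium.Map(location=[48, -102], zoom_start=3)

# The API returns every value as a string. Passing strings to Choropleth is the
# likely cause of "TypeError: ufunc 'isnan' not supported for the input types",
# so convert the plotted column to numbers first.
geoJSON_df["Total_Pop"] = pd.to_numeric(geoJSON_df["Total_Pop"])

folium.Choropleth(
    # GeoJSON string with NAME as a feature property (assumes geoJSON_df is a GeoDataFrame)
    geo_data=geoJSON_df[["NAME", "geometry"]].to_json(),
    name="choropleth",
    data=geoJSON_df,
    columns=["NAME", "Total_Pop"],
    key_on="feature.properties.NAME",  # must point at a GeoJSON property, not the geometry
    fill_color="YlGn",
    fill_opacity=0.7,
    line_opacity=0.2,
    legend_name="Total Population",
).add_to(m)

folium.LayerControl().add_to(m)

m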