## Census API 

This notebook uses an API (application programming interface) to access data from the Census Bureau. Many APIs require a key in order to fetch data. The Census API only requires a key for more than 500 calls per day, so we do not need an API key to run this notebook (yay!). 


This notebook provides access to specific data published through the Census Bureau API. The code is written to allow flexibility in the type of data, the geographic granularity, and the time period. The Census API includes many datasets and can be queried in many ways.  





In [None]:
%matplotlib inline 
import requests
import pandas as pd
import json
import pickle
import geopandas as gpd
import matplotlib 
import matplotlib.pyplot as plt
from variables import *

# ACS 2019 data columns 

Resources: 
- Census API page https://www.census.gov/data/developers/data-sets.html 
- Census API available datasets https://api.census.gov/data.html 


The next section provides easy-to-use code snippets for accessing a predefined list of variables grouped by topic. At this time, race- and age-related variables are available. Note that these variables are loaded from the variables.py script through the import cell (from variables import *). 

#### Race:
- Percent White
- Percent Black
- Percent American Indian and Alaska Native
- Percent Asian 
- Percent Native Hawaiian and Other Pacific Islander
- Percent Some Other Race
- Percent Hispanic Or Latino

#### Age: 
- Percent Age Under 5
- Percent Age 5 to 9
- Percent Age 10 to 14
- Percent Age 15 to 19
- Percent Age 20 to 24
- Percent Age 25 to 34
- Percent Age 35 to 44
- Percent Age 45 to 54
- Percent Age 55 to 59
- Percent Age 60 to 64
- Percent Age 65 to 74
- Percent Age 75 to 84
- Percent Age 85 and Older


### How does it work? 

Data for each of these categories can be accessed from the ACS 5-year estimates using the code below. To download a given dataset, specify the parameters in one cell (see details below), then run the remaining cells in the section without making any further changes. Running these cells will fetch the data, apply some formatting transformations, rearrange the data, rename columns, create a map of the data, and finally export the data as a CSV file to your local machine.









In [None]:
# print list of variables 

print('Race -->',RACE)
print('Age -->',AGE)

# 1. Census Tract level data

Choose your list of variables for the API call

In [None]:
#Variables for base url
year='2017' 
data='acs'
data_name='acs5/profile'
# this comes from the list of variables 
columns = RACE 
state='36'
county='005,047,061,081,085'

In [None]:
#first set the base url for the chosen year and dataset
acs_url = f'https://api.census.gov/data/{year}/{data}/{data_name}'
#now set the data url
data_url = f'{acs_url}?get={columns}&for=tract:*&in=state:{state}&in=county:{county}'
#data_url
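
The same URL can also be assembled from a list of query parameters, which makes it easier to swap the geography or add an API key later. The sketch below is one way to do that with the standard library. `build_census_url` and its argument names are hypothetical, not part of the notebook or of the Census API, and `NAME` stands in for the RACE variable list.

```python
from urllib.parse import urlencode

def build_census_url(year, data, data_name, columns, geo, within=(), key=None):
    """Assemble a Census API request URL (hypothetical helper).

    geo    -- the `for` clause, e.g. 'tract:*'
    within -- sequence of `in` clauses, e.g. ['state:36', 'county:005,047']
    key    -- optional API key; left out of the URL when None
    """
    base = f'https://api.census.gov/data/{year}/{data}/{data_name}'
    params = [('get', columns), ('for', geo)]
    params += [('in', clause) for clause in within]
    if key is not None:
        params.append(('key', key))
    # keep ':', ',' and '*' readable instead of percent-encoding them
    return f"{base}?{urlencode(params, safe=':,*')}"

# 'NAME' is only a stand-in for the RACE variable list
example_url = build_census_url('2017', 'acs', 'acs5/profile', 'NAME',
                               'tract:*', ['state:36', 'county:005,047'])
print(example_url)
```

Keeping the geography clauses as arguments means a county-level or ZCTA-level request only changes the call, not the string template.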

In [None]:
#retrieve the data
response=requests.get(data_url)
#print(response.text)

These results come back as text. We will parse them as JSON as an intermediate step before turning them into a DataFrame.
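
As a rough illustration of that two-step conversion, the sketch below parses a small hand-written string shaped like a Census API response: a list of rows whose first row holds the column names. The values are made up for the example.

```python
import json
import pandas as pd

# made-up text in the same shape as an API response: header row, then data rows
sample_text = ('[["NAME","state","county","tract"],'
               '["Census Tract 1","36","005","000100"],'
               '["Census Tract 2","36","005","000200"]]')

sample_rows = json.loads(sample_text)             # text -> list of lists
sample_df = pd.DataFrame(sample_rows[1:],         # data rows
                         columns=sample_rows[0])  # first row as column names
print(sample_df.shape)
```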

# Transform API results into DataFrame


In [None]:
#make dataframe:

#1. turn response into json
popdata=response.json()
popdata 

# 2. transform into a DataFrame (the first row holds the column names)

df=pd.DataFrame(popdata[1:], columns=popdata[0])

df

# Change column names using a dictionary 

By default, the column names returned by the API call are the variable codes, which are meaningless strings to the human eye! That's where the variables.py script comes into play again. RACE_GROUPS is a dictionary that indicates which code belongs to which category. The result is a DataFrame with column names that are convenient for the human eye. 

In [None]:
df = df.rename(columns = RACE_GROUPS)
df
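
The same renaming pattern on a toy frame, in case variables.py is not at hand. The dictionary below is a hypothetical stand-in for RACE_GROUPS, and `CODE_A` and `CODE_B` are placeholders rather than real Census variable codes.

```python
import pandas as pd

# hypothetical stand-in for RACE_GROUPS: variable code -> readable label
toy_groups = {'CODE_A': 'Percent White', 'CODE_B': 'Percent Black'}

toy_df = pd.DataFrame({'CODE_A': ['40.1'], 'CODE_B': ['30.2'], 'tract': ['000100']})
# columns that are not keys of the dictionary (here 'tract') keep their name
toy_df = toy_df.rename(columns=toy_groups)
print(list(toy_df.columns))
```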

## Export as csv 

In [None]:
from datetime import date
#save csv using the data parameters 

#1. get today's date 
today = date.today()
today = today.strftime("%d%m%y")

#2.save the data using the pre-defined parameters 
home= data+year+'_'+today 
df.to_csv(home+'.csv', sep=',')
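
The cell above stamps the file name with day, month, and two-digit year. One alternative, sketched below, is an ISO date (YYYY-MM-DD), which sorts chronologically in a file listing. `export_name` is a hypothetical helper, not something the notebook defines.

```python
from datetime import date

def export_name(data, year, when=None):
    """Build a CSV file name from the request parameters (hypothetical helper)."""
    when = when or date.today()
    return f'{data}{year}_{when.isoformat()}.csv'

# a fixed date is passed here only to make the example reproducible
print(export_name('acs', '2017', date(2024, 3, 5)))
```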



# 2. Get ACS 5-year data from the API by ZCTA

## Set parameters for API



In [None]:
#load ZCTA list by borough from pickle

with open('BXzip.pkl', 'rb') as f: 
    BXzip = pickle.load(f)
    
with open('BKzip.pkl', 'rb') as f: 
    BKzip = pickle.load(f)

with open('MNzip.pkl', 'rb') as f: 
    MNzip = pickle.load(f)
    
with open('SIzip.pkl', 'rb') as f: 
    SIzip = pickle.load(f)
    
with open('QNzip.pkl', 'rb') as f: 
    QNzip = pickle.load(f)
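
The five near-identical blocks above could be folded into a loop. A minimal sketch, assuming the files keep the borough-prefix naming used above (BXzip.pkl, BKzip.pkl, and so on). `load_zip_lists` is a hypothetical helper, and the demo writes throwaway files with made-up contents so that it runs without the real pickles.

```python
import os
import pickle
import tempfile

def load_zip_lists(folder, boroughs=('BX', 'BK', 'MN', 'SI', 'QN')):
    """Load '<borough>zip.pkl' files from `folder` into a dict (hypothetical helper)."""
    lists = {}
    for code in boroughs:
        with open(os.path.join(folder, f'{code}zip.pkl'), 'rb') as f:
            lists[code] = pickle.load(f)
    return lists

# try it on throwaway files with made-up contents
with tempfile.TemporaryDirectory() as tmp:
    for code in ('BX', 'BK'):
        with open(os.path.join(tmp, f'{code}zip.pkl'), 'wb') as f:
            pickle.dump([f'{code}-example'], f)
    demo = load_zip_lists(tmp, boroughs=('BX', 'BK'))

print(demo)
```

A dictionary keyed by borough also makes it simple to combine any subset, for example `demo['BX'] + demo['BK']`.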

In [None]:
year='2019' 
data='acs'
data_name='acs5'
columns='B01001_002E,B06009_005E' 
state='36'
zip_code= BXzip + BKzip

#state and data type could be hard-coded; data_name can change, but column names are not always the same across years 
 



In [None]:
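#build the ZCTA url (otherwise data_url still points at the tract query); assumes zip_code is a list of ZCTA codes
acs_url = f'https://api.census.gov/data/{year}/{data}/{data_name}'
zcta_list = ','.join(str(z) for z in zip_code)
data_url = f'{acs_url}?get={columns}&for=zip code tabulation area:{zcta_list}&in=state:{state}'
#retrieve the data
response=requests.get(data_url)
#print(response.text)
# TODO: find a way to give feedback (successful/unsuccessful) instead of printing the response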
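
One possible way to give the success/failure feedback mentioned in the comment above is a small helper that reads the status code before touching the payload. This is only a sketch: `describe_response` and `FakeResponse` are hypothetical names, and the fake object exists so the example runs without calling the API.

```python
def describe_response(response):
    """Return a short success/failure message for a requests-style response.

    Hypothetical helper: it relies only on the `status_code` attribute and
    the `json()` method that `requests.Response` provides.
    """
    if response.status_code != 200:
        return f'unsuccessful: HTTP {response.status_code}'
    rows = response.json()
    # the first row holds the column names, so data rows = len - 1
    return f'success: {len(rows) - 1} rows, {len(rows[0])} columns'


class FakeResponse:
    """Stand-in used to try the helper without calling the API."""
    def __init__(self, status_code, payload=None):
        self.status_code = status_code
        self._payload = payload

    def json(self):
        return self._payload


print(describe_response(FakeResponse(200, [['NAME', 'state'], ['x', '36']])))
print(describe_response(FakeResponse(404)))
```

With a real request, the call would simply be `describe_response(response)`.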

In [None]:
#make dataframe:

#1. turn response into json
popdata=response.json()
popdata 
#2. build the DataFrame (the first row holds the column names)
dfZCTA =pd.DataFrame(popdata[1:], columns=popdata[0])


popdata[1:2][:1]




In [None]:
dfZCTA.head(3)
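
Values in the API response typically arrive as strings, so the estimate columns generally need converting before any arithmetic or shaded mapping. A minimal sketch on made-up values, assuming the columns to convert are exactly the ones requested in the parameter cell:

```python
import pandas as pd

# same variable codes as in the parameter cell, split into a list
value_cols = 'B01001_002E,B06009_005E'.split(',')

# made-up rows shaped like the ZCTA response
toy_zcta = pd.DataFrame(
    [['25000', '4000', '36', '10451'], ['31000', '5200', '36', '10452']],
    columns=value_cols + ['state', 'zip code tabulation area'])

# errors='coerce' turns anything unparseable into NaN instead of raising
toy_zcta[value_cols] = toy_zcta[value_cols].apply(pd.to_numeric, errors='coerce')
print(toy_zcta.dtypes)
```

The ZCTA column is left as text on purpose, since it is an identifier rather than a quantity.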


## Plotting

In [None]:
nycZCTA = gpd.read_file('/Users/avigailvantu/Documents/DoH/zcta/zctashape')
nycZCTA.plot(column='zcta', cmap='plasma')
plt.axis('off')

## Merge ZCTA shapefile with ACS API results

In [None]:
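#inner merge of all NYC ZCTAs with the API results
bxplot = nycZCTA.merge(dfZCTA, left_on='zcta', right_on='zip code tabulation area')
#API values arrive as strings; convert them so the map is shaded on a numeric scale
bxplot['B01001_002E'] = pd.to_numeric(bxplot['B01001_002E'])
#TODO: generalize so that plot columns are generated from the DF columns
#note: k only takes effect when a classification scheme is also passed
bxplot.plot(column='B01001_002E',cmap='cool', k=4,legend=True, figsize=(12,10))
plt.axis('off')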
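
An inner merge silently drops ZCTAs that appear in only one of the two tables. A quick way to see what would be lost, sketched here with plain pandas on made-up values (geometry omitted), is an outer merge with `indicator=True`:

```python
import pandas as pd

# made-up stand-ins: one table of shapes (geometry omitted) and one of API rows
shapes = pd.DataFrame({'zcta': ['10451', '10452', '10453']})
api_rows = pd.DataFrame({'zip code tabulation area': ['10451', '10452', '11201'],
                         'B01001_002E': [25000, 31000, 28000]})

check = shapes.merge(api_rows, how='outer', left_on='zcta',
                     right_on='zip code tabulation area', indicator=True)
# '_merge' is 'both' for matches, 'left_only' / 'right_only' for unmatched rows
print(check['_merge'].value_counts())
```

Rows marked `left_only` are shapes without data, and rows marked `right_only` are API results without a matching shape.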

## Save data as ShapeFile 

In [None]:
bxplot.to_file('CensusZCTA.shp')

## Your turn:

Go through all sections of this notebook as well as the Census Bureau's API documentation (https://www.census.gov/data/developers/data-sets.html). Take your time to familiarize yourself with the Census developers page and the different datasets available.

Use the existing code to branch out and create something new. Here are some initial/partial *ideas*: 

1. Write a program that fetches data in a different geographical unit (i.e. not census tracts or ZCTAs), and merge it with geographical data.
2. Write a program that fetches new types of data and time periods. 
3. Add a new set of variables to the variables.py script, then make the needed changes to the code so that it fetches the data and renames the columns.
4. Add other functionality not specified in this notebook. This could mean automating the notebook or generalizing it in a way that makes the code more scalable and sustainable. 

**Important:**
- Keep your code clean, organized, and reproducible. 
- Remove unneeded cells or sections. Carefully consider every part of your answer and make sure all sections fit together. 
- Use Markdown cells to explain your work and thought process. 
- If needed: include data visualizations. 
