<a href="https://colab.research.google.com/github/analyticsariel/projects/blob/master/How_to_Get_Rental_Zip_Code_Data_from_Census_API_using%C2%A0Python.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# How to Get Rental Zip Code Data from Census API using Python

## Overview
| Detail Tag            | Information                                                                                        |
|-----------------------|----------------------------------------------------------------------------------------------------|
| Originally Created By | Ariel Herrera arielherrera@analyticsariel.com |
| External References   | API |
| Input Datasets        | Source name |
| Output Datasets       | Source name |
| Input Data Source     | Pandas DataFrame |
| Output Data Source    | Pandas DataFrame |

## History
| Date         | Developed By  | Reason                                                |
|--------------|---------------|-------------------------------------------------------|
| 1st Sep 2022 | Ariel Herrera | Create notebook. |

## Getting Started
1. Copy this notebook -> File -> Save a Copy in Drive
2. Directions

## Useful Resources
- [American Community Survey 5-Year Data API](https://www.census.gov/data/developers/data-sets/acs-5year.html)
- [Request Census API Key](https://api.census.gov/data/key_signup.html)
- [Google Colab Cheat Sheet](https://towardsdatascience.com/cheat-sheet-for-google-colab-63853778c093)

## <font color="blue">Install Packages</font>

## <font color="blue">Imports</font>

In [1]:
from google.colab import output, drive, files # specific to Google Colab
import pandas as pd
import numpy as np
import plotly.express as px
import requests
import warnings

# settings
warnings.filterwarnings('ignore')
pd.set_option('display.max_columns', None)

## <font color="blue">Functions</font>

In [2]:
def zip_code_to_str(x):
  # convert a numeric zip code (e.g. 683 or 683.0) to a zero-padded 5-character string
  # zfill also covers 1- and 2-digit inputs, which previously returned None
  return str(x).split('.')[0].zfill(5)

## <font color="blue">Locals & Constants</font>

In [3]:
############
# OPTIONAL #
############

# mount drive
drive.mount('/content/drive', force_remount=False)

# data location
file_dir = '/content/drive/My Drive/Colab Data/input/' # optional

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [4]:
# read in api key file
df_api_keys = pd.read_csv(file_dir + 'api_keys.csv')

# get keys
census_api_key = df_api_keys.loc[df_api_keys['API'] =='census']['KEY'].iloc[0] # replace this with your own key
rapid_api_key = df_api_keys.loc[df_api_keys['API'] =='rapid']['KEY'].iloc[0] # replace this with your own key

## <font color="blue">Data</font>

### <font color="green">Section #1 - API Requests</font> 💻
This section covers how to make API requests to the [American Community Survey 5-Year Data API](https://www.census.gov/data/developers/data-sets/acs-5year.html). It demonstrates how to modify a search by changing the request parameters.
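The request URLs in this section are built from a few query parameters (`get`, `for`, `in`, `key`). As a minimal sketch, and not part of the original notebook, the same URLs could be assembled with a small helper. `build_acs5_url` is a hypothetical name and `YOUR_KEY` is a placeholder for your own key.

```python
from urllib.parse import urlencode, quote

def build_acs5_url(year, variable, geo_for, geo_in=None, api_key=None):
    """Hypothetical helper: assemble an ACS 5-Year request URL."""
    base = 'https://api.census.gov/data/{0}/acs/acs5'.format(year)
    params = [('get', 'NAME,' + variable), ('for', geo_for)]
    if geo_in:
        params.append(('in', geo_in))
    if api_key:
        params.append(('key', api_key))
    # keep ':', ',' and '*' readable; spaces are encoded as %20
    return base + '?' + urlencode(params, safe=':,*', quote_via=quote)

# national request
national_url = build_acs5_url('2020', 'B25031_001E', 'us:*', api_key='YOUR_KEY')
# tract-level request within a state and county
tract_url = build_acs5_url('2020', 'B25031_001E', 'tract:005500,006103',
                           geo_in='state:12 county:057', api_key='YOUR_KEY')
print(national_url)
print(tract_url)
```

Changing `geo_for` and `geo_in` is what switches the request between geographies; the variable and year stay the same.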

In [5]:
variable = 'B25031_001E' # median rent all beds
year = '2020'

#### <font color="purple">1. National Region</font> 👨‍👩‍👧‍👦

In [6]:
# get data by nation
url = 'https://api.census.gov/data/{0}/acs/acs5?get=NAME,{1}&for=us:*&key={2}'\
  .format(year, variable, census_api_key)
# request data
response = requests.request("GET", url)
# view status code
response.status_code

200

In [7]:
# view data
response.text

'[["NAME","B25031_001E","us"],\n["United States","1096","1"]]'

In [8]:
# transform to JSON object
response.json() 

[['NAME', 'B25031_001E', 'us'], ['United States', '1096', '1']]
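The response is a list of lists in which the first row is the header and every following row is one geography. The sketch below is an illustration only and uses the response text shown above rather than a live request. It pairs each row with the header and casts the estimate, since all values arrive as strings.

```python
import json

# sample payload copied from the response text shown above
sample_text = '[["NAME","B25031_001E","us"],\n["United States","1096","1"]]'

payload = json.loads(sample_text)
header, rows = payload[0], payload[1:]

# pair each data row with the header to get one dict per geography
records = [dict(zip(header, row)) for row in rows]

# values are strings, so cast the estimate before doing arithmetic
median_rent = int(records[0]['B25031_001E'])
print(records)
print(median_rent)
```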

In [9]:
# read rent data as a dataframe
df_national = pd.DataFrame(response.json()[1:], columns=response.json()[0])
df_national

Unnamed: 0,NAME,B25031_001E,us
0,United States,1096,1


In [10]:
# rename columns
df_national = df_national.rename(columns={'B25031_001E': 'median_rent_all_bds'})
df_national

Unnamed: 0,NAME,median_rent_all_bds,us
0,United States,1096,1


#### <font color="purple">2. Zip Code</font> 👨‍👩‍👧‍👦

##### <font color="orange">Get HUD Tract Data</font>

In [11]:
# read zip to tract data from HUD
hud_tract_url = 'https://raw.githubusercontent.com/analyticsariel/market-research-data/main/ZIP_TRACT_122021.csv'
_df_hud_tract = pd.read_csv(hud_tract_url)

# dataframe detail
print('Num of rows:', len(_df_hud_tract))
print('Num of columns:', len(_df_hud_tract.columns))
_df_hud_tract.head()

Num of rows: 172177
Num of columns: 8


Unnamed: 0,zip,tract,usps_zip_pref_city,usps_zip_pref_state,res_ratio,bus_ratio,oth_ratio,tot_ratio
0,683,72023830102,SAN GERMAN,PR,0.000791,0.001116,0.0,0.0008
1,683,72125840700,SAN GERMAN,PR,0.186219,0.370536,0.381643,0.201179
2,683,72125840400,SAN GERMAN,PR,0.300451,0.1875,0.115942,0.290308
3,683,72125840600,SAN GERMAN,PR,0.095325,0.007812,0.0,0.088184
4,683,72121960300,SAN GERMAN,PR,0.042402,0.002232,0.019324,0.039435


In [12]:
# clean zip / tract: both are read as integers, which drops leading zeros
df_hud_tract = _df_hud_tract.copy()
df_hud_tract['zip'] = df_hud_tract['zip'].apply(zip_code_to_str)
# tract IDs are 11 characters: 2-digit state + 3-digit county + 6-digit tract
df_hud_tract['tract'] = df_hud_tract['tract'].astype(str).str.zfill(11)
df_hud_tract['state_code'] = df_hud_tract['tract'].str[:2]
df_hud_tract['county_code'] = df_hud_tract['tract'].str[2:5]
df_hud_tract['tract_code'] = df_hud_tract['tract'].str[5:]
df_hud_tract.head()

Unnamed: 0,zip,tract,usps_zip_pref_city,usps_zip_pref_state,res_ratio,bus_ratio,oth_ratio,tot_ratio,state_code,county_code,tract_code
0,683,72023830102,SAN GERMAN,PR,0.000791,0.001116,0.0,0.0008,72,23,830102
1,683,72125840700,SAN GERMAN,PR,0.186219,0.370536,0.381643,0.201179,72,125,840700
2,683,72125840400,SAN GERMAN,PR,0.300451,0.1875,0.115942,0.290308,72,125,840400
3,683,72125840600,SAN GERMAN,PR,0.095325,0.007812,0.0,0.088184,72,125,840600
4,683,72121960300,SAN GERMAN,PR,0.042402,0.002232,0.019324,0.039435,72,121,960300
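The leading zeros go missing because pandas infers integer columns when reading the file. One alternative, sketched here under the assumption that the ID columns can be read as text, is to pass `dtype` to `pd.read_csv` and pad with `str.zfill`. The inline CSV stands in for the HUD file: the first two rows are copied from the tables in this notebook, and the third row is made up to show a 10-digit tract.

```python
import io
import pandas as pd

# inline sample standing in for the HUD crosswalk file
sample_csv = io.StringIO(
    'zip,tract,res_ratio\n'
    '683,72023830102,0.000791\n'
    '33606,12057005500,0.164003\n'
    '1001,1234567890,0.5\n'  # made-up row for illustration
)

# read the ID columns as strings so no digits are reinterpreted as numbers
df_demo = pd.read_csv(sample_csv, dtype={'zip': str, 'tract': str})

# pad to the fixed widths: 5 for zip codes, 11 for tract IDs
df_demo['zip'] = df_demo['zip'].str.zfill(5)
df_demo['tract'] = df_demo['tract'].str.zfill(11)

# split the tract ID into state (2), county (3) and tract (6) codes
df_demo['state_code'] = df_demo['tract'].str[:2]
df_demo['county_code'] = df_demo['tract'].str[2:5]
df_demo['tract_code'] = df_demo['tract'].str[5:]
print(df_demo)
```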


In [13]:
# view sample zipcode
zipcode = '33606' # Hyde Park / Davis Island, Tampa, FL
df_zip_sample = df_hud_tract.loc[df_hud_tract['zip'] == zipcode]
df_zip_sample

Unnamed: 0,zip,tract,usps_zip_pref_city,usps_zip_pref_state,res_ratio,bus_ratio,oth_ratio,tot_ratio,state_code,county_code,tract_code
93377,33606,12057005500,TAMPA,FL,0.164003,0.248222,0.217361,0.178734,12,57,5500
93378,33606,12057006103,TAMPA,FL,0.006683,0.02521,0.006944,0.008748,12,57,6103
93379,33606,12057005401,TAMPA,FL,0.251603,0.122172,0.208333,0.23293,12,57,5401
93380,33606,12057005000,TAMPA,FL,0.182606,0.188106,0.103472,0.175107,12,57,5000
93381,33606,12057006000,TAMPA,FL,0.035853,0.040078,0.013889,0.034068,12,57,6000
93382,33606,12057006101,TAMPA,FL,0.156687,0.085326,0.145139,0.147653,12,57,6101
93383,33606,12057004900,TAMPA,FL,0.202565,0.290886,0.304861,0.22276,12,57,4900


##### <font color="orange">Get Census Data</font>

In [14]:
# prepare parameters
tract_str = ','.join(df_zip_sample['tract_code'].tolist())
state = df_zip_sample['state_code'].iloc[0]
county = df_zip_sample['county_code'].iloc[0]
print('Year:', year)
print('Variable:', variable)
print('Tracts:',  tract_str)
print('State:', state)
print('County:', county)

Year: 2020
Variable: B25031_001E
Tracts: 005500,006103,005401,005000,006000,006101,004900
State: 12
County: 057


In [15]:
# get data from census
url = 'https://api.census.gov/data/{0}/acs/acs5?get=NAME,{1}&for=tract:{2}&in=state:{3}%20county:{4}&key={5}'\
  .format(year, variable, tract_str, state, county, census_api_key)
response = requests.request("GET", url)
response.status_code

200

In [16]:
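# read rent data
df_zip = pd.DataFrame(response.json()[1:], columns=response.json()[0])\
  .rename(columns={'B25031_001E': 'median_rent_all_bds'})
# the API returns values as strings; cast to int so sorting and the median are numeric
df_zip['median_rent_all_bds'] = df_zip['median_rent_all_bds'].astype(int)
print('Num of rows:', len(df_zip))
df_zip.sort_values(by=['median_rent_all_bds'])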

Num of rows: 6


Unnamed: 0,NAME,median_rent_all_bds,state,county,tract
5,"Census Tract 61.03, Hillsborough County, Florida",1192,12,57,6103
0,"Census Tract 50, Hillsborough County, Florida",1215,12,57,5000
1,"Census Tract 54.01, Hillsborough County, Florida",1234,12,57,5401
4,"Census Tract 61.01, Hillsborough County, Florida",1700,12,57,6101
3,"Census Tract 60, Hillsborough County, Florida",1754,12,57,6000
2,"Census Tract 55, Hillsborough County, Florida",1807,12,57,5500


In [17]:
print('For zipcode:{0} median rent in {1} is ${2}'.format(zipcode, year, df_zip['median_rent_all_bds'].median()))

For zipcode:33606 median rent in 2020 is $1467.0
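The figure above is the median of the tract-level medians, so every tract counts equally no matter how much of the zip code it covers. One possible refinement, which is not part of the original notebook, is to weight each tract's median rent by `res_ratio` from the HUD table, assuming that column reflects the share of the zip code's residential addresses in each tract. The result is still an approximation and not a true zip-level median. The values below are copied from the tables above.

```python
import pandas as pd

# residential address ratios for zip code 33606, copied from the HUD sample above
df_weights = pd.DataFrame({
    'tract': ['005500', '006103', '005401', '005000', '006000', '006101', '004900'],
    'res_ratio': [0.164003, 0.006683, 0.251603, 0.182606, 0.035853, 0.156687, 0.202565],
})
# 2020 tract-level median rents, copied from the Census response above
df_rent = pd.DataFrame({
    'tract': ['006103', '005000', '005401', '006101', '006000', '005500'],
    'median_rent_all_bds': [1192, 1215, 1234, 1700, 1754, 1807],
})

# inner join drops tract 004900, which is absent from the 2020 response
df_w = df_weights.merge(df_rent, on='tract', how='inner')

# re-normalize the weights over the tracts that have data
weights = df_w['res_ratio'] / df_w['res_ratio'].sum()
weighted_rent = (weights * df_w['median_rent_all_bds']).sum()
print(round(weighted_rent, 2))
```

Re-normalizing is a design choice: it spreads the weight of the missing tract across the remaining ones instead of treating its rent as zero.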


### <font color="green">Section #2 - Loops</font> 🏙
Get data for multiple years for a given region.

In [18]:
# list of years for Census API
year_list = ['2015', '2016', '2017', '2018', '2019', '2020']

#### <font color="purple">1. Census API</font>

In [19]:
# iterate through list of years
df_list = []
for year in year_list:
  # get median rent for the sample zipcode
  url = 'https://api.census.gov/data/{0}/acs/acs5?get=NAME,{1}&for=tract:{2}&in=state:{3}%20county:{4}&key={5}'\
    .format(year, variable, tract_str, state, county, census_api_key)
  response = requests.request("GET", url)
  _df = pd.DataFrame(response.json()[1:], columns=response.json()[0])
  _df['year'] = year
  df_list.append(_df)

# combine responses into a single dataframe
df_census = pd.concat(df_list)\
  .rename(columns={'B25031_001E': 'median_rent_all_bds'})
df_census['median_rent_all_bds'] = df_census['median_rent_all_bds'].astype(int)
print('Num of rows:', len(df_census))
df_census.head()

Num of rows: 41


Unnamed: 0,NAME,median_rent_all_bds,state,county,tract,year
0,"Census Tract 50, Hillsborough County, Florida",1054,12,57,5000,2015
1,"Census Tract 61.01, Hillsborough County, Florida",1195,12,57,6101,2015
2,"Census Tract 54.01, Hillsborough County, Florida",1045,12,57,5401,2015
3,"Census Tract 61.03, Hillsborough County, Florida",921,12,57,6103,2015
4,"Census Tract 60, Hillsborough County, Florida",1039,12,57,6000,2015


In [20]:
# group by year
df_census_grp = df_census.groupby(['year'])['median_rent_all_bds'].median().reset_index()
df_census_grp

Unnamed: 0,year,median_rent_all_bds
0,2015,1054.0
1,2016,1160.0
2,2017,1205.0
3,2018,1314.0
4,2019,1416.0
5,2020,1467.0
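Casting with `astype(int)` assumes every returned value is a valid estimate. The ACS API can return large negative placeholder values (for example `-666666666`) when an estimate is unavailable, and these would distort a yearly median. A defensive sketch follows, using made-up rows for illustration, not data from the requests above.

```python
import pandas as pd

# made-up rows; the second 2020 value mimics an unavailable estimate
df_demo = pd.DataFrame({
    'year': ['2019', '2019', '2020', '2020', '2020'],
    'median_rent_all_bds': ['1400', '1432', '1215', '-666666666', '1807'],
})

# convert strings to numbers; anything unparseable becomes NaN
rent = pd.to_numeric(df_demo['median_rent_all_bds'], errors='coerce')
# treat negative placeholder values as missing
df_demo['median_rent_all_bds'] = rent.where(rent >= 0)

# median() skips NaN by default, so the placeholder no longer affects the result
df_demo_grp = df_demo.groupby('year')['median_rent_all_bds'].median().reset_index()
print(df_demo_grp)
```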


#### <font color="purple">2. US Housing Market Data API</font> 🏘
Get enriched housing and economic datasets by signing up for the [US Housing Market Data API](https://bit.ly/3AHH7sY).

In [21]:
# get enriched dataset of housing and economic data
url = "https://us-housing-market-data.p.rapidapi.com/getZipcodeEnriched"

querystring = {"zipcode":zipcode}

headers = {
	"X-RapidAPI-Key": rapid_api_key,
	"X-RapidAPI-Host": "us-housing-market-data.p.rapidapi.com"
}

response = requests.request("GET", url, headers=headers, params=querystring)
df_zip_e = pd.DataFrame.from_dict(response.json(), orient='index')
print('Num of rows: {}'.format(len(df_zip_e)))
print('Num of columns: {}'.format(len(df_zip_e.columns)))
df_zip_e.tail()

Num of rows: 72
Num of columns: 80


Unnamed: 0,census.year,census.state,census.zipcode,census.total_population,census.total_population_sex_male,census.total_population_sex_female,census.total_population_race_white,census.total_population_race_black,census.total_population_race_aian,census.total_population_race_asian,census.total_population_race_api,census.total_population_poverty,census.median_household_income,census.median_rent_all_bds,census.median_rent_0_beds,census.median_rent_1_beds,census.median_rent_2_beds,census.median_rent_3_beds,census.median_rent_4_beds,census.median_rent_5+_beds,redfin.period_begin,redfin.period_end,redfin.period_duration,redfin.region_type,redfin.region_type_id,redfin.table_id,redfin.is_seasonally_adjusted,redfin.region,redfin.city,redfin.state,redfin.state_code,redfin.property_type,redfin.property_type_id,redfin.median_sale_price,redfin.median_sale_price_mom,redfin.median_sale_price_yoy,redfin.median_list_price,redfin.median_list_price_mom,redfin.median_list_price_yoy,redfin.median_ppsf,redfin.median_ppsf_mom,redfin.median_ppsf_yoy,redfin.median_list_ppsf,redfin.median_list_ppsf_mom,redfin.median_list_ppsf_yoy,redfin.homes_sold,redfin.homes_sold_mom,redfin.homes_sold_yoy,redfin.pending_sales,redfin.pending_sales_mom,redfin.pending_sales_yoy,redfin.new_listings,redfin.new_listings_mom,redfin.new_listings_yoy,redfin.inventory,redfin.inventory_mom,redfin.inventory_yoy,redfin.months_of_supply,redfin.months_of_supply_mom,redfin.months_of_supply_yoy,redfin.median_dom,redfin.median_dom_mom,redfin.median_dom_yoy,redfin.avg_sale_to_list,redfin.avg_sale_to_list_mom,redfin.avg_sale_to_list_yoy,redfin.sold_above_list,redfin.sold_above_list_mom,redfin.sold_above_list_yoy,redfin.price_drops,redfin.price_drops_mom,redfin.price_drops_yoy,redfin.off_market_in_two_weeks,redfin.off_market_in_two_weeks_mom,redfin.off_market_in_two_weeks_yoy,redfin.parent_metro_region,redfin.parent_metro_region_metro_code,redfin.last_updated,redfin.zipcode,redfin.year
67,2020,12,33606,36749,17503,19246,30220,3121,157,846,0,2876,115096.0,1467.0,1205.0,1179.0,1743.0,2054.0,1742.0,,2020-11-01,2021-01-31,90,zip code,2,14272,f,Zip Code: 33606,,Florida,FL,Single Family Residential,6,910000.0,-0.018815,0.116564,914850.0,-0.005598,-0.010973,388.565085,0.036721,0.040327,398.890736,-0.032381,0.056684,47.0,0.021739,0.46875,8.0,-0.6,-0.333333,39.0,-0.093023,0.083333,21.0,0.05,-0.475,,,,43.5,2.5,-40.0,0.984205,-0.001095,0.024032,0.170213,0.039778,0.076463,,,,0.625,-0.025,0.041667,"Tampa, FL",45300,2022-08-14 14:44:22,33606,2020
68,2020,12,33606,36749,17503,19246,30220,3121,157,846,0,2876,115096.0,1467.0,1205.0,1179.0,1743.0,2054.0,1742.0,,2020-04-01,2020-06-30,90,zip code,2,14272,f,Zip Code: 33606,,Florida,FL,Single Family Residential,6,928750.0,0.218033,0.077436,950000.0,0.279462,0.131087,353.477081,0.068033,0.043833,368.117798,0.036848,0.031853,44.0,0.1,-0.137255,19.0,0.727273,0.0,55.0,-0.083333,0.018519,40.0,-0.130435,0.025641,,,,21.5,-34.5,-20.5,0.96618,0.002929,0.00669,0.068182,0.018182,0.009358,,,,0.526316,-0.110048,0.052632,"Tampa, FL",45300,2022-08-14 14:44:22,33606,2020
69,2020,12,33606,36749,17503,19246,30220,3121,157,846,0,2876,115096.0,1467.0,1205.0,1179.0,1743.0,2054.0,1742.0,,2020-07-01,2020-09-30,90,zip code,2,14272,f,Zip Code: 33606,,Florida,FL,Single Family Residential,6,849000.0,-0.0944,0.078095,948250.0,-0.196398,0.355611,368.117798,-0.013694,0.044758,389.404494,-0.001999,0.072374,53.0,0.0,0.261905,12.0,0.0,-0.076923,54.0,0.0,-0.018182,30.0,0.0,-0.4,,,,34.5,17.0,-7.5,0.978935,-0.000202,0.009629,0.150943,0.0,0.055705,,,,0.666667,0.083333,0.205128,"Tampa, FL",45300,2022-08-14 14:44:22,33606,2020
70,2020,12,33606,36749,17503,19246,30220,3121,157,846,0,2876,115096.0,1467.0,1205.0,1179.0,1743.0,2054.0,1742.0,,2020-02-01,2020-04-30,90,zip code,2,14272,f,Zip Code: 33606,,Florida,FL,Single Family Residential,6,773062.5,-0.0025,0.023924,778062.5,-0.056322,-0.180986,344.006492,-0.026953,0.047434,355.293759,-0.058558,-0.043659,42.0,0.2,0.135135,12.0,0.2,0.090909,56.0,-0.034483,0.056604,43.0,0.02381,-0.104167,,,,30.0,-63.0,-31.5,0.991364,-0.002061,0.029502,0.142857,0.0,0.088803,,,,0.583333,-0.016667,0.128788,"Tampa, FL",45300,2022-08-14 14:44:22,33606,2020
71,2020,12,33606,36749,17503,19246,30220,3121,157,846,0,2876,115096.0,1467.0,1205.0,1179.0,1743.0,2054.0,1742.0,,2020-01-01,2020-03-31,90,zip code,2,14272,f,Zip Code: 33606,,Florida,FL,Single Family Residential,6,775000.0,-0.060606,0.152416,824500.0,-0.173434,-0.096438,353.535354,-0.057537,0.099539,377.393228,-0.032475,0.034347,35.0,0.060606,0.25,10.0,-0.375,-0.090909,58.0,0.348837,0.074074,42.0,0.3125,-0.086957,,,,93.0,16.5,42.0,0.993425,0.005691,0.03632,0.142857,0.021645,0.071429,,,,0.6,-0.15,-0.036364,"Tampa, FL",45300,2022-08-14 14:44:22,33606,2020


### <font color="green">Section #3 - Visualization</font> 📈
Visualize housing and economic data trends over time.

Source [US Census Bureau Median Gross Rent by Bedrooms](https://api.census.gov/data/2020/acs/acs5/variables/B25031_001E.json)

In [22]:
fig = px.line(df_census_grp, x='year', y='median_rent_all_bds', title='Median Rent by Year for {}'.format(zipcode))
fig.show()

Source [US Census Bureau Median Gross Rent by Bedrooms](https://api.census.gov/data/2020/acs/acs5/variables/B25031_001E.json)

In [27]:
rent_cols = ['census.median_rent_all_bds', 'census.median_rent_0_beds',
             'census.median_rent_1_beds', 'census.median_rent_2_beds', 'census.median_rent_3_beds']

# select relevant cols
df_plot = df_zip_e[['redfin.year'] + rent_cols].drop_duplicates()

# melt df
df_plot = pd.melt(df_plot, id_vars=['redfin.year'], value_vars=rent_cols, 
                  var_name='num_bds', value_name='median_rent')
df_plot.head(6)

Unnamed: 0,redfin.year,num_bds,median_rent
0,2015,census.median_rent_all_bds,1054.0
1,2016,census.median_rent_all_bds,1160.0
2,2017,census.median_rent_all_bds,1312.0
3,2018,census.median_rent_all_bds,1314.0
4,2019,census.median_rent_all_bds,1444.0
5,2020,census.median_rent_all_bds,1467.0


In [28]:
fig = px.line(df_plot, x='redfin.year', y='median_rent', 
              color='num_bds', title='Median Rent by Bedrooms and Year for {}'.format(zipcode))
fig.show()

## <font color="blue">Output</font>

In [24]:
# # download file
# df.to_csv('output.csv', index=False)
# files.download('output.csv')

# End Notebook