Author: Dana Chermesh, Regional Planning intern; NYC DCP<br>
Summer 2018

### _US Metros comparison  Notebook no.3_
# Housing 2010, 2016
- **_ACS 5-year estimates 2012-2016 using Census API_**
- **_ACS 5-year estimates 2006-2010 using Census API_**

----

A user guide for Census Data API:

# [Census Data API User Guide](https://www.census.gov/content/dam/Census/data/developers/api-user-guide/api-guide.pdf)

The Census Data API is an API that gives the public access to raw statistical data from various Census Bureau data
programs. The data are aggregated spatially and are usually associated with a
Census geographic boundary or area identified by a FIPS code.

## _get your API key from:_ 
https://api.census.gov/data/key_signup.html

**Recommended:** In order to keep your API key confidential, please save your API key in a .py file named **censusAPI.py** as follows:

```python
myAPI = 'XXXXXXXXXXXXXXX'
```
Then import it into this notebook as in the following cell:
```python
from censusAPI import myAPI
```
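
The Census Data API accepts the key as a `key` query parameter. The following is a minimal sketch of assembling a request URL; the helper `build_census_url` is hypothetical (it is not part of this notebook), and the key shown is the placeholder from above:

```python
from urllib.parse import urlencode


def build_census_url(year, dataset, variables, geo_for, geo_in=None, key=None):
    """Hypothetical helper: assemble a Census Data API query URL."""
    base = 'https://api.census.gov/data/{}/{}'.format(year, dataset)
    params = [('get', ','.join(variables)), ('for', geo_for)]
    if geo_in is not None:
        params.append(('in', geo_in))
    if key is not None:
        params.append(('key', key))
    # keep ':', '*' and ',' unescaped so the URL stays readable
    return base + '?' + urlencode(params, safe=':*,')


url = build_census_url(2016, 'acs/acs5', ['B25001_001E', 'NAME'],
                       'county:*', geo_in='state:*', key='XXXXXXXXXXXXXXX')
print(url)
```

Building the URL from parts keeps the variable list and the geography clauses separate, which makes it easier to reuse the same request for a different year or geography.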

### The complete list of all available datasets for the API is located here:
https://api.census.gov/data.html


In [1]:
# Python 2/3 compatibility; __future__ imports must come before any other statement
from __future__ import print_function, division

# imports for reading in, munging and calculating data
import pandas as pd
import json
import requests
import urllib
import numpy as np

# reading in my api key saved in censusAPI.py as
# myAPI = 'XXXXXXXXXXXXXXX'
# request an api key in: https://api.census.gov/data/key_signup.html
from censusAPI import myAPI

# Spatial
import geopandas as gpd
import fiona
import shapely

# Plotting
import matplotlib.pylab as pl
import seaborn as sns
sns.set_style('whitegrid')

%pylab inline

Populating the interactive namespace from numpy and matplotlib


----
# Housing units 2016
### _data were obtained from the  ACS 2012-2016 5-year estimate, all counties in the US_
variables to be acquired:
- **B25001_001E** |	Total Housing Units (occupied+vacant)
- **B25003_002E** | Owner occupied
- **B25003_003E** | Renter occupied

In [2]:
# read in the available variables; the info you need is in the 5-year ACS data
url = "https://api.census.gov/data/2016/acs/acs5/variables.json"
resp = requests.request('GET', url)
aff1y = json.loads(resp.text)

In [3]:
#turning things into arrays to enable broadcasting
#Python3
affkeys = np.array(list(aff1y['variables'].keys()))

affkeys

array(['B01001B_017E', 'B16005G_005E', 'B07004H_004E', ..., 'B12002_069E',
       'B17010H_012E', 'B08119_053E'], dtype='<U14')

In [4]:
# variable codes for the housing estimates
totalHU = 'B25001_001E'
owner = 'B25003_002E'
renter = 'B25003_003E'

aff1y['variables'][totalHU]

{'attributes': 'B25001_001M,B25001_001MA,B25001_001EA',
 'concept': 'HOUSING UNITS',
 'group': 'B25001',
 'label': 'Estimate!!Total',
 'limit': 0,
 'predicateType': 'int'}
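
The variables dictionary can also be searched by table (`group`) rather than looked up one code at a time. A small sketch under stated assumptions: `sample_variables` is a hand-written stand-in shaped like `aff1y['variables']`, the two `B25003` entries are illustrative rather than copied from the API response, and `variables_in_group` is a hypothetical helper:

```python
# hand-written sample shaped like aff1y['variables']; not the full API response
sample_variables = {
    'B25001_001E': {'concept': 'HOUSING UNITS', 'group': 'B25001',
                    'label': 'Estimate!!Total'},
    # illustrative entries; only the 'group' field matters for this sketch
    'B25003_002E': {'group': 'B25003'},
    'B25003_003E': {'group': 'B25003'},
}


def variables_in_group(variables, group):
    """Hypothetical helper: return the sorted variable codes of one table."""
    return sorted(code for code, meta in variables.items()
                  if meta.get('group') == group)


tenure_codes = variables_in_group(sample_variables, 'B25003')
print(tenure_codes)
```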

In [5]:
# HU2016 data for all counties in the US
totalHU16 = pd.read_json('https://api.census.gov/data/2016/acs/acs5?get='+
                         totalHU + ',' +
                         owner + ',' +
                         renter +',NAME&for=county:*&in=state:*')
totalHU16.columns = totalHU16.iloc[0]
totalHU16 = totalHU16[1:]

totalHU16['state'] = totalHU16['state'].apply(lambda x: '{0:0>2}'.format(x))
totalHU16['county'] = totalHU16['county'].apply(lambda x: '{0:0>3}'.format(x))
totalHU16['STCO'] = totalHU16[['state', 'county']].apply(lambda x: ''.join(x), axis=1)

totalHU16 = totalHU16.drop(['state', 'county'], axis=1)
totalHU16.columns = ['TotalHousing16', 'Owners16', 'renters16',
                     'Name', 'STCO']

print(totalHU16.shape)
totalHU16.head()

(3220, 5)


Unnamed: 0,TotalHousing16,Owners16,renters16,Name,STCO
1,22714,15218,5582,"Autauga County, Alabama",1001
2,107579,53905,21244,"Baldwin County, Alabama",1003
3,11802,5829,3293,"Barbour County, Alabama",1005
4,8972,5119,1929,"Bibb County, Alabama",1007
5,23850,16254,4365,"Blount County, Alabama",1009
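
The `STCO` key used above is the 2-digit state FIPS code followed by the 3-digit county FIPS code. As an alternative to the `apply`/`format` calls, the same padding can be sketched with the vectorized `str.zfill`; the toy codes below are made up for illustration:

```python
import pandas as pd

# toy state/county codes, some of them missing their leading zeros
codes = pd.DataFrame({'state': ['1', '06', '36'],
                      'county': ['1', '37', '061']})

# pad to 2 and 3 characters, then concatenate into a 5-character key
codes['STCO'] = codes['state'].str.zfill(2) + codes['county'].str.zfill(3)
print(codes['STCO'].tolist())
```

Keeping `STCO` as a string preserves the leading zero of states such as Alabama (`01`) and California (`06`), which an integer column would drop.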


##  Reading in geo-coded dataset
Created in a different notebook; please refer to [notebook no.0: 0-US_Metro_Comparison_Geographies.ipynb](https://github.com/NYCPlanning/rp-USmetros_comparison/blob/master/0-US_Metro_Comparison_Geographies.ipynb)

In [6]:
geo = pd.read_csv('../rp-USmetros_comparison/data/USmetros_full_correct.csv')\
                       .drop(['Unnamed: 0'], axis=1)#.set_index('County_name')
geo['STCO'] = geo['STCO'].apply(lambda x: '{0:0>5}'.format(x))

print(geo.shape)
geo.head()

(274, 4)


Unnamed: 0,CSA,CSA_name,County_name,STCO
0,348,"Los Angeles-Long Beach, CA",Riverside,6065
1,348,"Los Angeles-Long Beach, CA",San Bernardino,6071
2,348,"Los Angeles-Long Beach, CA",Ventura,6111
3,176,"Chicago-Naperville, IL-IN-WI",Cook,17031
4,488,"San Jose-San Francisco-Oakland, CA",Alameda,6001


In [7]:
for i in geo['CSA'].unique():
    print('No. of counties in CSA {}: {}'\
          .format(i, geo[geo['CSA']==i].shape[0]))

No. of counties in CSA 348: 5
No. of counties in CSA 176: 19
No. of counties in CSA 488: 12
No. of counties in CSA 216: 12
No. of counties in CSA 408: 31
No. of counties in CSA 148: 19
No. of counties in CSA 428: 16
No. of counties in CSA 548: 40
No. of counties in CSA 370: 7
No. of counties in CSA 122: 39
No. of counties in CSA 220: 10
No. of counties in CSA 378: 21
No. of counties in CSA 206: 20
No. of counties in CSA 288: 14
No. of counties in CSA 500: 9


In [8]:
STCO = list(geo['STCO'])

print(type(STCO))
print(len(STCO))
STCO[:5]

<class 'list'>
274


['06065', '06071', '06111', '17031', '06001']

In [9]:
STCOstr = ",".join(STCO)
STCOstr

'06065,06071,06111,17031,06001,06013,06041,06055,06069,06075,06077,06081,06085,06087,06095,06097,06037,06059,08001,08005,08013,08014,08019,08031,08035,08039,08047,08059,08093,08123,09001,09005,09009,34003,34013,34017,34019,34021,34023,34025,34027,34029,34031,34035,34037,34039,34041,36005,36027,36047,36059,36061,36071,36079,36081,36085,36087,36103,36111,36119,09015,25001,25005,25009,25017,25021,25023,25025,25027,33001,33011,33013,33015,33017,44001,44003,44005,44007,44009,10001,10003,24015,34001,34005,34007,34009,34011,34015,34033,42011,42017,42029,42045,42091,42101,11001,24003,24005,24009,24013,24017,24019,24021,24025,24027,24031,24033,24035,24037,24041,24043,24510,42055,51013,51043,51047,51059,51061,51069,51107,51153,51157,51177,51179,51187,51510,51600,51610,51630,51683,51685,51840,54003,54027,54037,12011,12061,12085,12086,12093,12099,12111,13013,13015,13035,13045,13057,13059,13063,13067,13077,13085,13089,13097,13113,13117,13121,13129,13135,13139,13143,13149,13151,13157,13159,13171,131

### Merging datasets

In [10]:
HOUSING16_CO = totalHU16.merge(geo, on='STCO').set_index('County_name').drop("Name", axis=1)

print(HOUSING16_CO.shape)
HOUSING16_CO.head()

(274, 6)


County_name,TotalHousing16,Owners16,renters16,STCO,CSA,CSA_name
Alameda,592796,296634,267659,6001,488,"San Jose-San Francisco-Oakland, CA"
Contra Costa,406803,250055,137485,6013,488,"San Jose-San Francisco-Oakland, CA"
Los Angeles,3490118,1499576,1782269,6037,348,"Los Angeles-Long Beach, CA"
Marin,112259,66200,38200,6041,488,"San Jose-San Francisco-Oakland, CA"
Napa,55301,30411,18964,6055,488,"San Jose-San Francisco-Oakland, CA"
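
An inner merge silently drops any target county whose `STCO` has no match in the Census table. One way to check for that, sketched here on toy data (the `99999` code is deliberately fake), is a left merge with `indicator=True`:

```python
import pandas as pd

# toy stand-ins for totalHU16 and geo
housing = pd.DataFrame({'STCO': ['06001', '06013', '01001'],
                        'TotalHousing16': ['592796', '406803', '22714']})
targets = pd.DataFrame({'STCO': ['06001', '06013', '99999'],
                        'CSA': [488, 488, 0]})

# keep every target county and record where each row came from
checked = targets.merge(housing, on='STCO', how='left', indicator=True)
missing = checked.loc[checked['_merge'] == 'left_only', 'STCO'].tolist()
print(missing)
```

In this notebook the merged shape `(274, 6)` matches the 274 rows of `geo`, so no county appears to have been dropped.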


In [11]:
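# convert numeric columns from str ('object') to int via to_numeric;
# assigning by column label (rather than .iloc) replaces the columns, so the new dtype sticks
num_cols = ['TotalHousing16', 'Owners16', 'renters16']
HOUSING16_CO[num_cols] = HOUSING16_CO[num_cols].apply(pd.to_numeric,
                                                      errors='coerce')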

HOUSING16_CO.dtypes

TotalHousing16     int64
Owners16           int64
renters16          int64
STCO              object
CSA                int64
CSA_name          object
dtype: object
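
The API returns every value as a string, which is why the conversion is needed. With `errors='coerce'`, anything that cannot be parsed becomes `NaN` instead of raising an error. A short sketch on made-up values:

```python
import pandas as pd

# made-up raw values: numeric strings, one missing value and one unparsable string
raw = pd.DataFrame({'TotalHousing16': ['22714', '107579', None],
                    'Owners16': ['15218', 'n/a', '5829']})

clean = raw.apply(pd.to_numeric, errors='coerce')
print(clean.isna().sum())
```

Coerced columns that contain `NaN` end up as floats, so counting the `NaN`s after conversion is a quick way to spot values that did not parse.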

### Exporting all counties Housing data to .csv

In [12]:
HOUSING16_CO.to_csv('HOUSING16_CO_NEW.csv')

## Groupby CSAs to sum

In [13]:
CSA_housing16 = HOUSING16_CO.groupby(['CSA', 'CSA_name']).sum()

print(CSA_housing16.shape)
CSA_housing16

(15, 3)


CSA,CSA_name,TotalHousing16,Owners16,renters16
122,"Atlanta--Athens-Clarke County--Sandy Springs, GA",2475969,1396394,824382
148,"Boston-Worcester-Providence, MA-RI-NH-CT",3412143,1934972,1151423
176,"Chicago-Naperville, IL-IN-WI",3969378,2327231,1282068
206,"Dallas-Fort Worth, TX-OK",2832798,1566020,1034702
216,"Denver-Aurora, CO",1350238,814383,464369
220,"Detroit-Warren-Ann Arbor, MI",2340603,1420760,652609
288,"Houston-The Woodlands, TX",2534512,1383407,907205
348,"Los Angeles-Long Beach, CA",6375740,3072459,2820374
370,"Miami-Fort Lauderdale-Port St. Lucie, FL",2809034,1419029,888587
378,"Minneapolis-St. Paul, MN-WI",1554227,1028692,444354
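
The grouped sum above relies on pandas leaving the string column `STCO` out of the result. A slightly more explicit sketch, which names the columns to sum so that the result does not depend on that behavior, using three counties from the table above:

```python
import pandas as pd

counties = pd.DataFrame({
    'CSA': [488, 488, 348],
    'CSA_name': ['San Jose-San Francisco-Oakland, CA',
                 'San Jose-San Francisco-Oakland, CA',
                 'Los Angeles-Long Beach, CA'],
    'STCO': ['06001', '06013', '06037'],
    'TotalHousing16': [592796, 406803, 3490118],
})

# select the numeric columns explicitly before summing
value_cols = ['TotalHousing16']
totals = counties.groupby(['CSA', 'CSA_name'])[value_cols].sum()
print(totals)
```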


### Exporting CSA's Housing data to .csv

In [14]:
CSA_housing16.to_csv('HOUSING16_CSAs_NEW.csv')

----

# Housing units 2010
### _data were obtained from the ACS 2006-2010 5-year estimate, all counties in the US_
variables to be acquired:
- **B25002_001E** |	Total Housing Units (occupied + vacant)
- **B25003_002E** | Owner occupied
- **B25003_003E** | Renter occupied

In [15]:
url10 = "https://api.census.gov/data/2010/acs/acs5/variables.json"
resp10 = requests.request('GET', url10)
aff1y10 = json.loads(resp10.text)

In [16]:
#turning things into arrays to enable broadcasting
#Python3
affkeys10 = np.array(list(aff1y10['variables'].keys()))

affkeys10

array(['C15002C_005E', 'B01001B_017E', 'B07008PR_022E', ...,
       'B23010_008E', 'B99252_002E', 'B08603_011E'], dtype='<U14')

In [17]:
# variable codes for the housing estimates
totalHU10 = 'B25002_001E'
owner10 = 'B25003_002E'
renter10 = 'B25003_003E'

aff1y10['variables'][totalHU10]

{'attributes': 'B25002_001EA,B25002_001M,B25002_001MA',
 'group': 'B25002',
 'label': 'Estimate!!Total',
 'limit': 0,
 'predicateType': 'int'}

In [18]:
# HU2010 data for all counties in the US
totalHU10 = pd.read_json('https://api.census.gov/data/2010/acs/acs5?get='+
                         totalHU10 + ',' +
                         owner10 + ',' +
                         renter10 +',NAME&for=county:*&in=state:*')
totalHU10.columns = totalHU10.iloc[0]
totalHU10 = totalHU10[1:]

totalHU10['state'] = totalHU10['state'].apply(lambda x: '{0:0>2}'.format(x))
totalHU10['county'] = totalHU10['county'].apply(lambda x: '{0:0>3}'.format(x))
totalHU10['STCO'] = totalHU10[['state', 'county']].apply(lambda x: ''.join(x), axis=1)

totalHU10.iloc[:,:3] = totalHU10.iloc[:,:3].apply(pd.to_numeric,
                                                 errors='coerce')

totalHU10 = totalHU10.drop(['state', 'county'], axis=1)
totalHU10.columns = ['TotalHousing10', 'Owners10', 'renters10',
                     'Name', 'STCO']

print(totalHU10.shape)
totalHU10.head()

(3221, 5)


Unnamed: 0,TotalHousing10,Owners10,renters10,Name,STCO
1,27478,15389,8301,"Troup County, Georgia",13285
2,3873,1993,1103,"Turner County, Georgia",13287
3,4272,2488,582,"Twiggs County, Georgia",13289
4,13714,7392,2079,"Union County, Georgia",13291
5,12188,7334,3168,"Upson County, Georgia",13293


In [19]:
totalHU10.dtypes

TotalHousing10    object
Owners10          object
renters10         object
Name              object
STCO              object
dtype: object

In [20]:
totalHU10 = totalHU10.merge(geo, on='STCO').set_index('County_name').drop("Name", axis=1)

print(totalHU10.shape)
totalHU10.head()

(274, 6)


County_name,TotalHousing10,Owners10,renters10,STCO,CSA,CSA_name
Troup,27478,15389,8301,13285,122,"Atlanta--Athens-Clarke County--Sandy Springs, GA"
Upson,12188,7334,3168,13293,122,"Atlanta--Athens-Clarke County--Sandy Springs, GA"
Walton,31898,22161,6986,13297,122,"Atlanta--Athens-Clarke County--Sandy Springs, GA"
Bureau,15686,10959,3621,17011,176,"Chicago-Naperville, IL-IN-WI"
Cook,2173433,1169991,766490,17031,176,"Chicago-Naperville, IL-IN-WI"


In [21]:
# export to .csv
totalHU10.to_csv('exports/HOUSING10_CO_NEW.csv')

In [22]:
totalHU10[['TotalHousing10','renters10', 'Owners10']] = totalHU10[['TotalHousing10',
                                                        'renters10', 'Owners10']].apply(pd.to_numeric,
                                                         errors='coerce')

In [23]:
totalHU10.dtypes

TotalHousing10     int64
Owners10           int64
renters10          int64
STCO              object
CSA                int64
CSA_name          object
dtype: object

In [24]:
CSA_housing10 = totalHU10.groupby(['CSA', 'CSA_name']).sum()

print(CSA_housing10.shape)
CSA_housing10

(15, 3)


CSA,CSA_name,TotalHousing10,Owners10,renters10
122,"Atlanta--Athens-Clarke County--Sandy Springs, GA",2385537,1419824,677107
148,"Boston-Worcester-Providence, MA-RI-NH-CT",3352133,1964962,1066964
176,"Chicago-Naperville, IL-IN-WI",3935118,2431899,1137673
206,"Dallas-Fort Worth, TX-OK",2637717,1509869,866058
216,"Denver-Aurora, CO",1282906,787841,395363
220,"Detroit-Warren-Ann Arbor, MI",2336269,1504995,561392
288,"Houston-The Woodlands, TX",2299533,1284443,739694
348,"Los Angeles-Long Beach, CA",6221825,3181828,2547900
370,"Miami-Fort Lauderdale-Port St. Lucie, FL",2749614,1509088,744024
378,"Minneapolis-St. Paul, MN-WI",1512218,1035406,383150


In [25]:
CSA_housing = CSA_housing10.merge(CSA_housing16, left_index=True,
                                                 right_index=True)

CSA_housing['HousingNET'] = CSA_housing['TotalHousing16']-CSA_housing['TotalHousing10']
CSA_housing['OwnersNET'] = CSA_housing['Owners16']-CSA_housing['Owners10'] 
CSA_housing['RentersNET'] = CSA_housing['renters16']-CSA_housing['renters10'] 

CSA_housing

CSA,CSA_name,TotalHousing10,Owners10,renters10,TotalHousing16,Owners16,renters16,HousingNET,OwnersNET,RentersNET
122,"Atlanta--Athens-Clarke County--Sandy Springs, GA",2385537,1419824,677107,2475969,1396394,824382,90432,-23430,147275
148,"Boston-Worcester-Providence, MA-RI-NH-CT",3352133,1964962,1066964,3412143,1934972,1151423,60010,-29990,84459
176,"Chicago-Naperville, IL-IN-WI",3935118,2431899,1137673,3969378,2327231,1282068,34260,-104668,144395
206,"Dallas-Fort Worth, TX-OK",2637717,1509869,866058,2832798,1566020,1034702,195081,56151,168644
216,"Denver-Aurora, CO",1282906,787841,395363,1350238,814383,464369,67332,26542,69006
220,"Detroit-Warren-Ann Arbor, MI",2336269,1504995,561392,2340603,1420760,652609,4334,-84235,91217
288,"Houston-The Woodlands, TX",2299533,1284443,739694,2534512,1383407,907205,234979,98964,167511
348,"Los Angeles-Long Beach, CA",6221825,3181828,2547900,6375740,3072459,2820374,153915,-109369,272474
370,"Miami-Fort Lauderdale-Port St. Lucie, FL",2749614,1509088,744024,2809034,1419029,888587,59420,-90059,144563
378,"Minneapolis-St. Paul, MN-WI",1512218,1035406,383150,1554227,1028692,444354,42009,-6714,61204
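
The net columns are plain differences between the 2016 and 2010 estimates. The sketch below reproduces `HousingNET` for two rows of the table above and adds a percent-change column; `HousingPCT` is a hypothetical name that does not appear in this notebook:

```python
import pandas as pd

# two rows copied from the table above
csa = pd.DataFrame({'TotalHousing10': [1282906, 2637717],
                    'TotalHousing16': [1350238, 2832798]},
                   index=['Denver-Aurora, CO', 'Dallas-Fort Worth, TX-OK'])

csa['HousingNET'] = csa['TotalHousing16'] - csa['TotalHousing10']
# hypothetical extra column: net change relative to the 2010 stock, in percent
csa['HousingPCT'] = 100 * csa['HousingNET'] / csa['TotalHousing10']
print(csa)
```

Relative change makes metros of very different sizes easier to compare than the raw net counts do.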


In [26]:
# export to .csv
CSA_housing.to_csv('exports/CSA_housing10-16_NEW.csv')

----

## Places

City-Suburbs by tenure

In [32]:
# HU2016 data for all places in the US
placeHU16 = pd.read_json('https://api.census.gov/data/2016/acs/acs5?get='+
                         totalHU + ',' +
                         owner + ',' +
                         renter +',NAME&for=place:*&in=state:*')
placeHU16.columns = placeHU16.iloc[0]
placeHU16 = placeHU16[1:]

placeHU16['state'] = placeHU16['state'].apply(lambda x: '{0:0>2}'.format(x))
placeHU16['place'] = placeHU16['place'].apply(lambda x: '{0:0>5}'.format(x))  # place FIPS codes are 5 digits
placeHU16['STPL'] = placeHU16[['state', 'place']].apply(lambda x: ''.join(x), axis=1)

placeHU16 = placeHU16.drop(['state', 'place'], axis=1)
placeHU16.columns = ['TotalHousing16', 'Owners16', 'renters16',
                     'Name', 'STPL']

print(placeHU16.shape)
placeHU16.head()

(29574, 5)


Unnamed: 0,TotalHousing16,Owners16,renters16,Name,STPL
1,63,50,13,"Abanda CDP, Alabama",100100
2,1319,710,304,"Abbeville city, Alabama",100124
3,2069,1293,445,"Adamsville city, Alabama",100460
4,401,237,109,"Addison town, Alabama",100484
5,199,70,37,"Akron town, Alabama",100676


In [33]:
placeHU16.STPL = placeHU16.STPL.astype(int)
placeHU16.dtypes

TotalHousing16    object
Owners16          object
renters16         object
Name              object
STPL               int64
dtype: object

### Reading in geo-coded places dataset

Created by Dara Goldberg, DCP Regional Planning

In [34]:
geoPlace = pd.read_csv('../rp-USmetros_comparison/data/Geocoded_places.csv')
geoPlace['GEOID'] = geoPlace['GEOID'].apply(lambda x: '{0:0>7}'.format(x))
geoPlace['GEOID'] = geoPlace['GEOID'].astype(int)

print(geoPlace.shape)
geoPlace.head(3)

(19, 5)


Unnamed: 0,GEOID,NAMELSAD,NAME,CSA,ALAND_mi
0,644000,"Los Angeles city, California",Los Angeles,348,468.65867
1,653000,"Oakland city, California",Oakland,488,55.89604
2,667000,"San Francisco city, California",San Francisco,488,46.90564


In [35]:
geoPlace.dtypes

GEOID         int64
NAMELSAD     object
NAME         object
CSA           int64
ALAND_mi    float64
dtype: object

In [37]:
# merging all places with our target places list
tenure_place = geoPlace.merge(placeHU16, left_on = 'GEOID', right_on = 'STPL')

tenure_place = tenure_place.drop(['Name', 'STPL'], axis=1).set_index('GEOID')

print(tenure_place.shape)
tenure_place.head()

(19, 7)


GEOID,NAMELSAD,NAME,CSA,ALAND_mi,TotalHousing16,Owners16,renters16
644000,"Los Angeles city, California",Los Angeles,348,468.65867,1447026,496667,859644
653000,"Oakland city, California",Oakland,488,55.89604,169654,62943,95994
667000,"San Francisco city, California",San Francisco,488,46.90564,386755,131331,225466
668000,"San Jose city, California",San Jose,488,177.5141,328185,181122,136195
820000,"Denver city, Colorado",Denver,216,153.30483,299338,138870,142202


In [38]:
tenure_place.to_csv('exports/tenure_place.csv')