Author: Dana Chermesh, Regional Planning intern; NYC DCP<br>
Summer 2018

### _US Metros comparison  Notebook no.4_
# Employment - total industries
For employment by industry and ownership, please see notebook **4.0-BLS-QCEW_byIndustry.ipynb** in this repo.
### Data were obtained from: BLS-QCEW
Data Source: [Bureau of Labor Statistics, Quarterly Census of Employment and Wages](https://www.bls.gov/cew/datatoc.htm) (BLS-QCEW)

---- 
# 0 - Imports

In [1]:
# __future__ imports must come first when this cell is run as a script
from __future__ import print_function, division

import requests
import pandas as pd
import numpy as np

import matplotlib.pylab as pl
import seaborn as sns
sns.set_style('whitegrid')
# import json

# Spatial
import geopandas as gpd
import fiona
import shapely

import statsmodels.formula.api as smf
import statsmodels.api as sm

%pylab inline

Populating the interactive namespace from numpy and matplotlib


# 1 - Data acquisition

## Download directly from the BLS-QCEW website

The following code was downloaded from the [BLS-QCEW website](https://data.bls.gov/cew/doc/access/data_access_examples.htm#PYTHON); it is the sample code for Python 3 ([Download](https://data.bls.gov/cew/doc/access/qcew_python_3x_example.zip)).

In [2]:
import urllib.request
import urllib

# *******************************************************************************
# qcewCreateDataRows : This function takes a raw csv string and splits it into
# a two-dimensional array containing the data and the header row of the csv file
# a try/except block is used to handle for both binary and char encoding
def qcewCreateDataRows(csv):
    dataRows = []
    try: dataLines = csv.decode().split('\r\n')
    # a str has no .decode() method, so fall back to splitting it directly
    except AttributeError: dataLines = csv.split('\r\n')
    for row in dataLines:
        dataRows.append(row.split(','))
    return dataRows
# *******************************************************************************


# *******************************************************************************
# qcewGetAreaData : This function takes a year, quarter, and area argument and
# returns an array containing the associated area data. Use 'a' for annual
# averages. 
# For all area codes and titles see:
# http://www.bls.gov/cew/doc/titles/area/area_titles.htm
#
def qcewGetAreaData(year,qtr,area):
    urlPath = "http://data.bls.gov/cew/data/api/[YEAR]/[QTR]/area/[AREA].csv"
    urlPath = urlPath.replace("[YEAR]",year)
    urlPath = urlPath.replace("[QTR]",qtr.lower())
    urlPath = urlPath.replace("[AREA]",area.upper())
    httpStream = urllib.request.urlopen(urlPath)
    csv = httpStream.read()
    httpStream.close()
    return qcewCreateDataRows(csv)
# *******************************************************************************


# *******************************************************************************
# qcewGetIndustryData : This function takes a year, quarter, and industry code
# and returns an array containing the associated industry data. Use 'a' for 
# annual averages. Some industry codes contain hyphens. The CSV files use
# underscores instead of hyphens. So 31-33 becomes 31_33. 
# For all industry codes and titles see:
# http://www.bls.gov/cew/doc/titles/industry/industry_titles.htm
#
def qcewGetIndustryData(year,qtr,industry):
    urlPath = "http://data.bls.gov/cew/data/api/[YEAR]/[QTR]/industry/[IND].csv"
    urlPath = urlPath.replace("[YEAR]",year)
    urlPath = urlPath.replace("[QTR]",qtr.lower())
    urlPath = urlPath.replace("[IND]",industry)
    httpStream = urllib.request.urlopen(urlPath)
    csv = httpStream.read()
    httpStream.close()
    return qcewCreateDataRows(csv)
# *******************************************************************************


# *******************************************************************************
# qcewGetSizeData : This function takes a year and establishment size class code
# and returns an array containing the associated size data. Size data
# is only available for the first quarter of each year.
# For all establishment size classes and titles see:
# http://www.bls.gov/cew/doc/titles/size/size_titles.htm
#
def qcewGetSizeData(year,size):
    urlPath = "http://data.bls.gov/cew/data/api/[YEAR]/1/size/[SIZE].csv"
    urlPath = urlPath.replace("[YEAR]",year)
    urlPath = urlPath.replace("[SIZE]",size)
    httpStream = urllib.request.urlopen(urlPath)
    csv = httpStream.read()
    httpStream.close()
    return qcewCreateDataRows(csv)
# *******************************************************************************

# examples >> (hashed)

# Michigan_Data = qcewGetAreaData("2015","1","26000")
# Auto_Manufacturing = qcewGetIndustryData("2015","1","3361")
# SizeData = qcewGetSizeData("2015","6")

# # prints the industry_code in row 5.
# # remember row zero contains field names
# print(Michigan_Data[5][2])


# # prints the area_fips in row 1.
# # remember row zero contains field names
# print(Auto_Manufacturing[1][0])


# # prints the own_code in row 1.
# # remember row zero contains field names
# print(SizeData[1][1])
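
The sample functions above split each line on every comma, which leaves the double quotes in place (they are stripped later in this notebook) and would break a quoted field that itself contains a comma. As a minimal sketch, and not part of the BLS sample code, the same payload could be parsed with Python's `csv` module instead. The helper name `parse_qcew_csv` and the two-line payload are assumptions made for illustration.

```python
import csv
import io

def parse_qcew_csv(raw):
    """Split a raw CSV payload (bytes or str) into a list of rows (hypothetical helper)."""
    # The HTTP response is bytes; accept an already-decoded str as well.
    text = raw.decode("utf-8") if isinstance(raw, bytes) else raw
    # csv.reader removes the surrounding quotes and keeps quoted commas intact.
    return [row for row in csv.reader(io.StringIO(text, newline=""))]

# Illustrative payload shaped like an area slice; not real BLS output.
sample = b'"area_fips","own_code","annual_avg_emplvl"\r\n"06065","0","713182"\r\n'
rows = parse_qcew_csv(sample)
print(rows[0])  # header row
print(rows[1])  # first data row
```

With rows parsed this way, the later `replace('"', '')` clean-up steps would no longer be needed.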

##  Reading in geo-coded dataset
This dataset was created in a separate notebook; please refer to [notebook no.0: 0-US_Metro_Comparison_Geographies.ipynb](https://github.com/NYCPlanning/rp-USmetros_comparison/blob/master/0-US_Metro_Comparison_Geographies.ipynb).

In [3]:
geo = pd.read_csv('../rp-USmetros_comparison/data/USmetros_full_correct.csv')\
                                                .drop(['Unnamed: 0'], axis=1)
geo['STCO'] = geo['STCO'].apply(lambda x: '{0:0>5}'.format(x))

print(geo.shape)
geo.head()

(274, 4)


Unnamed: 0,CSA,CSA_name,County_name,STCO
0,348,"Los Angeles-Long Beach, CA",Riverside,6065
1,348,"Los Angeles-Long Beach, CA",San Bernardino,6071
2,348,"Los Angeles-Long Beach, CA",Ventura,6111
3,176,"Chicago-Naperville, IL-IN-WI",Cook,17031
4,488,"San Jose-San Francisco-Oakland, CA",Alameda,6001


In [4]:
geo.dtypes

CSA             int64
CSA_name       object
County_name    object
STCO           object
dtype: object
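
The `STCO` column is read as an integer, so California codes lose their leading zero and have to be padded back to five characters before they can match BLS area codes. A minimal sketch of an equivalent, vectorized way to do the padding is shown below; the three values are taken from the table above and the variable names are illustrative.

```python
import pandas as pd

# FIPS codes as pandas would read them from a CSV: integers without leading zeros.
fips = pd.Series([6065, 17031, 6001])

# Convert to text, then left-pad with zeros to a fixed width of 5.
padded = fips.astype(str).str.zfill(5)
print(padded.tolist())

# The format-string approach used in the cell above gives the same result.
same = fips.apply(lambda x: '{0:0>5}'.format(x))
print(padded.equals(same))
```

Passing `dtype={'STCO': str}` to `pd.read_csv` would avoid the problem only if the CSV file itself stores the leading zeros.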

In [5]:
STCO = list(geo['STCO'])

print(type(STCO))
print(len(STCO))
STCO[:10]

<class 'list'>
274


['06065',
 '06071',
 '06111',
 '17031',
 '06001',
 '06013',
 '06041',
 '06055',
 '06069',
 '06075']

## Getting 'annual_avg_emplvl' (all industries) for every county in the major US metros of this analysis
This uses the `qcewGetAreaData()` function from the BLS QCEW sample code above.

In [6]:
JobsCO17 = []

for i in STCO:
    county = qcewGetAreaData("2017","A", i)
    county = pd.DataFrame(county)
    county.columns = county.iloc[0]
    county = county[1:]
    county.columns = [i.replace('"', '') for i in county.columns]
    county = county.replace({'"':''}, regex=True)
    
    county = county[['area_fips','annual_avg_emplvl']][:1]
    
    JobsCO17.append(county)

JobsCO17 = pd.concat(JobsCO17)
JobsCO17.columns = ['STCO', 'Total_emp17']
JobsCO17 = JobsCO17.set_index('STCO')

print(JobsCO17.shape)
JobsCO17.head()

(274, 1)


STCO,Total_emp17
6065,713182
6071,730004
6111,322344
17031,2572191
6001,771652


In [7]:
JobsCO17.to_csv('JobsCO17_NEW.csv')
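
The loop above keeps the first data row of each county file and relies on that row being the county total. A minimal sketch of a more explicit selection follows. It assumes that `own_code` "0" and `industry_code` "10" identify the all-ownerships, all-industries total in the QCEW code lists, which should be checked against the BLS documentation. The rows are illustrative, not real BLS output.

```python
import pandas as pd

# Illustrative rows shaped like one county's annual area slice (values are made up
# except the total, which matches the 06065 figure shown above).
county = pd.DataFrame({
    "area_fips": ["06065", "06065", "06065"],
    "own_code": ["5", "0", "0"],
    "industry_code": ["10", "10", "101"],
    "annual_avg_emplvl": ["600000", "713182", "100000"],
})

# Assumption: own_code "0" = total covered, industry_code "10" = total, all industries.
is_total = (county["own_code"] == "0") & (county["industry_code"] == "10")
total = county.loc[is_total, ["area_fips", "annual_avg_emplvl"]]

# Exactly one total row is expected per county; fail loudly otherwise.
assert len(total) == 1
print(total)
```

Selecting by code rather than by position keeps the result correct even if the row order in the file changes.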

# 2000 / 2008 / 2010
### _**BLS QCEW open data is available from 2013 only; data for 2010, 2008 and 2000 were therefore downloaded to the local machine**_

Download the .zip files for the selected year under _**CSVs By Area**_ from https://www.bls.gov/cew/datatoc.htm and unzip them to a local folder. Then run the following code, which reads only the _**'annual_avg_emplvl'**_ value for the _**selected counties**_ by streaming each file instead of loading it whole.

## 2000

In [8]:
# check the notebook's working directory
# in order to pass the right path in the next cell
!pwd

/Users/danachermesh/Desktop/rp-USmetros_comparison


In [9]:
# creating a list of all file names in the unzipped folder

import os

mypath = '/Users/danachermesh/Desktop/rp-USmetros_comparison/data/2000.annual.by_area/'# change to your path
filesList = os.listdir(mypath)

filesList

['2000.annual 37103 Jones County, North Carolina.csv',
 '2000.annual 45079 Richland County, South Carolina.csv',
 '2000.annual 17049 Effingham County, Illinois.csv',
 '2000.annual 20151 Pratt County, Kansas.csv',
 '2000.annual C3534 New Iberia, LA MicroSA.csv',
 '2000.annual 48261 Kenedy County, Texas.csv',
 '2000.annual C1574 Cambridge, OH MicroSA.csv',
 '2000.annual 02130 Ketchikan Gateway Borough, Alaska.csv',
 '2000.annual 33001 Belknap County, New Hampshire.csv',
 '2000.annual 47141 Putnam County, Tennessee.csv',
 '2000.annual 08000 Colorado -- Statewide.csv',
 '2000.annual C4618 Tupelo, MS MicroSA.csv',
 '2000.annual 39049 Franklin County, Ohio.csv',
 '2000.annual 20035 Cowley County, Kansas.csv',
 '2000.annual 13287 Turner County, Georgia.csv',
 '2000.annual 13137 Habersham County, Georgia.csv',
 '2000.annual 46081 Lawrence County, South Dakota.csv',
 '2000.annual 48013 Atascosa County, Texas.csv',
 '2000.annual 22043 Grant Parish, Louisiana.csv',
 '2000.annual 48049 Brown Count

In [10]:
# creating a list of the selected files for data to be obtained from
# these are only the counties that we are interested in

Co_files00 = []

for filename in filesList:
    for i in STCO:
        if i in filename:
            Co_files00.append(filename)
            
len(Co_files00)

273
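
Only 273 of the 274 counties matched a 2000 file, so one county has no data for that year. The loop above also matches by substring, which could in principle pick up a code appearing elsewhere in a filename. As a minimal sketch, the area code can be read from its fixed position in the filename and the unmatched codes listed explicitly. The helper name and the code `99999` are assumptions for illustration; the filenames are taken from the listing above.

```python
def fips_from_filename(filename):
    """Return the area-code token, e.g. '37103' from '2000.annual 37103 Jones County, ...'."""
    parts = filename.split(" ", 2)
    return parts[1] if len(parts) > 1 else None

files = [
    "2000.annual 37103 Jones County, North Carolina.csv",
    "2000.annual 45079 Richland County, South Carolina.csv",
    "2000.annual C3534 New Iberia, LA MicroSA.csv",
]
wanted = ["37103", "45079", "99999"]  # '99999' is an illustrative code with no file

# Index the files by their exact area code, then look each wanted code up.
by_fips = {fips_from_filename(f): f for f in files}
selected = [by_fips[code] for code in wanted if code in by_fips]
missing = sorted(set(wanted) - set(by_fips))
print(len(selected), missing)
```

Applied to `filesList` and `STCO`, `missing` would name the county that drops out of the 2000 data.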

In [11]:
# streaming method: reduces computer memory consumption, more efficient
# reading in only the value (row+column) we need

import csv

# function to read in data from csv in a streamable mode
def csvRows(filename):
    with open(filename, 'r') as fi:
        reader = csv.DictReader(x.replace('\0', '') for x in fi)
        for row in reader:
            return row['area_fips'], row['annual_avg_emplvl']
        
# creating a path for the folder where all the .csv files are in
# change to your folder if needed
# path = '../DCP Internship/2000.annual.by_area/'

# creating an empty dict to store the data
JobsCO00 = {}

for i in Co_files00:
    row = csvRows(mypath + i)
    # append the data to the empty dict; STCO as key and total_emp as value 
    JobsCO00[row[0]] = row[1]


# making it a DataFrame
JobsCO00 = pd.DataFrame.from_dict(JobsCO00, orient='index', columns=['Total_emp00'])
JobsCO00 = JobsCO00.sort_index()

print(JobsCO00.shape)
print(type(JobsCO00))
print(JobsCO00.dtypes)
JobsCO00.head()

(273, 1)
<class 'pandas.core.frame.DataFrame'>
Total_emp00    object
dtype: object


Unnamed: 0,Total_emp00
6001,697215
6013,337924
6037,4110915
6041,112454
6055,60552


In [12]:
JobsCO00['Total_emp00'] = JobsCO00['Total_emp00'].astype(int)
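
`csvRows()` returns from inside the loop, so only the first data row of each file is ever parsed. The sketch below shows the same streaming idea under two assumptions of its own: it converts the value to `int` while reading, and it returns `None` for an empty file instead of failing later. The function name `first_total` is hypothetical; the sample values match the 06001 row above.

```python
import csv
import io

def first_total(lines):
    """Return (area_fips, annual_avg_emplvl) from the first data row, or None if empty."""
    # Strip NUL bytes lazily, line by line, as csvRows() does.
    reader = csv.DictReader(line.replace("\0", "") for line in lines)
    row = next(reader, None)  # stop after one row; the rest is never parsed
    if row is None:
        return None
    return row["area_fips"], int(row["annual_avg_emplvl"])

# A file-like object stands in for an opened county CSV.
sample = io.StringIO('"area_fips","annual_avg_emplvl"\n"06001","697215"\n"06001","1"\n')
print(first_total(sample))
```

Because any iterable of lines is accepted, the function works equally on an open file handle.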

## 2008

In [13]:
# creating a list of all file names in the unzipped folder

import os

mypath08 = '/Users/danachermesh/Desktop/rp-USmetros_comparison/data/2008.annual.by_area/'# change to your path
filesList08 = os.listdir(mypath08)

filesList08

['2008.annual 48107 Crosby County, Texas.csv',
 '2008.annual 21005 Anderson County, Kentucky.csv',
 '2008.annual 37153 Richmond County, North Carolina.csv',
 '2008.annual 78020 St. John, Virgin Islands.csv',
 '2008.annual 35003 Catron County, New Mexico.csv',
 '2008.annual 26093 Livingston County, Michigan.csv',
 '2008.annual C4494 Sumter, SC MSA.csv',
 '2008.annual 17105 Livingston County, Illinois.csv',
 '2008.annual 13083 Dade County, Georgia.csv',
 '2008.annual 06005 Amador County, California.csv',
 '2008.annual 54069 Ohio County, West Virginia.csv',
 '2008.annual 08105 Rio Grande County, Colorado.csv',
 '2008.annual C1586 Canon City, CO MicroSA.csv',
 '2008.annual 18021 Clay County, Indiana.csv',
 '2008.annual 13211 Morgan County, Georgia.csv',
 '2008.annual 45043 Georgetown County, South Carolina.csv',
 '2008.annual 29051 Cole County, Missouri.csv',
 '2008.annual C3218 Marshall, MO MicroSA.csv',
 '2008.annual 72119 Rio Grande Municipio, Puerto Rico.csv',
 '2008.annual CS434 Ponce

In [14]:
Co_files08 = []

for filename in filesList08:
    for i in STCO:
        if i in filename:
            Co_files08.append(filename)
            
len(Co_files08)

274

In [15]:
# creating an empty dict to store the data
JobsCO08 = {}

for i in Co_files08:
    row = csvRows(mypath08 + i)
    # append the data to the empty dict; STCO as key and total_emp as value 
    JobsCO08[row[0]] = row[1]


# making it a DataFrame
JobsCO08 = pd.DataFrame.from_dict(JobsCO08, orient='index', columns=['Total_emp08'])
JobsCO08 = JobsCO08.sort_index()

print(JobsCO08.shape)
print(type(JobsCO08))
print(JobsCO08.dtypes)
JobsCO08.head()

(274, 1)
<class 'pandas.core.frame.DataFrame'>
Total_emp08    object
dtype: object


Unnamed: 0,Total_emp08
6001,683099
6013,339547
6037,4168699
6041,109340
6055,68850


## Merging Jobs00 + Jobs08 + Jobs17

In [16]:
JobsCO = JobsCO00.merge(JobsCO08, left_index=True,
                                  right_index=True)

JobsCO = JobsCO.merge(JobsCO17, left_index=True,
                                right_index=True)

JobsCO['Total_emp08'] = JobsCO['Total_emp08'].fillna(0).astype(int)
JobsCO['Total_emp17'] = JobsCO['Total_emp17'].fillna(0).astype(int)

JobsCO['emp_NET00-08'] = JobsCO['Total_emp08'] - JobsCO['Total_emp00']
JobsCO['emp_%00-08'] = (JobsCO['Total_emp08'] - JobsCO['Total_emp00']) \
                    / JobsCO['Total_emp00']

JobsCO['emp_NET08-17'] = JobsCO['Total_emp17'] - JobsCO['Total_emp08']
JobsCO['emp_%08-17'] = (JobsCO['Total_emp17'] - JobsCO['Total_emp08']) \
                    / JobsCO['Total_emp08']

print(JobsCO.dtypes)
print(JobsCO.shape)
JobsCO.head()

Total_emp00       int64
Total_emp08       int64
Total_emp17       int64
emp_NET00-08      int64
emp_%00-08      float64
emp_NET08-17      int64
emp_%08-17      float64
dtype: object
(273, 7)


Unnamed: 0,Total_emp00,Total_emp08,Total_emp17,emp_NET00-08,emp_%00-08,emp_NET08-17,emp_%08-17
6001,697215,683099,771652,-14116,-0.020246,88553,0.129634
6013,337924,339547,366720,1623,0.004803,27173,0.080027
6037,4110915,4168699,4381836,57784,0.014056,213137,0.051128
6041,112454,109340,115432,-3114,-0.027691,6092,0.055716
6055,60552,68850,76762,8298,0.137039,7912,0.114916
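
The `emp_%` columns hold fractional changes, not percentages: a value of 0.129634 means roughly +13%. As a worked check, the first row of the table (06001) can be recomputed by hand; the function name `pct_change` is illustrative.

```python
def pct_change(start, end):
    """Fractional change from start to end; 0.05 means +5%."""
    return (end - start) / start

# Total employment for 06001 in 2000, 2008 and 2017, from the table above.
emp00, emp08, emp17 = 697215, 683099, 771652

change_00_08 = pct_change(emp00, emp08)
change_08_17 = pct_change(emp08, emp17)
print(round(change_00_08, 6), round(change_08_17, 6))
```

Multiplying by 100 before export would make the column name and its contents agree.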


### Exporting all counties employment 00-08-17 data to .csv

In [17]:
JobsCO.to_csv('exports/Jobs00-08-17CO_NEW.csv')

### Merging with geo-coded data in order to group by CSA

In [18]:
JobsCO_CSA = JobsCO.iloc[:,:3].merge(geo, left_index=True, 
                                right_on='STCO').set_index('County_name')

print(JobsCO_CSA.shape)
JobsCO_CSA.tail()

(273, 6)


County_name,Total_emp00,Total_emp08,Total_emp17,CSA,CSA_name,STCO
Hampshire,3735,4121,3742,548,"Washington-Baltimore-Arlington, DC-MD-VA-WV-PA",54027
Jefferson,12776,14337,15448,548,"Washington-Baltimore-Arlington, DC-MD-VA-WV-PA",54037
Kenosha,50952,54781,66392,176,"Chicago-Naperville, IL-IN-WI",55059
Pierce,9283,9328,10026,378,"Minneapolis-St. Paul, MN-WI",55093
St. Croix,25792,28959,33521,378,"Minneapolis-St. Paul, MN-WI",55109
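
The merged table has 273 rows although `geo` lists 274 counties, because the default inner merge silently drops any county without a match. A minimal sketch of how to surface such rows with pandas' `indicator` option follows; the demo frames and the code `99999` are illustrative stand-ins for `geo` and `JobsCO`.

```python
import pandas as pd

# Stand-in for geo: three counties, one of which has no jobs data.
geo_demo = pd.DataFrame({
    "STCO": ["06001", "06013", "99999"],
    "County_name": ["Alameda", "Contra Costa", "Example"],
})
# Stand-in for JobsCO: indexed by STCO, as in the notebook.
jobs_demo = pd.DataFrame({"Total_emp00": [697215, 337924]},
                         index=["06001", "06013"])

# A left merge keeps every county; _merge records where each row was found.
checked = geo_demo.merge(jobs_demo, left_on="STCO", right_index=True,
                         how="left", indicator=True)
unmatched = checked.loc[checked["_merge"] == "left_only", "STCO"].tolist()
print(unmatched)
```

Any county listed in `unmatched` is excluded from its CSA total in the steps that follow.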


In [19]:
JobsCSA = JobsCO_CSA.groupby(['CSA', 'CSA_name']).sum()

JobsCSA['emp_NET00-08'] = JobsCSA['Total_emp08'] - JobsCSA['Total_emp00']
JobsCSA['emp_%00-08'] = (JobsCSA['Total_emp08'] - JobsCSA['Total_emp00']) \
                                                / JobsCSA['Total_emp00']

JobsCSA['emp_NET08-17'] = JobsCSA['Total_emp17'] - JobsCSA['Total_emp08']
JobsCSA['emp_%08-17'] = (JobsCSA['Total_emp17'] - JobsCSA['Total_emp08']) \
                                                / JobsCSA['Total_emp08']

print(JobsCSA.shape)
JobsCSA

(15, 7)


CSA,CSA_name,Total_emp00,Total_emp08,Total_emp17,emp_NET00-08,emp_%00-08,emp_NET08-17,emp_%08-17
122,"Atlanta--Athens-Clarke County--Sandy Springs, GA",2413348,2551161,2844774,137813,0.057104,293613,0.11509
148,"Boston-Worcester-Providence, MA-RI-NH-CT",3834847,3824178,4121641,-10669,-0.002782,297463,0.077785
176,"Chicago-Naperville, IL-IN-WI",4534271,4478885,4631371,-55386,-0.012215,152486,0.034046
206,"Dallas-Fort Worth, TX-OK",2844521,3071476,3586036,226955,0.079787,514560,0.167529
216,"Denver-Aurora, CO",1426753,1451976,1692477,25223,0.017679,240501,0.165637
220,"Detroit-Warren-Ann Arbor, MI",2531382,2198794,2312726,-332588,-0.131386,113932,0.051816
288,"Houston-The Woodlands, TX",2271793,2617373,2969515,345580,0.152118,352142,0.13454
348,"Los Angeles-Long Beach, CA",6793720,7225774,7741633,432054,0.063596,515859,0.071392
370,"Miami-Fort Lauderdale-Port St. Lucie, FL",2254815,2460406,2720909,205591,0.091179,260503,0.105878
378,"Minneapolis-St. Paul, MN-WI",1878800,1916797,2063304,37997,0.020224,146507,0.076433
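
The CSA-level change is recomputed from the summed county totals rather than aggregated from the county-level `emp_%` columns. The sketch below, using made-up numbers, shows why: averaging county percentages gives small counties the same weight as large ones.

```python
import pandas as pd

# Two illustrative counties in one CSA: a large one growing, a small one shrinking.
demo = pd.DataFrame({
    "CSA": [1, 1],
    "Total_emp00": [1000, 100],
    "Total_emp08": [1100, 50],
})
demo["pct"] = (demo["Total_emp08"] - demo["Total_emp00"]) / demo["Total_emp00"]

# Approach used in the notebook: sum first, then compute the change.
totals = demo.groupby("CSA")[["Total_emp00", "Total_emp08"]].sum()
from_totals = (totals["Total_emp08"] - totals["Total_emp00"]) / totals["Total_emp00"]

# Misleading alternative: average the county-level changes.
mean_of_pcts = demo.groupby("CSA")["pct"].mean()

print(from_totals.loc[1], mean_of_pcts.loc[1])
```

Here the CSA gained 50 jobs on a base of 1,100, yet the unweighted mean of the county changes is negative.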


In [20]:
JobsCSA.to_csv('exports/Jobs00-08-17CSA_NEW.csv')

## 2010 -- _not used in this analysis_

In [21]:
mypath2 = '/Users/danachermesh/Desktop/rp-USmetros_comparison/data/2010.annual.by_area/'# change to your path
filesList2 = os.listdir(mypath2)

filesList2

['2010.annual 37059 Davie County, North Carolina.csv',
 '2010.annual 49045 Tooele County, Utah.csv',
 '2010.annual 17045 Edgar County, Illinois.csv',
 '2010.annual 45085 Sumter County, South Carolina.csv',
 '2010.annual 55073 Marathon County, Wisconsin.csv',
 '2010.annual 21085 Grayson County, Kentucky.csv',
 '2010.annual 26055 Grand Traverse County, Michigan.csv',
 '2010.annual 48327 Menard County, Texas.csv',
 '2010.annual 28021 Claiborne County, Mississippi.csv',
 '2010.annual C4450 Stephenville, TX MicroSA.csv',
 '2010.annual C4682 Vermillion, SD MicroSA.csv',
 '2010.annual 13115 Floyd County, Georgia.csv',
 '2010.annual 72131 San Sebastian Municipio, Puerto Rico.csv',
 '2010.annual 48121 Denton County, Texas.csv',
 '2010.annual 39999 Unknown Or Undefined, Ohio.csv',
 '2010.annual 29187 St. Francois County, Missouri.csv',
 '2010.annual C4918 Winston-Salem, NC MSA.csv',
 '2010.annual 10003 New Castle County, Delaware.csv',
 '2010.annual 27051 Grant County, Minnesota.csv',
 '2010.ann

In [22]:
# creating a list of the selected files for data to be obtained from
# these are only the counties that we are interested in

Co_files10 = []

for filename in filesList2:
    for i in STCO:
        if i in filename:
            Co_files10.append(filename)
            
len(Co_files10)

274

In [23]:
JobsCO10 = {}

for i in Co_files10:
    row = csvRows(mypath2 + i)
    # append the data to the empty dict; STCO as key and total_emp as value 
    JobsCO10[row[0]] = row[1]


# making it a DataFrame
JobsCO10 = pd.DataFrame.from_dict(JobsCO10, orient='index', columns=['Total_emp10'])
JobsCO10 = JobsCO10.sort_index()

print(JobsCO10.shape)
print(type(JobsCO10))
print(JobsCO10.dtypes)
JobsCO10.head()

(274, 1)
<class 'pandas.core.frame.DataFrame'>
Total_emp10    object
dtype: object


Unnamed: 0,Total_emp10
6001,630343
6013,313615
6037,3856789
6041,102062
6055,64073


### Merge 2010 with 2017 data

In [24]:
JobsCO10 = JobsCO10.merge(JobsCO17, left_index=True,
                                 right_index=True)

JobsCO10['Total_emp17'] = JobsCO10['Total_emp17'].fillna(0).astype(int)
JobsCO10['Total_emp10'] = JobsCO10['Total_emp10'].fillna(0).astype(int)


print(JobsCO10.dtypes)
print(JobsCO10.shape)
JobsCO10.head()

Total_emp10    int64
Total_emp17    int64
dtype: object
(274, 2)


Unnamed: 0,Total_emp10,Total_emp17
6001,630343,771652
6013,313615,366720
6037,3856789,4381836
6041,102062,115432
6055,64073,76762


### Merging with geo-coded data in order to group by CSA

In [25]:
JobsCO_CSA10 = JobsCO10.merge(geo, left_index=True, 
                              right_on='STCO').set_index('County_name')

print(JobsCO_CSA10.shape)
JobsCO_CSA10.tail(2)

(274, 5)


County_name,Total_emp10,Total_emp17,CSA,CSA_name,STCO
Pierce,9508,10026,378,"Minneapolis-St. Paul, MN-WI",55093
St. Croix,27950,33521,378,"Minneapolis-St. Paul, MN-WI",55109


In [26]:
JobsCSA10_17 = JobsCO_CSA10.groupby(['CSA', 'CSA_name']).sum()

# columns are labelled 10-17 because the change is measured from 2010 to 2017
JobsCSA10_17['emp_NET10-17'] = JobsCSA10_17['Total_emp17'] - JobsCSA10_17['Total_emp10']
JobsCSA10_17['emp_%10-17'] = (JobsCSA10_17['Total_emp17'] - JobsCSA10_17['Total_emp10']) \
                                                          / JobsCSA10_17['Total_emp10']

print(JobsCSA10_17.shape)
JobsCSA10_17

(15, 4)


CSA,CSA_name,Total_emp10,Total_emp17,emp_NET10-17,emp_%10-17
122,"Atlanta--Athens-Clarke County--Sandy Springs, GA",2372462,2844774,472312,0.199081
148,"Boston-Worcester-Providence, MA-RI-NH-CT",3691234,4121641,430407,0.116602
176,"Chicago-Naperville, IL-IN-WI",4192303,4631371,439068,0.104732
206,"Dallas-Fort Worth, TX-OK",2944870,3586036,641166,0.217723
216,"Denver-Aurora, CO",1405857,1730344,324487,0.230811
220,"Detroit-Warren-Ann Arbor, MI",2022604,2312726,290122,0.14344
288,"Houston-The Woodlands, TX",2541934,2969515,427581,0.168211
348,"Los Angeles-Long Beach, CA",6655146,7741633,1086487,0.163255
370,"Miami-Fort Lauderdale-Port St. Lucie, FL",2286983,2720909,433926,0.189737
378,"Minneapolis-St. Paul, MN-WI",1823091,2063304,240213,0.131761


In [27]:
JobsCSA10_17.to_csv('exports/Jobs10-17CSA_NEW.csv')

--- 
# Housing / Jobs Balance analysis


## _The full Housing / Jobs Balance analysis can be found in a separate notebook in this repo: 6-Pop-Hou-Jobs-balance_

- PEP/2017/housing
- BLS QCEW 2017 annual averages

# 1. Counties - for mapping

## Total Housing from PEP housing 2017

- Details on the Population and Housing Estimates APIs: https://www.census.gov/data/developers/data-sets/popest-popproj/popest.2000-2010_Intercensals.html

- [examples for /data/2017/pep/housing](https://api.census.gov/data/2017/pep/housing/examples.html)

In [33]:
# HU2017 data for all counties in the US
totalHU17 = pd.read_json('https://api.census.gov/data/2017/pep/housing?get='+
                         'HUEST,GEONAME&for=county:*&DATE=10')

totalHU17.columns = totalHU17.iloc[0]
totalHU17 = totalHU17[1:]

totalHU17['state'] = totalHU17['state'].apply(lambda x: '{0:0>2}'.format(x))
totalHU17['county'] = totalHU17['county'].apply(lambda x: '{0:0>3}'.format(x))
totalHU17['STCO'] = totalHU17[['state', 'county']].apply(lambda x: ''.join(x), axis=1)

totalHU17 = totalHU17.drop(['state', 'county','DATE'], axis=1)
totalHU17.columns = ['TotalHousing17','Name', 'STCO']

print(totalHU17.shape)
totalHU17.head(20)

(3142, 3)


Unnamed: 0,TotalHousing17,Name,STCO
1,372981,"Fairfield County, Connecticut",9001
2,379719,"Hartford County, Connecticut",9003
3,88285,"Litchfield County, Connecticut",9005
4,76339,"Middlesex County, Connecticut",9007
5,367195,"New Haven County, Connecticut",9009
6,123398,"New London County, Connecticut",9011
7,59729,"Tolland County, Connecticut",9013
8,49742,"Windham County, Connecticut",9015
9,49825,"Androscoggin County, Maine",23001
10,39911,"Aroostook County, Maine",23003
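
The Census API returns a JSON list of lists whose first element is the header row, which is why the cell above promotes row 0 to column names and then builds `STCO` from the state and county codes. A minimal offline sketch of that transformation follows. The helper name `census_rows_to_frame` is hypothetical, and the sample payload reuses the Fairfield County value from the table above, with the column order assumed from the renaming in that cell.

```python
import pandas as pd

def census_rows_to_frame(rows):
    """Turn a header-first list of lists into a DataFrame with a 5-digit STCO key."""
    frame = pd.DataFrame(rows[1:], columns=rows[0])
    # State codes are 2 digits and county codes 3; pad defensively before joining.
    frame["STCO"] = frame["state"].str.zfill(2) + frame["county"].str.zfill(3)
    return frame.drop(columns=["state", "county"])

# Sample payload in the shape the API is assumed to return for this query.
sample = [
    ["HUEST", "GEONAME", "DATE", "state", "county"],
    ["372981", "Fairfield County, Connecticut", "10", "09", "001"],
]
hu = census_rows_to_frame(sample)
hu["HUEST"] = hu["HUEST"].astype(int)  # values arrive as strings
print(hu)
```

Building the frame from the parsed rows keeps the codes as strings, so leading zeros survive.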


## Total Housing 2010 from PEP housing 2000-2010 -- not in use

This dataset is called "int_housingunits 2000" and includes data from both the 2000 and 2010 Decennial Censuses.

- API Call: api.census.gov/data/2000/pep/int_housingunits
- [examples](https://api.census.gov/data/2000/pep/int_housingunits.html)
- [variables](https://api.census.gov/data/2000/pep/int_housingunits/variables.html) in /data/2000/pep/int_housingunits/variables

In [34]:
# HU2010 data for all counties in the US, from PEP housing 2000-2010
totalHU10 = pd.read_json('https://api.census.gov/data/2000/pep/int_housingunits?get='+
                         'HUEST,GEONAME&for=county:*&DATE=10')

totalHU10.columns = totalHU10.iloc[0]
totalHU10 = totalHU10[1:]

totalHU10['state'] = totalHU10['state'].apply(lambda x: '{0:0>2}'.format(x))
totalHU10['county'] = totalHU10['county'].apply(lambda x: '{0:0>3}'.format(x))
totalHU10['STCO'] = totalHU10[['state', 'county']].apply(lambda x: ''.join(x), axis=1)

totalHU10 = totalHU10.drop(['state', 'county', 'DATE'], axis=1)
totalHU10.columns = ['TotalHousing10','Name', 'STCO']

print(totalHU10.shape)
totalHU10.head()

(3143, 3)


Unnamed: 0,TotalHousing10,Name,STCO
1,358484,"Fairfield County, Connecticut",9001
2,372325,"Hartford County, Connecticut",9003
3,86754,"Litchfield County, Connecticut",9005
4,74172,"Middlesex County, Connecticut",9007
5,359866,"New Haven County, Connecticut",9009


## Total Pop + Total Housing 2010 from US Census Bureau Decennial Census 2010

In [35]:
# total HU for all counties in the US, 2010
totalHU10_sf = pd.read_json('https://api.census.gov/data/2010/sf1?get='+
                            'P0010001,H00010001,NAME&for=county:*')
totalHU10_sf.columns = totalHU10_sf.iloc[0]
totalHU10_sf = totalHU10_sf[1:]

totalHU10_sf['state'] = totalHU10_sf['state'].apply(lambda x: '{0:0>2}'.format(x))
totalHU10_sf['county'] = totalHU10_sf['county'].apply(lambda x: '{0:0>3}'.format(x))
totalHU10_sf['STCO'] = totalHU10_sf[['state', 'county']].apply(lambda x: ''.join(x), axis=1)

totalHU10_sf = totalHU10_sf.drop(['state', 'county'], axis=1)
# these are 2010 Decennial Census values, so the columns are labelled 10
totalHU10_sf.columns = ['TotalPop10','TotalHousing10','Name', 'STCO']

print(totalHU10_sf.shape)
totalHU10_sf.head()

(3221, 4)


Unnamed: 0,TotalPop10,TotalHousing10,Name,STCO
1,54571,22135,Autauga County,1001
2,182265,104061,Baldwin County,1003
3,27457,11829,Barbour County,1005
4,22915,8981,Bibb County,1007
5,57322,23887,Blount County,1009


## Total Pop + Total Housing 2000 from US Census Bureau Decennial Census 2000

In [36]:
# total HU for all counties in the US, 2000
totalHU00 = pd.read_json('https://api.census.gov/data/2000/sf1?get='+
                         'P001001,H001001,NAME&for=county:*')
totalHU00.columns = totalHU00.iloc[0]
totalHU00 = totalHU00[1:]

totalHU00['state'] = totalHU00['state'].apply(lambda x: '{0:0>2}'.format(x))
totalHU00['county'] = totalHU00['county'].apply(lambda x: '{0:0>3}'.format(x))
totalHU00['STCO'] = totalHU00[['state', 'county']].apply(lambda x: ''.join(x), axis=1)

totalHU00 = totalHU00.drop(['state', 'county'], axis=1)
totalHU00.columns = ['TotalPop00','TotalHousing00','Name', 'STCO']

print(totalHU00.shape)
totalHU00.head()

(3141, 4)


Unnamed: 0,TotalPop00,TotalHousing00,Name,STCO
1,43671,17662,Autauga County,1001
2,140415,74285,Baldwin County,1003
3,29038,12461,Barbour County,1005
4,20826,8345,Bibb County,1007
5,51024,21158,Blount County,1009


------

## CSA's

In [37]:
HouJobs = pd.read_excel('BPS_HousingPermits_analysis.xlsx', 
                        sheet_name='HousingJobs_Balance')[:-2].set_index('Name')

HouJobs['housing / jobs 2010'] = HouJobs['housing / jobs 2010'].round(decimals=2)
HouJobs['housing / jobs 2016'] = HouJobs['housing / jobs 2016'].round(decimals=2)
HouJobs['housing / jobs 10-16 NET'] = HouJobs['housing / jobs 10-16 NET'].round(decimals=2)

HouJobs['CSA'] = HouJobs['CSA'].astype(int)

print(HouJobs.shape)
HouJobs

(15, 5)


Name,CSA,FullName,housing / jobs 2010,housing / jobs 2016,housing / jobs 10-16 NET
New York,408,"New York-Newark, NY-NJ-CT-PA",0.92,0.84,-0.07
Los Angeles,348,"Los Angeles-Long Beach, CA",0.68,0.6,-0.08
Chicago,176,"Chicago-Naperville, IL-IN-WI",0.42,0.39,-0.03
Washington,548,"Washington-Baltimore-Arlington, DC-MD-VA-WV-PA",0.84,0.8,-0.04
San Francisco,488,"San Jose-San Francisco-Oakland, CA",0.92,0.77,-0.14
Boston,148,"Boston-Worcester-Providence, MA-RI-NH-CT",0.92,0.84,-0.08
Dallas,206,"Dallas-Fort Worth, TX-OK",0.92,0.81,-0.11
Philadelphia,428,"Philadelphia-Reading-Camden, PA-NJ-DE-MD",0.97,0.93,-0.04
Houston,288,"Houston-The Woodlands, TX",0.94,0.86,-0.08
Miami,370,"Miami-Fort Lauderdale-Port St. Lucie, FL",1.21,1.05,-0.16
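
The ratios above are read from a spreadsheet, so the formula behind them is not shown in this notebook. As a sketch only, assuming the ratio is total housing units divided by total jobs, the three columns could be derived as below. All input numbers are made up for illustration.

```python
def housing_jobs_ratio(housing_units, jobs):
    """Housing units per job (assumed definition; not confirmed by the spreadsheet)."""
    return housing_units / jobs

# Made-up totals for one metro in two years.
ratio_start = housing_jobs_ratio(920, 1000)
ratio_end = housing_jobs_ratio(924, 1100)

# Compute the net change from the unrounded ratios, then round for display.
net = ratio_end - ratio_start
print(round(ratio_start, 2), round(ratio_end, 2), round(net, 2))
```

Because each column is rounded separately, the NET column can differ by 0.01 from the difference of the two rounded ratios, as in the New York row (0.92, 0.84, -0.07).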
