07-20-2018

**_Author: Dana Chermesh, Regional Planning intern_**


### US Metros comparison 
comparison by the county level of 15 regions (CSA's) accross the country

----

### _Notebook no.2_
# Labor force data
### - _ACS 5-yesr estimates 2012-2016 using Census API_
### - _ACS 5-yesr estimates 2006-2010 using Census API_

----

A user guide for Census Data API:

# [Census Data API User Guide](https://www.census.gov/content/dam/Census/data/developers/api-user-guide/api-guide.pdf)

The Census Data API in an API that gives the public access to raw statistical data from various Census Bureau data
programs. In terms of space, we aggregate the data and usually associate them with a
certain Census geographic boundary/area defined by a FIPS code. 

## _get your API key from:_ 
https://api.census.gov/data/key_signup.html

**Recommended:** In order to keep your API key confidential, please save your API key in a .py file named **censusAPI.py** as follows:

```python
myAPI = 'XXXXXXXXXXXXXXX'
```
Then read into this notebook as in the following cell:
```python
from censusAPI import myAPI
```

### The complete list of all available datasets for the API is located here:
https://api.census.gov/data.html

----
## Labor Force 2016
### _data were obtained from the ACS 2012-2016 5-year estimate, all counties in the US_

[list of variables](https://api.census.gov/data/2016/acs/acs5/variables)

variables to be acquired:
- **B23025_001E** |	All pop at age 16 and over
- **B23025_007E** | All pop at age 16 and over, not in labor force

for prime age (25-54):
- **B23001_025E** | Male 25 to 29 in labor force
- **B23001_032E** | Male 30 to 34 in labor force
- **B23001_039E** | Male 35 to 44 in labor force
- **B23001_046E** | Male 45 to 54 in labor force
- **B23001_111E** | Female 25 to 29 in labor force
- **B23001_118E** | Female 30 to 34 in labor force
- **B23001_125E** | Female 35 to 44 in labor force
- **B23001_132E** | Female 45 to 54 in labor force

for over 65:
- **B23001_074E** | Male 65 to 69 in labor force
- **B23001_079E** | Male 70 to 74 in labor force
- **B23001_084E** | Male 75 and over in labor force
- **B23001_160E** | Female 65 to 69 in labor force
- **B23001_165E** | Female 70 to 74 in labor force
- **B23001_170E** | Female 75 and over in labor force

In [1]:
import pandas as pd
import json
# reading in my api key saved in censusAPI.py as
# myAPI = 'XXXXXXXXXXXXXXX'
# request an api key in: https://api.census.gov/data/key_signup.html
from censusAPI import myAPI

In [2]:
import json
import requests 
import urllib
import numpy as np

#read in in the variables available. the info you need is in the 1year ACS data
url = "https://api.census.gov/data/2016/acs/acs5/variables.json"
resp = requests.request('GET', url)
aff1y = json.loads(resp.text)

In [3]:
#turning things into arrays to enable broadcasting
#Python3
affkeys = np.array(list(aff1y['variables'].keys()))

affkeys

array(['B24022_009E', 'B20005G_008E', 'B11002D_007E', ...,
       'B07202PR_002E', 'B25129_060E', 'B05003F_004E'], dtype='<U14')

In [29]:
print(aff1y['variables']['B23025_001E'])
print(aff1y['variables']['B23025_007E'])

{'limit': 0, 'label': 'Estimate!!Total', 'attributes': 'B23025_001M,B23025_001MA,B23025_001EA', 'group': 'B23025', 'predicateType': 'int', 'concept': 'EMPLOYMENT STATUS FOR THE POPULATION 16 YEARS AND OVER'}
{'limit': 0, 'label': 'Estimate!!Total!!Not in labor force', 'attributes': 'B23025_007M,B23025_007MA,B23025_007EA', 'group': 'B23025', 'predicateType': 'int', 'concept': 'EMPLOYMENT STATUS FOR THE POPULATION 16 YEARS AND OVER'}


In [32]:
Labor16vars = ['B23025_001E', 'B23025_007E', 'B23001_025E', 'B23001_032E',
               'B23001_039E', 'B23001_046E', 'B23001_111E', 'B23001_118E',
               'B23001_125E', 'B23001_132E', 'B23001_074E', 'B23001_079E',
               'B23001_084E', 'B23001_160E', 'B23001_165E', 'B23001_170E']

Labor16str = ",".join(Labor16vars)
Labor16str

'B23025_001E,B23025_007E,B23001_025E,B23001_032E,B23001_039E,B23001_046E,B23001_111E,B23001_118E,B23001_125E,B23001_132E,B23001_074E,B23001_079E,B23001_084E,B23001_160E,B23001_165E,B23001_170E'

In [33]:
# Labor Force data for all counties in the US, 2016
Labor16 = pd.read_json('https://api.census.gov/data/2016/acs/acs5?get='+
                          Labor16str +
                         ',NAME&for=county:*&in=state:*')
Labor16.columns = Labor16.iloc[0]
Labor16 = Labor16[1:]

Labor16['state'] = Labor16['state'].apply(lambda x: '{0:0>2}'.format(x))
Labor16['county'] = Labor16['county'].apply(lambda x: '{0:0>3}'.format(x))
Labor16['STCO'] = Labor16[['state', 'county']].apply(lambda x: ''.join(x), axis=1)

# converting dtypes to int
for col in Labor16vars:
    Labor16[col] = Labor16[col].astype(int)

# aggregating age groups and calculations
Labor16['LaborForce'] = Labor16['B23025_001E'] - Labor16['B23025_007E']
Labor16['PrimeAge'] = Labor16['B23001_025E'] + Labor16['B23001_032E'] +\
                      Labor16['B23001_039E'] + Labor16['B23001_046E'] +\
                      Labor16['B23001_111E'] + Labor16['B23001_118E'] +\
                      Labor16['B23001_125E'] + Labor16['B23001_132E']
Labor16['Over65'] = Labor16['B23001_074E'] + Labor16['B23001_079E'] +\
                    Labor16['B23001_084E'] + Labor16['B23001_160E'] +\
                    Labor16['B23001_165E'] + Labor16['B23001_170E']

Labor16 = Labor16.drop(['state', 'county', 'B23025_001E', 'B23025_007E',
                        'B23001_025E', 'B23001_032E', 'B23001_039E',
                        'B23001_046E', 'B23001_111E', 'B23001_118E', 
                        'B23001_125E', 'B23001_132E', 'B23001_074E',
                        'B23001_079E', 'B23001_084E', 'B23001_160E',
                        'B23001_165E', 'B23001_170E'], axis=1)
Labor16.columns = ['Name', 'STCO', 'LaborForce', 'PrimeAge', 'Over65']

print(Labor16.shape)
print(Labor16.dtypes)
Labor16.head()

(3220, 5)
Name          object
STCO          object
LaborForce     int64
PrimeAge       int64
Over65         int64
dtype: object


Unnamed: 0,Name,STCO,LaborForce,PrimeAge,Over65
1,"Autauga County, Alabama",1001,26008,17356,978
2,"Baldwin County, Alabama",1003,93872,59889,5505
3,"Barbour County, Alabama",1005,10316,6397,814
4,"Bibb County, Alabama",1007,9002,6019,600
5,"Blount County, Alabama",1009,22969,15351,1129


----
## Labor Force 2000
### _data were obtained from US Census Bureau Decennial 2000 Census, SF3_

[list of variables](https://api.census.gov/data/2000/sf3/variables.html)

variables to be acquired:
- **P043003** | Male In labor force
- **P043010** | Female In labor force

for prime age (25-54):
- **PCT035025** | Male 25 to 29 in labor force
- **PCT035032** | Male 30 to 34 in labor force
- **PCT035039** | Male 35 to 44 in labor force
- **PCT035046** | Male 45 to 54 in labor force
- **PCT035117** | Female 25 to 29 in labor force
- **PCT035124** | Female 30 to 34 in labor force
- **PCT035131** | Female 35 to 44 in labor force
- **PCT035138** | Female 45 to 54 in labor force

for over 65:
- **PCT035074** | Male 65 to 69 in labor force
- **PCT035081** | Male 70 to 74 in labor force
- **PCT035088** | Male 75 and over in labor force
- **PCT035166** | Female 65 to 69 in labor force
- **PCT035173** | Female 70 to 74 in labor force
- **PCT035180** | Female 75 and over in labor force

In [37]:
Labor00vars = ['P043003', 'P043010', 'PCT035025', 'PCT035032',
               'PCT035039', 'PCT035046', 'PCT035117', 'PCT035124',
               'PCT035131', 'PCT035138','PCT035074', 'PCT035081',
               'PCT035088', 'PCT035166', 'PCT035173', 'PCT035180']

Labor00str = ",".join(Labor00vars)
Labor00str

'P043003,P043010,PCT035025,PCT035032,PCT035039,PCT035046,PCT035117,PCT035124,PCT035131,PCT035138,PCT035074,PCT035081,PCT035088,PCT035166,PCT035173,PCT035180'

In [39]:
# Labor Force data for all counties in the US, 2000
Labor00 = pd.read_json('https://api.census.gov/data/2000/sf3?get='+
                        Labor00str +
                        ',NAME&for=county:*&in=state:*')
Labor00.columns = Labor00.iloc[0]
Labor00 = Labor00[1:]

Labor00['state'] = Labor00['state'].apply(lambda x: '{0:0>2}'.format(x))
Labor00['county'] = Labor00['county'].apply(lambda x: '{0:0>3}'.format(x))
Labor00['STCO'] = Labor00[['state', 'county']].apply(lambda x: ''.join(x), axis=1)

# converting dtypes to int
for col in Labor00vars:
    Labor00[col] = Labor00[col].astype(int)

Labor00['LaborForce00'] = Labor00['P043003'] + Labor00['P043010']
Labor00['PrimeAge00'] = Labor00['PCT035025'] + Labor00['PCT035032'] +\
                      Labor00['PCT035039'] + Labor00['PCT035046'] +\
                      Labor00['PCT035117'] + Labor00['PCT035124'] +\
                      Labor00['PCT035131'] + Labor00['PCT035138']
Labor00['Over65_00'] = Labor00['PCT035074'] + Labor00['PCT035081'] +\
                       Labor00['PCT035088'] + Labor00['PCT035166'] +\
                       Labor00['PCT035173'] + Labor00['PCT035180']

Labor00 = Labor00.drop(['state', 'county', 'P043003', 'P043010',
                        'PCT035025', 'PCT035032', 'PCT035039',
                        'PCT035046', 'PCT035117', 'PCT035124', 
                        'PCT035131', 'PCT035138', 'PCT035074',
                        'PCT035081', 'PCT035088', 'PCT035166',
                        'PCT035173', 'PCT035180'], axis=1)
Labor00.columns = ['Name', 'STCO', 'LaborForce00', 'PrimeAge00', 'Over65_00']

print(Labor00.shape)
print(Labor00.dtypes)
Labor00.head()

(3219, 5)
Name            object
STCO            object
LaborForce00     int64
PrimeAge00       int64
Over65_00        int64
dtype: object


Unnamed: 0,Name,STCO,LaborForce00,PrimeAge00,Over65_00
1,Autauga County,1001,21167,15026,588
2,Baldwin County,1003,65960,46800,2499
3,Barbour County,1005,10826,7656,473
4,Bibb County,1007,8521,6144,204
5,Blount County,1009,23896,16867,832


## Merging 2000 and 2016 data

In [71]:
Labor = Labor00.merge(Labor16, on='STCO')

Labor = Labor.drop(['Name_x'], axis=1)
Labor = Labor.append(Labor.sum(numeric_only=True), ignore_index=True)
Labor = Labor.set_index('Name_y').fillna(0).astype(int)

Labor['over65_%_16'] = Labor['Over65'] / Labor['LaborForce']
Labor['over65_%_00'] = Labor['Over65_00'] / Labor['LaborForce00']

Labor['NET_LaborTotal'] = Labor['LaborForce'] - Labor['LaborForce00']
Labor['%_LaborTotal'] = (Labor['LaborForce'] - Labor['LaborForce00']) \
                         / Labor['LaborForce00']
Labor['NET_PrimeAge'] = Labor['PrimeAge'] - Labor['PrimeAge00']
Labor['%_PrimeAge'] = (Labor['PrimeAge'] - Labor['PrimeAge00']) \
                      / Labor['PrimeAge00']
Labor['NET_Over65'] = Labor['LaborForce'] - Labor['LaborForce00']
Labor['%_Over65'] = (Labor['Over65'] - Labor['Over65_00']) \
                     / Labor['Over65_00']
Labor['over65_%change'] = Labor['over65_%_16'] - Labor['over65_%_00']

Labor.index = Labor.index.fillna('UStotal')

print(Labor.shape)
Labor.tail()

(3213, 16)


Unnamed: 0_level_0,STCO,LaborForce00,PrimeAge00,Over65_00,LaborForce,PrimeAge,Over65,over65_%_16,over65_%_00,NET_LaborTotal,%_LaborTotal,NET_PrimeAge,%_PrimeAge,NET_Over65,%_Over65,over65_%change
Name_y,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1
"Vieques Municipio, Puerto Rico",72147,2395,1657,33,3342,2096,161,0.048175,0.013779,947,0.395407,439,0.264937,947,3.878788,0.034396
"Villalba Municipio, Puerto Rico",72149,7463,5547,87,9278,6407,131,0.014119,0.011658,1815,0.2432,860,0.155039,1815,0.505747,0.002462
"Yabucoa Municipio, Puerto Rico",72151,9498,6812,52,10802,7863,109,0.010091,0.005475,1304,0.137292,1051,0.154287,1304,1.096154,0.004616
"Yauco Municipio, Puerto Rico",72153,13622,10020,163,11931,8941,333,0.02791,0.011966,-1691,-0.124137,-1079,-0.107685,-1691,1.042945,0.015945
UStotal,0,139958254,98706665,4663882,162051329,104821609,8002699,0.049384,0.033323,22093075,0.157855,6114944,0.061951,22093075,0.715888,0.01606


In [68]:
Labor.dtypes

STCO                int64
LaborForce00        int64
PrimeAge00          int64
Over65_00           int64
LaborForce          int64
PrimeAge            int64
Over65              int64
over65_%_16       float64
over65_%_00       float64
NET_LaborTotal      int64
%_LaborTotal      float64
NET_PrimeAge        int64
%_PrimeAge        float64
NET_Over65          int64
%_Over65          float64
over65_%change    float64
dtype: object

In [74]:
Labor['STCO'] = Labor['STCO'].astype(str)
Labor['STCO'] = Labor['STCO'].apply(lambda x: '{0:0>5}'.format(x))

Labor.dtypes[:1]

STCO    object
dtype: object

##  Reading in geo-coded dataset
created on a different notebook, please refer to _**ADD NOTEBOOK NAME**_

In [75]:
geo = pd.read_csv('../Regional_USmetros_comparison/data/USmetros_full.csv').iloc[:,:-2] \
                                .drop(['Unnamed: 0', 'SHAPE_AREA'], axis=1)
geo['STCO'] = geo['STCO'].apply(lambda x: '{0:0>5}'.format(x))

print(geo.shape)
geo.head()

(270, 4)


Unnamed: 0,CSA,CSA_name,County_name,STCO
0,488,"San Jose-San Francisco-Oakland, CA",Alameda,6001
1,488,"San Jose-San Francisco-Oakland, CA",Contra Costa,6013
2,488,"San Jose-San Francisco-Oakland, CA",Marin,6041
3,488,"San Jose-San Francisco-Oakland, CA",Napa,6055
4,488,"San Jose-San Francisco-Oakland, CA",San Benito,6069


### Merging datasets

In [80]:
Labor_CO = Labor.merge(geo, on='STCO').set_index('County_name')

print(Labor_CO.shape)
Labor_CO.tail()

(269, 18)


Unnamed: 0_level_0,STCO,LaborForce00,PrimeAge00,Over65_00,LaborForce,PrimeAge,Over65,over65_%_16,over65_%_00,NET_LaborTotal,%_LaborTotal,NET_PrimeAge,%_PrimeAge,NET_Over65,%_Over65,over65_%change,CSA,CSA_name
County_name,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1
Hampshire,54027,9056,6431,350,9700,6402,440,0.045361,0.038648,644,0.071113,-29,-0.004509,644,0.257143,0.006712,548,"Washington-Baltimore-Arlington, DC-MD-VA-WV-PA"
Jefferson,54037,22669,15828,737,29404,18728,1738,0.059108,0.032511,6735,0.297102,2900,0.18322,6735,1.358209,0.026596,548,"Washington-Baltimore-Arlington, DC-MD-VA-WV-PA"
Kenosha,55059,77980,55476,1895,88879,58245,2995,0.033697,0.024301,10899,0.139767,2769,0.049913,10899,0.580475,0.009396,176,"Chicago-Naperville, IL-IN-WI"
Pierce,55093,22165,13868,621,23604,13421,1000,0.042366,0.028017,1439,0.064922,-447,-0.032232,1439,0.610306,0.014349,378,"Minneapolis-St. Paul, MN-WI"
St. Croix,55109,35867,26097,1024,48805,32472,1851,0.037926,0.02855,12938,0.360722,6375,0.244281,12938,0.807617,0.009377,378,"Minneapolis-St. Paul, MN-WI"


In [78]:
Labor_CO[Labor_CO['CSA']==408].shape

(31, 18)

### Exporting all counties Housing data to .csv

In [81]:
Labor_CO.to_csv('Labor_Counties.csv')

## Groupby CSAs to sum

In [82]:
Labor_CSA = Labor_CO.groupby(['CSA', 'CSA_name']).sum()

print(Labor_CSA.shape)
Labor_CSA

(15, 15)


Unnamed: 0_level_0,Unnamed: 1_level_0,LaborForce00,PrimeAge00,Over65_00,LaborForce,PrimeAge,Over65,over65_%_16,over65_%_00,NET_LaborTotal,%_LaborTotal,NET_PrimeAge,%_PrimeAge,NET_Over65,%_Over65,over65_%change
CSA,CSA_name,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1
122,"Atlanta--Athens-Clarke County--Sandy Springs, GA",2538045,1870205,58002,3220087,2211700,122914,1.731715,1.059575,682042,9.964193,341495,6.10638,682042,44.846761,0.67214
148,"Boston-Worcester-Providence, MA-RI-NH-CT",4013723,2870795,139818,4500566,2828692,253765,1.157128,0.672544,486843,1.835709,-42103,-1.123138,486843,16.70962,0.484584
176,"Chicago-Naperville, IL-IN-WI",2119798,1516128,59594,2471875,1584629,113805,0.876852,0.563683,352077,3.265963,68501,1.323389,352077,15.56239,0.313169
206,"Dallas-Fort Worth, TX-OK",2868730,2071787,77125,3821344,2583048,164325,1.140047,0.706488,952614,6.056987,511261,3.997507,952614,23.72363,0.433558
216,"Denver-Aurora, CO",1456374,1065179,37734,1819491,1218564,80426,0.583459,0.259448,363117,2.544742,153385,0.693678,363117,20.294435,0.324011
220,"Detroit-Warren-Ann Arbor, MI",2696073,1952779,74461,2637416,1697656,110282,0.413208,0.263205,-58657,0.009534,-255123,-1.154306,-58657,6.108782,0.150003
288,"Houston-The Woodlands, TX",2355212,1712443,59110,3378973,2287083,142711,0.78665,0.480016,1023761,4.16863,574640,2.383241,1023761,17.174346,0.306635
348,"Los Angeles-Long Beach, CA",5724165,4130260,173676,6818638,4629945,305515,0.093024,0.061126,1094473,0.365121,499685,0.208421,1094473,1.593505,0.031898
370,"Miami-Fort Lauderdale-Port St. Lucie, FL",2522648,1784755,125264,3317864,2203373,191049,0.471074,0.415385,795216,2.082441,418618,1.356908,795216,3.458144,0.055688
378,"Minneapolis-St. Paul, MN-WI",1882490,1349682,50862,2166725,1414085,89754,0.916349,0.623497,284235,4.301386,64403,2.045584,284235,18.581366,0.292852


### Exporting CSA's Housing data to .csv

In [84]:
Labor_CSA.to_csv('exports/Labor_CSA.csv')