# _Updates to make -- from Carolyn's notes, 082018:_

- Baseline data for +65 workforce
- Overall % of aging workforce 
- Draw conclusions

----
Author: Dana Chermesh, Regional Planning intern;
07-20-2018

### _US Metros comparison  Notebook no.2_
# Labor force
- **_ACS 5-year estimates 2012-2016 using Census API_**
- **_ACS 5-year estimates 2006-2010 using Census API_**




----

A user guide for Census Data API:

# [Census Data API User Guide](https://www.census.gov/content/dam/Census/data/developers/api-user-guide/api-guide.pdf)

The Census Data API is an API that gives the public access to raw statistical data from various Census Bureau data
programs. Spatially, the data are aggregated and usually associated with a
Census geographic boundary/area identified by a FIPS code.

## _get your API key from:_ 
https://api.census.gov/data/key_signup.html

**Recommended:** In order to keep your API key confidential, please save your API key in a .py file named **censusAPI.py** as follows:

```python
myAPI = 'XXXXXXXXXXXXXXX'
```
Then read it into this notebook as in the following cell:
```python
from censusAPI import myAPI
```
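
The request URLs further down in this notebook are built by hand and do not pass the key. The Census Data API accepts it as a `key` query parameter, so a minimal sketch of a URL builder could look like the following. The helper name `build_census_url` and its arguments are hypothetical, not part of the notebook.

```python
from urllib.parse import urlencode

def build_census_url(dataset, variables, geo_for, geo_in=None, key=None):
    """Assemble a Census Data API query URL (hypothetical helper)."""
    params = {"get": ",".join(list(variables) + ["NAME"]), "for": geo_for}
    if geo_in:
        params["in"] = geo_in
    if key:
        params["key"] = key
    # keep ',', ':' and '*' unescaped, as in the hand-built URLs of this notebook
    return ("https://api.census.gov/data/" + dataset + "?"
            + urlencode(params, safe=",:*"))

# placeholder key, same as in censusAPI.py above
url = build_census_url("2016/acs/acs5", ["B23025_001E", "B23025_007E"],
                       "county:*", "state:*", key="XXXXXXXXXXXXXXX")
print(url)
```

With the real key this would be called as `build_census_url(..., key=myAPI)` and the result passed to `pd.read_json`.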

### The complete list of all available datasets for the API is located here:
https://api.census.gov/data.html

----
## Labor Force 2016
### _data were obtained from the ACS 2012-2016 5-year estimate, all counties in the US_

[list of variables](https://api.census.gov/data/2016/acs/acs5/variables)

variables to be acquired:
- **B23025_001E** |	All pop at age 16 and over
- **B23025_007E** | All pop at age 16 and over, not in labor force

for prime age (25-54):
- **B23001_025E** | Male 25 to 29 in labor force
- **B23001_032E** | Male 30 to 34 in labor force
- **B23001_039E** | Male 35 to 44 in labor force
- **B23001_046E** | Male 45 to 54 in labor force
- **B23001_111E** | Female 25 to 29 in labor force
- **B23001_118E** | Female 30 to 34 in labor force
- **B23001_125E** | Female 35 to 44 in labor force
- **B23001_132E** | Female 45 to 54 in labor force

for over 65:
- **B23001_074E** | Male 65 to 69 in labor force
- **B23001_079E** | Male 70 to 74 in labor force
- **B23001_084E** | Male 75 and over in labor force
- **B23001_160E** | Female 65 to 69 in labor force
- **B23001_165E** | Female 70 to 74 in labor force
- **B23001_170E** | Female 75 and over in labor force

for 55-64:
- **B23001_053E** | Male 55 to 59 in labor force
- **B23001_060E** | Male 60 to 61 in labor force
- **B23001_067E** | Male 62 to 64 in labor force
- **B23001_139E** | Female 55 to 59 in labor force
- **B23001_146E** | Female 60 to 61 in labor force
- **B23001_153E** | Female 62 to 64 in labor force
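
The three age groups listed above can also be written down once as a mapping and summed with a small helper, which avoids retyping the codes in every cell. This is only a sketch: the names `AGE_GROUPS_2016` and `sum_groups` are hypothetical, and the example counts are made up.

```python
# Variable codes copied from the lists above (table B23001, "in labor force" cells)
AGE_GROUPS_2016 = {
    "PrimeAge": ["B23001_025E", "B23001_032E", "B23001_039E", "B23001_046E",
                 "B23001_111E", "B23001_118E", "B23001_125E", "B23001_132E"],
    "55-64": ["B23001_053E", "B23001_060E", "B23001_067E",
              "B23001_139E", "B23001_146E", "B23001_153E"],
    "Over65": ["B23001_074E", "B23001_079E", "B23001_084E",
               "B23001_160E", "B23001_165E", "B23001_170E"],
}

def sum_groups(row, groups):
    """Sum the per-age, per-sex counts of one record into the named groups."""
    return {name: sum(int(row[code]) for code in codes)
            for name, codes in groups.items()}

# made-up counts (10 per cell), only to show the shape of the result
example_row = {code: 10 for codes in AGE_GROUPS_2016.values() for code in codes}
totals = sum_groups(example_row, AGE_GROUPS_2016)
print(totals)
```

In pandas, the same mapping could drive `Labor16[codes].sum(axis=1)` for each group instead of the long chained additions.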

In [2]:
import pandas as pd
import json
# reading in my api key saved in censusAPI.py as
# myAPI = 'XXXXXXXXXXXXXXX'
# request an api key in: https://api.census.gov/data/key_signup.html
from censusAPI import myAPI

In [3]:
import requests
import numpy as np

# read in the variables available in the ACS 2012-2016 5-year data
url = "https://api.census.gov/data/2016/acs/acs5/variables.json"
resp = requests.request('GET', url)
aff1y = json.loads(resp.text)

In [4]:
#turning things into arrays to enable broadcasting
#Python3
affkeys = np.array(list(aff1y['variables'].keys()))

affkeys

array(['B17022_029E', 'B07404GPR_004E', 'C23002E_002E', ...,
       'B20005E_037E', 'B24012_045E', 'B17022_049E'], dtype='<U14')

In [5]:
print(aff1y['variables']['B23025_001E'])
print(aff1y['variables']['B23025_007E'])

{'limit': 0, 'predicateType': 'int', 'label': 'Estimate!!Total', 'attributes': 'B23025_001M,B23025_001MA,B23025_001EA', 'concept': 'EMPLOYMENT STATUS FOR THE POPULATION 16 YEARS AND OVER', 'group': 'B23025'}
{'limit': 0, 'predicateType': 'int', 'label': 'Estimate!!Total!!Not in labor force', 'attributes': 'B23025_007M,B23025_007MA,B23025_007EA', 'concept': 'EMPLOYMENT STATUS FOR THE POPULATION 16 YEARS AND OVER', 'group': 'B23025'}
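
The same lookup can be run over a whole list of codes to check that each one carries the label you expect before requesting data. A minimal sketch, assuming the dictionary layout printed above; the helper name `describe` is hypothetical, and the sample holds only the two entries shown in the output.

```python
def describe(variables, codes):
    """Return {code: label} for the requested codes; unknown codes map to None."""
    return {code: variables.get(code, {}).get("label") for code in codes}

# Two entries copied from the output above; the full dict comes from variables.json
sample = {
    "B23025_001E": {"label": "Estimate!!Total", "group": "B23025"},
    "B23025_007E": {"label": "Estimate!!Total!!Not in labor force", "group": "B23025"},
}

labels = describe(sample, ["B23025_001E", "B23025_007E", "B23001_053E"])
for code, label in labels.items():
    print(code, "|", label)
```

In the notebook this would be called as `describe(aff1y['variables'], Labor16vars)`.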


In [6]:
Labor16vars = ['B23025_001E', 'B23025_007E', 'B23001_025E', 'B23001_032E',
               'B23001_039E', 'B23001_046E', 'B23001_053E', 'B23001_060E',
               'B23001_067E', 'B23001_139E', 'B23001_146E', 'B23001_111E',
               'B23001_153E', 'B23001_118E', 'B23001_125E', 'B23001_132E', 
               'B23001_074E', 'B23001_079E', 'B23001_084E', 'B23001_160E',
               'B23001_165E', 'B23001_170E']

Labor16str = ",".join(Labor16vars)
Labor16str

'B23025_001E,B23025_007E,B23001_025E,B23001_032E,B23001_039E,B23001_046E,B23001_053E,B23001_060E,B23001_067E,B23001_139E,B23001_146E,B23001_111E,B23001_153E,B23001_118E,B23001_125E,B23001_132E,B23001_074E,B23001_079E,B23001_084E,B23001_160E,B23001_165E,B23001_170E'

In [7]:
# Labor Force data for all counties in the US, 2016
Labor16 = pd.read_json('https://api.census.gov/data/2016/acs/acs5?get='+
                          Labor16str +
                         ',NAME&for=county:*&in=state:*')
Labor16.columns = Labor16.iloc[0]
Labor16 = Labor16[1:]

Labor16['state'] = Labor16['state'].apply(lambda x: '{0:0>2}'.format(x))
Labor16['county'] = Labor16['county'].apply(lambda x: '{0:0>3}'.format(x))
Labor16['STCO'] = Labor16[['state', 'county']].apply(lambda x: ''.join(x), axis=1)

# converting dtypes to int
for col in Labor16vars:
    Labor16[col] = Labor16[col].astype(int)

# aggregating age groups and calculations
Labor16['LaborForce'] = Labor16['B23025_001E'] - Labor16['B23025_007E']
Labor16['PrimeAge'] = Labor16['B23001_025E'] + Labor16['B23001_032E'] +\
                      Labor16['B23001_039E'] + Labor16['B23001_046E'] +\
                      Labor16['B23001_111E'] + Labor16['B23001_118E'] +\
                      Labor16['B23001_125E'] + Labor16['B23001_132E']
Labor16['Over65'] = Labor16['B23001_074E'] + Labor16['B23001_079E'] +\
                    Labor16['B23001_084E'] + Labor16['B23001_160E'] +\
                    Labor16['B23001_165E'] + Labor16['B23001_170E']
Labor16['55-64'] = Labor16['B23001_053E'] + Labor16['B23001_060E'] +\
                   Labor16['B23001_067E'] + Labor16['B23001_139E'] +\
                   Labor16['B23001_146E'] + Labor16['B23001_153E']
Labor16['Under25'] = Labor16['LaborForce'] - Labor16['PrimeAge'] -\
                     Labor16['Over65'] - Labor16['55-64']

Labor16 = Labor16.drop(['state', 'county', 'B23025_001E', 'B23025_007E',
                        'B23001_025E', 'B23001_032E', 'B23001_039E',
                        'B23001_046E', 'B23001_111E', 'B23001_118E', 
                        'B23001_125E', 'B23001_132E', 'B23001_074E',
                        'B23001_079E', 'B23001_084E', 'B23001_160E',
                        'B23001_139E', 'B23001_146E', 'B23001_153E',
                        'B23001_053E', 'B23001_060E', 'B23001_067E',
                        'B23001_165E', 'B23001_170E'], axis=1)
Labor16.columns = ['Name', 'STCO', 'LaborForce', 'PrimeAge', 
                   'Over65', '55-64', 'Under25']

print(Labor16.shape)
print(Labor16.dtypes)
Labor16.head()

(3220, 7)
Name          object
STCO          object
LaborForce     int64
PrimeAge       int64
Over65         int64
55-64          int64
Under25        int64
dtype: object


Unnamed: 0,Name,STCO,LaborForce,PrimeAge,Over65,55-64,Under25
1,"Autauga County, Alabama",1001,26008,17356,978,4086,3588
2,"Baldwin County, Alabama",1003,93872,59889,5505,16197,12281
3,"Barbour County, Alabama",1005,10316,6397,814,1720,1385
4,"Bibb County, Alabama",1007,9002,6019,600,1364,1019
5,"Blount County, Alabama",1009,22969,15351,1129,3829,2660
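
The cell above promotes row 0 to column names because the API returns a JSON array of rows whose first row is the header. The sketch below shows that shape with the standard library only. The payload is hand-written and its counts are made up; `rows_to_records` is a hypothetical helper.

```python
import json

def rows_to_records(payload):
    """Turn a header-first list of rows into a list of dicts."""
    header, *rows = payload
    return [dict(zip(header, row)) for row in rows]

# Hand-written payload in the API's shape; the two counts are made up
raw = json.loads(
    '[["B23025_001E","B23025_007E","NAME","state","county"],'
    ' ["100","40","Autauga County, Alabama","01","001"]]'
)

records = rows_to_records(raw)
first = records[0]
stco = first["state"] + first["county"]          # 5-digit county FIPS
labor_force = int(first["B23025_001E"]) - int(first["B23025_007E"])
print(stco, labor_force)
```

All values arrive as strings, which is why the notebook casts each variable column to `int` before doing arithmetic.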


----
## Labor Force 2000
### _data were obtained from US Census Bureau Decennial 2000 Census, SF3_

[list of variables](https://api.census.gov/data/2000/sf3/variables.html)

variables to be acquired:
- **P043003** | Male In labor force
- **P043010** | Female In labor force

for prime age (25-54):
- **PCT035025** | Male 25 to 29 in labor force
- **PCT035032** | Male 30 to 34 in labor force
- **PCT035039** | Male 35 to 44 in labor force
- **PCT035046** | Male 45 to 54 in labor force
- **PCT035117** | Female 25 to 29 in labor force
- **PCT035124** | Female 30 to 34 in labor force
- **PCT035131** | Female 35 to 44 in labor force
- **PCT035138** | Female 45 to 54 in labor force

for over 65:
- **PCT035074** | Male 65 to 69 in labor force
- **PCT035081** | Male 70 to 74 in labor force
- **PCT035088** | Male 75 and over in labor force
- **PCT035166** | Female 65 to 69 in labor force
- **PCT035173** | Female 70 to 74 in labor force
- **PCT035180** | Female 75 and over in labor force

for 55-64:
- **PCT035053** | Male 55 to 59 in labor force
- **PCT035060** | Male 60 to 61 in labor force
- **PCT035067** | Male 62 to 64 in labor force
- **PCT035145** | Female 55 to 59 in labor force
- **PCT035152** | Female 60 to 61 in labor force
- **PCT035159** | Female 62 to 64 in labor force

In [8]:
Labor00vars = ['P043003', 'P043010', 'PCT035025', 'PCT035032',
               'PCT035039', 'PCT035046', 'PCT035053', 'PCT035060',
               'PCT035067', 'PCT035145', 'PCT035152', 'PCT035159',
               'PCT035117', 'PCT035124', 'PCT035131', 'PCT035138',
               'PCT035074', 'PCT035081', 'PCT035088', 'PCT035166', 
               'PCT035173', 'PCT035180']

Labor00str = ",".join(Labor00vars)
Labor00str

'P043003,P043010,PCT035025,PCT035032,PCT035039,PCT035046,PCT035053,PCT035060,PCT035067,PCT035145,PCT035152,PCT035159,PCT035117,PCT035124,PCT035131,PCT035138,PCT035074,PCT035081,PCT035088,PCT035166,PCT035173,PCT035180'

In [9]:
# Labor Force data for all counties in the US, 2000
Labor00 = pd.read_json('https://api.census.gov/data/2000/sf3?get='+
                        Labor00str +
                        ',NAME&for=county:*&in=state:*')
Labor00.columns = Labor00.iloc[0]
Labor00 = Labor00[1:]

Labor00['state'] = Labor00['state'].apply(lambda x: '{0:0>2}'.format(x))
Labor00['county'] = Labor00['county'].apply(lambda x: '{0:0>3}'.format(x))
Labor00['STCO'] = Labor00[['state', 'county']].apply(lambda x: ''.join(x), axis=1)

# converting dtypes to int
for col in Labor00vars:
    Labor00[col] = Labor00[col].astype(int)

Labor00['LaborForce00'] = Labor00['P043003'] + Labor00['P043010']
Labor00['PrimeAge00'] = Labor00['PCT035025'] + Labor00['PCT035032'] +\
                      Labor00['PCT035039'] + Labor00['PCT035046'] +\
                      Labor00['PCT035117'] + Labor00['PCT035124'] +\
                      Labor00['PCT035131'] + Labor00['PCT035138']
Labor00['Over65_00'] = Labor00['PCT035074'] + Labor00['PCT035081'] +\
                       Labor00['PCT035088'] + Labor00['PCT035166'] +\
                       Labor00['PCT035173'] + Labor00['PCT035180']
Labor00['55-64_00'] = Labor00['PCT035053'] + Labor00['PCT035060'] +\
                      Labor00['PCT035067'] + Labor00['PCT035145'] +\
                      Labor00['PCT035152'] + Labor00['PCT035159']
Labor00['Under25_00'] = Labor00['LaborForce00'] - Labor00['PrimeAge00'] -\
                        Labor00['Over65_00'] - Labor00['55-64_00']

Labor00 = Labor00.drop(['state', 'county', 'P043003', 'P043010',
                        'PCT035025', 'PCT035032', 'PCT035039',
                        'PCT035046', 'PCT035117', 'PCT035124', 
                        'PCT035131', 'PCT035138', 'PCT035074',
                        'PCT035081', 'PCT035088', 'PCT035166',
                        'PCT035053', 'PCT035060', 'PCT035067',
                        'PCT035145', 'PCT035152', 'PCT035159',
                        'PCT035173', 'PCT035180'], axis=1)
Labor00.columns = ['Name', 'STCO', 'LaborForce00', 'PrimeAge00',
                   'Over65_00', '55-64_00', 'Under25_00']

print(Labor00.shape)
print(Labor00.dtypes)
Labor00.head()

(3219, 7)
Name            object
STCO            object
LaborForce00     int64
PrimeAge00       int64
Over65_00        int64
55-64_00         int64
Under25_00       int64
dtype: object


Unnamed: 0,Name,STCO,LaborForce00,PrimeAge00,Over65_00,55-64_00,Under25_00
1,Autauga County,1001,21167,15026,588,2262,3291
2,Baldwin County,1003,65960,46800,2499,7771,8890
3,Barbour County,1005,10826,7656,473,1130,1567
4,Bibb County,1007,8521,6144,204,917,1256
5,Blount County,1009,23896,16867,832,2869,3328


## Merging 2000 and 2016 data

In [10]:
Labor = Labor00.merge(Labor16, on='STCO')

Labor = Labor.drop(['Name_x'], axis=1)
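# DataFrame.append was removed in pandas 2.0; pd.concat adds the same totals row
Labor = pd.concat([Labor, Labor.sum(numeric_only=True).to_frame().T],
                  ignore_index=True)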
Labor = Labor.set_index('Name_y').fillna(0).astype(int)

Labor['over65_%_16'] = Labor['Over65'] / Labor['LaborForce']
Labor['over65_%_00'] = Labor['Over65_00'] / Labor['LaborForce00']

Labor['over65_%change'] = Labor['over65_%_16'] - Labor['over65_%_00']

Labor['NET_LaborTotal'] = Labor['LaborForce'] - Labor['LaborForce00']
Labor['%_LaborTotal'] = (Labor['LaborForce'] - Labor['LaborForce00']) \
                         / Labor['LaborForce00']
Labor['NET_PrimeAge'] = Labor['PrimeAge'] - Labor['PrimeAge00']
Labor['%_PrimeAge'] = (Labor['PrimeAge'] - Labor['PrimeAge00']) \
                      / Labor['PrimeAge00']
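# net change in the 65+ labor force (the saved outputs in this notebook were
# produced with LaborForce - LaborForce00 here, so they repeat NET_LaborTotal)
Labor['NET_Over65'] = Labor['Over65'] - Labor['Over65_00']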
Labor['%_Over65'] = (Labor['Over65'] - Labor['Over65_00']) \
                     / Labor['Over65_00']
Labor['NET_55-64'] = Labor['55-64'] - Labor['55-64_00']
Labor['%_55-64'] = (Labor['55-64'] - Labor['55-64_00']) \
                      / Labor['55-64_00']
Labor['NET_Under25'] = Labor['Under25'] - Labor['Under25_00']
Labor['%_Under25'] = (Labor['Under25'] - Labor['Under25_00']) \
                     / Labor['Under25_00']


Labor.index = Labor.index.fillna('UStotal')

print(Labor.shape)
Labor.tail()

(3213, 24)


Name_y,STCO,LaborForce00,PrimeAge00,Over65_00,55-64_00,Under25_00,LaborForce,PrimeAge,Over65,55-64,...,NET_LaborTotal,%_LaborTotal,NET_PrimeAge,%_PrimeAge,NET_Over65,%_Over65,NET_55-64,%_55-64,NET_Under25,%_Under25
"Vieques Municipio, Puerto Rico",72147,2395,1657,33,343,362,3342,2096,161,558,...,947,0.395407,439,0.264937,947,3.878788,215,0.626822,165,0.455801
"Villalba Municipio, Puerto Rico",72149,7463,5547,87,355,1474,9278,6407,131,1077,...,1815,0.2432,860,0.155039,1815,0.505747,722,2.033803,189,0.128223
"Yabucoa Municipio, Puerto Rico",72151,9498,6812,52,477,2157,10802,7863,109,932,...,1304,0.137292,1051,0.154287,1304,1.096154,455,0.953878,-259,-0.120074
"Yauco Municipio, Puerto Rico",72153,13622,10020,163,1076,2363,11931,8941,333,1492,...,-1691,-0.124137,-1079,-0.107685,-1691,1.042945,416,0.386617,-1198,-0.506983
UStotal,0,139958254,98706665,4663882,14095193,22492514,162051329,104821609,8002699,25906804,...,22093075,0.157855,6114944,0.061951,22093075,0.715888,11811611,0.837989,827703,0.036799
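
The 2000 table has 3219 rows and the 2016 table 3220, while the merged table has 3213 rows including `UStotal`, so some counties have no match in the other year and drop out of the inner merge. A minimal sketch for listing them, using hypothetical FIPS codes and a hypothetical helper name:

```python
def unmatched_keys(left_keys, right_keys):
    """Return (keys only on the left, keys only on the right), each sorted."""
    left, right = set(left_keys), set(right_keys)
    return sorted(left - right), sorted(right - left)

# hypothetical county codes, only to show the result
only_2000, only_2016 = unmatched_keys(
    ["01001", "01003", "99001"],
    ["01001", "01003", "99002"],
)
print(only_2000, only_2016)
```

In the notebook this would be `unmatched_keys(Labor00['STCO'], Labor16['STCO'])`, run before the merge. pandas can report the same thing with `merge(..., how='outer', indicator=True)`.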


In [11]:
Labor.dtypes

STCO                int64
LaborForce00        int64
PrimeAge00          int64
Over65_00           int64
55-64_00            int64
Under25_00          int64
LaborForce          int64
PrimeAge            int64
Over65              int64
55-64               int64
Under25             int64
over65_%_16       float64
over65_%_00       float64
over65_%change    float64
NET_LaborTotal      int64
%_LaborTotal      float64
NET_PrimeAge        int64
%_PrimeAge        float64
NET_Over65          int64
%_Over65          float64
NET_55-64           int64
%_55-64           float64
NET_Under25         int64
%_Under25         float64
dtype: object

In [12]:
Labor['STCO'] = Labor['STCO'].astype(str)
Labor['STCO'] = Labor['STCO'].apply(lambda x: '{0:0>5}'.format(x))

Labor.dtypes[:1]

STCO    object
dtype: object
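
Casting `STCO` to `int` drops the leading zeros, which is why the column has to be padded back to five characters here. `str.zfill` does the same padding as the `'{0:0>5}'` format string; the helper name `fips` below is hypothetical.

```python
def fips(code, width):
    """Left-pad a FIPS code with zeros to a fixed width."""
    return str(code).zfill(width)

alameda = fips(6001, 5)            # county code read back as an integer
autauga = fips(1, 2) + fips(1, 3)  # state + county parts
total_row = fips(0, 5)             # the UStotal row, whose STCO was filled with 0
print(alameda, autauga, total_row)
```

Keeping the codes as strings from the start would avoid the round trip altogether.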

##  Reading in geo-coded dataset
Created in a different notebook; please refer to _**ADD NOTEBOOK NAME**_

In [13]:
geo = pd.read_csv('../rp-USmetros_comparison/data/USmetros_full.csv').iloc[:,:-2] \
                                .drop(['Unnamed: 0', 'SHAPE_AREA'], axis=1)
geo['STCO'] = geo['STCO'].apply(lambda x: '{0:0>5}'.format(x))

print(geo.shape)
geo.head()

(270, 4)


Unnamed: 0,CSA,CSA_name,County_name,STCO
0,488,"San Jose-San Francisco-Oakland, CA",Alameda,6001
1,488,"San Jose-San Francisco-Oakland, CA",Contra Costa,6013
2,488,"San Jose-San Francisco-Oakland, CA",Marin,6041
3,488,"San Jose-San Francisco-Oakland, CA",Napa,6055
4,488,"San Jose-San Francisco-Oakland, CA",San Benito,6069


### Merging datasets

In [14]:
Labor_CO = Labor.merge(geo, on='STCO').set_index('County_name')

print(Labor_CO.shape)
Labor_CO.tail()

(269, 26)


County_name,STCO,LaborForce00,PrimeAge00,Over65_00,55-64_00,Under25_00,LaborForce,PrimeAge,Over65,55-64,...,NET_PrimeAge,%_PrimeAge,NET_Over65,%_Over65,NET_55-64,%_55-64,NET_Under25,%_Under25,CSA,CSA_name
Hampshire,54027,9056,6431,350,1063,1212,9700,6402,440,1813,...,-29,-0.004509,644,0.257143,750,0.70555,-167,-0.137789,548,"Washington-Baltimore-Arlington, DC-MD-VA-WV-PA"
Jefferson,54037,22669,15828,737,2454,3650,29404,18728,1738,5268,...,2900,0.18322,6735,1.358209,2814,1.146699,20,0.005479,548,"Washington-Baltimore-Arlington, DC-MD-VA-WV-PA"
Kenosha,55059,77980,55476,1895,7099,13510,88879,58245,2995,13362,...,2769,0.049913,10899,0.580475,6263,0.882237,767,0.056773,176,"Chicago-Naperville, IL-IN-WI"
Pierce,55093,22165,13868,621,1839,5837,23604,13421,1000,3881,...,-447,-0.032232,1439,0.610306,2042,1.110386,-535,-0.091657,378,"Minneapolis-St. Paul, MN-WI"
St. Croix,55109,35867,26097,1024,3228,5518,48805,32472,1851,8192,...,6375,0.244281,12938,0.807617,4964,1.537794,772,0.139906,378,"Minneapolis-St. Paul, MN-WI"


In [15]:
Labor_CO[Labor_CO['CSA']==408].shape

(31, 26)

### Exporting all counties' labor force data to .csv

In [25]:
Labor_CO.to_csv('exports/Labor_Counties.csv')

## Groupby CSAs to sum

In [16]:
Labor_CSA = Labor_CO.groupby(['CSA', 'CSA_name']).sum()

print(Labor_CSA.shape)
Labor_CSA

(15, 23)


CSA,CSA_name,LaborForce00,PrimeAge00,Over65_00,55-64_00,Under25_00,LaborForce,PrimeAge,Over65,55-64,Under25,...,NET_LaborTotal,%_LaborTotal,NET_PrimeAge,%_PrimeAge,NET_Over65,%_Over65,NET_55-64,%_55-64,NET_Under25,%_Under25
122,"Atlanta--Athens-Clarke County--Sandy Springs, GA",2538045,1870205,58002,216304,393534,3220087,2211700,122914,458259,427214,...,682042,9.964193,341495,6.10638,682042,44.846761,241955,38.912911,33680,5.456478
148,"Boston-Worcester-Providence, MA-RI-NH-CT",4013723,2870795,139818,421332,581778,4500566,2828692,253765,775288,642821,...,486843,1.835709,-42103,-1.123138,486843,16.70962,353956,16.555633,61043,1.963579
176,"Chicago-Naperville, IL-IN-WI",2119798,1516128,59594,212143,331933,2471875,1584629,113805,413335,360106,...,352077,3.265963,68501,1.323389,352077,15.56239,201192,16.412937,28173,1.158686
206,"Dallas-Fort Worth, TX-OK",2868730,2071787,77125,257825,461993,3821344,2583048,164325,545704,528267,...,952614,6.056987,511261,3.997507,952614,23.72363,287879,19.243305,66274,3.152351
216,"Denver-Aurora, CO",1456374,1065179,37734,129402,224059,1819491,1218564,80426,276459,244042,...,363117,2.544742,153385,0.693678,363117,20.294435,147057,16.274107,19983,1.264906
220,"Detroit-Warren-Ann Arbor, MI",2696073,1952779,74461,251732,417101,2637416,1697656,110282,430378,399100,...,-58657,0.009534,-255123,-1.154306,-58657,6.108782,178646,7.855947,-18001,-0.307643
288,"Houston-The Woodlands, TX",2355212,1712443,59110,211555,372104,3378973,2287083,142711,491314,457865,...,1023761,4.16863,574640,2.383241,1023761,17.174346,279759,15.500204,85761,2.477587
348,"Los Angeles-Long Beach, CA",5724165,4130260,173676,532771,887458,6818638,4629945,305515,997166,886012,...,1094473,0.365121,499685,0.208421,1094473,1.593505,464395,1.699893,-1446,0.0294
370,"Miami-Fort Lauderdale-Port St. Lucie, FL",2522648,1784755,125264,284949,327680,3317864,2203373,191049,540022,383420,...,795216,2.082441,418618,1.356908,795216,3.458144,255073,6.369128,55740,1.451164
378,"Minneapolis-St. Paul, MN-WI",1882490,1349682,50862,172298,309648,2166725,1414085,89754,350318,312568,...,284235,4.301386,64403,2.045584,284235,18.581366,178020,23.485376,2920,1.335435
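
Note that `groupby(...).sum()` also adds up the county-level ratio columns, so a value such as `%_LaborTotal` = 9.964193 for CSA 122 is a sum of county ratios, not the CSA's own growth rate. A sketch of recomputing the ratios from the summed counts, using the CSA 122 counts from the table above; the helper name is hypothetical.

```python
def ratio_change(new, old):
    """Relative change from old to new."""
    return (new - old) / old

# Summed counts for CSA 122, copied from the table above
csa_122 = {"LaborForce00": 2538045, "LaborForce": 3220087,
           "Over65_00": 58002, "Over65": 122914}

recomputed = {
    "%_LaborTotal": ratio_change(csa_122["LaborForce"], csa_122["LaborForce00"]),
    "%_Over65": ratio_change(csa_122["Over65"], csa_122["Over65_00"]),
    "over65_%_16": csa_122["Over65"] / csa_122["LaborForce"],
}
for name, value in recomputed.items():
    print(name, round(value, 4))
```

In pandas the equivalent would be to overwrite the ratio columns after the sum, for example `Labor_CSA['%_LaborTotal'] = Labor_CSA['NET_LaborTotal'] / Labor_CSA['LaborForce00']`.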


### Exporting CSAs' labor force data to .csv

In [27]:
Labor_CSA.to_csv('exports/Labor_CSA.csv')

----

# Labor Force by Places
_Identifying City vs Suburb trends_

## 2016

In [20]:
# Labor Force data for all places in the US, 2016
Labor16_place = pd.read_json('https://api.census.gov/data/2016/acs/acs5?get='+
                              Labor16str +
                             ',NAME&for=place:*&in=state:*')
Labor16_place.columns = Labor16_place.iloc[0]
Labor16_place = Labor16_place[1:]

Labor16_place['state'] = Labor16_place['state'].apply(lambda x: '{0:0>2}'.format(x))
# place FIPS codes are 5 digits wide
Labor16_place['place'] = Labor16_place['place'].apply(lambda x: '{0:0>5}'.format(x))
Labor16_place['STPL'] = Labor16_place[['state', 'place']].apply(lambda x: ''.join(x), axis=1)

# converting dtypes to int
for col in Labor16vars:
    Labor16_place[col] = Labor16_place[col].astype(int)

# aggregating age groups and calculations
Labor16_place['LaborForce'] = Labor16_place['B23025_001E'] - Labor16_place['B23025_007E']
Labor16_place['PrimeAge'] = Labor16_place['B23001_025E'] + Labor16_place['B23001_032E'] +\
                            Labor16_place['B23001_039E'] + Labor16_place['B23001_046E'] +\
                            Labor16_place['B23001_111E'] + Labor16_place['B23001_118E'] +\
                            Labor16_place['B23001_125E'] + Labor16_place['B23001_132E']
Labor16_place['Over65'] = Labor16_place['B23001_074E'] + Labor16_place['B23001_079E'] +\
                          Labor16_place['B23001_084E'] + Labor16_place['B23001_160E'] +\
                          Labor16_place['B23001_165E'] + Labor16_place['B23001_170E']
Labor16_place['55-64'] = Labor16_place['B23001_053E'] + Labor16_place['B23001_060E'] +\
                         Labor16_place['B23001_067E'] + Labor16_place['B23001_139E'] +\
                         Labor16_place['B23001_146E'] + Labor16_place['B23001_153E']
Labor16_place['Under25'] = Labor16_place['LaborForce'] - Labor16_place['PrimeAge'] -\
                           Labor16_place['Over65'] - Labor16_place['55-64']

Labor16_place = Labor16_place.drop(['state', 'place', 'B23025_001E', 'B23025_007E',
                                    'B23001_025E', 'B23001_032E', 'B23001_039E',
                                    'B23001_046E', 'B23001_111E', 'B23001_118E', 
                                    'B23001_125E', 'B23001_132E', 'B23001_074E',
                                    'B23001_079E', 'B23001_084E', 'B23001_160E',
                                    'B23001_139E', 'B23001_146E', 'B23001_153E',
                                    'B23001_053E', 'B23001_060E', 'B23001_067E',
                                    'B23001_165E', 'B23001_170E'], axis=1)
Labor16_place.columns = ['Name', 'STPL', 'LaborForce', 'PrimeAge', 
                         'Over65', '55-64', 'Under25']

print(Labor16_place.shape)
print(Labor16_place.dtypes)
Labor16_place.head()

(29574, 7)
Name          object
STPL          object
LaborForce     int64
PrimeAge       int64
Over65         int64
55-64          int64
Under25        int64
dtype: object


Unnamed: 0,Name,STPL,LaborForce,PrimeAge,Over65,55-64,Under25
1,"Abanda CDP, Alabama",100100,74,34,0,40,0
2,"Abbeville city, Alabama",100124,855,466,56,204,129
3,"Adamsville city, Alabama",100460,2107,1314,104,425,264
4,"Addison town, Alabama",100484,317,184,14,60,59
5,"Akron town, Alabama",100676,93,56,2,11,24


## 2000

In [36]:
# Labor Force data for all places in the US, 2000
Labor00_place = pd.read_json('https://api.census.gov/data/2000/sf3?get='+
                              Labor00str +
                              ',NAME&for=place:*&in=state:*')
Labor00_place.columns = Labor00_place.iloc[0]
Labor00_place = Labor00_place[1:]

Labor00_place['state'] = Labor00_place['state'].apply(lambda x: '{0:0>2}'.format(x))
# place FIPS codes are 5 digits wide
Labor00_place['place'] = Labor00_place['place'].apply(lambda x: '{0:0>5}'.format(x))
Labor00_place['STPL'] = Labor00_place[['state', 'place']].apply(lambda x: ''.join(x), axis=1)

# converting dtypes to int
for col in Labor00vars:
    Labor00_place[col] = Labor00_place[col].astype(int)

Labor00_place['LaborForce00'] = Labor00_place['P043003'] + Labor00_place['P043010']
Labor00_place['PrimeAge00'] = Labor00_place['PCT035025'] + Labor00_place['PCT035032'] +\
                              Labor00_place['PCT035039'] + Labor00_place['PCT035046'] +\
                              Labor00_place['PCT035117'] + Labor00_place['PCT035124'] +\
                              Labor00_place['PCT035131'] + Labor00_place['PCT035138']
Labor00_place['Over65_00'] = Labor00_place['PCT035074'] + Labor00_place['PCT035081'] +\
                             Labor00_place['PCT035088'] + Labor00_place['PCT035166'] +\
                             Labor00_place['PCT035173'] + Labor00_place['PCT035180']
Labor00_place['55-64_00'] = Labor00_place['PCT035053'] + Labor00_place['PCT035060'] +\
                            Labor00_place['PCT035067'] + Labor00_place['PCT035145'] +\
                            Labor00_place['PCT035152'] + Labor00_place['PCT035159']
Labor00_place['Under25_00'] = Labor00_place['LaborForce00'] - Labor00_place['PrimeAge00'] -\
                              Labor00_place['Over65_00'] - Labor00_place['55-64_00']

Labor00_place = Labor00_place.drop(['state', 'place', 'P043003', 'P043010',
                                    'PCT035025', 'PCT035032', 'PCT035039',
                                    'PCT035046', 'PCT035117', 'PCT035124', 
                                    'PCT035131', 'PCT035138', 'PCT035074',
                                    'PCT035081', 'PCT035088', 'PCT035166',
                                    'PCT035053', 'PCT035060', 'PCT035067',
                                    'PCT035145', 'PCT035152', 'PCT035159',
                                    'PCT035173', 'PCT035180'], axis=1)
Labor00_place.columns = ['Name', 'STPL', 'LaborForce00', 'PrimeAge00',
                         'Over65_00', '55-64_00', 'Under25_00']

print(Labor00_place.shape)
print(Labor00_place.dtypes)
Labor00_place.head()

(25375, 7)
Name            object
STPL            object
LaborForce00     int64
PrimeAge00       int64
Over65_00        int64
55-64_00         int64
Under25_00       int64
dtype: object


Unnamed: 0,Name,STPL,LaborForce00,PrimeAge00,Over65_00,55-64_00,Under25_00
1,Abbeville city,100124,1269,804,60,167,238
2,Adamsville city,100460,2419,1706,93,310,310
3,Addison town,100484,334,219,10,61,44
4,Akron town,100676,178,142,3,16,17
5,Alabaster city,100820,12609,10064,221,1034,1290


### Merging 2000 + 2016 data

In [37]:
Labor_place = Labor00_place.merge(Labor16_place, on='STPL')

Labor_place = Labor_place.drop(['Name_x'], axis=1)
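# DataFrame.append was removed in pandas 2.0; pd.concat adds the same totals row
Labor_place = pd.concat([Labor_place,
                         Labor_place.sum(numeric_only=True).to_frame().T],
                        ignore_index=True)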
Labor_place = Labor_place.set_index('Name_y').fillna(0).astype(int)

Labor_place['over65_%_16'] = Labor_place['Over65'] / Labor_place['LaborForce']
Labor_place['over65_%_00'] = Labor_place['Over65_00'] / Labor_place['LaborForce00']

Labor_place['over65_%change'] = Labor_place['over65_%_16'] - Labor_place['over65_%_00']

Labor_place['NET_LaborTotal'] = Labor_place['LaborForce'] - Labor_place['LaborForce00']
Labor_place['%_LaborTotal'] = (Labor_place['LaborForce'] - Labor_place['LaborForce00']) \
                               / Labor_place['LaborForce00']
Labor_place['NET_PrimeAge'] = Labor_place['PrimeAge'] - Labor_place['PrimeAge00']
Labor_place['%_PrimeAge'] = (Labor_place['PrimeAge'] - Labor_place['PrimeAge00']) \
                            / Labor_place['PrimeAge00']
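# net change in the 65+ labor force (the saved outputs in this notebook were
# produced with LaborForce - LaborForce00 here, so they repeat NET_LaborTotal)
Labor_place['NET_Over65'] = Labor_place['Over65'] - Labor_place['Over65_00']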
Labor_place['%_Over65'] = (Labor_place['Over65'] - Labor_place['Over65_00']) \
                     / Labor_place['Over65_00']
Labor_place['NET_55-64'] = Labor_place['55-64'] - Labor_place['55-64_00']
Labor_place['%_55-64'] = (Labor_place['55-64'] - Labor_place['55-64_00']) \
                          / Labor_place['55-64_00']
Labor_place['NET_Under25'] = Labor_place['Under25'] - Labor_place['Under25_00']
Labor_place['%_Under25'] = (Labor_place['Under25'] - Labor_place['Under25_00']) \
                            / Labor_place['Under25_00']


Labor_place.index = Labor_place.index.fillna('UStotal')

print(Labor_place.shape)
Labor_place.tail()

(24650, 24)


Name_y,STPL,LaborForce00,PrimeAge00,Over65_00,55-64_00,Under25_00,LaborForce,PrimeAge,Over65,55-64,...,NET_LaborTotal,%_LaborTotal,NET_PrimeAge,%_PrimeAge,NET_Over65,%_Over65,NET_55-64,%_55-64,NET_Under25,%_Under25
"Villalba zona urbana, Puerto Rico",7286831,1197,912,31,61,193,1494,1019,68,160,...,297,0.24812,107,0.117325,297,1.193548,99,1.622951,54,0.279793
"Yabucoa zona urbana, Puerto Rico",7287863,1458,1100,18,89,251,2226,1704,31,170,...,768,0.526749,604,0.549091,768,0.722222,81,0.910112,70,0.278884
"Yauco zona urbana, Puerto Rico",7288035,6064,4395,118,509,1042,5834,4475,181,778,...,-230,-0.037929,80,0.018203,-230,0.533898,269,0.528487,-642,-0.616123
"Yaurel comunidad, Puerto Rico",7288121,326,223,9,31,63,276,206,0,27,...,-50,-0.153374,-17,-0.076233,-50,-1.0,-4,-0.129032,-20,-0.31746
UStotal,0,98868078,69253950,3292657,9460639,16860832,116600623,76194405,5423319,17452708,...,17732545,0.179356,6940455,0.100217,17732545,0.647095,7992069,0.844771,669359,0.039699
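
Small places can have a zero count in 2000 for one of the age groups, in which case the percent-change ratios divide by zero. A minimal sketch of a guarded version follows; `safe_pct_change` is a hypothetical helper, and returning `None` for a zero base is an assumption about how such rows should be treated.

```python
def safe_pct_change(new, old):
    """Relative change from old to new; None when the base count is zero."""
    if old == 0:
        return None
    return (new - old) / old

# Over65 for "Yaurel comunidad" in the table above: 9 in 2000, 0 in 2016
yaurel = safe_pct_change(0, 9)
# hypothetical place with no 65+ workers in 2000
undefined = safe_pct_change(5, 0)
print(yaurel, undefined)
```

In pandas, dividing integer columns by zero yields `inf` (or `NaN` for 0/0) rather than raising, so such rows could instead be cleaned after the fact with `.replace([np.inf, -np.inf], np.nan)`.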


## Reading in geo-coded places dataset

Created by Dara Goldberg, DCP Regional Planning

In [41]:
geoPlace = pd.read_csv('../rp-USmetros_comparison/data/Geocoded_places.csv')
geoPlace['GEOID'] = geoPlace['GEOID'].apply(lambda x: '{0:0>7}'.format(x))
geoPlace['GEOID'] = geoPlace['GEOID'].astype(int)

print(geoPlace.shape)
geoPlace.head(3)

(19, 5)


Unnamed: 0,GEOID,NAMELSAD,NAME,CSA,ALAND_mi
0,644000,"Los Angeles city, California",Los Angeles,348,468.65867
1,653000,"Oakland city, California",Oakland,488,55.89604
2,667000,"San Francisco city, California",San Francisco,488,46.90564


In [42]:
geoPlace.dtypes

GEOID         int64
NAMELSAD     object
NAME         object
CSA           int64
ALAND_mi    float64
dtype: object

In [43]:
# merging all places with our target places list
Labor_place = geoPlace.merge(Labor_place, left_on = 'GEOID', right_on = 'STPL')

print(Labor_place.shape)
Labor_place.head(20)

(19, 29)


Unnamed: 0,GEOID,NAMELSAD,NAME,CSA,ALAND_mi,STPL,LaborForce00,PrimeAge00,Over65_00,55-64_00,...,NET_LaborTotal,%_LaborTotal,NET_PrimeAge,%_PrimeAge,NET_Over65,%_Over65,NET_55-64,%_55-64,NET_Under25,%_Under25
0,644000,"Los Angeles city, California",Los Angeles,348,468.65867,644000,1690316,1214480,55796,144477,...,407149,0.240872,243548,0.200537,407149,0.641103,131894,0.912907,-4064,-0.014748
1,653000,"Oakland city, California",Oakland,488,55.89604,653000,190725,141902,5270,16817,...,37040,0.194206,20618,0.145297,37040,0.865655,13691,0.814117,-1831,-0.068484
2,667000,"San Francisco city, California",San Francisco,488,46.90564,667000,448669,345232,13904,38633,...,72495,0.161578,42511,0.123137,72495,0.52244,29845,0.772526,-7125,-0.13998
3,668000,"San Jose city, California",San Jose,488,177.5141,668000,456641,338383,10119,41308,...,83610,0.183098,40108,0.118528,83610,0.982706,35662,0.863319,-2104,-0.031482
4,820000,"Denver city, Colorado",Denver,216,153.30483,820000,301714,219052,9973,23958,...,79500,0.263495,59239,0.270434,79500,0.462649,20305,0.847525,-4658,-0.095586
5,1150000,"Washington city, District of Columbia",Washington,548,61.13988,1150000,298225,206426,10559,29125,...,86831,0.291159,69031,0.33441,86831,0.649777,14466,0.496687,-3527,-0.067677
6,1245000,"Miami city, Florida",Miami,370,35.98691,1245000,147356,101253,7758,17988,...,75666,0.513491,58985,0.582551,75666,0.300335,11832,0.657772,2519,0.123741
7,1304000,"Atlanta city, Georgia",Atlanta,122,133.43344,1304000,213257,148279,6304,16086,...,32616,0.152942,27467,0.185239,32616,0.154029,8778,0.545692,-4600,-0.108012
8,1714000,"Chicago city, Illinois",Chicago,176,227.3401,1714000,1358054,967050,36342,115719,...,91899,0.06767,64830,0.067039,91899,0.305652,58179,0.502761,-42218,-0.176686
9,2507000,"Boston city, Massachusetts",Boston,148,48.34364,2507000,308395,212308,6721,22434,...,77176,0.25025,48269,0.227354,77176,0.928731,21068,0.93911,1597,0.02386


In [44]:
Labor_place.to_csv('exports/Labor_place.csv')
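
One possible way to get at the city vs suburb comparison is to subtract the principal-city counts from the totals of the CSA they belong to. This is only a sketch under that assumption, not necessarily the method used in the rest of the project; `suburb_totals` is a hypothetical helper, and the numbers are the `LaborForce00` values for CSA 348 and Los Angeles city from the tables above.

```python
from collections import defaultdict

def suburb_totals(csa_totals, city_rows):
    """Subtract the summed principal-city counts from each CSA total."""
    city_sum = defaultdict(int)
    for csa, value in city_rows:
        city_sum[csa] += value      # a CSA may contain several listed cities
    return {csa: total - city_sum[csa] for csa, total in csa_totals.items()}

# LaborForce00 values copied from the CSA and place tables above
csa_labor00 = {348: 5724165}
city_labor00 = [(348, 1690316)]     # Los Angeles city
suburbs = suburb_totals(csa_labor00, city_labor00)
print(suburbs)
```

CSAs with more than one listed city, such as 488 with Oakland, San Francisco and San Jose, are handled by summing the city rows before subtracting.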