07-11-2018

**_Author: Dana Chermesh, Regional Planning intern_**


# US Metros comparison 
### comparison by the county level of 15 regions (CSA's) accross the country

---- 

## _Notebook no.4 -- HOUSING_ 
### retrieved from _Census Bureau Building Permits Survey County and Place level for 2017 annually_

----


In [1]:
import pandas as pd

# reading in my api key saved in censusAPI.py as
# myAPI = 'XXXXXXXXXXXXXXX'
# request an api key in: https://api.census.gov/data/key_signup.html
from censusAPI import myAPI

# Data

## 1.1 Total housing units 2010 (base point); _Decennial Census 2010, SF1_
Data were retrieved using my census API (**dana explanation here**)
- [variables](https://api.census.gov/data/2010/sf1/variables.html)

In [2]:
# total POP and total housing units for all counties in the US, 2010
# P001001 = total pop
# H00010001 = total housing units

housing10 = pd.read_json('https://api.census.gov/data/2010/sf1?get=P0010001,H00010001'+
                         '&for=county:*&in=state:*&key='+myAPI)
housing10.columns = housing10.iloc[0]
housing10 = housing10[1:]

housing10.columns = ['pop2010', 'hu2010', 'state', 'county']
housing10['STCO'] = housing10[['state', 'county']].apply(lambda x: ''.join(x), axis=1)

print(housing10.shape)
housing10.head()

(3221, 5)


Unnamed: 0,pop2010,hu2010,state,county,STCO
1,54571,22135,1,1,1001
2,182265,104061,1,3,1003
3,27457,11829,1,5,1005
4,22915,8981,1,7,1007
5,57322,23887,1,9,1009


## 1.2 Perimts issueance data from 2010 to 2017 - [Building Permits Survey](https://www.census.gov/construction/bps/)
The data were retrieved from the Census Bureau [Building Permits Survey](https://www.census.gov/construction/bps/), [Permits by County or Place](http://www2.census.gov/econ/bps).

For downloading the data, please go to the [County/](https://www2.census.gov/econ/bps/County/) page or the [Places/](https://www2.census.gov/econ/bps/Place/) page, and choose:
- [co2017a.txt](https://www2.census.gov/econ/bps/County/co2017a.txt) 
- [co2016a.txt](https://www2.census.gov/econ/bps/County/co2016a.txt) 
- [co2015a.txt](https://www2.census.gov/econ/bps/County/co2015a.txt) 
- [co2014a.txt](https://www2.census.gov/econ/bps/County/co2014a.txt) 
- [co2013a.txt](https://www2.census.gov/econ/bps/County/co2013a.txt) 
- [co2012a.txt](https://www2.census.gov/econ/bps/County/co2012a.txt) 
- [co2011a.txt](https://www2.census.gov/econ/bps/County/co2011a.txt) 
- [co2010a.txt](https://www2.census.gov/econ/bps/County/co2010a.txt) 

and:
- [ne2017a.txt](https://www2.census.gov/econ/bps/Place/Northeast%20Region/ne2017a.txt)
- [mw2017a.txt](https://www2.census.gov/econ/bps/Place/Northeast%20Region/ne2017a.txt)
- [so2017a.txt](https://www2.census.gov/econ/bps/Place/Northeast%20Region/ne2017a.txt)
- [we2017a.txt](https://www2.census.gov/econ/bps/Place/Northeast%20Region/ne2017a.txt)<br>
.<br>
.<br>
.<br>
.<br>

Data can be read directly to this notebook using pandas `read_table`, as bellow.

In [87]:
tablesCO = ['co2017a', 'co2016a', 'co2015a', 'co2014a',
            'co2013a', 'co2012a', 'co2011a', 'co2010a']

COdata = []

for year in tablesCO:
    df = 'co'+year

    df = pd.read_table('https://www2.census.gov/econ/bps/County/co2017a.txt', 
            header=0, sep=r'\,|\t', engine='python').iloc[:,:18]

    df.columns = df.iloc[0]
    df = df[1:].set_index(['Name'])

    df = df.drop(['Code','Bldgs', 'Value'], axis=1)
    df.columns = ['State', 'County', '1unit', '2unit', '3-4unit', '+5unit']

    df = df.astype(int)
    df['1-2units'] = df['1unit'] + df['2unit']
    df['+3units'] = df['3-4unit'] + df['+5unit']
    df = df.drop(['1unit', '2unit', '3-4unit', '+5unit'], axis=1)

    df['State'] = df['State'].apply(lambda x: '{0:0>2}'.format(x))
    df['County'] = df['County'].apply(lambda x: '{0:0>3}'.format(x))

    df['STCO'] = df[['State', 'County']].apply(lambda x: ''.join(x), axis=1)

    COdata.append(df)

COall = pd.concat(COdata).groupby('STCO').sum()
print(COall.shape)
print(COall.dtypes)
COall.head()

(3038, 2)
1-2units    int64
+3units     int64
dtype: object


Unnamed: 0_level_0,1-2units,+3units
STCO,Unnamed: 1_level_1,Unnamed: 2_level_1
1001,1504,0
1003,18392,944
1005,24,0
1007,80,0
1009,144,0


In [88]:
BPcounties = COall.merge(geo, left_index=True, right_on='STCO').set_index('County_name')

print(BPcounties.shape)
BPcounties.head()

(270, 5)


Unnamed: 0_level_0,1-2units,+3units,CSA,CSA_name,STCO
County_name,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1
Alameda,20968,53096,488,"San Jose-San Francisco-Oakland, CA",6001
Contra Costa,13544,2328,488,"San Jose-San Francisco-Oakland, CA",6013
Los Angeles,53576,119016,348,"Los Angeles-Long Beach, CA",6037
Marin,752,0,488,"San Jose-San Francisco-Oakland, CA",6041
Napa,840,624,488,"San Jose-San Francisco-Oakland, CA",6055


In [3]:
# 2017
counties = pd.read_table('https://www2.census.gov/econ/bps/County/co2017a.txt', 
            header=0, sep=r'\,|\t', engine='python').iloc[:,:18]

counties.columns = counties.iloc[0]
counties = counties[1:].set_index(['Name'])

counties = counties.drop(['Code','Bldgs', 'Value'], axis=1)
counties.columns = ['State', 'County', '1unit', '2unit', '3-4unit', '+5unit']

counties = counties.astype(int)
counties['1-2units'] = counties['1unit'] + counties['2unit']
counties['+3units'] = counties['3-4unit'] + counties['+5unit']
counties = counties.drop(['1unit', '2unit', '3-4unit', '+5unit'], axis=1)

counties['State'] = counties['State'].apply(lambda x: '{0:0>2}'.format(x))
counties['County'] = counties['County'].apply(lambda x: '{0:0>3}'.format(x))

counties['STCO'] = counties[['State', 'County']].apply(lambda x: ''.join(x), axis=1)


print(counties.shape)
counties.head(3)

(3038, 5)


Unnamed: 0_level_0,State,County,1-2units,+3units,STCO
Name,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1
Autauga County,1,1,188,0,1001
Baldwin County,1,3,2299,118,1003
Barbour County,1,5,3,0,1005


In [4]:
# 2016
counties16 = pd.read_table('https://www2.census.gov/econ/bps/County/co2016a.txt', 
            header=0, sep=r'\,|\t', engine='python').iloc[:,:18]

counties16.columns = counties16.iloc[0]
counties16 = counties16[1:].set_index(['Name'])

counties16 = counties16.drop(['Code','Bldgs', 'Value'], axis=1)
counties16.columns = ['State', 'County', '1unit', '2unit', '3-4unit', '+5unit']

counties16 = counties16.astype(int)
counties16['1-2units'] = counties16['1unit'] + counties16['2unit']
counties16['+3units'] = counties16['3-4unit'] + counties16['+5unit']
counties16 = counties16.drop(['1unit', '2unit', '3-4unit', '+5unit'], axis=1)

counties16['State'] = counties16['State'].apply(lambda x: '{0:0>2}'.format(x))
counties16['County'] = counties16['County'].apply(lambda x: '{0:0>3}'.format(x))

counties16['STCO'] = counties16[['State', 'County']].apply(lambda x: ''.join(x), axis=1)


print(counties16.shape)
counties16.head(3)

(3039, 5)


Unnamed: 0_level_0,State,County,1-2units,+3units,STCO
Name,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1
Autauga County,1,1,169,0,1001
Baldwin County,1,3,2171,349,1003
Barbour County,1,5,3,0,1005


In [5]:
# 2015
counties15 = pd.read_table('https://www2.census.gov/econ/bps/County/co2015a.txt', 
            header=0, sep=r'\,|\t', engine='python').iloc[:,:18]

counties15.columns = counties15.iloc[0]
counties15 = counties15[1:].set_index(['Name'])

counties15 = counties15.drop(['Code','Bldgs', 'Value'], axis=1)
counties15.columns = ['State', 'County', '1unit', '2unit', '3-4unit', '+5unit']

counties15 = counties15.astype(int)
counties15['1-2units'] = counties15['1unit'] + counties15['2unit']
counties15['+3units'] = counties15['3-4unit'] + counties15['+5unit']
counties15 = counties15.drop(['1unit', '2unit', '3-4unit', '+5unit'], axis=1)

counties15['State'] = counties15['State'].apply(lambda x: '{0:0>2}'.format(x))
counties15['County'] = counties15['County'].apply(lambda x: '{0:0>3}'.format(x))

counties15['STCO'] = counties15[['State', 'County']].apply(lambda x: ''.join(x), axis=1)


print(counties15.shape)
counties15.head(3)

(3044, 5)


Unnamed: 0_level_0,State,County,1-2units,+3units,STCO
Name,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1
Autauga County,1,1,158,0,1001
Baldwin County,1,3,1644,559,1003
Barbour County,1,5,10,0,1005


In [6]:
# 2014
counties14 = pd.read_table('https://www2.census.gov/econ/bps/County/co2014a.txt', 
            header=0, sep=r'\,|\t', engine='python').iloc[:,:18]

counties14.columns = counties14.iloc[0]
counties14 = counties14[1:].set_index(['Name'])

counties14 = counties14.drop(['Code','Bldgs', 'Value'], axis=1)
counties14.columns = ['State', 'County', '1unit', '2unit', '3-4unit', '+5unit']

counties14 = counties14.astype(int)
counties14['1-2units'] = counties14['1unit'] + counties14['2unit']
counties14['+3units'] = counties14['3-4unit'] + counties14['+5unit']
counties14 = counties14.drop(['1unit', '2unit', '3-4unit', '+5unit'], axis=1)

counties14['State'] = counties14['State'].apply(lambda x: '{0:0>2}'.format(x))
counties14['County'] = counties14['County'].apply(lambda x: '{0:0>3}'.format(x))

counties14['STCO'] = counties14[['State', 'County']].apply(lambda x: ''.join(x), axis=1)


print(counties14.shape)
counties14.head(3)

(3038, 5)


Unnamed: 0_level_0,State,County,1-2units,+3units,STCO
Name,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1
Autauga County,1,1,131,0,1001
Baldwin County,1,3,1380,4,1003
Barbour County,1,5,8,0,1005


In [7]:
# 2013
counties13 = pd.read_table('https://www2.census.gov/econ/bps/County/co2013a.txt', 
            header=0, sep=r'\,|\t', engine='python').iloc[:,:18]

counties13.columns = counties13.iloc[0]
counties13 = counties13[1:].set_index(['Name'])

counties13 = counties13.drop(['Code','Bldgs', 'Value'], axis=1)
counties13.columns = ['State', 'County', '1unit', '2unit', '3-4unit', '+5unit']

counties13 = counties13.astype(int)
counties13['1-2units'] = counties13['1unit'] + counties13['2unit']
counties13['+3units'] = counties13['3-4unit'] + counties13['+5unit']
counties13 = counties13.drop(['1unit', '2unit', '3-4unit', '+5unit'], axis=1)

counties13['State'] = counties13['State'].apply(lambda x: '{0:0>2}'.format(x))
counties13['County'] = counties13['County'].apply(lambda x: '{0:0>3}'.format(x))

counties13['STCO'] = counties13[['State', 'County']].apply(lambda x: ''.join(x), axis=1)


print(counties13.shape)
counties13.head(3)

(3027, 5)


Unnamed: 0_level_0,State,County,1-2units,+3units,STCO
Name,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1
Autauga County,1,1,120,0,1001
Baldwin County,1,3,1221,106,1003
Barbour County,1,5,5,0,1005


In [8]:
# 2012
counties12 = pd.read_table('https://www2.census.gov/econ/bps/County/co2012a.txt', 
            header=0, sep=r'\,|\t', engine='python').iloc[:,:18]

counties12.columns = counties12.iloc[0]
counties12 = counties12[1:].set_index(['Name'])

counties12 = counties12.drop(['Code','Bldgs', 'Value'], axis=1)
counties12.columns = ['State', 'County', '1unit', '2unit', '3-4unit', '+5unit']

counties12 = counties12.astype(int)
counties12['1-2units'] = counties12['1unit'] + counties12['2unit']
counties12['+3units'] = counties12['3-4unit'] + counties12['+5unit']
counties12 = counties12.drop(['1unit', '2unit', '3-4unit', '+5unit'], axis=1)

counties12['State'] = counties12['State'].apply(lambda x: '{0:0>2}'.format(x))
counties12['County'] = counties12['County'].apply(lambda x: '{0:0>3}'.format(x))

counties12['STCO'] = counties12[['State', 'County']].apply(lambda x: ''.join(x), axis=1)


print(counties12.shape)
counties12.head(3)

(3026, 5)


Unnamed: 0_level_0,State,County,1-2units,+3units,STCO
Name,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1
Autauga County,1,1,129,256,1001
Baldwin County,1,3,1184,0,1003
Barbour County,1,5,2,0,1005


In [9]:
# 2011
counties11 = pd.read_table('https://www2.census.gov/econ/bps/County/co2011a.txt', 
            header=0, sep=r'\,|\t', engine='python').iloc[:,:18]

counties11.columns = counties11.iloc[0]
counties11 = counties11[1:].set_index(['Name'])

counties11 = counties11.drop(['Code','Bldgs', 'Value'], axis=1)
counties11.columns = ['State', 'County', '1unit', '2unit', '3-4unit', '+5unit']

counties11 = counties11.astype(int)
counties11['1-2units'] = counties11['1unit'] + counties11['2unit']
counties11['+3units'] = counties11['3-4unit'] + counties11['+5unit']
counties11 = counties11.drop(['1unit', '2unit', '3-4unit', '+5unit'], axis=1)

counties11['State'] = counties11['State'].apply(lambda x: '{0:0>2}'.format(x))
counties11['County'] = counties11['County'].apply(lambda x: '{0:0>3}'.format(x))

counties11['STCO'] = counties11[['State', 'County']].apply(lambda x: ''.join(x), axis=1)


print(counties11.shape)
counties11.head(3)

(3026, 5)


Unnamed: 0_level_0,State,County,1-2units,+3units,STCO
Name,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1
Autauga County,1,1,106,0,1001
Baldwin County,1,3,738,18,1003
Barbour County,1,5,15,40,1005


In [10]:
# 2010
counties10 = pd.read_table('https://www2.census.gov/econ/bps/County/co2010a.txt', 
            header=0, sep=r'\,|\t', engine='python').iloc[:,:18]

counties10.columns = counties10.iloc[0]
counties10 = counties10[1:].set_index(['Name'])

counties10 = counties10.drop(['Code','Bldgs', 'Value'], axis=1)
counties10.columns = ['State', 'County', '1unit', '2unit', '3-4unit', '+5unit']

counties10 = counties10.astype(int)
counties10['1-2units'] = counties10['1unit'] + counties10['2unit']
counties10['+3units'] = counties10['3-4unit'] + counties10['+5unit']
counties10 = counties10.drop(['1unit', '2unit', '3-4unit', '+5unit'], axis=1)

counties10['State'] = counties10['State'].apply(lambda x: '{0:0>2}'.format(x))
counties10['County'] = counties10['County'].apply(lambda x: '{0:0>3}'.format(x))

counties10['STCO'] = counties10[['State', 'County']].apply(lambda x: ''.join(x), axis=1)


print(counties10.shape)
counties10.head()

(3026, 5)


Unnamed: 0_level_0,State,County,1-2units,+3units,STCO
Name,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1
Autauga County,1,1,135,56,1001
Baldwin County,1,3,682,14,1003
Barbour County,1,5,10,0,1005
Bibb County,1,7,8,0,1007
Blount County,1,9,18,0,1009


In [11]:
counties10_17 = pd.concat([counties10, counties11, counties12, counties13, 
                counties14, counties15, counties16, counties]).groupby('STCO').sum()

print(counties10_17.shape)
counties10_17.head()

(3061, 2)


Unnamed: 0_level_0,1-2units,+3units
STCO,Unnamed: 1_level_1,Unnamed: 2_level_1
1001,1136,312
1003,11319,1168
1005,56,40
1007,102,0
1009,74,40


## Reading in geo-coded dataset
created on a different notebook, please refer to _**ADD NOTEBOOK NAME**_

In [12]:
geo = pd.read_csv('data/USmetros_full.csv').iloc[:,:-2] \
            .drop(['Unnamed: 0', 'SHAPE_AREA'], axis=1)
geo['STCO'] = geo['STCO'].apply(lambda x: '{0:0>5}'.format(x))

geo.head()

Unnamed: 0,CSA,CSA_name,County_name,STCO
0,488,"San Jose-San Francisco-Oakland, CA",Alameda,6001
1,488,"San Jose-San Francisco-Oakland, CA",Contra Costa,6013
2,488,"San Jose-San Francisco-Oakland, CA",Marin,6041
3,488,"San Jose-San Francisco-Oakland, CA",Napa,6055
4,488,"San Jose-San Francisco-Oakland, CA",San Benito,6069


## Merging datasets

In [13]:
HUcounties = counties10_17.merge(geo, left_index=True, right_on='STCO').set_index('County_name')

print(HUcounties.shape)
HUcounties.head()

(270, 5)


Unnamed: 0_level_0,1-2units,+3units,CSA,CSA_name,STCO
County_name,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1
Alameda,13298,19916,488,"San Jose-San Francisco-Oakland, CA",6001
Contra Costa,11794,4309,488,"San Jose-San Francisco-Oakland, CA",6013
Los Angeles,35396,90987,348,"Los Angeles-Long Beach, CA",6037
Marin,1103,1004,488,"San Jose-San Francisco-Oakland, CA",6041
Napa,1027,278,488,"San Jose-San Francisco-Oakland, CA",6055


In [14]:
HUcounties[HUcounties['CSA']==408].shape

(31, 5)

### Exporting all counties housing permits 2017 data to .csv

In [15]:
HUcounties.to_csv('data/HU17counties.csv')

## groupby CSAs to sum housing permits by Metro

In [16]:
huCSA = HUcounties.groupby(['CSA', 'CSA_name']).sum()#.iloc[:,:-2]

print(huCSA.shape)
huCSA

(15, 2)


Unnamed: 0_level_0,Unnamed: 1_level_0,1-2units,+3units
CSA,CSA_name,Unnamed: 2_level_1,Unnamed: 3_level_1
122,"Atlanta--Athens-Clarke County--Sandy Springs, GA",136174,63364
148,"Boston-Worcester-Providence, MA-RI-NH-CT",71045,57752
176,"Chicago-Naperville, IL-IN-WI",45693,14690
206,"Dallas-Fort Worth, TX-OK",189949,150573
216,"Denver-Aurora, CO",79600,67976
220,"Detroit-Warren-Ann Arbor, MI",45326,12381
288,"Houston-The Woodlands, TX",258944,105905
348,"Los Angeles-Long Beach, CA",62747,128721
370,"Miami-Fort Lauderdale-Port St. Lucie, FL",60513,78889
378,"Minneapolis-St. Paul, MN-WI",55957,38144


### Exporting CSAs housing permits 2017 data to .csv

In [17]:
huCSA.to_csv('data/HU17CSA.csv')

# _DANA add ACS total housing + compute change_

-----

# PLACES 
Major Cities within the Regions

### _Note#1: PLACES in the housing permits survey are separated to Midwest, Northeast, South and West Regions; data were downloaded through each of these for years 2010 - 2017 and were concatenated, then cleaned to include our targeted cities only_

### _Note#2: NYC is considered as 5 places, which are its 5 boroughs_

In [71]:
NE = ['ne2017a', 'ne2016a', 'ne2015a', 'ne2014a',
      'ne2013a', 'ne2012a', 'ne2011a', 'ne2010a']
NEdata = []

for year in NE:
    df = 'NE'+year
    df = pd.read_table('https://www2.census.gov/econ/bps/Place/Northeast%20Region/'+
                         year+'.txt', 
                         header=0, sep=r'\,|\t', engine='python').iloc[:,:28]

    df.columns = df.iloc[0]
    df = df[1:].set_index(['Name'])

    df = df.drop(['Bldgs', 'Value'], axis=1)
    df.columns = ['State', '6-Digit', 'County', 'Census Place',
                  'Place','FIPS MCD', 'Pop', 'CSA', 'CBSA',
                  'Footnote', 'Central', 'Zip','Region', 'Division', 
                  'Number of','1unit', '2unit', '3-4unit', '+5unit']
    df = df.drop(['Central', 'Pop', 'Footnote', 'Census Place',
                  '6-Digit', 'FIPS MCD', 'Number of',
                  'Zip', 'Region', 'Division'], axis=1)

    df['1unit'] = df['1unit'].astype(int)
    df['2unit'] = df['2unit'].astype(int)
    df['3-4unit'] = df['3-4unit'].astype(int)
    df['+5unit'] = df['+5unit'].astype(int)

    df['1-2units'] = df['1unit'] + df['2unit']
    df['+3units'] = df['3-4unit'] + df['+5unit']

    # creating 'GEOID' column from state and place
    df['State'] = df['State'].apply(lambda x: '{0:0>2}'.format(x))
    df['Place'] = df['Place'].apply(lambda x: '{0:0>5}'.format(x))

    df['GEOID'] = df[['State', 'Place']].apply(lambda x: ''.join(x), axis=1)

    # Dropping columns
    df = df.drop(['1unit', '2unit', '3-4unit', '+5unit',
                                  'State', 'Place'], axis=1)

    # ONLY FOR NE FILES! >> aggregating NYC 5 boroughs into 1 row: NYC
#     df = df.append(df.loc[['Queens borough', 'Bronx borough', 
#                     'Manhattan borough','Brooklyn borough',
#                     'Staten Island borough']].sum(numeric_only=True),
#                      ignore_index=True)
# #     df['GEOID'] = df['GEOID'].fillna('3651000')
# #     df['CSA'] = df['CSA'].fillna('408')

#     df['1-2units'] = df['1-2units'].astype(int)
#     df['+3units'] = df['+3units'].astype(int)
    df['GEOID'] = df['GEOID'].replace(' ', '')
    for index, row in df.iterrows():
        if row['GEOID'] == '2500025':
            row['GEOID'] = '2507000'
    NEdata.append(df)

NEall = pd.concat(NEdata).groupby(['GEOID']).sum().reset_index()
print(NEall.shape)
print(NEall.dtypes)
NEall.head()

(1949, 3)
GEOID       object
1-2units     int64
+3units      int64
dtype: object


Unnamed: 0,GEOID,1-2units,+3units
0,9000,8463,2641
1,900000,8590,5637
2,901150,20,0
3,908000,143,628
4,908420,230,118


In [53]:
# NE2017
placesNE17 = pd.read_table('https://www2.census.gov/econ/bps/Place/Northeast%20Region/ne2017a.txt', 
             header=0, sep=r'\,|\t', engine='python').iloc[:,:28]

placesNE17.columns = placesNE17.iloc[0]
placesNE17 = placesNE17[1:].set_index(['Name'])

placesNE17 = placesNE17.drop(['Bldgs', 'Value'], axis=1)
placesNE17.columns = ['State', '6-Digit', 'County', 'Census Place',
                      'Place','FIPS MCD', 'Pop', 'CSA', 'CBSA',
                      'Footnote', 'Central', 'Zip','Region', 'Division', 
                      'Number of','1unit', '2unit', '3-4unit', '+5unit']
placesNE17 = placesNE17.drop(['Central', 'Footnote', 'Census Place',
                              '6-Digit', 'FIPS MCD', 'Number of',
                              'Zip', 'Region', 'Division'], axis=1)

placesNE17['1unit'] = placesNE17['1unit'].astype(int)
placesNE17['2unit'] = placesNE17['2unit'].astype(int)
placesNE17['3-4unit'] = placesNE17['3-4unit'].astype(int)
placesNE17['+5unit'] = placesNE17['+5unit'].astype(int)
placesNE17['Pop'] = placesNE17['Pop'].astype(int)

placesNE17['1-2units'] = placesNE17['1unit'] + placesNE17['2unit']
placesNE17['+3units'] = placesNE17['3-4unit'] + placesNE17['+5unit']

# creating 'GEOID' column from state and place
placesNE17['State'] = placesNE17['State'].apply(lambda x: '{0:0>2}'.format(x))
placesNE17['Place'] = placesNE17['Place'].apply(lambda x: '{0:0>5}'.format(x))

placesNE17['GEOID'] = placesNE17[['State', 'Place']].apply(lambda x: ''.join(x), axis=1)

# Dropping columns
placesNE17 = placesNE17.drop(['1unit', '2unit', '3-4unit', '+5unit',
                              'State', 'Place'], axis=1)

# ONLY FOR NE FILES! >> aggregating NYC 5 boroughs into 1 row: NYC
# placesNE17 = placesNE17.append(placesNE17.loc[['Queens borough', 'Bronx borough', 
#                 'Manhattan borough','Brooklyn borough',
#                 'Staten Island borough']].sum(numeric_only=True),
#                  ignore_index=True)
# placesNE17.set_value(5580, 'CSA', '408')
# placesNE17.set_value(5580, 'GEOID', '3651000')

placesNE17['Pop'] = placesNE17['Pop'].astype(int)
placesNE17['1-2units'] = placesNE17['1-2units'].astype(int)
placesNE17['+3units'] = placesNE17['+3units'].astype(int)


print(placesNE17.shape)
placesNE17.head(2)

(5580, 7)


Unnamed: 0_level_0,County,Pop,CSA,CBSA,1-2units,+3units,GEOID
Name,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1
Andover town,13,3303,278,25540,3,0,900000
Ansonia,9,19249,408,35300,6,0,901150


In [82]:
# NE2016
placesNE16 = pd.read_table('https://www2.census.gov/econ/bps/Place/Northeast%20Region/ne2016a.txt', 
            header=0, sep=r'\,|\t', engine='python').iloc[:,:28]

placesNE16.columns = placesNE16.iloc[0]
placesNE16 = placesNE16[1:].set_index(['Name'])

placesNE16 = placesNE16.drop(['Bldgs', 'Value'], axis=1)
placesNE16.columns = ['State', '6-Digit', 'County', 'Census Place',
                      'Place','FIPS MCD', 'Pop', 'CSA', 'CBSA',
                      'Footnote', 'Central', 'Zip','Region', 'Division', 
                      'Number of','1unit', '2unit', '3-4unit', '+5unit']
placesNE16 = placesNE16.drop(['Central', 'Footnote', 'Census Place',
                              '6-Digit', 'FIPS MCD', 'Number of',
                              'Zip', 'Region', 'Division'], axis=1)

placesNE16['1unit'] = placesNE16['1unit'].astype(int)
placesNE16['2unit'] = placesNE16['2unit'].astype(int)
placesNE16['3-4unit'] = placesNE16['3-4unit'].astype(int)
placesNE16['+5unit'] = placesNE16['+5unit'].astype(int)
placesNE16['Pop'] = placesNE16['Pop'].astype(int)

placesNE16['1-2units'] = placesNE16['1unit'] + placesNE16['2unit']
placesNE16['+3units'] = placesNE16['3-4unit'] + placesNE16['+5unit']
placesNE16 = placesNE16.drop(['1unit', '2unit', '3-4unit', '+5unit'], axis=1)

placesNE16['State'] = placesNE16['State'].apply(lambda x: '{0:0>2}'.format(x))
placesNE16['Place'] = placesNE16['County'].apply(lambda x: '{0:0>5}'.format(x))

placesNE16['GEOID'] = placesNE16[['State', 'Place']].apply(lambda x: ''.join(x), axis=1)

# aggregating NYC 5 boroughs into 1 row: NYC
NYC = [placesNE16.loc[['Queens borough', 'Bronx borough', 
                'Manhattan borough','Brooklyn borough',
                'Staten Island borough']]]
placesNE16 = placesNE16.append(NYC).sum(numeric_only=True)
# placesNE17.index[-1:] = placesNE17.index[-1:].rename("New York",inplace=False)

# 3651000
print(placesNE16.shape)
placesNE16.tail(2)

(5578, 9)


Unnamed: 0_level_0,State,County,Place,Pop,CSA,CBSA,1-2units,+3units,GEOID
Name,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1
Woodstock town,50,27,27,2148,999,99999,6,0,5000027
Woodstock village,50,27,27,900,999,99999,2,0,5000027


In [86]:
placesNE16.loc[['Queens borough', 'Bronx borough', 
                'Manhattan borough','Brooklyn borough',
                'Staten Island borough']].sum(numeric_only=True)

Pop         8175133
1-2units       1513
+3units       14767
dtype: int64

In [54]:
# NE2015
placesNE15 = pd.read_table('https://www2.census.gov/econ/bps/Place/Northeast%20Region/ne2015a.txt', 
            header=0, sep=r'\,|\t', engine='python').iloc[:,:28]

placesNE15.columns = placesNE15.iloc[0]
placesNE15 = placesNE15[1:].set_index(['Name'])

placesNE15 = placesNE15.drop(['Bldgs', 'Value'], axis=1)
placesNE15.columns = ['State', '6-Digit', 'County', 'Census Place',
                      'Place','FIPS MCD', 'Pop', 'CSA', 'CBSA',
                      'Footnote', 'Central', 'Zip','Region', 'Division', 
                      'Number of','1unit', '2unit', '3-4unit', '+5unit']
placesNE15 = placesNE15.drop(['Central', 'Footnote', 'Census Place',
                              '6-Digit', 'FIPS MCD', 'Number of',
                              'Zip', 'Region', 'Division'], axis=1)

placesNE15['1unit'] = placesNE15['1unit'].astype(int)
placesNE15['2unit'] = placesNE15['2unit'].astype(int)
placesNE15['3-4unit'] = placesNE15['3-4unit'].astype(int)
placesNE15['+5unit'] = placesNE15['+5unit'].astype(int)
placesNE15['Pop'] = placesNE15['Pop'].astype(int)

placesNE15['1-2units'] = placesNE15['1unit'] + placesNE15['2unit']
placesNE15['+3units'] = placesNE15['3-4unit'] + placesNE15['+5unit']
placesNE15 = placesNE15.drop(['1unit', '2unit', '3-4unit', '+5unit'], axis=1)

placesNE15['State'] = placesNE15['State'].apply(lambda x: '{0:0>2}'.format(x))
placesNE15['Place'] = placesNE15['County'].apply(lambda x: '{0:0>5}'.format(x))

placesNE15['GEOID'] = placesNE15[['State', 'Place']].apply(lambda x: ''.join(x), axis=1)

# aggregating NYC 5 boroughs into 1 row: NYC
# placesNE17.append(placesNE17.loc[['Queens borough', 'Bronx borough', 
#                 'Manhattan borough','Brooklyn borough',
#                 'Staten Island borough']].sum(numeric_only=True),
#                  ignore_index=True)
# placesNE17.index[-1:] = placesNE17.index[-1:].rename("New York",inplace=False)

# 3651000
print(placesNE15.shape)
placesNE15.head(2)

(5581, 9)


Unnamed: 0_level_0,State,County,Place,Pop,CSA,CBSA,1-2units,+3units,GEOID
Name,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1
Andover town,9,13,13,3303,278,25540,4,0,900013
Ansonia,9,9,9,19249,408,35300,0,0,900009


In [74]:
NEall[NEall['GEOID'] == ]

0        09000  
1       0900000 
2       0901150 
3       0908000 
4       0908420 
5       0918430 
6       0919480 
7       0927810 
8       0934180 
9       0934460 
10      0937000 
11      0946450 
12      0947290 
13      0949880 
14      0950370 
15      0952000 
16      0952280 
17      0955990 
18      0956200 
19      0968100 
20      0973000 
21      0973700 
22      0976500 
23      0980000 
24      0982800 
25       23000  
26      2300000 
27      2302060 
28      2302100 
29      2302795 
          ...   
1919    4400000 
1920    4414140 
1921    4419180 
1922    4422960 
1923    4449960 
1924    4454640 
1925    4459000 
1926    4474300 
1927    4480780 
1928     50000  
1929    5000000 
1930    5003175 
1931    5010675 
1932    5011050 
1933    5024025 
1934    5024400 
1935    5034975 
1936    5042700 
1937    5046000 
1938    5048850 
1939    5049075 
1940    5050200 
1941    5053125 
1942    5061225 
1943    5061675 
1944    5066175 
1945    5074650 
1946    507690

### Reading in my Geocoded places table
Created by Dara Goldberg

In [27]:
cities = pd.read_excel('data/CSA Population+Change_2010-2017.xlsx', 
             sheet_name='Cities_pop+geoinfo').iloc[:,:5]

# setting GEOID to 7 digits to assure match
cities['GEOID'] = cities['GEOID'].apply(lambda x: '{0:0>7}'.format(x))
# setting GEOID to str
cities.GEOID = cities.GEOID.astype(str)

print(cities.shape)
cities

(19, 5)


Unnamed: 0,GEOID,NAMELSAD,NAME,CSA,ALAND_mi
0,644000,"Los Angeles city, California",Los Angeles,348,468.65867
1,653000,"Oakland city, California",Oakland,488,55.89604
2,667000,"San Francisco city, California",San Francisco,488,46.90564
3,668000,"San Jose city, California",San Jose,488,177.5141
4,820000,"Denver city, Colorado",Denver,216,153.30483
5,1150000,"Washington city, District of Columbia",Washington,548,61.13988
6,1245000,"Miami city, Florida",Miami,370,35.98691
7,1304000,"Atlanta city, Georgia",Atlanta,122,133.43344
8,1714000,"Chicago city, Illinois",Chicago,176,227.3401
9,2507000,"Boston city, Massachusetts",Boston,148,48.34364


In [73]:
cities.merge(NEall, on='GEOID')

Unnamed: 0,GEOID,NAMELSAD,NAME,CSA,ALAND_mi,1-2units,+3units


In [45]:
NEall.columns

Index(['1-2units', '+3units'], dtype='object')