07-11-2018

**_Author: Dana Chermesh, Regional Planning intern_**


# US Metros comparison 
### comparison by the county level of 15 regions (CSA's) accross the country

## _Notebook no.4 -- HOUSING_ 
### retrieved from Census Bureau Building Permits Survey County and Place level for 2017 annually

----


In [18]:
import pandas as pd

# Data

The data were retrieved from the Census Bureau [Building Permits Survey](https://www.census.gov/construction/bps/), [Permits by County or Place](http://www2.census.gov/econ/bps).

For downloading the data, please go to the [County/](https://www2.census.gov/econ/bps/County/) page or the [Places/](https://www2.census.gov/econ/bps/Place/) page, and choose the [co2017a.txt](https://www2.census.gov/econ/bps/County/co2017a.txt) and the [ne2017a.txt](https://www2.census.gov/econ/bps/Place/Northeast%20Region/ne2017a.txt) datasets, respectively.

Data can be read directly to this notebook using pandas `read_table`, as bellow.

In [19]:
counties = pd.read_table('https://www2.census.gov/econ/bps/County/co2017a.txt', 
            header=0, sep=r'\,|\t', engine='python').iloc[:,:18]

counties.columns = counties.iloc[0]
counties = counties[1:].set_index(['Name'])

counties = counties.drop(['Code','Bldgs', 'Value'], axis=1)
counties.columns = ['State', 'County', '1unit', '2unit', '3-4unit', '+5unit']

counties = counties.astype(int)
counties['1-2units'] = counties['1unit'] + counties['2unit']
counties['+3units'] = counties['3-4unit'] + counties['+5unit']
counties = counties.drop(['1unit', '2unit', '3-4unit', '+5unit'], axis=1)

counties['State'] = counties['State'].apply(lambda x: '{0:0>2}'.format(x))
counties['County'] = counties['County'].apply(lambda x: '{0:0>3}'.format(x))

counties['STCO'] = counties[['State', 'County']].apply(lambda x: ''.join(x), axis=1)


print(counties.shape)
counties.head()

(3038, 5)


Unnamed: 0_level_0,State,County,1-2units,+3units,STCO
Name,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1
Autauga County,1,1,188,0,1001
Baldwin County,1,3,2299,118,1003
Barbour County,1,5,3,0,1005
Bibb County,1,7,10,0,1007
Blount County,1,9,18,0,1009


## Reading in geo-coded dataset
created on a different notebook, please refer to _**ADD NOTEBOOK NAME**_

In [20]:
data = pd.read_csv('data/USmetros_full.csv').iloc[:,:-2] \
        .drop(['Unnamed: 0', 'SHAPE_AREA'], axis=1)
data['STCO'] = data['STCO'].apply(lambda x: '{0:0>5}'.format(x))

data.head()

Unnamed: 0,CSA,CSA_name,County_name,STCO
0,488,"San Jose-San Francisco-Oakland, CA",Alameda,6001
1,488,"San Jose-San Francisco-Oakland, CA",Contra Costa,6013
2,488,"San Jose-San Francisco-Oakland, CA",Marin,6041
3,488,"San Jose-San Francisco-Oakland, CA",Napa,6055
4,488,"San Jose-San Francisco-Oakland, CA",San Benito,6069


## Merging datasets

In [21]:
HUcounties = counties.merge(data, on='STCO').set_index('County_name')
HUcounties = HUcounties.drop(['State', 'County'], axis=1)

print(HUcounties.shape)
HUcounties.head()

(270, 5)


Unnamed: 0_level_0,1-2units,+3units,STCO,CSA,CSA_name
County_name,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1
Alameda,2621,6637,6001,488,"San Jose-San Francisco-Oakland, CA"
Contra Costa,1693,291,6013,488,"San Jose-San Francisco-Oakland, CA"
Los Angeles,6697,14877,6037,348,"Los Angeles-Long Beach, CA"
Marin,94,0,6041,488,"San Jose-San Francisco-Oakland, CA"
Napa,105,78,6055,488,"San Jose-San Francisco-Oakland, CA"


In [22]:
HUcounties[HUcounties['CSA']==408].shape

(31, 5)

### Exporting all counties housing permits 2017 data to .csv

In [14]:
HUcounties.to_csv('data/HU17counties.csv')

## groupby CSAs to sum housing permits by Metro

In [23]:
huCSA = HUcounties.groupby(['CSA', 'CSA_name']).sum()#.iloc[:,:-2]

print(huCSA.shape)
huCSA

(15, 2)


Unnamed: 0_level_0,Unnamed: 1_level_0,1-2units,+3units
CSA,CSA_name,Unnamed: 2_level_1,Unnamed: 3_level_1
122,"Atlanta--Athens-Clarke County--Sandy Springs, GA",28028,9337
148,"Boston-Worcester-Providence, MA-RI-NH-CT",10331,10579
176,"Chicago-Naperville, IL-IN-WI",7225,3805
206,"Dallas-Fort Worth, TX-OK",36093,27829
216,"Denver-Aurora, CO",14815,13233
220,"Detroit-Warren-Ann Arbor, MI",8306,3429
288,"Houston-The Woodlands, TX",37031,6256
348,"Los Angeles-Long Beach, CA",11861,19223
370,"Miami-Fort Lauderdale-Port St. Lucie, FL",10163,13011
378,"Minneapolis-St. Paul, MN-WI",9753,6549


### Exporting CSAs housing permits 2017 data to .csv

In [16]:
huCSA.to_csv('data/HU17CSA.csv')

-----

# PLACES 
Major Cities within the Regions

### _Note: PLACES in the housing permits survey are separated to Midwest, Northeast, South and West Regions; data were downloaded through each of these and were concatenated_

In [43]:
forColumns

Index(['Survey', 'State', '6-Digit', 'County', 'Census Place', 'FIPS Place',
       'FIPS MCD', 'Pop', 'CSA', 'CBSA', 'Footnote', 'Central', 'Zip',
       'Region', 'Division', 'Number of', 'Place'],
      dtype='object')

In [92]:
places = pd.read_table('https://www2.census.gov/econ/bps/Place/Northeast%20Region/ne2017a.txt', 
            header=0, sep=r'\,|\t', engine='python').iloc[:,:28]

places.columns = places.iloc[0]
places = places[1:].set_index(['Name'])

places = places.drop(['Bldgs', 'Value'], axis=1)
places.columns = ['State', '6-Digit', 'County', 'Census Place',
                  'Place','FIPS MCD', 'Pop', 'CSA', 'CBSA',
                  'Footnote', 'Central', 'Zip','Region', 'Division', 
                  'Number of','1unit', '2unit', '3-4unit', '+5unit']
places = places.drop(['Central', 'Footnote', 'Census Place',
                      '6-Digit', 'FIPS MCD', 'Number of',
                      'Zip', 'Region', 'Division'], axis=1)

places['1unit'] = places['1unit'].astype(int)
places['2unit'] = places['2unit'].astype(int)
places['3-4unit'] = places['3-4unit'].astype(int)
places['+5unit'] = places['+5unit'].astype(int)

places['1-2units'] = places['1unit'] + places['2unit']
places['+3units'] = places['3-4unit'] + places['+5unit']
places = places.drop(['1unit', '2unit', '3-4unit', '+5unit'], axis=1)

places['State'] = places['State'].apply(lambda x: '{0:0>2}'.format(x))
places['Place'] = places['County'].apply(lambda x: '{0:0>5}'.format(x))

places['GEOID'] = places[['State', 'Place']].apply(lambda x: ''.join(x), axis=1)


print(places.shape)
places.head()

(5580, 9)


Unnamed: 0_level_0,State,County,Place,Pop,CSA,CBSA,1-2units,+3units,GEOID
Name,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1
Andover town,9,13,13,3303,278,25540,3,0,900013
Ansonia,9,9,9,19249,408,35300,6,0,900009
Ashford town,9,15,15,4317,148,49340,8,0,900015
Avon town,9,3,3,18098,278,25540,20,0,900003
Barkhamsted town,9,5,5,3799,999,99999,0,0,900005


In [95]:
places[places['State'] == '36']

Unnamed: 0_level_0,State,County,Place,Pop,CSA,CBSA,1-2units,+3units,GEOID
Name,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1
Addison village,36,101,00101,1763,999,99999,0,0,3600101
Airmont village,36,087,00087,8628,408,35620,7,0,3600087
Akron village,36,029,00029,2868,160,15380,5,0,3600029
Alabama town,36,037,00037,1869,999,99999,1,0,3600037
Albany,36,001,00001,97856,104,10580,9,109,3600001
Albion town,36,073,00073,3808,464,40380,2,0,3600073
Albion town,36,075,00075,1666,532,45060,20,0,3600075
Albion village,36,073,00073,6056,464,40380,0,0,3600073
Alden town,36,029,00029,8260,160,15380,5,0,3600029
Alden village,36,029,00029,2605,160,15380,3,0,3600029


### Reading in my Geocoded places table
Created by Dara Goldberg

In [35]:
cities = pd.read_excel('data/CSA Population+Change_2010-2017.xlsx', 
             sheet_name='Cities_pop+geoinfo').iloc[:,:5]

# setting GEOID to 7 digits to assure match
cities['GEOID'] = cities['GEOID'].apply(lambda x: '{0:0>7}'.format(x))
# setting GEOID to str
cities.GEOID = cities.GEOID.astype(str)

print(cities.shape)
cities

(19, 5)


Unnamed: 0,GEOID,NAMELSAD,NAME,CSA,ALAND_mi
0,644000,"Los Angeles city, California",Los Angeles,348,468.65867
1,653000,"Oakland city, California",Oakland,488,55.89604
2,667000,"San Francisco city, California",San Francisco,488,46.90564
3,668000,"San Jose city, California",San Jose,488,177.5141
4,820000,"Denver city, Colorado",Denver,216,153.30483
5,1150000,"Washington city, District of Columbia",Washington,548,61.13988
6,1245000,"Miami city, Florida",Miami,370,35.98691
7,1304000,"Atlanta city, Georgia",Atlanta,122,133.43344
8,1714000,"Chicago city, Illinois",Chicago,176,227.3401
9,2507000,"Boston city, Massachusetts",Boston,148,48.34364
