### Transit Scores from Census Blocks
This notebook contains code to calculate transit and walk scores for each Census Tract in San Diego County. The general approach is as follows:
* get centroid latitude/longitude points for each Census Block
* get population count for each Census Block
* get walkscore and transit score for each centroid location using the [WalkScore.com](https://www.walkscore.com/) API
* create a walkscore and transit score for each Census Tract by weighting the walkscore/transit scores of all Blocks by Block population and then averaging
* save the weighted scores per Census Tract for use in the county map
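
As a sketch of the weighting step, write $p_b$ for the population of block $b$ and $s_b$ for its walkscore or transit score (these symbols are introduced here for illustration only). The score for tract $T$ is then the population-weighted mean over the blocks it contains:

$$S_T = \frac{\sum_{b \in T} p_b \, s_b}{\sum_{b \in T} p_b}$$

For example, a tract with two blocks, one with 100 residents and a score of 50 and one with 300 residents and a score of 10, gets $(100 \cdot 50 + 300 \cdot 10) / (100 + 300) = 20$.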

Population counts per Census Block are only available from the 2010 Census. The more recent American Community Survey estimates don't go down to that geographic resolution. ACS estimates can be obtained for Census Block Groups instead (see the TransitScores_From_Census_BlockGroups notebook for code that uses more recent Census data).

In [1]:
import shapefile
import json
import pandas as pd
import numpy as np
import requests

#### Create dataframe with Census Block centroid information
The shapefile can be found at https://www.census.gov/cgi-bin/geo/shapefiles/index.php
* Select year 2010
* Select Blocks
* Under Blocks (2010), select State (California)
* Select County (San Diego County)

Centroid latitude and longitude are stored as 'INTPTLAT10' and 'INTPTLON10'

In [2]:
shp_path = "data_blocks/tl_2010_06073_tabblock10.shp"
sf = shapefile.Reader(shp_path)

# The first entry in sf.fields is the DBF deletion flag, not a data column
fields = [x[0] for x in sf.fields][1:]

# Keep the 15 attribute columns of every block record
tmp_list = [record[0:15] for record in sf.records()]

df_centroids = pd.DataFrame(tmp_list, columns=fields)
df_centroids = df_centroids.drop(['MTFCC10','UR10','UACE10','UATYP10','FUNCSTAT10',
                                  'ALAND10','AWATER10','NAME10'], axis=1)
df_centroids.head()

Unnamed: 0,STATEFP10,COUNTYFP10,TRACTCE10,BLOCKCE10,GEOID10,INTPTLAT10,INTPTLON10
0,6,73,16902,1159,60730169021159,32.9028933,-116.9070116
1,6,73,16902,1012,60730169021012,32.9608914,-116.8167736
2,6,73,13310,2013,60730133102013,32.6363816,-116.9703056
3,6,73,16701,2009,60730167012009,32.8435707,-116.9558789
4,6,73,16902,1186,60730169021186,32.9292921,-116.8574745


#### Create dataframe with block population information
This data can be found at https://www.census.gov/geo/maps-data/data/tiger-data.html
* Under Select 2010 Census, select Population & Housing Unit Counts
* Select California

The downloaded zipfile is 428 MB because it contains population information for every Census Block in California. Population per block is stored in 'POP10'.

In [3]:
SANDIEGO_FIPS = "073"

shp_path = "data_blocks/tabblock2010_06_pophu.shp"
sf = shapefile.Reader(shp_path)
fields = [x[0] for x in sf.fields][1:]

# The file covers all of California; keep only San Diego County blocks.
# record[1] is COUNTYFP10 and the first 8 values are the attribute columns.
tmp_list = [record[0:8] for record in sf.records()
            if record[1] == SANDIEGO_FIPS]

df_pop = pd.DataFrame(tmp_list, columns=fields)
df_pop = df_pop.drop(['STATEFP10','COUNTYFP10','TRACTCE10','BLOCKCE',
                      'PARTFLG','HOUSING10'], axis=1)
df_pop.head()

Unnamed: 0,BLOCKID10,POP10
0,60730163012002,174
1,60730162022016,0
2,60730160002007,0
3,60730160002002,123
4,60730160001011,76


#### Join dataframes and save intermediate file to CSV

In [4]:
df_blocks = pd.merge(df_centroids, df_pop,
                     left_on='GEOID10', right_on='BLOCKID10')
df_blocks = df_blocks.drop('BLOCKID10', axis=1)
df_blocks.head()
df_blocks.to_csv('data/BlocksWithCentroidsPop.csv', header=True)
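
`pd.merge` defaults to an inner join, so a block that appears in only one of the two dataframes is dropped without warning. The following is a minimal sketch of a sanity check for the join above, run on made-up data; the toy frames and block IDs are illustrative and do not come from the Census files.

```python
import pandas as pd

# Toy stand-ins for df_centroids and df_pop (all values are made up)
centroids = pd.DataFrame({'GEOID10': ['A1', 'A2', 'A3'],
                          'INTPTLAT10': [32.90, 32.96, 32.64]})
pop = pd.DataFrame({'BLOCKID10': ['A1', 'A2', 'B9'],
                    'POP10': [174, 0, 123]})

# An outer join with indicator=True adds a '_merge' column that records
# whether each row came from the left frame, the right frame, or both
check = pd.merge(centroids, pop, how='outer',
                 left_on='GEOID10', right_on='BLOCKID10',
                 indicator=True)

# Rows found in only one of the two tables
unmatched = check[check['_merge'] != 'both']
print(unmatched[['GEOID10', 'BLOCKID10', '_merge']])
```

An empty `unmatched` frame on the real data would indicate that every block has both a centroid and a population count.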

#### Get walkscore and transit score for each Census block
This section requires the [WalkScore.com API](https://www.walkscore.com/professional/api.php). The free API is limited to 5,000 calls per day, but San Diego County has 43,415 blocks, so keep that in mind if you want to collect this data yourself (we obtained special permission for a one-time allowance of 50,000 calls). The full run of roughly 43,000 calls took about 7 hours; I'd estimate a 5,000-call run would take around 2 hours. A CSV file with the walk/transit scores ('BlocksWithTransit.csv') is provided in this repository and is loaded below, so the collection code is commented out.
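
To make the daily limit concrete, the sketch below splits the blocks into batches of at most 5,000 calls, one batch per day. `daily_batches` is a hypothetical helper that is not part of the original pipeline, and the sketch assumes the quota resets once per day.

```python
import math

DAILY_LIMIT = 5000   # free-tier calls per day, as stated above
NUM_BLOCKS = 43415   # Census Blocks in San Diego County, as stated above

def daily_batches(num_items, limit):
    """Return (start, stop) index pairs, each covering at most `limit` items."""
    return [(start, min(start + limit, num_items))
            for start in range(0, num_items, limit)]

batches = daily_batches(NUM_BLOCKS, DAILY_LIMIT)
num_days = math.ceil(NUM_BLOCKS / DAILY_LIMIT)

print(len(batches), "batches over", num_days, "days")
print("last batch:", batches[-1])
```

Each pair could be passed to `df_blocks.iloc[start:stop]` so that a single run of the collection loop stays within the quota.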

Calls to the Walkscore.com API need to be formatted as follows (all one line):

`url = 'http://api.walkscore.com/score?format=json&lat=47.6085&lon=-122.3295&transit=1&wsapikey=YOUR_API_KEY'`

Replace the example `lat` and `lon` values and `YOUR_API_KEY` with your own.
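
As an alternative to concatenating strings by hand, the query string can be assembled from a dictionary with the standard library. This is a minimal sketch: `build_walkscore_url` is a hypothetical helper name, and the parameter names are copied from the example URL above.

```python
from urllib.parse import urlencode

BASE_URL = 'http://api.walkscore.com/score'

def build_walkscore_url(lat, lon, api_key):
    """Build a score request URL for one latitude/longitude pair."""
    params = {
        'format': 'json',
        'lat': lat,
        'lon': lon,
        'transit': 1,          # also request the transit score
        'wsapikey': api_key,
    }
    return BASE_URL + '?' + urlencode(params)

url = build_walkscore_url(47.6085, -122.3295, 'YOUR_API_KEY')
print(url)
```

With the example coordinates, this reproduces the URL shown above.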

In [5]:
#begin_str = 'http://api.walkscore.com/score?format=json'
#end_str = '&transit=1&wsapikey=YOUR_API_KEY'
#
#blocks_walk = []
#blocks_transit = []
#
#for index, row in df_blocks.iterrows():
#    mid_str = '&lat=' + str(row.INTPTLAT10) + '&lon=' + str(row.INTPTLON10)
#    url = begin_str + mid_str + end_str
#    r = requests.get(url)
#    json_data = r.json()
#
#    # A score missing from the response is recorded as NaN
#    walk = json_data.get('walkscore', np.nan)
#    if 'transit' in json_data:
#        transit = json_data['transit']['score']
#    elif walk == 0:
#        # No transit data and a walkscore of 0: record a transit score of 0
#        transit = 0
#    else:
#        transit = np.nan
#
#    blocks_walk.append(walk)
#    blocks_transit.append(transit)
#
#df_blocks['BLOCK_WALK'] = pd.Series(blocks_walk)
#df_blocks['BLOCK_TRANSIT'] = pd.Series(blocks_transit)
#df_blocks.to_csv('data_blocks/BlocksWithTransit.csv', header=True)

In [7]:
df_blocks = pd.read_csv("data/BlocksWithTransit.csv", index_col=0)
df_blocks.head()

Unnamed: 0,STATEFP10,COUNTYFP10,TRACTCE10,BLOCKCE10,GEOID10,INTPTLAT10,INTPTLON10,POP10,BLOCK_WALK,BLOCK_TRANSIT
0,6,73,16902,1159,60730169021159,32.902893,-116.907012,0,0.0,0.0
1,6,73,16902,1012,60730169021012,32.960891,-116.816774,0,0.0,0.0
2,6,73,13310,2013,60730133102013,32.636382,-116.970306,156,30.0,32.0
3,6,73,16701,2009,60730167012009,32.843571,-116.955879,83,50.0,
4,6,73,16902,1186,60730169021186,32.929292,-116.857474,42,0.0,0.0


#### Calculate weighted walk/transit scores across each Census Tract
Only blocks that have a walkscore (or transit score) count toward a tract's total population. Blocks without a score therefore have no effect on the tract score, but excluding them may skew the weighting among the remaining blocks.

The file created here ('CensusTract_Transit_Blocks.csv') is used in the RASP_Tract notebook.
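
The same calculation can also be written without an explicit loop over tracts. The following is a sketch on made-up data, not the notebook's own method; `weighted_score` is a hypothetical helper, and the column names match those of `df_blocks`.

```python
import numpy as np
import pandas as pd

# Toy block-level data (all values are made up)
blocks = pd.DataFrame({
    'TRACTCE10':     ['000100', '000100', '000100', '000201'],
    'POP10':         [200, 600, 50, 80],
    'BLOCK_TRANSIT': [40.0, 20.0, np.nan, np.nan],
})

def weighted_score(df, score_col):
    """Population-weighted mean of score_col per tract, skipping NaN scores."""
    scored = df.dropna(subset=[score_col])
    weighted_sum = (scored['POP10'] * scored[score_col]).groupby(scored['TRACTCE10']).sum()
    total_pop = scored.groupby('TRACTCE10')['POP10'].sum()
    # Tracts with no scored blocks are added back as NaN
    return (weighted_sum / total_pop).reindex(df['TRACTCE10'].unique())

tract_transit = weighted_score(blocks, 'BLOCK_TRANSIT')
print(tract_transit)
```

In this toy data the block with 50 residents has no transit score, so it is left out of both the numerator and the denominator for tract `000100`, and tract `000201` has no scored blocks at all.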

In [8]:
# Convert Tract codes to zero-padded six-character strings
df_blocks['TRACTCE10'] = df_blocks['TRACTCE10'].astype(str).str.zfill(6)

tracts = df_blocks.TRACTCE10.unique()
prefix = '06073'

# Calculate population-weighted scores across each Census tract
tract_list = []
for tract in tracts:
    in_tract = df_blocks['TRACTCE10'] == tract

    # Only blocks that have a walkscore contribute to the walk average
    walk_blocks = df_blocks.loc[in_tract & df_blocks['BLOCK_WALK'].notna()]
    tot_pop = walk_blocks.POP10.sum()
    tot_walk = (walk_blocks.POP10 * walk_blocks.BLOCK_WALK).sum()
    weight_walk = tot_walk/tot_pop

    # Likewise, only blocks that have a transit score contribute to the transit average
    transit_blocks = df_blocks.loc[in_tract & df_blocks['BLOCK_TRANSIT'].notna()]
    tot_pop = transit_blocks.POP10.sum()
    tot_transit = (transit_blocks.POP10 * transit_blocks.BLOCK_TRANSIT).sum()
    weight_transit = tot_transit/tot_pop

    geoid2 = prefix + tract
    tract_list.append([geoid2, weight_walk, weight_transit])
    
# Create new dataframe of Tract values
blocks_transit = pd.DataFrame(tract_list, columns=['GEOID','TRACT_WALK','TRACT_TRANSIT'])
blocks_transit.to_csv('data/CensusTract_Transit_Blocks.csv', header=True)
blocks_transit.head()



Unnamed: 0,GEOID,TRACT_WALK,TRACT_TRANSIT
0,6073016902,2.259718,0.0
1,6073013310,20.139261,37.504178
2,6073016701,24.755481,0.0
3,6073016804,43.78017,
4,6073016810,14.006805,
