# Selected Economic Characteristics: Commuting to Work from the American Community Survey

**[Work in progress]**

This notebook downloads [selected economic characteristics (DP03)](https://data.census.gov/cedsci/table?tid=ACSDP5Y2018.DP03) from the American Community Survey 2018 5-Year Data.

Data source: [American Community Survey 5-Year Data 2018](https://www.census.gov/data/developers/data-sets/acs-5year.html)

Authors: Peter Rose (pwrose@ucsd.edu), Ilya Zaslavsky (zaslavsk@sdsc.edu)

In [1]:
import os
import pandas as pd
from pathlib import Path
import time

In [2]:
pd.options.display.max_rows = None  # display all rows
pd.options.display.max_columns = None  # display all columns

In [3]:
NEO4J_IMPORT = Path(os.getenv('NEO4J_IMPORT'))
print(NEO4J_IMPORT)

/Users/peter/Library/Application Support/Neo4j Desktop/Application/neo4jDatabases/database-9f7418e6-ef5d-4a2d-ae16-29a5a6814849/installation-4.1.0/import
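
`Path(os.getenv('NEO4J_IMPORT'))` raises a `TypeError` when the environment variable is not set. The following is a minimal sketch of a more forgiving lookup; the helper name `resolve_import_dir` and the fallback to the current directory are assumptions for illustration, not part of the original notebook.

```python
import os
from pathlib import Path


def resolve_import_dir(env_var="NEO4J_IMPORT", default="."):
    """Return the directory named by env_var, or default if it is unset."""
    value = os.getenv(env_var)
    if not value:
        # Hypothetical fallback: use a local directory instead of failing
        print(f"{env_var} is not set; falling back to {default!r}")
        value = default
    return Path(value)


import_dir = resolve_import_dir()
print(import_dir)
```

Failing loudly with a clear error message would be an equally reasonable choice if the files must end up in the Neo4j import directory.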


## Download selected variables

* [Selected economic characteristics for US](https://data.census.gov/cedsci/table?tid=ACSDP5Y2018.DP03)

* [List of variables as HTML](https://api.census.gov/data/2018/acs/acs5/profile/groups/DP03.html) or [JSON](https://api.census.gov/data/2018/acs/acs5/profile/groups/DP03/)

* [Description of variables](https://www2.census.gov/programs-surveys/acs/tech_docs/subject_definitions/2018_ACSSubjectDefinitions.pdf)

* [Example URLs for API](https://api.census.gov/data/2018/acs/acs5/profile/examples.html)

### Specify variables from DP03 group and assign property names

Names must follow the [Neo4j property naming conventions](https://neo4j.com/docs/getting-started/current/graphdb-concepts/#graphdb-naming-rules-and-recommendations).
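
As a rough illustration, the property names could be checked before use. The sketch below assumes a simplified reading of the convention (lower camelCase, ASCII letters and digits only); the linked Neo4j documentation remains the authoritative source, and `invalid_property_names` is a hypothetical helper.

```python
import re

# Assumed simplification: lower camelCase, ASCII letters and digits only
CAMEL_CASE = re.compile(r"[a-z][A-Za-z0-9]*")


def invalid_property_names(names):
    """Return the names that do not match the assumed camelCase pattern."""
    return [name for name in names if not CAMEL_CASE.fullmatch(name)]


sample = ["workers16YearsAndOver", "walkedToWorkPct", "DP03_0018E", "16workers"]
print(invalid_property_names(sample))  # ['DP03_0018E', '16workers']
```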

In [4]:
variables = {# COMMUTING TO WORK
             'DP03_0018E': 'workers16YearsAndOver',
             'DP03_0019E': 'droveAloneToWorkInCarTruckOrVan',
             'DP03_0019PE': 'droveAloneToWorkInCarTruckOrVanPct',
             'DP03_0020E': 'carpooledToWorkInCarTruckOrVan',
             'DP03_0020PE': 'carpooledToWorkInCarTruckOrVanPct',
             'DP03_0021E': 'publicTransportToWork',
             'DP03_0021PE': 'publicTransportToWorkPct',
             'DP03_0022E': 'walkedToWork',
             'DP03_0022PE': 'walkedToWorkPct',
             'DP03_0023E': 'otherMeansOfCommutingToWork',
             'DP03_0023PE': 'otherMeansOfCommutingToWorkPct',
             'DP03_0024E': 'workedAtHome',
             'DP03_0024PE': 'workedAtHomePct',
             'DP03_0025E': 'meanTravelTimeToWorkMinutes',
            }

In [5]:
fields = ",".join(variables.keys())

In [6]:
for v in variables.values():
    # percentages and the mean travel time are decimal values; all other estimates are counts
    if 'Pct' in v or v.startswith('mean'):
        print('c.' + v + ' = toFloat(row.' + v + '),')
    else:
        print('c.' + v + ' = toInteger(row.' + v + '),')

c.workers16YearsAndOver = toInteger(row.workers16YearsAndOver),
c.droveAloneToWorkInCarTruckOrVan = toInteger(row.droveAloneToWorkInCarTruckOrVan),
c.droveAloneToWorkInCarTruckOrVanPct = toFloat(row.droveAloneToWorkInCarTruckOrVanPct),
c.carpooledToWorkInCarTruckOrVan = toInteger(row.carpooledToWorkInCarTruckOrVan),
c.carpooledToWorkInCarTruckOrVanPct = toFloat(row.carpooledToWorkInCarTruckOrVanPct),
c.publicTransportToWork = toInteger(row.publicTransportToWork),
c.publicTransportToWorkPct = toFloat(row.publicTransportToWorkPct),
c.walkedToWork = toInteger(row.walkedToWork),
c.walkedToWorkPct = toFloat(row.walkedToWorkPct),
c.otherMeansOfCommutingToWork = toInteger(row.otherMeansOfCommutingToWork),
c.otherMeansOfCommutingToWorkPct = toFloat(row.otherMeansOfCommutingToWorkPct),
c.workedAtHome = toInteger(row.workedAtHome),
c.workedAtHomePct = toFloat(row.workedAtHomePct),
c.meanTravelTimeToWorkMinutes = toFloat(row.meanTravelTimeToWorkMinutes),


In [7]:
print(len(variables.keys()))

14


## Download county-level data using US Census API

In [8]:
url_county = f'https://api.census.gov/data/2018/acs/acs5/profile?get={fields}&for=county:*'
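
The query string can also be assembled with the standard library, which takes care of encoding spaces in geography names such as `zip code tabulation area`. This is only a sketch: `build_url` and `BASE_URL` are hypothetical names, and the notebook itself uses f-strings.

```python
from urllib.parse import quote, urlencode

BASE_URL = "https://api.census.gov/data/2018/acs/acs5/profile"


def build_url(fields, geography, within=None):
    """Build a request URL for the given variable codes and geography."""
    params = {"get": ",".join(fields), "for": geography}
    if within:
        params["in"] = within
    # keep ',', ':' and '*' readable; encode spaces as %20
    return BASE_URL + "?" + urlencode(params, safe=",:*", quote_via=quote)


print(build_url(["DP03_0018E", "DP03_0019E"], "county:*"))
print(build_url(["DP03_0018E"], "zip code tabulation area:*"))
print(build_url(["DP03_0018E"], "tract:*", within="state:06"))
```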

In [9]:
df = pd.read_json(url_county, dtype='str')
df.fillna('', inplace=True)
df.head()

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15
0,DP03_0018E,DP03_0019E,DP03_0019PE,DP03_0020E,DP03_0020PE,DP03_0021E,DP03_0021PE,DP03_0022E,DP03_0022PE,DP03_0023E,DP03_0023PE,DP03_0024E,DP03_0024PE,DP03_0025E,state,county
1,16833,14542,86.4,1726,10.3,6,0.0,219,1.3,147,0.9,193,1.1,16.7,28,151
2,4491,3853,85.8,204,4.5,0,0.0,79,1.8,265,5.9,90,2.0,32.3,28,111
3,2990,2558,85.6,255,8.5,9,0.3,32,1.1,73,2.4,63,2.1,30.9,28,019
4,9340,7695,82.4,1116,11.9,17,0.2,69,0.7,135,1.4,308,3.3,25.3,28,057


##### Add column names

In [10]:
df = df[1:].copy() # skip first row of labels
columns = list(variables.values())
columns.append('stateFips')
columns.append('countyFips')
df.columns = columns
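
Assigning column names by position works as long as the API returns the columns in the requested order. A more defensive variant, sketched below with plain Python lists, looks up each name from the header row instead. The function `label_rows` is hypothetical, and the sample is abbreviated to three variables from the first data row shown above.

```python
# Header row and first data row of an API response, abbreviated for illustration
rows = [
    ["DP03_0018E", "DP03_0019E", "DP03_0019PE", "state", "county"],
    ["16833", "14542", "86.4", "28", "151"],
]
names = {
    "DP03_0018E": "workers16YearsAndOver",
    "DP03_0019E": "droveAloneToWorkInCarTruckOrVan",
    "DP03_0019PE": "droveAloneToWorkInCarTruckOrVanPct",
    "state": "stateFips",
    "county": "countyFips",
}


def label_rows(rows, names):
    """Map the header codes to property names and return one dict per data row."""
    header = [names[code] for code in rows[0]]  # KeyError flags an unexpected column
    return [dict(zip(header, row)) for row in rows[1:]]


records = label_rows(rows, names)
print(records[0]["countyFips"])  # 151
```

The resulting list of dictionaries can be passed to `pd.DataFrame(records)` if a DataFrame is needed.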

Remove Puerto Rico (stateFips = 72) to limit the data to the 50 US states and the District of Columbia

TODO handle data for Puerto Rico (GeoNames represents Puerto Rico as a country)

In [11]:
df.query("stateFips != '72'", inplace=True)

Save the list of state FIPS codes (required later to download tract data state by state)

In [12]:
stateFips = list(df['stateFips'].unique())
stateFips.sort()
print(stateFips)

['01', '02', '04', '05', '06', '08', '09', '10', '11', '12', '13', '15', '16', '17', '18', '19', '20', '21', '22', '23', '24', '25', '26', '27', '28', '29', '30', '31', '32', '33', '34', '35', '36', '37', '38', '39', '40', '41', '42', '44', '45', '46', '47', '48', '49', '50', '51', '53', '54', '55', '56']


In [13]:
df.head()

Unnamed: 0,workers16YearsAndOver,droveAloneToWorkInCarTruckOrVan,droveAloneToWorkInCarTruckOrVanPct,carpooledToWorkInCarTruckOrVan,carpooledToWorkInCarTruckOrVanPct,publicTransportToWork,publicTransportToWorkPct,walkedToWork,walkedToWorkPct,otherMeansOfCommutingToWork,otherMeansOfCommutingToWorkPct,workedAtHome,workedAtHomePct,meanTravelTimeToWorkMinutes,stateFips,countyFips
1,16833,14542,86.4,1726,10.3,6,0.0,219,1.3,147,0.9,193,1.1,16.7,28,151
2,4491,3853,85.8,204,4.5,0,0.0,79,1.8,265,5.9,90,2.0,32.3,28,111
3,2990,2558,85.6,255,8.5,9,0.3,32,1.1,73,2.4,63,2.1,30.9,28,019
4,9340,7695,82.4,1116,11.9,17,0.2,69,0.7,135,1.4,308,3.3,25.3,28,057
5,3407,3066,90.0,197,5.8,0,0.0,49,1.4,0,0.0,95,2.8,28.6,28,015


In [14]:
# Example data
df[(df['stateFips'] == '06') & (df['countyFips'] == '073')]

Unnamed: 0,workers16YearsAndOver,droveAloneToWorkInCarTruckOrVan,droveAloneToWorkInCarTruckOrVanPct,carpooledToWorkInCarTruckOrVan,carpooledToWorkInCarTruckOrVanPct,publicTransportToWork,publicTransportToWorkPct,walkedToWork,walkedToWorkPct,otherMeansOfCommutingToWork,otherMeansOfCommutingToWorkPct,workedAtHome,workedAtHomePct,meanTravelTimeToWorkMinutes,stateFips,countyFips
1869,1603486,1223159,76.3,138748,8.7,46506,2.9,46313,2.9,36799,2.3,111961,7.0,26.0,06,073


In [15]:
df['source'] = 'American Community Survey 5 year'
df['aggregationLevel'] = 'Admin2'

### Save data

In [16]:
df.to_csv(NEO4J_IMPORT / "03a-USCensusDP03CommutingAdmin2.csv", index=False)
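
FIPS codes are identifiers, not numbers, so their leading zeros matter. When a saved file is read back with pandas for inspection, the default type inference converts them to integers; passing `dtype=str` keeps them intact. The small example below uses an in-memory CSV with the San Diego County values shown above, rather than the real file.

```python
import io

import pandas as pd

csv_text = "stateFips,countyFips,workers16YearsAndOver\n06,073,1603486\n"

# Default type inference turns '06' into the integer 6
as_numbers = pd.read_csv(io.StringIO(csv_text))
# Reading every column as a string preserves the leading zeros
as_strings = pd.read_csv(io.StringIO(csv_text), dtype=str)

print(as_numbers.loc[0, "stateFips"])  # 6
print(as_strings.loc[0, "stateFips"])  # 06
```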

## Download ZIP Code Tabulation Area (ZCTA) data using US Census API

In [17]:
url_zip = f'https://api.census.gov/data/2018/acs/acs5/profile?get={fields}&for=zip%20code%20tabulation%20area:*'

In [18]:
df = pd.read_json(url_zip, dtype='str')
df.fillna('', inplace=True)
df.head()

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14
0,DP03_0018E,DP03_0019E,DP03_0019PE,DP03_0020E,DP03_0020PE,DP03_0021E,DP03_0021PE,DP03_0022E,DP03_0022PE,DP03_0023E,DP03_0023PE,DP03_0024E,DP03_0024PE,DP03_0025E,zip code tabulation area
1,3519,3173,90.2,164,4.7,18,0.5,57,1.6,32,0.9,75,2.1,25.9,43964
2,25440,20406,80.2,2227,8.8,742,2.9,253,1.0,313,1.2,1499,5.9,25.8,28216
3,36363,28558,78.5,1827,5.0,1240,3.4,268,0.7,398,1.1,4072,11.2,28.8,28277
4,14840,11700,78.8,1199,8.1,155,1.0,229,1.5,275,1.9,1282,8.6,28.0,28278


##### Add column names

In [19]:
df = df[1:].copy() # skip first row
columns = list(variables.values())
columns.append('postalCode')
df.columns = columns

In [20]:
df.head()

Unnamed: 0,workers16YearsAndOver,droveAloneToWorkInCarTruckOrVan,droveAloneToWorkInCarTruckOrVanPct,carpooledToWorkInCarTruckOrVan,carpooledToWorkInCarTruckOrVanPct,publicTransportToWork,publicTransportToWorkPct,walkedToWork,walkedToWorkPct,otherMeansOfCommutingToWork,otherMeansOfCommutingToWorkPct,workedAtHome,workedAtHomePct,meanTravelTimeToWorkMinutes,postalCode
1,3519,3173,90.2,164,4.7,18,0.5,57,1.6,32,0.9,75,2.1,25.9,43964
2,25440,20406,80.2,2227,8.8,742,2.9,253,1.0,313,1.2,1499,5.9,25.8,28216
3,36363,28558,78.5,1827,5.0,1240,3.4,268,0.7,398,1.1,4072,11.2,28.8,28277
4,14840,11700,78.8,1199,8.1,155,1.0,229,1.5,275,1.9,1282,8.6,28.0,28278
5,13303,11272,84.7,901,6.8,131,1.0,309,2.3,178,1.3,512,3.8,17.2,28303


In [21]:
# Example data
df.query("postalCode == '90210'")

Unnamed: 0,workers16YearsAndOver,droveAloneToWorkInCarTruckOrVan,droveAloneToWorkInCarTruckOrVanPct,carpooledToWorkInCarTruckOrVan,carpooledToWorkInCarTruckOrVanPct,publicTransportToWork,publicTransportToWorkPct,walkedToWork,walkedToWorkPct,otherMeansOfCommutingToWork,otherMeansOfCommutingToWorkPct,workedAtHome,workedAtHomePct,meanTravelTimeToWorkMinutes,postalCode
30897,8489,6386,75.2,491,5.8,91,1.1,123,1.4,139,1.6,1259,14.8,25.9,90210


In [22]:
df['source'] = 'American Community Survey 5 year'
df['aggregationLevel'] = 'PostalCode'
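
ACS estimates that cannot be computed are reported as large negative placeholder values rather than as empty fields. The sketch below blanks such values before saving. It assumes that this extract may contain them; the set of codes is illustrative and should be checked against the ACS documentation, and both `blank_sentinels` and the sample row are hypothetical.

```python
# Assumed placeholder codes; consult the ACS documentation for the authoritative list
SENTINELS = {"-666666666", "-888888888", "-999999999"}


def blank_sentinels(record):
    """Replace placeholder codes in one row (a dict of strings) with empty strings."""
    return {key: ("" if value in SENTINELS else value) for key, value in record.items()}


# Hypothetical row for an area without any workers
row = {"workers16YearsAndOver": "0", "meanTravelTimeToWorkMinutes": "-666666666"}
print(blank_sentinels(row))
```

With a DataFrame of strings, `df.replace(list(SENTINELS), '')` would have a similar effect.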

### Save data

In [23]:
df.to_csv(NEO4J_IMPORT / "03a-USCensusDP03CommutingZip.csv", index=False)

## Download tract-level data using US Census API
Tract-level data are only available by state, so we need to loop over all states.

In [24]:
def get_tract_data(state):
    """Download tract-level data for one state, given its two-digit FIPS code."""
    url_tract = f'https://api.census.gov/data/2018/acs/acs5/profile?get={fields}&for=tract:*&in=state:{state}'
    df = pd.read_json(url_tract, dtype='str')
    # pause between requests to limit the load on the API
    time.sleep(1)
    # skip first row of labels
    df = df[1:].copy()
    # add column names
    columns = list(variables.values())
    columns.append('stateFips')
    columns.append('countyFips')
    columns.append('tract')
    df.columns = columns
    return df
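
Because the loop issues one request per state, a single transient network error would abort the whole download. A small retry wrapper is one possible safeguard. This is a sketch only: `fetch_with_retry` is a hypothetical helper, and the number of attempts and the delay are arbitrary choices.

```python
import time


def fetch_with_retry(fetch, state, attempts=3, delay=1.0):
    """Call fetch(state); on failure wait and retry, re-raising after the last attempt."""
    for attempt in range(1, attempts + 1):
        try:
            return fetch(state)
        except Exception as error:
            if attempt == attempts:
                raise
            print(f"state {state}: attempt {attempt} failed ({error}); retrying")
            # wait a little longer after each failed attempt
            time.sleep(delay * attempt)
```

It could be used as `fetch_with_retry(get_tract_data, state)` in place of the direct call to `get_tract_data(state)`.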

In [25]:
df = pd.concat((get_tract_data(state) for state in stateFips))
df.fillna('', inplace=True)

In [26]:
df['tract'] = df['stateFips'] + df['countyFips'] + df['tract']
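
The concatenation above produces an 11-character identifier: 2 digits for the state, 3 for the county, and 6 for the tract. The sketch below splits such an identifier back into its parts, which is a quick way to verify that no leading zeros were lost; `split_tract_geoid` is a hypothetical helper.

```python
def split_tract_geoid(geoid):
    """Split an 11-digit tract identifier into (state, county, tract) strings."""
    if len(geoid) != 11 or not geoid.isdigit():
        raise ValueError(f"expected an 11-digit tract identifier, got {geoid!r}")
    return geoid[:2], geoid[2:5], geoid[5:]


print(split_tract_geoid("06073008324"))  # ('06', '073', '008324')
```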

In [27]:
df['source'] = 'American Community Survey 5 year'
df['aggregationLevel'] = 'Tract'

In [28]:
# Example data for San Diego County
df[(df['stateFips'] == '06') & (df['countyFips'] == '073')].head()

Unnamed: 0,workers16YearsAndOver,droveAloneToWorkInCarTruckOrVan,droveAloneToWorkInCarTruckOrVanPct,carpooledToWorkInCarTruckOrVan,carpooledToWorkInCarTruckOrVanPct,publicTransportToWork,publicTransportToWorkPct,walkedToWork,walkedToWorkPct,otherMeansOfCommutingToWork,otherMeansOfCommutingToWorkPct,workedAtHome,workedAtHomePct,meanTravelTimeToWorkMinutes,stateFips,countyFips,tract,source,aggregationLevel
56,3735,2774,74.3,218,5.8,0,0.0,45,1.2,36,1.0,662,17.7,22.6,06,073,06073008324,American Community Survey 5 year,Tract
57,1252,876,70.0,110,8.8,98,7.8,56,4.5,72,5.8,40,3.2,18.6,06,073,06073008339,American Community Survey 5 year,Tract
58,3644,2971,81.5,363,10.0,42,1.2,9,0.2,141,3.9,118,3.2,26.5,06,073,06073008347,American Community Survey 5 year,Tract
59,5529,4445,80.4,650,11.8,47,0.9,43,0.8,135,2.4,209,3.8,23.6,06,073,06073008354,American Community Survey 5 year,Tract
60,3234,2596,80.3,163,5.0,121,3.7,52,1.6,26,0.8,276,8.5,20.7,06,073,06073008505,American Community Survey 5 year,Tract


### Save data

In [29]:
df.to_csv(NEO4J_IMPORT / "03a-USCensusDP03CommutingTract.csv", index=False)

In [30]:
df.shape

(73056, 19)