# Selected Economic Characteristics: Health Insurance Coverage from the American Community Survey

**[Work in progress]**

This notebook downloads [selected economic characteristics (DP03)](https://data.census.gov/cedsci/table?tid=ACSDP5Y2018.DP03) from the American Community Survey 5-Year Data (2009-2018).

Data source: [American Community Survey 5-Year Data (2009-2018)](https://www.census.gov/data/developers/data-sets/acs-5year.html)

Authors: Peter Rose (pwrose@ucsd.edu), Ilya Zaslavsky (zaslavsk@sdsc.edu)

In [1]:
import os
import pandas as pd
from pathlib import Path
import time

In [2]:
pd.options.display.max_rows = None  # display all rows
pd.options.display.max_columns = None  # display all columns

In [3]:
# Neo4j import directory; raises KeyError if the NEO4J_IMPORT environment variable is not set
NEO4J_IMPORT = Path(os.environ['NEO4J_IMPORT'])
print(NEO4J_IMPORT)

/Users/peter/Library/Application Support/Neo4j Desktop/Application/neo4jDatabases/database-19636412-9e74-4bac-8a4c-c6c8b49bb9d3/installation-4.1.0/import


## Download selected variables

* [Selected economic characteristics for US](https://data.census.gov/cedsci/table?tid=ACSDP5Y2018.DP03)

* [List of variables as HTML](https://api.census.gov/data/2018/acs/acs5/profile/groups/DP03.html) or [JSON](https://api.census.gov/data/2018/acs/acs5/profile/groups/DP03/)

* [Description of variables](https://www2.census.gov/programs-surveys/acs/tech_docs/subject_definitions/2018_ACSSubjectDefinitions.pdf)

* [Example URLs for API](https://api.census.gov/data/2018/acs/acs5/profile/examples.html)

### Specify variables from DP03 group and assign property names

Names must follow the [Neo4j property naming conventions](https://neo4j.com/docs/getting-started/current/graphdb-concepts/#graphdb-naming-rules-and-recommendations).
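As a quick sanity check, candidate property names can be tested against a camelCase pattern before they are used. This is only a sketch: the regular expression and the helper name `is_valid_property_name` are assumptions made for illustration, not rules copied from the Neo4j documentation.

```python
import re

# Assumed pattern: a lowercase first letter followed by letters or digits only
CAMEL_CASE = re.compile(r'^[a-z][A-Za-z0-9]*$')

def is_valid_property_name(name):
    """Return True if name looks like a camelCase identifier (hypothetical helper)."""
    return bool(CAMEL_CASE.match(name))

examples = ['withPublicCoveragePct', 'With Public Coverage', 'pct-insured']
results = {name: is_valid_property_name(name) for name in examples}
print(results)
```

Names containing spaces or hyphens would need backtick quoting in Cypher, so rejecting them early keeps the generated queries simple.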

In [4]:
variables = {# HEALTH INSURANCE COVERAGE
             'DP03_0095E': 'civilianNoninstitutionalizedPopulation',
             'DP03_0096E': 'withHealthInsuranceCoverage',
             'DP03_0096PE': 'withHealthInsuranceCoveragePct',
             'DP03_0097E': 'withPrivateHealthInsurance',
             'DP03_0097PE': 'withPrivateHealthInsurancePct',
             'DP03_0098E': 'withPublicCoverage',
             'DP03_0098PE': 'withPublicCoveragePct',
             'DP03_0099E': 'noHealthInsuranceCoverage',
             'DP03_0099PE': 'noHealthInsuranceCoveragePct',
            }

In [5]:
fields = ",".join(variables.keys())

In [6]:
for v in variables.values():
    if 'Pct' in v:
        print('h.' + v + ' = toFloat(row.' + v + '),')
    else:
        print('h.' + v + ' = toInteger(row.' + v + '),')

h.civilianNoninstitutionalizedPopulation = toInteger(row.civilianNoninstitutionalizedPopulation),
h.withHealthInsuranceCoverage = toInteger(row.withHealthInsuranceCoverage),
h.withHealthInsuranceCoveragePct = toFloat(row.withHealthInsuranceCoveragePct),
h.withPrivateHealthInsurance = toInteger(row.withPrivateHealthInsurance),
h.withPrivateHealthInsurancePct = toFloat(row.withPrivateHealthInsurancePct),
h.withPublicCoverage = toInteger(row.withPublicCoverage),
h.withPublicCoveragePct = toFloat(row.withPublicCoveragePct),
h.noHealthInsuranceCoverage = toInteger(row.noHealthInsuranceCoverage),
h.noHealthInsuranceCoveragePct = toFloat(row.noHealthInsuranceCoveragePct),
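The printed lines look like assignments intended for a Cypher `SET` clause. A minimal sketch of building the whole clause as one string, without the trailing comma, is shown here. The helper name `build_set_clause` and the use of `SET` are assumptions, and the sketch tests for a `Pct` suffix rather than `Pct` anywhere in the name.

```python
def build_set_clause(names, node='h', row='row'):
    """Build a Cypher SET clause: *Pct fields become floats, all others integers."""
    assignments = []
    for name in names:
        cast = 'toFloat' if name.endswith('Pct') else 'toInteger'
        assignments.append(f'{node}.{name} = {cast}({row}.{name})')
    # Join with commas so that the last assignment has no trailing comma
    return 'SET ' + ',\n    '.join(assignments)

clause = build_set_clause(['withPublicCoverage', 'withPublicCoveragePct'])
print(clause)
```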


In [7]:
print(len(variables.keys()))

9


## Download county-level data using US Census API

In [8]:
url_county = f'https://api.census.gov/data/2018/acs/acs5/profile?get={fields}&for=county:*'

In [9]:
df = pd.read_json(url_county, dtype='str')
df.fillna('', inplace=True)
df.head()

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,10
0,DP03_0095E,DP03_0096E,DP03_0096PE,DP03_0097E,DP03_0097PE,DP03_0098E,DP03_0098PE,DP03_0099E,DP03_0099PE,state,county
1,46637,38131,81.8,19632,42.1,22656,48.6,8506,18.2,28,151
2,11920,10119,84.9,6012,50.4,5472,45.9,1801,15.1,28,111
3,8195,7501,91.5,4048,49.4,4201,51.3,694,8.5,28,019
4,23252,20610,88.6,13867,59.6,9319,40.1,2642,11.4,28,057


##### Add column names

In [10]:
df = df[1:].copy() # skip first row of labels
columns = list(variables.values())
columns.append('stateFips')
columns.append('countyFips')
df.columns = columns
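The API returns a JSON array of rows whose first row holds the variable codes, which is why the first row is dropped above. The following offline sketch performs the same step on a small sample taken from the output above, but renames columns by code instead of by position. The names `sample`, `short_variables`, `geo_names`, and `df_sample` are illustrative only.

```python
import pandas as pd

# First row is the header returned by the API; the remaining rows are data
sample = [
    ['DP03_0095E', 'DP03_0096E', 'state', 'county'],
    ['46637', '38131', '28', '151'],
    ['8195', '7501', '28', '019'],
]
short_variables = {
    'DP03_0095E': 'civilianNoninstitutionalizedPopulation',
    'DP03_0096E': 'withHealthInsuranceCoverage',
}
geo_names = {'state': 'stateFips', 'county': 'countyFips'}

# Use the header row as column labels, then map each code to its property name
df_sample = pd.DataFrame(sample[1:], columns=sample[0])
df_sample = df_sample.rename(columns={**short_variables, **geo_names})
print(df_sample.columns.tolist())
```

Renaming by code does not depend on the order in which the API returns the columns, whereas assigning `df.columns` positionally assumes the response order matches the order of `variables`.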

Remove Puerto Rico (stateFips = 72) to limit the data to the 50 US states and the District of Columbia.

TODO: handle data for Puerto Rico (GeoNames represents Puerto Rico as a country).

In [11]:
df.query("stateFips != '72'", inplace=True)

Save the list of state FIPS codes (required later to download tract data state by state).

In [12]:
stateFips = sorted(df['stateFips'].unique())
print(stateFips)

['01', '02', '04', '05', '06', '08', '09', '10', '11', '12', '13', '15', '16', '17', '18', '19', '20', '21', '22', '23', '24', '25', '26', '27', '28', '29', '30', '31', '32', '33', '34', '35', '36', '37', '38', '39', '40', '41', '42', '44', '45', '46', '47', '48', '49', '50', '51', '53', '54', '55', '56']


In [13]:
df.head()

Unnamed: 0,civilianNoninstitutionalizedPopulation,withHealthInsuranceCoverage,withHealthInsuranceCoveragePct,withPrivateHealthInsurance,withPrivateHealthInsurancePct,withPublicCoverage,withPublicCoveragePct,noHealthInsuranceCoverage,noHealthInsuranceCoveragePct,stateFips,countyFips
1,46637,38131,81.8,19632,42.1,22656,48.6,8506,18.2,28,151
2,11920,10119,84.9,6012,50.4,5472,45.9,1801,15.1,28,111
3,8195,7501,91.5,4048,49.4,4201,51.3,694,8.5,28,019
4,23252,20610,88.6,13867,59.6,9319,40.1,2642,11.4,28,057
5,9963,9122,91.6,5799,58.2,4680,47.0,841,8.4,28,015


In [14]:
# Example data for San Diego County (FIPS 06073)
df[(df['stateFips'] == '06') & (df['countyFips'] == '073')]

Unnamed: 0,civilianNoninstitutionalizedPopulation,withHealthInsuranceCoverage,withHealthInsuranceCoveragePct,withPrivateHealthInsurance,withPrivateHealthInsurancePct,withPublicCoverage,withPublicCoveragePct,noHealthInsuranceCoverage,noHealthInsuranceCoveragePct,stateFips,countyFips
1869,3204470,2924868,91.3,2185866,68.2,1056014,33.0,279602,8.7,06,073


In [15]:
df['source'] = 'American Community Survey 5 year'
df['aggregationLevel'] = 'Admin2'

### Save data

In [16]:
df.to_csv(NEO4J_IMPORT / "03a-USCensusDP03HealthInsuranceAdmin2.csv", index=False)
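The FIPS codes are written as text with leading zeros. If the file is later read back with pandas, for example to inspect it, the default type inference turns them into integers. A small round-trip sketch using an in-memory buffer instead of the real file (`df_fips` and `buffer` are illustrative names):

```python
import io
import pandas as pd

df_fips = pd.DataFrame({'stateFips': ['06'], 'countyFips': ['073']})

buffer = io.StringIO()
df_fips.to_csv(buffer, index=False)

# Default parsing infers integers and drops the leading zeros
buffer.seek(0)
as_numbers = pd.read_csv(buffer)

# dtype=str keeps the codes exactly as written
buffer.seek(0)
as_text = pd.read_csv(buffer, dtype=str)

print(as_numbers['stateFips'][0], as_text['stateFips'][0])
```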

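## Download zip-level data using US Census API

The Census API reports these data by ZIP Code Tabulation Area (ZCTA); the notebook stores the ZCTA code in the `postalCode` column.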

In [17]:
url_zip = f'https://api.census.gov/data/2018/acs/acs5/profile?get={fields}&for=zip%20code%20tabulation%20area:*'

In [18]:
df = pd.read_json(url_zip, dtype='str')
df.fillna('', inplace=True)
df.head()

Unnamed: 0,0,1,2,3,4,5,6,7,8,9
0,DP03_0095E,DP03_0096E,DP03_0096PE,DP03_0097E,DP03_0097PE,DP03_0098E,DP03_0098PE,DP03_0099E,DP03_0099PE,zip code tabulation area
1,8642,8047,93.1,5451,63.1,3843,44.5,595,6.9,43964
2,51044,44666,87.5,32359,63.4,16177,31.7,6378,12.5,28216
3,71605,68080,95.1,63326,88.4,10197,14.2,3525,4.9,28277
4,27266,25164,92.3,22128,81.2,4916,18.0,2102,7.7,28278


##### Add column names

In [19]:
df = df[1:].copy() # skip first row
columns = list(variables.values())
columns.append('postalCode')
df.columns = columns

In [20]:
df.head()

Unnamed: 0,civilianNoninstitutionalizedPopulation,withHealthInsuranceCoverage,withHealthInsuranceCoveragePct,withPrivateHealthInsurance,withPrivateHealthInsurancePct,withPublicCoverage,withPublicCoveragePct,noHealthInsuranceCoverage,noHealthInsuranceCoveragePct,postalCode
1,8642,8047,93.1,5451,63.1,3843,44.5,595,6.9,43964
2,51044,44666,87.5,32359,63.4,16177,31.7,6378,12.5,28216
3,71605,68080,95.1,63326,88.4,10197,14.2,3525,4.9,28277
4,27266,25164,92.3,22128,81.2,4916,18.0,2102,7.7,28278
5,27078,23973,88.5,16638,61.4,11833,43.7,3105,11.5,28303


In [21]:
# Example data
df.query("postalCode == '90210'")

Unnamed: 0,civilianNoninstitutionalizedPopulation,withHealthInsuranceCoverage,withHealthInsuranceCoveragePct,withPrivateHealthInsurance,withPrivateHealthInsurancePct,withPublicCoverage,withPublicCoveragePct,noHealthInsuranceCoverage,noHealthInsuranceCoveragePct,postalCode
30897,19866,18989,95.6,16129,81.2,6086,30.6,877,4.4,90210


In [22]:
df['source'] = 'American Community Survey 5 year'
df['aggregationLevel'] = 'PostalCode'

### Save data

In [23]:
df.to_csv(NEO4J_IMPORT / "03a-USCensusDP03HealthInsuranceZip.csv", index=False)

## Download tract-level data using US Census API
Tract-level data are only available by state, so we need to loop over all states.

In [24]:
def get_tract_data(state):
    """Download the selected DP03 variables for all tracts in one state."""
    url_tract = f'https://api.census.gov/data/2018/acs/acs5/profile?get={fields}&for=tract:*&in=state:{state}'
    df = pd.read_json(url_tract, dtype='str')
    time.sleep(1)  # pause between consecutive requests
    # skip first row of labels
    df = df[1:].copy()
    # add column names
    columns = list(variables.values()) + ['stateFips', 'countyFips', 'tract']
    df.columns = columns
    return df
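One possible refinement, sketched here, is to separate URL construction from the download so that the query strings can be checked without network access. `BASE_URL` and `tract_url` are hypothetical names, and the two-variable field list is shortened for illustration.

```python
BASE_URL = 'https://api.census.gov/data/2018/acs/acs5/profile'

def tract_url(fields, state):
    """Build the tract-level query URL for one state FIPS code (hypothetical helper)."""
    return f'{BASE_URL}?get={fields}&for=tract:*&in=state:{state}'

# One URL per state; the state codes must keep their leading zeros
urls = [tract_url('DP03_0095E,DP03_0096E', state) for state in ['01', '02']]
print(urls[0])
```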

In [25]:
df = pd.concat((get_tract_data(state) for state in stateFips))
df.fillna('', inplace=True)
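Because each per-state frame keeps its own row labels, the concatenated frame contains repeated index values. This does not affect the CSV written with `index=False`, but it matters for label-based lookups. A toy sketch of the difference, using made-up frames `a` and `b`:

```python
import pandas as pd

# Two small frames standing in for per-state results; both start at label 1
a = pd.DataFrame({'tract': ['000100', '000200']}, index=[1, 2])
b = pd.DataFrame({'tract': ['000300']}, index=[1])

stacked = pd.concat([a, b])                        # keeps the original labels
renumbered = pd.concat([a, b], ignore_index=True)  # assigns a fresh 0..n-1 index

print(stacked.index.tolist(), renumbered.index.tolist())
```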

In [26]:
# Build the full tract identifier: state (2 digits) + county (3 digits) + tract (6 digits)
df['tract'] = df['stateFips'] + df['countyFips'] + df['tract']
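The concatenation relies on all three parts being strings with their leading zeros intact. A short sketch with one San Diego County tract checks that the result is 11 characters long. The frame `parts` and the column name `geoid` are illustrative, not part of the notebook.

```python
import pandas as pd

parts = pd.DataFrame({
    'stateFips': ['06'],
    'countyFips': ['073'],
    'tract': ['008324'],
})

# String concatenation keeps leading zeros: 2 + 3 + 6 = 11 characters
parts['geoid'] = parts['stateFips'] + parts['countyFips'] + parts['tract']
all_eleven = bool(parts['geoid'].str.len().eq(11).all())
print(parts['geoid'][0], all_eleven)
```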

In [27]:
df['source'] = 'American Community Survey 5 year'
df['aggregationLevel'] = 'Tract'

In [28]:
# Example data for San Diego County
df[(df['stateFips'] == '06') & (df['countyFips'] == '073')].head()

Unnamed: 0,civilianNoninstitutionalizedPopulation,withHealthInsuranceCoverage,withHealthInsuranceCoveragePct,withPrivateHealthInsurance,withPrivateHealthInsurancePct,withPublicCoverage,withPublicCoveragePct,noHealthInsuranceCoverage,noHealthInsuranceCoveragePct,stateFips,countyFips,tract,source,aggregationLevel
56,7210,6886,95.5,6090,84.5,1829,25.4,324,4.5,06,073,06073008324,American Community Survey 5 year,Tract
57,2049,1928,94.1,1801,87.9,173,8.4,121,5.9,06,073,06073008339,American Community Survey 5 year,Tract
58,7016,6782,96.7,5473,78.0,2182,31.1,234,3.3,06,073,06073008347,American Community Survey 5 year,Tract
59,9756,9535,97.7,8266,84.7,2122,21.8,221,2.3,06,073,06073008354,American Community Survey 5 year,Tract
60,6576,5673,86.3,4454,67.7,1728,26.3,903,13.7,06,073,06073008505,American Community Survey 5 year,Tract


### Save data

In [29]:
df.to_csv(NEO4J_IMPORT / "03a-USCensusDP03InsuranceTract.csv", index=False)

In [30]:
df.shape

(73056, 14)