Title: RP- Spatial Accessibility of COVID-19 Healthcare Resources in Illinois Pre-Processing Script
---

This is a script that automates the data gathering and pre-processing for our reproduction of Kang et al.

**Reproduction of**: Rapidly measuring spatial accessibility of COVID-19 healthcare resources: a case study of Illinois, USA

Original study *by* Kang, J. Y., A. Michels, F. Lyu, Shaohua Wang, N. Agbodo, V. L. Freeman, and Shaowen Wang. 2020. Rapidly measuring spatial accessibility of COVID-19 healthcare resources: a case study of Illinois, USA. International Journal of Health Geographics 19 (1):1–17. DOI:[10.1186/s12942-020-00229-x](https://ij-healthgeographics.biomedcentral.com/articles/10.1186/s12942-020-00229-x).

Reproduction Authors: Joe Holler, Kufre Udoh, Derrick Burt, Drew An-Pham, & Spring '21 Middlebury Geog 0323.

Reproduction Materials Available at: [RP-Kang Repository](https://github.com/derrickburt/RP-Kang-Improvements)

Created: `29 Jun 2021`
Revised: `29 Jun 2021`

### Modules
Import necessary libraries to run this model.
See `requirements.txt` for the library versions used for this analysis.

In [None]:
import numpy as np
import pandas as pd
import geopandas as gpd
import networkx as nx
import osmnx as ox
from shapely.geometry import Point, LineString, Polygon
import matplotlib.pyplot as plt
from tqdm import tqdm
import multiprocessing as mp
import folium, itertools, os, time, warnings
from IPython.display import display, clear_output

warnings.filterwarnings("ignore")

## Check Directories

Because we have restructured the repository for replication, we need to check our working directory and make necessary adjustments.

In [None]:
os.getcwd()

In [None]:
## Set the working directory to the repository root
if os.getcwd() == '/home/jovyan/work/RP-Kang2020/procedure/code':
    os.chdir('../../')
# if we are already in '/home/jovyan/work/RP-Kang2020', nothing needs to change

os.getcwd()
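
The hard-coded paths above only match one JupyterHub layout. As a minimal sketch, assuming the repository root can be recognized by a `data` and a `procedure` folder (an assumption based on the paths used in this notebook), a hypothetical helper such as `find_repo_root` could walk upward from the current directory until it finds them:

```python
from pathlib import Path

def find_repo_root(start, markers=("data", "procedure")):
    """Return the first ancestor of `start` (inclusive) that holds all marker folders."""
    start = Path(start).resolve()
    for candidate in [start, *start.parents]:
        if all((candidate / m).is_dir() for m in markers):
            return candidate
    raise FileNotFoundError(f"no folder containing {markers} above {start}")

# Hypothetical usage in this notebook: os.chdir(find_repo_root(os.getcwd()))
```

The marker folder names are illustrative and would need to match the actual repository layout.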

### Load and Plot the Street Network

In [None]:
%%time
# path is relative to the repository root; use the same path for the check, save, and load
graph_path = "data/raw/private/Chicago_Network_Buffer.graphml"
if not os.path.exists(graph_path):
    # pulling the drive network the first time will take a while
    G = ox.graph_from_place('Chicago', network_type='drive', buffer_dist=24140.2)
    ox.save_graphml(G, graph_path)
else:
    G = ox.load_graphml(graph_path, node_type=str)
ox.plot_graph(G)
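
The `buffer_dist` argument is given in meters, and 24140.2 m corresponds to roughly 15 miles. A quick check of that conversion (the helper name is illustrative, not part of OSMnx):

```python
METERS_PER_MILE = 1609.344  # international mile, exact by definition

def miles_to_meters(miles):
    """Convert a distance in miles to meters."""
    return miles * METERS_PER_MILE

buffer_m = miles_to_meters(15)
print(round(buffer_m, 1))  # 24140.2
```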

In [None]:
## Count how often each speed limit value occurs on the road network
# turn nodes and edges into GeoDataFrames
nodes, edges = ox.graph_to_gdfs(G, nodes=True, edges=True)

# count the unique maxspeed values
print(edges['maxspeed'].value_counts())
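
The `maxspeed` tag comes from OpenStreetMap, so its values are typically free-form strings (for example `'30 mph'`), sometimes lists of several tags, and often missing. Below is a minimal sketch of normalizing such values to a number. It assumes mph when no unit is given, which seems reasonable for Illinois but is an assumption; `parse_maxspeed` is a hypothetical helper, not an OSMnx function.

```python
import re

def parse_maxspeed(value, default=None):
    """Return a speed in mph from an OSM-style maxspeed value, or `default`."""
    if isinstance(value, (list, tuple)):
        # An edge can carry several tags; take the lowest parsable one.
        speeds = [parse_maxspeed(v) for v in value]
        speeds = [s for s in speeds if s is not None]
        return min(speeds) if speeds else default
    if not isinstance(value, str):
        return default  # covers None and NaN
    match = re.search(r"\d+(?:\.\d+)?", value)
    if match is None:
        return default
    speed = float(match.group())
    if "km" in value.lower() or "kph" in value.lower():
        speed *= 0.621371  # km/h -> mph
    return speed

print(parse_maxspeed("30 mph"), parse_maxspeed(["25 mph", "35 mph"]))
```

Taking the minimum for multi-valued edges is a conservative choice; a mean or the first value would be equally defensible.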

### Automate/Pre-Process Census Data  with API

*Note:* you will need to install an additional module called `censusdata`.

To do this, open a terminal in the CyberGISX environment and type:

```pip install censusdata```

**Note:** we deviate from the original paper's methodology here by bringing in census tracts within a larger buffer distance.

#### Tract

In [None]:
# load module
import censusdata as cd

In [None]:
%%time

# Read in all Illinois tracts using census API
pop_api = cd.download('acs5', 2018,
                             cd.censusgeo([('state', '17'), ('tract', '*')]),
                             ['B01001_001E', 'B01001_016E', 'B01001_017E', 'B01001_018E', 'B01001_019E', 
                              'B01001_020E', 'B01001_021E', 'B01001_022E', 'B01001_023E', 'B01001_024E', 
                              'B01001_025E', 'B01001_040E', 'B01001_041E', 'B01001_042E', 'B01001_043E', 
                              'B01001_044E', 'B01001_045E', 'B01001_046E', 'B01001_047E', 'B01001_048E',
                              'B01001_049E'])

# check
#pop_api.head()

In [None]:
## Reformat and Rename columns
# Sum + Rename 50+ population
pop_api['OverFifty'] = pop_api.iloc[:, 1:21].sum(axis=1)

# Rename Total 
pop_api['TotalPop'] = pop_api['B01001_001E']

# Drop irrelevant columns
pop_api = pop_api.drop(pop_api.columns[0:21], axis=1)

# Check
pop_api.head()
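
The positional slices above depend on the column order returned by the API. As a sketch of an alternative that selects by column name, the toy table below uses made-up counts and only three of the ACS variables listed earlier:

```python
import pandas as pd

# Toy table with made-up counts; column names mirror ACS variables used above.
toy = pd.DataFrame({
    "B01001_001E": [100, 200],   # total population
    "B01001_016E": [10, 20],     # one of the male age brackets
    "B01001_040E": [5, 15],      # one of the female age brackets
})

total_col = "B01001_001E"
age_cols = [c for c in toy.columns if c != total_col]

summary = pd.DataFrame({
    "OverFifty": toy[age_cols].sum(axis=1),
    "TotalPop": toy[total_col],
})
print(summary)
```

Because the age columns are chosen by name, reordering or adding variables in the download call would not silently change the sum.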

In [None]:
# TODO: add code to save into the raw data folder
# Create column from the index -- we will need the tract ID for a join
pop_api['TRACTCE'] = pop_api.index

# Convert to string 
pop_api['TRACTCE'] = pop_api['TRACTCE'].astype(str)

# Slice last 6 digits (tract id)
pop_api['TRACTCE'] = pop_api['TRACTCE'].str.slice(-6)

# Check
pop_api.head()
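
Slicing the last six characters works because the string form of each index entry ends with the tract code. Here is a sketch of the same idea with a regular expression that also recovers the state and county codes. It assumes the index renders as text containing `state:SS> county:CCC> tract:TTTTTT`, and the sample string is made up for illustration.

```python
import re

GEO_PATTERN = re.compile(r"state:(\d{2})> county:(\d{3})> tract:(\d{6})")

def geoid_from_label(label):
    """Build an 11-digit tract GEOID (state + county + tract) from a label string."""
    match = GEO_PATTERN.search(label)
    if match is None:
        raise ValueError(f"no state/county/tract found in {label!r}")
    return "".join(match.groups())

# Made-up label in the assumed format.
sample = ("Census Tract 101, Cook County, Illinois: Summary level: 140, "
          "state:17> county:031> tract:010100")
geoid = geoid_from_label(sample)
print(geoid, geoid[-6:])
```

The full GEOID is unique statewide, whereas the six-digit tract code can repeat from one county to the next.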

In [None]:
### Note: We are using a larger subset of data here
len(pop_api)

#### Zip Codes

In [None]:
# Read in all Illinois zip code tabulation areas (ZCTAs) using census API
zip_api = cd.download('acs5', 2019,
                             cd.censusgeo([('state', '17'), ('zip code tabulation area', '*')]),
                             ['B01003_001E'])

# check
zip_api.head()

In [None]:
# rename population column
pop_col = {"B01003_001E":"pop"}
zip_api = zip_api.rename(columns=pop_col)

# Create column from the index -- we will need the ZCTA code for a join
zip_api['ZCTA5CE10'] = zip_api.index

# Convert to string 
zip_api['ZCTA5CE10'] = zip_api['ZCTA5CE10'].astype(str)

# Slice characters 6-10 (the five-digit ZCTA code)
zip_api['ZCTA5CE10'] = zip_api['ZCTA5CE10'].str.slice(6,11)

# Check
zip_api.head()

In [None]:
len(zip_api)

### Automate/Pre-Process COVID-19 with Requests

Download COVID-19 case data that will be joined to zip code geographies.

In [None]:
import requests
import geojson
import json

In [None]:
# Unfortunately, I have not found how to access archived COVID-19 case data, so this data covers cases from 4/6/2020 - 6/30/2021
# set file path
fp_covid = 'https://idph.illinois.gov/DPHPublicInformation/api/COVIDExport/GetZip'

# make request
r_covid = requests.get(fp_covid)

# save request as dataframe
covid_cases = pd.DataFrame.from_dict(json.loads(r_covid.content))

# check
covid_cases.head()

In [None]:
# rename zip to ZCTA5CE10 (the join key) and confirmed_cases to cases
cases_col = {'zip':"ZCTA5CE10", "confirmed_cases":"cases"}
covid_cases = covid_cases.rename(columns=cases_col)

In [None]:
# Merge covid case data with zip code populations so cases can be normalized
covid_api = covid_cases.merge(zip_api, how="inner", on="ZCTA5CE10")
covid_api
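
Once cases and population sit in the same table, normalizing is a single division. Below is a minimal sketch with made-up numbers, using the same column names as above and guarding against zero-population ZCTAs:

```python
import pandas as pd

# Made-up values for illustration only.
cases = pd.DataFrame({"ZCTA5CE10": ["60601", "60602", "99999"], "cases": [50, 30, 7]})
pop = pd.DataFrame({"ZCTA5CE10": ["60601", "60602"], "pop": [10000, 0]})

# The inner join keeps only codes present in both tables.
merged = cases.merge(pop, how="inner", on="ZCTA5CE10")

# Mask zero populations so the rate becomes NaN instead of infinity.
merged["cases_per_1000"] = merged["cases"] / merged["pop"].where(merged["pop"] > 0) * 1000
print(merged)
```

Both key columns need the same dtype (strings here) for the merge to match rows.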

### Automate/Pre-Process Census Boundary Shapefiles with FTP Site

#### Note: Here, we extract *census tracts* and *zip code geographies* based on their spatial relationship (intersection) with the street network

Census TIGER/Line shapefiles can be accessed from ftp://ftp2.census.gov/geo/tiger/ using `!wget`.

File path for Cook County 2010 tracts: ftp://ftp2.census.gov/geo/tiger//TIGER2010/TRACT/2010/tl_2010_17031_tract10.zip

File path for Illinois 2018 tracts: ftp://ftp2.census.gov/geo/tiger//TIGER2018/TRACT/tl_2018_17_tract.zip

#### Tracts

In [None]:
# check directory -- we want to download the raw data directly into our pre-processing data folder
%ls

In [None]:
# Download census tract shapefiles for all of Illinois (state FIPS 17) to data/raw/public/Pre-Processing/
if not os.path.exists('data/raw/public/Pre-Processing/tl_2018_17_tract.zip'):
    !wget -P data/raw/public/Pre-Processing/ ftp://ftp2.census.gov/geo/tiger//TIGER2018/TRACT/tl_2018_17_tract.zip
    # Extract shapefiles
    !unzip -d data/raw/public/Pre-Processing/ data/raw/public/Pre-Processing/tl_2018_17_tract.zip
# read in all census tracts for Illinois
tracts_shp = gpd.read_file('data/raw/public/Pre-Processing/tl_2018_17_tract.shp')
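
The `!wget` and `!unzip` lines rely on IPython shell escapes and on those tools being installed. Below is a sketch of the extract step in plain Python, using only the standard library. `extract_once` is a hypothetical helper, and the download step is left as a comment so the sketch runs offline.

```python
import zipfile
from pathlib import Path

def extract_once(zip_path, out_dir, expected_name):
    """Extract `zip_path` into `out_dir` unless `expected_name` is already there."""
    out_dir = Path(out_dir)
    target = out_dir / expected_name
    if not target.exists():
        out_dir.mkdir(parents=True, exist_ok=True)
        with zipfile.ZipFile(zip_path) as archive:
            archive.extractall(out_dir)
    return target

# Downloading could be done with urllib.request.urlretrieve(url, zip_path)
# from the standard library before calling extract_once.
```

Checking for the extracted file rather than the archive means a deleted `.zip` does not trigger a fresh download.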

In [None]:
# set crs to WGS 84
tracts_shp = tracts_shp.to_crs(epsg=4326)

# check crs
print(tracts_shp.crs)

# check length
print(len(tracts_shp))

# check column names
tracts_shp.head()

In [None]:
# rename columns for join
new_names = {"GEOID10":"GEOID", "TRACTCE10":"TRACTCE"}
tracts_shp = tracts_shp.rename(columns=new_names)

In [None]:
# Join Tracts shape with Tracts Population data
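## Drop duplicate tract codes first so that the join does not add extra rows
## (TRACTCE is unique only within a county, so a statewide join on it is approximate)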
overfifty_data = tracts_shp.merge(pop_api.drop_duplicates(subset=['TRACTCE']), how='left', on="TRACTCE")
len(overfifty_data)
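
Six-digit tract codes are unique only within a county, which is why a join on `TRACTCE` alone can match rows from the wrong county. The toy example below uses made-up numbers to compare a join on `TRACTCE` with a join on the full 11-digit `GEOID`:

```python
import pandas as pd

# Two made-up tracts that share a tract code but sit in different counties.
shapes = pd.DataFrame({
    "GEOID": ["17031010100", "17043010100"],
    "TRACTCE": ["010100", "010100"],
})
pop = pd.DataFrame({
    "GEOID": ["17031010100", "17043010100"],
    "TRACTCE": ["010100", "010100"],
    "TotalPop": [4000, 2500],
})

by_tract = shapes.merge(pop[["TRACTCE", "TotalPop"]], how="left", on="TRACTCE")
by_geoid = shapes.merge(pop[["GEOID", "TotalPop"]], how="left", on="GEOID")

print(len(by_tract), len(by_geoid))  # 4 2
```

Dropping duplicates before the join avoids the extra rows but keeps only one county's population for a shared code, so joining on `GEOID` would be the safer key if both tables carry it.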

##### Zip codes

In [None]:
# check directory -- we want to download the raw zip code data directly into our data folder
%ls

In [None]:
# Download zip code shapefiles to data/raw/public/Pre-Processing/ for entire US
## I have not yet found a way to select by state before extracting
if not os.path.exists('data/raw/public/Pre-Processing/cb_2018_us_zcta510_500k.zip'):
    !wget -P data/raw/public/Pre-Processing/ ftp://ftp2.census.gov/geo/tiger/GENZ2018/shp/cb_2018_us_zcta510_500k.zip
    # Extract shapefiles
    !unzip -d data/raw/public/Pre-Processing/ data/raw/public/Pre-Processing/cb_2018_us_zcta510_500k.zip
    # read in zip code data
    usa_zip = gpd.read_file('data/raw/public/Pre-Processing/cb_2018_us_zcta510_500k.shp')
else:
    # read in zip code data
    usa_zip = gpd.read_file('data/raw/public/Pre-Processing/cb_2018_us_zcta510_500k.shp')

In [None]:
# select only Illinois zip code data
## there may be some zip codes outside of Illinois included in this selection;
## it's only meant to reduce the size, and the proper data will be ensured with
## a join in the coming code cells
ill_zip = usa_zip.loc[(usa_zip['GEOID10'] >= '60002') & (usa_zip['GEOID10'] <= '63433')]

# set crs to WGS 84
ill_zip = ill_zip.to_crs(epsg=4326)

# check crs
print(ill_zip.crs)

# check length
print(len(ill_zip))

# check column names
ill_zip.head()
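
The range filter above compares strings, which works because ZCTA codes are fixed-width, zero-padded digits, so alphabetical order matches numeric order. A small sketch of that behavior (the helper name is illustrative):

```python
def in_zip_range(code, low="60002", high="63433"):
    """True if a 5-character ZIP/ZCTA string falls within [low, high]."""
    return len(code) == 5 and code.isdigit() and low <= code <= high

codes = ["60601", "63101", "53202", "6060"]
print([c for c in codes if in_zip_range(c)])  # ['60601', '63101']
```

The code 63101 passes the filter even though it belongs to St. Louis, Missouri, which illustrates why the later join is still needed to narrow the selection.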

### Join Population and COVID Data to Zip and Tract Shapefiles and Save To Data Folder

#### Tracts

In [None]:
# Join pop_api to census tracts
pop_tracts_geo = tracts_shp.merge(pop_api, how='left', on='TRACTCE')
# Drop extra columns
pop_tracts_geo = pop_tracts_geo.drop(pop_tracts_geo.columns[5:10], axis=1)
pop_tracts_geo

#### Zip

In [None]:
# Join covid_api to zip code geographies
covid_zip_geo = ill_zip.merge(covid_api, how='left', on='ZCTA5CE10')
# Drop extra columns
# covid_zip_geo = covid_zip_geo.drop(covid_zip_geo.columns[5:10], axis=1)
covid_zip_geo

### Extract Population and Covid Data only where they intersect the street nodes

In [None]:
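# find zip code polygons that contain at least one node of G
# (GeoSeries.contains compares rows pairwise by index, so a spatial join is used instead)
zip_hits = gpd.sjoin(covid_zip_geo, nodes[['geometry']], how='inner')
covid_selection = pd.Series(covid_zip_geo.index.isin(zip_hits.index), index=covid_zip_geo.index)
covid_selection.value_counts()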

In [None]:
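# find tract polygons that contain at least one node of G
# (GeoSeries.contains compares rows pairwise by index, so a spatial join is used instead)
tract_hits = gpd.sjoin(pop_tracts_geo, nodes[['geometry']], how='inner')
tract_selection = pd.Series(pop_tracts_geo.index.isin(tract_hits.index), index=pop_tracts_geo.index)
tract_selection.value_counts()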

### Hospital Data

Note that a value of 999 is treated as "NULL"/"NA", so hospitals with that value are filtered out. This data contains the number of ICU beds and ventilators at each hospital.

In [None]:
hospitals = gpd.read_file('./data/raw/public/HospitalData/Chicago_Hospital_Info.shp')
hospitals.head()
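
A minimal sketch of filtering out the 999 sentinel follows. The column names and values are hypothetical, since the shapefile's actual field names are not shown here.

```python
import pandas as pd

# Hypothetical column names and made-up values; the real shapefile may differ.
toy_hospitals = pd.DataFrame({
    "name": ["A", "B", "C"],
    "icu_beds": [12, 999, 30],
    "vents": [5, 8, 999],
})

SENTINEL = 999
capacity_cols = ["icu_beds", "vents"]

# Keep only rows where no capacity column carries the sentinel.
valid = toy_hospitals[(toy_hospitals[capacity_cols] != SENTINEL).all(axis=1)]
print(valid["name"].tolist())  # ['A']
```

Whether a hospital should be dropped when only one of the two fields is missing depends on which resource is being analyzed, so filtering per column may be preferable.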

### Automate/Pre-Process Hospital Data with Requests -- Still Drafting This

documentation for requests: https://docs.python-requests.org/en/master/

In [None]:
# TODO: add code to save into the raw data folder
# set file paths for general care and icu beds
fp_gen = 'https://opendata.arcgis.com/datasets/6ac5e325468c4cb9b905f1728d6fbf0f_0.geojson'
fp_icu = 'https://healthdata.gov/resource/uqq2-txqb.json'

# requests for general care and icu beds
r_gen = requests.get(fp_gen)
r_icu = requests.get(fp_icu)

# get hospitals 
hospitals_gen = gpd.GeoDataFrame.from_features(geojson.loads(r_gen.content),  crs="EPSG:26971")
hospitals_icu = pd.DataFrame.from_dict(json.loads(r_icu.content))

# filter for icu and general care
hospitals_gen = hospitals_gen.loc[(hospitals_gen['STATE'] == 'IL') & (hospitals_gen['TYPE'] == 'GENERAL ACUTE CARE')]
# ERROR here: only the first thousand rows come back, likely the API's default page size
hospitals_icu = hospitals_icu[['hospital_pk', 'collection_week', 'state', 'city', 'ccn', 'hospital_name', 'zip', 'address', 'total_icu_beds_7_day_avg']]

# capitalize join column (assign the result to avoid modifying a slice in place)
hospitals_icu = hospitals_icu.rename(columns={'address':'ADDRESS'})

# join
hospitals_api = hospitals_gen.merge(hospitals_icu, on='ADDRESS')

# check 
hospitals_icu['city'].unique()
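
The thousand-row limit noted above looks like a default page size. Resource URLs of this form are typically served by a Socrata (SODA) API, which accepts `$limit` and `$offset` query parameters, although that has not been confirmed for this endpoint. Below is a sketch of paging through such a source. `fetch_all` is a hypothetical helper, and the `requests` wiring is left as a comment so the sketch runs offline.

```python
def fetch_all(fetch_page, page_size=1000):
    """Collect rows from a paged source until a short or empty page is returned.

    `fetch_page(limit, offset)` must return a list of rows.
    """
    rows, offset = [], 0
    while True:
        page = fetch_page(page_size, offset)
        rows.extend(page)
        if len(page) < page_size:
            return rows
        offset += page_size

# Hypothetical wiring with requests (not run here):
# def fetch_page(limit, offset):
#     r = requests.get(fp_icu, params={"$limit": limit, "$offset": offset})
#     r.raise_for_status()
#     return r.json()
# hospitals_icu = pd.DataFrame(fetch_all(fetch_page))
```

Filtering by state on the server side, if the API supports it, would reduce the number of pages considerably.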