# Working with demand data



In [1]:
import pandas as pd
import geopandas as gpd

from covidcaremap.data import (PUBLISHED_DATA_DIR, published_data_path, 
                                  PROCESSED_DATA_DIR, processed_data_path,
                                  EXTERNAL_DATA_DIR, external_data_path)

## Cases (actuals)

There are open, regularly updated datasets of confirmed cases and deaths from two sources: USAFacts and The New York Times.

### NY Times Data

The NY Times data shows cumulative cases and deaths per state or county per day.

This data is pulled dynamically from their GitHub repository via these `covidcaremap.cases` package functions:

In [2]:
from covidcaremap.cases import get_nytimes_cases_by_county, get_nytimes_cases_by_state

nytimes_county_cases = get_nytimes_cases_by_county()
nytimes_state_cases = get_nytimes_cases_by_state()

In [3]:
nytimes_county_cases

| | date | county | state | fips | cases | deaths |
|---|---|---|---|---|---|---|
| 0 | 2020-01-21 | Snohomish | Washington | 53061.0 | 1 | 0 |
| 1 | 2020-01-22 | Snohomish | Washington | 53061.0 | 1 | 0 |
| 2 | 2020-01-23 | Snohomish | Washington | 53061.0 | 1 | 0 |
| 3 | 2020-01-24 | Cook | Illinois | 17031.0 | 1 | 0 |
| 4 | 2020-01-24 | Snohomish | Washington | 53061.0 | 1 | 0 |
| ... | ... | ... | ... | ... | ... | ... |
| 30838 | 2020-04-03 | Sublette | Wyoming | 56035.0 | 1 | 0 |
| 30839 | 2020-04-03 | Sweetwater | Wyoming | 56037.0 | 3 | 0 |
| 30840 | 2020-04-03 | Teton | Wyoming | 56039.0 | 32 | 0 |
| 30841 | 2020-04-03 | Uinta | Wyoming | 56041.0 | 1 | 0 |
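
Because the NY Times counts are cumulative, daily new cases and deaths can be recovered by differencing within each county. The sketch below shows one way to do that with pandas on a tiny hand-made frame that only mimics the columns above (the county names, FIPS codes, and counts are hypothetical). It keys on state and county rather than `fips`, on the assumption that `fips` may be missing for some rows. Corrections in the source data can make a cumulative count drop, which would show up here as a negative daily value.

```python
import pandas as pd

# Hypothetical toy data shaped like the NY Times county table (values are made up).
nyt = pd.DataFrame({
    "date": ["2020-04-01", "2020-04-02", "2020-04-03"] * 2,
    "county": ["Alpha"] * 3 + ["Beta"] * 3,
    "state": ["Examplestate"] * 6,
    "fips": [99001.0] * 3 + [99003.0] * 3,
    "cases": [1, 3, 6, 0, 2, 2],
    "deaths": [0, 0, 1, 0, 0, 0],
})


def add_daily_counts(df):
    """Add new_cases / new_deaths columns derived from the cumulative counts."""
    # Key on state + county instead of fips, assuming fips can be missing.
    keys = ["state", "county"]
    out = df.sort_values(keys + ["date"]).copy()
    for col in ["cases", "deaths"]:
        # Day-over-day difference within each county; a county's first row has
        # no previous day, so its cumulative value is used as the daily value.
        daily = out.groupby(keys)[col].diff().fillna(out[col])
        out["new_" + col] = daily.astype(int)
    return out


daily = add_daily_counts(nyt)
print(daily[["county", "date", "cases", "new_cases"]])
```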


### USAFacts Data

The USAFacts data is by county and uses a different format from the NY Times data: instead of one row per county per day, it has one row per county and one column per date, each holding the cumulative count as of that date. It also separates cases and deaths into separate files:

In [4]:
from covidcaremap.cases import get_usafacts_cases_by_county, get_usafacts_deaths_by_county

usafacts_cases_df = get_usafacts_cases_by_county()
usafacts_deaths_df = get_usafacts_deaths_by_county()

In [5]:
usafacts_cases_df

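| | countyFIPS | County Name | State | stateFIPS | 1/22/20 | 1/23/20 | 1/24/20 | 1/25/20 | 1/26/20 | 1/27/20 | ... | 3/25/20 | 3/26/20 | 3/27/20 | 3/28/20 | 3/29/20 | 3/30/20 | 3/31/20 | 4/1/20 | 4/2/20 | 4/3/20 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | Statewide Unallocated | AL | 1 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | 1001 | Autauga County | AL | 1 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 4 | 6 | 6 | 6 | 6 | 7 | 7 | 10 | 10 | 12 |
| 2 | 1003 | Baldwin County | AL | 1 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 4 | 5 | 5 | 10 | 15 | 18 | 19 | 23 | 25 | 28 |
| 3 | 1005 | Barbour County | AL | 1 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| 4 | 1007 | Bibb County | AL | 1 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 2 | 3 | 3 | 4 | 4 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 3190 | 56037 | Sweetwater County | WY | 56 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 1 | 1 | 1 | 1 | 1 | 2 | 2 | 3 | 3 | 3 |
| 3191 | 56039 | Teton County | WY | 56 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 6 | 8 | 12 | 14 | 14 | 17 | 23 | 26 | 29 | 32 |
| 3192 | 56041 | Uinta County | WY | 56 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 |
| 3193 | 56043 | Washakie County | WY | 56 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 2 | 2 |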
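
Since USAFacts stores one column per date, reshaping it into one row per county per date makes it line up with the NY Times layout. A minimal sketch using `DataFrame.melt`, run on two rows whose values are copied from the output above (only three date columns are kept for brevity). Converting the `m/d/yy` headers to ISO `YYYY-MM-DD` strings is an assumption made here so the dates match the NY Times `date` column format.

```python
import pandas as pd

# Two rows copied from the USAFacts output above, trimmed to three date columns.
wide = pd.DataFrame({
    "countyFIPS": [1001, 1003],
    "County Name": ["Autauga County", "Baldwin County"],
    "State": ["AL", "AL"],
    "stateFIPS": [1, 1],
    "4/1/20": [10, 23],
    "4/2/20": [10, 25],
    "4/3/20": [12, 28],
})

id_cols = ["countyFIPS", "County Name", "State", "stateFIPS"]
# Every non-ID column (one per date) becomes a (date, cases) row.
long = wide.melt(id_vars=id_cols, var_name="date", value_name="cases")
# Convert 'm/d/yy' headers to ISO 'YYYY-MM-DD' strings like the NY Times data.
long["date"] = pd.to_datetime(long["date"], format="%m/%d/%y").dt.strftime("%Y-%m-%d")
long = long.sort_values(["countyFIPS", "date"]).reset_index(drop=True)
print(long)
```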


We can begin to compare the datasets, e.g. by looking up the cumulative case count for Philadelphia County on 4/3/20 in each:

In [6]:
usafacts_cases_df[usafacts_cases_df['County Name'] == 'Philadelphia County'].loc[:,'4/3/20'].to_frame()

| | 4/3/20 |
|---|---|
| 2335 | 2284 |


In [7]:
nytimes_county_cases[
    (nytimes_county_cases['county'] == 'Philadelphia') &
    (nytimes_county_cases['date'] == '2020-04-03')]


| | date | county | state | fips | cases | deaths |
|---|---|---|---|---|---|---|
| 30227 | 2020-04-03 | Philadelphia | Pennsylvania | 42101.0 | 2284 | 14 |
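
The same check can be run for every county at once by joining on the FIPS code. Note that the NY Times `fips` column is a float (e.g. `42101.0`) while USAFacts `countyFIPS` is an integer, so the keys need a common dtype. A minimal sketch on hypothetical one-day snapshots (names, codes, and counts are made up); it assumes rows without a FIPS code should simply be dropped from the comparison.

```python
import pandas as pd

# Hypothetical one-day snapshots shaped like each source (values are made up).
nyt_day = pd.DataFrame({
    "date": ["2020-04-03"] * 3,
    "county": ["Alpha", "Beta", "Unknown"],
    "state": ["Examplestate"] * 3,
    "fips": [99001.0, 99003.0, float("nan")],  # float, possibly missing
    "cases": [10, 5, 2],
})
usafacts_day = pd.DataFrame({
    "countyFIPS": [99001, 99003],
    "County Name": ["Alpha County", "Beta County"],
    "4/3/20": [10, 4],
})

# Drop rows without a FIPS code and cast to int so the join keys share a dtype.
nyt_keyed = nyt_day.dropna(subset=["fips"]).astype({"fips": "int64"})
compare = nyt_keyed.merge(usafacts_day, left_on="fips", right_on="countyFIPS")
# Positive values mean the NY Times count is higher than USAFacts.
compare["difference"] = compare["cases"] - compare["4/3/20"]
print(compare[["county", "cases", "4/3/20", "difference"]])
```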


## Forecasts

Forecasting demand on the healthcare system is an essential part of identifying the capacity gap. We rely on groups experienced in epidemiological modeling to produce models we can integrate and data we can ingest.

### IHME - by State

The Institute for Health Metrics and Evaluation (IHME) at the University of Washington produced a fantastic [report](http://www.healthdata.org/research-article/forecasting-covid-19-impact-hospital-bed-days-icu-days-ventilator-days-and-deaths) along with a [data explorer](http://covid19.healthdata.org/projections). They release new data every Monday, with daily predictions of bed needs.

Data dictionary taken from the 2020_03_27 data release:

- **location_name**: Name of the state
- **date_reported**: Date
- **allbed_mean**: Mean covid beds needed by day
- **allbed_lower**: Lower uncertainty bound of covid beds needed by day
- **allbed_upper**: Upper uncertainty bound of covid beds needed by day 
- **ICUbed_mean**: Mean ICU covid beds needed by day
- **ICUbed_lower**: Lower uncertainty bound of ICU covid beds needed by day 
- **ICUbed_upper**: Upper uncertainty bound of ICU covid beds needed by day
- **InvVen_mean**: Mean invasive ventilation needed by day
- **InvVen_lower**: Lower uncertainty bound of invasive ventilation needed by day
- **InvVen_upper**: Upper uncertainty bound of invasive ventilation needed by day
- **deaths_mean**: Mean daily covid deaths
- **deaths_lower**: Lower uncertainty bound of daily covid deaths
- **deaths_upper**: Upper uncertainty bound of daily covid deaths
- **admis_mean**: Mean hospital admissions by day
- **admis_lower**: Lower uncertainty bound of hospital admissions by day
- **admis_upper**: Upper uncertainty bound of hospital admissions by day
- **newICU_mean**: Mean number of new people going to the ICU by day
- **newICU_lower**: Lower uncertainty bound of the number of new people going to the ICU by day
- **newICU_upper**: Upper uncertainty bound of the number of new people going to the ICU by day
- **totdea_mean**: Mean cumulative covid deaths
- **totdea_lower**: Lower uncertainty bound of cumulative covid deaths
- **totdea_upper**: Upper uncertainty bound of cumulative covid deaths
- **bedover_mean**: `covid all beds needed` - (`total bed capacity` - `average all bed usage`)
- **bedover_lower**: Lower uncertainty bound of bedover (above)
- **bedover_upper**: Upper uncertainty bound of bedover (above)
- **icuover_mean**: `covid ICU beds needed` - (`total ICU capacity` - `average ICU bed usage`)
- **icuover_lower**: Lower uncertainty bound of icuover (above)
- **icuover_upper**: Upper uncertainty bound of icuover (above)
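
The `bedover` and `icuover` definitions above are the same arithmetic: COVID demand minus the capacity left over after typical usage, so positive values indicate a shortfall. A small sketch of that formula with hypothetical numbers (the function name is ours, not part of IHME's data or `covidcaremap`):

```python
def bed_overflow(beds_needed, total_capacity, average_usage):
    """bedover-style shortfall: COVID beds needed minus beds normally free.

    Positive values mean demand exceeds the available capacity.
    """
    available = total_capacity - average_usage
    return beds_needed - available


# Hypothetical numbers: 1,000 beds, 700 typically occupied, 500 COVID beds needed.
print(bed_overflow(beds_needed=500, total_capacity=1000, average_usage=700))  # 200
```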

In [8]:
from covidcaremap.data import get_ihme_forecast

ihme_df = get_ihme_forecast()

In [9]:
list(ihme_df.columns)

['V1',
 'location',
 'date',
 'allbed_mean',
 'allbed_lower',
 'allbed_upper',
 'ICUbed_mean',
 'ICUbed_lower',
 'ICUbed_upper',
 'InvVen_mean',
 'InvVen_lower',
 'InvVen_upper',
 'deaths_mean',
 'deaths_lower',
 'deaths_upper',
 'admis_mean',
 'admis_lower',
 'admis_upper',
 'newICU_mean',
 'newICU_lower',
 'newICU_upper',
 'totdea_mean',
 'totdea_lower',
 'totdea_upper',
 'bedover_mean',
 'bedover_lower',
 'bedover_upper',
 'icuover_mean',
 'icuover_lower',
 'icuover_upper',
 'location_name']

In [10]:
# Join in case data and compare projected total deaths for NY on 2020-04-03
nytimes_state_df = get_nytimes_cases_by_state()

forecast_and_cases = ihme_df.rename(columns={
    'location_name': 'state', 'date_reported': 'date'
}).merge(nytimes_state_df, on=['state', 'date'])

forecast_and_cases[(forecast_and_cases['state'] == 'New York') &
                    (forecast_and_cases['date'] == '2020-04-03')][['totdea_mean', 'deaths']]

| | totdea_mean | deaths |
|---|---|---|
| 632 | 2957.593 | 2935 |
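
To compare the forecast against reported deaths across all dates rather than one, it helps to make the join robust to date types: pandas refuses to merge a string `date` column with a `datetime64` one. How `get_ihme_forecast` and `get_nytimes_cases_by_state` type their dates isn't shown here, so the sketch below normalizes both defensively and adds a signed percent error. The frames are hypothetical toy data shaped like the two sources.

```python
import pandas as pd

# Hypothetical toy frames shaped like ihme_df and the NY Times state table.
ihme = pd.DataFrame({
    "location_name": ["Examplestate", "Examplestate"],
    "date": ["2020-04-02", "2020-04-03"],                   # strings here
    "totdea_mean": [95.0, 120.0],
})
nyt_state = pd.DataFrame({
    "state": ["Examplestate", "Examplestate"],
    "date": pd.to_datetime(["2020-04-02", "2020-04-03"]),   # datetimes here
    "deaths": [100, 110],
})


def align_forecast_and_actuals(forecast, actuals):
    """Join forecast and reported deaths on state + date and add a percent error."""
    f = forecast.rename(columns={"location_name": "state"})
    a = actuals.copy()
    # Normalize both keys to datetime64 so the merge keys share a dtype.
    f["date"] = pd.to_datetime(f["date"])
    a["date"] = pd.to_datetime(a["date"])
    merged = f.merge(a, on=["state", "date"], how="inner")
    # Signed error of the forecast relative to reported cumulative deaths.
    merged["pct_error"] = (merged["totdea_mean"] - merged["deaths"]) / merged["deaths"] * 100
    return merged


merged = align_forecast_and_actuals(ihme, nyt_state)
print(merged[["state", "date", "totdea_mean", "deaths", "pct_error"]])
```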


### CHIME

[CHIME](https://github.com/CodeForPhilly/chime) is a tool developed by the Predictive Healthcare team at Penn Medicine. It [implements a SIR model](https://code-for-philly.gitbook.io/chime/what-is-chime/sir-modeling) that takes a set of parameters, population, and current confirmed cases to produce a several-week estimate of hospitalized, ICU, and ventilated patients. The parameters with their default values can be found in the `covidcaremap.chime` package:

In [11]:
import covidcaremap.chime as ccm_chime

ccm_chime.DEFAULT_PARAMS

{'detection_probability': 0.14,
 'doubling_time': 4,
 'relative_contact_rate': 0.3,
 'hospitalized_rate': 0.025,
 'hospitalized_los': 7,
 'icu_rate': 0.0075,
 'icu_los': 9,
 'ventilated_rate': 0.005,
 'ventilated_los': 10,
 'recovery_days': 14}

The parameters are documented in `covidcaremap/chime.py`:

```python
DEFAULT_PARAMS = {

    # Detection Probability: Used to infer infected population from confirmed cases.
    "detection_probability": 0.14,

    # Doubling time before social distancing (days)
    "doubling_time" : 4,

    # Social Distancing Reduction Rate: 0.0 - 1.0
    "relative_contact_rate": 0.3,

    # Hospitalized Rate: 0.00001 - 1.0
    "hospitalized_rate": 0.025,

    # Hospitalized Length of Stay (days)
    "hospitalized_los": 7,

    # ICU Rate: 0.0 - 1.0
    "icu_rate": 0.0075,

    # ICU Length of Stay (days)
    "icu_los": 9,

    # Ventilated Rate: 0.0 - 1.0
    "ventilated_rate": 0.005,

    # Ventilated Length of Stay (days)
    "ventilated_los": 10,

    "recovery_days": 14

}
```
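
To make the role of these parameters concrete, here is a simplified discrete-time SIR projection written in the spirit of CHIME. It is a sketch, not CHIME's or `covidcaremap`'s actual code: it infers infections from confirmed cases via `detection_probability`, derives the growth rate from `doubling_time`, reduces transmission by `relative_contact_rate`, treats a fixed `hospitalized_rate` share of new infections as admissions, and estimates the census as admissions over the last `hospitalized_los` days. The function name and the population and case counts in the usage lines are hypothetical.

```python
# Subset of DEFAULT_PARAMS (values copied from above) used by this sketch.
PARAMS = {
    "detection_probability": 0.14,
    "doubling_time": 4,
    "relative_contact_rate": 0.3,
    "hospitalized_rate": 0.025,
    "hospitalized_los": 7,
    "recovery_days": 14,
}


def sir_hospital_projection(population, confirmed_cases, params, num_days=60):
    """Simplified discrete-time SIR projection with a hospital census.

    Loosely modelled on CHIME's approach; not the CHIME or covidcaremap code.
    """
    # Infer the (mostly undetected) infected population from confirmed cases.
    infected = confirmed_cases / params["detection_probability"]
    susceptible = population - infected
    recovered = 0.0

    # Daily growth rate implied by the pre-distancing doubling time.
    growth_rate = 2 ** (1 / params["doubling_time"]) - 1
    gamma = 1 / params["recovery_days"]  # daily recovery rate
    # Transmission rate, scaled down by the social distancing reduction.
    beta = ((growth_rate + gamma) / susceptible
            * (1 - params["relative_contact_rate"]))

    admissions = []
    rows = []
    for day in range(num_days + 1):
        new_infections = beta * susceptible * infected
        new_recoveries = gamma * infected
        # A fixed share of each day's new infections is admitted to hospital.
        admissions.append(new_infections * params["hospitalized_rate"])
        # Census: patients admitted within the last `hospitalized_los` days.
        census = sum(admissions[-params["hospitalized_los"]:])
        rows.append({"day": day, "susceptible": susceptible,
                     "infected": infected, "recovered": recovered,
                     "hospitalized_census": census})
        # SIR update; S + I + R is conserved.
        susceptible -= new_infections
        infected += new_infections - new_recoveries
        recovered += new_recoveries
    return rows


rows = sir_hospital_projection(population=1_000_000, confirmed_cases=100, params=PARAMS)
peak = max(rows, key=lambda r: r["hospitalized_census"])
print(peak["day"], round(peak["hospitalized_census"]))
```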

This module also provides a function to run CHIME over a set of regions:

In [12]:
help(ccm_chime.get_regional_predictions)

Help on function get_regional_predictions in module covidcaremap.chime:

get_regional_predictions(regions_df, region_id_column, population_column='Population', cases_column='Confirmed Cases', num_days=60, region_param_override=None)
    Runs a regional CHIME prediction based on region population and case counts.
    
    Args:
        regions_df: The regions to be run over. Requires an ID, population, and cases columns.
        region_id_column: The column holding the region ID.
        population_column: The column holding the population count. Default
        cases_column: The column holding the number of confirmed cases
        region_param_override: A dictionary with keys of region IDs and values being
            being a dict of overridding values for the CHIME parameters. This allows
            regional parameters to be supplied by the user per region.
    
    Returns:
        A dataframe with the region_id, day, and projection numbers.
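
The docstring describes `region_param_override` as a dict keyed by region ID whose values override CHIME parameters, but it does not show how the overrides are combined with the defaults. A plausible sketch, assuming overrides are layered on top of `DEFAULT_PARAMS` per region; the helper `params_for_region` is hypothetical and not part of `covidcaremap`. The region IDs in the example assume values from the `County Name` column, as in the output further below.

```python
DEFAULT_PARAMS = {
    "detection_probability": 0.14,
    "doubling_time": 4,
    "relative_contact_rate": 0.3,
    "hospitalized_rate": 0.025,
    "hospitalized_los": 7,
    "icu_rate": 0.0075,
    "icu_los": 9,
    "ventilated_rate": 0.005,
    "ventilated_los": 10,
    "recovery_days": 14,
}


def params_for_region(region_id, region_param_override=None):
    """Return CHIME parameters for one region with any per-region overrides applied."""
    overrides = (region_param_override or {}).get(region_id, {})
    # Reject misspelled parameter names instead of silently ignoring them.
    unknown = set(overrides) - set(DEFAULT_PARAMS)
    if unknown:
        raise KeyError(f"Unknown CHIME parameter(s) for {region_id}: {sorted(unknown)}")
    # Later keys win, so overrides replace defaults; DEFAULT_PARAMS is not mutated.
    return {**DEFAULT_PARAMS, **overrides}


overrides = {"Philadelphia": {"relative_contact_rate": 0.5}}
print(params_for_region("Philadelphia", overrides)["relative_contact_rate"])  # 0.5
print(params_for_region("Autauga", overrides)["relative_contact_rate"])  # 0.3
```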



We can use this to create predictions over every county in the US:

In [16]:
from covidcaremap.cases import get_county_case_info

# Gets confirmed cases from USA Facts per county for date.
cases_by_county = get_county_case_info('4/3/20')
chime_county_df = ccm_chime.get_regional_predictions(cases_by_county,
                                                     region_id_column='County Name')
chime_county_df

| | County Name | day | hospitalized_total | icu_total | ventilated_total | hospitalized_admitted | icu_admitted | ventilated_admitted | hospitalized_census | icu_census | ventilated_census |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Autauga | 1 | 3.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 |
| 2 | Autauga | 2 | 3.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 |
| 3 | Autauga | 3 | 3.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 2.0 | 1.0 | 1.0 |
| 4 | Autauga | 4 | 4.0 | 1.0 | 1.0 | 1.0 | 0.0 | 0.0 | 2.0 | 1.0 | 1.0 |
| 5 | Autauga | 5 | 5.0 | 1.0 | 1.0 | 1.0 | 0.0 | 0.0 | 3.0 | 1.0 | 1.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 46 | Washington | 46 | 1062.0 | 318.0 | 212.0 | 67.0 | 20.0 | 13.0 | 425.0 | 157.0 | 114.0 |
| 47 | Washington | 47 | 1130.0 | 339.0 | 226.0 | 68.0 | 21.0 | 14.0 | 440.0 | 163.0 | 119.0 |
| 48 | Washington | 48 | 1199.0 | 360.0 | 240.0 | 70.0 | 21.0 | 14.0 | 454.0 | 169.0 | 123.0 |
| 49 | Washington | 49 | 1270.0 | 381.0 | 254.0 | 70.0 | 21.0 | 14.0 | 465.0 | 174.0 | 127.0 |
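
A common next step is to find when and how high each region's hospital census peaks. The sketch below does that with `groupby(...).idxmax()` on a small frame shaped like the output above (values are hypothetical). It assumes each `County Name` identifies a single region; county names such as Washington repeat across states, so a state-qualified name or FIPS code may be a safer region ID in practice.

```python
import pandas as pd

# Hypothetical rows shaped like the get_regional_predictions output (values made up).
chime = pd.DataFrame({
    "County Name": ["Alpha"] * 4 + ["Beta"] * 4,
    "day": [1, 2, 3, 4] * 2,
    "hospitalized_census": [1.0, 3.0, 2.0, 2.0, 5.0, 8.0, 9.0, 7.0],
    "icu_census": [1.0, 1.0, 1.0, 1.0, 2.0, 3.0, 4.0, 3.0],
})

# Row label of the maximum census within each county, then pull those rows.
peak_idx = chime.groupby("County Name")["hospitalized_census"].idxmax()
peaks = (chime.loc[peak_idx, ["County Name", "day", "hospitalized_census"]]
         .rename(columns={"day": "peak_day", "hospitalized_census": "peak_census"})
         .reset_index(drop=True))
print(peaks)
```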


### HGHI

The data from the [Harvard Global Health Institute (HGHI)](https://globalepidemics.org/our-data/hospital-capacity/) study also includes forecasts. The columns for projections are:

- **Projected Infected Individuals** – How many individuals over the age of 18 are expected to get infected with COVID-19 over the entire course of the pandemic
- **Projected Hospitalized Individuals** – How many individuals over the age of 18 are expected to need hospitalization due to COVID-19 over the entire course of the pandemic        
- **Projected Individuals Needing ICU Care** – How many individuals over the age of 18 are expected to need ICU care due to COVID-19 over the entire course of the pandemic            

These numbers are based on rough percentages of infected population and hospitalization rates.

See their [data dictionary](https://globalepidemics.org/our-data-guide/) for more column descriptions.

In [19]:
hghi_state_gdf = gpd.read_file(processed_data_path('hghi_state_data.geojson'))
hghi_state_gdf[[
    'State Name',
    'Projected Infected Individuals',
    'Projected Hospitalized Individuals',
    'Projected Individuals Needing ICU Care'
]]

| | State Name | Projected Infected Individuals | Projected Hospitalized Individuals | Projected Individuals Needing ICU Care |
|---|---|---|---|---|
| 0 | Alaska | 331391 | 67202 | 13976 |
| 1 | Alabama | 2248853 | 470718 | 101816 |
| 2 | Arkansas | 1363336 | 286175 | 62109 |
| 3 | Arizona | 3112512 | 654440 | 142316 |
| 4 | California | 17920876 | 3698428 | 786338 |
| 5 | Colorado | 2511112 | 517433 | 109804 |
| 6 | Connecticut | 1699048 | 355637 | 76924 |
| 7 | District of Columbia | 332600 | 67741 | 14167 |
| 8 | Delaware | 443807 | 93476 | 20369 |
| 9 | Florida | 9700119 | 2066855 | 456495 |
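
Since these projections come from rough rates applied to the infected population, the rates can be read back out of the table. A short sketch computing the implied hospitalized-per-infected and ICU-per-hospitalized ratios for four states, with values copied from the output above. For these rows the ratios come out at roughly one in five for both steps, varying slightly by state; this is only what the shown numbers imply, not HGHI's documented method.

```python
import pandas as pd

# Values copied from four rows of the HGHI table above.
hghi = pd.DataFrame({
    "State Name": ["Alaska", "Alabama", "California", "Florida"],
    "Projected Infected Individuals": [331391, 2248853, 17920876, 9700119],
    "Projected Hospitalized Individuals": [67202, 470718, 3698428, 2066855],
    "Projected Individuals Needing ICU Care": [13976, 101816, 786338, 456495],
})

# Share of projected infections that are hospitalized.
hghi["hospitalized_per_infected"] = (
    hghi["Projected Hospitalized Individuals"] / hghi["Projected Infected Individuals"])
# Share of projected hospitalizations that need ICU care.
hghi["icu_per_hospitalized"] = (
    hghi["Projected Individuals Needing ICU Care"] / hghi["Projected Hospitalized Individuals"])
print(hghi[["State Name", "hospitalized_per_infected", "icu_per_hospitalized"]].round(3))
```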


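Here we can roughly compare HGHI's and IHME's total ICU patients per state. The comparison is only rough: HGHI counts adults over the entire course of the pandemic, while summing IHME's daily forecasts covers only the dates in its forecast window.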

In [18]:
# Sum up all the mean new ICU patient forecasts per day for a state to get the
# total number of patients needing ICU care.
ihme_hghi_df = (
    ihme_df.rename(columns={'location_name': 'State Name'})
    .groupby('State Name')[['newICU_mean', 'newICU_lower', 'newICU_upper']]
    .sum()
    .merge(hghi_state_gdf, on='State Name')
)

ihme_hghi_df['Difference (Mean)'] = (ihme_hghi_df['newICU_mean'] -
                                     ihme_hghi_df['Projected Individuals Needing ICU Care'])
ihme_hghi_df[['State Name', 
              'newICU_mean',
              'newICU_lower',
              'newICU_upper',
              'Projected Individuals Needing ICU Care', 
              'Difference (Mean)']]

| | State Name | newICU_mean | newICU_lower | newICU_upper | Projected Individuals Needing ICU Care | Difference (Mean) |
|---|---|---|---|---|---|---|
| 0 | Alabama | 10860.385673 | 1581.867815 | 19470.8875 | 101816 | -90955.614327 |
| 1 | Alaska | 326.4707 | 130.0275 | 699.12625 | 13976 | -13649.5293 |
| 2 | Arizona | 2684.803229 | 968.773804 | 5096.420373 | 142316 | -139631.196771 |
| 3 | Arkansas | 1201.276658 | 588.182513 | 2108.841541 | 62109 | -60907.723342 |
| 4 | California | 10011.022267 | 2272.81 | 23755.5625 | 786338 | -776326.977733 |
| 5 | Colorado | 4381.150995 | 2830.808124 | 7400.556649 | 109804 | -105422.849005 |
| 6 | Connecticut | 2203.997054 | 1244.954377 | 2968.096294 | 76924 | -74720.002946 |
| 7 | Delaware | 321.89135 | 125.40875 | 616.1125 | 20369 | -20047.10865 |
| 8 | District of Columbia | 427.5532 | 267.16125 | 649.54875 | 14167 | -13739.4468 |
| 9 | Florida | 12845.488395 | 2955.687813 | 35300.680357 | 456495 | -443649.511605 |
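
Absolute differences are hard to compare across states of very different sizes, so a ratio and an interval check can be easier to read. The sketch below, using three rows copied from the table above, expresses the summed IHME mean as a fraction of the HGHI figure and tests whether the HGHI figure falls inside IHME's summed lower/upper bounds. Summing the daily bounds is itself a simplification carried over from the cell above, not a proper uncertainty interval for the total.

```python
import pandas as pd

# Three rows copied from the IHME vs. HGHI comparison table above.
comparison = pd.DataFrame({
    "State Name": ["Alabama", "Alaska", "California"],
    "newICU_mean": [10860.385673, 326.4707, 10011.022267],
    "newICU_lower": [1581.867815, 130.0275, 2272.81],
    "newICU_upper": [19470.8875, 699.12625, 23755.5625],
    "Projected Individuals Needing ICU Care": [101816, 13976, 786338],
})

hghi_icu = comparison["Projected Individuals Needing ICU Care"]
# Summed IHME mean as a fraction of the HGHI projection.
comparison["ihme_to_hghi"] = comparison["newICU_mean"] / hghi_icu
# Is the HGHI projection inside IHME's summed lower/upper bounds?
comparison["hghi_within_ihme_bounds"] = hghi_icu.between(
    comparison["newICU_lower"], comparison["newICU_upper"])
print(comparison[["State Name", "ihme_to_hghi", "hghi_within_ihme_bounds"]])
```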
