# Working with supply data

COVID Care Map has collected information about the US healthcare system supply, including information about hospital bed capacity and occupancy, ventilator supply, and models that estimate staffing needs and PPE burn rates.

**Table of Contents**
- [Hospital Bed Capacity and Occupancy](#Hospital-Bed-Capacity-and-Occupancy)
  - [CovidCareMap.org Data](#CovidCareMap.org-Data---by-facility,-state,-county-and-HRR)
  - [HGHI Data](#HGHI---by-state-and-HRR)
  - [HIFLD Data](#HIFLD---by-facility)
- [Ventilator Data](#Ventilators---2010-estimates-by-state)
- [Staffing Model](#Staffing-model)
- [PPE burn rate model](#PPE-burn-rate-model)

In [None]:
import pandas as pd
import geopandas as gpd

from covidcaremap.data import (PUBLISHED_DATA_DIR, published_data_path, 
                                  PROCESSED_DATA_DIR, processed_data_path,
                                  EXTERNAL_DATA_DIR, external_data_path)

## Hospital Bed Capacity and Occupancy

The number of hospital beds that a facility has available to serve a surge of COVID-19 patients is determined by the count of beds in that facility as well as the availability of those beds. In normal operation, hospitals have a bed occupancy rate that describes what share of beds is occupied by non-COVID-19 patients and is therefore not available to handle the surge of patients caused by the pandemic. Furthermore, not all hospital beds are the same: we also need to know the counts and occupancy rates of ICU (Intensive Care Unit) beds, which are required to treat patients in critical condition.

The values related to bed counts are defined as follows:

- **Staffed All Beds** - Number of hospital beds of all types typically set up and staffed for inpatient care as reported/estimated in selected facility or area

- **Staffed ICU Beds** - Number of ICU beds typically set up and staffed for intensive inpatient care as reported/estimated in selected facility or area

- **Licensed All Beds** - Number of hospital beds of all types licensed for potential use in selected facility or area

- **All Bed Occupancy Rate** - % of hospital beds of all types typically occupied by patients in selected facility or area

- **ICU Bed Occupancy Rate** - % of ICU beds typically occupied by patients in selected facility or area
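
To make the relationship between these values concrete, here is a minimal sketch of how a staffed bed count and a typical occupancy rate combine into an estimate of beds left over for surge patients. The function name and the sample numbers are hypothetical and are not taken from the CovidCareMap.org data. The sketch also assumes the occupancy rate is expressed as a fraction between 0 and 1, which may differ from how the published columns store it.

```python
def estimate_available_beds(staffed_beds, occupancy_rate):
    """Estimate the beds that are not occupied in normal operation.

    occupancy_rate is assumed to be a fraction in [0, 1]; this is an
    assumption of the sketch, not a statement about the published data.
    """
    if not 0 <= occupancy_rate <= 1:
        raise ValueError("occupancy_rate must be between 0 and 1")
    return staffed_beds * (1 - occupancy_rate)


# Hypothetical facility: 200 staffed beds at 75% occupancy,
# of which 20 are staffed ICU beds at 50% occupancy.
available_all_beds = estimate_available_beds(200, 0.75)
available_icu_beds = estimate_available_beds(20, 0.5)
print(available_all_beds, available_icu_beds)
```

Licensed beds are left out of the sketch on purpose: they describe potential capacity rather than beds that are typically set up and staffed.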

We have a few data sources that describe bed information at different spatial levels:

### CovidCareMap.org Data - by facility, state, county and HRR

The CovidCareMap.org data uses [Healthcare Cost Report Information System (HCRIS) data](https://github.com/covidcaremap/covid19-healthsystemcapacity/tree/master/data#healthcare-cost-report-information-system-hcris-data) and [Definitive Health (DH) data](https://github.com/covidcaremap/covid19-healthsystemcapacity/tree/master/data#healthcare-cost-report-information-system-hcris-data) in a processing pipeline that merges the two datasets and rolls them up to the county, state, and HRR (Hospital Referral Region) levels. See the notebooks in [../processing](../processing) for the steps that generate this data. You do not need to run them, however: you can start consuming the data for analysis directly via the CSV and GeoJSON files we produce.

This is the data that powers the [US healthcare system capacity map](https://www.covidcaremap.org/maps/us-healthcare-system-capacity).
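
As a rough illustration of what rolling facility data up to a regional level involves, the sketch below aggregates a toy facility table to the state level with pandas. It is not the project's pipeline (the processing notebooks are the reference for that). The facility rows are invented, and the occupancy rate is assumed to be stored as a fraction.

```python
import pandas as pd

# Toy facility-level records; every name and value here is made up.
facilities = pd.DataFrame({
    'State': ['PA', 'PA', 'NJ'],
    'Staffed All Beds': [300, 200, 150],
    'Staffed ICU Beds': [30, 20, 10],
    'All Bed Occupancy Rate': [0.5, 0.75, 0.5],  # assumed to be a fraction
})

# Bed counts are additive, but rates are not, so convert the rate to an
# occupied-bed count before summing.
facilities['Occupied Beds'] = (
    facilities['Staffed All Beds'] * facilities['All Bed Occupancy Rate']
)

sum_columns = ['Staffed All Beds', 'Staffed ICU Beds', 'Occupied Beds']
by_state = facilities.groupby('State')[sum_columns].sum()

# Recompute the regional rate from the summed counts (a bed-weighted average).
by_state['All Bed Occupancy Rate'] = (
    by_state['Occupied Beds'] / by_state['Staffed All Beds']
)
print(by_state)
```

Averaging the facility rates directly would weight a 20-bed clinic the same as a 500-bed hospital, which is why the sketch goes through occupied-bed counts instead.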

This data is in the `PUBLISHED_DATA_DIR`:

In [None]:
!ls $PUBLISHED_DATA_DIR

We can read in the CSV with pandas and the GeoJSON with GeoPandas:

In [None]:
ccm_facilities_df = pd.read_csv(published_data_path('us_healthcare_capacity-facility-CovidCareMap.csv'))
ccm_facilities_df.columns

In [None]:
ccm_facilities_gdf = gpd.read_file(published_data_path('us_healthcare_capacity-facility-CovidCareMap.geojson'))
ccm_facilities_gdf.plot()

We can read in the county, state, or HRR level information the same way:

In [None]:
ccm_states_gdf = gpd.read_file(published_data_path('us_healthcare_capacity-state-CovidCareMap.geojson'))
# Remove Hawaii and Alaska to make the plot nicer looking - no shade
ccm_states_gdf[~ccm_states_gdf['State Name'].isin(['Hawaii', 'Alaska'])].plot()

The facility, county, state, and HRR data have the same kinds of bed information, but each level can have its own columns. See the [data dictionary](https://github.com/covidcaremap/covid19-healthsystemcapacity/tree/master/data#covidcaremap-capacity-data-dictionary) or explore the DataFrames to see what information is available at each level of detail. For instance, the regional levels all have per-capita numbers, broken down into age groups:

In [None]:
per_capita_columns = [x for x in ccm_states_gdf.columns if 'Per ' in x]
per_capita_columns

In [None]:
ccm_states_gdf[['State Name', 'Population'] + per_capita_columns]
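
For readers who want to see what a per-capita figure means in practice, the following sketch derives one from a bed count and a population. The region names, the values, the 65+ population column, and the per-1,000 scaling are all assumptions made for illustration. Check the data dictionary for the units and age groups the published columns actually use.

```python
import pandas as pd

# Made-up regions; column names are hypothetical, not the published ones.
regions = pd.DataFrame({
    'Region': ['Example A', 'Example B'],
    'Staffed All Beds': [5000, 1200],
    'Population': [2_000_000, 400_000],
    'Population (65+)': [250_000, 80_000],
})

# Beds per 1,000 residents overall, and per 1,000 residents aged 65+.
regions['Beds Per 1000 People'] = (
    regions['Staffed All Beds'] / regions['Population'] * 1000
)
regions['Beds Per 1000 Elderly'] = (
    regions['Staffed All Beds'] / regions['Population (65+)'] * 1000
)
print(regions[['Region', 'Beds Per 1000 People', 'Beds Per 1000 Elderly']])
```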

### HGHI - by state and HRR

This data is taken from a study by the [Harvard Global Health Institute (HGHI)](https://globalepidemics.org/2020-03-17-caring-for-covid-19-patients/). It describes bed counts drawn from a different set of sources than the CovidCareMap.org data uses. It also includes projected bed needs based on forecasted patient numbers.

See their [data dictionary](https://globalepidemics.org/2020-03-17-caring-for-covid-19-patients/#dictionary) for column descriptions.

The original datasets are in the `EXTERNAL_DATA_DIR`:

In [None]:
!ls $EXTERNAL_DATA_DIR/HGHI*

We've processed this data into GeoJSON format, and have a version for states that combines it with ventilator data. This is what powers the [hghi-vents map](https://www.covidcaremap.org/maps/hghi-vents).

In [None]:
!ls $PROCESSED_DATA_DIR/hghi_state*

Here we read in the state data and inspect the columns:

In [None]:
hghi_state_gdf = gpd.read_file(processed_data_path('hghi_state_data.geojson'))
list(hghi_state_gdf.columns)

### HIFLD - by facility

The Homeland Infrastructure Foundation-Level Data (HIFLD) dataset includes information about hospital facilities similar to the HCRIS and DH data. We plan to merge this facility information into the CovidCareMap.org data; this work is pending [Issue #70](https://github.com/covidcaremap/covid19-healthsystemcapacity/issues/70).

See https://hifld-geoplatform.opendata.arcgis.com/datasets/hospitals for more information.

In [None]:
!ls $EXTERNAL_DATA_DIR/hifld*

In [None]:
hifld_facility_df = pd.read_csv(external_data_path('hifld-hospitals.csv'))
list(hifld_facility_df.columns)
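
Since merging the HIFLD facilities into the CovidCareMap.org data is still pending, the sketch below only illustrates the general shape such a join could take in pandas. It assumes the two tables share a clean identifier, called `facility_key` here. That column, the other column names, and all values are hypothetical, and real facility matching may well need fuzzier logic based on names, addresses, or locations.

```python
import pandas as pd

# Made-up stand-ins for the two facility tables.
ccm = pd.DataFrame({
    'facility_key': ['A', 'B', 'C'],
    'Staffed All Beds': [100, 250, 80],
})
hifld = pd.DataFrame({
    'facility_key': ['B', 'C', 'D'],
    'hifld_beds': [240, 85, 60],
})

# A left join keeps every CovidCareMap.org facility; indicator=True adds a
# '_merge' column recording whether a matching HIFLD row was found.
merged = ccm.merge(hifld, on='facility_key', how='left', indicator=True)

# Facilities with no HIFLD counterpart would need manual review.
unmatched = merged[merged['_merge'] == 'left_only']
print(merged)
print(unmatched['facility_key'].tolist())
```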

## Ventilators - 2010 estimates by state

The latest published ventilator estimates we could find are from [a 2010 study](https://www.cambridge.org/core/journals/disaster-medicine-and-public-health-preparedness/article/mechanical-ventilators-in-us-acute-care-hospitals/F1FDBACA53531F2A150D6AD8E96F144D). This data is old and not ideal, of course, but it is the best estimate we currently have for analysis. It includes per-capita numbers (per 100,000 people) that are based on a 2008 population estimate.

In [None]:
!ls $EXTERNAL_DATA_DIR/vent*

In [None]:
vents_df = pd.read_csv(external_data_path('ventilators_by_state.csv'))
list(vents_df.columns)
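
Because the per-capita numbers are based on a 2008 population estimate, they will not line up with rates computed against a current population. The sketch below shows one way to re-express a per-100,000 rate against a different population base. The function name and the numbers are hypothetical, and it assumes the underlying ventilator count has not changed since the study, which is itself unverified.

```python
def rescale_per_100k(rate_per_100k, base_population, new_population):
    """Re-express a per-100,000 rate against a different population."""
    # Recover the absolute count implied by the original rate.
    count = rate_per_100k * base_population / 100_000
    # Express that same count relative to the new population.
    return count / new_population * 100_000


# Hypothetical state: 20 ventilators per 100,000 people with a 2008
# population of 1,000,000 that has since grown to 1,250,000.
rescaled_rate = rescale_per_100k(20, 1_000_000, 1_250_000)
print(rescaled_rate)
```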

## Staffing model

**Note: This is a work in progress. See [Generate_CCM_CareModel_Facility_Data.ipynb](../processing/Generate_CCM_CareModel_Facility_Data.ipynb) and [Generate_CareModel_Regional_Data.ipynb](../processing/Generate_CareModel_Regional_Data.ipynb) for the current work. Help wanted!**


## PPE burn rate model

**Note: This is a work in progress. See [PPE_needs_for_confirmed_covid-19_at_county_level.ipynb](../processing/PPE_needs_for_confirmed_covid-19_at_county_level.ipynb) for the current work. Help wanted!**