A catalog of all data used in the project.
└── data
    ├── README.md   <- This file.
    ├── external    <- External data sets used by the project.
    ├── published   <- The final, canonical data sets for COVID Care Map.
    ├── processed   <- Intermediate data produced by CovidCareMap.org processing.
    └── local       <- Intermediate data that is not committed to the repository.
|File|Description|
|---|---|
|us_healthcare_capacity-facility-CovidCareMap.geojson|Capacity information for US Health Facilities in GeoJSON format.|
|us_healthcare_capacity-facility-CovidCareMap.csv|Capacity information for US Health Facilities in CSV format.|
|us_healthcare_capacity-county-CovidCareMap.geojson|Aggregated facility capacity information by County in GeoJSON format.|
|us_healthcare_capacity-county-CovidCareMap.csv|Aggregated facility capacity information by County in CSV format.|
|us_healthcare_capacity-state-CovidCareMap.geojson|Aggregated facility capacity information by State in GeoJSON format.|
|us_healthcare_capacity-state-CovidCareMap.csv|Aggregated facility capacity information by State in CSV format.|
|us_healthcare_capacity-hrr-CovidCareMap.geojson|Aggregated facility capacity information by Hospital Referral Region (HRR) in GeoJSON format.|
|us_healthcare_capacity-hrr-CovidCareMap.csv|Aggregated facility capacity information by Hospital Referral Region (HRR) in CSV format.|
The 'published' directory contains datasets published by COVID Care Map. These are data that have been aggregated from various sources, analyzed, processed, inspected for validity and written to common data formats for easy consumption.
IMPORTANT NOTE: This data may be updated. If you want to link to data that will not change, see the note on Using Tags in the main README.
CovidCareMap US Healthcare System Capacity data
This data aggregates information about US healthcare system capacity. It is sourced from the Healthcare Cost Report Information System (HCRIS) and an open hospital facilities dataset by Definitive Healthcare.
CovidCareMap Capacity Data Dictionary
These fields appear in all facility and regional datasets:
Staffed All Beds - Number of hospital beds of all types typically set up and staffed for inpatient care as reported/estimated in selected facility or area
Staffed ICU Beds - Number of ICU beds typically set up and staffed for intensive inpatient care as reported/estimated in selected facility or area
Licensed All Beds - Number of hospital beds of all types licensed for potential use in selected facility or area
All Bed Occupancy Rate - % of hospital beds of all types typically occupied by patients in selected facility or area
ICU Bed Occupancy Rate - % of ICU beds typically occupied by patients in selected facility or area
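The bed and occupancy fields can be combined into a rough estimate of how many staffed beds are typically free. The sketch below computes staffed beds × (1 − occupancy rate). It assumes the rate is stored as a fraction between 0 and 1; if a file stores percentages, divide by 100 first. The function name and the example values are hypothetical, and this is not a field the published datasets provide.

```python
def typical_available_beds(staffed_beds, occupancy_rate):
    """Estimate beds typically unoccupied: staffed_beds * (1 - occupancy_rate).

    Assumes occupancy_rate is a fraction in [0, 1]. Returns None when either
    input is missing, since some facilities and regions lack values.
    """
    if staffed_beds is None or occupancy_rate is None:
        return None
    if not 0.0 <= occupancy_rate <= 1.0:
        raise ValueError(f"occupancy rate {occupancy_rate} is not a fraction in [0, 1]")
    return staffed_beds * (1.0 - occupancy_rate)


# Made-up values for illustration, not taken from the published data.
row = {"Staffed ICU Beds": 40.0, "ICU Bed Occupancy Rate": 0.75}
print(typical_available_beds(row["Staffed ICU Beds"], row["ICU Bed Occupancy Rate"]))
```

This is a "typical" figure only. It reflects average occupancy, not real-time availability.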
In addition to the above fields, facility data has the following:
Name: Name of the facility, as in the Definitive Healthcare data.
Hospital Type: Hospital type from the Definitive Healthcare data. See Hospital Types.
Address, Address_2, City, State, Zipcode, County, Latitude, Longitude: Location information from the Definitive Healthcare data.
CCM_ID - Unique identifier for the facility. Matches the Definitive Healthcare ID until new facilities are added or other datasets are brought in.
DH-OBJECTID - The OBJECTID in the Definitive Healthcare dataset for this facility.
HCRIS-Provider Number - The Provider Number from the HCRIS reports (also matches the PROVIDER_NUMBER field in the facility information).
Source information: The facility dataset also has a set of columns suffixed with '- SOURCE'. These describe where the value in the corresponding column came from. Each source value is prefixed with 'HCRIS' if the number comes from HCRIS data, or 'DH' if it comes from Definitive Healthcare data. It also includes the column name from that dataset, so every value can be traced back to the source datasets. HCRIS columns are created by the data processing steps, and can be traced back to the original HCRIS file values through the processing notebook workflow. DH columns match the DH external dataset directly.
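When auditing a facility row, the '- SOURCE' columns can be summarized per field. This is a minimal sketch that assumes each row is a dict, for example one row from csv.DictReader over the facility CSV. It only checks the 'HCRIS' / 'DH' prefix, because the exact separator between the prefix and the source column name is not specified here. The function name and the sample values are hypothetical.

```python
SOURCE_SUFFIX = "- SOURCE"


def summarize_sources(row):
    """Map each value column to the dataset its value came from.

    For every column ending in '- SOURCE', the corresponding value column name
    is the part before the suffix, and the origin is read from the prefix of
    the source value ('HCRIS' or 'DH'); anything else is reported as unknown.
    """
    origins = {}
    for column, source in row.items():
        if not column.endswith(SOURCE_SUFFIX):
            continue
        value_column = column[: -len(SOURCE_SUFFIX)].rstrip()
        text = (source or "").strip()
        if text.startswith("HCRIS"):
            origins[value_column] = "HCRIS"
        elif text.startswith("DH"):
            origins[value_column] = "Definitive Healthcare"
        else:
            origins[value_column] = "unknown"
    return origins


# Hypothetical row; real source values also carry the originating column name.
example = {
    "Staffed All Beds": "120",
    "Staffed All Beds - SOURCE": "HCRIS-example_column",
    "Licensed All Beds - SOURCE": "DH-example_column",
}
print(summarize_sources(example))
```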
Per Capita Information: There are additional per-capita fields in the regional datasets:
Population - Population of this region, sourced from the US Census Bureau 2018 county population estimates.
Population (20+) - Population of people aged 20 years or older.
Population (65+) - Population of people aged 65 years or older.
Staffed All Beds [Per 1000 People], Staffed All Beds [Per 1000 Adults (20+)], Staffed All Beds [Per 1000 Elderly (65+)], etc. - The Staffed All Beds, Staffed ICU Beds, and Licensed All Beds fields expressed per 1,000 people of the population described.
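The per-capita fields are ratios of a bed count to a population, scaled to 1,000 people. We assume the usual formula, count × 1000 / population, with the matching population column as the denominator (for example Population (20+) for the "Adults (20+)" variant). The project's notebooks may differ in rounding or in the handling of missing values. The helper name is hypothetical. The same helper with per=100_000 matches the form of the per-100,000 ventilator fields described below.

```python
def per_capita(count, population, per=1000):
    """Return `count` per `per` people, or None when population is missing or zero."""
    if not population:
        return None
    # Multiply before dividing to keep round numbers exact in floating point.
    return count * per / population


# Hypothetical region: 250 staffed beds for 100,000 people.
print(per_capita(250, 100_000))  # staffed beds per 1,000 people
```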
The county dataset includes a fips_code column, a unique identifier for counties:
- fips_code - The FIPS county code for the given county.
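County FIPS codes are five digits: a two-digit state code followed by a three-digit county code. Spreadsheet tools and CSV readers that infer numeric types often drop the leading zero (for example 1001 instead of 01001), which breaks joins against other county data. Whether a given file stores the code with leading zeros is an assumption to check. The sketch below normalizes both forms. The function names are hypothetical.

```python
def normalize_fips(raw):
    """Return a 5-character county FIPS string, restoring dropped leading zeros."""
    text = str(raw).strip()
    if text.endswith(".0"):  # value was parsed as a float, e.g. 1001.0
        text = text[:-2]
    if not text.isdigit() or len(text) > 5:
        raise ValueError(f"not a county FIPS code: {raw!r}")
    return text.zfill(5)


def split_fips(raw):
    """Split a county FIPS code into its (state, county) parts."""
    code = normalize_fips(raw)
    return code[:2], code[2:]


print(normalize_fips(1001), split_fips("06037"))
```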
Ventilator Information: The state-level data also includes estimated counts of mechanical ventilators. These come from a 2010 study that used a 2009 survey of hospitals and US Census population estimates from 2008:
- Estimated No. Full-Featured Mechanical Ventilators
- Estimated No. Full-Featured Mechanical Ventilators per 100,000 Population
- Estimated No. Pediatrics-Capable Full-Feature Mechanical Ventilators
- Estimated No. Full-Feature Mechanical Ventilators, Pediatrics Capable per 100,000 Population <14
Hospital Types
The following descriptions come directly from the ESRI page for the Definitive Healthcare dataset:
- Short Term Acute Care Hospital (STAC)
- Provides inpatient care and other services for surgery, acute medical conditions, or injuries
- Patients care can be provided overnight, and average length of stay is less than 25 days
- Critical Access Hospital (CAH)
- 25 or fewer acute care inpatient beds
- Located more than 35 miles from another hospital
- Annual average length of stay is 96 hours or less for acute care patients
- Must provide 24/7 emergency care services
- Designation by CMS to reduce financial vulnerability of rural hospitals and improve access to healthcare
- Religious Non-Medical Health Care Institutions
- Provide nonmedical health care items and services to people who need hospital or skilled nursing facility care, but for whom that care would be inconsistent with their religious beliefs
- Long Term Acute Care Hospitals
- Average length of stay is more than 25 days
- Patients are receiving acute care - services often include respiratory therapy, head trauma treatment, and pain management
- Rehabilitation Hospitals
- Specializes in improving or restoring patients' functional abilities through therapies
- Children’s Hospitals
- Majority of inpatients under 18 years old
- Psychiatric Hospitals
- Provides inpatient services for diagnosis and treatment of mental illness 24/7
- Under the supervision of a physician
- Veteran's Affairs (VA) Hospital
- Responsible for the care of war veterans and other retired military personnel
- Administered by the U.S. VA, and funded by the federal government
- Department of Defense (DoD) Hospital
US Healthcare Capacity by Facility
- us_healthcare_capacity-facility-CovidCareMap.geojson: Data in GeoJSON format.
- us_healthcare_capacity-facility-CovidCareMap.csv: Data in CSV format.
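The facility GeoJSON can be consumed with the standard library alone. This sketch sums Staffed All Beds per State across features. It assumes each feature's properties use the field names documented above, and that the file lives under data/published/ as in the directory tree. It is only an illustration. The published state dataset may apply additional processing, so these sums are not guaranteed to match it. The function names are hypothetical.

```python
import json
from collections import defaultdict


def load_geojson(path):
    """Read a GeoJSON file into a dict."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def staffed_beds_by_state(collection):
    """Sum 'Staffed All Beds' per 'State' over a GeoJSON FeatureCollection dict.

    Features with no state or an empty/null bed count are skipped rather
    than being counted as zero.
    """
    totals = defaultdict(float)
    for feature in collection.get("features", []):
        props = feature.get("properties") or {}
        state = props.get("State")
        beds = props.get("Staffed All Beds")
        if not state or beds in (None, ""):
            continue
        totals[state] += float(beds)
    return dict(totals)
```

Usage, assuming the repository layout: staffed_beds_by_state(load_geojson("data/published/us_healthcare_capacity-facility-CovidCareMap.geojson")).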
External datasets used in the project are documented here. This includes data committed to the repository in the data/external folder, data downloaded by the Download_Data.ipynb notebook, and data fetched dynamically from updating endpoints, normally exposed through the covidcaremap Python package.
Note that some data is committed to the repository, while other data is too large and must be downloaded by running the Download_Data.ipynb notebook. You must download all of the project data for every notebook to run.
Health System Data
Healthcare Cost Report Information System (HCRIS) Data
CMS Healthcare Cost Report Information System (HCRIS): https://www.cms.gov/Research-Statistics-Data-and-Systems/Downloadable-Public-Use-Files/Cost-Reports/Hospital-2010-form
This includes hospital facility information and facility cost reports.
Note: Not committed to the repository. See Downloading Data.
- HCRIS-HOSPITAL10_PROVIDER_ID_INFO.CSV: Facility level IDs, names and addresses.
- HCRIS-HCRIS_DataDictionary.csv: Data dictionary with report column mappings.
- HCRIS-hosp10_2018_RPT.CSV: HCRIS report data for 2018.
- HCRIS-hosp10_2018_NMRC.CSV: Numeric column information for the HCRIS report data.
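The numeric report file stores one value per worksheet cell. The sketch below indexes it by cell so that individual report values can be looked up. It assumes the header-less five-column layout RPT_REC_NUM, WKSHT_CD, LINE_NUM, CLMN_NUM, ITM_VAL_NUM from CMS's HCRIS documentation; verify this before relying on it. The worksheet, line and column coordinates in the example are hypothetical. The cells actually used for bed counts are defined in the processing notebook, not here.

```python
import csv

# Assumed column order of the header-less NMRC file (check the CMS documentation).
NMRC_FIELDS = ["RPT_REC_NUM", "WKSHT_CD", "LINE_NUM", "CLMN_NUM", "ITM_VAL_NUM"]


def index_nmrc(lines):
    """Index numeric cells as {(report, worksheet, line, column): float value}.

    `lines` is any iterable of CSV text lines, e.g. an open file object.
    """
    cells = {}
    for row in csv.DictReader(lines, fieldnames=NMRC_FIELDS):
        key = (row["RPT_REC_NUM"], row["WKSHT_CD"], row["LINE_NUM"], row["CLMN_NUM"])
        cells[key] = float(row["ITM_VAL_NUM"])
    return cells


# Hypothetical single row: report 1, worksheet S300001, line 01400, column 00200.
cells = index_nmrc(["1,S300001,01400,00200,125"])
print(cells[("1", "S300001", "01400", "00200")])
```

Report record numbers would then be tied to facilities through the RPT report file and the provider number, which matches the HCRIS-Provider Number field.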
HCRIS data for 2017 and 2018 from Jacob Fenton's project
We use data from this project to override bed counts and occupancy rates where it is available; otherwise we use our own 2018 HCRIS calculations.
- hospital_data_jsfenfen20200406.csv: HCRIS data from Jacob Fenton's project.
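The override rule above amounts to a per-field merge for each facility: take the Fenton value when present, otherwise keep the HCRIS-derived value. This is a minimal sketch under two assumptions. First, the two records have already been matched, for example on HCRIS provider number. Second, blank values may appear as None, empty strings, or NaN. It is not the project's notebook code, and the function names are hypothetical.

```python
import math


def is_missing(value):
    """Treat None, blank strings and NaN (pandas' marker for empty cells) as missing."""
    if value is None:
        return True
    if isinstance(value, str):
        return value.strip() == ""
    return isinstance(value, float) and math.isnan(value)


def prefer_override(override, fallback):
    """Merge two field->value dicts for one facility, preferring `override` values that are present."""
    merged = dict(fallback)
    for field, value in override.items():
        if not is_missing(value):
            merged[field] = value
    return merged


fenton = {"Staffed All Beds": 150, "All Bed Occupancy Rate": float("nan")}
hcris = {"Staffed All Beds": 140, "All Bed Occupancy Rate": 0.6}
print(prefer_override(fenton, hcris))
```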
Institute for Health Metrics and Evaluation (IHME) Forecasts, University of Washington
IHME COVID-19 health service utilization forecasting team. Forecasting COVID-19 impact on hospital bed-days, ICU-days, ventilator days and deaths by US state in the next 4 months. medRxiv. 26 March 2020.
Harvard Global Health Institute (HGHI) Data
This data was collected from the Harvard Global Health Institute (HGHI) study described here: https://globalepidemics.org/2020-03-17-caring-for-covid-19-patients/
- HGHI - Hospital Capacity by State.csv: Exported from https://docs.google.com/spreadsheets/d/1XUVyZF3X_4m72ztFnXZFvDKn5Yys1aKgu2Zmefd7wVo/edit?usp=sharing. This data is the 60% Population estimate.
- HGHI - HRR Scorecard - 60% Population.csv - Exported from https://docs.google.com/spreadsheets/d/1xAyBFTrlxSsTKQS7IDyr_Ah4JLBYj6_HX6ijKdm4fAY/edit?usp=sharing
- HGHI - HRR Scorecard - 40% Population.csv - Exported from https://docs.google.com/spreadsheets/d/1xAyBFTrlxSsTKQS7IDyr_Ah4JLBYj6_HX6ijKdm4fAY/edit?usp=sharing
- HGHI - HRR Scorecard - 20% Population.csv - Exported from https://docs.google.com/spreadsheets/d/1xAyBFTrlxSsTKQS7IDyr_Ah4JLBYj6_HX6ijKdm4fAY/edit?usp=sharing
Definitive Healthcare (DH) Data
Data generated by Definitive Healthcare, which is opening up its data for COVID-19 relief (thank you!!!). Definitive Healthcare acquired Billian's HealthDATA in 2016. Billian's is the database referenced in the paper "Assessing the capacity of the healthcare system to use additional mechanical ventilators during a large-scale public health emergency," which we hope to use to model capacity under different capacity levels (Conventional, Contingency, Crisis).
- dh_facility_data.geojson: Source US Facility data. This data was converted from a Shapefile to GeoJSON via
Homeland Infrastructure Foundation-Level Data (HIFLD)
This feature class/shapefile contains locations of hospitals for the 50 US states, Washington D.C., and the US territories of Puerto Rico, Guam, American Samoa, Northern Mariana Islands, Palau, and Virgin Islands. The dataset only includes hospital facilities based on data acquired from various state departments or federal sources, which are referenced in the SOURCE field. Hospital facilities that do not appear in these sources are not present in the database. The source data was available in a variety of formats (PDFs, tables, webpages, etc.), which were cleaned, geocoded, and then converted into a spatial database. The database does not contain nursing homes or health centers. Hospitals have been categorized into children, chronic disease, critical access, general acute care, long term care, military, psychiatric, rehabilitation, special, and women based on the range of the available values from the various sources after removing similarities. In this update, the TRAUMA field was populated for 172 additional hospitals and helipad presence was verified for all hospitals.
- hifld-hospitals.csv: Source HIFLD facility data
Rubinson, L., Vaughn, F., Nelson, S., Giordano, S., Kallstrom, T., Buckley, T., . . . Branson, R. (2010). Mechanical Ventilators in US Acute Care Hospitals. Disaster Medicine and Public Health Preparedness, 4(3), 199-206. doi:10.1001/dmp.2010.18
From spreadsheet constructed by Dave Luo: https://docs.google.com/spreadsheets/d/1IDeFJJ1Kq5fXAp5vR_Fqp1jtf_4qjqqfeha5BsKUGe8/edit#gid=891030621
- ventilators_by_state.csv: Ventilators by state.
CovidCareMap.org US Healthcare System Capacity - Manual Override
This file allows CovidCareMap.org members to manually override facility information. It has the same layout as the CovidCareMap.org US Healthcare System Capacity facility data, with two new columns: Manual Override Reason and Manual Override New Data Source. These columns describe the reason for the manual override and the source of the new data, respectively.
- covidcaremap-ushcsc-facility-manual-override.csv: A CSV containing facility-level rows to be used when generating the CCM-USHCSC facility data.
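Because the override file shares the facility layout, applying it can be sketched as a row replacement keyed on the facility identifier. In this sketch the two override-only columns are dropped so the output keeps the facility layout. It rests on three assumptions. Rows are matched on CCM_ID. Override rows with no matching facility are treated as new facilities and appended. Both inputs are lists of dicts, for example from csv.DictReader. The real generation step may behave differently, and the names here are hypothetical.

```python
OVERRIDE_ONLY_COLUMNS = ("Manual Override Reason", "Manual Override New Data Source")


def apply_manual_overrides(facilities, overrides, key="CCM_ID"):
    """Replace facility rows with override rows sharing the same key.

    Override-only columns are removed; override rows with no matching
    facility are appended (an assumption about how new facilities are added).
    """
    pending = {
        row[key]: {c: v for c, v in row.items() if c not in OVERRIDE_ONLY_COLUMNS}
        for row in overrides
    }
    result = [pending.pop(row[key], row) for row in facilities]
    result.extend(pending.values())
    return result


facilities = [{"CCM_ID": "1", "Staffed All Beds": "10"}, {"CCM_ID": "2", "Staffed All Beds": "20"}]
overrides = [{"CCM_ID": "2", "Staffed All Beds": "25",
              "Manual Override Reason": "example", "Manual Override New Data Source": "example"}]
print(apply_manual_overrides(facilities, overrides))
```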
Kaiser Family Foundation data
Hospital Beds per 1,000 Population by Ownership Type
- kff_hospital_beds_per_capita_by_state.csv: State-level hospital bed data.
- us_counties.geojson: From https://eric.clst.org/assets/wiki/uploads/Stuff/gz_2010_us_050_00_20m.json
- us_states.geojson: From https://eric.clst.org/assets/wiki/uploads/Stuff/gz_2010_us_040_00_20m.json
Hospital Referral Region (HRR) Boundaries
Hospital Referral Regions (HRRs) represent regional health care markets for tertiary medical care.
- us_hrr.geojson: Downloaded from https://atlasdata.dartmouth.edu/downloads/geography/hrr_bdry.zip as a Shapefile. Converted to GeoJSON via
Zip Code Convex hulls
Generated by Simon Kassel. TODO: Describe source, generate with notebook.
- us_zip_codes-convex_hulls.geojson: The zip code GeoJSON file for the whole US was prohibitively large, so we reduced its size by simplifying the polygons into their convex hulls. This dramatically reduced the file size while keeping enough spatial information for the simple task of validating basic location.
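The convex-hull simplification described for us_zip_codes-convex_hulls.geojson is sketched below, together with a point-in-hull test. That test is the kind of basic location check the file is meant to support, for example whether a facility's coordinates fall within its zip code's hull. The hull uses Andrew's monotone chain. Longitude/latitude are treated as planar coordinates, which we assume is adequate at zip-code scale. This is not the script that produced the file; how that file was generated is not documented here.

```python
def cross(o, a, b):
    """Z component of (a - o) x (b - o); positive means a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Andrew's monotone chain: hull vertices in counter-clockwise order.

    Accepts (x, y) pairs as tuples or lists (e.g. a GeoJSON polygon ring);
    collinear boundary points are dropped.
    """
    pts = sorted(set(map(tuple, points)))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other.
    return lower[:-1] + upper[:-1]


def point_in_hull(point, hull):
    """True if point is inside or on the boundary of a counter-clockwise convex hull."""
    if len(hull) < 3:
        return tuple(point) in hull  # degenerate hulls only match their vertices
    n = len(hull)
    return all(cross(hull[i], hull[(i + 1) % n], point) >= 0 for i in range(n))


hull = convex_hull([[0, 0], [2, 0], [2, 2], [0, 2], [1, 1]])
print(hull, point_in_hull((1, 1), hull), point_in_hull((3, 1), hull))
```

For a GeoJSON Polygon, the outer ring is geometry["coordinates"][0]. A MultiPolygon would need its rings flattened into one point list first.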
US Census Data
County-level data: the latest Census population demographics by US county. See https://www.census.gov/data/tables/time-series/demo/popest/2010s-counties-detail.html
Puerto Rico populations taken from Puerto Rico Commonwealth Population by Characteristics: 2010-2019
- us-census-cc-est2018-alldata.csv: County level data. Data Description. Downloaded from https://www2.census.gov/programs-surveys/popest/datasets/2010-2018/counties/asrh/cc-est2018-alldata.csv
- PEP_2018_PEPAGESEX_with_ann.csv: State level data including Puerto Rico. Downloaded from https://www2.census.gov/programs-surveys/popest/tables/2010-2018/state/asrh/PEP_2018_PEPAGESEX.zip
Note: County-level data is not committed to the repository. See Downloading Data.
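The Population, Population (20+) and Population (65+) fields can be derived from the county file by summing age groups. This is a sketch based on our understanding of the cc-est2018-alldata layout; verify the codes against the Census file description. In that layout, YEAR 11 is the July 1, 2018 estimate. AGEGRP 0 is the all-ages total, 1 is ages 0-4, 2 is ages 5-9, and so on up to 18 for 85 and older. So 20+ starts at group 5 and 65+ at group 14. The function names are hypothetical, and the project's notebooks may compute these differently.

```python
import csv
from collections import defaultdict

# Assumed codes from the cc-est2018-alldata file layout.
YEAR_2018 = 11
AGEGRP_20_PLUS = 5   # first group is ages 20-24
AGEGRP_65_PLUS = 14  # first group is ages 65-69


def county_populations(rows):
    """Return {fips: {"total": n, "20+": n, "65+": n}} from the 2018 estimate rows."""
    pops = defaultdict(lambda: {"total": 0, "20+": 0, "65+": 0})
    for row in rows:
        if int(row["YEAR"]) != YEAR_2018:
            continue
        fips = row["STATE"].zfill(2) + row["COUNTY"].zfill(3)
        agegrp = int(row["AGEGRP"])
        count = int(row["TOT_POP"])
        if agegrp == 0:
            pops[fips]["total"] += count
            continue
        if agegrp >= AGEGRP_20_PLUS:
            pops[fips]["20+"] += count
        if agegrp >= AGEGRP_65_PLUS:
            pops[fips]["65+"] += count
    return dict(pops)


def read_county_populations(path):
    # Assumption: the Census file is not UTF-8 encoded (it contains names like "Doña Ana").
    with open(path, newline="", encoding="latin-1") as f:
        return county_populations(csv.DictReader(f))
```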
WorldPop data, summarized with zonal summary calculations over country and region geometries by @echeipesh (https://github.com/echeipesh/geotrellis-worldpop) to create 2020 population estimates.
- worldpop-region-pop-for-ihme-2020.csv: Populations for regions in IHME projections.
- worldpop-country-pop-for-ihme-2020.csv: Populations for countries in IHME projections.
COVID-19 confirmed cases
USAFacts provides county-level COVID-19 data; see https://usafacts.org/visualizations/coronavirus-covid-19-spread-map/. This data is fetched dynamically via methods in the covidcaremap.cases Python package.
NY Times provides county-level COVID-19 data; see https://github.com/nytimes/covid-19-data. This data is fetched dynamically via methods in the covidcaremap.cases Python package.
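For readers working outside the covidcaremap package, this sketch reduces a county-level NY Times CSV to the latest cumulative case count per county. It assumes the date,county,state,fips,cases,deaths column layout of the NYT county file. It is a standalone illustration, not the covidcaremap.cases API, and the function name is hypothetical. Rows without a FIPS code, such as the NYT's aggregated "New York City" or "Unknown" entries, are skipped.

```python
import csv
import io


def latest_cases_by_fips(csv_text):
    """Return {fips: (date, cumulative cases)} keeping each county's most recent row."""
    latest = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        fips = row["fips"].strip()
        if not fips:
            continue
        date = row["date"]  # ISO dates (YYYY-MM-DD) compare correctly as strings
        if fips not in latest or date > latest[fips][0]:
            latest[fips] = (date, int(row["cases"]))
    return latest


sample = (
    "date,county,state,fips,cases,deaths\n"
    "2020-03-01,King,Washington,53033,10,1\n"
    "2020-03-02,King,Washington,53033,14,2\n"
    "2020-03-02,New York City,New York,,50,0\n"
)
print(latest_cases_by_fips(sample))
```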
Processed datasets are produced by CovidCareMap.org processing and analytics and committed to the repository. They are not as well documented or verified as the published datasets, since they are generally intermediate output.
Note: The below list is not complete.
Produced by the notebook Process HCRIS Data.ipynb.