Data Catalog

A catalog of all data used in the project.

└── data
    ├──          <- This file.
    ├── external           <- Data from external sources (see External Datasets).
    ├── published          <- The final, canonical data sets for COVID Care Map.
    ├── processed          <- Folder containing intermediate processing data.
    └── local              <- Folder containing intermediate data that is not committed to the repository.

Published Datasets

  • us_healthcare_capacity-facility-CovidCareMap.geojson: Capacity information for US Health Facilities in GeoJSON format.
  • us_healthcare_capacity-facility-CovidCareMap.csv: Capacity information for US Health Facilities in CSV format.
  • us_healthcare_capacity-county-CovidCareMap.geojson: Aggregated facility capacity information by County in GeoJSON format.
  • us_healthcare_capacity-county-CovidCareMap.csv: Aggregated facility capacity information by County in CSV format.
  • us_healthcare_capacity-state-CovidCareMap.geojson: Aggregated facility capacity information by State in GeoJSON format.
  • us_healthcare_capacity-state-CovidCareMap.csv: Aggregated facility capacity information by State in CSV format.
  • us_healthcare_capacity-hrr-CovidCareMap.geojson: Aggregated facility capacity information by Healthcare Referral Region (HRR) in GeoJSON format.
  • us_healthcare_capacity-hrr-CovidCareMap.csv: Aggregated facility capacity information by Healthcare Referral Region (HRR) in CSV format.

The 'published' directory contains datasets published by COVID Care Map. These are data that have been aggregated from various sources, analyzed, processed, inspected for validity and written to common data formats for easy consumption.
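As a rough illustration of consuming the CSV files, the sketch below sums a bed-count column per state using only the Python standard library. The column names `State` and `Staffed All Beds` are taken from the data dictionary in this document; the helper name `staffed_beds_by_state` and the sample rows are made up for illustration, and the published aggregates may be computed differently.

```python
import csv
import io
from collections import defaultdict


def staffed_beds_by_state(csv_file):
    """Sum the 'Staffed All Beds' column per 'State' from an open facility CSV."""
    totals = defaultdict(float)
    for row in csv.DictReader(csv_file):
        value = (row.get("Staffed All Beds") or "").strip()
        if not value:
            continue  # skip facilities with no reported value
        totals[row["State"]] += float(value)
    return dict(totals)


# Tiny made-up sample standing in for the published facility CSV.
sample = io.StringIO(
    "Name,State,Staffed All Beds\n"
    "Hospital A,PA,100\n"
    "Hospital B,PA,50\n"
    "Hospital C,NJ,\n"
    "Hospital D,NJ,25\n"
)
totals = staffed_beds_by_state(sample)
print(totals)
```

To run this against the real data, pass a file opened with `open(path, newline="")`, where `path` points at us_healthcare_capacity-facility-CovidCareMap.csv in your checkout (presumably under data/published, per the directory layout above).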

IMPORTANT NOTE: This data may be updated. If you need links to data that does not change, see the note about Using Tags in the main README.

CovidCareMap US Healthcare System Capacity data

This data aggregates information about the healthcare system capacity. It sources data from the Healthcare Cost Report Information System (HCRIS) and an open hospital facilities dataset by Definitive Healthcare.

CovidCareMap Capacity Data Dictionary

Common Fields

These fields appear in all facility and regional datasets:

  • Staffed All Beds - Number of hospital beds of all types typically set up and staffed for inpatient care as reported/estimated in selected facility or area

  • Staffed ICU Beds - Number of ICU beds typically set up and staffed for intensive inpatient care as reported/estimated in selected facility or area

  • Licensed All Beds - Number of hospital beds of all types licensed for potential use in selected facility or area

  • All Bed Occupancy Rate - % of hospital beds of all types typically occupied by patients in selected facility or area

  • ICU Bed Occupancy Rate - % of ICU beds typically occupied by patients in selected facility or area

Facility Fields

In addition to the above fields, facility data has the following:

  • Name: Name of the facility, same as Definitive Healthcare data.

  • Hospital Type: Hospital Type from Definitive Healthcare data. See Hospital Types

  • Address, Address_2, City, State, Zipcode, County, Latitude, Longitude: Location information from the Definitive Healthcare data.

  • CCM_ID - Unique identifier for the facility. Matches the Definitive Healthcare ID until new facilities are added or other datasets are brought in.

  • DH-OBJECTID - The OBJECTID in the Definitive Healthcare dataset for this facility.

  • HCRIS-Provider Number - The Provider Number from the HCRIS reports (also matches the PROVIDER_NUMBER field in the facility information).

Source information: The facility dataset also has a set of columns suffixed with '- SOURCE', each describing the source of the value in the corresponding column. Values are prefixed with 'HCRIS' when the number comes from HCRIS data, or with 'DH' when it comes from Definitive Healthcare data. Each value also includes the column name from that dataset, so you can trace everything back to the source datasets. The HCRIS columns are created by the data processing steps and can be traced back to the original HCRIS file values through the processing notebook workflow. The DH columns map directly to the DH external dataset.
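A minimal sketch of reading the '- SOURCE' columns, assuming only what is stated above: the column names end in '- SOURCE' and the values start with 'HCRIS' or 'DH'. The function names are hypothetical, and the sample values are invented because the exact layout of a source value (prefix plus column name) is not spelled out here.

```python
SOURCE_SUFFIX = "- SOURCE"


def source_dataset(source_value):
    """Classify a '- SOURCE' value by its documented prefix."""
    text = (source_value or "").strip()
    if text.startswith("HCRIS"):
        return "HCRIS"
    if text.startswith("DH"):
        return "Definitive Healthcare"
    return "unknown"


def source_columns(row):
    """Map each base column name to the dataset its value came from."""
    return {
        name[: -len(SOURCE_SUFFIX)].rstrip(): source_dataset(value)
        for name, value in row.items()
        if name.endswith(SOURCE_SUFFIX)
    }


# Made-up row; the text after each prefix is a placeholder, not a real column name.
example_row = {
    "Staffed All Beds": "100",
    "Staffed All Beds - SOURCE": "HCRIS example_column",
    "Licensed All Beds": "120",
    "Licensed All Beds - SOURCE": "DH example_column",
}
sources = source_columns(example_row)
print(sources)
```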

Regional Fields

Per Capita Information: There are additional per-capita fields in the regional datasets:

  • Population - Population of this region, sourced from the US Census Bureau 2018 county population estimates.

  • Population (20+) - Population of people aged 20 years or older.

  • Population (65+) - Population of people aged 65 years or older.

  • Staffed All Beds [Per 1000 People], Staffed All Beds [Per 1000 Adults (20+)], Staffed All Beds [Per 1000 Elderly (65+)], etc. - The Staffed All Beds, Staffed ICU Beds, and Licensed All Beds fields per capita of the population described.
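The per-capita fields above are a count scaled to 1,000 people of the given population group. A small sketch of that arithmetic follows; the helper name `per_1000` and the sample numbers are illustrative only, and how the project handles missing or zero populations is an assumption.

```python
def per_1000(count, population):
    """Return count per 1,000 people, or None when it cannot be computed."""
    if count is None or not population:
        return None  # assumption: missing or zero population yields no value
    return count * 1000 / population


# Made-up region values using the field names from the data dictionary.
region = {"Staffed All Beds": 250, "Population": 100_000, "Population (65+)": 16_000}

beds_per_1000_people = per_1000(region["Staffed All Beds"], region["Population"])
beds_per_1000_elderly = per_1000(region["Staffed All Beds"], region["Population (65+)"])
print(beds_per_1000_people, beds_per_1000_elderly)
```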

County-level fields

The county dataset includes a fips_code field, a unique identifier for counties.

  • fips_code - The FIPS county code for the given county.
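County FIPS codes are five digits: a two-digit state code followed by a three-digit county code. Leading zeros are easily lost when a CSV column is parsed as a number, so a normalization step like the sketch below can help when joining on fips_code. Whether the published files store the code as text or as a number is not stated here, and the helper names are hypothetical.

```python
def normalize_fips(value):
    """Return a county FIPS code as a zero-padded 5-character string."""
    text = str(value).strip()
    if text.endswith(".0"):  # e.g. a code that was parsed as a float
        text = text[:-2]
    if not text.isdigit() or len(text) > 5:
        raise ValueError(f"not a county FIPS code: {value!r}")
    return text.zfill(5)


def split_fips(value):
    """Split a county FIPS code into (state part, county part)."""
    code = normalize_fips(value)
    return code[:2], code[2:]


print(normalize_fips(1001))
print(split_fips("42101"))
```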

State-level fields

Ventilator Information: State-level data includes estimates of mechanical ventilators. These come from a 2010 study that used a 2009 survey of hospitals and US Census population estimates from 2008:

  • Estimated No. Full-Featured Mechanical Ventilators
  • Estimated No. Full-Featured Mechanical Ventilators per 100,000 Population
  • Estimated No. Pediatrics-Capable Full-Feature Mechanical Ventilators
  • Estimated No. Full-Feature Mechanical Ventilators, Pediatrics Capable per 100,000 Population <14

Hospital Types

This information comes directly from the ESRI page for the Definitive Healthcare dataset.

  • Short Term Acute Care Hospital (STAC)
    • Provides inpatient care and other services for surgery, acute medical conditions, or injuries
    • Patient care can be provided overnight, and average length of stay is less than 25 days
  • Critical Access Hospital (CAH)
    • 25 or fewer acute care inpatient beds
    • Located more than 35 miles from another hospital
    • Annual average length of stay is 96 hours or less for acute care patients
    • Must provide 24/7 emergency care services
    • Designation by CMS to reduce financial vulnerability of rural hospitals and improve access to healthcare
  • Religious Non-Medical Health Care Institutions
    • Provide nonmedical health care items and services to people who need hospital or skilled nursing facility care, but for whom that care would be inconsistent with their religious beliefs
  • Long Term Acute Care Hospitals
    • Average length of stay is more than 25 days
    • Patients are receiving acute care - services often include respiratory therapy, head trauma treatment, and pain management
  • Rehabilitation Hospitals
    • Specializes in improving or restoring patients' functional abilities through therapies
  • Children’s Hospitals
    • Majority of inpatients under 18 years old
  • Psychiatric Hospitals
    • Provides inpatient services for diagnosis and treatment of mental illness 24/7
    • Under the supervision of a physician
  • Veterans Affairs (VA) Hospital
    • Responsible for the care of war veterans and other retired military personnel
    • Administered by the U.S. VA, and funded by the federal government
  • Department of Defense (DoD) Hospital

US Healthcare Capacity by Facility


External Datasets

External datasets used in the project will be documented here. This includes data committed to the repository in the data/external folder, data downloaded by the Download_Data.ipynb notebook, and data dynamically fetched from updating endpoints, normally exposed through the covidcaremap python package.

Downloading Data

Note that some data will be committed to the repository, while others are too big and need to be downloaded by running the Download_Data.ipynb notebook. You must download all the project data to ensure all notebooks run.

Health System Data

Healthcare Cost Report Information System (HCRIS) Data

CMS Healthcare Cost Report Information System (HCRIS):

This includes hospital facility information and facility cost reports.

Note: Not committed to the repository. See Downloading Data.

  • HCRIS-HOSPITAL10_PROVIDER_ID_INFO.CSV: Facility level IDs, names and addresses.
  • HCRIS-HCRIS_DataDictionary.csv: Data dictionary with report column mappings.
  • HCRIS-hosp10_2018_RPT.CSV: HCRIS report data for 2018.
  • HCRIS-hosp10_2018_NMRC.CSV: Numeric column information for the HCRIS report data.

HCRIS data from 2017 and 2018 from Jacob Fenton's project

We use data from this project to override bed counts and occupancy rates where it has data. Otherwise, we use our 2018 HCRIS calculations.

  • hospital_data_jsfenfen20200406.csv: HCRIS data from Jacob Fenton's project.
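The override rule described above (prefer the Jacob Fenton project value when it exists, otherwise keep the 2018 HCRIS calculation) can be sketched as a per-field fallback. This is only an illustration: the function names are hypothetical, the field names are borrowed from the data dictionary rather than from the override file, and the actual logic lives in the processing notebooks.

```python
def pick_value(override_value, hcris_2018_value):
    """Prefer the override value when present, else fall back to the 2018 HCRIS value."""
    if override_value is None or override_value == "":
        return hcris_2018_value
    return override_value


def merge_facility(hcris_2018, override, fields):
    """Return a copy of the HCRIS record with the listed fields overridden where data exists."""
    merged = dict(hcris_2018)
    for field in fields:
        merged[field] = pick_value(override.get(field), hcris_2018.get(field))
    return merged


# Made-up records for a single facility.
hcris_record = {"Staffed All Beds": 100, "All Bed Occupancy Rate": 0.6}
override_record = {"Staffed All Beds": 120, "All Bed Occupancy Rate": None}

merged = merge_facility(
    hcris_record, override_record, ["Staffed All Beds", "All Bed Occupancy Rate"]
)
print(merged)
```

Note that a value of 0 counts as real data in this sketch; only None and empty strings trigger the fallback.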

Institute for Health Metrics and Evaluation, University of Washington (IHME) Forecasts

IHME COVID-19 health service utilization forecasting team. Forecasting COVID-19 impact on hospital bed-days, ICU-days, ventilator days and deaths by US state in the next 4 months. MedRxiv. 26 March 2020.

Harvard Global Health Institute (HGHI) Data

This data was collected from the Harvard Global Health Institute (HGHI) study described here:


Definitive Healthcare (DH) Data

Data generated by Definitive Healthcare, who is opening up their data for COVID-19 relief (thank you!!!). Definitive Healthcare acquired Billian's HealthDATA in 2016. Billian's is the database referenced in the paper Assessing the capacity of the healthcare system to use additional mechanical ventilators during a large-scale public health emergency, which we are hoping to use to model capacity under different capacity levels (Conventional, Contingency, Crisis).

  • dh_facility_data.geojson: Source US Facility data. This data was converted from a Shapefile to GeoJSON via ogr2ogr.

Homeland Infrastructure Foundation-Level Data (HIFLD)


This feature class/shapefile contains locations of Hospitals for 50 US states, Washington D.C., US territories of Puerto Rico, Guam, American Samoa, Northern Mariana Islands, Palau, and Virgin Islands. The dataset only includes hospital facilities based on data acquired from various state departments or federal sources which has been referenced in the SOURCE field. Hospital facilities which do not occur in these sources will not be present in the database. The source data was available in a variety of formats (pdfs, tables, webpages, etc.) which was cleaned and geocoded and then converted into a spatial database. The database does not contain nursing homes or health centers. Hospitals have been categorized into children, chronic disease, critical access, general acute care, long term care, military, psychiatric, rehabilitation, special, and women based on the range of the available values from the various sources after removing similarities. In this update the TRAUMA field was populated for 172 additional hospitals and helipad presence was verified for all hospitals.

  • hifld-hospitals.csv: Source HIFLD facility data

Ventilator Data


Rubinson, L., Vaughn, F., Nelson, S., Giordano, S., Kallstrom, T., Buckley, T., . . . Branson, R. (2010). Mechanical Ventilators in US Acute Care Hospitals. Disaster Medicine and Public Health Preparedness, 4(3), 199-206. doi:10.1001/dmp.2010.18

From spreadsheet constructed by Dave Luo:

Files US Healthcare System Capacity - Manual Override

This file allows members to manually override facility information. It has the same layout as the US Healthcare System Capacity facility data, with two new columns: Manual Override Reason and Manual Override New Data Source. These columns describe the reason for the manual override and the source of the new data, respectively.

  • covidcaremap-ushcsc-facility-manual-override.csv: The CSV containing rows of facility-level data to be used in the CCM-USHCSC facility data generation.

Kaiser Family Foundation data

Hospital Beds per 1,000 Population by Ownership Type

Downloaded from,%22sort%22:%22asc%22%7D

  • kff_hospital_beds_per_capita_by_state.csv: State-level hospital bed data.

Geospatial Information

County Boundaries



State Boundaries



Hospital Referral Region (HRR) Boundaries

Hospital Referral Regions (HRRs) represent regional health care markets for tertiary medical care.


Zip Code Convex hulls

Generated by Simon Kassel. TODO: Describe source, generate with notebook.

  • us_zip_codes-convex_hulls.geojson: The zip code geojson file for the whole US was prohibitively large so we reduced the size by simplifying the polygons into their convex hulls. This dramatically reduced the file size while keeping enough spatial information for the simple task of validating basic location.
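To illustrate what simplifying a polygon into its convex hull means, here is a self-contained sketch using Andrew's monotone chain algorithm on a small L-shaped polygon. It is not the code used to generate the file (the source of that file is still a TODO above); it only shows how a hull drops concave vertices, which is why the hull file is smaller.

```python
def convex_hull(points):
    """Return the convex hull of 2D points in counter-clockwise order (Andrew's monotone chain)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        # Positive when o -> a -> b makes a counter-clockwise turn.
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)

    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)

    # The last point of each chain is the first point of the other, so drop it.
    return lower[:-1] + upper[:-1]


# An L-shaped polygon with one concave corner at (1, 1).
l_shape = [(0, 0), (2, 0), (2, 1), (1, 1), (1, 2), (0, 2)]
hull = convex_hull(l_shape)
print(hull)
```

The hull keeps five of the six vertices and drops the concave corner, so any point inside the original polygon is still inside the hull. That is enough for a coarse location check, although the hull can also contain points that lie outside the original zip code boundary.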

Population Data

US Census Data

For county-level data: Latest census data for population demographics by US county. See

Puerto Rico populations taken from Puerto Rico Commonwealth Population by Characteristics: 2010-2019


Note County level data is not committed to repository. See Downloading Data


WorldPop data, as used in zonal summary calculations for country and region geometries, produced by @echeipesh to create 2020 population estimates.

  • worldpop-region-pop-for-ihme-2020.csv: Populations for regions in IHME projections.
  • worldpop-country-pop-for-ihme-2020.csv: Populations for countries in IHME projections.

Covid19 confirmed cases

USAFacts county-level COVID-19 data. This data is fetched dynamically via methods in the covidcaremap.cases Python package.

NY Times county-level COVID-19 data. This data is fetched dynamically via methods in the covidcaremap.cases Python package.

Processed Datasets

Processed datasets are produced by processing and analytics and committed to the repository, but they are not as well documented or verified as the published datasets because they are generally intermediary output.

Note: The below list is not complete.

HCRIS data


Produced by the notebook Process HCRIS Data.ipynb.