# Generate CovidCareMap Regional Data

This notebook rolls up count values from the facility data into three region types: county, state, and Hospital Referral Region (HRR).

Most of the work is done by the `sum_per_region` function in the `covidcaremap.geo` package; see that code for specifics.

## Methods

- Spatially join the facility data to the regional data, and sum the count properties for each region. Occupancy rates are aggregated differently; see the notes below.
- Using population counts, create "per 1000" versions of each count column for the total, adult (20+), and elderly (65+) populations.
- Save each of the three aggregated datasets as both GeoJSON and CSV.
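The aggregation and per-1000 steps above can be sketched in plain pandas. This is a simplified illustration under stated assumptions, not the `sum_per_region` implementation: the spatial join is replaced by a precomputed `region` column, and the column names (`beds`, `icu_beds`, and the `... per 1000 ...` naming) are hypothetical rather than the names in the published files.

```python
import pandas as pd

# Hypothetical facility rows already tagged with a region ID
# (standing in for the output of the spatial join step).
facilities = pd.DataFrame({
    "region": ["A", "A", "B"],
    "beds": [100, 50, 30],
    "icu_beds": [10, 5, 2],
})

# Hypothetical region populations; region C has no facilities.
regions = pd.DataFrame({
    "region": ["A", "B", "C"],
    "Population": [300_000, 60_000, 10_000],
    "Population (20+)": [225_000, 45_000, 7_500],
    "Population (65+)": [45_000, 9_000, 1_500],
})

count_columns = ["beds", "icu_beds"]
population_columns = ["Population", "Population (20+)", "Population (65+)"]

# Sum the count columns per region, then attach them to every region.
summed = facilities.groupby("region", as_index=False)[count_columns].sum()
result = regions.merge(summed, on="region", how="left")

# Regions with no facilities get zero counts instead of NaN.
result[count_columns] = result[count_columns].fillna(0)

# Add one "per 1000" column for each (count column, population column) pair.
for count_col in count_columns:
    for pop_col in population_columns:
        result[f"{count_col} per 1000 {pop_col}"] = (
            result[count_col] / result[pop_col] * 1000
        )
```

Keeping every region via a left merge means regions without any facilities still appear in the output, with zero counts.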

### Notes on aggregation of occupancy rates

Occupancy rates are aggregated as a weighted average, where each facility's rate is weighted by its number of beds (or ICU beds, for ICU occupancy) relative to the total beds in that region.

If a facility's occupancy rate is NaN, its beds are excluded from the total used to weight the average.

So the occupancy rate $O$ is calculated as 
$$O=\frac{\sum_{f\in F}b_{f}o_{f}}{\sum_{f\in F}b_{f}}$$
where $F$ is the set of facilities that have a non-NaN value for occupancy, $o_{f}$ is the occupancy rate for facility $f$, and $b_{f}$ is the bed count for facility $f$.
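As a worked example with three hypothetical facilities (numbers made up for illustration): facility 1 has $b_1=100$, $o_1=0.8$; facility 2 has $b_2=50$, $o_2=0.5$; facility 3 has $b_3=40$ and a NaN occupancy, so it is not in $F$:
$$O=\frac{100(0.8)+50(0.5)}{100+50}=\frac{105}{150}=0.7$$
Counting facility 3's beds in the denominator would instead give $105/190\approx0.553$, understating occupancy.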

In some cases the HCRIS data reports an occupancy rate greater than 1. These values are left as-is in the facility-level data as source data errors, but any occupancy rate greater than 1 is capped at 1 for this calculation.
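The formula, including the NaN exclusion and the cap at 1, can be sketched in pandas. `weighted_occupancy` is a hypothetical helper written for illustration; it is not the code that `sum_per_region` actually runs.

```python
import numpy as np
import pandas as pd


def weighted_occupancy(beds: pd.Series, occupancy: pd.Series) -> float:
    """Bed-weighted average occupancy over facilities with a known rate.

    Hypothetical helper illustrating the formula; not the covidcaremap code.
    """
    # Only facilities with a non-NaN occupancy rate contribute beds.
    known = occupancy.notna()
    weights = beds[known]
    total = weights.sum()
    if total == 0:
        return np.nan
    # Rates above 1 are treated as source data errors and capped at 1.
    capped = occupancy[known].clip(upper=1.0)
    return float((weights * capped).sum() / total)


beds = pd.Series([100, 50, 40, 20])
occupancy = pd.Series([0.8, 0.5, np.nan, 1.25])
# (100*0.8 + 50*0.5 + 20*1.0) / (100 + 50 + 20)
rate = weighted_occupancy(beds, occupancy)
```

Returning NaN when no facility reports a rate avoids a division by zero and keeps "unknown" distinct from "0% occupied".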

In [None]:
import geopandas as gpd
import pandas as pd

from covidcaremap.data import (read_facility_gdf, 
                               read_us_hrr_gdf,
                               read_us_states_gdf,
                               read_us_counties_gdf,
                               external_data_path,
                               published_data_path)
from covidcaremap.geo import sum_per_region

In [None]:
facility_gdf = read_facility_gdf()

## By HRR

In [None]:
hrr_fname = 'us_healthcare_capacity-hrr-CovidCareMap'
hrr_geojson_path = published_data_path('{}.geojson'.format(hrr_fname))
hrr_csv_path = published_data_path('{}.csv'.format(hrr_fname))

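hrr_gdf = read_us_hrr_gdf()
# Drop HRR identifier columns; HRRCITY is used as the region ID.
hrr_gdf = hrr_gdf.drop(columns=['HRR_BDRY_I', 'HRRNUM'])

# Spatially join facilities to HRRs and sum the count columns per HRR.
hosp_hrr_gdf = sum_per_region(facility_gdf,
                              hrr_gdf,
                              groupby_columns=['HRRCITY'],
                              region_id_column='HRRCITY')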

hosp_hrr_gdf.to_file(hrr_geojson_path, driver='GeoJSON')

hosp_hrr_df = hosp_hrr_gdf.drop(columns=['geometry']).sort_values(by='HRRCITY')
hosp_hrr_df.to_csv(hrr_csv_path, index=False)

## By County

In [None]:
county_fname = 'us_healthcare_capacity-county-CovidCareMap'
county_geojson_path = published_data_path('{}.geojson'.format(county_fname))
county_csv_path = published_data_path('{}.csv'.format(county_fname))

county_gdf = read_us_counties_gdf().rename(columns={'COUNTY_FIPS': 'fips_code'})
# Keep only the ID, geometry, and population columns for aggregation;
# the descriptive county columns are merged back in afterwards.
filtered_county_gdf = county_gdf[['GEO_ID',
                                  'geometry',
                                  'Population',
                                  'Population (20+)',
                                  'Population (65+)']]

hosp_county_gdf = sum_per_region(facility_gdf,
                                 filtered_county_gdf,
                                 groupby_columns=['GEO_ID'],
                                 region_id_column='GEO_ID')

# Reattach the FIPS code, state, and county name, then drop the GEO_ID key.
merged_county_gdf = county_gdf[['GEO_ID', 'fips_code', 'State', 'County Name']] \
    .merge(hosp_county_gdf, on='GEO_ID') \
    .drop(columns=['GEO_ID'])

hosp_county_gdf = gpd.GeoDataFrame(merged_county_gdf, crs='EPSG:4326')

hosp_county_gdf.to_file(county_geojson_path, driver='GeoJSON')

hosp_county_df = hosp_county_gdf.drop(columns=['geometry']).sort_values(by=['State',
                                                                            'County Name'])
hosp_county_df.to_csv(county_csv_path, index=False)

## By State

In [None]:
state_fname = 'us_healthcare_capacity-state-CovidCareMap'
state_geojson_path = published_data_path('{}.geojson'.format(state_fname))
state_csv_path = published_data_path('{}.csv'.format(state_fname))

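state_gdf = read_us_states_gdf()
# Keep only the ID, geometry, and population columns for aggregation;
# the state name is merged back in afterwards.
filtered_state_gdf = state_gdf[['State',
                                'geometry',
                                'Population',
                                'Population (20+)',
                                'Population (65+)']]

# Drop the facility-level 'State' column so that grouping on 'State'
# refers only to the state region's column.
facility_without_state_gdf = facility_gdf.drop(columns=['State'])

hosp_state_gdf = sum_per_region(facility_without_state_gdf,
                                filtered_state_gdf,
                                groupby_columns=['State'],
                                region_id_column='State')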

hosp_state_gdf = gpd.GeoDataFrame(
    state_gdf[['State', 'State Name']].merge(hosp_state_gdf, on='State'),
    crs='EPSG:4326'
)

### Merge ventilator data into state data

In [None]:
vents_path = external_data_path('ventilators_by_state.csv')
vents_df = pd.read_csv(vents_path, encoding='utf-8')
vents_df = vents_df.drop(columns=['Location']).rename(columns={'State Abbrv': 'State'})

# Rename columns to make explicit that these are older (2010 study) estimates.
vent_renames = {
    'Estimated No. Full-Featured Mechanical Ventilators': (
        'Estimated No. Full-Featured Mechanical Ventilators (2010 study estimate)'
    ),
    'Estimated No. Full-Featured Mechanical Ventilators per 100,000 Population': (
        'Estimated No. Full-Featured Mechanical Ventilators per 100,000 Population (2010 study estimate)'
    ),
    'Estimated No. Pediatrics-Capable Full-Feature Mechanical Ventilators': (
        'Estimated No. Pediatrics-Capable Full-Feature Mechanical Ventilators (2010 study estimate)'
    ),
    'Estimated No. Full-Feature Mechanical Ventilators, Pediatrics Capable per 100,000 Population <14 y': (
        'Estimated No. Full-Feature Mechanical Ventilators, Pediatrics Capable per 100,000 Population <14 y (2010 study estimate)'
    )
}

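# Fail early if the source CSV's column names have changed.
for column in vent_renames:
    assert column in vents_df.columns, 'Missing expected column: {}'.format(column)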

vents_df = vents_df.rename(columns=vent_renames)

In [None]:
hosp_state_gdf = hosp_state_gdf.merge(vents_df, on='State')
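`merge` defaults to an inner join, so any state present in one table but not the other is dropped without warning. One way to check for that is sketched below with small hypothetical frames standing in for `hosp_state_gdf` and `vents_df`; the values are made up for illustration.

```python
import pandas as pd

# Hypothetical stand-ins for hosp_state_gdf and vents_df.
states = pd.DataFrame({"State": ["CA", "NY", "PR"], "beds": [1, 2, 3]})
vents = pd.DataFrame({"State": ["CA", "NY"], "vents": [10, 20]})

# A left join keeps every state; indicator=True adds a "_merge" column
# recording whether each row matched ("both") or not ("left_only").
checked = states.merge(vents, on="State", how="left", indicator=True)
unmatched = checked.loc[checked["_merge"] == "left_only", "State"].tolist()
```

If `unmatched` is non-empty, those states would be missing from the published state files under the inner join.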

In [None]:
hosp_state_gdf.to_file(state_geojson_path, driver='GeoJSON')

hosp_state_df = hosp_state_gdf.drop(columns=['geometry']).sort_values(by='State')
hosp_state_df.to_csv(state_csv_path, index=False)