In [44]:
import os
import json
from copy import copy

import pandas as pd
import numpy as np
import geopandas as gpd

from covidcaremap.data import external_data_path, processed_data_path

# Get total & ICU staffed bed counts for every acute hospital facility in USA

Following methodology from this paper: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5514420/

> Study Design and Data Sources

> We performed a repeated-measures time series analysis of US ICU bed supply during the 16-year period between 1996 and 2011. We obtained data on hospital characteristics and intensive care occupancy from the Centers for Medicare and Medicaid Services Hospital Cost Report Information System (HCRIS), a publicly available hospital-level database with detailed information on structural, organizational and cost data for all US hospitals. We excluded skilled nursing facilities, long term acute care hospitals, hospitals located in US territories and stand-alone pediatric hospitals (1, 2). We augmented the HCRIS data with data from the US Census Bureau’s 2010 urban-rural classification file which we used to designate hospitals as urban or rural by ZIP code (5).

> Variables

> The primary dependent variable was each hospital’s number of ICU beds compared to the previous year. We defined total ICU beds using the summed counts of four HCRIS bed categories that were available throughout the study interval: intensive care beds, surgical intensive care beds, cardiac intensive care beds and burn intensive care beds 

## Useful References & Links:

CMS Healthcare Cost Report Information System (HCRIS):
https://www.cms.gov/Research-Statistics-Data-and-Systems/Downloadable-Public-Use-Files/Cost-Reports/Hospital-2010-form

Hospital Facilities: 
- http://downloads.cms.gov/files/hcris/hosp10-reports.zip

Public Use File (annual, from 2015): 
- https://data.cms.gov/api/views/absp-nd3x/files/e0ca9126-8fd6-42ca-82bf-c2fe40bd4c0e?download=true&filename=CostReport_Documentation_2015_Final_Oct2019.xlsx
- https://www.cms.gov/files/zip/hospital-cost-report-public-use-file-2015.zip

![](https://www.resdac.org/sites/resdac.umn.edu/files/kb-images/Figure%204_2.png)

Direct link to 2018 reporting from hospitals: http://downloads.cms.gov/Files/hcris/HOSP10FY2018.zip


## Specific guidance for what we're doing here:
from CMS data research guide: https://www.resdac.org/articles/medicare-cost-report-data-structure

![alt text](https://www.resdac.org/sites/resdac.umn.edu/files/kb-images/Figure%205.PNG)

> In summary, the number of beds will be located in the numeric file. 

> To identify the number of beds for every report in numeric file, filter the records where the second column (Worksheet Indicator) is “S300001,” the third column (Line Number) is “01400”, and the fourth column (Column Number) is “00200”. 

> To identify the number of beds for a specific report submitted by a specific facility, filter the records by the “Record Report Number,” which is reported in Column 1. The Report Record Number for a specific facility can be found in the Report data file.

## Official Worksheet Definitions

from Provider Reimbursement Manual (https://www.cms.gov/Regulations-and-Guidance/Guidance/Manuals/Paper-Based-Manuals-Items/CMS021935):

Column 2--Refer to 42 CFR 412.105(b) and 69 FR 49093-49098 (August 11, 2004) to determine the facility bed count. Indicate the number of beds available for use by patients at the end of the cost reporting period.

A bed means an adult bed, pediatric bed, portion of inpatient labor/delivery/postpartum (LDP) room (also referred to as birthing room) bed when used for services other than labor and delivery, or newborn ICU bed (excluding newborn bassinets) maintained in a patient care area for lodging patients in acute, long term, or domiciliary areas of the hospital. Beds in post-anesthesia, post- operative recovery rooms, outpatient areas, emergency rooms, ancillary departments (however, see exception for labor and delivery department), nurses' and other staff residences, and other such areas that are regularly maintained and utilized for only a portion of the stay of patients (primarily for special procedures or not for inpatient lodging) are not termed a bed for these purposes. (See CMS Pub. 15-1, chapter 22, §2205.)

For cost reporting periods beginning prior to October 1, 2012, beds in distinct ancillary labor and delivery rooms and the proportion of LDP room (birthing room) beds used for labor and delivery services are not a bed for these purposes. (See 68 FR 45420 (August 1, 2003).)

For cost reporting periods beginning on or after October 1, 2012, in accordance with 77 FR 53411- 53413 (August 31, 2012), beds in distinct labor and delivery rooms, when occupied by an inpatient receiving IPPS-level acute care hospital services or when unoccupied, are considered to be part of a hospital’s inpatient available bed count in accordance with 42 CFR 412.105(b) and are to be reported on line 32. Furthermore, the proportion of the inpatient LDP room (birthing room) beds used for ancillary labor and delivery services is considered part of the hospital’s available bed count.

Column 8--Enter the number of inpatient days for all classes of patients for each component. Include organ acquisition and HMO days in this column. This amount will not equal the sum of columns 5 through 7, when the provider renders services to other than titles V, XVIII, or XIX patients.

Line 1--For cost reporting periods beginning before October 1, 2012, exclude from column 2 the portion of LDP room (birthing room) beds used for ancillary labor and delivery services, but include on this line beds used for routine adult and pediatric services (postpartum). In accordance with the instructions in 68 FR 45420 (August 1, 2003), compute this proportion (off the cost report) by multiplying the total number of occupied and unoccupied available beds in the LDP room by the percentage of time these beds were used for ancillary labor and delivery services. An example of how to calculate the “percentage of time” would be for a hospital to determine the number of hours for the cost reporting period during which each LDP room maternity patient received labor and delivery services and divide the sum of those hours for all such patients by the sum of the total hours (for both, ancillary labor and delivery services and for routine postpartum services) that all maternity patients spent in the LDP room during that cost reporting period. Alternatively, a hospital could calculate an average percentage of time maternity patients received ancillary labor and delivery services in an LDP room during a typical month.
For cost reporting periods beginning on or after October 1, 2012, include all the available LDP room (birthing room) beds in the available bed count in column 2. (See 77 FR 53411-53413 (August 31, 2012).) The proportion of available LDP room beds related to the ancillary labor and delivery services must not be excluded from column 2 for those cost reporting periods.
In columns 5, 6, 7 and 8, enter the number of adult and pediatric hospital days excluding the SNF and NF swing-bed, observation bed, and hospice days. In columns 6 and 7, also exclude HMO days. Do not include in column 6 Medicare Secondary Payer/Lesser of Reasonable Cost (MSP/LCC) days. Include these days only in column 8. However, do not include employee discount days in column 8.

Line 7--Enter the sum of lines 1, 5, and 6.
Lines 8 through 13--Enter the appropriate statistic applicable to each discipline for all programs.
Line 14--Enter the sum of lines 7 through 13 for columns 2 through 8, and for columns 12 through 15, enter the amount from line 1. For columns 9 through 11, enter the total for each from your records. Labor and delivery days (as defined in the instructions for Worksheet S-3, Part I, line 32) must not be included on this line.




## Methods

This notebook gathers the HCRIS information based on the following inputs:

- The hospital data at http://downloads.cms.gov/files/hcris/hosp10-reports.zip 
  - HOSPITAL10_PROVIDER_ID_INFO.CSV (`HOSPITAL10`): Provides facility level IDs, names and addresses
- The 2018 reporting data from http://downloads.cms.gov/Files/hcris/HOSP10FY2018.zip
  - hosp10_2018_RPT.CSV (`HOSP10_RPT`): Provides facility report information.
  - hosp10_2018_NMRC.CSV (`HOSP10_NMRC`): Contains the numeric column values that are linked back to the report data from above.
- A data dictionary at https://www.cms.gov/files/zip/hospital2010-documentation.zip
  - HCRIS_DataDictionary.csv: provides report column codes with titles.

It then takes the following steps:

- Join the `HOSPITAL10` data and `HOSP10_RPT` to get all information about a facility per report.
- Filter the `HOSP10_NMRC` to only those line numbers and columns we care about
  - Filter `Line Number` to keep information about Staffed Beds, Staffed Bed Days, and Inpatient Days AND
  - Filter `Column Number` to keep information concerning the numeric counts related to: 
    - Hospital Adult and Peds
    - Intensive Care Unit 
    - Coronary Care Unit
    - Burn ICU
    - Surgical ICU 
    - Total
- Join the filtered `HOSP10_NMRC` data to the previously joined `HOSPITAL10` and `HOSP10_RPT` data.
- Aggregate counts for 'Intensive Care Unit', 'Coronary Care Unit', 'Burn ICU', 'Surgical ICU' into 'ICU Total Staffed Beds',  'ICU Total Bed Days Available', 'ICU Total Inpatient Days'.
- Calculate 'ICU Occupancy Rate' and 'Total Bed Occupancy Rate' using the ratio of Inpatient Days to Bed Day Available for ICU and over all totals. If the '* Bed Days' columns are not available, then compute them by using the number of staffed bed multiplied by the number of days between the report's Fiscal Year Begin Date and Fiscal Year End Date
- The resulting dataset will have information per Record per Facility. Some facility have multiple records. Choose the record with the 'Fiscal Year End Date' being the most recent.
- Any facilities that do not have report records are dropped.
- Drop problematic facilities, e.g. PARKVIEW MEDICAL CENTER which reports 2,290,239,239 total beds. 
- Join with the `usa_hospital_beds_hcris2018_geocoded.geojson` dataset based on 'Provider Number', generate a new GeoJSON file that contains all aggregated counts and other relevant HCRIS information with this joined data.


### Load the HCRIS Facility data

In [2]:
hosp_df = pd.read_csv(external_data_path('HCRIS-HOSPITAL10_PROVIDER_ID_INFO.CSV'))
hosp_df.head()

Unnamed: 0,PROVIDER_NUMBER,FYB,FYE,STATUS,CTRL_TYPE,HOSP10_Name,Street_Addr,PO_Box,City,State,Zip_Code,County
0,10001,01-OCT-17,30-SEP-18,As Submitted,9,SOUTHEAST HEALTH MEDICAL CENTER,1108 ROSS CLARK CIRCLE,6987,DOTHAN,AL,36301,HOUSTON
1,10005,01-OCT-17,30-SEP-18,As Submitted,9,MARSHALL MEDICAL CENTER - SOUTH,2505 U.S. HIGHWAY 431,,BOAZ,AL,35957-,MARSHALL
2,10006,01-JUL-18,30-JUN-19,As Submitted,4,NORTH ALABAMA MEDICAL CENTER,1701 VETERANS DRIVE,818,FLORENCE,AL,35630,LAUDERDALE
3,10007,01-OCT-17,30-SEP-18,As Submitted,9,MIZELL MEMORIAL HOSPITAL,702 MAIN STREET,429,OPP,AL,36462-,COVINGTON
4,10008,01-JAN-18,31-DEC-18,As Submitted,4,CRENSHAW COMMUNITY HOSPITAL,CRENSHAW COMMUNITY HOSPITAL,101 HOSPITAL CIRCLE,LUVERNE,AL,36049,CRENSHAW


In [3]:
# provider num should be 6 char so need to zfill
hosp_df['PROVIDER_NUMBER'] = hosp_df['PROVIDER_NUMBER'].apply(lambda x: str(x).zfill(6))

# Rename this column to match up with reports
hosp_df = hosp_df.rename(columns={'PROVIDER_NUMBER': 'Provider Number'})

In [4]:
# Show all providers in San Francisco county
hosp_df[hosp_df['County'] == 'SAN FRANCISCO']

Unnamed: 0,Provider Number,FYB,FYE,STATUS,CTRL_TYPE,HOSP10_Name,Street_Addr,PO_Box,City,State,Zip_Code,County
399,50008,01-JAN-18,31-DEC-18,As Submitted,2,CPMC-R.K. DAVIES MEDICAL CENTER,601 DUBOCE AVE,,SAN FRANCISCO,CA,94117-3389,SAN FRANCISCO
420,50047,01-JAN-18,31-DEC-18,As Submitted,2,CALIFORNIA PACIFIC MEDICAL CENTER,2333 BUCHANAN ST,,SAN FRANCISCO,CA,94115-1925,SAN FRANCISCO
422,50055,01-JAN-18,31-DEC-18,As Submitted,2,CPMC - MISSION BERNAL CAMPUS,3555 CESAR CHAVEZ STREET,,SAN FRANCISCO,CA,94110-4403,SAN FRANCISCO
435,50076,01-JAN-18,31-DEC-18,As Submitted,2,KFH - SAN FRANCISCO,2425 GEARY BOULEVARD,,SAN FRANCISCO,CA,94115-,SAN FRANCISCO
482,50152,01-JUL-18,30-JUN-19,As Submitted,2,SAINT FRANCIS MEMORIAL HOSPITAL,900 HYDE STREET,,SAN FRANCISCO,CA,94109,SAN FRANCISCO
507,50228,01-JUL-18,30-JUN-19,As Submitted,8,ZUCKERBERG SAN FRANCISCO GENERAL,1001 POTRERO AVENUE,,SAN FRANCISCO,CA,94110-,SAN FRANCISCO
576,50407,01-JAN-18,31-DEC-18,As Submitted,2,CHINESE HOSPITAL,845 JACKSON STREET,,SAN FRANCISCO,CA,94133-,SAN FRANCISCO
590,50454,01-JUL-18,30-JUN-19,As Submitted,10,UCSF MEDICAL CENTER,505 PARNASSUS,,SAN FRANCISCO,CA,94143-0824,SAN FRANCISCO
592,50457,01-JUL-18,30-JUN-19,As Submitted,1,ST. MARYS MEDICAL CENTER,450 STANYAN STREET,,SAN FRANCISCO,CA,94117,SAN FRANCISCO
654,50668,01-JUL-18,30-JUN-19,As Submitted,8,LAGUNA HONDA HOSPITAL,375 LAGUNA HONDA BLVD,,SAN FRANCISCO,CA,94116-,SAN FRANCISCO


### Use the HCRIS data dictionary to generate data mappings

In [5]:
hcris_dict = pd.read_csv(external_data_path('HCRIS-HCRIS_DataDictionary.csv'))
hcris_dict.head()

Unnamed: 0,Column Code,TABLES,SUBSYSTEM,Null/Not Null,Title,Description,Valid Entries
0,ADR_VNDR_CD,RPT,ALL,,Automated Desk Review Vendor Code,Vendor for Fiscal Intermediary.,2 or A03 - E & Y ...
1,ALPHNMRC_ITM_TXT,ALPHA,ALL,NOT NULL,Alphanumeric Item Text,Provider reported alpha data.,Per Specification Table
2,CLMN_NUM,"ALPHA,NMRC",HOSP10,NOT NULL,Column Number,Valid Column Number defined as follows: xxxyy...,"Example: Column 1 = 00100, Column 1.01 = 00101"
3,CLMN_NUM,"ALPHA,NMRC",ALL BUT HOSP10,NOT NULL,Column Number,Valid Column Number defined as follows: xxyy ...,"Example: Column 1 = 0100, Column 1.01 = 0101"
4,FI_CREAT_DT,RPT,ALL,,Fiscal Intermediary Create Date,Date the FI created the HCRIS file.,MM/DD/YYYY


In [6]:
data_dict = {c:t for c,t in zip(hcris_dict['Column Code'],hcris_dict['Title'])}
data_dict

{'ADR_VNDR_CD': 'Automated Desk Review Vendor Code',
 'ALPHNMRC_ITM_TXT': 'Alphanumeric Item Text',
 'CLMN_NUM': 'Column Number',
 'FI_CREAT_DT': 'Fiscal Intermediary Create Date',
 'FI_NUM': 'Fiscal Intermediary Number',
 'FI_RCPT_DT': 'Fiscal Intermediary Receipt Date',
 'FY_BGN_DT': 'Fiscal Year Begin Date',
 'FY_END_DT': 'Fiscal Year End Date',
 'INITL_RPT_SW': 'Initial Report Switch',
 'ITM_VAL_NUM': 'Item Value Number',
 'LAST_RPT_SW': 'Last Report Switch',
 'LINE_NUM': 'Line Number',
 'NPR_DT': 'Notice of Program Reimbursement Date',
 'NPI': 'National Provider Identifier',
 'PROC_DT': 'Process Date',
 'PRVDR_CTRL_TYPE_CD': 'Provider Control Type Code',
 'PRVDR_NUM': 'Provider Number',
 'RPT_REC_NUM': 'Report Record Number',
 'RPT_STUS_CD': 'Report Status Code',
 'SPEC_IND': 'Special Indicator',
 'TRNSMTL_NUM': 'The current transmittal or version number in effect for each sub-system.',
 'UTIL_CD': 'Utilization Code',
 'LABEL': 'Rollup label',
 'ITEM': 'Rollup value',
 'WKSHT_CD':

In [7]:
# Report Table file columns
rpt_columns = [
               'RPT_REC_NUM',
               'PRVDR_CTRL_TYPE_CD',
               'PRVDR_NUM',
               'NPI',
               'RPT_STUS_CD',
               'FY_BGN_DT',
               'FY_END_DT',
               'PROC_DT',
               'INITL_RPT_SW',
               'LAST_RPT_SW',
               'TRNSMTL_NUM',
               'FI_NUM',
               'ADR_VNDR_CD',
               'FI_CREAT_DT',
               'UTIL_CD',
               'NPR_DT',
               'SPEC_IND',
               'FI_RCPT_DT'
]
[data_dict[col] for col in rpt_columns]

['Report Record Number',
 'Provider Control Type Code',
 'Provider Number',
 'National Provider Identifier',
 'Report Status Code',
 'Fiscal Year Begin Date',
 'Fiscal Year End Date',
 'Process Date',
 'Initial Report Switch',
 'Last Report Switch',
 'The current transmittal or version number in effect for each sub-system.',
 'Fiscal Intermediary Number',
 'Automated Desk Review Vendor Code',
 'Fiscal Intermediary Create Date',
 'Utilization Code',
 'Notice of Program Reimbursement Date',
 'Special Indicator',
 'Fiscal Intermediary Receipt Date']

In [8]:
# Numerical Table file columns
nmrc_columns = [
             'RPT_REC_NUM',
             'WKSHT_CD',
             'LINE_NUM',
             'CLMN_NUM',
             'ITM_VAL_NUM'
]
[data_dict[col] for col in nmrc_columns]

['Report Record Number',
 'Worksheet Identifier',
 'Line Number',
 'Column Number',
 'Item Value Number']

In [9]:
# Maps to 'Line Number' in the numeric report
beds_dict = {
    'Hospital Adult and Peds': '00100',
    'Intensive Care Unit': '00800',
    'Coronary Care Unit': '00900',
    'Burn ICU': '01000',
    'Surgical ICU': '01100',
    'Total': '01400'
}

# Maps to 'Column Number' in numeric report
value_count_dict = {
    'Staffed Beds': '00200',
    'Bed Days Available': '00300',
    'Inpatient Days': '00800'
}

icu_beds = ['Intensive Care Unit', 'Coronary Care Unit', 'Burn ICU', 'Surgical ICU']
icu_staffed_beds_columns = ['{} Staffed Beds'.format(x) for x in icu_beds]
icu_bed_days_columns = ['{} Bed Days Available'.format(x) for x in icu_beds]
icu_inpatient_days_columns = ['{} Inpatient Days'.format(x) for x in icu_beds]

all_count_columns = [
    'ICU Total Staffed Beds', 
    'ICU Total Bed Days Available',
    'ICU Total Inpatient Days',
    'ICU Occupancy Rate',
    'Total Bed Occupancy Rate'
]

for bed_desc in beds_dict:
    for value_desc in value_count_dict:
        column_name = '{} {}'.format(bed_desc, value_desc)
        all_count_columns.append(column_name)

In [10]:
beds_dict_flip = {v:k for k,v in beds_dict.items()}
bedtype_list = list(beds_dict_flip.keys())

value_count_dict_flip = {v:k for k,v in value_count_dict.items()}
value_count_list = list(value_count_dict_flip.keys())

### Load the HCRIS report file

In [11]:
hosp10_rpt_df = pd.read_csv(external_data_path('HCRIS-hosp10_2018_RPT.CSV'), 
                            names=[data_dict[col] for col in rpt_columns], 
                            dtype={'Provider Number':object})
hosp10_nmrc_df = pd.read_csv(external_data_path('HCRIS-hosp10_2018_NMRC.CSV'),  
                             names=[data_dict[col] for col in nmrc_columns], 
                             dtype={'Line Number':object, 'Column Number':object})    

In [12]:
hosp10_rpt_df[hosp10_rpt_df['Provider Number'] == '010032']

Unnamed: 0,Report Record Number,Provider Control Type Code,Provider Number,National Provider Identifier,Report Status Code,Fiscal Year Begin Date,Fiscal Year End Date,Process Date,Initial Report Switch,Last Report Switch,The current transmittal or version number in effect for each sub-system.,Fiscal Intermediary Number,Automated Desk Review Vendor Code,Fiscal Intermediary Create Date,Utilization Code,Notice of Program Reimbursement Date,Special Indicator,Fiscal Intermediary Receipt Date
0,623132,9,10032,,1,10/01/2017,11/13/2017,04/26/2018,N,N,K,10001,4,04/19/2018,F,,,04/16/2018
48,638926,9,10032,,1,11/14/2017,06/30/2018,12/18/2018,N,N,L,10001,4,12/14/2018,F,,,11/29/2018
2921,651150,2,10032,,1,07/01/2018,01/08/2019,07/10/2019,N,N,M,10001,4,07/02/2019,F,,,06/07/2019


In [13]:
hosp_df[hosp_df['Provider Number'] == '010032']

Unnamed: 0,Provider Number,FYB,FYE,STATUS,CTRL_TYPE,HOSP10_Name,Street_Addr,PO_Box,City,State,Zip_Code,County
20,10032,01-JUL-18,08-JAN-19,As Submitted,2,TANNER MEDICAL CENTER-EAST ALABAMA,1032 MAIN STREET SOUTH,,WEDOWEE,AL,36278,RANDOLPH


In [14]:
hosp10_nmrc_df[hosp10_nmrc_df['Report Record Number'] == 623132] 

Unnamed: 0,Report Record Number,Worksheet Identifier,Line Number,Column Number,Item Value Number
0,623132,A000000,00100,00200,33286.0
1,623132,A000000,00100,00300,33286.0
2,623132,A000000,00100,00500,33286.0
3,623132,A000000,00100,00700,33286.0
4,623132,A000000,00400,00200,162635.0
...,...,...,...,...,...
1223,623132,S300004,01700,00100,35071.0
1224,623132,S300004,02400,00100,119947.0
1225,623132,S300004,02500,00100,42688.0
1226,623132,S300005,00100,00200,119947.0


### Join the data and format into counts

#### Filter hosp10_2018_nmrc_df records to just what we care about and index by report number.

In [15]:
filter_condition = (hosp10_nmrc_df['Worksheet Identifier'] == 'S300001') & \
                   (hosp10_nmrc_df['Column Number'].isin(value_count_list)) & \
                   (hosp10_nmrc_df['Line Number'].isin(bedtype_list))
filtered_record_df = hosp10_nmrc_df[filter_condition].set_index('Report Record Number')
filtered_record_df.count()


Worksheet Identifier    44629
Line Number             44629
Column Number           44629
Item Value Number       44629
dtype: int64

#### Join filtered numeric records, HOSP10_RPT, and HOSPITAL10 data

In [16]:
hosp_and_rpt = hosp10_rpt_df.join(hosp_df.set_index('Provider Number'), 
                                 on='Provider Number')
hosp_and_rpt_and_records = hosp_and_rpt.join(filtered_record_df, on='Report Record Number')


In [17]:
hosp_and_rpt_and_records[['Provider Number', 'Process Date', 'Report Record Number', 'Line Number', 'Column Number', 'Item Value Number']].head()

Unnamed: 0,Provider Number,Process Date,Report Record Number,Line Number,Column Number,Item Value Number
0,10032,04/26/2018,623132,100,200,34.0
0,10032,04/26/2018,623132,100,300,1496.0
0,10032,04/26/2018,623132,100,800,63.0
0,10032,04/26/2018,623132,1400,200,34.0
0,10032,04/26/2018,623132,1400,300,1496.0


#### Group the data, creating columns that contain the count values we are interested in

This step creates columns that have a 0 value for all rows in the dataframe except those that match the Line Number and Column Number for the target counts; for these rows, the value will be the count value in 'Item Number Value'.
We then perform a "groupby" operation that will sum lines up per Report Record Number and Provider Number, so that we end up with a row per report that has all the counts of interest.

In [18]:
report_cells_to_be_grouped = hosp_and_rpt_and_records.copy()

In [19]:
count_columns = []
for bed_desc, bed_key in beds_dict.items():
    for value_desc, value_key in value_count_dict.items():
        column_name = '{} {}'.format(bed_desc, value_desc)
        count_columns.append(column_name)
        def column_mapper(row):            
            if row['Line Number'] == bed_key and row['Column Number'] == value_key:
                return row['Item Value Number']
            else:
                return 0
        report_cells_to_be_grouped[column_name] = report_cells_to_be_grouped.apply(column_mapper, axis=1)

In [20]:
report_cells_to_be_grouped

Unnamed: 0,Report Record Number,Provider Control Type Code,Provider Number,National Provider Identifier,Report Status Code,Fiscal Year Begin Date,Fiscal Year End Date,Process Date,Initial Report Switch,Last Report Switch,...,Coronary Care Unit Inpatient Days,Burn ICU Staffed Beds,Burn ICU Bed Days Available,Burn ICU Inpatient Days,Surgical ICU Staffed Beds,Surgical ICU Bed Days Available,Surgical ICU Inpatient Days,Total Staffed Beds,Total Bed Days Available,Total Inpatient Days
0,623132,9,010032,,1,10/01/2017,11/13/2017,04/26/2018,N,N,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
0,623132,9,010032,,1,10/01/2017,11/13/2017,04/26/2018,N,N,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
0,623132,9,010032,,1,10/01/2017,11/13/2017,04/26/2018,N,N,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
0,623132,9,010032,,1,10/01/2017,11/13/2017,04/26/2018,N,N,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,34.0,0.0,0.0
0,623132,9,010032,,1,10/01/2017,11/13/2017,04/26/2018,N,N,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1496.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
5703,662634,2,531310,,1,07/01/2018,06/30/2019,12/31/2019,N,N,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5703,662634,2,531310,,1,07/01/2018,06/30/2019,12/31/2019,N,N,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5703,662634,2,531310,,1,07/01/2018,06/30/2019,12/31/2019,N,N,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,25.0,0.0,0.0
5703,662634,2,531310,,1,07/01/2018,06/30/2019,12/31/2019,N,N,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,9125.0,0.0


In [21]:
record_sum_columns = ['Provider Number', 'Report Record Number'] + count_columns
provider_record_sums = report_cells_to_be_grouped[record_sum_columns].groupby(
    ['Provider Number', 'Report Record Number']).sum()

# Merge in report dates to use in case bed days available is missing
provider_record_sums = provider_record_sums.reset_index().merge(
    report_cells_to_be_grouped[['Report Record Number', 
                                'Fiscal Year Begin Date',
                                'Fiscal Year End Date']], on='Report Record Number'
)

### Calculate ICU totals and Occupancy rates

In the case where Bed Days are 0, they are computed using the staffed bed counts and the duration of the reporting periods.

In [22]:
provider_record_sums['ICU Total Staffed Beds'] = provider_record_sums[icu_staffed_beds_columns].sum(axis=1)
provider_record_sums['ICU Total Bed Days Available'] = provider_record_sums[icu_bed_days_columns].sum(axis=1)
provider_record_sums['ICU Total Inpatient Days'] = provider_record_sums[icu_inpatient_days_columns].sum(axis=1)

provider_record_sums['beg_date'] = pd.to_datetime(provider_record_sums['Fiscal Year Begin Date'])
provider_record_sums['end_date'] = pd.to_datetime(provider_record_sums['Fiscal Year End Date'])
provider_record_sums['days'] = (provider_record_sums['end_date'] -
                                provider_record_sums['beg_date']).dt.days

provider_record_sums.loc[provider_record_sums['ICU Total Bed Days Available'] == 0,
                         'ICU Total Bed Days Available'] = \
    provider_record_sums['days'] * provider_record_sums['ICU Total Staffed Beds']
provider_record_sums.loc[provider_record_sums['Total Bed Days Available'] == 0,
                         'Total Bed Days Available'] = \
    provider_record_sums['days'] * provider_record_sums['Total Staffed Beds']

provider_record_sums = provider_record_sums.drop(columns=['Fiscal Year Begin Date',
                                       'Fiscal Year End Date',
                                       'beg_date',
                                       'end_date',
                                       'days'])

provider_record_sums['ICU Occupancy Rate'] = provider_record_sums['ICU Total Inpatient Days']/provider_record_sums['ICU Total Bed Days Available']
provider_record_sums.loc[provider_record_sums['ICU Total Bed Days Available'] == 0, 
                         'ICU Occupancy Rate'] = 0.0

provider_record_sums['Total Bed Occupancy Rate'] = provider_record_sums['Total Inpatient Days']/provider_record_sums['Total Bed Days Available']
provider_record_sums.loc[provider_record_sums['Total Bed Days Available'] == 0, 
                         'Total Bed Occupancy Rate'] = 0.0


### Join with full report data

In [23]:
provider_record_sums = provider_record_sums.set_index(['Provider Number', 'Report Record Number'])
full_df = hosp_and_rpt.set_index(['Provider Number', 'Report Record Number']).join(provider_record_sums)


### Filter to a single report by dropping all reports besides one with latest fiscal year end date

In [24]:
full_df = full_df.reset_index()
full_df['Fiscal Year End Date'] = pd.to_datetime(full_df['Fiscal Year End Date'])
full_df = full_df.sort_values('Fiscal Year End Date', ascending=False)

In [25]:
# check that duplicates are dropped by Fiscal Year End Date correctly
dup_provider_nums = full_df[full_df.duplicated('Provider Number')]['Provider Number'].values
full_df.loc[full_df['Provider Number'].isin(dup_provider_nums)].sort_values('Provider Number')

Unnamed: 0,Provider Number,Report Record Number,Provider Control Type Code,National Provider Identifier,Report Status Code,Fiscal Year Begin Date,Fiscal Year End Date,Process Date,Initial Report Switch,Last Report Switch,...,Surgical ICU Bed Days Available,Surgical ICU Inpatient Days,Total Staffed Beds,Total Bed Days Available,Total Inpatient Days,ICU Total Staffed Beds,ICU Total Bed Days Available,ICU Total Inpatient Days,ICU Occupancy Rate,Total Bed Occupancy Rate
3,010001,644080,9,,1,10/01/2017,2018-09-30,03/12/2019,N,N,...,0.0,0.0,327.0,119355.0,95560.0,40.0,14600.0,11992.0,0.82137,0.800637
4,010001,644080,9,,1,10/01/2017,2018-09-30,03/12/2019,N,N,...,0.0,0.0,327.0,119355.0,95560.0,40.0,14600.0,11992.0,0.82137,0.800637
5,010001,644080,9,,1,10/01/2017,2018-09-30,03/12/2019,N,N,...,0.0,0.0,327.0,119355.0,95560.0,40.0,14600.0,11992.0,0.82137,0.800637
6,010001,644080,9,,1,10/01/2017,2018-09-30,03/12/2019,N,N,...,0.0,0.0,327.0,119355.0,95560.0,40.0,14600.0,11992.0,0.82137,0.800637
2,010001,644080,9,,1,10/01/2017,2018-09-30,03/12/2019,N,N,...,0.0,0.0,327.0,119355.0,95560.0,40.0,14600.0,11992.0,0.82137,0.800637
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
44672,673067,655869,5,,1,05/25/2018,2019-05-31,11/05/2019,Y,N,...,0.0,0.0,40.0,14880.0,8565.0,0.0,0.0,0.0,0.00000,0.575605
44669,673067,655869,5,,1,05/25/2018,2019-05-31,11/05/2019,Y,N,...,0.0,0.0,40.0,14880.0,8565.0,0.0,0.0,0.0,0.00000,0.575605
44670,673067,655869,5,,1,05/25/2018,2019-05-31,11/05/2019,Y,N,...,0.0,0.0,40.0,14880.0,8565.0,0.0,0.0,0.0,0.00000,0.575605
44668,673067,655869,5,,1,05/25/2018,2019-05-31,11/05/2019,Y,N,...,0.0,0.0,40.0,14880.0,8565.0,0.0,0.0,0.0,0.00000,0.575605


In [26]:
full_df = full_df.drop_duplicates('Provider Number')

In [27]:
# check that duplicates are dropped by Fiscal Year End Date correctly
full_df.loc[full_df['Provider Number'].isin(dup_provider_nums)].sort_values('Provider Number')

Unnamed: 0,Provider Number,Report Record Number,Provider Control Type Code,National Provider Identifier,Report Status Code,Fiscal Year Begin Date,Fiscal Year End Date,Process Date,Initial Report Switch,Last Report Switch,...,Surgical ICU Bed Days Available,Surgical ICU Inpatient Days,Total Staffed Beds,Total Bed Days Available,Total Inpatient Days,ICU Total Staffed Beds,ICU Total Bed Days Available,ICU Total Inpatient Days,ICU Occupancy Rate,Total Bed Occupancy Rate
3,010001,644080,9,,1,10/01/2017,2018-09-30,03/12/2019,N,N,...,0.0,0.0,327.0,119355.0,95560.0,40.0,14600.0,11992.0,0.821370,0.800637
10,010005,644081,9,,1,10/01/2017,2018-09-30,03/12/2019,N,N,...,0.0,0.0,204.0,74460.0,38089.0,20.0,7300.0,5283.0,0.723699,0.511536
23,010006,660158,4,,1,07/01/2018,2019-06-30,12/10/2019,N,N,...,0.0,0.0,233.0,104170.0,61969.0,52.0,18368.0,13247.0,0.721200,0.594883
30,010007,644003,9,,1,10/01/2017,2018-09-30,03/11/2019,N,N,...,0.0,0.0,45.0,16425.0,4571.0,5.0,1825.0,1126.0,0.616986,0.278295
39,010008,647847,4,,1,01/01/2018,2018-12-31,05/24/2019,N,N,...,0.0,0.0,29.0,10585.0,1334.0,0.0,0.0,0.0,0.000000,0.126027
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
44648,673062,650558,4,,1,01/01/2018,2018-12-31,06/27/2019,N,N,...,0.0,0.0,26.0,9490.0,5582.0,0.0,0.0,0.0,0.000000,0.588198
44649,673064,653529,6,,1,04/01/2018,2019-03-31,09/19/2019,N,N,...,0.0,0.0,41.0,14965.0,3532.0,0.0,0.0,0.0,0.000000,0.236017
44658,673065,642123,5,,1,10/01/2017,2018-09-30,02/05/2019,N,N,...,0.0,0.0,49.0,17885.0,15895.0,0.0,0.0,0.0,0.000000,0.888734
44666,673066,642453,4,,1,11/10/2017,2018-09-30,02/08/2019,Y,N,...,0.0,0.0,40.0,13000.0,8997.0,0.0,0.0,0.0,0.000000,0.692077


### Generate final dataframe of provider information

In [28]:
final_columns = list(hosp_df.columns.values) + list(provider_record_sums.columns.values)

In [29]:
final_df = full_df[final_columns]

In [30]:
final_df.head()

Unnamed: 0,Provider Number,FYB,FYE,STATUS,CTRL_TYPE,HOSP10_Name,Street_Addr,PO_Box,City,State,...,Surgical ICU Bed Days Available,Surgical ICU Inpatient Days,Total Staffed Beds,Total Bed Days Available,Total Inpatient Days,ICU Total Staffed Beds,ICU Total Bed Days Available,ICU Total Inpatient Days,ICU Occupancy Rate,Total Bed Occupancy Rate
8595,102010,01-SEP-18,31-AUG-19,As Submitted,4,KINDRED HOSPITAL SOUTH FLORIDA,1516 EAST LAS OLAS BOULEVARD,,FT. LAUDERDALE,FL,...,0.0,0.0,214.0,78110.0,49247.0,23.0,8395.0,5560.0,0.662299,0.630483
15870,180104,01-SEP-18,31-AUG-19,As Submitted,2,BAPTIST HEALTH PADUCAH,2501 KENTUCKY AVENUE,,PADUCAH,KY,...,0.0,0.0,271.0,98915.0,42445.0,32.0,11680.0,5985.0,0.512414,0.429106
12216,142013,01-SEP-18,31-AUG-19,As Submitted,4,KINDRED HOSPITAL PEORIA,500 WEST ROMEO B GARRETT AVE.,,PEORIA,IL,...,0.0,0.0,50.0,18250.0,8466.0,0.0,0.0,0.0,0.0,0.46389
41493,494022,01-SEP-18,31-AUG-19,As Submitted,4,POPLAR SPRINGS HOSPITAL,350 POPLAR DRIVE,,PETERSBURG,VA,...,0.0,0.0,124.0,45260.0,28428.0,0.0,0.0,0.0,0.0,0.628104
12180,142006,01-SEP-18,31-AUG-19,As Submitted,4,KINDRED HOSPITAL SYCAMORE,225 EDWARD STREET,,SYCAMORE,IL,...,0.0,0.0,69.0,25185.0,9925.0,0.0,0.0,0.0,0.0,0.394084


### Drop bad facility data

In [None]:
# Drop PARKVIEW MEDICAL CENTER which reports over 2 million staffed beds.
# final_df = final_df[final_df['Provider Number'] != '060020']

### Match data to geocoded facility information

These points are geocoded from the origina v1 notebook `usa_hcris2018_facilitybedcounts_20200313_v1.ipynb`.

Notes from original GeoCoding:

- geocoding first with google maps which gave best results, fall back to mapbox geocode and then search_str without street address
- TODO: figure out why geocoder with gmaps stopped sending requests halfway through, temp switch to directly using gmaps api

In [31]:
geojson = json.loads(open(processed_data_path('usa_facilities_hcris_geocoded.geojson')).read())

In [32]:
## Key our data by Provider Number
final_feature_data = final_df.to_dict(orient='records')
final_feature_data_keyed = dict((r['Provider Number'], r) for r in final_feature_data)


#### Create empty facility information for facilities without HCRIS records

In [33]:
nodata_properties = copy(final_feature_data[0])
for key in nodata_properties:
    if type(nodata_properties[key]) is int:
        nodata_properties[key] = 0
    elif type(nodata_properties[key]) is float:
        nodata_properties[key] = 0.0
    else:
        nodata_properties[key] = 'NoData'

In [34]:
def generate_properties(provider_number):
    if provider_number in final_feature_data_keyed:
        new_properties = final_feature_data_keyed[provider_number]
        return new_properties
    return None

#### Replace properties from geocoded facilities with calculated HRCIS properties

In [35]:
facilities_with_data = 0
facilities_without_data = 0
new_features = []
for feature in geojson['features']:
    provider_number = feature['properties']['Provider Number']
    new_props = generate_properties(provider_number)
    if new_props is not None:
        facilities_with_data += 1
        feature['properties'] = new_props
        new_features.append(feature)
    else:
        facilities_without_data += 1
        
    
geojson['features'] = new_features
print('{} Found, {} not found'.format(facilities_with_data, facilities_without_data))

5604 Found, 1058 not found


In [36]:
geojson['features'][0]

{'type': 'Feature',
 'properties': {'Provider Number': '010001',
  'FYB': '01-OCT-17',
  'FYE': '30-SEP-18',
  'STATUS': 'As Submitted',
  'CTRL_TYPE': 9,
  'HOSP10_Name': 'SOUTHEAST HEALTH MEDICAL CENTER',
  'Street_Addr': '1108 ROSS CLARK CIRCLE',
  'PO_Box': '6987',
  'City': 'DOTHAN',
  'State': 'AL',
  'Zip_Code': '36301',
  'County': 'HOUSTON',
  'Hospital Adult and Peds Staffed Beds': 271.0,
  'Hospital Adult and Peds Bed Days Available': 98915.0,
  'Hospital Adult and Peds Inpatient Days': 78031.0,
  'Intensive Care Unit Staffed Beds': 40.0,
  'Intensive Care Unit Bed Days Available': 14600.0,
  'Intensive Care Unit Inpatient Days': 11992.0,
  'Coronary Care Unit Staffed Beds': 0.0,
  'Coronary Care Unit Bed Days Available': 0.0,
  'Coronary Care Unit Inpatient Days': 0.0,
  'Burn ICU Staffed Beds': 0.0,
  'Burn ICU Bed Days Available': 0.0,
  'Burn ICU Inpatient Days': 0.0,
  'Surgical ICU Staffed Beds': 0.0,
  'Surgical ICU Bed Days Available': 0.0,
  'Surgical ICU Inpatient 

### Write usa_hospital_beds_hcris2018 GeoJSON

In [37]:
with open(processed_data_path('usa_hospital_beds_hcris2018.geojson'), 'w') as f:
    f.write(json.dumps(geojson, indent=4))

# Update HCRIS facility reports with work by Jacob Fenton (jsfenfen)

Here, we incorporate the excellent, open-source work by Jacob who pulled together more HCRIS reporting data from multiple years (2017-2019) at https://github.com/jsfenfen/covid_hospitals_demographics

Comparison of data sources: https://docs.google.com/spreadsheets/d/1ew9i4BZJoJuKLXDqUDnxXzasSroi9cQX3TZkbQVO-HA/edit?usp=sharing


Wherever the facility in `usa_hospital_beds_hcris2018.geojson` matches the provider number, we'll replace the ICU and all bed counts and occupancy rates with `jsfenfen`. Since that source does not give the bed days or inpt days, we'll replace those fields with `NaN`s. The updated rows will be indicated by a new `Source` column showing either `jsfenfen` or `HCRIS 2018`.

**Note** that jsfenfen includes other specialty adult "0899" ICU beds in the calculatation of the ICU Occupancy Rate 


In [41]:
hospdata_jsf_df = pd.read_csv(external_data_path('hospital_data_jsfenfen20200406.csv'))

In [42]:
hospdata_jsf_df.head()

Unnamed: 0,hospital_name,hospital_ownership,hospital_type,lat,lng,address,city,location,county_name,emergency_services,...,skilled_nursing_bed_days_1900,nursing_fac_bed_days_2000,oth_longterm_bed_days_2100,hospice_bed_days_2400,obs_bed_days_2800,labor_delivery_bed_days_3200,all_adult_icu_utilization,subtotal_acute_utilization,interns_residents,payroll_employees
0,PORTERVILLE DEVELOPMENTAL CENTER,Government - State,Acute Care Hospitals,36.043908,-118.980483,26501 AVENUE 140,PORTERVILLE,,TULARE,False,...,,,102691.0,,,,,5.0,,938.9
1,RANDOLPH HOSPITAL,Voluntary non-profit - Private,Acute Care Hospitals,35.711958,-79.814825,364 WHITE OAK STREET,ASHEBORO,,RANDOLPH,True,...,,,,,4208.0,,37.0,27.0,,742.93
2,REDMOND REGIONAL MEDICAL CENTER,Proprietary,Acute Care Hospitals,34.278191,-85.194131,501 REDMOND ROAD,ROME,,FLOYD,True,...,,,,,9483.0,,80.0,65.0,39.52,1068.1
3,PHYSICIAN'S CARE SURGICAL HOSPITAL,Physician,Acute Care Hospitals,40.216786,-75.553707,454 ENTERPRISE DRIVE,ROYERSFORD,,MONTGOMERY,False,...,,,,,116.0,,,37.0,,84.11
4,HSHS GOOD SHEPHERD HOSPITAL INC,Voluntary non-profit - Private,Acute Care Hospitals,39.405801,-88.807113,200 S CEDAR ST,SHELBYVILLE,,SHELBY,True,...,,,,,363.0,,,12.0,,133.68


In [45]:
hcris2018_gdf = gpd.read_file(processed_data_path('usa_hospital_beds_hcris2018.geojson'))

In [46]:
hcris2018_gdf.shape, hospdata_jsf_df.shape

((5604, 36), (4710, 89))

In [47]:
hcris2jsf_cols_dict = {
    'Hospital Adult and Peds Staffed Beds': 'acute_beds_0700',
    'Intensive Care Unit Staffed Beds': 'icu_beds_0800',
    'Coronary Care Unit Staffed Beds': 'coronary_beds_0900',
    'Burn ICU Staffed Beds': 'burn_beds_1000',
    'Surgical ICU Staffed Beds': 'surg_icu_beds_1100',
    'Total Staffed Beds': 'subtotal_acute_beds_1400',
    'ICU Occupancy Rate': 'all_adult_icu_utilization',
    'Total Bed Occupancy Rate': 'subtotal_acute_utilization'
}

jsf2hcris_cols_dict = {v:k for k,v in hcris2jsf_cols_dict.items()}

jsf2hcris_cols_dict

{'acute_beds_0700': 'Hospital Adult and Peds Staffed Beds',
 'icu_beds_0800': 'Intensive Care Unit Staffed Beds',
 'coronary_beds_0900': 'Coronary Care Unit Staffed Beds',
 'burn_beds_1000': 'Burn ICU Staffed Beds',
 'surg_icu_beds_1100': 'Surgical ICU Staffed Beds',
 'subtotal_acute_beds_1400': 'Total Staffed Beds',
 'all_adult_icu_utilization': 'ICU Occupancy Rate',
 'subtotal_acute_utilization': 'Total Bed Occupancy Rate'}

In [70]:
hcris2018_gdf.index

RangeIndex(start=0, stop=5604, step=1)

In [78]:
hcris2018_gdf2 = hcris2018_gdf.set_index('Provider Number')

In [79]:
hcris2018_gdf2['Source'] = 'HCRIS 2018'

In [73]:
days_cols = [col for col in hcris2018_gdf2.columns if 'Days' in col]
days_cols

['Hospital Adult and Peds Bed Days Available',
 'Hospital Adult and Peds Inpatient Days',
 'Intensive Care Unit Bed Days Available',
 'Intensive Care Unit Inpatient Days',
 'Coronary Care Unit Bed Days Available',
 'Coronary Care Unit Inpatient Days',
 'Burn ICU Bed Days Available',
 'Burn ICU Inpatient Days',
 'Surgical ICU Bed Days Available',
 'Surgical ICU Inpatient Days',
 'Total Bed Days Available',
 'Total Inpatient Days',
 'ICU Total Bed Days Available',
 'ICU Total Inpatient Days']

In [80]:
hcris2018_gdf2.index

Index(['010001', '010005', '010006', '010007', '010008', '010011', '010012',
       '010016', '010018', '010019',
       ...
       '673056', '673058', '673059', '673060', '673061', '673062', '673064',
       '673065', '673066', '673067'],
      dtype='object', name='Provider Number', length=5604)

In [77]:
jsf2hcris_cols_dict[col]

'Hospital Adult and Peds Staffed Beds'

In [81]:
for i, row in hospdata_jsf_df.iterrows():
    provider_num_str = str(row['provider_id_int']).zfill(6)
    
    for col in list(jsf2hcris_cols_dict.keys()):
        if 'utilization' in col: hcris2018_gdf2.loc[provider_num_str, jsf2hcris_cols_dict[col]] = row[col]/100
        else: hcris2018_gdf2.loc[provider_num_str, jsf2hcris_cols_dict[col]] = row[col]
    
    hcris2018_gdf2.loc[provider_num_str, list(days_cols)] = np.nan

    all_icu_beds = row[['icu_beds_0800', 'coronary_beds_0900', 'burn_beds_1000', 'surg_icu_beds_1100']].sum(min_count=1, skipna=True)

    hcris2018_gdf2.loc[provider_num_str, 'ICU Total Staffed Beds'] = all_icu_beds
    hcris2018_gdf2.loc[provider_num_str, 'Source'] = 'jsfenfen'

In [83]:
hcris2018_gdf2['Total Staffed Beds'].sum(), hcris2018_gdf2['ICU Total Staffed Beds'].sum()

(776964.0, 78902.0)

In [89]:
# find the facilities that are present in jsf but not in hcris2018 (probably because it didn't report in 2018)
hcris2018_gdf2[hcris2018_gdf2['HOSP10_Name'].isna()].index

Index(['050546', '290042', '190144', '040074', '400124', '670112', '450605',
       '340016', '403301', '433300',
       ...
       '450054', '420004', '450107', '450346', '380021', '050060', '050159',
       '050292', '050441', '450788'],
      dtype='object', name='Provider Number', length=240)

In [91]:
# TODO: how to handle adding in non-2018 reports, dropping these ~200 records for now
hcris2018_gdf2.drop(hcris2018_gdf2[hcris2018_gdf2['HOSP10_Name'].isna()].index, inplace=True)

In [92]:
hcris2018_gdf2.sort_values('Total Staffed Beds', ascending=False)

Unnamed: 0_level_0,FYB,FYE,STATUS,CTRL_TYPE,HOSP10_Name,Street_Addr,PO_Box,City,State,Zip_Code,...,Total Staffed Beds,Total Bed Days Available,Total Inpatient Days,ICU Total Staffed Beds,ICU Total Bed Days Available,ICU Total Inpatient Days,ICU Occupancy Rate,Total Bed Occupancy Rate,geometry,Source
Provider Number,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
100007,01-JAN-18,31-DEC-18,As Submitted,2.0,ADVENTHEALTH ORLANDO,601 EAST ROLLINS STREET,,ORLANDO,FL,32803,...,2753.0,,,261.0,,,0.61,0.66,POINT (-81.37036 28.57451),jsfenfen
330101,01-JAN-18,31-DEC-18,As Submitted,2.0,NEW YORK PRESBYTERIAN HOSPITAL,525 EAST 68TH STREET,,NEW YORK,NY,10065,...,2272.0,,,242.0,,,0.63,0.84,POINT (-73.95429 40.76431),jsfenfen
450388,01-JUL-18,30-JUN-19,As Submitted,5.0,METHODIST HOSPITAL,7700 FLOYD CURL DRIVE,,SAN ANTONIO,TX,78229-3902,...,1560.0,,,207.0,,,0.85,0.77,POINT (-98.57222 29.50788),jsfenfen
100006,01-OCT-17,30-SEP-18,Amended,2.0,ORLANDO HEALTH,1414 KUHL AVENUE,,ORLANDO,FL,32806,...,1468.0,,,113.0,,,0.72,0.75,POINT (-81.37727 28.52525),jsfenfen
330214,01-SEP-18,31-AUG-19,As Submitted,2.0,NYU LANGONE HOSPITALS,550 FIRST AVENUE,,NEW YORK,NY,10016,...,1468.0,,,212.0,,,0.28,0.58,POINT (-73.97367 40.74217),jsfenfen
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
443302,01-JUL-18,30-JUN-19,Settled,4.0,ST. JUDE CHILDRENS RESEARCH HOSPITAL,262 DANNY THOAS PLACE,,MEMPHIS,TN,38105-3678,...,,,,,,,,,POINT (-90.04348 35.15359),jsfenfen
443303,01-JUL-18,30-JUN-19,Settled,13.0,EAST TENNESSEE CHILDRENS HOSPITAL,2018 CLINCH AVE,,KNOXVILLE,TN,37916-2301,...,,,,,,,,,POINT (-83.93855 35.95580),jsfenfen
453314,01-OCT-17,30-SEP-18,Settled,1.0,TX SCOTTISH RITE HOSPITAL FOR CHILDR,2222 WELBORN STREET,,DALLAS,TX,75219,...,,,,,,,,,POINT (-96.81413 32.80212),jsfenfen
490129,01-JAN-18,31-DEC-18,As Submitted,2.0,CAPITAL HOSPICE,2900 TELESTAR COURT,,FALLS CHURCH,VA,22042-1206,...,,,,,,,,,POINT (-77.22461 38.87239),jsfenfen


In [93]:
hcris2018_gdf2.sort_values('ICU Total Staffed Beds', ascending=False)

Unnamed: 0_level_0,FYB,FYE,STATUS,CTRL_TYPE,HOSP10_Name,Street_Addr,PO_Box,City,State,Zip_Code,...,Total Staffed Beds,Total Bed Days Available,Total Inpatient Days,ICU Total Staffed Beds,ICU Total Bed Days Available,ICU Total Inpatient Days,ICU Occupancy Rate,Total Bed Occupancy Rate,geometry,Source
Provider Number,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
061336,01-APR-18,31-MAR-19,As Submitted,2.0,ARKANSAS VALLEY REGL MED CTR,1100 CARSON AVENUE,,LA JUNTA,CO,81050,...,522.0,,,506.0,,,0.00,0.02,POINT (-103.54911 37.97827),jsfenfen
440039,01-JUL-18,30-JUN-19,As Submitted,2.0,VANDERBILT UNIVERSITY MEDICAL CENTER,1211 MEDICAL CENTER DRIVE,,NASHVILLE,TN,37232,...,954.0,,,277.0,,,0.89,0.93,POINT (-86.80140 36.14252),jsfenfen
100007,01-JAN-18,31-DEC-18,As Submitted,2.0,ADVENTHEALTH ORLANDO,601 EAST ROLLINS STREET,,ORLANDO,FL,32803,...,2753.0,,,261.0,,,0.61,0.66,POINT (-81.37036 28.57451),jsfenfen
290001,01-JUL-18,30-JUN-19,As Submitted,2.0,RENOWN REGIONAL MEDICAL CENTER,1155 MILL STREET,,RENO,NV,89502,...,637.0,,,254.0,,,0.78,0.80,POINT (-119.79521 39.52552),jsfenfen
360180,01-JAN-18,31-DEC-18,As Submitted,2.0,CLEVELAND CLINIC HOSPITAL,9500 EUCLID AVENUE,,CLEVELAND,OH,44195-,...,1285.0,,,247.0,,,0.91,0.79,POINT (-81.62126 41.50306),jsfenfen
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
670116,01-JAN-18,31-DEC-18,As Submitted,12.0,WISE HEALTH SYSTEM - PARKWAY,3200 NORTH TARRANT PARKWAY,,FORT WORTH,TX,76177-8611,...,36.0,,,,,,,0.07,POINT (-97.31286 32.89692),jsfenfen
670118,01-JAN-18,31-DEC-18,As Submitted,5.0,FIRST TEXAS HOSPITAL CY FAIR LLC,9922 LOUETTA RD,,HOUSTON,TX,77070,...,12.0,,,,,,,0.14,POINT (-95.56126 30.00128),jsfenfen
670119,01-JAN-18,31-DEC-18,As Submitted,4.0,PROVIDENCE HOSPITAL OF NORTH HOUSTON,16750 RED OAK DR.,,HOUSTON,TX,77090,...,16.0,,,,,,,0.16,POINT (-95.44093 30.01428),jsfenfen
670121,01-JAN-18,31-DEC-18,As Submitted,4.0,PSG MIDCITIES MEDICAL CENTER LLC,1612 HURST TOWN CENTER DRIVE,,HURST,TX,76054,...,23.0,,,,,,,0.03,POINT (-97.18829 32.84283),jsfenfen


In [96]:
hcris2018_gdf2.to_file(processed_data_path('usa_hospital_beds_hcris2018_jsf.geojson'), driver='GeoJSON', encoding='utf-8')