<a href="https://colab.research.google.com/github/daveluo/covid19-healthsystemcapacity/blob/master/nbs/usa_hcris2018_facilitybedcounts_20200313_v1.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Get total & ICU staffed bed counts for every acute hospital facility in USA

Following methodology from this paper: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5514420/

> Study Design and Data Sources

> We performed a repeated-measures time series analysis of US ICU bed supply during the 16-year period between 1996 and 2011. We obtained data on hospital characteristics and intensive care occupancy from the Centers for Medicare and Medicaid Services Hospital Cost Report Information System (HCRIS), a publicly available hospital-level database with detailed information on structural, organizational and cost data for all US hospitals. We excluded skilled nursing facilities, long term acute care hospitals, hospitals located in US territories and stand-alone pediatric hospitals (1, 2). We augmented the HCRIS data with data from the US Census Bureau’s 2010 urban-rural classification file which we used to designate hospitals as urban or rural by ZIP code (5).

> Variables

> The primary dependent variable was each hospital’s number of ICU beds compared to the previous year. We defined total ICU beds using the summed counts of four HCRIS bed categories that were available throughout the study interval: intensive care beds, surgical intensive care beds, cardiac intensive care beds and burn intensive care beds 

## Useful References & Links:

CMS Healthcare Cost Report Information System (HCRIS):
https://www.cms.gov/Research-Statistics-Data-and-Systems/Downloadable-Public-Use-Files/Cost-Reports/Hospital-2010-form

Hospital Facilities: 
- http://downloads.cms.gov/files/hcris/hosp10-reports.zip

Public Use File (annual, from 2015): 
- https://data.cms.gov/api/views/absp-nd3x/files/e0ca9126-8fd6-42ca-82bf-c2fe40bd4c0e?download=true&filename=CostReport_Documentation_2015_Final_Oct2019.xlsx
- https://www.cms.gov/files/zip/hospital-cost-report-public-use-file-2015.zip

![](https://www.resdac.org/sites/resdac.umn.edu/files/kb-images/Figure%204_2.png)

Direct link to 2018 reporting from hospitals: http://downloads.cms.gov/Files/hcris/HOSP10FY2018.zip


## Specific guidance for what we're doing here:
from CMS data research guide: https://www.resdac.org/articles/medicare-cost-report-data-structure

![alt text](https://www.resdac.org/sites/resdac.umn.edu/files/kb-images/Figure%205.PNG)

> In summary, the number of beds will be located in the numeric file. 

> To identify the number of beds for every report in numeric file, filter the records where the second column (Worksheet Indicator) is “S300001,” the third column (Line Number) is “01400”, and the fourth column (Column Number) is “00200”. 

> To identify the number of beds for a specific report submitted by a specific facility, filter the records by the “Record Report Number,” which is reported in Column 1. The Report Record Number for a specific facility can be found in the Report data file.
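
The filter described in the guide can be sketched in pandas as below. The small DataFrame is a stand-in for the numeric file; its short column names follow the HCRIS column codes, and all values are made up for illustration.

```python
import pandas as pd

# Toy stand-in for the HCRIS numeric file; values are invented for illustration.
nmrc = pd.DataFrame(
    {
        "RPT_REC_NUM": [1001, 1001, 1001, 1002],
        "WKSHT_CD": ["S300001", "S300001", "A000000", "S300001"],
        "LINE_NUM": ["01400", "00800", "00100", "01400"],
        "CLMN_NUM": ["00200", "00200", "00200", "00200"],
        "ITM_VAL_NUM": [327.0, 40.0, 33286.0, 204.0],
    }
)

# Keep only worksheet S300001, line 01400, column 00200 (number of beds).
is_total_beds = (
    (nmrc["WKSHT_CD"] == "S300001")
    & (nmrc["LINE_NUM"] == "01400")
    & (nmrc["CLMN_NUM"] == "00200")
)

# One value per report record number.
total_beds = nmrc.loc[is_total_beds].set_index("RPT_REC_NUM")["ITM_VAL_NUM"]
```

Line and column codes are kept as strings here so that the leading zeros in codes such as `01400` survive the comparison.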

## Official Worksheet Definitions

from Provider Reimbursement Manual (https://www.cms.gov/Regulations-and-Guidance/Guidance/Manuals/Paper-Based-Manuals-Items/CMS021935):

Column 2--Refer to 42 CFR 412.105(b) and 69 FR 49093-49098 (August 11, 2004) to determine the facility bed count. Indicate the number of beds available for use by patients at the end of the cost reporting period.

A bed means an adult bed, pediatric bed, portion of inpatient labor/delivery/postpartum (LDP) room (also referred to as birthing room) bed when used for services other than labor and delivery, or newborn ICU bed (excluding newborn bassinets) maintained in a patient care area for lodging patients in acute, long term, or domiciliary areas of the hospital. Beds in post-anesthesia, post- operative recovery rooms, outpatient areas, emergency rooms, ancillary departments (however, see exception for labor and delivery department), nurses' and other staff residences, and other such areas that are regularly maintained and utilized for only a portion of the stay of patients (primarily for special procedures or not for inpatient lodging) are not termed a bed for these purposes. (See CMS Pub. 15-1, chapter 22, §2205.)

For cost reporting periods beginning prior to October 1, 2012, beds in distinct ancillary labor and delivery rooms and the proportion of LDP room (birthing room) beds used for labor and delivery services are not a bed for these purposes. (See 68 FR 45420 (August 1, 2003).)

For cost reporting periods beginning on or after October 1, 2012, in accordance with 77 FR 53411- 53413 (August 31, 2012), beds in distinct labor and delivery rooms, when occupied by an inpatient receiving IPPS-level acute care hospital services or when unoccupied, are considered to be part of a hospital’s inpatient available bed count in accordance with 42 CFR 412.105(b) and are to be reported on line 32. Furthermore, the proportion of the inpatient LDP room (birthing room) beds used for ancillary labor and delivery services is considered part of the hospital’s available bed count.

Column 8--Enter the number of inpatient days for all classes of patients for each component. Include organ acquisition and HMO days in this column. This amount will not equal the sum of columns 5 through 7, when the provider renders services to other than titles V, XVIII, or XIX patients.

Line 1--For cost reporting periods beginning before October 1, 2012, exclude from column 2 the portion of LDP room (birthing room) beds used for ancillary labor and delivery services, but include on this line beds used for routine adult and pediatric services (postpartum). In accordance with the instructions in 68 FR 45420 (August 1, 2003), compute this proportion (off the cost report) by multiplying the total number of occupied and unoccupied available beds in the LDP room by the percentage of time these beds were used for ancillary labor and delivery services. An example of how to calculate the “percentage of time” would be for a hospital to determine the number of hours for the cost reporting period during which each LDP room maternity patient received labor and delivery services and divide the sum of those hours for all such patients by the sum of the total hours (for both, ancillary labor and delivery services and for routine postpartum services) that all maternity patients spent in the LDP room during that cost reporting period. Alternatively, a hospital could calculate an average percentage of time maternity patients received ancillary labor and delivery services in an LDP room during a typical month.
For cost reporting periods beginning on or after October 1, 2012, include all the available LDP room (birthing room) beds in the available bed count in column 2. (See 77 FR 53411-53413 (August 31, 2012).) The proportion of available LDP room beds related to the ancillary labor and delivery services must not be excluded from column 2 for those cost reporting periods.
In columns 5, 6, 7 and 8, enter the number of adult and pediatric hospital days excluding the SNF and NF swing-bed, observation bed, and hospice days. In columns 6 and 7, also exclude HMO days. Do not include in column 6 Medicare Secondary Payer/Lesser of Reasonable Cost (MSP/LCC) days. Include these days only in column 8. However, do not include employee discount days in column 8.

Line 7--Enter the sum of lines 1, 5, and 6.

Lines 8 through 13--Enter the appropriate statistic applicable to each discipline for all programs.

Line 14--Enter the sum of lines 7 through 13 for columns 2 through 8, and for columns 12 through 15, enter the amount from line 1. For columns 9 through 11, enter the total for each from your records. Labor and delivery days (as defined in the instructions for Worksheet S-3, Part I, line 32) must not be included on this line.
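
As a worked illustration of the Line 1 "percentage of time" calculation quoted above (for cost reporting periods beginning before October 1, 2012), here is a minimal arithmetic sketch. All figures are hypothetical and the variable names are my own; the manual does not prescribe any rounding, so none is applied.

```python
# Hypothetical figures, chosen only to illustrate the arithmetic.
ldp_beds = 10                  # occupied + unoccupied available LDP room beds
labor_delivery_hours = 3_000   # hours maternity patients received labor and delivery services
total_ldp_room_hours = 12_000  # labor/delivery hours plus routine postpartum hours

# Share of LDP room time spent on ancillary labor and delivery services.
pct_time_ancillary = labor_delivery_hours / total_ldp_room_hours

# Portion of LDP beds excluded from column 2, and the portion still counted.
excluded_beds = ldp_beds * pct_time_ancillary
included_beds = ldp_beds - excluded_beds
```

With these numbers, a quarter of the LDP room time is ancillary, so 2.5 of the 10 beds are excluded and 7.5 remain in the line 1 count.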




In [0]:
# get facility level IDs, names and addresses
!wget http://downloads.cms.gov/files/hcris/hosp10-reports.zip

--2020-03-12 19:25:11--  http://downloads.cms.gov/files/hcris/hosp10-reports.zip
Resolving downloads.cms.gov (downloads.cms.gov)... 104.85.210.238, 2600:1417:76:582::1fc4, 2600:1417:76:595::1fc4
Connecting to downloads.cms.gov (downloads.cms.gov)|104.85.210.238|:80... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: https://downloads.cms.gov/files/hcris/hosp10-reports.zip [following]
--2020-03-12 19:25:11--  https://downloads.cms.gov/files/hcris/hosp10-reports.zip
Connecting to downloads.cms.gov (downloads.cms.gov)|104.85.210.238|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 5190360 (4.9M) [application/zip]
Saving to: ‘hosp10-reports.zip’


2020-03-12 19:25:13 (16.5 MB/s) - ‘hosp10-reports.zip’ saved [5190360/5190360]



In [0]:
!unzip hosp10-reports.zip

Archive:  hosp10-reports.zip
  inflating: COST_CHARGES/CSTS_CHRGS2010.CSV  
  inflating: COST_CHARGES/CSTS_CHRGS2011.CSV  
  inflating: COST_CHARGES/CSTS_CHRGS2012.CSV  
  inflating: COST_CHARGES/CSTS_CHRGS2013.CSV  
  inflating: COST_CHARGES/CSTS_CHRGS2014.CSV  
  inflating: COST_CHARGES/CSTS_CHRGS2015.CSV  
  inflating: COST_CHARGES/CSTS_CHRGS2016.CSV  
  inflating: COST_CHARGES/CSTS_CHRGS2017.CSV  
  inflating: COST_CHARGES/CSTS_CHRGS2018.CSV  
  inflating: COST_CHARGES/CSTS_CHRGS2019.CSV  
  inflating: hosp10_COST_REPORT_STATUS_COUNTS.CSV  
  inflating: hosp10_RECORD_COUNTS.CSV  
  inflating: HOSPITAL10_PROVIDER_ID_INFO.CSV  
  inflating: IME_GME/IME_GME2010.CSV  
  inflating: IME_GME/IME_GME2011.CSV  
  inflating: IME_GME/IME_GME2012.CSV  
  inflating: IME_GME/IME_GME2013.CSV  
  inflating: IME_GME/IME_GME2014.CSV  
  inflating: IME_GME/IME_GME2015.CSV  
  inflating: IME_GME/IME_GME2016.CSV  
  inflating: IME_GME/IME_GME2017.CSV  
  inflating: IME_GME/IME_GME2018.CSV  
  inflating

In [0]:
import pandas as pd
import numpy as np

In [0]:
hosp_df = pd.read_csv('HOSPITAL10_PROVIDER_ID_INFO.CSV')

In [0]:
hosp_df.head()

Unnamed: 0,PROVIDER_NUMBER,FYB,FYE,STATUS,CTRL_TYPE,HOSP10_Name,Street_Addr,PO_Box,City,State,Zip_Code,County
0,10001,01-OCT-17,30-SEP-18,As Submitted,9,SOUTHEAST HEALTH MEDICAL CENTER,1108 ROSS CLARK CIRCLE,6987,DOTHAN,AL,36301,HOUSTON
1,10005,01-OCT-17,30-SEP-18,As Submitted,9,MARSHALL MEDICAL CENTER - SOUTH,2505 U.S. HIGHWAY 431,,BOAZ,AL,35957-,MARSHALL
2,10006,01-JUL-18,30-JUN-19,As Submitted,4,NORTH ALABAMA MEDICAL CENTER,1701 VETERANS DRIVE,818,FLORENCE,AL,35630,LAUDERDALE
3,10007,01-OCT-17,30-SEP-18,As Submitted,9,MIZELL MEMORIAL HOSPITAL,702 MAIN STREET,429,OPP,AL,36462-,COVINGTON
4,10008,01-JAN-18,31-DEC-18,As Submitted,4,CRENSHAW COMMUNITY HOSPITAL,CRENSHAW COMMUNITY HOSPITAL,101 HOSPITAL CIRCLE,LUVERNE,AL,36049,CRENSHAW


In [0]:
# PROVIDER_NUMBER was parsed as an integer, which drops leading zeros; pad it back to the 6-character ID
hosp_df['PROVIDER_NUMBER'] = hosp_df['PROVIDER_NUMBER'].apply(lambda x: str(x).zfill(6))

In [0]:
hosp_df[hosp_df['County'] == 'SAN FRANCISCO']

Unnamed: 0,PROVIDER_NUMBER,FYB,FYE,STATUS,CTRL_TYPE,HOSP10_Name,Street_Addr,PO_Box,City,State,Zip_Code,County
399,50008,01-JAN-18,31-DEC-18,As Submitted,2,CPMC-R.K. DAVIES MEDICAL CENTER,601 DUBOCE AVE,,SAN FRANCISCO,CA,94117-3389,SAN FRANCISCO
420,50047,01-JAN-18,31-DEC-18,As Submitted,2,CALIFORNIA PACIFIC MEDICAL CENTER,2333 BUCHANAN ST,,SAN FRANCISCO,CA,94115-1925,SAN FRANCISCO
422,50055,01-JAN-18,31-DEC-18,As Submitted,2,CPMC - MISSION BERNAL CAMPUS,3555 CESAR CHAVEZ STREET,,SAN FRANCISCO,CA,94110-4403,SAN FRANCISCO
435,50076,01-JAN-18,31-DEC-18,As Submitted,2,KFH - SAN FRANCISCO,2425 GEARY BOULEVARD,,SAN FRANCISCO,CA,94115-,SAN FRANCISCO
482,50152,01-JUL-18,30-JUN-19,As Submitted,2,SAINT FRANCIS MEMORIAL HOSPITAL,900 HYDE STREET,,SAN FRANCISCO,CA,94109,SAN FRANCISCO
507,50228,01-JUL-18,30-JUN-19,As Submitted,8,ZUCKERBERG SAN FRANCISCO GENERAL,1001 POTRERO AVENUE,,SAN FRANCISCO,CA,94110-,SAN FRANCISCO
576,50407,01-JAN-18,31-DEC-18,As Submitted,2,CHINESE HOSPITAL,845 JACKSON STREET,,SAN FRANCISCO,CA,94133-,SAN FRANCISCO
590,50454,01-JUL-18,30-JUN-19,As Submitted,10,UCSF MEDICAL CENTER,505 PARNASSUS,,SAN FRANCISCO,CA,94143-0824,SAN FRANCISCO
592,50457,01-JUL-18,30-JUN-19,As Submitted,1,ST. MARYS MEDICAL CENTER,450 STANYAN STREET,,SAN FRANCISCO,CA,94117,SAN FRANCISCO
654,50668,01-JUL-18,30-JUN-19,As Submitted,8,LAGUNA HONDA HOSPITAL,375 LAGUNA HONDA BLVD,,SAN FRANCISCO,CA,94116-,SAN FRANCISCO


In [0]:
# TODO: tried using FY2019 first but it seemed to be missing a lot of facilities, maybe incomplete because their reporting year didn't end yet? Go back and confirm
# !wget http://downloads.cms.gov/Files/hcris/HOSP10FY2019.zip
# !unzip HOSP10FY2019.zip

URL transformed to HTTPS due to an HSTS policy
--2020-03-12 19:29:39--  https://downloads.cms.gov/Files/hcris/HOSP10FY2019.zip
Resolving downloads.cms.gov (downloads.cms.gov)... 104.85.210.238, 2600:1417:76:582::1fc4, 2600:1417:76:595::1fc4
Connecting to downloads.cms.gov (downloads.cms.gov)|104.85.210.238|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1725612 (1.6M) [application/zip]
Saving to: ‘HOSP10FY2019.zip’


2020-03-12 19:29:41 (16.3 MB/s) - ‘HOSP10FY2019.zip’ saved [1725612/1725612]

Archive:  HOSP10FY2019.zip
  inflating: hosp10_2019_ALPHA.CSV   
  inflating: hosp10_2019_NMRC.CSV    
  inflating: hosp10_2019_RPT.CSV     


In [0]:
# get the HCRIS file for FY2018
!wget http://downloads.cms.gov/Files/hcris/HOSP10FY2018.zip
!unzip HOSP10FY2018.zip

URL transformed to HTTPS due to an HSTS policy
--2020-03-12 19:53:17--  https://downloads.cms.gov/Files/hcris/HOSP10FY2018.zip
Resolving downloads.cms.gov (downloads.cms.gov)... 104.85.210.238, 2600:1417:76:595::1fc4, 2600:1417:76:582::1fc4
Connecting to downloads.cms.gov (downloads.cms.gov)|104.85.210.238|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 125893646 (120M) [application/zip]
Saving to: ‘HOSP10FY2018.zip’


2020-03-12 19:54:17 (2.10 MB/s) - ‘HOSP10FY2018.zip’ saved [125893646/125893646]

Archive:  HOSP10FY2018.zip
  inflating: hosp10_2018_ALPHA.CSV   
  inflating: hosp10_2018_NMRC.CSV    
  inflating: hosp10_2018_RPT.CSV     


In [0]:
!wget https://www.cms.gov/files/zip/hospital2010-documentation.zip
!unzip hospital2010-documentation.zip

hcris_dict = pd.read_csv('/content/HCRIS_DataDictionary.csv')
hcris_dict.head()

Unnamed: 0,Column Code,TABLES,SUBSYSTEM,Null/Not Null,Title,Description,Valid Entries
0,ADR_VNDR_CD,RPT,ALL,,Automated Desk Review Vendor Code,Vendor for Fiscal Intermediary.,2 or A03 - E & Y ...
1,ALPHNMRC_ITM_TXT,ALPHA,ALL,NOT NULL,Alphanumeric Item Text,Provider reported alpha data.,Per Specification Table
2,CLMN_NUM,"ALPHA,NMRC",HOSP10,NOT NULL,Column Number,Valid Column Number defined as follows: xxxyy...,"Example: Column 1 = 00100, Column 1.01 = 00101"
3,CLMN_NUM,"ALPHA,NMRC",ALL BUT HOSP10,NOT NULL,Column Number,Valid Column Number defined as follows: xxyy ...,"Example: Column 1 = 0100, Column 1.01 = 0101"
4,FI_CREAT_DT,RPT,ALL,,Fiscal Intermediary Create Date,Date the FI created the HCRIS file.,MM/DD/YYYY


In [0]:
data_dict = {c:t for c,t in zip(hcris_dict['Column Code'],hcris_dict['Title'])}

In [0]:
data_dict

{'ADR_VNDR_CD': 'Automated Desk Review Vendor Code',
 'ALPHNMRC_ITM_TXT': 'Alphanumeric Item Text',
 'CLMN_NUM': 'Column Number',
 'FI_CREAT_DT': 'Fiscal Intermediary Create Date',
 'FI_NUM': 'Fiscal Intermediary Number',
 'FI_RCPT_DT': 'Fiscal Intermediary Receipt Date',
 'FY_BGN_DT': 'Fiscal Year Begin Date',
 'FY_END_DT': 'Fiscal Year End Date',
 'INITL_RPT_SW': 'Initial Report Switch',
 'ITEM': 'Rollup value',
 'ITM_VAL_NUM': 'Item Value Number',
 'LABEL': 'Rollup label',
 'LAST_RPT_SW': 'Last Report Switch',
 'LINE_NUM': 'Line Number',
 'NPI': 'National Provider Identifier',
 'NPR_DT': 'Notice of Program Reimbursement Date',
 'PROC_DT': 'Process Date',
 'PRVDR_CTRL_TYPE_CD': 'Provider Control Type Code',
 'PRVDR_NUM': 'Provider Number',
 'RPT_REC_NUM': 'Report Record Number',
 'RPT_STUS_CD': 'Report Status Code',
 'SPEC_IND': 'Special Indicator',
 'TRNSMTL_NUM': 'The current transmittal or version number in effect for each sub-system.',
 'UTIL_CD': 'Utilization Code',
 'WKSHT_CD':

In [0]:
# Report Table file columns
rpt_columns = [
               'RPT_REC_NUM',
               'PRVDR_CTRL_TYPE_CD',
               'PRVDR_NUM',
               'NPI',
               'RPT_STUS_CD',
               'FY_BGN_DT',
               'FY_END_DT',
               'PROC_DT',
               'INITL_RPT_SW',
               'LAST_RPT_SW',
               'TRNSMTL_NUM',
               'FI_NUM',
               'ADR_VNDR_CD',
               'FI_CREAT_DT',
               'UTIL_CD',
               'NPR_DT',
               'SPEC_IND',
               'FI_RCPT_DT'
]

In [0]:
[data_dict[col] for col in rpt_columns]

['Report Record Number',
 'Provider Control Type Code',
 'Provider Number',
 'National Provider Identifier',
 'Report Status Code',
 'Fiscal Year Begin Date',
 'Fiscal Year End Date',
 'Process Date',
 'Initial Report Switch',
 'Last Report Switch',
 'The current transmittal or version number in effect for each sub-system.',
 'Fiscal Intermediary Number',
 'Automated Desk Review Vendor Code',
 'Fiscal Intermediary Create Date',
 'Utilization Code',
 'Notice of Program Reimbursement Date',
 'Special Indicator',
 'Fiscal Intermediary Receipt Date']

In [0]:
hosp10_2018_rpt_df = pd.read_csv('hosp10_2018_RPT.CSV', names=[data_dict[col] for col in rpt_columns], dtype={'Provider Number':object})

In [0]:
hosp10_2018_rpt_df.head(25)

Unnamed: 0,Report Record Number,Provider Control Type Code,Provider Number,National Provider Identifier,Report Status Code,Fiscal Year Begin Date,Fiscal Year End Date,Process Date,Initial Report Switch,Last Report Switch,The current transmittal or version number in effect for each sub-system.,Fiscal Intermediary Number,Automated Desk Review Vendor Code,Fiscal Intermediary Create Date,Utilization Code,Notice of Program Reimbursement Date,Special Indicator,Fiscal Intermediary Receipt Date
0,623132,9,10032,,1,10/01/2017,11/13/2017,04/26/2018,N,N,K,10001,4,04/19/2018,F,,,04/16/2018
1,628158,2,250042,,1,11/01/2017,12/31/2017,06/25/2018,N,N,L,5901,4,06/21/2018,F,,,06/01/2018
2,628456,2,140147,,1,10/01/2017,12/31/2017,06/27/2018,N,N,L,6101,4,06/25/2018,F,,,06/01/2018
3,628833,4,440235,,1,10/11/2017,12/31/2017,07/02/2018,N,N,L,10001,4,06/28/2018,F,,,06/04/2018
4,630367,4,290058,,1,10/23/2017,12/31/2017,07/30/2018,N,N,L,1011,4,07/26/2018,F,,,07/20/2018
5,631016,2,50523,,1,01/01/2018,02/28/2018,08/21/2018,N,N,L,1011,4,08/01/2018,F,,,07/30/2018
6,631094,2,50305,,1,01/01/2018,02/28/2018,08/21/2018,N,N,L,1011,4,08/07/2018,F,,,07/30/2018
7,631292,2,50043,,1,01/01/2018,02/28/2018,08/21/2018,N,N,L,1011,4,08/09/2018,F,,,07/30/2018
8,631415,2,340060,,1,10/01/2017,12/31/2017,08/21/2018,N,N,L,11501,4,08/02/2018,F,,,07/25/2018
9,631564,9,151320,,1,10/01/2017,02/28/2018,08/22/2018,N,N,L,8001,4,08/20/2018,F,,,08/01/2018


In [0]:
# Numerical Table file columns
nmrc_columns = [
             'RPT_REC_NUM',
             'WKSHT_CD',
             'LINE_NUM',
             'CLMN_NUM',
             'ITM_VAL_NUM'
]

In [0]:
[data_dict[col] for col in nmrc_columns]

['Report Record Number',
 'Worksheet Identifier',
 'Line Number',
 'Column Number',
 'Item Value Number']

In [0]:
hosp10_2018_nmrc_df = pd.read_csv('hosp10_2018_NMRC.CSV',  names=[data_dict[col] for col in nmrc_columns], dtype={'Line Number':object, 'Column Number':object})
hosp10_2018_nmrc_df.head()

Unnamed: 0,Report Record Number,Worksheet Identifier,Line Number,Column Number,Item Value Number
0,623132,A000000,100,200,33286.0
1,623132,A000000,100,300,33286.0
2,623132,A000000,100,500,33286.0
3,623132,A000000,100,700,33286.0
4,623132,A000000,400,200,162635.0


In [0]:
hosp10_2018_nmrc_df.dtypes

Report Record Number      int64
Worksheet Identifier     object
Line Number              object
Column Number            object
Item Value Number       float64
dtype: object

In [0]:
provider_num = '050228'
hosp10_2018_rpt_df[hosp10_2018_rpt_df['Provider Number'] == provider_num]['Report Record Number'].values[0]

662022
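
Note that `.values[0]` takes whichever matching report comes first, and the report table above includes short-period reports, so a provider could plausibly appear more than once. The sketch below shows one way to pick a single report per provider by latest fiscal year end date. Whether "latest" is the right rule is an assumption, and the record numbers and dates are invented.

```python
import pandas as pd

# Toy report table; record numbers and dates are invented for illustration.
rpt = pd.DataFrame(
    {
        "Report Record Number": [700001, 700002, 700003],
        "Provider Number": ["050228", "050228", "010001"],
        "Fiscal Year End Date": ["12/31/2017", "06/30/2018", "09/30/2018"],
    }
)
rpt["Fiscal Year End Date"] = pd.to_datetime(
    rpt["Fiscal Year End Date"], format="%m/%d/%Y"
)

# Keep one row per provider: the report whose fiscal year ends last.
latest = (
    rpt.sort_values("Fiscal Year End Date")
    .drop_duplicates("Provider Number", keep="last")
    .set_index("Provider Number")["Report Record Number"]
)

rpt_num = latest["050228"]
```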

In [0]:
# dict of bed categories we need and their worksheet line number formatted for lookup
beds_dict = {
    'Hospital Adult and Peds': '00100',
    'Intensive Care Unit': '00800',
    'Coronary Care Unit': '00900',
    'Burn ICU': '01000',
    'Surgical ICU': '01100',
    'Total': '01400'
}

In [0]:
beds_dict_flip = {v:k for k,v in beds_dict.items()}
bedtype_list = list(beds_dict_flip.keys())

In [0]:
bedtype_list

['00100', '00800', '00900', '01000', '01100', '01400']

In [0]:
hosp_df.head()

Unnamed: 0,PROVIDER_NUMBER,FYB,FYE,STATUS,CTRL_TYPE,HOSP10_Name,Street_Addr,PO_Box,City,State,Zip_Code,County
0,10001,01-OCT-17,30-SEP-18,As Submitted,9,SOUTHEAST HEALTH MEDICAL CENTER,1108 ROSS CLARK CIRCLE,6987,DOTHAN,AL,36301,HOUSTON
1,10005,01-OCT-17,30-SEP-18,As Submitted,9,MARSHALL MEDICAL CENTER - SOUTH,2505 U.S. HIGHWAY 431,,BOAZ,AL,35957-,MARSHALL
2,10006,01-JUL-18,30-JUN-19,As Submitted,4,NORTH ALABAMA MEDICAL CENTER,1701 VETERANS DRIVE,818,FLORENCE,AL,35630,LAUDERDALE
3,10007,01-OCT-17,30-SEP-18,As Submitted,9,MIZELL MEMORIAL HOSPITAL,702 MAIN STREET,429,OPP,AL,36462-,COVINGTON
4,10008,01-JAN-18,31-DEC-18,As Submitted,4,CRENSHAW COMMUNITY HOSPITAL,CRENSHAW COMMUNITY HOSPITAL,101 HOSPITAL CIRCLE,LUVERNE,AL,36049,CRENSHAW


In [0]:
# was working iteratively to catch issues and pull new data, ugly code in next few cells. should be refactored if we're thinking to compile more data from these HCRIS worksheets

beds_list = []
beddays_list = []
ptdays_list = []

for idx, provider_num in hosp_df['PROVIDER_NUMBER'].items():

  try: 
    rpt_num = hosp10_2018_rpt_df[hosp10_2018_rpt_df['Provider Number'] == provider_num]['Report Record Number'].values[0]
    hosp_rpt = hosp10_2018_nmrc_df[(hosp10_2018_nmrc_df['Report Record Number'] == rpt_num)]

    beds = []
    ptdays = []
    beddays = []

    for bedtype in bedtype_list:
      try: bed_count = int(hosp_rpt[(hosp_rpt['Worksheet Identifier'] == 'S300001') 
                        & (hosp_rpt['Line Number'] == bedtype) 
                        & (hosp_rpt['Column Number'] == '00200') 
                        ]['Item Value Number'].values[0])
      except Exception as exc:
        # print(exc)
        bed_count = 0
      
      beds.append(bed_count)

      try: beddays_count = int(hosp_rpt[(hosp_rpt['Worksheet Identifier'] == 'S300001') 
                        & (hosp_rpt['Line Number'] == bedtype) 
                        & (hosp_rpt['Column Number'] == '00300') 
                        ]['Item Value Number'].values[0])
      except Exception as exc:
        # print(exc)
        beddays_count = 0
      
      beddays.append(beddays_count)
    
      try: ptdays_count = int(hosp_rpt[(hosp_rpt['Worksheet Identifier'] == 'S300001') 
                        & (hosp_rpt['Line Number'] == bedtype) 
                        & (hosp_rpt['Column Number'] == '00800') 
                        ]['Item Value Number'].values[0])
      except Exception as exc:
        # print(exc)
        ptdays_count = 0
      
      ptdays.append(ptdays_count)

  except Exception as exc:
    print('rpt_num does not exist')
    beds = []
    ptdays = []
    beddays = []
  
  beds_list.append(beds)
  ptdays_list.append(ptdays)
  beddays_list.append(beddays)
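
One possible refactor of the loop above, sketched on toy data: filter the numeric table once, then pivot so each report becomes a row and each (bed category, measure) pair becomes a column. `line_names` and `col_names` are shortened, hypothetical stand-ins for `beds_dict` and the three column codes used above, and the values are invented.

```python
import pandas as pd

# Toy numeric table using the notebook's column titles; values are invented.
nmrc = pd.DataFrame(
    {
        "Report Record Number": [1, 1, 1, 1, 2, 2],
        "Worksheet Identifier": ["S300001"] * 5 + ["A000000"],
        "Line Number": ["00800", "01400", "01400", "01400", "01400", "00100"],
        "Column Number": ["00200", "00200", "00300", "00800", "00200", "00200"],
        "Item Value Number": [40.0, 327.0, 119355.0, 95560.0, 204.0, 33286.0],
    }
)

line_names = {"00800": "Intensive Care Unit", "01400": "Total"}
col_names = {"00200": "Beds", "00300": "Bed Days Available", "00800": "Inpt Days"}

# One boolean mask replaces the three per-column lookups inside the loop.
mask = (
    (nmrc["Worksheet Identifier"] == "S300001")
    & nmrc["Line Number"].isin(list(line_names))
    & nmrc["Column Number"].isin(list(col_names))
)
subset = nmrc.loc[mask]

# Build the output column label, e.g. "Total Beds".
labels = (
    subset["Line Number"].map(line_names)
    + " "
    + subset["Column Number"].map(col_names)
)

# One row per report; items a report did not file become 0, as in the loop above.
wide = (
    subset.assign(label=labels)
    .pivot_table(
        index="Report Record Number",
        columns="label",
        values="Item Value Number",
        aggfunc="first",
    )
    .fillna(0)
)
```

The resulting `wide` table could then be joined to the report table on the report record number, which avoids re-filtering the full numeric file once per provider.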

In [0]:
beddays_df = pd.DataFrame(beddays_list, columns=[beds_dict_flip[b]+ ' Bed Days Available' for b in bedtype_list])#.astype('uint8')
beddays_df['PROVIDER_NUMBER'] = hosp_df['PROVIDER_NUMBER'].values
beddays_df.head(25)

Unnamed: 0,Hospital Adult and Peds Bed Days Available,Intensive Care Unit Bed Days Available,Coronary Care Unit Bed Days Available,Burn ICU Bed Days Available,Surgical ICU Bed Days Available,Total Bed Days Available,PROVIDER_NUMBER
0,98915.0,14600.0,0.0,0.0,0.0,119355.0,10001
1,67160.0,7300.0,0.0,0.0,0.0,74460.0,10005
2,85802.0,10233.0,8135.0,0.0,0.0,104170.0,10006
3,14600.0,1825.0,0.0,0.0,0.0,16425.0,10007
4,10585.0,0.0,0.0,0.0,0.0,10585.0,10008
5,,,,,,,10009
6,,,,,,,10010
7,76285.0,10950.0,20440.0,0.0,5110.0,112785.0,10011
8,31025.0,4380.0,0.0,0.0,0.0,35405.0,10012
9,,,,,,,10015


In [0]:
ptdays_df = pd.DataFrame(ptdays_list, columns=[beds_dict_flip[b]+ ' Inpt Days' for b in bedtype_list])#.astype('uint8')
ptdays_df['PROVIDER_NUMBER'] = hosp_df['PROVIDER_NUMBER'].values

In [0]:
ptdays_df.head(25)

Unnamed: 0,Hospital Adult and Peds Inpt Days,Intensive Care Unit Inpt Days,Coronary Care Unit Inpt Days,Burn ICU Inpt Days,Surgical ICU Inpt Days,Total Inpt Days,PROVIDER_NUMBER
0,78031.0,11992.0,0.0,0.0,0.0,95560.0,10001
1,30597.0,5283.0,0.0,0.0,0.0,38089.0,10005
2,44822.0,5732.0,7515.0,0.0,0.0,61969.0,10006
3,3300.0,1126.0,0.0,0.0,0.0,4571.0,10007
4,1323.0,0.0,0.0,0.0,0.0,1334.0,10008
5,,,,,,,10009
6,,,,,,,10010
7,49842.0,10127.0,17692.0,0.0,4895.0,82556.0,10011
8,8332.0,2014.0,0.0,0.0,0.0,11591.0,10012
9,,,,,,,10015


In [0]:
beds_df = pd.DataFrame(beds_list, columns=[beds_dict_flip[b]+' Beds' for b in bedtype_list])#.astype('uint8')
beds_df['PROVIDER_NUMBER'] = hosp_df['PROVIDER_NUMBER'].values

In [0]:
beds_df.head(25)

Unnamed: 0,Hospital Adult and Peds Beds,Intensive Care Unit Beds,Coronary Care Unit Beds,Burn ICU Beds,Surgical ICU Beds,Total Beds,PROVIDER_NUMBER
0,271.0,40.0,0.0,0.0,0.0,327.0,10001
1,184.0,20.0,0.0,0.0,0.0,204.0,10005
2,181.0,36.0,16.0,0.0,0.0,233.0,10006
3,40.0,5.0,0.0,0.0,0.0,45.0,10007
4,29.0,0.0,0.0,0.0,0.0,29.0,10008
5,,,,,,,10009
6,,,,,,,10010
7,209.0,30.0,56.0,0.0,14.0,309.0,10011
8,85.0,12.0,0.0,0.0,0.0,97.0,10012
9,,,,,,,10015


In [0]:
hosp_df = pd.merge(hosp_df, beds_df)
hosp_df = pd.merge(hosp_df, beddays_df)
hosp_df = pd.merge(hosp_df, ptdays_df)

In [0]:
icu_cols = [beds_dict_flip[b] for b in bedtype_list][1:-1]
icu_cols

['Intensive Care Unit', 'Coronary Care Unit', 'Burn ICU', 'Surgical ICU']

In [0]:
hosp_df.head()

Unnamed: 0,PROVIDER_NUMBER,FYB,FYE,STATUS,CTRL_TYPE,HOSP10_Name,Street_Addr,PO_Box,City,State,Zip_Code,County,Hospital Adult and Peds Beds,Intensive Care Unit Beds,Coronary Care Unit Beds,Burn ICU Beds,Surgical ICU Beds,Total Beds,Hospital Adult and Peds Bed Days Available,Intensive Care Unit Bed Days Available,Coronary Care Unit Bed Days Available,Burn ICU Bed Days Available,Surgical ICU Bed Days Available,Total Bed Days Available,Hospital Adult and Peds Inpt Days,Intensive Care Unit Inpt Days,Coronary Care Unit Inpt Days,Burn ICU Inpt Days,Surgical ICU Inpt Days,Total Inpt Days,ICU Total Beds
0,10001,01-OCT-17,30-SEP-18,As Submitted,9,SOUTHEAST HEALTH MEDICAL CENTER,1108 ROSS CLARK CIRCLE,6987,DOTHAN,AL,36301,HOUSTON,271.0,40.0,0.0,0.0,0.0,327.0,98915.0,14600.0,0.0,0.0,0.0,119355.0,78031.0,11992.0,0.0,0.0,0.0,95560.0,
1,10005,01-OCT-17,30-SEP-18,As Submitted,9,MARSHALL MEDICAL CENTER - SOUTH,2505 U.S. HIGHWAY 431,,BOAZ,AL,35957-,MARSHALL,184.0,20.0,0.0,0.0,0.0,204.0,67160.0,7300.0,0.0,0.0,0.0,74460.0,30597.0,5283.0,0.0,0.0,0.0,38089.0,
2,10006,01-JUL-18,30-JUN-19,As Submitted,4,NORTH ALABAMA MEDICAL CENTER,1701 VETERANS DRIVE,818,FLORENCE,AL,35630,LAUDERDALE,181.0,36.0,16.0,0.0,0.0,233.0,85802.0,10233.0,8135.0,0.0,0.0,104170.0,44822.0,5732.0,7515.0,0.0,0.0,61969.0,
3,10007,01-OCT-17,30-SEP-18,As Submitted,9,MIZELL MEMORIAL HOSPITAL,702 MAIN STREET,429,OPP,AL,36462-,COVINGTON,40.0,5.0,0.0,0.0,0.0,45.0,14600.0,1825.0,0.0,0.0,0.0,16425.0,3300.0,1126.0,0.0,0.0,0.0,4571.0,
4,10008,01-JAN-18,31-DEC-18,As Submitted,4,CRENSHAW COMMUNITY HOSPITAL,CRENSHAW COMMUNITY HOSPITAL,101 HOSPITAL CIRCLE,LUVERNE,AL,36049,CRENSHAW,29.0,0.0,0.0,0.0,0.0,29.0,10585.0,0.0,0.0,0.0,0.0,10585.0,1323.0,0.0,0.0,0.0,0.0,1334.0,


In [0]:
hosp_df[[c+' Beds' for c in icu_cols]].sum(axis=1)

0       40.0
1       20.0
2       52.0
3        5.0
4        0.0
        ... 
6657     0.0
6658     0.0
6659     0.0
6660     0.0
6661     0.0
Length: 6662, dtype: float64

In [0]:
hosp_df['ICU Total Beds'] = hosp_df[[c+' Beds' for c in icu_cols]].sum(axis=1)
hosp_df['ICU Total Bed Days Available'] = hosp_df[[c+' Bed Days Available' for c in icu_cols]].sum(axis=1)
hosp_df['ICU Total Inpt Days'] = hosp_df[[c+' Inpt Days' for c in icu_cols]].sum(axis=1)

In [0]:
hosp_df['ICU Occupancy Rate'] = hosp_df['ICU Total Inpt Days']/hosp_df['ICU Total Bed Days Available']
hosp_df['Total Bed Occupancy Rate'] = hosp_df['Total Inpt Days']/hosp_df['Total Bed Days Available']

In [0]:
hosp_df.head()

Unnamed: 0,PROVIDER_NUMBER,FYB,FYE,STATUS,CTRL_TYPE,HOSP10_Name,Street_Addr,PO_Box,City,State,Zip_Code,County,Hospital Adult and Peds Beds,Intensive Care Unit Beds,Coronary Care Unit Beds,Burn ICU Beds,Surgical ICU Beds,Total Beds,Hospital Adult and Peds Bed Days Available,Intensive Care Unit Bed Days Available,Coronary Care Unit Bed Days Available,Burn ICU Bed Days Available,Surgical ICU Bed Days Available,Total Bed Days Available,Hospital Adult and Peds Inpt Days,Intensive Care Unit Inpt Days,Coronary Care Unit Inpt Days,Burn ICU Inpt Days,Surgical ICU Inpt Days,Total Inpt Days,ICU Total Beds,ICU Total Bed Days Available,ICU Total Inpt Days,ICU Occupancy Rate,Total Bed Occupancy Rate
0,10001,01-OCT-17,30-SEP-18,As Submitted,9,SOUTHEAST HEALTH MEDICAL CENTER,1108 ROSS CLARK CIRCLE,6987,DOTHAN,AL,36301,HOUSTON,271.0,40.0,0.0,0.0,0.0,327.0,98915.0,14600.0,0.0,0.0,0.0,119355.0,78031.0,11992.0,0.0,0.0,0.0,95560.0,40.0,14600.0,11992.0,0.82137,0.800637
1,10005,01-OCT-17,30-SEP-18,As Submitted,9,MARSHALL MEDICAL CENTER - SOUTH,2505 U.S. HIGHWAY 431,,BOAZ,AL,35957-,MARSHALL,184.0,20.0,0.0,0.0,0.0,204.0,67160.0,7300.0,0.0,0.0,0.0,74460.0,30597.0,5283.0,0.0,0.0,0.0,38089.0,20.0,7300.0,5283.0,0.723699,0.511536
2,10006,01-JUL-18,30-JUN-19,As Submitted,4,NORTH ALABAMA MEDICAL CENTER,1701 VETERANS DRIVE,818,FLORENCE,AL,35630,LAUDERDALE,181.0,36.0,16.0,0.0,0.0,233.0,85802.0,10233.0,8135.0,0.0,0.0,104170.0,44822.0,5732.0,7515.0,0.0,0.0,61969.0,52.0,18368.0,13247.0,0.7212,0.594883
3,10007,01-OCT-17,30-SEP-18,As Submitted,9,MIZELL MEMORIAL HOSPITAL,702 MAIN STREET,429,OPP,AL,36462-,COVINGTON,40.0,5.0,0.0,0.0,0.0,45.0,14600.0,1825.0,0.0,0.0,0.0,16425.0,3300.0,1126.0,0.0,0.0,0.0,4571.0,5.0,1825.0,1126.0,0.616986,0.278295
4,10008,01-JAN-18,31-DEC-18,As Submitted,4,CRENSHAW COMMUNITY HOSPITAL,CRENSHAW COMMUNITY HOSPITAL,101 HOSPITAL CIRCLE,LUVERNE,AL,36049,CRENSHAW,29.0,0.0,0.0,0.0,0.0,29.0,10585.0,0.0,0.0,0.0,0.0,10585.0,1323.0,0.0,0.0,0.0,0.0,1334.0,0.0,0.0,0.0,,0.126027
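
The occupancy rates above are plain divisions, so a facility reporting zero bed days available gives `NaN` when patient days are also zero, but would give `inf` if any patient days were reported. A minimal sketch of guarding against that, on invented values:

```python
import pandas as pd

# Invented values: the second row has patient days but zero bed days available.
df = pd.DataFrame(
    {
        "ICU Total Inpt Days": [11992.0, 120.0, 0.0],
        "ICU Total Bed Days Available": [14600.0, 0.0, 0.0],
    }
)

# Replace zero denominators with NaN so the ratio is NaN instead of inf.
available = df["ICU Total Bed Days Available"]
df["ICU Occupancy Rate"] = df["ICU Total Inpt Days"] / available.where(available > 0)
```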


In [0]:
# some rows have all NaN for bed counts, suspect because they didn't submit report in 2018 by looking at FYB (fiscal year begin), FYE (fiscal year end), Status
# TODO: need to confirm or clarify why NaN for all bed counts, dropping those would remove ~1k facilities
hosp_df.dropna(subset=['Total Beds']).reset_index(drop=True)

Unnamed: 0,PROVIDER_NUMBER,FYB,FYE,STATUS,CTRL_TYPE,HOSP10_Name,Street_Addr,PO_Box,City,State,Zip_Code,County,Hospital Adult and Peds Beds,Intensive Care Unit Beds,Coronary Care Unit Beds,Burn ICU Beds,Surgical ICU Beds,Total Beds,Hospital Adult and Peds Bed Days Available,Intensive Care Unit Bed Days Available,Coronary Care Unit Bed Days Available,Burn ICU Bed Days Available,Surgical ICU Bed Days Available,Total Bed Days Available,Hospital Adult and Peds Inpt Days,Intensive Care Unit Inpt Days,Coronary Care Unit Inpt Days,Burn ICU Inpt Days,Surgical ICU Inpt Days,Total Inpt Days,ICU Total Beds,ICU Total Bed Days Available,ICU Total Inpt Days,ICU Occupancy Rate,Total Bed Occupancy Rate
0,010001,01-OCT-17,30-SEP-18,As Submitted,9,SOUTHEAST HEALTH MEDICAL CENTER,1108 ROSS CLARK CIRCLE,6987,DOTHAN,AL,36301,HOUSTON,271.0,40.0,0.0,0.0,0.0,327.0,98915.0,14600.0,0.0,0.0,0.0,119355.0,78031.0,11992.0,0.0,0.0,0.0,95560.0,40.0,14600.0,11992.0,0.821370,0.800637
1,010005,01-OCT-17,30-SEP-18,As Submitted,9,MARSHALL MEDICAL CENTER - SOUTH,2505 U.S. HIGHWAY 431,,BOAZ,AL,35957-,MARSHALL,184.0,20.0,0.0,0.0,0.0,204.0,67160.0,7300.0,0.0,0.0,0.0,74460.0,30597.0,5283.0,0.0,0.0,0.0,38089.0,20.0,7300.0,5283.0,0.723699,0.511536
2,010006,01-JUL-18,30-JUN-19,As Submitted,4,NORTH ALABAMA MEDICAL CENTER,1701 VETERANS DRIVE,818,FLORENCE,AL,35630,LAUDERDALE,181.0,36.0,16.0,0.0,0.0,233.0,85802.0,10233.0,8135.0,0.0,0.0,104170.0,44822.0,5732.0,7515.0,0.0,0.0,61969.0,52.0,18368.0,13247.0,0.721200,0.594883
3,010007,01-OCT-17,30-SEP-18,As Submitted,9,MIZELL MEMORIAL HOSPITAL,702 MAIN STREET,429,OPP,AL,36462-,COVINGTON,40.0,5.0,0.0,0.0,0.0,45.0,14600.0,1825.0,0.0,0.0,0.0,16425.0,3300.0,1126.0,0.0,0.0,0.0,4571.0,5.0,1825.0,1126.0,0.616986,0.278295
4,010008,01-JAN-18,31-DEC-18,As Submitted,4,CRENSHAW COMMUNITY HOSPITAL,CRENSHAW COMMUNITY HOSPITAL,101 HOSPITAL CIRCLE,LUVERNE,AL,36049,CRENSHAW,29.0,0.0,0.0,0.0,0.0,29.0,10585.0,0.0,0.0,0.0,0.0,10585.0,1323.0,0.0,0.0,0.0,0.0,1334.0,0.0,0.0,0.0,,0.126027
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
5599,673062,01-JAN-18,31-DEC-18,As Submitted,4,WEATHERFORD REABILITATION HOSPITAL,703 EUREKA ST,,WEATHERFORD,TX,76086,PARKER,26.0,0.0,0.0,0.0,0.0,26.0,9490.0,0.0,0.0,0.0,0.0,9490.0,5582.0,0.0,0.0,0.0,0.0,5582.0,0.0,0.0,0.0,,0.588198
5600,673064,01-APR-18,31-MAR-19,As Submitted,6,ICARE REHABILITATION HOSPITAL,3100 PETERS COLONY ROAD,,FLOWER MOUND,TX,75022-2949,,41.0,0.0,0.0,0.0,0.0,41.0,14965.0,0.0,0.0,0.0,0.0,14965.0,3532.0,0.0,0.0,0.0,0.0,3532.0,0.0,0.0,0.0,,0.236017
5601,673065,01-OCT-17,30-SEP-18,As Submitted,5,CHI ST. JOSEPH HEALTH REHABILITATION,1600 JOSEPH DRIVE,,BRYAN,TX,77802,BRAZOS,49.0,0.0,0.0,0.0,0.0,49.0,17885.0,0.0,0.0,0.0,0.0,17885.0,15895.0,0.0,0.0,0.0,0.0,15895.0,0.0,0.0,0.0,,0.888734
5602,673066,10-NOV-17,30-SEP-18,As Submitted,4,ENCOMPASS HEALTH REHABILITATION HOSP,2121 BUSINESS CENTER DRIVE,,PEARLAND,TX,77584,BRAORIA,40.0,0.0,0.0,0.0,0.0,40.0,13000.0,0.0,0.0,0.0,0.0,13000.0,8997.0,0.0,0.0,0.0,0.0,8997.0,0.0,0.0,0.0,,0.692077
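
One way to approach the TODO above is to tabulate the all-NaN rows by `STATUS` and `FYE` before deciding whether to drop them. The following is a minimal sketch, not part of the original notebook: `toy_df` is a made-up stand-in for `hosp_df` (only the column names come from the notebook; the rows and the 'Settled' status value are invented for illustration).

```python
import pandas as pd

# Toy stand-in for hosp_df; values are invented, column names match the notebook.
toy_df = pd.DataFrame({
    'PROVIDER_NUMBER': ['010001', '010005', '010006', '010007'],
    'FYE': ['30-SEP-18', '30-SEP-18', '30-JUN-19', '31-DEC-18'],
    'STATUS': ['As Submitted', 'As Submitted', 'Settled', 'As Submitted'],
    'Total Beds': [327.0, None, None, 45.0],
})

# Rows with no reported total bed count.
missing = toy_df[toy_df['Total Beds'].isna()]

# Count the missing rows per reporting status and fiscal year end.
summary = missing.groupby(['STATUS', 'FYE']).size()
print(summary)
```

If the missing rows cluster in a particular status or fiscal year, that would support the suspicion that they reflect non-submission rather than facilities with zero beds.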


# Geocoding facilities

In [0]:
!pip install geocoder


In [0]:
!pip install geopandas


In [0]:
import geocoder
from shapely.geometry import Point
from pprint import pprint
import geopandas as gpd
from tqdm import tqdm
import requests

In [0]:
# Add API tokens here; remember to clear them before committing the notebook to git
MAPBOX_TOKEN = '' 
GMAP_TOKEN = ''
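
An alternative to pasting tokens into the cell is to read them from environment variables, which avoids having to remember to delete them before a commit. This is only a sketch: the helper `load_token` and the environment variable names are assumptions made for illustration, not something the notebook defines.

```python
import os

def load_token(var_name):
    """Return the token stored in environment variable var_name, or '' if unset."""
    return os.environ.get(var_name, '')

# Hypothetical variable names; set them in the shell or Colab session beforehand.
MAPBOX_TOKEN = load_token('MAPBOX_TOKEN')
GMAP_TOKEN = load_token('GMAP_TOKEN')

if not (MAPBOX_TOKEN and GMAP_TOKEN):
    print('Warning: one or both geocoding tokens are unset')
```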

In [0]:
hosp_df.head()

Unnamed: 0,PROVIDER_NUMBER,FYB,FYE,STATUS,CTRL_TYPE,HOSP10_Name,Street_Addr,PO_Box,City,State,Zip_Code,County,Hospital Adult and Peds Beds,Intensive Care Unit Beds,Coronary Care Unit Beds,Burn ICU Beds,Surgical ICU Beds,Total Beds,Hospital Adult and Peds Bed Days Available,Intensive Care Unit Bed Days Available,Coronary Care Unit Bed Days Available,Burn ICU Bed Days Available,Surgical ICU Bed Days Available,Total Bed Days Available,Hospital Adult and Peds Inpt Days,Intensive Care Unit Inpt Days,Coronary Care Unit Inpt Days,Burn ICU Inpt Days,Surgical ICU Inpt Days,Total Inpt Days,ICU Total Beds,ICU Total Bed Days Available,ICU Total Inpt Days,ICU Occupancy Rate,Total Bed Occupancy Rate
0,10001,01-OCT-17,30-SEP-18,As Submitted,9,SOUTHEAST HEALTH MEDICAL CENTER,1108 ROSS CLARK CIRCLE,6987,DOTHAN,AL,36301,HOUSTON,271.0,40.0,0.0,0.0,0.0,327.0,98915.0,14600.0,0.0,0.0,0.0,119355.0,78031.0,11992.0,0.0,0.0,0.0,95560.0,40.0,14600.0,11992.0,0.82137,0.800637
1,10005,01-OCT-17,30-SEP-18,As Submitted,9,MARSHALL MEDICAL CENTER - SOUTH,2505 U.S. HIGHWAY 431,,BOAZ,AL,35957-,MARSHALL,184.0,20.0,0.0,0.0,0.0,204.0,67160.0,7300.0,0.0,0.0,0.0,74460.0,30597.0,5283.0,0.0,0.0,0.0,38089.0,20.0,7300.0,5283.0,0.723699,0.511536
2,10006,01-JUL-18,30-JUN-19,As Submitted,4,NORTH ALABAMA MEDICAL CENTER,1701 VETERANS DRIVE,818,FLORENCE,AL,35630,LAUDERDALE,181.0,36.0,16.0,0.0,0.0,233.0,85802.0,10233.0,8135.0,0.0,0.0,104170.0,44822.0,5732.0,7515.0,0.0,0.0,61969.0,52.0,18368.0,13247.0,0.7212,0.594883
3,10007,01-OCT-17,30-SEP-18,As Submitted,9,MIZELL MEMORIAL HOSPITAL,702 MAIN STREET,429,OPP,AL,36462-,COVINGTON,40.0,5.0,0.0,0.0,0.0,45.0,14600.0,1825.0,0.0,0.0,0.0,16425.0,3300.0,1126.0,0.0,0.0,0.0,4571.0,5.0,1825.0,1126.0,0.616986,0.278295
4,10008,01-JAN-18,31-DEC-18,As Submitted,4,CRENSHAW COMMUNITY HOSPITAL,CRENSHAW COMMUNITY HOSPITAL,101 HOSPITAL CIRCLE,LUVERNE,AL,36049,CRENSHAW,29.0,0.0,0.0,0.0,0.0,29.0,10585.0,0.0,0.0,0.0,0.0,10585.0,1323.0,0.0,0.0,0.0,0.0,1334.0,0.0,0.0,0.0,,0.126027


In [0]:
hosp_df.to_csv('usa_hospital_beds_hcris2018.csv')

In [0]:
# save geocoder results as Point(lon,lat) in dict with df idx as key
geoms = {}

In [0]:
# Geocode with Google Maps first (it gave the best results), then fall back to Mapbox with the full
# search string, and finally to Mapbox with the street address left out.
# TODO: figure out why geocoder with gmaps stopped sending requests halfway through; temporarily calling the Google Geocoding API directly

for i, row in tqdm(hosp_df.iterrows(), total=len(hosp_df)):
  search_str = ', '.join(row[['HOSP10_Name','Street_Addr','City','State','Zip_Code']].astype(str).values).replace('-','').replace('#','').replace('/',' ')
  new_str = ', '.join(row[['HOSP10_Name','City','State','Zip_Code']].astype(str).values).replace('-','').replace('#','').replace('/',' ').replace(';',',')

  try:
    # g = geocoder.google(search_str, key=GMAP_TOKEN)
    # lon, lat = g.json['lng'], g.json['lat']

    # pass the query through params so requests URL-encodes the address
    res = requests.get('https://maps.googleapis.com/maps/api/geocode/json',
                       params={'address': search_str, 'components': 'country:US', 'key': GMAP_TOKEN})
    g1 = res.json()['results'][0]['geometry']['location']
    lon, lat = g1['lng'], g1['lat']

  except Exception as exc:
    print(exc)
    try:
      # first fallback: Mapbox with the full search string
      print(i, search_str, new_str)
      g = geocoder.mapbox(search_str, key=MAPBOX_TOKEN)
      pprint(g.json)
      lon, lat = g.json['lng'], g.json['lat']

    except Exception as exc:
      # second fallback: Mapbox without the street address
      print('trying mapbox ', i, new_str)
      g = geocoder.mapbox(new_str, key=MAPBOX_TOKEN)
      pprint(g.json)
      lon, lat = g.json['lng'], g.json['lat']

  geoms[i] = Point(lon,lat)
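
The loop above relies on a broad `except` to detect a failed Google lookup (an empty `results` list raises `IndexError`). A slightly more explicit option is to check the response's `status` field before indexing into it. The sketch below assumes a hypothetical helper, `parse_google_geocode`, and uses hand-written payloads shaped like the Geocoding API's JSON response; the coordinates are invented.

```python
def parse_google_geocode(payload):
    """Return (lon, lat) from a Geocoding API JSON payload, or None if there is no match."""
    if payload.get('status') != 'OK' or not payload.get('results'):
        return None
    loc = payload['results'][0]['geometry']['location']
    return loc['lng'], loc['lat']

# Hand-written payloads for illustration only (not real API output).
hit = {'status': 'OK',
       'results': [{'geometry': {'location': {'lat': 31.2, 'lng': -85.4}}}]}
miss = {'status': 'ZERO_RESULTS', 'results': []}

print(parse_google_geocode(hit))   # (-85.4, 31.2)
print(parse_google_geocode(miss))  # None
```

Returning `None` on a miss would let the caller decide when to move on to the Mapbox fallback, rather than routing every kind of error through the same `except` branch.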

In [0]:
geoms

{0: <shapely.geometry.point.Point at 0x7f93f0f09b00>,
 1: <shapely.geometry.point.Point at 0x7f93ebd767f0>,
 2: <shapely.geometry.point.Point at 0x7f93ebd76828>,
 3: <shapely.geometry.point.Point at 0x7f93ee63dd30>,
 4: <shapely.geometry.point.Point at 0x7f93eed197b8>,
 5: <shapely.geometry.point.Point at 0x7f93ebd765f8>,
 6: <shapely.geometry.point.Point at 0x7f93f0f28828>,
 7: <shapely.geometry.point.Point at 0x7f93ebd7af98>,
 8: <shapely.geometry.point.Point at 0x7f93ee73ebe0>,
 9: <shapely.geometry.point.Point at 0x7f93f0f17eb8>}

In [0]:
hosp_df.shape

(6662, 35)

In [0]:
len(geoms)

6662

In [0]:
# order the points to match hosp_df's rows (geoms is keyed by df index)
geoms_list = [geoms[i] for i in hosp_df.index]

In [0]:
hosp_df['Total Beds'].sort_values(ascending=False)

854     2.290239e+09
1026    2.753000e+03
3901    2.272000e+03
5538    1.560000e+03
3959    1.468000e+03
            ...     
6621             NaN
6628             NaN
6641             NaN
6651             NaN
6657             NaN
Name: Total Beds, Length: 6662, dtype: float64
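
The first value in the sorted output (about 2.29 billion beds at index 854) is clearly not a real bed count. Rather than spotting such rows by eye, one option is to flag anything above a plausibility cap. This is a sketch under assumptions: `beds` is a toy series echoing a few values from the output above, and `MAX_PLAUSIBLE_BEDS` is an arbitrary threshold chosen for illustration, not a value from the HCRIS documentation.

```python
import pandas as pd

# Toy stand-in for hosp_df['Total Beds']; index labels and values echo the output above.
beds = pd.Series([2.290239e9, 2753.0, 2272.0, None],
                 index=[854, 1026, 3901, 6621], name='Total Beds')

MAX_PLAUSIBLE_BEDS = 10_000  # assumed cap, not taken from the source data

# NaN compares as False, so rows with missing counts are not flagged here.
suspect = beds[beds > MAX_PLAUSIBLE_BEDS]
print(suspect.index.tolist())
```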

In [0]:
# Build a GeoDataFrame from hosp_df and the geocoder results, drop row 854 (implausible Total Beds of ~2.29e9), and save to GeoJSON
gpd.GeoDataFrame(hosp_df.iloc[:], geometry=geoms_list).drop(index=854).to_file('usa_hospital_beds_hcris2018_cleaned3.geojson', encoding='utf-8', driver='GeoJSON')