# Process HIFLD data

This notebook reads the HIFLD hospitals dataset from CSV, removes closed facilities, and writes the result as GeoJSON.

Dataset description, from https://hifld-geoplatform.opendata.arcgis.com/datasets/hospitals:

> This feature class/shapefile contains locations of Hospitals for 50 US states, Washington D.C., US territories of Puerto Rico, Guam, American Samoa, Northern Mariana Islands, Palau, and Virgin Islands. The dataset only includes hospital facilities based on data acquired from various state departments or federal sources which has been referenced in the SOURCE field. Hospital facilities which do not occur in these sources will be not present in the database. The source data was available in a variety of formats (pdfs, tables, webpages, etc.) which was cleaned and geocoded and then converted into a spatial database. The database does not contain nursing homes or health centers. Hospitals have been categorized into children, chronic disease, critical access, general acute care, long term care, military, psychiatric, rehabilitation, special, and women based on the range of the available values from the various sources after removing similarities. In this update the TRAUMA field was populated for 172 additional hospitals and helipad presence were verified for all hospitals.

In [None]:
import pandas as pd
import geopandas as gpd

from covidcaremap.data import external_data_path, processed_data_path

In [None]:
# Load the raw HIFLD hospitals CSV into a DataFrame.
hifld_file_path = external_data_path('hifld-hospitals.csv')
hifld_df = pd.read_csv(hifld_file_path, encoding='utf-8')

In [None]:
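# Build point geometries from the X/Y columns; with EPSG:4326 these are
# read as longitude (X) and latitude (Y) in degrees.
hifld_gdf = gpd.GeoDataFrame(
    hifld_df,
    crs='EPSG:4326',
    geometry=gpd.points_from_xy(hifld_df['X'], hifld_df['Y']))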
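
Because `points_from_xy` takes x first, the cell above treats `X` as longitude and `Y` as latitude. A range check before building geometries can catch missing values and some swapped coordinates. The following is a minimal sketch in plain Python; the helper name `is_valid_lon_lat` and the sample values are hypothetical and not part of the notebook or the HIFLD data.

```python
import math

def is_valid_lon_lat(x, y):
    """Return True if (x, y) is a plausible (longitude, latitude) pair in degrees."""
    try:
        lon, lat = float(x), float(y)
    except (TypeError, ValueError):
        # None, empty strings, or other non-numeric values
        return False
    if math.isnan(lon) or math.isnan(lat):
        return False
    return -180.0 <= lon <= 180.0 and -90.0 <= lat <= 90.0

# Hypothetical sample rows as (X, Y) pairs.
samples = [
    (-77.03, 38.90),   # plausible lon/lat
    (38.90, -122.40),  # swapped: "latitude" falls outside [-90, 90]
    (None, 38.90),     # missing X
    (-200.0, 10.0),    # longitude out of range
]
flags = [is_valid_lon_lat(x, y) for x, y in samples]
```

A check like this cannot detect every swap, since a pair such as (38.90, -77.03) stays inside both ranges.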

#### Remove closed facilities

Drop every facility whose `STATUS` column equals `CLOSED`.

In [None]:
hifld_gdf = hifld_gdf[hifld_gdf['STATUS'] != 'CLOSED']
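
An exact inequality filter like the one above keeps rows whose status is missing or spelled differently, since neither equals `'CLOSED'`. The sketch below mimics the comparison on plain Python dictionaries to make that visible. The records are made up for illustration, and the stricter `is_closed` helper is an assumption, not something the notebook does.

```python
# Made-up records standing in for rows of the HIFLD table.
records = [
    {"NAME": "A", "STATUS": "OPEN"},
    {"NAME": "B", "STATUS": "CLOSED"},
    {"NAME": "C", "STATUS": None},      # missing status
    {"NAME": "D", "STATUS": "closed"},  # different casing (hypothetical)
]

# Same comparison as the notebook cell: exact, case-sensitive inequality.
kept_names = [r["NAME"] for r in records if r["STATUS"] != "CLOSED"]

# A stricter variant (an assumption): ignore case and surrounding whitespace.
def is_closed(status):
    return isinstance(status, str) and status.strip().upper() == "CLOSED"

strict_names = [r["NAME"] for r in records if not is_closed(r["STATUS"])]
```

Whether the stricter variant is needed depends on the values that actually appear in `STATUS`; inspecting them with `hifld_df['STATUS'].value_counts(dropna=False)` would settle that.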

In [None]:
hifld_gdf.to_file(processed_data_path('hifld_facility_data.geojson'),
                  encoding='utf-8',
                  driver='GeoJSON')
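
For orientation, a GeoJSON file of point data is a `FeatureCollection` whose features each carry a `Point` geometry with coordinates in `[longitude, latitude]` order plus a `properties` object. The sketch below builds that structure by hand with the standard library. It illustrates the general format only and is not the exact output GeoPandas writes; the helper `rows_to_feature_collection` and the sample row are hypothetical.

```python
import json

def rows_to_feature_collection(rows, x_key="X", y_key="Y"):
    """Convert dict rows with x/y fields into a GeoJSON FeatureCollection dict."""
    features = []
    for row in rows:
        features.append({
            "type": "Feature",
            "geometry": {
                "type": "Point",
                # GeoJSON positions are ordered [longitude, latitude].
                "coordinates": [float(row[x_key]), float(row[y_key])],
            },
            # Keep every field of the row as a feature property.
            "properties": dict(row),
        })
    return {"type": "FeatureCollection", "features": features}

# Hypothetical sample row, not taken from the HIFLD data.
rows = [{"X": -77.03, "Y": 38.90, "NAME": "EXAMPLE HOSPITAL", "STATUS": "OPEN"}]
collection = rows_to_feature_collection(rows)
geojson_text = json.dumps(collection, indent=2)
```

Reading the written file back with `gpd.read_file` is a quick way to confirm that the real output has the expected number of features.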