# Delivery! App
## Karla Knudson
### Insight Data Science Project

This is the base code for Delivery!, the app that I developed for my Insight Data Science Project. The app was launched on AWS using Dash. (The full code for the Dash app can also be found in a separate file on GitHub.) The final web app is available at http://www.delivermybaby.io/

# Import tools

In [15]:
import math
import dash
import dash_core_components as dcc
import dash_html_components as html
from dash.dependencies import Input, Output, State
import numpy as np
import pandas as pd

# Load data

Data files used here: 

1. Zipcode data - This file contains the latitude and longitude for all California zipcodes. The file was sourced from: https://www.gaslampmedia.com/download-zip-code-latitude-longitude-city-state-county-csv/. There were some missing values, which I looked up using Google Maps and added to the file.

2. Hospital data - This spreadsheet compiles data from 4 different files sourced from the California Health and Human Services Open Data Portal:

    Primary Care Health Professional Shortage Area (HPSA) - Census Detail https://data.chhs.ca.gov/dataset/primary-care-health-professional-shortage-area-hpsa-census-detail This dataset contains the Medical Service Study Areas (MSSA), at the census tract level, designated as Primary Care Health Professional Shortage Areas in California in 2014.

    Utilization rates for select medical procedures (2005-2015) https://data.chhs.ca.gov/dataset/utilization-rates-for-selected-medical-procedures-in-california-hospitals For each hospital in California, this dataset reports the rates of three procedures: uncomplicated primary C-section, uncomplicated VBAC (vaginal birth after C-section), and laparoscopic cholecystectomy (gall bladder surgery).

    2017 Hospital Inpatient - Characteristics by Facility (Pivot Profile) https://data.chhs.ca.gov/dataset/ed5d21ce-9ec4-44ca-a482-2f4c767e0528/resource/fa44b3f7-7c0d-4baa-b49c-4ebb8c4ebe4e/download/2017pddpivot20180712.xlsm

    Licensed Bed Classification and Designations Trends https://data.chhs.ca.gov/dataset/licensed-bed-classification-and-designations-trends/resource/2cfb1ff7-c204-4f5b-8e1f-2373007fe1d1

Hospital data sets were compiled using VLOOKUP tables in Excel prior to importing into Jupyter Notebook.

In [22]:
zipcode_data = pd.read_csv('on_server/data/zipcode.csv', header=None)
zipcode_data = zipcode_data.set_index(0)

hospital_data = pd.read_csv('on_server/data/hospital8.csv', header=None)
hospital_data = hospital_data.set_index(0)


# Hospital calculations for all California maternity hospitals

Here, I calculate hospital means and standard deviations for each of the three hospital metrics: C-section rates, VBAC rates, and NICU size. These values will be used in individual hospital score calculations later.

### Calculate population mean values

In [23]:
avg_csec = np.mean(hospital_data[3])  # avg low-risk primary C-section rate (%)
avg_vbac = np.mean(hospital_data[4])  # avg VBAC rate (%)
avg_nicu = np.mean(hospital_data[5])  # avg total NICU discharges in 2005

### Calculate the population standard deviation values

In [24]:
sd_csec = np.std(hospital_data[3])  # SD of low-risk primary C-section rate (%)
sd_vbac = np.std(hospital_data[4])  # SD of VBAC rate (%)
sd_nicu = np.std(hospital_data[5])  # SD of total NICU discharges in 2005
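As a minimal sketch of how these means and standard deviations turn into z-scores, the example below uses four made-up C-section rates (not values from the hospital file). It also shows the difference between the population standard deviation (`ddof=0`, NumPy's default) and the sample standard deviation (`ddof=1`, the default of the pandas `std` method), since the two give slightly different z-scores.

```python
import numpy as np
import pandas as pd

# Hypothetical C-section rates (%) for four hospitals; not taken from the real data file
rates = pd.Series([20.0, 25.0, 30.0, 25.0])

mean_rate = np.mean(rates.to_numpy())
sd_pop = np.std(rates.to_numpy())   # NumPy default ddof=0: population SD
sd_sample = rates.std()             # pandas default ddof=1: sample SD

# Z-score: how many population SDs each hospital sits above or below the mean
z_scores = (rates - mean_rate) / sd_pop
print(z_scores.round(3).tolist())
```

A hospital with a rate equal to the mean gets a z-score of 0, and the z-scores of all hospitals average to 0.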

# Convert dataframes to nested dictionaries

I convert both dataframes to nested dictionaries. The zip code dictionary is keyed by zip code. The dictionary for all California maternity hospitals is keyed by the first column of the hospital file, which the search function later reports as the hospital name. For each hospital, the inner keys are the latitude of the hospital, the longitude of the hospital, the uncomplicated C-section rate, the vaginal birth after C-section (VBAC) rate, the NICU volume per year, the city, and the Health Professional Shortage Area (HPSA) status (1 = hospital is within an HPSA, 0 = hospital is not within an HPSA).

In [25]:
zipcode_data = zipcode_data.to_dict('index')
print(zipcode_data)
hospital_data = hospital_data.rename(
    index=str,
    columns={1: "lat", 2: "lng", 3: "c-sec", 4: "vbac", 5: "nicu", 6: "city", 7: "hpsa"}
)
hospital_data = hospital_data.to_dict('index')

{94538: {1: 37.509453, 2: -121.95832, 3: 'Fremont', 4: 'CA', 5: 'Alameda'}, 94539: {1: 37.520339, 2: -121.912568, 3: 'Fremont', 4: 'CA', 5: 'Alameda'}, 94560: {1: 37.534102000000004, 2: -122.025352, 3: 'Newark', 4: 'CA', 5: 'Alameda'}, 94536: {1: 37.565284999999996, 2: -121.982721, 3: 'Fremont', 4: 'CA', 5: 'Alameda'}, 94555: {1: 37.570681, 2: -122.063323, 3: 'Fremont', 4: 'CA', 5: 'Alameda'}, 94586: {1: 37.585883, 2: -121.88301799999999, 3: 'Sunol', 4: 'CA', 5: 'Alameda'}, 94587: {1: 37.589084, 2: -121.97362, 3: 'Union City', 4: 'CA', 5: 'Alameda'}, 94544: {1: 37.613883, 2: -122.061673, 3: 'Hayward', 4: 'CA', 5: 'Alameda'}, 94545: {1: 37.635482, 2: -122.092324, 3: 'Hayward', 4: 'CA', 5: 'Alameda'}, 94566: {1: 37.646081, 2: -121.862128, 3: 'Pleasanton', 4: 'CA', 5: 'Alameda'}, 94542: {1: 37.662552000000005, 2: -122.051179, 3: 'Hayward', 4: 'CA', 5: 'Alameda'}, 94541: {1: 37.675129999999996, 2: -121.97412, 3: 'Hayward', 4: 'CA', 5: 'Alameda'}, 94550: {1: 37.676781, 2: -121.91605, 3: 'Li
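To illustrate the shape that `to_dict('index')` produces, here is a small sketch built from two of the rows shown in the printed output. Because the files are read without a header row, the inner keys are the integer column positions (1 = latitude, 2 = longitude, and so on). The names `rows`, `frame`, and `lookup` are used only for this illustration.

```python
import pandas as pd

# Two rows copied from the printed zip code output (coordinates rounded)
rows = [
    [94538, 37.509453, -121.95832, 'Fremont', 'CA', 'Alameda'],
    [94560, 37.534102, -122.025352, 'Newark', 'CA', 'Alameda'],
]

# Columns are numbered 0..5; column 0 (the zip code) becomes the index
frame = pd.DataFrame(rows).set_index(0)

# One outer key per index value, one inner key per remaining column
lookup = frame.to_dict('index')

lat, lng = lookup[94538][1], lookup[94538][2]
print(lat, lng, lookup[94538][3])
```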

# Distance function 

This function takes the latitude and longitude (in degrees) of an origin and a destination and outputs how far away the two locations are from each other (in miles). This is needed so that a person using the app can locate all of the hospitals within a given distance from her home.

In [26]:
def distance(origin, destination):
    """Return the great-circle distance in miles between two (lat, lng) points given in degrees."""
    lat1, lon1 = origin
    lat2, lon2 = destination
    radius = 3959  # radius of the Earth in miles
    dlat = math.radians(lat2-lat1)
    dlon = math.radians(lon2-lon1)
    # Haversine formula
    a = math.sin(dlat/2) * math.sin(dlat/2) + math.cos(math.radians(lat1)) \
        * math.cos(math.radians(lat2)) * math.sin(dlon/2) * math.sin(dlon/2)
    c = 2 * math.atan2(math.sqrt(a), math.sqrt(1-a))
    d = radius * c
    return d
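The function above applies the haversine formula to one pair of points at a time. As a hedged alternative, the sketch below computes the same quantity with NumPy for many destinations at once, which could be convenient when checking every hospital against one home location. The name `distances_from` and the test coordinates are assumptions made for this illustration and are not part of the app.

```python
import numpy as np

EARTH_RADIUS_MILES = 3959  # same radius as the distance function above


def distances_from(origin, lats, lngs):
    """Haversine distance in miles from one (lat, lng) origin to many destinations."""
    lat1, lon1 = np.radians(origin)
    lat2 = np.radians(np.asarray(lats, dtype=float))
    lon2 = np.radians(np.asarray(lngs, dtype=float))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * np.arctan2(np.sqrt(a), np.sqrt(1 - a))


# Hypothetical points: the origin itself, and a point one degree of latitude north
origin = (37.0, -122.0)
miles = distances_from(origin, [37.0, 38.0], [-122.0, -122.0])
print(miles)
```

One degree of latitude corresponds to 3959 * pi / 180 miles, roughly 69 miles, which gives a quick sanity check on the result.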

# Search for hospitals
### Look for hospitals within a given distance and rank them according to hospital criteria specified by the user

The following function takes user input from the Dash app:
a) the user's home zip code
b) the maximum distance that the user is willing to travel to a hospital
c) how important each feature is to the user; the features are distance to the hospital, C-section rate, VBAC rate, and NICU size

The function then uses the previously defined distance function to locate all of the maternity hospitals within the designated range of the user's home. For each hospital within this range, the function calculates a hospital score, which is used to rank the hospitals for that user. The hospital score combines the z-scores of the hospital metrics (C-section rate, VBAC rate, and NICU size) and the distance to the hospital, each multiplied by the relative weight that the user assigned to that factor.

Specifically: hospital score = b1 * (1 - z_csec) + b2 * z_vbac + b3 * z_nicu - b4 * dist_to_hosp, where b(i) = the weight of each factor set by the user, z(i) = the z-score of each hospital metric, and dist_to_hosp = the distance to the hospital in miles.

The weights are normalized so that they sum to 1.
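To make the formula concrete, here is a small worked sketch. The z-scores and the distance are invented for illustration and do not describe any real hospital; `hospital_score` is a hypothetical helper that mirrors the formula, not a function from the app.

```python
def hospital_score(z_csec, z_vbac, z_nicu, dist_miles,
                   w_csec, w_vbac, w_nicu, w_close):
    """Weighted hospital score following the formula in the text."""
    total = w_csec + w_vbac + w_nicu + w_close
    # Normalize the user's weights so that b1 + b2 + b3 + b4 = 1
    b1, b2, b3, b4 = (w / total for w in (w_csec, w_vbac, w_nicu, w_close))
    return b1 * (1 - z_csec) + b2 * z_vbac + b3 * z_nicu - b4 * dist_miles


# Hypothetical hospital: C-section rate 1 SD below the mean, VBAC rate 0.5 SD
# above the mean, average NICU volume, 8 miles from home, equal weights of 5
score = hospital_score(z_csec=-1.0, z_vbac=0.5, z_nicu=0.0, dist_miles=8.0,
                       w_csec=5, w_vbac=5, w_nicu=5, w_close=5)
print(score)  # 0.25*2 + 0.25*0.5 + 0.25*0 - 0.25*8 = -1.375
```

In this example the distance term contributes -2.0 while the three hospital metrics together contribute 0.625, because the distance enters in miles rather than as a z-score.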

#### Hospital search and rank function

In [27]:
def get_hospitals(n_clicks, zipcode, distance_max, w_close, w_csec, w_vbac, w_nicu):
    # n_clicks is passed in by the Dash app and is not used in the calculation
    zipcode = int(zipcode)
    distance_max = int(distance_max)
    zipcode = zipcode_data[zipcode]  # raises KeyError if the zip code is not in the file
    origin = (zipcode[1], zipcode[2])  # (lat, lng)
    results = {}  # dictionary of all the hospitals in range and their associated data
    for hospital, values in hospital_data.items():  # key is the hospital; value is its dictionary of variables
        destination = (values['lat'], values['lng'])
        dist_to_hosp = distance(origin, destination)
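        if dist_to_hosp <= distance_max:
            # Work on a copy so that repeated calls do not overwrite the shared
            # hospital_data entries (for example, the numeric 'hpsa' flag)
            values = dict(values)
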
            # Z-score of the c-section parameter:
            d_csec = np.subtract(values['c-sec'], avg_csec)
            z_csec = np.divide(d_csec, sd_csec)

            # Z-score of the VBAC parameter:
            d_vbac = np.subtract(values['vbac'], avg_vbac)
            z_vbac = np.divide(d_vbac, sd_vbac)

            # Z-score of the NICU parameter:
            d_nicu = np.subtract(values['nicu'], avg_nicu)
            z_nicu = np.divide(d_nicu, sd_nicu)

            # Normalized weights of all factors (sum = 1)
            w_total = w_csec + w_vbac + w_nicu + w_close
            b1 = w_csec / w_total   # weight of low c-section rates
            b2 = w_vbac / w_total   # weight of high VBAC rates
            b3 = w_nicu / w_total   # weight of high NICU volume
            b4 = w_close / w_total  # weight of short distance to hospital from home

            # Calculate hospital scores
            values['h_score'] = b1 * (1 - z_csec) + b2 * z_vbac + b3 * z_nicu - b4 * dist_to_hosp

            # add distance to hospital to the dictionary
            values['dist'] = '{0:.2f}'.format(dist_to_hosp) #format for 2 decimals
            
            # add an alert if a resulting hospital is within a HPSA
            if values['hpsa'] == 1:
                values['hpsa'] = 'Alert: This is a registered Health Professional Shortage Area (HPSA).'
            else:
                values['hpsa'] = ''
                
            results[hospital] = values

    # Sort the results to provide a ranked list for each individual app user
    sorted_keys = sorted(results.keys(), key=lambda y: (results[y]['h_score']), reverse=True)
    
    data = []
    for hospital in sorted_keys:
        data.append([
            hospital,
            results[hospital]['city'],
            results[hospital]['dist'],
            str(results[hospital]['c-sec']),
            str(results[hospital]['vbac']),
            str(results[hospital]['nicu']),
            str(results[hospital]['h_score']),
            results[hospital]['hpsa']
        ])
    
    columns = [
        'Hospital',
        'City',
        'Distance',
        'c-sec',
        'vbac',
        'nicu',
        'h_score',
        'hpsa'
    ]
        
    sorted_results = pd.DataFrame(data, columns=columns)

    return sorted_results
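The function above assumes that the zip code exists in the zip code dictionary and that at least one weight is nonzero; otherwise it raises a `KeyError` or a `ZeroDivisionError`. The sketch below shows one possible way to guard against both cases. The helpers `normalize_weights` and `lookup_origin`, and the fallback to equal weights, are assumptions made for illustration and are not part of the app.

```python
def normalize_weights(w_close, w_csec, w_vbac, w_nicu):
    """Return the four weights scaled to sum to 1."""
    weights = {'close': w_close, 'csec': w_csec, 'vbac': w_vbac, 'nicu': w_nicu}
    total = sum(weights.values())
    if total == 0:
        # Assumption: treat "everything set to zero" as "everything equally important"
        return {name: 1 / len(weights) for name in weights}
    return {name: w / total for name, w in weights.items()}


def lookup_origin(zipcode, zipcode_table):
    """Return (lat, lng) for a zip code, or None if the zip code is unknown."""
    entry = zipcode_table.get(int(zipcode))
    if entry is None:
        return None
    return (entry[1], entry[2])


# One entry in the same shape as zipcode_data (coordinates from the printed output)
table = {94538: {1: 37.509453, 2: -121.95832, 3: 'Fremont', 4: 'CA', 5: 'Alameda'}}

print(normalize_weights(0, 0, 0, 0))
print(lookup_origin('94538', table))
print(lookup_origin(99999, table))
```

A caller could return an empty result table when `lookup_origin` gives `None` instead of letting the error reach the user. Non-numeric input would still raise a `ValueError` in `int(zipcode)` and would need its own check.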

# Test search results

This example is hard-coded to test the base code before deploying on AWS using Dash. The example shows the following input: 1 click, home zip code of 95060 (Santa Cruz, CA), willing to drive 100 miles to the hospital, and mid-values (5/10) for each hospital feature - c-section rates, VBAC rates, NICU size, and distance to hospital.

In [28]:
results = get_hospitals(1, 95060, 100, 5, 5, 5, 5) 
print(results.columns)
for i in range(len(results)):
    print(results.iloc[i])

Index(['Hospital', 'City', 'Distance', 'c-sec', 'vbac', 'nicu', 'h_score',
       'hpsa'],
      dtype='object')
Hospital    Sutter Maternity and Surgery Center of Santa Cruz
City                                               SANTA CRUZ
Distance                                                 8.59
c-sec                                                    17.4
vbac                                                     33.3
nicu                                                        0
h_score                                   -1.1443695054347165
hpsa                                                         
Name: 0, dtype: object
Hospital    Dominican Hospital
City                SANTA CRUZ
Distance                  8.28
c-sec                     26.6
vbac                      21.4
nicu                       309
h_score     -1.346932567950116
hpsa                          
Name: 1, dtype: object
Hospital    Good Samaritan Hospital – San Jose
City                                  SAN JOSE
Dis