# Coursera-IBM Data Science Capstone Project
## Los Angeles County Hospital-COVID Mapping
Bofan Chen  
15012021

# (1) Introduction

## (1-1) Background

In difficult public health circumstances, supplies may become scarce, and communities may not be affected equally. Looking at a range of demographic, public health, and hospital data may allow decision-makers to create informed plans about how to mobilize medical resources in a targeted fashion.

In more regular times, certain regions may still be underserved with regards to their healthcare needs. Geospatial analysis can also help determine which areas may benefit most from a newly constructed hospital or increased medical staffing.

## (1-2) Problem

This capstone project for the Coursera-IBM Data Science Certificate will explore hospital data by cities in Los Angeles County, as well as COVID case data in the region between mid-December 2020 to mid-January 2021.

More specifically, this assignment tries to analyze data in two areas.

1. Which zip codes in LAC are located furthest away from the nearest medical center, as found by Foursquare, and would therefore perhaps benefit most from a new hospital.
2. Which cities in LAC have the highest ratios of COVID cases to hospital beds, and could require increased medical assistance in the near future; as well as which cities in LAC have the highest ratios of people to hospital beds, and may benefit from increased hospital capacities in the longer term.

# (2) Data

Import `lxml` to parse webpage HTML and `requests` to connect to URLS (in this project, to FourSquare).

In [1]:
try:
    import lxml
except:
    !pip install lxml
    import lxml

try:
    import requests
except:
    !pip install requests
    import requests

print("Libraries imported.")

Libraries imported.


Import `numpy` and `pandas` for handling data sets.

In [2]:
# <numpy> for arrays functions.
try:
    import numpy as np
except:
    !pip install numpy
    import numpy as np
# <pandas> for data analysis and manipulation.
try:
    import pandas as pd
except:
    !pip install pandas
    import pandas as pd

print("Libraries imported.")

Libraries imported.


## (2-1) List Of Los Angeles County Cities
Data provided by Wikipedia.  
https://en.wikipedia.org/wiki/List_of_cities_in_Los_Angeles_County,_California

In [3]:
webpage = "https://en.wikipedia.org/wiki/List_of_cities_in_Los_Angeles_County,_California"
df_list = pd.read_html(webpage)

# Manual examination not shown.
# The first table contains the relevant city data.
# The second table contains CSS styling code.

city_list = df_list[0]

# Remove unnecessary column.
city_list.drop("Date incorporated", axis = 1, inplace = True)

# Rename a column.
city_list.rename(columns = {"Population as of(2010 Census)":"2010 Population"}, inplace = True)

# Basic inspection.
print("The Wikipedia Los Angeles County city populations data includes " + str(city_list.shape[0]) + " cities.")
city_list

The Wikipedia Los Angeles County city populations data includes 88 cities.


Unnamed: 0,City,2010 Population
0,Agoura Hills,20330
1,Alhambra,83653
2,Arcadia,56364
3,Artesia,16522
4,Avalon,3728
...,...,...
83,Walnut,29172
84,West Covina,106098
85,West Hollywood,34399
86,Westlake Village,8270


## (2-2) Los Angeles County Zip Code Coordinate Data
Data compiled by Schyuyler Erle at Geocoder and provided by Civic Space Labs.  
https://civicspacelabs.org/download/

In [4]:
# Import the zip code coordinate data.
zip_loc = pd.read_csv("data/zipcode.csv")

# Drop all zip codes not part of a city in Los Angeles County.
zip_loc = zip_loc.loc[zip_loc["state"] == "CA"]
zip_loc = zip_loc.loc[zip_loc["city"].isin(city_list["City"].to_list())]

# Drop unnecessary columns.
zip_loc.drop(["state", "timezone", "dst"], axis = 1, inplace = True)

# Rename certain columns.
zip_loc.rename(columns = {"zip":"Zip Code", \
                           "city":"City", \
                           "latitude":"Latitude", \
                           "longitude":"Longitude"}, inplace = True)

zip_loc.reset_index(drop = True, inplace = True)

# Do a basic examination of the data.
print("The zip code geographical coordinates data for California zip codes has " + str(zip_loc.shape[0]) + " zip codes.")
print("There are " + str(len(zip_loc["City"].unique())) + " cities represented in the above data.")
zip_loc

The zip code geographical coordinates data for California zip codes has 404 zip codes.
There are 75 cities represented in the above data.


Unnamed: 0,Zip Code,City,Latitude,Longitude
0,90001,Los Angeles,33.972914,-118.248780
1,90002,Los Angeles,33.948315,-118.248450
2,90003,Los Angeles,33.962714,-118.276000
3,90004,Los Angeles,34.077110,-118.307550
4,90005,Los Angeles,34.058911,-118.308480
...,...,...,...,...
399,93584,Lancaster,33.786594,-118.298662
400,93586,Lancaster,33.786594,-118.298662
401,93590,Palmdale,33.786594,-118.298662
402,93591,Palmdale,34.596742,-117.844670


## (2-3) Los Angeles County Hospital Bed Data
Data provided by the California Health and Human Services open data portal.  
https://data.chhs.ca.gov/dataset/licensed-healthcare-facility-listing/resource/677eb6cf-46b0-47f0-9184-7e36f6743ac4

In [5]:
file = "data/licensed-healthcare-facility-listing-december-31-2020.csv"
hsp_beds = pd.read_csv(file)

# Keep only medical care facilities in Los Angeles County.
hsp_beds = hsp_beds.loc[hsp_beds["COUNTY_NAME"] == "Los Angeles"]
hsp_beds.drop(["COUNTY_NAME"], axis = 1, inplace = True)
hsp_beds

Unnamed: 0,OSHPD_ID,FACILITY_NAME,LICENSE_NUM,FACILITY_LEVEL_DESC,DBA_ADDRESS1,DBA_CITY,DBA_ZIP_CODE,COUNTY_CODE,ER_SERVICE_LEVEL_DESC,TOTAL_NUMBER_BEDS,FACILITY_STATUS_DESC,FACILITY_STATUS_DATE,LICENSE_TYPE_DESC,LICENSE_CATEGORY_DESC,LATITUDE,LONGITUDE
88,106190017,ALHAMBRA HOSPITAL MEDICAL CENTER,930000005,Parent Facility,100 S RAYMOND AVE,ALHAMBRA,91801,19,Emergency - Basic,144,Open,1946-01-01,Hospital,General Acute Care Hospital,34.08988,-118.144900
89,106190020,BHC ALHAMBRA HOSPITAL,930000006,Parent Facility,4619 ROSEMEAD BLVD,ROSEMEAD,91770,19,Not Applicable,97,Open,1946-01-01,Hospital,Acute Psychiatric Hospital,34.08926,-118.073520
90,106190034,ANTELOPE VALLEY HOSPITAL,930000008,Parent Facility,1600 W AVE J,LANCASTER,93534,19,Emergency - Basic,420,Open,1955-09-27,Hospital,General Acute Care Hospital,34.68780,-118.157981
91,106190045,CATALINA ISLAND MEDICAL CENTER,930000010,Parent Facility,100 FALLS CANYON RD,AVALON,90704,19,Emergency - Standby,12,Open,1946-01-01,Hospital,General Acute Care Hospital,33.33887,-118.333690
92,106190049,KINDRED HOSPITAL - BALDWIN PARK,930000390,Parent Facility,14148 FRANCISQUITO AVE,BALDWIN PARK,91706,19,,95,Open,2003-12-23,Hospital,General Acute Care Hospital,34.06288,-117.967611
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
7992,406561615,"SPIRIT HEALTHCARE HOSPICE, INC.",550004634,Parent Facility,30941 Agoura Rd,Westlake Village,91361,19,Not Applicable,,Open,2019-03-15,Home Health Agency/Hospice,Hospice,34.14489,-118.795400
8043,406564172,"GIBRALTAR HOME HEALTH SERVICES, LLC",550000587,Parent Facility,27001 Agoura Rd,Calabasas,91301,19,Not Applicable,,Open,2007-09-17,Home Health Agency/Hospice,Home Health Agency,34.13804,-118.713720
8073,406564227,CARE ONE HOSPICE,550001869,Parent Facility,18520 Burbank Blvd,Tarzana,91356,19,Not Applicable,,Open,2012-08-20,Home Health Agency/Hospice,Hospice,34.17231,-118.536840
8095,406564263,HESTIA HOSPICE AND PALLIATIVE CARE CORPORATION,550001490,Parent Facility,15545 Devonshire St,Mission Hills,91345,19,Not Applicable,,Open,2015-01-12,Home Health Agency/Hospice,Hospice,34.25735,-118.470810


Examine the types of facilities included in the data.

In [6]:
print(hsp_beds["LICENSE_CATEGORY_DESC"].unique())

['General Acute Care Hospital' 'Acute Psychiatric Hospital'
 'Chemical Dep. Recovery Hospital' 'Psychiatric Health Facility'
 'Skilled Nursing Facility' 'Congregate Living Health Facility'
 'ICF/Dev. Disabled' 'Intermediate Care Facility' 'Community Clinic'
 'Chronic Dialysis Clinic' 'Free Clinic' 'Psychology Clinic'
 'Rehabilitation Clinic' 'Surgical Clinic' 'Alternative Birthing Center'
 'Home Health Agency' 'Hospice']


See what statuses other than "open" are used to describe facilities.

In [7]:
print(hsp_beds["FACILITY_STATUS_DESC"].unique())

['Open' 'Suspense']


In [8]:
# Remove facilities whose operations are currently "suspended" and those with an unspecified number of beds.
hsp_beds = hsp_beds.loc[(hsp_beds["FACILITY_STATUS_DESC"] == "Open") & \
                        (hsp_beds["TOTAL_NUMBER_BEDS"].notna())]

# Remove unneeded columns. Keep coordinate data for merging purposes later on with FourSquare data.
hsp_beds = hsp_beds[["FACILITY_NAME", "TOTAL_NUMBER_BEDS", "DBA_ZIP_CODE", "DBA_CITY", "LATITUDE", "LONGITUDE"]]

# Rename certain columns.
hsp_beds.rename(columns = {"FACILITY_NAME":"Hospital Name", \
                           "TOTAL_NUMBER_BEDS":"Number Of Beds", \
                           "DBA_ZIP_CODE":"Zip Code", \
                           "DBA_CITY":"City", \
                           "LATITUDE":"Latitude", \
                           "LONGITUDE":"Longitude"}, inplace = True)

hsp_beds.reset_index(drop = True, inplace = True)
    
# Basic inspection.
nums = hsp_beds.shape
print("The filtered Los Angeles County hospital bed data covers " + str(nums[0]) + " healthcare facilities.")
hsp_beds

The filtered Los Angeles County hospital bed data covers 625 healthcare facilities.


Unnamed: 0,Hospital Name,Number Of Beds,Zip Code,City,Latitude,Longitude
0,ALHAMBRA HOSPITAL MEDICAL CENTER,144,91801,ALHAMBRA,34.08988,-118.144900
1,BHC ALHAMBRA HOSPITAL,97,91770,ROSEMEAD,34.08926,-118.073520
2,ANTELOPE VALLEY HOSPITAL,420,93534,LANCASTER,34.68780,-118.157981
3,CATALINA ISLAND MEDICAL CENTER,12,90704,AVALON,33.33887,-118.333690
4,KINDRED HOSPITAL - BALDWIN PARK,95,91706,BALDWIN PARK,34.06288,-117.967611
...,...,...,...,...,...,...
620,CONTINENTAL CLHF,6,91767,Pomona,34.07846,-117.736700
621,"A & A WELLNESS CONGREGATE, INC.",6,91343,North Hills,34.23844,-118.492040
622,COMFORTMAVENS-TORRANCE,6,90501,Torrance,33.81854,-118.327400
623,"S.F. VALLEY CONGREGATE LIVING, INC.",6,91335,Reseda,34.20390,-118.550230


In [9]:
# Format the city names.
hsp_beds["City"] = hsp_beds["City"].apply(lambda x: x.title())
hsp_beds

Unnamed: 0,Hospital Name,Number Of Beds,Zip Code,City,Latitude,Longitude
0,ALHAMBRA HOSPITAL MEDICAL CENTER,144,91801,Alhambra,34.08988,-118.144900
1,BHC ALHAMBRA HOSPITAL,97,91770,Rosemead,34.08926,-118.073520
2,ANTELOPE VALLEY HOSPITAL,420,93534,Lancaster,34.68780,-118.157981
3,CATALINA ISLAND MEDICAL CENTER,12,90704,Avalon,33.33887,-118.333690
4,KINDRED HOSPITAL - BALDWIN PARK,95,91706,Baldwin Park,34.06288,-117.967611
...,...,...,...,...,...,...
620,CONTINENTAL CLHF,6,91767,Pomona,34.07846,-117.736700
621,"A & A WELLNESS CONGREGATE, INC.",6,91343,North Hills,34.23844,-118.492040
622,COMFORTMAVENS-TORRANCE,6,90501,Torrance,33.81854,-118.327400
623,"S.F. VALLEY CONGREGATE LIVING, INC.",6,91335,Reseda,34.20390,-118.550230


## (2-4) Los Angeles County Coronavirus Case Data
Data provided by the Los Angeles Times.  
https://github.com/datadesk/california-coronavirus-data/blob/master/latimes-place-totals.csv

In [10]:
file = "data/latimes-place-totals.csv"
covid_cases = pd.read_csv(file)

# Retain only Los Angeles County data.
covid_cases = covid_cases.loc[covid_cases["county"] == "Los Angeles"]

# Drop unneeded columns.
covid_cases.drop(["county", "fips", "note", "x", "y", "population"], axis = 1, inplace = True)

# Create a new table of total new cases between 2020 December 15 and 2021 Janurary 15.
new_cases = covid_cases.loc[covid_cases["date"] == "2021-01-15"]
old_cases = covid_cases.loc[covid_cases["date"] == "2020-12-15"]

covid_cases = pd.merge(new_cases, old_cases, on = "place", how = "inner")
covid_cases

Unnamed: 0,date_x,place,confirmed_cases_x,date_y,confirmed_cases_y
0,2021-01-15,Acton,350,2020-12-15,183
1,2021-01-15,Adams-Normandie,931,2020-12-15,525
2,2021-01-15,Agoura Hills,751,2020-12-15,418
3,2021-01-15,Agua Dulce,178,2020-12-15,84
4,2021-01-15,Alhambra,5445,2020-12-15,2727
...,...,...,...,...,...
328,2021-01-15,Wilmington,6405,2020-12-15,3503
329,2021-01-15,Wilshire Center,4247,2020-12-15,2177
330,2021-01-15,Winnetka,5284,2020-12-15,2927
331,2021-01-15,Wiseburn,388,2020-12-15,232


In [11]:
covid_cases["Cases Last 30 Days"] = 0
for index, row in covid_cases.iterrows():
    covid_cases.loc[index, "Cases Last 30 Days"] = row["confirmed_cases_x"] - row["confirmed_cases_y"]

covid_cases.drop(["date_x", "confirmed_cases_x", "date_y", "confirmed_cases_y"], axis = 1, inplace = True)

# Rename a column.
covid_cases.rename(columns = {"place":"City"}, inplace = True)

# Basic inspection.
print("The Los Angeles County COVID-19 case data includes " + str(covid_cases.shape[0]) + \
  " cities and Los Angeles City neighborhoods.")
covid_cases

The Los Angeles County COVID-19 case data includes 333 cities and Los Angeles City neighborhoods.


Unnamed: 0,City,Cases Last 30 Days
0,Acton,167
1,Adams-Normandie,406
2,Agoura Hills,333
3,Agua Dulce,94
4,Alhambra,2718
...,...,...
328,Wilmington,2902
329,Wilshire Center,2070
330,Winnetka,2357
331,Wiseburn,156


Sum together all cases in Los Angeles City neighborhoods into a new data row. (Assume that all entries in the COVID table not in the Wikipedia list of cities is a neighborhood of Los Angeles City.)

In [12]:
la_cases = 0
for index, row in covid_cases.iterrows():
    if row["City"] not in city_list["City"].to_list():
        la_cases += row["Cases Last 30 Days"]
        covid_cases.drop(index, inplace = True)
covid_cases = covid_cases.append({"City":"Los Angeles", 
                    "Cases Last 30 Days":la_cases}, 
                    ignore_index = True)
# Re-sort the data frame by city name.
covid_cases = covid_cases.sort_values("City")

covid_cases.reset_index(drop = True, inplace = True)

covid_cases

Unnamed: 0,City,Cases Last 30 Days
0,Agoura Hills,333
1,Alhambra,2718
2,Arcadia,979
3,Artesia,801
4,Avalon,3
...,...,...
80,Walnut,634
81,West Covina,4472
82,West Hollywood,561
83,Westlake Village,16


In [13]:
covid_cases.loc[covid_cases["City"] == "Los Angeles"]

Unnamed: 0,City,Cases Last 30 Days
46,Los Angeles,228146


Yikes.

## (2-5) Foursquare's Los Angeles County Hospital Location Data
Data provided by Foursquare.  
Category IDs - https://developer.foursquare.com/docs/build-with-foursquare/categories  
Hospital Category ID - 4bf58dd8d48988d196941735  
Hospital Ward Category ID - 58daa1558bbb0b01f18ec1f7  
Emergency Room Category ID - 4bf58dd8d48988d194941735  
Urgent Care Center Category ID - 56aa371be4b08b9a8d573526  

In [14]:
# My account codes.
CLIENT_ID = # REDACTED
CLIENT_SECRET = # REDACTED

# The version of the Foursquare API to be used.
VERSION = "20180605"

# FourSquare category IDs for relevant healthcare facilities.
HOSPITAL_ID = "4bf58dd8d48988d196941735"
WARD_ID = "58daa1558bbb0b01f18ec1f7"
ER_ID = "4bf58dd8d48988d194941735"
URGENT_ID = "56aa371be4b08b9a8d573526"
# Current upper limit.
RANGE = 100000
# Current upper limit.
LIMIT = 50

In [15]:
hospitals = []

for index, row, in zip_loc.iterrows():
    zip_code = row["Zip"]
    city = row["City"]
    lat = row["Latitude"]
    lng = row["Longitude"]
    # Create the API request URL.
    url = \
      "https://api.foursquare.com/v2/venues/" + \
      "explore?categoryId={},{},{},{}&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}"\
      .format(HOSPITAL_ID, \
              WARD_ID, \
              ER_ID, \
              URGENT_ID, \
              CLIENT_ID, \
              CLIENT_SECRET, \
              VERSION, \
              lat, \
              lng, \
              RANGE, \
              LIMIT)

    # Make the <GET> request.
    results = requests.get(url).json()["response"]["groups"][0]["items"]
    
    for h in results:
        hospital = []
        if "postalCode" in h["venue"]["location"]:
            zc = h["venue"]["location"]["postalCode"]
        else:
            zc = np.NaN
        if "city" in h["venue"]["location"]:
            ct = h["venue"]["location"]["city"]
        else:
            ct = np.NaN

        # Return relevant info for each hospital.
        hospital.append(h["venue"]["name"])
        hospital.append(zc)
        hospital.append(ct)
        hospital.append(h["venue"]["location"]["lat"])
        hospital.append(h["venue"]["location"]["lng"])
        hospital.append(h["venue"]["categories"][0]["name"])
        hospitals.append(hospital)

la_hsp = pd.DataFrame(hospitals)
la_hsp.columns = ["Hospital Name", \
  "Zip Code", \
  "City", \
  "Latitude", \
  "Longitude", \
  "Medical Center Type"]
la_hsp.drop_duplicates(inplace = True)
la_hsp.reset_index(drop = True, inplace = True)

In [16]:
# Save this huge table for future use.
la_hsp.to_csv("data/foursquare_la_county_hospitals.csv", index = False)

In [17]:
# R E L O A D
la_hsp = pd.read_csv("data/foursquare_la_county_hospitals.csv")

In [18]:
la_hsp

Unnamed: 0,Hospital Name,Zip Code,City,Latitude,Longitude,Medical Center Type
0,Long Beach Memorial Medical Center,90806,Long Beach,33.808181,-118.186854,Hospital
1,Hoag Hospital Newport Beach,92663,Newport Beach,33.624743,-117.930047,Hospital
2,UCI Medical Center,92868,Orange,33.787007,-117.888799,Hospital
3,Torrance Memorial Specialty Center,90505,Torrance,33.810385,-118.337228,Hospital
4,VA Long Beach Healthcare System,90822,Long Beach,33.776672,-118.118900,Medical Center
...,...,...,...,...,...,...
204,Providence Tarzana Medical Center,91356,Tarzana,34.170406,-118.532230,Hospital
205,Cottage Animal Hosiptal,93030,Oxnard,34.196666,-119.169181,Hospital
206,Community Memorial Hospital,93003,Ventura,34.274580,-119.258120,Hospital
207,VCMC,93003,Ventura,34.276742,-119.252109,Hospital


Only 209 facilities compared to the 625 in the earlier table from official government sources.

In [19]:
# Examine which FourSquare data points have a missing zip code or city.
la_hsp.loc[(la_hsp["Zip Code"].isna()) | (la_hsp["City"].isna())]

Unnamed: 0,Hospital Name,Zip Code,City,Latitude,Longitude,Medical Center Type
38,Harbor UCLA N-24,,Torrance,33.830958,-118.296194,Hospital
39,Torrance Memorial ER Ambulance Entrance,,Torrance,33.81153,-118.343641,Hospital
84,Methodist Hospital of Southern California - Su...,,Arcadia,34.135133,-118.042098,Hospital
88,Brawerman,,Bradbury,34.129375,-117.971046,Hospital
95,Gardens Regional (formally known as Tri-City R...,,Hawaiian Gardens,33.833684,-118.080836,Hospital
100,Lakewood Regional Medical Center,,Long Beach,33.859926,-118.149012,Hospital
101,Kaiser-Urgent Care Downey,,Downey,33.918744,-118.125864,Urgent Care Center
111,Kaiser Gastroenterology,,Baldwin Park,34.058997,-117.988506,Hospital
115,Kaiser Permanente Hospital,,Downey,33.919233,-118.129558,Hospital
120,Cedars Sinai-Emergency Room,,Beverly Hills,34.075599,-118.381013,Emergency Room


In [20]:
# Find the closest zip code to the LAC-USC Medical Center's Clinic Tower Pharmacy.

# Create a function to process this.
def find_loc(df, loc_df):
    indices = []
    for target_i, target_row in df.loc[(df["Zip Code"].isna()) | (df["City"].isna())].iterrows():
        latitude = df.at[target_i, "Latitude"]
        longitude = df.at[target_i, "Longitude"]
        index = -1
        dist = 100
        for i, row in loc_df.iterrows():
            lat = row["Latitude"]
            lng = row["Longitude"]
            d = ((latitude - lat)**2 + (longitude - lng)**2)**0.5
            if dist > d:
                dist = d
                index = i
        df.loc[target_i, "Zip Code"] = loc_df.at[index, "Zip Code"]
        df.loc[target_i, "City"] = loc_df.at[index, "City"]
        indices.append(target_i)
    return indices

fixed_indices = find_loc(la_hsp, zip_loc)

In [21]:
# Just to make sure....
la_hsp.iloc[fixed_indices]

Unnamed: 0,Hospital Name,Zip Code,City,Latitude,Longitude,Medical Center Type
38,Harbor UCLA N-24,90502,Torrance,33.830958,-118.296194,Hospital
39,Torrance Memorial ER Ambulance Entrance,90505,Torrance,33.81153,-118.343641,Hospital
84,Methodist Hospital of Southern California - Su...,91007,Arcadia,34.135133,-118.042098,Hospital
88,Brawerman,91010,Duarte,34.129375,-117.971046,Hospital
95,Gardens Regional (formally known as Tri-City R...,90715,Lakewood,33.833684,-118.080836,Hospital
100,Lakewood Regional Medical Center,90712,Lakewood,33.859926,-118.149012,Hospital
101,Kaiser-Urgent Care Downey,90242,Downey,33.918744,-118.125864,Urgent Care Center
111,Kaiser Gastroenterology,91746,La Puente,34.058997,-117.988506,Hospital
115,Kaiser Permanente Hospital,90242,Downey,33.919233,-118.129558,Hospital
120,Cedars Sinai-Emergency Room,90048,Los Angeles,34.075599,-118.381013,Emergency Room


# (3) Methodology

`Nominatim` is a geocoding library. It can find locations by name or address, and vice versa.

In [22]:
try:
    from geopy.geocoders import Nominatim
except:
    !pip install geopy
    from geopy.geocoders import Nominatim

print("Library imported.")

Library imported.


The `folium` library can create interactive maps.

In [23]:
try:
    import folium
except:
    !pip install folium
    import folium

print("Library imported.")

Library imported.


## (3-1) Foursquare LAC Healthcare Center Analysis

Get the geographic coordinates of Los Angeles, in order to center maps.

In [24]:
address = "Los Angeles, California"
geo = Nominatim(user_agent = "capstone")
location = geo.geocode(address)
la_lat = location.latitude
la_lng = location.longitude
print("The geograpical coordinates of Los Angeles are ({}, {})."\
  .format(la_lat, la_lng))

The geograpical coordinates of Los Angeles are (34.0536909, -118.242766).


Visualize the locations of the healthcare centers in Los Angeles County in Foursquare's database.

In [25]:
# Create an interactive Folium map of Toronto.
fs_map = folium.Map(location = [la_lat, la_lng], zoom_start = 9)

# Add markers to the map to indicate each postal code in the city.
for index, row in la_hsp.iterrows():
    label = "{}, {}, {} ({})".format(row["Hospital Name"], row["Zip Code"], row["City"], row["Medical Center Type"])
    label = folium.Popup(label, parse_html = True)
    folium.CircleMarker(
      [row["Latitude"], row["Longitude"]],
      radius = 2,
      popup = label,
      color = "red",
      fill = True,
      fill_color = "red",
      fill_opacity = 0.8,
      parse_html = False).add_to(fs_map)

fs_map
# See "folium/map1.png".

![](folium/map1.png)

Create a map with differently-colored labels for each medical center type.

In [26]:
fs_clstr = folium.Map(location = [la_lat, la_lng], zoom_start = 9)

# Create a color scheme for the clusters.
types = la_hsp["Medical Center Type"].unique().tolist()
from matplotlib import cm, colors
x = np.arange(len(types))
ys = [i + x + (i*x)**2 for i in range(len(types))]
clrs = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(c) for c in clrs]

# Add markers to the map.
marker_colors = []
for index, row in la_hsp.iterrows():
    # Put the corresponding colors onto the labels.
    label = "{}, {}, {} ({})".format(row["Hospital Name"], row["Zip Code"], row["City"], row["Medical Center Type"])
    label = folium.Popup(label, parse_html = True)
    folium.CircleMarker(
      [row["Latitude"], row["Longitude"]],
      radius = 2,
      popup = label,
      color = rainbow[types.index(row["Medical Center Type"])],
      fill = True,
      fill_color = rainbow[types.index(row["Medical Center Type"])],
      fill_opacity = 0.8,
      parse_html = False).add_to(fs_clstr)

fs_clstr
# See "folium/map2.png".

![](folium/map2.png)

Most facilities are labelled as simply, "Hospitals" (purple). A handful of "Medical Centers" (blue), "Emergency Rooms" (light green), and "Urgent Care Centers" (turquiose) are scattered primarily in the center of the map. There is one instance each of "Hospital Ward" (red) and "Office" (orange).  
___
The bulk of the medical centers in the Foursquare data reside in the central, urban area of the county. The outer regions of Los Angeles County may benefit from increased medical staffing.

Find the top 10 zip codes whose coordinates are furthest away form the nearest medical center, as found by Foursquare.

In [27]:
zip_loc["Distance To Nearest Medical Center"] = 0.0
zip_loc

Unnamed: 0,Zip Code,City,Latitude,Longitude,Distance To Nearest Medical Center
0,90001,Los Angeles,33.972914,-118.248780,0.0
1,90002,Los Angeles,33.948315,-118.248450,0.0
2,90003,Los Angeles,33.962714,-118.276000,0.0
3,90004,Los Angeles,34.077110,-118.307550,0.0
4,90005,Los Angeles,34.058911,-118.308480,0.0
...,...,...,...,...,...
399,93584,Lancaster,33.786594,-118.298662,0.0
400,93586,Lancaster,33.786594,-118.298662,0.0
401,93590,Palmdale,33.786594,-118.298662,0.0
402,93591,Palmdale,34.596742,-117.844670,0.0


In [28]:
for i, zc in zip_loc.iterrows():
    dist = 0
    zc_lat = zc["Latitude"]
    zc_lng = zc["Longitude"]
    for j, hsp in la_hsp.iterrows():
        hsp_lat = hsp["Latitude"]
        hsp_lng = hsp["Longitude"]
        d = ((zc_lat - hsp_lat)**2 + (zc_lng - hsp_lng)**2)**0.5
        if d > dist:
            dist = d
    zip_loc.loc[i, "Distance To Nearest Medical Center"] = dist

zip_loc.sort_values(by = "Distance To Nearest Medical Center", ascending = False, inplace = True)

zip_loc[["Zip Code", "Distance To Nearest Medical Center"]].head(10)

Unnamed: 0,Zip Code,Distance To Nearest Medical Center
318,91359,2.009561
319,91361,1.768385
208,90704,1.761639
134,90265,1.696156
313,91301,1.676376
394,93536,1.607227
132,90263,1.595294
314,91302,1.583451
344,91711,1.547168
365,91767,1.5319


Note that the distance unit is \[latitude/longitude\] degrees.

## (3-2) Los Angeles County COVID Data Analysis

Create a choropleth map of COVID cases per hospital bed per city in Los Angeles County.

In [29]:
city_beds = hsp_beds[["City", "Number Of Beds"]]

# Convert the bed numbers to <int>s. Remove thousand commas.
city_beds["Number Of Beds"] = city_beds["Number Of Beds"].str.replace(",","").astype(int)

city_beds = city_beds.groupby(["City"])["Number Of Beds"].sum().reset_index()
city_beds

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  after removing the cwd from sys.path.


Unnamed: 0,City,Number Of Beds
0,Agoura Hills,20
1,Alhambra,482
2,Arcadia,570
3,Artesia,372
4,Avalon,12
...,...,...
94,West Hills,388
95,Whittier,1228
96,Wilmington,6
97,Winnetka,112


In [30]:
cases_per_bed = pd.merge(covid_cases, city_beds, how = "inner", on = "City")

cases_per_bed["Cases/Bed"] = cases_per_bed["Cases Last 30 Days"] / cases_per_bed["Number Of Beds"]
cases_per_bed.sort_values(by = "Cases/Bed", ascending = False, inplace = True)
cases_per_bed.head(15)

Unnamed: 0,City,Cases Last 30 Days,Number Of Beds,Cases/Bed
52,South Gate,6916,99,69.858586
14,Compton,6174,99,62.363636
11,Calabasas,336,6,56.0
40,Palmdale,8468,214,39.570093
5,Azusa,2194,65,33.753846
7,Bell,2502,99,25.272727
48,San Marino,141,6,23.5
24,Huntington Park,4214,180,23.411111
31,Lawndale,1284,59,21.762712
28,La Verne,1003,59,17.0


In [31]:
cases_per_bed = pd.merge(left = zip_loc, right = cases_per_bed, how = "inner", on = "City")
cases_per_bed["Zip Code"] = cases_per_bed["Zip Code"].astype(str)

In [32]:
# Create a map legend with 5 bars.
threshold_scale = np.linspace(cases_per_bed["Cases/Bed"].min(),
                              cases_per_bed["Cases/Bed"].max(),
                              5, 
                              dtype = float)
threshold_scale = threshold_scale.tolist()
threshold_scale[-1] = threshold_scale[-1] + 1

GeoJSON zip code boundary data provided by the Los Angeles Times.  
http://boundaries.latimes.com/sets/

In [33]:
# M A P
cases_beds_map = folium.Map(location = [la_lat, la_lng], zoom_start = 9)
choropleth = folium.Choropleth(\
  geo_data = r"data/zip-code-tabulation-areas-2012.geojson", \
  data = cases_per_bed, \
  columns = ["Zip Code", "Cases/Bed"], \
  key_on = "feature.properties.name", \
  threshold_scale = threshold_scale, \
  fill_color = "YlOrRd", \
  fill_opacity = 0.8, \
  line_opacity = 0.2, \
  legend_name = "COVID Cases Per Hospital Bed", \
  highlight = True, \
  nan_fill_color = "white", \
  nan_fill_opacity = 1.0).add_to(cases_beds_map)

choropleth.geojson.add_child(
  folium.features.GeoJsonTooltip(["name"], style = "font-size: 16px", labels = False))

cases_beds_map
# See "folium/map3.png".

![](folium/map3.png)

Zip codes 90220-90223, 90280, and 91302 have the most COVID cases per hospital bed. These correspond to the cities of Compton, South Gate, and Calabasas.

Now create a choropleth map of population per hospital bed by city in Los Angeles County.

In [34]:
pop_per_bed = pd.merge(left = city_list, right = city_beds, how = "inner", on = "City")

pop_per_bed["People Per Bed"] = pop_per_bed["2010 Population"] / pop_per_bed["Number Of Beds"]
pop_per_bed.sort_values(by = "People Per Bed", ascending = False, inplace = True)
pop_per_bed.head(15)

Unnamed: 0,City,2010 Population,Number Of Beds,People Per Bed
11,Calabasas,23058,6,3843.0
50,San Marino,13147,6,2191.166667
46,Rancho Palos Verdes,41643,34,1224.794118
0,Agoura Hills,20330,20,1016.5
14,Compton,96455,99,974.292929
54,South Gate,94396,99,953.494949
41,Palmdale,152750,214,713.785047
5,Azusa,46361,65,713.246154
31,Lawndale,32769,59,555.40678
28,La Verne,31063,59,526.491525


In [35]:
pop_per_bed = pd.merge(left = zip_loc, right = pop_per_bed, how = "inner", on = "City")
pop_per_bed["Zip Code"] = pop_per_bed["Zip Code"].astype(str)

In [36]:
threshold_scale = np.linspace(pop_per_bed["People Per Bed"].min(),
                              pop_per_bed["People Per Bed"].max(),
                              8, 
                              dtype = float)
threshold_scale = threshold_scale.tolist()
threshold_scale[-1] = threshold_scale[-1] + 1

# M A P
pop_beds_map = folium.Map(location = [la_lat, la_lng], zoom_start = 9)
choropleth = folium.Choropleth(\
  geo_data = r"data/zip-code-tabulation-areas-2012.geojson", \
  data = pop_per_bed, \
  columns = ["Zip Code", "People Per Bed"], \
  key_on = "feature.properties.name", \
  threshold_scale = threshold_scale, \
  fill_color = "YlOrRd", \
  fill_opacity = 0.8, \
  line_opacity = 0.2, \
  legend_name = "People Per Hospital Bed", \
  highlight = True, \
  nan_fill_color = "white", \
  nan_fill_opacity = 1.0).add_to(pop_beds_map)

choropleth.geojson.add_child(
  folium.features.GeoJsonTooltip(["name"], style = "font-size: 16px", labels = False))

pop_beds_map
# See "folium/map4.png".

![](folium/map4.png)

Zip codes 91302, 91108, and 90275 have the highest ratios of people in the general population for each hospital bed. These correspond to the cities of Calabasas (again), San Marino, and Rancho Palos Verdes.