# Coursera IBM Data Science Capstone Project
# Hospital Location Recommendation in Toronto


This notebook is the final capstone project for the nine-course IBM Data Science certification on Coursera.

For this project, I will be using data analysis techniques to recommend the best location for a new hospital in Toronto, Canada based on preexisting hospitals and the needs of the community. I will consider population and income data to build a case for the community that would most benefit from new healthcare providers. 

## Initial Imports

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import scipy
from sklearn import preprocessing
import requests
from bs4 import BeautifulSoup as bs
from IPython.display import display, HTML
import csv
import googlemaps
import os
from geopy import geocoders
from geopy.geocoders import GoogleV3
from datetime import datetime
from geopy.geocoders import Nominatim
import folium
import json

print("Modules imported successfully.")

Modules imported successfully.


## Data Wrangling

This project draws on data from several sources:
- Hospitals in Toronto data from Wikipedia (https://en.wikipedia.org/wiki/List_of_hospitals_in_Toronto)
- Data about hospitals and other health-oriented venues from the Foursquare API
- Health data from government sources, used to relate income and community to health outcomes

In [2]:
# Collecting data about the hospitals in Toronto


page = requests.get("https://en.wikipedia.org/wiki/List_of_hospitals_in_Toronto")
soup = bs(page.text, 'lxml')
table = soup.find("table", class_ = "wikitable")

A = []
B = []


for row in table.findAll("tr"):
    cells = row.findAll("td")
    if len(cells) == 7: # Isolates rows from the body
        A.append(cells[0].find(text=True))
        B.append(cells[2].find(text=True))
        
hospitals = pd.DataFrame(A, columns = ['Name'])
hospitals['District'] = B

hospitals.drop_duplicates(subset ="Name", 
                     keep = False, inplace = True)
hospitals = hospitals.reset_index(drop = True)

hospitals.to_csv("hospitals.csv")
print(hospitals.shape)
display(hospitals)

(28, 2)


Unnamed: 0,Name,District
0,Baycrest Health Sciences,North York
1,Bellwood Health Services,Leaside
2,Bridgepoint Active Healthcare,Old Toronto
3,Casey House,Old Toronto
4,Centric Health Surgical Centre Toronto,North York
5,Etobicoke General Hospital,Etobicoke
6,Holland Bloorview Kids Rehabilitation Hospital,North York
7,Hospital for Sick Children,Old Toronto
8,Humber River Hospital,North York
9,Michael Garron Hospital,East York


In [3]:
# Loading the street address of each hospital (coordinates are added in the next section)

addresses = pd.read_csv("hospitaladdresses.csv", delimiter = ',')

print(addresses.shape)
display(addresses)

(28, 4)


Unnamed: 0.1,Unnamed: 0,Name,District,Address
0,0,Baycrest Health Sciences,North York,"3560 Bathurst St, North York, ON M6A 2"
1,1,Bellwood Health Services,Leaside,"175 Brentcliffe Rd, Toronto, ON M4G 0C5, Canada"
2,2,Bridgepoint Active Healthcare,Old Toronto,"1 Bridgepoint Dr, Toronto, ON M4M 2B5, Canada"
3,3,Casey House,Old Toronto,"119 Isabella St, Toronto, ON M4Y 1P2, Canada"
4,4,Centric Health Surgical Centre Toronto,North York,"20 Wynford Dr Suite 103, North York, ON M3C 1J..."
5,5,Etobicoke General Hospital,Etobicoke,"101 Humber College Blvd, Etobicoke, ON M9V 1R8..."
6,6,Holland Bloorview Kids Rehabilitation Hospital,North York,"150 Kilgour Rd, East York, ON M4G 1R8, Canada"
7,7,Hospital for Sick Children,Old Toronto,"555 University Ave, Toronto, ON M5G 1X8, Canada"
8,8,Humber River Hospital,North York,"1235 Wilson Ave, Toronto, ON M3M 0B2, Canada"
9,9,Michael Garron Hospital,East York,"825 Coxwell Ave, East York, ON M4C 3E7, Canada"


## Mapping Hospitals in Toronto

We will use the **Google Maps API** to add coordinate data to each of the hospital locations. Then using **Folium** we can plot these on a map of Toronto to visualize the hospitals. 


In [4]:
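# @hidden_cell
# Read the key from the environment instead of hardcoding it in the notebook.
# GOOGLE_MAPS_API_KEY is an assumed variable name; set it before running this cell.
KEY = os.environ["GOOGLE_MAPS_API_KEY"]
gmaps = googlemaps.Client(key = KEY)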

In [5]:
all_names = addresses['Name']
all_addresses = addresses['Address']

In [7]:
lats = []
lngs = []

for i in all_addresses:
    geocode_result = gmaps.geocode(i)[0]
    lat = geocode_result['geometry']['location']['lat']
    lats.append(lat)
    lng = geocode_result['geometry']['location']['lng']
    lngs.append(lng)

In [8]:
addresses['Latitude'] = lats
addresses['Longitude'] = lngs

display(addresses)

Unnamed: 0.1,Unnamed: 0,Name,District,Address,Latitude,Longitude
0,0,Baycrest Health Sciences,North York,"3560 Bathurst St, North York, ON M6A 2",43.730007,-79.434238
1,1,Bellwood Health Services,Leaside,"175 Brentcliffe Rd, Toronto, ON M4G 0C5, Canada",43.719576,-79.366223
2,2,Bridgepoint Active Healthcare,Old Toronto,"1 Bridgepoint Dr, Toronto, ON M4M 2B5, Canada",43.666199,-79.355365
3,3,Casey House,Old Toronto,"119 Isabella St, Toronto, ON M4Y 1P2, Canada",43.668828,-79.37891
4,4,Centric Health Surgical Centre Toronto,North York,"20 Wynford Dr Suite 103, North York, ON M3C 1J...",43.723906,-79.336258
5,5,Etobicoke General Hospital,Etobicoke,"101 Humber College Blvd, Etobicoke, ON M9V 1R8...",43.730884,-79.599387
6,6,Holland Bloorview Kids Rehabilitation Hospital,North York,"150 Kilgour Rd, East York, ON M4G 1R8, Canada",43.718295,-79.374274
7,7,Hospital for Sick Children,Old Toronto,"555 University Ave, Toronto, ON M5G 1X8, Canada",43.657096,-79.387734
8,8,Humber River Hospital,North York,"1235 Wilson Ave, Toronto, ON M3M 0B2, Canada",43.723912,-79.489639
9,9,Michael Garron Hospital,East York,"825 Coxwell Ave, East York, ON M4C 3E7, Canada",43.689871,-79.324858
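
The geocoding loop above indexes `[0]` on every result, so a single address with no match would stop the run with an `IndexError`. Below is a minimal sketch of a more forgiving loop. It assumes the geocoder returns an empty list when nothing is found, which I believe is how `gmaps.geocode` behaves. The names `geocode_all` and `fake_geocode` are hypothetical helpers written for this illustration, and `fake_geocode` only stands in for the real client so the example runs without an API key.

```python
from typing import Callable, List, Optional, Tuple

Coordinate = Tuple[Optional[float], Optional[float]]


def geocode_all(addresses: List[str],
                geocode: Callable[[str], list]) -> List[Coordinate]:
    """Return one (lat, lng) pair per address, or (None, None) if no match."""
    coords = []
    for address in addresses:
        results = geocode(address)
        if not results:
            # Keep the row aligned with its address instead of crashing.
            coords.append((None, None))
            continue
        location = results[0]["geometry"]["location"]
        coords.append((location["lat"], location["lng"]))
    return coords


def fake_geocode(address: str) -> list:
    """Hypothetical stand-in for gmaps.geocode, returning the same shape."""
    known = {
        "555 University Ave, Toronto, ON M5G 1X8, Canada": [
            {"geometry": {"location": {"lat": 43.657096, "lng": -79.387734}}}
        ]
    }
    return known.get(address, [])


coords = geocode_all(
    ["555 University Ave, Toronto, ON M5G 1X8, Canada", "not a real address"],
    fake_geocode,
)
print(coords)
```

With the real client, the call would be `geocode_all(list(all_addresses), gmaps.geocode)`, and any `(None, None)` rows could be inspected by hand before mapping.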


Now we can map these hospitals using **Folium**. Note that the coordinates for the center of Toronto are 43.7170226, -79.4197830350134.

In [51]:
# Toronto latitude and longitude
toronto_lat = 43.7170226
toronto_lng = -79.4197830350134


# Generating a map of Toronto
map_toronto = folium.Map(location=[toronto_lat, toronto_lng], zoom_start = 10)

# Adding a marker for each hospital
for latitude, longitude, name in zip(addresses['Latitude'], addresses['Longitude'], addresses['Name']):
    label = "{}".format(name)
    label = folium.Popup(label, parse_html=True)
    # folium.Marker has no radius option, so only the popup and icon are set
    folium.Marker(
        [latitude, longitude],
        popup = label,
        icon=folium.Icon(color = "blue", icon='header')).add_to(map_toronto)
    
map_toronto

## Population Data

Now I can create a choropleth map that shades each region of Toronto by its population. This map can be overlaid with the hospitals map to highlight any relationship between hospital locations and the population of the surrounding area.

First, I will import population data. This was obtained from the [Statistics Canada website](https://www12.statcan.gc.ca/census-recensement/2016/dp-pd/hlt-fst/pd-pl/comprehensive.cfm), which uses census data. 

In [59]:
df_pop = pd.read_csv("populationdata.CSV")
df_pop = df_pop[['Geographic code', 'Province or territory', 'Population, 2016']]
df_pop = df_pop[(df_pop['Province or territory'] == "Ontario")]

In [53]:
geography = 'toronto_crs.geojson'
# Map.choropleth is deprecated in newer folium releases; folium.Choropleth takes the same arguments
folium.Choropleth(geo_data = geography,
                data = df_pop,
                columns = ['Geographic code','Population, 2016'],
                key_on='feature.properties.CFSAUID',
                fill_color='BuGn',
                fill_opacity=0.8,
                line_opacity=0.3,
                legend_name="Population").add_to(map_toronto)
map_toronto
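
The choropleth joins each GeoJSON feature to a population row by comparing the feature's `CFSAUID` property with the `Geographic code` column. Regions whose code has no matching row cannot be shaded by population, so it is worth checking the overlap before reading the map. The sketch below shows one way to do that with plain Python. The tiny GeoJSON document and the set of codes are made-up stand-ins, and `unmatched_keys` is a hypothetical helper, not part of folium.

```python
import json


def unmatched_keys(geojson: dict, data_keys: set, prop: str = "CFSAUID"):
    """Compare GeoJSON feature IDs with the keys available in the data table.

    Returns (features without a data row, data rows without a feature).
    """
    feature_keys = {f["properties"][prop] for f in geojson["features"]}
    return sorted(feature_keys - data_keys), sorted(data_keys - feature_keys)


# Toy stand-in for toronto_crs.geojson (geometry omitted for brevity).
toy_geojson = json.loads("""
{"type": "FeatureCollection", "features": [
  {"type": "Feature", "properties": {"CFSAUID": "M1B"}, "geometry": null},
  {"type": "Feature", "properties": {"CFSAUID": "M5G"}, "geometry": null}
]}
""")

# Toy stand-in for set(df_pop['Geographic code']).
toy_codes = {"M5G", "K1A"}

no_data, no_shape = unmatched_keys(toy_geojson, toy_codes)
print("Features with no population row:", no_data)
print("Population rows with no feature:", no_shape)
```

Because `df_pop` is filtered to all of Ontario, I would expect many rows with no Toronto feature. The first list is the one that matters for the map.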

The map suggests that the most populous regions lie far from the area where hospitals are most densely clustered.
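
This observation is visual, so a rough way to put a number on "far" is the great-circle distance from a point to its nearest hospital. The sketch below uses the haversine formula with three hospital coordinates taken from the geocoded table above. The query point is the Toronto center used for the map, chosen only as an example; `haversine_km` and `nearest_hospital` are hypothetical helper names.

```python
import math


def haversine_km(lat1, lng1, lat2, lng2):
    """Great-circle distance in kilometres between two (lat, lng) points."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlambda = math.radians(lng2 - lng1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


# Three rows copied from the geocoded hospital table.
hospital_coords = {
    "Baycrest Health Sciences": (43.730007, -79.434238),
    "Hospital for Sick Children": (43.657096, -79.387734),
    "Etobicoke General Hospital": (43.730884, -79.599387),
}


def nearest_hospital(lat, lng, hospitals):
    """Return (name, distance_km) of the closest hospital to the point."""
    name = min(hospitals,
               key=lambda n: haversine_km(lat, lng, *hospitals[n]))
    return name, haversine_km(lat, lng, *hospitals[name])


name, dist = nearest_hospital(43.7170226, -79.4197830350134, hospital_coords)
print(name, round(dist, 2), "km")
```

Running the same function over the centroid of each populous region would turn the visual impression into a ranked list of distances.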

## Foursquare Venues

Now, I'd like to see whether there is any relationship between population, hospitals, and health-related venues. I will look at the following Foursquare venue categories:
- Pharmacy
- Dance Studio
- Gym
- Gym / Fitness Center
- Spa
- Tennis Court
- Yoga Studio

In [47]:
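# Read the Foursquare credentials from the environment instead of hardcoding them.
# FOURSQUARE_CLIENT_ID and FOURSQUARE_CLIENT_SECRET are assumed variable names.
CLIENT_ID = os.environ["FOURSQUARE_CLIENT_ID"]
CLIENT_SECRET = os.environ["FOURSQUARE_CLIENT_SECRET"]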
VERSION = "20190608"
venue = "4bf58dd8d48988d196941735"

In [116]:
def getNearbyVenues(names, latitudes, longitudes, radius = 1000):
    """Return a DataFrame of Foursquare venues within `radius` metres of each location."""
    
    venues_list = []
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # build the API request; requests encodes the query string for us
        url = "https://api.foursquare.com/v2/venues/explore"
        params = {
            "client_id": CLIENT_ID,
            "client_secret": CLIENT_SECRET,
            "v": VERSION,
            "ll": "{},{}".format(lat, lng),
            "radius": radius}
            
        # make the GET request and stop on an HTTP error
        response = requests.get(url, params = params)
        response.raise_for_status()
        results = response.json()["response"]["groups"][0]["items"]
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v["venue"]["name"], 
            v["venue"]["location"]["lat"], 
            v["venue"]["location"]["lng"],  
            v["venue"]['categories'][0]["name"]) for v in results])

    # flatten the per-location lists into one table
    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ["Neighborhood", 
                  "Neighborhood Latitude", 
                  "Neighborhood Longitude", 
                  "Venue", 
                  "Venue Latitude", 
                  "Venue Longitude", 
                  "Venue Category"]
    
    return nearby_venues
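
The list comprehension in `getNearbyVenues` reads `categories[0]` for every venue, which would raise an `IndexError` if a venue came back with an empty category list. I have not verified that the API ever does this, so the sketch below is a precaution rather than a known fix. The sample items are made up to mirror the response shape used above, and `extract_venues` is a hypothetical helper.

```python
def extract_venues(items):
    """Pull (name, lat, lng, category) out of Foursquare-style items.

    Items without any category are skipped instead of raising an error.
    """
    rows = []
    for item in items:
        venue = item.get("venue", {})
        categories = venue.get("categories") or []
        if not categories:
            continue
        location = venue.get("location", {})
        rows.append((
            venue.get("name"),
            location.get("lat"),
            location.get("lng"),
            categories[0]["name"],
        ))
    return rows


# Made-up items shaped like response["groups"][0]["items"].
sample_items = [
    {"venue": {"name": "Example Pharmacy",
               "location": {"lat": 43.70, "lng": -79.40},
               "categories": [{"name": "Pharmacy"}]}},
    {"venue": {"name": "Uncategorized Place",
               "location": {"lat": 43.71, "lng": -79.41},
               "categories": []}},
]

rows = extract_venues(sample_items)
print(rows)
```

Skipping uncategorized venues loses nothing for this analysis, since the later filtering step keeps only venues with a known health-related category.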

In [117]:
csv_url = "http://cocl.us/Geospatial_data"
coordinates = pd.read_csv(csv_url)

print(coordinates.shape)

(103, 3)


In [122]:
all_venues = getNearbyVenues(names = coordinates['Postal Code'],
                            latitudes = coordinates['Latitude'],
                            longitudes = coordinates['Longitude'])

M1B
M1C
M1E
M1G
M1H
M1J
M1K
M1L
M1M
M1N
M1P
M1R
M1S
M1T
M1V
M1W
M1X
M2H
M2J
M2K
M2L
M2M
M2N
M2P
M2R
M3A
M3B
M3C
M3H
M3J
M3K
M3L
M3M
M3N
M4A
M4B
M4C
M4E
M4G
M4H
M4J
M4K
M4L
M4M
M4N
M4P
M4R
M4S
M4T
M4V
M4W
M4X
M4Y
M5A
M5B
M5C
M5E
M5G
M5H
M5J
M5K
M5L
M5M
M5N
M5P
M5R
M5S
M5T
M5V
M5W
M5X
M6A
M6B
M6C
M6E
M6G
M6H
M6J
M6K
M6L
M6M
M6N
M6P
M6R
M6S
M7A
M7R
M7Y
M8V
M8W
M8X
M8Y
M8Z
M9A
M9B
M9C
M9L
M9M
M9N
M9P
M9R
M9V
M9W


In [123]:
print(all_venues.shape)

(2451, 7)


In [134]:
pharmacy = all_venues[all_venues['Venue Category'] == "Pharmacy"]
dancestudio = all_venues[all_venues['Venue Category'] == "Dance Studio"]
gym = all_venues[all_venues['Venue Category'] == "Gym"]
fitness = all_venues[all_venues['Venue Category'] == "Gym / Fitness Center"]
spa = all_venues[all_venues['Venue Category'] == "Spa"]
tennis = all_venues[all_venues['Venue Category'] == "Tennis Court"]
yoga = all_venues[all_venues['Venue Category'] == "Yoga Studio"]


venues = [pharmacy, dancestudio, gym, fitness, spa, tennis, yoga]
health_venues = pd.concat(venues)

display(health_venues.shape)

(135, 7)

Now we can add these venues to a new map.

In [140]:
health_map = folium.Map(location=[43.7170226, -79.4197830350134], zoom_start = 10)

# Population layer (folium.Choropleth replaces the deprecated Map.choropleth)
folium.Choropleth(geo_data = geography,
            data = df_pop,
            columns = ['Geographic code','Population, 2016'],
            key_on='feature.properties.CFSAUID',
            fill_color='BuGn',
            fill_opacity=0.8,
            line_opacity=0.3,
            legend_name="Population").add_to(health_map)
    
# Yellow circles for the health-related venues
for latitude, longitude, name in zip(health_venues['Venue Latitude'], health_venues['Venue Longitude'], health_venues['Venue']):
    label = "{}".format(name)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [latitude, longitude],
        radius = 5, 
        popup = label,
        color = 'yellow',
        fill = True,
        fill_color = 'yellow',
        fill_opacity = 0.7).add_to(health_map)
    
# Blue markers for the hospitals
for latitude, longitude, name in zip(addresses['Latitude'], addresses['Longitude'], addresses['Name']):
    label = "{}".format(name)
    label = folium.Popup(label, parse_html=True)
    folium.Marker(
        [latitude, longitude],
        popup = label,
        icon=folium.Icon(color = "blue", icon='header')).add_to(health_map)
health_map

## Results

From the final map, there appears to be no clear relationship between population and hospital placement in the city of Toronto. Health-related venues such as gyms and pharmacies, however, appear to be spread fairly evenly across the city. 

With that in mind, it seems like the best location for a new hospital would be in the north or north-east portions of the city. In particular, North York would be a good place for a hospital because, among the most densely populated regions, it is underserved in both hospitals and health-related venues.
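
One way to back up a claim like "underserved" with a number is to compute residents per facility for each region and rank the regions by that ratio. The sketch below shows the idea only. The region labels and counts are placeholder values, not figures from the census or Foursquare data, and `residents_per_facility` is a hypothetical helper.

```python
def residents_per_facility(population, facility_counts):
    """Rank regions from most to least underserved.

    A region with no facilities gets an infinite ratio so it sorts first.
    """
    ratios = {}
    for region, residents in population.items():
        count = facility_counts.get(region, 0)
        ratios[region] = residents / count if count else float("inf")
    return sorted(ratios.items(), key=lambda pair: pair[1], reverse=True)


# Placeholder numbers for illustration only.
toy_population = {"A": 30000, "B": 12000, "C": 20000}
toy_facilities = {"A": 2, "B": 4}  # region C has no facilities at all

ranking = residents_per_facility(toy_population, toy_facilities)
for region, ratio in ranking:
    print(region, ratio)
```

In the notebook, the population would come from `df_pop` and the facility counts from grouping `health_venues` and the hospital table by region.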