<h1 align=center>A method for determining the location of a new hospital in the Greater Toronto Area</h1>

## Introduction

As the population of the Greater Toronto Area (GTA) grows, existing health centers are increasingly operating beyond their capacity. Regional authorities and the Ontario Ministry of Health are considering the construction of a new hospital in the region.
In addition to the available space, the authorities need to take population density into account when siting this new infrastructure.

This study uses geographic and demographic information about the Greater Toronto Area to identify a suitable place to build the new hospital.

## Table of Contents

<div class="alert alert-block alert-info" style="margin-top: 20px">

<font size = 3>

1. <a href="#item1">Project and data description</a>

2. <a href="#item2">Data exploration</a>

3. <a href="#item3">Analyze locations</a>

4. <a href="#item4">Results</a>

5. <a href="#item5">Conclusion</a>    
</font>
</div>

In [2]:
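import numpy as np
import pandas as pd
# Show every row and column when displaying dataframes
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)
import json
# geopy converts an address into latitude and longitude values
!conda install -c conda-forge geopy --yes
from geopy.geocoders import Nominatim
import requests
# Top-level import; pandas.io.json.json_normalize is deprecated since pandas 1.0
from pandas import json_normalize
import matplotlib.cm as cm
import matplotlib.colors as colors
from sklearn.cluster import KMeans
# Folium renders the maps; uncomment the next line if it is not installed
#!conda install -c conda-forge folium=0.5.0 --yes
import folium
print('Libraries imported.')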


<a id='item1'></a>

## 1. Project and data description

In this project we use a data set from Statistics Canada, the federal agency responsible for producing statistics on Canada: 
https://www12.statcan.gc.ca/census-recensement/2016/dp-pd/hlt-fst/pd-pl/Table.cfm?Lang=Eng&T=304&SR=46&S=87&O=A&RPP=5&PR=35&CMA=0&CSD=0

This dataset, from the 2016 census, is used to retrieve information about the Greater Toronto Area and the population density of every neighborhood.

Then we use the Foursquare API to explore the neighborhoods in that area. The _explore function_ is used to find the existing health centers in each municipality, and the Folium library helps visualize the neighborhoods and their densities. Finally, with the k-means clustering algorithm, we propose a location for an additional hospital that takes into account both the current hospital locations and the population density.
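One possible way to let population density influence the clustering is to pass it to k-means as a sample weight. The sketch below is only an illustration under that assumption: the coordinates and weights are made up, and this section does not show how the notebook itself configures `KMeans`. Note also that plain k-means uses Euclidean distance, which is only an approximation when applied to latitude and longitude values.

```python
import numpy as np
from sklearn.cluster import KMeans

# Made-up planar coordinates, for illustration only (not real locations)
points = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]])
# Made-up weights standing in for the population density of each area
weights = np.array([1.0, 3.0, 1.0, 1.0])

model = KMeans(n_clusters=2, n_init=10, random_state=0)
model.fit(points, sample_weight=weights)

# Each centre is the weighted mean of its cluster,
# so it is pulled towards the heavier points.
order = np.argsort(model.cluster_centers_[:, 0])
centers = model.cluster_centers_[order]
print(centers)
```

In the first cluster the centre lands closer to the point with weight 3 than to the point with weight 1, which is the effect one would want from denser areas.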

#### Data exploration

Let's first load the data! The website does not allow downloading data for the GTA alone, so we need to extract the useful rows from several files covering all of Ontario.
The GTA is composed of five census divisions: City of Toronto, Durham, Halton, Peel and York. Together, these divisions contain twenty-five census subdivisions.
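As a quick check of that count, the subdivision names selected later in this notebook can be grouped by census division and tallied. The dictionary name `gta_subdivisions` is introduced here only for illustration.

```python
# Census subdivisions per census division, using the names selected later in this notebook
gta_subdivisions = {
    "City of Toronto": ["Toronto"],
    "Durham": ["Ajax", "Brock", "Clarington", "Oshawa", "Pickering",
               "Scugog", "Uxbridge", "Whitby"],
    "Halton": ["Burlington", "Halton Hills", "Milton", "Oakville"],
    "Peel": ["Brampton", "Caledon", "Mississauga"],
    "York": ["Aurora", "East Gwillimbury", "Georgina", "King", "Markham",
             "Newmarket", "Richmond Hill", "Vaughan", "Whitchurch-Stouffville"],
}

# Count the subdivisions in each division and overall
for division, names in gta_subdivisions.items():
    print(f"{division}: {len(names)}")

total = sum(len(names) for names in gta_subdivisions.values())
print("Total:", total)
```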


In [3]:
trt_data = pd.read_csv('Toronto_census.CSV')
dur_data = pd.read_csv('Durham_census.CSV')
hal_data = pd.read_csv('Halton_census.CSV')
pel_data = pd.read_csv('Peel_census.CSV')


The files contain a lot of unnecessary information, so let's clean up the data and keep only the columns related to the project.

In [4]:
dur_data.columns

Index(['Geographic code', 'Geographic name', 'Geographic type',
       'Province or territory',
       'Incompletely enumerated Indian reserves and Indian settlements, 2016',
       'Population, 2016', 'Population, 2011', '2011 adjusted population flag',
       'Incompletely enumerated Indian reserves and Indian settlements, 2011',
       '2016 population review flag',
       '2011 population review or received update flag',
       'Population, % change', 'Total private dwellings, 2016',
       'Private dwellings occupied by usual residents, 2016',
       'Land area in square kilometres, 2016',
       'Population density per square kilometre, 2016',
       'National population rank, 2016',
       'Provincial/territorial population rank, 2016'],
      dtype='object')

We choose to use the number of dwellings occupied by usual residents so that hotels are excluded from the dwelling count.

Let's first select the data to be used for the Peel region!

In [5]:
# Peel region: shorten the column names, then keep only the columns we need
pel_data.rename(columns={
    "Geographic name": "Name",
    "Population, 2016": "2016",
    "Population, 2011": "2011",
    "Private dwellings occupied by usual residents, 2016": "Dwellings",
    "Land area in square kilometres, 2016": "Area",
    "Population density per square kilometre, 2016": "Density"}, inplace=True)
pel_data = pel_data.loc[:, ['Name', '2016', '2011', 'Dwellings', 'Area', 'Density']]
peel_cities = ["Brampton", "Caledon", "Mississauga"]
peel_data = pel_data[pel_data["Name"].isin(peel_cities)]


Now, let's add the Durham region!

In [6]:
# Durham region: same renaming and column selection as above
dur_data.rename(columns={
    "Geographic name": "Name",
    "Population, 2016": "2016",
    "Population, 2011": "2011",
    "Private dwellings occupied by usual residents, 2016": "Dwellings",
    "Land area in square kilometres, 2016": "Area",
    "Population density per square kilometre, 2016": "Density"}, inplace=True)
dur_data = dur_data.loc[:, ['Name', '2016', '2011', 'Dwellings', 'Area', 'Density']]
dur_cities = ["Ajax", "Brock", "Clarington", "Oshawa", "Pickering", "Scugog", "Uxbridge", "Whitby"]
durh_data = dur_data[dur_data["Name"].isin(dur_cities)]


Now let's move on to the Halton region.

In [7]:
# Halton region: same renaming and column selection as above
hal_data.rename(columns={
    "Geographic name": "Name",
    "Population, 2016": "2016",
    "Population, 2011": "2011",
    "Private dwellings occupied by usual residents, 2016": "Dwellings",
    "Land area in square kilometres, 2016": "Area",
    "Population density per square kilometre, 2016": "Density"}, inplace=True)
hal_data = hal_data.loc[:, ['Name', '2016', '2011', 'Dwellings', 'Area', 'Density']]
hal_cities = ["Burlington", "Halton Hills", "Milton", "Oakville"]
halt_data = hal_data[hal_data["Name"].isin(hal_cities)]


Now, it's time for Toronto and York. We will process the data for York and the City of Toronto together.

In [8]:
# City of Toronto and York region: same renaming and column selection as above
trt_data.rename(columns={
    "Geographic name": "Name",
    "Population, 2016": "2016",
    "Population, 2011": "2011",
    "Private dwellings occupied by usual residents, 2016": "Dwellings",
    "Land area in square kilometres, 2016": "Area",
    "Population density per square kilometre, 2016": "Density"}, inplace=True)
trt_data = trt_data.loc[:, ['Name', '2016', '2011', 'Dwellings', 'Area', 'Density']]
trt_cities = ["Toronto", "Aurora", "East Gwillimbury", "Markham", "Georgina", "King", "Newmarket", "Richmond Hill", "Whitchurch-Stouffville", "Vaughan"]
# Keep a single row per name
trth_data = trt_data[trt_data["Name"].isin(trt_cities)].drop_duplicates(subset="Name")
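The four region cells repeat the same rename, select and filter steps. A minimal sketch of how they could be factored into one helper is shown below. The names `COLUMN_NAMES` and `select_region` are hypothetical and not part of the notebook, and the sample dataframe is made-up data that only mimics the layout of the census files.

```python
import pandas as pd

# Mapping from the Statistics Canada column names to the short names used here
COLUMN_NAMES = {
    "Geographic name": "Name",
    "Population, 2016": "2016",
    "Population, 2011": "2011",
    "Private dwellings occupied by usual residents, 2016": "Dwellings",
    "Land area in square kilometres, 2016": "Area",
    "Population density per square kilometre, 2016": "Density",
}


def select_region(raw, cities):
    """Return the renamed, trimmed rows of `raw` whose Name is in `cities`."""
    trimmed = raw.rename(columns=COLUMN_NAMES)[list(COLUMN_NAMES.values())]
    selected = trimmed[trimmed["Name"].isin(cities)]
    # Keep the first row for each name in case a name is listed more than once
    return selected.drop_duplicates(subset="Name").reset_index(drop=True)


# Made-up rows, for illustration only
sample = pd.DataFrame({
    "Geographic name": ["Alpha", "Beta", "Alpha", "Gamma"],
    "Geographic type": ["CY", "T", "CY", "T"],
    "Population, 2016": [100.0, 200.0, 100.0, 300.0],
    "Population, 2011": [90.0, 180.0, 90.0, 310.0],
    "Private dwellings occupied by usual residents, 2016": [40.0, 80.0, 40.0, 120.0],
    "Land area in square kilometres, 2016": [10.0, 20.0, 10.0, 30.0],
    "Population density per square kilometre, 2016": [10.0, 10.0, 10.0, 10.0],
})

result = select_region(sample, ["Alpha", "Gamma"])
print(result)
```

Unlike the cells above, this helper drops duplicate names for every region, not only for the Toronto file, and it returns a new dataframe instead of renaming the input in place.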

Now we need to combine all the regions into a single dataframe.

In [26]:
all_data= pd.concat([trth_data,halt_data,durh_data,peel_data],ignore_index=True)
all_data.sort_values(by=['Name'],inplace=True)
all_data.reset_index(drop=True,inplace=True)

In [28]:
all_data

| | Name | 2016 | 2011 | Dwellings | Area | Density |
|---|---|---|---|---|---|---|
| 0 | Ajax | 119677.0 | 109600.0 | 37549.0 | 67.0 | 1786.4 |
| 1 | Aurora | 55445.0 | 53203.0 | 18851.0 | 49.85 | 1112.3 |
| 2 | Brampton | 593638.0 | 523906.0 | 168011.0 | 266.36 | 2228.7 |
| 3 | Brock | 11642.0 | 11341.0 | 4543.0 | 423.34 | 27.5 |
| 4 | Burlington | 183314.0 | 175779.0 | 71373.0 | 185.66 | 987.3 |
| 5 | Caledon | 66502.0 | 59460.0 | 21256.0 | 688.16 | 96.6 |
| 6 | Clarington | 92013.0 | 84548.0 | 32838.0 | 611.4 | 150.5 |
| 7 | East Gwillimbury | 23991.0 | 22473.0 | 8077.0 | 245.04 | 97.9 |
| 8 | Georgina | 45418.0 | 43517.0 | 16821.0 | 287.75 | 157.8 |
| 9 | Halton Hills | 61161.0 | 59013.0 | 21078.0 | 276.27 | 221.4 |
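The `Density` column can be sanity-checked against the other columns, since density is the 2016 population divided by the land area in square kilometres. The sketch below recomputes it for the first five rows of the output above; small differences are expected because the published figures are rounded.

```python
# (name, 2016 population, land area in km2, reported density), copied from the output above
rows = [
    ("Ajax", 119677.0, 67.0, 1786.4),
    ("Aurora", 55445.0, 49.85, 1112.3),
    ("Brampton", 593638.0, 266.36, 2228.7),
    ("Brock", 11642.0, 423.34, 27.5),
    ("Burlington", 183314.0, 185.66, 987.3),
]

relative_errors = []
for name, population, area, reported in rows:
    computed = population / area  # persons per square kilometre
    relative_errors.append(abs(computed - reported) / reported)
    print(f"{name:<12} computed {computed:8.1f}   reported {reported:8.1f}")

# Largest relative gap between the recomputed and the reported density
max_rel_error = max(relative_errors)
print(f"Largest relative difference: {max_rel_error:.5f}")
```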


In [29]:
with open('gta.geojson') as json_data:
    gta_data = json.load(json_data)

In [33]:
#gta_data

In [34]:
neighborhoods_data = gta_data['features']
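Each feature stores the census subdivision name and a province code in its `properties` (the `CSDNAME` and `PRUID` keys in this file). If the file covers more than the GTA, the features can be narrowed down before mapping. The sketch below assumes that filtering on those two keys is sufficient. The helper `gta_features` is a hypothetical name, the sample features are made up (only the key names follow the real file), and `'35'` is the Statistics Canada province code for Ontario.

```python
ONTARIO_PRUID = "35"  # Statistics Canada province code for Ontario


def gta_features(features, names, pruid=ONTARIO_PRUID):
    """Keep the features located in the given province whose name is in `names`."""
    return [
        feature for feature in features
        if feature["properties"]["PRUID"] == pruid
        and feature["properties"]["CSDNAME"] in names
    ]


# Made-up features with null geometry, for illustration only
sample_features = [
    {"type": "Feature", "geometry": None,
     "properties": {"CSDNAME": "Manners Sutton", "PRUID": "13"}},
    {"type": "Feature", "geometry": None,
     "properties": {"CSDNAME": "Toronto", "PRUID": "35"}},
    # Hypothetical same-named subdivision in another province
    {"type": "Feature", "geometry": None,
     "properties": {"CSDNAME": "Ajax", "PRUID": "13"}},
    {"type": "Feature", "geometry": None,
     "properties": {"CSDNAME": "Ajax", "PRUID": "35"}},
]

selected = gta_features(sample_features, {"Ajax", "Toronto"})
selected_names = [feature["properties"]["CSDNAME"] for feature in selected]
print(selected_names)
```

Checking the province code as well as the name guards against subdivisions in other provinces that happen to share a name with a GTA municipality.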

In [35]:
neighborhoods_data[0]

{'type': 'Feature',
 'properties': {'CSDUID': '1310004',
  'CSDNAME': 'Manners Sutton',
  'CSDTYPE': 'P',
  'PRUID': '13',
  'PRNAME': 'New Brunswick / Nouveau-Brunswick',
  'CDUID': '1310',
  'CDNAME': 'York',
  'CDTYPE': 'CT',
  'CCSUID': '1310004',
  'CCSNAME': 'Manners Sutton',
  'ERUID': '1340',
  'ERNAME': 'Fredericton--Oromocto',
  'SACCODE': '320',
  'SACTYPE': '2',
  'CMAUID': '320',
  'CMAPUID': '13320',
  'CMANAME': 'Fredericton',
  'CMATYPE': 'K'},
 'geometry': {'type': 'MultiPolygon',
  'coordinates': [[[[-66.88164438100155, 45.702638873312495],
     [-66.88504036041316, 45.698687710469194],
     [-66.88609985261242, 45.697454840450355],
     [-66.88785321361684, 45.69541445520283],
     [-66.89242406279888, 45.69030062517926],
     [-66.89609649740562, 45.68616951419102],
     [-66.8966417573756, 45.68555312446406],
     [-66.89953511501302, 45.68230747316793],
     [-66.89974492987047, 45.68206489391657],
     [-66.9102607535977, 45.66990446032455],
     [-66.91185008242