# Capstone Final Project - The Battle of the Neighborhoods 
### Applied Data Science Capstone by IBM/Coursera

## Table of contents
* [Introduction: Business Problem](#introduction)
* [Data](#data)
* [Methodology](#methodology)
* [Analysis](#analysis)
* [Results](#results)
* [Discussion](#discussion)
* [Conclusion](#conclusion)



## Introduction: Business Problem <a name="introduction"></a>

In this project we will be working alongside the **Toronto Public Health** Division. Toronto Public Health (TPH) reports to the Board of Health and is responsible for the health and well-being of all 2.9 million residents. TPH has focused on protecting and promoting the health of Toronto residents since 1883 by:

* preventing the spread of disease, promoting healthy living and advocating for conditions that improve health for Toronto residents
* using surveillance to monitor the health status of the population in order to respond to on-going and emerging health needs
* developing and implementing public policy and practices that enhance the health of individuals, communities and the entire city

![alt text](http://www.gtaweekly.ca/wp-content/uploads/2019/07/PublicHealth-678x381.jpeg "Toronto Public Health Logo")

To address the unique needs of its community, TPH wants to research the effect of **Neighborhood and Built Environment** as a social determinant of health and to include any suitable measures in its 2020-2024 Strategic Plan. To that end, we will use data science to determine whether there are significant differences between neighborhoods and, if so, identify a cluster of neighborhoods that need more help.

## Data <a name="data"></a>

Based on the definition of our problem, the factors through which living in a given neighborhood could affect health include:
* Economic status
* Access to foods that support healthy eating patterns
* Access to parks, fitness or recreation centers
* Crime and Violence
* Environmental Conditions
* Access to Health Care
* Access to Education

The following data sources will be needed to extract/generate the required information:
* List of Toronto's neighborhoods, previously extracted in this course
* The services available in every neighborhood, with their type and location, will be obtained using the **Foursquare API**
* Income data will be obtained from the **Canadian census**
* Crime rates will be obtained from the **Toronto Police Service**
* Air Quality Health Index (AQHI) data will be obtained from the **Ministry of Environment, Conservation and Parks**

### Data Preparation

We can start by importing all the libraries that we will need throughout the project.

In [2]:
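import numpy as np  # numerical arrays
import pandas as pd  # data analysis
pd.set_option('display.max_columns', None)  # show every column in outputs
pd.set_option('display.max_rows', None)  # show every row in outputs
import json  # JSON handling
!conda install -c conda-forge geopy --yes
from geopy.geocoders import Nominatim  # convert an address into latitude and longitude
import requests  # HTTP requests
from pandas.io.json import json_normalize  # flatten JSON into a DataFrame (pandas >= 1.0 exposes this as pd.json_normalize)
import matplotlib.cm as cm  # colour maps
import matplotlib.colors as colors
from sklearn.cluster import KMeans  # k-means clustering
!conda install -c conda-forge folium=0.5.0 --yes
import folium  # map rendering

print('Libraries imported.')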

----

Then load each of the data files and clean the data:

### 1. Toronto neighbourhood data

#### Load the .csv file

In [3]:
df_nbh = pd.read_csv("TorontoNBs.csv", sep=";")
df_nbh.head()

|   | Postcode | Borough | Neighbourhood |
|---|---|---|---|
| 0 | M1A | Not assigned | Not assigned |
| 1 | M2A | Not assigned | Not assigned |
| 2 | M3A | North York | Parkwoods |
| 3 | M4A | North York | Victoria Village |
| 4 | M5A | Downtown Toronto | Harbourfront |


#### Drop the rows with "Not assigned" boroughs

In [4]:
df_nbh.drop( df_nbh[ df_nbh['Borough'] == "Not assigned" ].index , inplace=True)
df_nbh.head()

|   | Postcode | Borough | Neighbourhood |
|---|---|---|---|
| 2 | M3A | North York | Parkwoods |
| 3 | M4A | North York | Victoria Village |
| 4 | M5A | Downtown Toronto | Harbourfront |
| 5 | M5A | Downtown Toronto | Regent Park |
| 6 | M6A | North York | Lawrence Heights |
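
The same filter can also be written as a boolean mask, which returns a new frame instead of modifying the original in place. The following is a minimal sketch on a toy frame; `toy` and `assigned` are illustrative names, and the sample rows only mirror the preview above.

```python
import pandas as pd

# Toy frame mirroring the first rows of TorontoNBs.csv (illustrative only)
toy = pd.DataFrame({
    'Postcode': ['M1A', 'M2A', 'M3A', 'M4A'],
    'Borough': ['Not assigned', 'Not assigned', 'North York', 'North York'],
    'Neighbourhood': ['Not assigned', 'Not assigned', 'Parkwoods', 'Victoria Village'],
})

# Keep only the rows whose borough is assigned; the original index labels are preserved
assigned = toy[toy['Borough'] != 'Not assigned']
print(assigned)
```

As with `drop`, the surviving rows keep their original index labels, which is why the preview above starts at index 2.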


#### Group by Postcode and reset the index

In [5]:
df_nbh = df_nbh.groupby(['Postcode', 'Borough'])['Neighbourhood'].apply(', '.join).reset_index()
df_nbh.head()

|   | Postcode | Borough | Neighbourhood |
|---|---|---|---|
| 0 | M1B | Scarborough | Rouge, Malvern |
| 1 | M1C | Scarborough | Highland Creek, Rouge Hill, Port Union |
| 2 | M1E | Scarborough | Guildwood, Morningside, West Hill |
| 3 | M1G | Scarborough | Woburn |
| 4 | M1H | Scarborough | Cedarbrae |
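
To make the grouping step concrete, here is a small sketch of how postcodes shared by several neighbourhoods collapse into a single row. The toy rows are taken from the earlier preview; `toy` and `merged` are illustrative names, and `agg` is used here in place of `apply` purely as a stylistic alternative.

```python
import pandas as pd

# Two neighbourhoods share postcode M5A, one neighbourhood has M6A
toy = pd.DataFrame({
    'Postcode': ['M5A', 'M5A', 'M6A'],
    'Borough': ['Downtown Toronto', 'Downtown Toronto', 'North York'],
    'Neighbourhood': ['Harbourfront', 'Regent Park', 'Lawrence Heights'],
})

# One row per (Postcode, Borough); neighbourhood names are joined with ", "
merged = (
    toy.groupby(['Postcode', 'Borough'])['Neighbourhood']
       .agg(', '.join)
       .reset_index()
)
print(merged)
```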


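#### Assign a neighbourhood name to the row at index 85

In [6]:
# Replace the remaining 'Not assigned' neighbourhood name (row 85) with Queen's Park;
# assigning the result back avoids relying on an in-place update of a column selection
df_nbh['Neighbourhood'] = df_nbh['Neighbourhood'].replace('Not assigned', "Queen's Park")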

#### Add location data

In [7]:
df_gsd = pd.read_csv('http://cocl.us/Geospatial_data')
df_gsd.columns = ['Postcode', 'Latitude', 'Longitude']
df_nbh = pd.merge(df_nbh, df_gsd, on=['Postcode'], how='inner')
df_nbh = df_nbh[['Borough', 'Neighbourhood', 'Postcode', 'Latitude', 'Longitude']].copy()
df_nbh.head()

|   | Borough | Neighbourhood | Postcode | Latitude | Longitude |
|---|---|---|---|---|---|
| 0 | Scarborough | Rouge, Malvern | M1B | 43.806686 | -79.194353 |
| 1 | Scarborough | Highland Creek, Rouge Hill, Port Union | M1C | 43.784535 | -79.160497 |
| 2 | Scarborough | Guildwood, Morningside, West Hill | M1E | 43.763573 | -79.188711 |
| 3 | Scarborough | Woburn | M1G | 43.770992 | -79.216917 |
| 4 | Scarborough | Cedarbrae | M1H | 43.773136 | -79.239476 |
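
Because the merges in this notebook use `how='inner'`, any postcode missing from the second file is dropped without warning. One possible way to check for this, sketched on toy data, is a left merge with `indicator=True`; the frames `left` and `right` and the missing postcode are made up for illustration.

```python
import pandas as pd

left = pd.DataFrame({'Postcode': ['M1B', 'M1C', 'M1E'],
                     'Borough': ['Scarborough'] * 3})
# M1E is deliberately absent from the coordinates table
right = pd.DataFrame({'Postcode': ['M1B', 'M1C'],
                      'Latitude': [43.806686, 43.784535],
                      'Longitude': [-79.194353, -79.160497]})

# indicator=True adds a '_merge' column telling where each row came from
check = pd.merge(left, right, on='Postcode', how='left', indicator=True)
unmatched = check.loc[check['_merge'] == 'left_only', 'Postcode'].tolist()
print(unmatched)
```

An empty `unmatched` list would suggest that the inner merge loses no neighbourhoods.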


#### Visualize the neighbourhoods on a map

In [8]:
address = 'Toronto, Canada'
geolocator = Nominatim(user_agent="my-application")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude

map_toronto = folium.Map(location=[latitude, longitude], zoom_start=10)

for lat, lng, borough, neighborhood in zip(df_nbh['Latitude'], df_nbh['Longitude'], df_nbh['Borough'], df_nbh['Neighbourhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=4,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.5,
        parse_html=False).add_to(map_toronto)  
    
map_toronto

#### Now let's add the income data

In [9]:
df_inc = pd.read_csv("Toronto_Inc.csv", thousands=',')
df_inc.columns = ['Postcode', 'Avg Income']
df_nbh2 = pd.merge(df_nbh, df_inc, on=['Postcode'], how='inner')
df_nbh2 = df_nbh2[['Borough', 'Neighbourhood', 'Postcode', 'Latitude', 'Longitude','Avg Income']].copy()
df_nbh2.head()

|   | Borough | Neighbourhood | Postcode | Latitude | Longitude | Avg Income |
|---|---|---|---|---|---|---|
| 0 | Scarborough | Rouge, Malvern | M1B | 43.806686 | -79.194353 | 25750 |
| 1 | Scarborough | Highland Creek, Rouge Hill, Port Union | M1C | 43.784535 | -79.160497 | 35239 |
| 2 | Scarborough | Guildwood, Morningside, West Hill | M1E | 43.763573 | -79.188711 | 19687 |
| 3 | Scarborough | Woburn | M1G | 43.770992 | -79.216917 | 45592 |
| 4 | Scarborough | Cedarbrae | M1H | 43.773136 | -79.239476 | 27546 |


_(The average income data is in dollars per year per person)_

#### The air quality data

In [10]:
df_aq = pd.read_csv("Toronto_AQ.csv")
df_aq.columns = ['Postcode', 'AQHI']
df_nbh3 = pd.merge(df_nbh2, df_aq, on=['Postcode'], how='inner')
df_nbh3 = df_nbh3[['Borough', 'Neighbourhood', 'Postcode', 'Latitude', 'Longitude','Avg Income', 'AQHI']].copy()
df_nbh3.head()

|   | Borough | Neighbourhood | Postcode | Latitude | Longitude | Avg Income | AQHI |
|---|---|---|---|---|---|---|---|
| 0 | Scarborough | Rouge, Malvern | M1B | 43.806686 | -79.194353 | 25750 | 2.0 |
| 1 | Scarborough | Highland Creek, Rouge Hill, Port Union | M1C | 43.784535 | -79.160497 | 35239 | NaN |
| 2 | Scarborough | Guildwood, Morningside, West Hill | M1E | 43.763573 | -79.188711 | 19687 | 2.0 |
| 3 | Scarborough | Woburn | M1G | 43.770992 | -79.216917 | 45592 | NaN |
| 4 | Scarborough | Cedarbrae | M1H | 43.773136 | -79.239476 | 27546 | 1.0 |


_(AQHI stands for Air Quality Health Index. Here is an image to illustrate the spectrum:)_ 

![alt text](https://lungontario.ca/wp-content/uploads/2017/08/Air-Quality-Health-Index-Scale.jpg "AQHI Scale")

#### And finally, the crime data

In [11]:
df_cri = pd.read_csv("Toronto_Crime.csv")
df_cri.columns = ['Postcode', 'Assault', 'Auto Theft', 'Break and Enter', 'Robbery', 'Theft Over $5,000', 'Homicide']
df_nbh4 = pd.merge(df_nbh3, df_cri, on=['Postcode'], how='inner')
df_nbh4 = df_nbh4[['Borough', 'Neighbourhood', 'Postcode', 'Latitude', 'Longitude','Avg Income', 'AQHI', 'Assault', 'Auto Theft', 'Break and Enter', 'Robbery', 'Theft Over $5,000', 'Homicide']].copy()
df_nbh4.head()

|   | Borough | Neighbourhood | Postcode | Latitude | Longitude | Avg Income | AQHI | Assault | Auto Theft | Break and Enter | Robbery | Theft Over $5,000 | Homicide |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Scarborough | Rouge, Malvern | M1B | 43.806686 | -79.194353 | 25750 | 2.0 | 1912.8 | 2163.7 | 721.2 | 595.8 | 94.1 | 0.0 |
| 1 | Scarborough | Highland Creek, Rouge Hill, Port Union | M1C | 43.784535 | -79.160497 | 35239 | NaN | 375.4 | 62.6 | 141.4 | 40.8 | 5.4 | 2.7 |
| 2 | Scarborough | Guildwood, Morningside, West Hill | M1E | 43.763573 | -79.188711 | 19687 | 2.0 | 1923.5 | 214.8 | 507.7 | 400.3 | 58.6 | 0.0 |
| 3 | Scarborough | Woburn | M1G | 43.770992 | -79.216917 | 45592 | NaN | 696.5 | 153.6 | 307.1 | 192.0 | 71.3 | 11.0 |
| 4 | Scarborough | Cedarbrae | M1H | 43.773136 | -79.239476 | 27546 | 1.0 | 576.4 | 184.6 | 184.6 | 162.1 | 18.0 | 0.0 |


_(The crime rates are expressed per 100,000 people, using population data from the 2016 census)_

### Data overview

Exploring the data we have so far, two main things stand out:
1. The air quality ("AQHI") column seems to have many missing values, probably because many neighbourhoods have no monitoring station. The neighbourhoods that do have measurements also seem to have very similar values.
2. Too many columns are dedicated to crime; we shouldn't need that much detail at this stage.

Therefore, to solve these problems we will:
1. Check the value counts for the "AQHI" column and probably drop it if there is not much information we can extract from it
2. Compute a new column with the total normalized crime per area

#### First, let's work on the AQHI column

In [12]:
df_nbh4['AQHI'].value_counts(dropna=False)

 2     54
NaN    30
 1     15
 3      4
Name: AQHI, dtype: int64


As we can see, there are 30 missing values. Also, AQHI is measured on a scale from 1 to 10, as shown in the image above. Here we only have values from 1 to 3, all corresponding to "Low Risk", so there are no significant differences between neighbourhoods in terms of air quality. For both reasons, we drop the "AQHI" column.
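
The two observations behind this decision can also be expressed as numbers. The sketch below rebuilds the column from the value counts shown above rather than from the real file, and it assumes that AQHI values 1 to 3 make up the "Low Risk" band; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

# Rebuild the AQHI column from the value counts shown above
aqhi = pd.Series([2.0] * 54 + [np.nan] * 30 + [1.0] * 15 + [3.0] * 4, name='AQHI')

missing_share = aqhi.isna().mean()          # fraction of postcodes without a reading
observed_range = (aqhi.min(), aqhi.max())   # min/max skip NaN values by default
all_low_risk = aqhi.dropna().le(3).all()    # assumption: 1-3 is the "Low Risk" band

print(f'{missing_share:.1%} missing, range {observed_range}, all low risk: {all_low_risk}')
```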

In [13]:
df_nbh5 = df_nbh4.drop(columns=['AQHI'])
df_nbh5.head()

|   | Borough | Neighbourhood | Postcode | Latitude | Longitude | Avg Income | Assault | Auto Theft | Break and Enter | Robbery | Theft Over $5,000 | Homicide |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Scarborough | Rouge, Malvern | M1B | 43.806686 | -79.194353 | 25750 | 1912.8 | 2163.7 | 721.2 | 595.8 | 94.1 | 0.0 |
| 1 | Scarborough | Highland Creek, Rouge Hill, Port Union | M1C | 43.784535 | -79.160497 | 35239 | 375.4 | 62.6 | 141.4 | 40.8 | 5.4 | 2.7 |
| 2 | Scarborough | Guildwood, Morningside, West Hill | M1E | 43.763573 | -79.188711 | 19687 | 1923.5 | 214.8 | 507.7 | 400.3 | 58.6 | 0.0 |
| 3 | Scarborough | Woburn | M1G | 43.770992 | -79.216917 | 45592 | 696.5 | 153.6 | 307.1 | 192.0 | 71.3 | 11.0 |
| 4 | Scarborough | Cedarbrae | M1H | 43.773136 | -79.239476 | 27546 | 576.4 | 184.6 | 184.6 | 162.1 | 18.0 | 0.0 |


#### Now let's create a column with the total crime

In [14]:
df_nbh5['Total Crime'] = df_nbh5['Assault'] + df_nbh5['Auto Theft'] + df_nbh5['Break and Enter'] + df_nbh5['Robbery'] + df_nbh5['Theft Over $5,000'] + df_nbh5['Homicide']
df_nbh6 = df_nbh5.drop(columns=['Assault', 'Auto Theft', 'Break and Enter', 'Robbery', 'Theft Over $5,000', 'Homicide'])
df_nbh6.head()

|   | Borough | Neighbourhood | Postcode | Latitude | Longitude | Avg Income | Total Crime |
|---|---|---|---|---|---|---|---|
| 0 | Scarborough | Rouge, Malvern | M1B | 43.806686 | -79.194353 | 25750 | 5487.6 |
| 1 | Scarborough | Highland Creek, Rouge Hill, Port Union | M1C | 43.784535 | -79.160497 | 35239 | 628.3 |
| 2 | Scarborough | Guildwood, Morningside, West Hill | M1E | 43.763573 | -79.188711 | 19687 | 3104.9 |
| 3 | Scarborough | Woburn | M1G | 43.770992 | -79.216917 | 45592 | 1431.5 |
| 4 | Scarborough | Cedarbrae | M1H | 43.773136 | -79.239476 | 27546 | 1125.7 |
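
The total can also be computed with a row-wise sum over a list of columns, which is shorter than chaining `+`. The sketch below uses the first two rows of the crime table shown earlier; `crime_cols` and `toy` are illustrative names.

```python
import pandas as pd

crime_cols = ['Assault', 'Auto Theft', 'Break and Enter',
              'Robbery', 'Theft Over $5,000', 'Homicide']

# First two rows of the crime table shown earlier, indexed by postcode
toy = pd.DataFrame(
    [[1912.8, 2163.7, 721.2, 595.8, 94.1, 0.0],
     [375.4, 62.6, 141.4, 40.8, 5.4, 2.7]],
    columns=crime_cols, index=['M1B', 'M1C'])

# Row-wise sum across all crime columns
toy['Total Crime'] = toy[crime_cols].sum(axis=1)
print(toy['Total Crime'].round(1))
```

One difference worth keeping in mind: `sum(axis=1)` skips missing values by default, whereas chaining `+` yields `NaN` for any row with a missing entry.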


### Using Foursquare to research the availability of different services 

#### Define Foursquare Credentials and Version

In [15]:
CLIENT_ID = 'YOUR_CLIENT_ID'  # your Foursquare client ID
CLIENT_SECRET = 'YOUR_CLIENT_SECRET'  # your Foursquare client secret
VERSION = '20180604'  # Foursquare API version date

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET: ' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: YOUR_CLIENT_ID
CLIENT_SECRET: YOUR_CLIENT_SECRET
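
Credentials written directly into a notebook end up in every copy that is shared. A common alternative is to read them from environment variables; the sketch below shows one way this could look. The helper `load_foursquare_credentials` and the variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` are hypothetical and not part of the Foursquare API.

```python
import os

def load_foursquare_credentials(env=os.environ):
    """Read the client ID and secret from a mapping (environment variables by default).

    The variable names used here are hypothetical.
    """
    try:
        return env['FOURSQUARE_CLIENT_ID'], env['FOURSQUARE_CLIENT_SECRET']
    except KeyError as missing:
        raise RuntimeError(f'Missing environment variable: {missing}') from None

# Stand-in mapping so the sketch runs without real credentials
demo_env = {'FOURSQUARE_CLIENT_ID': 'demo-id', 'FOURSQUARE_CLIENT_SECRET': 'demo-secret'}
client_id, client_secret = load_foursquare_credentials(demo_env)
print('Credentials loaded:', bool(client_id and client_secret))
```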


We still have to determine:
* ~~Economic status~~
* Access to foods that support healthy eating patterns 
* Access to parks, fitness or recreation centers
* ~~Crime and Violence~~
* ~~Environmental Conditions~~
* Access to Health Care 
* Access to Education

So we will carry out 5 different searches to find:
* Healthy food shops (**Labels:** Bakery, Butcher, Farmers Market, Fish Market, Fruit & Vegetable Store, Grocery Store, Market and Supermarket)
* Non-healthy food shops (**Labels:** Candy Store, Convenience Store and Fast Food Restaurant)
* Exercise-promoting venues (**Labels:** Athletics & Sports, Badminton Court, Baseball Field, Basketball Court, Climbing Gym, College Gym, College Rec Center, Dance Studio, Gym, Gym / Fitness Center, Gym Pool, Other Great Outdoors, Park, Pool and Recreation Center)
* Health services (**Labels:** Medical Center and Pharmacy)
* Education buildings (**Labels:** High School, School and University)

_(The labels of the different categories are taken from https://developer.foursquare.com/docs/resources/categories)_
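
The label groups listed above can be kept in a single dictionary and turned into one count per category. The following is a sketch under stated assumptions: the helper `count_by_category`, the category names and the toy counts are illustrative, and the exercise-promoting group is left out only to keep the example short.

```python
import pandas as pd

# Category -> Foursquare labels, copied from the list above (exercise group omitted for brevity)
CATEGORY_LABELS = {
    'Healthy food': ['Bakery', 'Butcher', 'Farmers Market', 'Fish Market',
                     'Fruit & Vegetable Store', 'Grocery Store', 'Market', 'Supermarket'],
    'Non-healthy food': ['Candy Store', 'Convenience Store', 'Fast Food Restaurant'],
    'Health services': ['Medical Center', 'Pharmacy'],
    'Education': ['High School', 'School', 'University'],
}

def count_by_category(venue_counts, category_labels):
    """Sum per-label venue counts into one column per category.

    Labels that do not appear as columns in `venue_counts` count as zero.
    """
    out = pd.DataFrame(index=venue_counts.index)
    for category, labels in category_labels.items():
        present = venue_counts.reindex(columns=labels, fill_value=0)
        out[category] = present.sum(axis=1)
    return out

# Toy per-neighbourhood counts (made-up numbers, for illustration only)
toy_counts = pd.DataFrame(
    {'Bakery': [2, 0], 'Grocery Store': [1, 3],
     'Pharmacy': [0, 1], 'Fast Food Restaurant': [4, 1]},
    index=['A', 'B'])

summary = count_by_category(toy_counts, CATEGORY_LABELS)
print(summary)
```

Using `reindex` with `fill_value=0` means a label that never occurs in the data does not raise a `KeyError`.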

### Explore the venues around each neighborhood

In [16]:
def getNearbyVenues(names, latitudes, longitudes, radius=1500, limit=5000):
    """Return a DataFrame with the Foursquare venues found around each neighbourhood."""
    venues_list = []
    for name, lat, lng in zip(names, latitudes, longitudes):

        # Build the explore request for venues within `radius` metres of this location
        url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID,
            CLIENT_SECRET,
            VERSION,
            lat,
            lng,
            radius,
            limit)

        # Send the GET request and keep only the list of returned items
        results = requests.get(url).json()["response"]['groups'][0]['items']

        # Keep the relevant fields of each venue, tagged with its neighbourhood
        venues_list.append([(
            name,
            lat,
            lng,
            v['venue']['name'],
            v['venue']['location']['lat'],
            v['venue']['location']['lng'],
            v['venue']['categories'][0]['name']) for v in results])

    # Flatten the per-neighbourhood lists into a single DataFrame
    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood',
                  'Neighborhood Latitude',
                  'Neighborhood Longitude',
                  'Venue',
                  'Venue Latitude',
                  'Venue Longitude',
                  'Venue Category']

    return nearby_venues

In [17]:
toronto_venues = getNearbyVenues(names=df_nbh6['Neighbourhood'],
                                   latitudes=df_nbh6['Latitude'],
                                   longitudes=df_nbh6['Longitude']
                                  )
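
`getNearbyVenues` indexes straight into the response, so it assumes that the `response`, `groups` and `items` keys are always present and that every venue carries at least one category. A more defensive parsing step could look like the sketch below. It runs on a hand-made payload that only imitates the nesting used above, so no network access is needed; `extract_venues` and `fake_payload` are hypothetical names.

```python
def extract_venues(payload):
    """Return (name, lat, lng, category) tuples from an explore-style payload.

    Missing keys yield an empty result or None fields instead of raising.
    """
    groups = payload.get('response', {}).get('groups', [])
    items = groups[0].get('items', []) if groups else []
    rows = []
    for item in items:
        venue = item.get('venue', {})
        location = venue.get('location', {})
        categories = venue.get('categories') or []
        category = categories[0]['name'] if categories else None
        rows.append((venue.get('name'), location.get('lat'), location.get('lng'), category))
    return rows

# Hand-made payload with the same nesting the function above indexes into
fake_payload = {'response': {'groups': [{'items': [
    {'venue': {'name': 'Example Park', 'location': {'lat': 43.8, 'lng': -79.2},
               'categories': [{'name': 'Park'}]}},
    {'venue': {'name': 'Uncategorised Place', 'location': {'lat': 43.7, 'lng': -79.3},
               'categories': []}},
]}]}}

rows = extract_venues(fake_payload)
print(rows)
```

Venues without a category come back with `None` in the last field, so they can be filtered out or counted before the one-hot encoding step.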

#### Perform one-hot encoding for the different venue categories

In [18]:
toronto_onehot = pd.get_dummies(toronto_venues[['Venue Category']], prefix="", prefix_sep="")
toronto_onehot['Neighbourhood'] = toronto_venues['Neighborhood'] 
fixed_columns = [toronto_onehot.columns[-1]] + list(toronto_onehot.columns[:-1])
toronto_onehot = toronto_onehot[fixed_columns]
toronto_onehot.head()

Unnamed: 0,Neighbourhood,Accessories Store,Afghan Restaurant,African Restaurant,Airport,Airport Lounge,American Restaurant,Amphitheater,Animal Shelter,Antique Shop,Aquarium,Arcade,Argentinian Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Auto Dealership,Auto Garage,Automotive Shop,BBQ Joint,Baby Store,Badminton Court,Bagel Shop,Bakery,Bank,Bar,Baseball Field,Baseball Stadium,Basketball Court,Basketball Stadium,Beach,Beach Bar,Beer Bar,Beer Store,Belgian Restaurant,Big Box Store,Bike Shop,Bistro,Boat or Ferry,Bookstore,Botanical Garden,Boutique,Bowling Alley,Brazilian Restaurant,Breakfast Spot,Brewery,Bridal Shop,Bridge,Bubble Tea Shop,Building,Burger Joint,Burrito Place,Bus Line,Bus Station,Bus Stop,Business Service,Butcher,Cafeteria,Café,Cajun / Creole Restaurant,Camera Store,Campground,Candy Store,Cantonese Restaurant,Caribbean Restaurant,Casino,Castle,Cemetery,Cheese Shop,Chinese Restaurant,Chiropractor,Chocolate Shop,Church,Churrascaria,Circus,Climbing Gym,Clothing Store,Cocktail Bar,Coffee Shop,College Gym,College Quad,College Rec Center,College Stadium,Comedy Club,Comfort Food Restaurant,Comic Shop,Concert Hall,Construction & Landscaping,Convenience Store,Cosmetics Shop,Creperie,Cuban Restaurant,Cupcake Shop,Curling Ice,Dance Studio,Daycare,Deli / Bodega,Dentist's Office,Department Store,Design Studio,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Dive Bar,Doctor's Office,Dog Run,Doner Restaurant,Donut Shop,Drugstore,Dumpling Restaurant,Eastern European Restaurant,Egyptian Restaurant,Electronics Store,Ethiopian Restaurant,Event Space,Eye Doctor,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant,Field,Filipino Restaurant,Financial or Legal Service,Fish & Chips Shop,Fish Market,Flea Market,Flower Shop,Food,Food & Drink Shop,Food Court,Food Truck,Fountain,French Restaurant,Fried Chicken Joint,Frozen Yogurt Shop,Fruit & Vegetable Store,Furniture / Home Store,Gaming Cafe,Garden,Gas Station,Gastropub,Gay 
Bar,General Entertainment,German Restaurant,Gift Shop,Golf Course,Gourmet Shop,Government Building,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Gym Pool,Hakka Restaurant,Harbor / Marina,Hardware Store,Hawaiian Restaurant,Health & Beauty Service,Health Food Store,High School,Historic Site,History Museum,Hobby Shop,Hockey Arena,Hong Kong Restaurant,Hookah Bar,Hostel,Hot Dog Joint,Hotel,Hotel Bar,Hotpot Restaurant,Housing Development,Hungarian Restaurant,IT Services,Ice Cream Shop,Indian Chinese Restaurant,Indian Restaurant,Indie Movie Theater,Indie Theater,Indonesian Restaurant,Indoor Play Area,Intersection,Irish Pub,Italian Restaurant,Japanese Restaurant,Jazz Club,Jewelry Store,Jewish Restaurant,Juice Bar,Karaoke Bar,Kids Store,Kitchen Supply Store,Korean Restaurant,Kosher Restaurant,Lake,Latin American Restaurant,Laundromat,Leather Goods Store,Light Rail Station,Lingerie Store,Liquor Store,Lounge,Mac & Cheese Joint,Malay Restaurant,Market,Martial Arts Dojo,Massage Studio,Medical Center,Mediterranean Restaurant,Men's Store,Metro Station,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Mobile Phone Shop,Modern European Restaurant,Monument / Landmark,Moroccan Restaurant,Movie Theater,Moving Target,Museum,Music School,Music Store,Music Venue,Nail Salon,National Park,Neighborhood,New American Restaurant,Nightclub,Noodle House,Nudist Beach,Office,Opera House,Optical Shop,Organic Grocery,Other Great Outdoors,Outdoor Supply Store,Paintball Field,Pakistani Restaurant,Paper / Office Supplies Store,Park,Pastry Shop,Performing Arts Venue,Persian Restaurant,Pet Store,Pharmacy,Pide Place,Pie Shop,Pilates Studio,Pizza Place,Playground,Plaza,Poke Place,Polish Restaurant,Pool,Pool Hall,Portuguese Restaurant,Post Office,Poutine Place,Print Shop,Pub,Racecourse,Racetrack,Ramen Restaurant,Record Shop,Recreation Center,Rental Car Location,Residential Building (Apartment / Condo),Restaurant,River,Road,Rock Climbing Spot,Rock Club,Roof Deck,Salad Place,Salon / 
Barbershop,Sandwich Place,Scandinavian Restaurant,Scenic Lookout,School,Science Museum,Sculpture Garden,Seafood Restaurant,Shanghai Restaurant,Shoe Store,Shopping Mall,Skate Park,Skating Rink,Ski Chalet,Smoke Shop,Smoothie Shop,Snack Place,Soccer Field,Soccer Stadium,Soup Place,South American Restaurant,Southern / Soul Food Restaurant,Spa,Spanish Restaurant,Speakeasy,Sporting Goods Shop,Sports Bar,Sports Club,Sri Lankan Restaurant,Stadium,Steakhouse,Storage Facility,Street Art,Supermarket,Supplement Shop,Sushi Restaurant,Swiss Restaurant,Syrian Restaurant,Szechuan Restaurant,Taco Place,Taiwanese Restaurant,Tapas Restaurant,Tattoo Parlor,Tea Room,Tennis Court,Tennis Stadium,Thai Restaurant,Theater,Theme Park,Theme Park Ride / Attraction,Theme Restaurant,Thrift / Vintage Store,Tibetan Restaurant,Toy / Game Store,Track,Trail,Train Station,Transportation Service,Turkish Restaurant,Udon Restaurant,University,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Volleyball Court,Warehouse Store,Whisky Bar,Wine Bar,Wings Joint,Women's Store,Yoga Studio,Zoo,Zoo Exhibit
0,"Rouge, Malvern",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,"Rouge, Malvern",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1
2,"Rouge, Malvern",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,"Rouge, Malvern",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,"Rouge, Malvern",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


#### Group the results by neighborhood

In [19]:
toronto_grouped = toronto_onehot.groupby('Neighbourhood').sum().reset_index()
toronto_grouped.head()

Unnamed: 0,Neighbourhood,Accessories Store,Afghan Restaurant,African Restaurant,Airport,Airport Lounge,American Restaurant,Amphitheater,Animal Shelter,Antique Shop,Aquarium,Arcade,Argentinian Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Auto Dealership,Auto Garage,Automotive Shop,BBQ Joint,Baby Store,Badminton Court,Bagel Shop,Bakery,Bank,Bar,Baseball Field,Baseball Stadium,Basketball Court,Basketball Stadium,Beach,Beach Bar,Beer Bar,Beer Store,Belgian Restaurant,Big Box Store,Bike Shop,Bistro,Boat or Ferry,Bookstore,Botanical Garden,Boutique,Bowling Alley,Brazilian Restaurant,Breakfast Spot,Brewery,Bridal Shop,Bridge,Bubble Tea Shop,Building,Burger Joint,Burrito Place,Bus Line,Bus Station,Bus Stop,Business Service,Butcher,Cafeteria,Café,Cajun / Creole Restaurant,Camera Store,Campground,Candy Store,Cantonese Restaurant,Caribbean Restaurant,Casino,Castle,Cemetery,Cheese Shop,Chinese Restaurant,Chiropractor,Chocolate Shop,Church,Churrascaria,Circus,Climbing Gym,Clothing Store,Cocktail Bar,Coffee Shop,College Gym,College Quad,College Rec Center,College Stadium,Comedy Club,Comfort Food Restaurant,Comic Shop,Concert Hall,Construction & Landscaping,Convenience Store,Cosmetics Shop,Creperie,Cuban Restaurant,Cupcake Shop,Curling Ice,Dance Studio,Daycare,Deli / Bodega,Dentist's Office,Department Store,Design Studio,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Dive Bar,Doctor's Office,Dog Run,Doner Restaurant,Donut Shop,Drugstore,Dumpling Restaurant,Eastern European Restaurant,Egyptian Restaurant,Electronics Store,Ethiopian Restaurant,Event Space,Eye Doctor,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant,Field,Filipino Restaurant,Financial or Legal Service,Fish & Chips Shop,Fish Market,Flea Market,Flower Shop,Food,Food & Drink Shop,Food Court,Food Truck,Fountain,French Restaurant,Fried Chicken Joint,Frozen Yogurt Shop,Fruit & Vegetable Store,Furniture / Home Store,Gaming Cafe,Garden,Gas Station,Gastropub,Gay 
Bar,General Entertainment,German Restaurant,Gift Shop,Golf Course,Gourmet Shop,Government Building,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Gym Pool,Hakka Restaurant,Harbor / Marina,Hardware Store,Hawaiian Restaurant,Health & Beauty Service,Health Food Store,High School,Historic Site,History Museum,Hobby Shop,Hockey Arena,Hong Kong Restaurant,Hookah Bar,Hostel,Hot Dog Joint,Hotel,Hotel Bar,Hotpot Restaurant,Housing Development,Hungarian Restaurant,IT Services,Ice Cream Shop,Indian Chinese Restaurant,Indian Restaurant,Indie Movie Theater,Indie Theater,Indonesian Restaurant,Indoor Play Area,Intersection,Irish Pub,Italian Restaurant,Japanese Restaurant,Jazz Club,Jewelry Store,Jewish Restaurant,Juice Bar,Karaoke Bar,Kids Store,Kitchen Supply Store,Korean Restaurant,Kosher Restaurant,Lake,Latin American Restaurant,Laundromat,Leather Goods Store,Light Rail Station,Lingerie Store,Liquor Store,Lounge,Mac & Cheese Joint,Malay Restaurant,Market,Martial Arts Dojo,Massage Studio,Medical Center,Mediterranean Restaurant,Men's Store,Metro Station,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Mobile Phone Shop,Modern European Restaurant,Monument / Landmark,Moroccan Restaurant,Movie Theater,Moving Target,Museum,Music School,Music Store,Music Venue,Nail Salon,National Park,Neighborhood,New American Restaurant,Nightclub,Noodle House,Nudist Beach,Office,Opera House,Optical Shop,Organic Grocery,Other Great Outdoors,Outdoor Supply Store,Paintball Field,Pakistani Restaurant,Paper / Office Supplies Store,Park,Pastry Shop,Performing Arts Venue,Persian Restaurant,Pet Store,Pharmacy,Pide Place,Pie Shop,Pilates Studio,Pizza Place,Playground,Plaza,Poke Place,Polish Restaurant,Pool,Pool Hall,Portuguese Restaurant,Post Office,Poutine Place,Print Shop,Pub,Racecourse,Racetrack,Ramen Restaurant,Record Shop,Recreation Center,Rental Car Location,Residential Building (Apartment / Condo),Restaurant,River,Road,Rock Climbing Spot,Rock Club,Roof Deck,Salad Place,Salon / 
Barbershop,Sandwich Place,Scandinavian Restaurant,Scenic Lookout,School,Science Museum,Sculpture Garden,Seafood Restaurant,Shanghai Restaurant,Shoe Store,Shopping Mall,Skate Park,Skating Rink,Ski Chalet,Smoke Shop,Smoothie Shop,Snack Place,Soccer Field,Soccer Stadium,Soup Place,South American Restaurant,Southern / Soul Food Restaurant,Spa,Spanish Restaurant,Speakeasy,Sporting Goods Shop,Sports Bar,Sports Club,Sri Lankan Restaurant,Stadium,Steakhouse,Storage Facility,Street Art,Supermarket,Supplement Shop,Sushi Restaurant,Swiss Restaurant,Syrian Restaurant,Szechuan Restaurant,Taco Place,Taiwanese Restaurant,Tapas Restaurant,Tattoo Parlor,Tea Room,Tennis Court,Tennis Stadium,Thai Restaurant,Theater,Theme Park,Theme Park Ride / Attraction,Theme Restaurant,Thrift / Vintage Store,Tibetan Restaurant,Toy / Game Store,Track,Trail,Train Station,Transportation Service,Turkish Restaurant,Udon Restaurant,University,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Volleyball Court,Warehouse Store,Whisky Bar,Wine Bar,Wings Joint,Women's Store,Yoga Studio,Zoo,Zoo Exhibit
0,"Adelaide, King, Richmond",0,0,0,0,0,2,0,0,0,0,0,0,2,0,1,1,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,3,0,0,0,0,0,0,2,0,0,0,1,1,0,0,0,0,0,1,2,0,0,0,0,0,0,5,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,7,0,0,0,0,1,0,1,2,0,0,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,1,1,0,0,0,0,0,0,3,0,0,0,0,0,0,0,1,0,2,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,6,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2,2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,1,2,0,2,0,1,0,0,0,0,0,1,0,0,0,0,0,1,0,0,0,0,0,0,0,3,0,0,0,0,0,0,0,0,4,0,2,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,2,0,0,0,0,0,1,0,1,0,1,0,0,0,1,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,1,2,0,0,0,0,1,0,0,0,0,2,0,0,0,1,0,1,0,1,0,0,2,4,0,0,0,0,0,0,0,0,1,0,0,0,1,1,0,0,0,0,0,0,0,0,0,0,0,0
1,Agincourt,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,2,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2,2,0,0,0,0,11,0,0,0,0,0,0,1,0,3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,1,0,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,1,0,0,0,1,0,0,0,0,1,1,0,0,0,1,0,0,0,0,0,0,1,0,1,0,0,0,0,0,0,0,1,0,0,0,0,0,1,1,0,2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,1,0,0,0,0,2,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0
2,"Agincourt North, L'Amoreaux East, Milliken, St...",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,3,0,0,0,4,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0,12,0,0,0,0,0,0,0,0,4,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,3,1,0,2,0,0,0,0,0,0,2,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,1,0,0,0,1,0,0,0,4,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,1,0,0,0,1,0,0,0,0,0,0,0,0,0,0,2,0,0,0,0,0,0,0,0,0,0,2,0,0,0,0,3,0,0,0,3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,2,0,0,0,0,0,0,0,0,0
3,"Albion Gardens, Beaumond Heights, Humbergate, ...",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,4,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,6,0,0,0,0,0,1,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,3,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0
4,"Alderwood, Long Branch",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2,1,0,0,0,0,0,0,2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,2,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,1,0,0,0,0,0,0,1,0,1,2,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,3,0,0,0,0,2,0,0,0,2,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0


#### Extract dataframes with the labels we need for each category

In [20]:
toronto_gfood = toronto_grouped[['Neighbourhood', 'Butcher', 'Bakery', 'Farmers Market', 'Fish Market', 'Fruit & Vegetable Store', 'Grocery Store', 'Market', 'Supermarket']]
toronto_bfood = toronto_grouped[['Neighbourhood','Candy Store', 'Convenience Store', 'Fast Food Restaurant']]
toronto_exnout = toronto_grouped[['Neighbourhood','Athletics & Sports', 'Badminton Court', 'Baseball Field', 'Basketball Court', 'Climbing Gym', 'College Gym', 'College Rec Center', 'Dance Studio', 'Gym', 'Gym / Fitness Center', 'Gym Pool', 'Other Great Outdoors', 'Park', 'Pool', 'Recreation Center']]
toronto_health = toronto_grouped[['Neighbourhood','Medical Center', 'Pharmacy']]
toronto_edu = toronto_grouped[['Neighbourhood','High School', 'School', 'University']]

#### Add "Total" columns to all the previous tables

In [21]:
# Sum only the numeric venue counts (the 'Neighbourhood' text column is skipped).
# assign() returns a new dataframe, so no SettingWithCopyWarning is raised.
toronto_gfood = toronto_gfood.assign(Total=toronto_gfood.sum(axis=1, numeric_only=True))
toronto_bfood = toronto_bfood.assign(Total=toronto_bfood.sum(axis=1, numeric_only=True))
toronto_exnout = toronto_exnout.assign(Total=toronto_exnout.sum(axis=1, numeric_only=True))
toronto_health = toronto_health.assign(Total=toronto_health.sum(axis=1, numeric_only=True))
toronto_edu = toronto_edu.assign(Total=toronto_edu.sum(axis=1, numeric_only=True))
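
The selections and totals above all follow the same pattern, so they could also be driven from a single mapping of category name to venue labels. The sketch below is illustrative only: `CATEGORIES`, `category_totals` and the `demo` frame are hypothetical names that are not part of the original notebook, and only two of the five categories are shown.

```python
import pandas as pd

# Hypothetical mapping; the venue labels are the Foursquare labels used above.
CATEGORIES = {
    "Health Services": ["Medical Center", "Pharmacy"],
    "Education Buildings": ["High School", "School", "University"],
}

def category_totals(grouped, categories, key="Neighbourhood"):
    """Return one row per neighbourhood with a summed venue count per category."""
    totals = grouped[[key]].copy()
    for name, columns in categories.items():
        # Ignore labels that do not appear in this particular venue pull.
        present = [c for c in columns if c in grouped.columns]
        totals[name] = grouped[present].sum(axis=1)
    return totals

# Toy data standing in for toronto_grouped.
demo = pd.DataFrame({
    "Neighbourhood": ["A", "B"],
    "Medical Center": [1, 0],
    "Pharmacy": [2, 3],
    "School": [0, 4],
})
demo_totals = category_totals(demo, CATEGORIES)
print(demo_totals)
```

Keeping the category definitions in one place makes it easier to add or move a venue label without editing several cells.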

#### Add all of the Total columns to the original dataframe

In [22]:
df_nbh6['Healthy Food Places'] = toronto_gfood['Total'].to_numpy()
df_nbh6['Un-healthy Food Places'] = toronto_bfood['Total'].to_numpy()
df_nbh6['Excercise & Outdoor Places'] = toronto_exnout['Total'].to_numpy()
df_nbh6['Health Services'] = toronto_health['Total'].to_numpy()
df_nbh6['Education Buildings'] = toronto_edu['Total'].to_numpy()
df_nbh6.head()

Unnamed: 0,Borough,Neighbourhood,Postcode,Latitude,Longitude,Avg Income,Total Crime,Healthy Food Places,Un-healthy Food Places,Excercise & Outdoor Places,Health Services,Education Buildings
0,Scarborough,"Rouge, Malvern",M1B,43.806686,-79.194353,25750,5487.6,0,0,6,0,1
1,Scarborough,"Highland Creek, Rouge Hill, Port Union",M1C,43.784535,-79.160497,35239,628.3,4,0,4,1,0
2,Scarborough,"Guildwood, Morningside, West Hill",M1E,43.763573,-79.188711,19687,3104.9,5,1,2,3,0
3,Scarborough,Woburn,M1G,43.770992,-79.216917,45592,1431.5,3,7,0,1,0
4,Scarborough,Cedarbrae,M1H,43.773136,-79.239476,27546,1125.7,3,1,4,2,0
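
Copying each `Total` column with `.to_numpy()` relies on `toronto_grouped` and `df_nbh6` listing the neighbourhoods in exactly the same row order. A more defensive option, sketched here under the assumption that both frames share a unique `Neighbourhood` key, is to join on that key instead. `attach_total` and the toy frames are hypothetical and not part of the original notebook.

```python
import pandas as pd

def attach_total(base, totals, new_col, key="Neighbourhood"):
    """Left-join totals['Total'] onto base by key instead of by row position."""
    renamed = totals[[key, "Total"]].rename(columns={"Total": new_col})
    # validate raises if a neighbourhood appears more than once on either side.
    merged = base.merge(renamed, on=key, how="left", validate="one_to_one")
    # Neighbourhoods without any matching venue get a count of 0.
    merged[new_col] = merged[new_col].fillna(0).astype(int)
    return merged

# Toy frames in deliberately different row orders.
base = pd.DataFrame({"Neighbourhood": ["Woburn", "Agincourt", "Cedarbrae"]})
totals = pd.DataFrame({"Neighbourhood": ["Agincourt", "Woburn"], "Total": [5, 2]})
result = attach_total(base, totals, new_col="Healthy Food Places")
print(result)
```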


----

## Methodology <a name="methodology"></a>

In this project we examined several variables describing **neighborhoods in Toronto** and their built environment to check whether they could act as social determinants of health. 

First, we imported the base .csv file containing the **Postal Code, Borough and Neighborhood** of each area. After exploring and cleaning the data, we added the columns from the other data sources we found: **Lat/Lon, Average Income, Several Crime Rates and AQHI (Air Quality Health Index)**. After further examination we dropped the AQHI column because it had a large number of missing values and the values that remained carried little information.

Second, we collected the venue **data: location and type (category) of healthy & unhealthy food stores, exercise-related venues, health services and education buildings (labels determined according to Foursquare categorization)**, grouped it by neighborhood and computed the total count for each category. Adding these columns completed our dataframe, ready for analysis. 

In the third and final step we focus on the analysis. We create clusters of neighborhoods (using **k-means clustering**) to identify those most in need of help from the Toronto Public Health division. We also present a map of all neighborhoods, color-coded by cluster for easier visualization.

----

## Analysis <a name="analysis"></a>

#### First, let's use the elbow method to determine the optimal value of K

In [24]:
toronto_grouped_clustering = df_nbh6.drop(columns=['Borough', 'Neighbourhood', 'Postcode', 'Latitude', 'Longitude'])

In [25]:
from sklearn.cluster import KMeans
from scipy.spatial.distance import cdist
import numpy as np
import matplotlib.pyplot as plt

distortions = []
inertias = []
mapping1 = {}  # k -> distortion
mapping2 = {}  # k -> inertia
K = range(1, 10)

for k in K:
    # Fit k-means for this candidate number of clusters.
    kmeanModel = KMeans(n_clusters=k).fit(toronto_grouped_clustering)

    # Distortion: mean Euclidean distance from each row to its nearest centroid.
    distances = cdist(toronto_grouped_clustering, kmeanModel.cluster_centers_, 'euclidean')
    distortion = np.min(distances, axis=1).sum() / toronto_grouped_clustering.shape[0]

    distortions.append(distortion)
    inertias.append(kmeanModel.inertia_)
    mapping1[k] = distortion
    mapping2[k] = kmeanModel.inertia_

for key, val in mapping1.items():
    print(str(key) + ' : ' + str(val))

plt.plot(K, distortions, 'bx-')
plt.xlabel('Values of K')
plt.ylabel('Distortion')
plt.title('The Elbow Method using Distortion')
plt.show()

1 : 21494.547553279423
2 : 10968.394509657857
3 : 9835.006258854282
4 : 6584.727041330581
5 : 5296.707418641556
6 : 4811.591318949299
7 : 4021.5044658639413
8 : 3532.3855044251122
9 : 3252.7595752973198


<Figure size 640x480 with 1 Axes>
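
k-means relies on Euclidean distance, so columns measured in large units (here `Avg Income` and `Total Crime`) weigh far more in the distances than the venue counts do. One common option, which the run above does not use, is to standardize the features before clustering. A minimal sketch on synthetic data follows; `scaled_inertias` and `toy` are hypothetical names and the numbers are not from the Toronto dataset.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

def scaled_inertias(features, k_values, random_state=0):
    """Standardize each column, then return {k: inertia} for each candidate k."""
    scaled = StandardScaler().fit_transform(features)
    inertias = {}
    for k in k_values:
        model = KMeans(n_clusters=k, n_init=10, random_state=random_state).fit(scaled)
        inertias[k] = model.inertia_
    return inertias

# Synthetic stand-in: one income-like column and one small count column.
rng = np.random.default_rng(0)
toy = np.column_stack([rng.normal(50_000, 20_000, 60), rng.integers(0, 10, 60)])
toy_inertias = scaled_inertias(toy, range(1, 6))
for k, inertia in toy_inertias.items():
    print(k, round(inertia, 2))
```

After standardization every column has unit variance, so the elbow reflects structure across all variables rather than mostly income.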

#### Now let's perform the k-means clustering to group the neighborhoods using K = 4

In [26]:
kclusters = 4
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(toronto_grouped_clustering)
kmeans.labels_[0:103]  

array([1, 1, 1, 3, 1, 0, 0, 1, 3, 3, 3, 0, 1, 3, 3, 3, 1, 0, 1, 3, 0, 1,
       0, 1, 0, 1, 2, 1, 1, 1, 1, 3, 1, 1, 1, 1, 0, 1, 1, 1, 1, 3, 1, 1,
       3, 0, 1, 1, 1, 3, 1, 3, 2, 1, 1, 3, 1, 1, 1, 3, 1, 1, 0, 1, 1, 0,
       3, 3, 1, 3, 3, 3, 1, 1, 0, 3, 1, 1, 3, 1, 3, 1, 1, 3, 1, 3, 1, 1,
       1, 1, 3, 3, 1, 3, 0, 1, 0, 1, 1, 3, 1, 1, 3], dtype=int32)
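
To see how many neighbourhoods fall into each cluster, the labels can be tallied. The sketch below uses a hypothetical helper, `cluster_sizes`, and `sample_labels` holds only the first ten labels printed above, so the counts are illustrative rather than the full result.

```python
import numpy as np

def cluster_sizes(labels, n_clusters):
    """Count how many rows were assigned to each cluster label."""
    counts = np.bincount(np.asarray(labels), minlength=n_clusters)
    return {cluster: int(count) for cluster, count in enumerate(counts)}

# First ten labels from the array above (illustration only).
sample_labels = [1, 1, 1, 3, 1, 0, 0, 1, 3, 3]
sizes = cluster_sizes(sample_labels, 4)
print(sizes)
```

On the real data the same call would be `cluster_sizes(kmeans.labels_, kclusters)`.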

#### And add the cluster labels as a column

In [27]:
df_nbh6.insert(0, 'Cluster Labels', kmeans.labels_)
df_nbh6.head()

Unnamed: 0,Cluster Labels,Borough,Neighbourhood,Postcode,Latitude,Longitude,Avg Income,Total Crime,Healthy Food Places,Un-healthy Food Places,Excercise & Outdoor Places,Health Services,Education Buildings
0,1,Scarborough,"Rouge, Malvern",M1B,43.806686,-79.194353,25750,5487.6,0,0,6,0,1
1,1,Scarborough,"Highland Creek, Rouge Hill, Port Union",M1C,43.784535,-79.160497,35239,628.3,4,0,4,1,0
2,1,Scarborough,"Guildwood, Morningside, West Hill",M1E,43.763573,-79.188711,19687,3104.9,5,1,2,3,0
3,3,Scarborough,Woburn,M1G,43.770992,-79.216917,45592,1431.5,3,7,0,1,0
4,1,Scarborough,Cedarbrae,M1H,43.773136,-79.239476,27546,1125.7,3,1,4,2,0


### Exploring the clusters:

#### 1. By visualization in a map

In [28]:
import folium
import matplotlib.cm as cm
import matplotlib.colors as colors

# `latitude` and `longitude` (the map centre) are assumed to be defined earlier in the notebook.
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# Build one hex colour per cluster.
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# Add one circle marker per neighbourhood, coloured by its cluster label.
for lat, lon, poi, cluster in zip(df_nbh6['Latitude'], df_nbh6['Longitude'], df_nbh6['Neighbourhood'], df_nbh6['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster],
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.7).add_to(map_clusters)

map_clusters

#### 2. By closely examining the different clusters

**Cluster 1**

In [29]:
cluster1 = df_nbh6.loc[df_nbh6['Cluster Labels'] == 0, df_nbh6.columns[[1] + list(range(5, df_nbh6.shape[1]))]]
cluster1.head()

Unnamed: 0,Borough,Longitude,Avg Income,Total Crime,Healthy Food Places,Un-healthy Food Places,Excercise & Outdoor Places,Health Services,Education Buildings
5,Scarborough,-79.239476,116651,306.8,1,1,8,1,0
6,Scarborough,-79.262029,92319,2900.1,1,1,2,0,0
11,Scarborough,-79.295849,80827,671.8,5,0,5,0,0
17,North York,-79.363452,114107,896.9,2,1,0,1,0
20,North York,-79.374714,103703,1725.0,4,0,3,0,1


**Cluster 2**

In [30]:
cluster2 = df_nbh6.loc[df_nbh6['Cluster Labels'] == 1, df_nbh6.columns[[1] + list(range(5, df_nbh6.shape[1]))]]
cluster2.head()

Unnamed: 0,Borough,Longitude,Avg Income,Total Crime,Healthy Food Places,Un-healthy Food Places,Excercise & Outdoor Places,Health Services,Education Buildings
0,Scarborough,-79.194353,25750,5487.6,0,0,6,0,1
1,Scarborough,-79.160497,35239,628.3,4,0,4,1,0
2,Scarborough,-79.188711,19687,3104.9,5,1,2,3,0
4,Scarborough,-79.239476,27546,1125.7,3,1,4,2,0
7,Scarborough,-79.284577,34169,604.8,10,2,0,1,0


**Cluster 3**

In [31]:
cluster3 = df_nbh6.loc[df_nbh6['Cluster Labels'] == 2, df_nbh6.columns[[1] + list(range(5, df_nbh6.shape[1]))]]
cluster3.head()

Unnamed: 0,Borough,Longitude,Avg Income,Total Crime,Healthy Food Places,Un-healthy Food Places,Excercise & Outdoor Places,Health Services,Education Buildings
26,North York,-79.352188,154825,1344.5,3,1,4,1,0
52,Downtown Toronto,-79.38316,213941,1565.2,5,1,2,0,0


**Cluster 4**

In [32]:
cluster4 = df_nbh6.loc[df_nbh6['Cluster Labels'] == 3, df_nbh6.columns[[1] + list(range(5, df_nbh6.shape[1]))]]
cluster4.head()

Unnamed: 0,Borough,Longitude,Avg Income,Total Crime,Healthy Food Places,Un-healthy Food Places,Excercise & Outdoor Places,Health Services,Education Buildings
3,Scarborough,-79.216917,45592,1431.5,3,7,0,1,0
8,Scarborough,-79.239476,40598,4382.2,5,0,7,0,0
9,Scarborough,-79.264848,46752,5579.3,0,2,6,0,0
10,Scarborough,-79.273304,41485,3973.4,3,3,11,1,0
13,Scarborough,-79.304302,48965,2392.1,0,0,9,0,0


----

## Results <a name="results"></a>

|           | Avg Income | Total Crime | Healthy Food Places | Unhealthy Food Places | Exercise & Outdoor Places | Health Services | Education Buildings |
|-----------|:----------:|:-----------:|:-------------------:|:---------------------:|:-------------------------:|:---------------:|:-------------------:|
| Cluster 1 |   28,264   |    3,870    |         2.8         |          2.5          |            2.2            |       1.43      |         1.2         |
| Cluster 2 |   98,553   |    2,066    |         3.7         |          1.3          |            3.9            |       2.16      |         2.1         |
| Cluster 3 |   184,383  |    1,454    |         5.3         |          1.0          |            4.6            |       3.28      |         2.7         |
| Cluster 4 |   46,953   |    3,240    |         3.5         |          1.7          |            3.1            |       1.86      |         1.9         |
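
Per-cluster averages of this kind can be computed with a `groupby` on the label column. The sketch below is one possible way to do it, not necessarily how the table above was produced; `cluster_profile` is a hypothetical helper and `demo` holds only the first five rows of `df_nbh6` shown earlier.

```python
import pandas as pd

def cluster_profile(df, feature_cols, label_col="Cluster Labels"):
    """Mean of each feature per cluster, plus the number of rows in each cluster."""
    grouped = df.groupby(label_col)
    profile = grouped[feature_cols].mean().round(1)
    profile.insert(0, "Neighbourhoods", grouped.size())
    return profile

# First five rows of df_nbh6 from the output above (illustration only).
demo = pd.DataFrame({
    "Cluster Labels": [1, 1, 1, 3, 1],
    "Avg Income": [25750, 35239, 19687, 45592, 27546],
    "Health Services": [0, 1, 3, 1, 2],
})
profile = cluster_profile(demo, ["Avg Income", "Health Services"])
print(profile)
```

Including the row count next to the means makes it clear how many neighbourhoods each average is based on.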

-----

## Discussion <a name="discussion"></a>

As expected, our analysis shows that there are differences between Toronto's neighborhoods in the variables used. For example, **Cluster 3** represents the rich neighborhoods, where inhabitants have high incomes and a wide choice of services around them, including health services, schools and healthy food options. These neighborhoods also have a fairly low crime rate. 

However, these neighborhoods were not the focus of this project. In line with our stakeholder's objective, we wanted to isolate the neighborhoods with the worst conditions so that policies can target the problems behind these inequalities. These are the neighborhoods in **Cluster 1**. These neighborhoods:
* **Have low incomes.** This cluster includes the poorest neighborhoods in Toronto. Economic status is a crucial factor and the first domino to fall when it comes to health: low income contributes to many of the other problems explained below.
* **Have high crime rates.** Crime is usually connected to lower incomes. Many people in a precarious situation turn to crime and violence in desperation. Moreover, these kinds of neighborhoods usually lack the means (therapists, social services, etc.) needed to help people who are suffering from mental illness, violations of human rights and so on.
* **Don't have many stores that promote healthy eating habits.** They have a high number of places like fast food restaurants and convenience stores relative to the number of grocery stores and markets. This is probably because of low demand for healthy food stores, and the main reason for that is probably price. Fresh, healthy food is usually more expensive, so people with lower purchasing power tend to go for cheaper, usually less healthy options such as fast food joints. There is also a high chance that these areas lack the health education needed to know how to maintain a healthy lifestyle.
* **Have fewer places to exercise and enjoy the outdoors.** This is probably due to two main reasons. First, governments of poorer areas rarely invest in such facilities, because residents want to see money spent on covering their basic needs first. Second, the high crime rate might deter people from practicing sports outside, which reduces the demand for such facilities.
* **Have fewer medical and education facilities.** At first sight this looks wrong, since by Canadian law health and education services should be distributed according to distance and population density, so that the whole population is covered and has access to the same services. However, the higher numbers in the richer neighborhoods might be explained by the fact that privately run hospitals and schools are more likely to open where people can afford them.

In terms of the geographical placement of the clusters on the map, we can see that the poorer neighborhoods are usually on the outskirts of the city and the richer ones near the city center.

----

## Conclusion <a name="conclusion"></a>

The purpose of this project was to determine whether certain variables related to neighborhood and built environment could be counted as social determinants of health in the city of Toronto. We identified the neighborhoods with the worst conditions to help our stakeholders narrow down the search for the optimal targets for the policies they'll include in their next strategic plan. A cluster of 17 out of the 103 neighborhoods was found to have more precarious conditions and should therefore be the priority. Several hypotheses to explain the inequalities were proposed and solutions recommended.  

The final decision on the specific policies to be included in the 2020-2024 Strategic Plan is left to the Toronto Public Health division. 