## Capstone Project - The Battle of Neighborhoods (Week 2)

### 1. Setting up the environment.

In [1]:
import numpy as np
import pandas as pd
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json

#!conda install -c conda-forge geopy --yes
from geopy.geocoders import Nominatim

from bs4 import BeautifulSoup
from urllib.request import urlopen
import requests
from pandas.io.json import json_normalize

import matplotlib.cm as cm
import matplotlib.colors as colors

from sklearn.cluster import KMeans

#!conda install -c conda-forge folium=0.5.0 --yes
import folium

print('Libraries imported.')

Libraries imported.


### 2. Parsing the html

In [2]:
url = 'https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'
page = urlopen(url).read().decode('utf-8')
soup = BeautifulSoup(page, 'html.parser')

wiki_table = soup.body.table.tbody

### 3. Extracting data from the table to the data frame

In [3]:
def get_cell(element):
    cells = element.find_all('td')
    row = []
    
    for cell in cells:
        if cell.a:            
            if (cell.a.text):
                row.append(cell.a.text)
                continue
        row.append(cell.string.strip())
        
    return row

In [4]:
def get_row():    
    data = []  
    
    for tr in wiki_table.find_all('tr'):
        row = get_cell(tr)
        if len(row) != 3:
            continue
        data.append(row)        
    
    return data

### 4. Cleaning the data

In [5]:
table_contents=[]
table=soup.find('table')
for row in table.findAll('td'):
    cell = {}
    if row.span.text=='Not assigned':
        pass
    else:
        cell['PostalCode'] = row.p.text[:3]
        cell['Borough'] = (row.span.text).split('(')[0]
        cell['Neighborhood'] = (((((row.span.text).split('(')[1]).strip(')')).replace(' /',',')).replace(')',' ')).strip(' ')
        table_contents.append(cell)

#print(table_contents)

In [6]:
df=pd.DataFrame(table_contents)
df['Borough']=df['Borough'].replace({'Downtown TorontoStn A PO Boxes25 The Esplanade':'Downtown Toronto Stn A',
                                             'East TorontoBusiness reply mail Processing Centre969 Eastern':'East Toronto Business',
                                             'EtobicokeNorthwest':'Etobicoke Northwest','East YorkEast Toronto':'East York/East Toronto',
                                             'MississaugaCanada Post Gateway Processing Centre':'Mississauga'})

In [7]:
df.shape

(103, 3)

In [8]:
df.head()

Unnamed: 0,PostalCode,Borough,Neighborhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,"Regent Park, Harbourfront"
3,M6A,North York,"Lawrence Manor, Lawrence Heights"
4,M7A,Queen's Park,Ontario Provincial Government


### 5. Adding the geospatial data to the data frame

In [9]:
df1 = df[df.Borough != 'Not assigned']
df1 = df1.sort_values(by=['PostalCode','Borough'])

df1.reset_index(inplace=True)
df1.drop('index',axis=1,inplace=True)

In [10]:
df_postcodes = df1['PostalCode']
df_postcodes.drop_duplicates(inplace=True)
df2 = pd.DataFrame(df_postcodes)
df2['Borough'] = '';
df2['Neighbourhood'] = '';


df2.reset_index(inplace=True)
df2.drop('index', axis=1, inplace=True)
df1.reset_index(inplace=True)
df1.drop('index', axis=1, inplace=True)

for i in df2.index:
    for j in df1.index:
        if df2.iloc[i, 0] == df1.iloc[j, 0]:
            df2.iloc[i, 1] = df1.iloc[j, 1]
            df2.iloc[i, 2] = df2.iloc[i, 2] + ',' + df1.iloc[j, 2]
            
for i in df2.index:
    s = df2.iloc[i, 2]
    if s[0] == ',':
        s =s [1:]

In [13]:
df2['Latitude'] = '0';
df2['Longitude'] = '0';

df_geo = pd.read_csv('Geospatial_Coordinates.csv')
df_geo.head()

Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


In [14]:
for i in df2.index:
    for j in df_geo.index:
        if df2.iloc[i, 0] == df_geo.iloc[j, 0]:
            df2.iloc[i, 3] = df_geo.iloc[j, 1]
            df2.iloc[i, 4] = df_geo.iloc[j, 2]
            
df2.head()

Unnamed: 0,PostalCode,Borough,Neighbourhood,Latitude,Longitude
0,M1B,Scarborough,",Malvern, Rouge",43.8067,-79.1944
1,M1C,Scarborough,",Rouge Hill, Port Union, Highland Creek",43.7845,-79.1605
2,M1E,Scarborough,",Guildwood, Morningside, West Hill",43.7636,-79.1887
3,M1G,Scarborough,",Woburn",43.771,-79.2169
4,M1H,Scarborough,",Cedarbrae",43.7731,-79.2395


### 6. MAP

In [22]:
df3 = df2.copy()
df3 = df3[df2.Borough.str.contains("Scarborough")]
df3.head()

Unnamed: 0,PostalCode,Borough,Neighbourhood,Latitude,Longitude
0,M1B,Scarborough,",Malvern, Rouge",43.8067,-79.1944
1,M1C,Scarborough,",Rouge Hill, Port Union, Highland Creek",43.7845,-79.1605
2,M1E,Scarborough,",Guildwood, Morningside, West Hill",43.7636,-79.1887
3,M1G,Scarborough,",Woburn",43.771,-79.2169
4,M1H,Scarborough,",Cedarbrae",43.7731,-79.2395


In [26]:
toronto_map = folium.Map(location=[43.65, -79.4], zoom_start=12)

X = df3['Latitude']
Y = df3['Longitude']
Z = np.stack((X, Y), axis=1)

kmeans = KMeans(n_clusters=4, random_state=0).fit(Z)

clusters = kmeans.labels_
colors = ['red', 'green', 'blue', 'yellow']
df3['Cluster'] = clusters

for latitude, longitude, borough, cluster in zip(df3['Latitude'], df3['Longitude'], df3['Borough'], df3['Cluster']):
    label = folium.Popup(borough, parse_html=True)
    folium.CircleMarker(
        [latitude, longitude],
        radius=5,
        popup=label,
        color='black',
        fill=True,
        fill_color=colors[cluster],
        fill_opacity=0.7).add_to(toronto_map)  

toronto_map

### 1.1.1 Load Toronto geospatial cooridinates and merge to Toronto Postal Code Data

In [34]:
# Load Toronto geospatial cooridinates
!wget -O Geospatial_Coordinates.csv http://cocl.us/Geospatial_data

#Read into dataframe
gf = pd.read_csv('Geospatial_Coordinates.csv')

#rename the coloumns so the match
gf = gf.rename(columns={'Postal Code':'PostalCode'})

#Merge the Toronto data with geo cooridinate data
gf_new = pd.merge(df2, gf, on='PostalCode', how='inner')

# display the new dataframe
gf_new.head()

'wget' is not recognized as an internal or external command,
operable program or batch file.


Unnamed: 0,PostalCode,Borough,Neighbourhood,Latitude_x,Longitude_x,Latitude_y,Longitude_y
0,M1B,Scarborough,",Malvern, Rouge",43.8067,-79.1944,43.806686,-79.194353
1,M1C,Scarborough,",Rouge Hill, Port Union, Highland Creek",43.7845,-79.1605,43.784535,-79.160497
2,M1E,Scarborough,",Guildwood, Morningside, West Hill",43.7636,-79.1887,43.763573,-79.188711
3,M1G,Scarborough,",Woburn",43.771,-79.2169,43.770992,-79.216917
4,M1H,Scarborough,",Cedarbrae",43.7731,-79.2395,43.773136,-79.239476


### 1.2 Toronto neighborhoods populations broken down by postal code

In [35]:
# Load this data from Stats Canada
df_pop = pd.read_csv('https://www12.statcan.gc.ca/census-recensement/2016/dp-pd/hlt-fst/pd-pl/Tables/File.cfm?T=1201&SR=1&RPP=9999&PR=0&CMA=0&CSD=0&S=22&O=A&Lang=Eng&OFT=CSV',encoding = 'unicode_escape')
# Rename the columns appropiatley
df_pop = df_pop.rename(columns={'Geographic code':'PostalCode', 'Geographic name':'PostalCod2', 'Province or territory':'Province', 'Incompletely enumerated Indian reserves and Indian settlements, 2016':'Incomplete', 'Population, 2016':'Population_2016', 'Total private dwellings, 2016':'TotalPrivDwellings', 'Private dwellings occupied by usual residents, 2016':'PrivDwellingsOccupied'})
df_pop= df_pop.drop(columns=['PostalCod2', 'Province', 'Incomplete', 'TotalPrivDwellings', 'PrivDwellingsOccupied'])

# Get rid of the first row 
df_pop = df_pop.iloc[1:]
df_pop.head()

Unnamed: 0,PostalCode,Population_2016
1,A0A,46587.0
2,A0B,19792.0
3,A0C,12587.0
4,A0E,22294.0
5,A0G,35266.0


### 1.2.1 Merge Toronto Neighbourhood populations data with Toronto Postal Code data

In [36]:
#Merge the Toronto Pop data with geo postalcode data
gf_new
gf_new = pd.merge(df_pop, gf_new, on='PostalCode', how='right')
# sort on population
gf_new = gf_new.sort_values(by=['Population_2016'], ascending=False)

# display the new dataframe
gf_new.head()

Unnamed: 0,PostalCode,Population_2016,Borough,Neighbourhood,Latitude_x,Longitude_x,Latitude_y,Longitude_y
22,M2N,75897.0,North York,",Willowdale South",43.7701,-79.4085,43.77012,-79.408493
0,M1B,66108.0,Scarborough,",Malvern, Rouge",43.8067,-79.1944,43.806686,-79.194353
18,M2J,58293.0,North York,",Fairview, Henry Farm, Oriole",43.7785,-79.3466,43.778517,-79.346556
101,M9V,55959.0,Etobicoke,",South Steeles, Silverstone, Humbergate, James...",43.7394,-79.5884,43.739416,-79.588437
14,M1V,54680.0,Scarborough,",Milliken, Agincourt North, Steeles East, L'Am...",43.8153,-79.2846,43.815252,-79.284577


#### Key Observation here is a list of Toronto Neighborhoods Populations

### 1.3 Toronto neighborhoods average after tax income broken down by postal code

In [48]:
# It was easier to extract this data manually from Stats Canada and load it then it was to scrape it.
# It was only accessible from indeividual queries per postal code on the statscan web site.
df_income = pd.read_csv('TorontoAvgIncomeByPC.csv',encoding = 'unicode_escape')
# Rename the after tax income column to a more maanageable name
df_income = df_income.rename(columns={"after-tax income of households in 2015":"AfterTaxIncome2015"})
df_income.head()

Unnamed: 0,PostalCode,AfterTaxIncome2015,Population_2016,Bourough,Neighborhood,Latitude,Longitude
66,M2P,115237.0,7843.0,North York,York Mills West,43.752758,-79.400049
55,M5M,111821.0,25975.0,North York,"Bedford Park, Lawrence Manor East",43.733283,-79.41975
61,M4N,109841.0,15330.0,Central Toronto,Lawrence Park,43.72802,-79.38879
74,M5R,108271.0,26496.0,Central Toronto,"The Annex, North Midtown, Yorkville",43.67271,-79.405678
98,M8X,97210.0,10787.0,Etobicoke,"The Kingsway, Montgomery Road, Old Mill North",43.653654,-79.506944


### 1.3.1 Merge Toronto Neighbourhood income data with Toronto Postal Code data

In [50]:
#Merge the Toronto Income data with geo postalcode data

gf_new = pd.merge(df_income, gf_new, on='PostalCode', how='right')
# get rid of the Nulls
gf_new = gf_new.replace('Null', 0)
#gf_new cast as float
gf_new['AfterTaxIncome2015'] = gf_new['AfterTaxIncome2015'].astype('float64') 
# Sort on Income
gf_new = gf_new.sort_values(by=['AfterTaxIncome2015'], ascending=False)

# display the new dataframe
gf_new.to_csv('TO_Affluence.csv')
gf_new.head(10)

Unnamed: 0,PostalCode,AfterTaxIncome2015,Population_2016_x,Bourough,Neighborhood,Latitude,Longitude,AfterTaxIncome2015_x,Population_2016_y,Bourough_x,Neighborhood_x,Latitude_x,Longitude_x,AfterTaxIncome2015_y,Population_2016_x.1,Bourough_y,Neighborhood_y,Latitude_y,Longitude_y,Population_2016_y.1,Borough,Neighbourhood,Latitude_x.1,Longitude_x.1,Latitude_y.1,Longitude_y.1
0,M2P,115237.0,7843.0,North York,York Mills West,43.752758,-79.400049,115237.0,7843.0,North York,York Mills West,43.752758,-79.400049,115237.0,7843.0,North York,York Mills West,43.752758,-79.400049,7843.0,North York,",York Mills West",43.752758,-79.400049,43.752758,-79.400049
1,M5M,111821.0,25975.0,North York,"Bedford Park, Lawrence Manor East",43.733283,-79.41975,111821.0,25975.0,North York,"Bedford Park, Lawrence Manor East",43.733283,-79.41975,111821.0,25975.0,North York,"Bedford Park, Lawrence Manor East",43.733283,-79.41975,25975.0,North York,",Bedford Park, Lawrence Manor East",43.733283,-79.41975,43.733283,-79.41975
2,M4N,109841.0,15330.0,Central Toronto,Lawrence Park,43.72802,-79.38879,109841.0,15330.0,Central Toronto,Lawrence Park,43.72802,-79.38879,109841.0,15330.0,Central Toronto,Lawrence Park,43.72802,-79.38879,15330.0,Central Toronto,",Lawrence Park",43.72802,-79.38879,43.72802,-79.38879
3,M5R,108271.0,26496.0,Central Toronto,"The Annex, North Midtown, Yorkville",43.67271,-79.405678,108271.0,26496.0,Central Toronto,"The Annex, North Midtown, Yorkville",43.67271,-79.405678,108271.0,26496.0,Central Toronto,"The Annex, North Midtown, Yorkville",43.67271,-79.405678,26496.0,Central Toronto,",The Annex, North Midtown, Yorkville",43.67271,-79.405678,43.67271,-79.405678
4,M8X,97210.0,10787.0,Etobicoke,"The Kingsway, Montgomery Road, Old Mill North",43.653654,-79.506944,97210.0,10787.0,Etobicoke,"The Kingsway, Montgomery Road, Old Mill North",43.653654,-79.506944,97210.0,10787.0,Etobicoke,"The Kingsway, Montgomery Road, Old Mill North",43.653654,-79.506944,10787.0,Etobicoke,",The Kingsway, Montgomery Road, Old Mill North",43.653654,-79.506944,43.653654,-79.506944
5,M2L,96512.0,11717.0,North York,"Silver Hills, York Mills",43.75749,-79.374714,96512.0,11717.0,North York,"Silver Hills, York Mills",43.75749,-79.374714,96512.0,11717.0,North York,"Silver Hills, York Mills",43.75749,-79.374714,11717.0,North York,",York Mills, Silver Hills",43.75749,-79.374714,43.75749,-79.374714
6,M4G,94853.0,19076.0,East York,Leaside,43.70906,-79.363452,94853.0,19076.0,East York,Leaside,43.70906,-79.363452,94853.0,19076.0,East York,Leaside,43.70906,-79.363452,19076.0,East York,",Leaside",43.70906,-79.363452,43.70906,-79.363452
7,M1C,93943.0,35626.0,Scarborough,"Highland Creek, Rouge Hill, Port Union",43.784535,-79.160497,93943.0,35626.0,Scarborough,"Highland Creek, Rouge Hill, Port Union",43.784535,-79.160497,93943.0,35626.0,Scarborough,"Highland Creek, Rouge Hill, Port Union",43.784535,-79.160497,35626.0,Scarborough,",Rouge Hill, Port Union, Highland Creek",43.784535,-79.160497,43.784535,-79.160497
8,M9B,91110.0,32400.0,Etobicoke,"Cloverdale, Islington, Martin Grove, Princess ...",43.650943,-79.554724,91110.0,32400.0,Etobicoke,"Cloverdale, Islington, Martin Grove, Princess ...",43.650943,-79.554724,91110.0,32400.0,Etobicoke,"Cloverdale, Islington, Martin Grove, Princess ...",43.650943,-79.554724,32400.0,Etobicoke,",West Deane Park, Princess Gardens, Martin Gro...",43.650943,-79.554724,43.650943,-79.554724
9,M3B,90841.0,13324.0,North York,Don Mills North,43.745906,-79.352188,90841.0,13324.0,North York,Don Mills North,43.745906,-79.352188,90841.0,13324.0,North York,Don Mills North,43.745906,-79.352188,13324.0,North York,",Don Mills North",43.745906,-79.352188,43.745906,-79.352188


#### Key Observation Toronto Affluence by Neighborhood

### 1.4 What is the Canadian National Average After Tax Income.

###### Again obtained from the Stats Canada Website Canadian families and unattached individuals had a median after-tax income of $57,000 in 2016.

### 1.5 Toronto list of Restaurants or Venues that could potentially use Restaurant Equipment.

In [70]:
#FourSquare Credentials

CLIENT_ID = '0NXJMHYD1AEEL2REWVFDTXH2AQJA2NHYPXMLE0FG1LMSXZRW' # your Foursquare ID


CLIENT_SECRET = '2NC4I3TZWQW4TDFXD5LVEO4JDJUYEQPATUXTBM0WZWUOEJE5' # your Foursquare Secret


VERSION = '20180605' # Foursquare API version

In [52]:
#Let's explore neighborhoods in our dataframe.
import requests # library to handle requests
from pandas.io.json import json_normalize # tranform JSON file into a pandas dataframe

LIMIT = 200 # limit of number of venues returned by Foursquare API

radius = 500 # define radius

def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

In [53]:
# Toronto Bouroughs
TO_data = gf_new
TO_data.head()

Unnamed: 0,PostalCode,AfterTaxIncome2015,Population_2016_x,Bourough,Neighborhood,Latitude,Longitude,AfterTaxIncome2015_x,Population_2016_y,Bourough_x,Neighborhood_x,Latitude_x,Longitude_x,AfterTaxIncome2015_y,Population_2016_x.1,Bourough_y,Neighborhood_y,Latitude_y,Longitude_y,Population_2016_y.1,Borough,Neighbourhood,Latitude_x.1,Longitude_x.1,Latitude_y.1,Longitude_y.1
0,M2P,115237.0,7843.0,North York,York Mills West,43.752758,-79.400049,115237.0,7843.0,North York,York Mills West,43.752758,-79.400049,115237.0,7843.0,North York,York Mills West,43.752758,-79.400049,7843.0,North York,",York Mills West",43.752758,-79.400049,43.752758,-79.400049
1,M5M,111821.0,25975.0,North York,"Bedford Park, Lawrence Manor East",43.733283,-79.41975,111821.0,25975.0,North York,"Bedford Park, Lawrence Manor East",43.733283,-79.41975,111821.0,25975.0,North York,"Bedford Park, Lawrence Manor East",43.733283,-79.41975,25975.0,North York,",Bedford Park, Lawrence Manor East",43.733283,-79.41975,43.733283,-79.41975
2,M4N,109841.0,15330.0,Central Toronto,Lawrence Park,43.72802,-79.38879,109841.0,15330.0,Central Toronto,Lawrence Park,43.72802,-79.38879,109841.0,15330.0,Central Toronto,Lawrence Park,43.72802,-79.38879,15330.0,Central Toronto,",Lawrence Park",43.72802,-79.38879,43.72802,-79.38879
3,M5R,108271.0,26496.0,Central Toronto,"The Annex, North Midtown, Yorkville",43.67271,-79.405678,108271.0,26496.0,Central Toronto,"The Annex, North Midtown, Yorkville",43.67271,-79.405678,108271.0,26496.0,Central Toronto,"The Annex, North Midtown, Yorkville",43.67271,-79.405678,26496.0,Central Toronto,",The Annex, North Midtown, Yorkville",43.67271,-79.405678,43.67271,-79.405678
4,M8X,97210.0,10787.0,Etobicoke,"The Kingsway, Montgomery Road, Old Mill North",43.653654,-79.506944,97210.0,10787.0,Etobicoke,"The Kingsway, Montgomery Road, Old Mill North",43.653654,-79.506944,97210.0,10787.0,Etobicoke,"The Kingsway, Montgomery Road, Old Mill North",43.653654,-79.506944,10787.0,Etobicoke,",The Kingsway, Montgomery Road, Old Mill North",43.653654,-79.506944,43.653654,-79.506944


### 1.5.1 Get all the Venues in Toronto.

In [54]:
# Get all of the Venues
TO_venues = getNearbyVenues(names=TO_data['Neighborhood'],
                                   latitudes=TO_data['Latitude'],
                                   longitudes=TO_data['Longitude']
                                  )

York Mills West
Bedford Park, Lawrence Manor East
Lawrence Park
The Annex, North Midtown, Yorkville
The Kingsway, Montgomery Road, Old Mill North
Silver Hills, York Mills
Leaside
Highland Creek, Rouge Hill, Port Union
Cloverdale, Islington, Martin Grove, Princess Gardens, West Deane Park
Don Mills North
CN Tower, Bathurst Quay, Island airport, Harbourfront West, King and Spadina, Railway Lands, South Niagara
East Toronto
Upper Rouge
Birch Cliff, Cliffside West
Little Portugal, Trinity
Rosedale
Moore Park, Summerhill East
Roselawn
The Beaches
Kingsway Park South West, Mimico NW, The Queensway West, Royal York South West, South of Bloor
North Toronto West
Runnymede, Swansea
Berczy Park
Forest Hill North, Forest Hill West
Woodbine Heights
Deer Park, Forest Hill SE, Rathnelly, South Hill, Summerhill West
Harbourfront East, Toronto Islands, Union Station
Adelaide, King, Richmond
St. James Town
The Beaches West, India Bazaar
Bloordale Gardens, Eringate, Markland Wood, Old Burnhamthorpe
Kings

##### Let's count the number of Venues per Neighorhood

In [55]:
TO_venues.groupby('Neighborhood').count()

Unnamed: 0_level_0,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
"Adelaide, King, Richmond",94,94,94,94,94,94
Agincourt,4,4,4,4,4,4
"Agincourt North, L'Amoreaux East, Milliken, Steeles East",4,4,4,4,4,4
"Albion Gardens, Beaumond Heights, Humbergate, Jamestown, Mount Olive, Silverstone, South Steeles, Thistletown",9,9,9,9,9,9
"Alderwood, Long Branch",7,7,7,7,7,7
"Bathurst Manor, Downsview North, Wilson Heights",22,22,22,22,22,22
Bayview Village,4,4,4,4,4,4
"Bedford Park, Lawrence Manor East",28,28,28,28,28,28
Berczy Park,60,60,60,60,60,60
"Birch Cliff, Cliffside West",4,4,4,4,4,4


In [56]:
# Let's pick out restaurants from Venue Categories

print('Unique Venue Categories:')
list(TO_venues['Venue Category'].unique())

Unique Venue Categories:


['Convenience Store',
 'Park',
 'Construction & Landscaping',
 'Café',
 'Restaurant',
 'Comfort Food Restaurant',
 'Pub',
 'Indian Restaurant',
 'Italian Restaurant',
 'Sushi Restaurant',
 'Thai Restaurant',
 'Juice Bar',
 'Greek Restaurant',
 'Liquor Store',
 'Coffee Shop',
 'Pharmacy',
 'Sandwich Place',
 'Grocery Store',
 'Butcher',
 'American Restaurant',
 'Pizza Place',
 'Fast Food Restaurant',
 'Japanese Restaurant',
 'Spa',
 'Pet Store',
 'Boutique',
 'Cupcake Shop',
 'Business Service',
 'Swim School',
 'Bus Line',
 'Burger Joint',
 'BBQ Joint',
 'Middle Eastern Restaurant',
 'History Museum',
 'Donut Shop',
 'Flower Shop',
 'River',
 'Martial Arts School',
 'Sporting Goods Shop',
 'Sports Bar',
 'Fish & Chips Shop',
 'Bike Shop',
 'Supermarket',
 'Smoothie Shop',
 'Shopping Mall',
 'Department Store',
 'Dessert Shop',
 'Bank',
 'Brewery',
 'Breakfast Spot',
 'Electronics Store',
 'Beer Store',
 'Furniture / Home Store',
 'Mexican Restaurant',
 'Bagel Shop',
 'Bar',
 'Caribbean

### 1.5.2 Only add Restaurants as Venue Categories

In [57]:
# Here we manually pick out restaurants or 'features' from the unique venue list and that we want to examine for similiarity during clustering
rest_list = ['Steakhouse', 'Coffee Shop', 'Café', 'Ramen Restaurant', 'Indonesian Restaurant', 'Restaurant', 'Japanese Restaurant', 
             'Fast Food Restaurant', 'Sushi Restaurant', 'Vietnamese Restaurant', 'Pizza Place', 'Sandwich Place', 'Middle Eastern Restaurant', 
             'Burger Joint', 'American Restaurant', 'Food Court', 'Wings Joint', 'Burrito Place', 'Asian Restaurant', 'Deli / Bodega', 
             'Greek Restaurant', 'Fried Chicken Joint', 'Airport Food Court', 'Chinese Restaurant', 'Breakfast Spot', 'Mexican Restaurant',
             'Indian Restaurant', 'Latin American Restaurant', 'Bar', 'Pub', 'Italian Restaurant', 'French Restaurant', 'Ice Cream Shop', 
             'Caribbean Restaurant', 'Gastropub', 'Thai Restaurant', 'Cajun / Creole Restaurant', 'Diner', 'Dim Sum Restaurant', 'Seafood Restaurant', 
             'Food & Drink Shop', 'Noodle House', 'Food', 'Fish & Chips Shop', 'Falafel Restaurant', 'Gourmet Shop', 'Vegetarian / Vegan Restaurant', 
             'South American Restaurant', 'Korean Restaurant', 'Cuban Restaurant', 'New American Restaurant', 'Malay Restaurant', 'Mac & Cheese Joint',
             'Bistro', 'Southern / Soul Food Restaurant', 'Tapas Restaurant',  'Sports Bar', 'Polish Restaurant', 'Ethiopian Restaurant', 
             'Creperie', 'Sake Bar', 'Persian Restaurant', 'Afghan Restaurant','Mediterranean Restaurant', 'BBQ Joint', 'Jewish Restaurant', 
             'Comfort Food Restaurant',  'Hakka Restaurant', 'Food Truck', 'Taiwanese Restaurant',  'Snack Place', 'Eastern European Restaurant', 
             'Dumpling Restaurant', 'Belgian Restaurant', 'Arepa Restaurant', 'Taco Place', 'Doner Restaurant', 'Filipino Restaurant', 
             'Hotpot Restaurant', 'Poutine Place', 'Salad Place',  'Portuguese Restaurant', 'Modern European Restaurant', 'Empanada Restaurant', 
             'Irish Pub', 'Molecular Gastronomy Restaurant', 'German Restaurant', 'Brazilian Restaurant', 'Gluten-free Restaurant', 'Soup Place']

rest_pd = pd.DataFrame(rest_list)
#rest_pd
#rename the coloumns so the match
rest_pd = rest_pd.rename(columns={0:'Venue Category'})

#Join the 2 dataframes as instructed
TO_new = pd.merge(TO_venues, rest_pd, on='Venue Category', how='right')

# display the new dataframe
#TO_new

TO_new.groupby('Neighborhood').count()

Unnamed: 0_level_0,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
"Adelaide, King, Richmond",55,55,55,55,55,55
Agincourt,2,2,2,2,2,2
"Albion Gardens, Beaumond Heights, Humbergate, Jamestown, Mount Olive, Silverstone, South Steeles, Thistletown",4,4,4,4,4,4
"Alderwood, Long Branch",5,5,5,5,5,5
"Bathurst Manor, Downsview North, Wilson Heights",11,11,11,11,11,11
Bayview Village,3,3,3,3,3,3
"Bedford Park, Lawrence Manor East",19,19,19,19,19,19
Berczy Park,26,26,26,26,26,26
"Birch Cliff, Cliffside West",1,1,1,1,1,1
"Bloordale Gardens, Eringate, Markland Wood, Old Burnhamthorpe",3,3,3,3,3,3


### 1.5.3 OneHot encode and count restaurants

In [58]:
# one hot encoding
TO_new_onehot = pd.get_dummies(TO_new[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
TO_new_onehot['Neighborhood'] = TO_new['Neighborhood'] 


# move neighborhood column to the first column
fixed_columns = [TO_new_onehot.columns[-1]] + list(TO_new_onehot.columns[:-1])
TO_new_onehot = TO_new_onehot[fixed_columns]

TO_new_onehot.head()

Unnamed: 0,Neighborhood,Afghan Restaurant,Airport Food Court,American Restaurant,Arepa Restaurant,Asian Restaurant,BBQ Joint,Bar,Belgian Restaurant,Bistro,Brazilian Restaurant,Breakfast Spot,Burger Joint,Burrito Place,Café,Cajun / Creole Restaurant,Caribbean Restaurant,Chinese Restaurant,Coffee Shop,Comfort Food Restaurant,Creperie,Cuban Restaurant,Deli / Bodega,Dim Sum Restaurant,Diner,Doner Restaurant,Dumpling Restaurant,Eastern European Restaurant,Empanada Restaurant,Ethiopian Restaurant,Falafel Restaurant,Fast Food Restaurant,Filipino Restaurant,Fish & Chips Shop,Food,Food & Drink Shop,Food Court,Food Truck,French Restaurant,Fried Chicken Joint,Gastropub,German Restaurant,Gluten-free Restaurant,Gourmet Shop,Greek Restaurant,Hakka Restaurant,Hotpot Restaurant,Ice Cream Shop,Indian Restaurant,Indonesian Restaurant,Irish Pub,Italian Restaurant,Japanese Restaurant,Jewish Restaurant,Korean Restaurant,Latin American Restaurant,Mac & Cheese Joint,Malay Restaurant,Mediterranean Restaurant,Mexican Restaurant,Middle Eastern Restaurant,Modern European Restaurant,Molecular Gastronomy Restaurant,New American Restaurant,Noodle House,Persian Restaurant,Pizza Place,Polish Restaurant,Portuguese Restaurant,Poutine Place,Pub,Ramen Restaurant,Restaurant,Sake Bar,Salad Place,Sandwich Place,Seafood Restaurant,Snack Place,Soup Place,South American Restaurant,Southern / Soul Food Restaurant,Sports Bar,Steakhouse,Sushi Restaurant,Taco Place,Taiwanese Restaurant,Tapas Restaurant,Thai Restaurant,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Wings Joint
0,Berczy Park,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0
1,"Harbourfront East, Toronto Islands, Union Station",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0
2,"Adelaide, King, Richmond",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0
3,"Adelaide, King, Richmond",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0
4,"The Beaches West, India Bazaar",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0


In [59]:
#Analyze each neighbourhood



TO_grouped = TO_new_onehot.groupby('Neighborhood').mean().reset_index()
TO_grouped.shape


TO_grouped.head()

Unnamed: 0,Neighborhood,Afghan Restaurant,Airport Food Court,American Restaurant,Arepa Restaurant,Asian Restaurant,BBQ Joint,Bar,Belgian Restaurant,Bistro,Brazilian Restaurant,Breakfast Spot,Burger Joint,Burrito Place,Café,Cajun / Creole Restaurant,Caribbean Restaurant,Chinese Restaurant,Coffee Shop,Comfort Food Restaurant,Creperie,Cuban Restaurant,Deli / Bodega,Dim Sum Restaurant,Diner,Doner Restaurant,Dumpling Restaurant,Eastern European Restaurant,Empanada Restaurant,Ethiopian Restaurant,Falafel Restaurant,Fast Food Restaurant,Filipino Restaurant,Fish & Chips Shop,Food,Food & Drink Shop,Food Court,Food Truck,French Restaurant,Fried Chicken Joint,Gastropub,German Restaurant,Gluten-free Restaurant,Gourmet Shop,Greek Restaurant,Hakka Restaurant,Hotpot Restaurant,Ice Cream Shop,Indian Restaurant,Indonesian Restaurant,Irish Pub,Italian Restaurant,Japanese Restaurant,Jewish Restaurant,Korean Restaurant,Latin American Restaurant,Mac & Cheese Joint,Malay Restaurant,Mediterranean Restaurant,Mexican Restaurant,Middle Eastern Restaurant,Modern European Restaurant,Molecular Gastronomy Restaurant,New American Restaurant,Noodle House,Persian Restaurant,Pizza Place,Polish Restaurant,Portuguese Restaurant,Poutine Place,Pub,Ramen Restaurant,Restaurant,Sake Bar,Salad Place,Sandwich Place,Seafood Restaurant,Snack Place,Soup Place,South American Restaurant,Southern / Soul Food Restaurant,Sports Bar,Steakhouse,Sushi Restaurant,Taco Place,Taiwanese Restaurant,Tapas Restaurant,Thai Restaurant,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Wings Joint
0,"Adelaide, King, Richmond",0.0,0.0,0.036364,0.0,0.018182,0.0,0.018182,0.0,0.0,0.018182,0.018182,0.018182,0.036364,0.090909,0.0,0.0,0.0,0.181818,0.0,0.0,0.0,0.054545,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.018182,0.0,0.0,0.0,0.018182,0.0,0.0,0.0,0.0,0.018182,0.0,0.018182,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.018182,0.0,0.0,0.018182,0.0,0.0,0.018182,0.0,0.0,0.018182,0.0,0.018182,0.018182,0.0,0.036364,0.0,0.0,0.0,0.0,0.0,0.072727,0.0,0.036364,0.0,0.018182,0.0,0.018182,0.0,0.0,0.0,0.036364,0.036364,0.0,0.0,0.0,0.054545,0.018182,0.0,0.0
1,Agincourt,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,"Albion Gardens, Beaumond Heights, Humbergate, ...",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,"Alderwood, Long Branch",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.4,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,"Bathurst Manor, Downsview North, Wilson Heights",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.181818,0.0,0.0,0.0,0.090909,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.0,0.0


## 2. Begin to Cluster
#### Use silhouette score to find optimal number of clusters to segment the data

In [61]:
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
import numpy as np


TO_grouped_clustering = TO_grouped.drop('Neighborhood', 1)

# Use silhouette score to find optimal number of clusters to segment the data
kclusters = np.arange(2,10)
results = {}
for size in kclusters:
    model = KMeans(n_clusters = size).fit(TO_grouped_clustering)
    predictions = model.predict(TO_grouped_clustering)
    results[size] = silhouette_score(TO_grouped_clustering, predictions)

best_size = max(results, key=results.get)
best_size

2

### 2.1 Run K means and segment data into clusters and generate labels

In [62]:
#import k-means from clustering stage
from sklearn.cluster import KMeans

# set number of clusters
kclusters = best_size


# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(TO_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10]

array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1])

In [63]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    return row_categories_sorted.index.values[0:num_top_venues]

num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = TO_grouped['Neighborhood']

for ind in np.arange(TO_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(TO_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,"Adelaide, King, Richmond",Coffee Shop,Café,Restaurant,Deli / Bodega,Thai Restaurant,American Restaurant,Sushi Restaurant,Steakhouse,Burrito Place,Salad Place
1,Agincourt,Breakfast Spot,Latin American Restaurant,Food,Doner Restaurant,Dumpling Restaurant,Eastern European Restaurant,Empanada Restaurant,Ethiopian Restaurant,Falafel Restaurant,Fast Food Restaurant
2,"Albion Gardens, Beaumond Heights, Humbergate, ...",Fast Food Restaurant,Fried Chicken Joint,Sandwich Place,Pizza Place,Wings Joint,Diner,Doner Restaurant,Dumpling Restaurant,Eastern European Restaurant,Empanada Restaurant
3,"Alderwood, Long Branch",Pizza Place,Pub,Coffee Shop,Sandwich Place,Falafel Restaurant,Dim Sum Restaurant,Diner,Doner Restaurant,Dumpling Restaurant,Eastern European Restaurant
4,"Bathurst Manor, Downsview North, Wilson Heights",Coffee Shop,Pizza Place,Sandwich Place,Diner,Fried Chicken Joint,Ice Cream Shop,Middle Eastern Restaurant,Sushi Restaurant,Restaurant,Deli / Bodega


### 2.2 Merge the Toronto data with geo coordinates data and make sure it's the right shape

In [64]:
#Merge the Toronto data with geo cooridinate data and make sure it's the right shape
TO_labels = pd.merge(TO_data,TO_grouped, on='Neighborhood', how='right')
TO_labels.shape


TO_labels = TO_labels.drop(columns=['Steakhouse', 'Coffee Shop', 'Café', 'Ramen Restaurant', 'Indonesian Restaurant', 'Restaurant', 'Japanese Restaurant', 
             'Fast Food Restaurant', 'Sushi Restaurant', 'Vietnamese Restaurant', 'Pizza Place', 'Sandwich Place', 'Middle Eastern Restaurant', 
             'Burger Joint', 'American Restaurant', 'Food Court', 'Wings Joint', 'Burrito Place', 'Asian Restaurant', 'Deli / Bodega', 
             'Greek Restaurant', 'Fried Chicken Joint', 'Airport Food Court', 'Chinese Restaurant', 'Breakfast Spot', 'Mexican Restaurant',
             'Indian Restaurant', 'Latin American Restaurant', 'Bar', 'Pub', 'Italian Restaurant', 'French Restaurant', 'Ice Cream Shop', 
             'Caribbean Restaurant', 'Gastropub', 'Thai Restaurant', 'Cajun / Creole Restaurant', 'Diner', 'Dim Sum Restaurant', 'Seafood Restaurant', 
             'Food & Drink Shop', 'Noodle House', 'Food', 'Fish & Chips Shop', 'Falafel Restaurant', 'Gourmet Shop', 'Vegetarian / Vegan Restaurant', 
             'South American Restaurant', 'Korean Restaurant', 'Cuban Restaurant', 'New American Restaurant', 'Malay Restaurant', 'Mac & Cheese Joint',
             'Bistro', 'Southern / Soul Food Restaurant', 'Tapas Restaurant',  'Sports Bar', 'Polish Restaurant', 'Ethiopian Restaurant', 
             'Creperie', 'Sake Bar', 'Persian Restaurant', 'Afghan Restaurant','Mediterranean Restaurant', 'BBQ Joint', 'Jewish Restaurant', 
             'Comfort Food Restaurant',  'Hakka Restaurant', 'Food Truck', 'Taiwanese Restaurant',  'Snack Place', 'Eastern European Restaurant', 
             'Dumpling Restaurant', 'Belgian Restaurant', 'Arepa Restaurant', 'Taco Place', 'Doner Restaurant', 'Filipino Restaurant', 
             'Hotpot Restaurant', 'Poutine Place', 'Salad Place',  'Portuguese Restaurant', 'Modern European Restaurant', 'Empanada Restaurant', 
             'Irish Pub', 'Molecular Gastronomy Restaurant', 'German Restaurant', 'Brazilian Restaurant', 'Gluten-free Restaurant', 'Soup Place'])
TO_labels.head()

Unnamed: 0,PostalCode,AfterTaxIncome2015,Population_2016_x,Bourough,Neighborhood,Latitude,Longitude,AfterTaxIncome2015_x,Population_2016_y,Bourough_x,Neighborhood_x,Latitude_x,Longitude_x,AfterTaxIncome2015_y,Population_2016_x.1,Bourough_y,Neighborhood_y,Latitude_y,Longitude_y,Population_2016_y.1,Borough,Neighbourhood,Latitude_x.1,Longitude_x.1,Latitude_y.1,Longitude_y.1
0,M5H,70571.0,2005.0,Downtown Toronto,"Adelaide, King, Richmond",43.650571,-79.384568,70571.0,2005.0,Downtown Toronto,"Adelaide, King, Richmond",43.650571,-79.384568,70571.0,2005.0,Downtown Toronto,"Adelaide, King, Richmond",43.650571,-79.384568,2005.0,Downtown Toronto,",Richmond, Adelaide, King",43.650571,-79.384568,43.650571,-79.384568
1,M1S,57738.0,37769.0,Scarborough,Agincourt,43.7942,-79.262029,57738.0,37769.0,Scarborough,Agincourt,43.7942,-79.262029,57738.0,37769.0,Scarborough,Agincourt,43.7942,-79.262029,37769.0,Scarborough,",Agincourt",43.7942,-79.262029,43.7942,-79.262029
2,M9V,55443.0,55959.0,Etobicoke,"Albion Gardens, Beaumond Heights, Humbergate, ...",43.739416,-79.588437,55443.0,55959.0,Etobicoke,"Albion Gardens, Beaumond Heights, Humbergate, ...",43.739416,-79.588437,55443.0,55959.0,Etobicoke,"Albion Gardens, Beaumond Heights, Humbergate, ...",43.739416,-79.588437,55959.0,Etobicoke,",South Steeles, Silverstone, Humbergate, James...",43.739416,-79.588437,43.739416,-79.588437
3,M8W,63602.0,20674.0,Etobicoke,"Alderwood, Long Branch",43.602414,-79.543484,63602.0,20674.0,Etobicoke,"Alderwood, Long Branch",43.602414,-79.543484,63602.0,20674.0,Etobicoke,"Alderwood, Long Branch",43.602414,-79.543484,20674.0,Etobicoke,",Alderwood, Long Branch",43.602414,-79.543484,43.602414,-79.543484
4,M3H,63342.0,37011.0,North York,"Bathurst Manor, Downsview North, Wilson Heights",43.754328,-79.442259,63342.0,37011.0,North York,"Bathurst Manor, Downsview North, Wilson Heights",43.754328,-79.442259,63342.0,37011.0,North York,"Bathurst Manor, Downsview North, Wilson Heights",43.754328,-79.442259,37011.0,North York,",Bathurst Manor, Wilson Heights, Downsview North",43.754328,-79.442259,43.754328,-79.442259


### 2.3 Add the KMeans Labels

In [65]:
TO_merged = TO_labels

# add clustering labels
TO_merged['Cluster Labels'] = kmeans.labels_

# merge toronto_grouped with toronto_data to add latitude/longitude for each neighborhood
TO_merged = TO_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

TO_merged.head() # check the last columns!

Unnamed: 0,PostalCode,AfterTaxIncome2015,Population_2016_x,Bourough,Neighborhood,Latitude,Longitude,AfterTaxIncome2015_x,Population_2016_y,Bourough_x,Neighborhood_x,Latitude_x,Longitude_x,AfterTaxIncome2015_y,Population_2016_x.1,Bourough_y,Neighborhood_y,Latitude_y,Longitude_y,Population_2016_y.1,Borough,Neighbourhood,Latitude_x.1,Longitude_x.1,Latitude_y.1,Longitude_y.1,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M5H,70571.0,2005.0,Downtown Toronto,"Adelaide, King, Richmond",43.650571,-79.384568,70571.0,2005.0,Downtown Toronto,"Adelaide, King, Richmond",43.650571,-79.384568,70571.0,2005.0,Downtown Toronto,"Adelaide, King, Richmond",43.650571,-79.384568,2005.0,Downtown Toronto,",Richmond, Adelaide, King",43.650571,-79.384568,43.650571,-79.384568,1,Coffee Shop,Café,Restaurant,Deli / Bodega,Thai Restaurant,American Restaurant,Sushi Restaurant,Steakhouse,Burrito Place,Salad Place
1,M1S,57738.0,37769.0,Scarborough,Agincourt,43.7942,-79.262029,57738.0,37769.0,Scarborough,Agincourt,43.7942,-79.262029,57738.0,37769.0,Scarborough,Agincourt,43.7942,-79.262029,37769.0,Scarborough,",Agincourt",43.7942,-79.262029,43.7942,-79.262029,1,Breakfast Spot,Latin American Restaurant,Food,Doner Restaurant,Dumpling Restaurant,Eastern European Restaurant,Empanada Restaurant,Ethiopian Restaurant,Falafel Restaurant,Fast Food Restaurant
2,M9V,55443.0,55959.0,Etobicoke,"Albion Gardens, Beaumond Heights, Humbergate, ...",43.739416,-79.588437,55443.0,55959.0,Etobicoke,"Albion Gardens, Beaumond Heights, Humbergate, ...",43.739416,-79.588437,55443.0,55959.0,Etobicoke,"Albion Gardens, Beaumond Heights, Humbergate, ...",43.739416,-79.588437,55959.0,Etobicoke,",South Steeles, Silverstone, Humbergate, James...",43.739416,-79.588437,43.739416,-79.588437,1,Fast Food Restaurant,Fried Chicken Joint,Sandwich Place,Pizza Place,Wings Joint,Diner,Doner Restaurant,Dumpling Restaurant,Eastern European Restaurant,Empanada Restaurant
3,M8W,63602.0,20674.0,Etobicoke,"Alderwood, Long Branch",43.602414,-79.543484,63602.0,20674.0,Etobicoke,"Alderwood, Long Branch",43.602414,-79.543484,63602.0,20674.0,Etobicoke,"Alderwood, Long Branch",43.602414,-79.543484,20674.0,Etobicoke,",Alderwood, Long Branch",43.602414,-79.543484,43.602414,-79.543484,1,Pizza Place,Pub,Coffee Shop,Sandwich Place,Falafel Restaurant,Dim Sum Restaurant,Diner,Doner Restaurant,Dumpling Restaurant,Eastern European Restaurant
4,M3H,63342.0,37011.0,North York,"Bathurst Manor, Downsview North, Wilson Heights",43.754328,-79.442259,63342.0,37011.0,North York,"Bathurst Manor, Downsview North, Wilson Heights",43.754328,-79.442259,63342.0,37011.0,North York,"Bathurst Manor, Downsview North, Wilson Heights",43.754328,-79.442259,37011.0,North York,",Bathurst Manor, Wilson Heights, Downsview North",43.754328,-79.442259,43.754328,-79.442259,1,Coffee Shop,Pizza Place,Sandwich Place,Diner,Fried Chicken Joint,Ice Cream Shop,Middle Eastern Restaurant,Sushi Restaurant,Restaurant,Deli / Bodega


In [66]:
TO_merged_new1 = TO_merged.loc[TO_merged['Cluster Labels'] == 0, TO_merged.columns[[3, 4] + list(range(5, TO_merged.shape[1]))]]
TO_merged_new1.shape

(3, 45)

In [67]:
TO_merged_new2 = TO_merged.loc[TO_merged['Cluster Labels'] == 1, TO_merged.columns[[3, 4] + list(range(5, TO_merged.shape[1]))]]
TO_merged_new2.shape

(78, 45)

### 3. Cluster 2 Contains the highest cluster density. We need to find the geographic centroid for this cluster. This is the optimum location for a new Restaurant Supply Store.

In [68]:
# Find the geographic center of the most dense or like cluster.
Cluster_0_coorid = TO_merged_new2[['Latitude', 'Longitude']]
Cluster_0_coorid = list(Cluster_0_coorid.values) 
lat = []
long = []



for l in Cluster_0_coorid:
  lat.append(l[0])
  long.append(l[1])



Blatitude = sum(lat)/len(lat)
Blongitude = sum(long)/len(long)
print(Blatitude)
print(Blongitude)

43.69831139615384
-79.38773257692307


### 3.1 Install opencage to reverse lookup the coordinates

In [73]:
# Intstall opencage to reverse lookup the cooridinates
!pip install opencage
from opencage.geocoder import OpenCageGeocode
from pprint import pprint

key = '1d97b344df184b1cb0d2427663f85ac6'
geocoder = OpenCageGeocode(key)

results = geocoder.reverse_geocode(Blatitude, Blongitude)
pprint(results)



NotAuthorizedError: Your API key is not authorized. You may have entered it incorrectly.

In [74]:
#Obtain the popupstring of the best location
popstring = TO_data[TO_data['PostalCode'].str.contains('M4S')]

def str_join(*args):
    return ''.join(map(str, args))

popstring_new = str_join('The Best Neighbourhood to locate a Restaurant Supply Store is in: ', popstring['Neighborhood'].values,  ' in ' ,  popstring['Bourough'].values)


print(popstring_new)

The Best Neighbourhood to locate a Restaurant Supply Store is in: ['Davisville'] in ['Central Toronto']


## 4. Results
### 4.1 Plot the clusters on a Map of the Toronto and Super Impose the best location of a Store

In [76]:
# getfolium
import folium 
# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i+x+(i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(TO_merged['Latitude'], TO_merged['Longitude'], TO_merged['Neighborhood'], TO_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
    
folium.CircleMarker([Blatitude, Blongitude],
                    radius=50,
                    popup='Toronto',
                    color='red',
                    ).add_to(map_clusters)

# Interactive marker
map_clusters.add_child(folium.ClickForMarker(popup=popstring_new))
       
#map_clusters
map_clusters.save('map_clusters.html')

### 4.2 Exact Address of desired Location

In [77]:
print('The exact Address to locate would be: 268 Balliol Street, ON M4S 1C2, Canada or lat: 43.6991598, lng: -79.3878871')

The exact Address to locate would be: 268 Balliol Street, ON M4S 1C2, Canada or lat: 43.6991598, lng: -79.3878871


## Discussion:
### 5.1 Explaining the results
 As we built our list of neighborhoods with Restaurant venues exclusively we discovered most neighborhoods were similar and the greatest concentration of restaurants was in Central Toronto and downtown Toronto. This might seem obvious but it would also appear that these are some of the most affluent neighborhoods in Toronto so there appears to be correlation. By Locating in the general vicinity of the Exact location my friend could be geographically centered in this cluster and poised to service his restaurant customer base with the greatest efficiency.

When we built our our K-Means dataset we used Silhouette analysis to tell us there was a lot of similarity between neighborhoods and the most common restaurants contained with in. Really there was only 2 types of cluster or neighborhoods in greater Toronto. The vast majority of those were in 1 cluster. So Toronto restaurants might be many but they are very homogeneously located near the center of Toronto.

Of the 103 Toronto Neighborhoods gathered only 55.3% or 57 Neighborhoods are above the median after-tax income. 37.8% or 39 Neighborhoods are below he median after-tax income. 6.7% or 7 neighborhoods did not register as it appears their populations are too low. It appears that the greatest concentration of affluence is near central Toronto. We decided to keep all neighborhoods in the dataset regardless of income of population as the majority were close enough.

## Conclusion:
I feel confident with the recommendation I have given my friend as it is backed up with demonstrated data analysis. While nothing can ever be 100% certain he will certainly be better informed than he was prior to asking for my help.

Much more inference can be obtained with more work. A potential side business for my friend might be assisting new restaurant owners where they might locate a new restaurant, who their competition is and who their clientele might be.

In [78]:
map_clusters