# The Battle of Neighborhoods (Final Project)

## Build a model to predict the best location to open an Indian Restaurant in Toronto.

### Methodology

I will scrape data from the Wikipedia page of Toronto neighborhoods, then use geopy to find the location of Toronto. After that, I will use the Foursquare API to fetch the venues in Toronto with their names, categories, latitudes and longitudes. Then, using one-hot encoding, I will compute how frequently Indian restaurants occur among the venues of each neighborhood. Finally, I will apply K-Means clustering to group the locations into clusters and thus find the best cluster in which to open the restaurant.

### Import libraries

In [2]:
import numpy as np
import pandas as pd
import json,requests
from bs4 import BeautifulSoup
from selenium import webdriver
import folium
import seaborn as sns
import matplotlib.pyplot as plt
import matplotlib.cm as cm
import matplotlib.colors as colors
from geopy.geocoders import Nominatim
from pandas import json_normalize  # top-level import; the pandas.io.json path is deprecated
from sklearn.cluster import KMeans

print("Libraries imported")

Libraries imported


### Using BeautifulSoup and Selenium WebDriver to scrape the data

In [3]:
driver= webdriver.Chrome(executable_path=r'd:\Profiles\sahsrivastava\Downloads\chromedriver.exe')

In [4]:
url='https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'
driver.get(url)
soup=BeautifulSoup(driver.page_source,'lxml')

In [5]:
postalCodeList = []
boroughList = []
neighborhoodList = []

In [6]:
table=soup.find('table')
tr=table.find_all('tr')
for row in tr:
    cells=row.find_all('td')
    if len(cells)>0:
        postalCodeList.append(cells[0].text.rstrip('\n'))
        boroughList.append(cells[1].text.rstrip('\n'))
        neighborhoodList.append(cells[2].text.rstrip('\n'))
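
The row-parsing loop can be tried without a browser on a small inline table. This is only a sketch: the `SAMPLE_HTML` string and the `parse_postal_table` helper are hypothetical, it uses the built-in `html.parser` backend rather than `lxml`, and it uses `get_text(strip=True)` where the cell above uses `rstrip('\n')`, so leading whitespace is trimmed as well.

```python
from bs4 import BeautifulSoup

# Hypothetical miniature of the postal code table, for illustration only.
SAMPLE_HTML = """
<table>
  <tr><th>Postal Code</th><th>Borough</th><th>Neighborhood</th></tr>
  <tr><td>M1A
  </td><td>Not assigned
  </td><td>Not assigned
  </td></tr>
  <tr><td>M3A
  </td><td>North York
  </td><td>Parkwoods
  </td></tr>
</table>
"""


def parse_postal_table(html):
    """Return (postal code, borough, neighborhood) tuples from the first table."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for tr in soup.find("table").find_all("tr"):
        cells = tr.find_all("td")
        # Header rows contain only <th> cells, so they are skipped here.
        if len(cells) >= 3:
            rows.append(tuple(c.get_text(strip=True) for c in cells[:3]))
    return rows


rows = parse_postal_table(SAMPLE_HTML)
print(rows)
```

Collecting tuples first and building the DataFrame once keeps the three columns aligned even if a row is skipped.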

In [7]:
toronto_df=pd.DataFrame({'PostalCode': postalCodeList,'Borough': boroughList, 'NeighborhoodList': neighborhoodList})

In [8]:
toronto_df.head()

Unnamed: 0,PostalCode,Borough,NeighborhoodList
0,M1A,Not assigned,Not assigned
1,M2A,Not assigned,Not assigned
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,"Regent Park, Harbourfront"


### Drop cells with a borough that is "Not assigned"

In [9]:
toronto_df_new=toronto_df[toronto_df['Borough']!='Not assigned'].reset_index(drop=True)
toronto_df_new.head()

Unnamed: 0,PostalCode,Borough,NeighborhoodList
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,"Regent Park, Harbourfront"
3,M6A,North York,"Lawrence Manor, Lawrence Heights"
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government"


We can see in the DataFrame that the neighborhoods are already grouped by postal code and borough, so there is no need to perform a grouping step.

### Check whether any row in the NeighborhoodList column has the value "Not assigned"

In [10]:
toronto_df_new[toronto_df_new['NeighborhoodList']=='Not assigned']

Unnamed: 0,PostalCode,Borough,NeighborhoodList


### Shape of DataFrame

In [11]:
toronto_df_new.shape

(103, 3)

### Now we will store the DataFrame in a CSV file

In [12]:
toronto_df_new.to_csv("Toronto_data.csv")

The next step is to find the coordinates for each postal code. Because of a glitch in the Geopy library, we will instead use a CSV file that contains the geospatial coordinates.

### Load the coordinates CSV file into a DataFrame

In [13]:
coordinates=pd.read_csv('Geospatial_Coordinates.csv')
coordinates

Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476
...,...,...,...
98,M9N,43.706876,-79.518188
99,M9P,43.696319,-79.532242
100,M9R,43.688905,-79.554724
101,M9V,43.739416,-79.588437


In [14]:
coordinates.rename(columns={"Postal Code": "PostalCode"}, inplace=True)
coordinates.head()

Unnamed: 0,PostalCode,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


### Merge two tables

In [15]:
toronto_df_final=pd.merge(toronto_df_new, coordinates, on='PostalCode', how='outer')
toronto_df_final.rename(columns={'NeighborhoodList': 'Neighborhood'},inplace=True)
toronto_df_final

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.753259,-79.329656
1,M4A,North York,Victoria Village,43.725882,-79.315572
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.654260,-79.360636
3,M6A,North York,"Lawrence Manor, Lawrence Heights",43.718518,-79.464763
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494
...,...,...,...,...,...
98,M8X,Etobicoke,"The Kingsway, Montgomery Road, Old Mill North",43.653654,-79.506944
99,M4Y,Downtown Toronto,Church and Wellesley,43.665860,-79.383160
100,M7Y,East Toronto,"Business reply mail Processing Centre, South C...",43.662744,-79.321558
101,M8Y,Etobicoke,"Old Mill South, King's Mill Park, Sunnylea, Hu...",43.636258,-79.498509
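
Because the merge uses `how='outer'`, a postal code that is missing from either table would appear as a row with empty cells instead of being dropped. One way to check for that is the `indicator=True` option of `pd.merge`. The sketch below uses toy frames; the postal code `M0X` and its borough are made up to show an unmatched row.

```python
import pandas as pd

# Toy frames; "M0X" is a made-up postal code with no coordinates.
neighborhoods = pd.DataFrame({
    "PostalCode": ["M3A", "M4A", "M0X"],
    "Borough": ["North York", "North York", "Example Borough"],
})
coords = pd.DataFrame({
    "PostalCode": ["M3A", "M4A"],
    "Latitude": [43.753259, 43.725882],
    "Longitude": [-79.329656, -79.315572],
})

# indicator=True adds a "_merge" column: "both", "left_only" or "right_only".
merged = pd.merge(neighborhoods, coords, on="PostalCode", how="outer", indicator=True)
unmatched = merged[merged["_merge"] != "both"]
print(unmatched[["PostalCode", "_merge"]])
```

If `unmatched` is empty on the real data, the outer merge behaves like an inner merge and no coordinates are missing.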


### Now we will create a final CSV file with the full data: each borough and neighborhood with its coordinates

In [16]:
toronto_df_final.to_csv('Toronto_Data_Full.csv')

In [22]:
print("The dataframe has {} unique Borough and {} neighborhoods."
      .format(len(toronto_df_final['Borough'].unique()),toronto_df_final.shape[0]))

The dataframe has 10 unique Borough and 103 neighborhoods.


### Use the Geopy library to fetch the latitude and longitude of Toronto

In [23]:
address='Toronto'
geolocator=Nominatim(user_agent='toronto_explorer')
location=geolocator.geocode(address)
latitude=location.latitude
longitude=location.longitude
print("The coordinates of Toronto are {},{}".format(latitude,longitude))

The coordinates of Toronto are 43.6534817,-79.3839347


### Create map of Toronto

In [24]:
map_toronto=folium.Map(location=[latitude,longitude],zoom_start=10)

for lat,long,borough,neighborhood in zip(toronto_df_final['Latitude'],toronto_df_final['Longitude'],toronto_df_final['Borough'],toronto_df_final['Neighborhood']):
    label='{},{}'.format(neighborhood,borough)
    label=folium.Popup(label,parse_html=True)
    folium.CircleMarker(
        [lat, long],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_toronto)

map_toronto

### Select the boroughs that contain "Toronto" in their name

In [25]:
borough_list=toronto_df_final[toronto_df_final['Borough'].str.contains('Toronto')][['Borough']].reset_index(drop=True)
borough_with_toronto=list(borough_list['Borough'].unique())
borough_with_toronto

['Downtown Toronto', 'East Toronto', 'West Toronto', 'Central Toronto']

In [26]:
toronto_df_final=toronto_df_final[toronto_df_final['Borough'].isin(borough_with_toronto)].reset_index(drop=True)
toronto_df_final

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude
0,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636
1,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494
2,M5B,Downtown Toronto,"Garden District, Ryerson",43.657162,-79.378937
3,M5C,Downtown Toronto,St. James Town,43.651494,-79.375418
4,M4E,East Toronto,The Beaches,43.676357,-79.293031
5,M5E,Downtown Toronto,Berczy Park,43.644771,-79.373306
6,M5G,Downtown Toronto,Central Bay Street,43.657952,-79.387383
7,M6G,Downtown Toronto,Christie,43.669542,-79.422564
8,M5H,Downtown Toronto,"Richmond, Adelaide, King",43.650571,-79.384568
9,M6H,West Toronto,"Dufferin, Dovercourt Village",43.669005,-79.442259


In [27]:
toronto_df_final.shape

(39, 5)

### Create a map of the boroughs that contain "Toronto" in their name

In [28]:
map_toronto=folium.Map(location=[latitude,longitude],zoom_start=10)

for lat,long,borough,neighborhood in zip(toronto_df_final['Latitude'],toronto_df_final['Longitude'],toronto_df_final['Borough'],toronto_df_final['Neighborhood']):
    label='{},{}'.format(neighborhood,borough)
    label=folium.Popup(label,parse_html=True)
    folium.CircleMarker(
        [lat, long],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_toronto)

map_toronto

### Use the Foursquare API to explore the neighborhoods of the boroughs that contain "Toronto"

In [29]:
CLIENT_ID='SFH5QU3GBCQFYEOGP0YCSDVVRWFQ4PLEZFJBURYLT5TTXG2Y'
CLIENT_SECRET='SJCIS0BFL3XGPQSQAKT3HL5PQPMW1XPV54BM0F3TSGVXCJEZ'
VERSION='20200703'

print('Credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Credentials:
CLIENT_ID: SFH5QU3GBCQFYEOGP0YCSDVVRWFQ4PLEZFJBURYLT5TTXG2Y
CLIENT_SECRET:SJCIS0BFL3XGPQSQAKT3HL5PQPMW1XPV54BM0F3TSGVXCJEZ
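
Credentials written into a notebook cell are visible to anyone who can read the notebook or its output. A common alternative is to read them from environment variables and mask them when printing. The sketch below is an assumption-laden example: the variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and both helper functions are hypothetical, and the demo values are placeholders.

```python
import os


def load_foursquare_credentials(env=os.environ):
    """Return (client_id, client_secret) from a mapping such as os.environ."""
    client_id = env.get("FOURSQUARE_CLIENT_ID")          # hypothetical variable name
    client_secret = env.get("FOURSQUARE_CLIENT_SECRET")  # hypothetical variable name
    if not client_id or not client_secret:
        raise RuntimeError(
            "Set FOURSQUARE_CLIENT_ID and FOURSQUARE_CLIENT_SECRET first")
    return client_id, client_secret


def mask(secret, visible=4):
    """Keep the first few characters and replace the rest with asterisks."""
    return secret[:visible] + "*" * max(len(secret) - visible, 0)


# Demo with a stand-in mapping instead of the real environment.
demo_env = {"FOURSQUARE_CLIENT_ID": "demo-id-123",
            "FOURSQUARE_CLIENT_SECRET": "demo-secret-456"}
demo_id, demo_secret = load_foursquare_credentials(demo_env)
print("CLIENT_ID:", mask(demo_id))
print("CLIENT_SECRET:", mask(demo_secret))
```

Calling `load_foursquare_credentials()` without arguments reads from the real process environment.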


### Get the top 100 venues and convert the results into a DataFrame

In [30]:
radius = 500
LIMIT = 100

def getNearbyVenues(names, postal_code, borough, latitudes, longitudes, radius=500):
    """Return a DataFrame of up to LIMIT venues within `radius` metres of each neighborhood."""
    venues_list = []
    for name, lat, lng, ps_code, br in zip(names, latitudes, longitudes, postal_code, borough):
        print(name)

        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID,
            CLIENT_SECRET,
            VERSION,
            lat,
            lng,
            radius,
            LIMIT)

        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']

        # keep only the relevant information for each nearby venue
        venues_list.append([(
            ps_code,
            br,
            name,
            lat,
            lng,
            v['venue']['name'],
            v['venue']['location']['lat'],
            v['venue']['location']['lng'],
            v['venue']['categories'][0]['name']) for v in results])

    # flatten the per-neighborhood lists into a single DataFrame
    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Postal Code',
                             'Borough',
                             'Neighborhood',
                             'Neighborhood Latitude',
                             'Neighborhood Longitude',
                             'Venue',
                             'Venue Latitude',
                             'Venue Longitude',
                             'Venue Category']

    return nearby_venues
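
The function above indexes straight into the response, so an empty `groups` list or a venue with no categories would raise an `IndexError`. A more defensive version of the parsing step is sketched below. It assumes the same response shape the function relies on (`response` → `groups` → `items` → `venue`). The helper `extract_venue_rows`, the `"Unknown"` label and the second venue in the sample payload are hypothetical.

```python
def extract_venue_rows(payload, ps_code, borough, name, lat, lng):
    """Turn one explore response (already parsed from JSON) into row tuples."""
    groups = payload.get("response", {}).get("groups", [])
    items = groups[0].get("items", []) if groups else []
    rows = []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories") or []
        # Assumption: label venues without a category instead of failing.
        category = categories[0]["name"] if categories else "Unknown"
        rows.append((ps_code, borough, name, lat, lng,
                     venue["name"],
                     venue["location"]["lat"],
                     venue["location"]["lng"],
                     category))
    return rows


# Sample payload: the first venue is taken from the output in this notebook,
# the second one is invented to exercise the missing-category branch.
sample_payload = {"response": {"groups": [{"items": [
    {"venue": {"name": "Roselle Desserts",
               "location": {"lat": 43.653447, "lng": -79.362017},
               "categories": [{"name": "Bakery"}]}},
    {"venue": {"name": "Example Venue",
               "location": {"lat": 43.654, "lng": -79.361},
               "categories": []}},
]}]}}

sample_rows = extract_venue_rows(sample_payload, "M5A", "Downtown Toronto",
                                 "Regent Park, Harbourfront", 43.65426, -79.360636)
for row in sample_rows:
    print(row)
```

Separating parsing from the HTTP call also makes it possible to test the parsing logic without spending API quota.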

In [31]:
toronto_venues = getNearbyVenues(names=toronto_df_final['Neighborhood'],
                                 postal_code=toronto_df_final['PostalCode'],
                                 borough=toronto_df_final['Borough'],
                                 latitudes=toronto_df_final['Latitude'],
                                 longitudes=toronto_df_final['Longitude'])

Regent Park, Harbourfront
Queen's Park, Ontario Provincial Government
Garden District, Ryerson
St. James Town
The Beaches
Berczy Park
Central Bay Street
Christie
Richmond, Adelaide, King
Dufferin, Dovercourt Village
Harbourfront East, Union Station, Toronto Islands
Little Portugal, Trinity
The Danforth West, Riverdale
Toronto Dominion Centre, Design Exchange
Brockton, Parkdale Village, Exhibition Place
India Bazaar, The Beaches West
Commerce Court, Victoria Hotel
Studio District
Lawrence Park
Roselawn
Davisville North
Forest Hill North & West, Forest Hill Road Park
High Park, The Junction South
North Toronto West,  Lawrence Park
The Annex, North Midtown, Yorkville
Parkdale, Roncesvalles
Davisville
University of Toronto, Harbord
Runnymede, Swansea
Moore Park, Summerhill East
Kensington Market, Chinatown, Grange Park
Summerhill West, Rathnelly, South Hill, Forest Hill SE, Deer Park
CN Tower, King and Spadina, Railway Lands, Harbourfront West, Bathurst Quay, South Niagara, Island airport


In [32]:
print(toronto_venues.shape)
toronto_venues.head()

(1614, 9)


Unnamed: 0,Postal Code,Borough,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636,Roselle Desserts,43.653447,-79.362017,Bakery
1,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636,Tandem Coffee,43.653559,-79.361809,Coffee Shop
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636,Cooper Koo Family YMCA,43.653249,-79.358008,Distribution Center
3,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636,Body Blitz Spa East,43.654735,-79.359874,Spa
4,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636,Dominion Pub and Kitchen,43.656919,-79.358967,Pub


### Count of venues returned per postal code

In [33]:
toronto_venues.groupby(["Postal Code", "Borough", "Neighborhood"]).count()

Postal Code,Borough,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
M4E,East Toronto,The Beaches,5,5,5,5,5,5
M4K,East Toronto,"The Danforth West, Riverdale",43,43,43,43,43,43
M4L,East Toronto,"India Bazaar, The Beaches West",23,23,23,23,23,23
M4M,East Toronto,Studio District,42,42,42,42,42,42
M4N,Central Toronto,Lawrence Park,3,3,3,3,3,3
M4P,Central Toronto,Davisville North,8,8,8,8,8,8
M4R,Central Toronto,"North Toronto West, Lawrence Park",18,18,18,18,18,18
M4S,Central Toronto,Davisville,35,35,35,35,35,35
M4T,Central Toronto,"Moore Park, Summerhill East",1,1,1,1,1,1
M4V,Central Toronto,"Summerhill West, Rathnelly, South Hill, Forest Hill SE, Deer Park",16,16,16,16,16,16


In [34]:
# number of unique venue categories in the result
print('There are {} unique categories.'.format(len(toronto_venues['Venue Category'].unique())))

There are 233 unique categories.


In [35]:
# The postal code and borough are no longer needed, since we only analyse each neighborhood, so we can drop them.
toronto_venues.drop(['Postal Code','Borough'],axis=1,inplace=True)
toronto_venues.head()

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,"Regent Park, Harbourfront",43.65426,-79.360636,Roselle Desserts,43.653447,-79.362017,Bakery
1,"Regent Park, Harbourfront",43.65426,-79.360636,Tandem Coffee,43.653559,-79.361809,Coffee Shop
2,"Regent Park, Harbourfront",43.65426,-79.360636,Cooper Koo Family YMCA,43.653249,-79.358008,Distribution Center
3,"Regent Park, Harbourfront",43.65426,-79.360636,Body Blitz Spa East,43.654735,-79.359874,Spa
4,"Regent Park, Harbourfront",43.65426,-79.360636,Dominion Pub and Kitchen,43.656919,-79.358967,Pub


In [36]:
#Number of venues per neighborhood
toronto_venues.groupby('Neighborhood').count()

Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Berczy Park,58,58,58,58,58,58
"Brockton, Parkdale Village, Exhibition Place",22,22,22,22,22,22
"Business reply mail Processing Centre, South Central Letter Processing Plant Toronto",19,19,19,19,19,19
"CN Tower, King and Spadina, Railway Lands, Harbourfront West, Bathurst Quay, South Niagara, Island airport",15,15,15,15,15,15
Central Bay Street,65,65,65,65,65,65
Christie,17,17,17,17,17,17
Church and Wellesley,75,75,75,75,75,75
"Commerce Court, Victoria Hotel",100,100,100,100,100,100
Davisville,35,35,35,35,35,35
Davisville North,8,8,8,8,8,8


### Analyze each neighborhood

In [37]:
# one hot encoding
toronto_onehot = pd.get_dummies(toronto_venues[['Venue Category']], prefix="", prefix_sep="")

In [38]:
toronto_onehot

Unnamed: 0,Afghan Restaurant,Airport,Airport Food Court,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Antique Shop,Aquarium,Art Gallery,...,Toy / Game Store,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Wine Bar,Wine Shop,Women's Store,Yoga Studio
0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1609,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1610,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1611,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1612,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [39]:
# add the neighborhood column back to the dataframe
toronto_onehot['Neighborhoods'] = toronto_venues['Neighborhood'] 

In [40]:
# move the neighborhood column (currently last) to the first position
fixed_columns = list(toronto_onehot.columns[-1:]) + list(toronto_onehot.columns[:-1])
toronto_onehot = toronto_onehot[fixed_columns]

print(toronto_onehot.shape)
toronto_onehot.head()

(1614, 234)


Unnamed: 0,Neighborhoods,Afghan Restaurant,Airport,Airport Food Court,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Antique Shop,Aquarium,...,Toy / Game Store,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Wine Bar,Wine Shop,Women's Store,Yoga Studio
0,"Regent Park, Harbourfront",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,"Regent Park, Harbourfront",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,"Regent Park, Harbourfront",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,"Regent Park, Harbourfront",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,"Regent Park, Harbourfront",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


### Next, let's group rows by neighborhood, taking the mean of the frequency of occurrence of each category

In [41]:
toronto_grouped = toronto_onehot.groupby("Neighborhoods").mean().reset_index()
print(toronto_grouped.shape)
toronto_grouped

(39, 234)


Unnamed: 0,Neighborhoods,Afghan Restaurant,Airport,Airport Food Court,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Antique Shop,Aquarium,...,Toy / Game Store,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Wine Bar,Wine Shop,Women's Store,Yoga Studio
0,Berczy Park,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.017241,0.0,0.0,0.0,0.0,0.0,0.0
1,"Brockton, Parkdale Village, Exhibition Place",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,"Business reply mail Processing Centre, South C...",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,"CN Tower, King and Spadina, Railway Lands, Har...",0.0,0.066667,0.066667,0.133333,0.2,0.066667,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Central Bay Street,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.015385,0.0,0.0,0.015385,0.0,0.0,0.015385
5,Christie,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
6,Church and Wellesley,0.013333,0.0,0.0,0.0,0.0,0.0,0.013333,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.013333,0.0,0.026667
7,"Commerce Court, Victoria Hotel",0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,...,0.0,0.0,0.0,0.02,0.0,0.0,0.01,0.0,0.0,0.0
8,Davisville,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.028571,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9,Davisville North,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
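
Taking the mean of one-hot columns gives, for each neighborhood, the share of its venues that fall into each category. A value of 0.25 for `Indian Restaurant` means a quarter of that neighborhood's venues are Indian restaurants. The sketch below reproduces the steps on made-up data; neighborhoods `A` and `B` and their venues are hypothetical.

```python
import pandas as pd

# Made-up venues for two hypothetical neighborhoods.
venues = pd.DataFrame({
    "Neighborhood": ["A", "A", "A", "A", "B", "B"],
    "Venue Category": ["Café", "Indian Restaurant", "Pub", "Café", "Bakery", "Café"],
})

# One column per category; cast to int so the result is 0/1 on any pandas version.
onehot = pd.get_dummies(venues[["Venue Category"]], prefix="", prefix_sep="").astype(int)
onehot.insert(0, "Neighborhood", venues["Neighborhood"])

# Mean of the 0/1 columns = share of venues per category in each neighborhood.
grouped = onehot.groupby("Neighborhood").mean().reset_index()
# Sum of the 0/1 columns = absolute number of venues per category.
counts = onehot.groupby("Neighborhood").sum().reset_index()

print(grouped[["Neighborhood", "Indian Restaurant"]])
print(counts[["Neighborhood", "Indian Restaurant"]])
```

The frequency depends on how many venues the API returned for a neighborhood, so a single Indian restaurant weighs more in a neighborhood with few venues. The `sum()` variant gives absolute counts when those are of interest.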


### Let's check the Indian restaurants

In [42]:
len(toronto_grouped[toronto_grouped["Indian Restaurant"] > 0])

7

In [43]:
indian_df = toronto_grouped[["Neighborhoods","Indian Restaurant"]]

In [44]:
indian_df.head()

Unnamed: 0,Neighborhoods,Indian Restaurant
0,Berczy Park,0.0
1,"Brockton, Parkdale Village, Exhibition Place",0.0
2,"Business reply mail Processing Centre, South C...",0.0
3,"CN Tower, King and Spadina, Railway Lands, Har...",0.0
4,Central Bay Street,0.015385


### Clustering neighborhoods by their frequency of Indian restaurants

In [45]:
# set number of clusters
kclusters = 3

indian_clustering = indian_df.drop(["Neighborhoods"], axis=1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(indian_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10]

array([0, 0, 0, 0, 2, 0, 2, 0, 2, 0])
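
The value `kclusters = 3` is chosen by hand. One common way to sanity-check such a choice is to compare the K-Means inertia (the within-cluster sum of squared distances) for several values of k and look for the point where it stops dropping sharply. The sketch below does this on a small one-dimensional array that only resembles the frequencies in this notebook. The array is not the full dataset, and `n_init=10` is an explicit choice made here.

```python
import numpy as np
from sklearn.cluster import KMeans

# Toy 1-D feature resembling the Indian restaurant frequencies (not the full data).
freq = np.array([0.0, 0.0, 0.0, 0.0,
                 0.013333, 0.015385, 0.023256, 0.028571, 0.047619]).reshape(-1, 1)

inertias = {}
for k in range(1, 6):
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(freq)
    inertias[k] = model.inertia_  # within-cluster sum of squared distances

for k, value in inertias.items():
    print(k, value)
```

With a single feature, the clusters amount to bands of frequency values, so the inertia curve mainly shows how many such bands the data supports.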

In [46]:
# create a new dataframe that holds the Indian restaurant frequency and the cluster label for each neighborhood
indian_clusters_merged = indian_df.copy()

# add clustering labels
indian_clusters_merged["Cluster Labels"] = kmeans.labels_

In [47]:
indian_clusters_merged.rename(columns={"Neighborhoods": "Neighborhood"}, inplace=True)
indian_clusters_merged.head()

Unnamed: 0,Neighborhood,Indian Restaurant,Cluster Labels
0,Berczy Park,0.0,0
1,"Brockton, Parkdale Village, Exhibition Place",0.0,0
2,"Business reply mail Processing Centre, South C...",0.0,0
3,"CN Tower, King and Spadina, Railway Lands, Har...",0.0,0
4,Central Bay Street,0.015385,2


In [48]:
indian_clusters_merged = indian_clusters_merged.join(toronto_venues.set_index("Neighborhood"), on="Neighborhood")

print(indian_clusters_merged.shape)
indian_clusters_merged.head()

(1614, 9)


Unnamed: 0,Neighborhood,Indian Restaurant,Cluster Labels,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Berczy Park,0.0,0,43.644771,-79.373306,LCBO,43.642944,-79.37244,Liquor Store
0,Berczy Park,0.0,0,43.644771,-79.373306,The Keg Steakhouse + Bar - Esplanade,43.646712,-79.374768,Restaurant
0,Berczy Park,0.0,0,43.644771,-79.373306,Meridian Hall,43.646292,-79.376022,Concert Hall
0,Berczy Park,0.0,0,43.644771,-79.373306,Fresh On Front,43.647815,-79.374453,Vegetarian / Vegan Restaurant
0,Berczy Park,0.0,0,43.644771,-79.373306,Hockey Hall Of Fame (Hockey Hall of Fame),43.646974,-79.377323,Museum


In [49]:
# sort the results by Cluster Labels
print(indian_clusters_merged.shape)
indian_clusters_merged.sort_values(["Cluster Labels"], inplace=True)
indian_clusters_merged

(1614, 9)


Unnamed: 0,Neighborhood,Indian Restaurant,Cluster Labels,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Berczy Park,0.000000,0,43.644771,-79.373306,LCBO,43.642944,-79.372440,Liquor Store
25,"Richmond, Adelaide, King",0.000000,0,43.650571,-79.384568,Cafe Landwer,43.648753,-79.385367,Café
25,"Richmond, Adelaide, King",0.000000,0,43.650571,-79.384568,M Square Coffee Co,43.651218,-79.383555,Coffee Shop
25,"Richmond, Adelaide, King",0.000000,0,43.650571,-79.384568,John & Sons Oyster House,43.650656,-79.381613,Seafood Restaurant
25,"Richmond, Adelaide, King",0.000000,0,43.650571,-79.384568,Soho House Toronto,43.648734,-79.386541,Speakeasy
...,...,...,...,...,...,...,...,...,...
36,"The Danforth West, Riverdale",0.023256,2,43.679557,-79.352188,The Big Carrot Organic Juice Bar,43.677438,-79.352683,Juice Bar
36,"The Danforth West, Riverdale",0.023256,2,43.679557,-79.352188,Rikkochez,43.677267,-79.353274,Restaurant
36,"The Danforth West, Riverdale",0.023256,2,43.679557,-79.352188,7 Numbers,43.677062,-79.353934,Italian Restaurant
36,"The Danforth West, Riverdale",0.023256,2,43.679557,-79.352188,Pizzeria Libretto,43.678489,-79.347576,Pizza Place


In [51]:
# create a map of all neighborhoods, colored by their cluster label
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i+x+(i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(indian_clusters_merged['Neighborhood Latitude'], indian_clusters_merged['Neighborhood Longitude'], indian_clusters_merged['Neighborhood'], indian_clusters_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' - Cluster ' + str(cluster))
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

In [53]:
#Cluster 0
indian_clusters_merged.loc[indian_clusters_merged['Cluster Labels'] == 0]

Unnamed: 0,Neighborhood,Indian Restaurant,Cluster Labels,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Berczy Park,0.0,0,43.644771,-79.373306,LCBO,43.642944,-79.372440,Liquor Store
25,"Richmond, Adelaide, King",0.0,0,43.650571,-79.384568,Cafe Landwer,43.648753,-79.385367,Café
25,"Richmond, Adelaide, King",0.0,0,43.650571,-79.384568,M Square Coffee Co,43.651218,-79.383555,Coffee Shop
25,"Richmond, Adelaide, King",0.0,0,43.650571,-79.384568,John & Sons Oyster House,43.650656,-79.381613,Seafood Restaurant
25,"Richmond, Adelaide, King",0.0,0,43.650571,-79.384568,Soho House Toronto,43.648734,-79.386541,Speakeasy
...,...,...,...,...,...,...,...,...,...
11,"First Canadian Place, Underground city",0.0,0,43.648429,-79.382280,The Cambridge Club,43.651663,-79.383075,Gym
11,"First Canadian Place, Underground city",0.0,0,43.648429,-79.382280,Old City Hall,43.652009,-79.381744,Monument / Landmark
11,"First Canadian Place, Underground city",0.0,0,43.648429,-79.382280,Movenpick Cafe,43.647687,-79.377295,Café
11,"First Canadian Place, Underground city",0.0,0,43.648429,-79.382280,Sweet Lulu,43.650557,-79.381175,Asian Restaurant


In [54]:
#Cluster 1
indian_clusters_merged.loc[indian_clusters_merged['Cluster Labels'] == 1]

Unnamed: 0,Neighborhood,Indian Restaurant,Cluster Labels,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
34,"The Annex, North Midtown, Yorkville",0.047619,1,43.67271,-79.405678,Haute Coffee,43.675818,-79.402793,Café
34,"The Annex, North Midtown, Yorkville",0.047619,1,43.67271,-79.405678,Relative Space,43.673738,-79.411197,Furniture / Home Store
34,"The Annex, North Midtown, Yorkville",0.047619,1,43.67271,-79.405678,Shoppers Drug Mart,43.674959,-79.407986,Pharmacy
34,"The Annex, North Midtown, Yorkville",0.047619,1,43.67271,-79.405678,Pour House,43.675641,-79.403821,Pub
34,"The Annex, North Midtown, Yorkville",0.047619,1,43.67271,-79.405678,Subway,43.67565,-79.410255,Sandwich Place
34,"The Annex, North Midtown, Yorkville",0.047619,1,43.67271,-79.405678,Martino's Pizza,43.67556,-79.403558,Pizza Place
34,"The Annex, North Midtown, Yorkville",0.047619,1,43.67271,-79.405678,Roti Cuisine of India,43.674618,-79.408249,Indian Restaurant
34,"The Annex, North Midtown, Yorkville",0.047619,1,43.67271,-79.405678,Live Organic Food Bar,43.675053,-79.406715,Vegetarian / Vegan Restaurant
34,"The Annex, North Midtown, Yorkville",0.047619,1,43.67271,-79.405678,Madame Boeuf And Flea,43.67524,-79.40662,Burger Joint
34,"The Annex, North Midtown, Yorkville",0.047619,1,43.67271,-79.405678,Big Crow,43.675896,-79.40368,BBQ Joint


In [55]:
#Cluster 2
indian_clusters_merged.loc[indian_clusters_merged['Cluster Labels'] == 2]

Unnamed: 0,Neighborhood,Indian Restaurant,Cluster Labels,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
6,Church and Wellesley,0.013333,2,43.665860,-79.383160,Storm Crow Manor,43.666840,-79.381593,Theme Restaurant
6,Church and Wellesley,0.013333,2,43.665860,-79.383160,Constantine,43.668773,-79.385287,Mediterranean Restaurant
6,Church and Wellesley,0.013333,2,43.665860,-79.383160,Rolltation,43.669388,-79.386566,Sushi Restaurant
8,Davisville,0.028571,2,43.704324,-79.388790,Thobors Boulangerie Patisserie Café,43.704514,-79.388616,Café
8,Davisville,0.028571,2,43.704324,-79.388790,Jules Cafe Patisserie,43.704138,-79.388413,Dessert Shop
...,...,...,...,...,...,...,...,...,...
36,"The Danforth West, Riverdale",0.023256,2,43.679557,-79.352188,The Big Carrot Organic Juice Bar,43.677438,-79.352683,Juice Bar
36,"The Danforth West, Riverdale",0.023256,2,43.679557,-79.352188,Rikkochez,43.677267,-79.353274,Restaurant
36,"The Danforth West, Riverdale",0.023256,2,43.679557,-79.352188,7 Numbers,43.677062,-79.353934,Italian Restaurant
36,"The Danforth West, Riverdale",0.023256,2,43.679557,-79.352188,Pizzeria Libretto,43.678489,-79.347576,Pizza Place
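
Instead of reading through the per-cluster tables, the clusters can be summarised by their size and mean frequency. The sketch below hard-codes seven neighborhoods whose values and labels are copied from the outputs above. It is only a subset of the 39 neighborhoods, and the column names `neighborhoods` and `mean_frequency` are chosen here for illustration.

```python
import pandas as pd

# Subset of rows copied from the notebook outputs (not all 39 neighborhoods).
clusters = pd.DataFrame({
    "Neighborhood": ["Berczy Park",
                     "Richmond, Adelaide, King",
                     "Central Bay Street",
                     "Church and Wellesley",
                     "Davisville",
                     "The Danforth West, Riverdale",
                     "The Annex, North Midtown, Yorkville"],
    "Indian Restaurant": [0.0, 0.0, 0.015385, 0.013333, 0.028571, 0.023256, 0.047619],
    "Cluster Labels": [0, 0, 2, 2, 2, 2, 1],
})

# One row per cluster: how many neighborhoods it holds and their mean frequency.
summary = (clusters.groupby("Cluster Labels")["Indian Restaurant"]
           .agg(neighborhoods="count", mean_frequency="mean")
           .reset_index())
print(summary)
```

On the real data, the same `groupby` can be applied to the per-neighborhood frame (before joining the venues), so that each neighborhood is counted once.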


### Observations 

As the resulting clusters show, the Cluster 0 neighborhoods listed above (Berczy Park; Richmond, Adelaide, King; and others) have an Indian restaurant frequency of 0.0. Most of the neighborhoods that do have an Indian restaurant fall into Cluster 2 (Central Bay Street, Church and Wellesley, Davisville, The Danforth West and Riverdale), and Cluster 1 covers the area around The Annex, North Midtown and Yorkville. The map also shows many locations in Cluster 0 and Cluster 2, whereas there are only a few in Cluster 1. From these observations, Cluster 1 would be a good area to open an Indian restaurant, as there are not a lot of them there. Through our model, we therefore recommend a location around The Annex, North Midtown and Yorkville, i.e., Cluster 1, for opening a fine Indian restaurant; if the food quality and ambience are good, it should do good business.