<h1><center>Neighborhood Clustering for Entertainment Site</center></h1>

**<center>AMIT GUPTA</center>**

**<center>31st July, 2021</center>**

# 1. Introduction

We have a client who runs a chain of restaurants and cafés and plans to open new branches in Toronto (Ontario, Canada). We have been asked to build a map showing which neighbourhoods are the most promising locations, so that the client can maximize profit.

# 2. Problem Statement

This project identifies the places in Toronto (Ontario, Canada) that are most suitable for opening a restaurant or café, so that our client can earn a steady profit from a permanent customer base in a specific area.

# 3. Data Description

We work with data comprising area (postal) code, neighborhood, and borough. We then use the area codes from this data, together with their coordinates, to query the Foursquare API for the venues around each one. The result is a dataset with area code, borough, neighborhood, latitude, longitude, and venues for every area code.

## 3.1 Toronto (Ontario, Canada)

To derive our solution, we scrape our data from https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M <br>

This Wikipedia page has information about all the neighbourhoods in Toronto:<br>

 1. neighborhood : Name of area<br>
 2. town : Name of borough<br>
 3. post_code : Postal codes for Toronto<br>

This Wikipedia page lacks information about geographical locations. To solve this problem we use a geospatial dataset.

## 3.2 Geospatial Dataset

The geospatial data is a CSV file that comprises:<br>
 1. Postal codes of Toronto<br>
 2. Coordinates of every postal code.

## 3.3 Foursquare API

We need data about the venues in each neighbourhood of a given borough. To obtain it we use Foursquare location data. Foursquare is a location data provider with information about all manner of venues and events within an area of interest, including venue names, locations, menus, and even photos. The Foursquare location platform is therefore used as the sole source of venue data, since all of the required venue information can be obtained through its API.<br>

After finding the list of neighbourhoods, we then connect to the Foursquare API to gather information about venues inside each and every neighbourhood. For each neighbourhood, we have chosen the radius to be 500 meters.<br>

The data retrieved from Foursquare contains information about venues within the specified distance of the latitude and longitude of each postal code. The information obtained per venue is as follows:<br>

 1. Neighbourhood : Name of the Neighbourhood<br>
 2. Neighbourhood Latitude : Latitude of the Neighbourhood<br>
 3. Neighbourhood Longitude : Longitude of the Neighbourhood<br>
 4. Venue : Name of the Venue<br>
 5. Venue Latitude : Latitude of Venue<br>
 6. Venue Longitude : Longitude of Venue<br>
 7. Venue Category : Category of Venue<br>
 
Based on the above information, we now have sufficient data for every postal code of Toronto. We will use this data to create a cluster model based on similar venue categories, and then present our findings to our client so that they can make the necessary decisions.

# 4. Methodology

To start our project with Python, we import the following libraries :<br>

 1. Pandas          : To read data from JSON and HTML, manipulate it, and analyze it.<br>
 2. Numpy           : To perform array-based numerical operations on columns.<br>
 3. json            : To handle JSON data.<br>
 4. geopy           : To fetch the coordinates of places.<br>
 5. matplotlib      : To generate the colors used for the cluster markers on the map.<br>
 6. sklearn.cluster : To create a clustering model.<br>
 7. folium          : To create a map according to the coordinates.<br>
 8. requests        : To handle HTTP requests.

In [107]:
import pandas as pd
import numpy as np
import json
from geopy.geocoders import Nominatim
from pandas import json_normalize  # top-level import; the pandas.io.json path is deprecated since pandas 1.0
import matplotlib.cm as cm
import matplotlib.colors as colors
from sklearn.cluster import KMeans
import folium
import requests

## 4.1 Fetching the data from Wikipedia

In [108]:
df = pd.read_html('https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M')

In [109]:
df1 = df[0]

In [110]:
df1

Unnamed: 0,0,1,2,3,4,5,6,7,8
0,M1ANot assigned,M2ANot assigned,M3ANorth York(Parkwoods),M4ANorth York(Victoria Village),M5ADowntown Toronto(Regent Park / Harbourfront),M6ANorth York(Lawrence Manor / Lawrence Heights),M7AQueen's Park(Ontario Provincial Government),M8ANot assigned,M9AEtobicoke(Islington Avenue)
1,M1BScarborough(Malvern / Rouge),M2BNot assigned,M3BNorth York(Don Mills)North,M4BEast York(Parkview Hill / Woodbine Gardens),"M5BDowntown Toronto(Garden District, Ryerson)",M6BNorth York(Glencairn),M7BNot assigned,M8BNot assigned,M9BEtobicoke(West Deane Park / Princess Garden...
2,M1CScarborough(Rouge Hill / Port Union / Highl...,M2CNot assigned,M3CNorth York(Don Mills)South(Flemingdon Park),M4CEast York(Woodbine Heights),M5CDowntown Toronto(St. James Town),M6CYork(Humewood-Cedarvale),M7CNot assigned,M8CNot assigned,M9CEtobicoke(Eringate / Bloordale Gardens / Ol...
3,M1EScarborough(Guildwood / Morningside / West ...,M2ENot assigned,M3ENot assigned,M4EEast Toronto(The Beaches),M5EDowntown Toronto(Berczy Park),M6EYork(Caledonia-Fairbanks),M7ENot assigned,M8ENot assigned,M9ENot assigned
4,M1GScarborough(Woburn),M2GNot assigned,M3GNot assigned,M4GEast York(Leaside),M5GDowntown Toronto(Central Bay Street),M6GDowntown Toronto(Christie),M7GNot assigned,M8GNot assigned,M9GNot assigned
5,M1HScarborough(Cedarbrae),M2HNorth York(Hillcrest Village),M3HNorth York(Bathurst Manor / Wilson Heights ...,M4HEast York(Thorncliffe Park),M5HDowntown Toronto(Richmond / Adelaide / King),M6HWest Toronto(Dufferin / Dovercourt Village),M7HNot assigned,M8HNot assigned,M9HNot assigned
6,M1JScarborough(Scarborough Village),M2JNorth York(Fairview / Henry Farm / Oriole),M3JNorth York(Northwood Park / York University),M4JEast YorkEast Toronto(The Danforth East),M5JDowntown Toronto(Harbourfront East / Union ...,M6JWest Toronto(Little Portugal / Trinity),M7JNot assigned,M8JNot assigned,M9JNot assigned
7,M1KScarborough(Kennedy Park / Ionview / East B...,M2KNorth York(Bayview Village),M3KNorth York(Downsview)East (CFB Toronto),M4KEast Toronto(The Danforth West / Riverdale),M5KDowntown Toronto(Toronto Dominion Centre / ...,M6KWest Toronto(Brockton / Parkdale Village / ...,M7KNot assigned,M8KNot assigned,M9KNot assigned
8,M1LScarborough(Golden Mile / Clairlea / Oakridge),M2LNorth York(York Mills / Silver Hills),M3LNorth York(Downsview)West,M4LEast Toronto(India Bazaar / The Beaches West),M5LDowntown Toronto(Commerce Court / Victoria ...,M6LNorth York(North Park / Maple Leaf Park / U...,M7LNot assigned,M8LNot assigned,M9LNorth York(Humber Summit)
9,M1MScarborough(Cliffside / Cliffcrest / Scarbo...,M2MNorth York(Willowdale / Newtonbrook),M3MNorth York(Downsview)Central,M4MEast Toronto(Studio District),M5MNorth York(Bedford Park / Lawrence Manor East),M6MYork(Del Ray / Mount Dennis / Keelsdale and...,M7MNot assigned,M8MNot assigned,M9MNorth York(Humberlea / Emery)


## 4.2 Creating a proper dataframe with specific columns and cleaning it 

In [111]:
data1 = pd.DataFrame(columns= ['Code','City'])

In [112]:
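rows = []
for i in range(len(df1)):
    for j in range(len(df1.columns)):
        p = df1.iat[i,j]
        # The first three characters are the postal code; skip unassigned codes
        if p[3:] == 'Not assigned':
            continue
        rows.append({'Code':p[0:3], 'City':p[3:]})
# DataFrame.append was removed in pandas 2.0, so the frame is built in one step
data1 = pd.DataFrame(rows, columns = ['Code','City'])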

In [113]:
data1

Unnamed: 0,Code,City
0,M3A,North York(Parkwoods)
1,M4A,North York(Victoria Village)
2,M5A,Downtown Toronto(Regent Park / Harbourfront)
3,M6A,North York(Lawrence Manor / Lawrence Heights)
4,M7A,Queen's Park(Ontario Provincial Government)
...,...,...
98,M8X,Etobicoke(The Kingsway / Montgomery Road / Old...
99,M4Y,Downtown Toronto(Church and Wellesley)
100,M7Y,East TorontoBusiness reply mail Processing Cen...
101,M8Y,Etobicoke(Old Mill South / King's Mill Park / ...


In [114]:
data1['Borough'] = data1['City'].str.split('(').str[0]

In [115]:
data1

Unnamed: 0,Code,City,Borough
0,M3A,North York(Parkwoods),North York
1,M4A,North York(Victoria Village),North York
2,M5A,Downtown Toronto(Regent Park / Harbourfront),Downtown Toronto
3,M6A,North York(Lawrence Manor / Lawrence Heights),North York
4,M7A,Queen's Park(Ontario Provincial Government),Queen's Park
...,...,...,...
98,M8X,Etobicoke(The Kingsway / Montgomery Road / Old...,Etobicoke
99,M4Y,Downtown Toronto(Church and Wellesley),Downtown Toronto
100,M7Y,East TorontoBusiness reply mail Processing Cen...,East TorontoBusiness reply mail Processing Cen...
101,M8Y,Etobicoke(Old Mill South / King's Mill Park / ...,Etobicoke


In [116]:
data1['Neighborhood'] = data1['City'].str.split('(').str[1]
data1

Unnamed: 0,Code,City,Borough,Neighborhood
0,M3A,North York(Parkwoods),North York,Parkwoods)
1,M4A,North York(Victoria Village),North York,Victoria Village)
2,M5A,Downtown Toronto(Regent Park / Harbourfront),Downtown Toronto,Regent Park / Harbourfront)
3,M6A,North York(Lawrence Manor / Lawrence Heights),North York,Lawrence Manor / Lawrence Heights)
4,M7A,Queen's Park(Ontario Provincial Government),Queen's Park,Ontario Provincial Government)
...,...,...,...,...
98,M8X,Etobicoke(The Kingsway / Montgomery Road / Old...,Etobicoke,The Kingsway / Montgomery Road / Old Mill North)
99,M4Y,Downtown Toronto(Church and Wellesley),Downtown Toronto,Church and Wellesley)
100,M7Y,East TorontoBusiness reply mail Processing Cen...,East TorontoBusiness reply mail Processing Cen...,Enclave of M4L)
101,M8Y,Etobicoke(Old Mill South / King's Mill Park / ...,Etobicoke,Old Mill South / King's Mill Park / Sunnylea /...


In [117]:
data1['Neighborhood'] = data1['Neighborhood'].str.split(')').str[0]
data1

Unnamed: 0,Code,City,Borough,Neighborhood
0,M3A,North York(Parkwoods),North York,Parkwoods
1,M4A,North York(Victoria Village),North York,Victoria Village
2,M5A,Downtown Toronto(Regent Park / Harbourfront),Downtown Toronto,Regent Park / Harbourfront
3,M6A,North York(Lawrence Manor / Lawrence Heights),North York,Lawrence Manor / Lawrence Heights
4,M7A,Queen's Park(Ontario Provincial Government),Queen's Park,Ontario Provincial Government
...,...,...,...,...
98,M8X,Etobicoke(The Kingsway / Montgomery Road / Old...,Etobicoke,The Kingsway / Montgomery Road / Old Mill North
99,M4Y,Downtown Toronto(Church and Wellesley),Downtown Toronto,Church and Wellesley
100,M7Y,East TorontoBusiness reply mail Processing Cen...,East TorontoBusiness reply mail Processing Cen...,Enclave of M4L
101,M8Y,Etobicoke(Old Mill South / King's Mill Park / ...,Etobicoke,Old Mill South / King's Mill Park / Sunnylea /...
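
The splitting steps above can also be written as a single regular expression. The following is only a sketch on two sample strings taken from the table; the `sample` frame and the pattern are illustrative assumptions and are not part of the original notebook.

```python
import pandas as pd

# Two 'City' strings copied from the table above
sample = pd.DataFrame({
    'City': ['North York(Parkwoods)',
             'Downtown Toronto(Regent Park / Harbourfront)']
})

# Borough: everything before the first '('; Neighborhood: text inside the parentheses
parts = sample['City'].str.extract(r'^(?P<Borough>[^(]+)\((?P<Neighborhood>[^)]*)\)')
sample = sample.join(parts)
print(sample[['Borough', 'Neighborhood']])
```

Like the original `split` calls, this drops any text that follows the closing parenthesis.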


In [118]:
d = data1['Neighborhood'].str.split(')').str[1]
d.isna().sum()

103

In [119]:
data1.drop(data1.columns[[1]], axis =1 , inplace = True)
data1

Unnamed: 0,Code,Borough,Neighborhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,Regent Park / Harbourfront
3,M6A,North York,Lawrence Manor / Lawrence Heights
4,M7A,Queen's Park,Ontario Provincial Government
...,...,...,...
98,M8X,Etobicoke,The Kingsway / Montgomery Road / Old Mill North
99,M4Y,Downtown Toronto,Church and Wellesley
100,M7Y,East TorontoBusiness reply mail Processing Cen...,Enclave of M4L
101,M8Y,Etobicoke,Old Mill South / King's Mill Park / Sunnylea /...


In [120]:
data1['Neighborhood'] = data1['Neighborhood'].str.replace('/',',')
data1

Unnamed: 0,Code,Borough,Neighborhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,"Regent Park , Harbourfront"
3,M6A,North York,"Lawrence Manor , Lawrence Heights"
4,M7A,Queen's Park,Ontario Provincial Government
...,...,...,...
98,M8X,Etobicoke,"The Kingsway , Montgomery Road , Old Mill North"
99,M4Y,Downtown Toronto,Church and Wellesley
100,M7Y,East TorontoBusiness reply mail Processing Cen...,Enclave of M4L
101,M8Y,Etobicoke,"Old Mill South , King's Mill Park , Sunnylea ,..."


In [121]:
data_1 = data1.rename(columns = {'Code':'Postal Code'})

In [122]:
data_1['Borough']=data_1['Borough'].replace({'Downtown TorontoStn A PO Boxes25 The Esplanade':'Downtown Toronto Stn A',
                                             'East TorontoBusiness reply mail Processing Centre969 Eastern':'East Toronto Business',
                                             'EtobicokeNorthwest':'Etobicoke Northwest','East YorkEast Toronto':'East York/East Toronto',
                                             'MississaugaCanada Post Gateway Processing Centre':'Mississauga'})

## 4.3 Cleaned Dataset with needed columns

In [123]:
data_1

Unnamed: 0,Postal Code,Borough,Neighborhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,"Regent Park , Harbourfront"
3,M6A,North York,"Lawrence Manor , Lawrence Heights"
4,M7A,Queen's Park,Ontario Provincial Government
...,...,...,...
98,M8X,Etobicoke,"The Kingsway , Montgomery Road , Old Mill North"
99,M4Y,Downtown Toronto,Church and Wellesley
100,M7Y,East Toronto Business,Enclave of M4L
101,M8Y,Etobicoke,"Old Mill South , King's Mill Park , Sunnylea ,..."


## 4.4 Joining the coordinates data with the postal code data to form the full dataset

In [124]:
data2 = pd.read_csv('C:/Users/AMIT/Downloads/Geospatial_Coordinates.csv')

In [125]:
data_1 = data_1.join(data2.set_index('Postal Code'), on = 'Postal Code')

In [126]:
data_1

Unnamed: 0,Postal Code,Borough,Neighborhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.753259,-79.329656
1,M4A,North York,Victoria Village,43.725882,-79.315572
2,M5A,Downtown Toronto,"Regent Park , Harbourfront",43.654260,-79.360636
3,M6A,North York,"Lawrence Manor , Lawrence Heights",43.718518,-79.464763
4,M7A,Queen's Park,Ontario Provincial Government,43.662301,-79.389494
...,...,...,...,...,...
98,M8X,Etobicoke,"The Kingsway , Montgomery Road , Old Mill North",43.653654,-79.506944
99,M4Y,Downtown Toronto,Church and Wellesley,43.665860,-79.383160
100,M7Y,East Toronto Business,Enclave of M4L,43.662744,-79.321558
101,M8Y,Etobicoke,"Old Mill South , King's Mill Park , Sunnylea ,...",43.636258,-79.498509
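
A join like the one above silently leaves `NaN` coordinates for any postal code missing from the CSV. The sketch below shows one way to check for that with `merge`; the two small frames and the code `M0X` are made up for illustration, while the coordinates are copied from the table above.

```python
import pandas as pd

codes = pd.DataFrame({
    'Postal Code': ['M3A', 'M4A', 'M0X'],   # 'M0X' is a hypothetical unmatched code
    'Borough': ['North York', 'North York', 'Hypothetical'],
})
coords = pd.DataFrame({
    'Postal Code': ['M3A', 'M4A'],
    'Latitude': [43.753259, 43.725882],
    'Longitude': [-79.329656, -79.315572],
})

# validate raises if a postal code is duplicated on either side;
# indicator adds a '_merge' column telling where each row came from
joined = codes.merge(coords, on='Postal Code', how='left',
                     validate='one_to_one', indicator=True)
missing = joined.loc[joined['_merge'] == 'left_only', 'Postal Code'].tolist()
print('Postal codes without coordinates:', missing)
```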


## 4.5 Fetching the latitude and longitude of Toronto using a geopy geocoder

In [127]:
geolocator = Nominatim(user_agent = 'Tourist')
location = geolocator.geocode('Toronto, Canada')
lat = location.latitude
lng = location.longitude
print('The coordinates of Toronto are {} and {}'.format(lat,lng))

The coordinates of Toronto are 43.6534817 and -79.3839347


## 4.6 Displaying the map of Toronto with the postal codes of its neighborhoods

In [128]:
toronto_map = folium.Map(location = [lat,lng] , zoom_start =10)
for lat1,lng1,neigh,bor in zip(data_1['Latitude'],data_1['Longitude'], data_1['Neighborhood'], data_1['Borough']):
    mark_top = "{}, {}".format(neigh,bor)
    mark_top = folium.Popup(mark_top, parse_html = True)
    folium.CircleMarker([lat1,lng1], radius = 5, popup = mark_top, color = 'blue', fill = True, fill_color = 'grey',
                       fill_opacity = 0.2, parse_html = False).add_to(toronto_map)


In [129]:
toronto_map

## 4.7 Using the Foursquare API to get the venues within a 500 m radius

### 4.7.1 Credentials to connect with Foursquare API.

In [130]:
import os

# Credentials are read from environment variables so that they are not published
# with the notebook; the variable names below are placeholders chosen here.
CLIENT_ID = os.environ['FOURSQUARE_CLIENT_ID']
CLIENT_SECRET = os.environ['FOURSQUARE_CLIENT_SECRET']
VERSION = '20180605'
LIMIT = 100


### 4.7.2 Calling the API through a URL to fetch venues for every neighborhood.

In [131]:
def getvenues(name, lati, lngi, radius = 500):
    ven_list = []
    for code, lat,lng in zip(name,lati,lngi):
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, CLIENT_SECRET, VERSION, lat, lng, radius, LIMIT)
        result = requests.get(url).json()["response"]['groups'][0]['items']
        ven_list.append([(code, lat, lng, v['venue']['name'], v['venue']['location']['lat'], v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in result])
    nearby_venues = pd.DataFrame([item for venue_list in ven_list for item in venue_list])
    nearby_venues.columns = ['Postal Code', 'Neighborhood Latitude', 'Neighborhood Longitude', 'Venue', 'Venue Latitude', 
                  'Venue Longitude', 'Venue Category']
    return(nearby_venues)
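
`getvenues` indexes straight into `['groups'][0]['items']` and `['categories'][0]`, which raises an exception if a response has no groups or a venue has no category. A more defensive parsing step might look like the sketch below. The helper name `parse_explore_items` and the sample payload are hypothetical; the payload only mimics the response shape that `getvenues` relies on.

```python
def parse_explore_items(code, lat, lng, payload):
    """Flatten one explore response into row tuples.

    Venues without a category get None instead of raising an IndexError.
    """
    groups = payload.get('response', {}).get('groups', [])
    items = groups[0].get('items', []) if groups else []
    rows = []
    for item in items:
        venue = item['venue']
        categories = venue.get('categories') or []
        category = categories[0]['name'] if categories else None
        rows.append((code, lat, lng, venue['name'],
                     venue['location']['lat'], venue['location']['lng'],
                     category))
    return rows


# Hypothetical payload with the same nesting that getvenues indexes into
sample_payload = {'response': {'groups': [{'items': [
    {'venue': {'name': 'KFC',
               'location': {'lat': 43.754387, 'lng': -79.333021},
               'categories': [{'name': 'Fast Food Restaurant'}]}},
    {'venue': {'name': 'Unnamed Spot',
               'location': {'lat': 43.75, 'lng': -79.33},
               'categories': []}},
]}]}}

rows = parse_explore_items('M3A', 43.753259, -79.329656, sample_payload)
empty = parse_explore_items('M3A', 43.753259, -79.329656, {'response': {}})
print(rows)
print(empty)
```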

In [132]:
toronto_venues = getvenues(data_1['Postal Code'],data_1['Latitude'],data_1['Longitude'])

In [133]:
toronto_venues

Unnamed: 0,Postal Code,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,M3A,43.753259,-79.329656,Brookbanks Park,43.751976,-79.332140,Park
1,M3A,43.753259,-79.329656,KFC,43.754387,-79.333021,Fast Food Restaurant
2,M3A,43.753259,-79.329656,Variety Store,43.751974,-79.333114,Food & Drink Shop
3,M3A,43.753259,-79.329656,GreenWin pool,43.756232,-79.333842,Pool
4,M4A,43.725882,-79.315572,Victoria Village Arena,43.723481,-79.315635,Hockey Arena
...,...,...,...,...,...,...,...
2118,M8Z,43.628841,-79.520999,Subway,43.631659,-79.519001,Sandwich Place
2119,M8Z,43.628841,-79.520999,Jim & Maria's No Frills,43.631152,-79.518617,Grocery Store
2120,M8Z,43.628841,-79.520999,Islington Florist & Nursery,43.630156,-79.518718,Flower Shop
2121,M8Z,43.628841,-79.520999,Koala Tan Tanning Salon & Sunless Spa,43.631370,-79.519006,Tanning Salon
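
As a sanity check on the 500 m radius, the distance between a postal code's coordinates and one of its returned venues can be computed with the haversine formula. The sketch below uses the M3A and Brookbanks Park coordinates from the table above; the helper `haversine_m` and the mean Earth radius of 6,371 km are assumptions of this example.

```python
from math import asin, cos, radians, sin, sqrt


def haversine_m(lat1, lng1, lat2, lng2):
    """Great-circle distance in meters between two (lat, lng) points in degrees."""
    earth_radius_m = 6371000.0  # mean Earth radius, an approximation
    dlat = radians(lat2 - lat1)
    dlng = radians(lng2 - lng1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlng / 2) ** 2
    return 2 * earth_radius_m * asin(sqrt(a))


# M3A coordinates and the first venue returned for it (Brookbanks Park)
distance = haversine_m(43.753259, -79.329656, 43.751976, -79.332140)
print('Distance in meters:', round(distance))
print('Within 500 m:', distance <= 500)
```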


### 4.7.3 Number of venues for every postal code

In [134]:
toronto_venues.groupby('Postal Code').count()

Unnamed: 0_level_0,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Postal Code,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
M1B,1,1,1,1,1,1
M1C,1,1,1,1,1,1
M1E,9,9,9,9,9,9
M1G,3,3,3,3,3,3
M1H,8,8,8,8,8,8
...,...,...,...,...,...,...
M9N,2,2,2,2,2,2
M9P,9,9,9,9,9,9
M9R,4,4,4,4,4,4
M9V,10,10,10,10,10,10
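
Since `count()` repeats the same number in every column, `size()` gives a more compact summary, and it makes it easy to list postal codes with very few venues. This is a sketch on a made-up four-row frame; the threshold of 2 is an arbitrary choice for illustration.

```python
import pandas as pd

# Toy stand-in for toronto_venues with only the columns needed here
venues = pd.DataFrame({
    'Postal Code': ['M1B', 'M1E', 'M1E', 'M1E'],
    'Venue': ['a', 'b', 'c', 'd'],
})

# One number per postal code instead of one per column
counts = venues.groupby('Postal Code').size().rename('Venue Count')

# Postal codes with fewer than 2 venues (threshold chosen for illustration)
sparse = counts[counts < 2].index.tolist()
print(counts)
print(sparse)
```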


## 4.8 Using encoding technique on venues

In [135]:
toronto_data = pd.get_dummies(toronto_venues[['Venue Category']], prefix = "", prefix_sep = "")
toronto_data['Postal Code'] = toronto_venues['Postal Code']
toronto_data = toronto_data[[toronto_data.columns[-1]] + list(toronto_data.columns[:-1])]

In [136]:
toronto_data

Unnamed: 0,Postal Code,Accessories Store,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Antique Shop,...,Train Station,Truck Stop,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Warehouse Store,Wine Bar,Wings Joint,Women's Store,Yoga Studio
0,M3A,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,M3A,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,M3A,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,M3A,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,M4A,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2118,M8Z,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2119,M8Z,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2120,M8Z,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2121,M8Z,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


## 4.9 Calculating the frequency of each venue category for every postal code

In [137]:
toronto_data1 = toronto_data.groupby('Postal Code').mean().reset_index()
toronto_data1

Unnamed: 0,Postal Code,Accessories Store,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Antique Shop,...,Train Station,Truck Stop,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Warehouse Store,Wine Bar,Wings Joint,Women's Store,Yoga Studio
0,M1B,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,M1C,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,M1E,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,M1G,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,M1H,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
95,M9N,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
96,M9P,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
97,M9R,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
98,M9V,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
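
Taking the mean of one-hot columns per postal code gives the share of that postal code's venues that fall in each category. The toy example below illustrates this on four made-up rows; the postal codes `A` and `B` are placeholders.

```python
import pandas as pd

venues = pd.DataFrame({
    'Postal Code': ['A', 'A', 'A', 'B'],
    'Venue Category': ['Coffee Shop', 'Coffee Shop', 'Park', 'Park'],
})

# One column per category, 1 where the venue belongs to it
onehot = pd.get_dummies(venues['Venue Category'], dtype=int)
onehot.insert(0, 'Postal Code', venues['Postal Code'])

# Mean of 0/1 indicators = fraction of venues in each category
freq = onehot.groupby('Postal Code').mean()
print(freq)
```

Each row of `freq` sums to 1, which is why a postal code with a single venue shows a value of 1.0 for that venue's category.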


## 4.10 Displaying the top 5 nearby venues for every postal code.

In [138]:
for code in toronto_data1['Postal Code']:
    print("###############"+code+"###############")
    d = toronto_data1[toronto_data1['Postal Code'] == code].T.reset_index()
    d.columns = ['Venue','Value']
    d = d.iloc[1:]
    d['Value'] = d['Value'].astype(float)
    d = d.round({'Value':2})
    print(d.sort_values('Value' , ascending = False).reset_index(drop = True).head(5))
    print("\n")

###############M1B###############
                             Venue  Value
0             Fast Food Restaurant    1.0
1                    Metro Station    0.0
2  Molecular Gastronomy Restaurant    0.0
3       Modern European Restaurant    0.0
4                Mobile Phone Shop    0.0


###############M1C###############
                             Venue  Value
0                              Bar    1.0
1                Accessories Store    0.0
2               Mexican Restaurant    0.0
3  Molecular Gastronomy Restaurant    0.0
4       Modern European Restaurant    0.0


###############M1E###############
                 Venue  Value
0       Medical Center   0.11
1   Mexican Restaurant   0.11
2         Intersection   0.11
3           Restaurant   0.11
4  Rental Car Location   0.11


###############M1G###############
                             Venue  Value
0                      Coffee Shop   0.67
1            Korean BBQ Restaurant   0.33
2                Accessories Store   0.00
...

             Venue  Value
0      Coffee Shop   0.08
1   Clothing Store   0.05
2             Café   0.05
3       Restaurant   0.04
4  Thai Restaurant   0.04


###############M5J###############
         Venue  Value
0  Coffee Shop   0.13
1     Aquarium   0.05
2         Café   0.04
3        Hotel   0.04
4   Restaurant   0.03


###############M5K###############
         Venue  Value
0  Coffee Shop   0.11
1        Hotel   0.07
2         Café   0.06
3   Restaurant   0.04
4       Bakery   0.03


###############M5L###############
         Venue  Value
0  Coffee Shop   0.14
1   Restaurant   0.07
2        Hotel   0.06
3         Café   0.06
4          Gym   0.04


###############M5M###############
                Venue  Value
0  Italian Restaurant   0.09
1      Sandwich Place   0.09
2         Coffee Shop   0.09
3    Toy / Game Store   0.04
4             Butcher   0.04


###############M5N###############
                             Venue  Value
0                           Garden    1.0
...

## 4.11 Creating the dataset with top 10 nearby venues for every postal code.

In [139]:
def get_most_common_venues(row, n=10):
    categ = row.iloc[1:]
    categ_1 = categ.sort_values(ascending = False)
    return(categ_1.index.values[0:n])

In [140]:
columns = ['Postal Code']
for i in np.arange(10):
    columns.append('Common_{}'.format(i+1))
toronto_final = pd.DataFrame(columns = columns)
toronto_final['Postal Code'] = toronto_data1['Postal Code']
for i in np.arange(toronto_data1.shape[0]):
    toronto_final.iloc[i,1:] = get_most_common_venues(toronto_data1.iloc[i,:])

In [141]:
toronto_final

Unnamed: 0,Postal Code,Common_1,Common_2,Common_3,Common_4,Common_5,Common_6,Common_7,Common_8,Common_9,Common_10
0,M1B,Fast Food Restaurant,Dumpling Restaurant,Discount Store,Distribution Center,Dog Run,Doner Restaurant,Donut Shop,Drugstore,Eastern European Restaurant,Farm
1,M1C,Bar,Yoga Studio,Dumpling Restaurant,Distribution Center,Dog Run,Doner Restaurant,Donut Shop,Drugstore,Eastern European Restaurant,Farmers Market
2,M1E,Restaurant,Medical Center,Bank,Intersection,Mexican Restaurant,Donut Shop,Rental Car Location,Electronics Store,Breakfast Spot,Dumpling Restaurant
3,M1G,Coffee Shop,Korean BBQ Restaurant,Yoga Studio,Eastern European Restaurant,Distribution Center,Dog Run,Doner Restaurant,Donut Shop,Drugstore,Dumpling Restaurant
4,M1H,Caribbean Restaurant,Gas Station,Bank,Fried Chicken Joint,Athletics & Sports,Thai Restaurant,Bakery,Hakka Restaurant,Electronics Store,Eastern European Restaurant
...,...,...,...,...,...,...,...,...,...,...,...
95,M9N,Park,Convenience Store,Eastern European Restaurant,Distribution Center,Dog Run,Doner Restaurant,Donut Shop,Drugstore,Dumpling Restaurant,Electronics Store
96,M9P,Pizza Place,Intersection,Discount Store,Middle Eastern Restaurant,Chinese Restaurant,Coffee Shop,Playground,Sandwich Place,Yoga Studio,Dog Run
97,M9R,Park,Mobile Phone Shop,Bus Line,Sandwich Place,Diner,Discount Store,Distribution Center,Dog Run,Doner Restaurant,Yoga Studio
98,M9V,Grocery Store,Beer Store,Pizza Place,Discount Store,Sandwich Place,Coffee Shop,Fried Chicken Joint,Pharmacy,Fast Food Restaurant,Eastern European Restaurant
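
For postal codes with only a few venues, the table above is padded with categories whose frequency is zero (for example, M1B has a single venue, yet ten categories are listed). One possible variant keeps only categories that actually occur. The helper `top_categories` and the small frequency table below are illustrative assumptions.

```python
import pandas as pd


def top_categories(row, n=3):
    """Return up to n category names with non-zero frequency, highest first."""
    nonzero = row[row > 0].sort_values(ascending=False)
    return list(nonzero.index[:n])


# Small made-up frequency table in the same layout as toronto_data1
freq = pd.DataFrame(
    {'Coffee Shop': [0.67, 0.0], 'Park': [0.33, 1.0], 'Bar': [0.0, 0.0]},
    index=['M1G', 'M1B'],
)

top = {code: top_categories(row) for code, row in freq.iterrows()}
print(top)
```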


## 4.12 Creating a clustering model

In [142]:
toronto_clus = toronto_data1.drop('Postal Code', axis = 1)  # axis must be passed by keyword in pandas 2.0+

In [143]:
kclus = 4
# n_init is set explicitly because its default changed in newer scikit-learn releases
toro_mod = KMeans(n_clusters = kclus, n_init = 10, random_state = 0).fit(toronto_clus)

In [144]:
toro_mod.labels_[0:10]

array([3, 3, 3, 3, 3, 1, 3, 3, 3, 3])
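
The notebook fixes `kclus = 4` without showing how that value was chosen. One common way to compare candidate values is the silhouette score. The sketch below runs on synthetic two-dimensional data rather than on `toronto_clus`, so the blob data and the range of k are assumptions made only to show the mechanics.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Synthetic data: three tight, well-separated blobs (not the Toronto data)
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=c, scale=0.05, size=(20, 2)) for c in (0.0, 1.0, 2.0)])

scores = {}
for k in range(2, 6):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels)  # higher means better separated clusters

best_k = max(scores, key=scores.get)
print(scores)
print('Best k by silhouette:', best_k)
```

On real venue frequencies the scores are usually much closer together, so the result is a guide rather than a definitive answer.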

In [145]:
toronto_final['Cluster name'] = toro_mod.labels_ 

## 4.13 Joining the Toronto location data (latitude and longitude) with the top 10 nearby venues of every postal code.

In [146]:
toronto_real = data_1
toronto_real = toronto_real.merge(toronto_final, on = 'Postal Code',how = 'outer')
toronto_real

Unnamed: 0,Postal Code,Borough,Neighborhood,Latitude,Longitude,Common_1,Common_2,Common_3,Common_4,Common_5,Common_6,Common_7,Common_8,Common_9,Common_10,Cluster name
0,M3A,North York,Parkwoods,43.753259,-79.329656,Park,Fast Food Restaurant,Food & Drink Shop,Pool,Escape Room,Electronics Store,Eastern European Restaurant,Dumpling Restaurant,Drugstore,Dessert Shop,0.0
1,M4A,North York,Victoria Village,43.725882,-79.315572,Portuguese Restaurant,French Restaurant,Coffee Shop,Hockey Arena,Yoga Studio,Discount Store,Distribution Center,Dog Run,Doner Restaurant,Donut Shop,3.0
2,M5A,Downtown Toronto,"Regent Park , Harbourfront",43.654260,-79.360636,Coffee Shop,Park,Pub,Bakery,Theater,Café,Breakfast Spot,Yoga Studio,Bank,French Restaurant,3.0
3,M6A,North York,"Lawrence Manor , Lawrence Heights",43.718518,-79.464763,Furniture / Home Store,Clothing Store,Accessories Store,Gift Shop,Event Space,Athletics & Sports,Coffee Shop,Boutique,Vietnamese Restaurant,Creperie,3.0
4,M7A,Queen's Park,Ontario Provincial Government,43.662301,-79.389494,Coffee Shop,Sushi Restaurant,Café,Yoga Studio,Bar,Beer Bar,Spa,Smoothie Shop,Sandwich Place,Burrito Place,3.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
98,M8X,Etobicoke,"The Kingsway , Montgomery Road , Old Mill North",43.653654,-79.506944,Park,River,Colombian Restaurant,Comfort Food Restaurant,Event Space,Ethiopian Restaurant,Escape Room,Electronics Store,Eastern European Restaurant,Dumpling Restaurant,0.0
99,M4Y,Downtown Toronto,Church and Wellesley,43.665860,-79.383160,Coffee Shop,Sushi Restaurant,Japanese Restaurant,Gay Bar,Restaurant,Yoga Studio,Hotel,Men's Store,Pub,Mediterranean Restaurant,3.0
100,M7Y,East Toronto Business,Enclave of M4L,43.662744,-79.321558,Yoga Studio,Auto Workshop,Gym / Fitness Center,Garden Center,Garden,Fast Food Restaurant,Farmers Market,Light Rail Station,Comic Shop,Pizza Place,3.0
101,M8Y,Etobicoke,"Old Mill South , King's Mill Park , Sunnylea ,...",43.636258,-79.498509,Baseball Field,Yoga Studio,Dumpling Restaurant,Distribution Center,Dog Run,Doner Restaurant,Donut Shop,Drugstore,Eastern European Restaurant,Farmers Market,2.0
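
The cluster labels appear as `0.0`, `3.0`, and so on because postal codes without any venues receive `NaN` in the merge, which forces the column to float. The sketch below reproduces this on three made-up postal codes and shows one way to get integer labels back.

```python
import pandas as pd

left = pd.DataFrame({'Postal Code': ['M1A', 'M1B', 'M1C']})           # hypothetical codes
right = pd.DataFrame({'Postal Code': ['M1A', 'M1C'], 'Cluster name': [0, 3]})

merged = left.merge(right, on='Postal Code', how='left')
print(merged['Cluster name'].dtype)   # float64, because M1B has no label

# Drop unlabeled rows on an explicit copy, then restore the integer dtype
clustered = merged.dropna(subset=['Cluster name']).copy()
clustered['Cluster name'] = clustered['Cluster name'].astype(int)
print(clustered)
```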


In [147]:
# .copy() makes toronto_real1 an independent frame, which avoids the SettingWithCopyWarning
toronto_real1 = toronto_real[toronto_real['Cluster name'].notna()].copy()
toronto_real1['Cluster name'] = toronto_real1['Cluster name'].astype(int)


## 4.14 Creating a map of Toronto with neighborhoods colored by cluster

In [148]:
tor_map = folium.Map(location = [lat,lng], zoom_start = 10)

In [149]:
x = np.arange(4)
y = [i + x + (i*x)**2 for i in range(4)]
col_arr = cm.rainbow(np.linspace(0,1,len(y)))
rain_col = [colors.rgb2hex(i) for i in col_arr]
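
In the cell above, `x` and `y` are only used to obtain the number of clusters, since `len(y)` is 4. A more direct way to build one color per cluster might look like the following sketch, where `palette` and `color_for` are names introduced here for illustration.

```python
import numpy as np
import matplotlib.cm as cm
import matplotlib.colors as colors

kclus = 4  # same number of clusters as in the notebook

# Sample the rainbow colormap at kclus evenly spaced points and convert to hex
palette = [colors.rgb2hex(c) for c in cm.rainbow(np.linspace(0, 1, kclus))]

# Explicit label-to-color mapping
color_for = {label: palette[label] for label in range(kclus)}
print(color_for)
```

Note that the map cell in the notebook indexes with `rain_col[grp-1]`, so label 0 wraps around to the last color; an explicit mapping such as `color_for` makes the assignment easier to read.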

In [150]:
# Loop variables are named lat1/lng1 so they do not overwrite the Toronto coordinates lat/lng
for lat1,lng1, nei, grp in zip(toronto_real1['Latitude'], toronto_real1['Longitude'], toronto_real1['Neighborhood'], toronto_real1['Cluster name']):
    mark = folium.Popup('{} - Cluster {}'.format(nei, grp), parse_html = True)
    folium.CircleMarker([lat1,lng1], radius = 5, popup = mark , color = rain_col[grp-1], fill = True, fill_color = rain_col[grp-1],
                        fill_opacity = 0.5).add_to(tor_map)
tor_map

## 4.15 Examining the clusters.

In [151]:
toronto_real1.loc[toronto_real1['Cluster name'] == 0, toronto_real1.columns[[1] + list(range(4, toronto_real1.shape[1]-1))]]

Unnamed: 0,Borough,Longitude,Common_1,Common_2,Common_3,Common_4,Common_5,Common_6,Common_7,Common_8,Common_9,Common_10
0,North York,-79.329656,Park,Fast Food Restaurant,Food & Drink Shop,Pool,Escape Room,Electronics Store,Eastern European Restaurant,Dumpling Restaurant,Drugstore,Dessert Shop
21,York,-79.453512,Park,Pool,Women's Store,Discount Store,Distribution Center,Dog Run,Doner Restaurant,Donut Shop,Drugstore,Dumpling Restaurant
35,East York/East Toronto,-79.338106,Park,Convenience Store,Coffee Shop,Dumpling Restaurant,Discount Store,Distribution Center,Dog Run,Doner Restaurant,Donut Shop,Drugstore
40,North York,-79.464763,Other Repair Shop,Snack Place,Park,Airport,Colombian Restaurant,Diner,Fabric Shop,Event Space,Ethiopian Restaurant,Escape Room
52,North York,-79.408493,Park,Dumpling Restaurant,Discount Store,Distribution Center,Dog Run,Doner Restaurant,Donut Shop,Drugstore,Eastern European Restaurant,Colombian Restaurant
61,Central Toronto,-79.38879,Park,Bus Line,Swim School,Drugstore,Distribution Center,Dog Run,Doner Restaurant,Donut Shop,Dumpling Restaurant,Diner
64,York,-79.518188,Park,Convenience Store,Eastern European Restaurant,Distribution Center,Dog Run,Doner Restaurant,Donut Shop,Drugstore,Dumpling Restaurant,Electronics Store
66,North York,-79.400049,Park,Convenience Store,Eastern European Restaurant,Distribution Center,Dog Run,Doner Restaurant,Donut Shop,Drugstore,Dumpling Restaurant,Electronics Store
68,Central Toronto,-79.411307,Jewelry Store,Trail,Park,Sushi Restaurant,Discount Store,Distribution Center,Dog Run,Doner Restaurant,Donut Shop,Drugstore
77,Etobicoke,-79.554724,Park,Mobile Phone Shop,Bus Line,Sandwich Place,Diner,Discount Store,Distribution Center,Dog Run,Doner Restaurant,Yoga Studio


In [152]:
toronto_real1.loc[toronto_real1['Cluster name'] == 1, toronto_real1.columns[[1] + list(range(4, toronto_real1.shape[1]-1))]]

Unnamed: 0,Borough,Longitude,Common_1,Common_2,Common_3,Common_4,Common_5,Common_6,Common_7,Common_8,Common_9,Common_10
32,Scarborough,-79.239476,Playground,Yoga Studio,Dumpling Restaurant,Discount Store,Distribution Center,Dog Run,Doner Restaurant,Donut Shop,Drugstore,Eastern European Restaurant


In [153]:
toronto_real1.loc[toronto_real1['Cluster name'] == 2, toronto_real1.columns[[1] + list(range(4, toronto_real1.shape[1]-1))]]

Unnamed: 0,Borough,Longitude,Common_1,Common_2,Common_3,Common_4,Common_5,Common_6,Common_7,Common_8,Common_9,Common_10
53,North York,-79.495697,Home Service,Baseball Field,Food Truck,Discount Store,Dog Run,Doner Restaurant,Donut Shop,Drugstore,Dumpling Restaurant,Yoga Studio
57,North York,-79.532242,Fabric Shop,Baseball Field,Yoga Studio,Dumpling Restaurant,Distribution Center,Dog Run,Doner Restaurant,Donut Shop,Drugstore,Eastern European Restaurant
101,Etobicoke,-79.498509,Baseball Field,Yoga Studio,Dumpling Restaurant,Distribution Center,Dog Run,Doner Restaurant,Donut Shop,Drugstore,Eastern European Restaurant,Farmers Market


In [154]:
toronto_real1.loc[toronto_real1['Cluster name'] == 3, toronto_real1.columns[[1] + list(range(4, toronto_real1.shape[1]-1))]]

Unnamed: 0,Borough,Longitude,Common_1,Common_2,Common_3,Common_4,Common_5,Common_6,Common_7,Common_8,Common_9,Common_10
1,North York,-79.315572,Portuguese Restaurant,French Restaurant,Coffee Shop,Hockey Arena,Yoga Studio,Discount Store,Distribution Center,Dog Run,Doner Restaurant,Donut Shop
2,Downtown Toronto,-79.360636,Coffee Shop,Park,Pub,Bakery,Theater,Café,Breakfast Spot,Yoga Studio,Bank,French Restaurant
3,North York,-79.464763,Furniture / Home Store,Clothing Store,Accessories Store,Gift Shop,Event Space,Athletics & Sports,Coffee Shop,Boutique,Vietnamese Restaurant,Creperie
4,Queen's Park,-79.389494,Coffee Shop,Sushi Restaurant,Café,Yoga Studio,Bar,Beer Bar,Spa,Smoothie Shop,Sandwich Place,Burrito Place
6,Scarborough,-79.194353,Fast Food Restaurant,Dumpling Restaurant,Discount Store,Distribution Center,Dog Run,Doner Restaurant,Donut Shop,Drugstore,Eastern European Restaurant,Farm
...,...,...,...,...,...,...,...,...,...,...,...,...
96,Downtown Toronto,-79.367675,Pizza Place,Coffee Shop,Bakery,Restaurant,Café,Park,Italian Restaurant,Pub,Indian Restaurant,Chinese Restaurant
97,Downtown Toronto,-79.382280,Coffee Shop,Café,Hotel,Japanese Restaurant,Gym,Restaurant,Steakhouse,American Restaurant,Bakery,Asian Restaurant
99,Downtown Toronto,-79.383160,Coffee Shop,Sushi Restaurant,Japanese Restaurant,Gay Bar,Restaurant,Yoga Studio,Hotel,Men's Store,Pub,Mediterranean Restaurant
100,East Toronto Business,-79.321558,Yoga Studio,Auto Workshop,Gym / Fitness Center,Garden Center,Garden,Fast Food Restaurant,Farmers Market,Light Rail Station,Comic Shop,Pizza Place
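
Cluster sizes can be summarized directly instead of being read off the four tables above. The sketch below uses only the first ten labels printed earlier (`toro_mod.labels_[0:10]`), so the counts are illustrative and do not cover the full dataset.

```python
import pandas as pd

# First ten cluster labels as printed in section 4.12
labels = pd.Series([3, 3, 3, 3, 3, 1, 3, 3, 3, 3], name='Cluster name')

# Number of postal codes per cluster, ordered by cluster label
sizes = labels.value_counts().sort_index()
print(sizes)
```

Applied to `toronto_real1['Cluster name']`, the same call would give the size of every cluster.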


# 5. Results and Discussion

The neighbourhoods of Toronto are culturally diverse. They offer all kinds of entertainment venues: restaurants (Indian, Chinese, Italian, Thai, Japanese, etc.) as well as juice bars, bars, coffee shops, breakfast spots, grocery stores, fish markets, farmers markets, pubs, and more. The main modes of transportation are buses and trains. For outdoor entertainment and leisure there are parks, zoos, yoga studios, hotels, flower shops, gyms, spas, salons, etc.

# 6. Conclusion

The purpose of this project was to find similar neighbourhoods based on the venues within a 500 m radius. We created a map with markers showing which neighborhoods have similar types of venues nearby.<br>

The clustering shows that Cluster-1 and Cluster-2 contain only a few neighborhoods. We therefore recommend that our client open their restaurant/café branches in these areas, since these neighbourhoods have fewer venues than the neighborhoods in the other clusters, i.e., Cluster-0 and Cluster-3.

# 7. References

 1. The Battle of Neighborhoods - Applied Data Science Capstone Course (Week 3 and Week 4).
 2. Foursquare API.