# Battle of Neighborhood - Toronto

# Introduction

The report is part of the IBM Applied Data Science Specialization Capstone Project. The main objectives of this project were to define a business problem, look for data on the web and use Foursquare location data to compare to different neighborhoods of Toronto to figure out which neighborhood is suitable for starting a new restaurant business.

# Business Problem

In this capstone project, we will analyze the neighborhoods in Toronto to identify the most profitable neighborhood for opening an Indian Restaurant, by using Web Scraping, Data Pre-processing, Machine learning algorithms like K-Means clustering algorithm, and Foursquare API Service.

# Target Audience

•	The business owner who wants to invest or open a start-up company or restaurant. 

•	The freelancer who loves to have their own small company or restaurant as a side business.

•	Indian crowd who wants to find neighborhoods with lots of options for Indian restaurants.

•	Tourists who want to eat Indian food.


# Data Sources

1.	Toronto City Neighbourhoods Data –https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M
2.	Geographical Coordinates of the Neighbourhoods – https://cocl.us/Geospatial_data
3.	Location Data of Neighbourhood – Foursquare API Services


# Methodology

# Installing packages

In [1]:
!pip install geopy
!pip install folium

  from cryptography.utils import int_from_bytes
  from cryptography.utils import int_from_bytes
  from cryptography.utils import int_from_bytes
  from cryptography.utils import int_from_bytes
Collecting folium
  Downloading folium-0.12.1-py2.py3-none-any.whl (94 kB)
[K     |████████████████████████████████| 94 kB 7.8 MB/s  eta 0:00:01
Collecting branca>=0.3.0
  Downloading branca-0.4.2-py3-none-any.whl (24 kB)
Installing collected packages: branca, folium
Successfully installed branca-0.4.2 folium-0.12.1


# Importing Libraries and Packages

In [2]:
import pandas as pd
import numpy as np
import json
import requests
from geopy.geocoders import Nominatim
import folium
import matplotlib.cm as cm
import matplotlib.colors as colors
from sklearn.cluster import KMeans
from bs4 import BeautifulSoup

# Scraping Neighborhood Data

In [3]:
url_data = requests.get('https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M').text
soup = BeautifulSoup(url_data,'html5lib')

# Creating Dataframe - Neighborhood

In [4]:
content = []
table = soup.find('table')
for row in table.findAll('td'):
    data = {}
    if row.span.text == 'Not assigned':
        pass
    else:
        data['Postal Code'] = row.p.text[:3]
        data['Borough'] = (row.span.text).split('(')[0]
        data['Neighborhood'] = (((((row.span.text).split('(')[1]).strip(')')).replace(' /',',')).replace(')',' ')).strip(' ')
        content.append(data)
df = pd.DataFrame(content)
df['Borough']=df['Borough'].replace({'Downtown TorontoStn A PO Boxes25 The Esplanade':'Downtown Toronto Stn A',
                                             'East TorontoBusiness reply mail Processing Centre969 Eastern':'East Toronto Business',
                                             'EtobicokeNorthwest':'Etobicoke Northwest','East YorkEast Toronto':'East York/East Toronto',
                                             'MississaugaCanada Post Gateway Processing Centre':'Mississauga'})
df.head(10)

Unnamed: 0,Postal Code,Borough,Neighborhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,"Regent Park, Harbourfront"
3,M6A,North York,"Lawrence Manor, Lawrence Heights"
4,M7A,Queen's Park,Ontario Provincial Government
5,M9A,Etobicoke,Islington Avenue
6,M1B,Scarborough,"Malvern, Rouge"
7,M3B,North York,Don Mills North
8,M4B,East York,"Parkview Hill, Woodbine Gardens"
9,M5B,Downtown Toronto,"Garden District, Ryerson"


In [5]:
df_coord = pd.read_csv('https://cocl.us/Geospatial_data')
df_coord.head()

Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


In [6]:
df_toronto = pd.merge(df,df_coord,on='Postal Code')
df_toronto.head(10)

Unnamed: 0,Postal Code,Borough,Neighborhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.753259,-79.329656
1,M4A,North York,Victoria Village,43.725882,-79.315572
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636
3,M6A,North York,"Lawrence Manor, Lawrence Heights",43.718518,-79.464763
4,M7A,Queen's Park,Ontario Provincial Government,43.662301,-79.389494
5,M9A,Etobicoke,Islington Avenue,43.667856,-79.532242
6,M1B,Scarborough,"Malvern, Rouge",43.806686,-79.194353
7,M3B,North York,Don Mills North,43.745906,-79.352188
8,M4B,East York,"Parkview Hill, Woodbine Gardens",43.706397,-79.309937
9,M5B,Downtown Toronto,"Garden District, Ryerson",43.657162,-79.378937


In [7]:
df_only_toronto = df_toronto[df_toronto['Borough'].str.contains(pat='Toronto')]
df_only_toronto.head(10)

Unnamed: 0,Postal Code,Borough,Neighborhood,Latitude,Longitude
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636
9,M5B,Downtown Toronto,"Garden District, Ryerson",43.657162,-79.378937
15,M5C,Downtown Toronto,St. James Town,43.651494,-79.375418
19,M4E,East Toronto,The Beaches,43.676357,-79.293031
20,M5E,Downtown Toronto,Berczy Park,43.644771,-79.373306
24,M5G,Downtown Toronto,Central Bay Street,43.657952,-79.387383
25,M6G,Downtown Toronto,Christie,43.669542,-79.422564
30,M5H,Downtown Toronto,"Richmond, Adelaide, King",43.650571,-79.384568
31,M6H,West Toronto,"Dufferin, Dovercourt Village",43.669005,-79.442259
35,M4J,East York/East Toronto,The Danforth East,43.685347,-79.338106


# Usage of Foursquare API

# Coordinates of Toronto

In [8]:
latitude = df_only_toronto['Latitude'].mean()
longitude = df_only_toronto['Longitude'].mean()
print(latitude,longitude)

43.66772589743589 -79.38855562564103


# Foursquare Credentials

In [9]:
client_id = 'N4VXC5H2HFLE0NQ54UXW1G1VKDEXVXNTQOPLWR2JC1S5YWTL'
client_secret = 'K15RGMLUGOFVCTWAAHHGR40MQZTC34XLYUGISYONUP2ZSEVO'
version = '20210605'
limit = 100
radius = 500

In [10]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            client_id, 
            client_secret, 
            version, 
            lat, 
            lng, 
            radius, 
            limit)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

In [11]:
toronto_restaurant_venues = getNearbyVenues(names= df_only_toronto['Neighborhood'],latitudes = df_only_toronto['Latitude'],longitudes = df_only_toronto['Longitude'])

Regent Park, Harbourfront
Garden District, Ryerson
St. James Town
The Beaches
Berczy Park
Central Bay Street
Christie
Richmond, Adelaide, King
Dufferin, Dovercourt Village
The Danforth  East
Harbourfront East, Union Station, Toronto Islands
Little Portugal, Trinity
The Danforth West, Riverdale
Toronto Dominion Centre, Design Exchange
Brockton, Parkdale Village, Exhibition Place
India Bazaar, The Beaches West
Commerce Court, Victoria Hotel
Studio District
Lawrence Park
Roselawn
Davisville North
Forest Hill North & West
High Park, The Junction South
North Toronto West
The Annex, North Midtown, Yorkville
Parkdale, Roncesvalles
Davisville
University of Toronto, Harbord
Runnymede, Swansea
Moore Park, Summerhill East
Kensington Market, Chinatown, Grange Park
Summerhill West, Rathnelly, South Hill, Forest Hill SE, Deer Park
CN Tower, King and Spadina, Railway Lands, Harbourfront West, Bathurst Quay, South Niagara, Island airport
Rosedale
Enclave of M5E
St. James Town, Cabbagetown
First Canadi

In [12]:
toronto_restaurant_venues.shape

(1589, 7)

In [13]:
toronto_restaurant_venues.head()

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,"Regent Park, Harbourfront",43.65426,-79.360636,Roselle Desserts,43.653447,-79.362017,Bakery
1,"Regent Park, Harbourfront",43.65426,-79.360636,Tandem Coffee,43.653559,-79.361809,Coffee Shop
2,"Regent Park, Harbourfront",43.65426,-79.360636,Cooper Koo Family YMCA,43.653249,-79.358008,Distribution Center
3,"Regent Park, Harbourfront",43.65426,-79.360636,Body Blitz Spa East,43.654735,-79.359874,Spa
4,"Regent Park, Harbourfront",43.65426,-79.360636,Impact Kitchen,43.656369,-79.35698,Restaurant


In [14]:
toronto_restaurant_venues.groupby('Neighborhood').count()

Unnamed: 0_level_0,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Berczy Park,55,55,55,55,55,55
"Brockton, Parkdale Village, Exhibition Place",24,24,24,24,24,24
"CN Tower, King and Spadina, Railway Lands, Harbourfront West, Bathurst Quay, South Niagara, Island airport",15,15,15,15,15,15
Central Bay Street,66,66,66,66,66,66
Christie,16,16,16,16,16,16
Church and Wellesley,80,80,80,80,80,80
"Commerce Court, Victoria Hotel",100,100,100,100,100,100
Davisville,36,36,36,36,36,36
Davisville North,9,9,9,9,9,9
"Dufferin, Dovercourt Village",14,14,14,14,14,14


# Types of Venues

In [15]:
toronto_restaurant_venues['Venue Category'].unique()

array(['Bakery', 'Coffee Shop', 'Distribution Center', 'Spa',
       'Restaurant', 'Park', 'Pub', 'Breakfast Spot',
       'Gym / Fitness Center', 'Historic Site', 'Chocolate Shop',
       'Farmers Market', 'Performing Arts Venue', 'Dessert Shop',
       'Mexican Restaurant', 'French Restaurant', 'Yoga Studio',
       'Shoe Store', 'Theater', 'Café', 'Event Space',
       'Electronics Store', 'Art Gallery', 'Bank', 'Beer Store',
       'Wine Shop', 'Antique Shop', 'Clothing Store', 'Pizza Place',
       'Comic Shop', 'Plaza', 'Burger Joint', 'Music Venue',
       'Burrito Place', 'Sandwich Place', 'Sporting Goods Shop',
       'Ramen Restaurant', 'Steakhouse', 'Movie Theater', 'Shopping Mall',
       'Tanning Salon', 'Japanese Restaurant', 'Diner', 'Bookstore',
       'Fast Food Restaurant', 'New American Restaurant', 'Gastropub',
       'Hotel', 'College Rec Center', 'Thai Restaurant',
       'Sushi Restaurant', 'Modern European Restaurant', 'Cosmetics Shop',
       'Miscellaneous Sho

In [16]:
# one hot encoding
indian_restaurant_onehot = pd.get_dummies(toronto_restaurant_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
indian_restaurant_onehot['Neighborhood'] = toronto_restaurant_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [indian_restaurant_onehot.columns[-1]] + list(indian_restaurant_onehot.columns[:-1])
indian_restaurant_onehot = indian_restaurant_onehot[fixed_columns]

indian_restaurant_onehot.shape

(1589, 236)

In [17]:
indian_restaurant_grouped = indian_restaurant_onehot.groupby('Neighborhood').mean().reset_index()
indian_restaurant_grouped.head()

Unnamed: 0,Neighborhood,Yoga Studio,Adult Boutique,Airport,Airport Food Court,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Antique Shop,...,Theme Restaurant,Tibetan Restaurant,Toy / Game Store,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Wine Bar,Wine Shop
0,Berczy Park,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.018182,0.0,0.0,0.0,0.0
1,"Brockton, Parkdale Village, Exhibition Place",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,"CN Tower, King and Spadina, Railway Lands, Har...",0.0,0.0,0.066667,0.066667,0.066667,0.133333,0.133333,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Central Bay Street,0.015152,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.015152,0.0,0.0,0.0,0.0
4,Christie,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [18]:
indian_restaurant_grouped['Indian Restaurant'].unique()

array([0.        , 0.01515152, 0.0125    , 0.02777778, 0.01      ,
       0.02325581, 0.05263158, 0.02380952])

In [19]:
indian_restaurant = indian_restaurant_grouped[['Neighborhood','Indian Restaurant']]
#indian_restaurant = indian_restaurant[indian_restaurant['Indian Restaurant']>0]
indian_restaurant.head(10)

Unnamed: 0,Neighborhood,Indian Restaurant
0,Berczy Park,0.0
1,"Brockton, Parkdale Village, Exhibition Place",0.0
2,"CN Tower, King and Spadina, Railway Lands, Har...",0.0
3,Central Bay Street,0.015152
4,Christie,0.0
5,Church and Wellesley,0.0125
6,"Commerce Court, Victoria Hotel",0.0
7,Davisville,0.027778
8,Davisville North,0.0
9,"Dufferin, Dovercourt Village",0.0


# Machine Learning Algorithm (K-Mean Clustering Algorithm)

In [20]:
# set number of clusters
kclusters = 4
indian_restaurant_grouped_clustering = indian_restaurant.drop('Neighborhood', 1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(indian_restaurant_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_

array([0, 0, 0, 3, 0, 3, 0, 1, 0, 0, 0, 0, 0, 0, 0, 3, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 2, 0, 0, 1, 0, 0], dtype=int32)

In [21]:
indian_restaurant_merged = indian_restaurant.copy()
# add clustering labels
indian_restaurant_merged["Cluster Labels"] = kmeans.labels_
indian_restaurant_merged 

Unnamed: 0,Neighborhood,Indian Restaurant,Cluster Labels
0,Berczy Park,0.0,0
1,"Brockton, Parkdale Village, Exhibition Place",0.0,0
2,"CN Tower, King and Spadina, Railway Lands, Har...",0.0,0
3,Central Bay Street,0.015152,3
4,Christie,0.0,0
5,Church and Wellesley,0.0125,3
6,"Commerce Court, Victoria Hotel",0.0,0
7,Davisville,0.027778,1
8,Davisville North,0.0,0
9,"Dufferin, Dovercourt Village",0.0,0


In [22]:
indian_restaurant_df = pd.merge(indian_restaurant_merged,toronto_restaurant_venues,on='Neighborhood')
indian_restaurant_df.head()

Unnamed: 0,Neighborhood,Indian Restaurant,Cluster Labels,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Berczy Park,0.0,0,43.644771,-79.373306,LCBO,43.642944,-79.37244,Liquor Store
1,Berczy Park,0.0,0,43.644771,-79.373306,The Keg Steakhouse + Bar - Esplanade,43.646712,-79.374768,Restaurant
2,Berczy Park,0.0,0,43.644771,-79.373306,Fresh On Front,43.647815,-79.374453,Vegetarian / Vegan Restaurant
3,Berczy Park,0.0,0,43.644771,-79.373306,Goose Island Brewhouse,43.647329,-79.373541,Beer Bar
4,Berczy Park,0.0,0,43.644771,-79.373306,Hockey Hall Of Fame (Hockey Hall of Fame),43.646974,-79.377323,Museum


In [23]:
indian_restaurant_df[indian_restaurant_df['Venue Category']=='Indian Restaurant']

Unnamed: 0,Neighborhood,Indian Restaurant,Cluster Labels,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
128,Central Bay Street,0.015152,3,43.657952,-79.387383,Colaba Junction,43.66094,-79.385635,Indian Restaurant
205,Church and Wellesley,0.0125,3,43.66586,-79.38316,Kothur Indian Cuisine,43.667872,-79.385659,Indian Restaurant
358,Davisville,0.027778,1,43.704324,-79.38879,Marigold Indian Bistro,43.702881,-79.388008,Indian Restaurant
809,"Harbourfront East, Union Station, Toronto Islands",0.01,3,43.640816,-79.381752,Indian Roti House,43.63906,-79.385422,Indian Restaurant
1293,"St. James Town, Cabbagetown",0.023256,1,43.667967,-79.367675,Butter Chicken Factory,43.667072,-79.369184,Indian Restaurant
1387,"The Annex, North Midtown, Yorkville",0.052632,2,43.67271,-79.405678,Roti Cuisine of India,43.674618,-79.408249,Indian Restaurant
1441,"The Danforth West, Riverdale",0.02381,1,43.679557,-79.352188,Sher-E-Punjab,43.677308,-79.353066,Indian Restaurant


# Clustered Map

In [24]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=12)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.brg(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(indian_restaurant_df['Neighborhood Latitude'], indian_restaurant_df['Neighborhood Longitude'], indian_restaurant_df['Neighborhood'], indian_restaurant_df['Cluster Labels']):
    label = folium.Popup(str(poi), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

# Cluster Details

In [25]:
#Green
indian_restaurant_merged.loc[indian_restaurant_merged['Cluster Labels'] == 0, indian_restaurant_merged.columns[list(range(0, indian_restaurant_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Indian Restaurant,Cluster Labels
0,Berczy Park,0.0,0
1,"Brockton, Parkdale Village, Exhibition Place",0.0,0
2,"CN Tower, King and Spadina, Railway Lands, Har...",0.0,0
4,Christie,0.0,0
6,"Commerce Court, Victoria Hotel",0.0,0
8,Davisville North,0.0,0
9,"Dufferin, Dovercourt Village",0.0,0
10,Enclave of M4L,0.0,0
11,Enclave of M5E,0.0,0
12,"First Canadian Place, Underground city",0.0,0


In [26]:
#Blue
indian_restaurant_merged.loc[indian_restaurant_merged['Cluster Labels'] == 1, indian_restaurant_merged.columns[list(range(0, indian_restaurant_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Indian Restaurant,Cluster Labels
7,Davisville,0.027778,1
30,"St. James Town, Cabbagetown",0.023256,1
36,"The Danforth West, Riverdale",0.02381,1


In [27]:
#Purple
indian_restaurant_merged.loc[indian_restaurant_merged['Cluster Labels'] == 2, indian_restaurant_merged.columns[list(range(0, indian_restaurant_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Indian Restaurant,Cluster Labels
33,"The Annex, North Midtown, Yorkville",0.052632,2


In [28]:
#Brown
indian_restaurant_merged.loc[indian_restaurant_merged['Cluster Labels'] == 3, indian_restaurant_merged.columns[list(range(0, indian_restaurant_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Indian Restaurant,Cluster Labels
3,Central Bay Street,0.015152,3
5,Church and Wellesley,0.0125,3
15,"Harbourfront East, Union Station, Toronto Islands",0.01,3


# Results & Discussion

1)Cluster 1 - Depicts the least frequency of Indian Restaurants among the neighborhoods.

2)Cluster 3(The Annex, North Midtown, Yorkville) - Depicts the maximum frequency of Indian Restaurants among the neighborhoods.

After analyzing, it is found that The Annex, North Midtown, Yorkville has the highest frequency amongst all other neighbourhoods, followed by Davisville. Approximately 80 percentage of the neighborhood has no authentic Indian Restaurant, thus it gives a good opportunity for business owner and freelancer to open a new Restaurant. The green cluster(Cluster 1) can be a good option to open an Indian Restaurant, for example The Beaches, St.James Town, India Bazaar, Forest Hill and Parkdale Village are some good options. This concludes the findings for the location and recommends the business owner and freelancer to open an authentic Indian restaurant in these locations. 

# Conclusion

Finally, to conclude this project, I have got a glimpse of how data-science project look-like. I have used various libraries like, folium, pandas, sklearn, requests. I have also used BeautifulSoup package for web scraping. Here, I have also used Foursquare API services to explore the neighborhoods. And finally, I have used machine learning algorithm, K-Means Clustering Algorithm, to predict the most profitable neighborhood for opening an Indian Restaurant. 