<h3>Coursera Capstone</h3>
<h2>Opening a Japanese Restaurant in Toronto</h2>

<h5>The following notebook shows the code I used to determine the best location for a new Japanese restaurant in Toronto.</h5>
A few assumptions before we begin:
<br>- Restaurant should be located within a neighborhood that already has a high-density of similar restaurants as this shows that there is likely demand for these restaurants. It is also said that this competition will help the entire market thrive, so we will use this business rule as a basis for the hypothesis as well.
<br>- We will look at neighbourhood populations to ensure that there is also a good balance of higher populations that would presumably go to these restaurants. Thus, we are making the assumption that the people who travel to these restaurants are likely within the same neighbourhood, or likely in a neighbouring one in which the population won't vary widely.<br>
<br>Data sources are listed as we go along.
<br>All packages and tools required are imported at the beginning so if anything is used, it is already installed.
<br><br>

In [132]:
# IMPORT ALL PACKAGES AND REQUIRED TOOLS

import pandas as pd
import requests
import numpy as np
import folium
import matplotlib.cm as cm
import matplotlib.colors as colors

from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from geopy.geocoders import Nominatim
from pandas.io.json import json_normalize
from bs4 import BeautifulSoup
from pprint import pprint

!pip install opencage
from opencage.geocoder import OpenCageGeocode

Collecting opencage
  Downloading opencage-1.2.2-py3-none-any.whl (6.1 kB)
Collecting backoff>=1.10.0
  Downloading backoff-1.10.0-py2.py3-none-any.whl (31 kB)
Installing collected packages: backoff, opencage
Successfully installed backoff-1.10.0 opencage-1.2.2


In [2]:
# SCRAPE DATA AND GENERATE DATAFRAME (SAME DATA AS WK3 ASSIGNMENT; STILL IN TORONTO)

source = requests.get("https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M").text
soup = BeautifulSoup(source, 'lxml')

table = soup.find("table")
table_rows = table.tbody.find_all("tr")

res = []
for tr in table_rows:
    td = tr.find_all("td")
    row = [tr.text for tr in td]
    
    # Ignore cells with borough 'Not assigned'.
    if row != [] and row[1] != "Not assigned\n":
    
        # If a cell contains a borough but is a "Not assigned" neighborhood, then the neighborhood will be the same as the borough.
        if "Not assigned" in row[2]: 
            row[2] = row[1]
        res.append(row)

df = pd.DataFrame(res, columns = ["PostalCode", "Borough", "Neighborhood"])
df.head()

Unnamed: 0,PostalCode,Borough,Neighborhood
0,M3A\n,North York\n,Parkwoods\n
1,M4A\n,North York\n,Victoria Village\n
2,M5A\n,Downtown Toronto\n,"Regent Park, Harbourfront\n"
3,M6A\n,North York\n,"Lawrence Manor, Lawrence Heights\n"
4,M7A\n,Downtown Toronto\n,"Queen's Park, Ontario Provincial Government\n"


In [3]:
# REMOVE '\n' APPENDED TO EACH LINE

df["PostalCode"] = df["PostalCode"].str.replace("\n","")
df["Borough"] = df["Borough"].str.replace("\n","")
df["Neighborhood"] = df["Neighborhood"].str.replace("\n","")

df.head()

Unnamed: 0,PostalCode,Borough,Neighborhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,"Regent Park, Harbourfront"
3,M6A,North York,"Lawrence Manor, Lawrence Heights"
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government"


In [4]:
df = df.groupby(["PostalCode", "Borough"])["Neighborhood"].apply(", ".join).reset_index()
df.head()

Unnamed: 0,PostalCode,Borough,Neighborhood
0,M1B,Scarborough,"Malvern, Rouge"
1,M1C,Scarborough,"Rouge Hill, Port Union, Highland Creek"
2,M1E,Scarborough,"Guildwood, Morningside, West Hill"
3,M1G,Scarborough,Woburn
4,M1H,Scarborough,Cedarbrae


In [5]:
# USING THE GEOSPATIAL COORDINATES FILE

df_coords = pd.read_csv("./Geospatial_Coords.csv")

# MERGE DF AND COORDS INTO ONE DATAFRAME, THEN CLEAN DUPLICATE POSTCODE COLUMN

df2 = pd.merge(df, df_coords, how='left', left_on = 'PostalCode', right_on = 'Postal Code')
df2.drop("Postal Code", axis=1, inplace=True)

df2.head()

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude
0,M1B,Scarborough,"Malvern, Rouge",43.806686,-79.194353
1,M1C,Scarborough,"Rouge Hill, Port Union, Highland Creek",43.784535,-79.160497
2,M1E,Scarborough,"Guildwood, Morningside, West Hill",43.763573,-79.188711
3,M1G,Scarborough,Woburn,43.770992,-79.216917
4,M1H,Scarborough,Cedarbrae,43.773136,-79.239476


<h2>From here, we need population data to add to the neighbourhood data. This will allow us to make a hypothesis and eventually show what the best place for the new restaurant is.
<br>
<br>Statistics Canada (StatsCan) is the country's resource for all sorts of census and geographic information, among other things, which will be our source for this project.</h2>

In [6]:
# LATEST DATA FROM STATSCAN
# SOURCE: 'Statistics Canada. 2017. Population and dwelling counts, for Canada and forward sortation areas© as reported by the respondents, 2016 Census (table). Population and Dwelling Count Highlight Tables. 2016 Census.''
# SOURCE URL: 'https://www12.statcan.gc.ca/census-recensement/2016/dp-pd/hlt-fst/pd-pl/Table.cfm?Lang=Eng&T=1201&S=22&O=A'

df_pop = pd.read_csv("./StatsCan_Toronto.csv",encoding = 'unicode_escape')

# CLEAN TABLE TO REMOVE UNNECESSARY COLUMNS/DATA

df_pop = df_pop.rename(columns={'Geographic code':'PostalCode', 'Geographic name':'Geoname', 'Province or territory':'Province', 'Incompletely enumerated Indian reserves and Indian settlements, 2016':'Incomplete', 'Population, 2016':'Population2016', 'Total private dwellings, 2016':'PrivateDwellings', 'Private dwellings occupied by usual residents, 2016':'OccupiedPrivateDwellings'})
df_pop = df_pop.drop(columns=['Geoname', 'Province', 'Incomplete', 'PrivateDwellings', 'OccupiedPrivateDwellings'])
df_pop = df_pop.iloc[1:]

df_pop.head()

Unnamed: 0,PostalCode,Population2016
1,M1C,35626
2,M1E,46943
3,M1G,29690
4,M1H,24383
5,M1J,36699


In [7]:
# MERGE TORONTO DATA WITH POSTALCODE DATA AND SORT

df3 = pd.merge(df_pop, df2, on="PostalCode", how='right')
df3 = df3.sort_values(by=['Population2016'], ascending=False)

df3.head()

Unnamed: 0,PostalCode,Population2016,Borough,Neighborhood,Latitude,Longitude
22,M2N,75897.0,North York,"Willowdale, Willowdale East",43.77012,-79.408493
18,M2J,58293.0,North York,"Fairview, Henry Farm, Oriole",43.778517,-79.346556
101,M9V,55959.0,Etobicoke,"South Steeles, Silverstone, Humbergate, Jamest...",43.739416,-79.588437
14,M1V,54680.0,Scarborough,"Milliken, Agincourt North, Steeles East, L'Amo...",43.815252,-79.284577
68,M5V,49195.0,Downtown Toronto,"CN Tower, King and Spadina, Railway Lands, Har...",43.628947,-79.39442


<h2>The above dataframe shows us neighbourhoods in Toronto, sorted by population.</h2>

In [8]:
# INPUT FOURSQUARE CREDENTIALS

CLIENT_ID = '41MJB42PKA03HVBU14G3V5CVGHJEIW4JXMLUXSRJZNDDJUJM'
CLIENT_SECRET = 'KCE3T0U0LWXV5O2SDDJNIEZUIOBFFGLC1OEUJPHGT4QU2VWG'
VERSION = '20210101'

In [None]:
# SET LIMITS TO PREVENT OVERUSE OF FOURSQUARE FREE ACCOUNT

limit = 200

# SET SEARCH RADIUS TO 5000m. ASSUME PEOPLE WILL TRAVEL UP TO 5km TO VISIT A RESTAURANT.

radius = 5000

# DEFINE FUNCTION TO RETRIEVE VENUES

def getNearbyVenues(names, latitudes, longitudes, radius=5000):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            limit)
            
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

In [17]:
# GET NEIGHBOURHOOD LIST
Toronto_Venues = getNearbyVenues(names=df3['Neighborhood'],
                                   latitudes=df3['Latitude'],
                                   longitudes=df3['Longitude'])

Willowdale, Willowdale East
Fairview, Henry Farm, Oriole
South Steeles, Silverstone, Humbergate, Jamestown, Mount Olive, Beaumond Heights, Thistletown, Albion Gardens
Milliken, Agincourt North, Steeles East, L'Amoreaux East
CN Tower, King and Spadina, Railway Lands, Harbourfront West, Bathurst Quay, South Niagara, Island airport
Steeles West, L'Amoreaux West
Kennedy Park, Ionview, East Birchmount Park
Guildwood, Morningside, West Hill
Woodbine Heights
Dorset Park, Wexford Heights, Scarborough Town Centre
Dufferin, Dovercourt Village
Del Ray, Mount Dennis, Keelsdale and Silverthorn
Downsview
Runnymede, The Junction North
Regent Park, Harbourfront
Brockton, Parkdale Village, Exhibition Place
Willowdale, Willowdale West
Northwest, West Humber - Clairville
High Park, The Junction South
Don Mills
Eringate, Bloordale Gardens, Old Burnhamthorpe, Markland Wood
Caledonia-Fairbanks
New Toronto, Mimico South, Humber Bay Shores
Agincourt
Bathurst Manor, Wilson Heights, Downsview North
Scarborough 

In [18]:
# SHOW UNIQUE VENUE CATEGORIES

print('Unique Venue Categories:')
list(Toronto_Venues['Venue Category'].unique())

Unique Venue Categories:


['Hotel',
 'Ramen Restaurant',
 'Seafood Restaurant',
 'Steakhouse',
 'Japanese Restaurant',
 'Café',
 'Creperie',
 'Movie Theater',
 'Bubble Tea Shop',
 'Theater',
 'Grocery Store',
 'Sushi Restaurant',
 'Korean Restaurant',
 'Fried Chicken Joint',
 'Liquor Store',
 'Coffee Shop',
 'Bakery',
 'Thai Restaurant',
 'Burger Joint',
 'Park',
 'French Restaurant',
 'Supermarket',
 'Plaza',
 'Bridal Shop',
 'Gym',
 'Middle Eastern Restaurant',
 'Pizza Place',
 'Spa',
 'Fish Market',
 'Shopping Mall',
 'Ski Chalet',
 'Deli / Bodega',
 'Health Food Store',
 'Auto Dealership',
 'Bookstore',
 'Mediterranean Restaurant',
 'Persian Restaurant',
 'General Entertainment',
 'Greek Restaurant',
 'Clothing Store',
 'Outdoor Supply Store',
 'Gourmet Shop',
 'Restaurant',
 'Vietnamese Restaurant',
 'Sandwich Place',
 'Escape Room',
 'Dessert Shop',
 'Tea Room',
 'Italian Restaurant',
 'Sporting Goods Shop',
 'Bagel Shop',
 'Cosmetics Shop',
 'Hobby Shop',
 'Furniture / Home Store',
 'Breakfast Spot',
 'C

In [19]:
# ISOLATE ONLY THOSE CATEGORIES WITH JAPANESE THEMES (SUSHI, RAMEN, ETC.)
Japanese_restaurants = ['Ramen Restaurant', 'Japanese Restaurant', 'Sushi Restaurant', 'Noodle House', 'Sake Bar']

Japanese_rest_pd = pd.DataFrame(Japanese_restaurants)

Japanese_rest_pd

Unnamed: 0,0
0,Ramen Restaurant
1,Japanese Restaurant
2,Sushi Restaurant
3,Noodle House
4,Sake Bar


In [53]:
# RENAME COLUMN TO FIVE TYPES OF JAPANESE RESTAURANT
Japanese_rest_pd = Japanese_rest_pd.rename(columns={0:'Venue Category'})

# MERGE DATAFRAMES TO SEE ONLY JAPANESE RESTAURANT VARIANTS IN NEIGHBOURHOODS
Toronto_Japanese_rest = pd.merge(Toronto_Venues, Japanese_rest_pd, on='Venue Category', how='right')

Toronto_Japanese_rest.head()

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,"Willowdale, Willowdale East",43.77012,-79.408493,Konjiki Ramen,43.766998,-79.412222,Ramen Restaurant
1,"Willowdale, Willowdale East",43.77012,-79.408493,Sansotei Ramen 三草亭,43.776709,-79.413927,Ramen Restaurant
2,"Willowdale, Willowdale West",43.782736,-79.442259,Konjiki Ramen,43.766998,-79.412222,Ramen Restaurant
3,Don Mills,43.7259,-79.340923,Kinton Ramen,43.707302,-79.395854,Ramen Restaurant
4,"Bathurst Manor, Wilson Heights, Downsview North",43.754328,-79.442259,Konjiki Ramen,43.766998,-79.412222,Ramen Restaurant


In [56]:
# USING ONE HOT ENCODING
newonehot = pd.get_dummies(Toronto_Japanese_rest[['Venue Category']], prefix="", prefix_sep="")

# ADD NEIGHBOURHOOD BACK IN AND MOVE TO FIRST COLUMN
newonehot['Neighborhood'] = Toronto_Japanese_rest['Neighborhood'] 
fixed_columns = [newonehot.columns[-1]] + list(newonehot.columns[:-1])
newonehot = newonehot[fixed_columns]

newonehot.head()

Unnamed: 0,Neighborhood,Japanese Restaurant,Noodle House,Ramen Restaurant,Sake Bar,Sushi Restaurant
0,"Willowdale, Willowdale East",0,0,1,0,0
1,"Willowdale, Willowdale East",0,0,1,0,0
2,"Willowdale, Willowdale West",0,0,1,0,0
3,Don Mills,0,0,1,0,0
4,"Bathurst Manor, Wilson Heights, Downsview North",0,0,1,0,0


In [59]:
# ANALYSIS OF RESTAURANT TYPES (PERCENTAGES) IN EACH NEIGHBORHOOD

grouped = newonehot.groupby('Neighborhood').mean().reset_index()
grouped.shape

grouped.head()

Unnamed: 0,Neighborhood,Japanese Restaurant,Noodle House,Ramen Restaurant,Sake Bar,Sushi Restaurant
0,Agincourt,0.333333,0.333333,0.0,0.0,0.333333
1,"Alderwood, Long Branch",1.0,0.0,0.0,0.0,0.0
2,"Bathurst Manor, Wilson Heights, Downsview North",0.333333,0.0,0.166667,0.0,0.5
3,Bayview Village,0.571429,0.0,0.142857,0.0,0.285714
4,"Bedford Park, Lawrence Manor East",0.333333,0.0,0.166667,0.0,0.5


In [62]:
# CLUSTER MODELLING
# USE SILHOUETTE TO FIND BEST CLUSTER GROUPS

groupedclusters = grouped.drop('Neighborhood', 1)

kclusters = np.arange(2,10)
results = {}
for size in kclusters:
    model = KMeans(n_clusters = size).fit(groupedclusters)
    predictions = model.predict(groupedclusters)
    results[size] = silhouette_score(groupedclusters, predictions)

best_size = max(results, key=results.get)
best_size

9

In [64]:
# RUN K MEANS AND SEGMENT DATA

kclusters = best_size
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(groupedclusters)

# CHECK LABELS

kmeans.labels_[0:10]


array([5, 1, 0, 4, 0, 2, 1, 3, 3, 8], dtype=int32)

In [70]:
# CREATE FUNCTION TO RETURN MOST COMMON

def most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    return row_categories_sorted.index.values[0:num_top_venues]

num_top_venues = 5

indicators = ['st', 'nd', 'rd']

columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))

venues_sorted = pd.DataFrame(columns=columns)
venues_sorted['Neighborhood'] = grouped['Neighborhood']

for ind in np.arange(grouped.shape[0]):
    venues_sorted.iloc[ind, 1:] = most_common_venues(grouped.iloc[ind, :], num_top_venues)

venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,Agincourt,Sushi Restaurant,Noodle House,Japanese Restaurant,Sake Bar,Ramen Restaurant
1,"Alderwood, Long Branch",Japanese Restaurant,Sushi Restaurant,Sake Bar,Ramen Restaurant,Noodle House
2,"Bathurst Manor, Wilson Heights, Downsview North",Sushi Restaurant,Japanese Restaurant,Ramen Restaurant,Sake Bar,Noodle House
3,Bayview Village,Japanese Restaurant,Sushi Restaurant,Ramen Restaurant,Sake Bar,Noodle House
4,"Bedford Park, Lawrence Manor East",Sushi Restaurant,Japanese Restaurant,Ramen Restaurant,Sake Bar,Noodle House


In [90]:
# MERGE DATAFRAMES TO INCLUDE ALL DATA FROM NEIGHBORHOOD AND RESTAURANT TYPE DFs

Toronto_complete = pd.merge(df3, venues_sorted, on='Neighborhood', how='left')
Toronto_complete.head()

Unnamed: 0,PostalCode,Population2016,Borough,Neighborhood,Latitude,Longitude,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,M2N,75897.0,North York,"Willowdale, Willowdale East",43.77012,-79.408493,Japanese Restaurant,Sushi Restaurant,Ramen Restaurant,Sake Bar,Noodle House
1,M2J,58293.0,North York,"Fairview, Henry Farm, Oriole",43.778517,-79.346556,Japanese Restaurant,Sushi Restaurant,Sake Bar,Ramen Restaurant,Noodle House
2,M9V,55959.0,Etobicoke,"South Steeles, Silverstone, Humbergate, Jamest...",43.739416,-79.588437,Sushi Restaurant,Sake Bar,Ramen Restaurant,Noodle House,Japanese Restaurant
3,M1V,54680.0,Scarborough,"Milliken, Agincourt North, Steeles East, L'Amo...",43.815252,-79.284577,Noodle House,Japanese Restaurant,Sushi Restaurant,Sake Bar,Ramen Restaurant
4,M5V,49195.0,Downtown Toronto,"CN Tower, King and Spadina, Railway Lands, Har...",43.628947,-79.39442,,,,,


In [102]:
# MERGE TORONTO DATA WITH COORDINATE DATA AND GET CLUSTER LABELS
labels = pd.merge(Toronto_complete, grouped, on='Neighborhood', how='right')
labels.shape
labels.head()

Unnamed: 0,PostalCode,Population2016,Borough,Neighborhood,Latitude,Longitude,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,Japanese Restaurant,Noodle House,Ramen Restaurant,Sake Bar,Sushi Restaurant
0,M1S,37769.0,Scarborough,Agincourt,43.7942,-79.262029,Sushi Restaurant,Noodle House,Japanese Restaurant,Sake Bar,Ramen Restaurant,0.333333,0.333333,0.0,0.0,0.333333
1,M8W,20674.0,Etobicoke,"Alderwood, Long Branch",43.602414,-79.543484,Japanese Restaurant,Sushi Restaurant,Sake Bar,Ramen Restaurant,Noodle House,1.0,0.0,0.0,0.0,0.0
2,M3H,37011.0,North York,"Bathurst Manor, Wilson Heights, Downsview North",43.754328,-79.442259,Sushi Restaurant,Japanese Restaurant,Ramen Restaurant,Sake Bar,Noodle House,0.333333,0.0,0.166667,0.0,0.5
3,M2K,23852.0,North York,Bayview Village,43.786947,-79.385975,Japanese Restaurant,Sushi Restaurant,Ramen Restaurant,Sake Bar,Noodle House,0.571429,0.0,0.142857,0.0,0.285714
4,M5M,25975.0,North York,"Bedford Park, Lawrence Manor East",43.733283,-79.41975,Sushi Restaurant,Japanese Restaurant,Ramen Restaurant,Sake Bar,Noodle House,0.333333,0.0,0.166667,0.0,0.5


In [112]:
# ADD CLUSTERED LABELS

tablewithlabels = labels
tablewithlabels['Cluster Labels'] = kmeans.labels_

# MERGE TO ADD LAT LONG TO EACH NEIGHBORHOOD

tablewithlabels = pd.merge(labels, venues_sorted, on='Neighborhood', how='left')

tablewithlabels.head()

Unnamed: 0,PostalCode,Population2016,Borough,Neighborhood,Latitude,Longitude,1st Most Common Venue_x,2nd Most Common Venue_x,3rd Most Common Venue_x,4th Most Common Venue_x,...,Noodle House,Ramen Restaurant,Sake Bar,Sushi Restaurant,Cluster Labels,1st Most Common Venue_y,2nd Most Common Venue_y,3rd Most Common Venue_y,4th Most Common Venue_y,5th Most Common Venue_y
0,M1S,37769.0,Scarborough,Agincourt,43.7942,-79.262029,Sushi Restaurant,Noodle House,Japanese Restaurant,Sake Bar,...,0.333333,0.0,0.0,0.333333,5,Sushi Restaurant,Noodle House,Japanese Restaurant,Sake Bar,Ramen Restaurant
1,M8W,20674.0,Etobicoke,"Alderwood, Long Branch",43.602414,-79.543484,Japanese Restaurant,Sushi Restaurant,Sake Bar,Ramen Restaurant,...,0.0,0.0,0.0,0.0,1,Japanese Restaurant,Sushi Restaurant,Sake Bar,Ramen Restaurant,Noodle House
2,M3H,37011.0,North York,"Bathurst Manor, Wilson Heights, Downsview North",43.754328,-79.442259,Sushi Restaurant,Japanese Restaurant,Ramen Restaurant,Sake Bar,...,0.0,0.166667,0.0,0.5,0,Sushi Restaurant,Japanese Restaurant,Ramen Restaurant,Sake Bar,Noodle House
3,M2K,23852.0,North York,Bayview Village,43.786947,-79.385975,Japanese Restaurant,Sushi Restaurant,Ramen Restaurant,Sake Bar,...,0.0,0.142857,0.0,0.285714,4,Japanese Restaurant,Sushi Restaurant,Ramen Restaurant,Sake Bar,Noodle House
4,M5M,25975.0,North York,"Bedford Park, Lawrence Manor East",43.733283,-79.41975,Sushi Restaurant,Japanese Restaurant,Ramen Restaurant,Sake Bar,...,0.0,0.166667,0.0,0.5,0,Sushi Restaurant,Japanese Restaurant,Ramen Restaurant,Sake Bar,Noodle House


In [166]:
# FIND VALUES FOR EACH OF THE CLUSTERS

In [117]:
cluster0 = tablewithlabels.loc[tablewithlabels['Cluster Labels'] == 0, tablewithlabels.columns[[3, 4] + list(range(5, tablewithlabels.shape[1]))]]
cluster0.shape

(9, 19)

In [118]:
cluster1 = tablewithlabels.loc[tablewithlabels['Cluster Labels'] == 1, tablewithlabels.columns[[3, 4] + list(range(5, tablewithlabels.shape[1]))]]
cluster1.shape

(21, 19)

In [119]:
cluster2 = tablewithlabels.loc[tablewithlabels['Cluster Labels'] == 2, tablewithlabels.columns[[3, 4] + list(range(5, tablewithlabels.shape[1]))]]
cluster2.shape

(8, 19)

In [120]:
cluster3 = tablewithlabels.loc[tablewithlabels['Cluster Labels'] == 3, tablewithlabels.columns[[3, 4] + list(range(5, tablewithlabels.shape[1]))]]
cluster1.shape

(21, 19)

In [122]:
cluster4 = tablewithlabels.loc[tablewithlabels['Cluster Labels'] == 4, tablewithlabels.columns[[3, 4] + list(range(5, tablewithlabels.shape[1]))]]
cluster4.shape

(13, 19)

In [123]:
cluster5 = tablewithlabels.loc[tablewithlabels['Cluster Labels'] == 5, tablewithlabels.columns[[3, 4] + list(range(5, tablewithlabels.shape[1]))]]
cluster5.shape

(4, 19)

In [None]:
# CLUSTER 1 AND 3 HAVE THE SAME DENSITY, SO THEY ARE BOTH OPTIMAL LOCATIONS WITH THESE VARIABLES

In [127]:
# FIND GEOGRAPHIC CENTRE OF EACH CLUSTER

cluster1coords = cluster1[['Latitude', 'Longitude']]
cluster1coords = list(cluster1coords.values) 
lat = []
long = []

for l in cluster1coords:
  lat.append(l[0])
  long.append(l[1])

Blatitude = sum(lat)/len(lat)
Blongitude = sum(long)/len(long)
print(Blatitude)
print(Blongitude)

43.69177028571429
-79.41449641904762


In [134]:
cluster3coords = cluster3[['Latitude', 'Longitude']]
cluster3coords = list(cluster3coords.values) 
lat = []
long = []

for l in cluster3coords:
  lat.append(l[0])
  long.append(l[1])

blatitude = sum(lat)/len(lat)
blongitude = sum(long)/len(long)
print(Blatitude)
print(Blongitude)

43.695958680000004
-79.39204819333334


<h3>Since the actual ideal coordinates for Cluster 1 is a residential street, whereas the actual ideal coordinates Cluster 3 is a parkette alongside shops and other stores, the more ideal location (though marginally), would be the actual ideal coordinates of Cluster 3 as it would be near other restaurants, a grocery store, etc.
    <br>
<br>43.695958680000004, -79.39204819333334

In [137]:
# INSTALL OPENCAGE TO CONVERT COORDINATES TO ADDRESS
key = 'b7110d6d829b48a9b718a09748d1628f'
geocoder = OpenCageGeocode(key)

results = geocoder.reverse_geocode(43.695958680000004, -79.39204819333334)
pprint(results)

[{'annotations': {'DMS': {'lat': "43° 41' 48.69240'' N",
                          'lng': "79° 23' 32.07804'' W"},
                  'MGRS': '17TPJ2955539460',
                  'Maidenhead': 'FN03hq27wf',
                  'Mercator': {'x': -8837904.164, 'y': 5389120.776},
                  'OSM': {'edit_url': 'https://www.openstreetmap.org/edit?way=401259394#map=17/43.69686/-79.39224',
                          'note_url': 'https://www.openstreetmap.org/note/new#map=17/43.69686/-79.39224&layers=N',
                          'url': 'https://www.openstreetmap.org/?mlat=43.69686&mlon=-79.39224#map=17/43.69686/-79.39224'},
                  'UN_M49': {'regions': {'AMERICAS': '019',
                                         'CA': '124',
                                         'NORTHERN_AMERICA': '021',
                                         'WORLD': '001'},
                             'statistical_groupings': ['MEDC']},
                  'callingcode': 1,
                  'currency': 

In [145]:
# FIND BEST LOCATION
popstr = df3[df3['PostalCode'].str.contains('M4S')]

def str_join(*args):
    return ''.join(map(str, args))

popstr = str_join('Best Location: ', popstr['Neighborhood'].values, ' in ', popstr['Borough'].values)

print(popstr)

Best Location: ['Davisville'] in ['Central Toronto']


In [152]:
# USING GEOPY GEOCODERS

address = 'Toronto, ON'

geolocator = Nominatim(user_agent="http")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print("Toronto's Geographical Coordinates: {}, {}".format(latitude, longitude))

Toronto's Geographical Coordinates: 43.6534817, -79.3839347


In [164]:
# USE FOLIUM TO SHOW BEST LOCATION

map_clusters = folium.Map(location=[latitude, longitude], zoom_start=12)

# COLOURS

x = np.arange(kclusters)
ys = [i+x+(i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# ADD MARKERS
markers_colors = []
for lat, lon, poi, cluster in zip(tablewithlabels['Latitude'], tablewithlabels['Longitude'], tablewithlabels['Neighborhood'], tablewithlabels['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
    
folium.CircleMarker([blatitude, blongitude],
                    radius=50,
                    popup='Toronto',
                    color='red',
                    ).add_to(map_clusters)

map_clusters.save('map_clusters.html')
map_clusters

<h4>The notebook does not properly display the Folium interactive map as it is not trusted properly by GitHub. Please see my full Coursera_Capstone repository at https://github.com/chriskirkos/Coursera_Capstone. The files are called 'Final Map 1' and 'Final Map 2', where the first is the generated map, and the second is the same map scrolled out to show the neighborhoods. You can also click here:</h4>

https://github.com/chriskirkos/Coursera_Capstone/blob/main/Final%20Map%201.png
https://github.com/chriskirkos/Coursera_Capstone/blob/main/Final%20Map%202.png


<h2>Results & Conclusion</h2>
As you might expect, the greatest concentration of restaurants was determined to be Central and Downtown Toronto. These are also, unsurprisingly, the most densely populated areas of the city, due to the abundance of high-rise buildings and walkable areas with plenty of restaurants, shopping, and nightlife. (Source: me. I live here!)
<br>
<br>By focusing on the Davisville area where there are already a relatively high number of Japanese restaurants, the competition will continue to encourage others to keep their quality high, and therefore keep discerning customers coming back for another quality Japanese retaurant.
<br><br>
Opening the new restaurant in Davisville is sure to make the most sense based on the research shown above, backed up with proper data analysis.

<h1>Thanks!</h1>