# Capstone Project - Advising a high-end restaurant company on its next opening in Toronto
### Applied Data Science Capstone by IBM/Coursera

## Table of contents
* [Introduction: Business Problem](#introduction)
* [Data](#data)
* [Methodology](#methodology)
* [Results and Discussion](#results)
* [Conclusion](#conclusion)

## Introduction: Business Problem <a name="introduction"></a>

A Europe-based holding company in the culinary industry, owner of several two- and three-Michelin-star restaurants in major European capitals, has been studying an opening in North America.

The company's board of directors has chosen Toronto as its next market because of the concept planned for the new restaurant, the cultural and gastronomic character of the city, and current market conditions in Canada. The next step is to choose the location for the new restaurant.

Because the board lacks knowledge of the city's gastronomic scene, it has been receiving advice from a Canadian specialist, but it has also decided to get a second opinion to double-check the information received.

To support the board, we propose a statistical analysis of all of Toronto's neighborhoods to recommend the best one for the next high-end restaurant opening. This project will therefore suggest the optimal neighborhood(s) for our stakeholder.

Consequently, we are interested in detecting neighborhoods that are not crowded with restaurants and that also show the characteristics of a good neighborhood.


## Data <a name="data"></a>

After defining our business problem, the data sources required are the following:

* The list of all of Toronto's neighborhoods, available on <a href="https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M">Wikipedia</a>.
* <a href="http://cocl.us/Geospatial_data">The geospatial data of each neighborhood in Toronto</a>.
* All the venues of Toronto, obtained from <a href="https://es.foursquare.com/">Foursquare</a>.

These data sources will be enough to answer our question and find the best neighborhood for our client.

##### Importing all libraries

In [2]:
import requests
import lxml.html as lh
import pandas as pd
import numpy as np

from geopy.geocoders import Nominatim

import matplotlib.cm as cm
import matplotlib.colors as colors

from sklearn.cluster import KMeans

!conda install -c conda-forge folium=0.5.0 --yes
import folium 


##### Creating our initial Data Frame with the geospatial data

In [3]:
url='https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'

#Create a handle, page, to handle the contents of the website
page = requests.get(url)

#Store the contents of the website under doc
doc = lh.fromstring(page.content)

#Parse data that are stored between <tr>..</tr> of HTML
tr_elements = doc.xpath('//tr')

In [4]:
col=[]

#For each cell of the first row (the header), store its text and an empty list
for t in tr_elements[0]:
    name=t.text_content()
    col.append((name,[]))

In [5]:
#Since our first row is the header, data is stored from the second row onwards
for j in range(1,len(tr_elements)):
    #T is our j'th row
    T=tr_elements[j]
    
    #If the row does not have 3 cells, the //tr data is not from our table 
    if len(T)!=3:
        break
    
    #i is the index of our column
    i=0
    
    #Iterate through each element of the row
    for t in T.iterchildren():
        data=t.text_content() 
        #Skip the first column (postal code) when converting
        if i>0:
        #Convert any numerical value to integers
            try:
                data=int(data)
            except ValueError:
                pass
        #Append the data to the empty list of the i'th column
        col[i][1].append(data)
        #Increment i for the next column
        i+=1

In [6]:
Dict={title:column for (title,column) in col}
df=pd.DataFrame(Dict)
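
The row-by-row parsing above can be tried offline. The following is a minimal sketch, not the notebook's method: it swaps `lxml` for the standard-library `html.parser` and runs on a small hypothetical snippet (`SAMPLE_HTML`) shaped like the Wikipedia table. The class name `TableParser` and the sample rows are assumptions made for illustration.

```python
from html.parser import HTMLParser

# Hypothetical snippet shaped like the postal code table (not the live page).
SAMPLE_HTML = """
<table>
  <tr><th>Postal code</th><th>Borough</th><th>Neighbourhood</th></tr>
  <tr><td>M1A</td><td>Not assigned</td><td>Not assigned</td></tr>
  <tr><td>M3A</td><td>North York</td><td>Parkwoods</td></tr>
</table>
"""


class TableParser(HTMLParser):
    """Collect the text of every <th>/<td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows
        self._row = None    # cells of the row being read
        self._cell = None   # text chunks of the cell being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Joining and stripping also removes the trailing newlines
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


parser = TableParser()
parser.feed(SAMPLE_HTML)
header, *records = parser.rows
# Same layout as `col` above: one list of values per column title
table = {name: [rec[i] for rec in records] for i, name in enumerate(header)}
print(table)
```

The resulting dictionary can be passed to `pd.DataFrame` in the same way as `Dict` in the cell above.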

##### Making some corrections in the Data Frame:

In [7]:
# remove newline characters from every cell of the df
def reemplazar(x):
    return x.replace('\n', '')

df.columns = ['Postal code', 'Borough', 'Neighbourhood']
df = df.applymap(reemplazar)

# replace ' /' with ',' in every cell
def reemplazar(x):
    return x.replace(' /', ',')

df = df.applymap(reemplazar)

##### Dropping rows with an unassigned borough

In [8]:
# dropping rows whose borough is 'Not assigned'

# Names of indexes where Borough == 'Not assigned'
indexNames = df[ df['Borough'] =='Not assigned'].index

# Delete rows
df.drop(indexNames , inplace=True)

##### Merging the repeated postal codes into one row with all the neighborhoods

In [9]:
result = df.groupby(['Postal code','Borough'], sort=False).agg( ', '.join)
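
To make the aggregation above concrete, here is a minimal sketch on a hypothetical three-row frame (`toy` and its values are made up for illustration). It assumes the same column names as the notebook and shows how rows sharing a postal code and borough collapse into one comma-separated row.

```python
import pandas as pd

# Hypothetical rows: postal code M5A appears twice with different neighbourhoods.
toy = pd.DataFrame({
    "Postal code": ["M5A", "M5A", "M3A"],
    "Borough": ["Downtown Toronto", "Downtown Toronto", "North York"],
    "Neighbourhood": ["Regent Park", "Harbourfront", "Parkwoods"],
})

# sort=False keeps the groups in order of first appearance.
merged = (
    toy.groupby(["Postal code", "Borough"], sort=False)["Neighbourhood"]
       .agg(", ".join)
       .reset_index()
)
print(merged)
```

Selecting the `Neighbourhood` column before aggregating makes it explicit which column is being joined.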

In [10]:
df_new=result.reset_index()
df_new= df_new[:103]

##### The result is the following Dataframe

In [11]:
df_new

Unnamed: 0,Postal code,Borough,Neighbourhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,"Regent Park, Harbourfront"
3,M6A,North York,"Lawrence Manor, Lawrence Heights"
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government"
5,M9A,Etobicoke,Islington Avenue
6,M1B,Scarborough,"Malvern, Rouge"
7,M3B,North York,Don Mills
8,M4B,East York,"Parkview Hill, Woodbine Gardens"
9,M5B,Downtown Toronto,"Garden District, Ryerson"


##### Adding the geospatial data to my Data Frame

In [12]:
!wget -q -O 'Toronto_long_lat_data.csv'  http://cocl.us/Geospatial_data
df_coords = pd.read_csv('Toronto_long_lat_data.csv')
df_coords.columns=['Postal code','Latitude','Longitude']
df_coords.head()

Unnamed: 0,Postal code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


In [13]:
Toronto_df = pd.merge(df_new,
                 df_coords[['Postal code','Latitude', 'Longitude']],
                 on='Postal code')
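
`pd.merge` defaults to an inner join, so any postal code missing from the coordinates file would be dropped silently. The sketch below, on hypothetical data (`areas`, `coords` and the code `M9Z` are invented), shows one way to check for unmatched rows with a left join and the `indicator` option.

```python
import pandas as pd

# Hypothetical inputs: M9Z has no coordinates in the lookup table.
areas = pd.DataFrame({"Postal code": ["M3A", "M4A", "M9Z"],
                      "Borough": ["North York", "North York", "Etobicoke"]})
coords = pd.DataFrame({"Postal code": ["M3A", "M4A"],
                       "Latitude": [43.753259, 43.725882],
                       "Longitude": [-79.329656, -79.315572]})

# A left join keeps every area; indicator=True adds a `_merge` column
# whose value is "left_only" for rows without a match.
checked = areas.merge(coords, on="Postal code", how="left", indicator=True)
unmatched = checked.loc[checked["_merge"] == "left_only", "Postal code"].tolist()
print(unmatched)
```

An empty `unmatched` list would confirm that the inner join used in the notebook loses no rows.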

##### The resulting (and final) Dataframe is the following:

In [14]:
Toronto_df.head(10)

Unnamed: 0,Postal code,Borough,Neighbourhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.753259,-79.329656
1,M4A,North York,Victoria Village,43.725882,-79.315572
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636
3,M6A,North York,"Lawrence Manor, Lawrence Heights",43.718518,-79.464763
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494
5,M9A,Etobicoke,Islington Avenue,43.667856,-79.532242
6,M1B,Scarborough,"Malvern, Rouge",43.806686,-79.194353
7,M3B,North York,Don Mills,43.745906,-79.352188
8,M4B,East York,"Parkview Hill, Woodbine Gardens",43.706397,-79.309937
9,M5B,Downtown Toronto,"Garden District, Ryerson",43.657162,-79.378937


In [15]:
Toronto_df.shape

(103, 5)

##### Now, we can explore the neighbourhoods in Toronto with Foursquare

In [16]:
address = 'Toronto, ON'

geolocator = Nominatim(user_agent="Toronto")
location = geolocator.geocode(address)
latitude_toronto = location.latitude
longitude_toronto = location.longitude
print('The geographical coordinates of Toronto are {}, {}.'.format(latitude_toronto, longitude_toronto))


The geographical coordinates of Toronto are 43.6534817, -79.3839347.


##### First, locate all of Toronto's neighborhoods on our map

In [17]:
map_toronto = folium.Map(location=[latitude_toronto, longitude_toronto], zoom_start=10)

# add markers to map
for lat, lng, borough, Neighbourhood in zip(Toronto_df['Latitude'], Toronto_df['Longitude'], Toronto_df['Borough'], Toronto_df['Neighbourhood']):
    label = '{}, {}'.format(Neighbourhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto)  
    
map_toronto

##### Define Foursquare credentials, connect to their API and download all the neighborhoods' venues

In [18]:
CLIENT_ID = 'xxxxx'
CLIENT_SECRET = 'xxxxx'
VERSION = 'xxxxx'
# defining radius and limit of venues to get
radius=500
LIMIT=100
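
Rather than pasting credentials into the notebook, they can be read from the environment. This is a minimal sketch under stated assumptions: the helper `load_foursquare_credentials` and the three environment variable names are hypothetical, not part of Foursquare's tooling.

```python
import os


def load_foursquare_credentials(env=os.environ):
    """Read credentials from environment variables (names are hypothetical)."""
    keys = ("FOURSQUARE_CLIENT_ID", "FOURSQUARE_CLIENT_SECRET", "FOURSQUARE_VERSION")
    missing = [k for k in keys if not env.get(k)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return tuple(env[k] for k in keys)


# Usage with a stand-in mapping so the sketch runs without real credentials.
demo_env = {
    "FOURSQUARE_CLIENT_ID": "xxxxx",
    "FOURSQUARE_CLIENT_SECRET": "xxxxx",
    "FOURSQUARE_VERSION": "xxxxx",
}
CLIENT_ID, CLIENT_SECRET, VERSION = load_foursquare_credentials(demo_env)
```

Calling the helper with no argument would read the real process environment instead of `demo_env`.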

In [19]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighbourhood', 
                  'Neighbourhood Latitude', 
                  'Neighbourhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
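
`getNearbyVenues` indexes `v['venue']['categories'][0]` directly, which would raise an `IndexError` if a venue came back with an empty category list. Whether the API ever returns such a venue is an assumption here. The sketch below uses a hypothetical helper, `extract_venue_rows`, and a made-up payload that mirrors only the keys the function above reads.

```python
def extract_venue_rows(name, lat, lng, items):
    """Flatten explore-style items into tuples, tolerating missing categories."""
    rows = []
    for item in items:
        venue = item.get("venue", {})
        location = venue.get("location", {})
        categories = venue.get("categories") or []
        # Fall back to None instead of failing on an empty category list
        category = categories[0]["name"] if categories else None
        rows.append((name, lat, lng, venue.get("name"),
                     location.get("lat"), location.get("lng"), category))
    return rows


# Hypothetical payload; the second venue has no category.
sample_items = [
    {"venue": {"name": "Brookbanks Park",
               "location": {"lat": 43.751976, "lng": -79.332140},
               "categories": [{"name": "Park"}]}},
    {"venue": {"name": "Unnamed Spot",
               "location": {"lat": 43.75, "lng": -79.33},
               "categories": []}},
]
rows = extract_venue_rows("Parkwoods", 43.753259, -79.329656, sample_items)
print(rows)
```

Each tuple has the same seven fields, in the same order, as the columns assigned to `nearby_venues`.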

In [20]:

toronto_venues = getNearbyVenues(names=Toronto_df['Neighbourhood'],
                                   latitudes=Toronto_df['Latitude'],
                                   longitudes=Toronto_df['Longitude']
                                  )

Parkwoods
Victoria Village
Regent Park, Harbourfront
Lawrence Manor, Lawrence Heights
Queen's Park, Ontario Provincial Government
Islington Avenue
Malvern, Rouge
Don Mills
Parkview Hill, Woodbine Gardens
Garden District, Ryerson
Glencairn
West Deane Park, Princess Gardens, Martin Grove, Islington, Cloverdale
Rouge Hill, Port Union, Highland Creek
Don Mills
Woodbine Heights
St. James Town
Humewood-Cedarvale
Eringate, Bloordale Gardens, Old Burnhamthorpe, Markland Wood
Guildwood, Morningside, West Hill
The Beaches
Berczy Park
Caledonia-Fairbanks
Woburn
Leaside
Central Bay Street
Christie
Cedarbrae
Hillcrest Village
Bathurst Manor, Wilson Heights, Downsview North
Thorncliffe Park
Richmond, Adelaide, King
Dufferin, Dovercourt Village
Scarborough Village
Fairview, Henry Farm, Oriole
Northwood Park, York University
East Toronto
Harbourfront East, Union Station, Toronto Islands
Little Portugal, Trinity
Kennedy Park, Ionview, East Birchmount Park
Bayview Village
Downsview
The Danforth West, Ri

In [21]:
toronto_venues

Unnamed: 0,Neighbourhood,Neighbourhood Latitude,Neighbourhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Parkwoods,43.753259,-79.329656,Brookbanks Park,43.751976,-79.332140,Park
1,Parkwoods,43.753259,-79.329656,Variety Store,43.751974,-79.333114,Food & Drink Shop
2,Parkwoods,43.753259,-79.329656,TTC stop - 44 Valley Woods,43.755402,-79.333741,Bus Stop
3,Victoria Village,43.725882,-79.315572,Victoria Village Arena,43.723481,-79.315635,Hockey Arena
4,Victoria Village,43.725882,-79.315572,Tim Hortons,43.725517,-79.313103,Coffee Shop
5,Victoria Village,43.725882,-79.315572,Portugril,43.725819,-79.312785,Portuguese Restaurant
6,Victoria Village,43.725882,-79.315572,The Frig,43.727051,-79.317418,French Restaurant
7,"Regent Park, Harbourfront",43.654260,-79.360636,Roselle Desserts,43.653447,-79.362017,Bakery
8,"Regent Park, Harbourfront",43.654260,-79.360636,Tandem Coffee,43.653559,-79.361809,Coffee Shop
9,"Regent Park, Harbourfront",43.654260,-79.360636,Morning Glory Cafe,43.653947,-79.361149,Breakfast Spot


In [22]:
toronto_venues.shape

(2137, 7)

##### Number of venues for each neighbourhood:

In [23]:
toronto_venues.groupby('Neighbourhood').count()

Neighbourhood,Neighbourhood Latitude,Neighbourhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Agincourt,4,4,4,4,4,4
"Alderwood, Long Branch",10,10,10,10,10,10
"Bathurst Manor, Wilson Heights, Downsview North",19,19,19,19,19,19
Bayview Village,4,4,4,4,4,4
"Bedford Park, Lawrence Manor East",22,22,22,22,22,22
Berczy Park,57,57,57,57,57,57
"Birch Cliff, Cliffside West",4,4,4,4,4,4
"Brockton, Parkdale Village, Exhibition Place",22,22,22,22,22,22
Business reply mail Processing Centre,18,18,18,18,18,18
"CN Tower, King and Spadina, Railway Lands, Harbourfront West, Bathurst Quay, South Niagara, Island airport",16,16,16,16,16,16


##### Analyzing each neighbourhood

In [24]:
# one hot encoding
toronto_onehot = pd.get_dummies(toronto_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighbourhood column back to dataframe
toronto_onehot['Neighbourhood'] = toronto_venues['Neighbourhood'] 

# build a column order with Neighbourhood first (the order is not applied here; the groupby below puts Neighbourhood first anyway)
fixed_columns = [toronto_onehot.columns[-1]] + list(toronto_onehot.columns[:-1])
toronto_onehot.head()

Unnamed: 0,Accessories Store,Afghan Restaurant,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Antique Shop,...,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Warehouse Store,Wine Bar,Wings Joint,Women's Store,Yoga Studio,Neighbourhood
0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,Parkwoods
1,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,Parkwoods
2,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,Parkwoods
3,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,Victoria Village
4,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,Victoria Village


In [25]:
toronto_onehot.shape

(2137, 267)

##### Grouping rows by neighbourhood and taking the mean frequency of occurrence of each category

In [26]:
toronto_grouped = toronto_onehot.groupby('Neighbourhood').mean().reset_index()
toronto_grouped.head(5)

Unnamed: 0,Neighbourhood,Accessories Store,Afghan Restaurant,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,...,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Warehouse Store,Wine Bar,Wings Joint,Women's Store,Yoga Studio
0,Agincourt,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,"Alderwood, Long Branch",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,"Bathurst Manor, Wilson Heights, Downsview North",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.052632,0.0,0.0,0.0,0.0,0.0,0.0
3,Bayview Village,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,"Bedford Park, Lawrence Manor East",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.045455,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
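
Why the mean of one-hot columns yields a relative frequency is easiest to see on a tiny example. The sketch below uses a hypothetical four-venue frame (`toy_venues`); the column names follow the notebook, the values are invented.

```python
import pandas as pd

# Hypothetical venues: area A has 3 venues, area B has 1.
toy_venues = pd.DataFrame({
    "Neighbourhood": ["A", "A", "A", "B"],
    "Venue Category": ["Park", "Park", "Coffee Shop", "Park"],
})

# One indicator column per category (dtype=int gives 0/1 values).
onehot = pd.get_dummies(toy_venues["Venue Category"], dtype=int)
onehot.insert(0, "Neighbourhood", toy_venues["Neighbourhood"])

# The mean of 0/1 indicators is the share of venues in each category.
freq = onehot.groupby("Neighbourhood").mean().reset_index()
print(freq)
```

Each row of `freq` sums to 1, so the values describe the composition of a neighbourhood's venues, not how many venues it has.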


##### Analyzing each neighbourhood and its top 5 venues

In [27]:
num_top_venues = 5

for hood in toronto_grouped['Neighbourhood']:
    print("----"+hood+"----")
    temp = toronto_grouped[toronto_grouped['Neighbourhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Agincourt----
                       venue  freq
0                     Lounge  0.25
1               Skating Rink  0.25
2  Latin American Restaurant  0.25
3             Breakfast Spot  0.25
4              Metro Station  0.00


----Alderwood, Long Branch----
            venue  freq
0     Pizza Place   0.2
1    Dance Studio   0.1
2  Sandwich Place   0.1
3             Pub   0.1
4            Pool   0.1


----Bathurst Manor, Wilson Heights, Downsview North----
            venue  freq
0            Bank  0.11
1     Coffee Shop  0.11
2        Pharmacy  0.05
3  Ice Cream Shop  0.05
4   Shopping Mall  0.05


----Bayview Village----
                 venue  freq
0                 Café  0.25
1  Japanese Restaurant  0.25
2                 Bank  0.25
3   Chinese Restaurant  0.25
4                Motel  0.00


----Bedford Park, Lawrence Manor East----
                venue  freq
0  Italian Restaurant  0.09
1         Coffee Shop  0.09
2      Sandwich Place  0.09
3          Restaurant  0.09
4        

                             venue  freq
0                   Baseball Field   1.0
1                Accessories Store   0.0
2               Mexican Restaurant   0.0
3  Molecular Gastronomy Restaurant   0.0
4       Modern European Restaurant   0.0


----Humewood-Cedarvale----
                             venue  freq
0                            Field  0.33
1                            Trail  0.33
2                     Hockey Arena  0.33
3               Mexican Restaurant  0.00
4  Molecular Gastronomy Restaurant  0.00


----India Bazaar, The Beaches West----
                  venue  freq
0  Fast Food Restaurant  0.09
1        Sandwich Place  0.09
2     Fish & Chips Shop  0.05
3            Board Shop  0.05
4            Restaurant  0.05


----Kennedy Park, Ionview, East Birchmount Park----
               venue  freq
0      Train Station  0.14
1   Department Store  0.14
2     Discount Store  0.14
3         Hobby Shop  0.14
4  Convenience Store  0.14


----Kensington Market, Chinatown, Grange

                 venue  freq
0                 Café  0.10
1          Coffee Shop  0.08
2              Brewery  0.05
3               Bakery  0.05
4  American Restaurant  0.05


----Summerhill West, Rathnelly, South Hill, Forest Hill SE, Deer Park----
                venue  freq
0         Coffee Shop  0.12
1                 Pub  0.12
2  Light Rail Station  0.06
3        Liquor Store  0.06
4          Sports Bar  0.06


----The Annex, North Midtown, Yorkville----
                           venue  freq
0                           Café  0.14
1                 Sandwich Place  0.14
2                    Coffee Shop  0.09
3                           Park  0.05
4  Vegetarian / Vegan Restaurant  0.05


----The Beaches----
               venue  freq
0  Health Food Store  0.25
1                Pub  0.25
2       Neighborhood  0.25
3              Trail  0.25
4  Accessories Store  0.00


----The Danforth West, Riverdale----
                    venue  freq
0        Greek Restaurant  0.21
1      Italian 

##### Converting all that text into a Dataframe:

In [28]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

##### Create a new dataframe and display the top 10 venues for each neighborhood.

In [29]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighbourhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighbourhoods_venues_sorted = pd.DataFrame(columns=columns)
neighbourhoods_venues_sorted['Neighbourhood'] = toronto_grouped['Neighbourhood']

for ind in np.arange(toronto_grouped.shape[0]):
    neighbourhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(toronto_grouped.iloc[ind, :], num_top_venues)

neighbourhoods_venues_sorted.head()

Unnamed: 0,Neighbourhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Agincourt,Skating Rink,Latin American Restaurant,Lounge,Breakfast Spot,Dog Run,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Distribution Center
1,"Alderwood, Long Branch",Pizza Place,Gym,Pharmacy,Skating Rink,Pub,Pool,Sandwich Place,Dance Studio,Coffee Shop,Distribution Center
2,"Bathurst Manor, Wilson Heights, Downsview North",Coffee Shop,Bank,Middle Eastern Restaurant,Pizza Place,Sandwich Place,Bridal Shop,Diner,Restaurant,Deli / Bodega,Ice Cream Shop
3,Bayview Village,Café,Bank,Chinese Restaurant,Japanese Restaurant,Yoga Studio,Department Store,Dim Sum Restaurant,Diner,Discount Store,Distribution Center
4,"Bedford Park, Lawrence Manor East",Sandwich Place,Restaurant,Coffee Shop,Italian Restaurant,Liquor Store,Butcher,Indian Restaurant,Café,Sushi Restaurant,Pizza Place


## Methodology <a name="methodology"></a>

In this project we direct our efforts at detecting Toronto neighborhoods with a low relative presence of restaurants that also show the characteristics of a good neighborhood.

In the first step we collected the **geospatial data of Toronto's neighborhoods**, and in the second we **obtained the venues of each neighborhood through the Foursquare API**, in preparation for the third step.

The third step clusters the neighborhoods with K-means, an unsupervised method (our model focuses on the relative presence of the main venue categories in each neighborhood), to simplify the exploration of the different areas. We then analyze the relative density of restaurants across the clusters, together with the presence of the other venue categories that drive the clustering, in order to choose our best recommendation.
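
The relative density of restaurants can also be computed directly from a frequency table. The following is a minimal sketch on a hypothetical table (`toy_grouped`) laid out like `toronto_grouped`. It assumes that a category counts as a restaurant when its name contains "Restaurant", which would miss categories such as "Pizza Place" or "Diner".

```python
import pandas as pd

# Hypothetical frequency table in the same layout as `toronto_grouped`.
toy_grouped = pd.DataFrame({
    "Neighbourhood": ["A", "B"],
    "Park": [0.50, 0.10],
    "Italian Restaurant": [0.25, 0.40],
    "Fast Food Restaurant": [0.00, 0.30],
    "Coffee Shop": [0.25, 0.20],
})

# Assumption: a category is a restaurant if its name contains "Restaurant".
restaurant_cols = [c for c in toy_grouped.columns if "Restaurant" in c]
share = (
    toy_grouped.set_index("Neighbourhood")[restaurant_cols]
               .sum(axis=1)
               .rename("Restaurant share")
               .sort_values()
)
print(share)
```

Sorting in ascending order puts the neighbourhoods with the lowest restaurant share first, which is the property the recommendation looks for.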

In [30]:
# set number of clusters
kclusters = 5

toronto_grouped_clustering = toronto_grouped.drop('Neighbourhood', axis=1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(toronto_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_ 

array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 3,
       1, 1, 1, 1, 0, 1, 1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1, 4, 1, 1, 1, 1,
       1, 1, 0, 1, 1, 3, 0, 1, 2, 1, 0, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0,
       1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1,
       0, 1, 1, 1, 1, 1, 0, 1], dtype=int32)
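
A quick tally of the labels shows how unevenly the neighbourhoods are spread across clusters. The sketch below copies the label array printed above into a plain list and counts it with the standard library. Note that the labels are zero-based, while the cluster headings that follow count from 1.

```python
from collections import Counter

# Labels copied from the `kmeans.labels_` output above.
labels = [
    1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 3,
    1, 1, 1, 1, 0, 1, 1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1, 4, 1, 1, 1, 1,
    1, 1, 0, 1, 1, 3, 0, 1, 2, 1, 0, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0,
    1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1,
    0, 1, 1, 1, 1, 1, 0, 1,
]

counts = Counter(labels)
for label in sorted(counts):
    share = counts[label] / len(labels)
    print(f"label {label}: {counts[label]} neighbourhoods ({share:.0%})")
```

These counts refer to the grouped neighbourhood table, so they may differ slightly from counts taken after merging back onto postal codes.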

##### Create a new Data Frame that includes the cluster as well as the top 10 venues for each neighborhood.

In [31]:
# add clustering labels
neighbourhoods_venues_sorted.insert(0, 'Cluster_Labels', kmeans.labels_)

toronto_merged = Toronto_df

# merge toronto_grouped with toronto_data to add latitude/longitude for each neighborhood
toronto_merged = toronto_merged.join(neighbourhoods_venues_sorted.set_index('Neighbourhood'), on='Neighbourhood')

toronto_merged.head()

Unnamed: 0,Postal code,Borough,Neighbourhood,Latitude,Longitude,Cluster_Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M3A,North York,Parkwoods,43.753259,-79.329656,0.0,Park,Food & Drink Shop,Bus Stop,Eastern European Restaurant,Drugstore,Donut Shop,Doner Restaurant,Dog Run,Ethiopian Restaurant,Discount Store
1,M4A,North York,Victoria Village,43.725882,-79.315572,1.0,French Restaurant,Coffee Shop,Portuguese Restaurant,Hockey Arena,Discount Store,Deli / Bodega,Department Store,Dessert Shop,Dim Sum Restaurant,Diner
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636,1.0,Coffee Shop,Park,Pub,Bakery,Breakfast Spot,Café,Restaurant,Theater,Yoga Studio,Hotel
3,M6A,North York,"Lawrence Manor, Lawrence Heights",43.718518,-79.464763,1.0,Clothing Store,Accessories Store,Arts & Crafts Store,Coffee Shop,Miscellaneous Shop,Shoe Store,Boutique,Furniture / Home Store,Event Space,Vietnamese Restaurant
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494,1.0,Coffee Shop,Sushi Restaurant,Yoga Studio,Diner,Park,Mexican Restaurant,Juice Bar,Japanese Restaurant,Italian Restaurant,Hobby Shop


##### No venue data is available for some neighbourhoods, so we drop them.

In [32]:
toronto_merged=toronto_merged.dropna()

In [33]:
toronto_merged['Cluster_Labels'] = toronto_merged.Cluster_Labels.astype(int)

##### Analyzing the results

In [34]:
map_clusters = folium.Map(location=[latitude_toronto, longitude_toronto], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(toronto_merged['Latitude'], toronto_merged['Longitude'], toronto_merged['Neighbourhood'], toronto_merged['Cluster_Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

##### Cluster 1

In [35]:
toronto_merged.loc[toronto_merged['Cluster_Labels'] == 0, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster_Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,North York,0,Park,Food & Drink Shop,Bus Stop,Eastern European Restaurant,Drugstore,Donut Shop,Doner Restaurant,Dog Run,Ethiopian Restaurant,Discount Store
10,North York,0,Park,Metro Station,Pub,Japanese Restaurant,Donut Shop,Doner Restaurant,Dog Run,Distribution Center,Dance Studio,Discount Store
21,York,0,Park,Pool,Women's Store,Airport,Falafel Restaurant,Ethiopian Restaurant,Electronics Store,Eastern European Restaurant,Drugstore,Donut Shop
35,East York,0,Park,Convenience Store,Coffee Shop,Department Store,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Distribution Center,Dog Run
49,North York,0,Park,Bakery,Construction & Landscaping,Electronics Store,Eastern European Restaurant,Drugstore,Donut Shop,Doner Restaurant,Deli / Bodega,Dog Run
61,Central Toronto,0,Park,Swim School,Bus Line,Discount Store,Department Store,Dessert Shop,Dim Sum Restaurant,Diner,Distribution Center,Event Space
64,York,0,Park,Distribution Center,Deli / Bodega,Department Store,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Yoga Studio,Dance Studio
66,North York,0,Park,Convenience Store,Bank,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Distribution Center,Dog Run,Deli / Bodega
68,Central Toronto,0,Park,Trail,Sushi Restaurant,Jewelry Store,Department Store,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Distribution Center
85,Scarborough,0,Park,Bakery,Playground,Distribution Center,Department Store,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Dog Run


##### Cluster 2

In [36]:
toronto_merged.loc[toronto_merged['Cluster_Labels'] == 1, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster_Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,North York,1,French Restaurant,Coffee Shop,Portuguese Restaurant,Hockey Arena,Discount Store,Deli / Bodega,Department Store,Dessert Shop,Dim Sum Restaurant,Diner
2,Downtown Toronto,1,Coffee Shop,Park,Pub,Bakery,Breakfast Spot,Café,Restaurant,Theater,Yoga Studio,Hotel
3,North York,1,Clothing Store,Accessories Store,Arts & Crafts Store,Coffee Shop,Miscellaneous Shop,Shoe Store,Boutique,Furniture / Home Store,Event Space,Vietnamese Restaurant
4,Downtown Toronto,1,Coffee Shop,Sushi Restaurant,Yoga Studio,Diner,Park,Mexican Restaurant,Juice Bar,Japanese Restaurant,Italian Restaurant,Hobby Shop
7,North York,1,Gym,Restaurant,Coffee Shop,Japanese Restaurant,Beer Store,Asian Restaurant,Chinese Restaurant,Discount Store,Supermarket,Café
8,East York,1,Pizza Place,Bank,Gym / Fitness Center,Intersection,Pet Store,Pharmacy,Gastropub,Fast Food Restaurant,Athletics & Sports,Diner
9,Downtown Toronto,1,Clothing Store,Coffee Shop,Café,Cosmetics Shop,Middle Eastern Restaurant,Restaurant,Bubble Tea Shop,Japanese Restaurant,Italian Restaurant,Theater
11,Etobicoke,1,Golf Course,Yoga Studio,Dog Run,Department Store,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Distribution Center,Doner Restaurant
12,Scarborough,1,Bar,Construction & Landscaping,Yoga Studio,Doner Restaurant,Dim Sum Restaurant,Diner,Discount Store,Distribution Center,Dog Run,Donut Shop
13,North York,1,Gym,Restaurant,Coffee Shop,Japanese Restaurant,Beer Store,Asian Restaurant,Chinese Restaurant,Discount Store,Supermarket,Café


##### Cluster 3

In [37]:
toronto_merged.loc[toronto_merged['Cluster_Labels'] == 2, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster_Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
32,Scarborough,2,Playground,Convenience Store,Yoga Studio,Distribution Center,Department Store,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Dog Run
83,Central Toronto,2,Playground,Yoga Studio,Distribution Center,Deli / Bodega,Department Store,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Dog Run


##### Cluster 4

In [38]:
toronto_merged.loc[toronto_merged['Cluster_Labels'] == 3, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster_Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
6,Scarborough,3,Fast Food Restaurant,Deli / Bodega,Event Space,Ethiopian Restaurant,Electronics Store,Eastern European Restaurant,Drugstore,Donut Shop,Doner Restaurant,Dog Run
56,York,3,Fast Food Restaurant,Sandwich Place,Bar,Fried Chicken Joint,Yoga Studio,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Dog Run


##### Cluster 5

In [39]:
toronto_merged.loc[toronto_merged['Cluster_Labels'] ==4, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster_Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
57,North York,4,Baseball Field,Yoga Studio,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Distribution Center,Dog Run,Doner Restaurant,Farmers Market


## Results and Discussion <a name="results"></a>

Our analysis creates the following five clusters of neighborhoods:

1. <b>Cluster 1.</b> Contains 12% of Toronto's postal code areas. This cluster is characterized by a high relative presence of parks.
2. <b>Cluster 2.</b> Contains 83% of Toronto's postal code areas. This cluster is heterogeneous in its most common venues: coffee shops, cafés, pizza places, grocery stores, clothing stores, gyms, etc.
3. <b>Cluster 3.</b> Contains neighborhoods such as Scarborough Village (Scarborough) and Moore Park and Summerhill East (Central Toronto). This cluster shows the characteristic sought for our recommendation: a low presence of restaurants. The most common venue in both areas is Playground, and restaurant-related venues do not appear until the seventh and eighth most common venues (Dim Sum Restaurant and Diner).
4. <b>Cluster 4.</b> Contains two postal code areas, with neighborhoods such as Del Ray, Mount Dennis, Keelsdale, Silverthorn, Malvern and Rouge, characterized by a high relative presence of fast food restaurants.
5. <b>Cluster 5.</b> Contains a single postal code area, with the neighborhoods of Humberlea and Emery, characterized by a high relative presence of baseball fields and yoga studios.

The result is the selection of the neighborhoods of Scarborough Village (Scarborough) and Moore Park and Summerhill East (Central Toronto) as our recommendation for our stakeholders, due to their relatively low presence of restaurants and a good combination of other main venues (which are also unrelated to restaurants).

## Conclusion <a name="conclusion"></a>

The purpose of this project was to identify Toronto neighborhoods with a low number of restaurants, in order to help our client narrow down the search for the optimal neighborhood for its next restaurant. By clustering the neighborhoods according to the relative composition of their venues, using information obtained from Foursquare, we identified __Scarborough Village (Scarborough), Moore Park and Summerhill East (Central Toronto)__.

The final decision on the restaurant location will be made by our stakeholders based on the availability of premises for sale or rent in the recommended neighborhoods, also taking into consideration additional factors such as the attractiveness of each location and its social and economic characteristics.