This notebook presents the assignment for week 4 of the Applied Data Science Capstone class, from IBM and Coursera.

Author: Joana Smith

# Part 1 - Scrape the web and generate a pandas DataFrame

Scrape the Wikipedia page https://en.wikipedia.org/wiki/List_of_Houston_neighborhoods to obtain the data in the table of Super Neighborhoods and transform it into a pandas dataframe like the one shown below:

In [1]:
import requests
import urllib.request  # urlopen lives in the urllib.request submodule, which must be imported explicitly
from bs4 import BeautifulSoup

Open the website and create a new BeautifulSoup object, soup, from its HTML

In [3]:
url='https://en.wikipedia.org/wiki/List_of_Houston_neighborhoods'
wiki_html = urllib.request.urlopen(url).read()
soup = BeautifulSoup(wiki_html, 'html.parser')  # name the parser explicitly to avoid the bs4 warning
#print(soup.prettify())

### ii) Scrape the soup and get the data we need: names of the Super Neighborhoods

In [4]:
table=soup.find('table')

In [10]:
import pandas as pd

# collect one record per table row
rows = []
for neighborhood in table.find_all('tr'):
    # text of each cell, without the newlines
    temp = [info.text.replace('\n', '') for info in neighborhood.find_all('td')]

    if len(temp) > 1:  # skip rows without "td" cells (e.g. the header)
        rows.append({'Super Neighborhood': temp[1]})

# build the dataframe in one step; DataFrame.append was removed in pandas 2.0
df = pd.DataFrame(rows, columns=['Super Neighborhood'])

i = 0  # counter of geocoding requests, used in Part 2




In [13]:
df.head()

Unnamed: 0,Super Neighborhood
0,Willowbrook
1,Greater Greenspoint
2,Carverdale
3,Fairbanks / Northwest Crossing
4,Greater Inwood


And also the shape of our dataframe:

In [14]:
df.shape

(88, 1)
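
The row-extraction step can be tried offline on a small inline table. The sketch below is only an illustration, assuming a table whose second column holds the neighborhood name; the HTML snippet and the helper name `extract_second_column` are made up for this example and are not taken from the Wikipedia page.

```python
from bs4 import BeautifulSoup
import pandas as pd

# Hypothetical stand-in for the Wikipedia table (not the real page content)
SAMPLE_HTML = """
<table>
  <tr><th>#</th><th>Name</th></tr>
  <tr><td>1</td><td>Willowbrook
</td></tr>
  <tr><td>2</td><td>Greater Greenspoint
</td></tr>
</table>
"""

def extract_second_column(html):
    """Return the text of the second <td> of every data row in the first table."""
    soup = BeautifulSoup(html, 'html.parser')
    table = soup.find('table')
    names = []
    for tr in table.find_all('tr'):
        cells = [td.text.replace('\n', '').strip() for td in tr.find_all('td')]
        if len(cells) > 1:  # header rows only contain <th> cells
            names.append(cells[1])
    return names

sample_df = pd.DataFrame({'Super Neighborhood': extract_second_column(SAMPLE_HTML)})
print(sample_df)
```

Collecting the names in a list and building the dataframe once avoids growing it row by row.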

# Part 2 - Generate a dataframe that contains latitude and longitude for each Super Neighborhood

### i) First attempt: Getting latitude and longitude using Geocoder

In [16]:
import geocoder

In [20]:
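# Loop through the Super Neighborhoods
max_attempts = 5  # stop retrying after a few failed requests instead of looping forever
for neighb in df['Super Neighborhood']:

    # initialize the coordinates to None
    lat_lng_coords = None
    attempts = 0

    # retry until the geocoder returns coordinates or the attempts run out
    while lat_lng_coords is None and attempts < max_attempts:
        # the neighborhoods are in Houston, so qualify the query accordingly
        g = geocoder.arcgis('{}, Houston, Texas'.format(neighb))
        lat_lng_coords = g.latlng
        attempts += 1
        i = i + 1  # running count of geocoding requests

    if lat_lng_coords is not None:
        df.loc[df['Super Neighborhood'] == neighb, 'Latitude'] = lat_lng_coords[0]
        df.loc[df['Super Neighborhood'] == neighb, 'Longitude'] = lat_lng_coords[1]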
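
A retry loop like the one above can be bounded so that a neighborhood the geocoder never resolves does not block the notebook. The following is a minimal sketch, assuming a geocoding callable that returns `[lat, lng]` or `None`; `geocode_with_retry` and `fake_geocode` are hypothetical names, and the fake function stands in for `geocoder.arcgis(...).latlng` so the example runs offline with made-up coordinates.

```python
def geocode_with_retry(query, geocode, max_attempts=5):
    """Call geocode(query) until it returns coordinates or attempts run out."""
    for _ in range(max_attempts):
        coords = geocode(query)
        if coords is not None:
            return coords
    return None  # the caller decides how to handle unresolved names

# Hypothetical geocoder: fails twice, then returns made-up coordinates
calls = {'n': 0}

def fake_geocode(query):
    calls['n'] += 1
    return None if calls['n'] < 3 else [29.96, -95.54]

result = geocode_with_retry('Willowbrook, Houston, Texas', fake_geocode)
print(result, calls['n'])
```

Returning `None` after the last attempt lets the calling loop skip the neighborhood or log it for manual review.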

In [21]:
df.head(12)

Unnamed: 0,Super Neighborhood,Latitude,Longitude
0,Willowbrook,43.615836,-79.504731
1,Greater Greenspoint,44.15147,-77.0648
2,Carverdale,43.819029,-79.311113
3,Fairbanks / Northwest Crossing,43.775523,-79.509942
4,Greater Inwood,43.68733,-79.336978
5,Acres Home,43.671878,-79.428455
6,Hidden Valley,43.782629,-79.458019
7,Westbranch,43.649192,-79.46799
8,Addicks / Park Ten,43.598509,-79.508027
9,Spring Branch West,43.647244,-79.460321

Note: the coordinates in this output lie in Ontario (latitude about 43 to 44, longitude about -77 to -79), not in the Houston area (roughly 29.8, -95.4), which is what a geocoding query suffixed with 'Toronto, Ontario' returns.


# Part 3 - Exploring and Clustering the Neighborhoods of Toronto

Let's first import all the libraries needed for this section.

In [23]:
import numpy as np # library to handle data in a vectorized manner
import json # library to handle JSON files

from pandas import json_normalize # transform JSON file into a pandas dataframe
# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors
# import k-means from clustering stage
from sklearn.cluster import KMeans
import folium # map rendering library

### ii) Let's take a quick look at the Super Neighborhoods of Houston on a map:

In [28]:
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

address = 'Houston, Texas'

geolocator = Nominatim(user_agent="houston_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude

# create map of Houston using latitude and longitude values
map_houston = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, neighborhood in zip(df['Latitude'], df['Longitude'],df['Super Neighborhood']):
    label = '{}'.format(neighborhood)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_houston)  
    
map_houston

### iii) Getting venues information from Foursquare

Setting client_id, client_secret and version for the connection with the Foursquare API

In [32]:
# @hidden_cell
# Define Foursquare Credentials and Version


Create functions to extract the category of the venue & to repeat this process for each neighborhood

In [33]:
# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:  # fall back to the flattened column name
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']


# Let's create a function to repeat the same process for all the neighborhoods in Toronto
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        #print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return nearby_venues
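
The list comprehension in getNearbyVenues indexes `categories[0]` directly, which raises an IndexError for a venue without categories. Below is a hedged sketch of the same flattening step with that case handled; the `sample_items` payload is invented to mirror the keys the function reads (`venue`, `location`, `categories`), and `flatten_items` is a hypothetical helper, not part of the Foursquare API.

```python
import pandas as pd

def flatten_items(name, lat, lng, items):
    """Turn the 'items' list of one explore response into row tuples."""
    rows = []
    for v in items:
        venue = v['venue']
        categories = venue.get('categories', [])
        # use None when a venue has no category instead of failing
        category = categories[0]['name'] if categories else None
        rows.append((name, lat, lng, venue['name'],
                     venue['location']['lat'], venue['location']['lng'], category))
    return rows

# Invented payload shaped like response['groups'][0]['items']
sample_items = [
    {'venue': {'name': 'Cafe A', 'location': {'lat': 43.65, 'lng': -79.36},
               'categories': [{'name': 'Coffee Shop'}]}},
    {'venue': {'name': 'Venue B', 'location': {'lat': 43.66, 'lng': -79.37},
               'categories': []}},
]

venues = pd.DataFrame(
    flatten_items('Sample Neighborhood', 43.655, -79.363, sample_items),
    columns=['Neighborhood', 'Neighborhood Latitude', 'Neighborhood Longitude',
             'Venue', 'Venue Latitude', 'Venue Longitude', 'Venue Category'])
print(venues[['Venue', 'Venue Category']])
```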

Now let's call the function defined above to get up to 100 venues within a radius of 500 m of the center of each neighborhood

In [34]:
# Call the function
LIMIT=100

toronto_venues_pre = getNearbyVenues(names=df_toronto['Neighborhood'],
                                   latitudes=df_toronto['Latitude'],
                                   longitudes=df_toronto['Longitude']
                                  )


Let's check the shape of the resulting dataframe and check the first rows of it

In [35]:
# Let's check the size of the resulting dataframe
print('shape: ',toronto_venues_pre.shape)
toronto_venues_pre.head()

shape:  (1742, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,"Harbourfront, Regent Park",43.65512,-79.36264,Roselle Desserts,43.653447,-79.362017,Bakery
1,"Harbourfront, Regent Park",43.65512,-79.36264,Tandem Coffee,43.653559,-79.361809,Coffee Shop
2,"Harbourfront, Regent Park",43.65512,-79.36264,Figs Breakfast & Lunch,43.655675,-79.364503,Breakfast Spot
3,"Harbourfront, Regent Park",43.65512,-79.36264,Cocina Economica,43.654959,-79.365657,Mexican Restaurant
4,"Harbourfront, Regent Park",43.65512,-79.36264,Body Blitz Spa East,43.654735,-79.359874,Spa


In [36]:
#Let's check how many venues were returned for each neighborhood
toronto_venues_pre.groupby('Neighborhood').count()

Unnamed: 0_level_0,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
"Adelaide, King, Richmond",100,100,100,100,100,100
Berczy Park,65,65,65,65,65,65
"Brockton, Exhibition Place, Parkdale Village",71,71,71,71,71,71
Business Reply Mail Processing Centre 969 Eastern,100,100,100,100,100,100
"CN Tower, Bathurst Quay, Island airport, Harbourfront West, King and Spadina, Railway Lands, South Niagara",70,70,70,70,70,70
"Cabbagetown, St. James Town",43,43,43,43,43,43
Central Bay Street,98,98,98,98,98,98
"Chinatown, Grange Park, Kensington Market",98,98,98,98,98,98
Christie,9,9,9,9,9,9
Church and Wellesley,82,82,82,82,82,82


We can see that for some neighborhoods we got a very small number of venues. For example, Roselawn has only 1 venue listed, and The Beaches only 5.

Such small numbers of venues don't give us enough information to characterize a neighborhood and compare it with the others when clustering.

So let's drop the neighborhoods with 10 or fewer venues listed.

Step 1: create a list of neighborhoods with 10 or fewer venues:

In [37]:
min_venues=10
temp_group=toronto_venues_pre.groupby('Neighborhood').count()
temp_group=temp_group[temp_group['Venue']<=min_venues]
toExclude=temp_group.index.values

Step 2: exclude those neighborhoods from toronto_venues_pre to build the dataframe toronto_venues

In [38]:
toronto_venues=toronto_venues_pre[~toronto_venues_pre['Neighborhood'].isin(toExclude)].reset_index(drop=True)
#Let's check how many venues were returned for each neighborhood
toronto_venues.groupby('Neighborhood').count()

Unnamed: 0_level_0,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
"Adelaide, King, Richmond",100,100,100,100,100,100
Berczy Park,65,65,65,65,65,65
"Brockton, Exhibition Place, Parkdale Village",71,71,71,71,71,71
Business Reply Mail Processing Centre 969 Eastern,100,100,100,100,100,100
"CN Tower, Bathurst Quay, Island airport, Harbourfront West, King and Spadina, Railway Lands, South Niagara",70,70,70,70,70,70
"Cabbagetown, St. James Town",43,43,43,43,43,43
Central Bay Street,98,98,98,98,98,98
"Chinatown, Grange Park, Kensington Market",98,98,98,98,98,98
Church and Wellesley,82,82,82,82,82,82
"Commerce Court, Victoria Hotel",100,100,100,100,100,100


Now we can see that we only have neighborhoods with more than 10 venues.
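
The two filtering steps can also be written as a single boolean mask built with groupby and transform. This is a sketch on a small made-up dataframe (the `venues` data and the threshold of 2 are illustrative only); it keeps the same rule as above, dropping neighborhoods whose venue count is at or below the minimum.

```python
import pandas as pd

venues = pd.DataFrame({
    'Neighborhood': ['A', 'A', 'A', 'B', 'C', 'C'],
    'Venue': ['a1', 'a2', 'a3', 'b1', 'c1', 'c2'],
})
min_venues = 2

# number of venues in the neighborhood of each row
counts = venues.groupby('Neighborhood')['Venue'].transform('count')

# keep only rows whose neighborhood has more than min_venues venues
filtered = venues[counts > min_venues].reset_index(drop=True)
print(filtered)
```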

Let's see the shape of our dataframe:

In [39]:
toronto_venues.shape

(1692, 7)

So let's see how many unique categories we have:

In [40]:
#Let's find out how many unique categories can be curated from all the returned venues
print('There are {} unique categories.'.format(len(toronto_venues['Venue Category'].unique())))

There are 200 unique categories.


One hot encoding: Let's create a new dataframe toronto_onehot in which we use get_dummies to transform each venue category into a column with 1 (true) or 0 (false)

In [41]:
# one hot encoding
toronto_onehot = pd.get_dummies(toronto_venues[['Venue Category']],prefix='')
# putting the neighborhood column back
toronto_onehot['Neighborhood'] = toronto_venues['Neighborhood'] 
# move neighborhood column to the first column
fixed_columns = [toronto_onehot.columns[-1]] + list(toronto_onehot.columns[:-1])
toronto_onehot = toronto_onehot[fixed_columns]
#toronto_onehot.columns = toronto_onehot.columns.str.replace("_", "")
toronto_onehot.head()

Unnamed: 0,Neighborhood,_Afghan Restaurant,_American Restaurant,_Antique Shop,_Art Gallery,_Art Museum,_Arts & Crafts Store,_Asian Restaurant,_Athletics & Sports,_BBQ Joint,...,_Toy / Game Store,_Trail,_Train Station,_Vegetarian / Vegan Restaurant,_Video Game Store,_Vietnamese Restaurant,_Wine Bar,_Wine Shop,_Wings Joint,_Yoga Studio
0,"Harbourfront, Regent Park",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,"Harbourfront, Regent Park",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,"Harbourfront, Regent Park",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,"Harbourfront, Regent Park",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,"Harbourfront, Regent Park",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


And let's check the shape of the new dataframe

In [42]:
# And let's examine the new dataframe size.
toronto_onehot.shape

(1692, 201)
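
To make the encoding concrete, here is a small sketch of get_dummies on a made-up 'Venue Category' column. It passes prefix='' and prefix_sep='' so the new columns carry the bare category names; this differs from the cell above, where the default separator leaves a leading underscore.

```python
import pandas as pd

venues = pd.DataFrame({
    'Neighborhood': ['A', 'A', 'B'],
    'Venue Category': ['Bakery', 'Park', 'Bakery'],
})

# one column per category, 1 where the venue belongs to it
onehot = pd.get_dummies(venues[['Venue Category']], prefix='', prefix_sep='', dtype=int)

# put the neighborhood back as the first column
onehot.insert(0, 'Neighborhood', venues['Neighborhood'])
print(onehot)
```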

And then let's group rows by neighborhood and take the mean of the frequency of each category.

Let's see the shape and the first lines.

In [43]:
#Next, let's group rows by neighborhood and take the mean of the frequency of occurrence of each category
toronto_grouped = toronto_onehot.groupby('Neighborhood').mean().reset_index()
print('shape: ',toronto_grouped.shape)
toronto_grouped.head()

shape:  (25, 201)


Unnamed: 0,Neighborhood,_Afghan Restaurant,_American Restaurant,_Antique Shop,_Art Gallery,_Art Museum,_Arts & Crafts Store,_Asian Restaurant,_Athletics & Sports,_BBQ Joint,...,_Toy / Game Store,_Trail,_Train Station,_Vegetarian / Vegan Restaurant,_Video Game Store,_Vietnamese Restaurant,_Wine Bar,_Wine Shop,_Wings Joint,_Yoga Studio
0,"Adelaide, King, Richmond",0.0,0.03,0.0,0.01,0.0,0.0,0.03,0.0,0.0,...,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0
1,Berczy Park,0.0,0.0,0.0,0.015385,0.0,0.0,0.0,0.0,0.015385,...,0.0,0.0,0.0,0.015385,0.0,0.0,0.0,0.0,0.0,0.0
2,"Brockton, Exhibition Place, Parkdale Village",0.0,0.0,0.0,0.014085,0.0,0.014085,0.0,0.0,0.0,...,0.0,0.0,0.0,0.028169,0.0,0.014085,0.0,0.0,0.0,0.0
3,Business Reply Mail Processing Centre 969 Eastern,0.0,0.03,0.0,0.01,0.0,0.01,0.02,0.0,0.0,...,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0
4,"CN Tower, Bathurst Quay, Island airport, Harbo...",0.0,0.0,0.0,0.0,0.0,0.0,0.014286,0.0,0.0,...,0.0,0.0,0.0,0.014286,0.0,0.0,0.0,0.0,0.0,0.014286
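
Taking the mean of one-hot columns per neighborhood yields the share of that neighborhood's venues in each category, so each row sums to 1. A minimal sketch with made-up data:

```python
import pandas as pd

onehot = pd.DataFrame({
    'Neighborhood': ['A', 'A', 'A', 'A', 'B', 'B'],
    'Bakery':       [1, 0, 0, 0, 1, 0],
    'Park':         [0, 1, 1, 1, 0, 1],
})

# mean of 0/1 indicators = fraction of venues in each category
grouped = onehot.groupby('Neighborhood').mean().reset_index()
print(grouped)
```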


Let's put that into a pandas dataframe.

But first, let's write a function to sort the venues in descending order.

In [44]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]
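
As an illustration of what return_most_common_venues returns, the sketch below applies the same slicing and sorting to a single made-up row; the category names and frequencies are invented.

```python
import pandas as pd

def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]  # skip the 'Neighborhood' label
    row_categories_sorted = row_categories.sort_values(ascending=False)
    return row_categories_sorted.index.values[0:num_top_venues]

row = pd.Series({'Neighborhood': 'A', 'Bakery': 0.2, 'Coffee Shop': 0.5, 'Park': 0.3})
top2 = return_most_common_venues(row, 2)
print(list(top2))
```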

Now let's create the new dataframe and display the top 10 venues for each neighborhood.

In [45]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
# my comment: numpy.arange returns evenly spaced integers from 0 (the default start) up to, but not including, num_top_venues
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:  # positions beyond the third use the 'th' suffix
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = toronto_grouped['Neighborhood']

for ind in np.arange(toronto_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(toronto_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,"Adelaide, King, Richmond",_Coffee Shop,_Café,_Hotel,_Japanese Restaurant,_American Restaurant,_Burger Joint,_Deli / Bodega,_Gastropub,_Restaurant,_Bar
1,Berczy Park,_Coffee Shop,_Cocktail Bar,_Seafood Restaurant,_Restaurant,_Beer Bar,_Steakhouse,_Bakery,_Farmers Market,_Lounge,_Hotel
2,"Brockton, Exhibition Place, Parkdale Village",_Coffee Shop,_Café,_Furniture / Home Store,_Sandwich Place,_Restaurant,_Poutine Place,_Bakery,_Beer Bar,_Supermarket,_Hotel
3,Business Reply Mail Processing Centre 969 Eastern,_Coffee Shop,_Bar,_Steakhouse,_Café,_Hotel,_Japanese Restaurant,_Italian Restaurant,_Sushi Restaurant,_Thai Restaurant,_American Restaurant
4,"CN Tower, Bathurst Quay, Island airport, Harbo...",_Coffee Shop,_Italian Restaurant,_Gym / Fitness Center,_Café,_Restaurant,_Bar,_Speakeasy,_Park,_Sandwich Place,_Pub


Now we have an organized dataframe with the 10 most common venue categories for each neighborhood in Toronto, based on up to 100 venues per neighborhood.

It's time to cluster the neighborhoods based on this information!


### iv) Clustering neighborhoods

We are grouping the neighborhoods into 5 clusters.

In [46]:
# Run k-means to cluster the neighborhood into 5 clusters.

# set number of clusters
kclusters = 5

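# drop the label column so only the category frequencies are clustered
toronto_grouped_clustering = toronto_grouped.drop(columns='Neighborhood')

# run k-means clustering (n_init=10 matches the default of older scikit-learn releases)
kmeans = KMeans(n_clusters=kclusters, random_state=0, n_init=10).fit(toronto_grouped_clustering)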

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([1, 1, 0, 1, 0, 1, 1, 0, 1, 1])
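
For intuition, k-means on a tiny made-up frequency table separates rows with similar category profiles. This sketch assumes scikit-learn is installed and uses two clusters on four invented neighborhoods; n_init is set explicitly because its default changed across scikit-learn versions.

```python
import pandas as pd
from sklearn.cluster import KMeans

grouped = pd.DataFrame({
    'Neighborhood': ['A', 'B', 'C', 'D'],
    'Coffee Shop':  [0.9, 0.8, 0.1, 0.0],
    'Park':         [0.1, 0.2, 0.9, 1.0],
})

features = grouped.drop(columns='Neighborhood')
kmeans = KMeans(n_clusters=2, random_state=0, n_init=10).fit(features)

# label numbers are arbitrary; only the grouping is meaningful
grouped['Cluster Labels'] = kmeans.labels_
print(grouped[['Neighborhood', 'Cluster Labels']])
```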

In [47]:
#Let's create a new dataframe that includes the cluster as well as the top 10 venues for each neighborhood.

# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

toronto_merged = df_toronto 

# merge neighborhoods_venues_sorted with df_toronto to add latitude/longitude for each neighborhood

toronto_merged = toronto_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

# As we are only looking at the neighborhoods with more than 10 venues, we drop the neighborhoods of Toronto with 10 or fewer venues (they have no cluster label after the join).
toronto_merged.dropna(inplace=True)
# Making sure the columns with cluster number has type integer
toronto_merged[["Cluster Labels"]] = toronto_merged[["Cluster Labels"]].astype("int64")
toronto_merged.head() # check the last columns!

Unnamed: 0,Postcode,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M5A,Downtown Toronto,"Harbourfront, Regent Park",43.65512,-79.36264,2,_Coffee Shop,_Gym / Fitness Center,_Breakfast Spot,_Yoga Studio,_Theater,_Pub,_Restaurant,_Electronics Store,_Event Space,_Mexican Restaurant
1,M5B,Downtown Toronto,"Ryerson, Garden District",43.657363,-79.37818,1,_Coffee Shop,_Clothing Store,_Cosmetics Shop,_Café,_Middle Eastern Restaurant,_Hotel,_Italian Restaurant,_Restaurant,_Plaza,_Japanese Restaurant
2,M5C,Downtown Toronto,St. James Town,43.65121,-79.375481,1,_Coffee Shop,_Restaurant,_Hotel,_Café,_Bakery,_Cocktail Bar,_Italian Restaurant,_Breakfast Spot,_Gastropub,_Clothing Store
4,M5E,Downtown Toronto,Berczy Park,43.64516,-79.373675,1,_Coffee Shop,_Cocktail Bar,_Seafood Restaurant,_Restaurant,_Beer Bar,_Steakhouse,_Bakery,_Farmers Market,_Lounge,_Hotel
5,M5G,Downtown Toronto,Central Bay Street,43.656091,-79.38493,1,_Coffee Shop,_Clothing Store,_Plaza,_Middle Eastern Restaurant,_Ice Cream Shop,_Tea Room,_Hotel,_Spa,_Bookstore,_Sandwich Place
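
The join keeps every row of the left dataframe, so neighborhoods that were filtered out earlier get NaN in the cluster columns and the label column becomes float; that is why the cell above drops NaN rows and casts back to int. A sketch with invented data:

```python
import pandas as pd

locations = pd.DataFrame({
    'Neighborhood': ['A', 'B', 'C'],
    'Latitude': [43.65, 43.66, 43.67],
    'Longitude': [-79.36, -79.37, -79.38],
})
clusters = pd.DataFrame({
    'Neighborhood': ['A', 'C'],
    'Cluster Labels': [1, 0],
})

# left join on the 'Neighborhood' column; 'B' has no match and gets NaN
merged = locations.join(clusters.set_index('Neighborhood'), on='Neighborhood')

merged = merged.dropna().copy()
merged['Cluster Labels'] = merged['Cluster Labels'].astype('int64')
print(merged)
```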


In [48]:
# Finally, let's visualize the resulting clusters

# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(toronto_merged['Latitude'], toronto_merged['Longitude'], toronto_merged['Neighborhood'], toronto_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters
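
The color list is one hex color per cluster, sampled evenly from matplotlib's rainbow colormap. A minimal sketch of just that step, assuming matplotlib is installed:

```python
import numpy as np
import matplotlib.cm as cm
import matplotlib.colors as colors

kclusters = 5

# sample kclusters evenly spaced RGBA colors and convert each to '#rrggbb'
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(c) for c in colors_array]
print(rainbow)
```

Note that the map cell indexes this list with cluster-1, so label 0 maps to the last color through Python's negative indexing.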

As we can see on the map above, even after removing the neighborhoods with 10 or fewer venues, most of our neighborhoods fall into clusters 0 and 1, while clusters 2, 3 and 4 contain only one neighborhood each.

Let's take a look at the most common venues of each cluster to better understand each location.

#### Cluster 0:

In [49]:
# Cluster 0
toronto_merged.loc[toronto_merged['Cluster Labels'] == 0, toronto_merged.columns[[2] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
13,"Brockton, Exhibition Place, Parkdale Village",0,_Coffee Shop,_Café,_Furniture / Home Store,_Sandwich Place,_Restaurant,_Poutine Place,_Bakery,_Beer Bar,_Supermarket,_Hotel
23,"The Annex, North Midtown, Yorkville",0,_Sandwich Place,_Pizza Place,_Coffee Shop,_Café,_Italian Restaurant,_Furniture / Home Store,_Jewish Restaurant,_French Restaurant,_Liquor Store,_Burger Joint
27,"Runnymede, Swansea",0,_Café,_Coffee Shop,_Bakery,_Pizza Place,_Flower Shop,_Park,_Shoe Store,_Restaurant,_Pub,_Greek Restaurant
29,"Chinatown, Grange Park, Kensington Market",0,_Café,_Vegetarian / Vegan Restaurant,_Bar,_Chinese Restaurant,_Mexican Restaurant,_Vietnamese Restaurant,_Dumpling Restaurant,_Ice Cream Shop,_Coffee Shop,_Bakery
31,"CN Tower, Bathurst Quay, Island airport, Harbo...",0,_Coffee Shop,_Italian Restaurant,_Gym / Fitness Center,_Café,_Restaurant,_Bar,_Speakeasy,_Park,_Sandwich Place,_Pub


#### Cluster 1:

In [50]:
# Cluster 1
toronto_merged.loc[toronto_merged['Cluster Labels'] == 1, toronto_merged.columns[[2] + list(range(5, toronto_merged.shape[1]))]]


Unnamed: 0,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,"Ryerson, Garden District",1,_Coffee Shop,_Clothing Store,_Cosmetics Shop,_Café,_Middle Eastern Restaurant,_Hotel,_Italian Restaurant,_Restaurant,_Plaza,_Japanese Restaurant
2,St. James Town,1,_Coffee Shop,_Restaurant,_Hotel,_Café,_Bakery,_Cocktail Bar,_Italian Restaurant,_Breakfast Spot,_Gastropub,_Clothing Store
4,Berczy Park,1,_Coffee Shop,_Cocktail Bar,_Seafood Restaurant,_Restaurant,_Beer Bar,_Steakhouse,_Bakery,_Farmers Market,_Lounge,_Hotel
5,Central Bay Street,1,_Coffee Shop,_Clothing Store,_Plaza,_Middle Eastern Restaurant,_Ice Cream Shop,_Tea Room,_Hotel,_Spa,_Bookstore,_Sandwich Place
7,"Adelaide, King, Richmond",1,_Coffee Shop,_Café,_Hotel,_Japanese Restaurant,_American Restaurant,_Burger Joint,_Deli / Bodega,_Gastropub,_Restaurant,_Bar
10,"Little Portugal, Trinity",1,_Bar,_Coffee Shop,_Cocktail Bar,_Restaurant,_Asian Restaurant,_New American Restaurant,_Boutique,_Vietnamese Restaurant,_French Restaurant,_Bakery
12,"Design Exchange, Toronto Dominion Centre",1,_Coffee Shop,_Hotel,_Café,_Restaurant,_Deli / Bodega,_Bar,_Italian Restaurant,_Gastropub,_American Restaurant,_Japanese Restaurant
15,"Commerce Court, Victoria Hotel",1,_Coffee Shop,_Hotel,_Restaurant,_Café,_American Restaurant,_Gym,_Japanese Restaurant,_Steakhouse,_Italian Restaurant,_Deli / Bodega
16,Studio District,1,_Diner,_Italian Restaurant,_Bakery,_Pizza Place,_American Restaurant,_Café,_Arts & Crafts Store,_Sushi Restaurant,_Brewery,_Coffee Shop
24,"Parkdale, Roncesvalles",1,_Coffee Shop,_Eastern European Restaurant,_Bakery,_Thai Restaurant,_Bookstore,_Sushi Restaurant,_Breakfast Spot,_Food & Drink Shop,_Pizza Place,_Gift Shop


#### Cluster 2:

In [51]:
# Cluster 2
toronto_merged.loc[toronto_merged['Cluster Labels'] == 2, toronto_merged.columns[[2] + list(range(5, toronto_merged.shape[1]))]]


Unnamed: 0,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,"Harbourfront, Regent Park",2,_Coffee Shop,_Gym / Fitness Center,_Breakfast Spot,_Yoga Studio,_Theater,_Pub,_Restaurant,_Electronics Store,_Event Space,_Mexican Restaurant


#### Cluster 3:

In [52]:
# Cluster 3
toronto_merged.loc[toronto_merged['Cluster Labels'] == 3, toronto_merged.columns[[2] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
14,"The Beaches West, India Bazaar",3,_Park,_Gym,_Board Shop,_Pizza Place,_Pub,_Movie Theater,_Sandwich Place,_Fast Food Restaurant,_Fish & Chips Shop,_Burrito Place


#### Cluster 4:

In [53]:
# Cluster 4
toronto_merged.loc[toronto_merged['Cluster Labels'] == 4, toronto_merged.columns[[2] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
8,"Dovercourt Village, Dufferin",4,_Park,_Pet Store,_Athletics & Sports,_Furniture / Home Store,_Brazilian Restaurant,_Bar,_Bank,_Bakery,_Middle Eastern Restaurant,_Café


Conclusions:

We can conclude that Toronto has a lot of cafes, coffee shops and restaurants, which are the most frequent venues in our biggest cluster, cluster 0, as described below:

Cluster 0: Cafes, coffee shops and restaurants are the most frequent spots

While the other areas have a more diverse scene:

Cluster 1: Besides food spots, the area has some other conveniences like banks, theaters and gift shops

Cluster 2: Airport area, with the most frequent venues being the ones inside the airport

Cluster 3: Together with cafes, grocery stores and parks are among the most common venues

Cluster 4: Together with coffee shops, pubs, sports bars and light rail stations are among the most common venues.
