# Exploring Toronto's neighborhoods - Complete Notebook

This is the complete notebook for the "Segmenting and Clustering Neighborhoods in Toronto" task from the IBM Data Science Certificate course.
It includes the two previous tasks, already posted on GitHub, so that they can be revised.

## First task: getting data about Toronto's neighborhoods from Wikipedia

In [1]:
import pandas as pd
from bs4 import BeautifulSoup
import requests
import numpy as np

In [2]:
# store the page URL and create an empty dataframe
url = 'https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'
toronto_hoods = pd.DataFrame(columns=['PostalCode', 'Borough', 'Neighborhood'])

In [3]:
# open html using BeautifulSoup
source = requests.get(url).text
soup = BeautifulSoup(source, 'html.parser')

In [4]:
# separate the table from the webpage using soup attributes
table = soup.table
table_rows = table.tbody.text

# create a list with each row of table as a list element
toronto_list = table_rows.split('\n\n\n')

# drop the table header - the first element of the list
toronto_list.pop(0)
print(toronto_list)

['M1A\nNot assigned\nNot assigned', 'M2A\nNot assigned\nNot assigned', 'M3A\nNorth York\nParkwoods', 'M4A\nNorth York\nVictoria Village', 'M5A\nDowntown Toronto\nHarbourfront', 'M6A\nNorth York\nLawrence Heights', 'M6A\nNorth York\nLawrence Manor', "M7A\nDowntown Toronto\nQueen's Park", 'M8A\nNot assigned\nNot assigned', "M9A\nQueen's Park\nNot assigned", 'M1B\nScarborough\nRouge', 'M1B\nScarborough\nMalvern', 'M2B\nNot assigned\nNot assigned', 'M3B\nNorth York\nDon Mills North', 'M4B\nEast York\nWoodbine Gardens', 'M4B\nEast York\nParkview Hill', 'M5B\nDowntown Toronto\nRyerson', 'M5B\nDowntown Toronto\nGarden District', 'M6B\nNorth York\nGlencairn', 'M7B\nNot assigned\nNot assigned', 'M8B\nNot assigned\nNot assigned', 'M9B\nEtobicoke\nCloverdale', 'M9B\nEtobicoke\nIslington', 'M9B\nEtobicoke\nMartin Grove', 'M9B\nEtobicoke\nPrincess Gardens', 'M9B\nEtobicoke\nWest Deane Park', 'M1C\nScarborough\nHighland Creek', 'M1C\nScarborough\nRouge Hill', 'M1C\nScarborough\nPort Union', 'M2C\nNo

In [5]:
# split each row into postal code, borough and neighborhood,
# keeping only the rows that have a borough assigned

postalcode = []
borough = []
neighborhood = []

for row in toronto_list:
    fields = row.split('\n')
    if fields[1] != 'Not assigned':
        postalcode.append(fields[0])
        borough.append(fields[1])
        neighborhood.append(fields[2])
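
The loop above assumes that every row splits into exactly three fields. As a minimal sketch, the same filtering can be written as a function that also skips malformed rows instead of raising an `IndexError`. The helper name `parse_rows` and the sample rows are illustrative and not part of the original notebook.

```python
def parse_rows(rows):
    # Split 'code<newline>borough<newline>neighborhood' strings into 3-tuples.
    # Rows without exactly three fields, or with an unassigned borough, are skipped.
    parsed = []
    for row in rows:
        fields = [field.strip() for field in row.split('\n')]
        if len(fields) != 3 or fields[1] == 'Not assigned':
            continue
        parsed.append(tuple(fields))
    return parsed


# A few rows in the same shape as toronto_list, plus one deliberately broken row
sample_rows = [
    'M1A\nNot assigned\nNot assigned',
    'M3A\nNorth York\nParkwoods',
    "M9A\nQueen's Park\nNot assigned",
    'broken row',
]
records = parse_rows(sample_rows)
print(records)
```

Rows with an assigned borough but no neighborhood (such as M9A) are kept on purpose, because the notebook fills them in a later cell.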

In [6]:
# allocate the values of each list into the dataframe
# (DataFrame.append was removed in pandas 2.0, so the frame is built in one step)

toronto_hoods = pd.DataFrame({'PostalCode': postalcode, 'Borough': borough, 'Neighborhood': neighborhood})

toronto_hoods.head(10)

Unnamed: 0,PostalCode,Borough,Neighborhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,Harbourfront
3,M6A,North York,Lawrence Heights
4,M6A,North York,Lawrence Manor
5,M7A,Downtown Toronto,Queen's Park
6,M9A,Queen's Park,Not assigned
7,M1B,Scarborough,Rouge
8,M1B,Scarborough,Malvern
9,M3B,North York,Don Mills North


In [7]:
# check cases of missing neighborhoods
toronto_hoods.replace('Not assigned', np.nan, inplace=True)
toronto_hoods.isnull().sum()

PostalCode      0
Borough         0
Neighborhood    1
dtype: int64

In [8]:
# replace the missing neighborhood (row 6, M9A) with its borough name
toronto_hoods.loc[6, 'Neighborhood'] = "Queen's Park"

In [9]:
print(type(toronto_hoods))

<class 'pandas.core.frame.DataFrame'>


In [10]:
# aggregate neighborhoods according to postalcode
strJoin = lambda x:", ".join(x.astype(str))
toronto_hoods = toronto_hoods.groupby(['PostalCode', 'Borough'], as_index=False, sort=False).agg({'Neighborhood':strJoin})

In [11]:
toronto_hoods.head()

Unnamed: 0,PostalCode,Borough,Neighborhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,Harbourfront
3,M6A,North York,"Lawrence Heights, Lawrence Manor"
4,M7A,Downtown Toronto,Queen's Park


In [12]:
toronto_hoods.shape

(103, 3)
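
The cleaning and aggregation steps above can be tried on a few rows taken from the table. This is only a sketch: the three-row `sample` frame is built from values shown earlier, and filling a missing neighborhood with its borough name through `fillna` is one possible generalization of the single-row replacement, not the notebook's exact code.

```python
import numpy as np
import pandas as pd

# Three rows in the same shape as toronto_hoods after 'Not assigned' became NaN
sample = pd.DataFrame({
    'PostalCode': ['M6A', 'M6A', 'M9A'],
    'Borough': ['North York', 'North York', "Queen's Park"],
    'Neighborhood': ['Lawrence Heights', 'Lawrence Manor', np.nan],
})

# Fill any missing neighborhood with the borough of the same row
sample['Neighborhood'] = sample['Neighborhood'].fillna(sample['Borough'])

# One row per postal code, neighborhoods joined with a comma
grouped = (
    sample.groupby(['PostalCode', 'Borough'], as_index=False, sort=False)
          .agg({'Neighborhood': ', '.join})
)
print(grouped)
```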

## Second task: getting the geographic coordinates of Toronto's neighborhoods

In [13]:
# use the CSV file made available by Coursera to get the coordinates (I was unable to use the code-based approach)

coordenates = 'https://cocl.us/Geospatial_data'
latlong = pd.read_csv(coordenates)
latlong.rename(columns = {'Postal Code':'PostalCode'}, inplace = True)
latlong.head()

Unnamed: 0,PostalCode,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


In [14]:
# merge the two dataframes into one
toronto_final = pd.merge(toronto_hoods, latlong, sort=True)

In [15]:
toronto_final.head()

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude
0,M1B,Scarborough,"Rouge, Malvern",43.806686,-79.194353
1,M1C,Scarborough,"Highland Creek, Rouge Hill, Port Union",43.784535,-79.160497
2,M1E,Scarborough,"Guildwood, Morningside, West Hill",43.763573,-79.188711
3,M1G,Scarborough,Woburn,43.770992,-79.216917
4,M1H,Scarborough,Cedarbrae,43.773136,-79.239476


In [16]:
toronto_final.shape

(103, 5)
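
`pd.merge` defaults to an inner join on the shared column, so a postal code that is missing from the CSV would silently disappear; here the unchanged row count (103) suggests nothing was lost. A hedged sketch of a more explicit check is shown below. The `hoods` and `coords` frames are small stand-ins, and `'M0X'` is a made-up postal code used only to show what an unmatched row looks like.

```python
import pandas as pd

hoods = pd.DataFrame({
    'PostalCode': ['M1B', 'M1C', 'M0X'],  # 'M0X' is invented and has no coordinates
    'Neighborhood': ['Rouge, Malvern', 'Highland Creek, Rouge Hill, Port Union', 'Nowhere'],
})
coords = pd.DataFrame({
    'PostalCode': ['M1B', 'M1C'],
    'Latitude': [43.806686, 43.784535],
    'Longitude': [-79.194353, -79.160497],
})

# A left join keeps every neighborhood; validate raises if a postal code is duplicated;
# indicator adds a '_merge' column saying where each row came from
checked = pd.merge(hoods, coords, on='PostalCode', how='left',
                   validate='one_to_one', indicator=True)
unmatched = checked.loc[checked['_merge'] == 'left_only', 'PostalCode'].tolist()
print(unmatched)
```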

## Third task - Exploring and clustering Toronto's neighborhoods

Now that the primary data are ready, I need some information about each neighborhood. I will focus only on Downtown Toronto's neighborhoods, and I will use Foursquare to get the venue data.

In [17]:
#!pip install geocoder
#!pip install folium
import geocoder
import folium

In [18]:
# create a new dataframe with Downtown's neighborhoods only
downtown = toronto_final[toronto_final['Borough'] == "Downtown Toronto"].reset_index(drop=True)
downtown.head()

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude
0,M4W,Downtown Toronto,Rosedale,43.679563,-79.377529
1,M4X,Downtown Toronto,"Cabbagetown, St. James Town",43.667967,-79.367675
2,M4Y,Downtown Toronto,Church and Wellesley,43.66586,-79.38316
3,M5A,Downtown Toronto,Harbourfront,43.65426,-79.360636
4,M5B,Downtown Toronto,"Ryerson, Garden District",43.657162,-79.378937


In [19]:
downtown.shape

(19, 5)

In [20]:
# get the coordinates of Downtown Toronto
g = geocoder.arcgis('Downtown Toronto, CA')
g.latlng

[43.65011000000004, -79.38289999999995]

In [21]:
# create Downtown Toronto map
downtownmap = folium.Map(location=g.latlng, zoom_start=13)
downtownmap

In [22]:
# add a marker for each neighborhood in Downtown Toronto

for lat, lng, label in zip(downtown['Latitude'], downtown['Longitude'], downtown['Neighborhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='red',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(downtownmap)

downtownmap

Getting Foursquare data about Toronto Neighborhoods:

In [23]:
# remember to delete these values before posting
CLIENT_ID = ''  # your Foursquare ID
CLIENT_SECRET = ''  # your Foursquare Secret
VERSION = ''  # Foursquare API version
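
Instead of pasting the credentials into the notebook and remembering to delete them, one option is to read them from environment variables. The sketch below is an assumption about how that could look: the function name and the three variable names (`FOURSQUARE_CLIENT_ID`, `FOURSQUARE_CLIENT_SECRET`, `FOURSQUARE_VERSION`) are hypothetical, and the demo values are placeholders.

```python
import os


def load_foursquare_credentials(env=os.environ):
    # Variable names are hypothetical; pick whatever naming you prefer
    names = ('FOURSQUARE_CLIENT_ID', 'FOURSQUARE_CLIENT_SECRET', 'FOURSQUARE_VERSION')
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise RuntimeError('missing environment variables: ' + ', '.join(missing))
    return tuple(env[name] for name in names)


# Usage with a stand-in mapping; in the notebook the default os.environ would be used
demo_env = {
    'FOURSQUARE_CLIENT_ID': 'demo-id',
    'FOURSQUARE_CLIENT_SECRET': 'demo-secret',
    'FOURSQUARE_VERSION': 'demo-version',
}
CLIENT_ID, CLIENT_SECRET, VERSION = load_foursquare_credentials(demo_env)
```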

In [24]:
# create a function that retrieves the venues near each neighborhood
def getNearbyVenues(names, latitudes, longitudes, radius=500, LIMIT=100):
    
    venues_list=[]
    
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
    
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID,
            CLIENT_SECRET,
            VERSION,
            lat,
            lng,
            radius,
            LIMIT)

        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name,
            lat,
            lng,
            v['venue']['name'],
            v['venue']['location']['lat'],
            v['venue']['location']['lng'],
            v['venue']['categories'][0]['name']) for v in results])
    
    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood',
            'Neighborhood Latitude',
            'Neighborhood Longitude',
            'Venue',
            'Venue Latitude',
            'Venue Longitude',
            'Venue Category']
    
    return nearby_venues
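
The function above indexes straight into the JSON response, so a venue without categories or a response without groups would raise an error. Below is a minimal sketch of a more defensive extraction step. It assumes only the payload shape implied by the keys read above; `extract_venues` and `fake_payload` are hypothetical names, and the second venue is invented to show the empty-category case.

```python
def extract_venues(name, lat, lng, payload):
    # Walk the same keys as getNearbyVenues, tolerating missing pieces
    groups = payload.get('response', {}).get('groups', [])
    items = groups[0].get('items', []) if groups else []
    rows = []
    for item in items:
        venue = item['venue']
        categories = venue.get('categories') or []
        category = categories[0]['name'] if categories else None
        rows.append((name, lat, lng, venue['name'],
                     venue['location']['lat'], venue['location']['lng'], category))
    return rows


# Hand-written payload in the assumed shape; no request is made
fake_payload = {'response': {'groups': [{'items': [
    {'venue': {'name': 'Rosedale Park',
               'location': {'lat': 43.682328, 'lng': -79.378934},
               'categories': [{'name': 'Playground'}]}},
    {'venue': {'name': 'Unnamed spot',
               'location': {'lat': 43.68, 'lng': -79.37},
               'categories': []}},
]}]}}

rows = extract_venues('Rosedale', 43.679563, -79.377529, fake_payload)
print(rows)
```

Venues with no category get `None`, which can be dropped or relabeled before the one-hot encoding step.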

In [25]:
# apply the function to the data
downtown_venues = getNearbyVenues(names=downtown['Neighborhood'], latitudes=downtown['Latitude'], longitudes=downtown['Longitude'])

Rosedale
Cabbagetown, St. James Town
Church and Wellesley
Harbourfront
Ryerson, Garden District
St. James Town
Berczy Park
Central Bay Street
Adelaide, King, Richmond
Harbourfront East, Toronto Islands, Union Station
Design Exchange, Toronto Dominion Centre
Commerce Court, Victoria Hotel
Harbord, University of Toronto
Chinatown, Grange Park, Kensington Market
CN Tower, Bathurst Quay, Island airport, Harbourfront West, King and Spadina, Railway Lands, South Niagara
Stn A PO Boxes 25 The Esplanade
First Canadian Place, Underground city
Christie
Queen's Park


In [26]:
print(downtown_venues.shape)
downtown_venues.head(10)

(1312, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Rosedale,43.679563,-79.377529,Rosedale Park,43.682328,-79.378934,Playground
1,Rosedale,43.679563,-79.377529,Whitney Park,43.682036,-79.373788,Park
2,Rosedale,43.679563,-79.377529,Alex Murray Parkette,43.6783,-79.382773,Park
3,Rosedale,43.679563,-79.377529,Milkman's Lane,43.676352,-79.373842,Trail
4,"Cabbagetown, St. James Town",43.667967,-79.367675,Cranberries,43.667843,-79.369407,Diner
5,"Cabbagetown, St. James Town",43.667967,-79.367675,F'Amelia,43.667536,-79.368613,Italian Restaurant
6,"Cabbagetown, St. James Town",43.667967,-79.367675,Kingyo Toronto,43.665895,-79.368415,Japanese Restaurant
7,"Cabbagetown, St. James Town",43.667967,-79.367675,Butter Chicken Factory,43.667072,-79.369184,Indian Restaurant
8,"Cabbagetown, St. James Town",43.667967,-79.367675,Murgatroid,43.667381,-79.369311,Restaurant
9,"Cabbagetown, St. James Town",43.667967,-79.367675,Merryberry Cafe + Bistro,43.66663,-79.368792,Café


In [27]:
# check how many venue categories were retrieved
print(len(downtown_venues['Venue Category'].unique()))

205


In [28]:
# one-hot encoding of the venue categories
downtown_onehot = pd.get_dummies(downtown_venues[['Venue Category']],prefix="", prefix_sep="")

In [29]:
# check the column names
downtown_onehot.columns.values

array(['Afghan Restaurant', 'Airport', 'Airport Food Court',
       'Airport Gate', 'Airport Lounge', 'Airport Service',
       'Airport Terminal', 'American Restaurant', 'Antique Shop',
       'Aquarium', 'Art Gallery', 'Arts & Crafts Store',
       'Asian Restaurant', 'Athletics & Sports', 'BBQ Joint',
       'Baby Store', 'Bagel Shop', 'Bakery', 'Bank', 'Bar',
       'Baseball Stadium', 'Basketball Stadium', 'Beach',
       'Bed & Breakfast', 'Beer Bar', 'Beer Store', 'Belgian Restaurant',
       'Bistro', 'Boat or Ferry', 'Bookstore', 'Boutique',
       'Brazilian Restaurant', 'Breakfast Spot', 'Brewery',
       'Bubble Tea Shop', 'Building', 'Burger Joint', 'Burrito Place',
       'Butcher', 'Café', 'Camera Store', 'Candy Store',
       'Caribbean Restaurant', 'Cheese Shop', 'Chinese Restaurant',
       'Chocolate Shop', 'Church', 'Clothing Store', 'Cocktail Bar',
       'Coffee Shop', 'College Arts Building', 'College Auditorium',
       'College Gym', 'College Rec Center', 'Colo

As you can see, the one-hot dataframe contains a category called 'Neighborhood'. I treated it as noise: after checking how many venues fall into this category (only two rows), I dropped the entire column. Dropping it also avoids a clash with the 'Neighborhood' name column that is added back to this dataframe in a later cell.

In [30]:
# check how many venues are in this category
downtown_onehot.loc[downtown_onehot['Neighborhood'] == 1]

Unnamed: 0,Afghan Restaurant,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Antique Shop,Aquarium,...,Toy / Game Store,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Wine Bar,Wine Shop,Women's Store,Yoga Studio
536,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
622,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [31]:
# drop this column from the one-hot dataframe
downtown_onehot.drop(['Neighborhood'], axis=1, inplace=True)

In [32]:
downtown_onehot.shape

(1312, 204)

In [33]:
# add neighborhood column back to dataframe. 
downtown_onehot['Neighborhood'] = downtown_venues['Neighborhood']

# move neighborhoods column to the first column
fixed_columns = [downtown_onehot.columns[-1]] + list(downtown_onehot.columns[:-1])

downtown_onehot = downtown_onehot[fixed_columns]
downtown_onehot.head()

Unnamed: 0,Neighborhood,Afghan Restaurant,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Antique Shop,...,Toy / Game Store,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Wine Bar,Wine Shop,Women's Store,Yoga Studio
0,Rosedale,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,Rosedale,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,Rosedale,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,Rosedale,0,0,0,0,0,0,0,0,0,...,0,1,0,0,0,0,0,0,0,0
4,"Cabbagetown, St. James Town",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [34]:
downtown_onehot.shape

(1312, 205)

In [35]:
# group rows by neighborhood, taking the mean of the frequency of occurrence of each category
downtown_final = downtown_onehot.groupby('Neighborhood').mean().reset_index()
downtown_final.head()

Unnamed: 0,Neighborhood,Afghan Restaurant,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Antique Shop,...,Toy / Game Store,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Wine Bar,Wine Shop,Women's Store,Yoga Studio
0,"Adelaide, King, Richmond",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,...,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.01,0.0
1,Berczy Park,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.017241,0.0,0.0,0.0,0.0,0.0,0.0
2,"CN Tower, Bathurst Quay, Island airport, Harbo...",0.0,0.0625,0.0625,0.0625,0.125,0.125,0.125,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,"Cabbagetown, St. James Town",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.022727,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Central Bay Street,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.011905,0.0,...,0.0,0.0,0.0,0.011905,0.0,0.0,0.011905,0.0,0.0,0.011905


In [36]:
downtown_final.shape

(19, 205)
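
To see why the mean of the one-hot columns gives a frequency, the same steps can be run on a toy table. This is a sketch with invented data: the neighborhoods `A` and `B` and their six venues are made up, and `dtype=int` is passed only so the output shows 0/1 values regardless of the pandas version.

```python
import pandas as pd

venues = pd.DataFrame({
    'Neighborhood': ['A', 'A', 'A', 'A', 'B', 'B'],
    'Venue Category': ['Park', 'Park', 'Café', 'Trail', 'Café', 'Café'],
})

# One column per category, one row per venue
onehot = pd.get_dummies(venues[['Venue Category']], prefix='', prefix_sep='', dtype=int)
onehot.insert(0, 'Neighborhood', venues['Neighborhood'])

# The mean of a 0/1 column is the share of venues in that category
freq = onehot.groupby('Neighborhood').mean()
print(freq)
```

Each row of `freq` sums to 1, so neighborhoods with very different venue counts become comparable before clustering.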

In [37]:
# write a function that sorts the venues in descending order, to get the most common ones below

def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    return row_categories_sorted.index.values[0:num_top_venues]

In [38]:
#get the top 5 venues for each neighborhood

num_top_venues = 5
indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']

for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        # positions beyond the third fall back to the 'th' suffix
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = downtown_final['Neighborhood']

for ind in np.arange(downtown_final.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(downtown_final.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,"Adelaide, King, Richmond",Coffee Shop,Bar,Café,Steakhouse,Cosmetics Shop
1,Berczy Park,Coffee Shop,Cocktail Bar,Café,Farmers Market,Steakhouse
2,"CN Tower, Bathurst Quay, Island airport, Harbo...",Airport Lounge,Airport Service,Airport Terminal,Boat or Ferry,Sculpture Garden
3,"Cabbagetown, St. James Town",Restaurant,Coffee Shop,Bakery,Italian Restaurant,Café
4,Central Bay Street,Coffee Shop,Italian Restaurant,Café,Japanese Restaurant,Juice Bar


### Clustering Downtown Toronto neighborhoods

In [39]:
from sklearn.cluster import KMeans

In [40]:
#select dataset
downtown_clustering = downtown_final.drop(columns='Neighborhood')

#fit with 5 clusters
kmeans = KMeans(n_clusters=5, random_state=0).fit(downtown_clustering)

#check the labels
kmeans.labels_[0:10]

array([2, 2, 3, 2, 2, 2, 4, 2, 2, 2], dtype=int32)
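
For intuition about what the labels mean, here is a minimal pure-Python sketch of the assign-and-update loop behind k-means. It is not scikit-learn's implementation (which, among other things, uses a smarter initialization by default); the function names, the two-dimensional toy profiles and the hand-picked starting centroids are all assumptions made for illustration.

```python
import math


def assign(points, centroids):
    # Label each point with the index of its nearest centroid
    return [min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
            for p in points]


def update(points, labels, centroids):
    # Move each centroid to the mean of its members; keep it if it has none
    new_centroids = []
    for c, old in enumerate(centroids):
        members = [p for p, lab in zip(points, labels) if lab == c]
        if members:
            new_centroids.append(tuple(sum(dim) / len(members) for dim in zip(*members)))
        else:
            new_centroids.append(old)
    return new_centroids


def simple_kmeans(points, centroids, max_iter=100):
    labels = assign(points, centroids)
    for _ in range(max_iter):
        centroids = update(points, labels, centroids)
        new_labels = assign(points, centroids)
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
    return labels, centroids


# Toy "frequency profiles": (share of coffee shops, share of parks)
profiles = [(0.9, 0.1), (0.8, 0.2), (0.1, 0.9), (0.2, 0.8)]
labels, centroids = simple_kmeans(profiles, centroids=[profiles[0], profiles[2]])
print(labels, centroids)
```

Neighborhoods with similar venue profiles end up with the same label, which is what the `kmeans.labels_` array above encodes for the 204 category columns.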

Now, I need a new dataframe that includes the clusters as well as the top 5 venues for each neighborhood...

In [41]:
# add clustering labels into neighborhood's venues dataframe
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

# merge first Downtown Toronto dataframe with the venues dataframe to add latitude/longitude for each neighborhood
downtown_merged = downtown.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')
downtown_merged.head()

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,M4W,Downtown Toronto,Rosedale,43.679563,-79.377529,1,Park,Playground,Trail,Deli / Bodega,Eastern European Restaurant
1,M4X,Downtown Toronto,"Cabbagetown, St. James Town",43.667967,-79.367675,2,Restaurant,Coffee Shop,Bakery,Italian Restaurant,Café
2,M4Y,Downtown Toronto,Church and Wellesley,43.66586,-79.38316,2,Coffee Shop,Japanese Restaurant,Sushi Restaurant,Restaurant,Gay Bar
3,M5A,Downtown Toronto,Harbourfront,43.65426,-79.360636,0,Coffee Shop,Park,Bakery,Pub,Breakfast Spot
4,M5B,Downtown Toronto,"Ryerson, Garden District",43.657162,-79.378937,2,Coffee Shop,Clothing Store,Café,Cosmetics Shop,Middle Eastern Restaurant


... to create a map with the clusters marked on it

In [42]:
import matplotlib.cm as cm
import matplotlib.colors as colors

# create map
map_clusters = folium.Map(location=g.latlng, zoom_start=13)

# set color scheme for the clusters
x = np.arange(5)
ys = [i + x + (i*x)**2 for i in range(5)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
for lat, lon, poi, cluster in zip(downtown_merged['Latitude'], downtown_merged['Longitude'], downtown_merged['Neighborhood'], downtown_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)

map_clusters
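
In the cell above the palette is indexed with `cluster-1`, so label 0 wraps around to the last color; the colors are still distinct, but the mapping is easy to misread. As a hedged alternative, a palette can be built with one color per label and indexed directly. The helper name `cluster_palette` and the saturation and brightness values are arbitrary choices for this sketch.

```python
import colorsys


def cluster_palette(n_clusters):
    # Evenly spaced hues, converted to the '#rrggbb' strings folium accepts
    palette = []
    for i in range(n_clusters):
        r, g, b = colorsys.hsv_to_rgb(i / n_clusters, 0.8, 0.9)
        palette.append('#{:02x}{:02x}{:02x}'.format(
            round(r * 255), round(g * 255), round(b * 255)))
    return palette


palette = cluster_palette(5)
# The color for a given cluster label is then simply palette[label]
for label, color in enumerate(palette):
    print(label, color)
```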

Now I can analyse each cluster to see the neighborhoods and what kind of places (venues) they have. 
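
The same selection expression is repeated for every cluster in the cells that follow. A small helper could wrap it. This is a sketch: `show_cluster` is a hypothetical name, it selects the venue columns by name rather than by position, and the two-row `demo` frame only mimics the layout of `downtown_merged`.

```python
import pandas as pd


def show_cluster(merged, label):
    # Keep the neighborhood, its label and every 'Most Common Venue' column
    venue_cols = [c for c in merged.columns if c.endswith('Most Common Venue')]
    keep = ['Neighborhood', 'Cluster Labels'] + venue_cols
    return merged.loc[merged['Cluster Labels'] == label, keep]


demo = pd.DataFrame({
    'PostalCode': ['M4W', 'M5A'],
    'Neighborhood': ['Rosedale', 'Harbourfront'],
    'Cluster Labels': [1, 0],
    '1st Most Common Venue': ['Park', 'Coffee Shop'],
    '2nd Most Common Venue': ['Playground', 'Park'],
})
cluster_zero = show_cluster(demo, 0)
print(cluster_zero)
```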

### Cluster 1 - Drink a lot of coffee

This cluster has a lot of coffee shops, and parks to relax in.

In [43]:
downtown_merged.loc[downtown_merged['Cluster Labels'] == 0, downtown_merged.columns[[2] + list(range(5, downtown_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
3,Harbourfront,0,Coffee Shop,Park,Bakery,Pub,Breakfast Spot
18,Queen's Park,0,Coffee Shop,Park,Gym,Nightclub,Sandwich Place


### Cluster 2 - Relax in the park

This one also has a lot of parks, along with playgrounds and some European restaurants.

In [44]:
downtown_merged.loc[downtown_merged['Cluster Labels'] == 1, downtown_merged.columns[[2] + list(range(5, downtown_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,Rosedale,1,Park,Playground,Trail,Deli / Bodega,Eastern European Restaurant


### Cluster 3 - Never starve, and also drink a lot of coffee

This cluster, the largest one, is the king of restaurants and coffee shops. You will never stay hungry for long if you're in one of these neighborhoods.

In [45]:
downtown_merged.loc[downtown_merged['Cluster Labels'] == 2, downtown_merged.columns[[2] + list(range(5, downtown_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
1,"Cabbagetown, St. James Town",2,Restaurant,Coffee Shop,Bakery,Italian Restaurant,Café
2,Church and Wellesley,2,Coffee Shop,Japanese Restaurant,Sushi Restaurant,Restaurant,Gay Bar
4,"Ryerson, Garden District",2,Coffee Shop,Clothing Store,Café,Cosmetics Shop,Middle Eastern Restaurant
5,St. James Town,2,Coffee Shop,Café,Restaurant,Cosmetics Shop,Italian Restaurant
6,Berczy Park,2,Coffee Shop,Cocktail Bar,Café,Farmers Market,Steakhouse
7,Central Bay Street,2,Coffee Shop,Italian Restaurant,Café,Japanese Restaurant,Juice Bar
8,"Adelaide, King, Richmond",2,Coffee Shop,Bar,Café,Steakhouse,Cosmetics Shop
9,"Harbourfront East, Toronto Islands, Union Station",2,Coffee Shop,Aquarium,Café,Hotel,Scenic Lookout
10,"Design Exchange, Toronto Dominion Centre",2,Coffee Shop,Café,Hotel,Bar,Restaurant
11,"Commerce Court, Victoria Hotel",2,Coffee Shop,Café,Hotel,Restaurant,Steakhouse


### Cluster 4 - Travel to wherever you want

This is the right place to catch a flight to Hawaii or the Caribbean.

In [46]:
downtown_merged.loc[downtown_merged['Cluster Labels'] == 3, downtown_merged.columns[[2] + list(range(5, downtown_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
14,"CN Tower, Bathurst Quay, Island airport, Harbo...",3,Airport Lounge,Airport Service,Airport Terminal,Boat or Ferry,Sculpture Garden


### Cluster 5 - Need to buy some fresh food?

In this last cluster, some common venues repeat from the other clusters, but the second most common venue is a grocery store, which I believe is a good thing if you like cooking.

In [47]:
downtown_merged.loc[downtown_merged['Cluster Labels'] == 4, downtown_merged.columns[[2] + list(range(5, downtown_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
17,Christie,4,Café,Grocery Store,Park,Gas Station,Restaurant
