# IBM Applied Data Science Capstone Notebook
This notebook is used mainly for the capstone project.

In [1]:
import pandas as pd
import numpy as np

In [2]:
print('Hello Capstone Project Course!')

Hello Capstone Project Course!


## Scrape the Wikipedia page that contains geographical information for Toronto
Import the libraries for scraping the Wikipedia page.

In [3]:
import requests
from bs4 import BeautifulSoup

Scrape the Wikipedia page for the Toronto neighborhood data.

In [4]:
wiki = "https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M"
website_url = requests.get(wiki).text
soup = BeautifulSoup(website_url,'lxml')

Store the table from the Wikipedia page in three lists: Postcode, Borough, Neighbourhood.

In [5]:
Postcode = []
Borough = []
Neighbourhood = []
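# skip the header row, then read the three cells of each table row
for items in soup.find('table', class_='sortable').find_all('tr')[1:]:
    data = items.find_all('td')
    try:
        Postcode.append(data[0].text)
        Borough.append(data[1].text)
        Neighbourhood.append(data[2].text.strip())
    except IndexError:
        # rows with fewer than three cells are not data rows
        pass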

Merge the lists into a pandas dataframe, remove every row whose Borough value is "Not assigned", and show the first 5 rows of the dataframe.

In [6]:
df = pd.DataFrame({'Postal Code' : Postcode,
                    'Borough' : Borough,
                    'Neighbourhood' : Neighbourhood})
df = df[df['Borough'] !='Not assigned'].reset_index(drop=True)
df.head()

Unnamed: 0,Postal Code,Borough,Neighbourhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,Harbourfront
3,M5A,Downtown Toronto,Regent Park
4,M6A,North York,Lawrence Heights


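Group the records by postal code, keep the first Borough for each code, and join the Neighbourhood values that share a postal code. Where a Neighbourhood is "Not assigned", use the Borough name instead. Then show the shape of the new dataframe.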

In [7]:
aggregation_functions = { 'Borough': 'first', 'Neighbourhood': ', '.join}
df_new = df.groupby(df['Postal Code']).aggregate(aggregation_functions).reset_index()
df_new.loc[df_new['Neighbourhood'] =='Not assigned','Neighbourhood'] = df_new['Borough']
df_new.shape

(103, 3)
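
The grouping step can be tried on a few rows before running it on the full table. The following is a minimal sketch on toy data in the same three-column shape; the rows are illustrative, not the scraped table, and the variable names `rows`, `grouped` and `unassigned` are assumptions of this example.

```python
import pandas as pd

# Toy rows in the same shape as the scraped table (illustrative values only).
rows = pd.DataFrame({
    'Postal Code': ['M5A', 'M5A', 'M7A', 'M9A'],
    'Borough': ['Downtown Toronto', 'Downtown Toronto', "Queen's Park", 'Etobicoke'],
    'Neighbourhood': ['Harbourfront', 'Regent Park', 'Not assigned', 'Islington Avenue'],
})

# One row per postal code: keep the first borough, join the neighbourhood names.
grouped = (
    rows.groupby('Postal Code', as_index=False)
        .agg({'Borough': 'first', 'Neighbourhood': ', '.join})
)

# Fall back to the borough name where no neighbourhood was assigned.
unassigned = grouped['Neighbourhood'] == 'Not assigned'
grouped.loc[unassigned, 'Neighbourhood'] = grouped.loc[unassigned, 'Borough']

print(grouped)
```

Selecting the same boolean mask on both sides of the assignment makes the row alignment explicit instead of relying on index alignment.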

## Create a DataFrame with the coordinates of each neighborhood in Toronto
Download the CSV file that contains the geographical coordinates of each postal code.

In [8]:
df_geo = pd.read_csv('http://cocl.us/Geospatial_data')
df_geo.head()

Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


Merge the latitude and longitude information with original dataframe.

In [9]:
df_neighbour = pd.merge(df_new, df_geo, how='inner', on = 'Postal Code')
df_neighbour.head()

Unnamed: 0,Postal Code,Borough,Neighbourhood,Latitude,Longitude
0,M1B,Scarborough,"Rouge, Malvern",43.806686,-79.194353
1,M1C,Scarborough,"Highland Creek, Rouge Hill, Port Union",43.784535,-79.160497
2,M1E,Scarborough,"Guildwood, Morningside, West Hill",43.763573,-79.188711
3,M1G,Scarborough,Woburn,43.770992,-79.216917
4,M1H,Scarborough,Cedarbrae,43.773136,-79.239476
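
An inner merge silently drops any postal code that has no coordinate row. As a hedged sketch, a left merge with `indicator=True` makes such gaps visible before they disappear. The frames below are toy data, and `M9Z` is a made-up code used only to show an unmatched row.

```python
import pandas as pd

neighbourhoods = pd.DataFrame({
    # 'M9Z' is a made-up code with no coordinates, for illustration only
    'Postal Code': ['M1B', 'M1C', 'M9Z'],
    'Neighbourhood': ['Rouge, Malvern', 'Highland Creek, Rouge Hill, Port Union', 'Nowhere'],
})
coords = pd.DataFrame({
    'Postal Code': ['M1B', 'M1C'],
    'Latitude': [43.806686, 43.784535],
    'Longitude': [-79.194353, -79.160497],
})

# Keep every neighbourhood row and record whether a coordinate row matched.
# validate='one_to_one' raises if a postal code is duplicated on either side.
checked = pd.merge(neighbourhoods, coords, how='left', on='Postal Code',
                   validate='one_to_one', indicator=True)

missing = checked.loc[checked['_merge'] == 'left_only', 'Postal Code'].tolist()
print(missing)
```

If `missing` is empty, the inner merge used in the notebook loses no rows.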


## Explore and cluster the neighborhoods in Toronto
Install the folium and geopy libraries (uncomment the lines below if they are not already installed).

In [10]:
#!pip install folium
#!pip install geopy

Import folium and geopy libraries.

In [11]:
import folium # map rendering library
from geopy.geocoders import Nominatim
print('Libraries imported.')

Libraries imported.


Find boroughs that contain the word Toronto.

In [12]:
toronto_data = df_neighbour[df_neighbour['Borough'].str.contains('Toronto')].reset_index(drop=True)
toronto_data.head()

Unnamed: 0,Postal Code,Borough,Neighbourhood,Latitude,Longitude
0,M4E,East Toronto,The Beaches,43.676357,-79.293031
1,M4K,East Toronto,"The Danforth West, Riverdale",43.679557,-79.352188
2,M4L,East Toronto,"The Beaches West, India Bazaar",43.668999,-79.315572
3,M4M,East Toronto,Studio District,43.659526,-79.340923
4,M4N,Central Toronto,Lawrence Park,43.72802,-79.38879


Use the geopy library to get the latitude and longitude of Toronto.

In [13]:
address = 'Toronto, Ontario'

geolocator = Nominatim(user_agent="ca_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Toronto are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Toronto are 43.653963, -79.387207.


Create a map of Toronto with neighborhoods superimposed on top.

In [14]:
# create map of Toronto using latitude and longitude values
map_toronto = folium.Map(location=[latitude, longitude], zoom_start=12)

# add markers to map
for lat, lng, borough, neighborhood, postcode in zip(toronto_data['Latitude'], toronto_data['Longitude'], toronto_data['Borough'], toronto_data['Neighbourhood'], toronto_data['Postal Code']):
    label = '{}, {}, {}'.format(postcode, neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto)  
    
map_toronto

Define Foursquare Credentials and Version.

In [15]:
CLIENT_ID = 'BAHTLSRWTZBXVZDJ1BCYG0QGLCTMFET1GIYW40FEZDXKM15R' # your Foursquare ID
CLIENT_SECRET = 'Q0WDT2JIJ4ACQK4SGRCKILHEQG3HG35AU5OR12Q5EEFEEDTH' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: BAHTLSRWTZBXVZDJ1BCYG0QGLCTMFET1GIYW40FEZDXKM15R
CLIENT_SECRET:Q0WDT2JIJ4ACQK4SGRCKILHEQG3HG35AU5OR12Q5EEFEEDTH
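
Credentials written into a notebook and printed in its output are visible to anyone who can read the file. One common alternative, sketched here under assumptions, is to read them from environment variables and mask them when logging. The variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and both helper functions are hypothetical names chosen for this example.

```python
import os

def load_foursquare_credentials(env=os.environ):
    """Return (client_id, client_secret) read from a mapping of environment variables.

    FOURSQUARE_CLIENT_ID and FOURSQUARE_CLIENT_SECRET are hypothetical
    variable names chosen for this sketch.
    """
    client_id = env.get('FOURSQUARE_CLIENT_ID')
    client_secret = env.get('FOURSQUARE_CLIENT_SECRET')
    if not client_id or not client_secret:
        raise RuntimeError('Set FOURSQUARE_CLIENT_ID and FOURSQUARE_CLIENT_SECRET first.')
    return client_id, client_secret

def mask(secret, visible=4):
    """Show only the first few characters of a secret when logging."""
    return secret[:visible] + '*' * max(len(secret) - visible, 0)

# Example with a stand-in mapping instead of the real environment.
demo_env = {'FOURSQUARE_CLIENT_ID': 'demo-id-123',
            'FOURSQUARE_CLIENT_SECRET': 'demo-secret-456'}
client_id, client_secret = load_foursquare_credentials(demo_env)
print('CLIENT_ID:', mask(client_id))
print('CLIENT_SECRET:', mask(client_secret))
```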


Create a function to process all the neighborhoods in Toronto.

In [67]:
def getNearbyVenues(names, latitudes, longitudes, radius=500, LIMIT=10):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return nearby_venues
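
The function above indexes straight into the JSON response, so a location with no result groups or a venue with no categories would raise an error. The following is a rough sketch of a more defensive flattening step. It assumes only the keys that the function above already reads (`response`, `groups`, `items`, `venue`, `location`, `categories`). The helper name `extract_venue_rows` is hypothetical and the sample payload is hand-written, not an actual API response.

```python
def extract_venue_rows(payload, name, lat, lng):
    """Flatten one explore-style response into tuples of
    (neighbourhood, lat, lng, venue, venue lat, venue lng, category)."""
    groups = payload.get('response', {}).get('groups', [])
    items = groups[0].get('items', []) if groups else []
    rows = []
    for item in items:
        venue = item['venue']
        categories = venue.get('categories') or []
        # Use None when a venue carries no category instead of raising IndexError.
        category = categories[0]['name'] if categories else None
        rows.append((name, lat, lng, venue['name'],
                     venue['location']['lat'], venue['location']['lng'], category))
    return rows

# Hand-written payload mirroring the keys accessed above (not real API output).
sample = {'response': {'groups': [{'items': [
    {'venue': {'name': 'Glen Manor Ravine',
               'location': {'lat': 43.676821, 'lng': -79.293942},
               'categories': [{'name': 'Trail'}]}},
    {'venue': {'name': 'Unnamed Spot',  # made-up venue without a category
               'location': {'lat': 43.68, 'lng': -79.29},
               'categories': []}},
]}]}}

rows = extract_venue_rows(sample, 'The Beaches', 43.676357, -79.293031)
print(rows)
```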

Run the above function on each neighborhood and create a new dataframe called *toronto_venues*.

In [68]:
toronto_venues = getNearbyVenues(names=toronto_data['Neighbourhood'],
                                   latitudes=toronto_data['Latitude'],
                                   longitudes=toronto_data['Longitude']
                                  )

The Beaches
The Danforth West, Riverdale
The Beaches West, India Bazaar
Studio District
Lawrence Park
Davisville North
North Toronto West
Davisville
Moore Park, Summerhill East
Deer Park, Forest Hill SE, Rathnelly, South Hill, Summerhill West
Rosedale
Cabbagetown, St. James Town
Church and Wellesley
Harbourfront, Regent Park
Ryerson, Garden District
St. James Town
Berczy Park
Central Bay Street
Adelaide, King, Richmond
Harbourfront East, Toronto Islands, Union Station
Design Exchange, Toronto Dominion Centre
Commerce Court, Victoria Hotel
Roselawn
Forest Hill North, Forest Hill West
The Annex, North Midtown, Yorkville
Harbord, University of Toronto
Chinatown, Grange Park, Kensington Market
CN Tower, Bathurst Quay, Island airport, Harbourfront West, King and Spadina, Railway Lands, South Niagara
Stn A PO Boxes 25 The Esplanade
First Canadian Place, Underground city
Christie
Dovercourt Village, Dufferin
Little Portugal, Trinity
Brockton, Exhibition Place, Parkdale Village
High Park, The 

Check the size of the resulting dataframe.

In [69]:
print(toronto_venues.shape)
toronto_venues.head()

(342, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,The Beaches,43.676357,-79.293031,Glen Manor Ravine,43.676821,-79.293942,Trail
1,The Beaches,43.676357,-79.293031,The Big Carrot Natural Food Market,43.678879,-79.297734,Health Food Store
2,The Beaches,43.676357,-79.293031,Grover Pub and Grub,43.679181,-79.297215,Pub
3,The Beaches,43.676357,-79.293031,Upper Beaches,43.680563,-79.292869,Neighborhood
4,The Beaches,43.676357,-79.293031,Dip 'n Sip,43.678897,-79.297745,Coffee Shop


Check how many venues were returned for each neighborhood.

In [70]:
toronto_venues.groupby('Neighborhood').count()

Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
"Adelaide, King, Richmond",10,10,10,10,10,10
Berczy Park,10,10,10,10,10,10
"Brockton, Exhibition Place, Parkdale Village",10,10,10,10,10,10
Business Reply Mail Processing Centre 969 Eastern,10,10,10,10,10,10
"CN Tower, Bathurst Quay, Island airport, Harbourfront West, King and Spadina, Railway Lands, South Niagara",10,10,10,10,10,10
"Cabbagetown, St. James Town",10,10,10,10,10,10
Central Bay Street,10,10,10,10,10,10
"Chinatown, Grange Park, Kensington Market",10,10,10,10,10,10
Christie,10,10,10,10,10,10
Church and Wellesley,10,10,10,10,10,10


Find out how many unique categories can be curated from all the returned venues.

In [71]:
print('There are {} unique categories.'.format(len(toronto_venues['Venue Category'].unique())))

There are 116 unique categories.


### Analyze Each Neighborhood
Use one-hot encoding to transform the categorical venue data into numerical indicator columns.

In [72]:
# one hot encoding
toronto_onehot = pd.get_dummies(toronto_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
toronto_onehot['Neighborhood'] = toronto_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [toronto_onehot.columns[-1]] + list(toronto_onehot.columns[:-1])
toronto_onehot = toronto_onehot[fixed_columns]

toronto_onehot.head()

Unnamed: 0,Yoga Studio,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Terminal,American Restaurant,Arts & Crafts Store,Asian Restaurant,Auto Workshop,...,Sushi Restaurant,Swim School,Tea Room,Thai Restaurant,Theater,Theme Restaurant,Trail,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Wine Bar
0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,1,0,0,0
1,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


Get the shape of the new one hot dataframe.

In [73]:
toronto_onehot.shape

(342, 116)
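
The sample rows shown earlier include a venue (Upper Beaches) whose category is literally `Neighborhood`, so `pd.get_dummies` produces a dummy column with that name, and the assignment `toronto_onehot['Neighborhood'] = ...` overwrites it rather than adding a new column. That would explain why the shape stays at 116 columns and why `Yoga Studio`, not `Neighborhood`, ends up as the first column. Below is a minimal sketch of one way to avoid the collision, using toy data; `Neighborhood (category)` is a hypothetical replacement name chosen for this example.

```python
import pandas as pd

venues = pd.DataFrame({
    'Neighborhood': ['The Beaches', 'The Beaches', 'Rosedale'],
    'Venue Category': ['Trail', 'Neighborhood', 'Park'],
})

onehot = pd.get_dummies(venues[['Venue Category']], prefix='', prefix_sep='')
# The dummy columns include one that is literally named 'Neighborhood'.
print(sorted(onehot.columns))

# Rename the colliding category, then attach the label column in first position.
onehot = onehot.rename(columns={'Neighborhood': 'Neighborhood (category)'})
onehot.insert(0, 'Neighborhood', venues['Neighborhood'])
print(onehot.columns.tolist())
```

Applying this to the notebook would add a column and could shift later results, so the outputs shown here would need to be regenerated.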

Group the rows by neighborhood and take the mean of each one-hot column, which gives the frequency of occurrence of each category.

In [74]:
toronto_grouped = toronto_onehot.groupby('Neighborhood').mean().reset_index()
toronto_grouped.head()

Unnamed: 0,Neighborhood,Yoga Studio,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Terminal,American Restaurant,Arts & Crafts Store,Asian Restaurant,...,Sushi Restaurant,Swim School,Tea Room,Thai Restaurant,Theater,Theme Restaurant,Trail,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Wine Bar
0,"Adelaide, King, Richmond",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0
1,Berczy Park,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.1,0.0,0.0
2,"Brockton, Exhibition Place, Parkdale Village",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Business Reply Mail Processing Centre 969 Eastern,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,"CN Tower, Bathurst Quay, Island airport, Harbo...",0.0,0.1,0.1,0.1,0.2,0.2,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


Confirm the new size.

In [75]:
toronto_grouped.shape

(38, 116)

Print each neighborhood along with its top 3 most common venue categories.

In [86]:
num_top_venues = 3

for hood in toronto_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = toronto_grouped[toronto_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Adelaide, King, Richmond----
              venue  freq
0  Greek Restaurant   0.1
1  Asian Restaurant   0.1
2             Plaza   0.1


----Berczy Park----
             venue  freq
0     Concert Hall   0.1
1  Thai Restaurant   0.1
2     Liquor Store   0.1


----Brockton, Exhibition Place, Parkdale Village----
                venue  freq
0         Coffee Shop   0.2
1                 Gym   0.1
2  Italian Restaurant   0.1


----Business Reply Mail Processing Centre 969 Eastern----
                  venue  freq
0               Brewery   0.1
1            Restaurant   0.1
2  Fast Food Restaurant   0.1


----CN Tower, Bathurst Quay, Island airport, Harbourfront West, King and Spadina, Railway Lands, South Niagara----
              venue  freq
0    Airport Lounge   0.2
1  Airport Terminal   0.2
2             Plane   0.1


----Cabbagetown, St. James Town----
                   venue  freq
0                   Café   0.2
1  General Entertainment   0.1
2      Indian Restaurant   0.1


----Centr

Create the new dataframe and display the top 3 venues for each neighborhood.

In [87]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]
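
With only 10 venues per neighborhood, many categories tie at the same frequency, and the default sort does not guarantee which tied category comes first. The variant below is a sketch, not the notebook's function: it casts the frequencies to float and uses a stable sort (`kind='mergesort'`) so that tie-breaking is deterministic. The name `top_categories` and the toy row are assumptions of this example.

```python
import pandas as pd

def top_categories(row, num_top_venues):
    """Return the most frequent category names, skipping the first
    (neighborhood) entry and sorting with a stable algorithm."""
    frequencies = row.iloc[1:].astype(float)
    ranked = frequencies.sort_values(ascending=False, kind='mergesort')
    return ranked.index.values[:num_top_venues]

# Toy row: a neighborhood name followed by category frequencies.
row = pd.Series({'Neighborhood': 'Toy Hood',
                 'Café': 0.2, 'Park': 0.1, 'Pub': 0.1, 'Trail': 0.5})
result = top_categories(row, 3)
print(result)
```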

In [88]:
num_top_venues = 3

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = toronto_grouped['Neighborhood']

for ind in np.arange(toronto_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(toronto_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue
0,"Adelaide, King, Richmond",Asian Restaurant,Concert Hall,Seafood Restaurant
1,Berczy Park,French Restaurant,Steakhouse,Cocktail Bar
2,"Brockton, Exhibition Place, Parkdale Village",Coffee Shop,Pet Store,Bar
3,Business Reply Mail Processing Centre 969 Eastern,Garden Center,Auto Workshop,Comic Shop
4,"CN Tower, Bathurst Quay, Island airport, Harbo...",Airport Lounge,Airport Terminal,Coffee Shop
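
The suffix list above covers the first three ranks and falls back to "th" for the rest, which is fine for 3 or 10 columns but would label rank 21 as "21th". If more columns are ever needed, a small helper along these lines could build the labels; `ordinal` is a hypothetical name, not part of the notebook.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st', 11 -> '11th'."""
    if 10 <= n % 100 <= 20:
        suffix = 'th'
    else:
        suffix = {1: 'st', 2: 'nd', 3: 'rd'}.get(n % 10, 'th')
    return '{}{}'.format(n, suffix)

num_top_venues = 3
columns = ['Neighborhood'] + [
    '{} Most Common Venue'.format(ordinal(i)) for i in range(1, num_top_venues + 1)
]
print(columns)
```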


### Cluster the neighborhoods
Import the KMeans class from scikit-learn.

In [28]:
from sklearn.cluster import KMeans

#### Run *k*-means to cluster the neighborhoods into 5 clusters.

In [133]:
# set number of clusters
kclusters = 5

toronto_grouped_clustering = toronto_grouped.drop('Neighborhood', axis=1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(toronto_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([3, 3, 0, 0, 0, 2, 0, 2, 2, 3])

In [134]:
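# drop labels left over from a previous run so that the insert below does not fail
neighborhoods_venues_sorted = neighborhoods_venues_sorted.drop(columns='Cluster Labels', errors='ignore')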

Create a new dataframe that includes the cluster label as well as the top 3 venues for each neighborhood.

In [135]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

toronto_merged = toronto_data

# join the cluster labels and top venues onto toronto_data, which already holds latitude/longitude
toronto_merged = toronto_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighbourhood')

toronto_merged # check the last columns!

Unnamed: 0,Postal Code,Borough,Neighbourhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue
0,M4E,East Toronto,The Beaches,43.676357,-79.293031,0,Coffee Shop,Trail,Pub
1,M4K,East Toronto,"The Danforth West, Riverdale",43.679557,-79.352188,3,Greek Restaurant,Ice Cream Shop,Yoga Studio
2,M4L,East Toronto,"The Beaches West, India Bazaar",43.668999,-79.315572,0,Ice Cream Shop,Liquor Store,Burger Joint
3,M4M,East Toronto,Studio District,43.659526,-79.340923,0,Bookstore,Fish Market,Comfort Food Restaurant
4,M4N,Central Toronto,Lawrence Park,43.72802,-79.38879,3,Park,Bus Line,Swim School
5,M4P,Central Toronto,Davisville North,43.712751,-79.390197,3,Clothing Store,Gym,Dance Studio
6,M4R,Central Toronto,North Toronto West,43.715383,-79.405678,0,Yoga Studio,Sporting Goods Shop,Coffee Shop
7,M4S,Central Toronto,Davisville,43.704324,-79.38879,2,Dessert Shop,Indian Restaurant,Sushi Restaurant
8,M4T,Central Toronto,"Moore Park, Summerhill East",43.689574,-79.38316,1,Playground,Gym,Restaurant
9,M4V,Central Toronto,"Deer Park, Forest Hill SE, Rathnelly, South Hi...",43.686412,-79.400049,0,Coffee Shop,American Restaurant,Restaurant


Import the libraries for data visualization.

In [63]:
import matplotlib.cm as cm
import matplotlib.colors as colors

Create a map of Toronto with each neighborhood colored by its cluster.

In [136]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=12)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(toronto_merged['Latitude'], toronto_merged['Longitude'], toronto_merged['Neighbourhood'], toronto_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters
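
The marker loop indexes the color list with `cluster-1`, so label 0 wraps around to the last color, and a neighborhood with no cluster label after the join (a NaN) would fail at that index. The sketch below shows one way to map labels to colors directly with a fallback. It uses the standard library's `colorsys` hue sweep rather than matplotlib's `rainbow` colormap, so the colors differ from the notebook's; `cluster_palette` and `colour_for` are hypothetical helper names.

```python
import colorsys
import math

def cluster_palette(n_clusters):
    """Return n_clusters evenly spaced hex colours from a hue sweep."""
    palette = []
    for i in range(n_clusters):
        r, g, b = colorsys.hsv_to_rgb(i / n_clusters, 0.8, 0.9)
        palette.append('#{:02x}{:02x}{:02x}'.format(
            round(r * 255), round(g * 255), round(b * 255)))
    return palette

def colour_for(label, palette, fallback='#808080'):
    """Map a cluster label to a colour; None or NaN labels get the fallback grey."""
    if label is None or (isinstance(label, float) and math.isnan(label)):
        return fallback
    return palette[int(label)]

palette = cluster_palette(5)
print(palette)
print(colour_for(0, palette), colour_for(float('nan'), palette))
```

Inside the marker loop, `colour_for(cluster, palette)` would replace `rainbow[cluster-1]` for both `color` and `fill_color`.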

Cluster 1

In [137]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 0, toronto_merged.columns[[2] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Neighbourhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue
0,The Beaches,0,Coffee Shop,Trail,Pub
2,"The Beaches West, India Bazaar",0,Ice Cream Shop,Liquor Store,Burger Joint
3,Studio District,0,Bookstore,Fish Market,Comfort Food Restaurant
6,North Toronto West,0,Yoga Studio,Sporting Goods Shop,Coffee Shop
9,"Deer Park, Forest Hill SE, Rathnelly, South Hi...",0,Coffee Shop,American Restaurant,Restaurant
13,"Harbourfront, Regent Park",0,Breakfast Spot,Pub,Restaurant
15,St. James Town,0,Coffee Shop,Gastropub,Middle Eastern Restaurant
17,Central Bay Street,0,Coffee Shop,Gastropub,Tea Room
21,"Commerce Court, Victoria Hotel",0,Café,Gastropub,American Restaurant
27,"CN Tower, Bathurst Quay, Island airport, Harbo...",0,Airport Lounge,Airport Terminal,Coffee Shop


Cluster 2

In [138]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 1, toronto_merged.columns[[2] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Neighbourhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue
8,"Moore Park, Summerhill East",1,Playground,Gym,Restaurant
10,Rosedale,1,Park,Trail,Playground
23,"Forest Hill North, Forest Hill West",1,Trail,Sushi Restaurant,Jewelry Store


Cluster 3

In [139]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 2, toronto_merged.columns[[2] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Neighbourhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue
7,Davisville,2,Dessert Shop,Indian Restaurant,Sushi Restaurant
11,"Cabbagetown, St. James Town",2,Café,Diner,Japanese Restaurant
14,"Ryerson, Garden District",2,Clothing Store,Tea Room,Diner
19,"Harbourfront East, Toronto Islands, Union Station",2,Performing Arts Venue,Park,Salad Place
20,"Design Exchange, Toronto Dominion Centre",2,Café,Gastropub,Gym
24,"The Annex, North Midtown, Yorkville",2,Café,BBQ Joint,Vegetarian / Vegan Restaurant
25,"Harbord, University of Toronto",2,Bookstore,Italian Restaurant,College Gym
26,"Chinatown, Grange Park, Kensington Market",2,Café,Cocktail Bar,Arts & Crafts Store
30,Christie,2,Café,Grocery Store,Italian Restaurant
31,"Dovercourt Village, Dufferin",2,Bakery,Gym / Fitness Center,Music Venue


Cluster 4

In [140]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 3, toronto_merged.columns[[2] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Neighbourhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue
1,"The Danforth West, Riverdale",3,Greek Restaurant,Ice Cream Shop,Yoga Studio
4,Lawrence Park,3,Park,Bus Line,Swim School
5,Davisville North,3,Clothing Store,Gym,Dance Studio
12,Church and Wellesley,3,Salon / Barbershop,Bubble Tea Shop,Breakfast Spot
16,Berczy Park,3,French Restaurant,Steakhouse,Cocktail Bar
18,"Adelaide, King, Richmond",3,Asian Restaurant,Concert Hall,Seafood Restaurant
28,Stn A PO Boxes 25 The Esplanade,3,Cocktail Bar,French Restaurant,Tea Room
32,"Little Portugal, Trinity",3,Wine Bar,Brewery,Ice Cream Shop
34,"High Park, The Junction South",3,Gastropub,Arts & Crafts Store,Mexican Restaurant


Cluster 5

In [141]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 4, toronto_merged.columns[[2] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Neighbourhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue
22,Roselawn,4,Garden,Cocktail Bar,College Gym
