![alt text](https://cognitiveclass.ai/wp-content/themes/bdu3.0/static/images/cc-logo.png)

# Segmenting and Clustering Neighborhoods in Toronto

In [2]:
# Installing web-parsing libraries
!conda install -c conda-forge beautifulsoup4 --yes
!conda install -c conda-forge lxml --yes    # parser for html file

Collecting package metadata: done
Solving environment: done

## Package Plan ##

  environment location: /home/jupyterlab/conda

  added / updated specs:
    - beautifulsoup4


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    beautifulsoup4-4.7.1       |        py36_1001         140 KB  conda-forge
    conda-4.6.4                |           py36_0         877 KB  conda-forge
    ------------------------------------------------------------
                                           Total:        1017 KB

The following packages will be UPDATED:

  beautifulsoup4      anaconda::beautifulsoup4-4.7.1-py36_1 --> conda-forge::beautifulsoup4-4.7.1-py36_1001
  conda                                        4.6.3-py36_0 --> 4.6.4-py36_0



Downloading and Extracting Packages
beautifulsoup4-4.7.1 | 140 KB    | ##################################### | 100% 
conda-4.6.4          | 877 KB    | ###########

In [1]:
from bs4 import BeautifulSoup
import requests
import pandas as pd
import numpy as np

In [2]:
source = requests.get('https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M').text
soup = BeautifulSoup(source, 'lxml')

table = soup.find('table', class_='wikitable sortable')

In [3]:
table = soup.find('table', class_='wikitable sortable').text
len(table)

9385

In [4]:
table[0:300]

'\n\nPostcode\nBorough\nNeighbourhood\n\n\nM1A\nNot assigned\nNot assigned\n\n\nM2A\nNot assigned\nNot assigned\n\n\nM3A\nNorth York\nParkwoods\n\n\nM4A\nNorth York\nVictoria Village\n\n\nM5A\nDowntown Toronto\nHarbourfront\n\n\nM5A\nDowntown Toronto\nRegent Park\n\n\nM6A\nNorth York\nLawrence Heights\n\n\nM6A\nNorth York\nLawrence Manor\n\n\nM7A'

In [5]:
table[9000:-1]

'ssigned\nNot assigned\n\n\nM3Z\nNot assigned\nNot assigned\n\n\nM4Z\nNot assigned\nNot assigned\n\n\nM5Z\nNot assigned\nNot assigned\n\n\nM6Z\nNot assigned\nNot assigned\n\n\nM7Z\nNot assigned\nNot assigned\n\n\nM8Z\nEtobicoke\nKingsway Park South West\n\n\nM8Z\nEtobicoke\nMimico NW\n\n\nM8Z\nEtobicoke\nThe Queensway West\n\n\nM8Z\nEtobicoke\nRoyal York South West\n\n\nM8Z\nEtobicoke\nSouth of Bloor\n\n\nM9Z\nNot assigned\nNot assigned\n'

#### After removing every \n\n, splitting the string on \n gives a flat list of cell strings

In [6]:
table = table.replace('\n\n','').split('\n')
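
To see why this works, here is a minimal sketch on a shortened sample of the text shown above (the sample is abbreviated for illustration, and its trailing newlines are assumed to mirror the real text). Rows are separated by three newlines, so removing one \n\n pair leaves a single \n between rows, and one split then yields every cell in order.

```python
# Shortened sample of the table text: the header row plus two data rows.
sample = ('\n\nPostcode\nBorough\nNeighbourhood\n\n\n'
          'M1A\nNot assigned\nNot assigned\n\n\n'
          'M3A\nNorth York\nParkwoods\n\n')

# Row separators are '\n\n\n'; removing one '\n\n' pair leaves a single '\n'.
cells = sample.replace('\n\n', '').split('\n')
print(cells)
```

The first three entries are the header cells, which is why the next cell starts from index 3.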

#### Convert the list into a dataframe; the -1 in reshape lets NumPy infer the number of rows

In [7]:
Toronto_df = pd.DataFrame(np.array(table[3:]).reshape(-1,3),columns=['PostalCode','Borough','Neighborhood'])
Toronto_df.head()

Unnamed: 0,PostalCode,Borough,Neighborhood
0,M1A,Not assigned,Not assigned
1,M2A,Not assigned,Not assigned
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,Harbourfront
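
As a small illustration of the reshape step, the sketch below uses six hand-picked cells (toy data) and shows that -1 makes NumPy infer the row count from the number of columns. It also shows that the reshape fails when the number of cells is not a multiple of 3, which is a useful sanity check on the parsed list.

```python
import numpy as np

cells = ['M1A', 'Not assigned', 'Not assigned',
         'M3A', 'North York', 'Parkwoods']

# -1 lets NumPy infer the row count: 6 cells / 3 columns = 2 rows.
rows = np.array(cells).reshape(-1, 3)
print(rows.shape)

# A list whose length is not a multiple of 3 cannot be reshaped.
try:
    np.array(cells[:-1]).reshape(-1, 3)
    reshape_failed = False
except ValueError:
    reshape_failed = True
print(reshape_failed)
```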


#### Filter out rows whose Borough is Not assigned

In [8]:
Toronto_df = Toronto_df[Toronto_df['Borough']!='Not assigned'].reset_index(drop=True)
Toronto_df.head()

Unnamed: 0,PostalCode,Borough,Neighborhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,Harbourfront
3,M5A,Downtown Toronto,Regent Park
4,M6A,North York,Lawrence Heights


#### If a Neighborhood is Not assigned, give it the same name as its Borough

In [9]:
# use a boolean mask with .loc so the assignment modifies Toronto_df itself, not a copy
mask = Toronto_df['Neighborhood'] == 'Not assigned'
Toronto_df.loc[mask, 'Neighborhood'] = Toronto_df.loc[mask, 'Borough']
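
A minimal sketch of this fill on a toy frame (rows invented for illustration), using a boolean mask with `.loc` so that the assignment targets the frame itself rather than a possible temporary copy:

```python
import pandas as pd

toy = pd.DataFrame({
    'PostalCode': ['M7A', 'M3A'],
    'Borough': ["Queen's Park", 'North York'],
    'Neighborhood': ['Not assigned', 'Parkwoods'],
})

# Rows where the neighborhood is missing take the borough name instead.
mask = toy['Neighborhood'] == 'Not assigned'
toy.loc[mask, 'Neighborhood'] = toy.loc[mask, 'Borough']
print(toy)
```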

#### Create a new dataframe in which neighborhoods that share a postal code are combined into one row

In [73]:
# combine neighborhoods in a list for each postalcode
PostalCode_ = Toronto_df['PostalCode'].unique()
Borough_ = []
Neighborhood_ = []
for Postal in Toronto_df['PostalCode'].unique():
    Neigh_list = []
    Borough_list = []
    for ind in np.arange(Toronto_df.shape[0]):
        if Toronto_df.loc[ind,'PostalCode']==Postal:
            Neigh_list.append(Toronto_df['Neighborhood'][ind])
            Borough_list.append(Toronto_df['Borough'][ind])
    Neighborhood_.append(', '.join(Neigh_list))
    Borough_.append(Borough_list[0])

Toronto_new_df = pd.DataFrame({'PostalCode':PostalCode_, 'Borough':Borough_, 'Neighborhood':Neighborhood_})
Toronto_new_df.head(10)

Unnamed: 0,PostalCode,Borough,Neighborhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,"Harbourfront, Regent Park"
3,M6A,North York,"Lawrence Heights, Lawrence Manor"
4,M7A,Queen's Park,Queen's Park
5,M9A,Etobicoke,Islington Avenue
6,M1B,Scarborough,"Rouge, Malvern"
7,M3B,North York,Don Mills North
8,M4B,East York,"Woodbine Gardens, Parkview Hill"
9,M5B,Downtown Toronto,"Ryerson, Garden District"
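
The nested loop scans the whole dataframe once per postal code. The same result can probably be obtained with a single `groupby`; the sketch below is an alternative, not the method used above, and runs on three toy rows taken from the table. `sort=False` is used so that postal codes keep their order of first appearance, as `unique()` does.

```python
import pandas as pd

toy = pd.DataFrame({
    'PostalCode': ['M5A', 'M5A', 'M3A'],
    'Borough': ['Downtown Toronto', 'Downtown Toronto', 'North York'],
    'Neighborhood': ['Harbourfront', 'Regent Park', 'Parkwoods'],
})

# Keep the first borough per postal code and join the neighborhood names.
combined = (toy.groupby('PostalCode', sort=False)
               .agg({'Borough': 'first', 'Neighborhood': ', '.join})
               .reset_index())
print(combined)
```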


## Show the number of rows in the dataframe

In [11]:
Toronto_new_df.shape[0]

103

## Get the coordinates (latitude, longitude) of each postal code

In [18]:
!conda install -c conda-forge geocoder --yes

Collecting package metadata: done
Solving environment: done

## Package Plan ##

  environment location: /home/jupyterlab/conda

  added / updated specs:
    - geocoder


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    geocoder-1.38.1            |             py_0          52 KB  conda-forge
    orderedset-2.0             |           py36_0         231 KB  conda-forge
    ratelim-0.1.6              |           py36_0           5 KB  conda-forge
    ------------------------------------------------------------
                                           Total:         288 KB

The following NEW packages will be INSTALLED:

  geocoder           conda-forge/noarch::geocoder-1.38.1-py_0
  orderedset         conda-forge/linux-64::orderedset-2.0-py36_0
  ratelim            conda-forge/linux-64::ratelim-0.1.6-py36_0



Downloading and Extracting Packages
orderedset-2.0       | 231 KB    | ########

In [20]:
import geocoder # import geocoder


latitude = []
longitude = []

# loop until you get the coordinates
for Postal in Toronto_new_df['PostalCode']:
    lat_lng_coords = None         # initialize your variable to None
    while(lat_lng_coords is None):
        g = geocoder.google('{}, Toronto, Ontario'.format(Postal))
        lat_lng_coords = g.latlng

    latitude.append(lat_lng_coords[0])
    longitude.append(lat_lng_coords[1])

# This doesn't work, and takes too much time, so we directly use the data from the csv file

KeyboardInterrupt: 
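
The `while` loop above never exits if the geocoder keeps returning `None`, which is why the cell had to be interrupted. A bounded retry is one way to avoid that. The sketch below is an assumption, not part of the original notebook: `lookup` stands in for any function that returns `[lat, lng]` or `None` (for example a wrapper around `geocoder`), and `geocode_with_retry`, `max_attempts` and `delay` are illustrative names.

```python
import time

def geocode_with_retry(lookup, query, max_attempts=3, delay=0.0):
    """Call lookup(query) up to max_attempts times; return None if all fail."""
    for _ in range(max_attempts):
        coords = lookup(query)
        if coords is not None:
            return coords
        time.sleep(delay)  # wait before the next attempt
    return None

# Hypothetical stand-in for a geocoder that succeeds on the second call.
calls = []
def flaky_lookup(query):
    calls.append(query)
    return [43.65, -79.38] if len(calls) >= 2 else None

result = geocode_with_retry(flaky_lookup, 'M5A, Toronto, Ontario')
failed = geocode_with_retry(lambda q: None, 'M5A, Toronto, Ontario')
print(result, failed)
```

With a bound on the attempts, a postal code that cannot be resolved yields `None` instead of blocking the whole loop.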

In [21]:
!wget -q -O 'Geospatial_data.csv' https://cocl.us/Geospatial_data
print('Data downloaded!')

Data downloaded!


In [74]:
Geospatial_data = pd.read_csv('Geospatial_data.csv')
Geospatial_data.set_index('Postal Code', inplace=True)

# look up the coordinates of each postal code in the CSV data
latitude = [Geospatial_data.loc[Postal, 'Latitude'] for Postal in Toronto_new_df['PostalCode']]
longitude = [Geospatial_data.loc[Postal, 'Longitude'] for Postal in Toronto_new_df['PostalCode']]

Toronto_new_df['Latitude']=latitude
Toronto_new_df['Longitude']=longitude
Toronto_new_df.head()

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.753259,-79.329656
1,M4A,North York,Victoria Village,43.725882,-79.315572
2,M5A,Downtown Toronto,"Harbourfront, Regent Park",43.65426,-79.360636
3,M6A,North York,"Lawrence Heights, Lawrence Manor",43.718518,-79.464763
4,M7A,Queen's Park,Queen's Park,43.662301,-79.389494
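
The per-row lookup can also be written as a merge on the postal code. This is a sketch of that alternative on two toy rows (coordinates copied from the output above); the `Postal Code`, `Latitude` and `Longitude` column names are the ones the notebook reads from the CSV.

```python
import pandas as pd

neighborhoods = pd.DataFrame({'PostalCode': ['M3A', 'M4A'],
                              'Neighborhood': ['Parkwoods', 'Victoria Village']})
geo = pd.DataFrame({'Postal Code': ['M4A', 'M3A'],
                    'Latitude': [43.725882, 43.753259],
                    'Longitude': [-79.315572, -79.329656]})

# A left merge keeps every neighborhood row, in its original order.
with_coords = (neighborhoods
               .merge(geo, how='left', left_on='PostalCode', right_on='Postal Code')
               .drop(columns='Postal Code'))
print(with_coords)
```

Unlike `.loc`, which raises `KeyError` for a postal code missing from the CSV, the left merge would leave NaN coordinates for that row.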


In [55]:
import json # library to handle JSON files

from pandas.io.json import json_normalize # transform JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

#!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't completed the Foursquare API lab
import folium # map rendering library

#### Define Foursquare Credentials and Version

In [56]:
CLIENT_ID = 'W5MC3D2YN3TRKIYR3DYNSETNTWM3NB54OXFRMCBJ2PHSV1F1' # your Foursquare ID
CLIENT_SECRET = 'DEXOA5GGMPUONILBNDMNXZEKF4AMWOX5YA4ZSPJRSUQ4BHW1' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: W5MC3D2YN3TRKIYR3DYNSETNTWM3NB54OXFRMCBJ2PHSV1F1
CLIENT_SECRET:DEXOA5GGMPUONILBNDMNXZEKF4AMWOX5YA4ZSPJRSUQ4BHW1


From the Foursquare lab in the previous module, we know that all the information is in the *items* key. Before we proceed, let's borrow the **get_category_type** function from the Foursquare lab.

In [57]:
# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:  # the key is 'venue.categories' after json_normalize
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

## Explore Neighborhoods in Toronto

#### Let's create a function that repeats this process for every neighborhood in Toronto

In [59]:
def getNearbyVenues(names, latitudes, longitudes, radius=500, LIMIT=100):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
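
Two details of this function are worth a closer look: `v['venue']['categories'][0]` raises `IndexError` when a venue has no category, and the long URL string can be replaced by a parameter dictionary. The sketch below illustrates both ideas with made-up items and placeholder credentials. `build_params` and `venue_row` are hypothetical helper names, not part of the original notebook, and the item structure is assumed from the keys used above.

```python
from urllib.parse import urlencode

def build_params(client_id, client_secret, version, lat, lng, radius=500, limit=100):
    """Query parameters for the explore endpoint used in the notebook."""
    return {'client_id': client_id, 'client_secret': client_secret, 'v': version,
            'll': '{},{}'.format(lat, lng), 'radius': radius, 'limit': limit}

def venue_row(name, lat, lng, item):
    """Flatten one item; the category is None if the venue has none."""
    venue = item['venue']
    categories = venue.get('categories') or []
    category = categories[0]['name'] if categories else None
    return (name, lat, lng, venue['name'],
            venue['location']['lat'], venue['location']['lng'], category)

params = build_params('ID', 'SECRET', '20180605', 43.753259, -79.329656)
url = 'https://api.foursquare.com/v2/venues/explore?' + urlencode(params)

# Made-up items shaped like the API response; the second has no category.
items = [{'venue': {'name': 'KFC', 'location': {'lat': 43.754387, 'lng': -79.333021},
                    'categories': [{'name': 'Fast Food Restaurant'}]}},
         {'venue': {'name': 'Unknown Spot', 'location': {'lat': 43.75, 'lng': -79.33},
                    'categories': []}}]
rows = [venue_row('Parkwoods', 43.753259, -79.329656, it) for it in items]
print(url)
print(rows)
```

With `requests`, the same dictionary can be passed as `requests.get(base_url, params=params)`, which performs the encoding itself.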

#### Now write the code to run the above function on each neighborhood and create a new dataframe called *Toronto_venues*.

In [76]:
Toronto_venues = getNearbyVenues(names=Toronto_new_df['Neighborhood'],
                                   latitudes=Toronto_new_df['Latitude'],
                                   longitudes=Toronto_new_df['Longitude']
                                  )

Parkwoods
Victoria Village
Harbourfront, Regent Park
Lawrence Heights, Lawrence Manor
Queen's Park
Islington Avenue
Rouge, Malvern
Don Mills North
Woodbine Gardens, Parkview Hill
Ryerson, Garden District
Glencairn
Cloverdale, Islington, Martin Grove, Princess Gardens, West Deane Park
Highland Creek, Rouge Hill, Port Union
Flemingdon Park, Don Mills South
Woodbine Heights
St. James Town
Humewood-Cedarvale
Bloordale Gardens, Eringate, Markland Wood, Old Burnhamthorpe
Guildwood, Morningside, West Hill
The Beaches
Berczy Park
Caledonia-Fairbanks
Woburn
Leaside
Central Bay Street
Christie
Cedarbrae
Hillcrest Village
Bathurst Manor, Downsview North, Wilson Heights
Thorncliffe Park
Adelaide, King, Richmond
Dovercourt Village, Dufferin
Scarborough Village
Fairview, Henry Farm, Oriole
Northwood Park, York University
East Toronto
Harbourfront East, Toronto Islands, Union Station
Little Portugal, Trinity
East Birchmount Park, Ionview, Kennedy Park
Bayview Village
CFB Toronto, Downsview East
The D

#### Let's find out how many unique categories can be curated from all the returned venues

In [77]:
print('There are {} unique categories.'.format(len(Toronto_venues['Venue Category'].unique())))

There are 279 unique categories.


In [78]:
Toronto_venues.head()

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Parkwoods,43.753259,-79.329656,Brookbanks Park,43.751976,-79.33214,Park
1,Parkwoods,43.753259,-79.329656,KFC,43.754387,-79.333021,Fast Food Restaurant
2,Parkwoods,43.753259,-79.329656,Variety Store,43.751974,-79.333114,Food & Drink Shop
3,Victoria Village,43.725882,-79.315572,Victoria Village Arena,43.723481,-79.315635,Hockey Arena
4,Victoria Village,43.725882,-79.315572,Portugril,43.725819,-79.312785,Portuguese Restaurant


## Analyze Each Neighborhood

In [80]:
# one hot encoding
Toronto_onehot = pd.get_dummies(Toronto_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
Toronto_onehot['Neighborhoods'] = Toronto_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [Toronto_onehot.columns[-1]] + list(Toronto_onehot.columns[:-1])
Toronto_onehot = Toronto_onehot[fixed_columns]

Toronto_onehot.head()

Unnamed: 0,Neighborhoods,Accessories Store,Adult Boutique,Afghan Restaurant,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,...,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Warehouse Store,Wine Bar,Wings Joint,Women's Store,Yoga Studio
0,Parkwoods,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,Parkwoods,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,Parkwoods,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,Victoria Village,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,Victoria Village,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


#### Next, let's group the rows by neighborhood and take the mean frequency of occurrence of each category

In [82]:
Toronto_grouped = Toronto_onehot.groupby('Neighborhoods').mean().reset_index()
Toronto_grouped.head()

Unnamed: 0,Neighborhoods,Accessories Store,Adult Boutique,Afghan Restaurant,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,...,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Warehouse Store,Wine Bar,Wings Joint,Women's Store,Yoga Studio
0,"Adelaide, King, Richmond",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0
1,Agincourt,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,"Agincourt North, L'Amoreaux East, Milliken, St...",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,"Albion Gardens, Beaumond Heights, Humbergate, ...",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,"Alderwood, Long Branch",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
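
To make the two steps concrete, here is a small sketch on four invented venues. Each venue becomes one 0/1 row, and the mean of a 0/1 column within a neighborhood is simply the share of that neighborhood's venues in that category, so every grouped row sums to 1.

```python
import pandas as pd

venues = pd.DataFrame({
    'Neighborhood': ['Parkwoods', 'Parkwoods', 'Parkwoods', 'Victoria Village'],
    'Venue Category': ['Park', 'Fast Food Restaurant', 'Park', 'Hockey Arena'],
})

# One indicator column per category, cast to float so the mean is well defined.
onehot = pd.get_dummies(venues[['Venue Category']], prefix='', prefix_sep='').astype(float)
onehot.insert(0, 'Neighborhoods', venues['Neighborhood'])

# The mean of a 0/1 column is the share of venues in that category.
grouped = onehot.groupby('Neighborhoods').mean().reset_index()
print(grouped)
```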


#### Let's confirm the new size

In [83]:
Toronto_grouped.shape

(101, 280)

#### Let's build a function that returns any number of the most common venues

In [84]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]
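
Because the whole row is sorted, a neighborhood with fewer distinct categories than `num_top_venues` gets its list padded with categories whose frequency is 0. Parkwoods, for example, returned only three venues, so any further entries in its top 10 carry no information. The sketch below is a hypothetical variant (`top_venues` is not part of the original notebook) that drops zero-frequency categories before sorting; the row values are invented.

```python
import pandas as pd

def top_venues(row, num_top_venues):
    """Top categories by frequency, skipping categories with zero frequency."""
    freqs = row.iloc[1:].astype(float)
    # A stable sort keeps tied categories in their original column order.
    freqs = freqs[freqs > 0].sort_values(ascending=False, kind='mergesort')
    return list(freqs.index[:num_top_venues])

row = pd.Series({'Neighborhoods': 'Parkwoods', 'Airport': 0.0,
                 'Fast Food Restaurant': 0.25, 'Park': 0.5, 'Yoga Studio': 0.0})
print(top_venues(row, 3))
```

The result can be shorter than `num_top_venues`, so a caller filling a fixed-width table would need to pad it, for instance with `None`.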

Now let's create the new dataframe and display the top 10 venues for each neighborhood.

In [108]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:  # positions beyond 3rd use the 'th' suffix
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = Toronto_grouped['Neighborhoods']

for ind in np.arange(Toronto_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(Toronto_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,"Adelaide, King, Richmond",Coffee Shop,Café,Steakhouse,Thai Restaurant,American Restaurant,Clothing Store,Hotel,Bakery,Bar,Gym
1,Agincourt,Breakfast Spot,Lounge,Skating Rink,Clothing Store,Yoga Studio,Ethiopian Restaurant,Drugstore,Dumpling Restaurant,Eastern European Restaurant,Electronics Store
2,"Agincourt North, L'Amoreaux East, Milliken, St...",Playground,Park,Yoga Studio,Empanada Restaurant,Doner Restaurant,Donut Shop,Drugstore,Dumpling Restaurant,Eastern European Restaurant,Electronics Store
3,"Albion Gardens, Beaumond Heights, Humbergate, ...",Grocery Store,Pizza Place,Fried Chicken Joint,Sandwich Place,Coffee Shop,Beer Store,Liquor Store,Pharmacy,Fast Food Restaurant,Ethiopian Restaurant
4,"Alderwood, Long Branch",Pizza Place,Pool,Pharmacy,Gym,Sandwich Place,Coffee Shop,Skating Rink,Pub,Drugstore,Discount Store
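
The column names above rely on a `try`/`except` around a three-item suffix list, which is fine for 10 columns but would label a 21st column as "21th". A small helper avoids both the exception and that edge case; `ordinal` is a hypothetical name introduced here for illustration.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st', 11 -> '11th'."""
    if 10 <= n % 100 <= 20:
        suffix = 'th'
    else:
        suffix = {1: 'st', 2: 'nd', 3: 'rd'}.get(n % 10, 'th')
    return '{}{}'.format(n, suffix)

columns = ['Neighborhood'] + ['{} Most Common Venue'.format(ordinal(i))
                              for i in range(1, 11)]
print(columns[:4])
```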


## Cluster Neighborhoods

In [86]:
from sklearn.cluster import KMeans

Run *k*-means to cluster the neighborhoods into 5 clusters.

In [109]:
# set number of clusters
kclusters = 5

Toronto_grouped_clustering = Toronto_grouped.drop('Neighborhoods', axis=1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(Toronto_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([0, 0, 2, 0, 0, 0, 0, 0, 0, 0], dtype=int32)
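
The first ten labels are almost all 0, which suggests the clusters are very uneven in size. A quick way to check this is to count the labels. The sketch below uses four invented frequency vectors instead of the real data, and passes `n_init` explicitly, which is an addition to the call used above.

```python
import numpy as np
from sklearn.cluster import KMeans

# Toy frequency vectors: two rows dominated by the first category, two by the second.
X = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
km = KMeans(n_clusters=2, random_state=0, n_init=10).fit(X)

# Count how many rows fall into each cluster.
sizes = np.bincount(km.labels_)
print(km.labels_, sizes)
```

Running `np.bincount(kmeans.labels_)` on the fitted model from the notebook would show the same kind of summary for the five Toronto clusters.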

Let's create a new dataframe that includes the cluster as well as the top 10 venues for each neighborhood.

In [110]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

Toronto_merged = Toronto_new_df

# join Toronto_new_df with neighborhoods_venues_sorted to add the cluster label and top venues for each neighborhood
Toronto_merged = Toronto_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

Toronto_merged.head() # check the last columns!

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M3A,North York,Parkwoods,43.753259,-79.329656,3.0,Fast Food Restaurant,Food & Drink Shop,Park,Yoga Studio,Electronics Store,Doner Restaurant,Donut Shop,Drugstore,Dumpling Restaurant,Eastern European Restaurant
1,M4A,North York,Victoria Village,43.725882,-79.315572,0.0,Intersection,Hockey Arena,Coffee Shop,Portuguese Restaurant,Yoga Studio,Electronics Store,Doner Restaurant,Donut Shop,Drugstore,Dumpling Restaurant
2,M5A,Downtown Toronto,"Harbourfront, Regent Park",43.65426,-79.360636,0.0,Coffee Shop,Bakery,Park,Pub,Café,Theater,Restaurant,Mexican Restaurant,Breakfast Spot,Electronics Store
3,M6A,North York,"Lawrence Heights, Lawrence Manor",43.718518,-79.464763,0.0,Clothing Store,Accessories Store,Event Space,Boutique,Arts & Crafts Store,Fraternity House,Coffee Shop,Miscellaneous Shop,Vietnamese Restaurant,Women's Store
4,M7A,Queen's Park,Queen's Park,43.662301,-79.389494,0.0,Coffee Shop,Sushi Restaurant,Japanese Restaurant,Diner,Gym,Portuguese Restaurant,Bar,Mexican Restaurant,Italian Restaurant,Café
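
The `Cluster Labels` column shows floats (3.0, 0.0) rather than integers. That is what pandas does when a join leaves at least one row without a match: the missing value becomes NaN, which forces a float column. Here `Toronto_new_df` has 103 rows while `Toronto_grouped` has 101, so some rows probably have no venue data, though the exact cause is not shown in the notebook. A minimal sketch of the effect and of one possible cleanup, on invented rows:

```python
import pandas as pd

left = pd.DataFrame({'Neighborhood': ['Parkwoods', 'No Venues Here']})
labels = pd.DataFrame({'Neighborhood': ['Parkwoods'], 'Cluster Labels': [3]})

# The unmatched row gets NaN, so the integer labels are upcast to float.
merged = left.join(labels.set_index('Neighborhood'), on='Neighborhood')
print(merged['Cluster Labels'].tolist())

# One option: drop unmatched rows, then restore the integer dtype.
clean = merged.dropna(subset=['Cluster Labels']).copy()
clean['Cluster Labels'] = clean['Cluster Labels'].astype(int)
print(clean)
```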


Finally, let's visualize the resulting clusters

In [118]:
# create map
latitude = 43.72
longitude = -79.38
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(Toronto_merged['Latitude'], Toronto_merged['Longitude'], Toronto_merged['Neighborhood'], np.nan_to_num(Toronto_merged['Cluster Labels'])):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[int(cluster-1)],
        fill=True,
        fill_color=rainbow[int(cluster-1)],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters
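
In the cell above, `x` and `ys` only determine how many colors are generated (`len(ys)` equals `kclusters`), and indexing with `cluster-1` maps label 0 to the last color through negative indexing. Note also that `np.nan_to_num` turns a missing label into 0, so a neighborhood without a cluster is drawn like cluster 0. The sketch below shows a more direct mapping from label to color; `color_of` is a hypothetical name, and this is a simplification rather than the notebook's own code.

```python
import numpy as np
import matplotlib.cm as cm
import matplotlib.colors as colors

kclusters = 5

# One evenly spaced rainbow color per cluster, converted to hex strings.
rainbow = [colors.rgb2hex(c) for c in cm.rainbow(np.linspace(0, 1, kclusters))]

# Index each cluster label directly so label k always maps to rainbow[k].
color_of = {k: rainbow[k] for k in range(kclusters)}
print(color_of)
```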

We can see that many neighborhoods belong to cluster 0. Let's check what each cluster might represent.

## Examine Clusters

#### Cluster 0: neighborhoods with various kinds of services

In [120]:
Toronto_merged.loc[Toronto_merged['Cluster Labels'] == 0, Toronto_merged.columns[[2] + list(range(5, Toronto_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,Victoria Village,0.0,Intersection,Hockey Arena,Coffee Shop,Portuguese Restaurant,Yoga Studio,Electronics Store,Doner Restaurant,Donut Shop,Drugstore,Dumpling Restaurant
2,"Harbourfront, Regent Park",0.0,Coffee Shop,Bakery,Park,Pub,Café,Theater,Restaurant,Mexican Restaurant,Breakfast Spot,Electronics Store
3,"Lawrence Heights, Lawrence Manor",0.0,Clothing Store,Accessories Store,Event Space,Boutique,Arts & Crafts Store,Fraternity House,Coffee Shop,Miscellaneous Shop,Vietnamese Restaurant,Women's Store
4,Queen's Park,0.0,Coffee Shop,Sushi Restaurant,Japanese Restaurant,Diner,Gym,Portuguese Restaurant,Bar,Mexican Restaurant,Italian Restaurant,Café
7,Don Mills North,0.0,Japanese Restaurant,Caribbean Restaurant,Gym / Fitness Center,Pool,Café,Dumpling Restaurant,Discount Store,Dog Run,Doner Restaurant,Donut Shop
9,"Ryerson, Garden District",0.0,Clothing Store,Coffee Shop,Café,Cosmetics Shop,Middle Eastern Restaurant,Japanese Restaurant,Italian Restaurant,Pizza Place,Plaza,Tea Room
12,"Highland Creek, Rouge Hill, Port Union",0.0,History Museum,Bar,Yoga Studio,Ethiopian Restaurant,Drugstore,Dumpling Restaurant,Eastern European Restaurant,Electronics Store,Empanada Restaurant,Event Space
13,"Flemingdon Park, Don Mills South",0.0,Asian Restaurant,Gym,Beer Store,Coffee Shop,Italian Restaurant,Shopping Mall,Bike Shop,Sandwich Place,Sporting Goods Shop,Restaurant
14,Woodbine Heights,0.0,Skating Rink,Pharmacy,Curling Ice,Asian Restaurant,Park,Video Store,Beer Store,Cosmetics Shop,Donut Shop,Drugstore
15,St. James Town,0.0,Coffee Shop,Restaurant,Hotel,Café,Clothing Store,Park,Cocktail Bar,Cosmetics Shop,Bakery,Italian Restaurant


#### Cluster 1: neighborhood with various kinds of restaurants and many yoga studios and drugstores

In [121]:
Toronto_merged.loc[Toronto_merged['Cluster Labels'] == 1, Toronto_merged.columns[[2] + list(range(5, Toronto_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
45,"Silver Hills, York Mills",1.0,Cafeteria,Yoga Studio,Empanada Restaurant,Doner Restaurant,Donut Shop,Drugstore,Dumpling Restaurant,Eastern European Restaurant,Electronics Store,Ethiopian Restaurant


#### Cluster 2: neighborhoods with many parks, donut shops and drugstores

In [122]:
Toronto_merged.loc[Toronto_merged['Cluster Labels'] == 2, Toronto_merged.columns[[2] + list(range(5, Toronto_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
16,Humewood-Cedarvale,2.0,Field,Hockey Arena,Park,Trail,Yoga Studio,Electronics Store,Doner Restaurant,Donut Shop,Drugstore,Dumpling Restaurant
21,Caledonia-Fairbanks,2.0,Park,Market,Fast Food Restaurant,Pharmacy,Women's Store,Gym Pool,Electronics Store,Dog Run,Doner Restaurant,Donut Shop
35,East Toronto,2.0,Park,Intersection,Convenience Store,Yoga Studio,Electronics Store,Doner Restaurant,Donut Shop,Drugstore,Dumpling Restaurant,Eastern European Restaurant
40,"CFB Toronto, Downsview East",2.0,Airport,Bus Stop,Park,Electronics Store,Yoga Studio,Empanada Restaurant,Donut Shop,Drugstore,Dumpling Restaurant,Eastern European Restaurant
49,"Maple Leaf Park, North Park, Upwood Park",2.0,Construction & Landscaping,Park,Bakery,Basketball Court,Yoga Studio,Ethiopian Restaurant,Drugstore,Dumpling Restaurant,Eastern European Restaurant,Electronics Store
52,"Newtonbrook, Willowdale",2.0,Piano Bar,Park,Electronics Store,Dog Run,Doner Restaurant,Donut Shop,Drugstore,Dumpling Restaurant,Eastern European Restaurant,Empanada Restaurant
61,Lawrence Park,2.0,Lake,Park,Dim Sum Restaurant,Swim School,Bus Line,Yoga Studio,Drugstore,Dumpling Restaurant,Eastern European Restaurant,Electronics Store
66,York Mills West,2.0,Park,Bank,Yoga Studio,Empanada Restaurant,Donut Shop,Drugstore,Dumpling Restaurant,Eastern European Restaurant,Electronics Store,Ethiopian Restaurant
68,"Forest Hill North, Forest Hill West",2.0,Trail,Jewelry Store,Park,Sushi Restaurant,Bus Line,Yoga Studio,Donut Shop,Drugstore,Dumpling Restaurant,Eastern European Restaurant
77,"Kingsview Village, Martin Grove Gardens, Richv...",2.0,Mobile Phone Shop,Park,Bus Line,Yoga Studio,Donut Shop,Drugstore,Dumpling Restaurant,Eastern European Restaurant,Electronics Store,Empanada Restaurant


#### Cluster 3: neighborhoods with many fast food restaurants and pizza places

In [123]:
Toronto_merged.loc[Toronto_merged['Cluster Labels'] == 3, Toronto_merged.columns[[2] + list(range(5, Toronto_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Parkwoods,3.0,Fast Food Restaurant,Food & Drink Shop,Park,Yoga Studio,Electronics Store,Doner Restaurant,Donut Shop,Drugstore,Dumpling Restaurant,Eastern European Restaurant
6,"Rouge, Malvern",3.0,Fast Food Restaurant,Yoga Studio,Empanada Restaurant,Doner Restaurant,Donut Shop,Drugstore,Dumpling Restaurant,Eastern European Restaurant,Electronics Store,Ethiopian Restaurant
8,"Woodbine Gardens, Parkview Hill",3.0,Fast Food Restaurant,Pizza Place,Pet Store,Pharmacy,Gym / Fitness Center,Rock Climbing Spot,Café,Breakfast Spot,Intersection,Athletics & Sports
10,Glencairn,3.0,Pizza Place,Japanese Restaurant,Pub,Park,Sushi Restaurant,Dog Run,Doner Restaurant,Donut Shop,Drugstore,Dumpling Restaurant
11,"Cloverdale, Islington, Martin Grove, Princess ...",3.0,Print Shop,Bank,Dog Run,Donut Shop,Drugstore,Dumpling Restaurant,Eastern European Restaurant,Electronics Store,Empanada Restaurant,Yoga Studio
50,Humber Summit,3.0,Pizza Place,Empanada Restaurant,Doner Restaurant,Donut Shop,Drugstore,Dumpling Restaurant,Eastern European Restaurant,Electronics Store,Ethiopian Restaurant,Discount Store
82,"Clarks Corners, Sullivan, Tam O'Shanter",3.0,Pizza Place,Pharmacy,Noodle House,Fried Chicken Joint,Italian Restaurant,Thai Restaurant,Chinese Restaurant,Fast Food Restaurant,Coworking Space,Creperie


#### Cluster 4: neighborhoods close to a baseball field, otherwise similar to the one in cluster 1

In [124]:
Toronto_merged.loc[Toronto_merged['Cluster Labels'] == 4, Toronto_merged.columns[[2] + list(range(5, Toronto_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
57,"Emery, Humberlea",4.0,Baseball Field,Construction & Landscaping,Yoga Studio,Donut Shop,Drugstore,Dumpling Restaurant,Eastern European Restaurant,Electronics Store,Empanada Restaurant,Ethiopian Restaurant
101,"Humber Bay, King's Mill Park, Kingsway Park So...",4.0,Baseball Field,Yoga Studio,Donut Shop,Drugstore,Dumpling Restaurant,Eastern European Restaurant,Electronics Store,Empanada Restaurant,Ethiopian Restaurant,Dog Run
