## Part 3: Segmentation & Clustering of Neighborhoods in Toronto
In the previous notebook, we obtained the Toronto neighborhoods dataframe in the form needed for analysis.

The code below obtains the same dataframe without going into much detail, since the steps were already discussed in the previous notebook.

In [1]:
# import necessary libraries
import pandas as pd
import numpy as np
import requests

In [5]:
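# Toronto neighborhoods data frame
# NOTE: the cells below expect df to have PostalCode, Borough and Neighborhood columns,
# with the 'Not assigned' rows already removed as in the previous notebook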
df = pd.read_html('https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M', header = 0)[0]
df.head()

| | M1ANot assigned | M2ANot assigned | M3ANorth York(Parkwoods) | M4ANorth York(Victoria Village) | M5ADowntown Toronto(Regent Park / Harbourfront) | M6ANorth York(Lawrence Manor / Lawrence Heights) | M7AQueen's Park(Ontario Provincial Government) | M8ANot assigned | M9AEtobicoke(Islington Avenue) |
|---|---|---|---|---|---|---|---|---|---|
| 0 | M1BScarborough(Malvern / Rouge) | M2BNot assigned | M3BNorth York(Don Mills)North | M4BEast York(Parkview Hill / Woodbine Gardens) | M5BDowntown Toronto(Garden District, Ryerson) | M6BNorth York(Glencairn) | M7BNot assigned | M8BNot assigned | M9BEtobicoke(West Deane Park / Princess Garden... |
| 1 | M1CScarborough(Rouge Hill / Port Union / Highl... | M2CNot assigned | M3CNorth York(Don Mills)South(Flemingdon Park) | M4CEast York(Woodbine Heights) | M5CDowntown Toronto(St. James Town) | M6CYork(Humewood-Cedarvale) | M7CNot assigned | M8CNot assigned | M9CEtobicoke(Eringate / Bloordale Gardens / Ol... |
| 2 | M1EScarborough(Guildwood / Morningside / West ... | M2ENot assigned | M3ENot assigned | M4EEast Toronto(The Beaches) | M5EDowntown Toronto(Berczy Park) | M6EYork(Caledonia-Fairbanks) | M7ENot assigned | M8ENot assigned | M9ENot assigned |
| 3 | M1GScarborough(Woburn) | M2GNot assigned | M3GNot assigned | M4GEast York(Leaside) | M5GDowntown Toronto(Central Bay Street) | M6GDowntown Toronto(Christie) | M7GNot assigned | M8GNot assigned | M9GNot assigned |
| 4 | M1HScarborough(Cedarbrae) | M2HNorth York(Hillcrest Village) | M3HNorth York(Bathurst Manor / Wilson Heights ... | M4HEast York(Thorncliffe Park) | M5HDowntown Toronto(Richmond / Adelaide / King) | M6HWest Toronto(Dufferin / Dovercourt Village) | M7HNot assigned | M8HNot assigned | M9HNot assigned |


### The next few code cells check and prepare the data frame for clustering

In [154]:
# Value counts for each Borough
df.Borough.value_counts()

Etobicoke           44
North York          38
Downtown Toronto    37
Scarborough         37
Central Toronto     17
West Toronto        13
York                 9
East Toronto         7
East York            6
Queen's Park         1
Mississauga          1
Name: Borough, dtype: int64

In [155]:
# Count 'Not assigned' values in the Neighborhood column
(df.Neighborhood == 'Not assigned').value_counts()

False    210
Name: Neighborhood, dtype: int64

In [157]:
# Combine neighborhoods that share a postal code into one comma-separated string
grouped = df.groupby(['PostalCode', 'Neighborhood'], as_index = False).count()
neigh_df = grouped.groupby('PostalCode')['Neighborhood'].apply(lambda x: ",".join(x.astype(str))).reset_index()
neigh_df.head()

| | PostalCode | Neighborhood |
|---|---|---|
| 0 | M1B | Malvern,Rouge |
| 1 | M1C | Highland Creek,Port Union,Rouge Hill |
| 2 | M1E | Guildwood,Morningside,West Hill |
| 3 | M1G | Woburn |
| 4 | M1H | Cedarbrae |
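To show what the two-step grouping does, here is a small sketch on made-up rows. The postal codes and names are illustrative, not read from the scraped table. It assumes each (PostalCode, Neighborhood) pair should appear once. Deduplicating, sorting, and then using a single `groupby(...).agg(','.join)` gives the same strings as the `count()` + `apply` combination above.

```python
import pandas as pd

# Illustrative rows (one neighborhood per row, some sharing a postal code)
df_toy = pd.DataFrame({
    'PostalCode': ['M1B', 'M1B', 'M1C', 'M1C', 'M1C', 'M1G'],
    'Borough': ['Scarborough'] * 6,
    'Neighborhood': ['Rouge', 'Malvern', 'Port Union', 'Rouge Hill',
                     'Highland Creek', 'Woburn'],
})

# Deduplicate and sort the (PostalCode, Neighborhood) pairs, which the
# groupby(...).count() step does implicitly, then join the names per code
neigh_toy = (df_toy[['PostalCode', 'Neighborhood']]
             .drop_duplicates()
             .sort_values(['PostalCode', 'Neighborhood'])
             .groupby('PostalCode')['Neighborhood']
             .agg(','.join)
             .reset_index())
print(neigh_toy)
```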


In [159]:
# Grouping data using Borough field: one row per (PostalCode, Borough) pair
borough_df = pd.DataFrame({'count' : df.groupby(['PostalCode', 'Borough']).size()}).reset_index()
borough_df.drop('count', axis = 1, inplace = True)
borough_df.head()

| | PostalCode | Borough |
|---|---|---|
| 0 | M1B | Scarborough |
| 1 | M1C | Scarborough |
| 2 | M1E | Scarborough |
| 3 | M1G | Scarborough |
| 4 | M1H | Scarborough |
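The `count` column is dropped right away, so `borough_df` ends up holding only the distinct (PostalCode, Borough) pairs. Here is a sketch of a shorter alternative using `drop_duplicates`, again on made-up rows:

```python
import pandas as pd

# Illustrative rows: M1B appears twice, so its borough should appear once
df_toy = pd.DataFrame({
    'PostalCode': ['M1C', 'M1B', 'M1B', 'M4E'],
    'Borough': ['Scarborough', 'Scarborough', 'Scarborough', 'East Toronto'],
    'Neighborhood': ['Rouge Hill', 'Malvern', 'Rouge', 'The Beaches'],
})

# Distinct (PostalCode, Borough) pairs, sorted like the groupby result
borough_toy = (df_toy[['PostalCode', 'Borough']]
               .drop_duplicates()
               .sort_values(['PostalCode', 'Borough'])
               .reset_index(drop=True))
print(borough_toy)
```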


In [160]:
# Merge the two data frames on their common PostalCode column
toronto_df = pd.merge(borough_df, neigh_df, on = 'PostalCode')
toronto_df.head()

| | PostalCode | Borough | Neighborhood |
|---|---|---|---|
| 0 | M1B | Scarborough | Malvern,Rouge |
| 1 | M1C | Scarborough | Highland Creek,Port Union,Rouge Hill |
| 2 | M1E | Scarborough | Guildwood,Morningside,West Hill |
| 3 | M1G | Scarborough | Woburn |
| 4 | M1H | Scarborough | Cedarbrae |


In [161]:
# print the shape of the data frame
toronto_df.shape

(103, 3)

In [163]:
# Read the CSV file of latitude and longitude values for Toronto neighborhoods
lat_long = pd.read_csv('http://cocl.us/Geospatial_data')
lat_long.columns = ['PostalCode', 'Latitude', 'Longitude']
lat_long.head()

| | PostalCode | Latitude | Longitude |
|---|---|---|---|
| 0 | M1B | 43.806686 | -79.194353 |
| 1 | M1C | 43.784535 | -79.160497 |
| 2 | M1E | 43.763573 | -79.188711 |
| 3 | M1G | 43.770992 | -79.216917 |
| 4 | M1H | 43.773136 | -79.239476 |


In [164]:
# Merge this data frame with the toronto_df
toronto_neigh_ll_df = pd.merge(toronto_df, lat_long, on = 'PostalCode')
toronto_neigh_ll_df.head()

| | PostalCode | Borough | Neighborhood | Latitude | Longitude |
|---|---|---|---|---|---|
| 0 | M1B | Scarborough | Malvern,Rouge | 43.806686 | -79.194353 |
| 1 | M1C | Scarborough | Highland Creek,Port Union,Rouge Hill | 43.784535 | -79.160497 |
| 2 | M1E | Scarborough | Guildwood,Morningside,West Hill | 43.763573 | -79.188711 |
| 3 | M1G | Scarborough | Woburn | 43.770992 | -79.216917 |
| 4 | M1H | Scarborough | Cedarbrae | 43.773136 | -79.239476 |
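Both merges above assume each postal code appears exactly once on each side. The default inner join silently drops codes that have no match, so the shape check is the only safeguard. Here is a hedged sketch, on toy frames, of how `pd.merge` can enforce that assumption with `validate` and report unmatched keys with `indicator=True`. The code `M9Z` is made up.

```python
import pandas as pd

places = pd.DataFrame({'PostalCode': ['M1B', 'M1C', 'M9Z'],
                       'Borough': ['Scarborough', 'Scarborough', 'Etobicoke']})
coords = pd.DataFrame({'PostalCode': ['M1B', 'M1C'],
                       'Latitude': [43.806686, 43.784535],
                       'Longitude': [-79.194353, -79.160497]})

# validate raises pandas.errors.MergeError if either side has duplicate keys;
# indicator adds a '_merge' column telling where each row came from
checked = pd.merge(places, coords, on='PostalCode', how='left',
                   validate='one_to_one', indicator=True)
unmatched = checked.loc[checked['_merge'] == 'left_only', 'PostalCode'].tolist()
print(unmatched)  # ['M9Z'] has no coordinates
```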


In [165]:
# print the shape of the data frame
toronto_neigh_ll_df.shape

(103, 5)

In [166]:
# Save the dataframe to csv file
toronto_neigh_ll_df.to_csv('Toronto_Postal_data')

### Let's start with the analysis of Toronto Neighborhood data

In [167]:
#!conda install -c conda-forge folium --yes
#!conda install -c conda-forge geopy --yes     # uncomment this line if you haven't completed the Foursquare API lab
from geopy.geocoders import Nominatim          # convert an address into latitude and longitude values
from pandas import json_normalize              # transform JSON into a pandas dataframe (pandas >= 1.0; older versions: pandas.io.json)
import folium                                  # map rendering library

#### Get the latitude & longitude coordinates of Toronto to draw its map

In [168]:
address = 'Toronto, Ontario'
geolocator = Nominatim(user_agent = 'Toronto_explorer')      # define user_agent
location = geolocator.geocode(address)

# Latitude coordinate
lat = location.latitude      

# Longitude coordinate
lng = location.longitude

print('The latitude is {} & longitude is {}'.format(lat, lng))

The latitude is 43.653963 & longitude is -79.387207


### Create a map of Toronto with the neighborhoods superimposed on it

In [169]:
# Generate map of Toronto using folium
T_map = folium.Map(location = [lat, lng], zoom_start = 12)
for lt, ln, bgh, ngh in zip(toronto_neigh_ll_df['Latitude'], toronto_neigh_ll_df['Longitude'], toronto_neigh_ll_df['Borough'], toronto_neigh_ll_df['Neighborhood']):
    label = '{}, {}'.format(bgh, ngh)
    label = folium.Popup(label, parse_html = True)    # popup shown when the marker is clicked
    folium.CircleMarker([lt, ln], radius = 5, popup = label, color = 'blue', 
                        fill = True, fill_color = '#3186cc', fill_opacity = 0.7).add_to(T_map)

T_map

## For our analysis, we will consider only the boroughs whose names contain 'Toronto'

In [170]:
# Filter rows containing Toronto in the Borough names
T_neigh = toronto_neigh_ll_df[toronto_neigh_ll_df.Borough.str.contains('Toronto', case = False)].reset_index(drop = True)
T_neigh.head()

| | PostalCode | Borough | Neighborhood | Latitude | Longitude |
|---|---|---|---|---|---|
| 0 | M4E | East Toronto | The Beaches | 43.676357 | -79.293031 |
| 1 | M4K | East Toronto | Riverdale,The Danforth West | 43.679557 | -79.352188 |
| 2 | M4L | East Toronto | India Bazaar,The Beaches West | 43.668999 | -79.315572 |
| 3 | M4M | East Toronto | Studio District | 43.659526 | -79.340923 |
| 4 | M4N | Central Toronto | Lawrence Park | 43.72802 | -79.38879 |
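The filter keeps a row whenever "Toronto" appears anywhere in the borough name, ignoring case. That is why "East York" and "York" are excluded. Here is a small sketch on toy data. It adds `na=False`, which the original cell does not use, so a missing borough name counts as a non-match instead of producing a missing value in the boolean mask.

```python
import pandas as pd

boroughs = pd.DataFrame({'Borough': ['East Toronto', 'Scarborough', 'downtown toronto', None, 'East York']})

# Case-insensitive substring match; missing values count as "no match"
mask = boroughs['Borough'].str.contains('Toronto', case=False, na=False)
toronto_only = boroughs[mask].reset_index(drop=True)
print(toronto_only['Borough'].tolist())  # ['East Toronto', 'downtown toronto']
```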


In [171]:
# Print the shape of the data frame
T_neigh.shape

(39, 5)

### Let's map the boroughs whose names contain 'Toronto' and label each neighborhood

In [172]:
# Map of Borough names which contains Toronto
T_map = folium.Map(location = [lat, lng], zoom_start = 12)

for lt, ln, label in zip(T_neigh['Latitude'], T_neigh['Longitude'], T_neigh['Neighborhood']):
    label = folium.Popup(label, parse_html = True)
    folium.CircleMarker([lt, ln], radius = 5, color = 'blue', popup = label, fill = True, fill_color = '#3186cc', 
                        fill_opacity = 0.7).add_to(T_map)
    
T_map

### Define Foursquare Credentials and Version

In [173]:
# Foursquare API credentials: replace the placeholders with your own and avoid printing or sharing real secrets
CLIENT_ID = 'YOUR_CLIENT_ID'            # Foursquare ID
CLIENT_SECRET = 'YOUR_CLIENT_SECRET'    # Foursquare Secret
VERSION = '20180605'   # Foursquare API version
LIMIT = 100            # max venues per request (assumed; the per-neighborhood venue counts below top out at 100)
radius = 500           # search radius in meters (assumed; same as getNearbyVenues' default)


### Explore the 1st neighborhood in East Toronto using Foursquare API

In [174]:
# Examine 1st neighborhood in East Toronto
ne_lt = T_neigh.loc[0, 'Latitude']     # neighborhood latitude value
ne_ln = T_neigh.loc[0, 'Longitude']    # neighborhood longitude value

ne_name = T_neigh.loc[0, 'Neighborhood']    # neighborhood name

print('Latitude and longitude values of {} are {}, {}.'.format(ne_name, ne_lt, ne_ln))

Latitude and longitude values of The Beaches are 43.67635739999999, -79.2930312.


In [175]:
# Create the API request URL
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
       CLIENT_ID, CLIENT_SECRET, VERSION, ne_lt, ne_ln, radius, LIMIT)
                  
results = requests.get(url).json()      # make the GET request

# View the result
results                       

{'meta': {'code': 200, 'requestId': '5e43ca2d006dce001b50743a'},
 'response': {'headerLocation': 'The Beaches',
  'headerFullLocation': 'The Beaches, Toronto',
  'headerLocationGranularity': 'neighborhood',
  'totalResults': 4,
  'suggestedBounds': {'ne': {'lat': 43.680857404499996,
    'lng': -79.28682091449052},
   'sw': {'lat': 43.67185739549999, 'lng': -79.29924148550948}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'reasons': {'count': 0,
       'items': [{'summary': 'This spot is popular',
         'type': 'general',
         'reasonName': 'globalInteractionReason'}]},
      'venue': {'id': '4bd461bc77b29c74a07d9282',
       'name': 'Glen Manor Ravine',
       'location': {'address': 'Glen Manor',
        'crossStreet': 'Queen St.',
        'lat': 43.67682094413784,
        'lng': -79.29394208780985,
        'labeledLatLngs': [{'label': 'display',
          'lat': 43.67682094413784,
          'lng': -79.29394208780985}],
        'distanc
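Building the query string with `format` works, but it is easy to get the encoding wrong. Here is a sketch that uses the standard library's `urllib.parse.urlencode` instead. The helper name `build_explore_url` is hypothetical, the parameter names mirror the URL above, and the credentials are placeholders. Note that `urlencode` percent-encodes the comma in `ll` as `%2C`. Passing the same dict as `requests.get(base_url, params=params)` would have the same effect.

```python
from urllib.parse import urlencode

def build_explore_url(client_id, client_secret, version, lat, lng, radius=500, limit=100):
    """Hypothetical helper: build a venues/explore URL with the parameters used in this notebook."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,
        'll': f'{lat},{lng}',
        'radius': radius,
        'limit': limit,
    }
    return 'https://api.foursquare.com/v2/venues/explore?' + urlencode(params)

url = build_explore_url('YOUR_CLIENT_ID', 'YOUR_CLIENT_SECRET', '20180605', 43.676357, -79.293031)
print(url)
```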

### All the relevant information is in the items key, so let's explore it

In [176]:
# function that extracts the category of the venue
# (defined before it is used with apply below)
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
    
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']


# Now let's clean the json and structure it into a pandas dataframe.
venues = results['response']['groups'][0]['items']

# flatten JSON
nearby_venues = json_normalize(venues)        

# filter relevant columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues = nearby_venues.loc[:, filtered_columns]

# keep only the first category name for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis = 1)

# clean columns & keep the last word as its name
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head()

| | name | categories | lat | lng |
|---|---|---|---|---|
| 0 | Glen Manor Ravine | Trail | 43.676821 | -79.293942 |
| 1 | The Big Carrot Natural Food Market | Health Food Store | 43.678879 | -79.297734 |
| 2 | Grover Pub and Grub | Pub | 43.679181 | -79.297215 |
| 3 | Upper Beaches | Neighborhood | 43.680563 | -79.292869 |
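Each entry of `items` nests the venue under a `venue` key, and `categories` is a list whose first entry is used as the venue's category. Here is a self-contained sketch of the same flattening on a hand-made payload, trimmed down from the response printed above. The second venue is hypothetical and deliberately has no categories.

```python
import pandas as pd

# Hand-made payload mimicking results['response']['groups'][0]['items']
items = [
    {'venue': {'name': 'Glen Manor Ravine',
               'categories': [{'name': 'Trail'}],
               'location': {'lat': 43.676821, 'lng': -79.293942}}},
    {'venue': {'name': 'Uncategorised Spot',   # hypothetical venue with no category
               'categories': [],
               'location': {'lat': 43.68, 'lng': -79.29}}},
]

def first_category(categories):
    # First category name, or None when the venue has no categories
    return categories[0]['name'] if categories else None

flat = pd.json_normalize(items)   # nested dicts become dotted column names
cols = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
flat = flat[cols].copy()
flat['venue.categories'] = flat['venue.categories'].apply(first_category)
flat.columns = [col.split('.')[-1] for col in flat.columns]
print(flat)
```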


In [177]:
# print the number of venues returned by Foursquare
print('{} venues were returned by Foursquare.'.format(nearby_venues.shape[0]))

4 venues were returned by Foursquare.


### Let's explore all the neighborhoods in the T_neigh dataframe using Foursquare

#### Let's create a function that repeats the above process for every neighborhood in the T_neigh dataframe

In [178]:
# custom function: query Foursquare around each neighborhood and collect its venues
# (uses the global CLIENT_ID, CLIENT_SECRET, VERSION and LIMIT defined earlier)
def getNearbyVenues(names, latitudes, longitudes, radius = 500):
    
    # Create the empty list to hold venue details
    venues_list = []
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, CLIENT_SECRET, VERSION, lat, lng, radius, LIMIT)
            
        # make the GET request and keep only the list of recommended venues
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # keep only the relevant information for each nearby venue
        venues_list.append([(name, lat, lng, v['venue']['name'], v['venue']['location']['lat'], v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    # flatten the per-neighborhood lists into a single data frame
    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 'Neighborhood Latitude', 'Neighborhood Longitude', 'Venue', 'Venue Latitude', 'Venue Longitude', 
                  'Venue Category']
    
    return nearby_venues
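Indexing `categories[0]` raises an `IndexError` when a venue comes back with an empty category list. Here is a hedged sketch of the per-response parsing step pulled out into a hypothetical helper, `venues_from_items`, which returns `None` for such venues. Because it takes the `items` list as input, it can be tested offline on a hand-made payload with no API call. The venue "Mystery Venue" is made up.

```python
import pandas as pd

def venues_from_items(name, lat, lng, items):
    """Hypothetical helper: turn one explore response's 'items' list into rows.

    Venues without any category get None instead of raising IndexError.
    """
    rows = []
    for item in items:
        venue = item['venue']
        categories = venue.get('categories') or []
        rows.append((name, lat, lng, venue['name'],
                     venue['location']['lat'], venue['location']['lng'],
                     categories[0]['name'] if categories else None))
    return rows

columns = ['Neighborhood', 'Neighborhood Latitude', 'Neighborhood Longitude',
           'Venue', 'Venue Latitude', 'Venue Longitude', 'Venue Category']

sample_items = [
    {'venue': {'name': 'Pantheon', 'categories': [{'name': 'Greek Restaurant'}],
               'location': {'lat': 43.677621, 'lng': -79.351434}}},
    {'venue': {'name': 'Mystery Venue', 'categories': [],
               'location': {'lat': 43.68, 'lng': -79.35}}},
]
rows = venues_from_items('Riverdale,The Danforth West', 43.679557, -79.352188, sample_items)
venues_df = pd.DataFrame(rows, columns=columns)
print(venues_df[['Venue', 'Venue Category']])
```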

In [179]:
# Call the above function on each neighborhood to create the Toronto_venues data frame
Toronto_venues = getNearbyVenues(names = T_neigh['Neighborhood'], latitudes = T_neigh['Latitude'], longitudes = T_neigh['Longitude'])

The Beaches
Riverdale,The Danforth West
India Bazaar,The Beaches West
Studio District
Lawrence Park
Davisville North
North Toronto West
Davisville
Moore Park,Summerhill East
Deer Park,Forest Hill SE,Rathnelly,South Hill,Summerhill West
Rosedale
Cabbagetown,St. James Town
Church and Wellesley
Harbourfront
Garden District,Ryerson
St. James Town
Berczy Park
Central Bay Street
Adelaide,King,Richmond
Harbourfront East,Toronto Islands,Union Station
Design Exchange,Toronto Dominion Centre
Commerce Court,Victoria Hotel
Roselawn
Forest Hill North,Forest Hill West
North Midtown,The Annex,Yorkville
Harbord,University of Toronto
Chinatown,Grange Park,Kensington Market
Bathurst Quay,CN Tower,Harbourfront West,Island airport,King and Spadina,Railway Lands,South Niagara
Stn A PO Boxes 25 The Esplanade
First Canadian Place,Underground city
Christie
Dovercourt Village,Dufferin
Little Portugal,Trinity
Brockton,Exhibition Place,Parkdale Village
High Park,The Junction South
Parkdale,Roncesvalles
Runnymede

In [180]:
# Check the size of the resulting data frame
print(Toronto_venues.shape)
Toronto_venues.head()

(1705, 7)


| | Neighborhood | Neighborhood Latitude | Neighborhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|---|
| 0 | The Beaches | 43.676357 | -79.293031 | Glen Manor Ravine | 43.676821 | -79.293942 | Trail |
| 1 | The Beaches | 43.676357 | -79.293031 | The Big Carrot Natural Food Market | 43.678879 | -79.297734 | Health Food Store |
| 2 | The Beaches | 43.676357 | -79.293031 | Grover Pub and Grub | 43.679181 | -79.297215 | Pub |
| 3 | The Beaches | 43.676357 | -79.293031 | Upper Beaches | 43.680563 | -79.292869 | Neighborhood |
| 4 | Riverdale,The Danforth West | 43.679557 | -79.352188 | Pantheon | 43.677621 | -79.351434 | Greek Restaurant |


#### Let's check how many venues were returned for each neighborhood

In [181]:
Toronto_venues.groupby('Neighborhood').count()

| Neighborhood | Neighborhood Latitude | Neighborhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|
| Adelaide,King,Richmond | 100 | 100 | 100 | 100 | 100 | 100 |
| Bathurst Quay,CN Tower,Harbourfront West,Island airport,King and Spadina,Railway Lands,South Niagara | 16 | 16 | 16 | 16 | 16 | 16 |
| Berczy Park | 55 | 55 | 55 | 55 | 55 | 55 |
| Brockton,Exhibition Place,Parkdale Village | 24 | 24 | 24 | 24 | 24 | 24 |
| Business Reply Mail Processing Centre 969 Eastern | 17 | 17 | 17 | 17 | 17 | 17 |
| Cabbagetown,St. James Town | 42 | 42 | 42 | 42 | 42 | 42 |
| Central Bay Street | 84 | 84 | 84 | 84 | 84 | 84 |
| Chinatown,Grange Park,Kensington Market | 80 | 80 | 80 | 80 | 80 | 80 |
| Christie | 18 | 18 | 18 | 18 | 18 | 18 |
| Church and Wellesley | 81 | 81 | 81 | 81 | 81 | 81 |


### Let's find out how many unique categories there are among the returned venues

In [182]:
print('There are {} unique categories.'.format(len(Toronto_venues['Venue Category'].unique())))

There are 229 unique categories.


## Analyze each neighborhood

In [183]:
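# Convert venue category to one hot encoding 
toronto_ohe = pd.get_dummies(Toronto_venues[['Venue Category']], prefix = '', prefix_sep = '')
# Foursquare also has a venue category called "Neighborhood"; drop that dummy column
# so it does not clash with the Neighborhood name column inserted in the next cell
toronto_ohe.drop('Neighborhood', axis = 1, inplace = True)
toronto_ohe.head()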

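| | Afghan Restaurant | Airport | Airport Food Court | Airport Gate | Airport Lounge | Airport Service | Airport Terminal | American Restaurant | Antique Shop | Aquarium | ... | Theme Restaurant | Toy / Game Store | Trail | Train Station | Vegetarian / Vegan Restaurant | Video Game Store | Vietnamese Restaurant | Wine Bar | Women's Store | Yoga Studio |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 3 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 4 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |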
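Why drop a column named `Neighborhood`? Foursquare has a venue category literally called "Neighborhood" (Upper Beaches above), so `get_dummies` creates a column with the same name as the one inserted next, and `DataFrame.insert` refuses to add a duplicate column. Here is a toy sketch of the encoding and the clash:

```python
import pandas as pd

venues = pd.DataFrame({
    'Neighborhood': ['The Beaches', 'The Beaches', 'Riverdale'],
    'Venue Category': ['Trail', 'Neighborhood', 'Greek Restaurant'],
})

# One column per category; prefix='' and prefix_sep='' keep the bare category names
ohe = pd.get_dummies(venues[['Venue Category']], prefix='', prefix_sep='')
print(ohe.columns.tolist())  # ['Greek Restaurant', 'Neighborhood', 'Trail']

# Drop the category column that would clash with the neighborhood name column
ohe = ohe.drop(columns='Neighborhood')
ohe.insert(0, 'Neighborhood', venues['Neighborhood'])
```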


In [184]:
# insert neighborhood column at the beginning in the dataframe
toronto_ohe.insert(0, 'Neighborhood', Toronto_venues['Neighborhood'])
toronto_ohe.head()

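| | Neighborhood | Afghan Restaurant | Airport | Airport Food Court | Airport Gate | Airport Lounge | Airport Service | Airport Terminal | American Restaurant | Antique Shop | ... | Theme Restaurant | Toy / Game Store | Trail | Train Station | Vegetarian / Vegan Restaurant | Video Game Store | Vietnamese Restaurant | Wine Bar | Women's Store | Yoga Studio |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | The Beaches | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | The Beaches | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 2 | The Beaches | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 3 | The Beaches | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 4 | Riverdale,The Danforth West | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |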


### Examine the size of the new data frame

In [185]:
toronto_ohe.shape

(1705, 229)

### Next, let's group the rows by neighborhood and take the mean of each category column, which gives the frequency of occurrence of each category

In [186]:
toronto_grouped = toronto_ohe.groupby('Neighborhood').mean().reset_index()
toronto_grouped

| | Neighborhood | Afghan Restaurant | Airport | Airport Food Court | Airport Gate | Airport Lounge | Airport Service | Airport Terminal | American Restaurant | Antique Shop | ... | Theme Restaurant | Toy / Game Store | Trail | Train Station | Vegetarian / Vegan Restaurant | Video Game Store | Vietnamese Restaurant | Wine Bar | Women's Store | Yoga Studio |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Adelaide,King,Richmond | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.02 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.02 | 0.0 | 0.0 | 0.01 | 0.01 | 0.0 |
| 1 | Bathurst Quay,CN Tower,Harbourfront West,Islan... | 0.0 | 0.0625 | 0.0625 | 0.0625 | 0.125 | 0.1875 | 0.125 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 2 | Berczy Park | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.018182 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 3 | Brockton,Exhibition Place,Parkdale Village | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.041667 |
| 4 | Business Reply Mail Processing Centre 969 Eastern | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 5 | Cabbagetown,St. James Town | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 6 | Central Bay Street | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.011905 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.011905 | 0.0 | 0.0 | 0.011905 | 0.0 | 0.011905 |
| 7 | Chinatown,Grange Park,Kensington Market | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.05 | 0.0 | 0.05 | 0.0125 | 0.0 | 0.0 |
| 8 | Christie | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 9 | Church and Wellesley | 0.012346 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.012346 | 0.0 | ... | 0.012346 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.012346 | 0.0 | 0.0 | 0.012346 |
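Apart from venues in the dropped "Neighborhood" category, each venue row has exactly one 1 among the category columns. The per-neighborhood mean of a column is therefore that category's share of the neighborhood's venues. For example, 0.1875 for Airport Service in the Bathurst Quay row means 3 of its 16 venues. Here is a toy check:

```python
import pandas as pd

# Toy one-hot rows: neighborhood A has 4 venues, B has 1
ohe = pd.DataFrame({
    'Neighborhood': ['A', 'A', 'A', 'A', 'B'],
    'Coffee Shop':  [1, 1, 0, 0, 0],
    'Park':         [0, 0, 1, 1, 1],
})

# Mean of 0/1 indicators = share of the neighborhood's venues in each category;
# each row sums to 1 here because every toy venue has exactly one category
freq = ohe.groupby('Neighborhood').mean().reset_index()
print(freq)
```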


In [187]:
# Check the new size after grouping
toronto_grouped.shape

(39, 229)

### Let's print each neighborhood along with the top 5 most common venues

In [188]:
# Iterate over the neighborhoods to find the top 5 most common venues
for hood in toronto_grouped.Neighborhood:
    print('---' + hood + '---')
    # transpose the neighborhood's row into a (venue, freq) table
    temp = toronto_grouped[toronto_grouped.Neighborhood == hood].T.reset_index()
    temp.columns = ['venue', 'freq']
    # drop the first row, which holds the neighborhood name rather than a frequency
    temp.drop(temp.index[0], inplace = True)
    print(temp.sort_values('freq', ascending = False).reset_index(drop = True)[0:5])
    print('\n')

---Adelaide,King,Richmond---
             venue  freq
0      Coffee Shop  0.07
1  Thai Restaurant  0.04
2             Café  0.04
3       Steakhouse  0.04
4              Bar  0.04


---Bathurst Quay,CN Tower,Harbourfront West,Island airport,King and Spadina,Railway Lands,South Niagara---
                venue    freq
0     Airport Service  0.1875
1      Airport Lounge   0.125
2    Airport Terminal   0.125
3  Airport Food Court  0.0625
4        Airport Gate  0.0625


---Berczy Park---
                venue       freq
0         Coffee Shop  0.0727273
1        Cocktail Bar  0.0545455
2  Seafood Restaurant  0.0363636
3         Cheese Shop  0.0363636
4          Steakhouse  0.0363636


---Brockton,Exhibition Place,Parkdale Village---
            venue       freq
0  Breakfast Spot  0.0833333
1            Café  0.0833333
2     Coffee Shop  0.0833333
3       Nightclub  0.0833333
4             Gym  0.0416667


---Business Reply Mail Processing Centre 969 Eastern---
              venue       freq


### Let's put the top 10 venue categories for each neighborhood into a pandas dataframe

First, let's write a function that sorts the venues in descending order of frequency.

In [191]:
def return_most_common_venues(row, top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:top_venues]
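The function assumes the neighborhood name is the first field of the row. Here is a quick usage sketch on a made-up row. Categories with equal frequency (the two 0.25 entries here) come out in no guaranteed order. When a neighborhood has fewer than `top_venues` non-zero categories, the remaining slots are filled with zero-frequency categories. That is why entries such as Dumpling Restaurant and Donut Shop trail the list for a sparse neighborhood like The Beaches.

```python
import pandas as pd

def return_most_common_venues(row, top_venues):
    # Same logic as the cell above: skip the first field (the neighborhood name)
    # and sort the remaining category frequencies in descending order
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    return row_categories_sorted.index.values[0:top_venues]

# Toy row shaped like one row of toronto_grouped (values are made up)
row = pd.Series({'Neighborhood': 'Example', 'Pub': 0.25, 'Trail': 0.5,
                 'Yoga Studio': 0.0, 'Health Food Store': 0.25})
top = list(return_most_common_venues(row, 3))
print(top)
```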

In [192]:
# Now let's create the new dataframe and display the top 10 venues for each neighborhood.
top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
top_venues_sorted = pd.DataFrame(columns=columns)
top_venues_sorted['Neighborhood'] = toronto_grouped['Neighborhood']

for ind in np.arange(toronto_grouped.shape[0]):
    top_venues_sorted.iloc[ind, 1:] = return_most_common_venues(toronto_grouped.iloc[ind, :], top_venues)


# See the top 5 entries
top_venues_sorted.head()

| | Neighborhood | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Adelaide,King,Richmond | Coffee Shop | Bar | Thai Restaurant | Café | Steakhouse | Asian Restaurant | Burger Joint | Breakfast Spot | Bakery | Cosmetics Shop |
| 1 | Bathurst Quay,CN Tower,Harbourfront West,Islan... | Airport Service | Airport Lounge | Airport Terminal | Coffee Shop | Harbor / Marina | Rental Car Location | Sculpture Garden | Bar | Boat or Ferry | Airport |
| 2 | Berczy Park | Coffee Shop | Cocktail Bar | Beer Bar | Farmers Market | Bakery | Seafood Restaurant | Steakhouse | Cheese Shop | Café | Greek Restaurant |
| 3 | Brockton,Exhibition Place,Parkdale Village | Breakfast Spot | Café | Nightclub | Coffee Shop | Yoga Studio | Pet Store | Stadium | Burrito Place | Restaurant | Climbing Gym |
| 4 | Business Reply Mail Processing Centre 969 Eastern | Skate Park | Auto Workshop | Brewery | Smoke Shop | Spa | Restaurant | Farmers Market | Fast Food Restaurant | Burrito Place | Recording Studio |
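The `try`/`except` in the column-naming loop relies on `indicators[ind]` raising an `IndexError` from the 4th position on. That is fine for 10 venues, but it would label position 21 as "21th". In case `top_venues` is ever raised past 20, here is a small sketch of a general ordinal helper (the name `ordinal` is hypothetical) that handles the teen exceptions:

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st', 11 -> '11th', 22 -> '22nd'."""
    if 10 <= n % 100 <= 20:
        suffix = 'th'
    else:
        suffix = {1: 'st', 2: 'nd', 3: 'rd'}.get(n % 10, 'th')
    return f'{n}{suffix}'

# Same column names as the cell above, for top_venues = 10
columns = ['Neighborhood'] + [f'{ordinal(i)} Most Common Venue' for i in range(1, 11)]
print(columns[:4])
```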


# Cluster Neighborhoods

Run k-means to cluster the neighborhoods into 3 clusters.

In [193]:
# import KMeans from sklearn
from sklearn.cluster import KMeans

# set the number of clusters
kclusters = 3

# Remove neighborhood column 
toronto_grouped_clustering = toronto_grouped.drop('Neighborhood', axis = 1)

# run k-means clustering
kmeans = KMeans(n_clusters = kclusters, random_state = 0).fit(toronto_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0], dtype=int32)
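k-means alternates between two steps. It assigns each row to its nearest centroid, then moves each centroid to the mean of its assigned rows. To make that concrete, here is a minimal NumPy sketch of Lloyd's algorithm on toy frequency rows. It illustrates the idea only and is not a drop-in replacement for scikit-learn's `KMeans`, which uses k-means++ initialisation by default and has an `n_init` option for restarts. The first ten labels above all being 0 already hints that most neighborhoods land in a single cluster.

```python
import numpy as np

def kmeans_lloyd(X, k, n_iter=100, seed=0):
    """Minimal Lloyd's k-means (illustrative; not scikit-learn's implementation)."""
    rng = np.random.default_rng(seed)
    # Initialise centroids with k distinct rows chosen at random
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assign each row to its nearest centroid (squared Euclidean distance)
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Move each centroid to the mean of its rows; keep it if its cluster is empty
        new_centroids = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
                                  for j in range(k)])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

# Toy "frequency" rows: two coffee-heavy neighborhoods and two park-heavy ones
X = np.array([[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.0, 1.0]])
labels, centroids = kmeans_lloyd(X, k=2)
print(labels)
```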

Let's create a new dataframe that includes the cluster label as well as the top 10 venues for each neighborhood.

In [194]:
# add clustering labels
top_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

toronto_merged = T_neigh

# join the cluster labels and top venues onto T_neigh, which already has latitude/longitude for each neighborhood
toronto_merged = toronto_merged.join(top_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

toronto_merged.head()  # check the last columns!

| | PostalCode | Borough | Neighborhood | Latitude | Longitude | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | M4E | East Toronto | The Beaches | 43.676357 | -79.293031 | 0 | Trail | Health Food Store | Pub | Yoga Studio | Deli / Bodega | Electronics Store | Eastern European Restaurant | Dumpling Restaurant | Donut Shop | Doner Restaurant |
| 1 | M4K | East Toronto | Riverdale,The Danforth West | 43.679557 | -79.352188 | 0 | Greek Restaurant | Coffee Shop | Italian Restaurant | Ice Cream Shop | Furniture / Home Store | Restaurant | Bubble Tea Shop | Grocery Store | Pub | Pizza Place |
| 2 | M4L | East Toronto | India Bazaar,The Beaches West | 43.668999 | -79.315572 | 0 | Park | Board Shop | Sushi Restaurant | Sandwich Place | Brewery | Liquor Store | Burger Joint | Italian Restaurant | Burrito Place | Fast Food Restaurant |
| 3 | M4M | East Toronto | Studio District | 43.659526 | -79.340923 | 0 | Café | Coffee Shop | Gastropub | Bakery | Italian Restaurant | Brewery | American Restaurant | Yoga Studio | Bookstore | Sandwich Place |
| 4 | M4N | Central Toronto | Lawrence Park | 43.72802 | -79.38879 | 0 | Lake | Swim School | Bus Line | Park | General Entertainment | Deli / Bodega | Eastern European Restaurant | Dumpling Restaurant | Donut Shop | Doner Restaurant |


In [195]:
# Check the shape of the merged data frame
toronto_merged.shape

(39, 16)

## Finally, let's visualize the resulting clusters on the map

In [196]:
# import useful libraries
import matplotlib.cm as cm
import matplotlib.colors as colors

# create map centred on Toronto
map_clusters = folium.Map(location = [lat, lng], zoom_start = 11)

# set color scheme for the clusters: one evenly spaced rainbow color per cluster
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map (n_lat/n_lon avoid overwriting the global lat/lng used to centre the maps)
for n_lat, n_lon, poi, cluster in zip(toronto_merged['Latitude'], toronto_merged['Longitude'], toronto_merged['Neighborhood'], toronto_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html = True)
    # cluster labels start at 0, so they index the palette directly
    folium.CircleMarker([n_lat, n_lon], radius = 5, popup = label, color = rainbow[cluster], fill = True, fill_color = rainbow[cluster],
                        fill_opacity = 0.7).add_to(map_clusters)

map_clusters
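If matplotlib is not available, here is a stdlib-only sketch that builds one distinct hex color per cluster label. It swaps in evenly spaced HSV hues from `colorsys` for matplotlib's rainbow colormap, so the exact colors differ. The helper name `cluster_palette` is hypothetical.

```python
import colorsys

def cluster_palette(k):
    """Return k evenly spaced hex colors (hues around the HSV color wheel)."""
    palette = []
    for i in range(k):
        r, g, b = colorsys.hsv_to_rgb(i / k, 0.8, 0.9)
        palette.append('#{:02x}{:02x}{:02x}'.format(round(r * 255), round(g * 255), round(b * 255)))
    return palette

palette = cluster_palette(3)
# A 0-based cluster label (as returned by KMeans) maps directly to palette[label]
label_colors = [palette[label] for label in [0, 1, 2, 0]]
print(palette)
```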

# Examine each cluster

In [197]:
# Cluster 1 (label 0)
toronto_merged.loc[toronto_merged['Cluster Labels'] == 0, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

| | Borough | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | East Toronto | 0 | Trail | Health Food Store | Pub | Yoga Studio | Deli / Bodega | Electronics Store | Eastern European Restaurant | Dumpling Restaurant | Donut Shop | Doner Restaurant |
| 1 | East Toronto | 0 | Greek Restaurant | Coffee Shop | Italian Restaurant | Ice Cream Shop | Furniture / Home Store | Restaurant | Bubble Tea Shop | Grocery Store | Pub | Pizza Place |
| 2 | East Toronto | 0 | Park | Board Shop | Sushi Restaurant | Sandwich Place | Brewery | Liquor Store | Burger Joint | Italian Restaurant | Burrito Place | Fast Food Restaurant |
| 3 | East Toronto | 0 | Café | Coffee Shop | Gastropub | Bakery | Italian Restaurant | Brewery | American Restaurant | Yoga Studio | Bookstore | Sandwich Place |
| 4 | Central Toronto | 0 | Lake | Swim School | Bus Line | Park | General Entertainment | Deli / Bodega | Eastern European Restaurant | Dumpling Restaurant | Donut Shop | Doner Restaurant |
| 5 | Central Toronto | 0 | Park | Department Store | Breakfast Spot | Sandwich Place | Food & Drink Shop | Hotel | Gym | Comic Shop | Dim Sum Restaurant | Eastern European Restaurant |
| 6 | Central Toronto | 0 | Clothing Store | Coffee Shop | Sporting Goods Shop | Salon / Barbershop | Restaurant | Rental Car Location | Café | Chinese Restaurant | Park | Mexican Restaurant |
| 7 | Central Toronto | 0 | Dessert Shop | Sandwich Place | Coffee Shop | Sushi Restaurant | Gym | Café | Italian Restaurant | Pizza Place | Brewery | Restaurant |
| 9 | Central Toronto | 0 | Coffee Shop | Pub | Pizza Place | Sushi Restaurant | Sports Bar | Fried Chicken Joint | Restaurant | American Restaurant | Supermarket | Liquor Store |
| 11 | Downtown Toronto | 0 | Restaurant | Coffee Shop | Italian Restaurant | Pizza Place | Bakery | Pub | Café | Butcher | Sandwich Place | Breakfast Spot |


## Cluster 2 (label 1)

In [198]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 1, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

| | Borough | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 8 | Central Toronto | 1 | Park | Playground | Tennis Court | Restaurant | Dance Studio | Eastern European Restaurant | Dumpling Restaurant | Donut Shop | Doner Restaurant | Dog Run |
| 10 | Downtown Toronto | 1 | Park | Playground | Trail | Dance Studio | Eastern European Restaurant | Dumpling Restaurant | Donut Shop | Doner Restaurant | Dog Run | Discount Store |


## Cluster 3 (label 2)

In [199]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 2, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

| | Borough | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 22 | Central Toronto | 2 | Health & Beauty Service | Pool | Garden | Yoga Studio | Deli / Bodega | Electronics Store | Eastern European Restaurant | Dumpling Restaurant | Donut Shop | Doner Restaurant |
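The per-cluster tables are easier to compare with a short summary: how many neighborhoods fall in each cluster, and which first-place venue is most frequent there. Here is a sketch on a toy frame shaped like `toronto_merged`. The neighborhood names and values are made up.

```python
import pandas as pd

merged = pd.DataFrame({
    'Neighborhood': ['N1', 'N2', 'N3', 'N4', 'N5'],
    'Cluster Labels': [0, 0, 0, 1, 2],
    '1st Most Common Venue': ['Coffee Shop', 'Coffee Shop', 'Café', 'Park', 'Health & Beauty Service'],
})

# Named aggregation: cluster size and the most frequent 1st-place venue per cluster
summary = merged.groupby('Cluster Labels').agg(
    neighborhoods=('Neighborhood', 'size'),
    top_first_venue=('1st Most Common Venue', lambda s: s.mode().iloc[0]),
)
print(summary)
```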


# Conclusion

### Cluster 1: The most common venues are Coffee Shop, Café, Italian Restaurant, Fast Food Restaurant, Bar and Pub, and they are plentiful

### Cluster 2: The most common venues are Park, Playground, Tennis Court and Dance Studio, and there are few of them

### Cluster 3: The most common venues are Health & Beauty Service, Pool and Garden, and there are very few of them