# Battle of the Neighborhoods: Segmenting and Clustering Neighborhoods in Toronto

## Assignment Description

In this data science project, we explore and cluster the neighborhoods of Toronto and retrieve the most common venues for each cluster. The work consists of the following tasks:
1. Scrape the Toronto neighborhood data from the Wikipedia page https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M, clean and wrangle it, and read it into a pandas dataframe so that it is in a structured format for further analysis.

2. Retrieve the latitude and longitude of each postal code from the CSV file at https://cocl.us/Geospatial_data and join them to the neighborhoods by postal code.

3. Use the Foursquare API to retrieve venues in Toronto and explore the most common venue categories in each neighborhood.

4. Group the neighborhoods into clusters with the k-means algorithm, a form of unsupervised machine learning.

**Installing and importing Libraries**

In [1]:
!pip install beautifulsoup4
!pip install lxml
import requests # library to handle requests
import pandas as pd # library for data analysis
import numpy as np # library to handle data in a vectorized manner
import random # library for random number generation

!conda install -c conda-forge geopy --yes 
from geopy.geocoders import Nominatim # module to convert an address into latitude and longitude values

# libraries for displaying images
from IPython.display import Image 
from IPython.core.display import HTML 


from IPython.display import display_html

# function that flattens JSON into a pandas dataframe
from pandas import json_normalize

!conda install -c conda-forge folium=0.5.0 --yes
import folium # plotting library
from bs4 import BeautifulSoup
from sklearn.cluster import KMeans
import matplotlib.cm as cm
import matplotlib.colors as colors

print('Folium installed')
print('Libraries imported.')

Collecting package metadata (current_repodata.json): ...working... done
Solving environment: ...working... done

# All requested packages already installed.

Collecting package metadata (current_repodata.json): ...working... done
Solving environment: ...working... done

# All requested packages already installed.

Folium installed
Libraries imported.


**Webscraping and wrangling the data**

The Wikipedia page was scraped with BeautifulSoup and wrangled as follows. Cells whose borough is Not assigned were ignored. Postal codes that cover several neighborhoods were combined into one row, with the neighborhoods separated by commas. If a cell has a borough but its neighborhood is Not assigned, the neighborhood is set to the borough name. The result was then loaded into a pandas dataframe.
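The three wrangling rules can be illustrated on a toy table. This is a minimal sketch, assuming a long-format table with one row per postal code and neighborhood pair; the sample rows and the names `raw` and `clean` are made up for illustration, and this is not the scraping code used in this notebook.

```python
import pandas as pd

# Hypothetical sample rows, one row per (postal code, neighborhood) pair.
raw = pd.DataFrame({
    "PostalCode": ["M1A", "M3A", "M5A", "M5A", "M7A"],
    "Borough": ["Not assigned", "North York", "Downtown Toronto",
                "Downtown Toronto", "Queen's Park"],
    "Neighborhood": ["Not assigned", "Parkwoods", "Regent Park",
                     "Harbourfront", "Not assigned"],
})

# Rule 1: ignore rows whose borough is "Not assigned".
clean = raw[raw["Borough"] != "Not assigned"].copy()

# Rule 3: a missing neighborhood falls back to the borough name.
missing = clean["Neighborhood"] == "Not assigned"
clean.loc[missing, "Neighborhood"] = clean.loc[missing, "Borough"]

# Rule 2: combine neighborhoods that share a postal code into one row.
clean = (clean.groupby(["PostalCode", "Borough"])["Neighborhood"]
              .agg(", ".join)
              .reset_index())
print(clean)
```

Applying the fallback rule before grouping keeps the join from producing the literal text "Not assigned" inside a combined neighborhood string.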

In [2]:
url = "https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M"

data  = requests.get(url).text

soup = BeautifulSoup(data,"lxml")  # create a soup object using the variable 'data'

#Transform table data into a dataframe
table_contents=[] #create list
table=soup.find('table') #finding the table
for row in table.findAll('td'):
    cell = {}
    if row.span.text=='Not assigned':
        pass
    else:
        cell['PostalCode'] = row.p.text[:3]
        cell['Borough'] = (row.span.text).split('(')[0]
        cell['Neighborhood'] = (((((row.span.text).split('(')[1]).strip(')')).replace(' /',',')).replace(')',' ')).strip(' ')
        table_contents.append(cell)

# print(table_contents)
toronto_data=pd.DataFrame(table_contents)
toronto_data['Borough']=toronto_data['Borough'].replace({'Downtown TorontoStn A PO Boxes25 The Esplanade':'Downtown Toronto Stn A',
                                             'East TorontoBusiness reply mail Processing Centre969 Eastern':'East Toronto Business',
                                             'EtobicokeNorthwest':'Etobicoke Northwest','East YorkEast Toronto':'East York/East Toronto',
                                             'MississaugaCanada Post Gateway Processing Centre':'Mississauga'})
toronto_data.head()

Unnamed: 0,PostalCode,Borough,Neighborhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,"Regent Park, Harbourfront"
3,M6A,North York,"Lawrence Manor, Lawrence Heights"
4,M7A,Queen's Park,Ontario Provincial Government


In [3]:
toronto_data.shape

(103, 3)

**Latitude and longitude coordinates with their postal code**

Read the CSV file containing the latitude and longitude of each postal code into a dataframe.

In [4]:
lat_lon = pd.read_csv('https://cocl.us/Geospatial_data')
lat_lon.head()

Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


**Merge both dataframes on PostalCode**

In [5]:
lat_lon.rename(columns={'Postal Code':'PostalCode'},inplace=True)

In [6]:
toronto_data = pd.merge(lat_lon,toronto_data,on='PostalCode')
toronto_data = toronto_data [["PostalCode", "Borough", "Neighborhood","Latitude","Longitude"]]
toronto_data.head()

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude
0,M1B,Scarborough,"Malvern, Rouge",43.806686,-79.194353
1,M1C,Scarborough,"Rouge Hill, Port Union, Highland Creek",43.784535,-79.160497
2,M1E,Scarborough,"Guildwood, Morningside, West Hill",43.763573,-79.188711
3,M1G,Scarborough,Woburn,43.770992,-79.216917
4,M1H,Scarborough,Cedarbrae,43.773136,-79.239476
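`pd.merge` performs an inner join by default, so a postal code that appears in only one of the two tables is dropped without any message. The sketch below shows one way to surface such rows, using the `how`, `indicator` and `validate` parameters of `pd.merge`. The tables `coords` and `hoods` are toy stand-ins, and the `M9Z` row is invented for illustration.

```python
import pandas as pd

# Toy stand-ins for the two tables; the last row of hoods is invented.
coords = pd.DataFrame({"PostalCode": ["M1B", "M1C", "M1E"],
                       "Latitude": [43.806686, 43.784535, 43.763573],
                       "Longitude": [-79.194353, -79.160497, -79.188711]})
hoods = pd.DataFrame({"PostalCode": ["M1B", "M1C", "M9Z"],
                      "Borough": ["Scarborough", "Scarborough", "Hypothetical"],
                      "Neighborhood": ["Malvern, Rouge",
                                       "Rouge Hill, Port Union, Highland Creek",
                                       "Nowhere"]})

# An outer merge with indicator=True records which table each row came from;
# validate raises an error if a postal code is duplicated on either side.
check = pd.merge(coords, hoods, on="PostalCode", how="outer",
                 indicator=True, validate="one_to_one")
unmatched = check.loc[check["_merge"] != "both", "PostalCode"].tolist()
print(unmatched)
```

An empty `unmatched` list would confirm that the inner join loses no rows.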


**Retrieve the rows whose borough contains Toronto**

In [7]:
toronto_data = toronto_data[toronto_data['Borough'].str.contains(('Toronto'),regex=False)].reset_index(drop=True)
toronto_data.head()

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude
0,M4E,East Toronto,The Beaches,43.676357,-79.293031
1,M4J,East York/East Toronto,The Danforth East,43.685347,-79.338106
2,M4K,East Toronto,"The Danforth West, Riverdale",43.679557,-79.352188
3,M4L,East Toronto,"India Bazaar, The Beaches West",43.668999,-79.315572
4,M4M,East Toronto,Studio District,43.659526,-79.340923


**Visualizing the Boroughs from the dataframe using Folium**

In [8]:
map_toronto = folium.Map(location=[43.651070,-79.347015],zoom_start=10)

for lat,lng,borough,neighborhood in zip(toronto_data['Latitude'],toronto_data['Longitude'],toronto_data['Borough'],toronto_data['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
    [lat,lng],
    radius=5,
    popup=label,
    color='green',
    fill=True,
    fill_color='#3186cc',
    fill_opacity=0.7,
    parse_html=False).add_to(map_toronto)
map_toronto

**Utilizing the Foursquare API to explore the neighborhoods, get all the venues and segment them.**

In [9]:
#Define foursquare credentials and version
CLIENT_ID = 'XFXHAXFSM3TNXNKTWZH14XGOSRW5DWYNIAZYBF5FKBP5EW2K' # your Foursquare ID
CLIENT_SECRET = 'NTMB4MUHTB5XGMNTBEE4RVTHIJVQ4XIKGPWZNLKXUPZM5D1Y' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version
LIMIT = 100 # A default Foursquare API limit value

In [10]:
#Exploring the first neighborhood.
toronto_data.loc[0, 'Neighborhood']

'The Beaches'

In [11]:
to_latitude = toronto_data.loc[0, 'Latitude'] # neighborhood latitude value
to_longitude = toronto_data.loc[0, 'Longitude'] # neighborhood longitude value

to_name = toronto_data.loc[0, 'Neighborhood'] # neighborhood name

print('Latitude and longitude values of {} are {}, {}.'.format(to_name, 
                                                               to_latitude, 
                                                               to_longitude))

Latitude and longitude values of The Beaches are 43.67635739999999, -79.2930312.


**Now, let's get the top 100 venues that are in The Beaches within a radius of 500 meters using Foursquare.**

In [12]:
LIMIT = 100 # limit of number of venues returned by Foursquare API

radius = 500 # define radius


to_url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    to_latitude, 
    to_longitude, 
    radius, 
    LIMIT)
to_url # display URL

'https://api.foursquare.com/v2/venues/explore?&client_id=XFXHAXFSM3TNXNKTWZH14XGOSRW5DWYNIAZYBF5FKBP5EW2K&client_secret=NTMB4MUHTB5XGMNTBEE4RVTHIJVQ4XIKGPWZNLKXUPZM5D1Y&v=20180605&ll=43.67635739999999,-79.2930312&radius=500&limit=100'
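The request URL above is assembled with `str.format`. As an alternative sketch, the same query string can be built with the standard library's `urlencode`, which also escapes special characters. The function name `build_explore_url` and the placeholder credentials are assumptions for illustration; the endpoint and parameter names are the ones that appear in the URL above.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

def build_explore_url(client_id, client_secret, version, lat, lng,
                      radius=500, limit=100):
    """Build an explore URL with the same query parameters as the cell above."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": f"{lat},{lng}",
        "radius": radius,
        "limit": limit,
    }
    return "https://api.foursquare.com/v2/venues/explore?" + urlencode(params)

# Placeholder credentials, not real ones.
example_url = build_explore_url("YOUR_ID", "YOUR_SECRET", "20180605",
                                43.6763574, -79.2930312)

# Parse the query string back to confirm the parameters survived encoding.
query = parse_qs(urlsplit(example_url).query)
print(query["ll"], query["radius"], query["limit"])
```

Keeping the credentials as function arguments also makes it easier to load them from outside the notebook instead of writing them into a cell.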

In [13]:
to_results = requests.get(to_url).json()
to_results

{'meta': {'code': 200, 'requestId': '60fbfdfcb80a5173e5191037'},
 'response': {'headerLocation': 'The Beaches',
  'headerFullLocation': 'The Beaches, Toronto',
  'headerLocationGranularity': 'neighborhood',
  'totalResults': 4,
  'suggestedBounds': {'ne': {'lat': 43.680857404499996,
    'lng': -79.28682091449052},
   'sw': {'lat': 43.67185739549999, 'lng': -79.29924148550948}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'reasons': {'count': 0,
       'items': [{'summary': 'This spot is popular',
         'type': 'general',
         'reasonName': 'globalInteractionReason'}]},
      'venue': {'id': '4bd461bc77b29c74a07d9282',
       'name': 'Glen Manor Ravine',
       'location': {'address': 'Glen Manor',
        'crossStreet': 'Queen St.',
        'lat': 43.67682094413784,
        'lng': -79.29394208780985,
        'labeledLatLngs': [{'label': 'display',
          'lat': 43.67682094413784,
          'lng': -79.29394208780985}],
        'distanc
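As a sanity check on the 500 meter radius, the great-circle distance between the neighborhood coordinates and the first venue in the response can be computed with the haversine formula. This is a sketch under the usual spherical-Earth assumption (mean radius 6,371 km); the helper name `haversine_m` is hypothetical, and the coordinates are copied from the cells above.

```python
from math import asin, cos, radians, sin, sqrt

def haversine_m(lat1, lng1, lat2, lng2, earth_radius_m=6_371_000):
    """Great-circle distance in meters between two (lat, lng) points in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlmb = radians(lng2 - lng1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * earth_radius_m * asin(sqrt(a))

center = (43.67635739999999, -79.2930312)        # The Beaches
venue = (43.67682094413784, -79.29394208780985)  # Glen Manor Ravine
distance = haversine_m(*center, *venue)
print(round(distance), distance <= 500)
```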

In [14]:
# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

In [15]:
# Extract the venue items from the JSON and structure them into a dataframe
to_venues = to_results['response']['groups'][0]['items']
    
to_nearby_venues = json_normalize(to_venues) # flatten JSON

# filter columns
to_filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
to_nearby_venues =to_nearby_venues.loc[:, to_filtered_columns]

# filter the category for each row
to_nearby_venues['venue.categories'] = to_nearby_venues.apply(get_category_type, axis=1)

# clean columns
to_nearby_venues.columns = [col.split(".")[-1] for col in to_nearby_venues.columns]

print(to_nearby_venues.shape)
to_nearby_venues

(4, 4)



Unnamed: 0,name,categories,lat,lng
0,Glen Manor Ravine,Trail,43.676821,-79.293942
1,The Big Carrot Natural Food Market,Health Food Store,43.678879,-79.297734
2,Grover Pub and Grub,Pub,43.679181,-79.297215
3,Upper Beaches,Neighborhood,43.680563,-79.292869
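The flattening step can be tried without calling the API. The sketch below runs `pd.json_normalize` on two hand-written items shaped like the `items` list in the response above; the second item and all shortened values are invented, and it is included to show what happens when a venue has an empty category list.

```python
import pandas as pd

# Two hand-written items reduced to the fields this notebook uses.
items = [
    {"venue": {"name": "Glen Manor Ravine",
               "location": {"lat": 43.6768, "lng": -79.2939},
               "categories": [{"name": "Trail"}]}},
    {"venue": {"name": "Unnamed Spot",          # invented venue
               "location": {"lat": 43.6770, "lng": -79.2940},
               "categories": []}},
]

flat = pd.json_normalize(items)  # nested keys become dotted column names

# Take the first category name, or None when the list is empty.
flat["venue.categories"] = flat["venue.categories"].apply(
    lambda cats: cats[0]["name"] if cats else None)

# Keep only the last part of each dotted column name.
flat.columns = [col.split(".")[-1] for col in flat.columns]
print(flat)
```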


In [16]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return nearby_venues

In [17]:
toronto_venues = getNearbyVenues(names=toronto_data['Neighborhood'],
                                 latitudes=toronto_data['Latitude'],
                                 longitudes=toronto_data['Longitude']
                                  )

The Beaches
The Danforth  East
The Danforth West, Riverdale
India Bazaar, The Beaches West
Studio District
Lawrence Park
Davisville North
North Toronto West
Davisville
Moore Park, Summerhill East
Summerhill West, Rathnelly, South Hill, Forest Hill SE, Deer Park
Rosedale
St. James Town, Cabbagetown
Church and Wellesley
Regent Park, Harbourfront
Garden District, Ryerson
St. James Town
Berczy Park
Central Bay Street
Richmond, Adelaide, King
Harbourfront East, Union Station, Toronto Islands
Toronto Dominion Centre, Design Exchange
Commerce Court, Victoria Hotel
Roselawn
Forest Hill North & West
The Annex, North Midtown, Yorkville
University of Toronto, Harbord
Kensington Market, Chinatown, Grange Park
CN Tower, King and Spadina, Railway Lands, Harbourfront West, Bathurst Quay, South Niagara, Island airport
Enclave of M5E
First Canadian Place, Underground city
Christie
Dufferin, Dovercourt Village
Little Portugal, Trinity
Brockton, Parkdale Village, Exhibition Place
High Park, The Junction 

In [18]:
print(toronto_venues.shape)
toronto_venues.head()

(1596, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,The Beaches,43.676357,-79.293031,Glen Manor Ravine,43.676821,-79.293942,Trail
1,The Beaches,43.676357,-79.293031,The Big Carrot Natural Food Market,43.678879,-79.297734,Health Food Store
2,The Beaches,43.676357,-79.293031,Grover Pub and Grub,43.679181,-79.297215,Pub
3,The Beaches,43.676357,-79.293031,Upper Beaches,43.680563,-79.292869,Neighborhood
4,The Danforth East,43.685347,-79.338106,Danforth & Jones,43.684352,-79.334792,Intersection


In [19]:
toronto_venues.groupby('Neighborhood').count()

Unnamed: 0_level_0,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Berczy Park,56,56,56,56,56,56
"Brockton, Parkdale Village, Exhibition Place",23,23,23,23,23,23
"CN Tower, King and Spadina, Railway Lands, Harbourfront West, Bathurst Quay, South Niagara, Island airport",16,16,16,16,16,16
Central Bay Street,66,66,66,66,66,66
Christie,16,16,16,16,16,16
Church and Wellesley,80,80,80,80,80,80
"Commerce Court, Victoria Hotel",100,100,100,100,100,100
Davisville,35,35,35,35,35,35
Davisville North,8,8,8,8,8,8
"Dufferin, Dovercourt Village",16,16,16,16,16,16


In [20]:
print('There are {} unique categories.'.format(len(toronto_venues['Venue Category'].unique())))

There are 236 unique categories.


**Using one-hot encoding to find the top ten venues for each neighborhood**

In [21]:
#one hot encoding
toronto_onehot = pd.get_dummies(toronto_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
toronto_onehot['Neighborhood'] = toronto_venues['Neighborhood'] 

# move neighborhood column to the first column
mid = toronto_onehot['Neighborhood']
toronto_onehot.drop(labels=['Neighborhood'], axis=1, inplace = True)
toronto_onehot.insert(0,'Neighborhood', mid)

toronto_onehot

Unnamed: 0,Neighborhood,Adult Boutique,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Antique Shop,...,Tibetan Restaurant,Toy / Game Store,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Wine Bar,Wine Shop,Yoga Studio
0,The Beaches,0,0,0,0,0,0,0,0,0,...,0,0,1,0,0,0,0,0,0,0
1,The Beaches,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,The Beaches,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,The Beaches,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,The Danforth East,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1591,Enclave of M4L,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1592,Enclave of M4L,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1593,Enclave of M4L,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,1
1594,Enclave of M4L,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [22]:
toronto_grouped = toronto_onehot.groupby('Neighborhood').mean().reset_index()
toronto_grouped.head()

Unnamed: 0,Neighborhood,Adult Boutique,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Antique Shop,...,Tibetan Restaurant,Toy / Game Store,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Wine Bar,Wine Shop,Yoga Studio
0,Berczy Park,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.017857,0.0,0.0,0.0,0.0,0.0
1,"Brockton, Parkdale Village, Exhibition Place",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,"CN Tower, King and Spadina, Railway Lands, Har...",0.0,0.0625,0.0625,0.0625,0.125,0.1875,0.125,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Central Bay Street,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.015152,0.0,0.0,0.0,0.0,0.015152
4,Christie,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
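To see why the grouped mean yields category frequencies, here is a toy version of the two cells above. The table `venues` and its values are made up for illustration.

```python
import pandas as pd

# Toy venue table; neighborhoods and categories are illustrative only.
venues = pd.DataFrame({
    "Neighborhood": ["A", "A", "A", "A", "B", "B"],
    "Venue Category": ["Café", "Café", "Park", "Pub", "Park", "Park"],
})

# One indicator column per category, one row per venue.
onehot = pd.get_dummies(venues["Venue Category"], dtype=int)
onehot.insert(0, "Neighborhood", venues["Neighborhood"])

# The mean of 0/1 indicators is the share of venues in each category.
freq = onehot.groupby("Neighborhood").mean()
print(freq)
```

Each row of `freq` sums to 1, so neighborhoods with very few venues get coarse frequencies (a single venue yields a frequency of 1.0 for its category).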


In [23]:
toronto_num_top_venues = 5

for hood in toronto_grouped['Neighborhood']:
    print("----"+hood+"----")
    toronto_temp = toronto_grouped[toronto_grouped['Neighborhood'] == hood].T.reset_index()
    toronto_temp.columns = ['venue','freq']
    toronto_temp = toronto_temp.iloc[1:]
    toronto_temp['freq'] = toronto_temp['freq'].astype(float)
    toronto_temp = toronto_temp.round({'freq': 2})
    print(toronto_temp.sort_values('freq', ascending=False).reset_index(drop=True).head(toronto_num_top_venues))
    print('\n')

----Berczy Park----
                venue  freq
0         Coffee Shop  0.05
1              Bakery  0.05
2            Pharmacy  0.04
3      Farmers Market  0.04
4  Seafood Restaurant  0.04


----Brockton, Parkdale Village, Exhibition Place----
            venue  freq
0            Café  0.13
1  Breakfast Spot  0.09
2          Bakery  0.09
3     Coffee Shop  0.09
4    Climbing Gym  0.04


----CN Tower, King and Spadina, Railway Lands, Harbourfront West, Bathurst Quay, South Niagara, Island airport----
              venue  freq
0   Airport Service  0.19
1    Airport Lounge  0.12
2  Airport Terminal  0.12
3   Harbor / Marina  0.06
4           Airport  0.06


----Central Bay Street----
                venue  freq
0         Coffee Shop  0.18
1      Sandwich Place  0.06
2                Café  0.05
3  Italian Restaurant  0.05
4          Restaurant  0.03


----Christie----
           venue  freq
0  Grocery Store  0.25
1           Café  0.19
2           Park  0.12
3      Nightclub  0.06
4    Coff

In [24]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]
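When several categories have the same frequency, the order among them depends on the sorting algorithm, which may be why tied categories can appear in a different order between two listings. The variant below is a sketch, not the function used in this notebook: the name `top_categories` and the sample row are hypothetical, and it requests a stable sort so that ties keep their column order.

```python
import pandas as pd

def top_categories(row, num_top_venues):
    """Return the category names with the highest frequencies in one row."""
    freqs = row.iloc[1:].astype(float)  # skip the Neighborhood cell
    # A stable sort keeps tied categories in their original column order.
    ranked = freqs.sort_values(ascending=False, kind="stable")
    return ranked.index[:num_top_venues].tolist()

# Hypothetical row shaped like one row of toronto_grouped.
row = pd.Series({"Neighborhood": "Example", "Bakery": 0.05, "Café": 0.02,
                 "Coffee Shop": 0.05, "Park": 0.10})
print(top_categories(row, 3))
```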

In [25]:
toronto_num_top_venues = 10

toronto_indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
toronto_columns = ['Neighborhood']
for ind in np.arange(toronto_num_top_venues):
    try:
        toronto_columns.append('{}{} Most Common Venue'.format(ind+1, toronto_indicators[ind]))
    except IndexError:
        toronto_columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
toronto_venues_sorted = pd.DataFrame(columns=toronto_columns)
toronto_venues_sorted['Neighborhood'] = toronto_grouped['Neighborhood']

for ind in np.arange(toronto_grouped.shape[0]):
    toronto_venues_sorted.iloc[ind, 1:] = return_most_common_venues(toronto_grouped.iloc[ind, :], toronto_num_top_venues)

toronto_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Berczy Park,Bakery,Coffee Shop,Seafood Restaurant,Restaurant,Cocktail Bar,Cheese Shop,Pharmacy,Beer Bar,Farmers Market,Bistro
1,"Brockton, Parkdale Village, Exhibition Place",Café,Breakfast Spot,Bakery,Coffee Shop,Climbing Gym,Burrito Place,Italian Restaurant,Stadium,Intersection,Restaurant
2,"CN Tower, King and Spadina, Railway Lands, Har...",Airport Service,Airport Terminal,Airport Lounge,Harbor / Marina,Boat or Ferry,Rental Car Location,Boutique,Sculpture Garden,Airport Gate,Airport Food Court
3,Central Bay Street,Coffee Shop,Sandwich Place,Café,Italian Restaurant,Bubble Tea Shop,Salad Place,Burger Joint,Restaurant,Japanese Restaurant,Office
4,Christie,Grocery Store,Café,Park,Nightclub,Italian Restaurant,Baby Store,Candy Store,Athletics & Sports,Restaurant,Coffee Shop
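The column labels above rely on a three-element suffix list plus a fallback to "th", which is correct for ranks 1 to 10. If more ranks were ever needed, a general ordinal helper could be used instead. The function name `ordinal` is an assumption introduced for this sketch.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st'."""
    if 10 <= n % 100 <= 20:  # 11th, 12th and 13th are irregular
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"

# Rebuild the same column labels as the cell above.
columns = ["Neighborhood"] + [f"{ordinal(i)} Most Common Venue"
                              for i in range(1, 11)]
print(columns[1], "|", columns[-1])
```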


**Cluster the neighborhoods using K Means Clustering**

In [26]:
toronto_kclusters = 4

toronto_grouped_clustering = toronto_grouped.drop('Neighborhood', axis=1)

# run k-means clustering
toronto_kmeans = KMeans(n_clusters=toronto_kclusters, random_state=0).fit(toronto_grouped_clustering)

# check cluster labels generated for each row in the dataframe
toronto_kmeans.labels_

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1,
       0, 0, 0, 0, 1, 2, 0, 0, 0, 0, 0, 0, 0, 3, 0, 0, 0])
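The number of clusters is fixed by hand in the cell above. One common way to compare candidate values of k is the silhouette score. The sketch below assumes scikit-learn's `silhouette_score` and runs on synthetic points (the array `X` is a made-up stand-in, not the Toronto data), so it illustrates the procedure rather than a result for this notebook.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Synthetic stand-in for the clustering matrix: three tight, distant blobs.
rng = np.random.RandomState(0)
centers = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]])
X = np.vstack([c + 0.1 * rng.randn(20, 2) for c in centers])

# Fit k-means for several k and record the mean silhouette score of each fit.
scores = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels)

best_k = max(scores, key=scores.get)
print(best_k)
```

A higher score means points sit closer to their own cluster than to the nearest other cluster; on real data the scores are usually much less clear-cut than on these blobs.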

In [27]:
# unique value in target column
np.unique(toronto_kmeans.labels_)

array([0, 1, 2, 3])
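Counting how many neighborhoods fall into each label gives a quick view of how balanced the clusters are. A small sketch, with the label array copied from the k-means output above:

```python
import numpy as np

# Cluster labels copied from the k-means output shown above.
labels = np.array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1,
                   0, 0, 0, 0, 1, 2, 0, 0, 0, 0, 0, 0, 0, 3, 0, 0, 0])

# np.bincount returns the number of occurrences of each non-negative integer.
cluster_sizes = np.bincount(labels)
for cluster, size in enumerate(cluster_sizes):
    print(f"cluster {cluster}: {size} neighborhoods")
```

Label 0 covers 34 of the 39 neighborhoods, so the other three labels are better read as small groups of outliers than as broad segments.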

In [28]:
# add clustering labels
toronto_venues_sorted.insert(0, 'Cluster Labels', toronto_kmeans.labels_)

toronto_merged = toronto_data

# join the sorted venue table to toronto_data so each neighborhood keeps its latitude/longitude
toronto_merged = toronto_merged.join(toronto_venues_sorted.set_index('Neighborhood'), on='Neighborhood',how='inner')
print(toronto_merged.shape)
toronto_merged.head() # check the last columns!

(39, 16)


Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M4E,East Toronto,The Beaches,43.676357,-79.293031,0,Pub,Health Food Store,Trail,Yoga Studio,Discount Store,Distribution Center,Dog Run,Doner Restaurant,Donut Shop,Eastern European Restaurant
1,M4J,East York/East Toronto,The Danforth East,43.685347,-79.338106,3,Intersection,Park,Convenience Store,Yoga Studio,Diner,Event Space,Ethiopian Restaurant,Escape Room,Electronics Store,Eastern European Restaurant
2,M4K,East Toronto,"The Danforth West, Riverdale",43.679557,-79.352188,0,Greek Restaurant,Italian Restaurant,Coffee Shop,Ice Cream Shop,Furniture / Home Store,Yoga Studio,Indian Restaurant,Caribbean Restaurant,Spa,Pub
3,M4L,East Toronto,"India Bazaar, The Beaches West",43.668999,-79.315572,0,Sandwich Place,Park,Pub,Burrito Place,Italian Restaurant,Fast Food Restaurant,Intersection,Restaurant,Fish & Chips Shop,Steakhouse
4,M4M,East Toronto,Studio District,43.659526,-79.340923,0,Coffee Shop,Bakery,Café,Gastropub,American Restaurant,Brewery,Stationery Store,Fish Market,Italian Restaurant,Convenience Store


**Visualizing the clusters**

In [29]:
# create map
toronto_map_clusters = folium.Map(location=[to_latitude, to_longitude], zoom_start=11)

# set color scheme for the clusters
xs = np.arange(toronto_kclusters)
y = [j + xs + (j*xs)**2 for j in range(toronto_kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(y)))
rainbow = [colors.rgb2hex(j) for j in colors_array]

# add markers to the map
markers_colors = []
for latit, lonit, pos, t_cluster in zip(toronto_merged['Latitude'], toronto_merged['Longitude'], toronto_merged['Neighborhood'], toronto_merged['Cluster Labels']):
    label = folium.Popup(str(pos) + ' Cluster ' + str(t_cluster), parse_html=True)
    folium.CircleMarker(
        [latit, lonit],
        radius=5,
        popup=label,
        color=rainbow[t_cluster-1],
        fill=True,
        fill_color=rainbow[t_cluster-1],
        fill_opacity=0.7).add_to(toronto_map_clusters)
       
toronto_map_clusters

In [30]:
toronto_cluster1= toronto_merged.loc[toronto_merged['Cluster Labels'] == 0, toronto_merged.columns[[1,2] + list(range(5, toronto_merged.shape[1]))]]
toronto_cluster1  

Unnamed: 0,Borough,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,East Toronto,The Beaches,0,Pub,Health Food Store,Trail,Yoga Studio,Discount Store,Distribution Center,Dog Run,Doner Restaurant,Donut Shop,Eastern European Restaurant
2,East Toronto,"The Danforth West, Riverdale",0,Greek Restaurant,Italian Restaurant,Coffee Shop,Ice Cream Shop,Furniture / Home Store,Yoga Studio,Indian Restaurant,Caribbean Restaurant,Spa,Pub
3,East Toronto,"India Bazaar, The Beaches West",0,Sandwich Place,Park,Pub,Burrito Place,Italian Restaurant,Fast Food Restaurant,Intersection,Restaurant,Fish & Chips Shop,Steakhouse
4,East Toronto,Studio District,0,Coffee Shop,Bakery,Café,Gastropub,American Restaurant,Brewery,Stationery Store,Fish Market,Italian Restaurant,Convenience Store
5,Central Toronto,Lawrence Park,0,Park,Swim School,Bus Line,Business Service,Falafel Restaurant,Ethiopian Restaurant,Escape Room,Electronics Store,Eastern European Restaurant,Dumpling Restaurant
6,Central Toronto,Davisville North,0,Gym / Fitness Center,Hotel,Dance Studio,Department Store,Sandwich Place,Breakfast Spot,Food & Drink Shop,Park,General Entertainment,Gay Bar
7,Central Toronto,North Toronto West,0,Coffee Shop,Cosmetics Shop,Clothing Store,Sporting Goods Shop,Gym / Fitness Center,Fast Food Restaurant,Diner,Metro Station,Mexican Restaurant,Park
8,Central Toronto,Davisville,0,Sandwich Place,Dessert Shop,Italian Restaurant,Gym,Sushi Restaurant,Café,Pizza Place,Coffee Shop,Farmers Market,Pharmacy
10,Central Toronto,"Summerhill West, Rathnelly, South Hill, Forest...",0,Coffee Shop,Bagel Shop,Café,Liquor Store,Vietnamese Restaurant,Supermarket,Sushi Restaurant,Bank,Light Rail Station,American Restaurant
12,Downtown Toronto,"St. James Town, Cabbagetown",0,Pizza Place,Café,Coffee Shop,Italian Restaurant,Park,Pub,Bakery,Restaurant,Japanese Restaurant,Butcher


In [31]:
# stack the ten "Most Common Venue" columns of this cluster into one Series
# (pd.concat replaces Series.append, which was removed in pandas 2.0)
toronto_venues1 = pd.concat([toronto_cluster1[col] for col in toronto_columns[1:]])

print(toronto_venues1.value_counts().head(10))

Coffee Shop            23
Café                   23
Restaurant             17
Italian Restaurant     12
Bakery                 11
Pub                    10
Park                    9
Hotel                   9
Japanese Restaurant     9
Bar                     7
dtype: int64


In [32]:
toronto_cluster2=toronto_merged.loc[toronto_merged['Cluster Labels'] == 1, toronto_merged.columns[[1,2] + list(range(5, toronto_merged.shape[1]))]]
toronto_cluster2

Unnamed: 0,Borough,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
9,Central Toronto,"Moore Park, Summerhill East",1,Gym,Park,Trail,Department Store,Ethiopian Restaurant,Escape Room,Electronics Store,Eastern European Restaurant,Dumpling Restaurant,Donut Shop
11,Downtown Toronto,Rosedale,1,Park,Playground,Trail,Department Store,Ethiopian Restaurant,Escape Room,Electronics Store,Eastern European Restaurant,Dumpling Restaurant,Donut Shop
24,Central Toronto,Forest Hill North & West,1,Park,Jewelry Store,Trail,Sushi Restaurant,Dessert Shop,Ethiopian Restaurant,Escape Room,Electronics Store,Eastern European Restaurant,Dumpling Restaurant


In [33]:
# stack the ten "Most Common Venue" columns of this cluster into one Series
# (pd.concat replaces Series.append, which was removed in pandas 2.0)
toronto_venues2 = pd.concat([toronto_cluster2[col] for col in toronto_columns[1:]])

print(toronto_venues2.value_counts().head(10))

Eastern European Restaurant    3
Trail                          3
Dumpling Restaurant            3
Escape Room                    3
Park                           3
Ethiopian Restaurant           3
Electronics Store              3
Department Store               2
Donut Shop                     2
Sushi Restaurant               1
dtype: int64


In [34]:
toronto_cluster3=toronto_merged.loc[toronto_merged['Cluster Labels'] == 2, toronto_merged.columns[[1,2] + list(range(5, toronto_merged.shape[1]))]]
toronto_cluster3

Unnamed: 0,Borough,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
23,Central Toronto,Roselawn,2,Garden,Home Service,Fast Food Restaurant,Comic Shop,Concert Hall,Falafel Restaurant,Event Space,Ethiopian Restaurant,Escape Room,Electronics Store


In [35]:
# stack the ten "Most Common Venue" columns of this cluster into one Series
# (pd.concat replaces Series.append, which was removed in pandas 2.0)
toronto_venues3 = pd.concat([toronto_cluster3[col] for col in toronto_columns[1:]])

print(toronto_venues3.value_counts().head(10))

Escape Room             1
Comic Shop              1
Concert Hall            1
Falafel Restaurant      1
Electronics Store       1
Ethiopian Restaurant    1
Garden                  1
Event Space             1
Home Service            1
Fast Food Restaurant    1
dtype: int64


In [36]:
toronto_cluster4=toronto_merged.loc[toronto_merged['Cluster Labels'] == 3, toronto_merged.columns[[1,2] + list(range(5, toronto_merged.shape[1]))]]
toronto_cluster4

Unnamed: 0,Borough,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,East York/East Toronto,The Danforth East,3,Intersection,Park,Convenience Store,Yoga Studio,Diner,Event Space,Ethiopian Restaurant,Escape Room,Electronics Store,Eastern European Restaurant


In [37]:
# stack the ten "Most Common Venue" columns of this cluster into one Series
# (pd.concat replaces Series.append, which was removed in pandas 2.0)
toronto_venues4 = pd.concat([toronto_cluster4[col] for col in toronto_columns[1:]])

print(toronto_venues4.value_counts().head(10))

Intersection                   1
Event Space                    1
Eastern European Restaurant    1
Convenience Store              1
Park                           1
Yoga Studio                    1
Electronics Store              1
Ethiopian Restaurant           1
Diner                          1
Escape Room                    1
dtype: int64
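The four per-cluster tallies above repeat the same steps. As a sketch of a more compact alternative, the venue columns can be reshaped to long format with `melt` and counted for all clusters in one pass. The table `merged` is a toy stand-in for `toronto_merged` with only two rank columns, and its values are illustrative.

```python
import pandas as pd

# Toy stand-in for toronto_merged with two venue-rank columns.
merged = pd.DataFrame({
    "Cluster Labels": [0, 0, 1],
    "1st Most Common Venue": ["Coffee Shop", "Café", "Park"],
    "2nd Most Common Venue": ["Café", "Coffee Shop", "Trail"],
})

venue_cols = [c for c in merged.columns if c.endswith("Most Common Venue")]

# Reshape to one row per (cluster, venue) pair, then count per cluster.
long_form = merged.melt(id_vars="Cluster Labels", value_vars=venue_cols,
                        value_name="Venue")
counts = long_form.groupby("Cluster Labels")["Venue"].value_counts()
print(counts)
```

The result is indexed by cluster label and venue category, so `counts.loc[0]` gives the tally for the first cluster.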
