

<h1 align=center><font size = 5>Segmenting and Clustering Neighborhoods in TORONTO City</font></h1>


## Introduction

In this assignment, first we will build the code to scrape the following Wikipedia page, https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M, 
in order to obtain the neighborhood data that is in the table of wikipedia page and to transform the data into a pandas dataframe.
Next, we will build the code to obtain the Latitudes and Longitudes using the .csv data from the link https://cocl.us/Geospatial_data for the postal codes in the dataframe

Later,we will use the Foursquare API to explore neighborhoods in Toronto City. We will use the **explore** function 
to get the most common venue categories in each neighborhood, and then use this feature to group the neighborhoods into clusters.
You will use the *k*-means clustering algorithm to complete this task. Finally, we will use 
the Folium library to visualize the neighborhoods in Toronto City and their emerging clusters.

In [None]:
!conda install -c conda-forge bs4 --yes
from bs4 import BeautifulSoup
import requests
import pandas as pd
!conda install -c conda-forge geopy --yes # uncomment this line if you haven't completed the Foursquare API lab
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values
#!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't completed the Foursquare API lab
import folium

#### We will obtained the dataframe 'df' containing the postal codes from the wikipedia page 

In [3]:
website_url = requests.get('https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M').text
soup = BeautifulSoup(website_url,'html.parser')
My_table = soup.find('table',{'class':'wikitable sortable'})
links = My_table.findAll('td')
PostalCode = []
Borough= []
Neighborhood = []
incr = 0
for link in links:
    val = incr%3
    incr = incr+1
    if val == 0:
        PostalCode.append(link.text.strip())
    elif val == 1:
        Borough.append(link.text.strip())
    else:
        Neighborhood.append(link.text.strip())
df = pd.DataFrame()
df['PostalCode'] = PostalCode
df['Borough'] = Borough
df['Neighborhood'] = Neighborhood
df.drop(df.loc[df['Borough']=='Not assigned'].index, inplace=True)
New_Set=[]
for x in df['Neighborhood']:
     z = x.replace('/',',')
     New_Set.append(z)
df['New_Column'] = New_Set
df.drop(columns='Neighborhood', inplace=True)
df.rename(columns={"New_Column": "Neighborhood"}, inplace = True)
df.reset_index(inplace = True)
df.drop(columns='index', inplace=True)

#### We will read the latitude and longitude values corresponding to the postal codes from the link mentioned earlier into a dataframe 'df2'

In [4]:
df2 = pd.read_csv('https://cocl.us/Geospatial_data')
df2
latitude_arr = []
longitude_arr = []
for x in df['PostalCode']:
    y = x
    zcca = df2.loc[df2['Postal Code'] == y,'Latitude'].iloc[0]
    zccb = df2.loc[df2['Postal Code'] == y,'Longitude'].iloc[0]
    latitude_arr.append(zcca)
    longitude_arr.append(zccb)
    
df['Latitude'] = latitude_arr
df['Longitude'] = longitude_arr
print('The dataframe has {} boroughs and {} neighborhoods.'.format(len(df['Borough'].unique()), df.shape[0]))

The dataframe has 10 boroughs and 103 neighborhoods.


#### Use geopy library to get the latitude and longitude values of Toronto City.

In [5]:
address = 'Toronto'

geolocator = Nominatim(user_agent="To_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geograpical coordinate of Toronto City are {}, {}.'.format(latitude, longitude))

The geograpical coordinate of Toronto City are 43.6534817, -79.3839347.


#### Create a map of Toronto with neighborhoods superimposed on top.

In [49]:
neighborhoods = pd.DataFrame()
neighborhoods = df
map_toronto = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, borough, neighborhood in zip(neighborhoods['Latitude'], neighborhoods['Longitude'], neighborhoods['Borough'], neighborhoods['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto)  
    
map_toronto

However, for illustration purposes, let's simplify the above map and segment and cluster only the neighborhoods in West Toronto. So let's slice the original dataframe and create a new dataframe of the Toronto data.

In [7]:
West_toronto_data = neighborhoods[neighborhoods['Borough'] == 'West Toronto'].reset_index(drop=True)
West_toronto_data.head()

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude
0,M6H,West Toronto,"Dufferin , Dovercourt Village",43.669005,-79.442259
1,M6J,West Toronto,"Little Portugal , Trinity",43.647927,-79.41975
2,M6K,West Toronto,"Brockton , Parkdale Village , Exhibition Place",43.636847,-79.428191
3,M6P,West Toronto,"High Park , The Junction South",43.661608,-79.464763
4,M6R,West Toronto,"Parkdale , Roncesvalles",43.64896,-79.456325


In [8]:
address = 'West Toronto'

geolocator = Nominatim(user_agent="to_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geograpical coordinate of West Toronto are {}, {}.'.format(latitude, longitude))

The geograpical coordinate of West Toronto are 43.6534817, -79.3839347.


#### As we did with all of Toronto City, let's visualize West Toronto and the neighborhoods in it.

In [9]:
map_west_toronto = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, label in zip(West_toronto_data['Latitude'], West_toronto_data['Longitude'], West_toronto_data['Neighborhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_west_toronto)  
    
map_west_toronto

Next, we are going to start utilizing the Foursquare API to explore the neighborhoods and segment them.
#### Define Foursquare Credentials and Version

In [51]:
CLIENT_ID = 'UM4NTMFYN2G5YFHWER4HJOF2K0XWOLUS3D5WWAT5IUDCWJVY' # your Foursquare ID
CLIENT_SECRET = 'CGEV55C4FUQNJO5JNQR53QYCU1EYLXDY5IBOIUBG3LBZAYHY' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

print('Your credentails:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentails:
CLIENT_ID: UM4NTMFYN2G5YFHWER4HJOF2K0XWOLUS3D5WWAT5IUDCWJVY
CLIENT_SECRET:CGEV55C4FUQNJO5JNQR53QYCU1EYLXDY5IBOIUBG3LBZAYHY


In [12]:
West_toronto_data.loc[0, 'Neighborhood']

'Dufferin , Dovercourt Village'

In [13]:
neighborhood_latitude = West_toronto_data.loc[0, 'Latitude'] # neighborhood latitude value
neighborhood_longitude = West_toronto_data.loc[0, 'Longitude'] # neighborhood longitude value

neighborhood_name = West_toronto_data.loc[0, 'Neighborhood'] # neighborhood name

print('Latitude and longitude values of {} are {}, {}.'.format(neighborhood_name, 
                                                               neighborhood_latitude, 
                                                               neighborhood_longitude))

Latitude and longitude values of Dufferin , Dovercourt Village are 43.66900510000001, -79.4422593.


In [14]:
LIMIT = 100
radius = 500
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    neighborhood_latitude, 
    neighborhood_longitude, 
    radius, 
    LIMIT)
url

'https://api.foursquare.com/v2/venues/explore?&client_id=UM4NTMFYN2G5YFHWER4HJOF2K0XWOLUS3D5WWAT5IUDCWJVY&client_secret=CGEV55C4FUQNJO5JNQR53QYCU1EYLXDY5IBOIUBG3LBZAYHY&v=20180605&ll=43.66900510000001,-79.4422593&radius=500&limit=100'

In [15]:
results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '5ea30ca1aba297001b6f7ca1'},
 'response': {'suggestedFilters': {'header': 'Tap to show:',
   'filters': [{'name': 'Open now', 'key': 'openNow'}]},
  'headerLocation': 'Davenport',
  'headerFullLocation': 'Davenport, Toronto',
  'headerLocationGranularity': 'neighborhood',
  'totalResults': 16,
  'suggestedBounds': {'ne': {'lat': 43.67350510450001,
    'lng': -79.43604977526607},
   'sw': {'lat': 43.664505095500004, 'lng': -79.44846882473394}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'reasons': {'count': 0,
       'items': [{'summary': 'This spot is popular',
         'type': 'general',
         'reasonName': 'globalInteractionReason'}]},
      'venue': {'id': '5753753b498eeb535c53aed5',
       'name': 'The Greater Good Bar',
       'location': {'address': '229 Geary St',
        'crossStreet': 'at Dufferin St',
        'lat': 43.669409,
        'lng': -79.439267,
        'labeledLatLngs': [{'label': 'disp

In [16]:
def get_category_type(row):
    try:
        categories_list = row['categories']
    except:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

In [17]:
import json
from pandas.io.json import json_normalize

venues = results['response']['groups'][0]['items']
    
nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues =nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head()

  


Unnamed: 0,name,categories,lat,lng
0,The Greater Good Bar,Bar,43.669409,-79.439267
1,Parallel,Middle Eastern Restaurant,43.669516,-79.438728
2,Planet Fitness,Gym / Fitness Center,43.667588,-79.442574
3,FreshCo,Grocery Store,43.667918,-79.440754
4,Blood Brothers Brewing,Brewery,43.669944,-79.436533


In [18]:
print('{} venues were returned by Foursquare.'.format(nearby_venues.shape[0]))

16 venues were returned by Foursquare.


## 2. Explore Neighborhoods in West Toronto

In [None]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                             'Neighborhood Latitude', 
                             'Neighborhood Longitude', 
                             'Venue', 
                             'Venue Latitude', 
                             'Venue Longitude', 
                             'Venue Category']
    
    return(nearby_venues)

In [20]:
West_toronto_venues = getNearbyVenues(names = West_toronto_data['Neighborhood'],
                                   latitudes = West_toronto_data['Latitude'],
                                   longitudes = West_toronto_data['Longitude']
                                  )

Dufferin , Dovercourt Village
Little Portugal , Trinity
Brockton , Parkdale Village , Exhibition Place
High Park , The Junction South
Parkdale , Roncesvalles
Runnymede , Swansea


In [22]:
print(West_toronto_venues.shape)
West_toronto_venues.head()

(151, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,"Dufferin , Dovercourt Village",43.669005,-79.442259,The Greater Good Bar,43.669409,-79.439267,Bar
1,"Dufferin , Dovercourt Village",43.669005,-79.442259,Parallel,43.669516,-79.438728,Middle Eastern Restaurant
2,"Dufferin , Dovercourt Village",43.669005,-79.442259,Planet Fitness,43.667588,-79.442574,Gym / Fitness Center
3,"Dufferin , Dovercourt Village",43.669005,-79.442259,FreshCo,43.667918,-79.440754,Grocery Store
4,"Dufferin , Dovercourt Village",43.669005,-79.442259,Blood Brothers Brewing,43.669944,-79.436533,Brewery


In [23]:
West_toronto_venues.groupby('Neighborhood').count()

Unnamed: 0_level_0,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
"Brockton , Parkdale Village , Exhibition Place",22,22,22,22,22,22
"Dufferin , Dovercourt Village",16,16,16,16,16,16
"High Park , The Junction South",22,22,22,22,22,22
"Little Portugal , Trinity",41,41,41,41,41,41
"Parkdale , Roncesvalles",13,13,13,13,13,13
"Runnymede , Swansea",37,37,37,37,37,37


In [24]:
print('There are {} uniques categories.'.format(len(West_toronto_venues['Venue Category'].unique())))

There are 77 uniques categories.


## 3. Analyze Each Neighborhood

In [25]:
# one hot encoding
West_toronto_onehot = pd.get_dummies(West_toronto_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
West_toronto_onehot['Neighborhood'] = West_toronto_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [West_toronto_onehot.columns[-1]] + list(West_toronto_onehot.columns[:-1])
West_toronto_onehot = West_toronto_onehot[fixed_columns]

West_toronto_onehot.head()

Unnamed: 0,Neighborhood,Art Gallery,Arts & Crafts Store,Asian Restaurant,Bakery,Bank,Bar,Beer Store,Bookstore,Boutique,...,Supermarket,Sushi Restaurant,Tea Room,Thai Restaurant,Theater,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Wine Bar,Wine Shop,Yoga Studio
0,"Dufferin , Dovercourt Village",0,0,0,0,0,1,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,"Dufferin , Dovercourt Village",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,"Dufferin , Dovercourt Village",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,"Dufferin , Dovercourt Village",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,"Dufferin , Dovercourt Village",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [26]:
West_toronto_onehot.shape

(151, 78)

#### Next, let's group rows by neighborhood and by taking the mean of the frequency of occurrence of each category

In [27]:
West_toronto_grouped = West_toronto_onehot.groupby('Neighborhood').mean().reset_index()
West_toronto_grouped

Unnamed: 0,Neighborhood,Art Gallery,Arts & Crafts Store,Asian Restaurant,Bakery,Bank,Bar,Beer Store,Bookstore,Boutique,...,Supermarket,Sushi Restaurant,Tea Room,Thai Restaurant,Theater,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Wine Bar,Wine Shop,Yoga Studio
0,"Brockton , Parkdale Village , Exhibition Place",0.0,0.0,0.0,0.045455,0.0,0.045455,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,"Dufferin , Dovercourt Village",0.0,0.0,0.0,0.125,0.0625,0.0625,0.0,0.0,0.0,...,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0
2,"High Park , The Junction South",0.0,0.045455,0.0,0.045455,0.0,0.045455,0.0,0.045455,0.0,...,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.0
3,"Little Portugal , Trinity",0.02439,0.0,0.04878,0.0,0.0,0.097561,0.02439,0.0,0.02439,...,0.0,0.0,0.0,0.0,0.02439,0.04878,0.04878,0.02439,0.0,0.02439
4,"Parkdale , Roncesvalles",0.0,0.0,0.0,0.0,0.0,0.076923,0.0,0.076923,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,"Runnymede , Swansea",0.0,0.0,0.0,0.0,0.027027,0.027027,0.0,0.027027,0.027027,...,0.0,0.054054,0.027027,0.0,0.0,0.027027,0.0,0.0,0.0,0.027027


In [28]:
West_toronto_grouped.shape

(6, 78)

#### Let's print each neighborhood along with the top 5 most common venues

In [29]:
num_top_venues = 5

for hood in West_toronto_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = West_toronto_grouped[West_toronto_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Brockton , Parkdale Village , Exhibition Place----
            venue  freq
0            Café  0.14
1     Coffee Shop  0.09
2  Breakfast Spot  0.09
3      Restaurant  0.05
4       Nightclub  0.05


----Dufferin , Dovercourt Village----
                  venue  freq
0              Pharmacy  0.12
1                Bakery  0.12
2  Gym / Fitness Center  0.06
3         Grocery Store  0.06
4           Pizza Place  0.06


----High Park , The Junction South----
                       venue  freq
0            Thai Restaurant  0.09
1         Mexican Restaurant  0.09
2                       Café  0.09
3  Cajun / Creole Restaurant  0.05
4                  Speakeasy  0.05


----Little Portugal , Trinity----
                   venue  freq
0                    Bar  0.10
1             Restaurant  0.07
2            Men's Store  0.05
3       Asian Restaurant  0.05
4  Vietnamese Restaurant  0.05


----Parkdale , Roncesvalles----
                         venue  freq
0                    Gift Shop  0.15


In [30]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [32]:
import numpy as np 
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = West_toronto_grouped['Neighborhood']

for ind in np.arange(West_toronto_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(West_toronto_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,"Brockton , Parkdale Village , Exhibition Place",Café,Breakfast Spot,Coffee Shop,Nightclub,Convenience Store,Italian Restaurant,Performing Arts Venue,Pet Store,Climbing Gym,Intersection
1,"Dufferin , Dovercourt Village",Pharmacy,Bakery,Music Venue,Pizza Place,Park,Café,Middle Eastern Restaurant,Brewery,Gym / Fitness Center,Wine Shop
2,"High Park , The Junction South",Mexican Restaurant,Thai Restaurant,Café,Speakeasy,Fast Food Restaurant,Music Venue,Italian Restaurant,Cajun / Creole Restaurant,Flea Market,Diner
3,"Little Portugal , Trinity",Bar,Restaurant,Men's Store,Asian Restaurant,Vietnamese Restaurant,Vegetarian / Vegan Restaurant,Café,French Restaurant,Korean Restaurant,Juice Bar
4,"Parkdale , Roncesvalles",Gift Shop,Cuban Restaurant,Italian Restaurant,Movie Theater,Dog Run,Eastern European Restaurant,Coffee Shop,Restaurant,Breakfast Spot,Dessert Shop


## 4. Cluster Neighborhoods
Run *k*-means to cluster the neighborhood into 3 clusters.

In [34]:
from sklearn.cluster import KMeans
kclusters = 3

West_toronto_grouped_clustering = West_toronto_grouped.drop('Neighborhood', 1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(West_toronto_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10]

array([1, 2, 1, 1, 0, 1], dtype=int32)

In [35]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

West_toronto_merged = West_toronto_data

# merge toronto_grouped with toronto_data to add latitude/longitude for each neighborhood
West_toronto_merged = West_toronto_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

West_toronto_merged.head() # check the last columns!

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M6H,West Toronto,"Dufferin , Dovercourt Village",43.669005,-79.442259,2,Pharmacy,Bakery,Music Venue,Pizza Place,Park,Café,Middle Eastern Restaurant,Brewery,Gym / Fitness Center,Wine Shop
1,M6J,West Toronto,"Little Portugal , Trinity",43.647927,-79.41975,1,Bar,Restaurant,Men's Store,Asian Restaurant,Vietnamese Restaurant,Vegetarian / Vegan Restaurant,Café,French Restaurant,Korean Restaurant,Juice Bar
2,M6K,West Toronto,"Brockton , Parkdale Village , Exhibition Place",43.636847,-79.428191,1,Café,Breakfast Spot,Coffee Shop,Nightclub,Convenience Store,Italian Restaurant,Performing Arts Venue,Pet Store,Climbing Gym,Intersection
3,M6P,West Toronto,"High Park , The Junction South",43.661608,-79.464763,1,Mexican Restaurant,Thai Restaurant,Café,Speakeasy,Fast Food Restaurant,Music Venue,Italian Restaurant,Cajun / Creole Restaurant,Flea Market,Diner
4,M6R,West Toronto,"Parkdale , Roncesvalles",43.64896,-79.456325,0,Gift Shop,Cuban Restaurant,Italian Restaurant,Movie Theater,Dog Run,Eastern European Restaurant,Coffee Shop,Restaurant,Breakfast Spot,Dessert Shop


Finally, let's visualize the resulting clusters

In [37]:
import matplotlib.cm as cm
import matplotlib.colors as colors
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(West_toronto_merged['Latitude'], West_toronto_merged['Longitude'], West_toronto_merged['Neighborhood'], West_toronto_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

## 5. Examine Clusters
Now, we will examine each cluster and determine the discriminating venue categories that distinguish each cluster.


#### Cluster 1

In [39]:
West_toronto_merged.loc[West_toronto_merged['Cluster Labels'] == 0, West_toronto_merged.columns[[2] + list(range(5, West_toronto_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
4,"Parkdale , Roncesvalles",0,Gift Shop,Cuban Restaurant,Italian Restaurant,Movie Theater,Dog Run,Eastern European Restaurant,Coffee Shop,Restaurant,Breakfast Spot,Dessert Shop


#### Cluster 2

In [40]:
West_toronto_merged.loc[West_toronto_merged['Cluster Labels'] == 1, West_toronto_merged.columns[[2] + list(range(5, West_toronto_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,"Little Portugal , Trinity",1,Bar,Restaurant,Men's Store,Asian Restaurant,Vietnamese Restaurant,Vegetarian / Vegan Restaurant,Café,French Restaurant,Korean Restaurant,Juice Bar
2,"Brockton , Parkdale Village , Exhibition Place",1,Café,Breakfast Spot,Coffee Shop,Nightclub,Convenience Store,Italian Restaurant,Performing Arts Venue,Pet Store,Climbing Gym,Intersection
3,"High Park , The Junction South",1,Mexican Restaurant,Thai Restaurant,Café,Speakeasy,Fast Food Restaurant,Music Venue,Italian Restaurant,Cajun / Creole Restaurant,Flea Market,Diner
5,"Runnymede , Swansea",1,Café,Coffee Shop,Italian Restaurant,Pizza Place,Pub,Sushi Restaurant,Latin American Restaurant,Dessert Shop,Juice Bar,Diner


#### Cluster 3

In [41]:
West_toronto_merged.loc[West_toronto_merged['Cluster Labels'] == 2, West_toronto_merged.columns[[2] + list(range(5, West_toronto_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,"Dufferin , Dovercourt Village",2,Pharmacy,Bakery,Music Venue,Pizza Place,Park,Café,Middle Eastern Restaurant,Brewery,Gym / Fitness Center,Wine Shop
