# Segmenting and Clustering of Neighbourhoods in Toronto

Before we get the data and start changing it, let's import the libraries that we will need.

In [152]:
import numpy as np
import pandas as pd

#### 1. Read data from the website and transform into a _pandas_ dataframe

In [153]:
# Reading the data and creating a data frame
url='https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'
df_wiki_can=pd.read_html(url, header=0)[0]

df_wiki_can.head()

Unnamed: 0,Postcode,Borough,Neighbourhood
0,M1A,Not assigned,Not assigned
1,M2A,Not assigned,Not assigned
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,Harbourfront


#### 2. Ignore cells with a borough that is Not assigned

In [154]:
# drop all rows in the data frame if value of Borough='Not assigned'
df_wiki_can = df_wiki_can.set_index("Borough")
df_wiki_can = df_wiki_can.drop("Not assigned", axis=0)

df_wiki_can.head()

Borough,Postcode,Neighbourhood
North York,M3A,Parkwoods
North York,M4A,Victoria Village
Downtown Toronto,M5A,Harbourfront
Downtown Toronto,M5A,Regent Park
North York,M6A,Lawrence Heights


In [155]:
# reset index from column Borough
df_wiki_can = df_wiki_can.reset_index()
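
The same filter can also be written as a boolean mask, which avoids moving `Borough` into the index and back. The following is a minimal sketch on a small hand-made frame; `toy` and `assigned` are illustrative names, not part of the notebook's data.

```python
import pandas as pd

# Illustrative stand-in for the scraped Wikipedia table
toy = pd.DataFrame({
    "Postcode": ["M1A", "M3A", "M5A"],
    "Borough": ["Not assigned", "North York", "Downtown Toronto"],
    "Neighbourhood": ["Not assigned", "Parkwoods", "Harbourfront"],
})

# Keep only rows whose Borough is assigned, then renumber the index from 0
assigned = toy[toy["Borough"] != "Not assigned"].reset_index(drop=True)
print(assigned)
```

Either form should give the same rows; the mask version simply leaves the column order untouched.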

#### 3. If a cell has a borough but a Not assigned neighborhood, then the neighborhood will be the same as the borough

In [156]:
# Replace Neighbourhood with Borough if Neighbourhood=='Not assigned'
df_wiki_can.Neighbourhood = np.where(df_wiki_can.Neighbourhood.eq('Not assigned'), df_wiki_can.Borough, df_wiki_can.Neighbourhood)

#### 4. If more than one row has the same Postcode, combine the rows, with the neighbourhoods separated by a comma

In [157]:
# grouping rows with same post code values
df_wiki_can = df_wiki_can.groupby(['Postcode','Borough'], sort=False).Neighbourhood.apply(','.join).reset_index(name='Neighbourhood')

In [158]:
df_wiki_can.head()

Unnamed: 0,Postcode,Borough,Neighbourhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,"Harbourfront,Regent Park"
3,M6A,North York,"Lawrence Heights,Lawrence Manor"
4,M7A,Queen's Park,Queen's Park


In [159]:
df_wiki_can.shape

(103, 3)
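
To see what the grouping step does in isolation, here is a small sketch on made-up rows that mirror the `M5A` case above. The `toy` frame is an assumption for illustration only.

```python
import pandas as pd

# Two rows share postcode M5A, one row has postcode M6A
toy = pd.DataFrame({
    "Postcode": ["M5A", "M5A", "M6A"],
    "Borough": ["Downtown Toronto", "Downtown Toronto", "North York"],
    "Neighbourhood": ["Harbourfront", "Regent Park", "Lawrence Heights"],
})

# Group on postcode and borough, then join each group's neighbourhoods with a comma
combined = (
    toy.groupby(["Postcode", "Borough"], sort=False)["Neighbourhood"]
    .apply(",".join)
    .reset_index(name="Neighbourhood")
)
print(combined)
```

`sort=False` keeps the groups in the order in which they first appear, which is why the real table still starts with `M3A`.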

#### 5. Read Latitude and Longitude data from csv and transform into a _pandas_ data frame

In [160]:
df_geo_can = pd.read_csv('https://cocl.us/Geospatial_data')

print ('Data read into a pandas dataframe!')

Data read into a pandas dataframe!


In [161]:
df_geo_can.shape

(103, 3)

#### 6. Match the column headings of both data frames

In [162]:
df_geo_can.rename(columns={'Postal Code':'Postcode'}, inplace=True)
df_geo_can.head()#check if changes applied

Unnamed: 0,Postcode,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


#### 7. Merge the two data frames and assign results to a new data frame

In [174]:
df_can_geo = df_wiki_can.merge(df_geo_can, on='Postcode')
df_can_geo.head()

Unnamed: 0,Postcode,Borough,Neighbourhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.753259,-79.329656
1,M4A,North York,Victoria Village,43.725882,-79.315572
2,M5A,Downtown Toronto,"Harbourfront,Regent Park",43.65426,-79.360636
3,M6A,North York,"Lawrence Heights,Lawrence Manor",43.718518,-79.464763
4,M7A,Queen's Park,Queen's Park,43.662301,-79.389494


In [175]:
df_can_geo.shape

(103, 5)
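
`merge` performs an inner join by default, so a postcode missing from either frame would be dropped silently. Here both inputs and the result have 103 rows, so nothing was lost. If you want to check this explicitly, a sketch along the following lines may help; the postcode `M9Z` and the borough `Hypothetical` are invented to force a mismatch.

```python
import pandas as pd

left = pd.DataFrame({
    "Postcode": ["M3A", "M4A", "M9Z"],   # M9Z is a made-up postcode
    "Borough": ["North York", "North York", "Hypothetical"],
})
right = pd.DataFrame({
    "Postcode": ["M3A", "M4A"],
    "Latitude": [43.753259, 43.725882],
    "Longitude": [-79.329656, -79.315572],
})

# A left join with indicator=True adds a _merge column that flags unmatched rows
checked = left.merge(right, on="Postcode", how="left", indicator=True)
unmatched = checked.loc[checked["_merge"] == "left_only", "Postcode"].tolist()
print(unmatched)
```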

### Exploring the Data

Before we explore the data, let's import and download all the dependencies that we will need.

In [167]:
import json # library to handle JSON files

#!conda install -c conda-forge geopy --yes # uncomment this line if you haven't completed the Foursquare API lab
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas import json_normalize # transform JSON file into a pandas dataframe (pandas >= 1.0; older versions import it from pandas.io.json)

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

#!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't completed the Foursquare API lab
import folium # map rendering library

print('Libraries imported.')


#### 8. Use geopy library to get the latitude and longitude values of Toronto, Canada

In [168]:
# get the latitude and longitude of Toronto
address = 'Toronto, Canada'
geolocator = Nominatim(user_agent='tr_explorer')
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Toronto are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Toronto are 43.653963, -79.387207.
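
geopy's `geocode` returns `None` when it finds no match, in which case `location.latitude` raises an `AttributeError`. A small guard makes that failure easier to read. This is only a sketch: `coordinates_or_raise` and `fake_geocode` are hypothetical helpers, and the fake stands in for `geolocator.geocode` so the example runs without network access.

```python
from types import SimpleNamespace


def coordinates_or_raise(geocode, address):
    """Call a geocode function and return (latitude, longitude), or raise if nothing was found."""
    location = geocode(address)
    if location is None:
        raise ValueError("No geocoding result for {!r}".format(address))
    return location.latitude, location.longitude


# Stand-in for geolocator.geocode (an assumption, used only to keep the sketch offline)
def fake_geocode(address):
    known = {"Toronto, Canada": SimpleNamespace(latitude=43.653963, longitude=-79.387207)}
    return known.get(address)


print(coordinates_or_raise(fake_geocode, "Toronto, Canada"))
```

In the notebook you would pass the real `geocode` method of the `Nominatim` object instead of `fake_geocode`.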


#### 9. Create a map of Toronto with neighbourhoods superimposed on top

In [176]:
# create the map of Toronto, Canada
map_toronto = folium.Map(location=[latitude, longitude], zoom_start=10)

# add one marker per postcode to the map
for lat, lng, borough, neighborhood in zip(df_can_geo['Latitude'], df_can_geo['Longitude'], df_can_geo['Borough'], df_can_geo['Neighbourhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='red',
        fill=True,
        fill_color='#3186cc',  # was '#31866cc', which is not a valid hex colour
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto)

map_toronto

In [177]:
df_can_geo['Borough'].unique()

array(['North York', 'Downtown Toronto', "Queen's Park", 'Etobicoke',
       'Scarborough', 'East York', 'York', 'East Toronto', 'West Toronto',
       'Central Toronto', 'Mississauga'], dtype=object)

#### 10. Simplify the map by segmenting and clustering only the neighbourhoods in Downtown Toronto

To do that, let's slice the original dataframe and create a new one.

In [204]:
df_toronto = df_can_geo[df_can_geo['Borough'] == 'Downtown Toronto'].reset_index(drop=True)
df_toronto.head()

Unnamed: 0,Postcode,Borough,Neighbourhood,Latitude,Longitude
0,M5A,Downtown Toronto,"Harbourfront,Regent Park",43.65426,-79.360636
1,M5B,Downtown Toronto,"Ryerson,Garden District",43.657162,-79.378937
2,M5C,Downtown Toronto,St. James Town,43.651494,-79.375418
3,M5E,Downtown Toronto,Berczy Park,43.644771,-79.373306
4,M5G,Downtown Toronto,Central Bay Street,43.657952,-79.387383


In [205]:
df_toronto.shape

(18, 5)

Let's get the coordinates of Downtown Toronto

In [206]:
address = 'Downtown Toronto, Toronto'

geolocator = Nominatim(user_agent="tr_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Downtown Toronto are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Downtown Toronto are 43.6541737, -79.3808116451341.


In [207]:
# create map of Downtown Toronto using latitude and longitude values
map_dt = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, label in zip(df_toronto['Latitude'], df_toronto['Longitude'], df_toronto['Neighbourhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='red',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_dt)  
    
map_dt

Next, we will use the Foursquare API to explore the neighbourhoods and segment them.

#### 11. Define Foursquare Credentials and Version

In [265]:
#REMOVED FOR SECURITY REASONS
CLIENT_ID = '' # your Foursquare ID
CLIENT_SECRET = '' # your Foursquare Secret
VERSION = '20190811' # Foursquare API version

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: 
CLIENT_SECRET:


#### 12. Let's explore a neighbourhood in the data frame

Let's get a neighbourhood name

In [209]:
nh_name = df_toronto.loc[2,'Neighbourhood']
print(nh_name)

St. James Town


Now, let's get the coordinate values of "St. James Town"

In [210]:
nh_lat = df_toronto.loc[2, 'Latitude'] # neighborhood latitude value
nh_lon = df_toronto.loc[2, 'Longitude'] # neighborhood longitude value

print('Latitude and longitude values of {} are {}, {}.'.format(nh_name, 
                                                               nh_lat, 
                                                               nh_lon))

Latitude and longitude values of St. James Town are 43.6514939, -79.3754179.


Now let's get up to 100 venues that are in St. James Town within a radius of 500 meters

In [226]:
LIMIT = 100 # limit of number of venues returned by Foursquare API

radius = 500 # define radius

url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    nh_lat, 
    nh_lon, 
    radius, 
    LIMIT)
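
The request URL can also be assembled from a dictionary of parameters, which takes care of escaping and avoids the stray `?&` in the format string. The sketch below uses only the standard library; `build_explore_url` is a hypothetical helper, and `"ID"` and `"SECRET"` are placeholders rather than real credentials.

```python
from urllib.parse import urlencode


def build_explore_url(client_id, client_secret, version, lat, lng, radius, limit):
    """Build a venues/explore URL with URL-encoded query parameters."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": "{},{}".format(lat, lng),
        "radius": radius,
        "limit": limit,
    }
    return "https://api.foursquare.com/v2/venues/explore?" + urlencode(params)


example_url = build_explore_url("ID", "SECRET", "20190811", 43.6514939, -79.3754179, 500, 100)
print(example_url)
```

With `requests`, passing the same dictionary as `params=` to `requests.get` achieves the same encoding.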

In [227]:
results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '5d4fc8923d0cad0030b1ad07'},
 'response': {'suggestedFilters': {'header': 'Tap to show:',
   'filters': [{'name': 'Open now', 'key': 'openNow'}]},
  'headerLocation': 'St. Lawrence',
  'headerFullLocation': 'St. Lawrence, Toronto',
  'headerLocationGranularity': 'neighborhood',
  'totalResults': 127,
  'suggestedBounds': {'ne': {'lat': 43.6559939045, 'lng': -79.36921018606671},
   'sw': {'lat': 43.646993895499996, 'lng': -79.3816256139333}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'reasons': {'count': 0,
       'items': [{'summary': 'This spot is popular',
         'type': 'general',
         'reasonName': 'globalInteractionReason'}]},
      'venue': {'id': '574ad72238fa943556d93b8e',
       'name': 'Gyu-Kaku Japanese BBQ',
       'location': {'address': '81 Church St',
        'crossStreet': 'at Adelaide St E',
        'lat': 43.651422275497914,
        'lng': -79.37504693687086,
        'labeledLatLngs'

In [228]:
# function that extracts the category of the venue
def get_category_type(row):
    # the column is called 'categories' or 'venue.categories' depending on how the JSON was flattened
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']

    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']
    
venues = results['response']['groups'][0]['items']
    
nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues =nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head()

Unnamed: 0,name,categories,lat,lng
0,Gyu-Kaku Japanese BBQ,Japanese Restaurant,43.651422,-79.375047
1,Crepe TO,Creperie,43.650063,-79.374587
2,Terroni,Italian Restaurant,43.650927,-79.375602
3,GEORGE Restaurant,Restaurant,43.653346,-79.374445
4,Pearl Diver,Gastropub,43.651481,-79.3736


In [229]:

print('{} venues were returned by Foursquare.'.format(nearby_venues.shape[0]))

100 venues were returned by Foursquare.


Let's create a function to repeat the process above with each neighbourhood in Downtown Toronto

In [230]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['nh_name', 
                  'nh_lat', 
                  'nh_lon', 
                  'Venue', 
                  'Venue Latitude', 'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
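
Note that `v['venue']['categories'][0]['name']` raises an `IndexError` for any venue that comes back without a category. It did not happen in this run, but a defensive lookup is cheap. The following is a sketch on made-up items shaped like the API response; `first_category_name` is a hypothetical helper.

```python
def first_category_name(venue, default=None):
    """Return the name of the venue's first category, or `default` if it has none."""
    categories = venue.get("categories") or []
    return categories[0]["name"] if categories else default


# Made-up items that mimic the structure of results['response']['groups'][0]['items']
items = [
    {"venue": {"name": "Cafe A", "categories": [{"name": "Coffee Shop"}]}},
    {"venue": {"name": "Unlabelled Spot", "categories": []}},
]

names = [first_category_name(item["venue"]) for item in items]
print(names)
```

Inside `getNearbyVenues`, the last element of the tuple could call such a helper instead of indexing directly.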

Now we write the code to run the above function on each neighborhood and create a new dataframe called dt_venues

In [231]:
dt_venues = getNearbyVenues(names=df_toronto['Neighbourhood'],
                                   latitudes=df_toronto['Latitude'],
                                   longitudes=df_toronto['Longitude']
                                  )

Harbourfront,Regent Park
Ryerson,Garden District
St. James Town
Berczy Park
Central Bay Street
Christie
Adelaide,King,Richmond
Harbourfront East,Toronto Islands,Union Station
Design Exchange,Toronto Dominion Centre
Commerce Court,Victoria Hotel
Harbord,University of Toronto
Chinatown,Grange Park,Kensington Market
CN Tower,Bathurst Quay,Island airport,Harbourfront West,King and Spadina,Railway Lands,South Niagara
Rosedale
Stn A PO Boxes 25 The Esplanade
Cabbagetown,St. James Town
First Canadian Place,Underground city
Church and Wellesley


In [232]:
dt_venues.shape

(1277, 7)

In [233]:
dt_venues.head()

Unnamed: 0,nh_name,nh_lat,nh_lon,Venue,Venue Latitude,Venue Longitude,Venue Category
0,"Harbourfront,Regent Park",43.65426,-79.360636,Roselle Desserts,43.653447,-79.362017,Bakery
1,"Harbourfront,Regent Park",43.65426,-79.360636,Tandem Coffee,43.653559,-79.361809,Coffee Shop
2,"Harbourfront,Regent Park",43.65426,-79.360636,Toronto Cooper Koo Family Cherry St YMCA Centre,43.653191,-79.357947,Gym / Fitness Center
3,"Harbourfront,Regent Park",43.65426,-79.360636,Body Blitz Spa East,43.654735,-79.359874,Spa
4,"Harbourfront,Regent Park",43.65426,-79.360636,Morning Glory Cafe,43.653947,-79.361149,Breakfast Spot


Let's check how many venues were returned for each neighbourhood

In [234]:
dt_venues.groupby('nh_name').count()

nh_name,nh_lat,nh_lon,Venue,Venue Latitude,Venue Longitude,Venue Category
"Adelaide,King,Richmond",100,100,100,100,100,100
Berczy Park,56,56,56,56,56,56
"CN Tower,Bathurst Quay,Island airport,Harbourfront West,King and Spadina,Railway Lands,South Niagara",16,16,16,16,16,16
"Cabbagetown,St. James Town",46,46,46,46,46,46
Central Bay Street,82,82,82,82,82,82
"Chinatown,Grange Park,Kensington Market",100,100,100,100,100,100
Christie,16,16,16,16,16,16
Church and Wellesley,82,82,82,82,82,82
"Commerce Court,Victoria Hotel",100,100,100,100,100,100
"Design Exchange,Toronto Dominion Centre",100,100,100,100,100,100


Let's check the number of unique categories

In [236]:
print('There are {} unique categories.'.format(len(dt_venues['Venue Category'].unique())))

There are 201 unique categories.


#### 13. Analyse Each Neighbourhood

In [239]:
# one hot encoding
dt_onehot = pd.get_dummies(dt_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
dt_onehot['Neighborhood'] = dt_venues['nh_name'] 

# move neighborhood column to the first column
fixed_columns = [dt_onehot.columns[-1]] + list(dt_onehot.columns[:-1])
dt_onehot = dt_onehot[fixed_columns]

dt_onehot.head()

Unnamed: 0,Yoga Studio,Afghan Restaurant,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Antique Shop,...,Theme Restaurant,Thrift / Vintage Store,Toy / Game Store,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Wine Bar,Wings Joint
0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [240]:
dt_onehot.shape

(1277, 201)
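
The preview above starts with `Yoga Studio` rather than `Neighborhood`, and the frame still has exactly 201 columns after a column was added. One possible explanation is that Foursquare returned a venue category that is itself called `Neighborhood`, so the assignment overwrote that dummy column instead of appending a new one. The sketch below reproduces that situation on made-up data and avoids the collision by inserting the key under a different name; `insert` raises a `ValueError` on a duplicate name instead of overwriting silently.

```python
import pandas as pd

# Made-up venues; one of them has the category "Neighborhood"
venues = pd.DataFrame({
    "nh_name": ["A", "A", "B"],
    "Venue Category": ["Bakery", "Neighborhood", "Park"],
})

onehot = pd.get_dummies(venues[["Venue Category"]], prefix="", prefix_sep="")

# Put the key column first in a single step, under a name no category uses
onehot.insert(0, "nh_name", venues["nh_name"])
print(onehot.columns.tolist())
```

With this layout the later grouping would use `groupby('nh_name')` rather than `groupby('Neighborhood')`.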

Next, let's group the rows by neighbourhood and take the mean of each category column, which gives the frequency of occurrence of each category.

In [242]:
dt_grouped = dt_onehot.groupby('Neighborhood').mean().reset_index()
dt_grouped

Unnamed: 0,Neighborhood,Yoga Studio,Afghan Restaurant,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,...,Theme Restaurant,Thrift / Vintage Store,Toy / Game Store,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Wine Bar,Wings Joint
0,"Adelaide,King,Richmond",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03,...,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0
1,Berczy Park,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.017857,0.0,0.0,0.0,0.0
2,"CN Tower,Bathurst Quay,Island airport,Harbourf...",0.0,0.0,0.0625,0.0625,0.0625,0.125,0.1875,0.125,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,"Cabbagetown,St. James Town",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.021739,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Central Bay Street,0.012195,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.012195,...,0.0,0.0,0.0,0.0,0.0,0.012195,0.0,0.0,0.012195,0.0
5,"Chinatown,Grange Park,Kensington Market",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.01,0.0,0.0,0.0,0.06,0.0,0.04,0.01,0.0
6,Christie,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
7,Church and Wellesley,0.012195,0.012195,0.0,0.0,0.0,0.0,0.0,0.0,0.012195,...,0.012195,0.0,0.0,0.0,0.0,0.0,0.012195,0.012195,0.0,0.012195
8,"Commerce Court,Victoria Hotel",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,...,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.01,0.0
9,"Design Exchange,Toronto Dominion Centre",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03,...,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.01,0.0


In [243]:
dt_grouped.shape

(18, 201)
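
Because every row of the one-hot frame holds a single 1, the mean of a column within a group is simply the share of that neighbourhood's venues in that category. For instance, the CN Tower row has 16 venues, so 0.0625 corresponds to 1 venue and 0.1875 to 3 venues. A minimal sketch with invented values:

```python
import pandas as pd

# Invented one-hot rows: neighbourhood A has 4 venues, B has 2
onehot = pd.DataFrame({
    "Neighborhood": ["A", "A", "A", "A", "B", "B"],
    "Coffee Shop":  [1, 1, 1, 0, 0, 1],
    "Park":         [0, 0, 0, 1, 1, 0],
})

# Mean of 0/1 indicators = fraction of venues in each category
freq = onehot.groupby("Neighborhood").mean().reset_index()
print(freq)
```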

Let's print each neighborhood along with the top 3 most common venues

In [244]:
num_top_venues = 3

for hood in dt_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = dt_grouped[dt_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Adelaide,King,Richmond----
         venue  freq
0  Coffee Shop  0.08
1         Café  0.05
2          Bar  0.04


----Berczy Park----
          venue  freq
0   Coffee Shop  0.09
1  Cocktail Bar  0.05
2      Beer Bar  0.04


----CN Tower,Bathurst Quay,Island airport,Harbourfront West,King and Spadina,Railway Lands,South Niagara----
              venue  freq
0   Airport Service  0.19
1    Airport Lounge  0.12
2  Airport Terminal  0.12


----Cabbagetown,St. James Town----
                venue  freq
0          Restaurant  0.07
1         Coffee Shop  0.07
2  Italian Restaurant  0.04


----Central Bay Street----
                venue  freq
0         Coffee Shop  0.15
1                Café  0.06
2  Italian Restaurant  0.05


----Chinatown,Grange Park,Kensington Market----
                           venue  freq
0                           Café  0.06
1  Vegetarian / Vegan Restaurant  0.06
2             Chinese Restaurant  0.05


----Christie----
           venue  freq
0           Café  0.19

Let's write a function to sort the venues in descending order.

In [245]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]
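
To make the behaviour concrete, here is the same function applied to a single made-up row. The first element is skipped because it holds the neighbourhood name; the remaining labels are returned in order of decreasing frequency. The row values are illustrative only.

```python
import pandas as pd


def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]  # skip the neighbourhood name
    row_categories_sorted = row_categories.sort_values(ascending=False)
    return row_categories_sorted.index.values[0:num_top_venues]


# One made-up row shaped like a row of dt_grouped
row = pd.Series({"Neighborhood": "Toy Town", "Bakery": 0.10, "Cafe": 0.25, "Park": 0.05})
top_two = list(return_most_common_venues(row, 2))
print(top_two)
```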

Now let's create the new dataframe and display the top 10 venues for each neighborhood.

In [246]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
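    try:
        # 1st, 2nd and 3rd take their suffix from the list
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        # every later position falls back to 'th'
        columns.append('{}th Most Common Venue'.format(ind+1))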

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = dt_grouped['Neighborhood']

for ind in np.arange(dt_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(dt_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,"Adelaide,King,Richmond",Coffee Shop,Café,Steakhouse,Bar,Thai Restaurant,Hotel,Breakfast Spot,Restaurant,Gym,Asian Restaurant
1,Berczy Park,Coffee Shop,Cocktail Bar,Bakery,Steakhouse,Cheese Shop,Beer Bar,Seafood Restaurant,Farmers Market,Café,French Restaurant
2,"CN Tower,Bathurst Quay,Island airport,Harbourf...",Airport Service,Airport Terminal,Airport Lounge,Coffee Shop,Boat or Ferry,Bar,Sculpture Garden,Boutique,Airport Gate,Airport
3,"Cabbagetown,St. James Town",Restaurant,Coffee Shop,Bakery,Café,Pizza Place,Flower Shop,Italian Restaurant,Pub,Sandwich Place,Jewelry Store
4,Central Bay Street,Coffee Shop,Café,Ice Cream Shop,Italian Restaurant,Burger Joint,Sandwich Place,Bubble Tea Shop,Chinese Restaurant,Salad Place,Bakery
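
The suffix lookup above is fine for ten columns, but it would label a 21st column as "21th". If you ever raise `num_top_venues`, a small helper handles the general case. This is a sketch; `ordinal` is a hypothetical name.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st', 11 -> '11th'."""
    if 10 <= n % 100 <= 20:
        suffix = "th"  # 10th to 20th, 110th to 120th, and so on
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return "{}{}".format(n, suffix)


columns = ["Neighborhood"] + ["{} Most Common Venue".format(ordinal(i)) for i in range(1, 11)]
print(columns[:4])
```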


### Cluster Neighbourhoods 

Run k-means to group the neighbourhoods into 5 clusters.

In [254]:
#set number of clusters
kclusters = 5

dt_clustered = dt_grouped.drop('Neighborhood', axis=1)

#run k-means clustering
kmeans = KMeans( n_clusters=kclusters, init='k-means++',random_state=0, n_init=10).fit(dt_clustered)

#check cluster labels
kmeans.labels_[0:10]

array([0, 0, 3, 0, 0, 2, 4, 0, 0, 0], dtype=int32)

Let's create a new dataframe that includes the cluster as well as the top 10 venues for each neighborhood.

In [256]:
# add clustering labels (run this cell only once; inserting the same column again raises a ValueError)
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

# merge the sorted venues into the Downtown Toronto dataframe on the neighbourhood name
dt_merged = df_toronto.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighbourhood')
dt_merged.head()

Unnamed: 0,Postcode,Borough,Neighbourhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M5A,Downtown Toronto,"Harbourfront,Regent Park",43.65426,-79.360636,0,Coffee Shop,Park,Bakery,Café,Breakfast Spot,Pub,Mexican Restaurant,Shoe Store,Brewery,Restaurant
1,M5B,Downtown Toronto,"Ryerson,Garden District",43.657162,-79.378937,0,Coffee Shop,Clothing Store,Cosmetics Shop,Café,Middle Eastern Restaurant,Bubble Tea Shop,Restaurant,Diner,Ice Cream Shop,Italian Restaurant
2,M5C,Downtown Toronto,St. James Town,43.651494,-79.375418,0,Coffee Shop,Café,Restaurant,Hotel,Italian Restaurant,Cosmetics Shop,Gastropub,Beer Bar,Breakfast Spot,Cocktail Bar
3,M5E,Downtown Toronto,Berczy Park,43.644771,-79.373306,0,Coffee Shop,Cocktail Bar,Bakery,Steakhouse,Cheese Shop,Beer Bar,Seafood Restaurant,Farmers Market,Café,French Restaurant
4,M5G,Downtown Toronto,Central Bay Street,43.657952,-79.387383,0,Coffee Shop,Café,Ice Cream Shop,Italian Restaurant,Burger Joint,Sandwich Place,Bubble Tea Shop,Chinese Restaurant,Salad Place,Bakery


Let's visualize the Clusters

In [257]:
# create map
map_cl = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(dt_merged['Latitude'], dt_merged['Longitude'], dt_merged['Neighbourhood'], dt_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    print(cluster)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[int(cluster)],  # one colour per cluster label, not the whole list
        fill=True,
        fill_color=rainbow[int(cluster)],
        fill_opacity=0.7).add_to(map_cl)
       
map_cl

0
0
0
0
0
4
0
0
0
0
2
2
3
1
0
0
0
0
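
The labels printed above can be tallied to see how the 18 neighbourhoods are spread over the clusters. The array below is copied from that output; the rest is a small sketch using NumPy.

```python
import numpy as np

# Cluster labels as printed by the map cell above, in df_toronto row order
labels = np.array([0, 0, 0, 0, 0, 4, 0, 0, 0, 0, 2, 2, 3, 1, 0, 0, 0, 0])

# Count how many neighbourhoods fall under each label 0..4
sizes = np.bincount(labels, minlength=5)
for cluster, size in enumerate(sizes):
    print("Cluster label {}: {} neighbourhoods".format(cluster, size))
```

Label 0 accounts for 13 of the 18 neighbourhoods, while labels 1, 3 and 4 each hold a single one, so most of Downtown Toronto ends up in one coffee-shop-dominated cluster.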


### Examine Clusters

#### Cluster 1

In [258]:
dt_merged.loc[dt_merged['Cluster Labels'] == 0, dt_merged.columns[[1] + list(range(5, dt_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Downtown Toronto,0,Coffee Shop,Park,Bakery,Café,Breakfast Spot,Pub,Mexican Restaurant,Shoe Store,Brewery,Restaurant
1,Downtown Toronto,0,Coffee Shop,Clothing Store,Cosmetics Shop,Café,Middle Eastern Restaurant,Bubble Tea Shop,Restaurant,Diner,Ice Cream Shop,Italian Restaurant
2,Downtown Toronto,0,Coffee Shop,Café,Restaurant,Hotel,Italian Restaurant,Cosmetics Shop,Gastropub,Beer Bar,Breakfast Spot,Cocktail Bar
3,Downtown Toronto,0,Coffee Shop,Cocktail Bar,Bakery,Steakhouse,Cheese Shop,Beer Bar,Seafood Restaurant,Farmers Market,Café,French Restaurant
4,Downtown Toronto,0,Coffee Shop,Café,Ice Cream Shop,Italian Restaurant,Burger Joint,Sandwich Place,Bubble Tea Shop,Chinese Restaurant,Salad Place,Bakery
6,Downtown Toronto,0,Coffee Shop,Café,Steakhouse,Bar,Thai Restaurant,Hotel,Breakfast Spot,Restaurant,Gym,Asian Restaurant
7,Downtown Toronto,0,Coffee Shop,Hotel,Aquarium,Italian Restaurant,Café,Scenic Lookout,Bakery,Pizza Place,Brewery,Sporting Goods Shop
8,Downtown Toronto,0,Coffee Shop,Café,Hotel,Restaurant,Italian Restaurant,Deli / Bodega,Gastropub,Bar,Bakery,Gym
9,Downtown Toronto,0,Coffee Shop,Hotel,Café,Restaurant,American Restaurant,Deli / Bodega,Gastropub,Bakery,Seafood Restaurant,Gym
14,Downtown Toronto,0,Coffee Shop,Restaurant,Café,Beer Bar,Seafood Restaurant,Hotel,Fast Food Restaurant,Cocktail Bar,Art Gallery,Breakfast Spot


#### Cluster 2

In [259]:
dt_merged.loc[dt_merged['Cluster Labels'] == 1, dt_merged.columns[[1] + list(range(5, dt_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
13,Downtown Toronto,1,Park,Playground,Trail,Building,Deli / Bodega,Dumpling Restaurant,Donut Shop,Doner Restaurant,Dog Run,Dive Bar


#### Cluster 3

In [260]:
dt_merged.loc[dt_merged['Cluster Labels'] == 2, dt_merged.columns[[1] + list(range(5, dt_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
10,Downtown Toronto,2,Café,Bookstore,Restaurant,Bar,Bakery,Japanese Restaurant,Gym,College Arts Building,Italian Restaurant,Poutine Place
11,Downtown Toronto,2,Vegetarian / Vegan Restaurant,Café,Bar,Chinese Restaurant,Vietnamese Restaurant,Dumpling Restaurant,Mexican Restaurant,Coffee Shop,Bakery,Caribbean Restaurant


#### Cluster 4

In [261]:
dt_merged.loc[dt_merged['Cluster Labels'] == 3, dt_merged.columns[[1] + list(range(5, dt_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
12,Downtown Toronto,3,Airport Service,Airport Terminal,Airport Lounge,Coffee Shop,Boat or Ferry,Bar,Sculpture Garden,Boutique,Airport Gate,Airport


#### Cluster 5

In [262]:
dt_merged.loc[dt_merged['Cluster Labels'] == 4, dt_merged.columns[[1] + list(range(5, dt_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
5,Downtown Toronto,4,Grocery Store,Café,Park,Nightclub,Diner,Italian Restaurant,Baby Store,Athletics & Sports,Restaurant,Coffee Shop
