## Segmenting and clustering neighborhoods in Toronto

In this notebook, I will use the locations of Toronto's neighborhoods to explore their most common venues. Those venues will then be used as features to group the neighborhoods into clusters, and the Folium library will be used to visualize the neighborhoods and the clusters. Data sources:
* Table of Toronto postcodes, boroughs, and neighborhoods: https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M
* Table of the latitude and longitude of each postal code: https://cocl.us/Geospatial_data

In [5]:
import numpy as np
import pandas as pd 

In [6]:
dfs = pd.read_html('https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M', header=0)
# select the first table in the list of tables read from the Wikipedia page
df = pd.DataFrame(dfs[0])
# getting rid of the rows that don't have an assigned borough 
df = df[df.Borough!='Not assigned']
df.head()

Unnamed: 0,Postcode,Borough,Neighbourhood
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,Harbourfront
5,M5A,Downtown Toronto,Regent Park
6,M6A,North York,Lawrence Heights


Since more than one neighbourhood can share a postal code area, rows with the same postcode are grouped and their neighbourhoods are combined into a single comma-separated entry in the Neighbourhood column.

In [8]:
# group by postcode: keep the first borough and join the neighbourhood names with ', '
df = df.groupby('Postcode').agg({'Borough':'first',
                             'Neighbourhood': ', '.join}).reset_index()
df.head()

Unnamed: 0,Postcode,Borough,Neighbourhood
0,M1B,Scarborough,"Rouge, Malvern"
1,M1C,Scarborough,"Highland Creek, Rouge Hill, Port Union"
2,M1E,Scarborough,"Guildwood, Morningside, West Hill"
3,M1G,Scarborough,Woburn
4,M1H,Scarborough,Cedarbrae
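As a toy check of the grouping step, the sketch below runs the same `groupby(...).agg(...)` on three rows taken from the tables above. Passing the bound method `', '.join` as the aggregation function concatenates the neighbourhood names within each postcode; `'first'` assumes that all rows of a postcode share one borough, which holds for the rows shown here.

```python
import pandas as pd

# toy rows in the same shape as the Wikipedia table (values taken from the output above)
raw = pd.DataFrame({
    'Postcode': ['M5A', 'M5A', 'M3A'],
    'Borough': ['Downtown Toronto', 'Downtown Toronto', 'North York'],
    'Neighbourhood': ['Harbourfront', 'Regent Park', 'Parkwoods'],
})

# one row per postcode: keep the first borough, join all neighbourhood names with ', '
grouped = raw.groupby('Postcode').agg({'Borough': 'first',
                                       'Neighbourhood': ', '.join}).reset_index()
print(grouped)
```

`groupby` sorts the postcodes, so M3A comes before M5A in the result, matching the sorted order seen in the grouped table.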


Where a postcode has a borough but its neighbourhood is 'Not assigned', the neighbourhood is replaced with the borough name. Only one such row exists:

In [9]:
df[df.Neighbourhood=='Not assigned']

Unnamed: 0,Postcode,Borough,Neighbourhood
85,M7A,Queen's Park,Not assigned


In [10]:
# replace 'Not assigned' neighbourhoods with the borough name of the same row
df['Neighbourhood'] = df['Neighbourhood'].mask(df['Neighbourhood'] == 'Not assigned', df['Borough'])
df.head()

Unnamed: 0,Postcode,Borough,Neighbourhood
0,M1B,Scarborough,"Rouge, Malvern"
1,M1C,Scarborough,"Highland Creek, Rouge Hill, Port Union"
2,M1E,Scarborough,"Guildwood, Morningside, West Hill"
3,M1G,Scarborough,Woburn
4,M1H,Scarborough,Cedarbrae


In [11]:
df.shape

(103, 3)

### Part 2

To use the Foursquare location data to explore the venues in each neighborhood, the latitude and longitude of each neighborhood are needed. Latitude and Longitude columns are therefore added to the table of postcodes, boroughs, and neighborhoods, using:
* Table of the latitude and longitude of each postal code: https://cocl.us/Geospatial_data

In [12]:
import io
import requests

In [14]:
url = "https://cocl.us/Geospatial_data"
s = requests.get(url).content
cord = pd.read_csv(io.StringIO(s.decode('utf-8')))
cord.head()

Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


In [15]:
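# join the coordinates on the postal code instead of relying on both tables sharing the same row order
df = df.merge(cord, left_on='Postcode', right_on='Postal Code').drop(columns='Postal Code')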

In [19]:
print(df.shape)
df.head()

(103, 5)


Unnamed: 0,Postcode,Borough,Neighbourhood,Latitude,Longitude
0,M1B,Scarborough,"Rouge, Malvern",43.806686,-79.194353
1,M1C,Scarborough,"Highland Creek, Rouge Hill, Port Union",43.784535,-79.160497
2,M1E,Scarborough,"Guildwood, Morningside, West Hill",43.763573,-79.188711
3,M1G,Scarborough,Woburn,43.770992,-79.216917
4,M1H,Scarborough,Cedarbrae,43.773136,-79.239476
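Copying a column from one DataFrame into another aligns rows by index label, not by postcode, so it only gives correct coordinates when both tables list the postcodes in the same order (as both tables here happen to, starting M1B, M1C, M1E). A small sketch with the order deliberately swapped, using two rows from the outputs above, shows the difference between positional assignment and a key-based merge.

```python
import pandas as pd

# postcode table and coordinate table listing the same postcodes in different row orders
hoods = pd.DataFrame({'Postcode': ['M1B', 'M1C'],
                      'Neighbourhood': ['Rouge, Malvern', 'Highland Creek, Rouge Hill, Port Union']})
coords = pd.DataFrame({'Postal Code': ['M1C', 'M1B'],
                       'Latitude': [43.784535, 43.806686],
                       'Longitude': [-79.160497, -79.194353]})

# positional assignment aligns on the row index, not on the postcode
positional = hoods.copy()
positional['Latitude'] = coords['Latitude']

# merging on the postcode key keeps each coordinate with its own postcode
merged = hoods.merge(coords, left_on='Postcode', right_on='Postal Code').drop(columns='Postal Code')
print(positional)
print(merged)
```

In `positional`, M1B receives M1C's latitude; in `merged` each postcode keeps its own coordinates.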


### Part 3

Below I will explore and cluster the neighborhoods in Toronto.  

In [20]:
!conda install -c conda-forge folium=0.5.0 --yes 
!conda install -c conda-forge geopy --yes

Collecting package metadata (repodata.json): ...working... done
Solving environment: ...working... done

# All requested packages already installed.


Collecting package metadata (repodata.json): ...working... done
Solving environment: ...working... done

# All requested packages already installed.


In [21]:
import folium
from geopy.geocoders import Nominatim
import matplotlib.cm as cm
import matplotlib.colors as colors
import json
from sklearn.cluster import KMeans

The geopy library is used to look up the latitude and longitude of Toronto, which serve as the center of the folium map.

In [22]:
address = 'Toronto, Ontario'

geolocator = Nominatim(user_agent="TO_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Toronto are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Toronto are 43.653963, -79.387207.


### Map of Toronto with the neighborhoods superimposed

In [23]:
map_toronto = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, name in zip(df['Latitude'], df['Longitude'], df['Neighbourhood']):
    # popup shows the neighbourhood name(s) for this postcode
    label = folium.Popup(name, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_toronto)
    
map_toronto

#### Define Foursquare Credentials and Version

In [24]:
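import os

# read the Foursquare credentials from environment variables (names chosen here) instead of hard-coding them
CLIENT_ID = os.environ.get('FOURSQUARE_CLIENT_ID', 'your-client-id')  # your Foursquare ID
CLIENT_SECRET = os.environ.get('FOURSQUARE_CLIENT_SECRET', 'your-client-secret')  # your Foursquare Secret
VERSION = '20180605'  # Foursquare API version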

#### This function retrieves up to 30 top venues within a 500-meter radius of each Toronto neighborhood

In [25]:
def getNearbyVenues(names, latitudes, longitudes, radius=500, LIMIT=30):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
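The nested indexing inside `getNearbyVenues` assumes each item in the Foursquare v2 `explore` response looks like `{'venue': {'name': ..., 'location': {'lat': ..., 'lng': ...}, 'categories': [{'name': ...}]}}`. The sketch below makes that structure explicit on a hand-built mock response (only the fields the function reads, not real API output). The helper name `parse_explore_items` is hypothetical; unlike the list comprehension above, it also tolerates a venue whose `categories` list is empty, which would otherwise raise an `IndexError`.

```python
def parse_explore_items(name, lat, lng, items):
    """Turn the 'items' list of one explore response into row tuples (hypothetical helper)."""
    rows = []
    for item in items:
        venue = item['venue']
        categories = venue.get('categories', [])
        # fall back to None when a venue has no category
        category = categories[0]['name'] if categories else None
        rows.append((name, lat, lng,
                     venue['name'],
                     venue['location']['lat'],
                     venue['location']['lng'],
                     category))
    return rows


# mock response containing only the fields getNearbyVenues reads;
# the second venue is made up to show the empty-category case
mock_response = {'response': {'groups': [{'items': [
    {'venue': {'name': "Wendy's",
               'location': {'lat': 43.807448, 'lng': -79.199056},
               'categories': [{'name': 'Fast Food Restaurant'}]}},
    {'venue': {'name': 'Example Venue',
               'location': {'lat': 43.805630, 'lng': -79.200378},
               'categories': []}},
]}]}}

items = mock_response['response']['groups'][0]['items']
rows = parse_explore_items('Rouge, Malvern', 43.806686, -79.194353, items)
for row in rows:
    print(row)
```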

#### Calling the function above to build a dataframe of each neighborhood's venues and their locations

In [26]:
tor_venues = getNearbyVenues(names=df['Neighbourhood'],
                                   latitudes=df['Latitude'],
                                   longitudes=df['Longitude']
                                  )
tor_venues

Rouge, Malvern
Highland Creek, Rouge Hill, Port Union
Guildwood, Morningside, West Hill
Woburn
Cedarbrae
Scarborough Village
East Birchmount Park, Ionview, Kennedy Park
Clairlea, Golden Mile, Oakridge
Cliffcrest, Cliffside, Scarborough Village West
Birch Cliff, Cliffside West
Dorset Park, Scarborough Town Centre, Wexford Heights
Maryvale, Wexford
Agincourt
Clarks Corners, Sullivan, Tam O'Shanter
Agincourt North, L'Amoreaux East, Milliken, Steeles East
L'Amoreaux West
Upper Rouge
Hillcrest Village
Fairview, Henry Farm, Oriole
Bayview Village
Silver Hills, York Mills
Newtonbrook, Willowdale
Willowdale South
York Mills West
Willowdale West
Parkwoods
Don Mills North
Flemingdon Park, Don Mills South
Bathurst Manor, Downsview North, Wilson Heights
Northwood Park, York University
CFB Toronto, Downsview East
Downsview West
Downsview Central
Downsview Northwest
Victoria Village
Woodbine Gardens, Parkview Hill
Woodbine Heights
The Beaches
Leaside
Thorncliffe Park
East Toronto
The Danforth West, 

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,"Rouge, Malvern",43.806686,-79.194353,Wendy's,43.807448,-79.199056,Fast Food Restaurant
1,"Rouge, Malvern",43.806686,-79.194353,Interprovincial Group,43.805630,-79.200378,Print Shop
2,"Highland Creek, Rouge Hill, Port Union",43.784535,-79.160497,Chris Effects Painting,43.784343,-79.163742,Construction & Landscaping
3,"Highland Creek, Rouge Hill, Port Union",43.784535,-79.160497,Royal Canadian Legion,43.782533,-79.163085,Bar
4,"Guildwood, Morningside, West Hill",43.763573,-79.188711,Swiss Chalet Rotisserie & Grill,43.767697,-79.189914,Pizza Place
5,"Guildwood, Morningside, West Hill",43.763573,-79.188711,G & G Electronics,43.765309,-79.191537,Electronics Store
6,"Guildwood, Morningside, West Hill",43.763573,-79.188711,Big Bite Burrito,43.766299,-79.190720,Mexican Restaurant
7,"Guildwood, Morningside, West Hill",43.763573,-79.188711,Enterprise Rent-A-Car,43.764076,-79.193406,Rental Car Location
8,"Guildwood, Morningside, West Hill",43.763573,-79.188711,Woburn Medical Centre,43.766631,-79.192286,Medical Center
9,"Guildwood, Morningside, West Hill",43.763573,-79.188711,Lawrence Ave E & Kingston Rd,43.767704,-79.189490,Intersection


In [27]:
print(tor_venues.shape)
tor_venues.head()

(1322, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,"Rouge, Malvern",43.806686,-79.194353,Wendy's,43.807448,-79.199056,Fast Food Restaurant
1,"Rouge, Malvern",43.806686,-79.194353,Interprovincial Group,43.80563,-79.200378,Print Shop
2,"Highland Creek, Rouge Hill, Port Union",43.784535,-79.160497,Chris Effects Painting,43.784343,-79.163742,Construction & Landscaping
3,"Highland Creek, Rouge Hill, Port Union",43.784535,-79.160497,Royal Canadian Legion,43.782533,-79.163085,Bar
4,"Guildwood, Morningside, West Hill",43.763573,-79.188711,Swiss Chalet Rotisserie & Grill,43.767697,-79.189914,Pizza Place
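The request asks Foursquare for venues within `radius=500` meters of each neighborhood center. A quick sanity check on the returned rows is to compute the great-circle (haversine) distance between a neighborhood center and one of its venues. The sketch below uses the first row of `tor_venues` above (Rouge, Malvern and the Wendy's venue) and assumes a mean Earth radius of 6,371 km.

```python
from math import asin, cos, radians, sin, sqrt


def haversine_m(lat1, lon1, lat2, lon2, radius_m=6371000.0):
    """Great-circle distance in meters between two (lat, lon) points given in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * radius_m * asin(sqrt(a))


# neighborhood center of "Rouge, Malvern" and the Wendy's venue from row 0 above
d = haversine_m(43.806686, -79.194353, 43.807448, -79.199056)
print(round(d))  # a little under 400 m, inside the 500 m search radius
```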


### Analyze Each Neighborhood

In [28]:
# one hot encoding
tor_onehot = pd.get_dummies(tor_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
tor_onehot['Neighborhood'] = tor_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [tor_onehot.columns[-1]] + list(tor_onehot.columns[:-1])
tor_onehot = tor_onehot[fixed_columns]

tor_onehot.head()

Unnamed: 0,Yoga Studio,Accessories Store,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Aquarium,...,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Warehouse Store,Wine Bar,Wings Joint,Women's Store
0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
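In the output above the first column is `Yoga Studio` rather than `Neighborhood`. That is the pattern you get when a venue category is itself named `Neighborhood`: `tor_onehot['Neighborhood'] = ...` then overwrites that category column in place instead of appending a new column, so `columns[-1]` is the last category (`Yoga Studio`) and that is what gets moved to the front. That such a category was returned here is an inference from the output, not something the notebook shows directly. The toy sketch below (hypothetical data) reproduces the effect and shows one way around it: rename the colliding category, then `insert` the key column at position 0.

```python
import pandas as pd

# hypothetical venue rows where one venue's category is literally "Neighborhood"
venues = pd.DataFrame({
    'Neighborhood': ['A', 'A', 'B'],
    'Venue Category': ['Park', 'Neighborhood', 'Café'],
})

onehot = pd.get_dummies(venues[['Venue Category']], prefix='', prefix_sep='')
print(list(onehot.columns))  # ['Café', 'Neighborhood', 'Park']

# assigning to an existing column name overwrites it in place instead of appending
clobbered = onehot.copy()
clobbered['Neighborhood'] = venues['Neighborhood']
print(list(clobbered.columns))  # still ['Café', 'Neighborhood', 'Park']

# renaming the category first, then inserting the key at position 0, keeps both
safe = onehot.rename(columns={'Neighborhood': 'Neighborhood (venue category)'})
safe.insert(0, 'Neighborhood', venues['Neighborhood'])
print(list(safe.columns))
```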


#### Grouping the rows by neighborhood and taking the mean of each one-hot column, which gives the relative frequency of each venue category

In [29]:
tor_grouped = tor_onehot.groupby('Neighborhood').mean().reset_index()
tor_grouped

Unnamed: 0,Neighborhood,Yoga Studio,Accessories Store,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,...,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Warehouse Store,Wine Bar,Wings Joint,Women's Store
0,"Adelaide, King, Richmond",0.000000,0.0,0.0000,0.0000,0.0000,0.000,0.0000,0.000,0.033333,...,0.000000,0.0,0.033333,0.0,0.000000,0.000000,0.000000,0.000000,0.000000,0.0
1,Agincourt,0.000000,0.0,0.0000,0.0000,0.0000,0.000,0.0000,0.000,0.000000,...,0.000000,0.0,0.000000,0.0,0.000000,0.000000,0.000000,0.000000,0.000000,0.0
2,"Agincourt North, L'Amoreaux East, Milliken, St...",0.000000,0.0,0.0000,0.0000,0.0000,0.000,0.0000,0.000,0.000000,...,0.000000,0.0,0.000000,0.0,0.000000,0.000000,0.000000,0.000000,0.000000,0.0
3,"Albion Gardens, Beaumond Heights, Humbergate, ...",0.000000,0.0,0.0000,0.0000,0.0000,0.000,0.0000,0.000,0.000000,...,0.000000,0.0,0.000000,0.0,0.000000,0.000000,0.000000,0.000000,0.000000,0.0
4,"Alderwood, Long Branch",0.000000,0.0,0.0000,0.0000,0.0000,0.000,0.0000,0.000,0.000000,...,0.000000,0.0,0.000000,0.0,0.000000,0.000000,0.000000,0.000000,0.000000,0.0
5,"Bathurst Manor, Downsview North, Wilson Heights",0.000000,0.0,0.0000,0.0000,0.0000,0.000,0.0000,0.000,0.000000,...,0.000000,0.0,0.000000,0.0,0.052632,0.000000,0.000000,0.000000,0.000000,0.0
6,Bayview Village,0.000000,0.0,0.0000,0.0000,0.0000,0.000,0.0000,0.000,0.000000,...,0.000000,0.0,0.000000,0.0,0.000000,0.000000,0.000000,0.000000,0.000000,0.0
7,"Bedford Park, Lawrence Manor East",0.000000,0.0,0.0000,0.0000,0.0000,0.000,0.0000,0.000,0.041667,...,0.000000,0.0,0.000000,0.0,0.000000,0.000000,0.000000,0.000000,0.000000,0.0
8,Berczy Park,0.000000,0.0,0.0000,0.0000,0.0000,0.000,0.0000,0.000,0.000000,...,0.000000,0.0,0.033333,0.0,0.000000,0.000000,0.000000,0.000000,0.000000,0.0
9,"Birch Cliff, Cliffside West",0.000000,0.0,0.0000,0.0000,0.0000,0.000,0.0000,0.000,0.000000,...,0.000000,0.0,0.000000,0.0,0.000000,0.000000,0.000000,0.000000,0.000000,0.0
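Because each one-hot row has a single 1, averaging the rows within a neighborhood gives the share of that neighborhood's returned venues that fall in each category. The toy sketch below (hypothetical data) shows this, checks it against `value_counts(normalize=True)`, and confirms that each neighborhood's row of frequencies sums to 1.

```python
import pandas as pd

venues = pd.DataFrame({
    'Neighborhood': ['A', 'A', 'A', 'A', 'B'],
    'Venue Category': ['Café', 'Café', 'Park', 'Bank', 'Park'],
})

# one-hot encode the category (cast to int, since newer pandas returns booleans)
onehot = pd.get_dummies(venues['Venue Category']).astype(int)
onehot.insert(0, 'Neighborhood', venues['Neighborhood'])

# mean of the 0/1 columns per neighborhood = relative frequency of each category
freq = onehot.groupby('Neighborhood').mean()
print(freq)

# the same proportions computed directly from the category column
alt = venues.groupby('Neighborhood')['Venue Category'].value_counts(normalize=True)
print(alt)
```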


#### Each neighborhood along with its top 5 most common venues

In [30]:
num_top_venues = 5

for hood in tor_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = tor_grouped[tor_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Adelaide, King, Richmond----
              venue  freq
0              Café  0.10
1       Pizza Place  0.07
2             Hotel  0.07
3        Steakhouse  0.07
4  Asian Restaurant  0.07


----Agincourt----
              venue  freq
0            Lounge  0.33
1    Sandwich Place  0.33
2    Breakfast Spot  0.33
3  Malay Restaurant  0.00
4            Market  0.00


----Agincourt North, L'Amoreaux East, Milliken, Steeles East----
              venue  freq
0        Playground   0.5
1              Park   0.5
2             Motel   0.0
3  Malay Restaurant   0.0
4            Market   0.0


----Albion Gardens, Beaumond Heights, Humbergate, Jamestown, Mount Olive, Silverstone, South Steeles, Thistletown----
                  venue  freq
0         Grocery Store   0.2
1              Pharmacy   0.1
2  Fast Food Restaurant   0.1
3           Pizza Place   0.1
4            Beer Store   0.1


----Alderwood, Long Branch----
            venue  freq
0     Pizza Place  0.25
1             Gym  0.12
...

            venue  freq
0  Clothing Store  0.13
1     Coffee Shop  0.13
2   Smoothie Shop  0.03
3    Liquor Store  0.03
4   Movie Theater  0.03


----First Canadian Place, Underground city----
           venue  freq
0           Café  0.13
1    Coffee Shop  0.10
2  Deli / Bodega  0.07
3     Restaurant  0.07
4     Steakhouse  0.07


----Flemingdon Park, Don Mills South----
              venue  freq
0               Gym  0.09
1        Beer Store  0.09
2       Coffee Shop  0.09
3  Asian Restaurant  0.09
4      Concert Hall  0.04


----Forest Hill North, Forest Hill West----
                venue  freq
0    Sushi Restaurant  0.25
1       Jewelry Store  0.25
2               Trail  0.25
3  Mexican Restaurant  0.25
4         Yoga Studio  0.00


----Glencairn----
                 venue  freq
0          Pizza Place  0.25
1                 Park  0.25
2                  Pub  0.25
3  Japanese Restaurant  0.25
4  Monument / Landmark  0.00


----Guildwood, Morningside, West Hill----
...


----Thorncliffe Park----
               venue  freq
0     Sandwich Place  0.12
1  Indian Restaurant  0.12
2       Burger Joint  0.12
3        Yoga Studio  0.06
4               Bank  0.06


----Victoria Village----
                   venue  freq
0            Pizza Place   0.2
1            Coffee Shop   0.2
2  Portuguese Restaurant   0.2
3           Hockey Arena   0.2
4      French Restaurant   0.2


----Westmount----
                venue  freq
0         Pizza Place  0.29
1  Chinese Restaurant  0.14
2      Discount Store  0.14
3      Sandwich Place  0.14
4        Intersection  0.14


----Willowdale South----
              venue  freq
0  Ramen Restaurant  0.10
1              Café  0.07
2    Sandwich Place  0.07
3       Coffee Shop  0.07
4   Bubble Tea Shop  0.03


----Willowdale West----
            venue  freq
0    Home Service  0.14
1        Pharmacy  0.14
2         Butcher  0.14
3     Coffee Shop  0.14
4  Discount Store  0.14


----Woburn----
...

### Pandas dataframe of the top 10 venues in each neighborhood

In [31]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [32]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = tor_grouped['Neighborhood']

for ind in np.arange(tor_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(tor_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()
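The loop above calls `return_most_common_venues` once per row. The same table can be built in one vectorized step with `np.argsort`; this is an alternative sketch on a toy frame, not what the notebook ran. It also makes tie handling explicit: the top-5 listing for "Adelaide, King, Richmond" and its row in the top-10 table order the venues at 0.07 differently, because `sort_values` does not guarantee an order among equal values. Using `kind='stable'` keeps tied categories in their original (alphabetical) column order.

```python
import numpy as np
import pandas as pd

# toy frequency table shaped like tor_grouped (neighborhood column first)
grouped = pd.DataFrame({
    'Neighborhood': ['A', 'B'],
    'Bank': [0.25, 0.0],
    'Café': [0.5, 0.2],
    'Park': [0.25, 0.8],
})

top_n = 2
values = grouped.iloc[:, 1:].to_numpy()
categories = grouped.columns[1:].to_numpy()

# argsort of the negated values gives column positions from most to least frequent;
# the stable sort keeps tied categories in their original column order
order = np.argsort(-values, axis=1, kind='stable')[:, :top_n]
top = pd.DataFrame(categories[order], columns=['Top {}'.format(i + 1) for i in range(top_n)])
top.insert(0, 'Neighborhood', grouped['Neighborhood'])
print(top)
```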

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,"Adelaide, King, Richmond",Café,Asian Restaurant,Steakhouse,Hotel,Pizza Place,Gastropub,Smoke Shop,Lounge,Speakeasy,Bar
1,Agincourt,Lounge,Breakfast Spot,Sandwich Place,Women's Store,Dessert Shop,Empanada Restaurant,Electronics Store,Eastern European Restaurant,Dumpling Restaurant,Drugstore
2,"Agincourt North, L'Amoreaux East, Milliken, St...",Playground,Park,Women's Store,Deli / Bodega,Empanada Restaurant,Electronics Store,Eastern European Restaurant,Dumpling Restaurant,Drugstore,Dog Run
3,"Albion Gardens, Beaumond Heights, Humbergate, ...",Grocery Store,Pizza Place,Fried Chicken Joint,Sandwich Place,Discount Store,Fast Food Restaurant,Beer Store,Japanese Restaurant,Pharmacy,Garden
4,"Alderwood, Long Branch",Pizza Place,Skating Rink,Coffee Shop,Pharmacy,Pub,Sandwich Place,Gym,Airport Service,Deli / Bodega,Empanada Restaurant


### Clustering the neighborhoods using k-means 

In [33]:
# set number of clusters
kclusters = 6

tor_clustering = tor_grouped.drop(columns='Neighborhood')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(tor_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([0, 3, 4, 3, 3, 3, 0, 0, 0, 0])
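`KMeans` alternates two steps until the centroids stop moving: assign each neighborhood's frequency vector to its nearest centroid, then move each centroid to the mean of the vectors assigned to it. A minimal NumPy sketch of that Lloyd iteration follows. It uses a deterministic farthest-first initialization as a simple stand-in for scikit-learn's default k-means++ seeding and performs a single run, so on the real data its labels need not match the ones above; the function name `kmeans_lloyd` and the toy vectors are illustrative only.

```python
import numpy as np


def kmeans_lloyd(X, k, n_iter=100):
    """Minimal Lloyd's k-means; returns (labels, centroids)."""
    X = np.asarray(X, dtype=float)
    # deterministic farthest-first initialization (a simple stand-in for k-means++)
    centroids = [X[0]]
    for _ in range(1, k):
        d = np.min([((X - c) ** 2).sum(axis=1) for c in centroids], axis=0)
        centroids.append(X[np.argmax(d)])
    centroids = np.array(centroids)

    for _ in range(n_iter):
        # assignment step: nearest centroid by squared Euclidean distance
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # update step: each centroid moves to the mean of its points (kept if empty)
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids


# toy venue-frequency vectors over three categories: two park-heavy, two restaurant-heavy
X = np.array([
    [0.5, 0.5, 0.0],
    [0.6, 0.4, 0.0],
    [0.0, 0.1, 0.9],
    [0.0, 0.2, 0.8],
])
labels, centroids = kmeans_lloyd(X, k=2)
print(labels, centroids)
```

Cluster numbers are arbitrary identifiers in both this sketch and `KMeans`; only which rows share a label carries meaning.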

Renaming the `Neighbourhood` column to `Neighborhood` so that it matches the join key used below.

In [34]:
df.rename({'Neighbourhood': 'Neighborhood'}, axis=1, inplace=True)
df.head()

Unnamed: 0,Postcode,Borough,Neighborhood,Latitude,Longitude
0,M1B,Scarborough,"Rouge, Malvern",43.806686,-79.194353
1,M1C,Scarborough,"Highland Creek, Rouge Hill, Port Union",43.784535,-79.160497
2,M1E,Scarborough,"Guildwood, Morningside, West Hill",43.763573,-79.188711
3,M1G,Scarborough,Woburn,43.770992,-79.216917
4,M1H,Scarborough,Cedarbrae,43.773136,-79.239476


Creating a new dataframe that includes the cluster label as well as the top 10 venues for each neighborhood.

In [35]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

tor_merged = df

# join the cluster labels and top venues onto df, which holds the latitude/longitude of each neighborhood
tor_merged = tor_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

tor_merged.head() # check the last columns!

Unnamed: 0,Postcode,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M1B,Scarborough,"Rouge, Malvern",43.806686,-79.194353,3.0,Fast Food Restaurant,Print Shop,Women's Store,Deli / Bodega,Empanada Restaurant,Electronics Store,Eastern European Restaurant,Dumpling Restaurant,Drugstore,Dog Run
1,M1C,Scarborough,"Highland Creek, Rouge Hill, Port Union",43.784535,-79.160497,0.0,Construction & Landscaping,Bar,Women's Store,Dessert Shop,Event Space,Ethiopian Restaurant,Empanada Restaurant,Electronics Store,Eastern European Restaurant,Dumpling Restaurant
2,M1E,Scarborough,"Guildwood, Morningside, West Hill",43.763573,-79.188711,3.0,Pizza Place,Electronics Store,Breakfast Spot,Medical Center,Intersection,Rental Car Location,Mexican Restaurant,Dessert Shop,Eastern European Restaurant,Dumpling Restaurant
3,M1G,Scarborough,Woburn,43.770992,-79.216917,3.0,Coffee Shop,Korean Restaurant,Women's Store,Department Store,Ethiopian Restaurant,Empanada Restaurant,Electronics Store,Eastern European Restaurant,Dumpling Restaurant,Drugstore
4,M1H,Scarborough,Cedarbrae,43.773136,-79.239476,0.0,Bakery,Fried Chicken Joint,Hakka Restaurant,Bank,Athletics & Sports,Thai Restaurant,Caribbean Restaurant,Discount Store,Dim Sum Restaurant,Diner


Checking the merged dataframe for neighborhoods without a cluster label before adding the clusters to the map:

In [36]:
tor_merged[tor_merged['Cluster Labels'].isnull()]

Unnamed: 0,Postcode,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
16,M1X,Scarborough,Upper Rouge,43.836125,-79.205636,,,,,,,,,,,
21,M2M,North York,"Newtonbrook, Willowdale",43.789053,-79.408493,,,,,,,,,,,
93,M9A,Etobicoke,Islington Avenue,43.667856,-79.532242,,,,,,,,,,,
94,M9B,Etobicoke,"Cloverdale, Islington, Martin Grove, Princess ...",43.650943,-79.554724,,,,,,,,,,,
98,M9N,York,Weston,43.706876,-79.518188,,,,,,,,,,,

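`DataFrame.join(..., on='Neighborhood')` is a left join by default, so every row of `df` is kept, and neighborhoods that never appear in `neighborhoods_venues_sorted` (because Foursquare returned no venues for them) get NaN in every added column. NaN is a float, which is also why `Cluster Labels` appears as 3.0 and 0.0 in the merged table. The toy sketch below reuses two neighborhoods from the tables above; restricting `dropna` to `subset=['Cluster Labels']` is a variation on the cell below that limits the drop to rows missing a label.

```python
import pandas as pd

hoods = pd.DataFrame({'Neighborhood': ['Woburn', 'Upper Rouge']})
# only Woburn has venue data (and therefore a cluster label) in this toy lookup
labels = pd.DataFrame({'Neighborhood': ['Woburn'], 'Cluster Labels': [3]})

joined = hoods.join(labels.set_index('Neighborhood'), on='Neighborhood')
print(joined)
print(joined['Cluster Labels'].dtype)  # float64: NaN cannot be stored in an int64 column

# drop rows without a label, then restore the integer type
cleaned = joined.dropna(subset=['Cluster Labels']).copy()
cleaned['Cluster Labels'] = cleaned['Cluster Labels'].astype(int)
print(cleaned)
```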

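Dropping the rows without a cluster label (Foursquare returned no venues for these neighborhoods, so they never entered the clustering), then converting the Cluster Labels column back to int, since the missing values had forced it to float.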

In [37]:
tor_merged.dropna(inplace=True)
tor_merged['Cluster Labels'] = tor_merged['Cluster Labels'].astype(int)

In [38]:
tor_merged.head()

Unnamed: 0,Postcode,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M1B,Scarborough,"Rouge, Malvern",43.806686,-79.194353,3,Fast Food Restaurant,Print Shop,Women's Store,Deli / Bodega,Empanada Restaurant,Electronics Store,Eastern European Restaurant,Dumpling Restaurant,Drugstore,Dog Run
1,M1C,Scarborough,"Highland Creek, Rouge Hill, Port Union",43.784535,-79.160497,0,Construction & Landscaping,Bar,Women's Store,Dessert Shop,Event Space,Ethiopian Restaurant,Empanada Restaurant,Electronics Store,Eastern European Restaurant,Dumpling Restaurant
2,M1E,Scarborough,"Guildwood, Morningside, West Hill",43.763573,-79.188711,3,Pizza Place,Electronics Store,Breakfast Spot,Medical Center,Intersection,Rental Car Location,Mexican Restaurant,Dessert Shop,Eastern European Restaurant,Dumpling Restaurant
3,M1G,Scarborough,Woburn,43.770992,-79.216917,3,Coffee Shop,Korean Restaurant,Women's Store,Department Store,Ethiopian Restaurant,Empanada Restaurant,Electronics Store,Eastern European Restaurant,Dumpling Restaurant,Drugstore
4,M1H,Scarborough,Cedarbrae,43.773136,-79.239476,0,Bakery,Fried Chicken Joint,Hakka Restaurant,Bank,Athletics & Sports,Thai Restaurant,Caribbean Restaurant,Discount Store,Dim Sum Restaurant,Diner


### Map of Toronto with clusters 

In [39]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(tor_merged['Latitude'], tor_merged['Longitude'], tor_merged['Neighborhood'], tor_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster],
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

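### Examine Clusters

The k-means labels start at 0, so Cluster 1 below shows the rows with `Cluster Labels == 0`, Cluster 2 those with label 1, and so on.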

#### Cluster 1

In [40]:
tor_merged.loc[tor_merged['Cluster Labels'] == 0, tor_merged.columns[[1] + list(range(5, tor_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,Scarborough,0,Construction & Landscaping,Bar,Women's Store,Dessert Shop,Event Space,Ethiopian Restaurant,Empanada Restaurant,Electronics Store,Eastern European Restaurant,Dumpling Restaurant
4,Scarborough,0,Bakery,Fried Chicken Joint,Hakka Restaurant,Bank,Athletics & Sports,Thai Restaurant,Caribbean Restaurant,Discount Store,Dim Sum Restaurant,Diner
6,Scarborough,0,Discount Store,Coffee Shop,Hobby Shop,Bus Station,Department Store,Women's Store,Dessert Shop,Ethiopian Restaurant,Empanada Restaurant,Electronics Store
7,Scarborough,0,Bus Line,Bakery,Metro Station,Park,Fast Food Restaurant,Bus Station,Intersection,Soccer Field,Women's Store,Electronics Store
8,Scarborough,0,Motel,American Restaurant,Deli / Bodega,Ethiopian Restaurant,Empanada Restaurant,Electronics Store,Eastern European Restaurant,Dumpling Restaurant,Drugstore,Dog Run
9,Scarborough,0,College Stadium,General Entertainment,Skating Rink,Café,Concert Hall,Dessert Shop,Empanada Restaurant,Electronics Store,Eastern European Restaurant,Dumpling Restaurant
10,Scarborough,0,Indian Restaurant,Latin American Restaurant,Vietnamese Restaurant,Chinese Restaurant,Pet Store,General Entertainment,Dance Studio,Electronics Store,Eastern European Restaurant,Dumpling Restaurant
11,Scarborough,0,Breakfast Spot,Smoke Shop,Bakery,Middle Eastern Restaurant,Women's Store,Event Space,Empanada Restaurant,Electronics Store,Eastern European Restaurant,Dumpling Restaurant
17,North York,0,Golf Course,Pool,Dog Run,Mediterranean Restaurant,Women's Store,Deli / Bodega,Electronics Store,Eastern European Restaurant,Dumpling Restaurant,Drugstore
18,North York,0,Coffee Shop,Clothing Store,Theater,Fast Food Restaurant,Bakery,Liquor Store,Japanese Restaurant,Restaurant,Candy Store,Food Court

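Each of the six cluster cells repeats the same positional selection: column 1 is `Borough`, and columns 5 onward are `Cluster Labels` plus the ten venue columns. A small helper that selects columns by name (and also keeps `Neighborhood`, which the positional slice leaves out) can replace those cells. `show_cluster` is a hypothetical name, and the toy frame below is a trimmed copy of three rows of `tor_merged` with a single venue column.

```python
import pandas as pd


def show_cluster(merged, label):
    """Rows of one cluster with the borough, neighborhood name, label, and venue columns."""
    venue_cols = [c for c in merged.columns if c.endswith('Most Common Venue')]
    return merged.loc[merged['Cluster Labels'] == label,
                      ['Borough', 'Neighborhood', 'Cluster Labels'] + venue_cols]


# trimmed copy of three rows of tor_merged (values from the merged table above)
toy = pd.DataFrame({
    'Postcode': ['M1B', 'M1C', 'M1E'],
    'Borough': ['Scarborough'] * 3,
    'Neighborhood': ['Rouge, Malvern', 'Highland Creek, Rouge Hill, Port Union',
                     'Guildwood, Morningside, West Hill'],
    'Latitude': [43.806686, 43.784535, 43.763573],
    'Longitude': [-79.194353, -79.160497, -79.188711],
    'Cluster Labels': [3, 0, 3],
    '1st Most Common Venue': ['Fast Food Restaurant', 'Construction & Landscaping', 'Pizza Place'],
})
cluster_3 = show_cluster(toy, 3)
print(cluster_3)
```

On the real data, `show_cluster(tor_merged, 0)` would correspond to the Cluster 1 cell.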

#### Cluster 2 

In [41]:
tor_merged.loc[tor_merged['Cluster Labels'] == 1, tor_merged.columns[[1] + list(range(5, tor_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
23,North York,1,Park,Convenience Store,Bank,Women's Store,Dessert Shop,Ethiopian Restaurant,Empanada Restaurant,Electronics Store,Eastern European Restaurant,Dumpling Restaurant
25,North York,1,Park,Food & Drink Shop,Bus Stop,Women's Store,Department Store,Empanada Restaurant,Electronics Store,Eastern European Restaurant,Dumpling Restaurant,Drugstore
30,North York,1,Airport,Park,Women's Store,Department Store,Ethiopian Restaurant,Empanada Restaurant,Electronics Store,Eastern European Restaurant,Dumpling Restaurant,Drugstore
40,East York,1,Park,Coffee Shop,Convenience Store,Women's Store,Department Store,Ethiopian Restaurant,Empanada Restaurant,Electronics Store,Eastern European Restaurant,Dumpling Restaurant
44,Central Toronto,1,Park,Swim School,Bus Line,Lawyer,Women's Store,Dessert Shop,Ethiopian Restaurant,Empanada Restaurant,Electronics Store,Eastern European Restaurant
50,Downtown Toronto,1,Park,Playground,Trail,Women's Store,Deli / Bodega,Empanada Restaurant,Electronics Store,Eastern European Restaurant,Dumpling Restaurant,Drugstore
63,Central Toronto,1,Garden,Women's Store,Deli / Bodega,Ethiopian Restaurant,Empanada Restaurant,Electronics Store,Eastern European Restaurant,Dumpling Restaurant,Drugstore,Dog Run
72,North York,1,Pizza Place,Park,Pub,Japanese Restaurant,Deli / Bodega,Electronics Store,Eastern European Restaurant,Dumpling Restaurant,Drugstore,Dog Run
74,York,1,Park,Women's Store,Fast Food Restaurant,Market,Department Store,Empanada Restaurant,Electronics Store,Eastern European Restaurant,Dumpling Restaurant,Drugstore
79,North York,1,Park,Construction & Landscaping,Basketball Court,Bakery,Women's Store,Dim Sum Restaurant,Event Space,Ethiopian Restaurant,Empanada Restaurant,Electronics Store


#### Cluster 3

In [42]:
tor_merged.loc[tor_merged['Cluster Labels'] == 2, tor_merged.columns[[1] + list(range(5, tor_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
20,North York,2,Cafeteria,Falafel Restaurant,Ethiopian Restaurant,Empanada Restaurant,Electronics Store,Eastern European Restaurant,Dumpling Restaurant,Drugstore,Dog Run,Discount Store


#### Cluster 4

In [43]:
tor_merged.loc[tor_merged['Cluster Labels'] == 3, tor_merged.columns[[1] + list(range(5, tor_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Scarborough,3,Fast Food Restaurant,Print Shop,Women's Store,Deli / Bodega,Empanada Restaurant,Electronics Store,Eastern European Restaurant,Dumpling Restaurant,Drugstore,Dog Run
2,Scarborough,3,Pizza Place,Electronics Store,Breakfast Spot,Medical Center,Intersection,Rental Car Location,Mexican Restaurant,Dessert Shop,Eastern European Restaurant,Dumpling Restaurant
3,Scarborough,3,Coffee Shop,Korean Restaurant,Women's Store,Department Store,Ethiopian Restaurant,Empanada Restaurant,Electronics Store,Eastern European Restaurant,Dumpling Restaurant,Drugstore
12,Scarborough,3,Lounge,Breakfast Spot,Sandwich Place,Women's Store,Dessert Shop,Empanada Restaurant,Electronics Store,Eastern European Restaurant,Dumpling Restaurant,Drugstore
13,Scarborough,3,Pizza Place,Chinese Restaurant,Fast Food Restaurant,Italian Restaurant,Bank,Thai Restaurant,Fried Chicken Joint,Rental Car Location,Breakfast Spot,Noodle House
15,Scarborough,3,Fast Food Restaurant,Chinese Restaurant,Pizza Place,Sandwich Place,Grocery Store,Coffee Shop,Pharmacy,Indian Restaurant,Camera Store,American Restaurant
24,North York,3,Pizza Place,Butcher,Home Service,Discount Store,Pharmacy,Coffee Shop,Grocery Store,Airport Service,Dim Sum Restaurant,Falafel Restaurant
28,North York,3,Coffee Shop,Pharmacy,Sushi Restaurant,Frozen Yogurt Shop,Fried Chicken Joint,Fast Food Restaurant,Diner,Deli / Bodega,Middle Eastern Restaurant,Pizza Place
29,North York,3,Falafel Restaurant,Coffee Shop,Massage Studio,Bar,Caribbean Restaurant,Metro Station,Dessert Shop,Empanada Restaurant,Electronics Store,Eastern European Restaurant
34,North York,3,Pizza Place,Portuguese Restaurant,Hockey Arena,French Restaurant,Coffee Shop,Eastern European Restaurant,Dumpling Restaurant,Drugstore,Electronics Store,Dance Studio


#### Cluster 5

In [44]:
tor_merged.loc[tor_merged['Cluster Labels'] == 4, tor_merged.columns[[1] + list(range(5, tor_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
5,Scarborough,4,Playground,Women's Store,Deli / Bodega,Ethiopian Restaurant,Empanada Restaurant,Electronics Store,Eastern European Restaurant,Dumpling Restaurant,Drugstore,Dog Run
14,Scarborough,4,Playground,Park,Women's Store,Deli / Bodega,Empanada Restaurant,Electronics Store,Eastern European Restaurant,Dumpling Restaurant,Drugstore,Dog Run


#### Cluster 6 

In [45]:
tor_merged.loc[tor_merged['Cluster Labels'] == 5, tor_merged.columns[[1] + list(range(5, tor_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
91,Etobicoke,5,Pool,Baseball Field,Women's Store,Department Store,Ethiopian Restaurant,Empanada Restaurant,Electronics Store,Eastern European Restaurant,Dumpling Restaurant,Drugstore
97,North York,5,Baseball Field,Women's Store,Dessert Shop,Event Space,Ethiopian Restaurant,Empanada Restaurant,Electronics Store,Eastern European Restaurant,Dumpling Restaurant,Drugstore
