
# **Segmenting and Clustering Neighborhoods in Toronto**

In [102]:
!pip install geocoder
import pandas as pd
import requests
import geocoder
from geopy.geocoders import Nominatim
import folium
import numpy as np
from sklearn.cluster import KMeans
import matplotlib.cm as cm
import matplotlib.colors as colors
from bs4 import BeautifulSoup



## **1. DataFrame of postal codes with their borough and neighborhood names**

Download the Wikipedia page "List of postal codes of Canada: M"

In [103]:
List_url = "https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M"

In [104]:
page = requests.get(List_url)
page

<Response [200]>

Parse the HTML with BeautifulSoup

In [105]:
soup = BeautifulSoup(page.text, 'html.parser')
print(soup.title)

<title>List of postal codes of Canada: M - Wikipedia</title>


Convert the first table on the page into a DataFrame and clean up the text

In [106]:
cells = []
table = soup.find('table')
for row in table.find_all('td'):
    span_text = row.span.text
    # Skip postal codes with no borough assigned
    if span_text == 'Not assigned':
        continue
    # Cell text is typically "Borough(Neighbourhood A / Neighbourhood B)"
    borough, _, neighborhoods = span_text.partition('(')
    cells.append({
        'PostalCode': row.p.text[:3],
        'Borough': borough.strip(),
        'Neighborhood': neighborhoods.strip(')').replace(' /', ',').replace(')', ' ').strip(),
    })

df = pd.DataFrame(cells)
df.head()

,PostalCode,Borough,Neighborhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,"Regent Park, Harbourfront"
3,M6A,North York,"Lawrence Manor, Lawrence Heights"
4,M7A,Queen's Park,Ontario Provincial Government
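The string cleanup in the parsing loop is easier to follow on a single cell. Below is a minimal sketch with a hypothetical `parse_cell` helper that applies the same steps; it assumes the cell text has the form `Borough(Neighbourhood A / Neighbourhood B)`, which is inferred from the parsing code and the resulting table rather than checked against the live page:

```python
def parse_cell(span_text):
    """Split 'Borough(Neighbourhood A / Neighbourhood B)' into its parts.

    Hypothetical helper mirroring the cleanup in the parsing loop.
    Returns None for unassigned postal codes.
    """
    if span_text == 'Not assigned':
        return None
    # Everything before the first '(' is the borough; the rest lists the neighbourhoods
    borough, _, rest = span_text.partition('(')
    neighborhood = rest.strip(')').replace(' /', ',').replace(')', ' ').strip()
    return borough.strip(), neighborhood


print(parse_cell('Downtown Toronto(Regent Park / Harbourfront)'))
print(parse_cell("Queen's Park(Ontario Provincial Government)"))
print(parse_cell('Not assigned'))
```

The first call reproduces the `Regent Park, Harbourfront` entry in the table above.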


Check that no "Not assigned" neighborhoods remain

In [107]:
df['Neighborhood'].str.count("Not assigned").sum()

0

Check dataframe shape

In [108]:
df.shape

(103, 3)

## **2. Add Latitude and Longitude to the Dataset**

Load the latitude and longitude of each postal code

In [109]:
data = pd.read_csv("https://cocl.us/Geospatial_data")
data

,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476
...,...,...,...
98,M9N,43.706876,-79.518188
99,M9P,43.696319,-79.532242
100,M9R,43.688905,-79.554724
101,M9V,43.739416,-79.588437


Join the coordinates onto the postal-code DataFrame

In [110]:
new_data = df.join(data.set_index('Postal Code'), on='PostalCode', how='inner')
new_data

,PostalCode,Borough,Neighborhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.753259,-79.329656
1,M4A,North York,Victoria Village,43.725882,-79.315572
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.654260,-79.360636
3,M6A,North York,"Lawrence Manor, Lawrence Heights",43.718518,-79.464763
4,M7A,Queen's Park,Ontario Provincial Government,43.662301,-79.389494
...,...,...,...,...,...
98,M8X,Etobicoke,"The Kingsway, Montgomery Road, Old Mill North",43.653654,-79.506944
99,M4Y,Downtown Toronto,Church and Wellesley,43.665860,-79.383160
100,M7Y,East TorontoBusiness reply mail Processing Cen...,Enclave of M4L,43.662744,-79.321558
101,M8Y,Etobicoke,"Old Mill South, King's Mill Park, Sunnylea, Hu...",43.636258,-79.498509
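`df.join(data.set_index('Postal Code'), on='PostalCode', how='inner')` uses the coordinate table as a lookup indexed by postal code and keeps only postal codes found in both tables. A toy sketch of that behaviour; the M3A/M4A coordinates are copied from the table above, while `M0Z` and `Example Place` are hypothetical rows deliberately missing from the lookup:

```python
import pandas as pd

# Toy stand-ins for `df` and `data`
codes = pd.DataFrame({'PostalCode': ['M3A', 'M4A', 'M0Z'],
                      'Neighborhood': ['Parkwoods', 'Victoria Village', 'Example Place']})
coords = pd.DataFrame({'Postal Code': ['M3A', 'M4A'],
                       'Latitude': [43.753259, 43.725882],
                       'Longitude': [-79.329656, -79.315572]})

# Index the lookup table by postal code, then match each row of `codes` on its PostalCode.
# how='inner' keeps only postal codes present in both tables, so M0Z is dropped.
merged = codes.join(coords.set_index('Postal Code'), on='PostalCode', how='inner')
print(merged)
```

Since `new_data.shape` is `(103, 5)` and `df` also has 103 rows, every postal code from the Wikipedia table found a match in the coordinate file.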


Check new_data shape

In [111]:
new_data.shape

(103, 5)

## **3. Explore and cluster the neighborhoods in Toronto**

Initializing Foursquare API credentials

In [112]:
CLIENT_ID = 'YOUR_CLIENT_ID'          # replace with your Foursquare client ID
CLIENT_SECRET = 'YOUR_CLIENT_SECRET'  # replace with your Foursquare client secret
VERSION = '20180606' # Foursquare API version
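Hard-coding the client ID and secret in a notebook publishes them along with the notebook. One common alternative is to read them from environment variables. A minimal sketch, in which the helper name and the variable names `FOURSQUARE_CLIENT_ID` / `FOURSQUARE_CLIENT_SECRET` are hypothetical:

```python
import os


def load_foursquare_credentials():
    """Read Foursquare credentials from environment variables.

    The variable names are hypothetical; use whatever names your environment defines.
    """
    client_id = os.environ.get('FOURSQUARE_CLIENT_ID')
    client_secret = os.environ.get('FOURSQUARE_CLIENT_SECRET')
    # Fail early instead of sending empty credentials to the API
    if not client_id or not client_secret:
        raise RuntimeError('Set FOURSQUARE_CLIENT_ID and FOURSQUARE_CLIENT_SECRET first')
    return client_id, client_secret
```

Usage would be `CLIENT_ID, CLIENT_SECRET = load_foursquare_credentials()` in place of the literal strings.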

Function that collects the name and category of every venue near each neighbourhood, using the Foursquare explore endpoint

In [113]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    """Return one row per venue found within `radius` metres of each neighbourhood centre."""
    venues_list = []
    for name, lat, lng in zip(names, latitudes, longitudes):
        # Query the Foursquare v2 "explore" endpoint around this neighbourhood
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}'.format(
            CLIENT_ID, CLIENT_SECRET, VERSION, lat, lng, radius)
        results = requests.get(url).json()['response']['groups'][0]['items']

        # Keep the neighbourhood, its coordinates, and each venue's name and first listed category
        venues_list.append([(name, lat, lng, v['venue']['name'],
            v['venue']['categories'][0]['name']) for v in results])

    # Flatten the per-neighbourhood lists into one DataFrame
    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighbourhood',
                             'Neighbourhood Latitude',
                             'Neighbourhood Longitude',
                             'Venue',
                             'Venue Category']
    return nearby_venues
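`getNearbyVenues` reads only a few fields from each response. To show that path without calling the API, here is the same extraction applied to a hand-built dictionary. The payload is a hypothetical mock containing only the keys the function indexes (`response` → `groups[0]` → `items` → `venue`), not a real Foursquare response, and `extract_venues` is a hypothetical helper; the venue names are taken from the Parkwoods rows further down:

```python
def extract_venues(payload, name, lat, lng):
    """Return (neighbourhood, lat, lng, venue name, category) tuples from a decoded response."""
    items = payload['response']['groups'][0]['items']
    return [(name, lat, lng, v['venue']['name'], v['venue']['categories'][0]['name'])
            for v in items]


# Hypothetical minimal payload with only the keys the function reads
mock = {'response': {'groups': [{'items': [
    {'venue': {'name': 'Brookbanks Park', 'categories': [{'name': 'Park'}]}},
    {'venue': {'name': 'Variety Store', 'categories': [{'name': 'Food & Drink Shop'}]}},
]}]}}

rows = extract_venues(mock, 'Parkwoods', 43.753259, -79.329656)
for row in rows:
    print(row)
```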

Fetch the venues around every neighbourhood

In [114]:
venues = getNearbyVenues(new_data['Neighborhood'], new_data['Latitude'], new_data['Longitude'])
print(venues.shape)
print(venues)

(1332, 5)
                                          Neighbourhood  ...         Venue Category
0                                             Parkwoods  ...                   Park
1                                             Parkwoods  ...      Food & Drink Shop
2                                      Victoria Village  ...           Hockey Arena
3                                      Victoria Village  ...  Portuguese Restaurant
4                                      Victoria Village  ...            Coffee Shop
...                                                 ...  ...                    ...
1327  Mimico NW, The Queensway West, South of Bloor,...  ...   Fast Food Restaurant
1328  Mimico NW, The Queensway West, South of Bloor,...  ...            Social Club
1329  Mimico NW, The Queensway West, South of Bloor,...  ...            Flower Shop
1330  Mimico NW, The Queensway West, South of Bloor,...  ...          Tanning Salon
1331  Mimico NW, The Queensway West,

Analyze the data

In [115]:
group_by_n = venues.groupby('Neighbourhood')
group_by_n.head()


,Neighbourhood,Neighbourhood Latitude,Neighbourhood Longitude,Venue,Venue Category
0,Parkwoods,43.753259,-79.329656,Brookbanks Park,Park
1,Parkwoods,43.753259,-79.329656,Variety Store,Food & Drink Shop
2,Victoria Village,43.725882,-79.315572,Victoria Village Arena,Hockey Arena
3,Victoria Village,43.725882,-79.315572,Portugril,Portuguese Restaurant
4,Victoria Village,43.725882,-79.315572,Tim Hortons,Coffee Shop
...,...,...,...,...,...
1318,"Mimico NW, The Queensway West, South of Bloor,...",43.628841,-79.520999,Wingporium,Wings Joint
1319,"Mimico NW, The Queensway West, South of Bloor,...",43.628841,-79.520999,South St. Burger,Burger Joint
1320,"Mimico NW, The Queensway West, South of Bloor,...",43.628841,-79.520999,Dollarama,Discount Store
1321,"Mimico NW, The Queensway West, South of Bloor,...",43.628841,-79.520999,Healthy Planet,Supplement Shop


In [116]:
group_by_v = venues.groupby('Venue Category')
group_by_v.head()

,Neighbourhood,Neighbourhood Latitude,Neighbourhood Longitude,Venue,Venue Category
0,Parkwoods,43.753259,-79.329656,Brookbanks Park,Park
1,Parkwoods,43.753259,-79.329656,Variety Store,Food & Drink Shop
2,Victoria Village,43.725882,-79.315572,Victoria Village Arena,Hockey Arena
3,Victoria Village,43.725882,-79.315572,Portugril,Portuguese Restaurant
4,Victoria Village,43.725882,-79.315572,Tim Hortons,Coffee Shop
...,...,...,...,...,...
1321,"Mimico NW, The Queensway West, South of Bloor,...",43.628841,-79.520999,Healthy Planet,Supplement Shop
1325,"Mimico NW, The Queensway West, South of Bloor,...",43.628841,-79.520999,RONA,Hardware Store
1328,"Mimico NW, The Queensway West, South of Bloor,...",43.628841,-79.520999,Royal Canadian Legion #210,Social Club
1329,"Mimico NW, The Queensway West, South of Bloor,...",43.628841,-79.520999,Islington Florist & Nursery,Flower Shop


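The per-neighbourhood preview above has 431 rows, since `head()` on a grouped frame keeps up to the first five venues of each neighbourhood; across the data, the venues fall into 232 distinct venue categories.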

Now we one-hot encode the venue categories: each venue becomes a row with a 1 in the column for its category and 0 in every other category column.

In [117]:
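# One 0/1 column per venue category, prefixed "Venue Category_"
venue = pd.get_dummies(venues[['Venue Category']], dtype=int)
venue['Neighbourhood'] = venues['Neighbourhood']

# Move the Neighbourhood column (appended last) to the front
fixed_columns = [venue.columns[-1]] + list(venue.columns[:-1])
venue = venue[fixed_columns]
venue.head()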

,Neighbourhood,Venue Category_Accessories Store,Venue Category_Adult Boutique,Venue Category_Airport,Venue Category_Airport Food Court,Venue Category_Airport Gate,Venue Category_Airport Lounge,Venue Category_Airport Service,Venue Category_Airport Terminal,Venue Category_American Restaurant,Venue Category_Antique Shop,Venue Category_Art Gallery,Venue Category_Art Museum,Venue Category_Arts & Crafts Store,Venue Category_Asian Restaurant,Venue Category_Athletics & Sports,Venue Category_Auto Garage,Venue Category_BBQ Joint,Venue Category_Baby Store,Venue Category_Bagel Shop,Venue Category_Bakery,Venue Category_Bank,Venue Category_Bar,Venue Category_Baseball Field,Venue Category_Basketball Court,Venue Category_Basketball Stadium,Venue Category_Beer Bar,Venue Category_Beer Store,Venue Category_Belgian Restaurant,Venue Category_Bike Shop,Venue Category_Bistro,Venue Category_Board Shop,Venue Category_Boat or Ferry,Venue Category_Bookstore,Venue Category_Boutique,Venue Category_Breakfast Spot,Venue Category_Brewery,Venue Category_Bridal Shop,Venue Category_Bubble Tea Shop,Venue Category_Burger Joint,...,Venue Category_Seafood Restaurant,Venue Category_Shopping Mall,Venue Category_Shopping Plaza,Venue Category_Skate Park,Venue Category_Skating Rink,Venue Category_Smoke Shop,Venue Category_Smoothie Shop,Venue Category_Soccer Field,Venue Category_Social Club,Venue Category_Spa,Venue Category_Speakeasy,Venue Category_Sporting Goods Shop,Venue Category_Sports Bar,Venue Category_Stadium,Venue Category_Stationery Store,Venue Category_Steakhouse,Venue Category_Supermarket,Venue Category_Supplement Shop,Venue Category_Sushi Restaurant,Venue Category_Swim School,Venue Category_Tailor Shop,Venue Category_Taiwanese Restaurant,Venue Category_Tanning Salon,Venue Category_Tea Room,Venue Category_Thai Restaurant,Venue Category_Theater,Venue Category_Theme Restaurant,Venue Category_Tibetan Restaurant,Venue Category_Toy / Game Store,Venue Category_Trail,Venue Category_Train Station,Venue Category_Truck Stop,Venue Category_Vegetarian / Vegan Restaurant,Venue Category_Video Game Store,Venue Category_Vietnamese Restaurant,Venue Category_Warehouse Store,Venue Category_Wine Bar,Venue Category_Wings Joint,Venue Category_Women's Store,Venue Category_Yoga Studio
0,Parkwoods,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,Parkwoods,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,Victoria Village,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,Victoria Village,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,Victoria Village,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
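To make the encoding concrete, here is the same `get_dummies` call on a toy three-venue frame (hypothetical rows, not the real data). Passing `dtype=int` yields 0/1 columns; recent pandas versions otherwise return booleans:

```python
import pandas as pd

# Hypothetical three-venue sample
toy = pd.DataFrame({'Neighbourhood': ['Parkwoods', 'Parkwoods', 'Victoria Village'],
                    'Venue Category': ['Park', 'Food & Drink Shop', 'Park']})

# One column per distinct category, named "Venue Category_<category>";
# each row has a single 1 marking its venue's category
onehot = pd.get_dummies(toy[['Venue Category']], dtype=int)
onehot.insert(0, 'Neighbourhood', toy['Neighbourhood'])
print(onehot)
```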


Group the rows by neighbourhood and take the mean of each category column; this gives the fraction of each neighbourhood's venues that fall into each category.

In [118]:
groups = venue.groupby('Neighbourhood').mean().reset_index()
groups.head()

,Neighbourhood,Venue Category_Accessories Store,Venue Category_Adult Boutique,Venue Category_Airport,Venue Category_Airport Food Court,Venue Category_Airport Gate,Venue Category_Airport Lounge,Venue Category_Airport Service,Venue Category_Airport Terminal,Venue Category_American Restaurant,Venue Category_Antique Shop,Venue Category_Art Gallery,Venue Category_Art Museum,Venue Category_Arts & Crafts Store,Venue Category_Asian Restaurant,Venue Category_Athletics & Sports,Venue Category_Auto Garage,Venue Category_BBQ Joint,Venue Category_Baby Store,Venue Category_Bagel Shop,Venue Category_Bakery,Venue Category_Bank,Venue Category_Bar,Venue Category_Baseball Field,Venue Category_Basketball Court,Venue Category_Basketball Stadium,Venue Category_Beer Bar,Venue Category_Beer Store,Venue Category_Belgian Restaurant,Venue Category_Bike Shop,Venue Category_Bistro,Venue Category_Board Shop,Venue Category_Boat or Ferry,Venue Category_Bookstore,Venue Category_Boutique,Venue Category_Breakfast Spot,Venue Category_Brewery,Venue Category_Bridal Shop,Venue Category_Bubble Tea Shop,Venue Category_Burger Joint,...,Venue Category_Seafood Restaurant,Venue Category_Shopping Mall,Venue Category_Shopping Plaza,Venue Category_Skate Park,Venue Category_Skating Rink,Venue Category_Smoke Shop,Venue Category_Smoothie Shop,Venue Category_Soccer Field,Venue Category_Social Club,Venue Category_Spa,Venue Category_Speakeasy,Venue Category_Sporting Goods Shop,Venue Category_Sports Bar,Venue Category_Stadium,Venue Category_Stationery Store,Venue Category_Steakhouse,Venue Category_Supermarket,Venue Category_Supplement Shop,Venue Category_Sushi Restaurant,Venue Category_Swim School,Venue Category_Tailor Shop,Venue Category_Taiwanese Restaurant,Venue Category_Tanning Salon,Venue Category_Tea Room,Venue Category_Thai Restaurant,Venue Category_Theater,Venue Category_Theme Restaurant,Venue Category_Tibetan Restaurant,Venue Category_Toy / Game Store,Venue Category_Trail,Venue Category_Train Station,Venue Category_Truck Stop,Venue Category_Vegetarian / Vegan Restaurant,Venue Category_Video Game Store,Venue Category_Vietnamese Restaurant,Venue Category_Warehouse Store,Venue Category_Wine Bar,Venue Category_Wings Joint,Venue Category_Women's Store,Venue Category_Yoga Studio
0,Agincourt,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,"Alderwood, Long Branch",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,"Bathurst Manor, Wilson Heights, Downsview North",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.045455,0.0,0.0,...,0.0,0.045455,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.045455,0.0,0.045455,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Bayview Village,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,"Bedford Park, Lawrence Manor East",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.038462,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.038462,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.038462,0.0,0.0,0.0,0.0,0.0,0.038462,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
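Because every category column holds 0 or 1, its mean within a neighbourhood is the share of that neighbourhood's venues in the category. A toy check on a hypothetical one-hot frame:

```python
import pandas as pd

# Hypothetical one-hot rows: two Parkwoods venues, one Victoria Village venue
onehot = pd.DataFrame({'Neighbourhood': ['Parkwoods', 'Parkwoods', 'Victoria Village'],
                       'Venue Category_Food & Drink Shop': [0, 1, 0],
                       'Venue Category_Park': [1, 0, 1]})

# Mean of a 0/1 column per group = fraction of the group's venues in that category
freq = onehot.groupby('Neighbourhood').mean().reset_index()
print(freq)
```

Each row of the result sums to 1, which is why a neighbourhood with only a few venues (such as Agincourt's values of 0.2) shows coarse fractions.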


Function that returns the `num_top_venues` most frequent venue categories for one neighbourhood row

In [119]:
def get_most_popular(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]
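Applied to one row of `groups`, the function skips the first entry (the neighbourhood name) and returns the column labels with the largest frequencies. A quick check on a hypothetical row with made-up frequencies:

```python
import pandas as pd


def get_most_popular(row, num_top_venues):
    # Drop the neighbourhood name, then rank the category frequencies from high to low
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    return row_categories_sorted.index.values[0:num_top_venues]


# Hypothetical row shaped like a row of `groups`
row = pd.Series({'Neighbourhood': 'Example',
                 'Venue Category_Bank': 0.3,
                 'Venue Category_Coffee Shop': 0.5,
                 'Venue Category_Park': 0.2})

top = get_most_popular(row, 2)
print(top)
```

When several categories share the same frequency (Bayview Village has four at 0.25), the order among them is not guaranteed, so ranks within a tie carry no meaning.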

Build a table of the 10 most common venue categories in each neighbourhood

In [125]:
num_top = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighbourhood']
for ind in np.arange(num_top):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
venues_sorted = pd.DataFrame(columns=columns)
venues_sorted['Neighbourhood'] = groups['Neighbourhood']

# Fill the ranked categories for every neighbourhood (one row per row of `groups`)
for ind in np.arange(groups.shape[0]):
    venues_sorted.iloc[ind, 1:] = get_most_popular(groups.iloc[ind, :], num_top)

venues_sorted.head()

,Neighbourhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Agincourt,Venue Category_Lounge,Venue Category_Latin American Restaurant,Venue Category_Skating Rink,Venue Category_Clothing Store,Venue Category_Breakfast Spot,Venue Category_Deli / Bodega,Venue Category_Eastern European Restaurant,Venue Category_Drugstore,Venue Category_Donut Shop,Venue Category_Dog Run
1,"Alderwood, Long Branch",Venue Category_Pizza Place,Venue Category_Sandwich Place,Venue Category_Coffee Shop,Venue Category_Pub,Venue Category_Gym,Venue Category_Airport Gate,Venue Category_Dance Studio,Venue Category_Electronics Store,Venue Category_Eastern European Restaurant,Venue Category_Drugstore
2,"Bathurst Manor, Wilson Heights, Downsview North",Venue Category_Coffee Shop,Venue Category_Bank,Venue Category_Deli / Bodega,Venue Category_Supermarket,Venue Category_Diner,Venue Category_Shopping Mall,Venue Category_Bridal Shop,Venue Category_Sandwich Place,Venue Category_Restaurant,Venue Category_Ice Cream Shop
3,Bayview Village,Venue Category_Chinese Restaurant,Venue Category_Bank,Venue Category_Japanese Restaurant,Venue Category_Café,Venue Category_Yoga Studio,Venue Category_Department Store,Venue Category_Electronics Store,Venue Category_Eastern European Restaurant,Venue Category_Drugstore,Venue Category_Donut Shop
4,"Bedford Park, Lawrence Manor East",Venue Category_Italian Restaurant,Venue Category_Sandwich Place,Venue Category_Coffee Shop,Venue Category_Restaurant,Venue Category_Liquor Store,Venue Category_Butcher,Venue Category_Indian Restaurant,Venue Category_Pub,Venue Category_Café,Venue Category_Sushi Restaurant
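The `try`/`except` loop gives '1st', '2nd', '3rd' and then 'th' for every later position. That is correct for the ten columns used here, but would produce '21th' if `num_top` were raised past 20. A small hypothetical `ordinal` helper that covers the general case:

```python
def ordinal(n):
    """Return '1st', '2nd', '3rd', '4th', ..., '11th', '21st', ... for positive integers."""
    if 10 <= n % 100 <= 20:
        suffix = 'th'  # 11th, 12th, 13th are exceptions to the last-digit rule
    else:
        suffix = {1: 'st', 2: 'nd', 3: 'rd'}.get(n % 10, 'th')
    return f'{n}{suffix}'


num_top = 10
columns = ['Neighbourhood'] + [f'{ordinal(i)} Most Common Venue' for i in range(1, num_top + 1)]
print(columns)
```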


Cluster the neighbourhoods with k-means (k = 4)

In [126]:
k_num_clusters = 4

groups_clustering = groups.drop(columns='Neighbourhood')

kmeans = KMeans(n_clusters=k_num_clusters, random_state=0).fit(groups_clustering)
kmeans

KMeans(algorithm='auto', copy_x=True, init='k-means++', max_iter=300,
       n_clusters=4, n_init=10, n_jobs=None, precompute_distances='auto',
       random_state=0, tol=0.0001, verbose=0)
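k-means assigns each row of `groups_clustering` to the nearest of k centroids, and `kmeans.labels_` lists those assignments in the same row order as the input. A toy run on hypothetical two-category frequency vectors (columns: Park, Coffee Shop):

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical rows: two park-heavy neighbourhoods and two coffee-heavy ones
X = np.array([[0.9, 0.1],
              [0.8, 0.2],
              [0.1, 0.9],
              [0.2, 0.8]])

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(km.labels_)           # one cluster id per input row, in row order
print(km.cluster_centers_)  # mean frequency vector of each cluster
```

The label values themselves are arbitrary ids; only which rows share a label matters.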

Add cluster column to dataframe

In [129]:
new_data[:-3]

,PostalCode,Borough,Neighborhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.753259,-79.329656
1,M4A,North York,Victoria Village,43.725882,-79.315572
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.654260,-79.360636
3,M6A,North York,"Lawrence Manor, Lawrence Heights",43.718518,-79.464763
4,M7A,Queen's Park,Ontario Provincial Government,43.662301,-79.389494
...,...,...,...,...,...
95,M1X,Scarborough,Upper Rouge,43.836125,-79.205636
96,M4X,Downtown Toronto,"St. James Town, Cabbagetown",43.667967,-79.367675
97,M5X,Downtown Toronto,"First Canadian Place, Underground city",43.648429,-79.382280
98,M8X,Etobicoke,"The Kingsway, Montgomery Road, Old Mill North",43.653654,-79.506944


In [132]:
# kmeans.labels_ follows the row order of `groups`, which venues_sorted shares
venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

# Match rows on the neighbourhood name; joining on row position would pair
# postal codes with other neighbourhoods' venues and cluster labels
result = new_data.merge(venues_sorted, left_on='Neighborhood', right_on='Neighbourhood', how='left')
result.head()


In [134]:
result = result.dropna(subset=['Cluster Labels'])
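`kmeans.labels_` is ordered like `groups`, which `groupby` sorts alphabetically by neighbourhood, while `new_data` follows the order of the Wikipedia table, so combining the two by row position pairs rows from different neighbourhoods. Matching on the name avoids that. A toy sketch; the cluster labels are made up, and `Upper Rouge` stands in for a neighbourhood with no venues:

```python
import pandas as pd

# Stand-in for new_data (Wikipedia/postal-code order)
postal = pd.DataFrame({'PostalCode': ['M3A', 'M4A', 'M5A', 'M1X'],
                       'Neighborhood': ['Parkwoods', 'Victoria Village',
                                        'Regent Park, Harbourfront', 'Upper Rouge']})
# Stand-in for venues_sorted (alphabetical order, hypothetical labels)
labels = pd.DataFrame({'Cluster Labels': [0, 2, 1],
                       'Neighbourhood': ['Parkwoods', 'Regent Park, Harbourfront',
                                         'Victoria Village']})

# Left merge keeps postal-code order; unmatched names get NaN and are then dropped
result = postal.merge(labels, left_on='Neighborhood', right_on='Neighbourhood', how='left')
result = result.dropna(subset=['Cluster Labels'])
print(result[['PostalCode', 'Neighborhood', 'Cluster Labels']])
```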

In [136]:
geolocator = Nominatim(user_agent="toronto_explorer")
location = geolocator.geocode('Toronto, Ontario')
latitude = location.latitude
longitude = location.longitude

map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# One evenly spaced rainbow colour per cluster, as hex strings for folium
colors_array = cm.rainbow(np.linspace(0, 1, k_num_clusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

for lat, lon, poi, cluster in zip(result['Latitude'], result['Longitude'], result['Neighbourhood'], result['Cluster Labels']):
    # Labels are 0-based; the popup shows them 1-based
    label = folium.Popup('Cluster ' + str(int(cluster) + 1) + '\n' + str(poi), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[int(cluster)],
        fill=True,
        fill_color=rainbow[int(cluster)]
    ).add_to(map_clusters)

map_clusters
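The marker colours are the rainbow colormap sampled once per cluster. A standalone sketch of that mapping, assuming (as in the notebook) 0-based labels from `kmeans.labels_`, so a label can index the colour list directly:

```python
import numpy as np
import matplotlib.cm as cm
import matplotlib.colors as colors

k_num_clusters = 4

# Sample the colormap at k evenly spaced points in [0, 1] and convert each RGBA value to '#rrggbb'
rainbow = [colors.rgb2hex(c) for c in cm.rainbow(np.linspace(0, 1, k_num_clusters))]

for cluster in range(k_num_clusters):
    # Cluster 0 gets the first colour, cluster k-1 the last
    print('Cluster', cluster + 1, rainbow[cluster])
```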