# IBM Data Science Captstone Project Python Notebook

We will use this notebook to complete Capstone project

In [1]:
import pandas as pd
import numpy as np

In [2]:
print('Hello Capstone Project Course!')

Hello Capstone Project Course!


# Requirement

- The dataframe will consist of three columns: PostalCode, Borough, and Neighborhood
- Only process the cells that have an assigned borough. Ignore cells with a borough that is Not assigned.
- More than one neighborhood can exist in one postal code area. For example, in the table on the Wikipedia page, you will notice that M5A is listed twice and has two neighborhoods: Harbourfront and Regent Park. These two rows will be combined into one row with the neighborhoods separated with a comma as shown in row 11  in the above table.
- If a cell has a borough but a Not assigned  neighborhood, then the neighborhood will be the same as the borough.
- Clean your Notebook and add Markdown cells to explain your work and any assumptions you are making.
- In the last cell of your notebook, use the .shape method to print the number of rows of your dataframe.

# Q1 - Shape of our data

In [3]:
import requests
from bs4 import BeautifulSoup

In [4]:
url = "https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M"
wiki_url = requests.get(url)
wiki_url = requests.get(url).text

In [5]:
soup=BeautifulSoup(wiki_url,"html.parser")

Lets generate dataframe from the data we scraped

In [84]:
table_contents=[]
table=soup.find('table')
for row in table.findAll('td'):
    cell = {}
    if row.span.text=='Not assigned':
        pass
    else:
        cell['PostalCode'] = row.p.text[:3]
        cell['Borough'] = (row.span.text).split('(')[0]
        cell['Neighbourhood'] = (((((row.span.text).split('(')[1]).strip(')')).replace(' /',',')).replace(')',' ')).strip(' ')
        table_contents.append(cell)

# print(table_contents)
df=pd.DataFrame(table_contents)
df['Borough']=df['Borough'].replace({'Downtown TorontoStn A PO Boxes25 The Esplanade':'Downtown Toronto Stn A',
                                             'East TorontoBusiness reply mail Processing Centre969 Eastern':'East Toronto Business',
                                             'EtobicokeNorthwest':'Etobicoke Northwest','East YorkEast Toronto':'East York/East Toronto',
                                             'MississaugaCanada Post Gateway Processing Centre':'Mississauga'})


In [85]:
df.head()

Unnamed: 0,PostalCode,Borough,Neighbourhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,"Regent Park, Harbourfront"
3,M6A,North York,"Lawrence Manor, Lawrence Heights"
4,M7A,Queen's Park,Ontario Provincial Government


We will now group our data based on Postal Code

In [86]:
df = df.groupby(['PostalCode']).head()
df

Unnamed: 0,PostalCode,Borough,Neighbourhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,"Regent Park, Harbourfront"
3,M6A,North York,"Lawrence Manor, Lawrence Heights"
4,M7A,Queen's Park,Ontario Provincial Government
...,...,...,...
98,M8X,Etobicoke,"The Kingsway, Montgomery Road, Old Mill North"
99,M4Y,Downtown Toronto,Church and Wellesley
100,M7Y,East Toronto Business,Enclave of M4L
101,M8Y,Etobicoke,"Old Mill South, King's Mill Park, Sunnylea, Hu..."


In [9]:
df.shape

(103, 3)

For 1st Section, we have 103 rows and 3 columns in our df 

# Q2 - Geocoder

In [12]:
#!pip install geocoder

In [13]:
import geocoder

# initialize your variable to None
lat_lng_coords = None
postal_code = 'M1B'

# loop until you get the coordinates
while(lat_lng_coords is None):
  g = geocoder.google('{}, Toronto, Ontario'.format(postal_code))
  lat_lng_coords = g.latlng

latitude = lat_lng_coords[0]
longitude = lat_lng_coords[1]

Worked on the above code however, it was taking ages to run so will use csv file which was provided 

In [14]:
data = pd.read_csv('Geospatial_Coordinates.csv')
data.head()

Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


Let's check the rows and columns of our datasets and df

In [15]:
print("The shape of df data is: ", df.shape)
print("The shape of data is: ", data.shape)

The shape of df data is:  (103, 3)
The shape of data is:  (103, 3)


Since both datasets have similar columns and rows, we can join our data, lets see how many common columns we have to join our dataset

In [16]:
print("Column names in df dataset is as follows: ",df.columns)
print("Column names in data is as follows: ", data.columns)

Column names in df dataset is as follows:  Index(['PostalCode', 'Borough', 'Neighborhood'], dtype='object')
Column names in data is as follows:  Index(['Postal Code', 'Latitude', 'Longitude'], dtype='object')


We can use Postal Code as a column to join our dataset

Since column name on data is Postal Code lets change it to PostalCode so it matches with df data

In [87]:
data.rename(columns={'Postal Code':'PostalCode'}, inplace=True)
data.head()

Unnamed: 0,PostalCode,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


In [88]:
combined_data = df.join(data.set_index('PostalCode'), on='PostalCode', how='inner')
combined_data

Unnamed: 0,PostalCode,Borough,Neighbourhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.753259,-79.329656
1,M4A,North York,Victoria Village,43.725882,-79.315572
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.654260,-79.360636
3,M6A,North York,"Lawrence Manor, Lawrence Heights",43.718518,-79.464763
4,M7A,Queen's Park,Ontario Provincial Government,43.662301,-79.389494
...,...,...,...,...,...
98,M8X,Etobicoke,"The Kingsway, Montgomery Road, Old Mill North",43.653654,-79.506944
99,M4Y,Downtown Toronto,Church and Wellesley,43.665860,-79.383160
100,M7Y,East Toronto Business,Enclave of M4L,43.662744,-79.321558
101,M8Y,Etobicoke,"Old Mill South, King's Mill Park, Sunnylea, Hu...",43.636258,-79.498509


We have sucessfully joined our df with data report

Lets check the shape of our new data

In [19]:
combined_data.shape

(103, 5)

## End of 2nd Question

----------------------------- #### --------------------------

# Quetion 3

Explore and cluster the neighborhoods in Toronto. You can decide to work with only boroughs that contain the word Toronto and then replicate the same analysis we did to the New York City data. It is up to you. 

Just make sure:

- to add enough Markdown cells to explain what you decided to do and to report any observations you make. 
- to generate maps to visualize your neighborhoods and how they cluster together

Lets get the co-ordinates of Toronto, Ontario

In [21]:
#!pip install geopy

In [22]:
import geocoder
from geopy.geocoders import Nominatim 

address = 'Toronto, Ontario'

geolocator = Nominatim(user_agent="toronto_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('These are the coordinates of Toronto {}, {}.'.format(latitude,longitude))

These are the coordinates of Toronto 43.6534817, -79.3839347.


## Now we will create the map of Toronto

In [24]:
#!pip install folium

In [25]:
import folium

In [89]:
# Map of Toronto
map_Toronto = folium.Map(location=[latitude, longitude],zoom_start=12)

# Adding Markers
for latitude, longitude, borough, neighbourhood in zip(combined_data['Latitude'],combined_data['Longitude'],
                                                      combined_data['Borough'],combined_data['Neighbourhood']):
    label = '{}, {}'.format(neighbourhood,borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
    [latitude,longitude],
    radius=5,
    popup=label,
    color='green',
    fill=True).add_to(map_Toronto)

map_Toronto

Initilizing Foursquare API credientials

In [27]:
CLIENT_ID = 'CPU0ZRB5X0E2P0ZNNCCORUDHHU0P2ZZ3OLEQAWP5J2GCPIDY' 
CLIENT_SECRET = '2ZMGRNSU0ENK5UWVMR30HSSR5K1RLXCEVW0GJBKFW2EHJHT1'
VERSION = '20180604' # Foursquare API version

print('Your credentails:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentails:
CLIENT_ID: CPU0ZRB5X0E2P0ZNNCCORUDHHU0P2ZZ3OLEQAWP5J2GCPIDY
CLIENT_SECRET:2ZMGRNSU0ENK5UWVMR30HSSR5K1RLXCEVW0GJBKFW2EHJHT1


Creating a function to get all the venues categories in Toronto

Seeing the venues in Toronto for each Neighborhood

In [90]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius
            )
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']

        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighbourhood', 
                  'Neighbourhood Latitude', 
                  'Neighbourhood Longitude', 
                  'Venue', 
                  'Venue Category']
    
    return(nearby_venues)

In [91]:
venues_in_toronto = getNearbyVenues(combined_data['Neighbourhood'], combined_data['Latitude'], combined_data['Longitude'])


Parkwoods
Victoria Village
Regent Park, Harbourfront
Lawrence Manor, Lawrence Heights
Ontario Provincial Government
Islington Avenue
Malvern, Rouge
Don Mills North
Parkview Hill, Woodbine Gardens
Garden District, Ryerson
Glencairn
West Deane Park, Princess Gardens, Martin Grove, Islington, Cloverdale
Rouge Hill, Port Union, Highland Creek
Don Mills South
Woodbine Heights
St. James Town
Humewood-Cedarvale
Eringate, Bloordale Gardens, Old Burnhamthorpe, Markland Wood
Guildwood, Morningside, West Hill
The Beaches
Berczy Park
Caledonia-Fairbanks
Woburn
Leaside
Central Bay Street
Christie
Cedarbrae
Hillcrest Village
Bathurst Manor, Wilson Heights, Downsview North
Thorncliffe Park
Richmond, Adelaide, King
Dufferin, Dovercourt Village
Scarborough Village
Fairview, Henry Farm, Oriole
Northwood Park, York University
The Danforth  East
Harbourfront East, Union Station, Toronto Islands
Little Portugal, Trinity
Kennedy Park, Ionview, East Birchmount Park
Bayview Village
Downsview East
The Danforth

In [92]:
venues_in_toronto.shape

(1342, 5)

So we have 1342 rows and 5 columns in the data, lets look at our data

In [93]:
venues_in_toronto.head()

Unnamed: 0,Neighbourhood,Neighbourhood Latitude,Neighbourhood Longitude,Venue,Venue Category
0,Parkwoods,43.753259,-79.329656,Brookbanks Park,Park
1,Parkwoods,43.753259,-79.329656,KFC,Fast Food Restaurant
2,Parkwoods,43.753259,-79.329656,Variety Store,Food & Drink Shop
3,Victoria Village,43.725882,-79.315572,Victoria Village Arena,Hockey Arena
4,Victoria Village,43.725882,-79.315572,Tim Hortons,Coffee Shop


Lets check the venues by Neighborhood

In [94]:
venues_in_toronto.groupby('Neighbourhood').head()

Unnamed: 0,Neighbourhood,Neighbourhood Latitude,Neighbourhood Longitude,Venue,Venue Category
0,Parkwoods,43.753259,-79.329656,Brookbanks Park,Park
1,Parkwoods,43.753259,-79.329656,KFC,Fast Food Restaurant
2,Parkwoods,43.753259,-79.329656,Variety Store,Food & Drink Shop
3,Victoria Village,43.725882,-79.315572,Victoria Village Arena,Hockey Arena
4,Victoria Village,43.725882,-79.315572,Tim Hortons,Coffee Shop
...,...,...,...,...,...
1326,"Mimico NW, The Queensway West, South of Bloor,...",43.628841,-79.520999,South St. Burger,Burger Joint
1327,"Mimico NW, The Queensway West, South of Bloor,...",43.628841,-79.520999,Wingporium,Wings Joint
1328,"Mimico NW, The Queensway West, South of Bloor,...",43.628841,-79.520999,Dollarama,Discount Store
1329,"Mimico NW, The Queensway West, South of Bloor,...",43.628841,-79.520999,Healthy Planet,Supplement Shop


In [95]:
venues_in_toronto.groupby('Venue Category').max()

Unnamed: 0_level_0,Neighbourhood,Neighbourhood Latitude,Neighbourhood Longitude,Venue
Venue Category,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
Accessories Store,"Lawrence Manor, Lawrence Heights",43.718518,-79.464763,Ardene Shoes Outlet
Adult Boutique,Church and Wellesley,43.665860,-79.383160,Seduction
Airport,Downsview East,43.737473,-79.394420,Toronto Downsview Airport (YZD)
Airport Food Court,"CN Tower, King and Spadina, Railway Lands, Har...",43.628947,-79.394420,Billy Bishop Café
Airport Gate,"CN Tower, King and Spadina, Railway Lands, Har...",43.628947,-79.394420,Gate 8
...,...,...,...,...
Wine Bar,"Little Portugal, Trinity",43.653206,-79.400049,Paris Paris Bar
Wine Shop,"Dufferin, Dovercourt Village",43.669005,-79.442259,Macedo
Wings Joint,"Mimico NW, The Queensway West, South of Bloor,...",43.628841,-79.520999,Wingporium
Women's Store,Caledonia-Fairbanks,43.689026,-79.453512,Maximum Woman


In [96]:
venues_in_toronto.groupby('Venue Category').count()

Unnamed: 0_level_0,Neighbourhood,Neighbourhood Latitude,Neighbourhood Longitude,Venue
Venue Category,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
Accessories Store,1,1,1,1
Adult Boutique,1,1,1,1
Airport,2,2,2,2
Airport Food Court,1,1,1,1
Airport Gate,1,1,1,1
...,...,...,...,...
Wine Bar,2,2,2,2
Wine Shop,1,1,1,1
Wings Joint,1,1,1,1
Women's Store,1,1,1,1


So there are around 430 Neighborhood and 244 different Venue Categories

## We will use One hot encoder on Venue Category 

In [97]:
toronto_venue_category = pd.get_dummies(venues_in_toronto[['Venue Category']],prefix="", prefix_sep="")

# Moving Neighbourhood column in the beginnign
toronto_venue_category['Neighbourhood'] = venues_in_toronto['Neighbourhood']
fixed_column = [toronto_venue_category.columns[-1]] + list(toronto_venue_category.columns[:-1])
toronto_venue_category = toronto_venue_category[fixed_column]

toronto_venue_category.head()


Unnamed: 0,Neighbourhood,Accessories Store,Adult Boutique,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,...,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Warehouse Store,Wine Bar,Wine Shop,Wings Joint,Women's Store,Yoga Studio
0,Parkwoods,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,Parkwoods,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,Parkwoods,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,Victoria Village,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,Victoria Village,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


We will group our data by Neighbourhoods, calculate the mean of Venue category in each neighbourhood

In [98]:
toronto_grouped = toronto_venue_category.groupby('Neighbourhood').mean().reset_index()
toronto_grouped.head()

Unnamed: 0,Neighbourhood,Accessories Store,Adult Boutique,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,...,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Warehouse Store,Wine Bar,Wine Shop,Wings Joint,Women's Store,Yoga Studio
0,Agincourt,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,"Alderwood, Long Branch",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,"Bathurst Manor, Wilson Heights, Downsview North",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Bayview Village,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,"Bedford Park, Lawrence Manor East",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


Lets make a function to get the top common venue category

In [99]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

We will take 15 top to cluster our neighbourhood

In [100]:
num_top_venues = 15

indicators = ['st', 'nd', 'rd']

# create columns accordingly to the number of top venues
columns = ['Neighbourhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))


# Lets Create a new Dataframe
n_venue_sorted = pd.DataFrame(columns=columns)
n_venue_sorted['Neighbourhood'] = toronto_grouped['Neighbourhood']

for ind in np.arange(toronto_grouped.shape[0]):
    n_venue_sorted.iloc[ind,1:] = return_most_common_venues(toronto_grouped.iloc[ind, :], num_top_venues)

n_venue_sorted.head()

Unnamed: 0,Neighbourhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,11th Most Common Venue,12th Most Common Venue,13th Most Common Venue,14th Most Common Venue,15th Most Common Venue
0,Agincourt,Breakfast Spot,Lounge,Latin American Restaurant,Accessories Store,Neighborhood,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Mobile Phone Shop,Modern European Restaurant,Monument / Landmark,Motel,Movie Theater,Museum,Music Venue
1,"Alderwood, Long Branch",Pizza Place,Playground,Sandwich Place,Gym,Pub,Pharmacy,Coffee Shop,Athletics & Sports,Monument / Landmark,Museum,Movie Theater,Motel,Miscellaneous Shop,Modern European Restaurant,Mobile Phone Shop
2,"Bathurst Manor, Wilson Heights, Downsview North",Bank,Coffee Shop,Pizza Place,Shopping Mall,Fried Chicken Joint,Frozen Yogurt Shop,Sandwich Place,Supermarket,Chinese Restaurant,Middle Eastern Restaurant,Gas Station,Restaurant,Gift Shop,Diner,Park
3,Bayview Village,Café,Japanese Restaurant,Chinese Restaurant,Bank,Accessories Store,Museum,Neighborhood,Music Venue,Movie Theater,Nightclub,Motel,Monument / Landmark,Modern European Restaurant,Mobile Phone Shop,Miscellaneous Shop
4,"Bedford Park, Lawrence Manor East",Coffee Shop,Sandwich Place,Italian Restaurant,Toy / Game Store,Breakfast Spot,Butcher,Café,Sushi Restaurant,Liquor Store,Thai Restaurant,Fast Food Restaurant,Restaurant,Juice Bar,Comfort Food Restaurant,Indian Restaurant


## Let's start making Clusters

In [101]:
from sklearn.cluster import KMeans

In [102]:
# Setting the custer number to 5

k_num_cluster = 5

toronto_grouped_clustering = toronto_grouped.drop('Neighbourhood', 1)

# running k means
kmeans = KMeans(n_clusters=k_num_cluster, random_state=0).fit(toronto_grouped_clustering)
kmeans

KMeans(n_clusters=5, random_state=0)

In [103]:
# Lets view the labels
kmeans.labels_

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0,
       2, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 2, 0, 0, 0,
       1, 0, 0, 1, 0, 0, 0, 0, 4, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 2, 0, 0,
       0, 0, 0, 0, 3, 0, 0, 0, 3, 0, 0, 1])

In [104]:
# Adding the labels to the top 15 common venue categories
n_venue_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

Lets Join Toronto_grouped with Combined data, so we can plot the data

In [107]:
toronto_merged = combined_data

toronto_merged = toronto_merged.join(n_venue_sorted.set_index('Neighbourhood'), on='Neighbourhood')

toronto_merged.head()

Unnamed: 0,PostalCode,Borough,Neighbourhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,...,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,11th Most Common Venue,12th Most Common Venue,13th Most Common Venue,14th Most Common Venue,15th Most Common Venue
0,M3A,North York,Parkwoods,43.753259,-79.329656,1.0,Fast Food Restaurant,Food & Drink Shop,Park,Music Venue,...,Men's Store,Metro Station,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Mobile Phone Shop,Modern European Restaurant,Monument / Landmark,Motel,Movie Theater
1,M4A,North York,Victoria Village,43.725882,-79.315572,0.0,Portuguese Restaurant,Pizza Place,Hockey Arena,Coffee Shop,...,Accessories Store,Modern European Restaurant,Museum,Movie Theater,Motel,Monument / Landmark,Miscellaneous Shop,Mobile Phone Shop,Neighborhood,Middle Eastern Restaurant
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636,0.0,Coffee Shop,Park,Bakery,Theater,...,Yoga Studio,Pub,Performing Arts Venue,Dessert Shop,Restaurant,Distribution Center,Chocolate Shop,Café,Farmers Market,Spa
3,M6A,North York,"Lawrence Manor, Lawrence Heights",43.718518,-79.464763,0.0,Clothing Store,Furniture / Home Store,Accessories Store,Vietnamese Restaurant,...,Coffee Shop,Boutique,Pharmacy,Pet Store,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Playground,Mobile Phone Shop,Modern European Restaurant
4,M7A,Queen's Park,Ontario Provincial Government,43.662301,-79.389494,0.0,Coffee Shop,Sushi Restaurant,Yoga Studio,Creperie,...,Burrito Place,Smoothie Shop,Café,Sandwich Place,College Auditorium,Persian Restaurant,Bar,Diner,Distribution Center,Park


In [108]:
# Lets drop NaN values (if any)

toronto_merged_nonan = toronto_merged.dropna(subset=['Cluster Labels'])

Plotting our clusters

In [109]:
import matplotlib.cm as cm
import matplotlib.colors as colors

In [110]:
map_clusters = folium.Map(location=[latitude, longitude],zoom_start=11)

# Setting colors
x = np.arange(k_num_cluster)
ys = [i + x + (i*x)**2 for i in range(k_num_cluster)]
colors_array = cm.rainbow(np.linspace(0,1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(toronto_merged_nonan['Latitude'],toronto_merged_nonan['Longitude'],
                                  toronto_merged_nonan['Neighbourhood'],toronto_merged_nonan['Cluster Labels']):
    label=folium.Popup('Cluster '+ str(int(cluster) +1) + '\n' + str(poi), parse_html=True)
    folium.CircleMarker([lat, lon],
                       radius=5,
                       popup=label,
                       color=rainbow[int(cluster-1)],
                       fill=True,
                       fill_color=rainbow[int(cluster-1)]).add_to(map_clusters)
map_clusters

Lets Verify our Clusters

### Cluster 1

In [112]:
toronto_merged_nonan.loc[toronto_merged_nonan['Cluster Labels'] == 0, toronto_merged_nonan.columns[[1]+list(range(5, toronto_merged_nonan.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,11th Most Common Venue,12th Most Common Venue,13th Most Common Venue,14th Most Common Venue,15th Most Common Venue
1,North York,0.0,Portuguese Restaurant,Pizza Place,Hockey Arena,Coffee Shop,French Restaurant,Accessories Store,Modern European Restaurant,Museum,Movie Theater,Motel,Monument / Landmark,Miscellaneous Shop,Mobile Phone Shop,Neighborhood,Middle Eastern Restaurant
2,Downtown Toronto,0.0,Coffee Shop,Park,Bakery,Theater,Breakfast Spot,Yoga Studio,Pub,Performing Arts Venue,Dessert Shop,Restaurant,Distribution Center,Chocolate Shop,Café,Farmers Market,Spa
3,North York,0.0,Clothing Store,Furniture / Home Store,Accessories Store,Vietnamese Restaurant,Event Space,Coffee Shop,Boutique,Pharmacy,Pet Store,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Playground,Mobile Phone Shop,Modern European Restaurant
4,Queen's Park,0.0,Coffee Shop,Sushi Restaurant,Yoga Studio,Creperie,Spa,Burrito Place,Smoothie Shop,Café,Sandwich Place,College Auditorium,Persian Restaurant,Bar,Diner,Distribution Center,Park
6,Scarborough,0.0,Fast Food Restaurant,Accessories Store,Men's Store,Metro Station,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Mobile Phone Shop,Modern European Restaurant,Monument / Landmark,Motel,Movie Theater,Museum,Music Venue,Neighborhood
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
96,Downtown Toronto,0.0,Café,Italian Restaurant,Bakery,Restaurant,Pub,Park,Butcher,Caribbean Restaurant,Playground,Market,Liquor Store,Beer Store,Taiwanese Restaurant,Coffee Shop,Thai Restaurant
97,Downtown Toronto,0.0,Café,Coffee Shop,Restaurant,Hotel,Seafood Restaurant,Pizza Place,Pub,Speakeasy,Steakhouse,Tea Room,Gastropub,Gluten-free Restaurant,Japanese Restaurant,Bakery,Art Gallery
99,Downtown Toronto,0.0,Pub,Beer Bar,Martial Arts School,Men's Store,Bookstore,Mexican Restaurant,Bubble Tea Shop,Burger Joint,Café,Ethiopian Restaurant,Escape Room,Salon / Barbershop,Coffee Shop,Restaurant,Park
100,East Toronto Business,0.0,Light Rail Station,Park,Garden,Garden Center,Brewery,Auto Workshop,Skate Park,Restaurant,Spa,Burrito Place,Gym / Fitness Center,Farmers Market,Fast Food Restaurant,Comic Shop,Pizza Place


### Cluster 2

In [113]:
toronto_merged_nonan.loc[toronto_merged_nonan['Cluster Labels'] == 1, toronto_merged_nonan.columns[[1]+list(range(5, toronto_merged_nonan.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,11th Most Common Venue,12th Most Common Venue,13th Most Common Venue,14th Most Common Venue,15th Most Common Venue
0,North York,1.0,Fast Food Restaurant,Food & Drink Shop,Park,Music Venue,Mediterranean Restaurant,Men's Store,Metro Station,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Mobile Phone Shop,Modern European Restaurant,Monument / Landmark,Motel,Movie Theater
21,York,1.0,Park,Women's Store,Bar,Accessories Store,Modern European Restaurant,Museum,Movie Theater,Motel,Monument / Landmark,Mobile Phone Shop,Neighborhood,Miscellaneous Shop,Middle Eastern Restaurant,Mexican Restaurant,Metro Station
35,East York/East Toronto,1.0,Convenience Store,Park,Accessories Store,New American Restaurant,Metro Station,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Mobile Phone Shop,Modern European Restaurant,Monument / Landmark,Motel,Movie Theater,Museum,Music Venue
49,North York,1.0,Basketball Court,Construction & Landscaping,Bakery,Park,Accessories Store,Museum,Movie Theater,Motel,Monument / Landmark,Modern European Restaurant,Neighborhood,Mobile Phone Shop,Miscellaneous Shop,Middle Eastern Restaurant,Mexican Restaurant
61,Central Toronto,1.0,Photography Studio,Swim School,Park,Bus Line,Accessories Store,Music Venue,Metro Station,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Mobile Phone Shop,Modern European Restaurant,Monument / Landmark,Motel,Movie Theater
66,North York,1.0,Convenience Store,Electronics Store,Park,Accessories Store,Neighborhood,Metro Station,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Mobile Phone Shop,Modern European Restaurant,Monument / Landmark,Motel,Movie Theater,Museum
68,Central Toronto,1.0,Jewelry Store,Park,Trail,Sushi Restaurant,Museum,Movie Theater,Motel,Monument / Landmark,Modern European Restaurant,Accessories Store,Music Venue,Miscellaneous Shop,Middle Eastern Restaurant,Mexican Restaurant,Metro Station
77,Etobicoke,1.0,Mobile Phone Shop,Park,Sandwich Place,Accessories Store,Neighborhood,Men's Store,Metro Station,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Modern European Restaurant,Monument / Landmark,Motel,Movie Theater,Museum
83,Central Toronto,1.0,Lawyer,Park,Tennis Court,Accessories Store,Music Venue,Men's Store,Metro Station,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Mobile Phone Shop,Modern European Restaurant,Monument / Landmark,Motel,Movie Theater
85,Scarborough,1.0,Playground,Intersection,Park,Accessories Store,Modern European Restaurant,Museum,Movie Theater,Motel,Monument / Landmark,Mobile Phone Shop,Neighborhood,Miscellaneous Shop,Middle Eastern Restaurant,Mexican Restaurant,Metro Station


### Cluster 3

In [114]:
toronto_merged_nonan.loc[toronto_merged_nonan['Cluster Labels'] == 2, toronto_merged_nonan.columns[[1]+list(range(5, toronto_merged_nonan.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,11th Most Common Venue,12th Most Common Venue,13th Most Common Venue,14th Most Common Venue,15th Most Common Venue
57,North York,2.0,Baseball Field,Accessories Store,Neighborhood,Men's Store,Metro Station,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Mobile Phone Shop,Modern European Restaurant,Monument / Landmark,Motel,Movie Theater,Museum,Music Venue
98,Etobicoke,2.0,Pool,River,Accessories Store,Neighborhood,Men's Store,Metro Station,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Mobile Phone Shop,Modern European Restaurant,Monument / Landmark,Motel,Movie Theater,Museum
101,Etobicoke,2.0,Pool,Construction & Landscaping,Baseball Field,New American Restaurant,Metro Station,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Mobile Phone Shop,Modern European Restaurant,Monument / Landmark,Motel,Movie Theater,Museum,Music Venue


### Cluster 4

In [115]:
toronto_merged_nonan.loc[toronto_merged_nonan['Cluster Labels'] == 3, toronto_merged_nonan.columns[[1]+list(range(5, toronto_merged_nonan.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,11th Most Common Venue,12th Most Common Venue,13th Most Common Venue,14th Most Common Venue,15th Most Common Venue
40,North York,3.0,Airport,Park,Accessories Store,Neighborhood,Men's Store,Metro Station,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Mobile Phone Shop,Modern European Restaurant,Monument / Landmark,Motel,Movie Theater,Museum
52,North York,3.0,Park,Accessories Store,Neighborhood,Men's Store,Metro Station,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Mobile Phone Shop,Modern European Restaurant,Monument / Landmark,Motel,Movie Theater,Museum,Music Venue
64,York,3.0,Park,Accessories Store,Neighborhood,Men's Store,Metro Station,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Mobile Phone Shop,Modern European Restaurant,Monument / Landmark,Motel,Movie Theater,Museum,Music Venue


### Cluster 5

In [117]:
toronto_merged_nonan.loc[toronto_merged_nonan['Cluster Labels'] == 4, toronto_merged_nonan.columns[[1]+list(range(5, toronto_merged_nonan.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,11th Most Common Venue,12th Most Common Venue,13th Most Common Venue,14th Most Common Venue,15th Most Common Venue
32,Scarborough,4.0,Playground,Accessories Store,Neighborhood,Men's Store,Metro Station,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Mobile Phone Shop,Modern European Restaurant,Monument / Landmark,Motel,Movie Theater,Museum,Music Venue


We have sucessfully Clustered our data into 5 Clusters