# "Segmenting and Clustering Neighborhoods in Toronto"
## Coursera Capstone Project for the Applied Data Science Specialization

## QUESTION 1:

First, we need to install and import the required libraries.

In [1]:

#!pip install pandas
#!pip install bs4
#!pip install requests

import pandas as pd  # we will be working with dataframes
from bs4 import BeautifulSoup  # this module helps with web scraping
import requests  # this module lets us download a web page

Let's create a BeautifulSoup object from the appropriate Wikipedia page, which lists all the boroughs, neighborhoods and postal codes for Toronto.

In [2]:
url = "https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M"
data  = requests.get(url).text
soup = BeautifulSoup(data, 'html.parser')

#print(soup.prettify())

Now we fill a list with dictionaries, each one containing the postal code, borough and neighborhood from one cell of the HTML table.

In [3]:
table_contents=[]
table = soup.find('table')
for row in table.findAll('td'):
    cell = {}
    if row.span.text=='Not assigned':
        pass
    else:
        cell['PostalCode'] = row.p.text[:3]
        cell['Borough'] = (row.span.text).split('(')[0]
        cell['Neighborhood'] = (((((row.span.text).split('(')[1]).strip(')')).replace(' /',',')).replace(')',' ')).strip(' ')
        table_contents.append(cell)

#print(table_contents)
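
The chained `split`/`strip`/`replace` calls in the cell above can be hard to follow. As a minimal sketch, the same idea can be written as a small helper and tried on sample strings. The helper name `split_cell` and the sample strings are assumptions for illustration, based on the `Borough(Neighborhood A / Neighborhood B)` pattern the loop expects; they are not read from the live page.

```python
def split_cell(span_text):
    """Split 'Borough(Hood A / Hood B)' into (borough, 'Hood A, Hood B').

    Hypothetical helper: assumes the borough comes first and the
    neighborhoods follow in parentheses, separated by ' / '.
    """
    borough, _, rest = span_text.partition('(')
    neighborhoods = rest.rstrip(')').replace(' /', ',').strip()
    return borough.strip(), neighborhoods


# Sample strings (assumed format, for illustration only)
print(split_cell('North York(Parkwoods)'))
print(split_cell('Downtown Toronto(Regent Park / Harbourfront)'))
```

Cells whose text deviates from this pattern (for example the processing-centre entries corrected in the next cell) would still need manual clean-up.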

Now we can create the dataframe and check its content.

In [4]:
df_tor_boroughs=pd.DataFrame(table_contents)
# a few substitutions/corrections of misspelled and imprecise names
df_tor_boroughs['Borough']=df_tor_boroughs['Borough'].replace({'Downtown TorontoStn A PO Boxes25 The Esplanade':'Downtown Toronto Stn A',
                                             'East TorontoBusiness reply mail Processing Centre969 Eastern':'East Toronto Business',
                                             'EtobicokeNorthwest':'Etobicoke Northwest','East YorkEast Toronto':'East York/East Toronto',
                                             'MississaugaCanada Post Gateway Processing Centre':'Mississauga'})

df_tor_boroughs.head(10)
                                             

Unnamed: 0,PostalCode,Borough,Neighborhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,"Regent Park, Harbourfront"
3,M6A,North York,"Lawrence Manor, Lawrence Heights"
4,M7A,Queen's Park,Ontario Provincial Government
5,M9A,Etobicoke,Islington Avenue
6,M1B,Scarborough,"Malvern, Rouge"
7,M3B,North York,Don Mills North
8,M4B,East York,"Parkview Hill, Woodbine Gardens"
9,M5B,Downtown Toronto,"Garden District, Ryerson"


Let's check how many rows the dataframe has

In [5]:
print("The dataframe has {} rows of different neighborhoods" .format(df_tor_boroughs.shape[0]))

The dataframe has 103 rows of different neighborhoods


## QUESTION 2:

We will need the pgeocode geocoder to find the latitude and longitude of our postal codes.
Let's install and import it.

In [6]:
#!pip install pgeocode
import pgeocode

Now we can make a list of all our postal codes and then query the coordinates of each one.

In [7]:

# geocoder restricted to Canadian ('ca') postal codes
geolocator = pgeocode.Nominatim('ca')
postal_codes = df_tor_boroughs['PostalCode'].tolist()
latitudes = []
longitudes = []
for postal_code in (postal_codes):
    latLong = geolocator.query_postal_code(postal_code)

    if not latLong.empty:
        #print(f'The Postal Code {postal_code} has Lat:{latLong.latitude}, Long:{latLong.longitude}')
        latitudes.append(latLong.latitude)
        longitudes.append(latLong.longitude)
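
The loop above appends a coordinate pair only when the lookup result is non-empty. In this run that was always the case (an unknown code such as M7R comes back with NaN values, as the output further down shows), but if a lookup were ever skipped, the lists would end up shorter than the dataframe and the column assignment would fail. A minimal sketch of the safer pattern, always appending one entry per postal code, is shown here. The `lookup` function and `KNOWN_COORDS` dictionary are hypothetical stand-ins for the geocoder, so the example runs offline.

```python
import math

# Hypothetical stand-in for a geocoder: two postal codes with the
# coordinates shown in this notebook's output; M7R is deliberately absent.
KNOWN_COORDS = {'M3A': (43.7545, -79.3300), 'M4A': (43.7276, -79.3148)}


def lookup(postal_code):
    """Return (lat, lng), or (nan, nan) when the code is unknown."""
    return KNOWN_COORDS.get(postal_code, (math.nan, math.nan))


postal_codes = ['M3A', 'M7R', 'M4A']
latitudes, longitudes = [], []
for code in postal_codes:
    lat, lng = lookup(code)
    # Always append, so both lists stay aligned with postal_codes
    latitudes.append(lat)
    longitudes.append(lng)

print(list(zip(postal_codes, latitudes, longitudes)))
```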

After finding the coordinates of every postal code, we can add them to our dataframe.

In [8]:
df_tor_boroughs["Latitude"] = latitudes
df_tor_boroughs["Longitude"] = longitudes
df_tor_boroughs

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.7545,-79.3300
1,M4A,North York,Victoria Village,43.7276,-79.3148
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.6555,-79.3626
3,M6A,North York,"Lawrence Manor, Lawrence Heights",43.7223,-79.4504
4,M7A,Queen's Park,Ontario Provincial Government,43.6641,-79.3889
...,...,...,...,...,...
98,M8X,Etobicoke,"The Kingsway, Montgomery Road, Old Mill North",43.6518,-79.5076
99,M4Y,Downtown Toronto,Church and Wellesley,43.6656,-79.3830
100,M7Y,East Toronto Business,Enclave of M4L,43.7804,-79.2505
101,M8Y,Etobicoke,"Old Mill South, King's Mill Park, Sunnylea, Hu...",43.6325,-79.4939


We need to make sure that we don't have invalid coordinates in our dataframe.

In [9]:
df_tor_boroughs.loc[df_tor_boroughs['Latitude'].isnull()]


Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude
76,M7R,Mississauga,Enclave of L4W,,


In [10]:
df_tor_boroughs.loc[df_tor_boroughs['Longitude'].isnull()]

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude
76,M7R,Mississauga,Enclave of L4W,,


Yes, we have one row with invalid coordinates! This would break our soon-to-be-created map. Let's drop the rows with NaN values.

In [11]:
df_tor_boroughs.dropna(inplace=True)
df_tor_boroughs.reset_index(drop=True, inplace=True)
print(f'The new shape of our dataframe now has one less row! {df_tor_boroughs.shape}. We removed the wrong neighborhood')

The new shape of our dataframe now has one less row! (102, 5). We removed the wrong neighborhood
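
As a small illustration of the clean-up step, the sketch below applies `dropna` to three rows copied from this notebook's output. Passing `subset` is an optional variation (not what the cell above does): it limits the check to the coordinate columns, so a missing value in some other column would not remove a row.

```python
import pandas as pd

# Three rows copied from this notebook's output; M7R has no coordinates
df = pd.DataFrame({
    'PostalCode': ['M6R', 'M7R', 'M9R'],
    'Borough': ['West Toronto', 'Mississauga', 'Etobicoke'],
    'Latitude': [43.6469, None, 43.6898],
    'Longitude': [-79.4521, None, -79.5582],
})

# Drop only rows missing a coordinate, then renumber the index
clean = df.dropna(subset=['Latitude', 'Longitude']).reset_index(drop=True)
print(clean)
```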


And now our NaN row (index 76, M7R, Mississauga, Enclave of L4W) is gone! 

In [12]:
df_tor_boroughs.iloc[70:80]

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude
70,M9P,Etobicoke,Westmount,43.6949,-79.5323
71,M1R,Scarborough,"Wexford, Maryvale",43.7507,-79.3003
72,M2R,North York,Willowdale West,43.7786,-79.445
73,M4R,Central Toronto,North Toronto West,43.7143,-79.4065
74,M5R,Central Toronto,"The Annex, North Midtown, Yorkville",43.6736,-79.4035
75,M6R,West Toronto,"Parkdale, Roncesvalles",43.6469,-79.4521
76,M9R,Etobicoke,"Kingsview Village, St. Phillips, Martin Grove ...",43.6898,-79.5582
77,M1S,Scarborough,Agincourt,43.7946,-79.2644
78,M4S,Central Toronto,Davisville,43.702,-79.3853
79,M5S,Downtown Toronto,"University of Toronto, Harbord",43.6629,-79.3987


In [13]:
print(f'BTW, according to our Dataframe, we have {len(df_tor_boroughs["Borough"].unique())} different Boroughs and {len(df_tor_boroughs["Neighborhood"].unique())} different Neighborhoods in Toronto.')

BTW, according to our Dataframe, we have 14 different Boroughs and 102 different Neighborhoods in Toronto.


## QUESTION 3:

Now we can start clustering and exploring neighborhoods in Toronto, using Folium maps and Foursquare location data.

We'll find Toronto's coordinates with the geopy library, as the previously used pgeocode library only works with postal code inputs.

In [14]:
#!pip install geopy
from geopy.geocoders import Nominatim
geolocator = Nominatim(user_agent="tor_explorer")
tor_location = geolocator.geocode("Toronto City, ON")
print(f'Toronto coordinates: Lat: {tor_location.latitude}, Long: {tor_location.longitude}')

Toronto coordinates: Lat: 43.6534817, Long: -79.3839347


... and we create a Folium map of Toronto with all its neighborhoods.

In [15]:
#!pip install folium 
import folium # map rendering library

In [16]:
# create map of Toronto using latitude and longitude values
map_toronto = folium.Map(location=[tor_location.latitude, tor_location.longitude], zoom_start=11)

# add markers to map
for lat, lng, borough, neighborhood in zip(df_tor_boroughs['Latitude'], df_tor_boroughs['Longitude'], df_tor_boroughs['Borough'], df_tor_boroughs['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto)  
    
map_toronto

<a href="https://nbviewer.jupyter.org/github/Andreslion/Coursera_Capstone/blob/main/Segmenting_and_Clustering_Neighborhoods_in_Toronto.ipynb"> <h3> OPTIONAL: Load this Jupyter notebook on https://nbviewer.jupyter.org/ to properly render the maps </h3> </a>

### This is a screen capture of the map, because GitHub does not render Folium maps (to load the map locally, make this notebook trusted: File -> Trust Notebook)

<a><img src = "https://raw.githubusercontent.com/Andreslion/Coursera_Capstone/main/toronto_neighborhoods_map.png" width = 1280> </a>

In order to use the Foursquare API, we need to define our Client ID and Secret. (Sorry folks, I'll have to delete my personal keys after retrieving the required data.)

In [17]:
CLIENT_ID = 'X0JXYS04XKE2GYSQV4B0LLQYY2WW4WNA30HSJNVUWVRTMMQZ' # your Foursquare ID
CLIENT_SECRET = '1BI0T33MEXXUHDGGVQO0OOR3OEORSHISYPJWCQF1WFBT21BJ' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version
LIMIT = 100 # A default Foursquare API limit value
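
One way to avoid pasting keys into a notebook at all is to read them from environment variables. This is only a sketch of that idea: the variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and the helper `load_foursquare_credentials` are hypothetical, not something the Foursquare API or this notebook defines.

```python
import os


def load_foursquare_credentials():
    """Read credentials from environment variables.

    The variable names are hypothetical; use whatever names you export.
    """
    client_id = os.environ.get('FOURSQUARE_CLIENT_ID')
    client_secret = os.environ.get('FOURSQUARE_CLIENT_SECRET')
    if not client_id or not client_secret:
        raise RuntimeError('Foursquare credentials are not set')
    return client_id, client_secret


# Usage (after exporting both variables in your shell):
# CLIENT_ID, CLIENT_SECRET = load_foursquare_credentials()
```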

The following function, "getNearbyVenues", was defined in our Clustering Neighborhoods Lab.
We'll reuse it to retrieve the top 100 venues in each neighborhood from the Foursquare database.

In [18]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        #print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
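
The function above builds the request URL with string formatting. As an alternative sketch, the query string can be assembled with the standard library's `urlencode`, which takes care of escaping. The helper name `build_explore_url` and the placeholder credentials are assumptions for illustration; the endpoint and parameter names are the ones used in the function above.

```python
from urllib.parse import urlencode

EXPLORE_ENDPOINT = 'https://api.foursquare.com/v2/venues/explore'


def build_explore_url(lat, lng, client_id, client_secret,
                      version='20180605', radius=500, limit=100):
    """Build the explore URL with properly encoded query parameters."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,
        'll': f'{lat},{lng}',
        'radius': radius,
        'limit': limit,
    }
    return f'{EXPLORE_ENDPOINT}?{urlencode(params)}'


# Placeholder credentials, for illustration only
print(build_explore_url(43.6555, -79.3626, 'MY_ID', 'MY_SECRET'))
```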

And now with this function we can generate a new dataframe of the top 100 (if existent) venues per neighborhood

In [19]:
toronto_venues = getNearbyVenues(names=df_tor_boroughs['Neighborhood'],
                                   latitudes=df_tor_boroughs['Latitude'],
                                   longitudes=df_tor_boroughs['Longitude']
                                  )

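For some reason, a few venue categories are tagged as "Neighborhood". That's not very descriptive, and it will interfere with our code later on: after one-hot encoding, a category named "Neighborhood" would collide with the `Neighborhood` column we add to the same dataframe. We will drop those rows.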

In [20]:
toronto_venues[toronto_venues['Venue Category'] == 'Neighborhood']

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
427,The Beaches,43.6784,-79.2941,Upper Beaches,43.680563,-79.292869,Neighborhood
573,Central Bay Street,43.6564,-79.386,Downtown Toronto,43.653232,-79.385296,Neighborhood
720,"Richmond, Adelaide, King",43.6496,-79.3833,Downtown Toronto,43.653232,-79.385296,Neighborhood
1091,"Brockton, Parkdale Village, Exhibition Place",43.6383,-79.4301,Parkdale,43.640524,-79.4322,Neighborhood
1825,Enclave of M5E,43.6437,-79.3787,Harbourfront,43.639526,-79.380688,Neighborhood


In [21]:
print(f'Before dropping, we have {len(toronto_venues["Neighborhood"].unique())} different Neighborhoods with venues, {toronto_venues.shape[0]} total venues, {len(toronto_venues["Venue"].unique())} different Venues, and {len(toronto_venues["Venue Category"].unique())} different Venue Categories')

Before dropping, we have 99 different Neighborhoods with venues, 2170 total venues, 1367 different Venues, and 266 different Venue Categories


In [22]:
toronto_venues = toronto_venues[toronto_venues['Venue Category'] != 'Neighborhood']
# Or we could also do the same with:
#toronto_venues.drop(toronto_venues[toronto_venues['Venue Category'] == 'Neighborhood'].index, inplace=True) 

In [23]:
print(f'So, after dropping, we end-up with {len(toronto_venues["Neighborhood"].unique())} different Neighborhoods with venues, {toronto_venues.shape[0]} total venues, {len(toronto_venues["Venue"].unique())} different Venues, and {len(toronto_venues["Venue Category"].unique())} different Venue Categories')

So, after dropping, we end-up with 99 different Neighborhoods with venues, 2165 total venues, 1363 different Venues, and 265 different Venue Categories


We can check how many venues we have per neighborhood and, even better, sort from most to fewest to find the neighborhoods with the most venues!

In [24]:
venues_per_neigh = toronto_venues.groupby('Neighborhood').count().drop(columns=['Neighborhood Latitude', 'Neighborhood Longitude', 'Venue Latitude', 'Venue Longitude', 'Venue Category']).rename(columns={'Venue':'Num of Venues'}).sort_values('Num of Venues', ascending=False).reset_index()
venues_per_neigh.head(20)

Unnamed: 0,Neighborhood,Num of Venues
0,"Garden District, Ryerson",100
1,"First Canadian Place, Underground city",100
2,"Commerce Court, Victoria Hotel",100
3,"Toronto Dominion Centre, Design Exchange",100
4,"Richmond, Adelaide, King",99
5,St. James Town,98
6,Enclave of M5E,95
7,Berczy Park,93
8,Church and Wellesley,75
9,"Lawrence Manor, Lawrence Heights",70
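
The long chain in the cell above (count, drop five columns, rename, sort) can also be written more compactly. The sketch below shows one possible equivalent using `groupby(...).size()` on a tiny made-up sample; the neighborhood and venue names are placeholders.

```python
import pandas as pd

# Tiny made-up sample with the same two columns used for counting
venues = pd.DataFrame({
    'Neighborhood': ['A', 'A', 'B', 'A', 'C', 'B'],
    'Venue': ['v1', 'v2', 'v3', 'v4', 'v5', 'v6'],
})

venues_per_neigh = (
    venues.groupby('Neighborhood')
    .size()                                  # venues per neighborhood
    .sort_values(ascending=False)
    .reset_index(name='Num of Venues')
)
print(venues_per_neigh)
```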


It's time to start clustering Toronto neighborhoods to find out which ones are similar to each other.

First we'll need to find the most common venues in each neighborhood, put them in a dataframe, cluster the neighborhoods, and then visualize the result on a map.

To get everything "standardized", we need a dataframe with one column for every distinct venue category (265, according to the previous result) that records which categories exist in each neighborhood. That means we should end up with a dataframe of shape (99, 266): 99 neighborhoods with venues x 266 columns (265 venue categories + the neighborhood name).

In [25]:
# one hot encoding
toronto_onehot = []
toronto_onehot = pd.get_dummies(toronto_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
toronto_onehot['Neighborhood'] = toronto_venues['Neighborhood'] 


# move neighborhood column to the first column
fixed_columns = [toronto_onehot.columns[-1]] + list(toronto_onehot.columns[:-1])
toronto_onehot = toronto_onehot[fixed_columns]

toronto_onehot_grouped = toronto_onehot.groupby('Neighborhood').sum().reset_index()


toronto_onehot_grouped

Unnamed: 0,Neighborhood,Accessories Store,Afghan Restaurant,Airport,American Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,...,Video Game Store,Video Store,Vietnamese Restaurant,Warehouse Store,Whisky Bar,Wine Bar,Wine Shop,Wings Joint,Women's Store,Yoga Studio
0,Agincourt,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,"Alderwood, Long Branch",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,"Bathurst Manor, Wilson Heights, Downsview North",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,Bayview Village,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,"Bedford Park, Lawrence Manor East",0,0,0,1,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
94,"Willowdale, Newtonbrook",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
95,Woburn,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
96,Woodbine Heights,0,0,0,0,0,0,0,0,0,...,0,1,0,0,0,0,0,0,0,0
97,York Mills West,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
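
To make the shape of the grouped one-hot dataframe concrete, here is a minimal sketch on made-up data with two neighborhoods and three categories. Note that grouping with `sum()` yields a count per category, not just a presence flag. Passing `dtype=int` to `get_dummies` is an assumption added here so the columns hold 0/1 integers regardless of the pandas version.

```python
import pandas as pd

# Made-up venues: 2 neighborhoods, 3 distinct categories
venues = pd.DataFrame({
    'Neighborhood': ['A', 'A', 'A', 'B'],
    'Venue Category': ['Café', 'Café', 'Park', 'Gym'],
})

onehot = pd.get_dummies(venues['Venue Category'], dtype=int)
onehot.insert(0, 'Neighborhood', venues['Neighborhood'])

grouped = onehot.groupby('Neighborhood').sum().reset_index()
print(grouped)
print(grouped.shape)  # neighborhoods x (categories + 1 name column)
```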


In [26]:
"""
num_top_venues = 5

for hood in toronto_onehot_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = toronto_onehot_grouped[toronto_onehot_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['Venue Cat','Quant']
    temp = temp.iloc[1:]
    #temp['freq'] = temp['freq'].astype(float)
    #temp = temp.round({'freq': 2})
    print(temp.sort_values('Quant', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')
"""
    


We can reuse the function "return_most_common_venues" from our Clustering Neighborhoods Lab to get the 10 most common venue categories for each neighborhood, just for information.

In [27]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]
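
A quick way to see what this function returns is to call it on a single hand-made row. The sketch below repeats the function so it runs on its own; the row contents are made up.

```python
import pandas as pd


def return_most_common_venues(row, num_top_venues):
    # Same logic as the function above: skip the name, sort by count
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    return row_categories_sorted.index.values[0:num_top_venues]


# Made-up row: a neighborhood name followed by per-category counts
row = pd.Series({'Neighborhood': 'Sampleville', 'Café': 3, 'Park': 1, 'Gym': 2})
top_two = return_most_common_venues(row, 2)
print(list(top_two))
```

When a neighborhood has fewer distinct categories than `num_top_venues`, the remaining slots are filled with categories whose count is zero. This may explain why the same tail of categories shows up for several small neighborhoods in the table that follows.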

In [28]:
import numpy as np # library to handle data in a vectorized manner

num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:  # no 'st'/'nd'/'rd' suffix left, fall back to 'th'
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = toronto_onehot_grouped['Neighborhood']

for ind in np.arange(toronto_onehot_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(toronto_onehot_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head(10)

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Agincourt,Latin American Restaurant,Skating Rink,Breakfast Spot,Badminton Court,New American Restaurant,Music Venue,Music Store,Museum,Moving Target,Movie Theater
1,"Alderwood, Long Branch",Gym,Convenience Store,Coffee Shop,Pub,Pharmacy,Sandwich Place,Pizza Place,Mobile Phone Shop,Modern European Restaurant,Miscellaneous Shop
2,"Bathurst Manor, Wilson Heights, Downsview North",Pizza Place,Middle Eastern Restaurant,Deli / Bodega,Mediterranean Restaurant,Fried Chicken Joint,Coffee Shop,Accessories Store,Movie Theater,Music Store,Museum
3,Bayview Village,Trail,Flower Shop,Gas Station,Park,Accessories Store,Moroccan Restaurant,Music Store,Museum,Moving Target,Movie Theater
4,"Bedford Park, Lawrence Manor East",Italian Restaurant,Coffee Shop,Sandwich Place,Toy / Game Store,Café,Butcher,Pizza Place,Sushi Restaurant,Liquor Store,Greek Restaurant
5,Berczy Park,Coffee Shop,Hotel,Bakery,Seafood Restaurant,Café,Cocktail Bar,Beer Bar,Restaurant,Pub,Japanese Restaurant
6,"Birch Cliff, Cliffside West",Skating Rink,Café,College Stadium,General Entertainment,Accessories Store,Music Store,Museum,Moving Target,Movie Theater,Moroccan Restaurant
7,"Brockton, Parkdale Village, Exhibition Place",Café,Coffee Shop,Breakfast Spot,Thrift / Vintage Store,Gift Shop,Pizza Place,Brewery,Supermarket,Chiropractor,Cocktail Bar
8,"CN Tower, King and Spadina, Railway Lands, Har...",Italian Restaurant,Coffee Shop,Bar,Café,Bakery,Park,Speakeasy,Restaurant,Gym / Fitness Center,Grocery Store
9,Caledonia-Fairbanks,Park,Gym,Women's Store,Mexican Restaurant,Sporting Goods Shop,Beer Store,Bakery,Movie Theater,Music Store,Museum
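
The column names above are built by indexing a list of suffixes and falling back to "th" when the index runs out. As an alternative sketch, a small helper can compute the suffix directly, which also handles numbers such as 11 or 22. The helper name `ordinal` is an assumption for illustration.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st'."""
    if 10 <= n % 100 <= 20:
        suffix = 'th'          # 11th, 12th, 13th, ...
    else:
        suffix = {1: 'st', 2: 'nd', 3: 'rd'}.get(n % 10, 'th')
    return f'{n}{suffix}'


columns = ['Neighborhood'] + [f'{ordinal(i)} Most Common Venue' for i in range(1, 11)]
print(columns[1], columns[4], columns[10], sep=' | ')
```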


We can now cluster our neighborhoods based on the similarity of their venue categories. Time to cluster neighborhoods!

In [29]:
#!pip install scikit-learn
from sklearn.cluster import KMeans

# set number of clusters
kclusters = 5

toronto_grouped_clustering = toronto_onehot_grouped.drop(columns='Neighborhood')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(toronto_grouped_clustering)




After clustering, we should have 99 labels, one corresponding to each one of the 99 neighborhoods. Let's check that:

In [30]:
# check cluster labels generated for each row in the dataframe
print(f'We have {len(toronto_onehot_grouped["Neighborhood"])} neighborhoods and {len(kmeans.labels_)} clustering labels')
print(f'Labels: {kmeans.labels_}')

We have 99 neighborhoods and 99 clustering labels
Labels: [0 0 0 0 0 3 0 4 4 0 0 1 0 4 0 0 0 3 0 0 0 0 0 0 0 0 0 0 0 0 1 0 2 3 0 1 0
 0 0 0 0 0 0 0 0 0 0 0 4 0 2 0 0 0 0 0 0 0 0 0 0 0 0 4 0 0 0 3 0 0 4 0 0 0
 3 4 0 0 0 0 0 0 0 0 0 3 0 0 0 0 0 0 4 0 0 0 0 0 0]


99 neighborhoods and 99 labels. Good!
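
To build some intuition for what the labels mean, here is a minimal k-means sketch on made-up counts for four neighborhoods and three categories. The data is invented, and `n_init=10` is set explicitly as an assumption, because the default has changed between scikit-learn versions.

```python
import numpy as np
from sklearn.cluster import KMeans

# Made-up category counts for four neighborhoods:
# rows 0-1 are café-heavy, rows 2-3 are park-heavy
counts = np.array([
    [9, 0, 1],
    [8, 1, 0],
    [0, 9, 1],
    [1, 8, 0],
])

kmeans_toy = KMeans(n_clusters=2, n_init=10, random_state=0).fit(counts)
toy_labels = kmeans_toy.labels_
print(toy_labels)  # one label per row; the label numbers themselves are arbitrary
```

Rows with similar counts receive the same label. Because the features here are raw counts, neighborhoods with many venues sit far from those with few, which can influence how the clusters form.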

To visualize the clustered neighborhoods in a Folium map, we will need a dataframe containing our neighborhoods, their coordinates, and of course, their cluster labels.

In [31]:
toronto_clustering_map = df_tor_boroughs[df_tor_boroughs['Neighborhood'].isin(toronto_onehot_grouped['Neighborhood'])].sort_values('Neighborhood').drop(columns=['PostalCode','Borough']).reset_index(drop=True)
# Now we have the Neighborhood, Latitude and Longitude columns. We just need to insert the kMeans labels column as "Cluster"
toronto_clustering_map.insert(1, 'Cluster', kmeans.labels_)

toronto_clustering_map.head(20)

Unnamed: 0,Neighborhood,Cluster,Latitude,Longitude
0,Agincourt,0,43.7946,-79.2644
1,"Alderwood, Long Branch",0,43.6021,-79.5402
2,"Bathurst Manor, Wilson Heights, Downsview North",0,43.7535,-79.4472
3,Bayview Village,0,43.7797,-79.3813
4,"Bedford Park, Lawrence Manor East",0,43.7335,-79.4177
5,Berczy Park,3,43.6456,-79.3754
6,"Birch Cliff, Cliffside West",0,43.6952,-79.2646
7,"Brockton, Parkdale Village, Exhibition Place",4,43.6383,-79.4301
8,"CN Tower, King and Spadina, Railway Lands, Har...",4,43.6404,-79.3995
9,Caledonia-Fairbanks,0,43.6889,-79.4507


Now we can see our clustered Map!

In [32]:

# create map
map_toronto_clusters = folium.Map(location=[tor_location.latitude, tor_location.longitude], zoom_start=11)

# set one color per cluster
colors_array = ['green', 'red', 'orange', 'blue', 'purple']

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(toronto_clustering_map['Latitude'], toronto_clustering_map['Longitude'], toronto_clustering_map['Neighborhood'], toronto_clustering_map['Cluster']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=colors_array[cluster],
        fill=True,
        fill_color=colors_array[cluster],
        fill_opacity=0.7).add_to(map_toronto_clusters)
       
map_toronto_clusters

<a href="https://nbviewer.jupyter.org/github/Andreslion/Coursera_Capstone/blob/main/Segmenting_and_Clustering_Neighborhoods_in_Toronto.ipynb"> <h3> OPTIONAL: Load this Jupyter notebook on https://nbviewer.jupyter.org/ to properly render the maps </h3> </a>

### This is a screen capture of the map, because GitHub does not render Folium maps (to load the map locally, make this notebook trusted: File -> Trust Notebook)

<a><img src = "https://raw.githubusercontent.com/Andreslion/Coursera_Capstone/main/totonto_clusters_map.png" width = 1280> </a>

Nice map!

After seeing the geospatial distribution of the similar (clustered) neighborhoods, we should also see what makes them similar. Let's see what defines each cluster:

In [33]:
# We'll add the Cluster column to the dataframe containing the Neighborhood and the Most Common Venues
neighborhoods_venues_sorted.insert(1, 'Cluster', kmeans.labels_)
neighborhoods_venues_sorted.head(10)

Unnamed: 0,Neighborhood,Cluster,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Agincourt,0,Latin American Restaurant,Skating Rink,Breakfast Spot,Badminton Court,New American Restaurant,Music Venue,Music Store,Museum,Moving Target,Movie Theater
1,"Alderwood, Long Branch",0,Gym,Convenience Store,Coffee Shop,Pub,Pharmacy,Sandwich Place,Pizza Place,Mobile Phone Shop,Modern European Restaurant,Miscellaneous Shop
2,"Bathurst Manor, Wilson Heights, Downsview North",0,Pizza Place,Middle Eastern Restaurant,Deli / Bodega,Mediterranean Restaurant,Fried Chicken Joint,Coffee Shop,Accessories Store,Movie Theater,Music Store,Museum
3,Bayview Village,0,Trail,Flower Shop,Gas Station,Park,Accessories Store,Moroccan Restaurant,Music Store,Museum,Moving Target,Movie Theater
4,"Bedford Park, Lawrence Manor East",0,Italian Restaurant,Coffee Shop,Sandwich Place,Toy / Game Store,Café,Butcher,Pizza Place,Sushi Restaurant,Liquor Store,Greek Restaurant
5,Berczy Park,3,Coffee Shop,Hotel,Bakery,Seafood Restaurant,Café,Cocktail Bar,Beer Bar,Restaurant,Pub,Japanese Restaurant
6,"Birch Cliff, Cliffside West",0,Skating Rink,Café,College Stadium,General Entertainment,Accessories Store,Music Store,Museum,Moving Target,Movie Theater,Moroccan Restaurant
7,"Brockton, Parkdale Village, Exhibition Place",4,Café,Coffee Shop,Breakfast Spot,Thrift / Vintage Store,Gift Shop,Pizza Place,Brewery,Supermarket,Chiropractor,Cocktail Bar
8,"CN Tower, King and Spadina, Railway Lands, Har...",4,Italian Restaurant,Coffee Shop,Bar,Café,Bakery,Park,Speakeasy,Restaurant,Gym / Fitness Center,Grocery Store
9,Caledonia-Fairbanks,0,Park,Gym,Women's Store,Mexican Restaurant,Sporting Goods Shop,Beer Store,Bakery,Movie Theater,Music Store,Museum


We'll separate each cluster to see the categories of venues defining it.

In [34]:
# one dataframe per cluster label, indexed by label
Cluster = [neighborhoods_venues_sorted[neighborhoods_venues_sorted['Cluster'] == i]
           for i in range(kclusters)]

### Cluster 0:

In [35]:
Cluster[0].head()

Unnamed: 0,Neighborhood,Cluster,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Agincourt,0,Latin American Restaurant,Skating Rink,Breakfast Spot,Badminton Court,New American Restaurant,Music Venue,Music Store,Museum,Moving Target,Movie Theater
1,"Alderwood, Long Branch",0,Gym,Convenience Store,Coffee Shop,Pub,Pharmacy,Sandwich Place,Pizza Place,Mobile Phone Shop,Modern European Restaurant,Miscellaneous Shop
2,"Bathurst Manor, Wilson Heights, Downsview North",0,Pizza Place,Middle Eastern Restaurant,Deli / Bodega,Mediterranean Restaurant,Fried Chicken Joint,Coffee Shop,Accessories Store,Movie Theater,Music Store,Museum
3,Bayview Village,0,Trail,Flower Shop,Gas Station,Park,Accessories Store,Moroccan Restaurant,Music Store,Museum,Moving Target,Movie Theater
4,"Bedford Park, Lawrence Manor East",0,Italian Restaurant,Coffee Shop,Sandwich Place,Toy / Game Store,Café,Butcher,Pizza Place,Sushi Restaurant,Liquor Store,Greek Restaurant


### Cluster 1:

In [36]:
Cluster[1].head()

Unnamed: 0,Neighborhood,Cluster,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
11,Central Bay Street,1,Coffee Shop,Spa,Clothing Store,Bubble Tea Shop,Middle Eastern Restaurant,Breakfast Spot,Italian Restaurant,Sushi Restaurant,Café,Sandwich Place
30,Enclave of M5E,1,Coffee Shop,Restaurant,Gym,Hotel,Japanese Restaurant,Italian Restaurant,Café,Deli / Bodega,Bakery,Sporting Goods Shop
35,"Garden District, Ryerson",1,Coffee Shop,Clothing Store,Café,Cosmetics Shop,Japanese Restaurant,Hotel,Department Store,Theater,Pizza Place,Fast Food Restaurant


### Cluster 2:

In [37]:
Cluster[2].head()

Unnamed: 0,Neighborhood,Cluster,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
32,"Fairview, Henry Farm, Oriole",2,Clothing Store,Fast Food Restaurant,Restaurant,Coffee Shop,Bank,Juice Bar,Japanese Restaurant,Baseball Field,Greek Restaurant,Shoe Store
50,"Lawrence Manor, Lawrence Heights",2,Clothing Store,Coffee Shop,Restaurant,Women's Store,Bakery,Food Court,Shoe Store,Sushi Restaurant,Sandwich Place,Men's Store


### Cluster 3:

In [38]:
Cluster[3].head()

Unnamed: 0,Neighborhood,Cluster,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
5,Berczy Park,3,Coffee Shop,Hotel,Bakery,Seafood Restaurant,Café,Cocktail Bar,Beer Bar,Restaurant,Pub,Japanese Restaurant
17,"Commerce Court, Victoria Hotel",3,Coffee Shop,Hotel,Café,Restaurant,Gym,Japanese Restaurant,Salad Place,Seafood Restaurant,Asian Restaurant,Steakhouse
33,"First Canadian Place, Underground city",3,Coffee Shop,Hotel,Café,Restaurant,Gym,Japanese Restaurant,Salad Place,Seafood Restaurant,Asian Restaurant,Steakhouse
67,"Richmond, Adelaide, King",3,Café,Coffee Shop,Restaurant,Gym,Hotel,Salad Place,Steakhouse,Japanese Restaurant,Asian Restaurant,Seafood Restaurant
74,St. James Town,3,Coffee Shop,Café,Seafood Restaurant,Bakery,Restaurant,Cocktail Bar,Cosmetics Shop,Italian Restaurant,Clothing Store,Theater


### Cluster 4:

In [39]:
Cluster[4].head()

Unnamed: 0,Neighborhood,Cluster,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
7,"Brockton, Parkdale Village, Exhibition Place",4,Café,Coffee Shop,Breakfast Spot,Thrift / Vintage Store,Gift Shop,Pizza Place,Brewery,Supermarket,Chiropractor,Cocktail Bar
8,"CN Tower, King and Spadina, Railway Lands, Har...",4,Italian Restaurant,Coffee Shop,Bar,Café,Bakery,Park,Speakeasy,Restaurant,Gym / Fitness Center,Grocery Store
13,Church and Wellesley,4,Japanese Restaurant,Coffee Shop,Sushi Restaurant,Gay Bar,Restaurant,Yoga Studio,Mediterranean Restaurant,Grocery Store,Men's Store,Hotel
48,"Kensington Market, Chinatown, Grange Park",4,Café,Bar,Vegetarian / Vegan Restaurant,Bakery,Vietnamese Restaurant,Caribbean Restaurant,Coffee Shop,Dumpling Restaurant,Park,Burger Joint
63,"Parkdale, Roncesvalles",4,Sushi Restaurant,Eastern European Restaurant,Coffee Shop,Bakery,Bookstore,Food & Drink Shop,Breakfast Spot,Thai Restaurant,Café,Gift Shop


Now we can easily see that the most common venues around downtown (Clusters 1 and 3) are coffee shops and restaurants, while the biggest cluster (Cluster 0) is defined by a multitude of businesses supporting regular day-to-day activities. These are the most residential areas.
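
To back up the statement about cluster sizes, the labels printed earlier can simply be counted. The sketch below copies the printed label array into a string and tallies it with the standard library; inside the notebook, counting `kmeans.labels_` directly would do the same job.

```python
from collections import Counter

# Labels copied from the printed k-means output earlier in this notebook
printed_labels = """
0 0 0 0 0 3 0 4 4 0 0 1 0 4 0 0 0 3 0 0 0 0 0 0 0 0 0 0 0 0 1 0 2 3 0 1 0
0 0 0 0 0 0 0 0 0 0 0 4 0 2 0 0 0 0 0 0 0 0 0 0 0 0 4 0 0 0 3 0 0 4 0 0 0
3 4 0 0 0 0 0 0 0 0 0 3 0 0 0 0 0 0 4 0 0 0 0 0 0
"""

cluster_sizes = Counter(int(token) for token in printed_labels.split())
for cluster, size in cluster_sizes.most_common():
    print(f'Cluster {cluster}: {size} neighborhoods')
```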