Toronto Neighborhoods
====
This notebook loads and cleans data about the neighborhoods in the city of Toronto:
1. Get the Toronto neighborhood data: scrape the Wikipedia page, then wrangle and clean the data.
1. Read the Toronto neighborhood data into a pandas dataframe.
1. Get the latitude and longitude coordinates of each neighborhood.
1. Explore and cluster the neighborhoods using venue data from the Foursquare API.


In [30]:
#!pip install bs4 lxml

In [31]:
# Import required modules
import requests # library to handle requests
import pandas as pd # library for data analysis
import numpy as np # library to handle data in a vectorized manner
from bs4 import BeautifulSoup
import lxml

I - Get Toronto neighborhood data
----

In [123]:
# Create a variable with the url
url = "https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M"

# Use requests to get the contents
r = requests.get(url)

# Get the text of the contents
html_content = r.text

# Convert the html content into a beautiful soup object
soup = BeautifulSoup(html_content, 'lxml')  # name the parser explicitly so bs4 does not have to guess one

soup.title

<title>List of postal codes of Canada: M - Wikipedia</title>

In [125]:
# 1) Fill Dataframe with Toronto neighborhood data:
table = soup.find_all('table')
df = pd.read_html(str(table))[0]

print(df.shape)
df.head()

(180, 3)


Unnamed: 0,Postal code,Borough,Neighborhood
0,M1A,Not assigned,
1,M2A,Not assigned,
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,Regent Park / Harbourfront


In [134]:
# 2) Ignore rows whose borough is 'Not assigned': keep Borough != 'Not assigned'
df = df.loc[df['Borough'] != 'Not assigned']
print(df.shape)
df.head()

(103, 3)


Unnamed: 0,Postal code,Borough,Neighborhood
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,Regent Park / Harbourfront
5,M6A,North York,Lawrence Manor / Lawrence Heights
6,M7A,Downtown Toronto,Queen's Park / Ontario Provincial Government


In [132]:
# Three neighborhoods (Downsview, Willowdale, Don Mills) have several postal codes
test = df.groupby(['Neighborhood']).count()
test.loc[test['Borough'] >1]
#df.loc[df['Neighborhood'] =='Willowdale']

Unnamed: 0_level_0,Postal code,Borough
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1
Don Mills,2,2
Downsview,4,4
Willowdale,2,2

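The check above can be reproduced on a toy table. This is a minimal sketch with made-up rows (the frame `sample` is hypothetical, not the scraped data); it counts distinct postal codes per neighborhood name and keeps the names that occur more than once.

```python
import pandas as pd

# Hypothetical miniature of the postal-code table (rows are illustrative only).
sample = pd.DataFrame({
    "Postal code": ["M3A", "M3B", "M3C", "M3K", "M3L"],
    "Borough": ["North York"] * 5,
    "Neighborhood": ["Parkwoods", "Don Mills", "Don Mills", "Downsview", "Downsview"],
})

# Count distinct postal codes per neighborhood name.
codes_per_name = sample.groupby("Neighborhood")["Postal code"].nunique()

# Keep only the names that are shared by more than one postal code.
repeated = codes_per_name[codes_per_name > 1]
print(repeated)
```

Names that repeat matter later on, because any step that groups or joins on `Neighborhood` treats those rows as one entity.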

In [135]:
# 3) Separate neighborhoods with a comma instead of ' /'
df = df.replace(to_replace=' /', value=',', regex=True)
#test
df.loc[ df['Postal code'] == 'M5A']

Unnamed: 0,Postal code,Borough,Neighborhood
4,M5A,Downtown Toronto,"Regent Park, Harbourfront"


In [136]:
# (4) If a cell has a borough but a Not assigned neighborhood, then the neighborhood will be the same as the borough.

## If Neighborhood is NaN return Borough, otherwise return Neighborhood
def check_Neighborhood(Neighborhood,Borough):
    if type(Neighborhood)==float: 
        if np.isnan(float(Neighborhood)):
            return Borough
        else:
            return Neighborhood
    else:
        return Neighborhood     
df['Neighborhood'] = df.apply(lambda x: check_Neighborhood(x['Neighborhood'],x['Borough']),axis=1)

# (5) Reset the index so that it is contiguous again after the filtering above
df.reset_index(drop=True,inplace=True)
df.head()

Unnamed: 0,Postal code,Borough,Neighborhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,"Regent Park, Harbourfront"
3,M6A,North York,"Lawrence Manor, Lawrence Heights"
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government"

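The row-wise function above can probably be expressed more compactly with `Series.fillna`. The following is a sketch on made-up rows (the frame `sample` is an assumption for illustration), not a replacement verified against the scraped table.

```python
import numpy as np
import pandas as pd

# Hypothetical rows: the second one has a borough but no neighborhood.
sample = pd.DataFrame({
    "Borough": ["North York", "Downtown Toronto", "Etobicoke"],
    "Neighborhood": ["Parkwoods", np.nan, "Islington Avenue"],
})

# Where Neighborhood is missing, take the Borough value from the same row.
sample["Neighborhood"] = sample["Neighborhood"].fillna(sample["Borough"])
print(sample)
```

`fillna` aligns the two columns on the index, so no explicit loop or `apply` is needed.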

In [137]:
# (6) Use the .shape attribute to print the number of rows of the dataframe.
df.shape[0]

103

II - Get the latitude and the longitude coordinates of each neighborhood. 
----

In [138]:
# Import clean data 
path = 'http://cocl.us/Geospatial_data'
df_Geospatial = pd.read_csv(path)
df_Geospatial.rename(columns = {'Postal Code':'Postal code'}, inplace = True) 

In [139]:
# merge the Geospatial data into the dataframe
df = pd.merge(df, df_Geospatial, on='Postal code')

In [144]:
df.head()

Unnamed: 0,Postal code,Borough,Neighborhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.753259,-79.329656
1,M4A,North York,Victoria Village,43.725882,-79.315572
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636
3,M6A,North York,"Lawrence Manor, Lawrence Heights",43.718518,-79.464763
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494

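`pd.merge` performs an inner join by default, so a postal code without coordinates would silently disappear from the result. The sketch below, on hypothetical rows (`codes`, `coords` and the code `M9Z` are invented), shows one way to make such losses visible with `how="left"`, `indicator=True` and `validate`.

```python
import pandas as pd

# Hypothetical rows standing in for the scraped table and the coordinates file.
codes = pd.DataFrame({
    "Postal code": ["M3A", "M4A", "M9Z"],
    "Neighborhood": ["Parkwoods", "Victoria Village", "Made-up Area"],
})
coords = pd.DataFrame({
    "Postal code": ["M3A", "M4A"],
    "Latitude": [43.753259, 43.725882],
    "Longitude": [-79.329656, -79.315572],
})

# how="left" keeps every row of `codes`; indicator=True adds a `_merge` column;
# validate="one_to_one" raises if a postal code is duplicated on either side.
merged = pd.merge(codes, coords, on="Postal code", how="left",
                  indicator=True, validate="one_to_one")

# Postal codes that found no coordinates.
unmatched = merged.loc[merged["_merge"] == "left_only", "Postal code"].tolist()
print(unmatched)
```

In this notebook the row count stays at 103 after the merge, which suggests every postal code found a match.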

### Let's visualize Toronto and its neighborhoods.

In [41]:
# create map of 'Toronto, Canada' using latitude and longitude values
#!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't completed the Foursquare API lab
import folium # map rendering library

latitude=43.651070
longitude=-79.347015
map_Totonto = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, label in zip(df['Latitude'], df['Longitude'], df['Neighborhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_Totonto)  
    
map_Totonto

In [185]:
print('The dataframe has {} boroughs and {} rows covering {} unique neighborhood names.'.format(
        len(df['Borough'].unique()),
        df.shape[0],
        pd.unique(df['Neighborhood']).size
    )
)


The dataframe has 10 boroughs and 103 rows covering 98 unique neighborhood names.


III - Explore and cluster the neighborhoods in Toronto
---

In [43]:
import json # library to handle JSON files

#!conda install -c conda-forge geopy --yes # uncomment this line if you haven't completed the Foursquare API lab
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas import json_normalize # transform JSON file into a pandas dataframe (top-level import since pandas 1.0)

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans


#### Define Foursquare Credentials and Version

In [44]:
CLIENT_ID = '0DHDGL0XUVRCPQD12IPFUXP4OKSRYWYJ4HDVHJJ505YV1WVU' # your Foursquare ID
CLIENT_SECRET = '10PADLLJX0DSQRUT02R05OFDVFWNY5KXRRB1LH4AGLL4JZZY' # your Foursquare Secret
VERSION = '20180605'
LIMIT = 30
print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: 0DHDGL0XUVRCPQD12IPFUXP4OKSRYWYJ4HDVHJJ505YV1WVU
CLIENT_SECRET:10PADLLJX0DSQRUT02R05OFDVFWNY5KXRRB1LH4AGLL4JZZY


## Explore Neighborhoods
#### Load venues for all the neighborhoods in Toronto

In [45]:
# Collect the venues around each neighborhood from the Foursquare API
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

torento_venues = getNearbyVenues(names=df['Neighborhood'],
                                   latitudes=df['Latitude'],
                                   longitudes=df['Longitude']
                                  )

Parkwoods
Victoria Village
Regent Park, Harbourfront
Lawrence Manor, Lawrence Heights
Queen's Park, Ontario Provincial Government
Islington Avenue
Malvern, Rouge
Don Mills
Parkview Hill, Woodbine Gardens
Garden District, Ryerson
Glencairn
West Deane Park, Princess Gardens, Martin Grove, Islington, Cloverdale
Rouge Hill, Port Union, Highland Creek
Don Mills
Woodbine Heights
St. James Town
Humewood-Cedarvale
Eringate, Bloordale Gardens, Old Burnhamthorpe, Markland Wood
Guildwood, Morningside, West Hill
The Beaches
Berczy Park
Caledonia-Fairbanks
Woburn
Leaside
Central Bay Street
Christie
Cedarbrae
Hillcrest Village
Bathurst Manor, Wilson Heights, Downsview North
Thorncliffe Park
Richmond, Adelaide, King
Dufferin, Dovercourt Village
Scarborough Village
Fairview, Henry Farm, Oriole
Northwood Park, York University
East Toronto
Harbourfront East, Union Station, Toronto Islands
Little Portugal, Trinity
Kennedy Park, Ionview, East Birchmount Park
Bayview Village
Downsview
The Danforth West, Ri

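`getNearbyVenues` indexes straight into `["response"]['groups'][0]['items']` and `['categories'][0]`, so a failed request or a venue without a category would raise an exception. The sketch below isolates the parsing step in a hypothetical helper, `extract_venues`, and runs it on a hand-written payload that only imitates the nesting used above; it is not real API output.

```python
def extract_venues(name, lat, lng, payload):
    """Turn one explore-style response (already decoded from JSON) into row tuples."""
    groups = payload.get("response", {}).get("groups", [])
    items = groups[0].get("items", []) if groups else []
    rows = []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories") or []
        # Fall back to None when a venue carries no category.
        category = categories[0]["name"] if categories else None
        rows.append((name, lat, lng, venue["name"],
                     venue["location"]["lat"], venue["location"]["lng"], category))
    return rows


# Hand-written payload for illustration only.
fake_payload = {"response": {"groups": [{"items": [
    {"venue": {"name": "Example Park", "location": {"lat": 43.75, "lng": -79.33},
               "categories": [{"name": "Park"}]}},
    {"venue": {"name": "Uncategorized Spot", "location": {"lat": 43.76, "lng": -79.34},
               "categories": []}},
]}]}}

rows = extract_venues("Parkwoods", 43.753259, -79.329656, fake_payload)
empty = extract_venues("Nowhere", 0.0, 0.0, {"response": {}})
print(rows)
print(empty)
```

Keeping the parsing separate from the HTTP call also makes it possible to test the row layout without network access.
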
In [109]:
print(torento_venues.shape)
print('There are {} unique categories.'.format(len(torento_venues['Venue Category'].unique())))
torento_venues.head()

(1336, 7)
There are 225 unique categories.


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Parkwoods,43.753259,-79.329656,Brookbanks Park,43.751976,-79.33214,Park
1,Parkwoods,43.753259,-79.329656,Variety Store,43.751974,-79.333114,Food & Drink Shop
2,Victoria Village,43.725882,-79.315572,Victoria Village Arena,43.723481,-79.315635,Hockey Arena
3,Victoria Village,43.725882,-79.315572,Tim Hortons,43.725517,-79.313103,Coffee Shop
4,Victoria Village,43.725882,-79.315572,Portugril,43.725819,-79.312785,Portuguese Restaurant


In [110]:
# Let's check how many venues were returned for each neighborhood
torento_venues.groupby('Neighborhood').count()

Unnamed: 0_level_0,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Agincourt,5,5,5,5,5,5
"Alderwood, Long Branch",10,10,10,10,10,10
"Bathurst Manor, Wilson Heights, Downsview North",20,20,20,20,20,20
Bayview Village,4,4,4,4,4,4
"Bedford Park, Lawrence Manor East",24,24,24,24,24,24
...,...,...,...,...,...,...
Willowdale,36,36,36,36,36,36
"Willowdale, Newtonbrook",1,1,1,1,1,1
Woburn,3,3,3,3,3,3
Woodbine Heights,12,12,12,12,12,12


#### Let's find the 10 most common venues for each neighborhood.

In [111]:
# one hot encoding
toronto_onehot = pd.get_dummies(torento_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
toronto_onehot['Neighborhood'] = torento_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [toronto_onehot.columns[-1]] + list(toronto_onehot.columns[:-1])
toronto_onehot = toronto_onehot[fixed_columns]

toronto_onehot.head()


Unnamed: 0,Yoga Studio,Accessories Store,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Antique Shop,...,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Warehouse Store,Wine Bar,Wings Joint,Women's Store
0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [112]:
toronto_onehot.shape

(1336, 225)

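The shape above reports 225 columns, the same as the number of unique venue categories, although a `Neighborhood` column was added, which would normally give 226. One plausible explanation, assumed here and not verified against the data, is that a venue category is itself named `Neighborhood`, so the assignment overwrote that column. The sketch below reproduces the effect on made-up rows and shows one way around it (the column name `Neighborhood Name` is an arbitrary choice).

```python
import pandas as pd

# Hypothetical venues; note the venue category literally named "Neighborhood".
venues = pd.DataFrame({
    "Neighborhood": ["Parkwoods", "Parkwoods", "Leaside"],
    "Venue Category": ["Park", "Neighborhood", "Coffee Shop"],
})

onehot = pd.get_dummies(venues["Venue Category"], dtype=int)
n_categories = onehot.shape[1]  # one column per distinct category

# Assigning the label column under an existing name overwrites the category column.
collided = onehot.copy()
collided["Neighborhood"] = venues["Neighborhood"]

# A distinct name for the label column keeps every category column intact.
safe = onehot.copy()
safe.insert(0, "Neighborhood Name", venues["Neighborhood"])

print(collided.shape[1], safe.shape[1])
```

If the collision does occur in the real data, one category is dropped from the features used for clustering.
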
In [113]:
toronto_grouped = toronto_onehot.groupby('Neighborhood').mean().reset_index()
toronto_grouped

Unnamed: 0,Neighborhood,Yoga Studio,Accessories Store,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,...,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Warehouse Store,Wine Bar,Wings Joint,Women's Store
0,Agincourt,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,...,0.0,0.0,0.0,0.0,0.000000,0.000000,0.0,0.0,0.0,0.0
1,"Alderwood, Long Branch",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,...,0.0,0.0,0.0,0.0,0.000000,0.000000,0.0,0.0,0.0,0.0
2,"Bathurst Manor, Wilson Heights, Downsview North",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,...,0.0,0.0,0.0,0.0,0.050000,0.000000,0.0,0.0,0.0,0.0
3,Bayview Village,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,...,0.0,0.0,0.0,0.0,0.000000,0.000000,0.0,0.0,0.0,0.0
4,"Bedford Park, Lawrence Manor East",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.041667,...,0.0,0.0,0.0,0.0,0.000000,0.000000,0.0,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
90,Willowdale,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,...,0.0,0.0,0.0,0.0,0.000000,0.027778,0.0,0.0,0.0,0.0
91,"Willowdale, Newtonbrook",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,...,0.0,0.0,0.0,0.0,0.000000,0.000000,0.0,0.0,0.0,0.0
92,Woburn,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,...,0.0,0.0,0.0,0.0,0.000000,0.000000,0.0,0.0,0.0,0.0
93,Woodbine Heights,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,...,0.0,0.0,0.0,0.0,0.083333,0.000000,0.0,0.0,0.0,0.0


In [152]:
toronto_grouped.shape

(95, 225)

In [153]:

num_top_venues = 10

indicators = ['st', 'nd', 'rd']
#function to sort the venues in descending order.
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]


# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:  # positions beyond the 3rd use the generic 'th' suffix
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = toronto_grouped['Neighborhood']

for ind in np.arange(toronto_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(toronto_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Agincourt,Latin American Restaurant,Skating Rink,Clothing Store,Breakfast Spot,Lounge,Women's Store,Dance Studio,Dog Run,Distribution Center,Discount Store
1,"Alderwood, Long Branch",Pizza Place,Coffee Shop,Sandwich Place,Athletics & Sports,Pool,Gym,Pub,Skating Rink,Pharmacy,Discount Store
2,"Bathurst Manor, Wilson Heights, Downsview North",Coffee Shop,Bank,Gas Station,Chinese Restaurant,Diner,Sandwich Place,Bridal Shop,Ice Cream Shop,Restaurant,Deli / Bodega
3,Bayview Village,Café,Japanese Restaurant,Chinese Restaurant,Bank,Dance Studio,Drugstore,Donut Shop,Dog Run,Distribution Center,Discount Store
4,"Bedford Park, Lawrence Manor East",Sandwich Place,Coffee Shop,Italian Restaurant,Restaurant,Juice Bar,Pizza Place,Butcher,Sushi Restaurant,Fast Food Restaurant,Liquor Store


In [154]:
neighborhoods_venues_sorted.shape

(95, 11)

In [155]:
neighborhoods_venues_sorted

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Agincourt,Latin American Restaurant,Skating Rink,Clothing Store,Breakfast Spot,Lounge,Women's Store,Dance Studio,Dog Run,Distribution Center,Discount Store
1,"Alderwood, Long Branch",Pizza Place,Coffee Shop,Sandwich Place,Athletics & Sports,Pool,Gym,Pub,Skating Rink,Pharmacy,Discount Store
2,"Bathurst Manor, Wilson Heights, Downsview North",Coffee Shop,Bank,Gas Station,Chinese Restaurant,Diner,Sandwich Place,Bridal Shop,Ice Cream Shop,Restaurant,Deli / Bodega
3,Bayview Village,Café,Japanese Restaurant,Chinese Restaurant,Bank,Dance Studio,Drugstore,Donut Shop,Dog Run,Distribution Center,Discount Store
4,"Bedford Park, Lawrence Manor East",Sandwich Place,Coffee Shop,Italian Restaurant,Restaurant,Juice Bar,Pizza Place,Butcher,Sushi Restaurant,Fast Food Restaurant,Liquor Store
...,...,...,...,...,...,...,...,...,...,...,...
90,Willowdale,Coffee Shop,Ramen Restaurant,Pizza Place,Sandwich Place,Grocery Store,Café,Home Service,Electronics Store,Steakhouse,Plaza
91,"Willowdale, Newtonbrook",Home Service,Women's Store,Eastern European Restaurant,Drugstore,Donut Shop,Dog Run,Distribution Center,Discount Store,Diner,Dim Sum Restaurant
92,Woburn,Coffee Shop,Korean Restaurant,Dance Studio,Dumpling Restaurant,Drugstore,Donut Shop,Dog Run,Distribution Center,Discount Store,Diner
93,Woodbine Heights,Spa,Cosmetics Shop,Beer Store,Diner,Dance Studio,Bus Stop,Curling Ice,Athletics & Sports,Skating Rink,Park

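Several rows above end with the same run of categories (Dog Run, Distribution Center, Discount Store, Diner). A likely reading, offered as an interpretation and not confirmed here, is that these are categories tied at frequency zero that the sort uses to fill the ten slots. The sketch below uses a hypothetical helper, `top_categories`, on made-up frequencies to keep only categories that actually occur.

```python
import pandas as pd

def top_categories(row, n):
    """Return up to n category names with non-zero frequency, most frequent first."""
    present = row[row > 0].sort_values(ascending=False)
    names = list(present.index[:n])
    # Pad with None so that every neighborhood yields exactly n entries.
    return names + [None] * (n - len(names))

# Hypothetical per-neighborhood category frequencies (each row sums to 1).
grouped = pd.DataFrame({
    "Neighborhood": ["A", "B"],
    "Coffee Shop": [0.75, 0.0],
    "Park": [0.25, 1.0],
    "Pizza Place": [0.0, 0.0],
})

freqs = grouped.set_index("Neighborhood")
top = {name: top_categories(row, 3) for name, row in freqs.iterrows()}
print(top)
```

Whether to pad with `None` or keep the zero-frequency fillers is a design choice; padding makes sparse neighborhoods easier to spot.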

## Cluster the neighborhoods in Toronto


### Run k-means to cluster the neighborhoods into 5 clusters.

In [156]:
# set number of clusters
kclusters = 5

toronto_grouped_clustering = toronto_grouped.drop('Neighborhood', axis=1)  # axis must be passed by keyword in recent pandas

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(toronto_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 
 

array([0, 4, 0, 0, 0, 0, 0, 0, 0, 0])

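The first ten labels above are almost all 0, so it can be worth checking how many rows each cluster receives. The following is a minimal sketch on made-up frequency rows (the array `X` is hypothetical); `n_init` is passed explicitly because its default has changed across scikit-learn releases.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical frequency rows: two "park-heavy" and two "coffee-heavy" neighborhoods.
X = np.array([
    [0.9, 0.1, 0.0],
    [0.8, 0.2, 0.0],
    [0.1, 0.1, 0.8],
    [0.0, 0.2, 0.8],
])

# Fit k-means with an explicit number of restarts and a fixed seed.
model = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
labels = model.labels_

# Cluster sizes; a very uneven split can hint that k or the features need revisiting.
sizes = np.bincount(labels, minlength=2)
print(labels, sizes)
```

The label numbers themselves are arbitrary; only the grouping they induce is meaningful.
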
In [157]:
print(pd.unique(df['Neighborhood']).size)

98


### New dataframe that includes the cluster as well as the top 10 venues for each neighborhood.

In [158]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

toronto_merged = df

# join the cluster labels and top venues onto the neighborhood table, which holds latitude/longitude
toronto_merged = toronto_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

toronto_merged.head() # check the last columns!

Unnamed: 0,Postal code,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M3A,North York,Parkwoods,43.753259,-79.329656,3.0,Park,Food & Drink Shop,Women's Store,Curling Ice,Donut Shop,Dog Run,Distribution Center,Discount Store,Diner,Dim Sum Restaurant
1,M4A,North York,Victoria Village,43.725882,-79.315572,0.0,Coffee Shop,Intersection,Portuguese Restaurant,Hockey Arena,Women's Store,Curling Ice,Donut Shop,Dog Run,Distribution Center,Discount Store
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636,0.0,Coffee Shop,Park,Bakery,Breakfast Spot,Café,Mexican Restaurant,Historic Site,Distribution Center,Dessert Shop,Spa
3,M6A,North York,"Lawrence Manor, Lawrence Heights",43.718518,-79.464763,0.0,Clothing Store,Accessories Store,Boutique,Coffee Shop,Arts & Crafts Store,Furniture / Home Store,Event Space,Gift Shop,Vietnamese Restaurant,Donut Shop
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494,0.0,Coffee Shop,Diner,Yoga Studio,Arts & Crafts Store,Beer Bar,Hobby Shop,Juice Bar,Discount Store,Creperie,Mexican Restaurant


### Let's inspect the merged data before visualizing the clusters

In [160]:
toronto_merged.shape

(103, 16)

Three rows have no venue information returned by the Foursquare API, so let's exclude them.

In [173]:
pd.unique(toronto_merged['Cluster Labels'])

array([ 3.,  0., nan,  4.,  1.,  2.])

In [162]:
def isNaN(num):
    return num != num

toronto_merged[isNaN(toronto_merged['Cluster Labels'])]

Unnamed: 0,Postal code,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
5,M9A,Etobicoke,Islington Avenue,43.667856,-79.532242,,,,,,,,,,,
45,M2L,North York,"York Mills, Silver Hills",43.75749,-79.374714,,,,,,,,,,,
95,M1X,Scarborough,Upper Rouge,43.836125,-79.205636,,,,,,,,,,,


In [172]:
# Drop neighborhoods with NaN values
toronto_merged_Cleaned = toronto_merged.dropna()
print(toronto_merged_Cleaned.shape)

(100, 16)


We need to convert the "Cluster Labels" column from float to integer:

In [171]:
# convert to integer; astype returns a new dataframe, which avoids the SettingWithCopyWarning
toronto_merged_Cleaned = toronto_merged_Cleaned.astype({'Cluster Labels': np.int64})
# test
pd.unique(toronto_merged_Cleaned['Cluster Labels'])

array([3, 0, 4, 1, 2], dtype=int64)

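The drop-then-cast step can be sketched on a toy frame (the frame `merged` below is hypothetical). It shows two options: dropping unmatched rows on an explicit copy before casting, or keeping every row and using the nullable `Int64` dtype from pandas.

```python
import numpy as np
import pandas as pd

# Hypothetical join result: the second row found no match, so its label is NaN.
merged = pd.DataFrame({
    "Neighborhood": ["Parkwoods", "Islington Avenue", "Leaside"],
    "Cluster Labels": [3.0, np.nan, 0.0],
})

# Option 1: drop unmatched rows on an explicit copy, then cast to a plain integer dtype.
cleaned = merged.dropna(subset=["Cluster Labels"]).copy()
cleaned["Cluster Labels"] = cleaned["Cluster Labels"].astype("int64")

# Option 2: keep every row and use the nullable integer dtype instead.
nullable = merged.astype({"Cluster Labels": "Int64"})

print(cleaned["Cluster Labels"].tolist())
print(nullable["Cluster Labels"].isna().sum())
```

Passing `subset` to `dropna` limits the check to the label column, so rows are not dropped because of missing values elsewhere.
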
Let's visualize the resulting clusters on a map:

In [169]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(toronto_merged_Cleaned['Latitude'], toronto_merged_Cleaned['Longitude'], toronto_merged_Cleaned['Neighborhood'], toronto_merged_Cleaned['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

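The map code indexes the palette with `rainbow[cluster-1]`, so label 0 wraps around to the last color through Python's negative indexing; the mapping stays one-to-one, but it is easy to misread. The sketch below builds a palette indexed directly by label. It swaps in the standard-library `colorsys` module instead of matplotlib, and the helper name `cluster_palette` is hypothetical.

```python
import colorsys

def cluster_palette(k):
    """Return k distinct hex colors, one per cluster label 0..k-1."""
    colors_hex = []
    for i in range(k):
        # Spread hues evenly around the color wheel; saturation and value are fixed.
        r, g, b = colorsys.hsv_to_rgb(i / k, 0.8, 0.9)
        colors_hex.append("#{:02x}{:02x}{:02x}".format(int(r * 255), int(g * 255), int(b * 255)))
    return colors_hex

palette = cluster_palette(5)

# Index directly with the label so that label 0 maps to the first color.
color_for = {label: palette[label] for label in range(5)}
print(color_for)
```

Any of these strings can be passed as `color` and `fill_color` to `folium.CircleMarker`, as in the cell above.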

### Examine Clusters

#### Cluster 0

In [175]:
toronto_merged_Cleaned.loc[toronto_merged_Cleaned['Cluster Labels'] == 0, toronto_merged_Cleaned.columns[[1] + list(range(5, toronto_merged_Cleaned.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,North York,0.0,Coffee Shop,Intersection,Portuguese Restaurant,Hockey Arena,Women's Store,Curling Ice,Donut Shop,Dog Run,Distribution Center,Discount Store
2,Downtown Toronto,0.0,Coffee Shop,Park,Bakery,Breakfast Spot,Café,Mexican Restaurant,Historic Site,Distribution Center,Dessert Shop,Spa
3,North York,0.0,Clothing Store,Accessories Store,Boutique,Coffee Shop,Arts & Crafts Store,Furniture / Home Store,Event Space,Gift Shop,Vietnamese Restaurant,Donut Shop
4,Downtown Toronto,0.0,Coffee Shop,Diner,Yoga Studio,Arts & Crafts Store,Beer Bar,Hobby Shop,Juice Bar,Discount Store,Creperie,Mexican Restaurant
7,North York,0.0,Coffee Shop,Japanese Restaurant,Restaurant,Gym,Beer Store,Chinese Restaurant,Italian Restaurant,Bus Line,Bike Shop,Sporting Goods Shop
...,...,...,...,...,...,...,...,...,...,...,...,...
97,Downtown Toronto,0.0,Café,Coffee Shop,Restaurant,Gastropub,Seafood Restaurant,Bakery,General Travel,Greek Restaurant,Gym,Gym / Fitness Center
98,Etobicoke,0.0,River,Pool,Women's Store,Donut Shop,Dog Run,Distribution Center,Discount Store,Diner,Dim Sum Restaurant,Dessert Shop
99,Downtown Toronto,0.0,Gay Bar,Coffee Shop,Burger Joint,Indian Restaurant,Italian Restaurant,Juice Bar,Steakhouse,Pub,Ramen Restaurant,Beer Bar
100,East Toronto,0.0,Yoga Studio,Auto Workshop,Skate Park,Smoke Shop,Spa,Burrito Place,Light Rail Station,Restaurant,Recording Studio,Farmers Market


#### Cluster 1

In [176]:
toronto_merged_Cleaned.loc[toronto_merged_Cleaned['Cluster Labels'] == 1, toronto_merged_Cleaned.columns[[1] + list(range(5, toronto_merged_Cleaned.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
62,Central Toronto,1.0,Garden,Women's Store,Curling Ice,Drugstore,Donut Shop,Dog Run,Distribution Center,Discount Store,Diner,Dim Sum Restaurant


#### Cluster 2

In [177]:
toronto_merged_Cleaned.loc[toronto_merged_Cleaned['Cluster Labels'] == 2, toronto_merged_Cleaned.columns[[1] + list(range(5, toronto_merged_Cleaned.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
64,York,2.0,Convenience Store,Women's Store,Dance Studio,Dumpling Restaurant,Drugstore,Donut Shop,Dog Run,Distribution Center,Discount Store,Diner


#### Cluster 3

In [181]:
# Neighborhoods whose most common venue is a park
toronto_merged_Cleaned.loc[toronto_merged_Cleaned['Cluster Labels'] == 3, toronto_merged_Cleaned.columns[[1] + list(range(5, toronto_merged_Cleaned.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,North York,3.0,Park,Food & Drink Shop,Women's Store,Curling Ice,Donut Shop,Dog Run,Distribution Center,Discount Store,Diner,Dim Sum Restaurant
21,York,3.0,Park,Women's Store,Pool,Cuban Restaurant,Donut Shop,Dog Run,Distribution Center,Discount Store,Diner,Dim Sum Restaurant
35,East York,3.0,Park,Convenience Store,Women's Store,Curling Ice,Drugstore,Donut Shop,Dog Run,Distribution Center,Discount Store,Diner
61,Central Toronto,3.0,Park,Swim School,Bus Line,Dance Studio,Drugstore,Donut Shop,Dog Run,Distribution Center,Discount Store,Diner
66,North York,3.0,Park,Convenience Store,Bank,Women's Store,Dance Studio,Drugstore,Donut Shop,Dog Run,Distribution Center,Discount Store
85,Scarborough,3.0,Park,Playground,Women's Store,Cuban Restaurant,Donut Shop,Dog Run,Distribution Center,Discount Store,Diner,Dim Sum Restaurant
91,Downtown Toronto,3.0,Park,Playground,Trail,Cuban Restaurant,Donut Shop,Dog Run,Distribution Center,Discount Store,Diner,Dim Sum Restaurant
101,Etobicoke,3.0,Park,Baseball Field,Women's Store,Dance Studio,Drugstore,Donut Shop,Dog Run,Distribution Center,Discount Store,Diner


#### Cluster 4

In [179]:
toronto_merged_Cleaned.loc[toronto_merged_Cleaned['Cluster Labels'] == 4, toronto_merged_Cleaned.columns[[1] + list(range(5, toronto_merged_Cleaned.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
6,Scarborough,4.0,Fast Food Restaurant,Print Shop,Construction & Landscaping,Women's Store,Curling Ice,Donut Shop,Dog Run,Distribution Center,Discount Store,Diner
8,East York,4.0,Pizza Place,Fast Food Restaurant,Gastropub,Breakfast Spot,Pharmacy,Gym / Fitness Center,Bank,Intersection,Bus Line,Athletics & Sports
10,North York,4.0,Park,Japanese Restaurant,Pizza Place,Pub,Cuban Restaurant,Dog Run,Distribution Center,Discount Store,Diner,Dim Sum Restaurant
50,North York,4.0,Pizza Place,Empanada Restaurant,Dumpling Restaurant,Donut Shop,Dog Run,Distribution Center,Discount Store,Diner,Dim Sum Restaurant,Dessert Shop
63,York,4.0,Pizza Place,Grocery Store,Convenience Store,Bus Line,Women's Store,Dance Studio,Drugstore,Donut Shop,Dog Run,Distribution Center
65,Scarborough,4.0,Indian Restaurant,Pet Store,Vietnamese Restaurant,Chinese Restaurant,Cuban Restaurant,Donut Shop,Dog Run,Distribution Center,Discount Store,Diner
70,Etobicoke,4.0,Pizza Place,Coffee Shop,Discount Store,Intersection,Chinese Restaurant,Sandwich Place,Curling Ice,Dog Run,Distribution Center,Diner
77,Etobicoke,4.0,Park,Pizza Place,Sandwich Place,Bus Line,Cuban Restaurant,Dog Run,Distribution Center,Discount Store,Diner,Dim Sum Restaurant
82,Scarborough,4.0,Pizza Place,Intersection,Fried Chicken Joint,Rental Car Location,Thai Restaurant,Chinese Restaurant,Bank,Italian Restaurant,Fast Food Restaurant,Noodle House
89,Etobicoke,4.0,Pizza Place,Grocery Store,Fried Chicken Joint,Pharmacy,Fast Food Restaurant,Sandwich Place,Beer Store,Women's Store,Deli / Bodega,Department Store

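The per-cluster tables above can be condensed into a short summary. This is a sketch on a made-up slice (the frame `sample` is hypothetical) that reports the size of each cluster and the most frequent top-ranked venue category within it.

```python
import pandas as pd

# Hypothetical slice of a merged table; the values are illustrative only.
sample = pd.DataFrame({
    "Cluster Labels": [0, 0, 0, 3, 3, 4],
    "1st Most Common Venue": ["Coffee Shop", "Coffee Shop", "Café",
                              "Park", "Park", "Pizza Place"],
})

# Number of neighborhoods per cluster.
sizes = sample["Cluster Labels"].value_counts().sort_index()

# Most frequent top-ranked venue category within each cluster.
dominant = sample.groupby("Cluster Labels")["1st Most Common Venue"].agg(
    lambda s: s.mode().iloc[0]
)

print(sizes)
print(dominant)
```

A summary like this makes it easier to name the clusters, for example a park-dominated group versus a pizza and fast-food group.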