# Clustering Assignment

## Requirements

For this assignment, you will be required to explore and cluster the neighborhoods in Toronto. Start by creating a new Notebook for this assignment. Use the Notebook to build the code to scrape the following Wikipedia page, https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M, in order to obtain the data that is in the table of postal codes and to transform the data into a pandas dataframe like the one shown below:

In [1]:
# importing necessary libraries
import pandas as pd
import numpy as np
from bs4 import BeautifulSoup
import requests
print('Libraries imported.')

Libraries imported.


In [2]:
# getting data from internet
wikipedia_link='https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'
raw_wikipedia_page= requests.get(wikipedia_link).text

# use BeautifulSoup (with the lxml parser) to parse the HTML
soup = BeautifulSoup(raw_wikipedia_page,'lxml')
#print(soup.prettify())

In [3]:
# extracting the raw table inside that webpage
data = []
columns = []
table = soup.find(class_='wikitable')
for index, tr in enumerate(table.find_all('tr')):
    section = []
    for td in tr.find_all(['th','td']):
        section.append(td.text.rstrip())
    
    #First row of data is the header
    if (index == 0):
        columns = section
    else:
        data.append(section)

#convert list into Pandas DataFrame
canada_df = pd.DataFrame(data = data,columns = columns)
canada_df.head()

| | Postal Code | Borough | Neighbourhood |
|---|---|---|---|
| 0 | M1A | Not assigned | Not assigned |
| 1 | M2A | Not assigned | Not assigned |
| 2 | M3A | North York | Parkwoods |
| 3 | M4A | North York | Victoria Village |
| 4 | M5A | Downtown Toronto | Regent Park, Harbourfront |


# Step 1B: Data Cleanup
<ol>
    <li>Remove boroughs that are 'Not assigned'.</li>
    <li>More than one neighborhood can exist in one postal code area; combine these into one row with the neighborhoods separated by a comma.</li>
    <li>If a cell has a borough but a 'Not assigned' neighborhood, then the neighborhood will be the same as the borough.</li>
</ol>
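The three cleanup rules can be sketched as a single function. This is a minimal sketch on a small hypothetical table: the column names match the scraped table, but `clean_postal_codes` and the sample rows are illustrative only, and it assumes each postal code belongs to exactly one borough.

```python
import pandas as pd

def clean_postal_codes(df: pd.DataFrame) -> pd.DataFrame:
    """Apply the three cleanup rules to a Postal Code/Borough/Neighbourhood table."""
    # Rule 1: drop rows whose borough is 'Not assigned'
    df = df[df["Borough"] != "Not assigned"]
    # Rule 2: one row per postal code, neighbourhoods joined with ', '
    # (assumes a postal code never spans two boroughs)
    df = (df.groupby(["Postal Code", "Borough"], sort=False)["Neighbourhood"]
            .agg(", ".join)
            .reset_index())
    # Rule 3: an unassigned neighbourhood takes the borough's name
    unassigned = df["Neighbourhood"] == "Not assigned"
    df.loc[unassigned, "Neighbourhood"] = df.loc[unassigned, "Borough"]
    return df.set_index("Postal Code")

# Hypothetical sample rows
raw = pd.DataFrame({
    "Postal Code": ["M1A", "M5A", "M5A", "M7C"],
    "Borough": ["Not assigned", "Downtown Toronto", "Downtown Toronto", "Queen's Park"],
    "Neighbourhood": ["Not assigned", "Regent Park", "Harbourfront", "Not assigned"],
})
cleaned = clean_postal_codes(raw)
print(cleaned)
```

Aggregating with `groupby(...).agg(", ".join)` produces one row per postal code directly, so no separate `drop_duplicates` step is needed.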

In [7]:
#Remove Boroughs that are 'Not assigned'
canada_df = canada_df[canada_df['Borough'] != 'Not assigned']
canada_df.head()

| | Postal Code | Borough | Neighbourhood |
|---|---|---|---|
| 2 | M3A | North York | Parkwoods |
| 3 | M4A | North York | Victoria Village |
| 4 | M5A | Downtown Toronto | Regent Park, Harbourfront |
| 5 | M6A | North York | Lawrence Manor, Lawrence Heights |
| 6 | M7A | Downtown Toronto | Queen's Park, Ontario Provincial Government |


In [9]:
# More than one neighborhood can exist in one postal code area; combine these into one row with the neighborhoods separated by a comma
canada_df["Neighbourhood"] = canada_df.groupby("Postal Code")["Neighbourhood"].transform(lambda neigh: ', '.join(neigh))

#remove duplicates
canada_df = canada_df.drop_duplicates()

#update index to be postcode if it isn't already
if(canada_df.index.name != 'Postal Code'):
    canada_df = canada_df.set_index('Postal Code')
    
canada_df.head()

| Postal Code | Borough | Neighbourhood |
|---|---|---|
| M3A | North York | Parkwoods |
| M4A | North York | Victoria Village |
| M5A | Downtown Toronto | Regent Park, Harbourfront |
| M6A | North York | Lawrence Manor, Lawrence Heights |
| M7A | Downtown Toronto | Queen's Park, Ontario Provincial Government |


In [13]:
# If a cell has a borough but a 'Not assigned' neighborhood, then the neighborhood will be the same as the borough
not_assigned = canada_df['Neighbourhood'] == 'Not assigned'
canada_df.loc[not_assigned, 'Neighbourhood'] = canada_df.loc[not_assigned, 'Borough']
canada_df.head()

| Postal Code | Borough | Neighbourhood |
|---|---|---|
| M3A | North York | Parkwoods |
| M4A | North York | Victoria Village |
| M5A | Downtown Toronto | Regent Park, Harbourfront |
| M6A | North York | Lawrence Manor, Lawrence Heights |
| M7A | Downtown Toronto | Queen's Park, Ontario Provincial Government |


In [14]:
canada_df.shape

(103, 2)

# Question 2

## Use the Geocoder package or the CSV file to create a dataframe with latitude and longitude values

We will be using a CSV file that has the geographical coordinates of each postal code: http://cocl.us/Geospatial_data

In [15]:
geo_url="http://cocl.us/Geospatial_data"
geo_data=pd.read_csv(geo_url)

In [16]:
#geo_data.columns
geo_data.columns=['Postal Code', 'Latitude', 'Longitude']

In [17]:
toronto_df2= pd.merge(canada_df, geo_data, how='inner', on="Postal Code")
toronto_df2

| | Postal Code | Borough | Neighbourhood | Latitude | Longitude |
|---|---|---|---|---|---|
| 0 | M3A | North York | Parkwoods | 43.753259 | -79.329656 |
| 1 | M4A | North York | Victoria Village | 43.725882 | -79.315572 |
| 2 | M5A | Downtown Toronto | Regent Park, Harbourfront | 43.654260 | -79.360636 |
| 3 | M6A | North York | Lawrence Manor, Lawrence Heights | 43.718518 | -79.464763 |
| 4 | M7A | Downtown Toronto | Queen's Park, Ontario Provincial Government | 43.662301 | -79.389494 |
| ... | ... | ... | ... | ... | ... |
| 98 | M8X | Etobicoke | The Kingsway, Montgomery Road, Old Mill North | 43.653654 | -79.506944 |
| 99 | M4Y | Downtown Toronto | Church and Wellesley | 43.665860 | -79.383160 |
| 100 | M7Y | East Toronto | Business reply mail Processing Centre, South C... | 43.662744 | -79.321558 |
| 101 | M8Y | Etobicoke | Old Mill South, King's Mill Park, Sunnylea, Hu... | 43.636258 | -79.498509 |
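An inner merge silently drops any postal code that has no row in the coordinates file. One way to check for that, sketched here on hypothetical rows (`M9Z` is made up), is an outer merge with pandas' `indicator=True` option, which adds a `_merge` column recording where each row came from.

```python
import pandas as pd

# Hypothetical inputs: one postal code has no coordinates
postal = pd.DataFrame({"Postal Code": ["M3A", "M4A", "M9Z"],
                       "Borough": ["North York", "North York", "Hypothetical"]})
coords = pd.DataFrame({"Postal Code": ["M3A", "M4A"],
                       "Latitude": [43.753259, 43.725882],
                       "Longitude": [-79.329656, -79.315572]})

# '_merge' is 'both', 'left_only' or 'right_only' for every row
check = pd.merge(postal, coords, how="outer", on="Postal Code", indicator=True)
missing = check.loc[check["_merge"] == "left_only", "Postal Code"].tolist()
print(missing)  # postal codes without coordinates
```

If `missing` is empty, the inner merge above loses no rows.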


In [18]:
toronto_df2.shape

(103, 5)

# Question 3
## Explore and cluster the neighborhoods in Toronto

In [19]:
print('The dataframe has {} boroughs and {} neighborhoods.'.format(
        len(toronto_df2['Borough'].unique()),
        toronto_df2.shape[0]
    )
)

The dataframe has 10 boroughs and 103 neighborhoods.


In [22]:
!pip install geopy
# convert an address into latitude and longitude values
from geopy.geocoders import Nominatim 
import geopy
# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans
!pip install folium
#!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't completed the Foursquare API lab
import folium # map rendering library

print('Libraries imported.')

Collecting folium
  Downloading folium-0.11.0-py2.py3-none-any.whl (93 kB)
Collecting branca>=0.3.0
  Downloading branca-0.4.1-py3-none-any.whl (24 kB)
Installing collected packages: branca, folium
Successfully installed branca-0.4.1 folium-0.11.0
Libraries imported.


In [23]:
# get latitude and longitude of Toronto

address = 'Toronto, ON'

# Nominatim asks for a user agent that identifies the application
geolocator = Nominatim(user_agent="toronto_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Toronto are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Toronto are 43.6534817, -79.3839347.


In [25]:
# create map of Toronto using latitude and longitude values
map_toronto = folium.Map(location=[latitude, longitude], zoom_start=10)
# add markers to map
for lat, lng, borough, neighborhood in zip(toronto_df2['Latitude'], toronto_df2['Longitude'], toronto_df2['Borough'], toronto_df2['Neighbourhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto) 
map_toronto

In [26]:
df_borough_toronto=toronto_df2[toronto_df2["Borough"].str.contains("Toronto")].reset_index(drop=True)
df_borough_toronto.size

195

In [27]:
df_borough_toronto.head()

| | Postal Code | Borough | Neighbourhood | Latitude | Longitude |
|---|---|---|---|---|---|
| 0 | M5A | Downtown Toronto | Regent Park, Harbourfront | 43.65426 | -79.360636 |
| 1 | M7A | Downtown Toronto | Queen's Park, Ontario Provincial Government | 43.662301 | -79.389494 |
| 2 | M5B | Downtown Toronto | Garden District, Ryerson | 43.657162 | -79.378937 |
| 3 | M5C | Downtown Toronto | St. James Town | 43.651494 | -79.375418 |
| 4 | M4E | East Toronto | The Beaches | 43.676357 | -79.293031 |


In [28]:
df_borough_toronto["Borough"].unique()

array(['Downtown Toronto', 'East Toronto', 'West Toronto',
       'Central Toronto'], dtype=object)

In [29]:
df_borough_toronto["color"]=df_borough_toronto["Borough"].map({'East Toronto':"green", 'Central Toronto':"red", 'Downtown Toronto':"blue",
       'West Toronto':"black"})

In [30]:
df_borough_toronto.shape

(39, 6)

In [32]:
# first neighbourhood
neighborhood_latitude1=df_borough_toronto["Latitude"][0]
neighborhood_longitude1=df_borough_toronto["Longitude"][0]
neighborhood_name1=df_borough_toronto["Neighbourhood"][0]

print (f"{neighborhood_name1} has latitude and longitude: [{neighborhood_latitude1},{neighborhood_longitude1}]")

Regent Park, Harbourfront has latitude and longitude: [43.6542599,-79.3606359]


In [48]:
# Set up the API URL to explore nearby venues
LIMIT=100
RADIUS=500
CLIENT_ID = "YOUR_CLIENT_ID"          # your Foursquare client ID
CLIENT_SECRET = "YOUR_CLIENT_SECRET"  # your Foursquare client secret
VERSION = 20201217
url=f"https://api.foursquare.com/v2/venues/explore?client_id={CLIENT_ID}&client_secret={CLIENT_SECRET}&ll={neighborhood_latitude1},{neighborhood_longitude1}&v={VERSION}&radius={RADIUS}&limit={LIMIT}"
neighborhood_json = requests.get(url).json()["response"]["groups"][0]["items"]
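Rather than pasting credentials into the notebook, one option is to read them from environment variables and let the standard library's `urllib.parse.urlencode` build and escape the query string. The variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and the helper `build_explore_url` are hypothetical; the endpoint and parameter names are the ones used in the cell above.

```python
import os
from urllib.parse import urlencode

def build_explore_url(lat, lng, radius=500, limit=100, version="20201217"):
    """Build a Foursquare v2 'venues/explore' URL for one coordinate pair."""
    params = {
        # Hypothetical environment variable names; empty string if unset
        "client_id": os.environ.get("FOURSQUARE_CLIENT_ID", ""),
        "client_secret": os.environ.get("FOURSQUARE_CLIENT_SECRET", ""),
        "ll": f"{lat},{lng}",
        "v": version,
        "radius": radius,
        "limit": limit,
    }
    return "https://api.foursquare.com/v2/venues/explore?" + urlencode(params)

url = build_explore_url(43.65, -79.36)
print(url)
```

`urlencode` percent-encodes the comma in `ll`, which the API accepts the same as a literal comma.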

In [49]:

# Serializing json
import json
json_object = json.dumps(neighborhood_json, indent = 4)

In [50]:
#save data as json file to explore
with open("jsonData.json","w") as f:
    f.write(json_object)

In [51]:
venues=neighborhood_json

In [52]:
#flatten Json
from pandas import json_normalize
nearby_venues=json_normalize(venues)

In [53]:
filtered_columns=['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']

In [54]:
nearby_venues=nearby_venues.loc[:,filtered_columns]

In [55]:
def getCategory_type(row):
    # Return the name of the venue's first category, or None if it has none
    try:
        category_list = row["categories"]
    except KeyError:
        category_list = row["venue.categories"]
    if len(category_list) == 0:
        return None
    else:
        return category_list[0]["name"]

In [56]:
nearby_venues["categories"] = nearby_venues.apply(getCategory_type, axis=1)

In [57]:
nearby_venues.drop(["venue.categories"],axis=1,inplace=True)

In [58]:
nearby_venues

| | venue.name | venue.location.lat | venue.location.lng | categories |
|---|---|---|---|---|
| 0 | Roselle Desserts | 43.653447 | -79.362017 | Bakery |
| 1 | Tandem Coffee | 43.653559 | -79.361809 | Coffee Shop |
| 2 | Cooper Koo Family YMCA | 43.653249 | -79.358008 | Distribution Center |
| 3 | Body Blitz Spa East | 43.654735 | -79.359874 | Spa |
| 4 | Impact Kitchen | 43.656369 | -79.35698 | Restaurant |
| 5 | Morning Glory Cafe | 43.653947 | -79.361149 | Breakfast Spot |
| 6 | The Extension Room | 43.653313 | -79.359725 | Gym / Fitness Center |
| 7 | The Distillery Historic District | 43.650244 | -79.359323 | Historic Site |
| 8 | Corktown Common | 43.655618 | -79.356211 | Park |
| 9 | SOMA chocolatemaker | 43.650622 | -79.358127 | Chocolate Shop |
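`json_normalize` flattens nested dictionaries into dotted column names but leaves lists such as `venue.categories` untouched, which is why a helper is needed to pull out the first category name. A minimal sketch on two items shaped like the Foursquare response; the second item is hypothetical and has an empty category list.

```python
from pandas import json_normalize

items = [
    {"venue": {"name": "Roselle Desserts",
               "categories": [{"name": "Bakery"}],
               "location": {"lat": 43.653447, "lng": -79.362017}}},
    {"venue": {"name": "Hypothetical Venue",
               "categories": [],
               "location": {"lat": 43.65, "lng": -79.36}}},
]

flat = json_normalize(items)
print(sorted(flat.columns))  # nested keys become dotted column names

def first_category(categories):
    # Return the first category name, or None when the list is empty
    return categories[0]["name"] if categories else None

flat["categories"] = flat["venue.categories"].apply(first_category)
print(flat[["venue.name", "categories"]])
```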


In [59]:
df_borough_toronto

| | Postal Code | Borough | Neighbourhood | Latitude | Longitude | color |
|---|---|---|---|---|---|---|
| 0 | M5A | Downtown Toronto | Regent Park, Harbourfront | 43.65426 | -79.360636 | blue |
| 1 | M7A | Downtown Toronto | Queen's Park, Ontario Provincial Government | 43.662301 | -79.389494 | blue |
| 2 | M5B | Downtown Toronto | Garden District, Ryerson | 43.657162 | -79.378937 | blue |
| 3 | M5C | Downtown Toronto | St. James Town | 43.651494 | -79.375418 | blue |
| 4 | M4E | East Toronto | The Beaches | 43.676357 | -79.293031 | green |
| 5 | M5E | Downtown Toronto | Berczy Park | 43.644771 | -79.373306 | blue |
| 6 | M5G | Downtown Toronto | Central Bay Street | 43.657952 | -79.387383 | blue |
| 7 | M6G | Downtown Toronto | Christie | 43.669542 | -79.422564 | blue |
| 8 | M5H | Downtown Toronto | Richmond, Adelaide, King | 43.650571 | -79.384568 | blue |
| 9 | M6H | West Toronto | Dufferin, Dovercourt Village | 43.669005 | -79.442259 | black |


In [111]:
def getNearByVenues(neighbourhood_names, latitudes, longitudes):
    venues_list=[]

    for name, lat, lng in zip(neighbourhood_names, latitudes, longitudes):
        print(name)
        
        url=f"https://api.foursquare.com/v2/venues/explore?client_id={CLIENT_ID}&client_secret={CLIENT_SECRET}&ll={lat},{lng}&v={VERSION}&radius={RADIUS}&limit={LIMIT}"
        neighborhood_json = requests.get(url).json()["response"]["groups"][0]["items"]
        # append this neighbourhood's list of venue tuples to venues_list
        venues_list.append([(
            name,
            lat,
            lng,
            v["venue"]["name"],
            v["venue"]["location"]["lat"],
            v["venue"]["location"]["lng"],
            # venues without a category are kept with None instead of raising IndexError
            v["venue"]["categories"][0]["name"] if v["venue"]["categories"] else None) for v in neighborhood_json])
    # flatten the list of lists into one DataFrame with one row per venue
    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns=['Neighbourhood', 
                    'Neighbourhood Latitude', 
                    'Neighbourhood Longitude', 
                    'Venue', 
                    'Venue Latitude', 
                    'Venue Longitude', 
                    'Venue Category']
    return nearby_venues
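The flattening step can be separated from the HTTP call so that it can be exercised without API credentials. In this sketch, `venues_to_frame` and `fake_fetch` are hypothetical: `fetch_items` stands in for the `requests.get(url).json()[...]` lookup, and passing the column names to the `DataFrame` constructor also covers the case where no venues come back at all.

```python
import pandas as pd

COLUMNS = ["Neighbourhood", "Neighbourhood Latitude", "Neighbourhood Longitude",
           "Venue", "Venue Latitude", "Venue Longitude", "Venue Category"]

def venues_to_frame(names, lats, lngs, fetch_items):
    """Flatten the venues returned for each neighbourhood into one DataFrame."""
    rows = []
    for name, lat, lng in zip(names, lats, lngs):
        for item in fetch_items(lat, lng):  # list of 'items' for one neighbourhood
            venue = item["venue"]
            categories = venue.get("categories") or []
            rows.append((name, lat, lng, venue["name"],
                         venue["location"]["lat"], venue["location"]["lng"],
                         categories[0]["name"] if categories else None))
    return pd.DataFrame(rows, columns=COLUMNS)

# Hypothetical fetcher that returns one canned venue for any coordinates
def fake_fetch(lat, lng):
    return [{"venue": {"name": "Tandem Coffee",
                       "categories": [{"name": "Coffee Shop"}],
                       "location": {"lat": 43.653559, "lng": -79.361809}}}]

df = venues_to_frame(["Regent Park, Harbourfront"], [43.65426], [-79.360636], fake_fetch)
print(df)
```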

In [112]:
toronto_venues_df = getNearByVenues(df_borough_toronto['Neighbourhood'],df_borough_toronto['Latitude'],df_borough_toronto['Longitude'])

Regent Park, Harbourfront
Queen's Park, Ontario Provincial Government
Garden District, Ryerson
St. James Town
The Beaches
Berczy Park
Central Bay Street
Christie
Richmond, Adelaide, King
Dufferin, Dovercourt Village
Harbourfront East, Union Station, Toronto Islands
Little Portugal, Trinity
The Danforth West, Riverdale
Toronto Dominion Centre, Design Exchange
Brockton, Parkdale Village, Exhibition Place
India Bazaar, The Beaches West
Commerce Court, Victoria Hotel
Studio District
Lawrence Park
Roselawn
Davisville North
Forest Hill North & West, Forest Hill Road Park
High Park, The Junction South
North Toronto West,  Lawrence Park
The Annex, North Midtown, Yorkville
Parkdale, Roncesvalles
Davisville
University of Toronto, Harbord
Runnymede, Swansea
Moore Park, Summerhill East
Kensington Market, Chinatown, Grange Park
Summerhill West, Rathnelly, South Hill, Forest Hill SE, Deer Park
CN Tower, King and Spadina, Railway Lands, Harbourfront West, Bathurst Quay, South Niagara, Island airport


In [63]:
toronto_venues_df.shape

(1624, 7)

In [115]:
toronto_venues_df.groupby("Neighbourhood").count()

| Neighbourhood | Neighbourhood Latitude | Neighbourhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|
| Berczy Park | 55 | 55 | 55 | 55 | 55 | 55 |
| Brockton, Parkdale Village, Exhibition Place | 23 | 23 | 23 | 23 | 23 | 23 |
| Business reply mail Processing Centre, South Central Letter Processing Plant Toronto | 16 | 16 | 16 | 16 | 16 | 16 |
| CN Tower, King and Spadina, Railway Lands, Harbourfront West, Bathurst Quay, South Niagara, Island airport | 16 | 16 | 16 | 16 | 16 | 16 |
| Central Bay Street | 68 | 68 | 68 | 68 | 68 | 68 |
| Christie | 16 | 16 | 16 | 16 | 16 | 16 |
| Church and Wellesley | 75 | 75 | 75 | 75 | 75 | 75 |
| Commerce Court, Victoria Hotel | 100 | 100 | 100 | 100 | 100 | 100 |
| Davisville | 33 | 33 | 33 | 33 | 33 | 33 |
| Davisville North | 9 | 9 | 9 | 9 | 9 | 9 |


In [116]:
# analyze the neighbourhoods
# create a dummy (one-hot) column for each venue category

torento_onehot=pd.get_dummies(toronto_venues_df[["Venue Category"]], prefix="", prefix_sep="")

In [117]:
torento_onehot.shape

(1624, 235)

In [118]:
torento_onehot["Neighbourhood"]=toronto_venues_df["Neighbourhood"]

In [119]:
torento_onehot.head()

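| | Afghan Restaurant | Airport | Airport Food Court | Airport Gate | Airport Lounge | Airport Service | Airport Terminal | American Restaurant | Antique Shop | Aquarium | ... | Theme Restaurant | Toy / Game Store | Trail | Train Station | Vegetarian / Vegan Restaurant | Video Game Store | Vietnamese Restaurant | Wine Bar | Yoga Studio | Neighbourhood |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | Regent Park, Harbourfront |
| 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | Regent Park, Harbourfront |
| 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | Regent Park, Harbourfront |
| 3 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | Regent Park, Harbourfront |
| 4 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | Regent Park, Harbourfront |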


In [120]:
torento_onehot.columns.get_loc("Neighbourhood")

235

In [121]:

torento_onehot.columns[158]

'Miscellaneous Shop'

In [123]:
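# Position 159 holds the 'Modern European Restaurant' column, not 'Neighbourhood'
# (which is at position 235, the last column), so this line moves that venue column
# to the front, as the output below shows; torento_onehot.columns[-1] is 'Neighbourhood'.
fixed_columns=[torento_onehot.columns[159]]+list(torento_onehot.columns[0:159])+list(torento_onehot.columns[160:])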

In [124]:
len(fixed_columns)

236

In [125]:
torento_onehot=torento_onehot[fixed_columns]

In [126]:

torento_onehot.head()

| | Modern European Restaurant | Afghan Restaurant | Airport | Airport Food Court | Airport Gate | Airport Lounge | Airport Service | Airport Terminal | American Restaurant | Antique Shop | ... | Theme Restaurant | Toy / Game Store | Trail | Train Station | Vegetarian / Vegan Restaurant | Video Game Store | Vietnamese Restaurant | Wine Bar | Yoga Studio | Neighbourhood |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | Regent Park, Harbourfront |
| 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | Regent Park, Harbourfront |
| 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | Regent Park, Harbourfront |
| 3 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | Regent Park, Harbourfront |
| 4 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | Regent Park, Harbourfront |
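Looking a column up by name avoids hard-coding a position such as 159. A minimal sketch on a hypothetical three-column frame that moves `Neighbourhood` to the front and keeps every other column in its original order:

```python
import pandas as pd

# Hypothetical one-hot frame with the label column last
onehot = pd.DataFrame({"Bakery": [1, 0], "Park": [0, 1],
                       "Neighbourhood": ["Regent Park, Harbourfront", "Christie"]})

# Put 'Neighbourhood' first, preserving the order of the remaining columns
ordered = ["Neighbourhood"] + [c for c in onehot.columns if c != "Neighbourhood"]
onehot = onehot[ordered]
print(list(onehot.columns))
```

An equivalent in-place variant is `onehot.insert(0, "Neighbourhood", onehot.pop("Neighbourhood"))`.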


In [127]:
torento_onehot_grouped=torento_onehot.groupby("Neighbourhood").mean().reset_index()

In [128]:
torento_onehot_grouped

| | Neighbourhood | Modern European Restaurant | Afghan Restaurant | Airport | Airport Food Court | Airport Gate | Airport Lounge | Airport Service | Airport Terminal | American Restaurant | ... | Theater | Theme Restaurant | Toy / Game Store | Trail | Train Station | Vegetarian / Vegan Restaurant | Video Game Store | Vietnamese Restaurant | Wine Bar | Yoga Studio |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Berczy Park | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.018182 | 0.0 | 0.0 | 0.0 | 0.0 |
| 1 | Brockton, Parkdale Village, Exhibition Place | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 2 | Business reply mail Processing Centre, South C... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 3 | CN Tower, King and Spadina, Railway Lands, Har... | 0.0 | 0.0 | 0.0625 | 0.0625 | 0.0625 | 0.125 | 0.125 | 0.0625 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 4 | Central Bay Street | 0.014706 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.014706 | 0.0 | 0.0 | 0.014706 | 0.014706 |
| 5 | Christie | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 6 | Church and Wellesley | 0.0 | 0.013333 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.013333 | ... | 0.013333 | 0.013333 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.013333 | 0.0 | 0.026667 |
| 7 | Commerce Court, Victoria Hotel | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.04 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.02 | 0.0 | 0.0 | 0.01 | 0.0 |
| 8 | Davisville | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.030303 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 9 | Davisville North | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
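Because each one-hot row contains a single 1, averaging the rows of one neighbourhood gives the share of its venues that fall in each category, and each neighbourhood's shares sum to 1. A small sketch on hypothetical data (`dtype=int` keeps the dummies as 0/1 integers, as in the output above):

```python
import pandas as pd

# Hypothetical venues for two neighbourhoods
venues = pd.DataFrame({
    "Neighbourhood": ["A", "A", "A", "A", "B"],
    "Venue Category": ["Café", "Café", "Park", "Bakery", "Park"],
})

onehot = pd.get_dummies(venues[["Venue Category"]], prefix="", prefix_sep="", dtype=int)
onehot["Neighbourhood"] = venues["Neighbourhood"]
# Mean of 0/1 indicators per group = frequency of each category
freq = onehot.groupby("Neighbourhood").mean()
print(freq)
```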


In [129]:

for hood in torento_onehot_grouped["Neighbourhood"]:
    print(f"-------{hood}----")
    # select this neighbourhood's row and transpose it into (venue, frequency) pairs
    temp=torento_onehot_grouped[torento_onehot_grouped["Neighbourhood"]==hood].T.reset_index()
    temp.columns=["venue","freq"]
    # drop the first row, which holds the neighbourhood name
    temp=temp[1:]
    temp["freq"]=round(temp["freq"].astype(float),2)
    print(temp.sort_values(by="freq",axis=0,ascending=False).reset_index(drop=True).head(10))
    print("\n")

-------Berczy Park----
                venue  freq
0         Coffee Shop  0.09
1  Seafood Restaurant  0.04
2         Cheese Shop  0.04
3            Beer Bar  0.04
4        Cocktail Bar  0.04
5          Restaurant  0.04
6      Farmers Market  0.04
7              Bakery  0.04
8       Grocery Store  0.02
9            Pharmacy  0.02


-------Brockton, Parkdale Village, Exhibition Place----
                   venue  freq
0                   Café  0.13
1              Nightclub  0.09
2            Coffee Shop  0.09
3         Breakfast Spot  0.09
4          Grocery Store  0.04
5     Italian Restaurant  0.04
6  Performing Arts Venue  0.04
7           Climbing Gym  0.04
8             Restaurant  0.04
9          Burrito Place  0.04


-------Business reply mail Processing Centre, South Central Letter Processing Plant Toronto----
                  venue  freq
0  Fast Food Restaurant  0.06
1         Auto Workshop  0.06
2            Comic Shop  0.06
3           Pizza Place  0.06
4      Recording Studi

In [131]:
dict1={}

for  hood in torento_onehot_grouped["Neighbourhood"]:
    val=[]
    #print(f"-------{hood}----")
    temp=torento_onehot_grouped[torento_onehot_grouped["Neighbourhood"]==hood].T.reset_index()
    temp.columns=["venue","freq"]
    temp=temp[1:]
    temp["freq"]=round(temp["freq"].astype(float),2)
    val=list(temp.sort_values(by="freq",axis=0,ascending=False).reset_index(drop=True).head(10)["venue"])
    dict1[hood]=val

In [132]:
cols=["No."+str(x)+"_common_Place" for x in range(1,11)]

In [133]:
neighborhoods_venues_sorted=pd.DataFrame(dict1).T

In [134]:
neighborhoods_venues_sorted.columns=cols

In [135]:
neighborhoods_venues_sorted.insert(0,"Neighbourhood",list(neighborhoods_venues_sorted.index))

In [136]:
neighborhoods_venues_sorted.reset_index(drop=True,inplace=True)

In [137]:
neighborhoods_venues_sorted.head()

| | Neighbourhood | No.1_common_Place | No.2_common_Place | No.3_common_Place | No.4_common_Place | No.5_common_Place | No.6_common_Place | No.7_common_Place | No.8_common_Place | No.9_common_Place | No.10_common_Place |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Berczy Park | Coffee Shop | Seafood Restaurant | Cheese Shop | Beer Bar | Cocktail Bar | Restaurant | Farmers Market | Bakery | Grocery Store | Pharmacy |
| 1 | Brockton, Parkdale Village, Exhibition Place | Café | Nightclub | Coffee Shop | Breakfast Spot | Grocery Store | Italian Restaurant | Performing Arts Venue | Climbing Gym | Restaurant | Burrito Place |
| 2 | Business reply mail Processing Centre, South C... | Fast Food Restaurant | Auto Workshop | Comic Shop | Pizza Place | Recording Studio | Restaurant | Butcher | Burrito Place | Brewery | Farmers Market |
| 3 | CN Tower, King and Spadina, Railway Lands, Har... | Airport Lounge | Airport Service | Harbor / Marina | Plane | Boat or Ferry | Boutique | Coffee Shop | Rental Car Location | Bar | Sculpture Garden |
| 4 | Central Bay Street | Coffee Shop | Café | Italian Restaurant | Sandwich Place | Department Store | Salad Place | Japanese Restaurant | Bubble Tea Shop | Thai Restaurant | Burger Joint |
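The two loops above transpose and sort one row at a time. An alternative sketch ranks every row at once with `numpy.argsort`, assuming the grouped frequency table has a `Neighbourhood` column and numeric category columns; `top_venues` and the sample frequencies are hypothetical. As with the loops, ties (including the many zero-frequency categories) can come back in any order.

```python
import numpy as np
import pandas as pd

# Hypothetical grouped frequency table
grouped = pd.DataFrame({
    "Neighbourhood": ["A", "B"],
    "Bakery": [0.25, 0.0],
    "Café": [0.5, 0.1],
    "Park": [0.15, 0.85],
    "Pub": [0.1, 0.05],
})

def top_venues(grouped, n):
    """Return the n most frequent categories per neighbourhood."""
    values = grouped.drop(columns="Neighbourhood")
    # argsort of the negated values orders each row from largest to smallest
    order = np.argsort(-values.to_numpy(), axis=1)[:, :n]
    names = values.columns.to_numpy()[order]
    cols = [f"No.{i}_common_Place" for i in range(1, n + 1)]
    result = pd.DataFrame(names, columns=cols)
    result.insert(0, "Neighbourhood", grouped["Neighbourhood"].to_numpy())
    return result

result = top_venues(grouped, 3)
print(result)
```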


In [138]:
torento_onehot_grouped.head()

| | Neighbourhood | Modern European Restaurant | Afghan Restaurant | Airport | Airport Food Court | Airport Gate | Airport Lounge | Airport Service | Airport Terminal | American Restaurant | ... | Theater | Theme Restaurant | Toy / Game Store | Trail | Train Station | Vegetarian / Vegan Restaurant | Video Game Store | Vietnamese Restaurant | Wine Bar | Yoga Studio |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Berczy Park | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.018182 | 0.0 | 0.0 | 0.0 | 0.0 |
| 1 | Brockton, Parkdale Village, Exhibition Place | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 2 | Business reply mail Processing Centre, South C... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 3 | CN Tower, King and Spadina, Railway Lands, Har... | 0.0 | 0.0 | 0.0625 | 0.0625 | 0.0625 | 0.125 | 0.125 | 0.0625 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 4 | Central Bay Street | 0.014706 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.014706 | 0.0 | 0.0 | 0.014706 | 0.014706 |


In [139]:
#Training
from  sklearn.cluster import KMeans

# set number of clusters
n_cluster=5
# set training data (drop the non-numeric neighbourhood name)
training_Data=torento_onehot_grouped.drop("Neighbourhood",axis=1)
#Training the model
cluster_kmean=KMeans(n_clusters=n_cluster,random_state=0).fit(training_Data)
cluster_kmean

KMeans(algorithm='auto', copy_x=True, init='k-means++', max_iter=300,
       n_clusters=5, n_init=10, n_jobs=None, precompute_distances='auto',
       random_state=0, tol=0.0001, verbose=0)
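The notebook fixes `n_cluster=5` without justification. One common heuristic is the elbow method: fit K-means for several values of k and look for the point where inertia (the within-cluster sum of squared distances, exposed by scikit-learn as `inertia_`) stops falling sharply. This sketch uses synthetic data with three blobs as a stand-in for `training_Data`; on the real frequency table the elbow may be much less pronounced.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Synthetic stand-in for training_Data: three well-separated blobs in 4 dimensions
X = np.vstack([rng.normal(loc=c, scale=0.1, size=(20, 4)) for c in (0.0, 1.0, 2.0)])

inertias = {}
for k in range(1, 7):
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    inertias[k] = model.inertia_

for k, value in inertias.items():
    print(k, round(value, 3))
```

Here the inertia drops steeply up to k=3 and only slightly afterwards, which is the "elbow".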

In [140]:
#check the labels
cluster_kmean.labels_

array([2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 0, 2, 2, 2, 2, 2, 0, 2, 1, 2,
       2, 2, 2, 2, 0, 4, 2, 2, 2, 2, 2, 2, 2, 3, 2, 2, 2])
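Counting the labels shows how unbalanced the clustering is. Using the label array printed above, `numpy.bincount` gives the number of neighbourhoods assigned to each label:

```python
import numpy as np

# Labels copied from cluster_kmean.labels_ above
labels = np.array([2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 0, 2, 2, 2, 2, 2, 0, 2, 1, 2,
                   2, 2, 2, 2, 0, 4, 2, 2, 2, 2, 2, 2, 2, 3, 2, 2, 2])
sizes = np.bincount(labels)
print(sizes)  # neighbourhoods per cluster label 0..4
```

Label 2 holds 33 of the 39 neighbourhoods, while labels 0, 1, 3 and 4 hold 3, 1, 1 and 1.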

In [141]:
#adding cluster into venues tables
neighborhoods_venues_sorted.insert(0,"cluster_lablel",cluster_kmean.labels_)

In [142]:
neighborhoods_venues_sorted

| | cluster_lablel | Neighbourhood | No.1_common_Place | No.2_common_Place | No.3_common_Place | No.4_common_Place | No.5_common_Place | No.6_common_Place | No.7_common_Place | No.8_common_Place | No.9_common_Place | No.10_common_Place |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2 | Berczy Park | Coffee Shop | Seafood Restaurant | Cheese Shop | Beer Bar | Cocktail Bar | Restaurant | Farmers Market | Bakery | Grocery Store | Pharmacy |
| 1 | 2 | Brockton, Parkdale Village, Exhibition Place | Café | Nightclub | Coffee Shop | Breakfast Spot | Grocery Store | Italian Restaurant | Performing Arts Venue | Climbing Gym | Restaurant | Burrito Place |
| 2 | 2 | Business reply mail Processing Centre, South C... | Fast Food Restaurant | Auto Workshop | Comic Shop | Pizza Place | Recording Studio | Restaurant | Butcher | Burrito Place | Brewery | Farmers Market |
| 3 | 2 | CN Tower, King and Spadina, Railway Lands, Har... | Airport Lounge | Airport Service | Harbor / Marina | Plane | Boat or Ferry | Boutique | Coffee Shop | Rental Car Location | Bar | Sculpture Garden |
| 4 | 2 | Central Bay Street | Coffee Shop | Café | Italian Restaurant | Sandwich Place | Department Store | Salad Place | Japanese Restaurant | Bubble Tea Shop | Thai Restaurant | Burger Joint |
| 5 | 2 | Christie | Grocery Store | Café | Park | Nightclub | Italian Restaurant | Candy Store | Baby Store | Athletics & Sports | Restaurant | Coffee Shop |
| 6 | 2 | Church and Wellesley | Coffee Shop | Gay Bar | Sushi Restaurant | Japanese Restaurant | Restaurant | Café | Pub | Men's Store | Mediterranean Restaurant | Yoga Studio |
| 7 | 2 | Commerce Court, Victoria Hotel | Coffee Shop | Restaurant | Café | Hotel | Gym | American Restaurant | Seafood Restaurant | Deli / Bodega | Italian Restaurant | Japanese Restaurant |
| 8 | 2 | Davisville | Pizza Place | Sandwich Place | Dessert Shop | Café | Italian Restaurant | Coffee Shop | Gym | Sushi Restaurant | Indoor Play Area | Gas Station |
| 9 | 2 | Davisville North | Gym / Fitness Center | Food & Drink Shop | Department Store | Sandwich Place | Dance Studio | Hotel | Dog Run | Breakfast Spot | Park | Mediterranean Restaurant |


In [143]:
torento_merged=toronto_df2.copy()
torento_merged

| | Postal Code | Borough | Neighbourhood | Latitude | Longitude |
|---|---|---|---|---|---|
| 0 | M3A | North York | Parkwoods | 43.753259 | -79.329656 |
| 1 | M4A | North York | Victoria Village | 43.725882 | -79.315572 |
| 2 | M5A | Downtown Toronto | Regent Park, Harbourfront | 43.654260 | -79.360636 |
| 3 | M6A | North York | Lawrence Manor, Lawrence Heights | 43.718518 | -79.464763 |
| 4 | M7A | Downtown Toronto | Queen's Park, Ontario Provincial Government | 43.662301 | -79.389494 |
| ... | ... | ... | ... | ... | ... |
| 98 | M8X | Etobicoke | The Kingsway, Montgomery Road, Old Mill North | 43.653654 | -79.506944 |
| 99 | M4Y | Downtown Toronto | Church and Wellesley | 43.665860 | -79.383160 |
| 100 | M7Y | East Toronto | Business reply mail Processing Centre, South C... | 43.662744 | -79.321558 |
| 101 | M8Y | Etobicoke | Old Mill South, King's Mill Park, Sunnylea, Hu... | 43.636258 | -79.498509 |


In [144]:
torento_merged=pd.merge(torento_merged,neighborhoods_venues_sorted,on="Neighbourhood")

In [97]:
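# Optional: index the merged table by postal code. Left commented out because the
# outputs below keep 'Postal Code' as a regular column and the map cell reads rows by [0].
# torento_merged.set_index("Postal Code",drop=True,inplace=True)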

In [145]:
torento_merged

| | Postal Code | Borough | Neighbourhood | Latitude | Longitude | cluster_lablel | No.1_common_Place | No.2_common_Place | No.3_common_Place | No.4_common_Place | No.5_common_Place | No.6_common_Place | No.7_common_Place | No.8_common_Place | No.9_common_Place | No.10_common_Place |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | M5A | Downtown Toronto | Regent Park, Harbourfront | 43.65426 | -79.360636 | 2 | Coffee Shop | Park | Pub | Bakery | Theater | Café | Breakfast Spot | Gym / Fitness Center | Event Space | French Restaurant |
| 1 | M7A | Downtown Toronto | Queen's Park, Ontario Provincial Government | 43.662301 | -79.389494 | 2 | Coffee Shop | Yoga Studio | Restaurant | Beer Bar | Fried Chicken Joint | Smoothie Shop | Mexican Restaurant | Café | Sandwich Place | Chinese Restaurant |
| 2 | M5B | Downtown Toronto | Garden District, Ryerson | 43.657162 | -79.378937 | 2 | Clothing Store | Coffee Shop | Café | Japanese Restaurant | Cosmetics Shop | Bubble Tea Shop | Diner | Ramen Restaurant | Italian Restaurant | Middle Eastern Restaurant |
| 3 | M5C | Downtown Toronto | St. James Town | 43.651494 | -79.375418 | 2 | Coffee Shop | Café | Restaurant | Cocktail Bar | Beer Bar | American Restaurant | Gastropub | Department Store | Hotel | Gym |
| 4 | M4E | East Toronto | The Beaches | 43.676357 | -79.293031 | 3 | Health Food Store | Pub | Trail | Neighborhood | Modern European Restaurant | Men's Store | Miscellaneous Shop | Middle Eastern Restaurant | Mexican Restaurant | Massage Studio |
| 5 | M5E | Downtown Toronto | Berczy Park | 43.644771 | -79.373306 | 2 | Coffee Shop | Seafood Restaurant | Cheese Shop | Beer Bar | Cocktail Bar | Restaurant | Farmers Market | Bakery | Grocery Store | Pharmacy |
| 6 | M5G | Downtown Toronto | Central Bay Street | 43.657952 | -79.387383 | 2 | Coffee Shop | Café | Italian Restaurant | Sandwich Place | Department Store | Salad Place | Japanese Restaurant | Bubble Tea Shop | Thai Restaurant | Burger Joint |
| 7 | M6G | Downtown Toronto | Christie | 43.669542 | -79.422564 | 2 | Grocery Store | Café | Park | Nightclub | Italian Restaurant | Candy Store | Baby Store | Athletics & Sports | Restaurant | Coffee Shop |
| 8 | M5H | Downtown Toronto | Richmond, Adelaide, King | 43.650571 | -79.384568 | 2 | Coffee Shop | Café | Hotel | Gym | Restaurant | Thai Restaurant | Clothing Store | Bar | Breakfast Spot | Bookstore |
| 9 | M6H | West Toronto | Dufferin, Dovercourt Village | 43.669005 | -79.442259 | 2 | Pharmacy | Bakery | Supermarket | Park | Music Venue | Middle Eastern Restaurant | Café | Brewery | Bar | Bank |


In [146]:
torento_merged.columns

Index(['Postal Code', 'Borough', 'Neighbourhood', 'Latitude', 'Longitude',
       'cluster_lablel', 'No.1_common_Place', 'No.2_common_Place',
       'No.3_common_Place', 'No.4_common_Place', 'No.5_common_Place',
       'No.6_common_Place', 'No.7_common_Place', 'No.8_common_Place',
       'No.9_common_Place', 'No.10_common_Place'],
      dtype='object')

In [147]:
# create map of Toronto using latitude and longitude values
map_toronto = folium.Map(location=[torento_merged["Latitude"][0], torento_merged["Longitude"][0]], zoom_start=10)

# set color scheme for the clusters: one rainbow color per cluster label
colors_array = cm.rainbow(np.linspace(0, 1, n_cluster))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to map, colored by cluster label
for lat, lng, neighborhood, cluster_label in zip(torento_merged['Latitude'], torento_merged['Longitude'], torento_merged['Neighbourhood'], torento_merged["cluster_lablel"]):
    label = folium.Popup(f"{neighborhood} (cluster {cluster_label})", parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color=rainbow[cluster_label],
        fill=True,
        fill_color=rainbow[cluster_label],
        fill_opacity=0.7).add_to(map_toronto)
map_toronto

# Cluster 1

In [148]:
torento_merged[torento_merged["cluster_lablel"]== 0]

| | Postal Code | Borough | Neighbourhood | Latitude | Longitude | cluster_lablel | No.1_common_Place | No.2_common_Place | No.3_common_Place | No.4_common_Place | No.5_common_Place | No.6_common_Place | No.7_common_Place | No.8_common_Place | No.9_common_Place | No.10_common_Place |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 18 | M4N | Central Toronto | Lawrence Park | 43.72802 | -79.38879 | 0 | Park | Bus Line | Swim School | Lingerie Store | Lounge | Malay Restaurant | Market | Martial Arts School | Massage Studio | Mediterranean Restaurant |
| 21 | M5P | Central Toronto | Forest Hill North & West, Forest Hill Road Park | 43.696948 | -79.411307 | 0 | Park | Jewelry Store | Trail | Sushi Restaurant | Modern European Restaurant | Molecular Gastronomy Restaurant | Lounge | Malay Restaurant | Market | Martial Arts School |
| 33 | M4W | Downtown Toronto | Rosedale | 43.679563 | -79.377529 | 0 | Park | Playground | Trail | Monument / Landmark | Liquor Store | Lounge | Malay Restaurant | Market | Martial Arts School | Massage Studio |


# Cluster 2

In [149]:
torento_merged[torento_merged["cluster_lablel"]== 1]

| | Postal Code | Borough | Neighbourhood | Latitude | Longitude | cluster_lablel | No.1_common_Place | No.2_common_Place | No.3_common_Place | No.4_common_Place | No.5_common_Place | No.6_common_Place | No.7_common_Place | No.8_common_Place | No.9_common_Place | No.10_common_Place |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 29 | M4T | Central Toronto | Moore Park, Summerhill East | 43.689574 | -79.38316 | 1 | Playground | Trail | Modern European Restaurant | Monument / Landmark | Liquor Store | Lounge | Malay Restaurant | Market | Martial Arts School | Massage Studio |


# Cluster 3

In [150]:
torento_merged[torento_merged["cluster_lablel"]== 2]

| | Postal Code | Borough | Neighbourhood | Latitude | Longitude | cluster_lablel | No.1_common_Place | No.2_common_Place | No.3_common_Place | No.4_common_Place | No.5_common_Place | No.6_common_Place | No.7_common_Place | No.8_common_Place | No.9_common_Place | No.10_common_Place |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | M5A | Downtown Toronto | Regent Park, Harbourfront | 43.65426 | -79.360636 | 2 | Coffee Shop | Park | Pub | Bakery | Theater | Café | Breakfast Spot | Gym / Fitness Center | Event Space | French Restaurant |
| 1 | M7A | Downtown Toronto | Queen's Park, Ontario Provincial Government | 43.662301 | -79.389494 | 2 | Coffee Shop | Yoga Studio | Restaurant | Beer Bar | Fried Chicken Joint | Smoothie Shop | Mexican Restaurant | Café | Sandwich Place | Chinese Restaurant |
| 2 | M5B | Downtown Toronto | Garden District, Ryerson | 43.657162 | -79.378937 | 2 | Clothing Store | Coffee Shop | Café | Japanese Restaurant | Cosmetics Shop | Bubble Tea Shop | Diner | Ramen Restaurant | Italian Restaurant | Middle Eastern Restaurant |
| 3 | M5C | Downtown Toronto | St. James Town | 43.651494 | -79.375418 | 2 | Coffee Shop | Café | Restaurant | Cocktail Bar | Beer Bar | American Restaurant | Gastropub | Department Store | Hotel | Gym |
| 5 | M5E | Downtown Toronto | Berczy Park | 43.644771 | -79.373306 | 2 | Coffee Shop | Seafood Restaurant | Cheese Shop | Beer Bar | Cocktail Bar | Restaurant | Farmers Market | Bakery | Grocery Store | Pharmacy |
| 6 | M5G | Downtown Toronto | Central Bay Street | 43.657952 | -79.387383 | 2 | Coffee Shop | Café | Italian Restaurant | Sandwich Place | Department Store | Salad Place | Japanese Restaurant | Bubble Tea Shop | Thai Restaurant | Burger Joint |
| 7 | M6G | Downtown Toronto | Christie | 43.669542 | -79.422564 | 2 | Grocery Store | Café | Park | Nightclub | Italian Restaurant | Candy Store | Baby Store | Athletics & Sports | Restaurant | Coffee Shop |
| 8 | M5H | Downtown Toronto | Richmond, Adelaide, King | 43.650571 | -79.384568 | 2 | Coffee Shop | Café | Hotel | Gym | Restaurant | Thai Restaurant | Clothing Store | Bar | Breakfast Spot | Bookstore |
| 9 | M6H | West Toronto | Dufferin, Dovercourt Village | 43.669005 | -79.442259 | 2 | Pharmacy | Bakery | Supermarket | Park | Music Venue | Middle Eastern Restaurant | Café | Brewery | Bar | Bank |
| 10 | M5J | Downtown Toronto | Harbourfront East, Union Station, Toronto Islands | 43.640816 | -79.381752 | 2 | Coffee Shop | Aquarium | Café | Hotel | Italian Restaurant | Scenic Lookout | Restaurant | Fried Chicken Joint | Brewery | Baseball Stadium |


# Cluster 4

In [151]:
torento_merged[torento_merged["cluster_lablel"]== 3]

| | Postal Code | Borough | Neighbourhood | Latitude | Longitude | cluster_lablel | No.1_common_Place | No.2_common_Place | No.3_common_Place | No.4_common_Place | No.5_common_Place | No.6_common_Place | No.7_common_Place | No.8_common_Place | No.9_common_Place | No.10_common_Place |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 4 | M4E | East Toronto | The Beaches | 43.676357 | -79.293031 | 3 | Health Food Store | Pub | Trail | Neighborhood | Modern European Restaurant | Men's Store | Miscellaneous Shop | Middle Eastern Restaurant | Mexican Restaurant | Massage Studio |


# Cluster 5

In [152]:
torento_merged[torento_merged["cluster_lablel"]== 4]

| | Postal Code | Borough | Neighbourhood | Latitude | Longitude | cluster_lablel | No.1_common_Place | No.2_common_Place | No.3_common_Place | No.4_common_Place | No.5_common_Place | No.6_common_Place | No.7_common_Place | No.8_common_Place | No.9_common_Place | No.10_common_Place |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 19 | M5N | Central Toronto | Roselawn | 43.711695 | -79.416936 | 4 | Music Venue | Garden | Modern European Restaurant | Monument / Landmark | Liquor Store | Lounge | Malay Restaurant | Market | Martial Arts School | Massage Studio |


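Observations: Most of the neighborhoods (33 of the 39) fall into Cluster 3 (label 2), which consists mainly of downtown business and commercial areas dominated by coffee shops, cafés, restaurants, hotels and bars. The remaining clusters are small and are defined by only a few venues each. Cluster 1 (label 0: Lawrence Park, Forest Hill North & West, Rosedale) is dominated by parks, along with trails, a playground, a bus line and a swim school. Cluster 2 (label 1: Moore Park, Summerhill East) has a playground and a trail. Cluster 4 (label 3: The Beaches) has a health food store, a pub and a trail. Cluster 5 (label 4: Roselawn) has a music venue and a garden. In these small clusters, the entries further down the top-10 lists appear in a near-alphabetical order, which suggests they are zero-frequency ties rather than venues that are actually present.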
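Observations like these can be backed with a quick tabulation of the most common venue in each cluster. This sketch uses the `cluster_lablel` and `No.1_common_Place` values of a few rows copied from the cluster tables above; on the full `torento_merged` frame the same `groupby` line applies unchanged.

```python
import pandas as pd

# A few rows copied from the cluster tables above
rows = pd.DataFrame({
    "cluster_lablel": [0, 0, 0, 1, 2, 2, 2, 3, 4],
    "No.1_common_Place": ["Park", "Park", "Park", "Playground", "Coffee Shop",
                          "Coffee Shop", "Clothing Store", "Health Food Store",
                          "Music Venue"],
})

# How often each top venue occurs within each cluster label
summary = rows.groupby("cluster_lablel")["No.1_common_Place"].value_counts()
print(summary)
```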