## Toronto Neighbourhood
#### Made by: <a href = "https://www.facebook.com/henriquemmenezes">Henrique Chaves</a>
#### <a href = "https://www.linkedin.com/in/henrique-c-a6b0a5121/">My Linkedin</a>

### 1st STEP: Web scraping and data cleaning.

First of all, import the required libraries for web scraping and data cleaning.

In [1]:
import requests
import pandas as pd
from bs4 import BeautifulSoup
import numpy as np

Next, I need to request the Wikipedia page that contains the table of Toronto neighbourhoods using a GET request, and then parse it with BeautifulSoup's HTML parser.

In [2]:
req = requests.get("https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M")
if req.status_code == 200:
    content = req.content
    print("OK.")
else:
    print("Error.")

OK.


In [46]:
soup = BeautifulSoup(content, "html.parser")
table = soup.find(name = "table")
#table

Then, I need to convert this table to a string and use pandas to turn it into a DataFrame.

In [4]:
table_str = str(table)
dftotal = pd.read_html(table_str)
dftotal

[    Postcode           Borough  \
 0        M1A      Not assigned   
 1        M2A      Not assigned   
 2        M3A        North York   
 3        M4A        North York   
 4        M5A  Downtown Toronto   
 5        M5A  Downtown Toronto   
 6        M6A        North York   
 7        M6A        North York   
 8        M7A      Queen's Park   
 9        M8A      Not assigned   
 10       M9A         Etobicoke   
 11       M1B       Scarborough   
 12       M1B       Scarborough   
 13       M2B      Not assigned   
 14       M3B        North York   
 15       M4B         East York   
 16       M4B         East York   
 17       M5B  Downtown Toronto   
 18       M5B  Downtown Toronto   
 19       M6B        North York   
 20       M7B      Not assigned   
 21       M8B      Not assigned   
 22       M9B         Etobicoke   
 23       M9B         Etobicoke   
 24       M9B         Etobicoke   
 25       M9B         Etobicoke   
 26       M9B         Etobicoke   
 27       M1C       

This returned a list of DataFrames, so I will take the first one, which is the table I need.

In [5]:
df = dftotal[0]
df.head()

Unnamed: 0,Postcode,Borough,Neighbourhood
0,M1A,Not assigned,Not assigned
1,M2A,Not assigned,Not assigned
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,Harbourfront


There are a lot of "Not assigned" labels. I will convert every "Not assigned" in the Borough column to NaN, and then drop those rows.

In [6]:
df["Borough"].replace("Not assigned", np.nan, inplace = True)
df.head(10)

Unnamed: 0,Postcode,Borough,Neighbourhood
0,M1A,,Not assigned
1,M2A,,Not assigned
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,Harbourfront
5,M5A,Downtown Toronto,Regent Park
6,M6A,North York,Lawrence Heights
7,M6A,North York,Lawrence Manor
8,M7A,Queen's Park,Not assigned
9,M8A,,Not assigned


In [7]:
df = df.dropna(axis = 0)
df.head()

Unnamed: 0,Postcode,Borough,Neighbourhood
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,Harbourfront
5,M5A,Downtown Toronto,Regent Park
6,M6A,North York,Lawrence Heights


In [8]:
df.reset_index(drop = True, inplace = True)
df.head(10)

Unnamed: 0,Postcode,Borough,Neighbourhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,Harbourfront
3,M5A,Downtown Toronto,Regent Park
4,M6A,North York,Lawrence Heights
5,M6A,North York,Lawrence Manor
6,M7A,Queen's Park,Not assigned
7,M9A,Etobicoke,Islington Avenue
8,M1B,Scarborough,Rouge
9,M1B,Scarborough,Malvern
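
The three cleaning cells above (replace, drop, reset the index) can also be written as one chain. This is a minimal sketch on a hand-typed toy frame copied from a few rows of the table, not a re-run of the scraper; the names `raw` and `clean` exist only for this illustration.

```python
import numpy as np
import pandas as pd

# Toy data copied from a few rows of the scraped table
raw = pd.DataFrame({
    "Postcode": ["M1A", "M2A", "M3A", "M4A", "M7A"],
    "Borough": ["Not assigned", "Not assigned", "North York",
                "North York", "Queen's Park"],
    "Neighbourhood": ["Not assigned", "Not assigned", "Parkwoods",
                      "Victoria Village", "Not assigned"],
})

clean = (
    # mark unassigned boroughs as missing
    raw.assign(Borough=raw["Borough"].replace("Not assigned", np.nan))
       # drop only the rows whose Borough is missing
       .dropna(subset=["Borough"])
       # renumber the remaining rows from 0
       .reset_index(drop=True)
)
print(clean)
```

Passing `subset=["Borough"]` is a design choice of this sketch: it keeps a row even if some other column happens to be missing, whereas `dropna(axis = 0)` drops a row when any column is NaN.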


Now, I'll search for any "Not assigned" remaining in the Neighbourhood column.

In [9]:
df.loc[df["Neighbourhood"] == "Not assigned"]

Unnamed: 0,Postcode,Borough,Neighbourhood
6,M7A,Queen's Park,Not assigned


I found just one "Not assigned" in Neighbourhood, so I'll replace it with the name of the Borough in the same row.

In [10]:
df["Neighbourhood"].replace("Not assigned", "Queen's Park", inplace = True)
df.head(10)

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  self._update_inplace(new_data)


Unnamed: 0,Postcode,Borough,Neighbourhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,Harbourfront
3,M5A,Downtown Toronto,Regent Park
4,M6A,North York,Lawrence Heights
5,M6A,North York,Lawrence Manor
6,M7A,Queen's Park,Queen's Park
7,M9A,Etobicoke,Islington Avenue
8,M1B,Scarborough,Rouge
9,M1B,Scarborough,Malvern
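
pandas emits the SettingWithCopyWarning shown above when a value is set on an object that may be a copy of another frame. One way to sidestep it, sketched here on toy data and assuming an explicit copy is acceptable, is to work on a `.copy()` and assign through a boolean mask with `.loc`. This version also takes the replacement from the row's own Borough instead of hard-coding the name.

```python
import pandas as pd

# Toy rows copied from the table above; .copy() makes ownership explicit
toy = pd.DataFrame({
    "Postcode": ["M6A", "M7A", "M9A"],
    "Borough": ["North York", "Queen's Park", "Etobicoke"],
    "Neighbourhood": ["Lawrence Manor", "Not assigned", "Islington Avenue"],
}).copy()

# Rows whose neighbourhood is still unassigned
mask = toy["Neighbourhood"] == "Not assigned"

# Fill those rows with the Borough of the same row
toy.loc[mask, "Neighbourhood"] = toy.loc[mask, "Borough"]
print(toy)
```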


Now there are no more "Not assigned" values.
<br>
Let's check whether each Postcode is linked to just one neighbourhood or to several.

In [11]:
df_grouped = df.groupby("Postcode").count()
df_grouped

Unnamed: 0_level_0,Borough,Neighbourhood
Postcode,Unnamed: 1_level_1,Unnamed: 2_level_1
M1B,2,2
M1C,3,3
M1E,3,3
M1G,1,1
M1H,1,1
M1J,1,1
M1K,3,3
M1L,3,3
M1M,3,3
M1N,2,2


So a Postcode can be linked to one or more neighbourhoods. <br>
I will collect all the neighbourhoods of each postcode and put them together in a single row.

In [12]:
df_new = pd.DataFrame(columns = ["Postcode", "Borough", "Neighbourhood"])
for index, row in df_grouped.iterrows():
    neigh = ''
    borough = df.loc[df["Postcode"] == index]["Borough"].values[0]
    for i in range(row["Neighbourhood"]):
        if len(neigh) == 0:
            neigh = df.loc[df["Postcode"] == index]["Neighbourhood"].values[i]
        else:
            neigh = neigh + ' , ' + df.loc[df["Postcode"] == index]["Neighbourhood"].values[i]
        
    df_new = df_new.append({"Postcode": index, "Borough": borough, "Neighbourhood": neigh}, ignore_index = True)

df_new

Unnamed: 0,Postcode,Borough,Neighbourhood
0,M1B,Scarborough,"Rouge , Malvern"
1,M1C,Scarborough,"Highland Creek , Rouge Hill , Port Union"
2,M1E,Scarborough,"Guildwood , Morningside , West Hill"
3,M1G,Scarborough,Woburn
4,M1H,Scarborough,Cedarbrae
5,M1J,Scarborough,Scarborough Village
6,M1K,Scarborough,"East Birchmount Park , Ionview , Kennedy Park"
7,M1L,Scarborough,"Clairlea , Golden Mile , Oakridge"
8,M1M,Scarborough,"Cliffcrest , Cliffside , Scarborough Village West"
9,M1N,Scarborough,"Birch Cliff , Cliffside West"
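
The same result can be sketched more compactly with `groupby` and `agg`, which also avoids `DataFrame.append` (removed in pandas 2.0). The toy frame below is copied by hand from rows shown earlier, and `combined` is just an illustrative name; the `" , "` separator matches the one used in the loop above.

```python
import pandas as pd

# Toy rows: one line per (postcode, neighbourhood) pair
toy = pd.DataFrame({
    "Postcode": ["M1B", "M1B", "M1G", "M5A", "M5A"],
    "Borough": ["Scarborough", "Scarborough", "Scarborough",
                "Downtown Toronto", "Downtown Toronto"],
    "Neighbourhood": ["Rouge", "Malvern", "Woburn",
                      "Harbourfront", "Regent Park"],
})

combined = (
    toy.groupby("Postcode", as_index=False)
       # keep the first borough and join all neighbourhood names
       .agg({"Borough": "first", "Neighbourhood": " , ".join})
)
print(combined)
```

Taking the `"first"` borough assumes, as the loop above does, that every row of a postcode carries the same Borough.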


That looks good. Let's check the shape of df_new.

In [13]:
df_new.shape

(103, 3)

Now, I will import the geopy library to get the Latitude and Longitude of each postal code.

In [14]:
from geopy.geocoders import Nominatim

geolocator = Nominatim(user_agent="toronto_cluster")

In [15]:
df_lat_lon = pd.DataFrame(columns = ["Latitude", "Longitude"])
df_lat_lon

Unnamed: 0,Latitude,Longitude


In [16]:
for index, row in df_new.iterrows():
    location = None
    location = geolocator.geocode("{}, Toronto, Ontario".format(row["Postcode"]))
    if location is None:
        latitude = np.nan
        longitude = np.nan
    else:
        latitude = location.latitude
        longitude = location.longitude
    df_lat_lon = df_lat_lon.append({"Latitude": latitude, "Longitude": longitude}, ignore_index=True)
    print("Postcode: {} , Latitude: {}, Longitude: {}".format(row["Postcode"], latitude, longitude))

df_lat_lon

Postcode: M1B , Latitude: 43.653963, Longitude: -79.387207
Postcode: M1C , Latitude: 43.653963, Longitude: -79.387207
Postcode: M1E , Latitude: nan, Longitude: nan
Postcode: M1G , Latitude: 43.6449033, Longitude: -79.3818364
Postcode: M1H , Latitude: nan, Longitude: nan
Postcode: M1J , Latitude: nan, Longitude: nan
Postcode: M1K , Latitude: nan, Longitude: nan
Postcode: M1L , Latitude: nan, Longitude: nan
Postcode: M1M , Latitude: nan, Longitude: nan
Postcode: M1N , Latitude: nan, Longitude: nan
Postcode: M1P , Latitude: nan, Longitude: nan
Postcode: M1R , Latitude: nan, Longitude: nan
Postcode: M1S , Latitude: nan, Longitude: nan
Postcode: M1T , Latitude: nan, Longitude: nan
Postcode: M1V , Latitude: nan, Longitude: nan
Postcode: M1W , Latitude: 43.6449033, Longitude: -79.3818364
Postcode: M1X , Latitude: nan, Longitude: nan
Postcode: M2H , Latitude: nan, Longitude: nan
Postcode: M2J , Latitude: 43.6449033, Longitude: -79.3818364
Postcode: M2K , Latitude: nan, Longitude: nan
Postcode:

Unnamed: 0,Latitude,Longitude
0,43.653963,-79.387207
1,43.653963,-79.387207
2,,
3,43.644903,-79.381836
4,,
5,,
6,,
7,,
8,,
9,,
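
A quick way to quantify how incomplete this result is: count the missing coordinates and check whether the ones that were found are distinct. The sketch below re-types the first ten results printed above into a toy frame (`geo` is an illustrative name), so the counts describe only those ten rows.

```python
import numpy as np
import pandas as pd

nan = np.nan
# First ten geocoding results, copied from the printed output above
geo = pd.DataFrame({
    "Postcode": ["M1B", "M1C", "M1E", "M1G", "M1H",
                 "M1J", "M1K", "M1L", "M1M", "M1N"],
    "Latitude": [43.653963, 43.653963, nan, 43.6449033, nan,
                 nan, nan, nan, nan, nan],
    "Longitude": [-79.387207, -79.387207, nan, -79.3818364, nan,
                  nan, nan, nan, nan, nan],
})

# Postcodes for which the geocoder returned nothing
n_missing = int(geo["Latitude"].isna().sum())

# Among the rows that were found, how many share coordinates with another row
found = geo.dropna(subset=["Latitude", "Longitude"])
n_shared = int(found.duplicated(subset=["Latitude", "Longitude"], keep=False).sum())

print("missing:", n_missing, "of", len(geo))
print("found but sharing coordinates:", n_shared, "of", len(found))
```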


The geocoder could not find a location for many of the postal codes, and several of the ones it did return share the same coordinates, so I'll download a CSV file that contains the Latitude and Longitude of each postal code in Toronto instead.

In [17]:
!wget -O latlongtoronto.csv https://cocl.us/Geospatial_data

--2019-10-15 02:48:04--  https://cocl.us/Geospatial_data
Resolving cocl.us (cocl.us)... 169.48.113.194
Connecting to cocl.us (cocl.us)|169.48.113.194|:443... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: https://ibm.box.com/shared/static/9afzr83pps4pwf2smjjcf1y5mvgb18rr.csv [following]
--2019-10-15 02:48:05--  https://ibm.box.com/shared/static/9afzr83pps4pwf2smjjcf1y5mvgb18rr.csv
Resolving ibm.box.com (ibm.box.com)... 107.152.26.197
Connecting to ibm.box.com (ibm.box.com)|107.152.26.197|:443... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: /public/static/9afzr83pps4pwf2smjjcf1y5mvgb18rr.csv [following]
--2019-10-15 02:48:05--  https://ibm.box.com/public/static/9afzr83pps4pwf2smjjcf1y5mvgb18rr.csv
Reusing existing connection to ibm.box.com:443.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: https://ibm.ent.box.com/public/static/9afzr83pps4pwf2smjjcf1y5mvgb18rr.csv [following]
--2019-10-15 

Then, I will create a DataFrame with this information.

In [18]:
df_lat_lon = pd.read_csv("latlongtoronto.csv")
df_lat_lon.head()

Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


Now, I will join the two DataFrames on the postal code, and then drop one of the postal code columns, since the merged DataFrame has two of them.

In [19]:
df_new = df_new.merge(df_lat_lon, how = "left", left_on = "Postcode", right_on = "Postal Code")
df_new.head()

Unnamed: 0,Postcode,Borough,Neighbourhood,Postal Code,Latitude,Longitude
0,M1B,Scarborough,"Rouge , Malvern",M1B,43.806686,-79.194353
1,M1C,Scarborough,"Highland Creek , Rouge Hill , Port Union",M1C,43.784535,-79.160497
2,M1E,Scarborough,"Guildwood , Morningside , West Hill",M1E,43.763573,-79.188711
3,M1G,Scarborough,Woburn,M1G,43.770992,-79.216917
4,M1H,Scarborough,Cedarbrae,M1H,43.773136,-79.239476


In [20]:
df_new.drop(["Postal Code"], axis = 1, inplace = True)
df_new.head()

Unnamed: 0,Postcode,Borough,Neighbourhood,Latitude,Longitude
0,M1B,Scarborough,"Rouge , Malvern",43.806686,-79.194353
1,M1C,Scarborough,"Highland Creek , Rouge Hill , Port Union",43.784535,-79.160497
2,M1E,Scarborough,"Guildwood , Morningside , West Hill",43.763573,-79.188711
3,M1G,Scarborough,Woburn,43.770992,-79.216917
4,M1H,Scarborough,Cedarbrae,43.773136,-79.239476


In [21]:
df_new.rename(columns = {"Postcode": "PostalCode"}, inplace = True)
df_new.head()

Unnamed: 0,PostalCode,Borough,Neighbourhood,Latitude,Longitude
0,M1B,Scarborough,"Rouge , Malvern",43.806686,-79.194353
1,M1C,Scarborough,"Highland Creek , Rouge Hill , Port Union",43.784535,-79.160497
2,M1E,Scarborough,"Guildwood , Morningside , West Hill",43.763573,-79.188711
3,M1G,Scarborough,Woburn,43.770992,-79.216917
4,M1H,Scarborough,Cedarbrae,43.773136,-79.239476
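
The merge, drop and rename steps above can also be sketched as a single chain. Renaming the key in the coordinates table first means only one postal code column ever exists. The frames `neigh` and `coords` are toy stand-ins typed from the outputs above, and `validate="one_to_one"` is an extra safety check that is not in the original cells.

```python
import pandas as pd

# Toy stand-in for df_new before the merge
neigh = pd.DataFrame({
    "Postcode": ["M1B", "M1C"],
    "Borough": ["Scarborough", "Scarborough"],
    "Neighbourhood": ["Rouge , Malvern",
                      "Highland Creek , Rouge Hill , Port Union"],
})

# Toy stand-in for the downloaded CSV
coords = pd.DataFrame({
    "Postal Code": ["M1B", "M1C", "M1E"],
    "Latitude": [43.806686, 43.784535, 43.763573],
    "Longitude": [-79.194353, -79.160497, -79.188711],
})

merged = (
    neigh.merge(
        coords.rename(columns={"Postal Code": "Postcode"}),  # align key names
        on="Postcode",
        how="left",
        validate="one_to_one",  # raise if a postcode repeats on either side
    )
    .rename(columns={"Postcode": "PostalCode"})
)
print(merged)
```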


I will use the Folium library to work with maps.

In [22]:
!conda install -c conda-forge folium=0.5.0 --yes
import folium

Solving environment: done

# All requested packages already installed.



In [23]:
location = geolocator.geocode("Toronto, Canada")

toronto_map = folium.Map(
                location = [location.latitude, location.longitude], zoom_start = 11)
toronto_map

Let's see how the neighborhoods are located in Toronto.

In [25]:
for index, row in df_new.iterrows():
    label = folium.Popup(row["Borough"], parse_html = True)
    location = [row["Latitude"], row["Longitude"]]
    folium.CircleMarker(
        location,
        radius = 10,
        popup = label,
        color = 'blue',
        fill = True,
        fill_color = '#3186cc',
        fill_opacity = 0.7,
        parse_html = False
        ).add_to(toronto_map)
    
toronto_map

In [49]:
# The code was removed by Watson Studio for sharing.
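
The removed cell presumably defined `CLIENT_ID`, `CLIENT_SECRET` and `VERSION`, since the venue function further down uses those names and nothing else defines them. A hedged sketch of supplying them without writing secrets into the notebook is to read them from environment variables. The variable names `FOURSQUARE_CLIENT_ID`, `FOURSQUARE_CLIENT_SECRET` and `FOURSQUARE_VERSION`, as well as the default date, are assumptions of this example.

```python
import os

# Hypothetical environment variable names; set them before starting the notebook
CLIENT_ID = os.environ.get("FOURSQUARE_CLIENT_ID", "")
CLIENT_SECRET = os.environ.get("FOURSQUARE_CLIENT_SECRET", "")

# The v2 API expects a YYYYMMDD date as its version; this date is a placeholder
VERSION = os.environ.get("FOURSQUARE_VERSION", "20180605")

if not CLIENT_ID or not CLIENT_SECRET:
    print("Foursquare credentials are not set; API calls will fail.")
```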

Let's explore Downtown Toronto.

In [27]:
location = geolocator.geocode("Downtown Toronto, Toronto, Canada")

downtown_toronto_map = folium.Map(
                location = [location.latitude, location.longitude], zoom_start = 13)
downtown_toronto_map

In [28]:
df_toronto = df_new.loc[df_new["Borough"]== "Downtown Toronto"]
df_toronto

Unnamed: 0,PostalCode,Borough,Neighbourhood,Latitude,Longitude
50,M4W,Downtown Toronto,Rosedale,43.679563,-79.377529
51,M4X,Downtown Toronto,"Cabbagetown , St. James Town",43.667967,-79.367675
52,M4Y,Downtown Toronto,Church and Wellesley,43.66586,-79.38316
53,M5A,Downtown Toronto,"Harbourfront , Regent Park",43.65426,-79.360636
54,M5B,Downtown Toronto,"Ryerson , Garden District",43.657162,-79.378937
55,M5C,Downtown Toronto,St. James Town,43.651494,-79.375418
56,M5E,Downtown Toronto,Berczy Park,43.644771,-79.373306
57,M5G,Downtown Toronto,Central Bay Street,43.657952,-79.387383
58,M5H,Downtown Toronto,"Adelaide , King , Richmond",43.650571,-79.384568
59,M5J,Downtown Toronto,"Harbourfront East , Toronto Islands , Union St...",43.640816,-79.381752


And see where the neighborhoods are located.

In [29]:
for index, row in df_toronto.iterrows():
    label = folium.Popup("Downtown Toronto", parse_html = True)
    location = [row["Latitude"], row["Longitude"]]
    folium.CircleMarker(
        location,
        radius = 10,
        popup = label,
        color = 'red',
        fill = True,
        fill_color = '#d97386',
        fill_opacity = 0.7,
        parse_html = False
        ).add_to(downtown_toronto_map)
    
downtown_toronto_map

Now, let's import the libraries to work with JSON.

In [30]:
import json
from pandas.io.json import json_normalize

And create a function to get the venues near each neighborhood.

In [32]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):

    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
        LIMIT = 100
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
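
To see what the list comprehension in the function does without calling the API, the sketch below runs the same extraction over a made-up list of items. The payload only mimics the keys the function reads and is not real Foursquare output; `items_to_rows` is a hypothetical helper. It also guards against an empty `categories` list, which would raise an `IndexError` in `v['venue']['categories'][0]` if a venue ever came back without a category.

```python
import pandas as pd

# Made-up items that mimic only the keys read by the function above
fake_items = [
    {"venue": {"name": "Example Cafe",
               "location": {"lat": 43.6501, "lng": -79.3801},
               "categories": [{"name": "Café"}]}},
    {"venue": {"name": "Example Park",
               "location": {"lat": 43.6512, "lng": -79.3790},
               "categories": []}},  # a venue with no category
]

def items_to_rows(name, lat, lng, items):
    """Flatten venue items into one tuple per venue."""
    rows = []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories") or []
        # Fall back to None when the venue has no category
        category = categories[0]["name"] if categories else None
        rows.append((name, lat, lng, venue["name"],
                     venue["location"]["lat"], venue["location"]["lng"],
                     category))
    return rows

columns = ["Neighborhood", "Neighborhood Latitude", "Neighborhood Longitude",
           "Venue", "Venue Latitude", "Venue Longitude", "Venue Category"]
toy_venues = pd.DataFrame(
    items_to_rows("Toy Neighbourhood", 43.65, -79.38, fake_items),
    columns=columns,
)
print(toy_venues)
```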

And apply this function to our Downtown Toronto dataset.

In [33]:
downtown_venues = getNearbyVenues(names = df_toronto["Neighbourhood"],
                                     latitudes = df_toronto["Latitude"],
                                     longitudes = df_toronto["Longitude"])

Rosedale
Cabbagetown , St. James Town
Church and Wellesley
Harbourfront , Regent Park
Ryerson , Garden District
St. James Town
Berczy Park
Central Bay Street
Adelaide , King , Richmond
Harbourfront East , Toronto Islands , Union Station
Design Exchange , Toronto Dominion Centre
Commerce Court , Victoria Hotel
Harbord , University of Toronto
Chinatown , Grange Park , Kensington Market
CN Tower , Bathurst Quay , Island airport , Harbourfront West , King and Spadina , Railway Lands , South Niagara
Stn A PO Boxes 25 The Esplanade
First Canadian Place , Underground city
Christie


In [34]:
downtown_venues.head(10)

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Rosedale,43.679563,-79.377529,Mooredale House,43.678631,-79.380091,Building
1,Rosedale,43.679563,-79.377529,Rosedale Park,43.682328,-79.378934,Playground
2,Rosedale,43.679563,-79.377529,Whitney Park,43.682036,-79.373788,Park
3,Rosedale,43.679563,-79.377529,Alex Murray Parkette,43.6783,-79.382773,Park
4,Rosedale,43.679563,-79.377529,Milkman's Lane,43.676352,-79.373842,Trail
5,"Cabbagetown , St. James Town",43.667967,-79.367675,Butter Chicken Factory,43.667072,-79.369184,Indian Restaurant
6,"Cabbagetown , St. James Town",43.667967,-79.367675,Cranberries,43.667843,-79.369407,Diner
7,"Cabbagetown , St. James Town",43.667967,-79.367675,F'Amelia,43.667536,-79.368613,Italian Restaurant
8,"Cabbagetown , St. James Town",43.667967,-79.367675,Kingyo Toronto,43.665895,-79.368415,Japanese Restaurant
9,"Cabbagetown , St. James Town",43.667967,-79.367675,Murgatroid,43.667381,-79.369311,Restaurant


Let's see how many venues were found for each neighborhood.

In [35]:
downtown_venues.groupby("Neighborhood").count()

Unnamed: 0_level_0,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
"Adelaide , King , Richmond",100,100,100,100,100,100
Berczy Park,57,57,57,57,57,57
"CN Tower , Bathurst Quay , Island airport , Harbourfront West , King and Spadina , Railway Lands , South Niagara",17,17,17,17,17,17
"Cabbagetown , St. James Town",44,44,44,44,44,44
Central Bay Street,88,88,88,88,88,88
"Chinatown , Grange Park , Kensington Market",100,100,100,100,100,100
Christie,16,16,16,16,16,16
Church and Wellesley,87,87,87,87,87,87
"Commerce Court , Victoria Hotel",100,100,100,100,100,100
"Design Exchange , Toronto Dominion Centre",100,100,100,100,100,100


Then, I can apply one-hot encoding to prepare our DataFrame for clustering.

In [36]:
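# one hot encoding
downtown_onehot = pd.get_dummies(downtown_venues[['Venue Category']], prefix="", prefix_sep="")
# one of the venue categories is itself named "Neighborhood"; drop that dummy column
# so it does not clash with the neighborhood column added below
downtown_onehot = downtown_onehot.drop("Neighborhood", axis = 1)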

# add neighborhood column back to dataframe
downtown_onehot = pd.concat((downtown_venues["Neighborhood"], downtown_onehot), axis = 1)


downtown_onehot.head()

Unnamed: 0,Neighborhood,Afghan Restaurant,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Antique Shop,...,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Wine Bar,Wings Joint,Women's Store,Yoga Studio
0,Rosedale,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,Rosedale,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,Rosedale,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,Rosedale,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,Rosedale,0,0,0,0,0,0,0,0,0,...,1,0,0,0,0,0,0,0,0,0


I can group by Neighborhood and see the mean for each feature.

In [37]:
downtown_grouped = downtown_onehot.groupby("Neighborhood").mean().reset_index()
downtown_grouped.head()

Unnamed: 0,Neighborhood,Afghan Restaurant,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Antique Shop,...,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Wine Bar,Wings Joint,Women's Store,Yoga Studio
0,"Adelaide , King , Richmond",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,...,0.0,0.0,0.02,0.0,0.0,0.0,0.01,0.0,0.01,0.0
1,Berczy Park,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.017544,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,"CN Tower , Bathurst Quay , Island airport , Ha...",0.0,0.058824,0.058824,0.058824,0.117647,0.176471,0.117647,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,"Cabbagetown , St. James Town",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.022727,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Central Bay Street,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.011364,0.0,...,0.0,0.0,0.011364,0.0,0.011364,0.0,0.011364,0.0,0.0,0.011364
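
Because every venue has exactly one 1 in its one-hot row, the mean of each column within a neighborhood is simply the share of that neighborhood's venues in that category. In the table above, for example, 0.017544 for Berczy Park is 1 venue out of its 57. A minimal sketch on made-up data (neighborhoods "A" and "B" are placeholders):

```python
import pandas as pd

# Made-up venues: four in neighborhood A, two in neighborhood B
toy_venues = pd.DataFrame({
    "Neighborhood": ["A", "A", "A", "A", "B", "B"],
    "Venue Category": ["Coffee Shop", "Coffee Shop", "Park", "Bar",
                       "Park", "Park"],
})

# One column per category, one row per venue
onehot = pd.get_dummies(toy_venues["Venue Category"], dtype=int)
onehot.insert(0, "Neighborhood", toy_venues["Neighborhood"])

# Column means per neighborhood are category frequencies
freq = onehot.groupby("Neighborhood").mean()
print(freq)
```

Each row of `freq` sums to 1, which is a handy sanity check on the encoding.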


Also, I would like to know the top 5 venues for each Neighborhood.

In [38]:
for i in np.arange(downtown_grouped.shape[0]):
    print("----- {} -----".format(downtown_grouped.loc[i, "Neighborhood"]))
    print(downtown_grouped.T.iloc[1:,i].sort_values(ascending = False).head())
    print("\n")
    

----- Adelaide , King , Richmond -----
Coffee Shop            0.07
Café                   0.05
Bar                    0.04
Thai Restaurant        0.04
American Restaurant    0.04
Name: 0, dtype: object


----- Berczy Park -----
Coffee Shop           0.0701754
Cocktail Bar          0.0526316
Café                  0.0350877
Beer Bar              0.0350877
Seafood Restaurant    0.0350877
Name: 1, dtype: object


----- CN Tower , Bathurst Quay , Island airport , Harbourfront West , King and Spadina , Railway Lands , South Niagara -----
Airport Service      0.176471
Airport Lounge       0.117647
Airport Terminal     0.117647
Plane               0.0588235
Bar                 0.0588235
Name: 2, dtype: object


----- Cabbagetown , St. James Town -----
Coffee Shop           0.0681818
Bakery                0.0454545
Pub                   0.0454545
Café                  0.0454545
Italian Restaurant    0.0454545
Name: 3, dtype: object


----- Central Bay Street -----
Coffee Shop            0.14772

It's possible to create a DataFrame of these most common venues.

In [39]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [40]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = downtown_grouped['Neighborhood']

for ind in np.arange(downtown_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(downtown_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,"Adelaide , King , Richmond",Coffee Shop,Café,Bar,Thai Restaurant,American Restaurant,Steakhouse,Sushi Restaurant,Asian Restaurant,Restaurant,Burger Joint
1,Berczy Park,Coffee Shop,Cocktail Bar,Café,Beer Bar,Seafood Restaurant,Bakery,Steakhouse,Cheese Shop,Farmers Market,Jazz Club
2,"CN Tower , Bathurst Quay , Island airport , Ha...",Airport Service,Airport Lounge,Airport Terminal,Plane,Bar,Coffee Shop,Sculpture Garden,Boutique,Boat or Ferry,Harbor / Marina
3,"Cabbagetown , St. James Town",Coffee Shop,Bakery,Pub,Café,Italian Restaurant,Pizza Place,Park,Restaurant,Grocery Store,American Restaurant
4,Central Bay Street,Coffee Shop,Italian Restaurant,Café,Sandwich Place,Burger Joint,Ice Cream Shop,Middle Eastern Restaurant,Bubble Tea Shop,Bar,Spa
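
An alternative way to pick the top categories per row, sketched on a made-up frequency table, is `Series.nlargest`, which sorts and truncates in one call. The values are chosen without ties so the order is unambiguous; `freq` and `top` are illustrative names.

```python
import pandas as pd

# Made-up category frequencies for two placeholder neighborhoods
freq = pd.DataFrame(
    {"Bar": [0.25, 0.0], "Coffee Shop": [0.5, 0.1],
     "Park": [0.15, 0.9], "Pub": [0.10, 0.0]},
    index=pd.Index(["A", "B"], name="Neighborhood"),
)

n = 2
# For each row, keep the names of the n most frequent categories
top = freq.apply(lambda row: pd.Series(row.nlargest(n).index.tolist()), axis=1)
top.columns = ["Top {}".format(i + 1) for i in range(n)]
print(top)
```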


So, I can see that there are a lot of Coffee Shops, and there is also a Neighbourhood dominated by Airport venues.
<br>
Let's cluster the neighbourhoods using KMeans and see how the clusters are distributed on the map.

In [41]:
from sklearn.cluster import KMeans

kclusters = 3

kmeans = KMeans(n_clusters= kclusters, random_state = 29121999)
kmeans.fit(downtown_grouped.drop("Neighborhood", axis = 1))

KMeans(algorithm='auto', copy_x=True, init='k-means++', max_iter=300,
    n_clusters=3, n_init=10, n_jobs=None, precompute_distances='auto',
    random_state=29121999, tol=0.0001, verbose=0)
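
To make the clustering step less of a black box, here is a bare-bones NumPy sketch of the assign-and-update loop (Lloyd's algorithm) that k-means is built on. It is only an illustration on made-up frequency vectors with hand-picked starting centroids, not scikit-learn's implementation, which among other things uses k-means++ initialisation and several restarts (`init='k-means++'` and `n_init=10` in the output above). The function name `lloyd_kmeans` is hypothetical.

```python
import numpy as np

def lloyd_kmeans(X, init_centroids, n_iter=10):
    """Alternate between assigning points and recomputing centroids."""
    centroids = np.asarray(init_centroids, dtype=float).copy()
    for _ in range(n_iter):
        # Distance from every point to every centroid, shape (n_points, k)
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        # Assign each point to its nearest centroid
        labels = dists.argmin(axis=1)
        # Move each centroid to the mean of its members
        for k in range(len(centroids)):
            members = X[labels == k]
            if len(members):
                centroids[k] = members.mean(axis=0)
    return labels, centroids

# Made-up frequency vectors: two coffee-heavy rows, one airport-like, one park-like
X = np.array([
    [0.07, 0.00, 0.00],
    [0.08, 0.01, 0.00],
    [0.00, 0.18, 0.00],
    [0.00, 0.00, 0.50],
])

# Hand-picked starting centroids (rows 0, 2 and 3) keep the run deterministic
labels, centroids = lloyd_kmeans(X, init_centroids=X[[0, 2, 3]])
print(labels)
```

The two similar rows end up with the same label, while the other two each form their own cluster, which mirrors how a single neighbourhood with an unusual venue mix can become a cluster of its own.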

In [42]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

downtown_merged = df_toronto

# join the cluster labels and top venues onto df_toronto, which holds latitude/longitude for each neighbourhood
downtown_merged = downtown_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighbourhood')

downtown_merged.head() 

Unnamed: 0,PostalCode,Borough,Neighbourhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
50,M4W,Downtown Toronto,Rosedale,43.679563,-79.377529,1,Park,Playground,Trail,Building,Yoga Studio,Dim Sum Restaurant,Event Space,Ethiopian Restaurant,Electronics Store,Dumpling Restaurant
51,M4X,Downtown Toronto,"Cabbagetown , St. James Town",43.667967,-79.367675,0,Coffee Shop,Bakery,Pub,Café,Italian Restaurant,Pizza Place,Park,Restaurant,Grocery Store,American Restaurant
52,M4Y,Downtown Toronto,Church and Wellesley,43.66586,-79.38316,0,Coffee Shop,Japanese Restaurant,Sushi Restaurant,Restaurant,Gay Bar,Gastropub,Men's Store,Mediterranean Restaurant,Hotel,Gym
53,M5A,Downtown Toronto,"Harbourfront , Regent Park",43.65426,-79.360636,0,Coffee Shop,Park,Pub,Bakery,Café,Breakfast Spot,Restaurant,Mexican Restaurant,Theater,Cosmetics Shop
54,M5B,Downtown Toronto,"Ryerson , Garden District",43.657162,-79.378937,0,Coffee Shop,Clothing Store,Cosmetics Shop,Middle Eastern Restaurant,Fast Food Restaurant,Italian Restaurant,Café,Bookstore,Restaurant,Theater


In [43]:
# create map
location = geolocator.geocode("Downtown Toronto, Toronto, Canada")

map_clusters = folium.Map(
                location = [location.latitude, location.longitude], zoom_start = 12)

# set colors clusters
colors = ['#fc74a4', '#fcc374', '#7491fc']

# add markers to the map
for lat, lon, poi, cluster in zip(downtown_merged['Latitude'], downtown_merged['Longitude'], downtown_merged['Neighbourhood'], downtown_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=8,
        popup=label,
        color=colors[cluster],
        fill=True,
        fill_color=colors[cluster],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

As I was expecting, a lot of neighbourhoods fall into a cluster based on coffee shops and restaurants (pink). Another cluster is based on the airport (blue), and there is one more that I can't identify yet (yellow). Let's explore it.

In [44]:
downtown_merged.loc[downtown_merged["Cluster Labels"] == 1]

Unnamed: 0,PostalCode,Borough,Neighbourhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
50,M4W,Downtown Toronto,Rosedale,43.679563,-79.377529,1,Park,Playground,Trail,Building,Yoga Studio,Dim Sum Restaurant,Event Space,Ethiopian Restaurant,Electronics Store,Dumpling Restaurant


And now we know that the yellow cluster is based on parks and playgrounds.

### That's all folks!
