# Toronto Neighborhood Project Assignment

### Introduction

In this notebook, we retrieve data on Toronto neighborhoods from a Wikipedia page, use the Foursquare API to retrieve venue information for the neighborhoods of the Downtown Toronto borough, and then explore and cluster the data. To do all this, we use Geocoder to get the co-ordinates of the neighborhoods, the Foursquare API to get information about the neighborhoods, and K-means to cluster the data returned by Foursquare. The assignment is divided into three sections: data retrieval, getting geographical co-ordinates, and exploring and clustering. They are marked as Part A, Part B and Part C.

## Part A : Retrieving Data of Toronto Neighborhoods from Wikipedia Page

### Importing all Required Libraries

In [108]:
import urllib.request #library used to open URL
from bs4 import BeautifulSoup #library used to parse HTML
import pandas as pd
import numpy as np
print('All Libraries imported')

All Libraries imported


In [109]:
# Specify the URL of the page to scrape
url = 'https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'

In [110]:
# Open the URL using urllib.request and save the response in the variable page
page = urllib.request.urlopen(url)

In [111]:
#parse HTML from our URL into BeautifulSoup parse tree format
soup = BeautifulSoup(page, "lxml")

In [112]:
#Using find_all function to get back all the tables in the HTML
all_tables = soup.find_all("table")
all_tables

[<table class="wikitable sortable">
 <tbody><tr>
 <th>Postcode</th>
 <th>Borough</th>
 <th>Neighbourhood
 </th></tr>
 <tr>
 <td>M1A</td>
 <td>Not assigned</td>
 <td>Not assigned
 </td></tr>
 <tr>
 <td>M2A</td>
 <td>Not assigned</td>
 <td>Not assigned
 </td></tr>
 <tr>
 <td>M3A</td>
 <td><a href="/wiki/North_York" title="North York">North York</a></td>
 <td><a href="/wiki/Parkwoods" title="Parkwoods">Parkwoods</a>
 </td></tr>
 <tr>
 <td>M4A</td>
 <td><a href="/wiki/North_York" title="North York">North York</a></td>
 <td><a href="/wiki/Victoria_Village" title="Victoria Village">Victoria Village</a>
 </td></tr>
 <tr>
 <td>M5A</td>
 <td><a href="/wiki/Downtown_Toronto" title="Downtown Toronto">Downtown Toronto</a></td>
 <td><a href="/wiki/Regent_Park" title="Regent Park">Harbourfront</a>
 </td></tr>
 <tr>
 <td>M6A</td>
 <td><a href="/wiki/North_York" title="North York">North York</a></td>
 <td><a href="/wiki/Lawrence_Heights" title="Lawrence Heights">Lawrence Heights</a>
 </td></tr>
 <tr>


In [113]:
# Extracting the Postal Code table from the tables above
right_table = soup.find('table', class_='wikitable sortable')
right_table

<table class="wikitable sortable">
<tbody><tr>
<th>Postcode</th>
<th>Borough</th>
<th>Neighbourhood
</th></tr>
<tr>
<td>M1A</td>
<td>Not assigned</td>
<td>Not assigned
</td></tr>
<tr>
<td>M2A</td>
<td>Not assigned</td>
<td>Not assigned
</td></tr>
<tr>
<td>M3A</td>
<td><a href="/wiki/North_York" title="North York">North York</a></td>
<td><a href="/wiki/Parkwoods" title="Parkwoods">Parkwoods</a>
</td></tr>
<tr>
<td>M4A</td>
<td><a href="/wiki/North_York" title="North York">North York</a></td>
<td><a href="/wiki/Victoria_Village" title="Victoria Village">Victoria Village</a>
</td></tr>
<tr>
<td>M5A</td>
<td><a href="/wiki/Downtown_Toronto" title="Downtown Toronto">Downtown Toronto</a></td>
<td><a href="/wiki/Regent_Park" title="Regent Park">Harbourfront</a>
</td></tr>
<tr>
<td>M6A</td>
<td><a href="/wiki/North_York" title="North York">North York</a></td>
<td><a href="/wiki/Lawrence_Heights" title="Lawrence Heights">Lawrence Heights</a>
</td></tr>
<tr>
<td>M6A</td>
<td><a href="/wiki/North

In [114]:
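# Extracting data from the table above into one list per column
P = []  # postal codes
B = []  # boroughs
N = []  # neighborhoods
for row in right_table.find_all('tr'):
    cells = row.find_all('td')
    # The header row uses <th>, so only data rows have exactly three <td> cells
    if len(cells) == 3:
        # string=True returns the first text node of each cell
        # (the older text=True spelling is deprecated in recent BeautifulSoup releases)
        P.append(cells[0].find(string=True))
        B.append(cells[1].find(string=True))
        N.append(cells[2].find(string=True))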

In [115]:
# Creating a Pandas Dataframe using the Lists above
df = pd.DataFrame(P, columns = ['PostalCode'])
df['Borough'] = B
df['Neighborhood']=N
df

Unnamed: 0,PostalCode,Borough,Neighborhood
0,M1A,Not assigned,Not assigned
1,M2A,Not assigned,Not assigned
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,Harbourfront
5,M6A,North York,Lawrence Heights
6,M6A,North York,Lawrence Manor
7,M7A,Queen's Park,Not assigned
8,M8A,Not assigned,Not assigned
9,M9A,Queen's Park,Queen's Park


### Cleaning  and pre-processing the Data

In [116]:
# Dropping rows with Borough = 'Not assigned'
df = df[df.Borough!= 'Not assigned']

# Resetting the index
df.index = np.arange(0, len(df))

# Merging rows with the same PostalCode into one row with comma-separated neighborhoods
df = df.groupby(['PostalCode', 'Borough'])['Neighborhood'].apply(list)
# sample(frac=1) shuffles the rows, so the row order differs between runs
df = df.sample(frac=1).reset_index()
df['Neighborhood']= df['Neighborhood'].str.join(',')

# Removing unwanted '\n' characters at the end of Neighborhood values, which come from the scraped table cells
df['Neighborhood']= df['Neighborhood'].str.replace('\n', '') 

# Assigning the value of Borough to Neighborhood if Neighborhood is 'Not assigned'
df.loc[df['Neighborhood']=='Not assigned', 'Neighborhood'] = df.loc[df['Neighborhood']=='Not assigned', 'Borough']

df


Unnamed: 0,PostalCode,Borough,Neighborhood
0,M5T,Downtown Toronto,"Chinatown,Grange Park,Kensington Market"
1,M1E,Scarborough,"Guildwood,Morningside,West Hill"
2,M4P,Central Toronto,Davisville North
3,M2K,North York,Bayview Village
4,M1K,Scarborough,"East Birchmount Park,Ionview,Kennedy Park"
5,M8Y,Etobicoke,"Humber Bay,King's Mill Park,Kingsway Park Sout..."
6,M1G,Scarborough,Woburn
7,M3N,North York,Downsview Northwest
8,M4K,East Toronto,"The Danforth West,Riverdale"
9,M4W,Downtown Toronto,Rosedale
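
The merge step above can be illustrated on a tiny hand-made frame. This is a minimal sketch rather than the notebook's exact code: the three rows in `toy` are made up for illustration (modeled on the M6A and M7A rows shown earlier), and it uses `agg(','.join)` in place of `apply(list)` followed by `str.join`.

```python
import pandas as pd

# Toy rows (illustrative only), modeled on the scraped table: note the trailing '\n'
toy = pd.DataFrame({
    'PostalCode': ['M6A', 'M6A', 'M7A'],
    'Borough': ['North York', 'North York', "Queen's Park"],
    'Neighborhood': ['Lawrence Heights\n', 'Lawrence Manor\n', 'Not assigned\n'],
})

# Strip whitespace first so the joined string needs no later cleanup
toy['Neighborhood'] = toy['Neighborhood'].str.strip()

# One row per (PostalCode, Borough), neighborhoods joined with commas
merged = (toy.groupby(['PostalCode', 'Borough'])['Neighborhood']
             .agg(','.join)
             .reset_index())

# Fall back to the borough name where no neighborhood is assigned
mask = merged['Neighborhood'] == 'Not assigned'
merged.loc[mask, 'Neighborhood'] = merged.loc[mask, 'Borough']

print(merged)
```

Stripping the newline before joining avoids the separate `str.replace` pass, and leaving out `sample(frac=1)` keeps the row order stable between runs.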


In [117]:
df.shape

(103, 3)

## Part B : Get Longitude and Latitude Values of Postal Codes using geocoder

In [118]:
# Adding empty columns Latitude and Longitude to dataframe for storing longitude and Latitude values
df['Longitude'] = ""
df['Latitude'] = ""
df.head()

Unnamed: 0,PostalCode,Borough,Neighborhood,Longitude,Latitude
0,M5T,Downtown Toronto,"Chinatown,Grange Park,Kensington Market",,
1,M1E,Scarborough,"Guildwood,Morningside,West Hill",,
2,M4P,Central Toronto,Davisville North,,
3,M2K,North York,Bayview Village,,
4,M1K,Scarborough,"East Birchmount Park,Ionview,Kennedy Park",,


In [119]:
# Getting Latitude and Longitude values for all postal codes using Geocoder
import geocoder
lat_lng_coords = None
# Looping through the dataframe for all postal codes
for postalcode in df['PostalCode']:
    g = geocoder.arcgis('{} Toronto, Ontario'.format(postalcode))
    lat_lng_coords = g.latlng
    # Locating the row where "PostalCode" == postalcode
    i = df.loc[df['PostalCode']== postalcode].index
    # Assigning the co-ordinates returned by geocoder to the Latitude and Longitude columns of row i
    df.loc[i,'Latitude'] = lat_lng_coords[0] 
    df.loc[i,'Longitude'] = lat_lng_coords[1]
df

Unnamed: 0,PostalCode,Borough,Neighborhood,Longitude,Latitude
0,M5T,Downtown Toronto,"Chinatown,Grange Park,Kensington Market",-79.3972,43.6535
1,M1E,Scarborough,"Guildwood,Morningside,West Hill",-79.1752,43.7658
2,M4P,Central Toronto,Davisville North,-79.3885,43.7128
3,M2K,North York,Bayview Village,-79.3805,43.781
4,M1K,Scarborough,"East Birchmount Park,Ionview,Kennedy Park",-79.2636,43.7263
5,M8Y,Etobicoke,"Humber Bay,King's Mill Park,Kingsway Park Sout...",-79.4896,43.6328
6,M1G,Scarborough,Woburn,-79.2176,43.7684
7,M3N,North York,Downsview Northwest,-79.5196,43.7554
8,M4K,East Toronto,"The Danforth West,Riverdale",-79.3551,43.6832
9,M4W,Downtown Toronto,Rosedale,-79.3779,43.6822
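
`g.latlng` can come back as `None` when a lookup fails, in which case indexing `lat_lng_coords[0]` raises a `TypeError`. A minimal sketch of a retry wrapper follows. `geocode_with_retry` and `fake_lookup` are hypothetical names introduced here, and the stub stands in for a call such as `geocoder.arcgis(...).latlng` so that the example runs without network access.

```python
import time

def geocode_with_retry(lookup, query, max_attempts=3, delay=0.0):
    """Call lookup(query) until it returns a [lat, lng] pair or attempts run out.

    lookup is any callable returning [lat, lng] or None (hypothetical interface).
    """
    for attempt in range(max_attempts):
        coords = lookup(query)
        if coords is not None:
            return coords
        time.sleep(delay)  # brief pause before the next attempt
    return None  # the caller decides how to handle a permanent failure


# Stub that fails twice and then succeeds, standing in for a real geocoder call
calls = {'n': 0}

def fake_lookup(query):
    calls['n'] += 1
    return None if calls['n'] < 3 else [43.6535, -79.3972]

coords = geocode_with_retry(fake_lookup, 'M5T Toronto, Ontario')
print(coords)
```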


## Part C : Exploring and Clustering Neighborhoods in Toronto

Importing all required libraries and packages

In [120]:
from geopy.geocoders import Nominatim 
import folium
import requests
import json 
from pandas import json_normalize  # pandas.io.json.json_normalize is deprecated since pandas 1.0
from sklearn.cluster import KMeans
import matplotlib.cm as cm
import matplotlib.colors as colors
print('All Libraries and Packages imported')

All Libraries and Packages imported


#### Using geopy to retrieve Latitude and Longitude of Toronto

In [121]:

address = 'Toronto, Ontario'
geolocator = Nominatim(user_agent = "toronto_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical co-ordinates of Toronto are: {}, {}'.format(latitude, longitude))

The geographical co-ordinates of Toronto are: 43.653963, -79.387207


#### Create a map of Toronto with neighborhoods superimposed on top

In [122]:
# Create map of Toronto using the Latitude and Longitude values from above
map_toronto = folium.Map(location = [latitude, longitude], zoom_start = 10)

# Add markers to the map for every neighborhood in the df dataframe
for lat, lng, borough, neighborhood in zip(df['Latitude'], df['Longitude'], df['Borough'], df['Neighborhood']):
    label = '{},{}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html = True)
    folium.CircleMarker(
        [lat, lng],
        radius = 5,
        popup = label,
        color = 'blue',
        fill = True,
        fill_color = '#3186cc',
        fill_opacity = 0.7,
        parse_html = False).add_to(map_toronto)
map_toronto

### Exploring the Borough Downtown Toronto 

#### Create new Dataframe with neighborhoods in Downtown Toronto

In [123]:
# Creating Dataframe with the rows whose Borough is 'Downtown Toronto'

df_toronto = df[df['Borough'] == 'Downtown Toronto'].reset_index(drop = True)
df_toronto.head()

Unnamed: 0,PostalCode,Borough,Neighborhood,Longitude,Latitude
0,M5T,Downtown Toronto,"Chinatown,Grange Park,Kensington Market",-79.3972,43.6535
1,M4W,Downtown Toronto,Rosedale,-79.3779,43.6822
2,M5C,Downtown Toronto,St. James Town,-79.3755,43.6512
3,M5K,Downtown Toronto,"Design Exchange,Toronto Dominion Centre",-79.3815,43.6471
4,M4X,Downtown Toronto,"Cabbagetown,St. James Town",-79.3666,43.6682


#### Getting the Co-ordinates of Downtown Toronto

In [124]:
address = 'Downtown Toronto, Ontario'
geolocator = Nominatim(user_agent = "toronto_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical co-ordinates of Downtown Toronto are: {}, {}'.format(latitude, longitude))

The geographical co-ordinates of Downtown Toronto are: 43.655115, -79.380219


#### Create Map of Downtown Toronto with Neighborhood markers superimposed on top

In [125]:
# Create map of Downtown Toronto using the above latitude and longitude values
map_downtown_toronto = folium.Map(location = [latitude, longitude], zoom_start = 11)

# Add markers showing neighborhoods in Downtown Toronto
for lat, lng, label in zip(df_toronto['Latitude'], df_toronto['Longitude'], df_toronto['Neighborhood']):
    label = folium.Popup(label, parse_html = True)
    folium.CircleMarker(
        [lat, lng],
        radius = 5,
        popup = label,
        color = 'blue',
        fill = True,
        fill_color = '#3186cc',
        fill_opacity = 0.7,
        parse_html = False).add_to(map_downtown_toronto)
map_downtown_toronto

##### Using Foursquare API to explore neighborhoods in Downtown Toronto

### Defining Foursquare API Credentials

In [126]:
CLIENT_ID = 'UM1K52E4KEGGMPMZIDTLRDE0FEWWNBDSA3R0F0KN3RLAXS3U'
CLIENT_SECRET = 'KNVM15Q0PDOZBUUWBQ51DGJEUJZQYEXHCP0OAC3JDUB0LUB0'
VERSION = '20180605'

print('Foursquare Credentials')
print('Client_ID:',CLIENT_ID)
print('Client_Secret:', CLIENT_SECRET)

Foursquare Credentials
Client_ID: UM1K52E4KEGGMPMZIDTLRDE0FEWWNBDSA3R0F0KN3RLAXS3U
Client_Secret: KNVM15Q0PDOZBUUWBQ51DGJEUJZQYEXHCP0OAC3JDUB0LUB0
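
Hard-coding and printing API credentials makes them easy to leak when a notebook is shared. A minimal sketch of reading them from environment variables instead is shown here. The variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and the helpers `load_credentials` and `mask` are assumptions made for this example, not names that Foursquare defines.

```python
import os

def load_credentials(env=os.environ):
    """Read the client id and secret from environment variables (names are hypothetical)."""
    try:
        return env['FOURSQUARE_CLIENT_ID'], env['FOURSQUARE_CLIENT_SECRET']
    except KeyError as missing:
        raise RuntimeError('Missing environment variable: {}'.format(missing)) from None


def mask(secret, visible=4):
    """Show only the first few characters when printing a secret."""
    return secret[:visible] + '*' * (len(secret) - visible)


# Demo with a plain dict standing in for os.environ; the values are placeholders
demo_env = {'FOURSQUARE_CLIENT_ID': 'EXAMPLEID1234',
            'FOURSQUARE_CLIENT_SECRET': 'EXAMPLESECRET'}
client_id, client_secret = load_credentials(demo_env)
print('Client ID:', mask(client_id))
```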


#### Exploring First Neighborhood in Dataframe

In [127]:
df_toronto.loc[0,'Neighborhood']

'Chinatown,Grange Park,Kensington Market'

Getting Location of the first neighborhood of the Downtown Toronto Dataframe

In [128]:
neighborhood_latitude = df_toronto.loc[0, 'Latitude']
neighborhood_longitude = df_toronto.loc[0, 'Longitude']
neighborhood_name = df_toronto.loc[0, 'Neighborhood']

print('Latitude and Longitude values of {} are {} and {}.'.format(neighborhood_name, neighborhood_latitude, neighborhood_longitude))

Latitude and Longitude values of Chinatown,Grange Park,Kensington Market are 43.653530000000046 and -79.39723268299997.


#### Getting the top 100 venues in Chinatown, Grange Park and Kensington Market within a 500m radius

Creating the URL for the request to the Foursquare API

In [129]:
LIMIT = 100
radius = 500
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID,
    CLIENT_SECRET,
    VERSION,
    neighborhood_latitude,
    neighborhood_longitude,
    radius,
    LIMIT)
print(url)

https://api.foursquare.com/v2/venues/explore?&client_id=UM1K52E4KEGGMPMZIDTLRDE0FEWWNBDSA3R0F0KN3RLAXS3U&client_secret=KNVM15Q0PDOZBUUWBQ51DGJEUJZQYEXHCP0OAC3JDUB0LUB0&v=20180605&ll=43.653530000000046,-79.39723268299997&radius=500&limit=100
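
Formatting the query string by hand works, but it is easy to misplace a positional argument. Below is a sketch of building an equivalent URL with the standard library's `urllib.parse.urlencode`. `build_explore_url` is a hypothetical helper and the credentials are placeholders.

```python
from urllib.parse import urlencode

def build_explore_url(client_id, client_secret, version, lat, lng, radius=500, limit=100):
    """Build a venues/explore request URL from named parameters (hypothetical helper)."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,
        'll': '{},{}'.format(lat, lng),
        'radius': radius,
        'limit': limit,
    }
    # safe=',' keeps the comma in ll readable instead of encoding it as %2C
    return 'https://api.foursquare.com/v2/venues/explore?' + urlencode(params, safe=',')


demo_url = build_explore_url('YOUR_ID', 'YOUR_SECRET', '20180605', 43.65353, -79.39723)
print(demo_url)
```

Passing the same dictionary as `requests.get(base_url, params=params)` would be another way to get the encoding handled automatically.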


In [130]:
results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '5dd18d36c94979001b685b53'},
 'response': {'suggestedFilters': {'header': 'Tap to show:',
   'filters': [{'name': 'Open now', 'key': 'openNow'}]},
  'headerLocation': 'Grange Park',
  'headerFullLocation': 'Grange Park, Toronto',
  'headerLocationGranularity': 'neighborhood',
  'totalResults': 84,
  'suggestedBounds': {'ne': {'lat': 43.65803000450005,
    'lng': -79.39102475860142},
   'sw': {'lat': 43.649029995500044, 'lng': -79.40344060739852}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'reasons': {'count': 0,
       'items': [{'summary': 'This spot is popular',
         'type': 'general',
         'reasonName': 'globalInteractionReason'}]},
      'venue': {'id': '4af45a8af964a52097f121e3',
       'name': 'Banh Mi Nguyen Huong',
       'location': {'address': '322 Spadina Ave.',
        'crossStreet': 'at Dundas St E',
        'lat': 43.653628006211044,
        'lng': -79.3983759881993,
        'labeledLa

Creating function to get the category of the venue from the results

In [131]:
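def get_category_type(row):
    # The category list is stored under 'categories' or, after json_normalize, under 'venue.categories'
    try:
        category_list = row['categories']
    except KeyError:
        category_list = row['venue.categories']
    if len(category_list)==0:
        return None
    else:
        return category_list[0]['name']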

Cleaning the json and structuring it into a pandas dataframe

In [132]:
venues = results['response']['groups'][0]['items']
nearby_venues = json_normalize(venues)

#filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues = nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis = 1)

# clean the columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head()



Unnamed: 0,name,categories,lat,lng
0,Banh Mi Nguyen Huong,Vietnamese Restaurant,43.653628,-79.398376
1,Saigon Lotus Restaurant,Vietnamese Restaurant,43.654311,-79.399225
2,Meeplemart,Gaming Cafe,43.651628,-79.39741
3,The Moonbean Cafe,Café,43.654147,-79.400182
4,El Rey,Cocktail Bar,43.652764,-79.400048


Total number of nearby venues returned by Foursquare

In [133]:
print('{} nearby venues were returned by Foursquare'.format(nearby_venues.shape[0]))

84 nearby venues were returned by Foursquare


### Exploring all the neighborhoods in Downtown Toronto

#### Create a Function to repeat the process of retrieving the venues for a neighborhood for all neighborhoods

In [134]:
def getNearbyVenues(names, latitudes, longitudes, radius = 500):
    venues_list = []
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
        
        #Create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
               CLIENT_ID,
               CLIENT_SECRET,
               VERSION,
               lat,
               lng,
               radius,
               LIMIT)
        # make the Get request
        results = requests.get(url).json()['response']['groups'][0]['items']
        
        # return only relevant information for each neighborhood
        venues_list.append([(
          name,
          lat,
          lng,
          v['venue']['name'],
          v['venue']['location']['lat'],
          v['venue']['location']['lng'],
          v['venue']['categories'][0]['name']) for v in results])
    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood',
                            'Neighborhood Latitude',
                            'Neighborhood Longitude',
                            'Venue',
                            'Venue Latitude',
                            'Venue Longitude',
                            'Venue Category']
    return (nearby_venues)
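
Indexing `v['venue']['categories'][0]` raises an `IndexError` for any venue that comes back with an empty category list. A minimal sketch of a more defensive flattening step follows. `flatten_items` is a hypothetical helper, and `sample_items` is made-up data that only mimics the shape of the `items` list in the response shown earlier.

```python
def flatten_items(name, lat, lng, items):
    """Turn a list of Foursquare-style items into flat tuples, tolerating missing categories."""
    rows = []
    for item in items:
        venue = item['venue']
        categories = venue.get('categories') or []
        category = categories[0]['name'] if categories else None
        rows.append((
            name, lat, lng,
            venue['name'],
            venue['location']['lat'],
            venue['location']['lng'],
            category,
        ))
    return rows


# Made-up items that only mimic the response structure
sample_items = [
    {'venue': {'name': 'Venue A', 'location': {'lat': 43.65, 'lng': -79.39},
               'categories': [{'name': 'Café'}]}},
    {'venue': {'name': 'Venue B', 'location': {'lat': 43.66, 'lng': -79.40},
               'categories': []}},
]

rows = flatten_items('Example Hood', 43.6535, -79.3972, sample_items)
for row in rows:
    print(row)
```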

#### Call the above function to get venues for all the Neighborhoods in Downtown Toronto

In [135]:
downtown_toronto_venues = getNearbyVenues(names=df_toronto['Neighborhood'],
                                   latitudes=df_toronto['Latitude'],
                                   longitudes=df_toronto['Longitude']
                                  )

Chinatown,Grange Park,Kensington Market
Rosedale
St. James Town
Design Exchange,Toronto Dominion Centre
Cabbagetown,St. James Town
Ryerson,Garden District
Stn A PO Boxes 25 The Esplanade
CN Tower,Bathurst Quay,Island airport,Harbourfront West,King and Spadina,Railway Lands,South Niagara
Christie
Central Bay Street
Harbord,University of Toronto
Harbourfront East,Toronto Islands,Union Station
Berczy Park
First Canadian Place,Underground city
Harbourfront
Church and Wellesley
Adelaide,King,Richmond
Commerce Court,Victoria Hotel


Checking the size of the dataframe

In [136]:
print(downtown_toronto_venues.shape)

(1255, 7)


Checking number of venues returned for each Neighborhood

In [137]:
downtown_toronto_venues.groupby('Neighborhood').count()

Unnamed: 0_level_0,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
"Adelaide,King,Richmond",100,100,100,100,100,100
Berczy Park,63,63,63,63,63,63
"CN Tower,Bathurst Quay,Island airport,Harbourfront West,King and Spadina,Railway Lands,South Niagara",71,71,71,71,71,71
"Cabbagetown,St. James Town",39,39,39,39,39,39
Central Bay Street,96,96,96,96,96,96
"Chinatown,Grange Park,Kensington Market",84,84,84,84,84,84
Christie,11,11,11,11,11,11
Church and Wellesley,86,86,86,86,86,86
"Commerce Court,Victoria Hotel",100,100,100,100,100,100
"Design Exchange,Toronto Dominion Centre",100,100,100,100,100,100


Getting the count of unique categories returned

In [138]:
print('There are {} unique categories.'.format(len(downtown_toronto_venues['Venue Category'].unique())))

There are 186 unique categories.


### Analyzing Each Neighborhood

In [139]:
# one hot encoding
down_tor_onehot = pd.get_dummies(downtown_toronto_venues[['Venue Category']], prefix="", prefix_sep="")

# There is a venue category 'Neighborhood', changing its name to 'Hoods'
down_tor_onehot.rename(columns={'Neighborhood': 'Hoods'}, inplace = True)

# add neighborhood column back as first column of dataframe
down_tor_onehot['Neighborhood'] = downtown_toronto_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [down_tor_onehot.columns[-1]] + list(down_tor_onehot.columns[:-1])
down_tor_onehot = down_tor_onehot[fixed_columns]

down_tor_onehot.head()

Unnamed: 0,Neighborhood,Afghan Restaurant,American Restaurant,Antique Shop,Art Gallery,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,BBQ Joint,Baby Store,...,Trail,Train Station,Tram Station,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Wine Bar,Wine Shop,Wings Joint,Yoga Studio
0,"Chinatown,Grange Park,Kensington Market",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,1,0,0,0,0
1,"Chinatown,Grange Park,Kensington Market",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,1,0,0,0,0
2,"Chinatown,Grange Park,Kensington Market",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,"Chinatown,Grange Park,Kensington Market",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,"Chinatown,Grange Park,Kensington Market",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


Getting Size of new dataframe

In [140]:
down_tor_onehot.shape

(1255, 187)

#### Grouping rows by neighborhoods and taking the mean of the frequencies of each category

In [141]:
down_tor_grouped = down_tor_onehot.groupby('Neighborhood').mean().reset_index()
down_tor_grouped.head()

Unnamed: 0,Neighborhood,Afghan Restaurant,American Restaurant,Antique Shop,Art Gallery,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,BBQ Joint,Baby Store,...,Trail,Train Station,Tram Station,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Wine Bar,Wine Shop,Wings Joint,Yoga Studio
0,"Adelaide,King,Richmond",0.0,0.03,0.0,0.01,0.0,0.03,0.0,0.0,0.0,...,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0
1,Berczy Park,0.0,0.0,0.0,0.015873,0.0,0.0,0.0,0.015873,0.0,...,0.0,0.0,0.0,0.015873,0.0,0.0,0.0,0.0,0.0,0.0
2,"CN Tower,Bathurst Quay,Island airport,Harbourf...",0.0,0.0,0.0,0.0,0.0,0.014085,0.0,0.0,0.0,...,0.0,0.014085,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.014085
3,"Cabbagetown,St. James Town",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Central Bay Street,0.0,0.010417,0.0,0.010417,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.010417,0.010417,0.010417,0.0,0.0,0.0


#### Confirming New Size

In [142]:
down_tor_grouped.shape

(18, 187)
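
To see why the grouped values are frequencies, here is the same one-hot-then-mean step on a toy frame. The venue rows are invented for illustration.

```python
import pandas as pd

# Invented venues: neighborhood A has 4 venues, B has 2
toy_venues = pd.DataFrame({
    'Neighborhood': ['A', 'A', 'A', 'A', 'B', 'B'],
    'Venue Category': ['Coffee Shop', 'Coffee Shop', 'Park', 'Bar', 'Park', 'Park'],
})

# One indicator column per category; dtype=float keeps the later mean numeric
onehot = pd.get_dummies(toy_venues['Venue Category'], dtype=float)
onehot.insert(0, 'Neighborhood', toy_venues['Neighborhood'])

# The mean of 0/1 indicators is the share of a neighborhood's venues in each category
grouped = onehot.groupby('Neighborhood').mean().reset_index()
print(grouped)
```

Each row of `grouped` sums to 1, because every venue contributes to exactly one category column.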

#### Printing the top 5 venues for each Neighborhood

In [143]:
num_top_venues = 5

for hood in down_tor_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = down_tor_grouped[down_tor_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Adelaide,King,Richmond----
              venue  freq
0       Coffee Shop  0.08
1              Café  0.06
2             Hotel  0.05
3        Steakhouse  0.04
4  Asian Restaurant  0.03


----Berczy Park----
         venue  freq
0  Coffee Shop  0.08
1   Restaurant  0.05
2       Bakery  0.05
3   Steakhouse  0.03
4         Café  0.03


----CN Tower,Bathurst Quay,Island airport,Harbourfront West,King and Spadina,Railway Lands,South Niagara----
                venue  freq
0         Coffee Shop  0.11
1  Italian Restaurant  0.07
2                 Bar  0.04
3                Park  0.04
4      Sandwich Place  0.03


----Cabbagetown,St. James Town----
                venue  freq
0          Restaurant  0.08
1         Coffee Shop  0.08
2              Bakery  0.05
3                Café  0.05
4  Italian Restaurant  0.05


----Central Bay Street----
                  venue  freq
0           Coffee Shop  0.10
1        Clothing Store  0.06
2        Cosmetics Shop  0.03
3  Fast Food Restaurant  0.03
4 

#### Putting these in a Pandas Dataframe

Write a function to sort the venues in descending order

In [144]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

 Create new Dataframe with top ten venues for each neighborhood

In [145]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = down_tor_grouped['Neighborhood']

for ind in np.arange(down_tor_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(down_tor_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,"Adelaide,King,Richmond",Coffee Shop,Café,Hotel,Steakhouse,Japanese Restaurant,Burger Joint,Bar,Bakery,Restaurant,Asian Restaurant
1,Berczy Park,Coffee Shop,Restaurant,Bakery,Cocktail Bar,Farmers Market,Beer Bar,Cheese Shop,Hotel,Lounge,Steakhouse
2,"CN Tower,Bathurst Quay,Island airport,Harbourf...",Coffee Shop,Italian Restaurant,Park,Bar,Pub,Sandwich Place,Speakeasy,Bakery,Café,Restaurant
3,"Cabbagetown,St. James Town",Restaurant,Coffee Shop,Bakery,Café,Italian Restaurant,Pizza Place,Park,Diner,Jewelry Store,Sandwich Place
4,Central Bay Street,Coffee Shop,Clothing Store,Bakery,Fast Food Restaurant,Cosmetics Shop,Restaurant,Plaza,Bookstore,Chinese Restaurant,Tea Room
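
The `indicators` list works for ten columns, but it would label a twenty-first column as `21th`. A small sketch of a general ordinal helper follows, in case `num_top_venues` is ever raised. `ordinal` and `demo_columns` are names introduced here.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st', 12 -> '12th'."""
    if 10 <= n % 100 <= 20:
        suffix = 'th'  # 11th, 12th and 13th are irregular
    else:
        suffix = {1: 'st', 2: 'nd', 3: 'rd'}.get(n % 10, 'th')
    return '{}{}'.format(n, suffix)


demo_top = 10
demo_columns = ['Neighborhood'] + ['{} Most Common Venue'.format(ordinal(i))
                                   for i in range(1, demo_top + 1)]
print(demo_columns[:4])
```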


### Clustering Neighborhoods

Running K-means to cluster neighborhoods into 5 clusters

In [146]:
# setting  number of clusters
kclusters = 5

down_tor_grouped_clustering = down_tor_grouped.drop('Neighborhood', axis=1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(down_tor_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([0, 0, 0, 3, 0, 3, 4, 0, 0, 0])
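
The notebook does not justify the choice of 5 clusters. One common way to sanity-check a value of k is the silhouette score. The sketch below runs on synthetic points (not the Toronto data) with three well-separated groups, so its numbers say nothing about the real neighborhoods.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Synthetic data: three tight, well-separated groups of 20 points each
rng = np.random.RandomState(0)
centers = np.array([[0.0, 0.0], [10.0, 10.0], [-10.0, 10.0]])
points = np.vstack([c + 0.5 * rng.randn(20, 2) for c in centers])

# Fit k-means for several k and record the mean silhouette score of each fit
scores = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(points)
    scores[k] = silhouette_score(points, labels)

best_k = max(scores, key=scores.get)
print(best_k, round(scores[best_k], 3))
```

With only 18 neighborhoods, scores on the real data would be noisy, so a check like this is a rough guide rather than a definitive answer.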

Creating new dataframe which includes the cluster as well as the top 10 venues

In [147]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

down_tor_merged = df_toronto

# merge df_toronto with neighborhoods_venues_sorted to add the cluster label and top venues for each neighborhood
down_tor_merged = down_tor_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

down_tor_merged.head() # check the last columns!

Unnamed: 0,PostalCode,Borough,Neighborhood,Longitude,Latitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M5T,Downtown Toronto,"Chinatown,Grange Park,Kensington Market",-79.3972,43.6535,3,Café,Vietnamese Restaurant,Bar,Chinese Restaurant,Dumpling Restaurant,Mexican Restaurant,Coffee Shop,Vegetarian / Vegan Restaurant,Park,Bakery
1,M4W,Downtown Toronto,Rosedale,-79.3779,43.6822,1,Playground,Candy Store,Park,Grocery Store,Electronics Store,Food & Drink Shop,Fish Market,Fish & Chips Shop,Fast Food Restaurant,Farmers Market
2,M5C,Downtown Toronto,St. James Town,-79.3755,43.6512,0,Coffee Shop,Café,Restaurant,Hotel,Bakery,Italian Restaurant,Breakfast Spot,Seafood Restaurant,Gastropub,Cosmetics Shop
3,M5K,Downtown Toronto,"Design Exchange,Toronto Dominion Centre",-79.3815,43.6471,0,Coffee Shop,Café,Hotel,American Restaurant,Restaurant,Seafood Restaurant,Gastropub,Steakhouse,Italian Restaurant,Bar
4,M4X,Downtown Toronto,"Cabbagetown,St. James Town",-79.3666,43.6682,3,Restaurant,Coffee Shop,Bakery,Café,Italian Restaurant,Pizza Place,Park,Diner,Jewelry Store,Sandwich Place


Visualizing the clusters

In [148]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(down_tor_merged['Latitude'], down_tor_merged['Longitude'], down_tor_merged['Neighborhood'], down_tor_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

### Examining Clusters

#### Cluster 1

In [149]:
down_tor_merged.loc[down_tor_merged['Cluster Labels'] == 0, down_tor_merged.columns[[1] + list(range(5, down_tor_merged.shape[1]))]]


Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
2,Downtown Toronto,0,Coffee Shop,Café,Restaurant,Hotel,Bakery,Italian Restaurant,Breakfast Spot,Seafood Restaurant,Gastropub,Cosmetics Shop
3,Downtown Toronto,0,Coffee Shop,Café,Hotel,American Restaurant,Restaurant,Seafood Restaurant,Gastropub,Steakhouse,Italian Restaurant,Bar
5,Downtown Toronto,0,Coffee Shop,Clothing Store,Cosmetics Shop,Fast Food Restaurant,Café,Restaurant,Italian Restaurant,Ramen Restaurant,Burger Joint,Sporting Goods Shop
6,Downtown Toronto,0,Coffee Shop,Steakhouse,Hotel,Bar,Café,Thai Restaurant,American Restaurant,Sushi Restaurant,Asian Restaurant,Pizza Place
7,Downtown Toronto,0,Coffee Shop,Italian Restaurant,Park,Bar,Pub,Sandwich Place,Speakeasy,Bakery,Café,Restaurant
9,Downtown Toronto,0,Coffee Shop,Clothing Store,Bakery,Fast Food Restaurant,Cosmetics Shop,Restaurant,Plaza,Bookstore,Chinese Restaurant,Tea Room
12,Downtown Toronto,0,Coffee Shop,Restaurant,Bakery,Cocktail Bar,Farmers Market,Beer Bar,Cheese Shop,Hotel,Lounge,Steakhouse
13,Downtown Toronto,0,Coffee Shop,Hotel,Café,American Restaurant,Restaurant,Seafood Restaurant,Bar,Bakery,Gastropub,Asian Restaurant
14,Downtown Toronto,0,Coffee Shop,Bakery,Boat or Ferry,Theater,Spa,Italian Restaurant,Cosmetics Shop,Breakfast Spot,Brewery,Gastropub
15,Downtown Toronto,0,Coffee Shop,Japanese Restaurant,Restaurant,Gay Bar,Sushi Restaurant,Gastropub,Hotel,Dance Studio,Italian Restaurant,Men's Store


 Renaming Cluster 1 as 'Coffee Shops' since the most common venue in this cluster is Coffee Shop

In [159]:
down_tor_merged.loc[(down_tor_merged['Cluster Labels']) == 0,'Cluster Labels']= 'Coffee Shops'
# Display Cluster 1
down_tor_merged.loc[down_tor_merged['Cluster Labels'] == 'Coffee Shops', down_tor_merged.columns[[1] + list(range(5, down_tor_merged.shape[1]))]]


Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
2,Downtown Toronto,Coffee Shops,Coffee Shop,Café,Restaurant,Hotel,Bakery,Italian Restaurant,Breakfast Spot,Seafood Restaurant,Gastropub,Cosmetics Shop
3,Downtown Toronto,Coffee Shops,Coffee Shop,Café,Hotel,American Restaurant,Restaurant,Seafood Restaurant,Gastropub,Steakhouse,Italian Restaurant,Bar
5,Downtown Toronto,Coffee Shops,Coffee Shop,Clothing Store,Cosmetics Shop,Fast Food Restaurant,Café,Restaurant,Italian Restaurant,Ramen Restaurant,Burger Joint,Sporting Goods Shop
6,Downtown Toronto,Coffee Shops,Coffee Shop,Steakhouse,Hotel,Bar,Café,Thai Restaurant,American Restaurant,Sushi Restaurant,Asian Restaurant,Pizza Place
7,Downtown Toronto,Coffee Shops,Coffee Shop,Italian Restaurant,Park,Bar,Pub,Sandwich Place,Speakeasy,Bakery,Café,Restaurant
9,Downtown Toronto,Coffee Shops,Coffee Shop,Clothing Store,Bakery,Fast Food Restaurant,Cosmetics Shop,Restaurant,Plaza,Bookstore,Chinese Restaurant,Tea Room
12,Downtown Toronto,Coffee Shops,Coffee Shop,Restaurant,Bakery,Cocktail Bar,Farmers Market,Beer Bar,Cheese Shop,Hotel,Lounge,Steakhouse
13,Downtown Toronto,Coffee Shops,Coffee Shop,Hotel,Café,American Restaurant,Restaurant,Seafood Restaurant,Bar,Bakery,Gastropub,Asian Restaurant
14,Downtown Toronto,Coffee Shops,Coffee Shop,Bakery,Boat or Ferry,Theater,Spa,Italian Restaurant,Cosmetics Shop,Breakfast Spot,Brewery,Gastropub
15,Downtown Toronto,Coffee Shops,Coffee Shop,Japanese Restaurant,Restaurant,Gay Bar,Sushi Restaurant,Gastropub,Hotel,Dance Studio,Italian Restaurant,Men's Store
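
The naming rationale above can also be checked programmatically rather than by reading the table. The following is a minimal sketch, not part of the original notebook: `sample` is a hypothetical stand-in for `down_tor_merged` that holds only the numeric K-means labels and the 1st Most Common Venue transcribed from the cluster tables in this section, and `dominant_top_venue` is an illustrative helper name.

```python
import pandas as pd

# Hypothetical stand-in for down_tor_merged: numeric K-means labels and the
# top-ranked venue per neighborhood, transcribed from the cluster tables.
sample = pd.DataFrame({
    "Cluster Labels": [3, 1, 0, 0, 3, 0, 0, 0, 4, 0, 3, 2, 0, 0, 0, 0],
    "1st Most Common Venue": [
        "Café", "Playground", "Coffee Shop", "Coffee Shop", "Restaurant",
        "Coffee Shop", "Coffee Shop", "Coffee Shop", "Café", "Coffee Shop",
        "Café", "Pier", "Coffee Shop", "Coffee Shop", "Coffee Shop", "Coffee Shop",
    ],
})

def dominant_top_venue(df, label_col="Cluster Labels", venue_col="1st Most Common Venue"):
    """For each cluster, return the most frequent top-ranked venue and the
    share of the cluster's neighborhoods that have it in first place."""
    rows = []
    for label, group in df.groupby(label_col):
        counts = group[venue_col].value_counts()
        rows.append({
            "cluster": label,
            "venue": counts.index[0],
            "share": counts.iloc[0] / len(group),
        })
    return pd.DataFrame(rows).set_index("cluster")

summary = dominant_top_venue(sample)
print(summary)
```

A share of 1.0 for label 0 supports the 'Coffee Shops' name; a lower share in another cluster suggests its name should rest on more than the first-ranked venue.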


### Cluster 2

In [151]:
down_tor_merged.loc[down_tor_merged['Cluster Labels'] == 1, down_tor_merged.columns[[1] + list(range(5, down_tor_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,Downtown Toronto,1,Playground,Candy Store,Park,Grocery Store,Electronics Store,Food & Drink Shop,Fish Market,Fish & Chips Shop,Fast Food Restaurant,Farmers Market


Since this cluster contains only one neighborhood, and its most common venues include Playground, Candy Store and Park, we can rename it 'Kid Play Areas'.

In [160]:
down_tor_merged.loc[down_tor_merged['Cluster Labels'] == 1, 'Cluster Labels'] = 'Kid Play Areas'
down_tor_merged.loc[down_tor_merged['Cluster Labels'] == 'Kid Play Areas', down_tor_merged.columns[[1] + list(range(5, down_tor_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,Downtown Toronto,Kid Play Areas,Playground,Candy Store,Park,Grocery Store,Electronics Store,Food & Drink Shop,Fish Market,Fish & Chips Shop,Fast Food Restaurant,Farmers Market
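
Single-neighborhood clusters like this one are easy to miss when inspecting tables one at a time. As a rough sketch (not from the original notebook), the cluster sizes can be counted directly; `labels` below is a hypothetical series holding the numeric K-means labels for rows 0 to 15, transcribed from the cluster tables in this section.

```python
import pandas as pd

# Numeric K-means labels for rows 0..15, transcribed from the cluster tables
# (an illustrative stand-in for down_tor_merged['Cluster Labels']).
labels = pd.Series(
    [3, 1, 0, 0, 3, 0, 0, 0, 4, 0, 3, 2, 0, 0, 0, 0],
    name="Cluster Labels",
)

# Number of neighborhoods assigned to each cluster, ordered by label
sizes = labels.value_counts().sort_index()

# Labels of clusters that contain exactly one neighborhood
singletons = sizes[sizes == 1].index.tolist()

print(sizes)
print("Single-neighborhood clusters:", singletons)
```

A name given to a singleton cluster describes one neighborhood rather than a shared pattern, so it is worth treating such names as tentative.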


### Cluster 3

In [153]:
down_tor_merged.loc[down_tor_merged['Cluster Labels'] == 2, down_tor_merged.columns[[1] + list(range(5, down_tor_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
11,Downtown Toronto,2,Pier,Harbor / Marina,Park,Yoga Studio,Electronics Store,Food & Drink Shop,Fish Market,Fish & Chips Shop,Fast Food Restaurant,Farmers Market


In this cluster, the two most common venues (Pier and Harbor / Marina) are waterfront venues, so we can name it 'Waterfront'.

In [161]:
down_tor_merged.loc[down_tor_merged['Cluster Labels'] == 2, 'Cluster Labels'] = 'Waterfront'
down_tor_merged.loc[down_tor_merged['Cluster Labels'] == 'Waterfront', down_tor_merged.columns[[1] + list(range(5, down_tor_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
11,Downtown Toronto,Waterfront,Pier,Harbor / Marina,Park,Yoga Studio,Electronics Store,Food & Drink Shop,Fish Market,Fish & Chips Shop,Fast Food Restaurant,Farmers Market


### Cluster 4

In [155]:
down_tor_merged.loc[down_tor_merged['Cluster Labels'] == 3, down_tor_merged.columns[[1] + list(range(5, down_tor_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Downtown Toronto,3,Café,Vietnamese Restaurant,Bar,Chinese Restaurant,Dumpling Restaurant,Mexican Restaurant,Coffee Shop,Vegetarian / Vegan Restaurant,Park,Bakery
4,Downtown Toronto,3,Restaurant,Coffee Shop,Bakery,Café,Italian Restaurant,Pizza Place,Park,Diner,Jewelry Store,Sandwich Place
10,Downtown Toronto,3,Café,Bakery,Coffee Shop,Restaurant,Italian Restaurant,Bar,Bookstore,Japanese Restaurant,Gym,Ramen Restaurant


The most common venues in this cluster are eating places (cafés, restaurants and bakeries), so we could name it 'Restaurants'.

In [162]:
down_tor_merged.loc[down_tor_merged['Cluster Labels'] == 3, 'Cluster Labels'] = 'Restaurants'
down_tor_merged.loc[down_tor_merged['Cluster Labels'] == 'Restaurants', down_tor_merged.columns[[1] + list(range(5, down_tor_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Downtown Toronto,Restaurants,Café,Vietnamese Restaurant,Bar,Chinese Restaurant,Dumpling Restaurant,Mexican Restaurant,Coffee Shop,Vegetarian / Vegan Restaurant,Park,Bakery
4,Downtown Toronto,Restaurants,Restaurant,Coffee Shop,Bakery,Café,Italian Restaurant,Pizza Place,Park,Diner,Jewelry Store,Sandwich Place
10,Downtown Toronto,Restaurants,Café,Bakery,Coffee Shop,Restaurant,Italian Restaurant,Bar,Bookstore,Japanese Restaurant,Gym,Ramen Restaurant


### Cluster 5

In [157]:
down_tor_merged.loc[down_tor_merged['Cluster Labels'] == 4, down_tor_merged.columns[[1] + list(range(5, down_tor_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
8,Downtown Toronto,4,Café,Grocery Store,Playground,Coffee Shop,Candy Store,Italian Restaurant,Athletics & Sports,Baby Store,Food & Drink Shop,Fish Market


Several of the most common venues in this cluster are stores (Grocery Store, Candy Store, Baby Store, Food & Drink Shop), so we could name it 'Shopping'.

In [163]:
down_tor_merged.loc[down_tor_merged['Cluster Labels'] == 4, 'Cluster Labels'] = 'Shopping'
down_tor_merged.loc[down_tor_merged['Cluster Labels'] == 'Shopping', down_tor_merged.columns[[1] + list(range(5, down_tor_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
8,Downtown Toronto,Shopping,Café,Grocery Store,Playground,Coffee Shop,Candy Store,Italian Restaurant,Athletics & Sports,Baby Store,Food & Drink Shop,Fish Market
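
The five renaming steps in this section can also be expressed as one lookup. The sketch below is an alternative, not the notebook's own method: `cluster_names` collects the names chosen above, `sample` is a small hypothetical stand-in for `down_tor_merged`, and the `Cluster Name` column is an assumption introduced for illustration.

```python
import pandas as pd

# Names chosen in this section, keyed by the numeric K-means label
cluster_names = {
    0: "Coffee Shops",
    1: "Kid Play Areas",
    2: "Waterfront",
    3: "Restaurants",
    4: "Shopping",
}

# Hypothetical stand-in for down_tor_merged with one row per cluster
sample = pd.DataFrame({
    "Borough": ["Downtown Toronto"] * 5,
    "Cluster Labels": [3, 1, 0, 4, 2],
})

# Write the names to a new column so the numeric labels stay available
sample["Cluster Name"] = sample["Cluster Labels"].map(cluster_names)

print(sample)
```

Keeping the numeric labels in place avoids mixing integers and strings in one column and leaves them usable for any later step that indexes by cluster number.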
