# Segmenting and Clustering Neighborhoods in the city of Toronto, Canada #

## Introduction

The aim of this project is to write code that identifies neighborhood segments in Toronto and clusters them according to the venues available in the vicinity of each neighborhood.

## Table of Contents

[> Part 1 - Data Scraping](https://dataplatform.cloud.ibm.com/analytics/notebooks/v2/d9d1a6c0-8106-49be-b516-04e5dc7b0a28?projectid=277684b3-e1d9-43e7-83dc-aa7c533cbc8b&projectTitle=Capstone%20Project&context=wdp)

[> Part 2 - Geocoding](https://dataplatform.cloud.ibm.com/analytics/notebooks/v2/d9d1a6c0-8106-49be-b516-04e5dc7b0a28?projectid=277684b3-e1d9-43e7-83dc-aa7c533cbc8b&projectTitle=Capstone%20Project&context=wdp)

[> Part 3 - Clustering](https://dataplatform.cloud.ibm.com/analytics/notebooks/v2/d9d1a6c0-8106-49be-b516-04e5dc7b0a28?projectid=277684b3-e1d9-43e7-83dc-aa7c533cbc8b&projectTitle=Capstone%20Project&context=wdp)
___

# Part 1 - Data Scraping

**Input Data [Wikipedia: List of postal codes of Canada: M](https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M)**

In [1]:
import pandas as pd
import numpy as np
!pip install folium
import folium as fl
from geopy.geocoders import Nominatim
import requests
from sklearn.cluster import KMeans
import matplotlib.cm as cm
import matplotlib.colors as colors
from bs4 import BeautifulSoup

Collecting folium
  Downloading https://files.pythonhosted.org/packages/43/77/0287320dc4fd86ae8847bab6c34b5ec370e836a79c7b0c16680a3d9fd770/folium-0.8.3-py2.py3-none-any.whl (87kB)
    100% |████████████████████████████████| 92kB 6.9MB/s eta 0:00:01
Requirement not upgraded as not directly required: numpy in /opt/conda/envs/DSX-Python35/lib/python3.5/site-packages (from folium)
Collecting branca>=0.3.0 (from folium)
  Downloading https://files.pythonhosted.org/packages/63/36/1c93318e9653f4e414a2e0c3b98fc898b4970e939afeedeee6075dd3b703/branca-0.3.1-py3-none-any.whl
Requirement not upgraded as not directly required: jinja2 in /opt/conda/envs/DSX-Python35/lib/python3.5/site-packages (from folium)
Requirement not upgraded as not directly required: six in /opt/conda/envs/DSX-Python35/lib/python3.5/site-packages (from folium)
Requirement not upgraded as not directly required: requests in /opt/conda/envs/DSX-Python35/lib/python3.5/site-packages (from folium)
Requirement not upgraded a

**Download the postal code data and scrape the web page**

In [2]:
source = requests.get('https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M').text
xml_page_data = BeautifulSoup(source, 'lxml')

In [3]:
class webpage_scrapp:
       
        def parse_url(self, url):
            response = requests.get(url)
            xml_page_data = BeautifulSoup(response.text, 'lxml')
            return [(self.parse_html_table(table))\
                    for table in xml_page_data.find_all('table', class_="wikitable sortable")]  
    
        def parse_html_table(self, table):
            n_columns = 0
            n_rows=0
            column_names = []
            for row in table.find_all('tr'):
                td_tags = row.find_all('td')
                if len(td_tags) > 0:
                    n_rows+=1
                    if n_columns == 0:
                        n_columns = len(td_tags)
                        
                th_tags = row.find_all('th') 
                if len(th_tags) > 0 and len(column_names) == 0:
                    for th in th_tags:
                        column_names.append(th.get_text())
    
            if len(column_names) > 0 and len(column_names) != n_columns:
                raise Exception("Column titles do not match the number of columns")
    
            columns = column_names if len(column_names) > 0 else range(0,n_columns)
            df = pd.DataFrame(columns = columns,
                              index= range(0,n_rows))
            row_marker = 0
            for row in table.find_all('tr'):
                column_marker = 0
                columns = row.find_all('td')
                for column in columns:
                    df.iat[row_marker,column_marker] = column.get_text()
                    column_marker += 1
                if len(columns) > 0:
                    row_marker += 1
                    
            for col in df:
                try:
                    df[col] = df[col].astype(float)
                except ValueError:
                    pass
            
            return df

In [4]:
table = webpage_scrapp().parse_url('https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M')[0] 
table.head(12)

Unnamed: 0,Postcode,Borough,Neighbourhood
0,M1A,Not assigned,Not assigned\n
1,M2A,Not assigned,Not assigned\n
2,M3A,North York,Parkwoods\n
3,M4A,North York,Victoria Village\n
4,M5A,Downtown Toronto,Harbourfront\n
5,M5A,Downtown Toronto,Regent Park\n
6,M6A,North York,Lawrence Heights\n
7,M6A,North York,Lawrence Manor\n
8,M7A,Queen's Park,Not assigned\n
9,M8A,Not assigned,Not assigned\n
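
The row-and-cell walk that `parse_html_table` performs can be tried offline on a small inline table. The sketch below is a dependency-free illustration that uses the standard library's `html.parser` instead of BeautifulSoup; the `TableRows` class and the sample HTML are made up for this example and are not part of the notebook.

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the text of every <th>/<td> cell, one list per <tr> (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows
        self._row = None    # cells of the row currently being read
        self._cell = None   # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Strip the trailing newline carried by the last cell of each row.
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


# Illustrative HTML shaped like the Wikipedia table (assumed layout).
sample_html = """
<table class="wikitable sortable">
<tr><th>Postcode</th><th>Borough</th><th>Neighbourhood
</th></tr>
<tr><td>M3A</td><td>North York</td><td>Parkwoods
</td></tr>
<tr><td>M5A</td><td>Downtown Toronto</td><td>Harbourfront
</td></tr>
</table>
"""

parser = TableRows()
parser.feed(sample_html)
header, *body = parser.rows
print(header)
print(body)
```

Stripping each cell as it is read avoids the `\n` suffix that appears in the scraped output above.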


**Ignore cells with a borough that is Not assigned**


In [5]:
table = table[table.Borough != 'Not assigned']
table.head(12)

Unnamed: 0,Postcode,Borough,Neighbourhood
2,M3A,North York,Parkwoods\n
3,M4A,North York,Victoria Village\n
4,M5A,Downtown Toronto,Harbourfront\n
5,M5A,Downtown Toronto,Regent Park\n
6,M6A,North York,Lawrence Heights\n
7,M6A,North York,Lawrence Manor\n
8,M7A,Queen's Park,Not assigned\n
10,M9A,Etobicoke,Islington Avenue\n
11,M1B,Scarborough,Rouge\n
12,M1B,Scarborough,Malvern\n


**Remove \n from the data in table**

In [6]:
table = table.replace('\n',' ', regex=True)
table.head(12)

Unnamed: 0,Postcode,Borough,Neighbourhood
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,Harbourfront
5,M5A,Downtown Toronto,Regent Park
6,M6A,North York,Lawrence Heights
7,M6A,North York,Lawrence Manor
8,M7A,Queen's Park,Not assigned
10,M9A,Etobicoke,Islington Avenue
11,M1B,Scarborough,Rouge
12,M1B,Scarborough,Malvern


**Combine neighborhoods belonging to the same postcode**

In [7]:
neighborhood_frame = table.groupby(['Postcode','Borough'])['Neighbourhood\n'].apply(lambda x: ", ".join(x.astype(str))).reset_index()
neighborhood_frame = neighborhood_frame.sample(frac=1).reset_index(drop=True)
neighborhood_frame.head(12)

Unnamed: 0,Postcode,Borough,Neighbourhood
0,M3L,North York,Downsview West
1,M5G,Downtown Toronto,Central Bay Street
2,M1V,Scarborough,"Agincourt North , L'Amoreaux East , Milliken ,..."
3,M8X,Etobicoke,"The Kingsway , Montgomery Road , Old Mill North"
4,M8Z,Etobicoke,"Kingsway Park South West , Mimico NW , The Que..."
5,M6P,West Toronto,"High Park , The Junction South"
6,M1L,Scarborough,"Clairlea , Golden Mile , Oakridge"
7,M5L,Downtown Toronto,"Commerce Court , Victoria Hotel"
8,M3A,North York,Parkwoods
9,M4S,Central Toronto,Davisville
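
The three cleaning steps (filter, strip, combine) can be checked on a toy table before running them on the scraped data. This is a minimal sketch with illustrative rows; it strips the newline rather than replacing it with a space, which is an assumption about the desired result and avoids the stray space before each comma seen above.

```python
import pandas as pd

# Toy rows shaped like the scraped table (a small illustrative subset).
toy = pd.DataFrame(
    {
        "Postcode": ["M1A", "M5A", "M5A", "M7A"],
        "Borough": ["Not assigned", "Downtown Toronto", "Downtown Toronto", "Queen's Park"],
        "Neighbourhood": ["Not assigned\n", "Harbourfront\n", "Regent Park\n", "Not assigned\n"],
    }
)

# 1. Drop rows whose borough is unknown.
toy = toy[toy["Borough"] != "Not assigned"].copy()

# 2. Strip the trailing newline instead of replacing it with a space.
toy["Neighbourhood"] = toy["Neighbourhood"].str.strip()

# 3. One row per postcode, with neighbourhood names joined by ", ".
combined = (
    toy.groupby(["Postcode", "Borough"])["Neighbourhood"]
    .apply(", ".join)
    .reset_index()
)
print(combined)
```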


**Use the `shape` attribute to print the DataFrame's dimensions (rows, columns)**

In [8]:
print(neighborhood_frame.shape)

(103, 3)


# Part 2 - Geocoding

**Input Geospatial Data**

In [9]:
url_geo="http://cocl.us/Geospatial_data"
geo_info=pd.read_csv(url_geo)
geo_info.head(12)

Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476
5,M1J,43.744734,-79.239476
6,M1K,43.727929,-79.262029
7,M1L,43.711112,-79.284577
8,M1M,43.716316,-79.239476
9,M1N,43.692657,-79.264848


**Populate Latitude and Longitude based on Toronto postal codes**

In [10]:
print(list(neighborhood_frame))
print(list(geo_info))

full_table = neighborhood_frame.set_index('Postcode').join(geo_info.set_index('Postal Code'))
full_table = full_table.sample(frac=1).reset_index(drop=True)
full_table.head(12)

['Postcode', 'Borough', 'Neighbourhood\n']
['Postal Code', 'Latitude', 'Longitude']


Unnamed: 0,Borough,Neighbourhood,Latitude,Longitude
0,East York,"Woodbine Gardens , Parkview Hill",43.706397,-79.309937
1,Downtown Toronto,Christie,43.669542,-79.422564
2,Central Toronto,"Deer Park , Forest Hill SE , Rathnelly , South...",43.686412,-79.400049
3,Central Toronto,North Toronto West,43.715383,-79.405678
4,Etobicoke,"The Kingsway , Montgomery Road , Old Mill North",43.653654,-79.506944
5,North York,Downsview Northwest,43.761631,-79.520999
6,Scarborough,"Clairlea , Golden Mile , Oakridge",43.711112,-79.284577
7,Scarborough,Upper Rouge,43.836125,-79.205636
8,Downtown Toronto,"CN Tower , Bathurst Quay , Island airport , Ha...",43.628947,-79.39442
9,East York,Thorncliffe Park,43.705369,-79.349372
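
The join above discards the postal code because the index is dropped afterwards. As an alternative sketch (not the notebook's method), `DataFrame.merge` can match the two differently named key columns directly and keep `Postcode` as a regular column. The frames below are small stand-ins that reuse a few coordinates from the geospatial file.

```python
import pandas as pd

hoods = pd.DataFrame(
    {"Postcode": ["M1B", "M1C"], "Borough": ["Scarborough", "Scarborough"]}
)
coords = pd.DataFrame(
    {
        "Postal Code": ["M1B", "M1C", "M1E"],
        "Latitude": [43.806686, 43.784535, 43.763573],
        "Longitude": [-79.194353, -79.160497, -79.188711],
    }
)

# A left merge keeps every neighbourhood row; validate raises an error
# if a postcode appears more than once on either side.
merged = hoods.merge(
    coords,
    how="left",
    left_on="Postcode",
    right_on="Postal Code",
    validate="one_to_one",
).drop(columns="Postal Code")
print(merged)
```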


# Part 3 - Clustering

**Import the other libraries required for this section** *(skip this step if they are already installed and imported)*

In [11]:
# example of what will come up if you have already installed
import pandas as pd
import numpy as np
!pip install folium
import folium as fl
from geopy.geocoders import Nominatim
import requests
from sklearn.cluster import KMeans
import matplotlib.cm as cm
import matplotlib.colors as colors

Requirement not upgraded as not directly required: folium in /opt/conda/envs/DSX-Python35/lib/python3.5/site-packages
Requirement not upgraded as not directly required: jinja2 in /opt/conda/envs/DSX-Python35/lib/python3.5/site-packages (from folium)
Requirement not upgraded as not directly required: numpy in /opt/conda/envs/DSX-Python35/lib/python3.5/site-packages (from folium)
Requirement not upgraded as not directly required: requests in /opt/conda/envs/DSX-Python35/lib/python3.5/site-packages (from folium)
Requirement not upgraded as not directly required: branca>=0.3.0 in /opt/conda/envs/DSX-Python35/lib/python3.5/site-packages (from folium)
Requirement not upgraded as not directly required: six in /opt/conda/envs/DSX-Python35/lib/python3.5/site-packages (from folium)
Requirement not upgraded as not directly required: MarkupSafe>=0.23 in /opt/conda/envs/DSX-Python35/lib/python3.5/site-packages (from jinja2->folium)
Requirement not upgraded as not directly required: chardet<3

**Load the data from Part 1**

In [12]:
# Reloading the data here is optional; it only ensures that this part runs on its own
tables = pd.read_html("https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M")
df = tables[0][1:]
df.columns = tables[0].iloc[0]
neighborhoods_toronto = df[df['Borough'].str.contains(r'Toronto')]


In [13]:
geolocator = Nominatim(user_agent="ny_explorer")
neighborhoods = pd.DataFrame(columns = ['Borough', 'Neighbourhood', 'Latitude', 'Longitude'])
for hood, borough in zip(neighborhoods_toronto['Neighbourhood'],neighborhoods_toronto['Borough']):
    #print(hood+', Toronto')
    location = geolocator.geocode(hood+', Toronto')  
    if (location is not None): neighborhoods.loc[len(neighborhoods)] = [borough, hood, location.latitude, location.longitude]
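
The geocoding loop can be separated from the network call so that its skip-on-`None` behaviour is testable offline. The sketch below assumes a hypothetical helper, `geocode_neighbourhoods`, that accepts any callable with the same contract as `geolocator.geocode`; `FakeLocation` and `fake_geocode` are stand-ins invented for this example.

```python
import pandas as pd


def geocode_neighbourhoods(pairs, geocode):
    """Build a coordinates table from (borough, neighbourhood) pairs.

    `geocode` is any callable that takes a query string and returns an object
    with `.latitude` and `.longitude`, or None when nothing is found.
    """
    rows = []
    for borough, hood in pairs:
        location = geocode("{}, Toronto".format(hood))
        if location is None:
            continue  # skip names the geocoder cannot resolve
        rows.append((borough, hood, location.latitude, location.longitude))
    return pd.DataFrame(rows, columns=["Borough", "Neighbourhood", "Latitude", "Longitude"])


class FakeLocation:
    """Stand-in for a geocoder result, used only for this offline example."""

    def __init__(self, latitude, longitude):
        self.latitude = latitude
        self.longitude = longitude


def fake_geocode(query):
    known = {"Harbourfront, Toronto": FakeLocation(43.64008, -79.38015)}
    return known.get(query)


result = geocode_neighbourhoods(
    [("Downtown Toronto", "Harbourfront"), ("Downtown Toronto", "Nowhere")],
    fake_geocode,
)
print(result)
```

Collecting rows in a list and building the DataFrame once is usually faster than growing it with `.loc` inside the loop. With the real service, `geolocator.geocode` could be passed in, possibly wrapped in geopy's `RateLimiter` to space out requests.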

**Generate Latitude and Longitude for Toronto**

In [14]:
address = 'Toronto, CA'
geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Toronto are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Toronto are 43.653963, -79.387207.


**Create a map of Toronto using the latitude and longitude values obtained above**

In [16]:
map_toronto = fl.Map(location=[latitude, longitude], zoom_start=13)

for lat, lng, borough, neighborhood in zip(neighborhoods['Latitude'], neighborhoods['Longitude'], neighborhoods['Borough'], neighborhoods['Neighbourhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = fl.Popup(label, parse_html=True)
    fl.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto)  
    
map_toronto

**Input Foursquare venue data collection details**

In [17]:
CLIENT_ID = 'YOUR_FOURSQUARE_CLIENT_ID'  # placeholder: supply your own credential
CLIENT_SECRET = 'YOUR_FOURSQUARE_CLIENT_SECRET'  # placeholder: never publish a real secret
VERSION = '20190512'

LIMIT = 100 # limit of venues returned by Foursquare API
radius = 500 # defines the radius

def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
     
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
        
        results = requests.get(url).json()["response"]['groups'][0]['items']
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighbourhood', 
                  'Neighbourhood Latitude', 
                  'Neighbourhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
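
`getNearbyVenues` indexes straight into the JSON response, so an empty `groups` list or a venue without categories raises an exception. The sketch below shows one defensive way to flatten such a response. The function `extract_venues` is hypothetical, and the sample payload mirrors only the keys that `getNearbyVenues` reads; it is not a complete or authoritative description of the Foursquare response.

```python
def extract_venues(payload, name, lat, lng):
    """Flatten one explore-style response into rows, tolerating missing pieces."""
    groups = payload.get("response", {}).get("groups", [])
    items = groups[0].get("items", []) if groups else []
    rows = []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories") or []
        # Use None when a venue carries no category instead of raising IndexError.
        category = categories[0]["name"] if categories else None
        rows.append((name, lat, lng, venue["name"],
                     venue["location"]["lat"], venue["location"]["lng"], category))
    return rows


# Hypothetical payload for illustration only.
sample = {"response": {"groups": [{"items": [
    {"venue": {"name": "Miku", "location": {"lat": 43.641374, "lng": -79.377531},
               "categories": [{"name": "Japanese Restaurant"}]}},
    {"venue": {"name": "Unlabelled Spot", "location": {"lat": 43.64, "lng": -79.38},
               "categories": []}},
]}]}}

rows = extract_venues(sample, "Harbourfront", 43.64008, -79.38015)
empty = extract_venues({"response": {}}, "Nowhere", 0.0, 0.0)
print(rows)
print(empty)
```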

**Fetch the nearby venues for each Toronto neighbourhood**

In [18]:
toronto_venues = getNearbyVenues(names=neighborhoods['Neighbourhood'],
                                   latitudes=neighborhoods['Latitude'],
                                   longitudes=neighborhoods['Longitude']
                                  )

In [19]:
print(toronto_venues.head())

**To make the results clearer and easier to read, I ran the following**

In [21]:
print(toronto_venues.shape)
toronto_venues.head()

(3636, 7)


Unnamed: 0,Neighbourhood,Neighbourhood Latitude,Neighbourhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Harbourfront,43.64008,-79.38015,Harbour Square Park,43.639253,-79.378395,Park
1,Harbourfront,43.64008,-79.38015,Lake Ontario,43.638945,-79.379665,Lake
2,Harbourfront,43.64008,-79.38015,Harbourfront,43.639526,-79.380688,Neighborhood
3,Harbourfront,43.64008,-79.38015,Miku,43.641374,-79.377531,Japanese Restaurant
4,Harbourfront,43.64008,-79.38015,BeaverTails,43.639899,-79.380197,Bakery


**Now analyse the data for each neighbourhood**

In [22]:
toronto_onehot = pd.get_dummies(toronto_venues[['Venue Category']], prefix="", prefix_sep="")
toronto_onehot['Neighbourhood'] = toronto_venues['Neighbourhood'] 
fixed_columns = [toronto_onehot.columns[-1]] + list(toronto_onehot.columns[:-1])
toronto_onehot = toronto_onehot[fixed_columns]
print(toronto_onehot.shape)
toronto_onehot.head()

(3636, 301)


Unnamed: 0,Neighbourhood,Accessories Store,Adult Boutique,Afghan Restaurant,African Restaurant,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,...,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Whisky Bar,Wine Bar,Wings Joint,Women's Store,Yoga Studio,Zoo
0,Harbourfront,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,Harbourfront,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,Harbourfront,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,Harbourfront,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,Harbourfront,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


**Group rows by neighborhood and take the mean of the frequency of occurrence of each category**

In [23]:
toronto_grouped = toronto_onehot.groupby('Neighbourhood').mean().reset_index()
print(toronto_grouped.shape)
toronto_grouped.head()

(71, 301)


Unnamed: 0,Neighbourhood,Accessories Store,Adult Boutique,Afghan Restaurant,African Restaurant,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,...,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Whisky Bar,Wine Bar,Wings Joint,Women's Store,Yoga Studio,Zoo
0,Adelaide,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0
1,Bathurst Quay,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.041667,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,Berczy Park,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Brockton,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.15,0.0,0.0,0.0,0.0,0.0,0.0
4,CN Tower,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.011765,0.0,0.0,0.011765,0.0
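
Taking the mean of one-hot columns per neighbourhood gives the share of that neighbourhood's venues that fall in each category, so every row sums to 1. A minimal sketch on toy data (the neighbourhood names `A` and `B` and the venue lists are made up):

```python
import pandas as pd

venues = pd.DataFrame(
    {
        "Neighbourhood": ["A", "A", "A", "A", "B", "B"],
        "Venue Category": ["Café", "Café", "Park", "Bar", "Park", "Park"],
    }
)

# One indicator column per category, one row per venue.
onehot = pd.get_dummies(venues["Venue Category"], dtype=int)
onehot.insert(0, "Neighbourhood", venues["Neighbourhood"])

# The mean of a 0/1 column is the fraction of venues in that category.
grouped = onehot.groupby("Neighbourhood").mean().reset_index()
print(grouped)
```

Here neighbourhood `A` has four venues, two of which are cafés, so its `Café` share is 0.5.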


**Print each neighbourhood with its 15 most common venues**

In [37]:
num_top_venues = 15

for hood in toronto_grouped['Neighbourhood']:
    print("----"+hood+"----")
    temp = toronto_grouped[toronto_grouped['Neighbourhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Adelaide----
                  venue  freq
0           Coffee Shop  0.09
1                 Hotel  0.05
2             Gastropub  0.04
3        Cosmetics Shop  0.04
4                  Café  0.04
5   Japanese Restaurant  0.03
6                   Gym  0.03
7        Breakfast Spot  0.03
8   American Restaurant  0.03
9            Restaurant  0.03
10   Italian Restaurant  0.03
11          Salad Place  0.03
12               Bakery  0.02
13        Deli / Bodega  0.02
14         Burger Joint  0.02


----Bathurst Quay----
                   venue  freq
0            Coffee Shop  0.17
1                   Café  0.12
2                   Park  0.08
3                    Pub  0.04
4                 Tunnel  0.04
5                 Garden  0.04
6       Sushi Restaurant  0.04
7   Caribbean Restaurant  0.04
8                   Bank  0.04
9       Sculpture Garden  0.04
10                   Gym  0.04
11       Harbor / Marina  0.04
12                 Diner  0.04
13         Grocery Store  0.04
14   Japanese 

                  venue  freq
0           Coffee Shop  0.09
1                  Café  0.08
2                 Hotel  0.07
3            Restaurant  0.05
4             Gastropub  0.03
5                   Bar  0.03
6            Steakhouse  0.03
7          Burger Joint  0.03
8                Bakery  0.03
9   American Restaurant  0.03
10   Seafood Restaurant  0.03
11        Deli / Bodega  0.03
12     Asian Restaurant  0.03
13          Salad Place  0.02
14       Breakfast Spot  0.02


----Forest Hill North----
                       venue  freq
0                 Playground  0.25
1                       Park  0.25
2   Mediterranean Restaurant  0.25
3                       Bank  0.25
4          Accessories Store  0.00
5             Nightlife Spot  0.00
6               Optical Shop  0.00
7                Opera House  0.00
8                     Office  0.00
9    North Indian Restaurant  0.00
10              Noodle House  0.00
11                 Nightclub  0.00
12      Other Great Outdoors  0.00
13

                  venue  freq
0                Bakery  0.07
1      Sushi Restaurant  0.07
2           Coffee Shop  0.07
3    Italian Restaurant  0.05
4   Japanese Restaurant  0.04
5                  Bank  0.04
6          Burger Joint  0.04
7      Asian Restaurant  0.04
8              Tea Room  0.04
9        Cosmetics Shop  0.04
10       Sandwich Place  0.02
11                  Pub  0.02
12        Metro Station  0.02
13                 Pool  0.02
14        Deli / Bodega  0.02


----Little Portugal----
                 venue  freq
0                  Bar  0.12
1          Coffee Shop  0.09
2                 Café  0.09
3         Cocktail Bar  0.06
4               Bakery  0.06
5           Restaurant  0.06
6        Grocery Store  0.03
7             Pharmacy  0.03
8                 Park  0.03
9       Sandwich Place  0.03
10     Thai Restaurant  0.03
11            Boutique  0.03
12  Athletics & Sports  0.03
13   Korean Restaurant  0.03
14            Dive Bar  0.03


----Moore Park----
         

                 venue  freq
0       Sandwich Place  0.10
1       History Museum  0.10
2          Coffee Shop  0.10
3                 Café  0.07
4          Pizza Place  0.07
5                 Park  0.03
6             Pharmacy  0.03
7          Flower Shop  0.03
8               Museum  0.03
9           Steakhouse  0.03
10  Mexican Restaurant  0.03
11              Castle  0.03
12        Liquor Store  0.03
13           BBQ Joint  0.03
14        Burger Joint  0.03


----South Niagara----
                     venue  freq
0              Pizza Place  0.05
1              Yoga Studio  0.05
2                     Café  0.05
3                   Bakery  0.05
4              Coffee Shop  0.03
5   Furniture / Home Store  0.03
6             Optical Shop  0.03
7             Dessert Shop  0.03
8                    Diner  0.03
9                Bookstore  0.02
10                 Brewery  0.02
11              Shoe Store  0.02
12           Grocery Store  0.02
13               Gastropub  0.02
14               

                      venue  freq
0                      Café  0.18
1       Japanese Restaurant  0.07
2                      Park  0.07
3                 Bookstore  0.07
4         French Restaurant  0.04
5   Comfort Food Restaurant  0.04
6                       Gym  0.04
7               Coffee Shop  0.04
8     College Arts Building  0.04
9               College Gym  0.04
10                   Bakery  0.04
11                   Museum  0.04
12               Restaurant  0.04
13          Bubble Tea Shop  0.04
14         Video Game Store  0.04


----Victoria Hotel----
                   venue  freq
0            Coffee Shop  0.19
1            Pizza Place  0.08
2          Grocery Store  0.08
3                   Café  0.05
4                  Hotel  0.03
5                Library  0.03
6         Breakfast Spot  0.03
7    Filipino Restaurant  0.03
8          Smoothie Shop  0.03
9             Beer Store  0.03
10  Caribbean Restaurant  0.03
11                   Bar  0.03
12                  Bank  0.

**Write a function to sort the venues in descending order**

In [53]:
def return_most_common_venues(row, num_top_venues):
    # Skip the first entry (the neighbourhood name) and sort the remaining frequencies
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

num_top_venues = 15

# ordinal suffixes for the first three column names
indicators = ['st', 'nd', 'rd']

**Put the sorted venues in a pandas DataFrame**

In [56]:
columns = ['Neighbourhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        # positions 4 and above all take the 'th' suffix
        columns.append('{}th Most Common Venue'.format(ind+1))

neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighbourhood'] = toronto_grouped['Neighbourhood']

for ind in np.arange(toronto_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(toronto_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighbourhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,11th Most Common Venue,12th Most Common Venue,13th Most Common Venue,14th Most Common Venue,15th Most Common Venue
0,Adelaide,Coffee Shop,Hotel,Cosmetics Shop,Café,Gastropub,Restaurant,American Restaurant,Salad Place,Japanese Restaurant,Breakfast Spot,Italian Restaurant,Gym,Thai Restaurant,Seafood Restaurant,Steakhouse
1,Bathurst Quay,Coffee Shop,Café,Park,Pub,Diner,Caribbean Restaurant,Garden,Sculpture Garden,Dance Studio,Ramen Restaurant,Japanese Restaurant,Sushi Restaurant,Bank,Grocery Store,Harbor / Marina
2,Berczy Park,Coffee Shop,Café,Restaurant,Hotel,Bakery,Gastropub,Seafood Restaurant,Beer Bar,Italian Restaurant,Japanese Restaurant,Gym,Cocktail Bar,Creperie,Art Gallery,BBQ Joint
3,Brockton,Bar,Vietnamese Restaurant,Park,Grocery Store,Dive Bar,Korean Restaurant,South American Restaurant,Gastropub,Organic Grocery,French Restaurant,Jazz Club,Pizza Place,Coffee Shop,Portuguese Restaurant,Bakery
4,CN Tower,Coffee Shop,Hotel,Pizza Place,Italian Restaurant,Aquarium,Scenic Lookout,Gym,Concert Hall,Baseball Stadium,Restaurant,History Museum,Bar,Bistro,Fast Food Restaurant,Sports Bar
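
Picking the top categories for one neighbourhood comes down to ranking a single row of frequencies. The sketch below uses `Series.nlargest` as an alternative to a full sort; `top_categories` is a hypothetical name, and the row reuses a few of the Adelaide frequencies printed earlier.

```python
import pandas as pd

# One row of the grouped table: the name first, then category frequencies.
freq = pd.Series(
    {"Neighbourhood": "Adelaide", "Coffee Shop": 0.09, "Hotel": 0.05, "Gastropub": 0.04, "Zoo": 0.0}
)


def top_categories(row, n):
    """Return the n category names with the highest frequency in `row`."""
    values = row.iloc[1:].astype(float)  # drop the name and make the values numeric
    return values.nlargest(n).index.tolist()


print(top_categories(freq, 2))
```

When several categories share the same frequency, the order among them depends on the sorting method, which may explain why tied venues appear in a different order in the printed lists and in the table.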


**Cluster Neighbourhoods**

In [59]:
# set number of clusters
kclusters = 5
toronto_grouped_clustering = toronto_grouped.drop(columns='Neighbourhood')
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(toronto_grouped_clustering)
kmeans.labels_[0:15]

array([4, 4, 4, 0, 4, 4, 3, 0, 0, 4, 4, 0, 0, 4, 4], dtype=int32)
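
To see what the labels mean, it can help to run `KMeans` on a handful of frequency vectors where the grouping is obvious. This is a toy sketch with made-up values, not the notebook's data: two rows dominated by the first category and two dominated by the last.

```python
import numpy as np
from sklearn.cluster import KMeans

# Rows are neighbourhoods, columns are category shares (toy values).
X = np.array(
    [
        [0.9, 0.1, 0.0],  # café-heavy
        [0.8, 0.2, 0.0],  # café-heavy
        [0.0, 0.1, 0.9],  # park-heavy
        [0.1, 0.0, 0.9],  # park-heavy
    ]
)

# n_init is set explicitly because its default differs between
# scikit-learn releases.
model = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
labels = model.labels_
print(labels, model.inertia_)
```

The label numbers themselves are arbitrary; only which rows share a label carries meaning.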

**Merge data and add Latitude and Longitude**

In [60]:
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)
toronto_merged = neighborhoods
toronto_merged =toronto_merged.join(neighborhoods_venues_sorted.set_index('Neighbourhood'), on='Neighbourhood')
toronto_merged.head() 

Unnamed: 0,Borough,Neighbourhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,11th Most Common Venue,12th Most Common Venue,13th Most Common Venue,14th Most Common Venue,15th Most Common Venue
0,Downtown Toronto,Harbourfront,43.64008,-79.38015,4,Coffee Shop,Café,Hotel,Italian Restaurant,Restaurant,Pizza Place,Bakery,Gym,Sporting Goods Shop,Sports Bar,Brewery,Chinese Restaurant,Fried Chicken Joint,Steakhouse,Park
1,Downtown Toronto,Regent Park,43.660706,-79.360457,4,Coffee Shop,Thai Restaurant,Pub,Electronics Store,Fast Food Restaurant,Beer Store,Food Truck,Sushi Restaurant,Auto Dealership,Restaurant,Animal Shelter,Pharmacy,Indian Restaurant,Pet Store,Performing Arts Venue
2,Downtown Toronto,Ryerson,43.621573,-79.55913,4,Department Store,Coffee Shop,Arts & Crafts Store,Restaurant,Gift Shop,Burger Joint,Furniture / Home Store,Gym,Discount Store,Portuguese Restaurant,Sporting Goods Shop,Breakfast Spot,Pet Store,Women's Store,Electronics Store
3,Downtown Toronto,Garden District,43.656502,-79.377128,4,Coffee Shop,Clothing Store,Restaurant,Cosmetics Shop,Café,Fast Food Restaurant,Middle Eastern Restaurant,Hotel,Sporting Goods Shop,Japanese Restaurant,Spa,Tea Room,Theater,Movie Theater,Bookstore
4,Downtown Toronto,St. James Town,43.669403,-79.372704,4,Coffee Shop,Pizza Place,Grocery Store,Café,Diner,Library,Bar,Bank,Bakery,Restaurant,Beer Store,Market,Food & Drink Shop,Italian Restaurant,Pharmacy
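
A left join like the one above leaves `NaN` in `Cluster Labels` for any neighbourhood that has no row in the venue table, which also turns the column into floats. The sketch below shows one way to handle that before using the labels as list indices; the coordinates and the unmatched neighbourhood are illustrative.

```python
import pandas as pd

hoods = pd.DataFrame(
    {
        "Neighbourhood": ["Adelaide", "Brockton", "No Venues Here"],
        "Latitude": [43.65, 43.65, 43.70],      # illustrative coordinates
        "Longitude": [-79.38, -79.43, -79.40],
    }
)
labels = pd.DataFrame(
    {"Neighbourhood": ["Adelaide", "Brockton"], "Cluster Labels": [4, 0]}
)

merged = hoods.join(labels.set_index("Neighbourhood"), on="Neighbourhood")
# The unmatched row gets NaN, so the whole column becomes float.
print(merged["Cluster Labels"].tolist())

# Drop unmatched rows and restore integer labels.
clean = merged.dropna(subset=["Cluster Labels"]).copy()
clean["Cluster Labels"] = clean["Cluster Labels"].astype(int)
print(clean)
```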


**Create Map**

In [61]:
map_clusters = fl.Map(location=[latitude, longitude], zoom_start=13)
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(toronto_merged['Latitude'], toronto_merged['Longitude'], toronto_merged['Neighbourhood'], toronto_merged['Cluster Labels']):
    label = fl.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    fl.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

**Cluster 1**

In [66]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 0, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Neighbourhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,11th Most Common Venue,12th Most Common Venue,13th Most Common Venue,14th Most Common Venue,15th Most Common Venue
5,The Beaches,Beach,Coffee Shop,Tea Room,Salon / Barbershop,Park,Breakfast Spot,Thai Restaurant,Bar,Japanese Restaurant,Pub,Diner,Jewelry Store,Supermarket,Bakery,Bank
8,Christie,Korean Restaurant,Coffee Shop,Indian Restaurant,Dessert Shop,Mexican Restaurant,Ice Cream Shop,Sandwich Place,Café,Japanese Restaurant,Grocery Store,Cocktail Bar,Eastern European Restaurant,Karaoke Bar,Paper / Office Supplies Store,Spa
12,Dovercourt Village,Café,Restaurant,Pizza Place,Brazilian Restaurant,Bus Line,Coffee Shop,Park,Bar,Zoo,Dog Run,Doner Restaurant,Donut Shop,Dumpling Restaurant,Egyptian Restaurant,Eastern European Restaurant
13,Dufferin,Bar,Bakery,Café,Mexican Restaurant,Beer Store,Coffee Shop,Cocktail Bar,Clothing Store,Beer Bar,Sandwich Place,Fast Food Restaurant,Farmers Market,Skating Rink,Sushi Restaurant,Japanese Restaurant
17,Little Portugal,Bar,Coffee Shop,Café,Restaurant,Cocktail Bar,Bakery,Korean Restaurant,Japanese Restaurant,Thai Restaurant,Jazz Club,Athletics & Sports,Portuguese Restaurant,French Restaurant,Dive Bar,Sandwich Place
19,The Danforth West,Indian Restaurant,Coffee Shop,Pharmacy,Grocery Store,Bus Line,Spa,Fish & Chips Shop,Market,Fast Food Restaurant,Metro Station,Mexican Restaurant,Skating Rink,Middle Eastern Restaurant,Fried Chicken Joint,Doctor's Office
20,Riverdale,Vietnamese Restaurant,Chinese Restaurant,Grocery Store,Bakery,Fast Food Restaurant,Light Rail Station,Trail,Bar,Baseball Field,Breakfast Spot,Asian Restaurant,Neighborhood,French Restaurant,Dim Sum Restaurant,Coffee Shop
23,Brockton,Bar,Vietnamese Restaurant,Park,Grocery Store,Dive Bar,Korean Restaurant,South American Restaurant,Gastropub,Organic Grocery,French Restaurant,Jazz Club,Pizza Place,Coffee Shop,Portuguese Restaurant,Bakery
25,Parkdale Village,Café,Diner,Tibetan Restaurant,Pharmacy,Pizza Place,Bar,Indian Restaurant,Chinese Restaurant,Light Rail Station,Tea Room,Thrift / Vintage Store,North Indian Restaurant,Restaurant,Clothing Store,Liquor Store
26,The Beaches West,Beach,Coffee Shop,Tea Room,Salon / Barbershop,Park,Breakfast Spot,Thai Restaurant,Bar,Japanese Restaurant,Pub,Diner,Jewelry Store,Supermarket,Bakery,Bank


**Cluster 2**

In [67]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 1, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Neighbourhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,11th Most Common Venue,12th Most Common Venue,13th Most Common Venue,14th Most Common Venue,15th Most Common Venue
34,Forest Hill North,Playground,Mediterranean Restaurant,Park,Bank,Zoo,Egyptian Restaurant,Dog Run,Doner Restaurant,Donut Shop,Dumpling Restaurant,Eastern European Restaurant,Electronics Store,Dive Bar,Ethiopian Restaurant,Event Space
35,Forest Hill West,Playground,Mediterranean Restaurant,Park,Bank,Zoo,Egyptian Restaurant,Dog Run,Doner Restaurant,Donut Shop,Dumpling Restaurant,Eastern European Restaurant,Electronics Store,Dive Bar,Ethiopian Restaurant,Event Space
55,Forest Hill SE,Playground,Mediterranean Restaurant,Park,Bank,Zoo,Egyptian Restaurant,Dog Run,Doner Restaurant,Donut Shop,Dumpling Restaurant,Eastern European Restaurant,Electronics Store,Dive Bar,Ethiopian Restaurant,Event Space


**Cluster 3**

In [68]:
# Cluster 3 is label 2; keep Neighbourhood plus the ranked venue columns
toronto_merged.loc[toronto_merged['Cluster Labels'] == 2, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Neighbourhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,11th Most Common Venue,12th Most Common Venue,13th Most Common Venue,14th Most Common Venue,15th Most Common Venue
48,Swansea,Park,Skating Rink,Dance Studio,Social Club,Zoo,Dumpling Restaurant,Doctor's Office,Dog Run,Doner Restaurant,Donut Shop,Eastern European Restaurant,Discount Store,Egyptian Restaurant,Electronics Store,Ethiopian Restaurant
66,Rosedale,Park,Bike Trail,Playground,Egyptian Restaurant,Doctor's Office,Dog Run,Doner Restaurant,Donut Shop,Dumpling Restaurant,Eastern European Restaurant,Electronics Store,Discount Store,Ethiopian Restaurant,Event Space,Exhibit


**Cluster 4**

In [35]:
# Cluster 4 is label 3; keep Neighbourhood plus the ranked venue columns
toronto_merged.loc[toronto_merged['Cluster Labels'] == 3, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Neighbourhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
7,Central Bay Street,Campground,Castle,Zoo,Discount Store,Doctor's Office,Dog Run,Doner Restaurant,Donut Shop,Dumpling Restaurant,Eastern European Restaurant
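
The per-cluster cells repeat the same expression with a different label each time. One possible alternative is to build every cluster table in a single pass, which keeps all outputs in sync with the current state of the frame. This is a minimal sketch, not the notebook's own method: `cluster_tables`, its parameters, and the `demo` frame are hypothetical names and made-up data introduced here for illustration.

```python
import pandas as pd

# Hypothetical stand-in for toronto_merged; layout and values are made up.
demo = pd.DataFrame({
    "Postcode": ["M1A", "M2B", "M3C"],
    "Neighbourhood": ["A", "B", "C"],
    "Latitude": [43.65, 43.66, 43.67],
    "Longitude": [-79.38, -79.39, -79.40],
    "Cluster Labels": [1, 0, 1],
    "1st Most Common Venue": ["Park", "Coffee Shop", "Playground"],
    "2nd Most Common Venue": ["Bank", "Bar", "Park"],
})


def cluster_tables(df, label_col="Cluster Labels", name_pos=1, first_venue_pos=5):
    """Return {label: sub-frame} using the same row and column selection as the cells above."""
    keep = df.columns[[name_pos] + list(range(first_venue_pos, df.shape[1]))]
    return {
        int(label): df.loc[df[label_col] == label, keep]
        for label in sorted(df[label_col].unique())
    }


tables = cluster_tables(demo)
for label, table in tables.items():
    # Headings are one-based, labels are zero-based.
    print("Cluster {}".format(label + 1))
    print(table)
```

With the real frame, the call would presumably be `cluster_tables(toronto_merged)`, assuming the name column sits at position 1 and the venue columns start at position 5, as in the cells above.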


**Cluster 5**

In [69]:
# Cluster 5 is label 4; keep Neighbourhood plus the ranked venue columns
toronto_merged.loc[toronto_merged['Cluster Labels'] == 4, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Neighbourhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,11th Most Common Venue,12th Most Common Venue,13th Most Common Venue,14th Most Common Venue,15th Most Common Venue
0,Harbourfront,Coffee Shop,Café,Hotel,Italian Restaurant,Restaurant,Pizza Place,Bakery,Gym,Sporting Goods Shop,Sports Bar,Brewery,Chinese Restaurant,Fried Chicken Joint,Steakhouse,Park
1,Regent Park,Coffee Shop,Thai Restaurant,Pub,Electronics Store,Fast Food Restaurant,Beer Store,Food Truck,Sushi Restaurant,Auto Dealership,Restaurant,Animal Shelter,Pharmacy,Indian Restaurant,Pet Store,Performing Arts Venue
2,Ryerson,Department Store,Coffee Shop,Arts & Crafts Store,Restaurant,Gift Shop,Burger Joint,Furniture / Home Store,Gym,Discount Store,Portuguese Restaurant,Sporting Goods Shop,Breakfast Spot,Pet Store,Women's Store,Electronics Store
3,Garden District,Coffee Shop,Clothing Store,Restaurant,Cosmetics Shop,Café,Fast Food Restaurant,Middle Eastern Restaurant,Hotel,Sporting Goods Shop,Japanese Restaurant,Spa,Tea Room,Theater,Movie Theater,Bookstore
4,St. James Town,Coffee Shop,Pizza Place,Grocery Store,Café,Diner,Library,Bar,Bank,Bakery,Restaurant,Beer Store,Market,Food & Drink Shop,Italian Restaurant,Pharmacy
6,Berczy Park,Coffee Shop,Café,Restaurant,Hotel,Bakery,Gastropub,Seafood Restaurant,Beer Bar,Italian Restaurant,Japanese Restaurant,Gym,Cocktail Bar,Creperie,Art Gallery,BBQ Joint
9,Adelaide,Coffee Shop,Hotel,Cosmetics Shop,Café,Gastropub,Restaurant,American Restaurant,Salad Place,Japanese Restaurant,Breakfast Spot,Italian Restaurant,Gym,Thai Restaurant,Seafood Restaurant,Steakhouse
10,King,Coffee Shop,Restaurant,Café,Hotel,Italian Restaurant,Gastropub,American Restaurant,Japanese Restaurant,Gym,Deli / Bodega,Seafood Restaurant,Steakhouse,Bakery,Beer Bar,Breakfast Spot
11,Richmond,Coffee Shop,Café,Steakhouse,Bar,Thai Restaurant,Pizza Place,American Restaurant,Hotel,Pub,Bakery,Burrito Place,Japanese Restaurant,Theater,Arts & Crafts Store,Asian Restaurant
14,Harbourfront East,Coffee Shop,Café,Hotel,Italian Restaurant,Restaurant,Pizza Place,Bakery,Gym,Sporting Goods Shop,Sports Bar,Brewery,Chinese Restaurant,Fried Chicken Joint,Steakhouse,Park
