# Segmenting and Clustering Neighborhoods in Toronto
---
### Applied Data Science Capstone Project - Week 3
#### Carlos Sepúlveda
<h5> Part 3

In [2]:
!pip install geocoder
!conda install -c conda-forge folium=0.5.0 --yes
# Libraries needed
import pandas as pd
import requests
from bs4 import BeautifulSoup
import numpy as np
import geocoder
from geopy.geocoders import Nominatim
import folium
print('Libraries loaded!!!')


<h6> Scraping the web page

In [3]:
# webPage url
url = 'https://en.wikipedia.org/w/index.php?title=List_of_postal_codes_of_Canada:_M&oldid=942851379'
webPage = requests.get(url)

In [4]:
# Get soup with lxml parser
soup = BeautifulSoup(webPage.content, 'lxml')

In [5]:
#find Table
table = soup.find_all('table')[0]
#table

<h6> Beautiful Soup to pandas DataFrame

In [6]:
# DataFrame columns name
df_col_names = ['Postalcode', 'Borough', 'Neighborhood']
df = pd.DataFrame(columns = df_col_names)
#df

In [7]:
# Search the table for all the data needed: keep only rows with exactly three cells
for tr_marker in table.find_all('tr'):
    row_data=[]
    for td_marker in tr_marker.find_all('td'):
        row_data.append(td_marker.text.strip())
    if len(row_data)==3:
        df.loc[len(df)] = row_data

df.head()

Unnamed: 0,Postalcode,Borough,Neighborhood
0,M1A,Not assigned,Not assigned
1,M2A,Not assigned,Not assigned
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,Harbourfront


<h5> Cleaning data

Only process the cells that have an assigned borough. Ignore cells with a borough that is Not assigned.

In [8]:
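# ignore cells with a borough that is Not assigned;
# .copy() gives an independent frame so later assignments do not trigger SettingWithCopyWarning
df_clean = df[df['Borough'] != 'Not assigned'].copy()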
df_clean.head()

Unnamed: 0,Postalcode,Borough,Neighborhood
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,Harbourfront
5,M6A,North York,Lawrence Heights
6,M6A,North York,Lawrence Manor


If a cell has a borough but a Not assigned neighborhood, then the neighborhood will be the same as the borough.

In [9]:
# If a cell has a borough but a Not Assigned neighborhood, 
# then the neighborhood will be the same as borough
df_clean.loc[df_clean['Neighborhood']=='Not assigned','Neighborhood']=df_clean['Borough']
df_clean.head(20)


Unnamed: 0,Postalcode,Borough,Neighborhood
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,Harbourfront
5,M6A,North York,Lawrence Heights
6,M6A,North York,Lawrence Manor
7,M7A,Downtown Toronto,Queen's Park
9,M9A,Etobicoke,Islington Avenue
10,M1B,Scarborough,Rouge
11,M1B,Scarborough,Malvern
13,M3B,North York,Don Mills North
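
The two cleaning rules can be sketched on a small, made-up frame. This is a minimal illustration rather than the notebook's exact cells: the `sample` frame and its rows are assumptions, and the explicit `.copy()` is one common way to avoid pandas' chained-assignment warning when a filtered frame is modified afterwards.

```python
import pandas as pd

# Hypothetical rows mimicking the scraped table (not the real data)
sample = pd.DataFrame({
    'Postalcode': ['M1A', 'M3A', 'M7A'],
    'Borough': ['Not assigned', 'North York', "Queen's Park"],
    'Neighborhood': ['Not assigned', 'Parkwoods', 'Not assigned'],
})

# Rule 1: keep only rows with an assigned borough; copy so later writes go to a real frame
cleaned = sample[sample['Borough'] != 'Not assigned'].copy()

# Rule 2: an unassigned neighborhood inherits the borough name
missing = cleaned['Neighborhood'] == 'Not assigned'
cleaned.loc[missing, 'Neighborhood'] = cleaned.loc[missing, 'Borough']

print(cleaned)
```

Restricting both sides of the assignment to the `missing` rows keeps the intent explicit, although pandas would also align the full `Borough` column by index.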


More than one neighborhood can exist in one postal code area. For example, in the table on the Wikipedia page, you will notice that M5A is listed twice and has two neighborhoods: Harbourfront and Regent Park. These two rows will be combined into one row, with the neighborhoods separated by a comma, as shown in row 11 of the table above.

In [10]:
# rows with the same postalcode will be combined into one row with the neighborhoods
# separated with comma

df_grouped = df_clean.groupby(['Postalcode','Borough'], sort = False).agg(', '.join)
df_grouped.reset_index(inplace = True)
df_grouped.head(30)

Unnamed: 0,Postalcode,Borough,Neighborhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,Harbourfront
3,M6A,North York,"Lawrence Heights, Lawrence Manor"
4,M7A,Downtown Toronto,Queen's Park
5,M9A,Etobicoke,Islington Avenue
6,M1B,Scarborough,"Rouge, Malvern"
7,M3B,North York,Don Mills North
8,M4B,East York,"Woodbine Gardens, Parkview Hill"
9,M5B,Downtown Toronto,"Ryerson, Garden District"
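
To see what the grouping step does in isolation, here is a small sketch on invented rows. The `rows` frame is an assumption for illustration; selecting the `Neighborhood` column before `agg` is a slight variation on the cell above that makes it explicit which column is being joined.

```python
import pandas as pd

# Hypothetical rows: one postal code listed twice, one listed once
rows = pd.DataFrame({
    'Postalcode': ['M6A', 'M6A', 'M7A'],
    'Borough': ['North York', 'North York', 'Downtown Toronto'],
    'Neighborhood': ['Lawrence Heights', 'Lawrence Manor', "Queen's Park"],
})

# Join the neighborhood names of each (postal code, borough) pair with ', '
grouped = (
    rows.groupby(['Postalcode', 'Borough'], sort=False)['Neighborhood']
        .agg(', '.join)
        .reset_index()
)

print(grouped)
```

With `sort=False` the groups keep the order in which each postal code first appears, which matches the order of the scraped table.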


In the last cell of the notebook, use the **.shape** attribute to print the number of rows of the dataframe.

In [11]:
df_grouped.shape

(103, 3)

In [12]:
# Function from the code provided in the instructions
def get_coords(postal_code):

    # initialize your variable to None
    lat_lng_coords = None
    # loop until you get the coordinates
    while(lat_lng_coords is None):
        g = geocoder.google('{}, Toronto, Ontario'.format(postal_code))
        lat_lng_coords = g.latlng
    return lat_lng_coords

#get_coords('M1H')

In [13]:
#postal_codes = df_grouped['Postalcode']
#postal_codes_to_coords = [get_coords(postal_code) for postal_code in postal_codes.tolist()]

# Because the geocoder package is unreliable, we use the alternative method (the provided CSV file)
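
The `while` loop in `get_coords` never stops if the service keeps returning nothing. A bounded retry is one possible alternative; the sketch below is an assumption-laden illustration, not the course's code. The helper name `get_coords_with_retry`, its parameters, and the injected `lookup` callable are all hypothetical.

```python
import time

def get_coords_with_retry(lookup, postal_code, max_attempts=5, delay_seconds=1.0):
    """Call lookup(query) until it returns coordinates or the attempts run out.

    `lookup` is a hypothetical callable that returns [lat, lng] or None.
    """
    query = '{}, Toronto, Ontario'.format(postal_code)
    for attempt in range(max_attempts):
        coords = lookup(query)
        if coords is not None:
            return coords
        # wait before the next attempt, but not after the last one
        if attempt < max_attempts - 1:
            time.sleep(delay_seconds)
    return None  # the caller decides how to handle a missing result
```

With the geocoder package, the callable could be something like `lambda q: geocoder.google(q).latlng`, mirroring the original function.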

In [14]:
df_postal_codes_coords = pd.read_csv('http://cocl.us/Geospatial_data')
df_postal_codes_coords.rename(columns={'Postal Code':'Postalcode'}, inplace = True)
df_postal_codes_coords.head()

Unnamed: 0,Postalcode,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


In [15]:
df_postal_geo = pd.merge(df_grouped, df_postal_codes_coords, on='Postalcode')
df_postal_geo.head()

Unnamed: 0,Postalcode,Borough,Neighborhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.753259,-79.329656
1,M4A,North York,Victoria Village,43.725882,-79.315572
2,M5A,Downtown Toronto,Harbourfront,43.65426,-79.360636
3,M6A,North York,"Lawrence Heights, Lawrence Manor",43.718518,-79.464763
4,M7A,Downtown Toronto,Queen's Park,43.662301,-79.389494
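
`pd.merge` performs an inner join by default, so a postal code without coordinates would silently disappear from the result. The following sketch shows one way to check for that, using a left join with `indicator=True`. The two small frames are made up, and `'M9Z'` is an invented code used only to trigger the unmatched case.

```python
import pandas as pd

# Hypothetical inputs; 'M9Z' is a made-up code with no coordinates
neighborhoods = pd.DataFrame({
    'Postalcode': ['M3A', 'M4A', 'M9Z'],
    'Borough': ['North York', 'North York', 'Nowhere'],
})
coords = pd.DataFrame({
    'Postalcode': ['M3A', 'M4A'],
    'Latitude': [43.753259, 43.725882],
    'Longitude': [-79.329656, -79.315572],
})

# A left join keeps every neighborhood row; `_merge` records where each row matched
checked = pd.merge(neighborhoods, coords, on='Postalcode', how='left', indicator=True)
unmatched = checked.loc[checked['_merge'] == 'left_only', 'Postalcode'].tolist()

print(unmatched)
```

An empty `unmatched` list would confirm that the inner join used in the notebook loses no rows.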


In [16]:
# Get the coordinates of Toronto
geolocator = Nominatim(user_agent='My_notebook')
location = None
addr = 'Toronto, Ontario Canada'

# loop until you get the coordinates
while location is None:
    location = geolocator.geocode(addr)

lat = location.latitude
lon = location.longitude

print('Toronto is located at Latitude: {}, Longitude: {}'.format(lat, lon))


Toronto is located at Latitude: 43.6534817, Longitude: -79.3839347


<h5> Toronto neighborhoods map creation

In [17]:
# create map of Toronto using latitude and longitude values
toronto_map = folium.Map(location = [lat, lon], zoom_start=11)

# add markers (the loop variables are named so they do not overwrite the map centre lat/lon)
for nb_lat, nb_lon, borough, neighborhood in zip(df_postal_geo['Latitude'], df_postal_geo['Longitude'],
                                                 df_postal_geo['Borough'], df_postal_geo['Neighborhood']):
    label = '{},{}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html = True)
    folium.CircleMarker(
        [nb_lat, nb_lon],
        radius = 4,
        popup = label,
        color = 'blue',
        fill = True,
        fill_color = '#87cefa',
        fill_opacity = 0.5,
        parse_html = False).add_to(toronto_map)
    

In [18]:
toronto_map

For the sake of simplicity, the analysis targets only those boroughs whose names contain "Toronto".

In [19]:
# create a dataframe with the boroughs that have Toronto in their name,
# then sort by borough name and reset the index
df_boroughs_contain_toronto = df_postal_geo[df_postal_geo['Borough'].str.contains('Toronto')].reset_index(drop = True)
df_boroughs_contain_toronto.sort_values(by=['Borough'], inplace = True)
df_boroughs_contain_toronto.reset_index(drop = True, inplace = True)
print(df_boroughs_contain_toronto.shape)
df_boroughs_contain_toronto.head()


(39, 5)


Unnamed: 0,Postalcode,Borough,Neighborhood,Latitude,Longitude
0,M5N,Central Toronto,Roselawn,43.711695,-79.416936
1,M4P,Central Toronto,Davisville North,43.712751,-79.390197
2,M5P,Central Toronto,"Forest Hill North, Forest Hill West",43.696948,-79.411307
3,M4R,Central Toronto,North Toronto West,43.715383,-79.405678
4,M5R,Central Toronto,"The Annex, North Midtown, Yorkville",43.67271,-79.405678


In [20]:
# Make the borough names unique: the first row of each borough keeps its name,
# and every later row with the same borough gets its postal code appended as a suffix
borough_as_list = df_boroughs_contain_toronto['Borough'].tolist()
current_name = 'x'
for i, borough_name in enumerate(borough_as_list):
    if i==0:
        current_name = borough_name
    else:
        if df_boroughs_contain_toronto.loc[i].at['Borough'] == current_name:
            suffix = df_boroughs_contain_toronto.loc[i].at['Postalcode']
            df_boroughs_contain_toronto.at[i,'Borough'] = current_name + "-" +suffix
        else:
            current_name = df_boroughs_contain_toronto.loc[i].at['Borough']

df_boroughs_contain_toronto

Unnamed: 0,Postalcode,Borough,Neighborhood,Latitude,Longitude
0,M5N,Central Toronto,Roselawn,43.711695,-79.416936
1,M4P,Central Toronto-M4P,Davisville North,43.712751,-79.390197
2,M5P,Central Toronto-M5P,"Forest Hill North, Forest Hill West",43.696948,-79.411307
3,M4R,Central Toronto-M4R,North Toronto West,43.715383,-79.405678
4,M5R,Central Toronto-M5R,"The Annex, North Midtown, Yorkville",43.67271,-79.405678
5,M4S,Central Toronto-M4S,Davisville,43.704324,-79.38879
6,M4T,Central Toronto-M4T,"Moore Park, Summerhill East",43.689574,-79.38316
7,M4V,Central Toronto-M4V,"Deer Park, Forest Hill SE, Rathnelly, South Hi...",43.686412,-79.400049
8,M4N,Central Toronto-M4N,Lawrence Park,43.72802,-79.38879
9,M5T,Downtown Toronto,"Chinatown, Grange Park, Kensington Market",43.653206,-79.400049
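
The same renaming can be expressed without an explicit loop. The sketch below is one possible vectorized variant, shown on invented rows; it assumes, as in the cell above, that the frame is already sorted by borough so that the first occurrence is the one that keeps the plain name.

```python
import pandas as pd

# Hypothetical rows, already sorted by borough
boroughs = pd.DataFrame({
    'Postalcode': ['M5N', 'M4P', 'M5T', 'M4Y'],
    'Borough': ['Central Toronto', 'Central Toronto',
                'Downtown Toronto', 'Downtown Toronto'],
})

# True for every row whose borough name has already appeared further up
repeated = boroughs['Borough'].duplicated()

# Append the postal code only to the repeated names
boroughs.loc[repeated, 'Borough'] = (
    boroughs.loc[repeated, 'Borough'] + '-' + boroughs.loc[repeated, 'Postalcode']
)

print(boroughs['Borough'].tolist())
# ['Central Toronto', 'Central Toronto-M4P', 'Downtown Toronto', 'Downtown Toronto-M4Y']
```

Because the mask is computed before any name changes, each row is compared against the original borough names only.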


In [21]:
# create map of Toronto using latitude and longitude values
toronto_map2 = folium.Map(location = [lat, lon], zoom_start=12)

# add markers (the loop variables are named so they do not overwrite the map centre lat/lon)
for nb_lat, nb_lon, borough, neighborhood in zip(df_boroughs_contain_toronto['Latitude'], df_boroughs_contain_toronto['Longitude'],
                                                 df_boroughs_contain_toronto['Borough'], df_boroughs_contain_toronto['Neighborhood']):
    label = '{},{}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html = True)
    folium.CircleMarker(
        [nb_lat, nb_lon],
        radius = 4,
        popup = label,
        color = 'blue',
        fill = True,
        fill_color = '#87cefa',
        fill_opacity = 0.5,
        parse_html = False).add_to(toronto_map2)

In [22]:
toronto_map2

In [23]:
CLIENT_ID = 'YOUR_FOURSQUARE_CLIENT_ID'  # Foursquare client ID (use your own credential)
SECRET_KEY = 'YOUR_FOURSQUARE_CLIENT_SECRET'  # Foursquare client secret (use your own credential)
VERSION = '20180605'
LIMIT = 150

<h5>First create a function to get all the venues for each Neighborhood

In [24]:
# code from the Segmenting and Clustering Neighborhoods in New York City notebook
def getNearbyBorough(names, latitudes, longitudes, radius = 900, limit = LIMIT):
    
    venues_list = []
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
        
        # create the API request url
        url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            SECRET_KEY, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            limit)
        # make GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # keep only the relevant information for each nearby venue
        venues_list.append([(
            name,
            lat, 
            lng,
            v['venue']['name'],
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    # build the dataframe once, after all boroughs have been queried
    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Borough', 
                             'Borough Latitude', 
                             'Borough Longitude', 
                             'Venue', 
                             'Venue Latitude', 
                             'Venue Longitude', 
                             'Venue Category']
        
    return nearby_venues
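
The request and the parsing of the response can also be separated into two small helpers, which makes each part easier to check without calling the API. This is only a sketch under stated assumptions: the helper names `build_explore_url` and `extract_venues` are hypothetical, and the response layout (`response` / `groups` / `items` / `venue`) is taken from the function above rather than from the Foursquare documentation.

```python
from urllib.parse import urlencode

EXPLORE_ENDPOINT = 'https://api.foursquare.com/v2/venues/explore'

def build_explore_url(client_id, client_secret, version, lat, lng, radius, limit):
    """Build the explore URL; urlencode escapes the parameter values."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,
        'll': '{},{}'.format(lat, lng),
        'radius': radius,
        'limit': limit,
    }
    return EXPLORE_ENDPOINT + '?' + urlencode(params)

def extract_venues(payload):
    """Return (name, lat, lng, category) tuples, tolerating missing groups or categories."""
    groups = payload.get('response', {}).get('groups', [])
    items = groups[0].get('items', []) if groups else []
    rows = []
    for item in items:
        venue = item['venue']
        categories = venue.get('categories') or [{}]
        rows.append((
            venue['name'],
            venue['location']['lat'],
            venue['location']['lng'],
            categories[0].get('name'),  # None when the venue has no category
        ))
    return rows
```

A venue without a category then yields `None` instead of raising an `IndexError`, and an empty response yields an empty list.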

        

In [63]:
borough_with_Toronto_names_venues = getNearbyBorough(names=df_boroughs_contain_toronto['Borough'],
                                                    latitudes=df_boroughs_contain_toronto['Latitude'],
                                                    longitudes=df_boroughs_contain_toronto['Longitude'],
                                                    radius = 800 )


Central Toronto
Central Toronto-M4P
Central Toronto-M5P
Central Toronto-M4R
Central Toronto-M5R
Central Toronto-M4S
Central Toronto-M4T
Central Toronto-M4V
Central Toronto-M4N
Downtown Toronto
Downtown Toronto-M4Y
Downtown Toronto-M5S
Downtown Toronto-M5V
Downtown Toronto-M4W
Downtown Toronto-M5W
Downtown Toronto-M4X
Downtown Toronto-M5X
Downtown Toronto-M5A
Downtown Toronto-M5L
Downtown Toronto-M7A
Downtown Toronto-M5B
Downtown Toronto-M5K
Downtown Toronto-M5C
Downtown Toronto-M5E
Downtown Toronto-M5G
Downtown Toronto-M5J
Downtown Toronto-M6G
Downtown Toronto-M5H
East Toronto
East Toronto-M7Y
East Toronto-M4K
East Toronto-M4L
East Toronto-M4M
West Toronto
West Toronto-M6R
West Toronto-M6J
West Toronto-M6P
West Toronto-M6K
West Toronto-M6S


Let's check the resulting dataframe

In [64]:
print(borough_with_Toronto_names_venues.shape)
borough_with_Toronto_names_venues.head()

(2818, 7)


Unnamed: 0,Borough,Borough Latitude,Borough Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Central Toronto,43.711695,-79.416936,Ceiling Champions,43.713891,-79.420702,Home Service
1,Central Toronto,43.711695,-79.416936,Rosalind's Garden Oasis,43.712189,-79.411978,Garden
2,Central Toronto,43.711695,-79.416936,Lytton Park,43.714954,-79.41197,Playground
3,Central Toronto,43.711695,-79.416936,Anti Aging Clinic - Toronto,43.715772,-79.412294,Spa
4,Central Toronto,43.711695,-79.416936,Groomingdale's,43.716548,-79.422242,Pet Store


Let's check how many venues were returned for each borough

In [28]:
borough_with_Toronto_names_venues.groupby('Borough').count()

Borough,Borough Latitude,Borough Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Central Toronto,20,20,20,20,20,20
Central Toronto-M4N,9,9,9,9,9,9
Central Toronto-M4P,100,100,100,100,100,100
Central Toronto-M4R,42,42,42,42,42,42
Central Toronto-M4S,100,100,100,100,100,100
Central Toronto-M4T,62,62,62,62,62,62
Central Toronto-M4V,77,77,77,77,77,77
Central Toronto-M5P,47,47,47,47,47,47
Central Toronto-M5R,100,100,100,100,100,100
Downtown Toronto,100,100,100,100,100,100


Let's find out how many unique categories can be curated from all the returned venues

In [65]:
print('There are {} unique categories.'.format(len(borough_with_Toronto_names_venues['Venue Category'].unique())))

There are 287 unique categories.


<h4> Analyze Each Borough

In [66]:
# one hot encoding
toronto_borough_onehot = pd.get_dummies(borough_with_Toronto_names_venues[['Venue Category']], prefix ="", prefix_sep="")

# add borough column back to dataframe
toronto_borough_onehot['Borough'] = borough_with_Toronto_names_venues['Borough']

# move the borough column to the first position
fixed_colums = [toronto_borough_onehot.columns[-1]] + list(toronto_borough_onehot.columns[:-1])
toronto_borough_onehot = toronto_borough_onehot[fixed_colums]

toronto_borough_onehot.head(20)

Unnamed: 0,Borough,Accessories Store,Afghan Restaurant,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,...,Udon Restaurant,University,Vegetarian / Vegan Restaurant,Video Store,Vietnamese Restaurant,Whisky Bar,Wine Bar,Wine Shop,Women's Store,Yoga Studio
0,Central Toronto,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,Central Toronto,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,Central Toronto,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,Central Toronto,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,Central Toronto,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
5,Central Toronto,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
6,Central Toronto-M4P,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
7,Central Toronto-M4P,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
8,Central Toronto-M4P,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
9,Central Toronto-M4P,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


Next, let's group the rows by borough, taking the mean of the frequency of occurrence of each category.

In [67]:
toronto_group_by_borough = toronto_borough_onehot.groupby('Borough').mean().reset_index()
print(toronto_group_by_borough.shape)
toronto_group_by_borough

(39, 288)


Unnamed: 0,Borough,Accessories Store,Afghan Restaurant,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,...,Udon Restaurant,University,Vegetarian / Vegan Restaurant,Video Store,Vietnamese Restaurant,Whisky Bar,Wine Bar,Wine Shop,Women's Store,Yoga Studio
0,Central Toronto,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,Central Toronto-M4N,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,Central Toronto-M4P,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.019231,0.0,0.0,0.0,0.019231,0.0,0.0,0.0
3,Central Toronto-M4R,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.032258
4,Central Toronto-M4S,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.012195,...,0.0,0.0,0.012195,0.0,0.0,0.0,0.0,0.0,0.0,0.02439
5,Central Toronto-M4T,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
6,Central Toronto-M4V,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.015625,...,0.0,0.0,0.015625,0.0,0.015625,0.0,0.0,0.0,0.0,0.015625
7,Central Toronto-M5P,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
8,Central Toronto-M5R,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.024691,...,0.0,0.0,0.037037,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9,Downtown Toronto,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.01,0.0,0.06,0.0,0.03,0.0,0.01,0.0,0.0,0.02
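
Taking the mean of the one-hot columns per borough is the same as computing the share of each category among that borough's venues. The sketch below illustrates this on made-up data by comparing the result with a row-normalized `pd.crosstab`; the `venues` frame is an assumption for illustration only.

```python
import pandas as pd

# Hypothetical venues: borough A has 3, borough B has 2
venues = pd.DataFrame({
    'Borough': ['A', 'A', 'A', 'B', 'B'],
    'Venue Category': ['Café', 'Park', 'Café', 'Park', 'Gym'],
})

# One-hot encode the category, then average per borough
onehot = pd.get_dummies(venues['Venue Category']).astype(int)
onehot.insert(0, 'Borough', venues['Borough'])
freq_from_onehot = onehot.groupby('Borough').mean()

# The same shares, computed directly as row-normalized counts
freq_from_crosstab = pd.crosstab(venues['Borough'], venues['Venue Category'],
                                 normalize='index')

print(freq_from_onehot)
print(freq_from_crosstab)
```

In borough A, two of three venues are cafés, so both tables show a frequency of about 0.67 for `Café`.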


Let's print each borough along with its top 10 most common venues

In [68]:
num_top_venues = 10

for borough in toronto_group_by_borough['Borough']:
    print("----"+borough+"----")
    temp = toronto_group_by_borough[toronto_group_by_borough['Borough'] == borough].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Central Toronto----
                     venue  freq
0               Playground  0.33
1                   Garden  0.17
2             Home Service  0.17
3                      Spa  0.17
4                Pet Store  0.17
5                   Museum  0.00
6  New American Restaurant  0.00
7             Neighborhood  0.00
8               Nail Salon  0.00
9              Music Venue  0.00


----Central Toronto-M4N----
                  venue  freq
0             Bookstore  0.12
1           Coffee Shop  0.12
2                Lawyer  0.12
3                  Café  0.12
4                  Park  0.12
5              Bus Line  0.12
6            Restaurant  0.12
7  Gym / Fitness Center  0.12
8          Music School  0.00
9                Museum  0.00


----Central Toronto-M4P----
                venue  freq
0  Italian Restaurant  0.08
1         Coffee Shop  0.08
2         Pizza Place  0.08
3                Café  0.06
4        Dessert Shop  0.04
5                Park  0.04
6            Pharmacy  0.04

Let's put that into a pandas dataframe

In [69]:
# First, let's write a function to sort the venues in descending order
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending = False)
    return row_categories_sorted.index.values[0:num_top_venues]


Now let's create the new dataframe and display the top 10 venues for each borough

In [74]:
indicators = ['st','nd','rd']

# create columns according to the number of top venues
columns = ['Borough']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        # beyond the 3rd position, fall back to the 'th' suffix
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
borough_venues_sorted = pd.DataFrame(columns = columns)
borough_venues_sorted['Borough'] = toronto_group_by_borough['Borough']

for ind in np.arange(toronto_group_by_borough.shape[0]):
    borough_venues_sorted.iloc[ind, 1:] = return_most_common_venues(toronto_group_by_borough.iloc[ind,:], num_top_venues)

borough_venues_sorted.head(10)


Unnamed: 0,Borough,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Central Toronto,Playground,Home Service,Spa,Pet Store,Garden,Comic Shop,Concert Hall,Falafel Restaurant,Comedy Club,Event Space
1,Central Toronto-M4N,Lawyer,Bookstore,Café,Gym / Fitness Center,Coffee Shop,Park,Restaurant,Bus Line,Yoga Studio,Doner Restaurant
2,Central Toronto-M4P,Pizza Place,Italian Restaurant,Coffee Shop,Café,Gym,Dessert Shop,Pharmacy,Sushi Restaurant,Park,Food & Drink Shop
3,Central Toronto-M4R,Coffee Shop,Skating Rink,Italian Restaurant,Café,Diner,Yoga Studio,Spa,Salon / Barbershop,Restaurant,Rental Car Location
4,Central Toronto-M4S,Italian Restaurant,Coffee Shop,Sushi Restaurant,Café,Dessert Shop,Sandwich Place,Gym,Pizza Place,Restaurant,Bar
5,Central Toronto-M4T,Park,Grocery Store,Playground,Thai Restaurant,Candy Store,Sandwich Place,Café,Japanese Restaurant,Gym,Dumpling Restaurant
6,Central Toronto-M4V,Coffee Shop,Sushi Restaurant,Thai Restaurant,Italian Restaurant,Pub,Grocery Store,Sandwich Place,Restaurant,Pizza Place,Bank
7,Central Toronto-M5P,Italian Restaurant,Park,Coffee Shop,Gastropub,Café,Dry Cleaner,Bank,Bakery,Bagel Shop,Japanese Restaurant
8,Central Toronto-M5R,Café,Coffee Shop,Pub,Italian Restaurant,Restaurant,Vegetarian / Vegan Restaurant,Pizza Place,History Museum,Sandwich Place,Burger Joint
9,Downtown Toronto,Café,Bar,Vegetarian / Vegan Restaurant,Coffee Shop,Vietnamese Restaurant,Mexican Restaurant,Dessert Shop,Art Gallery,Yoga Studio,Dumpling Restaurant
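
The column names rely on a `try`/`except` around a three-element list, which works for the top 10 but would produce labels such as "21th" for larger values. A small helper is one way to make the suffix logic explicit; `ordinal` is a hypothetical name introduced here for illustration.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st', 11 -> '11th'."""
    if 10 <= n % 100 <= 20:
        suffix = 'th'  # 11th, 12th, 13th and the other teens
    else:
        suffix = {1: 'st', 2: 'nd', 3: 'rd'}.get(n % 10, 'th')
    return '{}{}'.format(n, suffix)

num_top_venues = 10
columns = ['Borough'] + ['{} Most Common Venue'.format(ordinal(i))
                         for i in range(1, num_top_venues + 1)]

print(columns[1], '|', columns[-1])
# 1st Most Common Venue | 10th Most Common Venue
```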


##### Cluster Boroughs
---
    
We run k-means to cluster the boroughs.
We use the same number of clusters as in the New York example, because the analyzed area is small and it's possible to find a lot of

In [75]:
from sklearn.cluster import KMeans
import matplotlib.cm as cm
import matplotlib.colors as colors
kclusters = 5
toronto_grouped_clustering = toronto_group_by_borough.drop('Borough', axis=1)

# run k-means clustering
kmeans = KMeans(n_clusters = kclusters, random_state = 4).fit(toronto_grouped_clustering)

# check the cluster labels generated for each row in the dataframe
kmeans.labels_[0:10]

array([1, 0, 3, 3, 3, 4, 3, 0, 3, 3], dtype=int32)
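
The number of clusters here is carried over from the New York example. If one wanted to check that choice, a common approach is to compare silhouette scores for several values of k. The sketch below does this on synthetic points, not on the Toronto data; the three blob centres, the noise level, and the range of k are all assumptions made for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Synthetic data: three tight, well-separated groups of 15 points each
rng = np.random.RandomState(0)
centers = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]])
points = np.vstack([c + 0.2 * rng.randn(15, 2) for c in centers])

# Fit k-means for several k and record the silhouette score of each fit
scores = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(points)
    scores[k] = silhouette_score(points, labels)

# The k with the highest score is the best separated under this criterion
best_k = max(scores, key=scores.get)
print(best_k)
```

On the notebook's data, `points` would be replaced by `toronto_grouped_clustering`; the score is only a heuristic and does not replace inspecting the clusters.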

Let's create a new dataframe that includes the cluster as well as the top venues for each borough

In [76]:
borough_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)
toronto_merged = df_boroughs_contain_toronto

# merge toronto_grouped with toronto
toronto_merged = toronto_merged.join(borough_venues_sorted.set_index('Borough'), on = 'Borough')
toronto_merged.head()

Unnamed: 0,Postalcode,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M5N,Central Toronto,Roselawn,43.711695,-79.416936,1,Playground,Home Service,Spa,Pet Store,Garden,Comic Shop,Concert Hall,Falafel Restaurant,Comedy Club,Event Space
1,M4P,Central Toronto-M4P,Davisville North,43.712751,-79.390197,3,Pizza Place,Italian Restaurant,Coffee Shop,Café,Gym,Dessert Shop,Pharmacy,Sushi Restaurant,Park,Food & Drink Shop
2,M5P,Central Toronto-M5P,"Forest Hill North, Forest Hill West",43.696948,-79.411307,0,Italian Restaurant,Park,Coffee Shop,Gastropub,Café,Dry Cleaner,Bank,Bakery,Bagel Shop,Japanese Restaurant
3,M4R,Central Toronto-M4R,North Toronto West,43.715383,-79.405678,3,Coffee Shop,Skating Rink,Italian Restaurant,Café,Diner,Yoga Studio,Spa,Salon / Barbershop,Restaurant,Rental Car Location
4,M5R,Central Toronto-M5R,"The Annex, North Midtown, Yorkville",43.67271,-79.405678,3,Café,Coffee Shop,Pub,Italian Restaurant,Restaurant,Vegetarian / Vegan Restaurant,Pizza Place,History Museum,Sandwich Place,Burger Joint


Visualize the resulting clusters

In [78]:
# create map
map_clusters = folium.Map(location = [lat, lon], zoom_start = 12)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0,1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
for la, lg, poi, cluster in zip(toronto_merged['Latitude'],toronto_merged['Longitude'], 
                                toronto_merged['Borough'], toronto_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html = True)
    folium.CircleMarker(
        [la, lg],
        radius = 4,
        popup = label,
        color = rainbow[cluster-1],
        fill = True,
        fill_color = rainbow[cluster-1],
        fill_opacity = 0.7).add_to(map_clusters)

map_clusters
    


#### Examine Clusters
---
Now, we can examine each cluster and determine the discriminating venue categories that distinguish each cluster.

In [62]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 0, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
4,Central Toronto-M5R,0,Café,Coffee Shop,Gym,Restaurant,Pub,Vegetarian / Vegan Restaurant,Italian Restaurant,Grocery Store,Museum,Bakery
9,Downtown Toronto,0,Café,Bar,Vegetarian / Vegan Restaurant,Coffee Shop,Mexican Restaurant,Vietnamese Restaurant,Art Gallery,Beer Bar,Bakery,Pizza Place
11,Downtown Toronto-M5S,0,Café,Bar,Coffee Shop,Restaurant,Vegetarian / Vegan Restaurant,Bakery,Mexican Restaurant,Bookstore,Grocery Store,Pub
26,Downtown Toronto-M6G,0,Korean Restaurant,Café,Coffee Shop,Grocery Store,Cocktail Bar,Ice Cream Shop,Mexican Restaurant,Park,Comedy Club,Karaoke Bar
31,East Toronto-M4L,0,Indian Restaurant,Coffee Shop,Grocery Store,Park,Beach,Café,Restaurant,Sandwich Place,Brewery,Gym
32,East Toronto-M4M,0,Coffee Shop,Bar,Café,Diner,Brewery,Vietnamese Restaurant,American Restaurant,Bakery,French Restaurant,Italian Restaurant
33,West Toronto,0,Park,Café,Coffee Shop,Bar,Sushi Restaurant,Pharmacy,Brewery,Art Gallery,Portuguese Restaurant,Italian Restaurant
34,West Toronto-M6R,0,Café,Coffee Shop,Pizza Place,Sushi Restaurant,Bar,Bakery,Sandwich Place,Restaurant,Pub,Breakfast Spot
35,West Toronto-M6J,0,Café,Bar,Coffee Shop,Bakery,Restaurant,Asian Restaurant,Pizza Place,Italian Restaurant,Cocktail Bar,Vietnamese Restaurant
36,West Toronto-M6P,0,Café,Bar,Coffee Shop,Convenience Store,Park,Thai Restaurant,Italian Restaurant,Sushi Restaurant,Nail Salon,Antique Shop


In [53]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 1, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Central Toronto,1,Sushi Restaurant,Coffee Shop,Italian Restaurant,Bank,Pharmacy,Skating Rink,Bakery,Bagel Shop,Gastropub,Japanese Restaurant


In [54]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 2, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,Central Toronto-M4P,2,Coffee Shop,Italian Restaurant,Dessert Shop,Café,Sushi Restaurant,Pizza Place,Pharmacy,Gym,Yoga Studio,Sandwich Place
3,Central Toronto-M4R,2,Skating Rink,Italian Restaurant,Coffee Shop,Diner,Café,Park,Mexican Restaurant,Food & Drink Shop,Tea Room,Chinese Restaurant
4,Central Toronto-M5R,2,Café,Coffee Shop,Gym,Restaurant,Pub,Vegetarian / Vegan Restaurant,Italian Restaurant,Grocery Store,Museum,Bakery
5,Central Toronto-M4S,2,Coffee Shop,Italian Restaurant,Sushi Restaurant,Gym,Dessert Shop,Restaurant,Pub,Pizza Place,Café,Middle Eastern Restaurant
6,Central Toronto-M4T,2,Grocery Store,Italian Restaurant,Coffee Shop,Pub,Park,Thai Restaurant,Gym,Café,Bank,Sandwich Place
7,Central Toronto-M4V,2,Coffee Shop,Park,Sushi Restaurant,Italian Restaurant,Thai Restaurant,Grocery Store,Pub,Café,Spa,Bank
10,Downtown Toronto-M4Y,2,Coffee Shop,Japanese Restaurant,Park,Gay Bar,Italian Restaurant,Men's Store,Restaurant,Ramen Restaurant,Burger Joint,Café
12,Downtown Toronto-M5V,2,Coffee Shop,Harbor / Marina,Café,Garden,Scenic Lookout,Park,Track,Dog Run,Dance Studio,Sculpture Garden
13,Downtown Toronto-M4W,2,Coffee Shop,Grocery Store,Park,Playground,Athletics & Sports,Filipino Restaurant,Candy Store,Breakfast Spot,Bistro,Office
15,Downtown Toronto-M4X,2,Park,Japanese Restaurant,Coffee Shop,Gastropub,Café,Diner,Filipino Restaurant,Caribbean Restaurant,Jewelry Store,Taiwanese Restaurant


In [55]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 3, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
2,Central Toronto-M5P,3,Park,Café,Bank,Coffee Shop,Skating Rink,Bakery,Sushi Restaurant,Trail,Burger Joint,Italian Restaurant
33,West Toronto,3,Park,Café,Coffee Shop,Bar,Sushi Restaurant,Pharmacy,Brewery,Art Gallery,Portuguese Restaurant,Italian Restaurant
36,West Toronto-M6P,3,Café,Bar,Coffee Shop,Convenience Store,Park,Thai Restaurant,Italian Restaurant,Sushi Restaurant,Nail Salon,Antique Shop


In [79]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 4, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
6,Central Toronto-M4T,4,Park,Grocery Store,Playground,Thai Restaurant,Candy Store,Sandwich Place,Café,Japanese Restaurant,Gym,Dumpling Restaurant
