<a href="https://cognitiveclass.ai"><img src = "https://ibm.box.com/shared/static/9gegpsmnsoo25ikkbl4qzlvlyjbgxs5x.png" width = 400> </a>

<h1 align=center><font size = 5>Segmenting and Clustering Neighborhoods in Toronto</font></h1>

## Introduction
For this assignment, you will be required to explore and cluster the neighborhoods in Toronto.

Start by creating a new Notebook for this assignment.
Use the Notebook to build the code to scrape the following Wikipedia page, https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M, in order to obtain the data that is in the table of postal codes and to transform the data into a pandas dataframe like the one shown below:

## Table of Contents

<div class="alert alert-block alert-info" style="margin-top: 20px">

<font size = 3>

1. <a href="#item1">To create the dataframe</a>

2. <a href="#item2">To obtain latitude and longitude</a>

3. <a href="#item3">To explore and visualize neighborhood</a>
  
</font>
</div>

In [16]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analsysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

#!conda install -c conda-forge geopy --yes # uncomment this line if you haven't completed the Foursquare API lab
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas.io.json import json_normalize # tranform JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

#!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't completed the Foursquare API lab
import folium # map rendering library

print('Libraries imported.')

Libraries imported.


<a id='item1'></a>

## 1. To create the dataframe

The dataframe will consist of three columns: PostalCode, Borough, and Neighborhood
Only process the cells that have an assigned borough. Ignore cells with a borough that is Not assigned.
More than one neighborhood can exist in one postal code area. For example, in the table on the Wikipedia page, you will notice that M5A is listed twice and has two neighborhoods: Harbourfront and Regent Park. These two rows will be combined into one row with the neighborhoods separated with a comma as shown in row 11 in the above table.
If a cell has a borough but a Not assigned neighborhood, then the neighborhood will be the same as the borough.
Clean your Notebook and add Markdown cells to explain your work and any assumptions you are making.
In the last cell of your notebook, use the .shape method to print the number of rows of your dataframe.

Submit a link to your Notebook on your Github repository. (10 marks)

Note: There are different website scraping libraries and packages in Python. For scraping the above table, you can simply use pandas to read the table into a pandas dataframe.

Another way, which would help to learn for more complicated cases of web scraping is using the BeautifulSoup package. Here is the package's main documentation page: http://beautiful-soup-4.readthedocs.io/en/latest/

The package is so popular that there is a plethora of tutorials and examples on how to use it. Here is a very good Youtube video on how to use the BeautifulSoup package: https://www.youtube.com/watch?v=ng2o98k983k

Use pandas, or the BeautifulSoup package, or any other way you are comfortable with to transform the data in the table on the Wikipedia page into the above pandas dataframe.

#### Download data from the link, assign to the dataframe toronto_df:

In [17]:
import pandas as pd
import io
import requests
url="https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M"
c=pd.read_html(url)
toronto_df = c[0]
print('Data downloaded!')
print(toronto_df.head())
print(toronto_df.shape)

Data downloaded!
  Postal code           Borough                Neighborhood
0         M1A      Not assigned                         NaN
1         M2A      Not assigned                         NaN
2         M3A        North York                   Parkwoods
3         M4A        North York            Victoria Village
4         M5A  Downtown Toronto  Regent Park / Harbourfront
(180, 3)


#### Drop rows with unassigned Borough:

In [18]:
toronto_df=toronto_df[toronto_df['Borough']!='Not assigned']
print("After dropping not assigned values")
toronto_df.reset_index(drop=True, inplace=True)
print(toronto_df.head())
print(toronto_df.shape)

After dropping not assigned values
  Postal code           Borough                                  Neighborhood
0         M3A        North York                                     Parkwoods
1         M4A        North York                              Victoria Village
2         M5A  Downtown Toronto                    Regent Park / Harbourfront
3         M6A        North York             Lawrence Manor / Lawrence Heights
4         M7A  Downtown Toronto  Queen's Park / Ontario Provincial Government
(103, 3)


#### Take care of the rows that have a borough but a Not assigned neighborhood

In [19]:
ind=(toronto_df['Neighborhood']=='Not assigned')
print(ind.sum())
toronto_df.loc[ind, 'Neighborhood']=toronto_df.loc[ind, 'Borough']
print("After assigning neighborhood")
print(toronto_df.head())
print(toronto_df.shape)

0
After assigning neighborhood
  Postal code           Borough                                  Neighborhood
0         M3A        North York                                     Parkwoods
1         M4A        North York                              Victoria Village
2         M5A  Downtown Toronto                    Regent Park / Harbourfront
3         M6A        North York             Lawrence Manor / Lawrence Heights
4         M7A  Downtown Toronto  Queen's Park / Ontario Provincial Government
(103, 3)


A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  This is separate from the ipykernel package so we can avoid doing imports until


#### Check if any duplicate postal code:

In [20]:
print('There are {} uniques categories.'.format(len(toronto_df['Postal code'].unique())))

There are 103 uniques categories.


#### No duplicate postal code, so no need to handle, change the separation from / to , :

In [21]:
toronto_df['Neighborhood'] = [x.replace(' /', ',') for x in toronto_df['Neighborhood']]
toronto_df.rename(columns={'Postal code':'PostalCode'}, inplace=True)
#toronto_df = toronto_df.sort_values(by=['PostalCode','Borough'])
print(toronto_df.head())
print(toronto_df.shape)

  PostalCode           Borough                                 Neighborhood
0        M3A        North York                                    Parkwoods
1        M4A        North York                             Victoria Village
2        M5A  Downtown Toronto                    Regent Park, Harbourfront
3        M6A        North York             Lawrence Manor, Lawrence Heights
4        M7A  Downtown Toronto  Queen's Park, Ontario Provincial Government
(103, 3)


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  """Entry point for launching an IPython kernel.


<a id='item2'></a>

## 2. To obtain latitude and longitude

Now that you have built a dataframe of the postal code of each neighborhood along with the borough name and neighborhood name, in order to utilize the Foursquare location data, we need to get the latitude and the longitude coordinates of each neighborhood.

In an older version of this course, we were leveraging the Google Maps Geocoding API to get the latitude and the longitude coordinates of each neighborhood. However, recently Google started charging for their API: http://geoawesomeness.com/developers-up-in-arms-over-google-maps-api-insane-price-hike/, so we will use the Geocoder Python package instead: https://geocoder.readthedocs.io/index.html.

The problem with this Package is you have to be persistent sometimes in order to get the geographical coordinates of a given postal code. So you can make a call to get the latitude and longitude coordinates of a given postal code and the result would be None, and then make the call again and you would get the coordinates. So, in order to make sure that you get the coordinates for all of our neighborhoods, you can run a while loop for each postal code. Taking postal code M5G as an example, your code would look something like this

Given that this package can be very unreliable, in case you are not able to get the geographical coordinates of the neighborhoods using the Geocoder package, here is a link to a csv file that has the geographical coordinates of each postal code: http://cocl.us/Geospatial_data

Use the Geocoder package or the csv file to create the following dataframe

Important Note: There is a limit on how many times you can call geocoder.google function. It is 2500 times per day. This should be way more than enough for you to get acquainted with the package and to use it to get the geographical coordinates of the neighborhoods in the Toronto.

Once you are able to create the above dataframe, submit a link to the new Notebook on your Github repository. (2 marks)


In [22]:
!pip install geocoder



#### To retrieve latitude and longitude

In [24]:
#Try out the geocoder with a single example
import geocoder
entry=toronto_df.iloc[0, :]
locat=None
while locat==None:
    g = geocoder.arcgis('{}, {}, {}'.format(entry['PostalCode'], entry['Neighborhood'], entry['Borough']))
    locat=g.latlng
locat

[43.75293455500008, -79.33564142299997]

In [25]:
toronto_df['Latitude']=None
toronto_df['Longitude']=None
for i in range(toronto_df.shape[0]):
    entry=toronto_df.iloc[i, :]
    locat=None
    while locat==None:
        g = geocoder.arcgis('{}, {}, {}'.format(entry['PostalCode'], entry['Neighborhood'], entry['Borough']))
        locat=g.latlng
    entry['Latitude']=locat[0]
    entry['Longitude']=locat[1]
    toronto_df.iloc[i, :]=entry
toronto_df.shape
toronto_df.head()


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  """Entry point for launching an IPython kernel.
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  if __name__ == '__main__':
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-vie

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.7529,-79.3356
1,M4A,North York,Victoria Village,43.7281,-79.3119
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.651,-79.353
3,M6A,North York,"Lawrence Manor, Lawrence Heights",43.7236,-79.4371
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.6618,-79.3894


<a id='item3'></a>

## 3. To explore and visualize neighborhood

Explore and cluster the neighborhoods in Toronto. You can decide to work with only boroughs that contain the word Toronto and then replicate the same analysis we did to the New York City data. It is up to you.

Just make sure:

to add enough Markdown cells to explain what you decided to do and to report any observations you make.
to generate maps to visualize your neighborhoods and how they cluster together.
Once you are happy with your analysis, submit a link to the new Notebook on your Github repository. (3 marks)

#### To obtain the latitude and longitude of Toronto

In [26]:
address = 'Toronto'
geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geograpical coordinate of Toronto are {}, {}.'.format(latitude, longitude))

The geograpical coordinate of Toronto are 43.6534817, -79.3839347.


#### Keep those contain Toronto in Borough, also, due to the unstable package, some of the returned latitude and longitude are not correct, so the csv file is used in this section.

In [123]:
latlon_df=pd.read_csv('Geospatial_Coordinates.csv')
latlon_df.rename(columns={'Postal Code':'PostalCode'}, inplace=True)
toronto_df.loc[toronto_df.PostalCode.isin(latlon_df.PostalCode), ['Latitude', 'Longitude']] = latlon_df[['Latitude', 'Longitude']]
toronto_df

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  This is separate from the ipykernel package so we can avoid doing imports until


Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.8067,-79.1944
1,M4A,North York,Victoria Village,43.7845,-79.1605
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.7636,-79.1887
3,M6A,North York,"Lawrence Manor, Lawrence Heights",43.771,-79.2169
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.7731,-79.2395
5,M9A,Etobicoke,Islington Avenue,43.7447,-79.2395
6,M1B,Scarborough,"Malvern, Rouge",43.7279,-79.262
7,M3B,North York,Don Mills,43.7111,-79.2846
8,M4B,East York,"Parkview Hill, Woodbine Gardens",43.7163,-79.2395
9,M5B,Downtown Toronto,"Garden District, Ryerson",43.6927,-79.2648


In [124]:
df=toronto_df[toronto_df['Borough'].str.contains('Toronto')].reset_index(drop=True)
df.head()

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude
0,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.7636,-79.1887
1,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.7731,-79.2395
2,M5B,Downtown Toronto,"Garden District, Ryerson",43.6927,-79.2648
3,M5C,Downtown Toronto,St. James Town,43.7995,-79.3184
4,M4E,East Toronto,The Beaches,43.7869,-79.386


#### Create map of Toronto using latitude and longitude values with the neighborhood popup

In [125]:
# create map of Toronto using latitude and longitude values
map_toronto = folium.Map(location=[latitude, longitude], zoom_start=10)
# add markers to map
for lat, lng, borough, neighborhood in zip(df['Latitude'], df['Longitude'], df['Borough'], df['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto)  
    
map_toronto

#### Define Foursquare Credentials and Version

In [32]:
CLIENT_ID = 'OV32LUAEQG2RHIS4WBUEIRTR0BPOQQDYANRHECEKBJMBKDL4' # your Foursquare ID
CLIENT_SECRET = 'LPOAE4QCO02XWITIXU1DTWENZOECEKYMOU0EPNFIOO3IJTKJ' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

print('Your credentails:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentails:
CLIENT_ID: OV32LUAEQG2RHIS4WBUEIRTR0BPOQQDYANRHECEKBJMBKDL4
CLIENT_SECRET:LPOAE4QCO02XWITIXU1DTWENZOECEKYMOU0EPNFIOO3IJTKJ


####  Retrieve venues information in all the neighborhoods in Toronto

In [126]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

LIMIT = 100
toronto_venues = getNearbyVenues(names=df['Neighborhood'],
                                   latitudes=df['Latitude'],
                                   longitudes=df['Longitude']
                                  )
toronto_venues.head()

Regent Park, Harbourfront
Queen's Park, Ontario Provincial Government
Garden District, Ryerson
St. James Town
The Beaches
Berczy Park
Central Bay Street
Christie
Richmond, Adelaide, King
Dufferin, Dovercourt Village
Harbourfront East, Union Station, Toronto Islands
Little Portugal, Trinity
The Danforth West, Riverdale
Toronto Dominion Centre, Design Exchange
Brockton, Parkdale Village, Exhibition Place
India Bazaar, The Beaches West
Commerce Court, Victoria Hotel
Studio District
Lawrence Park
Roselawn
Davisville North
Forest Hill North & West
High Park, The Junction South
North Toronto West
The Annex, North Midtown, Yorkville
Parkdale, Roncesvalles
Davisville
University of Toronto, Harbord
Runnymede, Swansea
Moore Park, Summerhill East
Kensington Market, Chinatown, Grange Park
Summerhill West, Rathnelly, South Hill, Forest Hill SE, Deer Park
CN Tower, King and Spadina, Railway Lands, Harbourfront West, Bathurst Quay, South Niagara, Island airport
Rosedale
Stn A PO Boxes
St. James Town,

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,"Regent Park, Harbourfront",43.763573,-79.188711,RBC Royal Bank,43.76679,-79.191151,Bank
1,"Regent Park, Harbourfront",43.763573,-79.188711,G & G Electronics,43.765309,-79.191537,Electronics Store
2,"Regent Park, Harbourfront",43.763573,-79.188711,Big Bite Burrito,43.766299,-79.19072,Mexican Restaurant
3,"Regent Park, Harbourfront",43.763573,-79.188711,Enterprise Rent-A-Car,43.764076,-79.193406,Rental Car Location
4,"Regent Park, Harbourfront",43.763573,-79.188711,Woburn Medical Centre,43.766631,-79.192286,Medical Center


#### Explore the data

In [130]:
print(toronto_venues.shape)
print('There are {} uniques categories.'.format(len(toronto_venues['Venue Category'].unique())))
toronto_venues.groupby('Neighborhood').count()

(758, 7)
There are 200 uniques categories.


Unnamed: 0_level_0,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Berczy Park,2,2,2,2,2,2
"Brockton, Parkdale Village, Exhibition Place",40,40,40,40,40,40
Business reply mail Processing CentrE,4,4,4,4,4,4
"CN Tower, King and Spadina, Railway Lands, Harbourfront West, Bathurst Quay, South Niagara, Island airport",17,17,17,17,17,17
Central Bay Street,7,7,7,7,7,7
Christie,3,3,3,3,3,3
Church and Wellesley,7,7,7,7,7,7
"Commerce Court, Victoria Hotel",1,1,1,1,1,1
Davisville,3,3,3,3,3,3
Davisville North,57,57,57,57,57,57


#### Analyze the data

In [131]:
# one hot encoding
toronto_onehot = pd.get_dummies(toronto_venues[['Venue Category']], prefix="", prefix_sep="")
# add neighborhood column back to dataframe
toronto_onehot['Neighborhood'] = toronto_venues['Neighborhood'] 
toronto_onehot.head()

Unnamed: 0,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Antique Shop,Art Gallery,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Auto Workshop,BBQ Joint,Baby Store,Bagel Shop,Bakery,Bank,Bar,Baseball Field,Basketball Stadium,Bed & Breakfast,Beer Bar,Beer Store,Belgian Restaurant,Bistro,Board Shop,Boat or Ferry,Bookstore,Boutique,Breakfast Spot,Brewery,Bubble Tea Shop,Building,Burger Joint,Burrito Place,Bus Line,Bus Stop,Business Service,Butcher,Cafeteria,Café,Candy Store,Caribbean Restaurant,Cheese Shop,Chinese Restaurant,Church,Clothing Store,Cocktail Bar,Coffee Shop,College Rec Center,College Stadium,Comfort Food Restaurant,Comic Shop,Concert Hall,Construction & Landscaping,Convenience Store,Cosmetics Shop,Coworking Space,Creperie,Cuban Restaurant,Curling Ice,Dance Studio,Deli / Bodega,Department Store,Dessert Shop,Diner,Discount Store,Dog Run,Doner Restaurant,Dumpling Restaurant,Eastern European Restaurant,Electronics Store,Ethiopian Restaurant,Falafel Restaurant,Farmers Market,Fast Food Restaurant,Field,Filipino Restaurant,Fish & Chips Shop,Fish Market,Flower Shop,Food,Food & Drink Shop,Food Court,Food Service,Food Truck,Fountain,French Restaurant,Fried Chicken Joint,Frozen Yogurt Shop,Fruit & Vegetable Store,Furniture / Home Store,Gaming Cafe,Garden,Garden Center,Gas Station,Gastropub,Gay Bar,General Entertainment,General Travel,Gift Shop,Gluten-free Restaurant,Gourmet Shop,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Hakka Restaurant,Harbor / Marina,Hardware Store,Health Food Store,Hockey Arena,Home Service,Hookah Bar,Hospital,Hotel,IT Services,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Intersection,Italian Restaurant,Japanese Restaurant,Jazz Club,Juice Bar,Lake,Latin American Restaurant,Light Rail Station,Lingerie Store,Liquor Store,Lounge,Martial Arts Dojo,Medical Center,Mediterranean Restaurant,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Mobile Phone Shop,Modern European Restaurant,Molecular Gastronomy Restaurant,Movie Theater,Museum,Music Venue,Neighborhood,New American Restaurant,Nightclub,Office,Optical Shop,Organic Grocery,Other Great Outdoors,Park,Pet Store,Pharmacy,Pizza Place,Playground,Plaza,Poke Place,Pool,Portuguese Restaurant,Poutine Place,Pub,Ramen Restaurant,Record Shop,Rental Car Location,Restaurant,Salad Place,Salon / Barbershop,Sandwich Place,Sculpture Garden,Seafood Restaurant,Shoe Store,Shopping Mall,Skate Park,Skating Rink,Smoke Shop,Social Club,Soup Place,Spa,Sporting Goods Shop,Stationery Store,Steakhouse,Supermarket,Supplement Shop,Sushi Restaurant,Tailor Shop,Tanning Salon,Tea Room,Thai Restaurant,Theater,Toy / Game Store,Trail,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Wine Bar,Wings Joint,Women's Store,Yoga Studio
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,"Regent Park, Harbourfront",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,"Regent Park, Harbourfront",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,"Regent Park, Harbourfront",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,"Regent Park, Harbourfront",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,"Regent Park, Harbourfront",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


#### Group by category and take the mean to find the frequency of each venue

In [50]:
toronto_grouped = toronto_onehot.groupby('Neighborhood').mean().reset_index()
type(toronto_grouped)
print(toronto_grouped.shape)
toronto_grouped.head()

(36, 212)


Unnamed: 0,Neighborhood,African Restaurant,American Restaurant,Antique Shop,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,BBQ Joint,Bakery,Bank,Bar,Basketball Stadium,Beer Bar,Beer Store,Belgian Restaurant,Bistro,Bookstore,Boutique,Brazilian Restaurant,Breakfast Spot,Brewery,Bridal Shop,Bubble Tea Shop,Building,Burger Joint,Burrito Place,Bus Line,Bus Stop,Butcher,Cable Car,Cafeteria,Café,Cajun / Creole Restaurant,Candy Store,Caribbean Restaurant,Cheese Shop,Chinese Restaurant,Chocolate Shop,Church,Churrascaria,Clothing Store,Cocktail Bar,Coffee Shop,College Arts Building,College Auditorium,College Gym,College Rec Center,Colombian Restaurant,Comedy Club,Comfort Food Restaurant,Comic Shop,Concert Hall,Convenience Store,Cosmetics Shop,Costume Shop,Creperie,Cuban Restaurant,Cupcake Shop,Dance Studio,Deli / Bodega,Department Store,Dessert Shop,Diner,Discount Store,Distribution Center,Dog Run,Donut Shop,Dumpling Restaurant,Eastern European Restaurant,Ethiopian Restaurant,Event Space,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant,Filipino Restaurant,Fish Market,Flea Market,Flower Shop,Food & Drink Shop,Food Court,Food Truck,Fountain,French Restaurant,Fried Chicken Joint,Frozen Yogurt Shop,Fruit & Vegetable Store,Furniture / Home Store,Gaming Cafe,Garden,Gas Station,Gastropub,Gay Bar,General Entertainment,General Travel,Gift Shop,Gluten-free Restaurant,Gourmet Shop,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Gym Pool,Harbor / Marina,Health Food Store,Historic Site,History Museum,Hobby Shop,Hot Dog Joint,Hotel,IT Services,Ice Cream Shop,Indian Restaurant,Intersection,Irish Pub,Italian Restaurant,Japanese Restaurant,Jazz Club,Jewelry Store,Juice Bar,Korean Restaurant,Latin American Restaurant,Library,Liquor Store,Lounge,Market,Martial Arts Dojo,Mediterranean Restaurant,Men's Store,Metro Station,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Modern European Restaurant,Monument / Landmark,Movie Theater,Museum,Music Store,Music Venue,Nail Salon,New American Restaurant,Noodle House,Office,Opera House,Optical Shop,Organic Grocery,Paper / Office Supplies Store,Park,Performing Arts Venue,Peruvian Restaurant,Pet Store,Pharmacy,Pie Shop,Pilates Studio,Pizza Place,Playground,Plaza,Poke Place,Pool,Pub,Ramen Restaurant,Record Shop,Restaurant,Rock Climbing Spot,Roof Deck,Sake Bar,Salad Place,Salon / Barbershop,Sandwich Place,Sculpture Garden,Seafood Restaurant,Shoe Store,Shopping Mall,Skating Rink,Smoke Shop,Snack Place,Soccer Field,Soup Place,South American Restaurant,Souvlaki Shop,Spa,Spanish Restaurant,Speakeasy,Sporting Goods Shop,Steakhouse,Street Art,Strip Club,Supermarket,Sushi Restaurant,Taco Place,Tailor Shop,Taiwanese Restaurant,Tanning Salon,Tapas Restaurant,Tea Room,Tech Startup,Thai Restaurant,Theater,Theme Park,Theme Restaurant,Toy / Game Store,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Wine Bar,Women's Store,Yoga Studio
0,Berczy Park,0.0,0.04,0.0,0.01,0.0,0.0,0.03,0.0,0.0,0.01,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.02,0.0,0.01,0.03,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.06,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.02,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.02,0.0,0.0,0.01,0.0,0.01,0.0,0.01,0.0,0.04,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.01,0.0,0.0,0.0,0.01,0.02,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.05,0.0,0.0,0.0,0.03,0.01,0.01,0.0,0.02,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.03,0.0,0.0,0.0,0.02,0.01,0.0,0.0,0.0,0.0,0.02,0.0,0.02,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.01,0.0
1,"Brockton, Parkdale Village, Exhibition Place",0.0,0.03,0.0,0.01,0.0,0.0,0.03,0.0,0.0,0.01,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.07,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.01,0.01,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.02,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.01,0.0,0.0,0.02,0.04,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.01,0.01,0.01,0.03,0.0,0.0,0.0,0.01,0.02,0.02,0.0,0.02,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.01,0.0,0.02,0.0,0.0,0.0,0.02,0.01,0.0,0.0,0.0,0.0,0.02,0.0,0.02,0.02,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.01,0.0,0.0
2,"CN Tower, King and Spadina, Railway Lands, Har...",0.0,0.0,0.0,0.015625,0.0,0.015625,0.0,0.0,0.0,0.015625,0.0,0.03125,0.0,0.03125,0.0,0.0,0.0,0.015625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.015625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.078125,0.0,0.0,0.0,0.0,0.0,0.015625,0.0,0.0,0.0,0.0,0.015625,0.0,0.0,0.0,0.0,0.015625,0.0,0.0,0.015625,0.015625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.015625,0.0,0.0,0.0,0.015625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.015625,0.0,0.0,0.0,0.015625,0.046875,0.015625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03125,0.0,0.0,0.03125,0.0,0.0,0.0625,0.015625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.015625,0.015625,0.0,0.0,0.0,0.015625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.015625,0.0,0.0,0.03125,0.0,0.015625,0.0,0.0,0.0,0.0,0.03125,0.0,0.0,0.0,0.0,0.015625,0.0,0.0,0.046875,0.0,0.0,0.0,0.0,0.0,0.03125,0.0,0.015625,0.015625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.015625,0.015625,0.0,0.015625,0.015625,0.015625,0.0,0.0,0.0,0.015625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.015625,0.0,0.0,0.0,0.0,0.0,0.03125
3,Central Bay Street,0.0,0.0,0.0,0.013514,0.013514,0.0,0.0,0.0,0.0,0.0,0.013514,0.013514,0.0,0.0,0.0,0.0,0.0,0.013514,0.0,0.0,0.013514,0.0,0.0,0.027027,0.0,0.0,0.013514,0.0,0.0,0.0,0.0,0.0,0.027027,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.081081,0.013514,0.162162,0.0,0.0,0.0,0.013514,0.0,0.0,0.0,0.013514,0.0,0.0,0.027027,0.0,0.0,0.0,0.0,0.0,0.013514,0.013514,0.0,0.027027,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.013514,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.013514,0.0,0.0,0.013514,0.0,0.0,0.0,0.013514,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.013514,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.040541,0.0,0.0,0.0,0.0,0.0,0.013514,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.013514,0.027027,0.0,0.013514,0.0,0.013514,0.0,0.0,0.013514,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.013514,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.027027,0.0,0.0,0.013514,0.013514,0.0,0.027027,0.0,0.0,0.0,0.0,0.0,0.027027,0.0,0.0,0.013514,0.013514,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.013514,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.013514,0.0,0.0,0.0,0.013514,0.0,0.013514,0.0,0.0,0.027027,0.0,0.0,0.0,0.0,0.0,0.0,0.013514,0.0,0.013514,0.0,0.013514,0.0
4,Christie,0.0,0.04,0.0,0.01,0.0,0.0,0.03,0.0,0.0,0.01,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.02,0.0,0.01,0.03,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.06,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.02,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.02,0.0,0.0,0.01,0.0,0.01,0.0,0.01,0.0,0.04,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.01,0.0,0.0,0.0,0.01,0.02,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.05,0.0,0.0,0.0,0.03,0.01,0.01,0.0,0.02,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.03,0.0,0.0,0.0,0.02,0.01,0.0,0.0,0.0,0.0,0.02,0.0,0.02,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.01,0.0


#### Print each neighborhood along with the top 5 most common venues

In [132]:
num_top_venues = 5

for hood in toronto_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = toronto_grouped[toronto_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    #print(temp.head())
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Berczy Park----
                 venue  freq
0                 Café  0.06
1          Coffee Shop  0.05
2           Restaurant  0.05
3                Hotel  0.05
4  American Restaurant  0.04


----Brockton, Parkdale Village, Exhibition Place----
                 venue  freq
0          Coffee Shop  0.07
1                Hotel  0.05
2  Japanese Restaurant  0.04
3                 Café  0.04
4           Restaurant  0.03


----CN Tower, King and Spadina, Railway Lands, Harbourfront West, Bathurst Quay, South Niagara, Island airport----
                venue  freq
0         Coffee Shop  0.08
1  Italian Restaurant  0.06
2                 Gym  0.05
3          Restaurant  0.05
4         Yoga Studio  0.03


----Central Bay Street----
            venue  freq
0     Coffee Shop  0.16
1  Clothing Store  0.08
2           Hotel  0.04
3  Cosmetics Shop  0.03
4            Café  0.03


----Christie----
                 venue  freq
0                 Café  0.06
1          Coffee Shop  0.05
2           R

#### Let's put that into a *pandas* dataframe

First, let's write a function to sort the venues in descending order.

In [133]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

Now let's create the new dataframe and display the top 10 venues for each neighborhood.

In [134]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = toronto_grouped['Neighborhood']

for ind in np.arange(toronto_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(toronto_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Berczy Park,Café,Coffee Shop,Restaurant,Hotel,American Restaurant,Gym,Salad Place,Breakfast Spot,Asian Restaurant,Steakhouse
1,"Brockton, Parkdale Village, Exhibition Place",Coffee Shop,Hotel,Café,Japanese Restaurant,Restaurant,American Restaurant,Asian Restaurant,Thai Restaurant,Sushi Restaurant,Bookstore
2,"CN Tower, King and Spadina, Railway Lands, Har...",Coffee Shop,Italian Restaurant,Restaurant,Gym,Yoga Studio,Beer Bar,Hotel,Bar,French Restaurant,Pizza Place
3,Central Bay Street,Coffee Shop,Clothing Store,Hotel,Theater,Sandwich Place,Café,Restaurant,Plaza,Diner,Bubble Tea Shop
4,Christie,Café,Coffee Shop,Restaurant,Hotel,American Restaurant,Gym,Salad Place,Breakfast Spot,Asian Restaurant,Steakhouse


<a id='item4'></a>

#### Run *k*-means to cluster the neighborhood into 5 clusters.

In [138]:
from sklearn.model_selection import GridSearchCV
parameters = {'n_clusters':range(3, 10)}
lr = KMeans()
kmeans = GridSearchCV(lr, parameters)
kmeans.fit(toronto_grouped_clustering)
print(kmeans.predict(toronto_grouped_clustering))
labels=kmeans.predict(toronto_grouped_clustering)



[6 6 6 2 6 6 1 6 6 6 6 6 8 6 6 6 2 6 4 3 6 2 6 6 7 0 6 2 6 6 5 6 0 6 6 6]


In [139]:
# add clustering labels
neighborhoods_venues_sorted['Cluster Labels']=labels

In [140]:
toronto_merged = df

# merge toronto_grouped with toronto_data to add latitude/longitude for each neighborhood
toronto_merged = toronto_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

toronto_merged.head() # check the last columns!

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,Cluster Labels
0,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.7636,-79.1887,Pub,Athletics & Sports,Café,Performing Arts Venue,Music Venue,Seafood Restaurant,Mexican Restaurant,Chocolate Shop,Distribution Center,Food Truck,6.0
1,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.7731,-79.2395,Coffee Shop,Sushi Restaurant,Café,Middle Eastern Restaurant,Italian Restaurant,Juice Bar,Spa,Burger Joint,Burrito Place,Sandwich Place,2.0
2,M5B,Downtown Toronto,"Garden District, Ryerson",43.6927,-79.2648,Fast Food Restaurant,Pizza Place,Bank,Café,Pharmacy,Breakfast Spot,Rock Climbing Spot,Intersection,Gastropub,Pet Store,6.0
3,M5C,Downtown Toronto,St. James Town,43.7995,-79.3184,Coffee Shop,Grocery Store,Pizza Place,Diner,Metro Station,Filipino Restaurant,Market,Bus Stop,Sandwich Place,Library,2.0
4,M4E,East Toronto,The Beaches,43.7869,-79.386,Health Food Store,Pub,Trail,Park,Church,Discount Store,Farmers Market,Farm,Falafel Restaurant,Event Space,0.0


In [141]:
toronto_merged

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,Cluster Labels
0,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.7636,-79.1887,Pub,Athletics & Sports,Café,Performing Arts Venue,Music Venue,Seafood Restaurant,Mexican Restaurant,Chocolate Shop,Distribution Center,Food Truck,6.0
1,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.7731,-79.2395,Coffee Shop,Sushi Restaurant,Café,Middle Eastern Restaurant,Italian Restaurant,Juice Bar,Spa,Burger Joint,Burrito Place,Sandwich Place,2.0
2,M5B,Downtown Toronto,"Garden District, Ryerson",43.6927,-79.2648,Fast Food Restaurant,Pizza Place,Bank,Café,Pharmacy,Breakfast Spot,Rock Climbing Spot,Intersection,Gastropub,Pet Store,6.0
3,M5C,Downtown Toronto,St. James Town,43.7995,-79.3184,Coffee Shop,Grocery Store,Pizza Place,Diner,Metro Station,Filipino Restaurant,Market,Bus Stop,Sandwich Place,Library,2.0
4,M4E,East Toronto,The Beaches,43.7869,-79.386,Health Food Store,Pub,Trail,Park,Church,Discount Store,Farmers Market,Farm,Falafel Restaurant,Event Space,0.0
5,M5E,Downtown Toronto,Berczy Park,43.7575,-79.3747,Café,Coffee Shop,Restaurant,Hotel,American Restaurant,Gym,Salad Place,Breakfast Spot,Asian Restaurant,Steakhouse,6.0
6,M5G,Downtown Toronto,Central Bay Street,43.7827,-79.4423,Coffee Shop,Clothing Store,Hotel,Theater,Sandwich Place,Café,Restaurant,Plaza,Diner,Bubble Tea Shop,2.0
7,M6G,Downtown Toronto,Christie,43.7533,-79.3297,Café,Coffee Shop,Restaurant,Hotel,American Restaurant,Gym,Salad Place,Breakfast Spot,Asian Restaurant,Steakhouse,6.0
8,M5H,Downtown Toronto,"Richmond, Adelaide, King",43.7375,-79.4648,Coffee Shop,Café,Restaurant,Clothing Store,Thai Restaurant,Salad Place,Gym,Hotel,Sushi Restaurant,Deli / Bodega,6.0
9,M6H,West Toronto,"Dufferin, Dovercourt Village",43.739,-79.5069,Café,Bar,Bakery,Restaurant,Sushi Restaurant,Coffee Shop,Pub,Comedy Club,Grocery Store,Market,6.0


#### Some entries are with NAN values because of no venue information available, drop those rows.

In [142]:
toronto_merged.dropna(inplace=True)
toronto_merged.reset_index(drop=True)

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,Cluster Labels
0,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.7636,-79.1887,Pub,Athletics & Sports,Café,Performing Arts Venue,Music Venue,Seafood Restaurant,Mexican Restaurant,Chocolate Shop,Distribution Center,Food Truck,6.0
1,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.7731,-79.2395,Coffee Shop,Sushi Restaurant,Café,Middle Eastern Restaurant,Italian Restaurant,Juice Bar,Spa,Burger Joint,Burrito Place,Sandwich Place,2.0
2,M5B,Downtown Toronto,"Garden District, Ryerson",43.6927,-79.2648,Fast Food Restaurant,Pizza Place,Bank,Café,Pharmacy,Breakfast Spot,Rock Climbing Spot,Intersection,Gastropub,Pet Store,6.0
3,M5C,Downtown Toronto,St. James Town,43.7995,-79.3184,Coffee Shop,Grocery Store,Pizza Place,Diner,Metro Station,Filipino Restaurant,Market,Bus Stop,Sandwich Place,Library,2.0
4,M4E,East Toronto,The Beaches,43.7869,-79.386,Health Food Store,Pub,Trail,Park,Church,Discount Store,Farmers Market,Farm,Falafel Restaurant,Event Space,0.0
5,M5E,Downtown Toronto,Berczy Park,43.7575,-79.3747,Café,Coffee Shop,Restaurant,Hotel,American Restaurant,Gym,Salad Place,Breakfast Spot,Asian Restaurant,Steakhouse,6.0
6,M5G,Downtown Toronto,Central Bay Street,43.7827,-79.4423,Coffee Shop,Clothing Store,Hotel,Theater,Sandwich Place,Café,Restaurant,Plaza,Diner,Bubble Tea Shop,2.0
7,M6G,Downtown Toronto,Christie,43.7533,-79.3297,Café,Coffee Shop,Restaurant,Hotel,American Restaurant,Gym,Salad Place,Breakfast Spot,Asian Restaurant,Steakhouse,6.0
8,M5H,Downtown Toronto,"Richmond, Adelaide, King",43.7375,-79.4648,Coffee Shop,Café,Restaurant,Clothing Store,Thai Restaurant,Salad Place,Gym,Hotel,Sushi Restaurant,Deli / Bodega,6.0
9,M6H,West Toronto,"Dufferin, Dovercourt Village",43.739,-79.5069,Café,Bar,Bakery,Restaurant,Sushi Restaurant,Coffee Shop,Pub,Comedy Club,Grocery Store,Market,6.0


#### Finally, let's visualize the resulting clusters

In [143]:
kclusters=len(np.unique(labels))
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)
# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
#print(ys)
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]
#print(kclusters)
# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(toronto_merged['Latitude'], toronto_merged['Longitude'], toronto_merged['Neighborhood'], toronto_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    #print(cluster)
    #print(poi)
    #print(lat)
    #print(lon)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[int(cluster)-1],
        fill=True,
        fill_color=rainbow[int(cluster)-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

<hr>

Copyright &copy; 2018 [Cognitive Class](https://cognitiveclass.ai/?utm_source=bducopyrightlink&utm_medium=dswb&utm_campaign=bdu). This notebook and its source code are released under the terms of the [MIT License](https://bigdatauniversity.com/mit-license/).