## Segmenting and Clustering Neighborhoods in Toronto, Canada

#### Note: This notebook is for the assignment of Week 3 of Class 9 _Applied Data Science Capstone_

## Introduction

In this notebook, I will use web scraping to get information and do segmenting and clustering neighborhoods in Toronto, which is similar to what we did for the Neighborhoods in New York City.

## Table of Contents

### 1. Download and Prepare Dataset  
  #### 1.1. Import packages/libraries 
  #### 1.2. Download data and prepare the pandas dataframe  
  #### 1.3. Data Cleaning 
  #### 1.4. Add the coordinates of each neighborhood to the dataframe   
### 2. Explore and cluster the neighborhoods in Toronto.   
  #### 2.1. Explore Neighborhoods in Toronto  
  #### 2.2. Analyze Each Neighborhood  
  #### 2.3. Cluster Neighborhoods  
  #### 2.4. Examine Clusters with my observation and conclusions  

## 1. Download and Prepare Dataset

### 1.1 Import packages/libraries

In [2]:
# Use BeautifulSoup 4 for Web scraping and Requests
from bs4 import BeautifulSoup
import requests
import csv

#####  Download all other dependencies that we will need.

In [3]:
import numpy as np # library to handle data in a vectorized manner
import pandas as pd # library for data analsysis

In [5]:
import json # library to handle JSON files

#!conda install -c conda-forge geopy --yes # uncomment this line if you haven't completed the Foursquare API lab
!conda install -c conda-forge geopy --yes
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas.io.json import json_normalize # tranform JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

Collecting package metadata: ...working... done
Solving environment: ...working... done

## Package Plan ##

  environment location: C:\Users\cheny\Anaconda3

  added / updated specs:
    - geopy


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    conda-4.6.14               |           py37_0         2.1 MB  conda-forge
    geographiclib-1.49         |             py_0          32 KB  conda-forge
    geopy-1.19.0               |             py_0          53 KB  conda-forge
    ------------------------------------------------------------
                                           Total:         2.2 MB

The following NEW packages will be INSTALLED:

  geographiclib      conda-forge/noarch::geographiclib-1.49-py_0
  geopy              conda-forge/noarch::geopy-1.19.0-py_0

The following packages will be UPDATED:

  conda                                       4.6.12-py37_0 --> 4.6.14-py37_0




'pA' is not recognized as an internal or external command,
operable program or batch file.


Libraries imported.


In [6]:
#!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't completed the Foursquare API lab
import folium # map rendering library

print('Libraries imported.')

Libraries imported.


### 1.2 Download data and prepare the pandas dataframe
    Here, we will try several different ways to scrape the table and prepare the dataframe.

### Method 1:  Using pandas for web scraping tables:  
            This is the most convenivent way to scrape tables from webpages

In [145]:
# https://www.youtube.com/watch?v=sAuGH1Kto2I 
websource='https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'
# use pd.read_html to get all the tables in the html page and save it in a list -- dfs
dfs = pd.read_html(websource, header=0)  
# the 1st table of dfs is the table we would like to use.
df = dfs[0]
df.head()

Unnamed: 0,Postcode,Borough,Neighbourhood
0,M1A,Not assigned,Not assigned
1,M2A,Not assigned,Not assigned
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,Harbourfront


In [132]:
df.shape

(288, 3)

The dataframe obtained above, df, will be cleaned in the next section 1.3 Data Cleaning.

### Method 2:  Using BeautifulSoup 4 for web scraping tables:

In [13]:
# Use BeautifulSoup 4 for Web scraping and Requests
from bs4 import BeautifulSoup
import requests
import csv

In [15]:
# for the wikipedia webpage of week3_class9: https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M
source = requests.get('https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M').text

# https://beautiful-soup-4.readthedocs.io/en/latest/#making-the-soup
# This table summarizes the advantages and disadvantages of each parser library.

# let's creat a soup object soup to get the table info:

soup = BeautifulSoup(source, 'lxml')  # 'lxml' or 'html5', 'html5lib','html.parser','lxml-xml'
## Note:  None of the above parser library can read the "thead". "thead = soup.table.thead" return as None.
##        They only recogonize "tbody" and omit "thead" !!!!

# print(soup.prettify())  # it works

# find the table info; # print(table1.prettify()) : it works.
table = soup.find('table')

In [103]:
# with open('cms_scrape.csv', 'w') as csv_file:   # this way does not need csv_file.close() in the end.
csv_file = open('cms_scrape.csv', 'w')  # Do not forget csv_file.close() in the end!!! 

# use csv.writer
csv_writer = csv.writer(csv_file)

In [104]:
column_names=[]

for tcol in table.find_all('th'):
        ## use .rstrip("\n\r") to remove any possible "\n" from the strings.
        tcolumns=tcol.text.rstrip("\n\r")
        #tcolumns=tcol.text
        column_names.append(tcolumns)

print(column_names)

csv_writer.writerow(column_names)    # need to wait for the csv_file.close() in the next cell to update its content.


['Postcode', 'Borough', 'Neighbourhood']
[['Postcode', 'Borough', 'Neighbourhood']]


In [105]:
df_table=[]

table_rows = table.tbody.find_all('tr')
for tr in table_rows:
    td = tr.find_all('td')
    
    ## use .rstrip("\n\r") to remove any possible "\n" from the strings.!!!!
    row = [i.text.rstrip("\n\r") for i in td] 
    ## row = [i.text for i in td]
    
    ## print(type(row))  : <class 'list'>
    ## print(row)
    df_table.append(row)        ### Should not write df=df.append(...)!!!
    csv_writer.writerow(row)

##print(df_table)

csv_file.close()

#### Now, we have a csv file for the table; let's convert the csv file to a dataframe.

In [77]:
df_soup=pd.read_csv('cms_scrape.csv', header=0)
df_soup.head()

Unnamed: 0,Postcode,Borough,Neighbourhood
0,M1A,Not assigned,Not assigned
1,M2A,Not assigned,Not assigned
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,Harbourfront


In [117]:
df_soup.shape

(288, 3)

#### Convert the df_table list to a dataframe

In [114]:
df_table=pd.DataFrame(data=df_table, columns = column_names)

In [116]:
df_table.head()

Unnamed: 0,Postcode,Borough,Neighbourhood
0,,,
1,M1A,Not assigned,Not assigned
2,M2A,Not assigned,Not assigned
3,M3A,North York,Parkwoods
4,M4A,North York,Victoria Village


#### Summary: 
All three dataframes obtained above, i.e., "df", "df_soup", and "df_table", contain the same raw data.

### 1.3 Data Cleaning
    In the 1.2 section, we used different methods and obtained three dataframes, i.e., "df", "df_soup", and "df_table", all of which contain the same raw data.
    In this section, we will only use "df" for data cleaning and analysis in the rest of the notebooks.

In [245]:
# First, make a cope of df. We will work on df1 and keep df as the raw data. 
df1=df[:]     
df1.head(10)

Unnamed: 0,Postcode,Borough,Neighbourhood
0,M1A,Not assigned,Not assigned
1,M2A,Not assigned,Not assigned
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,Harbourfront
5,M5A,Downtown Toronto,Regent Park
6,M6A,North York,Lawrence Heights
7,M6A,North York,Lawrence Manor
8,M7A,Queen's Park,Not assigned
9,M8A,Not assigned,Not assigned


__We will follow the requirements in the assignment:__
1. The dataframe will consist of three columns: PostalCode, Borough, and Neighborhood.
2. Only process the cells that have an assigned borough. Ignore cells with a borough that is Not assigned.
3. More than one neighborhood can exist in one postal code area. For example, in the table on the Wikipedia page, you will notice that M5A is listed twice and has two neighborhoods: Harbourfront and Regent Park. These two rows will be combined into one row with the neighborhoods separated with a comma as shown in row 11 in the above table.
4. If a cell has a borough but a Not assigned neighborhood, then the neighborhood will be the same as the borough. So for the 9th cell in the table on the Wikipedia page, the value of the Borough and the Neighborhood columns will be Queen's Park.
5. Clean your Notebook and add Markdown cells to explain your work and any assumptions you are making.
6. In the last cell of your notebook, use the .shape method to print the number of rows of your dataframe.

__1.Replace the column name "Postcode" with "PostalCode".__

In [246]:
df1.rename(columns={'Postcode' : 'PostalCode'},inplace = True)
df1.head()

Unnamed: 0,PostalCode,Borough,Neighbourhood
0,M1A,Not assigned,Not assigned
1,M2A,Not assigned,Not assigned
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,Harbourfront


__2. Delete the cells with a borough being "Not assigned".__

In [247]:
for i in range(df1.shape[0]):
    if df1.loc[i, 'Borough'] == 'Not assigned':
        #print(f'i={i}', df1.loc[i, 'Borough'])
        df1.drop([i], axis=0, inplace=True)

# reset_index after dropping rows
df1.reset_index(drop=True, inplace=True)

df1.head()

Unnamed: 0,PostalCode,Borough,Neighbourhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,Harbourfront
3,M5A,Downtown Toronto,Regent Park
4,M6A,North York,Lawrence Heights


In [248]:
df1.shape

(211, 3)

In [249]:
len(df1['PostalCode'].value_counts())   # 103 unique PostalCode in total : Name: PostalCode, Length: 103, dtype: int64

103

__3. Combine the neighborhoods that exist in one postal code area.__

In [250]:
for i in range(df1.shape[0]-1):
    if df1.loc[i, 'PostalCode'] == df1.loc[i+1, 'PostalCode']:
        df1.loc[i+1, 'Neighbourhood'] = df1.loc[i, 'Neighbourhood'] + ', ' + df1.loc[i+1, 'Neighbourhood']
        df1.drop([i], axis=0, inplace=True)

# reset_index after dropping rows
df1.reset_index(drop=True, inplace=True)

df1.head()

Unnamed: 0,PostalCode,Borough,Neighbourhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,"Harbourfront, Regent Park"
3,M6A,North York,"Lawrence Heights, Lawrence Manor"
4,M7A,Queen's Park,Not assigned


In [242]:
df1.shape

(103, 3)

__4. If a cell has a borough but a Not assigned neighborhood, then the neighborhood will be the same as the borough.__
(Note: I feel it will be better to do this step before the above step "3. Combine the neighborhoods that exist in one postal code area." However, luckily, it makes no difference for our talbe here.)

In [349]:
for i in range(df1.shape[0]):        
    if df1.loc[i, 'Neighbourhood'] == 'Not assigned':
        print(f'i={i}', df1.loc[i, 'Borough'], df1.loc[i, 'Neighbourhood'])
        df1.loc[i, 'Neighbourhood'] = df1.loc[i, 'Borough']
        print(f'i={i}', 'Neighbourhood: ', df1.loc[i, 'Neighbourhood'])

df1.head()

i=4 Queen's Park Not assigned
i=4 Neighbourhood:  Queen's Park


Unnamed: 0,PostalCode,Borough,Neighbourhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,"Harbourfront, Regent Park"
3,M6A,North York,"Lawrence Heights, Lawrence Manor"
4,M7A,Queen's Park,Queen's Park


__5. Use the .shape method to print the number of rows of your dataframe.__

In [253]:
print('The number of rows of the dataframe df1 is: ', df1.shape[0])

The number of rows of the dataframe df1 is:  103


### This marks the end of the code for Part 1 of Week #3's assignment, which is "4. Submit a link to your Notebook on your Github repository. (10 marks)"

----

### 1.4 Add the coordinates of each neighborhood to the dataframe

__Note: Here, in in 1.4.(a), we use the 'Geospatial_Coordinates.csv' file  to get the coordinate info. 
We also tried Geocode in 1.4.(b) with different providers and found only several of them work, but return slightly different coordinate values than the 'Geospatial_Coordinates.csv' file. Please ignore 1.4.(b).__

In [381]:
# make a copy of df1 and work on df2 
df2=df1.copy(deep=True)
df2.shape

(103, 3)

__1.4(a) Use the 'Geospatial_Coordinates.csv' file provided in the assignment.__

In [382]:
geocsv=pd.read_csv('Geospatial_Coordinates.csv', header=0)
#geocsv['Latitude'].astype('float64')
#geocsv['Longitude'].astype('float64')
geocsv.head()

Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


In [383]:
geocsv.loc[0,'Postal Code']

'M1B'

In [384]:
geocsv.shape

(103, 3)

In [385]:
df_con=[]
for i in range(df2.shape[0]):
    for j in range(geocsv.shape[0]):
        if df2.loc[i, 'PostalCode'] == geocsv.loc[j, 'Postal Code']:
            lat = geocsv.loc[j, 'Latitude']
            long= geocsv.loc[j, 'Longitude']
            row = [lat, long]
            df_con.append(row)
            
df_con=pd.DataFrame(data=df_con, columns=['Latitude','Longitude'])

df_con.head()

Unnamed: 0,Latitude,Longitude
0,43.753259,-79.329656
1,43.725882,-79.315572
2,43.65426,-79.360636
3,43.718518,-79.464763
4,43.662301,-79.389494


In [386]:
# use .concat to combine df1 and df_con to form a new df named as df3. 
# https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.concat.html
df3=pd.concat([df1, df_con], axis=1)
df3.head()

Unnamed: 0,PostalCode,Borough,Neighbourhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.753259,-79.329656
1,M4A,North York,Victoria Village,43.725882,-79.315572
2,M5A,Downtown Toronto,"Harbourfront, Regent Park",43.65426,-79.360636
3,M6A,North York,"Lawrence Heights, Lawrence Manor",43.718518,-79.464763
4,M7A,Queen's Park,Queen's Park,43.662301,-79.389494


In [387]:
df3.shape

(103, 5)

### Here, df3 is the datafram with the geographical coordinates of each postal code.

### This marks the end of the code for Part 2 of Week #3's assignment, which is "Use the Geocoder package or the csv file to create the following dataframe. ... Once you are able to create the above dataframe, submit a link to the new Notebook on your Github repository. (2 marks) "

-------

__1.4.(b) Trying Geocode with different providers inclusing Google.__  
          __Please ignore this section.__  
        __But Google did not work and always returned None. Only several providers work, but all return slightly different coordinate values than the 'Geospatial_Coordinates.csv' file.__        
        

First, install geocoder in Anaconda: "pip install geocoder". Then, import geocoder in the notebook

In [264]:
import geocoder # import geocoder

__Notes:__ 
__1. I checked all the providers listed in the webpage https://geocoder.readthedocs.io/index.html__ 
__2. The '.google' does work, b/c it need API keys(?).__ 
__3. In fact, only ".arcgis", ".osm" and ".geocodefarm" work, but they return slightly different coords values than the 'Geospatial_Coordinates.csv' file.__ 
__4. So I will still use the results based on the csv file.__ 

In [323]:
# initialize your variable to None
lat_lng_coords = None

# loop until you get the coordinates   #  the csv file gives : M1B,43.8066863,-79.1943534    
postal_code = 'M1B'                    # Test with postal_code = 'M1B' 

# geocodeer.geocodefarm, .osm,  .google
# while(lat_lng_coords is None):
        #g = geocoder.google('{}, Toronto, Ontario'.format(postal_code))
        #lat_lng_coords = g.latlng
g = geocoder.geocodefarm('{}, Toronto, Ontario'.format(postal_code))

#g = geocoder.google('{}, Toronto, Ontario'.format(postal_code))
#g = geocoder.google('Ottawa, ON')
#g = geocoder.yahoo('Mountain View, CA')
lat_lng_coords = g.latlng
print(lat_lng_coords)
#latitude = lat_lng_coords[0]
#longitude = lat_lng_coords[1]
#print('lat and lon are: ', lat_lng_coords[0], lat_lng_coords[1])

[43.8101539611717, -79.1946029663129]


In [283]:
## We use the .ok attribute on the returned object to check.
##(If geocoder was able to contact the server, but no result could be found for the given search terms, 
## the ok attribute on the returned object will be False.)
g.ok

True

__Note:__ For postal_code = 'M1B', __.arcgis__ returns [43.811525000000074, -79.19551721399995]
__.osm__ returns [43.653963, -79.387207], __.geocodefarm__ returns [43.8101539611717, -79.1946029663129], __wihch are different from__ the info in __the Geospatial_Coordinates.csv__ " M1B,43.8066863,-79.1943534 "

-----

## 2. Explore and cluster the neighborhoods in Toronto.

#### In this section, I work on the last part of Week # 3 assignment. 

From assignment:  
Explore and cluster the neighborhoods in Toronto. You can decide to work with only boroughs that contain the word Toronto and then replicate the same analysis we did to the New York City data. It is up to you.

Just make sure:

to add enough Markdown cells to explain what you decided to do and to report any observations you make.
to generate maps to visualize your neighborhoods and how they cluster together.
Once you are happy with your analysis, submit a link to the new Notebook on your Github repository. __(3 marks)__

Download all the dependencies will be used.

In [393]:
import json # library to handle JSON files

#!conda install -c conda-forge geopy --yes # uncomment this line if you haven't completed the Foursquare API lab
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas.io.json import json_normalize # tranform JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

#!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't completed the Foursquare API lab
import folium # map rendering library

print('Libraries imported.')

Libraries imported.


### 2.1. Explore Neighborhoods in Toronto

In [390]:
print('The dataframe has {} boroughs and {} neighborhoods with unique postal code.'.format(
        len(df3['Borough'].unique()),
        df3.shape[0]
    )
)

The dataframe has 11 boroughs and 103 neighborhoods with unique postal code.


In [391]:
df3['Borough'].unique()

array(['North York', 'Downtown Toronto', "Queen's Park", 'Etobicoke',
       'Scarborough', 'East York', 'York', 'East Toronto', 'West Toronto',
       'Central Toronto', 'Mississauga'], dtype=object)

In [392]:
df3['Borough'].value_counts()

North York          24
Downtown Toronto    18
Scarborough         17
Etobicoke           12
Central Toronto      9
West Toronto         6
East York            5
York                 5
East Toronto         5
Queen's Park         1
Mississauga          1
Name: Borough, dtype: int64

#### Make a copy of df3, and work on the new dataframe -- 'neighborhoods'.

In [397]:
neighborhoods=df3.copy(deep=True)

# rename the column
neighborhoods.rename(columns={'Neighbourhood' : 'Neighborhood'},inplace = True)

#### Use geopy library to get the latitude and longitude values of Toronto.

In order to define an instance of the geocoder, we need to define a user_agent. We will name our agent Toronto_explorer, as shown below.

In [398]:
address = 'Toronto, Ontario'   # Toronto, Ontario

geolocator = Nominatim(user_agent="Toronto_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geograpical coordinate of Toronto are {}, {}.'.format(latitude, longitude))

The geograpical coordinate of Toronto are 43.653963, -79.387207.


#### Create a map of Toronto with neighborhoods superimposed on top.

In [400]:
# create map of Toronto using latitude and longitude values
map_toronto = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, borough, neighborhood in zip(neighborhoods['Latitude'], neighborhoods['Longitude'], neighborhoods['Borough'], neighborhoods['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto)  
    
map_toronto

#### Let's work with only boroughs that contain the word Toronto.
Downtown Toronto    18 ; 
Central Toronto      9 ; 
West Toronto         6 ; 
East Toronto         5 ;
Total = 38

In [458]:
#column_names = ['PostalCode', 'Borough', 'Neighborhood', 'Latitude', 'Longitude']
#toronto_data=pd.DataFrame(columns=column_names)

toronto_data=neighborhoods.copy(deep=True)
for i in range(toronto_data.shape[0]):
    if 'Toronto' not in toronto_data.loc[i, 'Borough']:
        #print(i, toronton_data.loc[i, 'Borough'])
        toronto_data.drop([i], axis=0, inplace=True)
    #else:
        #print(i, toronton_data.loc[i, 'Borough'])

# reset_index after dropping rows
toronto_data.reset_index(drop=True, inplace=True)

toronto_data.head()

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude
0,M5A,Downtown Toronto,"Harbourfront, Regent Park",43.65426,-79.360636
1,M5B,Downtown Toronto,"Ryerson, Garden District",43.657162,-79.378937
2,M5C,Downtown Toronto,St. James Town,43.651494,-79.375418
3,M4E,East Toronto,The Beaches,43.676357,-79.293031
4,M5E,Downtown Toronto,Berczy Park,43.644771,-79.373306


In [459]:
toronto_data['Borough'].value_counts()

Downtown Toronto    18
Central Toronto      9
West Toronto         6
East Toronto         5
Name: Borough, dtype: int64

Let's visualizat the neighborhoods with their borough names containing "toronto" in it.

In [464]:
# create map of Manhattan using latitude and longitude values
map_toronto = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, borough, neighborhood in zip(toronto_data['Latitude'], toronto_data['Longitude'], toronto_data['Borough'], toronto_data['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto)  
    
map_toronto

Next, use the Foursquare API to explore the neighborhoods and segment them.

#### Define Foursquare Credentials and Version

In [485]:
# @hidden_cell
CLIENT_ID = '10YRYNZPU4OILBN2FEL54VDCKIK3KFTIGNDAOCK2H031P2WG' # your Foursquare ID
CLIENT_SECRET = 'Z1UJOWIH2EWC0Y3ZONBB1HTEI1AO42AEWFVXGRPOSGVPYJDS' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version  '20180605'

#### Let's explore the first neighborhood area in our dataframe, which share the same postal code.

In [486]:
toronto_data.loc[0, 'Neighborhood']

'Harbourfront, Regent Park'

In [487]:
neighborhood_latitude = toronto_data.loc[0, 'Latitude'] # neighborhood latitude value
neighborhood_longitude = toronto_data.loc[0, 'Longitude'] # neighborhood longitude value

neighborhood_name = toronto_data.loc[0, 'Neighborhood'] # neighborhood name

print('Latitude and longitude values of {} are {}, {}.'.format(neighborhood_name, 
                                                               neighborhood_latitude, 
                                                               neighborhood_longitude))

Latitude and longitude values of Harbourfront, Regent Park are 43.6542599, -79.3606359.


Now, let's get the top 100 venues that are in 'Harbourfront, Regent Park' within a radius of 500 meters.

First, let's create the GET request URL. Name your URL url.

In [488]:
# type your answer here
LIMIT = 100 # limit of number of venues returned by Foursquare API                                                            
radius = 500 # define radius
# create URL
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    neighborhood_latitude, 
    neighborhood_longitude, 
    radius, 
    LIMIT)
url # display URL

'https://api.foursquare.com/v2/venues/explore?&client_id=10YRYNZPU4OILBN2FEL54VDCKIK3KFTIGNDAOCK2H031P2WG&client_secret=Z1UJOWIH2EWC0Y3ZONBB1HTEI1AO42AEWFVXGRPOSGVPYJDS&v=20180605&ll=43.6542599,-79.3606359&radius=500&limit=100'

Send the GET request and examine the resutls

In [490]:
results = requests.get(url).json()
#results

From the Foursquare lab in the previous module, we know that all the information is in the items key. Before we proceed, let's borrow the get_category_type function from the Foursquare lab.

In [491]:
# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

Now we are ready to clean the json and structure it into a pandas dataframe.

In [497]:
results_data=results['response']['groups'][0]['items']

In [494]:
nearbyTest = json_normalize(results_data)
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearbyTest =nearbyTest.loc[:, filtered_columns]
nearbyTest.head()

Unnamed: 0,venue.name,venue.categories,venue.location.lat,venue.location.lng
0,Roselle Desserts,"[{'id': '4bf58dd8d48988d16a941735', 'name': 'B...",43.653447,-79.362017
1,Tandem Coffee,"[{'id': '4bf58dd8d48988d1e0931735', 'name': 'C...",43.653559,-79.361809
2,Toronto Cooper Koo Family Cherry St YMCA Centre,"[{'id': '4bf58dd8d48988d175941735', 'name': 'G...",43.653191,-79.357947
3,Body Blitz Spa East,"[{'id': '4bf58dd8d48988d1ed941735', 'name': 'S...",43.654735,-79.359874
4,Morning Glory Cafe,"[{'id': '4bf58dd8d48988d143941735', 'name': 'B...",43.653947,-79.361149


In [495]:
nearbyTest['venue.categories'] = nearbyTest.apply(get_category_type, axis=1)
nearbyTest.head()

Unnamed: 0,venue.name,venue.categories,venue.location.lat,venue.location.lng
0,Roselle Desserts,Bakery,43.653447,-79.362017
1,Tandem Coffee,Coffee Shop,43.653559,-79.361809
2,Toronto Cooper Koo Family Cherry St YMCA Centre,Gym / Fitness Center,43.653191,-79.357947
3,Body Blitz Spa East,Spa,43.654735,-79.359874
4,Morning Glory Cafe,Breakfast Spot,43.653947,-79.361149


In [496]:
venues = results['response']['groups'][0]['items']
    
nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues =nearby_venues.loc[:, filtered_columns]

# filter the category for each row !!! # use the function get_category_type  !!!
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1) 

# clean columns (i.e., delete "venue." from the column name)
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head()

Unnamed: 0,name,categories,lat,lng
0,Roselle Desserts,Bakery,43.653447,-79.362017
1,Tandem Coffee,Coffee Shop,43.653559,-79.361809
2,Toronto Cooper Koo Family Cherry St YMCA Centre,Gym / Fitness Center,43.653191,-79.357947
3,Body Blitz Spa East,Spa,43.654735,-79.359874
4,Morning Glory Cafe,Breakfast Spot,43.653947,-79.361149


And how many venues were returned by Foursquare?

In [498]:
print('{} venues were returned by Foursquare.'.format(nearby_venues.shape[0]))

46 venues were returned by Foursquare.


###  Explore Neighborhoods in Toronto

#### Let's create a function to repeat the same process to all the neighborhoods in Toronto

In [499]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    # ???? [item for venue_list in venues_list for item in venue_list] ????? 
    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list]) # ????
    # nearby_venues = pd.DataFrame(venues_list) <--- this does not work
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

#### Now write the code to run the above function on each neighborhood and create a new dataframe called *Toronto_venues*.

In [502]:
toronto_venues = getNearbyVenues(names=toronto_data['Neighborhood'],
                                   latitudes=toronto_data['Latitude'],
                                   longitudes=toronto_data['Longitude']
                                  )

Harbourfront, Regent Park
Ryerson, Garden District
St. James Town
The Beaches
Berczy Park
Central Bay Street
Christie
Adelaide, King, Richmond
Dovercourt Village, Dufferin
Harbourfront East, Toronto Islands, Union Station
Little Portugal, Trinity
The Danforth West, Riverdale
Design Exchange, Toronto Dominion Centre
Brockton, Exhibition Place, Parkdale Village
The Beaches West, India Bazaar
Commerce Court, Victoria Hotel
Studio District
Lawrence Park
Roselawn
Davisville North
Forest Hill North, Forest Hill West
High Park, The Junction South
North Toronto West
The Annex, North Midtown, Yorkville
Parkdale, Roncesvalles
Davisville
Harbord, University of Toronto
Runnymede, Swansea
Moore Park, Summerhill East
Chinatown, Grange Park, Kensington Market
Deer Park, Forest Hill SE, Rathnelly, South Hill, Summerhill West
CN Tower, Bathurst Quay, Island airport, Harbourfront West, King and Spadina, Railway Lands, South Niagara
Rosedale
Stn A PO Boxes 25 The Esplanade
Cabbagetown, St. James Town
Fir

#### Let's check the size of the resulting dataframe

In [505]:
print(toronto_venues.shape)
toronto_venues.head()

(1700, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,"Harbourfront, Regent Park",43.65426,-79.360636,Roselle Desserts,43.653447,-79.362017,Bakery
1,"Harbourfront, Regent Park",43.65426,-79.360636,Tandem Coffee,43.653559,-79.361809,Coffee Shop
2,"Harbourfront, Regent Park",43.65426,-79.360636,Toronto Cooper Koo Family Cherry St YMCA Centre,43.653191,-79.357947,Gym / Fitness Center
3,"Harbourfront, Regent Park",43.65426,-79.360636,Body Blitz Spa East,43.654735,-79.359874,Spa
4,"Harbourfront, Regent Park",43.65426,-79.360636,Morning Glory Cafe,43.653947,-79.361149,Breakfast Spot


Let's check how many venues were returned for each neighborhood

In [506]:
toronto_venues.groupby('Neighborhood').count()

Unnamed: 0_level_0,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
"Adelaide, King, Richmond",100,100,100,100,100,100
Berczy Park,57,57,57,57,57,57
"Brockton, Exhibition Place, Parkdale Village",19,19,19,19,19,19
Business Reply Mail Processing Centre 969 Eastern,17,17,17,17,17,17
"CN Tower, Bathurst Quay, Island airport, Harbourfront West, King and Spadina, Railway Lands, South Niagara",15,15,15,15,15,15
"Cabbagetown, St. James Town",44,44,44,44,44,44
Central Bay Street,88,88,88,88,88,88
"Chinatown, Grange Park, Kensington Market",100,100,100,100,100,100
Christie,16,16,16,16,16,16
Church and Wellesley,88,88,88,88,88,88


### Let's find out how many unique categories can be curated from all the returned venues

In [508]:
print('There are {} uniques categories.'.format(len(toronto_venues['Venue Category'].unique())))

There are 236 uniques categories.


## 2.3. Analyze Each Neighborhood

In [509]:
# one hot encoding
toronto_onehot = pd.get_dummies(toronto_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
toronto_onehot['Neighborhood'] = toronto_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [toronto_onehot.columns[-1]] + list(toronto_onehot.columns[:-1])
toronto_onehot = toronto_onehot[fixed_columns]

toronto_onehot.head()

Unnamed: 0,Yoga Studio,Adult Boutique,Afghan Restaurant,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,...,Theme Restaurant,Thrift / Vintage Store,Toy / Game Store,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Wine Bar,Wings Joint
0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


And let's examine the new dataframe size.

In [510]:
toronto_onehot.shape

(1700, 236)

#### Next, let's group rows by neighborhood and by taking the mean of the frequency of occurrence of each category

In [512]:
toronto_grouped = toronto_onehot.groupby('Neighborhood').mean().reset_index()
toronto_grouped

Unnamed: 0,Neighborhood,Yoga Studio,Adult Boutique,Afghan Restaurant,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,...,Theme Restaurant,Thrift / Vintage Store,Toy / Game Store,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Wine Bar,Wings Joint
0,"Adelaide, King, Richmond",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0
1,Berczy Park,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.017544,0.0,0.0,0.0,0.0
2,"Brockton, Exhibition Place, Parkdale Village",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Business Reply Mail Processing Centre 969 Eastern,0.058824,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,"CN Tower, Bathurst Quay, Island airport, Harbo...",0.0,0.0,0.0,0.066667,0.066667,0.066667,0.133333,0.2,0.133333,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,"Cabbagetown, St. James Town",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
6,Central Bay Street,0.011364,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.011364,0.0,0.0,0.011364,0.0
7,"Chinatown, Grange Park, Kensington Market",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.01,0.01,0.0,0.0,0.06,0.0,0.03,0.01,0.0
8,Christie,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9,Church and Wellesley,0.011364,0.011364,0.011364,0.0,0.0,0.0,0.0,0.0,0.0,...,0.011364,0.0,0.0,0.0,0.0,0.0,0.011364,0.011364,0.0,0.011364


#### Let's confirm the new size

In [513]:
toronto_grouped.shape

(38, 236)

#### Let's print each neighborhood along with the top 5 most common venues

In [515]:
num_top_venues = 5

for hood in toronto_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = toronto_grouped[toronto_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Adelaide, King, Richmond----
                 venue  freq
0          Coffee Shop  0.06
1                 Café  0.05
2      Thai Restaurant  0.04
3           Steakhouse  0.04
4  American Restaurant  0.04


----Berczy Park----
                venue  freq
0         Coffee Shop  0.09
1        Cocktail Bar  0.05
2          Restaurant  0.04
3  Seafood Restaurant  0.04
4              Bakery  0.04


----Brockton, Exhibition Place, Parkdale Village----
            venue  freq
0  Breakfast Spot  0.11
1            Café  0.11
2     Coffee Shop  0.11
3   Burrito Place  0.05
4             Bar  0.05


----Business Reply Mail Processing Centre 969 Eastern----
                venue  freq
0  Light Rail Station  0.12
1         Yoga Studio  0.06
2       Garden Center  0.06
3          Comic Shop  0.06
4                Park  0.06


----CN Tower, Bathurst Quay, Island airport, Harbourfront West, King and Spadina, Railway Lands, South Niagara----
              venue  freq
0   Airport Service  0.20
1    Ai

#### Let's put that into a pandas dataframe

First, let's write a function to sort the venues in descending order.

In [516]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

Now let's create the new dataframe and display the top 10 venues for each neighborhood.

In [518]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = toronto_grouped['Neighborhood']

for ind in np.arange(toronto_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(toronto_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,"Adelaide, King, Richmond",Coffee Shop,Café,Thai Restaurant,Steakhouse,American Restaurant,Bar,Burger Joint,Gym,Hotel,Bakery
1,Berczy Park,Coffee Shop,Cocktail Bar,Café,Cheese Shop,Farmers Market,Beer Bar,Steakhouse,Seafood Restaurant,Restaurant,Bakery
2,"Brockton, Exhibition Place, Parkdale Village",Café,Breakfast Spot,Coffee Shop,Grocery Store,Italian Restaurant,Caribbean Restaurant,Stadium,Bar,Furniture / Home Store,Burrito Place
3,Business Reply Mail Processing Centre 969 Eastern,Light Rail Station,Garden,Recording Studio,Auto Workshop,Skate Park,Burrito Place,Fast Food Restaurant,Farmers Market,Brewery,Restaurant
4,"CN Tower, Bathurst Quay, Island airport, Harbo...",Airport Service,Airport Lounge,Airport Terminal,Harbor / Marina,Sculpture Garden,Boutique,Plane,Boat or Ferry,Airport Gate,Airport


## 2.3. Cluster Neighborhoods

Run k-means to cluster the neighborhood into 5 clusters.

In [520]:
# set number of clusters
kclusters = 5

toronto_grouped_clustering = toronto_grouped.drop('Neighborhood', 1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(toronto_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1])

Let's create a new dataframe that includes the cluster as well as the top 10 venues for each neighborhood.

In [521]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

toronto_merged = toronto_data

# merge toronto_grouped with toronto_data to add latitude/longitude for each neighborhood
toronto_merged = toronto_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

toronto_merged.head() # check the last columns!

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M5A,Downtown Toronto,"Harbourfront, Regent Park",43.65426,-79.360636,1,Coffee Shop,Park,Café,Pub,Bakery,Mexican Restaurant,Breakfast Spot,Theater,Chocolate Shop,Beer Store
1,M5B,Downtown Toronto,"Ryerson, Garden District",43.657162,-79.378937,1,Coffee Shop,Clothing Store,Cosmetics Shop,Café,Middle Eastern Restaurant,Tea Room,Pizza Place,Sporting Goods Shop,Japanese Restaurant,Fast Food Restaurant
2,M5C,Downtown Toronto,St. James Town,43.651494,-79.375418,1,Coffee Shop,Hotel,Restaurant,Café,Cosmetics Shop,Breakfast Spot,Gastropub,Bakery,Italian Restaurant,Diner
3,M4E,East Toronto,The Beaches,43.676357,-79.293031,1,Health Food Store,Pub,Music Venue,Wings Joint,Department Store,Ethiopian Restaurant,Electronics Store,Eastern European Restaurant,Dumpling Restaurant,Donut Shop
4,M5E,Downtown Toronto,Berczy Park,43.644771,-79.373306,1,Coffee Shop,Cocktail Bar,Café,Cheese Shop,Farmers Market,Beer Bar,Steakhouse,Seafood Restaurant,Restaurant,Bakery


Finally, let's visualize the resulting clusters

In [522]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(toronto_merged['Latitude'], toronto_merged['Longitude'], toronto_merged['Neighborhood'], toronto_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

## 2.4 Examine Clusters

Now, you can examine each cluster and determine the discriminating venue categories that distinguish each cluster. Based on the defining categories, we can then assign a name to each cluster.

### My observation and conclusions:      
   #### 1. If we set k=5 for our K-mean clustering algrithm, it is very interesting to see that the majority of neighborhoods being grouped as Cluster 2. Each of the other 4 clusters has included only one neighborhoods.  
   #### 2. I think there is a lot of room for us to improve our clustering method to get better clustering results. I would like to try to tune our K-mean clustering model and even other mthods in the future.
   #### 3. Nevertheless, based on the current results and observation, I am still trying to assign each cluster a different name to capture their main features, which are shown below. 

### Cluster 1 : I would like to call it as " Outdoor Activity Area".

In [534]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 0, toronto_merged.columns[[2] + [1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
28,"Moore Park, Summerhill East",Central Toronto,0,Playground,Trail,Tennis Court,Wings Joint,Dog Run,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Donut Shop


### Cluster 2 : I would like to call it as "Drinking and Dining Area"

In [535]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 1, toronto_merged.columns[[2] + [1]  + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,"Harbourfront, Regent Park",Downtown Toronto,1,Coffee Shop,Park,Café,Pub,Bakery,Mexican Restaurant,Breakfast Spot,Theater,Chocolate Shop,Beer Store
1,"Ryerson, Garden District",Downtown Toronto,1,Coffee Shop,Clothing Store,Cosmetics Shop,Café,Middle Eastern Restaurant,Tea Room,Pizza Place,Sporting Goods Shop,Japanese Restaurant,Fast Food Restaurant
2,St. James Town,Downtown Toronto,1,Coffee Shop,Hotel,Restaurant,Café,Cosmetics Shop,Breakfast Spot,Gastropub,Bakery,Italian Restaurant,Diner
3,The Beaches,East Toronto,1,Health Food Store,Pub,Music Venue,Wings Joint,Department Store,Ethiopian Restaurant,Electronics Store,Eastern European Restaurant,Dumpling Restaurant,Donut Shop
4,Berczy Park,Downtown Toronto,1,Coffee Shop,Cocktail Bar,Café,Cheese Shop,Farmers Market,Beer Bar,Steakhouse,Seafood Restaurant,Restaurant,Bakery
5,Central Bay Street,Downtown Toronto,1,Coffee Shop,Café,Italian Restaurant,Burger Joint,Chinese Restaurant,Japanese Restaurant,Restaurant,Sandwich Place,Bubble Tea Shop,Bar
6,Christie,Downtown Toronto,1,Grocery Store,Café,Park,Restaurant,Nightclub,Athletics & Sports,Baby Store,Diner,Italian Restaurant,Convenience Store
7,"Adelaide, King, Richmond",Downtown Toronto,1,Coffee Shop,Café,Thai Restaurant,Steakhouse,American Restaurant,Bar,Burger Joint,Gym,Hotel,Bakery
8,"Dovercourt Village, Dufferin",West Toronto,1,Supermarket,Bakery,Pharmacy,Music Venue,Middle Eastern Restaurant,Bank,Bar,Discount Store,Pool,Café
9,"Harbourfront East, Toronto Islands, Union Station",Downtown Toronto,1,Coffee Shop,Hotel,Aquarium,Italian Restaurant,Café,Fried Chicken Joint,Restaurant,Scenic Lookout,Pizza Place,Bakery


### Cluster 3: I would like to call it as "Garden of the City".

In [536]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 2, toronto_merged.columns[[2] + [1]  + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
18,Roselawn,Central Toronto,2,Garden,Wings Joint,Department Store,Event Space,Ethiopian Restaurant,Electronics Store,Eastern European Restaurant,Dumpling Restaurant,Donut Shop,Doner Restaurant


### Cluster 4: I would like to call it as "Tranportation Center".

In [537]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 3, toronto_merged.columns[[2] + [1]  + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
20,"Forest Hill North, Forest Hill West",Central Toronto,3,Bus Line,Trail,Jewelry Store,Sushi Restaurant,Wings Joint,Event Space,Ethiopian Restaurant,Electronics Store,Eastern European Restaurant,Dumpling Restaurant


### Cluster 5: I would like to call it as "Central Park Area".

In [538]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 4, toronto_merged.columns[[2] + [1]  + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
32,Rosedale,Downtown Toronto,4,Park,Playground,Trail,Wings Joint,Department Store,Ethiopian Restaurant,Electronics Store,Eastern European Restaurant,Dumpling Restaurant,Donut Shop


### The end of the notebook. 