<a href="https://colab.research.google.com/github/DataGF/Coursera_Capstone/blob/main/aplied_data_science_capstone_semana3.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Index
- **[Final Assignment Week 3 - Part 1: From Web Scraping to Dataframe](#part1)**
- **[Final Assignment Week 3 - Part 2: Getting the latitude and the longitude coordinates of each neighborhood](#part2)**
- **[Final Assignment Week 3 - Part 3: Explore and cluster the neighborhoods in Toronto](#part3)**

<a name='part1'></a>
# Final Assignment Week 3 - Part 1: From Web Scraping to Dataframe

For this assignment, you will be required to explore and cluster the neighborhoods in Toronto.

1. Start by creating a new Notebook for this assignment.

2. [Hints for scraping Notebook](https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-DS0701EN-SkillsNetwork/labs_v1/NewLinkWebscrapingHints.md).

![](https://d3c33hcgiwev3.cloudfront.net/imageAssetProxy.v1/7JXaz3NNEeiMwApe4i-fLg_40e690ae0e927abda2d4bde7d94ed133_Screen-Shot-2018-06-18-at-7.17.57-PM.png?expiry=1619049600000&hmac=4qi5igt0I-A867o6XGrDKqiBt9R1U5DZP8gg6i03oHA)

3. To create the above dataframe:
  - The dataframe will consist of three columns: PostalCode, Borough, and Neighborhood.
  - Only process the cells that have an assigned borough. Ignore cells with a borough that is **Not assigned**.
  - More than one neighborhood can exist in one postal code area. For example, in the table on the Wikipedia page, you will notice that **M5A** is listed twice and has two neighborhoods: **Harbourfront** and **Regent Park**. These two rows will be combined into one row with the neighborhoods separated with a comma as shown in **row 11**  in the above table.
  - If a cell has a borough but a **Not assigned**  neighborhood, then the neighborhood will be the same as the borough.
  - Clean your Notebook and add Markdown cells to explain your work and any assumptions you are making.
  - In the last cell of your notebook, use the **.shape** attribute to print the number of rows of your dataframe.

4. Submit a link to your Notebook on your GitHub repository. (**10 marks**)

**Note**: Python offers several web-scraping libraries and packages. To scrape the table above, you can simply use pandas to read it into a pandas dataframe.

Another option, which is worth learning for more complicated scraping cases, is the BeautifulSoup package. Here is the package's main documentation page: http://beautiful-soup-4.readthedocs.io/en/latest/

Use pandas, or the BeautifulSoup package, or any other way you are comfortable with to transform the data in the table on the Wikipedia page into the above pandas dataframe.
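
The cleaning rules listed above can be sketched in pandas on a few made-up rows. This is only an illustration, assuming a raw table with one row per neighborhood (as in the M5A example); the sample values and the names `raw`, `assigned`, and `clean` are hypothetical and are not part of the notebook below.

```python
import pandas as pd

# Hypothetical raw rows, one row per neighborhood (illustrative data only)
raw = pd.DataFrame(
    {
        "PostalCode": ["M1A", "M5A", "M5A", "M7A"],
        "Borough": ["Not assigned", "Downtown Toronto", "Downtown Toronto", "Queen's Park"],
        "Neighborhood": ["Not assigned", "Harbourfront", "Regent Park", "Not assigned"],
    }
)

# Rule 1: keep only rows with an assigned borough
assigned = raw[raw["Borough"] != "Not assigned"].copy()

# Rule 2: a "Not assigned" neighborhood takes the borough's name
missing = assigned["Neighborhood"] == "Not assigned"
assigned.loc[missing, "Neighborhood"] = assigned.loc[missing, "Borough"]

# Rule 3: combine neighborhoods that share a postal code into one row
clean = (
    assigned.groupby(["PostalCode", "Borough"])["Neighborhood"]
    .agg(", ".join)
    .reset_index()
)

print(clean)
print(clean.shape)
```

The Wikipedia page used in this notebook already groups neighborhoods per postal code, so the notebook itself only needs the first two rules.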



In [None]:
# import libraries and modules

import requests # library to handle requests
import pandas as pd # library for data analysis
import folium # library for map plotting
import numpy as np # library to handle data in a vectorized manner 

from bs4 import BeautifulSoup # module to webscrape data
from geopy.geocoders import Nominatim # module to convert an address into latitude and longitude values
from sklearn.cluster import KMeans # module for k-means clustering
from matplotlib import cm # module to color maps
from matplotlib import colors # module of the named colors supported in matplotlib


In [None]:
# request data from wikipedia

html_toronto_postal_codes = requests.get('https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M').content
html_toronto_postal_codes


b'<!DOCTYPE html>\n<html class="client-nojs" lang="en" dir="ltr">\n<head>\n<meta charset="UTF-8"/>\n<title>List of postal codes of Canada: M - Wikipedia</title>\n<script>document.documentElement.className="client-js";RLCONF={"wgBreakFrames":!1,"wgSeparatorTransformTable":["",""],"wgDigitTransformTable":["",""],"wgDefaultDateFormat":"dmy","wgMonthNames":["","January","February","March","April","May","June","July","August","September","October","November","December"],"wgRequestId":"1cfdf9d4-a60a-4503-892c-6ac37177795f","wgCSPNonce":!1,"wgCanonicalNamespace":"","wgCanonicalSpecialPageName":!1,"wgNamespaceNumber":0,"wgPageName":"List_of_postal_codes_of_Canada:_M","wgTitle":"List of postal codes of Canada: M","wgCurRevisionId":1013111980,"wgRevisionId":1013111980,"wgArticleId":539066,"wgIsArticle":!0,"wgIsRedirect":!1,"wgAction":"view","wgUserName":null,"wgUserGroups":["*"],"wgCategories":["Articles with short description","Short description is different from Wikidata","Wikipedia semi-prote

In [None]:
# parse the page with BeautifulSoup's html parser and print the site structure

soup = BeautifulSoup(html_toronto_postal_codes, 'html.parser')
print(soup.prettify())


<!DOCTYPE html>
<html class="client-nojs" dir="ltr" lang="en">
 <head>
  <meta charset="utf-8"/>
  <title>
   List of postal codes of Canada: M - Wikipedia
  </title>
  <script>
   document.documentElement.className="client-js";RLCONF={"wgBreakFrames":!1,"wgSeparatorTransformTable":["",""],"wgDigitTransformTable":["",""],"wgDefaultDateFormat":"dmy","wgMonthNames":["","January","February","March","April","May","June","July","August","September","October","November","December"],"wgRequestId":"1cfdf9d4-a60a-4503-892c-6ac37177795f","wgCSPNonce":!1,"wgCanonicalNamespace":"","wgCanonicalSpecialPageName":!1,"wgNamespaceNumber":0,"wgPageName":"List_of_postal_codes_of_Canada:_M","wgTitle":"List of postal codes of Canada: M","wgCurRevisionId":1013111980,"wgRevisionId":1013111980,"wgArticleId":539066,"wgIsArticle":!0,"wgIsRedirect":!1,"wgAction":"view","wgUserName":null,"wgUserGroups":["*"],"wgCategories":["Articles with short description","Short description is different from Wikidata","Wikipedia

In [None]:
# find table and print its structure

table = soup.find('table')
print(table.prettify())


<table cellpadding="2" cellspacing="0" rules="all" style="width:100%; border-collapse:collapse; border:1px solid #ccc;">
 <tbody>
  <tr>
   <td style="width:11%; vertical-align:top; color:#ccc;">
    <p>
     <b>
      M1A
     </b>
     <br/>
     <span style="font-size:85%;">
      <i>
       Not assigned
      </i>
     </span>
    </p>
   </td>
   <td style="width:11%; vertical-align:top; color:#ccc;">
    <p>
     <b>
      M2A
     </b>
     <br/>
     <span style="font-size:85%;">
      <i>
       Not assigned
      </i>
     </span>
    </p>
   </td>
   <td style="width:11%; vertical-align:top;">
    <p>
     <b>
      M3A
     </b>
     <br/>
     <span style="font-size:85%;">
      <a href="/wiki/North_York" title="North York">
       North York
      </a>
      <br/>
      (
      <a href="/wiki/Parkwoods" title="Parkwoods">
       Parkwoods
      </a>
      )
     </span>
    </p>
   </td>
   <td style="width:11%; vertical-align:top;">
    <p>
     <b>
      M4A
     </b>
 

In [None]:
# ignore 'Not assigned' cells and take the first three characters of each cell as the postal code

for row in table.find_all('td'):
    if row.span.text != 'Not assigned':
        print(row.p.text[:3])


M3A
M4A
M5A
M6A
M7A
M9A
M1B
M3B
M4B
M5B
M6B
M9B
M1C
M3C
M4C
M5C
M6C
M9C
M1E
M4E
M5E
M6E
M1G
M4G
M5G
M6G
M1H
M2H
M3H
M4H
M5H
M6H
M1J
M2J
M3J
M4J
M5J
M6J
M1K
M2K
M3K
M4K
M5K
M6K
M1L
M2L
M3L
M4L
M5L
M6L
M9L
M1M
M2M
M3M
M4M
M5M
M6M
M9M
M1N
M2N
M3N
M4N
M5N
M6N
M9N
M1P
M2P
M4P
M5P
M6P
M9P
M1R
M2R
M4R
M5R
M6R
M7R
M9R
M1S
M4S
M5S
M6S
M1T
M4T
M5T
M1V
M4V
M5V
M8V
M9V
M1W
M4W
M5W
M8W
M9W
M1X
M4X
M5X
M8X
M4Y
M7Y
M8Y
M8Z


In [None]:
# ignore 'Not assigned' cells and keep the text before '(' (minus the postal code) as the borough

for row in table.find_all('td'):
    if row.span.text != 'Not assigned':
        print(row.p.text.split('(')[0][3:])


North York
North York
Downtown Toronto
North York
Queen's Park
Etobicoke
Scarborough
North York
East York
Downtown Toronto
North York
Etobicoke
Scarborough
North York
East York
Downtown Toronto
York
Etobicoke
Scarborough
East Toronto
Downtown Toronto
York
Scarborough
East York
Downtown Toronto
Downtown Toronto
Scarborough
North York
North York
East York
Downtown Toronto
West Toronto
Scarborough
North York
North York
East YorkEast Toronto
Downtown Toronto
West Toronto
Scarborough
North York
North York
East Toronto
Downtown Toronto
West Toronto
Scarborough
North York
North York
East Toronto
Downtown Toronto
North York
North York
Scarborough
North York
North York
East Toronto
North York
York
North York
Scarborough
North York
North York
Central Toronto
Central Toronto
York
York
Scarborough
North York
Central Toronto
Central Toronto
West Toronto
Etobicoke
Scarborough
North York
Central Toronto
Central Toronto
West Toronto
MississaugaCanada Post Gateway Processing Centre
Etobicoke
Scarboroug

In [None]:
# ignore 'Not assigned' cells and clean the text inside the parentheses to get the neighborhood

for row in table.find_all('td'):
    if row.span.text != 'Not assigned':
        print(row.span.text.split('(')[1].strip(')').replace('/', ',').replace(')', ' ').strip(' '))

Parkwoods
Victoria Village
Regent Park , Harbourfront
Lawrence Manor , Lawrence Heights
Ontario Provincial Government
Islington Avenue
Malvern , Rouge
Don Mills North
Parkview Hill , Woodbine Gardens
Garden District, Ryerson
Glencairn
West Deane Park , Princess Gardens , Martin Grove , Islington , Cloverdale
Rouge Hill , Port Union , Highland Creek
Don Mills South
Woodbine Heights
St. James Town
Humewood-Cedarvale
Eringate , Bloordale Gardens , Old Burnhamthorpe , Markland Wood
Guildwood , Morningside , West Hill
The Beaches
Berczy Park
Caledonia-Fairbanks
Woburn
Leaside
Central Bay Street
Christie
Cedarbrae
Hillcrest Village
Bathurst Manor , Wilson Heights , Downsview North
Thorncliffe Park
Richmond , Adelaide , King
Dufferin , Dovercourt Village
Scarborough Village
Fairview , Henry Farm , Oriole
Northwood Park , York University
The Danforth  East
Harbourfront East , Union Station , Toronto Islands
Little Portugal , Trinity
Kennedy Park , Ionview , East Birchmount Park
Bayview Village

In [None]:
# combine the loops above into one function that extracts postal code, borough and neighborhood

def extract_data(table):
    table_data_list = []
    for row in table.find_all('td'):
        span_text = row.span.text
        # skip cells whose postal code has no assigned borough
        if span_text == 'Not assigned':
            continue
        table_data_list.append({
            'PostalCode': row.p.text[:3],
            'Borough': span_text.split('(')[0],
            'Neighborhood': span_text.split('(')[1].strip(')').replace(' /', ',').replace(')', ' ').strip(' '),
        })
    return table_data_list
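
The string handling inside `extract_data` can be tried without any HTML. The sketch below is a hedged, standalone variant: `split_borough_neighborhood` is a hypothetical helper (not used elsewhere in this notebook), and the sample strings are modeled on the cell text printed above. Unlike the notebook's version, it falls back to the borough name when a cell has no parenthesized neighborhood.

```python
def split_borough_neighborhood(span_text):
    """Split text such as 'North York(Parkwoods)' into (borough, neighborhood)."""
    borough, _, rest = span_text.partition("(")
    borough = borough.strip()
    # Drop any parentheses, then split the neighborhoods on the '/' separator
    rest = rest.replace("(", " ").replace(")", " ")
    names = [name.strip() for name in rest.split("/") if name.strip()]
    # Fall back to the borough when no neighborhood is listed
    neighborhood = ", ".join(names) if names else borough
    return borough, neighborhood


print(split_borough_neighborhood("North York(Parkwoods)"))
print(split_borough_neighborhood("Downtown Toronto(Regent Park / Harbourfront)"))
print(split_borough_neighborhood("Etobicoke"))
```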
     

In [None]:
# extracts the required data (postal code, borough, neighborhood) using the function defined in the previous step

toronto_pbn = extract_data(table)
toronto_pbn


[{'Borough': 'North York', 'Neighborhood': 'Parkwoods', 'PostalCode': 'M3A'},
 {'Borough': 'North York',
  'Neighborhood': 'Victoria Village',
  'PostalCode': 'M4A'},
 {'Borough': 'Downtown Toronto',
  'Neighborhood': 'Regent Park, Harbourfront',
  'PostalCode': 'M5A'},
 {'Borough': 'North York',
  'Neighborhood': 'Lawrence Manor, Lawrence Heights',
  'PostalCode': 'M6A'},
 {'Borough': "Queen's Park",
  'Neighborhood': 'Ontario Provincial Government',
  'PostalCode': 'M7A'},
 {'Borough': 'Etobicoke',
  'Neighborhood': 'Islington Avenue',
  'PostalCode': 'M9A'},
 {'Borough': 'Scarborough',
  'Neighborhood': 'Malvern, Rouge',
  'PostalCode': 'M1B'},
 {'Borough': 'North York',
  'Neighborhood': 'Don Mills North',
  'PostalCode': 'M3B'},
 {'Borough': 'East York',
  'Neighborhood': 'Parkview Hill, Woodbine Gardens',
  'PostalCode': 'M4B'},
 {'Borough': 'Downtown Toronto',
  'Neighborhood': 'Garden District, Ryerson',
  'PostalCode': 'M5B'},
 {'Borough': 'North York', 'Neighborhood': 'Glenca

In [None]:
# build the required dataframe and tidy borough names that ran together with other cell text

df_toronto_pbn = pd.DataFrame(toronto_pbn, columns=['PostalCode', 'Borough', 'Neighborhood'])
df_toronto_pbn['Borough'] = df_toronto_pbn['Borough'].replace({
    'Downtown TorontoStn A PO Boxes25 The Esplanade': 'Downtown Toronto Stn A',
    'East TorontoBusiness reply mail Processing Centre969 Eastern': 'East Toronto Business',
    'EtobicokeNorthwest': 'Etobicoke Northwest',
    'East YorkEast Toronto': 'East York/East Toronto',
    'MississaugaCanada Post Gateway Processing Centre': 'Mississauga',
})
df_toronto_pbn.head(12)


Unnamed: 0,PostalCode,Borough,Neighborhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,"Regent Park, Harbourfront"
3,M6A,North York,"Lawrence Manor, Lawrence Heights"
4,M7A,Queen's Park,Ontario Provincial Government
5,M9A,Etobicoke,Islington Avenue
6,M1B,Scarborough,"Malvern, Rouge"
7,M3B,North York,Don Mills North
8,M4B,East York,"Parkview Hill, Woodbine Gardens"
9,M5B,Downtown Toronto,"Garden District, Ryerson"


In [None]:
# Print number of rows and columns using the .shape attribute

print(f'Number of rows:    {df_toronto_pbn.shape[0]}\nNumber of columns: {df_toronto_pbn.shape[1]}')


Number of rows:    103
Number of columns: 3


<a name='part2'></a>
# Final Assignment Week 3 - Part 2: Getting the latitude and the longitude coordinates of each neighborhood

Now that you have built a dataframe with the postal code, borough name, and neighborhood name of each neighborhood, we need the latitude and longitude coordinates of each neighborhood in order to use the Foursquare location data.

In an older version of this course, we were leveraging the Google Maps Geocoding API to get the latitude and the longitude coordinates of each neighborhood. However, recently Google started charging for their API: http://geoawesomeness.com/developers-up-in-arms-over-google-maps-api-insane-price-hike/, so we will use the Geocoder Python package instead: https://geocoder.readthedocs.io/index.html.

The problem with this package is that you sometimes have to be persistent to get the geographical coordinates of a given postal code. A call for a given postal code may return None, and the same call made again may return the coordinates. To make sure that you get the coordinates for all of the neighborhoods, you can run a while loop for each postal code. Taking postal code M5G as an example, your code would look something like this:

```
import geocoder # import geocoder

postal_code = 'M5G' # example postal code

# initialize your variable to None
lat_lng_coords = None

# loop until you get the coordinates
while lat_lng_coords is None:
  g = geocoder.google('{}, Toronto, Ontario'.format(postal_code))
  lat_lng_coords = g.latlng

latitude = lat_lng_coords[0]
longitude = lat_lng_coords[1]
```

Given that this package can be very unreliable, in case you are not able to get the geographical coordinates of the neighborhoods using the Geocoder package, here is a link to a csv file that has the geographical coordinates of each postal code: 

[GeoSpatial Dataset](https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-DS0701EN-SkillsNetwork/labs_v1/Geospatial_Coordinates.csv)

Use the Geocoder package or the csv file to create the following dataframe:

![](https://d3c33hcgiwev3.cloudfront.net/imageAssetProxy.v1/HZ3jNHNOEeiMwApe4i-fLg_f44f0f10ccfaf42fcbdba9813364e173_Screen-Shot-2018-06-18-at-7.18.16-PM.png?expiry=1619049600000&hmac=8Wye1WRszPTdBsg3fujBfq73dpLuUDBFVGIcikWIBGA)

**Important Note**: There is a limit of 2,500 calls per day to the geocoder.google function. This should be more than enough for you to get acquainted with the package and to use it to get the geographical coordinates of the neighborhoods in Toronto.

Once you are able to create the above dataframe, submit a link to the new Notebook on your Github repository. (**2 marks**)

In [None]:
# run `!pip install geocoder` first if the package is not installed

import geocoder

print('Successfully imported geocoder')

Successfully imported geocoder


In [None]:
# define get_lat_lng function to get latitude and longitude as proposed above

def get_lat_lng(postal_code):

    # initialize your variable to None
    lat_lng_coords = None

    # loop until you get the coordinates
    # (caution: this never ends if the geocoder keeps returning None)
    while lat_lng_coords is None:
        g = geocoder.google('{}, Toronto, Ontario'.format(postal_code))
        lat_lng_coords = g.latlng

    latitude = lat_lng_coords[0]
    longitude = lat_lng_coords[1]
    
    return latitude, longitude

get_lat_lng('M4A')


KeyboardInterrupt: ignored

The function above, built from the proposed code, did not return any coordinates and had to be interrupted, so I used the provided geospatial data instead.
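
A retry loop that can give up avoids the hang seen above. The following is a minimal sketch, not the course's proposed code: `geocode_with_retry`, `flaky_lookup`, and `always_failing_lookup` are hypothetical names, and the two stand-in lookups simulate a geocoder so the example runs without network access. The coordinates are the M5G values from the provided CSV.

```python
import time


def geocode_with_retry(lookup, query, max_attempts=5, delay_seconds=0.0):
    """Call lookup(query) until it returns coordinates or the attempts run out."""
    for _ in range(max_attempts):
        coords = lookup(query)
        if coords is not None:
            return coords
        time.sleep(delay_seconds)  # wait before trying again
    return None  # give up instead of looping forever


# Stand-in for a flaky geocoder: returns None twice, then answers
calls = {"count": 0}

def flaky_lookup(query):
    calls["count"] += 1
    if calls["count"] < 3:
        return None
    return (43.657952, -79.387383)

# Stand-in for a geocoder that never answers
def always_failing_lookup(query):
    return None

found = geocode_with_retry(flaky_lookup, "M5G, Toronto, Ontario")
not_found = geocode_with_retry(always_failing_lookup, "M5G, Toronto, Ontario", max_attempts=3)
print(found, not_found)
```

With the real package, the `lookup` argument could be something like `lambda q: geocoder.google(q).latlng`, and a `None` result would signal that it is time to fall back to the CSV file.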

In [None]:
# load provided geospatial data

df_geospatial_data = pd.read_csv('https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-DS0701EN-SkillsNetwork/labs_v1/Geospatial_Coordinates.csv')
df_geospatial_data.head()


Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


In [None]:
# rename columns to merge on the next step

df_geospatial_data.rename(columns={'Postal Code':'PostalCode'}, inplace=True)
df_geospatial_data.head()


Unnamed: 0,PostalCode,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


In [None]:
# merge the coordinates with the postal code dataframe on PostalCode

df_toronto_geo_merged = pd.merge(df_geospatial_data, df_toronto_pbn, on='PostalCode')
df_toronto_geo_merged.head()


Unnamed: 0,PostalCode,Latitude,Longitude,Borough,Neighborhood
0,M1B,43.806686,-79.194353,Scarborough,"Malvern, Rouge"
1,M1C,43.784535,-79.160497,Scarborough,"Rouge Hill, Port Union, Highland Creek"
2,M1E,43.763573,-79.188711,Scarborough,"Guildwood, Morningside, West Hill"
3,M1G,43.770992,-79.216917,Scarborough,Woburn
4,M1H,43.773136,-79.239476,Scarborough,Cedarbrae
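
`pd.merge` performs an inner join by default, so a postal code missing from either table would be dropped without notice. The sketch below shows one way to check for that, using small made-up frames; `coords`, `areas`, and the postal code `M9Z` are hypothetical and only serve the illustration.

```python
import pandas as pd

coords = pd.DataFrame(
    {
        "PostalCode": ["M1B", "M1C", "M1E"],
        "Latitude": [43.806686, 43.784535, 43.763573],
        "Longitude": [-79.194353, -79.160497, -79.188711],
    }
)
areas = pd.DataFrame(
    {
        "PostalCode": ["M1B", "M1C", "M9Z"],  # 'M9Z' is a made-up code with no coordinates
        "Borough": ["Scarborough", "Scarborough", "Example Borough"],
    }
)

# A left join keeps every row of `areas`; `indicator` adds a `_merge` column,
# and `validate` raises if a postal code appears more than once on either side
checked = pd.merge(
    areas,
    coords,
    on="PostalCode",
    how="left",
    validate="one_to_one",
    indicator=True,
)

unmatched = checked.loc[checked["_merge"] == "left_only", "PostalCode"].tolist()
print(unmatched)
```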


In [None]:
# reorder the columns

df_toronto_geo_merged_adjusted = df_toronto_geo_merged[['PostalCode','Borough', 'Neighborhood', 'Latitude', 'Longitude']]
df_toronto_geo_merged_adjusted


Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude
0,M1B,Scarborough,"Malvern, Rouge",43.806686,-79.194353
1,M1C,Scarborough,"Rouge Hill, Port Union, Highland Creek",43.784535,-79.160497
2,M1E,Scarborough,"Guildwood, Morningside, West Hill",43.763573,-79.188711
3,M1G,Scarborough,Woburn,43.770992,-79.216917
4,M1H,Scarborough,Cedarbrae,43.773136,-79.239476
...,...,...,...,...,...
98,M9N,York,Weston,43.706876,-79.518188
99,M9P,Etobicoke,Westmount,43.696319,-79.532242
100,M9R,Etobicoke,"Kingsview Village, St. Phillips, Martin Grove ...",43.688905,-79.554724
101,M9V,Etobicoke,"South Steeles, Silverstone, Humbergate, Jamest...",43.739416,-79.588437


<a name='part3'></a>
# Final Assignment Week 3 - Part 3: Explore and cluster the neighborhoods in Toronto

Explore and cluster the neighborhoods in Toronto. You can decide to work with only boroughs that contain the word Toronto and then replicate the same analysis we did to the New York City data. It is up to you. 

Just make sure:

1. to add enough Markdown cells to explain what you decided to do and to report any observations you make. 
2. to generate maps to visualize your neighborhoods and how they cluster together. 

Once you are happy with your analysis, submit a link to the new Notebook on your Github repository. (**3 marks**)

In [None]:

# import random # library for random number generation

# from pandas.io.json import json_normalize # module to transform json file into a pandas dataframe library

In [None]:
# function definition to get the latitude and longitude of a place

def get_lat_lng_2(local):

    geolocator = Nominatim(user_agent='toronto_explorer')
    location = geolocator.geocode(local)
    # geocode returns None when the place cannot be resolved
    if location is None:
        raise ValueError(f'Could not geocode {local!r}')

    return location.latitude, location.longitude


In [None]:
# Toronto's latitude and longitude (geocode once and unpack both values)

toronto_latitude, toronto_longitude = get_lat_lng_2("Toronto, ON")

print(f"Toronto's latitude: {toronto_latitude}\nToronto's longitude: {toronto_longitude}")


Toronto's latitude: 43.6534817
Toronto's longitude: -79.3839347


In [None]:
# plot Toronto's map

toronto_map = folium.Map(location=[toronto_latitude, toronto_longitude], zoom_start=10)
toronto_map


In [None]:
# plot Toronto's map with neighborhoods

for lat, lng, borough, neighborhood in zip(df_toronto_geo_merged_adjusted['Latitude'],
                                           df_toronto_geo_merged_adjusted['Longitude'],
                                           df_toronto_geo_merged_adjusted['Borough'],
                                           df_toronto_geo_merged_adjusted['Neighborhood']):
    label = '{}; {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(toronto_map)

toronto_map


### We will simplify and work only with boroughs whose name contains 'Downtown Toronto'.

In [None]:
# keep only boroughs whose name contains 'Downtown Toronto' and renumber the rows
df_downtown_toronto = df_toronto_geo_merged_adjusted[
    df_toronto_geo_merged_adjusted['Borough'].str.contains('Downtown Toronto')
].reset_index(drop=True)
df_downtown_toronto


Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude
0,M4W,Downtown Toronto,Rosedale,43.679563,-79.377529
1,M4X,Downtown Toronto,"St. James Town, Cabbagetown",43.667967,-79.367675
2,M4Y,Downtown Toronto,Church and Wellesley,43.66586,-79.38316
3,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636
4,M5B,Downtown Toronto,"Garden District, Ryerson",43.657162,-79.378937
5,M5C,Downtown Toronto,St. James Town,43.651494,-79.375418
6,M5E,Downtown Toronto,Berczy Park,43.644771,-79.373306
7,M5G,Downtown Toronto,Central Bay Street,43.657952,-79.387383
8,M5H,Downtown Toronto,"Richmond, Adelaide, King",43.650571,-79.384568
9,M5J,Downtown Toronto,"Harbourfront East, Union Station, Toronto Islands",43.640816,-79.381752
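
Filtering with `str.contains` is a substring match, so it also keeps the cleaned-up label 'Downtown Toronto Stn A' created earlier, whereas an equality test would not. A small sketch on a hypothetical Series (`boroughs` is illustrative, not a notebook variable):

```python
import pandas as pd

boroughs = pd.Series(
    ["Downtown Toronto", "Downtown Toronto Stn A", "East Toronto", "North York"]
)

# Substring match: True for any borough whose name includes the text
contains_mask = boroughs.str.contains("Downtown Toronto", regex=False)

# Exact match: True only for the identical string
equals_mask = boroughs == "Downtown Toronto"

print(contains_mask.tolist())
print(equals_mask.tolist())
```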


In [None]:
# plot Downtown Toronto's map with neighborhoods

map_downtown_toronto = folium.Map(location=[43.651070, -79.347015], zoom_start=12)

for lat, lng, borough, neighborhood in zip(df_downtown_toronto['Latitude'],
                                           df_downtown_toronto['Longitude'],
                                           df_downtown_toronto['Borough'],
                                           df_downtown_toronto['Neighborhood']):
    label = '{}; {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_downtown_toronto)

map_downtown_toronto

### Use the Foursquare API to explore Downtown Toronto neighborhoods

In [None]:
# credentials and parameters to access foursquare

CLIENT_ID = '?????????????????????????????' # your Foursquare ID
CLIENT_SECRET = '?????????????????????????????' # your Foursquare Secret
ACCESS_TOKEN = '?????????????????????????????' # your FourSquare Access Token
VERSION = '20180605'
LIMIT = 100
print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)


In [None]:
# create a list of the neighborhoods and print it

neigh_list_dt = df_downtown_toronto['Neighborhood'].tolist()

print(neigh_list_dt)


['Rosedale', 'St. James Town, Cabbagetown', 'Church and Wellesley', 'Regent Park, Harbourfront', 'Garden District, Ryerson', 'St. James Town', 'Berczy Park', 'Central Bay Street', 'Richmond, Adelaide, King', 'Harbourfront East, Union Station, Toronto Islands', 'Toronto Dominion Centre, Design Exchange', 'Commerce Court, Victoria Hotel', 'University of Toronto, Harbord', 'Kensington Market, Chinatown, Grange Park', 'CN Tower, King and Spadina, Railway Lands, Harbourfront West, Bathurst Quay, South Niagara, Island airport', 'Enclave of M5E', 'First Canadian Place, Underground city', 'Christie']


### The first neighborhood in the Downtown Toronto dataframe is Rosedale; let's explore it within a radius of 500 meters

In [None]:
# create the request URL for the Foursquare API

rosedale_latitude = df_downtown_toronto.loc[0, 'Latitude']
rosedale_longitude = df_downtown_toronto.loc[0, 'Longitude']
radius = 500

url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID,
    CLIENT_SECRET,
    VERSION,
    rosedale_latitude,
    rosedale_longitude,
    radius,
    LIMIT)

url


'https://api.foursquare.com/v2/venues/explore?&client_id=XDLPRXKYJVNWILOQL4TBIH2F00CTNKUQU4FNBRZDLWUQUUWE&client_secret=YYJXQBJD02NBHGZ0SY41WUV5HWORZSHPJLD0YEAC4BZM3XOR&v=20180605&ll=43.6795626,-79.37752940000001&radius=500&limit=100'
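
The same query string can also be assembled from a dictionary, which keeps each parameter next to its name and handles escaping. This is a sketch using only the standard library; `build_explore_url` is a hypothetical helper, and `YOUR_ID` / `YOUR_SECRET` are placeholders rather than real credentials.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def build_explore_url(client_id, client_secret, version, lat, lng, radius, limit):
    """Build a venues/explore URL with the same parameters as the cell above."""
    base = "https://api.foursquare.com/v2/venues/explore"
    query = urlencode(
        {
            "client_id": client_id,
            "client_secret": client_secret,
            "v": version,
            "ll": f"{lat},{lng}",  # latitude and longitude as one comma-separated value
            "radius": radius,
            "limit": limit,
        }
    )
    return f"{base}?{query}"


example_url = build_explore_url(
    "YOUR_ID", "YOUR_SECRET", "20180605", 43.679563, -79.377529, 500, 100
)

# Decode the query string again to confirm every parameter survived
params = parse_qs(urlsplit(example_url).query)
print(params["ll"], params["radius"], params["limit"])
```

Keeping the credentials out of printed output is also worth considering, since a displayed URL includes the client secret.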

In [None]:
# GET request

results = requests.get(url).json()
results


{'meta': {'code': 200, 'requestId': '607f215e1056fb5b1eef5def'},
 'response': {'groups': [{'items': [{'reasons': {'count': 0,
       'items': [{'reasonName': 'globalInteractionReason',
         'summary': 'This spot is popular',
         'type': 'general'}]},
      'referralId': 'e-0-4aff2d47f964a520743522e3-0',
      'venue': {'categories': [{'icon': {'prefix': 'https://ss3.4sqi.net/img/categories_v2/parks_outdoors/playground_',
          'suffix': '.png'},
         'id': '4bf58dd8d48988d1e7941735',
         'name': 'Playground',
         'pluralName': 'Playgrounds',
         'primary': True,
         'shortName': 'Playground'}],
       'id': '4aff2d47f964a520743522e3',
       'location': {'address': '38 Scholfield Ave.',
        'cc': 'CA',
        'city': 'Toronto',
        'country': 'Canada',
        'crossStreet': 'at Edgar Ave.',
        'distance': 327,
        'formattedAddress': ['38 Scholfield Ave. (at Edgar Ave.)',
         'Toronto ON',
         'Canada'],
        'labeled

In [None]:
# items from results

venues_resedale = results['response']['groups'][0]['items']
venues_resedale


[{'reasons': {'count': 0,
   'items': [{'reasonName': 'globalInteractionReason',
     'summary': 'This spot is popular',
     'type': 'general'}]},
  'referralId': 'e-0-4aff2d47f964a520743522e3-0',
  'venue': {'categories': [{'icon': {'prefix': 'https://ss3.4sqi.net/img/categories_v2/parks_outdoors/playground_',
      'suffix': '.png'},
     'id': '4bf58dd8d48988d1e7941735',
     'name': 'Playground',
     'pluralName': 'Playgrounds',
     'primary': True,
     'shortName': 'Playground'}],
   'id': '4aff2d47f964a520743522e3',
   'location': {'address': '38 Scholfield Ave.',
    'cc': 'CA',
    'city': 'Toronto',
    'country': 'Canada',
    'crossStreet': 'at Edgar Ave.',
    'distance': 327,
    'formattedAddress': ['38 Scholfield Ave. (at Edgar Ave.)',
     'Toronto ON',
     'Canada'],
    'labeledLatLngs': [{'label': 'display',
      'lat': 43.68232820227814,
      'lng': -79.37893434347683}],
    'lat': 43.68232820227814,
    'lng': -79.37893434347683,
    'state': 'ON'},
   'name

In [None]:
# JSON to pandas dataframe

venues_downtown_toronto_rosedale = pd.json_normalize(venues_resedale)
venues_downtown_toronto_rosedale


Unnamed: 0,referralId,reasons.count,reasons.items,venue.id,venue.name,venue.location.address,venue.location.crossStreet,venue.location.lat,venue.location.lng,venue.location.labeledLatLngs,venue.location.distance,venue.location.cc,venue.location.city,venue.location.state,venue.location.country,venue.location.formattedAddress,venue.categories,venue.photos.count,venue.photos.groups
0,e-0-4aff2d47f964a520743522e3-0,0,"[{'summary': 'This spot is popular', 'type': '...",4aff2d47f964a520743522e3,Rosedale Park,38 Scholfield Ave.,at Edgar Ave.,43.682328,-79.378934,"[{'label': 'display', 'lat': 43.68232820227814...",327,CA,Toronto,ON,Canada,"[38 Scholfield Ave. (at Edgar Ave.), Toronto O...","[{'id': '4bf58dd8d48988d1e7941735', 'name': 'P...",0,[]
1,e-0-4bd777aa5cf276b054639b00-1,0,"[{'summary': 'This spot is popular', 'type': '...",4bd777aa5cf276b054639b00,Whitney Park,,,43.682036,-79.373788,"[{'label': 'display', 'lat': 43.68203573063681...",408,CA,,,Canada,[Canada],"[{'id': '4bf58dd8d48988d163941735', 'name': 'P...",0,[]
2,e-0-4d0e77df76cc37045715767c-2,0,"[{'summary': 'This spot is popular', 'type': '...",4d0e77df76cc37045715767c,Alex Murray Parkette,107 Crescent Road,South Drive,43.6783,-79.382773,"[{'label': 'display', 'lat': 43.67830024047895...",444,CA,Toronto,ON,Canada,"[107 Crescent Road (South Drive), Toronto ON, ...","[{'id': '4bf58dd8d48988d163941735', 'name': 'P...",0,[]
3,e-0-4ef8f2a3775b54cdb5bdec7c-3,0,"[{'summary': 'This spot is popular', 'type': '...",4ef8f2a3775b54cdb5bdec7c,Milkman's Lane,South Dr,at Glen Rd,43.676352,-79.373842,"[{'label': 'display', 'lat': 43.67635206801555...",464,CA,Toronto,ON,Canada,"[South Dr (at Glen Rd), Toronto ON, Canada]","[{'id': '4bf58dd8d48988d159941735', 'name': 'T...",0,[]


In [None]:
# filter columns

filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
venues_downtown_toronto_rosedale = venues_downtown_toronto_rosedale.loc[:, filtered_columns]

venues_downtown_toronto_rosedale


Unnamed: 0,venue.name,venue.categories,venue.location.lat,venue.location.lng
0,Rosedale Park,"[{'id': '4bf58dd8d48988d1e7941735', 'name': 'P...",43.682328,-79.378934
1,Whitney Park,"[{'id': '4bf58dd8d48988d163941735', 'name': 'P...",43.682036,-79.373788
2,Alex Murray Parkette,"[{'id': '4bf58dd8d48988d163941735', 'name': 'P...",43.6783,-79.382773
3,Milkman's Lane,"[{'id': '4bf58dd8d48988d159941735', 'name': 'T...",43.676352,-79.373842


In [None]:
# define a function that extracts the category of each venue

def extract_categories(row):
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']

    if len(categories_list) == 0:
        return None
    return categories_list[0]['name']


In [None]:
# filter the category for each row

venues_downtown_toronto_rosedale['venue.categories'] = venues_downtown_toronto_rosedale.apply(extract_categories, axis=1)
venues_downtown_toronto_rosedale


Unnamed: 0,venue.name,venue.categories,venue.location.lat,venue.location.lng
0,Rosedale Park,Playground,43.682328,-79.378934
1,Whitney Park,Park,43.682036,-79.373788
2,Alex Murray Parkette,Park,43.6783,-79.382773
3,Milkman's Lane,Trail,43.676352,-79.373842


In [None]:
# rename columns

venues_downtown_toronto_rosedale.columns = [col.split(".")[-1] for col in venues_downtown_toronto_rosedale.columns]
venues_downtown_toronto_rosedale


Unnamed: 0,name,categories,lat,lng
0,Rosedale Park,Playground,43.682328,-79.378934
1,Whitney Park,Park,43.682036,-79.373788
2,Alex Murray Parkette,Park,43.6783,-79.382773
3,Milkman's Lane,Trail,43.676352,-79.373842


### Now let's explore all neighborhoods in Downtown Toronto

In [None]:
# extract all neighborhods in downtown Toronto data in a radius of 500 meters

def extract_downtown_toronto_neigh_data(name, latitude, longitude, radius=500):
    
    venues_downtown_toronto_list=[]

    for name, lat, lng in zip(name, latitude, longitude):

        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request and take its items
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # keep only the relevant fields for each nearby venue
        venues_downtown_toronto_list.append([(
            name, 
            lat, 
            lng, 
            item['venue']['name'], 
            item['venue']['location']['lat'], 
            item['venue']['location']['lng'],  
            item['venue']['categories'][0]['name']) for item in results])

    venues_downtown_toronto = pd.DataFrame([item for row in venues_downtown_toronto_list for item in row])
    venues_downtown_toronto.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(venues_downtown_toronto)
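
The list comprehension above indexes `item['venue']['categories'][0]` directly, which would raise an `IndexError` for a venue returned without a category. A minimal sketch of a more defensive variant follows. `flatten_items` and the sample payload are hypothetical, shaped only after the fields the function above reads.

```python
def flatten_items(neighborhood, lat, lng, items):
    """Turn the 'items' list of one explore response into flat tuples."""
    rows = []
    for item in items:
        venue = item['venue']
        categories = venue.get('categories', [])
        # fall back to None when a venue has no category
        category = categories[0]['name'] if categories else None
        rows.append((
            neighborhood, lat, lng,
            venue['name'],
            venue['location']['lat'],
            venue['location']['lng'],
            category,
        ))
    return rows

# hypothetical payload with the same nesting as the fields accessed above
fake_items = [
    {'venue': {'name': 'Venue A', 'location': {'lat': 43.68, 'lng': -79.38},
               'categories': [{'name': 'Park'}]}},
    {'venue': {'name': 'Venue B', 'location': {'lat': 43.67, 'lng': -79.37},
               'categories': []}},
]
rows = flatten_items('Rosedale', 43.679563, -79.377529, fake_items)
print(rows)
```

Keeping the parsing separate from the HTTP request also makes it possible to test it without calling the API.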


In [None]:
# extract Downtown Toronto's venues

downtown_toronto_venues = extract_downtown_toronto_neigh_data(name=df_downtown_toronto['Neighborhood'],
                                                              latitude=df_downtown_toronto['Latitude'],
                                                              longitude=df_downtown_toronto['Longitude']
                                                              )


Rosedale
St. James Town, Cabbagetown
Church and Wellesley
Regent Park, Harbourfront
Garden District, Ryerson
St. James Town
Berczy Park
Central Bay Street
Richmond, Adelaide, King
Harbourfront East, Union Station, Toronto Islands
Toronto Dominion Centre, Design Exchange
Commerce Court, Victoria Hotel
University of Toronto, Harbord
Kensington Market, Chinatown, Grange Park
CN Tower, King and Spadina, Railway Lands, Harbourfront West, Bathurst Quay, South Niagara, Island airport
Enclave of M5E
First Canadian Place, Underground city
Christie


In [None]:
# print the first 5 rows of Downtown Toronto's neighborhoods and venues dataframe

downtown_toronto_venues.head()


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Rosedale,43.679563,-79.377529,Rosedale Park,43.682328,-79.378934,Playground
1,Rosedale,43.679563,-79.377529,Whitney Park,43.682036,-79.373788,Park
2,Rosedale,43.679563,-79.377529,Alex Murray Parkette,43.6783,-79.382773,Park
3,Rosedale,43.679563,-79.377529,Milkman's Lane,43.676352,-79.373842,Trail
4,"St. James Town, Cabbagetown",43.667967,-79.367675,F'Amelia,43.667536,-79.368613,Italian Restaurant


In [None]:
# print number of rows and columns of Downtown Toronto's neighborhoods and venues dataframe

print(f'Downtown Toronto\'s dataframe rows: {downtown_toronto_venues.shape[0]}\nDowntown Toronto\'s dataframe columns: {downtown_toronto_venues.shape[1]}')


Downtown Toronto's dataframe rows: 1184
Downtown Toronto's dataframe columns: 7


In [None]:
# show the number of venues for each neighborhood

downtown_toronto_venues.groupby('Neighborhood').count()


Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Berczy Park,58,58,58,58,58,58
"CN Tower, King and Spadina, Railway Lands, Harbourfront West, Bathurst Quay, South Niagara, Island airport",13,13,13,13,13,13
Central Bay Street,61,61,61,61,61,61
Christie,16,16,16,16,16,16
Church and Wellesley,75,75,75,75,75,75
"Commerce Court, Victoria Hotel",100,100,100,100,100,100
Enclave of M5E,97,97,97,97,97,97
"First Canadian Place, Underground city",100,100,100,100,100,100
"Garden District, Ryerson",100,100,100,100,100,100
"Harbourfront East, Union Station, Toronto Islands",100,100,100,100,100,100


### Now let's prepare our dataframe for further analysis

In [None]:
# one-hot encode the venue categories

downtown_toronto_venues_one = pd.get_dummies(downtown_toronto_venues[['Venue Category']], prefix='', prefix_sep='')
downtown_toronto_venues_one


Unnamed: 0,Airport,Airport Food Court,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Antique Shop,Aquarium,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,BBQ Joint,Baby Store,Bagel Shop,Bakery,Bank,Bar,Baseball Stadium,Basketball Stadium,Beach,Bed & Breakfast,Beer Bar,Beer Store,Belgian Restaurant,Bistro,Boat or Ferry,Bookstore,Brazilian Restaurant,Breakfast Spot,Brewery,Bubble Tea Shop,Building,Burger Joint,Burrito Place,Butcher,Café,Candy Store,Caribbean Restaurant,...,Restaurant,Roof Deck,Sake Bar,Salad Place,Salon / Barbershop,Sandwich Place,Scenic Lookout,Sculpture Garden,Seafood Restaurant,Shoe Store,Shopping Mall,Skating Rink,Smoke Shop,Smoothie Shop,Snack Place,Soup Place,Spa,Speakeasy,Sporting Goods Shop,Sports Bar,Steakhouse,Strip Club,Supermarket,Sushi Restaurant,Taco Place,Tailor Shop,Taiwanese Restaurant,Tanning Salon,Tea Room,Thai Restaurant,Theater,Theme Restaurant,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Wine Bar,Wings Joint,Yoga Studio
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1179,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1180,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1181,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1182,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
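
Note that pandas 2.0 and later return boolean columns from `pd.get_dummies` by default, whereas the output above shows 0/1 integers. Passing `dtype=int` keeps the integer encoding. A small sketch on made-up categories:

```python
import pandas as pd

# hypothetical venue categories
toy = pd.DataFrame({'Venue Category': ['Park', 'Trail', 'Park']})

# dtype=int forces 0/1 columns regardless of the pandas default
toy_one = pd.get_dummies(toy[['Venue Category']], prefix='', prefix_sep='', dtype=int)
print(toy_one)
```

The mean taken in the next step gives the same frequencies with boolean columns, so this only matters if the displayed values should stay 0 and 1.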


In [None]:
# add neighborhood column back to dataframe

downtown_toronto_venues_one['Neighborhood'] = downtown_toronto_venues['Neighborhood'] 
downtown_toronto_venues_one['Neighborhood']


0                          Rosedale
1                          Rosedale
2                          Rosedale
3                          Rosedale
4       St. James Town, Cabbagetown
                   ...             
1179                       Christie
1180                       Christie
1181                       Christie
1182                       Christie
1183                       Christie
Name: Neighborhood, Length: 1184, dtype: object

In [None]:
# group rows by neighborhood, take the mean frequency of each category and reset the index

downtown_toronto_venues_one_grouped = downtown_toronto_venues_one.groupby('Neighborhood').mean().reset_index()
downtown_toronto_venues_one_grouped


Unnamed: 0,Neighborhood,Airport,Airport Food Court,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Antique Shop,Aquarium,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,BBQ Joint,Baby Store,Bagel Shop,Bakery,Bank,Bar,Baseball Stadium,Basketball Stadium,Beach,Bed & Breakfast,Beer Bar,Beer Store,Belgian Restaurant,Bistro,Boat or Ferry,Bookstore,Brazilian Restaurant,Breakfast Spot,Brewery,Bubble Tea Shop,Building,Burger Joint,Burrito Place,Butcher,Café,Candy Store,...,Restaurant,Roof Deck,Sake Bar,Salad Place,Salon / Barbershop,Sandwich Place,Scenic Lookout,Sculpture Garden,Seafood Restaurant,Shoe Store,Shopping Mall,Skating Rink,Smoke Shop,Smoothie Shop,Snack Place,Soup Place,Spa,Speakeasy,Sporting Goods Shop,Sports Bar,Steakhouse,Strip Club,Supermarket,Sushi Restaurant,Taco Place,Tailor Shop,Taiwanese Restaurant,Tanning Salon,Tea Room,Thai Restaurant,Theater,Theme Restaurant,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Wine Bar,Wings Joint,Yoga Studio
0,Berczy Park,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.017241,0.0,0.0,0.0,0.0,0.0,0.0,0.017241,0.034483,0.0,0.0,0.0,0.017241,0.017241,0.0,0.034483,0.0,0.017241,0.017241,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.017241,0.0,...,0.034483,0.0,0.0,0.0,0.0,0.017241,0.0,0.0,0.034483,0.0,0.017241,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.017241,0.0,0.017241,0.0,0.0,0.0,0.0,0.017241,0.0,0.0,0.0,0.017241,0.0,0.0,0.0,0.0,0.017241,0.0,0.0,0.0,0.0,0.0
1,"CN Tower, King and Spadina, Railway Lands, Har...",0.076923,0.076923,0.076923,0.076923,0.153846,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.076923,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.076923,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.076923,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,Central Bay Street,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.016393,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.016393,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.032787,0.0,0.032787,0.0,0.0,0.04918,0.0,...,0.016393,0.0,0.0,0.032787,0.0,0.065574,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.016393,0.0,0.0,0.0,0.0,0.0,0.0,0.016393,0.0,0.0,0.0,0.0,0.0,0.016393,0.0,0.0,0.0,0.0,0.016393,0.0,0.0,0.016393,0.0,0.016393
3,Christie,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1875,0.0625,...,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Church and Wellesley,0.0,0.0,0.0,0.0,0.0,0.013333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.013333,0.0,0.0,0.0,0.0,0.013333,0.0,0.013333,0.0,0.013333,0.0,0.013333,0.013333,0.0,0.013333,0.0,...,0.04,0.0,0.013333,0.0,0.013333,0.0,0.0,0.013333,0.0,0.0,0.0,0.0,0.013333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.013333,0.013333,0.0,0.066667,0.0,0.0,0.0,0.0,0.0,0.013333,0.013333,0.013333,0.0,0.0,0.0,0.0,0.0,0.0,0.013333,0.026667
5,"Commerce Court, Victoria Hotel",0.0,0.0,0.0,0.0,0.0,0.03,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.06,0.0,...,0.07,0.0,0.0,0.01,0.01,0.01,0.0,0.0,0.03,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.01,0.02,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.01,0.0,0.0
6,Enclave of M5E,0.0,0.0,0.0,0.0,0.0,0.0,0.010309,0.0,0.020619,0.0,0.0,0.0,0.0,0.0,0.0,0.010309,0.020619,0.0,0.0,0.0,0.010309,0.0,0.0,0.030928,0.0,0.010309,0.010309,0.0,0.010309,0.0,0.010309,0.0,0.0,0.0,0.0,0.0,0.0,0.030928,0.0,...,0.030928,0.0,0.0,0.0,0.0,0.020619,0.0,0.0,0.041237,0.0,0.010309,0.0,0.0,0.0,0.0,0.010309,0.0,0.0,0.010309,0.0,0.0,0.0,0.0,0.0,0.0,0.010309,0.0,0.0,0.0,0.010309,0.0,0.0,0.0,0.0,0.010309,0.0,0.0,0.0,0.0,0.010309
7,"First Canadian Place, Underground city",0.0,0.0,0.0,0.0,0.0,0.03,0.0,0.0,0.01,0.0,0.0,0.03,0.0,0.0,0.0,0.0,0.01,0.0,0.02,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.07,0.0,...,0.04,0.0,0.0,0.03,0.01,0.01,0.0,0.0,0.03,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.03,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.01,0.02,0.01,0.0,0.0,0.01,0.01,0.0,0.0,0.01,0.0,0.0
8,"Garden District, Ryerson",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.03,0.0,0.02,0.01,0.0,0.03,0.0,...,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.01,0.01,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.01,0.01,0.02,0.0,0.0,0.0,0.0,0.01,0.01,0.01,0.0,0.0
9,"Harbourfront East, Union Station, Toronto Islands",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.02,0.02,0.01,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.03,0.01,0.0,0.0,0.0,0.0,0.04,0.0,...,0.03,0.01,0.0,0.01,0.0,0.01,0.03,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.03,0.01,0.01,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.01,0.01,0.0,0.0,0.01,0.0,0.0
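
Taking the mean of one-hot columns per neighborhood yields the share of venues in each category, so each row sums to 1. The same table can be obtained with `pd.crosstab(..., normalize='index')`, shown here on made-up data as a cross-check rather than as the notebook's method.

```python
import pandas as pd

# hypothetical venues in two neighborhoods
toy = pd.DataFrame({
    'Neighborhood': ['A', 'A', 'A', 'B'],
    'Venue Category': ['Park', 'Park', 'Trail', 'Café'],
})

# one-hot encode, then average per neighborhood (as in the cells above)
one_hot = pd.get_dummies(toy['Venue Category'], dtype=int)
one_hot['Neighborhood'] = toy['Neighborhood']
grouped = one_hot.groupby('Neighborhood').mean()

# alternative: row-normalized contingency table
shares = pd.crosstab(toy['Neighborhood'], toy['Venue Category'], normalize='index')

print(grouped)
print(shares)
```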


In [None]:
# print number of rows and columns of the one-hot encoded, grouped dataframe

print(f'Downtown Toronto\'s one-hot encoded and grouped by neighborhood rows: {downtown_toronto_venues_one_grouped.shape[0]}\nDowntown Toronto\'s one-hot encoded and grouped by neighborhood columns: {downtown_toronto_venues_one_grouped.shape[1]}')


Downtown Toronto's one-hot encoded and grouped by neighborhood rows: 18
Downtown Toronto's one-hot encoded and grouped by neighborhood columns: 205


### Initial analysis

In [None]:
# Print each neighborhood with top 10 venues

neighborhood_top_venues = 10

for neighborhood in downtown_toronto_venues_one_grouped['Neighborhood']:
    print(f'---> Top places at {neighborhood} <---')
    temp = downtown_toronto_venues_one_grouped[downtown_toronto_venues_one_grouped['Neighborhood'] == neighborhood].T.reset_index()
    temp.columns = ['venue', 'frequency']
    temp = temp.iloc[1:]
    temp['frequency'] = temp['frequency'].astype(float)
    temp = temp.round({'frequency': 2})
    print(temp.sort_values('frequency', ascending=False).reset_index(drop=True).head(neighborhood_top_venues))
    print('\n')


---> Top places at Berczy Park <---
                venue  frequency
0         Coffee Shop       0.09
1        Cocktail Bar       0.05
2      Farmers Market       0.03
3              Bakery       0.03
4            Pharmacy       0.03
5            Beer Bar       0.03
6         Cheese Shop       0.03
7  Seafood Restaurant       0.03
8          Restaurant       0.03
9        Gourmet Shop       0.02


---> Top places at CN Tower, King and Spadina, Railway Lands, Harbourfront West, Bathurst Quay, South Niagara, Island airport <---
                 venue  frequency
0     Airport Terminal       0.15
1              Airport       0.08
2                  Bar       0.08
3     Sculpture Garden       0.08
4  Rental Car Location       0.08
5                Plane       0.08
6   Airport Food Court       0.08
7          Coffee Shop       0.08
8        Boat or Ferry       0.08
9      Harbor / Marina       0.08


---> Top places at Central Bay Street <---
                venue  frequency
0         Coffee

In [None]:
# define a function that returns the most common venue categories of a row

def return_most_common_venues(row, neighborhood_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:neighborhood_top_venues]
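
As a quick illustration, the function can be tried on a single made-up row whose first entry is the neighborhood name, followed by category frequencies. The row below is hypothetical and uses distinct frequencies so the ordering is unambiguous.

```python
import pandas as pd

def return_most_common_venues(row, neighborhood_top_venues):
    # skip the first entry (the neighborhood name) and sort the frequencies
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    return row_categories_sorted.index.values[0:neighborhood_top_venues]

# hypothetical row: neighborhood name followed by category frequencies
row = pd.Series({'Neighborhood': 'Toy Town', 'Park': 0.2, 'Café': 0.5, 'Trail': 0.3})
top = return_most_common_venues(row, 2)
print(top)
```

When several categories share the same frequency, their relative order is not guaranteed, which may explain small ordering differences between the printed lists above and the table below.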


In [None]:
three_first_elements_indicator = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(neighborhood_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, three_first_elements_indicator[ind]))
    except IndexError:
        # positions after the third use the 'th' suffix
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = downtown_toronto_venues_one_grouped['Neighborhood']

for ind in np.arange(downtown_toronto_venues_one_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(downtown_toronto_venues_one_grouped.iloc[ind, :], neighborhood_top_venues)

neighborhoods_venues_sorted


Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Berczy Park,Coffee Shop,Cocktail Bar,Beer Bar,Seafood Restaurant,Farmers Market,Bakery,Restaurant,Cheese Shop,Pharmacy,Eastern European Restaurant
1,"CN Tower, King and Spadina, Railway Lands, Har...",Airport Terminal,Airport,Bar,Coffee Shop,Plane,Rental Car Location,Sculpture Garden,Boat or Ferry,Airport Food Court,Airport Lounge
2,Central Bay Street,Coffee Shop,Sandwich Place,Café,Italian Restaurant,Bubble Tea Shop,Burger Joint,Salad Place,Portuguese Restaurant,Poke Place,Restaurant
3,Christie,Grocery Store,Café,Park,Coffee Shop,Italian Restaurant,Baby Store,Restaurant,Athletics & Sports,Candy Store,Nightclub
4,Church and Wellesley,Sushi Restaurant,Coffee Shop,Japanese Restaurant,Gay Bar,Restaurant,Yoga Studio,Mediterranean Restaurant,Pub,Hotel,Grocery Store
5,"Commerce Court, Victoria Hotel",Coffee Shop,Restaurant,Café,Hotel,Gym,Italian Restaurant,Seafood Restaurant,Deli / Bodega,Japanese Restaurant,American Restaurant
6,Enclave of M5E,Coffee Shop,Seafood Restaurant,Café,Cocktail Bar,Beer Bar,Japanese Restaurant,Italian Restaurant,Hotel,Restaurant,Cheese Shop
7,"First Canadian Place, Underground city",Coffee Shop,Café,Hotel,Japanese Restaurant,Restaurant,Gym,Steakhouse,Asian Restaurant,Deli / Bodega,Seafood Restaurant
8,"Garden District, Ryerson",Coffee Shop,Clothing Store,Bubble Tea Shop,Cosmetics Shop,Japanese Restaurant,Café,Middle Eastern Restaurant,Ramen Restaurant,Electronics Store,Hotel
9,"Harbourfront East, Union Station, Toronto Islands",Coffee Shop,Aquarium,Café,Hotel,Restaurant,Brewery,Scenic Lookout,Sporting Goods Shop,Fried Chicken Joint,Music Venue
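
The `try`/`except` in the cell above works for ten columns, but a general suffix rule also has to handle 11th to 13th as well as 21st, 22nd and so on. A small helper (hypothetical, not part of the notebook) could look like this:

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st'."""
    if 10 <= n % 100 <= 20:
        # 11th, 12th, 13th (and the rest of the teens) always take 'th'
        suffix = 'th'
    else:
        suffix = {1: 'st', 2: 'nd', 3: 'rd'}.get(n % 10, 'th')
    return f'{n}{suffix}'

# build the same column names as in the cell above
columns = ['Neighborhood'] + [f'{ordinal(i)} Most Common Venue' for i in range(1, 11)]
print(columns)
```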


### Clustering our Downtown Toronto neighborhood venues' data

In [None]:
# drop the non-numeric 'Neighborhood' column before clustering

downtown_toronto_venues_one_grouped_clustering = downtown_toronto_venues_one_grouped.drop(columns='Neighborhood')
downtown_toronto_venues_one_grouped_clustering


Unnamed: 0,Airport,Airport Food Court,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Antique Shop,Aquarium,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,BBQ Joint,Baby Store,Bagel Shop,Bakery,Bank,Bar,Baseball Stadium,Basketball Stadium,Beach,Bed & Breakfast,Beer Bar,Beer Store,Belgian Restaurant,Bistro,Boat or Ferry,Bookstore,Brazilian Restaurant,Breakfast Spot,Brewery,Bubble Tea Shop,Building,Burger Joint,Burrito Place,Butcher,Café,Candy Store,Caribbean Restaurant,...,Restaurant,Roof Deck,Sake Bar,Salad Place,Salon / Barbershop,Sandwich Place,Scenic Lookout,Sculpture Garden,Seafood Restaurant,Shoe Store,Shopping Mall,Skating Rink,Smoke Shop,Smoothie Shop,Snack Place,Soup Place,Spa,Speakeasy,Sporting Goods Shop,Sports Bar,Steakhouse,Strip Club,Supermarket,Sushi Restaurant,Taco Place,Tailor Shop,Taiwanese Restaurant,Tanning Salon,Tea Room,Thai Restaurant,Theater,Theme Restaurant,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Wine Bar,Wings Joint,Yoga Studio
0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.017241,0.0,0.0,0.0,0.0,0.0,0.0,0.017241,0.034483,0.0,0.0,0.0,0.017241,0.017241,0.0,0.034483,0.0,0.017241,0.017241,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.017241,0.0,0.0,...,0.034483,0.0,0.0,0.0,0.0,0.017241,0.0,0.0,0.034483,0.0,0.017241,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.017241,0.0,0.017241,0.0,0.0,0.0,0.0,0.017241,0.0,0.0,0.0,0.017241,0.0,0.0,0.0,0.0,0.017241,0.0,0.0,0.0,0.0,0.0
1,0.076923,0.076923,0.076923,0.076923,0.153846,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.076923,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.076923,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.076923,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.016393,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.016393,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.032787,0.0,0.032787,0.0,0.0,0.04918,0.0,0.0,...,0.016393,0.0,0.0,0.032787,0.0,0.065574,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.016393,0.0,0.0,0.0,0.0,0.0,0.0,0.016393,0.0,0.0,0.0,0.0,0.0,0.016393,0.0,0.0,0.0,0.0,0.016393,0.0,0.0,0.016393,0.0,0.016393
3,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1875,0.0625,0.0,...,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,0.0,0.0,0.0,0.0,0.0,0.013333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.013333,0.0,0.0,0.0,0.0,0.013333,0.0,0.013333,0.0,0.013333,0.0,0.013333,0.013333,0.0,0.013333,0.0,0.013333,...,0.04,0.0,0.013333,0.0,0.013333,0.0,0.0,0.013333,0.0,0.0,0.0,0.0,0.013333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.013333,0.013333,0.0,0.066667,0.0,0.0,0.0,0.0,0.0,0.013333,0.013333,0.013333,0.0,0.0,0.0,0.0,0.0,0.0,0.013333,0.026667
5,0.0,0.0,0.0,0.0,0.0,0.03,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.06,0.0,0.0,...,0.07,0.0,0.0,0.01,0.01,0.01,0.0,0.0,0.03,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.01,0.02,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.01,0.0,0.0
6,0.0,0.0,0.0,0.0,0.0,0.0,0.010309,0.0,0.020619,0.0,0.0,0.0,0.0,0.0,0.0,0.010309,0.020619,0.0,0.0,0.0,0.010309,0.0,0.0,0.030928,0.0,0.010309,0.010309,0.0,0.010309,0.0,0.010309,0.0,0.0,0.0,0.0,0.0,0.0,0.030928,0.0,0.0,...,0.030928,0.0,0.0,0.0,0.0,0.020619,0.0,0.0,0.041237,0.0,0.010309,0.0,0.0,0.0,0.0,0.010309,0.0,0.0,0.010309,0.0,0.0,0.0,0.0,0.0,0.0,0.010309,0.0,0.0,0.0,0.010309,0.0,0.0,0.0,0.0,0.010309,0.0,0.0,0.0,0.0,0.010309
7,0.0,0.0,0.0,0.0,0.0,0.03,0.0,0.0,0.01,0.0,0.0,0.03,0.0,0.0,0.0,0.0,0.01,0.0,0.02,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.07,0.0,0.0,...,0.04,0.0,0.0,0.03,0.01,0.01,0.0,0.0,0.03,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.03,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.01,0.02,0.01,0.0,0.0,0.01,0.01,0.0,0.0,0.01,0.0,0.0
8,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.03,0.0,0.02,0.01,0.0,0.03,0.0,0.0,...,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.01,0.01,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.01,0.01,0.02,0.0,0.0,0.0,0.0,0.01,0.01,0.01,0.0,0.0
9,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.02,0.02,0.01,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.03,0.01,0.0,0.0,0.0,0.0,0.04,0.0,0.0,...,0.03,0.01,0.0,0.01,0.0,0.01,0.03,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.03,0.01,0.01,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.01,0.01,0.0,0.0,0.01,0.0,0.0


In [None]:
# run k-means clustering with 5 clusters

kclusters = 5
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(downtown_toronto_venues_one_grouped_clustering)


In [None]:
# cluster labels generated for the first 10 rows

kmeans.labels_[0:10]


array([0, 4, 0, 1, 0, 0, 0, 0, 0, 0], dtype=int32)
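
A quick way to see how unevenly the neighborhoods are spread across clusters is to count the labels. The sketch below uses only the ten labels printed above; on the full data, `kmeans.labels_` would be passed instead.

```python
import numpy as np

# the first ten labels shown in the output above
labels = np.array([0, 4, 0, 1, 0, 0, 0, 0, 0, 0])

# count how many rows fall into each of the 5 clusters
cluster_sizes = np.bincount(labels, minlength=5)
print(cluster_sizes)
```

Most of these rows land in cluster 0, with single neighborhoods in clusters 1 and 4.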

In [None]:
# add clustering labels

neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)


In [None]:
# merge the cluster labels and top venues into the Downtown Toronto dataframe

toronto_downtown_merged = df_downtown_toronto
toronto_downtown_merged = toronto_downtown_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')
toronto_downtown_merged

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M4W,Downtown Toronto,Rosedale,43.679563,-79.377529,2,Park,Playground,Trail,Dessert Shop,Escape Room,Electronics Store,Eastern European Restaurant,Dumpling Restaurant,Donut Shop,Doner Restaurant
1,M4X,Downtown Toronto,"St. James Town, Cabbagetown",43.667967,-79.367675,0,Café,Coffee Shop,Pub,Chinese Restaurant,Bakery,Pizza Place,Italian Restaurant,Restaurant,Sandwich Place,Convenience Store
2,M4Y,Downtown Toronto,Church and Wellesley,43.66586,-79.38316,0,Sushi Restaurant,Coffee Shop,Japanese Restaurant,Gay Bar,Restaurant,Yoga Studio,Mediterranean Restaurant,Pub,Hotel,Grocery Store
3,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636,0,Coffee Shop,Park,Café,Bakery,Breakfast Spot,Pub,Theater,Restaurant,Beer Store,Event Space
4,M5B,Downtown Toronto,"Garden District, Ryerson",43.657162,-79.378937,0,Coffee Shop,Clothing Store,Bubble Tea Shop,Cosmetics Shop,Japanese Restaurant,Café,Middle Eastern Restaurant,Ramen Restaurant,Electronics Store,Hotel
5,M5C,Downtown Toronto,St. James Town,43.651494,-79.375418,0,Café,Coffee Shop,Cosmetics Shop,Cocktail Bar,Clothing Store,Restaurant,Park,Moroccan Restaurant,Creperie,Beer Bar
6,M5E,Downtown Toronto,Berczy Park,43.644771,-79.373306,0,Coffee Shop,Cocktail Bar,Beer Bar,Seafood Restaurant,Farmers Market,Bakery,Restaurant,Cheese Shop,Pharmacy,Eastern European Restaurant
7,M5G,Downtown Toronto,Central Bay Street,43.657952,-79.387383,0,Coffee Shop,Sandwich Place,Café,Italian Restaurant,Bubble Tea Shop,Burger Joint,Salad Place,Portuguese Restaurant,Poke Place,Restaurant
8,M5H,Downtown Toronto,"Richmond, Adelaide, King",43.650571,-79.384568,0,Coffee Shop,Café,Restaurant,Clothing Store,Deli / Bodega,Gym,Thai Restaurant,Hotel,Cosmetics Shop,Steakhouse
9,M5J,Downtown Toronto,"Harbourfront East, Union Station, Toronto Islands",43.640816,-79.381752,0,Coffee Shop,Aquarium,Café,Hotel,Restaurant,Brewery,Scenic Lookout,Sporting Goods Shop,Fried Chicken Joint,Music Venue
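
`join` with `on='Neighborhood'` performs a left join, so a neighborhood for which no venues were returned would end up with `NaN` in `Cluster Labels` and in the venue columns, and the label column would become float. That does not appear to happen in the rows shown above, but a check like the following sketch, run here on made-up frames, can confirm it.

```python
import pandas as pd

# hypothetical stand-ins for df_downtown_toronto and neighborhoods_venues_sorted
left = pd.DataFrame({'Neighborhood': ['A', 'B', 'C']})
right = pd.DataFrame({
    'Cluster Labels': [0, 1],
    'Neighborhood': ['A', 'B'],
    '1st Most Common Venue': ['Park', 'Café'],
})

merged = left.join(right.set_index('Neighborhood'), on='Neighborhood')
print(merged)

# rows without a match have no cluster label
missing = merged[merged['Cluster Labels'].isna()]
print(missing['Neighborhood'].tolist())
```

If such rows exist, they could be dropped with `dropna(subset=['Cluster Labels'])` before the labels are used as list indices for the map colors.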


In [None]:
# Now let's visualize the clusters
# create map
map_clusters = folium.Map(location=[toronto_latitude, toronto_longitude], zoom_start=12)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.prism(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(toronto_downtown_merged['Latitude'], toronto_downtown_merged['Longitude'], toronto_downtown_merged['Neighborhood'], toronto_downtown_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters
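
Since the labels run from 0 to `kclusters - 1`, the expression `rainbow[cluster-1]` maps label 0 to index -1, which is the last color in the list. Each label still gets its own color, as this sketch with placeholder hex values shows.

```python
# placeholder colors standing in for the `rainbow` list
rainbow = ['#ff0000', '#00ff00', '#0000ff', '#ffff00', '#ff00ff']

# color assigned to each cluster label 0..4
assigned = [rainbow[cluster - 1] for cluster in range(5)]

for cluster, color in enumerate(assigned):
    print(cluster, color)
```

Indexing with `rainbow[cluster]` would give the same one-to-one mapping without relying on negative indexing.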

### Let's examine our 5 generated clusters

In [None]:
# Cluster 1

toronto_downtown_merged.loc[toronto_downtown_merged['Cluster Labels'] == 0, toronto_downtown_merged.columns[[1] + list(range(5, toronto_downtown_merged.shape[1]))]]


Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,Downtown Toronto,0,Café,Coffee Shop,Pub,Chinese Restaurant,Bakery,Pizza Place,Italian Restaurant,Restaurant,Sandwich Place,Convenience Store
2,Downtown Toronto,0,Sushi Restaurant,Coffee Shop,Japanese Restaurant,Gay Bar,Restaurant,Yoga Studio,Mediterranean Restaurant,Pub,Hotel,Grocery Store
3,Downtown Toronto,0,Coffee Shop,Park,Café,Bakery,Breakfast Spot,Pub,Theater,Restaurant,Beer Store,Event Space
4,Downtown Toronto,0,Coffee Shop,Clothing Store,Bubble Tea Shop,Cosmetics Shop,Japanese Restaurant,Café,Middle Eastern Restaurant,Ramen Restaurant,Electronics Store,Hotel
5,Downtown Toronto,0,Café,Coffee Shop,Cosmetics Shop,Cocktail Bar,Clothing Store,Restaurant,Park,Moroccan Restaurant,Creperie,Beer Bar
6,Downtown Toronto,0,Coffee Shop,Cocktail Bar,Beer Bar,Seafood Restaurant,Farmers Market,Bakery,Restaurant,Cheese Shop,Pharmacy,Eastern European Restaurant
7,Downtown Toronto,0,Coffee Shop,Sandwich Place,Café,Italian Restaurant,Bubble Tea Shop,Burger Joint,Salad Place,Portuguese Restaurant,Poke Place,Restaurant
8,Downtown Toronto,0,Coffee Shop,Café,Restaurant,Clothing Store,Deli / Bodega,Gym,Thai Restaurant,Hotel,Cosmetics Shop,Steakhouse
9,Downtown Toronto,0,Coffee Shop,Aquarium,Café,Hotel,Restaurant,Brewery,Scenic Lookout,Sporting Goods Shop,Fried Chicken Joint,Music Venue
10,Downtown Toronto,0,Coffee Shop,Hotel,Café,Italian Restaurant,Salad Place,Seafood Restaurant,Restaurant,Japanese Restaurant,Breakfast Spot,Asian Restaurant


In [None]:
# Cluster 2

toronto_downtown_merged.loc[toronto_downtown_merged['Cluster Labels'] == 1, toronto_downtown_merged.columns[[1] + list(range(5, toronto_downtown_merged.shape[1]))]]


Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
17,Downtown Toronto,1,Grocery Store,Café,Park,Coffee Shop,Italian Restaurant,Baby Store,Restaurant,Athletics & Sports,Candy Store,Nightclub


In [None]:
# Cluster 3

toronto_downtown_merged.loc[toronto_downtown_merged['Cluster Labels'] == 2, toronto_downtown_merged.columns[[1] + list(range(5, toronto_downtown_merged.shape[1]))]]


Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Downtown Toronto,2,Park,Playground,Trail,Dessert Shop,Escape Room,Electronics Store,Eastern European Restaurant,Dumpling Restaurant,Donut Shop,Doner Restaurant


In [None]:
# Cluster 4

toronto_downtown_merged.loc[toronto_downtown_merged['Cluster Labels'] == 3, toronto_downtown_merged.columns[[1] + list(range(5, toronto_downtown_merged.shape[1]))]]


Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
12,Downtown Toronto,3,Café,Bookstore,Japanese Restaurant,Bar,Bakery,Yoga Studio,Sushi Restaurant,Sandwich Place,Pub,Poutine Place


In [None]:
# Cluster 5

toronto_downtown_merged.loc[toronto_downtown_merged['Cluster Labels'] == 4, toronto_downtown_merged.columns[[1] + list(range(5, toronto_downtown_merged.shape[1]))]]


Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
14,Downtown Toronto,4,Airport Terminal,Airport,Bar,Coffee Shop,Plane,Rental Car Location,Sculpture Garden,Boat or Ferry,Airport Food Court,Airport Lounge
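
The five cells above differ only in the label value, so the same inspection can be written as a loop. A minimal sketch on a made-up frame with the same key columns (in the notebook the frame would be `toronto_downtown_merged`):

```python
import pandas as pd

# hypothetical stand-in for toronto_downtown_merged
toy_merged = pd.DataFrame({
    'Neighborhood': ['A', 'B', 'C'],
    'Cluster Labels': [0, 0, 1],
    '1st Most Common Venue': ['Coffee Shop', 'Café', 'Park'],
})

# collect the neighborhoods of each cluster, numbering clusters from 1
cluster_members = {}
for label, group in toy_merged.groupby('Cluster Labels'):
    cluster_members[int(label)] = group['Neighborhood'].tolist()
    print(f'Cluster {int(label) + 1}: {cluster_members[int(label)]}')
```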
