# Segment & Cluster Toronto Neighborhoods
by: Diardano Raihan (Indonesia)
<hr>

Objective:
- Previously, we retrieved the latitude and longitude coordinates in the `Pre2_Coordinate_Retrieval.ipynb` notebook.

- Now, we will __explore__, __segment__, and __group neighborhoods__ into clusters to find similar neighborhoods in __Toronto City__.

Explore and cluster the neighborhoods in Toronto. You can decide to work with only boroughs that contain the word Toronto and then replicate the same analysis we did to the New York City data. It is up to you.

Just make sure:

- to add enough Markdown cells to explain what you decided to do and to report any observations you make.
- to generate maps to visualize your neighborhoods and how they cluster together.

Once you are happy with your analysis, submit a link to the new Notebook on your GitHub repository.

In [2]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%config IPCompleter.greedy=True
%config IPCompleter.use_jedi=False

## Load Data

Let's import `toronto_poscode_latlng.csv` and turn it into a dataframe:

In [19]:
toronto_df = pd.read_csv('datasets/toronto_poscode_latlng.csv')
print(toronto_df.shape)
toronto_df.head()

(103, 5)


Unnamed: 0,PostalCode,Borough,Neighbourhood,latitude,longitude
0,M3A,North York,Parkwoods,43.75245,-79.32991
1,M4A,North York,Victoria Village,43.73057,-79.31306
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65512,-79.36264
3,M6A,North York,"Lawrence Manor, Lawrence Heights",43.72327,-79.45042
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.66253,-79.39188


As you might have guessed by now, a single postal code can cover more than one neighborhood. From now on, we will treat each postal code as one neighborhood. Let's see how many boroughs and postal codes (neighborhoods) we have:

In [24]:
print('The dataframe has {} boroughs and {} postal codes.'.format(
        len(toronto_df['Borough'].unique()),
        toronto_df.shape[0]
    )
)

The dataframe has 10 boroughs and 103 postal codes.


## Map: Toronto & Neighborhoods

We now have the data required to map each neighborhood coordinate using the __Folium__ module.

What's left is to define the coordinate of Toronto City itself. We can get it using the __Geopy__ library.

In [36]:
from geopy.geocoders import Nominatim

address = 'Toronto, Ontario'

# Define a unique user_agent
geolocator = Nominatim(user_agent="toronto_explorer")

# Retrieve Toronto coordinate
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Toronto City are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Toronto City are 43.6534817, -79.3839347.
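
`geolocator.geocode` returns `None` when Nominatim cannot resolve the address, so reading `location.latitude` directly would raise an `AttributeError` in that case. A minimal defensive sketch, assuming a hypothetical helper `get_coordinates` (not part of the original notebook) that accepts any geocoding callable:

```python
from typing import Callable, Optional, Tuple


def get_coordinates(geocode: Callable, address: str) -> Optional[Tuple[float, float]]:
    """Return (latitude, longitude) for `address`, or None if nothing is found.

    `geocode` is any callable shaped like `Nominatim(...).geocode`: it takes
    an address string and returns an object with `.latitude` and `.longitude`
    attributes, or None when the lookup fails.
    """
    location = geocode(address)
    if location is None:
        return None
    return location.latitude, location.longitude


# Hypothetical usage with the geolocator defined above:
# coords = get_coordinates(geolocator.geocode, 'Toronto, Ontario')
# if coords is None:
#     raise ValueError('Could not geocode the address')
# latitude, longitude = coords
```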


Now, we can see the neighbourhoods superimposed on top of the city:

In [62]:
#!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't completed the Foursquare API lab
import folium # map rendering library

# create map of Toronto using latitude and longitude values
map_toronto = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, borough, neighborhood in zip(toronto_df['latitude'], toronto_df['longitude'], toronto_df['Borough'], toronto_df['Neighbourhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto)  
    
map_toronto

## Map: A Borough and Neighborhoods

For illustration purposes, we will pick the borough with the most neighborhoods among those whose name contains _Toronto_.
Let's see which borough it is:

In [43]:
toronto_df.groupby(by='Borough').count().sort_values(by='Neighbourhood', ascending=False)

Unnamed: 0_level_0,PostalCode,Neighbourhood,latitude,longitude
Borough,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
North York,24,24,24,24
Downtown Toronto,19,19,19,19
Scarborough,17,17,17,17
Etobicoke,12,12,12,12
Central Toronto,9,9,9,9
West Toronto,6,6,6,6
East Toronto,5,5,5,5
East York,5,5,5,5
York,5,5,5,5
Mississauga,1,1,1,1


Great, we will pick __Downtown Toronto__, which has 19 neighborhoods. North York has more (24), but its name does not contain _Toronto_.
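
The same choice can be made programmatically. A minimal sketch, using the counts from the table above and a hypothetical helper `largest_borough_containing` (not part of the original notebook):

```python
# Neighborhood counts per borough, copied from the groupby output above.
borough_counts = {
    'North York': 24, 'Downtown Toronto': 19, 'Scarborough': 17,
    'Etobicoke': 12, 'Central Toronto': 9, 'West Toronto': 6,
    'East Toronto': 5, 'East York': 5, 'York': 5, 'Mississauga': 1,
}


def largest_borough_containing(counts, word):
    """Return the (borough, count) pair with the highest count among
    boroughs whose name contains `word`."""
    matching = {name: n for name, n in counts.items() if word in name}
    if not matching:
        raise ValueError('No borough name contains {!r}'.format(word))
    best = max(matching, key=matching.get)
    return best, matching[best]


print(largest_borough_containing(borough_counts, 'Toronto'))  # ('Downtown Toronto', 19)
```

On the dataframe itself, a comparable filter could probably be written with `toronto_df['Borough'].str.contains('Toronto')`.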

In [52]:
downtown_df = toronto_df[toronto_df.Borough =='Downtown Toronto'].reset_index(drop=True)
print(downtown_df.shape)
downtown_df.head()

(19, 5)


Unnamed: 0,PostalCode,Borough,Neighbourhood,latitude,longitude
0,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65512,-79.36264
1,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.66253,-79.39188
2,M5B,Downtown Toronto,"Garden District, Ryerson",43.65739,-79.37804
3,M5C,Downtown Toronto,St. James Town,43.65215,-79.37587
4,M5E,Downtown Toronto,Berczy Park,43.64536,-79.37306


Let's get the geographical coordinates of Downtown Toronto.

In [54]:
address = 'Downtown Toronto, Toronto'

geolocator = Nominatim(user_agent="toronto_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Downtown Toronto are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Downtown Toronto are 43.6563221, -79.3809161.


As we did with all of Toronto City, let's visualize Downtown Toronto with the neighborhoods in it.

In [65]:
# create map of Downtown Toronto using latitude and longitude values
map_downtownToronto = folium.Map(location=[latitude, longitude], zoom_start=13)

# add markers to map
for lat, lng, borough, neighborhood in zip(downtown_df['latitude'], downtown_df['longitude'], downtown_df['Borough'], downtown_df['Neighbourhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_downtownToronto)  
    
map_downtownToronto

## Explore a Neighborhood in Downtown Toronto

Now, we will use the Foursquare API to explore the Downtown Toronto neighborhoods and segment them.

1. __Define Foursquare Credentials and Version__

In [67]:
CLIENT_ID = 'A4BZ4XU5N3JCM5ROZ05CQIZAKB3MURVFNOM24TRHJOOJIMB3' # your Foursquare ID
CLIENT_SECRET = 'I2QABIKFWWEMWYXZIMRFXK4IFSSXTRGG4EPEEVHWP0QYUCYY' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version
LIMIT = 100 # A default Foursquare API limit value

# print('Your credentials:')
# print('CLIENT_ID: ' + CLIENT_ID)
# print('CLIENT_SECRET:' + CLIENT_SECRET)
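
Hard-coding the ID and secret means they travel with every shared copy of the notebook. One possible alternative, sketched under the assumption that the credentials are exported as the environment variables `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` (hypothetical names, not used in the original notebook):

```python
import os


def load_foursquare_credentials(env=os.environ):
    """Read the Foursquare ID and secret from environment variables.

    The variable names are an assumption made for this sketch.
    """
    try:
        return env['FOURSQUARE_CLIENT_ID'], env['FOURSQUARE_CLIENT_SECRET']
    except KeyError as missing:
        raise RuntimeError('Missing environment variable: {}'.format(missing))


# Hypothetical usage:
# CLIENT_ID, CLIENT_SECRET = load_foursquare_credentials()
```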

2. __Let's explore the first neighborhood in our dataframe.__

In [72]:
downtown_df.head(1)

Unnamed: 0,PostalCode,Borough,Neighbourhood,latitude,longitude
0,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65512,-79.36264


In [71]:
print('First neighborhood: {}'.format(downtown_df.loc[0,'Neighbourhood']))

First neighborhood: Regent Park, Harbourfront


- Get the location coordinate of the neighborhood

In [77]:
neighborhood_latitude = downtown_df.loc[0, 'latitude'] # neighborhood latitude value
neighborhood_longitude = downtown_df.loc[0, 'longitude'] # neighborhood longitude value

neighborhood_name = downtown_df.loc[0, 'Neighbourhood'] # neighborhood name

print('The coordinate values of {} are\n- latitude: {},\n- longitude: {}.'.format(neighborhood_name, 
                                                               neighborhood_latitude, 
                                                               neighborhood_longitude))

The coordinate values of Regent Park, Harbourfront are
- latitude: 43.65512000000007,
- longitude: -79.36263999999993.


3. __Now, let's get the top 100 venues that are in Regent Park, Harbourfront within a radius of 500 meters.__

- Create a GET request URL

In [78]:
LIMIT = 100
RADIUS = 500

url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
        CLIENT_ID, 
        CLIENT_SECRET, 
        VERSION, 
        neighborhood_latitude, 
        neighborhood_longitude,
        RADIUS,
        LIMIT)
url

'https://api.foursquare.com/v2/venues/explore?client_id=A4BZ4XU5N3JCM5ROZ05CQIZAKB3MURVFNOM24TRHJOOJIMB3&client_secret=I2QABIKFWWEMWYXZIMRFXK4IFSSXTRGG4EPEEVHWP0QYUCYY&v=20180605&ll=43.65512000000007,-79.36263999999993&radius=500&limit=100'
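
Formatting the query string by hand works, but it is easy to misplace a parameter. As an alternative sketch, the same URL can be assembled from a dictionary with the standard library's `urllib.parse.urlencode`. The helper name `build_explore_url` and the placeholder credentials are assumptions for illustration:

```python
from urllib.parse import urlencode

EXPLORE_ENDPOINT = 'https://api.foursquare.com/v2/venues/explore'


def build_explore_url(client_id, client_secret, version, lat, lng, radius, limit):
    """Build the venues/explore URL used above from named parameters."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,
        'll': '{},{}'.format(lat, lng),  # "latitude,longitude"
        'radius': radius,
        'limit': limit,
    }
    # urlencode percent-encodes values, so the comma in `ll` becomes %2C.
    return '{}?{}'.format(EXPLORE_ENDPOINT, urlencode(params))


print(build_explore_url('YOUR_ID', 'YOUR_SECRET', '20180605',
                        43.65512, -79.36264, 500, 100))
```

With `requests`, the same dictionary can also be passed directly as `requests.get(EXPLORE_ENDPOINT, params=params)`.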

- Send the GET request and examine the results

In [79]:
import requests
results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '5fa9fed2b530f203321a934b'},
 'response': {'suggestedFilters': {'header': 'Tap to show:',
   'filters': [{'name': 'Open now', 'key': 'openNow'}]},
  'headerLocation': 'Corktown',
  'headerFullLocation': 'Corktown, Toronto',
  'headerLocationGranularity': 'neighborhood',
  'totalResults': 20,
  'suggestedBounds': {'ne': {'lat': 43.65962000450007,
    'lng': -79.35643191123269},
   'sw': {'lat': 43.650619995500065, 'lng': -79.36884808876717}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'reasons': {'count': 0,
       'items': [{'summary': 'This spot is popular',
         'type': 'general',
         'reasonName': 'globalInteractionReason'}]},
      'venue': {'id': '54ea41ad498e9a11e9e13308',
       'name': 'Roselle Desserts',
       'location': {'address': '362 King St E',
        'crossStreet': 'Trinity St',
        'lat': 43.653446723052674,
        'lng': -79.3620167174383,
        'labeledLatLngs': [{'label'

In [107]:
results['response']['groups'][0]['items']

[{'reasons': {'count': 0,
   'items': [{'summary': 'This spot is popular',
     'type': 'general',
     'reasonName': 'globalInteractionReason'}]},
  'venue': {'id': '54ea41ad498e9a11e9e13308',
   'name': 'Roselle Desserts',
   'location': {'address': '362 King St E',
    'crossStreet': 'Trinity St',
    'lat': 43.653446723052674,
    'lng': -79.3620167174383,
    'labeledLatLngs': [{'label': 'display',
      'lat': 43.653446723052674,
      'lng': -79.3620167174383}],
    'distance': 192,
    'postalCode': 'M5A 1K9',
    'cc': 'CA',
    'city': 'Toronto',
    'state': 'ON',
    'country': 'Canada',
    'formattedAddress': ['362 King St E (Trinity St)',
     'Toronto ON M5A 1K9',
     'Canada']},
   'categories': [{'id': '4bf58dd8d48988d16a941735',
     'name': 'Bakery',
     'pluralName': 'Bakeries',
     'shortName': 'Bakery',
     'icon': {'prefix': 'https://ss3.4sqi.net/img/categories_v2/food/bakery_',
      'suffix': '.png'},
     'primary': True}],
   'photos': {'count': 0, 'grou
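
Indexing straight into `results['response']['groups'][0]['items']` assumes the request succeeded and returned at least one group. A minimal defensive sketch, assuming a hypothetical helper `extract_items` (not part of the original notebook) and the response layout shown above:

```python
def extract_items(results):
    """Return the list of recommended items from a venues/explore response.

    Raises if the status code in `meta` is not 200; returns an empty
    list when the response contains no groups.
    """
    code = results.get('meta', {}).get('code')
    if code != 200:
        raise RuntimeError('Foursquare request failed with code {}'.format(code))
    groups = results.get('response', {}).get('groups', [])
    return groups[0].get('items', []) if groups else []


# Trimmed-down example with the same structure as the output above.
sample = {
    'meta': {'code': 200},
    'response': {'groups': [{'items': [{'venue': {'name': 'Roselle Desserts'}}]}]},
}
print(len(extract_items(sample)))  # 1
```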

- Based on observation, it seems that all the information is in the __items__ key. Let's put that into a list of venues.

In [113]:
import json # library to handle JSON files
from pandas import json_normalize # transform JSON data into a pandas dataframe

venues = results['response']['groups'][0]['items']
nearby_venues = json_normalize(venues)
nearby_venues.head(2)

Unnamed: 0,referralId,reasons.count,reasons.items,venue.id,venue.name,venue.location.address,venue.location.crossStreet,venue.location.lat,venue.location.lng,venue.location.labeledLatLngs,...,venue.location.cc,venue.location.city,venue.location.state,venue.location.country,venue.location.formattedAddress,venue.categories,venue.photos.count,venue.photos.groups,venue.venuePage.id,venue.location.neighborhood
0,e-0-54ea41ad498e9a11e9e13308-0,0,"[{'summary': 'This spot is popular', 'type': '...",54ea41ad498e9a11e9e13308,Roselle Desserts,362 King St E,Trinity St,43.653447,-79.362017,"[{'label': 'display', 'lat': 43.65344672305267...",...,CA,Toronto,ON,Canada,"[362 King St E (Trinity St), Toronto ON M5A 1K...","[{'id': '4bf58dd8d48988d16a941735', 'name': 'B...",0,[],,
1,e-0-53b8466a498e83df908c3f21-1,0,"[{'summary': 'This spot is popular', 'type': '...",53b8466a498e83df908c3f21,Tandem Coffee,368 King St E,at Trinity St,43.653559,-79.361809,"[{'label': 'display', 'lat': 43.65355870959944...",...,CA,Toronto,ON,Canada,"[368 King St E (at Trinity St), Toronto ON, Ca...","[{'id': '4bf58dd8d48988d1e0931735', 'name': 'C...",0,[],,


- Okay, the dataframe has many columns that we do not need. Let's keep only the columns we need.

In [114]:
# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues = nearby_venues.loc[:, filtered_columns]
nearby_venues.head(2)

Unnamed: 0,venue.name,venue.categories,venue.location.lat,venue.location.lng
0,Roselle Desserts,"[{'id': '4bf58dd8d48988d16a941735', 'name': 'B...",43.653447,-79.362017
1,Tandem Coffee,"[{'id': '4bf58dd8d48988d1e0931735', 'name': 'C...",43.653559,-79.361809


- The column __venue.categories__ is messy: each entry is a list of dictionaries. Let's extract only the category `name` from the first dictionary in each list.

In [138]:
nearby_venues['venue.categories'] = nearby_venues['venue.categories'].apply(lambda x: x[0]['name'])
nearby_venues.head(2)

Unnamed: 0,venue.name,venue.categories,venue.location.lat,venue.location.lng
0,Roselle Desserts,Bakery,43.653447,-79.362017
1,Tandem Coffee,Coffee Shop,43.653559,-79.361809
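
Note that `x[0]['name']` raises an `IndexError` for a venue whose category list is empty. A minimal sketch of a more tolerant extractor, assuming a hypothetical helper `first_category_name` (not part of the original notebook):

```python
def first_category_name(categories):
    """Return the name of the first category, or None if the list is empty."""
    if not categories:
        return None
    return categories[0].get('name')


# Hypothetical usage on the raw (not yet converted) column:
# nearby_venues['venue.categories'] = nearby_venues['venue.categories'].apply(first_category_name)

print(first_category_name([{'id': '4bf58dd8d48988d16a941735', 'name': 'Bakery'}]))  # Bakery
print(first_category_name([]))  # None
```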


- Sweet! Let's clean the column names and see how many venues were returned by Foursquare:

In [146]:
nearby_venues.columns = [column.split('.')[-1] for column in nearby_venues.columns]
print('{} venues were returned by Foursquare.'.format(nearby_venues.shape[0]))
nearby_venues.head()

20 venues were returned by Foursquare.


Unnamed: 0,name,categories,lat,lng
0,Roselle Desserts,Bakery,43.653447,-79.362017
1,Tandem Coffee,Coffee Shop,43.653559,-79.361809
2,Figs Breakfast & Lunch,Breakfast Spot,43.655675,-79.364503
3,The Yoga Lounge,Yoga Studio,43.655515,-79.364955
4,Body Blitz Spa East,Spa,43.654735,-79.359874
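
As a sanity check on the 500 meter radius, the distance between the neighborhood coordinate and a returned venue can be estimated with the haversine formula. A minimal sketch, assuming a spherical Earth with a mean radius of 6,371 km and a hypothetical helper `haversine_m` (not part of the original notebook):

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_M = 6371000  # mean Earth radius in meters (spherical approximation)


def haversine_m(lat1, lng1, lat2, lng2):
    """Great-circle distance in meters between two (lat, lng) points in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlmb = radians(lng2 - lng1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))


# Regent Park, Harbourfront (M5A) to Roselle Desserts, using the values above.
distance = haversine_m(43.65512, -79.36264, 43.653447, -79.362017)
print(round(distance))  # should land close to the 'distance': 192 in the response
```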


## Explore Neighborhoods in Downtown Toronto

- __Let's create a function to repeat the same process for all the neighborhoods in Downtown Toronto__