## Introduction

In this lab, you will learn how to convert addresses into their equivalent latitude and longitude values. You will also use the Foursquare API to explore neighborhoods in Downtown Toronto, Canada. You will use the **explore** function to get the most common venue categories in the neighborhoods around Downtown Toronto. You will then group the neighborhoods into clusters using *k*-means clustering. Finally, you will use the Folium library to visualize the neighborhoods in Downtown Toronto and their emerging clusters.

In [1]:

import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)


import json # library to handle JSON files

# !conda install -c conda-forge geopy --yes # uncomment this line if you haven't completed the Foursquare API lab
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas.io.json import json_normalize # transform JSON file into a pandas dataframe (pandas >= 1.0: from pandas import json_normalize)

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

import folium # map rendering library
pd.options.display.width=0


In [2]:
df=pd.read_csv('Toronto_Coordinates.csv')

In [3]:
df.head()

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.753259,-79.329656
1,M4A,North York,Victoria Village,43.725882,-79.315572
2,M5A,Downtown Toronto,"Harbourfront,Regent Park",43.65426,-79.360636
3,M6A,North York,"Lawrence Heights,Lawrence Manor",43.718518,-79.464763
4,M7A,Queen's Park,Queen's Park,43.662301,-79.389494


#### For this project, we'll cluster only the neighborhoods in Downtown Toronto

In [4]:
downtown_toronto=df[df['Borough']=='Downtown Toronto'].reset_index(drop=True)

In [5]:
downtown_toronto.head()

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude
0,M5A,Downtown Toronto,"Harbourfront,Regent Park",43.65426,-79.360636
1,M5B,Downtown Toronto,"Ryerson,Garden District",43.657162,-79.378937
2,M5C,Downtown Toronto,St. James Town,43.651494,-79.375418
3,M5E,Downtown Toronto,Berczy Park,43.644771,-79.373306
4,M5G,Downtown Toronto,Central Bay Street,43.657952,-79.387383


Let's get the geographical coordinates of Downtown Toronto.

In [6]:
address = 'Downtown Toronto, CA'

geolocator = Nominatim(user_agent="My_cluster")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Downtown Toronto are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Downtown Toronto are 43.655115, -79.380219.
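
Note that `geocode` returns `None` when the geocoder cannot resolve an address, in which case `location.latitude` raises an `AttributeError`. A minimal sketch of a defensive wrapper follows. The helper name `geocode_or_fail` and the offline stand-in classes used to exercise it are hypothetical and not part of geopy.

```python
from collections import namedtuple

# Stand-in for geopy's Location, used only so the sketch runs offline (hypothetical).
FakeLocation = namedtuple('FakeLocation', ['latitude', 'longitude'])


def geocode_or_fail(geolocator, address):
    """Return (latitude, longitude) for address, or raise if nothing was found."""
    location = geolocator.geocode(address)
    if location is None:
        raise ValueError('No geocoding result for {!r}'.format(address))
    return location.latitude, location.longitude


class FakeGeolocator:
    """Offline stand-in that mimics the geocode() method of a geopy geocoder."""

    def __init__(self, known):
        self.known = known

    def geocode(self, address):
        # Unknown addresses yield None, as a real geocoder does for no match
        return self.known.get(address)


fake = FakeGeolocator({'Downtown Toronto, CA': FakeLocation(43.655115, -79.380219)})
print(geocode_or_fail(fake, 'Downtown Toronto, CA'))
```

In the notebook itself, the `geolocator` created above could be passed in place of `fake`.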


##### We now visualize Downtown Toronto and its neighborhoods

In [7]:
# create map of Downtown Toronto using latitude and longitude values
map_downtown = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, label in zip(downtown_toronto['Latitude'], downtown_toronto['Longitude'], downtown_toronto['Neighborhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_downtown)  
    
map_downtown

#### Define Foursquare Credentials and Version
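
The credentials cell is not shown here. One way to define `CLIENT_ID`, `CLIENT_SECRET`, and `VERSION` without pasting secrets into the notebook is to read them from environment variables. The variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` are assumptions chosen for this sketch. The version string matches the `v` parameter that appears in the request URL in this notebook.

```python
import os

# Environment variable names are assumptions for this sketch; set them in your shell first.
CLIENT_ID = os.environ.get('FOURSQUARE_CLIENT_ID', '')
CLIENT_SECRET = os.environ.get('FOURSQUARE_CLIENT_SECRET', '')
VERSION = '20180605'  # Foursquare API version date used in this notebook

if not CLIENT_ID or not CLIENT_SECRET:
    print('Foursquare credentials are not set; API requests will fail.')
```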

#### Let's explore the first neighborhood in our dataframe.


In [9]:
neighborhood_latitude = downtown_toronto.loc[0, 'Latitude'] # neighborhood latitude value
neighborhood_longitude = downtown_toronto.loc[0, 'Longitude'] # neighborhood longitude value

neighborhood_name = downtown_toronto.loc[0, 'Neighborhood'] # neighborhood name

print('Latitude and longitude values of {} are {}, {}.'.format(neighborhood_name, 
                                                               neighborhood_latitude, 
                                                               neighborhood_longitude))

Latitude and longitude values of Harbourfront,Regent Park are 43.6542599, -79.3606359.


#### Now, let's get the top 100 venues near Regent Park within a radius of 50,000 meters (50 km). Regent Park is chosen because it is a popular location in the middle of Downtown Toronto.

In [10]:
LIMIT=100
radius=50000

url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    neighborhood_latitude, 
    neighborhood_longitude, 
    radius, 
    LIMIT)
url # display URL

'https://api.foursquare.com/v2/venues/explore?&client_id=GFKFJH30KDFSU5G2ECM1TVBXTRUQCTU55A0UAE05LC55KOBY&client_secret=PR4OJ4W41FRUUKZ4ST3YRUKH5M5J3OPWZLWEL10GDQIV2ZHQ&v=20180605&ll=43.6542599,-79.3606359&radius=50000&limit=100'
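
Building the query string with `str.format` works, but it is easy to misorder the placeholders. As an alternative sketch, the same URL can be assembled from a dictionary with the standard library's `urlencode`. The helper name `build_explore_url` and the placeholder credentials are hypothetical.

```python
from urllib.parse import urlencode


def build_explore_url(client_id, client_secret, version, lat, lng, radius, limit):
    """Assemble a Foursquare v2 venues/explore URL from named parameters."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,
        'll': '{},{}'.format(lat, lng),
        'radius': radius,
        'limit': limit,
    }
    # safe=',' keeps the comma in the ll value readable instead of encoding it as %2C
    return 'https://api.foursquare.com/v2/venues/explore?' + urlencode(params, safe=',')


# Placeholder credentials; substitute the real CLIENT_ID and CLIENT_SECRET
example_url = build_explore_url('YOUR_ID', 'YOUR_SECRET', '20180605',
                                43.6542599, -79.3606359, 50000, 100)
print(example_url)
```

Printing a URL built from real credentials exposes the client secret in the notebook output, so it may be preferable to print only the placeholder version.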

In [11]:
results = requests.get(url).json()


In [12]:
# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

In [13]:
venues = results['response']['groups'][0]['items']
    
nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues = nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head()

Unnamed: 0,name,categories,lat,lng
0,Distillery Sunday Market,Farmers Market,43.650075,-79.361832
1,Souk Tabule,Mediterranean Restaurant,43.653756,-79.35439
2,The Distillery Historic District,Historic Site,43.650244,-79.359323
3,Arvo,Coffee Shop,43.649963,-79.361442
4,Rooster Coffee,Coffee Shop,43.6519,-79.365609


In [14]:
nearby_venues.shape

(100, 4)

`nearby_venues` holds the top 100 venues within 50 km of Regent Park. The number of venues and the search radius are large enough that there is no need to look for top venues around each neighborhood separately.
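
How far the returned venues actually lie from the query point can be checked directly. The following is a sketch of the haversine great-circle distance, assuming a spherical Earth of radius 6371 km. The function name `haversine_km` is hypothetical, and the two points are the query coordinates and the first venue from the table above.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; a spherical approximation


def haversine_km(lat1, lng1, lat2, lng2):
    """Great-circle distance in kilometres between two (lat, lng) points in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlambda = radians(lng2 - lng1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


# Query point (Harbourfront, Regent Park) to the first venue, Distillery Sunday Market
d = haversine_km(43.6542599, -79.3606359, 43.650075, -79.361832)
print('{:.2f} km'.format(d))
```

Applied row by row to `nearby_venues`, the same function would give the distance of every venue from the query point.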

In [15]:
print('{} venues were returned by Foursquare.'.format(nearby_venues.shape[0]))

100 venues were returned by Foursquare.


#### To cluster our neighborhood, only the categories and coordinates are needed.

In [16]:
table = nearby_venues[['categories', 'lat', 'lng']].copy() # copy so a column can be added later without a SettingWithCopyWarning

In [17]:
table.head()

Unnamed: 0,categories,lat,lng
0,Farmers Market,43.650075,-79.361832
1,Mediterranean Restaurant,43.653756,-79.35439
2,Historic Site,43.650244,-79.359323
3,Coffee Shop,43.649963,-79.361442
4,Coffee Shop,43.6519,-79.365609


##### We need to get dummies for categories in order to proceed with clustering

In [18]:
onehot = pd.get_dummies(table, prefix='categories')


In [19]:
onehot.head()

Unnamed: 0,lat,lng,categories_American Restaurant,categories_Aquarium,categories_Art Gallery,categories_Arts & Crafts Store,categories_Athletics & Sports,categories_BBQ Joint,categories_Bakery,categories_Baseball Stadium,categories_Brewery,categories_Bubble Tea Shop,categories_Butcher,categories_Café,categories_Chocolate Shop,categories_Clothing Store,categories_Cocktail Bar,categories_Coffee Shop,categories_Comedy Club,categories_Cosmetics Shop,categories_Dance Studio,categories_Dessert Shop,categories_Dog Run,categories_Farmers Market,categories_Food Truck,categories_French Restaurant,categories_Gastropub,categories_Gym,categories_Historic Site,categories_Hostel,categories_Hotel,categories_Ice Cream Shop,categories_Italian Restaurant,categories_Japanese Restaurant,categories_Juice Bar,categories_Lake,categories_Liquor Store,categories_Mediterranean Restaurant,categories_Mexican Restaurant,categories_Middle Eastern Restaurant,categories_Monument / Landmark,categories_Movie Theater,categories_Museum,categories_Neighborhood,categories_Park,categories_Performing Arts Venue,categories_Pizza Place,categories_Record Shop,categories_Restaurant,categories_Salad Place,categories_Sandwich Place,categories_Scenic Lookout,categories_Seafood Restaurant,categories_Spa,categories_Spanish Restaurant,categories_Speakeasy,categories_Sporting Goods Shop,categories_Tapas Restaurant,categories_Thai Restaurant,categories_Theater,categories_Theme Restaurant,categories_Train Station,categories_Yoga Studio
0,43.650075,-79.361832,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,43.653756,-79.35439,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,43.650244,-79.359323,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,43.649963,-79.361442,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,43.6519,-79.365609,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
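
One-hot encoding turns the text `categories` column into one 0/1 indicator column per distinct category, which gives k-means numeric features to work with. Here is a small sketch on three rows taken from the table above. `dtype=int` is passed explicitly because recent pandas versions otherwise return boolean columns.

```python
import pandas as pd

sample = pd.DataFrame({
    'categories': ['Farmers Market', 'Coffee Shop', 'Coffee Shop'],
    'lat': [43.650075, 43.649963, 43.6519],
    'lng': [-79.361832, -79.361442, -79.365609],
})

# Only the text column is encoded; lat and lng pass through unchanged
encoded = pd.get_dummies(sample, columns=['categories'], prefix='categories', dtype=int)
print(encoded.columns.tolist())
```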


In [20]:
# set number of clusters
kclusters = 4

#grouped_clustering = onehot.drop('categories', 1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(onehot)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([0, 0, 0, 3, 3, 0, 0, 3, 2, 3])
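
The value `kclusters = 4` is chosen by hand. One common way to sanity-check such a choice is to compare the k-means inertia (within-cluster sum of squares) for several values of k. The sketch below uses synthetic 2-D points rather than the venue data, so the numbers it prints say nothing about Toronto. The helper name `inertia_by_k` is hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans


def inertia_by_k(features, k_values, random_state=0):
    """Fit k-means once per k and return {k: inertia}."""
    result = {}
    for k in k_values:
        model = KMeans(n_clusters=k, n_init=10, random_state=random_state).fit(features)
        result[k] = model.inertia_
    return result


# Synthetic data: four well-separated blobs of 25 points each (illustration only)
rng = np.random.RandomState(0)
centers = np.array([[0, 0], [10, 0], [0, 10], [10, 10]])
points = np.vstack([c + rng.normal(scale=0.5, size=(25, 2)) for c in centers])

inertias = inertia_by_k(points, range(1, 7))
for k, value in inertias.items():
    print(k, round(value, 1))
```

A k beyond which the inertia stops dropping sharply is a reasonable candidate. On the real data, `onehot` would take the place of `points`.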

In [21]:
onehot['categories']=table['categories']
# move categories column to the first column
fixed_columns = [onehot.columns[-1]] + list(onehot.columns[:-1])
onehot = onehot[fixed_columns]

onehot.head()

Unnamed: 0,categories,lat,lng,categories_American Restaurant,categories_Aquarium,categories_Art Gallery,categories_Arts & Crafts Store,categories_Athletics & Sports,categories_BBQ Joint,categories_Bakery,categories_Baseball Stadium,categories_Brewery,categories_Bubble Tea Shop,categories_Butcher,categories_Café,categories_Chocolate Shop,categories_Clothing Store,categories_Cocktail Bar,categories_Coffee Shop,categories_Comedy Club,categories_Cosmetics Shop,categories_Dance Studio,categories_Dessert Shop,categories_Dog Run,categories_Farmers Market,categories_Food Truck,categories_French Restaurant,categories_Gastropub,categories_Gym,categories_Historic Site,categories_Hostel,categories_Hotel,categories_Ice Cream Shop,categories_Italian Restaurant,categories_Japanese Restaurant,categories_Juice Bar,categories_Lake,categories_Liquor Store,categories_Mediterranean Restaurant,categories_Mexican Restaurant,categories_Middle Eastern Restaurant,categories_Monument / Landmark,categories_Movie Theater,categories_Museum,categories_Neighborhood,categories_Park,categories_Performing Arts Venue,categories_Pizza Place,categories_Record Shop,categories_Restaurant,categories_Salad Place,categories_Sandwich Place,categories_Scenic Lookout,categories_Seafood Restaurant,categories_Spa,categories_Spanish Restaurant,categories_Speakeasy,categories_Sporting Goods Shop,categories_Tapas Restaurant,categories_Thai Restaurant,categories_Theater,categories_Theme Restaurant,categories_Train Station,categories_Yoga Studio
0,Farmers Market,43.650075,-79.361832,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,Mediterranean Restaurant,43.653756,-79.35439,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,Historic Site,43.650244,-79.359323,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,Coffee Shop,43.649963,-79.361442,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,Coffee Shop,43.6519,-79.365609,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


In [22]:
onehot['Cluster Labels']=kmeans.labels_

In [23]:
onehot.head()

Unnamed: 0,categories,lat,lng,categories_American Restaurant,categories_Aquarium,categories_Art Gallery,categories_Arts & Crafts Store,categories_Athletics & Sports,categories_BBQ Joint,categories_Bakery,categories_Baseball Stadium,categories_Brewery,categories_Bubble Tea Shop,categories_Butcher,categories_Café,categories_Chocolate Shop,categories_Clothing Store,categories_Cocktail Bar,categories_Coffee Shop,categories_Comedy Club,categories_Cosmetics Shop,categories_Dance Studio,categories_Dessert Shop,categories_Dog Run,categories_Farmers Market,categories_Food Truck,categories_French Restaurant,categories_Gastropub,categories_Gym,categories_Historic Site,categories_Hostel,categories_Hotel,categories_Ice Cream Shop,categories_Italian Restaurant,categories_Japanese Restaurant,categories_Juice Bar,categories_Lake,categories_Liquor Store,categories_Mediterranean Restaurant,categories_Mexican Restaurant,categories_Middle Eastern Restaurant,categories_Monument / Landmark,categories_Movie Theater,categories_Museum,categories_Neighborhood,categories_Park,categories_Performing Arts Venue,categories_Pizza Place,categories_Record Shop,categories_Restaurant,categories_Salad Place,categories_Sandwich Place,categories_Scenic Lookout,categories_Seafood Restaurant,categories_Spa,categories_Spanish Restaurant,categories_Speakeasy,categories_Sporting Goods Shop,categories_Tapas Restaurant,categories_Thai Restaurant,categories_Theater,categories_Theme Restaurant,categories_Train Station,categories_Yoga Studio,Cluster Labels
0,Farmers Market,43.650075,-79.361832,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,Mediterranean Restaurant,43.653756,-79.35439,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,Historic Site,43.650244,-79.359323,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,Coffee Shop,43.649963,-79.361442,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,3
4,Coffee Shop,43.6519,-79.365609,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,3


### Finally, let's visualize the resulting clusters

In [24]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi,cluster in zip(onehot['lat'], onehot['lng'], onehot['categories'],onehot['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster],
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters
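
The colour list is what links a cluster label to a marker colour: it holds one hex string per cluster, and the marker loop indexes into it. Below is a minimal sketch of that mapping on its own, without the map. The name `cluster_colors` is hypothetical.

```python
import numpy as np
import matplotlib.cm as cm
import matplotlib.colors as colors

kclusters = 4

# Sample kclusters evenly spaced colours from the rainbow colormap and convert to hex
cluster_colors = [colors.rgb2hex(c) for c in cm.rainbow(np.linspace(0, 1, kclusters))]

# Each cluster label (0 .. kclusters-1) indexes directly into the list
for label in range(kclusters):
    print(label, cluster_colors[label])
```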

In [25]:
table['Cluster Labels']=kmeans.labels_
table.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 100 entries, 0 to 99
Data columns (total 4 columns):
categories        100 non-null object
lat               100 non-null float64
lng               100 non-null float64
Cluster Labels    100 non-null int32
dtypes: float64(2), int32(1), object(1)
memory usage: 2.8+ KB


In [26]:
table.head()

Unnamed: 0,categories,lat,lng,Cluster Labels
0,Farmers Market,43.650075,-79.361832,0
1,Mediterranean Restaurant,43.653756,-79.35439,0
2,Historic Site,43.650244,-79.359323,0
3,Coffee Shop,43.649963,-79.361442,3
4,Coffee Shop,43.6519,-79.365609,3


In [27]:
table.shape

(100, 4)

### Examine some Clusters


##### Cluster 1 contents (cluster label 0)

In [28]:
table.loc[table['Cluster Labels'] == 0, table.columns[[0] + list(range(1, table.shape[1]))]]

Unnamed: 0,categories,lat,lng,Cluster Labels
0,Farmers Market,43.650075,-79.361832,0
1,Mediterranean Restaurant,43.653756,-79.35439,0
2,Historic Site,43.650244,-79.359323,0
5,Farmers Market,43.648743,-79.371597,0
6,Hotel,43.65906,-79.35003,0
12,Food Truck,43.649287,-79.374689,0
13,Athletics & Sports,43.647088,-79.351306,0
14,Hotel,43.642882,-79.383949,0
15,American Restaurant,43.651569,-79.379266,0
16,Hotel,43.656449,-79.37411,0


##### Cluster 2 contents (cluster label 1)

In [29]:
table.loc[table['Cluster Labels'] == 1, table.columns[[0] + list(range(1, table.shape[1]))]]

Unnamed: 0,categories,lat,lng,Cluster Labels
11,Neighborhood,43.653232,-79.385296,1
21,Café,43.659723,-79.346871,1
22,Brewery,43.641752,-79.387089,1
40,Café,43.657772,-79.376073,1
43,Café,43.669177,-79.353134,1
45,Café,43.650497,-79.378765,1
49,Pizza Place,43.662802,-79.33238,1
56,Brewery,43.673705,-79.33031,1
62,Neighborhood,43.639526,-79.380688,1
64,Pizza Place,43.656518,-79.380015,1
