<h1 align=center><font size = 5>Segmenting and Clustering Neighborhoods in Toronto - PART 3</font></h1>

#### Exploring and clustering the neighborhoods in Toronto

#### Introduction

#### I will be using the Foursquare API to explore neighborhoods in selected cities in toronto. The Foursquare explore function will be used to get the most common venue categories in each neighborhood, and then use this feature to group the neighborhoods into clusters. The k-means clustering algorithm will be used for the analysis. Fnally, use the Folium library to visualize the neighborhoods in Toronto and their emerging clusters.

In [1]:
!conda install -c conda-forge geopy --yes        # if needed
!conda install -c conda-forge folium=0.5.0 --yes # if needed

import numpy as np # library to handle data in a vectorized manner
import pandas as pd # library for data analsysis

pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files


from geopy.geocoders import Nominatim 
GeoLocator = Nominatim(user_agent='My-IBMNotebook')# convert an address into latitude and longitude values

import requests # library to handle requests
from pandas.io.json import json_normalize # tranform JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans


import folium # map rendering library

print('Libraries imported.')

Solving environment: done

## Package Plan ##

  environment location: /opt/conda/envs/Python36

  added / updated specs: 
    - geopy


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    openssl-1.1.1d             |       h516909a_0         2.1 MB  conda-forge
    geographiclib-1.50         |             py_0          34 KB  conda-forge
    ca-certificates-2019.11.28 |       hecc5488_0         145 KB  conda-forge
    certifi-2019.11.28         |           py36_0         149 KB  conda-forge
    geopy-1.21.0               |             py_0          58 KB  conda-forge
    ------------------------------------------------------------
                                           Total:         2.5 MB

The following NEW packages will be INSTALLED:

    geographiclib:   1.50-py_0         conda-forge
    geopy:           1.21.0-py_0       conda-forge

The following packages will be UPDATED:

    ca-

In [2]:
# Saved Toronto postal codes with borough and neighbors.
toronto_task1_csv = "Toronto.TASK_1_df.csv"
# Saved Toronto postal codes with borough,neighbors,Longitude and Latitude.
toronto_task2_csv = "Toronto.TASK_II_df.csv"

In [3]:
toronto_neighborhoods = pd.read_csv(toronto_task2_csv)

In [4]:
toronto_neighborhoods.shape
toronto_neighborhoods.head()

Unnamed: 0,Borough,Neighbourhood,Latitude,Longitude
0,Scarborough,"Malvern, Rouge",43.806686,-79.194353
1,Scarborough,"Highland Creek, Port Union, Rouge Hill",43.784535,-79.160497
2,Scarborough,"Guildwood, Morningside, West Hill",43.763573,-79.188711
3,Scarborough,Woburn,43.770992,-79.216917
4,Scarborough,Cedarbrae,43.773136,-79.239476


#### Use geopy library to get the latitude and longitude values of Toronto Canada.

In [5]:
address = 'Toronto, Ontario Canada'

geolocator = Nominatim()
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geograpical coordinate of Toronto Canada are {}, {}.'.format(latitude, longitude))

  app.launch_new_instance()


The geograpical coordinate of Toronto Canada are 43.653963, -79.387207.


#### Create a map of Toronto with neighborhoods superimposed on top

In [6]:
# create map of New York using latitude and longitude values
map_toronto = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, borough, neighborhood in zip(toronto_neighborhoods['Latitude'], toronto_neighborhoods['Longitude'], toronto_neighborhoods['Borough'], toronto_neighborhoods['Neighbourhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=4,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#87cefa',
        fill_opacity=0.5,
        parse_html=False).add_to(map_toronto)

In [7]:
map_toronto

#### For this task, I will just reduce the our target analysis to Neighbohoods in East,West and Central Toronto only. Lets just take portion of dataframe where Boroughs contain word Toronto

In [8]:
toronto_data = toronto_neighborhoods[toronto_neighborhoods['Borough'].str.contains("Toronto")].reset_index(drop=True)
print(toronto_data.shape)
toronto_data.head()

(39, 4)


Unnamed: 0,Borough,Neighbourhood,Latitude,Longitude
0,East Toronto,The Beaches,43.676357,-79.293031
1,East Toronto,"Riverdale, The Danforth West",43.679557,-79.352188
2,East Toronto,"India Bazaar, The Beaches West",43.668999,-79.315572
3,East Toronto,Studio District,43.659526,-79.340923
4,Central Toronto,Lawrence Park,43.72802,-79.38879


#### Re-create the map with new markers for Toronto Neighborhoods

In [9]:
# I will be using the same coordinates for the previous view
map_toronto = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, label in zip(toronto_data['Latitude'], toronto_data['Longitude'], toronto_data['Neighbourhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto)  
    
map_toronto

#### Utilizing the Foursquare API to explore and segment neighborhoods

In [10]:
CLIENT_ID = '1U1AKZMJUSXZOS1JIBWZQ5VAFUJVNW2KJJIWEQKTZB25YQC4'
CLIENT_SECRET = '15KXNIISPIBYRJOUIL5NV0IOWSUFMKS22VRJSLLZBWLYDQSR' 
VERSION = '20200214'
LIMIT = 30
print('Your credentails:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentails:
CLIENT_ID: 1U1AKZMJUSXZOS1JIBWZQ5VAFUJVNW2KJJIWEQKTZB25YQC4
CLIENT_SECRET:15KXNIISPIBYRJOUIL5NV0IOWSUFMKS22VRJSLLZBWLYDQSR


### 1. Exploring Neighbourhood in Toronto
#### Using the following foursquare api query url, search venues on all boroughs in selected Toronto neighborhoods
https://api.foursquare.com/v2/venues/search?client_id=CLIENT_ID&client_secret=CLIENT_SECRET&ll=LATITUDE,LONGITUDE&v=VERSION&query=Q

In [11]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
        
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id=1U1AKZMJUSXZOS1JIBWZQ5VAFUJVNW2KJJIWEQKTZB25YQC4&client_secret=15KXNIISPIBYRJOUIL5NV0IOWSUFMKS22VRJSLLZBWLYDQSR&v=20200213&ll=43.668999,-79.315572&radius=500&limit=30'.format(
            CLIENT_ID, CLIENT_SECRET, VERSION, lat, lng, radius, LIMIT)
        
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        venues_list.append([( name, lat, lng, v['venue']['name'], v['venue']['location']['lat'], v['venue']['location']['lng'], v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 'Neighborhood Latitude', 'Neighborhood Longitude', 'Venue', 'Venue Latitude', 'Venue Longitude', 'Venue Category']
    
    return(nearby_venues)

#### Retrieve all venues given the Addresses

In [12]:
toronto_neighborhoods = toronto_data
toronto_venues = getNearbyVenues(names=toronto_neighborhoods['Neighbourhood'],
                                   latitudes=toronto_neighborhoods['Latitude'],
                                   longitudes=toronto_neighborhoods['Longitude']
                                  )

The Beaches
Riverdale, The Danforth West
India Bazaar, The Beaches West
Studio District
Lawrence Park
Davisville North
North Toronto West
Davisville
Moore Park, Summerhill East
Deer Park, Forest Hill SE, Rathnelly, South Hill, Summerhill West
Rosedale
Cabbagetown, St. James Town
Church and Wellesley
Harbourfront
Garden District, Ryerson
St. James Town
Berczy Park
Central Bay Street
Adelaide, King, Richmond
Harbourfront East, Toronto Islands, Union Station
Design Exchange, Toronto Dominion Centre
Commerce Court, Victoria Hotel
Roselawn
Forest Hill North, Forest Hill West
North Midtown, The Annex, Yorkville
Harbord, University of Toronto
Chinatown, Grange Park, Kensington Market
Bathurst Quay, CN Tower, Harbourfront West, Island airport, King and Spadina, Railway Lands, South Niagara
Stn A PO Boxes 25 The Esplanade
First Canadian Place, Underground city
Christie
Dovercourt Village, Dufferin
Little Portugal, Trinity
Brockton, Exhibition Place, Parkdale Village
High Park, The Junction Sout

#### Check size of resulting dataframe

In [13]:
print(toronto_venues.shape)
toronto_venues.head()

(702, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,The Beaches,43.676357,-79.293031,System Fitness,43.667171,-79.312733,Gym
1,The Beaches,43.676357,-79.293031,The Burger's Priest,43.666612,-79.315531,Burger Joint
2,The Beaches,43.676357,-79.293031,British Style Fish & Chips,43.668723,-79.317139,Fish & Chips Shop
3,The Beaches,43.676357,-79.293031,Brett's Ice Cream,43.667222,-79.312831,Ice Cream Shop
4,The Beaches,43.676357,-79.293031,Casa di Giorgio,43.666645,-79.315204,Italian Restaurant


In [14]:
toronto_venues.groupby('Neighborhood').count()

Unnamed: 0_level_0,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
"Adelaide, King, Richmond",18,18,18,18,18,18
"Bathurst Quay, CN Tower, Harbourfront West, Island airport, King and Spadina, Railway Lands, South Niagara",18,18,18,18,18,18
Berczy Park,18,18,18,18,18,18
"Brockton, Exhibition Place, Parkdale Village",18,18,18,18,18,18
Business Reply Mail Processing Centre 969 Eastern,18,18,18,18,18,18
"Cabbagetown, St. James Town",18,18,18,18,18,18
Central Bay Street,18,18,18,18,18,18
"Chinatown, Grange Park, Kensington Market",18,18,18,18,18,18
Christie,18,18,18,18,18,18
Church and Wellesley,18,18,18,18,18,18


In [15]:
print('There are {} uniques categories.'.format(len(toronto_venues['Venue Category'].unique())))

There are 17 uniques categories.


### 2. Analyze Each Borough Neighborhood

In [16]:
toronto_onehot = pd.get_dummies(toronto_venues[['Venue Category']], prefix="", prefix_sep="")

toronto_onehot['Neighborhood'] = toronto_venues['Neighborhood'] 

fixed_columns = [toronto_onehot.columns[-1]] + list(toronto_onehot.columns[:-1])
toronto_onehot = toronto_onehot[fixed_columns]

toronto_onehot.head()

Unnamed: 0,Neighborhood,Brewery,Burger Joint,Burrito Place,Fast Food Restaurant,Fish & Chips Shop,Gym,Ice Cream Shop,Italian Restaurant,Liquor Store,Movie Theater,Park,Pet Store,Pizza Place,Pub,Sandwich Place,Steakhouse,Sushi Restaurant
0,The Beaches,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0
1,The Beaches,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,The Beaches,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0
3,The Beaches,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0
4,The Beaches,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0


In [17]:
toronto_onehot.shape

(702, 18)

In [18]:
toronto_grouped = toronto_onehot.groupby('Neighborhood').mean().reset_index()
toronto_grouped

Unnamed: 0,Neighborhood,Brewery,Burger Joint,Burrito Place,Fast Food Restaurant,Fish & Chips Shop,Gym,Ice Cream Shop,Italian Restaurant,Liquor Store,Movie Theater,Park,Pet Store,Pizza Place,Pub,Sandwich Place,Steakhouse,Sushi Restaurant
0,"Adelaide, King, Richmond",0.055556,0.055556,0.055556,0.055556,0.055556,0.055556,0.055556,0.055556,0.055556,0.055556,0.111111,0.055556,0.055556,0.055556,0.055556,0.055556,0.055556
1,"Bathurst Quay, CN Tower, Harbourfront West, Is...",0.055556,0.055556,0.055556,0.055556,0.055556,0.055556,0.055556,0.055556,0.055556,0.055556,0.111111,0.055556,0.055556,0.055556,0.055556,0.055556,0.055556
2,Berczy Park,0.055556,0.055556,0.055556,0.055556,0.055556,0.055556,0.055556,0.055556,0.055556,0.055556,0.111111,0.055556,0.055556,0.055556,0.055556,0.055556,0.055556
3,"Brockton, Exhibition Place, Parkdale Village",0.055556,0.055556,0.055556,0.055556,0.055556,0.055556,0.055556,0.055556,0.055556,0.055556,0.111111,0.055556,0.055556,0.055556,0.055556,0.055556,0.055556
4,Business Reply Mail Processing Centre 969 Eastern,0.055556,0.055556,0.055556,0.055556,0.055556,0.055556,0.055556,0.055556,0.055556,0.055556,0.111111,0.055556,0.055556,0.055556,0.055556,0.055556,0.055556
5,"Cabbagetown, St. James Town",0.055556,0.055556,0.055556,0.055556,0.055556,0.055556,0.055556,0.055556,0.055556,0.055556,0.111111,0.055556,0.055556,0.055556,0.055556,0.055556,0.055556
6,Central Bay Street,0.055556,0.055556,0.055556,0.055556,0.055556,0.055556,0.055556,0.055556,0.055556,0.055556,0.111111,0.055556,0.055556,0.055556,0.055556,0.055556,0.055556
7,"Chinatown, Grange Park, Kensington Market",0.055556,0.055556,0.055556,0.055556,0.055556,0.055556,0.055556,0.055556,0.055556,0.055556,0.111111,0.055556,0.055556,0.055556,0.055556,0.055556,0.055556
8,Christie,0.055556,0.055556,0.055556,0.055556,0.055556,0.055556,0.055556,0.055556,0.055556,0.055556,0.111111,0.055556,0.055556,0.055556,0.055556,0.055556,0.055556
9,Church and Wellesley,0.055556,0.055556,0.055556,0.055556,0.055556,0.055556,0.055556,0.055556,0.055556,0.055556,0.111111,0.055556,0.055556,0.055556,0.055556,0.055556,0.055556


In [19]:
toronto_grouped.shape

(39, 18)

In [20]:
num_top_venues = 5
for neigh in toronto_grouped['Neighborhood']:
    print("----"+neigh+"----")
    temp = toronto_grouped[toronto_grouped['Neighborhood'] == neigh].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Adelaide, King, Richmond----
            venue  freq
0            Park  0.11
1         Brewery  0.06
2   Movie Theater  0.06
3      Steakhouse  0.06
4  Sandwich Place  0.06


----Bathurst Quay, CN Tower, Harbourfront West, Island airport, King and Spadina, Railway Lands, South Niagara----
            venue  freq
0            Park  0.11
1         Brewery  0.06
2   Movie Theater  0.06
3      Steakhouse  0.06
4  Sandwich Place  0.06


----Berczy Park----
            venue  freq
0            Park  0.11
1         Brewery  0.06
2   Movie Theater  0.06
3      Steakhouse  0.06
4  Sandwich Place  0.06


----Brockton, Exhibition Place, Parkdale Village----
            venue  freq
0            Park  0.11
1         Brewery  0.06
2   Movie Theater  0.06
3      Steakhouse  0.06
4  Sandwich Place  0.06


----Business Reply Mail Processing Centre 969 Eastern----
            venue  freq
0            Park  0.11
1         Brewery  0.06
2   Movie Theater  0.06
3      Steakhouse  0.06
4  Sandwich Place

In [21]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    return row_categories_sorted.index.values[0:num_top_venues]

In [22]:
import numpy as np
num_top_venues = 10
indicators = ['st', 'nd', 'rd']

columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))

neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = toronto_grouped['Neighborhood']

for ind in np.arange(toronto_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(toronto_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.shape

(39, 11)

### 3. Clustering Neighborhoods

In [23]:
from sklearn.cluster import KMeans
import sklearn.cluster.k_means_
km = KMeans(n_clusters=3, init='k-means++', max_iter=100, n_init=1, 
  verbose=True)

In [24]:
kclusters = 10
toronto_grouped_clustering = toronto_grouped.drop('Neighborhood', 1)
kmeans = KMeans(n_clusters=kclusters, random_state=1).fit(toronto_grouped_clustering)
print(kmeans.labels_[0:10])
print(len(kmeans.labels_))

[0 0 0 0 0 0 0 0 0 0]
39


  return_n_iter=True)


In [25]:
toronto_neighborhoods.head()

Unnamed: 0,Borough,Neighbourhood,Latitude,Longitude
0,East Toronto,The Beaches,43.676357,-79.293031
1,East Toronto,"Riverdale, The Danforth West",43.679557,-79.352188
2,East Toronto,"India Bazaar, The Beaches West",43.668999,-79.315572
3,East Toronto,Studio District,43.659526,-79.340923
4,Central Toronto,Lawrence Park,43.72802,-79.38879


In [28]:
toronto_merged = toronto_neighborhoods

# add clustering labels
toronto_merged['Cluster Labels'] = kmeans.labels_

# merge toronto_grouped with toronto_data to add latitude/longitude for each neighborhood
#toronto_merged = toronto_merged.join(neighborhoods_venues_sorted.set_index('Neighbourhood'), on='Neighbourhood')



### Finally, Lets visualize the clusters

In [27]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i+x+(i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(toronto_merged['Latitude'], toronto_merged['Longitude'], toronto_merged['Neighbourhood'],kmeans.labels_):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters