<a href="https://cognitiveclass.ai"><img src = "https://ibm.box.com/shared/static/9gegpsmnsoo25ikkbl4qzlvlyjbgxs5x.png" width = 400> </a>

<h1 align=center><font size = 5>Segmenting and Clustering Neighborhoods in Toronto</font></h1>

## Scraping the Toronto Postal Code, Borough and Neighborhood table from Wiki
### Link: https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M

## Table of Contents

<div class="alert alert-block alert-info" style="margin-top: 20px">

<font size = 3>

1.  <a href="#item1">Assignment:  Process and Wrangle data</a>

2.  <a href="#item2">Assignment: get the geo coordinates for all the postal codes in the dataframe</a>

3.  <a href="#item3">Assignment: Explore and cluster the neighborhoods in Toronto</a>
    </font>
    </div>

In [1]:
import pandas as pd
import numpy as np
import geocoder

from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

# Read the first table on the Wikipedia page: the Toronto postal codes (103 FSAs once trimmed)
toronto_wiki = pd.read_html('https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M')[0]
toronto_wiki.columns= ['PostalCode', 'Borough', 'Neighborhood']
toronto_wiki.head()
print('untrimmed table shape: {}'.format(toronto_wiki.shape))

untrimmed table shape: (180, 3)


## 1. Assignment:  Process and Wrangle data


Process the dataframe: drop the rows whose Borough is 'Not assigned', and where the Neighborhood is 'Not assigned', use the Borough name as the Neighborhood.
<li>Note: no cell in the raw Wikipedia table has a borough but a <strong>Not assigned</strong> neighborhood, but the logic is included anyway.</li>

In [2]:
# Drop the rows whose Borough is 'Not assigned'.
toronto_wiki_trim = toronto_wiki.drop(toronto_wiki[toronto_wiki.Borough == 'Not assigned'].index)
# Where the Neighborhood is 'Not assigned', fall back to the Borough name.
toronto_wiki_trim['Neighborhood'] = np.where(toronto_wiki_trim.Neighborhood == 'Not assigned', toronto_wiki_trim.Borough, toronto_wiki_trim.Neighborhood)

In [3]:
print('trimmed table shape: {}'.format(toronto_wiki_trim.shape))
toronto_wiki_trim.head()

trimmed table shape: (103, 3)


Unnamed: 0,PostalCode,Borough,Neighborhood
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,"Regent Park, Harbourfront"
5,M6A,North York,"Lawrence Manor, Lawrence Heights"
6,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government"
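
The two wrangling rules from this section can be checked on a small hand-made frame. This is only a sketch: the rows in `sample` are invented for illustration (the code `M9Z` in particular is hypothetical) and are not taken from the Wikipedia table.

```python
import numpy as np
import pandas as pd

# Hypothetical rows, made up to exercise both rules.
sample = pd.DataFrame({
    'PostalCode': ['M1A', 'M3A', 'M9Z'],
    'Borough': ['Not assigned', 'North York', 'Etobicoke'],
    'Neighborhood': ['Not assigned', 'Parkwoods', 'Not assigned'],
})

# Rule 1: keep only the rows that have a borough.
trimmed = sample[sample['Borough'] != 'Not assigned'].copy()

# Rule 2: a missing neighborhood falls back to the borough name.
trimmed['Neighborhood'] = np.where(
    trimmed['Neighborhood'] == 'Not assigned',
    trimmed['Borough'],
    trimmed['Neighborhood'],
)

print(trimmed)
```

The boolean mask plus `.copy()` is an alternative to `drop(...index)`; both should leave the same rows.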


## 2. Assignment: get the geo coordinates for all the postal codes in the dataframe


<li>With geocoder: geocoder.google('{}, Toronto, Ontario'.format(postal_code)), ref: https://geocoder.readthedocs.io/index.html.</li>

<li><strong>Method 1:</strong> use geocoder to get the coordinates from a PostalCode.
<strong>Unfortunately this does not work well: it gets stuck while fetching the data.</strong></li>
<li><strong>Method 1 alternative:</strong> use geopy Nominatim to get the coordinates from a Neighborhood.
<strong>However, some of the neighborhoods return None.</strong></li>
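
A lookup that may keep returning None can be wrapped so that it gives up after a fixed number of attempts instead of hanging. The helper below is a minimal sketch, not part of geocoder or geopy: `retry_until_result` and `fake_lookup` are hypothetical names, and the fake lookup only simulates a service that fails twice before answering.

```python
import time

def retry_until_result(lookup, query, max_tries=5, delay_seconds=1.0):
    """Call lookup(query) until it returns something other than None.

    Returns None after max_tries attempts instead of looping forever.
    """
    for attempt in range(max_tries):
        result = lookup(query)
        if result is not None:
            return result
        if attempt < max_tries - 1:
            time.sleep(delay_seconds)  # pause between attempts
    return None

# Stand-in geocoder used only for this demo: fails twice, then succeeds.
calls = {'count': 0}

def fake_lookup(query):
    calls['count'] += 1
    return [43.65426, -79.360636] if calls['count'] >= 3 else None

coords = retry_until_result(fake_lookup, 'M5A, Toronto, Ontario', delay_seconds=0)
print(coords, calls['count'])
```

In real use, `lookup` could be any function that takes a query string and returns coordinates or None.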

In [4]:
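#Get the coordinates as a list, based on the sample code from the assignment.
#geocoder.google can keep returning None, so the number of attempts is capped.
def get_coodinates_geocoder(geo_name, max_tries=10):
    lat_lng_coords = None
    tries = 0
    # retry until coordinates come back or the attempts are used up
    while lat_lng_coords is None and tries < max_tries:
        g = geocoder.google(geo_name)
        lat_lng_coords = g.latlng
        tries += 1
    # None signals that no coordinates were found
    return lat_lng_coords[:2] if lat_lng_coords is not None else None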

#Geocoding with Nominatim works fine; the code follows the NYC neighborhoods clustering lab.
def get_coodinates_geolocator(geo_name):
    geolocator = Nominatim(user_agent="ny_explorer")
    location = geolocator.geocode(geo_name)
    # [0, 0] marks a name that could not be geocoded
    return [location.latitude, location.longitude] if location is not None else [0, 0]

#Apply the function above to every row and return a dataframe with the columns
#PostalCode, Borough, Neighborhood, Latitude, Longitude.
#The geolocator returns None for some names, such as "Ontario Provincial Government, Toronto";
#those rows come back as [0, 0] and are filtered out below.
#Building a list of dicts is much faster than appending rows one at a time, see
#https://stackoverflow.com/questions/10715965/create-pandas-dataframe-by-appending-one-row-at-a-time/17496530#17496530
def get_coodinates_geolocator_df(geo_data):
    column_names = ['PostalCode', 'Borough', 'Neighborhood', 'Latitude', 'Longitude']
    geo_data_new = geo_data.apply(lambda row: dict(zip(column_names, ([row.PostalCode, row.Borough, row.Neighborhood] + get_coodinates_geolocator('{}, Toronto'.format(row.Neighborhood))))), axis=1)
    # keep only the geocoded rows: Toronto latitudes are positive, the failure placeholder is 0
    geo_data_new = pd.DataFrame([data for data in geo_data_new if data['Latitude'] > 0])
    return geo_data_new

#Test code
#get_coodinates_geolocator_df(toronto_data_nbh_cluster.head())

#geoName = 'Ontario Provincial Government, Toronto'
#print("Coordinate of {} is {}".format(geoName, get_coodinates_geolocator(geoName)))

#get_coodinates_geocoder('M5A, Toronto, Ontario')
#concat the postal code, toronto and the borough into a string, and save it as another column.
#toronto_wiki_trim['geo_name'] = toronto_wiki_trim.apply(lambda x: '{}, {}, {}'.format(x.PostalCode, "Toronto", "Ontario"), axis=1)
#toronto_wiki_trim['Latitude', 'Longitude'] = toronto_wiki_trim['geo_name'].apply(get_coodinates, axis=1)

<strong>Method 2:</strong> use the CSV file that has the geographical coordinates of each postal code: http://cocl.us/Geospatial_data

<strong>The head of the resulting dataframe is shown below, as the assignment requires.</strong>

In [5]:
geo_file = r'C:\Users\tong.PACKSIZE\Desktop\coursera_assignment\visualization_ipynb\Geospatial_Coordinates.csv'
#To merge the two dataframes, the key column must have the same name in both.
postalcode_coords=pd.read_csv(geo_file).rename(columns={'Postal Code': 'PostalCode'})

toronto_geo_coords = pd.merge(toronto_wiki_trim, postalcode_coords, on='PostalCode')
toronto_geo_coords.head(12)

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.753259,-79.329656
1,M4A,North York,Victoria Village,43.725882,-79.315572
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636
3,M6A,North York,"Lawrence Manor, Lawrence Heights",43.718518,-79.464763
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494
5,M9A,Etobicoke,"Islington Avenue, Humber Valley Village",43.667856,-79.532242
6,M1B,Scarborough,"Malvern, Rouge",43.806686,-79.194353
7,M3B,North York,Don Mills,43.745906,-79.352188
8,M4B,East York,"Parkview Hill, Woodbine Gardens",43.706397,-79.309937
9,M5B,Downtown Toronto,"Garden District, Ryerson",43.657162,-79.378937
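
`pd.merge` defaults to an inner join, so a postal code that is absent from the coordinates file would be dropped without any message. The sketch below shows one way to check for that with `how='left'` and `indicator=True`. The frames are toy data: `M0X` and the borough `Nowhere` are invented so that one row has no match.

```python
import pandas as pd

# Toy data: 'M0X' is an invented code with no coordinates.
codes = pd.DataFrame({'PostalCode': ['M3A', 'M4A', 'M0X'],
                      'Borough': ['North York', 'North York', 'Nowhere']})
coords = pd.DataFrame({'PostalCode': ['M3A', 'M4A'],
                       'Latitude': [43.753259, 43.725882],
                       'Longitude': [-79.329656, -79.315572]})

# A left merge keeps every postal code; indicator=True adds a '_merge'
# column recording whether each row found a match.
checked = pd.merge(codes, coords, on='PostalCode', how='left', indicator=True)
missing = checked.loc[checked['_merge'] == 'left_only', 'PostalCode'].tolist()
print(missing)
```

An empty `missing` list would mean the inner join loses nothing.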


## 3. Assignment: Explore and cluster the neighborhoods in Toronto


Use Folium to create the map, and use the Foursquare API to explore the neighborhoods and segment them.
Work only with boroughs whose name contains the word Toronto.

<strong>First:</strong> Create a map of Toronto with the coordinates of all the postal codes

In [6]:
import requests # library to handle requests
from pandas.io.json import json_normalize # flatten nested JSON into a table (pandas >= 1.0 also exposes this as pd.json_normalize)

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors
import folium # map rendering library

geolocator=Nominatim(user_agent='toronto_explorer')
location=geolocator.geocode("Toronto")
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Toronto are {}, {}.'.format(latitude, longitude))

map_toronto = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, label in zip(toronto_geo_coords['Latitude'], toronto_geo_coords['Longitude'], toronto_geo_coords['Neighborhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_toronto)

map_toronto

The geographical coordinates of Toronto are 43.6534817, -79.3839347.


#### Use the Foursquare API to get data for every neighborhood in a borough whose name contains "Toronto"
<li><strong>Step 1:</strong> Trim the data, dropping the rows whose borough name does not contain "Toronto".</li>
<li><strong>Step 2:</strong> Reorganize the data, splitting rows that list multiple neighborhoods into one row per neighborhood.</li>

In [7]:
#Extract the boroughs whose name contains "Toronto"
toronto_data_cluster = toronto_geo_coords.loc[toronto_geo_coords['Borough'].str.contains('Toronto')]
print("Shape of the dataframe for the boroughs that contain Toronto: {}".format(toronto_data_cluster.shape))

#Split rows with several comma-separated neighborhoods into one row per neighborhood
toronto_data_nbh_cluster = toronto_data_cluster.assign(Neighborhood=toronto_data_cluster['Neighborhood'].str.split(',')).explode('Neighborhood')
toronto_data_nbh_cluster = toronto_data_nbh_cluster.reset_index(drop=True)

print("Shape of the dataframe for the split neighborhoods: {}".format(toronto_data_nbh_cluster.shape))

Shape of the dataframe for the boroughs that contain Toronto: (39, 5)
Shape of the dataframe for the split neighborhoods: (78, 5)
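
The split-and-explode step in Step 2 can be tried on two rows copied from the earlier table. One detail worth checking is whitespace: splitting on `','` alone keeps the space that follows each comma, so the sketch strips it afterwards. Whether the real scraped table needs this depends on how its cells are formatted, which is an assumption here.

```python
import pandas as pd

# Two rows from the coordinates table, reduced to the columns that matter here.
rows = pd.DataFrame({
    'PostalCode': ['M5A', 'M7A'],
    'Neighborhood': ['Regent Park, Harbourfront',
                     "Queen's Park, Ontario Provincial Government"],
})

split_rows = (
    rows.assign(Neighborhood=rows['Neighborhood'].str.split(','))
        .explode('Neighborhood')
        .reset_index(drop=True)
)
# Remove the leading space left on every name after the first.
split_rows['Neighborhood'] = split_rows['Neighborhood'].str.strip()

print(split_rows)
```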


<li><strong>Step 3:</strong> Use get_coodinates_geolocator_df function defined above to get the coordinates for each neighborhood.</li>

In [8]:
toronto_data_nbh_cluster = get_coodinates_geolocator_df(toronto_data_nbh_cluster)
toronto_data_nbh_cluster.head()

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude
0,M5A,Downtown Toronto,Regent Park,43.660706,-79.360457
1,M5A,Downtown Toronto,Harbourfront,43.64008,-79.38015
2,M7A,Downtown Toronto,Queen's Park,43.659659,-79.39034
3,M5B,Downtown Toronto,Garden District,43.6565,-79.377114
4,M5B,Downtown Toronto,Ryerson,43.658324,-79.378925


<li><strong>Step 4:</strong> Explore the Foursquare API data for a single neighborhood.</li>
The code is borrowed from the NYC neighborhoods lab.

In [9]:
CLIENT_ID = 'your-foursquare-client-id' # your Foursquare ID
CLIENT_SECRET = 'your-foursquare-client-secret' # your Foursquare Secret
ACCESS_TOKEN = 'your-foursquare-access-token' # your Foursquare Access Token
VERSION = '20180604'
LIMIT = 100 # A default Foursquare API limit value
RADIUS = 500

url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    toronto_data_nbh_cluster.iloc[0, -2], 
    toronto_data_nbh_cluster.iloc[0, -1], 
    RADIUS, 
    LIMIT)

results = requests.get(url).json()
#results
venues = results['response']['groups'][0]['items']
    
nearby_venues = json_normalize(venues) # flatten JSON
# keep only the columns of interest
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues = nearby_venues.loc[:, filtered_columns]
nearby_venues.head()

Unnamed: 0,venue.name,venue.categories,venue.location.lat,venue.location.lng
0,Regent Park Aquatic Centre,"[{'id': '4bf58dd8d48988d15e941735', 'name': 'P...",43.6606,-79.361392
1,Sumach Espresso,"[{'id': '4bf58dd8d48988d1e0931735', 'name': 'C...",43.658135,-79.359515
2,Daniels Spectrum,"[{'id': '4bf58dd8d48988d1f2931735', 'name': 'P...",43.660137,-79.361808
3,Thai To Go,"[{'id': '4bf58dd8d48988d149941735', 'name': 'T...",43.663418,-79.36071
4,Paintbox Bistro,"[{'id': '4bf58dd8d48988d1c4941735', 'name': 'R...",43.66005,-79.362855
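
The `venue.categories` column above still holds a list of dicts per venue. A small helper can pull out the first category name and cope with venues that have none. This is a sketch under assumptions: `extract_venues` is a hypothetical helper, and `sample_results` is a hand-written stand-in that copies only the nesting used in this notebook (`results['response']['groups'][0]['items']`), with invented venues.

```python
def extract_venues(results):
    """Return (name, lat, lng, category) tuples from an explore-style response dict."""
    items = results['response']['groups'][0]['items']
    rows = []
    for item in items:
        venue = item['venue']
        categories = venue.get('categories') or []
        # Fall back to None when a venue carries no category.
        category = categories[0]['name'] if categories else None
        rows.append((venue['name'],
                     venue['location']['lat'],
                     venue['location']['lng'],
                     category))
    return rows

# Hand-written stand-in for a response; the venues are invented.
sample_results = {'response': {'groups': [{'items': [
    {'venue': {'name': 'Example Espresso Bar',
               'location': {'lat': 43.6581, 'lng': -79.3595},
               'categories': [{'name': 'Coffee Shop'}]}},
    {'venue': {'name': 'Uncategorized Place',
               'location': {'lat': 43.66, 'lng': -79.36},
               'categories': []}},
]}]}}

venue_rows = extract_venues(sample_results)
print(venue_rows)
```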


<li><strong>Step 5:</strong> Borrow the getNearbyVenues function from the lab to extract the needed data from the API response.</li>

In [10]:
#function borrowed from the lab
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        #print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return nearby_venues
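
The request URL in `getNearbyVenues` is assembled with one long format string. An alternative sketch is to build the query string from a dict with the standard library's `urlencode`, which also escapes any special characters. `build_explore_url` is a hypothetical helper and the credentials below are placeholders; the endpoint and parameter names are the ones already used in this notebook.

```python
from urllib.parse import urlencode

def build_explore_url(client_id, client_secret, version, lat, lng, radius=500, limit=100):
    """Build the venues/explore URL with an encoded query string."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,
        'll': '{},{}'.format(lat, lng),
        'radius': radius,
        'limit': limit,
    }
    return 'https://api.foursquare.com/v2/venues/explore?' + urlencode(params)

# Placeholder credentials, for illustration only.
example_url = build_explore_url('MY_ID', 'MY_SECRET', '20180604', 43.660706, -79.360457)
print(example_url)
```

Note that `urlencode` percent-encodes the comma in `ll` as `%2C`, which decodes back to the same value.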


<li><strong>Step 6:</strong> Run the neighborhoods in the dataframe toronto_data_nbh_cluster through the function above</li>

In [11]:
venue_data = getNearbyVenues(toronto_data_nbh_cluster['Neighborhood'], toronto_data_nbh_cluster['Latitude'], toronto_data_nbh_cluster['Longitude'])

In [12]:
print('There are {} neighborhoods and {} unique categories. Shape of data: {}'.format(len(venue_data['Neighborhood'].unique()), len(venue_data['Venue Category'].unique()), venue_data.shape))
venue_data.tail(20)

There are 73 neighborhoods and 289 unique categories. Shape of data: (3691, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
3671,Church and Wellesley,43.670862,-79.372792,Toronto Public Library (St. James Town),43.66879,-79.374998,Library
3672,Church and Wellesley,43.670862,-79.372792,Isabella Hotel,43.670098,-79.375983,Hotel
3673,Church and Wellesley,43.670862,-79.372792,No Frills,43.671616,-79.378187,Grocery Store
3674,Church and Wellesley,43.670862,-79.372792,Tim Hortons,43.669855,-79.375578,Coffee Shop
3675,Church and Wellesley,43.670862,-79.372792,Booster Juice,43.671566,-79.378581,Juice Bar
3676,Church and Wellesley,43.670862,-79.372792,Shoppers Drug Mart,43.670177,-79.375649,Pharmacy
3677,Church and Wellesley,43.670862,-79.372792,TD Canada Trust,43.672484,-79.377162,Bank
3678,Church and Wellesley,43.670862,-79.372792,Tim Hortons,43.66782,-79.375075,Coffee Shop
3679,Church and Wellesley,43.670862,-79.372792,Aroma Espresso Bar,43.672154,-79.377885,Coffee Shop
3680,Church and Wellesley,43.670862,-79.372792,Handy Variety,43.669828,-79.375932,Food & Drink Shop


<li><strong>Step 7:</strong> Cluster Neighborhoods and create map</li>
Use pd.get_dummies to build a dataframe with one indicator column per venue category.
The code is the same as in the New York neighborhoods lab. I tried to use pd.concat to build the new dataframe from the Neighborhood column and the category dummies, but groupby then always failed with "Grouper for 'Neighborhood' not 1-dimensional".

In [18]:
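#The pd.concat version below fails later in groupby with "Grouper for 'Neighborhood' not 1-dimensional".
#Likely cause: a venue category that is itself named 'Neighborhood', which gives the concatenated frame two 'Neighborhood' columns.
#venue_data_dummies = pd.concat([venue_data[['Neighborhood']], pd.get_dummies(venue_data[['Venue Category']], prefix="", prefix_sep="")], axis=1).reset_index() 
venue_data_dummies = pd.get_dummies(venue_data[['Venue Category']], prefix="", prefix_sep="")
#Assigning the column overwrites any dummy column of the same name, so only one 'Neighborhood' column remains
venue_data_dummies['Neighborhood'] = venue_data['Neighborhood']
#Move the last column to the front (this is 'Neighborhood' only when the assignment above appended a new column)
fixed_columns = [venue_data_dummies.columns[-1]] + list(venue_data_dummies.columns[:-1])
venue_data_dummies = venue_data_dummies[fixed_columns]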

print('Shape of data : {}'.format(venue_data_dummies.shape))
venue_data_dummies.head()

Shape of data : (3691, 289)


Unnamed: 0,Yoga Studio,Accessories Store,African Restaurant,Airport,Airport Service,American Restaurant,Animal Shelter,Antique Shop,Aquarium,Art Gallery,...,University,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Whisky Bar,Wine Bar,Wine Shop,Wings Joint,Women's Store
0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
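
One plausible explanation for the grouper error mentioned in this step is a venue category that is literally named `Neighborhood`. The reported shape (3691, 289) matches the 289 unique categories, which is consistent with the `Neighborhood` assignment overwriting an existing dummy column rather than adding a new one. The toy example below reproduces the name clash and shows one possible workaround, renaming the dummy column before concatenating. The venues and the new column name `Neighborhood (category)` are invented for illustration.

```python
import pandas as pd

# Invented venues; note that one category is literally called 'Neighborhood'.
venues = pd.DataFrame({
    'Neighborhood': ['Regent Park', 'Regent Park', 'Harbourfront'],
    'Venue Category': ['Coffee Shop', 'Neighborhood', 'Park'],
})

dummies = pd.get_dummies(venues[['Venue Category']], prefix='', prefix_sep='').astype(float)
has_clash = 'Neighborhood' in dummies.columns  # the category became a column name
print(has_clash)

# Renaming the clashing dummy column first keeps both pieces of information.
dummies = dummies.rename(columns={'Neighborhood': 'Neighborhood (category)'})
onehot = pd.concat([venues[['Neighborhood']], dummies], axis=1)

# With a single 'Neighborhood' column, groupby works as expected.
grouped = onehot.groupby('Neighborhood').mean().reset_index()
print(grouped)
```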


Group the rows by Neighborhood, and then take the mean of the frequency of the occurrence of each category.

In [23]:
venue_data_grouped = venue_data_dummies.groupby('Neighborhood').mean().reset_index()
venue_data_grouped.head()
#venue_data_grouped.iloc[0,0]

Unnamed: 0,Neighborhood,Yoga Studio,Accessories Store,African Restaurant,Airport,Airport Service,American Restaurant,Animal Shelter,Antique Shop,Aquarium,...,University,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Whisky Bar,Wine Bar,Wine Shop,Wings Joint,Women's Store
0,Adelaide,0.0,0.0,0.0,0.0,0.0,0.03,0.0,0.0,0.0,...,0.0,0.02,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0
1,Bathurst Quay,0.0,0.0,0.0,0.041667,0.041667,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,Cabbagetown,0.0,0.0,0.0,0.0,0.0,0.021277,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Chinatown,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.046875,0.015625,0.0,0.03125,0.0,0.015625,0.0,0.0,0.0
4,Deer Park,0.016949,0.0,0.0,0.0,0.0,0.016949,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.016949,0.0,0.0,0.0,0.0,0.0
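
The mean of a one-hot column is the share of a neighborhood's venues that belong to that category. A small worked example, using an invented neighborhood with three coffee shops and one park:

```python
import pandas as pd

# Invented venues for one neighborhood: 3 coffee shops and 1 park.
venues = pd.DataFrame({
    'Neighborhood': ['Example Hood'] * 4,
    'Venue Category': ['Coffee Shop', 'Coffee Shop', 'Park', 'Coffee Shop'],
})

onehot = pd.get_dummies(venues['Venue Category']).astype(float)
onehot.insert(0, 'Neighborhood', venues['Neighborhood'])

# Mean of the 0/1 indicators = count of the category / number of venues.
freq = onehot.groupby('Neighborhood').mean()
print(freq)
```

Here the Coffee Shop column averages to 3/4 and the Park column to 1/4, so each row of the grouped table sums to 1.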


Create a new dataframe and display the top 10 venues for each neighborhood

In [27]:
newColumns = ['Neighborhood']+["No. " +str(x+1)+ " Common Venue" for x in np.arange(10)]
neighbor_top10_venue = pd.DataFrame(columns= newColumns)
neighbor_top10_venue['Neighborhood'] = venue_data_grouped['Neighborhood'] 

for nb in np.arange(venue_data_grouped.shape[0]):
    neighbor_top10_venue.iloc[nb, 1:]= venue_data_grouped.iloc[nb, 1:].sort_values(ascending=False).index.values[0:10]
    
neighbor_top10_venue.head()

Unnamed: 0,Neighborhood,No. 1 Common Venue,No. 2 Common Venue,No. 3 Common Venue,No. 4 Common Venue,No. 5 Common Venue,No. 6 Common Venue,No. 7 Common Venue,No. 8 Common Venue,No. 9 Common Venue,No. 10 Common Venue
0,Adelaide,Coffee Shop,Restaurant,Café,Gym,Clothing Store,Japanese Restaurant,Italian Restaurant,Hotel,American Restaurant,Gastropub
1,Bathurst Quay,Coffee Shop,Café,Park,Grocery Store,Dance Studio,Sculpture Garden,Caribbean Restaurant,Sushi Restaurant,Garden,Japanese Restaurant
2,Cabbagetown,Restaurant,Café,Coffee Shop,Indian Restaurant,Diner,Bakery,Gastropub,Pub,Pizza Place,Beer Store
3,Chinatown,Café,Vegetarian / Vegan Restaurant,Mexican Restaurant,Bar,Coffee Shop,Vietnamese Restaurant,Dessert Shop,Grocery Store,Clothing Store,Bakery
4,Deer Park,Coffee Shop,Italian Restaurant,Sushi Restaurant,Café,Grocery Store,Thai Restaurant,Bank,Bagel Shop,Sandwich Place,Restaurant
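
Ranking one row of the frequency table is a sort in descending order. When a neighborhood has fewer than ten distinct categories, the remaining slots are filled with categories whose frequency is zero, which may explain alphabetical runs such as "Dog Run, Doner Restaurant, Donut Shop" in later tables. The sketch below, on an invented row, filters out zeros before ranking; this is an optional variation, not what the notebook does.

```python
import pandas as pd

# Invented frequency row for a single neighborhood.
freq_row = pd.Series({'Coffee Shop': 0.40, 'Park': 0.10, 'Café': 0.30,
                      'Bank': 0.20, 'Zoo': 0.0})

# Keep only categories that actually occur, then rank by frequency.
ranked = freq_row[freq_row > 0].sort_values(ascending=False)
top_venues = ranked.index[:10].tolist()
print(top_venues)
```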


Cluster Neighborhoods
<li>With kclusters set to 4-6, most of the neighborhoods fall into a single cluster. A value of 7 spreads them out more.</li>
Cluster sizes for kclusters = 7, as [cluster, number of neighborhoods]: [[ 0,  3],[ 1,  6],[ 2,  1],[ 3,  2],[ 4,  1],[ 5, 35],[ 6, 25]]

In [52]:
# import k-means from clustering stage
from sklearn.cluster import KMeans

def array_value_counts(array):
    # return rows of [value, count] for every value that occurs in the array
    y=np.bincount(array)
    ii = np.nonzero(y)[0]
    return np.vstack((ii,y[ii])).T
    
kclusters = 7

neighbor_top10_venue_clustering = venue_data_grouped.drop('Neighborhood', axis=1)
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(neighbor_top10_venue_clustering)
#kmeans.labels_[0:10]
array_value_counts(kmeans.labels_)

array([[ 0,  3],
       [ 1,  6],
       [ 2,  1],
       [ 3,  2],
       [ 4,  1],
       [ 5, 35],
       [ 6, 25]], dtype=int32)
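
Comparing cluster sizes for several values of k can be scripted rather than rerun by hand. The sketch below uses synthetic random features as a stand-in for the frequency table, so its sizes say nothing about Toronto. It also uses `np.unique(..., return_counts=True)`, which gives the same value/count pairs as `array_value_counts`. Setting `n_init=10` explicitly is a choice made here to keep results stable across scikit-learn versions.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic stand-in for the frequency table: 60 rows, 5 features.
rng = np.random.default_rng(0)
features = rng.random((60, 5))

sizes_by_k = {}
for k in range(4, 8):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit(features).labels_
    # np.unique returns the cluster ids and how many rows fell into each one.
    cluster_ids, counts = np.unique(labels, return_counts=True)
    sizes_by_k[k] = dict(zip(cluster_ids.tolist(), counts.tolist()))
    print(k, sizes_by_k[k])
```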

Add the cluster label to neighbor_top10_venue, along with the coordinates of each neighborhood.

In [58]:
#Add Cluster
neighbor_top10_venue.insert(0, 'Cluster', kmeans.labels_)
neighbor_top10_venue.head()

#Reference back the coordinate from toronto_data_nbh_cluster
toronto_nbh_Coordis = toronto_data_nbh_cluster
toronto_nbh_Coordis = toronto_nbh_Coordis.join(neighbor_top10_venue.set_index('Neighborhood'), on='Neighborhood')
print("shape of dataframe: {}".format(toronto_nbh_Coordis.shape))
toronto_nbh_Coordis.head()

shape of dataframe: (74, 16)


Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude,Cluster,No. 1 Common Venue,No. 2 Common Venue,No. 3 Common Venue,No. 4 Common Venue,No. 5 Common Venue,No. 6 Common Venue,No. 7 Common Venue,No. 8 Common Venue,No. 9 Common Venue,No. 10 Common Venue
0,M5A,Downtown Toronto,Regent Park,43.660706,-79.360457,5,Coffee Shop,Restaurant,Thai Restaurant,Pharmacy,Auto Dealership,Park,Pool,Pub,Electronics Store,Fast Food Restaurant
1,M5A,Downtown Toronto,Harbourfront,43.64008,-79.38015,5,Coffee Shop,Café,Restaurant,Hotel,Italian Restaurant,Brewery,History Museum,Pizza Place,Sandwich Place,Chinese Restaurant
2,M7A,Downtown Toronto,Queen's Park,43.659659,-79.39034,5,Coffee Shop,Café,Sandwich Place,Italian Restaurant,French Restaurant,Restaurant,Bubble Tea Shop,Thai Restaurant,Japanese Restaurant,Bank
3,M5B,Downtown Toronto,Garden District,43.6565,-79.377114,5,Clothing Store,Hotel,Coffee Shop,Restaurant,Bookstore,Fast Food Restaurant,Electronics Store,Burger Joint,Lingerie Store,Theater
4,M5B,Downtown Toronto,Ryerson,43.658324,-79.378925,5,Coffee Shop,Clothing Store,Diner,Hotel,Middle Eastern Restaurant,Italian Restaurant,Japanese Restaurant,Café,Movie Theater,Falafel Restaurant


Visualize the data with Folium map

In [59]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(toronto_nbh_Coordis['Latitude'], toronto_nbh_Coordis['Longitude'], toronto_nbh_Coordis['Neighborhood'], toronto_nbh_Coordis['Cluster']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

Map observation: clusters 5 and 6 appear to lie closer to the city center. Let's check the data for these two clusters.
Clusters 2 and 4 each contain only 1 neighborhood, so let's check those as well, along with cluster 3.

In [62]:
toronto_nbh_Coordis.loc[toronto_nbh_Coordis['Cluster']==5, :].head(10)

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude,Cluster,No. 1 Common Venue,No. 2 Common Venue,No. 3 Common Venue,No. 4 Common Venue,No. 5 Common Venue,No. 6 Common Venue,No. 7 Common Venue,No. 8 Common Venue,No. 9 Common Venue,No. 10 Common Venue
0,M5A,Downtown Toronto,Regent Park,43.660706,-79.360457,5,Coffee Shop,Restaurant,Thai Restaurant,Pharmacy,Auto Dealership,Park,Pool,Pub,Electronics Store,Fast Food Restaurant
1,M5A,Downtown Toronto,Harbourfront,43.64008,-79.38015,5,Coffee Shop,Café,Restaurant,Hotel,Italian Restaurant,Brewery,History Museum,Pizza Place,Sandwich Place,Chinese Restaurant
2,M7A,Downtown Toronto,Queen's Park,43.659659,-79.39034,5,Coffee Shop,Café,Sandwich Place,Italian Restaurant,French Restaurant,Restaurant,Bubble Tea Shop,Thai Restaurant,Japanese Restaurant,Bank
3,M5B,Downtown Toronto,Garden District,43.6565,-79.377114,5,Clothing Store,Hotel,Coffee Shop,Restaurant,Bookstore,Fast Food Restaurant,Electronics Store,Burger Joint,Lingerie Store,Theater
4,M5B,Downtown Toronto,Ryerson,43.658324,-79.378925,5,Coffee Shop,Clothing Store,Diner,Hotel,Middle Eastern Restaurant,Italian Restaurant,Japanese Restaurant,Café,Movie Theater,Falafel Restaurant
5,M5C,Downtown Toronto,St. James Town,43.669403,-79.372704,5,Coffee Shop,Pizza Place,Grocery Store,Café,Market,Intersection,Italian Restaurant,Beer Store,Bistro,Library
7,M5E,Downtown Toronto,Berczy Park,43.647984,-79.375396,5,Coffee Shop,Café,Restaurant,Seafood Restaurant,Japanese Restaurant,Italian Restaurant,Bakery,Cocktail Bar,Beer Bar,Gym
8,M5G,Downtown Toronto,Central Bay Street,43.652651,-79.382503,5,Coffee Shop,Clothing Store,Hotel,Café,Bakery,Seafood Restaurant,Restaurant,Salad Place,Diner,Thai Restaurant
10,M5H,Downtown Toronto,Richmond,43.648587,-79.391373,5,Coffee Shop,Café,Thai Restaurant,Arts & Crafts Store,Clothing Store,Restaurant,Hotel,Sandwich Place,Shoe Store,Italian Restaurant
11,M5H,Downtown Toronto,Adelaide,43.650298,-79.380477,5,Coffee Shop,Restaurant,Café,Gym,Clothing Store,Japanese Restaurant,Italian Restaurant,Hotel,American Restaurant,Gastropub


In [66]:
toronto_nbh_Coordis.loc[toronto_nbh_Coordis['Cluster']==6, :].head(10)

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude,Cluster,No. 1 Common Venue,No. 2 Common Venue,No. 3 Common Venue,No. 4 Common Venue,No. 5 Common Venue,No. 6 Common Venue,No. 7 Common Venue,No. 8 Common Venue,No. 9 Common Venue,No. 10 Common Venue
6,M4E,East Toronto,The Beaches,43.671024,-79.296712,6,Beach,Park,Nail Salon,Pizza Place,Japanese Restaurant,Coffee Shop,Pub,Breakfast Spot,Bar,Skating Rink
9,M6G,Downtown Toronto,Christie,43.664111,-79.418405,6,Korean Restaurant,Coffee Shop,Pub,Cocktail Bar,Sandwich Place,Café,Ice Cream Shop,Mexican Restaurant,Indian Restaurant,Grocery Store
13,M6H,West Toronto,Dufferin,43.660202,-79.435719,6,Bar,Bakery,Coffee Shop,Mexican Restaurant,Restaurant,Café,Cocktail Bar,Sandwich Place,Beer Store,Vietnamese Restaurant
18,M6J,West Toronto,Little Portugal,43.647413,-79.431116,6,Bar,Café,Coffee Shop,Restaurant,Korean Restaurant,Bakery,Cocktail Bar,Japanese Restaurant,Italian Restaurant,Dive Bar
20,M4K,East Toronto,The Danforth West,43.686433,-79.300355,6,Grocery Store,Pharmacy,Bus Line,Coffee Shop,Baseball Field,Sushi Restaurant,Fried Chicken Joint,French Restaurant,Light Rail Station,Fast Food Restaurant
21,M4K,East Toronto,Riverdale,43.66547,-79.352594,6,Vietnamese Restaurant,Chinese Restaurant,Grocery Store,Light Rail Station,Bakery,Bar,Fast Food Restaurant,Breakfast Spot,Vegetarian / Vegan Restaurant,Baseball Field
24,M6K,West Toronto,Brockton,43.650917,-79.440022,6,Bar,Park,Vietnamese Restaurant,Gastropub,Portuguese Restaurant,Bus Stop,Café,French Restaurant,Korean Restaurant,Dive Bar
25,M6K,West Toronto,Parkdale Village,43.640495,-79.436897,6,Tibetan Restaurant,Pharmacy,Diner,Bakery,Restaurant,Pizza Place,Indian Restaurant,Liquor Store,Fast Food Restaurant,Boutique
27,M4L,East Toronto,India Bazaar,43.672223,-79.323503,6,Indian Restaurant,Grocery Store,Restaurant,Café,Bus Stop,Bar,Italian Restaurant,Snack Place,Burger Joint,Shopping Plaza
28,M4L,East Toronto,The Beaches West,43.671024,-79.296712,6,Beach,Park,Nail Salon,Pizza Place,Japanese Restaurant,Coffee Shop,Pub,Breakfast Spot,Bar,Skating Rink


In [65]:
toronto_nbh_Coordis.loc[toronto_nbh_Coordis['Cluster']==2, :]

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude,Cluster,No. 1 Common Venue,No. 2 Common Venue,No. 3 Common Venue,No. 4 Common Venue,No. 5 Common Venue,No. 6 Common Venue,No. 7 Common Venue,No. 8 Common Venue,No. 9 Common Venue,No. 10 Common Venue
59,M4V,Central Toronto,Forest Hill SE,43.693559,-79.413902,2,Home Service,Park,Arts & Crafts Store,Bank,Eastern European Restaurant,Dog Run,Doner Restaurant,Donut Shop,Dry Cleaner,Dumpling Restaurant


In [64]:
toronto_nbh_Coordis.loc[toronto_nbh_Coordis['Cluster']==4, :]

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude,Cluster,No. 1 Common Venue,No. 2 Common Venue,No. 3 Common Venue,No. 4 Common Venue,No. 5 Common Venue,No. 6 Common Venue,No. 7 Common Venue,No. 8 Common Venue,No. 9 Common Venue,No. 10 Common Venue
51,M4T,Central Toronto,Moore Park,43.690388,-79.383297,4,Tennis Court,Convenience Store,Gym,Egyptian Restaurant,Dog Run,Doner Restaurant,Donut Shop,Dry Cleaner,Dumpling Restaurant,Eastern European Restaurant


In [63]:
toronto_nbh_Coordis.loc[toronto_nbh_Coordis['Cluster']==3, :]

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude,Cluster,No. 1 Common Venue,No. 2 Common Venue,No. 3 Common Venue,No. 4 Common Venue,No. 5 Common Venue,No. 6 Common Venue,No. 7 Common Venue,No. 8 Common Venue,No. 9 Common Venue,No. 10 Common Venue
50,M6S,West Toronto,Swansea,43.64494,-79.478313,3,Park,Dance Studio,Social Club,Construction & Landscaping,Restaurant,Women's Store,Dog Run,Doner Restaurant,Donut Shop,Dry Cleaner
68,M4W,Downtown Toronto,Rosedale,43.678356,-79.380746,3,Park,Playground,Bike Trail,Women's Store,Electronics Store,Doner Restaurant,Donut Shop,Dry Cleaner,Dumpling Restaurant,Eastern European Restaurant
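
Instead of paging through one table per cluster, the clusters can be summarized in a single frame. This is a sketch on invented rows shaped like the clustered table (the column names `Cluster` and `No. 1 Common Venue` are the notebook's; the neighborhoods A to E and the output column names are made up).

```python
import pandas as pd

# Invented labels and top venues, shaped like the clustered table.
clustered = pd.DataFrame({
    'Neighborhood': ['A', 'B', 'C', 'D', 'E'],
    'Cluster': [5, 5, 5, 6, 6],
    'No. 1 Common Venue': ['Coffee Shop', 'Coffee Shop', 'Clothing Store', 'Bar', 'Bar'],
})

# Most frequent top venue within each cluster.
summary = (
    clustered.groupby('Cluster')['No. 1 Common Venue']
             .agg(lambda s: s.value_counts().idxmax())
             .rename('Most frequent top venue')
)
# Number of neighborhoods per cluster.
sizes = clustered['Cluster'].value_counts().sort_index().rename('Neighborhoods')

print(pd.concat([sizes, summary], axis=1))
```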
