<h1 align=center><font size = 5> Toranto Neighbourhood Research </font></h1>



## Introduction
Using postcode and neighbourhood data from wikipedia and Foursquare API , analysis the neighbourhoods and help people select a good place to live 

This is triggered by IBM Data Science Professional capstone.

## Table of Contents

The notebook will contain the following sections:

1.  Download and Data pre-processing
    Get Data from wikipedia and cleansing
    
2.  EDA - Explore Neighborhoods data in Toronto
    Explore venues data matching each location. 
    
3.  EDA - Analyze each neighbourhood 
    Grouping
    Feature Engineering by creating dummy features
    
4.  Model Development: Clustering
    KNN. TODO: try another ways in the futures.
    
5.  Model Evaluation and Report
    
TODO:    
KNN result not perfect, need more data or features from other sources like Crime data? Education Data?    

In [154]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.neighbors import KNeighborsClassifier

from pandas.io.json import json_normalize # tranform JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

from sklearn.cluster import KMeans
import requests
from bs4 import BeautifulSoup

# 1. Download and Data Pre-processing

## 1.1 Get data from wikipedia

*Note*: tricky to get the table data from html file

In [3]:
url = "https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M"
source = requests.get(url).text
soup_html = BeautifulSoup(source, 'lxml')
#print(soup_html.prettify())

In [4]:
table_string = soup_html.find_all('table', 'wikitable sortable')[0]

In [5]:
pd_table = pd.read_html(str(table_string))[0]
pd_table.columns = pd_table.iloc[0]
pd_table = pd_table[1:]

### 1.2 Data preprocessing - Delete data "Not assigned"
 

In [6]:
borough_assigned = pd_table[pd_table['Borough']!='Not assigned'].reset_index(drop=True)

### 1.3  Data preprocessing - Assign neighourhood with "Not assigned." 

In [7]:
borough_assigned[borough_assigned['Neighbourhood']=='Not assigned'] = borough_assigned['Borough']

In [8]:
#Check if still exist 'Not assigned'
sum(borough_assigned['Neighbourhood']=='Not assigned')

0

### 1.4 Combine more Neighbourhoods for same postal code area

By unique postcode, geo data could be generated, then get venues from Foursquare API 

In [114]:
def join_array(arr):
    return ",".join(arr)

groupby_Postcode = borough_assigned.groupby('Postcode').agg({'Neighbourhood': join_array, 
                                                             'Borough': lambda x: list(set(x))[0]})

neighbourhoods = groupby_Postcode.reset_index()
print(neighbourhoods.shape)
neighbourhoods.head()

(103, 3)


Unnamed: 0,Postcode,Neighbourhood,Borough
0,M1B,"Rouge,Malvern",Scarborough
1,M1C,"Highland Creek,Rouge Hill,Port Union",Scarborough
2,M1E,"Guildwood,Morningside,West Hill",Scarborough
3,M1G,Woburn,Scarborough
4,M1H,Cedarbrae,Scarborough


In [115]:
neighbourhoods.shape

(103, 3)

### 1.5 Add longtitude and latitude

Only with longtitude and latidue info, venues could be explored through Foursquare API

In [116]:
geo_coordinates_url = 'http://cocl.us/Geospatial_data'

geo_coordinates = pd.read_csv(geo_coordinates_url)

geo_coordinates.head()

Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


In [117]:
neighbourhoods_with_lng_lat = pd.merge(neighbourhoods, geo_coordinates, left_on="Postcode", right_on='Postal Code')

neighbourhoods_with_lng_lat.drop(['Postal Code'], axis=1, inplace=True)

neighbourhoods_with_lng_lat.head()

Unnamed: 0,Postcode,Neighbourhood,Borough,Latitude,Longitude
0,M1B,"Rouge,Malvern",Scarborough,43.806686,-79.194353
1,M1C,"Highland Creek,Rouge Hill,Port Union",Scarborough,43.784535,-79.160497
2,M1E,"Guildwood,Morningside,West Hill",Scarborough,43.763573,-79.188711
3,M1G,Woburn,Scarborough,43.770992,-79.216917
4,M1H,Cedarbrae,Scarborough,43.773136,-79.239476


### 1.6 Only analyze the data that the Borough contains 'Toronto"

Focus on the arar

In [124]:
toronto_data = neighbourhoods_with_lng_lat[neighbourhoods_with_lng_lat['Borough'].str
                                                                                 .contains('Toronto')] \
                                                                                 .reset_index(drop=True)

In [125]:
print(toronto_data.shape)
toronto_data.head()

(38, 5)


Unnamed: 0,Postcode,Neighbourhood,Borough,Latitude,Longitude
0,M4E,The Beaches,East Toronto,43.676357,-79.293031
1,M4K,"The Danforth West,Riverdale",East Toronto,43.679557,-79.352188
2,M4L,"The Beaches West,India Bazaar",East Toronto,43.668999,-79.315572
3,M4M,Studio District,East Toronto,43.659526,-79.340923
4,M4N,Lawrence Park,Central Toronto,43.72802,-79.38879


Let's see the this on the map

In [126]:
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

address = 'Toronto'

geolocator = Nominatim(user_agent='test')
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geograpical coordinate of Toronto are {}, {}.'.format(latitude, longitude))

The geograpical coordinate of Toronto are 43.653963, -79.387207.


Let's view the neighborhoods in map

In [127]:
import folium # map rendering library

# create map of Manhattan using latitude and longitude values
map_toronto = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, label in zip(toronto_data['Latitude'], toronto_data['Longitude'], toronto_data['Neighbourhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto)  
    
map_toronto

Next, we are going to start utilizing the Foursquare API to explore the neighborhoods and segment them

### 1.7 Understanding data from Foursquare API

#### Define Foursquare Credentials and Version

In [128]:
CLIENT_ID = 'ME0B4UESS5YNR2IZBNIQMQCRXIQUVOCCNP3R3A0E0QEC2WZH' # your Foursquare ID
CLIENT_SECRET = 'R55W1ZBE1QB5JG0AM4SZXOFR0PDGJJ24YHTVF14R2RKXFWNR' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

print('Your credentails:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentails:
CLIENT_ID: ME0B4UESS5YNR2IZBNIQMQCRXIQUVOCCNP3R3A0E0QEC2WZH
CLIENT_SECRET:R55W1ZBE1QB5JG0AM4SZXOFR0PDGJJ24YHTVF14R2RKXFWNR


#### Let's explore the first neighborhood in our dataframe.

Get the neighborhood's name.

In [129]:
toronto_data.loc[0,'Neighbourhood']

'The Beaches'

Get the neighborhood's latitude and longitude values.

In [130]:
neighborhood_latitude = toronto_data.loc[0, 'Latitude'] # neighborhood latitude value
neighborhood_longitude = toronto_data.loc[0, 'Longitude'] # neighborhood longitude value

neighborhood_name = toronto_data.loc[0, 'Neighbourhood'] # neighborhood name

print('Latitude and longitude values of {} are {}, {}.'.format(neighborhood_name, 
                                                               neighborhood_latitude, 
                                                               neighborhood_longitude))

Latitude and longitude values of The Beaches are 43.67635739999999, -79.2930312.


#### Now, let's get the top 100 venues that are in The Beaches within a radius of 500 meters.

In [131]:
radius = 500
top_n = 100
url = "https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}".format(
    CLIENT_ID, 
    CLIENT_SECRET,
    VERSION, 
    neighborhood_latitude, 
    neighborhood_longitude,
    radius,
    top_n)

results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '5c1c447bdb04f53ab6735573'},
 'response': {'suggestedFilters': {'header': 'Tap to show:',
   'filters': [{'name': 'Open now', 'key': 'openNow'}]},
  'headerLocation': 'The Beaches',
  'headerFullLocation': 'The Beaches, Toronto',
  'headerLocationGranularity': 'neighborhood',
  'totalResults': 5,
  'suggestedBounds': {'ne': {'lat': 43.680857404499996,
    'lng': -79.28682091449052},
   'sw': {'lat': 43.67185739549999, 'lng': -79.29924148550948}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'reasons': {'count': 0,
       'items': [{'summary': 'This spot is popular',
         'type': 'general',
         'reasonName': 'globalInteractionReason'}]},
      'venue': {'id': '4b8daea1f964a520480833e3',
       'name': 'Grover Pub and Grub',
       'location': {'address': '676 Kingston Rd.',
        'crossStreet': 'at Main St.',
        'lat': 43.679181434941015,
        'lng': -79.29721535878515,
        'labeledLatLng

In [132]:
# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

Now we are ready to clean the json and structure it into a pandas dataframe

In [133]:
venues = results['response']['groups'][0]['items']
    
nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues =nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head()

Unnamed: 0,name,categories,lat,lng
0,Grover Pub and Grub,Pub,43.679181,-79.297215
1,Starbucks,Coffee Shop,43.678798,-79.298045
2,Glen Stewart Park,Park,43.675278,-79.294647
3,Upper Beaches,Neighborhood,43.680563,-79.292869
4,Dip 'n Sip,Coffee Shop,43.678897,-79.297745


In [134]:
print('{} venues were returned by Foursquare.'.format(nearby_venues.shape[0]))

5 venues were returned by Foursquare.


## 2. Explore Neighborhoods in Toronto

#### Let's create a function to repeat the same process to all the neighborhoods in Toronto

In [135]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

#### Now write the code to run the above function on each neighbourhood and create a new dataframe called *toronto_venues*.

In [139]:
neighborhood_names = toronto_data.loc[:, 'Neighbourhood'] # neighborhood name
neighborhood_names

0                                           The Beaches
1                           The Danforth West,Riverdale
2                         The Beaches West,India Bazaar
3                                       Studio District
4                                         Lawrence Park
5                                      Davisville North
6                                    North Toronto West
7                                            Davisville
8                            Moore Park,Summerhill East
9     Deer Park,Forest Hill SE,Rathnelly,South Hill,...
10                                             Rosedale
11                           Cabbagetown,St. James Town
12                                 Church and Wellesley
13                             Harbourfront,Regent Park
14                              Ryerson,Garden District
15                                       St. James Town
16                                          Berczy Park
17                                   Central Bay

In [140]:
LIMIT = 100

neighborhood_latitudes = toronto_data.loc[:, 'Latitude'] # neighborhood latitude value
neighborhood_longitudes = toronto_data.loc[:, 'Longitude'] # neighborhood longitude value
neighborhood_names = toronto_data.loc[:, 'Neighbourhood'] # neighborhood name

toronto_venues = getNearbyVenues(neighborhood_names, neighborhood_latitudes, neighborhood_longitudes)


The Beaches
The Danforth West,Riverdale
The Beaches West,India Bazaar
Studio District
Lawrence Park
Davisville North
North Toronto West
Davisville
Moore Park,Summerhill East
Deer Park,Forest Hill SE,Rathnelly,South Hill,Summerhill West
Rosedale
Cabbagetown,St. James Town
Church and Wellesley
Harbourfront,Regent Park
Ryerson,Garden District
St. James Town
Berczy Park
Central Bay Street
Adelaide,King,Richmond
Harbourfront East,Toronto Islands,Union Station
Design Exchange,Toronto Dominion Centre
Commerce Court,Victoria Hotel
Roselawn
Forest Hill North,Forest Hill West
The Annex,North Midtown,Yorkville
Harbord,University of Toronto
Chinatown,Grange Park,Kensington Market
CN Tower,Bathurst Quay,Island airport,Harbourfront West,King and Spadina,Railway Lands,South Niagara
Stn A PO Boxes 25 The Esplanade
First Canadian Place,Underground city
Christie
Dovercourt Village,Dufferin
Little Portugal,Trinity
Brockton,Exhibition Place,Parkdale Village
High Park,The Junction South
Parkdale,Roncesvall

#### Explore the new venues data

In [141]:
print(toronto_venues.shape)
toronto_venues.head()

(1688, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,The Beaches,43.676357,-79.293031,Grover Pub and Grub,43.679181,-79.297215,Pub
1,The Beaches,43.676357,-79.293031,Starbucks,43.678798,-79.298045,Coffee Shop
2,The Beaches,43.676357,-79.293031,Glen Stewart Park,43.675278,-79.294647,Park
3,The Beaches,43.676357,-79.293031,Upper Beaches,43.680563,-79.292869,Neighborhood
4,The Beaches,43.676357,-79.293031,Dip 'n Sip,43.678897,-79.297745,Coffee Shop


In [142]:
toronto_venues.groupby('Neighborhood').count()

Unnamed: 0_level_0,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
"Adelaide,King,Richmond",100,100,100,100,100,100
Berczy Park,55,55,55,55,55,55
"Brockton,Exhibition Place,Parkdale Village",21,21,21,21,21,21
Business reply mail Processing Centre969 Eastern,16,16,16,16,16,16
"CN Tower,Bathurst Quay,Island airport,Harbourfront West,King and Spadina,Railway Lands,South Niagara",13,13,13,13,13,13
"Cabbagetown,St. James Town",50,50,50,50,50,50
Central Bay Street,80,80,80,80,80,80
"Chinatown,Grange Park,Kensington Market",95,95,95,95,95,95
Christie,15,15,15,15,15,15
Church and Wellesley,86,86,86,86,86,86


In [143]:
print('There are {} uniques categories.'.format(len(toronto_venues['Venue Category'].unique())))

There are 228 uniques categories.


## 3. Analyze Each Neighborhood

### 3.1 Feature Engineering, Get Dummy features

In [144]:
# one hot encoding
toronto_onehot = pd.get_dummies(toronto_venues[['Venue Category']], prefix="", prefix_sep="")

toronto_onehot.rename(columns={'Neighborhood': 'Venue_Neighborhood'}, inplace=True)

# add neighborhood column back to dataframe
toronto_onehot['Neighborhood'] = toronto_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [toronto_onehot.columns[-1]] + list(toronto_onehot.columns[:-1])
toronto_onehot = toronto_onehot[fixed_columns]

print(toronto_onehot.shape)
toronto_onehot.head()

(1688, 229)


Unnamed: 0,Neighborhood,Adult Boutique,Afghan Restaurant,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,...,Toy / Game Store,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Wine Bar,Wings Joint,Women's Store,Yoga Studio
0,The Beaches,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,The Beaches,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,The Beaches,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,The Beaches,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,The Beaches,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


#### Get the last features for each neighborhood

In [145]:
toronto_grouped = toronto_onehot.groupby('Neighborhood').mean().reset_index()
print(toronto_grouped.shape)
toronto_grouped.head()

(38, 229)


Unnamed: 0,Neighborhood,Adult Boutique,Afghan Restaurant,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,...,Toy / Game Store,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Wine Bar,Wings Joint,Women's Store,Yoga Studio
0,"Adelaide,King,Richmond",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,...,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.01,0.0
1,Berczy Park,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,"Brockton,Exhibition Place,Parkdale Village",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Business reply mail Processing Centre969 Eastern,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,"CN Tower,Bathurst Quay,Island airport,Harbourf...",0.0,0.0,0.076923,0.076923,0.076923,0.153846,0.153846,0.153846,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


#### Explore the most common venues

In [146]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [147]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = toronto_grouped['Neighborhood']

for ind in np.arange(toronto_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(toronto_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,"Adelaide,King,Richmond",Coffee Shop,Café,Steakhouse,American Restaurant,Thai Restaurant,Bakery,Clothing Store,Asian Restaurant,Gym,Bar
1,Berczy Park,Coffee Shop,Cocktail Bar,Restaurant,Farmers Market,Bakery,Italian Restaurant,Steakhouse,Cheese Shop,Pub,Café
2,"Brockton,Exhibition Place,Parkdale Village",Coffee Shop,Café,Breakfast Spot,Gym,Italian Restaurant,Convenience Store,Pet Store,Grocery Store,Nightclub,Climbing Gym
3,Business reply mail Processing Centre969 Eastern,Comic Shop,Auto Workshop,Smoke Shop,Park,Light Rail Station,Spa,Farmers Market,Fast Food Restaurant,Brewery,Burrito Place
4,"CN Tower,Bathurst Quay,Island airport,Harbourf...",Airport Lounge,Airport Service,Airport Terminal,Harbor / Marina,Boat or Ferry,Airport,Airport Food Court,Airport Gate,Boutique,Sculpture Garden


## 4. Model Creation:  Clustering 

Run *k*-means to cluster the neighborhood into 5 clusters.

In [172]:
# set number of clusters
kclusters = 5

toronto_grouped_clustering = toronto_grouped.drop('Neighborhood', 1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(toronto_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 
print(kmeans.labels_.shape)

(38,)


Check the clustering result on each neighborhood by merging sorted venues and cluster number

In [173]:
toronto_merged = toronto_data

# add clustering labels
toronto_merged['Cluster Labels'] = kmeans.labels_

# merge toronto_grouped with toronto_data to add latitude/longitude for each neighborhood
toronto_merged = pd.merge(toronto_merged, neighborhoods_venues_sorted, left_on='Neighbourhood', right_on='Neighborhood')

toronto_merged.head() # check the last columns!

Unnamed: 0,Postcode,Neighbourhood,Borough,Latitude,Longitude,Cluster Labels,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M4E,The Beaches,East Toronto,43.676357,-79.293031,0,The Beaches,Coffee Shop,Venue_Neighborhood,Park,Pub,Dessert Shop,Event Space,Ethiopian Restaurant,Electronics Store,Eastern European Restaurant,Dumpling Restaurant
1,M4K,"The Danforth West,Riverdale",East Toronto,43.679557,-79.352188,0,"The Danforth West,Riverdale",Greek Restaurant,Ice Cream Shop,Coffee Shop,Italian Restaurant,Bookstore,Bubble Tea Shop,Bakery,Spa,Juice Bar,Liquor Store
2,M4L,"The Beaches West,India Bazaar",East Toronto,43.668999,-79.315572,0,"The Beaches West,India Bazaar",Pizza Place,Sandwich Place,Park,Steakhouse,Sushi Restaurant,Food & Drink Shop,Ice Cream Shop,Pub,Movie Theater,Fish & Chips Shop
3,M4M,Studio District,East Toronto,43.659526,-79.340923,0,Studio District,Café,Coffee Shop,Gastropub,Italian Restaurant,Bakery,American Restaurant,Yoga Studio,Coworking Space,Seafood Restaurant,Sandwich Place
4,M4N,Lawrence Park,Central Toronto,43.72802,-79.38879,0,Lawrence Park,Dim Sum Restaurant,Park,Swim School,Bus Line,Diner,Falafel Restaurant,Event Space,Ethiopian Restaurant,Electronics Store,Eastern European Restaurant


Visualization

In [174]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i+x+(i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(toronto_merged['Latitude'], toronto_merged['Longitude'], toronto_merged['Neighborhood'], toronto_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

## 5. Model Evaluation: Examine Clusters

review each cluster and give a clear label or name to each cluster

#### Cluster 1

In [164]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 0, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Neighbourhood,Cluster Labels,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,The Beaches,0,The Beaches,Coffee Shop,Venue_Neighborhood,Park,Pub,Dessert Shop,Event Space,Ethiopian Restaurant,Electronics Store,Eastern European Restaurant,Dumpling Restaurant
1,"The Danforth West,Riverdale",0,"The Danforth West,Riverdale",Greek Restaurant,Ice Cream Shop,Coffee Shop,Italian Restaurant,Bookstore,Bubble Tea Shop,Bakery,Spa,Juice Bar,Liquor Store
2,"The Beaches West,India Bazaar",0,"The Beaches West,India Bazaar",Pizza Place,Sandwich Place,Park,Steakhouse,Sushi Restaurant,Food & Drink Shop,Ice Cream Shop,Pub,Movie Theater,Fish & Chips Shop
3,Studio District,0,Studio District,Café,Coffee Shop,Gastropub,Italian Restaurant,Bakery,American Restaurant,Yoga Studio,Coworking Space,Seafood Restaurant,Sandwich Place
4,Lawrence Park,0,Lawrence Park,Dim Sum Restaurant,Park,Swim School,Bus Line,Diner,Falafel Restaurant,Event Space,Ethiopian Restaurant,Electronics Store,Eastern European Restaurant
5,Davisville North,0,Davisville North,Grocery Store,Clothing Store,Burger Joint,Food & Drink Shop,Dance Studio,Hotel,Sandwich Place,Breakfast Spot,Park,Discount Store
6,North Toronto West,0,North Toronto West,Coffee Shop,Sporting Goods Shop,Clothing Store,Gym / Fitness Center,Fast Food Restaurant,Mexican Restaurant,Diner,Dessert Shop,Park,Chinese Restaurant
7,Davisville,0,Davisville,Dessert Shop,Sandwich Place,Pizza Place,Café,Italian Restaurant,Coffee Shop,Sushi Restaurant,Seafood Restaurant,Fried Chicken Joint,Diner
8,"Moore Park,Summerhill East",0,"Moore Park,Summerhill East",Restaurant,Yoga Studio,Dim Sum Restaurant,Falafel Restaurant,Event Space,Ethiopian Restaurant,Electronics Store,Eastern European Restaurant,Dumpling Restaurant,Donut Shop
9,"Deer Park,Forest Hill SE,Rathnelly,South Hill,...",0,"Deer Park,Forest Hill SE,Rathnelly,South Hill,...",Coffee Shop,Pub,Pizza Place,Supermarket,Bagel Shop,Fried Chicken Joint,Sports Bar,American Restaurant,Convenience Store,Vietnamese Restaurant


#### This is Common Neighbourhoods

#### Cluster 2

In [165]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 1, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Neighbourhood,Cluster Labels,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
24,"The Annex,North Midtown,Yorkville",1,"The Annex,North Midtown,Yorkville",Coffee Shop,Sandwich Place,Café,Pizza Place,Park,Liquor Store,Burger Joint,Jewish Restaurant,Indian Restaurant,BBQ Joint


#### This is a Down Town 

#### Cluster 3

In [166]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 2, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Neighbourhood,Cluster Labels,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
28,Stn A PO Boxes 25 The Esplanade,2,Stn A PO Boxes 25 The Esplanade,Coffee Shop,Restaurant,Café,Seafood Restaurant,Pub,Cocktail Bar,Hotel,Italian Restaurant,Art Gallery,Breakfast Spot


#### This is Sea Side neighbourhood

#### Cluster 4

In [167]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 3, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Neighbourhood,Cluster Labels,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
17,Central Bay Street,3,Central Bay Street,Coffee Shop,Italian Restaurant,Sandwich Place,Bar,Ice Cream Shop,Café,Burger Joint,Bubble Tea Shop,Chinese Restaurant,Spa


#### This is a Leisure Neighbourhood

#### Cluster 5

In [168]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 4, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Neighbourhood,Cluster Labels,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
22,Roselawn,4,Roselawn,Garden,Dim Sum Restaurant,Farmers Market,Falafel Restaurant,Event Space,Ethiopian Restaurant,Electronics Store,Eastern European Restaurant,Dumpling Restaurant,Donut Shop
27,"CN Tower,Bathurst Quay,Island airport,Harbourf...",4,"CN Tower,Bathurst Quay,Island airport,Harbourf...",Airport Lounge,Airport Service,Airport Terminal,Harbor / Marina,Boat or Ferry,Airport,Airport Food Court,Airport Gate,Boutique,Sculpture Garden


#### This is Airport  Area neighbourhood