# Capstone Course
## Week 03 Assignment
### Part 3 - Exploring neighborhoods in Toronto
### Applying clustering analysis to the data


### Relevant notes:
Using an exported CSV file from the previous notebooks: w03_toronto_data.csv

It was saved using ';' to preserve the information of multiple neighborhoods


In [1]:
# Import pre-processed data for Toronto neighbourhoods
import pandas as pd

df_lat_lng = pd.read_csv ('w03_toronto_data.csv', sep = ';', index_col=0)

df_lat_lng.head(5)

Unnamed: 0,PostalCode,Borough,Neighbourhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.753259,-79.329656
1,M4A,North York,Victoria Village,43.725882,-79.315572
2,M5A,Downtown Toronto,"Regent Park , Harbourfront",43.65426,-79.360636
3,M6A,North York,"Lawrence Manor , Lawrence Heights",43.718518,-79.464763
4,M7A,Queen's Park,Ontario Provincial Government,43.662301,-79.389494


In [2]:
# Import additional libraries to complete the workbook
import numpy as np

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import folium # map rendering library

print('Required libraries imported.')

Required libraries imported.


## Start visualizing the Toronoto neighborhoods

In [3]:
address = 'Toronto, ON'

geolocator = Nominatim(user_agent="on_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geograpical coordinate of Toronto are {}, {}.'.format(latitude, longitude))

The geograpical coordinate of Toronto are 43.653963, -79.387207.


In [4]:
# create map of Toronto using latitude and longitude values
map_toronto = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, borough, neighborhood in zip(df_lat_lng['Latitude'], df_lat_lng['Longitude'], df_lat_lng['Borough'], df_lat_lng['Neighbourhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto)  
    
map_toronto

## Foursquare Data
### NOTE: Keys for the Foursquare API are removed after obtaining the information

In [5]:
# We add aditional libraries to handle the response of the Foursquare API

import json # library to handle JSON files
import requests # library to handle requests

# And add API credentials

CLIENT_ID = 'XXX' # your Foursquare ID
CLIENT_SECRET = 'XXX' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version
LIMIT = 100

# The credentials will be removed after obtaining information


## And reuse functions to explore venues for the Neoghborhoods by their coordinates
### Initial function has a radius that defaults to 500


In [6]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

In [7]:
# Now we obtain the Toronto Venues

toronto_venues = getNearbyVenues(names = df_lat_lng['Neighbourhood'],
                                latitudes = df_lat_lng['Latitude'],
                                longitudes = df_lat_lng['Longitude']
                                )

toronto_venues.head(5)

Parkwoods
Victoria Village
Regent Park ,  Harbourfront
Lawrence Manor ,  Lawrence Heights
 Ontario Provincial Government
Islington Avenue
Malvern ,  Rouge
Don MillsNorth
Parkview Hill ,  Woodbine Gardens
Garden District, Ryerson
Glencairn
West Deane Park ,  Princess Gardens ,  Martin Grove ,  Islington ,  Cloverdale
Rouge Hill ,  Port Union ,  Highland Creek
Don MillsSouthFlemingdon Park
Woodbine Heights
St. James Town
Humewood-Cedarvale
Eringate ,  Bloordale Gardens ,  Old Burnhamthorpe ,  Markland Wood
Guildwood ,  Morningside ,  West Hill
The Beaches
Berczy Park
Caledonia-Fairbanks
Woburn
Leaside
Central Bay Street
Christie
Cedarbrae
Hillcrest Village
Bathurst Manor ,  Wilson Heights ,  Downsview North
Thorncliffe Park
Richmond ,  Adelaide ,  King
Dufferin ,  Dovercourt Village
Scarborough Village
Fairview ,  Henry Farm ,  Oriole
Northwood Park ,  York University
The Danforth  East
Harbourfront East ,  Union Station ,  Toronto Islands
Little Portugal ,  Trinity
Kennedy Park ,  Ionvi

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Parkwoods,43.753259,-79.329656,Brookbanks Park,43.751976,-79.33214,Park
1,Parkwoods,43.753259,-79.329656,Variety Store,43.751974,-79.333114,Food & Drink Shop
2,Victoria Village,43.725882,-79.315572,Victoria Village Arena,43.723481,-79.315635,Hockey Arena
3,Victoria Village,43.725882,-79.315572,Tim Hortons,43.725517,-79.313103,Coffee Shop
4,Victoria Village,43.725882,-79.315572,Portugril,43.725819,-79.312785,Portuguese Restaurant


In [8]:
# And confirm all the data we obtained
toronto_venues.shape

(2247, 7)

### Obtained 2,247 venues data

In [9]:
# And count the information
toronto_venues.groupby('Neighborhood').count()

Unnamed: 0_level_0,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Ontario Provincial Government,41,41,41,41,41,41
Agincourt,5,5,5,5,5,5
"Alderwood , Long Branch",9,9,9,9,9,9
"Bathurst Manor , Wilson Heights , Downsview North",21,21,21,21,21,21
Bayview Village,4,4,4,4,4,4
...,...,...,...,...,...,...
WillowdaleWest,7,7,7,7,7,7
Woburn,4,4,4,4,4,4
Woodbine Heights,11,11,11,11,11,11
"York Mills , Silver Hills",1,1,1,1,1,1


### Some neigberhoods concentrate a greater amount of venues

In [10]:
# Check for unique Venue Categories
print('There are {} uniques categories.'.format(len(toronto_venues['Venue Category'].unique())))

There are 274 uniques categories.


In [11]:
# And prepare to analise the foursquere data, encoding the data we obtained

# one hot encoding
toronto_onehot = pd.get_dummies(toronto_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
toronto_onehot['Neighborhood'] = toronto_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [toronto_onehot.columns[-1]] + list(toronto_onehot.columns[:-1])
toronto_onehot = toronto_onehot[fixed_columns]

toronto_onehot.head()

Unnamed: 0,Yoga Studio,Accessories Store,Afghan Restaurant,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,...,Turkish Restaurant,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Warehouse Store,Wine Bar,Wine Shop,Wings Joint,Women's Store
0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [12]:
# And the generated encoding matrix shape
toronto_onehot.shape


(2247, 274)

In [13]:
# We groupe the information by the venue catoregory ocurrence
toronto_grouped = toronto_onehot.groupby('Neighborhood').mean().reset_index()
toronto_grouped


Unnamed: 0,Neighborhood,Yoga Studio,Accessories Store,Afghan Restaurant,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,...,Turkish Restaurant,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Warehouse Store,Wine Bar,Wine Shop,Wings Joint,Women's Store
0,Ontario Provincial Government,0.02439,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.02439,0.0,0.000000,0.0,0.0,0.0,0.0,0.02439,0.0
1,Agincourt,0.00000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.00000,0.0,0.000000,0.0,0.0,0.0,0.0,0.00000,0.0
2,"Alderwood , Long Branch",0.00000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.00000,0.0,0.000000,0.0,0.0,0.0,0.0,0.00000,0.0
3,"Bathurst Manor , Wilson Heights , Downsview ...",0.00000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.00000,0.0,0.000000,0.0,0.0,0.0,0.0,0.00000,0.0
4,Bayview Village,0.00000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.00000,0.0,0.000000,0.0,0.0,0.0,0.0,0.00000,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
94,WillowdaleWest,0.00000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.00000,0.0,0.000000,0.0,0.0,0.0,0.0,0.00000,0.0
95,Woburn,0.00000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.00000,0.0,0.000000,0.0,0.0,0.0,0.0,0.00000,0.0
96,Woodbine Heights,0.00000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.00000,0.0,0.090909,0.0,0.0,0.0,0.0,0.00000,0.0
97,"York Mills , Silver Hills",0.00000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.00000,0.0,0.000000,0.0,0.0,0.0,0.0,0.00000,0.0


In [14]:
# Review the goruping
toronto_grouped.shape

(99, 274)

In [16]:
# And prepare to identify the top five categories of each neighborhood,
# using functions presented in the lab,
# so we can use the attributes for clustering pourposes

# Reuse function to order venues in descending order
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [17]:
# And reuse code to create new dataframe with top 10 venues for each neighborhood

num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = toronto_grouped['Neighborhood']

for ind in np.arange(toronto_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(toronto_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Ontario Provincial Government,Coffee Shop,Park,Gym,Discount Store,Restaurant,Portuguese Restaurant,Nightclub,Mexican Restaurant,Juice Bar,Italian Restaurant
1,Agincourt,Lounge,Latin American Restaurant,Skating Rink,Clothing Store,Breakfast Spot,Donut Shop,Diner,Discount Store,Distribution Center,Dog Run
2,"Alderwood , Long Branch",Pizza Place,Skating Rink,Coffee Shop,Pharmacy,Pool,Pub,Sandwich Place,Gym,Airport Service,Ethiopian Restaurant
3,"Bathurst Manor , Wilson Heights , Downsview ...",Coffee Shop,Bank,Pet Store,Sandwich Place,Supermarket,Middle Eastern Restaurant,Shopping Mall,Restaurant,Deli / Bodega,Diner
4,Bayview Village,Japanese Restaurant,Chinese Restaurant,Café,Bank,Diner,Discount Store,Distribution Center,Dog Run,Doner Restaurant,Women's Store


In [18]:
# We verify the shape 
neighborhoods_venues_sorted.shape

(99, 11)

In [19]:
# We verify the shape 
toronto_grouped.shape

(99, 274)

## K-Means Clustering
### Once we have the data of venues encoded and processed, with relevant top 10 categories
### We have relevant attrributes that we can use to identify relevant clusters for the Toronto Neigborhoods

In [20]:
# We reuse the same steps as the lab

# set number of clusters to 5
kclusters = 5

toronto_grouped_clustering = toronto_grouped.drop('Neighborhood', 1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(toronto_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:20] 

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0])

In [21]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

# Create a copy of our initial dataframe
toronto_merged = df_lat_lng

# merge toronto_grouped with toronto_data to add latitude/longitude for each neighborhood
toronto_merged = toronto_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighbourhood')

toronto_merged.head() # check for the kmeans cluster label

Unnamed: 0,PostalCode,Borough,Neighbourhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M3A,North York,Parkwoods,43.753259,-79.329656,2.0,Park,Food & Drink Shop,Women's Store,Doner Restaurant,Dim Sum Restaurant,Diner,Discount Store,Distribution Center,Dog Run,Donut Shop
1,M4A,North York,Victoria Village,43.725882,-79.315572,0.0,Coffee Shop,Intersection,Portuguese Restaurant,Hockey Arena,Women's Store,Dog Run,Dim Sum Restaurant,Diner,Discount Store,Distribution Center
2,M5A,Downtown Toronto,"Regent Park , Harbourfront",43.65426,-79.360636,0.0,Coffee Shop,Park,Pub,Bakery,Mexican Restaurant,Café,Theater,Performing Arts Venue,Spa,Shoe Store
3,M6A,North York,"Lawrence Manor , Lawrence Heights",43.718518,-79.464763,0.0,Clothing Store,Furniture / Home Store,Women's Store,Coffee Shop,Miscellaneous Shop,Shoe Store,Boutique,Event Space,Gift Shop,Vietnamese Restaurant
4,M7A,Queen's Park,Ontario Provincial Government,43.662301,-79.389494,0.0,Coffee Shop,Park,Gym,Discount Store,Restaurant,Portuguese Restaurant,Nightclub,Mexican Restaurant,Juice Bar,Italian Restaurant


In [22]:
# The joined results might have missing values
number_nan = toronto_merged['Cluster Labels'].isna().sum()
number_nan


4

In [23]:
# And remove records that had issues
bool_series = pd.notnull(toronto_merged['Cluster Labels'])
clean_toronto = toronto_merged[bool_series] 
clean_toronto.shape

(99, 16)

In [24]:
#Verify we eliminated records withh issues
number_nan = clean_toronto['Cluster Labels'].isna().sum()
number_nan

0

In [25]:
# And visualize the K-Means clustering results
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(clean_toronto['Latitude'], clean_toronto['Longitude'], clean_toronto['Neighbourhood'], clean_toronto['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)

    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[int(cluster)-1],
        fill=True,
        fill_color=rainbow[int(cluster)-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

# Diving in to the clusters composition

## We first dive into the Cluster 0

### Has 84 occurrences, check radius for the data

In [26]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 0, toronto_merged.columns[[2] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Neighbourhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,Victoria Village,0.0,Coffee Shop,Intersection,Portuguese Restaurant,Hockey Arena,Women's Store,Dog Run,Dim Sum Restaurant,Diner,Discount Store,Distribution Center
2,"Regent Park , Harbourfront",0.0,Coffee Shop,Park,Pub,Bakery,Mexican Restaurant,Café,Theater,Performing Arts Venue,Spa,Shoe Store
3,"Lawrence Manor , Lawrence Heights",0.0,Clothing Store,Furniture / Home Store,Women's Store,Coffee Shop,Miscellaneous Shop,Shoe Store,Boutique,Event Space,Gift Shop,Vietnamese Restaurant
4,Ontario Provincial Government,0.0,Coffee Shop,Park,Gym,Discount Store,Restaurant,Portuguese Restaurant,Nightclub,Mexican Restaurant,Juice Bar,Italian Restaurant
7,Don MillsNorth,0.0,Japanese Restaurant,Gym / Fitness Center,Café,Caribbean Restaurant,Baseball Field,Women's Store,Diner,Discount Store,Distribution Center,Dog Run
...,...,...,...,...,...,...,...,...,...,...,...,...
97,"First Canadian Place , Underground city",0.0,Coffee Shop,Café,Restaurant,Gym,American Restaurant,Japanese Restaurant,Seafood Restaurant,Steakhouse,Asian Restaurant,Hotel
99,Church and Wellesley,0.0,Coffee Shop,Japanese Restaurant,Gay Bar,Sushi Restaurant,Restaurant,Mediterranean Restaurant,Café,Hotel,Men's Store,Gastropub
100,Enclave of M4L,0.0,Light Rail Station,Butcher,Garden,Recording Studio,Fast Food Restaurant,Auto Workshop,Farmers Market,Burrito Place,Brewery,Spa
101,"Old Mill South , King's Mill Park , Sunnylea...",0.0,Baseball Field,Breakfast Spot,Women's Store,Diner,Discount Store,Distribution Center,Dog Run,Doner Restaurant,Donut Shop,Drugstore


## Then dive into the Cluster 1
### Only 1 ocurrence

In [27]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 1, toronto_merged.columns[[2] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Neighbourhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
45,"York Mills , Silver Hills",1.0,Cafeteria,Women's Store,Doner Restaurant,Dim Sum Restaurant,Diner,Discount Store,Distribution Center,Dog Run,Donut Shop,Department Store


## Then dive into the Cluster 2
### With 12 ocurrences, with very similar mix of venues

In [28]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 2, toronto_merged.columns[[2] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Neighbourhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Parkwoods,2.0,Park,Food & Drink Shop,Women's Store,Doner Restaurant,Dim Sum Restaurant,Diner,Discount Store,Distribution Center,Dog Run,Donut Shop
21,Caledonia-Fairbanks,2.0,Park,Women's Store,Market,Doner Restaurant,Dim Sum Restaurant,Diner,Discount Store,Distribution Center,Dog Run,Drugstore
35,The Danforth East,2.0,Convenience Store,Coffee Shop,Park,Women's Store,Dim Sum Restaurant,Diner,Discount Store,Distribution Center,Dog Run,Doner Restaurant
40,DownsviewEast CFB Toronto,2.0,Airport,Park,Women's Store,Donut Shop,Dim Sum Restaurant,Diner,Discount Store,Distribution Center,Dog Run,Doner Restaurant
49,"North Park , Maple Leaf Park , Upwood Park",2.0,Park,Bakery,Construction & Landscaping,Women's Store,Donut Shop,Diner,Discount Store,Distribution Center,Dog Run,Doner Restaurant
61,Lawrence Park,2.0,Bus Line,Park,Swim School,Women's Store,Dog Run,Dim Sum Restaurant,Diner,Discount Store,Distribution Center,Doner Restaurant
64,Weston,2.0,Convenience Store,Park,Donut Shop,Dim Sum Restaurant,Diner,Discount Store,Distribution Center,Dog Run,Doner Restaurant,Women's Store
66,York Mills West,2.0,Convenience Store,Park,Bank,Donut Shop,Diner,Discount Store,Distribution Center,Dog Run,Doner Restaurant,Women's Store
83,"Moore Park , Summerhill East",2.0,Tennis Court,Park,Playground,Restaurant,Discount Store,Deli / Bodega,Department Store,Dessert Shop,Dim Sum Restaurant,Diner
85,"Milliken , Agincourt North , Steeles East , ...",2.0,Playground,Park,Women's Store,Dog Run,Department Store,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Distribution Center


## Then dive into the Cluster 3
### With 1 ocurrences, why?

In [29]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 3, toronto_merged.columns[[2] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Neighbourhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
6,"Malvern , Rouge",3.0,Fast Food Restaurant,Women's Store,Doner Restaurant,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Distribution Center,Dog Run,Donut Shop


## Then dive into the Cluster 4
### With 1 ocurrences, why?

In [30]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 4, toronto_merged.columns[[2] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Neighbourhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
62,Roselawn,4.0,Garden,Women's Store,Dog Run,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Distribution Center,Doner Restaurant,Deli / Bodega


# First Summary:
## Frist iteration, with a radius of 500
Cluster 0 = 84 occurrences, no significant patterns visible to the naked eye

Cluster 1 =  1 occurrence,  no significant pattern  visible to the naked eye

#### Cluster 2 = 12 occurrence,  significant pattern, parks and other very similar venues

Cluster 3 =  1 occurrence,  no significant pattern  visible to the naked eye

Cluster 4 =  1 occurrence,  no significant pattern  visible to the naked eye

## Fisrst observations dont provide a significant patterns that provide more insight,
## further itreations with diferent parametes should be usefull: 
###    - Is the radius a signinficant factor that skews results for K-Means?
###    - Or are the neighborhoods very similar in caracteristics?

# ----------------------------------------------------------------------
## Second iteration
### We will review the same exercise adjusting the radius obtained from the venues
### And review if there is a significant diference in the caracteristics changing 
### the initial assumption of 500 meters, to 800 meters

In [34]:
# Now we obtain the Toronto Venues with a diferent radius

toronto_venues_2 = getNearbyVenues(names = df_lat_lng['Neighbourhood'],
                                latitudes = df_lat_lng['Latitude'],
                                longitudes = df_lat_lng['Longitude'],
                                radius = 800 )

toronto_venues_2.head(5)

Parkwoods
Victoria Village
Regent Park ,  Harbourfront
Lawrence Manor ,  Lawrence Heights
 Ontario Provincial Government
Islington Avenue
Malvern ,  Rouge
Don MillsNorth
Parkview Hill ,  Woodbine Gardens
Garden District, Ryerson
Glencairn
West Deane Park ,  Princess Gardens ,  Martin Grove ,  Islington ,  Cloverdale
Rouge Hill ,  Port Union ,  Highland Creek
Don MillsSouthFlemingdon Park
Woodbine Heights
St. James Town
Humewood-Cedarvale
Eringate ,  Bloordale Gardens ,  Old Burnhamthorpe ,  Markland Wood
Guildwood ,  Morningside ,  West Hill
The Beaches
Berczy Park
Caledonia-Fairbanks
Woburn
Leaside
Central Bay Street
Christie
Cedarbrae
Hillcrest Village
Bathurst Manor ,  Wilson Heights ,  Downsview North
Thorncliffe Park
Richmond ,  Adelaide ,  King
Dufferin ,  Dovercourt Village
Scarborough Village
Fairview ,  Henry Farm ,  Oriole
Northwood Park ,  York University
The Danforth  East
Harbourfront East ,  Union Station ,  Toronto Islands
Little Portugal ,  Trinity
Kennedy Park ,  Ionvi

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Parkwoods,43.753259,-79.329656,Brookbanks Park,43.751976,-79.33214,Park
1,Parkwoods,43.753259,-79.329656,Variety Store,43.751974,-79.333114,Food & Drink Shop
2,Parkwoods,43.753259,-79.329656,DVP at York Mills,43.758899,-79.334099,Road
3,Parkwoods,43.753259,-79.329656,TTC Stop #09083,43.759655,-79.332223,Bus Stop
4,Parkwoods,43.753259,-79.329656,TTC Stop 9083,43.759251,-79.334,Bus Stop


In [35]:
# And confirm the amount of venues recieved
toronto_venues_2.shape

(4022, 7)

In [36]:
# And count the information
toronto_venues_2.groupby('Neighborhood').count()

Unnamed: 0_level_0,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Ontario Provincial Government,100,100,100,100,100,100
Agincourt,20,20,20,20,20,20
"Alderwood , Long Branch",15,15,15,15,15,15
"Bathurst Manor , Wilson Heights , Downsview North",26,26,26,26,26,26
Bayview Village,10,10,10,10,10,10
...,...,...,...,...,...,...
WillowdaleWest,8,8,8,8,8,8
Woburn,7,7,7,7,7,7
Woodbine Heights,14,14,14,14,14,14
"York Mills , Silver Hills",4,4,4,4,4,4


### Form some neighborhoods we reached the 100 venue limit parameter

In [37]:
# Check for unique Venue Categories
print('There are {} uniques categories.'.format(len(toronto_venues_2['Venue Category'].unique())))

There are 333 uniques categories.


In [38]:
# We enconde the new data we obtained, we now have at least 333 unique categories

# one hot encoding
toronto_onehot_2 = pd.get_dummies(toronto_venues_2[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
toronto_onehot_2['Neighborhood'] = toronto_venues_2['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [toronto_onehot_2.columns[-1]] + list(toronto_onehot_2.columns[:-1])
toronto_onehot_2 = toronto_onehot_2[fixed_columns]

toronto_onehot_2.head()

Unnamed: 0,Yoga Studio,Accessories Store,Afghan Restaurant,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,...,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Warehouse Store,Whisky Bar,Wine Bar,Wine Shop,Wings Joint,Women's Store
0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [39]:
# We review the encoding matrix shape - second excercise
toronto_onehot_2.shape

(4022, 333)

In [40]:
# We groupe the information by the venue catoregory ocurrence - second excercise
toronto_grouped_2 = toronto_onehot_2.groupby('Neighborhood').mean().reset_index()
toronto_grouped_2

Unnamed: 0,Neighborhood,Yoga Studio,Accessories Store,Afghan Restaurant,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,...,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Warehouse Store,Whisky Bar,Wine Bar,Wine Shop,Wings Joint,Women's Store
0,Ontario Provincial Government,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,...,0.01,0.0,0.010000,0.0,0.0,0.0,0.01,0.0,0.01,0.0
1,Agincourt,0.00,0.0,0.00,0.0,0.0,0.0,0.0,0.0,0.0,...,0.00,0.0,0.000000,0.0,0.0,0.0,0.00,0.0,0.00,0.0
2,"Alderwood , Long Branch",0.00,0.0,0.00,0.0,0.0,0.0,0.0,0.0,0.0,...,0.00,0.0,0.000000,0.0,0.0,0.0,0.00,0.0,0.00,0.0
3,"Bathurst Manor , Wilson Heights , Downsview ...",0.00,0.0,0.00,0.0,0.0,0.0,0.0,0.0,0.0,...,0.00,0.0,0.000000,0.0,0.0,0.0,0.00,0.0,0.00,0.0
4,Bayview Village,0.00,0.0,0.00,0.0,0.0,0.0,0.0,0.0,0.0,...,0.00,0.0,0.000000,0.0,0.0,0.0,0.00,0.0,0.00,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
97,WillowdaleWest,0.00,0.0,0.00,0.0,0.0,0.0,0.0,0.0,0.0,...,0.00,0.0,0.000000,0.0,0.0,0.0,0.00,0.0,0.00,0.0
98,Woburn,0.00,0.0,0.00,0.0,0.0,0.0,0.0,0.0,0.0,...,0.00,0.0,0.000000,0.0,0.0,0.0,0.00,0.0,0.00,0.0
99,Woodbine Heights,0.00,0.0,0.00,0.0,0.0,0.0,0.0,0.0,0.0,...,0.00,0.0,0.071429,0.0,0.0,0.0,0.00,0.0,0.00,0.0
100,"York Mills , Silver Hills",0.00,0.0,0.00,0.0,0.0,0.0,0.0,0.0,0.0,...,0.00,0.0,0.000000,0.0,0.0,0.0,0.00,0.0,0.00,0.0


In [41]:
# We review the shape - second excercise
toronto_grouped_2.shape

(102, 333)

In [42]:
# And create new dataframe with top 10 venues - second excercise

num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted_2 = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted_2['Neighborhood'] = toronto_grouped_2['Neighborhood']

for ind in np.arange(toronto_grouped_2.shape[0]):
    neighborhoods_venues_sorted_2.iloc[ind, 1:] = return_most_common_venues(toronto_grouped_2.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted_2.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Ontario Provincial Government,Coffee Shop,Gastropub,Italian Restaurant,Japanese Restaurant,Park,Chinese Restaurant,Thai Restaurant,Ice Cream Shop,Office,Hotel
1,Agincourt,Chinese Restaurant,Clothing Store,Supermarket,Breakfast Spot,Skating Rink,Shanghai Restaurant,Seafood Restaurant,Sandwich Place,Pool Hall,Badminton Court
2,"Alderwood , Long Branch",Pizza Place,Convenience Store,Coffee Shop,Gas Station,Pub,Pharmacy,Park,Sandwich Place,Discount Store,Skating Rink
3,"Bathurst Manor , Wilson Heights , Downsview ...",Coffee Shop,Pizza Place,Park,Bank,Restaurant,Bridal Shop,Middle Eastern Restaurant,Shopping Mall,Supermarket,Diner
4,Bayview Village,Japanese Restaurant,Bank,Skating Rink,Café,Shopping Mall,Park,Chinese Restaurant,Grocery Store,Eastern European Restaurant,Empanada Restaurant


In [43]:
# Review the shape
neighborhoods_venues_sorted_2.shape

(102, 11)

In [44]:
# We reuse the same steps as the lab and apply K-Means - Second excercise

# set number of clusters to 5
kclusters = 5

toronto_grouped_clustering_2 = toronto_grouped_2.drop('Neighborhood', 1)

# run k-means clustering
kmeans_2 = KMeans(n_clusters=kclusters, random_state=0).fit(toronto_grouped_clustering_2)

# check cluster labels generated for each row in the dataframe
kmeans_2.labels_[0:10] 


array([2, 2, 4, 4, 0, 2, 2, 2, 2, 2])

In [45]:
# And complement the datafram information

# add clustering labels
neighborhoods_venues_sorted_2.insert(0, 'Cluster Labels', kmeans_2.labels_)

# Create a copy of our initial dataframe
toronto_merged_2 = df_lat_lng

# merge toronto_grouped with toronto_data to add latitude/longitude for each neighborhood
toronto_merged_2 = toronto_merged_2.join(neighborhoods_venues_sorted_2.set_index('Neighborhood'), on='Neighbourhood')

toronto_merged_2.head() # check for the kmeans cluster label

Unnamed: 0,PostalCode,Borough,Neighbourhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M3A,North York,Parkwoods,43.753259,-79.329656,0.0,Bus Stop,Park,Food & Drink Shop,Road,Design Studio,Dim Sum Restaurant,Diner,Discount Store,Distribution Center,Dive Bar
1,M4A,North York,Victoria Village,43.725882,-79.315572,2.0,Intersection,Coffee Shop,Park,Hockey Arena,Café,Portuguese Restaurant,Ethiopian Restaurant,Dog Run,Design Studio,Dessert Shop
2,M5A,Downtown Toronto,"Regent Park , Harbourfront",43.65426,-79.360636,2.0,Coffee Shop,Park,Pub,Theater,Italian Restaurant,Café,Restaurant,Bakery,Thai Restaurant,Cosmetics Shop
3,M6A,North York,"Lawrence Manor , Lawrence Heights",43.718518,-79.464763,2.0,Furniture / Home Store,Clothing Store,Vietnamese Restaurant,Fried Chicken Joint,Seafood Restaurant,Restaurant,Fast Food Restaurant,Coffee Shop,Dessert Shop,Paper / Office Supplies Store
4,M7A,Queen's Park,Ontario Provincial Government,43.662301,-79.389494,2.0,Coffee Shop,Gastropub,Italian Restaurant,Japanese Restaurant,Park,Chinese Restaurant,Thai Restaurant,Ice Cream Shop,Office,Hotel


In [47]:
# The joined results might have missing values
number_nan = toronto_merged_2['Cluster Labels'].isna().sum()
number_nan

1

In [48]:
# And remove records that had issues
bool_series_2 = pd.notnull(toronto_merged_2['Cluster Labels'])
clean_toronto_2 = toronto_merged_2[bool_series] 
clean_toronto_2.shape

(99, 16)

In [49]:
#Verify we eliminated records withh issues
number_nan = clean_toronto_2['Cluster Labels'].isna().sum()
number_nan

0

In [53]:
# And visualize the K-Means clustering results - Second excercise
# create map
map_clusters_2 = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(clean_toronto_2['Latitude'], clean_toronto_2['Longitude'], clean_toronto_2['Neighbourhood'], clean_toronto_2['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)

    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[int(cluster)-1],
        fill=True,
        fill_color=rainbow[int(cluster)-1],
        fill_opacity=0.7).add_to(map_clusters_2)
       
map_clusters_2

## Clusters now have a diferent composition
### Detailed review should offer more insight

## Cluster 0 - 11 Ocurrences - Relevant observation = Parks!

In [54]:
toronto_merged_2.loc[toronto_merged_2['Cluster Labels'] == 0, toronto_merged_2.columns[[2] + list(range(5, toronto_merged_2.shape[1]))]]

Unnamed: 0,Neighbourhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Parkwoods,0.0,Bus Stop,Park,Food & Drink Shop,Road,Design Studio,Dim Sum Restaurant,Diner,Discount Store,Distribution Center,Dive Bar
5,Islington Avenue,0.0,Pharmacy,Playground,Café,Grocery Store,Bank,Park,Shopping Mall,Skating Rink,Women's Store,Dive Bar
21,Caledonia-Fairbanks,0.0,Park,Mexican Restaurant,Women's Store,Japanese Restaurant,Pharmacy,Cosmetics Shop,Café,Market,Beer Store,Bank
22,Woburn,0.0,Park,Coffee Shop,Convenience Store,Construction & Landscaping,Business Service,Dim Sum Restaurant,Diner,Discount Store,Distribution Center,Dive Bar
39,Bayview Village,0.0,Japanese Restaurant,Bank,Skating Rink,Café,Shopping Mall,Park,Chinese Restaurant,Grocery Store,Eastern European Restaurant,Empanada Restaurant
45,"York Mills , Silver Hills",0.0,Cafeteria,Business Service,Park,Pool,Dog Run,Fabric Shop,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store
46,DownsviewWest,0.0,Park,Vietnamese Restaurant,Spa,Moving Target,Bank,Shopping Mall,Coffee Shop,Grocery Store,Gym / Fitness Center,Pizza Place
66,York Mills West,0.0,Park,Restaurant,Convenience Store,Gym,Bank,Golf Course,Dim Sum Restaurant,Diner,Discount Store,Distribution Center
83,"Moore Park , Summerhill East",0.0,Park,Grocery Store,Playground,Tennis Court,Thai Restaurant,Café,Candy Store,Sandwich Place,Japanese Restaurant,Gym
91,Rosedale,0.0,Park,Trail,Playground,Grocery Store,Candy Store,Bank,Doner Restaurant,Dim Sum Restaurant,Diner,Discount Store


## Cluster 1 - 1 Ocurrence - Relevant observation = Rental car gives awar the airport!

In [56]:
toronto_merged_2.loc[toronto_merged_2['Cluster Labels'] == 1, toronto_merged_2.columns[[2] + list(range(5, toronto_merged_2.shape[1]))]]

Unnamed: 0,Neighbourhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
94,"Clairville , Humberwood , Woodbine Downs , ...",1.0,Rental Car Location,Women's Store,Donut Shop,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Distribution Center,Dive Bar,Dog Run


## Cluster 2 - 65 Ocurrences - Relevant observation = Similar venues, Coffee Shops!

In [58]:
toronto_merged_2.loc[toronto_merged_2['Cluster Labels'] == 2, toronto_merged_2.columns[[2] + list(range(5, toronto_merged_2.shape[1]))]]

Unnamed: 0,Neighbourhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,Victoria Village,2.0,Intersection,Coffee Shop,Park,Hockey Arena,Café,Portuguese Restaurant,Ethiopian Restaurant,Dog Run,Design Studio,Dessert Shop
2,"Regent Park , Harbourfront",2.0,Coffee Shop,Park,Pub,Theater,Italian Restaurant,Café,Restaurant,Bakery,Thai Restaurant,Cosmetics Shop
3,"Lawrence Manor , Lawrence Heights",2.0,Furniture / Home Store,Clothing Store,Vietnamese Restaurant,Fried Chicken Joint,Seafood Restaurant,Restaurant,Fast Food Restaurant,Coffee Shop,Dessert Shop,Paper / Office Supplies Store
4,Ontario Provincial Government,2.0,Coffee Shop,Gastropub,Italian Restaurant,Japanese Restaurant,Park,Chinese Restaurant,Thai Restaurant,Ice Cream Shop,Office,Hotel
6,"Malvern , Rouge",2.0,Fast Food Restaurant,Trail,Chinese Restaurant,Paper / Office Supplies Store,Bus Station,Coffee Shop,Hobby Shop,Filipino Restaurant,Diner,Discount Store
...,...,...,...,...,...,...,...,...,...,...,...,...
97,"First Canadian Place , Underground city",2.0,Coffee Shop,Hotel,Café,Restaurant,Bar,Seafood Restaurant,Japanese Restaurant,Gastropub,Beer Bar,Theater
98,"The Kingsway , Montgomery Road , Old Mill North",2.0,Italian Restaurant,Breakfast Spot,Sushi Restaurant,Park,Bank,Restaurant,Liquor Store,Burger Joint,Bar,Bakery
99,Church and Wellesley,2.0,Coffee Shop,Japanese Restaurant,Restaurant,Park,Gay Bar,Gastropub,Ramen Restaurant,Pizza Place,Men's Store,Mediterranean Restaurant
100,Enclave of M4L,2.0,Fast Food Restaurant,Pizza Place,Light Rail Station,Italian Restaurant,Restaurant,Harbor / Marina,Bar,Bakery,Coffee Shop,Grocery Store


## Cluster 3 - 1 Ocurrence - Relevant observation = Venue information is inconclusive

In [59]:
toronto_merged_2.loc[toronto_merged_2['Cluster Labels'] == 3, toronto_merged_2.columns[[2] + list(range(5, toronto_merged_2.shape[1]))]]

Unnamed: 0,Neighbourhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
62,Roselawn,3.0,Garden,Playground,Pet Store,Women's Store,Donut Shop,Dim Sum Restaurant,Diner,Discount Store,Distribution Center,Dive Bar


## Cluster 4 - 24 Ocurrences - Relevant observation = Food venues Pizza!

In [60]:
toronto_merged_2.loc[toronto_merged_2['Cluster Labels'] == 4, toronto_merged_2.columns[[2] + list(range(5, toronto_merged_2.shape[1]))]]

Unnamed: 0,Neighbourhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
8,"Parkview Hill , Woodbine Gardens",4.0,Pizza Place,Brewery,Fast Food Restaurant,Gastropub,Bus Line,Bank,Bakery,Café,Intersection,Athletics & Sports
10,Glencairn,4.0,Grocery Store,Pizza Place,Gas Station,Fast Food Restaurant,Playground,Latin American Restaurant,Bus Line,Japanese Restaurant,Tennis Court,Ice Cream Shop
11,"West Deane Park , Princess Gardens , Martin ...",4.0,Pizza Place,Hotel,Convenience Store,Coffee Shop,Gym,Mexican Restaurant,Theater,Bank,Restaurant,Women's Store
14,Woodbine Heights,4.0,Pizza Place,Cosmetics Shop,Beer Store,Skating Rink,Diner,Bus Line,Bus Stop,Sandwich Place,Asian Restaurant,Curling Ice
18,"Guildwood , Morningside , West Hill",4.0,Pizza Place,Coffee Shop,Fast Food Restaurant,Supermarket,Grocery Store,Greek Restaurant,Bank,Fried Chicken Joint,Sports Bar,Laundromat
27,Hillcrest Village,4.0,Pharmacy,Park,Pool,Recreation Center,Bakery,Bank,Korean Restaurant,Ice Cream Shop,Shopping Mall,Restaurant
28,"Bathurst Manor , Wilson Heights , Downsview ...",4.0,Coffee Shop,Pizza Place,Park,Bank,Restaurant,Bridal Shop,Middle Eastern Restaurant,Shopping Mall,Supermarket,Diner
32,Scarborough Village,4.0,Ice Cream Shop,Sandwich Place,Coffee Shop,Fast Food Restaurant,Pizza Place,Restaurant,Convenience Store,Dog Run,Dessert Shop,Dim Sum Restaurant
34,"Northwood Park , York University",4.0,Pizza Place,Furniture / Home Store,Massage Studio,Fast Food Restaurant,Sandwich Place,Japanese Restaurant,Bank,Bar,Caribbean Restaurant,Chinese Restaurant
50,Humber Summit,4.0,Arts & Crafts Store,Pizza Place,Empanada Restaurant,Bakery,Women's Store,Donut Shop,Diner,Discount Store,Distribution Center,Dive Bar


## Final Summary

## Cluster data can provide a relevant story:
- Cluster 0 would identify Neghborhoos that are near popular parks.
- Cluster 1 does not share much information, but the rental car is a clue for the Airport, and how it afects those Neighborhoods.
- Cluster 2 is the most ocurring, being a good profile of the type of venues available  to a significant part of Niegborhoods of Toronto (Coffe Shops).
- Cluster 3 does not share much information, not much insight from this cluster with one ocurrence.
- Cluster 4 has the second most ocurring count, with pizza being very popular.

## Final notes:
- The cuantitiy of data points (Venues) and geographic relevance of the data (radius), have significant impact in the Clustering algorithms.
- The initial assumptions rendered results that only made us ask more questions, instead of providing insight.
- The second excersise (with adjusted radius) describes a relevant profile of Toronto, and gives way to improving the type of questions and data we should gather in order to better explore the observations.
