# Toronto City Venue Analysis

### This notebook will be used to explore, segment, and cluster the neighborhoods in the city of Toronto using data from the FourSquare API.

In [1]:
#importing necessary libraries

import numpy as np
import pandas as pd
import json # library to handle JSON files
import requests # library to handle requests
from pandas.io.json import json_normalize # tranform JSON file into a pandas dataframe

#plotting/visualization
import matplotlib as plt
import matplotlib.cm as cm
import matplotlib.colors as colors
import folium # map rendering library

# import k-means from clustering
from sklearn.cluster import KMeans

%matplotlib inline 

## 1) Extract & Transform Data

Wikipidea has a list of Toronto neighborhoods! This is perfect. I will need to clean up the data but it will work for this project.

In [2]:
#scrapping toronto city data from wikipedia page
scrape = pd.read_html('https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M')
#the above function returns a list of dataframes, lets see how many and explore them
len(scrape)

3

In [3]:
scrape[0].head()

Unnamed: 0,Postcode,Borough,Neighbourhood
0,M1A,Not assigned,Not assigned
1,M2A,Not assigned,Not assigned
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,Harbourfront


In [4]:
scrape[1].head()

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17
0,,Canadian postal codes,,,,,,,,,,,,,,,,
1,NL NS PE NB QC ON MB SK AB BC NU/NT YT A B C E...,NL NS PE NB QC ON MB SK AB BC NU/NT YT A B C E...,NL NS PE NB QC ON MB SK AB BC NU/NT YT A B C E...,,,,,,,,,,,,,,,
2,NL,NS,PE,NB,QC,QC,QC,ON,ON,ON,ON,ON,MB,SK,AB,BC,NU/NT,YT
3,A,B,C,E,G,H,J,K,L,M,N,P,R,S,T,V,X,Y


In [5]:
scrape[2].head()

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17
0,NL,NS,PE,NB,QC,QC,QC,ON,ON,ON,ON,ON,MB,SK,AB,BC,NU/NT,YT
1,A,B,C,E,G,H,J,K,L,M,N,P,R,S,T,V,X,Y


In [6]:
#toronto city data is on the first element of scrape list
toronto_data= scrape[0]

Data extraction was successful. Now I need to clean up the data to get rid of empty values and deal duplicates in postal codes.

In [7]:
#getting rid of Boroughs without designation
toronto_data= toronto_data[toronto_data.Borough != 'Not assigned']
toronto_data.reset_index(drop= True, inplace= True)

#assign Borough name to Neighbourhoods without designation
neigh_notassigned= toronto_data.index[toronto_data['Neighbourhood']== 'Not assigned'].to_list()
for i in neigh_notassigned:
    toronto_data.iloc[i,2]= toronto_data.iloc[i,1]

#I need unique postal code values for further analysis, so I am
#combining Neighbourhoods with equal postal codes into a single Neighbourhood row
unique_postcodes= toronto_data['Postcode'].unique().shape[0] #using later for accuracy check
toronto_data= toronto_data.groupby(['Postcode','Borough'])['Neighbourhood'].apply(', '.join).reset_index()

#checking for accuracy: the new dataframe should be the same row size as number of unique postal codes before combination
print('Is dataframe accurate: {}'.format(int(toronto_data.shape[0]) == int(unique_postcodes)))

Is dataframe accurate: True


A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  self._setitem_with_indexer(indexer, value)
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  


In [8]:
#displaying dataframe
toronto_data.head()

Unnamed: 0,Postcode,Borough,Neighbourhood
0,M1B,Scarborough,"Rouge, Malvern"
1,M1C,Scarborough,"Highland Creek, Rouge Hill, Port Union"
2,M1E,Scarborough,"Guildwood, Morningside, West Hill"
3,M1G,Scarborough,Woburn
4,M1H,Scarborough,Cedarbrae


In [9]:
print(toronto_data.shape)

(103, 3)


To get FourSquare data, I first need to acquire latitude & longitude coordinates of neighborhoods.

In [10]:
#using pgeocode library to get coordinates from postal codes

import pgeocode
nomi = pgeocode.Nominatim('ca') #specifiying country= 'ca' (Canada)
#testing library accuracy
test_postalcode= toronto_data.iloc[0,0]
nomi.query_postal_code('000 '+ test_postalcode)

postal_code                                       M1B
country code                                       CA
place_name        Scarborough (Malvern / Rouge River)
state_name                                    Ontario
state_code                                         ON
county_name                               Scarborough
county_code                                       NaN
community_name                                    NaN
community_code                                    NaN
latitude                                      43.8113
longitude                                     -79.193
accuracy                                            6
Name: 0, dtype: object

In [11]:
#since library is accurate I will now loop over all of the postal codes in toronto_data

postal_codes= toronto_data['Postcode'] #getting postcode data
postcode_lat= [] #initializing neighborhood latitude list
postcode_lng= [] #initializing neighborhood longitude list
for i in range(len(toronto_data['Postcode'])):
    geo = pgeocode.Nominatim('ca')
    location= geo.query_postal_code('000 '+ toronto_data.iloc[i,0])
    postcode_lat.append(location['latitude'])
    postcode_lng.append(location['longitude'])
    
#checking data format/accuracy

#since there are 103 postcodes, list length should be of 103
print('latitude list size: {}'.format(len(postcode_lat)))
print('longitude list size: {}'.format(len(postcode_lng)))

for i in range(103):
    if np.isnan(postcode_lat[i]) == True or np.isnan(postcode_lng[i])== True:
        print('lat = {} , long = {} at index = {}'.format(postcode_lat[i], postcode_lng[i], 
                                                        postcode_lat.index(postcode_lat[i])))
    else:
        continue

latitude list size: 103
longitude list size: 103
lat = nan , long = nan at index = 86


In [12]:
#There are nan values in latitude and longitude data, lets see what neighborhood is this:
toronto_data.iloc[86,:]

Postcode                                           M7R
Borough                                    Mississauga
Neighbourhood    Canada Post Gateway Processing Centre
Name: 86, dtype: object

In [13]:
#Looking up postcode M7R on google maps, it displays a small area only surrounding the canadian post office
#or as it is called above: Canada Post Gateway Processing Centre neighborhood.
#Since there is nothing else besides a post office in this area, we can get rid the data point without
#affecting future clustering models.

toronto_data.drop(labels= 86, inplace= True) #dropping M7R postcode row in toronto city dataframe
toronto_data.reset_index(drop= True, inplace= True) #reseting index
del postcode_lat[86] #dropping nan value
del postcode_lng[86] #dropping nan value

#lets check the shape of new dataframe and lat/lng lists to make sure they match
print('Dataframe size = {}'.format(toronto_data.shape[0]))
print('latitude list size = {}'.format(len(postcode_lat)))
print('longitude list size = {}'.format(len(postcode_lng)))

Dataframe size = 102
latitude list size = 102
longitude list size = 102


In [14]:
#Now that lat/lng coordinates are obtained, I will add them to the toronto dataframe
toronto_data['Latitude']= pd.DataFrame(postcode_lat)
toronto_data['Longitude']= pd.DataFrame(postcode_lng)

#Display final toronto city dataframe before clustering
toronto_data.head(10)

Unnamed: 0,Postcode,Borough,Neighbourhood,Latitude,Longitude
0,M1B,Scarborough,"Rouge, Malvern",43.8113,-79.193
1,M1C,Scarborough,"Highland Creek, Rouge Hill, Port Union",43.7878,-79.1564
2,M1E,Scarborough,"Guildwood, Morningside, West Hill",43.7678,-79.1866
3,M1G,Scarborough,Woburn,43.7712,-79.2144
4,M1H,Scarborough,Cedarbrae,43.7686,-79.2389
5,M1J,Scarborough,Scarborough Village,43.7464,-79.2323
6,M1K,Scarborough,"East Birchmount Park, Ionview, Kennedy Park",43.7298,-79.2639
7,M1L,Scarborough,"Clairlea, Golden Mile, Oakridge",43.7122,-79.2843
8,M1M,Scarborough,"Cliffcrest, Cliffside, Scarborough Village West",43.7247,-79.2312
9,M1N,Scarborough,"Birch Cliff, Cliffside West",43.6952,-79.2646


I will now use the Fourquare API to extract local venue data for each postal code to use it to explore and cluster neighborhoods.

In [15]:
#defining FourSquare API credentials
CLIENT_ID = 'HEWRUDEHSTA3SWHFOAIVYQ3SVTFACUZOYVEM4X04PVB30HD0'
CLIENT_SECRET = 'AZWGDHVDUIOHXKQIXHTXQXLK4IJS5NKODYY1UXVYFCLNAQ2Q' 
VERSION = '20190721' # Foursquare API version or date of data requested. Their database is updated frequently.

#defining variables needed for API get request
radius= 1500 #distance (meters) around postcode coordinates where venues will be searched
limit= 1000   #number of venues returned by foursquare
postal_codes= toronto_data['Postcode'] #postcodes to loop over
number_ofvenues= [] #storing number of venues in each neighborhood to add to toronto_data later

#New dataframe 'toronto_venues' to display venues of toronto and their data
toronto_venues= pd.DataFrame({'Venue':[] , 'Category':[] , 'Latitude':[], 'Longitude':[],'Neighbourhood':[]})

#looping over each neighborhood postcode to get venue data
for i in range(len(postal_codes)):
    lat= toronto_data.loc[i,'Latitude'] #neighborhood latitude
    long= toronto_data.loc[i,'Longitude'] #neighborhood longitude
    neigh_name= toronto_data.loc[i,'Neighbourhood'] #neighborhood name
    url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    lat, 
    long, 
    radius, 
    limit) #get request url
    
    results = requests.get(url).json() #saving API get request json output
    venues = results['response']['groups'][0]['items'] #getting list of venues from json
    nearby_venues = json_normalize(venues) #turning json list onto a dataframe for easier processing
    filtered_columns = ['venue.name', 'venue.categories',
                        'venue.location.lat', 'venue.location.lng'] 
    nearby_venues =nearby_venues.loc[:, filtered_columns] #keeping only important columns of dataframe
    
    #The venue.catefories column has dictionaries as elements. Which means I must extract the venue category name
    #from each element in this column.
    ven_list= [] #list to store venue category names
    num_venues= len(nearby_venues['venue.categories']) #number of venues in dataframe
    number_ofvenues.append(num_venues) #adding number of venues to previously created list
    
    for i in range(num_venues):
        category= nearby_venues.iloc[i,1][0] #extracts and saves venue dictionary
        category_name= category['name'] #makes call to dictionary to extrac category name
        ven_list.append(category_name) #add category name to list

    nearby_venues['venue.categories'] = ven_list  #replace old venue.categories column with new category dataframe
    #make the venue dataframe more readable by changing column names
    nearby_venues.rename(columns= {'venue.name':'Venue', 'venue.categories':'Category',
                        'venue.location.lat':'Latitude', 'venue.location.lng':'Longitude'}, inplace= True)
    #adding a Neighbourhood column with each respective venue neighborhood name to resemble 'toronto_venues' dataframe
    nearby_venues['Neighbourhood']= neigh_name
    #append each new venue dataframe to the overall toronto_venues dataframe
    toronto_venues= toronto_venues.append(nearby_venues,ignore_index= True)
    
#add number of venues of each neighborhood to toronto_data
toronto_data['Venue Count']= number_ofvenues
    

In [16]:
#lets check if number of venues is the same as venue dataframe size
toronto_venues.shape[0]== sum(number_ofvenues)

True

In [17]:
#display final venues dataframe
print('Total number of venues: {}'.format(sum(number_ofvenues)))
toronto_venues.head()

Total number of venues: 6779


Unnamed: 0,Venue,Category,Latitude,Longitude,Neighbourhood
0,African Rainforest Pavilion,Zoo Exhibit,43.817725,-79.183433,"Rouge, Malvern"
1,Canadiana exhibit,Zoo Exhibit,43.817962,-79.193374,"Rouge, Malvern"
2,penguin exhibit,Zoo Exhibit,43.819435,-79.185959,"Rouge, Malvern"
3,Toronto Zoo,Zoo,43.820582,-79.181551,"Rouge, Malvern"
4,Lion Exhibit,Zoo Exhibit,43.819228,-79.186977,"Rouge, Malvern"


## Data visualization & Insights building

#### Visiualizing Toronto neighborhoods

In [18]:
# create map of Toronto using latitude and longitude values
toron_lat= 43.59 # city latitude
toron_lgn= -79.3832 # city longitude
map_toronto = folium.Map(location=[toron_lat, toron_lgn], zoom_start=10) #create map using folium library

# add markers to map
for lat, lng, borough, neighbourhood in zip(toronto_data['Latitude'], toronto_data['Longitude'], toronto_data['Borough'], toronto_data['Neighbourhood']):
    label = '{}, {}'.format(neighbourhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#dbd50f',
        fill_opacity=0.6,
        parse_html=False).add_to(map_toronto)  
    
map_toronto

#### Exploring Neighborhoods & Venues

In [19]:
# finding unique categories
print('There are {} uniques categories'.format(len(toronto_venues['Category'].unique())))

There are 341 uniques categories


In [20]:
# displaying number of venues per neighborhood
pd.set_option('display.max_rows', 110) # helps display full dataframe
toronto_data[['Neighbourhood','Venue Count']].sort_values('Venue Count', ascending= False).reset_index(drop=True) 

Unnamed: 0,Neighbourhood,Venue Count
0,"Cabbagetown, St. James Town",100
1,Queen's Park,100
2,"The Beaches West, India Bazaar",100
3,Studio District,100
4,Davisville North,100
5,North Toronto West,100
6,Davisville,100
7,"Deer Park, Forest Hill SE, Rathnelly, South Hi...",100
8,Church and Wellesley,100
9,"Harbourfront, Regent Park",100


In [21]:
no_hundred= 102- 34 # number of neighborhoods with less than 100 venues
print('%.0f%% of neighborhoods have less than 100 venues'%(no_hundred/102*100))
print('Smaller neighborhoods seem to be more common')

67% of neighborhoods have less than 100 venues
Smaller neighborhoods seem to be more common


In [22]:
# one hot encoding data will make it easier to analyze and cluster
venues_onehot = pd.get_dummies(toronto_venues[['Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
venues_onehot['Neighbourhood'] = toronto_venues['Neighbourhood'] 

# move neighborhood column to the first column
fixed_columns = [venues_onehot.columns[-1]] + list(venues_onehot.columns[:-1])
venues_onehot = venues_onehot[fixed_columns]
venues_onehot.head()

Unnamed: 0,Neighbourhood,Accessories Store,Afghan Restaurant,Airport,American Restaurant,Amphitheater,Animal Shelter,Antique Shop,Aquarium,Arcade,...,Vietnamese Restaurant,Volleyball Court,Warehouse Store,Whisky Bar,Wine Bar,Wings Joint,Women's Store,Yoga Studio,Zoo,Zoo Exhibit
0,"Rouge, Malvern",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,1
1,"Rouge, Malvern",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,1
2,"Rouge, Malvern",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,1
3,"Rouge, Malvern",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,1,0
4,"Rouge, Malvern",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,1


In [23]:
venues_frequency = venues_onehot.groupby('Neighbourhood').mean().reset_index()
venues_frequency.head()

Unnamed: 0,Neighbourhood,Accessories Store,Afghan Restaurant,Airport,American Restaurant,Amphitheater,Animal Shelter,Antique Shop,Aquarium,Arcade,...,Vietnamese Restaurant,Volleyball Court,Warehouse Store,Whisky Bar,Wine Bar,Wings Joint,Women's Store,Yoga Studio,Zoo,Zoo Exhibit
0,"Adelaide, King, Richmond",0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.01,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,Agincourt,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.035714,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,"Agincourt North, L'Amoreaux East, Milliken, St...",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.018519,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,"Albion Gardens, Beaumond Heights, Humbergate, ...",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,"Alderwood, Long Branch",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.026316,0.0,0.0,0.0,0.0


In [24]:
# lets explore top 5 venues of each neighborhood
for neighbor in venues_frequency['Neighbourhood']:
    print(neighbor)
    top_five = venues_frequency[venues_frequency['Neighbourhood'] == neighbor].T.reset_index()
    top_five.columns = ['Venue','Frequency']
    top_five = top_five.iloc[1:]
    top_five['Frequency'] = top_five['Frequency'].astype(float)
    top_five = top_five.round({'Frequency': 2})
    print(top_five.sort_values('Frequency', ascending=False).reset_index(drop=True).head(5))
    print('\n')

Adelaide, King, Richmond
                           Venue  Frequency
0                          Hotel       0.07
1                    Coffee Shop       0.06
2                           Café       0.05
3                        Theater       0.04
4  Vegetarian / Vegan Restaurant       0.03


Agincourt
                   Venue  Frequency
0     Chinese Restaurant       0.18
1          Shopping Mall       0.05
2   Cantonese Restaurant       0.04
3  Vietnamese Restaurant       0.04
4                 Bakery       0.04


Agincourt North, L'Amoreaux East, Milliken, Steeles East
                Venue  Frequency
0  Chinese Restaurant       0.22
1         Coffee Shop       0.07
2              Bakery       0.06
3         Pizza Place       0.06
4   Korean Restaurant       0.06


Albion Gardens, Beaumond Heights, Humbergate, Jamestown, Mount Olive, Silverstone, South Steeles, Thistletown
                  Venue  Frequency
0           Pizza Place       0.18
1           Coffee Shop       0.18
2        

                   Venue  Frequency
0  Vietnamese Restaurant       0.12
1            Pizza Place       0.08
2               Pharmacy       0.08
3         Sandwich Place       0.04
4              Pet Store       0.04


Downsview Northwest
                  Venue  Frequency
0               Theater       0.08
1              Pharmacy       0.08
2  Fast Food Restaurant       0.08
3         Grocery Store       0.08
4           Pizza Place       0.08


Downsview West
            Venue  Frequency
0     Coffee Shop       0.12
1  Discount Store       0.08
2     Pizza Place       0.08
3    Intersection       0.04
4     Supermarket       0.04


Downsview, North Park, Upwood Park
                   Venue  Frequency
0            Coffee Shop       0.13
1  Vietnamese Restaurant       0.10
2     Chinese Restaurant       0.05
3                 Bakery       0.05
4            Supermarket       0.05


East Birchmount Park, Ionview, Kennedy Park
                  Venue  Frequency
0           Coffee Shop    

                Venue  Frequency
0         Coffee Shop       0.11
1  Italian Restaurant       0.05
2           Gastropub       0.04
3         Men's Store       0.03
4      Ice Cream Shop       0.03


Rosedale
           Venue  Frequency
0    Coffee Shop       0.08
1           Park       0.07
2           Bank       0.04
3            Pub       0.04
4  Grocery Store       0.03


Roselawn
              Venue  Frequency
0  Sushi Restaurant       0.08
1       Coffee Shop       0.08
2          Pharmacy       0.06
3      Skating Rink       0.06
4              Bank       0.06


Rouge, Malvern
                  Venue  Frequency
0           Zoo Exhibit       0.50
1  Fast Food Restaurant       0.09
2                   Zoo       0.06
3  Other Great Outdoors       0.06
4           Coffee Shop       0.03


Runnymede, Swansea
                Venue  Frequency
0                Café       0.08
1              Bakery       0.08
2         Coffee Shop       0.08
3                Park       0.08
4  Italian Re

Coffe Shops are definitely popular! Almost every neighborhood has them in their top five.

Because some neighborhoods have venues that are equally frequent, lets expand the search into top ten. However, this time lets create an actual dataframe to store and use this information for further analysis.

In [25]:
# creating columns
columns = ['Neighbourhood']
for i in np.arange(10):
    columns.append(i+1)

# creating dataframe
neighborhoods_top_ten = pd.DataFrame(columns=columns)
neighborhoods_top_ten['Neighbourhood'] = venues_frequency['Neighbourhood']

# function to get top 10
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_sorted = row_categories.sort_values(ascending=False)
    return row_sorted.index.values[0:num_top_venues]

# looping over each neighborhood using the function above and appending the values
for neighbor in np.arange(venues_frequency.shape[0]):
    neighborhoods_top_ten.iloc[neighbor, 1:] = return_most_common_venues(venues_frequency.iloc[neighbor, :], 10)

neighborhoods_top_ten.head()

Unnamed: 0,Neighbourhood,1,2,3,4,5,6,7,8,9,10
0,"Adelaide, King, Richmond",Hotel,Coffee Shop,Café,Theater,Pizza Place,Vegetarian / Vegan Restaurant,Burrito Place,Restaurant,Thai Restaurant,Gastropub
1,Agincourt,Chinese Restaurant,Shopping Mall,Gym / Fitness Center,Vietnamese Restaurant,Asian Restaurant,Coffee Shop,Bakery,Supermarket,Cantonese Restaurant,Caribbean Restaurant
2,"Agincourt North, L'Amoreaux East, Milliken, St...",Chinese Restaurant,Coffee Shop,Bakery,Korean Restaurant,Pizza Place,Pharmacy,Bubble Tea Shop,Noodle House,Dumpling Restaurant,Discount Store
3,"Albion Gardens, Beaumond Heights, Humbergate, ...",Coffee Shop,Pizza Place,Grocery Store,Fast Food Restaurant,Bus Line,Beer Store,Fried Chicken Joint,Pharmacy,Flea Market,Café
4,"Alderwood, Long Branch",Coffee Shop,Convenience Store,Park,Pharmacy,Pizza Place,Café,Grocery Store,Bar,Discount Store,Greek Restaurant


Overall I have noticed most neighborhoods have similar traits like coffee shops, parks and varios types of restaurants. However, I also noticed some neighborhoods with frequent hotel venues, theaters and even zoo exibits. These could be potential tourist spots.

## Data Analysis

I will be using a clustering machine learning model to try to pin point tourist spots, which I believe to be those containing large numbers of hotels and entertainment (theaters, zoos, ...etc).

#### Model Development

In [37]:
# set number of clusters
k = 4
toronto_cluster = venues_frequency.drop('Neighbourhood', 1)

# fit data to model
kmeans = KMeans(n_clusters=k, random_state=5).fit(toronto_cluster)

# create new dataframe with cluster labels
toronto_data_clustered = toronto_data
toronto_data_clustered['Cluster Labels'] = kmeans.labels_
toronto_data_clustered = toronto_data_clustered.join(neighborhoods_top_ten.set_index('Neighbourhood'), on='Neighbourhood')
toronto_data_clustered.head()


Unnamed: 0,Postcode,Borough,Neighbourhood,Latitude,Longitude,Venue Count,Cluster Labels,1,2,3,4,5,6,7,8,9,10
0,M1B,Scarborough,"Rouge, Malvern",43.8113,-79.193,34,1,Zoo Exhibit,Fast Food Restaurant,Other Great Outdoors,Zoo,Fruit & Vegetable Store,Caribbean Restaurant,Chinese Restaurant,Paper / Office Supplies Store,Café,Business Service
1,M1C,Scarborough,"Highland Creek, Rouge Hill, Port Union",43.7878,-79.1564,24,0,Coffee Shop,Breakfast Spot,Pharmacy,Playground,Mobile Phone Shop,Mexican Restaurant,Pizza Place,Food & Drink Shop,Fried Chicken Joint,Neighborhood
2,M1E,Scarborough,"Guildwood, Morningside, West Hill",43.7678,-79.1866,31,0,Pizza Place,Coffee Shop,Fast Food Restaurant,Breakfast Spot,Burger Joint,Bar,Liquor Store,Beer Store,Supermarket,Fried Chicken Joint
3,M1G,Scarborough,Woburn,43.7712,-79.2144,23,2,Pharmacy,Pizza Place,Coffee Shop,Indian Restaurant,Sandwich Place,Fast Food Restaurant,Burger Joint,Music Store,Supermarket,Bank
4,M1H,Scarborough,Cedarbrae,43.7686,-79.2389,62,0,Coffee Shop,Fast Food Restaurant,Indian Restaurant,Grocery Store,Bakery,Pharmacy,Bank,Gym,Chinese Restaurant,Diner


#### Model Visualization

In [38]:
# creating map
map_toronto_clustered = folium.Map(location=[toron_lat, toron_lgn], zoom_start=10)

# setting color scheme for the clusters
colors_array = cm.rainbow(np.linspace(0, 1, k))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# adding markers
markers_colors = []
for lat, lon, neigh, cluster in zip(toronto_data_clustered['Latitude'], toronto_data_clustered['Longitude'], toronto_data_clustered['Neighbourhood'], toronto_data_clustered['Cluster Labels']):
    label = folium.Popup(str(neigh) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster],
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.7).add_to(map_toronto_clustered)
       
map_toronto_clustered

#### Cluster 0

In [39]:
toronto_data_clustered.loc[toronto_data_clustered['Cluster Labels'] == 0, toronto_data_clustered.columns[[2] + list(range(5, toronto_data_clustered.shape[1]))]]

Unnamed: 0,Neighbourhood,Venue Count,Cluster Labels,1,2,3,4,5,6,7,8,9,10
1,"Highland Creek, Rouge Hill, Port Union",24,0,Coffee Shop,Breakfast Spot,Pharmacy,Playground,Mobile Phone Shop,Mexican Restaurant,Pizza Place,Food & Drink Shop,Fried Chicken Joint,Neighborhood
2,"Guildwood, Morningside, West Hill",31,0,Pizza Place,Coffee Shop,Fast Food Restaurant,Breakfast Spot,Burger Joint,Bar,Liquor Store,Beer Store,Supermarket,Fried Chicken Joint
4,Cedarbrae,62,0,Coffee Shop,Fast Food Restaurant,Indian Restaurant,Grocery Store,Bakery,Pharmacy,Bank,Gym,Chinese Restaurant,Diner
5,Scarborough Village,43,0,Fast Food Restaurant,Coffee Shop,Pizza Place,Sandwich Place,Pharmacy,Grocery Store,Big Box Store,Liquor Store,Theater,Bank
6,"East Birchmount Park, Ionview, Kennedy Park",38,0,Coffee Shop,Fast Food Restaurant,Chinese Restaurant,Pharmacy,Sporting Goods Shop,Sandwich Place,Discount Store,Grocery Store,Bank,Bus Line
10,"Dorset Park, Scarborough Town Centre, Wexford ...",71,0,Coffee Shop,Fast Food Restaurant,Indian Restaurant,Pet Store,Grocery Store,Pizza Place,Chinese Restaurant,Supermarket,Furniture / Home Store,Light Rail Station
25,Parkwoods,31,0,Bank,Pharmacy,Supermarket,Coffee Shop,Fast Food Restaurant,Café,Liquor Store,Beer Store,Mobile Phone Shop,Fish & Chips Shop
38,Leaside,71,0,Coffee Shop,Indian Restaurant,Bakery,Restaurant,Grocery Store,Supermarket,Park,Sandwich Place,Burger Joint,Electronics Store
41,"The Danforth West, Riverdale",89,0,Greek Restaurant,Café,Pizza Place,Park,Coffee Shop,Pub,Italian Restaurant,Yoga Studio,Trail,Burger Joint
42,"The Beaches West, India Bazaar",100,0,Coffee Shop,Park,Indian Restaurant,Pub,Café,Brewery,Beach,Bakery,BBQ Joint,Japanese Restaurant


#### Cluster 1

In [40]:
toronto_data_clustered.loc[toronto_data_clustered['Cluster Labels'] == 1, toronto_data_clustered.columns[[2] + list(range(5, toronto_data_clustered.shape[1]))]]

Unnamed: 0,Neighbourhood,Venue Count,Cluster Labels,1,2,3,4,5,6,7,8,9,10
0,"Rouge, Malvern",34,1,Zoo Exhibit,Fast Food Restaurant,Other Great Outdoors,Zoo,Fruit & Vegetable Store,Caribbean Restaurant,Chinese Restaurant,Paper / Office Supplies Store,Café,Business Service
7,"Clairlea, Golden Mile, Oakridge",44,1,Coffee Shop,Pizza Place,Fast Food Restaurant,Sandwich Place,Grocery Store,Burger Joint,Dog Run,Greek Restaurant,Pub,Bakery
8,"Cliffcrest, Cliffside, Scarborough Village West",14,1,Fast Food Restaurant,Beach,Furniture / Home Store,Bistro,Coffee Shop,Pharmacy,Park,Sandwich Place,Bank,Pizza Place
9,"Birch Cliff, Cliffside West",29,1,Park,Pizza Place,Coffee Shop,Asian Restaurant,Fast Food Restaurant,Bakery,Bank,Thai Restaurant,Bar,General Entertainment
11,"Maryvale, Wexford",75,1,Coffee Shop,Middle Eastern Restaurant,Restaurant,Pizza Place,Pharmacy,Grocery Store,Discount Store,Intersection,Pool Hall,Asian Restaurant
12,Agincourt,56,1,Chinese Restaurant,Shopping Mall,Gym / Fitness Center,Vietnamese Restaurant,Asian Restaurant,Coffee Shop,Bakery,Supermarket,Cantonese Restaurant,Caribbean Restaurant
13,"Clarks Corners, Sullivan, Tam O'Shanter",46,1,Fast Food Restaurant,Park,Korean Restaurant,Falafel Restaurant,Bank,Pharmacy,Sandwich Place,Cantonese Restaurant,Vietnamese Restaurant,Coffee Shop
14,"Agincourt North, L'Amoreaux East, Milliken, St...",54,1,Chinese Restaurant,Coffee Shop,Bakery,Korean Restaurant,Pizza Place,Pharmacy,Bubble Tea Shop,Noodle House,Dumpling Restaurant,Discount Store
15,L'Amoreaux West,51,1,Chinese Restaurant,Coffee Shop,Sandwich Place,Fast Food Restaurant,Pool,Athletics & Sports,Pizza Place,Park,Tennis Court,Bakery
16,Upper Rouge,5,1,Playground,Sculpture Garden,Movie Theater,Trail,Zoo Exhibit,Event Service,Dumpling Restaurant,Eastern European Restaurant,Egyptian Restaurant,Electronics Store


#### Cluster 2

In [41]:
toronto_data_clustered.loc[toronto_data_clustered['Cluster Labels'] == 2, toronto_data_clustered.columns[[2] + list(range(5, toronto_data_clustered.shape[1]))]]

Unnamed: 0,Neighbourhood,Venue Count,Cluster Labels,1,2,3,4,5,6,7,8,9,10
3,Woburn,23,2,Pharmacy,Pizza Place,Coffee Shop,Indian Restaurant,Sandwich Place,Fast Food Restaurant,Burger Joint,Music Store,Supermarket,Bank
17,Hillcrest Village,46,2,Coffee Shop,Chinese Restaurant,Bakery,Bank,Sandwich Place,Supermarket,Grocery Store,Sushi Restaurant,Pharmacy,Pizza Place
22,Willowdale South,100,2,Korean Restaurant,Coffee Shop,Bubble Tea Shop,Japanese Restaurant,Pizza Place,Grocery Store,Café,Ramen Restaurant,Bank,Pharmacy
23,York Mills West,42,2,Coffee Shop,Park,Bank,Sandwich Place,Japanese Restaurant,Burger Joint,Thai Restaurant,Restaurant,Dog Run,Optical Shop
24,Willowdale West,27,2,Park,Pizza Place,Coffee Shop,Pharmacy,Discount Store,Bus Line,Skating Rink,Bookstore,Shopping Mall,Eastern European Restaurant
30,"CFB Toronto, Downsview East",39,2,Athletics & Sports,Spa,Park,Gym / Fitness Center,Turkish Restaurant,Metro Station,Sports Bar,Latin American Restaurant,Beer Store,Steakhouse
33,Downsview Northwest,25,2,Pizza Place,Pharmacy,Fast Food Restaurant,Grocery Store,Coffee Shop,Theater,Hotel,Tea Room,Falafel Restaurant,Chinese Restaurant
35,"Woodbine Gardens, Parkview Hill",39,2,Pharmacy,Fast Food Restaurant,Pizza Place,Brewery,Gym / Fitness Center,Park,Intersection,Coffee Shop,Sandwich Place,Grocery Store
36,Woodbine Heights,86,2,Coffee Shop,Pizza Place,Park,Sandwich Place,Thai Restaurant,Ice Cream Shop,Café,Pharmacy,Bar,Sushi Restaurant
37,The Beaches,100,2,Coffee Shop,Pub,Breakfast Spot,Grocery Store,Japanese Restaurant,BBQ Joint,Pharmacy,Beach,Sandwich Place,Bar


#### Cluster 3

In [42]:
toronto_data_clustered.loc[toronto_data_clustered['Cluster Labels'] == 3, toronto_data_clustered.columns[[2] + list(range(5, toronto_data_clustered.shape[1]))]]

Unnamed: 0,Neighbourhood,Venue Count,Cluster Labels,1,2,3,4,5,6,7,8,9,10
92,Islington Avenue,29,3,Grocery Store,Pharmacy,Park,Garden,Bakery,Bank,Gourmet Shop,Laundry Service,Supermarket,Liquor Store


## Conclusion

I was able to identify tourist friendly neighborhoods using kmeans clustering. 
Here are my conclusions:

1) It seems that I was correct, the heavy tourist neighborhoods are in cluster 1. This cluster contains frequent hotels, bars, coffe shops to keep tourists going, and entertainment such zoos, theaters and even sculpure gardens.

2) Cluster 3 in the other hand is small but to the point. This cluster is clearly for small town neighborhoods! Nothing crazy here, just the basics for survival and some nature (park & garden) to relax. 

3) Cluster 2 is a step above cluster 3, these are residential neighborhoods where your local toronto citizens might reside. They have many banks, grocery stores, pharmacies, bakeries, restaurants, coffee shops, etc.

4) Cluster 0 is interesting because it can be described as containing semi-tourist neighborhoods. This means they have the aspects of a tourist place with few bars, hotels, and other entertainment, but also contain a few grocery stores and pharmacies. These could be perfect neighborhoods for airbnb travelers looking for a local vibe type of vacation!

### Thank you for your time!