# Introduction
This notebook characterizes the neighborhoods of Toronto by clustering them according to the restaurants nearby. It is aimed at tourists deciding where to lodge and at businesses deciding where to open new locations.

# Data
We use neighborhood and postal code data, together with geographic coordinates that link each postal code to a location.
We then use the FourSquare API to find venues around each location and filter them down to restaurants.

### Neighborhoods and Postal Codes
Below, the necessary packages were imported and the boroughs, postal codes, and neighborhoods of Toronto were read from the Wikipedia page.

In [1]:
import numpy as np
import pandas as pd

In [2]:
df = pd.read_html('https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M') [0]
df.head()

Unnamed: 0,Postal Code,Borough,Neighborhood
0,M1A,Not assigned,Not assigned
1,M2A,Not assigned,Not assigned
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,"Regent Park, Harbourfront"


Then, rows with Borough as "Not assigned" were dropped.

In [3]:
# keep only rows with an assigned borough, then renumber the index from 0
df = df[df['Borough'] != 'Not assigned'].reset_index(drop=True)

In [4]:
print(df.shape)
df

(103, 3)


Unnamed: 0,Postal Code,Borough,Neighborhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,"Regent Park, Harbourfront"
3,M6A,North York,"Lawrence Manor, Lawrence Heights"
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government"
5,M9A,Etobicoke,"Islington Avenue, Humber Valley Village"
6,M1B,Scarborough,"Malvern, Rouge"
7,M3B,North York,Don Mills
8,M4B,East York,"Parkview Hill, Woodbine Gardens"
9,M5B,Downtown Toronto,"Garden District, Ryerson"


This gives us a dataframe containing every postal code with an assigned borough, along with its neighborhoods.

### Neighborhoods and Locations

Below, the coordinates for each of the above boroughs/neighborhoods were imported and merged with the previous dataframe to create the one below.

In [5]:
coords = pd.read_csv('Geospatial_Coordinates.csv')

In [6]:
dfc = df.merge(coords,how = 'outer')
dfc

Unnamed: 0,Postal Code,Borough,Neighborhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.753259,-79.329656
1,M4A,North York,Victoria Village,43.725882,-79.315572
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.654260,-79.360636
3,M6A,North York,"Lawrence Manor, Lawrence Heights",43.718518,-79.464763
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494
5,M9A,Etobicoke,"Islington Avenue, Humber Valley Village",43.667856,-79.532242
6,M1B,Scarborough,"Malvern, Rouge",43.806686,-79.194353
7,M3B,North York,Don Mills,43.745906,-79.352188
8,M4B,East York,"Parkview Hill, Woodbine Gardens",43.706397,-79.309937
9,M5B,Downtown Toronto,"Garden District, Ryerson",43.657162,-79.378937
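
The merge step can be checked on a small scale. The following is a minimal sketch, not the notebook's own code: the toy frames stand in for `df` and `coords` (values copied from the tables above), and the `Postal Code` join key, `how="left"`, `validate`, and `indicator` arguments are assumptions about how one might guard the join against duplicated or unmatched postal codes.

```python
import pandas as pd

# Toy stand-ins for df and coords; values copied from the tables above.
df_demo = pd.DataFrame({
    "Postal Code": ["M3A", "M4A", "M5A"],
    "Borough": ["North York", "North York", "Downtown Toronto"],
    "Neighborhood": ["Parkwoods", "Victoria Village", "Regent Park, Harbourfront"],
})
coords_demo = pd.DataFrame({
    "Postal Code": ["M3A", "M4A", "M5A", "M1B"],
    "Latitude": [43.753259, 43.725882, 43.654260, 43.806686],
    "Longitude": [-79.329656, -79.315572, -79.360636, -79.194353],
})

# A left join keeps exactly the rows of df_demo; validate raises an error
# if a postal code appears more than once on either side.
merged = df_demo.merge(coords_demo, on="Postal Code", how="left",
                       validate="one_to_one", indicator=True)

# Rows tagged 'left_only' would be postal codes that found no coordinates.
missing = merged[merged["_merge"] == "left_only"]
print(merged.shape, len(missing))
```

With an outer join, as used in the cell above, a postal code present only in the coordinates file (such as `M1B` in this toy example) would appear as an extra row with empty borough and neighborhood values.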


More packages were imported and an initial visualization of the locations was created using folium.

In [7]:
import folium
import matplotlib.cm as cm
import matplotlib.colors as colors
from sklearn.cluster import KMeans
import json
import requests

In [8]:
# create map of area using the average latitude and longitude values
latitude = dfc["Latitude"].mean()
longitude = dfc["Longitude"].mean()
map_area = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, borough, neighborhood in zip(dfc['Latitude'], dfc['Longitude'], dfc['Borough'], dfc['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_area)  
    
map_area

We can see all the locations from this map.

### FourSquare Venue Data

A connection to the FourSquare API is then established to retrieve venues within a 1000 m radius of each of these locations.

In [12]:
#Credentials redacted for privacy purposes
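
Since the credentials cell is redacted, here is one possible way to supply `CLIENT_ID`, `CLIENT_SECRET`, and `VERSION` without writing them into the notebook. This is only a sketch: the environment variable names are hypothetical, and the fallback version date is an assumption rather than the value the notebook used.

```python
import os

# Hypothetical environment variable names; the empty defaults are placeholders.
CLIENT_ID = os.environ.get("FOURSQUARE_CLIENT_ID", "")
CLIENT_SECRET = os.environ.get("FOURSQUARE_CLIENT_SECRET", "")
# The v parameter is a date in YYYYMMDD form; this particular date is an assumption.
VERSION = os.environ.get("FOURSQUARE_VERSION", "20180605")

if not CLIENT_ID or not CLIENT_SECRET:
    print("FourSquare credentials are not set; API requests will fail.")
```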

In [13]:
def getNearbyVenues(names, latitudes, longitudes, radius=500, LIMIT=100):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

In [14]:
toronto_venues = getNearbyVenues(names=dfc['Neighborhood'],
                                   latitudes=dfc['Latitude'],
                                   longitudes=dfc['Longitude'],
                                   radius = 1000)
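
The list comprehension in `getNearbyVenues` reads `categories[0]` for every venue, which would raise an `IndexError` for a venue with an empty category list. The sketch below shows one way to guard against that. The helper name `parse_items` is hypothetical, and the mock payload is invented to mirror only the fields the function above accesses.

```python
def parse_items(name, lat, lng, items):
    """Flatten explore-style items into rows, skipping venues without a category."""
    rows = []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories") or []
        if not categories:
            continue  # no category to record, so the venue is skipped
        rows.append((name, lat, lng,
                     venue["name"],
                     venue["location"]["lat"],
                     venue["location"]["lng"],
                     categories[0]["name"]))
    return rows

# Mock payload shaped like response -> groups[0] -> items in the function above.
mock_items = [
    {"venue": {"name": "A&W",
               "location": {"lat": 43.760643, "lng": -79.326865},
               "categories": [{"name": "Fast Food Restaurant"}]}},
    {"venue": {"name": "Uncategorized Place",
               "location": {"lat": 43.76, "lng": -79.33},
               "categories": []}},
]

rows = parse_items("Parkwoods", 43.753259, -79.329656, mock_items)
print(rows)
```

Only the first mock venue survives, because the second has no category to report.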

The venue table was filtered for restaurants. The word "Restaurant" was stripped from each category to leave the restaurant type, and venues whose category was just "Restaurant", with no specific cuisine, were dropped.

In [15]:
print(toronto_venues.shape)
# keep venues whose category mentions "Restaurant"
toronto_rest = toronto_venues[toronto_venues["Venue Category"].str.contains("Restaurant")].reset_index(drop=True)
# strip the word "Restaurant" so that only the cuisine remains
toronto_rest["Venue Category"] = toronto_rest["Venue Category"].str.replace("Restaurant", "", regex=False).str.strip()
toronto_rest.rename(columns = {"Venue Category":"Restaurant Type"},inplace = True)
# drop venues that had no specific cuisine
toronto_rest = toronto_rest[toronto_rest["Restaurant Type"] != ""].reset_index(drop=True)
print(toronto_rest.shape)
toronto_rest.head()

(4902, 7)
(1040, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Restaurant Type
0,Parkwoods,43.753259,-79.329656,Allwyn's Bakery,43.75984,-79.324719,Caribbean
1,Parkwoods,43.753259,-79.329656,A&W,43.760643,-79.326865,Fast Food
2,Parkwoods,43.753259,-79.329656,Spicy Chicken House,43.760639,-79.325671,Chinese
3,Victoria Village,43.725882,-79.315572,Portugril,43.725819,-79.312785,Portuguese
4,"Regent Park, Harbourfront",43.65426,-79.360636,Souk Tabule,43.653756,-79.35439,Mediterranean


### K-means and one-hot encoding
Below, dummy variables are created to convert the categorical restaurant type into many quantitative binary variables. For example, Allwyn's Bakery above becomes a 1 in the Caribbean column and a 0 in every other column.

In [16]:
toronto_onehot = pd.get_dummies(toronto_rest[['Restaurant Type']], prefix="", prefix_sep="")

toronto_onehot.insert(0,'Neighborhood',toronto_rest['Neighborhood'])

toronto_onehot.insert(1,'Latitude',toronto_rest['Neighborhood Latitude'])

toronto_onehot.insert(2,'Longitude',toronto_rest['Neighborhood Longitude'])

print(toronto_onehot.shape)
toronto_onehot.head()

(1040, 62)


Unnamed: 0,Neighborhood,Latitude,Longitude,Afghan,African,American,Asian,Belgian,Brazilian,Cajun / Creole,...,Sushi,Syrian,Taiwanese,Tapas,Thai,Theme,Tibetan,Turkish,Vegetarian / Vegan,Vietnamese
0,Parkwoods,43.753259,-79.329656,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,Parkwoods,43.753259,-79.329656,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,Parkwoods,43.753259,-79.329656,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,Victoria Village,43.725882,-79.315572,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,"Regent Park, Harbourfront",43.65426,-79.360636,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [17]:
toronto_grouped = toronto_onehot.groupby('Neighborhood').mean().reset_index()
toronto_grouped.shape

(90, 62)
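
To see what the grouped table contains, consider a toy version of the two steps above. This sketch uses the three Parkwoods rows from the restaurant table; the two Victoria Village rows are invented so that one group has a repeated type. Averaging 0/1 columns within a neighborhood yields the share of each restaurant type there.

```python
import pandas as pd

# Toy data: Parkwoods rows come from the table above, the rest is invented.
toy = pd.DataFrame({
    "Neighborhood": ["Parkwoods", "Parkwoods", "Parkwoods",
                     "Victoria Village", "Victoria Village"],
    "Restaurant Type": ["Caribbean", "Fast Food", "Chinese",
                        "Portuguese", "Portuguese"],
})

# One binary column per restaurant type, cast to float for averaging.
onehot = pd.get_dummies(toy[["Restaurant Type"]], prefix="", prefix_sep="").astype(float)
onehot.insert(0, "Neighborhood", toy["Neighborhood"])

# The mean of a 0/1 column within a group is the share of that type.
grouped = onehot.groupby("Neighborhood").mean().reset_index()
print(grouped)
```

Each row of the result sums to 1 across the restaurant type columns. In the notebook's version the `Latitude` and `Longitude` columns are averaged as well, which leaves them unchanged because they are constant within a neighborhood.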

Then, a function is defined that runs k-means clustering on the grouped data and draws the resulting cluster map; it is later called with 4 clusters.

In [18]:
def kmeans_plot(kclusters):

    # features for clustering: every column except the neighborhood name
    toronto_grouped_clustering = toronto_grouped.drop(columns='Neighborhood')

    kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(toronto_grouped_clustering)

    toronto_grouped.insert(3, 'Cluster Labels', kmeans.labels_)
    # create map
    map_clusters = folium.Map(location=[latitude, longitude], zoom_start=10)

    # set color scheme for the clusters
    x = np.arange(kclusters)
    ys = [i + x + (i*x)**2 for i in range(kclusters)]
    colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
    rainbow = [colors.rgb2hex(i) for i in colors_array]

    # add markers to the map
    markers_colors = []
    for lat, lon, poi, cluster in zip(toronto_grouped['Latitude'], toronto_grouped['Longitude'], toronto_grouped['Neighborhood'], toronto_grouped['Cluster Labels']):
        label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
        folium.CircleMarker(
            [lat, lon],
            radius=5,
            popup=label,
            color=rainbow[cluster-1],
            fill=True,
            fill_color=rainbow[cluster-1],
            fill_opacity=0.7).add_to(map_clusters)
       
    return map_clusters

# Results/Discussion 
### Cluster Plotting
Below, the k-means is run and the clustered map is plotted.

In [19]:
knum = 4
kmeans_plot(knum)

We can see the clusters visually on the map here.
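
The notebook fixes the number of clusters at 4. One common way to sanity-check a cluster count, offered here as a sketch and not as something the notebook did, is to compare silhouette scores across several values of k. The data below is synthetic (four well-separated blobs with invented centers) and merely stands in for the real feature matrix.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic stand-in for the feature matrix: four well-separated blobs.
centers = [[0, 0], [10, 0], [0, 10], [10, 10]]
X, _ = make_blobs(n_samples=90, centers=centers, cluster_std=0.5, random_state=0)

# Fit k-means for each candidate k and record the mean silhouette score.
scores = {}
for k in range(2, 8):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels)

# A higher silhouette score indicates tighter, better separated clusters.
best_k = max(scores, key=scores.get)
print(scores, best_k)
```

On real restaurant data the scores are unlikely to be this clear-cut, so the result would be one input to the choice of k rather than a definitive answer.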

### Dictionary output and characterization of clusters
Using the cluster label data, the restaurant types for each location can be extracted in dictionary format using the function below:

In [20]:
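def rest_type(clus_num):
    # rows of the grouped table that belong to the requested cluster
    dat = toronto_grouped[toronto_grouped["Cluster Labels"] == clus_num]
    cuis_dict = {}
    for n in range(dat.shape[0]):
        # columns 4 onward are restaurant types; keep those with a nonzero share
        cuis = list(dat.iloc[n,4:][dat.iloc[n,4:] != 0].index)
        # map the neighborhood name (column 0) to its list of restaurant types
        cuis_dict.update({dat.iloc[n,0]: cuis})
    return cuis_dict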

In [21]:
rest_type(0)

{'Agincourt': ['Cantonese',
  'Caribbean',
  'Chinese',
  'Hong Kong',
  'Indian',
  'Japanese',
  'Latin American',
  'Malay',
  'Mediterranean',
  'Seafood',
  'Sri Lankan',
  'Sushi'],
 'Bayview Village': ['Chinese', 'Japanese'],
 'Canada Post Gateway Processing Centre': ['Asian',
  'Caribbean',
  'Chinese',
  'Falafel',
  'Indian',
  'Japanese',
  'Mexican',
  'Middle Eastern',
  'Portuguese',
  'Sushi'],
 'Cedarbrae': ['Caribbean', 'Chinese', 'Fast Food', 'Hakka', 'Indian', 'Thai'],
 'Dorset Park, Wexford Heights, Scarborough Town Centre': ['Asian',
  'Chinese',
  'Fast Food',
  'Indian',
  'Italian',
  'Vietnamese'],
 'Hillcrest Village': ['Chinese', 'Korean'],
 'Kennedy Park, Ionview, East Birchmount Park': ['Asian',
  'Chinese',
  'Fast Food'],
 'Kingsview Village, St. Phillips, Martin Grove Gardens, Richview Gardens': ['American',
  'Chinese'],
 "Milliken, Agincourt North, Steeles East, L'Amoreaux East": ['Caribbean',
  'Chinese',
  'Korean',
  'Malay',
  'Vegetarian / Vegan']

This first cluster appears to describe locations with many East and South Asian cuisines.

In [22]:
rest_type(1)

{'Humber Summit': ['Italian'],
 "Old Mill South, King's Mill Park, Sunnylea, Humber Bay, Mimico NE, The Queensway East, Royal York South East, Kingsway Park South East": ['Eastern European',
  'Italian'],
 'Rouge Hill, Port Union, Highland Creek': ['Italian']}

This second cluster appears to describe locations surrounded mainly by Italian and Eastern European restaurants.

In [23]:
rest_type(2)

{'Alderwood, Long Branch': ['Moroccan'],
 'Bathurst Manor, Wilson Heights, Downsview North': ['Mediterranean',
  'Middle Eastern',
  'Sushi'],
 'Bedford Park, Lawrence Manor East': ['American',
  'Comfort Food',
  'Fast Food',
  'Greek',
  'Indian',
  'Italian',
  'Sushi',
  'Thai'],
 'Berczy Park': ['American',
  'Comfort Food',
  'French',
  'Greek',
  'Italian',
  'Japanese',
  'Middle Eastern',
  'Seafood',
  'Thai',
  'Vegetarian / Vegan'],
 'Birch Cliff, Cliffside West': ['Thai'],
 'Brockton, Parkdale Village, Exhibition Place': ['American',
  'Caribbean',
  'Comfort Food',
  'Ethiopian',
  'French',
  'Hawaiian',
  'Indian',
  'Italian',
  'Japanese',
  'Mexican',
  'New American',
  'Seafood',
  'Tapas',
  'Tibetan',
  'Vegetarian / Vegan'],
 'Business reply mail Processing Centre, South Central Letter Processing Plant Toronto': ['American',
  'Fast Food',
  'French',
  'Italian',
  'Sushi',
  'Thai'],
 'Central Bay Street': ['American',
  'Falafel',
  'Fast Food',
  'Italian',

This third cluster describes locations with a somewhat even split of restaurant types, offering variety rather than one dominant cuisine.

In [24]:
rest_type(3)

{'Caledonia-Fairbanks': ['Falafel',
  'Fast Food',
  'Japanese',
  'Mexican',
  'Portuguese'],
 'Del Ray, Mount Dennis, Keelsdale and Silverthorn': ['Fast Food', 'Italian'],
 'Glencairn': ['Asian',
  'Fast Food',
  'Italian',
  'Japanese',
  'Latin American',
  'Mediterranean'],
 'Golden Mile, Clairlea, Oakridge': ['Fast Food', 'Mexican'],
 'Guildwood, Morningside, West Hill': ['Fast Food', 'Greek'],
 'Lawrence Manor, Lawrence Heights': ['Fast Food',
  'Greek',
  'Korean',
  'Seafood',
  'Sushi',
  'Vietnamese'],
 'Malvern, Rouge': ['Caribbean', 'Chinese', 'Fast Food'],
 'Parkview Hill, Woodbine Gardens': ['Fast Food'],
 'Scarborough Village': ['Fast Food', 'Japanese'],
 'South Steeles, Silverstone, Humbergate, Jamestown, Mount Olive, Beaumond Heights, Thistletown, Albion Gardens': ['Caribbean',
  'Fast Food'],
 'West Deane Park, Princess Gardens, Martin Grove, Islington, Cloverdale': ['Mexican']}

This fourth cluster depicts locations surrounded by many fast food restaurants in addition to miscellaneous cuisines; fast food appears to be the cluster's main characteristic.
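
The characterizations above were read off the dictionaries by eye. A numeric summary could complement them, for example by ranking restaurant types by their average share within each cluster. The sketch below assumes a table shaped like `toronto_grouped`; the helper name `top_types` is hypothetical and all values are invented.

```python
import pandas as pd

# Toy stand-in for toronto_grouped: one row per neighborhood, a cluster label,
# and the share of each restaurant type. All values are invented.
toy_grouped = pd.DataFrame({
    "Neighborhood": ["A", "B", "C", "D"],
    "Cluster Labels": [0, 0, 1, 1],
    "Chinese": [0.6, 0.5, 0.0, 0.1],
    "Italian": [0.1, 0.0, 0.2, 0.1],
    "Fast Food": [0.3, 0.5, 0.8, 0.8],
})

def top_types(grouped, n=2):
    """Return the n restaurant types with the highest mean share in each cluster."""
    shares = grouped.drop(columns=["Neighborhood"]).groupby("Cluster Labels").mean()
    return {cluster: row.nlargest(n).index.tolist()
            for cluster, row in shares.iterrows()}

print(top_types(toy_grouped))
```

Applying this to the notebook's `toronto_grouped` would also require dropping the `Latitude` and `Longitude` columns first, since they are not restaurant types.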

# Conclusion

This project aimed to cluster the neighborhoods of the Toronto area to inform prospective tourists on where to lodge and to show businesses where their competitors are located so they can choose their own locations. After extracting the data, filtering it for restaurants, and converting venue categories to restaurant types, we one-hot encoded and grouped the data and ran k-means to create a clustered map and a dictionary of restaurant types per neighborhood. Finally, we characterized each cluster so that tourists and businesses can make their respective choices.