# Segmenting and Clustering Neighborhoods in Toronto

For this assignment, you will be required to explore and cluster the neighborhoods in Toronto.

1.- Start by creating a new Notebook for this assignment.<br>
2.- Use the Notebook to build the code to scrape the following Wikipedia page, https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M, to obtain the data in the table of postal codes and transform it into a pandas dataframe like the one shown below:

<img src="https://d3c33hcgiwev3.cloudfront.net/imageAssetProxy.v1/7JXaz3NNEeiMwApe4i-fLg_40e690ae0e927abda2d4bde7d94ed133_Screen-Shot-2018-06-18-at-7.17.57-PM.png?expiry=1608681600000&hmac=Yi6TNvsRkrWSUYG6x8x5fMtY-LDY9JkZhYoMC-e9Uyk" width="600" align="left">

3.- To create the above dataframe:

 - The dataframe will consist of three columns: PostalCode, Borough, and Neighborhood.<br>
 <br>
 - Only process the cells that have an assigned borough. Ignore cells with a borough that is __Not assigned__.<br>
 <br>
 - More than one neighborhood can exist in one postal code area. For example, in the table on the Wikipedia page, you will notice that __M5A__ is listed twice and has two neighborhoods: __Harbourfront__ and __Regent Park__. These two rows will be combined into one row with the neighborhoods separated with a comma as shown in __row 11__  in the above table.<br>
 <br>
 - If a cell has a borough but a __Not assigned__ neighborhood, then the neighborhood will be the same as the borough.<br>
 <br>
 - Clean your Notebook and add Markdown cells to explain your work and any assumptions you are making.<br>
 <br>
 - In the last cell of your notebook, use the __.shape__ attribute to print the number of rows of your dataframe.

4.- Submit a link to your Notebook on your Github repository. __(10 marks)__

__Note:__ There are different website scraping libraries and packages in Python. For scraping the above table, you can simply use pandas to read the table into a pandas dataframe.

Another option, which is worth learning for more complicated web scraping cases, is the BeautifulSoup package. Here is the package's main documentation page: http://beautiful-soup-4.readthedocs.io/en/latest/

Use pandas, or the BeautifulSoup package, or any other way you are comfortable with to transform the data in the table on the Wikipedia page into the above pandas dataframe.

__1.1.__ Import the necessary libraries:

In [1]:
import pandas as pd
import numpy as np

__1.2.__ Define table source address and read the table into a pandas data frame:

In [2]:
#Define table source address as 'url':
url = 'https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'
#Read table into a data frame:
TPC_df = pd.read_html(url)[0]

__1.3.__ Drop all rows where the 'Borough' column has a 'Not assigned' value and replace all existing 'Not assigned' values in the 'Neighbourhood' column with their corresponding values in the 'Borough' column:

In [3]:
#Drop all rows where 'Borough' shows a 'Not assigned' value:
TPC_df = TPC_df[TPC_df['Borough'] != 'Not assigned'].reset_index(drop=True)
#Replace 'Not assigned' values in 'Neighbourhood' column with the corresponding values in 'Borough' column:
TPC_df['Neighbourhood'] = TPC_df['Neighbourhood'].mask(TPC_df['Neighbourhood'] == 'Not assigned', TPC_df['Borough'])
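To see the two rules of step 1.3 in isolation, here is a minimal sketch on a small hand-made frame. The rows below are illustrative only, not taken from the Wikipedia table. It uses `Series.mask`, which replaces the values where a condition holds with the index-aligned values of another Series.

```python
import pandas as pd

# Hypothetical sample rows mimicking the layout of the Wikipedia table
raw = pd.DataFrame({
    'Postal Code': ['M1A', 'M3A', 'M7A', 'M9A'],
    'Borough': ['Not assigned', 'North York', "Queen's Park", 'Etobicoke'],
    'Neighbourhood': ['Not assigned', 'Parkwoods', 'Not assigned', 'Islington Avenue'],
})

# Rule 1: drop rows whose borough is 'Not assigned'
cleaned = raw[raw['Borough'] != 'Not assigned'].reset_index(drop=True)

# Rule 2: where the neighbourhood is 'Not assigned', copy the borough name instead
is_missing = cleaned['Neighbourhood'] == 'Not assigned'
cleaned['Neighbourhood'] = cleaned['Neighbourhood'].mask(is_missing, cleaned['Borough'])

print(cleaned)
```

The `mask` call aligns on the index, so each missing neighbourhood receives the borough from its own row.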

__1.4.__ Group the data frame by 'Postal Code', join unique values in 'Borough' and 'Neighborhood' and separate them with a comma ', ':

In [4]:
#Group by postal code and join the unique values of 'Borough' and 'Neighbourhood' with ', ':
#Please note that there were no duplicate postal codes in the original table, so the resulting data frame has the same rows as the original:
join_unique = lambda x: ', '.join(np.unique(x))
TPC_df = (TPC_df.groupby('Postal Code')
                .agg({'Borough': join_unique, 'Neighbourhood': join_unique})
                .rename(columns={'Neighbourhood': 'Neighborhood'})
                .reset_index())
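The scraped table turned out to contain no duplicate postal codes, so the grouping step never actually merges any rows on the real data. The sketch below shows what it does when a code does repeat, using the assignment's own example (__M5A__ listed twice, with Harbourfront and Regent Park). The rows and the helper name `join_unique` are illustrative assumptions.

```python
import pandas as pd

# Hypothetical rows: M5A appears twice, as in the assignment's example
rows = pd.DataFrame({
    'Postal Code': ['M5A', 'M5A', 'M4W'],
    'Borough': ['Downtown Toronto', 'Downtown Toronto', 'Downtown Toronto'],
    'Neighbourhood': ['Harbourfront', 'Regent Park', 'Rosedale'],
})

def join_unique(values):
    """Join the distinct values of one group into a sorted, comma-separated string."""
    return ', '.join(sorted(set(values)))

# One output row per postal code; duplicate boroughs collapse to a single name
combined = (rows.groupby('Postal Code', as_index=False)
                .agg({'Borough': join_unique, 'Neighbourhood': join_unique}))
print(combined)
```

Because the values are sorted before joining, the neighborhood order can differ from the order on the Wikipedia page. `np.unique` behaves the same way.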

__1.5.__ Show the first 5 rows of the resulting data frame:

In [5]:
#Show first 5 rows of the grouped data frame:
TPC_df.head()

|   | Postal Code | Borough | Neighborhood |
|---|---|---|---|
| 0 | M1B | Scarborough | Malvern, Rouge |
| 1 | M1C | Scarborough | Rouge Hill, Port Union, Highland Creek |
| 2 | M1E | Scarborough | Guildwood, Morningside, West Hill |
| 3 | M1G | Scarborough | Woburn |
| 4 | M1H | Scarborough | Cedarbrae |


__1.6.__ Show the dimensions of the resulting data frame:

In [6]:
#Show the dimensions of the resulting data frame:
TPC_df.shape

(103, 3)

Now that you have built a dataframe of the postal code of each neighborhood along with the borough name and neighborhood name, in order to utilize the Foursquare location data, we need to get the latitude and the longitude coordinates of each neighborhood. 

In an older version of this course, we were leveraging the Google Maps Geocoding API to get the latitude and the longitude coordinates of each neighborhood. However, recently Google started charging for their API: http://geoawesomeness.com/developers-up-in-arms-over-google-maps-api-insane-price-hike/, so we will use the Geocoder Python package instead: https://geocoder.readthedocs.io/index.html.

The problem with this package is that you sometimes have to be persistent to get the geographical coordinates of a given postal code. A call for the latitude and longitude coordinates of a postal code may return None, and repeating the same call may then return the coordinates. So, to make sure that you get the coordinates for all of the neighborhoods, you can run a while loop for each postal code. Taking postal code __M5G__ as an example, your code would look something like this:

import geocoder # import geocoder

postal_code = 'M5G' # the postal code to look up

# initialize your variable to None
lat_lng_coords = None

# loop until you get the coordinates
while lat_lng_coords is None:
    g = geocoder.google('{}, Toronto, Ontario'.format(postal_code))
    lat_lng_coords = g.latlng

latitude = lat_lng_coords[0]
longitude = lat_lng_coords[1]
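The loop above retries forever if a postal code never resolves. A minimal sketch of a bounded variant caps the number of attempts and waits a little longer after each failure. `geocode_with_retries` and the stand-in `fake_geocode` are hypothetical names. In practice `lookup` could be a small wrapper that returns `geocoder.google(query).latlng`, which is an assumption and is not shown here.

```python
import time

def geocode_with_retries(lookup, query, max_attempts=10, delay_seconds=0.5):
    """Call lookup(query) until it returns coordinates or the attempts run out.

    `lookup` is any function that returns [lat, lng] or None (hypothetical interface).
    """
    for attempt in range(1, max_attempts + 1):
        coords = lookup(query)
        if coords is not None:
            return coords
        time.sleep(delay_seconds * attempt)  # back off a little more after each failure
    return None  # give up instead of looping forever

# Hypothetical stand-in that fails twice before succeeding (coordinates for M5G from the csv file)
responses = iter([None, None, [43.657952, -79.387383]])

def fake_geocode(query):
    return next(responses)

coords = geocode_with_retries(fake_geocode, 'M5G, Toronto, Ontario', delay_seconds=0)
print(coords)
```

Returning None after the last attempt lets the caller record the failure and fall back to the csv file mentioned below.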

Given that this package can be very unreliable, in case you are not able to get the geographical coordinates of the neighborhoods using the Geocoder package, here is a link to a csv file that has the geographical coordinates of each postal code: http://cocl.us/Geospatial_data

Use the Geocoder package or the csv file to create the following dataframe:

<img src="https://d3c33hcgiwev3.cloudfront.net/imageAssetProxy.v1/HZ3jNHNOEeiMwApe4i-fLg_f44f0f10ccfaf42fcbdba9813364e173_Screen-Shot-2018-06-18-at-7.18.16-PM.png?expiry=1608681600000&hmac=qqIXUGsiKVGkh8SAjZccBgI1Wn4s5mMcoRl9djyz2Ao" width="700" align="left">

__Important Note__: There is a limit on how many times you can call the geocoder.google function: 2,500 times per day. This should be more than enough for you to get acquainted with the package and to use it to get the geographical coordinates of the neighborhoods in Toronto.

Once you are able to create the above dataframe, submit a link to the new Notebook on your Github repository. (__2 marks__)

__2.1.__ Get the .csv file containing the latitude and longitude coordinates and read it into a pandas dataframe:

In [7]:
latlng_df = pd.read_csv('http://cocl.us/Geospatial_data')
latlng_df.head()

|   | Postal Code | Latitude | Longitude |
|---|---|---|---|
| 0 | M1B | 43.806686 | -79.194353 |
| 1 | M1C | 43.784535 | -79.160497 |
| 2 | M1E | 43.763573 | -79.188711 |
| 3 | M1G | 43.770992 | -79.216917 |
| 4 | M1H | 43.773136 | -79.239476 |


__2.2.__ Perform a left join to add the Latitude and Longitude coordinates to the "TPC_df" data frame:

In [8]:
TPC_df = pd.merge(TPC_df, latlng_df,on='Postal Code',how='left')
TPC_df.head()

|   | Postal Code | Borough | Neighborhood | Latitude | Longitude |
|---|---|---|---|---|---|
| 0 | M1B | Scarborough | Malvern, Rouge | 43.806686 | -79.194353 |
| 1 | M1C | Scarborough | Rouge Hill, Port Union, Highland Creek | 43.784535 | -79.160497 |
| 2 | M1E | Scarborough | Guildwood, Morningside, West Hill | 43.763573 | -79.188711 |
| 3 | M1G | Scarborough | Woburn | 43.770992 | -79.216917 |
| 4 | M1H | Scarborough | Cedarbrae | 43.773136 | -79.239476 |


__2.3.__ Show the dimensions of the updated "TPC_df" data frame:

In [9]:
TPC_df.shape

(103, 5)
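A left join keeps every row of "TPC_df" even when a postal code has no match in the csv file, which would silently leave NaN coordinates. The unchanged shape above suggests that did not happen, but a sketch of a more explicit check uses two real `pd.merge` parameters: `validate`, which raises if a key is duplicated, and `indicator`, which records where each row came from. The miniature frames and the code `M9Z` are hypothetical.

```python
import pandas as pd

# Hypothetical miniature versions of TPC_df and latlng_df ('M9Z' has no coordinates)
codes = pd.DataFrame({'Postal Code': ['M1B', 'M1C', 'M9Z'],
                      'Borough': ['Scarborough', 'Scarborough', 'Etobicoke']})
coords = pd.DataFrame({'Postal Code': ['M1B', 'M1C'],
                       'Latitude': [43.806686, 43.784535],
                       'Longitude': [-79.194353, -79.160497]})

# validate raises a MergeError if a postal code is duplicated on either side;
# indicator adds a '_merge' column saying whether each row matched
merged = pd.merge(codes, coords, on='Postal Code', how='left',
                  validate='one_to_one', indicator=True)

unmatched = merged.loc[merged['_merge'] == 'left_only', 'Postal Code'].tolist()
print('Postal codes without coordinates:', unmatched)
merged = merged.drop(columns='_merge')
```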

Explore and cluster the neighborhoods in Toronto. You can decide to work with only boroughs that contain the word Toronto and then replicate the same analysis we did to the New York City data. It is up to you. 

Just make sure:

1. to add enough Markdown cells to explain what you decided to do and to report any observations you make. 
2. to generate maps to visualize your neighborhoods and how they cluster together. 
3. Once you are happy with your analysis, submit a link to the new Notebook on your Github repository. (3 marks)

__3.1.__ Import necessary libraries and packages:

In [10]:
!pip install geopy
!pip install folium
from geopy.geocoders import Nominatim
import requests
import matplotlib.cm as cm
import matplotlib.colors as colors
from sklearn.cluster import KMeans
import folium



__3.2.__ Specify the Foursquare credentials (the following hidden cell contains: "CLIENT_ID", "CLIENT_SECRET" and "VERSION"):

In [11]:
# @hidden_cell
CLIENT_ID = 'YOUR_CLIENT_ID' # replace with your Foursquare client ID
CLIENT_SECRET = 'YOUR_CLIENT_SECRET' # replace with your Foursquare client secret
VERSION = '20201223'

__3.3.__ Check how many postal codes, boroughs and neighborhoods there are in Toronto:

In [12]:
print("There are: ", TPC_df['Postal Code'].unique().shape[0], 'postal codes, ',
      TPC_df['Borough'].unique().shape[0], 'boroughs and ', 
      len(np.unique(np.concatenate((TPC_df['Neighborhood'].str.split(', ')),axis=0))), 
      'neighborhoods in Toronto')

There are:  103 postal codes,  10 boroughs and  208 neighborhoods in Toronto


__3.4.__ Find out which borough has the most postal codes and neighborhoods:

In [13]:
#Count the unique postal codes and the unique individual neighborhood names in each borough:
count_neighborhoods = lambda x: len(set(', '.join(x).split(', ')))
TPC_grouped = TPC_df.groupby('Borough').agg(**{
    'Postal Code Count': ('Postal Code', 'nunique'),
    'Neighborhood Count': ('Neighborhood', count_neighborhoods)}).reset_index()
TPC_grouped = TPC_grouped[['Postal Code Count', 'Borough', 'Neighborhood Count']]
TPC_grouped
#Note: The Runnymede neighborhood is split between West Toronto and York. As a result, Runnymede is counted twice in this table. Thus,
#the total number of neighborhoods in this table is 209 instead of the 208 shown previously.

|   | Postal Code Count | Borough | Neighborhood Count |
|---|---|---|---|
| 0 | 9 | Central Toronto | 17 |
| 1 | 19 | Downtown Toronto | 38 |
| 2 | 5 | East Toronto | 8 |
| 3 | 5 | East York | 7 |
| 4 | 12 | Etobicoke | 47 |
| 5 | 1 | Mississauga | 1 |
| 6 | 24 | North York | 32 |
| 7 | 17 | Scarborough | 38 |
| 8 | 6 | West Toronto | 13 |
| 9 | 5 | York | 8 |
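Both neighborhood counts (208 overall, 209 summed per borough) come from splitting the comma-joined strings. A sketch using `Series.str.split` plus `DataFrame.explode` (available in pandas 0.25 and later) produces one row per (postal code, neighborhood) pair, which makes both counts one-liners and shows why a name listed under two boroughs is counted once city-wide but once per borough. The rows below are modeled on the Runnymede note and should be treated as illustrative.

```python
import pandas as pd

# Illustrative rows: 'Runnymede' appears under two boroughs
df = pd.DataFrame({
    'Postal Code': ['M6S', 'M6N', 'M6P'],
    'Borough': ['West Toronto', 'York', 'West Toronto'],
    'Neighborhood': ['Runnymede, Swansea',
                     'Runnymede, The Junction North',
                     'High Park, The Junction South'],
})

# One row per (postal code, neighborhood) pair
pairs = df.assign(Neighborhood=df['Neighborhood'].str.split(', ')).explode('Neighborhood')

total_unique = pairs['Neighborhood'].nunique()                   # distinct names city-wide
per_borough = pairs.groupby('Borough')['Neighborhood'].nunique() # distinct names per borough

print(total_unique)
print(per_borough)
print(per_borough.sum())  # larger than total_unique when a name spans two boroughs
```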


Etobicoke has 47 neighborhoods, the most of any borough in Toronto, but only 12 postal codes. North York has the most postal codes (24), but only 32 neighborhoods. Downtown Toronto, on the other hand, combines a high number of postal codes (19) with a high number of neighborhoods (38). For this reason, I will focus this analysis on the Downtown Toronto borough.

Since there can be many neighborhoods within a postal code, many postal codes within a neighborhood, and latitude and longitude coordinates are only available for postal codes, I will do my analysis based on postal codes instead of neighborhoods.

__3.5.__ Create a data frame with all of the postal codes in Downtown Toronto:

In [14]:
Downtown_Toronto_PCodes = TPC_df[TPC_df['Borough']=='Downtown Toronto'].reset_index(drop=True)
Downtown_Toronto_PCodes

|   | Postal Code | Borough | Neighborhood | Latitude | Longitude |
|---|---|---|---|---|---|
| 0 | M4W | Downtown Toronto | Rosedale | 43.679563 | -79.377529 |
| 1 | M4X | Downtown Toronto | St. James Town, Cabbagetown | 43.667967 | -79.367675 |
| 2 | M4Y | Downtown Toronto | Church and Wellesley | 43.66586 | -79.38316 |
| 3 | M5A | Downtown Toronto | Regent Park, Harbourfront | 43.65426 | -79.360636 |
| 4 | M5B | Downtown Toronto | Garden District, Ryerson | 43.657162 | -79.378937 |
| 5 | M5C | Downtown Toronto | St. James Town | 43.651494 | -79.375418 |
| 6 | M5E | Downtown Toronto | Berczy Park | 43.644771 | -79.373306 |
| 7 | M5G | Downtown Toronto | Central Bay Street | 43.657952 | -79.387383 |
| 8 | M5H | Downtown Toronto | Richmond, Adelaide, King | 43.650571 | -79.384568 |
| 9 | M5J | Downtown Toronto | Harbourfront East, Union Station, Toronto Islands | 43.640816 | -79.381752 |


__3.6.__ Reuse the function from the previous lab to loop through all of the postal codes in Downtown Toronto and obtain up to 100 venues within a 500 m radius of each postal code's coordinates:

In [15]:
def getNearbyVenues(PCodes, latitudes, longitudes, radius=500, limit=100):
    
    venues_list=[]
    for PCode, lat, lng in zip(PCodes, latitudes, longitudes):
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            limit)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            PCode, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Postal Code', 
                  'Postal Code Latitude', 
                  'Postal Code Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
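`getNearbyVenues` reads `v['venue']['categories'][0]` directly, which raises an `IndexError` if a venue comes back without any category. Below is a sketch of a more defensive parser for one response. The payload is a hand-built dict that mirrors only the keys the function above reads. It is not a real API response, and `parse_venues` is a hypothetical helper.

```python
def parse_venues(payload, pcode, lat, lng):
    """Flatten the venue items of one 'explore' response into tuples.

    Assumes only the keys that getNearbyVenues reads. Venues without a
    category get None instead of raising IndexError.
    """
    rows = []
    for item in payload['response']['groups'][0]['items']:
        venue = item['venue']
        categories = venue.get('categories') or []
        category = categories[0]['name'] if categories else None
        rows.append((pcode, lat, lng, venue['name'],
                     venue['location']['lat'], venue['location']['lng'], category))
    return rows

# Hypothetical payload with the same nesting; the second venue has no category
sample = {'response': {'groups': [{'items': [
    {'venue': {'name': 'Rosedale Park', 'location': {'lat': 43.682328, 'lng': -79.378934},
               'categories': [{'name': 'Playground'}]}},
    {'venue': {'name': 'Unnamed Spot', 'location': {'lat': 43.68, 'lng': -79.37},
               'categories': []}},
]}]}}

rows = parse_venues(sample, 'M4W', 43.679563, -79.377529)
print(rows)
```

Rows with a None category can then be dropped or kept as their own group before one-hot encoding.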

In [16]:
toronto_venues = getNearbyVenues(Downtown_Toronto_PCodes['Postal Code'], Downtown_Toronto_PCodes['Latitude'], Downtown_Toronto_PCodes['Longitude'], radius=500, limit=100)
print(toronto_venues.shape)
toronto_venues.head()

(1219, 7)


|   | Postal Code | Postal Code Latitude | Postal Code Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|---|
| 0 | M4W | 43.679563 | -79.377529 | Rosedale Park | 43.682328 | -79.378934 | Playground |
| 1 | M4W | 43.679563 | -79.377529 | Whitney Park | 43.682036 | -79.373788 | Park |
| 2 | M4W | 43.679563 | -79.377529 | Alex Murray Parkette | 43.6783 | -79.382773 | Park |
| 3 | M4W | 43.679563 | -79.377529 | Milkman's Lane | 43.676352 | -79.373842 | Trail |
| 4 | M4X | 43.667967 | -79.367675 | Cranberries | 43.667843 | -79.369407 | Diner |


__3.7.__ Count the number of resulting venues in each Downtown Toronto postal code:

In [17]:
toronto_venues[['Postal Code','Venue']].groupby('Postal Code').count()

| Postal Code | Venue |
|---|---|
| M4W | 4 |
| M4X | 46 |
| M4Y | 78 |
| M5A | 47 |
| M5B | 100 |
| M5C | 79 |
| M5E | 57 |
| M5G | 59 |
| M5H | 96 |
| M5J | 100 |


__3.8.__ Drop any postal codes with fewer than 20 venues:

In [18]:
#Identify the postal codes that have fewer than 20 venues (Series.iteritems was removed in pandas 2.0, so count with size() instead):
venue_counts = toronto_venues.groupby('Postal Code').size()
drop_Pcodes = venue_counts[venue_counts < 20].index.tolist()

#Drop postal codes with fewer than 20 venues from the "toronto_venues" data frame:
toronto_venues_adjusted = toronto_venues.groupby('Postal Code').filter(lambda x: len(x) >= 20)

#Drop postal codes with fewer than 20 venues from the "Downtown_Toronto_PCodes" data frame and name it "toronto_merged":
toronto_merged = Downtown_Toronto_PCodes[~Downtown_Toronto_PCodes['Postal Code'].isin(drop_Pcodes)].reset_index(drop=True)

#Show the codes to be dropped:
drop_Pcodes

['M4W', 'M5V', 'M6G']

__3.9.__ Find the number of unique venue categories:

In [19]:
print(len(toronto_venues_adjusted['Venue Category'].unique()), "unique venue categories!")

195 unique venue categories!


__3.10.__ Perform one-hot encoding on venue categories:

In [20]:
# one hot encoding
toronto_onehot = pd.get_dummies(toronto_venues_adjusted[['Venue Category']], prefix="", prefix_sep="")

# add the postal code column back to the data frame
toronto_onehot['Postal Code'] = toronto_venues_adjusted['Postal Code']

# move the postal code column to the first column
fixed_columns = [toronto_onehot.columns[-1]] + list(toronto_onehot.columns[:-1])
toronto_onehot = toronto_onehot[fixed_columns]

print(toronto_onehot.shape)
toronto_onehot.head()

(1183, 196)


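|   | Postal Code | Adult Boutique | American Restaurant | Antique Shop | Aquarium | Art Gallery | Art Museum | Arts & Crafts Store | Asian Restaurant | BBQ Joint | ... | Thai Restaurant | Theater | Theme Restaurant | Train Station | Vegetarian / Vegan Restaurant | Video Game Store | Vietnamese Restaurant | Wine Bar | Wine Shop | Yoga Studio |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 4 | M4X | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 5 | M4X | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 6 | M4X | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 7 | M4X | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 8 | M4X | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |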


__3.11.__ Group all postal codes and calculate the mean for each category:

In [21]:
toronto_grouped = toronto_onehot.groupby('Postal Code').mean().reset_index()
print(toronto_grouped.shape)
toronto_grouped.head()

(16, 196)


|   | Postal Code | Adult Boutique | American Restaurant | Antique Shop | Aquarium | Art Gallery | Art Museum | Arts & Crafts Store | Asian Restaurant | BBQ Joint | ... | Thai Restaurant | Theater | Theme Restaurant | Train Station | Vegetarian / Vegan Restaurant | Video Game Store | Vietnamese Restaurant | Wine Bar | Wine Shop | Yoga Studio |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | M4X | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.021739 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 1 | M4Y | 0.012821 | 0.012821 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.012821 | 0.012821 | 0.012821 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.025641 |
| 2 | M5A | 0.0 | 0.0 | 0.021277 | 0.0 | 0.021277 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.042553 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.021277 | 0.021277 |
| 3 | M5B | 0.0 | 0.0 | 0.0 | 0.0 | 0.01 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.01 | 0.02 | 0.0 | 0.0 | 0.0 | 0.01 | 0.01 | 0.01 | 0.0 | 0.0 |
| 4 | M5C | 0.0 | 0.037975 | 0.0 | 0.0 | 0.012658 | 0.0 | 0.0 | 0.012658 | 0.012658 | ... | 0.012658 | 0.012658 | 0.0 | 0.0 | 0.012658 | 0.0 | 0.0 | 0.012658 | 0.0 | 0.0 |
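Steps 3.10 and 3.11 turn each postal code into a vector of category frequencies: every venue becomes a row of 0/1 indicators, and averaging those rows per postal code gives the share of each category, so each row sums to 1. A minimal sketch on a hypothetical venue list (the `dtype=int` argument to `pd.get_dummies` is used here to get integer rather than boolean columns):

```python
import pandas as pd

# Hypothetical venue list for two postal codes
venues = pd.DataFrame({
    'Postal Code': ['M4X', 'M4X', 'M4X', 'M4X', 'M5G', 'M5G'],
    'Venue Category': ['Coffee Shop', 'Coffee Shop', 'Pizza Place', 'Park',
                       'Coffee Shop', 'Café'],
})

# One indicator column per category: 1 where the venue belongs to that category
onehot = pd.get_dummies(venues['Venue Category'], dtype=int)
onehot.insert(0, 'Postal Code', venues['Postal Code'])

# The mean of the indicator columns is each category's share of a postal code's venues
freq = onehot.groupby('Postal Code').mean()
print(freq)
```

These frequency rows are what k-means compares in the next step, so postal codes with similar venue mixes end up close together.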


__3.12.__ Perform k-means clustering on the Downtown Toronto postal codes:

In [22]:
# set number of clusters
kclusters = 5

toronto_grouped_clustering = toronto_grouped.drop(columns='Postal Code')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(toronto_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_

array([4, 4, 4, 4, 4, 4, 3, 1, 1, 1, 1, 0, 2, 4, 1, 3], dtype=int32)
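For intuition about what `KMeans` does with those frequency vectors, here is a minimal NumPy sketch of Lloyd's algorithm. It alternates between assigning each row to its nearest centroid and moving each centroid to the mean of its rows. It is not scikit-learn's implementation, which uses k-means++ initialization by default, so the labels will not match the ones above. `lloyd_kmeans` and the sample data are hypothetical.

```python
import numpy as np

def lloyd_kmeans(X, k, n_iter=100, seed=0):
    """Plain Lloyd's algorithm. Returns (labels, centroids)."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]  # k distinct rows as starting centroids
    for _ in range(n_iter):
        # Assignment step: squared Euclidean distance from every row to every centroid
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Update step: move each centroid to the mean of its rows (keep it if the cluster is empty)
        new_centroids = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
                                  for j in range(k)])
        if np.allclose(new_centroids, centroids):
            break  # centroids stopped moving
        centroids = new_centroids
    return labels, centroids

# Hypothetical two-category frequency vectors forming two clearly separated groups
X = np.array([[0.9, 0.1], [0.8, 0.2], [0.85, 0.15],
              [0.1, 0.9], [0.2, 0.8], [0.15, 0.85]])
labels, centroids = lloyd_kmeans(X, k=2)
print(labels)
```

The cluster numbers themselves are arbitrary. Only which rows share a label carries meaning, which is also true of `kmeans.labels_`.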

__3.13.__ Obtain the coordinates of Toronto, Canada in order to start the map at the right position:

In [23]:
address = 'Toronto, Ontario'

geolocator = Nominatim(user_agent="toronto_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Toronto are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Toronto are 43.6534817, -79.3839347.


__3.14.__ Visualize the resulting clusters in the map:

In [24]:
# append the cluster labels to the "toronto_merged" dataframe:
toronto_merged['Cluster Labels'] = kmeans.labels_
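The assignment above pairs labels with rows by position, so it assumes that "toronto_merged" lists postal codes in the same order as "toronto_grouped", the frame k-means was fit on. Both appear to be sorted by postal code here, so it works, but a merge on 'Postal Code' makes the pairing explicit. A sketch with hypothetical stand-in frames and made-up labels:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins: `grouped` plays toronto_grouped, `merged` plays toronto_merged
grouped = pd.DataFrame({'Postal Code': ['M4X', 'M4Y', 'M5A']})
merged = pd.DataFrame({'Postal Code': ['M5A', 'M4X', 'M4Y'],
                       'Latitude': [43.65426, 43.667967, 43.66586]})
labels = np.array([0, 1, 2])  # made-up labels, one per row of `grouped`, in its order

# Attach each label to the postal code it was computed for, then join by key
label_table = grouped[['Postal Code']].assign(**{'Cluster Labels': labels})
merged = merged.merge(label_table, on='Postal Code', how='left', validate='one_to_one')
print(merged)
```

Even though `merged` is in a different order, each postal code receives its own label.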

In [25]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=14)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster, neigh in zip(toronto_merged['Latitude'], toronto_merged['Longitude'], toronto_merged['Postal Code'], toronto_merged['Cluster Labels'], toronto_merged['Neighborhood']):
    label = folium.Popup('Postal Code: ' + str(poi) + ' Neighborhood: ' + str(neigh) +' Cluster: ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster],
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

__3.15.__ Analyze the resulting clusters:

In [26]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]
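`return_most_common_venues` sorts the whole row to pick the top categories. `Series.nlargest` performs the same selection directly. Below is a sketch on a hypothetical frequency row, where `top_categories` is an illustrative name. Among categories with equal frequency, the order may differ from the sorting approach.

```python
import pandas as pd

# Hypothetical frequency row for one postal code (first entry is the key, as in toronto_grouped)
row = pd.Series({'Postal Code': 'M4X', 'Coffee Shop': 0.5, 'Park': 0.1,
                 'Pizza Place': 0.25, 'Café': 0.15})

def top_categories(row, n):
    """Return the names of the n highest-frequency categories, skipping the key column."""
    values = row.iloc[1:].astype(float)
    return values.nlargest(n).index.tolist()

print(top_categories(row, 3))
```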

In [27]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Postal Code', 'Cluster']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
Pcodes_venues_sorted = pd.DataFrame(columns=columns)
Pcodes_venues_sorted['Postal Code'] = toronto_merged['Postal Code']
Pcodes_venues_sorted['Cluster'] = toronto_merged['Cluster Labels']

for ind in np.arange(toronto_grouped.shape[0]):
    Pcodes_venues_sorted.iloc[ind, 2:] = return_most_common_venues(toronto_grouped.iloc[ind, :], num_top_venues)

Pcodes_venues_sorted.sort_values(by=['Cluster'], ascending=False).reset_index(drop=True)

|   | Postal Code | Cluster | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | M4X | 4 | Coffee Shop | Pizza Place | Café | Bakery | Pub | Restaurant | Italian Restaurant | Park | Pharmacy | Flower Shop |
| 1 | M4Y | 4 | Coffee Shop | Japanese Restaurant | Sushi Restaurant | Restaurant | Gay Bar | Yoga Studio | Pub | Fast Food Restaurant | Men's Store | Mediterranean Restaurant |
| 2 | M5A | 4 | Coffee Shop | Park | Café | Pub | Bakery | Theater | Breakfast Spot | Wine Shop | Performing Arts Venue | Shoe Store |
| 3 | M5B | 4 | Coffee Shop | Clothing Store | Café | Cosmetics Shop | Hotel | Bubble Tea Shop | Middle Eastern Restaurant | Japanese Restaurant | Fast Food Restaurant | Italian Restaurant |
| 4 | M5C | 4 | Coffee Shop | Café | American Restaurant | Cosmetics Shop | Cocktail Bar | Hotel | Department Store | Cheese Shop | Gym | Creperie |
| 5 | M5E | 4 | Coffee Shop | Cocktail Bar | Farmers Market | Beer Bar | Restaurant | Cheese Shop | Bakery | Pharmacy | Seafood Restaurant | Park |
| 6 | M5W | 4 | Coffee Shop | Seafood Restaurant | Restaurant | Café | Cocktail Bar | Beer Bar | Italian Restaurant | Japanese Restaurant | Park | Gym |
| 7 | M5G | 3 | Coffee Shop | Italian Restaurant | Sandwich Place | Café | Bubble Tea Shop | Salad Place | Burger Joint | Thai Restaurant | Poke Place | Japanese Restaurant |
| 8 | M7A | 3 | Coffee Shop | Sushi Restaurant | Yoga Studio | Beer Bar | Burrito Place | Sandwich Place | Café | Restaurant | College Auditorium | Creperie |
| 9 | M5T | 2 | Café | Vegetarian / Vegan Restaurant | Mexican Restaurant | Vietnamese Restaurant | Coffee Shop | Caribbean Restaurant | Arts & Crafts Store | Bar | Park | Dessert Shop |
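The column-naming loop in 3.15 handles 1st to 3rd through the `indicators` list and falls back to "th" for everything else. That is correct for `num_top_venues = 10`, but it would produce "21th" or "22th" for larger values. A sketch of a general helper, where `ordinal` is a hypothetical name:

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st', 12 -> '12th', 22 -> '22nd'."""
    if 10 <= n % 100 <= 20:
        suffix = 'th'  # 10th to 20th, including the 11th/12th/13th exceptions
    else:
        suffix = {1: 'st', 2: 'nd', 3: 'rd'}.get(n % 10, 'th')
    return '{}{}'.format(n, suffix)

columns = ['Postal Code', 'Cluster'] + ['{} Most Common Venue'.format(ordinal(i)) for i in range(1, 11)]
print(columns[2], columns[-1])
```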


By looking at the map, we can observe that the postal codes in cluster 4 are situated east of Yonge Street in Downtown Toronto, where coffee shops are the most common type of venue. Other common venues on this side of town include cafés and seafood, Japanese, and sushi restaurants. If you are looking for a place to eat seafood and/or Japanese food, cluster 4 seems to be the right part of town to visit.

The postal codes in cluster 3 are situated north of Dundas Street West, between University Avenue and Yonge Street. Again, the most common venues in this cluster are coffee shops. Other common venues are Italian restaurants and sandwich places, so this might be a good part of town to visit if you are looking for fast food (sandwiches) or Italian food.

There is just one postal code in cluster 2, and Chinatown is located in it. Cluster 2 seems to be a good place to visit if you are looking for a vegetarian/vegan, Mexican, or Vietnamese restaurant.

The postal codes in cluster 1 are located south of cluster 3, that is, south of Dundas Street West, between University Avenue and Yonge Street. Cluster 1 seems to be the busiest part of town for tourism, as most of the hotels are located in this area. The most common venues in this cluster are still coffee shops, but hotels, cafés, gyms and aquariums are also very common here.

Lastly, there is just one postal code in cluster 0. The University of Toronto is located in this area, and the most common venues in this part of town are cafés, bars and Italian restaurants.