# Exploring the Distribution of Cafes in Toronto


When I was in Toronto, I found that the cafes there were not evenly distributed: there are many cafes in the downtown area, but very few elsewhere. I would therefore like to cluster the neighborhoods of Toronto and visualize the distribution of cafes across them. Investors can then refer to this report when deciding whether to open a new cafe and where to put it.

### 1. Import the libraries.

In [2]:

pip install lxml

Collecting lxml
  Downloading https://files.pythonhosted.org/packages/79/37/d420b7fdc9a550bd29b8cfeacff3b38502d9600b09d7dfae9a69e623b891/lxml-4.5.2-cp36-cp36m-manylinux1_x86_64.whl (5.5MB)
Installing collected packages: lxml
Successfully installed lxml-4.5.2
Note: you may need to restart the kernel to use updated packages.


In [3]:

pip install bs4

Collecting bs4
  Downloading https://files.pythonhosted.org/packages/10/ed/7e8b97591f6f456174139ec089c769f89a94a1a4025fe967691de971f314/bs4-0.0.1.tar.gz
Collecting beautifulsoup4 (from bs4)
  Downloading https://files.pythonhosted.org/packages/66/25/ff030e2437265616a1e9b25ccc864e0371a0bc3adb7c5a404fd661c6f4f6/beautifulsoup4-4.9.1-py3-none-any.whl (115kB)
Collecting soupsieve>1.2 (from beautifulsoup4->bs4)
  Downloading https://files.pythonhosted.org/packages/6f/8f/457f4a5390eeae1cc3aeab89deb7724c965be841ffca6cfca9197482e470/soupsieve-2.0.1-py3-none-any.whl
Building wheels for collected packages: bs4
  Building wheel for bs4 (setup.py) ... done
  Stored in directory: /home/jupyterlab/.cache/pip/wheels/a0/b0/b2/4f80b9456b87abedbc0bf2d52235414c3467d8889be38dd472
Successfully built bs4
Installing collected packages: soupsieve, beautifulsoup4, bs4
Successfully installed beautifulsoup4-4.9.1 bs4-0.0.

In [4]:
# import the libraries that I will use in this report
import numpy as np 
import pandas as pd

import requests
import lxml.html as lh
import bs4 as bs
import urllib.request

import folium

import matplotlib.cm as cm
import matplotlib.colors as colors

from sklearn.cluster import KMeans
print('Libraries imported.')

Libraries imported.


### 2. Read data from url

In [5]:
url = "https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M"
res = requests.get(url)
soup = bs.BeautifulSoup(res.content,'lxml')
# the first table on the page holds the postal codes
table = soup.find_all('table')[0]
# read_html returns a list of DataFrames; keep the first one
df = pd.read_html(str(table))
data = df[0]

View the first five rows of the data.

In [6]:
data.head()

Unnamed: 0,Postal Code,Borough,Neighborhood
0,M1A,Not assigned,Not assigned
1,M2A,Not assigned,Not assigned
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,"Regent Park, Harbourfront"


### 3. Cleaning the data

In [7]:
# Ignore the cells with borough not assigned
data=data[data['Borough']!= "Not assigned"]
data.head()

Unnamed: 0,Postal Code,Borough,Neighborhood
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,"Regent Park, Harbourfront"
5,M6A,North York,"Lawrence Manor, Lawrence Heights"
6,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government"


In [8]:
# group neighbourhood by postal code and borough
data=data.groupby(['Postal Code','Borough'],as_index=False).agg(','.join)
data.head()

Unnamed: 0,Postal Code,Borough,Neighborhood
0,M1B,Scarborough,"Malvern, Rouge"
1,M1C,Scarborough,"Rouge Hill, Port Union, Highland Creek"
2,M1E,Scarborough,"Guildwood, Morningside, West Hill"
3,M1G,Scarborough,Woburn
4,M1H,Scarborough,Cedarbrae


In [9]:
# replace "Not assigned" neighborhoods with their borough
# (assign through .loc with a mask; chained indexing would write to a temporary copy)
mask = data['Neighborhood'] == "Not assigned"
data.loc[mask, 'Neighborhood'] = data.loc[mask, 'Borough']
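Assigning through a boolean mask with `.loc` modifies the frame itself, whereas chained indexing such as `df[mask]['col'] = ...` writes to a temporary copy and leaves the frame unchanged. A minimal sketch on a toy frame (the rows are made up for illustration):

```python
import pandas as pd

# Toy frame; the rows are invented for illustration only
toy = pd.DataFrame({
    'Borough': ['North York', "Queen's Park"],
    'Neighborhood': ['Parkwoods', 'Not assigned'],
})

# Build the mask once, then assign through .loc so the original frame is updated
mask = toy['Neighborhood'] == 'Not assigned'
toy.loc[mask, 'Neighborhood'] = toy.loc[mask, 'Borough']

print(toy)
```

Note that the comparison is case-sensitive, so the string has to match the spelling used in the table exactly.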

In [10]:
# print the shape of cleaning data
print('our data has', data.shape[0],'rows, and', data.shape[1], 'columns')

our data has 103 rows, and 3 columns


In [11]:
data.shape

(103, 3)

### 4. Assign the latitude and longitude

I use a CSV file of geospatial coordinates to add the latitude and longitude of each postal code to our dataset.

In [12]:
lat_lon_df=pd.read_csv('Geospatial_Coordinates.csv')
lat_lon_df.head()


Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


In [13]:
latitude=[]
longitude=[]

for postal_code in list(data['Postal Code']):
    for i in lat_lon_df.index:
        if lat_lon_df.loc[i,'Postal Code']==postal_code:
            latitude.append(lat_lon_df.loc[i,'Latitude'])
            longitude.append(lat_lon_df.loc[i,'Longitude'])

data['Latitude']=latitude
data['Longitude']=longitude

data.head()

Unnamed: 0,Postal Code,Borough,Neighborhood,Latitude,Longitude
0,M1B,Scarborough,"Malvern, Rouge",43.806686,-79.194353
1,M1C,Scarborough,"Rouge Hill, Port Union, Highland Creek",43.784535,-79.160497
2,M1E,Scarborough,"Guildwood, Morningside, West Hill",43.763573,-79.188711
3,M1G,Scarborough,Woburn,43.770992,-79.216917
4,M1H,Scarborough,Cedarbrae,43.773136,-79.239476
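The nested loop above looks up each postal code one row at a time. A left merge on `Postal Code` would likely give the same columns in a single step. The sketch below uses toy stand-ins (`toy_data`, `toy_coords`) rather than the notebook's own frames, with values copied from the first rows shown above:

```python
import pandas as pd

# Toy stand-ins for `data` and `lat_lon_df`
toy_data = pd.DataFrame({'Postal Code': ['M1B', 'M1C'],
                         'Borough': ['Scarborough', 'Scarborough']})
toy_coords = pd.DataFrame({'Postal Code': ['M1C', 'M1B'],
                           'Latitude': [43.784535, 43.806686],
                           'Longitude': [-79.160497, -79.194353]})

# A left merge keeps every row of toy_data and matches coordinates by postal code,
# regardless of the row order in toy_coords
merged = toy_data.merge(toy_coords, on='Postal Code', how='left')
print(merged)
```

A merge also keeps rows whose postal code has no match (with missing coordinates), while the loop would silently produce lists of the wrong length in that case.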


### 5. Cluster the neighbourhoods in Toronto

In [14]:
toronto_map = folium.Map(location=[43.65, -79.4], zoom_start=10)

# cluster the postal codes on their coordinates alone
X = data['Latitude']
Y = data['Longitude']
Z = np.stack((X, Y), axis=1)

kmeans = KMeans(n_clusters=5, random_state=0).fit(Z)

clusters = kmeans.labels_
# named cluster_colors so the matplotlib.colors module imported earlier is not shadowed
cluster_colors = ['red', 'green', 'blue', 'yellow','pink']
data['Cluster'] = clusters

for latitude, longitude, borough, cluster in zip(data['Latitude'], data['Longitude'], data['Borough'], data['Cluster']):
    label = folium.Popup(borough, parse_html=True)
    folium.CircleMarker(
        [latitude, longitude],
        radius=5,
        popup=label,
        color='black',
        fill=True,
        fill_color=cluster_colors[cluster],
        fill_opacity=0.7).add_to(toronto_map)  

toronto_map

In [15]:
toronto_map.save('toronto_map.html')

In [16]:
data.head()

Unnamed: 0,Postal Code,Borough,Neighborhood,Latitude,Longitude,Cluster
0,M1B,Scarborough,"Malvern, Rouge",43.806686,-79.194353,0
1,M1C,Scarborough,"Rouge Hill, Port Union, Highland Creek",43.784535,-79.160497,0
2,M1E,Scarborough,"Guildwood, Morningside, West Hill",43.763573,-79.188711,0
3,M1G,Scarborough,Woburn,43.770992,-79.216917,0
4,M1H,Scarborough,Cedarbrae,43.773136,-79.239476,0


### 6. Use the Foursquare API to get the venues near each neighborhood

In [17]:
# define Foursquare Credentials and Version
CLIENT_ID = 'ID' # your Foursquare ID
CLIENT_SECRET = 'PASSWORD' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET: ' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: ID
CLIENT_SECRET: PASSWORD
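Printing credentials in a notebook leaves them in the saved output. One common alternative is to read them from environment variables and print only a masked form. The sketch below is an assumption-laden example: the variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and both helper functions are hypothetical, not part of the Foursquare API.

```python
import os

def load_foursquare_credentials(env=os.environ):
    """Read credentials from environment variables (the variable names are an assumption)."""
    client_id = env.get('FOURSQUARE_CLIENT_ID')
    client_secret = env.get('FOURSQUARE_CLIENT_SECRET')
    if not client_id or not client_secret:
        raise RuntimeError('Set FOURSQUARE_CLIENT_ID and FOURSQUARE_CLIENT_SECRET first')
    return client_id, client_secret

def mask_secret(value, visible=4):
    """Show only the first few characters so saved output does not expose the full value."""
    return value[:visible] + '*' * max(len(value) - visible, 0)

# Example with a stand-in mapping instead of the real environment
demo_env = {'FOURSQUARE_CLIENT_ID': 'demo-id-1234',
            'FOURSQUARE_CLIENT_SECRET': 'demo-secret-5678'}
cid, secret = load_foursquare_credentials(demo_env)
print('CLIENT_ID:', mask_secret(cid))
print('CLIENT_SECRET:', mask_secret(secret))
```

Calling `load_foursquare_credentials()` with no argument would read from the real process environment.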


Retrieve up to 100 venues within a radius of 2000 meters of each neighborhood

In [18]:
radius = 2000
LIMIT = 100

venues = []

for lat, long, neighborhood in zip(data['Latitude'], data['Longitude'], data['Neighborhood']):
    
    # create the API request URL
    url = "https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}".format(
        CLIENT_ID,
        CLIENT_SECRET,
        VERSION,
        lat,
        long,
        radius, 
        LIMIT)
    
    # make the GET request
    results = requests.get(url).json()["response"]['groups'][0]['items']
    
    # return only relevant information for each nearby venue
    for venue in results:
        venues.append((
            neighborhood,
            lat, 
            long, 
            venue['venue']['name'], 
            venue['venue']['location']['lat'], 
            venue['venue']['location']['lng'],  
            venue['venue']['categories'][0]['name']))
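The loop above indexes `categories[0]` directly, which would raise an `IndexError` for any venue returned without a category. A defensive variant might look like the sketch below. `parse_venues` is a hypothetical helper, and the sample payload is hand-written to mirror the response layout used in the loop; it is not real API output.

```python
def parse_venues(payload, neighborhood, lat, lng):
    """Flatten one explore-style response into a list of tuples.

    Assumes the layout used in the loop above:
    payload['response']['groups'][0]['items'][i]['venue'].
    """
    groups = payload.get('response', {}).get('groups', [])
    items = groups[0].get('items', []) if groups else []
    rows = []
    for item in items:
        venue = item['venue']
        categories = venue.get('categories') or []
        # Fall back to None when a venue has no category instead of raising IndexError
        category = categories[0]['name'] if categories else None
        rows.append((neighborhood, lat, lng, venue['name'],
                     venue['location']['lat'], venue['location']['lng'], category))
    return rows

# Hand-written payload for illustration only
sample = {'response': {'groups': [{'items': [
    {'venue': {'name': 'Toy Cafe', 'location': {'lat': 43.65, 'lng': -79.38},
               'categories': [{'name': 'Café'}]}},
    {'venue': {'name': 'Unlabelled Spot', 'location': {'lat': 43.66, 'lng': -79.39},
               'categories': []}},
]}]}}

rows = parse_venues(sample, 'Toy Neighborhood', 43.65, -79.4)
print(rows)
```

Keeping the parsing in a function also makes it possible to check the logic on a small payload without calling the API.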

In [19]:

# convert the venues list into a new DataFrame
venues_df = pd.DataFrame(venues)

# define the column names
venues_df.columns = ['Neighborhood', 'Latitude', 'Longitude', 'VenueName', 'VenueLatitude', 'VenueLongitude', 'VenueCategory']

print(venues_df.shape)
venues_df.head()

(8648, 7)


Unnamed: 0,Neighborhood,Latitude,Longitude,VenueName,VenueLatitude,VenueLongitude,VenueCategory
0,"Malvern, Rouge",43.806686,-79.194353,African Rainforest Pavilion,43.817725,-79.183433,Zoo Exhibit
1,"Malvern, Rouge",43.806686,-79.194353,Images Salon & Spa,43.802283,-79.198565,Spa
2,"Malvern, Rouge",43.806686,-79.194353,Toronto Pan Am Sports Centre,43.790623,-79.193869,Athletics & Sports
3,"Malvern, Rouge",43.806686,-79.194353,Toronto Zoo,43.820582,-79.181551,Zoo
4,"Malvern, Rouge",43.806686,-79.194353,Polar Bear Exhibit,43.823372,-79.185145,Zoo


Check how many venues are returned for each neighborhood

In [20]:
venues_df.groupby(["Neighborhood"]).count()

Neighborhood,Latitude,Longitude,VenueName,VenueLatitude,VenueLongitude,VenueCategory
Agincourt,100,100,100,100,100,100
"Alderwood, Long Branch",100,100,100,100,100,100
"Bathurst Manor, Wilson Heights, Downsview North",53,53,53,53,53,53
Bayview Village,54,54,54,54,54,54
"Bedford Park, Lawrence Manor East",100,100,100,100,100,100
...,...,...,...,...,...,...
"Willowdale, Willowdale West",54,54,54,54,54,54
Woburn,68,68,68,68,68,68
Woodbine Heights,98,98,98,98,98,98
York Mills West,100,100,100,100,100,100


Look at all the categories in the top 100 venues

In [21]:
print('There are {} unique categories.'.format(len(venues_df['VenueCategory'].unique())))

venues_df['VenueCategory'].unique()

There are 329 unique categories.


array(['Zoo Exhibit', 'Spa', 'Athletics & Sports', 'Zoo', 'Restaurant',
       'Bank', 'Liquor Store', 'Caribbean Restaurant',
       'Paper / Office Supplies Store', 'Fried Chicken Joint',
       'Fast Food Restaurant', 'Pizza Place', 'Gas Station',
       'Skating Rink', 'Bus Station', 'Intersection', 'Pub', 'Park',
       'Curling Ice', 'Grocery Store', 'Mediterranean Restaurant',
       'Burger Joint', 'Italian Restaurant', 'Breakfast Spot', 'Bakery',
       'Neighborhood', 'Pharmacy', 'Ice Cream Shop', 'Mexican Restaurant',
       'Coffee Shop', 'Sandwich Place', 'Beer Store', 'Supermarket',
       'Gym / Fitness Center', 'Diner', 'Gym', 'Discount Store',
       'Food & Drink Shop', 'Mobile Phone Shop', 'Bar', 'Pet Store',
       'Japanese Restaurant', 'Convenience Store', 'Fish & Chips Shop',
       'Smoothie Shop', 'Juice Bar', 'Sports Bar', 'Train Station',
       'Greek Restaurant', 'Hotel', 'Salon / Barbershop',
       'Indian Restaurant', 'Automotive Shop', 'Asian Restaurant

### 7. Explore each neighborhood

In [22]:
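# one hot encoding
toronto_onehot = pd.get_dummies(venues_df[['VenueCategory']], prefix="", prefix_sep="")

# store the neighborhood name of each venue in the 'Neighborhood' column
# note: 'Neighborhood' is also one of the venue categories, so this overwrites
# that dummy column instead of appending a new one
toronto_onehot['Neighborhood'] = venues_df['Neighborhood'] 

# move the last column to the front (here that is 'Zoo Exhibit', not 'Neighborhood',
# because no new column was appended above)
fixed_columns = [toronto_onehot.columns[-1]] + list(toronto_onehot.columns[:-1])
toronto_onehot = toronto_onehot[fixed_columns]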

toronto_onehot.head()

Unnamed: 0,Zoo Exhibit,Accessories Store,Afghan Restaurant,African Restaurant,Airport,American Restaurant,Amphitheater,Antique Shop,Aquarium,Arcade,...,Volleyball Court,Warehouse Store,Whisky Bar,Wine Bar,Wine Shop,Wings Joint,Women's Store,Xinjiang Restaurant,Yoga Studio,Zoo
0,1,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,1
4,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,1
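One detail worth checking here: the category list printed earlier includes a venue category that is itself called `Neighborhood`, so the one-hot frame already has a column with that name before the neighborhood names are stored. Assigning to an existing column name replaces that column rather than appending a new one, which would explain why the first column in the output is `Zoo Exhibit` and the shape stays at 329 columns. A small sketch on made-up data:

```python
import pandas as pd

# Made-up venues; one of them has the category 'Neighborhood'
toy_venues = pd.DataFrame({
    'Neighborhood': ['A', 'A', 'B'],
    'VenueCategory': ['Café', 'Neighborhood', 'Zoo'],
})

onehot = pd.get_dummies(toy_venues[['VenueCategory']], prefix='', prefix_sep='')
columns_before = list(onehot.columns)

# Assigning to the existing name replaces the dummy column in place
onehot['Neighborhood'] = toy_venues['Neighborhood']
columns_after = list(onehot.columns)

print(columns_before)
print(columns_after)
```

The column list is the same before and after the assignment; only the contents of `Neighborhood` change from indicator values to names.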


Let's examine the new data frame size

In [23]:
toronto_onehot.shape

(8648, 329)

#### Next, let's group the rows by neighborhood and take the mean frequency of occurrence of each category

In [24]:
toronto_grouped = toronto_onehot.groupby('Neighborhood').mean().reset_index()
toronto_grouped

Unnamed: 0,Neighborhood,Zoo Exhibit,Accessories Store,Afghan Restaurant,African Restaurant,Airport,American Restaurant,Amphitheater,Antique Shop,Aquarium,...,Volleyball Court,Warehouse Store,Whisky Bar,Wine Bar,Wine Shop,Wings Joint,Women's Store,Xinjiang Restaurant,Yoga Studio,Zoo
0,Agincourt,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.00,0.0,0.0,0.00,0.0
1,"Alderwood, Long Branch",0.0,0.0,0.0,0.0,0.0,0.010000,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.00,0.0
2,"Bathurst Manor, Wilson Heights, Downsview North",0.0,0.0,0.0,0.0,0.0,0.018868,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.00,0.0,0.0,0.00,0.0
3,Bayview Village,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.00,0.0,0.0,0.00,0.0
4,"Bedford Park, Lawrence Manor East",0.0,0.0,0.0,0.0,0.0,0.010000,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.00,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
94,"Willowdale, Willowdale West",0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.00,0.0,0.0,0.00,0.0
95,Woburn,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.00,0.0,0.0,0.00,0.0
96,Woodbine Heights,0.0,0.0,0.0,0.0,0.0,0.020408,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.00,0.0,0.0,0.00,0.0
97,York Mills West,0.0,0.0,0.0,0.0,0.0,0.010000,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.00,0.0,0.0,0.01,0.0
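Because each one-hot row contains a single 1, the mean of a category column within a neighborhood equals the share of that neighborhood's venues falling in the category. A toy illustration with made-up rows:

```python
import pandas as pd

# Four venues in neighborhood A (one coffee shop, three parks), two in B
toy_onehot = pd.DataFrame({
    'Neighborhood': ['A', 'A', 'A', 'A', 'B', 'B'],
    'Coffee Shop':  [1, 0, 0, 0, 1, 0],
    'Park':         [0, 1, 1, 1, 0, 1],
})

# The group mean turns the 0/1 indicators into per-neighborhood frequencies
toy_freq = toy_onehot.groupby('Neighborhood').mean().reset_index()
print(toy_freq)
```

Each row of the result sums to 1 across the category columns, so the values are comparable between neighborhoods that returned different numbers of venues.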


Let's confirm the new size

In [25]:
toronto_grouped.shape

(99, 329)

#### We are only concerned with coffee shops, so create a new dataframe containing only the coffee-related categories

In [26]:
# take an explicit copy so that adding a column does not trigger SettingWithCopyWarning
toronto_cafe=toronto_grouped[['Neighborhood','Coffee Shop','Cafeteria','Café']].copy()
toronto_cafe['Total']=toronto_cafe[['Coffee Shop','Cafeteria','Café']].sum(axis=1)

toronto_cafe.head()


Unnamed: 0,Neighborhood,Coffee Shop,Cafeteria,Café,Total
0,Agincourt,0.08,0.0,0.0,0.08
1,"Alderwood, Long Branch",0.1,0.0,0.03,0.13
2,"Bathurst Manor, Wilson Heights, Downsview North",0.075472,0.0,0.018868,0.09434
3,Bayview Village,0.074074,0.0,0.037037,0.111111
4,"Bedford Park, Lawrence Manor East",0.11,0.0,0.03,0.14


#### Let's print the coffee shop frequencies for each neighborhood of Toronto

In [27]:
for hood in toronto_cafe['Neighborhood']:
    print("----"+hood+"----")
    temp = toronto_cafe[toronto_cafe['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True))
    
    print('\n')

----Agincourt----
         venue  freq
0  Coffee Shop  0.08
1        Total  0.08
2    Cafeteria  0.00
3         Café  0.00


----Alderwood, Long Branch----
         venue  freq
0        Total  0.13
1  Coffee Shop  0.10
2         Café  0.03
3    Cafeteria  0.00


----Bathurst Manor, Wilson Heights, Downsview North----
         venue  freq
0        Total  0.09
1  Coffee Shop  0.08
2         Café  0.02
3    Cafeteria  0.00


----Bayview Village----
         venue  freq
0        Total  0.11
1  Coffee Shop  0.07
2         Café  0.04
3    Cafeteria  0.00


----Bedford Park, Lawrence Manor East----
         venue  freq
0        Total  0.14
1  Coffee Shop  0.11
2         Café  0.03
3    Cafeteria  0.00


----Berczy Park----
         venue  freq
0        Total  0.13
1  Coffee Shop  0.09
2         Café  0.04
3    Cafeteria  0.00


----Birch Cliff, Cliffside West----
         venue  freq
0        Total  0.15
1  Coffee Shop  0.13
2         Café  0.02
3    Cafeteria  0.00


----Brockton, Parkdale V

#### Let's also review the most common venues in each neighborhood, observing coffee shop, café and cafeteria individually

The sum of coffee shop, cafeteria and café is used as the total coffee share for our evaluation.

Write a function to sort the venues in descending order of frequency

In [28]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]
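To see what the function returns, it can be applied to a single hand-made row. The sketch repeats the function so that it runs on its own; the frequencies are made up.

```python
import pandas as pd

def return_most_common_venues(row, num_top_venues):
    # skip the first entry (the neighborhood name) and sort the frequencies
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    return row_categories_sorted.index.values[0:num_top_venues]

# Made-up frequencies for a single neighborhood
toy_row = pd.Series({'Neighborhood': 'A', 'Park': 0.3, 'Coffee Shop': 0.5, 'Zoo': 0.2})
top2 = return_most_common_venues(toy_row, 2)
print(list(top2))
```

The function relies on the neighborhood name being the first entry of the row, since `iloc[1:]` drops whatever sits in that position.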

Create a new data frame with the top 5 venues in each neighborhood of Toronto

In [29]:
num_top_venues = 5

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        # positions past the 3rd use the 'th' suffix
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = toronto_grouped['Neighborhood']

for ind in np.arange(toronto_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(toronto_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,Agincourt,Chinese Restaurant,Coffee Shop,Restaurant,Pharmacy,Cantonese Restaurant
1,"Alderwood, Long Branch",Coffee Shop,Fast Food Restaurant,Department Store,Pizza Place,Breakfast Spot
2,"Bathurst Manor, Wilson Heights, Downsview North",Coffee Shop,Pizza Place,Park,Bank,Deli / Bodega
3,Bayview Village,Park,Chinese Restaurant,Coffee Shop,Shopping Mall,Bank
4,"Bedford Park, Lawrence Manor East",Coffee Shop,Sushi Restaurant,Italian Restaurant,Bakery,Sandwich Place
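The column labels above are built from a three-item suffix list plus a fallback to `th`, which is fine for five columns but would produce labels such as "21th" for longer lists. A small helper could cover the general case; `ordinal` is a hypothetical name introduced here for illustration.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st'."""
    # 11, 12 and 13 take 'th' even though they end in 1, 2 and 3
    if 10 <= n % 100 <= 20:
        suffix = 'th'
    else:
        suffix = {1: 'st', 2: 'nd', 3: 'rd'}.get(n % 10, 'th')
    return '{}{}'.format(n, suffix)

# Rebuild the same five column labels used in the table above
columns = ['Neighborhood'] + ['{} Most Common Venue'.format(ordinal(i)) for i in range(1, 6)]
print(columns)
```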


### 8. Clustering the neighborhoods

Run k-means to cluster the neighborhoods of Toronto. Note that the input is the frequency table of all venue categories (`toronto_grouped`), not only the coffee-related columns:

In [30]:
# set number of clusters
kclusters = 5

toronto_clustering = toronto_grouped.drop(['Neighborhood'],axis=1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(toronto_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([4, 4, 4, 4, 1, 2, 4, 1, 1, 1], dtype=int32)
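The number of clusters is fixed at 5 here. One common way to sanity-check such a choice is to compare the k-means inertia (within-cluster sum of squared distances) over a range of k and look for the point where it stops dropping sharply. The sketch below runs on synthetic data rather than `toronto_clustering`, so the numbers say nothing about Toronto; `n_init=10` is set explicitly as an assumption about a sensible number of restarts.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic stand-in for the frequency table: three well-separated blobs
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]])
X = np.vstack([c + 0.1 * rng.standard_normal((20, 2)) for c in centers])

inertias = {}
for k in range(1, 6):
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    # inertia_ is the within-cluster sum of squared distances
    inertias[k] = model.inertia_

for k, value in inertias.items():
    print(k, round(value, 2))
```

On data like this the inertia falls steeply up to k=3 and only marginally afterwards; on real data the bend is usually less clear.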

Now let's create a new data frame with the cluster labels from k-means and the top 5 venues

In [31]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

In [32]:
toronto_merged = data

In [33]:
# join the cluster labels and top venues onto data, which holds the latitude/longitude for each neighborhood
toronto_merged = toronto_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

toronto_merged.head() # check the last columns!

Unnamed: 0,Postal Code,Borough,Neighborhood,Latitude,Longitude,Cluster,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,M1B,Scarborough,"Malvern, Rouge",43.806686,-79.194353,0,3,Zoo Exhibit,Restaurant,Fast Food Restaurant,Athletics & Sports,Bus Station
1,M1C,Scarborough,"Rouge Hill, Port Union, Highland Creek",43.784535,-79.160497,0,4,Breakfast Spot,Pizza Place,Coffee Shop,Pet Store,Bank
2,M1E,Scarborough,"Guildwood, Morningside, West Hill",43.763573,-79.188711,0,4,Pizza Place,Coffee Shop,Park,Bank,Breakfast Spot
3,M1G,Scarborough,Woburn,43.770992,-79.216917,0,4,Coffee Shop,Fast Food Restaurant,Discount Store,Bank,Sandwich Place
4,M1H,Scarborough,Cedarbrae,43.773136,-79.239476,0,4,Coffee Shop,Clothing Store,Gas Station,Sandwich Place,Bank


### Finally, let's visualize the result

In [34]:
# create map
import matplotlib.colors as colors
import folium 

map_clusters = folium.Map(location=[43.65, -79.4], zoom_start=11)

# set color scheme for the clusters: one evenly spaced rainbow color per cluster
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]


# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(toronto_merged['Latitude'], toronto_merged['Longitude'], toronto_merged['Neighborhood'], toronto_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
    
map_clusters       

In [35]:
map_clusters.save("toronto_cluster.html")

### 9. Examine each cluster

#### Cluster 1

In [36]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 0, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
16,Scarborough,0,0,Golf Course,Farm,Playground,Trail,Sculpture Garden


#### Cluster 2

In [37]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 1, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
36,East York,2,1,Park,Coffee Shop,Pizza Place,Café,Thai Restaurant
37,East Toronto,2,1,Coffee Shop,Pub,Breakfast Spot,Beach,Japanese Restaurant
38,East York,2,1,Coffee Shop,Indian Restaurant,Park,Bakery,Grocery Store
40,East York,2,1,Café,Greek Restaurant,Coffee Shop,Bakery,Gastropub
41,East Toronto,2,1,Café,Greek Restaurant,Park,Vietnamese Restaurant,Bakery
42,East Toronto,2,1,Park,Café,Brewery,Coffee Shop,Beach
43,East Toronto,2,1,Coffee Shop,Park,Bakery,Vietnamese Restaurant,Brewery
44,Central Toronto,4,1,Coffee Shop,Italian Restaurant,Sushi Restaurant,Pizza Place,Pub
45,Central Toronto,4,1,Coffee Shop,Italian Restaurant,Park,Bakery,Café
46,Central Toronto,4,1,Italian Restaurant,Coffee Shop,Café,Sushi Restaurant,Bakery


#### Cluster 3

In [38]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 2, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
21,North York,4,2,Korean Restaurant,Coffee Shop,Bubble Tea Shop,Café,Middle Eastern Restaurant
22,North York,4,2,Korean Restaurant,Grocery Store,Japanese Restaurant,Bubble Tea Shop,Supermarket
52,Downtown Toronto,3,2,Coffee Shop,Japanese Restaurant,Café,Park,Diner
53,Downtown Toronto,3,2,Coffee Shop,Park,Japanese Restaurant,Café,Restaurant
54,Downtown Toronto,3,2,Coffee Shop,Park,Gastropub,Japanese Restaurant,Café
55,Downtown Toronto,3,2,Coffee Shop,Japanese Restaurant,Hotel,Park,Theater
56,Downtown Toronto,3,2,Coffee Shop,Park,Hotel,Café,Japanese Restaurant
57,Downtown Toronto,3,2,Café,Bookstore,Park,Restaurant,Yoga Studio
58,Downtown Toronto,3,2,Coffee Shop,Hotel,Café,Plaza,Park
59,Downtown Toronto,3,2,Café,Hotel,Gym,Park,Coffee Shop


#### Cluster 4

In [39]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 3, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,Scarborough,0,3,Zoo Exhibit,Restaurant,Fast Food Restaurant,Athletics & Sports,Bus Station


#### Cluster 5

In [None]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 4, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Based on these results, Clusters 2, 3 and 5 appear more suitable for running a coffee shop.
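One way to back this comparison with numbers would be to average the combined coffee share (`Total`) within each cluster. The sketch below uses a made-up frame with hypothetical values; in the notebook the inputs would come from `toronto_cafe` and the k-means labels.

```python
import pandas as pd

# Hypothetical shares and labels, for illustration only
toy_cafe = pd.DataFrame({
    'Neighborhood': ['A', 'B', 'C', 'D'],
    'Total': [0.14, 0.10, 0.02, 0.00],
    'Cluster Labels': [1, 1, 3, 0],
})

# Average coffee share per cluster, highest first
share_by_cluster = (toy_cafe.groupby('Cluster Labels')['Total']
                    .mean()
                    .sort_values(ascending=False))
print(share_by_cluster)
```

Note that the section headings count clusters from 1 while the k-means labels start at 0, so "Cluster 2" corresponds to label 1.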