## Part 1
<u>*Data scraping*</u>
First, import the modules you will need:
- pandas (dataframes)
- numpy
- BeautifulSoup (to parse the HTML of the page and scrape it)
- requests (to fetch the URL)

In [2]:
import pandas as pd
import numpy as np
from bs4 import BeautifulSoup
import requests


- Obtain the Wikipedia URL that lists the neighborhoods and postcodes.
- Create the BeautifulSoup object.
    - Find the table in the soup object and prettify it so we can inspect it.

In [3]:
wiki_url = 'https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'
data = requests.get(wiki_url).text
soup = BeautifulSoup(data, 'html5lib')

In [4]:
table = soup.find('table')
#print(table.prettify())

- Create a list to hold the data we scrape.
- Go through each cell in the table on the wiki article and scrape the postal code, borough and neighborhood from it.
    - The borough and neighborhood share one string, so we split on the opening bracket: the name outside the brackets is the borough and the name(s) inside the brackets are the neighborhood(s).
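
The bracket split described above can be tried on its own before running it against the whole table. This is a minimal sketch: `split_cell` is a hypothetical helper, and the sample strings are assumed to resemble the cell text on the page rather than copied from it.

```python
def split_cell(text):
    """Split 'Borough(Neighborhood / Neighborhood)' into its two parts."""
    # Everything before the first '(' is the borough.
    borough, _, rest = text.partition('(')
    # Drop the closing bracket, then turn '/' separators into ', '.
    names = [part.strip() for part in rest.rstrip(')').split('/')]
    neighborhood = ', '.join(name for name in names if name)
    return borough.strip(), neighborhood


print(split_cell('North York(Parkwoods)'))
print(split_cell('Downtown Toronto(Regent Park / Harbourfront)'))
```

Using `partition` means a cell without brackets returns an empty neighborhood instead of raising an `IndexError`.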

In [5]:
table_contents=[]
for row in table.findAll('td'):
    cell = {}
    if row.span.text=='Not assigned':
        pass
    else:
        cell['PostalCode'] = row.p.text[:3]
        cell['Borough'] = (row.span.text).split('(')[0]
        cell['Neighborhood'] = (((((row.span.text).split('(')[1]).strip(')')).replace(' /',',')).replace(')',' ')).strip(' ')
        table_contents.append(cell)

# print(table_contents)
df=pd.DataFrame(table_contents)
df['Borough']=df['Borough'].replace({'Downtown TorontoStn A PO Boxes25 The Esplanade':'Downtown Toronto Stn A',
                                             'East TorontoBusiness reply mail Processing Centre969 Eastern':'East Toronto Business',
                                             'EtobicokeNorthwest':'Etobicoke Northwest','East YorkEast Toronto':'East York/East Toronto',
                                             'MississaugaCanada Post Gateway Processing Centre':'Mississauga'})

In [6]:
df

Unnamed: 0,PostalCode,Borough,Neighborhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,"Regent Park, Harbourfront"
3,M6A,North York,"Lawrence Manor, Lawrence Heights"
4,M7A,Queen's Park,Ontario Provincial Government
...,...,...,...
98,M8X,Etobicoke,"The Kingsway, Montgomery Road, Old Mill North"
99,M4Y,Downtown Toronto,Church and Wellesley
100,M7Y,East Toronto Business,Enclave of M4L
101,M8Y,Etobicoke,"Old Mill South, King's Mill Park, Sunnylea, Hu..."


## Part 2
<u>*Coordinates*</u>
<br>
The code below was adapted to loop through every postal code in the dataframe and look up its latitude and longitude. However, the geocoder kept returning None, so I went ahead with the .csv file of coordinates instead.

In [7]:
#import geocoder # import geocoder

# initialize your variable to None
#lat_lng_coords = None
#Latitude = []
#Longitude = []

# loop until you get the coordinates
#for pc in df['PostalCode']:
    #postal_code = pc
    #print(postal_code)
   # while(lat_lng_coords is None):
        #g = geocoder.google('{}, Toronto, Ontario'.format(postal_code))
       # lat_lng_coords = g.latlng
    #print(postal_code)
    #print(lat_lng_coords[0])
    #print(lat_lng_coords[1])
    #Latitude.append(lat_lng_coords[0])
   # Longitude.append(lat_lng_coords[1])
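
Because the lookup kept returning None, a `while` loop with no exit condition would never finish. One possible alternative is to cap the number of attempts. The sketch below uses a hypothetical `lookup_with_retries` helper and a stand-in `fake_lookup` function in place of a real geocoding call, so it runs without network access.

```python
import time


def lookup_with_retries(lookup, query, max_attempts=3, delay=0.0):
    """Call lookup(query) until it returns a value or attempts run out."""
    for _ in range(max_attempts):
        result = lookup(query)
        if result is not None:
            return result
        time.sleep(delay)  # pause briefly before trying again
    return None


# Stand-in for a geocoding call: it fails twice and then succeeds.
calls = {'count': 0}


def fake_lookup(query):
    calls['count'] += 1
    return [43.806686, -79.194353] if calls['count'] >= 3 else None


print(lookup_with_retries(fake_lookup, 'M1B, Toronto, Ontario'))
```

A postal code that never resolves then yields None after `max_attempts` tries, which the caller can record as missing rather than looping forever.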

In [8]:
coords = pd.read_csv('C:/Users/swamp/Documents/Geospatial_Coordinates.csv')
coords

Unnamed: 0,PostalCode,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476
...,...,...,...
98,M9N,43.706876,-79.518188
99,M9P,43.696319,-79.532242
100,M9R,43.688905,-79.554724
101,M9V,43.739416,-79.588437


Here I merge the two dataframes on the column they share, 'PostalCode'.

In [9]:
df = pd.merge(df, coords, on='PostalCode')
df.head(12)

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.753259,-79.329656
1,M4A,North York,Victoria Village,43.725882,-79.315572
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636
3,M6A,North York,"Lawrence Manor, Lawrence Heights",43.718518,-79.464763
4,M7A,Queen's Park,Ontario Provincial Government,43.662301,-79.389494
5,M9A,Etobicoke,Islington Avenue,43.667856,-79.532242
6,M1B,Scarborough,"Malvern, Rouge",43.806686,-79.194353
7,M3B,North York,Don Mills North,43.745906,-79.352188
8,M4B,East York,"Parkview Hill, Woodbine Gardens",43.706397,-79.309937
9,M5B,Downtown Toronto,"Garden District, Ryerson",43.657162,-79.378937
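
A default `pd.merge` is an inner join, so any postal code missing from the coordinates file is dropped without warning. The sketch below shows one way to check for that, using small made-up frames (the `M0Z` row is hypothetical) rather than the real data.

```python
import pandas as pd

left = pd.DataFrame({'PostalCode': ['M3A', 'M4A', 'M0Z'],
                     'Borough': ['North York', 'North York', 'Hypothetical']})
right = pd.DataFrame({'PostalCode': ['M3A', 'M4A'],
                      'Latitude': [43.753259, 43.725882],
                      'Longitude': [-79.329656, -79.315572]})

# how='left' keeps every scraped row; indicator=True adds a '_merge' column
# saying whether each row found a match in the coordinates frame.
checked = left.merge(right, on='PostalCode', how='left', indicator=True)
unmatched = checked.loc[checked['_merge'] == 'left_only', 'PostalCode'].tolist()
print(unmatched)
```

An empty `unmatched` list would confirm that the inner join lost nothing.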


## Part 3
<u>*Clustering*</u>
<br> Dependencies we already have
- numpy
- pandas
- requests
<br>

So we still need folium, sklearn KMeans, Matplotlib, pandas.io.json, json, and geopy.geocoders


In [10]:
import json
from geopy.geocoders import Nominatim
from pandas import json_normalize  # pandas.io.json.json_normalize is deprecated since pandas 1.0
import matplotlib.cm as cm
import matplotlib.colors as colors
from sklearn.cluster import KMeans
import folium

Let's first see how many boroughs and neighborhoods Toronto has. We will then use geopy's Nominatim geocoder to obtain the coordinates of Toronto. After that, we use folium to create a map and add markers to it to show the different neighborhoods.

In [13]:
print(f"The dataframe has {len(df['Borough'].unique())} boroughs and {df.shape[0]} neighborhoods.")

The dataframe has 15 boroughs and 103 neighborhoods.


In [15]:
address = 'Toronto, ON'

geolocator = Nominatim(user_agent="TR_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print(f'The geographical coordinates of Toronto are {latitude}, {longitude}.')

The geographical coordinates of Toronto are 43.6534817, -79.3839347.


In [19]:
#creating a map of Toronto
map_toronto = folium.Map(location=[latitude, longitude], zoom_start=10.5)

#adding markers
for lat, lng, borough, neighborhood in zip(df['Latitude'], df['Longitude'], df['Borough'], df['Neighborhood']):
    label = f'{neighborhood},{borough}'
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto)

map_toronto                                                                                            

Let's zoom in a bit more on downtown Toronto. Since Downtown Toronto alone only provides around 17 neighborhoods, we are going to use Downtown, Central, East, and West Toronto to generate this dataset.

In [43]:
DTT_data = df[df['Borough'] =='Downtown Toronto'].reset_index(drop=True)
WT_data =  df[df['Borough']== 'West Toronto'].reset_index(drop=True) 
CT_data =  df[df['Borough'] == 'Central Toronto'].reset_index(drop=True)
ET_data =  df[df['Borough'] =='East Toronto'].reset_index(drop=True)

In [45]:
T_data = pd.concat([DTT_data, WT_data, CT_data, ET_data], ignore_index = True)
T_data.head()

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude
0,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636
1,M5B,Downtown Toronto,"Garden District, Ryerson",43.657162,-79.378937
2,M5C,Downtown Toronto,St. James Town,43.651494,-79.375418
3,M5E,Downtown Toronto,Berczy Park,43.644771,-79.373306
4,M5G,Downtown Toronto,Central Bay Street,43.657952,-79.387383
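
The four filters plus `pd.concat` can also be written as a single `isin` filter. This is a sketch on a small illustrative frame, not the real data. One difference to be aware of: `isin` keeps the original row order, whereas the concat approach groups rows borough by borough.

```python
import pandas as pd

# Illustrative rows only; the real dataframe has many more.
sample = pd.DataFrame({
    'PostalCode': ['M3A', 'M5A', 'M4E', 'M5B'],
    'Borough': ['North York', 'Downtown Toronto', 'East Toronto', 'Downtown Toronto'],
})

wanted = ['Downtown Toronto', 'West Toronto', 'Central Toronto', 'East Toronto']

# Keep only rows whose borough is in the wanted list, then renumber the index.
subset = sample[sample['Borough'].isin(wanted)].reset_index(drop=True)
print(subset['PostalCode'].tolist())
```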


Let's get the geographical coordinates of Downtown Toronto and then plot all these neighborhoods.

In [50]:
address = 'Downtown Toronto, ON'

geolocator = Nominatim(user_agent="TR_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print(f'The geographical coordinates of Downtown Toronto are {latitude}, {longitude}.')

#creating a map of downtown Toronto
map_DTtoronto = folium.Map(location=[latitude, longitude], zoom_start=11)

#adding markers
for lat, lng, borough, neighborhood in zip(T_data['Latitude'], T_data['Longitude'], T_data['Borough'], T_data['Neighborhood']):
    label = f'{neighborhood},{borough}'
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_DTtoronto)

map_DTtoronto

The geographical coordinates of Downtown Toronto are 43.6563221, -79.3809161.


Now we start using the Foursquare API to explore the neighborhoods and segment them.

### Define Foursquare Credentials and version

In [51]:
CLIENT_ID = 'JSPIGTXNCYYYVMGTZAEMV0OAXVEU544QSOOODC02WV2TJXJA' # your Foursquare ID
CLIENT_SECRET = 'WXHUIRGYZ0CBDXZN2XFPXXGHSYLAVJUZ3G01RM5EP4WLWLP3' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version
LIMIT = 100 # A default Foursquare API limit value

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: JSPIGTXNCYYYVMGTZAEMV0OAXVEU544QSOOODC02WV2TJXJA
CLIENT_SECRET:WXHUIRGYZ0CBDXZN2XFPXXGHSYLAVJUZ3G01RM5EP4WLWLP3
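
Hard-coding and printing API credentials in a shared notebook exposes them to anyone who reads it. A common alternative is to read them from environment variables and mask them when printing. The sketch below assumes hypothetical variable names (`FOURSQUARE_CLIENT_ID`, `FOURSQUARE_CLIENT_SECRET`) and uses a stand-in mapping with dummy values instead of the real environment.

```python
import os


def load_credentials(env=os.environ):
    """Read Foursquare credentials from environment variables."""
    client_id = env.get('FOURSQUARE_CLIENT_ID')
    client_secret = env.get('FOURSQUARE_CLIENT_SECRET')
    if not client_id or not client_secret:
        raise RuntimeError('Set FOURSQUARE_CLIENT_ID and FOURSQUARE_CLIENT_SECRET first.')
    return client_id, client_secret


def mask(value, visible=4):
    """Show only the first few characters of a secret."""
    return value[:visible] + '*' * max(len(value) - visible, 0)


# Demonstration with a stand-in mapping rather than the real environment.
demo_id, demo_secret = load_credentials({'FOURSQUARE_CLIENT_ID': 'EXAMPLEID123',
                                         'FOURSQUARE_CLIENT_SECRET': 'EXAMPLESECRET'})
print('CLIENT_ID: ' + mask(demo_id))
```

In the notebook itself, `CLIENT_ID, CLIENT_SECRET = load_credentials()` would then replace the literal strings.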


### Exploring the Neighborhoods
Using the function from the tutorial, we obtain venues for all the neighborhoods in our list.

In [53]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

Toronto_venues = getNearbyVenues(names=T_data['Neighborhood'],
                                   latitudes=T_data['Latitude'],
                                   longitudes=T_data['Longitude']
                                  )

Regent Park, Harbourfront
Garden District, Ryerson
St. James Town
Berczy Park
Central Bay Street
Christie
Richmond, Adelaide, King
Harbourfront East, Union Station, Toronto Islands
Toronto Dominion Centre, Design Exchange
Commerce Court, Victoria Hotel
University of Toronto, Harbord
Kensington Market, Chinatown, Grange Park
CN Tower, King and Spadina, Railway Lands, Harbourfront West, Bathurst Quay, South Niagara, Island airport
Rosedale
St. James Town, Cabbagetown
First Canadian Place, Underground city
Church and Wellesley
Dufferin, Dovercourt Village
Little Portugal, Trinity
Brockton, Parkdale Village, Exhibition Place
High Park, The Junction South
Parkdale, Roncesvalles
Runnymede, Swansea
Lawrence Park
Roselawn
Davisville North
Forest Hill North & West
North Toronto West
The Annex, North Midtown, Yorkville
Davisville
Moore Park, Summerhill East
Summerhill West, Rathnelly, South Hill, Forest Hill SE, Deer Park
The Beaches
The Danforth West, Riverdale
India Bazaar, The Beaches West
St
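
`getNearbyVenues` indexes straight into the JSON response, so a response without `groups`, or a venue with no categories, raises an exception partway through the loop. The sketch below is one defensive way to pull the same fields out. `extract_venues` is a hypothetical helper, and the sample payload only mirrors the keys the function above relies on; it is not a recorded API response.

```python
def extract_venues(payload):
    """Return (name, lat, lng, category) tuples from an explore-style payload."""
    groups = payload.get('response', {}).get('groups', [])
    items = groups[0].get('items', []) if groups else []
    rows = []
    for item in items:
        venue = item.get('venue', {})
        location = venue.get('location', {})
        categories = venue.get('categories', [])
        # Some venues may come back without a category.
        category = categories[0]['name'] if categories else None
        rows.append((venue.get('name'), location.get('lat'), location.get('lng'), category))
    return rows


sample = {'response': {'groups': [{'items': [
    {'venue': {'name': 'Tandem Coffee',
               'location': {'lat': 43.653559, 'lng': -79.361809},
               'categories': [{'name': 'Coffee Shop'}]}},
    {'venue': {'name': 'Uncategorized Place',
               'location': {'lat': 43.65, 'lng': -79.36},
               'categories': []}},
]}]}}

print(extract_venues(sample))
print(extract_venues({'response': {}}))
```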

In [54]:
print(Toronto_venues.shape)
Toronto_venues.head()

(1476, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,"Regent Park, Harbourfront",43.65426,-79.360636,Roselle Desserts,43.653447,-79.362017,Bakery
1,"Regent Park, Harbourfront",43.65426,-79.360636,Tandem Coffee,43.653559,-79.361809,Coffee Shop
2,"Regent Park, Harbourfront",43.65426,-79.360636,Cooper Koo Family YMCA,43.653249,-79.358008,Distribution Center
3,"Regent Park, Harbourfront",43.65426,-79.360636,Impact Kitchen,43.656369,-79.35698,Restaurant
4,"Regent Park, Harbourfront",43.65426,-79.360636,Body Blitz Spa East,43.654735,-79.359874,Spa


#### How many unique categories are there among the returned venues?

In [58]:
print(f"There are {len(Toronto_venues['Venue Category'].unique())} unique categories.")

There are 224 unique categories.


## Now we analyze each neighborhood

In [60]:
#Use one hot encoding
toronto_onehot = pd.get_dummies(Toronto_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
toronto_onehot['Neighborhood'] = Toronto_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [toronto_onehot.columns[-1]] + list(toronto_onehot.columns[:-1])
toronto_onehot = toronto_onehot[fixed_columns]

toronto_onehot.head()

Unnamed: 0,Yoga Studio,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Antique Shop,Aquarium,...,Theme Restaurant,Tibetan Restaurant,Toy / Game Store,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Wine Bar,Wine Shop
0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [62]:
toronto_onehot.shape

(1476, 224)
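
The shapes above hint at a name collision. There are 224 unique categories, yet the one-hot frame still has 224 columns after the neighborhood column is added, and `Yoga Studio` rather than `Neighborhood` ends up as the first column. That is consistent with a venue category literally named `Neighborhood` being overwritten by the assignment, although this is an inference from the outputs and has not been checked against the data. The sketch below reproduces the situation on made-up rows and sidesteps it with a distinct, hypothetical column name.

```python
import pandas as pd

# Made-up rows, including a venue whose category is literally 'Neighborhood'.
venues = pd.DataFrame({
    'Neighborhood': ['Christie', 'Christie', 'Rosedale'],
    'Venue Category': ['Café', 'Neighborhood', 'Park'],
})

onehot = pd.get_dummies(venues[['Venue Category']], prefix='', prefix_sep='')

# insert() places the column first and refuses duplicate names,
# so a distinct label keeps the 'Neighborhood' category intact.
onehot.insert(0, 'Neighborhood Name', venues['Neighborhood'])
print(onehot.columns.tolist())
```

Grouping would then use `'Neighborhood Name'` instead of `'Neighborhood'`.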

##### Now let's group rows by neighborhood, taking the mean of the frequency of occurrence of each category

In [63]:
toronto_grouped = toronto_onehot.groupby('Neighborhood').mean().reset_index()
toronto_grouped

Unnamed: 0,Neighborhood,Yoga Studio,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Antique Shop,...,Theme Restaurant,Tibetan Restaurant,Toy / Game Store,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Wine Bar,Wine Shop
0,Berczy Park,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.017241,0.0,0.0,0.0,0.0
1,"Brockton, Parkdale Village, Exhibition Place",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,"CN Tower, King and Spadina, Railway Lands, Har...",0.0,0.066667,0.066667,0.066667,0.066667,0.2,0.066667,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Central Bay Street,0.014706,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.014706,0.0,0.0,0.0,0.0
4,Christie,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,Church and Wellesley,0.025641,0.0,0.0,0.0,0.0,0.0,0.0,0.012821,0.0,...,0.012821,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
6,"Commerce Court, Victoria Hotel",0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.03,0.0,...,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.01,0.0
7,Davisville,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.029412,0.0,0.0,0.0,0.0,0.0,0.0,0.0
8,Davisville North,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9,"Dufferin, Dovercourt Village",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0625


In [64]:
toronto_grouped.shape

(36, 224)
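
To see why the mean of one-hot columns gives a frequency, here is a small sketch on made-up rows: each row is one venue, so the mean per neighborhood is the share of that neighborhood's venues falling in each category.

```python
import pandas as pd

# Made-up one-hot rows: four venues in Christie, one in Rosedale.
onehot = pd.DataFrame({
    'Neighborhood': ['Christie', 'Christie', 'Christie', 'Christie', 'Rosedale'],
    'Café': [1, 0, 0, 0, 0],
    'Park': [0, 1, 0, 0, 1],
    'Grocery Store': [0, 0, 1, 1, 0],
})

# The mean of a 0/1 column within a group is the fraction of ones.
freq = onehot.groupby('Neighborhood').mean().reset_index()
print(freq)
```

Each row of `freq` sums to 1 across the category columns, which is what makes neighborhoods with very different venue counts comparable.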

##### Now we print each neighborhood along with the top 5 most common venues

In [65]:
num_top_venues = 5

for hood in toronto_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = toronto_grouped[toronto_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Berczy Park----
          venue  freq
0   Coffee Shop  0.05
1  Cocktail Bar  0.05
2        Bakery  0.05
3      Beer Bar  0.03
4           Pub  0.03


----Brockton, Parkdale Village, Exhibition Place----
                venue  freq
0                Café  0.14
1      Breakfast Spot  0.09
2         Coffee Shop  0.09
3              Bakery  0.05
4  Italian Restaurant  0.05


----CN Tower, King and Spadina, Railway Lands, Harbourfront West, Bathurst Quay, South Niagara, Island airport----
                 venue  freq
0      Airport Service  0.20
1      Harbor / Marina  0.07
2  Rental Car Location  0.07
3   Airport Food Court  0.07
4         Airport Gate  0.07


----Central Bay Street----
                venue  freq
0         Coffee Shop  0.18
1      Sandwich Place  0.06
2                Café  0.06
3          Restaurant  0.04
4  Italian Restaurant  0.04


----Christie----
           venue  freq
0  Grocery Store  0.25
1           Café  0.19
2           Park  0.12
3     Baby Store  0.06
4  

<b>Now we put that into a *pandas* dataframe.</b>
<br>
First, we write a function to sort the venues in descending order, and then we create a dataframe to display the top 10 venues for each neighborhood.

In [66]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        # positions beyond the third fall back to the 'th' suffix
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = toronto_grouped['Neighborhood']

for ind in np.arange(toronto_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(toronto_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Berczy Park,Coffee Shop,Cocktail Bar,Bakery,Beer Bar,Pub,Pharmacy,Seafood Restaurant,Cheese Shop,Farmers Market,Restaurant
1,"Brockton, Parkdale Village, Exhibition Place",Café,Breakfast Spot,Coffee Shop,Bakery,Italian Restaurant,Stadium,Furniture / Home Store,Nightclub,Climbing Gym,Bar
2,"CN Tower, King and Spadina, Railway Lands, Har...",Airport Service,Harbor / Marina,Rental Car Location,Airport Food Court,Airport Gate,Airport Lounge,Airport Terminal,Plane,Bar,Sculpture Garden
3,Central Bay Street,Coffee Shop,Sandwich Place,Café,Restaurant,Italian Restaurant,Salad Place,Bubble Tea Shop,Japanese Restaurant,Burger Joint,Spa
4,Christie,Grocery Store,Café,Park,Baby Store,Restaurant,Candy Store,Athletics & Sports,Italian Restaurant,Nightclub,Coffee Shop
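
The `st`/`nd`/`rd` lookup above is fine for ten columns, but it would label a 21st column as `21th`. If `num_top_venues` ever grows, a small helper could handle the general case. `ordinal` below is a hypothetical function, not part of the tutorial code.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st'."""
    if 10 <= n % 100 <= 20:
        suffix = 'th'  # 11th, 12th, 13th and the rest of the teens
    else:
        suffix = {1: 'st', 2: 'nd', 3: 'rd'}.get(n % 10, 'th')
    return f'{n}{suffix}'


columns = ['Neighborhood'] + [f'{ordinal(i)} Most Common Venue' for i in range(1, 11)]
print(columns[1], columns[4], columns[10])
```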


### Clustering
Now we run k-means to cluster the neighborhoods into 5 clusters.

In [67]:
# set number of clusters
kclusters = 5

toronto_grouped_clustering = toronto_grouped.drop(columns='Neighborhood')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(toronto_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1])
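
The first ten labels are all 1, which suggests one cluster dominates. Counting how many rows fall in each cluster makes that visible before any mapping is done. The sketch below uses a hypothetical `cluster_sizes` helper and made-up labels. In the notebook it could be called as `cluster_sizes(kmeans.labels_)`.

```python
from collections import Counter


def cluster_sizes(labels):
    """Map each cluster label to the number of rows assigned to it."""
    return dict(sorted(Counter(int(label) for label in labels).items()))


# Made-up labels with one dominant cluster, for illustration only.
labels = [1, 1, 1, 0, 1, 3, 1, 2, 1, 3, 4, 1]
print(cluster_sizes(labels))
```

A heavily lopsided result is a hint that a different number of clusters, or different features, may be worth trying.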

Creating a new dataframe that includes the clusters and top 10 venues

In [68]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

toronto_merged = T_data

# join neighborhoods_venues_sorted onto T_data to add the cluster label and top venues for each neighborhood
toronto_merged = toronto_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

toronto_merged.head() # check the last columns!

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636,1,Coffee Shop,Bakery,Park,Pub,Breakfast Spot,Café,Theater,Chocolate Shop,Mexican Restaurant,Spa
1,M5B,Downtown Toronto,"Garden District, Ryerson",43.657162,-79.378937,1,Coffee Shop,Clothing Store,Cosmetics Shop,Café,Japanese Restaurant,Bubble Tea Shop,Ramen Restaurant,Theater,Bookstore,Middle Eastern Restaurant
2,M5C,Downtown Toronto,St. James Town,43.651494,-79.375418,1,Coffee Shop,Café,Cocktail Bar,Hotel,Restaurant,Cosmetics Shop,Clothing Store,Gym,Beer Bar,Moroccan Restaurant
3,M5E,Downtown Toronto,Berczy Park,43.644771,-79.373306,1,Coffee Shop,Cocktail Bar,Bakery,Beer Bar,Pub,Pharmacy,Seafood Restaurant,Cheese Shop,Farmers Market,Restaurant
4,M5G,Downtown Toronto,Central Bay Street,43.657952,-79.387383,1,Coffee Shop,Sandwich Place,Café,Restaurant,Italian Restaurant,Salad Place,Bubble Tea Shop,Japanese Restaurant,Burger Joint,Spa


Lastly, we visualise.

In [69]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(toronto_merged['Latitude'], toronto_merged['Longitude'], toronto_merged['Neighborhood'], toronto_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

### Examine Clusters
This will be interesting to examine: although we have 5 clusters, only one of them covers more than a couple of neighborhoods.

#### Cluster 1

In [71]:
toronto_merged.loc[toronto_merged['Cluster Labels'] ==0, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
26,Central Toronto,0,Mexican Restaurant,Trail,Sushi Restaurant,Jewelry Store,Yoga Studio,Museum,Martial Arts School,Mediterranean Restaurant,Men's Store,Middle Eastern Restaurant


#### Cluster 2

In [72]:
toronto_merged.loc[toronto_merged['Cluster Labels'] ==1, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Downtown Toronto,1,Coffee Shop,Bakery,Park,Pub,Breakfast Spot,Café,Theater,Chocolate Shop,Mexican Restaurant,Spa
1,Downtown Toronto,1,Coffee Shop,Clothing Store,Cosmetics Shop,Café,Japanese Restaurant,Bubble Tea Shop,Ramen Restaurant,Theater,Bookstore,Middle Eastern Restaurant
2,Downtown Toronto,1,Coffee Shop,Café,Cocktail Bar,Hotel,Restaurant,Cosmetics Shop,Clothing Store,Gym,Beer Bar,Moroccan Restaurant
3,Downtown Toronto,1,Coffee Shop,Cocktail Bar,Bakery,Beer Bar,Pub,Pharmacy,Seafood Restaurant,Cheese Shop,Farmers Market,Restaurant
4,Downtown Toronto,1,Coffee Shop,Sandwich Place,Café,Restaurant,Italian Restaurant,Salad Place,Bubble Tea Shop,Japanese Restaurant,Burger Joint,Spa
5,Downtown Toronto,1,Grocery Store,Café,Park,Baby Store,Restaurant,Candy Store,Athletics & Sports,Italian Restaurant,Nightclub,Coffee Shop
6,Downtown Toronto,1,Coffee Shop,Café,Clothing Store,Restaurant,Hotel,Thai Restaurant,Gym,Pizza Place,Cosmetics Shop,Salad Place
7,Downtown Toronto,1,Coffee Shop,Aquarium,Café,Hotel,Fried Chicken Joint,Brewery,Scenic Lookout,Restaurant,Park,Baseball Stadium
8,Downtown Toronto,1,Coffee Shop,Hotel,Café,Restaurant,Salad Place,Seafood Restaurant,Bakery,Italian Restaurant,Japanese Restaurant,Gastropub
9,Downtown Toronto,1,Coffee Shop,Restaurant,Café,Hotel,Gym,American Restaurant,Deli / Bodega,Bakery,Seafood Restaurant,Japanese Restaurant


#### Cluster 3

In [75]:
toronto_merged.loc[toronto_merged['Cluster Labels'] ==2, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
30,Central Toronto,2,Restaurant,Intersection,Yoga Studio,New American Restaurant,Mediterranean Restaurant,Men's Store,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Modern European Restaurant


#### Cluster 4

In [74]:
toronto_merged.loc[toronto_merged['Cluster Labels'] ==3, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
13,Downtown Toronto,3,Park,Playground,Trail,Museum,Market,Martial Arts School,Mediterranean Restaurant,Men's Store,Mexican Restaurant,Middle Eastern Restaurant
23,Central Toronto,3,Park,Bus Line,Swim School,Yoga Studio,Museum,Martial Arts School,Mediterranean Restaurant,Men's Store,Mexican Restaurant,Middle Eastern Restaurant


#### Cluster 5

In [73]:
toronto_merged.loc[toronto_merged['Cluster Labels'] ==4, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
24,Central Toronto,4,Fast Food Restaurant,Garden,Market,Martial Arts School,Mediterranean Restaurant,Men's Store,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Modern European Restaurant


So after inspecting the clusters we can see that the large cluster is, unsurprisingly, coffee shops and cafés. The other clusters are far smaller and represent very specific things, from parks to different types of restaurants.
<br>
The way I would label the clusters would be as follows:
- Cluster 1: Foreign Food
- Cluster 2: Coffee Shops/Cafes
- Cluster 3: Restaurants
- Cluster 4: Parks/Recreation
- Cluster 5: Fast Food