<h1>Applied Data Science Capstone Project</h1>

<h2>Assignment #2: Segmenting and Clustering Neighborhoods in Toronto</h2>

<h2>Part 03. Exploring the data</h2>

<h3>0. Prepare all the necessary stuff</h3>

In [1]:
# install required packages
!conda install -c conda-forge folium=0.5.0 --yes
!conda install -c conda-forge geopy --yes
print('Installation process is complete')

Fetching package metadata .............
Solving package specifications: .

# All requested packages already installed.
# packages in environment at /opt/conda/envs/DSX-Python35:
#
folium                    0.5.0                      py_0    conda-forge
Fetching package metadata .............
Solving package specifications: .

# All requested packages already installed.
# packages in environment at /opt/conda/envs/DSX-Python35:
#
geopy                     1.18.1                     py_0    conda-forge
Installation process is complete


In [2]:
# import required libraries
import requests, csv, os, sys, folium, json
import pandas as pd
import numpy as np
import matplotlib.cm as cm
import matplotlib.colors as colors
from bs4 import BeautifulSoup
from geopy.geocoders import Nominatim
from pandas.io.json import json_normalize
from sklearn.cluster import KMeans
print('Import process is complete')

Import process is complete


<h3>1. Working with the table 'List of postal codes of Canada: M' from Wikipedia</h3>

<h4>1.1. Retrieve the data and create a dataframe</h4>

In [3]:
# retrieve the data from Wikipedia
source = requests.get('https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M')
soup = BeautifulSoup(source.content, 'lxml')

# There are several tables on the page, so we need to find the exact one:
table = soup.find('table', class_='wikitable sortable')

In [4]:
# Prepare the csv file (newline='' is what the csv module docs recommend for files passed to csv.writer):
csv_file = open('neighborhoods.csv', 'w', newline='')
csv_writer = csv.writer(csv_file)
csv_writer.writerow(['PostalCode', 'Borough', 'Neighborhood'])

33

In [5]:
# Find all rows of the table and write their cells into the csv file.
# Header cells use the <th> tag, so a row without three <td> cells raises IndexError and is skipped.
for items in table.find_all('tr')[1:]:
    entries = items.find_all('td')
    try:
        PostalCode = entries[0].get_text(strip=True)
        Borough = entries[1].get_text(strip=True)
        Neighbourhood = entries[2].get_text(strip=True)
    except IndexError:
        # incomplete row: skip it instead of writing the previous row's values again
        continue
    csv_writer.writerow([PostalCode, Borough, Neighbourhood])

csv_file.close()
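
<p>The row filtering above can be tried without touching Wikipedia at all. Below is a minimal sketch on in-memory data: the function name write_rows and the sample rows are made up for illustration, and io.StringIO stands in for the real file.</p>

```python
import csv
import io

def write_rows(rows, out):
    """Write a header plus every complete 3-cell row to `out`; return the number of rows written."""
    writer = csv.writer(out)
    writer.writerow(['PostalCode', 'Borough', 'Neighborhood'])
    written = 0
    for cells in rows:
        if len(cells) < 3:
            # incomplete row (for example one that holds only header cells): skip it
            continue
        writer.writerow([c.strip() for c in cells[:3]])
        written += 1
    return written

# hypothetical sample rows, not the real table
sample_rows = [
    ['M1A', 'Not assigned', 'Not assigned'],
    [],
    ['M3A', 'North York', 'Parkwoods\n'],
]
buffer = io.StringIO()
count = write_rows(sample_rows, buffer)
print(count)
print(buffer.getvalue())
```

<p>Checking the row length up front does the same job as catching IndexError, and it makes it obvious that an incomplete row never reaches the writer.</p>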

In [6]:
# Create the dataframe:
neighborhoods = pd.read_csv('neighborhoods.csv')

<h4>1.2. Process the data</h4>

In [7]:
# Exclude rows that don't have an assigned borough
neighborhoods = neighborhoods[neighborhoods.Borough != 'Not assigned']

# If a postal code covers several neighborhoods, combine them into one comma-separated row
neighborhoods = neighborhoods.groupby(['PostalCode', 'Borough'])['Neighborhood'].apply(', '.join).reset_index()

# Change unassigned neighborhood to the same value as borough
neighborhoods.loc[neighborhoods['Neighborhood'] == 'Not assigned', 'Neighborhood'] = neighborhoods['Borough']
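
<p>To see what these three steps do, here is a small sketch on a hand-made dataframe. The rows are illustrative only and are not taken from the scraped table; the steps mirror the cell above, except that the last assignment uses an explicit mask on both sides.</p>

```python
import pandas as pd

# hand-made rows for illustration only
toy = pd.DataFrame({
    'PostalCode': ['M1A', 'M5A', 'M5A', 'M7A'],
    'Borough': ['Not assigned', 'Downtown Toronto', 'Downtown Toronto', "Queen's Park"],
    'Neighborhood': ['Not assigned', 'Harbourfront', 'Regent Park', 'Not assigned'],
})

# 1. drop rows without a borough
toy = toy[toy['Borough'] != 'Not assigned']

# 2. one row per postal code, neighborhoods joined with commas
toy = toy.groupby(['PostalCode', 'Borough'])['Neighborhood'].apply(', '.join).reset_index()

# 3. an unassigned neighborhood takes the borough name
mask = toy['Neighborhood'] == 'Not assigned'
toy.loc[mask, 'Neighborhood'] = toy.loc[mask, 'Borough']

print(toy)
```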

<h4>1.3. See the result</h4>

In [8]:
# see the dataframe
neighborhoods.head(10)

Unnamed: 0,PostalCode,Borough,Neighborhood
0,M1B,Scarborough,"Rouge, Malvern"
1,M1C,Scarborough,"Highland Creek, Rouge Hill, Port Union"
2,M1E,Scarborough,"Guildwood, Morningside, West Hill"
3,M1G,Scarborough,Woburn
4,M1H,Scarborough,Cedarbrae
5,M1J,Scarborough,Scarborough Village
6,M1K,Scarborough,"East Birchmount Park, Ionview, Kennedy Park"
7,M1L,Scarborough,"Clairlea, Golden Mile, Oakridge"
8,M1M,Scarborough,"Cliffcrest, Cliffside, Scarborough Village West"
9,M1N,Scarborough,"Birch Cliff, Cliffside West"


In [9]:
# Print the shape (rows, columns) of the dataframe
neighborhoods.shape

(103, 3)

<h3>2. Adding geocodes</h3>

<h4>2.1. Acquire the data</h4>

<p><em>Disclaimer:</em> since we were told that the Geocoder package might be buggy, I did not want to risk it and loaded the coordinates from the csv file kindly provided by the instructor.</p>

In [10]:
# acquire the data
geocodes = pd.read_csv('https://cocl.us/Geospatial_data', header=0, names=['PostalCode', 'Latitude', 'Longitude'])

<h4>2.2. Add location data to the dataframe</h4>

In [11]:
# merge dataframes
neighborhoods_geo = pd.merge(neighborhoods, geocodes, on='PostalCode')
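
<p>The default merge is an inner join, so a postal code that is missing from the geocode file would silently disappear. A possible check, sketched on toy frames: the code 'M9Z' and the single-letter neighborhood names are invented, and the two coordinate pairs are copied from the Etobicoke table in this notebook.</p>

```python
import pandas as pd

left = pd.DataFrame({'PostalCode': ['M8V', 'M8W', 'M9Z'],   # 'M9Z' is a made-up code
                     'Neighborhood': ['A', 'B', 'C']})
right = pd.DataFrame({'PostalCode': ['M8V', 'M8W'],
                      'Latitude': [43.605647, 43.602414],
                      'Longitude': [-79.501321, -79.543484]})

# a left join keeps every row of `left`; indicator=True adds a '_merge' column
checked = pd.merge(left, right, on='PostalCode', how='left', indicator=True)

# postal codes that found no coordinates
missing = checked.loc[checked['_merge'] == 'left_only', 'PostalCode'].tolist()
print(missing)
```

<p>An empty list here would mean that the inner join used above loses nothing.</p>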

<h4>2.3. See the result</h4>

In [12]:
neighborhoods_geo.head(10)

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude
0,M1B,Scarborough,"Rouge, Malvern",43.806686,-79.194353
1,M1C,Scarborough,"Highland Creek, Rouge Hill, Port Union",43.784535,-79.160497
2,M1E,Scarborough,"Guildwood, Morningside, West Hill",43.763573,-79.188711
3,M1G,Scarborough,Woburn,43.770992,-79.216917
4,M1H,Scarborough,Cedarbrae,43.773136,-79.239476
5,M1J,Scarborough,Scarborough Village,43.744734,-79.239476
6,M1K,Scarborough,"East Birchmount Park, Ionview, Kennedy Park",43.727929,-79.262029
7,M1L,Scarborough,"Clairlea, Golden Mile, Oakridge",43.711112,-79.284577
8,M1M,Scarborough,"Cliffcrest, Cliffside, Scarborough Village West",43.716316,-79.239476
9,M1N,Scarborough,"Birch Cliff, Cliffside West",43.692657,-79.264848


<h3>3. Exploring the data</h3>

<h4>3.1. Etobicoke</h4>

<p><em>Wikipedia:</em><br>
Etobicoke (pronounced /ɛˈtoʊbɪkoʊ/, with silent 'ke') is an administrative district and former city that makes up the western part of Toronto, Ontario, Canada.<br>
Etobicoke was first settled by Europeans in the 1790s; the municipality grew into city status in the 20th century.<br>
The name "Etobicoke" was derived from the Mississauga word <em>wah-do-be-kang (wadoopikaang)</em>, meaning "place where the alders grow". 
</p>
<p>I chose this borough partly because it's not too large, so your screens won't be flooded with data. But the main reason is simpler: I just like the name very much! So hello to the people of Etobicoke!</p>
<p>So let's get to it!</p>

In [13]:
# Create a dataframe for Etobicoke borough
etobicoke_data = neighborhoods_geo[neighborhoods_geo['Borough'] == 'Etobicoke'].reset_index(drop=True)
etobicoke_data

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude
0,M8V,Etobicoke,"Humber Bay Shores, Mimico South, New Toronto",43.605647,-79.501321
1,M8W,Etobicoke,"Alderwood, Long Branch",43.602414,-79.543484
2,M8X,Etobicoke,"The Kingsway, Montgomery Road, Old Mill North",43.653654,-79.506944
3,M8Y,Etobicoke,"Humber Bay, King's Mill Park, Kingsway Park So...",43.636258,-79.498509
4,M8Z,Etobicoke,"Kingsway Park South West, Mimico NW, The Queen...",43.628841,-79.520999
5,M9A,Etobicoke,Islington Avenue,43.667856,-79.532242
6,M9B,Etobicoke,"Cloverdale, Islington, Martin Grove, Princess ...",43.650943,-79.554724
7,M9C,Etobicoke,"Bloordale Gardens, Eringate, Markland Wood, Ol...",43.643515,-79.577201
8,M9P,Etobicoke,Westmount,43.696319,-79.532242
9,M9R,Etobicoke,"Kingsview Village, Martin Grove Gardens, Richv...",43.688905,-79.554724


<h4>3.2. Create a map of Etobicoke</h4>

In [14]:
# Get the coordinates
address = 'Etobicoke, Toronto'
geolocator = Nominatim(user_agent='IBM')
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('Coordinates of {} are: {}, {}.'.format(address, latitude, longitude))

Coordinates of Etobicoke, Toronto are: 43.6435559, -79.5656326.


In [15]:
# Draw a map
map_etobicoke = folium.Map(location=[latitude, longitude], zoom_start=12)

for lat, lng, label in zip(etobicoke_data['Latitude'], etobicoke_data['Longitude'], etobicoke_data['Neighborhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='red',
        fill=True,
        fill_color='white',
        fill_opacity=0.7,
        parse_html=False).add_to(map_etobicoke)  
map_etobicoke

<h4>3.3. Gather the information about Etobicoke venues from Foursquare</h4>

<p>To make sure I don't lose my Foursquare credentials, I saved them to a file. Unfortunately, file management in Watson Studio is pretty awkward, so I had to keep that code hidden anyway. Sorry about that.</p>

In [16]:
# The code was removed by Watson Studio for sharing.

In [17]:
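# foursquare settings
# read the saved credentials; strip() drops a trailing newline that would break the request URL
with open('fsq_id', 'r') as f:
    CLIENT_ID = f.read().strip()
with open('fsq_s', 'r') as f:
    CLIENT_SECRET = f.read().strip()
VERSION = '20190214'
LIMIT = 100
radius = 500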

In [18]:
# Get up to LIMIT (100) venues within `radius` meters of each neighborhood's coordinates

def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
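
<p>The request URL above is assembled with str.format. A possible alternative, shown here only as a sketch, is to let urllib.parse.urlencode build the query string so that every value is percent-encoded. The helper name build_explore_url and the DEMO credentials are made up; the endpoint and the parameter names are the ones used in the function above.</p>

```python
from urllib.parse import urlencode

def build_explore_url(client_id, client_secret, version, lat, lng, radius=500, limit=100):
    """Return the venues/explore URL with all query parameters percent-encoded."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,
        'll': '{},{}'.format(lat, lng),
        'radius': radius,
        'limit': limit,
    }
    return 'https://api.foursquare.com/v2/venues/explore?' + urlencode(params)

# placeholder credentials, not real ones
url = build_explore_url('DEMO_ID', 'DEMO_SECRET', '20190214', 43.605647, -79.501321)
print(url)
```

<p>The comma inside the ll value comes out as %2C, which is ordinary percent-encoding.</p>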

In [19]:
# Get those venues
etobicoke_venues = getNearbyVenues(names=etobicoke_data['Neighborhood'],
                                   latitudes=etobicoke_data['Latitude'],
                                   longitudes=etobicoke_data['Longitude']
                                  )

Humber Bay Shores, Mimico South, New Toronto
Alderwood, Long Branch
The Kingsway, Montgomery Road, Old Mill North
Humber Bay, King's Mill Park, Kingsway Park South East, Mimico NE, Old Mill South, The Queensway East, Royal York South East, Sunnylea
Kingsway Park South West, Mimico NW, The Queensway West, Royal York South West, South of Bloor
Islington Avenue
Cloverdale, Islington, Martin Grove, Princess Gardens, West Deane Park
Bloordale Gardens, Eringate, Markland Wood, Old Burnhamthorpe
Westmount
Kingsview Village, Martin Grove Gardens, Richview Gardens, St. Phillips
Albion Gardens, Beaumond Heights, Humbergate, Jamestown, Mount Olive, Silverstone, South Steeles, Thistletown
Northwest


In [20]:
# Shape it
print(etobicoke_venues.shape)
etobicoke_venues.head()

(76, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,"Humber Bay Shores, Mimico South, New Toronto",43.605647,-79.501321,LCBO,43.602281,-79.499302,Liquor Store
1,"Humber Bay Shores, Mimico South, New Toronto",43.605647,-79.501321,New Toronto Fish & Chips,43.601849,-79.503281,Restaurant
2,"Humber Bay Shores, Mimico South, New Toronto",43.605647,-79.501321,Delicia Bakery & Pastry,43.601403,-79.503012,Bakery
3,"Humber Bay Shores, Mimico South, New Toronto",43.605647,-79.501321,Lucky Dice Restaurant,43.601392,-79.503056,Café
4,"Humber Bay Shores, Mimico South, New Toronto",43.605647,-79.501321,McDonald's,43.60247,-79.498963,Fast Food Restaurant


<h4>3.4. Prepare the acquired data for future exploration</h4>

In [21]:
# Number of venues for each neighborhood
etobicoke_venues.groupby('Neighborhood').count()

Unnamed: 0_level_0,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
"Albion Gardens, Beaumond Heights, Humbergate, Jamestown, Mount Olive, Silverstone, South Steeles, Thistletown",11,11,11,11,11,11
"Alderwood, Long Branch",11,11,11,11,11,11
"Bloordale Gardens, Eringate, Markland Wood, Old Burnhamthorpe",5,5,5,5,5,5
"Cloverdale, Islington, Martin Grove, Princess Gardens, West Deane Park",2,2,2,2,2,2
"Humber Bay Shores, Mimico South, New Toronto",16,16,16,16,16,16
"Humber Bay, King's Mill Park, Kingsway Park South East, Mimico NE, Old Mill South, The Queensway East, Royal York South East, Sunnylea",2,2,2,2,2,2
"Kingsview Village, Martin Grove Gardens, Richview Gardens, St. Phillips",3,3,3,3,3,3
"Kingsway Park South West, Mimico NW, The Queensway West, Royal York South West, South of Bloor",13,13,13,13,13,13
Northwest,3,3,3,3,3,3
"The Kingsway, Montgomery Road, Old Mill North",3,3,3,3,3,3


In [22]:
# One hot encoding
etobicoke_onehot = pd.get_dummies(etobicoke_venues[['Venue Category']], prefix="", prefix_sep="")

# Add neighborhood column back to dataframe
etobicoke_onehot['Neighborhood'] = etobicoke_venues['Neighborhood'] 

# Move neighborhood column to the first column
fixed_columns = [etobicoke_onehot.columns[-1]] + list(etobicoke_onehot.columns[:-1])
etobicoke_onehot = etobicoke_onehot[fixed_columns]

# See the result
print(etobicoke_onehot.shape)
etobicoke_onehot.head()


(76, 43)


Unnamed: 0,Neighborhood,American Restaurant,Athletics & Sports,Bakery,Bank,Bar,Baseball Field,Beer Store,Burger Joint,Bus Line,...,Restaurant,River,Sandwich Place,Seafood Restaurant,Skating Rink,Smoke Shop,Social Club,Supplement Shop,Tanning Salon,Wings Joint
0,"Humber Bay Shores, Mimico South, New Toronto",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,"Humber Bay Shores, Mimico South, New Toronto",0,0,0,0,0,0,0,0,0,...,1,0,0,0,0,0,0,0,0,0
2,"Humber Bay Shores, Mimico South, New Toronto",0,0,1,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,"Humber Bay Shores, Mimico South, New Toronto",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,"Humber Bay Shores, Mimico South, New Toronto",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [23]:
# Group rows by neighborhood, taking the mean of each one-hot column (i.e. the frequency of each category)
etobicoke_grouped = etobicoke_onehot.groupby('Neighborhood').mean().reset_index()

# See the result
etobicoke_grouped

Unnamed: 0,Neighborhood,American Restaurant,Athletics & Sports,Bakery,Bank,Bar,Baseball Field,Beer Store,Burger Joint,Bus Line,...,Restaurant,River,Sandwich Place,Seafood Restaurant,Skating Rink,Smoke Shop,Social Club,Supplement Shop,Tanning Salon,Wings Joint
0,"Albion Gardens, Beaumond Heights, Humbergate, ...",0.0,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,...,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,"Alderwood, Long Branch",0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.090909,0.0,0.090909,0.0,0.0,0.0,0.0,0.0
2,"Bloordale Gardens, Eringate, Markland Wood, Ol...",0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,"Cloverdale, Islington, Martin Grove, Princess ...",0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,"Humber Bay Shores, Mimico South, New Toronto",0.0625,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0625,0.0,0.0625,0.0625,0.0,0.0,0.0,0.0,0.0,0.0
5,"Humber Bay, King's Mill Park, Kingsway Park So...",0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
6,"Kingsview Village, Martin Grove Gardens, Richv...",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
7,"Kingsway Park South West, Mimico NW, The Queen...",0.0,0.0,0.076923,0.0,0.0,0.0,0.0,0.076923,0.0,...,0.0,0.0,0.076923,0.0,0.0,0.0,0.076923,0.076923,0.076923,0.076923
8,Northwest,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9,"The Kingsway, Montgomery Road, Old Mill North",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.333333,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0


In [24]:
# Confirm the new size
etobicoke_grouped.shape

(11, 43)
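
<p>Why does the mean of one-hot columns give a frequency? A tiny sketch on made-up venues (two neighborhoods named A and B, three categories) shows it: each row has a single 1, so averaging a column within a neighborhood gives the share of venues in that category.</p>

```python
import pandas as pd

# made-up venues for illustration
toy_venues = pd.DataFrame({
    'Neighborhood': ['A', 'A', 'A', 'A', 'B', 'B'],
    'Venue Category': ['Café', 'Café', 'Bakery', 'Park', 'Park', 'Park'],
})

# one column per category, 1 where the venue belongs to it
onehot = pd.get_dummies(toy_venues['Venue Category'], dtype=int)
onehot.insert(0, 'Neighborhood', toy_venues['Neighborhood'])

# share of each category among the venues of a neighborhood
freq = onehot.groupby('Neighborhood').mean()
print(freq)
```

<p>Each row of the result sums to 1, which is a quick sanity check for the grouped dataframe as well.</p>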

<h4>3.5. Let's see what we have to deal with</h4>

Since there are rather few venues in the borough, I'm not going to make the list too long: the top 3 are quite enough.

In [25]:
# Print each neighborhood along with the top 3 most common venues
num_top_venues = 3

for hood in etobicoke_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = etobicoke_grouped[etobicoke_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Albion Gardens, Beaumond Heights, Humbergate, Jamestown, Mount Olive, Silverstone, South Steeles, Thistletown----
                 venue  freq
0        Grocery Store  0.18
1  Japanese Restaurant  0.09
2       Discount Store  0.09


----Alderwood, Long Branch----
            venue  freq
0     Pizza Place  0.18
1             Gym  0.09
2  Sandwich Place  0.09


----Bloordale Gardens, Eringate, Markland Wood, Old Burnhamthorpe----
          venue  freq
0  Liquor Store   0.2
1      Pharmacy   0.2
2    Beer Store   0.2


----Cloverdale, Islington, Martin Grove, Princess Gardens, West Deane Park----
                 venue  freq
0                 Bank   0.5
1          Golf Course   0.5
2  American Restaurant   0.0


----Humber Bay Shores, Mimico South, New Toronto----
                 venue  freq
0                 Café  0.12
1          Coffee Shop  0.12
2  American Restaurant  0.06


----Humber Bay, King's Mill Park, Kingsway Park South East, Mimico NE, Old Mill South, The Queensway East, 

<h4>3.6. Create a dataframe with all of this information</h4>

In [26]:
# A function to sort the venues in descending order.
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]
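
<p>One thing to keep in mind: many categories share exactly the same frequency, and the printed top 3 and the table in this part list tied categories differently for some neighborhoods, presumably because the sort has no rule for ties. A minimal sketch of a deterministic variant, assuming alphabetical order is an acceptable tie-breaker; the function name top_categories and the toy frequencies are made up.</p>

```python
def top_categories(freqs, n):
    """Return the n category names with the highest frequency.

    Ties are broken alphabetically, so the result does not depend on
    how the sort algorithm happens to order equal values.
    """
    ranked = sorted(freqs.items(), key=lambda kv: (-kv[1], kv[0]))
    return [name for name, _ in ranked[:n]]

# toy frequencies for one neighborhood (illustrative values)
toy_row = {'Beer Store': 0.2, 'Pharmacy': 0.2, 'Liquor Store': 0.2,
           'Bank': 0.0, 'Pizza Place': 0.4}
top3 = top_categories(toy_row, 3)
print(top3)
```

<p>A pandas Series has an items() method too, so a row of frequencies with the 'Neighborhood' entry removed could be passed in the same way.</p>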

In [27]:
# Create the new dataframe and display the top 3 venues for each neighborhood.

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = etobicoke_grouped['Neighborhood']

for ind in np.arange(etobicoke_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(etobicoke_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue
0,"Albion Gardens, Beaumond Heights, Humbergate, ...",Grocery Store,Pizza Place,Fast Food Restaurant
1,"Alderwood, Long Branch",Pizza Place,Pool,Athletics & Sports
2,"Bloordale Gardens, Eringate, Markland Wood, Ol...",Liquor Store,Beer Store,Pharmacy
3,"Cloverdale, Islington, Martin Grove, Princess ...",Golf Course,Bank,Chinese Restaurant
4,"Humber Bay Shores, Mimico South, New Toronto",Coffee Shop,Café,American Restaurant
5,"Humber Bay, King's Mill Park, Kingsway Park So...",Baseball Field,Park,Wings Joint
6,"Kingsview Village, Martin Grove Gardens, Richv...",Park,Bus Line,Pizza Place
7,"Kingsway Park South West, Mimico NW, The Queen...",Wings Joint,Gym,Bakery
8,Northwest,Rental Car Location,Drugstore,Bar
9,"The Kingsway, Montgomery Road, Old Mill North",Smoke Shop,River,Park


<h4>3.7. Cluster neighborhoods</h4>

<p>Again, since the borough is small, I believe we don't need to go crazy with the number of clusters</p>

In [28]:
# set number of clusters
kclusters = 4

# define cluster
etobicoke_grouped_clustering = etobicoke_grouped.drop('Neighborhood', axis=1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(etobicoke_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_

array([0, 0, 0, 1, 0, 2, 2, 0, 3, 0, 0], dtype=int32)
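
<p>A note on reading these labels: the numbers are just identifiers. What matters is which rows share a label, not whether a group is called 0 or 3. A small sketch on a made-up frequency matrix with two obvious groups (n_init is set explicitly here, which the cell above does not do):</p>

```python
import numpy as np
from sklearn.cluster import KMeans

# made-up frequency rows: the first two look alike, and so do the last two
toy_freq = np.array([
    [0.9, 0.1, 0.0],
    [0.8, 0.2, 0.0],
    [0.0, 0.1, 0.9],
    [0.0, 0.2, 0.8],
])

model = KMeans(n_clusters=2, n_init=10, random_state=0).fit(toy_freq)
labels = model.labels_
print(labels)

# rows 0 and 1 share a label, rows 2 and 3 share the other one
same_first_pair = labels[0] == labels[1]
same_second_pair = labels[2] == labels[3]
print(same_first_pair, same_second_pair)
```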

Creating a dataframe including the cluster as well as the top-3 venues for each neighborhood:

In [29]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

etobicoke_merged = etobicoke_data

# merge two dataframes together to get coordinates for neighborhoods
etobicoke_merged = etobicoke_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

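# drop neighborhoods that got no cluster label (no venue data)
etobicoke_merged = etobicoke_merged.dropna()
# note: astype() returns a new Series and the result is not assigned here,
# so the column keeps its float dtype (0.0, 1.0, ...) in the outputs below
etobicoke_merged['Cluster Labels'].astype(int)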

# see the result
etobicoke_merged.head()

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue
0,M8V,Etobicoke,"Humber Bay Shores, Mimico South, New Toronto",43.605647,-79.501321,0.0,Coffee Shop,Café,American Restaurant
1,M8W,Etobicoke,"Alderwood, Long Branch",43.602414,-79.543484,0.0,Pizza Place,Pool,Athletics & Sports
2,M8X,Etobicoke,"The Kingsway, Montgomery Road, Old Mill North",43.653654,-79.506944,0.0,Smoke Shop,River,Park
3,M8Y,Etobicoke,"Humber Bay, King's Mill Park, Kingsway Park So...",43.636258,-79.498509,2.0,Baseball Field,Park,Wings Joint
4,M8Z,Etobicoke,"Kingsway Park South West, Mimico NW, The Queen...",43.628841,-79.520999,0.0,Wings Joint,Gym,Bakery
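
<p>The labels show up as 0.0 and 2.0 because a join that leaves some rows without a match fills them with NaN, and that turns an integer column into floats. A sketch on toy frames (all names and values invented) that also shows the cast back to integers, which only takes effect when the result is assigned:</p>

```python
import pandas as pd

areas = pd.DataFrame({'Neighborhood': ['A', 'B', 'C']})
labels = pd.DataFrame({'Neighborhood': ['A', 'C'], 'Cluster Labels': [0, 2]})

# 'B' has no label, so the joined column holds NaN and becomes float
merged = areas.join(labels.set_index('Neighborhood'), on='Neighborhood')
dtype_after_join = merged['Cluster Labels'].dtype
print(dtype_after_join)

# drop the unlabeled row, then assign the integer version back to the column
merged = merged.dropna().copy()
merged['Cluster Labels'] = merged['Cluster Labels'].astype(int)
print(merged)
```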


<h4>3.8. Draw a map</h4>

In [30]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=12)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(etobicoke_merged['Latitude'], etobicoke_merged['Longitude'], etobicoke_merged['Neighborhood'], etobicoke_merged['Cluster Labels']):
    print(cluster)
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[int(cluster)-1],
        fill=True,
        fill_color=rainbow[int(cluster)-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

0.0
0.0
0.0
2.0
0.0
1.0
0.0
0.0
2.0
0.0
3.0
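
<p>A remark on the colors: the cell above picks rainbow[int(cluster)-1], so label 0 wraps around to the last color through negative indexing. Every cluster still gets its own color, but the order is shifted by one. Below is a sketch that indexes the palette directly by label. It uses the standard-library colorsys module as a stand-in for matplotlib's rainbow colormap (my assumption, to keep the example dependency-free), and the name cluster_palette is made up.</p>

```python
import colorsys

def cluster_palette(n_clusters):
    """Return one hex color per cluster label, evenly spaced around the hue wheel."""
    palette = []
    for k in range(n_clusters):
        r, g, b = colorsys.hsv_to_rgb(k / n_clusters, 1.0, 1.0)
        palette.append('#{:02x}{:02x}{:02x}'.format(
            round(r * 255), round(g * 255), round(b * 255)))
    return palette

palette = cluster_palette(4)
print(palette)

# the color for a label is simply palette[label]
color_for_cluster_0 = palette[0]
print(color_for_cluster_0)
```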


<h4>3.9. See the result</h4>

<h5>First cluster:</h5>

In [31]:
etobicoke_merged.loc[etobicoke_merged['Cluster Labels'] == 0, etobicoke_merged.columns[[2] + list(range(5, etobicoke_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue
0,"Humber Bay Shores, Mimico South, New Toronto",0.0,Coffee Shop,Café,American Restaurant
1,"Alderwood, Long Branch",0.0,Pizza Place,Pool,Athletics & Sports
2,"The Kingsway, Montgomery Road, Old Mill North",0.0,Smoke Shop,River,Park
4,"Kingsway Park South West, Mimico NW, The Queen...",0.0,Wings Joint,Gym,Bakery
7,"Bloordale Gardens, Eringate, Markland Wood, Ol...",0.0,Liquor Store,Beer Store,Pharmacy
8,Westmount,0.0,Pizza Place,Chinese Restaurant,Intersection
10,"Albion Gardens, Beaumond Heights, Humbergate, ...",0.0,Grocery Store,Pizza Place,Fast Food Restaurant


Well, it looks like these guys like to have fun: they barely go to the park, but they surely like pizza along with some liquor and smokes. 

<h5>Second cluster:</h5>

In [32]:
etobicoke_merged.loc[etobicoke_merged['Cluster Labels'] == 1, etobicoke_merged.columns[[2] + list(range(5, etobicoke_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue
6,"Cloverdale, Islington, Martin Grove, Princess ...",1.0,Golf Course,Bank,Chinese Restaurant


These guys look like the wealthy ones, with golf and banking venues.

<h5>Third cluster:</h5>

In [33]:
etobicoke_merged.loc[etobicoke_merged['Cluster Labels'] == 2, etobicoke_merged.columns[[2] + list(range(5, etobicoke_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue
3,"Humber Bay, King's Mill Park, Kingsway Park So...",2.0,Baseball Field,Park,Wings Joint
9,"Kingsview Village, Martin Grove Gardens, Richv...",2.0,Park,Bus Line,Pizza Place


These guys are all into outdoor activities like baseball and walking. No wonder: they have some beautiful parks in the area!

<h5>Fourth cluster:</h5>

In [34]:
etobicoke_merged.loc[etobicoke_merged['Cluster Labels'] == 3, etobicoke_merged.columns[[2] + list(range(5, etobicoke_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue
11,Northwest,3.0,Rental Car Location,Drugstore,Bar


That's the most unusual thing here. Don't drink and drive, guys! 

<hr>
<h3>That's all, folks</h3>

<p>Thank you for taking the time to review my work.</p>
<p>I wish you the best of luck with completing this course. </p>
<p>Have a great time!</p>