# Segmenting and Clustering Neighborhoods in Toronto

## Part 1: Scrape Wikipedia page and create a dataframe

We start by importing the libraries that we need for this assignment.

In [1]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

!conda install -c conda-forge geopy --yes 
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas.io.json import json_normalize # transform JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

!conda install -c conda-forge folium=0.5.0 --yes 
import folium # map rendering library

from bs4 import BeautifulSoup # import recommended html parsing library

print('Libraries imported.')

Fetching package metadata .............
Solving package specifications: .

Package plan for installation in environment /opt/conda/envs/DSX-Python35:

The following NEW packages will be INSTALLED:

    geographiclib: 1.49-py_0   conda-forge
    geopy:         1.18.1-py_0 conda-forge

geographiclib- 100% |################################| Time: 0:00:00  23.94 MB/s
geopy-1.18.1-p 100% |################################| Time: 0:00:00  36.95 MB/s
Fetching package metadata .............
Solving package specifications: .

Package plan for installation in environment /opt/conda/envs/DSX-Python35:

The following NEW packages will be INSTALLED:

    altair:  2.2.2-py35_1 conda-forge
    branca:  0.3.1-py_0   conda-forge
    folium:  0.5.0-py_0   conda-forge
    vincent: 0.4.4-py_1   conda-forge

altair-2.2.2-p 100% |################################| Time: 0:00:00  55.44 MB/s
branca-0.3.1-p 100% |################################| Time: 0:00:00  34.00 MB/s
vincent-0.4.4- 100% |###################

Now we can scrape the Wikipedia page using the requests library and BeautifulSoup.

In [22]:
# Fetch the HTML of the Wikipedia page we are going to scrape and store it in a variable named source.
source = requests.get('https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M').text

In [23]:
# Read the source code and create soup object
soup = BeautifulSoup(source,'lxml')

In [24]:
# Find the postal code table in the HTML
article = soup.find('table', class_='wikitable sortable')

# Create table list
table_list = []
for rows in article.find_all('td'):
    row = rows.text
    row = row.replace('\n', '')
    table_list.append(row)

In the next step we put all the data that we collected in a pandas dataframe. 

In [25]:
# Define the dataframe columns
column_names = ['PostalCode', 'Borough', 'Neighborhood'] 

# Create empty dataframe 
df = pd.DataFrame(columns=column_names)

We only want to process the cells that have an assigned borough, so we ignore the cells whose borough is Not assigned.

In [26]:
# Only process the cells that have an assigned borough. 
# Ignore cells with a borough that is Not assigned.
# The cell list repeats in groups of three: postal code, borough, neighborhood
df['PostalCode'] = table_list[::3]
df['Borough'] = table_list[1::3]
df['Neighborhood'] = table_list[2::3]
df.replace("Not assigned", np.nan, inplace = True)
df.dropna(subset=["Borough"], axis=0, inplace = True)
df.reset_index(drop=True, inplace=True)
df

Unnamed: 0,PostalCode,Borough,Neighborhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,Harbourfront
3,M5A,Downtown Toronto,Regent Park
4,M6A,North York,Lawrence Heights
5,M6A,North York,Lawrence Manor
6,M7A,Queen's Park,
7,M9A,Etobicoke,Islington Avenue
8,M1B,Scarborough,Rouge
9,M1B,Scarborough,Malvern


If a cell has a borough but a Not assigned neighborhood, then the neighborhood will be the same as the borough.

In [27]:
# Use the borough name wherever the neighborhood is not assigned
for i in range(0, df.shape[0]):
    if pd.isnull(df.loc[i,'Neighborhood']):
        df.loc[i,'Neighborhood'] = df.loc[i,'Borough']
df.head()

Unnamed: 0,PostalCode,Borough,Neighborhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,Harbourfront
3,M5A,Downtown Toronto,Regent Park
4,M6A,North York,Lawrence Heights
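
The row-by-row loop can also be expressed as a single vectorized call: `Series.fillna` accepts another Series and fills each gap from the value in the same row. A small sketch on made-up rows (the frame name `toy` is only for illustration):

```python
import numpy as np
import pandas as pd

# Toy frame: one row has a borough but no neighborhood
toy = pd.DataFrame({
    'PostalCode': ['M5A', 'M7A'],
    'Borough': ['Downtown Toronto', "Queen's Park"],
    'Neighborhood': ['Harbourfront', np.nan],
})

# fillna aligns on the index, so each gap takes the borough of its own row
toy['Neighborhood'] = toy['Neighborhood'].fillna(toy['Borough'])
print(toy)
```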


More than one neighborhood can exist in one postal code area. For example, in the table on the Wikipedia page, M5A is listed twice and has two neighborhoods: Harbourfront and Regent Park. These two rows will be combined into one row with the neighborhoods separated by a comma (row 2 of the resulting table).

In [28]:
# Combine the rows with the same code into one row and separate the neighborhoods with a comma
df['Neighborhood'] = df[['PostalCode','Borough','Neighborhood']].groupby(['PostalCode','Borough'])['Neighborhood'].transform(lambda x: ','.join(x)) 
df.drop_duplicates(inplace=True)
df.reset_index(drop=True, inplace=True)
df.head()

Unnamed: 0,PostalCode,Borough,Neighborhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,"Harbourfront,Regent Park"
3,M6A,North York,"Lawrence Heights,Lawrence Manor"
4,M7A,Queen's Park,Queen's Park
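
The transform-then-deduplicate step can also be written as a single aggregation. The sketch below uses made-up rows and `groupby(...).agg` with `sort=False` to keep the original row order; it is an alternative formulation, not the code used above:

```python
import pandas as pd

toy = pd.DataFrame({
    'PostalCode': ['M3A', 'M5A', 'M5A'],
    'Borough': ['North York', 'Downtown Toronto', 'Downtown Toronto'],
    'Neighborhood': ['Parkwoods', 'Harbourfront', 'Regent Park'],
})

# One output row per (PostalCode, Borough); neighborhoods joined by commas
combined = (
    toy.groupby(['PostalCode', 'Borough'], sort=False)['Neighborhood']
       .agg(','.join)
       .reset_index()
)
print(combined)
```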


Finally, we use the .shape attribute to check the number of rows of the dataframe.

In [29]:
# Use the shape attribute to show the number of rows and columns of the dataframe.
df.shape

(103, 3)

## Part 2: Add latitude and longitude to the dataframe

In this part we use a CSV file that has the geographical coordinates of each postal code. We add the data from the CSV file to the dataframe we've created so that it also shows the latitude and longitude of each postal code.

In [30]:
#Let's refer to the existing dataframe as Toronto1
Toronto1=df

#We create a second dataframe which contains the data from the csv file
Toronto2=pd.read_csv("http://cocl.us/Geospatial_data")

#We add the values to the first dataframe
Toronto1[['Latitude', 'Longitude']]=Toronto2[['Latitude', 'Longitude']]

#Let's check the result
Toronto1.head()

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.806686,-79.194353
1,M4A,North York,Victoria Village,43.784535,-79.160497
2,M5A,Downtown Toronto,"Harbourfront,Regent Park",43.763573,-79.188711
3,M6A,North York,"Lawrence Heights,Lawrence Manor",43.770992,-79.216917
4,M7A,Queen's Park,Queen's Park,43.773136,-79.239476


In [32]:
#Finally, let's check the new shape
Toronto1.shape

(103, 5)
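
Column assignment between two dataframes aligns on the row index, so it pairs each postal code with the right coordinates only if both frames list the codes in the same order. Joining on the postal code itself does not depend on row order. A sketch with made-up coordinates, assuming the CSV's key column is named `Postal Code` (check the actual header before relying on this):

```python
import pandas as pd

# Toy stand-ins: the two frames list the same codes in different orders
neighborhoods = pd.DataFrame({
    'PostalCode': ['M3A', 'M1B'],
    'Borough': ['North York', 'Scarborough'],
})
coords = pd.DataFrame({
    'Postal Code': ['M1B', 'M3A'],   # assumed column name
    'Latitude': [43.81, 43.75],      # made-up values
    'Longitude': [-79.19, -79.33],   # made-up values
})

# Left join on the key keeps every neighborhood row and its original order
merged = neighborhoods.merge(
    coords, how='left', left_on='PostalCode', right_on='Postal Code'
).drop(columns='Postal Code')
print(merged)
```

A quick sanity check after such a join is `merged['Latitude'].isnull().sum()`, which counts postal codes that found no match in the coordinate file.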

## Part 3. Explore and cluster the neighborhoods in Toronto

In the last part we are going to explore and cluster the neighborhoods in Toronto. We will work with boroughs that contain the word Toronto and then replicate the analysis we did on the New York City data.

We use the geopy library to get the latitude and longitude values of Toronto. To create an instance of the geocoder, we need to define a user_agent, which we call 'toronto_explorer'.

In [33]:
address = 'Toronto'

geolocator = Nominatim(user_agent="toronto_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Toronto are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Toronto are 43.653963, -79.387207.


Next we create a map of Toronto with the neighborhoods superimposed on top. 

In [34]:
# create map of Toronto using latitude and longitude values
map_toronto = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, borough, neighborhood in zip(Toronto1['Latitude'], Toronto1['Longitude'], Toronto1['Borough'], Toronto1['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto)  
    
map_toronto

We will simplify the above map and segment and cluster only the neighborhoods in Etobicoke. So let's slice the original dataframe and create a new dataframe of the Etobicoke data.  

In [45]:
etobicoke_data = Toronto1[Toronto1['Borough'] == 'Etobicoke'].reset_index(drop=True)
etobicoke_data

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude
0,M9A,Etobicoke,Islington Avenue,43.744734,-79.239476
1,M9B,Etobicoke,"Cloverdale,Islington,Martin Grove,Princess Gar...",43.750072,-79.295849
2,M9C,Etobicoke,"Bloordale Gardens,Eringate,Markland Wood,Old B...",43.803762,-79.363452
3,M9P,Etobicoke,Westmount,43.648429,-79.38228
4,M9R,Etobicoke,"Kingsview Village,Martin Grove Gardens,Richvie...",43.647927,-79.41975
5,M8V,Etobicoke,"Humber Bay Shores,Mimico South,New Toronto",43.605647,-79.501321
6,M9V,Etobicoke,"Albion Gardens,Beaumond Heights,Humbergate,Jam...",43.602414,-79.543484
7,M8W,Etobicoke,"Alderwood,Long Branch",43.667856,-79.532242
8,M9W,Etobicoke,Northwest,43.650943,-79.554724
9,M8X,Etobicoke,"The Kingsway,Montgomery Road,Old Mill North",43.706876,-79.518188


Let's get the geographical coordinates of Etobicoke.

In [47]:
address = 'Etobicoke, Toronto'

geolocator = Nominatim(user_agent="toronto_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Etobicoke are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Etobicoke are 43.6435559, -79.5656326.


As we did with all of Toronto, let's visualize the neighborhoods of Etobicoke.

In [49]:
# create map of Etobicoke using latitude and longitude values
map_etobicoke = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, label in zip(etobicoke_data['Latitude'], etobicoke_data['Longitude'], etobicoke_data['Neighborhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_etobicoke)  
    
map_etobicoke

Next, we are going to start utilizing the Foursquare API to explore the neighborhoods and segment them.

In [50]:
#Define Foursquare Credentials and Version 
CLIENT_ID = 'RNOZQCLVFPDRIZTKLDYFOPDRHJDZD0WAXRIYSECCTPOP2VMF' # your Foursquare ID
CLIENT_SECRET = 'XB3BVV05SCJF3LYM5YWXZFJAKUOSKSEYT44IF31H11SWCVQN' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version
print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: RNOZQCLVFPDRIZTKLDYFOPDRHJDZD0WAXRIYSECCTPOP2VMF
CLIENT_SECRET:XB3BVV05SCJF3LYM5YWXZFJAKUOSKSEYT44IF31H11SWCVQN
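
Hard-coded credentials end up in every shared copy of a notebook. One common alternative is to read them from environment variables. The sketch below assumes two hypothetical variable names, `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET`, and prints only a masked form; the helper names are also made up for this example:

```python
import os

def load_credentials(env=os.environ):
    """Read the Foursquare credentials from a mapping of environment variables."""
    # Hypothetical variable names; use whatever your environment defines
    client_id = env.get('FOURSQUARE_CLIENT_ID', '')
    client_secret = env.get('FOURSQUARE_CLIENT_SECRET', '')
    if not client_id or not client_secret:
        raise RuntimeError('Foursquare credentials are not set')
    return client_id, client_secret

def mask(value, visible=4):
    """Show only the first few characters of a secret."""
    return value[:visible] + '*' * (len(value) - visible)

# Demonstration with placeholder values instead of real credentials
demo_env = {'FOURSQUARE_CLIENT_ID': 'ABCD1234', 'FOURSQUARE_CLIENT_SECRET': 'WXYZ5678'}
client_id, client_secret = load_credentials(demo_env)
print('CLIENT_ID: ' + mask(client_id))
```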


Let's first explore the first neighborhood in our dataframe. 

Get the neighborhood's name. 

In [51]:
etobicoke_data.loc[0, 'Neighborhood']

'Islington Avenue'

Get the neighborhood's latitude and longitude values.

In [52]:
neighborhood_latitude = etobicoke_data.loc[0, 'Latitude'] # neighborhood latitude value
neighborhood_longitude = etobicoke_data.loc[0, 'Longitude'] # neighborhood longitude value

neighborhood_name = etobicoke_data.loc[0, 'Neighborhood'] # neighborhood name

print('Latitude and longitude values of {} are {}, {}.'.format(neighborhood_name, neighborhood_latitude, neighborhood_longitude))

Latitude and longitude values of Islington Avenue are 43.7447342, -79.23947609999999.


Now, let's get the top 100 venues that are within a radius of 2500 meters of Islington Avenue.

First, let's create the GET request URL and store it in a variable named url.

In [85]:
radius = 2500 # define radius
LIMIT = 100 # maximum number of venues to return
# create URL
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    neighborhood_latitude, 
    neighborhood_longitude, 
    radius, 
    LIMIT)
url # display URL

'https://api.foursquare.com/v2/venues/explore?&client_id=RNOZQCLVFPDRIZTKLDYFOPDRHJDZD0WAXRIYSECCTPOP2VMF&client_secret=XB3BVV05SCJF3LYM5YWXZFJAKUOSKSEYT44IF31H11SWCVQN&v=20180605&ll=43.7447342,-79.23947609999999&radius=2500&limit=100'
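
The same query string can be assembled from a dictionary of named parameters, which avoids positional mistakes in a long `format` call. A sketch using the standard library's `urllib.parse.urlencode` with placeholder credentials and illustrative coordinates (the helper name `build_explore_url` is made up for this example):

```python
from urllib.parse import urlencode

def build_explore_url(client_id, client_secret, version, lat, lng, radius, limit):
    """Assemble the explore request URL from named parameters."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,
        'll': '{},{}'.format(lat, lng),
        'radius': radius,
        'limit': limit,
    }
    # safe=',' keeps the comma in the ll value readable instead of %2C
    return 'https://api.foursquare.com/v2/venues/explore?' + urlencode(params, safe=',')

url = build_explore_url('YOUR_ID', 'YOUR_SECRET', '20180605', 43.7447342, -79.2394761, 2500, 100)
print(url)
```

The requests library can do the same encoding itself when the dictionary is passed as `requests.get(base_url, params=params)`.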

Send the GET request and examine the results. 

In [86]:
results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '5c601a3f4434b95f3b3dfdee'},
 'response': {'groups': [{'items': [{'reasons': {'count': 0,
       'items': [{'reasonName': 'globalInteractionReason',
         'summary': 'This spot is popular',
         'type': 'general'}]},
      'referralId': 'e-0-4b5cc7c9f964a520a84329e3-0',
      'venue': {'categories': [{'icon': {'prefix': 'https://ss3.4sqi.net/img/categories_v2/food/pizza_',
          'suffix': '.png'},
         'id': '4bf58dd8d48988d1ca941735',
         'name': 'Pizza Place',
         'pluralName': 'Pizza Places',
         'primary': True,
         'shortName': 'Pizza'}],
       'id': '4b5cc7c9f964a520a84329e3',
       'location': {'address': 'Danforth Road',
        'cc': 'CA',
        'city': 'Scarborough',
        'country': 'Canada',
        'distance': 530,
        'formattedAddress': ['Danforth Road', 'Scarborough ON', 'Canada'],
        'labeledLatLngs': [{'label': 'display',
          'lat': 43.743699384861635,
          'lng': -79.2459

We know that all the information is in the items key. Before we proceed, let's borrow the get_category_type function from the Foursquare lab.

In [87]:
# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        # flattened JSON rows store the list under 'venue.categories'
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

Now we are ready to clean the json and structure it into a pandas dataframe.

In [88]:
venues = results['response']['groups'][0]['items']
    
nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues = nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head()

Unnamed: 0,name,categories,lat,lng
0,Diamond Pizza,Pizza Place,43.743699,-79.245922
1,Anjappar Authentic Chettinadu Restaurant,Indian Restaurant,43.741592,-79.226799
2,Charcoal Kebab House,Chinese Restaurant,43.757348,-79.238842
3,The Comic Room,Bookstore,43.754258,-79.244319
4,Thomson Memorial Park,Park,43.758891,-79.2545


We can check how many venues were returned by Foursquare. 

In [89]:
print('{} venues were returned by Foursquare.'.format(nearby_venues.shape[0]))

99 venues were returned by Foursquare.


Now let's create a function to repeat the same process for all the neighborhoods in Etobicoke.

In [90]:
def getNearbyVenues(names, latitudes, longitudes, radius=2500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

We run the above function on each neighborhood and create a new dataframe called etobicoke_venues.

In [91]:
etobicoke_venues = getNearbyVenues(names=etobicoke_data['Neighborhood'],
                                   latitudes=etobicoke_data['Latitude'],
                                   longitudes=etobicoke_data['Longitude']
                                  )

Islington Avenue
Cloverdale,Islington,Martin Grove,Princess Gardens,West Deane Park
Bloordale Gardens,Eringate,Markland Wood,Old Burnhamthorpe
Westmount
Kingsview Village,Martin Grove Gardens,Richview Gardens,St. Phillips
Humber Bay Shores,Mimico South,New Toronto
Albion Gardens,Beaumond Heights,Humbergate,Jamestown,Mount Olive,Silverstone,South Steeles,Thistletown
Alderwood,Long Branch
Northwest
The Kingsway,Montgomery Road,Old Mill North
Humber Bay,King's Mill Park,Kingsway Park South East,Mimico NE,Old Mill South,The Queensway East,Royal York South East,Sunnylea
Kingsway Park South West,Mimico NW,The Queensway West,Royal York South West,South of Bloor


Let's check the size of the resulting dataframe. 

In [93]:
print(etobicoke_venues.shape)
etobicoke_venues.head()

(1037, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Islington Avenue,43.744734,-79.239476,Diamond Pizza,43.743699,-79.245922,Pizza Place
1,Islington Avenue,43.744734,-79.239476,Anjappar Authentic Chettinadu Restaurant,43.741592,-79.226799,Indian Restaurant
2,Islington Avenue,43.744734,-79.239476,Charcoal Kebab House,43.757348,-79.238842,Chinese Restaurant
3,Islington Avenue,43.744734,-79.239476,The Comic Room,43.754258,-79.244319,Bookstore
4,Islington Avenue,43.744734,-79.239476,Thomson Memorial Park,43.758891,-79.2545,Park


Let's check how many venues were returned for each neighborhood.

In [94]:
etobicoke_venues.groupby('Neighborhood').count()

Unnamed: 0_level_0,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
"Albion Gardens,Beaumond Heights,Humbergate,Jamestown,Mount Olive,Silverstone,South Steeles,Thistletown",100,100,100,100,100,100
"Alderwood,Long Branch",45,45,45,45,45,45
"Bloordale Gardens,Eringate,Markland Wood,Old Burnhamthorpe",67,67,67,67,67,67
"Cloverdale,Islington,Martin Grove,Princess Gardens,West Deane Park",100,100,100,100,100,100
"Humber Bay Shores,Mimico South,New Toronto",100,100,100,100,100,100
"Humber Bay,King's Mill Park,Kingsway Park South East,Mimico NE,Old Mill South,The Queensway East,Royal York South East,Sunnylea",67,67,67,67,67,67
Islington Avenue,99,99,99,99,99,99
"Kingsview Village,Martin Grove Gardens,Richview Gardens,St. Phillips",100,100,100,100,100,100
"Kingsway Park South West,Mimico NW,The Queensway West,Royal York South West,South of Bloor",91,91,91,91,91,91
Northwest,100,100,100,100,100,100


Let's find out how many unique categories can be curated from all the returned venues.

In [95]:
print('There are {} unique categories.'.format(len(etobicoke_venues['Venue Category'].unique())))

There are 176 unique categories.


Let's analyze each neighborhood.

In [108]:
# one hot encoding
etobicoke_onehot = pd.get_dummies(etobicoke_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column
etobicoke_onehot['Neighborhood'] = etobicoke_venues['Neighborhood'] 

# move neighborhood column to the first column
cols = list(etobicoke_onehot.columns)
cols = ['Neighborhood'] + cols[:-1]
etobicoke_onehot = etobicoke_onehot[cols]

etobicoke_onehot.head()

Unnamed: 0,Neighborhood,Afghan Restaurant,American Restaurant,Aquarium,Arcade,Art Gallery,Arts & Crafts Store,Asian Restaurant,Auto Workshop,Automotive Shop,BBQ Joint,Badminton Court,Bakery,Bank,Bar,Baseball Field,Baseball Stadium,Basketball Stadium,Beer Bar,Beer Store,Big Box Store,Bistro,Bookstore,Boutique,Bowling Alley,Brazilian Restaurant,Breakfast Spot,Brewery,Bubble Tea Shop,Burger Joint,Burrito Place,Butcher,Café,Cantonese Restaurant,Caribbean Restaurant,Casino,Chinese Restaurant,Chocolate Shop,Church,Climbing Gym,Clothing Store,Cocktail Bar,Coffee Shop,College Rec Center,Comedy Club,Comfort Food Restaurant,Comic Shop,Concert Hall,Convenience Store,Cosmetics Shop,Creperie,Cuban Restaurant,Cupcake Shop,Dance Studio,Deli / Bodega,Department Store,Dessert Shop,Diner,Discount Store,Dog Run,Donut Shop,Eastern European Restaurant,Electronics Store,Eye Doctor,Falafel Restaurant,Farmers Market,Fast Food Restaurant,Filipino Restaurant,Fish & Chips Shop,Fish Market,Flea Market,Food & Drink Shop,Food Court,Food Truck,French Restaurant,Fried Chicken Joint,Furniture / Home Store,Garden,Garden Center,Gastropub,General Entertainment,Gift Shop,Golf Course,Gourmet Shop,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Hakka Restaurant,Hardware Store,History Museum,Hockey Arena,Home Service,Hookah Bar,Hotel,Ice Cream Shop,Indian Restaurant,Italian Restaurant,Japanese Restaurant,Jewelry Store,Juice Bar,Korean Restaurant,Lake,Laser Tag,Latin American Restaurant,Liquor Store,Mediterranean Restaurant,Mexican Restaurant,Middle Eastern Restaurant,Monument / Landmark,Movie Theater,Museum,Neighborhood.1,New American Restaurant,Nightclub,Optical Shop,Paper / Office Supplies Store,Park,Performing Arts Venue,Pet Store,Pharmacy,Pizza Place,Playground,Plaza,Pool Hall,Portuguese Restaurant,Pub,Racecourse,Ramen Restaurant,Record Shop,Rental Car Location,Restaurant,Salad Place,Salon / Barbershop,Sandwich Place,Scenic Lookout,Seafood Restaurant,Shoe Store,Shopping Mall,Shopping Plaza,Skating Rink,Smoke Shop,Smoothie Shop,Soccer Stadium,South American Restaurant,Spa,Spanish Restaurant,Speakeasy,Sporting Goods Shop,Sri Lankan Restaurant,Steakhouse,Supermarket,Sushi Restaurant,Swiss Restaurant,Szechuan Restaurant,Taco Place,Tapas Restaurant,Taxi,Tea Room,Thai Restaurant,Theater,Theme Park,Thrift / Vintage Store,Toy / Game Store,Trail,Train Station,Turkish Restaurant,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Warehouse Store,Whisky Bar,Wine Bar,Wings Joint,Women's Store
0,Islington Avenue,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,Islington Avenue,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,Islington Avenue,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,Islington Avenue,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,Islington Avenue,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,Islington Avenue,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,Islington Avenue,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,Islington Avenue,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,Islington Avenue,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,Islington Avenue,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
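
The `cols[:-1]` slice assumes the assigned `Neighborhood` column is the last one. If a venue category is itself called `Neighborhood`, the assignment overwrites that existing column in place instead of appending a new one, and the slice then drops an unrelated category; the `Neighborhood.1` header in the output above is consistent with that. A sketch on made-up venues that avoids depending on column position:

```python
import pandas as pd

venues = pd.DataFrame({
    'Neighborhood': ['Westmount', 'Westmount', 'Northwest'],
    'Venue Category': ['Park', 'Neighborhood', 'Yoga Studio'],  # made-up rows
})

# One indicator column per category
onehot = pd.get_dummies(venues[['Venue Category']], prefix='', prefix_sep='')

# A category named 'Neighborhood' collides with the label column; drop it first
onehot = onehot.drop(columns='Neighborhood', errors='ignore')

# Insert the label column at position 0 without touching the other columns
onehot.insert(0, 'Neighborhood', venues['Neighborhood'])
print(list(onehot.columns))
```

Dropping the clashing category discards its counts; renaming it before the insert would keep them.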


And let's examine the new dataframe size.

In [106]:
etobicoke_onehot.shape

(1037, 176)

Next, let's group rows by neighborhood and take the mean of the frequency of occurrence of each category.

In [107]:
etobicoke_grouped = etobicoke_onehot.groupby('Neighborhood').mean().reset_index()
etobicoke_grouped.head()

Unnamed: 0,Neighborhood,Afghan Restaurant,American Restaurant,Aquarium,Arcade,Art Gallery,Arts & Crafts Store,Asian Restaurant,Auto Workshop,Automotive Shop,BBQ Joint,Badminton Court,Bakery,Bank,Bar,Baseball Field,Baseball Stadium,Basketball Stadium,Beer Bar,Beer Store,Big Box Store,Bistro,Bookstore,Boutique,Bowling Alley,Brazilian Restaurant,Breakfast Spot,Brewery,Bubble Tea Shop,Burger Joint,Burrito Place,Butcher,Café,Cantonese Restaurant,Caribbean Restaurant,Casino,Chinese Restaurant,Chocolate Shop,Church,Climbing Gym,Clothing Store,Cocktail Bar,Coffee Shop,College Rec Center,Comedy Club,Comfort Food Restaurant,Comic Shop,Concert Hall,Convenience Store,Cosmetics Shop,Creperie,Cuban Restaurant,Cupcake Shop,Dance Studio,Deli / Bodega,Department Store,Dessert Shop,Diner,Discount Store,Dog Run,Donut Shop,Eastern European Restaurant,Electronics Store,Eye Doctor,Falafel Restaurant,Farmers Market,Fast Food Restaurant,Filipino Restaurant,Fish & Chips Shop,Fish Market,Flea Market,Food & Drink Shop,Food Court,Food Truck,French Restaurant,Fried Chicken Joint,Furniture / Home Store,Garden,Garden Center,Gastropub,General Entertainment,Gift Shop,Golf Course,Gourmet Shop,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Hakka Restaurant,Hardware Store,History Museum,Hockey Arena,Home Service,Hookah Bar,Hotel,Ice Cream Shop,Indian Restaurant,Italian Restaurant,Japanese Restaurant,Jewelry Store,Juice Bar,Korean Restaurant,Lake,Laser Tag,Latin American Restaurant,Liquor Store,Mediterranean Restaurant,Mexican Restaurant,Middle Eastern Restaurant,Monument / Landmark,Movie Theater,Museum,New American Restaurant,Nightclub,Optical Shop,Paper / Office Supplies Store,Park,Performing Arts Venue,Pet Store,Pharmacy,Pizza Place,Playground,Plaza,Pool Hall,Portuguese Restaurant,Pub,Racecourse,Ramen Restaurant,Record Shop,Rental Car Location,Restaurant,Salad Place,Salon / Barbershop,Sandwich Place,Scenic Lookout,Seafood Restaurant,Shoe Store,Shopping Mall,Shopping Plaza,Skating Rink,Smoke Shop,Smoothie Shop,Soccer Stadium,South American Restaurant,Spa,Spanish Restaurant,Speakeasy,Sporting Goods Shop,Sri Lankan Restaurant,Steakhouse,Supermarket,Sushi Restaurant,Swiss Restaurant,Szechuan Restaurant,Taco Place,Tapas Restaurant,Taxi,Tea Room,Thai Restaurant,Theater,Theme Park,Thrift / Vintage Store,Toy / Game Store,Trail,Train Station,Turkish Restaurant,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Warehouse Store,Whisky Bar,Wine Bar,Wings Joint,Women's Store,Yoga Studio
0,"Albion Gardens,Beaumond Heights,Humbergate,Jam...",0.0,0.01,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.02,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.03,0.0,0.0,0.03,0.01,0.0,0.03,0.0,0.0,0.0,0.0,0.01,0.01,0.01,0.01,0.0,0.09,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.03,0.0,0.01,0.02,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.06,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.03,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.01,0.02,0.02,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.02,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.01,0.01,0.04,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.04,0.0,0.03,0.0,0.02,0.0,0.01,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.02,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0
1,"Alderwood,Long Branch",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.044444,0.044444,0.0,0.0,0.0,0.0,0.0,0.022222,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.022222,0.0,0.0,0.022222,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.088889,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.022222,0.0,0.0,0.022222,0.0,0.0,0.0,0.0,0.066667,0.022222,0.0,0.044444,0.0,0.0,0.0,0.022222,0.022222,0.0,0.0,0.0,0.0,0.0,0.0,0.022222,0.0,0.0,0.0,0.022222,0.0,0.0,0.0,0.044444,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.044444,0.0,0.0,0.066667,0.066667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.044444,0.0,0.0,0.044444,0.0,0.0,0.0,0.044444,0.0,0.0,0.0,0.022222,0.0,0.0,0.0,0.0,0.0,0.022222,0.0,0.0,0.044444,0.022222,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.022222,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,"Bloordale Gardens,Eringate,Markland Wood,Old B...",0.0,0.0,0.0,0.0,0.0,0.0,0.044776,0.0,0.0,0.014925,0.0,0.074627,0.029851,0.0,0.0,0.0,0.0,0.0,0.014925,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.014925,0.014925,0.0,0.0,0.014925,0.014925,0.029851,0.0,0.059701,0.0,0.0,0.0,0.0,0.0,0.074627,0.0,0.0,0.0,0.0,0.0,0.014925,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.014925,0.0,0.014925,0.0,0.0,0.0,0.0,0.0,0.014925,0.0,0.0,0.014925,0.0,0.0,0.0,0.0,0.0,0.014925,0.014925,0.0,0.014925,0.0,0.0,0.0,0.014925,0.0,0.0,0.029851,0.029851,0.014925,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.014925,0.059701,0.0,0.0,0.029851,0.0,0.0,0.0,0.014925,0.014925,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.074627,0.0,0.0,0.014925,0.029851,0.0,0.0,0.0,0.0,0.014925,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.044776,0.0,0.0,0.0,0.0,0.014925,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.029851,0.0,0.014925,0.0,0.0,0.0,0.0,0.014925,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,"Cloverdale,Islington,Martin Grove,Princess Gar...",0.0,0.02,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.01,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.02,0.0,0.09,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.04,0.01,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.02,0.02,0.02,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.03,0.01,0.0,0.0,0.0,0.01,0.0,0.01,0.01,0.02,0.02,0.0,0.06,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.03,0.03,0.03,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.04,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.01,0.03,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.02,0.0,0.0,0.0,0.0,0.0
4,"Humber Bay Shores,Mimico South,New Toronto",0.0,0.03,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.03,0.02,0.02,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.02,0.0,0.0,0.02,0.0,0.03,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.08,0.0,0.0,0.01,0.0,0.0,0.02,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.03,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.02,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03,0.03,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.01,0.02,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.07,0.0,0.0,0.02,0.03,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.06,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.03,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.01,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01


Let's confirm the new size.

In [110]:
etobicoke_grouped.shape

(12, 176)
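
Because each one-hot row holds a single 1, the mean of a category column within a neighborhood equals the share of that neighborhood's venues that fall in the category. A sketch with made-up venues:

```python
import pandas as pd

# Five made-up venues, already one-hot encoded
onehot = pd.DataFrame({
    'Neighborhood': ['Westmount', 'Westmount', 'Westmount', 'Westmount', 'Northwest'],
    'Coffee Shop': [1, 1, 0, 0, 0],
    'Park':        [0, 0, 1, 1, 1],
})

# Mean per neighborhood = fraction of that neighborhood's venues per category
grouped = onehot.groupby('Neighborhood').mean().reset_index()
print(grouped)
```

Each row of the grouped frame sums to 1 across the category columns, which is a handy check that no venue was lost or double-counted.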

Let's print each neighborhood along with its top 5 most common venues.

In [111]:
num_top_venues = 5

for hood in etobicoke_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = etobicoke_grouped[etobicoke_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Albion Gardens,Beaumond Heights,Humbergate,Jamestown,Mount Olive,Silverstone,South Steeles,Thistletown----
                  venue  freq
0           Coffee Shop  0.09
1  Fast Food Restaurant  0.06
2            Restaurant  0.05
3           Pizza Place  0.04
4        Sandwich Place  0.04


----Alderwood,Long Branch----
           venue  freq
0    Coffee Shop  0.09
1       Pharmacy  0.07
2    Golf Course  0.07
3    Pizza Place  0.07
4  Grocery Store  0.04


----Bloordale Gardens,Eringate,Markland Wood,Old Burnhamthorpe----
                 venue  freq
0                 Park  0.07
1          Coffee Shop  0.07
2               Bakery  0.07
3  Japanese Restaurant  0.06
4   Chinese Restaurant  0.06


----Cloverdale,Islington,Martin Grove,Princess Gardens,West Deane Park----
                       venue  freq
0                Coffee Shop  0.09
1  Middle Eastern Restaurant  0.06
2         Chinese Restaurant  0.04
3             Sandwich Place  0.04
4               Burger Joint  0.04


----Hum

Let's put that into a pandas dataframe.

First, let's write a function to sort the venues in descending order.

In [113]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]
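
As a quick check of what `return_most_common_venues` returns, the sketch below applies the same function to a made-up row. The neighborhood name `Toyville` and its frequencies are hypothetical and are not part of the Etobicoke data.

```python
import pandas as pd

def return_most_common_venues(row, num_top_venues):
    # skip the first entry (the neighborhood name) and rank the remaining categories
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    return row_categories_sorted.index.values[0:num_top_venues]

# hypothetical row shaped like one row of etobicoke_grouped
toy_row = pd.Series({
    'Neighborhood': 'Toyville',
    'Bakery': 0.10,
    'Coffee Shop': 0.50,
    'Park': 0.30,
    'Pharmacy': 0.05,
})

top3 = return_most_common_venues(toy_row, 3)
print(list(top3))
```

When two categories have the same frequency, their relative order is not guaranteed, because `sort_values` uses a non-stable quicksort by default. This may explain why the top-5 printout and the top-10 table sometimes list tied venues in a different order.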

Now let's create the new dataframe and display the top 10 venues for each neighborhood.

In [115]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = etobicoke_grouped['Neighborhood']

for ind in np.arange(etobicoke_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(etobicoke_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,"Albion Gardens,Beaumond Heights,Humbergate,Jam...",Coffee Shop,Fast Food Restaurant,Restaurant,Pizza Place,Sandwich Place,Furniture / Home Store,Café,Seafood Restaurant,Breakfast Spot,Burger Joint
1,"Alderwood,Long Branch",Coffee Shop,Pizza Place,Golf Course,Pharmacy,Shopping Mall,Supermarket,Bank,Bakery,Liquor Store,Grocery Store
2,"Bloordale Gardens,Eringate,Markland Wood,Old B...",Coffee Shop,Bakery,Park,Chinese Restaurant,Japanese Restaurant,Sandwich Place,Asian Restaurant,Korean Restaurant,Caribbean Restaurant,Grocery Store
3,"Cloverdale,Islington,Martin Grove,Princess Gar...",Coffee Shop,Middle Eastern Restaurant,Fast Food Restaurant,Sandwich Place,Chinese Restaurant,Burger Joint,Pet Store,Pharmacy,Pizza Place,Supermarket
4,"Humber Bay Shores,Mimico South,New Toronto",Coffee Shop,Park,Sandwich Place,Restaurant,Sushi Restaurant,Italian Restaurant,Supermarket,Bakery,Pizza Place,Café
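
The `try`/`except` trick used to build the column names works for ten venues, but it would label a 21st column as `21th`. A minimal alternative, assuming we want correct English suffixes for any count, is sketched below. The helper name `ordinal` is hypothetical and does not appear in the notebook.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st'."""
    # 11th, 12th and 13th are irregular, so handle the teens first
    if 10 <= n % 100 <= 20:
        suffix = 'th'
    else:
        suffix = {1: 'st', 2: 'nd', 3: 'rd'}.get(n % 10, 'th')
    return '{}{}'.format(n, suffix)

num_top_venues = 10

# same column layout as in the cell above
columns = ['Neighborhood'] + [
    '{} Most Common Venue'.format(ordinal(i))
    for i in range(1, num_top_venues + 1)
]
print(columns)
```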


We run K-means to group the neighborhoods into 5 clusters.

In [116]:
# set number of clusters
kclusters = 5

etobicoke_grouped_clustering = etobicoke_grouped.drop('Neighborhood', axis=1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(etobicoke_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([4, 4, 0, 4, 4, 1, 1, 2, 3, 4], dtype=int32)
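
The value `kclusters = 5` is a choice, not something the data dictates. One common way to sanity-check it is to compare the K-means inertia (within-cluster sum of squares) for several values of k. The sketch below does this on a small synthetic frequency matrix. The data is hypothetical and only stands in for `etobicoke_grouped_clustering`.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.RandomState(0)

# hypothetical stand-in: 12 neighborhoods, 6 venue categories, 3 underlying groups
centers = np.array([
    [0.6, 0.2, 0.1, 0.1, 0.0, 0.0],
    [0.1, 0.1, 0.6, 0.2, 0.0, 0.0],
    [0.0, 0.0, 0.1, 0.1, 0.6, 0.2],
])
toy = np.repeat(centers, 4, axis=0) + rng.uniform(0, 0.05, size=(12, 6))
toy = toy / toy.sum(axis=1, keepdims=True)  # rows sum to 1, like venue frequencies

# fit K-means for k = 1..6 and record the inertia of each fit
k_values = list(range(1, 7))
inertias = []
for k in k_values:
    model = KMeans(n_clusters=k, random_state=0, n_init=10).fit(toy)
    inertias.append(model.inertia_)

for k, inertia in zip(k_values, inertias):
    print(k, round(inertia, 4))
```

If the inertia stops dropping noticeably after some k, adding more clusters is mostly splitting groups that already belong together. With only 12 neighborhoods, such a check is indicative at best.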

Let's create a new dataframe that includes the cluster as well as the top 10 venues for each neighborhood.

In [117]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

etobicoke_merged = etobicoke_data

# join the cluster labels and sorted venues onto etobicoke_data, which holds latitude/longitude for each neighborhood
etobicoke_merged = etobicoke_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

etobicoke_merged.head() 

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M9A,Etobicoke,Islington Avenue,43.744734,-79.239476,1,Fast Food Restaurant,Coffee Shop,Pizza Place,Pharmacy,Grocery Store,Sandwich Place,Chinese Restaurant,Bank,Beer Store,Liquor Store
1,M9B,Etobicoke,"Cloverdale,Islington,Martin Grove,Princess Gar...",43.750072,-79.295849,4,Coffee Shop,Middle Eastern Restaurant,Fast Food Restaurant,Sandwich Place,Chinese Restaurant,Burger Joint,Pet Store,Pharmacy,Pizza Place,Supermarket
2,M9C,Etobicoke,"Bloordale Gardens,Eringate,Markland Wood,Old B...",43.803762,-79.363452,0,Coffee Shop,Bakery,Park,Chinese Restaurant,Japanese Restaurant,Sandwich Place,Asian Restaurant,Korean Restaurant,Caribbean Restaurant,Grocery Store
3,M9P,Etobicoke,Westmount,43.648429,-79.38228,2,Hotel,Coffee Shop,Theater,Café,Concert Hall,Thai Restaurant,Restaurant,Sporting Goods Shop,Bookstore,Diner
4,M9R,Etobicoke,"Kingsview Village,Martin Grove Gardens,Richvie...",43.647927,-79.41975,2,Café,Bar,Bakery,Pizza Place,Italian Restaurant,Cocktail Bar,Coffee Shop,Asian Restaurant,Yoga Studio,New American Restaurant
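
`DataFrame.join` with `on='Neighborhood'` is a left join, so any neighborhood in `etobicoke_data` without a match in the venues table ends up with `NaN` in `Cluster Labels`, and the column is silently converted to float. The sketch below shows one way to detect and handle this on toy frames. The names `locations` and `labels` and all values in them are hypothetical.

```python
import pandas as pd

# hypothetical coordinates table (plays the role of etobicoke_data)
locations = pd.DataFrame({
    'Neighborhood': ['A', 'B', 'C'],
    'Latitude': [43.60, 43.65, 43.70],
    'Longitude': [-79.50, -79.55, -79.60],
})

# hypothetical cluster table; neighborhood 'B' returned no venues
labels = pd.DataFrame({
    'Cluster Labels': [1, 0],
    'Neighborhood': ['A', 'C'],
})

merged = locations.join(labels.set_index('Neighborhood'), on='Neighborhood')

# rows that found no match carry NaN in the joined columns
missing = merged[merged['Cluster Labels'].isna()]
print(missing['Neighborhood'].tolist())

# drop them and restore an integer label column before plotting
clean = merged.dropna(subset=['Cluster Labels']).copy()
clean['Cluster Labels'] = clean['Cluster Labels'].astype(int)
print(clean)
```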


Finally, let's visualize the resulting clusters on a map.

In [118]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(etobicoke_merged['Latitude'], etobicoke_merged['Longitude'], 
    etobicoke_merged['Neighborhood'], etobicoke_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster],
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters
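
In the color setup above, `x` and `ys` are only used to obtain `len(ys)`, which equals `kclusters`. A shorter way to get one hex color per cluster is sketched below, using only the matplotlib modules already imported in the notebook. The dictionary `cluster_colors` is a hypothetical helper, not part of the original code.

```python
import numpy as np
import matplotlib.cm as cm
import matplotlib.colors as colors

kclusters = 5

# one evenly spaced rainbow color per cluster, converted to hex strings
rainbow = [colors.rgb2hex(c) for c in cm.rainbow(np.linspace(0, 1, kclusters))]

# hypothetical lookup from cluster label to marker color
cluster_colors = {label: rainbow[label] for label in range(kclusters)}
for label, hex_color in cluster_colors.items():
    print(label, hex_color)
```

Indexing by the label itself keeps cluster 0 on the first color of the scale, which makes the map legend easier to reason about.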

Now we can examine each cluster and identify the venue categories that distinguish it.

#### Cluster 1

In [120]:
etobicoke_merged.loc[etobicoke_merged['Cluster Labels'] == 0, 
etobicoke_merged.columns[[1] + list(range(5, etobicoke_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
2,Etobicoke,0,Coffee Shop,Bakery,Park,Chinese Restaurant,Japanese Restaurant,Sandwich Place,Asian Restaurant,Korean Restaurant,Caribbean Restaurant,Grocery Store


The first cluster is dominated by places to eat: a coffee shop, a bakery and several Asian restaurants.

#### Cluster 2

In [121]:
etobicoke_merged.loc[etobicoke_merged['Cluster Labels'] == 1, 
etobicoke_merged.columns[[1] + list(range(5, etobicoke_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Etobicoke,1,Fast Food Restaurant,Coffee Shop,Pizza Place,Pharmacy,Grocery Store,Sandwich Place,Chinese Restaurant,Bank,Beer Store,Liquor Store
9,Etobicoke,1,Coffee Shop,Pizza Place,Sandwich Place,Fast Food Restaurant,Grocery Store,Pharmacy,Convenience Store,Vietnamese Restaurant,Bakery,Supermarket
10,Etobicoke,1,Coffee Shop,Fast Food Restaurant,Indian Restaurant,Pizza Place,Grocery Store,Bank,Pharmacy,Chinese Restaurant,Sandwich Place,Park


The second cluster combines places to eat with everyday services such as grocery stores, pharmacies and banks.

#### Cluster 3

In [123]:
etobicoke_merged.loc[etobicoke_merged['Cluster Labels'] == 2, 
etobicoke_merged.columns[[1] + list(range(5, etobicoke_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
3,Etobicoke,2,Hotel,Coffee Shop,Theater,Café,Concert Hall,Thai Restaurant,Restaurant,Sporting Goods Shop,Bookstore,Diner
4,Etobicoke,2,Café,Bar,Bakery,Pizza Place,Italian Restaurant,Cocktail Bar,Coffee Shop,Asian Restaurant,Yoga Studio,New American Restaurant


The third cluster contains places to sleep, places to eat and some entertainment like theaters. 

#### Cluster 4

In [125]:
etobicoke_merged.loc[etobicoke_merged['Cluster Labels'] == 3, 
etobicoke_merged.columns[[1] + list(range(5, etobicoke_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
11,Etobicoke,3,Hotel,American Restaurant,Coffee Shop,Sandwich Place,Fast Food Restaurant,Steakhouse,Pizza Place,Mediterranean Restaurant,Bank,Café


In the fourth cluster, Hotel is the most common venue category, followed by places to eat.

#### Cluster 5

In [126]:
etobicoke_merged.loc[etobicoke_merged['Cluster Labels'] == 4, 
etobicoke_merged.columns[[1] + list(range(5, etobicoke_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,Etobicoke,4,Coffee Shop,Middle Eastern Restaurant,Fast Food Restaurant,Sandwich Place,Chinese Restaurant,Burger Joint,Pet Store,Pharmacy,Pizza Place,Supermarket
5,Etobicoke,4,Coffee Shop,Park,Sandwich Place,Restaurant,Sushi Restaurant,Italian Restaurant,Supermarket,Bakery,Pizza Place,Café
6,Etobicoke,4,Coffee Shop,Fast Food Restaurant,Restaurant,Pizza Place,Sandwich Place,Furniture / Home Store,Café,Seafood Restaurant,Breakfast Spot,Burger Joint
7,Etobicoke,4,Coffee Shop,Pizza Place,Golf Course,Pharmacy,Shopping Mall,Supermarket,Bank,Bakery,Liquor Store,Grocery Store
8,Etobicoke,4,Coffee Shop,Convenience Store,Bank,Fast Food Restaurant,Sandwich Place,Grocery Store,Pizza Place,Pharmacy,Café,Beer Store


The fifth and final cluster is definitely the place to go for coffee: Coffee Shop is the most common venue in all five of its neighborhoods.
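
To put numbers behind these impressions, the sketch below counts the neighborhoods per cluster and the share of them whose most common venue is a coffee shop. The labels and venues are transcribed by hand from the cluster tables in this section. In the notebook itself the same columns could presumably be taken straight from `etobicoke_merged`, and the names `first_venues` and `summary` are hypothetical.

```python
import pandas as pd

# cluster label and 1st most common venue for rows 0..11, copied from the tables above
first_venues = pd.DataFrame({
    'Cluster Labels': [1, 4, 0, 2, 2, 4, 4, 4, 4, 1, 1, 3],
    '1st Most Common Venue': [
        'Fast Food Restaurant', 'Coffee Shop', 'Coffee Shop', 'Hotel',
        'Café', 'Coffee Shop', 'Coffee Shop', 'Coffee Shop',
        'Coffee Shop', 'Coffee Shop', 'Coffee Shop', 'Hotel',
    ],
})

# number of neighborhoods in each cluster
cluster_sizes = first_venues['Cluster Labels'].value_counts().sort_index()

# fraction of neighborhoods per cluster where Coffee Shop ranks first
is_coffee = first_venues['1st Most Common Venue'] == 'Coffee Shop'
coffee_share = is_coffee.groupby(first_venues['Cluster Labels']).mean()

summary = pd.DataFrame({
    'neighborhoods': cluster_sizes,
    'coffee_first_share': coffee_share.round(2),
})
print(summary)
```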


### This concludes my assignment. Hope you enjoyed it! 