# Downtown Toronto Cocktails

I'm looking to move to downtown Toronto. Before I do, though, I want to evaluate the beverage venues in the area. Understanding the types and number of beverage venues in each downtown neighborhood will help me narrow my housing search.

This workbook combines Toronto postal code data with venue information downloaded from Foursquare. The data is explored, prepared, and then modeled with k-means clustering, a common unsupervised machine learning algorithm. The result is a set of neighborhood groups that can easily be mapped.

### Contents

1. Download and Explore Postal Codes
2. Explore Venue Data from Foursquare
3. Analyze Venues in Each Neighborhood
4. Cluster Neighborhoods
5. Examine Clusters
6. Wrap-up

First, load needed dependencies.

In [1]:
import pandas as pd
import numpy as np

import requests # library to handle requests
from pandas.io.json import json_normalize # transform JSON into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

!conda install -c conda-forge folium
import folium # map rendering library

# import k-means from clustering stage
from sklearn.cluster import KMeans


## 1. Download and Explore Postal Codes

#### Scrape Toronto postal code data from the web and create a dataframe.

In [2]:
web_tables = pd.read_html("https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M")
postal_codes = web_tables[0]

View the size and partial dataframe values.

In [3]:
print (postal_codes.shape)

postal_codes.head()

(180, 3)


Unnamed: 0,Postal Code,Borough,Neighborhood
0,M1A,Not assigned,Not assigned
1,M2A,Not assigned,Not assigned
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,"Regent Park, Harbourfront"


Remove the "Not assigned" boroughs.

In [4]:
postal_codes = postal_codes[postal_codes["Borough"] != "Not assigned"].reset_index(drop = True)

postal_codes.head()

Unnamed: 0,Postal Code,Borough,Neighborhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,"Regent Park, Harbourfront"
3,M6A,North York,"Lawrence Manor, Lawrence Heights"
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government"


Check if any values are still missing.

In [5]:
postal_codes.isnull().values.any()

False

Check out basic counts and shape of dataframe.

In [6]:
postal_codes.describe()

Unnamed: 0,Postal Code,Borough,Neighborhood
count,103,103,103
unique,103,10,99
top,M9P,North York,Downsview
freq,1,24,4


#### Find the coordinates of each postal code using the provided .csv file because geocoder was unreliable.

In [7]:
lat_long_df = pd.read_csv("https://cocl.us/Geospatial_data")

print(lat_long_df.shape)

lat_long_df.head()

(103, 3)


Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


Let's merge the two dataframes and check out the result.

In [8]:
postal_codes = postal_codes.join(lat_long_df.set_index("Postal Code"), on="Postal Code")
            
print(postal_codes.shape)

postal_codes.head()

(103, 5)


Unnamed: 0,Postal Code,Borough,Neighborhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.753259,-79.329656
1,M4A,North York,Victoria Village,43.725882,-79.315572
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636
3,M6A,North York,"Lawrence Manor, Lawrence Heights",43.718518,-79.464763
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494
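
A left join like the one above leaves `NaN` coordinates for any postal code that is missing from the coordinates file, and it does so silently. The following is a minimal sketch of one way to check for that, using `DataFrame.merge` with its `validate` and `indicator` options. The two small frames are toy stand-ins for `postal_codes` and `lat_long_df`, and `M9Z` is a made-up code added only to trigger the check.

```python
import pandas as pd

# Toy stand-ins for postal_codes and lat_long_df (illustration only).
codes = pd.DataFrame({
    "Postal Code": ["M3A", "M4A", "M9Z"],  # "M9Z" is a made-up code with no coordinates
    "Borough": ["North York", "North York", "Example"],
})
coords = pd.DataFrame({
    "Postal Code": ["M3A", "M4A"],
    "Latitude": [43.753259, 43.725882],
    "Longitude": [-79.329656, -79.315572],
})

# validate raises an error if a postal code is duplicated on either side;
# indicator adds a "_merge" column that flags rows without a match.
merged = codes.merge(coords, on="Postal Code", how="left",
                     validate="one_to_one", indicator=True)

unmatched = merged.loc[merged["_merge"] == "left_only", "Postal Code"].tolist()
print("Postal codes without coordinates:", unmatched)
```

In this notebook the shapes (103 rows before and after) already suggest a clean match, so a check like this is mostly a safeguard for when the source tables change.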


#### Use a map to display the postal codes and their respective boroughs and neighborhoods.

In [9]:
#Toronto coordinates
latitude = 43.653316
longitude = -79.384100

# create map of Toronto using latitude and longitude values
map_toronto = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, postal_code, borough in zip(postal_codes['Latitude'], postal_codes['Longitude'], postal_codes['Postal Code'], postal_codes['Borough']):
    label = '{}, {}'.format(postal_code, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto)  
    
map_toronto

#### That's a lot of postal codes! Let's focus on Downtown Toronto.

Create a new (focused) dataframe to work with.

In [10]:
dt_postal_codes = postal_codes[postal_codes["Borough"] == "Downtown Toronto"].reset_index(drop=True)

dt_postal_codes

Unnamed: 0,Postal Code,Borough,Neighborhood,Latitude,Longitude
0,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636
1,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494
2,M5B,Downtown Toronto,"Garden District, Ryerson",43.657162,-79.378937
3,M5C,Downtown Toronto,St. James Town,43.651494,-79.375418
4,M5E,Downtown Toronto,Berczy Park,43.644771,-79.373306
5,M5G,Downtown Toronto,Central Bay Street,43.657952,-79.387383
6,M6G,Downtown Toronto,Christie,43.669542,-79.422564
7,M5H,Downtown Toronto,"Richmond, Adelaide, King",43.650571,-79.384568
8,M5J,Downtown Toronto,"Harbourfront East, Union Station, Toronto Islands",43.640816,-79.381752
9,M5K,Downtown Toronto,"Toronto Dominion Centre, Design Exchange",43.647177,-79.381576


Create a downtown postal code map.

In [11]:
# Latitude and longitude values were defined above.
map_downtown_toronto = folium.Map(location=[latitude, longitude], zoom_start=13)

# add markers to map
for lat, lng, postal_code, borough in zip(dt_postal_codes['Latitude'], dt_postal_codes['Longitude'], dt_postal_codes['Postal Code'], dt_postal_codes['Borough']):
    label = '{}, {}'.format(postal_code, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_downtown_toronto)  

map_downtown_toronto

The M5V postal code appears to be centered within the Billy Bishop airport. Because moving to the middle of an airport is highly unlikely, let's move the M5V postal code across the channel to Little Norway Park (43.634776, -79.398437).

In [12]:
dt_postal_codes.at[13, "Latitude"] = 43.634776
dt_postal_codes.at[13, "Longitude"] = -79.398437
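
The cell above relies on M5V sitting at row 13, which would break if the source table were reordered. A minimal sketch of a lookup by postal code instead, shown on a toy frame (`toy_codes` is a hypothetical stand-in for `dt_postal_codes`, and the `NaN` values are placeholders rather than real coordinates):

```python
import pandas as pd

# Toy stand-in for dt_postal_codes; NaN marks the coordinates to be replaced.
toy_codes = pd.DataFrame({
    "Postal Code": ["M5A", "M5V"],
    "Latitude": [43.65426, float("nan")],
    "Longitude": [-79.360636, float("nan")],
})

# Select the row by its postal code rather than by position.
is_m5v = toy_codes["Postal Code"] == "M5V"
assert is_m5v.sum() == 1, "expected exactly one M5V row"

# Little Norway Park coordinates, as given in the text above.
toy_codes.loc[is_m5v, ["Latitude", "Longitude"]] = [43.634776, -79.398437]
print(toy_codes)
```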

Redraw the downtown postal code map. Also, let's display the neighborhood(s) instead of the borough. We already know the borough (Downtown Toronto).

In [13]:
# Latitude and longitude values were defined above.
map_downtown_toronto = folium.Map(location=[latitude, longitude], zoom_start=13)

# add markers to map
for lat, lng, postal_code, neighborhood in zip(dt_postal_codes['Latitude'], dt_postal_codes['Longitude'], dt_postal_codes['Postal Code'], dt_postal_codes['Neighborhood']):
    label = '{}, {}'.format(postal_code, neighborhood)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_downtown_toronto)  

map_downtown_toronto

That's better.

## 2. Explore Venue Data from Foursquare

#### Use the Foursquare API to download venue data.

In [37]:
CLIENT_ID = '' # your Foursquare ID
CLIENT_SECRET = '' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: 
CLIENT_SECRET:
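
Hard-coding and printing credentials makes it easy to leak them when a notebook is shared. One possible alternative, sketched below, reads them from environment variables and masks them when printing. The variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and both helper functions are assumptions made for this sketch, not part of the Foursquare API.

```python
import os


def load_foursquare_credentials(env=os.environ):
    """Return (client_id, client_secret) read from environment variables.

    FOURSQUARE_CLIENT_ID and FOURSQUARE_CLIENT_SECRET are hypothetical
    variable names chosen for this sketch.
    """
    client_id = env.get("FOURSQUARE_CLIENT_ID", "")
    client_secret = env.get("FOURSQUARE_CLIENT_SECRET", "")
    if not client_id or not client_secret:
        raise RuntimeError(
            "Set FOURSQUARE_CLIENT_ID and FOURSQUARE_CLIENT_SECRET first."
        )
    return client_id, client_secret


def mask(secret, visible=4):
    """Replace all but the last few characters of a secret with asterisks."""
    return "*" * max(len(secret) - visible, 0) + secret[-visible:]
```

With this in place, the cell could call `CLIENT_ID, CLIENT_SECRET = load_foursquare_credentials()` and print `mask(CLIENT_SECRET)` instead of the raw value.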


Create a function to collect venues within a specific area.

In [15]:
def getNearbyVenues(names, latitudes, longitudes, radius, LIMIT):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
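
The function above assumes every response contains at least one group and every venue has at least one category. A minimal sketch of a more defensive variant of the two core steps follows: building the URL with `urllib.parse.urlencode` and extracting rows from the JSON payload. The helper names `build_explore_url` and `extract_venue_rows` are hypothetical, the second venue in the sample payload is made up, and the assumption that an error response simply lacks `groups` has not been checked against the API.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Same endpoint as in getNearbyVenues above.
EXPLORE_ENDPOINT = "https://api.foursquare.com/v2/venues/explore"


def build_explore_url(lat, lng, radius, limit, client_id, client_secret, version):
    """Build the explore URL, letting urlencode handle escaping."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": "{},{}".format(lat, lng),
        "radius": radius,
        "limit": limit,
    }
    return EXPLORE_ENDPOINT + "?" + urlencode(params)


def extract_venue_rows(name, lat, lng, payload):
    """Turn one decoded JSON payload into row tuples, tolerating missing parts."""
    groups = payload.get("response", {}).get("groups", [])
    items = groups[0].get("items", []) if groups else []
    rows = []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories") or []
        # Use None instead of raising IndexError when a venue has no category.
        category = categories[0]["name"] if categories else None
        rows.append((name, lat, lng, venue["name"],
                     venue["location"]["lat"], venue["location"]["lng"], category))
    return rows


# Usage with placeholder credentials and a hand-written sample payload.
url = build_explore_url(43.65426, -79.360636, 1000, 100, "id", "secret", "20180605")
query = parse_qs(urlsplit(url).query)

sample_payload = {"response": {"groups": [{"items": [
    {"venue": {"name": "Tandem Coffee",
               "location": {"lat": 43.653559, "lng": -79.361809},
               "categories": [{"name": "Coffee Shop"}]}},
    {"venue": {"name": "Uncategorized Example",  # made-up venue without a category
               "location": {"lat": 43.65, "lng": -79.36},
               "categories": []}},
]}]}}
rows = extract_venue_rows("Regent Park, Harbourfront", 43.65426, -79.360636, sample_payload)
```

Rows with a `None` category would simply drop out of any later filter on `Venue Category`.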

Now, let's get up to 100 venues within a 1,000-meter radius of each postal code's coordinates.

In [16]:
dt_venues = getNearbyVenues(names = dt_postal_codes['Neighborhood'],
                                   latitudes = dt_postal_codes['Latitude'],
                                   longitudes = dt_postal_codes['Longitude'],
                                   radius = 1000,
                                   LIMIT = 100
                                  )


Regent Park, Harbourfront
Queen's Park, Ontario Provincial Government
Garden District, Ryerson
St. James Town
Berczy Park
Central Bay Street
Christie
Richmond, Adelaide, King
Harbourfront East, Union Station, Toronto Islands
Toronto Dominion Centre, Design Exchange
Commerce Court, Victoria Hotel
University of Toronto, Harbord
Kensington Market, Chinatown, Grange Park
CN Tower, King and Spadina, Railway Lands, Harbourfront West, Bathurst Quay, South Niagara, Island airport
Rosedale
Stn A PO Boxes
St. James Town, Cabbagetown
First Canadian Place, Underground city
Church and Wellesley


Let's look at the initial data.

In [17]:
print (dt_venues.shape)

dt_venues.head()

(1736, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,"Regent Park, Harbourfront",43.65426,-79.360636,Roselle Desserts,43.653447,-79.362017,Bakery
1,"Regent Park, Harbourfront",43.65426,-79.360636,Tandem Coffee,43.653559,-79.361809,Coffee Shop
2,"Regent Park, Harbourfront",43.65426,-79.360636,Impact Kitchen,43.656369,-79.35698,Restaurant
3,"Regent Park, Harbourfront",43.65426,-79.360636,The Distillery Historic District,43.650244,-79.359323,Historic Site
4,"Regent Park, Harbourfront",43.65426,-79.360636,Corktown Common,43.655618,-79.356211,Park


Let's check how many venues were returned for each neighborhood.

In [18]:
dt_venues.groupby('Neighborhood').count()

Unnamed: 0_level_0,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Berczy Park,100,100,100,100,100,100
"CN Tower, King and Spadina, Railway Lands, Harbourfront West, Bathurst Quay, South Niagara, Island airport",78,78,78,78,78,78
Central Bay Street,100,100,100,100,100,100
Christie,100,100,100,100,100,100
Church and Wellesley,100,100,100,100,100,100
"Commerce Court, Victoria Hotel",100,100,100,100,100,100
"First Canadian Place, Underground city",100,100,100,100,100,100
"Garden District, Ryerson",100,100,100,100,100,100
"Harbourfront East, Union Station, Toronto Islands",100,100,100,100,100,100
"Kensington Market, Chinatown, Grange Park",100,100,100,100,100,100


Let's find out how many unique categories can be curated from all the returned venues.

In [19]:
print('There are {} unique categories.'.format(len(dt_venues['Venue Category'].unique())))

There are 208 unique categories.


Let's list the venue categories.

In [20]:
sorted(dt_venues['Venue Category'].unique())

['Airport',
 'Airport Food Court',
 'Airport Lounge',
 'Airport Service',
 'Airport Terminal',
 'American Restaurant',
 'Animal Shelter',
 'Aquarium',
 'Art Gallery',
 'Art Museum',
 'Arts & Crafts Store',
 'Asian Restaurant',
 'Athletics & Sports',
 'Auto Dealership',
 'BBQ Joint',
 'Bakery',
 'Bank',
 'Bar',
 'Baseball Stadium',
 'Basketball Stadium',
 'Beach',
 'Beer Bar',
 'Beer Store',
 'Belgian Restaurant',
 'Bike Shop',
 'Bistro',
 'Bookstore',
 'Botanical Garden',
 'Brazilian Restaurant',
 'Breakfast Spot',
 'Brewery',
 'Bridal Shop',
 'Bubble Tea Shop',
 'Burger Joint',
 'Burrito Place',
 'Café',
 'Camera Store',
 'Candy Store',
 'Caribbean Restaurant',
 'Cheese Shop',
 'Chinese Restaurant',
 'Chocolate Shop',
 'Clothing Store',
 'Cocktail Bar',
 'Coffee Shop',
 'College Gym',
 'College Rec Center',
 'College Theater',
 'Comedy Club',
 'Comfort Food Restaurant',
 'Comic Shop',
 'Concert Hall',
 'Convenience Store',
 'Cosmetics Shop',
 'Creperie',
 'Cupcake Shop',
 'Dance Studi

#### Now let's create a separate dataframe consisting of various beverage venues from the dt_venues data.

In [21]:
# Venue categories treated as beverage venues
booze_categories = [
    "Bar", "Beer Bar", "Beer Store", "Bistro", "Brewery", "Cocktail Bar",
    "Gastropub", "Gay Bar", "Hotel Bar", "Irish Pub", "Jazz Club",
    "Karaoke Bar", "Liquor Store", "Lounge", "Music Venue", "Nightclub",
    "Pub", "Sake Bar", "Speakeasy", "Sports Bar", "Wine Bar",
]

dt_booze = dt_venues[dt_venues["Venue Category"].isin(booze_categories)].reset_index(drop=True)

print(dt_booze.shape)

dt_booze.head()

(159, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,"Regent Park, Harbourfront",43.65426,-79.360636,LCBO,43.650982,-79.365361,Liquor Store
1,"Regent Park, Harbourfront",43.65426,-79.360636,Dominion Pub and Kitchen,43.656919,-79.358967,Pub
2,"Regent Park, Harbourfront",43.65426,-79.360636,Berkeley Bistro,43.64996,-79.363888,Gastropub
3,"Regent Park, Harbourfront",43.65426,-79.360636,Mill St. Brew Pub,43.650353,-79.358489,Pub
4,"Regent Park, Harbourfront",43.65426,-79.360636,The Aviary,43.653634,-79.354662,Pub


## 3. Analyze Venues in Each Neighborhood

First, create binary values for each venue category.

In [22]:
# one hot encoding
dt_onehot = pd.get_dummies(dt_booze[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
dt_onehot['Neighborhood'] = dt_booze['Neighborhood'] 

# move neighborhood column to the first column
first_column = dt_onehot.pop("Neighborhood")
dt_onehot.insert(0, "Neighborhood", first_column)

dt_onehot.head()

Unnamed: 0,Neighborhood,Bar,Beer Bar,Beer Store,Bistro,Brewery,Cocktail Bar,Gastropub,Gay Bar,Hotel Bar,Jazz Club,Karaoke Bar,Liquor Store,Lounge,Music Venue,Pub,Sake Bar,Speakeasy,Sports Bar,Wine Bar
0,"Regent Park, Harbourfront",0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0
1,"Regent Park, Harbourfront",0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0
2,"Regent Park, Harbourfront",0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0
3,"Regent Park, Harbourfront",0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0
4,"Regent Park, Harbourfront",0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0


And let's examine the new dataframe size.

In [23]:
dt_onehot.shape

(159, 20)

Next, let's group the rows by neighborhood and take the mean of each one-hot column, which gives the share of each category among a neighborhood's beverage venues.

In [24]:
dt_grouped = dt_onehot.groupby('Neighborhood').mean().reset_index()
dt_grouped

Unnamed: 0,Neighborhood,Bar,Beer Bar,Beer Store,Bistro,Brewery,Cocktail Bar,Gastropub,Gay Bar,Hotel Bar,Jazz Club,Karaoke Bar,Liquor Store,Lounge,Music Venue,Pub,Sake Bar,Speakeasy,Sports Bar,Wine Bar
0,Berczy Park,0.0,0.166667,0.0,0.083333,0.0,0.166667,0.166667,0.0,0.0,0.083333,0.0,0.166667,0.083333,0.0,0.083333,0.0,0.0,0.0,0.0
1,"CN Tower, King and Spadina, Railway Lands, Har...",0.166667,0.0,0.166667,0.0,0.166667,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.166667,0.0,0.0
2,Central Bay Street,0.166667,0.166667,0.0,0.0,0.0,0.0,0.333333,0.166667,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0
3,Christie,0.181818,0.0,0.0,0.0,0.0,0.272727,0.090909,0.0,0.0,0.0,0.090909,0.090909,0.0,0.0,0.090909,0.0,0.0,0.090909,0.090909
4,Church and Wellesley,0.142857,0.142857,0.0,0.0,0.0,0.0,0.142857,0.285714,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.142857,0.0,0.0,0.0
5,"Commerce Court, Victoria Hotel",0.0,0.1,0.0,0.0,0.0,0.2,0.3,0.0,0.0,0.1,0.0,0.0,0.1,0.0,0.1,0.0,0.1,0.0,0.0
6,"First Canadian Place, Underground city",0.0,0.166667,0.0,0.0,0.0,0.166667,0.166667,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.166667,0.0,0.166667,0.0,0.0
7,"Garden District, Ryerson",0.111111,0.111111,0.0,0.0,0.0,0.0,0.555556,0.111111,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.0
8,"Harbourfront East, Union Station, Toronto Islands",0.0,0.1,0.0,0.1,0.3,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.1,0.0,0.1,0.0,0.1,0.1,0.0
9,"Kensington Market, Chinatown, Grange Park",0.538462,0.153846,0.0,0.0,0.0,0.076923,0.0,0.0,0.0,0.076923,0.0,0.0,0.0,0.0,0.0,0.0,0.076923,0.0,0.076923
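
Each value in the table above is a share, not a count: the mean of a 0/1 column within a neighborhood equals the number of venues in that category divided by the neighborhood's total beverage venues. Berczy Park's 0.166667 for Beer Bar, for example, is consistent with 2 of 12 venues. A dependency-free sketch of the same calculation on toy data (the neighborhoods `A` and `B` and their venues are made up):

```python
from collections import Counter, defaultdict

# Toy (neighborhood, category) pairs; not the real Foursquare results.
venues = [
    ("A", "Pub"), ("A", "Pub"), ("A", "Bar"), ("A", "Wine Bar"),
    ("B", "Bar"), ("B", "Bar"),
]

# Count categories per neighborhood.
counts = defaultdict(Counter)
for hood, category in venues:
    counts[hood][category] += 1

# Divide each count by the neighborhood total, filling absent categories with 0.
categories = sorted({category for _, category in venues})
frequencies = {
    hood: {cat: counter[cat] / sum(counter.values()) for cat in categories}
    for hood, counter in counts.items()
}

for hood, shares in frequencies.items():
    print(hood, shares)
```

Because every row sums to 1, a neighborhood with 6 beverage venues and one with 13 carry the same weight in the clustering step; only the mix of categories matters, not the total number of venues.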


Let's print each neighborhood along with the top 5 most common venues.

In [25]:
num_top_venues = 5

for hood in dt_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = dt_grouped[dt_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Berczy Park----
          venue  freq
0  Cocktail Bar  0.17
1     Gastropub  0.17
2      Beer Bar  0.17
3  Liquor Store  0.17
4     Jazz Club  0.08


----CN Tower, King and Spadina, Railway Lands, Harbourfront West, Bathurst Quay, South Niagara, Island airport----
        venue  freq
0         Bar  0.17
1  Beer Store  0.17
2     Brewery  0.17
3   Speakeasy  0.17
4         Pub  0.17


----Central Bay Street----
         venue  freq
0    Gastropub  0.33
1          Bar  0.17
2      Gay Bar  0.17
3  Music Venue  0.17
4     Beer Bar  0.17


----Christie----
          venue  freq
0  Cocktail Bar  0.27
1           Bar  0.18
2   Karaoke Bar  0.09
3    Sports Bar  0.09
4           Pub  0.09


----Church and Wellesley----
       venue  freq
0    Gay Bar  0.29
1        Bar  0.14
2   Sake Bar  0.14
3        Pub  0.14
4  Gastropub  0.14


----Commerce Court, Victoria Hotel----
          venue  freq
0     Gastropub   0.3
1  Cocktail Bar   0.2
2     Jazz Club   0.1
3     Speakeasy   0.1
4        

#### Let's put that into a dataframe.

First, let's write a function to sort the venues in descending order.

In [26]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

Now let's create the new dataframe and display the top 10 venues for each neighborhood.

In [27]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = dt_grouped['Neighborhood']

for ind in np.arange(dt_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(dt_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Berczy Park,Beer Bar,Liquor Store,Cocktail Bar,Gastropub,Jazz Club,Pub,Bistro,Lounge,Gay Bar,Beer Store
1,"CN Tower, King and Spadina, Railway Lands, Har...",Bar,Speakeasy,Pub,Beer Store,Brewery,Hotel Bar,Gay Bar,Beer Bar,Bistro,Cocktail Bar
2,Central Bay Street,Gastropub,Bar,Beer Bar,Music Venue,Gay Bar,Hotel Bar,Beer Store,Bistro,Brewery,Cocktail Bar
3,Christie,Cocktail Bar,Bar,Liquor Store,Gastropub,Sports Bar,Karaoke Bar,Wine Bar,Pub,Lounge,Music Venue
4,Church and Wellesley,Gay Bar,Bar,Beer Bar,Sake Bar,Pub,Gastropub,Beer Store,Bistro,Brewery,Cocktail Bar
5,"Commerce Court, Victoria Hotel",Gastropub,Cocktail Bar,Jazz Club,Speakeasy,Beer Bar,Pub,Lounge,Gay Bar,Beer Store,Bistro
6,"First Canadian Place, Underground city",Speakeasy,Beer Bar,Pub,Lounge,Cocktail Bar,Gastropub,Wine Bar,Gay Bar,Beer Store,Bistro
7,"Garden District, Ryerson",Gastropub,Bar,Beer Bar,Music Venue,Gay Bar,Hotel Bar,Beer Store,Bistro,Brewery,Cocktail Bar
8,"Harbourfront East, Union Station, Toronto Islands",Brewery,Speakeasy,Beer Bar,Pub,Lounge,Liquor Store,Bistro,Sports Bar,Wine Bar,Gastropub
9,"Kensington Market, Chinatown, Grange Park",Bar,Beer Bar,Cocktail Bar,Wine Bar,Speakeasy,Jazz Club,Pub,Music Venue,Lounge,Liquor Store
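
The column labels above come from a `try`/`except` around a three-item suffix list, which is fine for ten columns but would produce "21th" if `num_top_venues` ever exceeded 20. A small sketch of a general ordinal helper (the function name `ordinal` is an assumption for this example):

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st', 11 -> '11th'."""
    if 10 <= n % 100 <= 20:
        suffix = "th"  # 11th, 12th, 13th and the rest of the teens
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return "{}{}".format(n, suffix)


num_top_venues = 10
columns = ["Neighborhood"] + [
    "{} Most Common Venue".format(ordinal(i)) for i in range(1, num_top_venues + 1)
]
print(columns)
```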


## 4. Cluster Neighborhoods

Run *k*-means to group the neighborhoods into six clusters.

In [28]:
# set number of clusters
kclusters = 6

# To perform the clustering, the "Neighborhood" column needs to be removed.
dt_grouped_clustering = dt_grouped.drop(columns='Neighborhood')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(dt_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:18] 

array([0, 3, 4, 0, 4, 0, 0, 4, 3, 2, 4, 3, 0, 1, 0, 5, 0, 0], dtype=int32)
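
For intuition about what the labels above mean, here is a dependency-free sketch of the basic k-means loop (Lloyd's algorithm): assign each row to its nearest centroid, recompute each centroid as the mean of its rows, and repeat until the assignments stop changing. This is an illustration only, not scikit-learn's implementation, which adds smarter initialization (k-means++ by default). The four toy rows are made-up category shares, and the starting centroids are picked by hand.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, centroids, max_iter=100):
    """Plain Lloyd's algorithm; `centroids` are the starting centers."""
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point takes the label of its nearest centroid.
        new_labels = [
            min(range(len(centroids)), key=lambda k: squared_distance(p, centroids[k]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its assigned points.
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[k] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return labels, centroids


# Toy rows of (Bar share, Gastropub share, Pub share); values are made up.
points = [(0.6, 0.1, 0.3), (0.5, 0.2, 0.3), (0.1, 0.6, 0.3), (0.0, 0.7, 0.3)]
labels, centroids = kmeans(points, centroids=[points[0], points[2]])
print(labels)
```

The cluster numbers themselves carry no meaning; they only say which rows ended up together. With six clusters and 19 neighborhoods, some clusters can be expected to hold a single neighborhood.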

Let's create a new dataframe that includes the cluster as well as the top 10 venues for each neighborhood.

In [29]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

dt_merged = dt_postal_codes

# join the sorted venue table onto dt_postal_codes so each neighborhood has coordinates, a cluster label, and its top venues
dt_merged = dt_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

dt_merged

Unnamed: 0,Postal Code,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636,3,Pub,Gastropub,Bar,Beer Store,Brewery,Liquor Store,Karaoke Bar,Gay Bar,Beer Bar,Bistro
1,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494,4,Gastropub,Bar,Beer Bar,Pub,Gay Bar,Hotel Bar,Beer Store,Bistro,Brewery,Cocktail Bar
2,M5B,Downtown Toronto,"Garden District, Ryerson",43.657162,-79.378937,4,Gastropub,Bar,Beer Bar,Music Venue,Gay Bar,Hotel Bar,Beer Store,Bistro,Brewery,Cocktail Bar
3,M5C,Downtown Toronto,St. James Town,43.651494,-79.375418,0,Gastropub,Cocktail Bar,Jazz Club,Pub,Music Venue,Bistro,Liquor Store,Gay Bar,Beer Bar,Beer Store
4,M5E,Downtown Toronto,Berczy Park,43.644771,-79.373306,0,Beer Bar,Liquor Store,Cocktail Bar,Gastropub,Jazz Club,Pub,Bistro,Lounge,Gay Bar,Beer Store
5,M5G,Downtown Toronto,Central Bay Street,43.657952,-79.387383,4,Gastropub,Bar,Beer Bar,Music Venue,Gay Bar,Hotel Bar,Beer Store,Bistro,Brewery,Cocktail Bar
6,M6G,Downtown Toronto,Christie,43.669542,-79.422564,0,Cocktail Bar,Bar,Liquor Store,Gastropub,Sports Bar,Karaoke Bar,Wine Bar,Pub,Lounge,Music Venue
7,M5H,Downtown Toronto,"Richmond, Adelaide, King",43.650571,-79.384568,0,Beer Bar,Speakeasy,Pub,Gastropub,Wine Bar,Gay Bar,Beer Store,Bistro,Brewery,Cocktail Bar
8,M5J,Downtown Toronto,"Harbourfront East, Union Station, Toronto Islands",43.640816,-79.381752,3,Brewery,Speakeasy,Beer Bar,Pub,Lounge,Liquor Store,Bistro,Sports Bar,Wine Bar,Gastropub
9,M5K,Downtown Toronto,"Toronto Dominion Centre, Design Exchange",43.647177,-79.381576,0,Cocktail Bar,Jazz Club,Speakeasy,Beer Bar,Pub,Lounge,Bistro,Brewery,Gastropub,Gay Bar


Finally, let's visualize the resulting clusters.

In [30]:
# create map. Coordinates declared above.
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=13)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(dt_merged['Latitude'], dt_merged['Longitude'], dt_merged['Neighborhood'], dt_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster + 1), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters
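
One detail of the cell above is easy to miss: the popup shows `cluster + 1`, but the color is taken from `rainbow[cluster-1]`. For label 0 that is index -1, which Python resolves to the last color in the list. The sketch below reproduces that mapping with color names in place of hex codes. The names and their order are inferred from the cluster headings in the next section, so treat them as an approximation of the actual palette.

```python
# Approximate names for the six rainbow colors, in palette order (inferred).
palette = ["Purple", "Blue", "Sea Green", "Pale Green", "Orange", "Red"]


def marker_color(label):
    """Mirror rainbow[cluster-1]: label 0 wraps around to the last color."""
    return palette[label - 1]


for label in range(len(palette)):
    print("label {} -> Cluster {} ({})".format(label, label + 1, marker_color(label)))
```

So k-means label 0 is shown as Cluster 1 in red, label 1 as Cluster 2 in purple, and so on.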

## 5. Examine Clusters

Cluster 1 (Red)

In [31]:
dt_merged.loc[dt_merged['Cluster Labels'] == 0, dt_merged.columns[[2] + list(range(6, dt_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
3,St. James Town,Gastropub,Cocktail Bar,Jazz Club,Pub,Music Venue,Bistro,Liquor Store,Gay Bar,Beer Bar,Beer Store
4,Berczy Park,Beer Bar,Liquor Store,Cocktail Bar,Gastropub,Jazz Club,Pub,Bistro,Lounge,Gay Bar,Beer Store
6,Christie,Cocktail Bar,Bar,Liquor Store,Gastropub,Sports Bar,Karaoke Bar,Wine Bar,Pub,Lounge,Music Venue
7,"Richmond, Adelaide, King",Beer Bar,Speakeasy,Pub,Gastropub,Wine Bar,Gay Bar,Beer Store,Bistro,Brewery,Cocktail Bar
9,"Toronto Dominion Centre, Design Exchange",Cocktail Bar,Jazz Club,Speakeasy,Beer Bar,Pub,Lounge,Bistro,Brewery,Gastropub,Gay Bar
10,"Commerce Court, Victoria Hotel",Gastropub,Cocktail Bar,Jazz Club,Speakeasy,Beer Bar,Pub,Lounge,Gay Bar,Beer Store,Bistro
15,Stn A PO Boxes,Cocktail Bar,Gastropub,Beer Bar,Jazz Club,Pub,Bistro,Lounge,Liquor Store,Gay Bar,Beer Store
17,"First Canadian Place, Underground city",Speakeasy,Beer Bar,Pub,Lounge,Cocktail Bar,Gastropub,Wine Bar,Gay Bar,Beer Store,Bistro


Cluster 2 (Purple)

In [32]:
dt_merged.loc[dt_merged['Cluster Labels'] == 1, dt_merged.columns[[2] + list(range(6, dt_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
14,Rosedale,Bistro,Wine Bar,Hotel Bar,Beer Bar,Beer Store,Brewery,Cocktail Bar,Gastropub,Gay Bar,Jazz Club


Cluster 3 (Blue)

In [33]:
dt_merged.loc[dt_merged['Cluster Labels'] == 2, dt_merged.columns[[2] + list(range(6, dt_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
11,"University of Toronto, Harbord",Bar,Beer Bar,Pub,Jazz Club,Sake Bar,Music Venue,Lounge,Liquor Store,Karaoke Bar,Sports Bar
12,"Kensington Market, Chinatown, Grange Park",Bar,Beer Bar,Cocktail Bar,Wine Bar,Speakeasy,Jazz Club,Pub,Music Venue,Lounge,Liquor Store


Cluster 4 (Sea Green)

In [34]:
dt_merged.loc[dt_merged['Cluster Labels'] == 3, dt_merged.columns[[2] + list(range(6, dt_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,"Regent Park, Harbourfront",Pub,Gastropub,Bar,Beer Store,Brewery,Liquor Store,Karaoke Bar,Gay Bar,Beer Bar,Bistro
8,"Harbourfront East, Union Station, Toronto Islands",Brewery,Speakeasy,Beer Bar,Pub,Lounge,Liquor Store,Bistro,Sports Bar,Wine Bar,Gastropub
13,"CN Tower, King and Spadina, Railway Lands, Har...",Bar,Speakeasy,Pub,Beer Store,Brewery,Hotel Bar,Gay Bar,Beer Bar,Bistro,Cocktail Bar


Cluster 5 (Pale Green)

In [35]:
dt_merged.loc[dt_merged['Cluster Labels'] == 4, dt_merged.columns[[2] + list(range(6, dt_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,"Queen's Park, Ontario Provincial Government",Gastropub,Bar,Beer Bar,Pub,Gay Bar,Hotel Bar,Beer Store,Bistro,Brewery,Cocktail Bar
2,"Garden District, Ryerson",Gastropub,Bar,Beer Bar,Music Venue,Gay Bar,Hotel Bar,Beer Store,Bistro,Brewery,Cocktail Bar
5,Central Bay Street,Gastropub,Bar,Beer Bar,Music Venue,Gay Bar,Hotel Bar,Beer Store,Bistro,Brewery,Cocktail Bar
18,Church and Wellesley,Gay Bar,Bar,Beer Bar,Sake Bar,Pub,Gastropub,Beer Store,Bistro,Brewery,Cocktail Bar


Cluster 6 (Orange)

In [36]:
dt_merged.loc[dt_merged['Cluster Labels'] == 5, dt_merged.columns[[2] + list(range(6, dt_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
16,"St. James Town, Cabbagetown",Gastropub,Pub,Wine Bar,Hotel Bar,Beer Bar,Beer Store,Bistro,Brewery,Cocktail Bar,Gay Bar


## 6. Wrap-up

Now that clustering is complete, it appears Cluster 4 (Sea Green) fits my preferences for beverage venues.

These neighborhoods are also the nearest to Lake Ontario, so they are likely among the most expensive to live in. Budget considerations were outside the scope of this analysis.