# Segmentation and Clustering of Neighborhoods in Toronto - Capstone Project

This notebook is used for the assignments of the Applied Data Science Capstone module of the IBM Data Science Certificate.

## First section of the assignment : Postal Code, Borough and Neighborhood Dataframe

First, we import the libraries needed for the exercise.

In [1]:
import numpy as np
import pandas as pd

from urllib.request import urlopen
from bs4 import BeautifulSoup

In [2]:
# We use the BeautifulSoup library to parse the HTML of the Wikipedia page

link = 'https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'
def getHTMLContent(link):
    html = urlopen(link)
    soup = BeautifulSoup(html, 'html.parser')
    return soup

In [3]:
# parse the HTML to understand how the data is structured

content = getHTMLContent(link)
tables = content.find_all('table')
for table in tables:
    print(table.prettify())

<table class="wikitable sortable">
 <tbody>
  <tr>
   <th>
    Postcode
   </th>
   <th>
    Borough
   </th>
   <th>
    Neighborhood
   </th>
  </tr>
  <tr>
   <td>
    M1A
   </td>
   <td>
    Not assigned
   </td>
   <td>
    Not assigned
   </td>
  </tr>
  <tr>
   <td>
    M2A
   </td>
   <td>
    Not assigned
   </td>
   <td>
    Not assigned
   </td>
  </tr>
  <tr>
   <td>
    M3A
   </td>
   <td>
    <a href="/wiki/North_York" title="North York">
     North York
    </a>
   </td>
   <td>
    <a href="/wiki/Parkwoods" title="Parkwoods">
     Parkwoods
    </a>
   </td>
  </tr>
  <tr>
   <td>
    M4A
   </td>
   <td>
    <a href="/wiki/North_York" title="North York">
     North York
    </a>
   </td>
   <td>
    <a href="/wiki/Victoria_Village" title="Victoria Village">
     Victoria Village
    </a>
   </td>
  </tr>
  <tr>
   <td>
    M5A
   </td>
   <td>
    <a href="/wiki/Downtown_Toronto" title="Downtown Toronto">
     Downtown Toronto
    </a>
   </td>
   <td>
    <a href="/

In [4]:
# The table is the only one with the class 'wikitable sortable', so we target it and collect the data in three lists that will become our dataframe columns
table = content.find('table', {'class': 'wikitable sortable'})
rows = table.find_all('tr')

Postal_code = []
Borough = []
Neighborhood = []
for row in rows:
    cells = row.find_all('td')
    if len(cells) > 1:  # the header row uses <th> cells, so it is skipped
        # strip() removes the newlines that surround the cell text in the HTML
        Postal_code.append(cells[0].text.strip())
        Borough.append(cells[1].text.strip())
        Neighborhood.append(cells[2].text.strip())

# create the dataframe based on the created columns
data_content = {'Postal_Code': Postal_code, 'Borough': Borough, 'Neighborhood': Neighborhood }
df = pd.DataFrame(data=data_content)
df.head(20)

Unnamed: 0,Postal_Code,Borough,Neighborhood
0,M1A,Not assigned,Not assigned
1,M2A,Not assigned,Not assigned
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,Harbourfront
5,M6A,North York,Lawrence Heights
6,M6A,North York,Lawrence Manor
7,M7A,Downtown Toronto,Queen's Park
8,M8A,Not assigned,Not assigned
9,M9A,Queen's Park,Not assigned


In [6]:
# drop the rows with unassigned Boroughs
df = df[df.Borough != 'Not assigned']
df.head()

Unnamed: 0,Postal_Code,Borough,Neighborhood
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,Harbourfront
5,M6A,North York,Lawrence Heights
6,M6A,North York,Lawrence Manor


In [7]:
# combine the neighborhoods that share a postal code and borough into a single comma-separated row
df = df.groupby(['Postal_Code','Borough'], as_index = False).agg(lambda x:','.join(x))
df.head()


Unnamed: 0,Postal_Code,Borough,Neighborhood
0,M1B,Scarborough,"Rouge,Malvern"
1,M1C,Scarborough,"Highland Creek,Rouge Hill,Port Union"
2,M1E,Scarborough,"Guildwood,Morningside,West Hill"
3,M1G,Scarborough,Woburn
4,M1H,Scarborough,Cedarbrae
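To make the `groupby`/`agg` step concrete, here is a minimal sketch on a hypothetical three-row frame built from values shown in the scraped table: rows that share a postal code and borough collapse into one row whose neighborhoods are joined with commas.

```python
import pandas as pd

# hypothetical toy frame using values from the scraped table above
toy = pd.DataFrame({
    'Postal_Code': ['M5A', 'M6A', 'M6A'],
    'Borough': ['Downtown Toronto', 'North York', 'North York'],
    'Neighborhood': ['Harbourfront', 'Lawrence Heights', 'Lawrence Manor'],
})

# one row per (postal code, borough); the neighborhoods are joined with commas
combined = toy.groupby(['Postal_Code', 'Borough'], as_index=False).agg(lambda x: ','.join(x))
print(combined)
```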


Now let's look for unassigned neighborhoods that do have a borough, so that we can use the borough's name as the neighborhood name.

In [8]:
# locate any neighborhood that is unassigned
df.loc[df['Neighborhood'] == 'Not assigned']

Unnamed: 0,Postal_Code,Borough,Neighborhood
93,M9A,Queen's Park,Not assigned


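In [9]:
# replace the unassigned Neighborhood value with the Borough value
# (a boolean mask avoids hard-coding the row label: the unassigned row is 93, so writing to label 85 left it unchanged)
df.loc[df['Neighborhood'] == 'Not assigned', 'Neighborhood'] = df['Borough']
df.loc[df['Borough'] == 'Queen\'s Park']

Unnamed: 0,Postal_Code,Borough,Neighborhood
93,M9A,Queen's Park,Queen's Park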


In [10]:
# Finally, find out the shape of the dataframe
df.shape

(103, 3)

## Now let's merge this dataframe with the latitude and longitude coordinates for each neighborhood

In [10]:
!conda install -c conda-forge geocoder --yes


In [11]:
import geocoder
# address = df['Postal_Code']
# latlng = []
g = geocoder.arcgis('M5G Downtown Toronto, Ontario')
g.json

{'address': 'Downtown Toronto, Ontario',
 'bbox': {'northeast': [43.66011000000004, -79.37289999999994],
  'southwest': [43.64011000000004, -79.39289999999995]},
 'confidence': 7,
 'lat': 43.65011000000004,
 'lng': -79.38289999999995,
 'ok': True,
 'quality': 'Locality',
 'raw': {'name': 'Downtown Toronto, Ontario',
  'extent': {'xmin': -79.39289999999995,
   'ymin': 43.64011000000004,
   'xmax': -79.37289999999994,
   'ymax': 43.66011000000004},
  'feature': {'geometry': {'x': -79.38289999999995, 'y': 43.65011000000004},
   'attributes': {'Score': 95.2, 'Addr_Type': 'Locality'}}},
 'score': 95.2,
 'status': 'OK'}

In [21]:
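postal_code = df['Postal_Code'].tolist()
# note: format() receives the whole Series here, so the query is the Series' text
# representation rather than one address, and ArcGIS returns no match (latlng is None)
g = geocoder.arcgis('{}, Toronto, Ontario'.format(df['Postal_Code']))
lat = g.latlng
print(lat)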

None


In [None]:
# build one query string per postal code, e.g. 'M3A, Toronto, Ontario'
queries = (df['Postal_Code'] + ', Toronto, Ontario').tolist()
queries[:5]

In [None]:
latitude = []
longitude = []
for query in queries:
    g = geocoder.arcgis(query)
    latlng = g.latlng  # None when ArcGIS finds no match
    latitude.append(latlng[0] if latlng else None)
    longitude.append(latlng[1] if latlng else None)
print(latitude[:5], longitude[:5])

The geocoder loops kept running indefinitely, so we use the CSV file of coordinates provided on the Coursera platform instead.
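The notebook does not show why the loops never finished. One common safeguard when calling a geocoding service is to cap the number of attempts per address and return `None` instead of retrying forever. A minimal sketch, where `geocode` is any callable returning `[lat, lng]` or `None` (with the geocoder package it could be, as an assumption, `lambda q: geocoder.arcgis(q).latlng`); the `flaky_geocode` stub is hypothetical so the sketch runs offline:

```python
import time

def geocode_with_retries(query, geocode, max_tries=3, wait_seconds=1.0):
    """Call geocode(query) up to max_tries times; return [lat, lng] or None."""
    for attempt in range(max_tries):
        latlng = geocode(query)
        if latlng is not None:
            return latlng
        if attempt < max_tries - 1:
            time.sleep(wait_seconds)  # pause before trying again
    return None  # give up instead of looping forever

# hypothetical stub: fails on the first call, succeeds on the second
calls = []
def flaky_geocode(query):
    calls.append(query)
    return None if len(calls) == 1 else [43.8066863, -79.1943534]

result = geocode_with_retries('M1B, Toronto, Ontario', flaky_geocode, wait_seconds=0)
print(result)
```

Returning `None` for addresses that never resolve keeps the output lists aligned with the postal codes, so the missing coordinates can be filled in later (for example from the Coursera CSV).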

In [11]:
# The code was removed by Watson Studio for sharing.

Unnamed: 0,Postal Code;Latitude;Longitude
0,M1B;43.8066863;-79.1943534
1,M1C;43.7845351;-79.1604971
2,M1E;43.7635726;-79.1887115
3,M1G;43.7709921;-79.2169174
4,M1H;43.773136;-79.2394761


In [12]:
latlng = pd.DataFrame(columns = ['Postal_Code','Latitude','Longitude'])
latlng[['Postal_Code','Latitude','Longitude']]= df_data_1['Postal Code;Latitude;Longitude'].str.split(';',expand=True)

In [13]:
latlng.head()

Unnamed: 0,Postal_Code,Latitude,Longitude
0,M1B,43.8066863,-79.1943534
1,M1C,43.7845351,-79.1604971
2,M1E,43.7635726,-79.1887115
3,M1G,43.7709921,-79.2169174
4,M1H,43.773136,-79.2394761
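The coordinates file appears to be semicolon-separated, which is why it loads as a single column named `Postal Code;Latitude;Longitude`. Assuming the whole file uses `;` as its separator, a sketch of reading it with `sep=';'` splits the columns at load time and parses the coordinates as floats, which would also make the later numeric conversion unnecessary. The loading code was removed by Watson Studio, so `io.StringIO` stands in for the real file here, filled with the first rows displayed above:

```python
import io
import pandas as pd

# first rows of the coordinates file, as displayed above
raw = """Postal Code;Latitude;Longitude
M1B;43.8066863;-79.1943534
M1C;43.7845351;-79.1604971
M1E;43.7635726;-79.1887115
"""

# sep=';' splits the three columns and parses the coordinates as floats
coords = pd.read_csv(io.StringIO(raw), sep=';')
coords = coords.rename(columns={'Postal Code': 'Postal_Code'})
print(coords.dtypes)
```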


Now that we have both our dataframes, let's merge them to obtain the final one. 

In [14]:
final_df = pd.merge(df,latlng,on='Postal_Code',how='left')
final_df.head(30)

Unnamed: 0,Postal_Code,Borough,Neighborhood,Latitude,Longitude
0,M1B,Scarborough,"Rouge,Malvern",43.8066863,-79.1943534
1,M1C,Scarborough,"Highland Creek,Rouge Hill,Port Union",43.7845351,-79.1604971
2,M1E,Scarborough,"Guildwood,Morningside,West Hill",43.7635726,-79.1887115
3,M1G,Scarborough,Woburn,43.7709921,-79.2169174
4,M1H,Scarborough,Cedarbrae,43.773136,-79.2394761
5,M1J,Scarborough,Scarborough Village,43.7447342,-79.2394761
6,M1K,Scarborough,"East Birchmount Park,Ionview,Kennedy Park",43.7279292,-79.2620294
7,M1L,Scarborough,"Clairlea,Golden Mile,Oakridge",43.7111117,-79.2845772
8,M1M,Scarborough,"Cliffcrest,Cliffside,Scarborough Village West",43.716316,-79.2394761
9,M1N,Scarborough,"Birch Cliff,Cliffside West",43.692657,-79.2648481
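A left merge silently leaves `NaN` coordinates for any postal code missing from the coordinates table. A minimal sketch of checking for that, on hypothetical toy inputs shaped like `df` and `latlng` (the postal code `M9Z` is made up): `validate='many_to_one'` raises an error if a postal code appears twice on the right, and `indicator=True` adds a `_merge` column that shows which rows found no match.

```python
import pandas as pd

# hypothetical toy inputs shaped like df and latlng above
neighborhoods = pd.DataFrame({'Postal_Code': ['M1B', 'M1C', 'M9Z'],
                              'Borough': ['Scarborough', 'Scarborough', 'Unknown']})
coords = pd.DataFrame({'Postal_Code': ['M1B', 'M1C'],
                       'Latitude': [43.8066863, 43.7845351],
                       'Longitude': [-79.1943534, -79.1604971]})

# validate raises if coords has duplicate postal codes;
# indicator adds '_merge' with 'both' or 'left_only' for each row
checked = pd.merge(neighborhoods, coords, on='Postal_Code', how='left',
                   validate='many_to_one', indicator=True)
unmatched = checked.loc[checked['_merge'] == 'left_only', 'Postal_Code'].tolist()
print(unmatched)
```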


Here is the final dataframe with boroughs, neighborhoods and their geospatial coordinates.

In [15]:
print(final_df['Borough'].unique())

['Scarborough' 'North York' 'East York' 'East Toronto' 'Central Toronto'
 'Downtown Toronto' 'York' 'West Toronto' 'Mississauga' 'Etobicoke'
 "Queen's Park"]


There are 11 different boroughs in Toronto, which is a lot (compared to only 5 in New York), so let's keep only the boroughs with 'Toronto' in their name to narrow down the analysis.

In [16]:
toronto_boroughs = ['East Toronto','Central Toronto','Downtown Toronto','West Toronto']
Toronto = final_df[final_df.Borough.isin(toronto_boroughs)]
Toronto.head()

Unnamed: 0,Postal_Code,Borough,Neighborhood,Latitude,Longitude
37,M4E,East Toronto,The Beaches,43.6763574,-79.2930312
41,M4K,East Toronto,"The Danforth West,Riverdale",43.6795571,-79.352188
42,M4L,East Toronto,"The Beaches West,India Bazaar",43.6689985,-79.3155716
43,M4M,East Toronto,Studio District,43.6595255,-79.340923
44,M4N,Central Toronto,Lawrence Park,43.7280205,-79.3887901
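The list of boroughs above is typed by hand. A sketch of deriving it from the data instead, using `str.contains` on a Series holding some of the borough names printed earlier; on the full data the equivalent filter would be `final_df[final_df['Borough'].str.contains('Toronto')]`:

```python
import pandas as pd

# some of the borough names printed above
boroughs = pd.Series(['Scarborough', 'North York', 'East Toronto', 'Central Toronto',
                      'Downtown Toronto', 'West Toronto', "Queen's Park"])

# keep every borough whose name contains 'Toronto'
toronto_boroughs = sorted(boroughs[boroughs.str.contains('Toronto')].unique())
print(toronto_boroughs)
```

Note that "East York" and "York" are correctly left out, since neither contains the word "Toronto".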


Let's set the coordinates used to center our maps of Old Toronto (the four boroughs we just chose); here we simply take the geospatial coordinates of the city of Toronto.

In [17]:
# the geospatial coordinates of Toronto are:
latitude = 43.741667
longitude = -79.373333
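The cell above centers the map on fixed coordinates for the whole city. If the map should instead be centered on the neighborhoods just selected, a simple alternative is the mean of their coordinates, a reasonable approximation over an area this small. A sketch on the five neighborhoods shown above; on the full data this would be `Toronto['Latitude'].mean()` and `Toronto['Longitude'].mean()`, once those columns have been converted to floats (they are still strings at this point):

```python
import pandas as pd

# coordinates of the first five Old Toronto neighborhoods shown above
sample = pd.DataFrame({
    'Latitude': [43.6763574, 43.6795571, 43.6689985, 43.6595255, 43.7280205],
    'Longitude': [-79.2930312, -79.352188, -79.3155716, -79.340923, -79.3887901],
})

# the average position works as a map center for a small area
center_lat = sample['Latitude'].mean()
center_lng = sample['Longitude'].mean()
print(round(center_lat, 4), round(center_lng, 4))
```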

In [18]:
# let's create a map of Old Toronto 
!conda install -c conda-forge folium=0.5.0 --yes
import folium 


In [19]:
# convert the coordinate strings to floats; astype returns a new dataframe,
# which avoids the SettingWithCopyWarning raised when assigning into a slice
Toronto = Toronto.astype({'Latitude': float, 'Longitude': float})
Toronto.dtypes


Postal_Code      object
Borough          object
Neighborhood     object
Latitude        float64
Longitude       float64
dtype: object

In [20]:
map_toronto = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, borough, neighborhood in zip(Toronto['Latitude'], Toronto['Longitude'], Toronto['Borough'], Toronto['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto)  
    
map_toronto

Now that we have all of our data, we can use the Foursquare API to explore the neighborhoods.

In [21]:
# First set the credentials (replace the placeholders with your own Foursquare keys)
CLIENT_ID = 'YOUR_CLIENT_ID' # your Foursquare ID
CLIENT_SECRET = 'YOUR_CLIENT_SECRET' # your Foursquare Secret
VERSION = '20200103' # Foursquare API version

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET: ' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: YOUR_CLIENT_ID
CLIENT_SECRET: YOUR_CLIENT_SECRET


In [22]:
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

In [23]:
import requests  # HTTP client for the Foursquare API calls

def getNearbyVenues(names, latitudes, longitudes, radius=500):

    venues_list = []
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)

        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}'.format(
            CLIENT_ID,
            CLIENT_SECRET,
            VERSION,
            lat,
            lng,
            radius)

        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']

        # keep only the relevant information for each nearby venue
        # (some venues have no category, so guard against an empty list)
        venues_list.append([(
            name,
            v['venue']['name'],
            v['venue']['categories'][0]['name'] if v['venue']['categories'] else None) for v in results])

    # flatten the per-neighborhood lists into a single dataframe
    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list],
                                 columns=['Neighborhood', 'Venue', 'Venue Category'])

    return nearby_venues
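Separating the parsing of the API response from the HTTP call makes that part testable offline. A minimal sketch of such a parser (the function name `parse_explore_items` is hypothetical); it follows the `response -> groups[0] -> items -> venue` nesting used in the request code above, and the sample payload is a hypothetical, heavily trimmed stand-in for a real Foursquare response:

```python
def parse_explore_items(payload, name):
    """Turn one decoded explore response into (neighborhood, venue, category) rows."""
    rows = []
    for item in payload['response']['groups'][0]['items']:
        venue = item['venue']
        categories = venue.get('categories', [])
        # some venues have no category; keep them with None instead of failing
        category = categories[0]['name'] if categories else None
        rows.append((name, venue['name'], category))
    return rows

# hypothetical payload trimmed to the fields used above
sample_payload = {'response': {'groups': [{'items': [
    {'venue': {'name': 'Glen Manor Ravine', 'categories': [{'name': 'Trail'}]}},
    {'venue': {'name': 'Unnamed Spot', 'categories': []}},
]}]}}

rows = parse_explore_items(sample_payload, 'The Beaches')
print(rows)
```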

In [46]:
from bs4 import BeautifulSoup 
import requests 
toronto_venues = getNearbyVenues(names=Toronto['Neighborhood'],
                                   latitudes=Toronto['Latitude'],
                                   longitudes=Toronto['Longitude']
                                  )

The Beaches
The Danforth West,Riverdale
The Beaches West,India Bazaar
Studio District
Lawrence Park
Davisville North
North Toronto West
Davisville
Moore Park,Summerhill East
Deer Park,Forest Hill SE,Rathnelly,South Hill,Summerhill West
Rosedale
Cabbagetown,St. James Town
Church and Wellesley
Harbourfront
Ryerson,Garden District
St. James Town
Berczy Park
Central Bay Street
Adelaide,King,Richmond
Harbourfront East,Toronto Islands,Union Station
Design Exchange,Toronto Dominion Centre
Commerce Court,Victoria Hotel
Roselawn
Forest Hill North,Forest Hill West
The Annex,North Midtown,Yorkville
Harbord,University of Toronto
Chinatown,Grange Park,Kensington Market
CN Tower,Bathurst Quay,Island airport,Harbourfront West,King and Spadina,Railway Lands,South Niagara
Stn A PO Boxes 25 The Esplanade
First Canadian Place,Underground city
Christie
Dovercourt Village,Dufferin
Little Portugal,Trinity
Brockton,Exhibition Place,Parkdale Village
High Park,The Junction South
Parkdale,Roncesvalles
Runnymede

In [47]:
print(toronto_venues.shape)
toronto_venues.head(10)

(866, 3)


Unnamed: 0,Neighborhood,Venue,Venue Category
0,The Beaches,Glen Manor Ravine,Trail
1,The Beaches,The Big Carrot Natural Food Market,Health Food Store
2,The Beaches,Grover Pub and Grub,Pub
3,The Beaches,Upper Beaches,Neighborhood
4,"The Danforth West,Riverdale",Pantheon,Greek Restaurant
5,"The Danforth West,Riverdale",Dolce Gelato,Ice Cream Shop
6,"The Danforth West,Riverdale",MenEssentials,Cosmetics Shop
7,"The Danforth West,Riverdale",Cafe Fiorentina,Italian Restaurant
8,"The Danforth West,Riverdale",Mezes,Greek Restaurant
9,"The Danforth West,Riverdale",Moksha Yoga Danforth,Yoga Studio


In [48]:
# one hot encoding
toronto_onehot = pd.get_dummies(toronto_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
toronto_onehot['Neighborhood'] = toronto_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [toronto_onehot.columns[-1]] + list(toronto_onehot.columns[:-1])
toronto_onehot = toronto_onehot[fixed_columns]

toronto_onehot.head()

Unnamed: 0,Yoga Studio,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Art Gallery,Arts & Crafts Store,...,Theme Restaurant,Toy / Game Store,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Wine Bar,Wine Shop,Wings Joint
0,0,0,0,0,0,0,0,0,0,0,...,0,0,1,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
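Foursquare also returns a venue category literally named "Neighborhood" (see "Upper Beaches" in the venues table above), so `get_dummies` produces a column with the same name as the neighborhood labels. A sketch of sidestepping the clash up front, on hypothetical venues: rename the dummy column, then insert the labels as the first column. The new name `Neighborhood (venue)` is an arbitrary choice.

```python
import pandas as pd

# hypothetical venues, including the 'Neighborhood' venue category
venues = pd.DataFrame({
    'Neighborhood': ['The Beaches', 'The Beaches', 'Studio District'],
    'Venue Category': ['Trail', 'Neighborhood', 'Coffee Shop'],
})

onehot = pd.get_dummies(venues[['Venue Category']], prefix="", prefix_sep="")
# rename the clashing dummy column before adding the neighborhood labels
onehot = onehot.rename(columns={'Neighborhood': 'Neighborhood (venue)'})
# insert puts the labels in first position without overwriting any dummy column
onehot.insert(0, 'Neighborhood', venues['Neighborhood'])
print(onehot.columns.tolist())
```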


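Now that we have all of our venues, let's get the top venues of each neighborhood and apply k-means clustering to group the neighborhoods. Because Foursquare also uses "Neighborhood" as a venue category, the dummy column of that name was overwritten in place by the neighborhood labels, and the reordering above moved 'Yoga Studio' to the front instead; the next two cells put 'Neighborhood' back as the first column.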

In [49]:
cols = list(toronto_onehot.columns.values)
cols

['Yoga Studio',
 'Airport',
 'Airport Food Court',
 'Airport Gate',
 'Airport Lounge',
 'Airport Service',
 'Airport Terminal',
 'American Restaurant',
 'Art Gallery',
 'Arts & Crafts Store',
 'Asian Restaurant',
 'Athletics & Sports',
 'Auto Workshop',
 'BBQ Joint',
 'Baby Store',
 'Bagel Shop',
 'Bakery',
 'Bank',
 'Bar',
 'Basketball Stadium',
 'Beer Bar',
 'Beer Store',
 'Belgian Restaurant',
 'Bistro',
 'Board Shop',
 'Boat or Ferry',
 'Bookstore',
 'Boutique',
 'Bowling Alley',
 'Breakfast Spot',
 'Brewery',
 'Bubble Tea Shop',
 'Burger Joint',
 'Burrito Place',
 'Bus Line',
 'Butcher',
 'Café',
 'Cajun / Creole Restaurant',
 'Candy Store',
 'Caribbean Restaurant',
 'Cheese Shop',
 'Chinese Restaurant',
 'Chocolate Shop',
 'Church',
 'Climbing Gym',
 'Clothing Store',
 'Cocktail Bar',
 'Coffee Shop',
 'College Arts Building',
 'College Gym',
 'Comfort Food Restaurant',
 'Comic Shop',
 'Concert Hall',
 'Convenience Store',
 'Cosmetics Shop',
 'Coworking Space',
 'Creperie',
 'Cuba

In [50]:
# put 'Neighborhood' back in first position and keep every other column in its current order
toronto_onehot = toronto_onehot[['Neighborhood'] + [col for col in toronto_onehot.columns if col != 'Neighborhood']]

In [89]:
toronto_onehot = toronto_onehot.groupby('Neighborhood').sum().reset_index()
toronto_onehot.head()

Unnamed: 0,Neighborhood,Yoga Studio,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Art Gallery,...,Theme Restaurant,Toy / Game Store,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Wine Bar,Wine Shop,Wings Joint
0,"Adelaide,King,Richmond",0,0,0,0,0,0,0,1,0,...,0,0,0,0,1,0,0,0,0,0
1,Berczy Park,0,0,0,0,0,0,0,0,1,...,0,0,0,0,1,0,0,0,0,0
2,"Brockton,Exhibition Place,Parkdale Village",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,Business Reply Mail Processing Centre 969 Eastern,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,"CN Tower,Bathurst Quay,Island airport,Harbourf...",0,1,1,1,1,3,1,0,0,...,0,0,0,0,0,0,0,0,0,0


In [90]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]
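The function relies on the neighborhood name being the first entry of each row, which is why it slices with `iloc[1:]` before sorting the counts. A standalone sketch (the function is copied so the block runs on its own) on a hypothetical grouped row:

```python
import pandas as pd

def return_most_common_venues(row, num_top_venues):
    # skip the first entry (the neighborhood name) and sort the venue counts
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    return row_categories_sorted.index.values[0:num_top_venues]

# hypothetical grouped row: a name followed by venue counts
row = pd.Series({'Neighborhood': 'Berczy Park', 'Coffee Shop': 4, 'Cocktail Bar': 2, 'Park': 1})
top = return_most_common_venues(row, 2)
print(top)
```

When two categories have the same count, their relative order in the result is not guaranteed, so ties near the cut-off can differ between runs or pandas versions.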

In [83]:
Toronto1 = Toronto[['Neighborhood','Latitude','Longitude']]
Toronto1.head()

Unnamed: 0,Neighborhood,Latitude,Longitude
37,The Beaches,43.676357,-79.293031
41,"The Danforth West,Riverdale",43.679557,-79.352188
42,"The Beaches West,India Bazaar",43.668999,-79.315572
43,Studio District,43.659526,-79.340923
44,Lawrence Park,43.72802,-79.38879


In [91]:
num_top_venues = 5

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = toronto_onehot['Neighborhood']

for ind in np.arange(toronto_onehot.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(toronto_onehot.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,"Adelaide,King,Richmond",Sushi Restaurant,Coffee Shop,Café,Hotel,Steakhouse
1,Berczy Park,Coffee Shop,Cocktail Bar,Café,Seafood Restaurant,Farmers Market
2,"Brockton,Exhibition Place,Parkdale Village",Café,Coffee Shop,Breakfast Spot,Burrito Place,Bar
3,Business Reply Mail Processing Centre 969 Eastern,Light Rail Station,Skate Park,Burrito Place,Butcher,Fast Food Restaurant
4,"CN Tower,Bathurst Quay,Island airport,Harbourf...",Airport Service,Plane,Coffee Shop,Boutique,Boat or Ferry


In [92]:
from sklearn.cluster import KMeans
# set number of clusters, let's start with five and then expand to test 
kclusters = 5

toronto_clustering = toronto_onehot.drop(columns='Neighborhood')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(toronto_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([0, 2, 1, 2, 2, 1, 0, 1, 2, 2], dtype=int32)
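The cell above fixes `kclusters = 5`, and its comment suggests trying other values. One common heuristic for that, which this notebook does not run, is the elbow method: fit k-means for several values of k and look for the point where the inertia (the within-cluster sum of squared distances) stops dropping sharply. A sketch on a hypothetical toy matrix standing in for `toronto_clustering`:

```python
import numpy as np
from sklearn.cluster import KMeans

# hypothetical stand-in for toronto_clustering: two obvious groups of venue counts
X = np.array([[5, 0, 1], [4, 1, 0], [5, 1, 1],
              [0, 4, 3], [1, 5, 3], [0, 5, 4]], dtype=float)

# inertia_ is the within-cluster sum of squared distances for each k
inertias = {}
for k in range(1, 5):
    model = KMeans(n_clusters=k, random_state=0, n_init=10).fit(X)
    inertias[k] = model.inertia_
print(inertias)
```

On this toy data the inertia falls steeply from k = 1 to k = 2 and only slowly afterwards, which points to two clusters; on the real data the curve may be less clear-cut.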

In [93]:
# add clustering labels to the dataframe 
neighborhoods_venues_sorted.insert(0, 'Cluster_Labels', kmeans.labels_)

toronto_merged = Toronto

# join the cluster labels and top venues onto the Toronto dataframe, which holds the latitude/longitude of each neighborhood
toronto_merged = toronto_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

toronto_merged.head() # check the last columns!

Unnamed: 0,Postal_Code,Borough,Neighborhood,Latitude,Longitude,Cluster_Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
37,M4E,East Toronto,The Beaches,43.676357,-79.293031,2,Health Food Store,Trail,Pub,Wings Joint,Cuban Restaurant
41,M4K,East Toronto,"The Danforth West,Riverdale",43.679557,-79.352188,4,Greek Restaurant,Italian Restaurant,Ice Cream Shop,Fruit & Vegetable Store,Pub
42,M4L,East Toronto,"The Beaches West,India Bazaar",43.668999,-79.315572,2,Park,Sandwich Place,Brewery,Italian Restaurant,Fast Food Restaurant
43,M4M,East Toronto,Studio District,43.659526,-79.340923,0,Café,Coffee Shop,American Restaurant,Italian Restaurant,Bakery
44,M4N,Central Toronto,Lawrence Park,43.72802,-79.38879,2,Bus Line,Park,Swim School,Wings Joint,Event Space


In [61]:
import matplotlib.colors as colors
import matplotlib.cm as cm

In [94]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters: one evenly spaced rainbow color per cluster
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
for lat, lon, poi, cluster in zip(toronto_merged['Latitude'], toronto_merged['Longitude'], toronto_merged['Neighborhood'], toronto_merged['Cluster_Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster],
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

__Cluster 1__ (label 0): Coffee shops and cafés seem to be the most common venues in this cluster.

In [95]:
toronto_merged.loc[toronto_merged['Cluster_Labels'] == 0, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster_Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
43,East Toronto,0,Café,Coffee Shop,American Restaurant,Italian Restaurant,Bakery
54,Downtown Toronto,0,Clothing Store,Coffee Shop,Café,Movie Theater,Sushi Restaurant
55,Downtown Toronto,0,Coffee Shop,Gastropub,Restaurant,Japanese Restaurant,Italian Restaurant
57,Downtown Toronto,0,Coffee Shop,Ice Cream Shop,Chinese Restaurant,Café,Bubble Tea Shop
58,Downtown Toronto,0,Sushi Restaurant,Coffee Shop,Café,Hotel,Steakhouse
65,Central Toronto,0,Sandwich Place,Café,Coffee Shop,Cosmetics Shop,Burger Joint
84,West Toronto,0,Café,Italian Restaurant,Sushi Restaurant,Pizza Place,Coffee Shop
85,Downtown Toronto,0,Coffee Shop,Park,Gym,Wings Joint,Fried Chicken Joint


__Cluster 2__ (label 1): This cluster looks similar to the first one, with cafés and coffee shops ranking high alongside restaurants and bakeries.

In [96]:
toronto_merged.loc[toronto_merged['Cluster_Labels'] == 1, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster_Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
47,Central Toronto,1,Dessert Shop,Sandwich Place,Coffee Shop,Italian Restaurant,Gym
51,Downtown Toronto,1,Restaurant,Bakery,Italian Restaurant,Café,Park
53,Downtown Toronto,1,Coffee Shop,Park,Bakery,Breakfast Spot,Pub
66,Downtown Toronto,1,Café,Sandwich Place,Restaurant,Japanese Restaurant,Bookstore
67,Downtown Toronto,1,Café,Comfort Food Restaurant,Vietnamese Restaurant,Bakery,Mexican Restaurant
77,West Toronto,1,Bar,Asian Restaurant,Pizza Place,Vietnamese Restaurant,Cuban Restaurant
78,West Toronto,1,Café,Coffee Shop,Breakfast Spot,Burrito Place,Bar


__Cluster 3__ (label 2): Parks, cafés and bars are the most common venues in this cluster.

In [97]:
toronto_merged.loc[toronto_merged['Cluster_Labels'] == 2, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster_Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
37,East Toronto,2,Health Food Store,Trail,Pub,Wings Joint,Cuban Restaurant
42,East Toronto,2,Park,Sandwich Place,Brewery,Italian Restaurant,Fast Food Restaurant
44,Central Toronto,2,Bus Line,Park,Swim School,Wings Joint,Event Space
45,Central Toronto,2,Park,Clothing Store,Gym,Breakfast Spot,Sandwich Place
46,Central Toronto,2,Clothing Store,Coffee Shop,Sporting Goods Shop,Yoga Studio,Bagel Shop
48,Central Toronto,2,Park,Playground,Restaurant,Tennis Court,College Gym
49,Central Toronto,2,Coffee Shop,Pub,Light Rail Station,Sushi Restaurant,Supermarket
50,Downtown Toronto,2,Park,Playground,Trail,Wings Joint,Cuban Restaurant
52,Downtown Toronto,2,Burger Joint,Gay Bar,Ramen Restaurant,Juice Bar,Beer Bar
56,Downtown Toronto,2,Coffee Shop,Cocktail Bar,Café,Seafood Restaurant,Farmers Market


__Cluster 4__ (label 3): Coffee shops, cafés, restaurants and delis appear in all three neighborhoods of this cluster.

In [98]:
toronto_merged.loc[toronto_merged['Cluster_Labels'] == 3, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster_Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
60,Downtown Toronto,3,Coffee Shop,Restaurant,Deli / Bodega,Japanese Restaurant,Café
61,Downtown Toronto,3,Café,Coffee Shop,Restaurant,Gastropub,Deli / Bodega
70,Downtown Toronto,3,Café,Coffee Shop,Deli / Bodega,Restaurant,Steakhouse


__Cluster 5__ (label 4): This cluster contains a single neighborhood, The Danforth West/Riverdale, where Greek and Italian restaurants top the list.

In [99]:
toronto_merged.loc[toronto_merged['Cluster_Labels'] == 4, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster_Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
41,East Toronto,4,Greek Restaurant,Italian Restaurant,Ice Cream Shop,Fruit & Vegetable Store,Pub


The first two clusters of the analysis are similar. Across all clusters, cafés and coffee shops appear to be very popular in Old Toronto, as do parks and bars, while restaurants tend to rank lower among the top venues. Overall, the most popular venues are linked to food and drink, leaving little room for other kinds of venues such as shops and cultural sites.
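The conclusion above is read off the cluster tables by eye. A sketch of how it could be quantified: stack the "Most Common Venue" columns and count how often each category appears. It is shown on the top three venues of three neighborhoods copied from the cluster tables; on the full data the input would be `toronto_merged.filter(like='Most Common Venue')`.

```python
import pandas as pd

# top-3 venues of three neighborhoods copied from the cluster tables above
top_venues = pd.DataFrame({
    '1st Most Common Venue': ['Café', 'Coffee Shop', 'Park'],
    '2nd Most Common Venue': ['Coffee Shop', 'Gastropub', 'Sandwich Place'],
    '3rd Most Common Venue': ['American Restaurant', 'Restaurant', 'Brewery'],
})

# stack the venue columns into one Series and count each category
counts = top_venues.filter(like='Most Common Venue').stack().value_counts()
print(counts.head())
```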