<h1>Coursera Capstone Project:<br>"Segmenting and Clustering Neighborhoods in Toronto"</h1>
<h2>Alexander Ivanov</h2>

<p>Week 3, Parts 1, 2 and 3</p>

<h4>Import libraries</h4>

In [1]:
import pandas as pd
import numpy as np
import requests

<h4>Definition of functions for printing HTML in code</h4>

In [2]:
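from IPython.display import HTML, display

def print_html(html_str):
    # render a raw HTML string in the notebook output
    display(HTML(html_str))

def print_h(text, n=4):
    # render text as an <hN> heading (h4 by default)
    print_html("<h{}>{}</h{}>".format(n, text, n))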

<h4>Load data from Wikipedia</h4>

In [3]:
source = requests.get('https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M').text

<h2>Website scraping using the BeautifulSoup library</h2>

In [4]:
from bs4 import BeautifulSoup

soup = BeautifulSoup(source, 'html.parser')
tr_list = soup.find_all('table')[0].find_all('tr')

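def get_data_from_tr(tr_i):
    # the children of a <tr> are assumed to alternate between whitespace
    # text nodes and cells, so the three cells sit at indices 1, 3 and 5
    def get_data_from_td(td_i):
        return list(tr_list[tr_i].children)[td_i].get_text()
    # [:-1] drops the trailing newline of the last cell
    return [get_data_from_td(1), get_data_from_td(3), get_data_from_td(5)[:-1]]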

postal_codes = pd.DataFrame(
    [
        get_data_from_tr(tr_i) 
        for tr_i in range(1, len(tr_list))
    ], 
    columns=get_data_from_tr(0))

postal_codes.head()

Unnamed: 0,Postcode,Borough,Neighbourhood
0,M1A,Not assigned,Not assigned
1,M2A,Not assigned,Not assigned
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,Harbourfront


<h4>Drop rows with a borough that is "Not assigned"</h4>

In [5]:
postal_codes = postal_codes[postal_codes.Borough != 'Not assigned']
print_html("<h4>Shape: {}</h4>".format(postal_codes.shape))
print_html(postal_codes.head().to_html())

Unnamed: 0,Postcode,Borough,Neighbourhood
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,Harbourfront
5,M6A,North York,Lawrence Heights
6,M6A,North York,Lawrence Manor


<h4>Combine neighbourhoods that share a postcode</h4>

In [6]:
# join the neighbourhood names of each postcode into one comma-separated string
groups = postal_codes.groupby(postal_codes.Postcode).agg(lambda x: ", ".join(x))
result = pd.merge(
    postal_codes[['Postcode', 'Borough']], 
    groups.Neighbourhood,
    on='Postcode',
    how='inner'
).drop_duplicates().reset_index(drop=True)
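
<p>The grouping and merge above could also be written as a single <code>groupby</code> over both key columns. The following is a minimal sketch of that alternative on a tiny frame, not the notebook's own method: the three rows are copied from the scraped table shown earlier, and the names <code>sample</code> and <code>combined</code> are illustrative assumptions.</p>

```python
import pandas as pd

# three rows copied from the scraped table shown earlier
sample = pd.DataFrame(
    [
        ["M3A", "North York", "Parkwoods"],
        ["M6A", "North York", "Lawrence Heights"],
        ["M6A", "North York", "Lawrence Manor"],
    ],
    columns=["Postcode", "Borough", "Neighbourhood"],
)

# group on postcode and borough together, then join the neighbourhood names;
# sort=False keeps the postcodes in order of first appearance
combined = (
    sample.groupby(["Postcode", "Borough"], as_index=False, sort=False)["Neighbourhood"]
    .agg(", ".join)
)
print(combined)
```

<p>Grouping on both columns keeps <code>Borough</code> in the result, so no separate merge or <code>drop_duplicates</code> step is needed in this variant.</p>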

<h4>Set borough name for neighbourhoods without values</h4>

In [7]:
result['Neighbourhood'] = result.apply(lambda x: x.Neighbourhood if x.Neighbourhood != 'Not assigned' else x.Borough, axis=1)
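
<p>The row-wise <code>apply</code> above can also be expressed column-wise. The sketch below uses <code>Series.mask</code> on two hypothetical rows (the borough name "Example Borough" is made up for illustration) and is only one possible way to write the same replacement.</p>

```python
import pandas as pd

# hypothetical rows: the second one has no neighbourhood name
sample = pd.DataFrame(
    {
        "Borough": ["North York", "Example Borough"],
        "Neighbourhood": ["Parkwoods", "Not assigned"],
    }
)

# where the neighbourhood is "Not assigned", take the borough name instead
missing = sample["Neighbourhood"] == "Not assigned"
sample["Neighbourhood"] = sample["Neighbourhood"].mask(missing, sample["Borough"])
print(sample)
```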

<h2>Print result week 1</h2>
<p>Check that the result matches the required output</p>

In [9]:
test_postalcode_list = ["M5G", "M2H", "M4B", "M1J", "M4G", "M4M", "M1R", "M9V", "M9L", "M5V", "M1B", "M5A"]
print_h("Result shape: {}".format(result.shape))
result[result["Postcode"].isin(test_postalcode_list)]

Unnamed: 0,Postcode,Borough,Neighbourhood
2,M5A,Downtown Toronto,Harbourfront
6,M1B,Scarborough,"Rouge, Malvern"
8,M4B,East York,"Woodbine Gardens, Parkview Hill"
23,M4G,East York,Leaside
24,M5G,Downtown Toronto,Central Bay Street
27,M2H,North York,Hillcrest Village
32,M1J,Scarborough,Scarborough Village
50,M9L,North York,Humber Summit
54,M4M,East Toronto,Studio District
71,M1R,Scarborough,"Maryvale, Wexford"


<h2>Loading geospatial coordinates from csv file and merging</h2>

In [10]:
lat_lng_df = pd.read_csv('Geospatial_Coordinates.csv')
lat_lng_df = lat_lng_df.rename({'Postal Code' : 'Postcode'}, axis='columns')
result = result.merge(lat_lng_df, on='Postcode', how='inner')
result

Unnamed: 0,Postcode,Borough,Neighbourhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.753259,-79.329656
1,M4A,North York,Victoria Village,43.725882,-79.315572
2,M5A,Downtown Toronto,Harbourfront,43.654260,-79.360636
3,M6A,North York,"Lawrence Heights, Lawrence Manor",43.718518,-79.464763
4,M7A,Downtown Toronto,Queen's Park,43.662301,-79.389494
...,...,...,...,...,...
98,M8X,Etobicoke,"The Kingsway, Montgomery Road, Old Mill North",43.653654,-79.506944
99,M4Y,Downtown Toronto,Church and Wellesley,43.665860,-79.383160
100,M7Y,East Toronto,Business Reply Mail Processing Centre 969 Eastern,43.662744,-79.321558
101,M8Y,Etobicoke,"Humber Bay, King's Mill Park, Kingsway Park So...",43.636258,-79.498509
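
<p>An inner merge silently drops postcodes that have no coordinates. As a hedged sketch of how one might check for that, the example below runs a left merge with <code>indicator=True</code> on hypothetical inputs; the postcode "X0X" is invented to show an unmatched row, and the two coordinate pairs are copied from the output above.</p>

```python
import pandas as pd

# hypothetical inputs: "X0X" has no entry in the coordinates table
areas = pd.DataFrame({"Postcode": ["M3A", "M4A", "X0X"]})
coords = pd.DataFrame(
    {
        "Postcode": ["M3A", "M4A"],
        "Latitude": [43.753259, 43.725882],
        "Longitude": [-79.329656, -79.315572],
    }
)

# a left merge keeps every area; the _merge column records where each row matched
checked = areas.merge(coords, on="Postcode", how="left", indicator=True)
unmatched = checked.loc[checked["_merge"] == "left_only", "Postcode"].tolist()
print(unmatched)
```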


In [12]:
test_postalcode_list = ["M5G", "M2H", "M4B", "M1J", "M4G", "M4M", "M1R", "M9V", "M9L", "M5V", "M1B", "M5A"]
print_h("Result shape: {}".format(result.shape))
result[result["Postcode"].isin(test_postalcode_list)]

Unnamed: 0,Postcode,Borough,Neighbourhood,Latitude,Longitude
2,M5A,Downtown Toronto,Harbourfront,43.65426,-79.360636
6,M1B,Scarborough,"Rouge, Malvern",43.806686,-79.194353
8,M4B,East York,"Woodbine Gardens, Parkview Hill",43.706397,-79.309937
23,M4G,East York,Leaside,43.70906,-79.363452
24,M5G,Downtown Toronto,Central Bay Street,43.657952,-79.387383
27,M2H,North York,Hillcrest Village,43.803762,-79.363452
32,M1J,Scarborough,Scarborough Village,43.744734,-79.239476
50,M9L,North York,Humber Summit,43.756303,-79.565963
54,M4M,East Toronto,Studio District,43.659526,-79.340923
71,M1R,Scarborough,"Maryvale, Wexford",43.750072,-79.295849


<h2>Get Toronto coordinates using geopy library</h2>

In [13]:
from geopy.geocoders import Nominatim
address  = 'Toronto, Canada'
geolocator = Nominatim(user_agent="Coursera_Capstone_Project")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
display(HTML('<h4>{} : Latitude: {}, Longitude: {}.</h4>'.format(address, latitude, longitude)))

<h2>Display a map of Toronto with neighborhood markers using the folium library</h2>

In [16]:
import folium

map_toronto = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, label in zip(result['Latitude'], result['Longitude'], result['Neighbourhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=4,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto)  

map_toronto

<h4>Limit data to the neighborhoods in Toronto</h4>

In [17]:
toronto_data = result[result['Borough'].str.contains("Toronto")].reset_index(drop=True)
toronto_data

Unnamed: 0,Postcode,Borough,Neighbourhood,Latitude,Longitude
0,M5A,Downtown Toronto,Harbourfront,43.65426,-79.360636
1,M7A,Downtown Toronto,Queen's Park,43.662301,-79.389494
2,M5B,Downtown Toronto,"Ryerson, Garden District",43.657162,-79.378937
3,M5C,Downtown Toronto,St. James Town,43.651494,-79.375418
4,M4E,East Toronto,The Beaches,43.676357,-79.293031
5,M5E,Downtown Toronto,Berczy Park,43.644771,-79.373306
6,M5G,Downtown Toronto,Central Bay Street,43.657952,-79.387383
7,M6G,Downtown Toronto,Christie,43.669542,-79.422564
8,M5H,Downtown Toronto,"Adelaide, King, Richmond",43.650571,-79.384568
9,M6H,West Toronto,"Dovercourt Village, Dufferin",43.669005,-79.442259


<h2>Explore venues by neighborhood coordinates using the Foursquare API</h2>

In [18]:
import json

with open('credentials.json') as json_file:
    json_data = json.load(json_file)
CLIENT_ID = json_data["client_id"]
CLIENT_SECRET = json_data["client_secret"]
VERSION = '20180605' # Foursquare API version
radius = 500 # define radius
LIMIT = 100
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    latitude, 
    longitude, 
    radius, 
    LIMIT)
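
<p>The request URL above is assembled with <code>str.format</code>. A possible alternative, sketched below with the standard library's <code>urlencode</code>, builds the query string from a dictionary so that each value is escaped automatically. The credential strings are placeholders, and the coordinates are those of the first Toronto row shown earlier.</p>

```python
from urllib.parse import urlencode

# placeholder values; the notebook reads the real credentials from credentials.json
params = {
    "client_id": "YOUR_CLIENT_ID",
    "client_secret": "YOUR_CLIENT_SECRET",
    "v": "20180605",
    "ll": "{},{}".format(43.65426, -79.360636),
    "radius": 500,
    "limit": 100,
}

# urlencode escapes each value and joins the key=value pairs with "&"
explore_url = "https://api.foursquare.com/v2/venues/explore?" + urlencode(params)
print(explore_url)
```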

In [19]:
# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

<h4>Function that explores nearby venues using Foursquare API</h4>

In [21]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return nearby_venues
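
<p>The function above reads <code>v['venue']['categories'][0]['name']</code> directly, which raises an <code>IndexError</code> for a venue with an empty category list. The helper below is a small sketch of a more defensive lookup; <code>first_category_name</code> and the two sample items are hypothetical and only mimic the shape of the entries the function iterates over.</p>

```python
def first_category_name(item):
    """Return the first category name of an explore item, or None if it has none."""
    categories = item.get("venue", {}).get("categories", [])
    return categories[0]["name"] if categories else None

# hypothetical items shaped like the entries read from the 'items' list
with_category = {"venue": {"name": "Tandem Coffee", "categories": [{"name": "Coffee Shop"}]}}
without_category = {"venue": {"name": "Unnamed Place", "categories": []}}

print(first_category_name(with_category))
print(first_category_name(without_category))
```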

In [23]:
toronto_venues = getNearbyVenues(names=toronto_data['Neighbourhood'],
                                   latitudes=toronto_data['Latitude'],
                                   longitudes=toronto_data['Longitude']
                                  )

Harbourfront
Queen's Park
Ryerson, Garden District
St. James Town
The Beaches
Berczy Park
Central Bay Street
Christie
Adelaide, King, Richmond
Dovercourt Village, Dufferin
Harbourfront East, Toronto Islands, Union Station
Little Portugal, Trinity
The Danforth West, Riverdale
Design Exchange, Toronto Dominion Centre
Brockton, Exhibition Place, Parkdale Village
The Beaches West, India Bazaar
Commerce Court, Victoria Hotel
Studio District
Lawrence Park
Roselawn
Davisville North
Forest Hill North, Forest Hill West
High Park, The Junction South
North Toronto West
The Annex, North Midtown, Yorkville
Parkdale, Roncesvalles
Davisville
Harbord, University of Toronto
Runnymede, Swansea
Moore Park, Summerhill East
Chinatown, Grange Park, Kensington Market
Deer Park, Forest Hill SE, Rathnelly, South Hill, Summerhill West
CN Tower, Bathurst Quay, Island airport, Harbourfront West, King and Spadina, Railway Lands, South Niagara
Rosedale
Stn A PO Boxes 25 The Esplanade
Cabbagetown, St. James Town
Fir

<h4>Check the size of the resulting dataframe</h4>

In [24]:
print_h("Shape: {}".format(toronto_venues.shape))
toronto_venues.head()

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Harbourfront,43.65426,-79.360636,Roselle Desserts,43.653447,-79.362017,Bakery
1,Harbourfront,43.65426,-79.360636,Tandem Coffee,43.653559,-79.361809,Coffee Shop
2,Harbourfront,43.65426,-79.360636,Cooper Koo Family YMCA,43.653249,-79.358008,Distribution Center
3,Harbourfront,43.65426,-79.360636,Body Blitz Spa East,43.654735,-79.359874,Spa
4,Harbourfront,43.65426,-79.360636,Morning Glory Cafe,43.653947,-79.361149,Breakfast Spot


<h4>Check how many venues were returned for each neighborhood</h4>

In [25]:
toronto_venues.groupby('Neighborhood').count()

Unnamed: 0_level_0,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
"Adelaide, King, Richmond",100,100,100,100,100,100
Berczy Park,57,57,57,57,57,57
"Brockton, Exhibition Place, Parkdale Village",23,23,23,23,23,23
Business Reply Mail Processing Centre 969 Eastern,18,18,18,18,18,18
"CN Tower, Bathurst Quay, Island airport, Harbourfront West, King and Spadina, Railway Lands, South Niagara",17,17,17,17,17,17
"Cabbagetown, St. James Town",49,49,49,49,49,49
Central Bay Street,79,79,79,79,79,79
"Chinatown, Grange Park, Kensington Market",87,87,87,87,87,87
Christie,18,18,18,18,18,18
Church and Wellesley,86,86,86,86,86,86


<h4>How many unique categories can be curated from all the returned venues?</h4>

In [26]:
display(HTML('<h4>There are {} unique categories.</h4>'.format(len(toronto_venues['Venue Category'].unique()))))

<h2>Analyze Each Neighborhood</h2>

In [27]:
# one hot encoding
toronto_onehot = pd.get_dummies(toronto_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
toronto_onehot['Neighborhood'] = toronto_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [toronto_onehot.columns[-1]] + list(toronto_onehot.columns[:-1])
toronto_onehot = toronto_onehot[fixed_columns]

toronto_onehot.head()

Unnamed: 0,Yoga Studio,Afghan Restaurant,Airport,Airport Food Court,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Antique Shop,Aquarium,...,Toy / Game Store,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Wine Bar,Wine Shop,Wings Joint,Women's Store
0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [28]:
toronto_onehot.shape

(1731, 238)

<h4>Group rows by neighborhood, taking the mean frequency of occurrence of each category</h4>

In [29]:
toronto_grouped = toronto_onehot.groupby('Neighborhood').mean().reset_index()
toronto_grouped

Unnamed: 0,Neighborhood,Yoga Studio,Afghan Restaurant,Airport,Airport Food Court,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Antique Shop,...,Toy / Game Store,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Wine Bar,Wine Shop,Wings Joint,Women's Store
0,"Adelaide, King, Richmond",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,...,0.0,0.0,0.0,0.02,0.0,0.0,0.01,0.0,0.0,0.01
1,Berczy Park,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.017544,0.0,0.0,0.0,0.0,0.0,0.0
2,"Brockton, Exhibition Place, Parkdale Village",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Business Reply Mail Processing Centre 969 Eastern,0.055556,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,"CN Tower, Bathurst Quay, Island airport, Harbo...",0.0,0.0,0.058824,0.058824,0.117647,0.176471,0.117647,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,"Cabbagetown, St. James Town",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.020408,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
6,Central Bay Street,0.012658,0.0,0.0,0.0,0.0,0.0,0.0,0.012658,0.0,...,0.0,0.0,0.0,0.012658,0.0,0.0,0.012658,0.0,0.0,0.0
7,"Chinatown, Grange Park, Kensington Market",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.034483,0.0,0.057471,0.011494,0.0,0.0,0.0
8,Christie,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9,Church and Wellesley,0.011628,0.011628,0.0,0.0,0.0,0.0,0.0,0.011628,0.0,...,0.0,0.0,0.0,0.0,0.0,0.011628,0.0,0.0,0.011628,0.0


In [30]:
toronto_grouped.shape

(39, 238)
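
<p>To see why the mean of the one-hot columns gives a frequency, consider the toy example below. It is a sketch on hypothetical data (two neighborhoods named "A" and "B" with four venues in total), not on the Toronto results.</p>

```python
import pandas as pd

# hypothetical venues: three in neighborhood A, one in neighborhood B
venues = pd.DataFrame(
    {
        "Neighborhood": ["A", "A", "A", "B"],
        "Venue Category": ["Bakery", "Bakery", "Park", "Park"],
    }
)

# one indicator column per category (dtype=float so the values are 0.0 or 1.0)
onehot = pd.get_dummies(venues["Venue Category"], dtype=float)
onehot.insert(0, "Neighborhood", venues["Neighborhood"])

# the mean of a 0/1 column is the share of venues in that category
freq = onehot.groupby("Neighborhood").mean()
print(freq)
```

<p>In this toy frame, two of the three venues in "A" are bakeries, so its "Bakery" frequency is 2/3, and each row of <code>freq</code> sums to 1.</p>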

<h4>Print each neighborhood with its top 5 most common venues</h4>

In [39]:
num_top_venues = 5

for hood in toronto_grouped['Neighborhood']:
    print_h(hood)
    temp = toronto_grouped[toronto_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print_html(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues).to_html())
    print_html("<br>")
    

Unnamed: 0,venue,freq
0,Coffee Shop,0.07
1,Thai Restaurant,0.04
2,Café,0.04
3,Bar,0.04
4,Restaurant,0.04


Unnamed: 0,venue,freq
0,Coffee Shop,0.07
1,Cheese Shop,0.04
2,Bakery,0.04
3,Restaurant,0.04
4,Café,0.04


Unnamed: 0,venue,freq
0,Café,0.13
1,Bakery,0.09
2,Breakfast Spot,0.09
3,Coffee Shop,0.09
4,Intersection,0.04


Unnamed: 0,venue,freq
0,Light Rail Station,0.11
1,Yoga Studio,0.06
2,Garden Center,0.06
3,Comic Shop,0.06
4,Recording Studio,0.06


Unnamed: 0,venue,freq
0,Airport Service,0.18
1,Airport Lounge,0.12
2,Airport Terminal,0.12
3,Harbor / Marina,0.06
4,Sculpture Garden,0.06


Unnamed: 0,venue,freq
0,Coffee Shop,0.08
1,Pizza Place,0.06
2,Bakery,0.06
3,Pub,0.04
4,Restaurant,0.04


Unnamed: 0,venue,freq
0,Coffee Shop,0.16
1,Italian Restaurant,0.05
2,Sandwich Place,0.05
3,Burger Joint,0.04
4,Ice Cream Shop,0.04


Unnamed: 0,venue,freq
0,Bar,0.07
1,Café,0.06
2,Vietnamese Restaurant,0.06
3,Coffee Shop,0.05
4,Bakery,0.05


Unnamed: 0,venue,freq
0,Grocery Store,0.22
1,Café,0.17
2,Park,0.11
3,Gas Station,0.06
4,Candy Store,0.06


Unnamed: 0,venue,freq
0,Coffee Shop,0.08
1,Japanese Restaurant,0.07
2,Gay Bar,0.05
3,Restaurant,0.03
4,Sushi Restaurant,0.03


Unnamed: 0,venue,freq
0,Coffee Shop,0.12
1,Restaurant,0.07
2,Café,0.07
3,Hotel,0.05
4,Gym,0.04


Unnamed: 0,venue,freq
0,Sandwich Place,0.09
1,Dessert Shop,0.09
2,Gym,0.06
3,Pizza Place,0.06
4,Café,0.06


Unnamed: 0,venue,freq
0,Hotel,0.25
1,Gym,0.12
2,Park,0.12
3,Department Store,0.12
4,Food & Drink Shop,0.12


Unnamed: 0,venue,freq
0,Pub,0.13
1,Coffee Shop,0.13
2,Pizza Place,0.07
3,Liquor Store,0.07
4,Light Rail Station,0.07


Unnamed: 0,venue,freq
0,Coffee Shop,0.13
1,Café,0.08
2,Hotel,0.05
3,Restaurant,0.05
4,Bar,0.04


Unnamed: 0,venue,freq
0,Bakery,0.11
1,Pharmacy,0.11
2,Middle Eastern Restaurant,0.06
3,Supermarket,0.06
4,Bar,0.06


Unnamed: 0,venue,freq
0,Coffee Shop,0.11
1,Café,0.07
2,Restaurant,0.05
3,Gym,0.03
4,Gastropub,0.03


Unnamed: 0,venue,freq
0,Trail,0.2
1,Sushi Restaurant,0.2
2,Park,0.2
3,Jewelry Store,0.2
4,Bus Line,0.2


Unnamed: 0,venue,freq
0,Café,0.14
1,Bookstore,0.06
2,Bar,0.06
3,Restaurant,0.06
4,Bakery,0.06


Unnamed: 0,venue,freq
0,Coffee Shop,0.16
1,Café,0.06
2,Bakery,0.06
3,Park,0.06
4,Pub,0.06


Unnamed: 0,venue,freq
0,Coffee Shop,0.12
1,Aquarium,0.05
2,Café,0.04
3,Hotel,0.04
4,Scenic Lookout,0.03


Unnamed: 0,venue,freq
0,Mexican Restaurant,0.09
1,Café,0.09
2,Bar,0.09
3,Thai Restaurant,0.09
4,Speakeasy,0.04


Unnamed: 0,venue,freq
0,Park,0.33
1,Bus Line,0.33
2,Swim School,0.33
3,Yoga Studio,0.0
4,Museum,0.0


Unnamed: 0,venue,freq
0,Bar,0.12
1,Coffee Shop,0.07
2,Asian Restaurant,0.05
3,Restaurant,0.05
4,Wine Bar,0.04


Unnamed: 0,venue,freq
0,Playground,0.25
1,Trail,0.25
2,Restaurant,0.25
3,Tennis Court,0.25
4,Park,0.0


Unnamed: 0,venue,freq
0,Clothing Store,0.17
1,Sporting Goods Shop,0.09
2,Coffee Shop,0.09
3,Yoga Studio,0.04
4,Miscellaneous Shop,0.04


Unnamed: 0,venue,freq
0,Gift Shop,0.14
1,Breakfast Spot,0.14
2,Coffee Shop,0.07
3,Eastern European Restaurant,0.07
4,Dog Run,0.07


Unnamed: 0,venue,freq
0,Coffee Shop,0.24
1,Park,0.05
2,Burger Joint,0.05
3,Yoga Studio,0.02
4,Italian Restaurant,0.02


Unnamed: 0,venue,freq
0,Park,0.5
1,Playground,0.25
2,Trail,0.25
3,Music Venue,0.0
4,Market,0.0


Unnamed: 0,venue,freq
0,Garden,1.0
1,Yoga Studio,0.0
2,Music Venue,0.0
3,Market,0.0
4,Massage Studio,0.0


Unnamed: 0,venue,freq
0,Café,0.08
1,Coffee Shop,0.08
2,Pizza Place,0.06
3,Italian Restaurant,0.06
4,Sushi Restaurant,0.06


Unnamed: 0,venue,freq
0,Coffee Shop,0.08
1,Clothing Store,0.08
2,Café,0.03
3,Cosmetics Shop,0.03
4,Middle Eastern Restaurant,0.03


Unnamed: 0,venue,freq
0,Coffee Shop,0.07
1,Café,0.06
2,Restaurant,0.05
3,Hotel,0.04
4,Clothing Store,0.04


Unnamed: 0,venue,freq
0,Coffee Shop,0.11
1,Café,0.04
2,Restaurant,0.04
3,Beer Bar,0.03
4,Seafood Restaurant,0.03


Unnamed: 0,venue,freq
0,Café,0.1
1,Coffee Shop,0.07
2,Brewery,0.05
3,American Restaurant,0.05
4,Gastropub,0.05


Unnamed: 0,venue,freq
0,Sandwich Place,0.14
1,Café,0.14
2,Coffee Shop,0.14
3,Vegetarian / Vegan Restaurant,0.05
4,BBQ Joint,0.05


Unnamed: 0,venue,freq
0,Pub,0.2
1,Health Food Store,0.2
2,Coffee Shop,0.2
3,Trail,0.2
4,Yoga Studio,0.0


Unnamed: 0,venue,freq
0,Sandwich Place,0.1
1,Park,0.1
2,Pizza Place,0.1
3,Gym,0.05
4,Board Shop,0.05


Unnamed: 0,venue,freq
0,Greek Restaurant,0.22
1,Coffee Shop,0.07
2,Italian Restaurant,0.07
3,Ice Cream Shop,0.05
4,Bookstore,0.05


<h4>Define function to sort the venues in descending order</h4>

In [32]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]
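
<p>The same "top N categories" selection can be sketched with <code>Series.nlargest</code>, which avoids sorting the whole row. The row below is hypothetical, and the frequencies are chosen without ties so that the order is unambiguous.</p>

```python
import pandas as pd

# hypothetical frequency row: the first entry is the neighborhood name
row = pd.Series(
    {"Neighborhood": "A", "Bakery": 0.10, "Coffee Shop": 0.40, "Park": 0.25, "Pub": 0.05}
)

# drop the name, convert to float, and keep the three largest frequencies
top3 = row.iloc[1:].astype(float).nlargest(3).index.tolist()
print(top3)
```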

<h4>Create the new dataframe and display the top 10 venues for each neighborhood</h4>

In [46]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = toronto_grouped['Neighborhood']

for ind in np.arange(toronto_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(toronto_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,"Adelaide, King, Richmond",Coffee Shop,Thai Restaurant,Café,Bar,Restaurant,Sushi Restaurant,Gastropub,Steakhouse,Lounge,Cosmetics Shop
1,Berczy Park,Coffee Shop,Farmers Market,French Restaurant,Bakery,Seafood Restaurant,Restaurant,Cheese Shop,Café,Beer Bar,Cocktail Bar
2,"Brockton, Exhibition Place, Parkdale Village",Café,Breakfast Spot,Coffee Shop,Bakery,Climbing Gym,Burrito Place,Japanese Restaurant,Italian Restaurant,Stadium,Restaurant
3,Business Reply Mail Processing Centre 969 Eastern,Light Rail Station,Yoga Studio,Auto Workshop,Skate Park,Smoke Shop,Spa,Farmers Market,Fast Food Restaurant,Burrito Place,Restaurant
4,"CN Tower, Bathurst Quay, Island airport, Harbo...",Airport Service,Airport Lounge,Airport Terminal,Boutique,Sculpture Garden,Airport,Airport Food Court,Bar,Harbor / Marina,Rental Car Location
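
<p>The column labels above rely on a three-item suffix list plus a fallback to "th", which is correct for the ten columns used here but would label a 21st column "21th". The helper below is a hypothetical sketch of a general ordinal formatter; the name <code>ordinal</code> is an assumption, not part of the notebook.</p>

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st', 11 -> '11th'."""
    # numbers ending in 10 to 20 always take 'th' (11th, 12th, 13th, 112th, ...)
    if 10 <= n % 100 <= 20:
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return "{}{}".format(n, suffix)

# build the same column labels as the cell above
columns = ["Neighborhood"] + ["{} Most Common Venue".format(ordinal(i)) for i in range(1, 11)]
print(columns[1], "|", columns[-1])
```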


<h2>Divide neighborhoods in Toronto into clusters using the sklearn library</h2>

In [47]:
from sklearn.cluster import KMeans
# set number of clusters
kclusters = 3

toronto_grouped_clustering = toronto_grouped.drop(columns='Neighborhood')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(toronto_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0])
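
<p>The first ten labels are all 0, so it can be worth checking how the rows are spread across clusters. The sketch below counts cluster sizes with <code>np.bincount</code>; the <code>labels</code> array is hypothetical and merely stands in for <code>kmeans.labels_</code>.</p>

```python
import numpy as np

# hypothetical labels standing in for kmeans.labels_
labels = np.array([0, 0, 0, 1, 0, 2, 0, 1, 0, 0])

# bincount returns how many rows carry each label 0..k-1
sizes = np.bincount(labels, minlength=3)
print(dict(enumerate(sizes.tolist())))
```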

In [48]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

toronto_merged = toronto_data

# merge the sorted venue table into toronto_data, which already holds latitude/longitude for each neighborhood
toronto_merged = toronto_merged.merge(neighborhoods_venues_sorted.set_index('Neighborhood'), left_on="Neighbourhood", right_on='Neighborhood')

toronto_merged.head() # check the last columns!

Unnamed: 0,Postcode,Borough,Neighbourhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M5A,Downtown Toronto,Harbourfront,43.65426,-79.360636,0,Coffee Shop,Pub,Café,Bakery,Park,Restaurant,Mexican Restaurant,Theater,Breakfast Spot,Antique Shop
1,M7A,Downtown Toronto,Queen's Park,43.662301,-79.389494,0,Coffee Shop,Park,Burger Joint,Gym,Fast Food Restaurant,Portuguese Restaurant,Nightclub,Music Venue,Mexican Restaurant,Juice Bar
2,M5B,Downtown Toronto,"Ryerson, Garden District",43.657162,-79.378937,0,Coffee Shop,Clothing Store,Café,Cosmetics Shop,Middle Eastern Restaurant,Japanese Restaurant,Bubble Tea Shop,Restaurant,Pizza Place,Diner
3,M5C,Downtown Toronto,St. James Town,43.651494,-79.375418,0,Coffee Shop,Café,Restaurant,Clothing Store,Hotel,Italian Restaurant,Beer Bar,Breakfast Spot,Cosmetics Shop,Bakery
4,M4E,East Toronto,The Beaches,43.676357,-79.293031,0,Trail,Coffee Shop,Health Food Store,Pub,Doner Restaurant,Diner,Discount Store,Distribution Center,Dog Run,Women's Store


In [49]:
import matplotlib.cm as cm
import matplotlib.colors as colors
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(toronto_merged['Latitude'], toronto_merged['Longitude'], toronto_merged['Neighbourhood'], toronto_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster],
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters
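
<p>The map needs one distinct color per cluster label. As a sketch of that idea without matplotlib, the hypothetical helper below spaces hues evenly with the standard library's <code>colorsys</code> module. This is a swapped-in technique for illustration, not the colormap used in the cell above.</p>

```python
import colorsys

def cluster_colors(k):
    """Return k hex colors spaced evenly around the hue wheel."""
    palette = []
    for i in range(k):
        r, g, b = colorsys.hsv_to_rgb(i / k, 1.0, 1.0)
        palette.append(
            "#{:02x}{:02x}{:02x}".format(round(r * 255), round(g * 255), round(b * 255))
        )
    return palette

palette = cluster_colors(3)
# a label indexes the palette directly: label 0 -> palette[0], and so on
print(palette)
```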

In [52]:
descriptions = ["Contains mainly restaurants, coffee shops and gift shops","Family or residential area, with parks, a swim school and trails","Cluster with gardens, restaurants, shops and entertainment"]
for i in range(kclusters):
    display(HTML("<h3>Cluster: " + str(i) + "</h3><p>" + descriptions[i] + "</p>"))
    display(HTML(toronto_merged.loc[toronto_merged['Cluster Labels'] == i, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]].to_html()))

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Downtown Toronto,0,Coffee Shop,Pub,Café,Bakery,Park,Restaurant,Mexican Restaurant,Theater,Breakfast Spot,Antique Shop
1,Downtown Toronto,0,Coffee Shop,Park,Burger Joint,Gym,Fast Food Restaurant,Portuguese Restaurant,Nightclub,Music Venue,Mexican Restaurant,Juice Bar
2,Downtown Toronto,0,Coffee Shop,Clothing Store,Café,Cosmetics Shop,Middle Eastern Restaurant,Japanese Restaurant,Bubble Tea Shop,Restaurant,Pizza Place,Diner
3,Downtown Toronto,0,Coffee Shop,Café,Restaurant,Clothing Store,Hotel,Italian Restaurant,Beer Bar,Breakfast Spot,Cosmetics Shop,Bakery
4,East Toronto,0,Trail,Coffee Shop,Health Food Store,Pub,Doner Restaurant,Diner,Discount Store,Distribution Center,Dog Run,Women's Store
5,Downtown Toronto,0,Coffee Shop,Farmers Market,French Restaurant,Bakery,Seafood Restaurant,Restaurant,Cheese Shop,Café,Beer Bar,Cocktail Bar
6,Downtown Toronto,0,Coffee Shop,Sandwich Place,Italian Restaurant,Ice Cream Shop,Juice Bar,Café,Burger Joint,Japanese Restaurant,Bar,Salad Place
7,Downtown Toronto,0,Grocery Store,Café,Park,Baby Store,Restaurant,Diner,Italian Restaurant,Athletics & Sports,Nightclub,Candy Store
8,Downtown Toronto,0,Coffee Shop,Thai Restaurant,Café,Bar,Restaurant,Sushi Restaurant,Gastropub,Steakhouse,Lounge,Cosmetics Shop
9,West Toronto,0,Pharmacy,Bakery,Music Venue,Bus Stop,Café,Recording Studio,Bar,Bank,Supermarket,Middle Eastern Restaurant


Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
18,Central Toronto,1,Park,Swim School,Bus Line,Women's Store,Dim Sum Restaurant,Ethiopian Restaurant,Electronics Store,Eastern European Restaurant,Dumpling Restaurant,Donut Shop
21,Central Toronto,1,Trail,Park,Bus Line,Sushi Restaurant,Jewelry Store,Doner Restaurant,Discount Store,Distribution Center,Dog Run,Women's Store
33,Downtown Toronto,1,Park,Trail,Playground,Doner Restaurant,Dim Sum Restaurant,Diner,Discount Store,Distribution Center,Dog Run,Women's Store


Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
19,Central Toronto,2,Garden,Women's Store,Dessert Shop,Event Space,Ethiopian Restaurant,Electronics Store,Eastern European Restaurant,Dumpling Restaurant,Donut Shop,Doner Restaurant
