<b>This notebook will be mainly used for the capstone project.</b>

# 1. Toronto postal codes scrapping

## Request content

In [1]:
import requests

In [2]:
content = requests.get('https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M')

In [32]:
content.text[:200]

'<!DOCTYPE html>\n<html class="client-nojs" lang="en" dir="ltr">\n<head>\n<meta charset="UTF-8"/>\n<title>List of postal codes of Canada: M - Wikipedia</title>\n<script>document.documentElement.className="c'

## Parsing with Beautiful Soup

In [7]:
from bs4 import BeautifulSoup

In [9]:
parser = BeautifulSoup(content.text, 'html.parser')

In [11]:
parser.head.title

<title>List of postal codes of Canada: M - Wikipedia</title>

In [19]:
codes_table = parser.body.find_all("table", class_ = "wikitable sortable")[0]

In [20]:
codes_elements = codes_table.select('tr')

In [25]:
codes_elements[:2]

[<tr>
 <th>Postcode</th>
 <th>Borough</th>
 <th>Neighbourhood
 </th></tr>, <tr>
 <td>M1A</td>
 <td>Not assigned</td>
 <td>Not assigned
 </td></tr>]

In [28]:
codes = [[value.text for value in element.select('td')] for element in codes_elements[1:]]

In [29]:
codes[:2]

[['M1A', 'Not assigned', 'Not assigned\n'],
 ['M2A', 'Not assigned', 'Not assigned\n']]

## Converting to Pandas

In [30]:
import pandas as pd

In [54]:
postal_codes = pd.DataFrame(codes, columns=['PostalCode', 'Borough', 'Neighborhood'])

In [55]:
postal_codes.head(6)

Unnamed: 0,PostalCode,Borough,Neighborhood
0,M1A,Not assigned,Not assigned\n
1,M2A,Not assigned,Not assigned\n
2,M3A,North York,Parkwoods\n
3,M4A,North York,Victoria Village\n
4,M5A,Downtown Toronto,Harbourfront\n
5,M6A,North York,Lawrence Heights\n


## Cleaning

In [56]:
import numpy as np

In [57]:
postal_codes.replace(to_replace = ['Not assigned', 'Not assigned\n'], value = np.nan, inplace = True)

In [58]:
postal_codes.head()

Unnamed: 0,PostalCode,Borough,Neighborhood
0,M1A,,
1,M2A,,
2,M3A,North York,Parkwoods\n
3,M4A,North York,Victoria Village\n
4,M5A,Downtown Toronto,Harbourfront\n


### 1. Dealing with the empty Neighborhood but not-empty Borough

In [62]:
postal_codes.loc[postal_codes['Neighborhood'].isnull() & postal_codes['Borough'].notnull()]

Unnamed: 0,PostalCode,Borough,Neighborhood
9,M9A,Queen's Park,


In [63]:
postal_codes['Neighborhood'] = postal_codes['Neighborhood'].fillna(postal_codes['Borough'])

In [64]:
postal_codes.loc[postal_codes['Neighborhood'].isnull() & postal_codes['Borough'].notnull()]

Unnamed: 0,PostalCode,Borough,Neighborhood


In [65]:
postal_codes = postal_codes.dropna(axis = 0)

In [66]:
postal_codes.head()

Unnamed: 0,PostalCode,Borough,Neighborhood
2,M3A,North York,Parkwoods\n
3,M4A,North York,Victoria Village\n
4,M5A,Downtown Toronto,Harbourfront\n
5,M6A,North York,Lawrence Heights\n
6,M6A,North York,Lawrence Manor\n


In [67]:
postal_codes.shape

(210, 3)

### 2.Cleaning names

In [68]:
postal_codes['Neighborhood'] = postal_codes['Neighborhood'].str.replace('\n', '')

In [69]:
postal_codes.head()

Unnamed: 0,PostalCode,Borough,Neighborhood
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,Harbourfront
5,M6A,North York,Lawrence Heights
6,M6A,North York,Lawrence Manor


### 3.Dealing with repeated postal codes

In [74]:
postal_codes.loc[postal_codes['PostalCode'] == 'M5J']

Unnamed: 0,PostalCode,Borough,Neighborhood
82,M5J,Downtown Toronto,Harbourfront East
83,M5J,Downtown Toronto,Toronto Islands
84,M5J,Downtown Toronto,Union Station


In [79]:
def aggregate_neighborhood(arr):
    return ','.join(set(arr.values))

In [80]:
postal_codes_grouped = postal_codes.groupby('PostalCode').agg(aggregate_neighborhood).reset_index()

In [81]:
postal_codes_grouped.head()

Unnamed: 0,PostalCode,Borough,Neighborhood
0,M1B,Scarborough,"Rouge,Malvern"
1,M1C,Scarborough,"Rouge Hill,Highland Creek,Port Union"
2,M1E,Scarborough,"Guildwood,Morningside,West Hill"
3,M1G,Scarborough,Woburn
4,M1H,Scarborough,Cedarbrae


In [84]:
postal_codes_grouped.loc[postal_codes_grouped['PostalCode'] == 'M5J']

Unnamed: 0,PostalCode,Borough,Neighborhood
59,M5J,Downtown Toronto,"Toronto Islands,Union Station,Harbourfront East"


In [85]:
postal_codes_grouped.head()

Unnamed: 0,PostalCode,Borough,Neighborhood
0,M1B,Scarborough,"Rouge,Malvern"
1,M1C,Scarborough,"Rouge Hill,Highland Creek,Port Union"
2,M1E,Scarborough,"Guildwood,Morningside,West Hill"
3,M1G,Scarborough,Woburn
4,M1H,Scarborough,Cedarbrae


In [86]:
print('Rows: ', postal_codes_grouped.shape[0])

Rows:  103


# 2. Getting geolocations

In [113]:
conda install -c conda-forge geopy

Collecting package metadata (current_repodata.json): ...working... done
Solving environment: ...working... done

## Package Plan ##

  environment location: D:\Installed\Anaconda3

  added / updated specs:
    - geopy


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    geographiclib-1.50         |             py_0          34 KB  conda-forge
    geopy-1.20.0               |             py_0          57 KB  conda-forge
    ------------------------------------------------------------
                                           Total:          91 KB

The following NEW packages will be INSTALLED:

  geographiclib      conda-forge/noarch::geographiclib-1.50-py_0
  geopy              conda-forge/noarch::geopy-1.20.0-py_0



Downloading and Extracting Packages

geopy-1.20.0         | 57 KB     |            |   0% 
geopy-1.20.0         | 57 KB     | ##7        |  28% 
geopy-1.20.0         | 57 KB  

In [135]:
from geopy.geocoders import Nominatim
from geopy.extra.rate_limiter import RateLimiter

In [141]:
geolocator = Nominatim(user_agent="coursera",timeout=10)
geocode = RateLimiter(geolocator.geocode, min_delay_seconds=2)

In [145]:
for postal_code in postal_codes_grouped['PostalCode']:
    location = None
    while location is None:
        location = geocode('{}, Toronto, Ontario'.format(postal_code))
    if location:
        postal_codes_grouped.loc[postal_codes_grouped['PostalCode'] == postal_code, 'Latitude'] = location.latitude
        postal_codes_grouped.loc[postal_codes_grouped['PostalCode'] == postal_code, 'Longitude'] = location.longitude

KeyboardInterrupt: 

In [147]:
#geocoder services are very slow and get a lot of errors

In [148]:
geo = pd.read_csv('https://cocl.us/Geospatial_data')

In [162]:
geo.head()

Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


In [163]:
postal_codes_loc = pd.merge(left = postal_codes_grouped, right = geo, left_on='PostalCode', right_on = 'Postal Code', how = 'left')

In [165]:
postal_codes_loc.drop(columns=['Postal Code'], inplace=True)
postal_codes_loc.head()

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude
0,M1B,Scarborough,"Rouge,Malvern",43.806686,-79.194353
1,M1C,Scarborough,"Rouge Hill,Highland Creek,Port Union",43.784535,-79.160497
2,M1E,Scarborough,"Guildwood,Morningside,West Hill",43.763573,-79.188711
3,M1G,Scarborough,Woburn,43.770992,-79.216917
4,M1H,Scarborough,Cedarbrae,43.773136,-79.239476


In [168]:
postal_codes_loc.to_csv('./postal_codes.csv', index=False)

# 3. Clustering the neighborhoods in Toronto

## Primarly viz

In [171]:
!conda install -c conda-forge folium=0.5.0 --yes

Collecting package metadata (current_repodata.json): ...working... done
Solving environment: ...working... failed with initial frozen solve. Retrying with flexible solve.
Collecting package metadata (repodata.json): ...working... done
Solving environment: ...working... done

## Package Plan ##

  environment location: D:\Installed\Anaconda3

  added / updated specs:
    - folium=0.5.0


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    altair-4.0.1               |             py_0         575 KB  conda-forge
    branca-0.3.1               |             py_0          25 KB  conda-forge
    certifi-2019.9.11          |           py37_0         147 KB  conda-forge
    folium-0.5.0               |             py_0          45 KB  conda-forge
    vincent-0.4.4              |             py_1          28 KB  conda-forge
    ------------------------------------------------------------
           

In [172]:
import folium

In [176]:
#find Toronto coordinates
address = 'Toronto'

geolocator = Nominatim(user_agent="Coursera", timeout=3)
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude

In [179]:
map_toronto = folium.Map(location=[latitude, longitude], zoom_start=10)

for lat, lng, borough, neighborhood in zip(postal_codes_loc['Latitude'], postal_codes_loc['Longitude'], postal_codes_loc['Borough'], postal_codes_loc['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto)

In [180]:
map_toronto

## Neighborhood data

In [182]:
postal_codes_loc['Borough'].unique()

array(['Scarborough', 'North York', 'East York', 'East Toronto',
       'Central Toronto', 'Downtown Toronto', 'York', 'West Toronto',
       'Mississauga', 'Etobicoke', "Queen's Park"], dtype=object)

In [187]:
toronto_loc = postal_codes_loc[postal_codes_loc['Borough'].str.contains('Toronto')]

In [188]:
toronto_loc.head()

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude
37,M4E,East Toronto,The Beaches,43.676357,-79.293031
41,M4K,East Toronto,"Riverdale,The Danforth West",43.679557,-79.352188
42,M4L,East Toronto,"The Beaches West,India Bazaar",43.668999,-79.315572
43,M4M,East Toronto,Studio District,43.659526,-79.340923
44,M4N,Central Toronto,Lawrence Park,43.72802,-79.38879


In [189]:
CLIENT_ID = '4IUUIV0EABE14A44D5TYBAM50VPX0Q2PQSLN4JYS12YJEJD3' # your Foursquare ID
CLIENT_SECRET = 'GR3L5KUNHQLFSEZ1SPI2MLB00E5OMFO22KCB5LZQ1EKIASP0' # your Foursquare Secret
VERSION = '20200122' # Foursquare API version


In [192]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            500, 
            100)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

In [193]:
toronto_venues = getNearbyVenues(names=toronto_loc['Neighborhood'],
                                   latitudes=toronto_loc['Latitude'],
                                   longitudes=toronto_loc['Longitude']
                                  )

The Beaches
Riverdale,The Danforth West
The Beaches West,India Bazaar
Studio District
Lawrence Park
Davisville North
North Toronto West
Davisville
Moore Park,Summerhill East
Rathnelly,South Hill,Summerhill West,Deer Park,Forest Hill SE
Rosedale
St. James Town,Cabbagetown
Church and Wellesley
Harbourfront
Garden District,Ryerson
St. James Town
Berczy Park
Central Bay Street
King,Richmond,Adelaide
Toronto Islands,Union Station,Harbourfront East
Toronto Dominion Centre,Design Exchange
Commerce Court,Victoria Hotel
Roselawn
Forest Hill West,Forest Hill North
Yorkville,North Midtown,The Annex
University of Toronto,Harbord
Chinatown,Grange Park,Kensington Market
King and Spadina,South Niagara,CN Tower,Harbourfront West,Bathurst Quay,Railway Lands,Island airport
Stn A PO Boxes 25 The Esplanade
First Canadian Place,Underground city
Christie
Dovercourt Village,Dufferin
Trinity,Little Portugal
Exhibition Place,Parkdale Village,Brockton
The Junction South,High Park
Roncesvalles,Parkdale
Swansea,R

In [220]:
toronto_venues = toronto_venues.loc[toronto_venues['Venue Category'] != 'Neighborhood']
print(toronto_venues.shape)
toronto_venues.head()

(1712, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,The Beaches,43.676357,-79.293031,Glen Manor Ravine,43.676821,-79.293942,Trail
1,The Beaches,43.676357,-79.293031,The Big Carrot Natural Food Market,43.678879,-79.297734,Health Food Store
2,The Beaches,43.676357,-79.293031,Glen Stewart Ravine,43.6763,-79.294784,Other Great Outdoors
3,The Beaches,43.676357,-79.293031,Grover Pub and Grub,43.679181,-79.297215,Pub
5,"Riverdale,The Danforth West",43.679557,-79.352188,Pantheon,43.677621,-79.351434,Greek Restaurant


In [221]:
print('There are {} uniques categories.'.format(len(toronto_venues['Venue Category'].unique())))

There are 233 uniques categories.


### Number of venues in the neighborhoods

In [222]:
toronto_venues.groupby('Neighborhood').count()['Venue']

Neighborhood
Berczy Park                                                                                              58
Business Reply Mail Processing Centre 969 Eastern                                                        15
Central Bay Street                                                                                       84
Chinatown,Grange Park,Kensington Market                                                                  87
Christie                                                                                                 17
Church and Wellesley                                                                                     85
Commerce Court,Victoria Hotel                                                                           100
Davisville                                                                                               35
Davisville North                                                                                          9
Dovercourt Vill

## Analyze neighborhoods

In [226]:
# one hot encoding
toronto_onehot = pd.get_dummies(toronto_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
toronto_onehot['Neighborhood'] = toronto_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [toronto_onehot.columns[-1]] + list(toronto_onehot.columns[:-1])
toronto_onehot = toronto_onehot[fixed_columns]

toronto_onehot.shape

(1712, 234)

In [227]:
toronto_onehot.head()

Unnamed: 0,Neighborhood,Afghan Restaurant,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Antique Shop,...,Toy / Game Store,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Wine Bar,Wine Shop,Women's Store,Yoga Studio
0,The Beaches,0,0,0,0,0,0,0,0,0,...,0,1,0,0,0,0,0,0,0,0
1,The Beaches,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,The Beaches,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,The Beaches,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
5,"Riverdale,The Danforth West",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


### Convert number of venues into frequencies

In [228]:
toronto_grouped = toronto_onehot.groupby('Neighborhood').mean().reset_index()

In [230]:
toronto_grouped

Unnamed: 0,Neighborhood,Afghan Restaurant,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Antique Shop,...,Toy / Game Store,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Wine Bar,Wine Shop,Women's Store,Yoga Studio
0,Berczy Park,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.017241,0.0,0.0,0.0,0.0,0.0,0.0
1,Business Reply Mail Processing Centre 969 Eastern,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,Central Bay Street,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.011905,0.0,...,0.0,0.0,0.0,0.011905,0.0,0.0,0.011905,0.0,0.0,0.011905
3,"Chinatown,Grange Park,Kensington Market",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.045977,0.0,0.057471,0.011494,0.0,0.0,0.0
4,Christie,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,Church and Wellesley,0.011765,0.0,0.0,0.0,0.0,0.0,0.0,0.011765,0.0,...,0.0,0.0,0.0,0.0,0.0,0.011765,0.0,0.011765,0.0,0.011765
6,"Commerce Court,Victoria Hotel",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,...,0.0,0.0,0.0,0.02,0.0,0.0,0.01,0.0,0.0,0.0
7,Davisville,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.028571,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
8,Davisville North,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9,"Dovercourt Village,Dufferin",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.066667,0.0,0.0


## Cluster neighbors

In [231]:
from sklearn.cluster import KMeans

In [272]:
# set number of clusters
kclusters = 5

toronto_grouped_clustering = toronto_grouped.drop('Neighborhood', 1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=15).fit(toronto_grouped_clustering)

# check cluster labels generated for each row in the dataframe
toronto_clusters = kmeans.labels_


In [273]:
toronto_clusters

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 4, 0, 0, 0, 0, 4, 1, 0, 0, 0,
       0, 0, 4, 2, 0, 0, 0, 0, 0, 3, 0, 0, 0, 0, 0, 0, 0])

## Select top-10 venue types

In [275]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [293]:
columns = ['Neighborhood']
columns.extend(['#{} Most common Venue'.format(i) for i in range(1,11)])

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = toronto_grouped['Neighborhood']

for ind in np.arange(toronto_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(toronto_grouped.iloc[ind, :], 10)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,#1 Most common Venue,#2 Most common Venue,#3 Most common Venue,#4 Most common Venue,#5 Most common Venue,#6 Most common Venue,#7 Most common Venue,#8 Most common Venue,#9 Most common Venue,#10 Most common Venue
0,Berczy Park,Coffee Shop,Cocktail Bar,Seafood Restaurant,Steakhouse,Bakery,Farmers Market,Café,Cheese Shop,Beer Bar,Liquor Store
1,Business Reply Mail Processing Centre 969 Eastern,Park,Garden,Light Rail Station,Farmers Market,Spa,Fast Food Restaurant,Burrito Place,Restaurant,Brewery,Auto Workshop
2,Central Bay Street,Coffee Shop,Italian Restaurant,Café,Chinese Restaurant,Sandwich Place,Juice Bar,Burger Joint,Japanese Restaurant,Ice Cream Shop,Bar
3,"Chinatown,Grange Park,Kensington Market",Café,Vietnamese Restaurant,Chinese Restaurant,Dumpling Restaurant,Coffee Shop,Vegetarian / Vegan Restaurant,Bar,Bakery,Mexican Restaurant,Cocktail Bar
4,Christie,Grocery Store,Café,Park,Baby Store,Gas Station,Candy Store,Athletics & Sports,Italian Restaurant,Diner,Nightclub


### Add clusters

In [298]:
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', toronto_clusters)

In [299]:

neighborhoods_venues_sorted

Unnamed: 0,Cluster Labels,Neighborhood,#1 Most common Venue,#2 Most common Venue,#3 Most common Venue,#4 Most common Venue,#5 Most common Venue,#6 Most common Venue,#7 Most common Venue,#8 Most common Venue,#9 Most common Venue,#10 Most common Venue
0,0,Berczy Park,Coffee Shop,Cocktail Bar,Seafood Restaurant,Steakhouse,Bakery,Farmers Market,Café,Cheese Shop,Beer Bar,Liquor Store
1,0,Business Reply Mail Processing Centre 969 Eastern,Park,Garden,Light Rail Station,Farmers Market,Spa,Fast Food Restaurant,Burrito Place,Restaurant,Brewery,Auto Workshop
2,0,Central Bay Street,Coffee Shop,Italian Restaurant,Café,Chinese Restaurant,Sandwich Place,Juice Bar,Burger Joint,Japanese Restaurant,Ice Cream Shop,Bar
3,0,"Chinatown,Grange Park,Kensington Market",Café,Vietnamese Restaurant,Chinese Restaurant,Dumpling Restaurant,Coffee Shop,Vegetarian / Vegan Restaurant,Bar,Bakery,Mexican Restaurant,Cocktail Bar
4,0,Christie,Grocery Store,Café,Park,Baby Store,Gas Station,Candy Store,Athletics & Sports,Italian Restaurant,Diner,Nightclub
5,0,Church and Wellesley,Coffee Shop,Japanese Restaurant,Sushi Restaurant,Gay Bar,Restaurant,Gym,Pub,Men's Store,Mediterranean Restaurant,Hotel
6,0,"Commerce Court,Victoria Hotel",Coffee Shop,Café,Hotel,Restaurant,Gym,Seafood Restaurant,Steakhouse,Japanese Restaurant,Italian Restaurant,Deli / Bodega
7,0,Davisville,Dessert Shop,Sandwich Place,Gym,Sushi Restaurant,Italian Restaurant,Pizza Place,Café,Coffee Shop,Brewery,Toy / Game Store
8,0,Davisville North,Gym,Food & Drink Shop,Sandwich Place,Hotel,Asian Restaurant,Department Store,Dog Run,Breakfast Spot,Park,Eastern European Restaurant
9,0,"Dovercourt Village,Dufferin",Pharmacy,Bakery,Music Venue,Middle Eastern Restaurant,Café,Brewery,Bar,Supermarket,Bank,Park


### Add geo-data

In [304]:
toronto_merged = toronto_loc

# merge toronto_grouped with toronto_data to add latitude/longitude for each neighborhood
toronto_merged = toronto_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')
toronto_merged = toronto_merged.reset_index(drop=True)
toronto_merged.head()

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,#1 Most common Venue,#2 Most common Venue,#3 Most common Venue,#4 Most common Venue,#5 Most common Venue,#6 Most common Venue,#7 Most common Venue,#8 Most common Venue,#9 Most common Venue,#10 Most common Venue
0,M4E,East Toronto,The Beaches,43.676357,-79.293031,3,Trail,Other Great Outdoors,Health Food Store,Pub,Donut Shop,Diner,Discount Store,Dog Run,Doner Restaurant,Yoga Studio
1,M4K,East Toronto,"Riverdale,The Danforth West",43.679557,-79.352188,0,Greek Restaurant,Coffee Shop,Italian Restaurant,Ice Cream Shop,Furniture / Home Store,Yoga Studio,Liquor Store,Sports Bar,Spa,Juice Bar
2,M4L,East Toronto,"The Beaches West,India Bazaar",43.668999,-79.315572,0,Park,Pizza Place,Brewery,Burger Joint,Burrito Place,Sandwich Place,Pub,Coffee Shop,Gym,Sushi Restaurant
3,M4M,East Toronto,Studio District,43.659526,-79.340923,0,Café,Coffee Shop,Gastropub,Bakery,Brewery,Italian Restaurant,American Restaurant,Yoga Studio,Comfort Food Restaurant,Seafood Restaurant
4,M4N,Central Toronto,Lawrence Park,43.72802,-79.38879,4,Park,Swim School,Bus Line,Yoga Studio,Diner,Falafel Restaurant,Event Space,Ethiopian Restaurant,Electronics Store,Eastern European Restaurant


### Visualize clusters

In [306]:
import matplotlib.cm as cm
import matplotlib.colors as colors

map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(toronto_merged['Latitude'], toronto_merged['Longitude'], toronto_merged['Neighborhood'], toronto_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       


In [307]:
map_clusters

In [308]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 0, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,#1 Most common Venue,#2 Most common Venue,#3 Most common Venue,#4 Most common Venue,#5 Most common Venue,#6 Most common Venue,#7 Most common Venue,#8 Most common Venue,#9 Most common Venue,#10 Most common Venue
1,East Toronto,0,Greek Restaurant,Coffee Shop,Italian Restaurant,Ice Cream Shop,Furniture / Home Store,Yoga Studio,Liquor Store,Sports Bar,Spa,Juice Bar
2,East Toronto,0,Park,Pizza Place,Brewery,Burger Joint,Burrito Place,Sandwich Place,Pub,Coffee Shop,Gym,Sushi Restaurant
3,East Toronto,0,Café,Coffee Shop,Gastropub,Bakery,Brewery,Italian Restaurant,American Restaurant,Yoga Studio,Comfort Food Restaurant,Seafood Restaurant
5,Central Toronto,0,Gym,Food & Drink Shop,Sandwich Place,Hotel,Asian Restaurant,Department Store,Dog Run,Breakfast Spot,Park,Eastern European Restaurant
6,Central Toronto,0,Sporting Goods Shop,Coffee Shop,Yoga Studio,Salon / Barbershop,Café,Restaurant,Rental Car Location,Chinese Restaurant,Clothing Store,Park
7,Central Toronto,0,Dessert Shop,Sandwich Place,Gym,Sushi Restaurant,Italian Restaurant,Pizza Place,Café,Coffee Shop,Brewery,Toy / Game Store
9,Central Toronto,0,Pub,Coffee Shop,American Restaurant,Supermarket,Restaurant,Fried Chicken Joint,Sports Bar,Sushi Restaurant,Pizza Place,Liquor Store
11,Downtown Toronto,0,Restaurant,Coffee Shop,Café,Italian Restaurant,Pizza Place,Pub,Bakery,Japanese Restaurant,Caribbean Restaurant,Indian Restaurant
12,Downtown Toronto,0,Coffee Shop,Japanese Restaurant,Sushi Restaurant,Gay Bar,Restaurant,Gym,Pub,Men's Store,Mediterranean Restaurant,Hotel
13,Downtown Toronto,0,Coffee Shop,Park,Pub,Bakery,Café,Breakfast Spot,Mexican Restaurant,Restaurant,Shoe Store,Brewery


In [309]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 1, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,#1 Most common Venue,#2 Most common Venue,#3 Most common Venue,#4 Most common Venue,#5 Most common Venue,#6 Most common Venue,#7 Most common Venue,#8 Most common Venue,#9 Most common Venue,#10 Most common Venue
8,Central Toronto,1,Trail,Tennis Court,Yoga Studio,Dessert Shop,Falafel Restaurant,Event Space,Ethiopian Restaurant,Electronics Store,Eastern European Restaurant,Dumpling Restaurant


In [310]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 2, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,#1 Most common Venue,#2 Most common Venue,#3 Most common Venue,#4 Most common Venue,#5 Most common Venue,#6 Most common Venue,#7 Most common Venue,#8 Most common Venue,#9 Most common Venue,#10 Most common Venue
22,Central Toronto,2,Pool,Garden,Yoga Studio,Dessert Shop,Falafel Restaurant,Event Space,Ethiopian Restaurant,Electronics Store,Eastern European Restaurant,Dumpling Restaurant


In [311]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 3, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,#1 Most common Venue,#2 Most common Venue,#3 Most common Venue,#4 Most common Venue,#5 Most common Venue,#6 Most common Venue,#7 Most common Venue,#8 Most common Venue,#9 Most common Venue,#10 Most common Venue
0,East Toronto,3,Trail,Other Great Outdoors,Health Food Store,Pub,Donut Shop,Diner,Discount Store,Dog Run,Doner Restaurant,Yoga Studio


In [312]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 4, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,#1 Most common Venue,#2 Most common Venue,#3 Most common Venue,#4 Most common Venue,#5 Most common Venue,#6 Most common Venue,#7 Most common Venue,#8 Most common Venue,#9 Most common Venue,#10 Most common Venue
4,Central Toronto,4,Park,Swim School,Bus Line,Yoga Studio,Diner,Falafel Restaurant,Event Space,Ethiopian Restaurant,Electronics Store,Eastern European Restaurant
10,Downtown Toronto,4,Park,Playground,Trail,Yoga Studio,Dessert Shop,Event Space,Ethiopian Restaurant,Electronics Store,Eastern European Restaurant,Dumpling Restaurant
23,Central Toronto,4,Park,Jewelry Store,Trail,Sushi Restaurant,Yoga Studio,Dim Sum Restaurant,Event Space,Ethiopian Restaurant,Electronics Store,Eastern European Restaurant
