## Capstone Assignment 

## Part 1- Segmenting and clustering neighborhoods in Toronto

Explore and cluster the neighborhoods in Toronto. You are required to code and scrape the following Wikipedia page, https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M, obtain the data in the table of postal codes and transform the data into a pandas dataframe.

In [1]:
import pandas as pd
import numpy as np
import requests
from bs4 import BeautifulSoup
from pandas.io.json import json_normalize

#!conda install -c conda-forge geopy --yes 
from geopy.geocoders import Nominatim

#!conda install -c conda-forge folium=0.5.0 --yes
import folium

import matplotlib.pyplot as plt
import matplotlib.cm as cm
import matplotlib.colors as colors

from sklearn.cluster import KMeans 
from sklearn.datasets.samples_generator import make_blobs

In [2]:
url= requests.get('https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M').text
#df = pd.read_html(url)

soup = BeautifulSoup(url,'lxml')
match = soup.find('table', class_="wikitable sortable")
df = pd.read_html(str(match))
df = df[0]
df.shape

(287, 3)

The dataframe will consist of three columns: PostalCode, Borough, and Neighborhood

In [3]:
df.columns=['PostalCode', 'Borough', 'Neighborhood']
df

Unnamed: 0,PostalCode,Borough,Neighborhood
0,M1A,Not assigned,Not assigned
1,M2A,Not assigned,Not assigned
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,Harbourfront
5,M6A,North York,Lawrence Heights
6,M6A,North York,Lawrence Manor
7,M7A,Queen's Park,Not assigned
8,M8A,Not assigned,Not assigned
9,M9A,Queen's Park,Queen's Park


Only process the cells that have an assigned borough. Ignore cells with a borough that is 'Not assigned'.

In [4]:
#b= df[~df.Borough.isin(['Not assigned'])]
b = df[df.Borough != 'Not assigned']
b.head()

Unnamed: 0,PostalCode,Borough,Neighborhood
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,Harbourfront
5,M6A,North York,Lawrence Heights
6,M6A,North York,Lawrence Manor


More than one neighborhood can exist in one postal code area. For example, in the table on the Wikipedia page, you will notice that M5A is listed twice and has two neighborhoods: Harbourfront and Regent Park. These two rows will be combined into one row with the neighborhoods separated with a comma.

In [5]:
b = b.groupby(['PostalCode','Borough'])['Neighborhood'].apply(', '.join).reset_index()
b
#b = b.groupby(['PostalCode','Borough']).agg(lambda col:','.join(col))

Unnamed: 0,PostalCode,Borough,Neighborhood
0,M1B,Scarborough,"Rouge, Malvern"
1,M1C,Scarborough,"Highland Creek, Rouge Hill, Port Union"
2,M1E,Scarborough,"Guildwood, Morningside, West Hill"
3,M1G,Scarborough,Woburn
4,M1H,Scarborough,Cedarbrae
5,M1J,Scarborough,Scarborough Village
6,M1K,Scarborough,"East Birchmount Park, Ionview, Kennedy Park"
7,M1L,Scarborough,"Clairlea, Golden Mile, Oakridge"
8,M1M,Scarborough,"Cliffcrest, Cliffside, Scarborough Village West"
9,M1N,Scarborough,"Birch Cliff, Cliffside West"


In [6]:
# check M5A
b[b.PostalCode == 'M5A']

Unnamed: 0,PostalCode,Borough,Neighborhood
53,M5A,Downtown Toronto,Harbourfront


If a cell has a borough but a 'Not assigned' neighborhood, then the neighborhood will be the same as the borough. So for the 9th cell in the table on the Wikipedia page, the value of the Borough and the Neighborhood columns will be Queen's Park.

Clean your Notebook and add Markdown cells to explain your work and any assumptions you are making.

Use the .shape method to print the number of rows of your dataframe.

In [7]:
# assign neighborhood value to be the same as borough if the cell has a borough but neighborhood is 'Not assigned'.

for i, row in df.iterrows():
    if df.loc[i,'Neighborhood'] == 'Not assigned':
       df.loc[i,'Neighborhood'] = df.loc[i,'Borough']
    else:
        df.loc[i,'Neighborhood'] = df.loc[i,'Neighborhood']
        
df.head(10)

Unnamed: 0,PostalCode,Borough,Neighborhood
0,M1A,Not assigned,Not assigned
1,M2A,Not assigned,Not assigned
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,Harbourfront
5,M6A,North York,Lawrence Heights
6,M6A,North York,Lawrence Manor
7,M7A,Queen's Park,Queen's Park
8,M8A,Not assigned,Not assigned
9,M9A,Queen's Park,Queen's Park


9th cell values are both Queen's Park

In [8]:
print(b.shape)
print('Total no. of rows:', b.shape[0])

(103, 3)
Total no. of rows: 103


___

## Part 2- Merging latitude and longitude coordinates of each neighborhood

Create a Foursquare developer account. Make a call to get the latitude and longitude coordinates of a given postal code in each neighborhood using the Geocoder Python package or use this link http://cocl.us/Geospatial_data to access the csv file that has the geographical coordinates of each postal code. Create the dataframe.

In [9]:
url = 'http://cocl.us/Geospatial_data'
df1 = pd.read_csv(url)
df1.head()

Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


In [10]:
df1.columns=['PostalCode','Latitude','Longitude']
df = b.merge(df1, on = 'PostalCode')
df.head()

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude
0,M1B,Scarborough,"Rouge, Malvern",43.806686,-79.194353
1,M1C,Scarborough,"Highland Creek, Rouge Hill, Port Union",43.784535,-79.160497
2,M1E,Scarborough,"Guildwood, Morningside, West Hill",43.763573,-79.188711
3,M1G,Scarborough,Woburn,43.770992,-79.216917
4,M1H,Scarborough,Cedarbrae,43.773136,-79.239476


___

## Part 3- Explore and cluster the neighborhoods in Toronto



Explore and cluster the neighborhoods in Toronto. You can decide to work with only boroughs that contain the word Toronto and then replicate the same analysis we did to the New York City data. It is up to you. Generate maps to visualize your neighborhoods and how they cluster together. Report any observations you make.

<br>
<br>

Areas of Exploration:
1. Fetch the nearby Asian Restaurants in Toronto 
2. Fetch nearby places that sell coffee in Toronto
3. Find the top 10 venues in Toronto and cluster the neighborhoods according to the venue
4. Draw simple observations about generated clusters

In [11]:
# Code removed for sharing
CLIENT_ID = # Foursquare ID 
CLIENT_SECRET = # Foursquare Secret

 Find out how many boroughs there are in Toronto

In [12]:
df['Borough'].unique()

array(['Scarborough', 'North York', 'East York', 'East Toronto',
       'Central Toronto', 'Downtown Toronto', 'York', 'West Toronto',
       "Queen's Park", 'Mississauga', 'Etobicoke'], dtype=object)

Create a new dataframe for Toronto boroughs

In [13]:
Toronto = df[df['Borough'].str.contains("Toronto")==True].reset_index(drop=True)
print(Toronto.shape)
Toronto

(38, 5)


Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude
0,M4E,East Toronto,The Beaches,43.676357,-79.293031
1,M4K,East Toronto,"The Danforth West, Riverdale",43.679557,-79.352188
2,M4L,East Toronto,"The Beaches West, India Bazaar",43.668999,-79.315572
3,M4M,East Toronto,Studio District,43.659526,-79.340923
4,M4N,Central Toronto,Lawrence Park,43.72802,-79.38879
5,M4P,Central Toronto,Davisville North,43.712751,-79.390197
6,M4R,Central Toronto,North Toronto West,43.715383,-79.405678
7,M4S,Central Toronto,Davisville,43.704324,-79.38879
8,M4T,Central Toronto,"Moore Park, Summerhill East",43.689574,-79.38316
9,M4V,Central Toronto,"Deer Park, Forest Hill SE, Rathnelly, South Hi...",43.686412,-79.400049


In [14]:
#Define instance of geocoder 

address = 'Toronto, Canada'

geolocator = Nominatim(user_agent="foursquare_agent")
location = geolocator.geocode(address)
lat = location.latitude
lng = location.longitude

print(lat, lng)
print('The geograpical coordinate of Toronto are {}, {}.'.format(lat, lng))

43.653963 -79.387207
The geograpical coordinate of Toronto are 43.653963, -79.387207.


### Asian restaurants in Toronto

Let's search for Asian restaurants in Toronto within 500m radius using the 
Foursquare venue *categoryId*

Define the search query and send a get request. 

In [15]:
# define search query
version = '20191101'
categoryid = '4bf58dd8d48988d142941735'  # venues categoryId for Asian restaurants
radius = 500
limit = 15

#define the url
url = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&ll={},{}&v={}&categoryId={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, lat, lng, version, categoryid, radius, limit)

# send get request
results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '5dd6ada6006dce001bb95216'},
 'response': {'venues': [{'id': '5bf765b2c5b11c002c1c8fc6',
    'name': 'ZenQ',
    'location': {'address': '171 Dundas Street W',
     'crossStreet': 'Dundas & Centre',
     'lat': 43.654911,
     'lng': -79.387266,
     'labeledLatLngs': [{'label': 'display',
       'lat': 43.654911,
       'lng': -79.387266}],
     'distance': 105,
     'postalCode': 'M5G 1C8',
     'cc': 'CA',
     'city': 'Toronto',
     'state': 'ON',
     'country': 'Canada',
     'formattedAddress': ['171 Dundas Street W (Dundas & Centre)',
      'Toronto ON M5G 1C8',
      'Canada']},
    'categories': [{'id': '52e81612bcbc57f1066b7a0c',
      'name': 'Bubble Tea Shop',
      'pluralName': 'Bubble Tea Shops',
      'shortName': 'Bubble Tea',
      'icon': {'prefix': 'https://ss3.4sqi.net/img/categories_v2/food/bubble_',
       'suffix': '.png'},
      'primary': True}],
    'referralId': 'v-1574350271',
    'hasPerk': False},
   {'id': '5d782433f

In [16]:
# transform json results into pandas dataframe
venues = results['response']['venues']
venues = json_normalize(venues)

# filter dataframe
filter = ['name','categories'] + [col for col in venues.columns if col.startswith('location.')] + ['id']
venues_filtered = venues.loc[:, filter]

# create function to return categories name
def get_category(row):
    try:
        cat_list = row['categories']
    except: 
        cat_list = row['venue.categories']
    if len(cat_list) == 0:
        return None
    else:
        return cat_list[0]['name']

# apply function
venues_filtered['categories'] = venues_filtered.apply(get_category, axis =1)

# rename column headers
venues_filtered.columns = [column.split('.')[-1] for column in venues_filtered.columns]
venues_filtered = venues_filtered.drop(columns=['formattedAddress','labeledLatLngs'], axis = 0)
venues_filtered

Unnamed: 0,name,categories,address,cc,city,country,crossStreet,distance,lat,lng,neighborhood,postalCode,state,id
0,ZenQ,Bubble Tea Shop,171 Dundas Street W,CA,Toronto,Canada,Dundas & Centre,105,43.654911,-79.387266,,M5G 1C8,ON,5bf765b2c5b11c002c1c8fc6
1,Gyubee Japanese Grill,Japanese Restaurant,157 Dundas St W,CA,Toronto,Canada,,239,43.655363,-79.384955,,M5B 1E4,ON,5d782433f6e3190008b1a914
2,Rolltation,Japanese Restaurant,207 Dundas St W,CA,Toronto,Canada,at University Ave,107,43.654918,-79.387424,,M5G 1C8,ON,5773f01f498e98371390bdfd
3,Gyugyuya,Japanese Restaurant,177 Dundas St W,CA,Toronto,Canada,,149,43.655174,-79.386416,,M5G 1C7,ON,5310c76611d2c1b4531ff3cc
4,Sansotei Ramen 三草亭,Ramen Restaurant,179 Dundas St. W,CA,Toronto,Canada,btwn Centre Ave. & Chestnut St.,144,43.655157,-79.386501,,M5G 1Z8,ON,504bbf2ce4b0168121235cbe
5,Hong Shing Chinese Restaurant,Chinese Restaurant,195 Dundas St W,CA,Toronto,Canada,at University Ave,107,43.654925,-79.387089,,M5G 1C7,ON,4b2027b5f964a520f82d24e3
6,Kimchi Korea House,Korean Restaurant,149 Dundas St. W,CA,Toronto,Canada,btwn Chestnut & Elizabeth,214,43.655392,-79.385412,,M5G 1C6,ON,50535800e4b0c6b6851ee5fc
7,Konjiki Ramen,Noodle House,41 Elm Street,CA,Toronto,Canada,Bay Street,480,43.657449,-79.38368,,M5G 1H1,ON,5cd9e1ea9cadd9002b001a3e
8,Koh Lipe,Thai Restaurant,35 Baldwin Street,CA,Toronto,Canada,,550,43.655933,-79.39348,,M5T 1L1,ON,5cb0d23175dcb7002cb5e3ad
9,Manpuku まんぷく,Japanese Restaurant,105 McCaul St. Unit 29-31,CA,Toronto,Canada,at Dundas St. W.,277,43.653612,-79.390613,,M5T 2X4,ON,4ad9f607f964a520691c21e3


List of Asian restaurants as shown in Toronto within 500m radius. Noticed that Bubble Tea is an asian drink stall but most likely do not operate like a restaurant, appeared in the search results.

### Map: Asian Restaurants in Toronto 

In [17]:
# generate map centred around 
venues_map = folium.Map(location=[lat, lng], zoom_start=14)


# add as blue circle markers
for lat, lng, label in zip(venues_filtered.lat, venues_filtered.lng, venues_filtered.categories):
    folium.features.CircleMarker(
        [lat, lng],
        radius=5,
        color='blue',
        popup=label,
        fill = True,
        fill_color='blue',
        fill_opacity=0.6
    ).add_to(venues_map)

# display map
venues_map

### Places that sell coffee in Toronto

Let's search for places that sell coffee within 200m radius in Toronto

In [18]:
# define search query
radius = 200
limit = 15
query = 'coffee'
url = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&ll={},{}&v={}&query={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, lat, lng, version, query, radius, limit)

results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '5dd6adb869babe001c398041'},
 'response': {'venues': [{'id': '59f784dd28122f14f9d5d63d',
    'name': 'HotBlack Coffee',
    'location': {'address': '245 Queen Street West',
     'crossStreet': 'at St Patrick St',
     'lat': 43.65036434800487,
     'lng': -79.38866907575726,
     'labeledLatLngs': [{'label': 'display',
       'lat': 43.65036434800487,
       'lng': -79.38866907575726}],
     'distance': 226,
     'postalCode': 'M5V 1Z4',
     'cc': 'CA',
     'neighborhood': 'Entertainment District',
     'city': 'Toronto',
     'state': 'ON',
     'country': 'Canada',
     'formattedAddress': ['245 Queen Street West (at St Patrick St)',
      'Toronto ON M5V 1Z4',
      'Canada']},
    'categories': [{'id': '4bf58dd8d48988d1e0931735',
      'name': 'Coffee Shop',
      'pluralName': 'Coffee Shops',
      'shortName': 'Coffee Shop',
      'icon': {'prefix': 'https://ss3.4sqi.net/img/categories_v2/food/coffeeshop_',
       'suffix': '.png'},
      'pr

In [19]:
# transform json results into pandas dataframe
venues = results['response']['venues']
venues = json_normalize(venues)
venues.head()

# filter dataframe

filter = ['name','categories'] + [col for col in venues.columns if col.startswith('location.')] + ['id']
venues_filtered = venues.loc[:, filter]
venues_filtered

# apply function
venues_filtered['categories'] = venues_filtered.apply(get_category, axis =1)


# rename column headers
venues_filtered.columns = [column.split('.')[-1] for column in venues_filtered.columns]
venues_filtered = venues_filtered.drop(columns=['formattedAddress','labeledLatLngs'], axis = 0)
venues_filtered

Unnamed: 0,name,categories,address,cc,city,country,crossStreet,distance,lat,lng,neighborhood,postalCode,state,id
0,HotBlack Coffee,Coffee Shop,245 Queen Street West,CA,Toronto,Canada,at St Patrick St,226,43.650364,-79.388669,Entertainment District,M5V 1Z4,ON,59f784dd28122f14f9d5d63d
1,Sam James Coffee Bar (SJCB),Café,150 King St. W,CA,Toronto,Canada,in the PATH,224,43.647881,-79.384332,,M5H 4B6,ON,4fccaa8fe4b05a98df3d9417
2,Coffee office,,350 Bay St - 7th Floor,CA,Toronto,Canada,,25,43.649498,-79.386479,,,ON,4baa31def964a52037523ae3
3,"Coffee, Oysters, Champagne",Lounge,214 King Street West,CA,Toronto,Canada,,232,43.647309,-79.38673,,M5H 3S6,ON,5c01d5553183940025479371
4,Bulldog Coffee,Café,111 Richmond St W,CA,Toronto,Canada,York,219,43.650319,-79.383831,,M5H 2G4,ON,5a8b8484c0cacb23056b8412
5,Fahrenheit Coffee,Coffee Shop,111 Richmond St,CA,Toronto,Canada,,226,43.650361,-79.383767,,M5H 2G4,ON,5da78e6c4e6c340008c76076
6,Timothy's World Coffee,Coffee Shop,150 York Street,CA,Toronto,Canada,Adelaide,164,43.649243,-79.384181,,,ON,4bf424fbe5eba593884e1f90
7,google coffee bar,Corporate Coffee Shop,,CA,,Canada,,224,43.650452,-79.383864,,,,58e2860dd772f9415f435032
8,Starbucks,Coffee Shop,"180 Queen St W,Suite 102.3A",CA,Toronto,Canada,at Simcoe St.,213,43.650751,-79.388047,,M5V 3X3,ON,4ae60299f964a52003a421e3


There are 9 shops selling coffee in Toronto within 200m radius (yay i love coffee!)

### Map: Places that sell coffee in Toronto

In [20]:
# generate map centred around 
venues_map = folium.Map(location=[lat, lng], zoom_start=15)


# add as blue circle markers
for lat, lng, label in zip(venues_filtered.lat, venues_filtered.lng, venues_filtered.categories):
    folium.features.CircleMarker(
        [lat, lng],
        radius=5,
        color='orange',
        popup=label,
        fill = True,
        fill_color='orange',
        fill_opacity=0.6
    ).add_to(venues_map)

# display map
venues_map

### Number of unique categories in Toronto

In [22]:
limit = 100

def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            version, 
            lat, 
            lng, 
            radius, 
            limit)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

# run the above function on each neighborhood and create a new df 
Toronto_venues = getNearbyVenues(names=Toronto['Neighborhood'],
                                   latitudes=Toronto['Latitude'],
                                   longitudes=Toronto['Longitude']
                                  )
print('Complete')

The Beaches
The Danforth West, Riverdale
The Beaches West, India Bazaar
Studio District
Lawrence Park
Davisville North
North Toronto West
Davisville
Moore Park, Summerhill East
Deer Park, Forest Hill SE, Rathnelly, South Hill, Summerhill West
Rosedale
Cabbagetown, St. James Town
Church and Wellesley
Harbourfront
Ryerson, Garden District
St. James Town
Berczy Park
Central Bay Street
Adelaide, King, Richmond
Harbourfront East, Toronto Islands, Union Station
Design Exchange, Toronto Dominion Centre
Commerce Court, Victoria Hotel
Roselawn
Forest Hill North, Forest Hill West
The Annex, North Midtown, Yorkville
Harbord, University of Toronto
Chinatown, Grange Park, Kensington Market
CN Tower, Bathurst Quay, Island airport, Harbourfront West, King and Spadina, Railway Lands, South Niagara
Stn A PO Boxes 25 The Esplanade
First Canadian Place, Underground city
Christie
Dovercourt Village, Dufferin
Little Portugal, Trinity
Brockton, Exhibition Place, Parkdale Village
High Park, The Junction Sout

In [23]:
print (Toronto_venues.shape)
print('There are {} unique categories:'.format(len(Toronto_venues['Venue Category'].unique())))
Toronto_venues.head()

(1705, 7)
There are 234 unique categories:


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,The Beaches,43.676357,-79.293031,Glen Manor Ravine,43.676821,-79.293942,Trail
1,The Beaches,43.676357,-79.293031,The Big Carrot Natural Food Market,43.678879,-79.297734,Health Food Store
2,The Beaches,43.676357,-79.293031,Grover Pub and Grub,43.679181,-79.297215,Pub
3,The Beaches,43.676357,-79.293031,Upper Beaches,43.680563,-79.292869,Neighborhood
4,"The Danforth West, Riverdale",43.679557,-79.352188,Pantheon,43.677621,-79.351434,Greek Restaurant


In [24]:
Toronto_venues.groupby('Neighborhood').count()

Unnamed: 0_level_0,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
"Adelaide, King, Richmond",100,100,100,100,100,100
Berczy Park,57,57,57,57,57,57
"Brockton, Exhibition Place, Parkdale Village",21,21,21,21,21,21
Business Reply Mail Processing Centre 969 Eastern,18,18,18,18,18,18
"CN Tower, Bathurst Quay, Island airport, Harbourfront West, King and Spadina, Railway Lands, South Niagara",16,16,16,16,16,16
"Cabbagetown, St. James Town",43,43,43,43,43,43
Central Bay Street,82,82,82,82,82,82
"Chinatown, Grange Park, Kensington Market",96,96,96,96,96,96
Christie,17,17,17,17,17,17
Church and Wellesley,89,89,89,89,89,89


### One hot encoding

In [25]:
# one hot encoding
Toronto_onehot = pd.get_dummies(Toronto_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
Toronto_onehot['Neighborhood'] = Toronto_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [Toronto_onehot.columns[-1]] + list(Toronto_onehot.columns[:-1])
Toronto_onehot = Toronto_onehot[fixed_columns]


print(Toronto_onehot.shape)
Toronto_onehot.head()

(1705, 234)


Unnamed: 0,Yoga Studio,Afghan Restaurant,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Antique Shop,...,Theme Restaurant,Toy / Game Store,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Wine Bar,Wine Shop,Wings Joint
0,0,0,0,0,0,0,0,0,0,0,...,0,0,1,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


Take the mean of the frequency of occurance for each category and group rows by neighborhood

In [26]:
Toronto_grouped = Toronto_onehot.groupby('Neighborhood').mean().reset_index()

print(Toronto_grouped.shape)
Toronto_grouped

(38, 234)


Unnamed: 0,Neighborhood,Yoga Studio,Afghan Restaurant,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,...,Theme Restaurant,Toy / Game Store,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Wine Bar,Wine Shop,Wings Joint
0,"Adelaide, King, Richmond",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03,...,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.01,0.0,0.0
1,Berczy Park,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.017544,0.0,0.0,0.0,0.0,0.0
2,"Brockton, Exhibition Place, Parkdale Village",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Business Reply Mail Processing Centre 969 Eastern,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,"CN Tower, Bathurst Quay, Island airport, Harbo...",0.0,0.0,0.0625,0.0625,0.0625,0.125,0.1875,0.0625,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,"Cabbagetown, St. James Town",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
6,Central Bay Street,0.012195,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.012195,...,0.0,0.0,0.0,0.0,0.012195,0.0,0.0,0.012195,0.0,0.0
7,"Chinatown, Grange Park, Kensington Market",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.041667,0.0,0.052083,0.010417,0.0,0.0
8,Christie,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9,Church and Wellesley,0.011236,0.011236,0.0,0.0,0.0,0.0,0.0,0.0,0.011236,...,0.011236,0.0,0.0,0.0,0.0,0.011236,0.011236,0.0,0.011236,0.011236


### Top 10 common venues in each neighborhood

In [27]:
num_top_venues = 10

for hood in Toronto_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = Toronto_grouped[Toronto_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Adelaide, King, Richmond----
                 venue  freq
0          Coffee Shop  0.07
1                 Café  0.05
2           Steakhouse  0.04
3                  Bar  0.04
4       Breakfast Spot  0.03
5               Bakery  0.03
6       Cosmetics Shop  0.03
7           Restaurant  0.03
8  American Restaurant  0.03
9                Hotel  0.03


----Berczy Park----
                venue  freq
0         Coffee Shop  0.07
1              Bakery  0.05
2  Seafood Restaurant  0.04
3         Cheese Shop  0.04
4      Farmers Market  0.04
5          Steakhouse  0.04
6                Café  0.04
7            Beer Bar  0.04
8        Cocktail Bar  0.04
9              Museum  0.02


----Brockton, Exhibition Place, Parkdale Village----
                  venue  freq
0                  Café  0.10
1        Breakfast Spot  0.10
2           Coffee Shop  0.10
3          Intersection  0.05
4  Caribbean Restaurant  0.05
5                   Bar  0.05
6          Climbing Gym  0.05
7                Bakery

9        Mexican Restaurant  0.00


----North Toronto West----
                 venue  freq
0       Clothing Store  0.14
1          Coffee Shop  0.10
2  Sporting Goods Shop  0.10
3          Yoga Studio  0.05
4                 Café  0.05
5   Salon / Barbershop  0.05
6   Chinese Restaurant  0.05
7   Mexican Restaurant  0.05
8           Shoe Store  0.05
9           Restaurant  0.05


----Parkdale, Roncesvalles----
                         venue  freq
0                  Coffee Shop  0.13
1                    Gift Shop  0.13
2               Breakfast Spot  0.07
3  Eastern European Restaurant  0.07
4                      Dog Run  0.07
5                Movie Theater  0.07
6           Italian Restaurant  0.07
7                   Restaurant  0.07
8                          Bar  0.07
9                         Bank  0.07


----Rosedale----
                       venue  freq
0                       Park  0.50
1                 Playground  0.25
2                      Trail  0.25
3                  

Create a new dataframe that shows the top 10 venues for each neighborhood

In [28]:
# put into dataframe and sort venues in descending order
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]


# create new df that displays the top 10 venues for each neighborhood
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = Toronto_grouped['Neighborhood']

for ind in np.arange(Toronto_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(Toronto_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,"Adelaide, King, Richmond",Coffee Shop,Café,Bar,Steakhouse,Hotel,Asian Restaurant,Cosmetics Shop,Restaurant,Breakfast Spot,Thai Restaurant
1,Berczy Park,Coffee Shop,Bakery,Farmers Market,Beer Bar,Steakhouse,Cocktail Bar,Seafood Restaurant,Cheese Shop,Café,Restaurant
2,"Brockton, Exhibition Place, Parkdale Village",Coffee Shop,Café,Breakfast Spot,Grocery Store,Bakery,Performing Arts Venue,Pet Store,Climbing Gym,Caribbean Restaurant,Restaurant
3,Business Reply Mail Processing Centre 969 Eastern,Light Rail Station,Skate Park,Garden,Recording Studio,Burrito Place,Fast Food Restaurant,Auto Workshop,Farmers Market,Spa,Brewery
4,"CN Tower, Bathurst Quay, Island airport, Harbo...",Airport Service,Airport Lounge,Harbor / Marina,Bar,Coffee Shop,Sculpture Garden,Boutique,Boat or Ferry,Plane,Airport


### Clustering Neighborhoods using K Means 

Run k-means to cluster the neighborhood into 3 clusters

In [29]:
# set number of clusters
kclusters = 3

Toronto_grouped_clustering = Toronto_grouped.drop('Neighborhood', 1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(Toronto_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:40] 

array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1,
       0, 1, 0, 1, 1, 0, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1])

In [30]:
# new df that contains cluster and top 10 venues for each neighborhood

# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

Toronto_merged = Toronto

# merge toronto_grouped with neighborhoods_venues_sorted to add latitude/longitude for each neighborhood
Toronto_merged = Toronto_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

Toronto_merged.head()

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M4E,East Toronto,The Beaches,43.676357,-79.293031,1,Pub,Health Food Store,Trail,Wings Joint,Dim Sum Restaurant,Diner,Discount Store,Dog Run,Doner Restaurant,Dumpling Restaurant
1,M4K,East Toronto,"The Danforth West, Riverdale",43.679557,-79.352188,1,Greek Restaurant,Coffee Shop,Ice Cream Shop,Italian Restaurant,Furniture / Home Store,Liquor Store,Indian Restaurant,Spa,Bookstore,Brewery
2,M4L,East Toronto,"The Beaches West, India Bazaar",43.668999,-79.315572,1,Sandwich Place,Park,Brewery,Sushi Restaurant,Steakhouse,Fish & Chips Shop,Italian Restaurant,Fast Food Restaurant,Liquor Store,Pet Store
3,M4M,East Toronto,Studio District,43.659526,-79.340923,1,Café,Coffee Shop,Italian Restaurant,American Restaurant,Bakery,Park,Seafood Restaurant,Bar,Stationery Store,Coworking Space
4,M4N,Central Toronto,Lawrence Park,43.72802,-79.38879,0,Bus Line,Park,Swim School,Wings Joint,Event Space,Ethiopian Restaurant,Electronics Store,Eastern European Restaurant,Dumpling Restaurant,Donut Shop


### Visualize Clusters

In [31]:
# create map
map_clusters = folium.Map(location=[lat, lng], zoom_start=12)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(Toronto_merged['Latitude'], Toronto_merged['Longitude'], Toronto_merged['Neighborhood'], Toronto_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters


### Examining Clusters

In [32]:
# cluster 1
Toronto_merged.loc[Toronto_merged['Cluster Labels'] == 0, Toronto_merged.columns[[1] + list(range(5, Toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
4,Central Toronto,0,Bus Line,Park,Swim School,Wings Joint,Event Space,Ethiopian Restaurant,Electronics Store,Eastern European Restaurant,Dumpling Restaurant,Donut Shop
8,Central Toronto,0,Park,Playground,Trail,Tennis Court,Wings Joint,Doner Restaurant,Dim Sum Restaurant,Diner,Discount Store,Dog Run
10,Downtown Toronto,0,Park,Playground,Trail,Wings Joint,Department Store,Ethiopian Restaurant,Electronics Store,Eastern European Restaurant,Dumpling Restaurant,Donut Shop
23,Central Toronto,0,Park,Trail,Jewelry Store,Sushi Restaurant,Wings Joint,Dessert Shop,Ethiopian Restaurant,Electronics Store,Eastern European Restaurant,Dumpling Restaurant


In [33]:
# Cluster 2
Toronto_merged.loc[Toronto_merged['Cluster Labels'] == 1, Toronto_merged.columns[[1] + list(range(5, Toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,East Toronto,1,Pub,Health Food Store,Trail,Wings Joint,Dim Sum Restaurant,Diner,Discount Store,Dog Run,Doner Restaurant,Dumpling Restaurant
1,East Toronto,1,Greek Restaurant,Coffee Shop,Ice Cream Shop,Italian Restaurant,Furniture / Home Store,Liquor Store,Indian Restaurant,Spa,Bookstore,Brewery
2,East Toronto,1,Sandwich Place,Park,Brewery,Sushi Restaurant,Steakhouse,Fish & Chips Shop,Italian Restaurant,Fast Food Restaurant,Liquor Store,Pet Store
3,East Toronto,1,Café,Coffee Shop,Italian Restaurant,American Restaurant,Bakery,Park,Seafood Restaurant,Bar,Stationery Store,Coworking Space
5,Central Toronto,1,Hotel,Gym,Park,Sandwich Place,Breakfast Spot,Clothing Store,Food & Drink Shop,Wings Joint,Discount Store,Dog Run
6,Central Toronto,1,Clothing Store,Sporting Goods Shop,Coffee Shop,Yoga Studio,Rental Car Location,Shoe Store,Spa,Diner,Salon / Barbershop,Burger Joint
7,Central Toronto,1,Pizza Place,Gym,Dessert Shop,Sandwich Place,Sushi Restaurant,Coffee Shop,Italian Restaurant,Café,Seafood Restaurant,Skating Rink
9,Central Toronto,1,Pub,Coffee Shop,Fried Chicken Joint,Liquor Store,Restaurant,Sports Bar,Bagel Shop,Supermarket,Sushi Restaurant,Athletics & Sports
11,Downtown Toronto,1,Coffee Shop,Restaurant,Park,Pizza Place,Bakery,Italian Restaurant,Café,Pub,Market,Pet Store
12,Downtown Toronto,1,Coffee Shop,Japanese Restaurant,Sushi Restaurant,Gay Bar,Restaurant,Café,Italian Restaurant,Mediterranean Restaurant,Men's Store,Gym


In [34]:
# Cluster 3
Toronto_merged.loc[Toronto_merged['Cluster Labels'] == 2, Toronto_merged.columns[[1] + list(range(5, Toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
22,Central Toronto,2,Home Service,Garden,Wings Joint,Dessert Shop,Falafel Restaurant,Event Space,Ethiopian Restaurant,Electronics Store,Eastern European Restaurant,Dumpling Restaurant
