# Segmenting and Clustering Neighborhoods in Toronto

In this analysis,I explored how to convert addresses into their equivalent latitude and longitude values. 
Also,learned how to use the Foursquare API to explore neighborhoods in Toronto City.I used the explore 
function to get the most common venue categories in each neighborhood, and then used this feature to 
group the neighborhoods into clusters.I used the k-means clustering algorithm to complete this task. 
Also used the Folium library to visualize the neighborhoods and their emerging clusters.

Table of Contents

* Download and Explore Dataset
* Explore Neighborhoods in Toronto ,Canada
* Analyze Each Neighborhood
* Cluster Neighborhoods
* Examine Clusters

In [3]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analsysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

!conda install -c conda-forge geopy --yes # uncomment this line if you haven't completed the Foursquare API lab
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas.io.json import json_normalize # tranform JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't completed the Foursquare API lab
import folium # map rendering library

print('Libraries imported.')

Solving environment: done

# All requested packages already installed.

Solving environment: done

# All requested packages already installed.

Libraries imported.


## 1. Download and Explore Dataset

In [8]:
from bs4 import BeautifulSoup


Scrap List of postal codes of Canada wiki page content by using BeautifulSoup

In [11]:
# download url data from internet
url = "https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M"
source = requests.get(url).text
Canada_data = BeautifulSoup(source, 'lxml')

Convert content of PostalCode HTML table as dataframe

In [13]:
# creat a new Dataframe
column_names = ['Postalcode','Borough','Neighborhood']
toronto = pd.DataFrame(columns = column_names)

# loop through to find postcode, borough, neighborhood 
content = Canada_data.find('div', class_='mw-parser-output')
table = content.table.tbody
postcode = 0
borough = 0
neighborhood = 0

for tr in table.find_all('tr'):
    i = 0
    for td in tr.find_all('td'):
        if i == 0:
            postcode = td.text
            i = i + 1
        elif i == 1:
            borough = td.text
            i = i + 1
        elif i == 2: 
            neighborhood = td.text.strip('\n').replace(']','')
    toronto = toronto.append({'Postalcode': postcode,'Borough': borough,'Neighborhood': neighborhood},ignore_index=True)

# clean dataframe 
toronto = toronto[toronto.Borough!='Not assigned']
toronto = toronto[toronto.Borough!= 0]
toronto.reset_index(drop = True, inplace = True)
i = 0
for i in range(0,toronto.shape[0]):
    if toronto.iloc[i][2] == 'Not assigned':
        toronto.iloc[i][2] = toronto.iloc[i][1]
        i = i+1
                                 
df = toronto.groupby(['Postalcode','Borough'])['Neighborhood'].apply(', '.join).reset_index()

In [14]:
df

Unnamed: 0,Postalcode,Borough,Neighborhood
0,M1B,Scarborough,"Rouge, Malvern"
1,M1C,Scarborough,"Highland Creek, Rouge Hill, Port Union"
2,M1E,Scarborough,"Guildwood, Morningside, West Hill"
3,M1G,Scarborough,Woburn
4,M1H,Scarborough,Cedarbrae
5,M1J,Scarborough,Scarborough Village
6,M1K,Scarborough,"East Birchmount Park, Ionview, Kennedy Park"
7,M1L,Scarborough,"Clairlea, Golden Mile, Oakridge"
8,M1M,Scarborough,"Cliffcrest, Cliffside, Scarborough Village West"
9,M1N,Scarborough,"Birch Cliff, Cliffside West"


Drop "None" rows in DataFrame.
Drop any row which contains 'Not assigned' value.
All "Not assigned" will be replace to 'NaN' using numpy.

In [16]:
df = df.dropna()
empty = 'Not assigned'
df = df[(df.Postalcode != empty ) & (df.Borough != empty) & (df.Neighborhood != empty)]

In [18]:
df.head()

Unnamed: 0,Postalcode,Borough,Neighborhood
0,M1B,Scarborough,"Rouge, Malvern"
1,M1C,Scarborough,"Highland Creek, Rouge Hill, Port Union"
2,M1E,Scarborough,"Guildwood, Morningside, West Hill"
3,M1G,Scarborough,Woburn
4,M1H,Scarborough,Cedarbrae


In [19]:
def neighborhood_list(grouped):    
    return ', '.join(sorted(grouped['Neighborhood'].tolist()))
                    
grp = df.groupby(['Postalcode', 'Borough'])
df2 = grp.apply(neighborhood_list).reset_index(name='Neighborhood')

In [20]:
print(df2.shape)
df2.head()

(103, 3)


Unnamed: 0,Postalcode,Borough,Neighborhood
0,M1B,Scarborough,"Rouge, Malvern"
1,M1C,Scarborough,"Highland Creek, Rouge Hill, Port Union"
2,M1E,Scarborough,"Guildwood, Morningside, West Hill"
3,M1G,Scarborough,Woburn
4,M1H,Scarborough,Cedarbrae


In [52]:
print('The dataframe has {} boroughs and {} neighborhoods.'.format(
        len(df2['Borough'].unique()),
        df2.shape[0]
    )
)

The dataframe has 11 boroughs and 103 neighborhoods.


In [53]:
latitude=[] #List to collect the latitudes
longitude=[] #List to collect the longitudes

for i in df2['Postalcode']: #Iterating through Postalcodes to collect the locations data
    j='toronto,'+i
    try:
        url ="https://maps.googleapis.com/maps/api/geocode/json?key={}&address={}".format(API_key,j)
        
        response = requests.get(url).json() # get response
        geographical_data = response['results'][0]['geometry']['location'] # get geographical coordinates
        
        latitude.append(geographical_data['lat'])
        longitude.append(geographical_data['lng'])
    except:
        pass

In [58]:
!pip install geocoder

Collecting geocoder
[?25l  Downloading https://files.pythonhosted.org/packages/4f/6b/13166c909ad2f2d76b929a4227c952630ebaf0d729f6317eb09cbceccbab/geocoder-1.38.1-py2.py3-none-any.whl (98kB)
[K    100% |████████████████████████████████| 102kB 15.8MB/s 
[?25hCollecting ratelim (from geocoder)
  Downloading https://files.pythonhosted.org/packages/f2/98/7e6d147fd16a10a5f821db6e25f192265d6ecca3d82957a4fdd592cad49c/ratelim-0.1.6-py2.py3-none-any.whl
Installing collected packages: ratelim, geocoder
Successfully installed geocoder-1.38.1 ratelim-0.1.6


In [64]:
import geocoder


In [65]:
def get_latlng(postal_code):
    # initialize your variable to None
    lat_lng_coords = None
    # loop until you get the coordinates
    while(lat_lng_coords is None):
        g = geocoder.arcgis('{}, Toronto, Ontario'.format(postal_code))
        lat_lng_coords = g.latlng
    return lat_lng_coords
    
get_latlng('M4G')

[43.70976500000006, -79.36379132299999]

In [78]:
postal_codes = df['Postalcode']    
coords = [ get_latlng(postal_code) for postal_code in postal_codes.tolist() ]

In [79]:
df_coords = pd.DataFrame(coords, columns=['Latitude', 'Longitude'])
df2['Latitude'] = df_coords['Latitude']
df2['Longitude'] = df_coords['Longitude']

In [81]:
df2.head(10)

Unnamed: 0,Postalcode,Borough,Neighborhood,Latitude,Longitude
0,M1B,Scarborough,"Rouge, Malvern",43.81165,-79.195561
1,M1C,Scarborough,"Highland Creek, Rouge Hill, Port Union",43.785605,-79.158701
2,M1E,Scarborough,"Guildwood, Morningside, West Hill",43.76569,-79.175299
3,M1G,Scarborough,Woburn,43.768216,-79.21761
4,M1H,Scarborough,Cedarbrae,43.769608,-79.23944
5,M1J,Scarborough,Scarborough Village,43.743085,-79.232172
6,M1K,Scarborough,"East Birchmount Park, Ionview, Kennedy Park",43.72626,-79.26367
7,M1L,Scarborough,"Clairlea, Golden Mile, Oakridge",43.713213,-79.28491
8,M1M,Scarborough,"Cliffcrest, Cliffside, Scarborough Village West",43.723575,-79.234976
9,M1N,Scarborough,"Birch Cliff, Cliffside West",43.69669,-79.260069


In [63]:
address = 'Toronto, Ontario'

geolocator = Nominatim()
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geograpical coordinate of Toronto, Ontario are {}, {}.'.format(latitude, longitude))

The geograpical coordinate of Toronto, Ontario are 43.653963, -79.387207.


In [82]:
# create map of Toronto using latitude and longitude values
map_toronto = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, borough, neighborhood in zip(df2['Latitude'], df2['Longitude'], df2['Borough'], df2['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto)  
    
map_toronto

In [86]:
Scarborough_data = df2[df2['Borough'] == 'Scarborough'].reset_index(drop=True)
Scarborough_data


Unnamed: 0,Postalcode,Borough,Neighborhood,Latitude,Longitude
0,M1B,Scarborough,"Rouge, Malvern",43.81165,-79.195561
1,M1C,Scarborough,"Highland Creek, Rouge Hill, Port Union",43.785605,-79.158701
2,M1E,Scarborough,"Guildwood, Morningside, West Hill",43.76569,-79.175299
3,M1G,Scarborough,Woburn,43.768216,-79.21761
4,M1H,Scarborough,Cedarbrae,43.769608,-79.23944
5,M1J,Scarborough,Scarborough Village,43.743085,-79.232172
6,M1K,Scarborough,"East Birchmount Park, Ionview, Kennedy Park",43.72626,-79.26367
7,M1L,Scarborough,"Clairlea, Golden Mile, Oakridge",43.713213,-79.28491
8,M1M,Scarborough,"Cliffcrest, Cliffside, Scarborough Village West",43.723575,-79.234976
9,M1N,Scarborough,"Birch Cliff, Cliffside West",43.69669,-79.260069


In [87]:
address = 'Scarborough,Canada'

geolocator = Nominatim()
location = geolocator.geocode(address)
latitude_s = location.latitude
longitude_s = location.longitude
print('The geograpical coordinate of Scarborough are {}, {}.'.format(latitude_s, longitude_s))

The geograpical coordinate of Scarborough are 43.773077, -79.257774.


In [88]:
# create map of Scarborough using latitude and longitude values
map_scarborough = folium.Map(location=[latitude_s, longitude_s], zoom_start=10)

# add markers to map
for lat, lng, bor, nei in zip(Scarborough_data['Latitude'], Scarborough_data['Longitude'],Scarborough_data['Borough'], Scarborough_data['Neighborhood']):
    
    label = '{}, {}'.format(nei, bor)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_scarborough)  
    
map_scarborough

Exploring the first neighborhoood of Scarborough

In [89]:
#Name of the neighborhood
first_nei=Scarborough_data['Neighborhood'][0]
first_nei

'Rouge, Malvern'

In [90]:
#Location of the neighborhood
first_nei_lat=Scarborough_data.loc[0,'Latitude']
first_nei_lon=Scarborough_data.loc[0,'Longitude']
print('Latitude and longitude values of {} are {}, {}.'.format(first_nei, 
                                                               first_nei_lat, 
                                                               first_nei_lon))

Latitude and longitude values of Rouge, Malvern are 43.81165000000004, -79.19556138899998.


Now, let's get the top 100 venues that are in Malvern within a radius of 500 meters.

In [94]:
CLIENT_ID = 'HU5FWCR5TTHNPOTBIX5DN4AA5OGWV2RXJCUAEXEA5URYXYQF' # your Foursquare ID
CLIENT_SECRET = '5FSAEAC5ZZOR4VJTYUKEWIHOIYBZ1LHMF05CIOGOT23VIJRD' # your Foursquare Secret
VERSION = '20181216' # Foursquare API version


In [95]:
LIMIT = 100
radius = 500
url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&ll={},{}&v={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, first_nei_lat, first_nei_lon, VERSION, radius, LIMIT)

In [96]:
results = requests.get(url).json()

In [97]:
second_nei_name=Scarborough_data.loc[15,'Neighborhood']
second_nei_lat=Scarborough_data.loc[15,'Latitude']
second_nei_lon=Scarborough_data.loc[15,'Longitude']
print('Latitude and longitude values of {} are {}, {}.'.format(second_nei_name, 
                                                               second_nei_lat, 
                                                               second_nei_lon))

Latitude and longitude values of L'Amoreaux West, Steeles West are 43.80069800200005, -79.32073999999994.


In [98]:
radius = 500 
LIMIT = 100
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    second_nei_lat, 
    second_nei_lon, 
    radius, 
    LIMIT)

In [99]:
results = requests.get(url).json()

In [104]:
venues=results['response']['groups'][0]['items']
nearby_venues = json_normalize(venues)
nearby_venues.columns

Index(['reasons.count', 'reasons.items', 'referralId', 'venue.categories',
       'venue.id', 'venue.location.address', 'venue.location.cc',
       'venue.location.city', 'venue.location.country',
       'venue.location.crossStreet', 'venue.location.distance',
       'venue.location.formattedAddress', 'venue.location.labeledLatLngs',
       'venue.location.lat', 'venue.location.lng',
       'venue.location.neighborhood', 'venue.location.postalCode',
       'venue.location.state', 'venue.name', 'venue.photos.count',
       'venue.photos.groups'],
      dtype='object')

In [105]:
# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

In [106]:
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues =nearby_venues.loc[:, filtered_columns]

nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues




Unnamed: 0,name,categories,lat,lng
0,Mr Congee Chinese Cuisine 龍粥記,Chinese Restaurant,43.798879,-79.318335
1,Phoenix Restaurant 金鳳餐廳,Chinese Restaurant,43.798198,-79.318432
2,Price Chopper,Grocery Store,43.799445,-79.318563
3,Subway,Sandwich Place,43.798983,-79.318838
4,Shoppers Drug Mart,Pharmacy,43.79967,-79.319315
5,KFC,Fast Food Restaurant,43.798938,-79.318854
6,Tim Hortons,Coffee Shop,43.798281,-79.318317
7,Yamamoto Japanese Cuisine 山本盛世,Japanese Restaurant,43.798589,-79.318558
8,McDonald's,Fast Food Restaurant,43.79888,-79.318724
9,A Buck or Two,Thrift / Vintage Store,43.798286,-79.318485


# Explore Neighborhoods in Scarborough

In [107]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        venue_results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in venue_results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

In [108]:
scarborough_venues = getNearbyVenues(names=Scarborough_data['Neighborhood'],
                                   latitudes=Scarborough_data['Latitude'],
                                   longitudes=Scarborough_data['Longitude']
                                  )

Rouge, Malvern
Highland Creek, Rouge Hill, Port Union
Guildwood, Morningside, West Hill
Woburn
Cedarbrae
Scarborough Village
East Birchmount Park, Ionview, Kennedy Park
Clairlea, Golden Mile, Oakridge
Cliffcrest, Cliffside, Scarborough Village West
Birch Cliff, Cliffside West
Dorset Park, Scarborough Town Centre, Wexford Heights
Maryvale, Wexford
Agincourt
Clarks Corners, Sullivan, Tam O'Shanter
Agincourt North, L'Amoreaux East, Milliken, Steeles East
L'Amoreaux West, Steeles West
Upper Rouge


In [109]:
print(scarborough_venues.shape)
scarborough_venues.head()

(91, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,"Highland Creek, Rouge Hill, Port Union",43.785605,-79.158701,Scarborough Historical Society,43.788755,-79.162438,History Museum
1,"Highland Creek, Rouge Hill, Port Union",43.785605,-79.158701,Royal Canadian Legion,43.782533,-79.163085,Bar
2,"Guildwood, Morningside, West Hill",43.76569,-79.175299,The Strawberry Patch,43.764738,-79.173081,Tea Room
3,"Guildwood, Morningside, West Hill",43.76569,-79.175299,Homestead Roofing Repair,43.76514,-79.178663,Construction & Landscaping
4,"Guildwood, Morningside, West Hill",43.76569,-79.175299,Heron Park Community Centre,43.768867,-79.176958,Gym / Fitness Center


In [110]:
scarborough_venues.groupby('Neighborhood').count()

Unnamed: 0_level_0,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Agincourt,12,12,12,12,12,12
"Agincourt North, L'Amoreaux East, Milliken, Steeles East",2,2,2,2,2,2
"Birch Cliff, Cliffside West",6,6,6,6,6,6
Cedarbrae,2,2,2,2,2,2
"Clairlea, Golden Mile, Oakridge",10,10,10,10,10,10
"Clarks Corners, Sullivan, Tam O'Shanter",11,11,11,11,11,11
"Cliffcrest, Cliffside, Scarborough Village West",11,11,11,11,11,11
"Dorset Park, Scarborough Town Centre, Wexford Heights",4,4,4,4,4,4
"East Birchmount Park, Ionview, Kennedy Park",4,4,4,4,4,4
"Guildwood, Morningside, West Hill",4,4,4,4,4,4


In [111]:
print('There are {} uniques categories.'.format(len(scarborough_venues['Venue Category'].unique())))

There are 57 uniques categories.


# Analyze Each Neighborhood

In [112]:
# one hot encoding
scarborough_onehot = pd.get_dummies(scarborough_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
scarborough_onehot['Neighborhood'] = scarborough_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [scarborough_onehot.columns[-1]] + list(scarborough_onehot.columns[:-1])
scarborough_onehot = scarborough_onehot[fixed_columns]
scarborough_onehot.head()

Unnamed: 0,Neighborhood,Auto Garage,Automotive Shop,Badminton Court,Bakery,Bar,Brewery,Bubble Tea Shop,Burger Joint,Bus Line,Bus Station,Bus Stop,Business Service,Chinese Restaurant,Coffee Shop,College Stadium,Construction & Landscaping,Department Store,Discount Store,Fast Food Restaurant,Fried Chicken Joint,Furniture / Home Store,General Entertainment,Gift Shop,Golf Course,Grocery Store,Gym,Gym / Fitness Center,Gym Pool,History Museum,Hobby Shop,Hong Kong Restaurant,Indian Restaurant,Intersection,Japanese Restaurant,Liquor Store,Metro Station,Noodle House,Other Great Outdoors,Park,Pharmacy,Pizza Place,Playground,Pool,Restaurant,Sandwich Place,Shopping Mall,Skating Rink,Soccer Field,Supermarket,Sushi Restaurant,Tea Room,Thai Restaurant,Thrift / Vintage Store,Trail,Train Station,Vietnamese Restaurant,Wings Joint
0,"Highland Creek, Rouge Hill, Port Union",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,"Highland Creek, Rouge Hill, Port Union",0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,"Guildwood, Morningside, West Hill",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0
3,"Guildwood, Morningside, West Hill",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,"Guildwood, Morningside, West Hill",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


In [113]:
scarborough_onehot.shape

(91, 58)

In [114]:
scarborough_grouped = scarborough_onehot.groupby('Neighborhood').mean().reset_index()
scarborough_grouped

Unnamed: 0,Neighborhood,Auto Garage,Automotive Shop,Badminton Court,Bakery,Bar,Brewery,Bubble Tea Shop,Burger Joint,Bus Line,Bus Station,Bus Stop,Business Service,Chinese Restaurant,Coffee Shop,College Stadium,Construction & Landscaping,Department Store,Discount Store,Fast Food Restaurant,Fried Chicken Joint,Furniture / Home Store,General Entertainment,Gift Shop,Golf Course,Grocery Store,Gym,Gym / Fitness Center,Gym Pool,History Museum,Hobby Shop,Hong Kong Restaurant,Indian Restaurant,Intersection,Japanese Restaurant,Liquor Store,Metro Station,Noodle House,Other Great Outdoors,Park,Pharmacy,Pizza Place,Playground,Pool,Restaurant,Sandwich Place,Shopping Mall,Skating Rink,Soccer Field,Supermarket,Sushi Restaurant,Tea Room,Thai Restaurant,Thrift / Vintage Store,Trail,Train Station,Vietnamese Restaurant,Wings Joint
0,Agincourt,0.0,0.0,0.083333,0.083333,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.083333,0.0,0.0,0.083333,0.083333,0.0,0.083333,0.083333,0.0,0.0,0.0,0.0,0.0,0.083333,0.0
1,"Agincourt North, L'Amoreaux East, Milliken, St...",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,"Birch Cliff, Cliffside West",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.166667,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Cedarbrae,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0
4,"Clairlea, Golden Mile, Oakridge",0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.2,0.1,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,"Clarks Corners, Sullivan, Tam O'Shanter",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.181818,0.181818,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.0
6,"Cliffcrest, Cliffside, Scarborough Village West",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.090909,0.181818,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.090909,0.090909,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.090909
7,"Dorset Park, Scarborough Town Centre, Wexford ...",0.0,0.25,0.0,0.25,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
8,"East Birchmount Park, Ionview, Kennedy Park",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.25,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9,"Guildwood, Morningside, West Hill",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0


In [115]:
scarborough_grouped.shape

(15, 58)

In [116]:
num_top_venues = 5
for hood in scarborough_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp =scarborough_grouped[scarborough_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Agincourt----
              venue  freq
0      Skating Rink  0.08
1  Sushi Restaurant  0.08
2              Park  0.08
3              Pool  0.08
4     Shopping Mall  0.08


----Agincourt North, L'Amoreaux East, Milliken, Steeles East----
               venue  freq
0           Pharmacy   1.0
1        Auto Garage   0.0
2         Hobby Shop   0.0
3  Indian Restaurant   0.0
4       Intersection   0.0


----Birch Cliff, Cliffside West----
                   venue  freq
0               Gym Pool  0.17
1                    Gym  0.17
2  General Entertainment  0.17
3                   Park  0.17
4           Skating Rink  0.17


----Cedarbrae----
               venue  freq
0              Trail   0.5
1         Playground   0.5
2        Auto Garage   0.0
3               Pool   0.0
4  Indian Restaurant   0.0


----Clairlea, Golden Mile, Oakridge----
          venue  freq
0      Bus Line   0.2
1        Bakery   0.2
2   Coffee Shop   0.2
3   Bus Station   0.1
4  Soccer Field   0.1


----Clarks Corn

In [117]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [118]:
import numpy as np
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = scarborough_grouped['Neighborhood']

for ind in np.arange(scarborough_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(scarborough_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Agincourt,Skating Rink,Chinese Restaurant,Badminton Court,Bakery,Park,Vietnamese Restaurant,Sushi Restaurant,Supermarket,Bubble Tea Shop,Shopping Mall
1,"Agincourt North, L'Amoreaux East, Milliken, St...",Pharmacy,Wings Joint,College Stadium,Gym,Grocery Store,Golf Course,Gift Shop,General Entertainment,Furniture / Home Store,Fried Chicken Joint
2,"Birch Cliff, Cliffside West",Gym Pool,College Stadium,Gym,General Entertainment,Skating Rink,Park,Grocery Store,Golf Course,Gift Shop,Furniture / Home Store
3,Cedarbrae,Playground,Trail,Wings Joint,Grocery Store,Golf Course,Gift Shop,General Entertainment,Furniture / Home Store,Fried Chicken Joint,Fast Food Restaurant
4,"Clairlea, Golden Mile, Oakridge",Coffee Shop,Bus Line,Bakery,Metro Station,Intersection,Bus Station,Soccer Field,Discount Store,Fast Food Restaurant,Fried Chicken Joint
5,"Clarks Corners, Sullivan, Tam O'Shanter",Pharmacy,Pizza Place,Fried Chicken Joint,Bus Stop,Shopping Mall,Chinese Restaurant,Thai Restaurant,Hobby Shop,Golf Course,General Entertainment
6,"Cliffcrest, Cliffside, Scarborough Village West",Fast Food Restaurant,Wings Joint,Sandwich Place,Furniture / Home Store,Liquor Store,Discount Store,Pharmacy,Pizza Place,Coffee Shop,Burger Joint
7,"Dorset Park, Scarborough Town Centre, Wexford ...",Automotive Shop,Bakery,Brewery,Gift Shop,Wings Joint,Construction & Landscaping,Gym,Grocery Store,Golf Course,General Entertainment
8,"East Birchmount Park, Ionview, Kennedy Park",Discount Store,Coffee Shop,Department Store,Gym Pool,Gym,Grocery Store,Golf Course,Gift Shop,General Entertainment,Furniture / Home Store
9,"Guildwood, Morningside, West Hill",Tea Room,Park,Construction & Landscaping,Gym / Fitness Center,Wings Joint,College Stadium,Grocery Store,Golf Course,Gift Shop,General Entertainment


In [119]:
# set number of clusters
# Using k-means to cluster the neighborhood into 4 clusters.
kclusters = 4

scarborough_grouped_clustering = scarborough_grouped.drop('Neighborhood', 1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(scarborough_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_

array([1, 3, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1], dtype=int32)

In [121]:
scarborough_merged = Scarborough_data[0:15]

# add clustering labels
scarborough_merged['Cluster Labels'] = kmeans.labels_

# merge toronto_grouped with toronto_data to add latitude/longitude for each neighborhood
scarborough_merged = scarborough_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

scarborough_merged.head() # check the last columns!

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  after removing the cwd from sys.path.


Unnamed: 0,Postalcode,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M1B,Scarborough,"Rouge, Malvern",43.81165,-79.195561,1,,,,,,,,,,
1,M1C,Scarborough,"Highland Creek, Rouge Hill, Port Union",43.785605,-79.158701,3,History Museum,Bar,College Stadium,Gym,Grocery Store,Golf Course,Gift Shop,General Entertainment,Furniture / Home Store,Fried Chicken Joint
2,M1E,Scarborough,"Guildwood, Morningside, West Hill",43.76569,-79.175299,1,Tea Room,Park,Construction & Landscaping,Gym / Fitness Center,Wings Joint,College Stadium,Grocery Store,Golf Course,Gift Shop,General Entertainment
3,M1G,Scarborough,Woburn,43.768216,-79.21761,2,Coffee Shop,Park,Business Service,Construction & Landscaping,College Stadium,Gym,Grocery Store,Golf Course,Gift Shop,General Entertainment
4,M1H,Scarborough,Cedarbrae,43.769608,-79.23944,1,Playground,Trail,Wings Joint,Grocery Store,Golf Course,Gift Shop,General Entertainment,Furniture / Home Store,Fried Chicken Joint,Fast Food Restaurant


# Visualize the Cluster

In [122]:
# create map
map_clusters = folium.Map(location=[latitude_s, longitude_s], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]
print(rainbow)
# add markers to the map
markers_colors = []
for lat, lon, nei , cluster in zip(scarborough_merged['Latitude'], scarborough_merged['Longitude'], scarborough_merged['Neighborhood'], scarborough_merged['Cluster Labels']):
    label = folium.Popup(str(nei) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

['#8000ff', '#2adddd', '#d4dd80', '#ff0000']


# Examine the Cluster

## cluster 1

In [123]:

scarborough_merged.loc[scarborough_merged['Cluster Labels'] == 0,scarborough_merged.columns[[2] + list(range(5, scarborough_merged.shape[1]))]]


Unnamed: 0,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
12,Agincourt,0,Skating Rink,Chinese Restaurant,Badminton Court,Bakery,Park,Vietnamese Restaurant,Sushi Restaurant,Supermarket,Bubble Tea Shop,Shopping Mall


## cluster 2

In [124]:
scarborough_merged.loc[scarborough_merged['Cluster Labels'] == 1,scarborough_merged.columns[[2] + list(range(5, scarborough_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,"Rouge, Malvern",1,,,,,,,,,,
2,"Guildwood, Morningside, West Hill",1,Tea Room,Park,Construction & Landscaping,Gym / Fitness Center,Wings Joint,College Stadium,Grocery Store,Golf Course,Gift Shop,General Entertainment
4,Cedarbrae,1,Playground,Trail,Wings Joint,Grocery Store,Golf Course,Gift Shop,General Entertainment,Furniture / Home Store,Fried Chicken Joint,Fast Food Restaurant
5,Scarborough Village,1,Train Station,Grocery Store,Indian Restaurant,Restaurant,Wings Joint,Coffee Shop,Golf Course,Gift Shop,General Entertainment,Furniture / Home Store
6,"East Birchmount Park, Ionview, Kennedy Park",1,Discount Store,Coffee Shop,Department Store,Gym Pool,Gym,Grocery Store,Golf Course,Gift Shop,General Entertainment,Furniture / Home Store
7,"Clairlea, Golden Mile, Oakridge",1,Coffee Shop,Bus Line,Bakery,Metro Station,Intersection,Bus Station,Soccer Field,Discount Store,Fast Food Restaurant,Fried Chicken Joint
8,"Cliffcrest, Cliffside, Scarborough Village West",1,Fast Food Restaurant,Wings Joint,Sandwich Place,Furniture / Home Store,Liquor Store,Discount Store,Pharmacy,Pizza Place,Coffee Shop,Burger Joint
9,"Birch Cliff, Cliffside West",1,Gym Pool,College Stadium,Gym,General Entertainment,Skating Rink,Park,Grocery Store,Golf Course,Gift Shop,Furniture / Home Store
10,"Dorset Park, Scarborough Town Centre, Wexford ...",1,Automotive Shop,Bakery,Brewery,Gift Shop,Wings Joint,Construction & Landscaping,Gym,Grocery Store,Golf Course,General Entertainment
11,"Maryvale, Wexford",1,Auto Garage,Intersection,College Stadium,Gym,Grocery Store,Golf Course,Gift Shop,General Entertainment,Furniture / Home Store,Fried Chicken Joint


## cluster 3

In [125]:
scarborough_merged.loc[scarborough_merged['Cluster Labels'] == 2,scarborough_merged.columns[[2] + list(range(5, scarborough_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
3,Woburn,2,Coffee Shop,Park,Business Service,Construction & Landscaping,College Stadium,Gym,Grocery Store,Golf Course,Gift Shop,General Entertainment


## cluster 4

In [126]:
scarborough_merged.loc[scarborough_merged['Cluster Labels'] == 3,scarborough_merged.columns[[2] + list(range(5, scarborough_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,"Highland Creek, Rouge Hill, Port Union",3,History Museum,Bar,College Stadium,Gym,Grocery Store,Golf Course,Gift Shop,General Entertainment,Furniture / Home Store,Fried Chicken Joint
