<h1>Segmenting and Clustering Neighborhoods in Toronto Part 3</h1>

Assignment Instructions:
<p>For this assignment, you will be required to explore and cluster the neighborhoods in Toronto.
<p>Start by creating a new Notebook for this assignment.
<p>Use the Notebook to build the code to scrape the following Wikipedia page, https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M, in order to obtain the data that is in the table of postal codes and to transform the data into a pandas dataframe like the one shown below:he dataframe will consist of three columns: PostalCode, Borough, and Neighborhood
<p>Only process the cells that have an assigned borough. Ignore cells with a borough that is Not assigned.
<p>More than one neighborhood can exist in one postal code area. For example, in the table on the Wikipedia page, you will notice that M5A is listed twice and has two neighborhoods: Harbourfront and Regent Park. These two rows will be combined into one row with the neighborhoods separated with a comma as shown in row 11 in the above table.
<p>
<p>If a cell has a borough but a Not assigned neighborhood, then the neighborhood will be the same as the borough. So for the 9th cell in the table on the Wikipedia page, the value of the Borough and the Neighborhood columns will be Queen's Park.
<p>Clean your Notebook and add Markdown cells to explain your work and any assumptions you are making.
<p>In the last cell of your notebook, use the .shape method to print the number of rows of your dataframe.

In [1]:
!conda install -c conda-forge geopy --yes 
!conda install -c conda-forge folium --yes 
!conda install -c conda-forge pyquery --yes

Solving environment: done

# All requested packages already installed.

Solving environment: done

# All requested packages already installed.

Solving environment: done

# All requested packages already installed.



<h2>1. Get Wiki page containing Toronto Boroughs/Neighborhoods

In [2]:
import pandas as pd
import numpy as np
import requests
from bs4 import BeautifulSoup

res = requests.get("https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M")
soup = BeautifulSoup(res.content,'lxml')
wikitables = soup.find_all('table') 
Toronto = pd.read_html(str(wikitables[0]), index_col=None, header=0)[0]
Toronto.head()

  return f(*args, **kwds)
  return f(*args, **kwds)


Unnamed: 0,Postcode,Borough,Neighbourhood
0,M1A,Not assigned,Not assigned
1,M2A,Not assigned,Not assigned
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,Harbourfront


In [3]:
# verify dimension shape
Toronto.shape

(289, 3)

<h2>2. Data Cleaning

In [4]:
# Empty entries to np.nan to drop them in the next step
Toronto['Borough'].replace('', np.nan, inplace=True)
# Drop np.nan to remove rows not containing meaningful data
Toronto.dropna(subset=['Borough'], inplace=True)
# Leave behind rows containing 'Not assigned' in 'Borough'
Toronto = Toronto[Toronto['Borough'] != 'Not assigned']

<h2>3. Data Processing "Not Assigned" Value

In [5]:
# Iterate over the dataframe and fix 'Not assigned' for column 'Neighborhood'
for i, _ in Toronto.iterrows():
    if Toronto.loc[i]['Neighbourhood'] == 'Not assigned': Toronto.loc[i]['Neighborhood'] = Toronto.loc[i]['Borough']

<h2>4. Check Dataframe Shape

In [6]:
# Check datafame shape
Toronto.shape

(212, 3)

<h2>5. Dataframe

In [7]:
Toronto.head(212)

Unnamed: 0,Postcode,Borough,Neighbourhood
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,Harbourfront
5,M5A,Downtown Toronto,Regent Park
6,M6A,North York,Lawrence Heights
7,M6A,North York,Lawrence Manor
8,M7A,Queen's Park,Not assigned
10,M9A,Etobicoke,Islington Avenue
11,M1B,Scarborough,Rouge
12,M1B,Scarborough,Malvern


<h2>6.  Number of Rows in the Dataframe

In [8]:
# Print the number of rows in the dataframe
print('Number of rows in Toronto dataframe: {}'.format(Toronto.shape[0]))

Number of rows in Toronto dataframe: 212


<h2>7. Postal Codes to Borough and Neighborhood

In [9]:
!conda install -c conda-forge geocoder --yes

Solving environment: done

# All requested packages already installed.



In [10]:
!wget -q --no-check-certificate -O 'latitude.pickle' 'https://docs.google.com/uc?export=download&id=1PdEOkPErrpBtDgSlDwczIv_KLlpY-YcO'
!wget -q --no-check-certificate -O 'longitude.pickle' 'https://docs.google.com/uc?export=download&id=1XujA04dCARQnlxu-X2ItOVcYQz0MMQh9'

In [11]:
!ls -l *.pickle

-rw-rw-r-- 1 jupyterlab resources 1965 Nov 27 14:27 latitude.pickle
-rw-rw-r-- 1 jupyterlab resources 1965 Nov 27 14:27 longitude.pickle


In [12]:
import pickle

with open('latitude.pickle', 'rb') as flat:
    latitude = pickle.load(flat)
with open('longitude.pickle', 'rb') as flon:
    longitude = pickle.load(flon)

In [13]:
TPS = Toronto['Postcode'].unique()
len(TPS)

103

In [14]:
import geocoder
import time

for postcode in TPS:
    # When offline cache is available use it to avoid Geocoder Google API throttling
    if postcode in latitude.keys(): 
        continue
    while True:
        g = geocoder.google('{}, Toronto, Ontario'.format(postcode))
        lat_lng_coords = g.latlng
        if lat_lng_coords == None:
            print('Trottled response to {}'.format(postcode))
            time.sleep(5)
            continue
        break
    
    if lat_lng_coords != None:
        latitude[postcode] = lat_lng_coords[0]
        longitude[postcode] = lat_lng_coords[1]
print('Successfully populated geo locations')

Successfully populated geo locations


In [15]:
lat = []
lon = []
for i, _ in Toronto.iterrows():
    lat.append(latitude[Toronto.loc[i]['Postcode']])
    lon.append(longitude[Toronto.loc[i]['Postcode']])

<h2>8. Add Latitude and Longitude to the dataframe

In [16]:
Toronto = Toronto.assign(Latitude = lat, Longitude=lon)
Toronto.head()

Unnamed: 0,Postcode,Borough,Neighbourhood,Latitude,Longitude
2,M3A,North York,Parkwoods,43.753259,-79.329656
3,M4A,North York,Victoria Village,43.725882,-79.315572
4,M5A,Downtown Toronto,Harbourfront,43.65426,-79.360636
5,M5A,Downtown Toronto,Regent Park,43.65426,-79.360636
6,M6A,North York,Lawrence Heights,43.718518,-79.464763


<h2>9. Show the Toronto Dataframe

In [17]:
Toronto = Toronto.reset_index(drop=True)
Toronto.to_csv('Toronto.csv')
Toronto.head(20)

Unnamed: 0,Postcode,Borough,Neighbourhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.753259,-79.329656
1,M4A,North York,Victoria Village,43.725882,-79.315572
2,M5A,Downtown Toronto,Harbourfront,43.65426,-79.360636
3,M5A,Downtown Toronto,Regent Park,43.65426,-79.360636
4,M6A,North York,Lawrence Heights,43.718518,-79.464763
5,M6A,North York,Lawrence Manor,43.718518,-79.464763
6,M7A,Queen's Park,Not assigned,43.662301,-79.389494
7,M9A,Etobicoke,Islington Avenue,43.667856,-79.532242
8,M1B,Scarborough,Rouge,43.806686,-79.194353
9,M1B,Scarborough,Malvern,43.806686,-79.194353


<h2>End of Coordinates Assessment

<h2>9. Establish Clustering of Neighborhoods in Toronto

In [18]:
import numpy as np
import pandas as pd
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)
import json
from geopy.geocoders import Nominatim
import requests
from pandas.io.json import json_normalize
import matplotlib.cm as cm
import matplotlib.colors as colors
from sklearn.cluster import KMeans
import folium # map rendering library

print('Libraries imported.')

  return f(*args, **kwds)
  return f(*args, **kwds)
  return f(*args, **kwds)
  return f(*args, **kwds)


Libraries imported.


In [19]:
address = 'Toronto, Ontario'

geolocator = Nominatim()
location = geolocator.geocode(address)
T_lat = location.latitude
T_lon = location.longitude
print('The geograpical coordinate of Toronto, ON, Canada are {}, {}.'.format(T_lat, T_lon))



The geograpical coordinate of Toronto, ON, Canada are 43.653963, -79.387207.


<h2>10. Create a map of Toronto with neighbourhoods

In [21]:
T_map = folium.Map(location=[T_lat, lon], zoom_start=10)

# add markers to map
for lat, lng, borough, neighborhood in zip(Toronto['Latitude'], Toronto['Longitude'], Toronto['Borough'], Toronto['Neighbourhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(T_map)  
    
T_map

<h2> 11. Prepare Foursquare Credentials

In [23]:
CLIENT_ID = '23DB3QXW1DBU3D4DZSWG02LAG0JHJ1V010GUOPGP1WO3OXTR'
CLIENT_SECRET = 'QWGHM2RUOI4NGXHBWM2FBAMQYAANHMRL5B0RBEX1YFDNHLMZ'
VERSION = '20180605' # Foursquare API version

print('Your credentails:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentails:
CLIENT_ID: 23DB3QXW1DBU3D4DZSWG02LAG0JHJ1V010GUOPGP1WO3OXTR
CLIENT_SECRET:QWGHM2RUOI4NGXHBWM2FBAMQYAANHMRL5B0RBEX1YFDNHLMZ


<h2> 12. West Toronto Locations

In [24]:
HighPark = Toronto[Toronto['Borough'] == 'West Toronto']
HighPark.reset_index(drop=True)

Unnamed: 0,Postcode,Borough,Neighbourhood,Latitude,Longitude
0,M6H,West Toronto,Dovercourt Village,43.669005,-79.442259
1,M6H,West Toronto,Dufferin,43.669005,-79.442259
2,M6J,West Toronto,Little Portugal,43.647927,-79.41975
3,M6J,West Toronto,Trinity,43.647927,-79.41975
4,M6K,West Toronto,Brockton,43.636847,-79.428191
5,M6K,West Toronto,Exhibition Place,43.636847,-79.428191
6,M6K,West Toronto,Parkdale Village,43.636847,-79.428191
7,M6P,West Toronto,High Park,43.661608,-79.464763
8,M6P,West Toronto,The Junction South,43.661608,-79.464763
9,M6R,West Toronto,Parkdale,43.64896,-79.456325


In [26]:

HP_lon = HighPark['Longitude'].values[0] # neighborhood longitude value

HP_name = HighPark['Neighbourhood'].values[0] # neighborhood name

print('Latitude and longitude values of {} are {}, {}.'.format(HP_name, 
                                                               HP_lat, 
                                                               HP_lon))

Latitude and longitude values of Dovercourt Village are 43.6690051, -79.4422593.


In [27]:
radius = 500
LIMIT = 100
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    HP_lat, 
    HP_lon, 
    radius, 
    LIMIT)
url

'https://api.foursquare.com/v2/venues/explore?&client_id=23DB3QXW1DBU3D4DZSWG02LAG0JHJ1V010GUOPGP1WO3OXTR&client_secret=QWGHM2RUOI4NGXHBWM2FBAMQYAANHMRL5B0RBEX1YFDNHLMZ&v=20180605&ll=43.6690051,-79.4422593&radius=500&limit=100'

In [30]:
results = requests.get(url).json()
# results
if results != None:
    print("Request successfully processed")



Request successfully processed


In [31]:
def get_category_type(row):
    try:
        categories_list = row['categories']
    except:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

In [32]:
venues = results['response']['groups'][0]['items']
    
nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues =nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head()

Unnamed: 0,name,categories,lat,lng
0,The Greater Good Bar,Bar,43.669409,-79.439267
1,Parallel,Middle Eastern Restaurant,43.669516,-79.438728
2,FreshCo,Supermarket,43.667918,-79.440754
3,Happy Bakery & Pastries,Bakery,43.66705,-79.441791
4,Planet Fitness Toronto Galleria,Gym / Fitness Center,43.667588,-79.442574


In [33]:
print('{} venues were returned by Foursquare.'.format(nearby_venues.shape[0]))

17 venues were returned by Foursquare.


In [34]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

In [35]:
# type your answer here
HP_venues = getNearbyVenues(names=HighPark['Neighbourhood'],
                                   latitudes=HighPark['Latitude'],
                                   longitudes=HighPark['Longitude']
                                  )

Dovercourt Village
Dufferin
Little Portugal
Trinity
Brockton
Exhibition Place
Parkdale Village
High Park
The Junction South
Parkdale
Roncesvalles
Runnymede
Swansea


In [36]:
print(HP_venues.shape)
HP_venues.head()

(385, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Dovercourt Village,43.669005,-79.442259,The Greater Good Bar,43.669409,-79.439267,Bar
1,Dovercourt Village,43.669005,-79.442259,Parallel,43.669516,-79.438728,Middle Eastern Restaurant
2,Dovercourt Village,43.669005,-79.442259,FreshCo,43.667918,-79.440754,Supermarket
3,Dovercourt Village,43.669005,-79.442259,Happy Bakery & Pastries,43.66705,-79.441791,Bakery
4,Dovercourt Village,43.669005,-79.442259,Planet Fitness Toronto Galleria,43.667588,-79.442574,Gym / Fitness Center


In [37]:
HP_venues.groupby('Neighborhood').count()

Unnamed: 0_level_0,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Brockton,23,23,23,23,23,23
Dovercourt Village,17,17,17,17,17,17
Dufferin,17,17,17,17,17,17
Exhibition Place,23,23,23,23,23,23
High Park,24,24,24,24,24,24
Little Portugal,64,64,64,64,64,64
Parkdale,16,16,16,16,16,16
Parkdale Village,23,23,23,23,23,23
Roncesvalles,16,16,16,16,16,16
Runnymede,37,37,37,37,37,37


In [38]:
print('There are {} uniques categories.'.format(len(HP_venues['Venue Category'].unique())))

There are 86 uniques categories.


In [39]:
# one hot encoding
HP_onehot = pd.get_dummies(HP_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
HP_onehot['Neighborhood'] = HP_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [HP_onehot.columns[-1]] + list(HP_onehot.columns[:-1])
HP_onehot = HP_onehot[fixed_columns]

HP_onehot.head()

Unnamed: 0,Neighborhood,American Restaurant,Antique Shop,Argentinian Restaurant,Art Gallery,Arts & Crafts Store,Asian Restaurant,Bakery,Bank,Bar,Bookstore,Boutique,Breakfast Spot,Brewery,Burger Joint,Burrito Place,Café,Cajun / Creole Restaurant,Caribbean Restaurant,Climbing Gym,Cocktail Bar,Coffee Shop,Convenience Store,Cuban Restaurant,Cupcake Shop,Dessert Shop,Diner,Discount Store,Dog Run,Eastern European Restaurant,Falafel Restaurant,Fast Food Restaurant,Fish & Chips Shop,Flea Market,Food,French Restaurant,Fried Chicken Joint,Furniture / Home Store,Gastropub,Gift Shop,Gourmet Shop,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Ice Cream Shop,Indie Movie Theater,Italian Restaurant,Juice Bar,Korean Restaurant,Latin American Restaurant,Liquor Store,Mac & Cheese Joint,Malay Restaurant,Men's Store,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Movie Theater,Music Venue,New American Restaurant,Park,Performing Arts Venue,Pet Store,Pharmacy,Piano Bar,Pizza Place,Post Office,Pub,Record Shop,Restaurant,Salon / Barbershop,Sandwich Place,Smoothie Shop,Southern / Soul Food Restaurant,Speakeasy,Sports Bar,Stadium,Supermarket,Sushi Restaurant,Tea Room,Thai Restaurant,Theater,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Wine Bar,Yoga Studio
0,Dovercourt Village,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,Dovercourt Village,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,Dovercourt Village,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0
3,Dovercourt Village,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,Dovercourt Village,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


In [40]:
HP_onehot.shape

(385, 87)

In [41]:
HP_grouped = HP_onehot.groupby('Neighborhood').mean().reset_index()
HP_grouped

Unnamed: 0,Neighborhood,American Restaurant,Antique Shop,Argentinian Restaurant,Art Gallery,Arts & Crafts Store,Asian Restaurant,Bakery,Bank,Bar,Bookstore,Boutique,Breakfast Spot,Brewery,Burger Joint,Burrito Place,Café,Cajun / Creole Restaurant,Caribbean Restaurant,Climbing Gym,Cocktail Bar,Coffee Shop,Convenience Store,Cuban Restaurant,Cupcake Shop,Dessert Shop,Diner,Discount Store,Dog Run,Eastern European Restaurant,Falafel Restaurant,Fast Food Restaurant,Fish & Chips Shop,Flea Market,Food,French Restaurant,Fried Chicken Joint,Furniture / Home Store,Gastropub,Gift Shop,Gourmet Shop,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Ice Cream Shop,Indie Movie Theater,Italian Restaurant,Juice Bar,Korean Restaurant,Latin American Restaurant,Liquor Store,Mac & Cheese Joint,Malay Restaurant,Men's Store,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Movie Theater,Music Venue,New American Restaurant,Park,Performing Arts Venue,Pet Store,Pharmacy,Piano Bar,Pizza Place,Post Office,Pub,Record Shop,Restaurant,Salon / Barbershop,Sandwich Place,Smoothie Shop,Southern / Soul Food Restaurant,Speakeasy,Sports Bar,Stadium,Supermarket,Sushi Restaurant,Tea Room,Thai Restaurant,Theater,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Wine Bar,Yoga Studio
0,Brockton,0.0,0.0,0.0,0.0,0.0,0.0,0.043478,0.0,0.043478,0.0,0.0,0.086957,0.0,0.0,0.043478,0.086957,0.0,0.043478,0.043478,0.0,0.130435,0.043478,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.043478,0.0,0.0,0.0,0.0,0.0,0.0,0.043478,0.0,0.0,0.0,0.0,0.043478,0.043478,0.0,0.0,0.0,0.043478,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.043478,0.043478,0.0,0.0,0.0,0.0,0.0,0.0,0.043478,0.0,0.0,0.0,0.0,0.0,0.0,0.043478,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.043478
1,Dovercourt Village,0.0,0.0,0.0,0.0,0.0,0.0,0.117647,0.058824,0.058824,0.0,0.0,0.0,0.058824,0.0,0.0,0.058824,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.058824,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.058824,0.0,0.0,0.0,0.0,0.0,0.0,0.058824,0.0,0.0,0.0,0.0,0.0,0.0,0.058824,0.0,0.0,0.0,0.0,0.058824,0.0,0.0,0.058824,0.0,0.058824,0.0,0.0,0.117647,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.117647,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,Dufferin,0.0,0.0,0.0,0.0,0.0,0.0,0.117647,0.058824,0.058824,0.0,0.0,0.0,0.058824,0.0,0.0,0.058824,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.058824,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.058824,0.0,0.0,0.0,0.0,0.0,0.0,0.058824,0.0,0.0,0.0,0.0,0.0,0.0,0.058824,0.0,0.0,0.0,0.0,0.058824,0.0,0.0,0.058824,0.0,0.058824,0.0,0.0,0.117647,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.117647,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Exhibition Place,0.0,0.0,0.0,0.0,0.0,0.0,0.043478,0.0,0.043478,0.0,0.0,0.086957,0.0,0.0,0.043478,0.086957,0.0,0.043478,0.043478,0.0,0.130435,0.043478,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.043478,0.0,0.0,0.0,0.0,0.0,0.0,0.043478,0.0,0.0,0.0,0.0,0.043478,0.043478,0.0,0.0,0.0,0.043478,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.043478,0.043478,0.0,0.0,0.0,0.0,0.0,0.0,0.043478,0.0,0.0,0.0,0.0,0.0,0.0,0.043478,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.043478
4,High Park,0.0,0.041667,0.0,0.0,0.041667,0.0,0.041667,0.0,0.083333,0.041667,0.0,0.0,0.0,0.0,0.0,0.083333,0.041667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.041667,0.0,0.0,0.0,0.0,0.041667,0.0,0.041667,0.0,0.0,0.041667,0.041667,0.041667,0.0,0.0,0.0,0.041667,0.0,0.0,0.0,0.0,0.041667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.041667,0.0,0.041667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.041667,0.0,0.0,0.041667,0.0,0.0,0.0,0.0,0.0,0.041667,0.0,0.0,0.0,0.0,0.0
5,Little Portugal,0.015625,0.0,0.015625,0.015625,0.0,0.03125,0.03125,0.0,0.125,0.0,0.03125,0.0,0.015625,0.0,0.0,0.0625,0.0,0.0,0.0,0.03125,0.046875,0.0,0.015625,0.015625,0.0,0.015625,0.0,0.015625,0.0,0.0,0.0,0.0,0.0,0.0,0.015625,0.0,0.0,0.0,0.015625,0.0,0.015625,0.0,0.0,0.015625,0.015625,0.0,0.015625,0.015625,0.015625,0.0,0.0,0.015625,0.015625,0.03125,0.0,0.0,0.015625,0.0,0.015625,0.03125,0.015625,0.0,0.0,0.0,0.0,0.03125,0.0,0.015625,0.015625,0.046875,0.015625,0.0,0.0,0.015625,0.0,0.015625,0.0,0.0,0.0,0.0,0.0,0.015625,0.015625,0.03125,0.015625,0.015625
6,Parkdale,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0625,0.0,0.125,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.0625,0.0,0.0625,0.0,0.0,0.0625,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
7,Parkdale Village,0.0,0.0,0.0,0.0,0.0,0.0,0.043478,0.0,0.043478,0.0,0.0,0.086957,0.0,0.0,0.043478,0.086957,0.0,0.043478,0.043478,0.0,0.130435,0.043478,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.043478,0.0,0.0,0.0,0.0,0.0,0.0,0.043478,0.0,0.0,0.0,0.0,0.043478,0.043478,0.0,0.0,0.0,0.043478,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.043478,0.043478,0.0,0.0,0.0,0.0,0.0,0.0,0.043478,0.0,0.0,0.0,0.0,0.0,0.0,0.043478,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.043478
8,Roncesvalles,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0625,0.0,0.125,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.0625,0.0,0.0625,0.0,0.0,0.0625,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9,Runnymede,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.027027,0.027027,0.0,0.0,0.0,0.0,0.027027,0.081081,0.0,0.0,0.0,0.0,0.108108,0.0,0.0,0.0,0.027027,0.027027,0.0,0.0,0.0,0.027027,0.0,0.027027,0.0,0.027027,0.027027,0.0,0.0,0.027027,0.0,0.027027,0.0,0.027027,0.027027,0.0,0.0,0.027027,0.054054,0.0,0.0,0.027027,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.027027,0.0,0.081081,0.027027,0.027027,0.0,0.027027,0.0,0.027027,0.027027,0.0,0.0,0.0,0.0,0.0,0.054054,0.027027,0.0,0.0,0.027027,0.0,0.0,0.0


In [42]:
HP_grouped.shape

(13, 87)

In [43]:
num_top_venues = 5

for hood in HP_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = HP_grouped[HP_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Brockton----
                venue  freq
0         Coffee Shop  0.13
1                Café  0.09
2      Breakfast Spot  0.09
3             Stadium  0.04
4  Falafel Restaurant  0.04


----Dovercourt Village----
                       venue  freq
0                Supermarket  0.12
1                   Pharmacy  0.12
2                     Bakery  0.12
3  Middle Eastern Restaurant  0.06
4                       Park  0.06


----Dufferin----
                       venue  freq
0                Supermarket  0.12
1                   Pharmacy  0.12
2                     Bakery  0.12
3  Middle Eastern Restaurant  0.06
4                       Park  0.06


----Exhibition Place----
                venue  freq
0         Coffee Shop  0.13
1                Café  0.09
2      Breakfast Spot  0.09
3             Stadium  0.04
4  Falafel Restaurant  0.04


----High Park----
                  venue  freq
0    Mexican Restaurant  0.08
1                   Bar  0.08
2                  Café  0.08
3           

In [44]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [45]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = HP_grouped['Neighborhood']

for ind in np.arange(HP_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(HP_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Brockton,Coffee Shop,Breakfast Spot,Café,Yoga Studio,Pet Store,Bakery,Bar,Burrito Place,Caribbean Restaurant,Climbing Gym
1,Dovercourt Village,Pharmacy,Bakery,Supermarket,Bank,Bar,Liquor Store,Discount Store,Brewery,Music Venue,Café
2,Dufferin,Pharmacy,Bakery,Supermarket,Bank,Bar,Liquor Store,Discount Store,Brewery,Music Venue,Café
3,Exhibition Place,Coffee Shop,Breakfast Spot,Café,Yoga Studio,Pet Store,Bakery,Bar,Burrito Place,Caribbean Restaurant,Climbing Gym
4,High Park,Mexican Restaurant,Café,Bar,Bookstore,Music Venue,Diner,Park,Cajun / Creole Restaurant,Fast Food Restaurant,Italian Restaurant
5,Little Portugal,Bar,Café,Restaurant,Coffee Shop,Boutique,Men's Store,New American Restaurant,Cocktail Bar,Pizza Place,Bakery
6,Parkdale,Gift Shop,Breakfast Spot,Cuban Restaurant,Bar,Eastern European Restaurant,Burger Joint,Dessert Shop,Dog Run,Bookstore,Restaurant
7,Parkdale Village,Coffee Shop,Breakfast Spot,Café,Yoga Studio,Pet Store,Bakery,Bar,Burrito Place,Caribbean Restaurant,Climbing Gym
8,Roncesvalles,Gift Shop,Breakfast Spot,Cuban Restaurant,Bar,Eastern European Restaurant,Burger Joint,Dessert Shop,Dog Run,Bookstore,Restaurant
9,Runnymede,Coffee Shop,Pizza Place,Café,Sushi Restaurant,Italian Restaurant,Grocery Store,Falafel Restaurant,Gourmet Shop,Indie Movie Theater,Post Office


In [46]:
# set number of clusters
kclusters = 5

HP_grouped_clustering = HP_grouped.drop('Neighborhood', 1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(HP_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10]

array([1, 3, 3, 1, 4, 0, 2, 1, 2, 0], dtype=int32)

In [47]:
HP_merged = HighPark

# add clustering labels
HP_merged['Cluster Labels'] = kmeans.labels_

# merge toronto_grouped with toronto_data to add latitude/longitude for each neighborhood
HP_merged = HP_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighbourhood')

HP_merged.head() # check the last columns!

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  after removing the cwd from sys.path.


Unnamed: 0,Postcode,Borough,Neighbourhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
52,M6H,West Toronto,Dovercourt Village,43.669005,-79.442259,1,Pharmacy,Bakery,Supermarket,Bank,Bar,Liquor Store,Discount Store,Brewery,Music Venue,Café
53,M6H,West Toronto,Dufferin,43.669005,-79.442259,3,Pharmacy,Bakery,Supermarket,Bank,Bar,Liquor Store,Discount Store,Brewery,Music Venue,Café
64,M6J,West Toronto,Little Portugal,43.647927,-79.41975,3,Bar,Café,Restaurant,Coffee Shop,Boutique,Men's Store,New American Restaurant,Cocktail Bar,Pizza Place,Bakery
65,M6J,West Toronto,Trinity,43.647927,-79.41975,1,Bar,Café,Restaurant,Coffee Shop,Boutique,Men's Store,New American Restaurant,Cocktail Bar,Pizza Place,Bakery
76,M6K,West Toronto,Brockton,43.636847,-79.428191,4,Coffee Shop,Breakfast Spot,Café,Yoga Studio,Pet Store,Bakery,Bar,Burrito Place,Caribbean Restaurant,Climbing Gym


In [50]:
# create map
map_clusters = folium.Map(location=[lat, lon], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i+x+(i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(HP_merged['Latitude'], HP_merged['Longitude'], HP_merged['Neighbourhood'], HP_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

In [51]:
HP_merged.loc[HP_merged['Cluster Labels'] == 0, HP_merged.columns[[1] + list(range(5, HP_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
77,West Toronto,0,Coffee Shop,Breakfast Spot,Café,Yoga Studio,Pet Store,Bakery,Bar,Burrito Place,Caribbean Restaurant,Climbing Gym
134,West Toronto,0,Gift Shop,Breakfast Spot,Cuban Restaurant,Bar,Eastern European Restaurant,Burger Joint,Dessert Shop,Dog Run,Bookstore,Restaurant
135,West Toronto,0,Gift Shop,Breakfast Spot,Cuban Restaurant,Bar,Eastern European Restaurant,Burger Joint,Dessert Shop,Dog Run,Bookstore,Restaurant
146,West Toronto,0,Coffee Shop,Pizza Place,Café,Sushi Restaurant,Italian Restaurant,Grocery Store,Falafel Restaurant,Gourmet Shop,Indie Movie Theater,Post Office


In [52]:
HP_merged.loc[HP_merged['Cluster Labels'] == 1, HP_merged.columns[[1] + list(range(5, HP_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
52,West Toronto,1,Pharmacy,Bakery,Supermarket,Bank,Bar,Liquor Store,Discount Store,Brewery,Music Venue,Café
65,West Toronto,1,Bar,Café,Restaurant,Coffee Shop,Boutique,Men's Store,New American Restaurant,Cocktail Bar,Pizza Place,Bakery
124,West Toronto,1,Mexican Restaurant,Café,Bar,Bookstore,Music Venue,Diner,Park,Cajun / Creole Restaurant,Fast Food Restaurant,Italian Restaurant


In [53]:
HP_merged.loc[HP_merged['Cluster Labels'] == 3, HP_merged.columns[[1] + list(range(5, HP_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
53,West Toronto,3,Pharmacy,Bakery,Supermarket,Bank,Bar,Liquor Store,Discount Store,Brewery,Music Venue,Café
64,West Toronto,3,Bar,Café,Restaurant,Coffee Shop,Boutique,Men's Store,New American Restaurant,Cocktail Bar,Pizza Place,Bakery


In [54]:
HP_merged.loc[HP_merged['Cluster Labels'] == 4, HP_merged.columns[[1] + list(range(5, HP_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
76,West Toronto,4,Coffee Shop,Breakfast Spot,Café,Yoga Studio,Pet Store,Bakery,Bar,Burrito Place,Caribbean Restaurant,Climbing Gym
145,West Toronto,4,Coffee Shop,Pizza Place,Café,Sushi Restaurant,Italian Restaurant,Grocery Store,Falafel Restaurant,Gourmet Shop,Indie Movie Theater,Post Office


<h1>End of Location Exploration and Clustering! Thank you!