## Introduction

In this lab, I will to use a dataset from Toronto  to convert addresses into their equivalent latitude and longitude values. Also, I will use the Foursquare API to explore neighborhoods in Toronto. I will use the **explore** function to get the most common venue categories in each neighborhood, and then use this feature to group the neighborhoods into clusters. You will use the *k*-means clustering algorithm to complete this task. Finally, you will use the Folium library to visualize the neighborhoods in Toronto and their emerging clusters.

## 1. Download and Explore Dataset

In order to segement the neighborhoods and explore them, we will essentially need a dataset that contains the 6 boroughs and the neighborhoods that exist in each borough as well as the the latitude and logitude coordinates of each neighborhood. 

Luckily, this dataset exists for free on the web in Open Data from Toronto https://open.toronto.ca/

In [1]:
import pandas as pd
import geopandas as gpd
import ssl
import numpy as np
ssl._create_default_https_context=ssl._create_unverified_context

from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

import folium # map rendering library

############### Borough Dadaset##############
url = "https://ckan0.cf.opendata.inter.prod-toronto.ca/download_resource/f82dbe76-928e-4cec-8147-a21882f575e2?format=geojson&projection=4326"
boroughTnt=gpd.read_file(url)

############### Neighbordataset Dadaset############## 
url="https://ckan0.cf.opendata.inter.prod-toronto.ca/download_resource/a083c865-6d60-4d1d-b6c6-b0c8a85f9c15?format=geojson&projection=4326"
neighbTnt=gpd.read_file(url)
print(boroughTnt)
print(neighbTnt)


   _id  AREA_ID DATE_EFFECTIVE  AREA_ATTR_ID  PARENT_AREA_ID  AREA_SHORT_CODE  \
0  223    49884           None         49884           49886               14   
1  224   643704           None        643704           49886                8   
2  225    49887           None         49887           49886                6   
3  226   435733           None        435733           49886                1   
4  227  1094349           None       1094349           49886               19   
5  228   760645           None        760645           49886                4   

   AREA_LONG_CODE    AREA_NAME    AREA_DESC     X     Y  LONGITUDE   LATITUDE  \
0              14         YORK         YORK  None  None -79.477566  43.685081   
1               8   NORTH YORK   NORTH YORK  None  None -79.430521  43.751872   
2               6    EAST YORK    EAST YORK  None  None -79.337052  43.700623   
3               1  SCARBOROUGH  SCARBOROUGH  None  None -79.236990  43.778078   
4              19    ETOBIC

#### We change some colummns just to make it look better

In [2]:
borough_names = boroughTnt[['geometry', 'AREA_NAME']]
neighb_names = neighbTnt[['AREA_NAME','geometry','LONGITUDE','LATITUDE']]
borough_names=borough_names.rename(columns={'AREA_NAME':'borough'})
neighb_names=neighb_names.rename(columns={'AREA_NAME':'neighborhood'})

In order to join both datasets (neighborhoods and boroughs) and get one where each neighborhood match with its borough, we use the method *overlay* from GeoPandas Library

In [3]:
from shapely.ops import cascaded_union
borough_with_neighbo = gpd.overlay(neighb_names,borough_names,how='intersection',make_valid=True, keep_geom_type=True)
borough_with_neighbo.shape

(139, 5)

As you see, we lost a row because we had got 140. So, we have to check and noticed that the neighborhood *Forest Hill South (101)* share two boroughs. In order to this exercise, we are going to assign one of them: York

In [4]:
#Forest Hill South (101)
missingData=neighb_names[neighb_names['neighborhood']=='Forest Hill South (101)']
missingData

Unnamed: 0,neighborhood,geometry,LONGITUDE,LATITUDE
118,Forest Hill South (101),"POLYGON ((-79.42556 43.70099, -79.42314 43.701...",-79.414318,43.694526


In [5]:
newData=gpd.GeoDataFrame({'neighborhood':'Forest Hill South (101)','borough':'YORK','LATITUDE':missingData.LATITUDE,'LONGITUDE':missingData.LONGITUDE,'geometry':missingData.geometry})
borough_with_neighbo=borough_with_neighbo.append(newData)
borough_with_neighbo.drop('geometry',axis=1,inplace=True)
borough_with_neighbo

Unnamed: 0,neighborhood,LONGITUDE,LATITUDE,borough
0,Lambton Baby Point (114),-79.496045,43.657420,YORK
1,Mount Dennis (115),-79.499989,43.688144,YORK
2,Oakwood Village (107),-79.439785,43.688566,YORK
3,Rockcliffe-Smythe (111),-79.494420,43.674790,YORK
4,Runnymede-Bloor West Village (89),-79.485708,43.659269,YORK
...,...,...,...,...
135,Highland Creek (134),-79.177472,43.790775,SCARBOROUGH
136,Ionview (125),-79.272470,43.735364,SCARBOROUGH
137,Kennedy Park (124),-79.260382,43.725556,SCARBOROUGH
138,L'Amoreaux (117),-79.314084,43.795716,SCARBOROUGH


And make sure that the dataset has all 6 boroughs and 140 neighborhoods.

In [6]:
print('The dataframe has {} boroughs and {} neighborhoods.'.format(
        len(borough_with_neighbo['borough'].unique()),
        borough_with_neighbo.shape[0]
    )
)

The dataframe has 6 boroughs and 140 neighborhoods.


#### Use geopy library to get the latitude and longitude values of Toronto,

In order to define an instance of the geocoder, we need to define a user_agent. We will name our agent <em>tn_explorer</em>, as shown below.

In [7]:
address = 'Toronto'

geolocator = Nominatim(user_agent="tn_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geograpical coordinate of Toronto are {}, {}.'.format(latitude, longitude))

The geograpical coordinate of Toronto are 43.6534817, -79.3839347.


#### Create a map of Toronto with neighborhoods superimposed on top.

In [8]:
# create map of Toronto using latitude and longitude values
map_toronto = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, borough, neighborhood in zip(borough_with_neighbo['LATITUDE'], borough_with_neighbo['LONGITUDE'], borough_with_neighbo['borough'], borough_with_neighbo['neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto)  
    
map_toronto

For illustration purposes, let's simplify the above map and segment and cluster only the neighborhoods in Old Toronto. So let's slice the original dataframe and create a new dataframe of the Old Toronto data.

In [9]:
oldToronto_data = borough_with_neighbo[borough_with_neighbo['borough'] == 'TORONTO'].reset_index(drop=True)
oldToronto_data.head()

Unnamed: 0,neighborhood,LONGITUDE,LATITUDE,borough
0,Wychwood (94),-79.425515,43.676919,TORONTO
1,Yonge-Eglinton (100),-79.40359,43.704689,TORONTO
2,Yonge-St.Clair (97),-79.397871,43.687859,TORONTO
3,Lawrence Park North (105),-79.403978,43.73006,TORONTO
4,Lawrence Park South (103),-79.406039,43.717212,TORONTO


In [10]:
address = 'Toronto, Toronto'

geolocator = Nominatim(user_agent="tn_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geograpical coordinate of Old Toronto are {}, {}.'.format(latitude, longitude))

The geograpical coordinate of Old Toronto are 43.6534817, -79.3839347.


As we did with all of Toronto, let's visualizat Old Toronto the neighborhoods in it.

In [11]:
# create map of York using latitude and longitude values
map_oldToronto = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, label in zip(oldToronto_data['LATITUDE'], oldToronto_data['LONGITUDE'], oldToronto_data['neighborhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_oldToronto)  
    
map_oldToronto

Next, we are going to start utilizing the Foursquare API to explore the neighborhoods and segment them.

#### Define Foursquare Credentials and Version

In [12]:
from IPython.display import HTML
from IPython.display import display
# Taken from https://stackoverflow.com/questions/31517194/how-to-hide-one-specific-cell-input-or-output-in-ipython-notebook
tag = HTML('''<script>
code_show=true; 
function code_toggle() {
    if (code_show){
        $('div.cell.code_cell.rendered.selected div.input').hide();
    } else {
        $('div.cell.code_cell.rendered.selected div.input').show();
    }
    code_show = !code_show
} 
$( document ).ready(code_toggle);
</script>
To show/hide this cell's raw code input, click <a href="javascript:code_toggle()">here</a>.''')
display(tag)
############### Write code below ##################

CLIENT_ID = 'SU2JR0C4CYVWHTTDNCIA34TLHCCZFRBGB4IFOWV01NNMI2T5' # your Foursquare ID
CLIENT_SECRET = 'DW0HD4FQYRUFWPVDHKYEDYHLSWF5LIQ3AY05JGEHLAC4GIG1' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

print('Your credentails:')
#print('CLIENT_ID: ' + CLIENT_ID)
#print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentails:


#### Let's explore the first neighborhood in our dataframe.

Get the neighborhood's name.

In [13]:
oldToronto_data.loc[0, 'neighborhood']

'Wychwood (94)'

Get the neighborhood's latitude and longitude values.

In [14]:
neighborhood_latitude = oldToronto_data.loc[0, 'LATITUDE'] # neighborhood latitude value
neighborhood_longitude = oldToronto_data.loc[0, 'LONGITUDE'] # neighborhood longitude value

neighborhood_name = oldToronto_data.loc[0, 'neighborhood'] # neighborhood name

print('Latitude and longitude values of {} are {}, {}.'.format(neighborhood_name, 
                                                               neighborhood_latitude, 
                                                               neighborhood_longitude))

Latitude and longitude values of Wychwood (94) are 43.6769192679, -79.425514947.


#### Now, let's get the top 100 venues that are in Wychwood within a radius of 500 meters.


In [15]:
LIMIT = 100 # limit of number of venues returned by Foursquare API
radius = 500 # define radius

 # create URL
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    neighborhood_latitude, 
    neighborhood_longitude, 
    radius, 
    LIMIT)
url # display URL

'https://api.foursquare.com/v2/venues/explore?&client_id=SU2JR0C4CYVWHTTDNCIA34TLHCCZFRBGB4IFOWV01NNMI2T5&client_secret=DW0HD4FQYRUFWPVDHKYEDYHLSWF5LIQ3AY05JGEHLAC4GIG1&v=20180605&ll=43.6769192679,-79.425514947&radius=500&limit=100'

In [16]:
results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '5ec2c6f30f5968001c96c53c'},
 'response': {'headerLocation': 'Bracondale Hill',
  'headerFullLocation': 'Bracondale Hill, Toronto',
  'headerLocationGranularity': 'neighborhood',
  'totalResults': 4,
  'suggestedBounds': {'ne': {'lat': 43.6814192724, 'lng': -79.41930460333964},
   'sw': {'lat': 43.672419263399995, 'lng': -79.43172529066035}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'reasons': {'count': 0,
       'items': [{'summary': 'This spot is popular',
         'type': 'general',
         'reasonName': 'globalInteractionReason'}]},
      'venue': {'id': '4b86e89df964a52051a531e3',
       'name': "Wychwood Barns Farmers' Market",
       'location': {'address': '601 Christie Street',
        'crossStreet': 'St Clair Avenue West',
        'lat': 43.68001040153905,
        'lng': -79.42384857341463,
        'labeledLatLngs': [{'label': 'display',
          'lat': 43.68001040153905,
          'lng': -79.4

Before we proceed, let's borrow the **get_category_type** function from the Foursquare lab.

In [17]:
# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

Now we are ready to clean the json and structure it into a *pandas* dataframe.

In [18]:
venues = results['response']['groups'][0]['items']
    
nearby_venues = pd.json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues =nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head()

Unnamed: 0,name,categories,lat,lng
0,Wychwood Barns Farmers' Market,Farmers Market,43.68001,-79.423849
1,Wychwood Barns,Event Space,43.680028,-79.42381
2,Hillcrest Park,Park,43.676012,-79.424787
3,Wychwood Barns Community Gallery,Art Gallery,43.679386,-79.424254


And how many venues were returned by Foursquare?

In [19]:
print('{} venues were returned by Foursquare.'.format(nearby_venues.shape[0]))

4 venues were returned by Foursquare.


## 2. Explore Neighborhoods in Old Toronto


#### Let's create a function to repeat the same process to all the neighborhoods in Old Toronto


In [20]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

#### Now write the code to run the above function on each neighborhood and create a new dataframe called *oldToronto_venues*.

In [21]:
oldToronto_venues = getNearbyVenues(names=oldToronto_data['neighborhood'],
                                   latitudes=oldToronto_data['LATITUDE'],
                                   longitudes=oldToronto_data['LONGITUDE']
                                  )

Wychwood (94)
Yonge-Eglinton (100)
Yonge-St.Clair (97)
Lawrence Park North (105)
Lawrence Park South (103)
Little Portugal (84)
Moss Park (73)
Mount Pleasant East (99)
Mount Pleasant West (104)
Niagara (82)
North Riverdale (68)
North St.James Town (74)
Palmerston-Little Italy (80)
Playter Estates-Danforth (67)
Regent Park (72)
Roncesvalles (86)
Rosedale-Moore Park (98)
South Parkdale (85)
South Riverdale (70)
The Beaches (63)
Trinity-Bellwoods (81)
University (79)
Waterfront Communities-The Island (77)
Weston-Pellam Park (91)
Woodbine Corridor (64)
Annex (95)
Bay Street Corridor (76)
Blake-Jones (69)
Cabbagetown-South St.James Town (71)
Casa Loma (96)
Church-Yonge Corridor (75)
Corso Italia-Davenport (92)
Danforth (66)
Dovercourt-Wallace Emerson-Junction (93)
Dufferin Grove (83)
East End-Danforth (62)
Forest Hill North (102)
Greenwood-Coxwell (65)
High Park North (88)
High Park-Swansea (87)
Junction Area (90)
Kensington-Chinatown (78)


#### Let's check the size of the resulting dataframe

In [22]:
print(oldToronto_venues.shape)
oldToronto_venues.head()

(1281, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Wychwood (94),43.676919,-79.425515,Wychwood Barns Farmers' Market,43.68001,-79.423849,Farmers Market
1,Wychwood (94),43.676919,-79.425515,Wychwood Barns,43.680028,-79.42381,Event Space
2,Wychwood (94),43.676919,-79.425515,Hillcrest Park,43.676012,-79.424787,Park
3,Wychwood (94),43.676919,-79.425515,Wychwood Barns Community Gallery,43.679386,-79.424254,Art Gallery
4,Yonge-Eglinton (100),43.704689,-79.40359,North Toronto Memorial Community Centre,43.706098,-79.404337,Gym


Let's check how many venues were returned for each neighborhood

In [23]:
oldToronto_venues.groupby('Neighborhood').count()

Unnamed: 0_level_0,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Annex (95),27,27,27,27,27,27
Bay Street Corridor (76),63,63,63,63,63,63
Blake-Jones (69),17,17,17,17,17,17
Cabbagetown-South St.James Town (71),45,45,45,45,45,45
Casa Loma (96),12,12,12,12,12,12
Church-Yonge Corridor (75),100,100,100,100,100,100
Corso Italia-Davenport (92),21,21,21,21,21,21
Danforth (66),28,28,28,28,28,28
Dovercourt-Wallace Emerson-Junction (93),12,12,12,12,12,12
Dufferin Grove (83),64,64,64,64,64,64


#### Let's find out how many unique categories can be curated from all the returned venues

In [24]:
print('There are {} uniques categories.'.format(len(oldToronto_venues['Venue Category'].unique())))

There are 233 uniques categories.


## 3. Analyze Each Neighborhood

In [25]:
# one hot encoding
oldToronto_onehot = pd.get_dummies(oldToronto_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
oldToronto_onehot['Neighborhood'] = oldToronto_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [oldToronto_onehot.columns[-1]] + list(oldToronto_onehot.columns[:-1])
oldToronto_onehot = oldToronto_onehot[fixed_columns]

oldToronto_onehot.head()

Unnamed: 0,Zoo,Accessories Store,American Restaurant,Amphitheater,Animal Shelter,Antique Shop,Arcade,Art Gallery,Art Museum,Arts & Crafts Store,...,Trail,Train Station,Tree,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Wine Bar,Women's Store,Yoga Studio
0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,1,0,0,...,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [26]:
oldToronto_onehot.shape

(1281, 233)

#### Next, let's group rows by neighborhood and by taking the mean of the frequency of occurrence of each category

In [27]:
oldToronto_grouped = oldToronto_onehot.groupby('Neighborhood').mean().reset_index()
oldToronto_grouped

Unnamed: 0,Neighborhood,Zoo,Accessories Store,American Restaurant,Amphitheater,Animal Shelter,Antique Shop,Arcade,Art Gallery,Art Museum,...,Trail,Train Station,Tree,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Wine Bar,Women's Store,Yoga Studio
0,Annex (95),0.0,0.0,0.037037,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.037037,0.0,0.0,0.0,0.0,0.0,0.0
1,Bay Street Corridor (76),0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.015873,...,0.0,0.0,0.0,0.0,0.015873,0.0,0.015873,0.0,0.0,0.015873
2,Blake-Jones (69),0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Cabbagetown-South St.James Town (71),0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Casa Loma (96),0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,Church-Yonge Corridor (75),0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,...,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.01
6,Corso Italia-Davenport (92),0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
7,Danforth (66),0.0,0.0,0.071429,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
8,Dovercourt-Wallace Emerson-Junction (93),0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9,Dufferin Grove (83),0.0,0.015625,0.0,0.0,0.0,0.0,0.0,0.015625,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.03125,0.0


#### Let's confirm the new size

In [28]:
oldToronto_grouped.shape

(42, 233)

In [29]:
num_top_venues = 5

for hood in oldToronto_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = oldToronto_grouped[oldToronto_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Annex (95)----
            venue  freq
0  Sandwich Place  0.11
1            Café  0.11
2             Pub  0.07
3     Pizza Place  0.04
4     Social Club  0.04


----Bay Street Corridor (76)----
                venue  freq
0         Coffee Shop  0.19
1      Sandwich Place  0.05
2  Italian Restaurant  0.05
3          Restaurant  0.05
4     Thai Restaurant  0.03


----Blake-Jones (69)----
              venue  freq
0              Café  0.18
1       Coffee Shop  0.12
2  Asian Restaurant  0.06
3        Nail Salon  0.06
4         Bookstore  0.06


----Cabbagetown-South St.James Town (71)----
                venue  freq
0          Restaurant  0.07
1         Pizza Place  0.07
2         Coffee Shop  0.07
3              Bakery  0.04
4  Italian Restaurant  0.04


----Casa Loma (96)----
                        venue  freq
0                        Park  0.25
1              History Museum  0.17
2                      Bistro  0.08
3                      Castle  0.08
4  Modern European Restaurant  

#### Let's put that into a *pandas* dataframe
But first define a function to sort the venues in descending order.

In [30]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

Now let's create the new dataframe and display the top 10 venues for each neighborhood.

In [31]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] =oldToronto_grouped['Neighborhood']

for ind in np.arange(oldToronto_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(oldToronto_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Annex (95),Sandwich Place,Café,Pub,Asian Restaurant,Cheese Shop,Donut Shop,Social Club,Indian Restaurant,Pizza Place,BBQ Joint
1,Bay Street Corridor (76),Coffee Shop,Sandwich Place,Restaurant,Italian Restaurant,Thai Restaurant,Burger Joint,Salad Place,Bar,Bubble Tea Shop,Hotel
2,Blake-Jones (69),Café,Coffee Shop,Toy / Game Store,Music School,Gastropub,Nail Salon,Bookstore,Asian Restaurant,Beer Bar,Burger Joint
3,Cabbagetown-South St.James Town (71),Coffee Shop,Restaurant,Pizza Place,Park,Café,Pub,Bakery,Italian Restaurant,Farm,Beer Store
4,Casa Loma (96),Park,History Museum,Bistro,Castle,Museum,Modern European Restaurant,Café,Steakhouse,Historic Site,Grocery Store


## 4. Cluster Neighborhoods
Run *k*-means to cluster the neighborhood into 3 clusters.

In [32]:
# set number of clusters
kclusters = 3

oldToronto_grouped_clustering = oldToronto_grouped.drop('Neighborhood', 1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(oldToronto_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([2, 2, 2, 2, 0, 2, 2, 2, 0, 2], dtype=int32)

Let's create a new dataframe that includes the cluster as well as the top 10 venues for each neighborhood.

In [33]:
neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Annex (95),Sandwich Place,Café,Pub,Asian Restaurant,Cheese Shop,Donut Shop,Social Club,Indian Restaurant,Pizza Place,BBQ Joint
1,Bay Street Corridor (76),Coffee Shop,Sandwich Place,Restaurant,Italian Restaurant,Thai Restaurant,Burger Joint,Salad Place,Bar,Bubble Tea Shop,Hotel
2,Blake-Jones (69),Café,Coffee Shop,Toy / Game Store,Music School,Gastropub,Nail Salon,Bookstore,Asian Restaurant,Beer Bar,Burger Joint
3,Cabbagetown-South St.James Town (71),Coffee Shop,Restaurant,Pizza Place,Park,Café,Pub,Bakery,Italian Restaurant,Farm,Beer Store
4,Casa Loma (96),Park,History Museum,Bistro,Castle,Museum,Modern European Restaurant,Café,Steakhouse,Historic Site,Grocery Store


In [34]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

oldToronto_merged = oldToronto_data

# merge toronto_grouped with toronto_data to add latitude/longitude for each neighborhood
oldToronto_merged = oldToronto_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='neighborhood')

#oldtoronto_merged.head() # check the last columns!
neighborhoods_venues_sorted.head()

Unnamed: 0,Cluster Labels,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,2,Annex (95),Sandwich Place,Café,Pub,Asian Restaurant,Cheese Shop,Donut Shop,Social Club,Indian Restaurant,Pizza Place,BBQ Joint
1,2,Bay Street Corridor (76),Coffee Shop,Sandwich Place,Restaurant,Italian Restaurant,Thai Restaurant,Burger Joint,Salad Place,Bar,Bubble Tea Shop,Hotel
2,2,Blake-Jones (69),Café,Coffee Shop,Toy / Game Store,Music School,Gastropub,Nail Salon,Bookstore,Asian Restaurant,Beer Bar,Burger Joint
3,2,Cabbagetown-South St.James Town (71),Coffee Shop,Restaurant,Pizza Place,Park,Café,Pub,Bakery,Italian Restaurant,Farm,Beer Store
4,0,Casa Loma (96),Park,History Museum,Bistro,Castle,Museum,Modern European Restaurant,Café,Steakhouse,Historic Site,Grocery Store


Finally, let's visualize the resulting clusters

In [35]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(oldToronto_merged['LATITUDE'], oldToronto_merged['LONGITUDE'], oldToronto_merged['neighborhood'], oldToronto_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

## 5. Examine Clusters

Now, we can examine each cluster and determine the discriminating venue categories that distinguish each cluster. Based on the defining categories, we can then assign a name to each cluster.

#### Cluster 1 (Sights)

In [36]:
oldToronto_merged.loc[oldToronto_merged['Cluster Labels'] == 0, oldToronto_merged.columns[[0] + list(range(4, oldToronto_merged.shape[1]))]]

Unnamed: 0,neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Wychwood (94),0,Park,Farmers Market,Art Gallery,Event Space,Yoga Studio,Eastern European Restaurant,Fish Market,Fish & Chips Shop,Filipino Restaurant,Fast Food Restaurant
10,North Riverdale (68),0,Park,Café,Dog Run,Pool,Eastern European Restaurant,Fish & Chips Shop,Filipino Restaurant,Fast Food Restaurant,Farmers Market,Farm
16,Rosedale-Moore Park (98),0,Tennis Court,Park,Playground,Candy Store,Yoga Studio,Filipino Restaurant,Fast Food Restaurant,Farmers Market,Farm,Falafel Restaurant
23,Weston-Pellam Park (91),0,Park,Vietnamese Restaurant,Bakery,Deli / Bodega,Burger Joint,Fish Market,Snack Place,Latin American Restaurant,Jewelry Store,Brazilian Restaurant
29,Casa Loma (96),0,Park,History Museum,Bistro,Castle,Museum,Modern European Restaurant,Café,Steakhouse,Historic Site,Grocery Store
33,Dovercourt-Wallace Emerson-Junction (93),0,Park,Bakery,Pizza Place,Pool,Café,Middle Eastern Restaurant,Bar,Bank,Pharmacy,Grocery Store
38,High Park North (88),0,Pharmacy,Tennis Court,Park,Café,Metro Station,Food Truck,Convenience Store,Gym / Fitness Center,Gym,Dive Bar
39,High Park-Swansea (87),0,Baseball Field,Amphitheater,Zoo,Other Great Outdoors,Dog Run,Café,Scenic Lookout,Skating Rink,Lake,Garden


#### Cluster 2 (Plans)

In [37]:

oldToronto_merged.loc[oldToronto_merged['Cluster Labels'] == 1, oldToronto_merged.columns[[0] + list(range(4, oldToronto_merged.shape[1]))]]


Unnamed: 0,neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
22,Waterfront Communities-The Island (77),1,Boat or Ferry,Yoga Studio,Egyptian Restaurant,Flea Market,Fish Market,Fish & Chips Shop,Filipino Restaurant,Fast Food Restaurant,Farmers Market,Farm


#### Cluster 3 (To Eat)

In [38]:
oldToronto_merged.loc[oldToronto_merged['Cluster Labels'] == 2, oldToronto_merged.columns[[0] + list(range(4, oldToronto_merged.shape[1]))]]

Unnamed: 0,neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,Yonge-Eglinton (100),2,Coffee Shop,Fast Food Restaurant,Restaurant,Gym,Burger Joint,Persian Restaurant,Salad Place,Liquor Store,Sandwich Place,Caribbean Restaurant
2,Yonge-St.Clair (97),2,Coffee Shop,Italian Restaurant,Grocery Store,Sushi Restaurant,Restaurant,Gym,Thai Restaurant,Sandwich Place,Bank,Bagel Shop
3,Lawrence Park North (105),2,Sushi Restaurant,Italian Restaurant,Bakery,Asian Restaurant,Coffee Shop,Pizza Place,Bank,Pub,Sandwich Place,Burger Joint
4,Lawrence Park South (103),2,Sporting Goods Shop,Spa,Coffee Shop,Chinese Restaurant,Dessert Shop,Rental Car Location,Flower Shop,Fast Food Restaurant,Farmers Market,Eastern European Restaurant
5,Little Portugal (84),2,Bar,Café,Restaurant,Coffee Shop,Bakery,Korean Restaurant,Cocktail Bar,Breakfast Spot,Athletics & Sports,Sports Bar
6,Moss Park (73),2,Sandwich Place,Coffee Shop,Thai Restaurant,Italian Restaurant,Yoga Studio,Food Truck,Breakfast Spot,Skating Rink,Event Space,Pub
7,Mount Pleasant East (99),2,Dessert Shop,Italian Restaurant,Gym,Sandwich Place,Coffee Shop,Café,Pizza Place,Thai Restaurant,Diner,Indian Restaurant
8,Mount Pleasant West (104),2,Coffee Shop,Italian Restaurant,Café,Sushi Restaurant,Pizza Place,Dessert Shop,Restaurant,Bar,Middle Eastern Restaurant,Fast Food Restaurant
9,Niagara (82),2,Café,Park,Restaurant,Theme Park,Bar,Thai Restaurant,Pub,Poke Place,Beer Store,Middle Eastern Restaurant
11,North St.James Town (74),2,Coffee Shop,Grocery Store,Sandwich Place,Pizza Place,Food & Drink Shop,Intersection,Japanese Restaurant,Bistro,Steakhouse,Library
