# Segmenting and Clustering Neighborhoods in Toronto using Foursquare APIs

This assignment explores, segments, and clusters the neighborhoods of the city of Toronto. Unlike the New York case, the neighborhood data is not readily available in a structured form on the internet.

The information we need is on a Wikipedia page (https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M). We scrape that page, wrangle and clean the data, and read it into a pandas dataframe so that it has the same structured format as the New York dataset.

Once the data is in a structured format, we can replicate the analysis done on the New York City dataset to explore and cluster the neighborhoods of Toronto.
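
As a rough illustration of the scraping step described above, the following sketch pulls rows out of an HTML table using only the standard library. The `SAMPLE_HTML` string and the `TableRows` class are hypothetical stand-ins for illustration. The notebook itself fetches the live page and uses BeautifulSoup and lxml instead.

```python
from html.parser import HTMLParser

# Hypothetical sample that mimics the three-column postal code table.
SAMPLE_HTML = """
<table class="wikitable sortable">
<tr><th>Postcode</th><th>Borough</th><th>Neighbourhood
</th></tr>
<tr><td>M1A</td><td>Not assigned</td><td>Not assigned
</td></tr>
<tr><td>M3A</td><td>North York</td><td>Parkwoods
</td></tr>
</table>
"""


class TableRows(HTMLParser):
    """Collect the text of every <th>/<td> cell, grouped by <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._cell = None  # text pieces of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag in ("td", "th"):
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # strip() also removes the trailing newline seen in the last column
            self.rows[-1].append("".join(self._cell).strip())
            self._cell = None


parser = TableRows()
parser.feed(SAMPLE_HTML)
header, *records = parser.rows
print(header)
print(records)
```

The first row holds the column names and the remaining rows hold the data, which is the same split the notebook makes further down.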

In [1]:
import requests  # fetch the Wikipedia page
import lxml.html as lxml  # parse the page into an element tree
import pandas as pd
import numpy as np
from bs4 import BeautifulSoup  # locate the table in the page

## 1: Getting the Wikipedia data into the system

Below is the URL of the Wikipedia page:

In [2]:
url='https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'  # URL of the target wikipedia page

In [3]:
# Request the wiki page before using BeautifulSoup to scrape
website_URL=requests.get(url).text
#website_URL

In [4]:
#Create soup using BS
soup = BeautifulSoup(website_URL,"lxml")

The following cell is for diagnostic purposes only.

In [5]:
#
#print(soup.prettify())

In [6]:
Table = soup.find("table",{"class":"wikitable sortable"}) 

#Create a handle, page, to handle the contents of the website
page = requests.get(url)

#Store the contents of the website under doc
doc = lxml.fromstring(page.content)

In [7]:
doc

<Element html at 0x7fd40b652598>

Now we have read the webpage content into the system and stored it in doc.

## 2: Data wrangling & cleaning and transforming

In [8]:
withlink=Table.findAll('a')

### 2.1 Read only the table contents between <tr>..</tr> of HTML

In [9]:
tr_elements = doc.xpath('//tr')
#Check the number of cells in each of the first 12 rows
[len(T) for T in tr_elements[:12]]

In [10]:
tr_elements[1:5]

[<Element tr at 0x7fd40b5f52c8>,
 <Element tr at 0x7fd40b5f5598>,
 <Element tr at 0x7fd40b5f55e8>,
 <Element tr at 0x7fd40b5f5638>]

In [11]:
#Create empty list
Dict=[]
col=[]
i=0

#For the first row, store each first element (header) and an empty list
for t in tr_elements[0]:
    i+=1
    name=t.text_content()
    print ('%d:"%s"'%(i,name))
    col.append((name,[]))

1:"Postcode"
2:"Borough"
3:"Neighbourhood
"


In [12]:
#first row is the header, data is stored on the second row onwards
for j in range(1,len(tr_elements)):
    #T is our j'th row
    T=tr_elements[j]
    
    #If row is not of size 3, the //tr data is not from our table 
    if len(T)!=3:
        break
    
    #i is the index of our column
    i=0
    
    #Iterate through each element of the row
    for t in T.iterchildren():
        data=t.text_content() 
        #Leave the first column (the postcode) as text
        if i>0:
            #Convert any numerical value to an integer; keep text as-is
            try:
                data=int(data)
            except ValueError:
                pass
        #Append the data to the empty list of the i'th column
        col[i][1].append(data)
        #Increment i for the next column
        i+=1    
[len(C) for (title,C) in col]
Dict={title:column for (title,column) in col}   # Dictionary with Postcodes, Borough and Neighbourhoods
df=pd.DataFrame(Dict)  # Creating the pandas Dataframe using the dictionary values

In [13]:
df.head()
df.columns

Index(['Postcode', 'Borough', 'Neighbourhood\n'], dtype='object')
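
The column index above shows that the last header still carries a trailing `\n`, and the cell values in that column do too. A minimal sketch of one way to normalize both, assuming a small hypothetical frame named `raw` with the same shape as the scraped one:

```python
import pandas as pd

# Hypothetical two-row frame that mimics the scraped table.
raw = pd.DataFrame({
    "Postcode": ["M3A", "M4A"],
    "Borough": ["North York", "North York"],
    "Neighbourhood\n": ["Parkwoods\n", "Victoria Village\n"],
})

clean = raw.copy()
# Strip surrounding whitespace from the header names ...
clean.columns = clean.columns.str.strip()
# ... and from the text values of the affected column.
clean["Neighbourhood"] = clean["Neighbourhood"].str.strip()

print(list(clean.columns))
print(clean)
```

The notebook takes a different route (renaming the columns by hand and slicing off the last character), which gives the same result for this table.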

### 2.2 Data cleaning
Including:
#### Remove "Not assigned" boroughs
#### Rename the columns
#### Drop the trailing `\n` from Neighbourhood
#### Combine duplicated postal codes

In [14]:
df_Borough=df[df.Borough != 'Not assigned']   

In [15]:
df_Borough.columns = ['Postcode', 'Borough', 'Neighbourhood']

In [17]:
df_Borough_new=df_Borough[['Postcode', 'Borough']].copy()
df_Borough_new['Neighbourhood']=df_Borough['Neighbourhood'].str.rstrip('\n')  # drop the trailing newline

In [18]:
df_Borough_new.head()

Unnamed: 0,Postcode,Borough,Neighbourhood
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,Harbourfront
5,M5A,Downtown Toronto,Regent Park
6,M6A,North York,Lawrence Heights


In [19]:
df_Borough_new.loc[df_Borough_new.Neighbourhood == 'Not assigned', 'Neighbourhood'] = df_Borough_new.Borough

In [20]:
df_new=df_Borough_new.groupby('Postcode', sort=False)['Neighbourhood'].apply(','.join)  # group neighbourhoods by Postcode

In [21]:
df_new.head()

Postcode
M3A                          Parkwoods
M4A                   Victoria Village
M5A           Harbourfront,Regent Park
M6A    Lawrence Heights,Lawrence Manor
M7A                       Queen's Park
Name: Neighbourhood, dtype: object

In [22]:
df_new_FINAL=pd.DataFrame({'Postcode':df_new.index, 'df_new':df_new.values})
df_new_F=df_new_FINAL.rename(columns={ df_new_FINAL.columns[1]: "Neighbourhood" })
df_N=df_new_F['Neighbourhood'].tolist() # create a neighbourhood list

In [23]:
df_Borough_BONLY=df_Borough['Borough'] # Series of boroughs
df_Borough_PONLY=df_Borough['Postcode'] # Series of postcodes

In [24]:
df_Borough_BandP=pd.concat([df_Borough_PONLY, df_Borough_BONLY], axis=1)
df_Borough_BandP_new = df_Borough_BandP.drop_duplicates(subset=['Postcode','Borough'])  # drop duplicates

In [25]:
df_P=df_Borough_BandP_new['Postcode'].tolist() # refined lists
df_B=df_Borough_BandP_new['Borough'].tolist() # refined lists

In [26]:
Toronto_DICT= {'Postcode':df_P,'Borough':df_B, 'Neighbourhood':df_N} # Create a dictionary with the 3 refined lists
Toronto_df=pd.DataFrame(Toronto_DICT) # create the dataframe
# "List of Postcodes, Boroughs, & Neighbourhoods in TORONTO FOR 103 POSTCODES"
Toronto_df

Unnamed: 0,Postcode,Borough,Neighbourhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,"Harbourfront,Regent Park"
3,M6A,North York,"Lawrence Heights,Lawrence Manor"
4,M7A,Queen's Park,Queen's Park
5,M9A,Etobicoke,Islington Avenue
6,M1B,Scarborough,"Rouge,Malvern"
7,M3B,North York,Don Mills North
8,M4B,East York,"Woodbine Gardens,Parkview Hill"
9,M5B,Downtown Toronto,"Ryerson,Garden District"
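
The cells above build the final table in several steps (group the neighbourhoods, de-duplicate the postcode/borough pairs, then reassemble three lists). As a hedged alternative, the same result can be sketched in a single `groupby(...).agg(...)` call. The `rows` frame below is a hypothetical sample, not the scraped data.

```python
import pandas as pd

# Hypothetical sample: one duplicated postcode and one unassigned neighbourhood.
rows = pd.DataFrame({
    "Postcode": ["M3A", "M5A", "M5A", "M7A"],
    "Borough": ["North York", "Downtown Toronto", "Downtown Toronto", "Queen's Park"],
    "Neighbourhood": ["Parkwoods", "Harbourfront", "Regent Park", "Not assigned"],
})

# An unassigned neighbourhood takes the name of its borough.
mask = rows["Neighbourhood"] == "Not assigned"
rows.loc[mask, "Neighbourhood"] = rows.loc[mask, "Borough"]

# One row per postcode: keep the first borough, join the neighbourhood names.
combined = (
    rows.groupby("Postcode", sort=False)
        .agg({"Borough": "first", "Neighbourhood": ",".join})
        .reset_index()
)
print(combined)
```

Keeping everything in one frame avoids relying on three separately built lists staying in the same order.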


Now that we have a dataframe with the postal code, borough name, and neighborhood names, we need the latitude and longitude of each postal code in order to use the Foursquare location data.

## 3: Getting the geocodes for the Toronto neighborhoods
Rather than geocoding each postal code ourselves, we take a shortcut and use the provided CSV file of coordinates.

In [27]:
LandLfile = 'http://cocl.us/Geospatial_data'
Col_headers = ["Postcode","Latitude","Longitude"]  # column names matching Toronto_df

df_Geo_Toronto = pd.read_csv(LandLfile,names=Col_headers,skiprows=1)  # latitude and longitude per postcode

In [28]:
df_Geo_Toronto.head()

Unnamed: 0,Postcode,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


In [29]:
Toronto_df.head()

Unnamed: 0,Postcode,Borough,Neighbourhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,"Harbourfront,Regent Park"
3,M6A,North York,"Lawrence Heights,Lawrence Manor"
4,M7A,Queen's Park,Queen's Park


In [30]:
df_FINAL=pd.merge(Toronto_df, df_Geo_Toronto, on='Postcode', how='inner')  # inner merge on Postcode adds Latitude and Longitude to the earlier dataframe

In [31]:
df_FINAL.head()

Unnamed: 0,Postcode,Borough,Neighbourhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.753259,-79.329656
1,M4A,North York,Victoria Village,43.725882,-79.315572
2,M5A,Downtown Toronto,"Harbourfront,Regent Park",43.65426,-79.360636
3,M6A,North York,"Lawrence Heights,Lawrence Manor",43.718518,-79.464763
4,M7A,Queen's Park,Queen's Park,43.662301,-79.389494
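
An inner merge silently drops any postcode that has no match in the coordinates file. The sketch below shows one way to check for that with a left merge and pandas' `indicator` and `validate` options. The frames are hypothetical samples and the postcode `M9Z` is invented so that one row has no coordinates.

```python
import pandas as pd

# Hypothetical samples; "M9Z" is made up to demonstrate an unmatched row.
neighbourhoods = pd.DataFrame({
    "Postcode": ["M3A", "M4A", "M9Z"],
    "Borough": ["North York", "North York", "Etobicoke"],
})
coords = pd.DataFrame({
    "Postcode": ["M3A", "M4A"],
    "Latitude": [43.753259, 43.725882],
    "Longitude": [-79.329656, -79.315572],
})

checked = pd.merge(
    neighbourhoods,
    coords,
    on="Postcode",
    how="left",              # keep every neighbourhood row
    validate="one_to_one",   # raise if a postcode repeats on either side
    indicator=True,          # adds a "_merge" column describing each row
)

# Postcodes present in the neighbourhood table but absent from the coordinates.
missing = checked.loc[checked["_merge"] == "left_only", "Postcode"].tolist()
print(missing)
```

If `missing` is empty, the inner merge used in the notebook loses no rows.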


## 4: Foursquare and maps
A visual exploration of the neighborhoods in Toronto.

In [32]:
# This is the 3rd section (clustering and neighbourhood analysis)
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

In [33]:
#This step doesn't seem to be needed for IBM Lab
#!conda update conda
#!conda install -c conda-forge geopy --yes # uncomment this line if you haven't completed the Foursquare API lab

In [34]:
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

In [35]:
from pandas import json_normalize # transform JSON into a pandas dataframe (pandas >= 1.0; older versions import it from pandas.io.json)

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

#!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't completed the Foursquare API lab
import folium # map rendering library

df_Toronto_Special=df_FINAL[df_FINAL['Borough'].str.contains("Toronto")]   # select Boroughs with "Toronto"

In [36]:
df_Toronto_Special.shape

(38, 5)

In [37]:
df_Toronto_Special.head()

Unnamed: 0,Postcode,Borough,Neighbourhood,Latitude,Longitude
2,M5A,Downtown Toronto,"Harbourfront,Regent Park",43.65426,-79.360636
9,M5B,Downtown Toronto,"Ryerson,Garden District",43.657162,-79.378937
15,M5C,Downtown Toronto,St. James Town,43.651494,-79.375418
19,M4E,East Toronto,The Beaches,43.676357,-79.293031
20,M5E,Downtown Toronto,Berczy Park,43.644771,-79.373306


In [80]:
address = 'Toronto, Ontario'

geolocator = Nominatim(user_agent="toronto_explorer")  # recent geopy versions require a user_agent; this name is arbitrary
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Toronto are {}, {}.'.format(latitude, longitude))

# create map of Toronto using latitude and longitude values
map_toronto = folium.Map(location=[latitude, longitude], zoom_start=10)

The geographical coordinates of Toronto are 43.653963, -79.387207.


In [None]:
# add markers to map
for lat, lng, borough, neighborhood in zip(df_Toronto_Special['Latitude'], df_Toronto_Special['Longitude'], df_Toronto_Special['Borough'], df_Toronto_Special['Neighbourhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto)  
    
map_toronto

In [78]:
neighborhoods=df_Toronto_Special
#CT_data = neighborhoods[neighborhoods['Borough'] == 'Central Toronto'].reset_index(drop=True)
CT_data = neighborhoods.reset_index(drop=True)
CT_data.head()

Unnamed: 0,Postcode,Borough,Neighbourhood,Latitude,Longitude
0,M5A,Downtown Toronto,"Harbourfront,Regent Park",43.65426,-79.360636
1,M5B,Downtown Toronto,"Ryerson,Garden District",43.657162,-79.378937
2,M5C,Downtown Toronto,St. James Town,43.651494,-79.375418
3,M4E,East Toronto,The Beaches,43.676357,-79.293031
4,M5E,Downtown Toronto,Berczy Park,43.644771,-79.373306


In [38]:
address = 'Central Toronto, Ontario'  # selecting central Toronto (CT) for deep dive

geolocator = Nominatim(user_agent="toronto_explorer")  # recent geopy versions require a user_agent; this name is arbitrary
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Central Toronto are {}, {}.'.format(latitude, longitude))

# create map of Toronto using latitude and longitude values
map_CT = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, borough, neighborhood in zip(CT_data['Latitude'], CT_data['Longitude'], CT_data['Borough'], CT_data['Neighbourhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_CT)  
    
map_CT

The geographical coordinates of Central Toronto are 43.6449033, -79.3818364.


In [79]:
from IPython.display import HTML
from IPython.display import display

# Taken from https://stackoverflow.com/questions/31517194/how-to-hide-one-specific-cell-input-or-output-in-ipython-notebook
tag = HTML('''<script>
code_show=true; 
function code_toggle() {
    if (code_show){
        $('div.cell.code_cell.rendered.selected div.input').hide();
    } else {
        $('div.cell.code_cell.rendered.selected div.input').show();
    }
    code_show = !code_show
} 
$( document ).ready(code_toggle);
</script>
To show/hide this cell's raw code input, click <a href="javascript:code_toggle()">here</a>.''')
display(tag)

CLIENT_ID = 'Your Foursquare ID' # your Foursquare ID
CLIENT_SECRET = 'Your Foursquare Secret' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

#print('Your credentials:')
#print('CLIENT_ID: ' + CLIENT_ID)
#print('CLIENT_SECRET:' + CLIENT_SECRET)

In [80]:
CT_data.shape

(38, 5)

In [81]:
CT_data.loc[0, 'Neighbourhood']

'Harbourfront,Regent Park'

In [82]:
neighborhood_latitude = CT_data.loc[0, 'Latitude'] # neighborhood latitude value
neighborhood_longitude = CT_data.loc[0, 'Longitude'] # neighborhood longitude value

neighborhood_name = CT_data.loc[0, 'Neighbourhood'] # neighborhood name

print('Latitude and longitude values of {} are {}, {}.'.format(neighborhood_name, 
                                                               neighborhood_latitude, 
                                                               neighborhood_longitude))

LIMIT = 100 # limit of number of venues returned by Foursquare API

radius = 500 # define radius

# create URL
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    neighborhood_latitude, 
    neighborhood_longitude, 
    radius, 
    LIMIT)
url # display URL

results = requests.get(url).json()
results

Latitude and longitude values of Harbourfront,Regent Park are 43.6542599, -79.3606359.


{'meta': {'code': 200, 'requestId': '5cc714b9db04f559d6e9333a'},
 'response': {'suggestedFilters': {'header': 'Tap to show:',
   'filters': [{'name': 'Open now', 'key': 'openNow'}]},
  'headerLocation': 'Corktown',
  'headerFullLocation': 'Corktown, Toronto',
  'headerLocationGranularity': 'neighborhood',
  'totalResults': 48,
  'suggestedBounds': {'ne': {'lat': 43.6587599045, 'lng': -79.3544279001486},
   'sw': {'lat': 43.6497598955, 'lng': -79.36684389985142}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'reasons': {'count': 0,
       'items': [{'summary': 'This spot is popular',
         'type': 'general',
         'reasonName': 'globalInteractionReason'}]},
      'venue': {'id': '54ea41ad498e9a11e9e13308',
       'name': 'Roselle Desserts',
       'location': {'address': '362 King St E',
        'crossStreet': 'Trinity St',
        'lat': 43.653446723052674,
        'lng': -79.3620167174383,
        'labeledLatLngs': [{'label': 'display',
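
The request URL above is assembled with string formatting. A minimal sketch of the same query built from a parameter dictionary is shown below. The helper name `build_explore_url` and the placeholder credentials are assumptions for illustration, and no request is sent.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def build_explore_url(client_id, client_secret, version, lat, lng,
                      radius=500, limit=100):
    """Return the explore URL with properly encoded query parameters."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": "{},{}".format(lat, lng),
        "radius": radius,
        "limit": limit,
    }
    return "https://api.foursquare.com/v2/venues/explore?" + urlencode(params)


# Placeholder credentials; substitute real ones before sending a request.
example_url = build_explore_url("YOUR_ID", "YOUR_SECRET", "20180605",
                                43.6542599, -79.3606359)

# Decode the query string again to confirm what would be sent.
query = parse_qs(urlsplit(example_url).query)
print(query["ll"], query["radius"], query["limit"])
```

Encoding through `urlencode` keeps special characters in any value from breaking the query string.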
 

In [83]:
# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']
    
venues = results['response']['groups'][0]['items']
    
nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues =nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues

Unnamed: 0,name,categories,lat,lng
0,Roselle Desserts,Bakery,43.653447,-79.362017
1,Tandem Coffee,Coffee Shop,43.653559,-79.361809
2,Toronto Cooper Koo Family Cherry St YMCA Centre,Gym / Fitness Center,43.653191,-79.357947
3,Morning Glory Cafe,Breakfast Spot,43.653947,-79.361149
4,Body Blitz Spa East,Spa,43.654735,-79.359874
5,Impact Kitchen,Restaurant,43.656369,-79.35698
6,Dominion Pub and Kitchen,Pub,43.656919,-79.358967
7,Figs Breakfast & Lunch,Breakfast Spot,43.655675,-79.364503
8,Corktown Common,Park,43.655618,-79.356211
9,The Distillery Historic District,Historic Site,43.650244,-79.359323
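
To make the flattening step more concrete, the sketch below runs `json_normalize` on two hand-written items shaped like the `items` entries of the response. The second venue ("Unnamed Spot") and its coordinates are invented to show what happens when the category list is empty.

```python
import pandas as pd

# Hand-written items; the second one is hypothetical and has no category.
items = [
    {"venue": {"name": "Roselle Desserts",
               "categories": [{"name": "Bakery"}],
               "location": {"lat": 43.653447, "lng": -79.362017}}},
    {"venue": {"name": "Unnamed Spot",
               "categories": [],
               "location": {"lat": 43.65, "lng": -79.36}}},
]

# Nested keys become dotted column names such as "venue.location.lat".
flat = pd.json_normalize(items)
flat = flat.loc[:, ["venue.name", "venue.categories",
                    "venue.location.lat", "venue.location.lng"]]

# Keep only the first category name, or None when the list is empty.
flat["venue.categories"] = flat["venue.categories"].apply(
    lambda cats: cats[0]["name"] if cats else None
)

# Shorten "venue.location.lat" to "lat", and so on.
flat.columns = [c.split(".")[-1] for c in flat.columns]
print(flat)
```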


## 5: Cluster analysis of Toronto neighborhoods

### 5.1 Explore neighborhoods in Toronto
#### Let's create a function that repeats the same process for all the neighborhoods in Toronto

In [84]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request and keep the list of recommended items
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # keep only the relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    # flatten the per-location lists into a single dataframe
    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return nearby_venues
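
The function above indexes `v['venue']['categories'][0]` directly, which raises an `IndexError` for a venue with an empty category list. The sketch below shows a guarded version of just the row-building step. The helper name `venue_rows` and the sample items are hypothetical, and no API call is made.

```python
def venue_rows(label, lat, lng, items):
    """Turn Foursquare 'items' into flat tuples, tolerating missing categories."""
    rows = []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories") or []
        # Fall back to None instead of failing on an empty list.
        category = categories[0]["name"] if categories else None
        rows.append((
            label,
            lat,
            lng,
            venue["name"],
            venue["location"]["lat"],
            venue["location"]["lng"],
            category,
        ))
    return rows


# Hypothetical items; the second venue has no category on purpose.
sample_items = [
    {"venue": {"name": "Tandem Coffee",
               "categories": [{"name": "Coffee Shop"}],
               "location": {"lat": 43.653559, "lng": -79.361809}}},
    {"venue": {"name": "Unnamed Spot",
               "categories": [],
               "location": {"lat": 43.65, "lng": -79.36}}},
]

sample_rows = venue_rows("Downtown Toronto", 43.65426, -79.360636, sample_items)
for row in sample_rows:
    print(row)
```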

#### Now run the above function on each location and create a new dataframe called *CT_venues*. Note that borough names are passed as `names`, so the *Neighborhood* column of the result holds borough labels.

In [85]:
CT_data.head()

Unnamed: 0,Postcode,Borough,Neighbourhood,Latitude,Longitude
0,M5A,Downtown Toronto,"Harbourfront,Regent Park",43.65426,-79.360636
1,M5B,Downtown Toronto,"Ryerson,Garden District",43.657162,-79.378937
2,M5C,Downtown Toronto,St. James Town,43.651494,-79.375418
3,M4E,East Toronto,The Beaches,43.676357,-79.293031
4,M5E,Downtown Toronto,Berczy Park,43.644771,-79.373306


In [86]:
df_Toronto_Special.head()

Unnamed: 0,Postcode,Borough,Neighbourhood,Latitude,Longitude
2,M5A,Downtown Toronto,"Harbourfront,Regent Park",43.65426,-79.360636
9,M5B,Downtown Toronto,"Ryerson,Garden District",43.657162,-79.378937
15,M5C,Downtown Toronto,St. James Town,43.651494,-79.375418
19,M4E,East Toronto,The Beaches,43.676357,-79.293031
20,M5E,Downtown Toronto,Berczy Park,43.644771,-79.373306


In [87]:
CT_venues = getNearbyVenues(names=CT_data['Borough'],
                                   latitudes=CT_data['Latitude'],
                                   longitudes=CT_data['Longitude']
                                  )

Downtown Toronto
Downtown Toronto
Downtown Toronto
East Toronto
Downtown Toronto
Downtown Toronto
Downtown Toronto
Downtown Toronto
West Toronto
Downtown Toronto
West Toronto
East Toronto
Downtown Toronto
West Toronto
East Toronto
Downtown Toronto
East Toronto
Central Toronto
Central Toronto
Central Toronto
Central Toronto
West Toronto
Central Toronto
Central Toronto
West Toronto
Central Toronto
Downtown Toronto
West Toronto
Central Toronto
Downtown Toronto
Central Toronto
Downtown Toronto
Downtown Toronto
Downtown Toronto
Downtown Toronto
Downtown Toronto
Downtown Toronto
East Toronto


In [88]:
CT_venues.head()

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Downtown Toronto,43.65426,-79.360636,Roselle Desserts,43.653447,-79.362017,Bakery
1,Downtown Toronto,43.65426,-79.360636,Tandem Coffee,43.653559,-79.361809,Coffee Shop
2,Downtown Toronto,43.65426,-79.360636,Toronto Cooper Koo Family Cherry St YMCA Centre,43.653191,-79.357947,Gym / Fitness Center
3,Downtown Toronto,43.65426,-79.360636,Morning Glory Cafe,43.653947,-79.361149,Breakfast Spot
4,Downtown Toronto,43.65426,-79.360636,Body Blitz Spa East,43.654735,-79.359874,Spa


In [113]:
# one hot encoding
CT_onehot = pd.get_dummies(CT_venues[['Venue Category']], prefix="", prefix_sep="")

In [114]:
CT_onehot.head()

Unnamed: 0,Adult Boutique,Afghan Restaurant,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Antique Shop,Aquarium,Argentinian Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Auto Workshop,BBQ Joint,Baby Store,Bagel Shop,Bakery,Bank,Bar,Baseball Stadium,Basketball Stadium,Beach,Beer Bar,Beer Store,Belgian Restaurant,Bistro,Board Shop,Boat or Ferry,Bookstore,Boutique,Brazilian Restaurant,Breakfast Spot,Brewery,Bubble Tea Shop,Building,Burger Joint,Burrito Place,Bus Line,Butcher,Café,Cajun / Creole Restaurant,Camera Store,Caribbean Restaurant,Cheese Shop,Chinese Restaurant,Chocolate Shop,Church,Climbing Gym,Clothing Store,Cocktail Bar,Coffee Shop,College Arts Building,College Gym,College Rec Center,Colombian Restaurant,Comfort Food Restaurant,Comic Shop,Concert Hall,Convenience Store,Cosmetics Shop,Coworking Space,Creperie,Cuban Restaurant,Cupcake Shop,Dance Studio,Deli / Bodega,Department Store,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Dog Run,Doner Restaurant,Donut Shop,Dumpling Restaurant,Eastern European Restaurant,Electronics Store,Ethiopian Restaurant,Event Space,Falafel Restaurant,Farmers Market,Fast Food Restaurant,Filipino Restaurant,Fish & Chips Shop,Fish Market,Flea Market,Flower Shop,Food,Food & Drink Shop,Food Court,Food Truck,Fountain,French Restaurant,Fried Chicken Joint,Fruit & Vegetable Store,Furniture / Home Store,Gaming Cafe,Garden,Garden Center,Gastropub,Gay Bar,General Entertainment,General Travel,German Restaurant,Gift Shop,Gluten-free Restaurant,Gourmet Shop,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Harbor / Marina,Health & Beauty Service,Health Food Store,Historic Site,History Museum,Hobby Shop,Hookah Bar,Hospital,Hostel,Hotel,Hotel Bar,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Irish Pub,Italian Restaurant,Japanese Restaurant,Jazz Club,Jewelry Store,Jewish Restaurant,Juice Bar,Korean Restaurant,Lake,Latin 
American Restaurant,Light Rail Station,Lingerie Store,Liquor Store,Lounge,Mac & Cheese Joint,Malay Restaurant,Market,Martial Arts Dojo,Mediterranean Restaurant,Men's Store,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Modern European Restaurant,Molecular Gastronomy Restaurant,Monument / Landmark,Movie Theater,Museum,Music Store,Music Venue,Neighborhood,New American Restaurant,Nightclub,Noodle House,Office,Opera House,Optical Shop,Organic Grocery,Other Great Outdoors,Park,Performing Arts Venue,Pet Store,Pharmacy,Photography Studio,Pizza Place,Plane,Playground,Plaza,Poke Place,Pool,Portuguese Restaurant,Poutine Place,Pub,Ramen Restaurant,Record Shop,Recording Studio,Rental Car Location,Restaurant,Sake Bar,Salad Place,Salon / Barbershop,Sandwich Place,Scenic Lookout,Sculpture Garden,Seafood Restaurant,Shoe Store,Shopping Mall,Skate Park,Skating Rink,Smoke Shop,Smoothie Shop,Snack Place,Soup Place,Southern / Soul Food Restaurant,Spa,Speakeasy,Sporting Goods Shop,Sports Bar,Stadium,Stationery Store,Steakhouse,Strip Club,Supermarket,Sushi Restaurant,Swim School,Taco Place,Tailor Shop,Taiwanese Restaurant,Tanning Salon,Tapas Restaurant,Tea Room,Tennis Court,Thai Restaurant,Theater,Theme Restaurant,Thrift / Vintage Store,Toy / Game Store,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Wine Bar,Wings Joint,Yoga Studio
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


In [115]:
CT_onehot.shape

(1706, 235)

In [116]:
CT_venues.head()

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Downtown Toronto,43.65426,-79.360636,Roselle Desserts,43.653447,-79.362017,Bakery
1,Downtown Toronto,43.65426,-79.360636,Tandem Coffee,43.653559,-79.361809,Coffee Shop
2,Downtown Toronto,43.65426,-79.360636,Toronto Cooper Koo Family Cherry St YMCA Centre,43.653191,-79.357947,Gym / Fitness Center
3,Downtown Toronto,43.65426,-79.360636,Morning Glory Cafe,43.653947,-79.361149,Breakfast Spot
4,Downtown Toronto,43.65426,-79.360636,Body Blitz Spa East,43.654735,-79.359874,Spa


In [117]:
# add the borough labels back to the dataframe
# note: 'Neighborhood' is also a venue category, so this assignment overwrites that one-hot column
CT_onehot['Neighborhood'] = CT_venues['Neighborhood'] 

In [118]:
CT_onehot.columns.get_loc("Neighborhood")

160
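
A position of 160 rather than the last index suggests that a column called `Neighborhood` already existed among the one-hot categories before the labels were assigned. A minimal sketch of the collision, and of one way around it by inserting the labels under a different name, is given below. The `venues` frame and the column name `Borough` are assumptions for illustration.

```python
import pandas as pd

# Hypothetical venues; one of them has the category "Neighborhood".
venues = pd.DataFrame({
    "Neighborhood": ["Downtown Toronto", "Downtown Toronto", "East Toronto"],
    "Venue Category": ["Park", "Neighborhood", "Park"],
})

onehot = pd.get_dummies(venues[["Venue Category"]], prefix="", prefix_sep="")

# True here: assigning onehot["Neighborhood"] would overwrite a category column.
collision = "Neighborhood" in onehot.columns

# Insert the labels as the first column under a name that cannot collide.
onehot.insert(0, "Borough", venues["Neighborhood"])
print(collision)
print(list(onehot.columns))
```

With the label column inserted at position 0, no later reordering of columns is needed.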

In [103]:
CT_onehot.shape

(1706, 235)

In [95]:
CT_onehot.columns

Index(['Adult Boutique', 'Afghan Restaurant', 'Airport', 'Airport Food Court',
       'Airport Gate', 'Airport Lounge', 'Airport Service', 'Airport Terminal',
       'American Restaurant', 'Antique Shop',
       ...
       'Thrift / Vintage Store', 'Toy / Game Store', 'Trail', 'Train Station',
       'Vegetarian / Vegan Restaurant', 'Video Game Store',
       'Vietnamese Restaurant', 'Wine Bar', 'Wings Joint', 'Yoga Studio'],
      dtype='object', length=235)

In [108]:
# move neighborhood column to the first column
CT_onehot.columns[CT_onehot.columns.get_loc("Neighborhood")]

'Neighborhood'

In [None]:
list(CT_onehot.columns[:-CT_onehot.columns.get_loc("Neighborhood")])

In [None]:
fixed_columns = ['Neighborhood'] + [c for c in CT_onehot.columns if c != 'Neighborhood']  # 'Neighborhood' is not the last column here
fixed_columns

In [74]:
CT_onehot = CT_onehot[fixed_columns]

CT_onehot.head()

Unnamed: 0,Neighborhood,American Restaurant,BBQ Joint,Bagel Shop,Breakfast Spot,Brewery,Burger Joint,Bus Line,Café,Chinese Restaurant,Clothing Store,Coffee Shop,Convenience Store,Cosmetics Shop,Dance Studio,Dessert Shop,Diner,Farmers Market,Fast Food Restaurant,Food & Drink Shop,Fried Chicken Joint,Garden,Gourmet Shop,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,History Museum,Hotel,Indian Restaurant,Indie Movie Theater,Italian Restaurant,Japanese Restaurant,Jewelry Store,Jewish Restaurant,Light Rail Station,Liquor Store,Mexican Restaurant,Park,Pharmacy,Photography Studio,Pizza Place,Playground,Pub,Rental Car Location,Restaurant,Salon / Barbershop,Sandwich Place,Seafood Restaurant,Skating Rink,Spa,Sporting Goods Shop,Sports Bar,Supermarket,Sushi Restaurant,Swim School,Tennis Court,Thai Restaurant,Toy / Game Store,Trail,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Yoga Studio
0,Central Toronto,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,Central Toronto,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,Central Toronto,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0
3,Central Toronto,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,Central Toronto,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


In [75]:
CT_onehot.shape

(113, 63)

In [119]:
CT_grouped = CT_onehot.groupby('Neighborhood').mean().reset_index()
CT_grouped

Unnamed: 0,Neighborhood,Adult Boutique,Afghan Restaurant,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Antique Shop,Aquarium,Argentinian Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Auto Workshop,BBQ Joint,Baby Store,Bagel Shop,Bakery,Bank,Bar,Baseball Stadium,Basketball Stadium,Beach,Beer Bar,Beer Store,Belgian Restaurant,Bistro,Board Shop,Boat or Ferry,Bookstore,Boutique,Brazilian Restaurant,Breakfast Spot,Brewery,Bubble Tea Shop,Building,Burger Joint,Burrito Place,Bus Line,Butcher,Café,Cajun / Creole Restaurant,Camera Store,Caribbean Restaurant,Cheese Shop,Chinese Restaurant,Chocolate Shop,Church,Climbing Gym,Clothing Store,Cocktail Bar,Coffee Shop,College Arts Building,College Gym,College Rec Center,Colombian Restaurant,Comfort Food Restaurant,Comic Shop,Concert Hall,Convenience Store,Cosmetics Shop,Coworking Space,Creperie,Cuban Restaurant,Cupcake Shop,Dance Studio,Deli / Bodega,Department Store,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Dog Run,Doner Restaurant,Donut Shop,Dumpling Restaurant,Eastern European Restaurant,Electronics Store,Ethiopian Restaurant,Event Space,Falafel Restaurant,Farmers Market,Fast Food Restaurant,Filipino Restaurant,Fish & Chips Shop,Fish Market,Flea Market,Flower Shop,Food,Food & Drink Shop,Food Court,Food Truck,Fountain,French Restaurant,Fried Chicken Joint,Fruit & Vegetable Store,Furniture / Home Store,Gaming Cafe,Garden,Garden Center,Gastropub,Gay Bar,General Entertainment,General Travel,German Restaurant,Gift Shop,Gluten-free Restaurant,Gourmet Shop,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Harbor / Marina,Health & Beauty Service,Health Food Store,Historic Site,History Museum,Hobby Shop,Hookah Bar,Hospital,Hostel,Hotel,Hotel Bar,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Irish Pub,Italian Restaurant,Japanese Restaurant,Jazz Club,Jewelry Store,Jewish Restaurant,Juice Bar,Korean Restaurant,Lake,Latin American Restaurant,Light Rail Station,Lingerie Store,Liquor Store,Lounge,Mac & Cheese Joint,Malay Restaurant,Market,Martial Arts Dojo,Mediterranean Restaurant,Men's Store,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Modern European Restaurant,Molecular Gastronomy Restaurant,Monument / Landmark,Movie Theater,Museum,Music Store,Music Venue,New American Restaurant,Nightclub,Noodle House,Office,Opera House,Optical Shop,Organic Grocery,Other Great Outdoors,Park,Performing Arts Venue,Pet Store,Pharmacy,Photography Studio,Pizza Place,Plane,Playground,Plaza,Poke Place,Pool,Portuguese Restaurant,Poutine Place,Pub,Ramen Restaurant,Record Shop,Recording Studio,Rental Car Location,Restaurant,Sake Bar,Salad Place,Salon / Barbershop,Sandwich Place,Scenic Lookout,Sculpture Garden,Seafood Restaurant,Shoe Store,Shopping Mall,Skate Park,Skating Rink,Smoke Shop,Smoothie Shop,Snack Place,Soup Place,Southern / Soul Food Restaurant,Spa,Speakeasy,Sporting Goods Shop,Sports Bar,Stadium,Stationery Store,Steakhouse,Strip Club,Supermarket,Sushi Restaurant,Swim School,Taco Place,Tailor Shop,Taiwanese Restaurant,Tanning Salon,Tapas Restaurant,Tea Room,Tennis Court,Thai Restaurant,Theater,Theme Restaurant,Thrift / Vintage Store,Toy / Game Store,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Wine Bar,Wings Joint,Yoga Studio
0,Central Toronto,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.017699,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.00885,0.0,0.00885,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.00885,0.00885,0.0,0.0,0.026549,0.0,0.00885,0.0,0.044248,0.0,0.0,0.0,0.0,0.00885,0.0,0.0,0.0,0.026549,0.0,0.079646,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.00885,0.00885,0.0,0.0,0.0,0.0,0.00885,0.0,0.0,0.035398,0.0,0.017699,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.00885,0.00885,0.0,0.0,0.0,0.0,0.0,0.0,0.00885,0.0,0.0,0.0,0.0,0.00885,0.0,0.0,0.0,0.00885,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.00885,0.00885,0.00885,0.026549,0.00885,0.0,0.0,0.0,0.0,0.00885,0.0,0.0,0.0,0.0,0.00885,0.0,0.0,0.017699,0.00885,0.0,0.017699,0.00885,0.0,0.00885,0.00885,0.0,0.0,0.0,0.0,0.00885,0.0,0.00885,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.00885,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.053097,0.0,0.0,0.017699,0.00885,0.061947,0.0,0.00885,0.0,0.0,0.0,0.0,0.0,0.026549,0.0,0.0,0.0,0.00885,0.017699,0.0,0.0,0.00885,0.070796,0.0,0.0,0.00885,0.0,0.0,0.0,0.00885,0.0,0.0,0.0,0.0,0.0,0.00885,0.0,0.017699,0.00885,0.0,0.0,0.0,0.0,0.00885,0.035398,0.00885,0.0,0.0,0.0,0.0,0.0,0.0,0.00885,0.00885,0.0,0.0,0.0,0.00885,0.00885,0.0,0.00885,0.0,0.00885,0.0,0.0,0.00885
1,Downtown Toronto,0.000775,0.000775,0.000775,0.000775,0.000775,0.001549,0.001549,0.001549,0.017041,0.001549,0.003873,0.0,0.007746,0.001549,0.001549,0.003873,0.000775,0.0,0.000775,0.000775,0.003098,0.027885,0.005422,0.020139,0.001549,0.003098,0.000775,0.013168,0.002324,0.002324,0.002324,0.0,0.000775,0.01007,0.000775,0.001549,0.012393,0.003873,0.006971,0.002324,0.011619,0.003873,0.0,0.000775,0.056545,0.0,0.000775,0.004648,0.004648,0.008521,0.000775,0.001549,0.0,0.01007,0.009295,0.090627,0.000775,0.000775,0.000775,0.000775,0.003873,0.001549,0.008521,0.002324,0.008521,0.0,0.004648,0.0,0.002324,0.002324,0.01007,0.004648,0.004648,0.001549,0.007746,0.001549,0.000775,0.000775,0.001549,0.003098,0.0,0.002324,0.001549,0.001549,0.000775,0.006197,0.007746,0.000775,0.0,0.002324,0.0,0.000775,0.0,0.000775,0.004648,0.002324,0.002324,0.006197,0.005422,0.0,0.003098,0.001549,0.0,0.0,0.015492,0.003873,0.001549,0.003098,0.000775,0.000775,0.003098,0.002324,0.003873,0.006197,0.010844,0.006197,0.000775,0.000775,0.000775,0.000775,0.001549,0.000775,0.000775,0.000775,0.001549,0.030209,0.003873,0.006971,0.005422,0.000775,0.002324,0.024012,0.019365,0.003873,0.000775,0.0,0.002324,0.001549,0.002324,0.002324,0.0,0.000775,0.003873,0.004648,0.0,0.0,0.000775,0.000775,0.003098,0.001549,0.006197,0.004648,0.001549,0.000775,0.001549,0.003098,0.002324,0.003873,0.0,0.002324,0.004648,0.003873,0.003873,0.003098,0.001549,0.000775,0.000775,0.000775,0.015492,0.002324,0.000775,0.001549,0.0,0.014717,0.000775,0.001549,0.003098,0.003098,0.0,0.000775,0.003098,0.011619,0.003873,0.002324,0.0,0.0,0.034082,0.000775,0.008521,0.001549,0.008521,0.002324,0.002324,0.016266,0.001549,0.005422,0.0,0.000775,0.001549,0.003098,0.001549,0.003098,0.0,0.003873,0.003098,0.003873,0.003098,0.0,0.0,0.014717,0.000775,0.000775,0.009295,0.0,0.001549,0.003098,0.000775,0.000775,0.0,0.009295,0.0,0.012393,0.008521,0.000775,0.000775,0.001549,0.000775,0.002324,0.012393,0.002324,0.004648,0.006197,0.000775,0.003098
2,East Toronto,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02459,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.008197,0.0,0.0,0.0,0.02459,0.008197,0.008197,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.008197,0.0,0.016393,0.0,0.0,0.0,0.032787,0.008197,0.0,0.008197,0.016393,0.0,0.0,0.040984,0.0,0.0,0.008197,0.008197,0.008197,0.0,0.0,0.0,0.008197,0.0,0.065574,0.0,0.0,0.0,0.0,0.008197,0.008197,0.0,0.008197,0.008197,0.008197,0.0,0.0,0.0,0.0,0.0,0.0,0.008197,0.0,0.008197,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.008197,0.016393,0.0,0.008197,0.008197,0.0,0.0,0.0,0.008197,0.0,0.0,0.0,0.0,0.0,0.008197,0.016393,0.0,0.008197,0.008197,0.008197,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.065574,0.008197,0.008197,0.008197,0.0,0.0,0.008197,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.040984,0.008197,0.0,0.0,0.04918,0.008197,0.0,0.0,0.0,0.008197,0.0,0.0,0.008197,0.008197,0.0,0.016393,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.008197,0.0,0.0,0.0,0.0,0.008197,0.0,0.008197,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.040984,0.0,0.008197,0.0,0.0,0.02459,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02459,0.0,0.0,0.008197,0.0,0.016393,0.0,0.0,0.0,0.02459,0.0,0.0,0.008197,0.0,0.0,0.008197,0.0,0.0,0.0,0.0,0.0,0.0,0.016393,0.0,0.0,0.008197,0.0,0.008197,0.008197,0.0,0.0,0.016393,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.016393,0.0,0.0,0.0,0.0,0.0,0.0,0.016393
3,West Toronto,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.005556,0.0,0.005556,0.005556,0.0,0.005556,0.016667,0.0,0.0,0.0,0.0,0.0,0.027778,0.011111,0.077778,0.0,0.0,0.0,0.0,0.0,0.0,0.005556,0.0,0.0,0.016667,0.005556,0.0,0.022222,0.011111,0.0,0.0,0.005556,0.011111,0.0,0.0,0.055556,0.005556,0.0,0.005556,0.0,0.0,0.0,0.0,0.005556,0.0,0.011111,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.005556,0.0,0.0,0.0,0.011111,0.005556,0.0,0.005556,0.0,0.011111,0.0,0.022222,0.005556,0.011111,0.0,0.0,0.0,0.005556,0.0,0.0,0.0,0.005556,0.0,0.011111,0.0,0.005556,0.0,0.005556,0.0,0.005556,0.0,0.0,0.0,0.0,0.011111,0.005556,0.0,0.011111,0.0,0.0,0.0,0.011111,0.0,0.0,0.0,0.0,0.016667,0.0,0.005556,0.005556,0.011111,0.016667,0.016667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.005556,0.0,0.005556,0.0,0.033333,0.005556,0.0,0.0,0.0,0.005556,0.005556,0.0,0.005556,0.0,0.0,0.005556,0.0,0.005556,0.005556,0.0,0.0,0.0,0.011111,0.011111,0.005556,0.005556,0.0,0.0,0.0,0.005556,0.0,0.0,0.016667,0.011111,0.0,0.0,0.005556,0.0,0.0,0.0,0.0,0.016667,0.005556,0.005556,0.016667,0.0,0.027778,0.0,0.0,0.0,0.0,0.005556,0.0,0.0,0.011111,0.0,0.005556,0.0,0.0,0.027778,0.0,0.0,0.005556,0.011111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.005556,0.0,0.0,0.005556,0.0,0.005556,0.0,0.005556,0.005556,0.0,0.0,0.0,0.011111,0.011111,0.0,0.0,0.0,0.0,0.0,0.005556,0.005556,0.0,0.005556,0.005556,0.0,0.0,0.0,0.0,0.0,0.011111,0.0,0.011111,0.005556,0.0,0.011111


In [120]:
CT_grouped.shape

(4, 235)

In [123]:
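def return_most_common_venues(row, num_top_venues):
    # skip the first entry (the 'Neighborhood' label); the rest are venue-category frequencies
    row_categories = row.iloc[1:]
    # sort the frequencies from highest to lowest
    row_categories_sorted = row_categories.sort_values(ascending=False)

    # return the names of the most frequent categories
    return row_categories_sorted.index.values[0:num_top_venues]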
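
To make the ranking step concrete, here is a minimal sketch of the same idea applied to a single made-up row. The borough name, the frequencies, and the helper name `top_venues` are illustrative assumptions, not values from the Toronto data.

```python
import pandas as pd

# Hypothetical row shaped like one row of CT_grouped:
# a label first, then one frequency per venue category.
row = pd.Series({
    "Neighborhood": "Example Borough",
    "Coffee Shop": 0.08,
    "Café": 0.04,
    "Park": 0.05,
    "Bar": 0.0,
})

def top_venues(row, n):
    # Drop the label, sort frequencies in descending order, keep the first n names
    freqs = row.iloc[1:].astype(float)
    return freqs.sort_values(ascending=False).index[:n].tolist()

top3 = top_venues(row, 3)
print(top3)  # ['Coffee Shop', 'Park', 'Café']
```

The explicit `astype(float)` is only needed because the label makes the Series dtype `object`; it keeps the sort purely numeric.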

In [132]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:  # ranks past the 3rd have no entry in indicators, so use 'th'
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = CT_grouped['Neighborhood']

for ind in np.arange(CT_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(CT_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Central Toronto,Coffee Shop,Sandwich Place,Pizza Place,Park,Café,Sushi Restaurant,Dessert Shop,Pub,Clothing Store,Gym
1,Downtown Toronto,Coffee Shop,Café,Restaurant,Hotel,Bakery,Italian Restaurant,Bar,Japanese Restaurant,American Restaurant,Seafood Restaurant
2,East Toronto,Coffee Shop,Greek Restaurant,Italian Restaurant,Ice Cream Shop,Café,Park,Brewery,Bakery,Pizza Place,Sandwich Place
3,West Toronto,Bar,Café,Coffee Shop,Italian Restaurant,Bakery,Restaurant,Pizza Place,Diner,Breakfast Spot,Park


In [133]:
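# set number of clusters
# note: CT_grouped has only 4 rows (one per borough), so 4 clusters puts each borough in its own cluster
kclusters = 4

# drop the label column so that only the venue frequencies are clustered
CT_grouped_clustering = CT_grouped.drop(columns='Neighborhood')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(CT_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10]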

array([0, 3, 2, 1], dtype=int32)
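
The four distinct labels above are what one would expect when the number of clusters equals the number of rows: each borough becomes its own cluster. The sketch below illustrates this on a small made-up frequency matrix. The numbers are assumptions for illustration only, and `n_init=10` is set explicitly just to keep the behaviour the same across scikit-learn versions.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical venue-frequency matrix: 4 rows (boroughs) x 3 venue categories
X = np.array([
    [0.08, 0.04, 0.05],
    [0.09, 0.06, 0.02],
    [0.07, 0.04, 0.01],
    [0.05, 0.06, 0.08],
])

# With as many clusters as rows, every row can sit exactly on its own centroid
model = KMeans(n_clusters=4, random_state=0, n_init=10).fit(X)
labels = model.labels_

print(sorted(labels.tolist()))  # each of the four labels is used exactly once
print(model.inertia_)           # within-cluster distance is (numerically) zero
```

A smaller `kclusters`, or clustering at the postcode level instead of the borough level, would be needed for the labels to group similar areas together.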

Let's create a new dataframe that includes the cluster label as well as the top 10 venues for each borough (stored in the Neighborhood column).

In [134]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

In [135]:
CT_merged = CT_data.copy()  # work on a copy so CT_data itself stays unchanged

In [137]:
CT_merged.head()

Unnamed: 0,Postcode,Borough,Neighbourhood,Latitude,Longitude
0,M5A,Downtown Toronto,"Harbourfront,Regent Park",43.65426,-79.360636
1,M5B,Downtown Toronto,"Ryerson,Garden District",43.657162,-79.378937
2,M5C,Downtown Toronto,St. James Town,43.651494,-79.375418
3,M4E,East Toronto,The Beaches,43.676357,-79.293031
4,M5E,Downtown Toronto,Berczy Park,43.644771,-79.373306


In [138]:
neighborhoods_venues_sorted

Unnamed: 0,Cluster Labels,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,0,Central Toronto,Coffee Shop,Sandwich Place,Pizza Place,Park,Café,Sushi Restaurant,Dessert Shop,Pub,Clothing Store,Gym
1,3,Downtown Toronto,Coffee Shop,Café,Restaurant,Hotel,Bakery,Italian Restaurant,Bar,Japanese Restaurant,American Restaurant,Seafood Restaurant
2,2,East Toronto,Coffee Shop,Greek Restaurant,Italian Restaurant,Ice Cream Shop,Café,Park,Brewery,Bakery,Pizza Place,Sandwich Place
3,1,West Toronto,Bar,Café,Coffee Shop,Italian Restaurant,Bakery,Restaurant,Pizza Place,Diner,Breakfast Spot,Park


In [148]:
# join the cluster labels and top venues onto each postcode row of CT_merged;
# the 'Neighborhood' column of neighborhoods_venues_sorted holds borough names, so match it on 'Borough'
CT_merged = CT_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Borough')

In [149]:
CT_merged.head() # check the last columns!

Unnamed: 0,Postcode,Borough,Neighbourhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M5A,Downtown Toronto,"Harbourfront,Regent Park",43.65426,-79.360636,3,Coffee Shop,Café,Restaurant,Hotel,Bakery,Italian Restaurant,Bar,Japanese Restaurant,American Restaurant,Seafood Restaurant
1,M5B,Downtown Toronto,"Ryerson,Garden District",43.657162,-79.378937,3,Coffee Shop,Café,Restaurant,Hotel,Bakery,Italian Restaurant,Bar,Japanese Restaurant,American Restaurant,Seafood Restaurant
2,M5C,Downtown Toronto,St. James Town,43.651494,-79.375418,3,Coffee Shop,Café,Restaurant,Hotel,Bakery,Italian Restaurant,Bar,Japanese Restaurant,American Restaurant,Seafood Restaurant
3,M4E,East Toronto,The Beaches,43.676357,-79.293031,2,Coffee Shop,Greek Restaurant,Italian Restaurant,Ice Cream Shop,Café,Park,Brewery,Bakery,Pizza Place,Sandwich Place
4,M5E,Downtown Toronto,Berczy Park,43.644771,-79.373306,3,Coffee Shop,Café,Restaurant,Hotel,Bakery,Italian Restaurant,Bar,Japanese Restaurant,American Restaurant,Seafood Restaurant
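
The join above broadcasts one borough-level row onto every postcode in that borough, which is why the venue columns repeat. A minimal sketch of that pattern, using made-up postcodes and borough names (all values here are assumptions for illustration):

```python
import pandas as pd

# Hypothetical postcode-level table (many rows per borough)
postcodes = pd.DataFrame({
    "Postcode": ["X1A", "X1B", "X2A"],
    "Borough": ["North", "North", "South"],
})

# Hypothetical borough-level summary (one row per borough)
summary = pd.DataFrame({
    "Neighborhood": ["North", "South"],
    "Cluster Labels": [1, 0],
})

# Index the summary by its key column, then match it against 'Borough'
merged = postcodes.join(summary.set_index("Neighborhood"), on="Borough")
print(merged)
```

Every row of `postcodes` is kept (a left join), and both "North" rows receive the same cluster label.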


Finally, let's visualize the resulting clusters on a map.

In [150]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
for lat, lon, poi, cluster in zip(CT_merged['Latitude'], CT_merged['Longitude'], CT_merged['Borough'], CT_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster],  # labels run from 0 to kclusters-1, so index the palette directly
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters
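
The colour scheme used for the markers can be checked on its own, without drawing a map. The following is a minimal sketch, assuming matplotlib is installed; the label list simply mirrors the four cluster labels shown earlier, and the names `palette` and `marker_colors` are illustrative.

```python
import numpy as np
from matplotlib import cm, colors

kclusters = 4

# Sample kclusters evenly spaced colours from the rainbow colormap and convert to hex
palette = [colors.rgb2hex(c) for c in cm.rainbow(np.linspace(0, 1, kclusters))]

# Cluster labels run from 0 to kclusters-1, so each label indexes the palette directly
labels = [0, 3, 2, 1]
marker_colors = [palette[label] for label in labels]

for label, color in zip(labels, marker_colors):
    print(label, color)
```

Indexing with the label itself keeps the mapping from cluster to colour easy to read in a legend or popup.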