# Segmenting and Clustering Post codes in Toronto

## Introduction

The objective of this project is to explore, segment and cluster the neighborhoods in the city of Toronto.
The city data was scraped from Wikipedia's list of Canadian postal codes beginning with M using BeautifulSoup, in order to obtain the city's post codes, boroughs and neighborhoods (ignoring rows with no borough assigned).
Once the city data was clean and structured, each post code was matched with its latitude and longitude, and the Foursquare API was then used to retrieve the venues near each neighborhood.

### Importing libraries

In [1]:
import numpy as np # library to handle data in a vectorized manner
import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)
import json # library to handle JSON files
#!conda install -c conda-forge geopy --yes # uncomment this line if you haven't completed the Foursquare API lab
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values
import requests # library to handle requests
from pandas import json_normalize # transform JSON file into a pandas dataframe (pandas >= 1.0; older versions expose it as pandas.io.json.json_normalize)
# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors
# import k-means from clustering stage
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs # the samples_generator submodule was removed in newer scikit-learn releases
#!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't completed the Foursquare API lab
import folium # map rendering library
from bs4 import BeautifulSoup
import lxml

### Collecting data about Toronto's neighbourhoods using BeautifulSoup
#### Source of data: Wikipedia

In [2]:
url = 'https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'

In [3]:
r = requests.get(url)
soup = BeautifulSoup(r.text,'html.parser')

In [4]:
results = soup.find('div', class_='mw-parser-output')

In [5]:
neighbourhoods_aux = results.table.tbody.find_all('tr')
neighbourhoods_aux[0]

<tr>
<th>Postcode</th>
<th>Borough</th>
<th>Neighbourhood
</th></tr>

In [6]:
# skipping the first row, because it is the table header and not a neighbourhood
neighbourhoods = neighbourhoods_aux[1:]
neighbourhoods[0]

<tr>
<td>M1A</td>
<td>Not assigned</td>
<td>Not assigned
</td></tr>

### Extracting <strong>_postcode, borough_</strong> and <strong>_neighbourhood_</strong>

In [7]:
#testing with random neighbourhood
first_neigh = neighbourhoods[12]
first_neigh.contents

['\n',
 <td>M1B</td>,
 '\n',
 <td><a href="/wiki/Scarborough,_Toronto" title="Scarborough, Toronto">Scarborough</a></td>,
 '\n',
 <td><a href="/wiki/Malvern,_Toronto" title="Malvern, Toronto">Malvern</a>
 </td>]

In [8]:
# postcode
first_neigh.find('td').text

'M1B'

In [9]:
#borough
first_neigh.contents[3].text

'Scarborough'

In [10]:
#neighbourhood
first_neigh.contents[5].text.strip('\n')

'Malvern'

### Creating dataframe

In [11]:
column_names = ['Postcode','Borough','Neighbourhood']

In [12]:
#using the lookups tested above to collect postcode, borough and neighbourhood from the complete dataset

records = []
for row in neighbourhoods:
    postcode = row.find('td').text
    borough = row.contents[3].text
    neighbourhood = row.contents[5].text.strip('\n')
    records.append((postcode,borough,neighbourhood))
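
The positional lookups `contents[3]` and `contents[5]` depend on the newline text nodes that sit between the `<td>` cells, so they can shift if the page markup changes. As a minimal sketch of a position-independent alternative, the class below collects the text of every `<td>` cell per `<tr>` row. It uses the standard library's `html.parser.HTMLParser` rather than BeautifulSoup so that it runs without extra packages; the `RowCollector` name and the sample HTML are illustrative assumptions, not part of the original notebook.

```python
from html.parser import HTMLParser


class RowCollector(HTMLParser):
    """Collect the stripped text of each <td> cell, grouped by <tr> (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.rows = []        # finished rows, each a tuple of cell strings
        self._cells = None    # cells of the row currently being read
        self._text = None     # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == 'tr':
            self._cells = []
        elif tag == 'td' and self._cells is not None:
            self._text = []

    def handle_data(self, data):
        if self._text is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == 'td' and self._text is not None:
            self._cells.append(''.join(self._text).strip())
            self._text = None
        elif tag == 'tr' and self._cells is not None:
            if self._cells:  # header rows hold only <th> cells, so they stay empty and are skipped
                self.rows.append(tuple(self._cells))
            self._cells = None


# Sample markup shaped like the rows shown above (illustrative, not the full page)
sample_html = """
<table><tbody>
<tr><th>Postcode</th><th>Borough</th><th>Neighbourhood
</th></tr>
<tr>
<td>M1A</td>
<td>Not assigned</td>
<td>Not assigned
</td></tr>
<tr>
<td>M1B</td>
<td><a href="/wiki/Scarborough,_Toronto">Scarborough</a></td>
<td><a href="/wiki/Malvern,_Toronto">Malvern</a>
</td></tr>
</tbody></table>
"""

collector = RowCollector()
collector.feed(sample_html)
sample_records = collector.rows
```

With BeautifulSoup itself, the comparable idea would be to call `find_all('td')` on each row and index the resulting list of cells instead of indexing `contents`.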

In [13]:
len(records)

288

In [14]:
df_toronto = pd.DataFrame(data = records,columns = column_names)

In [15]:
df_toronto.tail()

| | Postcode | Borough | Neighbourhood |
|---|---|---|---|
| 283 | M8Z | Etobicoke | Mimico NW |
| 284 | M8Z | Etobicoke | The Queensway West |
| 285 | M8Z | Etobicoke | Royal York South West |
| 286 | M8Z | Etobicoke | South of Bloor |
| 287 | M9Z | Not assigned | Not assigned |


In [16]:
#rows with no borough assigned must be cleaned
df_toronto['Borough'].value_counts()['Not assigned']

77

### Cleaning dataframe

In [17]:
#removing rows with no borough assigned (copying so later assignments act on an independent dataframe)

df_toronto = df_toronto[df_toronto['Borough']!='Not assigned'].copy()
df_toronto['Borough'].value_counts()

Etobicoke           45
North York          38
Downtown Toronto    37
Scarborough         37
Central Toronto     17
West Toronto        13
York                 9
East Toronto         7
East York            6
Queen's Park         1
Mississauga          1
Name: Borough, dtype: int64

In [18]:
#resetting index
df_toronto.reset_index(drop =True, inplace=True)
df_toronto.tail()

| | Postcode | Borough | Neighbourhood |
|---|---|---|---|
| 206 | M8Z | Etobicoke | Kingsway Park South West |
| 207 | M8Z | Etobicoke | Mimico NW |
| 208 | M8Z | Etobicoke | The Queensway West |
| 209 | M8Z | Etobicoke | Royal York South West |
| 210 | M8Z | Etobicoke | South of Bloor |


In [19]:
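#Replacing unassigned neighbourhoods with the borough name
#(.loc with a boolean mask assigns in place; chained iloc[i][2] = ... may write to a temporary copy)

unassigned = df_toronto['Neighbourhood'] == 'Not assigned'
df_toronto.loc[unassigned, 'Neighbourhood'] = df_toronto.loc[unassigned, 'Borough']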

In [20]:
# confirming that no row has an unassigned postcode
df_toronto[df_toronto['Postcode']=='Not assigned']

| | Postcode | Borough | Neighbourhood |
|---|---|---|---|
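
The two cleaning rules used above (drop rows whose borough is 'Not assigned', and fall back to the borough name when only the neighbourhood is missing) can be sketched in plain Python. The `clean_records` helper and the sample tuples are illustrative assumptions; the notebook itself applies the rules with pandas.

```python
NOT_ASSIGNED = 'Not assigned'


def clean_records(records):
    """Apply the two cleaning rules to (postcode, borough, neighbourhood) tuples."""
    cleaned = []
    for postcode, borough, neighbourhood in records:
        if borough == NOT_ASSIGNED:
            continue  # rule 1: rows without a borough are dropped
        if neighbourhood == NOT_ASSIGNED:
            neighbourhood = borough  # rule 2: reuse the borough as the neighbourhood
        cleaned.append((postcode, borough, neighbourhood))
    return cleaned


# Illustrative sample rows, not the full scraped table
sample = [
    ('M1A', 'Not assigned', 'Not assigned'),
    ('M1B', 'Scarborough', 'Malvern'),
    ('M7A', "Queen's Park", 'Not assigned'),
]
cleaned_sample = clean_records(sample)
```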


### Combining multiple neighborhoods with the same post code

In [21]:
toronto = df_toronto.groupby(['Postcode', 'Borough'])['Neighbourhood'].apply(','.join).reset_index()

In [22]:
toronto.head(4)

| | Postcode | Borough | Neighbourhood |
|---|---|---|---|
| 0 | M1B | Scarborough | Rouge,Malvern |
| 1 | M1C | Scarborough | Highland Creek,Rouge Hill,Port Union |
| 2 | M1E | Scarborough | Guildwood,Morningside,West Hill |
| 3 | M1G | Scarborough | Woburn |


In [23]:
print('The Toronto dataframe shape is: {} rows and {} columns'.format(toronto.shape[0], toronto.shape[1]) )

The Toronto dataframe shape is: 103 rows and 3 columns
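
What the `groupby(...).apply(','.join)` call does can be sketched without pandas: rows sharing a (postcode, borough) pair are collapsed into one row whose neighbourhood field is the comma-joined list. The `combine_neighbourhoods` helper is a hypothetical name, and the sample rows are taken from the outputs shown above.

```python
def combine_neighbourhoods(records):
    """Group (postcode, borough, neighbourhood) tuples by postcode and borough."""
    grouped = {}
    for postcode, borough, neighbourhood in records:
        grouped.setdefault((postcode, borough), []).append(neighbourhood)
    # sort by key to mirror the sorted group order that pandas groupby produces by default
    return [(pc, bor, ','.join(names)) for (pc, bor), names in sorted(grouped.items())]


sample = [
    ('M1G', 'Scarborough', 'Woburn'),
    ('M1B', 'Scarborough', 'Rouge'),
    ('M1B', 'Scarborough', 'Malvern'),
]
combined = combine_neighbourhoods(sample)
```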


### Completing dataframe with post code's latitude and longitude
#### Using GeoPy to retrieve the coordinates from the post codes.

In [24]:
geolocator = Nominatim(user_agent="toronto_explorer")

In [25]:
lat_long = []

for post_code, bor, neig in toronto[['Postcode', 'Borough', 'Neighbourhood']].itertuples(index=False):
    # lat and long start as None for every row; in the recorded run lat was never reset
    # (the loop cleared `latitude` instead), which is why every row below shares the first result
    lat, long = None, None
    # geocode() returns None when the query has no match
    coordinates = geolocator.geocode('Toronto, Ontario, {}'.format(post_code))
    if coordinates is not None:
        lat, long = coordinates.latitude, coordinates.longitude
    lat_long.append((post_code, bor, neig, lat, long))

lat_long

[('M1B', 'Scarborough', 'Rouge,Malvern', 43.653963, -79.387207),
 ('M1C',
  'Scarborough',
  'Highland Creek,Rouge Hill,Port Union',
  43.653963,
  -79.387207),
 ('M1E',
  'Scarborough',
  'Guildwood,Morningside,West Hill',
  43.653963,
  -79.387207),
 ('M1G', 'Scarborough', 'Woburn', 43.653963, -79.387207),
 ('M1H', 'Scarborough', 'Cedarbrae', 43.653963, -79.387207),
 ('M1J', 'Scarborough', 'Scarborough Village', 43.653963, -79.387207),
 ('M1K',
  'Scarborough',
  'East Birchmount Park,Ionview,Kennedy Park',
  43.653963,
  -79.387207),
 ('M1L',
  'Scarborough',
  'Clairlea,Golden Mile,Oakridge',
  43.653963,
  -79.387207),
 ('M1M',
  'Scarborough',
  'Cliffcrest,Cliffside,Scarborough Village West',
  43.653963,
  -79.387207),
 ('M1N', 'Scarborough', 'Birch Cliff,Cliffside West', 43.653963, -79.387207),
 ('M1P',
  'Scarborough',
  'Dorset Park,Scarborough Town Centre,Wexford Heights',
  43.653963,
  -79.387207),
 ('M1R', 'Scarborough', 'Maryvale,Wexford', 43.653963, -79.387207),
 ('M1S

In [26]:
column = ['Postcode', 'Borough', 'Neighbourhood', 'Latitude', 'Longitude']
toronto_dataset = pd.DataFrame(data = lat_long, columns = column)

In [27]:
toronto_dataset.sort_values('Postcode', ascending = False).head()

| | Postcode | Borough | Neighbourhood | Latitude | Longitude |
|---|---|---|---|---|---|
| 102 | M9W | Etobicoke | Northwest | 43.653963 | -79.387207 |
| 101 | M9V | Etobicoke | Albion Gardens,Beaumond Heights,Humbergate,Jam... | 43.653963 | -79.387207 |
| 100 | M9R | Etobicoke | Kingsview Village,Martin Grove Gardens,Richvie... | 43.653963 | -79.387207 |
| 99 | M9P | Etobicoke | Westmount | 43.653963 | -79.387207 |
| 98 | M9N | York | Weston | 43.653963 | -79.387207 |
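
Every row above received the same coordinate pair, which is easy to miss in a long output. A minimal sketch of a more defensive lookup follows: it accepts any callable that behaves like `geolocator.geocode` (returning an object with `latitude` and `longitude` attributes, or `None`), retries a few times, and a second helper flags results that are all identical. The names `geocode_postcode` and `all_identical`, the retry count and the pause length are assumptions for illustration.

```python
import time
from collections import namedtuple


def geocode_postcode(geocode, query, attempts=3, pause=1.0):
    """Return (latitude, longitude), or (None, None) if every attempt finds nothing."""
    for attempt in range(attempts):
        location = geocode(query)
        if location is not None:
            return location.latitude, location.longitude
        if attempt < attempts - 1:
            time.sleep(pause)  # wait before retrying to stay polite to the service
    return None, None


def all_identical(coordinates):
    """True when more than one coordinate pair exists and they are all the same."""
    return len(coordinates) > 1 and len(set(coordinates)) == 1


# Stand-in for a real geocoder so the sketch runs offline
Location = namedtuple('Location', ['latitude', 'longitude'])
fake_results = {'Toronto, Ontario, M1B': Location(43.806686, -79.194353)}

found = geocode_postcode(fake_results.get, 'Toronto, Ontario, M1B', pause=0)
missing = geocode_postcode(fake_results.get, 'Toronto, Ontario, ZZZ', attempts=2, pause=0)
suspicious = all_identical([(43.653963, -79.387207)] * 3)
```

With geopy, the first argument would be `geolocator.geocode`; the sketch does not catch the network exceptions that a real geocoding service can raise.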


#### Using a CSV file with the coordinates of each postal code

In [30]:
file = pd.read_csv('/Users/danielmiranda/Documents/Cursos/Coursera/IBM Data Science Professional Certificate/Curso 9 - Applied Data Science Capstone/4. Project Toronto/Geospatial_Coordinates.csv')


In [36]:
file.head(1)

| | Postcode | Latitude | Longitude |
|---|---|---|---|
| 0 | M1B | 43.806686 | -79.194353 |


In [34]:
#renaming the columns so the postcode column matches the Toronto dataframe
file.columns = ['Postcode','Latitude','Longitude']

In [37]:
file.head(1)

| | Postcode | Latitude | Longitude |
|---|---|---|---|
| 0 | M1B | 43.806686 | -79.194353 |


In [38]:
toronto_latlong = pd.merge(toronto,file, on='Postcode')

In [39]:
toronto_latlong.head()

| | Postcode | Borough | Neighbourhood | Latitude | Longitude |
|---|---|---|---|---|---|
| 0 | M1B | Scarborough | Rouge,Malvern | 43.806686 | -79.194353 |
| 1 | M1C | Scarborough | Highland Creek,Rouge Hill,Port Union | 43.784535 | -79.160497 |
| 2 | M1E | Scarborough | Guildwood,Morningside,West Hill | 43.763573 | -79.188711 |
| 3 | M1G | Scarborough | Woburn | 43.770992 | -79.216917 |
| 4 | M1H | Scarborough | Cedarbrae | 43.773136 | -79.239476 |
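
`pd.merge` performs an inner join by default, so any postcode missing from the CSV would silently disappear from `toronto_latlong`. One way to check for that, sketched here with plain sets and made-up sample values, is to compare the key columns before merging. The helper name `unmatched_keys` is hypothetical.

```python
def unmatched_keys(left_keys, right_keys):
    """Return the keys present on the left side that have no partner on the right."""
    return sorted(set(left_keys) - set(right_keys))


# Made-up sample: 'M1X' stands for a postcode without coordinates
neighbourhood_postcodes = ['M1B', 'M1C', 'M1X']
coordinate_postcodes = ['M1B', 'M1C', 'M1E']

lost_in_merge = unmatched_keys(neighbourhood_postcodes, coordinate_postcodes)
```

In the notebook the same check could be run as `unmatched_keys(toronto['Postcode'], file['Postcode'])`; an empty list would mean no row is dropped by the merge.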


### Next, the Foursquare API is used to explore the neighborhoods and segment them.

In [42]:
#defining foursquare credentials and version
CLIENT_ID = 'YOUR_CLIENT_ID' # replace with your Foursquare client ID
CLIENT_SECRET = 'YOUR_CLIENT_SECRET' # replace with your Foursquare client secret
VERSION = '20180605'

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET: ' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: YOUR_CLIENT_ID
CLIENT_SECRET: YOUR_CLIENT_SECRET
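
Hard-coding and printing API credentials makes them easy to leak when a notebook is shared. One common alternative, sketched here, reads them from environment variables and shows only a masked form. The variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and the helpers `load_credentials` and `mask` are assumptions for illustration, not something Foursquare prescribes.

```python
import os


def load_credentials(env=None):
    """Read the Foursquare credentials from a mapping (defaults to os.environ)."""
    env = os.environ if env is None else env
    names = ('FOURSQUARE_CLIENT_ID', 'FOURSQUARE_CLIENT_SECRET')
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise RuntimeError('Missing environment variables: ' + ', '.join(missing))
    return env['FOURSQUARE_CLIENT_ID'], env['FOURSQUARE_CLIENT_SECRET']


def mask(secret, visible=4):
    """Show only the last few characters so a value can be confirmed without printing it."""
    return '*' * max(len(secret) - visible, 0) + secret[-visible:]


# Demonstration with a plain dict instead of the real environment
client_id, client_secret = load_credentials(
    {'FOURSQUARE_CLIENT_ID': 'example-id', 'FOURSQUARE_CLIENT_SECRET': 'example-secret'}
)
masked = mask(client_secret)
```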


### Creating a function to explore nearby venues

In [50]:
def nearby_venues(neighborhoods, latitudes, longitudes, radius=500, limit=100):
    # one list of venue tuples is collected per neighborhood
    venues_list = []

    for name, lat, long in zip(neighborhoods, latitudes, longitudes):
        # API request url
        url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID,
            CLIENT_SECRET,
            VERSION,
            lat,
            long,
            radius,
            limit)

        # GET request, keeping only the list of returned items
        results = requests.get(url).json()['response']['groups'][0]['items']

        # Returning information for each nearby venue
        venues_list.append([(
            name,
            lat,
            long,
            v['venue']['name'],
            v['venue']['location']['lat'],
            v['venue']['location']['lng'],
            v['venue']['categories'][0]['name']) for v in results])

    # flattening the per-neighborhood lists into a single dataframe
    # (named venues_df so it does not shadow the function name)
    venues_df = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    venues_df.columns = ['Neighborhood',
                  'Neighborhood Latitude',
                  'Neighborhood Longitude',
                  'Venue',
                  'Venue Latitude',
                  'Venue Longitude',
                  'Venue Category']

    return venues_df
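
Two parts of `nearby_venues` are fragile: the query string is assembled by hand, and the response is indexed without checking that the expected keys exist. The sketch below shows one way to handle both, using `urllib.parse.urlencode` for the query and a small parser that tolerates venues without categories. `build_explore_url` and `parse_items` are hypothetical helpers, and the sample payload only mimics the fields that the function above reads.

```python
from urllib.parse import urlencode, urlparse, parse_qs

EXPLORE_ENDPOINT = 'https://api.foursquare.com/v2/venues/explore'


def build_explore_url(client_id, client_secret, version, lat, lng, radius=500, limit=100):
    """Assemble the explore URL; urlencode takes care of escaping the values."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,
        'll': '{},{}'.format(lat, lng),
        'radius': radius,
        'limit': limit,
    }
    return EXPLORE_ENDPOINT + '?' + urlencode(params)


def parse_items(payload):
    """Return (name, lat, lng, category) tuples, with None for a missing category."""
    groups = payload.get('response', {}).get('groups', [])
    items = groups[0].get('items', []) if groups else []
    rows = []
    for item in items:
        venue = item['venue']
        categories = venue.get('categories') or []
        category = categories[0]['name'] if categories else None
        rows.append((venue['name'], venue['location']['lat'], venue['location']['lng'], category))
    return rows


url = build_explore_url('ID', 'SECRET', '20180605', 43.806686, -79.194353)
query = parse_qs(urlparse(url).query)

# Hand-written payload that imitates the structure read by nearby_venues
sample_payload = {'response': {'groups': [{'items': [
    {'venue': {'name': "Wendy's", 'location': {'lat': 43.807448, 'lng': -79.199056},
               'categories': [{'name': 'Fast Food Restaurant'}]}},
    {'venue': {'name': 'Uncategorised place', 'location': {'lat': 43.8, 'lng': -79.2},
               'categories': []}},
]}]}}
rows = parse_items(sample_payload)
empty = parse_items({'meta': {'code': 400}})
```

Returning an empty list for an unexpected payload lets the calling loop continue with the next neighborhood instead of stopping on a `KeyError` or `IndexError`.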

### Toronto's venues

In [51]:
toronto_venues = nearby_venues(toronto_latlong['Neighbourhood'],toronto_latlong['Latitude'], toronto_latlong['Longitude'])

In [53]:
toronto_venues.head()

| | Neighborhood | Neighborhood Latitude | Neighborhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|---|
| 0 | Rouge,Malvern | 43.806686 | -79.194353 | Wendy's | 43.807448 | -79.199056 | Fast Food Restaurant |
| 1 | Highland Creek,Rouge Hill,Port Union | 43.784535 | -79.160497 | Royal Canadian Legion | 43.782533 | -79.163085 | Bar |
| 2 | Guildwood,Morningside,West Hill | 43.763573 | -79.188711 | Swiss Chalet Rotisserie & Grill | 43.767697 | -79.189914 | Pizza Place |
| 3 | Guildwood,Morningside,West Hill | 43.763573 | -79.188711 | G & G Electronics | 43.765309 | -79.191537 | Electronics Store |
| 4 | Guildwood,Morningside,West Hill | 43.763573 | -79.188711 | Marina Spa | 43.766 | -79.191 | Spa |
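
Since venues are requested within `radius=500` metres of each postcode centre, a quick consistency check is to compute the great-circle distance between a neighborhood and one of its returned venues. The sketch uses the haversine formula with a mean Earth radius of 6,371 km, which is an approximation; `haversine_m` is a hypothetical helper name.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_M = 6371000  # mean Earth radius in metres (approximation)


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (latitude, longitude) points in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlambda = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))


# First row of the venues table: the Rouge,Malvern centre versus Wendy's
distance = haversine_m(43.806686, -79.194353, 43.807448, -79.199056)
within_radius = distance <= 500
```

For this row the distance comes out below 500 m, in line with the radius passed to the API.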


#### Checking how many venues were returned for each post code

In [68]:
toronto_venues.groupby('Neighborhood').count()[['Venue']].sort_values('Venue', ascending=False)

| Neighborhood | Venue |
|---|---|
| Adelaide,King,Richmond | 100 |
| Chinatown,Grange Park,Kensington Market | 100 |
| St. James Town | 100 |
| Ryerson,Garden District | 100 |
| First Canadian Place,Underground city | 100 |
| Design Exchange,Toronto Dominion Centre | 100 |
| Commerce Court,Victoria Hotel | 100 |
| Harbourfront East,Toronto Islands,Union Station | 100 |
| Stn A PO Boxes 25 The Esplanade | 95 |
| Church and Wellesley | 88 |


#### Finding out how many unique categories can be curated from all the returned venues

In [76]:
print('There are {} unique types of venues in toronto'.format(len(toronto_venues['Venue Category'].unique())))

There are 274 unique types of venues in toronto


### Analyzing each post code

In [88]:
# one hot encoding
toronto_onehot = pd.get_dummies(toronto_venues[['Venue Category']])

# adding column neighborhood
toronto_onehot['Neighborhood'] = toronto_venues['Neighborhood']

# moving the Neighborhood column from the last position to the first
onehot_aux = [toronto_onehot.columns[-1]]+ list(toronto_onehot.columns[:-1])
toronto_onehot = toronto_onehot[onehot_aux]
toronto_onehot.head()

| | Neighborhood | Venue Category_Accessories Store | ... | Venue Category_Bar | ... | Venue Category_Electronics Store | ... | Venue Category_Fast Food Restaurant | ... | Venue Category_Pizza Place | ... | Venue Category_Spa | ... | Venue Category_Yoga Studio |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Rouge,Malvern | 0 | ... | 0 | ... | 0 | ... | 1 | ... | 0 | ... | 0 | ... | 0 |
| 1 | Highland Creek,Rouge Hill,Port Union | 0 | ... | 1 | ... | 0 | ... | 0 | ... | 0 | ... | 0 | ... | 0 |
| 2 | Guildwood,Morningside,West Hill | 0 | ... | 0 | ... | 0 | ... | 0 | ... | 1 | ... | 0 | ... | 0 |
| 3 | Guildwood,Morningside,West Hill | 0 | ... | 0 | ... | 1 | ... | 0 | ... | 0 | ... | 0 | ... | 0 |
| 4 | Guildwood,Morningside,West Hill | 0 | ... | 0 | ... | 0 | ... | 0 | ... | 0 | ... | 1 | ... | 0 |
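
Each one-hot row marks the category of a single venue, so averaging the rows of one neighborhood yields the share of its venues that fall in each category. That arithmetic can be sketched in plain Python with `collections.Counter`; the helper `category_shares` is hypothetical, and the sample pairs are only the five venues listed in the `toronto_venues.head()` output, not the full dataset.

```python
from collections import Counter, defaultdict


def category_shares(pairs):
    """Map each neighborhood to {category: fraction of its venues in that category}."""
    counts = defaultdict(Counter)
    for neighborhood, category in pairs:
        counts[neighborhood][category] += 1
    return {
        name: {cat: n / sum(counter.values()) for cat, n in counter.items()}
        for name, counter in counts.items()
    }


pairs = [
    ('Rouge,Malvern', 'Fast Food Restaurant'),
    ('Highland Creek,Rouge Hill,Port Union', 'Bar'),
    ('Guildwood,Morningside,West Hill', 'Pizza Place'),
    ('Guildwood,Morningside,West Hill', 'Electronics Store'),
    ('Guildwood,Morningside,West Hill', 'Spa'),
]
shares = category_shares(pairs)
```

Categories that never occur in a neighborhood are simply absent from its dictionary here, whereas the one-hot table stores them as explicit zeros.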


In [91]:
toronto_grouped = toronto_onehot.groupby('Neighborhood').mean().reset_index()
toronto_grouped.head()

Unnamed: 0,Neighborhood,Venue Category_Accessories Store,Venue Category_Adult Boutique,Venue Category_Afghan Restaurant,Venue Category_Airport,Venue Category_Airport Food Court,Venue Category_Airport Gate,Venue Category_Airport Lounge,Venue Category_Airport Service,Venue Category_Airport Terminal,Venue Category_American Restaurant,Venue Category_Antique Shop,Venue Category_Aquarium,Venue Category_Arcade,Venue Category_Argentinian Restaurant,Venue Category_Art Gallery,Venue Category_Art Museum,Venue Category_Arts & Crafts Store,Venue Category_Asian Restaurant,Venue Category_Athletics & Sports,Venue Category_Auto Garage,Venue Category_Auto Workshop,Venue Category_BBQ Joint,Venue Category_Baby Store,Venue Category_Bagel Shop,Venue Category_Bakery,Venue Category_Bank,Venue Category_Bar,Venue Category_Baseball Field,Venue Category_Baseball Stadium,Venue Category_Basketball Stadium,Venue Category_Beach,Venue Category_Bed & Breakfast,Venue Category_Beer Bar,Venue Category_Beer Store,Venue Category_Belgian Restaurant,Venue Category_Bike Shop,Venue Category_Bistro,Venue Category_Boat or Ferry,Venue Category_Bookstore,Venue Category_Boutique,Venue Category_Brazilian Restaurant,Venue Category_Breakfast Spot,Venue Category_Brewery,Venue Category_Bridal Shop,Venue Category_Bubble Tea Shop,Venue Category_Burger Joint,Venue Category_Burrito Place,Venue Category_Bus Line,Venue Category_Bus Station,Venue Category_Business Service,Venue Category_Butcher,Venue Category_Cafeteria,Venue Category_Café,Venue Category_Cajun / Creole Restaurant,Venue Category_Camera Store,Venue Category_Candy Store,Venue Category_Caribbean Restaurant,Venue Category_Cheese Shop,Venue Category_Chinese Restaurant,Venue Category_Chocolate Shop,Venue Category_Church,Venue Category_Climbing Gym,Venue Category_Clothing Store,Venue Category_Cocktail Bar,Venue Category_Coffee Shop,Venue Category_College Arts Building,Venue Category_College Auditorium,Venue Category_College Cafeteria,Venue Category_College Gym,Venue 
Category_College Rec Center,Venue Category_College Stadium,Venue Category_Colombian Restaurant,Venue Category_Comfort Food Restaurant,Venue Category_Comic Shop,Venue Category_Concert Hall,Venue Category_Construction & Landscaping,Venue Category_Convenience Store,Venue Category_Cosmetics Shop,Venue Category_Coworking Space,Venue Category_Creperie,Venue Category_Cuban Restaurant,Venue Category_Cupcake Shop,Venue Category_Curling Ice,Venue Category_Dance Studio,Venue Category_Deli / Bodega,Venue Category_Department Store,Venue Category_Dessert Shop,Venue Category_Dim Sum Restaurant,Venue Category_Diner,Venue Category_Discount Store,Venue Category_Dog Run,Venue Category_Doner Restaurant,Venue Category_Donut Shop,Venue Category_Drugstore,Venue Category_Dumpling Restaurant,Venue Category_Eastern European Restaurant,Venue Category_Electronics Store,Venue Category_Empanada Restaurant,Venue Category_Ethiopian Restaurant,Venue Category_Event Space,Venue Category_Falafel Restaurant,Venue Category_Farmers Market,Venue Category_Fast Food Restaurant,Venue Category_Field,Venue Category_Filipino Restaurant,Venue Category_Fish & Chips Shop,Venue Category_Fish Market,Venue Category_Flea Market,Venue Category_Flower Shop,Venue Category_Food,Venue Category_Food & Drink Shop,Venue Category_Food Court,Venue Category_Food Truck,Venue Category_Fountain,Venue Category_French Restaurant,Venue Category_Fried Chicken Joint,Venue Category_Frozen Yogurt Shop,Venue Category_Fruit & Vegetable Store,Venue Category_Furniture / Home Store,Venue Category_Gaming Cafe,Venue Category_Garden,Venue Category_Garden Center,Venue Category_Gastropub,Venue Category_Gay Bar,Venue Category_General Entertainment,Venue Category_General Travel,Venue Category_German Restaurant,Venue Category_Gift Shop,Venue Category_Gluten-free Restaurant,Venue Category_Golf Course,Venue Category_Gourmet Shop,Venue Category_Greek Restaurant,Venue Category_Grocery Store,Venue Category_Gym,Venue Category_Gym / Fitness Center,Venue 
Category_Hakka Restaurant,Venue Category_Harbor / Marina,Venue Category_Health & Beauty Service,Venue Category_Health Food Store,Venue Category_Historic Site,Venue Category_History Museum,Venue Category_Hobby Shop,Venue Category_Hockey Arena,Venue Category_Hookah Bar,Venue Category_Hospital,Venue Category_Hostel,Venue Category_Hotel,Venue Category_Hotel Bar,Venue Category_Housing Development,Venue Category_Ice Cream Shop,Venue Category_Indian Restaurant,Venue Category_Indie Movie Theater,Venue Category_Indonesian Restaurant,Venue Category_Insurance Office,Venue Category_Intersection,Venue Category_Irish Pub,Venue Category_Italian Restaurant,Venue Category_Japanese Restaurant,Venue Category_Jazz Club,Venue Category_Jewelry Store,Venue Category_Jewish Restaurant,Venue Category_Juice Bar,Venue Category_Korean Restaurant,Venue Category_Lake,Venue Category_Latin American Restaurant,Venue Category_Light Rail Station,Venue Category_Lingerie Store,Venue Category_Liquor Store,Venue Category_Lounge,Venue Category_Luggage Store,Venue Category_Mac & Cheese Joint,Venue Category_Malay Restaurant,Venue Category_Market,Venue Category_Martial Arts Dojo,Venue Category_Massage Studio,Venue Category_Medical Center,Venue Category_Mediterranean Restaurant,Venue Category_Men's Store,Venue Category_Metro Station,Venue Category_Mexican Restaurant,Venue Category_Middle Eastern Restaurant,Venue Category_Miscellaneous Shop,Venue Category_Mobile Phone Shop,Venue Category_Modern European Restaurant,Venue Category_Molecular Gastronomy Restaurant,Venue Category_Monument / Landmark,Venue Category_Motel,Venue Category_Movie Theater,Venue Category_Moving Target,Venue Category_Museum,Venue Category_Music Store,Venue Category_Music Venue,Venue Category_Neighborhood,Venue Category_New American Restaurant,Venue Category_Nightclub,Venue Category_Noodle House,Venue Category_Office,Venue Category_Opera House,Venue Category_Optical Shop,Venue Category_Organic Grocery,Venue Category_Other Great 
Outdoors,Venue Category_Park,Venue Category_Performing Arts Venue,Venue Category_Pet Store,Venue Category_Pharmacy,Venue Category_Pizza Place,Venue Category_Plane,Venue Category_Playground,Venue Category_Plaza,Venue Category_Poke Place,Venue Category_Pool,Venue Category_Portuguese Restaurant,Venue Category_Poutine Place,Venue Category_Pub,Venue Category_Ramen Restaurant,Venue Category_Record Shop,Venue Category_Recording Studio,Venue Category_Rental Car Location,Venue Category_Restaurant,Venue Category_River,Venue Category_Rock Climbing Spot,Venue Category_Sake Bar,Venue Category_Salad Place,Venue Category_Salon / Barbershop,Venue Category_Sandwich Place,Venue Category_Scenic Lookout,Venue Category_Sculpture Garden,Venue Category_Seafood Restaurant,Venue Category_Shoe Store,Venue Category_Shopping Mall,Venue Category_Skate Park,Venue Category_Skating Rink,Venue Category_Smoke Shop,Venue Category_Smoothie Shop,Venue Category_Snack Place,Venue Category_Soccer Field,Venue Category_Social Club,Venue Category_Soup Place,Venue Category_Southern / Soul Food Restaurant,Venue Category_Spa,Venue Category_Speakeasy,Venue Category_Sporting Goods Shop,Venue Category_Sports Bar,Venue Category_Stadium,Venue Category_Stationery Store,Venue Category_Steakhouse,Venue Category_Strip Club,Venue Category_Supermarket,Venue Category_Supplement Shop,Venue Category_Sushi Restaurant,Venue Category_Swim School,Venue Category_Taco Place,Venue Category_Tailor Shop,Venue Category_Taiwanese Restaurant,Venue Category_Tanning Salon,Venue Category_Tapas Restaurant,Venue Category_Tea Room,Venue Category_Tennis Court,Venue Category_Thai Restaurant,Venue Category_Theater,Venue Category_Theme Restaurant,Venue Category_Thrift / Vintage Store,Venue Category_Toy / Game Store,Venue Category_Trail,Venue Category_Train Station,Venue Category_Vegetarian / Vegan Restaurant,Venue Category_Video Game Store,Venue Category_Video Store,Venue Category_Vietnamese Restaurant,Venue Category_Warehouse Store,Venue 
Category_Wine Bar,Venue Category_Wings Joint,Venue Category_Women's Store,Venue Category_Yoga Studio
0,"Adelaide,King,Richmond",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.03,0.0,0.03,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.02,0.0,0.0,0.0,0.03,0.01,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.06,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.02,0.0,0.0,0.02,0.0,0.0,0.0,0.01,0.0,0.01,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.03,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.02,0.01,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.02,0.0,0.0,0.0,0.02,0.01,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0
1,Agincourt,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,"Agincourt North,L'Amoreaux East,Milliken,Steel...",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,"Albion Gardens,Beaumond Heights,Humbergate,Jam...",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.222222,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,"Alderwood,Long Branch",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.2,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


### Discovering the top 10 venues for each post code

In [96]:
num = 10  # number of venues to list per neighborhood

for neigh in toronto_grouped['Neighborhood']:
    print('____'+neigh+'____')
    # transposing the neighborhood's row so that each venue category becomes a row
    temp = toronto_grouped[toronto_grouped['Neighborhood']==neigh].T.reset_index()
    temp.columns = ['Venue', 'Frequency']
    temp = temp.iloc[1:]  # dropping the 'Neighborhood' row
    temp['Frequency'] = temp['Frequency'].astype('float')
    temp = temp.round({'Frequency':2})
    print(temp.sort_values('Frequency', ascending=False).reset_index(drop=True).head(num))

____Adelaide,King,Richmond____
                                Venue  Frequency
0          Venue Category_Coffee Shop       0.06
1                 Venue Category_Café       0.05
2      Venue Category_Thai Restaurant       0.04
3  Venue Category_American Restaurant       0.04
4           Venue Category_Steakhouse       0.04
5                  Venue Category_Bar       0.03
6               Venue Category_Bakery       0.03
7                  Venue Category_Gym       0.03
8                Venue Category_Hotel       0.03
9         Venue Category_Burger Joint       0.03
____Agincourt____
                                            Venue  Frequency
0                   Venue Category_Sandwich Place       0.25
1                           Venue Category_Lounge       0.25
2                   Venue Category_Breakfast Spot       0.25
3                     Venue Category_Skating Rink       0.25
4               Venue Category_Miscellaneous Shop       0.00
5                            Venue Category_Mo

### Creating a dataframe
#### Writing a function that sorts the venues in descending order of frequency

In [97]:
def return_most_common_venues(row, num):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    # slicing with the function's own argument instead of the global num_top_venues
    return row_categories_sorted.index.values[0:num]
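
The helper can be checked on a single toy row before it is applied to `toronto_grouped`. The sketch below is only an illustration: it assumes the row starts with the neighborhood name followed by one frequency per venue category, the row values are made up, and the function is restated so that it relies only on its own `num` argument.

```python
import pandas as pd

def return_most_common_venues(row, num):
    # skip the first entry (the neighborhood name) and rank the remaining frequencies
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    return row_categories_sorted.index.values[0:num]

# hypothetical row, shaped like one row of toronto_grouped
row = pd.Series({
    'Neighborhood': 'Example',
    'Venue Category_Park': 0.5,
    'Venue Category_Café': 0.3,
    'Venue Category_Pub': 0.2,
})

top2 = return_most_common_venues(row, 2)
print(list(top2))
```

Categories with equal frequency come back in no guaranteed order, which is a likely reason why tied entries can appear in a different order in the printed lists and in the sorted dataframe.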

#### Creating the new dataframe and displaying the top 10 venues for each neighborhood

In [98]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:  # no 'st'/'nd'/'rd' suffix left, so fall back to 'th'
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = toronto_grouped['Neighborhood']

for ind in np.arange(toronto_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(toronto_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,"Adelaide,King,Richmond",Venue Category_Coffee Shop,Venue Category_Café,Venue Category_Steakhouse,Venue Category_Thai Restaurant,Venue Category_American Restaurant,Venue Category_Hotel,Venue Category_Gym,Venue Category_Bakery,Venue Category_Bar,Venue Category_Burger Joint
1,Agincourt,Venue Category_Breakfast Spot,Venue Category_Lounge,Venue Category_Sandwich Place,Venue Category_Skating Rink,Venue Category_Donut Shop,Venue Category_Diner,Venue Category_Discount Store,Venue Category_Dog Run,Venue Category_Doner Restaurant,Venue Category_Yoga Studio
2,"Agincourt North,L'Amoreaux East,Milliken,Steel...",Venue Category_Park,Venue Category_Playground,Venue Category_Coffee Shop,Venue Category_Yoga Studio,Venue Category_Donut Shop,Venue Category_Dessert Shop,Venue Category_Dim Sum Restaurant,Venue Category_Diner,Venue Category_Discount Store,Venue Category_Dog Run
3,"Albion Gardens,Beaumond Heights,Humbergate,Jam...",Venue Category_Grocery Store,Venue Category_Pharmacy,Venue Category_Coffee Shop,Venue Category_Sandwich Place,Venue Category_Fast Food Restaurant,Venue Category_Fried Chicken Joint,Venue Category_Beer Store,Venue Category_Pizza Place,Venue Category_Dumpling Restaurant,Venue Category_Drugstore
4,"Alderwood,Long Branch",Venue Category_Pizza Place,Venue Category_Pool,Venue Category_Pub,Venue Category_Coffee Shop,Venue Category_Gym,Venue Category_Pharmacy,Venue Category_Athletics & Sports,Venue Category_Skating Rink,Venue Category_Sandwich Place,Venue Category_Department Store
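
The column labels above are built by indexing the `indicators` list and falling back to `th`, which is fine for 10 venues but would label a 21st column as `21th`. A minimal sketch of a more general suffix helper follows; the name `ordinal` is hypothetical and not part of the notebook.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st', 11 -> '11th'."""
    if 10 <= n % 100 <= 20:
        # 11th, 12th and 13th are irregular
        suffix = 'th'
    else:
        suffix = {1: 'st', 2: 'nd', 3: 'rd'}.get(n % 10, 'th')
    return '{}{}'.format(n, suffix)

# same column layout as in the notebook, for the top 10 venues
columns = ['Neighborhood'] + ['{} Most Common Venue'.format(ordinal(i)) for i in range(1, 11)]
print(columns[1], '...', columns[-1])
```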


### Clustering Neighborhoods
#### Running k-means to cluster the post codes into 5 clusters

In [99]:
# setting the number of clusters
kclusters = 5

# dropping the name column so that only the venue frequencies are clustered
toronto_grouped_clustering = toronto_grouped.drop(columns='Neighborhood')

# running k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(toronto_grouped_clustering)

# checking cluster labels generated for each row in the dataframe
kmeans.labels_[0:10]

array([2, 2, 1, 2, 2, 2, 2, 2, 2, 2], dtype=int32)
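
Nine of the ten labels shown are cluster 2, so it is worth checking how the rows are spread across clusters before interpreting them. A minimal sketch using only the ten labels printed above; in the notebook, `kmeans.labels_.tolist()` would cover every row.

```python
from collections import Counter

# the ten labels printed above
labels = [2, 2, 1, 2, 2, 2, 2, 2, 2, 2]

sizes = Counter(labels)
for cluster in sorted(sizes):
    share = sizes[cluster] / len(labels)
    print('cluster {}: {} rows ({:.0%})'.format(cluster, sizes[cluster], share))
```

If one cluster holds almost every row, a different value of `kclusters` may be worth trying.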

#### Creating a new dataframe that includes the cluster label as well as the top 10 venues for each neighborhood

In [108]:
# renaming the columns of toronto_latlong so that 'Neighborhood' matches the join key
toronto_latlong.columns = ['Postcode', 'Borough', 'Neighborhood', 'Latitude', 'Longitude']

# adding the cluster labels to the sorted venues
# (assumed step: it is missing from this cell, but the output below has a 'Cluster Labels' column)
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

# joining the sorted venues onto toronto_latlong, which holds latitude/longitude for each neighborhood
toronto_merged = toronto_latlong.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

# checking the last columns!
toronto_merged.head()

Unnamed: 0,Postcode,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M1B,Scarborough,"Rouge,Malvern",43.806686,-79.194353,2.0,Venue Category_Fast Food Restaurant,Venue Category_Drugstore,Venue Category_Dim Sum Restaurant,Venue Category_Diner,Venue Category_Discount Store,Venue Category_Dog Run,Venue Category_Doner Restaurant,Venue Category_Donut Shop,Venue Category_Dumpling Restaurant,Venue Category_College Gym
1,M1C,Scarborough,"Highland Creek,Rouge Hill,Port Union",43.784535,-79.160497,3.0,Venue Category_Bar,Venue Category_Dumpling Restaurant,Venue Category_Diner,Venue Category_Discount Store,Venue Category_Dog Run,Venue Category_Doner Restaurant,Venue Category_Donut Shop,Venue Category_Drugstore,Venue Category_Yoga Studio,Venue Category_Dessert Shop
2,M1E,Scarborough,"Guildwood,Morningside,West Hill",43.763573,-79.188711,2.0,Venue Category_Medical Center,Venue Category_Breakfast Spot,Venue Category_Rental Car Location,Venue Category_Mexican Restaurant,Venue Category_Intersection,Venue Category_Pizza Place,Venue Category_Spa,Venue Category_Electronics Store,Venue Category_Eastern European Restaurant,Venue Category_Dumpling Restaurant
3,M1G,Scarborough,Woburn,43.770992,-79.216917,0.0,Venue Category_Coffee Shop,Venue Category_Korean Restaurant,Venue Category_Insurance Office,Venue Category_Drugstore,Venue Category_Dim Sum Restaurant,Venue Category_Diner,Venue Category_Discount Store,Venue Category_Dog Run,Venue Category_Doner Restaurant,Venue Category_Donut Shop
4,M1H,Scarborough,Cedarbrae,43.773136,-79.239476,2.0,Venue Category_Athletics & Sports,Venue Category_Fried Chicken Joint,Venue Category_Hakka Restaurant,Venue Category_Thai Restaurant,Venue Category_Bakery,Venue Category_Caribbean Restaurant,Venue Category_Bank,Venue Category_Doner Restaurant,Venue Category_Diner,Venue Category_Discount Store
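
The cluster labels appear as floats (`2.0`, `3.0`) in the joined dataframe. That usually means some rows found no match in the join and received `NaN`, which forces the column to `float64`. The sketch below reproduces the effect on made-up frames and shows one way to get integer labels back, assuming that dropping the unmatched rows is acceptable.

```python
import pandas as pd

# hypothetical stand-ins for toronto_latlong and neighborhoods_venues_sorted
latlong = pd.DataFrame({
    'Neighborhood': ['A', 'B', 'C'],
    'Latitude': [43.80, 43.78, 43.76],
    'Longitude': [-79.19, -79.16, -79.18],
})
venues = pd.DataFrame({
    'Neighborhood': ['A', 'C'],
    'Cluster Labels': [2, 0],
    '1st Most Common Venue': ['Park', 'Café'],
})

merged = latlong.join(venues.set_index('Neighborhood'), on='Neighborhood')
print(merged['Cluster Labels'].dtype)  # 'B' has no match, so the column becomes float

# keep only the rows that received a label and restore the integer dtype
clean = merged.dropna(subset=['Cluster Labels']).copy()
clean['Cluster Labels'] = clean['Cluster Labels'].astype(int)
print(clean)
```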


### Visualizing clusters

In [113]:
# Discovering Toronto's Latitude and Longitude
address = 'Toronto, ON, Canada'

geolocator = Nominatim(user_agent="explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Toronto, ON, Canada are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Toronto, ON, Canada are 43.653963, -79.387207.


In [114]:
# creating map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(toronto_merged['Latitude'], toronto_merged['Longitude'], toronto_merged['Neighborhood'], toronto_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        #color=rainbow[cluster-1],
        fill=True,
        #fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters
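
Both `color` arguments are commented out in the cell above, so every marker is drawn in the default color. One likely obstacle is that `Cluster Labels` holds floats (and possibly `NaN`) after the join, and a float cannot index the `rainbow` list. A minimal sketch of a lookup that tolerates both cases; `cluster_color` is a hypothetical helper and the palette is a placeholder for `rainbow`.

```python
import math

def cluster_color(cluster, palette, default='#808080'):
    """Map a cluster label (int, float or NaN) to a hex color from palette."""
    if cluster is None or (isinstance(cluster, float) and math.isnan(cluster)):
        # rows without a cluster label get a neutral color
        return default
    return palette[int(cluster) % len(palette)]

# placeholder palette with one hex color per cluster
palette = ['#8000ff', '#00b5eb', '#80ffb4', '#ffb360', '#ff0000']

for label in [0.0, 2.0, 4, float('nan')]:
    print(label, cluster_color(label, palette))
```

In the marker loop this would be passed as `color=cluster_color(cluster, rainbow)` and `fill_color=cluster_color(cluster, rainbow)`.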

### Examining clusters

#### Examining each cluster and determining the venue categories that distinguish it

In [115]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 0, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
3,Scarborough,0.0,Venue Category_Coffee Shop,Venue Category_Korean Restaurant,Venue Category_Insurance Office,Venue Category_Drugstore,Venue Category_Dim Sum Restaurant,Venue Category_Diner,Venue Category_Discount Store,Venue Category_Dog Run,Venue Category_Doner Restaurant,Venue Category_Donut Shop
86,Mississauga,0.0,Venue Category_Hotel,Venue Category_Coffee Shop,Venue Category_American Restaurant,Venue Category_Gym / Fitness Center,Venue Category_Sandwich Place,Venue Category_Mediterranean Restaurant,Venue Category_Burrito Place,Venue Category_Fried Chicken Joint,Venue Category_Diner,Venue Category_Discount Store


In [116]:

toronto_merged.loc[toronto_merged['Cluster Labels'] == 2, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Scarborough,2.0,Venue Category_Fast Food Restaurant,Venue Category_Drugstore,Venue Category_Dim Sum Restaurant,Venue Category_Diner,Venue Category_Discount Store,Venue Category_Dog Run,Venue Category_Doner Restaurant,Venue Category_Donut Shop,Venue Category_Dumpling Restaurant,Venue Category_College Gym
2,Scarborough,2.0,Venue Category_Medical Center,Venue Category_Breakfast Spot,Venue Category_Rental Car Location,Venue Category_Mexican Restaurant,Venue Category_Intersection,Venue Category_Pizza Place,Venue Category_Spa,Venue Category_Electronics Store,Venue Category_Eastern European Restaurant,Venue Category_Dumpling Restaurant
4,Scarborough,2.0,Venue Category_Athletics & Sports,Venue Category_Fried Chicken Joint,Venue Category_Hakka Restaurant,Venue Category_Thai Restaurant,Venue Category_Bakery,Venue Category_Caribbean Restaurant,Venue Category_Bank,Venue Category_Doner Restaurant,Venue Category_Diner,Venue Category_Discount Store
5,Scarborough,2.0,Venue Category_Convenience Store,Venue Category_Playground,Venue Category_Yoga Studio,Venue Category_Dessert Shop,Venue Category_Dim Sum Restaurant,Venue Category_Diner,Venue Category_Discount Store,Venue Category_Dog Run,Venue Category_Doner Restaurant,Venue Category_Donut Shop
6,Scarborough,2.0,Venue Category_Department Store,Venue Category_Bus Station,Venue Category_Coffee Shop,Venue Category_Discount Store,Venue Category_Drugstore,Venue Category_Diner,Venue Category_Dog Run,Venue Category_Doner Restaurant,Venue Category_Donut Shop,Venue Category_Dumpling Restaurant
7,Scarborough,2.0,Venue Category_Bakery,Venue Category_Bus Line,Venue Category_Park,Venue Category_Intersection,Venue Category_Fast Food Restaurant,Venue Category_Metro Station,Venue Category_Soccer Field,Venue Category_Construction & Landscaping,Venue Category_Convenience Store,Venue Category_Ethiopian Restaurant
8,Scarborough,2.0,Venue Category_Motel,Venue Category_American Restaurant,Venue Category_Department Store,Venue Category_Dessert Shop,Venue Category_Dim Sum Restaurant,Venue Category_Diner,Venue Category_Discount Store,Venue Category_Dog Run,Venue Category_Doner Restaurant,Venue Category_Yoga Studio
9,Scarborough,2.0,Venue Category_General Entertainment,Venue Category_College Stadium,Venue Category_Café,Venue Category_Skating Rink,Venue Category_Yoga Studio,Venue Category_Dim Sum Restaurant,Venue Category_Diner,Venue Category_Discount Store,Venue Category_Dog Run,Venue Category_Doner Restaurant
10,Scarborough,2.0,Venue Category_Indian Restaurant,Venue Category_Pet Store,Venue Category_Vietnamese Restaurant,Venue Category_Latin American Restaurant,Venue Category_Chinese Restaurant,Venue Category_Doner Restaurant,Venue Category_Dessert Shop,Venue Category_Dim Sum Restaurant,Venue Category_Diner,Venue Category_Discount Store
11,Scarborough,2.0,Venue Category_Auto Garage,Venue Category_Middle Eastern Restaurant,Venue Category_Smoke Shop,Venue Category_Breakfast Spot,Venue Category_Shopping Mall,Venue Category_Sandwich Place,Venue Category_Bakery,Venue Category_Doner Restaurant,Venue Category_Discount Store,Venue Category_Dog Run


In [117]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 3, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,Scarborough,3.0,Venue Category_Bar,Venue Category_Dumpling Restaurant,Venue Category_Diner,Venue Category_Discount Store,Venue Category_Dog Run,Venue Category_Doner Restaurant,Venue Category_Donut Shop,Venue Category_Drugstore,Venue Category_Yoga Studio,Venue Category_Dessert Shop


In [118]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 4, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
91,Etobicoke,4.0,Venue Category_Baseball Field,Venue Category_Yoga Studio,Venue Category_Dumpling Restaurant,Venue Category_Diner,Venue Category_Discount Store,Venue Category_Dog Run,Venue Category_Doner Restaurant,Venue Category_Donut Shop,Venue Category_Drugstore,Venue Category_Eastern European Restaurant
97,North York,4.0,Venue Category_Construction & Landscaping,Venue Category_Baseball Field,Venue Category_Yoga Studio,Venue Category_Dumpling Restaurant,Venue Category_Diner,Venue Category_Discount Store,Venue Category_Dog Run,Venue Category_Doner Restaurant,Venue Category_Donut Shop,Venue Category_Drugstore
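
The same selection is repeated for every cluster with hard-coded column positions. A small helper can select the columns by name instead, so the cell keeps working if a column is added or moved. This is a sketch: `cluster_view` is a hypothetical name, and the toy frame reuses three rows from the output above with only the first venue column.

```python
import pandas as pd

def cluster_view(df, label, first_col='Cluster Labels'):
    """Return Borough plus every column from first_col onward for one cluster."""
    start = df.columns.get_loc(first_col)
    cols = ['Borough'] + list(df.columns[start:])
    return df.loc[df['Cluster Labels'] == label, cols]

# toy frame with the same leading columns as toronto_merged
toy = pd.DataFrame({
    'Postcode': ['M1B', 'M1C', 'M1E'],
    'Borough': ['Scarborough', 'Scarborough', 'Scarborough'],
    'Neighborhood': ['Rouge,Malvern', 'Highland Creek,Rouge Hill,Port Union', 'Guildwood,Morningside,West Hill'],
    'Latitude': [43.806686, 43.784535, 43.763573],
    'Longitude': [-79.194353, -79.160497, -79.188711],
    'Cluster Labels': [2.0, 3.0, 2.0],
    '1st Most Common Venue': ['Venue Category_Fast Food Restaurant', 'Venue Category_Bar', 'Venue Category_Medical Center'],
})

view = cluster_view(toy, 2)
print(view)
```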
