# The Battle of the Neighborhoods 
## (Week 2)


## Introduction: Business Problem <a name="introduction"></a>

In this project we will try to find an optimal location for a restaurant. Specifically, this report will be targeted to stakeholders interested in opening an **restaurant** in **Toronto**, Canada.

Here we will try finding if someone wants to open a new restaurant in the city which location is best suited for it keeping in mind the competitors and which income group of people will be attracted most to it based on the **population of the neighbourhood**.

Since there are lots of restaurants in Toronto we will try to detect **locations that are not already crowded with restaurants**. We would also prefer locations **as close to city center as possible**, assuming that first two conditions are met.

We will use our data science powers to generate a few most promissing neighborhoods based on this criteria. Advantages of each area will then be clearly expressed so that best possible final location can be chosen by stakeholders.

## Data <a name="data"></a>

Based on definition of our problem, factors that will influence our decission are:
* All existing restaurants in the neighborhood (any type of restaurant)
* Age group of people with their income
* Distance of neighborhood from city center

We decided to use regularly spaced grid of locations, centered around city center, to define our neighborhoods.

Following data sources will be needed to extract/generate the required information:
* centers of candidate areas will be generated algorithmically and approximate addresses of centers of those areas will be obtained using **https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M**
* number of restaurants and their type and location in every neighborhood will be obtained using **Foursquare API**

In [1]:
import numpy as np
import pandas as pd 
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)
pd.set_option('display.expand_frame_repr', False)


In [2]:
# define the dataframe columns
column_names = ['Postal_Code','Borough', 'Neighborhood'] 

bn = pd.DataFrame(columns=column_names)

## 1. Download and Explore Dataset

In [3]:
from urllib.request import urlopen
wiki = "https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M"

page = urlopen(wiki)

from bs4 import BeautifulSoup
soup = BeautifulSoup(page, "lxml")
print(soup.prettify())

<!DOCTYPE html>
<html class="client-nojs" dir="ltr" lang="en">
 <head>
  <meta charset="utf-8"/>
  <title>
   List of postal codes of Canada: M - Wikipedia
  </title>
  <script>
   document.documentElement.className="client-js";RLCONF={"wgBreakFrames":!1,"wgSeparatorTransformTable":["",""],"wgDigitTransformTable":["",""],"wgDefaultDateFormat":"dmy","wgMonthNames":["","January","February","March","April","May","June","July","August","September","October","November","December"],"wgRequestId":"XptG7wpAMNAAAUaw2@sAAAES","wgCSPNonce":!1,"wgCanonicalNamespace":"","wgCanonicalSpecialPageName":!1,"wgNamespaceNumber":0,"wgPageName":"List_of_postal_codes_of_Canada:_M","wgTitle":"List of postal codes of Canada: M","wgCurRevisionId":951325562,"wgRevisionId":951325562,"wgArticleId":539066,"wgIsArticle":!0,"wgIsRedirect":!1,"wgAction":"view","wgUserName":null,"wgUserGroups":["*"],"wgCategories":["Articles with short description","Communications in Ontario","Postal codes in Canada","Toronto","Ontario

In [4]:
Toronto=soup.find('table', class_='wikitable sortable')
Toronto

<table class="wikitable sortable">
<tbody><tr>
<th>Postal code
</th>
<th>Borough
</th>
<th>Neighborhood
</th></tr>
<tr>
<td>M1A
</td>
<td>Not assigned
</td>
<td>
</td></tr>
<tr>
<td>M2A
</td>
<td>Not assigned
</td>
<td>
</td></tr>
<tr>
<td>M3A
</td>
<td>North York
</td>
<td>Parkwoods
</td></tr>
<tr>
<td>M4A
</td>
<td>North York
</td>
<td>Victoria Village
</td></tr>
<tr>
<td>M5A
</td>
<td>Downtown Toronto
</td>
<td>Regent Park / Harbourfront
</td></tr>
<tr>
<td>M6A
</td>
<td>North York
</td>
<td>Lawrence Manor / Lawrence Heights
</td></tr>
<tr>
<td>M7A
</td>
<td>Downtown Toronto
</td>
<td>Queen's Park / Ontario Provincial Government
</td></tr>
<tr>
<td>M8A
</td>
<td>Not assigned
</td>
<td>
</td></tr>
<tr>
<td>M9A
</td>
<td>Etobicoke
</td>
<td>Islington Avenue
</td></tr>
<tr>
<td>M1B
</td>
<td>Scarborough
</td>
<td>Malvern / Rouge
</td></tr>
<tr>
<td>M2B
</td>
<td>Not assigned
</td>
<td>
</td></tr>
<tr>
<td>M3B
</td>
<td>North York
</td>
<td>Don Mills
</td></tr>
<tr>
<td>M4B
</td>
<td>Ea

In [5]:
#Generate lists
Pos=[]
Bor=[]
Neig=[]

for row in Toronto.findAll("tr"):
    cells = row.findAll('td')
    if len(cells)==3: 
        Pos.append(cells[0].find(text=True))
        Bor.append(cells[1].find(text=True))
        Neig.append(cells[2].find(text=True))

        
#Add Data to our DataFrame
bn['Postal_Code']=Pos
bn['Borough']=Bor
bn['Neighborhood']=Neig

bn

Unnamed: 0,Postal_Code,Borough,Neighborhood
0,M1A,Not assigned,
1,M2A,Not assigned,
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,Regent Park / Harbourfront
5,M6A,North York,Lawrence Manor / Lawrence Heights
6,M7A,Downtown Toronto,Queen's Park / Ontario Provincial Government
7,M8A,Not assigned,
8,M9A,Etobicoke,Islington Avenue
9,M1B,Scarborough,Malvern / Rouge


### Data Cleaning

If Borough is Not Assigned drop row.

Reset Index

In [6]:
bn = bn.drop(bn[bn['Borough'].str.contains("Not assigned")==True].index, axis=0, inplace=False)

bn.index = pd.RangeIndex(len(bn.index))
bn

Unnamed: 0,Postal_Code,Borough,Neighborhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,Regent Park / Harbourfront
3,M6A,North York,Lawrence Manor / Lawrence Heights
4,M7A,Downtown Toronto,Queen's Park / Ontario Provincial Government
5,M9A,Etobicoke,Islington Avenue
6,M1B,Scarborough,Malvern / Rouge
7,M3B,North York,Don Mills
8,M4B,East York,Parkview Hill / Woodbine Gardens
9,M5B,Downtown Toronto,"Garden District, Ryerson"


In [7]:
bn.shape

(103, 3)

In [8]:
column_n = ['NEBRVAL'] 
bn_NA = pd.DataFrame(columns=column_n)
NA=['Not assigned'] 
bn_NA['NEBRVAL'] = NA

bn1=bn

for row_index,row in bn.iterrows():
    if((bn.loc[row_index,['Neighborhood']].values.astype('str') == 'Not assigned') or (bn.loc[row_index,['Neighborhood']].values.astype('str') == 'Not assigned\n')):
       
        bn1.loc[row_index,['Neighborhood']] = bn1.loc[row_index,['Borough']].values.astype('str') 
        
#Reset Index
#nbr.index = pd.RangeIndex(len(nbr.index))

bn1

Unnamed: 0,Postal_Code,Borough,Neighborhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,Regent Park / Harbourfront
3,M6A,North York,Lawrence Manor / Lawrence Heights
4,M7A,Downtown Toronto,Queen's Park / Ontario Provincial Government
5,M9A,Etobicoke,Islington Avenue
6,M1B,Scarborough,Malvern / Rouge
7,M3B,North York,Don Mills
8,M4B,East York,Parkview Hill / Woodbine Gardens
9,M5B,Downtown Toronto,"Garden District, Ryerson"


In [9]:
bn2=bn1.groupby('Postal_Code').agg({'Borough':'first',
                               'Neighborhood': ', '.join}).reset_index()

column_names = ['Postal_Code','Borough', 'Neighborhood'] 
bn3 = pd.DataFrame(columns=column_names)

bn3 = bn2.drop(bn2[bn2['Borough'].str.contains("Toronto")==False].index, axis=0, inplace=False)

#Reset Index
bn3.index = pd.RangeIndex(len(bn3.index))
bn3

Unnamed: 0,Postal_Code,Borough,Neighborhood
0,M4E,East Toronto,The Beaches\n
1,M4K,East Toronto,The Danforth West / Riverdale\n
2,M4L,East Toronto,India Bazaar / The Beaches West\n
3,M4M,East Toronto,Studio District\n
4,M4N,Central Toronto,Lawrence Park\n
5,M4P,Central Toronto,Davisville North\n
6,M4R,Central Toronto,North Toronto West\n
7,M4S,Central Toronto,Davisville\n
8,M4T,Central Toronto,Moore Park / Summerhill East\n
9,M4V,Central Toronto,Summerhill West / Rathnelly / South Hill / For...


In [10]:
column = ['Postal_Code','Borough', 'Neighborhood'] 
bn_ung = pd.DataFrame(columns=column_names)

bn_ung = bn1.drop(bn1[bn1['Borough'].str.contains("Toronto")==False].index, axis=0, inplace=False)


bn_ung.index = pd.RangeIndex(len(bn_ung.index))
bn_ung

Unnamed: 0,Postal_Code,Borough,Neighborhood
0,M5A,Downtown Toronto,Regent Park / Harbourfront
1,M7A,Downtown Toronto,Queen's Park / Ontario Provincial Government
2,M5B,Downtown Toronto,"Garden District, Ryerson"
3,M5C,Downtown Toronto,St. James Town
4,M4E,East Toronto,The Beaches
5,M5E,Downtown Toronto,Berczy Park
6,M5G,Downtown Toronto,Central Bay Street
7,M6G,Downtown Toronto,Christie
8,M5H,Downtown Toronto,Richmond / Adelaide / King
9,M6H,West Toronto,Dufferin / Dovercourt Village


In [11]:
#!conda install -c conda-forge geopy --yes 
import time
from geopy.geocoders import Nominatim

In [12]:
from geopy.util import get_version
get_version()

'1.18.1'

In [13]:
geolocator = Nominatim(scheme='http', user_agent="ES1234")

for row_index, item in bn_ung.iterrows():
    
    list1 = bn_ung.loc[[row_index],['Neighborhood']].values.astype('str')
    loc = ' , Toronto, Ontario, Canada'
    list1.astype('str')
    list1 = np.append(list1, loc)
    latitude = None
    longitude = None
    location = None
    
    location = geolocator.geocode(list1 , limit = 15)
    time.sleep(5)
    if(location is not None):
        bn_ung.loc[bn_ung.index[row_index], 'Latitude'] = location.latitude
        bn_ung.loc[bn_ung.index[row_index], 'Longitude'] = location.longitude
          

In [14]:
print(bn_ung)

   Postal_Code            Borough                                       Neighborhood   Latitude  Longitude
0         M5A
  Downtown Toronto
                        Regent Park / Harbourfront
        NaN        NaN
1         M7A
  Downtown Toronto
      Queen's Park / Ontario Provincial Government
        NaN        NaN
2         M5B
  Downtown Toronto
                          Garden District, Ryerson
        NaN        NaN
3         M5C
  Downtown Toronto
                                    St. James Town
  43.669403 -79.372704
4         M4E
      East Toronto
                                       The Beaches
  43.671024 -79.296712
5         M5E
  Downtown Toronto
                                       Berczy Park
  43.647984 -79.375396
6         M5G
  Downtown Toronto
                                Central Bay Street
        NaN        NaN
7         M6G
  Downtown Toronto
                                          Christie
  43.664111 -79.418405
8         M5H
  Downtown Toronto
    

In [15]:
import json 

import requests 
from pandas.io.json import json_normalize 

import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

# Importing to use the Foursquare API lab
!conda install -c conda-forge folium=0.5.0 --yes  #Uncomment if not installed
import folium 

Solving environment: done

## Package Plan ##

  environment location: /opt/conda/envs/Python36

  added / updated specs: 
    - folium=0.5.0


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    python_abi-3.6             |          1_cp36m           4 KB  conda-forge
    openssl-1.1.1g             |       h516909a_0         2.1 MB  conda-forge
    certifi-2020.4.5.1         |   py36h9f0ad1d_0         151 KB  conda-forge
    folium-0.5.0               |             py_0          45 KB  conda-forge
    ca-certificates-2020.4.5.1 |       hecc5488_0         146 KB  conda-forge
    vincent-0.4.4              |             py_1          28 KB  conda-forge
    branca-0.4.0               |             py_0          26 KB  conda-forge
    altair-4.1.0               |             py_1         614 KB  conda-forge
    ------------------------------------------------------------
                       

In [16]:
print('We have {} boroughs and {} neighborhoods.'.format(
        len(bn_ung['Borough'].unique()),
        bn_ung.shape[0]
    )
)

bn_ung.dropna(inplace =True)
bn_ung.index = pd.RangeIndex(len(bn_ung.index))

address = 'Toronto, Ontario, Canada'

geolocator = Nominatim(user_agent="ES1234")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geograpical coordinate of Toronto are {}, {}.'.format(latitude, longitude))

We have 4 boroughs and 39 neighborhoods.
The geograpical coordinate of Toronto are 43.6534817, -79.3839347.


In [17]:
bn_ung

Unnamed: 0,Postal_Code,Borough,Neighborhood,Latitude,Longitude
0,M5C,Downtown Toronto,St. James Town,43.669403,-79.372704
1,M4E,East Toronto,The Beaches,43.671024,-79.296712
2,M5E,Downtown Toronto,Berczy Park,43.647984,-79.375396
3,M6G,Downtown Toronto,Christie,43.664111,-79.418405
4,M6H,West Toronto,Dufferin / Dovercourt Village,43.660203,-79.435651
5,M4M,East Toronto,Studio District,43.649585,-79.390683
6,M4N,Central Toronto,Lawrence Park,43.729199,-79.403252
7,M5N,Central Toronto,Roselawn,43.710541,-79.401138
8,M4P,Central Toronto,Davisville North,43.704312,-79.388517
9,M5P,Central Toronto,Forest Hill North & West,43.693559,-79.413902


In [18]:
map_toronto = folium.Map(location=[latitude, longitude], zoom_start=10)

for lat, lng, borough, neighborhood in zip(bn_ung['Latitude'], bn_ung['Longitude'], bn_ung['Borough'], bn_ung['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto)  
    
map_toronto

### Lets use FOURSQUARE API to explore the neighbourhood

In [19]:
CLIENT_ID = 'KXETUQGZ3IPW3WQGA1GYH0OFCTXANZYLHSPTRTEOJ0GD5G5V' # your Foursquare ID
CLIENT_SECRET = '4PT23SZGLEO2GIEHPENT3DQ3ZNDEN3QF0SBAK1IQI41W1KDP' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

print('Successfully Logged-In')

Successfully Logged-In


In [20]:
bn_ung.loc[0]
neighborhood_latitude = np.float(bn_ung.loc[0,['Latitude']].values)
neighborhood_longitude =  np.float(bn_ung.loc[0,['Longitude']].values)

###  Now, let's get the top 100 venues that are in Harbour Square Park within a radius of 500 meters.

###### First, let's create the GET request URL. Name the URL url.

In [21]:
LIMIT = 100 # limit of number of venues returned by Foursquare API

radius = 500 # define radius

# create URL
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    neighborhood_latitude, 
    neighborhood_longitude, 
    radius, 
    LIMIT)


In [22]:
results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '5ea21a7b7828ae001b817bea'},
 'response': {'suggestedFilters': {'header': 'Tap to show:',
   'filters': [{'name': 'Open now', 'key': 'openNow'}]},
  'headerLocation': 'St. James Town',
  'headerFullLocation': 'St. James Town, Toronto',
  'headerLocationGranularity': 'neighborhood',
  'totalResults': 32,
  'suggestedBounds': {'ne': {'lat': 43.6739032045, 'lng': -79.36649453408027},
   'sw': {'lat': 43.664903195499996, 'lng': -79.37891366591974}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'reasons': {'count': 0,
       'items': [{'summary': 'This spot is popular',
         'type': 'general',
         'reasonName': 'globalInteractionReason'}]},
      'venue': {'id': '4b17c9f5f964a5202fc823e3',
       'name': 'Mr. Jerk',
       'location': {'address': '209 Wellesley St. E',
        'crossStreet': 'btwn Bleecker & Ontario',
        'lat': 43.66732847256732,
        'lng': -79.37338943621165,
        'labeledLatL

In [23]:
# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

Now we are ready to clean the json and structure it into a *pandas* dataframe.

In [24]:
venues = results['response']['groups'][0]['items']
    
nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues =nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head()

Unnamed: 0,name,categories,lat,lng
0,Mr. Jerk,Caribbean Restaurant,43.667328,-79.373389
1,Cranberries,Diner,43.667843,-79.369407
2,Murgatroid,Restaurant,43.667381,-79.369311
3,F'Amelia,Italian Restaurant,43.667536,-79.368613
4,Butter Chicken Factory,Indian Restaurant,43.667072,-79.369184


And how many venues were returned by Foursquare?

In [25]:
print('{} venues were returned by Foursquare.'.format(nearby_venues.shape[0]))

32 venues were returned by Foursquare.


## 2. Explore Neighborhoods in Toronto

In [26]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

In [27]:
toronto_venues = getNearbyVenues(names=bn_ung['Neighborhood'],
                                   latitudes=bn_ung['Latitude'],
                                   longitudes=bn_ung['Longitude']
                                  )

St. James Town

The Beaches

Berczy Park

Christie

Dufferin / Dovercourt Village

Studio District

Lawrence Park

Roselawn

Davisville North

Forest Hill North & West

Parkdale / Roncesvalles

Davisville

University of Toronto / Harbord

Runnymede / Swansea

Moore Park / Summerhill East

Rosedale

Church and Wellesley



In [28]:
print(toronto_venues.shape)
toronto_venues.head()

(791, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,St. James Town,43.669403,-79.372704,Mr. Jerk,43.667328,-79.373389,Caribbean Restaurant
1,St. James Town,43.669403,-79.372704,Cranberries,43.667843,-79.369407,Diner
2,St. James Town,43.669403,-79.372704,Murgatroid,43.667381,-79.369311,Restaurant
3,St. James Town,43.669403,-79.372704,F'Amelia,43.667536,-79.368613,Italian Restaurant
4,St. James Town,43.669403,-79.372704,Butter Chicken Factory,43.667072,-79.369184,Indian Restaurant


In [29]:
toronto_venues.groupby('Neighborhood').count()

Unnamed: 0_level_0,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Berczy Park,100,100,100,100,100,100
Christie,59,59,59,59,59,59
Church and Wellesley,71,71,71,71,71,71
Davisville,45,45,45,45,45,45
Davisville North,36,36,36,36,36,36
Dufferin / Dovercourt Village,43,43,43,43,43,43
Forest Hill North & West,4,4,4,4,4,4
Lawrence Park,53,53,53,53,53,53
Moore Park / Summerhill East,5,5,5,5,5,5
Parkdale / Roncesvalles,45,45,45,45,45,45


In [30]:
print('There are {} uniques categories.'.format(len(toronto_venues['Venue Category'].unique())))

There are 185 uniques categories.


## 3. Analyze Each Neighborhood

In [31]:
# one hot encoding
toronto_onehot = pd.get_dummies(toronto_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
toronto_onehot['Neighborhood'] = toronto_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [toronto_onehot.columns[-1]] + list(toronto_onehot.columns[:-1])
toronto_onehot = toronto_onehot[fixed_columns]

toronto_onehot.head()

Unnamed: 0,Neighborhood,American Restaurant,Antique Shop,Art Gallery,Arts & Crafts Store,Asian Restaurant,BBQ Joint,Bagel Shop,Bakery,Bank,Bar,Beach,Beer Bar,Beer Store,Bike Rental / Bike Share,Bike Trail,Bistro,Bookstore,Boutique,Brazilian Restaurant,Breakfast Spot,Brewery,Bubble Tea Shop,Burger Joint,Burrito Place,Bus Line,Café,Candy Store,Caribbean Restaurant,Cheese Shop,Chinese Restaurant,Chocolate Shop,Church,Clothing Store,Cocktail Bar,Coffee Shop,College Arts Building,College Gym,Comedy Club,Comfort Food Restaurant,Concert Hall,Convenience Store,Cosmetics Shop,Costume Shop,Creperie,Dance Studio,Deli / Bodega,Department Store,Dessert Shop,Diner,Discount Store,Distribution Center,Dog Run,Donut Shop,Eastern European Restaurant,Electronics Store,Ethiopian Restaurant,Event Space,Falafel Restaurant,Farmers Market,Fast Food Restaurant,Filipino Restaurant,Fish Market,Flea Market,Flower Shop,Food & Drink Shop,Food Truck,Fountain,French Restaurant,Frozen Yogurt Shop,Furniture / Home Store,Gas Station,Gastropub,Gay Bar,General Entertainment,Gift Shop,Gourmet Shop,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Gym Pool,Health & Beauty Service,Hobby Shop,Hotel,Ice Cream Shop,Indian Restaurant,Indonesian Restaurant,Indoor Play Area,Intersection,Irish Pub,Italian Restaurant,Japanese Restaurant,Jazz Club,Jewelry Store,Juice Bar,Karaoke Bar,Korean Restaurant,Latin American Restaurant,Library,Lingerie Store,Liquor Store,Lounge,Market,Martial Arts Dojo,Massage Studio,Mattress Store,Mediterranean Restaurant,Men's Store,Metro Station,Mexican Restaurant,Middle Eastern Restaurant,Mobile Phone Shop,Modern European Restaurant,Molecular Gastronomy Restaurant,Movie Theater,Moving Target,Museum,Music School,Music Venue,Nail Salon,New American Restaurant,Nightclub,Noodle House,North Indian Restaurant,Opera House,Optical Shop,Paper / Office Supplies Store,Park,Performing Arts Venue,Pet Store,Pharmacy,Pilates Studio,Pizza Place,Playground,Plaza,Poke Place,Pool,Portuguese Restaurant,Poutine Place,Pub,Ramen Restaurant,Record Shop,Restaurant,Rock Climbing Spot,Russian Restaurant,Sake Bar,Salad Place,Salon / Barbershop,Sandwich Place,School,Sculpture Garden,Seafood Restaurant,Shoe Store,Shopping Mall,Skating Rink,Smoke Shop,Soup Place,Souvlaki Shop,Spa,Speakeasy,Sporting Goods Shop,Steakhouse,Strip Club,Supermarket,Sushi Restaurant,Taco Place,Tailor Shop,Tattoo Parlor,Tea Room,Thai Restaurant,Theater,Theme Restaurant,Thrift / Vintage Store,Tibetan Restaurant,Toy / Game Store,Track,Trail,Tree,University,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Wine Bar,Yoga Studio
0,St. James Town,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,St. James Town,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,St. James Town,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,St. James Town,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,St. James Town,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


In [32]:
toronto_onehot.shape

(791, 186)

In [33]:
toronto_grouped = toronto_onehot.groupby('Neighborhood').mean().reset_index()
toronto_grouped

Unnamed: 0,Neighborhood,American Restaurant,Antique Shop,Art Gallery,Arts & Crafts Store,Asian Restaurant,BBQ Joint,Bagel Shop,Bakery,Bank,Bar,Beach,Beer Bar,Beer Store,Bike Rental / Bike Share,Bike Trail,Bistro,Bookstore,Boutique,Brazilian Restaurant,Breakfast Spot,Brewery,Bubble Tea Shop,Burger Joint,Burrito Place,Bus Line,Café,Candy Store,Caribbean Restaurant,Cheese Shop,Chinese Restaurant,Chocolate Shop,Church,Clothing Store,Cocktail Bar,Coffee Shop,College Arts Building,College Gym,Comedy Club,Comfort Food Restaurant,Concert Hall,Convenience Store,Cosmetics Shop,Costume Shop,Creperie,Dance Studio,Deli / Bodega,Department Store,Dessert Shop,Diner,Discount Store,Distribution Center,Dog Run,Donut Shop,Eastern European Restaurant,Electronics Store,Ethiopian Restaurant,Event Space,Falafel Restaurant,Farmers Market,Fast Food Restaurant,Filipino Restaurant,Fish Market,Flea Market,Flower Shop,Food & Drink Shop,Food Truck,Fountain,French Restaurant,Frozen Yogurt Shop,Furniture / Home Store,Gas Station,Gastropub,Gay Bar,General Entertainment,Gift Shop,Gourmet Shop,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Gym Pool,Health & Beauty Service,Hobby Shop,Hotel,Ice Cream Shop,Indian Restaurant,Indonesian Restaurant,Indoor Play Area,Intersection,Irish Pub,Italian Restaurant,Japanese Restaurant,Jazz Club,Jewelry Store,Juice Bar,Karaoke Bar,Korean Restaurant,Latin American Restaurant,Library,Lingerie Store,Liquor Store,Lounge,Market,Martial Arts Dojo,Massage Studio,Mattress Store,Mediterranean Restaurant,Men's Store,Metro Station,Mexican Restaurant,Middle Eastern Restaurant,Mobile Phone Shop,Modern European Restaurant,Molecular Gastronomy Restaurant,Movie Theater,Moving Target,Museum,Music School,Music Venue,Nail Salon,New American Restaurant,Nightclub,Noodle House,North Indian Restaurant,Opera House,Optical Shop,Paper / Office Supplies Store,Park,Performing Arts Venue,Pet Store,Pharmacy,Pilates Studio,Pizza Place,Playground,Plaza,Poke Place,Pool,Portuguese Restaurant,Poutine Place,Pub,Ramen Restaurant,Record Shop,Restaurant,Rock Climbing Spot,Russian Restaurant,Sake Bar,Salad Place,Salon / Barbershop,Sandwich Place,School,Sculpture Garden,Seafood Restaurant,Shoe Store,Shopping Mall,Skating Rink,Smoke Shop,Soup Place,Souvlaki Shop,Spa,Speakeasy,Sporting Goods Shop,Steakhouse,Strip Club,Supermarket,Sushi Restaurant,Taco Place,Tailor Shop,Tattoo Parlor,Tea Room,Thai Restaurant,Theater,Theme Restaurant,Thrift / Vintage Store,Tibetan Restaurant,Toy / Game Store,Track,Trail,Tree,University,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Wine Bar,Yoga Studio
0,Berczy Park,0.01,0.01,0.02,0.0,0.0,0.02,0.01,0.03,0.0,0.0,0.0,0.03,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.06,0.0,0.0,0.02,0.0,0.0,0.01,0.0,0.03,0.08,0.0,0.0,0.0,0.01,0.01,0.0,0.01,0.0,0.02,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.01,0.01,0.0,0.0,0.0,0.03,0.0,0.0,0.0,0.01,0.0,0.01,0.03,0.0,0.0,0.0,0.0,0.04,0.01,0.0,0.0,0.0,0.0,0.0,0.06,0.04,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.01,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01
1,Christie,0.016949,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.016949,0.0,0.0,0.0,0.033898,0.0,0.0,0.0,0.016949,0.0,0.0,0.0,0.033898,0.050847,0.0,0.0,0.0,0.0,0.0,0.016949,0.016949,0.0,0.0,0.0,0.0,0.0,0.033898,0.0,0.0,0.0,0.0,0.0,0.016949,0.0,0.016949,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.033898,0.0,0.0,0.033898,0.0,0.0,0.0,0.0,0.0,0.0,0.033898,0.033898,0.0,0.0,0.0,0.0,0.0,0.033898,0.0,0.0,0.0,0.016949,0.220339,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.033898,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.016949,0.016949,0.0,0.0,0.0,0.0,0.016949,0.0,0.0,0.0,0.0,0.0,0.0,0.033898,0.016949,0.016949,0.0,0.016949,0.016949,0.0,0.0,0.016949,0.033898,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.016949,0.0,0.0,0.0,0.0,0.0,0.0,0.016949,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.016949,0.016949,0.016949,0.0
2,Church and Wellesley,0.0,0.0,0.0,0.014085,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.014085,0.0,0.0,0.0,0.0,0.014085,0.0,0.0,0.014085,0.0,0.014085,0.042254,0.014085,0.0,0.014085,0.0,0.014085,0.0,0.0,0.0,0.0,0.014085,0.0,0.042254,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.014085,0.014085,0.0,0.0,0.0,0.028169,0.0,0.014085,0.014085,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.014085,0.0,0.0,0.0,0.0,0.014085,0.0,0.0,0.0,0.0,0.0,0.014085,0.028169,0.028169,0.0,0.0,0.0,0.0,0.014085,0.0,0.0,0.0,0.014085,0.014085,0.028169,0.0,0.014085,0.0,0.0,0.0,0.0,0.014085,0.070423,0.0,0.0,0.014085,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.014085,0.0,0.0,0.028169,0.028169,0.0,0.014085,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.014085,0.0,0.0,0.0,0.0,0.014085,0.0,0.0,0.014085,0.0,0.0,0.0,0.014085,0.014085,0.0,0.042254,0.0,0.0,0.014085,0.0,0.014085,0.0,0.0,0.014085,0.0,0.0,0.0,0.0,0.014085,0.0,0.0,0.0,0.0,0.0,0.014085,0.014085,0.0,0.084507,0.0,0.0,0.0,0.0,0.014085,0.0,0.014085,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.028169
3,Davisville,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.022222,0.022222,0.022222,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.022222,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.066667,0.0,0.0,0.0,0.0,0.0,0.044444,0.0,0.0,0.0,0.0,0.022222,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.022222,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.022222,0.0,0.022222,0.0,0.0,0.0,0.0,0.044444,0.0,0.0,0.0,0.0,0.022222,0.022222,0.022222,0.022222,0.0,0.0,0.022222,0.088889,0.022222,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.022222,0.022222,0.0,0.0,0.0,0.0,0.022222,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.022222,0.0,0.022222,0.044444,0.0,0.022222,0.0,0.0,0.0,0.0,0.0,0.0,0.044444,0.022222,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.022222,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.022222,0.0,0.022222,0.0,0.0,0.0,0.088889,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.044444,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Davisville North,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.027778,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.027778,0.0,0.0,0.0,0.0,0.055556,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.055556,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.027778,0.0,0.027778,0.027778,0.0,0.083333,0.027778,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.027778,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.027778,0.0,0.0,0.0,0.0,0.0,0.027778,0.0,0.055556,0.0,0.0,0.0,0.0,0.0,0.0,0.027778,0.0,0.027778,0.0,0.0,0.055556,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.027778,0.0,0.0,0.027778,0.0,0.055556,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.027778,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.027778,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.055556,0.0,0.0,0.0,0.0,0.055556,0.0,0.0,0.0,0.0,0.027778,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,Dufferin / Dovercourt Village,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.093023,0.0,0.116279,0.0,0.023256,0.046512,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.093023,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.046512,0.046512,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.023256,0.0,0.023256,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.023256,0.0,0.0,0.0,0.0,0.023256,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.023256,0.0,0.0,0.046512,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.023256,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.023256,0.046512,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.023256,0.0,0.023256,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.046512,0.0,0.0,0.0,0.0,0.0,0.046512,0.0,0.0,0.0,0.0,0.0,0.023256,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.023256,0.0,0.0,0.0,0.023256,0.0,0.0,0.0,0.0,0.0,0.023256,0.0,0.0,0.0,0.0,0.023256,0.0,0.0,0.023256,0.0,0.0
6,Forest Hill North & West,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
7,Lawrence Park,0.0,0.0,0.0,0.0,0.037736,0.018868,0.0,0.056604,0.037736,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.018868,0.018868,0.0,0.018868,0.018868,0.0,0.0,0.0,0.0,0.0,0.0,0.018868,0.0,0.056604,0.0,0.0,0.0,0.0,0.0,0.0,0.018868,0.0,0.0,0.0,0.018868,0.0,0.0,0.018868,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.037736,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.018868,0.0,0.0,0.018868,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.018868,0.0,0.018868,0.0,0.0,0.0,0.0,0.0,0.09434,0.018868,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.018868,0.0,0.0,0.0,0.0,0.018868,0.0,0.0,0.0,0.018868,0.0,0.0,0.018868,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.018868,0.0,0.0,0.0,0.0,0.018868,0.0,0.018868,0.0,0.0,0.0,0.018868,0.0,0.0,0.037736,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.018868,0.018868,0.0,0.0,0.018868,0.018868,0.0,0.0,0.0,0.0,0.0,0.018868,0.0,0.0,0.0,0.0,0.018868,0.075472,0.0,0.0,0.0,0.018868,0.0,0.0,0.0,0.0,0.0,0.018868,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
8,Moore Park / Summerhill East,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.4,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9,Parkdale / Roncesvalles,0.022222,0.022222,0.0,0.0,0.022222,0.0,0.0,0.044444,0.022222,0.022222,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.022222,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.066667,0.0,0.022222,0.0,0.022222,0.0,0.0,0.022222,0.0,0.022222,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.044444,0.022222,0.0,0.0,0.022222,0.022222,0.0,0.0,0.0,0.0,0.0,0.022222,0.0,0.0,0.022222,0.022222,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.044444,0.0,0.0,0.0,0.0,0.044444,0.0,0.0,0.0,0.0,0.0,0.022222,0.0,0.0,0.0,0.022222,0.022222,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.022222,0.0,0.0,0.0,0.0,0.0,0.0,0.044444,0.0,0.066667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.044444,0.0,0.0,0.0,0.0,0.0,0.022222,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.022222,0.022222,0.0,0.0,0.0,0.022222,0.066667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [34]:
toronto_grouped.shape

(17, 186)

Lets Check top Venues

In [35]:
Top_venues = 10

for hood in toronto_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = toronto_grouped[toronto_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(Top_venues))
    print('\n')

----Berczy Park
----
                 venue  freq
0          Coffee Shop  0.08
1                 Café  0.06
2   Italian Restaurant  0.06
3           Restaurant  0.05
4   Seafood Restaurant  0.04
5                Hotel  0.04
6  Japanese Restaurant  0.04
7               Bakery  0.03
8             Beer Bar  0.03
9            Gastropub  0.03


----Christie
----
               venue  freq
0  Korean Restaurant  0.22
1        Coffee Shop  0.05
2          Gift Shop  0.03
3      Grocery Store  0.03
4     Sandwich Place  0.03
5                Pub  0.03
6               Café  0.03
7       Dessert Shop  0.03
8     Ice Cream Shop  0.03
9  Indian Restaurant  0.03


----Church and Wellesley
----
                      venue  freq
0          Sushi Restaurant  0.08
1       Japanese Restaurant  0.07
2              Burger Joint  0.04
3                Restaurant  0.04
4               Coffee Shop  0.04
5               Yoga Studio  0.03
6  Mediterranean Restaurant  0.03
7                     Hotel  0.03
8    

In [36]:
def return_most_common_venues(row, Top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:Top_venues]

In [37]:
Top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(Top_venues):
    try:
        columns.append('{}{} Popular Venues'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Popular Venues'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = toronto_grouped['Neighborhood']

for ind in np.arange(toronto_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(toronto_grouped.iloc[ind, :], Top_venues)

neighborhoods_venues_sorted

Unnamed: 0,Neighborhood,1st Popular Venues,2nd Popular Venues,3rd Popular Venues,4th Popular Venues,5th Popular Venues,6th Popular Venues,7th Popular Venues,8th Popular Venues,9th Popular Venues,10th Popular Venues
0,Berczy Park,Coffee Shop,Café,Italian Restaurant,Restaurant,Hotel,Seafood Restaurant,Japanese Restaurant,Cocktail Bar,Gym,Gastropub
1,Christie,Korean Restaurant,Coffee Shop,Gift Shop,Dessert Shop,Cocktail Bar,Ice Cream Shop,Indian Restaurant,Sandwich Place,Grocery Store,Mexican Restaurant
2,Church and Wellesley,Sushi Restaurant,Japanese Restaurant,Burger Joint,Restaurant,Coffee Shop,Men's Store,Mediterranean Restaurant,Gastropub,Yoga Studio,Gay Bar
3,Davisville,Sushi Restaurant,Italian Restaurant,Coffee Shop,Gym,Trail,Pub,Convenience Store,Pharmacy,Ice Cream Shop,Indian Restaurant
4,Davisville North,Sandwich Place,Dessert Shop,Café,Pizza Place,Sushi Restaurant,Italian Restaurant,Gym,Coffee Shop,Thai Restaurant,Seafood Restaurant
5,Dufferin / Dovercourt Village,Bar,Café,Bakery,Sandwich Place,Mexican Restaurant,Cocktail Bar,Grocery Store,Beer Store,Restaurant,Coffee Shop
6,Forest Hill North & West,Playground,Mediterranean Restaurant,Bank,Park,Yoga Studio,Farmers Market,Food Truck,Food & Drink Shop,Flower Shop,Flea Market
7,Lawrence Park,Italian Restaurant,Sushi Restaurant,Bakery,Coffee Shop,Pub,Asian Restaurant,Bank,Fast Food Restaurant,Hobby Shop,Bus Line
8,Moore Park / Summerhill East,Park,Grocery Store,Candy Store,Playground,Yoga Studio,Food & Drink Shop,Flower Shop,Flea Market,Fish Market,Filipino Restaurant
9,Parkdale / Roncesvalles,Café,Pizza Place,Tibetan Restaurant,Diner,Pharmacy,Italian Restaurant,Bakery,Indian Restaurant,Restaurant,Coffee Shop


## 4. Cluster Neighborhoodsusing K-Mean

In [38]:
# set number of clusters
kclusters = 5

toronto_grouped_clustering = toronto_grouped.drop('Neighborhood',1)
#print(toronto_grouped_clustering)
#print(toronto_grouped)
# run k-means clustering
kmeans = KMeans(init = "k-means++", n_clusters=kclusters, random_state=0).fit(toronto_grouped_clustering)

# check cluster labels generated for each row in the dataframe
labels = kmeans.labels_[0:63] 
print(labels)

[0 3 0 0 0 0 2 0 1 0 4 0 0 0 0 0 0]


In [39]:
toronto_merged = bn_ung
print(toronto_merged.shape)
labels = np.append(labels,labels[0])
print(labels.shape)
# add clustering labels
toronto_merged['Cluster Labels'] = labels.tolist()

# merge toronto_grouped with toronto_data to add latitude/longitude for each neighborhood
toronto_merged = toronto_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

toronto_merged.head() # check the last columns!

(17, 5)
(18,)


ValueError: Length of values does not match length of index

In [70]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i+x+(i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(toronto_merged['Latitude'], toronto_merged['Longitude'], toronto_merged['Neighborhood'], toronto_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

## 5. Examine Clusters

#### Cluster 1

In [71]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 0, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Popular Venues,2nd Popular Venues,3rd Popular Venues,4th Popular Venues,5th Popular Venues,6th Popular Venues,7th Popular Venues,8th Popular Venues,9th Popular Venues,10th Popular Venues
0,Downtown Toronto,0,Coffee Shop,Café,Restaurant,Hotel,Pizza Place,Italian Restaurant,Steakhouse,History Museum,Sporting Goods Shop,Park
1,Downtown Toronto,0,Coffee Shop,Thai Restaurant,Pharmacy,Grocery Store,Food Truck,Moving Target,Beer Store,Electronics Store,Sushi Restaurant,Restaurant
2,Downtown Toronto,0,Coffee Shop,Clothing Store,Cosmetics Shop,Fast Food Restaurant,Café,Middle Eastern Restaurant,Diner,Tea Room,Bookstore,Plaza
4,Downtown Toronto,0,Coffee Shop,Pizza Place,Grocery Store,Café,Beer Store,Bakery,Bank,Restaurant,Library,Breakfast Spot
8,Downtown Toronto,0,Coffee Shop,Café,Restaurant,Italian Restaurant,Grocery Store,Pizza Place,Cocktail Bar,Diner,Cosmetics Shop,Bakery
9,Downtown Toronto,0,Coffee Shop,Restaurant,Hotel,Café,Italian Restaurant,Gastropub,American Restaurant,Gym,Japanese Restaurant,Deli / Bodega
13,Downtown Toronto,0,Coffee Shop,Café,Restaurant,Hotel,Pizza Place,Italian Restaurant,Steakhouse,History Museum,Sporting Goods Shop,Park
14,Downtown Toronto,0,Playground,Farm,Scenic Lookout,American Restaurant,Beer Garden,Beach,Egyptian Restaurant,Electronics Store,Ethiopian Restaurant,Event Space
16,West Toronto,0,Bar,Coffee Shop,Café,Restaurant,Cocktail Bar,Bakery,Thai Restaurant,Portuguese Restaurant,French Restaurant,Korean Restaurant
17,West Toronto,0,Clothing Store,Coffee Shop,Cosmetics Shop,Restaurant,Plaza,Tea Room,Fast Food Restaurant,Lingerie Store,Pizza Place,Theater


#### Cluster 2

In [72]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 1, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Popular Venues,2nd Popular Venues,3rd Popular Venues,4th Popular Venues,5th Popular Venues,6th Popular Venues,7th Popular Venues,8th Popular Venues,9th Popular Venues,10th Popular Venues
3,Downtown Toronto,1,Clothing Store,Coffee Shop,Cosmetics Shop,Fast Food Restaurant,Restaurant,Hotel,Lingerie Store,Middle Eastern Restaurant,Tea Room,Theater
6,Downtown Toronto,1,Coffee Shop,Café,Restaurant,Hotel,Bakery,Beer Bar,Japanese Restaurant,Gym,Gastropub,Seafood Restaurant
7,Downtown Toronto,1,Korean Restaurant,Coffee Shop,Indian Restaurant,Cocktail Bar,Café,Japanese Restaurant,Karaoke Bar,Dessert Shop,Sandwich Place,Grocery Store
15,Downtown Toronto,1,Coffee Shop,Café,Deli / Bodega,Restaurant,Hotel,Bakery,Italian Restaurant,Breakfast Spot,Steakhouse,Gym
25,East Toronto,1,Beach,Bar,Breakfast Spot,Japanese Restaurant,Tea Room,Coffee Shop,Pub,Thai Restaurant,Park,Salon / Barbershop
26,East Toronto,1,Indian Restaurant,Café,Halal Restaurant,Grocery Store,Snack Place,Pakistani Restaurant,Sandwich Place,Restaurant,Egyptian Restaurant,Discount Store
27,Downtown Toronto,1,Coffee Shop,Hotel,Café,Restaurant,American Restaurant,Steakhouse,Japanese Restaurant,Italian Restaurant,Bakery,Gastropub
31,Central Toronto,1,Playground,Bank,Park,Mediterranean Restaurant,Yoga Studio,Farm,Electronics Store,Ethiopian Restaurant,Event Space,Falafel Restaurant
33,Central Toronto,1,Pizza Place,Coffee Shop,Park,Thai Restaurant,Bistro,Grocery Store,Ice Cream Shop,Indian Restaurant,Falafel Restaurant,Fried Chicken Joint
34,Central Toronto,1,Café,Boutique,Coffee Shop,Restaurant,Italian Restaurant,French Restaurant,Sushi Restaurant,Spa,Hotel,Women's Store


#### Cluster 3

In [73]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 2, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Popular Venues,2nd Popular Venues,3rd Popular Venues,4th Popular Venues,5th Popular Venues,6th Popular Venues,7th Popular Venues,8th Popular Venues,9th Popular Venues,10th Popular Venues
49,Central Toronto,2,Coffee Shop,Sandwich Place,History Museum,Pizza Place,Café,Jewish Restaurant,Cosmetics Shop,Park,Pub,Cheese Shop


#### Cluster 4

In [74]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 3, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Popular Venues,2nd Popular Venues,3rd Popular Venues,4th Popular Venues,5th Popular Venues,6th Popular Venues,7th Popular Venues,8th Popular Venues,9th Popular Venues,10th Popular Venues
5,East Toronto,3,Beach,Bar,Breakfast Spot,Japanese Restaurant,Tea Room,Coffee Shop,Pub,Thai Restaurant,Park,Salon / Barbershop
10,Downtown Toronto,3,Coffee Shop,Hotel,Restaurant,Diner,Gastropub,Cosmetics Shop,Cocktail Bar,Clothing Store,Italian Restaurant,Pizza Place
11,West Toronto,3,Café,Restaurant,Pizza Place,Bar,Park,Bus Line,Coffee Shop,Falafel Restaurant,Electronics Store,Ethiopian Restaurant
12,West Toronto,3,Bar,Bakery,Coffee Shop,Café,Cocktail Bar,Sandwich Place,Clothing Store,Beer Bar,Beer Store,Mexican Restaurant
30,Central Toronto,3,Sandwich Place,Dessert Shop,Coffee Shop,Sushi Restaurant,Pizza Place,Thai Restaurant,Café,Italian Restaurant,Fried Chicken Joint,Seafood Restaurant
35,West Toronto,3,Tibetan Restaurant,Bar,Café,Pharmacy,Diner,Restaurant,Indian Restaurant,Liquor Store,North Indian Restaurant,Donut Shop
36,West Toronto,3,Bookstore,Gourmet Shop,Breakfast Spot,Restaurant,Coffee Shop,Gift Shop,Café,Gastropub,Bank,Movie Theater
39,Downtown Toronto,3,Café,Bookstore,Park,Japanese Restaurant,Yoga Studio,College Gym,Museum,French Restaurant,Noodle House,Chinese Restaurant
42,Central Toronto,3,Playground,Gym,Tennis Court,Park,Yoga Studio,Event Space,Eastern European Restaurant,Egyptian Restaurant,Electronics Store,Ethiopian Restaurant
44,Downtown Toronto,3,Café,Vegetarian / Vegan Restaurant,Chinese Restaurant,Bar,Vietnamese Restaurant,Dumpling Restaurant,Mexican Restaurant,Coffee Shop,Bakery,Dessert Shop


#### Cluster 5

In [75]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 4, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Popular Venues,2nd Popular Venues,3rd Popular Venues,4th Popular Venues,5th Popular Venues,6th Popular Venues,7th Popular Venues,8th Popular Venues,9th Popular Venues,10th Popular Venues
18,East Toronto,4,Grocery Store,Pharmacy,Coffee Shop,Bus Line,Doctor's Office,Optical Shop,Sandwich Place,Caribbean Restaurant,Café,Skating Rink
32,West Toronto,4,Baseball Field,Dog Run,Zoo,Garden,Other Great Outdoors,Park,Scenic Lookout,Skating Rink,Café,Pool
40,West Toronto,4,Pizza Place,Rental Car Location,Coffee Shop,Pharmacy,Grocery Store,Café,Fried Chicken Joint,Sandwich Place,Bar,Thai Restaurant
