<h1 style="color:MediumSeaGreen;"align=center><font size = 8>Segmenting and Clustering Neighborhoods in Toronto City</font></h1>

<p style="font-size:20px;">In this lab, we will segment and cluster the neighborhoods of Toronto. You will learn how to convert addresses into their equivalent latitude and longitude values, and you will use the Foursquare API to explore neighborhoods in Toronto. You will use the <b>explore</b> function to get the most common venue categories in each neighborhood, and then use this feature to group the neighborhoods into clusters with the <i>k</i>-means clustering algorithm. Finally, you will use the Folium library to visualize the neighborhoods in Toronto and their emerging clusters.</p>

<h2 style="color:MediumSeaGreen;">Table of Contents</h2>

<div class="alert alert-block alert-info" style="margin-top: 20px">

<font size = 3>

1.  [Extract Toronto Dataset from Wikipedia](#item1)

2.  [Removing Duplicate and null values](#item2)

3.  [Joining extracted Toronto neighbourhoods dataset and neighborhood spatial dataset](#item3)

4.  [Explore Neighborhoods in Toronto City](#item4)

5.  [Analyze Each Neighborhood](#item5)

6.  [Cluster Neighborhoods](#item6) 

7.  [Examine Clusters](#item7)
    </font>
    </div>

<p style="font-size:20px;">The libraries used throughout the process, such as pandas, numpy and folium, are imported below.</p>
<ol style="font-size:20px;">
<li><b>pandas</b> and <b>numpy</b> are used for processing the dataframes.</li>
<li><b>requests</b> and <b>BeautifulSoup</b> are used for extracting the data from the given Wikipedia page.</li>
<li><b>json</b> is imported to handle JSON data before it is converted into a dataframe.</li>
<li><b>geopy</b> is imported to get the latitude and longitude of given addresses.</li>
<li>Finally, <b>folium</b> is imported to visualize maps and the locations marked on them.</li>
</ol>

In [52]:
import pandas as pd
import numpy as np
import requests
from bs4 import BeautifulSoup

In [53]:
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

#!conda install -c conda-forge geopy --yes       # uncomment this line if you haven't completed the Foursquare API lab
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas import json_normalize # transform JSON into a pandas dataframe (pandas >= 1.0; on older versions use: from pandas.io.json import json_normalize)

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

#!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't completed the Foursquare API lab
import folium # map rendering library

print('Libraries imported.')

Libraries imported.


<a id='item1'></a>
<h1 style="color:MediumSeaGreen;">Extract and Explore Dataset from Wikipedia</h1>

<p style="font-size:20px;">Here, the URL of the Toronto postal code page is passed to the get() function of the requests library. The text of the response is then parsed with the HTML parser of BeautifulSoup(), which gives a searchable tree of the page.</p>

In [54]:
url = 'https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'
req = requests.get(url)
soup = BeautifulSoup(req.text,'html.parser')

In [55]:
#soup.findAll('table',{'class':'wikitable sortable'})

<p style="font-size:20px;">Below, the required details are read from the parsed table and stored in a dictionary of lists, which is easy to convert to a dataframe.</p>

In [56]:
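PostalCode=[]
Borough=[]
Neighbourhood=[]
# Split the text of the postal code table into one string per line
for row in soup.findAll('table',{'class':'wikitable sortable'}):
    l = row.text.split('\n')
# Drop the empty strings so that only the table cells remain
l = [i for i in l if len(i)>0]
# The first three cells are the header; after that the cells repeat as
# PostalCode, Borough, Neighbourhood, so each column is every third cell
for i in range(3,len(l),3):
    PostalCode.append(l[i])
for i in range(4,len(l),3):
    Borough.append(l[i])
for i in range(5,len(l),3):
    Neighbourhood.append(l[i])

# Dictionary of columns, ready to be turned into a dataframe
dictt={}
dictt['PostalCode']=PostalCode
dictt['Borough']=Borough
dictt['Neighbourhood']=Neighbourhood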
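
<p style="font-size:20px;">The every-third-cell layout used above can also be expressed with slice strides. The following is only a minimal sketch on a hypothetical, hand-written list of cells (not the real page), assuming the same layout of three header cells followed by repeating triples.</p>

```python
# Hypothetical flat list of table cells: three header cells, then repeating triples
cells = ['Postcode', 'Borough', 'Neighbourhood',
         'M1A', 'Not assigned', 'Not assigned',
         'M3A', 'North York', 'Parkwoods',
         'M4A', 'North York', 'Victoria Village']

body = cells[3:]  # drop the three header cells

# Every third element, starting at offsets 0, 1 and 2, forms one column
columns = {
    'PostalCode': body[0::3],
    'Borough': body[1::3],
    'Neighbourhood': body[2::3],
}

# The layout only makes sense if all three columns have the same length
same_length = len({len(values) for values in columns.values()}) == 1
print(columns['PostalCode'], same_length)
```

<p style="font-size:20px;">The length check is a cheap guard: if the page layout changes and the cells no longer come in triples, the columns stop lining up.</p>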


<p style="font-size:20px;">As we can see, the extracted dataframe has three columns: PostalCode, Borough and Neighbourhood.</p>

In [57]:
df = pd.DataFrame(dictt)
df.head()

Unnamed: 0,PostalCode,Borough,Neighbourhood
0,M1A,Not assigned,Not assigned
1,M2A,Not assigned,Not assigned
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,"Regent Park, Harbourfront"


<a id='item2'></a>
<h1 style="color:MediumSeaGreen;">Removing Duplicate and Null values</h1>

<p style="font-size:20px;">Here, we check for duplicate and null values in the created dataset and remove them.</p>

In [58]:
len(df['PostalCode'])

180

<p style="font-size:20px;">Checking whether any PostalCode value is repeated</p>

In [59]:
(df['PostalCode'].value_counts()>1).value_counts()

False    180
Name: PostalCode, dtype: int64
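
<p style="font-size:20px;">A more direct way to ask the same question is the <b>duplicated()</b> method of a pandas Series. The sketch below uses a small hypothetical sample (not the Toronto data) that contains one repeated postal code on purpose.</p>

```python
import pandas as pd

# Hypothetical sample with one repeated postal code
sample = pd.DataFrame({'PostalCode': ['M3A', 'M4A', 'M5A', 'M4A']})

# True if any value occurs more than once
has_duplicates = bool(sample['PostalCode'].duplicated().any())

# keep=False marks every occurrence of a repeated value, not only the later ones
repeated = sample.loc[sample['PostalCode'].duplicated(keep=False), 'PostalCode'].unique().tolist()

print(has_duplicates, repeated)
```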

<p style="font-size:20px;">Checking whether there is a row where the Borough value is present but the Neighbourhood value is not</p>

In [60]:
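# Rows that have a borough but no neighbourhood
mask = (df['Borough']!='Not assigned') & (df['Neighbourhood']=='Not assigned')
for borough in df.loc[mask,'Borough']:
    print("Hey Borough {} is not having specific Neighbourhoods!".format(borough))
# Use the borough name as the neighbourhood for those rows; writing through
# df.loc changes the dataframe itself, unlike assigning to a row from iterrows()
df.loc[mask,'Neighbourhood'] = df.loc[mask,'Borough']
if not mask.any():
    print('Every Boroughs having their own neighbourhoods!')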
       

Every Boroughs having their own neighbourhoods!


<p style="font-size:20px;">Removing all rows with 'Not assigned' or NaN values from the dataset</p>

In [61]:
df = df.replace('Not assigned',np.nan)
df.dropna(inplace=True)

In [62]:
df.index=np.arange(len(df))
df.head()

Unnamed: 0,PostalCode,Borough,Neighbourhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,"Regent Park, Harbourfront"
3,M6A,North York,"Lawrence Manor, Lawrence Heights"
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government"


In [63]:
df.shape

(103, 3)
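
<p style="font-size:20px;">The cleaning steps above (replace the placeholder, drop the rows, renumber the index) can be chained into one expression. This is a minimal sketch on a hypothetical three-row sample; <b>reset_index(drop=True)</b> is used here in place of assigning np.arange to the index and gives the same 0..n-1 numbering.</p>

```python
import numpy as np
import pandas as pd

# Hypothetical sample in the same shape as the scraped dataframe
sample = pd.DataFrame({
    'PostalCode': ['M1A', 'M3A', 'M4A'],
    'Borough': ['Not assigned', 'North York', 'North York'],
    'Neighbourhood': ['Not assigned', 'Parkwoods', 'Victoria Village'],
})

cleaned = (sample.replace('Not assigned', np.nan)  # placeholder -> NaN
                 .dropna()                         # drop rows with any NaN
                 .reset_index(drop=True))          # renumber rows from 0

print(cleaned)
```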

<p style="font-size:20px;">Here, instead of using geopy to get the latitude and longitude of the address, we used the readily available .csv file whose link is given in the assignment page.</p>

In [64]:
Spatial_info_df = pd.read_csv("C://Users//pen2j//Geospatial_Coordinates.csv")
Spatial_info_df.head()

Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


<a id='item3'></a>
<h1 style="color:MediumSeaGreen;">Joining extracted Toronto neighbourhoods dataset and neighborhood spatial dataset</h1>
<p style="font-size:20px;">In the next few cells we combine our extracted dataframe with the given spatial dataframe, giving the final columns PostalCode, Borough, Neighbourhood, Latitude and Longitude.</p>

In [65]:
# Report every postal code of Spatial_info_df that is missing from df
extra=0
for c,i in enumerate(Spatial_info_df['Postal Code'].values):
    if i not in df['PostalCode'].values:
        print('{} the postalcode {} of Spatial_info_df is not in postalcode of df'.format(c,i))
        extra+=1
# With no missing codes and equal row counts, both columns hold the same values
if extra==0:
    print('no.of rows or postalcodes in Neighbourhood dataset is {}\nno.of rows or postalcodes in Spatial dataset is {}'.format(len(df),len(Spatial_info_df)))
    print('\n\nPostalCode column in both the datsets have the same values too.')

no.of rows or postalcodes in Neighbourhood dataset is 103
no.of rows or postalcodes in Spatial dataset is 103


PostalCode column in both the datsets have the same values too.
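
<p style="font-size:20px;">The same comparison can be written with Python sets, which makes the check explicit in both directions. A minimal sketch, with two short hypothetical lists standing in for the two postal code columns:</p>

```python
# Hypothetical postal code lists standing in for the two PostalCode columns
neighbourhood_codes = ['M3A', 'M4A', 'M5A']
spatial_codes = ['M5A', 'M3A', 'M4A']

# Codes that appear on one side only
only_in_neighbourhood = set(neighbourhood_codes) - set(spatial_codes)
only_in_spatial = set(spatial_codes) - set(neighbourhood_codes)

# Both differences are empty exactly when the two columns hold the same values
same_values = not only_in_neighbourhood and not only_in_spatial
print(same_values)
```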


In [66]:
# Placeholder columns, filled row by row below
df['Latitude']=[str(i) for i in range(len(df))]
df['Longitude']=[str(i) for i in range(len(df))]
for index,row in df.iterrows():
    # Coordinates of the spatial row with the same postal code
    match = Spatial_info_df[Spatial_info_df['Postal Code']==row['PostalCode']]
    # Write through df.loc: assigning to `row` is not guaranteed to change df
    df.loc[index,'Latitude']=match['Latitude'].values[0]
    df.loc[index,'Longitude']=match['Longitude'].values[0]
df.head()

Unnamed: 0,PostalCode,Borough,Neighbourhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.7533,-79.3297
1,M4A,North York,Victoria Village,43.7259,-79.3156
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.6543,-79.3606
3,M6A,North York,"Lawrence Manor, Lawrence Heights",43.7185,-79.4648
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.6623,-79.3895
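
<p style="font-size:20px;">A row-by-row lookup like the one above can usually be replaced by a single <b>merge()</b> on the postal code. The sketch below is an alternative, not the method used in this notebook; it runs on two small hypothetical frames whose values are copied from the tables shown here.</p>

```python
import pandas as pd

# Hypothetical two-row stand-ins for the neighbourhood and spatial dataframes
neighbourhoods = pd.DataFrame({
    'PostalCode': ['M3A', 'M4A'],
    'Borough': ['North York', 'North York'],
    'Neighbourhood': ['Parkwoods', 'Victoria Village'],
})
coordinates = pd.DataFrame({
    'Postal Code': ['M4A', 'M3A'],  # deliberately in a different order
    'Latitude': [43.7259, 43.7533],
    'Longitude': [-79.3156, -79.3297],
})

# Align the key name, then left-join so every neighbourhood row is kept in order;
# validate raises if a postal code is repeated on either side
merged = neighbourhoods.merge(
    coordinates.rename(columns={'Postal Code': 'PostalCode'}),
    on='PostalCode', how='left', validate='one_to_one')

print(merged)
```

<p style="font-size:20px;">With a merge, the coordinate columns keep their numeric dtype, and a postal code without a match shows up as NaN instead of raising an IndexError.</p>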


<p style="font-size:20px;">The Borough column contains boroughs both inside and outside the city of Toronto. We therefore keep only the boroughs that have "Toronto" in their name.</p>

In [67]:
Toronto_Neighbourhoods = df[df['Borough'].str.contains('Toronto')] 
Toronto_Neighbourhoods.index=np.arange(len(Toronto_Neighbourhoods))

<p style="font-size:20px;">Below is the code for getting the latitude and longitude of Toronto itself and for plotting the Toronto neighborhoods on a map of the city</p>

In [68]:
address = 'Toronto, Ontario'

geolocator = Nominatim(user_agent="my_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Toronto City are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Toronto City are 43.6534817, -79.3839347.
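
<p style="font-size:20px;">The geocode() method of geopy returns None when it finds nothing for an address, in which case reading <b>location.latitude</b> fails with an AttributeError. The sketch below shows one way to guard against that. The helper name <b>coordinates_or_raise</b> and the stand-in geocoder <b>fake_geocode</b> are hypothetical; with geopy you would pass <b>geolocator.geocode</b> instead.</p>

```python
from types import SimpleNamespace

def coordinates_or_raise(geocode, address):
    """Return (latitude, longitude) for address, or raise if nothing was found."""
    location = geocode(address)
    if location is None:
        raise ValueError('No geocoding result for {!r}'.format(address))
    return location.latitude, location.longitude

# Stand-in geocoder for illustration only; it knows a single address
def fake_geocode(address):
    known = {'Toronto, Ontario': SimpleNamespace(latitude=43.6534817, longitude=-79.3839347)}
    return known.get(address)

latitude, longitude = coordinates_or_raise(fake_geocode, 'Toronto, Ontario')
print(latitude, longitude)
```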


In [69]:
# create map of Toronto using latitude and longitude values
map_toronto = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, borough, pc, neighborhood in zip(Toronto_Neighbourhoods['Latitude'], Toronto_Neighbourhoods['Longitude'], Toronto_Neighbourhoods['Borough'], Toronto_Neighbourhoods['PostalCode'],Toronto_Neighbourhoods['Neighbourhood']):
    label = '{}, {}, {}'.format(pc, neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto)  
    
map_toronto

In [70]:
Toronto_Neighbourhoods.head()

Unnamed: 0,PostalCode,Borough,Neighbourhood,Latitude,Longitude
0,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.6543,-79.3606
1,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.6623,-79.3895
2,M5B,Downtown Toronto,"Garden District, Ryerson",43.6572,-79.3789
3,M5C,Downtown Toronto,St. James Town,43.6515,-79.3754
4,M4E,East Toronto,The Beaches,43.6764,-79.293


<p style="font-size:20px;">Now we take a single borough and analyze the similarity between its neighborhoods. Here we take Downtown Toronto. In the code below, the latitude and longitude of Downtown Toronto are obtained with the geocode() method of the geolocator, and the dataframe is filtered to keep only the rows for Downtown Toronto.</p>

In [71]:
address = 'Downtown Toronto, Ontario'
geolocator = Nominatim(user_agent='my_explorer')
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print("the geographical coordinates of {} is {},{}".format(address,latitude,longitude))

the geographical coordinates of Downtown Toronto, Ontario is 43.6563221,-79.3809161


In [72]:
Downtown_df = Toronto_Neighbourhoods[Toronto_Neighbourhoods['Borough']=='Downtown Toronto']
Downtown_df.index=np.arange(len(Downtown_df))
Downtown_df.head(7)

Unnamed: 0,PostalCode,Borough,Neighbourhood,Latitude,Longitude
0,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.6543,-79.3606
1,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.6623,-79.3895
2,M5B,Downtown Toronto,"Garden District, Ryerson",43.6572,-79.3789
3,M5C,Downtown Toronto,St. James Town,43.6515,-79.3754
4,M5E,Downtown Toronto,Berczy Park,43.6448,-79.3733
5,M5G,Downtown Toronto,Central Bay Street,43.658,-79.3874
6,M6G,Downtown Toronto,Christie,43.6695,-79.4226


<p style="font-size:20px;">Below, Downtown Toronto and its neighborhoods are mapped</p>

In [73]:
Downtown_toronto_map = folium.Map([latitude,longitude],zoom_start=11)
for lat,lan,code,borough,neigh in zip(Downtown_df['Latitude'],Downtown_df['Longitude'],Downtown_df['PostalCode'],Downtown_df['Borough'],Downtown_df['Neighbourhood']):
    label = '{},{},{}'.format(code,neigh,borough)
    label = folium.Popup(label,parse_html=True)
    folium.CircleMarker(
        [lat,lan],
        radius=5,
        color='blue',
        popup=label,
        fill_color='cadetblue',
        fill=True,
        fill_opacity=0.5,
        parse_html=False).add_to(Downtown_toronto_map)
Downtown_toronto_map    

<a id='item4'></a>
<h1 style="color:MediumSeaGreen;">Explore Neighborhoods in Toronto City</h1>

## Exploring Venues in One Selected Neighborhood of Downtown Toronto

<p style="font-size:20px;">In order to find similarities between neighborhoods, we have to explore the types of venues in those neighborhoods. We therefore use the Foursquare API to explore the venues in each neighborhood of the Downtown Toronto borough.</p> 

<p style="font-size:20px;">At first, we write code to explore only the venues in the M5A neighborhood of the Downtown Toronto borough</p>

In [74]:
# @hidden_cell

CLIENT_ID = 'YOUR_CLIENT_ID' # your Foursquare ID (redacted here; do not publish real credentials)
CLIENT_SECRET = 'YOUR_CLIENT_SECRET' # your Foursquare Secret (redacted here)
VERSION = '20180605' # Foursquare API version
LIMIT = 100 # A default Foursquare API limit value
RADIUS = 500
print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: YOUR_CLIENT_ID
CLIENT_SECRET:YOUR_CLIENT_SECRET


In [75]:
#using first neighbourhood-M5A of borough-Downtown Toronto
Neighbourhood = Downtown_df.loc[0,'PostalCode']
latitude = Downtown_df.loc[0,'Latitude']
longitude = Downtown_df.loc[0,'Longitude']
print(Neighbourhood,latitude,longitude)

M5A 43.6542599 -79.3606359


<p style="font-size:20px;">Below is the URL for exploring the venues of neighborhood M5A of the Downtown Toronto borough.</p>

In [76]:
url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&ll={},{}&v={}&radius={}&limit={}'.format(CLIENT_ID,CLIENT_SECRET,latitude,longitude,VERSION,RADIUS,LIMIT)
print(url)

https://api.foursquare.com/v2/venues/explore?client_id=YOUR_CLIENT_ID&client_secret=YOUR_CLIENT_SECRET&ll=43.6542599,-79.3606359&v=20180605&radius=500&limit=100
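
<p style="font-size:20px;">Instead of formatting the query string by hand, the parameters can be kept in a dictionary and encoded with <b>urlencode()</b> from the standard library. This is a minimal sketch; the helper name <b>build_explore_url</b> is hypothetical and the credentials are placeholders.</p>

```python
from urllib.parse import urlencode

BASE_URL = 'https://api.foursquare.com/v2/venues/explore'

def build_explore_url(client_id, client_secret, lat, lng, version, radius, limit):
    """Build the explore URL from named parameters (hypothetical helper)."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'll': '{},{}'.format(lat, lng),
        'v': version,
        'radius': radius,
        'limit': limit,
    }
    # safe=',' keeps the comma inside the ll value unescaped
    return BASE_URL + '?' + urlencode(params, safe=',')

# Placeholder credentials; the coordinates are those of M5A printed above
url = build_explore_url('YOUR_CLIENT_ID', 'YOUR_CLIENT_SECRET',
                        43.6542599, -79.3606359, '20180605', 500, 100)
print(url)
```

<p style="font-size:20px;">Keeping the parameters in a dictionary also makes it harder to swap two positional values by accident, which is easy to do in a long format() call.</p>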


<p style="font-size:20px;">Getting the venue details from the URL in JSON format.</p>

In [77]:
result = requests.get(url).json()

<p style="font-size:20px;">Here, the resulting JSON is converted to a dataframe using the <b>json_normalize()</b> function, and unwanted columns are removed from the dataframe.</p>

In [78]:
# .copy() gives an independent dataframe, so the column assignment below does not trigger SettingWithCopyWarning
M5A_venues_df = json_normalize(result['response']['groups'][0]['items'])[['venue.name','venue.location.lat','venue.location.lng','venue.categories']].copy()
# Keep only the name of the first category listed for each venue
M5A_venues_df['Category'] = [cats[0]['name'] for cats in M5A_venues_df['venue.categories']]
M5A_venues_df.drop(['venue.categories'],inplace=True,axis=1)
M5A_venues_df.columns=['Venue','Latitude','Longitude','Category']
M5A_venues_df.head()


Unnamed: 0,Venue,Latitude,Longitude,Category
0,Roselle Desserts,43.653447,-79.362017,Bakery
1,Tandem Coffee,43.653559,-79.361809,Coffee Shop
2,Cooper Koo Family YMCA,43.653249,-79.358008,Distribution Center
3,Impact Kitchen,43.656369,-79.35698,Restaurant
4,Body Blitz Spa East,43.654735,-79.359874,Spa
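
<p style="font-size:20px;">To see what json_normalize() does to the nested response, here is a minimal sketch on a hand-written fragment. The two items are hypothetical and only mimic the shape of <b>result['response']['groups'][0]['items']</b>; the sketch assumes pandas 1.0 or later, where the function is available as <b>pd.json_normalize</b>.</p>

```python
import pandas as pd

# Hypothetical fragment shaped like result['response']['groups'][0]['items']
items = [
    {'venue': {'name': 'Venue A',
               'location': {'lat': 43.65, 'lng': -79.36},
               'categories': [{'name': 'Bakery'}]}},
    {'venue': {'name': 'Venue B',
               'location': {'lat': 43.66, 'lng': -79.37},
               'categories': [{'name': 'Coffee Shop'}]}},
]

# Nested dictionaries become dotted column names; lists are left as they are
flat = pd.json_normalize(items)

# Take the name of the first category, or None if a venue has no category
flat['Category'] = flat['venue.categories'].apply(lambda cats: cats[0]['name'] if cats else None)

venues = flat[['venue.name', 'venue.location.lat', 'venue.location.lng', 'Category']]
print(venues)
```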


## Exploring Venues in All the Neighborhoods of Downtown Toronto

<p style="font-size:20px;">The function below, <b>find_venue_details()</b>, explores the venues in all the neighborhoods of the Downtown Toronto borough</p>

In [79]:
def find_venue_details(postalcodes,latitudes,longitudes,neighbourhoods):
    # One dataframe of venues per neighbourhood, concatenated at the end
    frames = []
    for lat,lng,pc,neigh in zip(latitudes,longitudes,postalcodes,neighbourhoods):
        url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&ll={},{}&v={}&radius={}&limit={}'.format(CLIENT_ID,CLIENT_SECRET,lat,lng,VERSION,RADIUS,LIMIT)
        result = requests.get(url).json()
        
        # .copy() avoids SettingWithCopyWarning on the column assignments below
        temp_venues_df = json_normalize(result['response']['groups'][0]['items'])[['venue.name','venue.location.lat','venue.location.lng','venue.categories']].copy()
        # Keep only the name of the first category listed for each venue
        temp_venues_df['Category'] = [cats[0]['name'] for cats in temp_venues_df['venue.categories']]
        temp_venues_df.drop(['venue.categories'],inplace=True,axis=1)
        temp_venues_df.columns=['Venue','Latitude','Longitude','Category']
        # A scalar is broadcast to every row of the dataframe
        temp_venues_df['Neighbourhood_PostalCode']=pc
        temp_venues_df['Neighbourhood_latitude']=lat
        temp_venues_df['Neighbourhood_longitude']=lng
        temp_venues_df['Neighbourhood']=neigh
        frames.append(temp_venues_df)
    return pd.concat(frames,sort=True)



<p style="font-size:20px;">Calling the <b>find_venue_details()</b> function</p>

In [80]:
final_df = find_venue_details(Downtown_df['PostalCode'],Downtown_df['Latitude'],Downtown_df['Longitude'],Downtown_df['Neighbourhood'])



<p style="font-size:20px;">Changing the column order to make the dataframe easier to read</p>

In [81]:
final_df = final_df.loc[:,['Neighbourhood_PostalCode','Neighbourhood','Neighbourhood_latitude','Neighbourhood_longitude','Venue','Category','Latitude','Longitude']]

<p style="font-size:20px;">Checking the number of venues returned for the whole Downtown Toronto borough</p>

In [82]:
print(final_df.shape)

(1248, 8)


<p style="font-size:20px;">Checking the number of venues returned for each neighborhood in the Downtown Toronto borough</p>

In [83]:
final_df.groupby('Neighbourhood_PostalCode').count()

Unnamed: 0_level_0,Neighbourhood,Neighbourhood_latitude,Neighbourhood_longitude,Venue,Category,Latitude,Longitude
Neighbourhood_PostalCode,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1
M4W,4,4,4,4,4,4,4
M4X,48,48,48,48,48,48,48
M4Y,75,75,75,75,75,75,75
M5A,44,44,44,44,44,44,44
M5B,100,100,100,100,100,100,100
M5C,85,85,85,85,85,85,85
M5E,55,55,55,55,55,55,55
M5G,68,68,68,68,68,68,68
M5H,100,100,100,100,100,100,100
M5J,100,100,100,100,100,100,100


<p style="font-size:20px;">Checking the total number of venue categories</p>

In [84]:
print('There are {} uniques categories.'.format(len(final_df['Category'].unique())))

There are 212 uniques categories.


<a id='item5'></a>
<h1 style="color:MediumSeaGreen;">Analyzing the Neighborhoods</h1>

In [85]:
Downtown_onehot = pd.get_dummies(final_df[['Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
Downtown_onehot['Neighbourhood_PostalCode'] = final_df['Neighbourhood_PostalCode'] 
Downtown_onehot['Neighbourhood'] = final_df['Neighbourhood']
# move neighborhood column to the first column
fixed_columns = [Downtown_onehot.columns[-2]]+[Downtown_onehot.columns[-1]] + list(Downtown_onehot.columns[:-2])
Downtown_onehot = Downtown_onehot[fixed_columns]

Downtown_onehot.head()

Unnamed: 0,Neighbourhood_PostalCode,Neighbourhood,Afghan Restaurant,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Antique Shop,Aquarium,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,BBQ Joint,Baby Store,Bagel Shop,Bakery,Bank,Bar,Baseball Stadium,Basketball Stadium,Beach,Bed & Breakfast,Beer Bar,Beer Store,Belgian Restaurant,Bistro,Boat or Ferry,Bookstore,Boutique,Brazilian Restaurant,Breakfast Spot,Brewery,Bubble Tea Shop,Building,Burger Joint,Burrito Place,Butcher,Café,Camera Store,Candy Store,Caribbean Restaurant,Cheese Shop,Chinese Restaurant,Chocolate Shop,Church,Clothing Store,Cocktail Bar,Coffee Shop,College Arts Building,College Auditorium,College Cafeteria,College Gym,College Rec Center,Colombian Restaurant,Comfort Food Restaurant,Comic Shop,Concert Hall,Convenience Store,Cosmetics Shop,Creperie,Cupcake Shop,Dance Studio,Deli / Bodega,Department Store,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Distribution Center,Dog Run,Doner Restaurant,Donut Shop,Dumpling Restaurant,Eastern European Restaurant,Electronics Store,Escape Room,Ethiopian Restaurant,Event Space,Falafel Restaurant,Farmers Market,Fast Food Restaurant,Filipino Restaurant,Fish Market,Food & Drink Shop,Food Court,Food Truck,Fountain,French Restaurant,Fried Chicken Joint,Furniture / Home Store,Gaming Cafe,Gastropub,Gay Bar,General Entertainment,General Travel,German Restaurant,Gift Shop,Gluten-free Restaurant,Gourmet Shop,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Harbor / Marina,Health & Beauty Service,Historic Site,History Museum,Hobby Shop,Hookah Bar,Hospital,Hostel,Hotel,Hotel Bar,IT Services,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Irish Pub,Italian Restaurant,Japanese Restaurant,Jazz Club,Jewelry Store,Juice Bar,Kitchen Supply Store,Korean Restaurant,Lake,Latin American Restaurant,Lingerie Store,Liquor Store,Lounge,Market,Martial Arts School,Massage Studio,Mediterranean Restaurant,Men's Store,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Modern European Restaurant,Molecular Gastronomy Restaurant,Monument / Landmark,Moroccan Restaurant,Movie Theater,Museum,Music Venue,Neighborhood,New American Restaurant,Nightclub,Noodle House,Office,Opera House,Optical Shop,Organic Grocery,Other Great Outdoors,Park,Performing Arts Venue,Pet Store,Pharmacy,Pizza Place,Plane,Playground,Plaza,Poke Place,Portuguese Restaurant,Poutine Place,Pub,Ramen Restaurant,Record Shop,Rental Car Location,Restaurant,Roof Deck,Sake Bar,Salad Place,Salon / Barbershop,Sandwich Place,Scenic Lookout,Sculpture Garden,Seafood Restaurant,Shoe Store,Shopping Mall,Skating Rink,Smoke Shop,Smoothie Shop,Snack Place,Soup Place,Spa,Speakeasy,Sporting Goods Shop,Sports Bar,Steakhouse,Strip Club,Supermarket,Sushi Restaurant,Tailor Shop,Taiwanese Restaurant,Tanning Salon,Tea Room,Thai Restaurant,Theater,Theme Restaurant,Toy / Game Store,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Wine Bar,Women's Store,Yoga Studio
0,M5A,"Regent Park, Harbourfront",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,M5A,"Regent Park, Harbourfront",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,M5A,"Regent Park, Harbourfront",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,M5A,"Regent Park, Harbourfront",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,M5A,"Regent Park, Harbourfront",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


In [86]:
Downtown_onehot.shape

(1248, 214)
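
<p style="font-size:20px;">One-hot encoding turns each venue row into a row of 0/1 indicators, one column per category. The sketch below shows this on three hypothetical venues; the <b>astype(int)</b> call is an addition that keeps the output as 0/1 on newer pandas versions, which return boolean columns from get_dummies().</p>

```python
import pandas as pd

# Hypothetical venues: two in one neighbourhood, one in another
venues = pd.DataFrame({
    'Neighbourhood_PostalCode': ['M5A', 'M5A', 'M4W'],
    'Category': ['Bakery', 'Coffee Shop', 'Park'],
})

# One indicator column per category; the empty prefix keeps the plain category names
onehot = pd.get_dummies(venues[['Category']], prefix='', prefix_sep='').astype(int)

# Put the postal code back as the first column
onehot.insert(0, 'Neighbourhood_PostalCode', venues['Neighbourhood_PostalCode'])

print(onehot)
```

<p style="font-size:20px;">Each venue belongs to exactly one category, so every row contains a single 1 among the category columns.</p>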

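<p style="font-size:20px;">Here, the one-hot encoded dataframe is grouped by postal code and neighbourhood, and the mean of each column is taken. Since each row is one venue, the mean of a category column is the relative frequency of that category among the venues of the neighborhood.</p> 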

In [87]:
Downtown_grouped = Downtown_onehot.groupby(['Neighbourhood_PostalCode','Neighbourhood']).mean().reset_index()
Downtown_grouped.head()

Unnamed: 0,Neighbourhood_PostalCode,Neighbourhood,Afghan Restaurant,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Antique Shop,Aquarium,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,BBQ Joint,Baby Store,Bagel Shop,Bakery,Bank,Bar,Baseball Stadium,Basketball Stadium,Beach,Bed & Breakfast,Beer Bar,Beer Store,Belgian Restaurant,Bistro,Boat or Ferry,Bookstore,Boutique,Brazilian Restaurant,Breakfast Spot,Brewery,Bubble Tea Shop,Building,Burger Joint,Burrito Place,Butcher,Café,Camera Store,Candy Store,Caribbean Restaurant,Cheese Shop,Chinese Restaurant,Chocolate Shop,Church,Clothing Store,Cocktail Bar,Coffee Shop,College Arts Building,College Auditorium,College Cafeteria,College Gym,College Rec Center,Colombian Restaurant,Comfort Food Restaurant,Comic Shop,Concert Hall,Convenience Store,Cosmetics Shop,Creperie,Cupcake Shop,Dance Studio,Deli / Bodega,Department Store,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Distribution Center,Dog Run,Doner Restaurant,Donut Shop,Dumpling Restaurant,Eastern European Restaurant,Electronics Store,Escape Room,Ethiopian Restaurant,Event Space,Falafel Restaurant,Farmers Market,Fast Food Restaurant,Filipino Restaurant,Fish Market,Food & Drink Shop,Food Court,Food Truck,Fountain,French Restaurant,Fried Chicken Joint,Furniture / Home Store,Gaming Cafe,Gastropub,Gay Bar,General Entertainment,General Travel,German Restaurant,Gift Shop,Gluten-free Restaurant,Gourmet Shop,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Harbor / Marina,Health & Beauty Service,Historic Site,History Museum,Hobby Shop,Hookah Bar,Hospital,Hostel,Hotel,Hotel Bar,IT Services,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Irish Pub,Italian Restaurant,Japanese Restaurant,Jazz Club,Jewelry Store,Juice Bar,Kitchen Supply Store,Korean Restaurant,Lake,Latin American Restaurant,Lingerie Store,Liquor Store,Lounge,Market,Martial Arts School,Massage Studio,Mediterranean Restaurant,Men's Store,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Modern European Restaurant,Molecular Gastronomy Restaurant,Monument / Landmark,Moroccan Restaurant,Movie Theater,Museum,Music Venue,Neighborhood,New American Restaurant,Nightclub,Noodle House,Office,Opera House,Optical Shop,Organic Grocery,Other Great Outdoors,Park,Performing Arts Venue,Pet Store,Pharmacy,Pizza Place,Plane,Playground,Plaza,Poke Place,Portuguese Restaurant,Poutine Place,Pub,Ramen Restaurant,Record Shop,Rental Car Location,Restaurant,Roof Deck,Sake Bar,Salad Place,Salon / Barbershop,Sandwich Place,Scenic Lookout,Sculpture Garden,Seafood Restaurant,Shoe Store,Shopping Mall,Skating Rink,Smoke Shop,Smoothie Shop,Snack Place,Soup Place,Spa,Speakeasy,Sporting Goods Shop,Sports Bar,Steakhouse,Strip Club,Supermarket,Sushi Restaurant,Tailor Shop,Taiwanese Restaurant,Tanning Salon,Tea Room,Thai Restaurant,Theater,Theme Restaurant,Toy / Game Store,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Wine Bar,Women's Store,Yoga Studio
0,M4W,Rosedale,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,M4X,"St. James Town, Cabbagetown",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.041667,0.020833,0.0,0.0,0.0,0.0,0.0,0.0,0.020833,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.020833,0.0625,0.0,0.0,0.020833,0.0,0.041667,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.020833,0.0,0.0,0.0,0.020833,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.020833,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.020833,0.0,0.020833,0.0,0.0,0.020833,0.0,0.0,0.0,0.020833,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.020833,0.0,0.0,0.041667,0.020833,0.0,0.020833,0.0,0.0,0.0,0.0,0.0,0.0,0.020833,0.0,0.041667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.041667,0.0,0.020833,0.020833,0.0625,0.0,0.020833,0.020833,0.0,0.0,0.0,0.041667,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.020833,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.020833,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.020833,0.0,0.0,0.020833,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,M4Y,Church and Wellesley,0.013333,0.0,0.0,0.0,0.0,0.0,0.0,0.013333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.013333,0.0,0.0,0.0,0.0,0.013333,0.0,0.0,0.013333,0.0,0.026667,0.0,0.013333,0.0,0.0,0.026667,0.0,0.0,0.013333,0.0,0.013333,0.0,0.0,0.013333,0.0,0.093333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.013333,0.0,0.013333,0.0,0.0,0.0,0.0,0.013333,0.0,0.013333,0.013333,0.0,0.0,0.0,0.0,0.0,0.013333,0.013333,0.0,0.0,0.0,0.013333,0.0,0.0,0.013333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.013333,0.053333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.013333,0.0,0.0,0.013333,0.0,0.0,0.013333,0.0,0.0,0.0,0.026667,0.0,0.0,0.013333,0.013333,0.0,0.0,0.0,0.053333,0.0,0.0,0.013333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.013333,0.0,0.026667,0.026667,0.013333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.013333,0.0,0.0,0.0,0.013333,0.0,0.0,0.0,0.0,0.0,0.0,0.026667,0.013333,0.0,0.0,0.04,0.0,0.013333,0.0,0.013333,0.0,0.0,0.013333,0.0,0.0,0.0,0.0,0.013333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.013333,0.013333,0.0,0.053333,0.0,0.0,0.0,0.0,0.0,0.013333,0.013333,0.0,0.0,0.0,0.0,0.0,0.013333,0.0,0.0,0.026667
3,M5A,"Regent Park, Harbourfront",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.022727,0.0,0.022727,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.068182,0.022727,0.0,0.0,0.0,0.0,0.0,0.0,0.022727,0.0,0.0,0.0,0.0,0.0,0.0,0.045455,0.022727,0.0,0.0,0.0,0.0,0.0,0.045455,0.0,0.0,0.0,0.0,0.0,0.022727,0.0,0.0,0.0,0.181818,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.022727,0.0,0.0,0.0,0.022727,0.0,0.0,0.0,0.0,0.0,0.022727,0.0,0.0,0.022727,0.0,0.022727,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.022727,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.022727,0.0,0.0,0.022727,0.0,0.0,0.0,0.0,0.0,0.022727,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.022727,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.068182,0.022727,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.068182,0.0,0.0,0.0,0.022727,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.022727,0.0,0.0,0.0,0.0,0.0,0.0,0.022727,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.045455,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.022727
4,M5B,"Garden District, Ryerson",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.03,0.0,0.01,0.01,0.0,0.04,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.09,0.0,0.09,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.03,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.02,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.03,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.02,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.02,0.01,0.01,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.02,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.02,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.01,0.01,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.01,0.01,0.01,0.02,0.0,0.01,0.0,0.0,0.0,0.01,0.01,0.01,0.0,0.0


In [88]:
Downtown_grouped.shape

(19, 214)

<p style="font-size:20px;">Print the five most frequent venue categories for each neighborhood.</p>

In [89]:
num_top_venues = 5

for hood in Downtown_grouped['Neighbourhood_PostalCode']:
    print("----"+hood+"----")
    temp = Downtown_grouped[Downtown_grouped['Neighbourhood_PostalCode'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[2:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----M4W----
            venue  freq
0            Park  0.50
1      Playground  0.25
2           Trail  0.25
3   Movie Theater  0.00
4  Massage Studio  0.00


----M4X----
         venue  freq
0  Coffee Shop  0.08
1   Restaurant  0.06
2  Pizza Place  0.06
3         Café  0.06
4         Park  0.04


----M4Y----
                 venue  freq
0          Coffee Shop  0.09
1  Japanese Restaurant  0.05
2     Sushi Restaurant  0.05
3              Gay Bar  0.05
4           Restaurant  0.04


----M5A----
         venue  freq
0  Coffee Shop  0.18
1       Bakery  0.07
2          Pub  0.07
3         Park  0.07
4      Theater  0.05


----M5B----
                 venue  freq
0          Coffee Shop  0.09
1       Clothing Store  0.09
2                 Café  0.04
3  Japanese Restaurant  0.03
4       Cosmetics Shop  0.03


----M5C----
          venue  freq
0   Coffee Shop  0.07
1          Café  0.06
2    Restaurant  0.05
3  Cocktail Bar  0.05
4      Beer Bar  0.04


----M5E----
            venue  freq
0   
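
<p style="font-size:20px;">The loop above transposes one row at a time. As a rough alternative sketch, the same ranking can be produced in one pass by reshaping the table to long form. The frame <b>grouped</b> and its postal codes below are hypothetical stand-ins with the same layout as <b>Downtown_grouped</b>, not data from this lab.</p>

```python
import pandas as pd

# hypothetical miniature of Downtown_grouped: two label columns, then one column per venue category
grouped = pd.DataFrame({
    "Neighbourhood_PostalCode": ["X1A", "X1B"],
    "Neighbourhood": ["Alpha", "Beta"],
    "Coffee Shop": [0.50, 0.10],
    "Park": [0.30, 0.60],
    "Pub": [0.20, 0.30],
})

# reshape to one row per (neighbourhood, venue category) pair
long = grouped.melt(
    id_vars=["Neighbourhood_PostalCode", "Neighbourhood"],
    var_name="venue",
    value_name="freq",
)

# sort by frequency within each postal code and keep the first two rows per group
top = (
    long.sort_values(["Neighbourhood_PostalCode", "freq"], ascending=[True, False])
        .groupby("Neighbourhood_PostalCode")
        .head(2)
        .reset_index(drop=True)
)
print(top)
```

<p style="font-size:20px;">The long form is also convenient for filtering or plotting, since every frequency sits in a single <b>freq</b> column.</p>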

<p style="font-size:20px;">Define a function that sorts a neighborhood's venue categories in descending order of frequency and returns the most common ones.</p>

In [90]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[2:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]
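
<p style="font-size:20px;">To see what the function returns, it can be tried on a single made-up row. The row below is a hypothetical example with the same layout as <b>Downtown_grouped</b> (two label columns followed by category frequencies).</p>

```python
import pandas as pd

def return_most_common_venues(row, num_top_venues):
    # skip the two label columns and keep only the venue-category frequencies
    row_categories = row.iloc[2:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    return row_categories_sorted.index.values[0:num_top_venues]

# hypothetical one-row frame; names and values are illustrative only
toy = pd.DataFrame({
    "Neighbourhood_PostalCode": ["X1A"],
    "Neighbourhood": ["Alpha"],
    "Bakery": [0.10],
    "Coffee Shop": [0.50],
    "Park": [0.25],
    "Pub": [0.15],
})

top3 = list(return_most_common_venues(toy.iloc[0, :], 3))
print(top3)
```

<p style="font-size:20px;">Note that the function always returns <b>num_top_venues</b> names. When a neighborhood has fewer distinct categories than that, as M4W appears to in the frequencies printed earlier, the remaining slots are filled by categories whose frequency is zero.</p>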

<p style="font-size:20px;">Create a new dataframe and display the top 10 venues for each neighborhood.</p>

In [91]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['PostalCode','Neighbourhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:  # ranks beyond 3rd have no entry in indicators and take 'th'
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['PostalCode'] = Downtown_grouped['Neighbourhood_PostalCode']
neighborhoods_venues_sorted['Neighbourhood'] = Downtown_grouped['Neighbourhood']

for ind in np.arange(Downtown_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 2:] = return_most_common_venues(Downtown_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,PostalCode,Neighbourhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M4W,Rosedale,Park,Trail,Playground,Cupcake Shop,Donut Shop,Doner Restaurant,Dog Run,Distribution Center,Discount Store,Diner
1,M4X,"St. James Town, Cabbagetown",Coffee Shop,Restaurant,Café,Pizza Place,Chinese Restaurant,Park,Pub,Italian Restaurant,Bakery,Market
2,M4Y,Church and Wellesley,Coffee Shop,Japanese Restaurant,Gay Bar,Sushi Restaurant,Restaurant,Yoga Studio,Men's Store,Mediterranean Restaurant,Hotel,Pub
3,M5A,"Regent Park, Harbourfront",Coffee Shop,Pub,Bakery,Park,Breakfast Spot,Café,Theater,Yoga Studio,Event Space,Performing Arts Venue
4,M5B,"Garden District, Ryerson",Clothing Store,Coffee Shop,Café,Bubble Tea Shop,Japanese Restaurant,Cosmetics Shop,Fast Food Restaurant,Hotel,Pizza Place,Bookstore
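
<p style="font-size:20px;">The column names above rely on a three-item suffix list plus a fallback, which works for 10 venues but would label an eleventh column correctly only by accident of the fallback and a twenty-first one incorrectly. A small helper, sketched here under the hypothetical name <b>ordinal</b>, handles any rank.</p>

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st'."""
    # 11, 12 and 13 (and 111, 112, ...) are irregular and always take 'th'
    if 10 <= n % 100 <= 20:
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"

num_top_venues = 10
columns = ["PostalCode", "Neighbourhood"] + [
    f"{ordinal(i)} Most Common Venue" for i in range(1, num_top_venues + 1)
]
print(columns[2], "|", columns[-1])
```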


<a id='item6'></a>
<h1 style="color:MediumSeaGreen;">Clustering Neighborhoods</h1>

<p style="font-size:20px;">Run _k_-means to group the neighborhoods into 5 clusters.</p>

In [92]:
# set number of clusters
kclusters = 5

# keep only the numeric venue-frequency columns; the two label columns are not features
Downtown_grouped_clustering = Downtown_grouped.drop(columns=['Neighbourhood_PostalCode','Neighbourhood'])

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(Downtown_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_

array([2, 1, 1, 4, 1, 1, 1, 4, 1, 1, 1, 1, 1, 1, 3, 1, 1, 0, 4])
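
<p style="font-size:20px;">Before merging, it can help to tally how many neighborhoods fall into each cluster. The sketch below simply counts the label array printed above, copied into a plain list.</p>

```python
from collections import Counter

# labels copied from the kmeans.labels_ output above
labels = [2, 1, 1, 4, 1, 1, 1, 4, 1, 1, 1, 1, 1, 1, 3, 1, 1, 0, 4]

sizes = Counter(labels)
for cluster in sorted(sizes):
    print(cluster, sizes[cluster])
```

<p style="font-size:20px;">Most neighborhoods land in cluster 1, while three clusters hold a single neighborhood each, so the result is worth reading as one dominant group plus a few outliers.</p>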

In [93]:
print(len(Downtown_df),len(neighborhoods_venues_sorted))

19 19


<p style="font-size:20px;">Create a new dataframe that includes the cluster label as well as the top 10 venues for each neighborhood.</p>

In [94]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

Downtown_merged = Downtown_df

# join the sorted venues (with cluster labels) onto Downtown_df by postal code to add latitude/longitude for each neighborhood
Downtown_merged = Downtown_merged.join(neighborhoods_venues_sorted.drop(columns=['Neighbourhood']).set_index('PostalCode'), on='PostalCode')

Downtown_merged.head() # check the last columns!

Unnamed: 0,PostalCode,Borough,Neighbourhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.6543,-79.3606,4,Coffee Shop,Pub,Bakery,Park,Breakfast Spot,Café,Theater,Yoga Studio,Event Space,Performing Arts Venue
1,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.6623,-79.3895,4,Coffee Shop,Yoga Studio,Creperie,Diner,Sandwich Place,Music Venue,Portuguese Restaurant,Beer Bar,Italian Restaurant,Smoothie Shop
2,M5B,Downtown Toronto,"Garden District, Ryerson",43.6572,-79.3789,1,Clothing Store,Coffee Shop,Café,Bubble Tea Shop,Japanese Restaurant,Cosmetics Shop,Fast Food Restaurant,Hotel,Pizza Place,Bookstore
3,M5C,Downtown Toronto,St. James Town,43.6515,-79.3754,1,Coffee Shop,Café,Cocktail Bar,Restaurant,Gastropub,American Restaurant,Beer Bar,Seafood Restaurant,Gym,Farmers Market
4,M5E,Downtown Toronto,Berczy Park,43.6448,-79.3733,1,Coffee Shop,Cheese Shop,Farmers Market,Cocktail Bar,Restaurant,Seafood Restaurant,Bakery,Beer Bar,Gourmet Shop,Vegetarian / Vegan Restaurant
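
<p style="font-size:20px;">The cluster labels were assigned in the row order of the grouped table, which differs from the row order of <b>Downtown_df</b>, so the join has to match on the postal code rather than on position. A minimal sketch of that behaviour, using two hypothetical frames (<b>coords</b> and <b>venues</b>) with made-up postal codes:</p>

```python
import pandas as pd

# hypothetical stand-in for Downtown_df, in its original row order
coords = pd.DataFrame({
    "PostalCode": ["X1B", "X1A", "X1C"],
    "Latitude": [43.1, 43.2, 43.3],
})

# hypothetical stand-in for neighborhoods_venues_sorted, ordered by postal code
venues = pd.DataFrame({
    "PostalCode": ["X1A", "X1B", "X1C"],
    "Cluster Labels": [0, 1, 0],
})

# join looks up each PostalCode value of coords in the index of the right-hand frame
merged = coords.join(venues.set_index("PostalCode"), on="PostalCode")
print(merged)
```

<p style="font-size:20px;">If a postal code had no venues, the join would leave its cluster label as NaN and the column would become floating point. Here both tables have 19 rows, so that case should not arise.</p>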


In [95]:
Downtown_merged.groupby('Cluster Labels').count()

Cluster Labels,PostalCode,Borough,Neighbourhood,Latitude,Longitude,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1
1,13,13,13,13,13,13,13,13,13,13,13,13,13,13,13
2,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1
3,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1
4,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3


<p style="font-size:20px;">Visualize the resulting clusters on a map.</p>

In [96]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters: one evenly spaced rainbow color per cluster
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
for lat, lon, poi, cluster, neigh in zip(Downtown_merged['Latitude'], Downtown_merged['Longitude'], Downtown_merged['PostalCode'], Downtown_merged['Cluster Labels'],Downtown_merged['Neighbourhood']):
    label = folium.Popup(str(poi)+ ' ' + str(neigh) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[int(cluster)],  # labels are 0-based, so index the color list directly
        fill=True,
        fill_color=rainbow[int(cluster)],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters
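
<p style="font-size:20px;">The color list used for the markers holds one hex string per cluster. The sketch below builds that list on its own and pairs each 0-based label with its color; the dictionary name <b>cluster_color</b> is an illustrative assumption, not part of the lab.</p>

```python
import numpy as np
import matplotlib.cm as cm
import matplotlib.colors as colors

kclusters = 5

# sample the rainbow colormap at kclusters evenly spaced points (rows of RGBA values)
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))

# convert each RGBA row to a '#rrggbb' string that folium accepts
rainbow = [colors.rgb2hex(c) for c in colors_array]

# map every cluster label to its color
cluster_color = {cluster: rainbow[cluster] for cluster in range(kclusters)}
print(cluster_color)
```

<p style="font-size:20px;">Printing the mapping next to the map makes it easier to tell which marker color belongs to which cluster.</p>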

<a id='item7'></a>
<h1 style="color:MediumSeaGreen;">Examine Clusters</h1>

<p style="font-size:20px;">Here, each cluster is printed separately to analyze which venues its neighborhoods have in common.</p>

In [97]:
Downtown_merged.loc[Downtown_merged['Cluster Labels'] == 0, Downtown_merged.columns[[1] + list(range(5, Downtown_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
6,Downtown Toronto,0,Grocery Store,Café,Park,Candy Store,Restaurant,Italian Restaurant,Baby Store,Athletics & Sports,Nightclub,Coffee Shop


In [98]:
Downtown_merged.loc[Downtown_merged['Cluster Labels'] == 1, Downtown_merged.columns[[1] + list(range(5, Downtown_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
2,Downtown Toronto,1,Clothing Store,Coffee Shop,Café,Bubble Tea Shop,Japanese Restaurant,Cosmetics Shop,Fast Food Restaurant,Hotel,Pizza Place,Bookstore
3,Downtown Toronto,1,Coffee Shop,Café,Cocktail Bar,Restaurant,Gastropub,American Restaurant,Beer Bar,Seafood Restaurant,Gym,Farmers Market
4,Downtown Toronto,1,Coffee Shop,Cheese Shop,Farmers Market,Cocktail Bar,Restaurant,Seafood Restaurant,Bakery,Beer Bar,Gourmet Shop,Vegetarian / Vegan Restaurant
7,Downtown Toronto,1,Coffee Shop,Café,Gym,Hotel,Restaurant,Clothing Store,Bar,Thai Restaurant,Breakfast Spot,Concert Hall
8,Downtown Toronto,1,Coffee Shop,Aquarium,Hotel,Café,Fried Chicken Joint,Scenic Lookout,Brewery,Restaurant,Bar,Park
9,Downtown Toronto,1,Coffee Shop,Hotel,Café,Restaurant,Japanese Restaurant,Salad Place,Seafood Restaurant,American Restaurant,Gastropub,Asian Restaurant
10,Downtown Toronto,1,Coffee Shop,Restaurant,Hotel,Café,American Restaurant,Gym,Seafood Restaurant,Japanese Restaurant,Deli / Bodega,Thai Restaurant
11,Downtown Toronto,1,Café,Bookstore,Bar,Japanese Restaurant,Sandwich Place,Bakery,Yoga Studio,Italian Restaurant,Beer Bar,Beer Store
12,Downtown Toronto,1,Vegetarian / Vegan Restaurant,Café,Bar,Coffee Shop,Mexican Restaurant,Vietnamese Restaurant,Gaming Cafe,Dumpling Restaurant,Burger Joint,Pizza Place
15,Downtown Toronto,1,Coffee Shop,Italian Restaurant,Pub,Japanese Restaurant,Restaurant,Café,Beer Bar,Seafood Restaurant,Hotel,Park


In [99]:
Downtown_merged.loc[Downtown_merged['Cluster Labels'] == 2, Downtown_merged.columns[[1] + list(range(5, Downtown_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
14,Downtown Toronto,2,Park,Trail,Playground,Cupcake Shop,Donut Shop,Doner Restaurant,Dog Run,Distribution Center,Discount Store,Diner


In [100]:
Downtown_merged.loc[Downtown_merged['Cluster Labels'] == 3, Downtown_merged.columns[[1] + list(range(5, Downtown_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
13,Downtown Toronto,3,Airport Lounge,Airport Service,Coffee Shop,Harbor / Marina,Plane,Rental Car Location,Boutique,Sculpture Garden,Bar,Boat or Ferry


In [101]:
Downtown_merged.loc[Downtown_merged['Cluster Labels'] == 4, Downtown_merged.columns[[1] + list(range(5, Downtown_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Downtown Toronto,4,Coffee Shop,Pub,Bakery,Park,Breakfast Spot,Café,Theater,Yoga Studio,Event Space,Performing Arts Venue
1,Downtown Toronto,4,Coffee Shop,Yoga Studio,Creperie,Diner,Sandwich Place,Music Venue,Portuguese Restaurant,Beer Bar,Italian Restaurant,Smoothie Shop
5,Downtown Toronto,4,Coffee Shop,Café,Sandwich Place,Italian Restaurant,Salad Place,Thai Restaurant,Department Store,Japanese Restaurant,Burger Joint,Bubble Tea Shop
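
<p style="font-size:20px;">Instead of reading each cluster table by eye, the dominant venue per cluster can be summarized with a group-by. The frame below is a hypothetical miniature of <b>Downtown_merged</b> with only the two columns needed; its rows are illustrative and not taken from the tables above.</p>

```python
import pandas as pd

# hypothetical miniature of Downtown_merged
merged = pd.DataFrame({
    "Cluster Labels": [1, 1, 1, 4, 4, 2],
    "1st Most Common Venue": [
        "Coffee Shop", "Coffee Shop", "Café",
        "Coffee Shop", "Coffee Shop", "Park",
    ],
})

# for each cluster, pick the value that occurs most often in the 1st-venue column
summary = (
    merged.groupby("Cluster Labels")["1st Most Common Venue"]
          .agg(lambda s: s.value_counts().idxmax())
)
print(summary)
```

<p style="font-size:20px;">Applied to the real table, a summary like this would give one line per cluster, which is a quick way to label clusters before inspecting them in detail.</p>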
