# Capstone Project - The Battle of the Neighborhoods

### Introduction: Business Problem

This project we will find an appropriate food truck area for those who interested in opening business in Toronto but has a limit of cost. As there are many different venues but we have a limit of cost, so that we have to detect suitable locations to open a food truck in that area, because we don't want to open a large business which competes with other business. There are many venues like many tourists.

We will use machine learning to detect more neighborhoods in the city. After we obtained data from machine learning, we analyze the cluster of each area that has a lot of density of venues.


### Data description
Neighborhood and Borough Information of Toronto from Wikipedia website and latitude-longitude information from http://cocl.us/Geospatial_data
as the centroid of each area in the city of Toronto to explore the neighborhoods.

Utilizing the Foursquare API to explore the neighborhoods by being used locations of the city in Toronto.

The number of venues included type locations, latitude-longitude points of each area will be obtained using Foursquare API, they all will be fitted in the k-means algorithm to cluster venues.

# Segmenting and Clustering Neighborhoods in Toronto

In [1]:
import numpy as np

import pandas as pd
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

from bs4 import BeautifulSoup
import requests

In [2]:
# Assign the link of the website
source = requests.get('https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M').text

In [3]:
# Reading the source code 
soup = BeautifulSoup(source, 'lxml')
#print(soup.prettify())

In [4]:
# find the table class 
wiki_table = soup.find('table', {'class':'wikitable sortable'})
#wiki_table

In [5]:
# Extract Header cols
filter_words = wiki_table.find_all('th')

# Create cols
header_cols = [i.text.replace('\n', '') for i in filter_words]
header_cols

['Postcode', 'Borough', 'Neighbourhood']

In [6]:
# Extract value cols
filter_words = wiki_table.find_all('td')

# Create values
value_cols = [filter_words[i].text.replace('\n','') for i in range(len(filter_words))]
#value_cols

In [7]:
# Create DataFrame for cleaning
Canada_df = pd.DataFrame(columns=header_cols, data=np.reshape(value_cols,(-1,3)))

print(Canada_df.shape)
Canada_df.head()

(288, 3)


Unnamed: 0,Postcode,Borough,Neighbourhood
0,M1A,Not assigned,Not assigned
1,M2A,Not assigned,Not assigned
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,Harbourfront


In [8]:
# Ignore cells is 'Not assigned'
filter_notassig = Canada_df.loc[(Canada_df['Borough'] == 'Not assigned')].index

# Drop rows is 'Not assigned'
Canada_df = Canada_df.drop(filter_notassig)
Canada_df.head()

Unnamed: 0,Postcode,Borough,Neighbourhood
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,Harbourfront
5,M5A,Downtown Toronto,Regent Park
6,M6A,North York,Lawrence Heights


In [9]:
# fill values
Canada_df.iloc[6, 2] = Canada_df.loc[Canada_df['Borough']=="Queen's Park"].values[0][1]

# Reset Index
Canada_df.reset_index(drop='index', inplace=True)

print(Canada_df.shape)
Canada_df

(211, 3)


Unnamed: 0,Postcode,Borough,Neighbourhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,Harbourfront
3,M5A,Downtown Toronto,Regent Park
4,M6A,North York,Lawrence Heights
5,M6A,North York,Lawrence Manor
6,M7A,Queen's Park,Queen's Park
7,M9A,Etobicoke,Islington Avenue
8,M1B,Scarborough,Rouge
9,M1B,Scarborough,Malvern


In [10]:
# same values in the Postcode and Borough columns then values in the Neighbourhood columns to one row unique based on ostcode and Borough columns 
Grouped_df = Canada_df.groupby(['Postcode', 'Borough'])['Neighbourhood'].apply(list)
Grouped_df.shape

(103,)

In [11]:
# Extract values to Columns
Postcode_cols = [Grouped_df.index[i][0] for i in range(Grouped_df.shape[0])]
Borough_cols = [Grouped_df.index[i][1] for i in range(Grouped_df.shape[0])]
Neighbourhood_cols = [Grouped_df.values[i] for i in range(Grouped_df.shape[0])]

In [12]:
# Create DataFrame
Neighbourhood_df = pd.DataFrame(columns=header_cols, data={'Postcode':Postcode_cols,
                                                          'Borough':Borough_cols,
                                                          'Neighbourhood':Neighbourhood_cols})
Neighbourhood_df.head()

Unnamed: 0,Postcode,Borough,Neighbourhood
0,M1B,Scarborough,"[Rouge, Malvern]"
1,M1C,Scarborough,"[Highland Creek, Rouge Hill, Port Union]"
2,M1E,Scarborough,"[Guildwood, Morningside, West Hill]"
3,M1G,Scarborough,[Woburn]
4,M1H,Scarborough,[Cedarbrae]


In [13]:
Neighbourhood_df['Neighbourhood']=Neighbourhood_df['Neighbourhood'].astype(str).str.replace("[",'')
Neighbourhood_df['Neighbourhood']=Neighbourhood_df['Neighbourhood'].astype(str).str.replace("]",'')
Neighbourhood_df

Unnamed: 0,Postcode,Borough,Neighbourhood
0,M1B,Scarborough,"'Rouge', 'Malvern'"
1,M1C,Scarborough,"'Highland Creek', 'Rouge Hill', 'Port Union'"
2,M1E,Scarborough,"'Guildwood', 'Morningside', 'West Hill'"
3,M1G,Scarborough,'Woburn'
4,M1H,Scarborough,'Cedarbrae'
5,M1J,Scarborough,'Scarborough Village'
6,M1K,Scarborough,"'East Birchmount Park', 'Ionview', 'Kennedy Park'"
7,M1L,Scarborough,"'Clairlea', 'Golden Mile', 'Oakridge'"
8,M1M,Scarborough,"'Cliffcrest', 'Cliffside', 'Scarborough Villag..."
9,M1N,Scarborough,"'Birch Cliff', 'Cliffside West'"


In [14]:
# the number of rows of your dataframe
Neighbourhood_df.shape

(103, 3)

In [15]:
# Import the csv file from http://cocl.us/Geospatial_data
Geo_coordinates = pd.read_csv('Geospatial_Coordinates.csv')
Geo_coordinates.head()

Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


In [16]:
# merge tables
Neighbourhood_df = pd.merge(Neighbourhood_df, Geo_coordinates, how='left', left_on=['Postcode'], right_on=['Postal Code'])
Neighbourhood_df.drop('Postal Code', axis=1, inplace=True)

In [17]:
Neighbourhood_df.head()

Unnamed: 0,Postcode,Borough,Neighbourhood,Latitude,Longitude
0,M1B,Scarborough,"'Rouge', 'Malvern'",43.806686,-79.194353
1,M1C,Scarborough,"'Highland Creek', 'Rouge Hill', 'Port Union'",43.784535,-79.160497
2,M1E,Scarborough,"'Guildwood', 'Morningside', 'West Hill'",43.763573,-79.188711
3,M1G,Scarborough,'Woburn',43.770992,-79.216917
4,M1H,Scarborough,'Cedarbrae',43.773136,-79.239476


In [18]:
print('The dataframe has {} Borough and {} Neighbourhood.'.format(
    len(Neighbourhood_df['Borough'].unique()),
    len(Neighbourhood_df['Neighbourhood'])))

The dataframe has 11 Borough and 103 Neighbourhood.


In [19]:
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

In [20]:
address = 'Toronto ,ON'
geolocator = Nominatim(user_agent="on_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geograpical coordinate of Toronto are {}, {}.'.format(latitude, longitude))

The geograpical coordinate of Toronto are 43.653963, -79.387207.


In [21]:
import folium

In [22]:
# Create map of Toronto using latitude and longitude values
map_toronto = folium.Map(location=[latitude, longitude], zoom_start=10, )

# add markers to map
for lat, lng, borough, neighborhood in zip(Neighbourhood_df['Latitude'], Neighbourhood_df['Longitude'], Neighbourhood_df['Borough'], Neighbourhood_df['Neighbourhood']):
    label = '{}, {}'.format(borough, neighborhood)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker([lat, lng],
                        radius=5,
                        popup=label,
                        color='blue',
                        fill=True,
                        fill_color='#3186cc',
                        fill_opacity=0.7,
                        parse_html=False).add_to(map_toronto) 
map_toronto

In [23]:
Neighbourhood_df.shape

(103, 5)

In [24]:
Neighbourhood_df.head()

Unnamed: 0,Postcode,Borough,Neighbourhood,Latitude,Longitude
0,M1B,Scarborough,"'Rouge', 'Malvern'",43.806686,-79.194353
1,M1C,Scarborough,"'Highland Creek', 'Rouge Hill', 'Port Union'",43.784535,-79.160497
2,M1E,Scarborough,"'Guildwood', 'Morningside', 'West Hill'",43.763573,-79.188711
3,M1G,Scarborough,'Woburn',43.770992,-79.216917
4,M1H,Scarborough,'Cedarbrae',43.773136,-79.239476


In [25]:
# Consider The Area size of Borough
Neighbourhood_df.groupby('Borough').size()

Borough
Central Toronto      9
Downtown Toronto    18
East Toronto         5
East York            5
Etobicoke           12
Mississauga          1
North York          24
Queen's Park         1
Scarborough         17
West Toronto         6
York                 5
dtype: int64

In [26]:
northyork_dt = Neighbourhood_df[Neighbourhood_df['Borough']=='North York'].reset_index(drop=True)
northyork_dt.head()

Unnamed: 0,Postcode,Borough,Neighbourhood,Latitude,Longitude
0,M2H,North York,'Hillcrest Village',43.803762,-79.363452
1,M2J,North York,"'Fairview', 'Henry Farm', 'Oriole'",43.778517,-79.346556
2,M2K,North York,'Bayview Village',43.786947,-79.385975
3,M2L,North York,"'Silver Hills', 'York Mills'",43.75749,-79.374714
4,M2M,North York,"'Newtonbrook', 'Willowdale'",43.789053,-79.408493


Let's get the geographical coordinates of North York

In [27]:
address = 'North York, Toronto'

geolocator = Nominatim(user_agent="on_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geograpical coordinate of North York are {}, {}.'.format(latitude, longitude))

The geograpical coordinate of North York are 43.7708175, -79.4132998.


 let's visualizat North York the neighborhoods

In [28]:
map_northyork = folium.Map(location=[latitude, longitude], zoom_start=9)

for lat, lng, label in zip(Neighbourhood_df['Latitude'], Neighbourhood_df['Longitude'], Neighbourhood_df['Neighbourhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_northyork)
map_northyork

#### Utilizing the Foursquare API to explore the neighborhoods

In [29]:
CLIENT_ID = 'TQX15HKTG5NJEUW4YVTEII55LKOQOPPXT0TK1X5Z3YNV4BA2' # your Foursquare ID
CLIENT_SECRET = 'UIYBJCDGT0ZQ3CELYLLFLHNSE42PF1UFRM00MSNYDSZCUHIB' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

print('Your credentails:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentails:
CLIENT_ID: TQX15HKTG5NJEUW4YVTEII55LKOQOPPXT0TK1X5Z3YNV4BA2
CLIENT_SECRET:UIYBJCDGT0ZQ3CELYLLFLHNSE42PF1UFRM00MSNYDSZCUHIB


In [30]:
northyork_dt.loc[5,'Neighbourhood']

"'Willowdale South'"

In [31]:
neighborhood_latitude = northyork_dt.loc[5, 'Latitude'] # neighborhood latitude value
neighborhood_longitude = northyork_dt.loc[5, 'Longitude'] # neighborhood longitude value

neighborhood_name = northyork_dt.loc[5, 'Neighbourhood'] # neighborhood name

print('Latitude and longitude values of {} are {}, {}.'.format(neighborhood_name, 
                                                               neighborhood_latitude, 
                                                               neighborhood_longitude))

Latitude and longitude values of 'Willowdale South' are 43.7701199, -79.40849279999999.


###  let's get the top 100 venues that are in Willowdale South within a radius of 500 meters.

In [33]:
LIMIT = 100
radius = 500
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    neighborhood_latitude, 
    neighborhood_longitude, 
    radius, LIMIT)
url

'https://api.foursquare.com/v2/venues/explore?&client_id=TQX15HKTG5NJEUW4YVTEII55LKOQOPPXT0TK1X5Z3YNV4BA2&client_secret=UIYBJCDGT0ZQ3CELYLLFLHNSE42PF1UFRM00MSNYDSZCUHIB&v=20180605&ll=43.7701199,-79.40849279999999&radius=500&limit=100'

In [34]:
results = requests.get(url).json()
#results

In [35]:
# function from the Foursquare lab that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

In [36]:
import json # library to handle JSON files
from pandas.io.json import json_normalize # tranform JSON file into a pandas dataframe

venues = results['response']['groups'][0]['items']
    
nearby_venues = json_normalize(venues) # flatten JSON

# defind columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues = nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head()

Unnamed: 0,name,categories,lat,lng
0,Konjiki Ramen,Ramen Restaurant,43.766998,-79.412222
1,The Keg,Steakhouse,43.766579,-79.412131
2,Loblaws,Grocery Store,43.768648,-79.412597
3,Cineplex Cinemas Empress Walk,Movie Theater,43.768625,-79.412613
4,Aroma Espresso Bar,Café,43.769449,-79.413081


In [37]:
print('{} venues were returned by Foursquare.'.format(nearby_venues.shape[0]))

36 venues were returned by Foursquare.


In [38]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL 
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

In [39]:
NorthYork_venues = getNearbyVenues(names=northyork_dt['Neighbourhood'],
                                  latitudes=northyork_dt['Latitude'],
                                  longitudes=northyork_dt['Longitude'])
                                
NorthYork_venues.head()

'Hillcrest Village'
'Fairview', 'Henry Farm', 'Oriole'
'Bayview Village'
'Silver Hills', 'York Mills'
'Newtonbrook', 'Willowdale'
'Willowdale South'
'York Mills West'
'Willowdale West'
'Parkwoods'
'Don Mills North'
'Flemingdon Park', 'Don Mills South'
'Bathurst Manor', 'Downsview North', 'Wilson Heights'
'Northwood Park', 'York University'
'CFB Toronto', 'Downsview East'
'Downsview West'
'Downsview Central'
'Downsview Northwest'
'Victoria Village'
'Bedford Park', 'Lawrence Manor East'
'Lawrence Heights', 'Lawrence Manor'
'Glencairn'
'Downsview', 'North Park', 'Upwood Park'
'Humber Summit'
'Emery', 'Humberlea'


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,'Hillcrest Village',43.803762,-79.363452,Eagle's Nest Golf Club,43.805455,-79.364186,Golf Course
1,'Hillcrest Village',43.803762,-79.363452,AY Jackson Pool,43.804515,-79.366138,Pool
2,'Hillcrest Village',43.803762,-79.363452,Villa Madina,43.801685,-79.363938,Mediterranean Restaurant
3,'Hillcrest Village',43.803762,-79.363452,Duncan Creek Park,43.805539,-79.360695,Dog Run
4,"'Fairview', 'Henry Farm', 'Oriole'",43.778517,-79.346556,The LEGO Store,43.778207,-79.343483,Toy / Game Store


In [40]:
print(NorthYork_venues.shape)

(251, 7)


In [41]:
NorthYork_venues.groupby('Neighborhood').count()

Unnamed: 0_level_0,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
"'Bathurst Manor', 'Downsview North', 'Wilson Heights'",18,18,18,18,18,18
'Bayview Village',4,4,4,4,4,4
"'Bedford Park', 'Lawrence Manor East'",25,25,25,25,25,25
"'CFB Toronto', 'Downsview East'",3,3,3,3,3,3
'Don Mills North',5,5,5,5,5,5
'Downsview Central',4,4,4,4,4,4
'Downsview Northwest',5,5,5,5,5,5
'Downsview West',4,4,4,4,4,4
"'Downsview', 'North Park', 'Upwood Park'",5,5,5,5,5,5
"'Emery', 'Humberlea'",1,1,1,1,1,1


In [42]:
print('There are {} uniques categories.'.format(len(NorthYork_venues['Venue Category'].unique())))

There are 112 uniques categories.


### Analyze Each Neighborhood

In [43]:
# one hot encoding
# defind Dummies by 'Venue Category' column
northyork_onehot = pd.get_dummies(NorthYork_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
northyork_onehot['Neighborhood'] = NorthYork_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [northyork_onehot.columns[-1]] + list(northyork_onehot.columns[:-1])
northyork_onehot = northyork_onehot[fixed_columns]

northyork_onehot.head()

Unnamed: 0,Neighborhood,Accessories Store,Airport,American Restaurant,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Bakery,Bank,Bar,Baseball Field,Basketball Court,Beer Store,Bike Shop,Boutique,Bridal Shop,Bubble Tea Shop,Burger Joint,Burrito Place,Bus Line,Bus Station,Butcher,Cafeteria,Café,Candy Store,Caribbean Restaurant,Chinese Restaurant,Clothing Store,Coffee Shop,Comfort Food Restaurant,Construction & Landscaping,Convenience Store,Cosmetics Shop,Cupcake Shop,Deli / Bodega,Department Store,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Dog Run,Electronics Store,Empanada Restaurant,Event Space,Falafel Restaurant,Fast Food Restaurant,Food & Drink Shop,Food Court,Food Truck,Fraternity House,Fried Chicken Joint,Frozen Yogurt Shop,Furniture / Home Store,General Entertainment,Golf Course,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Hockey Arena,Home Service,Hotel,Ice Cream Shop,Indian Restaurant,Indonesian Restaurant,Intersection,Italian Restaurant,Japanese Restaurant,Juice Bar,Kids Store,Korean Restaurant,Liquor Store,Lounge,Luggage Store,Massage Studio,Mediterranean Restaurant,Metro Station,Middle Eastern Restaurant,Miscellaneous Shop,Movie Theater,Moving Target,Other Repair Shop,Park,Pet Store,Pharmacy,Pizza Place,Plaza,Pool,Portuguese Restaurant,Pub,Ramen Restaurant,Restaurant,Salon / Barbershop,Sandwich Place,Shoe Store,Shopping Mall,Smoke Shop,Smoothie Shop,Spa,Sporting Goods Shop,Steakhouse,Supermarket,Supplement Shop,Sushi Restaurant,Tea Room,Thai Restaurant,Theater,Toy / Game Store,Video Game Store,Video Store,Vietnamese Restaurant,Wings Joint,Women's Store
0,'Hillcrest Village',0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,'Hillcrest Village',0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,'Hillcrest Village',0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,'Hillcrest Village',0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,"'Fairview', 'Henry Farm', 'Oriole'",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0


In [44]:
northyork_onehot.shape

(251, 113)

#### let's group rows by neighborhood and by taking the mean of the frequency of occurrence of each category

In [45]:
northyork_grouped = northyork_onehot.groupby('Neighborhood').mean().reset_index()
northyork_grouped.head()

Unnamed: 0,Neighborhood,Accessories Store,Airport,American Restaurant,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Bakery,Bank,Bar,Baseball Field,Basketball Court,Beer Store,Bike Shop,Boutique,Bridal Shop,Bubble Tea Shop,Burger Joint,Burrito Place,Bus Line,Bus Station,Butcher,Cafeteria,Café,Candy Store,Caribbean Restaurant,Chinese Restaurant,Clothing Store,Coffee Shop,Comfort Food Restaurant,Construction & Landscaping,Convenience Store,Cosmetics Shop,Cupcake Shop,Deli / Bodega,Department Store,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Dog Run,Electronics Store,Empanada Restaurant,Event Space,Falafel Restaurant,Fast Food Restaurant,Food & Drink Shop,Food Court,Food Truck,Fraternity House,Fried Chicken Joint,Frozen Yogurt Shop,Furniture / Home Store,General Entertainment,Golf Course,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Hockey Arena,Home Service,Hotel,Ice Cream Shop,Indian Restaurant,Indonesian Restaurant,Intersection,Italian Restaurant,Japanese Restaurant,Juice Bar,Kids Store,Korean Restaurant,Liquor Store,Lounge,Luggage Store,Massage Studio,Mediterranean Restaurant,Metro Station,Middle Eastern Restaurant,Miscellaneous Shop,Movie Theater,Moving Target,Other Repair Shop,Park,Pet Store,Pharmacy,Pizza Place,Plaza,Pool,Portuguese Restaurant,Pub,Ramen Restaurant,Restaurant,Salon / Barbershop,Sandwich Place,Shoe Store,Shopping Mall,Smoke Shop,Smoothie Shop,Spa,Sporting Goods Shop,Steakhouse,Supermarket,Supplement Shop,Sushi Restaurant,Tea Room,Thai Restaurant,Theater,Toy / Game Store,Video Game Store,Video Store,Vietnamese Restaurant,Wings Joint,Women's Store
0,"'Bathurst Manor', 'Downsview North', 'Wilson H...",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.055556,0.0,0.0,0.0,0.0,0.0,0.0,0.055556,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.055556,0.0,0.0,0.0,0.055556,0.0,0.0,0.0,0.0,0.0,0.0,0.055556,0.0,0.0,0.0,0.0,0.055556,0.055556,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.055556,0.0,0.0,0.0,0.0,0.0,0.0,0.055556,0.055556,0.0,0.0,0.0,0.0,0.0,0.055556,0.0,0.055556,0.0,0.055556,0.0,0.0,0.0,0.0,0.0,0.055556,0.0,0.055556,0.0,0.0,0.0,0.0,0.0,0.055556,0.0,0.0,0.0
1,'Bayview Village',0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,"'Bedford Park', 'Lawrence Manor East'",0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.04,0.0,0.0,0.0,0.0,0.08,0.04,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.08,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.04,0.0,0.0,0.0,0.0,0.0,0.04,0.04,0.0,0.0,0.08,0.04,0.04,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.04,0.0,0.0,0.0,0.04,0.0,0.04,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,"'CFB Toronto', 'Downsview East'",0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,'Don Mills North',0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [46]:
northyork_grouped.shape

(23, 113)

#### Let's print each neighborhood along with the top 10 most common venues

In [51]:
#num_top_venues = 10

#for hood in northyork_grouped['Neighborhood']:
#    print("----"+hood+"----")
#    temp = northyork_grouped[northyork_grouped['Neighborhood'] == hood].T.reset_index()
#    temp.columns = ['venue','freq']
#    temp = temp.iloc[1:]
#    temp['freq'] = temp['freq'].astype(float)
#    temp = temp.round({'freq': 2})
#    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
#    print('\n')*/


let's write a function to sort the venues in descending order.

In [52]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

let's create the new dataframe and display the top 10 venues for each neighborhood.

In [53]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = northyork_grouped['Neighborhood']

for ind in np.arange(northyork_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(northyork_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,"'Bathurst Manor', 'Downsview North', 'Wilson H...",Coffee Shop,Fried Chicken Joint,Supermarket,Frozen Yogurt Shop,Pharmacy,Pizza Place,Deli / Bodega,Restaurant,Bridal Shop,Sandwich Place
1,'Bayview Village',Chinese Restaurant,Café,Bank,Japanese Restaurant,Women's Store,Electronics Store,Cosmetics Shop,Cupcake Shop,Deli / Bodega,Department Store
2,"'Bedford Park', 'Lawrence Manor East'",Coffee Shop,Fast Food Restaurant,Italian Restaurant,Grocery Store,Indian Restaurant,Cupcake Shop,Liquor Store,Café,Pharmacy,Pizza Place
3,"'CFB Toronto', 'Downsview East'",Park,Airport,Other Repair Shop,Electronics Store,Construction & Landscaping,Convenience Store,Cosmetics Shop,Cupcake Shop,Deli / Bodega,Department Store
4,'Don Mills North',Gym / Fitness Center,Caribbean Restaurant,Café,Baseball Field,Japanese Restaurant,Women's Store,Electronics Store,Cosmetics Shop,Cupcake Shop,Deli / Bodega


## Cluster Neighborhoods

Run *k*-means to cluster the neighborhood into 5 clusters.

In [54]:
# import k-means from clustering stage
from sklearn.cluster import KMeans

In [55]:
# Set number of clusters
k_clusters = 5

northyork_grouped_clustering = northyork_grouped.drop('Neighborhood', 1)

# run k-mean clustering
kmeans = KMeans(n_clusters=k_clusters, random_state=0).fit(northyork_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_

array([3, 3, 3, 0, 3, 3, 3, 3, 3, 4, 3, 3, 3, 3, 3, 3, 3, 0, 2, 1, 3, 3,
       0])

In [56]:
# add clustering labels col
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_.astype('int'))

In [57]:
# merge northyork_grouped with northyork_dt to add latitude/longitude for each neighborhood
northyork_merged = northyork_dt.merge(neighborhoods_venues_sorted.set_index('Neighborhood'), left_on='Neighbourhood', right_on='Neighborhood')

print(northyork_merged.shape)
northyork_merged.head()

(23, 16)


Unnamed: 0,Postcode,Borough,Neighbourhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M2H,North York,'Hillcrest Village',43.803762,-79.363452,3,Golf Course,Dog Run,Pool,Mediterranean Restaurant,Construction & Landscaping,Convenience Store,Cosmetics Shop,Cupcake Shop,Deli / Bodega,Department Store
1,M2J,North York,"'Fairview', 'Henry Farm', 'Oriole'",43.778517,-79.346556,3,Clothing Store,Fast Food Restaurant,Coffee Shop,Restaurant,Toy / Game Store,Metro Station,Tea Room,Bakery,Kids Store,Japanese Restaurant
2,M2K,North York,'Bayview Village',43.786947,-79.385975,3,Chinese Restaurant,Café,Bank,Japanese Restaurant,Women's Store,Electronics Store,Cosmetics Shop,Cupcake Shop,Deli / Bodega,Department Store
3,M2L,North York,"'Silver Hills', 'York Mills'",43.75749,-79.374714,2,Cafeteria,Women's Store,Comfort Food Restaurant,Convenience Store,Cosmetics Shop,Cupcake Shop,Deli / Bodega,Department Store,Dessert Shop,Dim Sum Restaurant
4,M2N,North York,'Willowdale South',43.77012,-79.408493,3,Restaurant,Coffee Shop,Ramen Restaurant,Café,Japanese Restaurant,Sandwich Place,Sushi Restaurant,Pizza Place,Bubble Tea Shop,Ice Cream Shop


Finally, let's visualize the resulting clusters

In [58]:
# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

In [59]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=10)

# set color scheme for the clusters
x = np.arange(k_clusters)
ys = [i + x + (i*x)**2 for i in range(k_clusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(northyork_merged['Latitude'], northyork_merged['Longitude'], northyork_merged['Neighbourhood'], northyork_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=1).add_to(map_clusters)
       
map_clusters

## Examine Clusters

#### Cluster 0

In [65]:
northyork_merged.loc[(northyork_merged['Cluster Labels'] == 0), 
                     northyork_merged.columns[[2] + list(range(6, northyork_merged.shape[1]))]]

Unnamed: 0,Neighbourhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
5,'York Mills West',Park,Convenience Store,Bank,Bar,Electronics Store,Construction & Landscaping,Cosmetics Shop,Cupcake Shop,Deli / Bodega,Department Store
7,'Parkwoods',Park,Food & Drink Shop,Fast Food Restaurant,Women's Store,Dog Run,Construction & Landscaping,Convenience Store,Cosmetics Shop,Cupcake Shop,Deli / Bodega
12,"'CFB Toronto', 'Downsview East'",Park,Airport,Other Repair Shop,Electronics Store,Construction & Landscaping,Convenience Store,Cosmetics Shop,Cupcake Shop,Deli / Bodega,Department Store


#### Cluster 1

In [61]:
northyork_merged.loc[(northyork_merged['Cluster Labels'] == 1), 
                     northyork_merged.columns[[2] + list(range(6, northyork_merged.shape[1]))]]

Unnamed: 0,Neighbourhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
16,'Victoria Village',Hockey Arena,Portuguese Restaurant,Intersection,Coffee Shop,Women's Store,Dog Run,Construction & Landscaping,Convenience Store,Cosmetics Shop,Cupcake Shop


#### Cluster 2

In [62]:
northyork_merged.loc[(northyork_merged['Cluster Labels'] == 2), 
                     northyork_merged.columns[[2] + list(range(6, northyork_merged.shape[1]))]]

Unnamed: 0,Neighbourhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
3,"'Silver Hills', 'York Mills'",Cafeteria,Women's Store,Comfort Food Restaurant,Convenience Store,Cosmetics Shop,Cupcake Shop,Deli / Bodega,Department Store,Dessert Shop,Dim Sum Restaurant


#### Cluster 3

In [63]:
northyork_merged.loc[(northyork_merged['Cluster Labels'] == 3), 
                     northyork_merged.columns[[2] + list(range(6, northyork_merged.shape[1]))]]

Unnamed: 0,Neighbourhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,'Hillcrest Village',Golf Course,Dog Run,Pool,Mediterranean Restaurant,Construction & Landscaping,Convenience Store,Cosmetics Shop,Cupcake Shop,Deli / Bodega,Department Store
1,"'Fairview', 'Henry Farm', 'Oriole'",Clothing Store,Fast Food Restaurant,Coffee Shop,Restaurant,Toy / Game Store,Metro Station,Tea Room,Bakery,Kids Store,Japanese Restaurant
2,'Bayview Village',Chinese Restaurant,Café,Bank,Japanese Restaurant,Women's Store,Electronics Store,Cosmetics Shop,Cupcake Shop,Deli / Bodega,Department Store
4,'Willowdale South',Restaurant,Coffee Shop,Ramen Restaurant,Café,Japanese Restaurant,Sandwich Place,Sushi Restaurant,Pizza Place,Bubble Tea Shop,Ice Cream Shop
6,'Willowdale West',Grocery Store,Pizza Place,Discount Store,Coffee Shop,Pharmacy,Asian Restaurant,Athletics & Sports,Convenience Store,Cosmetics Shop,Cupcake Shop
8,'Don Mills North',Gym / Fitness Center,Caribbean Restaurant,Café,Baseball Field,Japanese Restaurant,Women's Store,Electronics Store,Cosmetics Shop,Cupcake Shop,Deli / Bodega
9,"'Flemingdon Park', 'Don Mills South'",Gym,Asian Restaurant,Coffee Shop,Beer Store,Grocery Store,Smoke Shop,Fast Food Restaurant,Italian Restaurant,Japanese Restaurant,Discount Store
10,"'Bathurst Manor', 'Downsview North', 'Wilson H...",Coffee Shop,Fried Chicken Joint,Supermarket,Frozen Yogurt Shop,Pharmacy,Pizza Place,Deli / Bodega,Restaurant,Bridal Shop,Sandwich Place
11,"'Northwood Park', 'York University'",Coffee Shop,Caribbean Restaurant,Miscellaneous Shop,Metro Station,Bar,Massage Studio,Falafel Restaurant,Discount Store,Cosmetics Shop,Cupcake Shop
13,'Downsview West',Grocery Store,Moving Target,Bank,Shopping Mall,Electronics Store,Convenience Store,Cosmetics Shop,Cupcake Shop,Deli / Bodega,Department Store


#### Cluster 4

In [64]:
northyork_merged.loc[(northyork_merged['Cluster Labels'] == 4), 
                     northyork_merged.columns[[2] + list(range(6, northyork_merged.shape[1]))]]

Unnamed: 0,Neighbourhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
22,"'Emery', 'Humberlea'",Baseball Field,Women's Store,Empanada Restaurant,Convenience Store,Cosmetics Shop,Cupcake Shop,Deli / Bodega,Department Store,Dessert Shop,Dim Sum Restaurant


# Conclusion

Those who interested in opening food truck, there is one point to consider cluster 3. Because cluster 3 has a lot of venues, that cluster 3 will be an interesting point for the tourist. So those who want to start their own business may consider this point.