# Capstone Project - The Battle of the Neighborhoods (Week 2)
### THU LE

## Table of contents
* [Introduction: Business Problem](#introduction)
* [Methodology](#methodology)
* [Data](#data)
* [Analysis](#analysis)
* [Results and Discussion](#results)
* [Conclusion](#conclusion)

## Introduction: Business Problem <a name="introduction"></a>

Toronto is a city of food lovers, offering a huge variety of cuisines at high standards. My client wants to promote Vietnamese cuisine in Toronto and is looking for a good neighborhood in which to start their business. They have not yet decided which borough to explore in depth. First, they would like a big-picture comparison of how dynamic the boroughs are, based on a few basic criteria of venue categories. Then, once they pick a borough, they would like to see how the neighborhoods in that borough compare in restaurant categories.

## Methodology <a name="methodology"></a>

Stage 1: List the top 5 boroughs in Toronto, considering their venue dynamics:
- number of venues
- variety of venue categories
- popularity of restaurants/bars/entertainment

Stage 2: After the client picks a borough, present a venue analysis for each of its neighborhoods.
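The three Stage 1 criteria can all be computed from a table with one row per venue. The following is a minimal sketch, not the notebook's method, and it assumes columns named `Borough` and `Venue Category` (as in `total_venues_borough` later). The keyword pattern that decides which categories count as restaurant/bar/entertainment is a hypothetical choice.

```python
import pandas as pd

# Hypothetical keyword pattern: categories containing any of these whole words
# count as restaurant/bar/entertainment venues
FOOD_DRINK_PATTERN = r"\b(?:Restaurant|Bar|Café|Pub|Theater)\b"

def borough_dynamics(venues: pd.DataFrame) -> pd.DataFrame:
    """Summarize number, variety and food/drink share of venues per borough."""
    flagged = venues.assign(
        food_drink=venues["Venue Category"].str.contains(FOOD_DRINK_PATTERN, regex=True)
    )
    summary = flagged.groupby("Borough").agg(
        number=("Venue Category", "size"),        # how many venues
        variety=("Venue Category", "nunique"),    # how many distinct categories
        food_drink_share=("food_drink", "mean"),  # share of restaurant/bar/entertainment venues
    )
    # Rank by number of venues first, then by variety
    return summary.sort_values(["number", "variety"], ascending=False)

# Toy data, not Foursquare results
venues = pd.DataFrame({
    "Borough": ["A", "A", "A", "B", "B"],
    "Venue Category": ["Café", "Wine Bar", "Park", "Park", "Barbershop"],
})
summary = borough_dynamics(venues)
```

The word boundaries keep "Barbershop" from being counted as a bar, while "Wine Bar" still matches.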


## Data <a name="data"></a>


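- Toronto boroughs and neighborhoods, scraped from the Wikipedia page "List of postal codes of Canada: M" and joined with a CSV of geospatial coordinates for each postal code (`Geospatial_Coordinates.csv`)
- The Foursquare API (`venues/explore` endpoint), used to retrieve venues near each neighborhood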

#### Importing libraries

In [None]:
import pandas as pd
from pandas import json_normalize  # pandas >= 1.0; pandas.io.json.json_normalize is deprecated
import numpy as np
import json
import matplotlib.cm as cm
import matplotlib.colors as colors
from geopy.geocoders import Nominatim
from bs4 import BeautifulSoup
import requests

## Analysis <a name="analysis"></a>


### Stage 1: List top 5 boroughs in Toronto with their venue dynamic

##### I. Prepare a dataframe of boroughs and neighborhoods in Toronto with geospatial coordinates

In [267]:
# Create soup object storing parsed data from web
url = 'https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'
data = requests.get(url).text
soup = BeautifulSoup(data, 'html.parser')

# Data preparation and cleaning: each <td> holds the postal code in <p>
# and "Borough(Neighborhood / Neighborhood ...)" in <span>
table_contents = []
table = soup.find('table')
for row in table.find_all('td'):
    if row.span.text == 'Not assigned':
        continue
    parts = row.span.text.split('(')
    table_contents.append({
        'PostalCode': row.p.text[:3],
        'Borough': parts[0],
        'Neighborhood': parts[1].strip(')').replace(' /', ',').replace(')', ' ').strip(' '),
    })
df = pd.DataFrame(table_contents)

# Fix borough names that were glued to extra text in the source table
df['Borough'] = df['Borough'].replace({
    'Downtown TorontoStn A PO Boxes25 The Esplanade': 'Downtown Toronto Stn A',
    'East TorontoBusiness reply mail Processing Centre969 Eastern': 'East Toronto Business',
    'EtobicokeNorthwest': 'Etobicoke Northwest',
    'East YorkEast Toronto': 'East York/East Toronto',
    'MississaugaCanada Post Gateway Processing Centre': 'Mississauga',
})

# Add latitude and longitude to corresponding postal code (local copy of the coordinates CSV)
geo_coords = pd.read_csv(r'C:\Users\user\Downloads\Geospatial_Coordinates.csv')
df = df.merge(geo_coords, left_on='PostalCode', right_on='Postal Code').drop('Postal Code', axis=1)
df

|     | PostalCode | Borough | Neighborhood | Latitude | Longitude |
|-----|------------|---------|--------------|----------|-----------|
| 0 | M3A | North York | Parkwoods | 43.753259 | -79.329656 |
| 1 | M4A | North York | Victoria Village | 43.725882 | -79.315572 |
| 2 | M5A | Downtown Toronto | Regent Park, Harbourfront | 43.654260 | -79.360636 |
| 3 | M6A | North York | Lawrence Manor, Lawrence Heights | 43.718518 | -79.464763 |
| 4 | M7A | Queen's Park | Ontario Provincial Government | 43.662301 | -79.389494 |
| ... | ... | ... | ... | ... | ... |
| 98 | M8X | Etobicoke | The Kingsway, Montgomery Road, Old Mill North | 43.653654 | -79.506944 |
| 99 | M4Y | Downtown Toronto | Church and Wellesley | 43.665860 | -79.383160 |
| 100 | M7Y | East Toronto Business | Enclave of M4L | 43.662744 | -79.321558 |
| 101 | M8Y | Etobicoke | Old Mill South, King's Mill Park, Sunnylea, Hu... | 43.636258 | -79.498509 |
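The cleaning step relies on each cell's `<span>` text having the shape `Borough(Neighborhood / Neighborhood)`. The sketch below isolates that parsing as a pure function so it can be checked without scraping. The function name `parse_borough_cell` is hypothetical, and the sample strings are illustrative assumptions about the page format.

```python
def parse_borough_cell(text):
    """Split 'Borough(Neighborhood A / Neighborhood B)' into (borough, 'Neighborhood A, Neighborhood B').

    Returns None for 'Not assigned' cells. Mirrors the split/strip/replace chain used above.
    """
    if text == "Not assigned":
        return None
    parts = text.split("(")
    borough = parts[0]
    if len(parts) < 2:
        return borough, ""  # assumption: cell has no parenthesized neighborhood list
    neighborhood = parts[1].strip(")").replace(" /", ",").replace(")", " ").strip(" ")
    return borough, neighborhood

print(parse_borough_cell("Downtown Toronto(Regent Park / Harbourfront)"))
# ('Downtown Toronto', 'Regent Park, Harbourfront')
```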


In [None]:
print('The dataframe has {} boroughs and {} neighborhoods'.format(len(df['Borough'].unique()),df.shape[0]))

##### II. Prepare a dataframe of boroughs and their venue dynamics

In [None]:
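import os

# Foursquare credentials: read from environment variables instead of hard-coding secrets in the notebook
CLIENT_ID = os.environ.get('FOURSQUARE_CLIENT_ID', 'YOUR_CLIENT_ID')
CLIENT_SECRET = os.environ.get('FOURSQUARE_CLIENT_SECRET', 'YOUR_CLIENT_SECRET')
VERSION = '20190505'  # API version date
LIMIT = 100  # maximum number of venues returned per request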

In [None]:
# Create a function that fetches venues near each neighborhood center
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    """Query the Foursquare explore endpoint around each (lat, lng); return one row per venue."""
    venues_list = []
    for name, lat, lng in zip(names, latitudes, longitudes):
        # build the API request parameters
        params = {
            'client_id': CLIENT_ID,
            'client_secret': CLIENT_SECRET,
            'v': VERSION,
            'll': '{},{}'.format(lat, lng),
            'radius': radius,
            'limit': LIMIT,
        }

        # make the GET request
        results = requests.get('https://api.foursquare.com/v2/venues/explore',
                               params=params).json()['response']['groups'][0]['items']

        # keep only the relevant information for each nearby venue (skip venues without a category)
        venues_list.extend(
            (name, lat, lng,
             v['venue']['name'],
             v['venue']['location']['lat'],
             v['venue']['location']['lng'],
             v['venue']['categories'][0]['name'])
            for v in results if v['venue'].get('categories'))

    nearby_venues = pd.DataFrame(venues_list, columns=['Neighborhood',
                                                       'Neighborhood Latitude',
                                                       'Neighborhood Longitude',
                                                       'Venue',
                                                       'Venue Latitude',
                                                       'Venue Longitude',
                                                       'Venue Category'])
    return nearby_venues
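`getNearbyVenues` reads `response → groups[0] → items` from each explore response. The sketch below isolates that flattening step so it can be checked without network access. The `sample` dict is a hand-made stand-in that contains only the fields the function reads; it is not real Foursquare output. The first venue is taken from the West Toronto table later in this notebook, and the uncategorized second one is hypothetical. Skipping venues whose `categories` list is empty is an added robustness assumption.

```python
def flatten_explore_items(neighborhood, lat, lng, payload):
    """Turn one explore-style JSON payload into rows for the venues DataFrame."""
    rows = []
    for item in payload["response"]["groups"][0]["items"]:
        venue = item["venue"]
        if not venue.get("categories"):
            continue  # no category, so nothing to analyze
        rows.append((
            neighborhood, lat, lng,
            venue["name"],
            venue["location"]["lat"],
            venue["location"]["lng"],
            venue["categories"][0]["name"],  # primary category only
        ))
    return rows

sample = {"response": {"groups": [{"items": [
    {"venue": {"name": "Parallel", "location": {"lat": 43.669516, "lng": -79.438728},
               "categories": [{"name": "Middle Eastern Restaurant"}]}},
    {"venue": {"name": "Unlabeled spot", "location": {"lat": 43.67, "lng": -79.44},
               "categories": []}},
]}]}}
rows = flatten_explore_items("Dufferin, Dovercourt Village", 43.669005, -79.442259, sample)
```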

In [None]:
# Run the function to get venues within a 500 m radius of every neighborhood in Toronto
total_venues = getNearbyVenues(df['Neighborhood'], df['Latitude'], df['Longitude'])

In [None]:
print(total_venues.shape)
total_venues.head()

In [None]:
# Add borough back for further analysis on borough
df_to_merge = df[['Borough','Neighborhood']]
total_venues_borough = total_venues.merge(df_to_merge, on = 'Neighborhood')
total_venues_borough.head()
total_venues_borough.groupby('Borough').count()
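Merging on `Neighborhood` alone assumes that each neighborhood name maps to a single borough. If a name repeated, venue rows would be silently duplicated. The `validate='many_to_one'` argument of pandas' `merge` turns that assumption into a check. Here is a toy sketch with made-up frames:

```python
import pandas as pd

venues = pd.DataFrame({"Neighborhood": ["Christie", "Christie", "Berczy Park"],
                       "Venue": ["v1", "v2", "v3"]})
lookup = pd.DataFrame({"Neighborhood": ["Christie", "Berczy Park"],
                       "Borough": ["Downtown Toronto", "Downtown Toronto"]})

# Passes: each neighborhood appears once in the lookup table
merged = venues.merge(lookup, on="Neighborhood", validate="many_to_one")

# A repeated name on the right-hand side raises MergeError instead of duplicating rows
bad_lookup = pd.concat([lookup, pd.DataFrame({"Neighborhood": ["Christie"], "Borough": ["Other"]})])
try:
    venues.merge(bad_lookup, on="Neighborhood", validate="many_to_one")
    duplicate_detected = False
except pd.errors.MergeError:
    duplicate_detected = True
```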

In [197]:
# one hot encoding
borough_onehot = pd.get_dummies(total_venues_borough[['Venue Category']], prefix="", prefix_sep="")

# add borough column back to dataframe
borough_onehot['Borough'] = total_venues_borough['Borough'] 

# move borough column to the first column
fixed_columns = [borough_onehot.columns[-1]] + list(borough_onehot.columns[:-1])
borough_onehot = borough_onehot[fixed_columns]

borough_onehot.head()

Unnamed: 0,Borough,Accessories Store,Adult Boutique,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,...,Truck Stop,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Warehouse Store,Wine Bar,Wings Joint,Women's Store,Yoga Studio
0,North York,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,North York,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,North York,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,North York,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,North York,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [198]:
borough_grouped = borough_onehot.groupby('Borough').mean().reset_index()
borough_grouped.head()

Unnamed: 0,Borough,Accessories Store,Adult Boutique,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,...,Truck Stop,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Warehouse Store,Wine Bar,Wings Joint,Women's Store,Yoga Studio
0,Central Toronto,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.009524,...,0.0,0.0,0.0,0.0,0.009524,0.0,0.0,0.0,0.0,0.009524
1,Downtown Toronto,0.0,0.000909,0.000909,0.000909,0.000909,0.001818,0.002727,0.001818,0.011818,...,0.0,0.010909,0.001818,0.0,0.003636,0.0,0.007273,0.0,0.0,0.004545
2,Downtown Toronto Stn A,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01
3,East Toronto,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.029126,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.019417
4,East Toronto Business,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.058824
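Averaging the one-hot columns per borough gives the share of that borough's venues in each category, so each row of `borough_grouped` sums to 1. The toy sketch below shows this with made-up venues. The `.astype(int)` call is there because recent pandas versions return boolean dummy columns.

```python
import pandas as pd

venues = pd.DataFrame({
    "Borough": ["East", "East", "East", "West"],
    "Venue Category": ["Café", "Café", "Park", "Bar"],
})
# Column names become the bare category names, as in the notebook
onehot = pd.get_dummies(venues[["Venue Category"]], prefix="", prefix_sep="").astype(int)
onehot["Borough"] = venues["Borough"]

# Mean of 0/1 columns per borough = fraction of that borough's venues in each category
freq = onehot.groupby("Borough").mean()
```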


In [207]:
# Add number of venues to dataframe: each one-hot row has exactly one 1,
# so the row sum over all category columns equals the borough's venue count
borough_grouped_venues_sum = borough_onehot.groupby('Borough').sum()
borough_grouped_venues_sum['Total venues'] = borough_grouped_venues_sum.sum(axis=1)
borough_grouped_venues_sum = borough_grouped_venues_sum[['Total venues']].reset_index()
borough_grouped = borough_grouped.merge(borough_grouped_venues_sum, on='Borough')


In [209]:
# Sort boroughs from the one with highest number of venues
borough_grouped= borough_grouped.sort_values('Total venues', ascending = False )
borough_grouped.head()

Unnamed: 0,Borough,Accessories Store,Adult Boutique,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,...,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Warehouse Store,Wine Bar,Wings Joint,Women's Store,Yoga Studio,Total venues
1,Downtown Toronto,0.0,0.000909,0.000909,0.000909,0.000909,0.001818,0.002727,0.001818,0.011818,...,0.010909,0.001818,0.0,0.003636,0.0,0.007273,0.0,0.0,0.004545,2200
10,North York,0.00431,0.0,0.00431,0.0,0.0,0.0,0.0,0.0,0.008621,...,0.0,0.00431,0.0,0.008621,0.0,0.0,0.0,0.008621,0.0,464
13,West Toronto,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.019355,0.0,0.0,0.012903,0.0,0.006452,0.0,0.0,0.012903,310
0,Central Toronto,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.009524,...,0.0,0.0,0.0,0.009524,0.0,0.0,0.0,0.0,0.009524,210
3,East Toronto,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.029126,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.019417,206
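Every venue row has exactly one `1` across the one-hot columns. As a result, the per-borough row sum equals the number of venue rows, which `groupby(...).size()` returns directly. Here is a toy check of that equivalence on made-up data:

```python
import pandas as pd

venues = pd.DataFrame({
    "Borough": ["East", "East", "East", "West"],
    "Venue Category": ["Café", "Café", "Park", "Bar"],
})
onehot = pd.get_dummies(venues["Venue Category"]).astype(int)
onehot["Borough"] = venues["Borough"]

# Sum the one-hot columns per borough, then across categories
from_onehot = onehot.groupby("Borough").sum().sum(axis=1)
# Count venue rows per borough directly
from_size = venues.groupby("Borough").size()
```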


In [211]:
num_top_venues = 10

for bo in borough_grouped['Borough']:
    print("----"+bo+"----")
    
    temp = borough_grouped[borough_grouped['Borough'] == bo].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')


----Downtown Toronto----
                 venue     freq
0         Total venues  2200.00
1          Coffee Shop     0.10
2                 Café     0.06
3                Hotel     0.03
4  Japanese Restaurant     0.03
5           Restaurant     0.03
6                 Park     0.02
7       Clothing Store     0.02
8   Seafood Restaurant     0.02
9               Bakery     0.02


----North York----
                  venue    freq
0          Total venues  464.00
1           Coffee Shop    0.08
2        Clothing Store    0.05
3            Restaurant    0.04
4           Pizza Place    0.04
5  Fast Food Restaurant    0.03
6                  Café    0.03
7                  Bank    0.03
8                  Park    0.03
9         Grocery Store    0.03


----West Toronto----
                venue    freq
0        Total venues  310.00
1                Café    0.07
2                 Bar    0.06
3         Coffee Shop    0.05
4  Italian Restaurant    0.04
5          Restaurant    0.04
6      Breakfast 
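The loop above transposes each borough row, so the `Total venues` count sorts first among the category frequencies. The sketch below ranks only the category columns by using `Series.nlargest`. The function name `top_categories` is hypothetical. The frame is a trimmed stand-in that holds a few of the West Toronto values printed above, not the full `borough_grouped`.

```python
import pandas as pd

def top_categories(grouped, borough, n=3, exclude=("Borough", "Total venues")):
    """Return the n most frequent categories for one borough, skipping non-frequency columns."""
    row = grouped.loc[grouped["Borough"] == borough].iloc[0]
    freqs = row.drop(labels=list(exclude)).astype(float)
    return freqs.nlargest(n).round(2)

grouped = pd.DataFrame({
    "Borough": ["West Toronto"],
    "Bar": [0.06], "Café": [0.07], "Coffee Shop": [0.05], "Park": [0.01],
    "Total venues": [310],
})
top = top_categories(grouped, "West Toronto")
```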

#### Stage 1 Conclusion
The client picked Downtown Toronto and West Toronto for deeper neighborhood exploration.
Downtown Toronto has by far the most venues (2,200), and Japanese restaurants are among its most common venue types, showing people's interest in Asian-style cuisine.
West Toronto (310 venues, third after North York) is less crowded than Downtown Toronto, but its top 5 most common venues are all in the food/drink category.
Business costs could also be lower there if the client decides to start small.

### Stage 2: Present venue dynamics for neighborhoods in Downtown Toronto and West Toronto

In [232]:
downtown_venues = total_venues_borough[total_venues_borough.Borough.isin( ['Downtown Toronto'])]
west_venues = total_venues_borough[total_venues_borough.Borough.isin( ['West Toronto'])]


#### Downtown Toronto

In [233]:
downtown_onehot = pd.get_dummies(downtown_venues[['Venue Category']], prefix="", prefix_sep="")
downtown_onehot['Neighborhood'] = downtown_venues['Neighborhood'] 
downtown_grouped = downtown_onehot.groupby('Neighborhood').mean().reset_index()
downtown_grouped

Unnamed: 0,Neighborhood,Adult Boutique,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Antique Shop,...,Thai Restaurant,Theater,Theme Restaurant,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Wine Bar,Yoga Studio
0,Berczy Park,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.016667,0.0,0.0,0.0,0.0,0.016667,0.0,0.0,0.0,0.0
1,"CN Tower, King and Spadina, Railway Lands, Har...",0.0,0.066667,0.066667,0.066667,0.133333,0.2,0.133333,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,Central Bay Street,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.016129,0.0,0.0,0.0,0.0,0.016129,0.0,0.0,0.016129,0.016129
3,Christie,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Church and Wellesley,0.013158,0.0,0.0,0.0,0.0,0.0,0.0,0.013158,0.0,...,0.013158,0.013158,0.013158,0.0,0.0,0.0,0.0,0.0,0.0,0.026316
5,"Commerce Court, Victoria Hotel",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03,0.0,...,0.02,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.01,0.0
6,"First Canadian Place, Underground city",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03,0.0,...,0.02,0.01,0.0,0.0,0.01,0.01,0.0,0.0,0.01,0.0
7,"Garden District, Ryerson",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.01,0.02,0.0,0.0,0.0,0.0,0.01,0.01,0.01,0.0
8,"Harbourfront East, Union Station, Toronto Islands",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.01,0.0,0.0,0.01,0.01,0.0,0.0,0.01,0.0
9,"Kensington Market, Chinatown, Grange Park",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.047619,0.0,0.047619,0.015873,0.0


In [260]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        # 4th onward use the "th" suffix
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = downtown_grouped['Neighborhood']

for ind in np.arange(downtown_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(downtown_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted

|     | Neighborhood | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|-----|--------------|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|
| 0 | Berczy Park | Coffee Shop | Cocktail Bar | Bakery | Beer Bar | Restaurant | Pharmacy | Cheese Shop | Seafood Restaurant | Farmers Market | Creperie |
| 1 | CN Tower, King and Spadina, Railway Lands, Har... | Airport Service | Airport Lounge | Airport Terminal | Coffee Shop | Harbor / Marina | Rental Car Location | Sculpture Garden | Boat or Ferry | Airport | Airport Gate |
| 2 | Central Bay Street | Coffee Shop | Sandwich Place | Italian Restaurant | Café | Burger Joint | Bubble Tea Shop | Salad Place | Japanese Restaurant | Spa | Sushi Restaurant |
| 3 | Christie | Grocery Store | Café | Park | Athletics & Sports | Italian Restaurant | Baby Store | Candy Store | Restaurant | Nightclub | Coffee Shop |
| 4 | Church and Wellesley | Coffee Shop | Sushi Restaurant | Japanese Restaurant | Restaurant | Gay Bar | Fast Food Restaurant | Hotel | Mediterranean Restaurant | Men's Store | Pub |
| 5 | Commerce Court, Victoria Hotel | Coffee Shop | Restaurant | Café | Hotel | Gym | Italian Restaurant | Cocktail Bar | Deli / Bodega | Seafood Restaurant | American Restaurant |
| 6 | First Canadian Place, Underground city | Coffee Shop | Café | Hotel | Gym | Japanese Restaurant | Restaurant | Asian Restaurant | Salad Place | Deli / Bodega | Steakhouse |
| 7 | Garden District, Ryerson | Clothing Store | Coffee Shop | Cosmetics Shop | Bubble Tea Shop | Japanese Restaurant | Middle Eastern Restaurant | Café | Italian Restaurant | Fast Food Restaurant | Pizza Place |
| 8 | Harbourfront East, Union Station, Toronto Islands | Coffee Shop | Aquarium | Café | Hotel | Brewery | Scenic Lookout | Sporting Goods Shop | Pizza Place | Restaurant | Italian Restaurant |
| 9 | Kensington Market, Chinatown, Grange Park | Café | Coffee Shop | Vietnamese Restaurant | Vegetarian / Vegan Restaurant | Bar | Gaming Cafe | Mexican Restaurant | Dessert Shop | Park | Farmers Market |
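`return_most_common_venues` sorts one row at a time. The same kind of table can be built in a single pass with NumPy's `argsort`. The sketch below assumes that every column except `Neighborhood` holds a frequency. The function name and the `Top 1`… column labels are hypothetical. A stable sort on the negated values breaks ties by column order, which may differ from `sort_values`.

```python
import numpy as np
import pandas as pd

def top_venues_table(grouped, n=3):
    """Return one row per neighborhood listing its n most common categories."""
    freqs = grouped.drop(columns="Neighborhood")
    # Sort each row in descending order (negate, stable argsort) and keep the first n column positions
    order = np.argsort(-freqs.to_numpy(dtype=float), axis=1, kind="stable")[:, :n]
    names = freqs.columns.to_numpy()[order]  # map positions to category names
    table = pd.DataFrame(names, columns=[f"Top {i + 1}" for i in range(n)])
    table.insert(0, "Neighborhood", grouped["Neighborhood"].to_numpy())
    return table

# Toy frequencies, not notebook results
grouped = pd.DataFrame({
    "Neighborhood": ["A", "B"],
    "Bar": [0.5, 0.1], "Café": [0.3, 0.6], "Park": [0.2, 0.3],
})
table = top_venues_table(grouped)
```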


##### Run k-means to cluster the neighborhoods into 5 clusters

In [265]:
from sklearn.cluster import KMeans
# set number of clusters
kclusters = 5

downtown_grouped_clustering = downtown_grouped.drop(columns='Neighborhood')

# run k-means clustering (n_init=10 matches the older scikit-learn default)
kmeans = KMeans(n_clusters=kclusters, random_state=0, n_init=10).fit(downtown_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([0, 3, 0, 2, 0, 0, 0, 0, 0, 0])
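`KMeans` alternates two steps. It assigns each neighborhood's frequency vector to the nearest centroid, then moves each centroid to the mean of its members. The bare NumPy sketch below shows that loop (Lloyd's algorithm) on toy 2-D points. For simplicity it uses the first `k` rows as the initial centroids, not scikit-learn's k-means++ initialization, so its labels need not match `kmeans.labels_`.

```python
import numpy as np

def lloyd_kmeans(X, k, n_iter=100):
    """Plain Lloyd iterations; returns (labels, centroids)."""
    X = np.asarray(X, dtype=float)
    centroids = X[:k].copy()  # naive initialization: first k rows
    for _ in range(n_iter):
        # squared Euclidean distance from every point to every centroid
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # move each centroid to the mean of its points (keep it if the cluster is empty)
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

points = np.array([[0.0, 0.0], [10.0, 10.0], [0.1, 0.2], [9.8, 10.1], [0.2, 0.1]])
labels, centroids = lloyd_kmeans(points, k=2)
```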

In [266]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

downtown_merged = downtown

# merge manhattan_grouped with manhattan_data to add latitude/longitude for each neighborhood
downtown_merged = downtown_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')


##### Visualize clusters

In [242]:
import folium

address = 'Downtown Toronto, Toronto'

geolocator = Nominatim(user_agent="toronto_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude

# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters: one rainbow color per cluster label 0..kclusters-1
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add one marker per neighborhood (downtown_merged has one row per venue)
neighborhood_points = downtown_merged.drop_duplicates('Neighborhood')
for lat, lon, poi, cluster in zip(neighborhood_points['Neighborhood Latitude'],
                                  neighborhood_points['Neighborhood Longitude'],
                                  neighborhood_points['Neighborhood'],
                                  neighborhood_points['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[int(cluster)],
        fill=True,
        fill_color=rainbow[int(cluster)],
        fill_opacity=0.7).add_to(map_clusters)

map_clusters
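The color step only needs one distinct hex color per cluster label, indexed directly by the label (0 to k-1). The stdlib-only sketch below uses `colorsys` to space hues evenly. This swaps in `colorsys` for the matplotlib rainbow colormap used above, so the exact colors differ. The function name `cluster_colors` and the saturation/value constants are hypothetical choices.

```python
import colorsys

def cluster_colors(k):
    """Return k hex colors with evenly spaced hues, one per cluster label 0..k-1."""
    palette = []
    for i in range(k):
        r, g, b = colorsys.hsv_to_rgb(i / k, 0.85, 0.9)  # hue steps of 1/k
        palette.append("#{:02x}{:02x}{:02x}".format(round(r * 255), round(g * 255), round(b * 255)))
    return palette

palette = cluster_colors(5)
color_for_cluster_3 = palette[3]  # index by label directly, no "- 1" offset
```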

#### West Toronto

In [254]:
west_onehot = pd.get_dummies(west_venues[['Venue Category']], prefix="", prefix_sep="")
west_onehot['Neighborhood'] = west_venues['Neighborhood'] 
west_grouped = west_onehot.groupby('Neighborhood').mean().reset_index()
west_grouped

Unnamed: 0,Neighborhood,Antique Shop,Art Gallery,Arts & Crafts Store,Asian Restaurant,Bakery,Bank,Bar,Beer Store,Bookstore,...,Speakeasy,Stadium,Supermarket,Sushi Restaurant,Thai Restaurant,Theater,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Wine Bar,Yoga Studio
0,"Brockton, Parkdale Village, Exhibition Place",0.0,0.0,0.0,0.0,0.086957,0.0,0.043478,0.0,0.0,...,0.0,0.043478,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,"Dufferin, Dovercourt Village",0.0,0.0,0.0,0.0,0.133333,0.066667,0.066667,0.0,0.0,...,0.0,0.0,0.066667,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,"High Park, The Junction South",0.04,0.0,0.04,0.0,0.04,0.0,0.04,0.0,0.04,...,0.04,0.0,0.0,0.0,0.08,0.0,0.0,0.0,0.0,0.0
3,"Little Portugal, Trinity",0.0,0.023256,0.0,0.046512,0.0,0.0,0.093023,0.023256,0.0,...,0.0,0.0,0.0,0.0,0.0,0.023256,0.046512,0.046512,0.023256,0.023256
4,"Parkdale, Roncesvalles",0.0,0.0,0.0,0.0,0.0,0.0,0.071429,0.0,0.071429,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,"Runnymede, Swansea",0.0,0.0,0.0,0.0,0.0,0.028571,0.028571,0.0,0.028571,...,0.0,0.0,0.0,0.057143,0.0,0.0,0.028571,0.0,0.0,0.028571


In [259]:
# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        # 4th onward use the "th" suffix
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = west_grouped['Neighborhood']

for ind in np.arange(west_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(west_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted

|     | Neighborhood | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|-----|--------------|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|
| 0 | Brockton, Parkdale Village, Exhibition Place | Café | Bakery | Breakfast Spot | Coffee Shop | Gym | Intersection | Grocery Store | Furniture / Home Store | Nightclub | Performing Arts Venue |
| 1 | Dufferin, Dovercourt Village | Pharmacy | Bakery | Liquor Store | Park | Music Venue | Middle Eastern Restaurant | Furniture / Home Store | Café | Brewery | Supermarket |
| 2 | High Park, The Junction South | Mexican Restaurant | Thai Restaurant | Café | Antique Shop | Speakeasy | Italian Restaurant | Grocery Store | Music Venue | Gastropub | Furniture / Home Store |
| 3 | Little Portugal, Trinity | Bar | Restaurant | Café | Men's Store | Asian Restaurant | Vietnamese Restaurant | Vegetarian / Vegan Restaurant | Coffee Shop | Portuguese Restaurant | Pizza Place |
| 4 | Parkdale, Roncesvalles | Gift Shop | Breakfast Spot | Dog Run | Cuban Restaurant | Coffee Shop | Movie Theater | Eastern European Restaurant | Bookstore | Bar | Italian Restaurant |
| 5 | Runnymede, Swansea | Café | Sushi Restaurant | Coffee Shop | Pub | Pizza Place | Italian Restaurant | Gym | Health Food Store | Falafel Restaurant | Fish & Chips Shop |


In [256]:
from sklearn.cluster import KMeans
# set number of clusters
kclusters = 3

west_grouped_clustering = west_grouped.drop(columns='Neighborhood')

# run k-means clustering (n_init=10 matches the older scikit-learn default)
kmeans = KMeans(n_clusters=kclusters, random_state=0, n_init=10).fit(west_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([2, 1, 2, 2, 0, 2])

In [257]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

west_merged = west_venues

# join cluster labels and top venues onto each venue row (which already carries its neighborhood's latitude/longitude)
west_merged = west_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')
west_merged

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
685,"Dufferin, Dovercourt Village",43.669005,-79.442259,The Greater Good Bar,43.669409,-79.439267,Bar,West Toronto,1,Pharmacy,Bakery,Liquor Store,Park,Music Venue,Middle Eastern Restaurant,Furniture / Home Store,Café,Brewery,Supermarket
686,"Dufferin, Dovercourt Village",43.669005,-79.442259,Parallel,43.669516,-79.438728,Middle Eastern Restaurant,West Toronto,1,Pharmacy,Bakery,Liquor Store,Park,Music Venue,Middle Eastern Restaurant,Furniture / Home Store,Café,Brewery,Supermarket
687,"Dufferin, Dovercourt Village",43.669005,-79.442259,Happy Bakery & Pastries,43.667050,-79.441791,Bakery,West Toronto,1,Pharmacy,Bakery,Liquor Store,Park,Music Venue,Middle Eastern Restaurant,Furniture / Home Store,Café,Brewery,Supermarket
688,"Dufferin, Dovercourt Village",43.669005,-79.442259,Blood Brothers Brewing,43.669944,-79.436533,Brewery,West Toronto,1,Pharmacy,Bakery,Liquor Store,Park,Music Venue,Middle Eastern Restaurant,Furniture / Home Store,Café,Brewery,Supermarket
689,"Dufferin, Dovercourt Village",43.669005,-79.442259,FreshCo,43.667918,-79.440754,Grocery Store,West Toronto,1,Pharmacy,Bakery,Liquor Store,Park,Music Venue,Middle Eastern Restaurant,Furniture / Home Store,Café,Brewery,Supermarket
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1592,"Runnymede, Swansea",43.651571,-79.484450,Cards 'N' Such,43.650497,-79.480778,Post Office,West Toronto,2,Café,Sushi Restaurant,Coffee Shop,Pub,Pizza Place,Italian Restaurant,Gym,Health Food Store,Falafel Restaurant,Fish & Chips Shop
1593,"Runnymede, Swansea",43.651571,-79.484450,West End Mamas,43.648703,-79.484919,Health Food Store,West Toronto,2,Café,Sushi Restaurant,Coffee Shop,Pub,Pizza Place,Italian Restaurant,Gym,Health Food Store,Falafel Restaurant,Fish & Chips Shop
1594,"Runnymede, Swansea",43.651571,-79.484450,Kingsway Meat Products & Deli,43.650299,-79.480827,Butcher,West Toronto,2,Café,Sushi Restaurant,Coffee Shop,Pub,Pizza Place,Italian Restaurant,Gym,Health Food Store,Falafel Restaurant,Fish & Chips Shop
1595,"Runnymede, Swansea",43.651571,-79.484450,(The New) Moksha Yoga Bloor West,43.648658,-79.485242,Yoga Studio,West Toronto,2,Café,Sushi Restaurant,Coffee Shop,Pub,Pizza Place,Italian Restaurant,Gym,Health Food Store,Falafel Restaurant,Fish & Chips Shop


In [258]:
import folium

address = 'West Toronto, Toronto'

geolocator = Nominatim(user_agent="toronto_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude

# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters: one rainbow color per cluster label 0..kclusters-1
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add one marker per neighborhood (west_merged has one row per venue)
neighborhood_points = west_merged.drop_duplicates('Neighborhood')
for lat, lon, poi, cluster in zip(neighborhood_points['Neighborhood Latitude'],
                                  neighborhood_points['Neighborhood Longitude'],
                                  neighborhood_points['Neighborhood'],
                                  neighborhood_points['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[int(cluster)],
        fill=True,
        fill_color=rainbow[int(cluster)],
        fill_opacity=0.7).add_to(map_clusters)

map_clusters

#### Stage 2 Conclusion


For Downtown Toronto, the client could pick neighborhoods in cluster 0, as they share a similar character: they are popular for dining and drink venues.\
For West Toronto, the restaurant-oriented cluster is cluster 2.\
In the top 10 most common venues tables, some neighborhoods have restaurants as 2 of their 3 most common venues (for example, Church and Wellesley, and High Park, The Junction South). These are good locations to start a restaurant business.\
The tables also show where Vietnamese restaurants are common (Kensington Market, Chinatown, Grange Park, and Little Portugal, Trinity). The client can pick one of these neighborhoods to leverage existing customer traffic, or choose another location that does not yet have a Vietnamese competitor.
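The two selection rules in this conclusion can be expressed as filters on the top-venues tables. The sketch below uses a hand-copied subset of rows from the tables above, with the top 3 columns only. It treats any category containing `Restaurant` as a restaurant, which is an assumption.

```python
import pandas as pd

top3 = pd.DataFrame({
    "Neighborhood": ["Church and Wellesley", "High Park, The Junction South",
                     "Kensington Market, Chinatown, Grange Park", "Christie"],
    "1st Most Common Venue": ["Coffee Shop", "Mexican Restaurant", "Café", "Grocery Store"],
    "2nd Most Common Venue": ["Sushi Restaurant", "Thai Restaurant", "Coffee Shop", "Café"],
    "3rd Most Common Venue": ["Japanese Restaurant", "Café", "Vietnamese Restaurant", "Park"],
})
venue_cols = [c for c in top3.columns if c != "Neighborhood"]

# Rule 1: at least 2 of the top 3 venues are restaurants
restaurant_count = top3[venue_cols].apply(lambda col: col.str.contains("Restaurant")).sum(axis=1)
restaurant_heavy = top3.loc[restaurant_count >= 2, "Neighborhood"].tolist()

# Rule 2: a Vietnamese restaurant already ranks among the top venues
has_vietnamese = top3[venue_cols].eq("Vietnamese Restaurant").any(axis=1)
vietnamese_present = top3.loc[has_vietnamese, "Neighborhood"].tolist()
```

On the full tables, `venue_cols` would simply include all ten "Most Common Venue" columns.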