# Battle of Neighborhood

## Introduction
In this project, we plan to build a new gym in the city of Toronto. The goal of the project is to find the best location for this gym.

Since most people want to take care of their health, they want to do more physical exercise. Some people work out at home and others go to the gym. We therefore need to know which neighborhood in Toronto is the best place to build the new gym.

## Business Problem
In this project, we look for the best neighborhood in Toronto to build our new gym. We can imagine that the chain Gold’s Gym wants to open a gym in Toronto. We then need to consider the following questions when choosing the neighborhood.

* On what basis can we decide on the gym’s location?
* Where are people most likely to take out a subscription to this gym?
* If there is already a gym in the neighborhood, is it a good idea to build our gym there?


## Data
In this project, we analyse the city of Toronto.

Dataset 1: In order to segment and explore the neighborhoods, we need a dataset containing the postal codes of Toronto. We also need the coordinates (latitude and longitude) of each postal code, which are contained in a csv-file provided by Coursera.

The list of postal codes is freely available on the website: https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M


Dataset 2: We use the Foursquare location data to get the venues in each neighborhood. Most people like to go to the gym before work, after work or during the lunch break. The venues of interest are therefore workplaces and supermarkets. Since these venues are busy, people are more likely to go to the gym if it is near their workplace or on the way to it. Likewise, if the gym is near a supermarket, people can go to the gym when they go shopping.

Link to the dataset: https://developer.foursquare.com/docs/data

## Import libraries
Before our analysis, we need to import the libraries for data analysis, in particular ``folium`` to display maps of Toronto.

In [1]:
# import libraries
import numpy as np
import pandas as pd
import folium
import requests
from bs4 import BeautifulSoup

## Import and preprocessing data

As mentioned above, we import the data from the web.

In [2]:
# import data
url = 'https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'
page = requests.get(url)
soup = BeautifulSoup(page.content,'lxml')
table = soup.find_all('table')[0] 
df = pd.read_html(str(table))[0]
print('data dimension:', df.shape)
df.head()

data dimension: (287, 3)


Unnamed: 0,Postcode,Borough,Neighbourhood
0,M1A,Not assigned,Not assigned
1,M2A,Not assigned,Not assigned
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,Harbourfront


We observe that some postcodes have no assigned borough or neighbourhood; the missing value is marked as 'Not assigned'. The first step is to remove the rows whose borough is not assigned for the given postcode, and then to reindex the data.

In [17]:
# remove not assigned row
df_prep = df[df.Borough != 'Not assigned']
df_prep.reset_index(inplace=True)
print('data dimension:', df_prep.shape)
df_prep.head()

data dimension: (210, 4)


Unnamed: 0,index,Postcode,Borough,Neighbourhood
0,2,M3A,North York,Parkwoods
1,3,M4A,North York,Victoria Village
2,4,M5A,Downtown Toronto,Harbourfront
3,5,M6A,North York,Lawrence Heights
4,6,M6A,North York,Lawrence Manor


In [18]:
# drop the column index
df_clean = df_prep.drop(['index'], axis=1)
print('data dimension:', df_clean.shape)
df_clean.head()

data dimension: (210, 3)


Unnamed: 0,Postcode,Borough,Neighbourhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,Harbourfront
3,M6A,North York,Lawrence Heights
4,M6A,North York,Lawrence Manor


We observe that we have redundant rows. For example, there are two rows for the postcode M6A, because one postcode can cover two or more neighbourhoods. The next step is therefore to combine all rows with the same borough and postcode into a single row. The feature Neighbourhood then contains the list of all neighbourhoods for that borough and postcode.

In [21]:
# combine neighbourhood
for p in np.unique(df_clean.Postcode):
    df_clean.loc[df_clean['Postcode'] == p,'Neighbourhood'] = df_clean.loc[df_clean['Postcode'] == p,'Neighbourhood'].str.cat(sep=',')
    
# drop duplicate    
df_clean.drop_duplicates('Postcode', inplace=True)
print('Data dimension:', df_clean.shape)
df_clean.head()

Data dimension: (103, 3)


Unnamed: 0,Postcode,Borough,Neighbourhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,Harbourfront
3,M6A,North York,"Lawrence Heights,Lawrence Manor,Lawrence Heigh..."
5,M7A,Downtown Toronto,Queen's Park


With this transformation, each postcode has exactly one row, with the list of its neighbourhoods.
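The same combination can also be written as a single `groupby`. The following is a minimal sketch on a small hand-made frame (the three rows are illustrative, copied from the preview above); it is an alternative formulation, not the code that produced the outputs in this notebook.

```python
import pandas as pd

# Toy frame mimicking the cleaned table (illustrative rows only)
toy = pd.DataFrame({
    'Postcode': ['M3A', 'M6A', 'M6A'],
    'Borough': ['North York', 'North York', 'North York'],
    'Neighbourhood': ['Parkwoods', 'Lawrence Heights', 'Lawrence Manor'],
})

# One row per (Postcode, Borough); neighbourhood names joined with commas
combined = (
    toy.groupby(['Postcode', 'Borough'], sort=False)['Neighbourhood']
       .agg(','.join)
       .reset_index()
)
print(combined)
```

Because this builds a new frame from the uncombined rows instead of overwriting the column in place, running it again gives the same result.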

The next step is to add geographical features to our dataset. We import the csv-file ``Geospatial_Coordinates.csv`` to obtain the coordinates of each postal code.

In [22]:
# import new csv 
coord = pd.read_csv('Geospatial_Coordinates.csv')
coord.head()

Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


We observe that the coordinate dataset also contains the postal code as a feature. We can therefore merge the two datasets on the postal code.

In [23]:
# merge dataset
df_clean.rename({'Postcode': 'Postal Code'}, inplace=True, axis=1)

result = pd.merge(df_clean, coord, on='Postal Code')
result.head()

Unnamed: 0,Postal Code,Borough,Neighbourhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.753259,-79.329656
1,M4A,North York,Victoria Village,43.725882,-79.315572
2,M5A,Downtown Toronto,Harbourfront,43.65426,-79.360636
3,M6A,North York,"Lawrence Heights,Lawrence Manor,Lawrence Heigh...",43.718518,-79.464763
4,M7A,Downtown Toronto,Queen's Park,43.662301,-79.389494


The dataframe ``result`` is the outcome of the first step of the data processing.
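By default `pd.merge` performs an inner join, so a postal code without coordinates would be dropped silently. As a hedged sketch of how one could check for that, the example below uses the `indicator` and `validate` options of `pd.merge` on made-up inputs (`M9Z` is an invented code that has no coordinates).

```python
import pandas as pd

# Toy inputs (illustrative values); 'M9Z' is a made-up code with no coordinates
left = pd.DataFrame({'Postal Code': ['M3A', 'M4A', 'M9Z'],
                     'Borough': ['North York', 'North York', 'Example']})
right = pd.DataFrame({'Postal Code': ['M3A', 'M4A'],
                      'Latitude': [43.753259, 43.725882],
                      'Longitude': [-79.329656, -79.315572]})

# how='left' keeps every postal code; indicator=True adds a '_merge' column;
# validate raises if a postal code appears more than once on either side
checked = pd.merge(left, right, on='Postal Code', how='left',
                   validate='one_to_one', indicator=True)

# Postal codes that found no coordinates
unmatched = checked.loc[checked['_merge'] == 'left_only', 'Postal Code'].tolist()
print(unmatched)
```

An empty `unmatched` list would indicate that every postal code received coordinates.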

In [None]:
# save data
result.to_csv('result.csv')

In [2]:
# load data
result = pd.read_csv('result.csv')

### Foursquare
Now that we have built our dataset containing Toronto's neighborhoods with their geographical features, the next step is to add venue features, such as gyms or restaurants, from the Foursquare API.

From this API, we extract all venues and add them to the features. We need to enter a client ID and a client secret to get access to the API.
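A common alternative to writing credentials directly into a notebook is to read them from environment variables. The sketch below assumes two hypothetical variable names, `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET`, and two hypothetical helpers; none of them are used elsewhere in this notebook.

```python
import os

def load_foursquare_credentials():
    """Read credentials from environment variables (names are hypothetical)."""
    client_id = os.environ.get('FOURSQUARE_CLIENT_ID')
    client_secret = os.environ.get('FOURSQUARE_CLIENT_SECRET')
    if not client_id or not client_secret:
        raise RuntimeError('Set FOURSQUARE_CLIENT_ID and FOURSQUARE_CLIENT_SECRET first')
    return client_id, client_secret

def mask(secret, visible=4):
    """Keep only the first few characters when printing a secret."""
    return secret[:visible] + '*' * (len(secret) - visible)
```

With this approach a cell can print `mask(client_secret)` instead of the full value, so the secret does not end up in the saved output.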

In [3]:
# Foursquare credential
client_id = 'KC0ERLLNKX5LMBP2YGIE3JYDJWUHTLXCVIXQ10GCD5M0UUPU'
client_secret = 'H2SRB4BVVC1T332FFL3CRGK3JL520JR0R0JWGEVWMZQBCVMC'
version = '20180605'
print('Client ID:', client_id)
print('Client Secret:', client_secret)
print('Version:', version)
# get latitude and longitude of one Neighbourhood
neigh_lat = result.loc[0, 'Latitude']
neigh_long = result.loc[0, 'Longitude']
neigh_name = result.loc[0, 'Neighbourhood']
print('Neighbourhood: {}, Latitude: {}, Longitude: {}'.format(neigh_name, neigh_lat, neigh_long))

Client ID: KC0ERLLNKX5LMBP2YGIE3JYDJWUHTLXCVIXQ10GCD5M0UUPU
Client Secret: H2SRB4BVVC1T332FFL3CRGK3JL520JR0R0JWGEVWMZQBCVMC
Version: 20180605
Neighbourhood: Parkwoods, Latitude: 43.7532586, Longitude: -79.3296565


In [4]:
limit = 100
radius = 500
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    client_id, 
    client_secret, 
    version, 
    neigh_lat, 
    neigh_long, 
    radius, 
    limit)
url

'https://api.foursquare.com/v2/venues/explore?&client_id=KC0ERLLNKX5LMBP2YGIE3JYDJWUHTLXCVIXQ10GCD5M0UUPU&client_secret=H2SRB4BVVC1T332FFL3CRGK3JL520JR0R0JWGEVWMZQBCVMC&v=20180605&ll=43.7532586,-79.3296565&radius=500&limit=100'

In [7]:
# get the result from API
url_req = requests.get(url).json()

# extract category of the venues
def extract_category(row):
    try:
        cat_list = row['categories']
    except KeyError:
        cat_list = row['venue.categories']
        
    if len(cat_list) == 0:
        return None
    else:
        return cat_list[0]['name']
    
venues = url_req['response']['groups'][0]['items']

# convert json to pandas dataframe (pd.json_normalize requires pandas >= 1.0)
venues_df = pd.json_normalize(venues)

# choose specific columns
columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
venues_df = venues_df.loc[:, columns]

# get the categories
venues_df['venue.categories'] = venues_df.apply(extract_category, axis=1)
venues_df

Unnamed: 0,venue.name,venue.categories,venue.location.lat,venue.location.lng
0,Brookbanks Park,Park,43.751976,-79.33214
1,Variety Store,Food & Drink Shop,43.751974,-79.333114
2,TTC stop - 44 Valley Woods,Bus Stop,43.755402,-79.333741


We need to apply this procedure for all rows of the dataset.
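Since the same request URL is built for every row, the string formatting can be wrapped in a small function. This is a minimal sketch using only the standard library; `build_explore_url` is a hypothetical helper, and the credentials passed to it here are placeholders.

```python
from urllib.parse import urlencode

EXPLORE_ENDPOINT = 'https://api.foursquare.com/v2/venues/explore'

def build_explore_url(client_id, client_secret, version, lat, lng, radius=500, limit=100):
    """Build the explore URL for one coordinate pair (hypothetical helper)."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,
        'll': '{},{}'.format(lat, lng),
        'radius': radius,
        'limit': limit,
    }
    # urlencode percent-encodes the comma in 'll' as %2C
    return EXPLORE_ENDPOINT + '?' + urlencode(params)

# Placeholder credentials, for illustration only
url = build_explore_url('ID', 'SECRET', '20180605', 43.7532586, -79.3296565)
print(url)
```

Keeping `radius` and `limit` as keyword arguments makes it explicit when a call uses a different search radius than the single-neighborhood example.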

In [4]:
# apply method for all rows
latitude = result['Latitude']
longitude = result['Longitude']
neighbourhood = result['Neighbourhood']

limit = 100
radius = 2000

venues_list = []

data_loader = zip(neighbourhood, latitude, longitude)

for neigh, lat, long in data_loader:
    url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    client_id, 
    client_secret, 
    version, 
    lat, 
    long, 
    radius, 
    limit)
    
    req = requests.get(url).json()
    
    venues = req['response']['groups'][0]['items']
    
    venues_list.append([(neigh, lat, long, v['venue']['name'], v['venue']['location']['lat'], v['venue']['location']['lng'],
                       v['venue']['categories'][0]['name']) for v in venues])
    
# build the dataframe once, after the loop has collected all venues
venues_df = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
venues_df.columns = ['Neighborhood', 'Neighborhood Latitude', 'Neighborhood Longitude', 'Venue', 'Venue Latitude', 'Venue Longitude', 
              'Venue Category']
    
venues_df.head()

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Parkwoods,43.753259,-79.329656,Allwyn's Bakery,43.75984,-79.324719,Caribbean Restaurant
1,Parkwoods,43.753259,-79.329656,Donalda Golf & Country Club,43.752816,-79.342741,Golf Course
2,Parkwoods,43.753259,-79.329656,Brookbanks Park,43.751976,-79.33214,Park
3,Parkwoods,43.753259,-79.329656,Graydon Hall Manor,43.763923,-79.342961,Event Space
4,Parkwoods,43.753259,-79.329656,Galleria Supermarket,43.75352,-79.349518,Supermarket


In [5]:
# convert categorical values
venue_one_hot = pd.get_dummies(venues_df[['Venue Category']], prefix="", prefix_sep="")

# add to dataframe
venue_one_hot['Neighborhood'] = venues_df['Neighborhood']

fix_col = [venue_one_hot.columns[-1]] + list(venue_one_hot.columns[:-1])
venue_one_hot = venue_one_hot[fix_col]

# compute mean of frequency by neighborhood 
venue_group = venue_one_hot.groupby('Neighborhood').mean().reset_index()

# save the dataset without neighborhood
#venue_group.drop(['Neighborhood'], axis = 1, inplace=True)

venue_group.head()

Unnamed: 0,Neighborhood,Zoo Exhibit,Accessories Store,Afghan Restaurant,African Restaurant,Airport,Airport Lounge,American Restaurant,Amphitheater,Animal Shelter,...,Vietnamese Restaurant,Volleyball Court,Warehouse Store,Whisky Bar,Wine Bar,Wings Joint,Women's Store,Xinjiang Restaurant,Yoga Studio,Zoo
0,"Adelaide,King,Richmond",0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,Agincourt,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,...,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,"Agincourt North,L'Amoreaux East,Milliken,Steel...",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.03,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,"Albion Gardens,Beaumond Heights,Humbergate,Jam...",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,"Alderwood,Long Branch",0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0
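To make the frequency table easier to read, here is a toy version of the same computation: each venue category becomes an indicator column, and the mean of each column per neighborhood is the share of venues in that category. The neighborhoods `A` and `B` and their venues are made up for illustration.

```python
import pandas as pd

# Toy venue list (made-up rows) with the same two columns used above
toy = pd.DataFrame({
    'Neighborhood': ['A', 'A', 'A', 'A', 'B', 'B'],
    'Venue Category': ['Gym', 'Park', 'Park', 'Coffee Shop', 'Gym', 'Gym'],
})

# One indicator column per category, then the per-neighborhood mean
one_hot = pd.get_dummies(toy['Venue Category'], dtype=float)
one_hot.insert(0, 'Neighborhood', toy['Neighborhood'])
freq = one_hot.groupby('Neighborhood').mean().reset_index()
print(freq)
```

In this toy data, half of the venues in `A` are parks and all venues in `B` are gyms, so each row of `freq` sums to 1. Using `insert` rather than plain column assignment is a design choice of this sketch: it raises an error if a venue category happens to be named `Neighborhood`, instead of silently overwriting that column.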


In [6]:
# extract the index of the most frequented venues
def most_common_venue(row, top):
    row_cat = row.iloc[1:]
    row_cat_sort = row_cat.sort_values(ascending=False)
    return row_cat_sort.index.values[0:top]


# create dataframe with the 10 most frequented venues
top_venue = 10

columns = ['Neighborhood']

for i in range(top_venue):
    columns.append('{} most common venue'.format(i+1))
    
venues_sort = pd.DataFrame(columns=columns)

venues_sort['Neighborhood'] = venue_group['Neighborhood']

for i in range(venue_group.shape[0]):
    venues_sort.iloc[i,1:] = most_common_venue(venue_group.iloc[i,:], top_venue)

venues_sort.head()

Unnamed: 0,Neighborhood,1 most common venue,2 most common venue,3 most common venue,4 most common venue,5 most common venue,6 most common venue,7 most common venue,8 most common venue,9 most common venue,10 most common venue
0,"Adelaide,King,Richmond",Coffee Shop,Restaurant,Hotel,Theater,Beer Bar,Pizza Place,Cosmetics Shop,Japanese Restaurant,Café,Sandwich Place
1,Agincourt,Chinese Restaurant,Coffee Shop,Pharmacy,Restaurant,Sandwich Place,Indian Restaurant,Sushi Restaurant,Bank,Bakery,Discount Store
2,"Agincourt North,L'Amoreaux East,Milliken,Steel...",Chinese Restaurant,Coffee Shop,Park,Gas Station,Pizza Place,Dessert Shop,Bakery,Vietnamese Restaurant,Bubble Tea Shop,Korean Restaurant
3,"Albion Gardens,Beaumond Heights,Humbergate,Jam...",Coffee Shop,Pizza Place,Indian Restaurant,Fast Food Restaurant,Grocery Store,Convenience Store,Park,Skating Rink,Fried Chicken Joint,Beer Store
4,"Alderwood,Long Branch",Coffee Shop,Fast Food Restaurant,Department Store,Pizza Place,Clothing Store,Burger Joint,Pharmacy,Restaurant,Electronics Store,Café


In [7]:
# save dataframe
venues_sort.to_csv('venues_sort.csv')
# save to csv-file
venue_group.to_csv('venue_group.csv')

In [None]:
venues_sort = pd.read_csv('venues_sort.csv')
venue_group = pd.read_csv('venue_group.csv')

## Methodology
To segment the neighborhoods of Toronto, we use the k-means clustering method and add the cluster label to the dataset. Each cluster is characterized by the venue categories that occur in it with the highest frequency. We choose the number of clusters ourselves. We then display the clusters on the map of Toronto and analyse the main venues in each cluster.
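One common heuristic for choosing the number of clusters is the elbow method: fit k-means for several values of k and look at where the inertia (within-cluster sum of squares) stops dropping sharply. The sketch below runs it on synthetic data (`X_demo`, three random blobs) rather than on the venue frequencies, so it only illustrates the procedure and is not how the number of clusters was chosen here.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic stand-in for the venue-frequency matrix (random, for illustration)
rng = np.random.default_rng(42)
X_demo = np.vstack([
    rng.normal(loc=0.0, scale=0.1, size=(30, 5)),
    rng.normal(loc=1.0, scale=0.1, size=(30, 5)),
    rng.normal(loc=2.0, scale=0.1, size=(30, 5)),
])

# Inertia (within-cluster sum of squares) for several candidate k
inertias = {}
for k in range(1, 7):
    model = KMeans(n_clusters=k, n_init=10, random_state=42).fit(X_demo)
    inertias[k] = model.inertia_

for k, value in inertias.items():
    print(k, round(value, 2))
```

Plotting `inertias` against k would show the bend; on real venue data the bend may be much less pronounced than on these well-separated blobs.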

In [8]:
# load cluster model
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

In order to fit the k-means model, we need features that describe the venues. In our case, the features are the mean frequency of each venue category per neighborhood.

In [10]:
# drop the column neighborhood
X = venue_group.drop(['Neighborhood'], axis = 1)
X.head()

Unnamed: 0,Zoo Exhibit,Accessories Store,Afghan Restaurant,African Restaurant,Airport,Airport Lounge,American Restaurant,Amphitheater,Animal Shelter,Antique Shop,...,Vietnamese Restaurant,Volleyball Court,Warehouse Store,Whisky Bar,Wine Bar,Wings Joint,Women's Store,Xinjiang Restaurant,Yoga Studio,Zoo
0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,...,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.03,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0


In [44]:
# fit kmeans method with 3 clusters
n_clust = 3
kmean = KMeans(n_clusters=n_clust, random_state=42).fit(X)

In [45]:
# add cluster label to the main dataset
venues_sort['Cluster Labels'] = kmean.labels_
venues_sort.head()

Unnamed: 0,Neighborhood,1 most common venue,2 most common venue,3 most common venue,4 most common venue,5 most common venue,6 most common venue,7 most common venue,8 most common venue,9 most common venue,10 most common venue,Cluster Labels
0,"Adelaide,King,Richmond",Coffee Shop,Restaurant,Hotel,Theater,Beer Bar,Pizza Place,Cosmetics Shop,Japanese Restaurant,Café,Sandwich Place,0
1,Agincourt,Chinese Restaurant,Coffee Shop,Pharmacy,Restaurant,Sandwich Place,Indian Restaurant,Sushi Restaurant,Bank,Bakery,Discount Store,2
2,"Agincourt North,L'Amoreaux East,Milliken,Steel...",Chinese Restaurant,Coffee Shop,Park,Gas Station,Pizza Place,Dessert Shop,Bakery,Vietnamese Restaurant,Bubble Tea Shop,Korean Restaurant,2
3,"Albion Gardens,Beaumond Heights,Humbergate,Jam...",Coffee Shop,Pizza Place,Indian Restaurant,Fast Food Restaurant,Grocery Store,Convenience Store,Park,Skating Rink,Fried Chicken Joint,Beer Store,2
4,"Alderwood,Long Branch",Coffee Shop,Fast Food Restaurant,Department Store,Pizza Place,Clothing Store,Burger Joint,Pharmacy,Restaurant,Electronics Store,Café,2


In order to display the map with the cluster, we need to merge the dataframe ``venues_sort`` with the dataframe ``result``.

In [None]:
result.rename({'Neighbourhood': 'Neighborhood'}, axis=1, inplace=True)

In [46]:
# merge venues_sort with result
merge_data = pd.merge(result, venues_sort, on='Neighborhood')
merge_data.head()

Unnamed: 0.1,Unnamed: 0,Postal Code,Borough,Neighborhood,Latitude,Longitude,1 most common venue,2 most common venue,3 most common venue,4 most common venue,5 most common venue,6 most common venue,7 most common venue,8 most common venue,9 most common venue,10 most common venue,Cluster Labels
0,0,M3A,North York,Parkwoods,43.753259,-79.329656,Coffee Shop,Japanese Restaurant,Gas Station,Pizza Place,Supermarket,Sandwich Place,Chinese Restaurant,Fried Chicken Joint,Pharmacy,Park,2
1,1,M4A,North York,Victoria Village,43.725882,-79.315572,Coffee Shop,Fast Food Restaurant,Gym,Sandwich Place,Gym / Fitness Center,Japanese Restaurant,Grocery Store,Middle Eastern Restaurant,Clothing Store,Park,2
2,2,M5A,Downtown Toronto,Harbourfront,43.65426,-79.360636,Coffee Shop,Café,Park,Theater,Restaurant,Japanese Restaurant,Bakery,Gastropub,Italian Restaurant,Farmers Market,0
3,3,M6A,North York,"Lawrence Heights,Lawrence Manor",43.718518,-79.464763,Clothing Store,Coffee Shop,Furniture / Home Store,Restaurant,Fast Food Restaurant,Dessert Shop,Grocery Store,Greek Restaurant,Pet Store,Jewelry Store,2
4,4,M7A,Downtown Toronto,Queen's Park,43.662301,-79.389494,Coffee Shop,Café,Japanese Restaurant,Park,Mexican Restaurant,Breakfast Spot,Gastropub,Restaurant,Pizza Place,Theater,0


In [47]:
merge_data.drop(['Unnamed: 0'], axis=1, inplace=True)
merge_data.head()

Unnamed: 0,Postal Code,Borough,Neighborhood,Latitude,Longitude,1 most common venue,2 most common venue,3 most common venue,4 most common venue,5 most common venue,6 most common venue,7 most common venue,8 most common venue,9 most common venue,10 most common venue,Cluster Labels
0,M3A,North York,Parkwoods,43.753259,-79.329656,Coffee Shop,Japanese Restaurant,Gas Station,Pizza Place,Supermarket,Sandwich Place,Chinese Restaurant,Fried Chicken Joint,Pharmacy,Park,2
1,M4A,North York,Victoria Village,43.725882,-79.315572,Coffee Shop,Fast Food Restaurant,Gym,Sandwich Place,Gym / Fitness Center,Japanese Restaurant,Grocery Store,Middle Eastern Restaurant,Clothing Store,Park,2
2,M5A,Downtown Toronto,Harbourfront,43.65426,-79.360636,Coffee Shop,Café,Park,Theater,Restaurant,Japanese Restaurant,Bakery,Gastropub,Italian Restaurant,Farmers Market,0
3,M6A,North York,"Lawrence Heights,Lawrence Manor",43.718518,-79.464763,Clothing Store,Coffee Shop,Furniture / Home Store,Restaurant,Fast Food Restaurant,Dessert Shop,Grocery Store,Greek Restaurant,Pet Store,Jewelry Store,2
4,M7A,Downtown Toronto,Queen's Park,43.662301,-79.389494,Coffee Shop,Café,Japanese Restaurant,Park,Mexican Restaurant,Breakfast Spot,Gastropub,Restaurant,Pizza Place,Theater,0


In [17]:
# import the libraries for visualization
import matplotlib.cm as cm
import matplotlib.colors as colors

In [48]:
# display the Toronto map with the clusters.
latitude = 43.651070
longitude = -79.347015

x = np.arange(n_clust)
ys = [i + x + (i*x)**2 for i in range(n_clust)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]


toronto_map = folium.Map(location=[latitude, longitude], zoom_start=12)

toronto_loader = zip(merge_data['Latitude'], merge_data['Longitude'], merge_data['Neighborhood'], merge_data['Cluster Labels'])

for lat, long, neigh, lab in toronto_loader:
    label = folium.Popup(str(neigh) + ' Cluster ' + str(lab), parse_html=True)
    folium.CircleMarker(
        [lat, long],
        radius=5,
        popup=label,
        color=rainbow[lab],
        fill=True,
        fill_color=rainbow[lab],
        fill_opacity=0.7).add_to(toronto_map)
    
toronto_map

On this map, we observe the 3 clusters. Let's explore the venues in each cluster.

### Cluster 0 - most common venues: Coffee Shop, Park, Restaurant

In [49]:
merge_data.loc[merge_data['Cluster Labels'] == 0, merge_data.columns[[2] + list(range(5, merge_data.shape[1]))]]

Unnamed: 0,Neighborhood,1 most common venue,2 most common venue,3 most common venue,4 most common venue,5 most common venue,6 most common venue,7 most common venue,8 most common venue,9 most common venue,10 most common venue,Cluster Labels
2,Harbourfront,Coffee Shop,Café,Park,Theater,Restaurant,Japanese Restaurant,Bakery,Gastropub,Italian Restaurant,Farmers Market,0
4,Queen's Park,Coffee Shop,Café,Japanese Restaurant,Park,Mexican Restaurant,Breakfast Spot,Gastropub,Restaurant,Pizza Place,Theater,0
9,"Ryerson,Garden District",Coffee Shop,Café,Restaurant,Gastropub,Japanese Restaurant,Park,Italian Restaurant,Pizza Place,Breakfast Spot,Sushi Restaurant,0
14,Woodbine Heights,Park,Coffee Shop,Pizza Place,Thai Restaurant,Café,Gastropub,Ethiopian Restaurant,Pharmacy,Ice Cream Shop,Beer Store,0
15,St. James Town,Coffee Shop,Café,Restaurant,Hotel,Italian Restaurant,Farmers Market,Japanese Restaurant,Gastropub,Beer Bar,Cosmetics Shop,0
16,Humewood-Cedarvale,Italian Restaurant,Coffee Shop,Bank,Café,Bakery,Park,Caribbean Restaurant,Indian Restaurant,Ice Cream Shop,Pizza Place,0
19,The Beaches,Coffee Shop,Pub,Breakfast Spot,Beach,Bakery,Thai Restaurant,BBQ Joint,Bar,Park,Japanese Restaurant,0
20,Berczy Park,Coffee Shop,Hotel,Café,Japanese Restaurant,Restaurant,Farmers Market,Beer Bar,Gastropub,Italian Restaurant,Park,0
24,Central Bay Street,Coffee Shop,Café,Restaurant,Pizza Place,Theater,Park,Mexican Restaurant,Hotel,Sushi Restaurant,Sandwich Place,0
25,Christie,Café,Coffee Shop,Bar,Grocery Store,Korean Restaurant,Vegetarian / Vegan Restaurant,Restaurant,Beer Bar,Indian Restaurant,Pizza Place,0


### Cluster 1 - most common venues: Zoo, Sculpture Garden, Restaurant

In [50]:
merge_data.loc[merge_data['Cluster Labels'] == 1, merge_data.columns[[2] + list(range(5, merge_data.shape[1]))]]

Unnamed: 0,Neighborhood,1 most common venue,2 most common venue,3 most common venue,4 most common venue,5 most common venue,6 most common venue,7 most common venue,8 most common venue,9 most common venue,10 most common venue,Cluster Labels
6,"Rouge,Malvern",Zoo Exhibit,Fast Food Restaurant,Restaurant,Gas Station,Other Great Outdoors,Pizza Place,Zoo,Mediterranean Restaurant,Pub,Caribbean Restaurant,1
95,Upper Rouge,Sculpture Garden,Golf Course,Grocery Store,Trail,Playground,Farm,Empanada Restaurant,Doner Restaurant,Donut Shop,Dumpling Restaurant,1


### Cluster 2 - most common venues: Coffee Shop, Restaurant, Gym, Bank, Park, Store

In [51]:
merge_data.loc[merge_data['Cluster Labels'] == 2, merge_data.columns[[2] + list(range(5, merge_data.shape[1]))]]

Unnamed: 0,Neighborhood,1 most common venue,2 most common venue,3 most common venue,4 most common venue,5 most common venue,6 most common venue,7 most common venue,8 most common venue,9 most common venue,10 most common venue,Cluster Labels
0,Parkwoods,Coffee Shop,Japanese Restaurant,Gas Station,Pizza Place,Supermarket,Sandwich Place,Chinese Restaurant,Fried Chicken Joint,Pharmacy,Park,2
1,Victoria Village,Coffee Shop,Fast Food Restaurant,Gym,Sandwich Place,Gym / Fitness Center,Japanese Restaurant,Grocery Store,Middle Eastern Restaurant,Clothing Store,Park,2
3,"Lawrence Heights,Lawrence Manor",Clothing Store,Coffee Shop,Furniture / Home Store,Restaurant,Fast Food Restaurant,Dessert Shop,Grocery Store,Greek Restaurant,Pet Store,Jewelry Store,2
5,Islington Avenue,Pharmacy,Coffee Shop,Bank,Park,Shopping Mall,Golf Course,Liquor Store,Supermarket,Lighting Store,Bakery,2
7,Don Mills North,Coffee Shop,Japanese Restaurant,Restaurant,Pizza Place,Italian Restaurant,Supermarket,Bank,Park,Burger Joint,Asian Restaurant,2
8,"Woodbine Gardens,Parkview Hill",Pizza Place,Sandwich Place,Park,Coffee Shop,Skating Rink,Pharmacy,Fast Food Restaurant,Grocery Store,Ice Cream Shop,Beer Store,2
10,Glencairn,Clothing Store,Coffee Shop,Furniture / Home Store,Restaurant,Sushi Restaurant,Fast Food Restaurant,Fried Chicken Joint,Grocery Store,Greek Restaurant,Bank,2
11,"Cloverdale,Islington,Martin Grove,Princess Gar...",Coffee Shop,Convenience Store,Fish & Chips Shop,Sandwich Place,Restaurant,Pizza Place,Pharmacy,Bank,Sushi Restaurant,Gym,2
12,"Highland Creek,Rouge Hill,Port Union",Coffee Shop,Breakfast Spot,Pizza Place,Pharmacy,Pet Store,Sandwich Place,Hotel,Mobile Phone Shop,Bar,Fried Chicken Joint,2
13,"Flemingdon Park,Don Mills South",Coffee Shop,Restaurant,Park,Japanese Restaurant,Pizza Place,Sandwich Place,Middle Eastern Restaurant,Gym,Asian Restaurant,Liquor Store,2


## Results
We observe the following for the 3 clusters:
* Gyms are not the most common venue in any of the three clusters.
* In the neighborhood of Victoria Village, Gym is only the third most common venue.
* Most gyms in each cluster are near restaurants and stores.
* Gyms are located in cluster 0 and cluster 2.
* Cluster 1 is oriented more towards restaurants and parks.
* Restaurants and bars are dominant in all clusters.
* The clusters containing gyms also contain supermarkets, stores and workplaces such as banks.
* Cluster 1 covers part of a national park.
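Observations such as "gyms are located in cluster 0 and cluster 2" can be checked by counting, per cluster, how many neighborhoods list a gym category among their most common venues. The following is a minimal sketch on a made-up table with the same column layout as ``merge_data`` (only three ranks, four invented neighborhoods), so its numbers do not reflect the real data.

```python
import pandas as pd

# Toy table with the same layout as merge_data (made-up rows, 3 ranks only)
toy = pd.DataFrame({
    'Neighborhood': ['A', 'B', 'C', 'D'],
    '1 most common venue': ['Coffee Shop', 'Zoo Exhibit', 'Coffee Shop', 'Pharmacy'],
    '2 most common venue': ['Café', 'Restaurant', 'Gym', 'Coffee Shop'],
    '3 most common venue': ['Park', 'Gas Station', 'Park', 'Gym / Fitness Center'],
    'Cluster Labels': [0, 1, 2, 2],
})

venue_cols = [c for c in toy.columns if c.endswith('most common venue')]

# True where any top-venue column of the row names a gym category
has_gym = toy[venue_cols].apply(lambda col: col.str.contains('Gym')).any(axis=1)

# Number of neighborhoods with a gym among their top venues, per cluster
gym_per_cluster = has_gym.groupby(toy['Cluster Labels']).sum()
print(gym_per_cluster)
```

Matching on the substring `Gym` covers both the `Gym` and `Gym / Fitness Center` categories that appear in the tables above.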

## Discussion
According to this analysis, Gold's Gym should open its gym in cluster 0 or cluster 2, where there are restaurants, shops, companies such as banks and even hotels. The high frequency of gyms in these clusters may be explained by these venues: either their customers go to the gym, or their employees do because it is close to their workplace. It is not a good idea to build a new gym in cluster 1, since people there would rather visit the zoo or the park, or run in the park, than go to the gym.

## Conclusion
To conclude, we would obtain better results with more data, such as land prices or access to public transportation. Based on our analysis and assumptions, it would be better to build the gym in cluster 0 or cluster 2, which are in the city center and close to shops and offices.