## Coursera Capstone Project
This project aims to end off the IBM Data Science Coursera Course

## Table of contents
* [Problem Statement](#introduction)
* [Data](#data)
* [Methodology](#methodology)
* [Results](#results)
* [Cluster Results](#clusterresults)
* [Discussion](#discussions)
* [Conclusion](#conclusions)

## Problem statement<a name="introduction"></a>

When an entrepreneur is looking to open a business, it would be useful to have an analysis of where a good location for this business would be. I would aim to segment cities using the Foursquare location data obtained from the API to determine what the best location is for the business, dependant on what type of business it is and where the gaps in the market are.

## Data<a name="data"></a>

The Foursquare API provides us with a multitude of location data and this would be used to segement cities into various clusters. These clusters would represent areas where certain services are in abundance and where certain servives are perhaps needed as there arent many of these in the area. This would then allow an entrepreneaur to decide what location to open his/her business in.

## Methodology<a name="methodology"></a>

I used the k-Means clustering methodology to take a city (in this instance Toronto) and segment the city into clusters. For each of these clusters I then analysed the POIs within them to assign labels to each of them.

## Results<a name="results"></a>
The code below is documented sufficiently to give my results and illustrate the process I followed.

In [136]:
import numpy as np
import pandas as pd
import geocoder
import folium
import requests
from sklearn.cluster import KMeans
import matplotlib.cm as cm
import matplotlib.colors as colors

In [137]:
print("Hello Capstone Project Course!")

Hello Capstone Project Course!


### Define all functions necessary for the notebook

In [138]:
def get_category_type(row):
    try:
        categories_list = row['categories']
    except:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']
    
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

Read data from table in Wikipedia html

In [139]:
df = pd.read_html('https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M')[0]

Filter out rows with 'Not assigned' as Borough values

In [140]:
df = df[df['Borough'] != 'Not assigned']

Explode the cells where multiple Neighborhoods are split by /

In [141]:
df['Neighborhood'] = df['Neighborhood'].str.split('/')
df = df.explode('Neighborhood')


Check if any of the Boroughs have Not assigned values for Neighborhoods - To replace them with the Borough as indicated

In [142]:
df[df['Neighborhood']=='Not assigned']

Unnamed: 0,Postal code,Borough,Neighborhood


Shape of the df

In [143]:
df.shape

(208, 3)

Read in Lat Long CSV and join to df

In [144]:
path = r"/Users/BenNC/Desktop/Geospatial_Coordinates.csv"
loc_df = pd.read_csv(path)
loc_df.columns = ['Postal code','Latitude','Longitude'] 
df = pd.merge(df,loc_df,how='left',on='Postal code')

In [145]:
df

Unnamed: 0,Postal code,Borough,Neighborhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.753259,-79.329656
1,M4A,North York,Victoria Village,43.725882,-79.315572
2,M5A,Downtown Toronto,Regent Park,43.654260,-79.360636
3,M5A,Downtown Toronto,Harbourfront,43.654260,-79.360636
4,M6A,North York,Lawrence Manor,43.718518,-79.464763
...,...,...,...,...,...
203,M8Z,Etobicoke,Mimico NW,43.628841,-79.520999
204,M8Z,Etobicoke,The Queensway West,43.628841,-79.520999
205,M8Z,Etobicoke,South of Bloor,43.628841,-79.520999
206,M8Z,Etobicoke,Kingsway Park South West,43.628841,-79.520999


Use Folium to visualise the data of Toronto  
Toronto Lat and Long got from Google

In [146]:
latitude = 43.6532
longitude = -79.3832

# create map of New York using latitude and longitude values
map_toronto = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, borough, neighborhood in zip(df['Latitude'], df['Longitude'], df['Borough'], df['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto)  

In [147]:
# Call the function above
map_toronto

In [148]:
# Define the client credentils from foursquare

In [149]:
CLIENT_ID = 'XMXZND32YAWUM1LOEMNAZ5E2FJU30GYKRM432KPTNYPCYWLA'
CLIENT_SECRET = 'MPJW5H2OMCN1NPB53LJMT2N1Q3SN5QLCL4CSRDGQDFGDGEBI' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

In [150]:
LIMIT = 50 
radius = 500
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    latitude, 
    longitude, 
    radius, 
    LIMIT)
url # display URL

'https://api.foursquare.com/v2/venues/explore?&client_id=XMXZND32YAWUM1LOEMNAZ5E2FJU30GYKRM432KPTNYPCYWLA&client_secret=MPJW5H2OMCN1NPB53LJMT2N1Q3SN5QLCL4CSRDGQDFGDGEBI&v=20180605&ll=43.6532,-79.3832&radius=500&limit=50'

In [151]:
# Get the json from the foursquare API
results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '5e943e740be7b4001b689340'},
 'response': {'suggestedFilters': {'header': 'Tap to show:',
   'filters': [{'name': 'Open now', 'key': 'openNow'}]},
  'headerLocation': 'Bay Street Corridor',
  'headerFullLocation': 'Bay Street Corridor, Toronto',
  'headerLocationGranularity': 'neighborhood',
  'totalResults': 88,
  'suggestedBounds': {'ne': {'lat': 43.6577000045, 'lng': -79.37699210971401},
   'sw': {'lat': 43.648699995499996, 'lng': -79.389407890286}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'reasons': {'count': 0,
       'items': [{'summary': 'This spot is popular',
         'type': 'general',
         'reasonName': 'globalInteractionReason'}]},
      'venue': {'id': '5227bb01498e17bf485e6202',
       'name': 'Downtown Toronto',
       'location': {'lat': 43.65323167517444,
        'lng': -79.38529600606677,
        'labeledLatLngs': [{'label': 'display',
          'lat': 43.65323167517444,
          'l

Filter the dataframe into nearby venues dataframe

In [152]:
venues = results['response']['groups'][0]['items']
    
nearby_venues = pd.json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues =nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head()

Unnamed: 0,name,categories,lat,lng
0,Downtown Toronto,Neighborhood,43.653232,-79.385296
1,Nathan Phillips Square,Plaza,43.65227,-79.383516
2,Indigo,Bookstore,43.653515,-79.380696
3,LUSH,Cosmetics Shop,43.653557,-79.3804
4,Eggspectation Bell Trinity Square,Breakfast Spot,43.653144,-79.38198


Define function that can do this for all of the results in the full response

In [153]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

In [154]:
toronto_venues = getNearbyVenues(names=df['Neighborhood'],
                                   latitudes=df['Latitude'],
                                   longitudes=df['Longitude']
                                  )

Parkwoods
Victoria Village
Regent Park 
 Harbourfront
Lawrence Manor 
 Lawrence Heights
Queen's Park 
 Ontario Provincial Government
Islington Avenue
Malvern 
 Rouge
Don Mills
Parkview Hill 
 Woodbine Gardens
Garden District, Ryerson
Glencairn
West Deane Park 
 Princess Gardens 
 Martin Grove 
 Islington 
 Cloverdale
Rouge Hill 
 Port Union 
 Highland Creek
Don Mills
Woodbine Heights
St. James Town
Humewood-Cedarvale
Eringate 
 Bloordale Gardens 
 Old Burnhamthorpe 
 Markland Wood
Guildwood 
 Morningside 
 West Hill
The Beaches
Berczy Park
Caledonia-Fairbanks
Woburn
Leaside
Central Bay Street
Christie
Cedarbrae
Hillcrest Village
Bathurst Manor 
 Wilson Heights 
 Downsview North
Thorncliffe Park
Richmond 
 Adelaide 
 King
Dufferin 
 Dovercourt Village
Scarborough Village
Fairview 
 Henry Farm 
 Oriole
Northwood Park 
 York University
East Toronto
Harbourfront East 
 Union Station 
 Toronto Islands
Little Portugal 
 Trinity
Kennedy Park 
 Ionview 
 East Birchmount Park
Bayview Village
Do

KeyError: 'groups'

In [155]:
print('There are {} uniques categories.'.format(len(toronto_venues['Venue Category'].unique())))

There are 243 uniques categories.


In [156]:
toronto_venues.groupby('Neighborhood').count()

Unnamed: 0_level_0,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Adelaide,50,50,50,50,50,50
Agincourt North,2,2,2,2,2,2
Albion Gardens,9,9,9,9,9,9
Bathurst Quay,16,16,16,16,16,16
Beaumond Heights,9,9,9,9,9,9
...,...,...,...,...,...,...
Willowdale,39,39,39,39,39,39
Willowdale,1,1,1,1,1,1
Woburn,3,3,3,3,3,3
Woodbine Heights,12,12,12,12,12,12


In [157]:
# one hot encoding
toronto_onehot = pd.get_dummies(toronto_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
toronto_onehot['Neighborhood'] = toronto_venues['Neighborhood'] 

In [158]:
# move neighborhood column to the first column
list_ = list(toronto_onehot.columns[:])
list_.remove('Neighborhood')
fixed_columns = [toronto_onehot.columns[171]] + list_
toronto_onehot = toronto_onehot[fixed_columns]

toronto_onehot.head()

Unnamed: 0,Neighborhood,Accessories Store,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Antique Shop,...,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Warehouse Store,Wine Bar,Wings Joint,Women's Store,Yoga Studio
0,Parkwoods,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,Parkwoods,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,Victoria Village,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,Victoria Village,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,Victoria Village,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


Group by neighborhood and use the mean distance as the aggregation

In [159]:
toronto_group = toronto_onehot.groupby('Neighborhood').mean().reset_index()
toronto_group

Unnamed: 0,Neighborhood,Accessories Store,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Antique Shop,...,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Warehouse Store,Wine Bar,Wings Joint,Women's Store,Yoga Studio
0,Adelaide,0.0,0.0000,0.0000,0.0000,0.000,0.0000,0.000,0.04,0.0,...,0.0,0.02,0.0,0.000000,0.000000,0.0,0.0,0.0,0.0,0.0
1,Agincourt North,0.0,0.0000,0.0000,0.0000,0.000,0.0000,0.000,0.00,0.0,...,0.0,0.00,0.0,0.000000,0.000000,0.0,0.0,0.0,0.0,0.0
2,Albion Gardens,0.0,0.0000,0.0000,0.0000,0.000,0.0000,0.000,0.00,0.0,...,0.0,0.00,0.0,0.000000,0.000000,0.0,0.0,0.0,0.0,0.0
3,Bathurst Quay,0.0,0.0625,0.0625,0.0625,0.125,0.1875,0.125,0.00,0.0,...,0.0,0.00,0.0,0.000000,0.000000,0.0,0.0,0.0,0.0,0.0
4,Beaumond Heights,0.0,0.0000,0.0000,0.0000,0.000,0.0000,0.000,0.00,0.0,...,0.0,0.00,0.0,0.000000,0.000000,0.0,0.0,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
193,Willowdale,0.0,0.0000,0.0000,0.0000,0.000,0.0000,0.000,0.00,0.0,...,0.0,0.00,0.0,0.000000,0.025641,0.0,0.0,0.0,0.0,0.0
194,Willowdale,0.0,0.0000,0.0000,0.0000,0.000,0.0000,0.000,0.00,0.0,...,0.0,0.00,0.0,0.000000,0.000000,0.0,0.0,0.0,0.0,0.0
195,Woburn,0.0,0.0000,0.0000,0.0000,0.000,0.0000,0.000,0.00,0.0,...,0.0,0.00,0.0,0.000000,0.000000,0.0,0.0,0.0,0.0,0.0
196,Woodbine Heights,0.0,0.0000,0.0000,0.0000,0.000,0.0000,0.000,0.00,0.0,...,0.0,0.00,0.0,0.083333,0.000000,0.0,0.0,0.0,0.0,0.0


In [160]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = toronto_group['Neighborhood']

for ind in np.arange(toronto_group.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(toronto_group.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Adelaide,Coffee Shop,Café,American Restaurant,Steakhouse,Pizza Place,Asian Restaurant,Bar,Restaurant,Brazilian Restaurant,Breakfast Spot
1,Agincourt North,Playground,Park,Yoga Studio,Curling Ice,Drugstore,Donut Shop,Doner Restaurant,Dog Run,Distribution Center,Discount Store
2,Albion Gardens,Pizza Place,Grocery Store,Fried Chicken Joint,Sandwich Place,Beer Store,Fast Food Restaurant,Pharmacy,Garden Center,Garden,Dog Run
3,Bathurst Quay,Airport Service,Airport Lounge,Airport Terminal,Bar,Boat or Ferry,Airport,Airport Food Court,Airport Gate,Sculpture Garden,Boutique
4,Beaumond Heights,Pizza Place,Grocery Store,Fried Chicken Joint,Sandwich Place,Beer Store,Fast Food Restaurant,Pharmacy,Garden Center,Garden,Dog Run


# Clustering Analysis

In [161]:
no_clusters = 4

toronto_clustered = toronto_group.drop('Neighborhood', 1)

kmeans = KMeans(n_clusters=no_clusters, random_state=0).fit(toronto_clustered)

kmeans.labels_[0:10]

array([0, 2, 0, 0, 0, 0, 0, 0, 0, 0], dtype=int32)

In [164]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

toronto_merged = df

# merge toronto_grouped with toronto_data to add latitude/longitude for each neighborhood
toronto_merged = toronto_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

#Drop rows where nans exist
toronto_merged.drop([toronto_merged.index[8] , toronto_merged.index[80],toronto_merged.index[81],toronto_merged.index[185]],inplace=True)
toronto_merged.head() # check the last columns!

Unnamed: 0,Postal code,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M3A,North York,Parkwoods,43.753259,-79.329656,2.0,Park,Food & Drink Shop,Yoga Studio,Dance Studio,Drugstore,Donut Shop,Doner Restaurant,Dog Run,Distribution Center,Discount Store
1,M4A,North York,Victoria Village,43.725882,-79.315572,0.0,Hockey Arena,Coffee Shop,Portuguese Restaurant,Intersection,Yoga Studio,Diner,Department Store,Dessert Shop,Dim Sum Restaurant,Distribution Center
2,M5A,Downtown Toronto,Regent Park,43.65426,-79.360636,0.0,Coffee Shop,Pub,Park,Bakery,Breakfast Spot,Café,Mexican Restaurant,Theater,Yoga Studio,Hotel
3,M5A,Downtown Toronto,Harbourfront,43.65426,-79.360636,0.0,Coffee Shop,Pub,Park,Bakery,Breakfast Spot,Café,Mexican Restaurant,Theater,Yoga Studio,Hotel
4,M6A,North York,Lawrence Manor,43.718518,-79.464763,0.0,Clothing Store,Accessories Store,Arts & Crafts Store,Furniture / Home Store,Event Space,Coffee Shop,Boutique,Gift Shop,Vietnamese Restaurant,Airport Terminal


In [165]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(no_clusters)
ys = [i + x + (i*x)**2 for i in range(no_clusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(toronto_merged['Latitude'], toronto_merged['Longitude'], toronto_merged['Neighborhood'], toronto_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[int(cluster)-1],
        fill=True,
        fill_color=rainbow[int(cluster)-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

## Cluster results<a name="clusterresults"></a>

### Cluster 1 - Boroughs with good access to Convenience stores, Pubs and fast food restaurants

In [166]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 0, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,North York,0.0,Hockey Arena,Coffee Shop,Portuguese Restaurant,Intersection,Yoga Studio,Diner,Department Store,Dessert Shop,Dim Sum Restaurant,Distribution Center
2,Downtown Toronto,0.0,Coffee Shop,Pub,Park,Bakery,Breakfast Spot,Café,Mexican Restaurant,Theater,Yoga Studio,Hotel
3,Downtown Toronto,0.0,Coffee Shop,Pub,Park,Bakery,Breakfast Spot,Café,Mexican Restaurant,Theater,Yoga Studio,Hotel
4,North York,0.0,Clothing Store,Accessories Store,Arts & Crafts Store,Furniture / Home Store,Event Space,Coffee Shop,Boutique,Gift Shop,Vietnamese Restaurant,Airport Terminal
5,North York,0.0,Clothing Store,Accessories Store,Arts & Crafts Store,Furniture / Home Store,Event Space,Coffee Shop,Boutique,Gift Shop,Vietnamese Restaurant,Airport Terminal
...,...,...,...,...,...,...,...,...,...,...,...,...
203,Etobicoke,0.0,Convenience Store,Wings Joint,Tanning Salon,Supplement Shop,Discount Store,Grocery Store,Gym,Sandwich Place,Bakery,Fast Food Restaurant
204,Etobicoke,0.0,Convenience Store,Wings Joint,Tanning Salon,Supplement Shop,Discount Store,Grocery Store,Gym,Sandwich Place,Bakery,Fast Food Restaurant
205,Etobicoke,0.0,Convenience Store,Wings Joint,Tanning Salon,Supplement Shop,Discount Store,Grocery Store,Gym,Sandwich Place,Bakery,Fast Food Restaurant
206,Etobicoke,0.0,Convenience Store,Wings Joint,Tanning Salon,Supplement Shop,Discount Store,Grocery Store,Gym,Sandwich Place,Bakery,Fast Food Restaurant


### Cluster 2 - Good for Bars, Yoga studios and various restaurants

In [167]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 1, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
21,Scarborough,1.0,Bar,Yoga Studio,Deli / Bodega,Eastern European Restaurant,Dumpling Restaurant,Drugstore,Donut Shop,Doner Restaurant,Dog Run,Distribution Center
22,Scarborough,1.0,Bar,Yoga Studio,Deli / Bodega,Eastern European Restaurant,Dumpling Restaurant,Drugstore,Donut Shop,Doner Restaurant,Dog Run,Distribution Center
23,Scarborough,1.0,Bar,Yoga Studio,Deli / Bodega,Eastern European Restaurant,Dumpling Restaurant,Drugstore,Donut Shop,Doner Restaurant,Dog Run,Distribution Center


### Cluster 3 - Good for parks, playgrounds and general outdoor/exercise related activities

In [168]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 2, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,North York,2.0,Park,Food & Drink Shop,Yoga Studio,Dance Studio,Drugstore,Donut Shop,Doner Restaurant,Dog Run,Distribution Center,Discount Store
37,York,2.0,Park,Women's Store,Pool,Curling Ice,Drugstore,Donut Shop,Doner Restaurant,Dog Run,Distribution Center,Discount Store
59,East York,2.0,Park,Convenience Store,Yoga Studio,Dance Studio,Dumpling Restaurant,Drugstore,Donut Shop,Doner Restaurant,Dog Run,Distribution Center
117,North York,2.0,Park,Convenience Store,Bank,Yoga Studio,Deli / Bodega,Dumpling Restaurant,Drugstore,Donut Shop,Doner Restaurant,Dog Run
151,Scarborough,2.0,Playground,Park,Yoga Studio,Curling Ice,Drugstore,Donut Shop,Doner Restaurant,Dog Run,Distribution Center,Discount Store
152,Scarborough,2.0,Playground,Park,Yoga Studio,Curling Ice,Drugstore,Donut Shop,Doner Restaurant,Dog Run,Distribution Center,Discount Store
153,Scarborough,2.0,Playground,Park,Yoga Studio,Curling Ice,Drugstore,Donut Shop,Doner Restaurant,Dog Run,Distribution Center,Discount Store
154,Scarborough,2.0,Playground,Park,Yoga Studio,Curling Ice,Drugstore,Donut Shop,Doner Restaurant,Dog Run,Distribution Center,Discount Store
180,Downtown Toronto,2.0,Park,Trail,Playground,Yoga Studio,Dance Studio,Deli / Bodega,Department Store,Dessert Shop,Dim Sum Restaurant,Discount Store
195,Etobicoke,2.0,Park,Baseball Field,Yoga Studio,Dance Studio,Dumpling Restaurant,Drugstore,Donut Shop,Doner Restaurant,Dog Run,Distribution Center


### Cluster 4 - Good for golf and Gyms

In [169]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 3, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
16,Etobicoke,3.0,Golf Course,College Gym,Eastern European Restaurant,Dumpling Restaurant,Drugstore,Donut Shop,Doner Restaurant,Dog Run,Distribution Center,Discount Store
17,Etobicoke,3.0,Golf Course,College Gym,Eastern European Restaurant,Dumpling Restaurant,Drugstore,Donut Shop,Doner Restaurant,Dog Run,Distribution Center,Discount Store
18,Etobicoke,3.0,Golf Course,College Gym,Eastern European Restaurant,Dumpling Restaurant,Drugstore,Donut Shop,Doner Restaurant,Dog Run,Distribution Center,Discount Store
19,Etobicoke,3.0,Golf Course,College Gym,Eastern European Restaurant,Dumpling Restaurant,Drugstore,Donut Shop,Doner Restaurant,Dog Run,Distribution Center,Discount Store
20,Etobicoke,3.0,Golf Course,College Gym,Eastern European Restaurant,Dumpling Restaurant,Drugstore,Donut Shop,Doner Restaurant,Dog Run,Distribution Center,Discount Store


## Discussions<a name="discussions"></a>

As you can see from the above results, the city gives obvious segments of where certain services are more popular than others. It would therefore be recommended that the entrepreneur use these results according to the risk profile of their prospective business and make a decision based from these insights in conjunction with the risks he/she is willing to take.

## Conclusions<a name="conclusions"></a>

As we can see from the above clusters, there are certain areas within toronto that are good for ceratin businesses. This would be useful to an entrepreneur as it would allow them to decide whether they want to go for a safe bet and choose an area where there are similar businesses that are doing well, or whether to make a more risky choice and break into an area where their service doesnt yet exist.