### Project: Segmenting and Clustering Neighborhoods in Toronto (part three)

---

In this notebook, I explore Toronto's postal codes: the Foursquare API supplies the venues around each one, and a machine learning technique (k-means clustering) groups the postal codes by the kinds of venues they contain.

Load the necessary libraries.

In [1]:
import pandas as pd
import numpy as np
import requests as rq
import matplotlib.cm as cm
import matplotlib.colors as colors
import matplotlib.pyplot as plt
import folium as fl
from geopy.geocoders import Nominatim as geo
from sklearn.cluster import KMeans
pd.set_option('display.max_colwidth', None)  # None means no truncation; -1 is deprecated since pandas 1.0

I'll import the data from the previous part.

In [2]:
df_ca = pd.read_csv('df_ca_ll.csv', sep = ';')
df_ca.head(10)

Unnamed: 0,Postcode,Borough,Neighbourhood,Latitude,Longitude
0,M1B,Scarborough,"Rouge, Malvern",43.8113,-79.193
1,M1C,Scarborough,"Highland Creek, Rouge Hill, Port Union",43.7878,-79.1564
2,M1E,Scarborough,"Guildwood, Morningside, West Hill",43.7678,-79.1866
3,M1G,Scarborough,Woburn,43.7712,-79.2144
4,M1H,Scarborough,Cedarbrae,43.7686,-79.2389
5,M1J,Scarborough,Scarborough Village,43.7464,-79.2323
6,M1K,Scarborough,"East Birchmount Park, Ionview, Kennedy Park",43.7298,-79.2639
7,M1L,Scarborough,"Clairlea, Golden Mile, Oakridge",43.7122,-79.2843
8,M1M,Scarborough,"Cliffcrest, Cliffside, Scarborough Village West",43.7247,-79.2312
9,M1N,Scarborough,"Birch Cliff, Cliffside West",43.6952,-79.2646


I'll use only the boroughs whose name contains "Toronto".

In [3]:
df_ca['Toronto'] = [True if 'Toronto' in B else False for B in df_ca['Borough']]

In [4]:
df_tor = df_ca[df_ca['Toronto'] == True].reset_index(drop=True)

In [5]:
df_tor.drop('Toronto', axis = 1, inplace = True)
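
The three cells above can also be written as a single boolean-mask filter, without the temporary `Toronto` column. This is a minimal sketch on a toy frame: the rows are made up for illustration and only the column names follow the notebook.

```python
import pandas as pd

# Toy stand-in for df_ca; only the column names follow the notebook.
toy = pd.DataFrame({
    'Postcode': ['M1B', 'M4E', 'M5A'],
    'Borough': ['Scarborough', 'East Toronto', 'Downtown Toronto'],
})

# Keep rows whose borough name contains "Toronto"; no helper column needed.
mask = toy['Borough'].str.contains('Toronto', regex=False)
toy_tor = toy[mask].reset_index(drop=True)
print(toy_tor)
```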

In [6]:
df_tor

Unnamed: 0,Postcode,Borough,Neighbourhood,Latitude,Longitude
0,M4E,East Toronto,The Beaches,43.6784,-79.2941
1,M4K,East Toronto,"The Danforth West, Riverdale",43.6803,-79.3538
2,M4L,East Toronto,"The Beaches West, India Bazaar",43.6693,-79.3155
3,M4M,East Toronto,Studio District,43.6561,-79.3406
4,M4N,Central Toronto,Lawrence Park,43.7301,-79.3935
5,M4P,Central Toronto,Davisville North,43.7135,-79.3887
6,M4R,Central Toronto,North Toronto West,43.7143,-79.4065
7,M4S,Central Toronto,Davisville,43.702,-79.3853
8,M4T,Central Toronto,"Moore Park, Summerhill East",43.6899,-79.3853
9,M4V,Central Toronto,"Deer Park, Forest Hill SE, Rathnelly, South Hill, Summerhill West",43.6861,-79.4025


Looks good. Let's see these postal codes on a map.

In [7]:
address = 'Toronto, CA'

geolocator = geo(user_agent="ca_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Toronto are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Toronto are 43.653963, -79.387207.


In [8]:
map_toronto = fl.Map(location=[latitude, longitude], zoom_start=10)

for lat, lng, postcode, neighbourhood in zip(df_tor['Latitude'], df_tor['Longitude'], df_tor['Postcode'], df_tor['Neighbourhood']):
    label = '{}, {}'.format(postcode, neighbourhood)
    label = fl.Popup(label, parse_html=True)
    fl.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto)  
    
map_toronto

Now, hands on with the Foursquare API.

In [9]:
ID = 'YOUR_FOURSQUARE_CLIENT_ID'  # placeholder: real credentials should not be published
PASS = 'YOUR_FOURSQUARE_CLIENT_SECRET'  # placeholder: real credentials should not be published
VERSION = '20191201'
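
Credentials are safer when they are kept out of the notebook itself. One common option, sketched here, is to read them from environment variables. The variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` are hypothetical, chosen only for this example.

```python
import os

# Hypothetical variable names; set them in the shell before starting the notebook.
ID_FROM_ENV = os.environ.get('FOURSQUARE_CLIENT_ID', '')
PASS_FROM_ENV = os.environ.get('FOURSQUARE_CLIENT_SECRET', '')

# Warn early instead of failing later inside the API calls.
if not ID_FROM_ENV or not PASS_FROM_ENV:
    print('Foursquare credentials are not set in the environment.')
```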

Fine. Now for the venue analysis: get up to 100 venues within a 500-meter radius of each postal code. Most of the code used here is adapted from the Coursera course, because it is good enough for my purposes.

In [10]:
# Function to get venues from FourSquare API

def getNearbyVenues(names, latitudes, longitudes, radius=500, limit=100):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            ID, 
            PASS, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            limit)
            
        results = rq.get(url).json()["response"]['groups'][0]['items']
        
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Postcode', 
                  'Postcode Latitude', 
                  'Postcode Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
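
`getNearbyVenues` indexes straight into the JSON response, so a postal code with no result groups, or a venue without a category, would raise an `IndexError` or `KeyError`. The sketch below shows one way to flatten a response defensively. The helper name `extract_venue_rows` is hypothetical, and the sample payload is hand-written to mirror the keys the function above reads; it is not real API output.

```python
def extract_venue_rows(payload, name, lat, lng):
    """Flatten one explore-style JSON payload into rows, skipping incomplete items."""
    groups = payload.get('response', {}).get('groups', [])
    items = groups[0].get('items', []) if groups else []
    rows = []
    for item in items:
        venue = item.get('venue', {})
        location = venue.get('location', {})
        categories = venue.get('categories', [])
        if not categories or 'lat' not in location or 'lng' not in location:
            continue  # skip venues without a category or coordinates
        rows.append((name, lat, lng, venue.get('name'),
                     location['lat'], location['lng'], categories[0]['name']))
    return rows


# Hand-written payload with the same nesting that getNearbyVenues indexes into.
sample = {'response': {'groups': [{'items': [
    {'venue': {'name': 'Glen Manor Ravine',
               'location': {'lat': 43.676821, 'lng': -79.293942},
               'categories': [{'name': 'Trail'}]}},
    {'venue': {'name': 'No Category Venue',
               'location': {'lat': 43.0, 'lng': -79.0},
               'categories': []}},
]}]}}

rows_ok = extract_venue_rows(sample, 'M4E', 43.6784, -79.2941)
rows_empty = extract_venue_rows({'response': {}}, 'X0X', 0.0, 0.0)
print(rows_ok)
print(rows_empty)
```

With this shape, a postal code that returns nothing contributes an empty list instead of stopping the whole loop.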

In [11]:
# Getting values

toronto_venues = getNearbyVenues(names=df_tor['Postcode'], latitudes=df_tor['Latitude'], longitudes=df_tor['Longitude'])

M4E
M4K
M4L
M4M
M4N
M4P
M4R
M4S
M4T
M4V
M4W
M4X
M4Y
M5A
M5B
M5C
M5E
M5G
M5H
M5J
M5K
M5L
M5N
M5P
M5R
M5S
M5T
M5V
M5W
M5X
M6G
M6H
M6J
M6K
M6P
M6R
M6S
M7Y
M9A


Checking the new dataset

In [12]:
print(toronto_venues.shape)
toronto_venues.head()

(1605, 7)


Unnamed: 0,Postcode,Postcode Latitude,Postcode Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,M4E,43.6784,-79.2941,Glen Manor Ravine,43.676821,-79.293942,Trail
1,M4E,43.6784,-79.2941,The Big Carrot Natural Food Market,43.678879,-79.297734,Health Food Store
2,M4E,43.6784,-79.2941,Beaches Bake Shop,43.680363,-79.289692,Bakery
3,M4E,43.6784,-79.2941,The Beech Tree,43.680493,-79.288846,Gastropub
4,M4E,43.6784,-79.2941,Grover Pub and Grub,43.679181,-79.297215,Pub
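
It also helps to know how many venues each postal code returned, because the category frequencies computed later are shares of that count: a postal code with only two venues can only show frequencies of 0.5 or 1.0. A minimal sketch on a toy frame (the rows are illustrative, not the real results):

```python
import pandas as pd

# Toy stand-in for toronto_venues; the rows are illustrative only.
toy_venues = pd.DataFrame({
    'Postcode': ['M4E', 'M4E', 'M4E', 'M4N', 'M4N'],
    'Venue Category': ['Trail', 'Pub', 'Pub', 'Park', 'Photography Studio'],
})

# Number of venues returned for each postcode, smallest first.
venues_per_postcode = toy_venues.groupby('Postcode').size().sort_values()
print(venues_per_postcode)
```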


In [13]:
print('There are {} unique categories.'.format(len(toronto_venues['Venue Category'].unique())))

There are 216 unique categories.


Now the venue categories will be one-hot encoded so that category frequencies can be calculated per postal code.

In [14]:
toronto_onehot = pd.get_dummies(toronto_venues[['Venue Category']], prefix="", prefix_sep="")

toronto_onehot['Postcode'] = toronto_venues['Postcode'] 

fixed_columns = [toronto_onehot.columns[-1]] + list(toronto_onehot.columns[:-1])
toronto_onehot = toronto_onehot[fixed_columns]

toronto_onehot.head()

Unnamed: 0,Postcode,Accessories Store,Afghan Restaurant,American Restaurant,Argentinian Restaurant,Art Gallery,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,BBQ Joint,...,Toy / Game Store,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Whisky Bar,Wine Bar,Wings Joint,Yoga Studio
0,M4E,0,0,0,0,0,0,0,0,0,...,0,1,0,0,0,0,0,0,0,0
1,M4E,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,M4E,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,M4E,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,M4E,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


Check the data and compute the mean of each category per postal code.

In [15]:
toronto_onehot.shape

(1605, 217)

In [16]:
toronto_grouped = toronto_onehot.groupby('Postcode').mean().reset_index()
toronto_grouped

Unnamed: 0,Postcode,Accessories Store,Afghan Restaurant,American Restaurant,Argentinian Restaurant,Art Gallery,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,BBQ Joint,...,Toy / Game Store,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Whisky Bar,Wine Bar,Wings Joint,Yoga Studio
0,M4E,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,M4K,0.0,0.0,0.027027,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.027027,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.027027
2,M4L,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,M4M,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,M4N,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,M4P,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
6,M4R,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
7,M4S,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.047619,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
8,M4T,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9,M4V,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
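
To make the two steps concrete, here is the same one-hot encoding and group mean on a four-row toy frame (made-up rows). Each value in the result is the share of a postal code's venues that fall in that category, so every row sums to 1.

```python
import pandas as pd

# Toy stand-in for toronto_venues; the rows are illustrative only.
toy_venues = pd.DataFrame({
    'Postcode': ['M4E', 'M4E', 'M4N', 'M4N'],
    'Venue Category': ['Pub', 'Trail', 'Park', 'Park'],
})

# One indicator column per category, then the share of each category per postcode.
toy_onehot = pd.get_dummies(toy_venues['Venue Category'], dtype=float)
toy_onehot.insert(0, 'Postcode', toy_venues['Postcode'])
toy_grouped = toy_onehot.groupby('Postcode').mean().reset_index()
print(toy_grouped)
```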


Now it's time to see the top five venue categories for each postal code.

In [17]:
num_top_venues = 5

for hood in toronto_grouped['Postcode']:
    print("---- "+hood+" ----")
    temp = toronto_grouped[toronto_grouped['Postcode'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

---- M4E ----
               venue  freq
0  Pub                0.29
1  Gastropub          0.14
2  Health Food Store  0.14
3  Bakery             0.14
4  Neighborhood       0.14


---- M4K ----
                venue  freq
0  Greek Restaurant    0.24
1  Restaurant          0.08
2  Ice Cream Shop      0.05
3  Italian Restaurant  0.05
4  Coffee Shop         0.05


---- M4L ----
              venue  freq
0  Sandwich Place    0.10
1  Pizza Place       0.05
2  Sushi Restaurant  0.05
3  Park              0.05
4  Pub               0.05


---- M4M ----
                   venue  freq
0  Coffee Shop            0.12
1  Coworking Space        0.12
2  Garden Center          0.12
3  Performing Arts Venue  0.12
4  Park                   0.12


---- M4N ----
                venue  freq
0  Photography Studio  0.5 
1  Park                0.5 
2  Neighborhood        0.0 
3  Men's Store         0.0 
4  Mexican Restaurant  0.0 


---- M4P ----
               venue  freq
0  Food & Drink Shop  0.14
1  Convenien

In [18]:
# Function to get categories

def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]
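
When several categories share the same frequency, their relative order after `sort_values` is not guaranteed with the default sort, which is why tied categories can appear in a different order from one listing to another. As a sketch of an alternative, `Series.nlargest` keeps tied values in their original column order. The toy row below is illustrative only.

```python
import pandas as pd

# Toy frequency row shaped like one row of toronto_grouped.
row = pd.Series({'Postcode': 'M4E', 'Pub': 0.29, 'Trail': 0.14, 'Bakery': 0.14, 'Park': 0.0})

# Drop the label, make the values numeric, and take the three largest.
freqs = row.iloc[1:].astype(float)
top3 = freqs.nlargest(3).index.tolist()
print(top3)
```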

In [19]:
# Creating the new dataframe

num_top_venues = 10

indicators = ['st', 'nd', 'rd']

columns = ['Postcode']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:  # positions beyond 'rd' fall back to the 'th' suffix
        columns.append('{}th Most Common Venue'.format(ind+1))

postcode_venues_sorted = pd.DataFrame(columns=columns)
postcode_venues_sorted['Postcode'] = toronto_grouped['Postcode']

for ind in np.arange(toronto_grouped.shape[0]):
    postcode_venues_sorted.iloc[ind, 1:] = return_most_common_venues(toronto_grouped.iloc[ind, :], num_top_venues)

postcode_venues_sorted

Unnamed: 0,Postcode,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M4E,Pub,Health Food Store,Neighborhood,Bakery,Trail,Gastropub,Event Space,Ethiopian Restaurant,Electronics Store,Dog Run
1,M4K,Greek Restaurant,Restaurant,Ice Cream Shop,Coffee Shop,Italian Restaurant,Yoga Studio,Café,Pub,Bubble Tea Shop,Brewery
2,M4L,Sandwich Place,Italian Restaurant,Ice Cream Shop,Burrito Place,Burger Joint,Fast Food Restaurant,Brewery,Fish & Chips Shop,Food & Drink Shop,Steakhouse
3,M4M,Coworking Space,Gym,Garden Center,Baseball Field,Coffee Shop,Diner,Park,Performing Arts Venue,Cupcake Shop,Doner Restaurant
4,M4N,Photography Studio,Park,Yoga Studio,Dog Run,Fast Food Restaurant,Farmers Market,Falafel Restaurant,Event Space,Ethiopian Restaurant,Electronics Store
5,M4P,Convenience Store,Park,Dog Run,Gym,Breakfast Spot,Food & Drink Shop,Clothing Store,Yoga Studio,Ethiopian Restaurant,Eastern European Restaurant
6,M4R,Garden,Gym Pool,Park,Playground,Yoga Studio,Farmers Market,Falafel Restaurant,Event Space,Ethiopian Restaurant,Electronics Store
7,M4S,Sandwich Place,Café,Italian Restaurant,Dessert Shop,Gym,Thai Restaurant,Farmers Market,Fast Food Restaurant,Salon / Barbershop,Restaurant
8,M4T,Park,Gym,Grocery Store,Playground,Thai Restaurant,Yoga Studio,Eastern European Restaurant,Doner Restaurant,Donut Shop,Dumpling Restaurant
9,M4V,Light Rail Station,Coffee Shop,Liquor Store,Supermarket,Donut Shop,Fish & Chips Shop,Fast Food Restaurant,Farmers Market,Falafel Restaurant,Event Space
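
The column labels in cell In [19] work for ten venues, but the same pattern would produce "11st" style labels only by luck of the fallback, and "21th" for larger counts. A small helper, sketched here under the hypothetical name `ordinal`, builds the suffix explicitly.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1st, 2nd, 3rd, 4th, 11th, 21st."""
    if 10 <= n % 100 <= 20:
        suffix = 'th'  # 11th, 12th, 13th and the rest of the teens
    else:
        suffix = {1: 'st', 2: 'nd', 3: 'rd'}.get(n % 10, 'th')
    return '{}{}'.format(n, suffix)


# Same column layout as the notebook: Postcode plus ten ranked venue columns.
venue_columns = ['Postcode'] + ['{} Most Common Venue'.format(ordinal(i)) for i in range(1, 11)]
print(venue_columns)
```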


I think five clusters is a reasonable way to group the data.

In [20]:
kclusters = 5
toronto_grouped_clustering = toronto_grouped.drop('Postcode', axis=1)  # axis must be a keyword in recent pandas
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(toronto_grouped_clustering)
kmeans.labels_

array([0, 0, 0, 0, 2, 0, 1, 0, 1, 3, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       4, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 2, 0, 0, 0, 0])
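
The number of clusters was picked by judgment here. One way to sanity-check such a choice, which this notebook does not do, is to compare silhouette scores for several values of k. The sketch below uses synthetic two-dimensional data with three obvious groups, not the Toronto frequencies, so it only illustrates the procedure.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Synthetic stand-in for toronto_grouped_clustering: three well-separated blobs.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]])
X = np.vstack([c + 0.3 * rng.standard_normal((20, 2)) for c in centers])

# Fit k-means for each candidate k and record the mean silhouette score.
scores = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels)

best_k = max(scores, key=scores.get)
print(scores)
print('Best k by silhouette:', best_k)
```

On the real data the same loop would run on `toronto_grouped_clustering`; a higher score suggests tighter, better-separated clusters.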

It seems that the postal code M5N has no venue information, so I dropped it. This may be because of the period in which the request was made, or because of some algorithm error.
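
A quick way to confirm which postal codes came back without venues is to compare the original table with the grouped one. This is a minimal sketch on toy frames that mimic the `Postcode` columns of `df_tor` and `toronto_grouped`; the values are illustrative.

```python
import pandas as pd

# Toy stand-ins: df_tor lists every postcode, toronto_grouped only those with venues.
toy_tor = pd.DataFrame({'Postcode': ['M4E', 'M5N', 'M5P']})
toy_grouped = pd.DataFrame({'Postcode': ['M4E', 'M5P']})

# Postcodes present in the source table but absent after grouping.
missing = toy_tor.loc[~toy_tor['Postcode'].isin(toy_grouped['Postcode']), 'Postcode'].tolist()
print(missing)
```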

In [21]:
postcode_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)
toronto_merged = df_tor
toronto_merged = toronto_merged.join(postcode_venues_sorted.set_index('Postcode'), on='Postcode')
toronto_merged.dropna(inplace = True)
toronto_merged

Unnamed: 0,Postcode,Borough,Neighbourhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M4E,East Toronto,The Beaches,43.6784,-79.2941,0.0,Pub,Health Food Store,Neighborhood,Bakery,Trail,Gastropub,Event Space,Ethiopian Restaurant,Electronics Store,Dog Run
1,M4K,East Toronto,"The Danforth West, Riverdale",43.6803,-79.3538,0.0,Greek Restaurant,Restaurant,Ice Cream Shop,Coffee Shop,Italian Restaurant,Yoga Studio,Café,Pub,Bubble Tea Shop,Brewery
2,M4L,East Toronto,"The Beaches West, India Bazaar",43.6693,-79.3155,0.0,Sandwich Place,Italian Restaurant,Ice Cream Shop,Burrito Place,Burger Joint,Fast Food Restaurant,Brewery,Fish & Chips Shop,Food & Drink Shop,Steakhouse
3,M4M,East Toronto,Studio District,43.6561,-79.3406,0.0,Coworking Space,Gym,Garden Center,Baseball Field,Coffee Shop,Diner,Park,Performing Arts Venue,Cupcake Shop,Doner Restaurant
4,M4N,Central Toronto,Lawrence Park,43.7301,-79.3935,2.0,Photography Studio,Park,Yoga Studio,Dog Run,Fast Food Restaurant,Farmers Market,Falafel Restaurant,Event Space,Ethiopian Restaurant,Electronics Store
5,M4P,Central Toronto,Davisville North,43.7135,-79.3887,0.0,Convenience Store,Park,Dog Run,Gym,Breakfast Spot,Food & Drink Shop,Clothing Store,Yoga Studio,Ethiopian Restaurant,Eastern European Restaurant
6,M4R,Central Toronto,North Toronto West,43.7143,-79.4065,1.0,Garden,Gym Pool,Park,Playground,Yoga Studio,Farmers Market,Falafel Restaurant,Event Space,Ethiopian Restaurant,Electronics Store
7,M4S,Central Toronto,Davisville,43.702,-79.3853,0.0,Sandwich Place,Café,Italian Restaurant,Dessert Shop,Gym,Thai Restaurant,Farmers Market,Fast Food Restaurant,Salon / Barbershop,Restaurant
8,M4T,Central Toronto,"Moore Park, Summerhill East",43.6899,-79.3853,1.0,Park,Gym,Grocery Store,Playground,Thai Restaurant,Yoga Studio,Eastern European Restaurant,Doner Restaurant,Donut Shop,Dumpling Restaurant
9,M4V,Central Toronto,"Deer Park, Forest Hill SE, Rathnelly, South Hill, Summerhill West",43.6861,-79.4025,3.0,Light Rail Station,Coffee Shop,Liquor Store,Supermarket,Donut Shop,Fish & Chips Shop,Fast Food Restaurant,Farmers Market,Falafel Restaurant,Event Space


Let's plot the new map with the cluster segmentation.

In [22]:
map_clusters = fl.Map(location=[latitude, longitude], zoom_start=11)

x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

markers_colors = []
for lat, lon, poi, cluster in zip(toronto_merged['Latitude'], toronto_merged['Longitude'], toronto_merged['Postcode'], toronto_merged['Cluster Labels']):
    label = fl.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    fl.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[int(cluster-1)],
        fill=True,
        fill_color=rainbow[int(cluster-1)],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters
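
In the cell above, `ys` is used only for its length, and `rainbow[int(cluster-1)]` gives cluster 0 the last colour because Python wraps the index -1 around. The result is still one colour per cluster, but a direct mapping is easier to read. A sketch of that alternative, using the same colormap (the names `palette` and `cluster_colour` are introduced here for illustration):

```python
import numpy as np
import matplotlib.cm as cm
import matplotlib.colors as colors

kclusters = 5

# One hex colour per cluster label, evenly spaced along the rainbow colormap.
palette = [colors.rgb2hex(c) for c in cm.rainbow(np.linspace(0, 1, kclusters))]

# Index by the label itself so that cluster 0 gets the first colour.
cluster_colour = {label: palette[label] for label in range(kclusters)}
print(cluster_colour)
```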

Next, I check each cluster to understand the segmentation.

In [23]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 0, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,East Toronto,0.0,Pub,Health Food Store,Neighborhood,Bakery,Trail,Gastropub,Event Space,Ethiopian Restaurant,Electronics Store,Dog Run
1,East Toronto,0.0,Greek Restaurant,Restaurant,Ice Cream Shop,Coffee Shop,Italian Restaurant,Yoga Studio,Café,Pub,Bubble Tea Shop,Brewery
2,East Toronto,0.0,Sandwich Place,Italian Restaurant,Ice Cream Shop,Burrito Place,Burger Joint,Fast Food Restaurant,Brewery,Fish & Chips Shop,Food & Drink Shop,Steakhouse
3,East Toronto,0.0,Coworking Space,Gym,Garden Center,Baseball Field,Coffee Shop,Diner,Park,Performing Arts Venue,Cupcake Shop,Doner Restaurant
5,Central Toronto,0.0,Convenience Store,Park,Dog Run,Gym,Breakfast Spot,Food & Drink Shop,Clothing Store,Yoga Studio,Ethiopian Restaurant,Eastern European Restaurant
7,Central Toronto,0.0,Sandwich Place,Café,Italian Restaurant,Dessert Shop,Gym,Thai Restaurant,Farmers Market,Fast Food Restaurant,Salon / Barbershop,Restaurant
11,Downtown Toronto,0.0,Coffee Shop,Pizza Place,Bakery,Restaurant,Café,Italian Restaurant,Gift Shop,Indian Restaurant,Intersection,Breakfast Spot
12,Downtown Toronto,0.0,Coffee Shop,Gay Bar,Japanese Restaurant,Restaurant,Yoga Studio,Theater,Gastropub,Hotel,Diner,Sushi Restaurant
13,Downtown Toronto,0.0,Coffee Shop,Restaurant,Breakfast Spot,Gym / Fitness Center,Beer Store,Italian Restaurant,Bakery,Thai Restaurant,Theater,Food Truck
14,Downtown Toronto,0.0,Coffee Shop,Clothing Store,Cosmetics Shop,Fast Food Restaurant,Café,Middle Eastern Restaurant,Ramen Restaurant,Tea Room,Plaza,Pizza Place


In [24]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 1, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
6,Central Toronto,1.0,Garden,Gym Pool,Park,Playground,Yoga Studio,Farmers Market,Falafel Restaurant,Event Space,Ethiopian Restaurant,Electronics Store
8,Central Toronto,1.0,Park,Gym,Grocery Store,Playground,Thai Restaurant,Yoga Studio,Eastern European Restaurant,Doner Restaurant,Donut Shop,Dumpling Restaurant
10,Downtown Toronto,1.0,Park,Playground,Grocery Store,Candy Store,Yoga Studio,Doner Restaurant,Fast Food Restaurant,Farmers Market,Falafel Restaurant,Event Space
30,Downtown Toronto,1.0,Grocery Store,Café,Park,Candy Store,Baby Store,Playground,Coffee Shop,Deli / Bodega,Dance Studio,Farmers Market


In [25]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 2, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
4,Central Toronto,2.0,Photography Studio,Park,Yoga Studio,Dog Run,Fast Food Restaurant,Farmers Market,Falafel Restaurant,Event Space,Ethiopian Restaurant,Electronics Store
34,West Toronto,2.0,Park,Yoga Studio,Dog Run,Fish & Chips Shop,Fast Food Restaurant,Farmers Market,Falafel Restaurant,Event Space,Ethiopian Restaurant,Electronics Store


In [26]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 3, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
9,Central Toronto,3.0,Light Rail Station,Coffee Shop,Liquor Store,Supermarket,Donut Shop,Fish & Chips Shop,Fast Food Restaurant,Farmers Market,Falafel Restaurant,Event Space


In [27]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 4, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
23,Central Toronto,4.0,Park,Lawyer,Trail,Yoga Studio,Dog Run,Fast Food Restaurant,Farmers Market,Falafel Restaurant,Event Space,Ethiopian Restaurant


In [28]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 0]['1st Most Common Venue'].value_counts()

Coffee Shop           14
Café                  4 
Sandwich Place        3 
Pub                   1 
Park                  1 
Pharmacy              1 
Music Venue           1 
Greek Restaurant      1 
Chinese Restaurant    1 
Bar                   1 
Convenience Store     1 
Coworking Space       1 
Name: 1st Most Common Venue, dtype: int64

In [29]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 0]['2nd Most Common Venue'].value_counts()

Café                   7
Restaurant             4
Coffee Shop            3
Italian Restaurant     2
Pizza Place            2
Bakery                 2
Park                   2
Clothing Store         1
Yoga Studio            1
Gay Bar                1
Food & Drink Shop      1
American Restaurant    1
Gym                    1
Health Food Store      1
Harbor / Marina        1
Name: 2nd Most Common Venue, dtype: int64
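
Counting the most common venue per row is one view of a cluster. Another, sketched below on made-up numbers, is to average the category frequencies inside each cluster and list the strongest categories. On the real data this would start from `toronto_grouped` with the k-means labels attached; the toy frame and its values are assumptions for illustration.

```python
import pandas as pd

# Toy stand-in: category frequencies per postcode plus a cluster label.
toy = pd.DataFrame({
    'Cluster Labels': [0, 0, 1, 1],
    'Coffee Shop': [0.30, 0.20, 0.00, 0.05],
    'Park': [0.05, 0.00, 0.50, 0.40],
    'Playground': [0.00, 0.00, 0.25, 0.20],
})

# Mean frequency of each category inside each cluster.
profile = toy.groupby('Cluster Labels').mean()

# Two strongest categories per cluster.
top_per_cluster = {label: row.nlargest(2).index.tolist() for label, row in profile.iterrows()}
print(top_per_cluster)
```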

### Summary

I believe the cluster 0 locations are considerably more attractive in terms of food and hotel choices, which makes this an area suitable for tourism. Clusters 1 to 4 could be merged into a single group, because they appear to be places with more parks and outdoor facilities, as well as some gyms and pet stores. They are therefore more likely to be used by the local population and by residents of more rural areas, and are good candidates for family outings and exercise.

I apologize for my English; I am Brazilian and a beginner in data science.

Diego N. Vilela