# Explore Neighbourhoods in Toronto

### Task 1: Scrape postal code data from a Wikipedia page, wrangle and clean it, and read it into a pandas dataframe

*Packages: requests, BeautifulSoup, pandas, NumPy, lxml, geopy, scikit-learn (KMeans), matplotlib, folium*

In [1]:
# HTTP requests, dataframes, and mapping
import sys
import requests
import pandas as pd
import folium

# Matplotlib colour utilities used for the cluster map
import matplotlib.cm as cm
import matplotlib.colors as colors

# Uncomment to install BeautifulSoup and the lxml parser into the current environment
#!conda install --yes --prefix {sys.prefix} beautifulsoup4
#!conda install --yes --prefix {sys.prefix} lxml

# import k-means from clustering stage
from sklearn.cluster import KMeans

**Connect to the Wikipedia page by specifying its URL, then parse the page with the *BeautifulSoup* constructor.**  
*Note: The wiki page linked in the course notes has since been updated, so the older version of the page, which matches the table in the course notes, was retrieved from the archived wiki page.*

In [6]:
from bs4 import BeautifulSoup

# Fetch the archived revision of the page; .text holds the raw HTML
html = requests.get('https://en.wikipedia.org/w/index.php?title=List_of_postal_codes_of_Canada:_M&oldid=945633050').text
soup = BeautifulSoup(html, 'lxml')
# Output too long hence not displayed 

**Review the HTML and look for the data we need. The data is in a table. Since the webpage has only two tables, it is quicker to parse all table objects and select the first one.**

In [7]:
table = soup.find_all('table')[0]
# Output too long hence not displayed 

**Convert the HTML table to a dataframe using pandas library**

In [8]:
df=pd.read_html(str(table))[0]
type(df[0:5])

pandas.core.frame.DataFrame

In [9]:
df.head()

Unnamed: 0,Postcode,Borough,Neighbourhood
0,M1A,Not assigned,Not assigned
1,M2A,Not assigned,Not assigned
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,Harbourfront


**Remove Boroughs that are 'Not assigned'.**

In [10]:
df = df[df.Borough != 'Not assigned']
df.head()

Unnamed: 0,Postcode,Borough,Neighbourhood
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,Harbourfront
5,M6A,North York,Lawrence Heights
6,M6A,North York,Lawrence Manor


In [11]:
# reset_index returns a new dataframe, so assign the result back
df = df.reset_index(drop=True)
df.head()

Unnamed: 0,Postcode,Borough,Neighbourhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,Harbourfront
3,M6A,North York,Lawrence Heights
4,M6A,North York,Lawrence Manor


**Multiple neighbourhoods can be assigned to one postal code. Neighbourhoods that share a postal code should be combined into a single row, separated by commas. The dataframe is therefore grouped by *Postcode* and *Borough*, and *Neighbourhood* is aggregated by joining the names with a comma.**

In [12]:
df_grouped = df.groupby(['Postcode','Borough'], as_index=False).agg({'Neighbourhood': ','.join})
df_grouped.head()

Unnamed: 0,Postcode,Borough,Neighbourhood
0,M1B,Scarborough,"Rouge,Malvern"
1,M1C,Scarborough,"Highland Creek,Rouge Hill,Port Union"
2,M1E,Scarborough,"Guildwood,Morningside,West Hill"
3,M1G,Scarborough,Woburn
4,M1H,Scarborough,Cedarbrae


In [13]:
# update the column name to match table in assignment page
df_grouped.columns = ['Postalcode', 'Borough','Neighbourhood']
df_grouped.head()

Unnamed: 0,Postalcode,Borough,Neighbourhood
0,M1B,Scarborough,"Rouge,Malvern"
1,M1C,Scarborough,"Highland Creek,Rouge Hill,Port Union"
2,M1E,Scarborough,"Guildwood,Morningside,West Hill"
3,M1G,Scarborough,Woburn
4,M1H,Scarborough,Cedarbrae


In [14]:
df_grouped.shape

(103, 3)

**Load the latitude and longitude data downloaded from the course page**

In [17]:
df_latlong = pd.read_csv('Geospatial_Coordinates.csv')
df_latlong.head()

Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


**Merge latitude and longitude data with postal code data**  
*Unlike the New York neighborhood analysis, the latitude and longitude provided here are for postal codes, each of which covers one or more neighbourhoods. The following analysis therefore explores and clusters areas based on postal codes.*

In [19]:
# Join on the postal code itself rather than relying on both frames sharing the same row order
df_grouped = df_grouped.merge(df_latlong, how='left', left_on='Postalcode', right_on='Postal Code').drop(columns='Postal Code')
df_grouped.head()

Unnamed: 0,Postalcode,Borough,Neighbourhood,Latitude,Longitude
0,M1B,Scarborough,"Rouge,Malvern",43.806686,-79.194353
1,M1C,Scarborough,"Highland Creek,Rouge Hill,Port Union",43.784535,-79.160497
2,M1E,Scarborough,"Guildwood,Morningside,West Hill",43.763573,-79.188711
3,M1G,Scarborough,Woburn,43.770992,-79.216917
4,M1H,Scarborough,Cedarbrae,43.773136,-79.239476
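
As a cross-check on the merge step, the sketch below joins two small made-up frames on the postal code and lists any code that ends up without coordinates. The toy data and the names `codes`, `coords`, `merged`, and `missing` are illustrative assumptions rather than the notebook's variables; `validate` and `indicator` are standard `DataFrame.merge` parameters.

```python
import pandas as pd

# Toy stand-ins for the grouped postal codes and the coordinates file (illustrative values)
codes = pd.DataFrame({
    'Postalcode': ['M1B', 'M1C', 'M9Z'],
    'Borough': ['Scarborough', 'Scarborough', 'Example Borough'],
})
coords = pd.DataFrame({
    'Postal Code': ['M1C', 'M1B'],          # deliberately in a different row order
    'Latitude': [43.784535, 43.806686],
    'Longitude': [-79.160497, -79.194353],
})

# A left join keeps every postal code; validate raises if a key repeats on either side
merged = codes.merge(
    coords,
    how='left',
    left_on='Postalcode',
    right_on='Postal Code',
    validate='one_to_one',
    indicator=True,
).drop(columns='Postal Code')

# Rows marked 'left_only' found no matching coordinates
missing = merged.loc[merged['_merge'] == 'left_only', 'Postalcode'].tolist()
print(missing)
```

An empty `missing` list on the real data would indicate that every postal code received a latitude and longitude.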


### Task 2: Create a map of Toronto City with area locations based on postal codes

In [20]:
# create map of Toronto using latitude and longitude values
from geopy.geocoders import Nominatim
df_neigh = df_grouped
address = 'Toronto, ON'

geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Toronto City are {}, {}.'.format(latitude, longitude))
map_toronto = folium.Map(location=[latitude, longitude], zoom_start=10)

The geographical coordinates of Toronto City are 43.653963, -79.387207.


In [21]:
# add markers to map
for lat, lng, borough, neighborhood, postcode in zip(df_neigh['Latitude'], df_neigh['Longitude'], df_neigh['Borough'], df_neigh['Neighbourhood'], df_neigh['Postalcode']):
    label = '{}, {}, {}'.format(postcode, neighborhood, borough)
    # parse_html=True makes folium escape the label text in the popup
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='black',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.5).add_to(map_toronto)
    
map_toronto

**Analyse a subset of areas by keeping only boroughs with 'Toronto' in their names**

In [28]:
df_toronto = df_neigh[df_neigh['Borough'].str.contains('Toronto')].reset_index(drop=True)
df_toronto.shape

(39, 5)

### Task 3: Explore areas using Foursquare API
**Access Foursquare API**

In [24]:
CLIENT_ID = '1VRUZN2ID4JNFQF3E0CAZK4IJBS5XOG5N54XRMIPPKW0J42K' # your Foursquare ID
# Read the secret from a local file; the with-block closes the file and strip() drops a trailing newline
with open('CLIENT SECRET.txt', 'r') as secret_file:
    CLIENT_SECRET = secret_file.read().strip() # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)


Your credentials:
CLIENT_ID: 1VRUZN2ID4JNFQF3E0CAZK4IJBS5XOG5N54XRMIPPKW0J42K


**Function to get nearby venues**

In [25]:
def NearbyVenues(postalcodes, boroughs, neighbourhoods, latitudes, longitudes, radius=500):
    limit=100
    venues_list=[]
    for postalcode, borough, neighbourhood, lat, lng in zip(postalcodes,  boroughs, neighbourhoods ,latitudes, longitudes):
        print('{} - {} - {}'.format(postalcode,borough,neighbourhood))
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            limit)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            postalcode,
            borough,
            neighbourhood,
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Postalcode', 'Borough','Neighbourhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
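
The function above indexes straight into `["response"]['groups'][0]['items']` and `['categories'][0]`, so a response with no groups, or a venue with an empty category list, would raise an error. A minimal sketch of a more forgiving parser follows. The helper `extract_venue_rows`, its `default_category` parameter, and the sample payload are hypothetical; the nesting is assumed to match what the function above reads, and the payload is hand-written rather than real API output.

```python
def extract_venue_rows(payload, default_category='Unknown'):
    """Flatten an explore-style response dict into (name, lat, lng, category) tuples."""
    groups = payload.get('response', {}).get('groups', [])
    items = groups[0].get('items', []) if groups else []
    rows = []
    for item in items:
        venue = item['venue']
        categories = venue.get('categories') or []
        # Fall back to a placeholder when a venue lists no category
        category = categories[0]['name'] if categories else default_category
        rows.append((venue['name'],
                     venue['location']['lat'],
                     venue['location']['lng'],
                     category))
    return rows


# Hand-written payload mimicking the nesting used above (assumed shape)
sample = {'response': {'groups': [{'items': [
    {'venue': {'name': 'Example Cafe',
               'location': {'lat': 43.65, 'lng': -79.38},
               'categories': [{'name': 'Coffee Shop'}]}},
    {'venue': {'name': 'Unlabelled Spot',
               'location': {'lat': 43.66, 'lng': -79.39},
               'categories': []}},
]}]}}

rows = extract_venue_rows(sample)
empty = extract_venue_rows({'response': {}})
print(rows)
print(empty)
```

With a helper of this kind, a postal code that returns no venues would contribute zero rows instead of stopping the loop.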

**Get nearby venues for all postal codes in the data frame.**

In [29]:
toronto_venues = NearbyVenues(postalcodes=df_toronto['Postalcode'], boroughs=df_toronto['Borough'], 
                                   neighbourhoods=df_toronto['Neighbourhood'],
                                   latitudes=df_toronto['Latitude'],
                                   longitudes=df_toronto['Longitude']
                                  )

M4E - East Toronto - The Beaches
M4K - East Toronto - The Danforth West,Riverdale
M4L - East Toronto - The Beaches West,India Bazaar
M4M - East Toronto - Studio District
M4N - Central Toronto - Lawrence Park
M4P - Central Toronto - Davisville North
M4R - Central Toronto - North Toronto West
M4S - Central Toronto - Davisville
M4T - Central Toronto - Moore Park,Summerhill East
M4V - Central Toronto - Deer Park,Forest Hill SE,Rathnelly,South Hill,Summerhill West
M4W - Downtown Toronto - Rosedale
M4X - Downtown Toronto - Cabbagetown,St. James Town
M4Y - Downtown Toronto - Church and Wellesley
M5A - Downtown Toronto - Harbourfront
M5B - Downtown Toronto - Ryerson,Garden District
M5C - Downtown Toronto - St. James Town
M5E - Downtown Toronto - Berczy Park
M5G - Downtown Toronto - Central Bay Street
M5H - Downtown Toronto - Adelaide,King,Richmond
M5J - Downtown Toronto - Harbourfront East,Toronto Islands,Union Station
M5K - Downtown Toronto - Design Exchange,Toronto Dominion Centre
M5L - Down

In [27]:
print(toronto_venues.shape)
toronto_venues.head()

(1726, 9)


Unnamed: 0,Postalcode,Borough,Neighbourhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,M4E,East Toronto,The Beaches,43.676357,-79.293031,Glen Manor Ravine,43.676821,-79.293942,Trail
1,M4E,East Toronto,The Beaches,43.676357,-79.293031,The Big Carrot Natural Food Market,43.678879,-79.297734,Health Food Store
2,M4E,East Toronto,The Beaches,43.676357,-79.293031,Grover Pub and Grub,43.679181,-79.297215,Pub
3,M4E,East Toronto,The Beaches,43.676357,-79.293031,Upper Beaches,43.680563,-79.292869,Neighborhood
4,M4K,East Toronto,"The Danforth West,Riverdale",43.679557,-79.352188,Pantheon,43.677621,-79.351434,Greek Restaurant


**Count the number of venues for each postal code**

In [33]:
toronto_venues.groupby('Postalcode').count()

Postalcode,Borough,Neighbourhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
M4E,4,4,4,4,4,4,4,4
M4K,41,41,41,41,41,41,41,41
M4L,20,20,20,20,20,20,20,20
M4M,41,41,41,41,41,41,41,41
M4N,3,3,3,3,3,3,3,3
M4P,7,7,7,7,7,7,7,7
M4R,20,20,20,20,20,20,20,20
M4S,38,38,38,38,38,38,38,38
M4T,4,4,4,4,4,4,4,4
M4V,15,15,15,15,15,15,15,15


In [32]:
print('There are {} unique categories.'.format(len(toronto_venues['Venue Category'].unique())))

There are 241 unique categories.


In [34]:
# one hot encoding
toronto_onehot = pd.get_dummies(toronto_venues[['Venue Category']], prefix="", prefix_sep="")

# add the postal code, borough, and neighbourhood columns back to the dataframe
toronto_onehot['Postalcode'] = toronto_venues['Postalcode']
toronto_onehot['Borough'] = toronto_venues['Borough']
toronto_onehot['Neighbourhood'] = toronto_venues['Neighbourhood']
# move those three label columns to the front
fixed_columns = list(toronto_onehot.columns[-3:]) + list(toronto_onehot.columns[:-3])
toronto_onehot = toronto_onehot[fixed_columns]

toronto_onehot.head()

Unnamed: 0,Postalcode,Borough,Neighbourhood,Afghan Restaurant,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,...,Train Station,Transportation Service,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Wine Bar,Wine Shop,Wings Joint,Women's Store,Yoga Studio
0,M4E,East Toronto,The Beaches,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,M4E,East Toronto,The Beaches,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,M4E,East Toronto,The Beaches,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,M4E,East Toronto,The Beaches,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,M4K,East Toronto,"The Danforth West,Riverdale",0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


**Group rows by postal code, taking the mean of the frequency of occurrence of each category**

In [35]:
print(toronto_onehot.shape)
toronto_grouped = toronto_onehot.groupby(['Postalcode','Borough','Neighbourhood']).mean().reset_index()
toronto_grouped

(1726, 244)


Unnamed: 0,Postalcode,Borough,Neighbourhood,Afghan Restaurant,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,...,Train Station,Transportation Service,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Wine Bar,Wine Shop,Wings Joint,Women's Store,Yoga Studio
0,M4E,East Toronto,The Beaches,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,M4K,East Toronto,"The Danforth West,Riverdale",0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02439
2,M4L,East Toronto,"The Beaches West,India Bazaar",0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,M4M,East Toronto,Studio District,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.02439,0.0,0.0,0.0,0.02439
4,M4N,Central Toronto,Lawrence Park,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,M4P,Central Toronto,Davisville North,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
6,M4R,Central Toronto,North Toronto West,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05
7,M4S,Central Toronto,Davisville,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.026316,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
8,M4T,Central Toronto,"Moore Park,Summerhill East",0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9,M4V,Central Toronto,"Deer Park,Forest Hill SE,Rathnelly,South Hill,...",0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.066667,0.0,0.0,0.0,0.0,0.0
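
To see why the grouped mean can be read as a frequency, the following toy example one-hot encodes a handful of made-up venues and averages the 0/1 columns per postal code. The data and the names `venues`, `onehot`, and `freq` are illustrative assumptions; passing `dtype=int` to `get_dummies` is a choice made here to keep the indicators numeric.

```python
import pandas as pd

# Toy venue list: two postal codes with made-up venue categories
venues = pd.DataFrame({
    'Postalcode': ['M4E', 'M4E', 'M4E', 'M4E', 'M4K', 'M4K'],
    'Venue Category': ['Pub', 'Trail', 'Pub', 'Park', 'Park', 'Park'],
})

# One indicator column per category (dtype=int keeps the values as 0/1)
onehot = pd.get_dummies(venues['Venue Category'], dtype=int)
onehot.insert(0, 'Postalcode', venues['Postalcode'])

# The mean of a 0/1 column is the share of that postal code's venues in the category
freq = onehot.groupby('Postalcode').mean()
print(freq)
```

Each row of `freq` sums to 1, so a value such as 0.5 for `Pub` in the toy M4E row means that half of the venues listed for that postal code are pubs.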


**Print each postalcode along with the top 10 most common venues**

In [36]:
top_venues = 10

for PC in toronto_grouped['Postalcode']:
    print("----"+PC+"----")
    temp = toronto_grouped.loc[toronto_grouped['Postalcode'] == PC,'Afghan Restaurant':'Yoga Studio'].T.reset_index()
    temp.columns = ['venue','freq']
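    # NOTE: iloc[1:] skips the first category row ('Afghan Restaurant'), so it never appears in the lists below
    temp = temp.iloc[1:]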
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(top_venues))
    print('\n')

----M4E----
                             venue  freq
0                              Pub  0.25
1                Health Food Store  0.25
2                     Neighborhood  0.25
3                            Trail  0.25
4                          Airport  0.00
5                      Music Venue  0.00
6                           Museum  0.00
7                    Movie Theater  0.00
8              Monument / Landmark  0.00
9  Molecular Gastronomy Restaurant  0.00


----M4K----
                     venue  freq
0         Greek Restaurant  0.20
1              Coffee Shop  0.10
2       Italian Restaurant  0.07
3   Furniture / Home Store  0.05
4           Ice Cream Shop  0.05
5                Bookstore  0.05
6                     Café  0.02
7  Fruit & Vegetable Store  0.02
8       Frozen Yogurt Shop  0.02
9          Bubble Tea Shop  0.02


----M4L----
                  venue  freq
0                  Park  0.10
1           Pizza Place  0.05
2    Italian Restaurant  0.05
3               Brewery  0

**To convert the above list into a data frame**
1. Sort venues in descending order of frequency
2. Append the top 15 venues to each postal code

In [37]:
def return_most_common_venues(row, top_venues):
    row_categories = row.iloc[3:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:top_venues]
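
The function above ranks every category, including those with zero frequency, so a postal code with only a few venues (M4E has four in the counts above) ends up with zero-frequency categories in its lower-ranked slots. One possible variant, sketched here on the assumption that blank padding is acceptable, keeps only categories that actually occur. The name `top_nonzero_venues`, its parameters, and the sample row are hypothetical.

```python
import pandas as pd

def top_nonzero_venues(row, top_venues, n_label_fields=3, filler=''):
    """Rank categories by frequency, padding with `filler` instead of zero-frequency names."""
    freqs = row.iloc[n_label_fields:].astype(float)
    ranked = freqs[freqs > 0].sort_values(ascending=False).index.tolist()[:top_venues]
    return ranked + [filler] * (top_venues - len(ranked))

# One made-up row: three label fields followed by category frequencies
row = pd.Series({
    'Postalcode': 'M0X',
    'Borough': 'Example Borough',
    'Neighbourhood': 'Example Area',
    'Diner': 0.0,
    'Park': 0.0,
    'Pub': 0.75,
    'Trail': 0.25,
})

top = top_nonzero_venues(row, 3)
print(top)
```

Whether to pad with blanks or keep the zero-frequency names is a presentation choice; the clustering step works on the frequency table and is unaffected either way.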

In [38]:
import numpy as np

top_venues = 15
indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Postalcode','Borough','Neighbourhood']
for ind in np.arange(top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        # positions beyond the third take the 'th' suffix
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
postalcode_venues_sorted = pd.DataFrame(columns=columns)
postalcode_venues_sorted[['Postalcode','Borough','Neighbourhood']] = toronto_grouped[['Postalcode','Borough','Neighbourhood']]

for ind in np.arange(toronto_grouped.shape[0]):
    postalcode_venues_sorted.iloc[ind, 3:] = return_most_common_venues(toronto_grouped.iloc[ind, :], top_venues)

postalcode_venues_sorted

Unnamed: 0,Postalcode,Borough,Neighbourhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,11th Most Common Venue,12th Most Common Venue,13th Most Common Venue,14th Most Common Venue,15th Most Common Venue
0,M4E,East Toronto,The Beaches,Neighborhood,Trail,Health Food Store,Pub,Dessert Shop,Diner,Discount Store,Distribution Center,Dog Run,Yoga Studio,Department Store,Donut Shop,Dumpling Restaurant,Eastern European Restaurant,Electronics Store
1,M4K,East Toronto,"The Danforth West,Riverdale",Greek Restaurant,Coffee Shop,Italian Restaurant,Ice Cream Shop,Furniture / Home Store,Bookstore,Yoga Studio,Caribbean Restaurant,Pub,Spa,Juice Bar,Dessert Shop,Café,Restaurant,Brewery
2,M4L,East Toronto,"The Beaches West,India Bazaar",Park,Coffee Shop,Pub,Sandwich Place,Board Shop,Burrito Place,Fast Food Restaurant,Restaurant,Italian Restaurant,Steakhouse,Fish & Chips Shop,Sushi Restaurant,Ice Cream Shop,Brewery,Liquor Store
3,M4M,East Toronto,Studio District,Café,Coffee Shop,American Restaurant,Bakery,Brewery,Italian Restaurant,Yoga Studio,Fish Market,Pet Store,Park,Neighborhood,Middle Eastern Restaurant,Latin American Restaurant,Ice Cream Shop,Gay Bar
4,M4N,Central Toronto,Lawrence Park,Park,Swim School,Bus Line,Yoga Studio,Diner,Event Space,Ethiopian Restaurant,Empanada Restaurant,Electronics Store,Eastern European Restaurant,Dumpling Restaurant,Donut Shop,Doner Restaurant,Dog Run,Distribution Center
5,M4P,Central Toronto,Davisville North,Gym,Food & Drink Shop,Department Store,Park,Breakfast Spot,Sandwich Place,Hotel,Dog Run,Doner Restaurant,Dumpling Restaurant,Donut Shop,Discount Store,Eastern European Restaurant,Electronics Store,Empanada Restaurant
6,M4R,Central Toronto,North Toronto West,Clothing Store,Coffee Shop,Yoga Studio,Sporting Goods Shop,Café,Chinese Restaurant,Dessert Shop,Diner,Fast Food Restaurant,Mexican Restaurant,Park,Pet Store,Rental Car Location,Restaurant,Salon / Barbershop
7,M4S,Central Toronto,Davisville,Pizza Place,Sandwich Place,Dessert Shop,Gym,Sushi Restaurant,Coffee Shop,Café,Italian Restaurant,Asian Restaurant,Diner,Japanese Restaurant,Restaurant,Indian Restaurant,Flower Shop,Dance Studio
8,M4T,Central Toronto,"Moore Park,Summerhill East",Restaurant,Park,Tennis Court,Playground,Empanada Restaurant,Electronics Store,Eastern European Restaurant,Dumpling Restaurant,Deli / Bodega,Doner Restaurant,Dog Run,Distribution Center,Discount Store,Diner,Dessert Shop
9,M4V,Central Toronto,"Deer Park,Forest Hill SE,Rathnelly,South Hill,...",Pub,Coffee Shop,Fried Chicken Joint,Liquor Store,Sports Bar,Restaurant,Supermarket,Sushi Restaurant,Bank,Light Rail Station,Pizza Place,American Restaurant,Vietnamese Restaurant,Coworking Space,Discount Store
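
The column labels above come from a three-item suffix list with a fallback to 'th', which is correct for 15 columns but would mislabel positions such as 21 or 22 if `top_venues` were raised. A small general-purpose helper could look like the sketch below; `ordinal` is a hypothetical function introduced only for illustration.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st', 11 -> '11th'."""
    if 10 <= n % 100 <= 20:
        # 11th, 12th, 13th and the rest of the teens always take 'th'
        suffix = 'th'
    else:
        suffix = {1: 'st', 2: 'nd', 3: 'rd'}.get(n % 10, 'th')
    return '{}{}'.format(n, suffix)

# Build the same 15 column labels used in the table above
columns = ['{} Most Common Venue'.format(ordinal(i)) for i in range(1, 16)]
print(columns[0], '|', columns[10], '|', columns[-1])
```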


### Task 4: Cluster postal code areas based on venues

In [39]:
# set number of clusters
kclusters = 7

toronto_grouped_clustering = toronto_grouped.drop(['Postalcode','Borough','Neighbourhood'], axis=1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(toronto_grouped_clustering)

# check cluster labels generated for each row in the dataframe
print(kmeans.labels_[0:10])
# add clustering labels
postalcode_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

toronto_merged = df_toronto.drop(['Borough','Neighbourhood'],axis=1)

# join the cluster labels and top venues onto df_toronto, which carries the latitude/longitude of each postal code
toronto_merged = toronto_merged.join(postalcode_venues_sorted.set_index('Postalcode'), on=['Postalcode'])

toronto_merged.head() 

[4 1 1 1 0 1 1 1 5 1]


Unnamed: 0,Postalcode,Latitude,Longitude,Cluster Labels,Borough,Neighbourhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,...,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,11th Most Common Venue,12th Most Common Venue,13th Most Common Venue,14th Most Common Venue,15th Most Common Venue
0,M4E,43.676357,-79.293031,4,East Toronto,The Beaches,Neighborhood,Trail,Health Food Store,Pub,...,Diner,Discount Store,Distribution Center,Dog Run,Yoga Studio,Department Store,Donut Shop,Dumpling Restaurant,Eastern European Restaurant,Electronics Store
1,M4K,43.679557,-79.352188,1,East Toronto,"The Danforth West,Riverdale",Greek Restaurant,Coffee Shop,Italian Restaurant,Ice Cream Shop,...,Bookstore,Yoga Studio,Caribbean Restaurant,Pub,Spa,Juice Bar,Dessert Shop,Café,Restaurant,Brewery
2,M4L,43.668999,-79.315572,1,East Toronto,"The Beaches West,India Bazaar",Park,Coffee Shop,Pub,Sandwich Place,...,Burrito Place,Fast Food Restaurant,Restaurant,Italian Restaurant,Steakhouse,Fish & Chips Shop,Sushi Restaurant,Ice Cream Shop,Brewery,Liquor Store
3,M4M,43.659526,-79.340923,1,East Toronto,Studio District,Café,Coffee Shop,American Restaurant,Bakery,...,Italian Restaurant,Yoga Studio,Fish Market,Pet Store,Park,Neighborhood,Middle Eastern Restaurant,Latin American Restaurant,Ice Cream Shop,Gay Bar
4,M4N,43.72802,-79.38879,0,Central Toronto,Lawrence Park,Park,Swim School,Bus Line,Yoga Studio,...,Event Space,Ethiopian Restaurant,Empanada Restaurant,Electronics Store,Eastern European Restaurant,Dumpling Restaurant,Donut Shop,Doner Restaurant,Dog Run,Distribution Center


**Visualize clusters**

In [40]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters: one evenly spaced rainbow color per cluster
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
for lat, lon, PC, cluster, borough in zip(toronto_merged['Latitude'], toronto_merged['Longitude'], toronto_merged['Postalcode'], toronto_merged['Cluster Labels'], toronto_merged['Borough']):
    label = folium.Popup('{}, {}, Cluster {}'.format(PC, borough, cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

**Number of areas in each cluster**

In [42]:
print(toronto_merged.groupby('Cluster Labels').count()['Postalcode'])


Cluster Labels
0     1
1    33
2     1
3     1
4     1
5     1
6     1
Name: Postalcode, dtype: int64


**Explore each cluster**

Cluster 0 appears to be an area where restaurants and eateries are not the most common venues. Cluster 0 is the farthest from the Toronto city center.

In [43]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 0, toronto_merged.columns[[0] + list(range(6, toronto_merged.shape[1]))]]

Unnamed: 0,Postalcode,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,11th Most Common Venue,12th Most Common Venue,13th Most Common Venue,14th Most Common Venue,15th Most Common Venue
4,M4N,Park,Swim School,Bus Line,Yoga Studio,Diner,Event Space,Ethiopian Restaurant,Empanada Restaurant,Electronics Store,Eastern European Restaurant,Dumpling Restaurant,Donut Shop,Doner Restaurant,Dog Run,Distribution Center


Cluster 1 has the highest number of areas (33 of 39). Since the data set consists of Boroughs around Toronto city, these areas are very similar to each other in their most common types of venue: restaurants, cafes, and coffee shops.

In [44]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 1, toronto_merged.columns[[0] +[4]+ list(range(6, toronto_merged.shape[1]))]]

Unnamed: 0,Postalcode,Borough,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,11th Most Common Venue,12th Most Common Venue,13th Most Common Venue,14th Most Common Venue,15th Most Common Venue
1,M4K,East Toronto,Greek Restaurant,Coffee Shop,Italian Restaurant,Ice Cream Shop,Furniture / Home Store,Bookstore,Yoga Studio,Caribbean Restaurant,Pub,Spa,Juice Bar,Dessert Shop,Café,Restaurant,Brewery
2,M4L,East Toronto,Park,Coffee Shop,Pub,Sandwich Place,Board Shop,Burrito Place,Fast Food Restaurant,Restaurant,Italian Restaurant,Steakhouse,Fish & Chips Shop,Sushi Restaurant,Ice Cream Shop,Brewery,Liquor Store
3,M4M,East Toronto,Café,Coffee Shop,American Restaurant,Bakery,Brewery,Italian Restaurant,Yoga Studio,Fish Market,Pet Store,Park,Neighborhood,Middle Eastern Restaurant,Latin American Restaurant,Ice Cream Shop,Gay Bar
5,M4P,Central Toronto,Gym,Food & Drink Shop,Department Store,Park,Breakfast Spot,Sandwich Place,Hotel,Dog Run,Doner Restaurant,Dumpling Restaurant,Donut Shop,Discount Store,Eastern European Restaurant,Electronics Store,Empanada Restaurant
6,M4R,Central Toronto,Clothing Store,Coffee Shop,Yoga Studio,Sporting Goods Shop,Café,Chinese Restaurant,Dessert Shop,Diner,Fast Food Restaurant,Mexican Restaurant,Park,Pet Store,Rental Car Location,Restaurant,Salon / Barbershop
7,M4S,Central Toronto,Pizza Place,Sandwich Place,Dessert Shop,Gym,Sushi Restaurant,Coffee Shop,Café,Italian Restaurant,Asian Restaurant,Diner,Japanese Restaurant,Restaurant,Indian Restaurant,Flower Shop,Dance Studio
9,M4V,Central Toronto,Pub,Coffee Shop,Fried Chicken Joint,Liquor Store,Sports Bar,Restaurant,Supermarket,Sushi Restaurant,Bank,Light Rail Station,Pizza Place,American Restaurant,Vietnamese Restaurant,Coworking Space,Discount Store
11,M4X,Downtown Toronto,Coffee Shop,Restaurant,Italian Restaurant,Pizza Place,Pharmacy,Pet Store,Park,Bakery,Pub,Café,Japanese Restaurant,Caribbean Restaurant,Playground,Farmers Market,Jewelry Store
12,M4Y,Downtown Toronto,Coffee Shop,Japanese Restaurant,Gay Bar,Sushi Restaurant,Restaurant,Gastropub,Hotel,Pub,Bubble Tea Shop,Men's Store,Mediterranean Restaurant,Café,Polish Restaurant,General Entertainment,Gym
13,M5A,Downtown Toronto,Coffee Shop,Pub,Park,Bakery,Mexican Restaurant,Theater,Café,Beer Store,Dessert Shop,Chocolate Shop,Ice Cream Shop,Yoga Studio,Distribution Center,Spa,Shoe Store


Cluster 2 has venues that are typical of suburban areas. The map shows the cluster is not close to the city center. 

In [45]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 2, toronto_merged.columns[[0] + [4]+ list(range(6, toronto_merged.shape[1]))]]

Unnamed: 0,Postalcode,Borough,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,11th Most Common Venue,12th Most Common Venue,13th Most Common Venue,14th Most Common Venue,15th Most Common Venue
22,M5N,Central Toronto,Garden,Yoga Studio,Dessert Shop,Event Space,Ethiopian Restaurant,Empanada Restaurant,Electronics Store,Eastern European Restaurant,Dumpling Restaurant,Donut Shop,Doner Restaurant,Dog Run,Distribution Center,Discount Store,Diner


Cluster 3 is at the north end of downtown Toronto. The area is close to parks and creeks.  


In [46]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 3, toronto_merged.columns[[0] +[4]+ list(range(6, toronto_merged.shape[1]))]]

Unnamed: 0,Postalcode,Borough,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,11th Most Common Venue,12th Most Common Venue,13th Most Common Venue,14th Most Common Venue,15th Most Common Venue
10,M4W,Downtown Toronto,Park,Trail,Playground,Yoga Studio,Diner,Discount Store,Distribution Center,Dog Run,Doner Restaurant,Dumpling Restaurant,Donut Shop,Department Store,Eastern European Restaurant,Electronics Store,Empanada Restaurant


Cluster 4 is far east of the city center. The map suggests this is a residential area.

In [47]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 4, toronto_merged.columns[[0] +[4]+ list(range(6, toronto_merged.shape[1]))]]

Unnamed: 0,Postalcode,Borough,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,11th Most Common Venue,12th Most Common Venue,13th Most Common Venue,14th Most Common Venue,15th Most Common Venue
0,M4E,East Toronto,Neighborhood,Trail,Health Food Store,Pub,Dessert Shop,Diner,Discount Store,Distribution Center,Dog Run,Yoga Studio,Department Store,Donut Shop,Dumpling Restaurant,Eastern European Restaurant,Electronics Store


Cluster 5 is north of cluster 2. Cluster 5 is not on the public transportation route and seems to be residential. 

In [48]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 5, toronto_merged.columns[[0] +[4]+ list(range(6, toronto_merged.shape[1]))]]

Unnamed: 0,Postalcode,Borough,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,11th Most Common Venue,12th Most Common Venue,13th Most Common Venue,14th Most Common Venue,15th Most Common Venue
8,M4T,Central Toronto,Restaurant,Park,Tennis Court,Playground,Empanada Restaurant,Electronics Store,Eastern European Restaurant,Dumpling Restaurant,Deli / Bodega,Doner Restaurant,Dog Run,Distribution Center,Discount Store,Diner,Dessert Shop


Cluster 6 is in north central Toronto. The area has parks, trails, schools and housing. 

In [49]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 6, toronto_merged.columns[[0] +[4]+ list(range(6, toronto_merged.shape[1]))]]

Unnamed: 0,Postalcode,Borough,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,11th Most Common Venue,12th Most Common Venue,13th Most Common Venue,14th Most Common Venue,15th Most Common Venue
23,M5P,Central Toronto,Jewelry Store,Bus Line,Park,Sushi Restaurant,Trail,Donut Shop,Distribution Center,Dog Run,Doner Restaurant,Yoga Studio,Diner,Dumpling Restaurant,Eastern European Restaurant,Electronics Store,Empanada Restaurant
