IBM Data Science Capstone, Class 9, Week 3, "Canada Postal Codes Neighborhoods Data Frames", Steven Harrison, March 17, 2019

Install our BeautifulSoup HTML Parser

In [9]:
!conda install -c anaconda BeautifulSoup4

Fetching package metadata .............
Solving package specifications: .

Package plan for installation in environment /opt/conda/envs/DSX-Python35:

The following packages will be UPDATED:

    beautifulsoup4: 4.6.0-py35h442a8c9_1 --> 4.6.3-py35_0 anaconda

beautifulsoup4 100% |################################| Time: 0:00:00  36.13 MB/s


Import our Python libraries.

In [1]:
"""  Import our Data and Data Analysis Dependencies """
import numpy as np                    # import our numpy arrays
import pandas as pd                   # import our pandas dataframes
import urllib.request                 # import our web access tool
from bs4 import BeautifulSoup         # import our HTML parsing tool
from sklearn.cluster import KMeans    # import our Unsupervised Learning Clustering Algorithm
import matplotlib.pyplot as plt        # import our Python plotting library
import folium                         # import our map tool

Extract our HTML data from the Wikipedia Site.

In [2]:
"""  Import our Data,  Webscrape Wikipedia for our Canada Postal Codes  """
toronto_codes = 'https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'    # our wikipedia target
postal_page = urllib.request.urlopen(toronto_codes)                                  
soup = BeautifulSoup(postal_page, 'html.parser')                                     # read all of the HTML into soup    

Isolate our data table embedded in our HTML.

In [3]:
code_table = soup.find('table',{'class':'wikitable sortable'})       # isolate our HTML table structure

Extract our data from the table tags and clean our data according to the system requirements

In [4]:
""" Extract our data from our table tr and td tags """
table_rows = code_table.find_all('tr')     # every table row (<tr>) is now in a list, extract the cells from each...

row2 = list()         # rows extracted from the HTML table
row3 = list()         # rows after cleaning

for tr in table_rows:
    td = tr.find_all('td')
    row = [i.text for i in td]      # extracted data from HTML Tag
    if not row:                     # the header row has no <td> cells, so skip the empty list
        continue
    row2.append(row)

""" Clean our data """
for items in row2:
    if items[1] == 'Not assigned':    # handle 'not assigned' in 2nd value
        continue
    temp_val = items[2].strip()       # handle \n removal on 3rd value each item
    del items[2]
    items.append(temp_val)
    if items[2] == 'Not assigned':    # handle 'not assigned' in 3rd value, 2nd value becomes 3rd value
        del items[2]
        replacement = items[1]
        items.append(replacement)
    row3.append(items)
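
The cleaning rules above can be restated as one small function and checked on a few hand-made rows. This is a minimal sketch, not the notebook's own code: `clean_postal_rows` and the sample rows are hypothetical, and it assumes each row is a `[postal code, borough, neighborhood]` list like the ones extracted from the table.

```python
def clean_postal_rows(rows):
    """Drop rows without a borough; use the borough name when the neighborhood is unassigned."""
    cleaned = []
    for row in rows:
        if len(row) < 3:                      # skip header or malformed rows
            continue
        code, borough, neighborhood = (cell.strip() for cell in row[:3])
        if borough == 'Not assigned':         # rule 1: ignore rows without a borough
            continue
        if neighborhood == 'Not assigned':    # rule 2: neighborhood defaults to the borough
            neighborhood = borough
        cleaned.append([code, borough, neighborhood])
    return cleaned


# hypothetical sample rows, shaped like the scraped table cells
sample_rows = [
    [],
    ['M1A', 'Not assigned', 'Not assigned\n'],
    ['M4A', 'North York', 'Victoria Village\n'],
    ['M7A', "Queen's Park", 'Not assigned\n'],
]
cleaned_rows = clean_postal_rows(sample_rows)
print(cleaned_rows)
```

Stripping every cell up front avoids the delete-and-append step needed to remove the trailing newline from the third value.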


Transform our data into a pandas dataframe, add column titles, and output to Jupyter Lab.

In [5]:
""" read our cleaned Postal Codes csv file into our pandas dataframe """
df = pd.read_csv('Postal Codes2.csv') 
df.columns=['PostalCode', 'Borough', 'Neighborhood']
df.head()

Unnamed: 0,PostalCode,Borough,Neighborhood
0,M4A,North York,Victoria Village
1,M5A,Downtown Toronto,"Harbourfront, Regent Park"
2,M6A,North York,"Lawrence Heights, Lawrence Manor"
3,M7A,Queen's Park,Queen's Park
4,M9A,Etobicoke,Islington Avenue
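
The cell above loads a previously saved CSV file. If the cleaned rows are still in memory, they could also be turned into a dataframe directly. The sketch below assumes a list of three-item rows like `row3`; the two sample rows are copied from the output above, and `postal_df` is a hypothetical name.

```python
import pandas as pd

# two rows copied from the output above, standing in for the full cleaned list
rows = [
    ['M4A', 'North York', 'Victoria Village'],
    ['M5A', 'Downtown Toronto', 'Harbourfront, Regent Park'],
]

# name the columns at construction time instead of reassigning df.columns later
postal_df = pd.DataFrame(rows, columns=['PostalCode', 'Borough', 'Neighborhood'])
print(postal_df.shape)
```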


Output the shape of our dataframe to the console.

In [6]:
""" Output the shape of our pandas dataframe """
df.shape

(102, 3)

In [7]:
""" read our cleaned Postal Codes csv file with Lat/Long integration into our pandas dataframe """
# This dataframe includes the Lat and Long of the Postal Codes and Boroughs, for use with FourSquare processing 
df = pd.read_csv('Postal Codes3.csv') 
df.columns=['PostalCode', 'Borough', 'Neighborhood', 'Latitude', 'Longitude']
df.head()

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude
0,M1C,Scarborough,"Highland Creek, Rouge Hill, Port Union",43.784535,-79.160497
1,M1E,Scarborough,"Guildwood, Morningside, West Hill",43.763573,-79.188711
2,M1G,Scarborough,Woburn,43.770992,-79.216917
3,M1H,Scarborough,Cedarbrae,43.773136,-79.239476
4,M1J,Scarborough,Scarborough Village,43.744734,-79.239476
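
How the latitude and longitude columns were added to `Postal Codes3.csv` is not shown in this notebook. One plausible way, sketched here purely as an assumption, is a left merge on the postal code against a separate coordinates table. `coords_df` and its layout are hypothetical; the sample values are copied from the output above.

```python
import pandas as pd

codes_df = pd.DataFrame({
    'PostalCode': ['M1C', 'M1E'],
    'Borough': ['Scarborough', 'Scarborough'],
    'Neighborhood': ['Highland Creek, Rouge Hill, Port Union',
                     'Guildwood, Morningside, West Hill'],
})

# hypothetical coordinates table keyed by postal code (row order differs on purpose)
coords_df = pd.DataFrame({
    'PostalCode': ['M1E', 'M1C'],
    'Latitude': [43.763573, 43.784535],
    'Longitude': [-79.188711, -79.160497],
})

# a left merge keeps every postal code, even one with no matching coordinates
located_df = codes_df.merge(coords_df, on='PostalCode', how='left')
print(located_df[['PostalCode', 'Latitude', 'Longitude']])
```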


In [8]:
canada_map = folium.Map(location=[43.700, -79.41],zoom_start = 10)

In [9]:
boroughs = ['Downtown Toronto', 'West Toronto','East Toronto']  # isolate and only cluster these boroughs that contain TORONTO...

for lat, lng, borough, neighborhood in zip(df['Latitude'], df['Longitude'], df['Borough'], df['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(canada_map)

canada_map            # render our map

Isolate our dataframe to contain only those boroughs whose names contain Toronto.

In [10]:
toronto_df  = df[df['Borough'] == 'West Toronto']
toronto2_df = df[df['Borough'] == 'East Toronto']
toronto3_df = df[df['Borough'] == 'Downtown Toronto']

# DataFrame.append was removed in pandas 2.0; pd.concat stacks the three frames in the same order
toronto6_df = pd.concat([toronto_df, toronto2_df, toronto3_df]).reset_index(drop=True)
toronto6_df.head()

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude
0,M6H,West Toronto,"Dovercourt Village, Dufferin",43.669005,-79.442259
1,M6J,West Toronto,"Little Portugal, Trinity,",43.647927,-79.41975
2,M6K,West Toronto,"Brockton, Exhibition Place, Parkdale Village",43.636847,-79.428191
3,M6P,West Toronto,"High Park, The Junction South",43.661608,-79.464763
4,M6R,West Toronto,"Parkdale, Roncesvalles",43.64896,-79.456325
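
The same selection could also be written as a single filter with `isin`, reusing the `boroughs` list that the map cells define. This is a sketch on a hypothetical four-row sample (postal codes and boroughs taken from the outputs above). Unlike the cell above, it keeps the rows in their original order instead of grouping them borough by borough.

```python
import pandas as pd

boroughs = ['Downtown Toronto', 'West Toronto', 'East Toronto']

# hypothetical sample built from rows shown earlier in the notebook
sample_df = pd.DataFrame({
    'PostalCode': ['M1C', 'M4A', 'M5A', 'M6H'],
    'Borough': ['Scarborough', 'North York', 'Downtown Toronto', 'West Toronto'],
})

# keep only rows whose borough is in the list, then renumber the index
toronto_only = sample_df[sample_df['Borough'].isin(boroughs)].reset_index(drop=True)
print(toronto_only)
```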


Prepare our second map with the isolated Boroughs that have the word Toronto in them.

In [11]:
canada_map2 = folium.Map(location=[43.700, -79.41],zoom_start = 12)        # create our map object

In [12]:
""" Add our markers to our map for our isolated Toronto boroughs """
boroughs = ['Downtown Toronto', 'West Toronto','East Toronto']  # isolate and only cluster these boroughs that contain TORONTO...

for lat, lng, borough, neighborhood in zip(toronto6_df['Latitude'], toronto6_df['Longitude'], toronto6_df['Borough'], toronto6_df['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='red',         # make our isolated markers red
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(canada_map2)

canada_map2            # render our map

In [13]:
""" Integrate our FourSquare Data Source - Integrate our data into our Toronto Data Source """
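import os
CLIENT_ID     = os.environ.get('FOURSQUARE_CLIENT_ID', '')       # read from the environment (variable name is our own choice)
CLIENT_SECRET = os.environ.get('FOURSQUARE_CLIENT_SECRET', '')   # keep API secrets out of a shared notebook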
VERSION = '20190101'  # Foursquare API Version Date Format

# toronto6_df is our isolated dataframe for our Foursquare operation and clustering model

toronto_latitude  = toronto6_df.loc[0, 'Latitude']    # Get a latitude from our pandas dataframe
toronto_longitude = toronto6_df.loc[0, 'Longitude']   # Get a longitude
toronto_name = toronto6_df.loc[0, 'Neighborhood']     # Get the neighborhood name

print('Latitude and longitude values of {} are {}, {}.'.format(toronto_name,
                                                               toronto_latitude,
                                                               toronto_longitude
                                                               ))


Latitude and longitude values of Dovercourt Village, Dufferin are 43.66900510000001, -79.4422593.


In [14]:
LIMIT = 100
radius = 500
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID,
    CLIENT_SECRET,
    VERSION,
    toronto_latitude,
    toronto_longitude,
    radius,
    LIMIT)

url


'https://api.foursquare.com/v2/venues/explore?&client_id=<CLIENT_ID>&client_secret=<CLIENT_SECRET>&v=20190101&ll=43.66900510000001,-79.4422593&radius=500&limit=100'
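
The query string can also be assembled from a dictionary, which keeps each parameter visible and lets the standard library handle the encoding. This is a sketch only: `build_explore_url` is a hypothetical helper, the credentials are placeholders, and the endpoint and parameter names are taken from the cell above.

```python
from urllib.parse import urlencode


def build_explore_url(client_id, client_secret, version, lat, lng, radius=500, limit=100):
    """Build the explore URL with URL-encoded query parameters."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,
        'll': '{},{}'.format(lat, lng),   # latitude,longitude pair
        'radius': radius,
        'limit': limit,
    }
    return 'https://api.foursquare.com/v2/venues/explore?' + urlencode(params)


# placeholder credentials, not real keys
example_url = build_explore_url('MY_ID', 'MY_SECRET', '20190101', 43.669005, -79.442259)
print(example_url)
```

`urlencode` percent-encodes the comma in `ll` as `%2C`, which a server decodes back to a comma.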

In [15]:
""" retrieve our data set from Foursquare Places API into json file format """
import requests
results = requests.get(url).json()


In [16]:
def get_category_type(row):
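    try:
        categories_list = row['categories']
    except KeyError:                              # the column is named 'venue.categories' after json_normalize
        categories_list = row['venue.categories']

    if len(categories_list) == 0:                 # venue has no category
        return None
    return categories_list[0]['name']             # name of the first category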

In [17]:
from pandas import json_normalize    # pandas >= 1.0 exposes json_normalize at the top level

In [18]:
venues = results['response']['groups'][0]['items']
nearby_venues = json_normalize(venues)
filtered_columns = ['venue.name','venue.categories','venue.location.lat','venue.location.lng']
nearby_venues = nearby_venues.loc[:, filtered_columns]
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]
nearby_venues.head()

Unnamed: 0,name,categories,lat,lng
0,The Greater Good Bar,Bar,43.669409,-79.439267
1,Parallel,Middle Eastern Restaurant,43.669516,-79.438728
2,Happy Bakery & Pastries,Bakery,43.66705,-79.441791
3,FreshCo,Supermarket,43.667918,-79.440754
4,Planet Fitness Toronto Galleria,Gym / Fitness Center,43.667588,-79.442574
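
To see what the flattening step does, it helps to run it on a tiny stand-in for the API response. The sketch below uses a hypothetical, heavily trimmed `items` list; its nesting mirrors the keys the cell above reads, the second venue is made up to show the empty-category case, and `pd.json_normalize` assumes pandas 1.0 or later.

```python
import pandas as pd

# hypothetical, trimmed stand-in for results['response']['groups'][0]['items']
items = [
    {'venue': {'name': 'The Greater Good Bar',
               'categories': [{'name': 'Bar'}],
               'location': {'lat': 43.669409, 'lng': -79.439267}}},
    {'venue': {'name': 'Made-up Venue',
               'categories': [],
               'location': {'lat': 43.67, 'lng': -79.44}}},
]

flat = pd.json_normalize(items)   # nested keys become dotted column names
# take the first category name, or None when the list is empty
flat['venue.categories'] = flat['venue.categories'].apply(
    lambda cats: cats[0]['name'] if cats else None)
# keep only the last part of each dotted name, e.g. 'venue.location.lat' becomes 'lat'
flat.columns = [col.split('.')[-1] for col in flat.columns]
print(flat)
```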


In [19]:
print('{} venues were returned by Foursquare.'.format(nearby_venues.shape[0]))


21 venues were returned by Foursquare.


In [20]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID,
            CLIENT_SECRET,
            VERSION,
            lat,
            lng,
            radius,
            LIMIT)
        results = requests.get(url).json()["response"]['groups'][0]['items']
        venues_list.append([(
            name,
            lat,
            lng,
            v['venue']['name'],
            v['venue']['location']['lat'],
            v['venue']['location']['lng'],
            v['venue']['categories'][0]['name']) for v in results])
    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood',
                            'Neighborhood Latitude',
                            'Neighborhood Longitude',
                            'Venue',
                            'Venue Latitude',
                            'Venue Longitude',
                            'Venue Category']
    return nearby_venues
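
`getNearbyVenues` indexes `categories[0]` directly, which raises an `IndexError` for a venue that has no category. A more tolerant row builder might look like the sketch below. `venue_rows` and the sample items are hypothetical; the key names follow the ones used in the function above.

```python
def venue_rows(name, lat, lng, items):
    """Turn one neighborhood's explore items into flat tuples, tolerating missing fields."""
    rows = []
    for item in items:
        venue = item.get('venue', {})
        categories = venue.get('categories') or []
        category = categories[0]['name'] if categories else None   # None when uncategorized
        location = venue.get('location', {})
        rows.append((name, lat, lng,
                     venue.get('name'),
                     location.get('lat'),
                     location.get('lng'),
                     category))
    return rows


# hypothetical items; the second one has no category on purpose
sample_items = [
    {'venue': {'name': 'The Greater Good Bar', 'categories': [{'name': 'Bar'}],
               'location': {'lat': 43.669409, 'lng': -79.439267}}},
    {'venue': {'name': 'Made-up Venue', 'categories': [],
               'location': {'lat': 43.67, 'lng': -79.44}}},
]
rows = venue_rows('Dovercourt Village, Dufferin', 43.669005, -79.442259, sample_items)
print(rows)
```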


In [21]:
toronto_venues = getNearbyVenues(names=toronto6_df['Neighborhood'],
                                latitudes = toronto6_df['Latitude'],
                                longitudes= toronto6_df['Longitude']
                                )

Dovercourt Village, Dufferin
Little Portugal, Trinity, 
Brockton, Exhibition Place, Parkdale Village
High Park, The Junction South
Parkdale, Roncesvalles
Runnymede, Swansea
The Beaches
The Danforth West, Riverdale
The Beaches West, India Bazaar
Studio District
Business Reply Mail Processing Centre 969 Eastern
Rosedale
Cabbagetown, St. James Town
Church and Wellesley
Harbourfront, Regent Park
Ryerson, Garden District
St. James Town
Berczy Park
Central Bay Street
Adelaide, King, Richmond
Harbourfront East, Toronto Islands, Union Station
Design Exchange, Toronto Dominion Centre
Commerce Court, Victoria Hotel
Harbord, University of Toronto
Chinatown, Grange Park, Kensington Market
CN Tower, Bathurst Quay, Island airport, Harbourfront West, King and Spadina, Railway Lands, South Niagara
Stn A PO Boxes 25 The Esplanade
First Canadian Place, Underground city
Christie


Integrate our Toronto Dataframe with our Foursquare Data

In [22]:
print(toronto_venues.shape)
toronto_venues.head()


(1572, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,"Dovercourt Village, Dufferin",43.669005,-79.442259,The Greater Good Bar,43.669409,-79.439267,Bar
1,"Dovercourt Village, Dufferin",43.669005,-79.442259,Parallel,43.669516,-79.438728,Middle Eastern Restaurant
2,"Dovercourt Village, Dufferin",43.669005,-79.442259,Happy Bakery & Pastries,43.66705,-79.441791,Bakery
3,"Dovercourt Village, Dufferin",43.669005,-79.442259,FreshCo,43.667918,-79.440754,Supermarket
4,"Dovercourt Village, Dufferin",43.669005,-79.442259,Planet Fitness Toronto Galleria,43.667588,-79.442574,Gym / Fitness Center


In [23]:
toronto_venues.groupby('Neighborhood').count()

Unnamed: 0_level_0,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
"Adelaide, King, Richmond",100,100,100,100,100,100
Berczy Park,58,58,58,58,58,58
"Brockton, Exhibition Place, Parkdale Village",23,23,23,23,23,23
Business Reply Mail Processing Centre 969 Eastern,17,17,17,17,17,17
"CN Tower, Bathurst Quay, Island airport, Harbourfront West, King and Spadina, Railway Lands, South Niagara",14,14,14,14,14,14
"Cabbagetown, St. James Town",44,44,44,44,44,44
Central Bay Street,79,79,79,79,79,79
"Chinatown, Grange Park, Kensington Market",97,97,97,97,97,97
Christie,16,16,16,16,16,16
Church and Wellesley,82,82,82,82,82,82


In [24]:
print('There are {} unique categories.'.format(len(toronto_venues['Venue Category'].unique())))

There are 230 unique categories.


In [25]:
""" Begin our Machine Learning Pipeline - Data Preprocessing """
# one hot encoding
toronto_onehot = pd.get_dummies(toronto_venues[['Venue Category']], prefix="", prefix_sep="")
toronto_onehot['Neighborhood'] = toronto_venues['Neighborhood']
fixed_columns = [toronto_onehot.columns[-1]] + list(toronto_onehot.columns[:-1])
toronto_onehot = toronto_onehot[fixed_columns]
toronto_onehot.head()


Unnamed: 0,Yoga Studio,Accessories Store,Afghan Restaurant,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,...,Toy / Game Store,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Wine Bar,Wine Shop,Wings Joint,Women's Store
0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [26]:
toronto_onehot.shape

(1572, 230)

In [27]:
toronto_grouped = toronto_onehot.groupby('Neighborhood').mean().reset_index()
toronto_grouped

Unnamed: 0,Neighborhood,Yoga Studio,Accessories Store,Afghan Restaurant,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,...,Toy / Game Store,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Wine Bar,Wine Shop,Wings Joint,Women's Store
0,"Adelaide, King, Richmond",0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.01
1,Berczy Park,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,"Brockton, Exhibition Place, Parkdale Village",0.043478,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Business Reply Mail Processing Centre 969 Eastern,0.058824,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,"CN Tower, Bathurst Quay, Island airport, Harbo...",0.0,0.0,0.0,0.071429,0.071429,0.071429,0.142857,0.142857,0.142857,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,"Cabbagetown, St. James Town",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
6,Central Bay Street,0.012658,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.012658,0.0,0.0,0.012658,0.0,0.0,0.0
7,"Chinatown, Grange Park, Kensington Market",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.010309,0.0,0.0,0.051546,0.0,0.051546,0.010309,0.0,0.0,0.0
8,Christie,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9,Church and Wellesley,0.012195,0.0,0.012195,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.012195,0.012195,0.0,0.012195,0.012195,0.0


In [28]:
toronto_grouped.shape

(29, 230)
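
Taking the mean of one-hot columns per neighborhood gives the share of that neighborhood's venues in each category, so every row of the grouped table sums to 1. A toy example with hypothetical neighborhoods `A` and `B` makes this concrete:

```python
import pandas as pd

# hypothetical venues: four in neighborhood A, two in neighborhood B
toy_venues = pd.DataFrame({
    'Neighborhood': ['A', 'A', 'A', 'A', 'B', 'B'],
    'Venue Category': ['Coffee Shop', 'Coffee Shop', 'Bar', 'Park', 'Bar', 'Bar'],
})

# one column per category, 1.0 where the venue belongs to it
onehot = pd.get_dummies(toy_venues['Venue Category'], dtype=float)
onehot.insert(0, 'Neighborhood', toy_venues['Neighborhood'])

# the mean of 0/1 indicators is the fraction of venues in each category
freq = onehot.groupby('Neighborhood').mean()
print(freq)
```

Here `A` has two coffee shops out of four venues, so its `Coffee Shop` frequency is 0.5, and `B` is all bars, so its `Bar` frequency is 1.0.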

In [29]:
num_top_venues = 5

for hood in toronto_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = toronto_grouped[toronto_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq':2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')


----Adelaide, King, Richmond----
             venue  freq
0      Coffee Shop  0.06
1       Steakhouse  0.04
2              Bar  0.04
3             Café  0.04
4  Thai Restaurant  0.04


----Berczy Park----
                venue  freq
0         Coffee Shop  0.09
1        Cocktail Bar  0.05
2  Italian Restaurant  0.03
3          Steakhouse  0.03
4      Farmers Market  0.03


----Brockton, Exhibition Place, Parkdale Village----
            venue  freq
0  Breakfast Spot  0.09
1            Café  0.09
2     Coffee Shop  0.09
3      Restaurant  0.04
4   Burrito Place  0.04


----Business Reply Mail Processing Centre 969 Eastern----
                venue  freq
0         Yoga Studio  0.06
1       Auto Workshop  0.06
2  Light Rail Station  0.06
3       Garden Center  0.06
4              Garden  0.06


----CN Tower, Bathurst Quay, Island airport, Harbourfront West, King and Spadina, Railway Lands, South Niagara----
              venue  freq
0    Airport Lounge  0.14
1   Airport Service  0.14
2  Ai

In [30]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    return row_categories_sorted.index.values[0:num_top_venues]


In [31]:
num_top_venues = 10
indicators = ['st','nd','rd']
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:      # only the first three positions have their own suffix
        columns.append('{}th Most Common Venue'.format(ind+1))

neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = toronto_grouped['Neighborhood']

for ind in np.arange(toronto_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(toronto_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()


Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,"Adelaide, King, Richmond",Coffee Shop,Café,Thai Restaurant,Bar,Steakhouse,Burger Joint,Gym,Sushi Restaurant,Hotel,Bakery
1,Berczy Park,Coffee Shop,Cocktail Bar,Cheese Shop,Bakery,Steakhouse,Seafood Restaurant,Italian Restaurant,Farmers Market,Restaurant,Café
2,"Brockton, Exhibition Place, Parkdale Village",Breakfast Spot,Café,Coffee Shop,Yoga Studio,Bar,Burrito Place,Restaurant,Caribbean Restaurant,Climbing Gym,Performing Arts Venue
3,Business Reply Mail Processing Centre 969 Eastern,Yoga Studio,Auto Workshop,Pizza Place,Moving Target,Restaurant,Butcher,Burrito Place,Brewery,Skate Park,Smoke Shop
4,"CN Tower, Bathurst Quay, Island airport, Harbo...",Airport Lounge,Airport Terminal,Airport Service,Harbor / Marina,Boat or Ferry,Sculpture Garden,Plane,Boutique,Airport Gate,Airport
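
The suffix logic above is correct for the ten columns used here, but it would produce labels such as "21th" if `num_top_venues` grew past 20. A small helper, sketched below under the hypothetical name `ordinal`, covers the general case:

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1st, 2nd, 3rd, 4th, 11th, 21st."""
    if 10 <= n % 100 <= 20:          # 10th to 20th (and 110th to 120th, ...) always take 'th'
        suffix = 'th'
    else:
        suffix = {1: 'st', 2: 'nd', 3: 'rd'}.get(n % 10, 'th')
    return '{}{}'.format(n, suffix)


columns = ['Neighborhood'] + ['{} Most Common Venue'.format(ordinal(i)) for i in range(1, 11)]
print(columns)
```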


Cluster our most common venues according to Neighborhoods (Toronto Boroughs)

In [32]:
""" Cluster Neighborhoods """
""" run our unsupervised learning, Kmeans Clustering Algorithm """
kclusters = 5    # set our number of clusters (labels 0 - 4)
toronto_grouped_clustering = toronto_grouped.drop(columns='Neighborhood')   # keep only the numeric frequency columns
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(toronto_grouped_clustering)
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)      # one label per neighborhood
toronto_merged = toronto6_df
toronto_merged = toronto_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')
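
To show what the clustering step does with frequency rows, here is a toy run on a hypothetical 4 x 3 matrix with two clearly different venue profiles. The data and the names `toy_freq` and `toy_kmeans` are made up; `n_init=10` is set explicitly because the default differs between scikit-learn versions.

```python
import numpy as np
from sklearn.cluster import KMeans

# hypothetical frequency rows: two coffee-heavy and two park-heavy neighborhoods
toy_freq = np.array([
    [0.9, 0.1, 0.0],
    [0.8, 0.2, 0.0],
    [0.0, 0.1, 0.9],
    [0.1, 0.0, 0.9],
])

toy_kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(toy_freq)
labels = toy_kmeans.labels_
print(labels)
```

Rows with similar venue mixes share a label. The label numbers themselves are arbitrary, so only the grouping is meaningful.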


Plot our clusters on a Toronto map. People can use our map to find the most popular venues near their location in Toronto.

In [33]:
import matplotlib.cm as cm
import matplotlib.colors as colors

map_clusters = folium.Map(location=[43.700, -79.41],zoom_start=12)
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))    # one evenly spaced color per cluster
rainbow = [colors.rgb2hex(i) for i in colors_array]

for lat, lon, poi, cluster in zip(toronto_merged['Latitude'], toronto_merged['Longitude'], toronto_merged['Neighborhood'], toronto_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
map_clusters


Venues are limited in the eastern parts of Toronto because of their distance from the downtown district.

In [34]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 0, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]


Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
6,East Toronto,0,Health Food Store,Coffee Shop,Pub,Women's Store,Dessert Shop,Event Space,Ethiopian Restaurant,Electronics Store,Eastern European Restaurant,Dumpling Restaurant


In [35]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 1, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]


Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,West Toronto,1,Bakery,Discount Store,Supermarket,Pharmacy,Fast Food Restaurant,Pool,Brewery,Café,Bar,Bank
5,West Toronto,1,Pizza Place,Café,Sushi Restaurant,Bookstore,Coffee Shop,Italian Restaurant,Food,Indie Movie Theater,Fish & Chips Shop,Falafel Restaurant
12,Downtown Toronto,1,Coffee Shop,Restaurant,Pizza Place,Bakery,Italian Restaurant,Café,Pub,Chinese Restaurant,Pharmacy,Park


A majority of our venues are clustered in the West and Central parts of Toronto, including the Etobicoke borough.

In [36]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 2, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]


Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,West Toronto,2,Bar,Coffee Shop,Asian Restaurant,Cocktail Bar,Bakery,Boutique,Restaurant,Pizza Place,Café,Men's Store
2,West Toronto,2,Breakfast Spot,Café,Coffee Shop,Yoga Studio,Bar,Burrito Place,Restaurant,Caribbean Restaurant,Climbing Gym,Performing Arts Venue
3,West Toronto,2,Bar,Mexican Restaurant,Café,Cajun / Creole Restaurant,Bakery,Italian Restaurant,Diner,Speakeasy,Flea Market,Fried Chicken Joint
4,West Toronto,2,Breakfast Spot,Gift Shop,Restaurant,Dessert Shop,Eastern European Restaurant,Bar,Burger Joint,Dog Run,Italian Restaurant,Movie Theater
7,East Toronto,2,Greek Restaurant,Coffee Shop,Ice Cream Shop,Italian Restaurant,Bookstore,Cosmetics Shop,Brewery,Bubble Tea Shop,Restaurant,Café
8,East Toronto,2,Sandwich Place,Pub,Sushi Restaurant,Food & Drink Shop,Ice Cream Shop,Fish & Chips Shop,Fast Food Restaurant,Movie Theater,Burrito Place,Steakhouse
9,East Toronto,2,Café,Coffee Shop,Bakery,Italian Restaurant,Gastropub,American Restaurant,Yoga Studio,Sandwich Place,Fish Market,Juice Bar
10,East Toronto,2,Yoga Studio,Auto Workshop,Pizza Place,Moving Target,Restaurant,Butcher,Burrito Place,Brewery,Skate Park,Smoke Shop
13,Downtown Toronto,2,Japanese Restaurant,Coffee Shop,Gay Bar,Burger Joint,Restaurant,Mediterranean Restaurant,Bubble Tea Shop,Gym,Café,Pub
14,Downtown Toronto,2,Coffee Shop,Café,Park,Bakery,Theater,Breakfast Spot,Mexican Restaurant,Pub,Bank,Italian Restaurant


Our island neighborhood has very few venues because it is surrounded by Lake Ontario. Nearly all of them are the airport and its facilities.

In [37]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 3, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]


Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
25,Downtown Toronto,3,Airport Lounge,Airport Terminal,Airport Service,Harbor / Marina,Boat or Ferry,Sculpture Garden,Plane,Boutique,Airport Gate,Airport


In [38]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 4, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]


Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
11,Downtown Toronto,4,Park,Playground,Trail,Women's Store,Dessert Shop,Event Space,Ethiopian Restaurant,Electronics Store,Eastern European Restaurant,Dumpling Restaurant
