# Coursera IBM Capstone Assignment

This assignment aims to demonstrate the ability to scrape data from the web and turn it into a format useful for data science applications.

# Segmentation & Clustering

This section of the assignment focuses on web scraping from Wikipedia to obtain the necessary address information for the city of Toronto, Canada.

### Assignment Question 1

For this assignment, you will be required to explore and cluster the neighborhoods in Toronto.

1. Start by creating a new Notebook for this assignment.
2. Use the Notebook to build the code to scrape the following Wikipedia page, https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M, in order to obtain the data that is in the table of postal codes and to transform the data into a pandas dataframe like the one shown below:

3. To create the above dataframe:

* The dataframe will consist of three columns: PostalCode, Borough, and Neighborhood
* Only process the cells that have an assigned borough. Ignore cells with a borough that is Not assigned.
* More than one neighborhood can exist in one postal code area. For example, in the table on the Wikipedia page, you will notice that M5A is listed twice and has two neighborhoods: Harbourfront and Regent Park. These two rows will be combined into one row with the neighborhoods separated with a comma as shown in row 11 in the above table.
* If a cell has a borough but a Not assigned neighborhood, then the neighborhood will be the same as the borough.
* Clean your Notebook and add Markdown cells to explain your work and any assumptions you are making.
* In the last cell of your notebook, use the .shape method to print the number of rows of your dataframe.      
4. Submit a link to your Notebook on your Github repository. (10 marks)
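
The three data rules listed above (drop unassigned boroughs, fall back to the borough name, combine rows that share a postal code) can be sketched as follows. This is a minimal illustration on a few made-up rows, not the real Wikipedia table, and the variable names `raw`, `assigned`, and `combined` are assumptions introduced here.

```python
import pandas as pd

# Hypothetical rows in the shape the assignment describes (not the real table).
raw = pd.DataFrame(
    {
        "PostalCode": ["M1A", "M5A", "M5A", "M7A"],
        "Borough": ["Not assigned", "Downtown Toronto", "Downtown Toronto", "Queen's Park"],
        "Neighborhood": ["Not assigned", "Harbourfront", "Regent Park", "Not assigned"],
    }
)

# Rule 1: keep only rows with an assigned borough.
assigned = raw[raw["Borough"] != "Not assigned"].copy()

# Rule 2: a missing neighborhood falls back to the borough name.
missing = assigned["Neighborhood"] == "Not assigned"
assigned.loc[missing, "Neighborhood"] = assigned.loc[missing, "Borough"]

# Rule 3: one row per postal code, neighborhoods joined with a comma.
combined = (
    assigned.groupby(["PostalCode", "Borough"], sort=False)["Neighborhood"]
    .agg(", ".join)
    .reset_index()
)

print(combined)
print(combined.shape)
```

Passing `sort=False` keeps the postal codes in their order of first appearance, which makes the result easier to compare against the source table.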

## Web Scraping from Wikipedia

In [1]:
import requests
from bs4 import BeautifulSoup
import bs4

import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

#!conda install -c conda-forge geopy --yes # uncomment this line if you haven't completed the Foursquare API lab
!pip install geopy
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

from pandas.io.json import json_normalize # transform JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

#!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't completed the Foursquare API lab
!pip install folium
import folium # map rendering library

print('Libraries imported.')

Collecting folium
  Downloading https://files.pythonhosted.org/packages/fd/a0/ccb3094026649cda4acd55bf2c3822bb8c277eb11446d13d384e5be35257/folium-0.10.1-py2.py3-none-any.whl (91kB)
Collecting branca>=0.3.0 (from folium)
  Downloading https://files.pythonhosted.org/packages/81/6d/31c83485189a2521a75b4130f1fee5364f772a0375f81afff619004e5237/branca-0.4.0-py3-none-any.whl
Installing collected packages: branca, folium
Successfully installed branca-0.4.0 folium-0.10.1
Libraries imported.


In [2]:
toronto_codes = requests.get('https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M')

In [3]:
soup = BeautifulSoup(toronto_codes.text, 'lxml')

soup

<!DOCTYPE html>
<html class="client-nojs" dir="ltr" lang="en">
<head>
<meta charset="utf-8"/>
<title>List of postal codes of Canada: M - Wikipedia</title>
<script>document.documentElement.className="client-js";RLCONF={"wgBreakFrames":!1,"wgSeparatorTransformTable":["",""],"wgDigitTransformTable":["",""],"wgDefaultDateFormat":"dmy","wgMonthNames":["","January","February","March","April","May","June","July","August","September","October","November","December"],"wgRequestId":"XptG7wpAMNAAAUaw2@sAAAES","wgCSPNonce":!1,"wgCanonicalNamespace":"","wgCanonicalSpecialPageName":!1,"wgNamespaceNumber":0,"wgPageName":"List_of_postal_codes_of_Canada:_M","wgTitle":"List of postal codes of Canada: M","wgCurRevisionId":951325562,"wgRevisionId":951325562,"wgArticleId":539066,"wgIsArticle":!0,"wgIsRedirect":!1,"wgAction":"view","wgUserName":null,"wgUserGroups":["*"],"wgCategories":["Articles with short description","Communications in Ontario","Postal codes in Canada","Toronto","Ontario-related lists"],"

### Useful Functions to help work with the data

In [4]:
def column_maker(data=None, position=None):
    """Function to return a list from within a nested list"""
    list_name = []
    for code in data:
        post_code = code[position]
        list_name.append(post_code)
    return list_name

def clean_data(data=None, characters = None):
    """Function to clean out unnecessary characters in a column"""
    list_name = []
    for item in data:
        cleaned = item.rstrip(characters)
        list_name.append(cleaned)
    return list_name
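
As a side note, both helpers above follow the same build-a-list-in-a-loop pattern, which can also be written with list comprehensions. The sketch below is an equivalent formulation under assumed names (`column_values`, `strip_trailing`) and a made-up two-row sample; it is not part of the original notebook.

```python
def column_values(rows, position):
    """Return the value at `position` from every row of a nested list."""
    return [row[position] for row in rows]


def strip_trailing(values, characters=None):
    """Strip trailing `characters` (whitespace by default) from each string."""
    return [value.rstrip(characters) for value in values]


# Hypothetical scraped cells, each still carrying a trailing newline.
sample = [
    ["M3A\n", "North York\n", "Parkwoods\n"],
    ["M4A\n", "North York\n", "Victoria Village\n"],
]

codes = strip_trailing(column_values(sample, 0), "\n")
print(codes)
```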

In [5]:
# Retrieve the rows of the postal code table from the HTML document, skipping the header row
rows = soup.find(class_="wikitable sortable").find_all('tr')[1:]

# Isolate the text of each cell with a list comprehension
Data = []
for row in rows:
    cell = [i.text for i in row.find_all('td')]
    Data.append(cell)

In [9]:
#Cleaning cells to create a structured dataset
PostalCode   = column_maker(data=Data, position=0)
PostalCode   = clean_data(data=PostalCode, characters='\n')

Borough      = column_maker(data=Data, position=1)
Borough      = clean_data(data=Borough, characters='\n')

Neighborhood = column_maker(data=Data, position=2)
Neighborhood = clean_data(data=Neighborhood, characters='\n')

#Checking structure of the data
print(len(PostalCode), len(Borough), len(Neighborhood))

180 180 180


In [10]:
#Creating DataFrame
Toronto_Data = pd.DataFrame({'PostalCode': PostalCode, 'Borough': Borough, 'Neighborhood': Neighborhood})

Toronto_Data

Unnamed: 0,PostalCode,Borough,Neighborhood
0,M1A,Not assigned,
1,M2A,Not assigned,
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,Regent Park / Harbourfront
5,M6A,North York,Lawrence Manor / Lawrence Heights
6,M7A,Downtown Toronto,Queen's Park / Ontario Provincial Government
7,M8A,Not assigned,
8,M9A,Etobicoke,Islington Avenue
9,M1B,Scarborough,Malvern / Rouge


In [11]:
# Replace the slashes in the Neighborhood column with commas

Neighbor_Clean = []

for i in Toronto_Data.index:
    neighbor = Toronto_Data.iloc[i, 2]
    replace  = neighbor.replace("/", ",")
    Neighbor_Clean.append(replace)

In [12]:
# Seeing if the cleaning procedure worked
Neighbor_Clean[:10]

['',
 '',
 'Parkwoods',
 'Victoria Village',
 'Regent Park , Harbourfront',
 'Lawrence Manor , Lawrence Heights',
 "Queen's Park , Ontario Provincial Government",
 '',
 'Islington Avenue',
 'Malvern , Rouge']
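
The output above still has a space before each comma, because only the slash character itself is replaced. If tighter separators are wanted, one option is a regular expression that also consumes the surrounding whitespace. The helper name `slashes_to_commas` is an assumption made for this sketch.

```python
import re


def slashes_to_commas(text):
    """Replace each '/' and any whitespace around it with ', '."""
    return re.sub(r"\s*/\s*", ", ", text)


examples = ["Regent Park / Harbourfront", "Parkwoods", ""]
cleaned = [slashes_to_commas(e) for e in examples]
print(cleaned)
```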

In [13]:
Toronto_Data['Neighborhood'] = Neighbor_Clean

In [14]:
Toronto_Data.head()

Unnamed: 0,PostalCode,Borough,Neighborhood
0,M1A,Not assigned,
1,M2A,Not assigned,
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,"Regent Park , Harbourfront"


In [15]:
len(Toronto_Data.index) == len(Toronto_Data.PostalCode.unique())

True

The check above confirms that the number of unique postal codes matches the total number of records in the dataset, so no postal code appears more than once and no rows need to be combined.

In [16]:
#Dropping rows with 'Not assigned' in the Borough Column
Toronto_Data.drop(Toronto_Data[Toronto_Data.Borough == 'Not assigned'].index, inplace=True)

print('Number of remaining rows are', len(Toronto_Data.index))
Toronto_Data.set_index('PostalCode', inplace=True)
Toronto_Data.head()

Number of remaining rows are 103


PostalCode,Borough,Neighborhood
M3A,North York,Parkwoods
M4A,North York,Victoria Village
M5A,Downtown Toronto,"Regent Park , Harbourfront"
M6A,North York,"Lawrence Manor , Lawrence Heights"
M7A,Downtown Toronto,"Queen's Park , Ontario Provincial Government"


In [17]:
# Check how many 'Not assigned' entries are in the Neighborhood column
len(Toronto_Data.Neighborhood[Toronto_Data.Neighborhood == 'Not assigned'])

0

In [19]:
Toronto_Data.shape

(103, 2)

### Assignment Question 2

Now that you have built a dataframe of the postal code of each neighborhood along with the borough name and neighborhood name, in order to utilize the Foursquare location data, we need to get the latitude and the longitude coordinates of each neighborhood.

In an older version of this course, we were leveraging the Google Maps Geocoding API to get the latitude and the longitude coordinates of each neighborhood. However, recently Google started charging for their API: http://geoawesomeness.com/developers-up-in-arms-over-google-maps-api-insane-price-hike/, so we will use the Geocoder Python package instead: https://geocoder.readthedocs.io/index.html.

The problem with this package is that you sometimes have to be persistent to get the geographical coordinates of a given postal code. A call for a given postal code can return None, and the same call repeated can then return the coordinates. So, to make sure that you get the coordinates for all of our neighborhoods, you can run a while loop for each postal code. Taking postal code M5G as an example, your code would look something like this:

import geocoder # import geocoder

# example postal code from the text above
postal_code = 'M5G'

# initialize your variable to None
lat_lng_coords = None

#loop until you get the coordinates
while(lat_lng_coords is None):
  g = geocoder.google('{}, Toronto, Ontario'.format(postal_code))
  lat_lng_coords = g.latlng

latitude = lat_lng_coords[0]
longitude = lat_lng_coords[1]
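
The loop above never stops if the service keeps returning None. A bounded variant is sketched below as one possible safeguard. The names `geocode_with_retry` and `flaky_lookup` are hypothetical, and the lookup is a local stand-in with made-up coordinates rather than a real geocoder call, so the sketch runs without network access.

```python
import time


def geocode_with_retry(lookup, query, max_attempts=5, delay_seconds=0.0):
    """Call `lookup(query)` until it returns coordinates or attempts run out."""
    for _ in range(max_attempts):
        coords = lookup(query)
        if coords is not None:
            return coords
        time.sleep(delay_seconds)  # brief pause before the next attempt
    return None  # the caller decides how to handle a postal code that never resolves


# Stand-in for a flaky geocoder: fails twice, then answers (made-up coordinates).
calls = {"count": 0}


def flaky_lookup(query):
    calls["count"] += 1
    return None if calls["count"] < 3 else [43.65, -79.38]


result = geocode_with_retry(flaky_lookup, "M5G, Toronto, Ontario")
print(result, calls["count"])
```

With a real geocoder, `lookup` could be a small function that wraps the package call and returns its `latlng` value.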


Important Note: There is a limit on how many times you can call the geocoder.google function: 2,500 times per day. This should be far more than enough for you to get acquainted with the package and to use it to get the geographical coordinates of the neighborhoods in Toronto.

Once you are able to create the above dataframe, submit a link to the new Notebook on your Github repository. (2 marks)

## Importing Geospatial Data 

The geospatial data is obtained by importing the provided file as a data frame. This data frame is then merged with the data frame created by scraping the Wikipedia page.

In [20]:
# The code was removed by Watson Studio for sharing.

Postal Code,Latitude,Longitude
M1B,43.806686,-79.194353
M1C,43.784535,-79.160497
M1E,43.763573,-79.188711
M1G,43.770992,-79.216917
M1H,43.773136,-79.239476


Merge the data to form a complete data set for further analysis.

In [21]:
Toronto = Toronto_Data.merge(df_data_0, how='inner', left_index=True, right_index=True)

print(len(Toronto.index))
Toronto.head()

103


Unnamed: 0,Borough,Neighborhood,Latitude,Longitude
M3A,North York,Parkwoods,43.753259,-79.329656
M4A,North York,Victoria Village,43.725882,-79.315572
M5A,Downtown Toronto,"Regent Park , Harbourfront",43.65426,-79.360636
M6A,North York,"Lawrence Manor , Lawrence Heights",43.718518,-79.464763
M7A,Downtown Toronto,"Queen's Park , Ontario Provincial Government",43.662301,-79.389494
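
Because the cell that loads the coordinates file is not shown, the following is a minimal sketch of how the load and merge might fit together. It assumes the file is a CSV with the columns `Postal Code`, `Latitude`, and `Longitude`, and it reads from an in-memory string holding three rows copied from the tables above. The names `csv_text`, `coords`, `boroughs`, and `merged` are introduced for this example only.

```python
import io

import pandas as pd

# Hypothetical stand-in for the coordinates file (three rows taken from the tables above).
csv_text = """Postal Code,Latitude,Longitude
M3A,43.753259,-79.329656
M4A,43.725882,-79.315572
M1B,43.806686,-79.194353
"""
coords = pd.read_csv(io.StringIO(csv_text), index_col="Postal Code")

boroughs = pd.DataFrame(
    {
        "Borough": ["North York", "North York"],
        "Neighborhood": ["Parkwoods", "Victoria Village"],
    },
    index=pd.Index(["M3A", "M4A"], name="PostalCode"),
)

# An inner join on the index keeps only postal codes present in both frames.
merged = boroughs.merge(coords, how="inner", left_index=True, right_index=True)
print(merged)
```

Comparing the row count before and after the merge, as the notebook does, is a quick way to confirm that no postal code was lost to a key mismatch.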


In [22]:
address = 'Toronto, CA'

geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Toronto City are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Toronto City are 43.6534817, -79.3839347.


In [26]:
# create map of Toronto using latitude and longitude values
map_toronto = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, borough, neighborhood in zip(Toronto['Latitude'], Toronto['Longitude'], Toronto['Borough'], Toronto['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=20,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto)  
    
map_toronto

### Assignment Question 3

Explore and cluster the neighborhoods in Toronto. You can decide to work with only boroughs that contain the word Toronto and then replicate the same analysis we did to the New York City data. It is up to you.

Just make sure:

1. to add enough Markdown cells to explain what you decided to do and to report any observations you make.
2. to generate maps to visualize your neighborhoods and how they cluster together.  

Once you are happy with your analysis, submit a link to the new Notebook on your Github repository. (3 marks)

In [27]:
# The code was removed by Watson Studio for sharing.

We will use the Foursquare API to explore the boroughs in the data set. We will look at the number of venues in each borough as a way to cluster the boroughs. This technique is very simple, but it demonstrates the capability of the Foursquare API and the power of geospatial data.

#### Implementation of the Foursquare API

In [37]:
def getNearbyVenues(names, latitudes, longitudes, radius=500, LIMIT=1000):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
              
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Borough', 
                  'Borough Latitude', 
                  'Borough Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
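
The function above builds the request URL with string formatting and assumes every venue has at least one category. As a hedged alternative, the sketch below builds the query string with `urllib.parse.urlencode`, which escapes the values, and reads the category defensively. The helper names `build_explore_url` and `first_category_name`, the credentials, and the version string are placeholders for illustration. No request is sent.

```python
from urllib.parse import urlencode


def build_explore_url(client_id, client_secret, version, lat, lng, radius=500, limit=100):
    """Build the explore URL with escaped query parameters."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": f"{lat},{lng}",
        "radius": radius,
        "limit": limit,
    }
    return "https://api.foursquare.com/v2/venues/explore?" + urlencode(params)


def first_category_name(venue):
    """Return the first category name, or None when the venue has no categories."""
    categories = venue.get("categories") or []
    return categories[0]["name"] if categories else None


# Placeholder credentials and version string, for illustration only.
url = build_explore_url("ID", "SECRET", "20180605", 43.753259, -79.329656)
print(url)
print(first_category_name({"categories": []}))
```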

In [29]:
# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        # fall back to the flattened column name produced by json_normalize
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

We now use the function to call the Foursquare API and retrieve the venue data.

In [38]:
Toronto_venues = getNearbyVenues(names=Toronto['Borough'],
                                   latitudes=Toronto['Latitude'],
                                   longitudes=Toronto['Longitude']
                                 )

Toronto_venues.head()

Unnamed: 0,Borough,Borough Latitude,Borough Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,North York,43.753259,-79.329656,Brookbanks Park,43.751976,-79.33214,Park
1,North York,43.753259,-79.329656,Variety Store,43.751974,-79.333114,Food & Drink Shop
2,North York,43.753259,-79.329656,Corrosion Service Company Limited,43.752432,-79.334661,Construction & Landscaping
3,North York,43.725882,-79.315572,Victoria Village Arena,43.723481,-79.315635,Hockey Arena
4,North York,43.725882,-79.315572,Tim Hortons,43.725517,-79.313103,Coffee Shop


Let's see how many venues there are per borough in Toronto.

In [39]:
Toronto_venues.groupby('Borough').count()

Borough,Borough Latitude,Borough Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Central Toronto,111,111,111,111,111,111
Downtown Toronto,1227,1227,1227,1227,1227,1227
East Toronto,123,123,123,123,123,123
East York,77,77,77,77,77,77
Etobicoke,76,76,76,76,76,76
Mississauga,12,12,12,12,12,12
North York,244,244,244,244,244,244
Scarborough,91,91,91,91,91,91
West Toronto,151,151,151,151,151,151
York,16,16,16,16,16,16


In [40]:
print('There are {} unique categories.'.format(len(Toronto_venues['Venue Category'].unique())))

There are 268 unique categories.


We will use one-hot encoding to preprocess the data before applying the k-means algorithm.

In [41]:
# one hot encoding
Toronto_onehot = pd.get_dummies(Toronto_venues[['Venue Category']], prefix="", prefix_sep="")

# add the Borough column back to the dataframe
Toronto_onehot['Borough'] = Toronto_venues['Borough'] 

# move the Borough column to the first position
fixed_columns = [Toronto_onehot.columns[-1]] + list(Toronto_onehot.columns[:-1])
Toronto_onehot = Toronto_onehot[fixed_columns]

Toronto_onehot.head()

Unnamed: 0,Borough,Accessories Store,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Antique Shop,Aquarium,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Auto Garage,Auto Workshop,BBQ Joint,Baby Store,Bagel Shop,Bakery,Bank,Bar,Baseball Field,Baseball Stadium,Basketball Court,Basketball Stadium,Beach,Bed & Breakfast,Beer Bar,Beer Store,Belgian Restaurant,Bike Shop,Bistro,Boat or Ferry,Bookstore,Boutique,Brazilian Restaurant,Breakfast Spot,Brewery,Bridal Shop,Bubble Tea Shop,Building,Burger Joint,Burrito Place,Bus Line,Bus Station,Business Service,Butcher,Café,Cajun / Creole Restaurant,Candy Store,Caribbean Restaurant,Cheese Shop,Chinese Restaurant,Chocolate Shop,Church,Climbing Gym,Clothing Store,Cocktail Bar,Coffee Shop,College Arts Building,College Auditorium,College Gym,College Rec Center,College Stadium,Colombian Restaurant,Comfort Food Restaurant,Comic Shop,Concert Hall,Construction & Landscaping,Convenience Store,Convention Center,Cosmetics Shop,Costume Shop,Coworking Space,Creperie,Cuban Restaurant,Cupcake Shop,Curling Ice,Dance Studio,Deli / Bodega,Department Store,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Distribution Center,Dog Run,Doner Restaurant,Donut Shop,Drugstore,Dumpling Restaurant,Eastern European Restaurant,Electronics Store,Empanada Restaurant,Ethiopian Restaurant,Event Space,Falafel Restaurant,Farmers Market,Fast Food Restaurant,Field,Filipino Restaurant,Fish & Chips Shop,Fish Market,Flea Market,Flower Shop,Food,Food & Drink Shop,Food Court,Food Service,Food Truck,Fountain,French Restaurant,Fried Chicken Joint,Frozen Yogurt Shop,Fruit & Vegetable Store,Furniture / Home Store,Gaming Cafe,Garden,Garden Center,Gas Station,Gastropub,Gay Bar,General Entertainment,General Travel,German Restaurant,Gift Shop,Gluten-free Restaurant,Golf Course,Gourmet Shop,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Hakka Restaurant,Harbor / Marina,Hardware Store,Health & Beauty Service,Health Food Store,Historic Site,History Museum,Hobby Shop,Hockey Arena,Home Service,Hookah Bar,Hospital,Hotel,Hotel Bar,IT Services,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Indonesian Restaurant,Intersection,Irish Pub,Italian Restaurant,Japanese Restaurant,Jazz Club,Jewelry Store,Juice Bar,Korean Restaurant,Lake,Latin American Restaurant,Light Rail Station,Lingerie Store,Liquor Store,Locksmith,Lounge,Luggage Store,Market,Martial Arts Dojo,Massage Studio,Medical Center,Mediterranean Restaurant,Men's Store,Metro Station,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Mobile Phone Shop,Modern European Restaurant,Molecular Gastronomy Restaurant,Monument / Landmark,Moroccan Restaurant,Motel,Movie Theater,Museum,Music Venue,Neighborhood,New American Restaurant,Nightclub,Noodle House,Office,Opera House,Optical Shop,Organic Grocery,Other Great Outdoors,Park,Performing Arts Venue,Pet Store,Pharmacy,Pizza Place,Plane,Playground,Plaza,Poke Place,Pool,Portuguese Restaurant,Poutine Place,Pub,Ramen Restaurant,Record Shop,Recording Studio,Rental Car Location,Restaurant,River,Roof Deck,Sake Bar,Salad Place,Salon / Barbershop,Sandwich Place,Scenic Lookout,Sculpture Garden,Seafood Restaurant,Shoe Store,Shopping Mall,Shopping Plaza,Skate Park,Skating Rink,Smoke Shop,Soccer Field,Soup Place,Spa,Speakeasy,Sporting Goods Shop,Sports Bar,Stadium,Stationery Store,Steakhouse,Strip Club,Supermarket,Supplement Shop,Sushi Restaurant,Swim School,Taco Place,Tailor Shop,Taiwanese Restaurant,Tanning Salon,Tea Room,Thai Restaurant,Theater,Theme Restaurant,Thrift / Vintage Store,Toy / Game Store,Trail,Train Station,Turkish Restaurant,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Warehouse Store,Wine Bar,Wine Shop,Wings Joint,Women's Store,Yoga Studio
0,North York,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,North York,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,North York,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,North York,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,North York,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


We group the data frame by the 'Borough' and calculate the frequency of a particular venue category
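
To see why averaging one-hot columns gives a frequency, it can help to compute the same quantity by hand. The sketch below uses plain Python counters on a few made-up (borough, category) pairs. The names `venues`, `counts`, and `frequencies` are assumptions for this illustration, not objects from the notebook.

```python
from collections import Counter, defaultdict

# Hypothetical (borough, venue category) pairs standing in for the venue table.
venues = [
    ("North York", "Park"),
    ("North York", "Coffee Shop"),
    ("North York", "Coffee Shop"),
    ("York", "Park"),
]

# Count how often each category occurs within each borough.
counts = defaultdict(Counter)
for borough, category in venues:
    counts[borough][category] += 1

# Dividing each count by the borough's total gives the same number as
# averaging that category's 0/1 indicator column over the borough's rows.
frequencies = {
    borough: {cat: n / sum(counter.values()) for cat, n in counter.items()}
    for borough, counter in counts.items()
}
print(frequencies)
```

Each borough's frequencies sum to 1, so boroughs with very different venue counts become comparable on the same scale.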

In [43]:
Toronto_grouped = Toronto_onehot.groupby('Borough').mean().reset_index()

Toronto_grouped

Unnamed: 0,Borough,Accessories Store,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Antique Shop,Aquarium,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Auto Garage,Auto Workshop,BBQ Joint,Baby Store,Bagel Shop,Bakery,Bank,Bar,Baseball Field,Baseball Stadium,Basketball Court,Basketball Stadium,Beach,Bed & Breakfast,Beer Bar,Beer Store,Belgian Restaurant,Bike Shop,Bistro,Boat or Ferry,Bookstore,Boutique,Brazilian Restaurant,Breakfast Spot,Brewery,Bridal Shop,Bubble Tea Shop,Building,Burger Joint,Burrito Place,Bus Line,Bus Station,Business Service,Butcher,Café,Cajun / Creole Restaurant,Candy Store,Caribbean Restaurant,Cheese Shop,Chinese Restaurant,Chocolate Shop,Church,Climbing Gym,Clothing Store,Cocktail Bar,Coffee Shop,College Arts Building,College Auditorium,College Gym,College Rec Center,College Stadium,Colombian Restaurant,Comfort Food Restaurant,Comic Shop,Concert Hall,Construction & Landscaping,Convenience Store,Convention Center,Cosmetics Shop,Costume Shop,Coworking Space,Creperie,Cuban Restaurant,Cupcake Shop,Curling Ice,Dance Studio,Deli / Bodega,Department Store,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Distribution Center,Dog Run,Doner Restaurant,Donut Shop,Drugstore,Dumpling Restaurant,Eastern European Restaurant,Electronics Store,Empanada Restaurant,Ethiopian Restaurant,Event Space,Falafel Restaurant,Farmers Market,Fast Food Restaurant,Field,Filipino Restaurant,Fish & Chips Shop,Fish Market,Flea Market,Flower Shop,Food,Food & Drink Shop,Food Court,Food Service,Food Truck,Fountain,French Restaurant,Fried Chicken Joint,Frozen Yogurt Shop,Fruit & Vegetable Store,Furniture / Home Store,Gaming Cafe,Garden,Garden Center,Gas Station,Gastropub,Gay Bar,General Entertainment,General Travel,German Restaurant,Gift Shop,Gluten-free Restaurant,Golf Course,Gourmet Shop,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Hakka Restaurant,Harbor / Marina,Hardware Store,Health & Beauty Service,Health Food Store,Historic Site,History Museum,Hobby Shop,Hockey Arena,Home Service,Hookah Bar,Hospital,Hotel,Hotel Bar,IT Services,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Indonesian Restaurant,Intersection,Irish Pub,Italian Restaurant,Japanese Restaurant,Jazz Club,Jewelry Store,Juice Bar,Korean Restaurant,Lake,Latin American Restaurant,Light Rail Station,Lingerie Store,Liquor Store,Locksmith,Lounge,Luggage Store,Market,Martial Arts Dojo,Massage Studio,Medical Center,Mediterranean Restaurant,Men's Store,Metro Station,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Mobile Phone Shop,Modern European Restaurant,Molecular Gastronomy Restaurant,Monument / Landmark,Moroccan Restaurant,Motel,Movie Theater,Museum,Music Venue,Neighborhood,New American Restaurant,Nightclub,Noodle House,Office,Opera House,Optical Shop,Organic Grocery,Other Great Outdoors,Park,Performing Arts Venue,Pet Store,Pharmacy,Pizza Place,Plane,Playground,Plaza,Poke Place,Pool,Portuguese Restaurant,Poutine Place,Pub,Ramen Restaurant,Record Shop,Recording Studio,Rental Car Location,Restaurant,River,Roof Deck,Sake Bar,Salad Place,Salon / Barbershop,Sandwich Place,Scenic Lookout,Sculpture Garden,Seafood Restaurant,Shoe Store,Shopping Mall,Shopping Plaza,Skate Park,Skating Rink,Smoke Shop,Soccer Field,Soup Place,Spa,Speakeasy,Sporting Goods Shop,Sports Bar,Stadium,Stationery Store,Steakhouse,Strip Club,Supermarket,Supplement Shop,Sushi Restaurant,Swim School,Taco Place,Tailor Shop,Taiwanese Restaurant,Tanning Salon,Tea Room,Thai Restaurant,Theater,Theme Restaurant,Thrift / Vintage Store,Toy / Game Store,Trail,Train Station,Turkish Restaurant,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Warehouse Store,Wine Bar,Wine Shop,Wings Joint,Women's Store,Yoga Studio
0,Central Toronto,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.018018,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.009009,0.0,0.018018,0.0,0.009009,0.009009,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.009009,0.009009,0.0,0.0,0.0,0.009009,0.0,0.018018,0.0,0.0,0.0,0.054054,0.0,0.0,0.0,0.0,0.009009,0.0,0.0,0.0,0.036036,0.0,0.072072,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.009009,0.009009,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.009009,0.036036,0.009009,0.018018,0.0,0.0,0.0,0.0,0.009009,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.009009,0.009009,0.0,0.0,0.0,0.0,0.0,0.009009,0.0,0.009009,0.0,0.0,0.0,0.0,0.0,0.009009,0.0,0.0,0.0,0.0,0.009009,0.0,0.009009,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.009009,0.009009,0.0,0.018018,0.009009,0.0,0.0,0.0,0.0,0.0,0.0,0.009009,0.0,0.0,0.0,0.0,0.0,0.009009,0.0,0.0,0.0,0.018018,0.0,0.0,0.0,0.0,0.018018,0.0,0.0,0.009009,0.0,0.0,0.0,0.0,0.009009,0.0,0.018018,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.009009,0.009009,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.054054,0.0,0.0,0.018018,0.036036,0.0,0.009009,0.0,0.0,0.0,0.0,0.0,0.027027,0.0,0.0,0.0,0.009009,0.036036,0.0,0.0,0.0,0.0,0.009009,0.063063,0.0,0.0,0.009009,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.009009,0.0,0.009009,0.009009,0.0,0.0,0.0,0.0,0.009009,0.0,0.036036,0.009009,0.0,0.0,0.0,0.0,0.0,0.009009,0.0,0.0,0.0,0.009009,0.009009,0.0,0.0,0.009009,0.0,0.0,0.009009,0.0,0.0,0.0,0.0,0.0,0.009009
1,Downtown Toronto,0.0,0.000815,0.000815,0.000815,0.00163,0.002445,0.00163,0.0163,0.00163,0.004075,0.010595,0.00163,0.00326,0.007335,0.000815,0.0,0.0,0.002445,0.000815,0.00163,0.01793,0.00489,0.012225,0.0,0.00163,0.0,0.00326,0.000815,0.000815,0.01304,0.002445,0.00163,0.0,0.00326,0.000815,0.00978,0.000815,0.00163,0.01141,0.002445,0.0,0.005705,0.002445,0.00652,0.005705,0.0,0.0,0.0,0.000815,0.05542,0.0,0.000815,0.002445,0.00489,0.004075,0.000815,0.000815,0.0,0.01467,0.00978,0.09943,0.000815,0.000815,0.000815,0.000815,0.0,0.00163,0.004075,0.00163,0.007335,0.0,0.0,0.000815,0.007335,0.0,0.0,0.00652,0.0,0.000815,0.0,0.00163,0.01141,0.004075,0.00489,0.0,0.008965,0.002445,0.002445,0.000815,0.000815,0.000815,0.0,0.000815,0.00163,0.00163,0.0,0.000815,0.002445,0.000815,0.00652,0.00652,0.0,0.000815,0.0,0.00326,0.0,0.000815,0.0,0.000815,0.00326,0.0,0.002445,0.00326,0.00652,0.00489,0.0,0.0,0.002445,0.00163,0.0,0.0,0.00163,0.012225,0.00163,0.00163,0.00326,0.000815,0.00326,0.00326,0.0,0.00326,0.002445,0.00815,0.015485,0.005705,0.0,0.000815,0.0,0.000815,0.000815,0.000815,0.00163,0.00163,0.0,0.0,0.000815,0.000815,0.03097,0.000815,0.00163,0.00815,0.00326,0.000815,0.0,0.0,0.000815,0.02771,0.026895,0.004075,0.000815,0.00489,0.000815,0.00163,0.00163,0.0,0.00326,0.002445,0.0,0.00489,0.0,0.000815,0.000815,0.0,0.0,0.00326,0.00163,0.000815,0.00815,0.00489,0.000815,0.0,0.002445,0.000815,0.00163,0.00163,0.0,0.00163,0.004075,0.00326,0.00163,0.00489,0.002445,0.000815,0.00163,0.00163,0.00163,0.000815,0.000815,0.017115,0.00163,0.000815,0.002445,0.010595,0.000815,0.00163,0.00489,0.002445,0.0,0.00163,0.00163,0.010595,0.00326,0.000815,0.0,0.000815,0.03586,0.0,0.00163,0.000815,0.00978,0.00489,0.00978,0.002445,0.00163,0.017115,0.00163,0.004075,0.0,0.0,0.000815,0.002445,0.0,0.00163,0.004075,0.002445,0.007335,0.000815,0.0,0.0,0.010595,0.000815,0.000815,0.0,0.013855,0.0,0.000815,0.004075,0.000815,0.000815,0.00978,0.01141,0.010595,0.000815,0.0,0.0,0.000815,0.002445,0.0,0.00978,0.00163,0.0,0.00326,0.0,0.00489,0.000815,0.0,0.000815,0.005705
2,East Toronto,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02439,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.00813,0.0,0.0,0.0,0.02439,0.00813,0.00813,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02439,0.0,0.0,0.0,0.04065,0.0,0.00813,0.0,0.0,0.01626,0.0,0.0,0.0,0.0,0.04065,0.0,0.0,0.00813,0.00813,0.0,0.0,0.0,0.0,0.00813,0.0,0.04878,0.0,0.0,0.0,0.0,0.0,0.0,0.00813,0.00813,0.0,0.0,0.00813,0.0,0.00813,0.0,0.00813,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.00813,0.0,0.01626,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.00813,0.02439,0.0,0.0,0.00813,0.00813,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.00813,0.00813,0.01626,0.0,0.00813,0.00813,0.0,0.01626,0.00813,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.065041,0.00813,0.00813,0.00813,0.0,0.0,0.0,0.0,0.00813,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03252,0.00813,0.0,0.0,0.0,0.0,0.04065,0.00813,0.0,0.0,0.00813,0.0,0.0,0.00813,0.01626,0.0,0.01626,0.0,0.00813,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.00813,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.00813,0.0,0.0,0.01626,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02439,0.0,0.01626,0.0,0.02439,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02439,0.0,0.0,0.00813,0.0,0.02439,0.0,0.0,0.0,0.0,0.0,0.01626,0.0,0.0,0.00813,0.0,0.0,0.0,0.00813,0.0,0.00813,0.0,0.0,0.01626,0.0,0.0,0.0,0.0,0.00813,0.00813,0.0,0.0,0.0,0.00813,0.0,0.0,0.0,0.0,0.0,0.0,0.00813,0.0,0.0,0.0,0.0,0.01626,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.00813,0.0,0.0,0.0,0.02439
3,East York,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.012987,0.0,0.0,0.0,0.0,0.012987,0.0,0.051948,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.025974,0.0,0.012987,0.0,0.0,0.0,0.0,0.0,0.012987,0.012987,0.0,0.0,0.0,0.038961,0.0,0.012987,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.051948,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.012987,0.0,0.012987,0.0,0.0,0.0,0.0,0.0,0.012987,0.012987,0.0,0.012987,0.012987,0.0,0.012987,0.012987,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.012987,0.0,0.0,0.0,0.0,0.0,0.025974,0.0,0.0,0.012987,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.025974,0.0,0.0,0.0,0.012987,0.012987,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.025974,0.025974,0.012987,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.025974,0.0,0.0,0.012987,0.0,0.0,0.0,0.0,0.0,0.012987,0.0,0.0,0.0,0.0,0.0,0.025974,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.012987,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.051948,0.0,0.025974,0.038961,0.038961,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.025974,0.0,0.0,0.0,0.0,0.0,0.025974,0.0,0.0,0.0,0.0,0.012987,0.0,0.0,0.025974,0.0,0.0,0.0,0.012987,0.0,0.038961,0.012987,0.0,0.0,0.0,0.0,0.025974,0.0,0.012987,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.012987,0.0,0.012987,0.0,0.0,0.0,0.0,0.012987
4,Etobicoke,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.013158,0.0,0.0,0.0,0.0,0.0,0.0,0.013158,0.0,0.0,0.0,0.0,0.0,0.026316,0.0,0.013158,0.013158,0.0,0.0,0.0,0.0,0.0,0.0,0.026316,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.013158,0.013158,0.013158,0.0,0.0,0.0,0.039474,0.0,0.0,0.0,0.0,0.013158,0.0,0.0,0.0,0.0,0.0,0.052632,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.013158,0.026316,0.0,0.013158,0.0,0.0,0.0,0.0,0.0,0.0,0.013158,0.0,0.0,0.0,0.0,0.0,0.026316,0.0,0.0,0.0,0.0,0.013158,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.039474,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.026316,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.039474,0.039474,0.0,0.0,0.0,0.013158,0.0,0.0,0.0,0.0,0.0,0.0,0.013158,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.039474,0.013158,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.013158,0.0,0.013158,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.013158,0.0,0.013158,0.039474,0.105263,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.013158,0.0,0.0,0.0,0.013158,0.013158,0.013158,0.0,0.0,0.0,0.0,0.065789,0.0,0.0,0.013158,0.0,0.0,0.013158,0.0,0.013158,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.013158,0.0,0.0,0.0,0.0,0.0,0.013158,0.0,0.0,0.0,0.0,0.013158,0.0,0.0,0.0,0.0,0.0,0.0,0.013158,0.0,0.0,0.0,0.0,0.013158,0.0,0.0
5,Mississauga,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
6,North York,0.004098,0.004098,0.0,0.0,0.0,0.0,0.0,0.008197,0.0,0.0,0.0,0.0,0.004098,0.012295,0.008197,0.0,0.0,0.0,0.0,0.0,0.012295,0.032787,0.008197,0.016393,0.0,0.004098,0.0,0.0,0.0,0.0,0.008197,0.0,0.004098,0.0,0.0,0.0,0.008197,0.0,0.0,0.0,0.004098,0.004098,0.0,0.004098,0.004098,0.0,0.008197,0.008197,0.008197,0.020492,0.0,0.0,0.008197,0.0,0.008197,0.004098,0.0,0.0,0.045082,0.0,0.069672,0.0,0.0,0.0,0.0,0.0,0.0,0.004098,0.0,0.004098,0.008197,0.008197,0.0,0.008197,0.0,0.0,0.0,0.0,0.004098,0.0,0.0,0.008197,0.004098,0.004098,0.004098,0.004098,0.012295,0.0,0.004098,0.0,0.0,0.0,0.0,0.0,0.008197,0.004098,0.0,0.004098,0.0,0.0,0.020492,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.004098,0.008197,0.004098,0.004098,0.0,0.004098,0.004098,0.004098,0.0,0.008197,0.0,0.0,0.0,0.004098,0.0,0.0,0.0,0.0,0.0,0.004098,0.0,0.004098,0.0,0.004098,0.020492,0.012295,0.008197,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.004098,0.004098,0.0,0.0,0.008197,0.0,0.0,0.008197,0.004098,0.0,0.004098,0.004098,0.0,0.012295,0.028689,0.0,0.0,0.012295,0.0,0.0,0.0,0.0,0.0,0.012295,0.0,0.004098,0.004098,0.0,0.0,0.004098,0.0,0.004098,0.0,0.0,0.0,0.008197,0.012295,0.0,0.0,0.0,0.0,0.0,0.0,0.008197,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.028689,0.0,0.004098,0.016393,0.032787,0.0,0.004098,0.004098,0.0,0.004098,0.004098,0.0,0.008197,0.012295,0.0,0.0,0.0,0.036885,0.0,0.0,0.0,0.0,0.004098,0.02459,0.0,0.0,0.0,0.004098,0.016393,0.0,0.0,0.0,0.0,0.0,0.0,0.004098,0.0,0.012295,0.0,0.0,0.0,0.004098,0.0,0.008197,0.004098,0.016393,0.0,0.0,0.0,0.0,0.0,0.004098,0.008197,0.004098,0.0,0.0,0.004098,0.0,0.0,0.0,0.0,0.004098,0.004098,0.008197,0.0,0.0,0.0,0.0,0.016393,0.0
7,Scarborough,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.010989,0.0,0.0,0.0,0.0,0.0,0.0,0.010989,0.010989,0.0,0.0,0.0,0.0,0.043956,0.043956,0.010989,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.043956,0.0,0.0,0.0,0.0,0.0,0.0,0.021978,0.010989,0.0,0.0,0.010989,0.0,0.0,0.010989,0.0,0.043956,0.0,0.0,0.0,0.010989,0.0,0.054945,0.0,0.0,0.0,0.0,0.010989,0.0,0.0,0.0,0.0,0.0,0.043956,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.010989,0.0,0.0,0.0,0.010989,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.010989,0.0,0.0,0.0,0.0,0.0,0.043956,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.021978,0.0,0.0,0.0,0.0,0.0,0.0,0.021978,0.0,0.0,0.010989,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.010989,0.010989,0.0,0.010989,0.0,0.0,0.0,0.0,0.0,0.0,0.010989,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.010989,0.021978,0.0,0.0,0.021978,0.0,0.010989,0.0,0.0,0.010989,0.0,0.010989,0.0,0.010989,0.0,0.0,0.0,0.0,0.010989,0.0,0.0,0.0,0.0,0.010989,0.0,0.0,0.0,0.010989,0.010989,0.0,0.0,0.0,0.0,0.0,0.0,0.010989,0.0,0.0,0.0,0.0,0.0,0.0,0.010989,0.0,0.0,0.0,0.0,0.0,0.021978,0.0,0.010989,0.032967,0.032967,0.0,0.021978,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.010989,0.0,0.0,0.0,0.0,0.0,0.0,0.021978,0.0,0.0,0.0,0.0,0.021978,0.0,0.0,0.021978,0.0,0.010989,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.010989,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.021978,0.0,0.0,0.0,0.0,0.0,0.010989,0.0,0.0,0.0,0.0,0.010989,0.0,0.0,0.0,0.0,0.0,0.0
8,West Toronto,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.006623,0.0,0.006623,0.013245,0.0,0.0,0.0,0.0,0.0,0.0,0.02649,0.013245,0.059603,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.006623,0.0,0.0,0.0,0.0,0.019868,0.013245,0.0,0.019868,0.013245,0.0,0.0,0.0,0.0,0.013245,0.0,0.0,0.0,0.0,0.072848,0.006623,0.0,0.0,0.0,0.0,0.0,0.0,0.006623,0.0,0.006623,0.046358,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.006623,0.0,0.0,0.0,0.0,0.0,0.013245,0.006623,0.0,0.0,0.0,0.0,0.013245,0.0,0.013245,0.0,0.0,0.006623,0.0,0.0,0.0,0.0,0.006623,0.0,0.0,0.0,0.0,0.006623,0.0,0.006623,0.0,0.0,0.006623,0.0,0.006623,0.0,0.006623,0.0,0.0,0.0,0.0,0.0,0.013245,0.006623,0.0,0.0,0.013245,0.0,0.0,0.0,0.0,0.013245,0.0,0.0,0.0,0.0,0.019868,0.0,0.0,0.006623,0.006623,0.019868,0.006623,0.006623,0.0,0.0,0.0,0.0,0.006623,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.006623,0.0,0.006623,0.0,0.006623,0.0,0.039735,0.006623,0.0,0.0,0.013245,0.006623,0.0,0.006623,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.013245,0.0,0.013245,0.006623,0.006623,0.0,0.0,0.0,0.0,0.0,0.0,0.006623,0.0,0.013245,0.0,0.006623,0.006623,0.0,0.0,0.0,0.0,0.0,0.0,0.02649,0.006623,0.006623,0.013245,0.02649,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.013245,0.0,0.006623,0.0,0.0,0.039735,0.0,0.0,0.0,0.0,0.0,0.006623,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.006623,0.0,0.0,0.006623,0.0,0.0,0.0,0.006623,0.0,0.013245,0.0,0.0,0.0,0.0,0.0,0.006623,0.013245,0.006623,0.0,0.0,0.0,0.0,0.0,0.0,0.019868,0.0,0.0,0.013245,0.0,0.006623,0.006623,0.0,0.0,0.013245
9,York,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0


Let's look at the five most common venue categories in each borough.

In [45]:
num_top_venues = 5

for borough in Toronto_grouped['Borough']:
    print("----"+borough+"----")
    temp = Toronto_grouped[Toronto_grouped['Borough'] == borough].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Central Toronto----
            venue  freq
0     Coffee Shop  0.07
1  Sandwich Place  0.06
2            Park  0.05
3            Café  0.05
4    Dessert Shop  0.04


----Downtown Toronto----
                venue  freq
0         Coffee Shop  0.10
1                Café  0.06
2          Restaurant  0.04
3               Hotel  0.03
4  Italian Restaurant  0.03


----East Toronto----
                venue  freq
0    Greek Restaurant  0.07
1         Coffee Shop  0.05
2  Italian Restaurant  0.04
3             Brewery  0.04
4                Café  0.04


----East York----
                 venue  freq
0                 Bank  0.05
1          Coffee Shop  0.05
2                 Park  0.05
3  Sporting Goods Shop  0.04
4         Burger Joint  0.04


----Etobicoke----
            venue  freq
0     Pizza Place  0.11
1  Sandwich Place  0.07
2     Coffee Shop  0.05
3            Café  0.04
4   Grocery Store  0.04


----Mississauga----
                      venue  freq
0                     Hotel  0.1

It seems like coffee shops are the order of the day in Toronto!
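The per-borough ranking above can also be produced without an explicit loop. The following is a minimal sketch, assuming a frequency table shaped like `Toronto_grouped` (one `Borough` column plus one column per venue category). The table `toy_grouped`, its values, and the helper `top_venues` are hypothetical and exist only for illustration.

```python
import pandas as pd

# Hypothetical frequency table shaped like Toronto_grouped.
# Borough names and values are invented for illustration only.
toy_grouped = pd.DataFrame({
    "Borough": ["A", "B"],
    "Coffee Shop": [0.10, 0.05],
    "Park": [0.02, 0.07],
    "Bank": [0.06, 0.01],
})

def top_venues(grouped, n):
    """Return the n most frequent venue categories for every borough."""
    # Reshape from wide (one column per venue) to long (one row per borough/venue pair)
    long_form = grouped.melt(id_vars="Borough", var_name="venue", value_name="freq")
    # Sort so that the most frequent venues come first within each borough
    long_form = long_form.sort_values(["Borough", "freq"], ascending=[True, False])
    # Keep the first n rows of each borough
    return long_form.groupby("Borough").head(n).reset_index(drop=True)

top2 = top_venues(toy_grouped, 2)
print(top2)
```

The result is a single long-format dataframe rather than printed blocks, which makes it easier to count how often a category such as Coffee Shop appears across boroughs.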

#### K-Means Clustering

We use the K-Means algorithm to cluster the boroughs of Toronto. Five clusters are chosen for this exercise.

In [46]:
# set number of clusters
kclusters = 5

# drop the non-numeric Borough column so that only venue frequencies are clustered
Toronto_grouped_clustering = Toronto_grouped.drop(columns='Borough')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(Toronto_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([2, 2, 2, 4, 0, 3, 2, 4, 2, 1], dtype=int32)
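To make the clustering step above less of a black box, here is a minimal pure-Python sketch of the idea behind K-Means (alternating an assignment step and an update step). It is not scikit-learn's implementation: the function `kmeans_sketch`, the toy points, and the simple "first k points" initialisation are assumptions made for illustration, whereas `KMeans` uses a more careful initialisation by default.

```python
def kmeans_sketch(points, k, iterations=10):
    """Toy K-Means: returns (labels, centroids) for a list of equal-length tuples."""
    # Assumption: use the first k points as starting centroids (deterministic, illustrative only)
    centroids = list(points[:k])
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest centroid (squared Euclidean distance)
        labels = [
            min(range(k), key=lambda c: sum((p - q) ** 2 for p, q in zip(point, centroids[c])))
            for point in points
        ]
        # Update step: move each centroid to the mean of the points assigned to it
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # leave an empty cluster's centroid where it is
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return labels, centroids

# Two obvious groups: points near (0, 0) and points near (1, 1)
toy_points = [(0.0, 0.1), (0.9, 1.0), (0.1, 0.0), (1.0, 0.9)]
toy_labels, toy_centroids = kmeans_sketch(toy_points, k=2)
print(toy_labels)
print(toy_centroids)
```

In the notebook, each "point" is a borough and each coordinate is the frequency of one venue category, so boroughs with similar venue mixes end up in the same cluster.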

In [47]:
# add clustering labels
Toronto_grouped.insert(0, 'Cluster Labels', kmeans.labels_)

In [52]:
Toronto_Labelled = Toronto_grouped[['Cluster Labels', 'Borough']]

Toronto_Labelled

Unnamed: 0,Cluster Labels,Borough
0,2,Central Toronto
1,2,Downtown Toronto
2,2,East Toronto
3,4,East York
4,0,Etobicoke
5,3,Mississauga
6,2,North York
7,4,Scarborough
8,2,West Toronto
9,1,York
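The table above is easier to interpret when the boroughs are listed per cluster. The sketch below copies the label and borough pairs from that table into a plain list (the variable names are illustrative) and groups them by cluster label.

```python
from collections import defaultdict

# (cluster label, borough) pairs copied from the table above
labelled = [
    (2, "Central Toronto"), (2, "Downtown Toronto"), (2, "East Toronto"),
    (4, "East York"), (0, "Etobicoke"), (3, "Mississauga"),
    (2, "North York"), (4, "Scarborough"), (2, "West Toronto"), (1, "York"),
]

# Collect the boroughs that share a cluster label
clusters = defaultdict(list)
for label, borough in labelled:
    clusters[label].append(borough)

for label in sorted(clusters):
    print(label, clusters[label])
```

Five of the ten boroughs share cluster 2, while Etobicoke, Mississauga and York each form a cluster of their own.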


In [53]:
# Take the mean latitude and longitude of each borough's postal codes as its center point

Toronto_lat_lng = Toronto.groupby("Borough")[['Latitude', 'Longitude']].mean()

Toronto_lat_lng

Borough,Latitude,Longitude
Central Toronto,43.70198,-79.398954
Downtown Toronto,43.654597,-79.383972
East Toronto,43.669436,-79.324654
East York,43.700303,-79.335851
Etobicoke,43.660043,-79.542074
Mississauga,43.636966,-79.615819
North York,43.750727,-79.429338
Scarborough,43.766229,-79.249085
West Toronto,43.652653,-79.44929
York,43.690797,-79.472633


In [55]:
# We create a final data set by merging the clustered data frame to the lat and long coordinates

Toronto_final = Toronto_Labelled.merge(Toronto_lat_lng, on='Borough')

Toronto_final

Unnamed: 0,Cluster Labels,Borough,Latitude,Longitude
0,2,Central Toronto,43.70198,-79.398954
1,2,Downtown Toronto,43.654597,-79.383972
2,2,East Toronto,43.669436,-79.324654
3,4,East York,43.700303,-79.335851
4,0,Etobicoke,43.660043,-79.542074
5,3,Mississauga,43.636966,-79.615819
6,2,North York,43.750727,-79.429338
7,4,Scarborough,43.766229,-79.249085
8,2,West Toronto,43.652653,-79.44929
9,1,York,43.690797,-79.472633
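A merge like the one above silently drops or duplicates rows if the keys do not line up. As a hedged sketch, the example below rebuilds two rows from the tables above and asks pandas to check that each borough appears exactly once on both sides; the variable names `labels_df`, `coords_df` and `merged` are illustrative and not part of the notebook.

```python
import pandas as pd

# Two rows copied from the tables above
labels_df = pd.DataFrame({
    "Cluster Labels": [2, 4],
    "Borough": ["East Toronto", "East York"],
})
coords_df = pd.DataFrame(
    {"Latitude": [43.669436, 43.700303], "Longitude": [-79.324654, -79.335851]},
    index=pd.Index(["East Toronto", "East York"], name="Borough"),
)

# reset_index() turns the Borough index into a regular column before merging;
# validate="one_to_one" raises an error if a borough is duplicated on either side
merged = labels_df.merge(
    coords_df.reset_index(), on="Borough", how="left", validate="one_to_one"
)

# With a left join, a borough without coordinates would show up as NaN
print(merged["Latitude"].isna().sum())
print(merged)
```

Using `how="left"` keeps every labelled borough in the result, so a missing coordinate is visible instead of the row disappearing.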


In [59]:
# set color scheme for the clusters
colors = ['green', 'blue', 'red', 'orange', 'grey']

In [60]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

In [64]:
# add markers to the map
for lat, lon, poi, cluster in zip(Toronto_final['Latitude'], Toronto_final['Longitude'],
                                  Toronto_final['Borough'], Toronto_final['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=50,
        popup=label,
        color=colors[cluster],
        fill=True,
        fill_color=colors[cluster],
        fill_opacity=0.7).add_to(map_clusters)
    
map_clusters

We have now clustered and visualised the data we scraped from Wikipedia, using the venues we obtained from the Foursquare API. Other approaches could be applied to the same data to address different business problems.