Capstone Project

**I. Import Libraries**

In [1]:
import pandas as pd # library for data analysis
pd.set_option("display.max_columns", None)
pd.set_option("display.max_rows", None)

import numpy as np

import json # library to handle JSON files

from geopy.geocoders import Nominatim # convert an address into latitude and longitude values
import geocoder # to get coordinates

import requests # library to handle requests
from bs4 import BeautifulSoup # library to parse HTML and XML documents

from pandas import json_normalize # transform JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

import folium # map rendering library

print("Libraries imported.")

Libraries imported.


**II. Scrape data from Wikipedia into a Dataframe**

In [2]:
# send the GET request
data = requests.get("https://en.wikipedia.org/wiki/Category:Neighbourhoods_in_London,_Ontario").text

In [3]:
# parse data from the html into a beautifulsoup object
soup = BeautifulSoup(data, 'html.parser')

In [4]:
# create a list to store neighborhood data
neighborhoodList = []

In [5]:
# append the data into the list
for row in soup.find_all("div", class_="mw-category")[0].find_all("li"):
    neighborhoodList.append(row.text)

In [6]:
# create a new DataFrame from the list
Lon_df = pd.DataFrame({"Neighborhood": neighborhoodList})

Lon_df.head()

Unnamed: 0,Neighborhood
0,"Byron, Ontario"
1,"Huron Heights, London, Ontario"
2,"Lambeth, London, Ontario"
3,Oakridge Acres
4,"Tempo, Ontario"


In [7]:
# print the number of rows of the dataframe
Lon_df.shape

(9, 1)
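
The scraped page titles mix bare names (Oakridge Acres) with disambiguated ones (Byron, Ontario). Because the geocoding step appends ", London, Ontario" to every name, a title such as "Byron, Ontario" turns into the query "Byron, Ontario, London, Ontario". Below is a minimal sketch of one way to shorten the titles first. `short_name` is a hypothetical helper that is not part of this notebook, and whether the shorter query geocodes any better has not been tested here.

```python
def short_name(title):
    """Return the part of a Wikipedia page title before the first comma."""
    return title.split(",")[0].strip()


# a few of the titles scraped above
titles = [
    "Byron, Ontario",
    "Huron Heights, London, Ontario",
    "Oakridge Acres",
    "Westminster, Middlesex County, Ontario",
]

# build the query string the same way the geocoding step does
queries = ["{}, London, Ontario".format(short_name(t)) for t in titles]
for query in queries:
    print(query)
```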

**III. Getting the geographical coordinates**

In [8]:
# define a function to get coordinates
def get_latlng(neighborhood, max_tries=5):
    # retry a bounded number of times so a failed lookup cannot loop forever
    for _ in range(max_tries):
        g = geocoder.arcgis('{}, London, Ontario'.format(neighborhood))
        if g.latlng is not None:
            return g.latlng
    # give up with a clear error instead of hanging the notebook
    raise RuntimeError('No coordinates found for {}'.format(neighborhood))

In [9]:
# call the function to get the coordinates, store in a new list using list comprehension
coords = [ get_latlng(neighborhood) for neighborhood in Lon_df["Neighborhood"].tolist() ]

In [10]:
coords

[[42.95107000000007, -81.33063999999996],
 [43.01885000000004, -81.20267999999999],
 [42.90483000000006, -81.28664999999995],
 [42.97420000000005, -81.30299999999994],
 [42.85448000000008, -81.27288999999996],
 [43.037360000000035, -81.28525999999994],
 [42.94171000000006, -81.20851999999996],
 [42.949220000000025, -81.28945999999996],
 [42.96853952434288, -81.25074747926911]]
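
A geocoder can silently return a point in the wrong city, so it may be worth checking the results before using them. The sketch below tests each pair against a rough bounding box around London, Ontario. The box limits are an assumption chosen by eye, not an official city boundary, and `in_bounds` is a hypothetical helper.

```python
# rough bounding box around London, Ontario (assumed values, not an official boundary)
LAT_MIN, LAT_MAX = 42.80, 43.10
LNG_MIN, LNG_MAX = -81.45, -81.05

# the nine coordinate pairs returned above, rounded to five decimals
coords = [
    [42.95107, -81.33064],
    [43.01885, -81.20268],
    [42.90483, -81.28665],
    [42.97420, -81.30300],
    [42.85448, -81.27289],
    [43.03736, -81.28526],
    [42.94171, -81.20852],
    [42.94922, -81.28946],
    [42.96854, -81.25075],
]


def in_bounds(lat, lng):
    """True if the point lies inside the assumed bounding box."""
    return LAT_MIN <= lat <= LAT_MAX and LNG_MIN <= lng <= LNG_MAX


# collect any pair that falls outside the box
outliers = [pair for pair in coords if not in_bounds(*pair)]
print("points outside the box:", outliers)
```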

In [11]:
# create temporary dataframe to populate the coordinates into Latitude and Longitude
df_coords = pd.DataFrame(coords, columns=['Latitude', 'Longitude'])

In [12]:
# merge the coordinates into the original dataframe
Lon_df['Latitude'] = df_coords['Latitude']
Lon_df['Longitude'] = df_coords['Longitude']

In [13]:
# check the neighborhoods and the coordinates
print(Lon_df.shape)
Lon_df

(9, 3)


Unnamed: 0,Neighborhood,Latitude,Longitude
0,"Byron, Ontario",42.95107,-81.33064
1,"Huron Heights, London, Ontario",43.01885,-81.20268
2,"Lambeth, London, Ontario",42.90483,-81.28665
3,Oakridge Acres,42.9742,-81.303
4,"Tempo, Ontario",42.85448,-81.27289
5,"Uplands, Ontario",43.03736,-81.28526
6,"Westminster, Middlesex County, Ontario",42.94171,-81.20852
7,"Westmount, London, Ontario",42.94922,-81.28946
8,Wortley Village,42.96854,-81.250747


In [14]:
# save the DataFrame as CSV file
Lon_df.to_csv("Lon_df.csv", index=False)

**IV. Create a map of London with neighborhoods superimposed on top**

In [15]:
# get the coordinates of London, Ontario
address = 'London, Ontario'

geolocator = Nominatim(user_agent="my-application")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of London, Ontario are {}, {}.'.format(latitude, longitude))

The geographical coordinates of London, Ontario are 42.9836747, -81.2496068.


In [16]:
# create map of London using latitude and longitude values
map_Lon = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, neighborhood in zip(Lon_df['Latitude'], Lon_df['Longitude'], Lon_df['Neighborhood']):
    label = '{}'.format(neighborhood)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_Lon)  
    
map_Lon

In [17]:
# save the map as HTML file
map_Lon.save('map_Lon.html')

**V. Use the Foursquare API to explore the neighborhoods of London**

In [18]:
# define Foursquare Credentials and Version
CLIENT_ID = 'YAKFMH5XSB41TDVDWVCSHHLMB12TUNWZ3ZFCZXKL53UQ2GH2' # your Foursquare ID
CLIENT_SECRET = 'JSYEL2ZFSFTR0MADI400X0YTQ04GR5ZOFJZXAEIHLHGODSHA' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: YAKFMH5XSB41TDVDWVCSHHLMB12TUNWZ3ZFCZXKL53UQ2GH2
CLIENT_SECRET:JSYEL2ZFSFTR0MADI400X0YTQ04GR5ZOFJZXAEIHLHGODSHA
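
Hard-coding and printing API credentials makes them visible to anyone who sees the notebook. A common alternative is to read them from environment variables and print only a masked form. The sketch below assumes the variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET`, which are hypothetical, and uses placeholder values for the demonstration.

```python
import os


def load_foursquare_credentials(env=os.environ):
    """Read the two credentials from a mapping (the process environment by default)."""
    client_id = env.get("FOURSQUARE_CLIENT_ID")          # hypothetical variable name
    client_secret = env.get("FOURSQUARE_CLIENT_SECRET")  # hypothetical variable name
    if not client_id or not client_secret:
        raise RuntimeError("Set FOURSQUARE_CLIENT_ID and FOURSQUARE_CLIENT_SECRET first")
    return client_id, client_secret


def mask(secret, visible=4):
    """Show only the first few characters of a secret."""
    return secret[:visible] + "*" * (len(secret) - visible)


# demonstration with placeholder values instead of real credentials
demo_env = {
    "FOURSQUARE_CLIENT_ID": "EXAMPLEID123",
    "FOURSQUARE_CLIENT_SECRET": "EXAMPLESECRET456",
}
client_id, client_secret = load_foursquare_credentials(demo_env)
print("CLIENT_ID:", mask(client_id))
print("CLIENT_SECRET:", mask(client_secret))
```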


**Get the top 100 venues that are within a radius of 2000 meters.**

In [19]:
radius = 2000
LIMIT = 100

venues = []

for lat, long, neighborhood in zip(Lon_df['Latitude'], Lon_df['Longitude'], Lon_df['Neighborhood']):
    
    # create the API request URL
    url = "https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}".format(
        CLIENT_ID,
        CLIENT_SECRET,
        VERSION,
        lat,
        long,
        radius, 
        LIMIT)
    
    # make the GET request
    results = requests.get(url).json()["response"]['groups'][0]['items']
    
    # return only relevant information for each nearby venue
    for venue in results:
        venues.append((
            neighborhood,
            lat, 
            long, 
            venue['venue']['name'], 
            venue['venue']['location']['lat'], 
            venue['venue']['location']['lng'],  
            venue['venue']['categories'][0]['name']))
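
The loop above assumes every response has a first group and every venue has at least one category, so an unexpected response stops the whole run with an error. Below is a hedged sketch of a more defensive version, run against a hand-made payload rather than a live request. `build_explore_url` and `extract_rows` are hypothetical helpers, and the second venue in the sample payload is invented to show the empty-category case.

```python
from urllib.parse import urlencode

EXPLORE_URL = "https://api.foursquare.com/v2/venues/explore"  # same endpoint as the loop above


def build_explore_url(client_id, client_secret, version, lat, lng, radius, limit):
    """Build the request URL with properly encoded query parameters."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": "{},{}".format(lat, lng),
        "radius": radius,
        "limit": limit,
    }
    return EXPLORE_URL + "?" + urlencode(params)


def extract_rows(payload, neighborhood, lat, lng):
    """Turn one decoded JSON response into venue tuples, tolerating missing parts."""
    groups = payload.get("response", {}).get("groups", [])
    items = groups[0].get("items", []) if groups else []
    rows = []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories") or []
        category = categories[0]["name"] if categories else None
        rows.append((neighborhood, lat, lng, venue["name"],
                     venue["location"]["lat"], venue["location"]["lng"], category))
    return rows


# hand-made payload: the first venue is taken from the results table, the second is invented
sample = {"response": {"groups": [{"items": [
    {"venue": {"name": "Byron Pizza",
               "location": {"lat": 42.958386, "lng": -81.331232},
               "categories": [{"name": "Pizza Place"}]}},
    {"venue": {"name": "Venue Without Category",
               "location": {"lat": 42.95, "lng": -81.33},
               "categories": []}},
]}]}}

url = build_explore_url("ID", "SECRET", "20180605", 42.95107, -81.33064, 2000, 100)
rows = extract_rows(sample, "Byron, Ontario", 42.95107, -81.33064)
print(url)
print(rows)
```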

In [20]:
# convert the venues list into a new DataFrame
venues_df = pd.DataFrame(venues)

# define the column names
venues_df.columns = ['Neighborhood', 'Latitude', 'Longitude', 'VenueName', 'VenueLatitude', 'VenueLongitude', 'VenueCategory']

print(venues_df.shape)
venues_df.head()

(442, 7)


Unnamed: 0,Neighborhood,Latitude,Longitude,VenueName,VenueLatitude,VenueLongitude,VenueCategory
0,"Byron, Ontario",42.95107,-81.33064,Byron Pizza,42.958386,-81.331232,Pizza Place
1,"Byron, Ontario",42.95107,-81.33064,Springbank Park,42.958367,-81.321894,Park
2,"Byron, Ontario",42.95107,-81.33064,Storybook Gardens,42.957501,-81.316424,Theme Park
3,"Byron, Ontario",42.95107,-81.33064,Starbucks,42.960321,-81.334134,Coffee Shop
4,"Byron, Ontario",42.95107,-81.33064,LCBO,42.958275,-81.331203,Liquor Store
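
As a sanity check on the 2000 m radius, the great-circle distance between a neighborhood's coordinates and one of its venues can be computed with the haversine formula. The sketch below uses the first row of the table above. `haversine_m` is a hypothetical helper, and the Earth radius is the usual mean value, so the result is an approximation.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_M = 6371000  # mean Earth radius in metres (approximation)


def haversine_m(lat1, lng1, lat2, lng2):
    """Great-circle distance in metres between two latitude/longitude points."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlmb = radians(lng2 - lng1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))


byron = (42.95107, -81.33064)          # neighborhood coordinates
byron_pizza = (42.958386, -81.331232)  # first venue in the table

distance = haversine_m(*byron, *byron_pizza)
print("distance in metres:", round(distance))
print("within radius:", distance <= 2000)
```

For this pair the distance is well inside the 2000 m search radius.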


**We can check how many venues were returned for each neighborhood**

In [21]:
venues_df.groupby(["Neighborhood"]).count()

Neighborhood,Latitude,Longitude,VenueName,VenueLatitude,VenueLongitude,VenueCategory
"Byron, Ontario",18,18,18,18,18,18
"Huron Heights, London, Ontario",43,43,43,43,43,43
"Lambeth, London, Ontario",11,11,11,11,11,11
Oakridge Acres,65,65,65,65,65,65
"Tempo, Ontario",5,5,5,5,5,5
"Uplands, Ontario",50,50,50,50,50,50
"Westminster, Middlesex County, Ontario",84,84,84,84,84,84
"Westmount, London, Ontario",66,66,66,66,66,66
Wortley Village,100,100,100,100,100,100


**Let's find out how many unique categories can be curated from all the returned venues**

In [22]:
print('There are {} unique categories.'.format(len(venues_df['VenueCategory'].unique())))

There are 123 unique categories.


In [23]:
# print the first 50 categories
venues_df['VenueCategory'].unique()[:50]

array(['Pizza Place', 'Park', 'Theme Park', 'Coffee Shop', 'Liquor Store',
       'Ski Lodge', 'Supermarket', 'Pub', 'Pharmacy', 'Bank',
       'Frozen Yogurt Shop', 'Ski Area', 'Baseball Field',
       'Shopping Mall', 'Italian Restaurant', 'Vietnamese Restaurant',
       'Skating Rink', 'Fish & Chips Shop', 'Asian Restaurant',
       'Thrift / Vintage Store', 'Fast Food Restaurant', 'Pet Store',
       'Ice Cream Shop', 'Restaurant', 'Juice Bar', 'Department Store',
       'Gas Station', 'Discount Store', 'Gym', 'Grocery Store',
       'Sushi Restaurant', 'Big Box Store', 'Gym / Fitness Center', 'Bar',
       'Sandwich Place', 'Convenience Store', 'Breakfast Spot',
       'Beer Store', 'Dance Studio', 'Spa', 'Golf Course', 'Outlet Store',
       'Gourmet Shop', 'Dog Run', 'Warehouse Store', 'Deli / Bodega',
       'BBQ Joint', 'Bowling Alley', 'Bubble Tea Shop', 'Burrito Place'],
      dtype=object)

In [24]:
# check if the results contain "Bank"
"Bank" in venues_df['VenueCategory'].unique()

True

**VI. Analyze Each Neighborhood**

In [25]:
# one hot encoding
Lon_onehot = pd.get_dummies(venues_df[['VenueCategory']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
Lon_onehot['Neighborhoods'] = venues_df['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [Lon_onehot.columns[-1]] + list(Lon_onehot.columns[:-1])
Lon_onehot = Lon_onehot[fixed_columns]

print(Lon_onehot.shape)
Lon_onehot.head()

(442, 124)


Unnamed: 0,Neighborhoods,American Restaurant,Arcade,Art Gallery,Arts & Crafts Store,Asian Restaurant,Automotive Shop,BBQ Joint,Bagel Shop,Bank,Bar,Baseball Field,Baseball Stadium,Beer Store,Big Box Store,Bookstore,Bowling Alley,Breakfast Spot,Brewery,Bubble Tea Shop,Burger Joint,Burrito Place,Business Service,Café,Camera Store,Chinese Restaurant,Chocolate Shop,Clothing Store,Coffee Shop,Convenience Store,Cosmetics Shop,Dance Studio,Deli / Bodega,Department Store,Dessert Shop,Diner,Discount Store,Dog Run,Electronics Store,Farmers Market,Fast Food Restaurant,Fish & Chips Shop,Food Court,Food Service,French Restaurant,Fried Chicken Joint,Frozen Yogurt Shop,Furniture / Home Store,Garden,Gas Station,Gastropub,Golf Course,Gourmet Shop,Grocery Store,Gym,Gym / Fitness Center,Hardware Store,Hobby Shop,Hockey Arena,Hotel,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Italian Restaurant,Japanese Restaurant,Jewelry Store,Juice Bar,Korean Restaurant,Lake,Laser Tag,Lingerie Store,Liquor Store,Market,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Motorcycle Shop,Movie Theater,Multiplex,Museum,Nightclub,Outdoor Supply Store,Outlet Store,Paper / Office Supplies Store,Park,Pet Store,Pharmacy,Pizza Place,Portuguese Restaurant,Pub,Record Shop,Recreation Center,Rental Service,Residential Building (Apartment / Condo),Restaurant,Rock Club,Salon / Barbershop,Sandwich Place,Seafood Restaurant,Shop & Service,Shopping Mall,Skating Rink,Ski Area,Ski Lodge,Soccer Field,Soup Place,Spa,Sporting Goods Shop,Sports Bar,Steakhouse,Supermarket,Sushi Restaurant,Tea Room,Thai Restaurant,Theater,Theme Park,Thrift / Vintage Store,Trail,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Warehouse Store,Wine Bar,Wings Joint,Yoga Studio
0,"Byron, Ontario",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,"Byron, Ontario",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,"Byron, Ontario",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0
3,"Byron, Ontario",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,"Byron, Ontario",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


**Now let's group rows by neighborhood and take the mean frequency of occurrence of each category**

In [26]:
Lon_grouped = Lon_onehot.groupby(["Neighborhoods"]).mean().reset_index()

print(Lon_grouped.shape)
Lon_grouped

(9, 124)


Unnamed: 0,Neighborhoods,American Restaurant,Arcade,Art Gallery,Arts & Crafts Store,Asian Restaurant,Automotive Shop,BBQ Joint,Bagel Shop,Bank,Bar,Baseball Field,Baseball Stadium,Beer Store,Big Box Store,Bookstore,Bowling Alley,Breakfast Spot,Brewery,Bubble Tea Shop,Burger Joint,Burrito Place,Business Service,Café,Camera Store,Chinese Restaurant,Chocolate Shop,Clothing Store,Coffee Shop,Convenience Store,Cosmetics Shop,Dance Studio,Deli / Bodega,Department Store,Dessert Shop,Diner,Discount Store,Dog Run,Electronics Store,Farmers Market,Fast Food Restaurant,Fish & Chips Shop,Food Court,Food Service,French Restaurant,Fried Chicken Joint,Frozen Yogurt Shop,Furniture / Home Store,Garden,Gas Station,Gastropub,Golf Course,Gourmet Shop,Grocery Store,Gym,Gym / Fitness Center,Hardware Store,Hobby Shop,Hockey Arena,Hotel,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Italian Restaurant,Japanese Restaurant,Jewelry Store,Juice Bar,Korean Restaurant,Lake,Laser Tag,Lingerie Store,Liquor Store,Market,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Motorcycle Shop,Movie Theater,Multiplex,Museum,Nightclub,Outdoor Supply Store,Outlet Store,Paper / Office Supplies Store,Park,Pet Store,Pharmacy,Pizza Place,Portuguese Restaurant,Pub,Record Shop,Recreation Center,Rental Service,Residential Building (Apartment / Condo),Restaurant,Rock Club,Salon / Barbershop,Sandwich Place,Seafood Restaurant,Shop & Service,Shopping Mall,Skating Rink,Ski Area,Ski Lodge,Soccer Field,Soup Place,Spa,Sporting Goods Shop,Sports Bar,Steakhouse,Supermarket,Sushi Restaurant,Tea Room,Thai Restaurant,Theater,Theme Park,Thrift / Vintage Store,Trail,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Warehouse Store,Wine Bar,Wings Joint,Yoga Studio
0,"Byron, Ontario",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.055556,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.055556,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.055556,0.055556,0.0,0.055556,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.055556,0.0,0.055556,0.055556,0.0,0.0,0.0,0.0,0.0,0.0,0.055556,0.0,0.0,0.0,0.0,0.055556,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,"Huron Heights, London, Ontario",0.0,0.0,0.0,0.0,0.023256,0.0,0.0,0.0,0.023256,0.023256,0.0,0.0,0.0,0.023256,0.0,0.0,0.023256,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.162791,0.023256,0.0,0.0,0.0,0.023256,0.0,0.0,0.023256,0.0,0.0,0.0,0.093023,0.023256,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.023256,0.0,0.0,0.0,0.023256,0.023256,0.023256,0.0,0.0,0.0,0.0,0.023256,0.0,0.0,0.023256,0.0,0.0,0.023256,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.023256,0.023256,0.093023,0.0,0.0,0.0,0.0,0.0,0.0,0.069767,0.0,0.0,0.023256,0.0,0.0,0.0,0.046512,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.023256,0.023256,0.0,0.0,0.0,0.0,0.023256,0.0,0.0,0.023256,0.0,0.0,0.0,0.0
2,"Lambeth, London, Ontario",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.090909,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Oakridge Acres,0.0,0.0,0.0,0.015385,0.0,0.015385,0.015385,0.0,0.015385,0.015385,0.030769,0.0,0.015385,0.0,0.0,0.015385,0.030769,0.0,0.015385,0.0,0.015385,0.0,0.0,0.0,0.015385,0.0,0.0,0.046154,0.0,0.0,0.0,0.015385,0.015385,0.0,0.0,0.0,0.015385,0.0,0.0,0.046154,0.0,0.0,0.0,0.0,0.015385,0.0,0.0,0.015385,0.046154,0.015385,0.015385,0.015385,0.061538,0.030769,0.030769,0.0,0.015385,0.0,0.0,0.015385,0.0,0.0,0.015385,0.0,0.015385,0.0,0.015385,0.015385,0.0,0.0,0.015385,0.0,0.0,0.0,0.015385,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.030769,0.015385,0.030769,0.015385,0.0,0.015385,0.0,0.0,0.0,0.015385,0.015385,0.0,0.0,0.030769,0.0,0.0,0.0,0.015385,0.0,0.0,0.0,0.0,0.0,0.0,0.015385,0.0,0.015385,0.030769,0.0,0.0,0.0,0.0,0.0,0.015385,0.0,0.0,0.015385,0.0,0.0,0.0
4,"Tempo, Ontario",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,"Uplands, Ontario",0.02,0.02,0.0,0.04,0.02,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.02,0.0,0.02,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.02,0.02,0.0,0.0,0.02,0.06,0.0,0.0,0.0,0.0,0.02,0.0,0.02,0.02,0.0,0.04,0.0,0.02,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.02,0.02,0.0,0.04,0.02,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.02,0.0,0.02,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.02,0.02,0.02,0.02,0.02,0.02,0.0,0.0,0.0,0.0,0.0,0.08,0.0,0.0,0.02,0.0,0.02,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
6,"Westminster, Middlesex County, Ontario",0.047619,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.011905,0.0,0.0,0.0,0.011905,0.0,0.011905,0.0,0.011905,0.0,0.0,0.011905,0.0,0.011905,0.0,0.0,0.0,0.011905,0.02381,0.107143,0.0,0.011905,0.0,0.0,0.02381,0.011905,0.0,0.011905,0.0,0.02381,0.0,0.047619,0.0,0.011905,0.011905,0.0,0.011905,0.0,0.0,0.0,0.047619,0.0,0.0,0.0,0.035714,0.011905,0.0,0.0,0.0,0.0,0.047619,0.02381,0.0,0.0,0.0,0.035714,0.0,0.0,0.0,0.0,0.0,0.0,0.011905,0.0,0.02381,0.011905,0.0,0.011905,0.0,0.011905,0.0,0.0,0.011905,0.0,0.0,0.0,0.011905,0.011905,0.047619,0.0,0.0,0.011905,0.0,0.0,0.0,0.059524,0.0,0.011905,0.035714,0.011905,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02381,0.0,0.011905,0.0,0.011905,0.011905,0.011905,0.0,0.0,0.0,0.011905,0.0,0.0,0.0,0.0,0.0,0.0
7,"Westmount, London, Ontario",0.0,0.0,0.0,0.030303,0.015152,0.0,0.0,0.015152,0.030303,0.0,0.0,0.0,0.015152,0.0,0.0,0.015152,0.0,0.0,0.0,0.0,0.0,0.0,0.015152,0.0,0.0,0.0,0.030303,0.075758,0.0,0.0,0.015152,0.0,0.015152,0.0,0.0,0.015152,0.0,0.015152,0.0,0.045455,0.0,0.0,0.0,0.0,0.0,0.0,0.075758,0.015152,0.030303,0.0,0.015152,0.0,0.030303,0.030303,0.0,0.015152,0.0,0.0,0.0,0.015152,0.0,0.0,0.015152,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.015152,0.015152,0.0,0.0,0.0,0.0,0.015152,0.0,0.0,0.0,0.0,0.0,0.015152,0.030303,0.030303,0.045455,0.045455,0.0,0.0,0.0,0.015152,0.0,0.0,0.060606,0.0,0.0,0.030303,0.0,0.0,0.0,0.0,0.0,0.0,0.015152,0.0,0.0,0.0,0.0,0.0,0.015152,0.015152,0.0,0.0,0.0,0.0,0.0,0.015152,0.0,0.0,0.0,0.0,0.015152,0.015152
8,Wortley Village,0.01,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.03,0.0,0.0,0.01,0.02,0.0,0.02,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.04,0.0,0.02,0.0,0.0,0.08,0.0,0.0,0.0,0.02,0.0,0.01,0.02,0.01,0.0,0.01,0.01,0.03,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.02,0.01,0.0,0.03,0.01,0.01,0.0,0.0,0.01,0.01,0.03,0.03,0.01,0.02,0.01,0.0,0.0,0.01,0.0,0.01,0.0,0.02,0.0,0.01,0.01,0.01,0.0,0.0,0.0,0.02,0.02,0.0,0.0,0.01,0.02,0.0,0.02,0.03,0.0,0.03,0.01,0.0,0.0,0.0,0.04,0.01,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.02,0.0,0.0,0.0,0.0,0.01,0.02,0.01,0.0,0.0,0.0,0.02,0.0,0.0,0.01,0.0,0.01
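
Taking the mean of a one-hot column gives the share of a neighborhood's venues that fall in that category. For example, Byron returned 18 venues, so its Bank value of 0.111111 corresponds to 2 bank venues (2/18), and each 0.055556 corresponds to a single venue (1/18). The pure-Python sketch below reproduces that arithmetic on a made-up category list. It mirrors the `get_dummies` plus `groupby().mean()` steps but is not the pandas code itself.

```python
from collections import Counter

# made-up venue categories for a single neighborhood
categories = ["Bank", "Park", "Bank", "Coffee Shop", "Pizza Place", "Park"]

counts = Counter(categories)  # venues per category
total = len(categories)       # all venues in the neighborhood

# share of venues per category, which is what the mean of a one-hot column gives
frequencies = {name: count / total for name, count in counts.items()}
print(frequencies)

# the Byron figures from the table above: 2 banks out of 18 venues
print(round(2 / 18, 6))
```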


**Create new dataframe for Bank data only**

In [27]:
Lon_bank = Lon_grouped[["Neighborhoods","Bank"]]

In [28]:
Lon_bank.head()

Unnamed: 0,Neighborhoods,Bank
0,"Byron, Ontario",0.111111
1,"Huron Heights, London, Ontario",0.023256
2,"Lambeth, London, Ontario",0.0
3,Oakridge Acres,0.015385
4,"Tempo, Ontario",0.0


**VII. Cluster Neighborhoods**

**Run k-means to cluster the neighborhoods in London into 3 clusters.**

In [29]:
# set number of clusters
kclusters = 3

Lon_clustering = Lon_bank.drop(columns=["Neighborhoods"])

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(Lon_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10]

array([2, 0, 1, 1, 1, 0, 1, 0, 0], dtype=int32)
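
Because the clustering uses a single feature, the result is easy to reproduce by hand. The sketch below runs Lloyd's algorithm (alternating assignment and update steps) on the nine Bank values in pure Python. It is not scikit-learn's implementation: the starting centroids (minimum, median, maximum) are an assumption made for this illustration, and `kmeans_1d` is a hypothetical helper.

```python
def kmeans_1d(values, centroids, max_iter=100):
    """Lloyd's algorithm on scalars; returns (labels, centroids)."""
    centroids = list(centroids)
    labels = []
    for _ in range(max_iter):
        # assignment step: index of the nearest centroid for each value
        labels = [min(range(len(centroids)), key=lambda k: abs(v - centroids[k]))
                  for v in values]
        # update step: each centroid moves to the mean of its members
        new_centroids = []
        for k in range(len(centroids)):
            members = [v for v, lab in zip(values, labels) if lab == k]
            new_centroids.append(sum(members) / len(members) if members else centroids[k])
        if new_centroids == centroids:  # nothing moved, so the algorithm has converged
            break
        centroids = new_centroids
    return labels, centroids


# Bank frequencies in the row order of Lon_bank
bank = [0.111111, 0.023256, 0.0, 0.015385, 0.0, 0.04, 0.011905, 0.030303, 0.03]

ordered = sorted(bank)
init = [ordered[0], ordered[len(ordered) // 2], ordered[-1]]  # min, median, max (assumed start)

labels, centroids = kmeans_1d(bank, init)
print(labels)
print(centroids)
```

The label numbers are arbitrary and depend on the starting centroids, so they need not match the array above. What matters is which neighborhoods end up grouped together.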

In [30]:
# create a new dataframe that holds the bank frequency and the cluster label for each neighborhood
Lon_merged = Lon_bank.copy()

# add clustering labels
Lon_merged["Cluster Labels"] = kmeans.labels_

In [31]:
Lon_merged.rename(columns={"Neighborhoods": "Neighborhood"}, inplace=True)
Lon_merged.head()

Unnamed: 0,Neighborhood,Bank,Cluster Labels
0,"Byron, Ontario",0.111111,2
1,"Huron Heights, London, Ontario",0.023256,0
2,"Lambeth, London, Ontario",0.0,1
3,Oakridge Acres,0.015385,1
4,"Tempo, Ontario",0.0,1


In [38]:
# merge Lon_merged with Lon_df to add latitude/longitude for each neighborhood
Lon_merged = Lon_merged.join(Lon_df.set_index("Neighborhood"), on="Neighborhood")

print(Lon_merged.shape)
Lon_merged.head()

(9, 5)


Unnamed: 0,Neighborhood,Bank,Cluster Labels,Latitude,Longitude
1,"Huron Heights, London, Ontario",0.023256,0,43.01885,-81.20268
5,"Uplands, Ontario",0.04,0,43.03736,-81.28526
7,"Westmount, London, Ontario",0.030303,0,42.94922,-81.28946
8,Wortley Village,0.03,0,42.96854,-81.250747
2,"Lambeth, London, Ontario",0.0,1,42.90483,-81.28665


In [39]:
# sort the results by Cluster Labels
print(Lon_merged.shape)
Lon_merged.sort_values(["Cluster Labels"], inplace=True)
Lon_merged

(9, 5)


Unnamed: 0,Neighborhood,Bank,Cluster Labels,Latitude,Longitude
1,"Huron Heights, London, Ontario",0.023256,0,43.01885,-81.20268
5,"Uplands, Ontario",0.04,0,43.03736,-81.28526
7,"Westmount, London, Ontario",0.030303,0,42.94922,-81.28946
8,Wortley Village,0.03,0,42.96854,-81.250747
2,"Lambeth, London, Ontario",0.0,1,42.90483,-81.28665
3,Oakridge Acres,0.015385,1,42.9742,-81.303
4,"Tempo, Ontario",0.0,1,42.85448,-81.27289
6,"Westminster, Middlesex County, Ontario",0.011905,1,42.94171,-81.20852
0,"Byron, Ontario",0.111111,2,42.95107,-81.33064


**Now we can visualize the resulting clusters**

In [40]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i+x+(i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(Lon_merged['Latitude'], Lon_merged['Longitude'], Lon_merged['Neighborhood'], Lon_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' - Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters
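
The marker colour is picked with `rainbow[cluster-1]`, so cluster 0 uses index -1, which Python resolves to the last colour in the list. The sketch below shows the resulting mapping with a plain list. The colour names are approximate labels assumed for the three rainbow colours in order, not values read from matplotlib.

```python
# approximate names for the three rainbow colours, in list order (assumed labels)
palette = ["purple", "lime green", "red"]

# same indexing as the map code: label 0 wraps around to the last entry
colour_of = {cluster: palette[cluster - 1] for cluster in range(3)}

for cluster, colour in colour_of.items():
    print("cluster", cluster, "->", colour)
```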

In [41]:
# save the map as HTML file
map_clusters.save('map_clusters.html')

**VIII. Examine Clusters**

**Cluster 0**

In [42]:
Lon_merged.loc[Lon_merged['Cluster Labels'] == 0]

Unnamed: 0,Neighborhood,Bank,Cluster Labels,Latitude,Longitude
1,"Huron Heights, London, Ontario",0.023256,0,43.01885,-81.20268
5,"Uplands, Ontario",0.04,0,43.03736,-81.28526
7,"Westmount, London, Ontario",0.030303,0,42.94922,-81.28946
8,Wortley Village,0.03,0,42.96854,-81.250747


**Cluster 1**

In [43]:
Lon_merged.loc[Lon_merged['Cluster Labels'] == 1]

Unnamed: 0,Neighborhood,Bank,Cluster Labels,Latitude,Longitude
2,"Lambeth, London, Ontario",0.0,1,42.90483,-81.28665
3,Oakridge Acres,0.015385,1,42.9742,-81.303
4,"Tempo, Ontario",0.0,1,42.85448,-81.27289
6,"Westminster, Middlesex County, Ontario",0.011905,1,42.94171,-81.20852


**Cluster 2**

In [44]:
Lon_merged.loc[Lon_merged['Cluster Labels'] == 2]

Unnamed: 0,Neighborhood,Bank,Cluster Labels,Latitude,Longitude
0,"Byron, Ontario",0.111111,2,42.95107,-81.33064


**Observations**

Judging by the share of venues that are banks, most of the banks are concentrated in the central and north neighbourhoods of London, with the highest shares in cluster 0 (red, roughly 2% to 4%) and cluster 2 (lime green, about 11% in Byron). Cluster 1 (purple), by contrast, has few or no banks in its neighbourhoods (0% to about 1.5%). This represents an opportunity for a new bank branch: the model suggests that the neighbourhoods in this cluster are under-served in terms of banking, so competition from existing banks would be minimal.

Cluster 1 does include two neighbourhoods on the southern perimeter of London, which may have less dense housing and a lower population than inner-city neighbourhoods. Even so, a location in this cluster may still be a good opportunity for a bank, because it could draw business from rural communities outside the city on their way into London along the 401 and 402 highways, two of Ontario's largest highways.
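
To put numbers behind the comparison, the average bank share per cluster can be computed from the table in section VIII. This is a small sketch using only the values shown there. The variable names are illustrative.

```python
from statistics import mean

# (neighborhood, bank share, cluster label) as listed in the cluster tables
rows = [
    ("Huron Heights, London, Ontario", 0.023256, 0),
    ("Uplands, Ontario", 0.04, 0),
    ("Westmount, London, Ontario", 0.030303, 0),
    ("Wortley Village", 0.03, 0),
    ("Lambeth, London, Ontario", 0.0, 1),
    ("Oakridge Acres", 0.015385, 1),
    ("Tempo, Ontario", 0.0, 1),
    ("Westminster, Middlesex County, Ontario", 0.011905, 1),
    ("Byron, Ontario", 0.111111, 2),
]

# average bank share for each cluster label
cluster_means = {
    label: mean(share for _, share, lab in rows if lab == label)
    for label in sorted({lab for _, _, lab in rows})
}

# clusters ordered from highest to lowest average bank share
ranking = sorted(cluster_means, key=cluster_means.get, reverse=True)

for label in ranking:
    print("cluster", label, round(cluster_means[label], 4))
```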