## Neighborhoods in Vancouver: Data Preparation

### Import required Libraries

In [1]:
import requests
from bs4 import BeautifulSoup
import pandas as pd

## Fetch Data
Fetch the list of Canadian postal codes beginning with V (British Columbia) from Wikipedia.

In [2]:
URL = 'https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_V'
response = requests.get(URL)
soup = BeautifulSoup(response.text,'html.parser')
table = soup.find('table').tbody

In [3]:
table

<tbody><tr>
<td valign="top" width="11.1%"><b>V1A</b><br/><span style="font-size: smaller; line-height: 125%;"><a href="/wiki/Kimberley,_British_Columbia" title="Kimberley, British Columbia">Kimberley</a></span>
</td>
<td valign="top" width="11.1%"><b>V2A</b><br/><span style="font-size: smaller; line-height: 125%;"><a href="/wiki/Penticton" title="Penticton">Penticton</a></span>
</td>
<td valign="top" width="11.1%"><b>V3A</b><br/><span style="font-size: smaller; line-height: 125%;"><a href="/wiki/Langley,_British_Columbia_(district_municipality)" title="Langley, British Columbia (district municipality)">Langley Township</a><br/>(Langley City)</span>
</td>
<td valign="top" width="11.1%"><b>V4A</b><br/><span style="font-size: smaller; line-height: 125%;"><a href="/wiki/Surrey,_British_Columbia" title="Surrey, British Columbia">Surrey</a><br/>Southwest</span>
</td>
<td valign="top" width="11.1%"><b>V5A</b><br/><span style="font-size: smaller; line-height: 125%;"><a href="/wiki/Burnaby" ti

Get rows in the table

In [4]:
rows = table.find_all('tr')
rows

[<tr>
 <td valign="top" width="11.1%"><b>V1A</b><br/><span style="font-size: smaller; line-height: 125%;"><a href="/wiki/Kimberley,_British_Columbia" title="Kimberley, British Columbia">Kimberley</a></span>
 </td>
 <td valign="top" width="11.1%"><b>V2A</b><br/><span style="font-size: smaller; line-height: 125%;"><a href="/wiki/Penticton" title="Penticton">Penticton</a></span>
 </td>
 <td valign="top" width="11.1%"><b>V3A</b><br/><span style="font-size: smaller; line-height: 125%;"><a href="/wiki/Langley,_British_Columbia_(district_municipality)" title="Langley, British Columbia (district municipality)">Langley Township</a><br/>(Langley City)</span>
 </td>
 <td valign="top" width="11.1%"><b>V4A</b><br/><span style="font-size: smaller; line-height: 125%;"><a href="/wiki/Surrey,_British_Columbia" title="Surrey, British Columbia">Surrey</a><br/>Southwest</span>
 </td>
 <td valign="top" width="11.1%"><b>V5A</b><br/><span style="font-size: smaller; line-height: 125%;"><a href="/wiki/Burnaby"

Count the rows in the table

In [5]:
len(rows)

20

## Clean and Prep Data

Extract only the required data from each table cell:
1. Keep only the cells whose place name is Vancouver and ignore all other cells.
2. Read the postal code from the bold text and the areas from the parenthesized list that follows the place name.
3. If a postal code covers more than one area (separated by ` / ` or a comma), create one row per area.

In [7]:
data = [] #(code, name, area)
for i in range(0,len(rows)):
    tds = rows[i].find_all('td')
    for td in tds:
        a = td.b.text
        temp = td.span.text.split('(')
        b = temp[0]
        if b.lower() == 'vancouver':
            c = temp[1].replace(' / ', ',').replace(')', '')   
            split_c = c.split(',')
            for temp_c in split_c:
                data.append((a, b, temp_c))
                print('(', a, ', ', b, ', ', temp_c, ')')

( V6A ,  Vancouver ,  Strathcona )
( V6A ,  Vancouver ,  Chinatown )
( V6A ,  Vancouver ,  Downtown Eastside )
( V6B ,  Vancouver ,  NE Downtown )
( V6B ,  Vancouver ,  Gastown )
( V6B ,  Vancouver ,  Harbour Centre )
( V6B ,  Vancouver ,  International Village )
( V6B ,  Vancouver ,  Victory Square )
( V6B ,  Vancouver ,  Yaletown )
( V6C ,  Vancouver ,  Waterfront )
( V6C ,  Vancouver ,  Coal Harbour )
( V6C ,  Vancouver ,  Canada Place )
( V6E ,  Vancouver ,  SE West End )
( V6E ,  Vancouver ,  Davie Village )
( V6G ,  Vancouver ,  NW West End )
( V6G ,  Vancouver ,  Stanley Park )
( V6H ,  Vancouver ,  West Fairview )
( V6H ,  Vancouver ,  Granville Island )
( V6H ,  Vancouver ,  NE Shaughnessy )
( V6J ,  Vancouver ,  NW Shaughnessy )
( V6J ,  Vancouver ,  East Kitsilano )
( V6J ,  Vancouver ,  Quilchena )
( V5K ,  Vancouver ,  North Hastings-Sunrise )
( V6K ,  Vancouver ,  Central Kitsilano )
( V6K ,  Vancouver ,  Greektown )
( V5L ,  Vancouver ,  North Grandview-Woodland )
( V6L 

In [8]:
len(data)

70

In [9]:
data[:3]

[('V6A', 'Vancouver', 'Strathcona'),
 ('V6A', 'Vancouver', 'Chinatown'),
 ('V6A', 'Vancouver', 'Downtown Eastside')]
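
The splitting step can also be isolated into a small helper that works on plain strings, which makes it easy to check without fetching the page. This is a minimal sketch: `split_cell` is a hypothetical name, and the input format (`Vancouver(Area / Area)`) is assumed from the tuples shown above rather than taken from the page itself.

```python
def split_cell(code, text):
    """Return one (code, name, area) tuple per area listed in a cell's text."""
    name, _, rest = text.partition('(')
    name = name.strip()
    if name.lower() != 'vancouver' or not rest:
        return []
    # Areas are separated by ' / ' or ','; normalise to ',' before splitting
    areas = rest.replace(')', '').replace(' / ', ',').split(',')
    return [(code, name, area.strip()) for area in areas if area.strip()]


rows_out = split_cell('V6A', 'Vancouver(Strathcona / Chinatown / Downtown Eastside)')
print(rows_out)
```

Stripping each area name guards against stray spaces around the separators.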

Create a DataFrame from the list.

In [10]:
columns = ['Code', 'Name', 'Area']

In [11]:
postal_df = pd.DataFrame(data, columns=columns)
postal_df.head()

Unnamed: 0,Code,Name,Area
0,V6A,Vancouver,Strathcona
1,V6A,Vancouver,Chinatown
2,V6A,Vancouver,Downtown Eastside
3,V6B,Vancouver,NE Downtown
4,V6B,Vancouver,Gastown


Print the shape of DataFrame

In [12]:
postal_df.shape

(70, 3)

## Add coordinates for postal code

Geocode each area name with geopy's Nominatim geocoder to get its latitude and longitude.

In [13]:
import numpy as np

In [14]:
from geopy.geocoders import Nominatim

In [15]:
lat_list = []
lon_list = []
# create the geocoder once and reuse it for every lookup
geolocator = Nominatim(user_agent="BC_explorer", timeout=10)
for area in postal_df.Area.values:
    address = '{}, Vancouver,British Columbia, Canada'.format(area)

    location = geolocator.geocode(address)
    if location is not None:
        latitude = location.latitude
        longitude = location.longitude
        lat_list.append(latitude)
        lon_list.append(longitude)
        print('The geographical coordinate of {} are {}, {}.'.format(address, latitude, longitude))
    else:
        # record missing coordinates as NaN so the lists stay aligned with postal_df
        print('No location data on {}'.format(address))
        lat_list.append(np.nan)
        lon_list.append(np.nan)

The geographical coordinate of Strathcona, Vancouver,British Columbia, Canada are 49.279554, -123.0899788.
The geographical coordinate of Chinatown, Vancouver,British Columbia, Canada are 49.2799809, -123.10408941422125.
The geographical coordinate of Downtown Eastside, Vancouver,British Columbia, Canada are 49.2823992, -123.0994578.
The geographical coordinate of NE Downtown, Vancouver,British Columbia, Canada are 49.283393, -123.1174563.
The geographical coordinate of Gastown, Vancouver,British Columbia, Canada are 49.2836567, -123.1062358.
The geographical coordinate of Harbour Centre, Vancouver,British Columbia, Canada are 49.28476745, -123.11206428918614.
The geographical coordinate of International Village, Vancouver,British Columbia, Canada are 49.28021995, -123.10669595178601.
The geographical coordinate of Victory Square, Vancouver,British Columbia, Canada are 49.2823247, -123.11012964839475.
The geographical coordinate of Yaletown, Vancouver,British Columbia, Canada are 49.2763217, -1
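
Public geocoding services generally expect clients to throttle their requests, and geopy ships a `RateLimiter` helper in `geopy.extra.rate_limiter` for that purpose. The sketch below shows the same idea in plain Python so it can run without network access: `geocode_all`, `FakeLocation` and `fake_geocode` are hypothetical names, and the stand-in geocoder only imitates the `latitude`/`longitude` attributes used above.

```python
import math
import time


def geocode_all(addresses, geocode, delay_seconds=1.0):
    """Look up each distinct address once, pausing between calls; misses become NaN."""
    cache = {}
    coords = []
    for address in addresses:
        if address not in cache:
            if cache:  # pause before every lookup except the first
                time.sleep(delay_seconds)
            location = geocode(address)
            cache[address] = (
                (location.latitude, location.longitude)
                if location is not None
                else (math.nan, math.nan)
            )
        coords.append(cache[address])
    return coords


# Stand-in geocoder used only to demonstrate the helper (not a real service)
class FakeLocation:
    def __init__(self, latitude, longitude):
        self.latitude = latitude
        self.longitude = longitude


def fake_geocode(address):
    known = {'Strathcona': FakeLocation(49.279554, -123.0899788)}
    return known.get(address)


result = geocode_all(['Strathcona', 'Nowhere', 'Strathcona'], fake_geocode, delay_seconds=0)
print(result)
```

Caching repeated addresses avoids sending the same query twice when an area name appears under more than one postal code.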

In [16]:
postal_df['Latitude'] = lat_list
postal_df['Longitude'] = lon_list
postal_df

Unnamed: 0,Code,Name,Area,Latitude,Longitude
0,V6A,Vancouver,Strathcona,49.279554,-123.089979
1,V6A,Vancouver,Chinatown,49.279981,-123.104089
2,V6A,Vancouver,Downtown Eastside,49.282399,-123.099458
3,V6B,Vancouver,NE Downtown,49.283393,-123.117456
4,V6B,Vancouver,Gastown,49.283657,-123.106236
...,...,...,...,...,...
65,V5Y,Vancouver,West Riley Park-Little Mountain,,
66,V7Y,Vancouver,Pacific Centre,49.253241,-123.235331
67,V5Z,Vancouver,East Fairview,49.264113,-123.126835
68,V5Z,Vancouver,South Cambie,49.246685,-123.120915


In [17]:
postal_df.dropna(inplace=True)
postal_df

Unnamed: 0,Code,Name,Area,Latitude,Longitude
0,V6A,Vancouver,Strathcona,49.279554,-123.089979
1,V6A,Vancouver,Chinatown,49.279981,-123.104089
2,V6A,Vancouver,Downtown Eastside,49.282399,-123.099458
3,V6B,Vancouver,NE Downtown,49.283393,-123.117456
4,V6B,Vancouver,Gastown,49.283657,-123.106236
...,...,...,...,...,...
64,V5Y,Vancouver,West Mount Pleasant,49.263330,-123.096588
66,V7Y,Vancouver,Pacific Centre,49.253241,-123.235331
67,V5Z,Vancouver,East Fairview,49.264113,-123.126835
68,V5Z,Vancouver,South Cambie,49.246685,-123.120915
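
Calling `dropna` without arguments removes a row if any column is missing, and it leaves gaps in the index (row 65 above). A slightly more targeted variant, sketched here on a three-row sample with values copied from the table above, drops only rows that lack coordinates and renumbers the index. The names `demo_df` and `clean_df` are illustrative.

```python
import numpy as np
import pandas as pd

demo_df = pd.DataFrame({
    'Code': ['V6A', 'V5Y', 'V5Z'],
    'Area': ['Strathcona', 'West Riley Park-Little Mountain', 'South Cambie'],
    'Latitude': [49.279554, np.nan, 49.246685],
    'Longitude': [-123.089979, np.nan, -123.120915],
})

# Drop only rows lacking coordinates, then renumber the index from 0
clean_df = demo_df.dropna(subset=['Latitude', 'Longitude']).reset_index(drop=True)
print(clean_df)
```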


## Neighborhoods in Vancouver : Clustering and Segmentation

### Import Libraries

In [18]:
import numpy as np # library to handle data in a vectorized manner

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

#!conda install -c conda-forge geopy --yes # uncomment this line if you haven't completed the Foursquare API lab
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

# !conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't completed the Foursquare API lab
import folium # map rendering library

print('Libraries imported.')

Libraries imported.


###  Getting Vancouver's Coordinates using geocode

In [19]:
address = 'Vancouver, British Columbia'

geolocator = Nominatim(user_agent="va_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Vancouver are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Vancouver are 49.2608724, -123.1139529.


### Plotting all postal areas in Vancouver

In [21]:
# create map of Vancouver using latitude and longitude values
map_vancouver = folium.Map(location=[latitude, longitude], zoom_start=12)

# add markers to map
for lat, lng, area in zip(postal_df['Latitude'], postal_df['Longitude'], postal_df['Area']):
    label = '{}'.format(area)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_vancouver)  
    
map_vancouver

In [22]:
import os

# Read the Foursquare credentials from environment variables instead of hardcoding them.
# The variable names below are an assumption; use whatever names your environment defines.
CLIENT_ID = os.environ['FOURSQUARE_CLIENT_ID'] # your Foursquare ID
CLIENT_SECRET = os.environ['FOURSQUARE_CLIENT_SECRET'] # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

print('Credentials loaded.')

Credentials loaded.


### Function to get nearby venues within a 500-meter radius of the given coordinates.

In [23]:
def getNearbyVenues(area, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for area, lat, lng in zip(area, latitudes, longitudes):
        print(area, 'Vancou')
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            500)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            area, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Area', 
                             'Area Latitude', 
                             'Area Longitude', 
                             'Venue', 
                             'Venue Latitude', 
                             'Venue Longitude', 
                             'Venue Category']
    
    return nearby_venues
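
The list comprehension in the function assumes every venue has at least one category and that the response always contains a group. A more defensive way to flatten the payload might look like the sketch below. `extract_venues` is a hypothetical helper, the nested keys mirror the ones used in the function above, and the sample payload is hand-written for illustration (the second venue is invented to show the missing-category case).

```python
def extract_venues(payload, area, lat, lng):
    """Flatten an explore-style JSON payload into row tuples."""
    groups = payload.get('response', {}).get('groups', [])
    items = groups[0].get('items', []) if groups else []
    rows = []
    for item in items:
        venue = item['venue']
        categories = venue.get('categories') or []
        # Use None when a venue carries no category instead of raising IndexError
        category = categories[0]['name'] if categories else None
        rows.append((area, lat, lng, venue['name'],
                     venue['location']['lat'], venue['location']['lng'], category))
    return rows


sample_payload = {'response': {'groups': [{'items': [
    {'venue': {'name': 'Union Market',
               'location': {'lat': 49.277371, 'lng': -123.086989},
               'categories': [{'name': 'Deli / Bodega'}]}},
    {'venue': {'name': 'Uncategorized Venue',  # invented entry
               'location': {'lat': 49.28, 'lng': -123.09},
               'categories': []}},
]}]}}

venue_rows = extract_venues(sample_payload, 'Strathcona', 49.279554, -123.089979)
print(venue_rows)
```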

### Use the function above to create a venues DataFrame for the areas in postal_df, then explore it.

In [24]:
area_venues = getNearbyVenues(area=postal_df['Area'], 
                                 latitudes=postal_df['Latitude'], 
                                 longitudes=postal_df['Longitude']
                                  )

Strathcona Vancou
Chinatown Vancou
Downtown Eastside Vancou
NE Downtown Vancou
Gastown Vancou
Harbour Centre Vancou
International Village Vancou
Victory Square Vancou
Yaletown Vancou
Waterfront Vancou
Coal Harbour Vancou
Canada Place Vancou
SE West End Vancou
Davie Village Vancou
NW West End Vancou
Stanley Park Vancou
West Fairview Vancou
Granville Island Vancou
NE Shaughnessy Vancou
NW Shaughnessy Vancou
East Kitsilano Vancou
Quilchena Vancou
North Hastings-Sunrise Vancou
Central Kitsilano Vancou
North Grandview-Woodland Vancou
NW Arbutus Ridge Vancou
NE Dunbar-Southlands Vancou
South Hastings-Sunrise Vancou
North Renfrew-Collingwood Vancou
South Shaughnessy Vancou
NW Oakridge Vancou
NE Kerrisdale Vancou
SE Arbutus Ridge Vancou
South Grandview-Woodland Vancou
NE Kensington-Cedar Cottage Vancou
West Kerrisdale Vancou
South Dunbar-Southlands Vancou
Musqueam Vancou
SE Kensington-Cedar Cottage Vancou
Victoria-Fraserview Vancou
SE Kerrisdale Vancou
SW Oakridge Vancou
West Marpole Vancou
So

In [25]:
area_venues.shape

(2266, 7)

In [26]:
area_venues.head()

Unnamed: 0,Area,Area Latitude,Area Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Strathcona,49.279554,-123.089979,Union Market,49.277371,-123.086989,Deli / Bodega
1,Strathcona,49.279554,-123.089979,The Juice Truck,49.281281,-123.09212,Food Truck
2,Strathcona,49.279554,-123.089979,Finch’s Market,49.278565,-123.093473,Sandwich Place
3,Strathcona,49.279554,-123.089979,Wilder Snail,49.279346,-123.087338,Coffee Shop
4,Strathcona,49.279554,-123.089979,Strathcona Beer Company,49.281294,-123.085111,Brewery


In [27]:
area_venues.groupby('Area').count()

Unnamed: 0_level_0,Area Latitude,Area Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Area,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Bentall Centre,61,61,61,61,61,61
Canada Place,60,60,60,60,60,60
Central Kitsilano,44,44,44,44,44,44
Chinatown,100,100,100,100,100,100
Coal Harbour,95,95,95,95,95,95
...,...,...,...,...,...,...
West Kitsilano,44,44,44,44,44,44
West Marpole,33,33,33,33,33,33
West Mount Pleasant,69,69,69,69,69,69
West Point Grey,46,46,46,46,46,46


In [29]:
print('Number of unique venue categories in data : ', area_venues['Venue Category'].nunique())

Number of unique venue categories in data :  211


### Transforming the dataset with one-hot encoding for K-Means clustering.

In [30]:
vancouver_onehot = pd.get_dummies(area_venues['Venue Category'], prefix='', prefix_sep='')
vancouver_onehot['Area'] = area_venues['Area']
columns_list = [vancouver_onehot.columns[-1]] + list(vancouver_onehot.columns[:-1])
vancouver_onehot = vancouver_onehot[columns_list]
vancouver_onehot.head()

Unnamed: 0,Area,Accessories Store,Airport,Airport Terminal,American Restaurant,Art Gallery,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Australian Restaurant,...,Vegetarian / Vegan Restaurant,Video Store,Vietnamese Restaurant,Warehouse Store,Water Park,Whisky Bar,Wine Bar,Wine Shop,Women's Store,Yoga Studio
0,Strathcona,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,Strathcona,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,Strathcona,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,Strathcona,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,Strathcona,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [31]:
vancouver_onehot.shape

(2266, 212)

In [32]:
vancouver_grouped = vancouver_onehot.groupby('Area').mean().reset_index()
vancouver_grouped

Unnamed: 0,Area,Accessories Store,Airport,Airport Terminal,American Restaurant,Art Gallery,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Australian Restaurant,...,Vegetarian / Vegan Restaurant,Video Store,Vietnamese Restaurant,Warehouse Store,Water Park,Whisky Bar,Wine Bar,Wine Shop,Women's Store,Yoga Studio
0,Bentall Centre,0.000000,0.000000,0.000000,0.032787,0.016393,0.016393,0.000000,0.000000,0.000000,...,0.000000,0.000000,0.000000,0.0,0.0,0.00,0.000000,0.000000,0.000000,0.016393
1,Canada Place,0.016667,0.016667,0.016667,0.016667,0.000000,0.000000,0.000000,0.000000,0.016667,...,0.000000,0.000000,0.016667,0.0,0.0,0.00,0.000000,0.000000,0.000000,0.016667
2,Central Kitsilano,0.000000,0.000000,0.000000,0.045455,0.000000,0.000000,0.022727,0.000000,0.000000,...,0.022727,0.000000,0.000000,0.0,0.0,0.00,0.000000,0.000000,0.000000,0.022727
3,Chinatown,0.000000,0.000000,0.000000,0.010000,0.010000,0.000000,0.020000,0.010000,0.000000,...,0.010000,0.000000,0.010000,0.0,0.0,0.01,0.010000,0.000000,0.010000,0.000000
4,Coal Harbour,0.000000,0.000000,0.010526,0.000000,0.000000,0.000000,0.010526,0.000000,0.000000,...,0.000000,0.000000,0.010526,0.0,0.0,0.00,0.000000,0.000000,0.000000,0.000000
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
60,West Kitsilano,0.000000,0.000000,0.000000,0.045455,0.000000,0.000000,0.022727,0.000000,0.000000,...,0.022727,0.000000,0.000000,0.0,0.0,0.00,0.000000,0.000000,0.000000,0.022727
61,West Marpole,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,...,0.000000,0.030303,0.090909,0.0,0.0,0.00,0.000000,0.000000,0.000000,0.000000
62,West Mount Pleasant,0.000000,0.000000,0.000000,0.000000,0.000000,0.028986,0.000000,0.000000,0.000000,...,0.000000,0.000000,0.028986,0.0,0.0,0.00,0.000000,0.000000,0.000000,0.000000
63,West Point Grey,0.000000,0.000000,0.000000,0.000000,0.000000,0.021739,0.021739,0.000000,0.000000,...,0.043478,0.021739,0.000000,0.0,0.0,0.00,0.000000,0.000000,0.021739,0.021739


In [33]:
vancouver_grouped.shape

(65, 212)
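
To see what the two steps above produce, here is the same transformation on a three-row toy frame (the areas `A` and `B` are placeholders). Each grouped row holds the share of that area's venues falling in each category, so every row sums to 1.

```python
import pandas as pd

demo = pd.DataFrame({
    'Area': ['A', 'A', 'B'],
    'Venue Category': ['Coffee Shop', 'Park', 'Coffee Shop'],
})

# One indicator column per category, stored as integers
onehot = pd.get_dummies(demo['Venue Category'], dtype=int)
onehot.insert(0, 'Area', demo['Area'])

# Mean of the indicators = relative frequency of each category within an area
grouped = onehot.groupby('Area').mean()
print(grouped)
```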

### Select the top ten venue categories for each area.

In [34]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [36]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Area']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
vancouver_venues_sorted = pd.DataFrame(columns=columns)
vancouver_venues_sorted['Area'] = vancouver_grouped['Area']

for ind in np.arange(vancouver_grouped.shape[0]):
    vancouver_venues_sorted.iloc[ind, 1:] = return_most_common_venues(vancouver_grouped.iloc[ind, :], num_top_venues)

vancouver_venues_sorted.head()

Unnamed: 0,Area,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Bentall Centre,Hotel,Dessert Shop,Café,Food Truck,American Restaurant,Gym,Clothing Store,Coffee Shop,Plaza,Bar
1,Canada Place,Boat or Ferry,Coffee Shop,Hotel Bar,Hotel,Spa,Japanese Restaurant,Cruise,Café,Breakfast Spot,Plaza
2,Central Kitsilano,Bakery,French Restaurant,Ice Cream Shop,Tea Room,Food Truck,Thai Restaurant,Lounge,Sushi Restaurant,Japanese Restaurant,American Restaurant
3,Chinatown,Café,Coffee Shop,Sandwich Place,Pizza Place,Chinese Restaurant,Mexican Restaurant,Sushi Restaurant,Bakery,Restaurant,Clothing Store
4,Coal Harbour,Japanese Restaurant,Ramen Restaurant,Coffee Shop,Dessert Shop,Café,Grocery Store,Sushi Restaurant,Breakfast Spot,Italian Restaurant,Park
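
The suffix lookup used for the column names is correct for the ten columns created here, but it would label a 21st column "21th". If more columns are ever needed, a general ordinal helper could be used instead. The function name `ordinal` is an illustrative choice.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st', 11 -> '11th'."""
    if 10 <= n % 100 <= 20:
        suffix = 'th'
    else:
        suffix = {1: 'st', 2: 'nd', 3: 'rd'}.get(n % 10, 'th')
    return '{}{}'.format(n, suffix)


labels = ['{} Most Common Venue'.format(ordinal(i)) for i in range(1, 11)]
print(labels)
```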


In [37]:
vancouver_grouped_clustering = vancouver_grouped.drop(columns='Area')

In [38]:
vancouver_grouped_clustering.head()

Unnamed: 0,Accessories Store,Airport,Airport Terminal,American Restaurant,Art Gallery,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Australian Restaurant,BBQ Joint,...,Vegetarian / Vegan Restaurant,Video Store,Vietnamese Restaurant,Warehouse Store,Water Park,Whisky Bar,Wine Bar,Wine Shop,Women's Store,Yoga Studio
0,0.0,0.0,0.0,0.032787,0.016393,0.016393,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.016393
1,0.016667,0.016667,0.016667,0.016667,0.0,0.0,0.0,0.0,0.016667,0.0,...,0.0,0.0,0.016667,0.0,0.0,0.0,0.0,0.0,0.0,0.016667
2,0.0,0.0,0.0,0.045455,0.0,0.0,0.022727,0.0,0.0,0.0,...,0.022727,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.022727
3,0.0,0.0,0.0,0.01,0.01,0.0,0.02,0.01,0.0,0.01,...,0.01,0.0,0.01,0.0,0.0,0.01,0.01,0.0,0.01,0.0
4,0.0,0.0,0.010526,0.0,0.0,0.0,0.010526,0.0,0.0,0.0,...,0.0,0.0,0.010526,0.0,0.0,0.0,0.0,0.0,0.0,0.0


### K-Means Clustering

In [39]:
# set number of clusters
kclusters = 5

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(vancouver_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([0, 0, 0, 0, 0, 0, 0, 4, 0, 4], dtype=int32)
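
The value `kclusters = 5` is fixed by hand. One common way to sanity-check such a choice is to compare silhouette scores across several values of k. The sketch below does this on synthetic two-blob data, not on the Vancouver features, and `silhouette_by_k` is a hypothetical helper name.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score


def silhouette_by_k(features, k_values, random_state=0):
    """Fit k-means for each k and return {k: silhouette score}."""
    scores = {}
    for k in k_values:
        labels = KMeans(n_clusters=k, random_state=random_state, n_init=10).fit_predict(features)
        scores[k] = silhouette_score(features, labels)
    return scores


# Synthetic data: two tight, well-separated blobs
rng = np.random.default_rng(0)
synthetic = np.vstack([
    rng.normal(loc=0.0, scale=0.1, size=(20, 2)),
    rng.normal(loc=5.0, scale=0.1, size=(20, 2)),
])

scores = silhouette_by_k(synthetic, range(2, 6))
best_k = max(scores, key=scores.get)
print(best_k, scores)
```

Applied to `vancouver_grouped_clustering`, the same loop would show whether five clusters separate the areas better than nearby values of k.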

### Add the generated labels to the DataFrame and join it to the geospatial DataFrame of areas.

In [40]:
# add clustering labels
vancouver_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

vancouver_merged = postal_df

# join vancouver_venues_sorted onto postal_df so each area keeps its latitude/longitude
vancouver_merged = vancouver_merged.join(vancouver_venues_sorted.set_index('Area'), on='Area')

vancouver_merged.head() # check the last columns!

Unnamed: 0,Code,Name,Area,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,V6A,Vancouver,Strathcona,49.279554,-123.089979,4,Coffee Shop,Sandwich Place,Park,Deli / Bodega,Athletics & Sports,Brewery,Cheese Shop,Restaurant,Japanese Restaurant,Food Truck
1,V6A,Vancouver,Chinatown,49.279981,-123.104089,0,Café,Coffee Shop,Sandwich Place,Pizza Place,Chinese Restaurant,Mexican Restaurant,Sushi Restaurant,Bakery,Restaurant,Clothing Store
2,V6A,Vancouver,Downtown Eastside,49.282399,-123.099458,0,Café,Beer Bar,Chinese Restaurant,Coffee Shop,Asian Restaurant,German Restaurant,Pizza Place,Restaurant,French Restaurant,Noodle House
3,V6B,Vancouver,NE Downtown,49.283393,-123.117456,0,Hotel,Food Truck,Coffee Shop,Clothing Store,Restaurant,Seafood Restaurant,Concert Hall,Bookstore,Dessert Shop,Steakhouse
4,V6B,Vancouver,Gastown,49.283657,-123.106236,0,Coffee Shop,Café,Beer Bar,Sandwich Place,Pub,Furniture / Home Store,Vegetarian / Vegan Restaurant,Pizza Place,Lounge,Mexican Restaurant


### Plotting the clusters

In [41]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(vancouver_merged['Latitude'], vancouver_merged['Longitude'], vancouver_merged['Area'], vancouver_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters
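
The cell above indexes the palette with `cluster-1`, so cluster 0 picks the last colour through negative indexing. The colours are still distinct, but the mapping is easy to misread. As an alternative sketch, a palette can be built with the standard-library `colorsys` module (a swapped-in technique, not matplotlib's `cm.rainbow`) and indexed directly by the cluster label. `cluster_colors` is a hypothetical helper name.

```python
import colorsys


def cluster_colors(n_clusters):
    """Return n_clusters hex colours spaced evenly around the hue wheel."""
    colors_hex = []
    for i in range(n_clusters):
        r, g, b = colorsys.hsv_to_rgb(i / n_clusters, 0.8, 0.9)
        colors_hex.append('#{:02x}{:02x}{:02x}'.format(int(r * 255), int(g * 255), int(b * 255)))
    return colors_hex


palette = cluster_colors(5)
print(palette)  # palette[label] gives the colour for that cluster label
```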

### Examine Clusters

Cluster 1

In [47]:
cluster1 = vancouver_merged.loc[vancouver_merged['Cluster Labels'] == 0, vancouver_merged.columns[[1] + list(range(5, vancouver_merged.shape[1]))]]
cluster1

Unnamed: 0,Name,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,Vancouver,0,Café,Coffee Shop,Sandwich Place,Pizza Place,Chinese Restaurant,Mexican Restaurant,Sushi Restaurant,Bakery,Restaurant,Clothing Store
2,Vancouver,0,Café,Beer Bar,Chinese Restaurant,Coffee Shop,Asian Restaurant,German Restaurant,Pizza Place,Restaurant,French Restaurant,Noodle House
3,Vancouver,0,Hotel,Food Truck,Coffee Shop,Clothing Store,Restaurant,Seafood Restaurant,Concert Hall,Bookstore,Dessert Shop,Steakhouse
4,Vancouver,0,Coffee Shop,Café,Beer Bar,Sandwich Place,Pub,Furniture / Home Store,Vegetarian / Vegan Restaurant,Pizza Place,Lounge,Mexican Restaurant
5,Vancouver,0,Coffee Shop,Café,Boat or Ferry,Hotel,Restaurant,Taco Place,Furniture / Home Store,Bar,Vegetarian / Vegan Restaurant,Vietnamese Restaurant
6,Vancouver,0,Coffee Shop,Café,Sandwich Place,Pub,Clothing Store,Lounge,Plaza,Chinese Restaurant,Furniture / Home Store,Restaurant
7,Vancouver,0,Coffee Shop,Sandwich Place,Pub,Café,Plaza,Furniture / Home Store,Lounge,Vegetarian / Vegan Restaurant,Clothing Store,Pizza Place
8,Vancouver,0,Hotel,Coffee Shop,Italian Restaurant,Pizza Place,Sushi Restaurant,Restaurant,Concert Hall,Park,Mexican Restaurant,Spa
9,Vancouver,0,Coffee Shop,Boat or Ferry,Hotel,Café,Hotel Bar,Restaurant,Furniture / Home Store,Spa,Salon / Barbershop,Tea Room
10,Vancouver,0,Japanese Restaurant,Ramen Restaurant,Coffee Shop,Dessert Shop,Café,Grocery Store,Sushi Restaurant,Breakfast Spot,Italian Restaurant,Park


In [69]:
def get_venue_counts(cluster):
    dict_cluster = {}
    
    for cat in area_venues['Venue Category'].unique():
        dict_cluster[cat] = 0
        
    for col in cluster.columns[2:]:
        for key in cluster[col].values:
            if key in dict_cluster.keys():
                dict_cluster[key] = dict_cluster[key] + 1  
                
    df_cluster = pd.DataFrame.from_dict(dict_cluster, orient='index', columns=['Counts'])
    df_cluster.sort_values(by='Counts', ascending=False, inplace=True)
    return df_cluster
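
A shorter way to get similar counts is to stack the venue columns and call `value_counts`. Unlike `get_venue_counts`, this sketch omits categories whose count is zero. The toy frame below stands in for a cluster table with three venue columns, and its values are made up for the demonstration.

```python
import pandas as pd

demo_cluster = pd.DataFrame({
    'Name': ['Vancouver', 'Vancouver'],
    'Cluster Labels': [0, 0],
    '1st Most Common Venue': ['Coffee Shop', 'Hotel'],
    '2nd Most Common Venue': ['Hotel', 'Coffee Shop'],
    '3rd Most Common Venue': ['Park', 'Coffee Shop'],
})

# Skip the first two columns (Name, Cluster Labels), then count every venue entry
counts = demo_cluster.iloc[:, 2:].stack().value_counts()
print(counts)
```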

In [70]:
df_cluster1 = get_venue_counts(cluster1)
df_cluster1

Unnamed: 0,Counts
Coffee Shop,20
Café,13
Sushi Restaurant,13
Bakery,12
Japanese Restaurant,10
...,...
Building,0
Australian Restaurant,0
Dance Studio,0
Cafeteria,0


Cluster 2

In [71]:
cluster2 = vancouver_merged.loc[vancouver_merged['Cluster Labels'] == 1, vancouver_merged.columns[[1] + list(range(5, vancouver_merged.shape[1]))]]
cluster2

Unnamed: 0,Name,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
18,Vancouver,1,French Restaurant,Park,Ethiopian Restaurant,Food Truck,Food Court,Food & Drink Shop,Fish Market,Financial or Legal Service,Filipino Restaurant,Fast Food Restaurant
19,Vancouver,1,French Restaurant,Park,Ethiopian Restaurant,Food Truck,Food Court,Food & Drink Shop,Fish Market,Financial or Legal Service,Filipino Restaurant,Fast Food Restaurant
30,Vancouver,1,French Restaurant,Park,Ethiopian Restaurant,Food Truck,Food Court,Food & Drink Shop,Fish Market,Financial or Legal Service,Filipino Restaurant,Fast Food Restaurant


In [72]:
df_cluster2 = get_venue_counts(cluster2)
df_cluster2

Unnamed: 0,Counts
Financial or Legal Service,3
Filipino Restaurant,3
Fast Food Restaurant,3
Fish Market,3
Food & Drink Shop,3
...,...
Mobile Phone Shop,0
Toy / Game Store,0
Leather Goods Store,0
Middle Eastern Restaurant,0


Cluster 3

In [44]:
vancouver_merged.loc[vancouver_merged['Cluster Labels'] == 2, vancouver_merged.columns[[1] + list(range(5, vancouver_merged.shape[1]))]]

Unnamed: 0,Name,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
27,Vancouver,2,Ice Cream Shop,Italian Restaurant,Sushi Restaurant,Coffee Shop,Sporting Goods Shop,Department Store,Falafel Restaurant,Food Truck,Dance Studio,Food Court
37,Vancouver,2,Ice Cream Shop,Italian Restaurant,Sushi Restaurant,Coffee Shop,Sporting Goods Shop,Department Store,Falafel Restaurant,Food Truck,Dance Studio,Food Court
49,Vancouver,2,Ice Cream Shop,Italian Restaurant,Sushi Restaurant,Coffee Shop,Sporting Goods Shop,Department Store,Falafel Restaurant,Food Truck,Dance Studio,Food Court


Cluster 4

In [45]:
vancouver_merged.loc[vancouver_merged['Cluster Labels'] == 3, vancouver_merged.columns[[1] + list(range(5, vancouver_merged.shape[1]))]]

Unnamed: 0,Name,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
21,Vancouver,3,Park,Coffee Shop,Sandwich Place,Electronics Store,Food Court,Food & Drink Shop,Fish Market,Financial or Legal Service,Filipino Restaurant,Fast Food Restaurant
53,Vancouver,3,Coffee Shop,Fast Food Restaurant,Park,Yoga Studio,Ethiopian Restaurant,Food Truck,Food Court,Food & Drink Shop,Fish Market,Financial or Legal Service


Cluster 5

In [73]:
cluster5 = vancouver_merged.loc[vancouver_merged['Cluster Labels'] == 4, vancouver_merged.columns[[1] + list(range(5, vancouver_merged.shape[1]))]]
cluster5

Unnamed: 0,Name,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Vancouver,4,Coffee Shop,Sandwich Place,Park,Deli / Bodega,Athletics & Sports,Brewery,Cheese Shop,Restaurant,Japanese Restaurant,Food Truck
16,Vancouver,4,Coffee Shop,Park,Asian Restaurant,Pharmacy,Indian Restaurant,Japanese Restaurant,Nail Salon,Restaurant,Sushi Restaurant,Sandwich Place
22,Vancouver,4,Vietnamese Restaurant,Inn,Bakery,Food Truck,Bus Line,Sushi Restaurant,Gas Station,Event Space,Liquor Store,Gun Shop
28,Vancouver,4,Vietnamese Restaurant,Inn,Bakery,Food Truck,Bus Line,Sushi Restaurant,Gas Station,Event Space,Liquor Store,Gun Shop
29,Vancouver,4,Chinese Restaurant,Vietnamese Restaurant,Japanese Restaurant,Fried Chicken Joint,Bank,Pharmacy,Dessert Shop,Discount Store,Cantonese Restaurant,Café
31,Vancouver,4,Convenience Store,Pizza Place,Sushi Restaurant,Vietnamese Restaurant,Fast Food Restaurant,Park,Sandwich Place,Electronics Store,Food & Drink Shop,Fish Market
32,Vancouver,4,Coffee Shop,Chinese Restaurant,Sushi Restaurant,Sandwich Place,Tea Room,Pharmacy,Convenience Store,Noodle House,Portuguese Restaurant,Thai Restaurant
34,Vancouver,4,Vietnamese Restaurant,Breakfast Spot,Brazilian Restaurant,Chinese Restaurant,Bank,Dim Sum Restaurant,Diner,Fast Food Restaurant,Fried Chicken Joint,Department Store
35,Vancouver,4,Coffee Shop,Chinese Restaurant,Vietnamese Restaurant,Bus Stop,Convenience Store,Greek Restaurant,Ice Cream Shop,Dessert Shop,Restaurant,Sandwich Place
36,Vancouver,4,Coffee Shop,Chinese Restaurant,Sushi Restaurant,Sandwich Place,Tea Room,Pharmacy,Convenience Store,Noodle House,Portuguese Restaurant,Thai Restaurant


In [74]:
df_cluster5 = get_venue_counts(cluster5)

In [75]:
df_cluster5

Unnamed: 0,Counts
Vietnamese Restaurant,15
Sandwich Place,15
Sushi Restaurant,14
Chinese Restaurant,12
Convenience Store,12
...,...
Bookstore,0
Food Court,0
Hot Dog Joint,0
Salon / Barbershop,0
