## Coursera Data Science Capstone Final Project

### Introduction/Business Problem

1.1 Background
Metro Vancouver is a Canadian political subdivision and corporate entity representing the metropolitan area of Greater Vancouver. It consists of 21 municipalities. In the 2016 Census of Population conducted by Statistics Canada, the Metro Vancouver Regional District recorded a population of 2,463,431, making it the regional district in British Columbia with the greatest population and population density.


1.2 Problem
The aim of this project is to investigate the features of the different cities in the Metro Vancouver area. The project also groups the cities and sub-areas into 5 clusters by their venue profiles, and ranks the most common venue categories in each cluster.


1.3 Interest
Because each city in the Metro Vancouver area has its own character, the project should interest potential travellers and residents: it can help them decide which areas to explore based on their own preferences.


### Data

The postal codes for all British Columbia cities are scraped from the Wikipedia page "List of postal codes of Canada: V" (https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_V). Each entry is a forward sortation area (FSA), the first three characters of a Canadian postal code.

The data is then filtered to the cities under the Metro Vancouver umbrella. The list of member municipalities can be found on the Metro Vancouver official website.

The latitude and longitude of each postal code are retrieved with the Python pgeocode library.

The venues around each location are retrieved with the Foursquare API.

#### Analyzing the features of each city in Metro Vancouver Area


In [391]:
!pip install beautifulsoup4
!pip install lxml
!pip install geopy
#!conda install -c conda-forge folium=0.5.0 --yes

import random # library for random number generation
import requests # library to handle requests
import pandas as pd # library for data analysis
import numpy as np # library to handle data in a vectorized manner
from pandas import json_normalize # transform JSON into a pandas DataFrame (pandas >= 1.0)

from geopy.geocoders import Nominatim # module to convert an address into latitude and longitude values

# libraries for displaying images and HTML
from IPython.display import Image, HTML, display_html

import folium # plotting library
from bs4 import BeautifulSoup # HTML parsing
from sklearn.cluster import KMeans
import matplotlib.cm as cm
import matplotlib.colors as colors

print('Folium installed')
print('Libraries imported.')


Folium installed
Libraries imported.


In [392]:
url = 'https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_V'
results = requests.get(url).text
soup=BeautifulSoup(results,'lxml')
table_contents=[]
table=soup.find('table')
table

<table class="wikitable sortable">
<tbody><tr>
<td valign="top" width="11.1%"><b>V1A</b><br/><span style="font-size: smaller; line-height: 125%;"><a href="/wiki/Kimberley,_British_Columbia" title="Kimberley, British Columbia">Kimberley</a></span>
</td>
<td valign="top" width="11.1%"><b>V2A</b><br/><span style="font-size: smaller; line-height: 125%;"><a href="/wiki/Penticton" title="Penticton">Penticton</a></span>
</td>
<td valign="top" width="11.1%"><b>V3A</b><br/><span style="font-size: smaller; line-height: 125%;"><a href="/wiki/Langley,_British_Columbia_(district_municipality)" title="Langley, British Columbia (district municipality)">Langley Township</a><br/>(Langley City)</span>
</td>
<td valign="top" width="11.1%"><b>V4A</b><br/><span style="font-size: smaller; line-height: 125%;"><a href="/wiki/Surrey,_British_Columbia" title="Surrey, British Columbia">Surrey</a><br/>Southwest</span>
</td>
<td valign="top" width="11.1%"><b>V5A</b><br/><span style="font-size: smaller; line-height

In [393]:
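for row in table.find_all('td'):
    # skip postal codes that have not been assigned to a place
    if row.span.text == 'Not assigned':
        continue
    cell = {}
    cell['PostalCode'] = row.b.text[:3]   # forward sortation area, e.g. 'V3A'
    # span.text joins the text around <br/> with no separator ('SurreySouthwest'),
    # and everything from '(' on is dropped ('Langley Township(Langley City)' -> 'Langley Township')
    cell['City'] = row.span.text.split('(')[0]
    table_contents.append(cell)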
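Why do the scraped names look like `SurreySouthwest` or keep a trailing space (`North Vancouver `)? `span.text` concatenates the text on either side of a `<br/>` with no separator. The sketch below shows the difference on a cell shaped like the Wikipedia table, using BeautifulSoup's `get_text(separator, strip=True)` and the standard-library `html.parser` backend. Switching the scraper to `get_text` is only an option: the hand-written city list below would then need matching spellings.

```python
from bs4 import BeautifulSoup

# One <td> cell in the same shape as the Wikipedia postal-code table
cell_html = ('<td><b>V4A</b><br/><span><a href="/wiki/Surrey,_British_Columbia" '
             'title="Surrey, British Columbia">Surrey</a><br/>Southwest</span></td>')
cell = BeautifulSoup(cell_html, 'html.parser').td

glued = cell.span.text                        # text nodes joined with no separator
spaced = cell.span.get_text(' ', strip=True)  # text nodes stripped, then joined with a space

print(cell.b.text[:3], repr(glued), repr(spaced))
```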

        

In [394]:
df=pd.DataFrame(table_contents)
print (df.shape)

(175, 2)


In [395]:
df.City.values

array(['Kimberley', 'Penticton', 'Langley Township', 'SurreySouthwest',
       'Burnaby', 'Vancouver', 'RichmondSouth', 'Powell River',
       'Victoria', 'VernonEast', 'KamloopsNorthwest',
       'Port CoquitlamCentral', 'White Rock', 'Burnaby', 'Vancouver',
       'Richmond', 'Squamish', 'Victoria', 'Cranbrook',
       'KamloopsCentral and Southeast', 'Port CoquitlamSouth',
       'DeltaNortheast', 'Burnaby', 'Vancouver', 'RichmondNorthwest',
       'Kitimat', 'Victoria', 'Salmon Arm', 'KamloopsSouth and West',
       'CoquitlamNorth', 'DeltaEast', 'Burnaby', 'Vancouver',
       'RichmondSouthwest', 'Whistler', 'Victoria', 'Dawson Creek',
       'Williams Lake', 'AbbotsfordEast', 'DeltaEast Central', 'Burnaby',
       'Vancouver', 'North Vancouver ', 'Terrace', 'Ladysmith',
       'VernonWest', 'KamloopsNorth', 'Port Moody', 'Burnaby',
       'Vancouver', 'North Vancouver ', 'Campbell RiverOutskirts',
       'Fort St. John', 'Quesnel', 'CoquitlamNorth', 'Burnaby',
       'Vancouver',

In [396]:
# Metro Vancouver cities/areas, spelled exactly as scraped
# (sub-area names are glued to the city name, and 'North Vancouver ' keeps its trailing space)
Vancouver_city = ['Vancouver', 'Burnaby', 'Richmond', 'RichmondNorthwest', 'RichmondSouthwest', 'RichmondNortheast', 'RichmondSoutheast', 'RichmondNorth',
                  'North Vancouver ', 'West VancouverSoutheast', 'West VancouverSouth', 'West VancouverWest',
                  'Port CoquitlamCentral', 'Port CoquitlamSouth', 'CoquitlamNorth', 'Pitt Meadows', 'Maple RidgeEast', 'Maple RidgeWest',
                  'New WestminsterNortheast', 'New WestminsterSouthwest',
                  'DeltaSoutheast', 'DeltaNortheast', 'DeltaEast', 'DeltaEast Central', 'White Rock',
                  'SurreyNorth', 'SurreyInner Northwest', 'SurreyOuter Northwest', 'SurreyUpper East', 'SurreyUpper West', 'SurreyLower West', 'SurreyLower East',
                  'Langley Township', 'Langley TownshipEast', 'Langley TownshipNorthwest', 'Langley TownshipSouthwest']
df_vancouver=df[df['City'].isin(Vancouver_city)]
print (df_vancouver.shape)
print (df_vancouver.head(5))

(82, 2)
   PostalCode                   City
2         V3A       Langley Township
4         V5A                Burnaby
5         V6A              Vancouver
11        V3B  Port CoquitlamCentral
12        V4B             White Rock
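Because sub-area names are glued onto the city name, the hand-written list has to spell out every variant. One alternative is to match on the municipality prefix with `Series.str.startswith`, which accepts a tuple of prefixes. The `municipalities` tuple and the toy rows below are illustrative; the rows reuse postal codes and names from the scraped array.

Prefix matching does not select exactly what the list above selects. It would keep `SurreySouthwest` and `RichmondSouth`, which the list leaves out. It would also miss `North Vancouver ` unless that prefix is added, because matching is anchored at the start of the string.

```python
import pandas as pd

# Toy rows in the same shape as `df`, with names spelled as the scraper produces them
df_demo = pd.DataFrame({
    'PostalCode': ['V1A', 'V2A', 'V3A', 'V4A', 'V6A', 'V7A'],
    'City': ['Kimberley', 'Penticton', 'Langley Township',
             'SurreySouthwest', 'Vancouver', 'RichmondSouth'],
})

# Illustrative (incomplete) tuple of Metro Vancouver municipality prefixes
municipalities = ('Vancouver', 'Richmond', 'Surrey', 'Langley')

in_metro = df_demo['City'].str.startswith(municipalities)
df_demo_metro = df_demo[in_metro].copy()  # .copy() avoids SettingWithCopyWarning on later edits
df_demo_metro['City'] = df_demo_metro['City'].str.strip()  # drop stray trailing spaces
print(df_demo_metro)
```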


## Find the corresponding latitude and longitude for cities/areas in Metro Vancouver

In [397]:
!pip install pgeocode
import pgeocode



In [398]:
Postal_list=df_vancouver['PostalCode'].tolist()

Postal_list


['V3A',
 'V5A',
 'V6A',
 'V3B',
 'V4B',
 'V5B',
 'V6B',
 'V7B',
 'V3C',
 'V4C',
 'V5C',
 'V6C',
 'V7C',
 'V3E',
 'V4E',
 'V5E',
 'V6E',
 'V7E',
 'V4G',
 'V5G',
 'V6G',
 'V7G',
 'V5H',
 'V6H',
 'V7H',
 'V3J',
 'V5J',
 'V6J',
 'V7J',
 'V5K',
 'V6K',
 'V7K',
 'V3L',
 'V4L',
 'V5L',
 'V6L',
 'V7L',
 'V3M',
 'V5M',
 'V6M',
 'V7M',
 'V3N',
 'V5N',
 'V6N',
 'V7N',
 'V5P',
 'V6P',
 'V7P',
 'V3R',
 'V5R',
 'V6R',
 'V7R',
 'V3S',
 'V5S',
 'V6S',
 'V3T',
 'V5T',
 'V6T',
 'V7T',
 'V3V',
 'V5V',
 'V6V',
 'V7V',
 'V2W',
 'V3W',
 'V4W',
 'V5W',
 'V6W',
 'V7W',
 'V2X',
 'V3X',
 'V5X',
 'V6X',
 'V7X',
 'V2Y',
 'V3Y',
 'V5Y',
 'V7Y',
 'V2Z',
 'V3Z',
 'V5Z',
 'V6Z']

In [399]:
nomi = pgeocode.Nominatim('Ca')

Vancouver_coor=nomi.query_postal_code(Postal_list)
Vancouver_coor.head()

|   | postal_code | country_code | place_name | state_name | state_code | county_name | county_code | community_name | community_code | latitude | longitude | accuracy |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | V3A | CA | Langley City | British Columbia | BC |  |  |  |  | 49.0997 | -122.6526 | 4.0 |
| 1 | V5A | CA | Burnaby (Government Road / Lake City / SFU / B... | British Columbia | BC | Burnaby |  |  |  | 49.264 | -122.9369 | 6.0 |
| 2 | V6A | CA | Vancouver (Strathcona / Chinatown / Downtown E... | British Columbia | BC | Vancouver | 5965814.0 |  |  | 49.2779 | -123.0908 | 1.0 |
| 3 | V3B | CA | Port Coquitlam Central | British Columbia | BC | Port Coquitlam |  |  |  | 49.274 | -122.7649 | 1.0 |
| 4 | V4B | CA | White Rock | British Columbia | BC |  |  |  |  | 49.0259 | -122.8058 | 4.0 |


In [400]:
df_vancouver.head(20)


|   | PostalCode | City |
|---|---|---|
| 2 | V3A | Langley Township |
| 4 | V5A | Burnaby |
| 5 | V6A | Vancouver |
| 11 | V3B | Port CoquitlamCentral |
| 12 | V4B | White Rock |
| 13 | V5B | Burnaby |
| 14 | V6B | Vancouver |
| 15 | V7B | Richmond |
| 20 | V3C | Port CoquitlamSouth |
| 21 | V4C | DeltaNortheast |


In [401]:
# rename to match pgeocode's column name for the merge; assigning the result avoids SettingWithCopyWarning
df_vancouver = df_vancouver.rename(columns={'PostalCode': 'postal_code'})
df_vancouver.head()


|   | postal_code | City |
|---|---|---|
| 2 | V3A | Langley Township |
| 4 | V5A | Burnaby |
| 5 | V6A | Vancouver |
| 11 | V3B | Port CoquitlamCentral |
| 12 | V4B | White Rock |


In [402]:
# attach pgeocode's coordinates to each postal code (inner join on postal_code)
df_vancouver_coor = pd.merge(df_vancouver, Vancouver_coor, on='postal_code')
df_vancouver_coor.head()

|   | postal_code | City | country_code | place_name | state_name | state_code | county_name | county_code | community_name | community_code | latitude | longitude | accuracy |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | V3A | Langley Township | CA | Langley City | British Columbia | BC |  |  |  |  | 49.0997 | -122.6526 | 4.0 |
| 1 | V5A | Burnaby | CA | Burnaby (Government Road / Lake City / SFU / B... | British Columbia | BC | Burnaby |  |  |  | 49.264 | -122.9369 | 6.0 |
| 2 | V6A | Vancouver | CA | Vancouver (Strathcona / Chinatown / Downtown E... | British Columbia | BC | Vancouver | 5965814.0 |  |  | 49.2779 | -123.0908 | 1.0 |
| 3 | V3B | Port CoquitlamCentral | CA | Port Coquitlam Central | British Columbia | BC | Port Coquitlam |  |  |  | 49.274 | -122.7649 | 1.0 |
| 4 | V4B | White Rock | CA | White Rock | British Columbia | BC |  |  |  |  | 49.0259 | -122.8058 | 4.0 |
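`pd.merge` defaults to an inner join, so a postal code that pgeocode cannot resolve would silently drop out. A code can also be resolved and still come back without coordinates. The sketch below checks for both by doing a left join with `indicator=True`. The two toy frames stand in for `df_vancouver` and `Vancouver_coor`; the `X1X`/`X2X` codes are hypothetical placeholders.

```python
import numpy as np
import pandas as pd

# Stand-ins for df_vancouver and Vancouver_coor (X1X / X2X are hypothetical codes)
left = pd.DataFrame({'postal_code': ['V3A', 'X1X', 'X2X'],
                     'City': ['Langley Township', 'Placeholder A', 'Placeholder B']})
right = pd.DataFrame({'postal_code': ['V3A', 'X1X'],
                      'latitude': [49.0997, np.nan],
                      'longitude': [-122.6526, np.nan]})

checked = pd.merge(left, right, on='postal_code', how='left', indicator=True)

# codes with no row at all on the right-hand side
unmatched = checked[checked['_merge'] != 'both']
# rows that would break the map and the venue queries
no_coords = checked[checked[['latitude', 'longitude']].isna().any(axis=1)]
print(list(unmatched['postal_code']), list(no_coords['postal_code']))
```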


In [403]:
df_vancouver_coor=df_vancouver_coor[['postal_code','City','latitude','longitude']]
print (df_vancouver_coor.shape)
df_vancouver_coor.head()

(82, 4)


|   | postal_code | City | latitude | longitude |
|---|---|---|---|---|
| 0 | V3A | Langley Township | 49.0997 | -122.6526 |
| 1 | V5A | Burnaby | 49.264 | -122.9369 |
| 2 | V6A | Vancouver | 49.2779 | -123.0908 |
| 3 | V3B | Port CoquitlamCentral | 49.274 | -122.7649 |
| 4 | V4B | White Rock | 49.0259 | -122.8058 |


In [404]:
address = 'Vancouver, VA'  # note: 'VA' is the usual abbreviation for Virginia; 'Vancouver, BC' is the less ambiguous query

geolocator = Nominatim(user_agent="to_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Vancouver are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Vancouver are 49.2608724, -123.1139529.


In [405]:

# centre the map on Vancouver and add one marker per postal code, labelled with its city
map_vancouver = folium.Map(location=[latitude, longitude], zoom_start=10)

for lat, lng, postalcode, city in zip(df_vancouver_coor['latitude'], df_vancouver_coor['longitude'], df_vancouver_coor['postal_code'], df_vancouver_coor['City']):
    label = folium.Popup('{}'.format(city), parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_vancouver)

map_vancouver

In [406]:
# Foursquare credentials: fill in your own and avoid printing or committing the secret
CLIENT_ID = 'YOUR_CLIENT_ID' # your Foursquare ID
CLIENT_SECRET = 'YOUR_CLIENT_SECRET' # your Foursquare Secret
VERSION = '20180604' # Foursquare API version date


In [407]:
df_vancouver_coor.head(30)

|   | postal_code | City | latitude | longitude |
|---|---|---|---|---|
| 0 | V3A | Langley Township | 49.0997 | -122.6526 |
| 1 | V5A | Burnaby | 49.264 | -122.9369 |
| 2 | V6A | Vancouver | 49.2779 | -123.0908 |
| 3 | V3B | Port CoquitlamCentral | 49.274 | -122.7649 |
| 4 | V4B | White Rock | 49.0259 | -122.8058 |
| 5 | V5B | Burnaby | 49.2769 | -122.9761 |
| 6 | V6B | Vancouver | 49.2788 | -123.1139 |
| 7 | V7B | Richmond | 49.1988 | -123.1799 |
| 8 | V3C | Port CoquitlamSouth | 49.2436 | -122.7865 |
| 9 | V4C | DeltaNortheast | 49.1551 | -122.9124 |


In [408]:
def getNearbyVenues(codes, names, latitudes, longitudes, radius=500, limit=100):
    """Query the Foursquare v2 'explore' endpoint around each point; return one row per venue."""
    venues_list = []
    for code, name, lat, lng in zip(codes, names, latitudes, longitudes):
        params = {
            'client_id': CLIENT_ID,
            'client_secret': CLIENT_SECRET,
            'v': VERSION,
            'll': '{},{}'.format(lat, lng),
            'radius': radius,   # metres around the postal-code centre
            'limit': limit,     # maximum number of venues returned per query
        }
        results = requests.get('https://api.foursquare.com/v2/venues/explore', params=params).json()['response']['groups'][0]['items']
        # keep the query point plus each venue's name, location and primary category
        venues_list.append([(code, name, lat, lng,
                             v['venue']['name'],
                             v['venue']['location']['lat'],
                             v['venue']['location']['lng'],
                             v['venue']['categories'][0]['name']) for v in results])
    # flatten the per-location lists into a single DataFrame
    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['PostalCode', 'City', 'City Latitude', 'City Longitude',
                             'Venue name', 'Venue Latitude', 'Venue Longitude', 'Venue Category']
    return nearby_venues
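`getNearbyVenues` assumes each response has the shape `response → groups[0] → items[] → venue`, with at least one entry in `venue['categories']`. The sketch below applies the same flattening to a hand-written dictionary in that shape; its venue names and coordinates are made up. It adds a guard for uncategorised venues, which would otherwise raise an `IndexError`. `flatten_items` is a hypothetical helper name, not part of the notebook.

```python
import pandas as pd

# Hand-written payload shaped like the v2 "explore" response read above (values are made up)
payload = {'response': {'groups': [{'items': [
    {'venue': {'name': 'Cafe A', 'location': {'lat': 49.10, 'lng': -122.65},
               'categories': [{'name': 'Café'}]}},
    {'venue': {'name': 'Unlabelled Spot', 'location': {'lat': 49.11, 'lng': -122.66},
               'categories': []}},
]}]}}

def flatten_items(code, name, lat, lng, items):
    """Turn one location's 'items' list into row tuples; uncategorised venues get None."""
    rows = []
    for item in items:
        venue = item['venue']
        categories = venue.get('categories') or []
        rows.append((code, name, lat, lng, venue['name'],
                     venue['location']['lat'], venue['location']['lng'],
                     categories[0]['name'] if categories else None))
    return rows

items = payload['response']['groups'][0]['items']
rows = flatten_items('V3A', 'Langley Township', 49.0997, -122.6526, items)
demo = pd.DataFrame(rows, columns=['PostalCode', 'City', 'City Latitude', 'City Longitude',
                                   'Venue name', 'Venue Latitude', 'Venue Longitude', 'Venue Category'])
print(demo[['Venue name', 'Venue Category']])
```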

In [409]:
Vancouver_Venues=getNearbyVenues(codes=df_vancouver_coor['postal_code'],
                               names=df_vancouver_coor['City'],
                               latitudes=df_vancouver_coor['latitude'],
                               longitudes=df_vancouver_coor['longitude'],radius=500)
Vancouver_Venues.head(20)



|   | PostalCode | City | City Latitude | City Longitude | Venue name | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|---|---|
| 0 | V3A | Langley Township | 49.0997 | -122.6526 | Ban Chok Dee | 49.103026 | -122.652841 | Thai Restaurant |
| 1 | V3A | Langley Township | 49.0997 | -122.6526 | Venetis Restaurant | 49.10338 | -122.65373 | Steakhouse |
| 2 | V3A | Langley Township | 49.0997 | -122.6526 | Maru Sushi Japanese Restaurant | 49.103715 | -122.650223 | Sushi Restaurant |
| 3 | V3A | Langley Township | 49.0997 | -122.6526 | McBurney Coffee and Tea House | 49.104015 | -122.653886 | Café |
| 4 | V3A | Langley Township | 49.0997 | -122.6526 | Michael's No Frills | 49.102296 | -122.65817 | Grocery Store |
| 5 | V3A | Langley Township | 49.0997 | -122.6526 | Douglas Park | 49.102418 | -122.652759 | Playground |
| 6 | V3A | Langley Township | 49.0997 | -122.6526 | Katy's Restaurant | 49.103213 | -122.653373 | American Restaurant |
| 7 | V5A | Burnaby | 49.264 | -122.9369 | Burnaby Mountain Golf Course | 49.264878 | -122.942871 | Golf Course |
| 8 | V5A | Burnaby | 49.264 | -122.9369 | CaseMogul Phone Repairs | 49.260747 | -122.939115 | Mobile Phone Shop |
| 9 | V5A | Burnaby | 49.264 | -122.9369 | Burnaby Mountain Driving Range | 49.263959 | -122.942353 | Golf Driving Range |


In [410]:
Vancouver_Venues.shape

(1090, 8)

In [434]:

Vancouver_Venues.groupby('City').count().sort_values(['Venue name'], ascending=False)

| City | PostalCode | City Latitude | City Longitude | Venue name | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|---|
| Vancouver | 755 | 755 | 755 | 755 | 755 | 755 | 755 |
| Burnaby | 68 | 68 | 68 | 68 | 68 | 68 | 68 |
| SurreyInner Northwest | 66 | 66 | 66 | 66 | 66 | 66 | 66 |
| North Vancouver | 42 | 42 | 42 | 42 | 42 | 42 | 42 |
| Richmond | 33 | 33 | 33 | 33 | 33 | 33 | 33 |
| White Rock | 24 | 24 | 24 | 24 | 24 | 24 | 24 |
| DeltaNortheast | 14 | 14 | 14 | 14 | 14 | 14 | 14 |
| Pitt Meadows | 11 | 11 | 11 | 11 | 11 | 11 | 11 |
| SurreyNorth | 11 | 11 | 11 | 11 | 11 | 11 | 11 |
| Langley Township | 7 | 7 | 7 | 7 | 7 | 7 | 7 |
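These counts are summed over every postal code that shares a city name, and each query returns at most `limit` venues within `radius` metres. `Vancouver` therefore leads partly because many postal codes carry that label. The sketch below gives a per-postal-code view that adjusts for this, using pandas named aggregation on a toy frame with the same column names; the values are made up.

```python
import pandas as pd

# Toy venue rows with the same columns as Vancouver_Venues (values are made up)
venues = pd.DataFrame({
    'PostalCode': ['V6A', 'V6A', 'V6A', 'V6B', 'V6B', 'V3A'],
    'City':       ['Vancouver'] * 5 + ['Langley Township'],
    'Venue name': ['A', 'B', 'C', 'D', 'E', 'F'],
})

summary = venues.groupby('City').agg(
    venues=('Venue name', 'count'),           # total venues returned for the city
    postal_codes=('PostalCode', 'nunique'),   # number of postal-code queries that contributed
)
summary['venues_per_code'] = summary['venues'] / summary['postal_codes']
print(summary.sort_values('venues_per_code', ascending=False))
```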


In [412]:
print('There are {} unique categories.'.format(len(Vancouver_Venues['Venue Category'].unique())))

There are 225 unique categories.


In [413]:
# one hot encoding
vancouver_onehot = pd.get_dummies(Vancouver_Venues[['Venue Category']], prefix="", prefix_sep="")

# add the City column back to the dataframe
vancouver_onehot['City'] = Vancouver_Venues['City'] 

# move the City column to the first position
fixed_columns = [vancouver_onehot.columns[-1]] + list(vancouver_onehot.columns[:-1])
vancouver_onehot = vancouver_onehot[fixed_columns]

vancouver_onehot.head()

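|   | City | Accessories Store | Airport | Airport Food Court | Airport Lounge | Airport Service | Airport Terminal | American Restaurant | Amphitheater | Art Gallery | ... | Travel Lounge | Vegetarian / Vegan Restaurant | Video Game Store | Video Store | Vietnamese Restaurant | Warehouse Store | Waterfront | Wine Bar | Wine Shop | Yoga Studio |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Langley Township | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | Langley Township | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 2 | Langley Township | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 3 | Langley Township | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 4 | Langley Township | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |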


In [414]:
vancouver_grouped =vancouver_onehot.groupby('City').mean().reset_index()
vancouver_grouped

|   | City | Accessories Store | Airport | Airport Food Court | Airport Lounge | Airport Service | Airport Terminal | American Restaurant | Amphitheater | Art Gallery | ... | Travel Lounge | Vegetarian / Vegan Restaurant | Video Game Store | Video Store | Vietnamese Restaurant | Warehouse Store | Waterfront | Wine Bar | Wine Shop | Yoga Studio |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Burnaby | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.014706 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.014706 | 0.0 | 0.014706 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 1 | CoquitlamNorth | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 2 | DeltaEast | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 3 | DeltaEast Central | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 4 | DeltaNortheast | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.071429 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 5 | Langley Township | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.142857 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 6 | Langley TownshipNorthwest | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 7 | Langley TownshipSouthwest | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 8 | Maple RidgeEast | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 9 | Maple RidgeWest | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
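Averaging the one-hot columns per city turns them into category frequencies. Each value is the share of that city's returned venues that fall in the category, so every row sums to 1. The tiny worked example below runs the same `get_dummies` then `groupby(...).mean()` steps on made-up data. `dtype=int` is an assumption added to reproduce the 0/1 columns shown above, since newer pandas versions return booleans by default.

```python
import pandas as pd

toy = pd.DataFrame({
    'City': ['A', 'A', 'A', 'A', 'B', 'B'],
    'Venue Category': ['Café', 'Café', 'Park', 'Pub', 'Trail', 'Trail'],
})

onehot = pd.get_dummies(toy[['Venue Category']], prefix='', prefix_sep='', dtype=int)
onehot.insert(0, 'City', toy['City'])   # keep City as the first column
freq = onehot.groupby('City').mean()    # share of each category within a city
print(freq)
```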


In [415]:

num_top_venues = 5

for hood in vancouver_grouped['City']:
    print("----"+hood+"----")
    temp = vancouver_grouped[vancouver_grouped['City'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Burnaby----
         venue  freq
0  Coffee Shop  0.09
1         Park  0.09
2     Pharmacy  0.04
3  Golf Course  0.04
4     Bus Stop  0.03


----CoquitlamNorth----
                  venue  freq
0              Mountain   0.5
1                  Park   0.5
2     Accessories Store   0.0
3  Other Great Outdoors   0.0
4         Movie Theater   0.0


----DeltaEast----
                  venue  freq
0                 Trail   1.0
1     Accessories Store   0.0
2  Other Great Outdoors   0.0
3              Mountain   0.0
4         Movie Theater   0.0


----DeltaEast Central----
                    venue  freq
0  Furniture / Home Store   1.0
1       Accessories Store   0.0
2                   Motel   0.0
3                Mountain   0.0
4           Movie Theater   0.0


----DeltaNortheast----
            venue  freq
0     Coffee Shop  0.21
1     Pizza Place  0.14
2             Pub  0.07
3  Farmers Market  0.07
4   Grocery Store  0.07


----Langley Township----
                 venue  freq
0       

In [416]:

def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [417]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['City']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:  # 4th and later use the 'th' suffix
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['City'] = vancouver_grouped['City']

for ind in np.arange(vancouver_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(vancouver_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

|   | City | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Burnaby | Coffee Shop | Park | Pharmacy | Golf Course | Bus Stop | Sushi Restaurant | Thai Restaurant | Bookstore | Garden Center | Chinese Restaurant |
| 1 | CoquitlamNorth | Mountain | Park | Yoga Studio | Event Space | Fish Market | Fish & Chips Shop | Financial or Legal Service | Filipino Restaurant | Field | Fast Food Restaurant |
| 2 | DeltaEast | Trail | Yoga Studio | Event Service | Food & Drink Shop | Fish Market | Fish & Chips Shop | Financial or Legal Service | Filipino Restaurant | Field | Fast Food Restaurant |
| 3 | DeltaEast Central | Furniture / Home Store | Yoga Studio | Food Court | Fish Market | Fish & Chips Shop | Financial or Legal Service | Filipino Restaurant | Field | Fast Food Restaurant | Farmers Market |
| 4 | DeltaNortheast | Coffee Shop | Pizza Place | Pub | Sandwich Place | Sushi Restaurant | Skating Rink | Grocery Store | Convenience Store | Farmers Market | Vietnamese Restaurant |
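For cities with only one or two venues, most of the ten slots are filled by categories with frequency 0. Their order reflects how `sort_values` breaks ties, not the data, which is why `Yoga Studio`, `Fish Market` and `Field` show up for CoquitlamNorth and DeltaEast.

The variant below of `return_most_common_venues` keeps only categories that actually occur and pads the remaining slots with empty strings. It uses a stable sort so that ties keep their column order. `top_present_venues` is a hypothetical name, and the toy row below is made up.

```python
import pandas as pd

def top_present_venues(row, num_top_venues):
    """Return up to num_top_venues categories with non-zero frequency, padded with ''."""
    freqs = row.iloc[1:].astype(float)   # skip the City column
    # keep only categories that occur; mergesort is stable, so ties keep column order
    present = freqs[freqs > 0].sort_values(ascending=False, kind='mergesort')
    names = list(present.index[:num_top_venues])
    return names + [''] * (num_top_venues - len(names))

row = pd.Series({'City': 'CoquitlamNorth', 'Mountain': 0.5, 'Park': 0.5,
                 'Yoga Studio': 0.0, 'Field': 0.0})
top = top_present_venues(row, 4)
print(top)
```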


In [418]:
# set number of clusters
kclusters = 5

vancouver_grouped_clustering = vancouver_grouped.drop(columns='City')

# run k-means clustering (n_init=10 matches the older scikit-learn default)
kmeans = KMeans(n_clusters=kclusters, random_state=0, n_init=10).fit(vancouver_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10]

array([0, 4, 2, 1, 0, 0, 0, 0, 0, 0], dtype=int32)
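`kclusters = 5` is chosen by hand. A common sanity check is to compare silhouette scores over a range of k. The sketch below does this with scikit-learn's `KMeans` and `sklearn.metrics.silhouette_score`. It runs on random stand-in data: `X` is a placeholder, and in the notebook it would be `vancouver_grouped_clustering`. Higher scores mean better-separated clusters, but this is one heuristic among several, not a definitive answer.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(0)
# Stand-in for vancouver_grouped_clustering: rows are cities, columns are category frequencies
X = rng.dirichlet(np.ones(8), size=40)

scores = {}
for k in range(2, 8):
    labels = KMeans(n_clusters=k, random_state=0, n_init=10).fit_predict(X)
    scores[k] = silhouette_score(X, labels)   # in [-1, 1]; higher means better-separated clusters

best_k = max(scores, key=scores.get)
print({k: round(s, 3) for k, s in scores.items()}, 'best k:', best_k)
```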

In [419]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

#df_vancouver_coor = df_vancouver_coor
#vancouver_merged = vancouver_data

# merge manhattan_grouped with manhattan_data to add latitude/longitude for each neighborhood
df_vancouver_coor = df_vancouver_coor.join(neighborhoods_venues_sorted.set_index('City'), on='City')
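A postal code whose city returned no venues has no row in `neighborhoods_venues_sorted`, so the join leaves `NaN` in `Cluster Labels`. That is why the labels print as floats (`0.0`, `1.0`, …). The sketch below drops those rows and restores integer labels before mapping. It runs on a toy frame; `X1X` and its city name are placeholders.

```python
import numpy as np
import pandas as pd

coords = pd.DataFrame({'postal_code': ['V3A', 'V5A', 'X1X'],
                       'City': ['Langley Township', 'Burnaby', 'Placeholder Town']})
sorted_venues = pd.DataFrame({'City': ['Langley Township', 'Burnaby'],
                              'Cluster Labels': np.array([0, 0], dtype='int32')})

joined = coords.join(sorted_venues.set_index('City'), on='City')
print(joined['Cluster Labels'].dtype)   # a float dtype: the unmatched city got NaN

clustered = joined.dropna(subset=['Cluster Labels']).copy()
clustered['Cluster Labels'] = clustered['Cluster Labels'].astype(int)
print(clustered)
```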



In [420]:
df_vancouver_coor.shape

(82, 15)

In [421]:
df_vancouver_coor.head()

|   | postal_code | City | latitude | longitude | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | V3A | Langley Township | 49.0997 | -122.6526 | 0.0 | Thai Restaurant | Sushi Restaurant | Playground | Café | Steakhouse | American Restaurant | Grocery Store | Fair | Falafel Restaurant | Farm |
| 1 | V5A | Burnaby | 49.264 | -122.9369 | 0.0 | Coffee Shop | Park | Pharmacy | Golf Course | Bus Stop | Sushi Restaurant | Thai Restaurant | Bookstore | Garden Center | Chinese Restaurant |
| 2 | V6A | Vancouver | 49.2779 | -123.0908 | 0.0 | Coffee Shop | Hotel | Café | Restaurant | Chinese Restaurant | Pizza Place | Japanese Restaurant | Sushi Restaurant | Bakery | Park |
| 3 | V3B | Port CoquitlamCentral | 49.274 | -122.7649 | 0.0 | Convenience Store | Construction & Landscaping | Park | Yoga Studio | Event Space | Fish Market | Fish & Chips Shop | Financial or Legal Service | Filipino Restaurant | Field |
| 4 | V4B | White Rock | 49.0259 | -122.8058 | 0.0 | Japanese Restaurant | Seafood Restaurant | Pizza Place | Café | Greek Restaurant | Gas Station | Beach | Gastropub | Bistro | Thai Restaurant |


In [422]:

df_vancouver_coor.loc[df_vancouver_coor['Cluster Labels'] == 0,df_vancouver_coor.columns[[1] + list(range(4, df_vancouver_coor.shape[1]))]]

|   | City | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Langley Township | 0.0 | Thai Restaurant | Sushi Restaurant | Playground | Café | Steakhouse | American Restaurant | Grocery Store | Fair | Falafel Restaurant | Farm |
| 1 | Burnaby | 0.0 | Coffee Shop | Park | Pharmacy | Golf Course | Bus Stop | Sushi Restaurant | Thai Restaurant | Bookstore | Garden Center | Chinese Restaurant |
| 2 | Vancouver | 0.0 | Coffee Shop | Hotel | Café | Restaurant | Chinese Restaurant | Pizza Place | Japanese Restaurant | Sushi Restaurant | Bakery | Park |
| 3 | Port CoquitlamCentral | 0.0 | Convenience Store | Construction & Landscaping | Park | Yoga Studio | Event Space | Fish Market | Fish & Chips Shop | Financial or Legal Service | Filipino Restaurant | Field |
| 4 | White Rock | 0.0 | Japanese Restaurant | Seafood Restaurant | Pizza Place | Café | Greek Restaurant | Gas Station | Beach | Gastropub | Bistro | Thai Restaurant |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 77 | Vancouver | 0.0 | Coffee Shop | Hotel | Café | Restaurant | Chinese Restaurant | Pizza Place | Japanese Restaurant | Sushi Restaurant | Bakery | Park |
| 78 | Langley TownshipSouthwest | 0.0 | Moving Target | Yoga Studio | Deli / Bodega | Food & Drink Shop | Fish Market | Fish & Chips Shop | Financial or Legal Service | Filipino Restaurant | Field | Fast Food Restaurant |
| 79 | SurreyLower East | 0.0 | Business Service | Breakfast Spot | Tour Provider | Event Space | Food & Drink Shop | Fish Market | Fish & Chips Shop | Financial or Legal Service | Filipino Restaurant | Field |
| 80 | Vancouver | 0.0 | Coffee Shop | Hotel | Café | Restaurant | Chinese Restaurant | Pizza Place | Japanese Restaurant | Sushi Restaurant | Bakery | Park |


In [423]:
df_vancouver_coor.loc[df_vancouver_coor['Cluster Labels'] == 1,df_vancouver_coor.columns[[1] + list(range(4, df_vancouver_coor.shape[1]))]]

|   | City | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 18 | DeltaEast Central | 1.0 | Furniture / Home Store | Yoga Studio | Food Court | Fish Market | Fish & Chips Shop | Financial or Legal Service | Filipino Restaurant | Field | Fast Food Restaurant | Farmers Market |


In [424]:
df_vancouver_coor.loc[df_vancouver_coor['Cluster Labels'] == 2,df_vancouver_coor.columns[[1] + list(range(4, df_vancouver_coor.shape[1]))]]

|   | City | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 12 | RichmondNorthwest | 2.0 | Trail | Shopping Plaza | Yoga Studio | Event Service | Fish Market | Fish & Chips Shop | Financial or Legal Service | Filipino Restaurant | Field | Fast Food Restaurant |
| 14 | DeltaEast | 2.0 | Trail | Yoga Studio | Event Service | Food & Drink Shop | Fish Market | Fish & Chips Shop | Financial or Legal Service | Filipino Restaurant | Field | Fast Food Restaurant |
| 68 | West VancouverWest | 2.0 | Tapas Restaurant | Trail | Yoga Studio | Event Service | Fish Market | Fish & Chips Shop | Financial or Legal Service | Filipino Restaurant | Field | Fast Food Restaurant |


In [425]:
df_vancouver_coor.loc[df_vancouver_coor['Cluster Labels'] == 3,df_vancouver_coor.columns[[1] + list(range(4, df_vancouver_coor.shape[1]))]]

|   | City | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 62 | West VancouverSouth | 3.0 | Art Gallery | Yoga Studio | Event Space | Food & Drink Shop | Fish Market | Fish & Chips Shop | Financial or Legal Service | Filipino Restaurant | Field | Fast Food Restaurant |


In [426]:
df_vancouver_coor.loc[df_vancouver_coor['Cluster Labels'] == 4,df_vancouver_coor.columns[[1] + list(range(4, df_vancouver_coor.shape[1]))]]

|   | City | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 13 | CoquitlamNorth | 4.0 | Mountain | Park | Yoga Studio | Event Space | Fish Market | Fish & Chips Shop | Financial or Legal Service | Filipino Restaurant | Field | Fast Food Restaurant |
| 25 | CoquitlamNorth | 4.0 | Mountain | Park | Yoga Studio | Event Space | Fish Market | Fish & Chips Shop | Financial or Legal Service | Filipino Restaurant | Field | Fast Food Restaurant |
| 72 | RichmondNorth | 4.0 | Playground | Park | Asian Restaurant | Yoga Studio | Event Service | Fish Market | Fish & Chips Shop | Financial or Legal Service | Filipino Restaurant | Field |
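Listing each cluster separately makes the clusters hard to compare side by side. The sketch below builds a compact per-cluster summary with pandas named aggregation: how many postal codes each cluster holds, which cities appear in it, and which `1st Most Common Venue` occurs most often. The toy rows repeat a few entries from the cluster tables (Burnaby, Vancouver, DeltaEast, RichmondNorthwest, CoquitlamNorth), with labels cast to integers.

```python
import pandas as pd

rows = pd.DataFrame({
    'City': ['Burnaby', 'Vancouver', 'Vancouver', 'DeltaEast', 'RichmondNorthwest', 'CoquitlamNorth'],
    'Cluster Labels': [0, 0, 0, 2, 2, 4],
    '1st Most Common Venue': ['Coffee Shop', 'Coffee Shop', 'Coffee Shop', 'Trail', 'Trail', 'Mountain'],
})

summary = rows.groupby('Cluster Labels').agg(
    postal_codes=('City', 'count'),                                        # rows (postal codes) per cluster
    cities=('City', lambda s: ', '.join(sorted(s.unique()))),              # distinct city names
    top_first_venue=('1st Most Common Venue', lambda s: s.mode().iloc[0]), # most frequent top venue
)
print(summary)
```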
