## The Battle of the Neighborhoods - Week 2

## Table of contents
* [Introduction](#introduction)
* [Business Problem](#BusinessProblem)
* [Data](#data)

### Introduction <a name="introduction"></a>

Calgary, located in the province of Alberta, is one of the largest municipalities in Canada. The city had a population of 1,285,711 in 2019, making it Alberta's largest city and Canada's third-largest municipality.

Calgary's economy includes activity in the energy, financial services, film and television, transportation and logistics, technology, manufacturing, aerospace, health and wellness, retail, and tourism sectors.

The Calgary Metropolitan Area (CMA) is home to Canada's second-highest number of corporate head offices among the country's 800 largest corporations.

With a thriving population and a diverse economy like this, a restaurant could be a good business venture in the city of Calgary.

However, an investor needs to be confident that they have weighed the right factors before setting up a restaurant business in the city of Calgary.

### Business Problem <a name="BusinessProblem"></a>

With this purpose in mind, finding the right location is one of the crucial factors in the success of a new restaurant.

In this Capstone project, I focus on the different types of restaurants that are open or closed in a particular location and then decide whether it is a good place to open a new restaurant, based on the cuisines that are popular around that place. Using location analytics and machine learning techniques such as clustering, this project aims to answer these business questions.

### Source of Data <a name="data"></a>

For this analysis, I will use the "List of neighbourhoods in Calgary" table scraped from Wikipedia (https://en.wikipedia.org/wiki/List_of_neighbourhoods_in_Calgary).

The scraped data lists a total of 257 neighbourhoods in Calgary. The data will be trimmed down to two features ("Name" and "Sector") to remove columns that are irrelevant to this analysis.
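
The scraped column headers carry Wikipedia footnote markers (for example `Name[9]` and `Sector[10]` in the table preview later in this notebook). A minimal sketch of stripping those markers before selecting columns, using only the standard library; `clean_header` is a hypothetical helper and not part of the original notebook:

```python
import re

def clean_header(name):
    """Remove footnote markers such as '[9]' and trim surrounding whitespace."""
    return re.sub(r"\[\d+\]", "", name).strip()

# A few of the headers as they appear in the scraped table
raw_headers = ["Name[9]", "Quadrant", "Sector[10]", "Ward[11]", "Type[10]"]
cleaned = [clean_header(h) for h in raw_headers]
print(cleaned)
```

With cleaned headers, the two features of interest could be selected by name (`"Name"` and `"Sector"`) instead of dropping every other column one by one.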

In [1]:
pip install lxml

Collecting lxml
  Downloading https://files.pythonhosted.org/packages/e7/a8/40115c84414c017e1a293f331709eb7534303d3ccd11ef805ac09b1481e7/lxml-4.4.1-cp37-cp37m-manylinux1_x86_64.whl (5.7MB)
Installing collected packages: lxml
Successfully installed lxml-4.4.1
Note: you may need to restart the kernel to use updated packages.


In [2]:
#import libraries
import pandas as pd
import requests
import numpy as np
import matplotlib.cm as cm
import matplotlib.colors as colors
import lxml.html as lh
import urllib.request

In [3]:
df = pd.read_html('https://en.wikipedia.org/wiki/List_of_neighbourhoods_in_Calgary')
print('Data inserted into dataframe')

Data inserted into dataframe


In [4]:
calgary_df = df[0]
#Exploring the dataset
calgary_df.head()

Unnamed: 0,Name[9],Quadrant,Sector[10],Ward[11],Type[10],2012 PopulationRank,Population(2012)[9],Population(2011)[9],% change,Dwellings(2012)[9],Area(km2)[10],Populationdensity
0,Abbeydale,NE/SE,Northeast,10,Residential,82,5917.0,5700.0,3.8,2023.0,1.7,3480.6
1,Acadia,SE,South,9,Residential,27,10705.0,10615.0,0.8,5053.0,3.9,2744.9
2,Albert Park/Radisson Heights,SE,East,10,Residential,75,6234.0,6217.0,0.3,2709.0,2.5,2493.6
3,Altadore,SW,Centre,11,Residential,39,9116.0,8907.0,2.3,4486.0,2.9,3143.4
4,Alyth/Bonnybrook,SE,Centre,9,Industrial,208,16.0,17.0,−5.9,14.0,3.8,4.2


The data is enriched in two steps. First, ", Calgary" is appended to the sector of each neighbourhood, so that the neighbourhood name, sector and city together form a more specific geocoding query and improve the chances of finding coordinates.
Second, the coordinates (latitude and longitude) of each neighbourhood are looked up with the geopy library.

In [5]:
calgary_df.drop(columns=["Quadrant","Ward[11]", "Type[10]","2012 PopulationRank","Population(2012)[9]","Population(2011)[9]","% change","Dwellings(2012)[9]","Area(km2)[10]","Populationdensity"], inplace=True)
calgary_df.columns = ['Neighborhood', 'Location']

In [6]:
calgary_df['Location'] = calgary_df['Location'].apply(lambda x: "{}{}".format(x, ', Calgary'))
calgary_df.head()

Unnamed: 0,Neighborhood,Location
0,Abbeydale,"Northeast, Calgary"
1,Acadia,"South, Calgary"
2,Albert Park/Radisson Heights,"East, Calgary"
3,Altadore,"Centre, Calgary"
4,Alyth/Bonnybrook,"Centre, Calgary"


In [7]:
calgary_df.shape

(258, 2)

In [8]:
!pip install geoPy

Collecting geoPy
  Downloading https://files.pythonhosted.org/packages/80/93/d384479da0ead712bdaf697a8399c13a9a89bd856ada5a27d462fb45e47b/geopy-1.20.0-py2.py3-none-any.whl (100kB)
Collecting geographiclib<2,>=1.49 (from geoPy)
  Downloading https://files.pythonhosted.org/packages/5b/ac/4f348828091490d77899bc74e92238e2b55c59392f21948f296e94e50e2b/geographiclib-1.49.tar.gz
Building wheels for collected packages: geographiclib
  Building wheel for geographiclib (setup.py) ... done
  Stored in directory: /home/jovyan/.cache/pip/wheels/99/45/d1/14954797e2a976083182c2e7da9b4e924509e59b6e5c661061
Successfully built geographiclib
Installing collected packages: geographiclib, geoPy
Successfully installed geoPy-1.20.0 geographiclib-1.49


## Convert addresses into Latitude and Longitude
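
Geocoding one query per neighbourhood means a few hundred requests, and the public Nominatim service asks clients to stay at or below one request per second. The sketch below separates the lookup loop from the network call so that it can be tried offline. `geocode_all` and `fake_geocode` are hypothetical names; in this notebook the `geocode` argument would be `geolocator.geocode`, which returns an object with `latitude` and `longitude` attributes, or `None` when nothing is found.

```python
import time
from collections import namedtuple

# Stand-in for the object returned by a geocoder
Point = namedtuple("Point", ["latitude", "longitude"])

def geocode_all(queries, geocode, delay_seconds=1.0):
    """Look up each distinct query once; misses are stored as (None, None)."""
    results = {}
    for query in queries:
        if query in results:
            continue  # reuse the earlier answer for a repeated query
        if results:
            time.sleep(delay_seconds)  # pause between consecutive requests
        location = geocode(query)
        if location is None:
            results[query] = (None, None)
        else:
            results[query] = (location.latitude, location.longitude)
    return results

def fake_geocode(query):
    """Offline replacement for geolocator.geocode, for illustration only."""
    known = {"Abbeydale, Northeast, Calgary": Point(51.058836, -113.929413)}
    return known.get(query)

coords = geocode_all(
    ["Abbeydale, Northeast, Calgary", "Nowhere, Calgary"],
    fake_geocode,
    delay_seconds=0,
)
print(coords)
```

geopy also provides `geopy.extra.rate_limiter.RateLimiter`, which wraps a geocoding function with a minimum delay between calls and could replace the manual `time.sleep`.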

In [9]:
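from geopy.geocoders import Nominatim

# Create the geocoder once instead of once per row
geolocator = Nominatim(user_agent='foursquare')
lat = []
lng = []

def getLatLng(row):
    # Build a query such as "Abbeydale, Northeast, Calgary"
    query = row['Neighborhood'] + ', ' + row['Location']
    print(query)
    location = geolocator.geocode(query)
    if location is not None:
        lat.append(location.latitude)
        lng.append(location.longitude)
    else:
        # Keep the lists aligned with the rows when the lookup fails
        lat.append(None)
        lng.append(None)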

In [10]:
calgary_df.apply(getLatLng, axis=1)

Abbeydale, Northeast, Calgary
Acadia, South, Calgary
Albert Park/Radisson Heights, East, Calgary
Altadore, Centre, Calgary
Alyth/Bonnybrook, Centre, Calgary
Applewood Park, East, Calgary
Arbour Lake, Northwest, Calgary
Aspen Woods, West, Calgary
Auburn Bay, Southeast, Calgary
Aurora Business Park, North, Calgary
Banff Trail, Centre, Calgary
Bankview, Centre, Calgary
Bayview, South, Calgary
Beddington Heights, North, Calgary
Bel-Aire, Centre, Calgary
Beltline, Centre, Calgary
Bonavista Downs, South, Calgary
Bowness, Northwest, Calgary
Braeside, South, Calgary
Brentwood, Northwest, Calgary
Bridgeland/Riverside, Centre, Calgary
Bridlewood, South, Calgary
Britannia, Centre, Calgary
Burns Industrial, Centre, Calgary
Calgary International Airport, Northeast, Calgary
Cambrian Heights, Centre, Calgary
Canada Olympic Park, West, Calgary
Canyon Meadows, South, Calgary
Capitol Hill, Centre, Calgary
Castleridge, Northeast, Calgary
Cedarbrae, South, Calgary
CFB Currie, West, Calgary
CFB Lincoln Par

0      None
1      None
2      None
3      None
4      None
5      None
6      None
7      None
8      None
9      None
10     None
11     None
12     None
13     None
14     None
15     None
16     None
17     None
18     None
19     None
20     None
21     None
22     None
23     None
24     None
25     None
26     None
27     None
28     None
29     None
       ... 
228    None
229    None
230    None
231    None
232    None
233    None
234    None
235    None
236    None
237    None
238    None
239    None
240    None
241    None
242    None
243    None
244    None
245    None
246    None
247    None
248    None
249    None
250    None
251    None
252    None
253    None
254    None
255    None
256    None
257    None
Length: 258, dtype: object

In [11]:
calgary_df['Latitude']=lat
calgary_df['Longitude']=lng

In [12]:
print(calgary_df['Latitude'].describe())
print(calgary_df['Longitude'].describe())
calgary_df.dropna(axis=0, inplace=True)

count    229.000000
mean      51.040531
std        0.070966
min       50.856893
25%       50.997947
50%       51.047031
75%       51.088182
max       51.178975
Name: Latitude, dtype: float64
count    229.000000
mean    -114.069943
std        0.077760
min     -114.265072
25%     -114.115487
50%     -114.073960
75%     -114.010572
max     -113.925905
Name: Longitude, dtype: float64


In [13]:
calgary_df.shape

(229, 4)

In [14]:
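# reset_index returns a new DataFrame for display; calgary_df itself keeps its original index labels
calgary_df.reset_index(drop=True)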

Unnamed: 0,Neighborhood,Location,Latitude,Longitude
0,Abbeydale,"Northeast, Calgary",51.058836,-113.929413
1,Acadia,"South, Calgary",50.968655,-114.055587
2,Albert Park/Radisson Heights,"East, Calgary",51.044845,-113.990195
3,Altadore,"Centre, Calgary",51.015104,-114.100756
4,Alyth/Bonnybrook,"Centre, Calgary",51.016669,-114.024294
5,Applewood Park,"East, Calgary",51.044658,-113.928931
6,Arbour Lake,"Northwest, Calgary",51.136786,-114.202355
7,Aspen Woods,"West, Calgary",51.043119,-114.210185
8,Auburn Bay,"Southeast, Calgary",50.890605,-113.959565
9,Aurora Business Park,"North, Calgary",51.140549,-114.062707


In [15]:
!pip install folium
import folium

Collecting folium
  Downloading https://files.pythonhosted.org/packages/72/ff/004bfe344150a064e558cb2aedeaa02ecbf75e60e148a55a9198f0c41765/folium-0.10.0-py2.py3-none-any.whl (91kB)
Installing collected packages: folium
Successfully installed folium-0.10.0


In [16]:
address = 'Calgary, Alberta'

geolocator = Nominatim(user_agent="foursquare_agent")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print(latitude, longitude)

51.02532675 -114.049868485806


In [46]:
clat, clog = 51.0253,-114.0498
calgary_map = folium.Map(location=[clat, clog], zoom_start=5)
calgary_map

## Visualize Calgary's neighborhood using folium

In [18]:
CLIENT_ID = 'XNINTZ000YTOTSL4NXGS4AUEFKRBR0BBRHO2NSGW3SR4GLOO' # your Foursquare ID
CLIENT_SECRET = '3HFK3SY5FNWWRLVYG25NYVH4V2W4D4QIXW5UCCQP4XEMHSWI' # your Foursquare Secret
VERSION = '20180605'
LIMIT = 100
print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: XNINTZ000YTOTSL4NXGS4AUEFKRBR0BBRHO2NSGW3SR4GLOO
CLIENT_SECRET:3HFK3SY5FNWWRLVYG25NYVH4V2W4D4QIXW5UCCQP4XEMHSWI
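
Hard-coding and printing API credentials makes them visible to anyone who reads the notebook. A minimal sketch of loading them from environment variables instead; the variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and the helper `load_credentials` are assumptions for illustration, not something the original notebook or Foursquare defines:

```python
import os

def load_credentials(env=os.environ):
    """Return (client_id, client_secret) from a mapping, or raise if one is missing."""
    try:
        return env["FOURSQUARE_CLIENT_ID"], env["FOURSQUARE_CLIENT_SECRET"]
    except KeyError as missing:
        raise RuntimeError(
            "Set {} before running the notebook".format(missing.args[0])
        ) from None

# Example with an explicit mapping instead of the real environment
demo_env = {
    "FOURSQUARE_CLIENT_ID": "demo-id",
    "FOURSQUARE_CLIENT_SECRET": "demo-secret",
}
client_id, client_secret = load_credentials(demo_env)
print("CLIENT_ID loaded:", bool(client_id))
```

Printing only whether a value was loaded, not the value itself, keeps the secret out of the saved cell output.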


In [47]:
# Use separate loop variables so the map-centre coordinates (clat, clog) are not overwritten
for label, lat, lng in zip(calgary_df['Neighborhood'], calgary_df['Latitude'], calgary_df['Longitude']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        location=[lat, lng],
        radius=10,
        popup=label,
        color='blue',
        fill=True,
        fill_color='blue',
        fill_opacity=0.6
    ).add_to(calgary_map)
    
calgary_map

In [20]:
def getNearbyRestaurants(names, latitudes, longitudes, radius=3000):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&section=food&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()['response']['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return nearby_venues
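
The request URL in the cell above is assembled with `str.format`. As a hedged alternative, the same query string can be built from a dictionary with `urllib.parse.urlencode`, which takes care of escaping. `build_explore_url` is a hypothetical helper, and the credentials shown are placeholders:

```python
from urllib.parse import urlencode

def build_explore_url(client_id, client_secret, version, lat, lng, radius, limit):
    """Build the venues/explore URL used above from named parameters."""
    params = {
        "section": "food",
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": "{},{}".format(lat, lng),
        "radius": radius,
        "limit": limit,
    }
    return "https://api.foursquare.com/v2/venues/explore?" + urlencode(params)

# Placeholder credentials and the Abbeydale coordinates from the table above
url = build_explore_url("ID", "SECRET", "20180605", 51.058836, -113.929413, 3000, 100)
print(url)
```

When using `requests`, passing the same dictionary as `requests.get(base_url, params=params)` has the same effect, and calling `raise_for_status()` on the response before indexing into the JSON makes failed requests easier to diagnose.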

In [21]:
calgary_business = getNearbyRestaurants(names=calgary_df['Neighborhood'],
                                   latitudes=calgary_df['Latitude'],
                                   longitudes=calgary_df['Longitude']
                                  )

Abbeydale
Acadia
Albert Park/Radisson Heights
Altadore
Alyth/Bonnybrook
Applewood Park
Arbour Lake
Aspen Woods
Auburn Bay
Aurora Business Park
Banff Trail
Bankview
Bayview
Beddington Heights
Bel-Aire
Beltline
Bonavista Downs
Bowness
Braeside
Brentwood
Bridgeland/Riverside
Bridlewood
Britannia
Burns Industrial
Calgary International Airport
Cambrian Heights
Canada Olympic Park
Canyon Meadows
Capitol Hill
Castleridge
Cedarbrae
Chaparral
Charleswood
Chinatown
Chinook Park
Christie Park
Citadel
Cliff Bungalow
Coach Hill
Collingwood
Copperfield
Coral Springs
Cougar Ridge
Country Hills
Country Hills Village
Coventry Hills
Cranston
Crescent Heights
Crestmont
Dalhousie
Deer Ridge
Deer Run
Diamond Cove
Discovery Ridge
Dover
Downtown Commercial Core
Downtown East Village
Downtown West End
Eagle Ridge
East Fairview Industrial
East Shepard Industrial
Eastfield
Eau Claire
Edgemont
Elbow Park
Elboya
Erin Woods
Erlton
Evanston
Evergreen
Fairview
Fairview Industrial
Falconridge
Foothills
Forest Heights

In [22]:
print(calgary_business.shape)
len(calgary_business['Venue Latitude'].unique())

(13153, 7)


1447
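
The gap between 13,153 rows and 1,447 distinct venue latitudes is plausibly explained by the 3,000 m search radius: searches around nearby neighbourhoods cover much of the same ground and return the same venues. A small sketch that checks this for two neighbourhoods from the coordinate table above, using the haversine formula (`haversine_m` is a hypothetical helper and the Earth radius is the usual mean value):

```python
import math

def haversine_m(lat1, lng1, lat2, lng2):
    """Great-circle distance in metres between two (latitude, longitude) points."""
    earth_radius_m = 6371000.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lng2 - lng1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * earth_radius_m * math.asin(math.sqrt(a))

abbeydale = (51.058836, -113.929413)
applewood_park = (51.044658, -113.928931)

distance = haversine_m(*abbeydale, *applewood_park)
# Two circles of radius 3000 m overlap when their centres are less than 6000 m apart
overlaps = distance < 2 * 3000
print(round(distance), overlaps)
```

This is why the next cells keep only one row per venue location before counting restaurants.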

In [23]:
calgary_restaurants_unique = calgary_business.drop_duplicates(subset=['Venue Latitude', 'Venue Longitude'], keep='first')

In [24]:
calgary_restaurants_unique = calgary_restaurants_unique[calgary_restaurants_unique['Venue Category'].str.contains('Restaurant')]

In [25]:
#pd.set_option('display.max_rows', None)
calgary_restaurants_unique.head()

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
1,Abbeydale,51.058836,-113.929413,A&W Canada,51.068291,-113.933571,Fast Food Restaurant
3,Abbeydale,51.058836,-113.929413,Song Huong Vietnamese Restaurant,51.038606,-113.942208,Vietnamese Restaurant
6,Abbeydale,51.058836,-113.929413,McDonald's,51.075787,-113.958094,Fast Food Restaurant
9,Abbeydale,51.058836,-113.929413,KFC,51.064316,-113.957155,Fast Food Restaurant
10,Abbeydale,51.058836,-113.929413,Barrio Fiesta,51.052695,-113.935544,Filipino Restaurant


## Visualize all the venues in Calgary

In [26]:
clat, clng = 51.0253,-114.0498
calg_rest_map = folium.Map([clat, clng], zoom_start=10)
for label, lat, lng in zip(calgary_restaurants_unique['Venue'], calgary_restaurants_unique['Venue Latitude'], 
                           calgary_restaurants_unique['Venue Longitude']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        location=[lat, lng],
        radius=5,
        popup=label,
        color='red',
        fill=True,
        fill_color='green',
        fill_opacity=0.6
    ).add_to(calg_rest_map)
from IPython.display import display
display(calg_rest_map)

In [27]:
calgary_restaurants_unique.groupby('Neighborhood').count()

Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Abbeydale,15,15,15,15,15,15
Acadia,65,65,65,65,65,65
Albert Park/Radisson Heights,55,55,55,55,55,55
Altadore,59,59,59,59,59,59
Alyth/Bonnybrook,20,20,20,20,20,20
Arbour Lake,32,32,32,32,32,32
Aspen Woods,18,18,18,18,18,18
Auburn Bay,10,10,10,10,10,10
Aurora Business Park,31,31,31,31,31,31
Banff Trail,20,20,20,20,20,20


In [28]:
calgary_onehot = pd.get_dummies(calgary_restaurants_unique['Venue Category'])
calgary_onehot.insert(loc=0, column='Neighborhood', value=calgary_restaurants_unique['Neighborhood'])

In [41]:
calgary_onehot.head()

Unnamed: 0,Neighborhood,African Restaurant,American Restaurant,Argentinian Restaurant,Asian Restaurant,Belgian Restaurant,Brazilian Restaurant,Cajun / Creole Restaurant,Chinese Restaurant,Dim Sum Restaurant,...,Shabu-Shabu Restaurant,Southern / Soul Food Restaurant,Sushi Restaurant,Swiss Restaurant,Tapas Restaurant,Tex-Mex Restaurant,Thai Restaurant,Turkish Restaurant,Vegetarian / Vegan Restaurant,Vietnamese Restaurant
1,Abbeydale,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,Abbeydale,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,1
6,Abbeydale,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
9,Abbeydale,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
10,Abbeydale,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [30]:
calgary_grouped = calgary_onehot.groupby('Neighborhood').mean().reset_index()
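
Taking the mean of one-hot columns per neighbourhood gives, for each category, the share of that neighbourhood's restaurants that fall into it. The sketch below reproduces the idea without pandas, using the five Abbeydale rows shown in the restaurant preview above; it illustrates the calculation and is not the notebook's own code.

```python
from collections import Counter, defaultdict

# (neighbourhood, venue category) pairs taken from the preview table
rows = [
    ("Abbeydale", "Fast Food Restaurant"),
    ("Abbeydale", "Vietnamese Restaurant"),
    ("Abbeydale", "Fast Food Restaurant"),
    ("Abbeydale", "Fast Food Restaurant"),
    ("Abbeydale", "Filipino Restaurant"),
]

counts = defaultdict(Counter)
for neighborhood, category in rows:
    counts[neighborhood][category] += 1

# Share of each category within a neighbourhood, equal to the mean of its one-hot column
shares = {
    neighborhood: {cat: n / sum(counter.values()) for cat, n in counter.items()}
    for neighborhood, counter in counts.items()
}
print(shares["Abbeydale"])
```

Because the shares in each row sum to 1, neighbourhoods with few restaurants and neighbourhoods with many are compared on the same scale during clustering.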

In [31]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [32]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        # positions past the 3rd all take the 'th' suffix
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = calgary_grouped['Neighborhood']

for ind in np.arange(calgary_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(calgary_grouped.iloc[ind, :], num_top_venues)
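
The column labels above rely on a three-item suffix list plus an exception for everything else. A possible alternative, sketched here with a hypothetical `ordinal` helper, computes the suffix directly and also handles numbers such as 11, 12 and 13:

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st', 12 -> '12th'."""
    if 10 <= n % 100 <= 20:
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return "{}{}".format(n, suffix)

columns = ["Neighborhood"] + [
    "{} Most Common Venue".format(ordinal(i)) for i in range(1, 11)
]
print(columns)
```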

## Top 10 Venues in each neighborhood

In [33]:
neighborhoods_venues_sorted.head(10)

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Abbeydale,Fast Food Restaurant,Vietnamese Restaurant,Italian Restaurant,Asian Restaurant,Chinese Restaurant,Restaurant,Filipino Restaurant,Indonesian Restaurant,Indian Restaurant,Hotpot Restaurant
1,Acadia,Fast Food Restaurant,Vietnamese Restaurant,Sushi Restaurant,American Restaurant,Restaurant,Italian Restaurant,Mexican Restaurant,Asian Restaurant,Chinese Restaurant,Greek Restaurant
2,Albert Park/Radisson Heights,Vietnamese Restaurant,Fast Food Restaurant,Restaurant,Asian Restaurant,Indian Restaurant,Italian Restaurant,American Restaurant,Chinese Restaurant,Falafel Restaurant,Korean Restaurant
3,Altadore,Vietnamese Restaurant,Restaurant,Mexican Restaurant,Fast Food Restaurant,French Restaurant,Greek Restaurant,American Restaurant,Indian Restaurant,Italian Restaurant,Japanese Restaurant
4,Alyth/Bonnybrook,Restaurant,Fast Food Restaurant,Vietnamese Restaurant,American Restaurant,Chinese Restaurant,Eastern European Restaurant,Mediterranean Restaurant,French Restaurant,Japanese Restaurant,Empanada Restaurant
5,Arbour Lake,Fast Food Restaurant,Vietnamese Restaurant,Chinese Restaurant,Japanese Restaurant,Greek Restaurant,Mexican Restaurant,Sushi Restaurant,Restaurant,Mediterranean Restaurant,Middle Eastern Restaurant
6,Aspen Woods,Restaurant,Vietnamese Restaurant,American Restaurant,Asian Restaurant,Sushi Restaurant,Japanese Restaurant,Indian Restaurant,Mexican Restaurant,Middle Eastern Restaurant,Fast Food Restaurant
7,Auburn Bay,Sushi Restaurant,Asian Restaurant,Portuguese Restaurant,Restaurant,Seafood Restaurant,Japanese Restaurant,American Restaurant,Brazilian Restaurant,Fast Food Restaurant,Indonesian Restaurant
8,Aurora Business Park,Fast Food Restaurant,Vietnamese Restaurant,Chinese Restaurant,Italian Restaurant,American Restaurant,Restaurant,Japanese Restaurant,Indian Restaurant,Hong Kong Restaurant,Mediterranean Restaurant
9,Banff Trail,Fast Food Restaurant,American Restaurant,Japanese Restaurant,Vietnamese Restaurant,Restaurant,Asian Restaurant,Chinese Restaurant,Greek Restaurant,Indian Restaurant,Mediterranean Restaurant


## Clustering Neighborhoods

Before clustering the neighborhoods and examining the clusters, let's determine the optimal value of K for our dataset using the Silhouette Coefficient method.
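
For a point i, let a(i) be the mean distance to the other points in its own cluster and b(i) the smallest mean distance to the points of any other cluster. The silhouette of i is (b(i) - a(i)) / max(a(i), b(i)), and the Silhouette Coefficient is the mean over all points. The toy implementation below works on one-dimensional points only and is meant to make the definition concrete; the notebook itself uses `sklearn.metrics.silhouette_score`.

```python
def silhouette_1d(points, labels):
    """Mean silhouette for 1-D points, using absolute difference as the distance."""
    scores = []
    for i, (p, own) in enumerate(zip(points, labels)):
        same = [abs(p - q) for j, (q, lab) in enumerate(zip(points, labels))
                if lab == own and j != i]
        if not same:
            scores.append(0.0)  # a singleton cluster scores 0 by convention
            continue
        a = sum(same) / len(same)
        # mean distance to each other cluster; keep the nearest one
        b = min(
            sum(abs(p - q) for q, lab in zip(points, labels) if lab == other)
            / labels.count(other)
            for other in set(labels) if other != own
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)

points = [0.0, 1.0, 10.0, 11.0]
labels = [0, 0, 1, 1]
score = silhouette_1d(points, labels)
print(round(score, 4))
```

Values close to 1 indicate compact, well-separated clusters; values near 0 indicate overlapping clusters.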



In [34]:
calgary_grouped_clustering = calgary_grouped.drop(columns='Neighborhood')

In [35]:
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

In [36]:
for n_cluster in range(2, 10):
    kmeans = KMeans(n_clusters=n_cluster).fit(calgary_grouped_clustering)
    label = kmeans.labels_
    sil_coeff = silhouette_score(calgary_grouped_clustering, label, metric='euclidean')
    print("For n_clusters={}, The Silhouette Coefficient is {}".format(n_cluster, sil_coeff))

For n_clusters=2, The Silhouette Coefficient is 0.21443545696428873
For n_clusters=3, The Silhouette Coefficient is 0.23993015863774705
For n_clusters=4, The Silhouette Coefficient is 0.21409888656087017
For n_clusters=5, The Silhouette Coefficient is 0.27537013718470194
For n_clusters=6, The Silhouette Coefficient is 0.28362446784890355
For n_clusters=7, The Silhouette Coefficient is 0.18830570359951626
For n_clusters=8, The Silhouette Coefficient is 0.18215019844437266
For n_clusters=9, The Silhouette Coefficient is 0.20450874859279705
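
The best-scoring k can be read off the printed values in code instead of by eye. The dictionary below copies the scores from the output above, rounded to four decimals. The loop above calls `KMeans` without `random_state`, so a rerun may print different numbers.

```python
# Silhouette scores copied from the printed output, rounded to four decimals
scores = {
    2: 0.2144,
    3: 0.2399,
    4: 0.2141,
    5: 0.2754,
    6: 0.2836,
    7: 0.1883,
    8: 0.1822,
    9: 0.2045,
}

# Pick the number of clusters with the highest score
best_k = max(scores, key=scores.get)
print(best_k, scores[best_k])
```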


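In the run shown above, n_clusters=6 has the highest Silhouette Coefficient (about 0.284). The loop does not set `random_state`, so the scores change from run to run. The rest of the analysis uses 7 clusters, for which a Silhouette Coefficient of 0.30171207432377817 was recorded; that value does not appear in the output above and presumably comes from a different run.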

Run k-means to cluster the neighborhood into 7 clusters.

In [37]:
%matplotlib inline
# set number of clusters
kclusters = 7

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(calgary_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10]

array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1], dtype=int32)

In [38]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

calgary_merged = calgary_df

# join the top-venue table onto calgary_df so each neighborhood keeps its latitude/longitude
calgary_merged = calgary_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')
calgary_merged.head() # check the last columns!

Unnamed: 0,Neighborhood,Location,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Abbeydale,"Northeast, Calgary",51.058836,-113.929413,1.0,Fast Food Restaurant,Vietnamese Restaurant,Italian Restaurant,Asian Restaurant,Chinese Restaurant,Restaurant,Filipino Restaurant,Indonesian Restaurant,Indian Restaurant,Hotpot Restaurant
1,Acadia,"South, Calgary",50.968655,-114.055587,1.0,Fast Food Restaurant,Vietnamese Restaurant,Sushi Restaurant,American Restaurant,Restaurant,Italian Restaurant,Mexican Restaurant,Asian Restaurant,Chinese Restaurant,Greek Restaurant
2,Albert Park/Radisson Heights,"East, Calgary",51.044845,-113.990195,1.0,Vietnamese Restaurant,Fast Food Restaurant,Restaurant,Asian Restaurant,Indian Restaurant,Italian Restaurant,American Restaurant,Chinese Restaurant,Falafel Restaurant,Korean Restaurant
3,Altadore,"Centre, Calgary",51.015104,-114.100756,1.0,Vietnamese Restaurant,Restaurant,Mexican Restaurant,Fast Food Restaurant,French Restaurant,Greek Restaurant,American Restaurant,Indian Restaurant,Italian Restaurant,Japanese Restaurant
4,Alyth/Bonnybrook,"Centre, Calgary",51.016669,-114.024294,1.0,Restaurant,Fast Food Restaurant,Vietnamese Restaurant,American Restaurant,Chinese Restaurant,Eastern European Restaurant,Mediterranean Restaurant,French Restaurant,Japanese Restaurant,Empanada Restaurant


In [39]:
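# Neighborhoods with no restaurant rows have no label (NaN) after the join; filling with 0 groups them with cluster 0
calgary_merged['Cluster Labels'] = calgary_merged['Cluster Labels'].fillna(0)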
calgary_merged['Cluster Labels'] =   calgary_merged['Cluster Labels'].astype(int)
calgary_merged.head()

Unnamed: 0,Neighborhood,Location,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Abbeydale,"Northeast, Calgary",51.058836,-113.929413,1,Fast Food Restaurant,Vietnamese Restaurant,Italian Restaurant,Asian Restaurant,Chinese Restaurant,Restaurant,Filipino Restaurant,Indonesian Restaurant,Indian Restaurant,Hotpot Restaurant
1,Acadia,"South, Calgary",50.968655,-114.055587,1,Fast Food Restaurant,Vietnamese Restaurant,Sushi Restaurant,American Restaurant,Restaurant,Italian Restaurant,Mexican Restaurant,Asian Restaurant,Chinese Restaurant,Greek Restaurant
2,Albert Park/Radisson Heights,"East, Calgary",51.044845,-113.990195,1,Vietnamese Restaurant,Fast Food Restaurant,Restaurant,Asian Restaurant,Indian Restaurant,Italian Restaurant,American Restaurant,Chinese Restaurant,Falafel Restaurant,Korean Restaurant
3,Altadore,"Centre, Calgary",51.015104,-114.100756,1,Vietnamese Restaurant,Restaurant,Mexican Restaurant,Fast Food Restaurant,French Restaurant,Greek Restaurant,American Restaurant,Indian Restaurant,Italian Restaurant,Japanese Restaurant
4,Alyth/Bonnybrook,"Centre, Calgary",51.016669,-114.024294,1,Restaurant,Fast Food Restaurant,Vietnamese Restaurant,American Restaurant,Chinese Restaurant,Eastern European Restaurant,Mediterranean Restaurant,French Restaurant,Japanese Restaurant,Empanada Restaurant


In [48]:
# create map
map_clusters = folium.Map(location=[clat, clng], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]
# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(calgary_merged['Latitude'], calgary_merged['Longitude'], calgary_merged['Neighborhood'], calgary_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=10,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters
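
The colour scheme above comes from matplotlib's `rainbow` colormap. As a sketch of an alternative that needs only the standard library, evenly spaced hues can be converted to hex strings with `colorsys`; `cluster_colors` is a hypothetical helper, and the saturation and brightness values are arbitrary choices.

```python
import colorsys

def cluster_colors(k):
    """Return k hex colours with evenly spaced hues."""
    palette = []
    for i in range(k):
        r, g, b = colorsys.hsv_to_rgb(i / k, 0.8, 0.9)
        palette.append("#{:02x}{:02x}{:02x}".format(
            round(r * 255), round(g * 255), round(b * 255)
        ))
    return palette

palette = cluster_colors(7)
print(palette)
```

Since the cluster labels run from 0 to k-1, such a palette can be indexed directly with `palette[cluster]`.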

## Examining the Clusters
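
The cells below select columns by position with `[0] + list(range(8, ...))`. Given the column order of `calgary_merged` shown earlier, position 8 is the 4th most common venue, which is why the first three venue columns do not appear in the outputs. The sketch below checks the positions on a plain list of the column names:

```python
venue_ranks = ["1st", "2nd", "3rd", "4th", "5th", "6th", "7th", "8th", "9th", "10th"]
columns = ["Neighborhood", "Location", "Latitude", "Longitude", "Cluster Labels"] + [
    "{} Most Common Venue".format(rank) for rank in venue_ranks
]

# Same positional selection as in the cells below
selected = [columns[0]] + columns[8:]
# Position at which the venue columns actually start
first_venue_position = columns.index("1st Most Common Venue")
print(selected[1], first_venue_position)
```

Starting the range at `first_venue_position` instead of 8 would include the three most common venues as well.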

In [49]:
calgary_merged.loc[calgary_merged['Cluster Labels'] == 0, calgary_merged.columns[[0] + list(range(8, calgary_merged.shape[1]))]].dropna()

Unnamed: 0,Neighborhood,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
35,Chinatown,Falafel Restaurant,Italian Restaurant,Indonesian Restaurant,Indian Restaurant,Hotpot Restaurant,Hong Kong Restaurant,Greek Restaurant
70,Elboya,Italian Restaurant,Indonesian Restaurant,Indian Restaurant,Hotpot Restaurant,Hong Kong Restaurant,Greek Restaurant,Gluten-free Restaurant
74,Evergreen,Italian Restaurant,Indonesian Restaurant,Indian Restaurant,Hotpot Restaurant,Hong Kong Restaurant,Greek Restaurant,Gluten-free Restaurant
85,Glenbrook,Falafel Restaurant,Italian Restaurant,Indonesian Restaurant,Indian Restaurant,Hotpot Restaurant,Hong Kong Restaurant,Greek Restaurant
92,Greenview Industrial Park,Falafel Restaurant,Indonesian Restaurant,Indian Restaurant,Hotpot Restaurant,Hong Kong Restaurant,Greek Restaurant,Gluten-free Restaurant
126,Mayland,Italian Restaurant,Indonesian Restaurant,Indian Restaurant,Hotpot Restaurant,Hong Kong Restaurant,Greek Restaurant,Gluten-free Restaurant
137,Montgomery,Falafel Restaurant,Indonesian Restaurant,Indian Restaurant,Hotpot Restaurant,Hong Kong Restaurant,Greek Restaurant,Gluten-free Restaurant
198,Scarboro/Sunalta West,Ethiopian Restaurant,Indonesian Restaurant,Indian Restaurant,Hotpot Restaurant,Hong Kong Restaurant,Greek Restaurant,Gluten-free Restaurant
202,Shaganappi,Vietnamese Restaurant,Falafel Restaurant,Indonesian Restaurant,Indian Restaurant,Hotpot Restaurant,Hong Kong Restaurant,Greek Restaurant


In [50]:
calgary_merged.loc[calgary_merged['Cluster Labels'] == 1, calgary_merged.columns[[0] + list(range(8, calgary_merged.shape[1]))]]

Unnamed: 0,Neighborhood,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Abbeydale,Asian Restaurant,Chinese Restaurant,Restaurant,Filipino Restaurant,Indonesian Restaurant,Indian Restaurant,Hotpot Restaurant
1,Acadia,American Restaurant,Restaurant,Italian Restaurant,Mexican Restaurant,Asian Restaurant,Chinese Restaurant,Greek Restaurant
2,Albert Park/Radisson Heights,Asian Restaurant,Indian Restaurant,Italian Restaurant,American Restaurant,Chinese Restaurant,Falafel Restaurant,Korean Restaurant
3,Altadore,Fast Food Restaurant,French Restaurant,Greek Restaurant,American Restaurant,Indian Restaurant,Italian Restaurant,Japanese Restaurant
4,Alyth/Bonnybrook,American Restaurant,Chinese Restaurant,Eastern European Restaurant,Mediterranean Restaurant,French Restaurant,Japanese Restaurant,Empanada Restaurant
6,Arbour Lake,Japanese Restaurant,Greek Restaurant,Mexican Restaurant,Sushi Restaurant,Restaurant,Mediterranean Restaurant,Middle Eastern Restaurant
7,Aspen Woods,Asian Restaurant,Sushi Restaurant,Japanese Restaurant,Indian Restaurant,Mexican Restaurant,Middle Eastern Restaurant,Fast Food Restaurant
8,Auburn Bay,Restaurant,Seafood Restaurant,Japanese Restaurant,American Restaurant,Brazilian Restaurant,Fast Food Restaurant,Indonesian Restaurant
9,Aurora Business Park,Italian Restaurant,American Restaurant,Restaurant,Japanese Restaurant,Indian Restaurant,Hong Kong Restaurant,Mediterranean Restaurant
10,Banff Trail,Vietnamese Restaurant,Restaurant,Asian Restaurant,Chinese Restaurant,Greek Restaurant,Indian Restaurant,Mediterranean Restaurant


In [51]:
calgary_merged.loc[calgary_merged['Cluster Labels'] == 2, calgary_merged.columns[[0] + list(range(8, calgary_merged.shape[1]))]]

Unnamed: 0,Neighborhood,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
26,Canada Olympic Park,Italian Restaurant,Indonesian Restaurant,Indian Restaurant,Hotpot Restaurant,Hong Kong Restaurant,Gluten-free Restaurant,French Restaurant
57,Discovery Ridge,Italian Restaurant,Indonesian Restaurant,Indian Restaurant,Hotpot Restaurant,Hong Kong Restaurant,Gluten-free Restaurant,French Restaurant
68,Edgemont,Indonesian Restaurant,Indian Restaurant,Hotpot Restaurant,Hong Kong Restaurant,Greek Restaurant,Gluten-free Restaurant,French Restaurant
186,Rosemont,Italian Restaurant,Indonesian Restaurant,Indian Restaurant,Hotpot Restaurant,Hong Kong Restaurant,Gluten-free Restaurant,French Restaurant


In [52]:
calgary_merged.loc[calgary_merged['Cluster Labels'] == 3, calgary_merged.columns[[0] + list(range(8, calgary_merged.shape[1]))]]

Unnamed: 0,Neighborhood,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
42,Collingwood,Italian Restaurant,Indonesian Restaurant,Indian Restaurant,Hotpot Restaurant,Hong Kong Restaurant,Greek Restaurant,Gluten-free Restaurant
52,Dalhousie,Fast Food Restaurant,Italian Restaurant,Indonesian Restaurant,Indian Restaurant,Hotpot Restaurant,Hong Kong Restaurant,Greek Restaurant
112,Lake Bonavista,Italian Restaurant,Indonesian Restaurant,Indian Restaurant,Hotpot Restaurant,Hong Kong Restaurant,Greek Restaurant,Gluten-free Restaurant
128,McCall,Fast Food Restaurant,Italian Restaurant,Indonesian Restaurant,Indian Restaurant,Hotpot Restaurant,Hong Kong Restaurant,Greek Restaurant
254,Winston Heights/Mountview,Italian Restaurant,Indonesian Restaurant,Indian Restaurant,Hotpot Restaurant,Hong Kong Restaurant,Greek Restaurant,Gluten-free Restaurant


In [53]:
calgary_merged.loc[calgary_merged['Cluster Labels'] == 4, calgary_merged.columns[[0] + list(range(8, calgary_merged.shape[1]))]]

Unnamed: 0,Neighborhood,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
43,Copperfield,Vietnamese Restaurant,Kebab Restaurant,Indonesian Restaurant,Hotpot Restaurant,Hong Kong Restaurant,Greek Restaurant,Gluten-free Restaurant
51,Crestmont,Falafel Restaurant,Indonesian Restaurant,Indian Restaurant,Hotpot Restaurant,Hong Kong Restaurant,Greek Restaurant,Gluten-free Restaurant
67,Eau Claire,Indonesian Restaurant,Indian Restaurant,Hotpot Restaurant,Hong Kong Restaurant,Greek Restaurant,Gluten-free Restaurant,French Restaurant
100,Highland Park,Falafel Restaurant,Indonesian Restaurant,Indian Restaurant,Hotpot Restaurant,Hong Kong Restaurant,Greek Restaurant,Gluten-free Restaurant
158,Point Mckay,Indonesian Restaurant,Indian Restaurant,Hotpot Restaurant,Hong Kong Restaurant,Greek Restaurant,Gluten-free Restaurant,French Restaurant


In [54]:
calgary_merged.loc[calgary_merged['Cluster Labels'] == 5, calgary_merged.columns[[0] + list(range(8, calgary_merged.shape[1]))]]

Unnamed: 0,Neighborhood,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
44,Coral Springs,Italian Restaurant,Indonesian Restaurant,Indian Restaurant,Hotpot Restaurant,Hong Kong Restaurant,Greek Restaurant,Gluten-free Restaurant
93,Greenwood/Greenbriar,Italian Restaurant,Indonesian Restaurant,Indian Restaurant,Hotpot Restaurant,Hong Kong Restaurant,Greek Restaurant,Gluten-free Restaurant
184,Rocky Ridge,Italian Restaurant,Indonesian Restaurant,Indian Restaurant,Hotpot Restaurant,Hong Kong Restaurant,Greek Restaurant,Gluten-free Restaurant


In [55]:
calgary_merged.loc[calgary_merged['Cluster Labels'] == 6, calgary_merged.columns[[0] + list(range(8, calgary_merged.shape[1]))]]

Unnamed: 0,Neighborhood,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
65,East Shepard Industrial,Ethiopian Restaurant,Indonesian Restaurant,Indian Restaurant,Hotpot Restaurant,Hong Kong Restaurant,Greek Restaurant,Gluten-free Restaurant
79,Foothills,Falafel Restaurant,Italian Restaurant,Indonesian Restaurant,Hotpot Restaurant,Hong Kong Restaurant,Greek Restaurant,Gluten-free Restaurant
151,Parkdale,Italian Restaurant,Indonesian Restaurant,Indian Restaurant,Hotpot Restaurant,Hong Kong Restaurant,Greek Restaurant,Gluten-free Restaurant
241,Valley Ridge,Italian Restaurant,Indonesian Restaurant,Indian Restaurant,Hotpot Restaurant,Hong Kong Restaurant,Greek Restaurant,Gluten-free Restaurant
