# Capstone Project : <i>Battle of the Neighbourhoods</i>
## Analysis of Demographics and Neighbourhoods of Calgary, Alberta

<div class = "alert alert-block alert-info">
    <b>Note: If any of the libraries are missing, use the commands below to install them</b>
</div>
   

In [None]:
#!pip install geopy
#!pip install folium
#!pip install scikit-learn
#!pip install matplotlib

<div class = "alert alert-block alert-info">
    <b>Importing all the required libraries first</b>
</div>
   

In [1]:
import pandas as pd
# library for data analysis

import numpy as np
# library to handle data in a vectorized manner

import requests
# library to handle requests

from pandas import json_normalize
# transform JSON data into a pandas dataframe (top-level import, available from pandas 1.0)

from sklearn.cluster import KMeans
# k-means implementation used in the clustering stage

import folium  # map rendering library
from folium import plugins
from folium.plugins import HeatMap

from matplotlib import cm
from matplotlib import colors

import geopy
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values
print("All required libraries imported!")

All required libraries imported!


<div class = "alert alert-block alert-info">
    <b>Below are the commands to change display for large tables in notebook</b>
</div>
   

In [2]:
pd.set_option('display.max_rows', None)
pd.set_option('display.max_columns', None)
pd.set_option('display.width', None)
pd.set_option('display.max_colwidth', None)
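
Setting these options to `None` affects every later cell, so long tables are always printed in full. As a hedged alternative, `pd.option_context` applies a display setting only inside a `with` block and restores the previous value afterwards. The `demo` frame below is made up purely for illustration.

```python
import pandas as pd

# Hypothetical demo frame, only here to have something long to display
demo = pd.DataFrame({"value": range(100)})

before = pd.get_option("display.max_rows")

# Inside the block every row is rendered; outside, the earlier setting returns
with pd.option_context("display.max_rows", None):
    inside = pd.get_option("display.max_rows")
    full_text = repr(demo)

after = pd.get_option("display.max_rows")
print(inside, after == before)
```

This keeps the notebook-wide defaults untouched while still allowing a full view of a single table when needed.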

<div class = "alert alert-block alert-info">
    <b>Scraping data from the Wikipedia page listing the neighbourhoods of Calgary, Alberta</b> : 
       <a href="https://en.wikipedia.org/wiki/List_of_neighbourhoods_in_Calgary">https://en.wikipedia.org/wiki/List_of_neighbourhoods_in_Calgary</a><br>
       We will use pandas to scrape the table from the webpage
</div>
   

In [3]:
webpage_data = pd.read_html('https://en.wikipedia.org/wiki/List_of_neighbourhoods_in_Calgary')

In [4]:
# This is the data that we have obtained from scraping the webpage
webpage_data

[                          Name[10]               Quadrant  \
 0                        Abbeydale                  NE/SE   
 1                           Acadia                     SE   
 2     Albert Park/Radisson Heights                     SE   
 3                         Altadore                     SW   
 4                 Alyth/Bonnybrook                     SE   
 5                   Applewood Park                  SE/NE   
 6                      Arbour Lake                     NW   
 7                      Aspen Woods                     SW   
 8                       Auburn Bay                     SE   
 9             Aurora Business Park                     NE   
 10                     Banff Trail                     NW   
 11                        Bankview                     SW   
 12                         Bayview                     SW   
 13              Beddington Heights                  NW/NE   
 14                        Bel-Aire                     SW   
 15     

In [5]:
# Reading our desired table into dataframe called df . This dataframe shall be used throughout the project.
df = webpage_data[0]

In [6]:
# Using .head() function let's see how our dataframe looks
df.head()

Unnamed: 0,Name[10],Quadrant,Sector[11],Ward[12],Type[11],2012 PopulationRank,Population(2012)[10],Population(2011)[10],% change,Dwellings(2012)[10],Area(km2)[11],Populationdensity
0,Abbeydale,NE/SE,Northeast,10,Residential,82,5917.0,5700.0,3.8,2023.0,1.7,3480.6
1,Acadia,SE,South,9,Residential,27,10705.0,10615.0,0.8,5053.0,3.9,2744.9
2,Albert Park/Radisson Heights,SE,East,10,Residential,75,6234.0,6217.0,0.3,2709.0,2.5,2493.6
3,Altadore,SW,Centre,11,Residential,39,9116.0,8907.0,2.3,4486.0,2.9,3143.4
4,Alyth/Bonnybrook,SE,Centre,9,Industrial,208,16.0,17.0,−5.9,14.0,3.8,4.2


In [7]:
df.shape

(258, 12)

<div class = "alert alert-block alert-success">
    <b>Data successfully extracted!</b>
</div>
   

## Data Cleaning and exploration

<div class = "alert alert-block alert-info">
    <b>Now that we have imported our table, let's start cleaning and exploring the data</b>
</div>
   

<div class = "alert alert-block alert-warning">
    <b>We start by removing columns unnecessary for our work</b> : 
</div>

In [8]:
remove_columns = ['Quadrant','Sector[11]','Ward[12]','Type[11]','2012 PopulationRank','Population(2012)[10]','Population(2011)[10]','% change','Dwellings(2012)[10]','Area(km2)[11]']
df = df.drop(remove_columns, axis = 1)

In [9]:
# How our dataframe looks after removing the columns
df.head()

Unnamed: 0,Name[10],Populationdensity
0,Abbeydale,3480.6
1,Acadia,2744.9
2,Albert Park/Radisson Heights,2493.6
3,Altadore,3143.4
4,Alyth/Bonnybrook,4.2


<div class = "alert alert-block alert-warning">
    <b>We will rename both columns for our convenience</b> : 
</div>

In [10]:
df.rename(columns = {'Name[10]':'Neighborhood','Populationdensity':'PopDensity'},inplace = True)

In [11]:
# How our dataframe looks now
df.head()

Unnamed: 0,Neighborhood,PopDensity
0,Abbeydale,3480.6
1,Acadia,2744.9
2,Albert Park/Radisson Heights,2493.6
3,Altadore,3143.4
4,Alyth/Bonnybrook,4.2


In [12]:
# checking out the bottom of the table
df.tail()

Unnamed: 0,Neighborhood,PopDensity
253,Windsor Park,3173.8
254,Winston Heights/Mountview,1297.0
255,Woodbine,2853.4
256,Woodlands,2214.6
257,Total City of Calgary,1320.7


In [13]:
# Since we don't need total count for Calgary we will remove the last row

df.drop(df.tail(1).index,inplace = True) 
df.tail()

Unnamed: 0,Neighborhood,PopDensity
252,Willow Park,1537.9
253,Windsor Park,3173.8
254,Winston Heights/Mountview,1297.0
255,Woodbine,2853.4
256,Woodlands,2214.6
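
Dropping the last row by position works only while the summary row stays at the bottom. A minimal sketch of a label-based alternative, assuming the summary row is always named `Total City of Calgary` as in the output above; the three sample rows are copied from that output.

```python
import pandas as pd

# Rows copied from the tail of the table shown above
sample = pd.DataFrame({
    "Neighborhood": ["Woodbine", "Woodlands", "Total City of Calgary"],
    "PopDensity": [2853.4, 2214.6, 1320.7],
})

# Keep every row except the city-wide summary, wherever it appears
neighbourhoods_only = sample[sample["Neighborhood"] != "Total City of Calgary"]
print(neighbourhoods_only)
```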


<div class = "alert alert-block alert-warning">
    <b>Let's check whether we have any duplicate entries for neighbourhoods</b>
</div>

In [14]:
duplicates = df.pivot_table(index = ['Neighborhood'], aggfunc ='size')
print(duplicates)

Neighborhood
Abbeydale                        1
Acadia                           1
Albert Park/Radisson Heights     1
Altadore                         1
Alyth/Bonnybrook                 1
Applewood Park                   1
Arbour Lake                      1
Aspen Woods                      1
Auburn Bay                       1
Aurora Business Park             1
Banff Trail                      1
Bankview                         1
Bayview                          1
Beddington Heights               1
Bel-Aire                         1
Beltline                         1
Bonavista Downs                  1
Bowness                          1
Braeside                         1
Brentwood                        1
Bridgeland/Riverside             1
Bridlewood                       1
Britannia                        1
Burns Industrial                 1
CFB Currie                       1
CFB Lincoln Park PMQ             1
Calgary International Airport    1
Cambrian Heights                 1
Canada 

In [15]:
print("No duplicates found!")

No duplicates found!
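
The message above is printed unconditionally after reading the counts by eye. A hedged sketch of a programmatic check follows, using `Series.duplicated`; the sample frame contains one deliberate repeat and is hypothetical.

```python
import pandas as pd

# Hypothetical sample with one deliberate repeat
sample = pd.DataFrame({"Neighborhood": ["Acadia", "Altadore", "Acadia"]})

# keep=False flags every copy of a repeated name, not just the later ones
dupes = sample[sample["Neighborhood"].duplicated(keep=False)]

if dupes.empty:
    message = "No duplicates found!"
else:
    message = "Duplicates found: " + ", ".join(sorted(dupes["Neighborhood"].unique()))
print(message)
```

Applied to the real dataframe, the same check would report the result without needing to scan the full list of counts.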


<div class = "alert alert-block alert-warning">
    <b>Upon exploring the Population density column we found some unusual entries : "—" and "0". Let's remove those first </b>
</div>

In [16]:
indexNames = df[(df['PopDensity'] == '0') | (df['PopDensity'] == '—')].index
df.drop(indexNames, inplace=True)

In [17]:
print(df)

                      Neighborhood PopDensity
0                        Abbeydale     3480.6
1                           Acadia     2744.9
2     Albert Park/Radisson Heights     2493.6
3                         Altadore     3143.4
4                 Alyth/Bonnybrook        4.2
5                   Applewood Park     4061.3
6                      Arbour Lake     2462.7
7                      Aspen Woods     1387.1
8                       Auburn Bay     1598.4
10                     Banff Trail       2558
11                        Bankview     7458.6
12                         Bayview       1705
13              Beddington Heights     3620.3
14                        Bel-Aire     1413.3
15                        Beltline     6786.6
16                 Bonavista Downs       1850
17                         Bowness     1966.4
18                        Braeside       2970
19                       Brentwood     2089.3
20            Bridgeland/Riverside     1804.5
21                      Bridlewood

In [18]:
#Let's check data types for both the columns
df.dtypes

Neighborhood    object
PopDensity      object
dtype: object

<div class = "alert alert-block alert-warning">
    <b>It will be beneficial to change the data types of the two columns to string and float respectively</b>
</div>

In [19]:
df.Neighborhood=df.Neighborhood.astype('string')
df['PopDensity']=pd.to_numeric(df['PopDensity'])
df.dtypes

Neighborhood     string
PopDensity      float64
dtype: object
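
The removal of placeholder entries and the type conversion can also be combined. The sketch below is one possible approach, not the notebook's method: `errors="coerce"` turns any non-numeric entry into `NaN`, after which missing and zero densities are filtered together. The four sample rows are invented to imitate the mixed entries described earlier.

```python
import pandas as pd

# Made-up values imitating the mixed entries; "\u2014" is the dash placeholder
raw = pd.DataFrame({
    "Neighborhood": ["A", "B", "C", "D"],
    "PopDensity": ["3480.6", "\u2014", "0", "2558"],
})

# errors="coerce" turns anything that is not a number into NaN
raw["PopDensity"] = pd.to_numeric(raw["PopDensity"], errors="coerce")

# Drop missing densities and zero densities in one pass
clean = raw[raw["PopDensity"].notna() & (raw["PopDensity"] > 0)]
print(clean)
```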

<div class = "alert alert-block alert-warning">
    <b>Let's check for NaN values</b>
</div>

In [20]:
total_nan = df['PopDensity'].isnull().values.sum()
print('Number of NaN values present: ' + str(total_nan))

Number of NaN values present: 2


In [21]:
# Drop the rows with null values so they cannot cause issues in later steps
df = df.dropna()

In [22]:
count_nan = df['PopDensity'].isnull().values.sum()
print('Number of NaN values present after .dropna(): ' + str(count_nan))

Number of NaN values present after .dropna(): 0


<div class = "alert alert-block alert-warning">
    <b>Let's sort the dataframe by decreasing Population Density</b>
</div>

In [23]:
df.sort_values(by=['PopDensity'], inplace=True, ascending=False)
df

Unnamed: 0,Neighborhood,PopDensity
116,Lower Mount Royal,10600.0
135,Mission,8650.0
35,Chinatown,7885.0
11,Bankview,7458.6
15,Beltline,6786.6
60,Downtown Commercial Core,6165.4
233,Taradale,5807.2
62,Downtown West End,5805.0
61,Downtown East Village,5564.0
29,Castleridge,5080.0


In [24]:
# Resetting index

df.reset_index(inplace = True)
df

Unnamed: 0,index,Neighborhood,PopDensity
0,116,Lower Mount Royal,10600.0
1,135,Mission,8650.0
2,35,Chinatown,7885.0
3,11,Bankview,7458.6
4,15,Beltline,6786.6
5,60,Downtown Commercial Core,6165.4
6,233,Taradale,5807.2
7,62,Downtown West End,5805.0
8,61,Downtown East Village,5564.0
9,29,Castleridge,5080.0


In [25]:
# A new column containing arranged indexes was created, so we will drop the old index column

df = df.drop('index',axis = 1)
df

Unnamed: 0,Neighborhood,PopDensity
0,Lower Mount Royal,10600.0
1,Mission,8650.0
2,Chinatown,7885.0
3,Bankview,7458.6
4,Beltline,6786.6
5,Downtown Commercial Core,6165.4
6,Taradale,5807.2
7,Downtown West End,5805.0
8,Downtown East Village,5564.0
9,Castleridge,5080.0
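
The sort, reset, and drop steps above can be expressed as one chain. In this small sketch, `reset_index(drop=True)` discards the old index instead of adding it as an `index` column, so no follow-up `drop` is needed. The three rows are taken from the tables above.

```python
import pandas as pd

sample = pd.DataFrame({
    "Neighborhood": ["Bankview", "Lower Mount Royal", "Mission"],
    "PopDensity": [7458.6, 10600.0, 8650.0],
})

# Sort by density, then renumber the rows without keeping the old index
ranked = (
    sample.sort_values(by="PopDensity", ascending=False)
          .reset_index(drop=True)
)
print(ranked)
```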


<div class = "alert alert-block alert-success">
    <b>Our cleaning and exploration is complete! Let's check the final shape of the dataframe after cleaning</b>
</div>

In [26]:
df.shape

(198, 2)

<div class = "alert alert-block alert-info">
    <b>Now we will add the latitude and longitude of each neighbourhood to the dataframe. For this we will use the Nominatim geocoder from geopy</b>
</div>

In [27]:
# Creating a nominatim object

nom = Nominatim(user_agent="Calgary_Explorer")

<div class = "alert alert-block alert-warning">
    <b>Before merging the coordinates, three neighbourhoods in our dataframe were found to have no coordinates. We will remove them first</b>
</div>

In [28]:
df = df[df.Neighborhood != 'CFB Lincoln Park PMQ']
df = df[df.Neighborhood != 'Douglasdale/Glen']
df = df[df.Neighborhood != 'CFB Currie']
print("Removed CFB Lincoln Park PMQ, Douglasdale/Glen, CFB Currie from the dataframe" )

Removed CFB Lincoln Park PMQ, Douglasdale/Glen, CFB Currie from the dataframe


<div class = "alert alert-block alert-warning">
    <b>Here we need to reset the index of our dataframe since three rows have been deleted</b>
</div>

In [29]:
# Resetting index

df.reset_index(inplace = True)
df

Unnamed: 0,index,Neighborhood,PopDensity
0,0,Lower Mount Royal,10600.0
1,1,Mission,8650.0
2,2,Chinatown,7885.0
3,3,Bankview,7458.6
4,4,Beltline,6786.6
5,5,Downtown Commercial Core,6165.4
6,6,Taradale,5807.2
7,7,Downtown West End,5805.0
8,8,Downtown East Village,5564.0
9,9,Castleridge,5080.0


In [30]:
df = df.drop('index',axis = 1)
df

Unnamed: 0,Neighborhood,PopDensity
0,Lower Mount Royal,10600.0
1,Mission,8650.0
2,Chinatown,7885.0
3,Bankview,7458.6
4,Beltline,6786.6
5,Downtown Commercial Core,6165.4
6,Taradale,5807.2
7,Downtown West End,5805.0
8,Downtown East Village,5564.0
9,Castleridge,5080.0


<div class = "alert alert-block alert-warning">
    <b>Now let's start by getting the coordinates for Calgary, then get the coordinates for each neighborhood and merge them into our dataframe</b>
</div>

In [31]:
Calgary = nom.geocode("Calgary,Alberta")

In [32]:
print(Calgary)

Calgary, Strathmore (town), Alberta, Canada


In [33]:
print("Latitude for Calgary  : ",Calgary.latitude)
print("Longitude for Calgary : ",Calgary.longitude)

Latitude for Calgary  :  51.0534234
Longitude for Calgary :  -114.0625892


<div class = "alert alert-block alert-warning">
    <b>We will use a lambda function to fill the Latitude and Longitude columns. Follow the link below to read more about lambda functions</b><br>
     <a href="https://www.w3schools.com/python/python_lambda.asp">https://www.w3schools.com/python/python_lambda.asp</a>
</div>

In [34]:
# Create two columns and put each neighborhood's coordinates into them

df['Latitude']  = df.apply(lambda row:(nom.geocode("{},Calgary,Alberta".format(row['Neighborhood']))).latitude,  axis = 1) 
df['Longitude'] = df.apply(lambda row:(nom.geocode("{},Calgary,Alberta".format(row['Neighborhood']))).longitude,  axis = 1)
print("Columns Created!")

Columns Created!
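
The cell above geocodes every neighborhood twice and fails with an `AttributeError` whenever the geocoder returns `None`, which is why three names had to be removed by hand beforehand. The sketch below shows one way to geocode each name once and collect the misses. The helper `lookup_coordinates`, the `Location` stand-in, and the fake geocoder are all hypothetical, introduced so the example runs without network access.

```python
from collections import namedtuple

# Stand-in for the object a geocoder returns; only the two fields used here
Location = namedtuple("Location", ["latitude", "longitude"])

def lookup_coordinates(names, geocode):
    """Geocode each name once; return (found, missing)."""
    found, missing = {}, []
    for name in names:
        location = geocode("{},Calgary,Alberta".format(name))
        if location is None:
            # No match: record the name instead of crashing
            missing.append(name)
        else:
            found[name] = (location.latitude, location.longitude)
    return found, missing

# Hypothetical fake geocoder so the sketch runs offline
def fake_geocode(query):
    known = {"Mission,Calgary,Alberta": Location(51.031758, -114.06672)}
    return known.get(query)

found, missing = lookup_coordinates(["Mission", "CFB Currie"], fake_geocode)
print(found)
print(missing)
```

In the notebook, `nom.geocode` could be passed as the `geocode` argument. geopy also provides `geopy.extra.rate_limiter.RateLimiter`, which can wrap the call to space out requests to the Nominatim service.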


In [35]:
#Let's see how our dataframe looks now
df

Unnamed: 0,Neighborhood,PopDensity,Latitude,Longitude
0,Lower Mount Royal,10600.0,51.036645,-114.087139
1,Mission,8650.0,51.031758,-114.06672
2,Chinatown,7885.0,51.050654,-114.062611
3,Bankview,7458.6,51.033887,-114.099518
4,Beltline,6786.6,51.040498,-114.072593
5,Downtown Commercial Core,6165.4,51.047378,-114.067199
6,Taradale,5807.2,51.116704,-113.938464
7,Downtown West End,5805.0,51.047554,-114.08342
8,Downtown East Village,5564.0,51.046496,-114.050643
9,Castleridge,5080.0,51.105977,-113.95982


<div class = "alert alert-block alert-success">
    <b>We have successfully inserted Latitude and Longitude into our dataframe! Now, let's check the data types and column names:</b>
</div>

In [36]:
print(df.dtypes)
print(df.columns)


Neighborhood     string
PopDensity      float64
Latitude        float64
Longitude       float64
dtype: object
Index(['Neighborhood', 'PopDensity', 'Latitude', 'Longitude'], dtype='object')


<div class = "alert alert-block alert-info">
    <b>For the next part we will visualize the work done so far. This will include:<Br>
        1. The locations we will explore<Br>
        2. A heatmap of the population distribution</b>
</div>

In [37]:
print('The geographical coordinates of Calgary are {}, {}.'.format(Calgary.latitude,Calgary.longitude))


The geographical coordinates of Calgary are 51.0534234, -114.0625892.


<div class = "alert alert-block alert-warning">
    <b>These are the Neighborhoods we will be exploring in our dataset</b>
</div>

In [38]:
map_calgary = folium.Map(location=[Calgary.latitude, Calgary.longitude], zoom_start=13)

# Add markers to map
for lat, lng, label in zip(df['Latitude'], df['Longitude'], df['Neighborhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_calgary)  
    
map_calgary

<div class = "alert alert-block alert-warning">
    <b>Followed by the distribution of population in Calgary city </b>
</div>

In [39]:
calgary_heatmap = folium.Map(location=[Calgary.latitude, Calgary.longitude], zoom_start=11) 

# List comprehension to make our list of lists
heat_data = [[row['Latitude'], 
              row['Longitude'],row['PopDensity']] for index, row in df.iterrows()]

# Plot it on the map
HeatMap(heat_data,
        min_opacity=0.5,
        max_zoom=18, 
        max_val=1.0, 
        radius=15,
        blur=20,
        gradient=None,
        overlay=True).add_to(calgary_heatmap)

# Display the map
calgary_heatmap
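
The third element of each `heat_data` entry is the raw population density, which runs into the thousands. One option, assuming weights on a 0 to 1 scale are wanted, is to min-max scale the densities before building the list. How much this changes the rendered map depends on the installed folium version, so treat it as something to verify. The helper `scale_weights` is hypothetical.

```python
def scale_weights(values):
    """Min-max scale a list of numbers to the range 0..1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values equal: give every point the same full weight
        return [1.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

# Densities of three neighbourhoods from the tables above
densities = [10600.0, 5080.0, 4.2]
weights = scale_weights(densities)
print(weights)
```

The scaled values could then replace `row['PopDensity']` when building `heat_data`.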

# Now we will explore the city further with the help of the Foursquare API

<div class = "alert alert-block alert-info">
    <b>We need Foursquare developer credentials to get data from the API. These credentials are confidential and must be hidden from anyone viewing this project</b>
</div>

In [40]:
# @HIDDEN CELL
CLIENT_ID = 'YOUR_FOURSQUARE_CLIENT_ID' # your Foursquare ID (redacted; never publish real credentials)
CLIENT_SECRET = 'YOUR_FOURSQUARE_CLIENT_SECRET' # your Foursquare Secret (redacted)
VERSION = '20180604'
LIMIT = 30
radius = 500 # not used by the call below, which relies on the getVenues default of 10000
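
One way to keep credentials out of a shared notebook is to read them from environment variables. The sketch below is an assumption-laden example: the variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and the helper `load_credentials` are invented for illustration.

```python
import os

def load_credentials(env=os.environ):
    """Read Foursquare credentials from a mapping of environment variables.

    The variable names are assumptions made for this sketch.
    """
    client_id = env.get("FOURSQUARE_CLIENT_ID")
    client_secret = env.get("FOURSQUARE_CLIENT_SECRET")
    if not client_id or not client_secret:
        raise RuntimeError(
            "Set FOURSQUARE_CLIENT_ID and FOURSQUARE_CLIENT_SECRET first"
        )
    return client_id, client_secret

# Demonstration with a stand-in mapping instead of the real environment
demo_id, demo_secret = load_credentials({
    "FOURSQUARE_CLIENT_ID": "demo-id",
    "FOURSQUARE_CLIENT_SECRET": "demo-secret",
})
print(demo_id)
```

Calling `load_credentials()` with no argument reads the real environment, so the secrets never appear in the saved notebook.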

<div class = "alert alert-block alert-warning">
    <b>We will create a function called getVenues to get the venues for the city</b>
</div>

In [41]:
def getVenues(names, latitudes, longitudes, radius=10000):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return nearby_venues
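
The list comprehension in `getVenues` indexes `categories[0]` directly, which raises an `IndexError` for any venue returned without a category. A minimal sketch of a more tolerant flattening step follows. The helper `venue_rows`, the `"Unknown"` placeholder, and the second sample venue are hypothetical; the payload only imitates the shape that `getVenues` iterates over.

```python
def venue_rows(name, lat, lng, items):
    """Flatten Foursquare-style items into tuples, tolerating missing categories."""
    rows = []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories") or []
        # Fall back to a placeholder rather than raising IndexError
        category = categories[0]["name"] if categories else "Unknown"
        rows.append((
            name, lat, lng,
            venue["name"],
            venue["location"]["lat"],
            venue["location"]["lng"],
            category,
        ))
    return rows

# Sample payload shaped like the items that getVenues iterates over
items = [
    {"venue": {"name": "Galaxie Diner",
               "location": {"lat": 51.03973, "lng": -114.089104},
               "categories": [{"name": "Diner"}]}},
    {"venue": {"name": "Hypothetical Venue",
               "location": {"lat": 51.04, "lng": -114.09},
               "categories": []}},
]
rows = venue_rows("Lower Mount Royal", 51.036645, -114.087139, items)
print(rows)
```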

<div class = "alert alert-block alert-warning">
    <b>We will get venues for Calgary now</b>
</div>

In [42]:
# getting venues for all neighbourhoods in Calgary
calgary_venues = getVenues(names = df['Neighborhood'],
                                  latitudes =  df['Latitude'],
                                  longitudes = df['Longitude'])

Lower Mount Royal
Mission
Chinatown
Bankview
Beltline
Downtown Commercial Core
Taradale
Downtown West End
Downtown East Village
Castleridge
Martindale
Cliff Bungalow
Falconridge
Whitehorn
Evergreen
Rundle
Erin Woods
Penbrooke Meadows
Temple
Forest Heights
South Calgary
Applewood Park
Greenview
Crescent Heights
Spruce Cliff
Coventry Hills
Millrise
Killarney/Glengarry
Citadel
Glenbrook
Somerset
Pineridge
Sunnyside
Eau Claire
Bridlewood
Sunalta
University Heights
MacEwan Glen
Beddington Heights
Panorama Hills
Tuxedo Park
Abbeydale
Sandstone Valley
Kingsland
Harvest Hills
Marlborough Park
McKenzie Towne
Ranchlands
Point Mckay
Palliser
Coral Springs
Forest Lawn
Glamorgan
Rutland Park
Windsor Park
Altadore
Royal Oak
Marlborough
Monterey Park
Hawkwood
Rosscarrock
Hillhurst
Queensland
Coach Hill
Braeside
Cedarbrae
Kelvin Grove
Deer Ridge
Mayland Heights
Capitol Hill
New Brighton
North Haven
Woodbine
Chinook Park
Deer Run
Fairview
Tuscany
McKenzie Lake
Dalhousie
Huntington Hills
Christie Park
A

In [43]:
#This is how our dataframe looks

print(calgary_venues.shape)
calgary_venues.head()

(5844, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Lower Mount Royal,51.036645,-114.087139,Galaxie Diner,51.03973,-114.089104,Diner
1,Lower Mount Royal,51.036645,-114.087139,Una Pizza + Wine,51.037922,-114.075496,Pizza Place
2,Lower Mount Royal,51.036645,-114.087139,Myhre's Deli,51.039767,-114.089195,Deli / Bodega
3,Lower Mount Royal,51.036645,-114.087139,Blanco Cantina,51.037716,-114.078657,Restaurant
4,Lower Mount Royal,51.036645,-114.087139,Gaga Pizzeria,51.042193,-114.090685,Pizza Place


<div class = "alert alert-block alert-warning">
    <b>Grouping Neighborhoods</b>
</div>

In [44]:
calgary_venues.groupby('Neighborhood').count()

Unnamed: 0_level_0,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Abbeydale,30,30,30,30,30,30
Acadia,30,30,30,30,30,30
Albert Park/Radisson Heights,30,30,30,30,30,30
Altadore,30,30,30,30,30,30
Alyth/Bonnybrook,30,30,30,30,30,30
Applewood Park,30,30,30,30,30,30
Arbour Lake,30,30,30,30,30,30
Aspen Woods,30,30,30,30,30,30
Auburn Bay,30,30,30,30,30,30
Banff Trail,30,30,30,30,30,30


<div class = "alert alert-block alert-warning">
    <b>No. of Unique Categories</b>
</div>

In [45]:
print('There are {} unique categories.'.format(len(calgary_venues['Venue Category'].unique())))

There are 104 unique categories.


<div class = "alert alert-block alert-warning">
    <b>Using One Hot Encoding to analyze each neighborhood</b>
</div>

In [46]:
calgary_onehot = pd.get_dummies(calgary_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
calgary_onehot['Neighborhood'] = calgary_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [calgary_onehot.columns[-1]] + list(calgary_onehot.columns[:-1])
calgary_onehot = calgary_onehot[fixed_columns]

calgary_onehot.head()

Unnamed: 0,Neighborhood,American Restaurant,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,BBQ Joint,Bakery,Bank,Bar,Beer Garden,Beer Store,Bookstore,Bowling Alley,Brazilian Restaurant,Breakfast Spot,Brewery,Burger Joint,Café,Chinese Restaurant,Coffee Shop,Concert Hall,Convenience Store,Cosmetics Shop,Deli / Bodega,Department Store,Dessert Shop,Diner,Discount Store,Dog Run,Electronics Store,Exhibit,Falafel Restaurant,Farmers Market,Fast Food Restaurant,Food & Drink Shop,French Restaurant,Frozen Yogurt Shop,Furniture / Home Store,Gastropub,Golf Course,Gourmet Shop,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Hardware Store,History Museum,Hotel,Ice Cream Shop,Indian Restaurant,Irish Pub,Italian Restaurant,Japanese Restaurant,Korean Restaurant,Lake,Latin American Restaurant,Library,Liquor Store,Lounge,Mediterranean Restaurant,Mexican Restaurant,Middle Eastern Restaurant,Movie Theater,Museum,Music Store,Music Venue,New American Restaurant,Noodle House,Paper / Office Supplies Store,Park,Performing Arts Venue,Peruvian Restaurant,Pet Store,Pharmacy,Pizza Place,Portuguese Restaurant,Pub,Racetrack,Restaurant,Salad Place,Salon / Barbershop,Sandwich Place,Scenic Lookout,Shopping Mall,Skating Rink,Ski Area,Sporting Goods Shop,Sports Bar,Stables,State / Provincial Park,Steakhouse,Supermarket,Sushi Restaurant,Thai Restaurant,Theater,Toy / Game Store,Trail,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Warehouse Store,Wine Shop,Wings Joint,Yoga Studio,Zoo,Zoo Exhibit
0,Lower Mount Royal,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,Lower Mount Royal,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,Lower Mount Royal,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,Lower Mount Royal,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,Lower Mount Royal,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


In [47]:
calgary_onehot.shape

(5844, 105)
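
To make the encoding step concrete, here is a small sketch on three of the venues listed earlier. Each venue becomes one row with a single indicator set in the column of its category. Using `DataFrame.insert` to place `Neighborhood` first is a substitute for the column reordering above, not the notebook's own approach.

```python
import pandas as pd

# Three venues from the Lower Mount Royal rows shown earlier
venues = pd.DataFrame({
    "Neighborhood": ["Lower Mount Royal"] * 3,
    "Venue Category": ["Diner", "Pizza Place", "Pizza Place"],
})

# One indicator column per category; dtype is bool or integer depending on pandas version
onehot = pd.get_dummies(venues[["Venue Category"]], prefix="", prefix_sep="")

# Put the neighborhood name in the first column
onehot.insert(0, "Neighborhood", venues["Neighborhood"])
print(onehot)
```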

<div class = "alert alert-block alert-warning">
    <b>Next, let's group rows by neighborhood and take the mean of the frequency of occurrence of each category</b>
</div>

In [48]:
calgary_grouped = calgary_onehot.groupby('Neighborhood').mean().reset_index()
calgary_grouped

Unnamed: 0,Neighborhood,American Restaurant,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,BBQ Joint,Bakery,Bank,Bar,Beer Garden,Beer Store,Bookstore,Bowling Alley,Brazilian Restaurant,Breakfast Spot,Brewery,Burger Joint,Café,Chinese Restaurant,Coffee Shop,Concert Hall,Convenience Store,Cosmetics Shop,Deli / Bodega,Department Store,Dessert Shop,Diner,Discount Store,Dog Run,Electronics Store,Exhibit,Falafel Restaurant,Farmers Market,Fast Food Restaurant,Food & Drink Shop,French Restaurant,Frozen Yogurt Shop,Furniture / Home Store,Gastropub,Golf Course,Gourmet Shop,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Hardware Store,History Museum,Hotel,Ice Cream Shop,Indian Restaurant,Irish Pub,Italian Restaurant,Japanese Restaurant,Korean Restaurant,Lake,Latin American Restaurant,Library,Liquor Store,Lounge,Mediterranean Restaurant,Mexican Restaurant,Middle Eastern Restaurant,Movie Theater,Museum,Music Store,Music Venue,New American Restaurant,Noodle House,Paper / Office Supplies Store,Park,Performing Arts Venue,Peruvian Restaurant,Pet Store,Pharmacy,Pizza Place,Portuguese Restaurant,Pub,Racetrack,Restaurant,Salad Place,Salon / Barbershop,Sandwich Place,Scenic Lookout,Shopping Mall,Skating Rink,Ski Area,Sporting Goods Shop,Sports Bar,Stables,State / Provincial Park,Steakhouse,Supermarket,Sushi Restaurant,Thai Restaurant,Theater,Toy / Game Store,Trail,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Warehouse Store,Wine Shop,Wings Joint,Yoga Studio,Zoo,Zoo Exhibit
0,Abbeydale,0.033333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.033333,0.1,0.0,0.0,0.0,0.066667,0.0,0.0,0.033333,0.0,0.0,0.0,0.066667,0.033333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.033333,0.0,0.033333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.033333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.033333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.033333,0.0,0.0,0.033333,0.033333,0.0,0.0,0.033333,0.0,0.033333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.033333,0.033333,0.0,0.0,0.0,0.0,0.1,0.033333,0.0,0.0,0.0,0.033333,0.0
1,Acadia,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.033333,0.0,0.0,0.033333,0.033333,0.033333,0.0,0.033333,0.0,0.033333,0.0,0.0,0.0,0.0,0.033333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.066667,0.0,0.033333,0.0,0.0,0.033333,0.0,0.0,0.0,0.0,0.033333,0.0,0.0,0.0,0.033333,0.0,0.033333,0.0,0.0,0.033333,0.0,0.0,0.033333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.033333,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.066667,0.0,0.0,0.0,0.033333,0.0,0.033333,0.0,0.0,0.033333,0.0,0.0,0.033333,0.0,0.0,0.0,0.0,0.033333,0.0,0.0,0.0,0.0
2,Albert Park/Radisson Heights,0.0,0.0,0.033333,0.0,0.0,0.0,0.0,0.033333,0.0,0.0,0.0,0.0,0.0,0.0,0.133333,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.1,0.0,0.0,0.033333,0.0,0.0,0.0,0.1,0.033333,0.033333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.033333,0.0,0.033333,0.0,0.0,0.0,0.033333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.033333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.033333,0.0,0.0,0.0,0.0,0.033333,0.033333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.033333,0.033333,0.0,0.0,0.0,0.0,0.033333,0.0,0.0,0.0,0.0,0.033333,0.033333
3,Altadore,0.0,0.0,0.0,0.033333,0.0,0.066667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.033333,0.033333,0.0,0.033333,0.0,0.033333,0.0,0.0,0.0,0.033333,0.0,0.0,0.033333,0.0,0.033333,0.0,0.0,0.0,0.0,0.0,0.0,0.033333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.033333,0.0,0.066667,0.0,0.0,0.0,0.0,0.033333,0.033333,0.033333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.033333,0.0,0.0,0.0,0.0,0.066667,0.0,0.066667,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.033333,0.033333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.033333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Alyth/Bonnybrook,0.0,0.0,0.0,0.0,0.0,0.033333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.066667,0.0,0.1,0.0,0.0,0.0,0.066667,0.0,0.0,0.0,0.0,0.0,0.0,0.066667,0.0,0.066667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.033333,0.0,0.0,0.0,0.0,0.0,0.0,0.033333,0.0,0.0,0.0,0.0,0.033333,0.0,0.0,0.0,0.0,0.033333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.033333,0.0,0.0,0.033333,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.033333,0.0,0.0,0.0,0.0,0.033333,0.0,0.033333,0.0,0.033333,0.033333,0.0,0.0,0.0,0.0,0.0,0.033333,0.0
5,Applewood Park,0.033333,0.0,0.033333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.033333,0.1,0.0,0.0,0.0,0.066667,0.0,0.0,0.033333,0.0,0.0,0.0,0.066667,0.033333,0.033333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.033333,0.0,0.033333,0.0,0.0,0.0,0.033333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.033333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.033333,0.0,0.0,0.033333,0.0,0.0,0.0,0.033333,0.0,0.033333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.033333,0.033333,0.0,0.0,0.0,0.0,0.066667,0.033333,0.0,0.0,0.0,0.033333,0.0
6,Arbour Lake,0.0,0.0,0.0,0.0,0.0,0.033333,0.0,0.0,0.0,0.0,0.033333,0.0,0.0,0.033333,0.0,0.1,0.033333,0.0,0.133333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.033333,0.0,0.0,0.0,0.0,0.0,0.0,0.033333,0.0,0.033333,0.033333,0.0,0.0,0.0,0.0,0.033333,0.0,0.033333,0.033333,0.0,0.0,0.0,0.0,0.033333,0.0,0.0,0.0,0.0,0.0,0.0,0.033333,0.0,0.033333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.033333,0.0,0.0,0.0,0.0,0.0,0.033333,0.033333,0.0,0.0,0.0,0.0,0.0,0.0,0.066667,0.0,0.0,0.0,0.033333,0.0,0.066667,0.0,0.0,0.0,0.033333,0.0,0.0
7,Aspen Woods,0.0,0.033333,0.033333,0.0,0.0,0.033333,0.0,0.0,0.033333,0.0,0.033333,0.0,0.0,0.0,0.0,0.0,0.033333,0.0,0.133333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.033333,0.0,0.0,0.066667,0.0,0.066667,0.0,0.0,0.0,0.033333,0.0,0.0,0.0,0.033333,0.033333,0.0,0.0,0.033333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.033333,0.0,0.0,0.0,0.033333,0.033333,0.0,0.0,0.0,0.133333,0.0,0.033333,0.0,0.0,0.0,0.066667,0.033333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
8,Auburn Bay,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.066667,0.0,0.0,0.033333,0.033333,0.0,0.033333,0.0,0.033333,0.0,0.0,0.0,0.0,0.033333,0.033333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.033333,0.0,0.0,0.0,0.0,0.033333,0.033333,0.0,0.0,0.066667,0.0,0.0,0.0,0.0,0.033333,0.0,0.033333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.033333,0.0,0.0,0.033333,0.0,0.0,0.0,0.033333,0.0,0.0,0.0,0.066667,0.033333,0.0,0.066667,0.0,0.066667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.033333,0.0,0.0,0.0,0.033333,0.066667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.033333,0.0,0.0
9,Banff Trail,0.0,0.0,0.0,0.0,0.066667,0.033333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.033333,0.0,0.0,0.066667,0.0,0.0,0.033333,0.0,0.0,0.033333,0.0,0.033333,0.033333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.033333,0.0,0.0,0.0,0.0,0.033333,0.0,0.0,0.0,0.033333,0.0,0.0,0.0,0.066667,0.0,0.033333,0.0,0.0,0.0,0.0,0.0,0.033333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.066667,0.0,0.0,0.0,0.066667,0.0,0.0,0.033333,0.0,0.0,0.033333,0.0,0.033333,0.0,0.0,0.0,0.033333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.033333,0.0,0.0,0.0,0.033333,0.0,0.0
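
Each value in the grouped table is the share of a neighborhood's venues that fall in a category, that is, the category count divided by the number of venues returned for that neighborhood. The sketch below checks this on made-up data by comparing the one-hot mean with `pd.crosstab(..., normalize="index")`, which computes the same row-wise shares directly.

```python
import pandas as pd

# Made-up venues: two in Altadore, four in Acadia
venues = pd.DataFrame({
    "Neighborhood": ["Altadore", "Altadore", "Acadia", "Acadia", "Acadia", "Acadia"],
    "Venue Category": ["Pub", "Bakery", "Park", "Park", "Restaurant", "Pub"],
})

# Route 1: one-hot encode, then average per neighborhood
onehot = pd.get_dummies(venues["Venue Category"]).astype(float)
onehot["Neighborhood"] = venues["Neighborhood"]
by_mean = onehot.groupby("Neighborhood").mean()

# Route 2: count per category and divide by each row's total
by_crosstab = pd.crosstab(
    venues["Neighborhood"], venues["Venue Category"], normalize="index"
)
print(by_mean)
```

A value of 0.033333 in the table above therefore corresponds to one venue out of the 30 returned for that neighborhood.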



<div class = "alert alert-block alert-warning">
    <b>Let's print each neighborhood along with the top 5 most common venues</b>
</div>



In [49]:
num_top_venues = 5

for hood in calgary_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = calgary_grouped[calgary_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Abbeydale----
                   venue  freq
0            Coffee Shop  0.10
1  Vietnamese Restaurant  0.10
2                Brewery  0.10
3                Exhibit  0.07
4          Deli / Bodega  0.07


----Acadia----
                venue  freq
0          Restaurant  0.10
1                Park  0.10
2       Shopping Mall  0.07
3      Farmers Market  0.07
4  Italian Restaurant  0.03


----Albert Park/Radisson Heights----
              venue  freq
0           Brewery  0.13
1           Exhibit  0.10
2     Deli / Bodega  0.10
3       Coffee Shop  0.10
4  Sushi Restaurant  0.03


----Altadore----
            venue  freq
0      Restaurant  0.17
1             Pub  0.07
2     Pizza Place  0.07
3          Bakery  0.07
4  Ice Cream Shop  0.07


----Alyth/Bonnybrook----
            venue  freq
0      Restaurant  0.10
1     Coffee Shop  0.10
2         Brewery  0.10
3  Farmers Market  0.07
4         Exhibit  0.07


----Applewood Park----
                   venue  freq
0            Coffee Shop  

In [50]:
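def return_most_common_venues(row, num_top_venues):
    """Return the names of the num_top_venues most frequent venue categories in a row."""
    # skip the first entry (the neighborhood name) and keep only the frequencies
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)

    return row_categories_sorted.index.values[0:num_top_venues]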
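
As a quick sanity check, the sketch below applies the same slice-then-sort logic to a single made-up row. The neighborhood name, venue categories and frequencies are illustrative assumptions, not values from the Calgary data.

```python
import pandas as pd

def return_most_common_venues(row, num_top_venues):
    # skip the first entry (the neighborhood name), keep the frequencies
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    return row_categories_sorted.index.values[0:num_top_venues]

# hypothetical row: one neighborhood name followed by four category frequencies
toy_row = pd.Series(
    {"Neighborhood": "Toyville", "Bakery": 0.10, "Park": 0.40, "Pub": 0.20, "Zoo": 0.30}
)

top3 = list(return_most_common_venues(toy_row, 3))
print(top3)  # ['Park', 'Zoo', 'Pub']
```

Note that the function relies on the neighborhood name being the first entry of the row; if the column order of the grouped table changed, the slice would need to change with it.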


<div class = "alert alert-block alert-warning">
    <b>Now let's create the new dataframe and display the top 10 venues for each neighborhood.</b>
</div>



In [51]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        # positions beyond 'rd' all take the 'th' suffix
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = calgary_grouped['Neighborhood']

for ind in np.arange(calgary_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(calgary_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Abbeydale,Coffee Shop,Vietnamese Restaurant,Brewery,Exhibit,Deli / Bodega,Diner,Salon / Barbershop,Restaurant,Pizza Place,Peruvian Restaurant
1,Acadia,Restaurant,Park,Shopping Mall,Farmers Market,Department Store,Coffee Shop,Food & Drink Shop,Café,Furniture / Home Store,Brewery
2,Albert Park/Radisson Heights,Brewery,Deli / Bodega,Exhibit,Coffee Shop,Zoo,Gym / Fitness Center,Grocery Store,Music Venue,Farmers Market,Falafel Restaurant
3,Altadore,Restaurant,Pub,Bakery,Pizza Place,Ice Cream Shop,Breakfast Spot,Brewery,History Museum,French Restaurant,Coffee Shop
4,Alyth/Bonnybrook,Brewery,Coffee Shop,Restaurant,Café,Exhibit,Deli / Bodega,Farmers Market,Sushi Restaurant,Mexican Restaurant,Racetrack
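
The `indicators` list with a fallback to `'th'` gives correct labels for positions 1 to 10, but it would produce labels such as "21th" if `num_top_venues` were raised above 20. The sketch below shows one possible alternative; `ordinal` is a hypothetical helper introduced here for illustration and is not part of the original notebook.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st'."""
    # numbers ending in 11, 12 and 13 take 'th' despite their last digit
    if 10 <= n % 100 <= 13:
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"

num_top_venues = 10
columns = ["Neighborhood"] + [
    f"{ordinal(i)} Most Common Venue" for i in range(1, num_top_venues + 1)
]
print(columns[:4])
```

For `num_top_venues = 10` this yields the same column names as the cell above, so it could be swapped in without affecting the rest of the analysis.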


##  Cluster Neighborhoods



<div class = "alert alert-block alert-warning">
    <b>Run k-means to group the neighborhoods into 5 clusters</b>
</div>

In [52]:
# set number of clusters
kclusters = 5

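# keep only the numeric venue frequencies; the positional axis argument to drop() is no longer accepted by recent pandas
calgary_grouped_clustering = calgary_grouped.drop(columns='Neighborhood')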

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(calgary_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([4, 2, 4, 1, 4, 4, 3, 0, 2, 0], dtype=int32)
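
The choice of 5 clusters is fixed by hand in the cell above. One common way to sanity-check such a choice is to compare silhouette scores across several values of k. The sketch below does this on a small synthetic matrix that stands in for `calgary_grouped_clustering`; the data, the range of k and the variable names are assumptions made for illustration, so the scores say nothing about the Calgary results.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# synthetic stand-in for the venue-frequency matrix (assumption, not the Calgary data)
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(loc=center, scale=0.05, size=(30, 4))
    for center in (0.1, 0.5, 0.9)
])

# fit k-means for each candidate k and record the mean silhouette score
scores = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels)

for k, score in scores.items():
    print(k, round(score, 3))
```

A higher silhouette score indicates tighter, better separated clusters. On the real data the same loop could be run with `X = calgary_grouped_clustering`.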

In [53]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'ClusterLabels', kmeans.labels_)

calgary_merged = df

# join the sorted venue table onto df to add the cluster label and top venues to each neighborhood's coordinates
calgary_merged = calgary_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

calgary_merged.head() # check the last columns!

Unnamed: 0,Neighborhood,PopDensity,Latitude,Longitude,ClusterLabels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Lower Mount Royal,10600.0,51.036645,-114.087139,1,Restaurant,Pizza Place,Deli / Bodega,Pub,Bakery,Sporting Goods Shop,Steakhouse,Park,Mexican Restaurant,Sandwich Place
1,Mission,8650.0,51.031758,-114.06672,1,Restaurant,Performing Arts Venue,Pub,Pizza Place,Deli / Bodega,French Restaurant,Yoga Studio,Coffee Shop,Sandwich Place,Gym / Fitness Center
2,Chinatown,7885.0,51.050654,-114.062611,1,Restaurant,Performing Arts Venue,Café,Diner,Pizza Place,Sporting Goods Shop,Deli / Bodega,Park,Mexican Restaurant,Pharmacy
3,Bankview,7458.6,51.033887,-114.099518,1,Restaurant,Pizza Place,French Restaurant,Deli / Bodega,Bakery,Sporting Goods Shop,Korean Restaurant,Brewery,Pub,Sandwich Place
4,Beltline,6786.6,51.040498,-114.072593,1,Restaurant,Park,Performing Arts Venue,Bakery,Pizza Place,Pub,Coffee Shop,Breakfast Spot,Diner,Sporting Goods Shop
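
Because `join` performs a left join by default, any neighborhood in `df` with no matching row in the venue table would end up with a missing cluster label, and the label column would become floating point. Whether that happens in the Calgary data is not shown here, so the sketch below only illustrates the situation on made-up frames (`df_toy` and `venues_toy` are hypothetical stand-ins).

```python
import pandas as pd

# hypothetical stand-ins for df and neighborhoods_venues_sorted
df_toy = pd.DataFrame({
    "Neighborhood": ["A", "B", "C"],
    "Latitude": [51.0, 51.1, 51.2],
    "Longitude": [-114.0, -114.1, -114.2],
})
venues_toy = pd.DataFrame({
    "ClusterLabels": [1, 0],
    "Neighborhood": ["A", "C"],
    "1st Most Common Venue": ["Park", "Café"],
})

# left join: neighborhood "B" has no venue row, so its label is NaN
merged = df_toy.join(venues_toy.set_index("Neighborhood"), on="Neighborhood")
print(merged["ClusterLabels"].isna().sum())

# keep only clustered neighborhoods and restore integer labels
merged_clean = merged.dropna(subset=["ClusterLabels"]).copy()
merged_clean["ClusterLabels"] = merged_clean["ClusterLabels"].astype(int)
print(merged_clean)
```

Integer labels matter later on, since they are used to index into the list of marker colours when drawing the map.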


<div class = "alert alert-block alert-warning">
    <b>Finally, let's visualize the resulting clusters</b>
</div>

In [54]:
# create map
map_clusters = folium.Map(location=[Calgary.latitude, Calgary.longitude], zoom_start=11)

# set color scheme for the clusters
# one evenly spaced colour from the rainbow colormap per cluster
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
for lat, lon, poi, cluster in zip(calgary_merged['Latitude'], calgary_merged['Longitude'], calgary_merged['Neighborhood'], calgary_merged['ClusterLabels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    # labels run from 0 to kclusters-1, so they index the colour list directly
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[int(cluster)],
        fill=True,
        fill_color=rainbow[int(cluster)],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

<div class = "alert alert-block alert-success">
    <b>We have now grouped the neighborhoods into clusters based on their venues</b>
</div>

<div class = "alert alert-block alert-warning">
    <b>Now we can examine each cluster to find the venue categories that distinguish it</b>
</div>
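
Reading the per-cluster tables by eye is one way to spot distinguishing categories. A rough complement is to compare each cluster's mean venue frequency with the overall mean. The sketch below does this on a made-up table shaped like `calgary_grouped`; the frame, the labels and the names `grouped_toy`, `labels_toy` and `lift` are illustrative assumptions.

```python
import pandas as pd

# hypothetical frequency table, shaped like calgary_grouped
grouped_toy = pd.DataFrame({
    "Neighborhood": ["A", "B", "C", "D"],
    "Park": [0.5, 0.4, 0.0, 0.1],
    "Restaurant": [0.1, 0.2, 0.6, 0.5],
    "Brewery": [0.4, 0.4, 0.4, 0.4],
})
labels_toy = [0, 0, 1, 1]  # hypothetical cluster labels, one per row

features = grouped_toy.drop(columns="Neighborhood")

# mean frequency of every category within each cluster
cluster_means = features.assign(Cluster=labels_toy).groupby("Cluster").mean()

# how far each cluster sits above or below the overall mean frequency
lift = cluster_means - features.mean()

# category with the largest positive difference in each cluster
top_per_cluster = lift.idxmax(axis=1)
print(top_per_cluster)
```

In this toy table "Brewery" is equally common everywhere, so it does not distinguish either cluster even though its frequency is high. That is the reason for comparing against the overall mean rather than ranking raw frequencies.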

<div class = "alert alert-block alert-warning">
    <b>Cluster 1 (ClusterLabels == 0)</b>
</div>

In [57]:
calgary_merged.loc[calgary_merged['ClusterLabels'] == 0, calgary_merged.columns[[0] + [1] + list(range(5, calgary_merged.shape[1]))]]

Unnamed: 0,Neighborhood,PopDensity,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
24,Spruce Cliff,3895.5,Restaurant,Café,Park,Pizza Place,BBQ Joint,Bakery,Korean Restaurant,Diner,Dessert Shop,Deli / Bodega
27,Killarney/Glengarry,3786.7,Park,Restaurant,Korean Restaurant,Bakery,Pizza Place,Dog Run,Sandwich Place,Deli / Bodega,Dessert Shop,Diner
29,Glenbrook,3713.7,Park,Coffee Shop,Pizza Place,Bakery,Korean Restaurant,Breakfast Spot,Dog Run,Diner,Dessert Shop,Ice Cream Shop
36,University Heights,3660.0,Café,Park,BBQ Joint,Korean Restaurant,Deli / Bodega,Dessert Shop,Shopping Mall,Sandwich Place,Concert Hall,Restaurant
48,Point Mckay,3295.0,Park,Café,BBQ Joint,Bakery,Korean Restaurant,Diner,Dessert Shop,Pizza Place,Hotel,Restaurant
52,Glamorgan,3188.0,Park,Coffee Shop,History Museum,Pizza Place,Korean Restaurant,Ice Cream Shop,Grocery Store,Dog Run,Pub,Deli / Bodega
53,Rutland Park,3181.4,Park,Korean Restaurant,Pizza Place,Bakery,Ice Cream Shop,Lake,Gym / Fitness Center,Grocery Store,French Restaurant,Sporting Goods Shop
60,Rosscarrock,3050.9,Café,Pizza Place,BBQ Joint,Bakery,Korean Restaurant,Restaurant,Pub,Diner,Dessert Shop,Deli / Bodega
63,Coach Hill,2980.9,Café,Gym / Fitness Center,Restaurant,Coffee Shop,Skating Rink,Ski Area,Park,Dessert Shop,Pharmacy,Hotel
69,Capitol Hill,2867.9,Park,Diner,Pizza Place,BBQ Joint,Italian Restaurant,Café,Hotel,Breakfast Spot,Dessert Shop,Deli / Bodega


<div class = "alert alert-block alert-warning">
    <b>Cluster 2 (ClusterLabels == 1)</b>
</div>

In [58]:
calgary_merged.loc[calgary_merged['ClusterLabels'] == 1, calgary_merged.columns[[0] + [1] + list(range(5, calgary_merged.shape[1]))]]

Unnamed: 0,Neighborhood,PopDensity,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Lower Mount Royal,10600.0,Restaurant,Pizza Place,Deli / Bodega,Pub,Bakery,Sporting Goods Shop,Steakhouse,Park,Mexican Restaurant,Sandwich Place
1,Mission,8650.0,Restaurant,Performing Arts Venue,Pub,Pizza Place,Deli / Bodega,French Restaurant,Yoga Studio,Coffee Shop,Sandwich Place,Gym / Fitness Center
2,Chinatown,7885.0,Restaurant,Performing Arts Venue,Café,Diner,Pizza Place,Sporting Goods Shop,Deli / Bodega,Park,Mexican Restaurant,Pharmacy
3,Bankview,7458.6,Restaurant,Pizza Place,French Restaurant,Deli / Bodega,Bakery,Sporting Goods Shop,Korean Restaurant,Brewery,Pub,Sandwich Place
4,Beltline,6786.6,Restaurant,Park,Performing Arts Venue,Bakery,Pizza Place,Pub,Coffee Shop,Breakfast Spot,Diner,Sporting Goods Shop
5,Downtown Commercial Core,6165.4,Restaurant,Bakery,Performing Arts Venue,Café,Lounge,Scenic Lookout,Sandwich Place,Coffee Shop,Pub,Pizza Place
7,Downtown West End,5805.0,Restaurant,Pizza Place,Pub,Bakery,Deli / Bodega,Steakhouse,Sporting Goods Shop,Sandwich Place,Café,Brewery
8,Downtown East Village,5564.0,Restaurant,Deli / Bodega,Diner,Performing Arts Venue,Coffee Shop,Steakhouse,Music Venue,Park,Pharmacy,Museum
11,Cliff Bungalow,4840.0,Restaurant,Pizza Place,Park,Deli / Bodega,Pub,Ice Cream Shop,Steakhouse,Breakfast Spot,Diner,Sporting Goods Shop
20,South Calgary,4108.9,Restaurant,Pub,Bakery,French Restaurant,Pizza Place,Ice Cream Shop,Breakfast Spot,Asian Restaurant,Mexican Restaurant,Sandwich Place


<div class = "alert alert-block alert-warning">
    <b>Cluster 3 (ClusterLabels == 2)</b>
</div>

In [59]:
calgary_merged.loc[calgary_merged['ClusterLabels'] == 2, calgary_merged.columns[[0] + [1] + list(range(5, calgary_merged.shape[1]))]]

Unnamed: 0,Neighborhood,PopDensity,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
14,Evergreen,4371.7,Park,Sushi Restaurant,Restaurant,Grocery Store,Pizza Place,State / Provincial Park,Chinese Restaurant,Farmers Market,Pharmacy,Pet Store
26,Millrise,3828.9,Restaurant,Park,Farmers Market,Grocery Store,Sushi Restaurant,History Museum,Beer Store,Pet Store,Music Venue,Food & Drink Shop
30,Somerset,3713.0,Grocery Store,Park,Sushi Restaurant,Restaurant,Chinese Restaurant,Pub,Pharmacy,Pet Store,Music Venue,Bookstore
34,Bridlewood,3698.4,Park,Restaurant,Sushi Restaurant,Pizza Place,Pharmacy,Shopping Mall,Chinese Restaurant,Pub,Coffee Shop,Pet Store
43,Kingsland,3471.5,Park,Restaurant,History Museum,Bakery,Pub,Shopping Mall,Coffee Shop,Cosmetics Shop,Department Store,Dog Run
46,McKenzie Towne,3354.6,Restaurant,Sushi Restaurant,Breakfast Spot,Bookstore,Pharmacy,Pet Store,Coffee Shop,Fast Food Restaurant,Café,Burger Joint
49,Palliser,3293.0,Park,Restaurant,History Museum,Bakery,Ice Cream Shop,State / Provincial Park,Farmers Market,Department Store,Coffee Shop,Café
62,Queensland,2981.9,Restaurant,Farmers Market,Bookstore,Park,Pub,Chinese Restaurant,Food & Drink Shop,Shopping Mall,Brewery,Pharmacy
64,Braeside,2970.0,Park,History Museum,Sushi Restaurant,Farmers Market,Restaurant,Music Venue,State / Provincial Park,Café,Ice Cream Shop,Shopping Mall
65,Cedarbrae,2970.0,Park,Farmers Market,History Museum,Restaurant,Sushi Restaurant,Ice Cream Shop,State / Provincial Park,Department Store,Coffee Shop,Café


<div class = "alert alert-block alert-warning">
    <b>Cluster 4 (ClusterLabels == 3)</b>
</div>

In [60]:
calgary_merged.loc[calgary_merged['ClusterLabels'] == 3, calgary_merged.columns[[0] + [1] + list(range(5, calgary_merged.shape[1]))]]

Unnamed: 0,Neighborhood,PopDensity,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
22,Greenview,3950.0,Italian Restaurant,Coffee Shop,Brewery,Vietnamese Restaurant,Park,Diner,Japanese Restaurant,Pharmacy,Pet Store,Noodle House
25,Coventry Hills,3894.9,Sporting Goods Shop,Vietnamese Restaurant,Hotel,Gym,Japanese Restaurant,Golf Course,Paper / Office Supplies Store,Pizza Place,Movie Theater,Middle Eastern Restaurant
28,Citadel,3776.7,Coffee Shop,Vietnamese Restaurant,Burger Joint,Sushi Restaurant,Japanese Restaurant,Liquor Store,Electronics Store,New American Restaurant,Music Store,Food & Drink Shop
37,MacEwan Glen,3642.9,Park,Vietnamese Restaurant,Gym,Sushi Restaurant,Liquor Store,Breakfast Spot,Music Store,Pizza Place,Movie Theater,Middle Eastern Restaurant
38,Beddington Heights,3620.3,Vietnamese Restaurant,Italian Restaurant,Park,Sushi Restaurant,Golf Course,Pet Store,Pizza Place,Movie Theater,Middle Eastern Restaurant,Frozen Yogurt Shop
39,Panorama Hills,3531.3,Vietnamese Restaurant,Gym,Grocery Store,Sporting Goods Shop,Japanese Restaurant,Golf Course,Pizza Place,Movie Theater,Middle Eastern Restaurant,Mexican Restaurant
40,Tuxedo Park,3516.2,Park,Vietnamese Restaurant,Coffee Shop,Restaurant,Deli / Bodega,Diner,Café,Italian Restaurant,Noodle House,Brewery
42,Sandstone Valley,3473.9,Vietnamese Restaurant,Park,Gym,Sushi Restaurant,Golf Course,Pizza Place,Movie Theater,Middle Eastern Restaurant,Mexican Restaurant,Frozen Yogurt Shop
44,Harvest Hills,3364.5,Italian Restaurant,Vietnamese Restaurant,Sporting Goods Shop,Gym,Park,Hotel,American Restaurant,Breakfast Spot,Grocery Store,Gastropub
47,Ranchlands,3315.7,Park,Skating Rink,Burger Joint,Restaurant,Sushi Restaurant,Ski Area,Shopping Mall,Café,Coffee Shop,Japanese Restaurant


<div class = "alert alert-block alert-warning">
    <b>Cluster 5 (ClusterLabels == 4)</b>
</div>

In [63]:
calgary_merged.loc[calgary_merged['ClusterLabels'] == 4, calgary_merged.columns[[0] + [1] + list(range(5, calgary_merged.shape[1]))]]

Unnamed: 0,Neighborhood,PopDensity,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
6,Taradale,5807.2,Italian Restaurant,Vietnamese Restaurant,Brewery,Hotel,Grocery Store,Pizza Place,Coffee Shop,Sushi Restaurant,Sporting Goods Shop,Zoo
9,Castleridge,5080.0,Brewery,Italian Restaurant,Pizza Place,Grocery Store,Coffee Shop,Vietnamese Restaurant,Sushi Restaurant,Chinese Restaurant,Salon / Barbershop,Deli / Bodega
10,Martindale,5064.4,Italian Restaurant,Brewery,Hotel,Sporting Goods Shop,Grocery Store,Vietnamese Restaurant,Pizza Place,Coffee Shop,Sushi Restaurant,Zoo
12,Falconridge,4718.6,Pizza Place,Vietnamese Restaurant,Brewery,Italian Restaurant,Grocery Store,Coffee Shop,Sushi Restaurant,Chinese Restaurant,Salon / Barbershop,Deli / Bodega
13,Whitehorn,4558.5,Coffee Shop,Vietnamese Restaurant,Exhibit,Brewery,Diner,Grocery Store,Italian Restaurant,Zoo,Portuguese Restaurant,Pizza Place
15,Rundle,4330.0,Coffee Shop,Brewery,Deli / Bodega,Vietnamese Restaurant,Exhibit,Diner,Grocery Store,Korean Restaurant,Zoo,Peruvian Restaurant
16,Erin Woods,4313.1,Brewery,Deli / Bodega,Coffee Shop,Exhibit,Restaurant,Sports Bar,Gym / Fitness Center,Zoo,Movie Theater,Music Venue
17,Penbrooke Meadows,4273.5,Coffee Shop,Brewery,Deli / Bodega,Vietnamese Restaurant,Exhibit,Zoo,Grocery Store,Movie Theater,Falafel Restaurant,Gym / Fitness Center
18,Temple,4190.0,Coffee Shop,Brewery,Grocery Store,Vietnamese Restaurant,Exhibit,Pizza Place,Deli / Bodega,Pharmacy,Chinese Restaurant,Salon / Barbershop
19,Forest Heights,4141.3,Brewery,Coffee Shop,Deli / Bodega,Exhibit,Grocery Store,Farmers Market,Falafel Restaurant,Music Venue,Zoo,Diner


<div class = "alert alert-block alert-success">
    <b>This concludes our project. Thank you!</b>
</div>