## IBM Capstone Project - The Battle of Neighborhoods- The best location for an African restaurant in Calgary, AB Canada

# 1. Introduction
1.1 Description of the Problem

The population of Calgary has grown considerably over the last decades. Calgary is very diverse. The city is expanding, and new communities are being created and built all the time.  As communities increases in Calgary, the number of restaurants is increasing too but not at the same rate as the new communities.

Although there are many fine restaurants in Calgary— Asian, Middle Eastern, Latin and American restaurants but its very hard to find good place to dine in the finest of African cuisine that has combination of Nigerian, Ghanaian, Cameroonian, Senegalese and more.


1.2 Discussion of the Background

Due to the increase of Africans immigrating to Calgary, opening an African restaurant in Calgary right now will be very lucrative.  Calgary demography is so big, the available data for demography, communities and ethnicity will be used in determining the best community a new African restaurant should be planted.

1.3 Target Audience

Calgary has a multicultural sense because of its diversity. As such, there is a shortage in the high-end African-inclined restaurant.  The target audience is broad, it ranges from Africans, Caribbean, Calgarians, tourists, and those who are passionate about African food. 

# 2.2. Data acquisition and cleaning

2.1 Data sources 

The first data I checked is a Wikipedia page that described the demographics of Calgary data that shows the population of different ethnicities’ and the progression over the years - Ethnicity of Calgary.  The second data I used to get Calgary communities and its coordinates with the postal codes is a Wikipedia page that list all of Canada’s postal codes and its coordinates - Postal Codes of Canada

2.2 Data cleaning

A table with Postal code, Borough, Neighborhood, Latitude and Longitude data from the Postal Codes of Canada page was scraped and downloaded to the dataframe.  The Borough data is then filtered to only show Calgary.  Because some of the data in Calgary is showing NaN(null values), I dropped the rows that has NaN in any of its column. 
The Foursquare API will be used to obtain number of restaurants and other point of interest and their type and location in every community in Calgary.


In [3]:
!pip install geopy    
!pip install folium   
!pip install geocoder
!pip install bs4
!pip install BeautifulSoup4
!pip install lxml

#import libraries 
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analsysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import matplotlib.pyplot as plt # for graphical usage 

import json # library to handle JSON files

from geopy.geocoders import Nominatim # convert an address into latitude and longitude values
import geocoder # to get coordinates

import requests # library to handle requests
from pandas.io.json import json_normalize # tranform JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

import folium # map rendering library
from folium import plugins
from folium.plugins import HeatMap

# main documentation page: http://beautiful-soup-4.readthedocs.io/en/latest/
# how to use the BeautifulSoup package: https://www.youtube.com/watch?v=ng2o98k983k video
from bs4 import BeautifulSoup 
import pandas as pd
import requests
!conda install -c conda-forge geopy --yes 
print('Libraries imported.')

Collecting package metadata (current_repodata.json): done
Solving environment: done

## Package Plan ##

  environment location: /home/jupyterlab/conda/envs/python

  added / updated specs:
    - geopy


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    geographiclib-1.50         |             py_0          34 KB  conda-forge
    geopy-2.1.0                |     pyhd3deb0d_0          64 KB  conda-forge
    ------------------------------------------------------------
                                           Total:          98 KB

The following NEW packages will be INSTALLED:

  geographiclib      conda-forge/noarch::geographiclib-1.50-py_0
  geopy              conda-forge/noarch::geopy-2.1.0-pyhd3deb0d_0



Downloading and Extracting Packages
geopy-2.1.0          | 64 KB     | ##################################### | 100% 
geographiclib-1.50   | 34 KB     | ################################

Scrape Data from Wipipedia

In [4]:
#source = requests.get("https://en.wikipedia.org/wiki/Lagos#Demographics").text
source = requests.get("https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_T").text
soup = BeautifulSoup(source, 'html.parser')
Calgary_Table = soup.find('table',{'class':'wikitable sortable'})

In [5]:
table_rows = Calgary_Table.find_all('tr')

res = []
for tr in table_rows:
    td = tr.find_all('td')
    row = [tr.text.strip() for tr in td if tr.text.strip()]
    if row:
        res.append(row)

df_Calgary_Table = pd.DataFrame(res, columns=["Postal Codes", "Borough", "Communities", "Latitude", "Longitude"])
df_Calgary_Table.head()       

Unnamed: 0,Postal Codes,Borough,Communities,Latitude,Longitude
0,T1A,Medicine Hat,Central Medicine Hat,50.03646,-110.67925
1,T2A,Calgary,"Penbrooke Meadows, Marlborough",51.04968,-113.96432
2,T3A,Calgary,"Dalhousie, Edgemont, Hamptons, Hidden Valley",51.12606,-114.143158
3,T4A,Airdrie,East Airdrie,51.27245,-113.98698
4,T5A,Edmonton,"West Clareview, East Londonderry",53.5899,-113.4413


Filter out the Calgary data to be used in the analysis

In [6]:
df_Calgary_Table2 = df_Calgary_Table # assigns df1 to df2
df_Calgary_final = df_Calgary_Table2[df_Calgary_Table2['Borough'].str.contains('Calgary')]
df_Calgary_final.head()

Unnamed: 0,Postal Codes,Borough,Communities,Latitude,Longitude
1,T2A,Calgary,"Penbrooke Meadows, Marlborough",51.04968,-113.96432
2,T3A,Calgary,"Dalhousie, Edgemont, Hamptons, Hidden Valley",51.12606,-114.143158
10,T2B,Calgary,"Forest Lawn, Dover, Erin Woods",51.0318,-113.9786
11,T3B,Calgary,"Montgomery, Bowness, Silver Springs, Greenwood",51.0809,-114.1616
19,T2C,Calgary,"Lynnwood Ridge, Ogden, Foothills Industrial, G...",50.9878,-114.0001


Check the data type for each field

In [7]:
df_Calgary_final.dtypes

Postal Codes    object
Borough         object
Communities     object
Latitude        object
Longitude       object
dtype: object

Change the datatype for latitude and longitude to float and remove all the rows with any null value (NaN)

In [9]:
df_Calgary_final['Latitude'] = pd.to_numeric(df_Calgary_final['Latitude'],errors='coerce')
df_Calgary_final['Longitude'] = pd.to_numeric(df_Calgary_final['Longitude'],errors='coerce')
df_Calgary_final = df_Calgary_final.dropna()
df_Calgary_final.head()

Unnamed: 0,Postal Codes,Borough,Communities,Latitude,Longitude
1,T2A,Calgary,"Penbrooke Meadows, Marlborough",51.04968,-113.96432
2,T3A,Calgary,"Dalhousie, Edgemont, Hamptons, Hidden Valley",51.12606,-114.143158
10,T2B,Calgary,"Forest Lawn, Dover, Erin Woods",51.0318,-113.9786
11,T3B,Calgary,"Montgomery, Bowness, Silver Springs, Greenwood",51.0809,-114.1616
19,T2C,Calgary,"Lynnwood Ridge, Ogden, Foothills Industrial, G...",50.9878,-114.0001


Get the map of Calgary

In [10]:
address = "Calgary, AB"

geolocator = Nominatim(user_agent="Calgary_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude

print('The geograpical coordinate of Calgary city are {}, {}.'.format(latitude, longitude))

The geograpical coordinate of Calgary city are 51.0534234, -114.0625892.


In [11]:
# create map of Calgary using latitude and longitude values
map_Calgary = folium.Map(location=[latitude, longitude], zoom_start=10)
map_Calgary

Get map of Calgary showing all communities

In [12]:
for lat, lng, borough, community in zip(
        df_Calgary_final['Latitude'], 
        df_Calgary_final['Longitude'], 
        df_Calgary_final['Borough'], 
        df_Calgary_final['Communities']):
    label = '{}, {}'.format(community, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_Calgary)  

map_Calgary

Foursquare API

In [13]:
CLIENT_ID = 'KKIOXEJBYPET2JDYCHUBCDHAOFNDKCPZ41NN4WQUVUEVB0O4' # your Foursquare ID
CLIENT_SECRET = 'PMHAPJJTJHHP00HAJN0Y2OIJQSHAJNBQ2NHHEW4IMTNRJNNC' # your Foursquare Secret
ACCESS_TOKEN = 'IEM2BPPFYOYPBSILG030SZRZZCV1Q4H5AWNLUST0OC32CGQ5' # your FourSquare Access Token
VERSION = '20201230'
LIMIT = 100
print('Your credentails:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentails:
CLIENT_ID: KKIOXEJBYPET2JDYCHUBCDHAOFNDKCPZ41NN4WQUVUEVB0O4
CLIENT_SECRET:PMHAPJJTJHHP00HAJN0Y2OIJQSHAJNBQ2NHHEW4IMTNRJNNC


In [14]:
LIMIT = 100 # limit of number of venues returned by Foursquare API
radius = 1500 # define radius
url = 'https://api.foursquare.com/v2/venues/explore?ll=51.0534234, -114.0625892&client_id=KKIOXEJBYPET2JDYCHUBCDHAOFNDKCPZ41NN4WQUVUEVB0O4&client_secret=PMHAPJJTJHHP00HAJN0Y2OIJQSHAJNBQ2NHHEW4IMTNRJNNC&v=20201222&radius=1500&limit=100'
url


'https://api.foursquare.com/v2/venues/explore?ll=51.0534234, -114.0625892&client_id=KKIOXEJBYPET2JDYCHUBCDHAOFNDKCPZ41NN4WQUVUEVB0O4&client_secret=PMHAPJJTJHHP00HAJN0Y2OIJQSHAJNBQ2NHHEW4IMTNRJNNC&v=20201222&radius=1500&limit=100'

Foursquare API will be used to obtain number of restaurants and other point of interest and their type and location in every community in Calgary 

In [15]:
 #get the result to a json file
results = requests.get(url).json()

In [16]:
def get_category_type(row):
    try:
        categories_list = row['categories']
    except:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

In [17]:
venues = results['response']['groups'][0]['items']
    
nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues =nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues


  This is separate from the ipykernel package so we can avoid doing imports until


Unnamed: 0,name,categories,lat,lng
0,River Walk,Scenic Lookout,51.051413,-114.059962
1,Prince's Island Park,Park,51.054884,-114.069929
2,1886 Cafe,Café,51.052392,-114.069475
3,Over Easy Breakfast,Breakfast Spot,51.048561,-114.065917
4,Top Of Stairs In Crescent Heights,Scenic Lookout,51.05901,-114.067619
5,Monogram Coffee,Coffee Shop,51.049165,-114.067333
6,The Palomino Smokehouse,American Restaurant,51.046435,-114.06341
7,Lukes Drug Mart,Pharmacy,51.053187,-114.051327
8,Fionn MacCool's Calgary,Pub,51.051707,-114.069903
9,Diner Deluxe,Diner,51.05857,-114.054121


In [18]:
print('{} venues were returned by Foursquare.'.format(nearby_venues.shape[0]))

100 venues were returned by Foursquare.


In [19]:
def getNearbyVenues(names, latitudes, longitudes, radius=1500):
    venues_list=[]
    
    for name, lat, lng in zip(names, latitudes, longitudes):
        # print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?ll=51.0534234, -114.0625892&client_id=KKIOXEJBYPET2JDYCHUBCDHAOFNDKCPZ41NN4WQUVUEVB0O4&client_secret=PMHAPJJTJHHP00HAJN0Y2OIJQSHAJNBQ2NHHEW4IMTNRJNNC&v=20201222&radius=1500&limit=100'
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])
        
    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Communities', 
                  'Communities Latitude', 
                  'Communities Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

In [20]:
Calgary_area_venues = getNearbyVenues(names=df_Calgary_final['Communities'],
                                   latitudes=df_Calgary_final['Latitude'],
                                   longitudes=df_Calgary_final['Longitude']
                                  )

In [21]:
print(Calgary_area_venues.shape)
Calgary_area_venues.head()

(3400, 7)


Unnamed: 0,Communities,Communities Latitude,Communities Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,"Penbrooke Meadows, Marlborough",51.04968,-113.96432,River Walk,51.051413,-114.059962,Scenic Lookout
1,"Penbrooke Meadows, Marlborough",51.04968,-113.96432,Prince's Island Park,51.054884,-114.069929,Park
2,"Penbrooke Meadows, Marlborough",51.04968,-113.96432,1886 Cafe,51.052392,-114.069475,Café
3,"Penbrooke Meadows, Marlborough",51.04968,-113.96432,Over Easy Breakfast,51.048561,-114.065917,Breakfast Spot
4,"Penbrooke Meadows, Marlborough",51.04968,-113.96432,Top Of Stairs In Crescent Heights,51.05901,-114.067619,Scenic Lookout


In [22]:
print('There are {} uniques categories.'.format(len(Calgary_area_venues['Venue Category'].unique())))

There are 57 uniques categories.


In [23]:
Calgary_area_venues_unique_count = Calgary_area_venues['Venue Category'].value_counts().to_frame(name='Count')
Calgary_area_venues_unique_count 

Unnamed: 0,Count
Pub,238
Coffee Shop,204
Restaurant,170
Steakhouse,170
Italian Restaurant,136
Hotel,136
Café,102
Park,102
Performing Arts Venue,68
Diner,68


In [24]:
# one hot encoding
Calgary_area_onehot = pd.get_dummies(Calgary_area_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
Calgary_area_onehot['Communities'] = Calgary_area_venues['Communities'] 

# move neighborhood column to the first column
fixed_columns = [Calgary_area_onehot.columns[-1]] + list(Calgary_area_onehot.columns[:-1])
Caglary_area_onehot = Calgary_area_onehot[fixed_columns]

Calgary_area_onehot.head()

Unnamed: 0,American Restaurant,Argentinian Restaurant,Art Gallery,Asian Restaurant,BBQ Joint,Bakery,Bar,Bistro,Board Shop,Breakfast Spot,Brewery,Burger Joint,Café,Chinese Restaurant,Cocktail Bar,Coffee Shop,Deli / Bodega,Dim Sum Restaurant,Diner,Falafel Restaurant,Fast Food Restaurant,Grocery Store,Gym / Fitness Center,Hookah Bar,Hostel,Hotel,Ice Cream Shop,Island,Italian Restaurant,Japanese Restaurant,Library,Liquor Store,Lounge,Mexican Restaurant,Middle Eastern Restaurant,Museum,Music Venue,Noodle House,Park,Performing Arts Venue,Pharmacy,Pizza Place,Plaza,Pub,Restaurant,Scenic Lookout,Seafood Restaurant,Shopping Mall,Steakhouse,Sushi Restaurant,Thai Restaurant,Theater,Trail,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Wine Shop,Yoga Studio,Communities
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,"Penbrooke Meadows, Marlborough"
1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,"Penbrooke Meadows, Marlborough"
2,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,"Penbrooke Meadows, Marlborough"
3,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,"Penbrooke Meadows, Marlborough"
4,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,"Penbrooke Meadows, Marlborough"


In [25]:
# move neighborhood column to the first column
fixed_columns = [Calgary_area_onehot.columns[-1]] + list(Calgary_area_onehot.columns[:-1])
Calgary_area_onehot = Calgary_area_onehot[fixed_columns]
Calgary_area_onehot.head()

Unnamed: 0,Communities,American Restaurant,Argentinian Restaurant,Art Gallery,Asian Restaurant,BBQ Joint,Bakery,Bar,Bistro,Board Shop,Breakfast Spot,Brewery,Burger Joint,Café,Chinese Restaurant,Cocktail Bar,Coffee Shop,Deli / Bodega,Dim Sum Restaurant,Diner,Falafel Restaurant,Fast Food Restaurant,Grocery Store,Gym / Fitness Center,Hookah Bar,Hostel,Hotel,Ice Cream Shop,Island,Italian Restaurant,Japanese Restaurant,Library,Liquor Store,Lounge,Mexican Restaurant,Middle Eastern Restaurant,Museum,Music Venue,Noodle House,Park,Performing Arts Venue,Pharmacy,Pizza Place,Plaza,Pub,Restaurant,Scenic Lookout,Seafood Restaurant,Shopping Mall,Steakhouse,Sushi Restaurant,Thai Restaurant,Theater,Trail,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Wine Shop,Yoga Studio
0,"Penbrooke Meadows, Marlborough",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0
1,"Penbrooke Meadows, Marlborough",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,"Penbrooke Meadows, Marlborough",0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,"Penbrooke Meadows, Marlborough",0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,"Penbrooke Meadows, Marlborough",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0


In [26]:
Calgary_area_grouped = Calgary_area_onehot.groupby('Communities').mean().reset_index()
Calgary_area_onehot.head()

Unnamed: 0,Communities,American Restaurant,Argentinian Restaurant,Art Gallery,Asian Restaurant,BBQ Joint,Bakery,Bar,Bistro,Board Shop,Breakfast Spot,Brewery,Burger Joint,Café,Chinese Restaurant,Cocktail Bar,Coffee Shop,Deli / Bodega,Dim Sum Restaurant,Diner,Falafel Restaurant,Fast Food Restaurant,Grocery Store,Gym / Fitness Center,Hookah Bar,Hostel,Hotel,Ice Cream Shop,Island,Italian Restaurant,Japanese Restaurant,Library,Liquor Store,Lounge,Mexican Restaurant,Middle Eastern Restaurant,Museum,Music Venue,Noodle House,Park,Performing Arts Venue,Pharmacy,Pizza Place,Plaza,Pub,Restaurant,Scenic Lookout,Seafood Restaurant,Shopping Mall,Steakhouse,Sushi Restaurant,Thai Restaurant,Theater,Trail,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Wine Shop,Yoga Studio
0,"Penbrooke Meadows, Marlborough",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0
1,"Penbrooke Meadows, Marlborough",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,"Penbrooke Meadows, Marlborough",0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,"Penbrooke Meadows, Marlborough",0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,"Penbrooke Meadows, Marlborough",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0


In [27]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending = False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [28]:
num_top_venues = 10
indicators = ['st', 'nd', 'rd']# create columns according to number of top venues
columns = ['Communities']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))
Communities_venues_sorted = pd.DataFrame(columns=columns)
Communities_venues_sorted['Communities'] = Calgary_area_grouped['Communities']
for ind in np.arange(Calgary_area_grouped.shape[0]):
    Communities_venues_sorted.iloc[ind, 1:] = return_most_common_venues(Calgary_area_grouped.iloc[ind, :], num_top_venues)
Communities_venues_sorted.head(5)

Unnamed: 0,Communities,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,"Braeside, Cedarbrae, Woodbine",Pub,Coffee Shop,Steakhouse,Restaurant,Hotel,Italian Restaurant,Café,Park,Cocktail Bar,Diner
1,"Brentwood, Collingwood, Nose Hill",Pub,Coffee Shop,Steakhouse,Restaurant,Hotel,Italian Restaurant,Café,Park,Cocktail Bar,Diner
2,"Bridgeland, Greenview, Zoo, YYC",Pub,Coffee Shop,Steakhouse,Restaurant,Hotel,Italian Restaurant,Café,Park,Cocktail Bar,Diner
3,"City Centre, Calgary Tower",Pub,Coffee Shop,Steakhouse,Restaurant,Hotel,Italian Restaurant,Café,Park,Cocktail Bar,Diner
4,"Connaught, West Victoria Park",Pub,Coffee Shop,Steakhouse,Restaurant,Hotel,Italian Restaurant,Café,Park,Cocktail Bar,Diner


In [29]:
Calgary_grouped_clustering = Calgary_area_grouped.drop('Communities', 1)

In [30]:
# set number of clusters
kclusters = 5# run k-means clustering
kmeans = KMeans(n_clusters = kclusters, random_state=0).fit(Calgary_grouped_clustering)# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10]

  return_n_iter=True)


array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0], dtype=int32)

In [31]:
# add clustering labels
Communities_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)
SE_Calgary_merged = df_Calgary_final
# merge manhattan_grouped with manhattan_data to add latitude/longitude for each neighborhood
SE_Calgary_merged.reset_index(inplace = True) 
SE_Calgary_merged = SE_Calgary_merged.join(Communities_venues_sorted.set_index('Communities'), on='Communities')
SE_Calgary_merged = SE_Calgary_merged.dropna()
SE_Calgary_merged.head() # check the last columns!

Unnamed: 0,index,Postal Codes,Borough,Communities,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,1,T2A,Calgary,"Penbrooke Meadows, Marlborough",51.04968,-113.96432,0,Pub,Coffee Shop,Steakhouse,Restaurant,Hotel,Italian Restaurant,Café,Park,Cocktail Bar,Diner
1,2,T3A,Calgary,"Dalhousie, Edgemont, Hamptons, Hidden Valley",51.12606,-114.143158,0,Pub,Coffee Shop,Steakhouse,Restaurant,Hotel,Italian Restaurant,Café,Park,Cocktail Bar,Diner
2,10,T2B,Calgary,"Forest Lawn, Dover, Erin Woods",51.0318,-113.9786,0,Pub,Coffee Shop,Steakhouse,Restaurant,Hotel,Italian Restaurant,Café,Park,Cocktail Bar,Diner
3,11,T3B,Calgary,"Montgomery, Bowness, Silver Springs, Greenwood",51.0809,-114.1616,0,Pub,Coffee Shop,Steakhouse,Restaurant,Hotel,Italian Restaurant,Café,Park,Cocktail Bar,Diner
4,19,T2C,Calgary,"Lynnwood Ridge, Ogden, Foothills Industrial, G...",50.9878,-114.0001,0,Pub,Coffee Shop,Steakhouse,Restaurant,Hotel,Italian Restaurant,Café,Park,Cocktail Bar,Diner


In [32]:
SE_Calgary_merged.loc[SE_Calgary_merged['Cluster Labels'] == 0, SE_Calgary_merged.columns[[1] + list(range(5, SE_Calgary_merged.shape[1]))]]

Unnamed: 0,Postal Codes,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,T2A,-113.96432,0,Pub,Coffee Shop,Steakhouse,Restaurant,Hotel,Italian Restaurant,Café,Park,Cocktail Bar,Diner
1,T3A,-114.143158,0,Pub,Coffee Shop,Steakhouse,Restaurant,Hotel,Italian Restaurant,Café,Park,Cocktail Bar,Diner
2,T2B,-113.9786,0,Pub,Coffee Shop,Steakhouse,Restaurant,Hotel,Italian Restaurant,Café,Park,Cocktail Bar,Diner
3,T3B,-114.1616,0,Pub,Coffee Shop,Steakhouse,Restaurant,Hotel,Italian Restaurant,Café,Park,Cocktail Bar,Diner
4,T2C,-114.0001,0,Pub,Coffee Shop,Steakhouse,Restaurant,Hotel,Italian Restaurant,Café,Park,Cocktail Bar,Diner
5,T3C,-114.098,0,Pub,Coffee Shop,Steakhouse,Restaurant,Hotel,Italian Restaurant,Café,Park,Cocktail Bar,Diner
6,T2E,-114.0614,0,Pub,Coffee Shop,Steakhouse,Restaurant,Hotel,Italian Restaurant,Café,Park,Cocktail Bar,Diner
7,T3E,-114.1342,0,Pub,Coffee Shop,Steakhouse,Restaurant,Hotel,Italian Restaurant,Café,Park,Cocktail Bar,Diner
8,T2G,-114.0599,0,Pub,Coffee Shop,Steakhouse,Restaurant,Hotel,Italian Restaurant,Café,Park,Cocktail Bar,Diner
9,T3G,-114.1796,0,Pub,Coffee Shop,Steakhouse,Restaurant,Hotel,Italian Restaurant,Café,Park,Cocktail Bar,Diner


In [33]:
SE_Calgary_merged.loc[SE_Calgary_merged['Cluster Labels'] == 1, SE_Calgary_merged.columns[[1] + list(range(5, SE_Calgary_merged.shape[1]))]]

Unnamed: 0,Postal Codes,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
