## Introduction

The test project should provide tools and insights to assist the decision making for the following Business Problem:
Company considering opening several restaurants in Toronto. There are many factors affecting the business decision and calculations. Amongst them
•	Location of restaurants (Neighborhoods and Clusters of Neighborhoods)
•	What type of cuisine (perhaps Italian?) would be most suitable
•	Various levels of competition, i.e. existing vendors of the same type (i.e. Italian) vs. general competition from all restaurants etc.


## Table of Contents

<div class="alert alert-block alert-info" style="margin-top: 20px">

<font size = 3>

1. <a href="#item1">Python Libraries Imports</a>

2. <a href="#item2">Download, Transform and Explore Datasets</a>

3. <a href="#item3">Explore (Foresquare) and Analyze Each Neighborhood</a>

4. <a href="#item4">Cluster Neighborhoods</a>

5. <a href="#item5">Examine Clusters</a>    
</font>
</div>

<a id='item1'></a>
## 1. Python Libraries Imports

In [4]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analsysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

!conda install -c conda-forge geopy --yes # uncomment this line if you haven't completed the Foursquare API lab
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas.io.json import json_normalize # tranform JSON file into a pandas dataframe

# # Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't completed the Foursquare API lab
import folium # map rendering library

print('Libraries imported.')

Solving environment: done

# All requested packages already installed.

Solving environment: done

# All requested packages already installed.

Libraries imported.


<a id='item2'></a>

## 2. Download, Transform and Explore Dataset

--Reading the wiki page--

In [16]:
import requests
wikipedia_link='https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'
raw_wikipedia_page = requests.get(wikipedia_link) 
page = raw_wikipedia_page.text 

--Using the BeautifulSoup to scrape the data--

In [17]:
from bs4 import BeautifulSoup
soup = BeautifulSoup(page, "lxml")
table = soup.find ("table")

--transform the data into requested shape--

In [19]:
import pandas as pd
row_marker=0
twd=[]
for row in table.find_all('tr'):
    column_marker = 0
    columns = row.find_all('td')
    rwd=[]
    for column in columns:
        rwd.append(column.get_text().strip('\n'))
        column_marker += 1
        if len(columns) > 0:
            row_marker += 1
    twd.append(rwd)

new_table = pd.DataFrame(twd, columns=['Postcode','Borough','Neighborhood']) # I know the size
new_table.drop(new_table[new_table.Borough == 'Not assigned'].index, inplace=True)
ngTable = new_table.copy() 
new_table = new_table.groupby(['Postcode','Borough'], as_index=False ).agg(lambda x: ', '.join(set(x)))
new_table.head()

Unnamed: 0,Postcode,Borough,Neighborhood
0,M1B,Scarborough,"Malvern, Rouge"
1,M1C,Scarborough,"Highland Creek, Rouge Hill, Port Union"
2,M1E,Scarborough,"Morningside, Guildwood, West Hill"
3,M1G,Scarborough,Woburn
4,M1H,Scarborough,Cedarbrae


In [20]:
new_table.shape

(103, 3)

using http://cocl.us/Geospatial_data

In [21]:
url="https://cocl.us/Geospatial_data"
geod=pd.read_csv(url)
geod.head()

Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


In [23]:
df1=new_table.set_index('Postcode').join(geod.set_index('Postal Code'))
df1.head()

Unnamed: 0_level_0,Borough,Neighborhood,Latitude,Longitude
Postcode,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
M1B,Scarborough,"Malvern, Rouge",43.806686,-79.194353
M1C,Scarborough,"Highland Creek, Rouge Hill, Port Union",43.784535,-79.160497
M1E,Scarborough,"Morningside, Guildwood, West Hill",43.763573,-79.188711
M1G,Scarborough,Woburn,43.770992,-79.216917
M1H,Scarborough,Cedarbrae,43.773136,-79.239476


Geo Coord of Toronto, Canada are 43.653963 -79.387201

In [114]:
import pandas as pd 
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

In [33]:


df2=ngTable.set_index('Postcode').join(geod.set_index('Postal Code'))
# df2=df2.reset_index(drop=True)
df2=df2[:-1]
df2=df2.drop(df2[~df2.Borough.str.contains('Toronto')].index)
df2

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude
M4E,East Toronto,The Beaches,43.676357,-79.293031
M4K,East Toronto,The Danforth West,43.679557,-79.352188
M4K,East Toronto,Riverdale,43.679557,-79.352188
M4L,East Toronto,The Beaches West,43.668999,-79.315572
M4L,East Toronto,India Bazaar,43.668999,-79.315572
M4M,East Toronto,Studio District,43.659526,-79.340923
M4N,Central Toronto,Lawrence Park,43.72802,-79.38879
M4P,Central Toronto,Davisville North,43.712751,-79.390197
M4R,Central Toronto,North Toronto West,43.715383,-79.405678
M4S,Central Toronto,Davisville,43.704324,-79.38879


In [34]:
print('The dataframe for Toronto has {} boroughs and {} neighborhoods.'.format(
        len(df2['Borough'].unique()),
        df2.shape[0]
    )
)

The dataframe for Toronto has 4 boroughs and 74 neighborhoods.


In [109]:
import folium
# create map of Toronto using latitude and longitude values
latitude=43.653963
longitude=-79.387201
neighborhoods= df2

map_toronto = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, borough, neighborhood in zip(neighborhoods['Latitude'], neighborhoods['Longitude'], neighborhoods['Borough'], neighborhoods['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto)  
    
map_toronto

<a id='item3'></a>
## 3. Explore (Foresquare) and Analyze Each Neighborhood

In [223]:
## login credentials data removed from public view

#### Use Foursquare API to get all categories

In [6]:
import requests # library to handle requests
from pandas.io.json import json_normalize # tranform JSON file into a pandas dataframe
#get a list Foursquare categories related to food
url='https://api.foursquare.com/v2/venues/categories?&client_id={}&client_secret={}&v={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION)
# make the GET request
results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '5bbd0718db04f55c4241153e'},
 'response': {'categories': [{'id': '4d4b7104d754a06370d81259',
    'name': 'Arts & Entertainment',
    'pluralName': 'Arts & Entertainment',
    'shortName': 'Arts & Entertainment',
    'icon': {'prefix': 'https://ss3.4sqi.net/img/categories_v2/arts_entertainment/default_',
     'suffix': '.png'},
    'categories': [{'id': '56aa371be4b08b9a8d5734db',
      'name': 'Amphitheater',
      'pluralName': 'Amphitheaters',
      'shortName': 'Amphitheater',
      'icon': {'prefix': 'https://ss3.4sqi.net/img/categories_v2/arts_entertainment/default_',
       'suffix': '.png'},
      'categories': []},
     {'id': '4fceea171983d5d06c3e9823',
      'name': 'Aquarium',
      'pluralName': 'Aquariums',
      'shortName': 'Aquarium',
      'icon': {'prefix': 'https://ss3.4sqi.net/img/categories_v2/arts_entertainment/aquarium_',
       'suffix': '.png'},
      'categories': []},
     {'id': '4bf58dd8d48988d1e1931735',
      'name': 'A

In [56]:
def extractCategories(e,res):
    for child in e.get('categories',[]):
        res.append(child['name'])
        result = extractCategories(child,res)
        if result is not None:
            return result
    return None

In [57]:
catList=[]
for cat in results['response']['categories']:
    catName = cat['name']
    if catName=='Food':
#       print (catName)
        extractCategories(cat, catList)        

In [58]:
catList

['Afghan Restaurant',
 'African Restaurant',
 'Ethiopian Restaurant',
 'American Restaurant',
 'New American Restaurant',
 'Asian Restaurant',
 'Burmese Restaurant',
 'Cambodian Restaurant',
 'Chinese Restaurant',
 'Anhui Restaurant',
 'Beijing Restaurant',
 'Cantonese Restaurant',
 'Cha Chaan Teng',
 'Chinese Aristocrat Restaurant',
 'Chinese Breakfast Place',
 'Dim Sum Restaurant',
 'Dongbei Restaurant',
 'Fujian Restaurant',
 'Guizhou Restaurant',
 'Hainan Restaurant',
 'Hakka Restaurant',
 'Henan Restaurant',
 'Hong Kong Restaurant',
 'Huaiyang Restaurant',
 'Hubei Restaurant',
 'Hunan Restaurant',
 'Imperial Restaurant',
 'Jiangsu Restaurant',
 'Jiangxi Restaurant',
 'Macanese Restaurant',
 'Manchu Restaurant',
 'Peking Duck Restaurant',
 'Shaanxi Restaurant',
 'Shandong Restaurant',
 'Shanghai Restaurant',
 'Shanxi Restaurant',
 'Szechuan Restaurant',
 'Taiwanese Restaurant',
 'Tianjin Restaurant',
 'Xinjiang Restaurant',
 'Yunnan Restaurant',
 'Zhejiang Restaurant',
 'Filipino R

#### Function to extract only Food categories from Foursqare API using previousely extracted list

In [229]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        try: 
            results = requests.get(url).json()["response"]['groups'][0]['items']
        except:
            print ('error')
        
        for v in results:
            catNm = v['venue']['categories'][0]['name']
#           # return only Food categories
            if catNm in catList:
                venues_list.append([
                    name, 
                    lat, 
                    lng, 
                    v['venue']['name'], 
                    v['venue']['location']['lat'], 
                    v['venue']['location']['lng'],  
                    v['venue']['categories'][0]['name']])
#                 print('Append '+catNm)
                
    return venues_list          

#### Process all Neighborhoods to search for Food venues

In [200]:
LIMIT=100
radius=1000
toronto_data=df2
venues_list  = getNearbyVenues(names=toronto_data['Neighborhood'],
                               latitudes=toronto_data['Latitude'],
                               longitudes=toronto_data['Longitude']
                              )                                  


# venues_list

The Beaches
The Danforth West
Riverdale
The Beaches West
India Bazaar
Studio District
Lawrence Park
Davisville North
North Toronto West
Davisville
Moore Park
Summerhill East
Deer Park
Forest Hill SE
Rathnelly
South Hill
Summerhill West
Rosedale
Cabbagetown
St. James Town
Church and Wellesley
Harbourfront
Regent Park
Ryerson
Garden District
St. James Town
Berczy Park
Central Bay Street
Adelaide
King
Richmond
Harbourfront East
Toronto Islands
Union Station
Design Exchange
Toronto Dominion Centre
Commerce Court
error
Victoria Hotel
Roselawn
Forest Hill North
Forest Hill West
The Annex
North Midtown
Yorkville
Harbord
University of Toronto
Chinatown
Grange Park
Kensington Market
CN Tower
Bathurst Quay
Island airport
Harbourfront West
King and Spadina
Railway Lands
South Niagara
Stn A PO Boxes 25 The Esplanade
First Canadian Place
Underground city
Christie
Dovercourt Village
Dufferin
Little Portugal
Trinity
Brockton
Exhibition Place
Parkdale Village
High Park
The Junction South
Parkdale
Ronc

#### and shape the data into a dataframe

In [201]:
# nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
toronto_venues = pd.DataFrame  (venues_list)
toronto_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
toronto_venues.head()

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,The Beaches,43.676357,-79.293031,Starbucks,43.678798,-79.298045,Coffee Shop
1,The Danforth West,43.679557,-79.352188,Pantheon,43.677621,-79.351434,Greek Restaurant
2,The Danforth West,43.679557,-79.352188,Dolce Gelato,43.677773,-79.351187,Ice Cream Shop
3,The Danforth West,43.679557,-79.352188,Messini Authentic Gyros,43.677827,-79.350569,Greek Restaurant
4,The Danforth West,43.679557,-79.352188,Cafe Fiorentina,43.677743,-79.350115,Italian Restaurant


In [202]:
print(toronto_venues.shape)


(1906, 7)


In [205]:
df3=toronto_venues.copy()
df3
#toronto_venues.groupby('Neighborhood').count()
#toronto_venues.head()

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,The Beaches,43.676357,-79.293031,Starbucks,43.678798,-79.298045,Coffee Shop
1,The Danforth West,43.679557,-79.352188,Pantheon,43.677621,-79.351434,Greek Restaurant
2,The Danforth West,43.679557,-79.352188,Dolce Gelato,43.677773,-79.351187,Ice Cream Shop
3,The Danforth West,43.679557,-79.352188,Messini Authentic Gyros,43.677827,-79.350569,Greek Restaurant
4,The Danforth West,43.679557,-79.352188,Cafe Fiorentina,43.677743,-79.350115,Italian Restaurant
5,The Danforth West,43.679557,-79.352188,Mezes,43.677962,-79.350196,Greek Restaurant
6,The Danforth West,43.679557,-79.352188,Christina's On The Danforth,43.67824,-79.349185,Greek Restaurant
7,The Danforth West,43.679557,-79.352188,La Diperie,43.67753,-79.352295,Ice Cream Shop
8,The Danforth West,43.679557,-79.352188,7 Numbers,43.677062,-79.353934,Italian Restaurant
9,The Danforth West,43.679557,-79.352188,Pizzeria Libretto,43.678489,-79.347576,Pizza Place


In [206]:
toronto_venues.groupby('Neighborhood').count()

Unnamed: 0_level_0,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Adelaide,61,61,61,61,61,61
Berczy Park,28,28,28,28,28,28
Brockton,11,11,11,11,11,11
Business reply mail Processing Centre969 Eastern,4,4,4,4,4,4
Cabbagetown,29,29,29,29,29,29
Central Bay Street,64,64,64,64,64,64
Chinatown,70,70,70,70,70,70
Christie,7,7,7,7,7,7
Church and Wellesley,53,53,53,53,53,53
Commerce Court,63,63,63,63,63,63


In [207]:
print('There are {} uniques categories.'.format(len(toronto_venues['Venue Category'].unique())))

There are 86 uniques categories.


#### Analyze Each Neighborhood in Toronto dataframe

In [208]:
toronto_venues.head()

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,The Beaches,43.676357,-79.293031,Starbucks,43.678798,-79.298045,Coffee Shop
1,The Danforth West,43.679557,-79.352188,Pantheon,43.677621,-79.351434,Greek Restaurant
2,The Danforth West,43.679557,-79.352188,Dolce Gelato,43.677773,-79.351187,Ice Cream Shop
3,The Danforth West,43.679557,-79.352188,Messini Authentic Gyros,43.677827,-79.350569,Greek Restaurant
4,The Danforth West,43.679557,-79.352188,Cafe Fiorentina,43.677743,-79.350115,Italian Restaurant


#### apply Onehot encoding

In [209]:
# one hot encoding
toronto_onehot = pd.get_dummies(toronto_venues[['Venue Category']], prefix="", prefix_sep="")

# # add neighborhood column back to dataframe
nbr = toronto_venues['Neighborhood']
# toronto_onehot.drop(labels=['Neighborhood'], axis=1,inplace = True)
toronto_onehot.insert(0,'Neighborhood',nbr)

toronto_onehot.head()

Unnamed: 0,Neighborhood,Afghan Restaurant,American Restaurant,Arepa Restaurant,Asian Restaurant,BBQ Joint,Bagel Shop,Bakery,Belgian Restaurant,Bistro,Brazilian Restaurant,Breakfast Spot,Bubble Tea Shop,Burger Joint,Burrito Place,Café,Cajun / Creole Restaurant,Caribbean Restaurant,Chinese Restaurant,Coffee Shop,Comfort Food Restaurant,Creperie,Cuban Restaurant,Cupcake Shop,Deli / Bodega,Dessert Shop,Dim Sum Restaurant,Diner,Doner Restaurant,Donut Shop,Dumpling Restaurant,Eastern European Restaurant,Ethiopian Restaurant,Falafel Restaurant,Fast Food Restaurant,Filipino Restaurant,Fish & Chips Shop,Food Court,Food Truck,French Restaurant,Fried Chicken Joint,Gastropub,German Restaurant,Gluten-free Restaurant,Greek Restaurant,Hawaiian Restaurant,Hotpot Restaurant,Ice Cream Shop,Indian Restaurant,Irish Pub,Italian Restaurant,Japanese Restaurant,Jewish Restaurant,Juice Bar,Korean Restaurant,Latin American Restaurant,Mac & Cheese Joint,Malay Restaurant,Mediterranean Restaurant,Mexican Restaurant,Middle Eastern Restaurant,Modern European Restaurant,Molecular Gastronomy Restaurant,New American Restaurant,Noodle House,Persian Restaurant,Pizza Place,Portuguese Restaurant,Poutine Place,Ramen Restaurant,Restaurant,Salad Place,Sandwich Place,Seafood Restaurant,Snack Place,Soup Place,Southern / Soul Food Restaurant,Steakhouse,Sushi Restaurant,Taco Place,Taiwanese Restaurant,Tapas Restaurant,Tea Room,Thai Restaurant,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Wings Joint
0,The Beaches,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,The Danforth West,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,The Danforth West,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,The Danforth West,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,The Danforth West,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


In [210]:
toronto_onehot.shape

(1906, 87)

#### Group rows by neighborhood and by taking the mean of the frequency of occurrence of each category

In [211]:
toronto_grouped = toronto_onehot.groupby('Neighborhood').mean().reset_index()
toronto_grouped.head()

Unnamed: 0,Neighborhood,Afghan Restaurant,American Restaurant,Arepa Restaurant,Asian Restaurant,BBQ Joint,Bagel Shop,Bakery,Belgian Restaurant,Bistro,Brazilian Restaurant,Breakfast Spot,Bubble Tea Shop,Burger Joint,Burrito Place,Café,Cajun / Creole Restaurant,Caribbean Restaurant,Chinese Restaurant,Coffee Shop,Comfort Food Restaurant,Creperie,Cuban Restaurant,Cupcake Shop,Deli / Bodega,Dessert Shop,Dim Sum Restaurant,Diner,Doner Restaurant,Donut Shop,Dumpling Restaurant,Eastern European Restaurant,Ethiopian Restaurant,Falafel Restaurant,Fast Food Restaurant,Filipino Restaurant,Fish & Chips Shop,Food Court,Food Truck,French Restaurant,Fried Chicken Joint,Gastropub,German Restaurant,Gluten-free Restaurant,Greek Restaurant,Hawaiian Restaurant,Hotpot Restaurant,Ice Cream Shop,Indian Restaurant,Irish Pub,Italian Restaurant,Japanese Restaurant,Jewish Restaurant,Juice Bar,Korean Restaurant,Latin American Restaurant,Mac & Cheese Joint,Malay Restaurant,Mediterranean Restaurant,Mexican Restaurant,Middle Eastern Restaurant,Modern European Restaurant,Molecular Gastronomy Restaurant,New American Restaurant,Noodle House,Persian Restaurant,Pizza Place,Portuguese Restaurant,Poutine Place,Ramen Restaurant,Restaurant,Salad Place,Sandwich Place,Seafood Restaurant,Snack Place,Soup Place,Southern / Soul Food Restaurant,Steakhouse,Sushi Restaurant,Taco Place,Taiwanese Restaurant,Tapas Restaurant,Tea Room,Thai Restaurant,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Wings Joint
0,Adelaide,0.0,0.065574,0.0,0.032787,0.0,0.0,0.032787,0.0,0.0,0.016393,0.032787,0.0,0.032787,0.016393,0.098361,0.0,0.0,0.0,0.114754,0.0,0.0,0.0,0.0,0.032787,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.016393,0.0,0.0,0.0,0.032787,0.0,0.016393,0.016393,0.0,0.0,0.016393,0.016393,0.0,0.016393,0.032787,0.0,0.0,0.0,0.016393,0.0,0.0,0.016393,0.0,0.0,0.016393,0.0,0.016393,0.016393,0.0,0.016393,0.0,0.016393,0.0,0.04918,0.0,0.0,0.016393,0.0,0.0,0.0,0.065574,0.032787,0.0,0.0,0.0,0.0,0.065574,0.016393,0.0,0.0
1,Berczy Park,0.0,0.0,0.0,0.0,0.0,0.035714,0.071429,0.035714,0.035714,0.0,0.0,0.0,0.035714,0.0,0.071429,0.0,0.0,0.0,0.142857,0.035714,0.035714,0.0,0.0,0.0,0.0,0.0,0.035714,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.035714,0.0,0.0,0.0,0.0,0.035714,0.0,0.0,0.0,0.0,0.035714,0.035714,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.035714,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.071429,0.0,0.0,0.071429,0.0,0.0,0.0,0.071429,0.0,0.0,0.0,0.0,0.035714,0.035714,0.0,0.0,0.0
2,Brockton,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.181818,0.0,0.0,0.090909,0.181818,0.0,0.090909,0.0,0.272727,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Business reply mail Processing Centre969 Eastern,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Cabbagetown,0.0,0.0,0.0,0.0,0.0,0.0,0.068966,0.0,0.0,0.0,0.034483,0.0,0.0,0.0,0.068966,0.0,0.034483,0.068966,0.137931,0.0,0.0,0.0,0.0,0.034483,0.0,0.0,0.034483,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.034483,0.0,0.0,0.0,0.0,0.0,0.0,0.068966,0.0,0.068966,0.034483,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.068966,0.0,0.0,0.0,0.137931,0.0,0.034483,0.0,0.034483,0.0,0.0,0.0,0.0,0.0,0.034483,0.0,0.0,0.0,0.0,0.0,0.0


In [212]:
toronto_grouped.shape

(62, 87)

#### Print each neighborhood along with the top 5 most common venues

In [213]:
num_top_venues = 5
# toronto_grouped=toronto_sum
for hood in toronto_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = toronto_grouped[toronto_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 5})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Adelaide----
                 venue     freq
0          Coffee Shop  0.11475
1                 Café  0.09836
2           Steakhouse  0.06557
3      Thai Restaurant  0.06557
4  American Restaurant  0.06557


----Berczy Park----
                venue     freq
0         Coffee Shop  0.14286
1                Café  0.07143
2          Restaurant  0.07143
3              Bakery  0.07143
4  Seafood Restaurant  0.07143


----Brockton----
                venue     freq
0         Coffee Shop  0.27273
1                Café  0.18182
2      Breakfast Spot  0.18182
3  Italian Restaurant  0.09091
4  Falafel Restaurant  0.09091


----Business reply mail Processing Centre969 Eastern----
                  venue  freq
0           Pizza Place  0.25
1         Burrito Place  0.25
2            Restaurant  0.25
3  Fast Food Restaurant  0.25
4             Irish Pub  0.00


----Cabbagetown----
                venue     freq
0          Restaurant  0.13793
1         Coffee Shop  0.13793
2                Café  0

In [214]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

#### Now let's create the new dataframe and display the top 10 venues for each neighborhood.

In [215]:
import numpy as np 
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = toronto_grouped['Neighborhood']

for ind in np.arange(toronto_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(toronto_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Adelaide,Coffee Shop,Café,Thai Restaurant,American Restaurant,Steakhouse,Restaurant,Deli / Bodega,Burger Joint,Breakfast Spot,Japanese Restaurant
1,Berczy Park,Coffee Shop,Restaurant,Steakhouse,Bakery,Café,Seafood Restaurant,Greek Restaurant,Irish Pub,Italian Restaurant,Diner
2,Brockton,Coffee Shop,Breakfast Spot,Café,Burrito Place,Italian Restaurant,Caribbean Restaurant,Falafel Restaurant,Wings Joint,Dim Sum Restaurant,Diner
3,Business reply mail Processing Centre969 Eastern,Pizza Place,Restaurant,Fast Food Restaurant,Burrito Place,Wings Joint,Dumpling Restaurant,Deli / Bodega,Dessert Shop,Dim Sum Restaurant,Diner
4,Cabbagetown,Coffee Shop,Restaurant,Chinese Restaurant,Pizza Place,Café,Bakery,Italian Restaurant,Indian Restaurant,Breakfast Spot,Diner


<a id='item4'></a>
## 4. Cluster Neighborhoods in Toronto

Run *k*-means to cluster the neighborhood into 5 clusters (based on Food Common Venues).

In [216]:
from sklearn.cluster import KMeans
# set number of clusters
kclusters = 5

toronto_grouped_clustering = toronto_sum.drop('Neighborhood', 1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(toronto_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10]
kmeans.labels_.size
# kmeans.labels_

70

dataframe that includes the cluster as well as the top 10 venues for each neighborhood.

In [217]:
toronto_merged = toronto_data[:70]

# # add clustering labels
toronto_merged['Cluster Labels'] = kmeans.labels_

# merge toronto_grouped with toronto_data to add latitude/longitude for each neighborhood
toronto_merged = toronto_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

toronto_merged.head() # check the last columns!

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  after removing the cwd from sys.path.


Unnamed: 0,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
M4E,East Toronto,The Beaches,43.676357,-79.293031,0,Coffee Shop,Wings Joint,Cuban Restaurant,Deli / Bodega,Dessert Shop,Dim Sum Restaurant,Diner,Doner Restaurant,Donut Shop,Dumpling Restaurant
M4K,East Toronto,The Danforth West,43.679557,-79.352188,0,Greek Restaurant,Coffee Shop,Ice Cream Shop,Italian Restaurant,Bubble Tea Shop,Diner,Pizza Place,Juice Bar,Caribbean Restaurant,Café
M4K,East Toronto,Riverdale,43.679557,-79.352188,0,Greek Restaurant,Coffee Shop,Ice Cream Shop,Italian Restaurant,Bubble Tea Shop,Diner,Pizza Place,Juice Bar,Caribbean Restaurant,Café
M4L,East Toronto,The Beaches West,43.668999,-79.315572,0,Sandwich Place,Sushi Restaurant,Ice Cream Shop,Italian Restaurant,Pizza Place,Steakhouse,Burrito Place,Burger Joint,Fast Food Restaurant,Fish & Chips Shop
M4L,East Toronto,India Bazaar,43.668999,-79.315572,0,Sandwich Place,Sushi Restaurant,Ice Cream Shop,Italian Restaurant,Pizza Place,Steakhouse,Burrito Place,Burger Joint,Fast Food Restaurant,Fish & Chips Shop


visualize the resulting clusters

In [218]:
import folium
import matplotlib.cm as cm
import matplotlib.colors as colors
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i+x+(i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(toronto_merged['Latitude'], toronto_merged['Longitude'], toronto_merged['Neighborhood'], toronto_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

## 5. Examine Clusters and specific distribution (Discussion)

In [219]:
clust0=toronto_merged.loc[toronto_merged['Cluster Labels'] == 0, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]
clust0

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
M4E,The Beaches,Coffee Shop,Wings Joint,Cuban Restaurant,Deli / Bodega,Dessert Shop,Dim Sum Restaurant,Diner,Doner Restaurant,Donut Shop,Dumpling Restaurant
M4K,The Danforth West,Greek Restaurant,Coffee Shop,Ice Cream Shop,Italian Restaurant,Bubble Tea Shop,Diner,Pizza Place,Juice Bar,Caribbean Restaurant,Café
M4K,Riverdale,Greek Restaurant,Coffee Shop,Ice Cream Shop,Italian Restaurant,Bubble Tea Shop,Diner,Pizza Place,Juice Bar,Caribbean Restaurant,Café
M4L,The Beaches West,Sandwich Place,Sushi Restaurant,Ice Cream Shop,Italian Restaurant,Pizza Place,Steakhouse,Burrito Place,Burger Joint,Fast Food Restaurant,Fish & Chips Shop
M4L,India Bazaar,Sandwich Place,Sushi Restaurant,Ice Cream Shop,Italian Restaurant,Pizza Place,Steakhouse,Burrito Place,Burger Joint,Fast Food Restaurant,Fish & Chips Shop
M4M,Studio District,Café,Coffee Shop,Italian Restaurant,American Restaurant,Gastropub,Bakery,New American Restaurant,Middle Eastern Restaurant,Latin American Restaurant,Diner
M4P,Davisville North,Burger Joint,Pizza Place,Sandwich Place,Breakfast Spot,Wings Joint,Deli / Bodega,Dessert Shop,Dim Sum Restaurant,Diner,Doner Restaurant
M4R,North Toronto West,Coffee Shop,Dessert Shop,Mexican Restaurant,Chinese Restaurant,Diner,Fast Food Restaurant,Café,Sandwich Place,Wings Joint,Donut Shop
M4S,Davisville,Dessert Shop,Sandwich Place,Italian Restaurant,Seafood Restaurant,Pizza Place,Café,Sushi Restaurant,Coffee Shop,Restaurant,Greek Restaurant
M4T,Summerhill East,,,,,,,,,,


Axillary function to calculate the distributions of specific columns against the sum of the rest

In [186]:
def least_freq (df, cols_to_keep):
    cols_to_keep.insert(0,'Neighborhood')

    c = df.columns.difference(cols_to_keep)
    df_sum = toronto_grouped[cols_to_keep].assign(Others=toronto_grouped[c].sum(axis=1), Total=toronto_grouped.sum(1))
    return df_sum
    

It might be helpful to view the distribution of specific restaraunt categories in the neighbourhood against all of Food categories in the same eighborhood with this simple code. Just pass the list of columns (Foursquare category labels)

In [221]:
# What 10 neighbourhood with lowerst frequency distribution of Italian restaraunts
cols_to_keep = ['Italian Restaurant','Pizza Place'] 
ital_sum = least_freq (toronto_grouped, cols_to_keep)
ital_sum = ital_sum.sort_values(by=['Italian Restaurant','Pizza Place']).reset_index(drop=True)
ital_sum.head(10)

Unnamed: 0,Neighborhood,Italian Restaurant,Pizza Place,Others,Total
0,Dovercourt Village,0.0,0.0,1.0,1.0
1,Dufferin,0.0,0.0,1.0,1.0
2,Forest Hill North,0.0,0.0,1.0,1.0
3,Forest Hill West,0.0,0.0,1.0,1.0
4,Lawrence Park,0.0,0.0,1.0,1.0
5,North Toronto West,0.0,0.0,1.0,1.0
6,Parkdale,0.0,0.0,1.0,1.0
7,Roncesvalles,0.0,0.0,1.0,1.0
8,The Beaches,0.0,0.0,1.0,1.0
9,Church and Wellesley,0.0,0.018868,0.981132,1.0


Finally it might be helpful to view the distribution (frequency) of specific restaurant categories against the most common restaurants in the cluster like this.

In [222]:
# join sum on cluster data
toronto_it=ital_sum.set_index('Neighborhood').join(clust0.set_index('Neighborhood'))
toronto_it

Unnamed: 0_level_0,Italian Restaurant,Pizza Place,Others,Total,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1
Adelaide,0.016393,0.016393,0.967213,1.0,Coffee Shop,Café,Thai Restaurant,American Restaurant,Steakhouse,Restaurant,Deli / Bodega,Burger Joint,Breakfast Spot,Japanese Restaurant
Berczy Park,0.035714,0.0,0.964286,1.0,,,,,,,,,,
Brockton,0.090909,0.0,0.909091,1.0,Coffee Shop,Breakfast Spot,Café,Burrito Place,Italian Restaurant,Caribbean Restaurant,Falafel Restaurant,Wings Joint,Dim Sum Restaurant,Diner
Business reply mail Processing Centre969 Eastern,0.0,0.25,0.75,1.0,,,,,,,,,,
Cabbagetown,0.068966,0.068966,0.862069,1.0,Coffee Shop,Restaurant,Chinese Restaurant,Pizza Place,Café,Bakery,Italian Restaurant,Indian Restaurant,Breakfast Spot,Diner
Central Bay Street,0.0625,0.015625,0.921875,1.0,Coffee Shop,Café,Italian Restaurant,Ice Cream Shop,Sandwich Place,Japanese Restaurant,Bubble Tea Shop,Burger Joint,Indian Restaurant,Salad Place
Chinatown,0.014286,0.014286,0.971429,1.0,Café,Vegetarian / Vegan Restaurant,Chinese Restaurant,Vietnamese Restaurant,Mexican Restaurant,Bakery,Dumpling Restaurant,Coffee Shop,Dessert Shop,Burger Joint
Christie,0.142857,0.0,0.857143,1.0,,,,,,,,,,
Church and Wellesley,0.0,0.018868,0.981132,1.0,Japanese Restaurant,Coffee Shop,Sushi Restaurant,Burger Joint,Restaurant,Mediterranean Restaurant,Bubble Tea Shop,Café,Gastropub,Fast Food Restaurant
Commerce Court,0.047619,0.015873,0.936508,1.0,Coffee Shop,Café,American Restaurant,Gastropub,Italian Restaurant,Restaurant,Deli / Bodega,Steakhouse,Burger Joint,Seafood Restaurant
