## Part 1: Data Preparation and Cleaning

For the Toronto neighbourhood data, we will scrape a Wikipedia page listing the neighbourhoods of Toronto and merge the result with an existing set of geographical coordinates for each neighbourhood. For the venue data (e.g. restaurants and coffee shops), we will extract the data from Foursquare through the Foursquare API. We will then merge everything into a single dataset for subsequent analysis.

In [1]:
#import modules
from bs4 import BeautifulSoup
import requests
import pandas as pd

**Part 1A: Scraping and preparation of Toronto Geographical Data using Beautiful Soup**

In [2]:
#scrape page from url with BeautifulSoup
url='https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'
url=requests.get(url).text
soup=BeautifulSoup(url,'lxml')
#print(soup.prettify())

In [3]:
#Inspection of hierarchy tree
soup=BeautifulSoup(url,'lxml')
print(soup.prettify())

<!DOCTYPE html>
<html class="client-nojs" dir="ltr" lang="en">
 <head>
  <meta charset="utf-8"/>
  <title>
   List of postal codes of Canada: M - Wikipedia
  </title>
  <script>
   document.documentElement.className="client-js";RLCONF={"wgBreakFrames":!1,"wgSeparatorTransformTable":["",""],"wgDigitTransformTable":["",""],"wgDefaultDateFormat":"dmy","wgMonthNames":["","January","February","March","April","May","June","July","August","September","October","November","December"],"wgRequestId":"201e9f5e-99d5-4e51-a21f-c0098a1cdb3f","wgCSPNonce":!1,"wgCanonicalNamespace":"","wgCanonicalSpecialPageName":!1,"wgNamespaceNumber":0,"wgPageName":"List_of_postal_codes_of_Canada:_M","wgTitle":"List of postal codes of Canada: M","wgCurRevisionId":949497198,"wgRevisionId":949497198,"wgArticleId":539066,"wgIsArticle":!0,"wgIsRedirect":!1,"wgAction":"view","wgUserName":null,"wgUserGroups":["*"],"wgCategories":["Articles with short description","Communications in Ontario","Postal codes in Canada","Toron

In [4]:
#Extract table with BeautifulSoup
table=soup.find('tbody')
table=table.find_all('td')
df=pd.DataFrame(table)

In [5]:
#Create Postalcode dataframe
postalcode=df.loc[::3,0]
postalcode=postalcode.astype(str)
postalcode=postalcode.str.replace('<td>','')
postalcode=postalcode.str.replace('\n</td>','')
postalcode=postalcode.reset_index()
#display only: the result is not assigned, so 'postalcode' keeps its 'index' column, which is dropped in In [8]
postalcode.rename(columns={0:'PostalCode'}).drop('index',axis=1)

Unnamed: 0,PostalCode
0,M1A
1,M2A
2,M3A
3,M4A
4,M5A
5,M6A
6,M7A
7,M8A
8,M9A
9,M1B


In [6]:
#Create Borough dataframe
borough=df.loc[1::3,0]
borough=borough.astype(str)
borough=borough.str.replace('<td>','')
borough=borough.str.replace('\n</td>','')
borough=borough.reset_index()
#display only: the result is not assigned, so 'borough' keeps its 'index' column, which is dropped in In [8]
borough.rename(columns={0:'Borough'}).drop('index',axis=1)

Unnamed: 0,Borough
0,Not assigned
1,Not assigned
2,North York
3,North York
4,Downtown Toronto
5,North York
6,Downtown Toronto
7,Not assigned
8,Etobicoke
9,Scarborough


In [7]:
#Create Neighbourhood dataframe
neighborhood=df.loc[2::3,0]
neighborhood=neighborhood.astype(str)
neighborhood=neighborhood.str.replace('<td>','')
neighborhood=neighborhood.str.replace('\n</td>','')
neighborhood=neighborhood.reset_index()
#display only: the result is not assigned, so 'neighborhood' keeps its 'index' column, which is dropped in In [8]
neighborhood.rename(columns={0:'Neighborhood'}).drop('index',axis=1)

Unnamed: 0,Neighborhood
0,
1,
2,Parkwoods
3,Victoria Village
4,Regent Park / Harbourfront
5,Lawrence Manor / Lawrence Heights
6,Queen's Park / Ontario Provincial Government
7,
8,Islington Avenue
9,Malvern / Rouge


In [8]:
#Create Dataframe
Toronto=pd.concat([postalcode,borough,neighborhood],axis=1)
Toronto=Toronto.drop('index',axis=1)
Toronto

Unnamed: 0,0,0.1,0.2
0,M1A,Not assigned,
1,M2A,Not assigned,
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,Regent Park / Harbourfront
5,M6A,North York,Lawrence Manor / Lawrence Heights
6,M7A,Downtown Toronto,Queen's Park / Ontario Provincial Government
7,M8A,Not assigned,
8,M9A,Etobicoke,Islington Avenue
9,M1B,Scarborough,Malvern / Rouge


In [9]:
#Name columns as PostalCode, Borough and Neighborhood
Toronto.columns=['PostalCode','Borough','Neighborhood']
Toronto

Unnamed: 0,PostalCode,Borough,Neighborhood
0,M1A,Not assigned,
1,M2A,Not assigned,
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,Regent Park / Harbourfront
5,M6A,North York,Lawrence Manor / Lawrence Heights
6,M7A,Downtown Toronto,Queen's Park / Ontario Provincial Government
7,M8A,Not assigned,
8,M9A,Etobicoke,Islington Avenue
9,M1B,Scarborough,Malvern / Rouge
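
The slice, clean, concatenate and rename steps in In [5] to In [9] can also be sketched in a single pass. This is only a minimal sketch, not the method used in this notebook: it assumes the cell texts have already been pulled out of the `td` tags (for example with BeautifulSoup's `get_text(strip=True)`), and the `cells` list is a small hand-made sample rather than the scraped data.

```python
import pandas as pd

# Hypothetical sample of flattened table cells, in row order:
# postal code, borough, neighbourhood, postal code, borough, ...
cells = ['M1A', 'Not assigned', '',
         'M3A', 'North York', 'Parkwoods',
         'M4A', 'North York', 'Victoria Village']

# Group every three consecutive cells into one table row
rows = [cells[i:i + 3] for i in range(0, len(cells), 3)]

# Name the columns at construction time instead of renaming afterwards
toronto_sample = pd.DataFrame(rows, columns=['PostalCode', 'Borough', 'Neighborhood'])
print(toronto_sample)
```

Grouping by position relies on every table row having exactly three cells, which is the same assumption the `[::3]` slices make.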


In [10]:
#remove rows with Borough as 'Not assigned'
Toronto=Toronto[Toronto['Borough']!='Not assigned']
Toronto

Unnamed: 0,PostalCode,Borough,Neighborhood
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,Regent Park / Harbourfront
5,M6A,North York,Lawrence Manor / Lawrence Heights
6,M7A,Downtown Toronto,Queen's Park / Ontario Provincial Government
8,M9A,Etobicoke,Islington Avenue
9,M1B,Scarborough,Malvern / Rouge
11,M3B,North York,Don Mills
12,M4B,East York,Parkview Hill / Woodbine Gardens
13,M5B,Downtown Toronto,"Garden District, Ryerson"


In [11]:
Toronto.shape

(103, 3)

The dataframe comprising postal code, borough and neighborhood is then merged with the geographical coordinates of each postal code from the file at http://cocl.us/Geospatial_data

In [12]:
geodata=pd.read_csv('Geospatial_Coordinates.csv')
geodata.head()

Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


In [13]:
geodata=geodata.rename(columns={'Postal Code':'PostalCode'})

In [14]:
#check data types prior to merging
geodata.dtypes

PostalCode     object
Latitude      float64
Longitude     float64
dtype: object

In [15]:
geodata.shape

(103, 3)

In [16]:
#check data types prior to merging
Toronto.dtypes

PostalCode      object
Borough         object
Neighborhood    object
dtype: object

In [17]:
Toronto.shape

(103, 3)

In [18]:
#merge 'Toronto' and 'geodata' on the shared 'PostalCode' column ('Postal Code' in 'geodata' was renamed in In [13])
TorontoV2=pd.merge(Toronto,geodata,on='PostalCode')
Toronto=TorontoV2
#Toronto.to_csv('geographical.csv')
Toronto.head(20)

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.753259,-79.329656
1,M4A,North York,Victoria Village,43.725882,-79.315572
2,M5A,Downtown Toronto,Regent Park / Harbourfront,43.65426,-79.360636
3,M6A,North York,Lawrence Manor / Lawrence Heights,43.718518,-79.464763
4,M7A,Downtown Toronto,Queen's Park / Ontario Provincial Government,43.662301,-79.389494
5,M9A,Etobicoke,Islington Avenue,43.667856,-79.532242
6,M1B,Scarborough,Malvern / Rouge,43.806686,-79.194353
7,M3B,North York,Don Mills,43.745906,-79.352188
8,M4B,East York,Parkview Hill / Woodbine Gardens,43.706397,-79.309937
9,M5B,Downtown Toronto,"Garden District, Ryerson",43.657162,-79.378937
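
Comparing the two shapes before merging is one way to check that no postal code is lost. A complementary check, sketched below on hand-made sample frames (not the real `Toronto` and `geodata`), is to let `pd.merge` report unmatched keys through its `indicator` and `validate` parameters.

```python
import pandas as pd

# Hypothetical sample frames; the real ones are 'Toronto' and 'geodata'
left = pd.DataFrame({'PostalCode': ['M3A', 'M4A', 'M5A'],
                     'Borough': ['North York', 'North York', 'Downtown Toronto']})
right = pd.DataFrame({'PostalCode': ['M3A', 'M4A', 'M1B'],
                      'Latitude': [43.753259, 43.725882, 43.806686],
                      'Longitude': [-79.329656, -79.315572, -79.194353]})

# how='left' keeps every row of 'left'; indicator=True adds a '_merge' column
# saying whether a match was found; validate raises if a postal code repeats
checked = pd.merge(left, right, on='PostalCode', how='left',
                   indicator=True, validate='one_to_one')

# Rows of 'left' with no coordinates in 'right'
unmatched = checked[checked['_merge'] == 'left_only']
print(unmatched['PostalCode'].tolist())  # ['M5A']
```

An empty `unmatched` frame on the real data would confirm that every postal code received coordinates.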


Visualisation of Toronto Neighborhood Geographical Data

In [19]:
conda install -c conda-forge folium

Collecting package metadata (current_repodata.json): ...working... done
Solving environment: ...working... done

# All requested packages already installed.


Note: you may need to restart the kernel to use updated packages.


In [20]:
pip install geopy

Note: you may need to restart the kernel to use updated packages.


In [21]:
import folium 
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)
from geopy.geocoders import Nominatim # To convert address into latitude and longitude
import matplotlib.pyplot as plt # Library to handle visualizations
import matplotlib.cm as cm
import matplotlib.colors as colors
import json
from pandas import json_normalize # Flattens json data into a pandas dataframe (pandas>=1.0; older versions use pandas.io.json)

In [22]:
# Using geopy library to get the coordinates of Toronto
address = 'Toronto,TO'
geolocator = Nominatim(user_agent='to_explorer')
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print(f"The coordinates of Toronto are {latitude},{longitude}")

The coordinates of Toronto are 43.6534817,-79.3839347


In [23]:
map_toronto = folium.Map(location=[latitude,longitude], zoom_start=10)

for lat, lng, label in zip(Toronto['Latitude'],Toronto['Longitude'],Toronto['Neighborhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat,lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#ffffff',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto)
    
    
map_toronto

**Part 1B: Obtain and prepare venue data of Toronto Neighborhoods through Foursquare API**

In [24]:
CLIENT_ID='CW4C5MHLZSAP1GUYXQ4GP51RONXVZYQFYPSHPKVUEQHJGJEB'

In [25]:
CLIENT_SECRET='MST4QE550NQHIPHSWM1P2BXW3T4PWU0ODUJWQIYWMURX34V3'

In [26]:
VERSION = '20180605'
LIMIT=100

*Explore venues in all neighbourhoods in Toronto*

In [27]:
def getNearbyVenues(names, latitudes, longitudes, radius=1000):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighbourhood', 
                  'Neighbourhood Latitude', 
                  'Neighbourhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
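
The function above indexes straight into the response, so a venue with an empty `categories` list or a response without `groups` would raise an error partway through the loop. Whether Foursquare actually returns such responses is an assumption here. The sketch below shows one defensive way to flatten a response; `parse_explore_items` is a hypothetical helper, and the payload is hand-made to mimic only the fields that `getNearbyVenues` reads.

```python
def parse_explore_items(payload, name, lat, lng):
    """Flatten one 'explore' response into rows, skipping venues without a category."""
    groups = payload.get('response', {}).get('groups', [])
    items = groups[0].get('items', []) if groups else []
    rows = []
    for item in items:
        venue = item.get('venue', {})
        location = venue.get('location', {})
        categories = venue.get('categories', [])
        if not categories:
            continue  # no category to report, so skip the venue
        rows.append((name, lat, lng, venue.get('name'),
                     location.get('lat'), location.get('lng'),
                     categories[0].get('name')))
    return rows

# Hand-made payload mimicking the fields read in getNearbyVenues
sample = {'response': {'groups': [{'items': [
    {'venue': {'name': 'Brookbanks Park',
               'location': {'lat': 43.751976, 'lng': -79.33214},
               'categories': [{'name': 'Park'}]}},
    {'venue': {'name': 'Uncategorised place',
               'location': {'lat': 43.75, 'lng': -79.33},
               'categories': []}},
]}]}}

rows = parse_explore_items(sample, 'Parkwoods', 43.753259, -79.329656)
print(rows)
# An empty payload yields no rows instead of raising KeyError
print(parse_explore_items({}, 'Parkwoods', 43.753259, -79.329656))
```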

In [28]:
#Explore venues for Toronto neighbourhoods
Toronto_venues = getNearbyVenues(names=Toronto['Neighborhood'],
                                   latitudes=Toronto['Latitude'],
                                   longitudes=Toronto['Longitude']
                                  )

Parkwoods
Victoria Village
Regent Park / Harbourfront
Lawrence Manor / Lawrence Heights
Queen's Park / Ontario Provincial Government
Islington Avenue
Malvern / Rouge
Don Mills
Parkview Hill / Woodbine Gardens
Garden District, Ryerson
Glencairn
West Deane Park / Princess Gardens / Martin Grove / Islington / Cloverdale
Rouge Hill / Port Union / Highland Creek
Don Mills
Woodbine Heights
St. James Town
Humewood-Cedarvale
Eringate / Bloordale Gardens / Old Burnhamthorpe / Markland Wood
Guildwood / Morningside / West Hill
The Beaches
Berczy Park
Caledonia-Fairbanks
Woburn
Leaside
Central Bay Street
Christie
Cedarbrae
Hillcrest Village
Bathurst Manor / Wilson Heights / Downsview North
Thorncliffe Park
Richmond / Adelaide / King
Dufferin / Dovercourt Village
Scarborough Village
Fairview / Henry Farm / Oriole
Northwood Park / York University
East Toronto
Harbourfront East / Union Station / Toronto Islands
Little Portugal / Trinity
Kennedy Park / Ionview / East Birchmount Park
Bayview Village
Do

In [29]:
#results of the above were saved as a csv file for stability and ease of retrieval
#Toronto_venues.to_csv('venues.csv',index=False)
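
Saving the API results and reloading them by hand can also be wrapped in a small caching helper. The sketch below is one possible pattern, not what this notebook does: `load_or_build` and `fake_fetch` are hypothetical names, and a temporary directory stands in for the working folder.

```python
import os
import tempfile
import pandas as pd

def load_or_build(path, build):
    """Return the cached CSV at 'path' if present, else call build() and cache it."""
    if os.path.exists(path):
        return pd.read_csv(path)
    frame = build()
    frame.to_csv(path, index=False)
    return frame

calls = []

def fake_fetch():
    # Stand-in for the slow API call; records each invocation
    calls.append(1)
    return pd.DataFrame({'Neighbourhood': ['Parkwoods'], 'Venue': ['Brookbanks Park']})

with tempfile.TemporaryDirectory() as tmp:
    cache = os.path.join(tmp, 'venues_sample.csv')
    first = load_or_build(cache, fake_fetch)   # builds and writes the file
    second = load_or_build(cache, fake_fetch)  # reads the file, no second fetch

print(len(calls))  # 1
```

Writing with `index=False` avoids the extra `Unnamed: 0` column that appears when a saved index is read back.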

In [30]:
Toronto_venues=pd.read_csv('venues.csv')
print(Toronto_venues.shape)
Toronto_venues.head()

(4916, 7)


Unnamed: 0,Neighbourhood,Neighbourhood Latitude,Neighbourhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Parkwoods,43.753259,-79.329656,Allwyn's Bakery,43.75984,-79.324719,Caribbean Restaurant
1,Parkwoods,43.753259,-79.329656,Brookbanks Park,43.751976,-79.33214,Park
2,Parkwoods,43.753259,-79.329656,Tim Hortons,43.760668,-79.326368,Café
3,Parkwoods,43.753259,-79.329656,A&W,43.760643,-79.326865,Fast Food Restaurant
4,Parkwoods,43.753259,-79.329656,Bruno's valu-mart,43.746143,-79.32463,Grocery Store


We need to understand the variety of Southeast Asian restaurants and coffee outlets in Toronto.

In [31]:
#understanding the type of venues there are in Toronto
unique_venues=Toronto_venues['Venue Category'].unique()
unique_venues=pd.DataFrame(unique_venues)
print(unique_venues.head())
unique_venues.sort_values(0,ascending=True)

                      0
0  Caribbean Restaurant
1                  Park
2                  Café
3  Fast Food Restaurant
4         Grocery Store


Unnamed: 0,0
86,Accessories Store
215,Afghan Restaurant
242,Airport
316,Airport Lounge
144,American Restaurant
296,Amphitheater
53,Animal Shelter
274,Antique Shop
229,Aquarium
125,Art Gallery


From inspection of the venues above, the coffee outlets in the Toronto data are categorised as (1) Coffee Shop and (2) Café.

In [32]:
#Identifying the types of restaurants present in Toronto
restaurants=unique_venues[unique_venues[0].str.contains('Restaurant')]
print(restaurants)

                                 0
0             Caribbean Restaurant
3             Fast Food Restaurant
15              Chinese Restaurant
24           Portuguese Restaurant
34                      Restaurant
40        Mediterranean Restaurant
43              Italian Restaurant
44               French Restaurant
49              Mexican Restaurant
50                 Thai Restaurant
55                Asian Restaurant
61               German Restaurant
63       Middle Eastern Restaurant
74                Sushi Restaurant
77            Pakistani Restaurant
78               Indian Restaurant
80           Vietnamese Restaurant
82                Greek Restaurant
84              Seafood Restaurant
93               Korean Restaurant
102               Ramen Restaurant
106  Vegetarian / Vegan Restaurant
108               Theme Restaurant
111     Modern European Restaurant
122            Japanese Restaurant
128               Tapas Restaurant
129             Falafel Restaurant
144            Ameri

In [115]:
restaurants.shape

(62, 1)

From inspection of the values above, we identify the following types of Southeast Asian restaurants: (1) Thai, (2) Vietnamese, (3) Indonesian, (4) Malay and (5) Filipino.

**Part 1C: Obtain the total count of restaurants in each Toronto neighbourhood**

In [33]:
Toronto_restaurants=Toronto_venues[Toronto_venues['Venue Category'].str.contains('Restaurant')]
Toronto_restaurants=Toronto_restaurants.drop(['Venue','Venue Latitude','Venue Longitude'],axis=1)
Toronto_restaurants=Toronto_restaurants.reset_index().drop('index',axis=1)
#Toronto_restaurants.to_csv('Toronto_restaurants_list.csv')
Toronto_restaurants.head()

Unnamed: 0,Neighbourhood,Neighbourhood Latitude,Neighbourhood Longitude,Venue Category
0,Parkwoods,43.753259,-79.329656,Caribbean Restaurant
1,Parkwoods,43.753259,-79.329656,Fast Food Restaurant
2,Parkwoods,43.753259,-79.329656,Chinese Restaurant
3,Victoria Village,43.725882,-79.315572,Portuguese Restaurant
4,Regent Park / Harbourfront,43.65426,-79.360636,Restaurant


In [34]:
Toronto_restaurants=pd.read_csv('Toronto_restaurants_list.csv')
Toronto_restaurants.head()

Unnamed: 0.1,Unnamed: 0,Neighbourhood,Neighbourhood Latitude,Neighbourhood Longitude,Venue Category
0,0,Parkwoods,43.753259,-79.329656,Caribbean Restaurant
1,1,Parkwoods,43.753259,-79.329656,Fast Food Restaurant
2,2,Parkwoods,43.753259,-79.329656,Chinese Restaurant
3,3,Victoria Village,43.725882,-79.315572,Portuguese Restaurant
4,4,Regent Park / Harbourfront,43.65426,-79.360636,Restaurant


In [35]:
Toronto_restaurants2=pd.get_dummies(Toronto_restaurants['Venue Category'])
Toronto_restaurants2.head()
Toronto_restaurants2['Count']=Toronto_restaurants2.sum(axis=1)
Toronto_restaurants2.head()

Unnamed: 0,Afghan Restaurant,American Restaurant,Asian Restaurant,Belgian Restaurant,Brazilian Restaurant,Cajun / Creole Restaurant,Cantonese Restaurant,Caribbean Restaurant,Chinese Restaurant,Comfort Food Restaurant,Cuban Restaurant,Dim Sum Restaurant,Doner Restaurant,Dumpling Restaurant,Eastern European Restaurant,Empanada Restaurant,Ethiopian Restaurant,Falafel Restaurant,Fast Food Restaurant,Filipino Restaurant,French Restaurant,German Restaurant,Greek Restaurant,Hakka Restaurant,Hawaiian Restaurant,Hong Kong Restaurant,Hotpot Restaurant,Indian Chinese Restaurant,Indian Restaurant,Indonesian Restaurant,Italian Restaurant,Japanese Restaurant,Jewish Restaurant,Korean Restaurant,Latin American Restaurant,Malay Restaurant,Mediterranean Restaurant,Mexican Restaurant,Middle Eastern Restaurant,Modern European Restaurant,Moroccan Restaurant,New American Restaurant,Pakistani Restaurant,Persian Restaurant,Portuguese Restaurant,Ramen Restaurant,Restaurant,Seafood Restaurant,Shanghai Restaurant,South American Restaurant,Sri Lankan Restaurant,Sushi Restaurant,Syrian Restaurant,Taiwanese Restaurant,Tapas Restaurant,Thai Restaurant,Theme Restaurant,Tibetan Restaurant,Turkish Restaurant,Udon Restaurant,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Count
0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1
1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1
2,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1
3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1
4,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1


In [36]:
Toronto_neighbourhood=Toronto_restaurants[['Neighbourhood']]
Toronto_restaurants=pd.concat([Toronto_neighbourhood,Toronto_restaurants2],axis=1)
#Toronto_restaurants.to_csv('restaurants_oneshot.csv')
Toronto_restaurants.head()

Unnamed: 0,Neighbourhood,Afghan Restaurant,American Restaurant,Asian Restaurant,Belgian Restaurant,Brazilian Restaurant,Cajun / Creole Restaurant,Cantonese Restaurant,Caribbean Restaurant,Chinese Restaurant,Comfort Food Restaurant,Cuban Restaurant,Dim Sum Restaurant,Doner Restaurant,Dumpling Restaurant,Eastern European Restaurant,Empanada Restaurant,Ethiopian Restaurant,Falafel Restaurant,Fast Food Restaurant,Filipino Restaurant,French Restaurant,German Restaurant,Greek Restaurant,Hakka Restaurant,Hawaiian Restaurant,Hong Kong Restaurant,Hotpot Restaurant,Indian Chinese Restaurant,Indian Restaurant,Indonesian Restaurant,Italian Restaurant,Japanese Restaurant,Jewish Restaurant,Korean Restaurant,Latin American Restaurant,Malay Restaurant,Mediterranean Restaurant,Mexican Restaurant,Middle Eastern Restaurant,Modern European Restaurant,Moroccan Restaurant,New American Restaurant,Pakistani Restaurant,Persian Restaurant,Portuguese Restaurant,Ramen Restaurant,Restaurant,Seafood Restaurant,Shanghai Restaurant,South American Restaurant,Sri Lankan Restaurant,Sushi Restaurant,Syrian Restaurant,Taiwanese Restaurant,Tapas Restaurant,Thai Restaurant,Theme Restaurant,Tibetan Restaurant,Turkish Restaurant,Udon Restaurant,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Count
0,Parkwoods,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1
1,Parkwoods,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1
2,Parkwoods,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1
3,Victoria Village,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1
4,Regent Park / Harbourfront,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1


In [37]:
Toronto_restaurants_counts=pd.DataFrame(Toronto_restaurants.groupby('Neighbourhood')['Count'].sum())

In [38]:
Toronto_restaurants_counts

Unnamed: 0_level_0,Count
Neighbourhood,Unnamed: 1_level_1
Agincourt,22
Alderwood / Long Branch,1
Bathurst Manor / Wilson Heights / Downsview North,4
Bayview Village,4
Bedford Park / Lawrence Manor East,13
Berczy Park,20
Birch Cliff / Cliffside West,2
Brockton / Parkdale Village / Exhibition Place,25
Business reply mail Processing CentrE,10
Caledonia-Fairbanks,5


In [39]:
TRC=Toronto_restaurants_counts
TRC=TRC.reset_index()

In [40]:
#TRC.to_csv('TRC.csv')

In [41]:
TRC=pd.read_csv('TRC.csv')
TRC=TRC.set_index('Neighbourhood')
TRC

Unnamed: 0_level_0,Total Restaurant Count
Neighbourhood,Unnamed: 1_level_1
Agincourt,22
Alderwood / Long Branch,1
Bathurst Manor / Wilson Heights / Downsview North,4
Bayview Village,4
Bedford Park / Lawrence Manor East,13
Berczy Park,20
Birch Cliff / Cliffside West,2
Brockton / Parkdale Village / Exhibition Place,25
Business reply mail Processing CentrE,10
Caledonia-Fairbanks,5


In [42]:
TRC.shape

(90, 1)

**Part 1D: Obtain the total count of ethnic Southeast Asian restaurants (Indonesian/Filipino/Malay/Thai/Vietnamese) in each Toronto neighbourhood**

In [43]:
Toronto_restaurants.head()

Unnamed: 0,Neighbourhood,Afghan Restaurant,American Restaurant,Asian Restaurant,Belgian Restaurant,Brazilian Restaurant,Cajun / Creole Restaurant,Cantonese Restaurant,Caribbean Restaurant,Chinese Restaurant,Comfort Food Restaurant,Cuban Restaurant,Dim Sum Restaurant,Doner Restaurant,Dumpling Restaurant,Eastern European Restaurant,Empanada Restaurant,Ethiopian Restaurant,Falafel Restaurant,Fast Food Restaurant,Filipino Restaurant,French Restaurant,German Restaurant,Greek Restaurant,Hakka Restaurant,Hawaiian Restaurant,Hong Kong Restaurant,Hotpot Restaurant,Indian Chinese Restaurant,Indian Restaurant,Indonesian Restaurant,Italian Restaurant,Japanese Restaurant,Jewish Restaurant,Korean Restaurant,Latin American Restaurant,Malay Restaurant,Mediterranean Restaurant,Mexican Restaurant,Middle Eastern Restaurant,Modern European Restaurant,Moroccan Restaurant,New American Restaurant,Pakistani Restaurant,Persian Restaurant,Portuguese Restaurant,Ramen Restaurant,Restaurant,Seafood Restaurant,Shanghai Restaurant,South American Restaurant,Sri Lankan Restaurant,Sushi Restaurant,Syrian Restaurant,Taiwanese Restaurant,Tapas Restaurant,Thai Restaurant,Theme Restaurant,Tibetan Restaurant,Turkish Restaurant,Udon Restaurant,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Count
0,Parkwoods,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1
1,Parkwoods,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1
2,Parkwoods,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1
3,Victoria Village,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1
4,Regent Park / Harbourfront,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1


In [44]:
neighbourhood=Toronto_restaurants['Neighbourhood']
Toronto_SEA=Toronto_restaurants.set_index('Neighbourhood')
Toronto_SEA=Toronto_SEA.drop('Count',axis=1)

In [45]:
Toronto_SEA.head()

Unnamed: 0_level_0,Afghan Restaurant,American Restaurant,Asian Restaurant,Belgian Restaurant,Brazilian Restaurant,Cajun / Creole Restaurant,Cantonese Restaurant,Caribbean Restaurant,Chinese Restaurant,Comfort Food Restaurant,Cuban Restaurant,Dim Sum Restaurant,Doner Restaurant,Dumpling Restaurant,Eastern European Restaurant,Empanada Restaurant,Ethiopian Restaurant,Falafel Restaurant,Fast Food Restaurant,Filipino Restaurant,French Restaurant,German Restaurant,Greek Restaurant,Hakka Restaurant,Hawaiian Restaurant,Hong Kong Restaurant,Hotpot Restaurant,Indian Chinese Restaurant,Indian Restaurant,Indonesian Restaurant,Italian Restaurant,Japanese Restaurant,Jewish Restaurant,Korean Restaurant,Latin American Restaurant,Malay Restaurant,Mediterranean Restaurant,Mexican Restaurant,Middle Eastern Restaurant,Modern European Restaurant,Moroccan Restaurant,New American Restaurant,Pakistani Restaurant,Persian Restaurant,Portuguese Restaurant,Ramen Restaurant,Restaurant,Seafood Restaurant,Shanghai Restaurant,South American Restaurant,Sri Lankan Restaurant,Sushi Restaurant,Syrian Restaurant,Taiwanese Restaurant,Tapas Restaurant,Thai Restaurant,Theme Restaurant,Tibetan Restaurant,Turkish Restaurant,Udon Restaurant,Vegetarian / Vegan Restaurant,Vietnamese Restaurant
Neighbourhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1,Unnamed: 22_level_1,Unnamed: 23_level_1,Unnamed: 24_level_1,Unnamed: 25_level_1,Unnamed: 26_level_1,Unnamed: 27_level_1,Unnamed: 28_level_1,Unnamed: 29_level_1,Unnamed: 30_level_1,Unnamed: 31_level_1,Unnamed: 32_level_1,Unnamed: 33_level_1,Unnamed: 34_level_1,Unnamed: 35_level_1,Unnamed: 36_level_1,Unnamed: 37_level_1,Unnamed: 38_level_1,Unnamed: 39_level_1,Unnamed: 40_level_1,Unnamed: 41_level_1,Unnamed: 42_level_1,Unnamed: 43_level_1,Unnamed: 44_level_1,Unnamed: 45_level_1,Unnamed: 46_level_1,Unnamed: 47_level_1,Unnamed: 48_level_1,Unnamed: 49_level_1,Unnamed: 50_level_1,Unnamed: 51_level_1,Unnamed: 52_level_1,Unnamed: 53_level_1,Unnamed: 54_level_1,Unnamed: 55_level_1,Unnamed: 56_level_1,Unnamed: 57_level_1,Unnamed: 58_level_1,Unnamed: 59_level_1,Unnamed: 60_level_1,Unnamed: 61_level_1,Unnamed: 62_level_1
Parkwoods,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
Parkwoods,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
Parkwoods,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
Victoria Village,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
Regent Park / Harbourfront,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


In [46]:
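#.copy() so that adding 'SEA Count' modifies a new dataframe rather than a slice of the original
Toronto_SEA=Toronto_SEA[['Malay Restaurant','Indonesian Restaurant','Thai Restaurant','Filipino Restaurant','Vietnamese Restaurant']].copy()
Toronto_SEA['SEA Count']=Toronto_SEA.sum(axis=1)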
Toronto_SEA.head()

Unnamed: 0_level_0,Malay Restaurant,Indonesian Restaurant,Thai Restaurant,Filipino Restaurant,Vietnamese Restaurant,SEA Count
Neighbourhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Parkwoods,0,0,0,0,0,0
Parkwoods,0,0,0,0,0,0
Parkwoods,0,0,0,0,0,0
Victoria Village,0,0,0,0,0,0
Regent Park / Harbourfront,0,0,0,0,0,0


In [47]:
Toronto_SEA_counts=pd.DataFrame(Toronto_SEA.groupby('Neighbourhood')['SEA Count'].sum())
Toronto_SEA_counts

Unnamed: 0_level_0,SEA Count
Neighbourhood,Unnamed: 1_level_1
Agincourt,2
Alderwood / Long Branch,0
Bathurst Manor / Wilson Heights / Downsview North,0
Bayview Village,0
Bedford Park / Lawrence Manor East,1
Berczy Park,1
Birch Cliff / Cliffside West,1
Brockton / Parkdale Village / Exhibition Place,0
Business reply mail Processing CentrE,1
Caledonia-Fairbanks,0
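
The same per-neighbourhood count can be sketched without the one-hot table by flagging each restaurant row with `isin` and summing the flags. This is an alternative sketch on hand-made sample rows, not the notebook's method; the category list is the one identified in Part 1B.

```python
import pandas as pd

SEA_CATEGORIES = ['Malay Restaurant', 'Indonesian Restaurant', 'Thai Restaurant',
                  'Filipino Restaurant', 'Vietnamese Restaurant']

# Hypothetical sample of restaurant rows
sample = pd.DataFrame({
    'Neighbourhood': ['Agincourt', 'Agincourt', 'Agincourt', 'Bayview Village'],
    'Venue Category': ['Thai Restaurant', 'Vietnamese Restaurant',
                       'Chinese Restaurant', 'Japanese Restaurant'],
})

# Flag each row, then sum the flags so neighbourhoods with no match stay at 0
is_sea = sample['Venue Category'].isin(SEA_CATEGORIES)
sea_counts = is_sea.groupby(sample['Neighbourhood']).sum().rename('SEA Count')

print(sea_counts.to_dict())
```

Summing flags rather than filtering first keeps neighbourhoods with zero Southeast Asian restaurants in the result, matching the 90-row shape above.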


In [48]:
Toronto_SEA_counts.shape

(90, 1)

**Part 1E: Obtain the total count of coffee shops and cafes in each Toronto neighbourhood**

In [49]:
Toronto_coffee=Toronto_venues[Toronto_venues['Venue Category'].str.contains('Coffee|Café')]
Toronto_coffee=Toronto_coffee.drop(['Venue','Venue Latitude','Venue Longitude'],axis=1)
Toronto_coffee.head()
#Toronto_coffee.to_csv('coffee.csv')

Unnamed: 0,Neighbourhood,Neighbourhood Latitude,Neighbourhood Longitude,Venue Category
2,Parkwoods,43.753259,-79.329656,Café
19,Parkwoods,43.753259,-79.329656,Coffee Shop
30,Victoria Village,43.725882,-79.315572,Coffee Shop
33,Victoria Village,43.725882,-79.315572,Coffee Shop
42,Regent Park / Harbourfront,43.65426,-79.360636,Coffee Shop


In [50]:
neighbourhood_coffee=Toronto_coffee['Neighbourhood']
Toronto_coffee=pd.get_dummies(Toronto_coffee['Venue Category'])
Toronto_coffee.head()
Toronto_coffee['Coffee Count']=Toronto_coffee.sum(axis=1)
Toronto_coffee.head()

Unnamed: 0,Café,Coffee Shop,Coffee Count
2,1,0,1
19,0,1,1
30,0,1,1
33,0,1,1
42,0,1,1


In [51]:
Toronto_coffee=pd.concat([neighbourhood_coffee,Toronto_coffee],axis=1)
Toronto_coffee

Unnamed: 0,Neighbourhood,Café,Coffee Shop,Coffee Count
2,Parkwoods,1,0,1
19,Parkwoods,0,1,1
30,Victoria Village,0,1,1
33,Victoria Village,0,1,1
42,Regent Park / Harbourfront,0,1,1
48,Regent Park / Harbourfront,0,1,1
50,Regent Park / Harbourfront,0,1,1
55,Regent Park / Harbourfront,0,1,1
58,Regent Park / Harbourfront,0,1,1
61,Regent Park / Harbourfront,0,1,1


In [52]:
Toronto_coffee_counts=pd.DataFrame(Toronto_coffee.groupby('Neighbourhood')['Coffee Count'].sum())
Toronto_coffee_counts=Toronto_coffee_counts.reset_index()
Toronto_coffee_counts.head()

Unnamed: 0,Neighbourhood,Coffee Count
0,Agincourt,2
1,Alderwood / Long Branch,1
2,Bathurst Manor / Wilson Heights / Downsview North,2
3,Bayview Village,1
4,Bedford Park / Lawrence Manor East,4


In [53]:
Toronto_coffee_counts.shape

(90, 2)

**Part 1F: Merge the data on total restaurant counts, total ethnic Southeast Asian restaurant counts, and total coffee shops and cafes in each Toronto neighbourhood into a single dataframe**

In [54]:
Project_data=pd.merge(TRC,Toronto_SEA_counts,left_on='Neighbourhood',right_on='Neighbourhood')
Project_data.head()

Unnamed: 0_level_0,Total Restaurant Count,SEA Count
Neighbourhood,Unnamed: 1_level_1,Unnamed: 2_level_1
Agincourt,22,2
Alderwood / Long Branch,1,0
Bathurst Manor / Wilson Heights / Downsview North,4,0
Bayview Village,4,0
Bedford Park / Lawrence Manor East,13,1


In [55]:
Project_data.reset_index()

Unnamed: 0,Neighbourhood,Total Restaurant Count,SEA Count
0,Agincourt,22,2
1,Alderwood / Long Branch,1,0
2,Bathurst Manor / Wilson Heights / Downsview North,4,0
3,Bayview Village,4,0
4,Bedford Park / Lawrence Manor East,13,1
5,Berczy Park,20,1
6,Birch Cliff / Cliffside West,2,1
7,Brockton / Parkdale Village / Exhibition Place,25,0
8,Business reply mail Processing CentrE,10,1
9,Caledonia-Fairbanks,5,0


In [56]:
Project_data.shape

(90, 2)

In [57]:
Project_data=pd.merge(Project_data,Toronto_coffee_counts,left_on='Neighbourhood',right_on='Neighbourhood')

In [58]:
Project_data.shape

(85, 4)
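
The row count falls from 90 to 85 because `pd.merge` performs an inner join by default, so neighbourhoods that appear in only one of the two tables are dropped. If neighbourhoods with restaurants but no coffee outlets should be kept with a count of 0, a left join is one option. The sketch below uses hand-made sample counts; the names and numbers are illustrative only.

```python
import pandas as pd

# Hypothetical sample counts; names and numbers are illustrative only
restaurants = pd.DataFrame({'Neighbourhood': ['Agincourt', 'Bayview Village', 'Sampleville'],
                            'Total Restaurant Count': [22, 4, 3]})
coffee = pd.DataFrame({'Neighbourhood': ['Agincourt', 'Bayview Village'],
                       'Coffee Count': [2, 1]})

# Default inner join drops 'Sampleville'; a left join keeps it with NaN
inner = pd.merge(restaurants, coffee, on='Neighbourhood')
left = pd.merge(restaurants, coffee, on='Neighbourhood', how='left')

# A missing coffee count means none was found, so record it as 0
left['Coffee Count'] = left['Coffee Count'].fillna(0).astype(int)

print(len(inner), len(left))  # 2 3
```

Which join is appropriate depends on whether a neighbourhood without coffee outlets is meaningful for the analysis or simply out of scope.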

In [59]:
Project_data=Project_data.reset_index()

In [117]:
Project_data.head(20)

Unnamed: 0,index,Neighbourhood,Total Restaurant Count,SEA Count,Coffee Count
0,0,Agincourt,22,2,2
1,1,Alderwood / Long Branch,1,0,1
2,2,Bathurst Manor / Wilson Heights / Downsview North,4,0,2
3,3,Bayview Village,4,0,1
4,4,Bedford Park / Lawrence Manor East,13,1,4
5,5,Berczy Park,20,1,16
6,6,Birch Cliff / Cliffside West,2,1,1
7,7,Brockton / Parkdale Village / Exhibition Place,25,0,14
8,8,Business reply mail Processing CentrE,10,1,4
9,9,Caledonia-Fairbanks,5,0,2


In [61]:
Toronto.head()

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.753259,-79.329656
1,M4A,North York,Victoria Village,43.725882,-79.315572
2,M5A,Downtown Toronto,Regent Park / Harbourfront,43.65426,-79.360636
3,M6A,North York,Lawrence Manor / Lawrence Heights,43.718518,-79.464763
4,M7A,Downtown Toronto,Queen's Park / Ontario Provincial Government,43.662301,-79.389494


In [62]:
Project_data1=pd.merge(Toronto,Project_data,left_on='Neighborhood',right_on='Neighbourhood')

In [63]:
Project_data1.head()

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude,index,Neighbourhood,Total Restaurant Count,SEA Count,Coffee Count
0,M3A,North York,Parkwoods,43.753259,-79.329656,53,Parkwoods,3,0,2
1,M4A,North York,Victoria Village,43.725882,-79.315572,75,Victoria Village,1,0,2
2,M5A,Downtown Toronto,Regent Park / Harbourfront,43.65426,-79.360636,55,Regent Park / Harbourfront,20,2,19
3,M6A,North York,Lawrence Manor / Lawrence Heights,43.718518,-79.464763,40,Lawrence Manor / Lawrence Heights,13,2,3
4,M7A,Downtown Toronto,Queen's Park / Ontario Provincial Government,43.662301,-79.389494,54,Queen's Park / Ontario Provincial Government,24,2,10


In [64]:
master=Project_data1

In [65]:
master.head()

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude,index,Neighbourhood,Total Restaurant Count,SEA Count,Coffee Count
0,M3A,North York,Parkwoods,43.753259,-79.329656,53,Parkwoods,3,0,2
1,M4A,North York,Victoria Village,43.725882,-79.315572,75,Victoria Village,1,0,2
2,M5A,Downtown Toronto,Regent Park / Harbourfront,43.65426,-79.360636,55,Regent Park / Harbourfront,20,2,19
3,M6A,North York,Lawrence Manor / Lawrence Heights,43.718518,-79.464763,40,Lawrence Manor / Lawrence Heights,13,2,3
4,M7A,Downtown Toronto,Queen's Park / Ontario Provincial Government,43.662301,-79.389494,54,Queen's Park / Ontario Provincial Government,24,2,10


## Part 2: Analysis

With the master data set, we will (1) compute basic descriptive statistics; (2) identify the neighbourhood in which to locate the NYC outlet; and (3) examine the relationship between the concentration of Southeast Asian restaurants and coffee outlets, to gain insight into the behaviour of NYC's beachhead customer segment, i.e. ethnic Southeast Asians who consume coffee.

In [66]:
df=master[['Neighbourhood','Total Restaurant Count','SEA Count','Coffee Count']]

In [67]:
df

Unnamed: 0,Neighbourhood,Total Restaurant Count,SEA Count,Coffee Count
0,Parkwoods,3,0,2
1,Victoria Village,1,0,2
2,Regent Park / Harbourfront,20,2,19
3,Lawrence Manor / Lawrence Heights,13,2,3
4,Queen's Park / Ontario Provincial Government,24,2,10
5,Malvern / Rouge,5,0,2
6,Don Mills,24,1,8
7,Don Mills,24,1,8
8,Parkview Hill / Woodbine Gardens,2,0,1
9,"Garden District, Ryerson",27,1,12


In [68]:
#identify duplicates
duplicates=df[df['Neighbourhood'].duplicated()]
print(duplicates)

   Neighbourhood  Total Restaurant Count  SEA Count  Coffee Count
7      Don Mills                      24          1             8
38     Downsview                      17          5             6
39     Downsview                      17          5             6
40     Downsview                      17          5             6
54    Willowdale                      40          3            10


In [69]:
df=df.drop_duplicates(subset='Neighbourhood')

In [70]:
df.shape

(85, 4)
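The duplicates presumably arise because the same neighbourhood name is listed under more than one postal code in `Toronto`, so the merge repeats its counts once per postal code. Note that `duplicated()` flags only the second and later copies by default, which is why Downsview appears three times in the output above although it has four rows. A small sketch on made-up rows (the `toy` frame is an assumption for illustration):

```python
import pandas as pd

# Made-up rows mimicking a neighbourhood repeated once per postal code.
toy = pd.DataFrame({
    "Neighbourhood": ["Don Mills", "Don Mills", "Downsview",
                      "Downsview", "Downsview", "Parkwoods"],
    "Coffee Count": [8, 8, 6, 6, 6, 2],
})

# Default keep='first': the first occurrence is NOT flagged as a duplicate.
later_copies = toy[toy["Neighbourhood"].duplicated()]

# keep=False flags every row that shares its key with another row.
all_copies = toy[toy["Neighbourhood"].duplicated(keep=False)]

# drop_duplicates keeps the first row per neighbourhood.
deduped = toy.drop_duplicates(subset="Neighbourhood")
print(len(later_copies), len(all_copies), len(deduped))
```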

**Part 2A: Basic descriptive statistics of the data**

In [71]:
#basic descriptive statistic
df.describe().round(2)

Unnamed: 0,Total Restaurant Count,SEA Count,Coffee Count
count,85.0,85.0,85.0
mean,14.2,1.15,6.75
std,10.24,1.44,5.37
min,1.0,0.0,1.0
25%,4.0,0.0,2.0
50%,12.0,1.0,5.0
75%,22.0,2.0,12.0
max,40.0,6.0,19.0


**Part 2B: Understand the relationship between the total number of restaurants, the number of SEA restaurants and the number of coffee outlets in each neighbourhood**

Normalize the data (z-score: subtract the column mean and divide by the column standard deviation)

In [72]:
#work on an explicit copy so that the new columns are not written to a slice of master (avoids SettingWithCopyWarning)
df=df.copy()
df['Total Restaurant Count Normalize']=(df['Total Restaurant Count']-df['Total Restaurant Count'].mean())/df['Total Restaurant Count'].std()

In [73]:
df['SEA Count Normalize']=(df['SEA Count']-df['SEA Count'].mean())/df['SEA Count'].std()

In [74]:
df['Coffee Count Normalize']=(df['Coffee Count']-df['Coffee Count'].mean())/df['Coffee Count'].std()


In [75]:
df

Unnamed: 0,Neighbourhood,Total Restaurant Count,SEA Count,Coffee Count,Total Restaurant Count Normalize,SEA Count Normalize,Coffee Count Normalize
0,Parkwoods,3,0,2,-1.094026,-0.803358,-0.884451
1,Victoria Village,1,0,2,-1.289387,-0.803358,-0.884451
2,Regent Park / Harbourfront,20,2,19,0.566549,0.590222,2.278993
3,Lawrence Manor / Lawrence Heights,13,2,3,-0.117217,0.590222,-0.698366
4,Queen's Park / Ontario Provincial Government,24,2,10,0.957272,0.590222,0.604229
5,Malvern / Rouge,5,0,2,-0.898664,-0.803358,-0.884451
6,Don Mills,24,1,8,0.957272,-0.106568,0.232059
8,Parkview Hill / Woodbine Gardens,2,0,1,-1.191707,-0.803358,-1.070535
9,"Garden District, Ryerson",27,1,12,1.250315,-0.106568,0.976398
10,Glencairn,7,0,2,-0.703302,-0.803358,-0.884451


In [76]:
df.shape

(85, 7)

Comparison of data by visual inspection

In [77]:
df.nlargest(5,'Total Restaurant Count')

Unnamed: 0,Neighbourhood,Total Restaurant Count,SEA Count,Coffee Count,Total Restaurant Count Normalize,SEA Count Normalize,Coffee Count Normalize
53,Willowdale,40,3,10,2.520166,1.287012,0.604229
71,Davisville,37,3,12,2.227124,1.287012,0.976398
22,Christie,35,1,13,2.031762,-0.106568,1.162483
41,The Danforth West / Riverdale,32,0,13,1.738719,-0.803358,1.162483
34,Little Portugal / Trinity,31,3,13,1.641039,1.287012,1.162483


In [78]:
df.nlargest(5,'Total Restaurant Count Normalize')

Unnamed: 0,Neighbourhood,Total Restaurant Count,SEA Count,Coffee Count,Total Restaurant Count Normalize,SEA Count Normalize,Coffee Count Normalize
53,Willowdale,40,3,10,2.520166,1.287012,0.604229
71,Davisville,37,3,12,2.227124,1.287012,0.976398
22,Christie,35,1,13,2.031762,-0.106568,1.162483
41,The Danforth West / Riverdale,32,0,13,1.738719,-0.803358,1.162483
34,Little Portugal / Trinity,31,3,13,1.641039,1.287012,1.162483


In [79]:
df.nlargest(5,'SEA Count')

Unnamed: 0,Neighbourhood,Total Restaurant Count,SEA Count,Coffee Count,Total Restaurant Count Normalize,SEA Count Normalize,Coffee Count Normalize
49,Studio District,28,6,12,1.347996,3.377381,0.976398
37,Downsview,17,5,6,0.273506,2.680592,-0.140111
62,High Park / The Junction South,21,5,15,0.66423,2.680592,1.534653
76,Kensington Market / Chinatown / Grange Park,31,5,12,1.641039,2.680592,0.976398
78,Summerhill West / Rathnelly / South Hill / For...,24,5,10,0.957272,2.680592,0.604229


In [80]:
df.nlargest(5,'SEA Count Normalize')

Unnamed: 0,Neighbourhood,Total Restaurant Count,SEA Count,Coffee Count,Total Restaurant Count Normalize,SEA Count Normalize,Coffee Count Normalize
49,Studio District,28,6,12,1.347996,3.377381,0.976398
37,Downsview,17,5,6,0.273506,2.680592,-0.140111
62,High Park / The Junction South,21,5,15,0.66423,2.680592,1.534653
76,Kensington Market / Chinatown / Grange Park,31,5,12,1.641039,2.680592,0.976398
78,Summerhill West / Rathnelly / South Hill / For...,24,5,10,0.957272,2.680592,0.604229


In [81]:
df.nlargest(5,'Coffee Count')

Unnamed: 0,Neighbourhood,Total Restaurant Count,SEA Count,Coffee Count,Total Restaurant Count Normalize,SEA Count Normalize,Coffee Count Normalize
2,Regent Park / Harbourfront,20,2,19,0.566549,0.590222,2.278993
60,Davisville North,24,1,17,0.957272,-0.106568,1.906823
82,Stn A PO Boxes,21,1,17,0.66423,-0.106568,1.906823
13,St. James Town,25,1,16,1.054953,-0.106568,1.720738
17,Berczy Park,20,1,16,0.566549,-0.106568,1.720738


In [82]:
df.nlargest(5,'Coffee Count Normalize')

Unnamed: 0,Neighbourhood,Total Restaurant Count,SEA Count,Coffee Count,Total Restaurant Count Normalize,SEA Count Normalize,Coffee Count Normalize
2,Regent Park / Harbourfront,20,2,19,0.566549,0.590222,2.278993
60,Davisville North,24,1,17,0.957272,-0.106568,1.906823
82,Stn A PO Boxes,21,1,17,0.66423,-0.106568,1.906823
13,St. James Town,25,1,16,1.054953,-0.106568,1.720738
17,Berczy Park,20,1,16,0.566549,-0.106568,1.720738


Understand the correlation between the total number of restaurants, the number of SEA restaurants and the number of coffee outlets

In [83]:
df_absolute=df[['Total Restaurant Count','SEA Count','Coffee Count']]
df_absolute.corr()

Unnamed: 0,Total Restaurant Count,SEA Count,Coffee Count
Total Restaurant Count,1.0,0.60722,0.81173
SEA Count,0.60722,1.0,0.49119
Coffee Count,0.81173,0.49119,1.0


In [84]:
df_normalize=df[['Total Restaurant Count Normalize','SEA Count Normalize','Coffee Count Normalize']]
df_normalize.corr()

Unnamed: 0,Total Restaurant Count Normalize,SEA Count Normalize,Coffee Count Normalize
Total Restaurant Count Normalize,1.0,0.60722,0.81173
SEA Count Normalize,0.60722,1.0,0.49119
Coffee Count Normalize,0.81173,0.49119,1.0


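From inspection of the above, the top five neighbourhoods by total number of restaurants, by number of ethnic Southeast Asian restaurants and by number of coffee outlets are all different: no neighbourhood appears in more than one of the three lists. The rankings and correlation coefficients are identical for the raw and the normalized columns, since z-score normalization is a linear rescaling that changes neither the ordering nor the Pearson correlation.

From the correlation coefficients, there is only a moderate positive correlation (0.49119) between the number of ethnic Southeast Asian restaurants and the number of coffee outlets in a neighbourhood. Coffee outlets correlate more strongly with the total number of restaurants (0.81173).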
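The raw and normalized correlation matrices match because the Pearson coefficient is unchanged when a column is shifted and scaled by positive constants. A minimal sketch checking this on the first seven rows of `df` shown earlier (the variable names `counts`, `zscores`, `raw_r` and `z_r` are introduced here for illustration):

```python
import pandas as pd

# First seven rows of df (SEA Count and Coffee Count) as printed earlier.
counts = pd.DataFrame({
    "SEA Count": [0, 0, 2, 2, 2, 0, 1],
    "Coffee Count": [2, 2, 19, 3, 10, 2, 8],
})

# Column-wise z-score: subtract the mean, divide by the sample standard deviation.
zscores = (counts - counts.mean()) / counts.std()

# Pearson correlation before and after normalization.
raw_r = counts["SEA Count"].corr(counts["Coffee Count"])
z_r = zscores["SEA Count"].corr(zscores["Coffee Count"])
print(raw_r, z_r)
```

The two printed values agree up to floating-point error, so only one of the two correlation tables is needed for the analysis.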

**Part 2C: Identify the optimal neighbourhood to site the NYC outlet**

In [85]:
#Top 5 Neighbourhood in terms of number of SEA restaurants, among Top 5 neighbourhood in terms of total number of restaurants
df1=df[['Neighbourhood','Total Restaurant Count','SEA Count','Coffee Count']]
df1.nlargest(5,'Total Restaurant Count').nlargest(5,'SEA Count')

Unnamed: 0,Neighbourhood,Total Restaurant Count,SEA Count,Coffee Count
53,Willowdale,40,3,10
71,Davisville,37,3,12
34,Little Portugal / Trinity,31,3,13
22,Christie,35,1,13
41,The Danforth West / Riverdale,32,0,13


In [86]:
#Top 5 Neighbourhood in terms of number of coffee outlets, among Top 5 neighbourhood in terms of total number of restaurants
df1.nlargest(5,'Total Restaurant Count').nlargest(5,'Coffee Count')

Unnamed: 0,Neighbourhood,Total Restaurant Count,SEA Count,Coffee Count
22,Christie,35,1,13
41,The Danforth West / Riverdale,32,0,13
34,Little Portugal / Trinity,31,3,13
71,Davisville,37,3,12
53,Willowdale,40,3,10


In [87]:
#Top 5 Neighbourhood in terms of number of coffee outlets, among Top 5 neighbourhood in terms of total number of SEA restaurants
df1.nlargest(5,'SEA Count').nlargest(5,'Coffee Count')

Unnamed: 0,Neighbourhood,Total Restaurant Count,SEA Count,Coffee Count
62,High Park / The Junction South,21,5,15
49,Studio District,28,6,12
76,Kensington Market / Chinatown / Grange Park,31,5,12
78,Summerhill West / Rathnelly / South Hill / For...,24,5,10
37,Downsview,17,5,6


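From inspection of the above, the candidate neighbourhoods for the NYC outlet are High Park/The Junction South, Studio District, Kensington Market/Chinatown/Grange Park and Summerhill West/Rathnelly/South Hill/Forest Hill SE/Deer Park: each is among the top five by SEA restaurant count and has at least 10 coffee outlets, whereas Downsview has only 6.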
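One caveat with `nlargest(5, ...)` is how ties are handled: by default it keeps the first rows in index order and cuts off any further rows tied with the fifth value. Four of the five neighbourhoods above share an SEA count of 5, so if a sixth neighbourhood also had five SEA restaurants (this cannot be told from the output shown), it would have been excluded. A sketch on made-up rows showing the `keep='all'` option of `DataFrame.nlargest`:

```python
import pandas as pd

# Made-up data: one neighbourhood with 6 SEA restaurants, five tied at 5.
toy = pd.DataFrame({
    "Neighbourhood": ["A", "B", "C", "D", "E", "F"],
    "SEA Count": [6, 5, 5, 5, 5, 5],
})

# Default keep='first': exactly five rows, ties broken by row order.
top5_first = toy.nlargest(5, "SEA Count")

# keep='all': every row tied with the fifth-largest value is retained.
top5_all = toy.nlargest(5, "SEA Count", keep="all")
print(len(top5_first), len(top5_all))
```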

We will proceed to check the location through Folium visualisation

In [88]:
Toronto.head()

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.753259,-79.329656
1,M4A,North York,Victoria Village,43.725882,-79.315572
2,M5A,Downtown Toronto,Regent Park / Harbourfront,43.65426,-79.360636
3,M6A,North York,Lawrence Manor / Lawrence Heights,43.718518,-79.464763
4,M7A,Downtown Toronto,Queen's Park / Ontario Provincial Government,43.662301,-79.389494


In [89]:
DF=pd.merge(Toronto,df,left_on='Neighborhood',right_on='Neighbourhood')

In [90]:
DF.head()

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude,Neighbourhood,Total Restaurant Count,SEA Count,Coffee Count,Total Restaurant Count Normalize,SEA Count Normalize,Coffee Count Normalize
0,M3A,North York,Parkwoods,43.753259,-79.329656,Parkwoods,3,0,2,-1.094026,-0.803358,-0.884451
1,M4A,North York,Victoria Village,43.725882,-79.315572,Victoria Village,1,0,2,-1.289387,-0.803358,-0.884451
2,M5A,Downtown Toronto,Regent Park / Harbourfront,43.65426,-79.360636,Regent Park / Harbourfront,20,2,19,0.566549,0.590222,2.278993
3,M6A,North York,Lawrence Manor / Lawrence Heights,43.718518,-79.464763,Lawrence Manor / Lawrence Heights,13,2,3,-0.117217,0.590222,-0.698366
4,M7A,Downtown Toronto,Queen's Park / Ontario Provincial Government,43.662301,-79.389494,Queen's Park / Ontario Provincial Government,24,2,10,0.957272,0.590222,0.604229


In [91]:
DF=DF[['Neighbourhood','Latitude','Longitude','Total Restaurant Count','SEA Count','Coffee Count']]

In [92]:
DF=DF.drop_duplicates(subset='Neighbourhood')

In [93]:
DF.head()

Unnamed: 0,Neighbourhood,Latitude,Longitude,Total Restaurant Count,SEA Count,Coffee Count
0,Parkwoods,43.753259,-79.329656,3,0,2
1,Victoria Village,43.725882,-79.315572,1,0,2
2,Regent Park / Harbourfront,43.65426,-79.360636,20,2,19
3,Lawrence Manor / Lawrence Heights,43.718518,-79.464763,13,2,3
4,Queen's Park / Ontario Provincial Government,43.662301,-79.389494,24,2,10


In [94]:
DF_T5=DF.nlargest(5,'SEA Count').nlargest(5,'Coffee Count')

In [95]:
DF_T5

Unnamed: 0,Neighbourhood,Latitude,Longitude,Total Restaurant Count,SEA Count,Coffee Count
62,High Park / The Junction South,43.661608,-79.464763,21,5,15
49,Studio District,43.659526,-79.340923,28,6,12
76,Kensington Market / Chinatown / Grange Park,43.653206,-79.400049,31,5,12
78,Summerhill West / Rathnelly / South Hill / For...,43.686412,-79.400049,24,5,10
37,Downsview,43.737473,-79.464763,17,5,6


In [96]:
import folium

#latitude and longitude are the map centre (assumed to be defined in an earlier cell)
map_toronto = folium.Map(location=[latitude,longitude], zoom_start=10)

#add one circle marker per candidate neighbourhood, with its name as the popup
for lat, lng, label in zip(DF_T5['Latitude'],DF_T5['Longitude'],DF_T5['Neighbourhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat,lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#ffffff',
        fill_opacity=0.7).add_to(map_toronto)

map_toronto

From the visualisation, the neighbourhoods of High Park/The Junction South, Studio District, Kensington Market/Chinatown/Grange Park and Summerhill West/Rathnelly/South Hill/Forest Hill SE/Deer Park form a cluster in the southern part of Toronto around downtown, while Downsview lies further north, near the airport.

**Based on this, the neighbourhood of Kensington Market/Chinatown/Grange Park is preliminarily selected as the location for the NYC outlet, since it is the closest to the other neighbourhoods in the cluster.**
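The visual judgement of "closest to the others" can be cross-checked numerically. The sketch below is not part of the original analysis: it assumes the haversine great-circle formula with an Earth radius of 6371 km, uses the coordinates listed in `DF_T5`, and introduces the helper names `haversine_km` and `mean_distance_to_others` for illustration.

```python
from math import asin, cos, radians, sin, sqrt

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    earth_radius_km = 6371.0  # assumed mean Earth radius
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * earth_radius_km * asin(sqrt(a))

# Coordinates of the four downtown candidates, copied from DF_T5.
candidates = {
    "High Park / The Junction South": (43.661608, -79.464763),
    "Studio District": (43.659526, -79.340923),
    "Kensington Market / Chinatown / Grange Park": (43.653206, -79.400049),
    "Summerhill West / Rathnelly / South Hill / Forest Hill SE / Deer Park": (43.686412, -79.400049),
}

def mean_distance_to_others(name):
    """Average distance from one candidate to the remaining candidates."""
    lat, lon = candidates[name]
    others = [coords for other, coords in candidates.items() if other != name]
    return sum(haversine_km(lat, lon, *coords) for coords in others) / len(others)

mean_km = {name: mean_distance_to_others(name) for name in candidates}
most_central = min(mean_km, key=mean_km.get)
print(most_central)
```

With these coordinates the sketch picks Kensington Market / Chinatown / Grange Park, in line with the visual inspection. Several longitude values in `DF_T5` repeat across different neighbourhoods (for example -79.464763 and -79.400049), so the coordinates may be worth verifying before relying on the distances.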

Next, we explore how the neighbourhoods cluster on restaurant count, ethnic SEA restaurant count and coffee outlet count, to gain further insight into where to locate the NYC outlet.

**Part 2D: Cluster the various neighbourhoods based on the restaurant count, ethnic SEA restaurant count and coffee outlet counts**

In [97]:
DF.head()

Unnamed: 0,Neighbourhood,Latitude,Longitude,Total Restaurant Count,SEA Count,Coffee Count
0,Parkwoods,43.753259,-79.329656,3,0,2
1,Victoria Village,43.725882,-79.315572,1,0,2
2,Regent Park / Harbourfront,43.65426,-79.360636,20,2,19
3,Lawrence Manor / Lawrence Heights,43.718518,-79.464763,13,2,3
4,Queen's Park / Ontario Provincial Government,43.662301,-79.389494,24,2,10


In [98]:
DF.shape

(85, 6)

In [99]:
#cluster neighborhoods into clusters with k-means
from sklearn.cluster import KMeans
kclusters = 3
DF_cluster = DF.drop(['Neighbourhood','Latitude','Longitude'], axis=1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(DF_cluster)

# check cluster labels generated for each row in the dataframe
kmeans.labels_

array([0, 0, 1, 2, 1, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 2, 1, 0, 0, 0, 1, 1,
       0, 0, 0, 2, 1, 2, 0, 2, 0, 1, 1, 1, 0, 0, 2, 1, 1, 1, 0, 2, 1, 0,
       2, 1, 2, 0, 0, 1, 0, 0, 0, 2, 0, 1, 2, 1, 0, 2, 2, 1, 1, 2, 0, 2,
       1, 1, 1, 2, 2, 1, 2, 1, 0, 0, 0, 1, 0, 2, 1, 2, 1, 2, 2])
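k-means uses Euclidean distance, so on raw counts the feature with the largest spread (Total Restaurant Count, std 10.24 versus 1.44 for SEA Count) tends to dominate the cluster assignment. As a possible variant, not the approach taken in this notebook, the features could be standardized first. The sketch below assumes scikit-learn's `StandardScaler` and uses the first ten rows of `DF_cluster` as sample data; `n_init=10` is set explicitly as an assumption.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# First ten rows of DF_cluster (feature columns only).
counts = pd.DataFrame({
    "Total Restaurant Count": [3, 1, 20, 13, 24, 5, 24, 2, 27, 7],
    "SEA Count": [0, 0, 2, 2, 2, 0, 1, 0, 1, 0],
    "Coffee Count": [2, 2, 19, 3, 10, 2, 8, 1, 12, 2],
})

# Rescale each column to zero mean and unit variance so that
# all three counts contribute comparably to the distance.
scaled = StandardScaler().fit_transform(counts)

# Same number of clusters and seed as in the notebook.
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(scaled)
print(labels)
```

Comparing these labels with the unscaled ones would show how much the clustering depends on the total restaurant count alone.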

In [100]:
DF_cluster.insert(0,'Cluster Labels',kmeans.labels_)
DF_cluster

Unnamed: 0,Cluster Labels,Total Restaurant Count,SEA Count,Coffee Count
0,0,3,0,2
1,0,1,0,2
2,1,20,2,19
3,2,13,2,3
4,1,24,2,10
5,0,5,0,2
6,1,24,1,8
8,0,2,0,1
9,1,27,1,12
10,0,7,0,2


In [102]:
DF_cluster.shape

(85, 4)

In [101]:
DF_geo=DF[['Neighbourhood','Latitude','Longitude']]

In [103]:
DF_cluster=pd.concat([DF_geo,DF_cluster],axis=1)

In [104]:
DF_cluster.shape

(85, 7)

In [119]:
DF_cluster.head(10)

Unnamed: 0,Neighbourhood,Latitude,Longitude,Cluster Labels,Total Restaurant Count,SEA Count,Coffee Count
0,Parkwoods,43.753259,-79.329656,0,3,0,2
1,Victoria Village,43.725882,-79.315572,0,1,0,2
2,Regent Park / Harbourfront,43.65426,-79.360636,1,20,2,19
3,Lawrence Manor / Lawrence Heights,43.718518,-79.464763,2,13,2,3
4,Queen's Park / Ontario Provincial Government,43.662301,-79.389494,1,24,2,10
5,Malvern / Rouge,43.806686,-79.194353,0,5,0,2
6,Don Mills,43.745906,-79.352188,1,24,1,8
8,Parkview Hill / Woodbine Gardens,43.706397,-79.309937,0,2,0,1
9,"Garden District, Ryerson",43.657162,-79.378937,1,27,1,12
10,Glencairn,43.709577,-79.445073,0,7,0,2


In [107]:
import numpy as np
import matplotlib.cm as cm
import matplotlib.colors as colors

# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters: one rainbow colour per cluster
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
for lat, lon, poi, cluster in zip(DF_cluster['Latitude'], DF_cluster['Longitude'], DF_cluster['Neighbourhood'], DF_cluster['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    # labels run from 0 to kclusters-1, so label 0 wraps around to the last colour
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)

map_clusters

Marker colours: red is cluster 1 (label 0), purple is cluster 2 (label 1) and green is cluster 3 (label 2).
Of note, cluster 2 is more concentrated than the others and is located in the southern part of Toronto, around downtown.

In [108]:
#Cluster 1
Cluster1=DF_cluster.loc[DF_cluster['Cluster Labels'] == 0,]
Cluster1.shape

(33, 7)

In [120]:
Cluster1.head(20)

Unnamed: 0,Neighbourhood,Latitude,Longitude,Cluster Labels,Total Restaurant Count,SEA Count,Coffee Count
0,Parkwoods,43.753259,-79.329656,0,3,0,2
1,Victoria Village,43.725882,-79.315572,0,1,0,2
5,Malvern / Rouge,43.806686,-79.194353,0,5,0,2
8,Parkview Hill / Woodbine Gardens,43.706397,-79.309937,0,2,0,1
10,Glencairn,43.709577,-79.445073,0,7,0,2
11,West Deane Park / Princess Gardens / Martin Gr...,43.650943,-79.554724,0,2,0,1
12,Woodbine Heights,43.695344,-79.318389,0,2,2,4
14,Humewood-Cedarvale,43.693781,-79.428191,0,4,0,2
15,Guildwood / Morningside / West Hill,43.763573,-79.188711,0,4,0,2
18,Caledonia-Fairbanks,43.689026,-79.453512,0,5,0,2


In [110]:
#Cluster 2
Cluster2=DF_cluster.loc[DF_cluster['Cluster Labels'] == 1,]
Cluster2.shape

(30, 7)

In [121]:
Cluster2.head(20)

Unnamed: 0,Neighbourhood,Latitude,Longitude,Cluster Labels,Total Restaurant Count,SEA Count,Coffee Count
2,Regent Park / Harbourfront,43.65426,-79.360636,1,20,2,19
4,Queen's Park / Ontario Provincial Government,43.662301,-79.389494,1,24,2,10
6,Don Mills,43.745906,-79.352188,1,24,1,8
9,"Garden District, Ryerson",43.657162,-79.378937,1,27,1,12
13,St. James Town,43.651494,-79.375418,1,25,1,16
17,Berczy Park,43.644771,-79.373306,1,20,1,16
21,Central Bay Street,43.657952,-79.387383,1,23,1,12
22,Christie,43.669542,-79.422564,1,35,1,13
27,Richmond / Adelaide / King,43.650571,-79.384568,1,21,1,12
32,East Toronto,43.685347,-79.338106,1,29,3,15


In [112]:
#Cluster 3
Cluster3=DF_cluster.loc[DF_cluster['Cluster Labels'] == 2,]
Cluster3.shape

(22, 7)

In [122]:
Cluster3.head(20)

Unnamed: 0,Neighbourhood,Latitude,Longitude,Cluster Labels,Total Restaurant Count,SEA Count,Coffee Count
3,Lawrence Manor / Lawrence Heights,43.718518,-79.464763,2,13,2,3
16,The Beaches,43.676357,-79.293031,2,16,1,6
26,Thorncliffe Park,43.705369,-79.349372,2,11,0,5
28,Dufferin / Dovercourt Village,43.669005,-79.442259,2,15,2,13
30,Fairview / Henry Farm / Oriole,43.778517,-79.346556,2,9,0,5
37,Downsview,43.737473,-79.464763,2,17,5,6
45,India Bazaar / The Beaches West,43.668999,-79.315572,2,20,0,8
48,Willowdale / Newtonbrook,43.789053,-79.408493,2,11,0,5
50,Bedford Park / Lawrence Manor East,43.733283,-79.41975,2,13,1,4
58,Dorset Park / Wexford Heights / Scarborough To...,43.75741,-79.273304,2,15,1,3


Based on the clustering above, cluster 1 appears to be the cluster with a low concentration of restaurants (of all sorts) and coffee outlets; cluster 2 appears to be the cluster with high concentrations of restaurants and coffee outlets; and cluster 3 appears to be the cluster with moderate concentrations of restaurants (including SEA) and coffee outlets.
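This low / high / moderate reading can be checked by averaging each feature within a cluster. A minimal sketch using only the first ten rows of `DF_cluster` shown earlier (so the averages are illustrative, not the full-data values; `sample` and `profile` are names introduced here):

```python
import pandas as pd

# First ten rows of DF_cluster: k-means label plus the three counts.
sample = pd.DataFrame({
    "Cluster Labels": [0, 0, 1, 2, 1, 0, 1, 0, 1, 0],
    "Total Restaurant Count": [3, 1, 20, 13, 24, 5, 24, 2, 27, 7],
    "SEA Count": [0, 0, 2, 2, 2, 0, 1, 0, 1, 0],
    "Coffee Count": [2, 2, 19, 3, 10, 2, 8, 1, 12, 2],
})

# Mean of each count per cluster label (label 0 = cluster 1, and so on).
profile = sample.groupby("Cluster Labels").mean().round(2)
print(profile)
```

Applying the same `groupby` to the full `DF_cluster` (after dropping the non-numeric columns) would give the per-cluster averages for all 85 neighbourhoods.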

Of note, the neighbourhoods of High Park/The Junction South, Studio District, Kensington Market/Chinatown/Grange Park and Summerhill West/Rathnelly/South Hill/Forest Hill SE/Deer Park are all in cluster 2, while Downsview is in cluster 3.

**Based on this, the clustering further supports the neighbourhood of Kensington Market/Chinatown/Grange Park as the location for the NYC outlet. Not only is it the closest to the other three neighbourhoods that have the most coffee outlets among those with the most SEA restaurants, it is also approximately at the centre of where most of cluster 2 is located, i.e. downtown Toronto.**