# Capstone Project - The Battle of the Neighborhoods (Week 2)

## Table of contents
* [Introduction: Business Problem](#introduction)
* [Data](#data)
* [Methodology](#methodology)
* [Analysis](#analysis)
* [Results and Discussion](#results)
* [Conclusion](#conclusion)

## Introduction <a name="introduction"></a>

### The Problem
In this project we will try to find an optimal location to open a Chinese restaurant. This report is aimed at stakeholders interested in opening such a restaurant in **Toronto**, Canada.

We will look for **locations that are not already crowded with Chinese restaurants** and **locations that are highly populated**. We would also prefer locations **as close to downtown as possible**, provided that the first two conditions are met.

## Data <a name="data"></a>
Based on the definition of our problem, the factors that will influence our decision are:
* number of existing restaurants
* number of existing Chinese restaurants
* population of neighborhood
* distance of neighborhood from downtown

The following data sources will be needed to extract/generate the required information:
* centers of the initial candidate areas will be taken from the [Wiki page](https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M), and their geographic coordinates from this [file](http://cocl.us/Geospatial_data)
* population figures for Toronto can be gathered from this [webpage](https://www12.statcan.gc.ca/census-recensement/2016/dp-pd/hlt-fst/pd-pl/Table.cfm?Lang=Eng&T=1201&SR=1&S=22&O=A&RPP=9999&PR=0), which reports the population for the year 2016. Given the limited data available online, we will use it as a reference even though it is not the most recent data
* number of Chinese restaurants in every neighborhood will be obtained using the **Foursquare API**
* the coordinates of downtown Toronto were found through a simple Google search (43.6548, -79.3883)

#### Neighborhood Candidates
First, we get neighborhood information from the [Wiki page](https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M). The table contains Postcode, Borough, and Neighbourhood information. We will use the existing Postcode areas as our candidate areas; there are 103 such candidates. 
The geographic coordinates of each Toronto Postcode are available in an online [file](http://cocl.us/Geospatial_data). We will combine the 2 tables into 1 dataframe that contains Postcode, Neighborhood, Latitude, and Longitude.

In [9]:
import numpy as np
import pandas as pd 
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)
!pip install lxml

# scraping postcode table from webpage
t = 'http://web.archive.org/web/20200308015824/https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'
table = pd.read_html(t, header=0)
tables = table[2]
df = pd.DataFrame(tables)
df = df[df.Borough != 'Not assigned']
df = df.replace("Not assigned", "Queen's Park")
df = df.groupby('Postcode').agg({'Borough':'min','Neighbourhood':', '.join})
df = df.reset_index()

# read the latitude and longitude csv file into a dataframe
url='http://cocl.us/Geospatial_data'
ll = pd.read_csv(url)

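# add latitude and longitude info by matching on the postal code
# (a key-based merge does not depend on both tables having the same row order)
new = pd.merge(df, ll, left_on='Postcode', right_on='Postal Code')
new = new.drop(['Postal Code'], axis=1)
new.head()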

# scraping population table from webpage
p = 'https://www12.statcan.gc.ca/census-recensement/2016/dp-pd/hlt-fst/pd-pl/Table.cfm?Lang=Eng&T=1201&SR=1&S=22&O=A&RPP=9999&PR=0'
pop = pd.read_html(p, header=0)
pop=pop[0]
dfpop = pd.DataFrame(pop)
dfpop = dfpop.rename(columns={'Geographic name': 'Postcode'})
dfpop = dfpop.drop(['Total private dwellings, 2016', 'Private dwellings occupied by usual residents, 2016'], axis=1)

# merge the 2 tables to add population information
new = pd.merge(new, dfpop, on='Postcode')
new.head()



| | Postcode | Borough | Neighbourhood | Latitude | Longitude | Population, 2016 |
|---|---|---|---|---|---|---|
| 0 | M1B | Scarborough | Rouge, Malvern | 43.806686 | -79.194353 | 66108.0 |
| 1 | M1C | Scarborough | Highland Creek, Rouge Hill, Port Union | 43.784535 | -79.160497 | 35626.0 |
| 2 | M1E | Scarborough | Guildwood, Morningside, West Hill | 43.763573 | -79.188711 | 46943.0 |
| 3 | M1G | Scarborough | Woburn | 43.770992 | -79.216917 | 29690.0 |
| 4 | M1H | Scarborough | Cedarbrae | 43.773136 | -79.239476 | 24383.0 |


#### Foursquare
Next, we will use Foursquare to get venue information, including Chinese restaurants, for each area.

In [10]:
# Foursquare Info
CLIENT_ID = '' # your Foursquare ID
CLIENT_SECRET = '' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

In [11]:
# Import needed libraries
import requests 
import json
from pandas.io.json import json_normalize
from geopy.geocoders import Nominatim

In [12]:
# Function to get all the venues
def getNearbyVenues(names, latitudes, longitudes, radius=500, LIMIT=100):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # keep only the relevant fields for each nearby venue
        # (the last field is the ID of the venue's primary category; it is stored in the 'Venue Id' column)
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name'],
            v['venue']['categories'][0]['id']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Area', 
                  'Area Latitude', 
                  'Area Longitude', 
                  'Venue Name', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category', 'Venue Id']
    
    return(nearby_venues)
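
The list comprehension above indexes `categories[0]` directly, which would raise an `IndexError` if a venue came back with an empty category list. A minimal sketch of a more defensive extraction step, assuming the same response layout that the function above reads (the helper name `extract_venue_rows`, the sample payload, and its coordinates are hypothetical and for illustration only):

```python
def extract_venue_rows(area, area_lat, area_lng, items):
    """Flatten 'explore' items into rows, skipping venues without a category.

    `items` is assumed to be the list found at response['groups'][0]['items'].
    """
    rows = []
    for item in items:
        venue = item.get('venue', {})
        categories = venue.get('categories') or []
        if not categories:
            continue  # nothing to filter on later, so skip this venue
        location = venue.get('location', {})
        rows.append((
            area, area_lat, area_lng,
            venue.get('name'),
            location.get('lat'),
            location.get('lng'),
            categories[0]['name'],
            categories[0]['id'],  # category ID (the notebook's 'Venue Id' column)
        ))
    return rows


# Hypothetical payload containing only the fields the notebook reads.
sample_items = [
    {'venue': {'name': 'Example Noodle House',
               'location': {'lat': 43.66, 'lng': -79.39},
               'categories': [{'name': 'Chinese Restaurant',
                               'id': '4bf58dd8d48988d145941735'}]}},
    {'venue': {'name': 'Uncategorized Place',
               'location': {'lat': 43.67, 'lng': -79.40},
               'categories': []}},
]

rows = extract_venue_rows('M5S', 43.66, -79.40, sample_items)
print(len(rows), rows[0][3])
```

The rows have the same eight fields, in the same order, as the tuples built in `getNearbyVenues`, so they could feed the same DataFrame construction.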

In [13]:
# Call function to get all venues
Venues = getNearbyVenues(names=new['Postcode'], 
                         latitudes=new['Latitude'], 
                         longitudes=new['Longitude'])

M1B
M1C
M1E
M1G
M1H
M1J
M1K
M1L
M1M
M1N
M1P
M1R
M1S
M1T
M1V
M1W
M1X
M2H
M2J
M2K
M2L
M2M
M2N
M2P
M2R
M3A
M3B
M3C
M3H
M3J
M3K
M3L
M3M
M3N
M4A
M4B
M4C
M4E
M4G
M4H
M4J
M4K
M4L
M4M
M4N
M4P
M4R
M4S
M4T
M4V
M4W
M4X
M4Y
M5A
M5B
M5C
M5E
M5G
M5H
M5J
M5K
M5L
M5M
M5N
M5P
M5R
M5S
M5T
M5V
M5W
M5X
M6A
M6B
M6C
M6E
M6G
M6H
M6J
M6K
M6L
M6M
M6N
M6P
M6R
M6S
M7A
M7Y
M8V
M8W
M8X
M8Y
M8Z
M9A
M9B
M9C
M9L
M9M
M9N
M9P
M9R
M9V
M9W


Filter for all restaurants:

In [16]:
# create new dataframe for all restaurants
column_names = ['Area', 'Area Latitude', 'Area Longitude', 'Venue Name', 
                'Venue Latitude', 'Venue Longitude', 'Venue Category', 'Venue Id']
restaurants = pd.DataFrame(columns=column_names)

U=pd.unique(Venues['Venue Category'])
category_list = U.tolist()

for rd in ['Restaurant', 'Dinner']:
    for category_name in category_list:
        if rd in category_name:
            rows = Venues.loc[Venues['Venue Category'] == category_name]
            restaurants = pd.concat([restaurants, rows], ignore_index=True)  # DataFrame.append was removed in pandas 2.0
restaurants.head()

| | Area | Area Latitude | Area Longitude | Venue Name | Venue Latitude | Venue Longitude | Venue Category | Venue Id |
|---|---|---|---|---|---|---|---|---|
| 0 | M1B | 43.806686 | -79.194353 | Wendy's | 43.807448 | -79.199056 | Fast Food Restaurant | 4bf58dd8d48988d16e941735 |
| 1 | M1T | 43.781638 | -79.304302 | KFC | 43.7804 | -79.3007 | Fast Food Restaurant | 4bf58dd8d48988d16e941735 |
| 2 | M1W | 43.799525 | -79.318389 | KFC | 43.798938 | -79.318854 | Fast Food Restaurant | 4bf58dd8d48988d16e941735 |
| 3 | M1W | 43.799525 | -79.318389 | McDonald's | 43.798249 | -79.318167 | Fast Food Restaurant | 4bf58dd8d48988d16e941735 |
| 4 | M2J | 43.778517 | -79.346556 | KFC | 43.7776 | -79.3442 | Fast Food Restaurant | 4bf58dd8d48988d16e941735 |
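
The filtering loop keeps a category when its name contains one of the keywords as a substring. A small sketch of that test on a hypothetical list of category names (the helper `matches_keyword` and the sample names are assumptions, not part of the notebook). It also shows that a category named 'Diner' is not matched by the keyword 'Dinner', which may be worth checking against the categories actually returned:

```python
def matches_keyword(category_name, keywords):
    """Return True if any keyword occurs as a substring of the category name."""
    return any(keyword in category_name for keyword in keywords)


keywords = ['Restaurant', 'Dinner']  # the keywords used in the notebook
sample_categories = ['Fast Food Restaurant', 'Chinese Restaurant',
                     'Diner', 'Coffee Shop', 'Park']  # hypothetical sample

kept = [name for name in sample_categories if matches_keyword(name, keywords)]
print(kept)
```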


Now, let's filter for all the Chinese restaurants:

In [17]:
# create new dataframe for all Chinese restaurants
chinese_restaurants = pd.DataFrame(columns=column_names)

# Category IDs corresponding to Chinese restaurants were taken from Foursquare web site (https://developer.foursquare.com/docs/resources/categories):
chinese_restaurant_categories = ['4bf58dd8d48988d145941735', '52af3a5e3cf9994f4e043bea', '52af3a7c3cf9994f4e043bed', 
                                 '58daa1558bbb0b01f18ec1d3', '52af3a673cf9994f4e043beb', '52af3a903cf9994f4e043bee', 
                                 '4bf58dd8d48988d1f5931735', '52af3a9f3cf9994f4e043bef', '52af3aaa3cf9994f4e043bf0', 
                                 '52af3ab53cf9994f4e043bf1', '52af3abe3cf9994f4e043bf2', '52af3ac83cf9994f4e043bf3', 
                                 '52af3ad23cf9994f4e043bf4', '52af3add3cf9994f4e043bf5', '52af3af23cf9994f4e043bf7', 
                                 '52af3ae63cf9994f4e043bf6', '52af3afc3cf9994f4e043bf8', '52af3b053cf9994f4e043bf9', 
                                 '52af3b213cf9994f4e043bfa', '52af3b293cf9994f4e043bfb', '52af3b343cf9994f4e043bfc', 
                                 '52af3b3b3cf9994f4e043bfd', '52af3b463cf9994f4e043bfe', '52af3b633cf9994f4e043c01', 
                                 '52af3b513cf9994f4e043bff', '52af3b593cf9994f4e043c00', '52af3b6e3cf9994f4e043c02', 
                                 '52af3b773cf9994f4e043c03', '52af3b813cf9994f4e043c04', '52af3b893cf9994f4e043c05', 
                                 '52af3b913cf9994f4e043c06', '52af3b9a3cf9994f4e043c07', '52af3ba23cf9994f4e043c08']
uid=pd.unique(Venues['Venue Id'])
ID_list = uid.tolist()

for ids in chinese_restaurant_categories:
    if ids in ID_list:
        row = Venues.loc[Venues['Venue Id'] == ids]
        chinese_restaurants = pd.concat([chinese_restaurants, row], ignore_index=True)  # DataFrame.append was removed in pandas 2.0
chinese_restaurants.head()

| | Area | Area Latitude | Area Longitude | Venue Name | Venue Latitude | Venue Longitude | Venue Category | Venue Id |
|---|---|---|---|---|---|---|---|---|
| 0 | M1K | 43.727929 | -79.262029 | Hakka No.1 | 43.727688 | -79.266057 | Chinese Restaurant | 4bf58dd8d48988d145941735 |
| 1 | M1P | 43.75741 | -79.273304 | Kim Kim restaurant | 43.753833 | -79.276611 | Chinese Restaurant | 4bf58dd8d48988d145941735 |
| 2 | M1T | 43.781638 | -79.304302 | The Royal Chinese Restaurant 避風塘小炒 | 43.780505 | -79.298844 | Chinese Restaurant | 4bf58dd8d48988d145941735 |
| 3 | M1W | 43.799525 | -79.318389 | Mr Congee Chinese Cuisine 龍粥記 | 43.798879 | -79.318335 | Chinese Restaurant | 4bf58dd8d48988d145941735 |
| 4 | M1W | 43.799525 | -79.318389 | Phoenix Restaurant 金鳳餐廳 | 43.798198 | -79.318432 | Chinese Restaurant | 4bf58dd8d48988d145941735 |


We now have the Chinese restaurants that Foursquare returned for the Toronto area, and we are ready for the analysis.

## Methodology <a name="methodology"></a>

In this project we will direct our efforts at detecting areas of Toronto that have low restaurant density, particularly those with a low number of Chinese restaurants. 

In the first step we collected the required data: the location and type (category) of the restaurants Foursquare returned for each area of Toronto. We also identified Chinese restaurants (according to Foursquare categorization).

The second step in our analysis will be the calculation and exploration of 'restaurant density' across different areas of Toronto. We will use heatmaps to identify a few promising areas close to the center with a low number of restaurants in general (and no Chinese restaurants in the vicinity) and focus our attention on those areas.

In the third and final step we will focus on the most promising areas and, within those, create clusters of locations that meet some basic requirements established in discussion with stakeholders: we will consider locations with no more than two restaurants within a radius of 500 meters and no Chinese restaurants within a radius of 800 meters. We will present a map of all such locations and also cluster them (using k-means clustering) to identify general zones / neighborhoods / addresses that should be a starting point for the final 'street level' exploration and search for an optimal venue location by stakeholders.

## Analysis <a name="analysis"></a>
Let's perform some basic exploratory data analysis and derive some additional information from our raw data. First, let's count the number of restaurants in every candidate area:

In [19]:
# number of restaurants and Chinese restaurants in each area
# (rename_axis + reset_index(name=...) gives 'Area' and 'Count' columns in both older and newer pandas)
rest_count = restaurants['Area'].value_counts().rename_axis('Area').reset_index(name='Count')
cn_rest_count = chinese_restaurants['Area'].value_counts().rename_axis('Area').reset_index(name='Count')
ll=ll.rename(columns={"Postal Code": "Area"})
rest = pd.merge(rest_count, ll, on='Area')
cn_rest = pd.merge(cn_rest_count, ll, on='Area')

# to get population in each area
population = new.drop(['Borough', 'Neighbourhood'], axis=1)
population.head()

| | Postcode | Latitude | Longitude | Population, 2016 |
|---|---|---|---|---|
| 0 | M1B | 43.806686 | -79.194353 | 66108.0 |
| 1 | M1C | 43.784535 | -79.160497 | 35626.0 |
| 2 | M1E | 43.763573 | -79.188711 | 46943.0 |
| 3 | M1G | 43.770992 | -79.216917 | 29690.0 |
| 4 | M1H | 43.773136 | -79.239476 | 24383.0 |


We now have the number of all restaurants and of Chinese restaurants in each area.

In [20]:
import matplotlib.cm as cm
import matplotlib.colors as colors
from sklearn.cluster import KMeans
!pip install folium
import folium
from folium import plugins
from folium.plugins import HeatMap



Now, let's visualize our data: the downtown location, restaurants as blue dots, and Chinese restaurants as red dots.

In [21]:
# downtown geo info
downtown = [43.6548, -79.3883]

# list of Chinese restaurants
cn_venues = chinese_restaurants['Venue Name'].tolist()

# draw map
map1 = folium.Map(location=downtown, zoom_start=13)
folium.Marker(downtown, popup='downtown').add_to(map1)
for lat, lng, name, area in zip(restaurants['Venue Latitude'], restaurants['Venue Longitude'], restaurants['Venue Name'], restaurants['Area']):
    label = '{}, {}'.format(name, area)
    label = folium.Popup(label, parse_html=True)
    color = 'red' if name in cn_venues else 'blue'
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color=color,
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map1)  
    
map1

Let's create a map showing a heatmap / density of restaurants and try to extract some meaningful information from it.

In [22]:
# [latitude, longitude] of every area that has at least one restaurant
restaurant_latlons = rest[['Latitude', 'Longitude']].values.tolist()

# [latitude, longitude] of every area that has at least one Chinese restaurant
chinese_latlons = cn_rest[['Latitude', 'Longitude']].values.tolist()

# [latitude, longitude] of every area in the population table
population_latlons = population[['Latitude', 'Longitude']].values.tolist()
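
The lists above hold one `[lat, lon]` pair per postcode area, so each area contributes equally to a heatmap regardless of how many restaurants or residents it has. folium's `HeatMap` also accepts `[lat, lon, weight]` triples. A minimal sketch of building such triples, with weights scaled to the range 0 to 1 (the helper `weighted_points` is hypothetical; the sample rows reuse the first three areas of the population table shown earlier):

```python
def weighted_points(rows):
    """Turn (lat, lon, value) rows into [lat, lon, weight], with weight = value / max value."""
    max_value = max(value for _, _, value in rows)
    return [[lat, lon, value / max_value] for lat, lon, value in rows]


population_rows = [
    (43.806686, -79.194353, 66108.0),  # M1B
    (43.784535, -79.160497, 35626.0),  # M1C
    (43.763573, -79.188711, 46943.0),  # M1E
]

population_weighted = weighted_points(population_rows)
for point in population_weighted:
    print(point)
```

A list like `population_weighted` could be passed to `HeatMap(...)` in place of `population_latlons`, and the per-area `Count` column could play the same role for the restaurant heatmaps.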

Let's see the density of all restaurants:

In [23]:
# heat map for all restaurants in Toronto
map_rest = folium.Map(location=downtown, zoom_start=11)
folium.TileLayer('cartodbpositron').add_to(map_rest) #cartodbpositron cartodbdark_matter
HeatMap(restaurant_latlons).add_to(map_rest)
folium.Marker(downtown).add_to(map_rest)
folium.Circle(downtown, radius=2000, fill=False, color='white').add_to(map_rest)
folium.Circle(downtown, radius=4000, fill=False, color='white').add_to(map_rest)
folium.Circle(downtown, radius=6000, fill=False, color='white').add_to(map_rest)
folium.Circle(downtown, radius=8000, fill=False, color='white').add_to(map_rest)
map_rest

Then let's see density of Chinese restaurants only:

In [24]:
# heat map for all Chinese restaurants
map_cn = folium.Map(location=downtown, zoom_start=11)
folium.TileLayer('cartodbpositron').add_to(map_cn) #cartodbpositron cartodbdark_matter
HeatMap(chinese_latlons).add_to(map_cn)
folium.Marker(downtown).add_to(map_cn)
folium.Circle(downtown, radius=2000, fill=False, color='white').add_to(map_cn)
folium.Circle(downtown, radius=4000, fill=False, color='white').add_to(map_cn)
folium.Circle(downtown, radius=6000, fill=False, color='white').add_to(map_cn)
folium.Circle(downtown, radius=8000, fill=False, color='white').add_to(map_cn)
map_cn

Last, let's see the population heatmap for Toronto:

In [25]:
# heat map for Toronto population by each postcode
map_pop = folium.Map(location=downtown, zoom_start=11)
folium.TileLayer('cartodbpositron').add_to(map_pop) #cartodbpositron cartodbdark_matter
HeatMap(population_latlons).add_to(map_pop)
folium.Marker(downtown).add_to(map_pop)
folium.Circle(downtown, radius=2000, fill=False, color='white').add_to(map_pop)
folium.Circle(downtown, radius=4000, fill=False, color='white').add_to(map_pop)
folium.Circle(downtown, radius=6000, fill=False, color='white').add_to(map_pop)
folium.Circle(downtown, radius=8000, fill=False, color='white').add_to(map_pop)
map_pop

From the heatmaps above, we can tell that there are no Chinese restaurants about 2 to 6 km north of downtown, even though that area is quite populated. This is a good direction to explore further.

#### Cartesian 2D coordinate system
To calculate distances accurately we need to create our grid of locations in a 2D Cartesian coordinate system, which allows us to calculate distances in meters (not in latitude/longitude degrees). We will then project those coordinates back to latitude/longitude degrees to show them on a Folium map. So let's create functions to convert between the WGS84 spherical coordinate system (latitude/longitude degrees) and the UTM Cartesian coordinate system (X/Y coordinates in meters).

In [26]:
!pip install shapely
import shapely.geometry
!pip install pyproj
import pyproj
import math

# convert geographic coordinates (lon/lat degrees) to Cartesian coordinates (x/y in meters)
# NOTE: Toronto (longitude about -79.4) lies in UTM zone 17; zone=33 is kept here
# because it is the setting that produced the outputs shown in this notebook
def lonlat_to_xy(lon, lat):
    proj_latlon = pyproj.Proj(proj='latlong',datum='WGS84')
    proj_xy = pyproj.Proj(proj="utm", zone=33, datum='WGS84')
    xy = pyproj.transform(proj_latlon, proj_xy, lon, lat)
    return xy[0], xy[1]

# convert Cartesian coordinates (x/y in meters) back to geographic coordinates (lon/lat degrees)
def xy_to_lonlat(x, y):
    proj_latlon = pyproj.Proj(proj='latlong',datum='WGS84')
    proj_xy = pyproj.Proj(proj="utm", zone=33, datum='WGS84')
    lonlat = pyproj.transform(proj_xy, proj_latlon, x, y)
    return lonlat[0], lonlat[1]

# define a function to calculate distance between 2 points
def calc_xy_distance(x1, y1, x2, y2):
    dx = x2 - x1
    dy = y2 - y1
    return math.sqrt(dx*dx + dy*dy)

# Downtown in Cartesian coordinates (x/y in meters, not longitude/latitude)
dt_x, dt_y = lonlat_to_xy(downtown[1], downtown[0]) 
print("x: ", dt_x, "y: ", dt_y)

x:  -5310261.673015753 y:  10508019.520958198
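
UTM divides the globe into 60 zones, each 6 degrees of longitude wide, and a zone's projection is only accurate near that zone. As a sketch, the standard zone number for a longitude can be computed as below (the helper `utm_zone` is hypothetical and ignores the special-case zones around Norway and Svalbard). For downtown Toronto it gives zone 17, whereas zone 33 covers longitudes from 12 to 18 degrees east, so it may be worth re-running the distance calculations with `zone=17` and comparing the results:

```python
import math


def utm_zone(longitude):
    """Standard UTM zone number (1-60) for a longitude in degrees, east positive."""
    return int(math.floor((longitude + 180.0) / 6.0)) % 60 + 1


downtown = [43.6548, -79.3883]  # [latitude, longitude], as used in the notebook
print('UTM zone for downtown Toronto:', utm_zone(downtown[1]))
print('UTM zone for longitude 15 E:', utm_zone(15.0))
```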


#### Zoom in closer
Let's re-define the map so it shows a closer view of the area. The lighter blue shows that the population of these areas is fairly even (there are no very 'hot' colors). Blue dots are all restaurants other than Chinese restaurants, and red dots are Chinese restaurants.

In [27]:
roi_x_min = dt_x
roi_y_max = dt_y + 2000
roi_width = 5000
roi_height = 5000
roi_center_x = roi_x_min + 2500
roi_center_y = roi_y_max - 2500
roi_center_lon, roi_center_lat = xy_to_lonlat(roi_center_x, roi_center_y)
roi_center = [roi_center_lat, roi_center_lon]

map_toronto = folium.Map(location=roi_center, zoom_start=14)
folium.Marker(downtown).add_to(map_toronto)
folium.Circle(roi_center, radius=1000, fill=False, color='white').add_to(map_toronto)
folium.Circle(roi_center, radius=2000, fill=False, color='white').add_to(map_toronto)
for lat, lng, name, area in zip(restaurants['Venue Latitude'], restaurants['Venue Longitude'], restaurants['Venue Name'], restaurants['Area']):
    label = '{}, {}'.format(name, area)
    label = folium.Popup(label, parse_html=True)
    color = 'red' if name in cn_venues else 'blue'
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color=color,
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto)  
HeatMap(population_latlons).add_to(map_toronto)
map_toronto

#### University of Toronto and Museum area
We noticed that the University of Toronto (U of T) and 2 museums (the Royal Ontario Museum and the Gardiner Museum) are in our targeted area. <br>
<br>
According to Wikipedia, the total enrollment at U of T was 61,690 in 2019, which means that a lot of students come to campus and potentially need food services.<br>
<br>
The 2 museums are also likely to bring tourism to the area. <br>
<br>
According to [timeout.com](https://www.timeout.com/toronto/things-to-do/best-museums-in-toronto), the Royal Ontario Museum is one of the best museums in Toronto. "This museum's expansive collection of cultural and historic artifacts makes it a must-see. Stop by the Royal Ontario Museum to learn about everything from art of the First Peoples to modern fashion to the age of dinosaurs. The museum has an ever-revolving schedule of exhibitions and events, including the summer-long Friday Night Live, which transforms the galleries into a party with live DJs, food and drinks."<br>
<br>
Right across the street is the Gardiner Museum. "All things clay are on display at this museum dedicated to ceramics. Get your hands dirty at one of the Gardiner Museum's regular classes in hand building, wheel throwing and slip casting. If you'd prefer to leave the messy stuff to the experts, spend some time admiring the collection of some 4,000 pieces from the ancient Americas to today." - timeout.com

Let's define a new, narrower region of interest.

In [28]:
roi_x_min = dt_x - 800
roi_y_max = dt_y + 3000
roi_width = 2500
roi_height = 1500
roi_center_x = roi_x_min + 2500
roi_center_y = roi_y_max - 2500
roi_center_lon, roi_center_lat = xy_to_lonlat(roi_center_x, roi_center_y)
roi_center = [roi_center_lat, roi_center_lon]

map_toronto = folium.Map(location=roi_center, zoom_start=15)
folium.Marker(downtown).add_to(map_toronto)
folium.Circle(roi_center, radius=1000, fill=False, color='white').add_to(map_toronto)
folium.Circle(roi_center, radius=2000, fill=False, color='white').add_to(map_toronto)
for lat, lng, name, area in zip(restaurants['Venue Latitude'], restaurants['Venue Longitude'], restaurants['Venue Name'], restaurants['Area']):
    label = '{}, {}'.format(name, area)
    label = folium.Popup(label, parse_html=True)
    color = 'red' if name in cn_venues else 'blue'
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color=color,
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto)  
HeatMap(population_latlons).add_to(map_toronto)
map_toronto

Let's also create a new, denser grid of location candidates restricted to our new region of interest, the area in the white circle above (let's make our location candidates 100m apart).

In [52]:
k = math.sqrt(3) / 2 # Vertical offset for hexagonal grid cells
x_step = 100
y_step = 100 * k 
roi_y_min = roi_center_y - 2500

roi_latitudes = []
roi_longitudes = []
roi_xs = []
roi_ys = []
for i in range(0, int(51/k)):
    y = roi_y_min + i * y_step
    x_offset = 50 if i%2==0 else 0
    for j in range(0, 51):
        x = roi_x_min + j * x_step + x_offset
        d = calc_xy_distance(roi_center_x, roi_center_y, x, y)
        if (d <= 1500):
            lon, lat = xy_to_lonlat(x, y)
            roi_latitudes.append(lat)
            roi_longitudes.append(lon)
            roi_xs.append(x)
            roi_ys.append(y)

print(len(roi_latitudes), 'candidate neighborhood centers generated.')

817 candidate neighborhood centers generated.
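
The factor `k = sqrt(3) / 2` is what makes the grid hexagonal: with rows spaced `x_step * k` apart and every other row shifted by half a step, each point sits exactly one step away from its nearest neighbors in the adjacent rows as well as in its own row. A short check of that geometry (the function `hex_grid` is a simplified, hypothetical stand-in for the loop above, without the circular cut-off):

```python
import math


def hex_grid(x_min, y_min, rows, cols, step=100.0):
    """Generate (x, y) points on a hexagonal grid; even rows are shifted by half a step."""
    k = math.sqrt(3) / 2  # row spacing as a fraction of the step
    points = []
    for i in range(rows):
        y = y_min + i * step * k
        x_offset = step / 2 if i % 2 == 0 else 0.0
        for j in range(cols):
            points.append((x_min + j * step + x_offset, y))
    return points


grid = hex_grid(0.0, 0.0, rows=3, cols=3)
origin = grid[3]  # first point of the second (unshifted) row
distances = sorted(math.hypot(p[0] - origin[0], p[1] - origin[1])
                   for p in grid if p != origin)
nearest = distances[:3]
print([round(d, 6) for d in nearest])
```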


In [56]:
map_t = folium.Map(location=roi_center, zoom_start=15)
folium.Marker(downtown).add_to(map_t)
for lat, lon in zip(roi_latitudes, roi_longitudes):
    folium.Circle([lat, lon], radius=30, color='blue', fill=False).add_to(map_t)
# draw the region-of-interest circle once, outside the loop
folium.Circle(roi_center, radius=1000, fill=False, color='white').add_to(map_t)
map_t

OK. Now let's calculate the two most important things for each location candidate: **the number of restaurants in the vicinity (we'll use a radius of 500 meters)** and **the distance to the closest Chinese restaurant**.

In [59]:
# function to count number of restaurants nearby
def count_restaurants_nearby(x, y, rests, radius=500):    
    count = 0
    for ind in rests.index:
        lon = rests.loc[ind, 'Venue Longitude']
        lat = rests.loc[ind, 'Venue Latitude']
        res_x, res_y = lonlat_to_xy(lon, lat)
        d = calc_xy_distance(x, y, res_x, res_y)
        if d<=radius:
            count += 1
    return count

# function to find the distance (in meters) to the closest restaurant
def find_nearest_restaurant(x, y, restss):
    d_min = 100000
    for ind in restss.index:
        lon = restss.loc[ind, 'Venue Longitude']
        lat = restss.loc[ind, 'Venue Latitude']
        res_x, res_y = lonlat_to_xy(lon, lat)
        d = calc_xy_distance(x, y, res_x, res_y)
        if d<=d_min:
            d_min = d
    return d_min
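
Both helpers above re-project every restaurant for every candidate location, so the same coordinate conversion is repeated for each of the candidates. One possible restructuring is to convert the restaurant coordinates to x/y once and reuse them. A minimal sketch on plain tuples (the function names and sample points are hypothetical; in the notebook the `(x, y)` list would come from applying `lonlat_to_xy` to each restaurant a single time):

```python
import math


def count_within(x, y, points_xy, radius=500.0):
    """Count pre-projected (x, y) points within `radius` meters of (x, y)."""
    return sum(1 for px, py in points_xy if math.hypot(px - x, py - y) <= radius)


def nearest_distance(x, y, points_xy):
    """Distance in meters to the closest pre-projected point, or infinity if there is none."""
    return min((math.hypot(px - x, py - y) for px, py in points_xy), default=math.inf)


# Hypothetical restaurant positions in meters, relative to a candidate at the origin.
restaurants_xy = [(100.0, 0.0), (300.0, 390.0), (0.0, 900.0)]

print(count_within(0.0, 0.0, restaurants_xy))      # two of the three points lie within 500 m
print(nearest_distance(0.0, 0.0, restaurants_xy))  # the closest point is 100 m away
```

Returning infinity for an empty list also avoids relying on a fixed starting value such as `d_min = 100000`.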

In [61]:
roi_restaurant_counts = []
print('Counting number of restaurants on location candidates... ', end='')
for x1, y1 in zip(roi_xs, roi_ys):
    count = count_restaurants_nearby(x1, y1, restaurants, radius=500)
    roi_restaurant_counts.append(count)
print('done.')

Counting number of restaurants on location candidates... done.


In [60]:
roi_chinese_distances = []
print('Generating distance to Chinese restaurants on location candidates... ', end='')
for x1, y1 in zip(roi_xs, roi_ys):
    distance = find_nearest_restaurant(x1, y1, chinese_restaurants)
    roi_chinese_distances.append(distance)
print('done.')

Generating distance to Chinese restaurants on location candidates... done.


In [62]:
# Let's put this into dataframe
df_roi_locations = pd.DataFrame({'Latitude':roi_latitudes,
                                 'Longitude':roi_longitudes,
                                 'X':roi_xs,
                                 'Y':roi_ys,
                                 'Restaurants nearby':roi_restaurant_counts,
                                 'Distance to Chinese restaurant':roi_chinese_distances})

df_roi_locations.head(10)

| | Latitude | Longitude | X | Y | Restaurants nearby | Distance to Chinese restaurant |
|---|---|---|---|---|---|---|
| 0 | 43.664464 | -79.381455 | -5308812.0 | 10507060.0 | 15 | 792.239592 |
| 1 | 43.665085 | -79.381549 | -5308712.0 | 10507060.0 | 15 | 747.234371 |
| 2 | 43.665706 | -79.381644 | -5308612.0 | 10507060.0 | 13 | 713.494807 |
| 3 | 43.666327 | -79.381738 | -5308512.0 | 10507060.0 | 14 | 692.669093 |
| 4 | 43.666948 | -79.381833 | -5308412.0 | 10507060.0 | 17 | 685.934477 |
| 5 | 43.66757 | -79.381928 | -5308312.0 | 10507060.0 | 16 | 693.701478 |
| 6 | 43.662231 | -79.381864 | -5309162.0 | 10507150.0 | 13 | 954.285983 |
| 7 | 43.662852 | -79.381959 | -5309062.0 | 10507150.0 | 12 | 881.503696 |
| 8 | 43.663473 | -79.382054 | -5308962.0 | 10507150.0 | 13 | 811.026757 |
| 9 | 43.664094 | -79.382148 | -5308862.0 | 10507150.0 | 13 | 747.315217 |


OK. Let us now filter those locations: we're interested only in locations with no more than two restaurants within a radius of 500 meters and no Chinese restaurants within a radius of 800 meters.

In [65]:
good_res_count = np.array((df_roi_locations['Restaurants nearby']<=2))
print('Locations with no more than two restaurants nearby:', good_res_count.sum())

good_cn_distance = np.array(df_roi_locations['Distance to Chinese restaurant']>=800)
print('Locations with no Chinese restaurants within 800m:', good_cn_distance.sum())

good_locations = np.logical_and(good_res_count, good_cn_distance)
print('Locations with both conditions met:', good_locations.sum())

df_good_locations = df_roi_locations[good_locations]

Locations with no more than two restaurants nearby: 398
Locations with no Chinese restaurants within 800m: 294
Locations with both conditions met: 256


Let's see how this looks on a map.

In [72]:
good_latitudes = df_good_locations['Latitude'].values
good_longitudes = df_good_locations['Longitude'].values

good_locations = [[lat, lon] for lat, lon in zip(good_latitudes, good_longitudes)]

map_t2 = folium.Map(location=roi_center, zoom_start=15)
folium.TileLayer('cartodbpositron').add_to(map_t2)
HeatMap(population_latlons).add_to(map_t2)
folium.Circle(roi_center, radius=1000, color='white', fill=True, fill_opacity=0.3).add_to(map_t2)
folium.Marker(downtown).add_to(map_t2)
for lat, lon in zip(good_latitudes, good_longitudes):
    folium.CircleMarker([lat, lon], radius=2, color='blue', fill=True, fill_color='blue', fill_opacity=1).add_to(map_t2) 
map_t2

Let's now show those good locations in the form of a heatmap:

In [73]:
map_l = folium.Map(location=roi_center, zoom_start=15)
HeatMap(good_locations, radius=25).add_to(map_l)
folium.Marker(downtown).add_to(map_l)
for lat, lon in zip(good_latitudes, good_longitudes):
    folium.CircleMarker([lat, lon], radius=2, color='blue', fill=True, fill_color='blue', fill_opacity=1).add_to(map_l)
map_l

What we have now is a clear indication of zones with a low number of restaurants in the vicinity and no Chinese restaurants at all nearby.

Let us now cluster those locations to create centers of zones containing good locations. Those zones, their centers and addresses will be the final result of our analysis.

In [76]:
from sklearn.cluster import KMeans

number_of_clusters = 15

good_xys = df_good_locations[['X', 'Y']].values
kmeans = KMeans(n_clusters=number_of_clusters, random_state=0).fit(good_xys)

cluster_centers = [xy_to_lonlat(cc[0], cc[1]) for cc in kmeans.cluster_centers_]

map_t = folium.Map(location=roi_center, zoom_start=15)
folium.TileLayer('cartodbpositron').add_to(map_t)
folium.Circle(roi_center, radius=1000, color='white', fill=True, fill_opacity=0.3).add_to(map_t)
folium.Marker(downtown).add_to(map_t)
for lon, lat in cluster_centers:
    folium.Circle([lat, lon], radius=500, color='green', fill=True, fill_opacity=0.15).add_to(map_t) 
for lat, lon in zip(good_latitudes, good_longitudes):
    label = '{}, {}'.format(lat, lon)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker([lat, lon], radius=2, color='blue', popup=label, fill=True, fill_color='blue', fill_opacity=1).add_to(map_t)
map_t
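
For readers unfamiliar with what `KMeans` does to the candidate points, the following is a bare-bones sketch of the underlying iteration (Lloyd's algorithm) in plain Python. It is not scikit-learn's implementation, which uses smarter initialization and stopping rules; the function name, starting centers, and sample points are hypothetical:

```python
import math


def kmeans_lloyd(points, centers, iterations=10):
    """Alternate between assigning points to the nearest center and recomputing centers."""
    centers = list(centers)
    labels = []
    for _ in range(iterations):
        # Assignment step: index of the closest center for every point.
        labels = [
            min(range(len(centers)),
                key=lambda c: math.hypot(p[0] - centers[c][0], p[1] - centers[c][1]))
            for p in points
        ]
        # Update step: move each center to the mean of its assigned points.
        for c in range(len(centers)):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the old center if a cluster ends up empty
                centers[c] = (sum(m[0] for m in members) / len(members),
                              sum(m[1] for m in members) / len(members))
    return centers, labels


# Two well-separated groups of hypothetical x/y locations (meters).
points = [(0.0, 0.0), (100.0, 0.0), (0.0, 100.0),
          (1000.0, 1000.0), (1100.0, 1000.0), (1000.0, 1100.0)]
centers, labels = kmeans_lloyd(points, centers=[(0.0, 0.0), (1000.0, 1000.0)])
print(labels)
print(centers)
```

Each returned center is the mean position of the locations assigned to it, which is why the cluster centers in the notebook can be read as the middle of a zone of good locations.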

Since we want the locations to be as close to downtown as possible, and also close to U of T and the 2 museums, the locations on the south side would better satisfy our requirements. <br> <br>
Let's zoom in on that area:

In [80]:
# coordinates of one of the locations on the south side
ln = [43.662150375767396, -79.39460005529365]
map_t3 = folium.Map(location=ln, zoom_start=16)
folium.Marker(downtown).add_to(map_t3)
for lon, lat in cluster_centers:
    folium.Circle([lat, lon], radius=500, color='green', fill=False).add_to(map_t3) 
for lat, lon in zip(good_latitudes, good_longitudes):
    folium.Circle([lat, lon], radius=250, color='#0000ff00', fill=True, fill_color='#0066ff', fill_opacity=0.03).add_to(map_t3)
for lat, lon in zip(good_latitudes, good_longitudes):
    folium.CircleMarker([lat, lon], radius=2, color='blue', fill=True, fill_color='blue', fill_opacity=1).add_to(map_t3)
map_t3

Finally, let's reverse geocode those candidate area centers to get addresses that can be presented to stakeholders.

In [100]:
# import the ArcGIS API for Python.
from arcgis.gis import GIS
from arcgis.geocoding import geocode, reverse_geocode
from arcgis.geometry import Point
# Log into ArcGIS Online as an anonymous user.
gis = GIS()

In [101]:
candidate_area_addresses = []
print('==============================================================')
print('Addresses of centers of areas recommended for further analysis')
print('==============================================================\n')
for lon, lat in cluster_centers:
    location = {
     'Y': lat,  # `Y` is latitude
     'X': lon,  # `X` is longitude
    }
    unknown_pt = Point(location)
    address = reverse_geocode(location=unknown_pt)
    addr = address['address']['Match_addr']
    candidate_area_addresses.append(addr)    
    x, y = lonlat_to_xy(lon, lat)
    d = calc_xy_distance(x, y, dt_x, dt_y)
    print('{:<50} => {:.1f}km from Downtown, Toronto'.format(addr, d/1000))

Addresses of centers of areas recommended for further analysis

501-565 Huron St, Toronto, Ontario, M5R 2R6        => 2.9km from Downtown, Toronto
13 Hoskin Ave, Toronto, Ontario, M5S               => 1.8km from Downtown, Toronto
76 Hazelton Ave, Toronto, Ontario, M5R 2E2         => 3.0km from Downtown, Toronto
160 College St, Toronto, Ontario, M5S 3E1          => 1.0km from Downtown, Toronto
139 St George St, Toronto, Ontario, M5R 2L8        => 2.5km from Downtown, Toronto
Amaya Express                                      => 2.9km from Downtown, Toronto
34 Boswell Ave, Toronto, Ontario, M5R 1M4          => 3.0km from Downtown, Toronto
23 Bedford Rd, Toronto, Ontario, M5R 2J9           => 2.6km from Downtown, Toronto
58-98 King's College Cir, Toronto, Ontario, M5S    => 1.3km from Downtown, Toronto
273 Bloor St W, Toronto, Ontario, M5S 1W2          => 2.2km from Downtown, Toronto
135 Yorkville Ave, Toronto, Ontario, M5R 0C7       => 2.6km from Downtown, Toronto
Queen's Park Cres W, To

This concludes our analysis. We have generated 15 addresses representing the centers of zones that contain locations with a low number of restaurants and no Chinese restaurants nearby, with all zones fairly close to the city center (all less than 3.5 km from Downtown Toronto). Although the zones are shown on the map with a radius of ~500 meters (green circles), their shape is actually very irregular, and their centers/addresses should be considered only as starting points for exploring the surrounding neighborhoods in search of potential restaurant locations. All of them are close to U of T and the two museums, an area we identified as interesting because it is popular with students and tourists.
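To get a feel for how far a zone can extend beyond its 500 m circle, one could measure each zone's size and reach. The sketch below is only an illustration under stated assumptions: `zone_extents` is a hypothetical helper, the coordinates are made-up toy values in metres, and in the notebook the labels and centers would presumably come from `kmeans.labels_` and `kmeans.cluster_centers_`.

```python
import math

def zone_extents(points, labels, centers):
    """For each zone, return (number of locations, distance in metres from
    the zone center to its farthest location)."""
    stats = []
    for k, (cx, cy) in enumerate(centers):
        distances = [math.hypot(x - cx, y - cy)
                     for (x, y), lab in zip(points, labels) if lab == k]
        stats.append((len(distances), max(distances, default=0.0)))
    return stats

# Toy X/Y data in metres (hypothetical, not the notebook's real locations).
points = [(0.0, 0.0), (300.0, 400.0), (2000.0, 0.0), (2900.0, 1200.0)]
labels = [0, 0, 1, 1]
centers = [(0.0, 0.0), (2000.0, 0.0)]

for zone, (count, reach) in enumerate(zone_extents(points, labels, centers)):
    print('zone {}: {} locations, farthest one {:.0f} m from the center'.format(zone, count, reach))
```

A zone whose farthest location lies well beyond 500 m is a hint that the green circle understates the area worth exploring.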

In [105]:
map_t4 = folium.Map(location=roi_center, zoom_start=15)
folium.Circle(downtown, radius=50, color='red', fill=True, fill_color='red', fill_opacity=1).add_to(map_t4)
for lonlat, addr in zip(cluster_centers, candidate_area_addresses):
    folium.Marker([lonlat[1], lonlat[0]], popup=addr).add_to(map_t4) 
for lat, lon in zip(good_latitudes, good_longitudes):
    folium.Circle([lat, lon], radius=250, color='#0000ff00', fill=True, fill_color='#0066ff', fill_opacity=0.02).add_to(map_t4)
map_t4

## Results & Discussion <a name="results"></a>

Our analysis shows that although there is a great number of restaurants in Toronto (more than 2000), there are pockets of low restaurant density fairly close to Downtown. The highest concentration of restaurants was detected in the Downtown area and about 6 to 8 km north of Downtown. The population distribution looks much the same as the restaurant density, except that the population in the area 2 to 6 km north of Downtown is also fairly high. That area is therefore populated but not crowded with restaurants, so we focused our attention on it. It contains parks, a university, and museums; we concentrated on the part close to U of T, the Royal Ontario Museum, and the Gardiner Museum, which offers a combination of popularity among students and tourists, strong socio-economic dynamics, and a number of pockets of low restaurant density.

After directing our attention to this narrower area of interest, we first created a dense grid of location candidates (spaced 100 m apart); those locations were then filtered to remove any with more than two restaurants within a radius of 500 m or with a Chinese restaurant closer than 800 m.
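The filtering rule can be sketched in a few lines of plain Python. This is a minimal illustration rather than the notebook's actual implementation: the function names and the toy coordinates are hypothetical, coordinates are assumed to be Cartesian X/Y in metres, and a venue exactly on a radius boundary is assumed to count as outside it.

```python
import math

def count_within(point, venues, radius):
    """Count venues whose straight-line distance to `point` is below `radius` (metres)."""
    px, py = point
    return sum(1 for vx, vy in venues if math.hypot(vx - px, vy - py) < radius)

def filter_candidates(candidates, restaurants, chinese_restaurants,
                      max_restaurants=2, restaurant_radius=500.0,
                      chinese_radius=800.0):
    """Keep candidates with at most `max_restaurants` restaurants nearby
    and no Chinese restaurant within `chinese_radius`."""
    kept = []
    for point in candidates:
        nearby = count_within(point, restaurants, restaurant_radius)
        chinese_nearby = count_within(point, chinese_restaurants, chinese_radius)
        if nearby <= max_restaurants and chinese_nearby == 0:
            kept.append(point)
    return kept

# Toy data in metres (hypothetical, not taken from the Foursquare results).
candidates = [(0.0, 0.0), (1000.0, 0.0), (2000.0, 0.0)]
chinese_restaurants = [(2600.0, 0.0)]
restaurants = [(100.0, 0.0), (200.0, 0.0), (300.0, 0.0), (2100.0, 0.0)] + chinese_restaurants

good = filter_candidates(candidates, restaurants, chinese_restaurants)
print(good)
```

In this toy run the first candidate is dropped because it has three restaurants within 500 m, the third because a Chinese restaurant is 600 m away, and only the middle candidate survives.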

Those location candidates were then clustered to create zones of interest that contain the greatest number of location candidates. Addresses of the centers of those zones were also generated using reverse geocoding, to be used as markers/starting points for more detailed local analysis based on other factors.
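To make the clustering step concrete, here is a minimal plain-Python sketch of the k-means idea (Lloyd's algorithm). It is not scikit-learn's `KMeans`, which the notebook actually uses and which also handles initialization and restarts; the `kmeans` function, its parameters, and the toy points are assumptions made only for illustration.

```python
import math

def kmeans(points, initial_centers, iterations=20):
    """Plain Lloyd's algorithm on 2-D points; returns (centers, labels)."""
    centers = list(initial_centers)
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest center.
        labels = [
            min(range(len(centers)),
                key=lambda k: math.hypot(p[0] - centers[k][0], p[1] - centers[k][1]))
            for p in points
        ]
        # Update step: move each center to the mean of its members.
        new_centers = []
        for k, old in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                new_centers.append((sum(m[0] for m in members) / len(members),
                                    sum(m[1] for m in members) / len(members)))
            else:
                new_centers.append(old)  # an empty cluster keeps its old center
        if new_centers == centers:
            break  # converged: centers no longer move
        centers = new_centers
    return centers, labels

# Two well-separated toy groups of candidate locations (X/Y in metres, hypothetical).
points = [(0.0, 0.0), (100.0, 0.0), (0.0, 100.0), (100.0, 100.0),
          (1000.0, 1000.0), (1100.0, 1000.0), (1000.0, 1100.0), (1100.0, 1100.0)]
centers, labels = kmeans(points, initial_centers=[points[0], points[4]])
print(centers)
print(labels)
```

Each returned center is the mean position of the candidates assigned to it, which is why a zone center is a useful starting point but not necessarily a good location itself.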

The result of all this is 15 zones containing the largest number of potential new restaurant locations, based on the number of and distance to existing venues, both restaurants in general and Chinese restaurants in particular. This, of course, does not imply that those zones are actually optimal locations for a new restaurant! The purpose of this analysis was only to provide information on areas close to Downtown Toronto that are not crowded with existing restaurants (particularly Chinese ones). It is entirely possible that there is a very good reason for the small number of restaurants in any of those areas, one that would make them unsuitable for a new restaurant regardless of the lack of competition. The recommended zones should therefore be considered only as a starting point for a more detailed analysis, which could eventually yield a location that not only has no nearby competition but also meets all other relevant conditions.

## Conclusion <a name="conclusion"></a>

The purpose of this project was to identify areas of Toronto close to Downtown with a low number of restaurants (particularly Chinese restaurants) in order to help stakeholders narrow down the search for an optimal location for a new Chinese restaurant. By calculating the restaurant density distribution from Foursquare data, we first identified the general boroughs that justify further analysis (an area north of Downtown), and then generated an extensive collection of locations that satisfy some basic requirements regarding existing nearby restaurants. Those locations were then clustered to create major zones of interest (containing the greatest number of potential locations), and addresses of the zone centers were generated to be used as starting points for final exploration by stakeholders.

The final decision on the optimal restaurant location will be made by stakeholders based on the specific characteristics of the neighborhoods and locations in every recommended zone, taking into consideration additional factors such as the attractiveness of each location (proximity to a park or water), noise levels / proximity to major roads, real estate availability, prices, and the social and economic dynamics of every neighborhood.