# Analysis of South Asian Restaurant Locations <br /> in the City of Brampton

Taranveer Birk

## Introduction

The Greater Toronto Area is home to immigrants from all parts of the world. Collectively, its population is a diverse mix, although many well-known enclaves of particular ethnic descent exist throughout the cities surrounding Toronto. A very pronounced enclave is the South Asian (Afghanistan, Bangladesh, Bhutan, India, Maldives, Nepal, Pakistan, Sri Lanka) community within the city of Brampton. The 2015 Canadian census reported that roughly __ % of residents were of South Asian descent, making Brampton one of the few Canadian cities in which a single ethnic community outnumbers every other community. Given this strong presence, this report targets stakeholders interested in opening a South Asian restaurant in the city of Brampton.

To provide the best context for the analysis, this report sources spatial and venue data to answer a few initial questions. The first step is selecting a central point within the city, which we'll use to define a grid of candidate neighborhood centroids based on distance from that point. Stakeholders will want to compare the current landscape of venues across the city and then look at the penetration of South Asian restaurants. Once we've collected the required data, we can map the locations and use machine learning techniques to identify optimal locations based on the ideal requirements. Finally, we'll discuss the outcome and see whether we can narrow our location list down to a few centroids for stakeholders to examine in a street-level analysis.


## Data

As stated in the introduction, we require a few external data sources to help with our analysis:

*  StatsCan - 2015 Census data, to provide background on the ethnic population distribution of the federal census tracts in Brampton.

* The geo-coordinates of a central address within the city of Brampton will be obtained using the Google Maps Geocoding API (forward geocoding, from address to coordinates).

* Venue data for locational analysis will be obtained using the Foursquare API. The venue data will assist in mapping locations and calculating distance from the city centre.

* Addresses of centroid locations created during analysis will also be obtained using the Google Maps API for reverse geocoding. Addresses will be used to present optimal locations in the results section of the report. 

Once the data has been collected in our notebook, it will be merged and processed to complete a clustering analysis based on the requirements of an ideal space within the city.


## Assigning Neighborhood Candidates
First, we need to find a central location within the city, one that covers a large portion of the city's populated land when scaled out by a radius. The location selected is ‘140 Kennedy Rd N, Brampton, Ontario’. It is close to downtown, so we'll refer to it as the 'City Center'. A 12 km radius around it covers most of Brampton, excluding the west end. To begin assigning our neighborhood centroids, we use the Google Maps API to acquire the geo-coordinates of this address.


In [47]:
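import os

# Read the key from an environment variable (GOOGLE_API_KEY is an assumed name) instead of hard-coding it
google_api_key = os.environ.get('GOOGLE_API_KEY')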

In [48]:
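import requests

GEOCODE_URL = 'https://maps.googleapis.com/maps/api/geocode/json'

def get_coordinates(api_key, address, verbose=False):
    """Forward-geocode an address to [lat, lon]; returns [None, None] on any failure."""
    try:
        # Passing params lets requests URL-encode the address (spaces, commas)
        response = requests.get(GEOCODE_URL, params={'key': api_key, 'address': address}).json()
        if verbose:
            print('Google Maps API JSON result =>', response)
        geographical_data = response['results'][0]['geometry']['location'] # get geographical coordinates
        return [geographical_data['lat'], geographical_data['lng']]
    except (requests.RequestException, ValueError, KeyError, IndexError):
        # network error, non-JSON body, or no results for this address
        return [None, None]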
    
#address = 'Brampton, ON, Canada'
address = '140 Kennedy Rd N, Brampton, ON, Canada'

brampton_center = get_coordinates(google_api_key, address)
print('Coordinate of {}: {}'.format(address, brampton_center))

Coordinate of 140 Kennedy Rd N, Brampton, ON, Canada: [43.703226, -79.758977]


In [49]:
!pip install shapely
import shapely.geometry

!pip install pyproj
import pyproj

import math

#lat = '43.70852'
#lon = '-79.767456'
#brampton_center = (lat,lon)

def lonlat_to_xy(lon, lat):
    proj_latlon = pyproj.Proj(proj='latlong',datum='WGS84')
    proj_xy = pyproj.Proj(proj="utm", zone=33, datum='WGS84')
    xy = pyproj.transform(proj_latlon, proj_xy, lon, lat)
    return xy[0], xy[1]

def xy_to_lonlat(x, y):
    proj_latlon = pyproj.Proj(proj='latlong',datum='WGS84')
    proj_xy = pyproj.Proj(proj="utm", zone=33, datum='WGS84')
    lonlat = pyproj.transform(proj_xy, proj_latlon, x, y)
    return lonlat[0], lonlat[1]

def calc_xy_distance(x1, y1, x2, y2):
    dx = x2 - x1
    dy = y2 - y1
    return math.sqrt(dx*dx + dy*dy)

print('Coordinate transformation check')
print('-------------------------------')
print('Brampton center longitude={}, latitude={}'.format(brampton_center[1], brampton_center[0]))
x, y = lonlat_to_xy(brampton_center[1], brampton_center[0])
print('Brampton center UTM X={}, Y={}'.format(x, y))
lo, la = xy_to_lonlat(x, y)
print('Brampton center longitude={}, latitude={}'.format(lo, la))

Coordinate transformation check
-------------------------------
Brampton center longitude=-79.758977, latitude=43.703226
Brampton center UTM X=-5297633.433080528, Y=10549881.596716313
Brampton center longitude=-79.7589770000005, latitude=43.703225999999795
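The helpers above project with `zone=33`, which covers 12°E to 18°E (the zone for Berlin, for example); Brampton, at about 79.76°W, lies in UTM zone 17. Projecting this far outside a zone is why the check prints an easting of about -5.3 million, whereas eastings inside a zone stay within a few hundred kilometres of the 500,000 m false easting, and it means `calc_xy_distance` results should not be read as true ground metres. For instance, rows 0 and 1 of `df_counters` below are 980 units apart in X but only about 680 m apart on the ground. A minimal sketch of a zone-aware alternative follows: it derives the zone from the longitude and builds `pyproj.Transformer` objects, which pyproj recommends over the older `pyproj.transform` function. The helper names are hypothetical, and every result in this notebook was produced with zone 33, so switching would change the grid and distances reported below.

```python
import math

def utm_zone(lon):
    """UTM zone number (1-60) containing a longitude given in degrees."""
    return int(math.floor((lon + 180.0) / 6.0)) % 60 + 1

def utm_epsg(lat, lon):
    """EPSG code of the WGS 84 / UTM zone for a point: 326xx north of the equator, 327xx south."""
    return (32600 if lat >= 0 else 32700) + utm_zone(lon)

def make_transformers(lat, lon):
    """Return (lonlat -> xy, xy -> lonlat) transformers for the UTM zone containing (lat, lon).

    Requires pyproj 2.2 or later; imported here so the helpers above work without it.
    """
    from pyproj import Transformer
    utm_crs = 'EPSG:{}'.format(utm_epsg(lat, lon))
    to_xy = Transformer.from_crs('EPSG:4326', utm_crs, always_xy=True)
    to_lonlat = Transformer.from_crs(utm_crs, 'EPSG:4326', always_xy=True)
    return to_xy, to_lonlat

# Usage with the city center found above (always_xy=True means lon, lat order):
# to_xy, to_lonlat = make_transformers(43.703226, -79.758977)
# x, y = to_xy.transform(-79.758977, 43.703226)
# lon, lat = to_lonlat.transform(x, y)
```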


## Creating and Mapping Neighborhood Centroids
We’ll then use the longitude and latitude of our city centre to create identical circular grid cells covering our area of interest. We aim to maximize the surface area within city limits while keeping the total number of centres below 400 candidates. Both the radius of the total surface area and the radius of each centroid can be adjusted in the code, which is important for fitting the grid to city limits, city planning and population.
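Whether the circles around each centroid actually cover the area depends on the grid spacing. In a hexagonal grid with spacing $s$ between neighbouring centres, each centre's catchment is a regular hexagon whose corners lie $s/\sqrt{3}$ from the centre, so circles of radius $r$ leave no gaps only when

$$r \ge \frac{s}{\sqrt{3}}, \qquad \frac{980\ \text{m}}{\sqrt{3}} \approx 566\ \text{m}.$$

With the nominal 980 m spacing used below, per-centroid radii smaller than this, such as the 340 m circles drawn on the map or the 350 m Foursquare search radius used later, leave gaps between neighbouring cells.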

In [50]:
brampton_center_x, brampton_center_y = lonlat_to_xy(brampton_center[1], brampton_center[0]) # City center in Cartesian coordinates

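k = math.sqrt(3) / 2 # Vertical offset for hexagonal grid cells
x_min = brampton_center_x - 9800  # grid extends 9,800 units (the grid radius) left of the center
x_step = 980                      # spacing between neighbouring centres in a row
y_min = brampton_center_y - 9800 - (int(21/k)*k*980 - 19600)/2  # shift the rows so they sit roughly centred on the center
y_step = 980 * k                  # row spacing for a hexagonal layout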

latitudes = []
longitudes = []
distances_from_center = []
xs = []
ys = []
for i in range(0, int(21/k)):
    y = y_min + i * y_step
    x_offset = 490 if i%2==0 else 0
    for j in range(0, 21):
        x = x_min + j * x_step + x_offset
        distance_from_center = calc_xy_distance(brampton_center_x, brampton_center_y, x, y)
        if (distance_from_center <= 9801):
            lon, lat = xy_to_lonlat(x, y)
            latitudes.append(lat)
            longitudes.append(lon)
            distances_from_center.append(distance_from_center)
            xs.append(x)
            ys.append(y)

print(len(latitudes), 'candidate neighborhood centers generated.')
#print(latitudes, longitudes)

364 candidate neighborhood centers generated.
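The cell above hard-codes the grid radius (9,800), the spacing (980) and the number of columns (21), so changing the coverage means editing several numbers consistently. A minimal sketch of a parameterised version follows; `hex_grid` is a hypothetical helper that centres the grid on the given point and keeps only centres within `radius`. Because its rows are aligned slightly differently from the cell above, `hex_grid(brampton_center_x, brampton_center_y, 9800, 980)` may not return exactly the same 364 centres.

```python
import math

def hex_grid(center_x, center_y, radius, spacing):
    """Return (x, y) centres of a hexagonal grid clipped to a circle.

    Rows are spacing * sqrt(3) / 2 apart and every other row is shifted by
    spacing / 2, so each interior centre has six equidistant neighbours.
    """
    row_step = spacing * math.sqrt(3) / 2
    n_rows = int(radius // row_step)
    n_cols = int(radius // spacing) + 1
    points = []
    for i in range(-n_rows, n_rows + 1):
        y = center_y + i * row_step
        x_offset = spacing / 2 if i % 2 else 0.0  # shift odd rows by half a step
        for j in range(-n_cols, n_cols + 1):
            x = center_x + j * spacing + x_offset
            if math.hypot(x - center_x, y - center_y) <= radius:
                points.append((x, y))
    return points

# Usage: same coverage as the cell above, but every parameter in one place
# grid = hex_grid(brampton_center_x, brampton_center_y, radius=9800, spacing=980)
```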


In [51]:
import pandas as pd

df_counters = pd.DataFrame(
    {
     'latitudes' : latitudes,
     'longitudes' : longitudes,
    })
df_counters.head()

|   | latitudes | longitudes |
|---|---|---|
| 0 | 43.691866 | -79.676063 |
| 1 | 43.697953 | -79.677052 |
| 2 | 43.70404 | -79.678042 |
| 3 | 43.710129 | -79.679033 |
| 4 | 43.716217 | -79.680023 |


In [52]:
# Let's validate that our DataFrame has the right number of geo-coordinates
locations = df_counters[['latitudes', 'longitudes']]
locationlist = locations.values.tolist()
len(locationlist)

364

In [53]:
#!pip install folium
import folium

In [54]:
# Use the Brampton centre and the list of geo-coordinates to map the neighborhood centroids.

map_brampton = folium.Map(location= brampton_center, zoom_start=13)
folium.Marker(brampton_center, popup ='City Center').add_to(map_brampton)
folium.Circle(brampton_center,radius=40, color='green', fill=True, fill_opacity=0.4).add_to(map_brampton)
for lat, lon in zip(df_counters['latitudes'], df_counters['longitudes']):
    #folium.CircleMarker([lat, lon], radius=2, color='blue', fill=True, fill_color='blue', fill_opacity=1).add_to(map_brampton) 
    folium.Circle([lat, lon], radius=340, color='blue', fill=False).add_to(map_brampton)
    #folium.Marker([lat, lon]).add_to(map_brampton)
map_brampton

In [55]:
# google_api_key was defined in the first cell and is reused here

def get_address(api_key, latitude, longitude, verbose=False):
    """Reverse-geocode a coordinate to a formatted address; returns None on any failure."""
    try:
        url = 'https://maps.googleapis.com/maps/api/geocode/json'
        params = {'key': api_key, 'latlng': '{},{}'.format(latitude, longitude)}
        response = requests.get(url, params=params).json()
        if verbose:
            print('Google Maps API JSON result =>', response)
        return response['results'][0]['formatted_address']
    except (requests.RequestException, ValueError, KeyError, IndexError):
        return None

addr = get_address(google_api_key, brampton_center[0], brampton_center[1])
print('Reverse geocoding check')
print('-----------------------')
print('Address of [{}, {}] is: {}'.format(brampton_center[0], brampton_center[1], addr))

Reverse geocoding check
-----------------------
Address of [43.703226, -79.758977] is: 140 Kennedy Rd N, Brampton, ON L6V 2N4, Canada


In [56]:
print('Obtaining location addresses: ', end='')
addresses = []
for lat, lon in zip(latitudes, longitudes):
    address = get_address(google_api_key, lat, lon)
    if address is None:
        address = 'NO ADDRESS'
    address = address.replace(', Brampton', '') # Drop the city name; the country suffix is kept
    addresses.append(address)
    print(' .', end='')
print(' done.')

Obtaining location addresses:  . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . done.




The **Google API** provides an address for each candidate location. We combine these addresses with the coordinates and the distances from the centre calculated above into a DataFrame and, once the data has been validated, save it to a pickle file.




In [57]:
# Print the first 5 addresses; review all addresses to ensure there are no formatting errors.
addresses[0:5]

['7400 Bramalea Rd, Mississauga, ON L5S 1X1, Canada',
 '7505 Bramalea Rd, Mississauga, ON L5S 1C4, Canada',
 'Express Toll Route, ON L6T, Canada',
 '20 Melanie Dr, ON L6T 4K8, Canada',
 '2550 Steeles Ave E, ON L6T, Canada']
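Reviewing all 364 addresses by eye is error-prone. A minimal sketch of an automated check follows; it flags the `'NO ADDRESS'` placeholder used in the loop above and addresses that still name a neighbouring municipality, such as the two Mississauga results listed first. The helper name and the list of neighbouring municipalities are assumptions for illustration, not part of the original notebook.

```python
# Assumed list of municipalities bordering Brampton; adjust as needed
NEIGHBOURING_MUNICIPALITIES = ('Mississauga', 'Caledon', 'Vaughan', 'Halton Hills', 'Toronto')

def flag_addresses(addresses, other_municipalities=NEIGHBOURING_MUNICIPALITIES):
    """Return (index, reason) pairs for addresses that need a manual look.

    Assumes addresses were cleaned as above (', Brampton' removed), so an
    address that still names another municipality lies outside the city.
    """
    flagged = []
    for i, address in enumerate(addresses):
        if address == 'NO ADDRESS':
            flagged.append((i, 'no address returned'))
        elif any(', {},'.format(m) in address for m in other_municipalities):
            flagged.append((i, 'outside Brampton'))
    return flagged

# Usage: flag_addresses(addresses)
```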

In [58]:
import pandas as pd

df_locations = pd.DataFrame({'Address': addresses,
                             'Latitudes': latitudes,
                             'Longitudes': longitudes,
                             'X': xs,
                             'Y': ys,
                             'Distance from center': distances_from_center})

df_locations.head(5)

|   | Address | Latitudes | Longitudes | X | Y | Distance from center |
|---|---|---|---|---|---|---|
| 0 | 7400 Bramalea Rd, Mississauga, ON L5S 1X1, Canada | 43.691866 | -79.676063 | -5300573.0 | 10540550.0 | 9787.742334 |
| 1 | 7505 Bramalea Rd, Mississauga, ON L5S 1C4, Canada | 43.697953 | -79.677052 | -5299593.0 | 10540550.0 | 9539.281944 |
| 2 | Express Toll Route, ON L6T, Canada | 43.70404 | -79.678042 | -5298613.0 | 10540550.0 | 9387.04959 |
| 3 | 20 Melanie Dr, ON L6T 4K8, Canada | 43.710129 | -79.679033 | -5297633.0 | 10540550.0 | 9335.753853 |
| 4 | 2550 Steeles Ave E, ON L6T, Canada | 43.716217 | -79.680023 | -5296653.0 | 10540550.0 | 9387.04959 |


In [59]:
df_locations.to_pickle('./locations.pkl')    

## Foursquare API
Using existing Foursquare credentials, we want to leverage the venue database to acquire a list, with coordinates, of all restaurants and South Asian restaurants within the city of Brampton. Once we’ve acquired this data, we can look at a few simple metrics that describe the restaurant landscape within the city:
1. Total number of restaurants
2. Total number of South Asian restaurants
3. Percentage of South Asian restaurants
4. Average number of restaurants in each neighborhood centroid

To select only restaurant venues, the Foursquare API allows venues to be filtered by specific category IDs when pulling data. This is very useful for limiting our data set and for defining categorical variables that can be visualized along with their locations.


In [60]:
import os

# Read Foursquare credentials from the environment instead of hard-coding and printing them
# (FOURSQUARE_CLIENT_ID / FOURSQUARE_CLIENT_SECRET are assumed variable names)
CLIENT_ID = os.environ.get('FOURSQUARE_CLIENT_ID') # your Foursquare ID
CLIENT_SECRET = os.environ.get('FOURSQUARE_CLIENT_SECRET') # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

print('Foursquare credentials loaded:', bool(CLIENT_ID and CLIENT_SECRET))


In [61]:
# Category IDs corresponding to South Asian restaurants, sourced from the Foursquare website (https://developer.foursquare.com/docs/resources/categories):

food_category = '4d4b7105d754a06374d81259' # 'Root' category for all food-related venues

southasian_restaurant_categories = ['4bf58dd8d48988d10f941735','54135bf5e4b08f3d2429dfe5','54135bf5e4b08f3d2429dff3',
'54135bf5e4b08f3d2429dff5','54135bf5e4b08f3d2429dfe2','54135bf5e4b08f3d2429dff2',
'54135bf5e4b08f3d2429dfe1','54135bf5e4b08f3d2429dfe3','54135bf5e4b08f3d2429dfe8',
'54135bf5e4b08f3d2429dfe9','54135bf5e4b08f3d2429dfe6','54135bf5e4b08f3d2429dfdf',
'54135bf5e4b08f3d2429dfe4','54135bf5e4b08f3d2429dfe7','54135bf5e4b08f3d2429dfea',
'54135bf5e4b08f3d2429dfeb','54135bf5e4b08f3d2429dfed','54135bf5e4b08f3d2429dfee',
'54135bf5e4b08f3d2429dff4','54135bf5e4b08f3d2429dfe0','54135bf5e4b08f3d2429dfdd',
'54135bf5e4b08f3d2429dff6','54135bf5e4b08f3d2429dfef','54135bf5e4b08f3d2429dff0',
'54135bf5e4b08f3d2429dff1','54135bf5e4b08f3d2429dfde','54135bf5e4b08f3d2429dfec',]

def is_restaurant(categories, specific_filter=None):
    restaurant_words = ['restaurant', 'diner', 'taverna', 'steakhouse']
    restaurant = False
    specific = False
    for c in categories:
        category_name = c[0].lower()
        category_id = c[1]
        for r in restaurant_words:
            if r in category_name:
                restaurant = True
        if 'fast food' in category_name:
            restaurant = False
        if not(specific_filter is None) and (category_id in specific_filter):
            specific = True
            restaurant = True
    return restaurant, specific

def get_categories(categories):
    return [(cat['name'], cat['id']) for cat in categories]

def format_address(location):
    address = ', '.join(location['formattedAddress'])
    address = address.replace(', Canada', '')
    address = address.replace('Canada, ', '')
    return address

# Define characteristics to filter against the Foursquare database
def get_venues_near_location(lat, lon, category, CLIENT_ID, CLIENT_SECRET, radius=500, limit=100):
    version = '20180724'
    url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&categoryId={}&radius={}&limit={}'.format(
        CLIENT_ID, CLIENT_SECRET, version, lat, lon, category, radius, limit)
    try:
        results = requests.get(url).json()['response']['groups'][0]['items']
        venues = [(item['venue']['id'],
                   item['venue']['name'],
                   get_categories(item['venue']['categories']),
                   (item['venue']['location']['lat'], item['venue']['location']['lng']),
                   format_address(item['venue']['location']),
                   item['venue']['location']['distance']) for item in results]        
    except:
        venues = []
    return venues
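`get_venues_near_location` assembles its request URL with `str.format`, which does not URL-encode the values it inserts. A minimal sketch of the same explore request built with the standard library's `urllib.parse.urlencode` follows. Sending several category IDs as one comma-separated `categoryId` value is an assumption about the Foursquare v2 explore endpoint (the notebook itself only sends the single root food category), and `build_explore_url` is a hypothetical helper name.

```python
from urllib.parse import urlencode

EXPLORE_URL = 'https://api.foursquare.com/v2/venues/explore'

def build_explore_url(lat, lon, category_ids, client_id, client_secret,
                      version='20180724', radius=500, limit=100):
    """Build a Foursquare v2 explore URL with every query value URL-encoded.

    category_ids: list of category ID strings, sent as one comma-separated
    categoryId value (assumed to be accepted by the endpoint).
    """
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,
        'll': '{},{}'.format(lat, lon),
        'categoryId': ','.join(category_ids),
        'radius': radius,
        'limit': limit,
    }
    return EXPLORE_URL + '?' + urlencode(params)

# Usage (hypothetical): only the South Asian categories, 350 m around the city center
# url = build_explore_url(43.703226, -79.758977, southasian_restaurant_categories,
#                         CLIENT_ID, CLIENT_SECRET, radius=350)
```

The resulting string can be passed to `requests.get(url)` in place of the hand-formatted URL.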

In [62]:
# Let's now go over our neighborhood locations and get nearby restaurants; we'll also maintain a dictionary of all found restaurants and all found southasian restaurants

import pickle

def get_restaurants(lats, lons):
    restaurants = {}
    southasian_restaurants = {}
    location_restaurants = []

    print('Obtaining venues around candidate locations:', end='')
    for lat, lon in zip(lats, lons):
        # Query 350 m around each centre; the dictionaries keyed by venue id remove duplicates returned by overlapping queries
        venues = get_venues_near_location(lat, lon, food_category, CLIENT_ID, CLIENT_SECRET, radius=350, limit=100)
        area_restaurants = []
        for venue in venues:
            venue_id = venue[0]
            venue_name = venue[1]
            venue_categories = venue[2]
            venue_latlon = venue[3]
            venue_address = venue[4]
            venue_distance = venue[5]
            is_res, is_southasian = is_restaurant(venue_categories, specific_filter=southasian_restaurant_categories)
            if is_res:
                x, y = lonlat_to_xy(venue_latlon[1], venue_latlon[0])
                restaurant = (venue_id, venue_name, venue_latlon[0], venue_latlon[1], venue_address, venue_distance, is_southasian, x, y)
                if venue_distance<=300:
                    area_restaurants.append(restaurant)
                restaurants[venue_id] = restaurant
                if is_southasian:
                    southasian_restaurants[venue_id] = restaurant
        location_restaurants.append(area_restaurants)
        print(' .', end='')
    print(' done.')
    return restaurants, southasian_restaurants, location_restaurants

# Try to load from local file system in case we did this before
restaurants = {}
southasian_restaurants = {}
location_restaurants = []
loaded = False
try:
    with open('restaurants_350.pkl', 'rb') as f:
        restaurants = pickle.load(f)
    with open('southasian_restaurants_350.pkl', 'rb') as f:
        southasian_restaurants = pickle.load(f)
    with open('location_restaurants_350.pkl', 'rb') as f:
        location_restaurants = pickle.load(f)
    print('Restaurant data loaded.')
    loaded = True
except:
    pass

# If load failed use the Foursquare API to get the data
if not loaded:
    restaurants, southasian_restaurants, location_restaurants = get_restaurants(latitudes, longitudes)
    
    # Let's persist this in the local file system
    with open('restaurants_350.pkl', 'wb') as f:
        pickle.dump(restaurants, f)
    with open('southasian_restaurants_350.pkl', 'wb') as f:
        pickle.dump(southasian_restaurants, f)
    with open('location_restaurants_350.pkl', 'wb') as f:
        pickle.dump(location_restaurants, f)

Restaurant data loaded.
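The cell above caches the three Foursquare results in separate pickle files and only calls the API when loading fails. A minimal sketch of the same load-or-compute pattern as a reusable helper follows; `load_or_compute` and the file name in the usage comment are hypothetical. Unlike the cell above, it checks for the file's existence rather than catching every exception, and storing the three results as one tuple in a single file keeps them from getting out of sync; both are design choices, not what the notebook does.

```python
import os
import pickle

def load_or_compute(path, compute):
    """Return the object pickled at `path`; if the file does not exist,
    call compute(), pickle its result to `path`, and return it."""
    if os.path.exists(path):
        with open(path, 'rb') as f:
            return pickle.load(f)
    result = compute()
    with open(path, 'wb') as f:
        pickle.dump(result, f)
    return result

# Usage (hypothetical file name): cache all three Foursquare results in one file
# restaurants, southasian_restaurants, location_restaurants = load_or_compute(
#     'foursquare_restaurants_350.pkl', lambda: get_restaurants(latitudes, longitudes))
```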


Let's now review some of our data exploration statistics below:

In [63]:
import numpy as np

print('Total number of restaurants:', len(restaurants))
print('Total number of southasian restaurants:', len(southasian_restaurants))
print('Percentage of southasian restaurants: {:.2f}%'.format(len(southasian_restaurants) / len(restaurants) * 100))
print('Average number of restaurants in neighborhood:', np.array([len(r) for r in location_restaurants]).mean())

Total number of restaurants: 239
Total number of southasian restaurants: 58
Percentage of southasian restaurants: 24.27%
Average number of restaurants in neighborhood: 0.489010989010989
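These summary figures can be checked by hand. The percentage is the ratio of the two counts, and multiplying the average by the 364 candidate areas recovers the total number of restaurant entries found within 300 m of a centre:

$$\frac{58}{239} \times 100 \approx 24.27\%, \qquad 0.489011 \times 364 \approx 178.$$

Because at most 178 of the 364 areas can contain a restaurant, at least 186 areas have none within 300 m, which is why the average is below one.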


In [64]:
print('List of all restaurants')
print('-----------------------')
for r in list(restaurants.values())[:10]:
    print(r)
print('...')
print('Total:', len(restaurants))

List of all restaurants
-----------------------
('4b870970f964a520d7ac31e3', 'Tandoori Style', 43.712218415595096, -79.68140210512054, '30 Melanie Dr (at Steeles Ave E), Brampton ON L6T 4K9', 300, True, -5297269.47019653, 10540779.534936462)
('51535836e4b0205aeb7dbe3f', "Lena's Roti & Doubles", 43.71453763281609, -79.67727112716643, '2565 Steeles Ave E (at Torbram Rd,), Brampton ON L6T 4L6', 289, False, -5296957.448254551, 10540260.448014611)
('4bd33c3aa8b3a5932c70695f', 'Kwality Sweets & Restaurant', 43.713386911508934, -79.68148658325038, '2150 Steeles Ave E, Brampton ON L6T 1A7', 336, True, -5297082.836654316, 10540767.364294892)
('4ccda0d7c0378cfac0a28e48', 'Sanjhi Rasoi', 43.715473873501125, -79.677878144828, 'Canada', 191, True, -5296800.590336298, 10540312.807744492)
('4f03769893adc8245c1d9f70', "East side mario's", 43.70895709987345, -79.68409293657145, 'Front St (Simcoe), Toronto ON', 310, False, -5297750.591986535, 10541150.52341096)
('4d83e93a5e70224bccfb0109', "Brar's - Gra

In [65]:
print('List of southasian restaurants')
print('---------------------------')
for r in list(southasian_restaurants.values())[:10]:
    print(r)
print('...')
print('Total:', len(southasian_restaurants))

List of southasian restaurants
---------------------------
('4b870970f964a520d7ac31e3', 'Tandoori Style', 43.712218415595096, -79.68140210512054, '30 Melanie Dr (at Steeles Ave E), Brampton ON L6T 4K9', 300, True, -5297269.47019653, 10540779.534936462)
('4bd33c3aa8b3a5932c70695f', 'Kwality Sweets & Restaurant', 43.713386911508934, -79.68148658325038, '2150 Steeles Ave E, Brampton ON L6T 1A7', 336, True, -5297082.836654316, 10540767.364294892)
('4ccda0d7c0378cfac0a28e48', 'Sanjhi Rasoi', 43.715473873501125, -79.677878144828, 'Canada', 191, True, -5296800.590336298, 10540312.807744492)
('4d83e93a5e70224bccfb0109', "Brar's - Grand", 43.69039936566263, -79.69450576887627, '199 Advance Blvd (at Dixie Rd), Brampton ON L6T 4N2', 316, True, -5300555.112618707, 10542698.483371334)
('54e10acf498ea6f176e85395', "Sanjeev Kapoor's Khazana", 43.7465257484077, -79.69864135178176, '9121 Airport Road (Queen Rd), Brampton ON', 142, True, -5291590.798249365, 10542119.596479703)
('5827fa0fbcf73e19eaa687c2

In [66]:
print('Restaurants around location')
print('---------------------------')
for i in range(100, 110):
    rs = location_restaurants[i][:8]
    names = ', '.join([r[1] for r in rs])
    print('Restaurants around location {}: {}'.format(i+1, names))

Restaurants around location
---------------------------
Restaurants around location 101: 
Restaurants around location 102: King Tandoori
Restaurants around location 103: coffee bubble tea, baba dhaba
Restaurants around location 104: Redstars
Restaurants around location 105: 
Restaurants around location 106: Saigon House, Teriyaki Experience
Restaurants around location 107: 
Restaurants around location 108: 
Restaurants around location 109: 
Restaurants around location 110: 


In [67]:
# Now that we have all the required venue data, let's visualize our restaurants (blue = restaurant, orange = South Asian restaurant):

map_brampton = folium.Map(location=brampton_center, zoom_start=12)

folium.Marker(brampton_center, popup ='City Center').add_to(map_brampton)
folium.Circle(brampton_center,radius=40, color='green', fill=True, fill_opacity=0.4).add_to(map_brampton)

for res in restaurants.values():
    lat = res[2]; lon = res[3]
    is_southasian = res[6]
    color = 'orange' if is_southasian else 'blue'
    folium.CircleMarker([lat, lon], radius=3, color=color, fill=True, fill_color=color, fill_opacity=1).add_to(map_brampton)
map_brampton

In the map above, restaurants are shown in blue and South Asian restaurants in orange. Let's observe the general clustering of locations and see whether we can find any pockets of interest near our city center.

## Methodology


The objective is to use the retrieved data to create and map clusters that have a low density of South Asian restaurants within a particular radius. Based on our city center location, we will work with a radius of roughly 10 km around the desired point, which should give sufficient surface coverage for a dense suburban population. By combining the centroid location data with the Foursquare venue data, we can use Folium maps to explore the density of restaurants throughout the city.

Heat maps and markers can be used to narrow down specific neighborhoods of interest. Ideally, we want to reduce our radius from 10 km to between 2 and 3 km, preferably close to our city center. We refer to this area as the ROI Center and apply the necessary parameters to view it on the map. After defining our area of analysis, we can add new centroids at a finer granularity. These centroids can be restricted by their distance from other restaurants or South Asian restaurants, which prepares us for k-means clustering in the final stage of analysis.

Based on the thresholds the stakeholders set for the basic location requirements, we want to be no closer than 600 meters to any other South Asian restaurant and to have no more than one restaurant within a 250-meter radius. These requirements are fitting, as commercial real estate is sparsely spread out around major intersections. When viewing a map that shows both the ROI centroids and the heat-map density of existing restaurants, we can begin to see small clusters of centroids. At this point, we can use the k-means clustering algorithm to create zones of interest based on general neighborhoods/addresses, ranked by their overall distance to our city center.
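The two stakeholder thresholds translate directly into a per-candidate test. A minimal sketch follows, assuming restaurant positions are available as `(x, y, is_south_asian)` tuples in the same projected units the notebook uses; `meets_requirements` is a hypothetical helper. Note that the `Restaurants nearby` column built later counts restaurants within 600 m (`radius=600`), not 250 m, so the 250 m rule needs its own count, as done here.

```python
import math

def meets_requirements(x, y, restaurants, min_south_asian_distance=600,
                       count_radius=250, max_restaurants=1):
    """Return True if the candidate at (x, y) satisfies both thresholds.

    restaurants: iterable of (x, y, is_south_asian) tuples.
    """
    nearby = 0
    for rx, ry, is_south_asian in restaurants:
        d = math.hypot(rx - x, ry - y)
        if is_south_asian and d < min_south_asian_distance:
            return False  # too close to an existing South Asian restaurant
        if d <= count_radius:
            nearby += 1
    return nearby <= max_restaurants

# Usage (hypothetical): restaurant tuples hold is_southasian, x, y at indices 6, 7, 8
# res_xy = [(r[7], r[8], r[6]) for r in restaurants.values()]
# keep = [meets_requirements(x, y, res_xy) for x, y in zip(roi_xs, roi_ys)]
```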


## Analysis

Let's now calculate, for each centroid:
1. The number of restaurants in the area (within 300 m)
2. The distance to the nearest South Asian restaurant

We'll append this information to our DataFrame in preparation for further analysis.

In [68]:
# Count the restaurants within 300 m of each candidate centre

location_restaurants_count = [len(res) for res in location_restaurants]

df_locations['Restaurants in area'] = location_restaurants_count

print('Average number of restaurants in every area with radius=300m:', np.array(location_restaurants_count).mean())

df_locations.head(5)

Average number of restaurants in every area with radius=300m: 0.489010989010989


|   | Address | Latitudes | Longitudes | X | Y | Distance from center | Restaurants in area |
|---|---|---|---|---|---|---|---|
| 0 | 7400 Bramalea Rd, Mississauga, ON L5S 1X1, Canada | 43.691866 | -79.676063 | -5300573.0 | 10540550.0 | 9787.742334 | 0 |
| 1 | 7505 Bramalea Rd, Mississauga, ON L5S 1C4, Canada | 43.697953 | -79.677052 | -5299593.0 | 10540550.0 | 9539.281944 | 0 |
| 2 | Express Toll Route, ON L6T, Canada | 43.70404 | -79.678042 | -5298613.0 | 10540550.0 | 9387.04959 | 0 |
| 3 | 20 Melanie Dr, ON L6T 4K8, Canada | 43.710129 | -79.679033 | -5297633.0 | 10540550.0 | 9335.753853 | 1 |
| 4 | 2550 Steeles Ave E, ON L6T, Canada | 43.716217 | -79.680023 | -5296653.0 | 10540550.0 | 9387.04959 | 2 |


In [69]:
# Distance from each candidate centre to the nearest South Asian restaurant (capped at 10,000), appended to the DataFrame

distances_to_southasian_restaurant = []

for area_x, area_y in zip(xs, ys):
    min_distance = 10000
    for res in southasian_restaurants.values():
        res_x = res[7]
        res_y = res[8]
        d = calc_xy_distance(area_x, area_y, res_x, res_y)
        if d<min_distance:
            min_distance = d
    distances_to_southasian_restaurant.append(min_distance)

df_locations['Distance to southasian restaurant'] = distances_to_southasian_restaurant

In [70]:
print('Average distance to closest South asian restaurant from each area center:', df_locations['Distance to southasian restaurant'].mean())

Average distance to closest South asian restaurant from each area center: 1501.0987494985943


In [71]:
df_locations.head(5)

|   | Address | Latitudes | Longitudes | X | Y | Distance from center | Restaurants in area | Distance to southasian restaurant |
|---|---|---|---|---|---|---|---|---|
| 0 | 7400 Bramalea Rd, Mississauga, ON L5S 1X1, Canada | 43.691866 | -79.676063 | -5300573.0 | 10540550.0 | 9787.742334 | 0 | 2152.718466 |
| 1 | 7505 Bramalea Rd, Mississauga, ON L5S 1C4, Canada | 43.697953 | -79.677052 | -5299593.0 | 10540550.0 | 9539.281944 | 0 | 2294.871229 |
| 2 | Express Toll Route, ON L6T, Canada | 43.70404 | -79.678042 | -5298613.0 | 10540550.0 | 9387.04959 | 0 | 1364.129106 |
| 3 | 20 Melanie Dr, ON L6T 4K8, Canada | 43.710129 | -79.679033 | -5297633.0 | 10540550.0 | 9335.753853 | 1 | 432.528572 |
| 4 | 2550 Steeles Ave E, ON L6T, Canada | 43.716217 | -79.680023 | -5296653.0 | 10540550.0 | 9387.04959 | 2 | 275.609551 |



Once the locations DataFrame is ready, we should have sufficient data to:

1. Create a heat map of restaurant locations throughout the city
2. Create a heat map focused on a promising radius close to our city center
3. Generate candidate locations within our ROI area with the restrictions applied
4. Use k-means clustering to group the ROI candidate locations and narrow down the top candidates
5. Present the top locations as markers on a map
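The clustering code itself is not shown in this part of the notebook. As a hedged illustration of step 4, the following dependency-free sketch implements Lloyd's k-means on `(x, y)` candidate coordinates and then ranks the resulting cluster centres by distance to a reference point such as the city center, as the methodology describes. It seeds the centroids with the first `k` points for reproducibility, so it may converge to a different solution than a library implementation such as scikit-learn's `KMeans` (which defaults to k-means++ seeding); the function names and the `k` in the usage comment are hypothetical.

```python
import math

def kmeans(points, k, n_iter=100):
    """Lloyd's k-means on 2-D points; the first k points seed the centroids.

    Returns (centroids, labels), where labels[i] is the cluster of points[i].
    """
    centroids = [tuple(p) for p in points[:k]]
    labels = None
    for _ in range(n_iter):
        # Assignment step: attach every point to its nearest centroid
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                      for p in points]
        # Update step: move each centroid to the mean of its members
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = (sum(m[0] for m in members) / len(members),
                                sum(m[1] for m in members) / len(members))
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
    return centroids, labels

def rank_by_distance(centroids, reference):
    """Sort cluster centres by straight-line distance to a reference (x, y) point."""
    return sorted(centroids, key=lambda c: math.dist(c, reference))

# Usage (hypothetical k): cluster the ROI candidates, then rank zones by distance to the city center
# centres, labels = kmeans(list(zip(roi_xs, roi_ys)), k=15)
# ranked = rank_by_distance(centres, (brampton_center_x, brampton_center_y))
```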

Much of the interpretation of good and bad areas to explore will come from visually analyzing our maps. For each assessment below, we'll adjust the inputs so that they represent the city of Brampton as well as possible.


In [72]:
restaurant_latlons = [[res[2], res[3]] for res in restaurants.values()]

southasian_latlons = [[res[2], res[3]] for res in southasian_restaurants.values()]

In [73]:
from folium import plugins
from folium.plugins import HeatMap

map_brampton = folium.Map(location=brampton_center, zoom_start=12)

folium.TileLayer('cartodbpositron').add_to(map_brampton) #cartodbpositron cartodbdark_matter
HeatMap(restaurant_latlons).add_to(map_brampton)
#HeatMap(southasian_latlons).add_to(map_brampton)

folium.Marker(brampton_center, popup ='City Center').add_to(map_brampton)
folium.Circle(brampton_center,radius=40, color='green', fill=True, fill_opacity=0.4).add_to(map_brampton)

folium.Circle(brampton_center, radius=2000, fill=False, color='white').add_to(map_brampton)
folium.Circle(brampton_center, radius=4000, fill=False, color='white').add_to(map_brampton)
folium.Circle(brampton_center, radius=6000, fill=False, color='white').add_to(map_brampton)

map_brampton

Our first heat map above serves as a secondary view of restaurant clusters. Use the zoom to support the analysis; at this point we should narrow down an attractive area with a low density of heat markers to analyze in more depth.


In [74]:
roi_x_min = brampton_center_x - 5000
roi_y_max = brampton_center_y + 3500
roi_width = 5000
roi_height = 5000
roi_center_x = roi_x_min + 2500
roi_center_y = roi_y_max - 2500
roi_center_lon, roi_center_lat = xy_to_lonlat(roi_center_x, roi_center_y)
roi_center = [roi_center_lat, roi_center_lon]

map_brampton = folium.Map(location=roi_center, zoom_start=13)
HeatMap(restaurant_latlons).add_to(map_brampton)

folium.Marker(brampton_center, popup ='City Center').add_to(map_brampton)
folium.Circle(brampton_center,radius=40, color='green', fill=True, fill_opacity=0.4).add_to(map_brampton)

folium.Marker(roi_center,popup ='ROI Center').add_to(map_brampton)
folium.Circle(roi_center,radius=40, color='purple', fill=True, fill_opacity=0.4).add_to(map_brampton)

folium.Circle(roi_center, radius=2200, color='white', fill=True, fill_opacity=0.4).add_to(map_brampton)
map_brampton

Our second heat map above has been narrowed down to an area of interest with a 2.2 km radius. We've found an attractive area directly south (with true north at the top of the map) of our ‘City Center’ marker, and we'll focus our clustering analysis there. We refer to this area as the ‘ROI Center’.

Now let's create centroids within our ROI circle to prepare for clustering.

In [75]:
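k = math.sqrt(3) / 2 # Vertical offset for hexagonal grid cells
x_step = 280      # spacing between ROI candidate centres (finer than the 980 used city-wide)
y_step = 280 * k  # row spacing for hexagonal rows
roi_y_min = roi_center_y - 7000  # first row starts 7,000 units below the ROI centre

roi_latitudes = []
roi_longitudes = []
roi_xs = []
roi_ys = []
for i in range(0, int(51/k)):
    y = roi_y_min + i * y_step
    x_offset = 50 if i%2==0 else 0  # note: half of x_step (140) would give a regular hexagonal layout
    # x starts at roi_x_min (2,500 units left of roi_center_x), so the part of the
    # 3,501-unit circle further left than that is never sampled
    for j in range(0, 51):
        x = roi_x_min + j * x_step + x_offset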
        d = calc_xy_distance(roi_center_x, roi_center_y, x, y)
        if (d <= 3501):
            lon, lat = xy_to_lonlat(x, y)
            roi_latitudes.append(lat)
            roi_longitudes.append(lon)
            roi_xs.append(x)
            roi_ys.append(y)

print(len(roi_latitudes), 'candidate neighborhood centers generated.')

526 candidate neighborhood centers generated.


In [76]:
def count_restaurants_nearby(x, y, restaurants, radius=600):
    """Count restaurants within `radius` (projected units) of the point (x, y)."""
    count = 0
    for res in restaurants.values():
        res_x, res_y = res[7], res[8]  # projected x/y stored at indices 7 and 8
        d = calc_xy_distance(x, y, res_x, res_y)
        if d <= radius:
            count += 1
    return count

def find_nearest_restaurant(x, y, restaurants):
    """Return the distance from (x, y) to the closest restaurant (capped at 100000)."""
    d_min = 100000
    for res in restaurants.values():
        res_x, res_y = res[7], res[8]
        d = calc_xy_distance(x, y, res_x, res_y)
        if d <= d_min:
            d_min = d
    return d_min

roi_restaurant_counts = []
roi_southasian_distances = []

print('Generating data on location candidates... ', end='')
for x, y in zip(roi_xs, roi_ys):
    count = count_restaurants_nearby(x, y, restaurants, radius=600)
    roi_restaurant_counts.append(count)
    distance = find_nearest_restaurant(x, y, southasian_restaurants)
    roi_southasian_distances.append(distance)
print('done.')


Generating data on location candidates... done.


In [77]:
# Let's put this into dataframe
df_roi_locations = pd.DataFrame({'Latitude':roi_latitudes,
                                 'Longitude':roi_longitudes,
                                 'X':roi_xs,
                                 'Y':roi_ys,
                                 'Restaurants nearby':roi_restaurant_counts,
                                 'Distance to South Asian restaurant':roi_southasian_distances})

df_roi_locations.head(5)

|   | Latitude | Longitude | X | Y | Restaurants nearby | Distance to South Asian restaurant |
|---|---|---|---|---|---|---|
| 0 | 43.684365 | -79.735351 | -5300953.0 | 10547520.0 | 5 | 344.478096 |
| 1 | 43.686104 | -79.735638 | -5300673.0 | 10547520.0 | 6 | 127.015049 |
| 2 | 43.687842 | -79.735924 | -5300393.0 | 10547520.0 | 8 | 226.434876 |
| 3 | 43.689581 | -79.73621 | -5300113.0 | 10547520.0 | 8 | 468.272709 |
| 4 | 43.69132 | -79.736497 | -5299833.0 | 10547520.0 | 4 | 737.621791 |


In total, 526 candidate neighborhood centers were generated within our ‘ROI radius’. We then load them into a new dataframe (`df_roi_locations`) with two additional metrics for spatial analysis: the number of restaurants within 600 m and the distance to the nearest South Asian restaurant. These metrics will drive our filtering and clustering.
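The two helper functions above loop over every restaurant for every candidate. A vectorised NumPy alternative computes the same two metrics with broadcasting. This is a sketch that assumes candidate and restaurant coordinates are available as (n, 2) arrays of projected x/y values, rather than the dictionary of tuples (`res[7]`, `res[8]`) the notebook uses; `neighbourhood_metrics` is a hypothetical helper.

```python
import numpy as np

def neighbourhood_metrics(cand_xy, rest_xy, south_asian_xy, radius=600.0):
    """Return (counts, nearest) for each candidate point.

    counts[i]  = number of restaurants within `radius` of candidate i
    nearest[i] = distance from candidate i to the closest South Asian restaurant
    Inputs are sequences of (x, y) pairs in projected units (assumed metres).
    """
    cand = np.asarray(cand_xy, dtype=float).reshape(-1, 2)
    rest = np.asarray(rest_xy, dtype=float).reshape(-1, 2)
    sa = np.asarray(south_asian_xy, dtype=float).reshape(-1, 2)

    # Pairwise distance matrix via broadcasting: shape (n_candidates, n_restaurants)
    d_rest = np.linalg.norm(cand[:, None, :] - rest[None, :, :], axis=2)
    counts = (d_rest <= radius).sum(axis=1)

    if len(sa) == 0:
        # No South Asian restaurants at all: treat every candidate as infinitely far away
        nearest = np.full(len(cand), np.inf)
    else:
        d_sa = np.linalg.norm(cand[:, None, :] - sa[None, :, :], axis=2)
        nearest = d_sa.min(axis=1)
    return counts, nearest
```

Unlike `find_nearest_restaurant`, which returns 100000 when nothing is closer, this sketch returns `inf` only when the South Asian list is empty. For a few hundred candidates and restaurants the distance matrices stay small.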

We can use the new ROI locations dataframe to define conditions that restrict which candidates are marked on the map: locations with no more than one restaurant within 600 m and no South Asian restaurant within 600 m. These thresholds satisfy the competition requirements set by our stakeholders.

In [78]:
good_res_count = np.array((df_roi_locations['Restaurants nearby']<=1))
print('Locations with no more than 1 restaurants nearby:', good_res_count.sum())

good_ind_distance = np.array(df_roi_locations['Distance to South Asian restaurant']>=600)
print('Locations with no South Asian restaurants within 600m:', good_ind_distance.sum())

good_locations = np.logical_and(good_res_count, good_ind_distance)
print('Locations with both conditions met:', good_locations.sum())

df_good_locations = df_roi_locations[good_locations]

Locations with no more than 1 restaurants nearby: 414
Locations with no South Asian restaurants within 600m: 454
Locations with both conditions met: 410
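The two conditions can also be combined directly on the dataframe with `&`, which keeps the original index and avoids the intermediate NumPy arrays. A minimal sketch: `filter_good_locations` is a hypothetical helper, and its thresholds default to the values used above.

```python
import pandas as pd

def filter_good_locations(df, max_nearby=1, min_sa_distance=600.0):
    """Keep candidates with at most `max_nearby` restaurants nearby and no
    South Asian restaurant closer than `min_sa_distance` (assumed metres)."""
    mask = (df["Restaurants nearby"] <= max_nearby) & (
        df["Distance to South Asian restaurant"] >= min_sa_distance
    )
    return df.loc[mask]

# Toy data using the same column names as df_roi_locations
toy = pd.DataFrame({
    "Restaurants nearby": [0, 1, 2, 0],
    "Distance to South Asian restaurant": [700.0, 650.0, 900.0, 100.0],
})
print(filter_good_locations(toy))
```

Applied to `df_roi_locations`, it should select the same rows as `df_roi_locations[good_locations]`.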


In [79]:
good_latitudes = df_good_locations['Latitude'].values
good_longitudes = df_good_locations['Longitude'].values

good_locations = [[lat, lon] for lat, lon in zip(good_latitudes, good_longitudes)]

map_brampton = folium.Map(location=roi_center, zoom_start=13)
folium.TileLayer('cartodbpositron').add_to(map_brampton)
HeatMap(restaurant_latlons).add_to(map_brampton)
folium.Circle(roi_center, radius=2500, color='white', fill=True, fill_opacity=0.6).add_to(map_brampton)

folium.Marker(brampton_center, popup ='City Center').add_to(map_brampton)
folium.Circle(brampton_center,radius=40, color='green', fill=True, fill_opacity=0.4).add_to(map_brampton)

folium.Marker(roi_center,popup ='ROI Center').add_to(map_brampton)
folium.Circle(roi_center,radius=40, color='purple', fill=True, fill_opacity=0.4).add_to(map_brampton)

for lat, lon in zip(good_latitudes, good_longitudes):
    folium.CircleMarker([lat, lon], radius=2, color='blue', fill=True, fill_color='blue', fill_opacity=1).add_to(map_brampton) 
map_brampton

The map above plots the ‘good location’ markers that satisfy our restrictions within the ROI radius. Several promising pockets are already visible.

In [80]:
map_brampton = folium.Map(location=roi_center, zoom_start=13)
HeatMap(good_locations, radius=22).add_to(map_brampton)

folium.Marker(brampton_center, popup ='City Center').add_to(map_brampton)
folium.Circle(brampton_center,radius=40, color='green', fill=True, fill_opacity=0.4).add_to(map_brampton)

folium.Marker(roi_center,popup ='ROI Center').add_to(map_brampton)
folium.Circle(roi_center,radius=40, color='purple', fill=True, fill_opacity=0.4).add_to(map_brampton)

for lat, lon in zip(good_latitudes, good_longitudes):
    folium.CircleMarker([lat, lon], radius=2, color='blue', fill=True, fill_color='blue', fill_opacity=1).add_to(map_brampton)
map_brampton

Next, let's flip the heat map so it shows the density of ‘good location’ markers, rather than existing restaurants, in and around our ROI area. This lets us begin exploring commercial areas that may be promising for our stakeholders. We'll also run a clustering exercise to group the ‘good locations’ into neighborhoods.

In [81]:
from sklearn.cluster import KMeans

number_of_clusters = 35

good_xys = df_good_locations[['X', 'Y']].values
kmeans = KMeans(n_clusters=number_of_clusters, random_state=0).fit(good_xys)

cluster_centers = [xy_to_lonlat(cc[0], cc[1]) for cc in kmeans.cluster_centers_]

map_brampton = folium.Map(location=roi_center, zoom_start=13)
folium.TileLayer('cartodbpositron').add_to(map_brampton)
HeatMap(restaurant_latlons).add_to(map_brampton)
folium.Circle(roi_center, radius=2500, color='white', fill=True, fill_opacity=0.4).add_to(map_brampton)

folium.Marker(brampton_center, popup ='City Center').add_to(map_brampton)
folium.Circle(brampton_center,radius=40, color='green', fill=True, fill_opacity=0.4).add_to(map_brampton)

folium.Marker(roi_center,popup ='ROI Center').add_to(map_brampton)
folium.Circle(roi_center,radius=40, color='purple', fill=True, fill_opacity=0.4).add_to(map_brampton)

for lon, lat in cluster_centers:
    folium.Circle([lat, lon], radius=400, color='green', fill=True, fill_opacity=0.25).add_to(map_brampton) 
for lat, lon in zip(good_latitudes, good_longitudes):
    folium.CircleMarker([lat, lon], radius=2, color='blue', fill=True, fill_color='blue', fill_opacity=1).add_to(map_brampton)
map_brampton

Using the k-means clustering algorithm, we group our ‘good locations’ into 35 clusters, each drawn as a green circle around its center. At this point we have enough supporting material to review the street-level map and find good commercial areas within our ROI circle.
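The map shows where the 35 centers fall but not how many good locations each one represents. One way to rank the clusters, offered here as a sketch, is to count the points assigned to each center with `np.bincount` over the fitted `labels_`: larger clusters mark wider pockets that meet both criteria. `cluster_sizes` is a hypothetical helper, and `n_init=10` is set explicitly because the scikit-learn default for this parameter has changed across versions.

```python
import numpy as np
from sklearn.cluster import KMeans

def cluster_sizes(xys, n_clusters, random_state=0):
    """Fit k-means on projected (x, y) points and return the cluster centers
    together with the number of points assigned to each cluster."""
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=random_state).fit(xys)
    sizes = np.bincount(km.labels_, minlength=n_clusters)
    return km.cluster_centers_, sizes

# Two well-separated synthetic groups of 20 and 30 points
rng = np.random.default_rng(0)
pts = np.vstack([rng.normal(0, 10, (20, 2)), rng.normal(1000, 10, (30, 2))])
centers, sizes = cluster_sizes(pts, 2)
print(sizes)
```

In the notebook this would be called as `cluster_sizes(df_good_locations[['X', 'Y']].values, 35)`.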

In [82]:
import pandas as pd

top_roi_locations_df = pd.DataFrame({"Top_ROI_Address":['Kingspoint Plaza, Brampton, Ontario','51 McMurchy Ave. S, Brampton, Ontario',
                                '550 Queen St W, Brampton, Ontario','160 Main St S, Brampton, Ontario',
                                '9446 McLaughlin Rd N, Brampton, Ontario','110 Brickyard Way, Brampton, Ontario'],
                    "Top_ROI_Latitude":['43.695676','43.678406','43.671132','43.677839','43.684786','43.700982'],
                    "Top_ROI_Longitude":['-79.771232','-79.762528','-79.777226','-79.748198','-79.783054','-79.778761']})

top_roi_locations_df.head(10)

|   | Top_ROI_Address | Top_ROI_Latitude | Top_ROI_Longitude |
|---|---|---|---|
| 0 | Kingspoint Plaza, Brampton, Ontario | 43.695676 | -79.771232 |
| 1 | 51 McMurchy Ave. S, Brampton, Ontario | 43.678406 | -79.762528 |
| 2 | 550 Queen St W, Brampton, Ontario | 43.671132 | -79.777226 |
| 3 | 160 Main St S, Brampton, Ontario | 43.677839 | -79.748198 |
| 4 | 9446 McLaughlin Rd N, Brampton, Ontario | 43.684786 | -79.783054 |
| 5 | 110 Brickyard Way, Brampton, Ontario | 43.700982 | -79.778761 |


In [83]:
top_roi_locations_df['Top_ROI_Latitude'] = pd.to_numeric(top_roi_locations_df['Top_ROI_Latitude'], errors='coerce')
top_roi_locations_df['Top_ROI_Longitude'] = pd.to_numeric(top_roi_locations_df['Top_ROI_Longitude'], errors='coerce')

# Use separate names so good_latitudes / good_longitudes still hold the filtered good locations
top_latitudes = top_roi_locations_df['Top_ROI_Latitude'].values
top_longitudes = top_roi_locations_df['Top_ROI_Longitude'].values

With the top ROI locations' addresses and geo-coordinates stored in a dataframe, we can map them alongside our cluster analysis to see where each location sits relative to the good-location clusters.
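Because the coordinates above were entered as strings, they had to be converted with `pd.to_numeric` before mapping. A slightly simpler sketch enters them as floats from the start and extracts a list of `[lat, lon]` pairs, the form that `folium.Marker` accepts as a location; the values are copied from the table above.

```python
import pandas as pd

# Coordinates entered as floats, so no pd.to_numeric conversion is needed
top_roi_locations_df = pd.DataFrame({
    "Top_ROI_Address": [
        "Kingspoint Plaza, Brampton, Ontario",
        "51 McMurchy Ave. S, Brampton, Ontario",
        "550 Queen St W, Brampton, Ontario",
        "160 Main St S, Brampton, Ontario",
        "9446 McLaughlin Rd N, Brampton, Ontario",
        "110 Brickyard Way, Brampton, Ontario",
    ],
    "Top_ROI_Latitude": [43.695676, 43.678406, 43.671132, 43.677839, 43.684786, 43.700982],
    "Top_ROI_Longitude": [-79.771232, -79.762528, -79.777226, -79.748198, -79.783054, -79.778761],
})

# [[lat, lon], ...] pairs ready for folium markers
top_latlons = top_roi_locations_df[["Top_ROI_Latitude", "Top_ROI_Longitude"]].to_numpy().tolist()
print(top_roi_locations_df.dtypes)
```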

In [84]:

map_brampton = folium.Map(location=roi_center, zoom_start=13)
folium.TileLayer('cartodbpositron').add_to(map_brampton)
HeatMap(restaurant_latlons).add_to(map_brampton)
folium.Circle(roi_center, radius=2500, color='white', fill=True, fill_opacity=0.4).add_to(map_brampton)

folium.Marker(brampton_center, popup ='City Center').add_to(map_brampton)
folium.Circle(brampton_center,radius=40, color='green', fill=True, fill_opacity=0.4).add_to(map_brampton)

folium.Marker(roi_center,popup ='ROI Center').add_to(map_brampton)
folium.Circle(roi_center,radius=40, color='purple', fill=True, fill_opacity=0.4).add_to(map_brampton)

for lon, lat in cluster_centers:
    folium.Circle([lat, lon], radius=400, color='green', fill=True, fill_opacity=0.25).add_to(map_brampton) 
for lat, lon in zip(good_latitudes, good_longitudes):
    folium.CircleMarker([lat, lon], radius=2, color='blue', fill=True, fill_color='blue', fill_opacity=1).add_to(map_brampton)

for row in top_roi_locations_df.itertuples(index=False):
    folium.Marker([row.Top_ROI_Latitude, row.Top_ROI_Longitude], popup=row.Top_ROI_Address).add_to(map_brampton)

map_brampton

Finally, let's map our locations with markers for our final report.

In [85]:

map_brampton = folium.Map(location=roi_center, zoom_start=13)

folium.Marker(brampton_center, popup ='City Center').add_to(map_brampton)
folium.Circle(brampton_center,radius=40, color='green', fill=True, fill_opacity=0.4).add_to(map_brampton)

folium.Marker(roi_center,popup ='ROI Center').add_to(map_brampton)
folium.Circle(roi_center,radius=40, color='purple', fill=True, fill_opacity=0.4).add_to(map_brampton)

for row in top_roi_locations_df.itertuples(index=False):
    folium.Marker([row.Top_ROI_Latitude, row.Top_ROI_Longitude], popup=row.Top_ROI_Address).add_to(map_brampton)

map_brampton

## Results and Discussion

We began our analysis by exploring the distribution of restaurants within a 7 km radius of our selected city center. Using the FourSquare API, we found a significant presence of South Asian restaurants within this radius: approximately 24.27% of the restaurants in our FourSquare data. Given that South Asian residents make up 44% of the city's population, this suggests there is room for more South Asian restaurants.
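The comparison above can be expressed as a simple ratio of restaurant share to population share, where a value below 1 suggests the cuisine is under-represented relative to the community it serves. This is a back-of-the-envelope sketch: it treats the FourSquare share (within the 7 km radius) and the city-wide population share as directly comparable, which is only approximately true, and `representation_ratio` is a hypothetical helper. The same function could be reused for the cross-city comparison suggested in the conclusion.

```python
def representation_ratio(restaurant_share_pct, population_share_pct):
    """Share of restaurants serving a cuisine divided by the share of residents
    of the matching background. Values below 1 suggest under-representation."""
    if population_share_pct <= 0:
        raise ValueError("population share must be positive")
    return restaurant_share_pct / population_share_pct

brampton_ratio = representation_ratio(24.27, 44.0)
print(f"{brampton_ratio:.2f}")  # 0.55
```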

To put our area of analysis in context, Brampton is one of the more densely populated suburban cities in the country. Although a significant amount of residential property is already assigned throughout, many commercial lots exist around major intersections. Depending on the neighborhood, commercial lots can consist of strip plazas, indoor malls, commercial units or single-structure buildings. Based on the average density of restaurants calculated, it is likely that many customers will need transportation, and possibly parking, to visit the location.

After mapping our locations and calculating the relevant distance measurements, our heat map analysis allowed us to narrow the search to a smaller radius close to the city center. We refer to this area as our ‘ROI Center’, with a 2.2 km radius. It lies directly south of the existing city center and appears to have a low density of South Asian restaurants within competitive distance. The area also covers a large portion of Main Street, which is considered the downtown core. Another area of consideration was North/East Brampton, known for its strong South Asian presence and dense residential zoning, where most commercial lots are small to medium-sized strip plazas. For the purpose of our analysis, the ‘South Brampton’ area was chosen as the ‘ROI Center’ because of its proximity to the ‘City Center’.


Once we had clustered zones closer to street level using the k-means algorithm, we explored plausible locations within our ‘ROI zone’ that would meet our stakeholder requirements. Recommended locations were verified visually as commercially zoned lots. Although these locations meet our competitive distance requirements, further research may be needed to assess other optimization factors not explored in this analysis. In total, 6 locations were selected from our analysis:

Location 1: 110 Brickyard Way, Brampton, Ontario: 43.700982, -79.778761 <br />
• Small commercial plaza located directly off Main Street, close to two major intersections.

Location 2: Kingspoint Plaza, Brampton, Ontario: 43.695593, -79.769250 <br />
• Larger commercial strip plaza located directly off Main Street, closest to our city center.

Location 3: 9446 McLaughlin Rd N, Brampton, Ontario: 43.684786, -79.783054 <br />
• Small commercial plaza located directly off McLaughlin Road, with mixed land use in the surrounding area.

Location 4: 51 McMurchy Ave. S, Brampton, Ontario: 43.678406, -79.762528 <br />
• Small commercial plaza located off an inner-city street, directly beside a train track and mostly surrounded by residential land.

Location 5: 160 Main St S, Brampton, Ontario: 43.677839, -79.748198 <br />
• Medium-sized commercial plaza located off Main Street South, between Queen Street and Steeles Avenue.

Location 6: 550 Queen St W, Brampton, Ontario: 43.671132, -79.777226 <br />
• Small commercial plaza located off Queen Street; this is the proposed location farthest from the ‘City Center’.
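The descriptions above compare the locations informally by their distance from the ‘City Center’. One way to make that comparison explicit is to compute great-circle (haversine) distances from the center to each candidate. This is a sketch: it assumes `brampton_center` holds `[lat, lon]`, as in the map cells, and `haversine_m` and `rank_by_distance` are hypothetical helpers, not functions from the notebook.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

def rank_by_distance(center, locations):
    """Sort (name, lat, lon) tuples by distance from `center` = (lat, lon), nearest first."""
    return sorted(
        ((name, haversine_m(center[0], center[1], lat, lon)) for name, lat, lon in locations),
        key=lambda item: item[1],
    )
```

In the notebook it could be applied as `rank_by_distance(brampton_center, zip(top_roi_locations_df['Top_ROI_Address'], top_roi_locations_df['Top_ROI_Latitude'], top_roi_locations_df['Top_ROI_Longitude']))`.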


## Conclusion

Our stakeholders requested an analysis of the city of Brampton’s restaurant landscape in order to find an ideal commercial location for a South Asian restaurant based on specific requirements. By establishing clear steps to collect, process and analyze FourSquare location data in a Python environment, we were able to explore the restaurant distribution across the city. Once our clustering of ‘ROI centroids’ was complete, it was much easier to explore at street level within a 2.2 km radius. The Folium base map distinguishes between different land use types (residential, industrial, commercial, etc.), and together with the cluster mapping, the top ROI locations were quickly narrowed down for this analysis. This final step could be made more efficient by automating the selection of commercially zoned lots, which is worth exploring to further automate the overall project.
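One way to automate that last step, assuming zoning boundaries can be obtained as polygons of (lat, lon) vertices (no such dataset was used in this analysis), is a point-in-polygon filter over the candidate locations. The sketch below uses the standard ray-casting test; `point_in_polygon`, `keep_commercial` and the zone format are hypothetical. Treating latitude and longitude as planar coordinates is a reasonable approximation at neighborhood scale.

```python
def point_in_polygon(lat, lon, polygon):
    """Ray-casting test: True if (lat, lon) lies inside `polygon`,
    given as a list of (lat, lon) vertices (the closing vertex is optional)."""
    inside = False
    n = len(polygon)
    for i in range(n):
        y1, x1 = polygon[i]
        y2, x2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > lat) != (y2 > lat):
            # Longitude at which the edge crosses that line
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside

def keep_commercial(candidates, commercial_zones):
    """Keep (lat, lon) candidates that fall inside any commercial zone polygon."""
    return [
        (lat, lon)
        for lat, lon in candidates
        if any(point_in_polygon(lat, lon, zone) for zone in commercial_zones)
    ]
```

Each crossing of the zone boundary to the east of the point flips `inside`, so an odd number of crossings means the point lies within the polygon.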

Our strongest point of reference for the success of a South Asian restaurant in Brampton is that the city has a very prominent South Asian community (1). An interesting follow-up would be a similar ethnic restaurant-to-population ratio analysis for other cities within the Greater Toronto Area. Such a comparison could offer insight into the potential success of South Asian restaurants within their primary ethnic enclave, compared with that of other ethnic restaurants in their respective enclaves.


## Data Sources

1. StatsCan https://www12.statcan.gc.ca/census-recensement/2016/dp-pd/index-eng.cfm

2. Google Geocoding API

3. FourSquare Developer API 
