# Capstone Project - The Battle of the Neighborhoods (Week 2)
### Applied Data Science Capstone by IBM/Coursera

## Table of contents
* [Introduction: Business Problem](#introduction)
* [Data](#data)
* [Methodology](#methodology)
* [Analysis](#analysis)
* [Results and Discussion](#results)
* [Conclusion](#conclusion)



## Introduction: Business Problem <a name="introduction"></a>

In this project we will try to find the best locations for new Italian restaurants. Specifically, this report is directed to an Italian restaurant franchise interested in opening several **Italian restaurants** in **Toronto**, Canada.

Since there are many restaurants in Toronto, we will try to spot places that are not yet full of restaurants. 

We will use data science techniques to generate a shortlist of the most promising neighborhoods based on this criterion. The advantages of each area will be clearly stated so that stakeholders can choose the best possible final locations.

## Data <a name="data"></a>

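Based on the definition of our problem, the factor that will influence our decision is:
* number of existing restaurants in the neighborhood (any type of restaurant)

The data will come from the following sources:
* postal codes, boroughs and neighborhoods of Toronto will be scraped from a **Wikipedia** page
* the coordinates of each postal code will be read from a CSV file
* the number of restaurants and their type and location in every neighborhood will be obtained using the **Foursquare API**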

In [31]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

#!conda install -c conda-forge geopy --yes # uncomment this line if you haven't completed the Foursquare API lab
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas import json_normalize # transform JSON file into a pandas dataframe (the pandas.io.json import path is deprecated)

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

#!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't completed the Foursquare API lab
import folium # map rendering library

print('Libraries imported.')

Libraries imported.


In [32]:
!pip install bs4
#!pip install requests



In [33]:
from bs4 import BeautifulSoup # this module helps in web scraping.

In [34]:
# URL of the Wikipedia page that lists the Toronto postal codes:
url = "https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M"

In [35]:
# We use get to download the contents of the web page in text format and store them in a variable called data:
data  = requests.get(url).text 

In [36]:
#We create a BeautifulSoup object using the BeautifulSoup constructor
soup = BeautifulSoup(data,"html5lib")  # create a soup object using the variable 'data'

In [38]:
# Extract the table data: each <td> cell holds a postal code, a borough and its neighborhoods
table_contents=[]
table=soup.find('table')
for row in table.findAll('td'):
    cell = {}
    if row.span.text=='Not assigned':
        pass
    else:
        cell['PostalCode'] = row.p.text[:3]
        cell['Borough'] = (row.span.text).split('(')[0]
        cell['Neighborhood'] = (((((row.span.text).split('(')[1]).strip(')')).replace(' /',',')).replace(')',' ')).strip(' ')
        table_contents.append(cell)

# print(table_contents)
df=pd.DataFrame(table_contents)
df['Borough']=df['Borough'].replace({'Downtown TorontoStn A PO Boxes25 The Esplanade':'Downtown Toronto Stn A',
                                             'East TorontoBusiness reply mail Processing Centre969 Eastern':'East Toronto Business',
                                             'EtobicokeNorthwest':'Etobicoke Northwest','East YorkEast Toronto':'East York/East Toronto',
                                             'MississaugaCanada Post Gateway Processing Centre':'Mississauga'})

#to print the number of rows of dataframe
df.shape

(103, 3)
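The string handling in the extraction loop is dense, so here is a minimal sketch of the same splitting logic as a standalone function. The name `parse_cell` and the two sample strings are illustrative assumptions shaped like the cell text the loop processes; they are not taken from the scraped page itself.

```python
def parse_cell(span_text):
    """Split 'Borough(Neighborhood A / Neighborhood B)' into (borough, neighborhoods)."""
    # Everything before the first '(' is the borough name.
    borough, _, rest = span_text.partition('(')
    # Same clean-up chain as in the loop above: drop the closing parenthesis
    # and turn ' /' separators into commas.
    neighborhood = rest.strip(')').replace(' /', ',').replace(')', ' ').strip(' ')
    return borough, neighborhood


# Hypothetical inputs for illustration only
print(parse_cell('North York(Parkwoods)'))
print(parse_cell('Downtown Toronto(Regent Park / Harbourfront)'))
```

Keeping the parsing in one small function makes it easier to check edge cases by hand before running it over the whole table.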

In [39]:
df

Unnamed: 0,PostalCode,Borough,Neighborhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,"Regent Park, Harbourfront"
3,M6A,North York,"Lawrence Manor, Lawrence Heights"
4,M7A,Queen's Park,Ontario Provincial Government
5,M9A,Etobicoke,Islington Avenue
6,M1B,Scarborough,"Malvern, Rouge"
7,M3B,North York,Don Mills North
8,M4B,East York,"Parkview Hill, Woodbine Gardens"
9,M5B,Downtown Toronto,"Garden District, Ryerson"


In [40]:
# URL of the CSV file with the coordinates of each postal code:
data_url = "https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-DS0701EN-SkillsNetwork/labs_v1/Geospatial_Coordinates.csv"

#csv contents directly in url
df2 = pd.read_csv(data_url)
df2

Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476
5,M1J,43.744734,-79.239476
6,M1K,43.727929,-79.262029
7,M1L,43.711112,-79.284577
8,M1M,43.716316,-79.239476
9,M1N,43.692657,-79.264848


In [41]:
#sort PostalCode
df1 = df.sort_values(['PostalCode'])

#create new df with reset index
df3 = df1.reset_index(drop=True)

#add new columns Latitude and Longitude (assigned by position, so df2 must list the same postal codes in the same sorted order)
latitude = df2['Latitude']
longitude = df2['Longitude']
df3['Latitude'] = latitude
df3['Longitude'] = longitude
df3

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude
0,M1B,Scarborough,"Malvern, Rouge",43.806686,-79.194353
1,M1C,Scarborough,"Rouge Hill, Port Union, Highland Creek",43.784535,-79.160497
2,M1E,Scarborough,"Guildwood, Morningside, West Hill",43.763573,-79.188711
3,M1G,Scarborough,Woburn,43.770992,-79.216917
4,M1H,Scarborough,Cedarbrae,43.773136,-79.239476
5,M1J,Scarborough,Scarborough Village,43.744734,-79.239476
6,M1K,Scarborough,"Kennedy Park, Ionview, East Birchmount Park",43.727929,-79.262029
7,M1L,Scarborough,"Golden Mile, Clairlea, Oakridge",43.711112,-79.284577
8,M1M,Scarborough,"Cliffside, Cliffcrest, Scarborough Village West",43.716316,-79.239476
9,M1N,Scarborough,"Birch Cliff, Cliffside West",43.692657,-79.264848
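The coordinates above are attached by row position, which only works while both tables list the same postal codes in the same order. A more defensive option is to match rows on the postal code itself. The sketch below shows the idea with plain dictionaries; the function name `join_on_postal_code` and the sample rows (including the made-up code `M9Z`) are assumptions for illustration. In pandas, the same key-based matching is what `DataFrame.merge` provides.

```python
def join_on_postal_code(neighborhoods, coordinates):
    """Attach Latitude/Longitude to each neighborhood row by matching postal codes."""
    # Build a lookup table keyed by postal code.
    lookup = {c['Postal Code']: c for c in coordinates}
    joined = []
    for row in neighborhoods:
        coords = lookup.get(row['PostalCode'])
        joined.append({
            **row,
            # Rows without a matching postal code get None instead of a wrong value.
            'Latitude': coords['Latitude'] if coords else None,
            'Longitude': coords['Longitude'] if coords else None,
        })
    return joined


# Hypothetical sample rows shaped like df and df2 (deliberately in a different order)
neighborhoods = [
    {'PostalCode': 'M1C', 'Borough': 'Scarborough'},
    {'PostalCode': 'M1B', 'Borough': 'Scarborough'},
    {'PostalCode': 'M9Z', 'Borough': 'Unknown'},  # made-up code with no coordinates
]
coordinates = [
    {'Postal Code': 'M1B', 'Latitude': 43.806686, 'Longitude': -79.194353},
    {'Postal Code': 'M1C', 'Latitude': 43.784535, 'Longitude': -79.160497},
]

rows = join_on_postal_code(neighborhoods, coordinates)
for r in rows:
    print(r)
```

With a key-based join, a missing or reordered row shows up as an empty coordinate rather than silently shifting every value below it.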


In [42]:
# Define Foursquare credentials and version (replace the placeholders with your own values)
CLIENT_ID = 'YOUR_CLIENT_ID' # your Foursquare ID
CLIENT_SECRET = 'YOUR_CLIENT_SECRET' # your Foursquare Secret
ACCESS_TOKEN = 'YOUR_ACCESS_TOKEN' # your Foursquare Access Token
VERSION = '20180604'
LIMIT = 100
print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: YOUR_CLIENT_ID
CLIENT_SECRET:YOUR_CLIENT_SECRET


In [43]:
#!pip install shapely
import shapely.geometry

#!pip install pyproj
import pyproj

import math

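# Note: Toronto lies in UTM zone 17. These helpers project into zone 33, so the x/y values
# stored with each restaurant are heavily distorted and unsuitable for distance calculations.
def lonlat_to_xy(lon, lat):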
    proj_latlon = pyproj.Proj(proj='latlong',datum='WGS84')
    proj_xy = pyproj.Proj(proj="utm", zone=33, datum='WGS84')
    xy = pyproj.transform(proj_latlon, proj_xy, lon, lat)
    return xy[0], xy[1]

def xy_to_lonlat(x, y):
    proj_latlon = pyproj.Proj(proj='latlong',datum='WGS84')
    proj_xy = pyproj.Proj(proj="utm", zone=33, datum='WGS84')
    lonlat = pyproj.transform(proj_xy, proj_latlon, x, y)
    return lonlat[0], lonlat[1]

def calc_xy_distance(x1, y1, x2, y2):
    dx = x2 - x1
    dy = y2 - y1
    return math.sqrt(dx*dx + dy*dy)
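Planar distances from a UTM projection are only meaningful when the projection zone matches the area being mapped. As a hedged sketch, the two helpers below show how the standard zone number can be derived from a longitude, and how a distance can be computed directly from latitude/longitude with the haversine formula, which needs no projection at all. Both function names are illustrative assumptions, the zone formula ignores the special-case zones around Norway and Svalbard, and the haversine result assumes a spherical Earth with a mean radius of 6371 km.

```python
import math


def utm_zone_for_longitude(lon):
    """Return the standard 6-degree UTM zone number (1-60) for a longitude in degrees."""
    return int((lon + 180) // 6) % 60 + 1


def haversine_m(lat1, lon1, lat2, lon2, radius_m=6371000.0):
    """Great-circle distance in metres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_m * math.asin(math.sqrt(a))


# Longitude of Toronto as returned by the geocoder in this notebook
toronto_zone = utm_zone_for_longitude(-79.3839347)
print('UTM zone for Toronto:', toronto_zone)

# Distance between two of the neighborhood centers listed above (M1B and M1C)
print('M1B to M1C in metres:', haversine_m(43.806686, -79.194353, 43.784535, -79.160497))
```

For the radii used in this notebook (a few hundred metres), the spherical approximation is generally accurate enough to decide whether a venue falls inside a search circle.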


In [44]:
# Category IDs corresponding to Italian restaurants were taken from Foursquare web site (https://developer.foursquare.com/docs/resources/categories):

food_category = '4d4b7105d754a06374d81259' # 'Root' category for all food-related venues

italian_restaurant_categories = ['4bf58dd8d48988d110941735','55a5a1ebe4b013909087cbb6','55a5a1ebe4b013909087cb7c',
                                 '55a5a1ebe4b013909087cba7','55a5a1ebe4b013909087cba1','55a5a1ebe4b013909087cba4',
                                 '55a5a1ebe4b013909087cb95','55a5a1ebe4b013909087cb89','55a5a1ebe4b013909087cb9b',
                                 '55a5a1ebe4b013909087cb98','55a5a1ebe4b013909087cbbf','55a5a1ebe4b013909087cb79',
                                 '55a5a1ebe4b013909087cbb0','55a5a1ebe4b013909087cbb3','55a5a1ebe4b013909087cb74',
                                 '55a5a1ebe4b013909087cbaa','55a5a1ebe4b013909087cb83','55a5a1ebe4b013909087cb8c',
                                 '55a5a1ebe4b013909087cb92','55a5a1ebe4b013909087cb8f','55a5a1ebe4b013909087cb86',
                                 '55a5a1ebe4b013909087cbb9','55a5a1ebe4b013909087cb7f','55a5a1ebe4b013909087cbbc',
                                 '55a5a1ebe4b013909087cb9e','55a5a1ebe4b013909087cbc2','55a5a1ebe4b013909087cbad']

def is_restaurant(categories, specific_filter=None):
    restaurant_words = ['restaurant', 'diner', 'taverna', 'steakhouse']
    restaurant = False
    specific = False
    for c in categories:
        category_name = c[0].lower()
        category_id = c[1]
        for r in restaurant_words:
            if r in category_name:
                restaurant = True
        if 'fast food' in category_name:
            restaurant = False
        if specific_filter is not None and category_id in specific_filter:
            specific = True
            restaurant = True
    return restaurant, specific

def get_categories(categories):
    return [(cat['name'], cat['id']) for cat in categories]

def format_address(location):
    address = ', '.join(location['formattedAddress'])
    address = address.replace(', Toronto', '')
    return address

def get_venues_near_location(lat, lon, category, client_id, client_secret, radius=500, limit=100):
    version = '20180724'
    url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&categoryId={}&radius={}&limit={}'.format(
        client_id, client_secret, version, lat, lon, category, radius, limit)
    try:
        results = requests.get(url).json()['response']['groups'][0]['items']
        venues = [(item['venue']['id'],
                   item['venue']['name'],
                   get_categories(item['venue']['categories']),
                   (item['venue']['location']['lat'], item['venue']['location']['lng']),
                   format_address(item['venue']['location']),
                   item['venue']['location']['distance']) for item in results]        
    except Exception:  # request failed or the response had an unexpected shape
        venues = []
    return venues
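The request URL in `get_venues_near_location` is assembled with `str.format`, which leaves escaping of the parameter values to the caller. A possible alternative, sketched here with the standard library only, is to keep the parameters in a dictionary and let `urllib.parse.urlencode` build the query string. The names `build_explore_url` and `EXPLORE_ENDPOINT` are hypothetical; the endpoint and parameter names are the ones already used in the cell above, and the credentials are placeholders.

```python
from urllib.parse import urlencode

# Same endpoint as in get_venues_near_location
EXPLORE_ENDPOINT = 'https://api.foursquare.com/v2/venues/explore'


def build_explore_url(lat, lon, category, client_id, client_secret,
                      radius=500, limit=100, version='20180724'):
    """Build the explore request URL with properly escaped query parameters."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,
        'll': '{},{}'.format(lat, lon),
        'categoryId': category,
        'radius': radius,
        'limit': limit,
    }
    return EXPLORE_ENDPOINT + '?' + urlencode(params)


# Placeholder credentials; the category ID is the food root category used in this notebook
url = build_explore_url(43.6534817, -79.3839347, '4d4b7105d754a06374d81259',
                        'YOUR_CLIENT_ID', 'YOUR_CLIENT_SECRET')
print(url)
```

When the request is sent with `requests`, the same dictionary can be passed through the `params` argument of `requests.get`, together with a `timeout`, so that a stalled connection does not block the whole collection loop.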

In [45]:
# Let's now go over our neighborhood locations and get nearby restaurants; we'll also maintain a dictionary of all found restaurants and all found Italian restaurants

import pickle

def get_restaurants(lats, lons):
    restaurants = {}
    italian_restaurants = {}
    location_restaurants = []

    print('Obtaining venues around candidate locations:', end='')
    for lat, lon in zip(lats, lons):
        # Using radius=350 to make sure we have overlaps/full coverage so we don't miss any restaurant (we're using dictionaries to remove any duplicates resulting from area overlaps)
        venues = get_venues_near_location(lat, lon, food_category, CLIENT_ID, CLIENT_SECRET, radius=350, limit=100)
        area_restaurants = []
        for venue in venues:
            venue_id = venue[0]
            venue_name = venue[1]
            venue_categories = venue[2]
            venue_latlon = venue[3]
            venue_address = venue[4]
            venue_distance = venue[5]
            is_res, is_italian = is_restaurant(venue_categories, specific_filter=italian_restaurant_categories)
            if is_res:
                x, y = lonlat_to_xy(venue_latlon[1], venue_latlon[0])
                restaurant = (venue_id, venue_name, venue_latlon[0], venue_latlon[1], venue_address, venue_distance, is_italian, x, y)
                if venue_distance<=300:
                    area_restaurants.append(restaurant)
                restaurants[venue_id] = restaurant
                if is_italian:
                    italian_restaurants[venue_id] = restaurant
        location_restaurants.append(area_restaurants)
        print(' .', end='')
    print(' done.')
    return restaurants, italian_restaurants, location_restaurants

# Try to load from local file system in case we did this before
restaurants = {}
italian_restaurants = {}
location_restaurants = []
loaded = False
try:
    with open('restaurants_350.pkl', 'rb') as f:
        restaurants = pickle.load(f)
    with open('italian_restaurants_350.pkl', 'rb') as f:
        italian_restaurants = pickle.load(f)
    with open('location_restaurants_350.pkl', 'rb') as f:
        location_restaurants = pickle.load(f)
    print('Restaurant data loaded.')
    loaded = True
except Exception:  # cache files missing or unreadable; fall back to the API below
    pass

# If load failed use the Foursquare API to get the data
if not loaded:
    restaurants, italian_restaurants, location_restaurants = get_restaurants(latitude, longitude)
    
    # Let's persist this in the local file system
    with open('restaurants_350.pkl', 'wb') as f:
        pickle.dump(restaurants, f)
    with open('italian_restaurants_350.pkl', 'wb') as f:
        pickle.dump(italian_restaurants, f)
    with open('location_restaurants_350.pkl', 'wb') as f:
        pickle.dump(location_restaurants, f)
        

Restaurant data loaded.
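The load-or-fetch logic above repeats the same pattern for three files. As a sketch of how it could be factored out, the helper below returns a cached object when the pickle file can be read and otherwise computes and stores it. The names `load_or_compute` and `expensive`, the sample record and the temporary cache path are all assumptions made for this illustration.

```python
import os
import pickle
import tempfile


def load_or_compute(path, compute):
    """Return the pickled object at `path`, or call `compute()` and cache its result there."""
    try:
        with open(path, 'rb') as f:
            return pickle.load(f)
    except (OSError, pickle.UnpicklingError, EOFError):
        # Cache is missing or unreadable: compute the value and store it for next time.
        result = compute()
        with open(path, 'wb') as f:
            pickle.dump(result, f)
        return result


calls = []


def expensive():
    """Stand-in for a slow API call; records how often it is invoked."""
    calls.append(1)
    return {'venue-1': ('Example restaurant', 43.65, -79.38)}


cache_path = os.path.join(tempfile.mkdtemp(), 'example_cache.pkl')
first = load_or_compute(cache_path, expensive)   # computes and writes the cache
second = load_or_compute(cache_path, expensive)  # reads the cache, no second call
print(first == second, len(calls))
```

Catching only file and unpickling errors keeps genuine bugs in the compute step visible instead of hiding them behind a silent fallback.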


In [46]:
print('Total number of restaurants:', len(restaurants))
print('Total number of Italian restaurants:', len(italian_restaurants))
print('Percentage of Italian restaurants: {:.2f}%'.format(len(italian_restaurants) / len(restaurants) * 100))
print('Average number of restaurants in neighborhood:', np.array([len(r) for r in location_restaurants]).mean())

Total number of restaurants: 421
Total number of Italian restaurants: 34
Percentage of Italian restaurants: 8.08%
Average number of restaurants in neighborhood: 3.9223300970873787


In [47]:
print('List of all restaurants')
print('-----------------------')
for r in list(restaurants.values())[:10]:
    print(r)
print('...')
print('Total:', len(restaurants))

List of all restaurants
-----------------------
('5931bca269e77b2bba697481', 'WEST HILL BURGERS * WINGS', 43.765864, -79.191093, '4379 Kingston Rd, Scarborough ON M1E 2M9, Canada', 318, False, -5295061.201122491, 10483362.868947849)
('57fd24f6cd1083addfd77bf9', 'Sail Sushi', 43.765951, -79.191275, '9-4352 Kingston Rd, Scarborough ON M1E 2M8, Canada', 335, False, -5295045.16271997, 10483382.37083262)
('5411f741498e9ebd5e35d8bd', 'Big Bite Burrito', 43.766299084470795, -79.19071980583941, '4383 Kingston rd., Scarborough ON, Canada', 343, False, -5294996.593606517, 10483312.590719065)
('4de0403ed4c040523ea079f4', 'Korean Grill House', 43.7708117291354, -79.21450208834013, '369 Yonge Street ON, Canada', 195, False, -5293989.680591431, 10485975.362196662)
('52b128ae498ee935a36869a6', 'El rey del cabrito, monterrey city mexico', 43.7688, -79.2198, 'Ave. Gonzalitos, Monterrey Nuevo Leon. Mexico, Canada', 336, False, -5294244.549798206, 10486619.32867811)
('4c27da423492a593158cb628', 'Thai One

In [48]:
print('List of Italian restaurants')
print('---------------------------')
for r in list(italian_restaurants.values())[:10]:
    print(r)
print('...')
print('Total:', len(italian_restaurants))

List of Italian restaurants
---------------------------
('4bdaff7463c5c9b67bcb2568', 'Sorento Restaurant', 43.72657509457231, -79.34198930569546, '900 Don Mills Rd. ON, Canada', 114, True, -5299436.539560529, 10501419.375027534)
('4e3dbb5e45dd68e3273e03b7', 'Cafe Fiorentina', 43.677743, -79.350115, '463 Danforth Ave (Logan Ave) ON M4K 1P1, Canada', 261, True, -5307097.990499289, 10503207.509265002)
('4af4e0d0f964a5202ff721e3', '7 Numbers', 43.677061774959824, -79.35393428891682, '307 Danforth Ave. (at Bowden St) ON, Canada', 311, True, -5307157.941525041, 10503660.32483632)
('4ba0153bf964a520995837e3', 'Casa di Giorgio', 43.66664527559903, -79.31520351125722, '1646 Queen St. E (at Eastern Ave,) ON M4L 1G3, Canada', 263, True, -5309304.168944944, 10499369.071473999)
('4b1169f6f964a520177c23e3', 'Baldini', 43.661299966369135, -79.33902686943661, '1012 Queen St East (at Boston Ave) ON, Canada', 249, True, -5309854.246325896, 10502213.780609358)
('4b71edddf964a520cb642de3', 'Positano', 43.

In [49]:
#Use geopy library to get the latitude and longitude values of Toronto City
address = 'Toronto'

geolocator = Nominatim(user_agent="to_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Toronto City are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Toronto City are 43.6534817, -79.3839347.


Let's now see all the collected restaurants in our area of interest on a map, with Italian restaurants shown in a different color.

In [50]:
# create map of all collected restaurants in Toronto using latitude and longitude values
map_toronto = folium.Map(location=[latitude, longitude], zoom_start=10)

folium.Marker([latitude, longitude]).add_to(map_toronto)
for res in restaurants.values():
    lat = res[2]; lon = res[3]
    is_italian = res[6]
    color = 'red' if is_italian else 'blue'
    folium.CircleMarker([lat, lon], radius=3, color=color, fill=True, fill_color=color, fill_opacity=1).add_to(map_toronto)
map_toronto

Looking good. We now have all the restaurants in the area and we know which ones are Italian restaurants! We also know exactly which restaurants are in the vicinity of every neighborhood center.

This concludes the data gathering phase. We're now ready to use this data for analysis to produce the report on optimal locations for new Italian restaurants!

## Methodology <a name="methodology"></a>

In this project we will direct our efforts toward detecting areas of Toronto that have low restaurant density, particularly those with a low number of Italian restaurants.

In the first step we collected the required **data: location and type (category) of every restaurant**. We also **identified Italian restaurants** (according to Foursquare categorization).

The second step in our analysis will be the calculation and exploration of '**restaurant density**' across different areas of Toronto. We will use **heatmaps** to identify a few promising areas close to the center with a low number of restaurants in general and focus our attention on those areas.

In the third and final step we will focus on the most promising areas. We will present a map of all such locations to help the franchise identify neighborhoods for optimal locations.

## Analysis <a name="analysis"></a>

Let's perform some basic exploratory data analysis and derive some additional info from our raw data. First let's count the **number of restaurants in every neighborhood**:

In [51]:
location_restaurants_count = [len(res) for res in location_restaurants]

df3['Restaurants in area'] = location_restaurants_count

print('Average number of restaurants in every neighborhood with radius=300m:', np.array(location_restaurants_count).mean())

df3

Average number of restaurants in every neighborhood with radius=300m: 3.9223300970873787


Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude,Restaurants in area
0,M1B,Scarborough,"Malvern, Rouge",43.806686,-79.194353,0
1,M1C,Scarborough,"Rouge Hill, Port Union, Highland Creek",43.784535,-79.160497,0
2,M1E,Scarborough,"Guildwood, Morningside, West Hill",43.763573,-79.188711,0
3,M1G,Scarborough,Woburn,43.770992,-79.216917,1
4,M1H,Scarborough,Cedarbrae,43.773136,-79.239476,4
5,M1J,Scarborough,Scarborough Village,43.744734,-79.239476,0
6,M1K,Scarborough,"Kennedy Park, Ionview, East Birchmount Park",43.727929,-79.262029,0
7,M1L,Scarborough,"Golden Mile, Clairlea, Oakridge",43.711112,-79.284577,1
8,M1M,Scarborough,"Cliffside, Cliffcrest, Scarborough Village West",43.716316,-79.239476,1
9,M1N,Scarborough,"Birch Cliff, Cliffside West",43.692657,-79.264848,1


Let's create a map showing a **heatmap / density of restaurants** and try to extract some meaningful info from it.

In [52]:
restaurant_latlons = [[res[2], res[3]] for res in restaurants.values()]

italian_latlons = [[res[2], res[3]] for res in italian_restaurants.values()]

In [53]:
from folium import plugins
from folium.plugins import HeatMap

map_toronto = folium.Map(location=[latitude, longitude], zoom_start=13)
folium.TileLayer('cartodbpositron').add_to(map_toronto) #cartodbpositron cartodbdark_matter
HeatMap(restaurant_latlons).add_to(map_toronto)
folium.Marker([latitude, longitude]).add_to(map_toronto)
folium.Circle([latitude, longitude], radius=1000, fill=False, color='white').add_to(map_toronto)
folium.Circle([latitude, longitude], radius=2000, fill=False, color='white').add_to(map_toronto)
folium.Circle([latitude, longitude], radius=3000, fill=False, color='white').add_to(map_toronto)
map_toronto

Looks like a few pockets of low restaurant density closest to city center can be found **south, north and east from city center**. 
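A heatmap is read by eye, so a rough numeric cross-check can help confirm where the dense and sparse areas are. One simple option, sketched below, is to bin the restaurant coordinates into a regular latitude/longitude grid and count the points per cell. The function name `grid_counts` and the cell size of 0.01 degrees are assumptions chosen for illustration; the three sample points are restaurant coordinates printed earlier in this notebook, rounded to six decimals.

```python
import math
from collections import Counter


def grid_counts(latlons, cell_deg=0.01):
    """Count points per square grid cell; keys are (lat_index, lon_index) pairs."""
    counts = Counter()
    for lat, lon in latlons:
        cell = (math.floor(lat / cell_deg), math.floor(lon / cell_deg))
        counts[cell] += 1
    return counts


# Three restaurant locations from the list printed earlier
sample_latlons = [
    [43.765864, -79.191093],
    [43.765951, -79.191275],
    [43.770812, -79.214502],
]

counts = grid_counts(sample_latlons)
for cell, n in counts.most_common():
    print(cell, n)
```

Applied to the full `restaurant_latlons` list, cells with high counts should line up with the hot spots on the map, and cells that never appear in the result correspond to areas with no collected restaurants.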

Let's create another map showing the **heatmap/density of Italian restaurants** only.

In [54]:
from folium import plugins
from folium.plugins import HeatMap

map_toronto = folium.Map(location=[latitude, longitude], zoom_start=13)
folium.TileLayer('cartodbpositron').add_to(map_toronto) #cartodbpositron cartodbdark_matter
HeatMap(italian_latlons).add_to(map_toronto)
folium.Marker([latitude, longitude]).add_to(map_toronto)
folium.Circle([latitude, longitude], radius=1000, fill=False, color='white').add_to(map_toronto)
folium.Circle([latitude, longitude], radius=2000, fill=False, color='white').add_to(map_toronto)
folium.Circle([latitude, longitude], radius=3000, fill=False, color='white').add_to(map_toronto)
map_toronto

This map is not so 'hot' (Italian restaurants represent a subset of ~8% of all collected restaurants in Toronto), but it also indicates a higher density of existing Italian restaurants directly north and south of the city center.

Based on this we will now focus our analysis on neighborhoods with no restaurants within 300 m of their center.

In [55]:
#the neighborhoods without restaurants
df10 = df3[df3['Restaurants in area'] == 0]
df10

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude,Restaurants in area
0,M1B,Scarborough,"Malvern, Rouge",43.806686,-79.194353,0
1,M1C,Scarborough,"Rouge Hill, Port Union, Highland Creek",43.784535,-79.160497,0
2,M1E,Scarborough,"Guildwood, Morningside, West Hill",43.763573,-79.188711,0
5,M1J,Scarborough,Scarborough Village,43.744734,-79.239476,0
6,M1K,Scarborough,"Kennedy Park, Ionview, East Birchmount Park",43.727929,-79.262029,0
11,M1R,Scarborough,"Wexford, Maryvale",43.750072,-79.295849,0
13,M1T,Scarborough,"Clarks Corners, Tam O'Shanter, Sullivan",43.781638,-79.304302,0
14,M1V,Scarborough,"Milliken, Agincourt North, Steeles East, L'Amo...",43.815252,-79.284577,0
16,M1X,Scarborough,Upper Rouge,43.836125,-79.205636,0
19,M2K,North York,Bayview Village,43.786947,-79.385975,0


In [56]:
#to print the number of rows of dataframe
df10.shape

(51, 6)

In [57]:
# create map of Neighborhood without restaurants
map_best_places = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, label in zip(df10['Latitude'], df10['Longitude'], df10['Neighborhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_best_places)  
    
map_best_places

## Results and Discussion <a name="results"></a>

Our analysis shows that although there is a great number of restaurants in Toronto (~400), there are pockets of low restaurant density fairly close to the city center. The heatmaps pointed to such pockets south, north and east of the city center, while existing Italian restaurants are concentrated directly north and south of it. We therefore focused our attention on neighborhoods with no restaurant within 300 m of their center.

The result is 51 zones with potential for new restaurants. This, of course, does not imply that those zones are actually optimal locations for a new restaurant! The purpose of this analysis was only to provide info on areas of Toronto that are not crowded with existing restaurants (particularly Italian). Recommended zones should therefore be considered only as a starting point for a more detailed analysis, which could eventually result in a location that not only has no nearby competition but also meets all other relevant conditions.

## Conclusion <a name="conclusion"></a>

The purpose of this project was to identify Toronto areas with a low number of restaurants in order to aid the franchise enterprise in narrowing down the search for optimal locations for some new Italian restaurants. By exploring the restaurant density distribution from Foursquare data we first identified general areas that justify further analysis, and then selected the neighborhoods that have no restaurant within 300 m of their center as zones of interest.

The final decision on optimal restaurant locations will be made by the franchise based on specific characteristics of neighborhoods and locations in every recommended zone, taking into consideration additional factors like attractiveness of each location (proximity to park or water), levels of noise / proximity to major roads, real estate availability, prices, social and economic dynamics of every neighborhood etc.