# Capstone Project - The Battle of the Neighborhoods (Week 2)
### Applied Data Science Capstone by IBM/Coursera

## Introduction

Through this notebook we will analyze the neighborhoods of toronto and identify the best area, theoretically, to open an italian restaurant. The purpose of this analysis is to serve as a basis to introduce and attract potential investors from the food industry and / or local families looking for a new investment, thus stimulating the city's economy. The location in which a business is established is fundamental to its success, especially when it comes to restaurants, so it is important to study the various possible locations in order to reduce the risks of the investment failing.

## Data

To carry out this study we will mainly use FourSquare, a technology company that built a massive dataset of location data. We will also use a table containing the postal codes of canada, imported from wikipegia, as well as a csv file with its coordinates. These will be the necessary tools to explore and analyze the different neighborhoods of toronto to identify different parameters, such as: number of restaurants, proximity to italian restaurants, among others.

### Neighborhood Candidates

In [1]:
import pandas as pd
import numpy as np

#### Importing a Dataframe from a Wikipegia page

In [2]:
df = pd.read_html('https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M')[0]

In [3]:
df

Unnamed: 0,Postal Code,Borough,Neighborhood
0,M1A,Not assigned,Not assigned
1,M2A,Not assigned,Not assigned
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,"Regent Park, Harbourfront"
...,...,...,...
175,M5Z,Not assigned,Not assigned
176,M6Z,Not assigned,Not assigned
177,M7Z,Not assigned,Not assigned
178,M8Z,Etobicoke,"Mimico NW, The Queensway West, South of Bloor,..."


#### Preparing and cleaning the data

In [4]:
df['Borough'].replace('Not assigned', np.NaN, inplace=True)
df['Neighborhood'].replace('Not assigned', np.NaN, inplace=True)

In [5]:
df = df[df['Borough'].notna()]

In [6]:
df.reset_index(inplace=True)

In [7]:
df['Postal Code'].duplicated().any()

False

In [8]:
df.drop(['index'], axis=1, inplace=True)

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  errors=errors,


In [9]:
df

Unnamed: 0,Postal Code,Borough,Neighborhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,"Regent Park, Harbourfront"
3,M6A,North York,"Lawrence Manor, Lawrence Heights"
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government"
...,...,...,...
98,M8X,Etobicoke,"The Kingsway, Montgomery Road, Old Mill North"
99,M4Y,Downtown Toronto,Church and Wellesley
100,M7Y,East Toronto,"Business reply mail Processing Centre, South C..."
101,M8Y,Etobicoke,"Old Mill South, King's Mill Park, Sunnylea, Hu..."


#### Reading a csv file with cordinates data and merging it with our dataframe

In [10]:
df1 = pd.read_csv('Geospatial_Coordinates.csv')
df1.head()

Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


In [11]:
df1.shape

(103, 3)

In [12]:
df2 = pd.merge(df, df1, on='Postal Code')

In [13]:
df2

Unnamed: 0,Postal Code,Borough,Neighborhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.753259,-79.329656
1,M4A,North York,Victoria Village,43.725882,-79.315572
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.654260,-79.360636
3,M6A,North York,"Lawrence Manor, Lawrence Heights",43.718518,-79.464763
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494
...,...,...,...,...,...
98,M8X,Etobicoke,"The Kingsway, Montgomery Road, Old Mill North",43.653654,-79.506944
99,M4Y,Downtown Toronto,Church and Wellesley,43.665860,-79.383160
100,M7Y,East Toronto,"Business reply mail Processing Centre, South C...",43.662744,-79.321558
101,M8Y,Etobicoke,"Old Mill South, King's Mill Park, Sunnylea, Hu...",43.636258,-79.498509


#### Restricting 'Borough' for places that contain 'Toronto'

In [14]:
import re
toronto_data = df2[df2['Borough'].str.match('.+\sToronto')== True]
toronto_data.reset_index(inplace=True)

In [15]:
toronto_data.drop(['index'], axis=1, inplace=True)

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  errors=errors,


In [16]:
toronto_data.head()

Unnamed: 0,Postal Code,Borough,Neighborhood,Latitude,Longitude
0,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636
1,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494
2,M5B,Downtown Toronto,"Garden District, Ryerson",43.657162,-79.378937
3,M5C,Downtown Toronto,St. James Town,43.651494,-79.375418
4,M4E,East Toronto,The Beaches,43.676357,-79.293031


In [17]:
toronto_data.shape

(39, 5)

In [18]:
import json

from geopy.geocoders import Nominatim

import requests
from pandas.io.json import json_normalize
import matplotlib.cm as cm
import matplotlib.colors as colors

from sklearn.cluster import KMeans

import folium

#### Using geopy library to get the latitude and longitude values of Toronto

In [19]:
address = 'Toronto, ON'

geolocator = Nominatim(user_agent="tor_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geograpical coordinate of Toronto are {}, {}.'.format(latitude, longitude))

The geograpical coordinate of Toronto are 43.6534817, -79.3839347.


#### Creating a map of Toronto with neighborhoods superimposed on top

In [20]:
map_toronto = folium.Map(location=[latitude, longitude], zoom_start=11)

for lat, lng, label in zip(toronto_data['Latitude'], toronto_data['Longitude'], toronto_data['Neighborhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto)  
    
map_toronto

#### Restricting Neighborhoods that are part of Downtown

Downtown Toronto is a buzzing area filled with skyscrapers, restaurants, nightlife, and an eclectic mix of neighbourhoods. It’s also home to iconic attractions like the CN Tower, St. Lawrence Market, and the Royal Ontario Museum, with exhibits on natural history. Bloor Street is an upscale shopping area, and the Eaton Centre is a huge, multistory mall. On the lake, the Harbourfront area has parks and cultural venues.

In [21]:
downtown = toronto_data[toronto_data['Borough'].str.contains('Downtown')]
downtown.reset_index(inplace=True)

In [22]:
downtown.drop(['index'], axis=1, inplace=True)

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  errors=errors,


In [23]:
downtown

Unnamed: 0,Postal Code,Borough,Neighborhood,Latitude,Longitude
0,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636
1,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494
2,M5B,Downtown Toronto,"Garden District, Ryerson",43.657162,-79.378937
3,M5C,Downtown Toronto,St. James Town,43.651494,-79.375418
4,M5E,Downtown Toronto,Berczy Park,43.644771,-79.373306
5,M5G,Downtown Toronto,Central Bay Street,43.657952,-79.387383
6,M6G,Downtown Toronto,Christie,43.669542,-79.422564
7,M5H,Downtown Toronto,"Richmond, Adelaide, King",43.650571,-79.384568
8,M5J,Downtown Toronto,"Harbourfront East, Union Station, Toronto Islands",43.640816,-79.381752
9,M5K,Downtown Toronto,"Toronto Dominion Centre, Design Exchange",43.647177,-79.381576


#### Creating a map of Toronto with Downtown's neighborhoods superimposed on top

In [24]:
map_toronto = folium.Map(location=[latitude, longitude], zoom_start=12)

for lat, lng, label in zip(downtown['Latitude'], downtown['Longitude'], downtown['Neighborhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto)  
    
map_toronto

### Exploring Neighborhoods with FourSquare

In [62]:
CLIENT_ID = 'REMOVED' # your Foursquare ID
CLIENT_SECRET = 'REMOVED' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

In [26]:
LIMIT = 500 # limit of number of venues returned by Foursquare API
radius = 1000 # define radius

In [27]:
def getNearbyVenues(names, latitudes, longitudes, radius=1000):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['id'],
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],
            v['venue']['location']['distance'],
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude',
                  'Venue ID',
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude',
                  'Distance to Center',
                  'Venue Category']
    
    return(nearby_venues)

#### First let's get all the venues in a radius of 1000 meters from the center of each neighborhood

In [28]:
toronto_venues = getNearbyVenues(names=downtown['Neighborhood'],
                                   latitudes=downtown['Latitude'],
                                   longitudes=downtown['Longitude']
                                  )

Regent Park, Harbourfront
Queen's Park, Ontario Provincial Government
Garden District, Ryerson
St. James Town
Berczy Park
Central Bay Street
Christie
Richmond, Adelaide, King
Harbourfront East, Union Station, Toronto Islands
Toronto Dominion Centre, Design Exchange
Commerce Court, Victoria Hotel
University of Toronto, Harbord
Kensington Market, Chinatown, Grange Park
CN Tower, King and Spadina, Railway Lands, Harbourfront West, Bathurst Quay, South Niagara, Island airport
Rosedale
Stn A PO Boxes
St. James Town, Cabbagetown
First Canadian Place, Underground city
Church and Wellesley


In [29]:
print(toronto_venues.shape)
toronto_venues.head()

(1669, 9)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue ID,Venue,Venue Latitude,Venue Longitude,Distance to Center,Venue Category
0,"Regent Park, Harbourfront",43.65426,-79.360636,54ea41ad498e9a11e9e13308,Roselle Desserts,43.653447,-79.362017,143,Bakery
1,"Regent Park, Harbourfront",43.65426,-79.360636,53b8466a498e83df908c3f21,Tandem Coffee,43.653559,-79.361809,122,Coffee Shop
2,"Regent Park, Harbourfront",43.65426,-79.360636,51ccc048498ec7792efc955e,Corktown Common,43.655618,-79.356211,387,Park
3,"Regent Park, Harbourfront",43.65426,-79.360636,4ad4c05ef964a520bff620e3,The Distillery Historic District,43.650244,-79.359323,459,Historic Site
4,"Regent Park, Harbourfront",43.65426,-79.360636,574c229e498ebb5c6b257902,Cooper Koo Family YMCA,43.653249,-79.358008,239,Distribution Center


#### Now we filter the venues for restaurants

In [30]:
toronto_restaurants = toronto_venues[toronto_venues['Venue Category'].str.contains('Restaurant|Diner|Taverna|Steakhouse|Place')]
print(toronto_restaurants.shape)

(485, 9)


In [31]:
toronto_restaurants.reset_index(inplace=True)

In [32]:
toronto_restaurants.drop(['index'], axis=1, inplace=True)
toronto_restaurants.head()

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  errors=errors,


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue ID,Venue,Venue Latitude,Venue Longitude,Distance to Center,Venue Category
0,"Regent Park, Harbourfront",43.65426,-79.360636,5612b1cc498e3dd742af0dc8,Impact Kitchen,43.656369,-79.35698,376,Restaurant
1,"Regent Park, Harbourfront",43.65426,-79.360636,57e0745a498ea809dbf75f68,Souk Tabule,43.653756,-79.35439,506,Mediterranean Restaurant
2,"Regent Park, Harbourfront",43.65426,-79.360636,53a22c92498ec91fda7ce133,Cluny Bistro & Boulangerie,43.650565,-79.357843,468,French Restaurant
3,"Regent Park, Harbourfront",43.65426,-79.360636,4ad776eef964a520e20a21e3,Mangia and Bevi Resto-Bar,43.65225,-79.366355,512,Italian Restaurant
4,"Regent Park, Harbourfront",43.65426,-79.360636,4ada57aff964a520972121e3,Sukhothai,43.658444,-79.365681,618,Thai Restaurant


#### Then we filter the restaurants for italian restaurants

In [33]:
italian_restaurants = toronto_restaurants[toronto_restaurants['Venue Category'].str.contains('Italian|Pizza|Pasta')]
print(italian_restaurants.shape)
italian_restaurants.head()

(56, 9)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue ID,Venue,Venue Latitude,Venue Longitude,Distance to Center,Venue Category
3,"Regent Park, Harbourfront",43.65426,-79.360636,4ad776eef964a520e20a21e3,Mangia and Bevi Resto-Bar,43.65225,-79.366355,512,Italian Restaurant
6,"Regent Park, Harbourfront",43.65426,-79.360636,4cbdc6784495721ea262617a,Fusaro's,43.653347,-79.369517,722,Italian Restaurant
13,"Regent Park, Harbourfront",43.65426,-79.360636,4af61930f964a520260122e3,Bellisimo Pizzeria & Ristorante,43.648748,-79.368684,892,Pizza Place
14,"Regent Park, Harbourfront",43.65426,-79.360636,56d8dff7498eb4e5e661e78d,Ardo,43.651201,-79.36835,708,Italian Restaurant
25,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494,4a8355bff964a520d3fa1fe3,Mercatto,43.660391,-79.387664,258,Italian Restaurant


#### Creating a map of Toronto with blue circles representing all restaurants and red circles representing italian restaurants superimposed on top

In [34]:
map_toronto = folium.Map(location=[latitude, longitude], zoom_start=13)

for lat, lng, label in zip(toronto_restaurants['Venue Latitude'], toronto_restaurants['Venue Longitude'], toronto_restaurants['Neighborhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=3,
        popup=(lat, lng),
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto)

for lat, lng, label in zip(italian_restaurants['Venue Latitude'], italian_restaurants['Venue Longitude'], italian_restaurants['Neighborhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=3,
        popup=(lat, lng),
        color='red',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto) 
    
map_toronto

## Methodology

The first part of our study was aimed at collecting and preparing the data so that we could analyze Our area of ​​interest was defined as Downtown Toronto and within this area we were able to collect all the restaurants, including Italians, within a radius of 1 km from neighborhoods centers.

Then we will go into the analysis section, which will be divided into two parts. The first will be responsible for randomly generating 1000 locations within our area of ​​interest and filtering them according to the following parameters: having less than four and at least two restaurants (possible gastronomic centers) within 250 meters and no Italian restaurant in 400 meters. 

In the second part we will use the machine learning model called k-means clustering, which will help us to identify promising zones, by grouping the ideal locations and generating centers for each of the 18 clusters. Finally, we will convert the clusters centers to addresses using reverse geocoding.

## Analysis

#### Creating random locations inside our interest area

In [35]:
import random
from shapely.geometry import Polygon, Point

In [36]:
poly = Polygon([(43.63817355603517, -79.38054644461133), (43.6416627086796, -79.3752137736969), (43.65879825022111, -79.35008516753265), (43.672357143937056, -79.37804481316573), (43.67101871082308, -79.39108133032904), (43.67378411528878, -79.41177020045674), (43.67067884412717, -79.42614819830024), (43.66785822046965, -79.42805427320322), (43.661895, -79.426479), (43.638267, -79.390187)])

def random_points_within(poly, num_points):
    min_x, min_y, max_x, max_y = poly.bounds

    points = []

    while len(points) < num_points:
        random_point = Point([random.uniform(min_x, max_x), random.uniform(min_y, max_y)])
        if (random_point.within(poly)):
            points.append(random_point)

    return points


points = random_points_within(poly,1000)

In [37]:
array_length = len(points)

#### Creating a map of Toronto with those locations superimposed on top

In [38]:
map_toronto = folium.Map(location=[latitude, longitude], zoom_start=13)

for c in range(array_length):
    lat, lng = points[c].x, points[c].y
    label = folium.Popup((lat, lng), parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=3,
        popup=(lat, lng),
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto)

map_toronto

#### Preparing the data acquired before to create cordinates lists and defining a function to measure distance beteween two locations

In [39]:
from math import sin, cos, sqrt, atan2, radians


def distance(x, y, x1, y1):

    R = 6373.0   # approximate radius of earth in km

    lat1 = radians(x)
    lon1 = radians(y)
    lat2 = radians(x1)
    lon2 = radians(y1)

    dlon = lon2 - lon1
    dlat = lat2 - lat1

    a = sin(dlat / 2)**2 + cos(lat1) * cos(lat2) * sin(dlon / 2)**2
    c = 2 * atan2(sqrt(a), sqrt(1 - a))

    distance = R * c

    return int(distance*1000)

In [40]:
italat = []
for lat in italian_restaurants['Venue Latitude']:
    italat.append(lat)
    
italng = []
for lng in italian_restaurants['Venue Longitude']:
    italng.append(lng)
    
restlat = []
for lat in toronto_restaurants['Venue Latitude']:
    restlat.append(lat)
    
restlng = []
for lng in toronto_restaurants['Venue Longitude']:
    restlng.append(lng)

In [41]:
print(len(italat), len(italng))

56 56


In [42]:
itarest = []
x = 0
while x < 43:
    location = []
    location.append(italat[x])
    location.append(italng[x])
    itarest.append(location)
    x = x + 1

In [43]:
print(len(restlat), len(restlng))

485 485


In [44]:
rest = []
x = 0
while x < 368:
    location = []
    location.append(restlat[x])
    location.append(restlng[x])
    rest.append(location)
    x = x + 1

#### Filtering the locations that fit the established parameters

In [45]:
good_locations = []
X = []

for c in range(len(points)):
    location = []
    Xloc = []
    count = 0
    itacount = 0
    lat = points[c].x
    lng = points[c].y
    
    for r in range(len(rest)):
        restlat = rest[r][0]
        restlng = rest[r][1]
        if distance(lat, lng, restlat, restlng) < 250:
            count = count + 1
            
    for i in range(len(itarest)):
        italat = itarest[i][0]
        italng = itarest[i][1]
        if distance(lat, lng, italat, italng) < 400:
            itacount = itacount + 1
            
    if count >=2 and count <= 4 and itacount == 0:
        location.append(lat)
        location.append(lng)
        location.append(count)
        location.append(itacount)
        good_locations.append(location)
        
        Xloc.append(lat)
        Xloc.append(lng)
        X.append(Xloc)

In [46]:
len(good_locations)

111

#### Converting to Dataframe

In [47]:
df_good_locations = pd.DataFrame(good_locations, columns =['Latitude', 'Longitude', 'Restaurants in 250m', 'Italian Restaurants in 400m'])
df_good_locations.head()

Unnamed: 0,Latitude,Longitude,Restaurants in 250m,Italian Restaurants in 400m
0,43.638302,-79.387486,3,0
1,43.665744,-79.387976,3,0
2,43.647778,-79.391585,4,0
3,43.65832,-79.352113,2,0
4,43.668924,-79.383967,4,0


#### Creating a map of Toronto with the good locations superimposed on top

In [48]:
map_toronto = folium.Map(location=[latitude, longitude], zoom_start=13)

for lat, lng, label in zip(df_good_locations['Latitude'], df_good_locations['Longitude'], df_good_locations['Restaurants in 250m']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=3,
        popup=(lat, lng),
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto)

map_toronto

#### Clustering good locations to find it centers

Were defined 18 clusters, and numeric labels for each of them.

In [49]:
from sklearn.cluster import KMeans

In [50]:
kclusters = 18

In [51]:
k_means = KMeans(init="k-means++", n_clusters=kclusters, n_init=12)

In [52]:
k_means.fit(X)

KMeans(algorithm='auto', copy_x=True, init='k-means++', max_iter=300,
       n_clusters=18, n_init=12, n_jobs=None, precompute_distances='auto',
       random_state=None, tol=0.0001, verbose=0)

In [53]:
k_means_labels = k_means.labels_
k_means_labels

array([ 6, 17, 11,  7,  3,  0, 16,  0, 13,  5,  5, 16,  9,  0,  0, 14,  4,
       14, 16,  0,  5, 11, 14, 16, 16,  4, 12, 16, 16, 13,  5, 16,  5,  5,
        6,  7,  0, 16, 13, 13, 11,  0, 10, 13, 13,  4, 15,  6, 14, 14,  1,
        8, 16, 17,  3,  0, 15, 11,  4, 16, 12,  8,  2,  9,  5,  5,  6, 11,
        7,  5, 14, 14,  7,  3,  5, 16,  5, 11,  1, 13, 13,  4, 10,  1,  0,
        1,  5,  2,  2, 10,  9,  8, 11, 11, 16, 12,  5, 16,  1,  2, 16, 16,
       14,  0,  1,  3,  5,  9,  4,  4,  4])

In [54]:
k_means_cluster_centers = k_means.cluster_centers_
k_means_cluster_centers

array([[ 43.65561545, -79.39234406],
       [ 43.66063301, -79.36721468],
       [ 43.66267398, -79.42425619],
       [ 43.66886376, -79.38397656],
       [ 43.65635987, -79.40743187],
       [ 43.64898657, -79.39761362],
       [ 43.63906082, -79.38788751],
       [ 43.65854874, -79.3524473 ],
       [ 43.66742967, -79.39765083],
       [ 43.65248753, -79.38815619],
       [ 43.64395266, -79.37704982],
       [ 43.64852968, -79.39393858],
       [ 43.67236391, -79.41671648],
       [ 43.65385744, -79.39798921],
       [ 43.66001051, -79.36412231],
       [ 43.65968927, -79.37462964],
       [ 43.65831895, -79.36954498],
       [ 43.66509773, -79.38793666]])

In [55]:
# add clustering labels
df_good_locations.insert(0, 'Cluster Labels', k_means.labels_)

In [56]:
df_good_locations.head()

Unnamed: 0,Cluster Labels,Latitude,Longitude,Restaurants in 250m,Italian Restaurants in 400m
0,6,43.638302,-79.387486,3,0
1,17,43.665744,-79.387976,3,0
2,11,43.647778,-79.391585,4,0
3,7,43.65832,-79.352113,2,0
4,3,43.668924,-79.383967,4,0


#### Creating a map of Toronto with the good locations clusterd in different colours superimposed on top

In [57]:
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=13)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

for lat, lng, cluster in zip(df_good_locations['Latitude'], df_good_locations['Longitude'], df_good_locations['Cluster Labels']):
    label = folium.Popup(' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       

map_clusters

#### Using reverse geocoding to identify the adress of each cluster center

In [58]:
def get_address(api_key, latitude, longitude, verbose=False):
    try:
        url = 'https://maps.googleapis.com/maps/api/geocode/json?key={}&latlng={},{}'.format(api_key, latitude, longitude)
        response = requests.get(url).json()
        if verbose:
            print('Google Maps API JSON result =>', response)
        results = response['results']
        address = results[0]['formatted_address']
        return address
    except:
        return None

In [61]:
api_key = 'REMOVED'

In [60]:
candidate_area_addresses = []
print('==============================================================')
print('Addresses of centers of areas recommended for further analysis')
print('==============================================================\n')
for c in range(len(k_means_cluster_centers)):
    lat = k_means_cluster_centers[c][0]
    lng = k_means_cluster_centers[c][1]
    addr = get_address(api_key, lat, lng)
    candidate_area_addresses.append(addr)    
    print(addr)

Addresses of centers of areas recommended for further analysis

162 McCaul St, Toronto, ON M5T 1W4, Canada
350 Parliament St, Toronto, ON M5A 2Z7, Canada
850 A Bloor St W, Toronto, ON M6G 1M2, Canada
45 Charles St E, Toronto, ON M4Y 1S2, Canada
431 College St, Toronto, ON M5T 1T1, Canada
3 Cameron St, Toronto, ON M5V 2A9, Canada
318 Queens Quay W, Toronto, ON M5V 3A7, Canada
630 Queen St E, Toronto, ON M4M 1G3, Canada
110 Devonshire Pl, Toronto, ON M5S 2C9, Canada
225 Simcoe St, Toronto, ON M5G 1S4, Canada
18 Lake Shore Blvd W, Toronto, ON M5E 1Z8, Canada
134 Peter St, Toronto, ON M5V 2H2, Canada
888 Palmerston Ave, Toronto, ON M6G 2S2, Canada
307 Spadina Ave, Toronto, ON M5T 2E6, Canada
500 Dundas St E, Toronto, ON M5A 3V3, Canada
368 George St, Toronto, ON M5A 2N3, Canada
142 Seaton St, Toronto, ON M5A 2T3, Canada
1000 Bay St, Toronto, ON M5S 3A6, Canada


This concludes our analysis. We have created 18 addresses representing centers of zones containing locations with low number of restaurants and no Italian restaurants nearby. Although it's important to emphasize that the centers/addresses should be considered only as a starting point for exploring area neighborhoods in search for potential restaurant locations. All of the zones are located in Downtown Toronto boroughs, which we have identified as interesting due to being popular with tourists, close to city center and well connected by public transport.

## Results and Discussion

We can observe that although there are a relatively large number of venues in the Downtown Toronto area, there are still possible zones to be explored with low concentration of restaurants.
 
Our result was to generate 18 addresses with potential for stakeholders to invest in an Italian restaurant (based on proximity to other restaurants). We have acquired information about those locations, however it does not indicate that these addresses are really ideal for opening an italian restaurant.
 
Future studies are necessary, because even with low competitiveness there may be other factors that make unsustainable for new restaurants to be setted in those locations.

## Conclusion

The aim of this project was to identify possible promising areas for opening an Italian restaurant in Toronto. By collecting information about the city we could see that the Downtown area was the most interesting for the study, because it has the biggest touristic center, busy streets etc. With the downtown Toronto region defined as our zone of interest, we generated 1000 random locations within that zone and filtered them to meet some parameters about nearby restaurants. So, we were able to group these locations into zones or clusters, which addresses are the final product of the work, to be used in the future as a starting point for stakeholder evaluation.
 
It is worth mentioning that although it is a starting point, the addresses obtained should not be readily considered ideal areas for opening the restaurant, because there are many other factors to be considered such as: proximity to attractions, commerce, price, availability of location, and the socio-economic dynamics of the neighborhood.
