# Location Analysis for <span style="color:#75b239">Veggie Grill</span> in Toronto, CA
<br>
<img src="https://cdn.pixabay.com/photo/2020/04/21/02/32/buildings-5070537_960_720.jpg" width="600" align="left" alt="Toronto Skyline">

<a href="https://pixabay.com/photos/buildings-city-cityscape-skyscraper-5070537/">By lucasgeorgewendt from Pixabay</a>

**Disclaimer:** Please note that this is a mock project. The decision for **<span style="color:#75b239">Veggie Grill</span>** to expand to Toronto and the customer segmentation are not based on facts; they were made up for the purpose of this project.

## Table of Contents


#### 1. Introduction  
#### 2. Methodology  
#### 3. Analysis  
#### 4. Results  


## 1. Introduction

This section gives an introduction to the restaurant chain Veggie Grill and the business problem to be solved by this project.

### Veggie Grill

**<span style="color:#75b239">Veggie Grill</span>** is a **fast-casual vegan restaurant chain** that operates in California, Oregon, Washington, Illinois, and Massachusetts. The **first** restaurant **opened in 2006** in Irvine, California, and the chain has since grown to become the **largest vegetarian and vegan restaurant company in the U.S.**
The chain focuses on offering **only plant-based food**, with no meat, dairy, eggs, cholesterol, animal fat or trans fat.

### Business Problem

The owners of **<span style="color:#75b239">Veggie Grill</span>** have decided to expand into the Canadian market, and management has chosen Toronto as the city for the first Canadian restaurant.

To gain a foothold in Toronto quickly, a suitable location in Downtown Toronto should be chosen for the first restaurant. The management of **<span style="color:#75b239">Veggie Grill</span>** has therefore approached **Capstone Data Science** to identify the ideal location based on two criteria:
<br>
1. **Competition:** Overall competition from vegan/vegetarian restaurants in the area.
2. **Attractiveness:** Attractiveness of the neighborhood in terms of potential customers. A customer segmentation carried out by **<span style="color:#75b239">Veggie Grill</span>** a few months ago showed that its customers are between 10 and 55 years old. Furthermore, its main revenue stream comes from employees who go out to eat during their lunch break.

## 2. Methodology

This section gives an overview of how the project will be approached and of the data and data sources needed to solve the business problem.

### Overview

To determine a suitable location, four main data sources will be used:
1. **Foursquare Venue Data:** For competition data, Foursquare will be used to build competition clusters based on the density of vegan/vegetarian restaurants in each neighborhood. The data will be retrieved via the Foursquare API.
2. **Toronto Economics Data:** Toronto economics data will be used to determine how many businesses are located in each neighborhood, since **<span style="color:#75b239">Veggie Grill's</span>** main revenue stream comes from employees who go out for lunch. The data will be retrieved from the City of Toronto website: https://open.toronto.ca/dataset/wellbeing-toronto-economics/.
3. **Toronto Demographics Data:** Toronto population data will be used to determine how many potential customers in the identified age group of 10 to 55 years live in each neighborhood. The data will be retrieved from the City of Toronto website: https://open.toronto.ca/dataset/wellbeing-toronto-demographics/.
4. **Toronto Location Data:** To plot the high-potential neighborhoods on a map, latitude and longitude values will also be downloaded from the City of Toronto website: https://open.toronto.ca/dataset/neighbourhoods/. Generating these coordinates with a geocoder would take considerable extra effort, since the neighborhood names cannot easily be resolved by a geocoder.
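
The competition criterion comes down to counting vegan/vegetarian venues around each neighborhood. The following is a minimal sketch and not the project's implementation. It assumes the Foursquare v2 `venues/search` endpoint, its JSON layout (`response` → `venues`), the version parameter `v`, and the category ID `4bf58dd8d48988d1d3941735` for "Vegetarian / Vegan Restaurant". Foursquare has since introduced a newer Places API, so these details should be checked against the current documentation. The helper names `build_search_params` and `count_venues` are hypothetical.

```python
# Sketch only: endpoint, parameters and category ID are ASSUMED (Foursquare v2).
SEARCH_URL = "https://api.foursquare.com/v2/venues/search"
VEGAN_CATEGORY_ID = "4bf58dd8d48988d1d3941735"  # assumed: "Vegetarian / Vegan Restaurant"


def build_search_params(lat, lng, client_id, client_secret, radius=500, limit=50):
    """Hypothetical helper: query parameters for a vegan/vegetarian venue search
    within `radius` metres of one neighborhood centre."""
    return {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": "20200601",  # API version date (assumed value)
        "ll": f"{lat},{lng}",
        "categoryId": VEGAN_CATEGORY_ID,
        "radius": radius,
        "limit": limit,
    }


def count_venues(response_json):
    """Hypothetical helper: number of venues in a search response,
    assuming the layout {"response": {"venues": [...]}}."""
    return len(response_json.get("response", {}).get("venues", []))


# Offline example with a hand-made response of the assumed shape
sample = {"response": {"venues": [{"name": "Venue A"}, {"name": "Venue B"}]}}
print(count_venues(sample))  # 2

# Live call (needs valid credentials; not executed here):
# import requests
# params = build_search_params(43.676919, -79.425515, "CLIENT_ID", "CLIENT_SECRET")
# n_vegan = count_venues(requests.get(SEARCH_URL, params=params).json())
```

Searching within a fixed radius around each neighborhood centre turns the venue count into a simple density proxy that can be compared across neighborhoods. Keep in mind that the `limit` parameter caps how many venues a single request returns.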
 


### Data Analysis

We will conduct two clustering analyses: a competition clustering and an attractiveness clustering. The two clusterings will then be combined to determine which neighborhoods have the highest potential for the first restaurant.
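
The sketch below shows one way to run the two clusterings with scikit-learn's `KMeans`. It assumes one row per neighborhood with the hypothetical columns `vegan_venues` (competition), `Businesses` and `target_pop` (attractiveness). Several parts are illustrative assumptions rather than the project's final settings: the choice of three clusters, the standardization step, and the combination rule (lowest-competition cluster and highest-attractiveness cluster).

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler


def ranked_clusters(df, columns, k, seed=0):
    """Cluster rows on `columns` with k-means, then relabel the clusters by their
    mean standardized value so that 0 is the lowest cluster and k - 1 the highest."""
    X = StandardScaler().fit_transform(df[columns])
    labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(X)
    cluster_means = pd.Series(X.mean(axis=1)).groupby(labels).mean()
    rank = {label: r for r, label in enumerate(cluster_means.sort_values().index)}
    return pd.Series(labels, index=df.index).map(rank)


def combine_clusterings(df, k=3):
    """Add competition/attractiveness ranks and flag high-potential neighborhoods."""
    out = df.copy()
    out["competition"] = ranked_clusters(out, ["vegan_venues"], k)
    out["attractiveness"] = ranked_clusters(out, ["Businesses", "target_pop"], k)
    # High potential: least competitive AND most attractive cluster
    out["high_potential"] = (out["competition"] == 0) & (out["attractiveness"] == k - 1)
    return out


# Toy data, made up for illustration only
toy = pd.DataFrame(
    {
        "vegan_venues": [0, 1, 0, 6, 5, 6, 12, 13, 12],
        "Businesses": [2000, 2100, 150, 160, 1000, 1050, 2050, 140, 1020],
        "target_pop": [20000, 21000, 1500, 1600, 10000, 10500, 20500, 1400, 10200],
    },
    index=list("ABCDEFGHI"),
)
print(combine_clusterings(toy)["high_potential"])
```

Standardizing first keeps the business counts and population figures, which are on very different scales, from dominating the distance calculation. Ranking the clusters by their mean makes the arbitrary k-means label numbers comparable between the two clusterings.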

All relevant data from the above-mentioned sources will be combined into one data frame and then plotted on a map of Toronto with meaningful color coding, so that the relevant neighborhoods can be identified at a glance.
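
For the color coding, a common pattern is to sample one color per category from a matplotlib colormap, using the `matplotlib.cm` and `matplotlib.colors` modules imported in the analysis section. This is a minimal sketch, and the helper `category_colors` is hypothetical. The hex strings it returns can be passed as the `color` and `fill_color` options of a `folium.CircleMarker` for each neighborhood.

```python
import numpy as np
import matplotlib.cm as cm
import matplotlib.colors as colors


def category_colors(n):
    """Return n hex colors sampled evenly across the 'rainbow' colormap."""
    rgba = cm.rainbow(np.linspace(0, 1, n))  # array of shape (n, 4)
    return [colors.rgb2hex(c) for c in rgba]


palette = category_colors(3)
print(palette)

# Drawing one neighborhood with folium (not executed here; lat, lng, name and
# category would come from the combined data frame):
# folium.CircleMarker([lat, lng], radius=6, popup=name, color=palette[category],
#                     fill=True, fill_color=palette[category], fill_opacity=0.7).add_to(toronto_map)
```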

## 3. Analysis

This section contains the data preparation and data analysis to derive a meaningful conclusion.

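```python
import pandas as pd                      # data frames
from geopy.geocoders import Nominatim    # geocoding (address -> coordinates)
import folium                            # interactive maps
import requests                          # HTTP requests to the Foursquare API
import numpy as np                       # numerical helpers
from sklearn.cluster import KMeans       # k-means clustering
import matplotlib.cm as cm               # colormaps for cluster colors
import matplotlib.colors as colors       # color conversion (e.g. RGB -> hex)
print('Libraries imported.')
```

Libraries imported.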


### Toronto Economics Data

```python
# CSV from https://open.toronto.ca/dataset/wellbeing-toronto-economics/ (adjust the path to your local copy)
df_eco = pd.read_csv("wellbeing-toronto-economics.csv")
df_eco.head()
```

| | Neighbourhood | Neighbourhood Id | Businesses | Child Care Spaces | Debt Risk Score | Home Prices | Local Employment | Social Assistance Recipients |
|---|---|---|---|---|---|---|---|---|
| 0 | West Humber-Clairville | 1 | 2463 | 195 | 719 | 317508 | 58271 | 2912 |
| 1 | Mount Olive-Silverstone-Jamestown | 2 | 271 | 60 | 687 | 251119 | 3244 | 6561 |
| 2 | Thistletown-Beaumond Heights | 3 | 217 | 25 | 718 | 414216 | 1311 | 1276 |
| 3 | Rexdale-Kipling | 4 | 144 | 75 | 721 | 392271 | 1178 | 1323 |
| 4 | Elms-Old Rexdale | 5 | 67 | 60 | 692 | 233832 | 903 | 1683 |


### Toronto Demographics Data

```python
# CSV from https://open.toronto.ca/dataset/wellbeing-toronto-demographics/ (adjust the path to your local copy)
df_demo = pd.read_csv("wellbeing-toronto-population-total-2011-2016-and-age-groups-2016-1.csv")
df_demo.head()
```

| | NeighbourhoodID | Neighbourhood | Total Area | 0 to 04 years | 0 to 14 years | 05 to 09 years | 10 to 14 years | 100 years and over | 15 to 19 years | 20 to 24 years | ... | 65 to 69 years | 65 years and over | 70 to 74 years | 75 to 79 years | 80 to 84 years | 85 to 89 years | 85 years and over | 90 to 94 years | 95 to 99 years | Total Population - All Age Groups - 100% data |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1.0 | West Humber-Clairville | 30.09 | 1540.0 | 5060.0 | 1720.0 | 1790.0 | 5.0 | 2325.0 | 3120.0 | ... | 1595.0 | 4980.0 | 1185.0 | 885.0 | 700.0 | 400.0 | 615.0 | 160.0 | 50.0 | 33320.0 |
| 1 | 2.0 | Mount Olive-Silverstone-Jamestown | 4.6 | 2190.0 | 7090.0 | 2500.0 | 2415.0 | 0.0 | 2585.0 | 2655.0 | ... | 1285.0 | 3560.0 | 885.0 | 630.0 | 465.0 | 225.0 | 300.0 | 70.0 | 10.0 | 32950.0 |
| 2 | 3.0 | Thistletown-Beaumond Heights | 3.4 | 540.0 | 1730.0 | 600.0 | 595.0 | 5.0 | 650.0 | 760.0 | ... | 490.0 | 1880.0 | 375.0 | 335.0 | 320.0 | 225.0 | 350.0 | 100.0 | 20.0 | 10360.0 |
| 3 | 4.0 | Rexdale-Kipling | 2.5 | 560.0 | 1640.0 | 515.0 | 565.0 | 0.0 | 635.0 | 720.0 | ... | 520.0 | 1730.0 | 350.0 | 295.0 | 270.0 | 205.0 | 300.0 | 85.0 | 15.0 | 10530.0 |
| 4 | 5.0 | Elms-Old Rexdale | 2.9 | 540.0 | 1805.0 | 605.0 | 660.0 | 0.0 | 690.0 | 750.0 | ... | 415.0 | 1275.0 | 305.0 | 235.0 | 180.0 | 105.0 | 145.0 | 40.0 | 5.0 | 9460.0 |
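
Criterion 2 targets customers aged 10 to 55. The following minimal sketch derives that population per neighborhood from the five-year age columns. It assumes that the columns hidden behind `...` in the preview follow the same naming pattern (`25 to 29 years` through `50 to 54 years`). Because the data comes in five-year bins, the sum covers ages 10 to 54. The helper names `age_columns` and `target_population` are hypothetical.

```python
import pandas as pd


def age_columns(start=10, end=55, step=5):
    """Names of the five-year age columns from `start` up to (excluding) `end`,
    assuming the '<lo> to <hi> years' pattern seen in the preview."""
    return [f"{lo} to {lo + step - 1} years" for lo in range(start, end, step)]


def target_population(df_demo: pd.DataFrame) -> pd.Series:
    """Number of residents aged 10 to 54 for each neighborhood (row)."""
    return df_demo[age_columns()].sum(axis=1)


print(age_columns())
# Usage with the demographics data frame loaded above:
# df_demo["Target Population"] = target_population(df_demo)
```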


### Location Data

```python
# CSV from https://open.toronto.ca/dataset/neighbourhoods/ (adjust the path to your local copy)
df_loc = pd.read_csv("Neighbourhoods.csv")
df_loc.head()
```

| | _id | AREA_ID | AREA_ATTR_ID | PARENT_AREA_ID | AREA_SHORT_CODE | AREA_LONG_CODE | AREA_NAME | AREA_DESC | X | Y | LONGITUDE | LATITUDE | OBJECTID | Shape__Area | Shape__Length | geometry |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 5461 | 25886861 | 25926662 | 49885 | 94 | 94 | Wychwood (94) | Wychwood (94) | | | -79.425515 | 43.676919 | 16491505 | 3217960.0 | 7515.779658 | {u'type': u'Polygon', u'coordinates': (((-79.4... |
| 1 | 5462 | 25886820 | 25926663 | 49885 | 100 | 100 | Yonge-Eglinton (100) | Yonge-Eglinton (100) | | | -79.40359 | 43.704689 | 16491521 | 3160334.0 | 7872.021074 | {u'type': u'Polygon', u'coordinates': (((-79.4... |
| 2 | 5463 | 25886834 | 25926664 | 49885 | 97 | 97 | Yonge-St.Clair (97) | Yonge-St.Clair (97) | | | -79.397871 | 43.687859 | 16491537 | 2222464.0 | 8130.411276 | {u'type': u'Polygon', u'coordinates': (((-79.3... |
| 3 | 5464 | 25886593 | 25926665 | 49885 | 27 | 27 | York University Heights (27) | York University Heights (27) | | | -79.488883 | 43.765736 | 16491553 | 25418210.0 | 25632.33524 | {u'type': u'Polygon', u'coordinates': (((-79.5... |
| 4 | 5465 | 25886688 | 25926666 | 49885 | 31 | 31 | Yorkdale-Glen Park (31) | Yorkdale-Glen Park (31) | | | -79.457108 | 43.714672 | 16491569 | 11566690.0 | 13953.4081 | {u'type': u'Polygon', u'coordinates': (((-79.4... |
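
The three tables appear to identify neighborhoods by the same number. It is `Neighbourhood Id` in the economics data, `NeighbourhoodID` in the demographics data, and `AREA_SHORT_CODE` in the location data (for example, `Wychwood (94)` has `AREA_SHORT_CODE` 94). The minimal sketch below merges the tables into one data frame on that number. The helper name `combine_sources` and the columns kept are illustrative choices. `NeighbourhoodID` is read as a float, which in pandas usually means the column contains missing values, so those rows are dropped before the IDs are converted to integers.

```python
import pandas as pd


def combine_sources(df_eco, df_demo, df_loc):
    """Merge economics, demographics and location data on the neighborhood number."""
    eco = df_eco[["Neighbourhood Id", "Neighbourhood", "Businesses"]].rename(
        columns={"Neighbourhood Id": "id"}
    )
    # Drop rows without an ID (NaN) so the float IDs can be cast to integers
    demo = df_demo.dropna(subset=["NeighbourhoodID"]).copy()
    demo["id"] = demo["NeighbourhoodID"].astype("int64")
    demo = demo[["id", "Total Population - All Age Groups - 100% data"]]
    loc = df_loc[["AREA_SHORT_CODE", "LATITUDE", "LONGITUDE"]].rename(
        columns={"AREA_SHORT_CODE": "id"}
    )
    # Inner joins keep only neighborhoods present in all three sources
    return eco.merge(demo, on="id", how="inner").merge(loc, on="id", how="inner")


# Usage with the three data frames loaded above:
# df_all = combine_sources(df_eco, df_demo, df_loc)
```

Further columns, such as an age-group subtotal, can be carried into the merged frame by adding them to the column selections.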


## 4. Results

This section presents the results of the analysis and the suggested neighborhoods.