# Final Capstone Project (Where to Buy when you love Greek Restaurants!)

## Table of Contents
* [Introduction: Problem Statement](#introduction)
* [Data](#data)
* [Methodology](#methodology)
* [Analysis](#analysis)
* [Results and Discussion](#results)
* [Conclusion](#conclusion)



## Introduction: Problem Statement<a name="introduction"></a>

In this project I try to find the best place in Canada to buy a house for someone who wants to live as close as possible to as many Greek restaurants as possible.  Specifically, this report is aimed at home buyers interested in buying a house as close as possible to the best Greek restaurants in one of the 6 largest Canadian cities.

I'll use data science to generate a recommended location based on this criterion, since it is not something buyers can search on in common real estate sites like mls.ca.  

To keep with the Greek theme, the map visualizations use the colours of the Greek flag (blue & white).

## Data <a name="data"></a>

Based on the problem statement, factors that will influence the recommendation are:
* number of and distance to Greek restaurants in the neighbourhood
* ratings of Greek restaurants

The following data source is needed to extract the required information:
* the number and location of Greek restaurants in each city will be obtained using the **Foursquare API**
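
As a rough sketch of what such a request looks like, the query string for the `venues/explore` endpoint can be assembled with the standard library. The endpoint, parameter names, and category ID are the ones used in the cells further down; the credential strings are placeholders, and `build_explore_url` is a hypothetical helper name that does not appear in the notebook.

```python
from urllib.parse import urlencode

# Greek Restaurant category ID, as used in the notebook cells
GREEK_CATEGORY_ID = "4bf58dd8d48988d10e941735"


def build_explore_url(city, client_id, client_secret, version="20191129", limit=25):
    """Build a venues/explore URL for one city (hypothetical helper)."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "near": city,
        "limit": limit,
        "categoryId": GREEK_CATEGORY_ID,
    }
    # urlencode escapes the comma and space in names like "Toronto, CA"
    return "https://api.foursquare.com/v2/venues/explore?" + urlencode(params)


# Placeholder credentials, for illustration only
url = build_explore_url("Toronto, CA", "YOUR_CLIENT_ID", "YOUR_CLIENT_SECRET")
print(url)
```

Building the query from a dictionary keeps each parameter visible and leaves the escaping to the library.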

In [305]:
# IMPORT ALL LIBRARIES NEEDED

import numpy as np 
import pandas as pd 
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)
import requests 
from pandas import json_normalize  # pandas.io.json.json_normalize is deprecated since pandas 1.0
from sklearn.cluster import KMeans 
import folium 
print('ALL LIBRARIES NEEDED ARE NOW IMPORTED.')

ALL LIBRARIES NEEDED ARE NOW IMPORTED.


In [306]:
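# My Foursquare credentials, read from the environment rather than hard-coded.
# The variable names FOURSQUARE_CLIENT_ID / FOURSQUARE_CLIENT_SECRET are assumptions.
import os

CLIENT_ID = os.environ['FOURSQUARE_CLIENT_ID']
CLIENT_SECRET = os.environ['FOURSQUARE_CLIENT_SECRET']
VERSION = '20191129' 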

#### Limit the data retrieved to the Greek Restaurant category in the 6 largest Canadian cities, returning only the top 25 restaurants in each city.

In [307]:
LIMIT = 25
cities = ['Toronto, CA', 'Montreal, CA', 'Vancouver, CA', 'Calgary, CA', 'Edmonton, CA', 'Ottawa, CA']
results = {}
for city in cities:
    # Let requests build and URL-encode the query string
    params = {
        'client_id': CLIENT_ID,
        'client_secret': CLIENT_SECRET,
        'v': VERSION,
        'near': city,
        'limit': LIMIT,
        # The ID below is the Greek Restaurant Foursquare ID
        'categoryId': '4bf58dd8d48988d10e941735',
    }
    results[city] = requests.get('https://api.foursquare.com/v2/venues/explore', params=params).json()

In [310]:
df_venues={}
for city in cities:
    venues = json_normalize(results[city]['response']['groups'][0]['items'])
    df_venues[city] = venues[['venue.name', 'venue.location.address', 'venue.location.lat', 'venue.location.lng']]
    df_venues[city].columns = ['Name', 'Address', 'Lat', 'Lng']

In [309]:
maps = {}
for city in cities:
    city_lat = np.mean([results[city]['response']['geocode']['geometry']['bounds']['ne']['lat'],
                        results[city]['response']['geocode']['geometry']['bounds']['sw']['lat']])
    city_lng = np.mean([results[city]['response']['geocode']['geometry']['bounds']['ne']['lng'],
                        results[city]['response']['geocode']['geometry']['bounds']['sw']['lng']])
    maps[city] = folium.Map(location=[city_lat, city_lng], zoom_start=11)

    # add markers to map
    for lat, lng, label in zip(df_venues[city]['Lat'], df_venues[city]['Lng'], df_venues[city]['Name']):
        label = folium.Popup(label, parse_html=True)
        folium.CircleMarker(
            [lat, lng],
            radius=4,
            popup=label,
            color='blue',
            fill=True,
            fill_color='#6daadc',
            fill_opacity=0.5,
            parse_html=False).add_to(maps[city])  
    print(f"Total number of Greek restaurants in {city} is: ", results[city]['response']['totalResults'])

Total number of Greek restaurants in Toronto, CA is:  158
Total number of Greek restaurants in Montreal, CA is:  76
Total number of Greek restaurants in Vancouver, CA is:  46
Total number of Greek restaurants in Calgary, CA is:  40
Total number of Greek restaurants in Edmonton, CA is:  26
Total number of Greek restaurants in Ottawa, CA is:  27


#### Show where the top 25 Greek restaurants are located in each city.

In [311]:
# Toronto map

maps[cities[0]]

In [156]:
# Montreal map

maps[cities[1]]

In [157]:
# Vancouver map

maps[cities[2]]

In [158]:
# Calgary map

maps[cities[3]]

In [159]:
# Edmonton map

maps[cities[4]]

In [160]:
# Ottawa map

maps[cities[5]]

#### Toronto & Montreal have the most Greek restaurants, but...

The next step is to understand how close together the restaurants are.  The outcome is not obvious: Toronto has the most restaurants, but it is also a large, sprawling metropolitan area.

For each city, I take the mean location of all of its Greek restaurants, then calculate the average distance of the restaurants from that mean location.
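
The notebook's own calculation further down measures this distance in raw degrees of latitude and longitude. As an alternative sketch (not the method used for the results in this report), the same idea can be expressed in kilometres with the haversine formula, which accounts for a degree of longitude covering less ground at higher latitudes. The function names `haversine_km` and `mean_distance_km` are hypothetical, and the three sample coordinates are taken from the Vancouver results table later in this notebook.

```python
import math


def haversine_km(lat1, lng1, lat2, lng2):
    """Great-circle distance between two points, in kilometres."""
    r = 6371.0  # mean Earth radius in km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lng2 - lng1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def mean_distance_km(coords):
    """Mean distance (km) of each (lat, lng) pair from the mean coordinate."""
    mean_lat = sum(lat for lat, _ in coords) / len(coords)
    mean_lng = sum(lng for _, lng in coords) / len(coords)
    total = sum(haversine_km(lat, lng, mean_lat, mean_lng) for lat, lng in coords)
    return total / len(coords)


# Three downtown Vancouver venues, for illustration only
sample = [(49.280768, -123.131911), (49.287678, -123.129346), (49.276148, -123.119663)]
print(round(mean_distance_km(sample), 2), "km")
```

Averaging latitudes and longitudes directly is a reasonable approximation of the centre at city scale; it would not be appropriate for points spread across continents.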

In addition, I will cluster the restaurants to provide another option for the home buyer.

This concludes the data gathering phase. We're now ready to use this data to work out where we should be buying a house.

## Methodology <a name="methodology"></a>

In this project I focus on the areas of each city with a high density of top Greek restaurants.

In the first step, I collected the required data: the location and category of Greek restaurants within each of the 6 largest Canadian cities.

In the second step, I explore restaurant density across the 6 cities using maps, to identify a few promising areas with a high number of top Greek restaurants.

In the third step, I use machine learning to cluster the restaurants, which provides another option for the best place to buy a home.

In the final step, I focus on the most promising areas to buy a house based on the problem statement.

## Analysis <a name="analysis"></a>

#### Calculate the average distance of the Greek restaurants from the mean location of all of the Greek restaurants in each city.

In [216]:
maps = {}
for city in cities:
    city_lat = np.mean([results[city]['response']['geocode']['geometry']['bounds']['ne']['lat'],
                        results[city]['response']['geocode']['geometry']['bounds']['sw']['lat']])
    city_lng = np.mean([results[city]['response']['geocode']['geometry']['bounds']['ne']['lng'],
                        results[city]['response']['geocode']['geometry']['bounds']['sw']['lng']])
    maps[city] = folium.Map(location=[city_lat, city_lng], zoom_start=11)
    venues_mean_coor = [df_venues[city]['Lat'].mean(), df_venues[city]['Lng'].mean()] 
    # add markers to map
    for lat, lng, label in zip(df_venues[city]['Lat'], df_venues[city]['Lng'], df_venues[city]['Name']):
        label = folium.Popup(label, parse_html=True)
        folium.CircleMarker(
            [lat, lng],
            radius=5,
            popup=label,
            color='blue',
            fill=True,
            fill_color='#3186cc',
            fill_opacity=0.7,
            parse_html=False).add_to(maps[city])
        folium.PolyLine([venues_mean_coor, [lat, lng]], color="blue", weight=1.5, opacity=0.5).add_to(maps[city])
    
    label = folium.Popup("Mean Co-ordinate", parse_html=True)
    folium.CircleMarker(
        venues_mean_coor,
        radius=10,
        popup=label,
        color='white',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.9,
        parse_html=False).add_to(maps[city])

    # Note: these are Euclidean distances in degrees of latitude/longitude, not kilometres
    coords = df_venues[city][['Lat', 'Lng']].values
    print(f"Avg dist. of Greek restaurants from the mean coord in {city} is: ")
    print(np.mean(np.linalg.norm(coords - venues_mean_coor, axis=1)))

Avg dist. of Greek restaurants from the mean coord in Toronto, CA is: 
0.060587397101374056
Avg dist. of Greek restaurants from the mean coord in Montreal, CA is: 
0.03280839646554718
Avg dist. of Greek restaurants from the mean coord in Vancouver, CA is: 
0.028532714544404345
Avg dist. of Greek restaurants from the mean coord in Calgary, CA is: 
0.06525543307974888
Avg dist. of Greek restaurants from the mean coord in Edmonton, CA is: 
0.06487755998158204
Avg dist. of Greek restaurants from the mean coord in Ottawa, CA is: 
0.06342894005873345


In [162]:
# Toronto map

maps[cities[0]]

In [163]:
# Montreal map

maps[cities[1]]

In [236]:
# Ottawa map

maps[cities[5]]

In [165]:
# Calgary map

maps[cities[3]]

In [166]:
# Edmonton map

maps[cities[4]]

In [237]:
# Vancouver map

maps[cities[2]]

#### The visualizations and the calculations shown before the maps indicate that Vancouver would be the best place to buy a house based on the criteria: its restaurants have the smallest average distance from their mean location.
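
The ranking behind this conclusion can be reproduced from the printed values. The short sketch below simply copies the six averages from the output above (still in degrees) and sorts them; the variable names are my own.

```python
# Mean distances (in degrees) copied from the notebook output above
avg_dist = {
    "Toronto, CA": 0.060587397101374056,
    "Montreal, CA": 0.03280839646554718,
    "Vancouver, CA": 0.028532714544404345,
    "Calgary, CA": 0.06525543307974888,
    "Edmonton, CA": 0.06487755998158204,
    "Ottawa, CA": 0.06342894005873345,
}

# Smallest average distance first, i.e. most tightly grouped restaurants first
ranked = sorted(avg_dist.items(), key=lambda kv: kv[1])
for city, dist in ranked:
    print(f"{city}: {dist:.4f}")
```

Vancouver comes first and Montreal second, with the other four cities grouped closely together at roughly twice Vancouver's value.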

In [239]:
# Re-run the query for Vancouver only
LIMIT = 25
cities = ['Vancouver, CA']
results = {}
for city in cities:
    # Let requests build and URL-encode the query string
    params = {
        'client_id': CLIENT_ID,
        'client_secret': CLIENT_SECRET,
        'v': VERSION,
        'near': city,
        'limit': LIMIT,
        # The ID below is the Greek Restaurant Foursquare ID
        'categoryId': '4bf58dd8d48988d10e941735',
    }
    results[city] = requests.get('https://api.foursquare.com/v2/venues/explore', params=params).json()

In [240]:
df_venues={}
for city in cities:
    venues = json_normalize(results[city]['response']['groups'][0]['items'])
    df_venues[city] = venues[['venue.name', 'venue.location.address', 'venue.location.lat', 'venue.location.lng']]
    df_venues[city].columns = ['Name', 'Address', 'Lat', 'Lng']

In [241]:
df_venues[city]

| | Name | Address | Lat | Lng |
|---|---|---|---|---|
| 0 | Stepho's Souvlaki Greek Taverna | 1124 Davie St. | 49.280768 | -123.131911 |
| 1 | Parthenon Supermarket | 3080 W Broadway | 49.264184 | -123.173762 |
| 2 | Takis Taverna | 1106 Davie St | 49.28059 | -123.131499 |
| 3 | Maria's Taverna | W. 4th Ave | 49.268085 | -123.158082 |
| 4 | Stepho's Souvlaki Greek Taverna | 1359 Robson St | 49.287678 | -123.129346 |
| 5 | Mediterranean Grill | 1152 Denman St | 49.287847 | -123.141065 |
| 6 | The Greek By Anatoli | 1043 Mainland | 49.276148 | -123.119663 |
| 7 | Opa! Souvlaki of Greece | 701 West Georgia Street | 49.282944 | -123.117978 |
| 8 | Double DD Pizza | 3510 West 4th | 49.268633 | -123.182134 |
| 9 | Simpatico Greek Restaurant | 2222 W 4th Ave | 49.268114 | -123.155861 |


In [242]:
df1 = df_venues[city].drop('Name', axis=1)
df1

| | Address | Lat | Lng |
|---|---|---|---|
| 0 | 1124 Davie St. | 49.280768 | -123.131911 |
| 1 | 3080 W Broadway | 49.264184 | -123.173762 |
| 2 | 1106 Davie St | 49.28059 | -123.131499 |
| 3 | W. 4th Ave | 49.268085 | -123.158082 |
| 4 | 1359 Robson St | 49.287678 | -123.129346 |
| 5 | 1152 Denman St | 49.287847 | -123.141065 |
| 6 | 1043 Mainland | 49.276148 | -123.119663 |
| 7 | 701 West Georgia Street | 49.282944 | -123.117978 |
| 8 | 3510 West 4th | 49.268633 | -123.182134 |
| 9 | 2222 W 4th Ave | 49.268114 | -123.155861 |


In [282]:
df2 = df1.drop('Address', axis=1)
df2.head()

| | Lat | Lng |
|---|---|---|
| 0 | 49.280768 | -123.131911 |
| 1 | 49.264184 | -123.173762 |
| 2 | 49.28059 | -123.131499 |
| 3 | 49.268085 | -123.158082 |
| 4 | 49.287678 | -123.129346 |


In [283]:
num_clusters = 3

k_means = KMeans(init="k-means++", n_clusters=num_clusters, n_init=12)
k_means.fit(df2)
labels = k_means.labels_

print(labels)

[1 2 1 2 1 1 1 1 2 2 1 1 0 0 2 1 1 1 0 2 1 2 1 1 1]


In [284]:
df2["Labels"] = labels
df2

| | Lat | Lng | Labels |
|---|---|---|---|
| 0 | 49.280768 | -123.131911 | 1 |
| 1 | 49.264184 | -123.173762 | 2 |
| 2 | 49.28059 | -123.131499 | 1 |
| 3 | 49.268085 | -123.158082 | 2 |
| 4 | 49.287678 | -123.129346 | 1 |
| 5 | 49.287847 | -123.141065 | 1 |
| 6 | 49.276148 | -123.119663 | 1 |
| 7 | 49.282944 | -123.117978 | 1 |
| 8 | 49.268633 | -123.182134 | 2 |
| 9 | 49.268114 | -123.155861 | 2 |


Check the centroid values by averaging the features in each cluster.

In [285]:
df2 = df2.groupby('Labels').mean()
df2

| Labels | Lat | Lng |
|---|---|---|
| 0 | 49.266357 | -123.058294 |
| 1 | 49.280592 | -123.123993 |
| 2 | 49.266076 | -123.165327 |


In [303]:
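maps = {}
for city in cities:
    # Centre the map on the mean venue location for this city
    # (city_lat / city_lng are left over from the earlier six-city loop)
    center = [df_venues[city]['Lat'].mean(), df_venues[city]['Lng'].mean()]
    maps[city] = folium.Map(location=center, zoom_start=11)

    # add one marker per cluster centroid (df2 now holds the centroids, indexed by label)
    for cluster, lat, lng in zip(df2.index, df2['Lat'], df2['Lng']):
        label = folium.Popup(f"Cluster {cluster} centroid", parse_html=True)
        folium.CircleMarker(
            [lat, lng],
            radius=10,
            popup=label,
            color='blue',
            fill=True,
            fill_color='#FFFFFF',
            fill_opacity=0.5).add_to(maps[city])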

In [304]:
# Vancouver map

maps[city]

## Results and Discussion <a name="results"></a>

The analysis shows that although Toronto & Montreal have the greatest number of Greek restaurants, what matters for the buyer is where the top Greek restaurants are most densely packed.  Based on the top 25 Greek restaurants in each city, the densest city is Vancouver, followed by Montreal.

If the number of top Greek restaurants used in the analysis is changed (e.g. 30, 15, or 10), Vancouver remains the city with the highest density.
Close to the intersection of Pacific St. and Seymour St. would be the ideal location to buy a place based on our problem statement.

However, using machine learning to group the Greek restaurants in Vancouver into 3 clusters yields a different location to buy a home.  
Based on clustering, it would be ideal to live in the middle of the largest cluster (Lat: 49.280592, Long: -123.123993), near the intersection of Hornby and Nelson.  
This cluster contains 15 of the top 25 Greek restaurants.
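
The cluster sizes can be checked directly from the label array printed in the Analysis section. The sketch below copies those 25 labels by hand and tallies them with the standard library; it is a verification aid, not part of the original notebook.

```python
from collections import Counter

# Cluster labels copied from the k-means output in the Analysis section
labels = [1, 2, 1, 2, 1, 1, 1, 1, 2, 2, 1, 1, 0, 0, 2, 1, 1, 1, 0, 2, 1, 2, 1, 1, 1]

counts = Counter(labels)
largest_label, largest_size = counts.most_common(1)[0]

print(counts)
print(largest_label, largest_size)  # 1 15
```

Cluster 1 holds 15 restaurants, cluster 2 holds 7, and cluster 0 holds 3. Because k-means was run without a fixed random seed, the label numbers (though not necessarily the groupings) may differ on a re-run.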

## Conclusion <a name="conclusion"></a>

The purpose of this project was to identify the ideal location to buy a house in one of the 6 largest Canadian cities based on proximity to the best Greek restaurants.  By calculating how densely the restaurants are distributed from Foursquare data, I was able to determine where the best Greek restaurants are located in each city.  Changing the number of top Greek restaurants considered did not change the result: Vancouver is the best city.  
Following further analysis, two potential "best" locations were identified as options for the buyer.  
Option 1, at the intersection of Pacific & Seymour, is the best location for being close to all 25 restaurants.  
Option 2, based on clustering, is the best location for being closest to 15 of the 25 best restaurants, and is situated at the intersection of Hornby and Nelson.  

##### Opa!