# Capstone Project

## Introduction: Business Problem

In this project we try to find an optimal location for a coffee shop. Specifically, this report is aimed at stakeholders interested in opening a coffee shop in Beirut, Lebanon.

Since the main target customers of the coffee shop are university students, we look for locations that are not already crowded with coffee shops but are as close as possible to as many universities in the area as possible.

We will use data science methods to identify the few most promising neighborhoods based on these criteria. The advantages of each area will then be stated clearly so that stakeholders can choose the best final location.

## Data

Based on the definition of our problem, the factors that will influence our decision are:
* the number of existing coffee shops in the neighborhood
* the distance to the universities in the neighborhood

We use a regularly spaced grid of locations, centered on the aggregate center of the existing universities, to define our neighborhoods.

The following data sources are needed to extract or generate the required information:
* the number of universities and coffee shops in every neighborhood, obtained using the **Foursquare API**
* the distance between each neighborhood center and the aggregate center of the universities, calculated from the grid coordinates
* the coordinates of Beirut, obtained using **Nominatim from geopy.geocoders**

In [1]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas import json_normalize # transform JSON data into a pandas dataframe (pandas >= 1.0)

import folium # map rendering library

print('Libraries imported.')

Libraries imported.


In [2]:
CLIENT_ID = 'SSZ12AZH55PYUSVDYFDDRWFA1FZOP2THRCSNEQMUG41MFMXK' # your Foursquare ID
CLIENT_SECRET = 'VG2VHW2PYWJREDV2TAMYFOW2KGNTSAN4LN2SSE1W3LCXJ50W' # your Foursquare Secret
VERSION = '20180604'
LIMIT = 30
print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: SSZ12AZH55PYUSVDYFDDRWFA1FZOP2THRCSNEQMUG41MFMXK
CLIENT_SECRET:VG2VHW2PYWJREDV2TAMYFOW2KGNTSAN4LN2SSE1W3LCXJ50W


__Specify the main destination__

In [3]:
address = 'Beirut, Lebanon'

geolocator = Nominatim(user_agent="foursquare_agent")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print(latitude, longitude)

33.8959203 35.47843


__Specify the main query in the selected region__

In [4]:
search_query_U = 'University'
search_query_C = 'Cafe'
radius = 1000
R = radius

__Define the corresponding Foursquare URL__

In [5]:
url_U = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&ll={},{}&v={}&query={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, latitude, longitude, VERSION, search_query_U, radius, LIMIT)
url_C = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&ll={},{}&v={}&query={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, latitude, longitude, VERSION, search_query_C, radius, LIMIT)
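The two URLs differ only in the query term, so the same string could be built by one helper. The following is a minimal sketch using only the standard library; `build_search_url`, `BASE_URL` and the placeholder credentials are hypothetical names introduced here, while the endpoint and the parameter names are the ones used in the cell above.

```python
from urllib.parse import urlencode

BASE_URL = 'https://api.foursquare.com/v2/venues/search'

def build_search_url(client_id, client_secret, lat, lng, version, query, radius, limit):
    """Build a venue search URL with the same parameters as url_U and url_C."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'll': '{},{}'.format(lat, lng),
        'v': version,
        'query': query,
        'radius': radius,
        'limit': limit,
    }
    # urlencode escapes special characters (the comma in ll becomes %2C)
    return BASE_URL + '?' + urlencode(params)

# placeholder credentials, for illustration only
example_url = build_search_url('MY_ID', 'MY_SECRET', 33.8959203, 35.47843,
                               '20180604', 'University', 1000, 30)
print(example_url)
```

Calling the helper once with `'University'` and once with `'Cafe'` would cover both searches while keeping the parameter list in a single place.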

__Send the GET Request and examine the results__

In [6]:
results_U = requests.get(url_U).json()
results_C = requests.get(url_C).json()
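Before indexing into `['response']['venues']`, it may be worth confirming that the request succeeded. The sketch below assumes the response carries a `meta` block with a numeric `code`, as Foursquare v2 responses do; `extract_venues` is a hypothetical helper, and the two payloads are hand-made stand-ins rather than real API output.

```python
def extract_venues(payload):
    """Return the venue list from a search response, or raise if the request failed."""
    meta = payload.get('meta', {})
    if meta.get('code') != 200:
        raise ValueError('Foursquare request failed: {}'.format(meta))
    return payload.get('response', {}).get('venues', [])

# hand-made payloads that mimic the response layout; not real API output
ok_payload = {'meta': {'code': 200}, 'response': {'venues': [{'name': 'Example Cafe'}]}}
bad_payload = {'meta': {'code': 400}, 'response': {}}

print(extract_venues(ok_payload))
try:
    extract_venues(bad_payload)
except ValueError as err:
    print(err)
```

With a check like this, an expired key or an exhausted quota surfaces as a clear error instead of a `KeyError` further down the notebook.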

__Transform the JSON data into a pandas dataframe__

In [7]:
venues_U = results_U['response']['venues']
df_U = json_normalize(venues_U)

venues_C = results_C['response']['venues']
df_C = json_normalize(venues_C)


__Dataframe filtering__

In [8]:
# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:  # fall back to the nested column name
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

__Universities Dataframe__

In [9]:
# keep only columns that include venue name, and anything that is associated with location
filtered_columns_U = ['name', 'categories'] + [col for col in df_U.columns if col.startswith('location.')] + ['id']
df_fil_U = df_U.loc[:, filtered_columns_U].copy()  # copy to avoid SettingWithCopyWarning

# replace the category list of each row with the name of its first category
df_fil_U['categories'] = df_fil_U.apply(get_category_type, axis=1)

# clean column names by keeping only last term
df_fil_U.columns = [column.split('.')[-1] for column in df_fil_U.columns]

# keep only the rows whose category contains 'University' (rows without a category are dropped)
univ = df_fil_U[df_fil_U['categories'].str.contains('University', na=False)]
univ.head()

Unnamed: 0,name,categories,address,lat,lng,labeledLatLngs,distance,cc,city,state,country,formattedAddress,crossStreet,postalCode,id
0,American University of Beirut (AUB),University,Bliss St.,33.900876,35.48142,"[{'label': 'display', 'lat': 33.90087600121215...",616,LB,بيروت,محافظة بيروت,لبنان,"[Bliss St., بيروت, لبنان]",,,4b69411bf964a520009d2be3
5,Lebanese American University (LAU),University,Qoreitem,33.892897,35.477557,"[{'label': 'display', 'lat': 33.89289677097259...",346,LB,بيروت,محافظة بيروت,لبنان,"[Qoreitem, بيروت, لبنان]",,,4b66b81bf964a52093282be3
11,University for Seniors - UFS,University,American University of Beirut,33.899539,35.484577,"[{'label': 'display', 'lat': 33.899539, 'lng':...",696,LB,بيروت,محافظة بيروت,لبنان,"[American University of Beirut (REP Office, Ol...","REP Office, Old Medical Building",,56bb5901498e28bd25c64b82
12,Queen university residence,University,,33.89357,35.477879,"[{'label': 'display', 'lat': 33.89357, 'lng': ...",266,LB,,,لبنان,[لبنان],,,5410767d498eb9a90fb08378
13,Jinan University,University,Georges Assi,33.89369,35.488678,"[{'label': 'display', 'lat': 33.89369, 'lng': ...",978,LB,بيروت,محافظة بيروت,لبنان,"[Georges Assi (Sanayeh), بيروت 5261, لبنان]",Sanayeh,5261.0,52b94765498e56b430bbd693


__Coffee shops Dataframe__

In [10]:
# keep only columns that include venue name, and anything that is associated with location
filtered_columns_C = ['name', 'categories'] + [col for col in df_C.columns if col.startswith('location.')] + ['id']
df_fil_C = df_C.loc[:, filtered_columns_C].copy()  # copy to avoid SettingWithCopyWarning

# replace the category list of each row with the name of its first category
df_fil_C['categories'] = df_fil_C.apply(get_category_type, axis=1)

# clean column names by keeping only last term
df_fil_C.columns = [column.split('.')[-1] for column in df_fil_C.columns]

CS = df_fil_C
CS.head()

Unnamed: 0,name,categories,address,crossStreet,lat,lng,labeledLatLngs,distance,cc,city,state,country,formattedAddress,id
0,Cafe Hamra,Café,Hamra Main Street,Main St,33.896092,35.479447,"[{'label': 'display', 'lat': 33.896092, 'lng':...",95,LB,بيروت,محافظة بيروت,لبنان,"[Hamra Main Street (Main St), بيروت, لبنان]",4cb82b7143ec6dcb4cf29231
1,Cafe Younes,Café,"Nehme Yafet, Commodore",,33.895523,35.479906,"[{'label': 'display', 'lat': 33.89552254091945...",143,LB,بيروت,محافظة بيروت,لبنان,"[Nehme Yafet, Commodore, بيروت, لبنان]",4b67080af964a520f2352be3
2,alia cafe - hamra street (عليا),Café,,,33.895254,35.486288,"[{'label': 'display', 'lat': 33.89525368454582...",729,LB,,,لبنان,[لبنان],525aa805498e2bc055d9bbe4
3,Palace Cafe,Café,Manara,,33.900454,35.470093,"[{'label': 'display', 'lat': 33.90045387341104...",920,LB,,محافظة بيروت,لبنان,"[Manara, لبنان]",4dc5306552b1e8f9f7ccc6f0
4,The Deck Café,Seafood Restaurant,,,33.893382,35.46786,"[{'label': 'display', 'lat': 33.893382, 'lng':...",1016,LB,بيروت,محافظة بيروت,لبنان,"[بيروت, لبنان]",56432dc2498e9b29849d85d0


__Calculation of the aggregate center of the existing universities__

In [11]:
# Calculation of the aggregate center of the existing universities
a_center_lat = univ['lat'].mean()
a_center_lng = univ['lng'].mean()
print('The coordinates of the aggregate center of the universities in the neighborhood are:\n', 'LAT:', a_center_lat ,'LNG:', a_center_lng)

The coordinates of the aggregate center of the universities in the neighborhood are:
 LAT: 33.89611435443695 LNG: 35.48202214912109
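The aggregate center is simply the arithmetic mean of the university coordinates, which is a reasonable approximation over an area this small. As a quick cross-check, the sketch below repeats the calculation in plain Python, assuming the five rows shown by `univ.head()` are the full contents of `univ`; the coordinates are copied from that table (rounded to six decimals) and the variable names are illustrative.

```python
# (latitude, longitude) pairs copied from the universities table, rounded to 6 decimals
universities = [
    (33.900876, 35.481420),  # American University of Beirut (AUB)
    (33.892897, 35.477557),  # Lebanese American University (LAU)
    (33.899539, 35.484577),  # University for Seniors - UFS
    (33.893570, 35.477879),  # Queen university residence
    (33.893690, 35.488678),  # Jinan University
]

# arithmetic mean of each coordinate
center_lat = sum(lat for lat, _ in universities) / len(universities)
center_lng = sum(lng for _, lng in universities) / len(universities)
print(round(center_lat, 6), round(center_lng, 6))
```

The result agrees with the printed aggregate center to within the rounding of the table values.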


__Defining the functions to calculate the coordinates and the distance between two points__

In [12]:
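import pyproj

# NOTE: pyproj.transform is deprecated since pyproj 2.6.1 in favor of pyproj.Transformer.
# NOTE: UTM zone 33 is centered on 15°E, while Beirut (about 35.5°E) lies in zone 36.
# Projecting with zone=33 stretches distances here by a few percent; the outputs in
# this notebook were produced with zone=33.
def lonlat_to_xy(lon, lat):
    proj_latlon = pyproj.Proj(proj='latlong',datum='WGS84')
    proj_xy = pyproj.Proj(proj="utm", zone=33, datum='WGS84')
    xy = pyproj.transform(proj_latlon, proj_xy, lon, lat)
    return xy[0], xy[1]

def xy_to_lonlat(x, y):
    proj_latlon = pyproj.Proj(proj='latlong',datum='WGS84')
    proj_xy = pyproj.Proj(proj="utm", zone=33, datum='WGS84')
    lonlat = pyproj.transform(proj_xy, proj_latlon, x, y)
    return lonlat[0], lonlat[1]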

def calc_xy_distance(x1, y1, x2, y2):
    dx = x2 - x1
    dy = y2 - y1
    return np.sqrt(dx*dx + dy*dy)

In [13]:
print('Coordinate transformation check')
print('-------------------------------')
print('Beirut center longitude={}, latitude={}'.format(longitude, latitude))
x, y = lonlat_to_xy(longitude, latitude)
print('Beirut center UTM X={}, Y={}'.format(x, y))
lo, la = xy_to_lonlat(x, y)
print('Beirut center longitude={}, latitude={}'.format(lo, la))

Coordinate transformation check
-------------------------------
Beirut center longitude=35.47843, latitude=33.8959203
Beirut center UTM X=2408411.6000010185, Y=3945820.4263711767
Beirut center longitude=35.47843, latitude=33.89592029999999
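The round trip above only shows that the projection is invertible, not that projected distances are accurate. A projection-free cross-check is sketched below with the haversine formula on a spherical Earth (mean radius 6371 km, an assumption), together with the standard rule for picking a UTM zone from a longitude. `utm_zone` and `haversine_m` are hypothetical helpers introduced for this check; the AUB coordinates come from the universities table, whose `distance` column lists 616 for that venue.

```python
import math

def utm_zone(lon):
    """Standard 6-degree UTM zone number for a longitude in degrees
    (ignores the special zones around Norway and Svalbard)."""
    return int((lon + 180) // 6) + 1

def haversine_m(lat1, lon1, lat2, lon2, radius=6371000.0):
    """Great-circle distance in meters on a sphere of the given radius."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius * math.asin(math.sqrt(a))

beirut = (33.8959203, 35.47843)  # geocoded center (latitude, longitude)
aub = (33.900876, 35.48142)      # AUB row of the universities table

print('UTM zone for Beirut:', utm_zone(beirut[1]))
print('Haversine distance to AUB (m):', round(haversine_m(*beirut, *aub)))
```

Comparing the haversine distance with the distance obtained from `calc_xy_distance` on projected coordinates gives a direct measure of how much the chosen UTM zone distorts lengths around Beirut.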


__Neighborhood distribution and calculation of the main parameter for each neighborhood__

In [14]:
x_center, y_center = lonlat_to_xy(a_center_lng, a_center_lat)

k = np.sqrt(3) / 2 # Vertical offset for hexagonal grid cells
x_min = x_center - R
x_step = 600
y_min = y_center - R - (int(21/k)*k*600 - R*2)/2
y_step = 600* k 

latitudes = []
longitudes = []
distances_from_center = []
xs = []
ys = []
for i in range(0, int(21/k)):
    y = y_min + i * y_step
    x_offset = 300 if i%2==0 else 0
    for j in range(0, 21):
        x = x_min + j * x_step + x_offset
        distance_from_center = calc_xy_distance(x_center, y_center, x, y)
        if (distance_from_center <= (R+1)):
            lon, lat = xy_to_lonlat(x, y)
            latitudes.append(lat)
            longitudes.append(lon)
            distances_from_center.append(distance_from_center)
            xs.append(x)
            ys.append(y)

print(len(latitudes), 'candidate neighborhood centers generated.')

9 candidate neighborhood centers generated.
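The grid logic can be isolated from the projection by working in offsets relative to the center. The sketch below mirrors the constants used above (600 m spacing, 1000 m radius, 21 columns) in plain Python; `hex_grid_offsets` is a hypothetical helper, not part of the notebook.

```python
import math

def hex_grid_offsets(radius=1000.0, step=600.0, n=21):
    """Return (dx, dy, distance) for hexagonal grid points within `radius` of the origin."""
    k = math.sqrt(3) / 2  # row spacing factor for a hexagonal layout
    rows = int(n / k)
    y_step = step * k
    y_min = -radius - (rows * y_step - 2 * radius) / 2
    x_min = -radius
    points = []
    for i in range(rows):
        dy = y_min + i * y_step
        # shift every other row by half a step to get the hexagonal pattern
        x_offset = step / 2 if i % 2 == 0 else 0.0
        for j in range(n):
            dx = x_min + j * step + x_offset
            dist = math.hypot(dx, dy)
            if dist <= radius + 1:
                points.append((dx, dy, dist))
    return points

offsets = hex_grid_offsets()
print(len(offsets), 'offsets:', sorted(round(d) for _, _, d in offsets))
```

Because `x_min` starts at `-radius` rather than at a multiple of the step, the grid is not centered horizontally on the aggregate center; the closest grid point sits 100 m away from it, which matches the distance table for the neighborhoods.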


__Map Visualization__

In [15]:
venues_map = folium.Map(location=[latitude, longitude], zoom_start=13) # generate a map centred around Beirut

# add a red circle marker to represent the aggregate center of the universities
folium.features.CircleMarker(
    [a_center_lat, a_center_lng],
    radius=10,
    color='red',
    popup='Aggregate Center',
    fill = True,
    fill_color = 'red',
    fill_opacity = 0.6
).add_to(venues_map)

# add the coffee shops as blue circle markers
for lat, lng, label in zip(CS.lat, CS.lng, CS.categories):
    folium.features.CircleMarker(
        [lat, lng],
        radius=5,
        color='blue',
        popup=label,
        fill = True,
        fill_color='blue',
        fill_opacity=0.6
    ).add_to(venues_map)
    
# add the universities as green circle markers
for lat, lng, label in zip(univ.lat, univ.lng, univ.categories):
    folium.features.CircleMarker(
        [lat, lng],
        radius=5,
        color='green',
        popup=label,
        fill = True,
        fill_color='green',
        fill_opacity=0.6
    ).add_to(venues_map)
    
# draw each candidate neighborhood as a black circle of radius 300 m
for lat, lon in zip(latitudes, longitudes):
    folium.Circle([lat, lon], radius=300, color='black', fill=False).add_to(venues_map)

# display map
venues_map

In [16]:
df_neighborhood = pd.DataFrame({'Latitude': latitudes,
                             'Longitude': longitudes,
                             'X': xs,
                             'Y': ys,
                             'Distance from center': distances_from_center})
df_neighborhood

Unnamed: 0,Latitude,Longitude,X,Y,Distance from center
0,33.892429,35.476874,2408347.0,3945394.0,655.743852
1,33.891373,35.482951,2408947.0,3945394.0,556.776436
2,33.890317,35.489028,2409547.0,3945394.0,953.939201
3,33.897346,35.474931,2408047.0,3945913.0,700.0
4,33.89629,35.481009,2408647.0,3945913.0,100.0
5,33.895234,35.487087,2409247.0,3945913.0,500.0
6,33.901208,35.479067,2408347.0,3946433.0,655.743852
7,33.900151,35.485145,2408947.0,3946433.0,556.776436
8,33.899095,35.491223,2409547.0,3946433.0,953.939201


__Calculation of the number of existing competitors in each neighborhood__

In [17]:
# Calculating the number of coffee shops in each area
r = 300
number_of_cs = []
for j in df_neighborhood.index:
    counts = 0
    for i in CS.index:
        x, y = lonlat_to_xy(CS['lng'][i], CS['lat'][i])
        if calc_xy_distance(df_neighborhood['X'][j],df_neighborhood['Y'][j], x, y) < r:
            counts = counts +1
    number_of_cs.append(counts)    
number_of_cs

[2, 0, 0, 4, 12, 4, 0, 2, 0]
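The counting step is a plain point-in-circle test repeated for every neighborhood. A minimal sketch of the same idea on already projected coordinates is shown below; `count_within` and the sample coordinates are made up for illustration and are not taken from the Foursquare data.

```python
import math

def count_within(centers, points, r):
    """For each (x, y) center, count the points strictly closer than r."""
    counts = []
    for cx, cy in centers:
        n = sum(1 for px, py in points if math.hypot(px - cx, py - cy) < r)
        counts.append(n)
    return counts

# made-up projected coordinates in meters, for illustration only
centers = [(0.0, 0.0), (600.0, 0.0)]
shops = [(10.0, 20.0), (250.0, 100.0), (580.0, -30.0), (2000.0, 2000.0)]
print(count_within(centers, shops, r=300))
```

Since the neighborhood centers are 600 m apart and the radius is 300 m, adjacent circles touch without overlapping, so a coffee shop lying in the gap between circles is not counted for any neighborhood.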

In [18]:
CS_nb = pd.DataFrame({'Number of coffee shops':number_of_cs})

In [19]:
result = pd.DataFrame({'Latitude': latitudes,
                             'Longitude': longitudes,
                             'X': xs,
                             'Y': ys,
                             'Distance from center': distances_from_center,
                             'Number of coffee shops':number_of_cs})
result

Unnamed: 0,Latitude,Longitude,X,Y,Distance from center,Number of coffee shops
0,33.892429,35.476874,2408347.0,3945394.0,655.743852,2
1,33.891373,35.482951,2408947.0,3945394.0,556.776436,0
2,33.890317,35.489028,2409547.0,3945394.0,953.939201,0
3,33.897346,35.474931,2408047.0,3945913.0,700.0,4
4,33.89629,35.481009,2408647.0,3945913.0,100.0,12
5,33.895234,35.487087,2409247.0,3945913.0,500.0,4
6,33.901208,35.479067,2408347.0,3946433.0,655.743852,0
7,33.900151,35.485145,2408947.0,3946433.0,556.776436,2
8,33.899095,35.491223,2409547.0,3946433.0,953.939201,0


### Evaluation

In [20]:
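Competition = 0.5  # weight of the competition term
Proximity = 1 - Competition  # weight of the proximity term
# score = Competition * (1 - count / max count) + Proximity * (min distance / distance)
result['Evaluation Scoring'] = (1-result['Number of coffee shops']/result['Number of coffee shops'].max())*Competition + (result['Distance from center'].min()/result['Distance from center'])*Proximity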
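The score is a weighted sum of two terms that both lie between 0 and 1: a competition term, which is 1 for a neighborhood with no coffee shops and 0 for the most crowded one, and a proximity term, which is 1 for the neighborhood closest to the aggregate center and shrinks with distance. The sketch below restates the formula in plain Python using the counts and distances from the result table; `evaluation_scores` is a hypothetical helper, and it assumes at least one neighborhood contains a coffee shop, since the maximum count is used as a divisor.

```python
def evaluation_scores(counts, distances, competition=0.5):
    """Weighted score: fewer coffee shops and a shorter distance both raise the score."""
    proximity = 1 - competition
    max_count = max(counts)
    min_dist = min(distances)
    return [
        (1 - c / max_count) * competition + (min_dist / d) * proximity
        for c, d in zip(counts, distances)
    ]

# values copied from the result table
counts = [2, 0, 0, 4, 12, 4, 0, 2, 0]
distances = [655.743852, 556.776436, 953.939201, 700.0, 100.0,
             500.0, 655.743852, 556.776436, 953.939201]

scores = evaluation_scores(counts, distances)
best = max(range(len(scores)), key=scores.__getitem__)
print(best, round(scores[best], 6))
```

With equal weights the empty neighborhoods dominate the ranking; raising `competition` pushes the result further toward low-competition areas, while lowering it favors locations near the aggregate center.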

In [21]:
result.sort_values(by=['Evaluation Scoring'], ascending = False, ignore_index = True).head()

Unnamed: 0,Latitude,Longitude,X,Y,Distance from center,Number of coffee shops,Evaluation Scoring
0,33.891373,35.482951,2408947.0,3945394.0,556.776436,0,0.589803
1,33.901208,35.479067,2408347.0,3946433.0,655.743852,0,0.576249
2,33.899095,35.491223,2409547.0,3946433.0,953.939201,0,0.552414
3,33.890317,35.489028,2409547.0,3945394.0,953.939201,0,0.552414
4,33.900151,35.485145,2408947.0,3946433.0,556.776436,2,0.506469


In [22]:
recom = result.loc[(result['Evaluation Scoring'] == result['Evaluation Scoring'].max()), ['Latitude','Longitude']]

In [23]:
lat_recom = recom['Latitude'].iloc[0]  # first best-scoring row, independent of its index label
lng_recom = recom['Longitude'].iloc[0]

In [29]:
venues_map = folium.Map(location=[latitude, longitude], zoom_start=15) # generate a map centred around Beirut

# add a red circle marker to represent the aggregate center of the universities
folium.features.CircleMarker(
    [a_center_lat, a_center_lng],
    radius=10,
    color='red',
    popup='Aggregate Center',
    fill = True,
    fill_color = 'red',
    fill_opacity = 0.6
).add_to(venues_map)

# add the coffee shops as blue circle markers
for lat, lng, label in zip(CS.lat, CS.lng, CS.categories):
    folium.features.CircleMarker(
        [lat, lng],
        radius=5,
        color='blue',
        popup=label,
        fill = True,
        fill_color='blue',
        fill_opacity=0.6
    ).add_to(venues_map)
    
# add the universities as green circle markers
for lat, lng, label in zip(univ.lat, univ.lng, univ.categories):
    folium.features.CircleMarker(
        [lat, lng],
        radius=5,
        color='green',
        popup=label,
        fill = True,
        fill_color='green',
        fill_opacity=0.6
    ).add_to(venues_map)
    
# draw each candidate neighborhood as a black circle of radius 300 m
for lat, lon in zip(latitudes, longitudes):
    folium.Circle([lat, lon], radius=300, color='black', fill=False).add_to(venues_map)

# Recommended area
folium.Circle(
    [lat_recom, lng_recom],
    radius=300,
    color='red',
    popup='recommended area',
    fill = True,
    fill_color = 'red',
    fill_opacity = 0.1
).add_to(venues_map)
        
# display map
venues_map