## Table of contents
* [Introduction: Business Problem](#introduction)
* [Data](#data)
* [Methodology](#methodology)
* [Analysis](#analysis)
* [Results and Discussion](#results)
* [Conclusion](#conclusion)

# Introduction: Business Problem

Paris is one of the most dynamic cities in Europe. Before the pandemic, millions of people came to the city every day to work, study, meet each other, and enjoy its multicultural gastronomy. The food and beverage sector is therefore an attractive market in the long term, even though the pandemic is currently hurting it. After the pandemic, I plan to open a Vietnamese take-away restaurant in Paris that serves mainly students and young working people. The ideal location should therefore be near universities or coworking spaces. However, the competition from other take-away restaurants and other Vietnamese restaurants must also be taken into account. In this project, I use data science to find the ideal location for my future restaurant.

# Data

As mentioned in the previous section, I take the following factors into consideration when making the decision:

* Whether there are universities or coworking spaces near the candidate location.

* The number of take-away restaurants near the candidate location.

* The number of Vietnamese restaurants near the candidate location.

For this purpose, I use the following data:

* The address and location of universities and coworking spaces in Paris.

* The address and location of restaurants (including Vietnamese restaurants) near these universities and coworking spaces.

These data are collected from Foursquare.

In [1]:
import numpy as np # library to handle data in a vectorized manner
import pandas as pd # library for data analysis
import json # library to handle JSON files
!conda install -c conda-forge geopy --yes # install geopy; comment this line out if it is already installed
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values
import requests # library to handle requests
from pandas.io.json import json_normalize # transform JSON file into a pandas dataframe
from sklearn.cluster import KMeans
import folium # map rendering library

print('Libraries imported.')


In [2]:
pd.set_option('display.max_rows', None)

Get the coordinates of Paris:

In [3]:
address = 'Paris,france'
geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Paris are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Paris are 48.8566969, 2.3514616.


In [4]:
# define the world map centered around Paris with a high zoom level
world_map = folium.Map(location=[latitude, longitude], zoom_start=12)

# display world map
world_map

My potential clients are students and young people working in coworking spaces, so I look for all universities and coworking spaces in Paris.

In [5]:
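# Foursquare settings
# client_id and client_secret (the Foursquare API credentials) are not shown in this notebook and must be defined before the cells below are run
version = '20201111'
city = 'Paris, France'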

In [6]:
#categoryID for University and coworking space
UniversityID = '4bf58dd8d48988d1ae941735'
CoworkingID = '4bf58dd8d48988d174941735'
city = 'Paris%2C%20France'

In [7]:
def get_venues(client_id, client_secret, version, keyword, lat, lon, radius, limit=100, offset=0):
    # return the venues matching keyword around (lat, lon) as a DataFrame, or None if the request or the parsing fails
    url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&query={}&radius={}&limit={}&offset={}'.format(
            client_id, client_secret, version, lat, lon, keyword, radius, limit, offset)
    try:
        # the request is inside the try block so that a failed call also returns None
        results = requests.get(url).json()['response']['groups'][0]['items']
        venues = [(item['venue']['id'],
                 item['venue']['name'],
                 item['venue']['categories'][0]['name'],
                 item['venue']['location']['lat'],
                 item['venue']['location']['lng']) for item in results]
        venues_df = pd.DataFrame(venues, columns=['Id', 'Name', 'category', 'latitude', 'longitude'])
        return venues_df
    except (requests.RequestException, KeyError, IndexError, ValueError):
        return None

In [8]:
# find all universities in Paris
# Foursquare returns at most 100 results per request, so we send several requests to get all results
# get results 0-100
first_page = get_venues(client_id,client_secret,version,'university',latitude,longitude,radius=6000,limit =100,offset= 0)
# get results 100-200
second_page = get_venues(client_id,client_secret,version,'university',latitude,longitude,radius=6000,limit =100,offset= 100)
# pd.concat replaces DataFrame.append, which was removed in pandas 2.0; it skips a page that came back as None
University_df = pd.concat([first_page, second_page], ignore_index=True)
University_df.head(5)

Unnamed: 0,Id,Name,category,latitude,longitude
0,4b522597f964a5205d6b27e3,UPMC – Université Pierre et Marie Curie,University,48.846936,2.354886
1,4e8e9b0adab454671ece48ca,Cours de Civilisation Française de la Sorbonne,University,48.851714,2.347674
2,5384a19b498e22b2a690b1d1,NYU in Paris,University,48.850427,2.346816
3,4cebf24bbaa6a1cd7a10416c,Université Paris I – Panthéon-Sorbonne,University,48.847046,2.343903
4,4b5eb4acf964a520809629e3,Maison de la Recherche,University,48.85244,2.341689
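
The paging used in the cell above (one request per block of 100 results) can also be written as a loop that stops when a page comes back short. This is a minimal sketch, not the notebook's code: `fetch_all` is an illustrative name, `fetch_page` is a hypothetical callable that would wrap `get_venues`, and `fake_fetch` is a stand-in that serves made-up rows so the loop can be run without API credentials.

```python
def fetch_all(fetch_page, page_size=100, max_pages=10):
    """Collect rows page by page until a page comes back short or empty.

    fetch_page is a hypothetical callable (offset, limit) -> list of rows.
    """
    rows = []
    for page in range(max_pages):
        batch = fetch_page(page * page_size, page_size)
        if not batch:
            break
        rows.extend(batch)
        if len(batch) < page_size:
            # a short page means there is nothing left to fetch
            break
    return rows


# stand-in for the real API: made-up venue ids served in pages
fake_venues = ['venue_{}'.format(i) for i in range(113)]

def fake_fetch(offset, limit):
    return fake_venues[offset:offset + limit]

all_rows = fetch_all(fake_fetch)
print(len(all_rows))
```

With this structure the number of requests does not have to be known in advance; `max_pages` is only a safety limit.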


In [9]:
# Number of universities
print("Number of universities in Paris: {}".format(University_df.shape[0]))

Number of universities in Paris: 113


In [10]:
University_df.groupby(by = 'category').count()

Unnamed: 0_level_0,Id,Name,latitude,longitude
category,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
Building,1,1,1,1
College Auditorium,1,1,1,1
College Science Building,1,1,1,1
General College & University,3,3,3,3
Student Center,1,1,1,1
University,106,106,106,106


In [11]:
# find all coworking spaces in Paris. The results contain several different categories: coworking space, cafe, etc.
# Foursquare returns at most 100 results per request, so we send several requests to get all results
# get results 0-100
first_page = get_venues(client_id,client_secret,version,'coworking%20space',latitude,longitude,radius=6000,limit =100,offset= 0)
# get results 100-200
second_page = get_venues(client_id,client_secret,version,'coworking%20space',latitude,longitude,radius=6000,limit =100,offset= 100)
# pd.concat replaces DataFrame.append, which was removed in pandas 2.0; it skips a page that came back as None
Coworking_df = pd.concat([first_page, second_page], ignore_index=True)
Coworking_df.head(5)

Unnamed: 0,Id,Name,category,latitude,longitude
0,565071c3498e84bcd5ea4e34,Nuage Café,Coffee Shop,48.849245,2.347605
1,51725b7b498e39cff9adcdef,Anticafé Beaubourg,Coffee Shop,48.862301,2.351142
2,55e80d34498e4e52001fe3b9,Hubsy,Coffee Shop,48.865775,2.354276
3,589f2cef9343e07629b7639e,Hubsy | Café & Coworking,Coffee Shop,48.871241,2.360203
4,5318c03b498e5ea5cf57b72d,Anticafé Louvre,Café,48.864245,2.336242


In [12]:
# verify the categories
Coworking_df.groupby(by = 'category').count()

Unnamed: 0_level_0,Id,Name,latitude,longitude
category,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
Business Center,1,1,1,1
Café,1,1,1,1
Car Wash,1,1,1,1
Coffee Shop,4,4,4,4
Coworking Space,78,78,78,78
Office,1,1,1,1
Tech Startup,1,1,1,1


In [13]:
# we will take into account only the Coworking space category
Coworking_df = Coworking_df[Coworking_df['category']=='Coworking Space']
Coworking_df.reset_index(drop=True,inplace = True)
Coworking_df.shape

(78, 5)

In [14]:
# mark all universities and coworking spaces as interesting points on the Paris map
paris_map = folium.Map(location =[latitude, longitude], zoom_start=13)
for lat,lon in zip(University_df.latitude,University_df.longitude):
    folium.features.CircleMarker(
            [lat, lon],
            radius=2, # define how big you want the circle markers to be
            color='blue',
            fill=True,
            fill_color='blue',
            fill_opacity=1
        ).add_to(paris_map)
    
for lat,lon in zip(Coworking_df.latitude,Coworking_df.longitude):
    folium.features.CircleMarker(
            [lat, lon],
            radius=2, # define how big you want the circle markers to be
            color='blue',
            fill=True,
            fill_color='blue',
            fill_opacity=1
    ).add_to(paris_map)
paris_map

In [15]:
# interesting points: universities and coworking spaces together
# pd.concat replaces DataFrame.append, which was removed in pandas 2.0
interesting_point = pd.concat([University_df, Coworking_df], ignore_index=True)

In [16]:
interesting_point

Unnamed: 0,Id,Name,category,latitude,longitude
0,4b522597f964a5205d6b27e3,UPMC – Université Pierre et Marie Curie,University,48.846936,2.354886
1,4e8e9b0adab454671ece48ca,Cours de Civilisation Française de la Sorbonne,University,48.851714,2.347674
2,5384a19b498e22b2a690b1d1,NYU in Paris,University,48.850427,2.346816
3,4cebf24bbaa6a1cd7a10416c,Université Paris I – Panthéon-Sorbonne,University,48.847046,2.343903
4,4b5eb4acf964a520809629e3,Maison de la Recherche,University,48.85244,2.341689
5,4ece855377c8ea62f9eaf2e1,École des Beaux Arts,University,48.857814,2.363412
6,5278e70b11d29f6fce4615b8,Centre de Recherches Interdisciplinaires,University,48.853026,2.363029
7,4d5963507e2237043aafb073,Institut du Monde Anglophone - Paris 3 Sorbonn...,University,48.850627,2.341998
8,4f4cdb93754ad1acf73866b4,Ecole de langue française,University,48.86439,2.345173
9,4adcda09f964a520143421e3,Université Paris IV – Paris-Sorbonne,University,48.848774,2.343463


# Methodology

In this analysis, I aim to find areas of Paris near universities and coworking spaces, because that is where my potential clients (students and young working people) are. I also have to consider the competition from other restaurants in these areas, especially from other Vietnamese restaurants. The analysis follows this workflow:
* First, I determine the number of Vietnamese restaurants around each interesting point (university or coworking space). These numbers represent the competition from existing Vietnamese restaurants. Interesting points with a high level of competition are removed.
* Second, I cluster the remaining interesting points into candidate zones. Each zone has a radius of 800m. Inside these zones, the competition from Vietnamese restaurants is low, and the opportunity to reach the target clients is still high.
* Third, I consider the competition from other restaurants by counting all restaurants inside each candidate zone. Zones with a high number of restaurants are removed from the list of candidate zones.
* Finally, I determine the address of the center of each remaining zone. This address is the starting point for further analysis and searches.

I evaluate each interesting point based on two parameters:
* Number of Vietnamese restaurants within 800m. The higher the number, the stronger the competition.
* Number of Vietnamese restaurants within 200m. If there is already a Vietnamese restaurant very close to their office or university, potential clients may not want to go further to find another one.
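
Both parameters are counts of venues inside a circle around an interesting point. In this notebook the counts come from the Foursquare API; the sketch below only illustrates the same idea computed locally with the haversine formula, assuming the restaurant coordinates are already available as a list. The names `haversine_m` and `count_within` and the sample coordinates are illustrative and are not taken from the project data.

```python
import math

EARTH_RADIUS_M = 6371000  # mean Earth radius in meters

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

def count_within(point, venues, radius_m):
    """Number of venues whose distance to point is at most radius_m."""
    lat, lon = point
    return sum(1 for v_lat, v_lon in venues
               if haversine_m(lat, lon, v_lat, v_lon) <= radius_m)

# made-up coordinates: one point and four venues directly north of it
point = (48.8566, 2.3515)
venues = [
    (48.8566, 2.3515),  # same spot
    (48.8576, 2.3515),  # 0.001 degree north, about 111 m
    (48.8616, 2.3515),  # 0.005 degree north, about 556 m
    (48.8666, 2.3515),  # 0.010 degree north, about 1112 m
]
print(count_within(point, venues, 200))  # 2
print(count_within(point, venues, 800))  # 3
```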

In [17]:
# Count Vietnamese restaurants within 800m of each interesting point
near_800_vietnamese_restaurant = []
for interesting_point_lat, interesting_point_lon in zip (interesting_point.latitude,interesting_point.longitude):
    url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&query=Vietnamese%20restaurant&radius=800&limit=500&offset=0'.format(
            client_id, client_secret,version, interesting_point_lat, interesting_point_lon)
    near_800_vietnamese_restaurant.append(requests.get(url).json()['response']['totalResults'])
    print(".", end = '')
print("done!")
interesting_point['800m_Viet_restaurant'] = near_800_vietnamese_restaurant

...............................................................................................................................................................................................done!


In [18]:
# Count Vietnamese restaurants within 200m of each interesting point
near_200_vietnamese_restaurant = []
for interesting_point_lat, interesting_point_lon in zip (interesting_point.latitude,interesting_point.longitude):
    url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&query=Vietnamese%20restaurant&radius=200&limit=500&offset=0'.format(
            client_id, client_secret,version, interesting_point_lat, interesting_point_lon)
    near_200_vietnamese_restaurant.append(requests.get(url).json()['response']['totalResults'])
    print(".", end = '')
print("done!")
interesting_point['200m_Viet_restaurant'] = near_200_vietnamese_restaurant


...............................................................................................................................................................................................done!
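
The two counting cells differ only in the `radius` value, so the URL construction could be shared. The following is a minimal sketch of such a helper, built with `urllib.parse.urlencode` so that the query string is percent-encoded automatically. `build_explore_url` is an illustrative name, and `'ID'` and `'SECRET'` are placeholders for the real credentials.

```python
from urllib.parse import urlencode, quote

EXPLORE_ENDPOINT = 'https://api.foursquare.com/v2/venues/explore'

def build_explore_url(client_id, client_secret, version, lat, lon, query, radius,
                      limit=500, offset=0):
    """Build the explore URL used in the counting cells for any query and radius."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,
        'll': '{},{}'.format(lat, lon),
        'query': query,
        'radius': radius,
        'limit': limit,
        'offset': offset,
    }
    # quote_via=quote encodes spaces as %20, as in the hand-written URLs;
    # safe=',' keeps the comma of the "lat,lon" pair unescaped
    return EXPLORE_ENDPOINT + '?' + urlencode(params, quote_via=quote, safe=',')

url = build_explore_url('ID', 'SECRET', '20201111', 48.8566, 2.3515,
                        'Vietnamese restaurant', 800)
print(url)
```

Calling the helper once with `radius=800` and once with `radius=200` would replace the two formatted strings. `requests.get` also accepts a `params` dictionary, which is another way to avoid building the query string by hand.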


In [19]:

interesting_point.describe()

Unnamed: 0,latitude,longitude,800m_Viet_restaurant,200m_Viet_restaurant
count,191.0,191.0,191.0,191.0
mean,48.859959,2.345096,8.26178,0.858639
std,0.020152,0.031364,7.109264,1.212182
min,48.810302,2.273655,0.0,0.0
25%,48.845742,2.327573,3.0,0.0
50%,48.861786,2.345173,6.0,0.0
75%,48.873755,2.366067,11.0,1.0
max,48.908361,2.42592,39.0,6.0


Take a look at the table above. The number of Vietnamese restaurants within 800m of an interesting point can reach 39!
Let's find where this interesting point is. I think it is in the 13th district of Paris, the Asian district.

In [20]:
# Display all crazy points (more than 18 Vietnamese restaurants within 800m)
Crazy_point = interesting_point[interesting_point['800m_Viet_restaurant']>18]
paris_map = folium.Map(location =[latitude, longitude], zoom_start=13)
for lat,lon in zip(Crazy_point.latitude,Crazy_point.longitude):
    folium.features.CircleMarker(
            [lat, lon],
            radius=2, # define how big you want the circle markers to be
            color='blue',
            fill=True,
            fill_color='blue',
            fill_opacity=1
        ).add_to(paris_map)
paris_map

As I predicted, some crazy points are in the 13th district, where many Vietnamese people live.
Crazy points are also located in the center of the city, in the 1st district. This is understandable, because the restaurant density in the city center is very high.
However, I am surprised by the crazy point located in the east of Paris. I don't know much about this area.

For my own business, I'm interested in the points located in low-competition zones: points with fewer than 3 Vietnamese restaurants within 800m (3 is the first quartile) and no Vietnamese restaurant within 200m.
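
The selection rule can be written so that the 800m threshold is read from the data instead of being typed by hand. This is a sketch on a small made-up table (the values are illustrative, not the project data); `points` and `low_competition` are hypothetical names.

```python
import pandas as pd

# made-up counts for nine points
points = pd.DataFrame({
    'Name': list('ABCDEFGHI'),
    '800m_Viet_restaurant': [0, 1, 3, 5, 6, 8, 11, 20, 39],
    '200m_Viet_restaurant': [0, 1, 0, 0, 0, 1, 2, 3, 6],
})

# first quartile of the 800m counts (the "25%" row of describe())
q1 = points['800m_Viet_restaurant'].quantile(0.25)

# one boolean mask for both conditions; .copy() makes the result an independent
# DataFrame, so adding columns to it later does not raise SettingWithCopyWarning
mask = (points['800m_Viet_restaurant'] < q1) & (points['200m_Viet_restaurant'] == 0)
low_competition = points[mask].copy()

print(q1)                                  # 3.0
print(low_competition['Name'].tolist())    # ['A']
```

Point B is dropped by the 200m rule and point C by the strict `<` comparison with the quartile.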

In [21]:
#refine the interesting point list
interesting_point = interesting_point[interesting_point['800m_Viet_restaurant']<3]
interesting_point = interesting_point[interesting_point['200m_Viet_restaurant']<1]
interesting_point

Unnamed: 0,Id,Name,category,latitude,longitude,800m_Viet_restaurant,200m_Viet_restaurant
30,4b2bf7b8f964a520acbe24e3,Sciences Po,University,48.852589,2.32706,2,0
35,4bac6ecaf964a520dff43ae3,Collège des ingénieurs,University,48.856912,2.324643,2,0
51,4d4bc0f3b496b60cf442c800,École des Hautes Études en Sciences Sociales,University,48.836026,2.372154,1,0
65,4b7d5725f964a5202eb92fe3,Sup de Pub Quai de Seine,University,48.885946,2.372424,1,0
79,5073cd6de4b092f5e7ec261b,UFCV,University,48.895914,2.383331,2,0
80,4c98a842d799a1cdf6ddb352,Intuit Lab,University,48.844073,2.284931,2,0
81,4b0a3d60f964a520a92223e3,HETIC,University,48.851338,2.420593,1,0
82,4c4713d519fde21e472a0776,In'Tech INFO,University,48.814095,2.377397,2,0
86,4f911eb6754adf72d82c4986,Sup de Web,University,48.857097,2.28112,1,0
87,558c2432498eb89dbee64e75,Institut Le Cordon Bleu,University,48.848162,2.280854,1,0


In [22]:
# Show the remaining interesting points on the map
paris_map = folium.Map(location =[latitude, longitude], zoom_start=12)
for lat,lon in zip(interesting_point.latitude,interesting_point.longitude):
    folium.features.CircleMarker(
            [lat, lon],
            radius=2, # define how big you want the circle markers to be
            color='blue',
            fill=True,
            fill_color='blue',
            fill_opacity=1
        ).add_to(paris_map)
paris_map

In [23]:
number_of_clusters = 16
interesting_zones = KMeans(n_clusters=number_of_clusters, random_state=0).fit(interesting_point[['latitude','longitude']].values)
interesting_zone_centers = interesting_zones.cluster_centers_
candidate_location = pd.DataFrame(interesting_zone_centers, columns = ['lat','lon'])
candidate_location

Unnamed: 0,lat,lon
0,48.908361,2.343871
1,48.817218,2.39461
2,48.840436,2.281655
3,48.854772,2.422224
4,48.882923,2.3721
5,48.855077,2.325462
6,48.886328,2.288336
7,48.836036,2.37217
8,48.903905,2.313687
9,48.813677,2.302715


In [24]:
paris_map = folium.Map(location =[latitude, longitude], zoom_start=12)
for lat,lon, category in zip(interesting_point.latitude,interesting_point.longitude,interesting_point.category):
    folium.features.CircleMarker( [lat, lon],radius=2, color='blue',fill=True,fill_color='blue',fill_opacity=1).add_to(paris_map)
    folium.Marker([lat, lon],popup = category).add_to(paris_map)
    
for centers in interesting_zone_centers:
    #folium.features.CircleMarker( centers, radius=2, color='red', fill=False).add_to(paris_map)
    folium.Circle(centers, radius=500, color='red', fill=True,fill_color='white', fill_opacity=0.5).add_to(paris_map)
paris_map

In [25]:
# the number of restaurants within 800m of each candidate zone center
near_800_restaurant = []
for lat,lon in zip(candidate_location.lat,candidate_location.lon):
    url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&query={}&radius=800&limit=500&offset=0'.format(
            client_id, client_secret,version, lat, lon,'restaurant')
    near_800_restaurant.append(requests.get(url).json()['response']['totalResults'])
    print(".", end = '')
print("done!")
candidate_location['800m_restaurant'] = near_800_restaurant
candidate_location

................done!


Unnamed: 0,lat,lon,800m_restaurant
0,48.908361,2.343871,22
1,48.817218,2.39461,15
2,48.840436,2.281655,109
3,48.854772,2.422224,43
4,48.882923,2.3721,95
5,48.855077,2.325462,127
6,48.886328,2.288336,96
7,48.836036,2.37217,63
8,48.903905,2.313687,41
9,48.813677,2.302715,30


In [26]:
candidate_location.describe()

Unnamed: 0,lat,lon,800m_restaurant
count,16.0,16.0,16.0
mean,48.861738,2.341327,65.125
std,0.03225,0.046577,43.093503
min,48.813677,2.273655,15.0
25%,48.839336,2.29912,28.0
50%,48.855632,2.352786,53.0
75%,48.888724,2.373477,99.25
max,48.908361,2.422224,136.0


In [27]:
candidate_location = candidate_location[candidate_location['800m_restaurant']<58]
candidate_location

Unnamed: 0,lat,lon,800m_restaurant
0,48.908361,2.343871,22
1,48.817218,2.39461,15
3,48.854772,2.422224,43
8,48.903905,2.313687,41
9,48.813677,2.302715,30
10,48.900141,2.361701,20
11,48.870107,2.273655,42
15,48.814095,2.377397,15


In [28]:
# Show the candidate location
paris_map = folium.Map(location =[latitude, longitude], zoom_start=12)
    
for centers in zip(candidate_location.lat,candidate_location.lon):
    #folium.features.CircleMarker( centers, radius=2, color='red', fill=False).add_to(paris_map)
    folium.Circle(centers, radius=800, color='red', fill=True,fill_color='white', fill_opacity=0.5).add_to(paris_map)
    folium.Marker(centers).add_to(paris_map)
paris_map

In [29]:
def get_address(api_key, latitude, longitude, verbose=False):
    # reverse-geocode a coordinate with the Google Geocoding API; return None on failure
    try:
        url = 'https://maps.googleapis.com/maps/api/geocode/json?key={}&latlng={},{}'.format(api_key, latitude, longitude)
        response = requests.get(url).json()
        if verbose:
            print('Google Maps API JSON result =>', response)
        results = response['results']
        address = results[0]['formatted_address']
        return address
    except (requests.RequestException, KeyError, IndexError, ValueError):
        return None
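
The Geocoding API reports failures in a `status` field of the JSON body, so an empty `results` list is not the only case worth handling. This minimal sketch separates the parsing from the HTTP call so that it can be checked without an API key. `extract_address` is an illustrative name, and the two sample responses are hand-written, not real API output.

```python
def extract_address(response):
    """Return the first formatted address, or None if the response has no usable result."""
    if response.get('status') != 'OK':
        return None
    results = response.get('results') or []
    if not results:
        return None
    return results[0].get('formatted_address')

# hand-written sample responses (only the fields used here)
ok_response = {
    'status': 'OK',
    'results': [{'formatted_address': '54 Boulevard Ney, 75018 Paris, France'}],
}
empty_response = {'status': 'ZERO_RESULTS', 'results': []}

print(extract_address(ok_response))     # 54 Boulevard Ney, 75018 Paris, France
print(extract_address(empty_response))  # None
```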

Finally, I reverse-geocode the center of each candidate location to get its address:

In [30]:

# google_api_key (the Google Maps API key) must be defined before this cell is run
candidate_location_addresses = []
for lat, lon in zip(candidate_location.lat, candidate_location.lon):
    addr = get_address(google_api_key, lat, lon)
    candidate_location_addresses.append(addr)
# reset the index first: this returns a new DataFrame, so adding a column to it avoids the SettingWithCopyWarning
candidate_location = candidate_location.reset_index(drop=True)
candidate_location['center address'] = candidate_location_addresses
candidate_location

Unnamed: 0,lat,lon,800m_restaurant,center address
0,48.908361,2.343871,22,"56-74 Avenue Michelet, 93400 Saint-Ouen, France"
1,48.817218,2.39461,15,"44 Rue Jean Jacques Rousseau, 94200 Ivry-sur-S..."
2,48.854772,2.422224,43,"4 Rue Paul Eluard, 93100 Montreuil, France"
3,48.903905,2.313687,41,"18 Impasse Dumur, 92110 Clichy, France"
4,48.813677,2.302715,30,"187 Avenue Pierre Brossolette, 92120 Montrouge..."
5,48.900141,2.361701,20,"54 Boulevard Ney, 75018 Paris, France"
6,48.870107,2.273655,42,"16 Boulevard Lannes, 75116 Paris, France"
7,48.814095,2.377397,15,"74 bis Avenue Maurice Thorez, 94200 Ivry-sur-S..."


These 8 addresses represent the centers of the potential zones for my future restaurant. Each potential zone is a circle with a radius of 800m.

# Results and Discussion

Based on the results above, I identify eight potential zones for my future restaurant. These zones are quite far from the city center. This is expected, because the competition in the city center is very high, with many restaurants, including Vietnamese restaurants. Although there may be fewer activities and inhabitants in these zones than in the city center, the competition there is also lower (some interesting points in the 1st and 13th districts have more than 30 Vietnamese restaurants around them). Moreover, there are coworking spaces or universities within each identified zone, so I hope there will be enough potential clients for my new restaurant.

# Conclusion

In this project, I look for a potential location for my new Vietnamese restaurant in Paris. I focus on areas near universities or coworking spaces, where there are many young people, my target clients. For this purpose, I first identify all universities and coworking spaces in Paris, which I call interesting points. Then, I count the Vietnamese restaurants near these points (within 800m and within 200m). The more restaurants there are, the stronger the competition, so I remove the interesting points with high competition from my list. Next, the remaining interesting points are clustered into 16 zones with a radius of 800m. I call them potential zones, where I could find a good location. Besides the Vietnamese restaurants, I also take into account the competition from other restaurants by counting the restaurants in each potential zone. Zones with 58 or more restaurants are removed. In the end, eight potential zones remain, with low competition but close enough to the interesting points. The identified zones are located at the four corners of Paris: north, south, east, and west. The west zone (16th district) and the east zone (Montreuil) are affluent areas with many offices, so I prefer to do more analysis in these zones first.