<h1>Paris restaurant analysis</h1>

<h2>Business Problem Description</h2>

<p>France is definitely the country of food and drink.</p>
<p>Having lived in Paris for a large part of my life, I came to realise that each neighborhood has its own character when it comes to food. Every neighborhood offers a diverse range of cuisines from all around the world, yet each seems to favor one particular type of cuisine.</p>
<p>This raises some interesting questions for anyone who wants to open a food business:</p>
<ul>
    <li>What are the most and least common food specialities in the city of Paris?</li>
    <li>What is each neighborhood's food "speciality", and why? Can we find clusters and patterns?</li>
    <li>When opening a restaurant, which area should we choose depending on the type of food? Should we open in an area that already has many restaurants of that speciality or, on the contrary, in an area where that kind of food is scarce?</li>
</ul>
<p>We are going to look for answers to these questions in this analysis.</p>
<p>Answers to these questions can be truly helpful for new investors who want to open their first restaurant in Paris, make their investment successful and optimize the profitability of their future business.</p>


<h2>Our data</h2>

<p>The city of Paris is divided into 20 districts (arrondissements), each with its own postal code from 75001 to 75020. We are going to use these districts to define our neighborhoods.</p>

<ul>
    <li>The coordinates of each district will be obtained using the <b>pgeocode</b> library, which is easy to use and returns the coordinates of a location given a postal code as input</li>
    <li>The information about restaurants will be obtained with the Foursquare API and loaded into a pandas dataframe</li>
    <li>For each postal code we will get the list of restaurants within a 1 km radius and analyze this data to identify the food particularities of each district. We will also define clusters and try to find patterns in this data</li>
</ul>

<h2>Methodology</h2>

<p>First, let's import the necessary libraries in our notebook</p>

In [102]:
import pandas as pd
import numpy as np
import json # library to handle JSON files
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values
import requests # library to handle requests
from pandas import json_normalize # transform JSON into a pandas dataframe
import folium # map rendering library
import pgeocode #library to get coordinates with a postal code

<p>Let's create a dataframe in which each row contains data about one of our 20 districts</p>

In [10]:
district_list = []
for i in range (1,21):
    district_list.append("District "+str(i))
district_df = pd.DataFrame(district_list)
district_df.columns = ['District']
district_df

Unnamed: 0,District
0,District 1
1,District 2
2,District 3
3,District 4
4,District 5
5,District 6
6,District 7
7,District 8
8,District 9
9,District 10


<p>Let's add the postal codes of each district to our dataframe</p>

In [13]:
postal_code = []
code = 75001
for i in range (1,21):
    postal_code.append(code)
    code +=1

district_df['Postal Code'] = postal_code
district_df

Unnamed: 0,District,Postal Code
0,District 1,75001
1,District 2,75002
2,District 3,75003
3,District 4,75004
4,District 5,75005
5,District 6,75006
6,District 7,75007
7,District 8,75008
8,District 9,75009
9,District 10,75010
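
<p>The two loops above follow the same pattern, so the labels and postal codes could also be generated together. The following is a minimal sketch using only the standard library; <code>build_districts</code> is a hypothetical helper name, not part of the original notebook.</p>

```python
# Hypothetical helper, not part of the original notebook.
def build_districts(first_code=75001, count=20):
    """Return one record per district with its label and postal code."""
    return [
        {"District": f"District {i}", "Postal Code": first_code + i - 1}
        for i in range(1, count + 1)
    ]

districts = build_districts()
print(districts[0])
print(districts[-1])
```

<p>A list of records like this can be passed directly to <code>pd.DataFrame(districts)</code> to obtain the same two columns.</p>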


<p>Now let's get the coordinates of each District in Paris using pgeocode</p>

In [11]:
#For example first district is 75001
nomi = pgeocode.Nominatim('fr')
nomi.query_postal_code("75001")

postal_code                 75001
country_code                   FR
place_name        Paris 01, Paris
state_name          Île-de-France
state_code                   11.0
county_name                 Paris
county_code                    75
community_name              Paris
community_code                751
latitude                  48.8592
longitude                 2.34525
accuracy                        5
Name: 0, dtype: object

In [17]:
latitude = []
longitude = []
#Looping through the postal codes
for i in district_df['Postal Code']:
    data = nomi.query_postal_code(i)
    latitude.append(data.latitude)
    longitude.append(data.longitude)
print(latitude)
print(longitude)

[48.8592, 48.8655, 48.8637, 48.8601, 48.8448, 48.8534, 48.8565, 48.8763, 48.8718, 48.8709, 48.8574, 48.8412, 48.8322, 48.8331, 48.8412, 48.8637, 48.8835, 48.8925, 48.8817, 48.8646]
[2.34525, 2.3457, 2.35515, 2.3497500000000002, 2.34795, 2.3394000000000004, 2.3349, 2.33355, 2.3443500000000004, 2.35245, 2.3641500000000004, 2.3682, 2.35245, 2.3376, 2.3245500000000003, 2.31285, 2.33535, 2.3466, 2.3655, 2.3736]


<p>We have our coordinates, let's add them to our dataframe</p>

In [23]:
district_df['Latitude'] = latitude
district_df['Longitude'] = longitude
district_df

Unnamed: 0,District,Postal Code,Latitude,Longitude
0,District 1,75001,48.8592,2.34525
1,District 2,75002,48.8655,2.3457
2,District 3,75003,48.8637,2.35515
3,District 4,75004,48.8601,2.34975
4,District 5,75005,48.8448,2.34795
5,District 6,75006,48.8534,2.3394
6,District 7,75007,48.8565,2.3349
7,District 8,75008,48.8763,2.33355
8,District 9,75009,48.8718,2.34435
9,District 10,75010,48.8709,2.35245


<p>So we have our dataframe with the necessary coordinates for each district. Let's plot the data using Folium to have an overview of the positioning of our districts</p>

<p>First, let's get the global coordinates of Paris to position our map:</p>

In [56]:
address = 'Paris'

geolocator = Nominatim(user_agent="paris")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Paris are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Paris are 48.8566969, 2.3514616.


<p>Now, let's plot our map:</p>

In [52]:
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

for lat, lng, district in zip(district_df['Latitude'], district_df['Longitude'], district_df['District']):
    label = '{}'.format(district)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_clusters)
map_clusters

<p>As we can see on the map, we have two problems here:</p>
    <ul>
        <li>We don't have enough data points to cover the entire Paris area</li>
        <li>Those points are not evenly distributed across the map</li>
    </ul>
<p>Therefore, we will need to consider other options to correctly map our Paris neighborhoods</p>
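
<p>One way to put numbers on these two observations is to measure how far each point is from its nearest neighbor. The sketch below is an illustration only: it uses the haversine formula with an approximate mean Earth radius, takes the first five coordinates printed by the pgeocode loop above, and the helper names <code>haversine_km</code> and <code>nearest_neighbor_km</code> are hypothetical.</p>

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, an approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


# First five district coordinates, copied from the pgeocode output above
points = {
    "District 1": (48.8592, 2.34525),
    "District 2": (48.8655, 2.3457),
    "District 3": (48.8637, 2.35515),
    "District 4": (48.8601, 2.3497500000000002),
    "District 5": (48.8448, 2.34795),
}


def nearest_neighbor_km(name):
    """Distance from one point to the closest of the other points."""
    lat, lon = points[name]
    return min(
        haversine_km(lat, lon, *coords)
        for other, coords in points.items()
        if other != name
    )


for name in points:
    print(name, round(nearest_neighbor_km(name), 2))
```

<p>Large differences between these nearest-neighbor distances indicate that some points are packed together while others sit far apart, which is the uneven distribution visible on the map.</p>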

<p>Luckily, the city of Paris has a website, "Paris data" (<a href = 'https://opendata.paris.fr'>link</a>), that provides a public dataset of all the administrative neighborhoods of Paris with all the information we need! Let's import the file and create a dataframe from this data:</p>

In [45]:
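# lineterminator='\r' splits rows on carriage returns; the stray '\n' this leaves
# at the start of N_SQ_QU is harmless because that column is dropped below
district_detail = pd.read_csv(r'quartier_paris.csv',sep=';', lineterminator='\r')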
district_detail

Unnamed: 0,N_SQ_QU,C_QU,C_QUINSEE,L_QU,C_AR,N_SQ_AR,PERIMETRE,SURFACE,Geometry X Y,Geometry
0,\n750000015,15.0,7510403.0,Arsenal,4.0,750000004.0,2878.559656,4.872649e+05,"48.851585175,2.36476795387","{""type"": ""Polygon"", ""coordinates"": [[[2.368512..."
1,\n750000018,18.0,7510502.0,Jardin-des-Plantes,5.0,750000005.0,4052.729521,7.983894e+05,"48.8419401934,2.35689388962","{""type"": ""Polygon"", ""coordinates"": [[[2.364561..."
2,\n750000039,39.0,7511003.0,Porte-Saint-Martin,10.0,750000010.0,3245.891413,6.090347e+05,"48.8712446509,2.36150364735","{""type"": ""Polygon"", ""coordinates"": [[[2.363917..."
3,\n750000043,43.0,7511103.0,Roquette,11.0,750000011.0,4973.010557,1.172087e+06,"48.8570640408,2.38036406173","{""type"": ""Polygon"", ""coordinates"": [[[2.379720..."
4,\n750000046,46.0,7511202.0,Picpus,12.0,750000012.0,18261.910318,7.205014e+06,"48.8303592424,2.42882681508","{""type"": ""Polygon"", ""coordinates"": [[[2.411249..."
...,...,...,...,...,...,...,...,...,...,...
76,\n750000032,32.0,7510804.0,Europe,8.0,750000008.0,4803.242769,1.182467e+06,"48.8781476759,2.3171746113","{""type"": ""Polygon"", ""coordinates"": [[[2.312293..."
77,\n750000044,44.0,7511104.0,Sainte-Marguerite,11.0,750000011.0,4591.310799,9.296092e+05,"48.852096507,2.3887648336","{""type"": ""Polygon"", ""coordinates"": [[[2.396236..."
78,\n750000054,54.0,7511402.0,Parc-de-Montsouris,14.0,750000014.0,5224.265369,1.357950e+06,"48.8234527716,2.33707017986","{""type"": ""Polygon"", ""coordinates"": [[[2.343996..."
79,\n750000057,57.0,7511501.0,Saint-Lambert,15.0,750000015.0,6928.792072,2.829202e+06,"48.8342936284,2.29691997445","{""type"": ""Polygon"", ""coordinates"": [[[2.304248..."


<p>The dataset contains a lot of information, including perimeter, surface, boundaries, coordinates and INSEE codes. Let's keep the neighborhood names and numbers, district number, perimeter, surface and coordinates, and rename the columns</p>

In [46]:
district_detail = district_detail.drop(columns=['N_SQ_QU', 'C_QUINSEE', 'N_SQ_AR'])
district_detail = district_detail.rename(columns={"C_QU": "Neighborhood Number", "L_QU": "Neighborhood Name", "C_AR": "District", "PERIMETRE": "Perimeter", "SURFACE": "Surface"})
district_detail

Unnamed: 0,Neighborhood Number,Neighborhood Name,District,Perimeter,Surface,Geometry X Y,Geometry
0,15.0,Arsenal,4.0,2878.559656,4.872649e+05,"48.851585175,2.36476795387","{""type"": ""Polygon"", ""coordinates"": [[[2.368512..."
1,18.0,Jardin-des-Plantes,5.0,4052.729521,7.983894e+05,"48.8419401934,2.35689388962","{""type"": ""Polygon"", ""coordinates"": [[[2.364561..."
2,39.0,Porte-Saint-Martin,10.0,3245.891413,6.090347e+05,"48.8712446509,2.36150364735","{""type"": ""Polygon"", ""coordinates"": [[[2.363917..."
3,43.0,Roquette,11.0,4973.010557,1.172087e+06,"48.8570640408,2.38036406173","{""type"": ""Polygon"", ""coordinates"": [[[2.379720..."
4,46.0,Picpus,12.0,18261.910318,7.205014e+06,"48.8303592424,2.42882681508","{""type"": ""Polygon"", ""coordinates"": [[[2.411249..."
...,...,...,...,...,...,...,...
76,32.0,Europe,8.0,4803.242769,1.182467e+06,"48.8781476759,2.3171746113","{""type"": ""Polygon"", ""coordinates"": [[[2.312293..."
77,44.0,Sainte-Marguerite,11.0,4591.310799,9.296092e+05,"48.852096507,2.3887648336","{""type"": ""Polygon"", ""coordinates"": [[[2.396236..."
78,54.0,Parc-de-Montsouris,14.0,5224.265369,1.357950e+06,"48.8234527716,2.33707017986","{""type"": ""Polygon"", ""coordinates"": [[[2.343996..."
79,57.0,Saint-Lambert,15.0,6928.792072,2.829202e+06,"48.8342936284,2.29691997445","{""type"": ""Polygon"", ""coordinates"": [[[2.304248..."


<p>Let's drop the rows that contain null values</p>

In [47]:
district_detail = district_detail.dropna()
district_detail

Unnamed: 0,Neighborhood Number,Neighborhood Name,District,Perimeter,Surface,Geometry X Y,Geometry
0,15.0,Arsenal,4.0,2878.559656,4.872649e+05,"48.851585175,2.36476795387","{""type"": ""Polygon"", ""coordinates"": [[[2.368512..."
1,18.0,Jardin-des-Plantes,5.0,4052.729521,7.983894e+05,"48.8419401934,2.35689388962","{""type"": ""Polygon"", ""coordinates"": [[[2.364561..."
2,39.0,Porte-Saint-Martin,10.0,3245.891413,6.090347e+05,"48.8712446509,2.36150364735","{""type"": ""Polygon"", ""coordinates"": [[[2.363917..."
3,43.0,Roquette,11.0,4973.010557,1.172087e+06,"48.8570640408,2.38036406173","{""type"": ""Polygon"", ""coordinates"": [[[2.379720..."
4,46.0,Picpus,12.0,18261.910318,7.205014e+06,"48.8303592424,2.42882681508","{""type"": ""Polygon"", ""coordinates"": [[[2.411249..."
...,...,...,...,...,...,...,...
75,71.0,Goutte-d'Or,18.0,5176.406895,1.089226e+06,"48.8921381876,2.3555361633","{""type"": ""Polygon"", ""coordinates"": [[[2.349667..."
76,32.0,Europe,8.0,4803.242769,1.182467e+06,"48.8781476759,2.3171746113","{""type"": ""Polygon"", ""coordinates"": [[[2.312293..."
77,44.0,Sainte-Marguerite,11.0,4591.310799,9.296092e+05,"48.852096507,2.3887648336","{""type"": ""Polygon"", ""coordinates"": [[[2.396236..."
78,54.0,Parc-de-Montsouris,14.0,5224.265369,1.357950e+06,"48.8234527716,2.33707017986","{""type"": ""Polygon"", ""coordinates"": [[[2.343996..."


<p>Finally, let's split the "Geometry X Y" coordinates into two columns, Latitude and Longitude, and drop the original "Geometry X Y" column</p>

In [59]:
district_detail[['Latitude','Longitude']] = district_detail['Geometry X Y'].str.split(',',expand=True)
district_detail = district_detail.drop(columns=['Geometry X Y'])
district_detail

Unnamed: 0,Neighborhood Number,Neighborhood Name,District,Perimeter,Surface,Geometry,Latitude,Longitude
0,15.0,Arsenal,4.0,2878.559656,4.872649e+05,"{""type"": ""Polygon"", ""coordinates"": [[[2.368512...",48.851585175,2.36476795387
1,18.0,Jardin-des-Plantes,5.0,4052.729521,7.983894e+05,"{""type"": ""Polygon"", ""coordinates"": [[[2.364561...",48.8419401934,2.35689388962
2,39.0,Porte-Saint-Martin,10.0,3245.891413,6.090347e+05,"{""type"": ""Polygon"", ""coordinates"": [[[2.363917...",48.8712446509,2.36150364735
3,43.0,Roquette,11.0,4973.010557,1.172087e+06,"{""type"": ""Polygon"", ""coordinates"": [[[2.379720...",48.8570640408,2.38036406173
4,46.0,Picpus,12.0,18261.910318,7.205014e+06,"{""type"": ""Polygon"", ""coordinates"": [[[2.411249...",48.8303592424,2.42882681508
...,...,...,...,...,...,...,...,...
75,71.0,Goutte-d'Or,18.0,5176.406895,1.089226e+06,"{""type"": ""Polygon"", ""coordinates"": [[[2.349667...",48.8921381876,2.3555361633
76,32.0,Europe,8.0,4803.242769,1.182467e+06,"{""type"": ""Polygon"", ""coordinates"": [[[2.312293...",48.8781476759,2.3171746113
77,44.0,Sainte-Marguerite,11.0,4591.310799,9.296092e+05,"{""type"": ""Polygon"", ""coordinates"": [[[2.396236...",48.852096507,2.3887648336
78,54.0,Parc-de-Montsouris,14.0,5224.265369,1.357950e+06,"{""type"": ""Polygon"", ""coordinates"": [[[2.343996...",48.8234527716,2.33707017986


<p>Now we have a clean dataset of the Paris Neighborhoods that we will use to conduct our analysis</p>
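
<p>One point worth noting about the split above: <code>str.split</code> produces text, so the Latitude and Longitude columns hold strings rather than numbers. The sketch below shows the conversion on a single value using only the standard library; <code>parse_lat_lon</code> is a hypothetical helper name.</p>

```python
def parse_lat_lon(value):
    """Split a 'lat,lon' string into a (latitude, longitude) tuple of floats."""
    lat_text, lon_text = value.split(",")
    return float(lat_text), float(lon_text)


sample = "48.851585175,2.36476795387"  # Arsenal, from the table above
lat, lon = parse_lat_lon(sample)
print(type(lat).__name__, lat, lon)
```

<p>In pandas, calling <code>.astype(float)</code> on the two new columns would apply the same conversion to the whole dataframe, which matters as soon as the coordinates are used in calculations.</p>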

<p>Let's visualize our list of neighborhoods:</p>

In [62]:
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

for lat, lng, neighborhood, district in zip(district_detail['Latitude'], district_detail['Longitude'], district_detail['Neighborhood Name'], district_detail['District']):
    label = '{}, {}'.format(neighborhood, district)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_clusters)
map_clusters

<p>Much better! We now have a complete list of the 80 neighborhoods of Paris, covering most of the city. We can use this list to conduct our analysis</p>

<p>Now let's use Foursquare to get the list of restaurants within a 400 m radius of each neighborhood data point</p>
<p>Let's start with our first neighborhood:</p>

In [66]:
neighborhood_lat = district_detail['Latitude'].iloc[0]
neighborhood_lon = district_detail['Longitude'].iloc[0]
print(neighborhood_lat)
print(neighborhood_lon)

48.851585175
2.36476795387


In [143]:
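CLIENT_ID = 'YOUR_FOURSQUARE_CLIENT_ID' # replace with your own Foursquare client ID
CLIENT_SECRET = 'YOUR_FOURSQUARE_CLIENT_SECRET' # replace with your own secret; never publish it
VERSION = '20180605' # Foursquare API version date
LIMIT = 100 # maximum number of venues requested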

In [120]:
radius = 400 # search radius in metres
# categoryId restricts the search to Foursquare's food category
url = 'https://api.foursquare.com/v2/venues/search?categoryId=4d4b7105d754a06374d81259&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID,
    CLIENT_SECRET,
    VERSION,
    neighborhood_lat,
    neighborhood_lon,
    radius,
    LIMIT)

# send the request and decode the JSON response
results = requests.get(url).json()
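
<p>A long format string like the one above is easy to get wrong when parameters are added or reordered. As an alternative, here is a minimal sketch that assembles the same query from named parameters with the standard library. The function name <code>build_search_url</code> is hypothetical and the credentials are placeholders.</p>

```python
from urllib.parse import parse_qs, urlencode, urlsplit

SEARCH_ENDPOINT = "https://api.foursquare.com/v2/venues/search"


def build_search_url(lat, lon, radius, limit, client_id, client_secret,
                     version, category_id):
    """Assemble the venue search URL from named parameters."""
    params = {
        "categoryId": category_id,
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": f"{lat},{lon}",
        "radius": radius,
        "limit": limit,
    }
    # urlencode escapes special characters (the comma in "ll" becomes %2C)
    return f"{SEARCH_ENDPOINT}?{urlencode(params)}"


url = build_search_url(
    lat="48.851585175", lon="2.36476795387", radius=400, limit=100,
    client_id="YOUR_CLIENT_ID",          # placeholder, not a real credential
    client_secret="YOUR_CLIENT_SECRET",  # placeholder, not a real credential
    version="20180605",
    category_id="4d4b7105d754a06374d81259",
)

# Decode the query string again to check what will be sent
query = parse_qs(urlsplit(url).query)
print(query["ll"][0], query["radius"][0])
```

<p>The <code>requests.get</code> function also accepts a <code>params</code> dictionary and performs the same encoding, which avoids building the URL by hand at all.</p>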

In [78]:
# function that extracts the category of the venue
def get_category_type(row):
    # use 'categories' when present, otherwise fall back to 'venue.categories'
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']

    # a venue may have no category at all
    if len(categories_list) == 0:
        return None
    return categories_list[0]['name']
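
<p>To make the behavior of this function concrete, here is a small dictionary-based variant applied to a few records. The records are hypothetical and only mimic the shape of Foursquare venue entries; <code>first_category_name</code> is an illustrative name, not the notebook's function.</p>

```python
def first_category_name(venue):
    """Return the name of the venue's first category, or None if it has none."""
    categories = venue.get("categories") or venue.get("venue.categories") or []
    return categories[0]["name"] if categories else None


# Hypothetical records shaped like Foursquare venue entries
venues = [
    {"name": "Le Sully", "categories": [{"name": "French Restaurant"}]},
    {"name": "Unlabelled place", "categories": []},
    {"name": "Maeum", "venue.categories": [{"name": "Café"}]},
]

names = [first_category_name(v) for v in venues]
print(names)
```

<p>Only the first category is kept, so a venue tagged with several categories is counted under the one Foursquare lists first.</p>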

In [124]:
venues = results['response']['venues']

nearby_restaurants = json_normalize(venues) # flatten JSON

# keep only the columns we need (copy so that later assignments do not touch a view)
nearby_restaurants = nearby_restaurants[['name', 'categories', 'location.lat', 'location.lng']].copy()

# replace the list of categories with the name of the first category
nearby_restaurants['categories'] = nearby_restaurants.apply(get_category_type, axis=1)

# rename columns
nearby_restaurants = nearby_restaurants.rename(columns={"location.lat": "latitude", "location.lng": "longitude"})

nearby_restaurants

Unnamed: 0,name,categories,latitude,longitude
0,The Grilled Cheese Factory,Sandwich Place,48.852796,2.367617
1,Le Temps des Cerises,French Restaurant,48.852554,2.364195
2,Le Sully,French Restaurant,48.851033,2.362125
3,Boulangerie Saint-Antoine,Bakery,48.853633,2.36522
4,Le Rotanah,Thai Restaurant,48.852778,2.366375
5,Cucina Napoletana,Italian Restaurant,48.853004,2.366237
6,Maeum,Café,48.852114,2.366656
7,HD Diner Bastille,Diner,48.85268,2.367173
8,Café Français,French Restaurant,48.853025,2.368484
9,Le Rempart,Café,48.853561,2.36647


In [125]:
nearby_restaurants.shape

(49, 4)

<p>So we have a list of 49 restaurants. Let's build a dataset that counts the occurrences of each food category</p>

In [140]:
#count occurrences of categories
nearby_rest_agg = nearby_restaurants.groupby(['categories']).count()
#drop latitude and longitude
nearby_rest_agg = nearby_rest_agg.drop(columns=['latitude', 'longitude'])
#rename column
nearby_rest_agg = nearby_rest_agg.rename(columns={"name": "count"})
#sorting by count
nearby_rest_agg = nearby_rest_agg.sort_values(by=['count'], ascending=False)
nearby_rest_agg

categories,count
French Restaurant,7
Café,7
Italian Restaurant,7
Bistro,3
Coffee Shop,3
Diner,2
Restaurant,2
Japanese Restaurant,2
Asian Restaurant,2
Creperie,2


<p>We can see that we have a large variety of restaurants. We may be able to aggregate or drop some of these categories. Here are the observations we can make:</p>
<ul>
    <li>Bistro can be merged into French Restaurant, since a bistro usually serves French food</li>
    <li>Coffee Shop can be dropped, since it is not strictly speaking a place to eat</li>
    <li>Diner can be renamed American Restaurant</li>
</ul>
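
<p>These three rules can be sketched as a small regrouping step. The example below uses only the standard library and the ten counts shown in the table above. The names <code>RENAME</code>, <code>DROP</code> and <code>regroup</code> are hypothetical, and the mapping would need to be extended for the categories not listed here.</p>

```python
from collections import Counter

# Counts copied from the table above (top ten categories only)
counts = Counter({
    "French Restaurant": 7, "Café": 7, "Italian Restaurant": 7,
    "Bistro": 3, "Coffee Shop": 3, "Diner": 2, "Restaurant": 2,
    "Japanese Restaurant": 2, "Asian Restaurant": 2, "Creperie": 2,
})

RENAME = {"Bistro": "French Restaurant", "Diner": "American Restaurant"}
DROP = {"Coffee Shop"}


def regroup(category_counts):
    """Merge, rename and drop categories according to RENAME and DROP."""
    merged = Counter()
    for category, n in category_counts.items():
        if category in DROP:
            continue  # not a place to eat, leave it out
        merged[RENAME.get(category, category)] += n
    return merged


regrouped = regroup(counts)
print(regrouped.most_common(3))
```

<p>Keeping the rules in two small lookup structures makes it easy to review and adjust them as more neighborhoods, and therefore more categories, are added to the analysis.</p>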