# The Wine Bar Project

### Introduction & Business problem

In case my career as a data scientist fails (*let's hope it doesn't*), I want to open a wine bar in Paris, France. <br/> 
Wine, of course: **I'm French!** <br/>

The problem is that, in my experience, Paris has multiple areas where people go out for a drink, and these areas are not concentrated in one place but spread around the city. <br/>

So where is the best location to open a new wine bar so that it attracts enough customers to succeed? <br/>

To ensure success, I need the bar to be in a location where the concentration of venues such as theaters, cinemas and restaurants demonstrates an active life in the area. Using Foursquare data, I will geolocate the venues and find the best spot to open my wine bar.

### Data section

To provide an analytical answer to the business problem of where to open my future wine bar in Paris, I will use:<br/>
- A segmentation of inner-city Paris into its arrondissements, loaded from a .geojson file
- Venue data for each neighborhood from the Foursquare API (venue category, customer rating, ...)

### Methodology


In [1]:
import pandas as pd
import numpy as np
import requests
from pandas import json_normalize # transform JSON file into a pandas dataframe (top-level import, pandas >= 1.0)
import folium # map rendering library
import json # library to handle JSON files
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values
import geopy.distance
from math import sqrt

#### Loading the Paris coordinates

In [2]:
with open('arrondissements.geojson') as json_data:
    parisarr = json.load(json_data)
    
par_data = parisarr['features']
colnames = ['PostCode', 'Neighborhood', 'Latitude', 'Longitude']
dfparis = pd.DataFrame(columns=colnames)

In [3]:
rows = []
for d in par_data:
    lat, lon = d['properties']['geom_x_y']  # stored as [latitude, longitude]
    rows.append({'PostCode': d['properties']['c_ar'],
                 'Neighborhood': d['properties']['l_aroff'],
                 'Latitude': lat,
                 'Longitude': lon})

# DataFrame.append was removed in pandas 2.0, so build the frame from the collected rows
dfparis = pd.DataFrame(rows, columns=colnames)

dfparis.head()

| | PostCode | Neighborhood | Latitude | Longitude |
|---|---|---|---|---|
| 0 | 3 | Temple | 48.862872 | 2.360001 |
| 1 | 1 | Louvre | 48.862563 | 2.336443 |
| 2 | 5 | Panthéon | 48.844443 | 2.350715 |
| 3 | 6 | Luxembourg | 48.84913 | 2.332898 |
| 4 | 12 | Reuilly | 48.834974 | 2.421325 |


In [4]:
address = 'Paris, France'

geolocator = Nominatim(user_agent="par_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Paris are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Paris are 48.8566969, 2.3514616.


#### Creating a map of Paris with Folium

In [5]:
# create map of Paris using latitude and longitude values
map_paris = folium.Map(location=[latitude, longitude], zoom_start=12)

# add markers to map
for lat, lng, label in zip(dfparis['Latitude'], dfparis['Longitude'], dfparis['Neighborhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_paris)
    
map_paris

The above map shows Paris with the center coordinates of its 20 arrondissements (neighborhoods).

In [6]:
df_coor = dfparis[['Latitude', 'Longitude']]
dfparis['Distance from center'] = ''

In [7]:
# Function to calculate the distance in meters from the center coordinates of each neighborhood to the center of Paris
def calc_xy_distance(coords_1, coords_2):
    # geopy.distance.vincenty was removed in geopy 2.0; geodesic is its replacement
    return geopy.distance.geodesic(coords_1, coords_2).m
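
As a rough cross-check of the distances computed with geopy, the great-circle (haversine) formula can be written with the standard library alone. This is only a sketch: `haversine_m` and the mean Earth radius of 6,371 km are assumptions introduced here, not part of the notebook, and a spherical model can deviate from an ellipsoidal result by up to roughly 0.5%.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters (spherical approximation, an assumption)

def haversine_m(coords_1, coords_2):
    """Great-circle distance in meters between two (latitude, longitude) pairs in degrees."""
    lat1, lon1 = map(radians, coords_1)
    lat2, lon2 = map(radians, coords_2)
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))

# Center of the 3rd arrondissement (Temple) versus the Paris coordinates returned by Nominatim
temple = (48.862872, 2.360001)
paris_center = (48.8566969, 2.3514616)
print(round(haversine_m(temple, paris_center)))
```

The printed value should be close to, but not identical with, the distance reported for Temple in the notebook's distance table, since the two methods model the Earth differently.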

In [8]:
# Assign the whole column at once to avoid chained indexing (and pandas' SettingWithCopyWarning)
dfparis['Distance from center'] = [
    calc_xy_distance((lat, lon), (latitude, longitude))
    for lat, lon in zip(df_coor['Latitude'], df_coor['Longitude'])
]

dfparis.head()


| | PostCode | Neighborhood | Latitude | Longitude | Distance from center |
|---|---|---|---|---|---|
| 0 | 3 | Temple | 48.862872 | 2.360001 | 929.653 |
| 1 | 1 | Louvre | 48.862563 | 2.336443 | 1280.59 |
| 2 | 5 | Panthéon | 48.844443 | 2.350715 | 1363.8 |
| 3 | 6 | Luxembourg | 48.84913 | 2.332898 | 1601.24 |
| 4 | 12 | Reuilly | 48.834974 | 2.421325 | 5668.31 |


Now let's identify the venues around each of these center coordinates of the city using the **Foursquare API**.

#### Foursquare

Let's use the Foursquare API to get information on wine bars in each neighborhood.<br/>

We're interested in venues in the 'Night life' category, since their density indicates how active an area is. More precisely, we are looking for areas with a good density of bars, nightclubs and pubs but few wine bars. Our wine bar list will include only venues categorized as 'Wine bar'.
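
The selection criterion described above (many nightlife venues, few wine bars) can be sketched as a simple count per neighborhood. The function `summarize_area` and the `sample` dictionary are illustrative assumptions: the tuples follow the `(id, name, categories, ...)` layout of the venue records in this notebook, but they are a small hand-picked excerpt rather than full results, and the notebook itself does not define this ranking.

```python
WINE_BAR_ID = '4bf58dd8d48988d123941735'  # Foursquare "Wine Bar" category ID used in this notebook

def summarize_area(venues):
    """Return (nightlife_count, wine_bar_count) for one neighborhood.

    Each venue is a tuple whose third element is a list of
    (category_name, category_id) pairs; only the first category is checked.
    """
    wine_bar_count = sum(
        1 for venue in venues if venue[2] and venue[2][0][1] == WINE_BAR_ID
    )
    return len(venues), wine_bar_count

# Hypothetical excerpt: three venues per neighborhood, not the complete API results
sample = {
    'Temple': [
        ('52505ced11d2d39c3cd942a6', 'Les Enfants Rouges', [('Wine Bar', WINE_BAR_ID)]),
        ('527da64111d24c0b4ed20c75', 'Monsieur Henri', [('Wine Bar', WINE_BAR_ID)]),
        ('5079f1e0e4b0eb8b83f90b0d', 'Little Red Door', [('Speakeasy', '4bf58dd8d48988d1d4941735')]),
    ],
    'Panthéon': [
        ('54469172498ed34e0178dc37', 'Brewberry Bar', [('Bar', '4bf58dd8d48988d116941735')]),
        ('527bf21211d2285e6ca8fa26', 'Little Bastards', [('Bar', '4bf58dd8d48988d116941735')]),
        ('5530cc1f498ea6903c33a3da', 'Bonvivant', [('Wine Bar', WINE_BAR_ID)]),
    ],
}

for name, venues in sample.items():
    total, wine = summarize_area(venues)
    # A low wine-bar share in a busy area would suggest less direct competition
    print(name, total, wine, round(wine / total, 2))
```

On real data, an area with a high nightlife count and a low wine bar share would be the kind of candidate the text describes.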

In [9]:
# Foursquare credentials (placeholders: never publish real keys in a shared notebook)
CLIENT_ID = 'YOUR_FOURSQUARE_CLIENT_ID' # your Foursquare ID
CLIENT_SECRET = 'YOUR_FOURSQUARE_CLIENT_SECRET' # your Foursquare Secret

Let's retrieve the venues with the Foursquare API: we send one query per Paris neighborhood, centered on its coordinates, and look for venues in the *Night life* category. <br/>
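
One way to assemble such a request safely is to let the standard library encode the query string instead of formatting it by hand. The sketch below only builds the URL and sends nothing. `build_explore_url` is a hypothetical helper, the credentials are placeholders, and the parameter names are the ones used in the `venues/explore` calls of this notebook.

```python
from urllib.parse import urlencode

EXPLORE_ENDPOINT = 'https://api.foursquare.com/v2/venues/explore'

def build_explore_url(lat, lon, category_id, client_id, client_secret,
                      version='20180724', radius=500, limit=100):
    """Build a venues/explore URL for one neighborhood center (hypothetical helper)."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,
        'll': '{},{}'.format(lat, lon),
        'categoryId': category_id,
        'radius': radius,
        'limit': limit,
    }
    # urlencode percent-encodes special characters such as the comma in 'll'
    return EXPLORE_ENDPOINT + '?' + urlencode(params)

# Temple center coordinates and the Night life category ID from this notebook
url = build_explore_url(48.862872, 2.360001, '4d4b7105d754a06376d81259',
                        'PLACEHOLDER_ID', 'PLACEHOLDER_SECRET')
print(url)
```

Keeping the parameters in a dictionary also makes it easy to change the radius or limit for a single neighborhood without touching the URL template.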

In [10]:
# Category IDs for Night life, Bar and Wine bar were taken from the Foursquare web site (https://developer.foursquare.com/docs/resources/categories):

category_id = '4d4b7105d754a06376d81259' # Night life
# Bar: '4bf58dd8d48988d116941735' (not used below)
sub_category_id = '4bf58dd8d48988d123941735' # Wine bar

In [26]:
bars = {}
wine_bars = {}
location_bars = []  

In [30]:
for lat, lon in zip(dfparis['Latitude'], dfparis['Longitude']):
    # One query per neighborhood center, restricted to the Night life category
    venues = get_venues_near_location(lat, lon, category_id, CLIENT_ID, CLIENT_SECRET, radius=500, limit=100)

    for venue_id, venue_name, venue_categories, venue_latlon, venue_address, venue_distance in venues:
        venue_bar = (venue_id, venue_name, venue_latlon[0], venue_latlon[1], venue_address, venue_distance)

        # Keep only venues whose first category is Wine bar (skip venues without any category)
        if venue_categories and venue_categories[0][1] == sub_category_id:
            wine_bars[venue_id] = venue_bar

('52505ced11d2d39c3cd942a6', 'Les Enfants Rouges', [('Wine Bar', '4bf58dd8d48988d123941735')], (48.863012925486814, 2.3612595453403116), '9 rue de Beauce, 75003 Paris, France', 93)
('4f4e96eae4b0a99a78161d9e', "Bar de l'Hôtel Jules et Jim", [('Hotel Bar', '4bf58dd8d48988d1d5941735')], (48.86346264663714, 2.357393198185953), '11 rue des Gravilliers, 75003 Paris, France', 201)
('5079f1e0e4b0eb8b83f90b0d', 'Little Red Door', [('Speakeasy', '4bf58dd8d48988d1d4941735')], (48.863702712072616, 2.3635136711780285), '60 rue Charlot, 75003 Paris, France', 273)
('4d77b39caf63cbff3997be0f', 'Candelaria', [('Cocktail Bar', '4bf58dd8d48988d11e941735')], (48.86303212989039, 2.3640589318953125), '56 rue de Saintonge, 75003 Paris, France', 297)
('527da64111d24c0b4ed20c75', 'Monsieur Henri', [('Wine Bar', '4bf58dd8d48988d123941735')], (48.86343077268416, 2.3623533759093167), '8 rue de Picardie, 75003 Paris, France', 183)
('4b68a117f964a520c8832be3', 'Le Barav', [('Wine Bar', '4bf58dd8d48988d123941735')]

('54469172498ed34e0178dc37', 'Brewberry Bar', [('Bar', '4bf58dd8d48988d116941735')], (48.84294546857269, 2.3486221990301255), '11 rue du Pot-de-Fer, 75005 Paris, France', 226)
('527bf21211d2285e6ca8fa26', 'Little Bastards', [('Bar', '4bf58dd8d48988d116941735')], (48.84452843080229, 2.3488003104567503), '5 rue Blainville, 75005 Paris, France', 140)
('517c14d2e4b0eef574ad0b91', 'Casa Hugo', [('Tapas Restaurant', '4bf58dd8d48988d1db931735')], (48.84510364869614, 2.3521898310752545), '48 rue Monge, 75005 Paris, France', 130)
('5535621c498ec2336c36ed7f', 'Nossa Churrasqueira', [('Portuguese Restaurant', '4def73e84765ae376e57713a')], (48.8473706224662, 2.3481693863868713), "1 rue de l'École Polytechnique, 75005 Paris, France", 375)
('5530cc1f498ea6903c33a3da', 'Bonvivant', [('Wine Bar', '4bf58dd8d48988d123941735')], (48.84755082513145, 2.3519687354564667), '7 rue des Écoles, 75005 Paris, France', 357)
('520d21d911d2769b08910957', 'Chicha Shop', [('Hookah Bar', '4bf58dd8d48988d119941735')], (

('5660a818498e87cb6fbcece5', 'Tiger', [('Cocktail Bar', '4bf58dd8d48988d11e941735')], (48.85212819617358, 2.3345545267982524), '13 rue Princesse, 75006 Paris, France', 355)
('4b06de60f964a520dcf122e3', 'Chez Georges', [('Wine Bar', '4bf58dd8d48988d123941735')], (48.85217316270022, 2.3337117120623367), '11 rue des Canettes, 75006 Paris, France', 343)
('4e67b1d8fa76f38efb3aaa09', 'Compagnie des Vins Surnaturels', [('Wine Bar', '4bf58dd8d48988d123941735')], (48.851740174158174, 2.33640631097926), '7 rue Lobineau, 75006 Paris, France', 387)
('4f873d63e4b0cec3a93d0845', 'Les Caves Alliées', [('Pub', '4bf58dd8d48988d11b941735')], (48.851927169343384, 2.337458256066046), '44 rue Grégoire de Tours, 75006 Paris, France', 456)
('4adcda08f964a520b63321e3', 'Chez Castel', [('Nightclub', '4bf58dd8d48988d11f941735')], (48.85206212491216, 2.334516175222025), '15 rue Princesse, 75006 Paris, France', 347)
('4c0acd91bbc676b0e5884ad5', 'Bodega de la Soif', [('Bar', '4bf58dd8d48988d116941735')], (48.85193

('514de425e4b0fb27135227df', 'O Soleil', [('Bar', '4bf58dd8d48988d116941735')], (48.82685705231591, 2.363627627491951), '98 rue de Tolbiac, 75013 Paris, France', 197)
('4df8e4add164d347cc73fdec', 'Le Jasmin', [('Hookah Bar', '4bf58dd8d48988d119941735')], (48.8316029701474, 2.360503993237203), '75013 Paris, France', 380)
('4c277d9f5c5ca5936dd447fe', 'La Place', [('French Restaurant', '4bf58dd8d48988d10c941735')], (48.83064143923074, 2.3568181693553925), '194 avenue de Choisy, 75013 Paris, France', 471)
('4d567f5248ea6ea8e34ae1a3', 'Le Rallye', [('Bar', '4bf58dd8d48988d116941735')], (48.827314448945295, 2.3685058541022492), '66 Rue de Tolbiac, 75013 Paris, France', 472)
('4b6094bcf964a520b3ee29e3', "Le Delly's", [('African Restaurant', '4bf58dd8d48988d1c8941735')], (48.878458299848, 2.357852395267173), '5 rue des Deux Gares, 75010 Paris, France', 333)
('4b0a99b8f964a520802523e3', 'Le Verre Volé - Le Bistrot', [('Wine Bar', '4bf58dd8d48988d123941735')], (48.872868755036365, 2.363668834185

('56f6ac70498e64a80be4d061', 'Fitzgerald', [('Cocktail Bar', '4bf58dd8d48988d11e941735')], (48.858561703791764, 2.3098137974739075), '54 boulevard de la Tour-Maubourg, 75007 Paris, France', 317)
('50810ddde4b0fedebcd47049', "L'Éclair", [('Cocktail Bar', '4bf58dd8d48988d11e941735')], (48.85705811919089, 2.3062809651360836), '32 rue Cler, 75007 Paris, France', 443)
('4e0e478552b1b27c1b859a2e', 'La Crèmerie', [('Cocktail Bar', '4bf58dd8d48988d11e941735')], (48.856713305414935, 2.3064841510887444), '38 rue Cler, 75007 Paris, France', 422)
('4c506f5ff080a5936ccd86e2', 'Le Tourville', [('French Restaurant', '4bf58dd8d48988d10c941735')], (48.85442850825206, 2.306070823700922), '17 avenue de Tourville (Avenue de la Motte-Picquet), 75007 Paris, France', 488)
('4b91477ff964a52086af33e3', 'Au Canon des Invalides', [('Bar', '4bf58dd8d48988d116941735')], (48.860008295650324, 2.309709261585959), '54 rue Saint-Dominique, 75007 Paris, France', 463)
('4b560de9f964a5200cfe27e3', 'Café Central', [('Coffe

('50774a9be4b02bab185d282b', 'Bar du Bristol', [('Hotel Bar', '4bf58dd8d48988d1d5941735')], (48.87179631044972, 2.3150284778585632), 'Hôtel Bristol (112 rue du Faubourg Saint-Honoré), 75008 Paris, France', 208)
('57374a59498ecfdea755e4be', 'Gentlemen 1919', [('Cocktail Bar', '4bf58dd8d48988d11e941735')], (48.87093302661043, 2.3115142351802933), '11 rue Jean Mermoz, 75008 Paris, France', 213)
('4bca515b937ca593eeb1a792', 'Le Matignon', [('French Restaurant', '4bf58dd8d48988d10c941735')], (48.869708, 2.310968), '3 avenue Matignon, 75008 Paris, France', 354)
('4f3ca819e4b0d8b9d9f073ca', 'Le 105', [('Bar', '4bf58dd8d48988d116941735')], (48.87242023672575, 2.3112085461616516), '105 rue du Faubourg Saint-Honoré, 75008 Paris, France', 104)
('5a465a722e26807a24d01a82', 'Le Chat Blanc', [('Brasserie', '57558b36e4b065ecebd306b0')], (48.871673, 2.310028), '61 Avenue Franklin Delano Roosevelt, 75008 Paris, France', 218)
('4b092e36f964a520961423e3', "Bugsy's", [('Bar', '4bf58dd8d48988d116941735')],

('4defc926c65bf3f03e9f47eb', 'Le Silencio', [('Nightclub', '4bf58dd8d48988d11f941735')], (48.868998, 2.343417), '142 rue Montmartre, 75002 Paris, France', 91)
('575c8192498e3c8b2f8487cf', 'Danico', [('Cocktail Bar', '4bf58dd8d48988d11e941735')], (48.86707072006055, 2.339413647446296), '6 rue Vivienne, 75002 Paris, France', 282)
('546fd2d0498ee87303529fc7', 'Mabel', [('Cocktail Bar', '4bf58dd8d48988d11e941735')], (48.8675440546188, 2.3461500391476022), "58 rue d'Aboukir, 75002 Paris, France", 258)
('52ab22bf11d2464466e73347', 'Lockwood', [('Cocktail Bar', '4bf58dd8d48988d11e941735')], (48.86772674428229, 2.3469454610004647), "73 rue d'Aboukir, 75002 Paris, France", 309)
('4e7c7257e4cdf79c0e87aecb', 'Les Athlètes', [('French Restaurant', '4bf58dd8d48988d10c941735')], (48.86951553880589, 2.339537181794366), '6 rue des Colonnes, 75002 Paris, France', 275)
('4c9a920a3bc3199c77deb262', 'FCINQ', [('Office', '4bf58dd8d48988d124941735')], (48.86847259390376, 2.341913869068185), '32 rue Notre-Da

('4fb63065e4b039c1e74129d7', 'Le Bistrot Tocqueville', [('Diner', '4bf58dd8d48988d147941735')], (48.88636528783994, 2.309140034373838), '67 rue de Tocqueville, 75017 Paris, France', 203)
('5151985ae4b0be643a22b6a0', 'Le Verre Moutarde', [('Restaurant', '4bf58dd8d48988d1c4941735')], (48.889988, 2.305277), '145 rue de Saussure, 75017 Paris, France', 315)
('4b66b33df964a520de272be3', 'Le Tocqueville', [('Bar', '4bf58dd8d48988d116941735')], (48.88755265519139, 2.3075256283554637), '49 Boulevard Pereire, 75017 Paris, France', 60)
('55747b36498ebb62a4b3017c', 'Café Iguana', [('French Restaurant', '4bf58dd8d48988d10c941735')], (48.88787484549998, 2.311675037253491), '20 rue Severiano de Heredia (En face du 15 boulevard Pereire), 75017 Paris, France', 363)
('4c235be59085d13a68b687cc', 'Le Jouffroy', [('Bar', '4bf58dd8d48988d116941735')], (48.88524432031217, 2.3074579370604233), '75017, France', 237)
('4bbe49041416a593cbf1f33c', 'Bistro 74', [('French Restaurant', '4bf58dd8d48988d10c941735')], 

('4dbc61196e810768bf5d4105', 'Café Mama Kin', [('Bar', '4bf58dd8d48988d116941735')], (48.88938600031579, 2.3835958455884376), 'Rue de Thionville, 75019 Paris, France', 272)
('596628a4135b3919fc7a270c', "L'Atalante", [('Beer Bar', '56aa371ce4b08b9a8d57356c')], (48.88982746132813, 2.382823896846173), '26 quai de la Marne, 75019 Paris, France', 339)
('5571d51f498e9ea789f8342d', 'Paname Brewing Company', [('Brewery', '50327c8591d4c4b30a586d5d')], (48.887795, 2.378824), '41 bis quai de Loire (Bassin de la Villette), 75019 Paris, France', 446)
('53864aef498e7dc54e0704dc', 'Les Bancs Publics', [('Bistro', '52e81612bcbc57f1066b79f1')], (48.8909767953291, 2.3839232325553894), '2 rue de Nantes, 75019 Paris, France', 439)
('597cb9a9ee71206fc0a2752b', 'Kiez Kanal', [('Beer Bar', '56aa371ce4b08b9a8d57356c')], (48.88744474478056, 2.3790983423735113), '90 quai de la Loire, 75019 Paris, France', 420)
('4bc8c36212bdb71366fa3c94', "L'alliance", [('Bar', '4bf58dd8d48988d116941735')], (48.88593016941184, 

In [31]:
wine_bars

{'56e296ef498ea4e7647b17d1': ('56e296ef498ea4e7647b17d1',
  'Le Marais',
  48.85943,
  2.360017,
  'France',
  383),
 '4bba27c653649c74e7b248fb': ('4bba27c653649c74e7b248fb',
  "L'Attirail Café",
  48.86458269837049,
  2.3569582571447802,
  '9 rue au Maire, 75003 Paris, France',
  293),
 '4b73f75ff964a520e5c12de3': ('4b73f75ff964a520e5c12de3',
  'Le Social Square',
  48.864799,
  2.359869,
  '165 rue du Temple, 75003 Paris, France',
  214),
 '4d63f93b95b28cfa080553fa': ('4d63f93b95b28cfa080553fa',
  'Manfred',
  48.86469673753726,
  2.359529245704414,
  '1 rue Réaumur, 75003 Paris, France',
  206),
 '520f8fc011d2eea0a99bdcb0': ('520f8fc011d2eea0a99bdcb0',
  'Au Bienheureux',
  48.86139257914309,
  2.3536782558364346,
  '2 impasse Berthaud, 75003 Paris, France',
  491),
 '4bed298d9868a59390f75c46': ('4bed298d9868a59390f75c46',
  'Au Grand Turenne',
  48.86513070478254,
  2.3653591883742267,
  '27 boulevard du Temple, 75003 Paris, France',
  466),
 '4b23ec71f964a520045d24e3': ('4b23ec71f

In [19]:
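def is_bar(categories, specific_filter=None):
    """Return (bar, specific) for a list of (category_name, category_id) pairs.

    bar is True if any category name contains a bar/wine-related word;
    specific is True if any category ID is in specific_filter.
    """
    wine_words = ['bar', 'wine', 'sausage', 'cheese', 'charcuterie', 'fromage', 'vin']
    bar = False  # was never initialized, which raised an error when nothing matched
    specific = False
    for category_name, category_id in categories:
        category_name = category_name.lower()
        if any(word in category_name for word in wine_words):
            bar = True
        if specific_filter is not None and category_id in specific_filter:
            specific = True
            bar = True
    return bar, specific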

def get_categories(categories):
    return [(cat['name'], cat['id']) for cat in categories]

def get_venues_near_location(lat, lon, category, client_id, client_secret, radius=500, limit=100):
    version = '20180724'
    url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&categoryId={}&radius={}&limit={}'.format(
        client_id, client_secret, version, lat, lon, category, radius, limit)
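    try:
        results = requests.get(url).json()['response']['groups'][0]['items']
        # format_address must be defined before this function is called
        venues = [(item['venue']['id'],
                   item['venue']['name'],
                   get_categories(item['venue']['categories']),
                   (item['venue']['location']['lat'], item['venue']['location']['lng']),
                   format_address(item['venue']['location']),
                   item['venue']['location']['distance']) for item in results]
    except (requests.RequestException, KeyError, IndexError, ValueError):
        # Network failure, non-JSON body or unexpected response shape: treat as "no venues".
        # A bare except would also hide programming errors such as a missing helper.
        venues = []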
    return venues
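
`get_venues_near_location` calls `format_address`, which is not defined in the cells shown here. A minimal sketch follows, assuming the `location` object carries Foursquare's `formattedAddress` list. That assumption is consistent with the comma-separated addresses in the output, but this is not a confirmed copy of the original helper, and the `example` dictionary is made up for illustration.

```python
def format_address(location):
    """Join the address lines of a Foursquare location dict into one string.

    Assumes a 'formattedAddress' list is present; falls back to the plain
    'address' field, then to an empty string.
    """
    lines = location.get('formattedAddress')
    if lines:
        return ', '.join(lines)
    return location.get('address', '')

# Hypothetical location dict shaped like the API's 'location' object
example = {'address': '9 rue de Beauce',
           'formattedAddress': ['9 rue de Beauce', '75003 Paris', 'France']}
print(format_address(example))
```

The fallbacks keep the venue tuple well-formed even when a venue has no structured address.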

In [12]:
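LIMIT = 100 # maximum number of venues returned by the Foursquare API
VERSION = '20180605' # Foursquare API version date (the value visible in the URL displayed below)

radius = 500 # search radius in meters
# create URL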
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    latitude, 
    longitude, 
    radius, 
    LIMIT)
url # display URL

'https://api.foursquare.com/v2/venues/explore?&client_id=YOUR_FOURSQUARE_CLIENT_ID&client_secret=YOUR_FOURSQUARE_CLIENT_SECRET&v=20180605&ll=48.8566969,2.3514616&radius=500&limit=100'

In [13]:
results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '5e3964c1006dce001c72355e'},
 'response': {'suggestedFilters': {'header': 'Tap to show:',
   'filters': [{'name': 'Open now', 'key': 'openNow'}]},
  'headerLocation': 'Hôtel-de-Ville',
  'headerFullLocation': 'Hôtel-de-Ville, Paris',
  'headerLocationGranularity': 'neighborhood',
  'totalResults': 153,
  'suggestedBounds': {'ne': {'lat': 48.861196904500005,
    'lng': 2.3582883184847447},
   'sw': {'lat': 48.8521968955, 'lng': 2.344634881515255}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'reasons': {'count': 0,
       'items': [{'summary': 'This spot is popular',
         'type': 'general',
         'reasonName': 'globalInteractionReason'}]},
      'venue': {'id': '4bf41231e5eba59334341f90',
       'name': "Place de l'Hôtel de Ville – Esplanade de la Libération",
       'location': {'address': "Place de l'Hôtel de Ville",
        'lat': 48.85692475726913,
        'lng': 2.3514118156673676,
        'distanc
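
The fields of interest can be pulled out of a response shaped like the one above with a few dictionary lookups. `extract_venues` is a hypothetical helper, and `sample_results` is a hand-trimmed stand-in that keeps only keys visible in the output above, so no request is made.

```python
def extract_venues(results):
    """Return (id, name, lat, lng) tuples from an explore-style response dict."""
    groups = results.get('response', {}).get('groups', [])
    if not groups:
        return []
    rows = []
    # The recommended venues sit in the first group's 'items' list
    for item in groups[0].get('items', []):
        venue = item['venue']
        location = venue['location']
        rows.append((venue['id'], venue['name'], location['lat'], location['lng']))
    return rows

# Trimmed stand-in for the API response (one venue only)
sample_results = {
    'meta': {'code': 200},
    'response': {'groups': [{'type': 'Recommended Places', 'name': 'recommended', 'items': [
        {'venue': {'id': '4bf41231e5eba59334341f90',
                   'name': "Place de l'Hôtel de Ville – Esplanade de la Libération",
                   'location': {'address': "Place de l'Hôtel de Ville",
                                'lat': 48.85692475726913,
                                'lng': 2.3514118156673676}}}]}]}}
print(extract_venues(sample_results))
```

Returning an empty list when no group is present mirrors the behavior of `get_venues_near_location`, which also yields no venues on an unexpected response.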

### Results

### Discussion

### Conclusion