# The Wine bar project 

### Introduction & Business problem

In case my career as a data scientist fails (*let's hope it doesn't*), I want to open a wine bar in Paris, France. <br/> 
Wine, of course: **I'm French!** <br/>

The problem is that, from my experience, Paris has multiple areas where people go out for a drink and these areas are not concentrated but rather spread around the city. <br/>

So where is the best location to open a new wine bar and attract enough clients to be successful? <br/>

To succeed, the bar needs to be in a location where the concentration of venues such as theaters, cinemas and restaurants shows that the area has an active social life. Using Foursquare data, I will geolocate these venues and find the best spot to open my wine bar.

### Data section

To provide an analytical answer to the business problem of where to open my future wine bar in Paris, I will use:<br/>
- A segmentation of inner-city Paris into arrondissements, loaded from a .geojson file
- Venue data for each neighborhood from the Foursquare API (venue category, customer rating, ...)
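
As a rough illustration of the first data source, the sketch below flattens one GeoJSON feature into a table row. The property names `c_ar`, `l_aroff` and `geom_x_y` are the ones read by the loading code in this notebook; the inline sample is a hand-written stand-in for the real file (using the Temple values), and `feature_to_row` is a hypothetical helper name.

```python
import json

# Hand-written sample mimicking one feature of the arrondissements file.
# Assumption: the real file carries more properties and a polygon geometry.
sample = '''
{"type": "FeatureCollection",
 "features": [
   {"type": "Feature",
    "properties": {"c_ar": 3, "l_aroff": "Temple", "geom_x_y": [48.862872, 2.360001]},
    "geometry": null}
 ]}
'''

def feature_to_row(feature):
    """Hypothetical helper: flatten one feature into the columns used in this notebook."""
    props = feature["properties"]
    latlon = props["geom_x_y"]  # stored as [latitude, longitude]
    return {"PostCode": props["c_ar"], "Neighborhood": props["l_aroff"],
            "Latitude": latlon[0], "Longitude": latlon[1]}

rows = [feature_to_row(f) for f in json.loads(sample)["features"]]
print(rows[0])
```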

### Methodology

The analysis proceeds in steps: load the center coordinates of each arrondissement from the .geojson file, plot them on a map of Paris, compute the distance from each arrondissement center to the city center, and then query the Foursquare API for venues around each center.

In [80]:
import pandas as pd
import numpy as np
import requests
from pandas import json_normalize # transform JSON file into a pandas dataframe (pandas.io.json.json_normalize is deprecated)
import folium # map rendering library
import json # library to handle JSON files
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values
import geopy.distance
from math import sqrt

#### Loading the Paris coordinates

In [81]:
with open('arrondissements.geojson') as json_data:
    parisarr = json.load(json_data)
    
par_data = parisarr['features']
colnames = ['PostCode', 'Neighborhood', 'Latitude', 'Longitude']
dfparis = pd.DataFrame(columns=colnames)

In [82]:
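# Collect one row per arrondissement, then build the DataFrame in one go
# (DataFrame.append was removed in pandas 2.0)
rows = []
for d in par_data:
    latlon = d['properties']['geom_x_y'] # stored as [latitude, longitude]
    code = d['properties']['c_ar']
    neigh = d['properties']['l_aroff']
    rows.append({'PostCode' : code, 'Neighborhood' : neigh, 'Latitude' : latlon[0], 'Longitude' : latlon[1]})
dfparis = pd.DataFrame(rows, columns=colnames)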

dfparis.head()

| | PostCode | Neighborhood | Latitude | Longitude |
|---|---|---|---|---|
| 0 | 3 | Temple | 48.862872 | 2.360001 |
| 1 | 1 | Louvre | 48.862563 | 2.336443 |
| 2 | 5 | Panthéon | 48.844443 | 2.350715 |
| 3 | 6 | Luxembourg | 48.84913 | 2.332898 |
| 4 | 12 | Reuilly | 48.834974 | 2.421325 |


In [83]:
address = 'Paris, France'

geolocator = Nominatim(user_agent="par_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Paris are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Paris are 48.8566969, 2.3514616.


#### Creating a map of Paris using Folium

In [84]:
# create map of Paris using latitude and longitude values
map_paris = folium.Map(location=[latitude, longitude], zoom_start=12)

# add markers to map
for lat, lng, label in zip(dfparis['Latitude'], dfparis['Longitude'], dfparis['Neighborhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_paris)  
    
map_paris

The map above shows Paris with the center coordinates of its 20 arrondissements (neighborhoods).

In [85]:
df_coor = dfparis[['Latitude', 'Longitude']]
dfparis['Distance from center'] = ''

In [86]:
#Function to calculate the distance in meters from the center coordinates of each neighborhood to the center of Paris
def calc_xy_distance(coords_1, coords_2):
    return geopy.distance.geodesic(coords_1, coords_2).m # vincenty was removed in geopy 2.0; geodesic replaces it
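
For intuition about what this distance function returns, here is a minimal haversine sketch using only the standard library. It is not what geopy computes (geopy works on an ellipsoid, so its results differ slightly); it assumes a spherical Earth with a radius of 6,371 km. The name `haversine_m` is made up for this example, and the two points are the Paris center and the Temple arrondissement center from this notebook.

```python
from math import radians, sin, cos, asin, sqrt

EARTH_RADIUS_M = 6_371_000  # mean Earth radius, spherical approximation (assumption)

def haversine_m(coords_1, coords_2):
    """Great-circle distance in meters between two (latitude, longitude) pairs in degrees."""
    lat1, lon1 = map(radians, coords_1)
    lat2, lon2 = map(radians, coords_2)
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = sin(dlat / 2) ** 2 + cos(lat1) * cos(lat2) * sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))

paris_center = (48.8566969, 2.3514616)
temple = (48.862872, 2.360001)
print(round(haversine_m(temple, paris_center)), "m")
```

At city scale the spherical and ellipsoidal results should agree to within a few meters, so either would do for ranking arrondissements by distance.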

In [87]:
# Assign the whole column at once to avoid chained indexing (SettingWithCopyWarning)
dfparis['Distance from center'] = [calc_xy_distance((lat, lon), (latitude, longitude)) for lat, lon in zip(df_coor['Latitude'], df_coor['Longitude'])]
    
dfparis.head()


| | Latitude | Longitude |
|---|---|---|
| 0 | 48.862872 | 2.360001 |
| 1 | 48.862563 | 2.336443 |
| 2 | 48.844443 | 2.350715 |
| 3 | 48.84913 | 2.332898 |
| 4 | 48.834974 | 2.421325 |


Now let's identify the venues around each of these center coordinates of the city using the **Foursquare API**.

#### Foursquare

Let's use Foursquare API to get info on wine bars in each neighborhood.<br/>

We're interested in venues in the 'bar' category, but only those that are proper bars: coffee shops, pizza places, bakeries, etc. are not direct competitors, so we do not take them into account. *We will include in our list only venues that have 'bar' in the category name, and we'll make sure to detect and include all the subcategories of the specific 'Wine' category, as we need info on wine bars in the neighborhoods.*
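
A filter along these lines could be sketched as follows: keep a venue if one of its category names mentions 'bar', and flag it as a wine bar if one of its category IDs belongs to a known set of wine categories. The function name `is_bar`, the `(name, id)` pair layout, and the sample venues with their placeholder IDs are all assumptions made for illustration.

```python
def is_bar(categories, wine_category_ids=()):
    """Return (is_bar, is_wine_bar) for a list of (category_name, category_id) pairs."""
    bar = False
    wine = False
    for name, cat_id in categories:
        if 'bar' in name.lower():
            bar = True
        if cat_id in wine_category_ids:
            # A wine category always counts as a bar, whatever its name
            bar = True
            wine = True
    return bar, wine

# Made-up sample venues; the IDs are placeholders, not real Foursquare IDs
wine_ids = {'wine-bar-id'}
print(is_bar([('Wine Bar', 'wine-bar-id')], wine_ids))   # (True, True)
print(is_bar([('Cocktail Bar', 'cocktail-id')], wine_ids))  # (True, False)
print(is_bar([('Bakery', 'bakery-id')], wine_ids))       # (False, False)
```

A plain substring match would also catch names such as 'Barbershop', so a real filter would probably need an exclusion list on top of this.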

In [100]:
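#Foursquare credentials, read from environment variables so the keys stay out of the notebook
#(the variable names FOURSQUARE_CLIENT_ID and FOURSQUARE_CLIENT_SECRET are arbitrary)
import os
CLIENT_ID = os.environ['FOURSQUARE_CLIENT_ID'] # your Foursquare ID
CLIENT_SECRET = os.environ['FOURSQUARE_CLIENT_SECRET'] # your Foursquare Secret
VERSION = '20180605' # Foursquare API version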

In [101]:
category_id = '4d4b7105d754a06376d81259' # Nightlife (top-level category)
sub_category_id = '4bf58dd8d48988d116941735' # Bar
subsub_category_id = '4bf58dd8d48988d123941735' # Wine bar
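
With the credentials and category IDs in place, a venue search request could be assembled roughly as below. This is a sketch that assumes the legacy Foursquare v2 `venues/search` endpoint with its `ll`, `categoryId`, `radius` and `limit` parameters; the helper name `build_search_url`, the 1000 m radius, the limit of 100 and the placeholder credentials are illustrative choices, not values from this project. No request is sent here.

```python
from urllib.parse import urlencode

def build_search_url(client_id, client_secret, version, lat, lon,
                     category_id, radius=1000, limit=100):
    """Hypothetical helper: build a v2 venues/search URL around one point."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,
        'll': '{},{}'.format(lat, lon),  # "latitude,longitude"
        'categoryId': category_id,
        'radius': radius,                # meters (assumed value)
        'limit': limit,                  # max results (assumed value)
    }
    return 'https://api.foursquare.com/v2/venues/search?' + urlencode(params)

# Wine bar category around the Temple arrondissement center, placeholder credentials
url = build_search_url('YOUR_ID', 'YOUR_SECRET', '20180605',
                       48.862872, 2.360001,
                       '4bf58dd8d48988d123941735')
print(url)
```

The response could then be fetched with `requests.get(url).json()` and looped over each arrondissement center.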

### Results

### Discussion

### Conclusion