# The impact of location on restaurant ratings on social media

## Introduction

When a user rates a restaurant on Foursquare, they probably take into account the quality of the food and the service, and the setting, but __does the location of a restaurant impact its rating?__

The fact that a restaurant is well located, that is to say easy to access, should increase the number of comments it receives, but it does not necessarily affect its ratings significantly.

In this study, we investigate whether a correlation exists between the location of a restaurant and its global score on Foursquare.

__Target audience:__
If this correlation exists, this study will show the best-rated areas in NYC in which to open a restaurant. The resulting map will help entrepreneurs who want to open a restaurant in NYC choose the area in which to settle.


Before we get the data and start exploring it, let's import all the libraries that we will need.

In [15]:
```python
import json # library to handle JSON files

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import requests # library to handle requests
```

## 1. Download and Explore Dataset

To answer this question, we will analyze the Foursquare ratings of NYC restaurants. For that, we will use the Foursquare API to get the geographical coordinates and ratings of the restaurants around each NYC neighbourhood. Then we will display the results on a map to see whether some areas have significantly higher scores.

NYC has a total of 5 boroughs and 306 neighbourhoods. In order to segment the neighbourhoods and explore them, we need a dataset that contains the 5 boroughs, the neighbourhoods that exist in each borough, and the latitude and longitude coordinates of each neighbourhood.

Luckily, this dataset is freely available on the web: https://geo.nyu.edu/catalog/nyu_2451_34572

In [16]:
```python
# Download data (IPython shell escape)
!wget -q -O 'newyork_data.json' https://cocl.us/new_york_dataset

# Load data
with open('newyork_data.json') as json_data:
    newyork_data = json.load(json_data)

# Get the features
neighbourhoods_data = newyork_data['features']
```
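Each element of `neighbourhoods_data` is a GeoJSON feature. The minimal sketch below uses a hand-made stub containing only the fields this notebook reads (`properties.borough`, `properties.name`, `geometry.coordinates`); the real features carry more fields, and the helper `parse_feature` is hypothetical. It highlights that GeoJSON orders coordinates as `[longitude, latitude]`, which is why the loop further down reads index 1 for the latitude.

```python
# Hand-made stub of one feature; only the fields used in this notebook are included
feature = {
    'type': 'Feature',
    'properties': {'borough': 'Bronx', 'name': 'Wakefield'},
    'geometry': {'type': 'Point', 'coordinates': [-73.847201, 40.894705]},
}

def parse_feature(feature):
    """Return (borough, neighbourhood, latitude, longitude) for one GeoJSON feature."""
    props = feature['properties']
    # GeoJSON order is [longitude, latitude]; ignore an optional third value (altitude)
    lon, lat = feature['geometry']['coordinates'][:2]
    return props['borough'], props['name'], lat, lon

print(parse_feature(feature))  # ('Bronx', 'Wakefield', 40.894705, -73.847201)
```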

#### Transform the data into a *pandas* dataframe

In [17]:
```python
# Define the dataframe columns
column_names = ['Borough', 'Neighbourhood', 'Latitude', 'Longitude']

# Instantiate the dataframe
neighbourhoods = pd.DataFrame(columns = column_names)
```

Then let's loop through the features and add one row per neighbourhood to the dataframe.

In [18]:
```python
# Collect one record per neighbourhood, then build the dataframe in one step
# (DataFrame.append is deprecated and was removed in pandas 2.0)
rows = []
for data in neighbourhoods_data:
    borough = data['properties']['borough']
    neighbourhood_name = data['properties']['name']

    # GeoJSON coordinates are ordered [longitude, latitude]
    neighbourhood_latlon = data['geometry']['coordinates']
    neighbourhood_lat = neighbourhood_latlon[1]
    neighbourhood_lon = neighbourhood_latlon[0]

    rows.append({'Borough': borough,
                 'Neighbourhood': neighbourhood_name,
                 'Latitude': neighbourhood_lat,
                 'Longitude': neighbourhood_lon})

neighbourhoods = pd.DataFrame(rows, columns = column_names)
```

In [19]:
```python
neighbourhoods.head()
```

|   | Borough | Neighbourhood | Latitude | Longitude |
|---|---------|---------------|----------|-----------|
| 0 | Bronx | Wakefield | 40.894705 | -73.847201 |
| 1 | Bronx | Co-op City | 40.874294 | -73.829939 |
| 2 | Bronx | Eastchester | 40.887556 | -73.827806 |
| 3 | Bronx | Fieldston | 40.895437 | -73.905643 |
| 4 | Bronx | Riverdale | 40.890834 | -73.912585 |
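Before querying Foursquare, it may be worth checking that the dataframe matches the 5 boroughs and 306 neighbourhoods mentioned above. A minimal sketch on a small synthetic dataframe (made-up rows, not the real data); the same calls would apply directly to `neighbourhoods`. It also looks for neighbourhood names that repeat across boroughs; the synthetic data repeats one on purpose.

```python
import pandas as pd

# Synthetic stand-in for `neighbourhoods` (made-up rows); 'Chelsea' is repeated on purpose
df = pd.DataFrame({
    'Borough': ['Bronx', 'Bronx', 'Manhattan', 'Staten Island'],
    'Neighbourhood': ['Wakefield', 'Co-op City', 'Chelsea', 'Chelsea'],
    'Latitude': [40.89, 40.87, 40.74, 40.59],
    'Longitude': [-73.85, -73.83, -74.00, -74.19],
})

n_boroughs = df['Borough'].nunique()        # 5 expected on the full dataset
n_neighbourhoods = len(df)                  # 306 expected on the full dataset
per_borough = df['Borough'].value_counts()  # number of neighbourhoods per borough

# Names that appear more than once, possibly in different boroughs
repeated_names = df.loc[df['Neighbourhood'].duplicated(keep=False), 'Neighbourhood'].unique()

print(n_boroughs, n_neighbourhoods)
print(per_borough.to_dict())
print(list(repeated_names))
```

Repeated names matter later: grouping venues by `Neighbourhood` alone would merge two distinct areas into one score.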


## 2. Explore Restaurants by Neighbourhood in NYC

In [20]:
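```python
# The code was removed by Watson Studio for sharing.
```

This hidden cell presumably defined the Foursquare credentials and settings: the code below expects `CLIENT_ID`, `CLIENT_SECRET`, `VERSION` and `LIMIT` to be defined.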

In [21]:
```python
def getNearbyVenues(names, latitudes, longitudes, radius = 500):
    """Return the food venues found within `radius` metres of each neighbourhood centre."""
    venues_list = []
    for name, lat, lng in zip(names, latitudes, longitudes):

        # Create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?categoryId={}&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            "4d4b7105d754a06374d81259", # We only get venues from the food category
            CLIENT_ID,
            CLIENT_SECRET,
            VERSION,
            lat,
            lng,
            radius,
            LIMIT)

        # Make the GET request and keep the list of recommended venues
        results = requests.get(url, timeout = 30).json()["response"]['groups'][0]['items']

        # Keep only the relevant information for each nearby venue
        venues_list.append([(
            name,
            lat,
            lng,
            v['venue']['id'],
            v['venue']['name'],
            v['venue']['location']['lat'],
            v['venue']['location']['lng'],
            v['venue']['categories'][0]['name']) for v in results])

    # Flatten the per-neighbourhood lists into a single dataframe
    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighbourhood',
                             'Neighbourhood Latitude',
                             'Neighbourhood Longitude',
                             'Venue Id',
                             'Venue Name',
                             'Venue Latitude',
                             'Venue Longitude',
                             'Venue Category']

    return nearby_venues
```
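The list comprehension inside `getNearbyVenues` assumes a particular nesting of the explore response. The minimal sketch below runs the same extraction on a hand-made stub shaped like the fields the function reads (`response.groups[0].items[*].venue`); the ids, names and coordinates are invented for illustration, the real response contains many more fields, and `extract_venues` is a hypothetical helper.

```python
import pandas as pd

# Hand-made stub mimicking only the parts of an explore response read by getNearbyVenues
stub = {
    'response': {'groups': [{'items': [
        {'venue': {'id': 'venue-1', 'name': 'Example Diner',
                   'location': {'lat': 40.8983, 'lng': -73.8504},
                   'categories': [{'name': 'Diner'}]}},
        {'venue': {'id': 'venue-2', 'name': 'Example Pizza',
                   'location': {'lat': 40.8950, 'lng': -73.8460},
                   'categories': [{'name': 'Pizza Place'}]}},
    ]}]}
}

COLUMNS = ['Neighbourhood', 'Neighbourhood Latitude', 'Neighbourhood Longitude',
           'Venue Id', 'Venue Name', 'Venue Latitude', 'Venue Longitude', 'Venue Category']

def extract_venues(response_json, name, lat, lng):
    """Flatten one explore response into row tuples, as getNearbyVenues does."""
    items = response_json['response']['groups'][0]['items']
    return [(name, lat, lng,
             v['venue']['id'],
             v['venue']['name'],
             v['venue']['location']['lat'],
             v['venue']['location']['lng'],
             v['venue']['categories'][0]['name']) for v in items]

rows = extract_venues(stub, 'Wakefield', 40.894705, -73.847201)
venues = pd.DataFrame(rows, columns=COLUMNS)
print(venues[['Venue Name', 'Venue Category']])
```

Passing `columns=` to the constructor also covers a case the original does not: if no venue at all is returned, `pd.DataFrame([])` has no columns, so assigning eight column names to it afterwards raises a length-mismatch error.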

In [22]:
```python
NYC_venues = getNearbyVenues(names = neighbourhoods['Neighbourhood'],
                             latitudes = neighbourhoods['Latitude'],
                             longitudes = neighbourhoods['Longitude'])
```

In [24]:
```python
# Add a 'rating' column with 'none' as a placeholder value
NYC_venues['rating'] = 'none'

# Keep only the venues whose category name contains the term "Restaurant"
NYC_venues = NYC_venues[NYC_venues['Venue Category'].str.contains('Restaurant')].reset_index(drop = True)
```

In [25]:
```python
NYC_venues.shape
```

(4365, 9)
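Because the 500 m search circles of neighbouring centres can overlap, the same venue may be returned for several neighbourhoods. Counting repeated `Venue Id` values (a check not performed in the original analysis) would show how many of these 4365 rows are duplicates, and dropping them before fetching ratings would save API calls. A minimal sketch on synthetic rows:

```python
import pandas as pd

# Synthetic rows: venue 'a1' is returned for two neighbouring centres
venues = pd.DataFrame({
    'Neighbourhood': ['North', 'South', 'South'],
    'Venue Id': ['a1', 'a1', 'b2'],
    'Venue Name': ['Venue A', 'Venue A', 'Venue B'],
})

# Number of rows whose Venue Id already appeared earlier
n_duplicates = venues['Venue Id'].duplicated().sum()

# Keep the first occurrence of each venue
unique_venues = venues.drop_duplicates(subset='Venue Id', keep='first').reset_index(drop=True)

print(n_duplicates)        # 1
print(len(unique_venues))  # 2
```

Keeping the first occurrence assigns each shared venue to one neighbourhood somewhat arbitrarily; whether that matters depends on how neighbourhood scores are computed later.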

In [26]:
```python
NYC_venues.head()
```

|   | Neighbourhood | Neighbourhood Latitude | Neighbourhood Longitude | Venue Id | Venue Name | Venue Latitude | Venue Longitude | Venue Category | rating |
|---|---|---|---|---|---|---|---|---|---|
| 0 | Wakefield | 40.894705 | -73.847201 | 508af256e4b0578944c87392 | Cooler Runnings Jamaican Restaurant Inc | 40.898276 | -73.850381 | Caribbean Restaurant | none |
| 1 | Co-op City | 40.874294 | -73.829939 | 5bc797c82d2fd9002c1b3ec6 | Arby's | 40.870411 | -73.828606 | Fast Food Restaurant | none |
| 2 | Co-op City | 40.874294 | -73.829939 | 4be2b79d660ec9284d04ca3b | Townhouse Restaurant | 40.876086 | -73.828868 | Restaurant | none |
| 3 | Co-op City | 40.874294 | -73.829939 | 4c9d5f2654c8a1cd2e71834b | Guang Hui Chinese Restaurant | 40.876603 | -73.82971 | Chinese Restaurant | none |
| 4 | Co-op City | 40.874294 | -73.829939 | 4c28fb31fe6e2d7fa81e543c | Kennedy's | 40.876807 | -73.829627 | Fast Food Restaurant | none |


In [27]:
```python
# Save NYC geographical data into a csv file
NYC_venues.to_csv('NYC_data.csv')
```

In [28]:
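```python
import os

# Reload the saved csv file, if it exists, so the API calls above need not be repeated
if os.path.exists('NYC_data.csv'):
    NYC_venues = pd.read_csv('NYC_data.csv', index_col = 0)
```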

### Get each restaurant's rating from the Foursquare API

In [87]:
```python
# Iterate over the rows with iterrows() to add the associated global score
for index, row in NYC_venues.iterrows():
    # Only query venues whose rating has not been fetched yet
    if row['rating'] == 'none':
        venue_id = row['Venue Id']

        # Prepare the request to the venue details endpoint
        url = 'https://api.foursquare.com/v2/venues/{}?client_id={}&client_secret={}&v={}'.format(
            venue_id,
            CLIENT_ID,
            CLIENT_SECRET,
            VERSION)

        # Make the GET request
        results = requests.get(url, timeout = 30).json()

        # Stop as soon as the API returns an error
        if results['meta']['code'] == 200:

            # If the venue has no rating, set the value to 0
            rating = results['response']['venue'].get('rating', 0)

            # Add the rating to the dataframe
            NYC_venues.loc[index, 'rating'] = rating
        else:
            print('Request failed with code {}, probably because the API limit was exceeded'.format(
                results['meta']['code']))
            break

print('finished')

# Update the dataframe saved on disk
NYC_venues.to_csv('NYC_data.csv')
print('saved')

# Save a backup
NYC_venues.to_csv('Backup.csv')
```


### Import Data from CSV

As I am using a free account to access the Foursquare API, I exceeded the daily request limit several times. So I ran the previous cell several times and saved the results to a csv file.
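This rerun strategy works because the loop only queries rows still marked `'none'` and stops at the first failed request. The minimal sketch below isolates that logic so it can be run without an API key: `fill_ratings` and `make_stub` are hypothetical helpers, and the stub stands in for the Foursquare call by answering a fixed number of requests before failing, mimicking a daily quota.

```python
import pandas as pd

def fill_ratings(df, fetch_rating):
    """Fill 'none' ratings in place until fetch_rating returns None (request failed).

    fetch_rating is a hypothetical callable: venue id -> rating (0 if unrated) or None on error.
    Returns the number of ratings filled during this run.
    """
    filled = 0
    for index, row in df.iterrows():
        if row['rating'] != 'none':
            continue  # already fetched during a previous run
        rating = fetch_rating(row['Venue Id'])
        if rating is None:
            break  # quota probably exceeded: stop and keep what we have
        df.loc[index, 'rating'] = rating
        filled += 1
    return filled

def make_stub(quota):
    """Hypothetical stand-in for the API: answers `quota` requests, then fails."""
    calls = {'n': 0}
    def fetch_rating(venue_id):
        calls['n'] += 1
        return 7.5 if calls['n'] <= quota else None
    return fetch_rating

venues = pd.DataFrame({'Venue Id': ['a', 'b', 'c'], 'rating': ['none'] * 3})
first = fill_ratings(venues, make_stub(quota=2))   # first "day": fills 'a' and 'b'
second = fill_ratings(venues, make_stub(quota=2))  # rerun resumes at 'c'
print(first, second, list(venues['rating']))
```

Saving the dataframe to disk after each run, as the notebook does, is what lets the next run pick up where the previous one stopped.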

In [29]:
```python
# The code was removed by Watson Studio for sharing.
```

|   | Unnamed: 0 | Neighbourhood | Neighbourhood Latitude | Neighbourhood Longitude | Venue Id | Venue Name | Venue Latitude | Venue Longitude | Venue Category | rating |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | Wakefield | 40.894705 | -73.847201 | 508af256e4b0578944c87392 | Cooler Runnings Jamaican Restaurant Inc | 40.898276 | -73.850381 | Caribbean Restaurant | 6.8 |
| 1 | 1 | Co-op City | 40.874294 | -73.829939 | 5bc797c82d2fd9002c1b3ec6 | Arby's | 40.870411 | -73.828606 | Fast Food Restaurant | 6.5 |
| 2 | 2 | Co-op City | 40.874294 | -73.829939 | 4be2b79d660ec9284d04ca3b | Townhouse Restaurant | 40.876086 | -73.828868 | Restaurant | 5.8 |
| 3 | 3 | Co-op City | 40.874294 | -73.829939 | 4c9d5f2654c8a1cd2e71834b | Guang Hui Chinese Restaurant | 40.876603 | -73.82971 | Chinese Restaurant | 0.0 |
| 4 | 4 | Co-op City | 40.874294 | -73.829939 | 4c28fb31fe6e2d7fa81e543c | Kennedy's | 40.876807 | -73.829627 | Fast Food Restaurant | 0.0 |


### Data cleanup

In [30]:
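```python
# Drop the id column, which is no longer needed
# (the positional axis argument of drop() was removed in pandas 2.0)
NYC_venues = NYC_venues.drop(columns = 'Venue Id')

# Convert ratings to numbers; the 'none' placeholder becomes NaN
NYC_venues['rating'] = pd.to_numeric(NYC_venues['rating'], errors = 'coerce')

# Drop rows without a rating ('none', now NaN) or with a rating of 0 (unrated venue)
NYC_venues = NYC_venues[NYC_venues['rating'] > 0]
```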

#### Use the geopy library to get the latitude and longitude of NYC

In [31]:
```python
# Convert an address into latitude and longitude values
from geopy.geocoders import Nominatim

address = 'New York, NY'

geolocator = Nominatim(user_agent = "ny_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of NYC are {}, {}.'.format(latitude, longitude))
```

The geographical coordinates of NYC are 40.7127281, -74.0060152.


#### Create a map of NYC with restaurants superimposed on top.

In [5]:
```python
# Install and import the folium library (IPython shell escape)
!conda install -c conda-forge folium=0.5.0 --yes
import folium
```


In [32]:
```python
# Get the min and max ratings (converted to numbers in case the column holds strings)
ratings = pd.to_numeric(NYC_venues['rating'])
min_rating = float(ratings.min())
max_rating = float(ratings.max())

# Function that returns the marker opacity for a given rating:
# ratings are rescaled linearly from [min_rating, max_rating] to [0.1, 1.0]
def getOpacity(rating):
    rating = float(rating)
    opacity = (((rating - min_rating) * (1 - 0.1)) / (max_rating - min_rating)) + 0.1
    return opacity
```
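`getOpacity` is a linear min-max rescaling of the rating onto [0.1, 1.0], that is opacity = 0.1 + 0.9 · (r − r_min) / (r_max − r_min); the floor of 0.1 keeps the lowest-rated restaurants faintly visible. The minimal sketch below is a standalone version with the bounds as parameters (the name `rescale_opacity` is hypothetical), plus a guard that the original lacks for the degenerate case where all ratings are equal, which would otherwise divide by zero.

```python
def rescale_opacity(rating, min_rating, max_rating, low=0.1, high=1.0):
    """Map a rating linearly from [min_rating, max_rating] to [low, high]."""
    rating = float(rating)
    if max_rating == min_rating:
        return high  # all ratings equal: there is no contrast to show
    return low + (rating - min_rating) * (high - low) / (max_rating - min_rating)

# The lowest rating maps to 0.1, the highest to 1.0, the midpoint to about 0.55
print(rescale_opacity(5.0, 5.0, 9.0))
print(rescale_opacity(9.0, 5.0, 9.0))
print(rescale_opacity(7.0, 5.0, 9.0))
```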

## 3. Create maps to visually identify the best-rated restaurants in NYC

In [33]:
```python
# Create a map of NYC using its latitude and longitude values
# (Stamen tiles are now served by Stadia Maps and may no longer load under this name;
#  'cartodbpositron' is a possible substitute)
map_NYC = folium.Map(location = [latitude, longitude], zoom_start = 10, tiles = 'Stamen Toner')

# Add one circle marker per restaurant; the better the rating, the more opaque the marker
for lat, lng, name, rating in zip(NYC_venues['Venue Latitude'],
                                  NYC_venues['Venue Longitude'],
                                  NYC_venues['Venue Name'],
                                  NYC_venues['rating']):
    #label = '{}, {}'.format(name, rating)
    #label = folium.Popup(label, parse_html = True)
    folium.CircleMarker(
        [lat, lng],
        radius = 2,
        #popup = label,
        color = 'none',
        fill = True,
        fill_color = '#FF0000',
        fill_opacity = getOpacity(rating)).add_to(map_NYC)

map_NYC
```

On this map, the higher a restaurant's rating, the more opaque (deeper red) its marker. So we can see both where each restaurant is located and how good its rating is.
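The map only supports a visual judgement. One way to put a number on "does location matter" (an approach not used in the original study) is the correlation ratio η², the share of the rating variance explained by the neighbourhood a restaurant belongs to: η² = Σ_g n_g (ȳ_g − ȳ)² / Σ_i (y_i − ȳ)². It ranges from 0 (neighbourhood explains nothing) to 1 (all variation lies between neighbourhoods). A minimal sketch on synthetic ratings; `correlation_ratio` is a hypothetical helper.

```python
import pandas as pd

def correlation_ratio(groups, values):
    """Correlation ratio eta^2: between-group sum of squares / total sum of squares."""
    df = pd.DataFrame({'group': groups, 'value': values})
    grand_mean = df['value'].mean()
    stats = df.groupby('group')['value'].agg(['mean', 'count'])
    ss_between = (stats['count'] * (stats['mean'] - grand_mean) ** 2).sum()
    ss_total = ((df['value'] - grand_mean) ** 2).sum()
    return ss_between / ss_total if ss_total > 0 else 0.0

# Synthetic example: neighbourhood A is rated higher than neighbourhood B
groups = ['A', 'A', 'B', 'B']
ratings = [8.0, 9.0, 6.0, 7.0]
print(correlation_ratio(groups, ratings))  # 0.8
```

On the notebook data this would be called as `correlation_ratio(NYC_venues['Neighbourhood'], pd.to_numeric(NYC_venues['rating']))`. With hundreds of neighbourhoods and few restaurants in each, η² is biased upward, so it is best compared against a baseline such as the same statistic computed after shuffling the ratings.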

In [11]:
```python
from folium.plugins import FastMarkerCluster

# Initialise the folium map object
my_map = folium.Map(location = [latitude, longitude], zoom_start = 10, tiles = 'Stamen Toner')

# Add all the restaurant locations to the map, clustered with FastMarkerCluster
my_map.add_child(FastMarkerCluster(NYC_venues[['Venue Latitude', 'Venue Longitude']].values.tolist()))

# Display the map
my_map
```

On this map, we can easily see how the restaurants of NYC are distributed.

In [38]:
```python
# Preview the cleaned data
NYC_venues.head()
```

|   | Unnamed: 0 | Neighbourhood | Neighbourhood Latitude | Neighbourhood Longitude | Venue Name | Venue Latitude | Venue Longitude | Venue Category | rating |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | Wakefield | 40.894705 | -73.847201 | Cooler Runnings Jamaican Restaurant Inc | 40.898276 | -73.850381 | Caribbean Restaurant | 6.8 |
| 1 | 1 | Co-op City | 40.874294 | -73.829939 | Arby's | 40.870411 | -73.828606 | Fast Food Restaurant | 6.5 |
| 2 | 2 | Co-op City | 40.874294 | -73.829939 | Townhouse Restaurant | 40.876086 | -73.828868 | Restaurant | 5.8 |
| 6 | 6 | Eastchester | 40.887556 | -73.827806 | Fish & Ting | 40.885539 | -73.829151 | Caribbean Restaurant | 8.4 |
| 7 | 7 | Eastchester | 40.887556 | -73.827806 | Dyre Fish Market | 40.889318 | -73.831453 | Seafood Restaurant | 7.9 |

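The cell above only previews the data. Scoring each neighbourhood could be sketched as a `groupby` on the rating that reports the mean and the number of rated restaurants; grouping on the neighbourhood coordinates as well keeps apart neighbourhoods that share a name. The rows below are synthetic, and `score_by_neighbourhood` and its `min_count` threshold are hypothetical: the threshold filters out neighbourhoods whose mean rests on very few restaurants.

```python
import pandas as pd

# Synthetic stand-in for the cleaned NYC_venues dataframe
venues = pd.DataFrame({
    'Neighbourhood': ['A', 'A', 'A', 'B', 'C', 'C'],
    'Neighbourhood Latitude': [40.80, 40.80, 40.80, 40.70, 40.60, 40.60],
    'Neighbourhood Longitude': [-73.90, -73.90, -73.90, -73.95, -74.00, -74.00],
    'rating': [8.0, 7.0, 9.0, 9.5, 6.0, 7.0],
})

def score_by_neighbourhood(df, min_count=2):
    """Mean rating and restaurant count per neighbourhood, best-rated first."""
    keys = ['Neighbourhood', 'Neighbourhood Latitude', 'Neighbourhood Longitude']
    scores = (df.assign(rating=pd.to_numeric(df['rating']))
                .groupby(keys)['rating']
                .agg(mean_rating='mean', n_restaurants='count')
                .reset_index())
    # Ignore neighbourhoods with too few rated restaurants
    scores = scores[scores['n_restaurants'] >= min_count]
    return scores.sort_values('mean_rating', ascending=False).reset_index(drop=True)

print(score_by_neighbourhood(venues))
```

Without the threshold, B (a single restaurant rated 9.5) would top the ranking, which illustrates why small samples deserve caution when ranking areas.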

In [36]:
# Score by borough