# Restaurant Recommendation System

## Introduction

This document explores a recommendation system that suggests restaurants to people in the neighborhoods of New York. It describes how we learn to predict a user's preferences and make recommendations based on historical data about their past preferences and ratings, together with data about the restaurants available in the neighborhoods of New York.

## Table of Contents  

1. Business Problem
2. Data Requirements
3. Data Collection

## Business Problem

The number of people travelling and exploring new places across the globe has grown considerably. People visiting a new place want recommendations about things to do, places to see and places to dine in. This has increased the popularity of recommendation websites and apps such as Yelp.

Recommending restaurants to people who are travelling or visiting a new place is a valuable feature of such a recommendation system. In this document we will try to build a model that recommends restaurants to users in the neighborhoods of New York. The recommendations are based on what is popular in a given place, as measured by the ratings and reviews provided by other people, and on the user's past preferences in terms of the cuisines they like to eat.

We will explore and tune the algorithms that recommend the top restaurants to the user, and try to validate the accuracy of the model. The model takes into account both the preferences of the user and the popularity of the restaurants in the neighborhoods of New York.

## Data Requirements

To make recommendations, we need data that will form the basis for them.

How can we recommend restaurants to a user? We first need to understand the user's preferences in terms of the kind of food they like to eat.

Is it enough to understand the user's preferences? No. In addition to the preferences of the user for whom we are making the recommendation, we also need to identify the neighborhoods of New York, get the list of restaurants in those neighborhoods, and understand how popular each restaurant is so that we know which ones are good and which ones are not.

## Data Collection

There is a variety of data that we need for building our recommendation system and this will need to be collected from various sources as identified below.

1) Neighborhood data for New York:
    New York City has a total of 5 boroughs and 306 neighborhoods. We need a dataset that contains the 5 boroughs, the neighborhoods in each borough, and the latitude and longitude coordinates of each neighborhood. This dataset is provided by New York University and is available for free on the web. We will download it and use it as our neighborhood data.

2) Restaurant data for New York:
    We will use the Places API provided by Foursquare to gather location data about restaurants. To get the location data for the restaurants in the neighborhoods of New York, the API needs the latitude and longitude coordinates of each neighborhood. These coordinates are available as part of the neighborhood data collected in 1).

3) Ratings data for the restaurants:
    We will again use the Places API provided by Foursquare to gather the ratings for the list of restaurants collected in 2).

### 1) Neighborhood data for New York:

We will download the dataset pertaining to the neighborhoods of New York from ***https://cocl.us/new_york_dataset***

In [2]:
!wget -q -O 'New_York_Neighborhood_Data.json' https://cocl.us/new_york_dataset
print('Data downloaded!')

Data downloaded!


Let us load the neighborhood data from the json file and take a quick look at the data in the file.

In [7]:
# import the json library for handling json files
import json

# Load the data from the json file
with open ('New_York_Neighborhood_Data.json') as json_data:
    ny_neighborhood_data = json.load(json_data)

json.load() returns a dictionary in which the list of neighborhoods is stored under the `features` key. Let us take a look at the data pertaining to one neighborhood from this list.

In [8]:
ny_neighborhood_data['features'][0]

{'geometry': {'coordinates': [-73.84720052054902, 40.89470517661],
  'type': 'Point'},
 'geometry_name': 'geom',
 'id': 'nyu_2451_34572.1',
 'properties': {'annoangle': 0.0,
  'annoline1': 'Wakefield',
  'annoline2': None,
  'annoline3': None,
  'bbox': [-73.84720052054902,
   40.89470517661,
   -73.84720052054902,
   40.89470517661],
  'borough': 'Bronx',
  'name': 'Wakefield',
  'stacked': 1},
 'type': 'Feature'}
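
Note that the `coordinates` list follows the GeoJSON convention of longitude first, then latitude. A minimal sketch of pulling the four fields of interest out of one feature is shown below. `feature_to_row` is a hypothetical helper name, and the sample dictionary is a trimmed copy of the output above.

```python
def feature_to_row(feature):
    """Flatten one GeoJSON feature into the fields used in this report."""
    # GeoJSON Point coordinates are ordered [longitude, latitude]
    longitude, latitude = feature['geometry']['coordinates']
    return {
        'Borough': feature['properties']['borough'],
        'Neighborhood': feature['properties']['name'],
        'Latitude': latitude,
        'Longitude': longitude,
    }


# Trimmed copy of the Wakefield feature shown above
sample_feature = {
    'geometry': {'coordinates': [-73.84720052054902, 40.89470517661], 'type': 'Point'},
    'properties': {'borough': 'Bronx', 'name': 'Wakefield'},
    'type': 'Feature',
}

row = feature_to_row(sample_feature)
print(row)
```

Unpacking the pair by name makes it harder to swap latitude and longitude by accident than indexing with `[0]` and `[1]`.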

Let us read the data for all the neighborhoods of New York from this list and load it into a pandas dataframe.

In [9]:
# import the pandas library
import pandas as pd

# extract the list of neighborhoods
list_neighborhood_data = ny_neighborhood_data['features']

# Collect one dictionary per neighborhood; building the dataframe once at the end
# avoids DataFrame.append, which was removed in pandas 2.0
neighborhood_rows = []

for feature in list_neighborhood_data:
    # GeoJSON stores coordinates as [longitude, latitude]
    longitude, latitude = feature['geometry']['coordinates']
    neighborhood_rows.append({'Borough': feature['properties']['borough'],
                              'Neighborhood': feature['properties']['name'],
                              'Latitude': latitude,
                              'Longitude': longitude})

# Create the data frame holding the neighborhood data
df_ny_neighborhood_data = pd.DataFrame(neighborhood_rows,
                                       columns=['Borough', 'Neighborhood', 'Latitude', 'Longitude'])

print ('Shape of dataframe is ', df_ny_neighborhood_data.shape)
df_ny_neighborhood_data.head()

Shape of dataframe is  (306, 4)


| | Borough | Neighborhood | Latitude | Longitude |
|---|---|---|---|---|
| 0 | Bronx | Wakefield | 40.894705 | -73.847201 |
| 1 | Bronx | Co-op City | 40.874294 | -73.829939 |
| 2 | Bronx | Eastchester | 40.887556 | -73.827806 |
| 3 | Bronx | Fieldston | 40.895437 | -73.905643 |
| 4 | Bronx | Riverdale | 40.890834 | -73.912585 |
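
As an alternative to an explicit loop, `pandas.json_normalize` can flatten the feature list in one call. The sketch below runs on a two-feature sample shaped like the data above (coordinates rounded as in the table) rather than on the downloaded file, and the variable names are illustrative.

```python
import pandas as pd

# Two sample features with the same structure as the downloaded data
sample_features = [
    {'geometry': {'coordinates': [-73.847201, 40.894705], 'type': 'Point'},
     'properties': {'borough': 'Bronx', 'name': 'Wakefield'}},
    {'geometry': {'coordinates': [-73.829939, 40.874294], 'type': 'Point'},
     'properties': {'borough': 'Bronx', 'name': 'Co-op City'}},
]

# Nested dictionaries become dotted column names such as 'properties.borough';
# the coordinates list is kept as a list in a single column
flat = pd.json_normalize(sample_features)

df_sample = pd.DataFrame({
    'Borough': flat['properties.borough'],
    'Neighborhood': flat['properties.name'],
    # coordinates are [longitude, latitude]
    'Latitude': flat['geometry.coordinates'].apply(lambda c: c[1]),
    'Longitude': flat['geometry.coordinates'].apply(lambda c: c[0]),
})

print(df_sample)
```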


### 2) Restaurant Data for New York from Foursquare

Let us define the credentials for accessing the Foursquare API.

In [10]:
CLIENT_ID = 'YOUR_FOURSQUARE_CLIENT_ID' # your Foursquare ID
CLIENT_SECRET = 'YOUR_FOURSQUARE_CLIENT_SECRET' # your Foursquare Secret
VERSION = '20180604' # version of Foursquare API to be used
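
Credentials written directly into a shared notebook are visible to every reader. One common alternative is to read them from environment variables. The sketch below is only an illustration: the function name and the two variable names are hypothetical, and the demo values are not real credentials.

```python
import os


def load_foursquare_credentials(env=os.environ):
    """Return (client_id, client_secret) read from a mapping of environment variables.

    The variable names used here are assumptions; use whatever names you exported.
    """
    try:
        return env['FOURSQUARE_CLIENT_ID'], env['FOURSQUARE_CLIENT_SECRET']
    except KeyError as missing:
        raise RuntimeError('Missing environment variable: {}'.format(missing)) from None


# Demonstration with an explicit mapping instead of the real environment
demo_env = {'FOURSQUARE_CLIENT_ID': 'demo-id', 'FOURSQUARE_CLIENT_SECRET': 'demo-secret'}
demo_id, demo_secret = load_foursquare_credentials(demo_env)
print(demo_id, demo_secret)
```

Calling `load_foursquare_credentials()` with no argument reads from the real environment and raises a `RuntimeError` if either variable is not set.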

Using the Search API, let us search for restaurants in the neighborhoods of New York and load the restaurants data into a pandas dataframe. 

In [11]:
# import library to handle requests
import requests

# URL parameters
RADIUS = 2000
QUERY = 'restaurant'
LIMIT = 50
SEARCH_URL = ('https://api.foursquare.com/v2/venues/search'
              '?client_id={}&client_secret={}&ll={},{}&v={}&query={}&radius={}&limit={}')

# collect one dictionary per restaurant; the dataframe is built once after the loop
# because DataFrame.append was removed in pandas 2.0
restaurant_rows = []

# Loop through the New York neighborhood data
for index, row in df_ny_neighborhood_data.iterrows():
    # retrieve neighborhood details from the dataframe for current row
    borough = row['Borough']
    neighborhood = row['Neighborhood']
    LATITUDE = row['Latitude']
    LONGITUDE = row['Longitude']

    # fetch the results from the Search API for the neighborhood
    url = SEARCH_URL.format(CLIENT_ID, CLIENT_SECRET, LATITUDE, LONGITUDE,
                            VERSION, QUERY, RADIUS, LIMIT)
    results = requests.get(url).json()

    # Loop through the list of restaurants retrieved from Foursquare for the neighborhood
    for venue in results['response']['venues']:
        location = venue['location']
        categories = venue['categories']

        # Add the data row to the list of restaurant rows
        restaurant_rows.append({
            'Borough': borough,
            'Neighborhood': neighborhood,
            'Neigh_Latitude': LATITUDE,
            'Neigh_Longitude': LONGITUDE,
            'Restaurant_Name': venue['name'],
            'Restaurant_City': location.get('city', ''),  # some venues have no city
            'Restaurant_Address': location['formattedAddress'],
            'Restaurant_Latitude': location['lat'],
            'Restaurant_Longitude': location['lng'],
            # use the first category when the venue has any
            'Restaurant_Category': categories[0]['name'] if categories else '',
            'Foursquare_Venue_Id': venue['id']})

# create a pandas dataframe for storing New York neighborhoods and restaurants data in one table
df_newyork_restaurants = pd.DataFrame(restaurant_rows,
                                      columns=['Borough', 'Neighborhood', 'Neigh_Latitude', 'Neigh_Longitude',
                                               'Restaurant_Name', 'Restaurant_City', 'Restaurant_Address',
                                               'Restaurant_Latitude', 'Restaurant_Longitude',
                                               'Restaurant_Category', 'Foursquare_Venue_Id'])

# Print the first 2 rows of the Restaurants dataframe
df_newyork_restaurants.head(2)

| | Borough | Neighborhood | Neigh_Latitude | Neigh_Longitude | Restaurant_Name | Restaurant_City | Restaurant_Address | Restaurant_Latitude | Restaurant_Longitude | Restaurant_Category | Foursquare_Venue_Id |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Bronx | Wakefield | 40.894705 | -73.847201 | Big Daddy's Caribbean Taste Restaurant | Bronx | [4406 White Plains Rd (Nereid Avenue), Bronx, ... | 40.899767 | -73.857135 | Caribbean Restaurant | 4db03c875da32cf2df4509f4 |
| 1 | Bronx | Wakefield | 40.894705 | -73.847201 | Red Flower Chinese Restaurant | Bronx | [4733 White Plains Rd, Bronx, NY 10470, United... | 40.904359 | -73.849795 | Chinese Restaurant | 4e4de62abd4101d0d79dae8c |
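
The request cell above assembles the query string by hand with `str.format`. A sketch of the same request URL built with the standard library's `urlencode`, which also takes care of escaping, is shown below. `build_search_url` and the demo credentials are hypothetical; the endpoint and the parameter names are the ones used in the cell above. No network call is made.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

SEARCH_ENDPOINT = 'https://api.foursquare.com/v2/venues/search'


def build_search_url(latitude, longitude, client_id, client_secret,
                     version='20180604', query='restaurant', radius=2000, limit=50):
    """Return the venue search URL for one neighborhood centre."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'll': '{},{}'.format(latitude, longitude),  # "latitude,longitude"
        'v': version,
        'query': query,
        'radius': radius,  # search radius in metres
        'limit': limit,    # maximum number of venues per request
    }
    return SEARCH_ENDPOINT + '?' + urlencode(params)


# Build the URL for the Wakefield coordinates and decode it again to inspect the parameters
demo_url = build_search_url(40.894705, -73.847201, 'demo-id', 'demo-secret')
demo_params = parse_qs(urlsplit(demo_url).query)
print(demo_params)
```

Keeping the parameters in a dictionary makes it easier to change the radius or limit for a single call without editing a long format string.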


In [12]:
df_newyork_restaurants.shape

(12892, 11)

12892 rows seems very high for the number of restaurants in New York City. Since each search covers a 2000 m radius, searches for adjacent neighborhoods can overlap and return the same restaurant more than once. Let us check whether the data set contains duplicate entries for restaurants.

In [14]:
len(df_newyork_restaurants['Foursquare_Venue_Id'].unique())

3759

We can see that our guess was right: there are only 3759 unique restaurants in the data set. Let us clean up the restaurants data set by removing the duplicates from the dataframe.

In [16]:
# create a copy of the restaurants dataframe for storing the new list without the duplicate restaurants
df_ny_restaurants = df_newyork_restaurants.drop_duplicates(subset='Foursquare_Venue_Id', keep='first', inplace=False)
print (df_ny_restaurants.shape)
df_ny_restaurants.head(2)

(3759, 11)


| | Borough | Neighborhood | Neigh_Latitude | Neigh_Longitude | Restaurant_Name | Restaurant_City | Restaurant_Address | Restaurant_Latitude | Restaurant_Longitude | Restaurant_Category | Foursquare_Venue_Id |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Bronx | Wakefield | 40.894705 | -73.847201 | Big Daddy's Caribbean Taste Restaurant | Bronx | [4406 White Plains Rd (Nereid Avenue), Bronx, ... | 40.899767 | -73.857135 | Caribbean Restaurant | 4db03c875da32cf2df4509f4 |
| 1 | Bronx | Wakefield | 40.894705 | -73.847201 | Red Flower Chinese Restaurant | Bronx | [4733 White Plains Rd, Bronx, NY 10470, United... | 40.904359 | -73.849795 | Chinese Restaurant | 4e4de62abd4101d0d79dae8c |
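
A likely reason for the duplicates is that neighborhood centres can lie closer together than the search radius, so their search circles overlap. The sketch below checks this for two of the Bronx neighborhoods listed earlier, using the haversine great-circle formula. The function name and the spherical Earth radius of 6,371 km are assumptions made for this illustration.

```python
import math


def haversine_m(lat1, lon1, lat2, lon2, radius_m=6371000.0):
    """Great-circle distance in metres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_m * math.asin(math.sqrt(a))


SEARCH_RADIUS_M = 2000  # same radius as the search requests

# Coordinates taken from the neighborhood dataframe
wakefield = (40.894705, -73.847201)
eastchester = (40.887556, -73.827806)

distance_m = haversine_m(*wakefield, *eastchester)

# Two circles of equal radius overlap when their centres are less than two radii apart
circles_overlap = distance_m < 2 * SEARCH_RADIUS_M
print(round(distance_m), circles_overlap)
```

The two centres are a little under 2 km apart, so a restaurant located between them falls inside both search circles and is returned by both requests.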


Now we have the list of unique restaurants in the neighborhoods of New York.

### 3) Restaurant Ratings from Foursquare

We will use the Venue Details API from Foursquare for getting the ratings for each of these restaurants.

In [18]:
# Loop through the list of restaurants to find the rating for each
ratings = []
for VENUE_ID in df_ny_restaurants['Foursquare_Venue_Id']:
    # Use the Foursquare Venue Details API to fetch the venue details
    details_url = 'https://api.foursquare.com/v2/venues/{}?client_id={}&client_secret={}&v={}'.format(
        VENUE_ID, CLIENT_ID, CLIENT_SECRET, VERSION)
    venue_details = requests.get(details_url).json()

    # Read the rating for the venue from the API result;
    # fall back to 0.0 when the response does not contain a rating
    try:
        rating = venue_details['response']['venue']['rating']
    except (KeyError, TypeError):
        rating = 0.0
    ratings.append(rating)

# populate the ratings onto the restaurants dataframe; assign() returns a new dataframe,
# so pandas does not warn about setting values on a copy of a slice
df_ny_restaurants = df_ny_restaurants.assign(Rating=ratings)

df_ny_restaurants.head(2)


| | Borough | Neighborhood | Neigh_Latitude | Neigh_Longitude | Restaurant_Name | Restaurant_City | Restaurant_Address | Restaurant_Latitude | Restaurant_Longitude | Restaurant_Category | Foursquare_Venue_Id | Rating |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Bronx | Wakefield | 40.894705 | -73.847201 | Big Daddy's Caribbean Taste Restaurant | Bronx | [4406 White Plains Rd (Nereid Avenue), Bronx, ... | 40.899767 | -73.857135 | Caribbean Restaurant | 4db03c875da32cf2df4509f4 | 0.0 |
| 1 | Bronx | Wakefield | 40.894705 | -73.847201 | Red Flower Chinese Restaurant | Bronx | [4733 White Plains Rd, Bronx, NY 10470, United... | 40.904359 | -73.849795 | Chinese Restaurant | 4e4de62abd4101d0d79dae8c | 0.0 |
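
The loop above stores 0.0 whenever the API response carries no rating, so a zero in the `Rating` column means "unknown" rather than "worst possible". If the ratings are later averaged or ranked, it may be safer to treat those zeros as missing values. A small sketch of that idea follows; the three restaurants and their ratings are invented for illustration and are not Foursquare data.

```python
import pandas as pd

# Made-up ratings: the first restaurant has the 0.0 placeholder for "no rating"
demo_ratings = pd.DataFrame({'Restaurant_Name': ['A', 'B', 'C'],
                             'Rating': [0.0, 8.0, 6.0]})

# Averaging with the placeholder included drags the mean down
naive_mean = demo_ratings['Rating'].mean()

# Replace the placeholder with NaN so that pandas skips it in aggregations
demo_ratings['Rating'] = demo_ratings['Rating'].replace(0.0, float('nan'))
clean_mean = demo_ratings['Rating'].mean()

print(naive_mean, clean_mean)
```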
