# Recommending restaurants to a user based on their eating habits:

*This notebook is for the capstone project of the IBM Data Science program on Coursera.*

## Introduction / Business problem:

According to a 2018 count, there are 29,560 restaurants in Los Angeles. That is a lot of restaurants to choose from, varying in cuisine, location and quality. Finding a restaurant that you will like is a hard and time-consuming task. Websites such as Yelp can help by showing you some of the best-rated restaurants in your area. However, just because other people have enjoyed a restaurant does not necessarily mean you will enjoy it too. That is why there is a need for a more personalized system for recommending restaurants. The recommendation system is tailored to an individual user, so the end user is an individual looking for restaurant recommendations.

In this notebook I will do just that, creating a recommendation system based on the reviews and ratings a user has previously left at some of their favorite restaurants in Los Angeles. The recommender will take into account the cuisine, quality and location of the restaurants and personalize the recommendations to the user.

## Data:

There will be three main datasets used for this project:

* **Postal codes and areas of Los Angeles - df_zips:**  
A list of postal codes in the Greater Los Angeles area along with the sub-area of Los Angeles each one belongs to. This will be used to look up the area of a given venue from its zip code. The data will be scraped with the BeautifulSoup library from the website http://www.laalmanac.com/communications/cm02_communities.php

* **Restaurants - df_rests:**  
This dataset will contain a list of restaurants in the Greater Los Angeles area along with their location and type of cuisine. The final recommended restaurants will come from this list. The data will be retrieved from the Foursquare API. Once the user data has been evaluated and a list of restaurants has been predicted based on location and cuisine, more details such as rating and number of check-ins will be added to the df_rests table. These attributes will be used together with the user's data to rank the restaurants.

* **User's restaurant and review history - df_user:**  
This will contain the user's reviews and ratings, along with the location of each restaurant they have visited. It serves as an example of a unique user and provides the input the recommendation system will use to predict restaurants. This dataset is a fictional example of an individual user; in a real-world production scenario it would be replaced with data from the Foursquare users API, which allows a user ID to be passed in order to get that user's data. Unfortunately, I do not have a list of Foursquare user IDs, so this fictional dataset will suffice for this project.


In [66]:
# Pre-reqs
import pandas as pd
import numpy as np
from urllib.request import Request, urlopen
from bs4 import BeautifulSoup as soup
import requests

In [55]:
# Creating the df_zips dataset by scraping the first table on the LA Almanac page
source = Request("http://www.laalmanac.com/communications/cm02_communities.php", headers={'User-Agent': 'Mozilla/5.0'})
html = urlopen(source).read()
page_soup = soup(html, "html.parser")
table = page_soup.find('table')
table_rows = table.find_all('tr')
rows = []
for tr in table_rows:
    cells = tr.find_all('td')
    # Rows without <td> cells (such as a header row) produce an empty list
    rows.append([cell.text for cell in cells])
df_zips = pd.DataFrame(rows, columns=["Area", "Zipcode"])
# Drop the empty rows, which pandas fills with missing values
df_zips.dropna(inplace=True)

In [126]:
# Creating the df_rests dataset
CLIENT_ID = 'YOUR_FOURSQUARE_CLIENT_ID' # your Foursquare ID (placeholder; do not publish real credentials)
CLIENT_SECRET = 'YOUR_FOURSQUARE_CLIENT_SECRET' # your Foursquare Secret (placeholder)
VERSION = '20180605' # Foursquare API version
category = '4d4b7105d754a06374d81259' # Foursquare category ID (top-level Food category)
radius = 500 # search radius in meters
limit = 100 # maximum number of venues to return

# build the API request; requests URL-encodes the query parameters
url = 'https://api.foursquare.com/v2/venues/explore'
params = {'client_id': CLIENT_ID, 'client_secret': CLIENT_SECRET, 'v': VERSION,
          'near': 'Beverly Hills, CA', 'categoryId': category, 'radius': radius, 'limit': limit}
# make the GET request
results = requests.get(url, params=params).json()["response"]['groups'][0]['items']
# keep only the relevant information for each nearby venue
restaurants = [(v['venue']['id'], v['venue']['name'], v['venue']['location']['lat'], v['venue']['location']['lng'], v['venue']['categories'][0]['name']) for v in results]
df_rests = pd.DataFrame(restaurants, columns=['Venue_id', 'Restaurant', 'Latitude', 'Longitude', 'Cuisine'])
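
The list comprehension above assumes that every returned item has a `venue` with an id, coordinates and at least one category. A minimal sketch of a more defensive version is shown below, run on a hand-made sample rather than a live response. The helper name `extract_venue_rows` and the sample items are illustrative assumptions, not part of the Foursquare API.

```python
def extract_venue_rows(items):
    """Flatten explore-style items into (id, name, lat, lng, cuisine) tuples."""
    rows = []
    for item in items:
        venue = item.get('venue', {})
        location = venue.get('location', {})
        categories = venue.get('categories') or []
        # Skip entries that lack an id or coordinates
        if 'id' not in venue or 'lat' not in location or 'lng' not in location:
            continue
        # Use the first category as the cuisine, or None when there is none
        cuisine = categories[0]['name'] if categories else None
        rows.append((venue['id'], venue.get('name'), location['lat'], location['lng'], cuisine))
    return rows

# Hand-made sample mimicking the structure used above (not real API output)
sample_items = [
    {'venue': {'id': 'a1', 'name': 'Sample Cafe',
               'location': {'lat': 34.07, 'lng': -118.40},
               'categories': [{'name': 'Café'}]}},
    {'venue': {'id': 'b2', 'name': 'No Category Diner',
               'location': {'lat': 34.08, 'lng': -118.41},
               'categories': []}},
    {'venue': {'name': 'Missing id'}},
]
rows = extract_venue_rows(sample_items)
print(len(rows))  # 2
```

The resulting list of tuples could be passed to `pd.DataFrame` with the same column names as df_rests.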

In [129]:
# Creating the df_user dataset
data = [['54938133498ed65f02e8c4ba', 'Redbird', 'Food was cold and not good', '5', '1'], 
        ['4c11f6d6d41e76b0cf49320d', 'Izakaya & Bar Fu-ga', 'Food was really good and excellent service', '8', '2'], 
        ['5d75ae0d3539dc0008e5913b', 'Kabuto', 'Food was good', '7', '1'],
        ['4aa3f12af964a520784420e3', 'Meet in Paris', 'Amazing food', '9', '1'],
        ['49c7fff6f964a520df571fe3', 'Tender Greens', 'Terrible salad', '3', '1'],
        ['4ac6befcf964a52020b620e3', 'Thai Original BBQ Restaurant', 'Been here a couple of times, really good food', '8', '2'],
        ['4cbd051fd78f4688408fcb73', 'El Metapaneco', 'Food was ok, service was bad', '6', '1'],
        ['5d881c1aa3b6ca00086f4fea', 'Kappo Osen', 'Food was really good', '9', '2'],
        ['501b4a83e4b06c31c5074498', 'Kagura', 'My go to spot', '9', '5'],
        ['4e6acbefae60950955a7f1e1', 'Sushi Enya', 'Really good sushi', '10', '3'],
        ['4aa49017f964a5201b4720e3', 'Shabu Shabu House', 'Food was great', '8', '2'],
        ['4a07bb56f964a52098731fe3', 'Zencu Sushi & Grill', 'Parking was bad and food even worse', '3', '1'],
        ['54aae895498e545686bde596', 'My Ramen Bar', 'Food was cold and not good', '5', '1'],
        ['53421495498eff7fcb02e01e', 'Shiki Beverly Hills', 'Food was great', '8', '1'],
        ['4a67cb8bf964a52010ca1fe3', 'Yu-N-Mi', 'Food was cold and not good', '10', '10']] 
  
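# Create the pandas DataFrame
df_user = pd.DataFrame(data, columns=['Venue_id', 'Restaurant', 'Review', 'Rating', 'Checkins'])
# Rating and Checkins are entered as strings above; convert them to integers for later calculations
df_user[['Rating', 'Checkins']] = df_user[['Rating', 'Checkins']].astype(int)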

#### An example of the df_zips dataset containing each area and its associated zip code(s):

In [56]:
df_zips.head()

| | Area | Zipcode |
|---|---|---|
| 1 | Acton | 93510 |
| 2 | Agoura Hills | 91301 |
| 3 | Agoura Hills (PO Boxes) | 91376 |
| 4 | Agua Dulce | 91390 |
| 5 | Alhambra | 91801, 91803 |
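
Some rows hold several zip codes in one comma-separated string (Alhambra above), so looking up the area for a single zip code requires splitting those strings first. A minimal sketch of such a lookup in plain Python follows. The helper name `build_zip_lookup` is hypothetical, and the sample rows are copied from the output above.

```python
def build_zip_lookup(rows):
    """Map each individual zip code to its area name."""
    lookup = {}
    for area, zipcodes in rows:
        for zipcode in zipcodes.split(','):
            zipcode = zipcode.strip()
            if zipcode:
                # Keep the first area seen for a zip code (an arbitrary choice in this sketch)
                lookup.setdefault(zipcode, area)
    return lookup

sample_rows = [
    ('Acton', '93510'),
    ('Agoura Hills', '91301'),
    ('Agoura Hills (PO Boxes)', '91376'),
    ('Agua Dulce', '91390'),
    ('Alhambra', '91801, 91803'),
]
zip_to_area = build_zip_lookup(sample_rows)
print(zip_to_area['91803'])  # Alhambra
```

With the real data, the rows could be supplied as `df_zips.itertuples(index=False)`.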


#### An example of the df_rests dataset containing each restaurant's venue ID, name, coordinates and cuisine:

In [127]:
df_rests.head()

| | Venue_id | Restaurant | Latitude | Longitude | Cuisine |
|---|---|---|---|---|---|
| 0 | 4b32a54ff964a520cc1025e3 | Il Tramezzino Cafe | 34.071685 | -118.401878 | Café |
| 1 | 3fd66200f964a520cbee1ee3 | Il Pastaio | 34.070739 | -118.4008 | Italian Restaurant |
| 2 | 58e6b8280acb6a688ace966a | Cafe Gratitude - Beverly Hills | 34.070956 | -118.401345 | Vegetarian / Vegan Restaurant |
| 3 | 4ab3ffaef964a520716f20e3 | E. Baldi | 34.070515 | -118.400787 | Italian Restaurant |
| 4 | 42893400f964a52068231fe3 | La Scala | 34.071436 | -118.401462 | Italian Restaurant |
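
Since each restaurant comes with a latitude and longitude, one possible way to take location into account is the great-circle distance between two venues. The sketch below uses the haversine formula with a mean Earth radius of 6371 km. It is an illustration only and not necessarily the location measure used later in this notebook; the function name `haversine_km` is an assumption.

```python
from math import radians, sin, cos, asin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

def haversine_km(lat1, lng1, lat2, lng2):
    """Great-circle distance in kilometers between two (lat, lng) points given in degrees."""
    lat1, lng1, lat2, lng2 = map(radians, (lat1, lng1, lat2, lng2))
    dlat = lat2 - lat1
    dlng = lng2 - lng1
    a = sin(dlat / 2) ** 2 + cos(lat1) * cos(lat2) * sin(dlng / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))

# Coordinates of the first two restaurants in the sample above
distance = haversine_km(34.071685, -118.401878, 34.070739, -118.4008)
print(round(distance, 3))
```

The two venues are a short walk apart, which is consistent with the 500 meter search radius used in the request.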


#### An example of the df_user dataset containing the user's restaurant history, reviews, ratings and number of times they have checked in:

In [130]:
df_user.head()

| | Venue_id | Restaurant | Review | Rating | Checkins |
|---|---|---|---|---|---|
| 0 | 54938133498ed65f02e8c4ba | Redbird | Food was cold and not good | 5 | 1 |
| 1 | 4c11f6d6d41e76b0cf49320d | Izakaya & Bar Fu-ga | Food was really good and excellent service | 8 | 2 |
| 2 | 5d75ae0d3539dc0008e5913b | Kabuto | Food was good | 7 | 1 |
| 3 | 4aa3f12af964a520784420e3 | Meet in Paris | Amazing food | 9 | 1 |
| 4 | 49c7fff6f964a520df571fe3 | Tender Greens | Terrible salad | 3 | 1 |
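
To connect the user's history to the restaurant list, the ratings can be summarised per cuisine. The sketch below shows one possible approach, assuming the cuisine of each visited venue is known: it computes a check-in-weighted average rating per cuisine. The sample history is invented for illustration (df_user has no cuisine column), and the function name `cuisine_preferences` is hypothetical.

```python
from collections import defaultdict

def cuisine_preferences(history):
    """Return {cuisine: check-in-weighted mean rating} from (cuisine, rating, checkins) tuples."""
    weighted_sum = defaultdict(float)
    weight_total = defaultdict(float)
    for cuisine, rating, checkins in history:
        weighted_sum[cuisine] += rating * checkins
        weight_total[cuisine] += checkins
    # Ignore cuisines whose total weight is zero to avoid dividing by zero
    return {c: weighted_sum[c] / weight_total[c] for c in weighted_sum if weight_total[c] > 0}

# Invented sample: (cuisine, rating, checkins)
sample_history = [
    ('Sushi Restaurant', 10, 3),
    ('Sushi Restaurant', 3, 1),
    ('Italian Restaurant', 8, 2),
]
prefs = cuisine_preferences(sample_history)
ranked = sorted(prefs.items(), key=lambda kv: kv[1], reverse=True)
print(ranked)  # [('Sushi Restaurant', 8.25), ('Italian Restaurant', 8.0)]
```

Weighting by check-ins lets a restaurant the user keeps returning to count for more than a single visit, which is a design choice of this sketch rather than a requirement.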
