# USA Restaurant Recommendation - Content Based 

**About the DataSet**

The dataset is available on [Kaggle](https://www.kaggle.com/datasets/siddharthmandgi/tripadvisor-restaurant-recommendation-data-usa/data). It describes each restaurant with the following fields:

- Name of the Restaurant : Restaurant Name
- Street Address : Restaurant Address
- Location : City, State and ZIP Code
- Type of Cuisine Served : Cuisine
- Contact Number : Restaurant Contact Number
- TripAdvisor Restaurant URL : Restaurant URL
- Menu URL : Restaurant Menu URL 

In [54]:
# Import libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import sklearn

In [55]:
# Load Data
restaurant = pd.read_csv('TripAdvisor_RestauarantRecommendation.csv', delimiter=',')
restaurant.head(3)

Unnamed: 0,Name,Street Address,Location,Type,Reviews,No of Reviews,Comments,Contact Number,Trip_advisor Url,Menu,Price_Range
0,Betty Lou's Seafood and Grill,318 Columbus Ave,"San Francisco, CA 94133-3908","Seafood, Vegetarian Friendly, Vegan Options",4.5 of 5 bubbles,243 reviews,,+1 415-757-0569,https://www.tripadvisor.com//Restaurant_Review...,Check The Website for a Menu,$$ - $$$
1,Coach House Diner,55 State Rt 4,"Hackensack, NJ 07601-6337","Diner, American, Vegetarian Friendly",4 of 5 bubbles,84 reviews,"Both times we were there very late, after 11 P...",+1 201-488-4999,https://www.tripadvisor.com//Restaurant_Review...,Check The Website for a Menu,$$ - $$$
2,Table Talk Diner,2521 South Rd Ste C,"Poughkeepsie, NY 12601-5476","American, Diner, Vegetarian Friendly",4 of 5 bubbles,256 reviews,Waitress was very friendly but a little pricey...,+1 845-849-2839,https://www.tripadvisor.com//Restaurant_Review...,http://tabletalkdiner.com/menu/breakfast/,$$ - $$$


In [56]:
print(restaurant['Name'].nunique())

2641


In [57]:
restaurant.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3062 entries, 0 to 3061
Data columns (total 11 columns):
 #   Column            Non-Null Count  Dtype 
---  ------            --------------  ----- 
 0   Name              3062 non-null   object
 1   Street Address    3062 non-null   object
 2   Location          3062 non-null   object
 3   Type              3049 non-null   object
 4   Reviews           3062 non-null   object
 5   No of Reviews     3062 non-null   object
 6   Comments          2447 non-null   object
 7   Contact Number    3062 non-null   object
 8   Trip_advisor Url  3062 non-null   object
 9   Menu              3062 non-null   object
 10  Price_Range       3062 non-null   object
dtypes: object(11)
memory usage: 263.3+ KB


## Cleaning Dataset

In [58]:
# Drop the columns we don't need
drop_cols = ['Street Address', 'Comments', 'Contact Number', 'Trip_advisor Url', 'Menu', 'Price_Range']
restaurant = restaurant.drop(columns=drop_cols)

In [59]:
# Show the rows that contain at least one missing value
restaurant.loc[restaurant.isna().any(axis=1)]

Unnamed: 0,Name,Location,Type,Reviews,No of Reviews
49,Luby's Cafeteria Mall Del Norte,"Laredo, TX",,4 of 5 bubbles,27 reviews
267,Very Juice,"Brooklyn, NY 11223-1935",,5 of 5 bubbles,1 review
1000,Cast Iron Trading Co,"Stockton, CA 95202-2407",,4.5 of 5 bubbles,11 reviews
1238,Sir Winston's Restaurant & Lounge,"Long Beach, CA 90802-6331",,4.5 of 5 bubbles,490 reviews
1629,Benji's French Basque Restaurant,"Bakersfield, CA 93308-6130",,4 of 5 bubbles,157 reviews
1641,Chuck's Hamburgers,"Stockton, CA 95207-4703",,4.5 of 5 bubbles,17 reviews
1744,Wavershak's Deli,"Toms River, NJ 08755-1284",,No review,Undefined Number
2090,Vera's Backyard Bar-B-Que,"Brownsville, TX 78521-3765",,4.5 of 5 bubbles,11 reviews
2230,Cafe Matisse,"Rutherford, NJ 07070-2307",,4.5 of 5 bubbles,227 reviews
2348,Cast Iron Trading Co,"Stockton, CA 95202-2407",,4.5 of 5 bubbles,11 reviews


All of the remaining missing values are in the `Type` column. `Type` is one of our recommendation features, along with `Reviews` and `No of Reviews`, and there is no sensible way to fill it in, so these rows are removed.

In [60]:
# Drop the rows that contain missing values
restaurant = restaurant.dropna()

In [61]:
# Recheck whether any missing values remain
restaurant.info()

<class 'pandas.core.frame.DataFrame'>
Index: 3049 entries, 0 to 3061
Data columns (total 5 columns):
 #   Column         Non-Null Count  Dtype 
---  ------         --------------  ----- 
 0   Name           3049 non-null   object
 1   Location       3049 non-null   object
 2   Type           3049 non-null   object
 3   Reviews        3049 non-null   object
 4   No of Reviews  3049 non-null   object
dtypes: object(5)
memory usage: 142.9+ KB


## Preprocessing

In [62]:
restaurant.duplicated().sum()

174

In [63]:
restaurant.loc[restaurant.duplicated()]

Unnamed: 0,Name,Location,Type,Reviews,No of Reviews
414,The Capital Grille,"Costa Mesa, CA 92626-1873","American, Steakhouse, Gluten Free Options",4.5 of 5 bubbles,399 reviews
422,The Slow Bone,"Dallas, TX 75207-6202","American, Barbecue",4.5 of 5 bubbles,236 reviews
445,Jackson Hole,"East Elmhurst, NY 11370","Greek, American, Vegetarian Friendly",4.5 of 5 bubbles,167 reviews
508,Amore Ristorante,"Woodland Park, NJ 07424-3305","Italian, Vegetarian Friendly, Vegan Options",4.5 of 5 bubbles,88 reviews
511,PITHARI TAVERNA,"Highland Park, NJ 08904-3234","Mediterranean, Greek, Vegetarian Friendly",4 of 5 bubbles,201 reviews
...,...,...,...,...,...
3046,Scalini Fedeli,"New York City, NY 10013-3332","Italian, Vegetarian Friendly, Vegan Options",4.5 of 5 bubbles,427 reviews
3049,The Ranch Restaurant & Saloon,"Anaheim, CA 92805-5957","American, Steakhouse, Vegetarian Friendly",4.5 of 5 bubbles,398 reviews
3051,Mesob Ethiopian Restaurant,"Montclair, NJ 07042-3442","Ethiopian, African, Vegetarian Friendly",4.5 of 5 bubbles,223 reviews
3056,Stamatis,"Astoria, NY 11105","Mediterranean, Greek, Vegetarian Friendly",4.5 of 5 bubbles,247 reviews


In [64]:
restaurant.loc[restaurant['Name'] == 'The Capital Grille']

Unnamed: 0,Name,Location,Type,Reviews,No of Reviews
40,The Capital Grille,"Austin, TX 78701-3914","American, Steakhouse, Gluten Free Options",4.5 of 5 bubbles,240 reviews
65,The Capital Grille,"New York City, NY 10017-5608","American, Steakhouse, Vegetarian Friendly",4.5 of 5 bubbles,932 reviews
171,The Capital Grille,"Houston, TX 77056-5402","American, Steakhouse, Vegetarian Friendly",4.5 of 5 bubbles,428 reviews
391,The Capital Grille,"Costa Mesa, CA 92626-1873","American, Steakhouse, Gluten Free Options",4.5 of 5 bubbles,399 reviews
414,The Capital Grille,"Costa Mesa, CA 92626-1873","American, Steakhouse, Gluten Free Options",4.5 of 5 bubbles,399 reviews
562,The Capital Grille,"Cherry Hill, NJ 08002-2100","American, Steakhouse, Gluten Free Options",4.5 of 5 bubbles,345 reviews
1311,The Capital Grille,"Plano, TX 75024-4013","American, Steakhouse, Vegan Options",4.5 of 5 bubbles,485 reviews
2126,The Capital Grille,"Paramus, NJ 07652-2404","American, Steakhouse, Gluten Free Options",4.5 of 5 bubbles,276 reviews
2444,The Capital Grille,"Dallas, TX 75201-1894","American, Steakhouse, Vegetarian Friendly",4.5 of 5 bubbles,437 reviews
2489,The Capital Grille,"Fort Worth, TX 76102-6247","American, Steakhouse, Vegetarian Friendly",4.5 of 5 bubbles,692 reviews


In [65]:
restaurant = restaurant.drop_duplicates()

In [66]:
restaurant.info()

<class 'pandas.core.frame.DataFrame'>
Index: 2875 entries, 0 to 3061
Data columns (total 5 columns):
 #   Column         Non-Null Count  Dtype 
---  ------         --------------  ----- 
 0   Name           2875 non-null   object
 1   Location       2875 non-null   object
 2   Type           2875 non-null   object
 3   Reviews        2875 non-null   object
 4   No of Reviews  2875 non-null   object
dtypes: object(5)
memory usage: 134.8+ KB


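Extract `City` and `Country` from `Location`. Note that the `Country` column actually holds the US state abbreviation (for example `CA`). Then convert `Reviews` and `No of Reviews` to float and multiply them to get `Weighted Reviews`. All of these features are used as the basis for the recommendations.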

In [67]:
restaurant['City'] = restaurant['Location'].apply(lambda text: text.split(',')[0])
restaurant['Country'] = restaurant['Location'].apply(lambda text: text.split(',')[1] # Part after the city, e.g. " CA 94133-3908"
                            .split(' ')[1] # Second token is the state abbreviation (stored as Country)
                            .replace(" ", "") # Remove any remaining white space
                            )

restaurant['Reviews'] = restaurant['Reviews'].apply(lambda text: text.split(' ')[0]).astype(float)
restaurant['No of Reviews'] = restaurant['No of Reviews'].apply(lambda text: text.split(' ')[0]
                                                                .replace(',', '')
                                                                ).astype(float)

restaurant['Weighted Reviews'] = restaurant['Reviews']*restaurant['No of Reviews']
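
To make the string handling above concrete, here is a small standalone sketch that applies the same splits to single values shaped like the ones in this dataset. The helper names `parse_rating`, `parse_review_count` and `parse_city_state` are hypothetical and are not part of the notebook.

```python
def parse_rating(text):
    # "4.5 of 5 bubbles" -> 4.5 (first space-separated token)
    return float(text.split(' ')[0])


def parse_review_count(text):
    # "243 reviews" -> 243.0; thousands separators are stripped first
    return float(text.split(' ')[0].replace(',', ''))


def parse_city_state(location):
    # "San Francisco, CA 94133-3908" -> ("San Francisco", "CA")
    parts = location.split(',')
    city = parts[0]
    state = parts[1].split(' ')[1]
    return city, state


rating = parse_rating("4.5 of 5 bubbles")
count = parse_review_count("243 reviews")
weighted = rating * count  # unscaled Weighted Reviews

print(rating, count, weighted)
print(parse_city_state("San Francisco, CA 94133-3908"))
```

This approach assumes every value starts with a number. A value such as `No review`, which appears in one of the rows dropped earlier, would raise a `ValueError` in `float()`.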

In [68]:
from sklearn.preprocessing import MinMaxScaler
# Rescale Weighted Reviews to the [0, 1] range (min-max normalization)
mm = MinMaxScaler()

restaurant['Weighted Reviews'] = mm.fit_transform(restaurant['Weighted Reviews'].values.reshape(-1, 1))
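
Min-max scaling maps each value `x` to `(x - min) / (max - min)`, so the smallest weighted review becomes 0 and the largest becomes 1. The following is a minimal pure-Python sketch of that formula. The function name `min_max_scale` and the input numbers are illustrative assumptions, not values taken from the dataset.

```python
def min_max_scale(values):
    # Map values linearly onto [0, 1]: (x - min) / (max - min)
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are equal: return zeros to avoid dividing by zero
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Illustrative weighted reviews (rating * number of reviews), not real data
weighted = [4.0, 100.0, 1093.5, 4441.5]
scaled = min_max_scale(weighted)
print(scaled)
```

Because the product is dominated by the number of reviews, a few very popular restaurants stretch the range and most others end up close to 0.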

In [69]:
# Set restaurant name as index
restaurant.set_index('Name', inplace=True)
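
Restaurant names are not unique in this dataset: there are 2641 unique names in 3062 raw rows, and The Capital Grille appears in several cities. An index built on `Name` alone can therefore contain repeated labels, and a label-based lookup on a repeated label returns more than one row. One possible way to disambiguate, sketched here in plain Python with a hypothetical `make_label` helper, is to append the city when a name occurs more than once.

```python
from collections import Counter

# (name, city) pairs; The Capital Grille appears in several cities in this dataset
rows = [
    ("The Capital Grille", "Austin"),
    ("The Capital Grille", "Houston"),
    ("Coach House Diner", "Hackensack"),
]

# Count how often each bare name occurs
name_counts = Counter(name for name, _ in rows)


def make_label(name, city):
    # Append the city only when the bare name is ambiguous
    return f"{name} ({city})" if name_counts[name] > 1 else name


labels = [make_label(name, city) for name, city in rows]
print(labels)
# ['The Capital Grille (Austin)', 'The Capital Grille (Houston)', 'Coach House Diner']
```

This sketch does not handle two restaurants that share both a name and a city; those would need an additional field to tell them apart.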

## Modelling

In [70]:
from sklearn.feature_extraction.text import TfidfVectorizer
from nltk.corpus import stopwords

tfidf_city = TfidfVectorizer()

# Requires the NLTK stop-word corpus; run nltk.download('stopwords') once if it is missing
stop_words = stopwords.words('english')
tfidf_type = TfidfVectorizer(stop_words=stop_words)

tfidf_country = TfidfVectorizer()

In [71]:
# Fit and transform to matrix
city_matrix = tfidf_city.fit_transform(restaurant['City'])
type_matrix = tfidf_type.fit_transform(restaurant['Type'])
country_matrix = tfidf_country.fit_transform(restaurant['Country'])
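
To show what the TF-IDF step produces, here is a rough pure-Python sketch that assumes scikit-learn's documented defaults for `TfidfVectorizer`: lowercasing, tokens of at least two word characters, smoothed IDF `ln((1 + n) / (1 + df)) + 1`, and L2 normalization of each row. The `tokenize` and `tfidf` functions are hypothetical helpers, stop-word removal is left out for brevity, and the input strings are the `Type` values of the first three rows shown earlier.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase and keep tokens of two or more word characters
    return re.findall(r"\b\w\w+\b", text.lower())


def tfidf(docs):
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    # Document frequency: in how many documents each term appears
    df = Counter(term for tokens in tokenized for term in set(tokens))
    # Smoothed inverse document frequency
    idf = {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}
    rows = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights = {t: counts[t] * idf[t] for t in counts}
        norm = math.sqrt(sum(w * w for w in weights.values()))
        # L2-normalize so every non-empty row has unit length
        rows.append({t: w / norm for t, w in weights.items()} if norm else {})
    return rows


types = [
    "Seafood, Vegetarian Friendly, Vegan Options",
    "Diner, American, Vegetarian Friendly",
    "American, Diner, Vegetarian Friendly",
]
rows = tfidf(types)
for row in rows:
    print({t: round(w, 3) for t, w in sorted(row.items())})
```

Terms that occur in only one description, such as `seafood`, get a larger weight than terms shared by all three, such as `vegetarian`. Word order does not matter, so the second and third rows come out the same.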

In [72]:
# Convert the sparse matrix to a DataFrame
city_df = pd.DataFrame(city_matrix.toarray(), columns=tfidf_city.get_feature_names_out(), index=restaurant.index)
type_df = pd.DataFrame(type_matrix.toarray(), columns=tfidf_type.get_feature_names_out(), index=restaurant.index)
country_df = pd.DataFrame(country_matrix.toarray(), columns=tfidf_country.get_feature_names_out(), index=restaurant.index)

reviews_df = pd.DataFrame(index=restaurant.index, data=restaurant['Weighted Reviews'])

# Concatenate the DataFrames
main_data = pd.concat([city_df, type_df, country_df, reviews_df], axis=1)

In [73]:
from sklearn.metrics.pairwise import cosine_similarity

# Compute the cosine similarity matrix
cosine_sim = cosine_similarity(main_data)

# Convert the cosine similarity matrix to a DataFrame
cosine_sim_df = pd.DataFrame(cosine_sim, index=main_data.index, columns=main_data.index)
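
Cosine similarity is the dot product of two feature rows divided by the product of their lengths. The sketch below computes it by hand for three made-up rows. The `cosine` helper and the four-column layout are simplifying assumptions for illustration, not the notebook's actual feature matrix.

```python
import math


def cosine(u, v):
    # cos(u, v) = (u . v) / (|u| * |v|)
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        # Treat similarity with an all-zero row as 0
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical feature rows: [city term, type term A, type term B, weighted reviews]
a = [1.0, 1.0, 0.0, 0.05]
b = [1.0, 0.0, 1.0, 0.20]  # same city as a, different type
c = [0.0, 1.0, 0.0, 0.05]  # same type as a, different city

print(round(cosine(a, a), 6))
print(round(cosine(a, b), 6))
print(round(cosine(a, c), 6))
```

In the notebook each TF-IDF block (city, type, state) is L2-normalized on its own and the weighted review score lies in [0, 1], so the three text blocks carry roughly equal weight and the review score has less influence. That is consistent with the many values near 0.333 and 0.667 in the similarity matrix printed in the next cell, which would correspond to one or two of the three blocks matching.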

In [74]:
cosine_sim_df

Name,Betty Lou's Seafood and Grill,Coach House Diner,Table Talk Diner,Sixty Vines,The Clam Bar,E Tutto Qua,Black Angus Steakhouse - Federal Way,Ziziki's,Vince's Italian Restaurant & Pizzeria,John Thomas Steakhouse,...,Buffalo Chophouse,The Glass Tavern,Uncle Bill's Pancake House,El Mexicano,Crave Fishbar,Grazie,Indigo Kitchen & Ale House,BRIO Tuscan Grille,Maywood Pancake house,Porto Leggero
Betty Lou's Seafood and Grill,1.000000,0.066114,0.066612,0.052185,0.142550,0.776587,0.000649,0.000287,0.067291,0.056343,...,0.000941,0.142200,0.116671,0.000908,0.108654,0.234521,0.086832,0.000622,0.165246,0.234043
Coach House Diner,0.066114,1.000000,0.333427,0.084213,0.039344,0.001382,0.037502,0.000085,0.074404,0.026483,...,0.044990,0.039240,0.522764,0.000267,0.030003,0.070216,0.140988,0.333425,0.356092,0.403322
Table Talk Diner,0.066612,0.333427,1.000000,0.084687,0.372941,0.004390,0.037906,0.000269,0.074544,0.360060,...,0.378612,0.372689,0.190012,0.333895,0.363704,0.403889,0.141364,0.000581,0.022935,0.070442
Sixty Vines,0.052185,0.084213,0.084687,1.000000,0.108108,0.004537,0.103022,0.333468,0.058246,0.021227,...,0.035799,0.107760,0.148498,0.000878,0.024459,0.055717,0.199358,0.067351,0.017960,0.055126
The Clam Bar,0.142550,0.039344,0.372941,0.108108,1.000000,0.148567,0.149012,0.000300,0.000268,0.363545,...,0.384529,0.666759,0.069593,0.333964,0.502097,0.334174,0.180558,0.097347,0.025930,0.000632
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
Grazie,0.234521,0.070216,0.403889,0.055717,0.334174,0.088936,0.001010,0.000447,0.162317,0.393003,...,0.334289,0.333665,0.222899,0.334250,0.667394,1.000000,0.092390,0.088692,0.175325,0.424450
Indigo Kitchen & Ale House,0.086832,0.140988,0.141364,0.199358,0.180558,0.003809,0.505348,0.000233,0.430672,0.034970,...,0.059255,0.180294,0.248320,0.000737,0.039855,0.092390,1.000000,0.112410,0.029910,0.091928
BRIO Tuscan Grille,0.000622,0.333425,0.000581,0.067351,0.097347,0.087490,0.267302,0.000220,0.093553,0.123409,...,0.209866,0.097074,0.333729,0.000697,0.001092,0.088692,0.112410,1.000000,0.333431,0.421434
Maywood Pancake house,0.165246,0.356092,0.022935,0.017960,0.025930,0.001435,0.024698,0.000088,0.000078,0.246711,...,0.029649,0.025817,0.373386,0.000278,0.019861,0.175325,0.029910,0.333431,1.000000,0.508502


## Evaluation

In [75]:
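def recommend_restaurants(restaurant_name, cosine_sim_df=cosine_sim_df, restaurant_data=restaurant, top=5):
    # Partition so that the last top+1 positions hold the highest similarities in sorted order.
    # One extra slot is needed because the query restaurant itself has similarity 1.0.
    index = cosine_sim_df.loc[:, restaurant_name].to_numpy().argpartition(
        range(-1, -(top + 2), -1)
    )

    # Names of the top+1 most similar restaurants, most similar first
    closest = cosine_sim_df.columns[index[-1:-(top+2):-1]]

    # Exclude the query restaurant itself
    closest = closest.drop(restaurant_name, errors='ignore')

    return restaurant_data.loc[closest]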
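
The selection logic in the function above amounts to sorting by similarity, skipping the query restaurant, and keeping the first `top` names. Here is a minimal sketch of that idea in plain Python. The `top_k_similar` helper is hypothetical, and the similarity values are a small subset copied from the first row of the matrix printed earlier, so the ranking is illustrative only.

```python
def top_k_similar(similarities, query, k=5):
    # similarities: dict mapping restaurant name -> similarity to `query`
    candidates = [(name, s) for name, s in similarities.items() if name != query]
    # Sort by similarity, highest first, and keep the first k names
    candidates.sort(key=lambda pair: pair[1], reverse=True)
    return [name for name, _ in candidates[:k]]


# Similarities to "Betty Lou's Seafood and Grill" (subset of the full matrix row)
sims = {
    "Betty Lou's Seafood and Grill": 1.000000,
    "Coach House Diner": 0.066114,
    "Table Talk Diner": 0.066612,
    "Sixty Vines": 0.052185,
    "The Clam Bar": 0.142550,
    "E Tutto Qua": 0.776587,
}
result = top_k_similar(sims, "Betty Lou's Seafood and Grill", k=3)
print(result)
# ['E Tutto Qua', 'The Clam Bar', 'Table Talk Diner']
```

A full sort is simpler to reason about than `argpartition`. A partial sort only starts to pay off when the number of restaurants is much larger than `top`.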


In [76]:
restaurant.loc["Betty Lou's Seafood and Grill",:]

Location                            San Francisco, CA 94133-3908
Type                 Seafood, Vegetarian Friendly, Vegan Options
Reviews                                                      4.5
No of Reviews                                              243.0
City                                               San Francisco
Country                                                       CA
Weighted Reviews                                        0.049306
Name: Betty Lou's Seafood and Grill, dtype: object

In [77]:
# Test the recommendation system with a sample restaurant
recommend_restaurants("Betty Lou's Seafood and Grill")

Name,Location,Type,Reviews,No of Reviews,City,Country,Weighted Reviews
Ristorante Franchino,"San Francisco, CA 94133-3907","Italian, Vegetarian Friendly, Vegan Options",4.5,429.0,San Francisco,CA,0.08775
Seven Hills,"San Francisco, CA 94109-3114","Seafood, Italian, Vegetarian Friendly",4.5,923.0,San Francisco,CA,0.189854
Quince,"San Francisco, CA 94133-4610","French, Vegetarian Friendly, Vegan Options",4.5,545.0,San Francisco,CA,0.111726
Pacific Cafe,"San Francisco, CA 94121-1623","American, Seafood, Gluten Free Options",4.5,241.0,San Francisco,CA,0.048893
Pacific Catch,"San Francisco, CA 94123-2701","Hawaiian, Seafood, Vegetarian Friendly",4.5,987.0,San Francisco,CA,0.203082
