## Modeling 

**Name**: Diane Lu

**Contact**: dianengalu@gmail.com

**Date**: 07/18/2023

### Table of Contents 

1. [Introduction](#intro)
2. [Final Dataset](#final)
    * Data Dictionary

### Introduction <a class="anchor" id="intro"></a>

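This notebook builds collaborative-filtering restaurant recommenders on the final review dataset: first a similarity-based approach using cosine similarity, then a matrix-factorization approach using FunkSVD from the `surprise` package.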

#### Importing Python Libraries 

Importing the libraries needed for the modeling process.

In [1]:
# Import the basic packages
import numpy as np 
import pandas as pd 

from sklearn.metrics.pairwise import cosine_similarity

# Import the surprise packages
from surprise import SVD
from surprise.reader import Reader
from surprise import Dataset
from surprise.prediction_algorithms.matrix_factorization import SVD as FunkSVD
from surprise.model_selection import train_test_split
from surprise.model_selection import GridSearchCV
from surprise import accuracy

import warnings

# Ignore all warnings
warnings.filterwarnings("ignore")

### Final Dataset for Modeling <a class="anchor" id="final"></a>

**Data Dictionary:**
* `review_id`: unique review id
* `user_id`: unique user id
* `business_id`: unique business id
* `stars`: star rating
* `text`: the review itself
* `restaurant_name`: the restaurant's name
* `address`: the full address of the restaurant
* `city`: the city
* `state`: 2 character state code
* `postal_code`: the postal code
* `latitude`: latitude of the restaurant
* `longitude`: longitude of the restaurant
* `restaurant_rating`: the restaurant's overall star rating
* `restaurant_review_count`: the number of reviews the restaurant has received
* `user_review_count`: the number of reviews the user has written
* `is_open`: 0 or 1 for closed or open
* `categories`: business categories
* `name`: the user's first name
* `average_stars`: the user's average rating across all of their reviews

In [2]:
final_data = pd.read_pickle('/Users/diane/Desktop/BrainStation/Brainstation_Capstone/yelp_data/final_reviews.pkl')

In [3]:
final_data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5574714 entries, 0 to 5574713
Data columns (total 19 columns):
 #   Column                   Dtype  
---  ------                   -----  
 0   review_id                object 
 1   user_id                  object 
 2   business_id              object 
 3   stars                    float64
 4   text                     object 
 5   restaurant_name          object 
 6   address                  object 
 7   city                     object 
 8   state                    object 
 9   postal_code              object 
 10  latitude                 float64
 11  longitude                float64
 12  restaurant_rating        float64
 13  restaurant_review_count  int64  
 14  is_open                  int64  
 15  categories               object 
 16  name                     object 
 17  user_review_count        int64  
 18  average_stars            float64
dtypes: float64(5), int64(3), object(11)
memory usage: 808.1+ MB


In [4]:
final_data.head()

Unnamed: 0,review_id,user_id,business_id,stars,text,restaurant_name,address,city,state,postal_code,latitude,longitude,restaurant_rating,restaurant_review_count,is_open,categories,name,user_review_count,average_stars
0,lWC-xP3rd6obsecCYsGZRg,ak0TdVmGKo4pwqdJSTLwWw,buF9druCkbuXLX526sGELQ,4.0,Apparently Prides Osteria had a rough summer a...,Prides Osteria,240 Rantoul St,Beverly,MA,1915,42.549609,-70.884046,3.5,83,0,"[Wine Bars, Nightlife, Farmers Market, Bars, I...",Mel,63,4.3
1,fLlML7BjkR4_fJnND_hEJw,ak0TdVmGKo4pwqdJSTLwWw,bNZ3-0rse12NKdSVqQ30xw,4.0,"Came with friends, split the funghi pizza and ...",Sulmona,608 Main St,Cambridge,MA,2139,42.362867,-71.093846,4.0,220,1,"[Pizza, Italian, Nightlife, Bars]",Mel,63,4.3
2,pRtbswupEVIG1Ykj9xkL7Q,ak0TdVmGKo4pwqdJSTLwWw,BVsIaKL-8QXVjt0Z9WoFWw,4.0,Went for late lunch had the combination seafoo...,Village Roast Beef & Seafood,10 Bessom St,Marblehead,MA,1945,42.500243,-70.859237,4.5,53,1,"[Seafood, American (Traditional)]",Mel,63,4.3
3,fUYl6bnZy4bSGnbPAizXug,ak0TdVmGKo4pwqdJSTLwWw,4MClvr12OXBNvGu8h1yGpA,5.0,"We were super excited to try Sarma, having bee...",Sarma,249 Pearl St,Somerville,MA,2145,42.38818,-71.095545,4.5,883,1,"[Turkish, Middle Eastern, Moroccan, Tapas/Smal...",Mel,63,4.3
4,jHh2LIXNsnJCMUiyI9pt5w,ak0TdVmGKo4pwqdJSTLwWw,2vH58mhkEl8GdcDug1OwWg,5.0,So glad we made the trip to Woburn for Gene's ...,Gene's Chinese Flatbread Cafe,466 Main St,Woburn,MA,1801,42.481598,-71.150877,4.0,233,1,"[Cafes, Noodles, Chinese]",Mel,63,4.3


In [5]:
final_data.isnull().sum()

review_id                  0
user_id                    0
business_id                0
stars                      0
text                       0
restaurant_name            0
address                    0
city                       0
state                      0
postal_code                0
latitude                   0
longitude                  0
restaurant_rating          0
restaurant_review_count    0
is_open                    0
categories                 0
name                       0
user_review_count          0
average_stars              0
dtype: int64

Applying a threshold so that only restaurants with a `restaurant_review_count` of at least 100 are included.

In [6]:
final_data = final_data[final_data['restaurant_review_count'] >= 100]

print(f"Sanity Check: The minimum amount of restaurant reviews is {final_data['restaurant_review_count'].min()}.")

Sanity Check: The minimum amount of restaurant reviews is 100.


Applying a threshold so that only users with a `user_review_count` of at least 100 are included.

In [7]:
final_data = final_data[final_data['user_review_count'] >= 100]

print(f"Sanity Check: The minimum amount of user reviews is {final_data['user_review_count'].min()}.")

Sanity Check: The minimum amount of user reviews is 100.


In [8]:
# Filter data for the city of Vancouver
vancouver_data = final_data[final_data['city'] == 'Vancouver']

# Sort the filtered data by the 'restaurant_review_count' column in descending order
vancouver_data_sorted = vancouver_data.sort_values('restaurant_review_count', ascending=False)

# Drop duplicates of 'restaurant_name' to get unique restaurants
vancouver_data_unique = vancouver_data_sorted.drop_duplicates(subset='restaurant_name')

# Display the top 10 restaurants with the highest restaurant_review_count
top_10_restaurants = vancouver_data_unique.head(10)
top_10_restaurants[['business_id', 'restaurant_name', 'address', 'city', 'state', 'restaurant_rating', 'restaurant_review_count', 'categories']]

Unnamed: 0,business_id,restaurant_name,address,city,state,restaurant_rating,restaurant_review_count,categories
463748,VPqWLp9kMiZEbctCebIZUA,Medina Cafe,780 Richards Street,Vancouver,BC,4.0,2302,"[Bars, Moroccan, Wine Bars, Breakfast & Brunch..."
4768085,4EV_ZcQmjAmP3pmO-_nb2A,Miku,"200 Granville Street, Suite 70",Vancouver,BC,4.5,1805,"[Japanese, Sushi Bars]"
5096244,_4R46MNkwx9MeOyt0YfNxA,Chambar,568 Beatty Street,Vancouver,BC,4.0,1356,"[Cafes, Middle Eastern, Nightlife, Breakfast &..."
188259,yeNenSjz_HCqngGFU5d8NQ,Phnom Penh,244 E Georgia Street,Vancouver,BC,4.0,1306,"[Vietnamese, Cambodian]"
150316,LjdbthVdtLYKSi7iVAFl0g,Jam Cafe on Beatty,556 Beatty Street,Vancouver,BC,4.5,1097,[Breakfast & Brunch]
1393114,NdEPf2Ls5Ql3_nkwjqKvXA,The Flying Pig - Yaletown,"1168 Hamilton Street, Unit 104",Vancouver,BC,4.0,1092,"[American (Traditional), American (New), Canad..."
179858,2cXOMeyBCx4JFgs5-CJQdQ,Joe Fortes Seafood & Chop House,777 Thurlow Street,Vancouver,BC,4.0,1037,"[Seafood, Steakhouses, Nightlife, American (Tr..."
917619,uAROEz8D29elXoNxjnPrkQ,Twisted Fork,213 Carral Street,Vancouver,BC,4.0,1032,"[American (New), Breakfast & Brunch, French]"
4960546,0iEFOEQIvk7RFcOo_jkOGA,Japadog,899 Burrard Street,Vancouver,BC,4.0,987,"[Food Stands, Japanese, Hot Dogs]"
1733867,i0xnuLimVcuSoBqO265obA,Hokkaido Ramen Santouka,1690 Robson Street,Vancouver,BC,4.0,949,"[Ramen, Japanese, Noodles]"


### Collaborative-Filtering Recommendation System without SVD

Collaborative filtering is a general technique used in recommendation systems to predict a user's preferences from the preferences of similar users. The memory-based variant used in this section does not involve matrix factorization. Instead, it operates directly on the user-item interaction matrix and computes similarities between users or between items, here with cosine similarity, to generate recommendations.
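To make the idea concrete, here is a minimal pure-Python sketch of user-based similarity on a tiny rating table. The users, ratings, and helper names (`cosine`, `most_similar`) are hypothetical and only illustrate the technique; the notebook itself relies on `sklearn`'s `cosine_similarity`.

```python
from math import sqrt

# Hypothetical toy ratings: one list per user, one position per restaurant, 0 = not rated
ratings = {
    "alice": [5, 4, 0, 1],
    "bob":   [4, 5, 0, 0],
    "carol": [1, 0, 5, 4],
}

def cosine(u, v):
    """Cosine similarity of two equal-length rating vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def most_similar(user):
    """Return the other user whose rating vector is closest to `user`."""
    others = (name for name in ratings if name != user)
    return max(others, key=lambda name: cosine(ratings[user], ratings[name]))

print(most_similar("alice"))
```

Restaurants that the most similar users rated highly, and that the target user has not rated yet, then become the recommendation candidates.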

In [9]:
# User-Item Interaction Matrix
user_item_matrix = final_data.pivot_table(index='user_id', columns='restaurant_name', values='stars').fillna(0)
user_item_matrix.sample(5)

restaurant_name,Gruby's New York Deli,'Ohana,/pôr/ wine house,10 Barrel Brewing Portland,10 Degrees South,101 Beer Kitchen,101 By Teahaus,101 Steak,10th & Piedmont,110 Grill,...,laV,mmmpanadas,nati's southern seafood boil,sweetgreen,wagamama,wagamama - faneuil hall,wagamama - prudential,wagamama - seaport,zpizza,ñoños tacos
user_id,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
1lyz6hets-121LBYkulZsA,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
BYAri0ueniTgBItTbgrylA,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
UMh9KhqlScXlkuYq8HQT4Q,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
kiKyT3FN1H3d3jSTU5I3zg,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
sihX-ByGF0AnW7kP2nRF6g,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [None]:
# Similarity Calculation (Cosine Similarity)
user_similarity = cosine_similarity(user_item_matrix)

In [None]:
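# Function to get top N recommendations for a user
def get_top_N_recommendations(user_id, N=5):
    user_index = user_item_matrix.index.get_loc(user_id)

    # Similarity of this user to every user in the matrix
    similar_users = user_similarity[user_index]

    # The N most similar users; position 0 is normally the user itself, so skip it
    top_similar_users_indices = similar_users.argsort()[::-1][1:N+1]

    # Average the neighbours' ratings per restaurant (unrated entries count as 0)
    top_recommendations = user_item_matrix.iloc[top_similar_users_indices].mean(axis=0)

    # Drop restaurants the user has already rated, then rank the rest
    already_rated = user_item_matrix.iloc[user_index] > 0
    top_recommendations = top_recommendations[~already_rated].sort_values(ascending=False)

    # Return only the N highest-scoring restaurant names
    return top_recommendations.index[:N].tolist()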

In [None]:
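# Example: get the top 5 recommendations for the first user in the matrix
# (user IDs are strings, so an integer such as 123 would raise a KeyError)
user_id = user_item_matrix.index[0]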
top_recommendations = get_top_N_recommendations(user_id, N=5)
print(top_recommendations)

In [9]:
# User-Item Interaction Matrix
user_item_matrix = final_data.pivot_table(index='user_id', columns='restaurant_name', values='stars')

In [10]:
# Fill missing values (NaNs) with zeros
user_item_matrix = user_item_matrix.fillna(0)

In [11]:
user_item_matrix.shape

(81142, 14323)

In [13]:
# Displaying the first few rows to get an initial glimpse of the data
user_item_matrix.head()

business_id,--164t1nclzzmca7eDiJMw,--Q3mAcX9t63f7Xcbn7LVA,--UNNdnHRhsyFUbDgumdtQ,-0A60UZl9nbdq2WWySJ_tQ,-0iqnv7MjKrgh7Q7bYRlUQ,-0sIQ96u8XevGUXZ--pvaA,-1ShItlulHnBsoOQWnblzw,-1h2qkElNfKjUPw6brMbIw,-1mmKpu7b_NlBit2pOOPnQ,-1sIJLX71taHD-BgbwY64Q,...,zvKfCAOBzVcxc1HLpoIY8A,zwKIQgthba1FUPWS7nOo0w,zwhSGiftT_yzKSEmMCol6Q,zwn53gHyn1NlX9h3jKFOUg,zyBC3BUkH9klhPhMyQmxAQ,zyHMtStYlKG67WRprp6GZQ,zyauuvAYdVweBK4L7wBRmw,zz4WGzntV59HqhefV5zigQ,zzin1d1oHi81GuI0ufo1VA,zzlkjDG9Rv8Jn-vSolMgyw
user_id,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
--0zxhZTSLZ7w1hUD2bEwA,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
--17Db1K-KujRuN7hY9Z0Q,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
--2vR0DIsmQ6WfcSzKWigw,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
--3WaS23LcIXtxyFULJHTA,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
--3l8wysfp49Z2TLnyT0vg,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [14]:
similarity_scores = cosine_similarity(user_item_matrix)
similarity_scores.shape
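A rough size estimate shows why a user-user similarity matrix is expensive at this scale. The sketch below assumes the result is a dense array of 8-byte floats and uses the shape reported by `user_item_matrix.shape`; the helper name `dense_similarity_bytes` is hypothetical.

```python
def dense_similarity_bytes(n_rows, bytes_per_value=8):
    """Bytes needed for a dense n_rows x n_rows matrix of 8-byte floats."""
    return n_rows * n_rows * bytes_per_value

n_users, n_items = 81142, 14323  # shape reported by user_item_matrix.shape

user_user_gb = dense_similarity_bytes(n_users) / 1e9
item_item_gb = dense_similarity_bytes(n_items) / 1e9

print(f"user-user similarity: about {user_user_gb:.1f} GB")
print(f"item-item similarity: about {item_item_gb:.1f} GB")
```

Under these assumptions an item-item matrix is far smaller than a user-user one, which is one reason to compute similarities between restaurants rather than between users.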


In [None]:
# Item-item similarity: transpose the matrix so that each row is a restaurant
item_similarity = cosine_similarity(user_item_matrix.T)

def recommend(restaurant_name, top_n=4):
    # Find the position of the input restaurant among the matrix columns
    index = user_item_matrix.columns.get_loc(restaurant_name)

    # Pair each restaurant position with its similarity to the input restaurant,
    # sort in descending order, and skip the first entry (the input itself)
    similar_items = sorted(enumerate(item_similarity[index]), key=lambda x: x[1], reverse=True)[1:top_n + 1]

    # Initialize an empty list to store recommended restaurant names
    data = []

    # Iterate through each similar item
    for i in similar_items:
        # Look up the column label (restaurant name) at that position
        similar_restaurant_name = user_item_matrix.columns[i[0]]

        # Append the restaurant name to the 'data' list
        data.append(similar_restaurant_name)

    # Return the 'data' list containing names of the recommended restaurants
    return data

### Collaborative-Filtering Recommendation System with FunkSVD

In [43]:
# User-Item Interaction Matrix
user_item_matrix = final_data.pivot_table(index='user_id', columns='restaurant_name', values='stars')

In [44]:
user_item_matrix.shape

(81142, 12192)

In [45]:
# Displaying the first few rows to get an initial glimpse of the data
user_item_matrix.head()

restaurant_name,Gruby's New York Deli,'Ohana,/pôr/ wine house,10 Barrel Brewing Portland,10 Degrees South,101 Beer Kitchen,101 By Teahaus,101 Steak,10th & Piedmont,110 Grill,...,laV,mmmpanadas,nati's southern seafood boil,sweetgreen,wagamama,wagamama - faneuil hall,wagamama - prudential,wagamama - seaport,zpizza,ñoños tacos
user_id,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
--0zxhZTSLZ7w1hUD2bEwA,,,,,,,,,,,...,,,,,,,,,,
--17Db1K-KujRuN7hY9Z0Q,,,,,,,,,,,...,,,,,,,,,,
--2vR0DIsmQ6WfcSzKWigw,,,,,,,,,,,...,,,,,,,,,,
--3WaS23LcIXtxyFULJHTA,,,,,,,,,,,...,,,,,,,,,,
--3l8wysfp49Z2TLnyT0vg,,,,,,,,,,,...,,,,,,,,,,


In [60]:
user_item_matrix.columns

Index([' Gruby's New York Deli', ''Ohana', '/pôr/ wine house',
       '10 Barrel Brewing Portland', '10 Degrees South', '101 Beer Kitchen',
       '101 By Teahaus', '101 Steak', '10th & Piedmont', '110 Grill',
       ...
       'laV', 'mmmpanadas', 'nati's southern seafood boil', 'sweetgreen',
       'wagamama', 'wagamama - faneuil hall', 'wagamama - prudential',
       'wagamama - seaport', 'zpizza', 'ñoños tacos'],
      dtype='object', name='restaurant_name', length=12192)

In [46]:
# Set the reader with accurate rating scale
my_reader = Reader(rating_scale=(1, 5))

# Create the dataset using the reader object and the rating DataFrame
my_dataset = Dataset.load_from_df(final_data[['user_id', 'restaurant_name', 'stars']], my_reader)

In [47]:
my_dataset

<surprise.dataset.DatasetAutoFolds at 0x3d8ed9bb0>

In [49]:
# Set the parameter grid
param_grid = {
    'n_factors': [100, 150], 
    'n_epochs': [10, 20],
    'lr_all': [0.005, 0.1],
    'biased': [False] } # biased=False turns off the baseline (bias) terms, so predictions use the latent factors only

# Set GridSearchCV with 3-fold cross-validation, scored by FCP (fraction of concordant pairs)
GS = GridSearchCV(SVD, param_grid, measures=['fcp'], cv=3)

# Run the grid search on the full dataset (the train-test split happens in a later cell)
GS.fit(my_dataset)

# Get the best hyperparameters
best_params = GS.best_params['fcp']
print("Best Hyperparameters:", best_params)

Best Hyperparameters: {'n_factors': 100, 'n_epochs': 10, 'lr_all': 0.005, 'biased': False}
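The grid search ranks parameter combinations by FCP, the fraction of concordant pairs: among pairs of items rated differently by the same user, the share that the model orders correctly. The sketch below is a simplified illustration that pools pairs over all users; Surprise's own `accuracy.fcp` aggregates the counts per user, so its values can differ. The function name and toy data are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

def fcp(predictions):
    """Simplified FCP: share of same-user rating pairs ranked in the right order.

    `predictions` is a list of (user, actual, estimated) tuples.
    """
    by_user = defaultdict(list)
    for user, actual, estimated in predictions:
        by_user[user].append((actual, estimated))

    concordant = discordant = 0
    for pairs in by_user.values():
        for (a1, e1), (a2, e2) in combinations(pairs, 2):
            if a1 == a2:
                continue  # ties in the true ratings carry no ordering information
            if (a1 - a2) * (e1 - e2) > 0:
                concordant += 1
            else:
                discordant += 1

    total = concordant + discordant
    return concordant / total if total else 0.0

# Hypothetical predictions for two users
toy_predictions = [
    ("u1", 5, 4.6), ("u1", 3, 3.9), ("u1", 1, 4.1),
    ("u2", 4, 3.0), ("u2", 2, 3.5),
]
print(fcp(toy_predictions))
```

Because FCP only looks at ordering, a model can score well on it while still being off in absolute terms, which is what RMSE and MAE measure.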


In [50]:
# Split train-test set 
trainset, testset = train_test_split(my_dataset, test_size=0.25)

In [51]:
# Set the algorithm
my_svd = FunkSVD(n_factors=100, 
                 n_epochs=10, 
                 lr_all=0.005, 
                 biased=False,
                 verbose=0)
# Fit train set
my_svd.fit(trainset)

# Test the algorithm using test set
my_pred = my_svd.test(testset)

In [52]:
# Access the P and Q matrices from the fitted model
P = my_svd.pu  # User factors, one row per user (P)
Q = my_svd.qi  # Item factors, one row per restaurant (Q)
Q  # Only the last expression in a cell is displayed

array([[ 0.06194114, -0.14112599, -0.31683039, ..., -0.16282319,
         0.17948535, -0.2134076 ],
       [-0.34110531,  0.29034201, -0.00711049, ...,  0.06596525,
         0.34702696, -0.17407634],
       [-0.13488896,  0.43430532, -0.50093401, ..., -0.14135051,
         0.17353653,  0.30471074],
       ...,
       [-0.02159553,  0.13798291,  0.01950472, ..., -0.07777211,
        -0.0902592 , -0.04440696],
       [-0.08594597,  0.1347763 , -0.08041756, ...,  0.09145995,
         0.18483861, -0.16068039],
       [-0.19825591, -0.11820167, -0.01589753, ...,  0.19685929,
         0.00525314, -0.26347986]])
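With `biased=False`, a predicted rating is the dot product of a user's row in P and a restaurant's row in Q, and both matrices are learned by stochastic gradient descent on the known ratings only. The following is a minimal pure-Python sketch of that idea, not Surprise's implementation; the toy ratings, the function names, and the hyperparameter values are all assumptions chosen for illustration.

```python
import random
from math import sqrt

def predict(P, Q, u, i):
    """Predicted rating: dot product of user vector P[u] and item vector Q[i]."""
    return sum(pf * qf for pf, qf in zip(P[u], Q[i]))

def rmse(P, Q, ratings):
    """Root mean squared error over the known (user, item, rating) triples."""
    return sqrt(sum((r - predict(P, Q, u, i)) ** 2 for u, i, r in ratings) / len(ratings))

def funk_svd(ratings, n_users, n_items, n_factors=2, n_epochs=200, lr=0.01, reg=0.02, seed=0):
    """Learn P (users x factors) and Q (items x factors) by SGD on known ratings only."""
    rng = random.Random(seed)
    P = [[rng.gauss(0, 0.1) for _ in range(n_factors)] for _ in range(n_users)]
    Q = [[rng.gauss(0, 0.1) for _ in range(n_factors)] for _ in range(n_items)]
    for _ in range(n_epochs):
        for u, i, r in ratings:
            err = r - predict(P, Q, u, i)
            for f in range(n_factors):
                pu, qi = P[u][f], Q[i][f]
                # Move both factors along the error gradient, with L2 regularization
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q

# Hypothetical (user index, item index, rating) triples; unrated pairs are simply absent
toy_ratings = [
    (0, 0, 5), (0, 1, 4), (0, 3, 1),
    (1, 0, 4), (1, 1, 5),
    (2, 0, 1), (2, 2, 5), (2, 3, 4),
    (3, 2, 4), (3, 3, 5),
]

P_before, Q_before = funk_svd(toy_ratings, n_users=4, n_items=4, n_epochs=0)
P_after, Q_after = funk_svd(toy_ratings, n_users=4, n_items=4)

print("Training RMSE before:", round(rmse(P_before, Q_before, toy_ratings), 3))
print("Training RMSE after: ", round(rmse(P_after, Q_after, toy_ratings), 3))
```

Unlike the zero-filled pivot table used for cosine similarity, this approach never treats a missing rating as a rating of 0.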

In [53]:
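# Put my_pred result in a dataframe
# (the item column is labelled 'business_id' but holds restaurant names,
# because the dataset was loaded with `restaurant_name`)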
df_prediction = pd.DataFrame(my_pred, columns=['user_id',
                                                'business_id',
                                                'actual',
                                                'prediction',
                                                'details'])

# Calculate the difference of actual and prediction into diff column
df_prediction['diff'] = abs(df_prediction['prediction'] - 
                            df_prediction['actual'])

In [54]:
# Check the df_prediction
df_prediction.head()

Unnamed: 0,user_id,business_id,actual,prediction,details,diff
0,hYlCMQ278BvKv9IP9v_m4w,Dinesty Dumpling House,1.0,3.298706,{'was_impossible': False},2.298706
1,XUQjZyApQXImNifP-2tAFQ,The Original Hoffbrau,5.0,3.201321,{'was_impossible': False},1.798679
2,Pf7FI0OukC_CEcCz0ZxoUw,KOi Fusion,5.0,4.448637,{'was_impossible': False},0.551363
3,g37Y_WmgPcJI9bf_kPV2Og,First Printer,4.0,2.085826,{'was_impossible': False},1.914174
4,ZveYZ3n1IOjP9H4HfFn3Yg,Fabian's,5.0,3.457422,{'was_impossible': False},1.542578


In [55]:
# See the best 10 predictions
df_prediction.sort_values(by='diff')[:10]

Unnamed: 0,user_id,business_id,actual,prediction,details,diff
242072,UZ8_xqhiguIYb9Lu2Wu8og,Museum Of Fine Arts,5.0,5.0,{'was_impossible': False},0.0
49595,9EB_WZ5Lw991mrnfkzkqvQ,Sushi Zanmai,5.0,5.0,{'was_impossible': False},0.0
102938,oSN3M4_WKdlTsnpgqPDiBg,Powell's City of Books,5.0,5.0,{'was_impossible': False},0.0
240896,lGxssT2UmyNZQZWwPDgX3A,Bar Mezzana,5.0,5.0,{'was_impossible': False},0.0
102990,0d89GUvxpJG4oFeL9rtUxQ,Tako Cheena,5.0,5.0,{'was_impossible': False},0.0
240911,nxI8n6lARJpMP5SI8U9S6w,Le Pigeon,5.0,5.0,{'was_impossible': False},0.0
6394,g3UbQdtWX1Luh9_FGIeCAw,Schmidt's Sausage Haus,5.0,5.0,{'was_impossible': False},0.0
102997,Je-c4Qu5od0DwPmYeHYOVg,Screen Door,5.0,5.0,{'was_impossible': False},0.0
280476,krWkC-U2U_YAtYdAvuRwAQ,Santarpio's Pizza,5.0,5.0,{'was_impossible': False},0.0
49526,7mL5GK8Qt3iIkNHfPsGnkg,Ball Square Cafe,5.0,5.0,{'was_impossible': False},0.0


In [56]:
# Share of predictions within one star of the actual rating
(df_prediction["diff"] <= 1).mean()

0.6014563800547057

In [57]:
# Calculate RMSE
rmse = accuracy.rmse(my_pred)

# Calculate MAE
mae = accuracy.mae(my_pred)

RMSE: 1.3122
MAE:  1.0054
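As a worked example of how these two metrics are computed, the sketch below recalculates them by hand for the five rows shown in the `df_prediction.head()` output. It covers only those five rows, so the values differ from the full test-set figures; the variable names are hypothetical.

```python
from math import sqrt

# (actual, prediction) pairs copied from the first five rows of df_prediction
pairs = [
    (1.0, 3.298706),
    (5.0, 3.201321),
    (5.0, 4.448637),
    (4.0, 2.085826),
    (5.0, 3.457422),
]

errors = [prediction - actual for actual, prediction in pairs]

# MAE averages the absolute errors; RMSE squares them first, so large misses weigh more
toy_mae = sum(abs(e) for e in errors) / len(errors)
toy_rmse = sqrt(sum(e * e for e in errors) / len(errors))

# Same idea as (df_prediction["diff"] <= 1).mean()
toy_within_one_star = sum(abs(e) <= 1 for e in errors) / len(errors)

print(f"MAE:  {toy_mae:.4f}")
print(f"RMSE: {toy_rmse:.4f}")
print(f"Share within one star: {toy_within_one_star:.2f}")
```

RMSE is never smaller than MAE on the same errors, and the gap between them grows when a few predictions are far off.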


In [58]:
# One possible item-to-item approach: recommend the restaurants whose latent
# vectors in Q are closest to the input restaurant's vector
def recommend(restaurant_name, trainset, Q, top_n=5):
    # Surprise stores items under internal ids, so translate the raw name first
    # (to_inner_iid raises ValueError if the restaurant is not in the training set)
    inner_id = trainset.to_inner_iid(restaurant_name)

    # Cosine similarity between this restaurant's latent vector and every item vector
    norms = np.linalg.norm(Q, axis=1) * np.linalg.norm(Q[inner_id])
    similarities = (Q @ Q[inner_id]) / np.where(norms == 0, 1.0, norms)

    # Rank items from most to least similar and drop the input restaurant itself
    ranked = [i for i in np.argsort(similarities)[::-1] if i != inner_id][:top_n]

    # Convert the internal ids back to restaurant names
    return [trainset.to_raw_iid(int(i)) for i in ranked]

In [59]:
recommend('Miku', trainset, Q, top_n=5)