## Modelling

In this section we will create a recommendation system using the datasets to solve our main problem.
There are different types of recommendation models. In this project we will focus on three types of recommendation systems:

* 1. Content-Based Recommender systems
* 2. Collaborative Filtering Systems
* 3. Deep Neural Networks

Within each of these categories we will compare different models and see which ones perform best. For validation and comparison we will use the RMSE (root mean squared error) metric, which measures how far the predictions are from the true values on average, penalizing large errors more heavily.
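
As a minimal sketch of the metric, assuming plain Python lists of true and predicted ratings (the function name `rmse` and the toy numbers are illustrative, not taken from the project):

```python
import math

def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Toy ratings, made up for illustration
actual = [5, 3, 4, 1]
predicted = [4, 3, 5, 3]
print(rmse(actual, predicted))  # square root of 1.5, roughly 1.22
```

Because the errors are squared before averaging, the single two-point miss contributes more to the score than the two one-point misses combined.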

### 1. CONTENT BASED FILTERING

By using restaurant features such as the cuisines offered and attributes like WiFi, Alcohol, Happy Hour, Noise Level, Attire, Wheelchair Accessibility and Table Service, we can use cosine similarity to recommend the restaurants that are most similar to a given one.
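
To illustrate the idea, here is a small sketch of cosine similarity over attribute tokens. The attribute strings are hypothetical examples in the style of the `attributes_true` column, and the raw token counts stand in for the TF-IDF weights used later.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two token-count dictionaries."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Hypothetical attribute strings
cafe = Counter("WiFi Alcohol HappyHour".split())
bar = Counter("Alcohol HappyHour RestaurantsTableService".split())
diner = Counter("WheelchairAccessible BikeParking".split())

print(cosine(cafe, bar))    # two of three attributes shared
print(cosine(cafe, diner))  # no shared attributes
```

Restaurants that share more attributes score closer to 1, and restaurants with nothing in common score 0.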



In [22]:
# Suppressing warnings
import warnings
warnings.filterwarnings('ignore')

# Core libraries
import numpy as np
import pandas as pd
import pickle
import requests

# Text processing and NLP
from sklearn.feature_extraction.text import TfidfVectorizer
from nltk.corpus import stopwords

# Machine learning and model selection
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from surprise import Reader, Dataset
from surprise.model_selection import cross_validate, GridSearchCV

# Deep learning with TensorFlow
from tensorflow.keras import models, layers, optimizers, losses, regularizers, metrics

# Visualization
import matplotlib.pyplot as plt
import seaborn as sns
import folium

# Utility functions
from tabulate import tabulate

# Custom imports
from understanding import DataLoader, DataInfo


#### i) Cleaned Restaurant Informational Data

In [28]:
# Loading the restaurant data from the pickled file
df = pd.read_pickle('pickled_files/restaurant_data.pkl')

# Overview of dataset information to understand the features we require
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 38552 entries, 0 to 38551
Data columns (total 16 columns):
 #   Column           Non-Null Count  Dtype  
---  ------           --------------  -----  
 0   business_id      38552 non-null  object 
 1   name             38552 non-null  object 
 2   address          38552 non-null  object 
 3   city             38552 non-null  object 
 4   state            38552 non-null  object 
 5   postal_code      38552 non-null  object 
 6   latitude         38552 non-null  float64
 7   longitude        38552 non-null  float64
 8   stars            38552 non-null  float64
 9   review_count     38552 non-null  int64  
 10  is_open          38552 non-null  int64  
 11  attributes       38552 non-null  object 
 12  categories       38552 non-null  object 
 13  hours            38552 non-null  object 
 14  location         38552 non-null  object 
 15  attributes_true  38552 non-null  object 
dtypes: float64(3), int64(2), object(11)
memory usage: 4.7+ MB


In [29]:
# Preprocessing function
def preprocess(df):
    """
    Function to preprocess the data by combining the needed features into one column
    Returns a dataframe with the combined_features column
    """
    filtered_df=df.copy()
    # Combining the features into one column
    filtered_df['combined_features'] = (
                                        filtered_df['attributes'] + " " +
                                        filtered_df['attributes_true'] 
                                        )
    # resetting the index
    filtered_df = filtered_df.reset_index(drop=True)

    # Return the dataframe with the combined_features column
    return filtered_df

In [30]:
# Vectorization function
def create_feature_vectors(df):
    """
    Performing vectorization of the preprocessed categorical features 
    and combining with the numerical features
    """
    # Vectorize the combined text features
    tfidf = TfidfVectorizer(stop_words='english')
    tfidf_matrix = tfidf.fit_transform(df['combined_features'])
    
    # Combine the TF-IDF matrix with numerical columns
    numerical_features = df[['stars']].values
    combined_features = np.hstack((tfidf_matrix.toarray(), numerical_features))
    
    return combined_features
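
One point worth checking in this design: `TfidfVectorizer` L2-normalises each row by default, so every text vector has length 1, while the appended `stars` value ranges up to 5. The sketch below uses made-up two-dimensional "text" vectors to show how an unscaled star column can dominate the cosine similarity. The helper `append_stars` and its `weight` parameter are hypothetical, not part of the notebook.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)

def append_stars(text_vec, stars, weight=1.0, max_stars=5.0):
    """Append a star rating to a text vector; weight=1.0 with max_stars=1.0 means raw stars."""
    return text_vec + [weight * stars / max_stars]

# Two unit-length text vectors with nothing in common
text_a = [1.0, 0.0]
text_b = [0.0, 1.0]

raw_a = append_stars(text_a, 4.0, weight=1.0, max_stars=1.0)  # raw stars appended
raw_b = append_stars(text_b, 4.0, weight=1.0, max_stars=1.0)
scaled_a = append_stars(text_a, 4.0, weight=0.25)             # stars scaled down
scaled_b = append_stars(text_b, 4.0, weight=0.25)

print(cosine(text_a, text_b))      # text only
print(cosine(raw_a, raw_b))        # raw stars dominate
print(cosine(scaled_a, scaled_b))  # down-weighted stars
```

With raw stars, two restaurants with no attributes in common but the same rating look highly similar, so scaling or down-weighting the numeric column may be worth considering.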

Using the cosine similarity matrix we will now create a content-based recommendation system that offers recommendations to users based on the restaurant names or text words representing the specifications of their desired restaurant and attributes.

We use the cosine similarity matrix to compare similarities between different restaurants and the customer's preferences, then pick the top n similar restaurants to recommend based on his/her input.
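
The top-n step can be sketched in plain Python as follows. This is an illustration on a made-up feature matrix, and `top_n_similar` is a hypothetical helper rather than the function defined in the next cell.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def top_n_similar(query_idx, vectors, n=3):
    """Return (index, score) pairs for the n rows most similar to the query row."""
    query = vectors[query_idx]
    scores = [(i, cosine(query, v)) for i, v in enumerate(vectors) if i != query_idx]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:n]

# Made-up binary feature rows; row 0 is the restaurant the user asked about
features = [
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 1, 1],
    [1, 0, 0, 0],
]
print(top_n_similar(0, features, n=2))
```

The query row is excluded before sorting, which mirrors the `exclude_names` filter in the notebook's function.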

In [32]:
# Recommendation function
def recommendation(df, state, name=None, category=None):
    """
    Creates recommendation based on name or category/cuisine using cosine similarity and filtering
    Returns a dataframe containing name, state, city, address, stars and categories
    """
    preprocessed = preprocess(df)
    
    def cuisines(cuisine=None, state=state):
        """
        Function to filter to get the recommendations based on cuisine input
        """
        preprocessed=df[df["state"]==state]
        cuisine_df = preprocessed[preprocessed['categories'] == cuisine]
        cuisine_df_sorted = cuisine_df.sort_values(by=["stars", "city"], ascending=False)
        return cuisine_df_sorted[['name', 'state', 'city', 'stars', 'address', 'categories']]
    
    if name:
        if name not in preprocessed['name'].values:
            raise ValueError(f"Restaurant with name '{name}' not found in the filtered data.")

        # Finding the index of the restaurant name
        idx = preprocessed[preprocessed['name'] == name].index[0]
        exclude_names = [name]

        # Locating the restaurant row in the preprocessed df 
        row_to_add = preprocessed.iloc[idx]
        
        # converting it to a df
        row_to_add_df = pd.DataFrame([row_to_add])     
        
        # generating a df for only the state we want to recommend in
        specific_state= preprocessed[preprocessed["state"] == state]
        
        # concatenating it to the specific state df and resetting the index
        specific_state = pd.concat([specific_state, row_to_add_df]).reset_index(drop=True)
        
        # Finding the new index for the restaurant name
        idx = specific_state[specific_state['name'] == name].index[0]
        
        # Creating feature vectors
        combined_features = create_feature_vectors(specific_state)

        # Finding the cosine similarity
        cosine_sim = cosine_similarity(combined_features, combined_features)

        # Finding the top indices of the restaurants to recommend
        sim_scores = list(enumerate(cosine_sim[idx]))
        sim_scores = sorted(sim_scores, key=lambda x: x[1], reverse=True)
        top_indices = [i[0] for i in sim_scores]  

        # Finding the rows of the top recommended restaurants
        recommended_restaurants = specific_state.iloc[top_indices]
        recommended_restaurants = recommended_restaurants[~recommended_restaurants['name'].isin(exclude_names)]        

        # Return a df with the required features
        return recommended_restaurants[['name', 'state', 'city', 'stars', 'address','categories']].drop_duplicates(subset='name')[:20]
    
    elif category:
        # Filter based on cuisine/category
        return cuisines(category)

The recommendation function uses content-based techniques to provide restaurant recommendations for a given state, based either on the name of a restaurant the user likes or on a cuisine category.

In [33]:
# Example recommendations based on state, name
restaurants = recommendation(df, state="Indiana",  name="Coup de Taco")
restaurants.head()

| | name | state | city | stars | address | categories |
|---|---|---|---|---|---|---|
| 1785 | El Torito Grill | Indiana | Indianapolis | 4.5 | 8650 Keystone Crossing | Mexican |
| 40 | Taste of China | Indiana | Whiteland | 4.0 | 989 N US 31 | Chinese |
| 2133 | Hong Kong Inn | Indiana | Indianapolis | 4.0 | 8079 E 38th St | Chinese |
| 2062 | Diarra's Cuisine | Indiana | Indianapolis | 3.5 | 2989 W 71st St, Ste 3 | African |
| 458 | WB Pizza | Indiana | Indianapolis | 4.5 | 2290 W 86th St | American (Traditional) |


In [34]:
# Example recommendations based on state, category/cuisine
cuisines = recommendation(df, state="Indiana",  category="Italian")
cuisines.head()

| | name | state | city | stars | address | categories |
|---|---|---|---|---|---|---|
| 19189 | Greek’s Pizzeria- Indianapolis | Indiana | Indianapolis | 5.0 | 1601 Columbia Ave | Italian |
| 30169 | I Tre Mori | Indiana | Indianapolis | 5.0 | 8220 E 106th St, Ste 200 | Italian |
| 35845 | The Twisted Sicilian | Indiana | Indianapolis | 5.0 | Unknown | Italian |
| 12466 | Ciao by Villaggio | Indiana | Zionsville | 4.5 | 40 S Main St | Italian |
| 21713 | Convivio Italian Artisan Cuisine - Zionsville | Indiana | Zionsville | 4.5 | 40 S Main St | Italian |


### 2. COLLABORATIVE FILTERING MODELS


Here we build a collaborative filtering recommendation system using the Surprise library. The steps are: selecting the relevant columns, initializing a Reader object to specify the rating scale, and loading the data into a Surprise Dataset object for analysis and model building.

#### ii) Cleaned User Review Data

In [35]:
# Loading the users csv file
users_data= pd.read_csv("data/users (1).csv")

# Summary information on the user review data
print(f'\nUSER DATASET INFORMATION\n' + '=='*20 + '\n')
users_data.info()


USER DATASET INFORMATION

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4724684 entries, 0 to 4724683
Data columns (total 3 columns):
 #   Column       Dtype 
---  ------       ----- 
 0   user_id      object
 1   business_id  object
 2   stars        int64 
dtypes: int64(1), object(2)
memory usage: 108.1+ MB


In [36]:
# merging the two datasets into one on the shared business_id key

data=pd.merge(left=users_data, right=df, how='inner', on='business_id')

# previewing the new merge dataset
data.head()


Unnamed: 0,user_id,business_id,stars_x,name,address,city,state,postal_code,latitude,longitude,stars_y,review_count,is_open,attributes,categories,hours,location,attributes_true
0,_7bHUi9Uuf5__HHc_Q8guQ,kxX2SOes4o-D3ZQBkiMRfA,5,Zaika,2481 Grant Ave,Philadelphia,Pennsylvania,19114,40.079848,-75.02508,4.0,181,1,"{'Caters': 'True', 'Ambience': ""{'romantic': F...",Halal,"{'Tuesday': '11:0-21:0', 'Wednesday': '11:0-21...","State:Pennsylvania, City:Philadelphia, Address...",Caters Ambience_casual BikeParking Restaurants...
1,_7bHUi9Uuf5__HHc_Q8guQ,kxX2SOes4o-D3ZQBkiMRfA,5,Zaika,2481 Grant Ave,Philadelphia,Pennsylvania,19114,40.079848,-75.02508,4.0,181,1,"{'Caters': 'True', 'Ambience': ""{'romantic': F...",Pakistani,"{'Tuesday': '11:0-21:0', 'Wednesday': '11:0-21...","State:Pennsylvania, City:Philadelphia, Address...",Caters Ambience_casual BikeParking Restaurants...
2,_7bHUi9Uuf5__HHc_Q8guQ,kxX2SOes4o-D3ZQBkiMRfA,5,Zaika,2481 Grant Ave,Philadelphia,Pennsylvania,19114,40.079848,-75.02508,4.0,181,1,"{'Caters': 'True', 'Ambience': ""{'romantic': F...",Indian,"{'Tuesday': '11:0-21:0', 'Wednesday': '11:0-21...","State:Pennsylvania, City:Philadelphia, Address...",Caters Ambience_casual BikeParking Restaurants...
3,kSMOJwJXuEUqzfmuFncK4A,kxX2SOes4o-D3ZQBkiMRfA,2,Zaika,2481 Grant Ave,Philadelphia,Pennsylvania,19114,40.079848,-75.02508,4.0,181,1,"{'Caters': 'True', 'Ambience': ""{'romantic': F...",Halal,"{'Tuesday': '11:0-21:0', 'Wednesday': '11:0-21...","State:Pennsylvania, City:Philadelphia, Address...",Caters Ambience_casual BikeParking Restaurants...
4,kSMOJwJXuEUqzfmuFncK4A,kxX2SOes4o-D3ZQBkiMRfA,2,Zaika,2481 Grant Ave,Philadelphia,Pennsylvania,19114,40.079848,-75.02508,4.0,181,1,"{'Caters': 'True', 'Ambience': ""{'romantic': F...",Pakistani,"{'Tuesday': '11:0-21:0', 'Wednesday': '11:0-21...","State:Pennsylvania, City:Philadelphia, Address...",Caters Ambience_casual BikeParking Restaurants...


### Renaming columns

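After the merge, **stars_x** holds the rating given by an individual user (from the user review data) and **stars_y** holds the restaurant's overall star rating (from the restaurant data). The cell below renames **stars_x** to **b/s_rating** and **stars_y** to **rating**.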

In [37]:
data.rename(columns={'stars_x':'b/s_rating', 'stars_y':'rating'}, inplace=True)
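
The `_x` and `_y` suffixes come from pandas' default handling of overlapping column names in `pd.merge`. As a hedged alternative sketch, the `suffixes` argument can name the two columns at merge time so that their origin stays obvious. The tiny frames and the suffix names `_user` and `_business` are assumptions made for illustration.

```python
import pandas as pd

# Toy stand-ins for the review and restaurant tables
reviews = pd.DataFrame({
    "user_id": ["u1", "u2"],
    "business_id": ["b1", "b1"],
    "stars": [5, 2],            # rating given by each user
})
restaurants = pd.DataFrame({
    "business_id": ["b1"],
    "name": ["Zaika"],
    "stars": [4.0],             # restaurant's overall star rating
})

merged = pd.merge(reviews, restaurants, on="business_id", how="inner",
                  suffixes=("_user", "_business"))
print(merged.columns.tolist())
```

For collaborative filtering, the per-user column is normally the one a model learns from, so it helps to keep the two ratings clearly distinguished.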

In [38]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4404612 entries, 0 to 4404611
Data columns (total 18 columns):
 #   Column           Dtype  
---  ------           -----  
 0   user_id          object 
 1   business_id      object 
 2   b/s_rating       int64  
 3   name             object 
 4   address          object 
 5   city             object 
 6   state            object 
 7   postal_code      object 
 8   latitude         float64
 9   longitude        float64
 10  rating           float64
 11  review_count     int64  
 12  is_open          int64  
 13  attributes       object 
 14  categories       object 
 15  hours            object 
 16  location         object 
 17  attributes_true  object 
dtypes: float64(3), int64(3), object(12)
memory usage: 604.9+ MB


> First, we will fit two baseline models with default parameters: a NormalPredictor and an SVD() model.

## Baseline model

In [39]:
import pandas as pd
from surprise import Dataset, Reader, SVD, accuracy, NormalPredictor
from surprise.model_selection import train_test_split as surprise_train_test_split

# selecting specific columns that are relevant for collaborative filtering models
new_df = data[['user_id', 'business_id', 'rating']]

# using Reader() from surprise module to convert dataframe into surprise dataformat
# instantiating a readerobject
reader = Reader(rating_scale=(1, 5))

# using the reader to read the trainset
data_2 = Dataset.load_from_df(new_df,reader)

dataset = data_2.build_full_trainset()

print('Number of users: ', dataset.n_users, '\n')
print('Number of Restaurants: ', dataset.n_items)

Number of users:  1134020 

Number of Restaurants:  29354


In [41]:
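from surprise.model_selection import GridSearchCV, cross_validate
# Split the data into training and test sets with Surprise's splitter
# (imported earlier as surprise_train_test_split so sklearn's train_test_split is not shadowed)
trainset, testset = surprise_train_test_split(data_2, test_size=0.25)

# Initialize the NormalPredictor algorithm (random ratings drawn from the training distribution)
model = NormalPredictor()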

# Train the model on the training set
model.fit(trainset)

# Predict ratings for the test set
predictions = model.test(testset)

# Compute rmse
accuracy.rmse(predictions)

RMSE: 0.8108


0.8108282260080057

In [42]:
# Initialize the SVD algorithm
model = SVD()

# Train the model on the training set
model.fit(trainset)

# Predict ratings for the test set
predictions = model.test(testset)

# Compute RMSE
accuracy.rmse(predictions)

RMSE: 0.0909


0.09089867290929593

In [None]:
# using cross-validate to get the test rmse scores for 5 splits
results=cross_validate(model, data_2, cv=5, n_jobs=-1)


for values in results.items():
    print(values)
print("-------------------------")
print("Mean RMSE: ",results['test_rmse'].mean())

We will use GridSearchCV to tune the SVD model and try to improve its cross-validated RMSE.

In [14]:
# define a dictionary params with hyperparameter values to be tested
params = {'n_factors': [20, 50, 100], # number of factors for matrix factorization
         'reg_all': [0.02, 0.05, 0.1]} # regularization term
# create a GridSearchCV object 'g_s_svd' for hyperparameter tuning
g_s_svd = GridSearchCV(SVD,param_grid=params,n_jobs=-1) # specify the algorithm (SVD) to be tuned
# fit the GridSearchCV object to the data to find the best hyperparameters
g_s_svd.fit(data_2)

Here we perform hyperparameter tuning for the SVD collaborative filtering model using grid search and cross-validation. It tests different values of the number of latent factors (n_factors) and the regularization term (reg_all) to find the combination that results in the best model performance. The final best hyperparameters can be accessed from the g_s_svd object for use in the model.
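
Conceptually, a grid search is an exhaustive loop over every combination of the listed values. The sketch below is not Surprise's implementation. `grid_search` and `toy_score` are hypothetical names, and the scoring function is a made-up error surface standing in for a cross-validated RMSE.

```python
from itertools import product

def grid_search(param_grid, score_fn):
    """Try every combination in param_grid; return (best_params, best_score). Lower is better."""
    names = list(param_grid)
    best_params, best_score = None, float("inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(**params)
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score

def toy_score(n_factors, reg_all):
    # Made-up error surface with its minimum at n_factors=50, reg_all=0.05
    return abs(n_factors - 50) / 100 + abs(reg_all - 0.05)

params = {"n_factors": [20, 50, 100], "reg_all": [0.02, 0.05, 0.1]}
best_params, best_score = grid_search(params, toy_score)
print(best_params, best_score)
```

With three values for each of two hyperparameters the grid has nine combinations, so with 5-fold cross-validation (which, to my understanding, is Surprise's default) the real search fits 45 models.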

In [15]:
print(g_s_svd.best_score)
print(g_s_svd.best_params)

{'rmse': 1.2171066185237724, 'mae': 0.9478990451899783}
{'rmse': {'n_factors': 100, 'reg_all': 0.02}, 'mae': {'n_factors': 100, 'reg_all': 0.02}}


The best RMSE found by the grid search is approximately 1.217. This is the model's typical prediction error in rating points, with larger errors weighted more heavily. Lower RMSE values signify better predictive accuracy.  
The best MAE is approximately 0.948, the average absolute difference between predicted and actual ratings. A lower MAE likewise indicates better prediction accuracy.  
The best-performing hyperparameter values are as follows:  
1) For RMSE, the optimal hyperparameters are 'n_factors' = 100 and 'reg_all' = 0.02.
2) For MAE, the optimal hyperparameters are 'n_factors' = 100 and 'reg_all' = 0.02.  
Both metrics select the same configuration, which uses the largest number of factors and the weakest regularization in the grid. This suggests that values beyond the tested range may be worth exploring.

In [16]:
# created an instance of the SVD model with specified hyperparameters
svd = SVD(n_factors= 20, reg_all=0.02)
# fit the SVD model to the dataset
svd.fit(dataset)

<surprise.prediction_algorithms.matrix_factorization.SVD at 0x7f21e74f10d0>

The cell above initializes an SVD model with n_factors=20 and reg_all=0.02 and trains it on the full training set built earlier (dataset). The trained model can then predict ratings for user-restaurant pairs, which is the basis for personalized recommendations.

In [17]:
# using the model, we'll try and make a rating prediction of user 15, on restaurant with id "Pns2l4eNsfO8kk83dixA6A"
svd.predict("15", "Pns2l4eNsfO8kk83dixA6A")

Prediction(uid='15', iid='Pns2l4eNsfO8kk83dixA6A', r_ui=None, est=3.8457184924429697, details={'was_impossible': False})
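
For context, Surprise documents the SVD estimate as the global mean plus a user bias, an item bias, and the dot product of the user and item factor vectors. For a user the model has never seen, the user bias and factors are taken as zero. The sketch below illustrates that formula with made-up numbers. `predict_rating` is a hypothetical helper, not a Surprise function.

```python
def predict_rating(global_mean, user_bias, item_bias, user_factors, item_factors,
                   rating_scale=(1, 5)):
    """Biased matrix-factorisation estimate, clipped to the rating scale."""
    interaction = sum(p * q for p, q in zip(user_factors, item_factors))
    estimate = global_mean + user_bias + item_bias + interaction
    low, high = rating_scale
    return min(high, max(low, estimate))

# Known user and item (all numbers made up)
known = predict_rating(3.7, 0.2, -0.1, [0.5, -0.3], [0.4, 0.1])

# Unknown user: bias and factors are zero, leaving global mean + item bias
unknown = predict_rating(3.7, 0.0, -0.1, [0.0, 0.0], [0.4, 0.1])

print(known, unknown)
```

If "15" is not among the training user IDs, the prediction above would fall into the second case and would not be personalised to that user.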

The function below lets a user interactively rate a specified number of randomly sampled restaurants and collects the ratings in a list of (business_id, rating) pairs for use in the recommendation system.

In [None]:
def collect_ratings(df, num_samples=5):
    sampled_restaurants = df.sample(n=num_samples)
    ratings = []
    for _, row in sampled_restaurants.iterrows():
        restaurant_id = row['business_id']
        print(f"Please rate {row['name']} on a scale of 1 to 5:")
        rating = int(input())
        ratings.append((restaurant_id, rating))
    return ratings
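
Since `int(input())` raises an error on non-numeric input and accepts values outside the scale, a small validation helper may be useful. This is a sketch under the assumption that ratings must be whole numbers from 1 to 5, and `parse_rating` is a hypothetical name.

```python
def parse_rating(text, low=1, high=5):
    """Return the rating as an int if text is a whole number in [low, high], else None."""
    try:
        value = int(text.strip())
    except ValueError:
        return None
    return value if low <= value <= high else None

print(parse_rating("4"))    # valid rating
print(parse_rating("7"))    # out of range
print(parse_rating("abc"))  # not a number
```

A caller could keep prompting until `parse_rating` returns something other than `None`.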

In [None]:
def recommend_restaurants(user_id, rated_restaurants, all_restaurants_df=trial, state=None):

    # Restrict to the requested state, if one was given
    if state is not None:
        all_restaurants_df = all_restaurants_df[all_restaurants_df["state"] == state]

    # Get all restaurant IDs
    all_restaurant_ids = all_restaurants_df['business_id'].unique()

    # Filter out the restaurants that the user has already rated
    rated_ids = {rid for rid, _ in rated_restaurants}
    unrated_restaurants = [rid for rid in all_restaurant_ids if rid not in rated_ids]

    # Predict ratings for all unrated restaurants
    predictions = [svd.predict(user_id, rid) for rid in unrated_restaurants]
    

    # Create a DataFrame for the predictions
    pred_df = pd.DataFrame({
        'business_id': [pred.iid for pred in predictions],
        'predicted_rating': [pred.est for pred in predictions]
    })

    # Merge with the original restaurants DataFrame to get more information
    recommendations = pred_df.merge(all_restaurants_df, on='business_id', how='left')

    # Sort by predicted rating and get top recommendations
    recommendations = recommendations.sort_values(by='predicted_rating', ascending=False)
    
    return recommendations

In [None]:
def recommend_businesses(user_id, n=5):
    # Get all unique business IDs
    all_business_ids = df_collab['business_id'].unique()
    
    # Predict ratings for all businesses the user hasn't rated yet
    user_rated_businesses = df_collab[df_collab['user_id'] == user_id]['business_id']
    recommendations = []
    
    for business_id in all_business_ids:
        if business_id not in user_rated_businesses.values:
            pred = model.predict(user_id, business_id)
            recommendations.append((business_id, pred.est))
    
    # Sort by estimated rating and return top-n
    recommendations = sorted(recommendations, key=lambda x: x[1], reverse=True)
    return recommendations[:n]
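
When only the top few predictions are needed, the full sort can be replaced by a partial selection. A minimal sketch, assuming the predictions are already available as (business_id, estimated_rating) pairs. The IDs and scores are made up and `top_n` is a hypothetical helper.

```python
import heapq

def top_n(predictions, n=5):
    """Return the n pairs with the highest estimated rating, best first."""
    return heapq.nlargest(n, predictions, key=lambda pair: pair[1])

# Made-up predictions for illustration
predictions = [("b1", 3.2), ("b2", 4.7), ("b3", 4.1), ("b4", 2.9)]
already_rated = {"b2"}  # a set gives constant-time membership checks

candidates = [p for p in predictions if p[0] not in already_rated]
print(top_n(candidates, n=2))
```

`heapq.nlargest` avoids sorting every candidate, which matters more as the number of restaurants grows.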



In [None]:
# Collect ratings from the user
print("You will be asked to rate 5 random restaurants.")
user_ratings = collect_ratings(restaurant_data)

user_id = 'user_1'
recommended_restaurants = recommend_restaurants(user_id=user_id, rated_restaurants=user_ratings, state= "Pennsylvania").drop_duplicates(subset='name')[:20]
recommended_restaurants.head()

In [None]:
import requests

In [None]:
# Load your restaurant data
df = pd.read_csv('data/filtered_restaurants_data.csv')


# Your Yelp API key, read from an environment variable instead of being hard-coded
# (the variable name YELP_API_KEY is an assumption; use whatever name you export)
import os
API_KEY = os.environ.get('YELP_API_KEY', '')

# Yelp Business Endpoint
YELP_BUSINESS_URL = "https://api.yelp.com/v3/businesses/"

# Headers for the API request
headers = {
    'Authorization': f'Bearer {API_KEY}',
}

def get_business_image_urls(business_id):
    response = requests.get(f'{YELP_BUSINESS_URL}{business_id}', headers=headers)
    
    if response.status_code == 200:
        business_data = response.json()
        # Extract the image URLs
        image_urls = business_data.get('photos', [])
        return image_urls
    elif response.status_code == 429:
        print("Rate limit exceeded. Please try again later.")
    else:
        print(f"Failed to retrieve data for Business ID: {business_id}, Status Code: {response.status_code}")
    
    return []

In [None]:
get_business_image_urls('Ep_jh1Pt4Ggyla21f-BQcQ')
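
Because the function above gives up as soon as it sees a 429 response, a retry with exponential backoff may help when fetching images for many restaurants. The sketch below is independent of the Yelp API. `fetch` is a hypothetical zero-argument callable returning a `(status_code, payload)` pair, and the delays are illustrative.

```python
import time

def fetch_with_backoff(fetch, max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call fetch() until it returns a status other than 429 or attempts run out."""
    status, payload = None, None
    for attempt in range(max_attempts):
        status, payload = fetch()
        if status != 429:
            return status, payload
        if attempt < max_attempts - 1:
            sleep(base_delay * 2 ** attempt)  # wait 1s, 2s, 4s, ...
    return status, payload

# Simulated responses: rate-limited once, then successful
responses = iter([(429, None), (200, {"photos": ["a.jpg"]})])
waits = []
result = fetch_with_backoff(lambda: next(responses), sleep=waits.append)
print(result, waits)
```

Passing `sleep` as a parameter keeps the sketch testable without actually waiting.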

### 3. Neural Networks - Model

We will run a Keras deep neural network to implement a recommendation system and try to improve our RMSE scores by using neural networks.

> We are going to encode the user_id and business_id features into numeric integers in preparation for the deep learning model.

In [37]:
# Encoding the user_id column
user_encoder = LabelEncoder()                                    # instantiating the encoder
data['userId'] = user_encoder.fit_transform(data.user_id.values) # fitting the encoder and transforming our column
n_users = data['userId'].nunique()                               # assigning the number of users to the n_users variable
print("Number of Users: ",n_users)

# Encoding the business_id column
item_encoder = LabelEncoder()                                          # instantiating the encoder
data['restId'] = item_encoder.fit_transform(data.business_id.values)   # fitting the encoder and transforming our column
n_rests = data['restId'].nunique()                                     # assigning the number of restaurants to the n_rests variable
print("Number of Restaurants: ",n_rests)

Number of Users:  220872
Number of Restaurants:  31834
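
To show what the encoding step produces, here is a plain-Python sketch of label encoding: each distinct ID is mapped to an integer index in sorted order, which is also how scikit-learn's `LabelEncoder` orders its classes. The shortened IDs are made up and `fit_label_encoder` is a hypothetical helper.

```python
def fit_label_encoder(values):
    """Map each distinct value to an integer index, in sorted order."""
    classes = sorted(set(values))
    return {value: index for index, value in enumerate(classes)}

ids = ["kxX2", "_7bH", "kSMO", "_7bH"]  # shortened, made-up IDs
mapping = fit_label_encoder(ids)
encoded = [mapping[v] for v in ids]

# The inverse mapping recovers the original ID from a model index
inverse = {index: value for value, index in mapping.items()}
print(encoded, inverse[encoded[0]])
```

Keeping a separate encoder per column matters because the inverse mapping is what turns a predicted index back into a real user or restaurant ID.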


> Splitting the data into training and testing sets for model evaluation.

In [38]:
# subsetting the x variable
X = data[['userId', 'restId']].values
# subsetting the y variable
y = data['rating'].values

# creating the train test splits and stratifying on basis of the y values 
# because of the uneven nature of the rating counts
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42, stratify=y)

print(X_train.shape, y_train.shape)
print(X_test.shape, y_test.shape)

(428157, 2) (428157,)
(107040, 2) (107040,)


> Calculate the minimum and maximum ratings, which will be used to scale the output of the neural network later.

In [39]:
# Find the minimum and maximum rating
min_rating = min(data['rating'])
max_rating = max(data['rating'])
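
These two values are used later to stretch a sigmoid output, which lies between 0 and 1, onto the rating range. A minimal sketch of that scaling, assuming a 1 to 5 range. The function names are illustrative.

```python
import math

def sigmoid(z):
    """Logistic function, mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def scale_to_ratings(unit_value, min_rating, max_rating):
    """Linearly map a value in [0, 1] onto [min_rating, max_rating]."""
    return unit_value * (max_rating - min_rating) + min_rating

print(scale_to_ratings(sigmoid(0.0), 1, 5))  # midpoint of the range
print(scale_to_ratings(0.0, 1, 5))           # lower bound
print(scale_to_ratings(1.0, 1, 5))           # upper bound
```

This is the same arithmetic as the `Lambda` layer used in the models below, so predictions can never fall outside the observed rating range.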

> In the classical formulation, the predicted rating is calculated by multiplying the user and restaurant embeddings (a dot product), then adding the user and restaurant biases. Therefore we are going to create user and restaurant embeddings together with their biases.

In [40]:
# Number of latent factors
embedding_size = 50

> Defining user embedding

In [41]:
# User embeddings

# user input layer
user = layers.Input(shape=(1,))

# Embedding layer for calculating user latent factors of size 50
user_emb = layers.Embedding(n_users, embedding_size, embeddings_regularizer=regularizers.l2(1e-6))(user)

# Reshaping the layer to flatten the embedding vector.
user_emb = layers.Reshape((embedding_size,))(user_emb)

> Defining user bias, and reshape it.

In [42]:
# User bias

# Embedding layer
user_bias = layers.Embedding(n_users, 1, embeddings_regularizer=regularizers.l2(1e-6))(user)

# Reshaping the user bias layer
user_bias = layers.Reshape((1,))(user_bias)

> Defining restaurants embeddings

In [43]:
# restaurant embeddings

# Input layer
restaurant= layers.Input(shape=(1,))

# Embedding layer
rest_emb = layers.Embedding(n_rests, embedding_size, embeddings_regularizer=regularizers.l2(1e-6))(restaurant)

# Reshape layer
rest_emb = layers.Reshape((embedding_size,))(rest_emb)

> Defining restaurant bias, and reshape it.

In [44]:
# Restaurant bias

# Embedding layer
rest_bias = layers.Embedding(n_rests, 1, embeddings_regularizer=regularizers.l2(1e-6))(restaurant)

# Reshape layer
rest_bias = layers.Reshape((1,))(rest_bias)

> After defining the embedding and bias layers, the user and restaurant embeddings are combined and the bias values are added to the result. Note that the code below combines the embeddings by concatenation rather than by a dot product, and leaves it to the dense layers that follow to learn the interaction.

In [45]:
# Concatenating the user and restaurant embeddings
rating = layers.Concatenate()([user_emb, rest_emb])

# Adding the user and restaurant biases to the combined embeddings
rating = layers.Add()([rating, user_bias, rest_bias])
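
The two ways of combining a user vector with a restaurant vector differ in what they hand to the next layer. The plain-Python sketch below uses made-up three-dimensional embeddings, and the helper names are hypothetical.

```python
def dot_combine(user_vec, rest_vec):
    """Dot product: one number summarising how well the two vectors align."""
    return sum(u * r for u, r in zip(user_vec, rest_vec))

def concat_combine(user_vec, rest_vec):
    """Concatenation: both vectors kept side by side for later layers to mix."""
    return user_vec + rest_vec

user_vec = [0.5, -0.3, 0.1]
rest_vec = [0.4, 0.1, 0.2]

print(dot_combine(user_vec, rest_vec))     # a single scalar
print(concat_combine(user_vec, rest_vec))  # a vector of length 6
```

A dot product gives the matrix-factorisation style score directly, while concatenation relies on the dense layers to learn how the user and restaurant features interact.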

> We then pass the combined vector through dense layers. The final sigmoid unit outputs a value between 0 and 1, which is rescaled to the 1-5 rating range.

We create our baseline model.

In [48]:

# first dense layer of 30 nodes with relu activation
rating = layers.Dense(30, activation='relu')(rating)

# second dense layer of 15 nodes
rating = layers.Dense(15, activation='relu')(rating)

# output layer with one node that produces values between 0 and 1 due to the sigmoid activation
rating = layers.Dense(1, activation='sigmoid')(rating)
# rating= layers.Dense(5, activation='softmax')(rating)

# Scales the predicted ratings to a range of 1 - 5
rating = layers.Lambda(lambda x:x*(max_rating - min_rating) + min_rating)(rating)


# Baseline Model 
baseline_model = models.Model([user, restaurant], rating)

# Compile the model
baseline_model.compile( optimizer='sgd', loss='mse',  metrics=[metrics.RootMeanSquaredError()])

# training the model
baseline_model.fit(x=[X_train[:,0], X_train[:,1]], y=y_train,
                    batch_size=256, 
                    epochs=10, 
                    verbose=1,
                    validation_data=([X_test[:,0], X_test[:,1]], y_test))

Epoch 1/10
1673/1673 ━━━━━━━━━━━━━━━━━━━━ 246s 146ms/step - loss: 2.2653 - root_mean_squared_error: 1.5015 - val_loss: 2.2630 - val_root_mean_squared_error: 1.5007
Epoch 2/10
1673/1673 ━━━━━━━━━━━━━━━━━━━━ 248s 148ms/step - loss: 2.2557 - root_mean_squared_error: 1.4983 - val_loss: 2.2616 - val_root_mean_squared_error: 1.5003
Epoch 3/10
1673/1673 ━━━━━━━━━━━━━━━━━━━━ 245s 138ms/step - loss: 2.2602 - root_mean_squared_error: 1.4998 - val_loss: 2.2623 - val_root_mean_squared_error: 1.5005
Epoch 4/10
1673/1673 ━━━━━━━━━━━━━━━━━━━━ 341s 185ms/step - loss: 2.2639 - root_mean_squared_error: 1.5011 - val_loss: 2.2614 - val_root_mean_squared_error: 1.5002
Epoch 5/10
1673/1673 ━━━━━━━━━━━━━━━━━━━━ 275s 157ms/step - loss: 2.2591 - root_mean_squared_error: 1.4994 - val_loss: 2.2607 - val_root_mean_squared_error: 1.5000
Epoch 6/10

<keras.src.callbacks.history.History at 0x7f21fe2e3950>

> Our baseline model does not overfit, since the training and validation RMSE scores are close. We then tune the model to try to get better RMSE scores by reducing its complexity.

In [50]:

rating = layers.Concatenate()([user_emb, rest_emb])
rating = layers.Add()([rating, user_bias, rest_bias])

# reducing the first dense layer to 15 neurons and adding an l2 regularization
rating = layers.Dense(15, activation='relu',kernel_regularizer=regularizers.l2(1e-3))(rating)
# creating a dropout layer
rating = layers.Dropout(0.3)(rating)
# output layer
rating = layers.Dense(1, activation='sigmoid')(rating)
# conversion of output rating
rating = layers.Lambda(lambda x:x*(max_rating - min_rating) + min_rating)(rating)

model_1 = models.Model([user, restaurant], rating)

# Compile the model
model_1.compile( optimizer='sgd', loss='mse',  metrics=[metrics.RootMeanSquaredError()])

# Train the model
model_1.fit(x=[X_train[:,0], X_train[:,1]], y=y_train,
            batch_size=256,
            epochs=20, 
            verbose=1,
            validation_data=([X_test[:,0], X_test[:,1]], y_test))

Epoch 1/20
1673/1673 ━━━━━━━━━━━━━━━━━━━━ 313s 184ms/step - loss: 2.1169 - root_mean_squared_error: 1.4413 - val_loss: 2.0627 - val_root_mean_squared_error: 1.4232
Epoch 2/20
1673/1673 ━━━━━━━━━━━━━━━━━━━━ 256s 153ms/step - loss: 1.8676 - root_mean_squared_error: 1.3530 - val_loss: 2.0022 - val_root_mean_squared_error: 1.4022
Epoch 3/20
1673/1673 ━━━━━━━━━━━━━━━━━━━━ 321s 188ms/step - loss: 1.7143 - root_mean_squared_error: 1.2955 - val_loss: 1.9576 - val_root_mean_squared_error: 1.3865
Epoch 4/20
1673/1673 ━━━━━━━━━━━━━━━━━━━━ 308s 180ms/step - loss: 1.5578 - root_mean_squared_error: 1.2340 - val_loss: 1.9376 - val_root_mean_squared_error: 1.3795
Epoch 5/20
1673/1673 ━━━━━━━━━━━━━━━━━━━━ 224s 134ms/step - loss: 1.4274 - root_mean_squared_error: 1.1802 - val_loss: 1.9344 - val_root_mean_squared_error: 1.3785
Epoch 6/20
1673/1

<keras.src.callbacks.history.History at 0x7f21f98c1b50>

> The second model has performed worse than the first, with a higher RMSE score, and it is overfitting the training data, i.e. it has a good training score but a poor validation score.

We will try to simplify the model further.

In [51]:

rating = layers.Concatenate()([user_emb, rest_emb])
# Adds the user and restaurant biases to the concatenated embeddings
rating = layers.Add()([rating, user_bias, rest_bias])

# reducing the first layer further to 10 nodes
rating = layers.Dense(10, activation='relu')(rating)
# increasing the dropout rate to 0.6
rating = layers.Dropout(0.6)(rating)
# output layer
rating = layers.Dense(1, activation='sigmoid')(rating)
# conversion of output rating
rating = layers.Lambda(lambda x:x*(max_rating - min_rating) + min_rating)(rating)

model_2 = models.Model([user, restaurant], rating)

# Compile the model
model_2.compile( optimizer= 'sgd',
                loss='mse', 
                metrics= [metrics.RootMeanSquaredError()])

# Train the model
model_2.fit(x=[X_train[:,0], X_train[:,1]], y=y_train,
            batch_size=256, 
            epochs=20, 
            verbose=1,
            validation_data=([X_test[:,0], X_test[:,1]], y_test))

Epoch 1/20
1673/1673 ━━━━━━━━━━━━━━━━━━━━ 237s 141ms/step - loss: 1.6052 - root_mean_squared_error: 1.2586 - val_loss: 1.9195 - val_root_mean_squared_error: 1.3816
Epoch 2/20
1673/1673 ━━━━━━━━━━━━━━━━━━━━ 250s 133ms/step - loss: 1.1682 - root_mean_squared_error: 1.0758 - val_loss: 1.9942 - val_root_mean_squared_error: 1.4083
Epoch 3/20
1673/1673 ━━━━━━━━━━━━━━━━━━━━ 263s 134ms/step - loss: 1.1277 - root_mean_squared_error: 1.0568 - val_loss: 2.0187 - val_root_mean_squared_error: 1.4170
Epoch 4/20
1673/1673 ━━━━━━━━━━━━━━━━━━━━ 261s 133ms/step - loss: 1.0749 - root_mean_squared_error: 1.0315 - val_loss: 2.0266 - val_root_mean_squared_error: 1.4198
Epoch 5/20
1673/1673 ━━━━━━━━━━━━━━━━━━━━ 262s 133ms/step - loss: 1.0261 - root_mean_squared_error: 1.0076 - val_loss: 2.0196 - val_root_mean_squared_error: 1.4173
Epoch 6/20

<keras.src.callbacks.history.History at 0x7f22018e22d0>

> The third model has overfitted the training data even further, with a high validation RMSE and a low training RMSE.
Therefore our best neural model is the baseline model, which has a validation score of 1.3179.
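
When validation error stops improving while training error keeps falling, one common remedy is to stop at the best validation epoch (Keras offers an `EarlyStopping` callback for this). The sketch below is not that callback. It is a plain-Python illustration of the rule, applied to the validation RMSE values logged above for the second and third models, and `best_epoch` and `patience` are names chosen for the example.

```python
def best_epoch(val_scores, patience=2):
    """Return (epoch, score) of the best validation score, stopping the scan once
    `patience` consecutive epochs fail to improve. Epochs are 1-indexed."""
    best_idx, best = 0, val_scores[0]
    waited = 0
    for idx in range(1, len(val_scores)):
        if val_scores[idx] < best:
            best_idx, best, waited = idx, val_scores[idx], 0
        else:
            waited += 1
            if waited >= patience:
                break
    return best_idx + 1, best

# Validation RMSE per epoch, copied from the training logs above
second_model_val_rmse = [1.4232, 1.4022, 1.3865, 1.3795, 1.3785]
third_model_val_rmse = [1.3816, 1.4083, 1.4170, 1.4198, 1.4173]

print(best_epoch(second_model_val_rmse))
print(best_epoch(third_model_val_rmse))
```

On these logged values the third model's validation RMSE is lowest after its first epoch, which is consistent with the overfitting described above.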

In [52]:
# evaluating the best model on the training data
print("Training data: ")
print(baseline_model.evaluate([X_train[:,0], X_train[:,1]], y_train))

# evaluating the best model on the test data
print("Testing data: ")
print(baseline_model.evaluate([X_test[:,0], X_test[:,1]], y_test))

Training data: 
13380/13380 ━━━━━━━━━━━━━━━━━━━━ 359s 27ms/step - loss: 1.1958 - root_mean_squared_error: 1.0885
[1.1989389657974243, 1.089964509010315]
Testing data: 
3345/3345 ━━━━━━━━━━━━━━━━━━━━ 90s 27ms/step - loss: 1.7751 - root_mean_squared_error: 1.3282
[1.779281497001648, 1.3298012018203735]


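> The baseline model has a training RMSE of about 1.090 and a test RMSE of about 1.330, making it our best neural network model, with the lowest test score.

Across all the models, the tuned SVD achieved the best RMSE, approximately 1.22 under cross-validation, compared with about 1.33 for the best neural network on the test set.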