# Importing libraries and loading prepared data

In [None]:
import tensorflow as tf
import numpy as np
from tensorflow import keras
from data_preparation import *
books, ratings, users, Y, R = prepared_data()

# Creating model parameters and prediction function

Let's set the number of latent features, i.e. the length of each row of X and W, to 10. This is a compromise between model capacity and the risk of overfitting. The bias vector B holds one scalar per user.


In the X matrix, the i-th row is the feature vector for book i. Similarly, in the W matrix, the j-th row is the parameter vector for user j, and the j-th entry of the B vector is the bias for user j. Initially, let's set all of these values at random.

In [2]:
num_books, num_users = Y.shape
W = tf.Variable(np.random.normal(size=(num_users, 10)).astype(np.float32))
B = tf.Variable(np.random.normal(size=(num_users)).astype(np.float32))
X = tf.Variable(np.random.normal(size=(num_books, 10)).astype(np.float32))


The predicted rating of book i by user j is $x^{(i)} \cdot w^{(j)} + b^{(j)}$. We compute the dot product of the book feature vector $x^{(i)}$ (a row of X) and the user parameter vector $w^{(j)}$ (a row of W), and then add the user bias $b^{(j)}$ (an entry of B).

In [3]:
def predict(users, books):
    prediction = tf.reduce_sum(tf.gather(W, users) * tf.gather(X, books), axis=1)
    prediction += tf.gather(B, users)
    return prediction
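
To make the formula concrete, here is a small NumPy sketch of the same computation on made-up values. The arrays `X_toy`, `W_toy`, `B_toy` and the function `predict_np` are illustrative names, not part of the notebook; NumPy integer indexing plays the role that `tf.gather` plays above.

```python
import numpy as np

# Toy parameters (hypothetical values): 3 books and 2 users, 2 features each
X_toy = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])  # book features
W_toy = np.array([[2.0, 1.0], [0.5, 3.0]])              # user parameters
B_toy = np.array([0.1, -0.2])                           # user biases

def predict_np(users, books):
    """Return x_i . w_j + b_j for each (user, book) pair."""
    return np.sum(W_toy[users] * X_toy[books], axis=1) + B_toy[users]

# User 0 on book 2, and user 1 on book 0
pairs = predict_np(np.array([0, 1]), np.array([2, 0]))
print(pairs)
```

For user 0 and book 2 the dot product is 2·1 + 1·1 = 3, and adding the bias 0.1 gives 3.1. For user 1 and book 0 it is 0.5 − 0.2 = 0.3.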

# Cost function

The collaborative filtering cost function is the mean of the squared errors over the observed ratings plus L2 regularization terms on W, B and X.

In [4]:
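def collfilt_cost_func(Y):
    # Select only the observed ratings. Missing entries are assumed to be NaN,
    # as the np.nanmean call in the normalization step implies.
    book_idx, user_idx = np.nonzero(~np.isnan(Y))
    observed = Y[book_idx, user_idx].astype(np.float32)

    # Mean squared error over the observed ratings
    pred = predict(user_idx, book_idx)
    cost = tf.reduce_mean(tf.square(pred - observed))

    # L2 regularization on all parameters
    cost += tf.reduce_sum(W**2) + tf.reduce_sum(B**2) + tf.reduce_sum(X**2)

    return cost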
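
A common variant weights the regularization terms with a coefficient, so the penalty can be tuned against the data term. The NumPy sketch below shows that form on a toy matrix. The coefficient `lam`, the function `cost_np`, and the convention that NaN marks a missing rating are assumptions made for illustration, not something the notebook defines.

```python
import numpy as np

def cost_np(Y, X, W, B, lam=1.0):
    """Mean squared error over observed ratings plus a weighted L2 penalty."""
    observed = ~np.isnan(Y)          # assumption: NaN marks a missing rating
    pred = X @ W.T + B               # shape (num_books, num_users)
    errors = (pred - Y)[observed]    # keep only the observed entries
    data_term = np.mean(errors ** 2)
    penalty = np.sum(W ** 2) + np.sum(B ** 2) + np.sum(X ** 2)
    return data_term + lam * penalty

# Toy data: 2 books, 2 users, 1 feature
Y_toy = np.array([[5.0, np.nan], [np.nan, 1.0]])
X_toy = np.array([[1.0], [2.0]])
W_toy = np.array([[3.0], [1.0]])
B_toy = np.array([0.0, 0.0])
toy_cost = cost_np(Y_toy, X_toy, W_toy, B_toy, lam=0.5)
print(toy_cost)
```

Here the two observed errors are −2 and 1, so the data term is 2.5. The penalty is 15, and with `lam=0.5` the total is 10.0.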

# Mean normalization

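Mean normalization centers each book's ratings around zero, which typically helps gradient descent converge faster. It also means that a prediction close to zero maps back to the book's average rating rather than to a rating of zero. We normalize the ratings by computing the mean rating for each book and subtracting it from that book's ratings. The means have to be added back when predictions are turned into ratings.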

In [5]:
mean_ratings = np.nanmean(Y, axis=1, keepdims=True)
Y_normalized = Y - mean_ratings
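
The effect is easy to check on a small matrix. The sketch below uses made-up ratings and assumes, as the `np.nanmean` call does, that missing ratings are stored as NaN; the names `Y_toy`, `book_means` and `restored` are illustrative.

```python
import numpy as np

# Toy ratings (hypothetical): rows are books, columns are users, NaN = not rated
Y_toy = np.array([[5.0, 3.0, np.nan],
                  [np.nan, 2.0, 4.0]])

book_means = np.nanmean(Y_toy, axis=1, keepdims=True)  # shape (2, 1)
Y_toy_normalized = Y_toy - book_means                  # NaNs stay NaN

# A prediction in normalized space is mapped back by adding the book mean
normalized_prediction = 0.5
restored = normalized_prediction + book_means[0, 0]
print(book_means.ravel(), restored)
```

The book means are 4 and 3, each normalized row averages to zero over its observed entries, and a normalized prediction of 0.5 for the first book corresponds to a rating of 4.5.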

# Gradient descent

Let's use gradient descent to minimize the cost function. I will set the learning rate to 0.1.

In [7]:
optimizer = tf.optimizers.SGD(learning_rate=0.1)

# Train the model

We will repeat the gradient step until convergence, that is, until the cost changes by less than a small threshold between two consecutive iterations.

In [None]:
epsilon = 1e-6  # threshold for the change in the cost function

prev_cost = float('inf')
while True:
    with tf.GradientTape() as tape:
        cost = collfilt_cost_func(Y_normalized)
    gradients = tape.gradient(cost, [W, B, X])
    optimizer.apply_gradients(zip(gradients, [W, B, X]))
    current_cost = collfilt_cost_func(Y_normalized).numpy()
    if abs(prev_cost - current_cost) < epsilon:
        break
    prev_cost = current_cost
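
A `while True` loop has no upper bound on the number of iterations, so it runs forever if the cost oscillates and never terminates cleanly if it diverges. One possible safeguard is sketched below in plain NumPy on a toy problem. The function `minimize` and its `max_iters` parameter are hypothetical names introduced here, and the quadratic cost stands in for the collaborative filtering cost.

```python
import numpy as np

def minimize(grad_fn, cost_fn, theta, learning_rate=0.1, epsilon=1e-6, max_iters=10_000):
    """Gradient descent that stops on a small cost change or after max_iters steps."""
    prev_cost = float("inf")
    for step in range(1, max_iters + 1):
        theta = theta - learning_rate * grad_fn(theta)
        current_cost = cost_fn(theta)
        if not np.isfinite(current_cost):
            raise FloatingPointError("cost diverged; try a smaller learning rate")
        if abs(prev_cost - current_cost) < epsilon:
            break
        prev_cost = current_cost
    return theta, current_cost, step

# Toy problem: minimize (theta - 3)^2, whose minimum is at theta = 3
theta_opt, final_cost, steps = minimize(
    grad_fn=lambda t: 2.0 * (t - 3.0),
    cost_fn=lambda t: float(np.sum((t - 3.0) ** 2)),
    theta=np.array([0.0]),
)
print(theta_opt, final_cost, steps)
```

The same two checks, an iteration cap and a test for a non-finite cost, could be added to the TensorFlow loop without changing its stopping rule.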

# Recommendation function

Let's create a function that returns a list of recommended books based on the trained model. It takes a dictionary with book titles as keys and ratings as values and converts it to a NumPy array with one entry per book. It then handles the books the user has not rated, computes the predicted ratings with the per-book means added back, and returns the books with the highest predicted ratings.

In [9]:
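def recommend_books(user_ratings, top_n=10, reg=1.0):
    # Get an array of ratings from a dictionary.
    # Assumption: create_ratings_array returns one value per row of Y,
    # with NaN for books the user has not rated.
    user_ratings_array = np.asarray(
        create_ratings_array(user_ratings, Y), dtype=np.float32).reshape(-1)
    rated = ~np.isnan(user_ratings_array)

    # Normalize the given ratings with the per-book means
    book_means = np.nan_to_num(mean_ratings.reshape(-1)).astype(np.float32)
    targets = user_ratings_array[rated] - book_means[rated]

    # Fit a parameter vector and a bias for this user against the learned
    # book features (ridge regression), since the user has no row in W or B
    features = X.numpy()
    A = np.hstack([features[rated], np.ones((rated.sum(), 1), dtype=np.float32)])
    params = np.linalg.solve(A.T @ A + reg * np.eye(A.shape[1]), A.T @ targets)

    # Compute the predicted ratings for every book and add the means back
    pred = features @ params[:-1] + params[-1] + book_means

    # Get the indices of the unrated books sorted by their predicted ratings
    pred[rated] = -np.inf
    sorted_indices = np.argsort(-pred)[:top_n]

    # Get the recommended books.
    # Assumption: `books` is a DataFrame whose rows are in the same order as the rows of Y.
    recommended_books = books.iloc[sorted_indices]

    return recommended_books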