In [1]:
import numpy as np

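def matrix_factorization(r, p, q, k, steps=5000, alpha=0.0005, beta=0.05):
    # r: ratings matrix (0 = unrated); p: users x k factors; q: items x k factors.
    # p and q are updated in place. k is unused here; the shapes of p and q set it.
    samples = [
        (i, j, r[i, j])
        for i in range(r.shape[0])
        for j in range(r.shape[1])
        if r[i, j] > 0
    ]

    for step in range(steps):
        for i, j, rating in samples:
            prediction = np.dot(p[i, :], q[j, :])
            e = rating - prediction

            # Keep the pre-update user factors so that q's update does not
            # see the p values that were just modified.
            p_i = p[i, :].copy()
            p[i, :] += alpha * (e * q[j, :] - beta * p[i, :])
            q[j, :] += alpha * (e * p_i - beta * q[j, :])

    return p, q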

# With 5000 Steps
r = np.array([
    [5, 3, 0, 1],
    [4, 0, 0, 1],
    [1, 1, 0, 5],
    [1, 0, 0, 4],
    [0, 1, 5, 4]
])

num_row_r, num_col_r = r.shape
k = 2  # Number of latent factors
p = np.random.rand(num_row_r, k)
q = np.random.rand(num_col_r, k)

# Train the matrix factorization
new_p, new_q = matrix_factorization(r, p, q, k, steps=5000, alpha=0.0002, beta=0.02)

# Predicted ratings
r_predicted = np.dot(new_p, new_q.T)
print("Predicted Ratings with 5000 Steps:")
print(r_predicted.round(2))

# With 10000 more steps (15000 in total)
# Warm start: reuse new_p and new_q from the 5000-step run above
new_p, new_q = matrix_factorization(r, new_p, new_q, k, steps=10000, alpha=0.0002, beta=0.02)

# Predicted ratings after 10000 steps
r_predicted = np.dot(new_p, new_q.T)
print("\nPart 2 - Predicted Ratings with 10000 Steps:")
print(r_predicted.round(2))


Predicted Ratings with 5000 Steps:
[[5.05 2.75 3.15 0.99]
 [3.89 2.13 2.64 1.  ]
 [1.09 0.76 4.94 4.88]
 [0.97 0.66 4.02 3.93]
 [2.29 1.38 4.79 4.13]]

Part 2 - Predicted Ratings with 10000 Steps:
[[4.96 2.96 3.47 1.  ]
 [3.96 2.38 2.98 1.  ]
 [1.02 0.94 5.82 4.94]
 [0.99 0.85 4.75 3.96]
 [1.34 1.06 4.95 4.  ]]


In [2]:
import numpy as np

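def matrix_factorization(R, P, Q, K, steps=5000, alpha=0.0002, beta=0.02):
    # R: ratings matrix (0 = unrated); P: users x K factors; Q: items x K factors.
    # Q.T is a view, so the caller's P and Q arrays are updated in place.
    Q = Q.T
    for step in range(steps):
        for i in range(len(R)):
            for j in range(len(R[i])):
                if R[i][j] > 0:
                    # Prediction error for this observed rating
                    eij = R[i][j] - np.dot(P[i,:], Q[:,j])
                    for k in range(K):
                        p_ik = P[i][k]  # pre-update value, reused in Q's update
                        P[i][k] = P[i][k] + alpha * (2 * eij * Q[k][j] - beta * P[i][k])
                        Q[k][j] = Q[k][j] + alpha * (2 * eij * p_ik - beta * Q[k][j])
    return P, Q.T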


## Matrix Factorization with 5000 Steps

We run matrix factorization for 5000 steps, where each step is one pass over the observed ratings. We start from a user-item ratings matrix `R`, in which 0 marks a missing rating, and initialize `P` (users by `K`) and `Q` (items by `K`) with random values. The goal is to predict the missing ratings in `R` by fitting `P` and `Q` so that `np.dot(P, Q.T)` approximates the observed entries.
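
For reference, the per-rating updates in `matrix_factorization` can be read as stochastic gradient descent on a regularized squared error. The notation below ($r_{ij}$, $p_{ik}$, $q_{kj}$, $e_{ij}$, $L_{ij}$) is introduced here to mirror the code's `R`, `P`, `Q` and `eij`. It is a sketch of the usual derivation under that assumption, not a statement from the notebook itself.

$$
\begin{aligned}
e_{ij} &= r_{ij} - \sum_{k=1}^{K} p_{ik}\, q_{kj} \\
L_{ij} &= e_{ij}^{2} + \frac{\beta}{2} \sum_{k=1}^{K} \left( p_{ik}^{2} + q_{kj}^{2} \right) \\
\frac{\partial L_{ij}}{\partial p_{ik}} &= -2\, e_{ij}\, q_{kj} + \beta\, p_{ik},
\qquad
\frac{\partial L_{ij}}{\partial q_{kj}} = -2\, e_{ij}\, p_{ik} + \beta\, q_{kj} \\
p_{ik} &\leftarrow p_{ik} + \alpha \left( 2\, e_{ij}\, q_{kj} - \beta\, p_{ik} \right),
\qquad
q_{kj} \leftarrow q_{kj} + \alpha \left( 2\, e_{ij}\, p_{ik} - \beta\, q_{kj} \right)
\end{aligned}
$$

Here $\alpha$ is the learning rate (`alpha`) and $\beta$ the regularization strength (`beta`). Only entries with $r_{ij} > 0$ contribute, which is why unrated cells do not pull the factors toward zero.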


In [4]:
# Initial Ratings Matrix R
R = np.array([
    [5, 3, 0, 1],
    [4, 0, 0, 1],
    [1, 1, 0, 5],
    [1, 0, 0, 4],
    [0, 1, 5, 4]
])

num_users, num_items = R.shape
K = 2  # Number of latent factors

# Initialize P and Q with random values
P = np.random.rand(num_users, K)
Q = np.random.rand(num_items, K)

# Perform matrix factorization
nP, nQ = matrix_factorization(R, P, Q, K, steps=5000)

# Calculate the dot product of nP and nQ for the predicted ratings matrix
nR = np.dot(nP, nQ.T)

# Display the original and predicted ratings matrices
print("Original Ratings Matrix:")
print(R)
print("\nPredicted Ratings Matrix after 5000 steps:")
print(nR.round(2))


Original Ratings Matrix:
[[5 3 0 1]
 [4 0 0 1]
 [1 1 0 5]
 [1 0 0 4]
 [0 1 5 4]]

Predicted Ratings Matrix after 5000 steps:
[[4.99 2.95 4.66 1.  ]
 [3.97 2.35 3.89 1.  ]
 [1.06 0.85 5.27 4.96]
 [0.97 0.75 4.31 3.97]
 [1.71 1.19 4.92 4.03]]


## Matrix Factorization with 10000 Steps

To potentially improve our predictions, we run another 10000 steps, giving the optimization more iterations to converge. Because we start from the `P` and `Q` matrices obtained above rather than from fresh random values, the factors will have been trained for 15000 steps in total.
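
Whether extra steps actually help can be checked by tracking the error on the rated entries as training continues. The sketch below is self-contained and makes a few assumptions that are not in the notebook: the helper names `observed_rmse` and `sgd_steps` are hypothetical, the random seed is fixed for repeatability, and training is split into three chunks of 5000 steps.

```python
import numpy as np

def observed_rmse(R, P, Q):
    """Root-mean-square error over the rated (non-zero) entries of R."""
    mask = R > 0
    errors = (R - P @ Q.T)[mask]
    return float(np.sqrt(np.mean(errors ** 2)))

def sgd_steps(R, P, Q, steps, alpha=0.0002, beta=0.02):
    """Run `steps` passes of SGD over the rated entries; P and Q change in place."""
    rated = np.argwhere(R > 0)
    for _ in range(steps):
        for i, j in rated:
            e = R[i, j] - P[i] @ Q[j]
            p_i = P[i].copy()  # pre-update user factors, reused in Q's update
            P[i] += alpha * (2 * e * Q[j] - beta * P[i])
            Q[j] += alpha * (2 * e * p_i - beta * Q[j])

R = np.array([
    [5, 3, 0, 1],
    [4, 0, 0, 1],
    [1, 1, 0, 5],
    [1, 0, 0, 4],
    [0, 1, 5, 4],
])

rng = np.random.default_rng(0)  # fixed seed is an assumption; the notebook is unseeded
P = rng.random((R.shape[0], 2))
Q = rng.random((R.shape[1], 2))

# Warm start: each chunk continues from the factors left by the previous one.
history = [(0, observed_rmse(R, P, Q))]
for total in (5000, 10000, 15000):
    sgd_steps(R, P, Q, steps=5000)
    history.append((total, observed_rmse(R, P, Q)))

for total, rmse in history:
    print(f"{total:>6} steps: RMSE on rated entries = {rmse:.3f}")
```

The exact numbers depend on the random initialization, so they will not match the outputs printed in this notebook. What the sketch shows is how the error changes as steps accumulate.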


In [5]:
# Run 10000 more steps, starting from the nP and nQ trained above (15000 in total)
nP, nQ = matrix_factorization(R, nP, nQ, K, steps=10000)

# Calculate the dot product of nP and nQ for the new predicted ratings matrix
nR = np.dot(nP, nQ.T)

# Display the predicted ratings matrix after 10000 steps
print("Predicted Ratings Matrix after 10000 steps:")
print(nR.round(2))


Predicted Ratings Matrix after 10000 steps:
[[4.98 2.98 4.74 1.  ]
 [3.98 2.4  3.99 1.  ]
 [1.01 0.98 5.83 4.97]
 [1.   0.9  4.81 3.98]
 [1.21 1.02 4.98 3.99]]


# Matrix Factorization for Recommendation Systems

In this Jupyter Notebook, we demonstrate the use of matrix factorization, a foundational technique in recommendation systems, to predict user ratings for movies. The goal is to fill in the missing entries in a user-item ratings matrix based on observed ratings. We accomplish this by decomposing the original ratings matrix into two lower-dimensional matrices, representing latent user preferences and item attributes.
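
As a quick illustration of what "two lower-dimensional matrices" means in terms of shapes, the sketch below builds random factors for 5 users, 4 movies and 2 latent factors and multiplies them back together. The variable names and the fixed seed are assumptions made for this example only.

```python
import numpy as np

num_users, num_items, k = 5, 4, 2  # sizes chosen to match the example ratings matrix

rng = np.random.default_rng(0)      # fixed seed, for repeatability only
p = rng.random((num_users, k))      # one row of k latent preferences per user
q = rng.random((num_items, k))      # one row of k latent attributes per movie

r_hat = p @ q.T                     # reconstructed users x movies matrix

print("p:", p.shape, "q:", q.shape, "r_hat:", r_hat.shape)
print("rank of r_hat:", np.linalg.matrix_rank(r_hat))
print("parameters:", p.size + q.size, "vs matrix entries:", r_hat.size)
```

The product always has rank at most `k`, which is what forces the model to explain ratings through a small number of shared factors. On a matrix this small the parameter saving is negligible, but it grows quickly as the number of users and items increases.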

We start with an initial set of user ratings for a selection of movies. Not all users have rated all movies, resulting in a sparse matrix. Our matrix factorization algorithm will predict ratings for the movies that each user hasn't rated, providing personalized recommendations for each user.

Let's begin by defining our matrix factorization function and setting up our initial ratings matrix.


In [8]:
import numpy as np

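def matrix_factorization(r, p, q, k, steps=5000, alpha=0.0002, beta=0.02):
    # r: ratings matrix (0 = unrated); p: users x k factors; q: items x k factors.
    # p and q are updated in place. k is unused here; the shapes of p and q set it.
    samples = [
        (i, j, r[i, j])
        for i in range(r.shape[0])
        for j in range(r.shape[1])
        if r[i, j] > 0
    ]

    for step in range(steps):
        for i, j, rating in samples:
            prediction = np.dot(p[i, :], q[j, :])
            e = rating - prediction

            # Keep the pre-update user factors so that q's update does not
            # see the p values that were just modified.
            p_i = p[i, :].copy()
            p[i, :] += alpha * (e * q[j, :] - beta * p[i, :])
            q[j, :] += alpha * (e * p_i - beta * q[j, :])

    return p, q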


## Initial Ratings Matrix and Model Training

We initialize our ratings matrix `r` with ratings from five users (rows) for four movies (columns). The matrix contains zeros where a user has not rated a movie. We then train our matrix factorization model for 5000 steps to predict these missing ratings.
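
Because zero is used as the "not rated" marker, it helps to separate rated from unrated cells explicitly before training. A minimal sketch, assuming the same ratings matrix as in the next cell (the names `observed`, `users` and `movies` are introduced here for illustration):

```python
import numpy as np

r = np.array([
    [5, 3, 0, 1],
    [4, 0, 0, 1],
    [1, 1, 0, 5],
    [1, 0, 0, 4],
    [0, 1, 5, 4],
])

observed = r > 0  # True where a user has rated a movie
print("rated entries:  ", int(observed.sum()))
print("unrated entries:", int((~observed).sum()))

# (user, movie) index pairs that the model is asked to fill in
users, movies = np.nonzero(~observed)
for u, m in zip(users, movies):
    print(f"user {u} has not rated movie {m}")
```

This is the same filter that `matrix_factorization` applies with `if r[i, j] > 0`. It only works because valid ratings here are strictly positive.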


In [9]:
# Initial Ratings Matrix
r = np.array([
    [5, 3, 0, 1],
    [4, 0, 0, 1],
    [1, 1, 0, 5],
    [1, 0, 0, 4],
    [0, 1, 5, 4]
])

num_row_r, num_col_r = r.shape
k = 2  # Number of latent factors
p = np.random.rand(num_row_r, k)
q = np.random.rand(num_col_r, k)

# Train the matrix factorization model
new_p, new_q = matrix_factorization(r, p, q, k, steps=5000, alpha=0.0002, beta=0.02)

# Predicted ratings after 5000 steps
r_predicted = np.dot(new_p, new_q.T)
print("Predicted Ratings with 5000 Steps:")
print(r_predicted.round(2))


Predicted Ratings with 5000 Steps:
[[5.08 2.44 4.27 1.09]
 [3.89 1.88 3.36 0.97]
 [1.05 0.73 3.93 4.94]
 [1.03 0.67 3.25 3.92]
 [3.16 1.68 4.8  3.99]]


## Analysis of Predictions after 5000 Steps

Comparing the predicted ratings above with the actual ratings matrix `r`, most non-zero entries are reproduced to within about 0.1 (for example, 5 is predicted as 5.08 and 4 as 3.99). Several are still off by 0.2 or more, the largest gaps being the first user's rating of 3, predicted as 2.44, and the last user's rating of 1, predicted as 1.68. Where the actual matrix has zeros, the predicted matrix now holds values. These are the model's predictions for the movies that a user has not yet rated.
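
One way to use these filled-in values is to rank each user's unrated movies by predicted rating. The sketch below copies the 5000-step predictions printed above. The helper `top_unrated` is a hypothetical name introduced for this example, not part of the notebook.

```python
import numpy as np

r = np.array([
    [5, 3, 0, 1],
    [4, 0, 0, 1],
    [1, 1, 0, 5],
    [1, 0, 0, 4],
    [0, 1, 5, 4],
])

# Predicted ratings copied from the 5000-step output above
r_predicted = np.array([
    [5.08, 2.44, 4.27, 1.09],
    [3.89, 1.88, 3.36, 0.97],
    [1.05, 0.73, 3.93, 4.94],
    [1.03, 0.67, 3.25, 3.92],
    [3.16, 1.68, 4.80, 3.99],
])

def top_unrated(r, r_predicted, user, top_n=1):
    """Indices of the user's unrated movies, highest predicted rating first."""
    unrated = np.flatnonzero(r[user] == 0)
    order = np.argsort(r_predicted[user, unrated])[::-1]
    return unrated[order][:top_n].tolist()

for user in range(r.shape[0]):
    print(f"user {user}: recommend movies {top_unrated(r, r_predicted, user, top_n=2)}")
```

With so few ratings these rankings are only illustrative, and a different random initialization can change the predicted values for the unrated cells noticeably.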

To further improve the fit, we continue training from these factors for a further 10000 steps (`steps=10000` in the call below), for a total of 15000 steps.


In [10]:
# Continue training from new_p and new_q for another 10000 steps (15000 in total)
new_p, new_q = matrix_factorization(r, new_p, new_q, k, steps=10000, alpha=0.0002, beta=0.02)

# Predicted ratings after 10000 steps
r_predicted = np.dot(new_p, new_q.T)
print("\nPredicted Ratings with 10000 Steps:")
print(r_predicted.round(2))



Predicted Ratings with 10000 Steps:
[[4.96 2.94 4.96 1.  ]
 [3.95 2.35 4.13 1.  ]
 [1.06 0.86 5.25 4.94]
 [0.97 0.76 4.31 3.96]
 [1.66 1.17 4.92 4.01]]


## Conclusion after 10000 Steps

The non-zero values in the actual ratings matrix are now matched even more closely by the corresponding predicted values: the largest gap is 0.17 (a rating of 1 predicted as 1.17), down from 0.68 after the first run. Extending the training has therefore improved the fit to the observed ratings. This fit does not by itself measure how accurate the predictions for the missing ratings are, since there are no held-out ratings to compare them against. It does show how matrix factorization fills in missing ratings, and it highlights the potential of such techniques in building recommendation systems.
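
The improvement can be put into a single number by computing the root-mean-square error over the rated entries for both sets of printed predictions. The sketch below copies the two output matrices from this section. The helper `observed_rmse` is a hypothetical name, and because the inputs are rounded to two decimals the results are approximate.

```python
import numpy as np

r = np.array([
    [5, 3, 0, 1],
    [4, 0, 0, 1],
    [1, 1, 0, 5],
    [1, 0, 0, 4],
    [0, 1, 5, 4],
])

# Outputs copied from the two training runs above (rounded to 2 decimals)
pred_first_run = np.array([
    [5.08, 2.44, 4.27, 1.09],
    [3.89, 1.88, 3.36, 0.97],
    [1.05, 0.73, 3.93, 4.94],
    [1.03, 0.67, 3.25, 3.92],
    [3.16, 1.68, 4.80, 3.99],
])
pred_second_run = np.array([
    [4.96, 2.94, 4.96, 1.00],
    [3.95, 2.35, 4.13, 1.00],
    [1.06, 0.86, 5.25, 4.94],
    [0.97, 0.76, 4.31, 3.96],
    [1.66, 1.17, 4.92, 4.01],
])

def observed_rmse(r, r_predicted):
    """RMSE restricted to entries the users actually rated (r > 0)."""
    mask = r > 0
    return float(np.sqrt(np.mean((r[mask] - r_predicted[mask]) ** 2)))

rmse_first = observed_rmse(r, pred_first_run)
rmse_second = observed_rmse(r, pred_second_run)
print(f"RMSE after first run:  {rmse_first:.3f}")
print(f"RMSE after second run: {rmse_second:.3f}")
```

This measures training fit only. Estimating accuracy on unseen ratings would require holding some known ratings out of training and scoring the predictions against them.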
