# Matrix Factorization

In this task, you will implement by hand the matrix factorization variant from the Data Cleaning chapter, using the `numpy` library.

In [38]:
import numpy as np

We continue the scenario from the tutorials.

Assume that you have a ginormous database $D$ of three users and three movies, containing the ratings that some users gave to some movies. We represent it as a matrix in which the entry $D_{ij}$ is the rating user $i$ gave to movie $j$.
Since not every user has rated every movie, and ratings range from 1 to 5, we encode missing ratings as 0.

In [39]:
# missing values encoded as 0
D = np.array([
    [3, 1, 0],
    [1, 0, 3],
    [0, 3, 5],
])

# N users (rows), M movies (columns)
N, M = D.shape
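
Because 0 doubles as the missing-value marker, it can help to make the observed entries explicit. The following is a minimal sketch, assuming the same `D` as above; the names `observed`, `n_observed`, and `observed_pairs` are introduced here for illustration only.

```python
import numpy as np

# same rating matrix as above; 0 marks a missing rating
D = np.array([
    [3, 1, 0],
    [1, 0, 3],
    [0, 3, 5],
])

# boolean mask: True where a rating exists (hypothetical name)
observed = D != 0
n_observed = int(observed.sum())

# (user, movie) index pairs of the observed ratings
observed_pairs = list(zip(*np.nonzero(observed)))
print(n_observed)
```

Only these entries should contribute to the updates; the zeros are placeholders, not ratings.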

First, randomly initialize the two factors $E$ and $A$ for $f=2$ latent features. To check your results against those from the tutorial, you may *additionally* provide the hard-coded initial factors used there.

In [40]:
# number of latent features
f = 2

# TODO your code goes here
E = np.random.random((N, f))
A = np.random.random((f, M))
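
Random factors change on every run, which makes results hard to compare. One option is a seeded generator. The sketch below uses NumPy's `default_rng`; the seed `0` and the helper name `init_factors` are arbitrary choices for illustration, not values from the tutorial.

```python
import numpy as np

def init_factors(N, M, f, seed=0):
    """Return random factors E (N x f) and A (f x M) with entries in [0, 1)."""
    rng = np.random.default_rng(seed)  # seed is an arbitrary assumption
    E = rng.random((N, f))
    A = rng.random((f, M))
    return E, A

E, A = init_factors(3, 3, 2)
```

Calling `init_factors` twice with the same seed yields identical factors, so two runs of the factorization can be compared directly.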

Implement a function that takes the data matrix $D$, the initial factors $E$ and $A$, the number of epochs (iterations), and the learning rate $\eta$, and performs the factorization of $D$. Use defaults of 5000 epochs and $\eta = 0.001$.
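
One common way to arrive at such an update is gradient descent on the squared error of each observed entry. This is stated here as an assumption about the variant meant; the chapter's own formulation takes precedence. The symbol $e_{ij}$ is introduced only for this derivation:

$$
e_{ij} = D_{ij} - \sum_{k=1}^{f} E_{ik} A_{kj}, \qquad
\frac{\partial e_{ij}^2}{\partial E_{ik}} = -2\, e_{ij} A_{kj}, \qquad
\frac{\partial e_{ij}^2}{\partial A_{kj}} = -2\, e_{ij} E_{ik}
$$

Stepping against the gradient with learning rate $\eta$ gives

$$
E_{ik} \leftarrow E_{ik} + 2\eta\, e_{ij} A_{kj}, \qquad
A_{kj} \leftarrow A_{kj} + 2\eta\, e_{ij} E_{ik}
$$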

Apply updates to $E$ and $A$ immediately. Refresh $\tilde{D}$ only after an entry of $D$ has been fully processed. Order the updates by latent feature, and update $E$ before $A$.
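
One possible reading of this ordering is sketched below. For a single observed entry $(n, m)$, the error against $\tilde{D}$ is computed once, then each latent feature is processed in turn, with $E$ updated before $A$ and each new value used immediately. The function name `update_entry` and the reading $\tilde{D}_{nm} = \sum_k E_{nk} A_{km}$ are assumptions, not taken from the tutorial.

```python
import numpy as np

def update_entry(D, E, A, n, m, eta=0.001):
    """Apply one in-place update for the observed entry D[n, m]."""
    # the prediction is frozen while this entry is processed
    err = D[n, m] - E[n] @ A[:, m]
    for k in range(E.shape[1]):
        E[n, k] += 2 * eta * err * A[k, m]   # E first
        A[k, m] += 2 * eta * err * E[n, k]   # A sees the new E[n, k]
    return err

D = np.array([[3, 1, 0], [1, 0, 3], [0, 3, 5]])
E = np.ones((3, 2))
A = np.ones((2, 3))
err_before = update_entry(D, E, A, n=0, m=0, eta=0.01)
err_after = D[0, 0] - E[0] @ A[:, 0]
```

Other readings are possible, for example looping over latent features outermost or recomputing the error between the two updates. Comparing against the tutorial's hand-computed values is the safest way to decide which one is intended.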

In [41]:
# TODO your code goes here
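def matrix_factorization(D, E, A, epochs=5000, eta=0.001):
    # note: E and A are updated in place, and missing entries of D are overwritten
    N, M = D.shape
    f = len(A)
    for _ in range(epochs):
        for k in range(f):
            for n in range(N):
                for m in range(M):
                    # skip missing ratings (encoded as 0)
                    if D[n, m] == 0:
                        continue
                    # gradient step on the squared error of entry (n, m); E before A
                    E[n, k] += 2 * eta * (D[n, m] - E[n] @ A[:, m]) * A[k, m]
                    A[k, m] += 2 * eta * (D[n, m] - E[n] @ A[:, m]) * E[n, k]
    # fill the missing entries with the predicted ratings;
    # if D has an integer dtype, the predictions are truncated on assignment
    for i in range(N):
        for j in range(M):
            if D[i, j] == 0:
                D[i, j] = E[i] @ A[:, j]
    return D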
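
Assigning a float into an integer NumPy array truncates it, so a function that writes predictions back into an integer `D` returns whole numbers and also changes the caller's array. A minimal sketch of one way around this, assuming a hypothetical helper `fill_missing` that is not part of the task:

```python
import numpy as np

def fill_missing(D, E, A):
    """Return a float copy of D with missing entries (0) replaced by E @ A."""
    D = np.asarray(D)
    prediction = E @ A
    # keep observed ratings, take predictions elsewhere; D itself is untouched
    return np.where(D == 0, prediction, D.astype(float))

D = np.array([[3, 1, 0], [1, 0, 3], [0, 3, 5]])
E = np.full((3, 2), 1.5)   # placeholder factors, not trained values
A = np.full((2, 3), 0.9)
filled = fill_missing(D, E, A)
```

Keeping the original `D` intact also makes it possible to rerun the factorization without redefining the matrix.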

Now test your matrix factorization with the parameters specified above.

In [42]:
# TODO your code goes here
print(matrix_factorization(D, E, A))

[[3 1 4]
 [1 2 3]
 [2 3 5]]
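
The printed matrix alone does not show how well the factors reproduce the known ratings. A rough check, sketched here with a hypothetical helper `observed_rmse`, is the root-mean-square error over the observed entries of the original matrix (taken before any missing values are filled in).

```python
import numpy as np

def observed_rmse(D_original, E, A):
    """RMSE between D_original and E @ A over the observed (non-zero) entries."""
    D_original = np.asarray(D_original, dtype=float)
    mask = D_original != 0
    residual = (D_original - E @ A)[mask]
    return float(np.sqrt(np.mean(residual ** 2)))

# toy check: factors that reproduce every observed entry exactly
E = np.array([[1.0], [2.0]])
A = np.array([[1.0, 2.0]])
D_toy = np.array([[1, 2], [2, 0]])   # 0 marks a missing rating
rmse = observed_rmse(D_toy, E, A)
```

A value near 0 indicates a good fit on the known ratings. It says nothing about how accurate the predictions for the missing entries are.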
