# Collaborative Filtering

### Notations
Suppose we want to predict the rating that each user would give to each movie.

| Notation | Description |
| :- | :- |
| $n_u$ | Number of users |
| $n_m$ | Number of movies |
| $r(i, j)$ | Equals 1 if user "j" has rated movie "i", and 0 otherwise |
| $y^{(i, j)}$ | Rating of user "j" for the movie "i" |
| $n$ | Number of features for each movie |
| $w^{(j)}$ | Weights of features for user "j" |
| $x^{(i)}$ | Features for movie "i" |
| $b^{(j)}$ | Bias (intercept) for user "j" |

Given a dataset like the one below, collaborative filtering uses the ratings from all users to predict the missing ratings (marked "?").

| Movie | User1 | User2 | User3 | User4 | User5 | Feature1 ($x_1$) | Feature2 ($x_2$) |
| :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: |
| Movie1 | 5 | 5 | 0 | 0 | ? | 0.9 | 0 |
| Movie2 | 5 | ? | ? | 0 | ? | 1.0 | 0.01 |
| Movie3 | ? | 4 | 0 | ? | ? | 0.99 | 0 |
| Movie4 | 0 | 0 | 5 | 4 | ? | 0.1 | 1.0 |
| Movie5 | 0 | 0 | 5 | ? | ? | 0 | 0.9 |
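
As a minimal sketch, the toy table above can be encoded as the matrices $Y$ and $R$ from the notation table. Using `np.nan` for the "?" entries and the variable name `ratings` are assumptions made here for illustration only.

```python
import numpy as np

# Ratings from the table above; np.nan stands in for the "?" entries.
# Rows are movies, columns are users.
ratings = np.array([
    [5,      5,      0,      0,      np.nan],
    [5,      np.nan, np.nan, 0,      np.nan],
    [np.nan, 4,      0,      np.nan, np.nan],
    [0,      0,      5,      4,      np.nan],
    [0,      0,      5,      np.nan, np.nan],
])

# R[i, j] = 1 where user j has rated movie i, 0 otherwise.
R = (~np.isnan(ratings)).astype(int)

# Y holds the ratings, with a placeholder 0 where no rating exists.
Y = np.nan_to_num(ratings, nan=0.0)

n_m, n_u = Y.shape
print(n_m, n_u)        # number of movies, number of users
print(R.sum(axis=0))   # how many movies each user has rated
```

Note that in this toy table a rating of 0 is a real rating, so $R$ has to be built from the "?" positions rather than from `Y != 0`.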

In [1]:
import numpy as np
import tensorflow as tf

### Function
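The prediction looks like a linear model, except that the movie features $X^{(i)}$ are learned together with the user parameters $W^{(j)}$ and $b^{(j)}$, so there are three sets of unknowns.<br>
$
f_{W, b, X}(W^{(j)}, b^{(j)}, X^{(i)}) = W^{(j)}.X^{(i)} + b^{(j)}
$
<br>
If the ratings are mean normalized (the per-movie mean $\mu^{(i)}$ is subtracted from the ratings before training), the mean is added back when predicting: <br>
$
f_{W, b, X}(W^{(j)}, b^{(j)}, X^{(i)}) = W^{(j)}.X^{(i)} + b^{(j)} + \mu^{(i)}
$
<br>
where $\mu$ is a vector with $n_m$ rows, one mean per movie. (Mean normalization could instead produce a vector with $n_u$ rows, i.e. the mean of each column, one per user, rather than of each row.)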
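
To make the formula concrete, here is a small sketch that applies it to the two features from the toy table. The parameter values `w_1` and `b_1` are hypothetical numbers chosen by hand for User1, not learned values.

```python
import numpy as np

# Movie features (x_1, x_2) from the toy table, one row per movie.
X = np.array([
    [0.9,  0.0],
    [1.0,  0.01],
    [0.99, 0.0],
    [0.1,  1.0],
    [0.0,  0.9],
])

# Hypothetical parameters for User1 (assumed for illustration, not learned).
w_1 = np.array([5.0, 0.0])
b_1 = 0.0

# f = W^(j) . X^(i) + b^(j), evaluated for every movie i at once.
predictions = X @ w_1 + b_1
print(predictions)
```

With these assumed parameters, the prediction for Movie3 (which User1 has not rated) is $5 \times 0.99 + 0 \times 0 + 0 = 4.95$.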

### Cost function
Now, in addition to learning the parameters $W$ and $b$, we also have to learn $X$. <br>
$
\begin{equation}
\displaystyle
    J(W, b, X) = \frac{1}{2} \sum_{(i, j):r(i, j)=1}(W^{(j)}.X^{(i)} + b^{(j)} - y^{(i, j)})^2  + \frac{\lambda}{2}\sum_{j=1}^{n_u}\sum_{k=1}^{n}(W_{k}^{(j)})^2 + \frac{\lambda}{2}\sum_{i=1}^{n_m}\sum_{k=1}^{n}(X_{k}^{(i)})^2
\end{equation}
$
<br>
Equivalently, we can sum over all (movie, user) pairs and use $r(i, j)$ as a mask: <br>
$
\begin{equation}
\displaystyle
    J(W, b, X) = \frac{1}{2} \sum_{i=1}^{n_m}\sum_{j=1}^{n_u}r(i, j)\times(W^{(j)}.X^{(i)} + b^{(j)} - y^{(i, j)})^2  + \frac{\lambda}{2}\sum_{j=1}^{n_u}\sum_{k=1}^{n}(W_{k}^{(j)})^2 + \frac{\lambda}{2}\sum_{i=1}^{n_m}\sum_{k=1}^{n}(X_{k}^{(i)})^2
\end{equation}
$

### Gradient descent
Repeat {<br>
    $W_{k}^{(j)} = W_{k}^{(j)} - \alpha\displaystyle\frac{\partial}{\partial W_{k}^{(j)}}J(W, b, X)$ <br>
    $b^{(j)} = b^{(j)} - \alpha\displaystyle\frac{\partial}{\partial b^{(j)}}J(W, b, X)$ <br>
    $X_{k}^{(i)} = X_{k}^{(i)} - \alpha\displaystyle\frac{\partial}{\partial X_{k}^{(i)}}J(W, b, X)$ <br>
}
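
The update rule can be sketched in plain NumPy by writing the partial derivatives of $J$ explicitly. This is an illustrative sketch on random data, not the training code used later in this notebook; the helper names `cost` and `gradients` are assumptions. Differentiating $J$ gives an error term multiplied by $X$ for the $W$ gradient, multiplied by $W$ for the $X$ gradient, and summed per user for $b$, plus $\lambda$ times the parameter for $W$ and $X$.

```python
import numpy as np

def cost(X, W, b, Y, R, lambda_):
    # Regularized squared error over rated entries only.
    err = (X @ W.T + b - Y) * R
    return 0.5 * np.sum(err ** 2) + (lambda_ / 2) * (np.sum(W ** 2) + np.sum(X ** 2))

def gradients(X, W, b, Y, R, lambda_):
    # err[i, j] is the prediction error for movie i and user j (0 if unrated).
    err = (X @ W.T + b - Y) * R
    grad_X = err @ W + lambda_ * X            # shape (n_m, n)
    grad_W = err.T @ X + lambda_ * W          # shape (n_u, n)
    grad_b = err.sum(axis=0, keepdims=True)   # shape (1, n_u)
    return grad_X, grad_W, grad_b

# Small random problem (made-up data, for illustration only).
rng = np.random.default_rng(0)
n_m, n_u, n = 5, 4, 2
X = rng.normal(size=(n_m, n))
W = rng.normal(size=(n_u, n))
b = rng.normal(size=(1, n_u))
Y = rng.integers(0, 6, size=(n_m, n_u)).astype(float)
R = (rng.random((n_m, n_u)) < 0.7).astype(float)

alpha, lambda_ = 1e-3, 1.0
cost_before = cost(X, W, b, Y, R, lambda_)

# One simultaneous gradient descent step on W, b and X.
grad_X, grad_W, grad_b = gradients(X, W, b, Y, R, lambda_)
X_new = X - alpha * grad_X
W_new = W - alpha * grad_W
b_new = b - alpha * grad_b
cost_after = cost(X_new, W_new, b_new, Y, R, lambda_)

# Finite-difference check of one entry of grad_W.
eps = 1e-6
W_plus, W_minus = W.copy(), W.copy()
W_plus[0, 0] += eps
W_minus[0, 0] -= eps
numeric = (cost(X, W_plus, b, Y, R, lambda_) - cost(X, W_minus, b, Y, R, lambda_)) / (2 * eps)
print(cost_before, cost_after, numeric, grad_W[0, 0])
```

All three parameter sets are updated from gradients computed at the same point, which matches the simultaneous update in the rule above.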

In [2]:
def cf_func(x, w, b):
    return np.matmul(x, w.T) + b

In [3]:
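def cf_cost(X, W, b, Y, R, lambda_):
    # Loop-based reference implementation of the regularized cost J(W, b, X).
    J = 0
    nm, n = X.shape   # number of movies, number of features
    nu, _ = W.shape   # number of users

    # Squared error over rated entries only (R[i, j] == 1).
    for j in range(nu):
        for i in range(nm):
            J += R[i, j] * (1/2) * (cf_func(X[i], W[j], b[0, j]) - Y[i, j]) ** 2

    # L2 regularization of the user weights.
    for j in range(nu):
        for k in range(n):
            J += (lambda_ / 2) * W[j, k] ** 2

    # L2 regularization of the movie features.
    for i in range(nm):
        for k in range(n):
            J += (lambda_ / 2) * X[i, k] ** 2

    return J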

The function below is a vectorized implementation of the same cost (reference: Coursera, "Unsupervised Learning, Recommenders, Reinforcement Learning").

In [4]:
def cofi_cost_func_v(X, W, b, Y, R, lambda_):
    """
    Returns the cost for collaborative filtering
    Vectorized for speed. Uses tensorflow operations to be compatible with custom training loop.
    Args:
      X (ndarray (num_movies,num_features)): matrix of item features
      W (ndarray (num_users,num_features)) : matrix of user parameters
      b (ndarray (1, num_users))           : vector of user parameters
      Y (ndarray (num_movies,num_users))   : matrix of user ratings of movies
      R (ndarray (num_movies,num_users))   : matrix, where R(i, j) = 1 if the i-th movie was rated by the j-th user
      lambda_ (float): regularization parameter
    Returns:
      J (float) : Cost
    """
    # Prediction error for every (movie, user) pair, zeroed where no rating exists.
    err = (tf.linalg.matmul(X, tf.transpose(W)) + b - Y)*R
    J = 0.5 * tf.reduce_sum(err**2) + (lambda_/2) * (tf.reduce_sum(X**2) + tf.reduce_sum(W**2))
    return J
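
A quick way to gain confidence in a vectorized cost is to compare it with a loop version on small random inputs. The sketch below re-implements both forms in plain NumPy so that it runs on its own. The names `cost_loop` and `cost_vectorized` are assumptions and are not the functions defined in this notebook.

```python
import numpy as np

def cost_loop(X, W, b, Y, R, lambda_):
    # Explicit double loop over movies i and users j.
    J = 0.0
    for i in range(X.shape[0]):
        for j in range(W.shape[0]):
            if R[i, j] == 1:
                J += 0.5 * (X[i] @ W[j] + b[0, j] - Y[i, j]) ** 2
    J += (lambda_ / 2) * (np.sum(W ** 2) + np.sum(X ** 2))
    return J

def cost_vectorized(X, W, b, Y, R, lambda_):
    # Same quantity, using a masked error matrix instead of loops.
    err = (X @ W.T + b - Y) * R
    return 0.5 * np.sum(err ** 2) + (lambda_ / 2) * (np.sum(X ** 2) + np.sum(W ** 2))

# Made-up data for the comparison.
rng = np.random.default_rng(1)
X = rng.normal(size=(6, 3))
W = rng.normal(size=(4, 3))
b = rng.normal(size=(1, 4))
Y = rng.integers(0, 6, size=(6, 4)).astype(float)
R = (rng.random((6, 4)) < 0.5).astype(float)

J_loop = cost_loop(X, W, b, Y, R, lambda_=1.5)
J_vec = cost_vectorized(X, W, b, Y, R, lambda_=1.5)
print(J_loop, J_vec)
```

The two values should agree up to floating-point rounding, because multiplying by a 0/1 mask before squaring is the same as skipping the unrated entries.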

Since deriving the gradients of the cost function by hand is tedious and error-prone, we use TensorFlow's automatic differentiation ("auto diff") tools.

In [5]:
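def mean_normalization(X):
    # Per-row (per-movie) mean as a column vector of shape (num_rows, 1).
    # Note: the mean is taken over all columns, so unrated entries stored
    # as 0 are included in the average.
    mu = X.mean(axis=1).reshape(-1, 1)
    X = X - mu
    return X, mu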
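
The function above averages each row over every column. A common alternative is to average only over the entries that were actually rated. The sketch below shows that variant next to the all-columns mean. The helper `mean_normalization_rated` is hypothetical and is not used elsewhere in this notebook.

```python
import numpy as np

def mean_normalization_rated(Y, R):
    """Hypothetical variant: per-movie mean over rated entries only."""
    counts = R.sum(axis=1, keepdims=True)
    # Guard against division by zero for movies with no ratings at all.
    mu = (Y * R).sum(axis=1, keepdims=True) / np.maximum(counts, 1)
    # Subtract the mean only where a rating exists; unrated entries stay 0.
    Y_norm = (Y - mu) * R
    return Y_norm, mu

# Two movies, five users; the last user has rated nothing (made-up data).
Y = np.array([[5.0, 5.0, 0.0, 0.0, 0.0],
              [0.0, 0.0, 5.0, 4.0, 0.0]])
R = np.array([[1, 1, 1, 1, 0],
              [1, 1, 1, 1, 0]])

Y_norm, mu_rated = mean_normalization_rated(Y, R)
mu_all = Y.mean(axis=1, keepdims=True)   # mean over all columns, rated or not
print(mu_rated.ravel(), mu_all.ravel())
```

Here the rated-only means are 2.5 and 2.25, while the all-columns means are 2.0 and 1.8, because the unrated fifth column pulls the average down. On a sparse rating matrix the difference is much larger.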

In [6]:
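def gradient_descent(n, Y, R, alpha=0.1, iterations=200, lambda_=0):
    # n is the number of features; Y and R have shape (num_movies, num_users).
    rows, columns = Y.shape
    # Random initialization of movie features X, user weights W and user biases b.
    X = tf.Variable(tf.random.normal((rows, n), dtype=tf.dtypes.float64), name='X')
    W = tf.Variable(tf.random.normal((columns, n), dtype=tf.dtypes.float64), name='W')
    b = tf.Variable(tf.random.normal((1, columns), dtype=tf.dtypes.float64), name='b')
    # The updates are performed by the Adam optimizer rather than plain
    # gradient descent; alpha is its learning rate.
    optimizer = tf.keras.optimizers.Adam(alpha)
    for i in range(iterations):
        # Record the cost computation so TensorFlow can differentiate it.
        with tf.GradientTape() as tape:
            cost_val = cofi_cost_func_v(X, W, b, Y, R, lambda_)

        # Gradients of the cost with respect to X, W and b, then one update step.
        grads = tape.gradient(cost_val, [X, W, b])
        optimizer.apply_gradients(zip(grads, [X, W, b]))

        print(f'Iteration {i + 1} cost:{cost_val}')

    return X.numpy(), W.numpy(), b.numpy()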

## Example

In [7]:
import pandas as pd
df = pd.read_csv('./ratings_small.csv')
df = pd.pivot_table(df, values='rating', index='movieId', columns='userId', fill_value=0)
Y = df.values
R = np.zeros(Y.shape)
R[Y != 0] = 1
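
To see what the pivot step produces, here is a sketch on a tiny in-memory table that stands in for `ratings_small.csv`. The rows are made up for illustration, and only the three columns used above are assumed.

```python
import numpy as np
import pandas as pd

# Tiny stand-in for the ratings file (values are made up).
ratings = pd.DataFrame({
    'userId':  [1, 1, 2, 3],
    'movieId': [10, 20, 10, 30],
    'rating':  [4.0, 3.5, 5.0, 2.0],
})

# One row per movie, one column per user; missing ratings become 0.
table = pd.pivot_table(ratings, values='rating', index='movieId',
                       columns='userId', fill_value=0)

Y = table.to_numpy(dtype=float)
R = (Y != 0).astype(float)   # 1 where a rating exists

print(table)
print(list(table.index), list(table.columns))
```

Row `i` of `Y` corresponds to `table.index[i]` (a movie ID) and column `j` to `table.columns[j]` (a user ID), which is how positional indices can be mapped back to IDs. Building `R` from `Y != 0` only works if 0 is never a genuine rating in the data.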

In [8]:
Y, mu = mean_normalization(Y)
X, W, b = gradient_descent(15, Y, R, alpha=0.1, iterations=600, lambda_=0)

Iteration 1 cost:1403556.221024639
Iteration 2 cost:1100561.9176381542
Iteration 3 cost:878347.1311303538
Iteration 4 cost:714357.0503925778
Iteration 5 cost:590730.78205938
Iteration 6 cost:494257.2029549968
Iteration 7 cost:415559.6094436392
Iteration 8 cost:348411.9797136605
Iteration 9 cost:289271.10504924564
Iteration 10 cost:236930.36829732775
Iteration 11 cost:192038.27715424774
Iteration 12 cost:156152.9131740852
Iteration 13 cost:130258.632585889
Iteration 14 cost:113392.20816523414
Iteration 15 cost:102535.88309053582
Iteration 16 cost:94049.81870032329
Iteration 17 cost:85544.70275244408
Iteration 18 cost:76614.55023186126
Iteration 19 cost:68157.84400366741
Iteration 20 cost:61220.68214996153
Iteration 21 cost:56245.23292548319
Iteration 22 cost:52975.246940576966
Iteration 23 cost:50770.1549833352
Iteration 24 cost:48975.904032261766
Iteration 25 cost:47150.226633782666
Iteration 26 cost:45118.40190031382
Iteration 27 cost:42920.918275247495
Iteration 28 cost:40717.2311918

Iteration 221 cost:10553.668713316249
Iteration 222 cost:10537.011563050992
Iteration 223 cost:10520.507125157119
Iteration 224 cost:10504.153172597196
Iteration 225 cost:10487.948226133767
Iteration 226 cost:10471.890257614652
Iteration 227 cost:10455.976933688678
Iteration 228 cost:10440.206742530521
Iteration 229 cost:10424.579009423986
Iteration 230 cost:10409.09341775834
Iteration 231 cost:10393.744985101826
Iteration 232 cost:10378.523550914651
Iteration 233 cost:10363.436585413921
Iteration 234 cost:10348.49137919311
Iteration 235 cost:10333.664485562009
Iteration 236 cost:10318.960051914593
Iteration 237 cost:10304.40059383055
Iteration 238 cost:10289.969459911978
Iteration 239 cost:10275.648889179294
Iteration 240 cost:10261.440067321146
Iteration 241 cost:10247.36065912428
Iteration 242 cost:10233.407789359495
Iteration 243 cost:10219.559973346868
Iteration 244 cost:10205.833278636424
Iteration 245 cost:10192.224123723261
Iteration 246 cost:10178.718931278474
...

Iteration 444 cost:8702.97534366432
Iteration 445 cost:8698.854987601206
Iteration 446 cost:8694.74986288888
Iteration 447 cost:8690.697045859779
Iteration 448 cost:8686.650808375776
Iteration 449 cost:8682.582761006879
Iteration 450 cost:8678.538426508248
Iteration 451 cost:8674.503878940712
Iteration 452 cost:8670.506066265363
Iteration 453 cost:8666.55744926879
Iteration 454 cost:8662.60348431571
Iteration 455 cost:8658.658286593345
Iteration 456 cost:8654.733482788653
Iteration 457 cost:8650.833728540734
Iteration 458 cost:8646.972983851356
Iteration 459 cost:8643.111998258093
Iteration 460 cost:8639.258596623895
Iteration 461 cost:8635.440224264705
Iteration 462 cost:8631.631510206053
Iteration 463 cost:8627.831106550151
Iteration 464 cost:8624.043842231267
Iteration 465 cost:8620.268523575452
Iteration 466 cost:8616.51316090413
Iteration 467 cost:8612.769509710037
Iteration 468 cost:8609.046741155513
Iteration 469 cost:8605.361725733794
Iteration 470 cost:8601.693602527455
...

In [9]:
predicted = cf_func(X, W, b) + mu
predicted = predicted.astype(float)
predicted = np.round(predicted, 1)
predicted_df = pd.DataFrame(predicted)
predicted_df.columns.name = 'Users'
predicted_df.index.name = 'Movies'
predicted_df

| Movies \ Users | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | ... | 661 | 662 | 663 | 664 | 665 | 666 | 667 | 668 | 669 | 670 |
| :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: |
| 0 | 5.2 | 4.2 | 3.0 | 4.4 | 4.3 | 1.4 | 3.4 | 4.0 | 3.6 | 3.8 | ... | 3.6 | 3.8 | 4.0 | 4.0 | 3.4 | 3.8 | 4.1 | 3.1 | 4.2 | 4.5 |
| 1 | 3.8 | 3.0 | 3.2 | 4.3 | 4.0 | 4.1 | 2.0 | 3.7 | 3.0 | 4.0 | ... | 3.8 | 3.6 | 3.4 | 3.2 | 4.0 | 3.2 | 5.9 | 3.2 | 5.0 | 3.8 |
| 2 | 5.0 | 3.3 | 1.3 | 3.5 | 3.6 | 6.3 | 3.3 | 3.3 | 2.9 | 0.6 | ... | 0.7 | 4.7 | 3.0 | 3.8 | -6.1 | 4.7 | 3.8 | 3.7 | 7.7 | 2.7 |
| 3 | 7.6 | 8.2 | -3.7 | -1.7 | 3.3 | 19.0 | 1.5 | 4.2 | -0.8 | 5.7 | ... | 3.4 | 3.1 | 2.2 | 4.8 | 2.5 | -1.9 | 2.5 | -4.3 | -17.2 | 8.5 |
| 4 | 4.6 | 6.6 | 3.3 | 3.5 | 3.0 | 7.7 | 1.4 | 2.5 | 3.9 | 1.5 | ... | 2.0 | 4.3 | 3.2 | 3.6 | 1.7 | 2.0 | 2.8 | 1.3 | 1.4 | 4.4 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 9061 | 8.1 | 6.1 | 1.1 | 7.0 | 4.5 | -3.8 | 4.0 | 3.4 | 3.5 | 7.4 | ... | 6.0 | 5.4 | 3.6 | 4.3 | 1.6 | 2.9 | 4.0 | 3.0 | -5.6 | 5.3 |
| 9062 | 0.1 | 6.2 | 5.8 | 5.9 | 2.8 | 0.7 | 4.7 | 2.2 | 4.1 | 5.2 | ... | 2.8 | 2.9 | 3.4 | 1.9 | 6.4 | 1.8 | 0.9 | 4.9 | -0.1 | 3.9 |
| 9063 | 3.4 | 5.7 | 0.2 | 3.5 | 4.7 | 1.5 | 2.7 | 6.7 | -1.2 | 3.8 | ... | 2.1 | 4.0 | 4.2 | 5.5 | 1.3 | 0.4 | 2.6 | -1.0 | 1.3 | 5.5 |
| 9064 | 6.1 | 4.7 | 5.4 | -1.0 | 2.0 | 14.1 | 0.8 | 4.6 | 1.7 | 2.5 | ... | -2.1 | 0.1 | 3.3 | 0.8 | 6.2 | 3.3 | -0.1 | -1.0 | 3.3 | 3.8 |
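
Some predictions above fall well outside the range of real ratings (for example 19.0 and -17.2), so in practice they are usually clipped before being turned into recommendations. The sketch below clips the scores and picks the highest-scoring movies a user has not rated yet. The helper `top_n_unrated` and the assumed rating scale of 0.5 to 5.0 are illustrative assumptions, not part of this notebook.

```python
import numpy as np

def top_n_unrated(predicted, R, user, n=3, lo=0.5, hi=5.0):
    """Hypothetical helper: row indices of the n best unrated movies for one user."""
    # Clip predictions to the assumed rating scale [lo, hi].
    scores = np.clip(predicted[:, user], lo, hi)
    # Exclude movies the user has already rated.
    scores = np.where(R[:, user] == 1, -np.inf, scores)
    # Sort by descending score; a stable sort keeps ties in row order.
    order = np.argsort(-scores, kind='stable')
    # Drop the excluded (already rated) movies.
    order = order[np.isfinite(scores[order])]
    return order[:n]

# Made-up predictions for a single user over five movies.
predicted = np.array([[5.2], [3.8], [19.0], [-1.7], [4.6]])
R = np.array([[1], [0], [0], [0], [0]])   # the user has rated movie 0 only

best = top_n_unrated(predicted, R, user=0, n=3)
print(best)
```

The returned values are row positions in the prediction matrix, so they still need to be mapped back to movie IDs through the index of the pivoted rating table.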


In [10]:
df

| movieId \ userId | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | ... | 662 | 663 | 664 | 665 | 666 | 667 | 668 | 669 | 670 | 671 |
| :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: |
| 1 | 0.0 | 0 | 0.0 | 0 | 0.0 | 0.0 | 3 | 0.0 | 4 | 0 | ... | 0 | 4.0 | 3.5 | 0 | 0 | 0 | 0 | 0 | 4 | 5.0 |
| 2 | 0.0 | 0 | 0.0 | 0 | 0.0 | 0.0 | 0 | 0.0 | 0 | 0 | ... | 5 | 0.0 | 0.0 | 3 | 0 | 0 | 0 | 0 | 0 | 0.0 |
| 3 | 0.0 | 0 | 0.0 | 0 | 4.0 | 0.0 | 0 | 0.0 | 0 | 0 | ... | 0 | 0.0 | 0.0 | 3 | 0 | 0 | 0 | 0 | 0 | 0.0 |
| 4 | 0.0 | 0 | 0.0 | 0 | 0.0 | 0.0 | 0 | 0.0 | 0 | 0 | ... | 0 | 0.0 | 0.0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0 |
| 5 | 0.0 | 0 | 0.0 | 0 | 0.0 | 0.0 | 0 | 0.0 | 0 | 0 | ... | 0 | 0.0 | 0.0 | 3 | 0 | 0 | 0 | 0 | 0 | 0.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 161944 | 0.0 | 0 | 0.0 | 0 | 0.0 | 0.0 | 0 | 0.0 | 0 | 0 | ... | 0 | 0.0 | 0.0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0 |
| 162376 | 0.0 | 0 | 0.0 | 0 | 0.0 | 0.0 | 0 | 0.0 | 0 | 0 | ... | 0 | 0.0 | 0.0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0 |
| 162542 | 0.0 | 0 | 0.0 | 0 | 0.0 | 0.0 | 0 | 0.0 | 0 | 0 | ... | 0 | 0.0 | 0.0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0 |
| 162672 | 0.0 | 0 | 0.0 | 0 | 0.0 | 0.0 | 0 | 0.0 | 0 | 0 | ... | 0 | 0.0 | 0.0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0 |
