# Collaborative Filtering

Busco modelar la calificacion del usuario j al item i, dado un conjunto de clasificaciones de otros usuarios.

> $y(i,j) = w^j*x^i+b^j$ Es la calificacion del user j al item i.

Entonces 
- $X^i$: vector del item
- $W^j$, $B^j$: vectores del usuario

Obs. $W^j$, $X^i$ tienen misma dimension.

Por lo tanto, busco $W$, $X$, $B$ para minimizar el error J.

En particular, cuando la calificacion no es binaria, el error es muy parecido a varios modelos de aprendizaje supervizado no binarios:

> $J(W,B,X) = 1/2\sum_{(i,j):r(i,j)=1}(W^j*X^i+B^j-Y^{(i,j)})^2 + \lambda/2(\sum^{nU}_{j}\sum_{k}^{n}(w_k^j)^2 + \sum^{nI}_{i}\sum_{k}^{n}(x_k^i)^2)$

Siendo $r(i,j) | r(i,j) = 1 <=> $ el item i fue clasificado por el usuario j. 

Cuando la clasificacion es binaria (like, follow, click, etc), en cambio es: 

> $J(W,B,X) = \sum_{(i,j):r(i,j)=1}L(f^{*}_{w,b,x}(x)-Y^{(i,j)})$

> $L(f^*_{w,b,x}(x),y^{i,j}) = -y^{(i,j)}*log(f^{*}(x))-(1-y^{(i,j)})*log(1-f^{*}(x))$

> $f^{*}_{w,b,x}(x) = g(W^j*X^i+B^j)$

> $g(z) = \frac{1}{1 + e^{-z}}$

### Implementacion ejemplo

Voy a hacer una implementacion para recomendar items en base a calificaciones (de -10 a +10) que hicieron usuarios sobre distintos chistes. 

- Kaggle Dataset: https://www.kaggle.com/datasets/aakaashjois/jester-collaborative-filtering-dataset?resource=download

In [62]:
import pandas as pd

PATH_TO_DATASET = r"joke-rating\UserRatings1.csv"
df = pd.read_csv(PATH_TO_DATASET)

In [63]:
df.head(5)

Unnamed: 0,JokeId,User1,User2,User3,User4,User5,User6,User7,User8,User9,...,User36701,User36702,User36703,User36704,User36705,User36706,User36707,User36708,User36709,User36710
0,0,5.1,-8.79,-3.5,7.14,-8.79,9.22,-4.03,3.11,-3.64,...,,,,,,,,,2.91,
1,1,4.9,-0.87,-2.91,-3.88,-0.58,9.37,-1.55,0.92,-3.35,...,,,,-5.63,,-6.07,,-1.6,-4.56,
2,2,1.75,1.99,-2.18,-3.06,-0.58,-3.93,-3.64,7.52,-6.46,...,,,,,,4.08,,,8.98,
3,3,-4.17,-4.61,-0.1,0.05,8.98,9.27,-6.99,0.49,-3.4,...,,,,,,,,,,
4,4,5.15,5.39,7.52,6.26,7.67,3.45,5.44,-0.58,1.26,...,2.28,-0.49,5.1,-0.29,-3.54,-1.36,7.48,-5.78,0.73,2.62


In [64]:
num_users = 100
num_jokes = 100
num_features = 10

La idea es poder recomendar chistes similares a las personas segun sus gustos.

In [81]:
# Inicializo las variables W, B, X con valores aleatorios y a R segun si el usuario califico o no el chiste.
# Tambien inicializo a Y

import numpy as np
import tensorflow as tf
from tensorflow import keras

W = tf.Variable(tf.random.normal((num_users, num_features), dtype=tf.float64),  name='W')
X = tf.Variable(tf.random.normal((num_jokes, num_features), dtype=tf.float64),  name='X')
b = tf.Variable(tf.random.normal((1,          num_users),   dtype=tf.float64),  name='b')

Y = df.to_numpy() # de esta forma solo devuelve 100 filas
Y = Y [0:num_jokes, 1:num_users+1]

R = np.zeros([num_jokes, num_users])

for j in range(Y.shape[1]):
    for i in range(Y.shape[0]):
        R[j,i] = int (Y[j,i] is not np.nan) 

R.shape, Y.shape

((100, 100), (100, 100))

In [66]:
## Normalizo a Y
Ynorm = Y / np.linalg.norm(Y, 1)

### Defino la funcion error

In [120]:
def J_ERROR (W, B, X, Y, R, lambda_=0.1):
    j = (tf.linalg.matmul(X, tf.transpose(W)) + b - Y)*R
    J = 0.5 * tf.reduce_sum(j**2) + (lambda_/2) * (tf.reduce_sum(X**2) + tf.reduce_sum(W**2))
    return J

In [68]:
# Instancio un optimizador
lambda_ = 1e-1
optimizer = keras.optimizers.Adam(learning_rate=lambda_)

In [121]:
iterations = 1000
for iter in range(iterations):
    with tf.GradientTape() as tape:
        cost_value = J_ERROR (W, b, X, Ynorm, R, lambda_)
    grads = tape.gradient( cost_value, [X,W,b] )
    optimizer.apply_gradients( zip(grads, [X,W,b]) )
    if iter % 100 == 0:
        print (f"+ #{iter} Loss: {cost_value}")

+ #0 Loss: 0.15501157403958468
+ #100 Loss: 0.1545140066790011
+ #200 Loss: 0.15419706048975237
+ #300 Loss: 0.15399174690512168
+ #400 Loss: 0.15385646539706097
+ #500 Loss: 0.15376588744369052
+ #600 Loss: 0.15370437566259484
+ #700 Loss: 0.15385297908258783
+ #800 Loss: 0.15363273990525242
+ #900 Loss: 0.15414246824539324


In [156]:
# Armo la matriz de predicciones
preds = np.matmul(X.numpy(), np.transpose(W.numpy())) + b.numpy()
preds += Ynorm

preds

array([[ 7.06819757e+00,  3.59843867e+00, -1.43783275e+00, ...,
        -8.54084165e-01,  2.00316317e-01,  7.08969457e-01],
       [-9.70526431e-02, -7.72529700e-01,  3.49686450e+00, ...,
         8.55165902e+00, -1.39149536e+00, -5.94163142e+00],
       [-2.07225659e+00, -1.76538034e+00, -2.76461024e-01, ...,
        -3.71469927e+00, -6.95666572e-01, -3.01465960e+00],
       ...,
       [-2.28643680e-01,  3.00586026e+00, -2.86884339e+00, ...,
        -3.62887059e+00,  1.70803215e+00, -6.85146832e-01],
       [ 3.19796524e+00,  1.17722896e+00, -1.49379193e+00, ...,
         2.16716152e+00, -9.94588596e-01,  1.37811044e+00],
       [-6.65866373e+00, -3.11564238e+00,  6.25193232e-03, ...,
         4.55544559e+00, -1.37444689e+00,  3.57848081e-01]])

In [174]:
def compare_model_rating (joke_i, user_j):
    if R[joke_i, user_j] == 0:
        print ("+ El usuario no habia clasificado el chiste. ")
        print ("+ El modelo predice un rating de ", round(preds[joke_i,user_j], 2))
    else: 
        print ("+ User: ", Y[joke_i,user_j])
        print ("+ Model: ", round(preds[joke_i,user_j], 2))
        print ("+ Error: ", round(100*abs((Y[joke_i,user_j] - preds[joke_i,user_j])/20), 1), "%")

compare_model_rating(17, 22)

+ User:  -1.26
+ Model:  -2.9
+ Error:  8.2 %


### Armo una clase que haga todo lo que esta codificado de forma organizada

In [175]:

class JokeRecommender ():

    def __init__ (self, num_users, num_jokes):

        # Inicio variables del modelo
        self.W = tf.Variable(tf.random.normal((num_users, num_features), dtype=tf.float64),  name='W')
        self.X = tf.Variable(tf.random.normal((num_jokes, num_features), dtype=tf.float64),  name='X')
        self.b = tf.Variable(tf.random.normal((1,          num_users),   dtype=tf.float64),  name='b')

        # Leo Y
        Y = df.to_numpy() # devuelve 100 filas solo
        self.Y = Y [0:num_jokes, 1:num_users+1]

        # Armo matriz R
        self.R = np.zeros([num_jokes, num_users])
        for j in range(self.Y.shape[1]):
            for i in range(self.Y.shape[0]):
                self.R[j,i] = int (Y[j,i] is not np.nan) 

        ## Creo Y normalizado
        self.Ymean = np.linalg.norm(Y, 1)
        self.Ynorm = self.Y / self.Ymean

        # Creo la matriz que va a guardar las predicciones del modelo entrenado
        self.preds = np.zeros([num_jokes, num_users])
        self.lambda_ = 1e-1
        self.optimizer = keras.optimizers.Adam(learning_rate=lambda_)


    def fit (self, _iterations):
        iterations = _iterations
        for iter in range(iterations):
            with tf.GradientTape() as tape:
                cost_value = J_ERROR (self.W, self.b, self.X, self.Ynorm, self.R, self.lambda_)
            grads = tape.gradient( cost_value, [self.X, self.W, self.b] )
            self.optimizer.apply_gradients( zip(grads, [self.X, self.W, self.b]) )
            if iter % 100 == 0:
                print (f"+ #{iter} Loss: {cost_value}")

        # Actualizo la matriz de predicciones
        self.preds = np.matmul(X.numpy(), np.transpose(W.numpy())) + b.numpy()

    def predict (self, joke_i, user_j):
        return self.preds[i, j]

In [177]:
myrecomm = JokeRecommender(100, 100)
myrecomm.fit(100)
myrecomm.preds

+ #0 Loss: nan


array([[ 7.06180995,  3.60944792, -1.43344909, ..., -0.85427204,
         0.19454241,  0.70045264],
       [-0.10318976, -0.77144005,  3.5005092 , ...,  8.55147115,
        -1.39818357, -5.9470421 ],
       [-2.07444842, -1.76787276, -0.27373063, ..., -3.71531299,
        -0.70380765, -3.02548098],
       ...,
       [-0.22840571,  3.00068755, -2.86021385, ..., -3.62850737,
         1.713017  , -0.67705585],
       [ 3.1938947 ,  1.17953351, -1.49087367, ...,  2.16053594,
        -0.99361167,  1.37069579],
       [-6.66413704, -3.1193497 ,  0.01549519, ...,  4.5648141 ,
        -1.36587997,  0.34811636]])