Copyright (C) 2024 Pablo Castells and Alejandro Bellogín

The code in this notebook was written for the lab assignments of the "Recommender Systems" course of the Master's in Data Science, taught at the Escuela Politécnica Superior of the Universidad Autónoma de Madrid. Its purpose and use are limited to the teaching activities of that course.

### **Recommender Systems 2024-25**
### Universidad Autónoma de Madrid, Escuela Politécnica Superior
### Master's in Data Science

# Collaborative filtering with deep learning: EASE, Two-Tower, Transformers

Dates:

* Start: Tuesday, February 25.
* Due: Monday, March 17, 23:59.

## Objectives

The goal of this assignment is to understand how collaborative filtering methods are designed with deep learning, as a transition from a typical bilinear matrix factorization model to neural models of arbitrary complexity. This block covers:

* Collaborative filtering algorithms based on deep learning.
* Collaborative filtering algorithms for sequential data.
* Evaluation metrics for recommender systems.

## Provided material

As in P1, software and data are provided for the assignment:

* Some data structures, already implemented, for handling rating data and recommender output.
* A skeleton of classes and functions in which the student will develop their implementations. 
  - A test cell is provided at the end of this notebook; it must work with the student's implementations.
  - Next to the test cell in this same notebook, a sample output generated with the instructors' implementation is shown for reference.
* The same rating datasets used in P1:
  - Two toy datasets for testing and debugging: <ins>toy1.csv</ins> and <ins>toy2.csv</ins>, with fictitious ratings.
  - A real dataset of movie ratings: *ml-1m.zip*, available on the [MovieLens](https://grouplens.org/datasets/movielens/1m) website. Of the files in the archive, only <ins>ratings.dat</ins> will be used, after adding a header line `u::i::r::t` to it.
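
Adding the header line can be done by hand, or with a few lines of Python. The following is a minimal sketch; the helper name `add_header` and the file paths in the usage comment are illustrative assumptions, not part of the provided material.

```python
import shutil

def add_header(src, dst, header="u::i::r::t"):
    """Copy the text file `src` to `dst`, prepending one header line."""
    with open(src, "r", encoding="utf-8") as fin, \
         open(dst, "w", encoding="utf-8") as fout:
        fout.write(header + "\n")
        # Stream the rest of the file so it never has to fit in memory at once.
        shutil.copyfileobj(fin, fout)

# Hypothetical usage (paths are assumptions):
# add_header("data/ratings.dat", "data/ratings-1m.dat")
```

The test cell at the end of this notebook reads `data/ratings-1m.dat` with `sep='::'`, so the output name above is chosen to match it.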
  
The code skeletons provided here are meant as a guide: the student may modify anything freely, as long as the test cell works correctly **without changes**.

In particular, if the student already changed any of these classes for P1, they may keep using those modifications.

The submission will consist of a *notebook* file containing all the **implementations** requested in each exercise, together with an explanation of each one as a **report**.

The test cell must run without errors on the first attempt with the code submitted by the student (except, naturally, for any exercises that have not been implemented).

## Data structures: ratings and recommendations

The following are provided:
* A Ratings class that reads the data from a text file, with methods that split it into training and test partitions (either at <b>random</b> or <b>by time</b>), to evaluate and compare the effectiveness of different recommendation algorithms.
* The Recommender and Recommendation classes from the previous assignment can be reused.
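
The two kinds of split can be pictured as Boolean masks over the dense user-item matrix. The sketch below is a simplified, standalone illustration on plain NumPy arrays; the function names are hypothetical, and it assumes one held-out item per user in the temporal case. It is not a replacement for the methods of the `Ratings` class.

```python
import numpy as np

def random_mask_split(matrix, ratio, rng):
    """Send each cell to training with probability `ratio`, else to test."""
    mask = rng.random(matrix.shape) < ratio      # True -> training cell
    return matrix * mask, matrix * ~mask

def last_item_split(matrix, times):
    """Hold out each user's most recent interaction as the test set."""
    # Ignore timestamps of cells without a rating.
    masked_t = np.where(matrix > 0, times, -np.inf)
    last = masked_t.argmax(axis=1)
    mask = np.ones(matrix.shape, dtype=bool)
    rows = np.flatnonzero((matrix > 0).any(axis=1))  # users with any rating
    mask[rows, last[rows]] = False
    return matrix * mask, matrix * ~mask
```

In both cases training and test are disjoint, and adding them back together recovers the original matrix.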

In [1]:
import numpy as np
import pandas as pd

class Ratings:
    def __init__(self, file=None, sep=','):
        if file:
            data = pd.read_csv(file, delimiter=sep, engine='python')
            u, i, r, t = data.columns[0:4]
            data.r = 1
            self.m = data.pivot(index=u, columns=i, values=r).fillna(0).to_numpy(dtype=np.float32)
            self.mt = data.pivot(index=u, columns=i, values=t).fillna(-1).to_numpy(dtype=np.float32)
            self.uids = np.sort(data[u].unique())
            self.iids = np.sort(data[i].unique())
            self.uidxs = {u:j for j, u in enumerate(self.uids)}
            self.iidxs = {i:j for j, i in enumerate(self.iids)}
            self._nratings = (self.m > 0).sum()
            self.data = data
        
    def copy(self, ratings, matrix, temp_matrix):
        self.m = matrix
        self.mt = temp_matrix
        self.uids = ratings.uids
        self.iids = ratings.iids
        self.uidxs = ratings.uidxs
        self.iidxs = ratings.iidxs
        self._nratings = (matrix > 0).sum()
        dfr = pd.DataFrame(columns=self.iids, index=self.uids, data=self.m).unstack().reset_index(name='r')
        dfr.columns = ['i', 'u', 'r']
        dft = pd.DataFrame(columns=self.iids, index=self.uids, data=self.mt).unstack().reset_index(name='t')
        dft.columns = ['i', 'u', 't']
        df_key = ['u','i']
        df = pd.concat([dfr.set_index(df_key).squeeze(), dft.set_index(df_key).squeeze()], keys = ['r','t'],axis=1).fillna(0).reset_index()
        self.data = df[df.r>0][['u', 'i', 'r', 't']].sort_values(by=['u', 'i'])
        return self
    
    def matrix(self):
        return self.m

    def temporal_matrix(self):
        return self.mt

    def nusers(self):
        return len(self.uids)
    
    def nitems(self):
        return len(self.iids)
    
    # uidx can be an int or an array-like of ints.
    def uidx_to_uid(self, uidx):
        return self.uids[uidx]
        
    # iidx can be an int or an array-like of ints.
    def iidx_to_iid(self, iidx):
        return self.iids[iidx]
    
    def uid_to_uidx(self, uid):
        return self.uidxs[uid]
        
    def iid_to_iidx(self, iid):
        return self.iidxs[iid]
        
    def iidx_rated_by(self, uidx):
        return self.m[uidx].nonzero()[0]
        
    def uidx_who_rated(self, iidx):
        return self.m[:, iidx].nonzero()[0]
        
    def random_split(self, ratio):
        mask = np.random.choice([True, False], size=self.m.shape, p=[ratio, 1-ratio])
        train = self.m * mask
        temp_train = self.mt * mask
        test = self.m * ~mask
        temp_test = self.mt * ~mask
        return Ratings().copy(self, train, temp_train), Ratings().copy(self, test, temp_test)
    
    def peruser_sequence_split(self, ntestitems=1):
        test_ids_arr = [group.sort_values(by='t', ascending=False)[['u', 'i']].to_numpy() 
                    for _, group in self.data.groupby(by='u')]
        test_ids = []
        for user_arr in test_ids_arr:
            for ids in user_arr[:ntestitems]:
                test_ids.append(ids)
        #print(test_ids)
        test_idx = np.array([[self.uid_to_uidx(uid), self.iid_to_iidx(iid)] for uid, iid in test_ids])
        mask = np.ones(self.matrix().shape)
        mask[test_idx[:, 0], test_idx[:, 1]] = 0
        train = self.m * mask
        temp_train = self.mt * mask
        test = self.m * (1-mask)
        temp_test = self.mt * (1-mask)
        return Ratings().copy(self, train, temp_train), Ratings().copy(self, test, temp_test)
    
    #
    # The remaining functions are just for debugging purposes.
    #

    def rating(self, uid, iid):
        return self.matrix()[self.uid_to_uidx(uid), self.iid_to_iidx(iid)]

    def items_rated_by(self, uid):
        return self.iidx_to_iid(self.iidx_rated_by(self.uid_to_uidx(uid)))
        
    def users_who_rated(self, iid):
        return self.uidx_to_uid(self.uidx_who_rated(self.iid_to_iidx(iid)))
    
    def user_ratings(self, uid):
        iidxs = self.matrix()[self.uid_to_uidx(uid)].nonzero()[0]
        return {self.iidx_to_iid(iidx): fround(r) for iidx, r in zip(iidxs, self.matrix()[self.uid_to_uidx(uid), iidxs])}

    def item_ratings(self, iid):
        uidxs = self.matrix()[:, self.iid_to_iidx(iid)].nonzero()[0]
        return {self.uidx_to_uid(uidx): fround(r) for uidx, r in zip(uidxs, self.matrix()[uidxs, self.iid_to_iidx(iid)])}

    def nratings(self):
        return self._nratings
 

## Exercise 1: EASE

Implement a linear collaborative filtering model based on autoencoders.

Note: the _threshold_ parameter sets the value at which the input matrix is binarized, that is, which values count as positive and which as negative.
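
For orientation, EASE has a closed-form solution: with the Gram matrix regularized as P = (XᵀX + λI)⁻¹, the item-item weights are B_ij = -P_ij / P_jj off the diagonal and 0 on it. The sketch below is a standalone NumPy illustration of that formula, not the `Ease` class requested in the exercise; the name `ease_weights` and the parameter name `l2` are assumptions made here.

```python
import numpy as np

def ease_weights(X, l2=20.0):
    """Closed-form EASE item-item weights for a (users x items) matrix X."""
    G = X.T @ X                              # item-item Gram matrix
    G[np.diag_indices_from(G)] += l2         # L2 regularization on the diagonal
    P = np.linalg.inv(G)
    B = P / (-np.diag(P))                    # divide column j by -P[j, j]
    B[np.diag_indices_from(B)] = 0.0         # an item must not predict itself
    return B
```

Scores for all users are then `X @ B`. One way to binarize the input first, assuming ratings at or above the threshold count as positive, is `X = (R >= threshold).astype(float)`.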

In [None]:
class Ease(Recommender):
    def __init__(self, training, l=20, threshold=3):
        super().__init__(training)
        # Your code here...

        self.scores = # Your code here...


### Exercise 1 &ndash; Explanation/documentation

(to do)

## Exercise 2: Matrix factorization as a deep learning model

As an alternative to the matrix factorization implementation from P1, in this assignment you will reformulate that implementation as a "degenerate" special case of a neural architecture.
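
One way to read "degenerate" here: an embedding layer is a lookup table of factor vectors, and a dot-product layer with no hidden layers or nonlinearities reproduces the bilinear MF score. The NumPy sketch below illustrates that equivalence under this reading; the names `P`, `Q` and `mf_scores` are illustrative and not part of the skeleton.

```python
import numpy as np

def mf_scores(P, Q, users, items):
    """Score (user, item) pairs as the dot product of their embeddings.

    P: (n_users, k) user embedding table; Q: (n_items, k) item embedding table.
    users, items: integer index arrays of equal length (one pair per position).
    """
    user_vecs = P[users]                 # "embedding layer": row lookup
    item_vecs = Q[items]
    return np.sum(user_vecs * item_vecs, axis=-1)   # "dot layer"
```

Scoring every pair this way gives the same values as the full product `P @ Q.T`, which is the classic MF prediction matrix.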

### Implementation in TensorFlow

Fill in the gaps marked with `# Your code here...`.

Notes:
* Because of the training data structure that TensorFlow handles, training on the whole rating matrix (including every cell without data) is too expensive. A small sample of negative examples is therefore drawn in each epoch.
* The skeleton provided here does not generate the P@10 trace (curve) during training, since that does not fit easily into TensorFlow's Keras API.
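
The negative sampling idea can be sketched independently of TensorFlow. The function below is a hypothetical helper that follows the same approach as the `train_model` skeleton in this exercise: it keeps every observed rating and adds `neg` uniformly sampled (user, item) pairs per observed rating, reading each label from the matrix so that a sampled pair that happens to be observed keeps its positive label.

```python
import numpy as np

def sample_training_pairs(matrix, neg, rng):
    """Return (users, items, labels) with `neg` random pairs per positive."""
    pos_u, pos_i = matrix.nonzero()                  # observed ratings
    n_neg = neg * len(pos_u)
    neg_u = rng.integers(0, matrix.shape[0], size=n_neg)
    neg_i = rng.integers(0, matrix.shape[1], size=n_neg)
    users = np.concatenate((pos_u, neg_u))
    items = np.concatenate((pos_i, neg_i))
    labels = matrix[users, items]                    # 0 for unobserved cells
    return users, items, labels
```

Calling this once per epoch would give a fresh negative sample each time; calling it once before training reuses the same sample throughout. Which of the two is preferable is left as a design choice.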

In [None]:
import tensorflow as tf
from tqdm.keras import TqdmCallback
import time, datetime

class DLMFRecommender(Recommender):
    def __init__(self, training, k=50, lrate=0.01, nepochs=150, neg=4):
        super().__init__(training)
        # Create the model - this will directly trigger training.        
        tf.random.set_seed(0) # For comparability and debugging (the randomness here is in parameter initialization).
        self.model, self.hist = self.create_model(training, k, lrate, neg, nepochs)
        # Plot the training error and report the final test metric value (P@10).
        # Your code here...

        uexplode = np.full((training.nitems(), training.nusers()), np.arange(training.nusers())).T.flatten()
        iexplode = np.full((training.nusers(), training.nitems()), np.arange(training.nitems())).flatten()
        self.scores = self.model.predict([uexplode, iexplode], batch_size=training.nusers()*100, 
                            verbose=1).reshape(training.nusers(), training.nitems())

    def create_model(self, ratings, k, lrate, neg, nepochs):
        # 'users' is an input layer of type tf.int64.
        users = # Your code here...
        user_embeddings = # Your code here...
        # 'items' is an input layer of type tf.int64.
        items = # Your code here...
        item_embeddings = # Your code here...
        # TensorFlow has a built-in dot-product layer.
        dot = # Your code here...
        # Now we need a generic model that wraps up the "network", specifying the input and output layers.
        model = # Your code here...

        # Compile the model: Adam optimizer is suggested here over SGD.
        # Your code here...

        # Show the model topology
        model.summary()
        tf.keras.utils.plot_model(model, show_shapes=True, dpi=150)
        
        hist = self.train_model(ratings, model, neg, nepochs)
        return model, hist

    def train_model(self, ratings, tf_mf, neg, nepochs):
        # We inject 'neg' negative samples for every available rating in the training data
        nneg = neg * ratings.nratings()
        user_ids = np.concatenate(([ratings.uid_to_uidx(u) for u in ratings.data.u], np.random.choice(list(ratings.uidxs.values()), size=nneg)))
        item_ids = np.concatenate(([ratings.iid_to_iidx(u) for u in ratings.data.i], np.random.choice(list(ratings.iidxs.values()), size=nneg)))
        rs = ratings.matrix()[user_ids, item_ids]
        batch_size = ratings.nratings() + nneg # Single batch with all the data at once.
        
        # Your code here... to actually do the training
        hist = # Your code here...
            , callbacks=[TqdmCallback(verbose=0)]) # Produces a prettier progress bar.
        return hist

### Exercise 2 &ndash; Explanation/documentation

(to do)

## Exercise 3: Two-Tower model implementation

Implement your own version of a Two-Tower model, starting from the MF architecture implemented in the previous exercise. 
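
Conceptually, a Two-Tower model passes the user embedding and the item embedding through separate networks (the towers) before taking the dot product. The NumPy sketch below shows a forward pass with a single dense ReLU layer per tower; the layer sizes, the activation and all names are assumptions for illustration only, and an actual solution would build the towers with Keras layers.

```python
import numpy as np

def tower(emb, W, b):
    """One dense layer with ReLU applied to a batch of embeddings."""
    return np.maximum(emb @ W + b, 0.0)

def two_tower_scores(P, Q, Wu, bu, Wi, bi):
    """Score all (user, item) pairs from separately transformed embeddings."""
    U = tower(P, Wu, bu)      # user tower output, shape (n_users, d)
    V = tower(Q, Wi, bi)      # item tower output, shape (n_items, d)
    return U @ V.T            # dot product between the two tower outputs
```

With identity weights, zero biases and non-negative embeddings, the towers do nothing and the scores reduce to plain MF, which is why the MF class is a natural base to extend.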

In [None]:
class TwoTowerRecommender(DLMFRecommender):
    # Your code here... It may not be necessary to re-implement every method...

### Exercise 3 &ndash; Explanation/documentation

(to do)

## Exercise 4: Sequential recommendation [UPDATED March 3]

You will implement your own version of a Transformer-based model, specifically the SASRec algorithm, which includes a positional embedding and causal masking in the attention layer.
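
The two ingredients named above can be illustrated in a few lines of NumPy. This is a deliberately reduced sketch: a single attention head with identity query/key/value projections, which is an assumption made to keep the example short. The names `embed_sequence` and `causal_self_attention` are hypothetical.

```python
import numpy as np

def embed_sequence(seq, item_emb, pos_emb):
    """Item embedding plus a learned embedding of each position 0..L-1."""
    return item_emb[seq] + pos_emb[np.arange(len(seq))]

def causal_self_attention(x):
    """Self-attention over x of shape (L, d) where step t sees only steps <= t."""
    L, d = x.shape
    scores = x @ x.T / np.sqrt(d)
    mask = np.tril(np.ones((L, L), dtype=bool))     # lower triangle: past and present
    scores = np.where(mask, scores, -np.inf)        # future positions get zero weight
    scores -= scores.max(axis=1, keepdims=True)     # numerically stable softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ x, weights
```

Because of the mask, changing the last item of a sequence leaves the outputs at all earlier positions untouched, which is what allows every position to be trained to predict its next item in parallel.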

As a starting point, you can review the following implementation of a GRU-based model, although your version can use a much simpler batching scheme (this one builds the same parallel mini-batches as the original proposal).
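
One simple batching scheme for a Transformer, sketched here under stated assumptions, is to turn each user sequence into a fixed-length input and a target shifted by one position. The choices of left-padding, of reserving index 0 for padding, and of shifting item indices by one are assumptions of this sketch, as is the name `make_training_pair`.

```python
import numpy as np

def make_training_pair(seq, max_len, pad=0):
    """Build (input, target) arrays of length max_len from one item sequence.

    Item indices are shifted by +1 so that `pad` (0) never collides with an item.
    The target at each position is the item that follows the input item.
    """
    seq = np.asarray(seq) + 1
    x_items = seq[:-1][-max_len:]        # all but the last item, most recent kept
    y_items = seq[1:][-max_len:]         # the same window shifted one step ahead
    x = np.full(max_len, pad)
    y = np.full(max_len, pad)
    if len(x_items) > 0:
        x[-len(x_items):] = x_items      # left-pad so recent items sit at the end
        y[-len(y_items):] = y_items
    return x, y
```

For example, `make_training_pair([3, 0, 2, 5], max_len=5)` yields the input `[0, 0, 4, 1, 3]` and the target `[0, 0, 1, 3, 6]`.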

In [None]:
class SessionDataset:
      def __init__(self, df, n):
          self.df = df.sort_values(by = ['s', 't']).reset_index(drop = True)
          self.offsets    = np.concatenate((np.zeros(1, dtype = np.int32), self.df.groupby('s').size().cumsum().values))
          self.n_sessions = len(self.offsets) - 1
          self.item_to_id = {item : i for i, item in enumerate(self.df.i.unique())}
          self.n_items = len(self.item_to_id)
          self.n = n

      def item_to_one_hot(self, item):
          return tf.one_hot(self.item_to_id[item], depth = self.n)

      def extract_session(self, i, one_hot_encoded = True):
          session = self.df[self.offsets[i]:self.offsets[i+1]].copy()
          if one_hot_encoded:
              session.loc[:, 'i'] = session.i.apply(lambda x : self.item_to_one_hot(x))
          return session.i.values.tolist()

def from_ratings_to_sessions(ratings, session_size):
    ratings_sorted = ratings.data.sort_values(by=["u", "t"])
    data_sessions = []
    cur_session = 0
    cur_session_length = 0
    cur_user = None
    def my_fun(row):
        nonlocal data_sessions
        nonlocal cur_session
        nonlocal cur_session_length
        nonlocal cur_user

        if not cur_user:
            cur_user = row.u
        if row.u != cur_user:
            cur_user = row.u
            cur_session += 1
            cur_session_length = 0
        if row.u == cur_user:
            cur_session_length += 1
        data_sessions.append(cur_session)
        if session_size and cur_session_length >= session_size:
            cur_session += 1
            cur_session_length = 0

    ratings_sorted.apply(my_fun, axis=1)
    sessions = pd.DataFrame(
        data={
            "u": ratings_sorted.u,
            "i": ratings_sorted.i,
            "r": ratings_sorted.r,
            "t": ratings_sorted.t,
            "s": data_sessions,
        }
    )
    return sessions

class GRU4RecRecommender(Recommender):
    def __init__(self, training, k=50, nepochs=150, steps_per_epoch=100, ngru_layers=1, batch_size=8, session_length=5, compute_final=False):
        super().__init__(training)
        # Create the model - this will directly trigger training.        
        tf.random.set_seed(0) # For comparability and debugging (the randomness here is in parameter initialization).
        dataset = SessionDataset(from_ratings_to_sessions(training, session_length), training.nitems())
        self.tf_mf, self.hist = self.create_model(dataset, training.nitems(), k, nepochs, steps_per_epoch, ngru_layers, batch_size)
        # Since the network is stateful, the batch size cannot be modified (at least in Keras), so we must always predict batch_size elements at once.
        n_classes = training.nitems()
        # the hidden states should be computed for every session, but that takes a long time
        # ideally, this computation could be parallelized
        if compute_final:
            final_states = self._calculate_final_states(dataset, self.tf_mf, ngru_layers, k, batch_size, n_classes)
        self._reset_hidden(self.tf_mf, 0)
        y_pred = np.empty(shape = (dataset.n_sessions, n_classes))
        y_pred[:] = None
        X = np.empty(shape = (batch_size, 1, n_classes))
        next_session_id = 0
        for batch_id in range(dataset.n_sessions // batch_size):
            # X contains the penultimate item in the session (= last item in the training set)
            X[:] = None
            for i in range(batch_size):
                X[i, :] = dataset.extract_session(next_session_id)[-1]
                next_session_id += 1
            nlg = 0
            for nl, layer in enumerate(self.tf_mf.layers):
                if self._is_GRU_layer(layer):
                    self.tf_mf.layers[nl].reset_states()
                    nlg += 1
            # objective: predict last element in the session
            y_pred[batch_id * batch_size : (batch_id + 1) * batch_size, :] = self.tf_mf.predict(X, verbose = 0)[:batch_size]

        y_pred = tf.constant(y_pred[:dataset.n_sessions], dtype = tf.float32)
        # recover predictions as item scores for each user for classical recommendation (this should not be done in general, but perform a sequential evaluation)
        self.scores = y_pred.numpy()

    def create_model(self, dataset, n_classes, k, nepochs, steps_per_epoch, ngru_layers, batch_size):
        model = tf.keras.models.Sequential(name="GRU4Rec")
        for i in range(ngru_layers):
            model.add(tf.keras.layers.GRU(name = 'GRU_{}'.format(i+1),
                                          units      = k, 
                                          activation = 'relu', 
                                          stateful   = True,
                                          return_sequences = (i < ngru_layers - 1)))
        model.add(tf.keras.layers.Dense(units = n_classes, activation = 'linear'))   # class logits

        # track top 3 accuracy (= how often the true item is among the top 3 recommended)
        top3accuracy = lambda y_true, y_pred: tf.keras.metrics.top_k_categorical_accuracy(y_true, y_pred, k = 3)
        top3accuracy.__name__ = 'top3accuracy'
        model.compile(loss = GRU4RecRecommender._TOP1, optimizer = 'adam', metrics = ['accuracy', top3accuracy])

        model.build(input_shape = (batch_size, 1, n_classes))
        print(model.summary())
        
        hist = self.train_model(dataset, model, n_classes, batch_size, nepochs, steps_per_epoch)
        return model, hist
    
    def train_batch_generator(self, model, dataset, batch_size, n_classes):
        ixs = np.arange(dataset.n_sessions)

        stacks = [[]] * batch_size   # stacks containing batch_size REVERSED (pieces of) sessions at once. Will be emptied progressively
        next_session_id = 0

        X, y = np.empty(shape = (batch_size, 1, n_classes)), np.empty(shape = (batch_size, n_classes)) 
        while True:
            X[:], y[:] = None, None
            for i in range(batch_size): # fill in X, y (current batch)
                # 1. If stack i is empty (only happens at first round) or has only one element: fill it with a new session
                if len(stacks[i]) <= 1:
                    while not len(stacks[i]) >= 2:   # ignore sessions with only one element (cannot contribute to the training)
                        stacks[i] = dataset.extract_session(ixs[next_session_id])[::-1]  # the data does not have to be all in memory at the same time: we could e.g. load a session at once
                        next_session_id += 1
                        if next_session_id >= dataset.n_sessions: # no more sessions available: shuffle sessions and restart
                            np.random.shuffle(ixs)
                            next_session_id = 0
                    self._reset_hidden(model, i)   # if session changes, the corresponding hidden state must be reset
                # 2. Stack i is now valid: set input + target variables
                X[i, 0] = stacks[i].pop()
                y[i]    = stacks[i][-1]

            yield tf.constant(X, dtype = tf.float32), tf.constant(y, dtype = tf.float32)

    def _reset_hidden(self, model, i):
        for nl, layer in enumerate(model.layers):   # session has changed: reset corresponding hidden state
            if self._is_GRU_layer(layer) and layer.states[0] is not None:
                hidden_updated = layer.states[0].numpy()
                hidden_updated[i, :] = 0.
                model.layers[nl].reset_states()

    def _is_GRU_layer(self, layer):
        return layer.name.startswith('GRU_')

    def train_model(self, dataset, model, n_classes, batch_size, nepochs, steps_per_epoch):
        hist = model.fit(self.train_batch_generator(model, dataset, batch_size, n_classes), 
                            steps_per_epoch = steps_per_epoch, 
                            epochs          = nepochs,
                            callbacks       = [TqdmCallback(verbose=0)], 
                            shuffle         = False)
        return hist
    
    def _TOP1(y_true, y_pred):
        _y_pred = tf.expand_dims(y_pred, axis = -1)
        mat = tf.matmul(tf.expand_dims(tf.ones_like(y_true), -1), tf.expand_dims(y_true, axis = 1))
        score_diffs = tf.matmul(mat, _y_pred)
        score_diffs = tf.squeeze(score_diffs - _y_pred, -1)
        loss_by_sample = tf.reduce_sum(tf.nn.sigmoid(tf.square(y_pred)), axis = -1) + \
                          tf.reduce_sum(tf.sigmoid(-score_diffs), axis = -1) + \
                        -tf.squeeze(tf.squeeze(tf.nn.sigmoid(tf.square(tf.matmul(tf.expand_dims(y_true, 1), _y_pred))), -1), -1)
        return tf.reduce_sum(loss_by_sample)

    def _calculate_final_states(self, dataset, model, n_layers, n_hidden, batch_size, n_classes):
        final_states = np.empty(shape = (dataset.n_sessions, n_layers, n_hidden)) # final states will be stored here
        final_states[:] = None
        done = [False] * dataset.n_sessions   # keep track of the sessions for which the last state has already been calculated

        stacks = [dataset.extract_session(i)[::-1] for i in range(batch_size)]
        next_session_id = batch_size
        batch_idx_to_session = np.arange(batch_size)   # keep track of which session is in each batch element
        X = np.empty(shape = (batch_size, 1, n_classes))

        self._reset_hidden(model, 0)
        n_done = 0
        while n_done < dataset.n_sessions:
            for i in range(batch_size):
                while len(stacks[i]) == 1:  # stack i is at the end
                    if not done[batch_idx_to_session[i]]:
                        # save final hidden state
                        final_states[batch_idx_to_session[i], :] = np.array([layer.states[0][i, :] for layer in model.layers if self._is_GRU_layer(layer)])
                        done[batch_idx_to_session[i]] = True
                        n_done += 1
                        if n_done % 1000 == 0:
                            print("Progress: {} / {}".format(n_done, dataset.n_sessions))
                    if next_session_id >= dataset.n_sessions: # restart from the beginning (just to reach required batch size)
                        next_session_id = 0
                    stacks[i] = dataset.extract_session(next_session_id)[::-1]
                    batch_idx_to_session[i] = next_session_id
                    next_session_id += 1
                    self._reset_hidden(model, i)   # session has changed --> reset corresponding hidden state
                X[i, 0] = stacks[i].pop()

            _ = model.predict(X, verbose = 0)   # hidden states get updated when "predict" is called
        
        return final_states


In [None]:
def from_ratings_to_sequences(ratings):
    ratings_sorted = ratings.data.sort_values(by=["u", "t"])
    data_sequences = {}
    cur_sequence = 0
    cur_user = None
    def my_fun(row):
        nonlocal data_sequences
        nonlocal cur_sequence
        nonlocal cur_user

        if not cur_user:
            cur_user = row.u
            data_sequences[cur_sequence] = []
        if row.u != cur_user:
            cur_user = row.u
            cur_sequence += 1
            data_sequences[cur_sequence] = []
        # append item, but also transform it into index instead of using its id
        data_sequences[cur_sequence].append(ratings.iid_to_iidx(row.i))

    ratings_sorted.apply(my_fun, axis=1)
    return data_sequences

class PositionalEmbedding(tf.keras.layers.Layer):
    def __init__(self, sequence_length, output_dim, **kwargs):
        super().__init__(**kwargs)
        self.position_embeddings = tf.keras.layers.Embedding(
            input_dim=sequence_length, output_dim=output_dim
        )
        self.sequence_length = sequence_length
        self.output_dim = output_dim

    def call(self, inputs):
        length = tf.shape(inputs)[1]
        positions = tf.range(start=0, limit=length, delta=1)
        embedded_positions = self.position_embeddings(positions)
        return inputs + embedded_positions


class TransformerRecommender(Recommender):
    def __init__(self, training, batch_size=32, dim=128, num_heads=4, nlayers=2, max_len=50, dropout=0.1, nepochs=5, steps_per_epoch=100):
        super().__init__(training)
        self.max_len = max_len
        # Create the model - this will directly trigger training.
        tf.random.set_seed(0) # For comparability and debugging (the randomness here is in parameter initialization).
        dataset = from_ratings_to_sequences(training)
        self.model, self.hist = self.create_model(dataset, training.nitems(), batch_size, dim, num_heads, nlayers, max_len, dropout, nepochs, steps_per_epoch)

        padded_sequences = []
        for sequence in dataset.values():
            # Your code here...

        input_tensor = tf.constant(padded_sequences, dtype=tf.float32)
        logits =  # Your code here...
        scores =  # Your code here...
        self.scores = scores.numpy()

    def create_model(self, ratings, nitems, batch_size, dim=128, num_heads=4, nlayers=2, max_len=50, dropout=0.1, nepochs=10, steps_per_epoch=100):
        inputs = tf.keras.Input(shape=(max_len,))
        x = tf.keras.layers.Embedding(nitems + 1, dim, mask_zero=True)(inputs)
        x = # Your code here...

        for _ in range(nlayers):
            x = tf.keras.layers.LayerNormalization(epsilon=1e-6)(x)
            attn_output = # Your code here...
            x = tf.keras.layers.Add()([x, attn_output])
            x = tf.keras.layers.Dropout(dropout)(x)

            x = tf.keras.layers.LayerNormalization(epsilon=1e-6)(x)
            ffn_output = # Your code here...
            x = tf.keras.layers.Add()([x, ffn_output])
            x = tf.keras.layers.Dropout(dropout)(x)

        outputs = # Your code here...

        model = tf.keras.Model(inputs=inputs, outputs=outputs, name="Transformer")
        # Compile the model: Adam optimizer and SparseCategoricalCrossentropy as loss
        # Your code here...
        print(model.summary())

        hist = self.train_model(ratings, model, batch_size, nepochs, steps_per_epoch)
        return model, hist

    def train_batch_generator(self, dataset, batch_size):
        ixs = np.arange(len(dataset))
        next_seq_id = 0

        X, Y = np.empty(shape = (batch_size, self.max_len)), np.empty(shape = (batch_size, self.max_len)) 
        while True:
            X[:], Y[:] = None, None
            # Your code here...

            yield tf.constant(X, dtype = tf.float32), tf.constant(Y, dtype = tf.float32)

    def train_model(self, dataset, model, batch_size, nepochs, steps_per_epoch):
        hist = model.fit(self.train_batch_generator(dataset, batch_size), 
                            steps_per_epoch = steps_per_epoch, 
                            epochs          = nepochs,
                            callbacks       = [TqdmCallback(verbose=0)], 
                            shuffle         = False)
        return hist

### Exercise 4 &ndash; Explanation/documentation

(to do)

## Exercise 5: Extensions

Explore variations on one or more of the previous implementations, such as:

* In general:
    * In addition to the evaluation metrics used in P1 (precision, recall), include other metrics, either accuracy-oriented (NDCG) or of other kinds (coverage, diversity, etc.).

* On exercise 1 (the EASE algorithm):
    * Add the option of setting negative weights to 0, and check that effectiveness decreases (as the original paper reports).

* On exercises 2 (matrix factorization with deep learning) and 3 (Two-Tower model):
    * Different scoring and loss functions: sigmoid / BCE loss, BCE loss with logits.
    * Different optimizers and optimizer settings (SGD, Adam, etc.).
    * Variations in the hyperparameters and model configuration: learning rate, number of factors k, number of epochs, initialization of the model parameters, etc.
    * Add options such as regularization, dropout, etc.
    * Add hidden layers in the deep learning framework implementation.
    * Explore a *pairwise learning to rank* formulation on top of MF (e.g., BPR).

* On exercise 4 (sequential recommendation):
    * Study the impact of the batch size (batch_size).
    * Try other ways of splitting the dataset and observe whether the effectiveness of the sequential algorithms changes.
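
As an example of an additional accuracy metric, NDCG with binary relevance can be computed for one user as follows. This is a standalone sketch on plain Python lists; the function name is hypothetical, and wrapping it in the `Metric` interface from P1 is left out because that interface is not shown in this notebook.

```python
import math

def ndcg_at_k(ranked_items, relevant_items, k):
    """NDCG@k for one user, with relevance 1 for test items and 0 otherwise."""
    relevant = set(relevant_items)
    # Discounted gain of the hits actually placed in the top k.
    dcg = sum(1.0 / math.log2(pos + 2)
              for pos, item in enumerate(ranked_items[:k]) if item in relevant)
    # Best achievable DCG: all relevant items packed at the top of the ranking.
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(pos + 2) for pos in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0
```

Unlike precision, this metric rewards placing a relevant item higher in the list: a single hit at rank 2 scores 1/log2(3), about 0.63, instead of 1.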

Ideally, these variations aim to improve recommendation accuracy, but interesting attempts will be valued even if they fail in that respect.
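
As a pointer for the pairwise learning-to-rank variation, the BPR objective penalizes every (positive, negative) pair in which the negative item scores close to or above the positive one. A minimal NumPy sketch of the loss alone, without regularization or the training loop, might look like this; the function name is an assumption.

```python
import numpy as np

def bpr_loss(pos_scores, neg_scores):
    """Mean of -log(sigmoid(pos - neg)) over paired positive/negative scores."""
    diff = np.asarray(pos_scores, dtype=float) - np.asarray(neg_scores, dtype=float)
    # -log(sigmoid(d)) equals log(1 + exp(-d)); logaddexp keeps it numerically stable.
    return float(np.mean(np.logaddexp(0.0, -diff)))
```

When both scores are equal the loss is log 2 per pair, and it tends to 0 as the positive item is ranked further above the negative one.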

To test the implementations, complete the `student_test()` function to illustrate the execution of the additional variants, and add the corresponding rows to the table in the previous section.

In [8]:
# Code here: classes, functions...

def student_test():
    # Test code here...
    pass

### Exercise 5 &ndash; Explanation/documentation

(to do)

## Test cell

Download the data files and place their contents in a **data** folder in the same directory as this *notebook*.

In [11]:
import datetime, time

class CategoricalAccuracy(Metric):
    def __init__(self, test, cutoff=np.inf):
        super().__init__(cutoff)
        dataset = from_ratings_to_sequences(test)
        y_true = np.empty(shape = (len(dataset), 1))
        for i in range(y_true.shape[0]):
            y_true[i, :] = dataset[i][:1]
        self.y_true = tf.constant(y_true, dtype = tf.float32)

    def compute(self, recommendation):
        y_pred = tf.constant(recommendation.ranked_iidx(), dtype = tf.float32)
        acc = (tf.reduce_sum(tf.keras.metrics.top_k_categorical_accuracy(self.y_true, y_pred, k = self.cutoff)) / self.y_true.shape[0]).numpy()
        return acc


# Test data structures and algorithms on a dataset.
def test(ratings_file, topn=np.inf, cutoff=np.inf, threshold=1, sep=','):
    print(colored(f'Reading the data at ' + time.strftime('%X...'), 'blue'))
    start = time.time()
    ratings = Ratings(ratings_file, sep)
    print(f'Ratings matrix takes {round(10 * ratings.matrix().nbytes / 1024 / 1024) / 10:,} MB in RAM')
    timer(start)

    # Produce a rating split and test a set of recommenders. 
    train, test = ratings.random_split(0.8)
    train_temp, test_temp = ratings.peruser_sequence_split(ntestitems=1)
    metrics = [Precision(test, cutoff=cutoff, threshold=threshold), Recall(test, cutoff=cutoff, threshold=threshold)]
    metrics_temp = [Precision(test_temp, cutoff=cutoff, threshold=threshold), Recall(test_temp, cutoff=cutoff, threshold=threshold), CategoricalAccuracy(test_temp, cutoff=1), CategoricalAccuracy(test_temp, cutoff=3), CategoricalAccuracy(test_temp, cutoff=cutoff)]
    run_recommenders(train, metrics, topn)
    run_temp_recommenders(train_temp, metrics_temp, topn)


# Run some recommenders on the same rating data as input and evaluate them.
def run_recommenders(train, metrics, topn):
    print('-------------------------')
    start = time.time()
    run_recommender(DLMFRecommender(train, nepochs=5), metrics, topn)
    start = timer(start)
    
    print('-------------------------')
    run_recommender(TwoTowerRecommender(train, nepochs=5), metrics, topn)
    start = timer(start)
    
    print('-------------------------')
    run_recommender(Ease(train, threshold=1), metrics, topn)
    timer(start)

# Run the sequence-aware recommenders (GRU4Rec, Transformer) and evaluate them.
def run_temp_recommenders(train, metrics, topn):
    print('-------------------------')
    start = time.time()
    run_recommender(GRU4RecRecommender(train, nepochs=5, session_length=None, batch_size=32), metrics, topn)
    start = timer(start)
    
    print('-------------------------')
    run_recommender(TransformerRecommender(train, nlayers=1), metrics, topn)
    start = timer(start)
    
    print('-------------------------')
    run_recommender(TransformerRecommender(train, nlayers=2), metrics, topn)
    timer(start)

# Run a recommender and evaluate a list of metrics on its output.
def run_recommender(recommender, metrics, topn):
    print(f'Testing {recommender} (top {topn})')
    recommendation = recommender.recommend(topn)
    print('Four example recommendations:\n' + recommendation.display(4))
    for metric in metrics:
        print(metric, '=', metric.compute(recommendation))

from termcolor import colored
def timer(start):
    print(colored(f'--> elapsed time: {datetime.timedelta(seconds=round(time.time() - start))} <--', 'blue'))
    return time.time()
    
np.random.seed(0)
print('=========================\nTesting MovieLens \'1 million\' dataset')
test('data/ratings-1m.dat', topn=10, cutoff=10, sep='::')
print('=========================\nDone.')

# Additional testing?
student_test()

Testing MovieLens '1 million' dataset
Reading the data at 10:50:35...
Ratings matrix takes 85.4 MB in RAM
--> elapsed time: 0:00:07 <--




-------------------------


38/38 ━━━━━━━━━━━━━━━━━━━━ 3s 73ms/step
Testing DLMFRecommender (top 10)
Four example recommendations:
    User 1 -> <1653:0 3412:0 1198:0 1375:0 1476:0 2393:0 805:0 1192:0 3932:0 481:0>
    User 2 -> <2867:0 1434:0 3506:0 3022:0 1913:0 1053:0 802:0 1252:0 1150:0 3606:0>
    User 3 -> <3755:0 1663:0 3753:0 1307:0 3094:0 1834:0 3437:0 2906:0 1036:0 1328:0>
    User 4 -> <3211:0 726:0 3321:0 1104:0 734:0 3229:0 3419:0 3845:0 2662:0 468:0>
Precision@10 = 0.04841059602649006
Recall@10 = 0.009920007624033047
--> elapsed time: 0:00:18 <--
-------------------------


38/38 ━━━━━━━━━━━━━━━━━━━━ 5s 128ms/step
Testing TwoTowerRecommender (top 10)
Four example recommendations:
    User 1 -> <2858:0.9281 1210:0.8443 1196:0.8389 593:0.8029 480:0.8028 589:0.8026 1198:0.7941 110:0.7922 2571:0.7869 1270:0.7847>
    User 2 -> <260:1.222 2762:1.1377 608:1.1348 110:1.132 1580:1.122 1270:1.1217 2396:1.1169 1617:1.1095 1197:1.0933 527:1.0932>
    User 3 -> <260:0.7965 2028:0.7654 589:0.7475 2762:0.7419 608:0.7391 110:0.7378 2571:0.7328 2396:0.7282 1617:0.7231 527:0.7123>
    User 4 -> <2858:0.6986 2028:0.6179 593:0.6034 589:0.6033 2762:0.5989 608:0.5962 110:0.5954 2571:0.5913 1270:0.5896 1580:0.5892>
Precision@10 = 0.18160596026490067
Recall@10 = 0.06748152424989019
--> elapsed time: 0:00:23 <--
-------------------------
   x @ x.T computed [0:00:01]
   Matrix p inverted [0:00:02]
   Scores computed [0:00:01]
Testing Ease (top 10)
Four example recommendations:
    User 1 -> <588:0.5914 364:0.5

Epoch 1/5
100/100 ━━━━━━━━━━━━━━━━━━━━ 47s 441ms/step - accuracy: 0.0069 - loss: 118133.8047 - top3accuracy: 0.0138
Epoch 2/5
100/100 ━━━━━━━━━━━━━━━━━━━━ 44s 444ms/step - accuracy: 0.0035 - loss: 116657.1719 - top3accuracy: 0.0098
Epoch 3/5
100/100 ━━━━━━━━━━━━━━━━━━━━ 44s 441ms/step - accuracy: 0.0062 - loss: 115741.2734 - top3accuracy: 0.0134
Epoch 4/5
100/100 ━━━━━━━━━━━━━━━━━━━━ 44s 444ms/step - accuracy: 0.0029 - loss: 115171.7266 - top3accuracy: 0.0109
Epoch 5/5
100/100 ━━━━━━━━━━━━━━━━━━━━ 44s 441ms/step - accuracy: 0.0051 - loss: 114426.3516 - top3accuracy: 0.0168
Testing GRU4RecRecommender (top 10)
Four example recommendations:
    User 1 -> <64:0.3325 140:0.3138 99:0.2941 80:0.2895 17:0.2715 525:0.2708 133:0.2598 8:0.2525 135:0.2444 203:0.239>
    User 2 -> <64:0.2979 150:0.2893 140:0.2856 99:0.2663 80:0.256



Epoch 1/5
100/100 ━━━━━━━━━━━━━━━━━━━━ 20s 137ms/step - accuracy: 0.9009 - loss: 2.0901
Epoch 2/5
100/100 ━━━━━━━━━━━━━━━━━━━━ 13s 130ms/step - accuracy: 1.0000 - loss: 0.0012
Epoch 3/5
100/100 ━━━━━━━━━━━━━━━━━━━━ 13s 131ms/step - accuracy: 1.0000 - loss: 6.0254e-04
Epoch 4/5
100/100 ━━━━━━━━━━━━━━━━━━━━ 13s 135ms/step - accuracy: 1.0000 - loss: 3.6936e-04
Epoch 5/5
100/100 ━━━━━━━━━━━━━━━━━━━━ 13s 135ms/step - accuracy: 1.0000 - loss: 2.5148e-04
189/189 ━━━━━━━━━━━━━━━━━━━━ 12s 59ms/step
Testing TransformerRecommender (top 10)
Four example recommendations:
    User 1 -> <2354:11.6549 1096:7.315 1192:7.1456 3185:6.7863 1544:6.6763 2686:6.6111 607:6.5668 1960:5.852 2796:5.6287 1206:5.5888>
    User 2 -> <2354:9.8011 3104:6.393 744:6.1252 1192:6.0659 2686:6.0554 1034:6.0358 149:5.7617 5



Epoch 1/5
100/100 ━━━━━━━━━━━━━━━━━━━━ 31s 209ms/step - accuracy: 0.9002 - loss: 2.1691
Epoch 2/5
100/100 ━━━━━━━━━━━━━━━━━━━━ 20s 199ms/step - accuracy: 1.0000 - loss: 0.0013
Epoch 3/5
100/100 ━━━━━━━━━━━━━━━━━━━━ 20s 198ms/step - accuracy: 1.0000 - loss: 6.6873e-04
Epoch 4/5
100/100 ━━━━━━━━━━━━━━━━━━━━ 20s 202ms/step - accuracy: 1.0000 - loss: 4.0155e-04
Epoch 5/5
100/100 ━━━━━━━━━━━━━━━━━━━━ 20s 200ms/step - accuracy: 1.0000 - loss: 2.6556e-04
189/189 ━━━━━━━━━━━━━━━━━━━━ 17s 89ms/step
Testing TransformerRecommender (top 10)
Four example recommendations:
    User 1 -> <2354:16.5057 660:12.8146 593:10.1056 918:9.919 2320:9.2147 2686:8.8334 149:8.7487 2761:8.7118 1096:8.6839 913:8.6194>
    User 2 -> <2354:16.0389 660:11.5319 259:10.6225 918:10.3091 1835:9.4998 3113:9.0252 1720:8.939