##### Copyright 2020 The TensorFlow Authors.

In [None]:
#@title Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Listwise ranking

<table class="tfo-notebook-buttons" align="left">
  <td>
    <a target="_blank" href="https://www.tensorflow.org/recommenders/examples/listwise_ranking"><img src="https://www.tensorflow.org/images/tf_logo_32px.png" />View on TensorFlow.org</a>
  </td>
  <td>
    <a target="_blank" href="https://colab.research.google.com/github/tensorflow/recommenders/blob/main/docs/examples/listwise_ranking.ipynb"><img src="https://www.tensorflow.org/images/colab_logo_32px.png" />Run in Google Colab</a>
  </td>
  <td>
    <a target="_blank" href="https://github.com/tensorflow/recommenders/blob/main/docs/examples/listwise_ranking.ipynb"><img src="https://www.tensorflow.org/images/GitHub-Mark-32px.png" />View source on GitHub</a>
  </td>
  <td>
    <a href="https://storage.googleapis.com/tensorflow_docs/recommenders/docs/examples/listwise_ranking.ipynb"><img src="https://www.tensorflow.org/images/download_logo_32px.png" />Download notebook</a>
  </td>
</table>

In [the basic ranking tutorial](basic_ranking), we trained a model that can predict ratings for user/movie pairs. The model was trained to minimize the mean squared error of predicted ratings.

However, optimizing the model's predictions on individual movies is not necessarily the best method for training ranking models. We do not need ranking models to predict scores with great accuracy. Instead, we care more about the ability of the model to generate an ordered list of items that matches the user's preference ordering.

Instead of optimizing the model's predictions on individual query/item pairs, we can optimize the model's ranking of a list as a whole. This method is called _listwise ranking_.

In this tutorial, we will use TensorFlow Recommenders to build listwise ranking models. To do so, we will make use of ranking losses and metrics provided by [TensorFlow Ranking](https://github.com/tensorflow/ranking), a TensorFlow package that focuses on [learning to rank](https://www.microsoft.com/en-us/research/publication/learning-to-rank-from-pairwise-approach-to-listwise-approach/).

## Preliminaries

If TensorFlow Ranking is not available in your runtime environment, you can install it using `pip`:

In [None]:
!pip install -q tensorflow-recommenders
!pip install -q --upgrade tensorflow-datasets
!pip install -q tensorflow-ranking

We can then import all the necessary packages:

In [None]:
import pprint

import numpy as np
import tensorflow as tf
import tensorflow_datasets as tfds

In [None]:
import tensorflow_ranking as tfr
import tensorflow_recommenders as tfrs

We will continue to use the MovieLens 100K dataset. As before, we load the datasets and keep only the user id, movie title, and user rating features for this tutorial. We also do some housekeeping to prepare our vocabularies.

In [None]:
ratings = tfds.load("movielens/100k-ratings", split="train")
movies = tfds.load("movielens/100k-movies", split="train")

ratings = ratings.map(lambda x: {
    "movie_title": x["movie_title"],
    "user_id": x["user_id"],
    "user_rating": x["user_rating"],
})
movies = movies.map(lambda x: x["movie_title"])

unique_movie_titles = np.unique(np.concatenate(list(movies.batch(1000))))
unique_user_ids = np.unique(np.concatenate(list(ratings.batch(1_000).map(
    lambda x: x["user_id"]))))

## Data preprocessing

However, we cannot use the MovieLens dataset for list optimization directly. To perform listwise optimization, we need to have access to a list of movies each user has rated, but each example in the MovieLens 100K dataset contains only the rating of a single movie.

To get around this we transform the dataset so that each example contains a user id and a list of movies rated by that user. Some movies in the list will be ranked higher than others; the goal of our model will be to make predictions that match this ordering.

To do this, we use the `tfrs.examples.movielens.sample_listwise` helper function. It takes the MovieLens 100K dataset and generates a dataset containing list examples as discussed above. The implementation details can be found in the [source code](https://github.com/tensorflow/recommenders/blob/main/tensorflow_recommenders/examples/movielens.py).
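The transformation described above can be sketched in plain Python. This is a minimal illustration, not the helper's actual implementation: the function name `to_listwise`, the toy ratings, and the choice to skip users with too few ratings are all assumptions made for this example.

```python
import random
from collections import defaultdict


def to_listwise(ratings, num_lists_per_user, num_examples_per_list, seed=42):
    """Group (user, title, rating) triples by user and sample fixed-size lists.

    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)

    # Collect every (title, rating) pair that each user has produced.
    by_user = defaultdict(list)
    for user_id, title, rating in ratings:
        by_user[user_id].append((title, rating))

    examples = []
    for user_id, rated in by_user.items():
        # Assumption: users with too few ratings cannot fill a list, so skip them.
        if len(rated) < num_examples_per_list:
            continue
        for _ in range(num_lists_per_user):
            # Sample without replacement so a list never repeats a movie.
            sampled = rng.sample(rated, num_examples_per_list)
            examples.append({
                "user_id": user_id,
                "movie_title": [title for title, _ in sampled],
                "user_rating": [rating for _, rating in sampled],
            })
    return examples


toy_ratings = [
    ("u1", "Alien", 5.0), ("u1", "Heat", 3.0), ("u1", "Fargo", 4.0),
    ("u2", "Alien", 2.0),
]
lists = to_listwise(toy_ratings, num_lists_per_user=2, num_examples_per_list=2)
```

Each resulting example pairs one user id with parallel lists of titles and ratings, which is the shape a listwise loss needs.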

In [None]:
tf.random.set_seed(42)

# Split between train and test sets, as before.
shuffled = ratings.shuffle(100_000, seed=42, reshuffle_each_iteration=False)

train = shuffled.take(80_000)
test = shuffled.skip(80_000).take(20_000)

# We sample 50 lists for each user for the training data. For each list we
# sample 5 movies from the movies the user rated.
train = tfrs.examples.movielens.sample_listwise(
    train,
    num_list_per_user=50,
    num_examples_per_list=5,
    seed=42
)
test = tfrs.examples.movielens.sample_listwise(
    test,
    num_list_per_user=1,
    num_examples_per_list=5,
    seed=42
)

We can inspect an example from the training data. The example includes a user id, a list of 5 movie titles, and their ratings by the user.

In [None]:
for example in train.take(1):
  pprint.pprint(example)

## Model definition

We will train the same model with three different losses:

- mean squared error,
- pairwise hinge loss, and
- a listwise ListMLE loss.

These three losses correspond to pointwise, pairwise, and listwise optimization.

To evaluate the model we use [normalized discounted cumulative gain (NDCG)](https://en.wikipedia.org/wiki/Discounted_cumulative_gain#Normalized_DCG). NDCG measures a predicted ranking by taking a weighted sum of the actual rating of each candidate. The ratings of movies that the model ranks lower are discounted more, and the sum is normalized by the value an ideal ranking would achieve. As a result, a good model that ranks highly rated movies on top has a high NDCG. Since this metric takes the ranked position of each candidate into account, it is a listwise metric.
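To make the metric concrete, here is a small pure-Python sketch of NDCG. It assumes a common formulation (gain `2^rating - 1` and a `log2` position discount); the exact gain and discount used by `tfr.keras.metrics.NDCGMetric` may differ, and the function names and toy values are illustrative only.

```python
import math


def dcg(ratings_in_ranked_order):
    """Discounted cumulative gain with gain 2^rating - 1 and a log2 discount."""
    return sum(
        (2.0 ** rating - 1.0) / math.log2(position + 2)
        for position, rating in enumerate(ratings_in_ranked_order)
    )


def ndcg(true_ratings, predicted_scores):
    """NDCG of the ranking obtained by sorting items by predicted score."""
    # Reorder the true ratings by descending predicted score.
    ranked = [
        rating for _, rating in sorted(
            zip(predicted_scores, true_ratings),
            key=lambda pair: pair[0],
            reverse=True,
        )
    ]
    # The ideal ranking places the highest true ratings first.
    ideal_dcg = dcg(sorted(true_ratings, reverse=True))
    return dcg(ranked) / ideal_dcg if ideal_dcg > 0 else 0.0


toy_true_ratings = [5.0, 3.0, 1.0]
good = ndcg(toy_true_ratings, predicted_scores=[0.9, 0.5, 0.1])  # true order
bad = ndcg(toy_true_ratings, predicted_scores=[0.1, 0.5, 0.9])   # reversed
```

A ranking that matches the true order reaches the maximum value of 1, while the reversed ranking scores strictly lower because the highest rating lands in the most heavily discounted position.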

In [None]:
class RankingModel(tfrs.Model):

  def __init__(self, loss):
    super().__init__()
    embedding_dimension = 32
    self.loss_function = loss  # Ranking loss supplied by the caller.

    # Compute embeddings for users.
    self.user_embeddings = tf.keras.Sequential([
      tf.keras.layers.StringLookup(
        vocabulary=unique_user_ids),
      tf.keras.layers.Embedding(len(unique_user_ids) + 2, embedding_dimension)
    ])

    # Compute embeddings for movies.
    self.movie_embeddings = tf.keras.Sequential([
      tf.keras.layers.StringLookup(
        vocabulary=unique_movie_titles),
      tf.keras.layers.Embedding(len(unique_movie_titles) + 2, embedding_dimension)
    ])

    # Compute predictions.
    self.score_model = tf.keras.Sequential([
      # Learn multiple dense layers.
      tf.keras.layers.Dense(256, activation="relu"),
      tf.keras.layers.Dense(64, activation="relu"),
      # Make rating predictions in the final layer.
      tf.keras.layers.Dense(1)
    ])

    self.task = tfrs.tasks.Ranking(
      loss=loss,
      metrics=[
        tfr.keras.metrics.NDCGMetric(name="ndcg_metric"),
        tf.keras.metrics.RootMeanSquaredError()
      ]
    )

  def call(self, features):
    # We first convert the id features into embeddings.
    # User embeddings are a [batch_size, embedding_dim] tensor.
    user_embeddings = self.user_embeddings(features["user_id"])

    # Movie embeddings are a [batch_size, num_movies_in_list, embedding_dim]
    # tensor.
    movie_embeddings = self.movie_embeddings(features["movie_title"])

    # We want to concatenate user embeddings with movie embeddings to pass
    # them into the ranking model. To do so, we need to reshape the user
    # embeddings to match the shape of movie embeddings.
    list_length = features["movie_title"].shape[1]
    user_embedding_repeated = tf.repeat(
        tf.expand_dims(user_embeddings, 1), [list_length], axis=1)

    # Once reshaped, we concatenate and pass into the dense layers to generate
    # predictions.
    concatenated_embeddings = tf.concat(
        [user_embedding_repeated, movie_embeddings], 2)

    return self.score_model(concatenated_embeddings)


  def compute_loss(self, features, training=False):
    labels = features.pop("user_rating")

    scores = self(features)

    # Route labels and scores through the ranking task so that the loss is
    # computed and the NDCG and RMSE metrics are updated for `evaluate`.
    return self.task(
        labels=labels,
        predictions=tf.squeeze(scores, axis=-1),
    )

## Training the models

We can now train each of the three models.

In [None]:
epochs = 100

cached_train = train.shuffle(100_000).batch(8192).cache()
cached_test = test.batch(4096).cache()

### Mean squared error model

This model is very similar to the model in [the basic ranking tutorial](basic_ranking). We train the model to minimize the mean squared error between the actual ratings and predicted ratings. Therefore, this loss is computed individually for each movie and the training is pointwise.
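The following pure-Python sketch illustrates why a pointwise objective does not necessarily reward good orderings. The numbers are invented for illustration, and the helper names are hypothetical.

```python
def mean_squared_error(labels, predictions):
    """Average squared difference, computed item by item."""
    return sum((y - p) ** 2 for y, p in zip(labels, predictions)) / len(labels)


def ranking(scores):
    """Item indices sorted from highest to lowest score."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


labels = [5.0, 4.0, 3.0]
close_but_misordered = [4.1, 4.2, 3.0]  # small errors, items 0 and 1 swapped
far_but_ordered = [3.0, 2.0, 1.0]       # large errors, correct order

mse_close = mean_squared_error(labels, close_but_misordered)
mse_far = mean_squared_error(labels, far_but_ordered)
```

Here the first set of predictions has a much lower mean squared error yet swaps the top two items, while the second set is far off in value but reproduces the user's ordering exactly.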

In [None]:
mse_model = RankingModel(tf.keras.losses.MeanSquaredError())
mse_model.compile(optimizer=tf.keras.optimizers.Adagrad(0.1))

In [None]:
mse_model.fit(cached_train, epochs=epochs, verbose=False)

### Pairwise hinge loss model

By minimizing the pairwise hinge loss, the model tries to maximize the difference between the model's predictions for a highly rated item and a low rated item: the bigger that difference is, the lower the model loss. However, once the difference is large enough, the loss becomes zero, stopping the model from further optimizing this particular pair and letting it focus on other pairs that are incorrectly ranked.

This loss is not computed for individual movies, but rather for pairs of movies. Hence the training using this loss is pairwise.
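As a rough sketch of the idea, the hinge penalty over pairs can be written in a few lines of plain Python. The margin of 1 and the plain sum over pairs are assumptions made for this example; `tfr.keras.losses.PairwiseHingeLoss` may normalize or reduce the result differently.

```python
def pairwise_hinge_loss(labels, scores, margin=1.0):
    """Sum of hinge penalties over all pairs where item i is rated above item j."""
    loss = 0.0
    for i, label_i in enumerate(labels):
        for j, label_j in enumerate(labels):
            if label_i > label_j:
                # No penalty once score i beats score j by at least the margin.
                loss += max(0.0, margin - (scores[i] - scores[j]))
    return loss


labels = [5.0, 3.0, 1.0]
separated = pairwise_hinge_loss(labels, scores=[3.0, 1.5, 0.0])   # gaps >= 1
misordered = pairwise_hinge_loss(labels, scores=[0.0, 1.5, 3.0])  # reversed
```

When every correctly ordered pair is separated by at least the margin, the loss is zero and those pairs contribute no gradient; reversed scores are penalized in proportion to how far each pair is from the margin.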

In [None]:
hinge_model = RankingModel(tfr.keras.losses.PairwiseHingeLoss())
hinge_model.compile(optimizer=tf.keras.optimizers.Adagrad(0.1))

In [None]:
hinge_model.fit(cached_train, epochs=epochs, verbose=False)

### Listwise model

The `ListMLE` loss from TensorFlow Ranking expresses list maximum likelihood estimation. To calculate the ListMLE loss, we first use the user ratings to generate an optimal ranking. We then calculate the likelihood of each candidate being out-ranked by any item below it in the optimal ranking using the predicted scores. The model tries to minimize such likelihood to ensure highly rated candidates are not out-ranked by low rated candidates. You can learn more about the details of ListMLE in section 2.2 of the paper [Position-aware ListMLE: A Sequential Learning Process](http://auai.org/uai2014/proceedings/individuals/164.pdf).

Note that since the likelihood is computed with respect to a candidate and all candidates below it in the optimal ranking, the loss is not pairwise but listwise. Hence the training uses list optimization.
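One common way to write ListMLE is as the negative log-likelihood of the label-sorted permutation under a Plackett-Luce model. The pure-Python sketch below follows that formulation; it assumes ties are broken by original order and is not claimed to match the TensorFlow Ranking implementation detail for detail.

```python
import math


def list_mle_loss(labels, scores):
    """Negative log-likelihood of the label-sorted order under Plackett-Luce."""
    # Arrange the scores in the order given by descending labels.
    order = sorted(range(len(labels)), key=lambda i: labels[i], reverse=True)
    sorted_scores = [scores[i] for i in order]

    loss = 0.0
    for position, score in enumerate(sorted_scores):
        # Softmax normalizer over this item and every item below it,
        # computed with the max subtracted for numerical stability.
        tail = sorted_scores[position:]
        max_tail = max(tail)
        log_normalizer = max_tail + math.log(
            sum(math.exp(s - max_tail) for s in tail))
        loss += log_normalizer - score
    return loss


labels = [5.0, 3.0, 1.0]
aligned = list_mle_loss(labels, scores=[2.0, 1.0, 0.0])
reversed_order = list_mle_loss(labels, scores=[0.0, 1.0, 2.0])
```

Each term compares one item against all items ranked below it in the ideal order, which is why the loss depends on the whole list rather than on isolated pairs. Scores that agree with the labels give a lower loss than reversed scores.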

In [None]:
listwise_model = RankingModel(tfr.keras.losses.ListMLELoss())
listwise_model.compile(optimizer=tf.keras.optimizers.Adagrad(0.1))

In [None]:
listwise_model.fit(cached_train, epochs=epochs, verbose=False)

## Comparing the models

In [None]:
mse_model_result = mse_model.evaluate(cached_test, return_dict=True)
print("NDCG of the MSE Model: {:.4f}".format(mse_model_result["ndcg_metric"]))

In [None]:
hinge_model_result = hinge_model.evaluate(cached_test, return_dict=True)
print("NDCG of the pairwise hinge loss model: {:.4f}".format(hinge_model_result["ndcg_metric"]))

In [None]:
listwise_model_result = listwise_model.evaluate(cached_test, return_dict=True)
print("NDCG of the ListMLE model: {:.4f}".format(listwise_model_result["ndcg_metric"]))

Of the three models, the model trained using ListMLE has the highest NDCG metric. This result shows how listwise optimization can be used to train ranking models and can potentially produce models that perform better than models optimized in a pointwise or pairwise fashion.

In [None]:
class NDCGLoss2(tf.keras.losses.Loss):
    def __init__(self, sigma=1.0, reduction=tf.keras.losses.Reduction.AUTO, name=None):
        super().__init__(reduction=reduction, name=name)
        self.sigma = sigma

    def call(self, y_true, y_pred):
        # Cast labels and scores to float32 so they can be combined
        y_true = tf.cast(y_true, tf.float32)
        y_pred = tf.cast(y_pred, tf.float32)

        # Compute pairwise differences in scores
        y_pred_diff = tf.expand_dims(y_pred, 2) - tf.expand_dims(y_pred, 1)

        # Mask for pairs where y_i > y_j
        mask = tf.greater(tf.expand_dims(y_true, 2), tf.expand_dims(y_true, 1))

        # Sigmoid function applied to score differences
        sigmoid_scores = tf.sigmoid(self.sigma * y_pred_diff)

        # Rank difference term
        num_items = tf.shape(y_true)[1]
        rank_diff = tf.abs(tf.range(num_items)[:, None] - tf.range(num_items)[None, :])
        rank_diff = tf.cast(rank_diff, tf.float32)
        epsilon = 1e-10  # A small constant to avoid division by zero
        log_rank_term = tf.abs(1 / (tf.math.log1p(rank_diff) + epsilon) - 1 / (tf.math.log1p(rank_diff + 1) + epsilon))

        # Expand log_rank_term to match the batch size
        log_rank_term = tf.expand_dims(log_rank_term, 0)
        log_rank_term = tf.repeat(log_rank_term, tf.shape(y_true)[0], axis=0)

        # Simplified Hard Assignment Distribution H(pi|s)
        predicted_order = tf.argsort(y_pred, axis=1, direction='DESCENDING')
        true_order = tf.argsort(y_true, axis=1, direction='DESCENDING')
        H_pi_s = tf.cast(tf.equal(predicted_order[:, :, None], true_order[:, None, :]), tf.float32)

        # Compute the inner sum
        inner_sum = H_pi_s * sigmoid_scores**log_rank_term

        # Apply the mask and compute the loss
        masked_inner_sum = tf.boolean_mask(inner_sum, mask)
        loss = -tf.reduce_sum(tf.math.log1p(masked_inner_sum), axis=-1)

        # Check for NaN or Inf in tensors and print if present
        if tf.reduce_any(tf.math.is_nan(loss)) or tf.reduce_any(tf.math.is_inf(loss)):
            tf.print("y_true:", y_true)
            tf.print("y_pred:", y_pred)
            tf.print("y_pred_diff:", y_pred_diff)
            tf.print("mask:", mask)
            tf.print("sigmoid_scores:", sigmoid_scores)
            tf.print("expanded log_rank_term:", log_rank_term)
            tf.print("H_pi_s:", H_pi_s)
            tf.print("inner_sum:", inner_sum)
            tf.print("Loss value:", loss)

        return loss


In [None]:
customModel = RankingModel(NDCGLoss2())
# customModel.compile(optimizer=tf.keras.optimizers.Adam(0.1))

In [None]:
# Use Adagrad with the same learning rate as the models above.
optimizer = tf.keras.optimizers.Adagrad(learning_rate=0.1)

In [None]:
@tf.function
def train_step(model, features, optimizer, clip_norm=1.0):
    # Create a copy of features to avoid modifying the original argument
    features_copy = {key: tf.identity(value) for key, value in features.items()}

    with tf.GradientTape() as tape:
        loss = model.compute_loss(features_copy)

    gradients = tape.gradient(loss, model.trainable_variables)
    clipped_gradients, _ = tf.clip_by_global_norm(gradients, clip_norm)
    optimizer.apply_gradients(zip(clipped_gradients, model.trainable_variables))

    return loss

In [None]:
ndcg_metric = tfr.keras.metrics.NDCGMetric(name="ndcg_metric")

In [None]:
import os
import sys
from contextlib import contextmanager

# Context manager to suppress stdout
@contextmanager
def suppress_stdout():
    with open(os.devnull, 'w') as devnull:
        old_stdout = sys.stdout
        sys.stdout = devnull
        try:
            yield
        finally:
            sys.stdout = old_stdout

In [None]:
for epoch in range(50):
    # Reset the metric at the start of each epoch for training
    ndcg_metric.reset_states()
    total_loss = 0.0

    for batch_features in cached_train:
        loss = train_step(customModel, batch_features, optimizer)
        total_loss += loss

        with suppress_stdout():
            predictions = customModel.predict(batch_features)

        # Drop only the trailing score axis so that a batch of size 1 keeps
        # its [batch_size, list_length] shape.
        predictions = tf.squeeze(predictions, axis=-1)
        ndcg_metric.update_state(batch_features['user_rating'], predictions)

    # Compute average loss over the epoch for training
    average_loss = total_loss / len(cached_train)

    # Get the training NDCG result
    ndcg_train_result = ndcg_metric.result().numpy()

    # Reset the metric for test evaluation
    ndcg_metric.reset_states()

    # Evaluate on test set
    for batch_features in cached_test:
        with suppress_stdout():
            predictions = customModel.predict(batch_features)

        predictions = tf.squeeze(predictions, axis=-1)
        ndcg_metric.update_state(batch_features['user_rating'], predictions)

    # Get the test NDCG result
    ndcg_test_result = ndcg_metric.result().numpy()

    # Print epoch results for both training and test
    print(f"Epoch {epoch}: Average Loss: {average_loss.numpy()}, Training NDCG: {ndcg_train_result}, Test NDCG: {ndcg_test_result}")


In [None]:
# customModel_result = customModel.evaluate(cached_test, return_dict=True)
# print("NDCG of the Custom model: {:.4f}".format(customModel_result["ndcg_metric"]))

