# Pointwise ranking

This notebook shows an example of how to train a deep pointwise ranker in TensorFlow.

We use the MovieLens 100K dataset, but any custom dataset can be used.

In [None]:
# !pip install -q tensorflow-recommenders
# !pip install -q --upgrade tensorflow-datasets
# !pip install -q tensorflow-ranking

In [None]:
import sys
sys.path.append("../")

In [None]:
import pprint

import numpy as np
import tensorflow as tf
import tensorflow_datasets as tfds

import tensorflow_ranking as tfr
import tensorflow_recommenders as tfrs

from utils.feature_extraction import FeatureExtractionTower
from utils.models import RankingModel
from utils.preprocessing import sample_listwise

import logging
tf.get_logger().setLevel(logging.ERROR)

## Data loading


In [None]:
ratings = tfds.load("movielens/100k-ratings", split="train")
movies = tfds.load("movielens/100k-movies", split="train")

ratings = ratings.map(lambda x: {
    "movie_title": x["movie_title"],
    "user_id": x["user_id"],
    "user_rating": x["user_rating"],
    "user_occupation_text": x["user_occupation_text"]
})

## Data preprocessing

We transform the dataset so that each example contains the user's features and a list of items rated by that user, together with the ratings. Some items in the list are rated higher than others; the goal of our model is to make predictions that match this ordering.

This transformation can be applied to any custom dataset that has user features, item features, and a label column.
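To make the transformation concrete, here is a minimal pure-Python sketch of listwise sampling. It is not the `sample_listwise` helper from `utils.preprocessing`; the function name `sample_lists_sketch`, the `(user_id, item, rating)` tuple input, and the choice to skip users with too few ratings are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def sample_lists_sketch(ratings, num_list_per_user, num_examples_per_list, seed=42):
    """Hypothetical illustration of listwise sampling (not utils.preprocessing.sample_listwise)."""
    rng = random.Random(seed)

    # Group (item, rating) pairs by user.
    by_user = defaultdict(list)
    for user_id, item, rating in ratings:
        by_user[user_id].append((item, rating))

    lists = []
    for user_id, rated in by_user.items():
        # Assumption: users who rated fewer items than a list needs are skipped.
        if len(rated) < num_examples_per_list:
            continue
        for _ in range(num_list_per_user):
            # Draw items without replacement, keeping each rating aligned with its item.
            sampled = rng.sample(rated, num_examples_per_list)
            lists.append({
                "user_id": user_id,
                "movie_title": [item for item, _ in sampled],
                "user_rating": [rating for _, rating in sampled],
            })
    return lists


# Toy usage: u1 rated three movies, u2 only one.
toy_ratings = [("u1", "A", 5.0), ("u1", "B", 3.0), ("u1", "C", 4.0), ("u2", "A", 2.0)]
lists = sample_lists_sketch(toy_ratings, num_list_per_user=2, num_examples_per_list=2)
```

With these toy inputs, u1 yields two lists of two movies each, while u2 has too few ratings to fill a list and is dropped.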

In [None]:
user_features = ["user_id", "user_occupation_text"]
item_features = ["movie_title"]
label_name = "user_rating"
num_list_per_user = 50
num_examples_per_list = 5
seed = 42

In [None]:
tf.random.set_seed(seed)

# Split into train (80%) and test (20%) sets.
shuffled = ratings.shuffle(100_000, seed=seed, reshuffle_each_iteration=False)

train = shuffled.take(80_000)
test = shuffled.skip(80_000).take(20_000)

# Sample num_list_per_user (50) lists for each user. Each list holds
# num_examples_per_list (5) movies drawn from the movies the user rated.
train = sample_listwise(
    train,
    user_features=user_features,
    item_features=item_features,
    label_name=label_name,
    num_list_per_user=num_list_per_user,
    num_examples_per_list=num_examples_per_list,
    seed=seed,
)
test = sample_listwise(
    test,
    user_features=user_features,
    item_features=item_features,
    label_name=label_name,
    num_list_per_user=num_list_per_user,
    num_examples_per_list=num_examples_per_list,
    seed=seed,
)

We can inspect an example from the training data. The example includes the user id and occupation, a list of 5 movie titles, and the user's ratings for those movies.

In [None]:
for example in train.take(1):
    pprint.pprint(example)

## Model definition

We build a user tower (from `user_id` and `user_occupation_text`) and a movie tower (from `movie_title`) with `FeatureExtractionTower`, and feed both to a ranking model.
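The internals of `RankingModel` live in `utils.models` and are not shown here. As a rough mental model only, the NumPy sketch below assumes one common design: tile the user embedding across the list, concatenate it with each movie embedding, and pass the result through a small dense network to get one score per (user, movie) pair. The function `pointwise_scores`, the layer sizes, and the random weights are all hypothetical.

```python
import numpy as np


def pointwise_scores(user_emb, item_emb, w1, b1, w2, b2):
    """Hypothetical two-tower scorer: one score per (user, item) pair."""
    list_size = item_emb.shape[1]
    # Repeat each user vector once per item in that user's list.
    user_tiled = np.repeat(user_emb[:, None, :], list_size, axis=1)  # (batch, list, dim)
    joint = np.concatenate([user_tiled, item_emb], axis=-1)          # (batch, list, 2*dim)
    hidden = np.maximum(joint @ w1 + b1, 0.0)                        # ReLU hidden layer
    return (hidden @ w2 + b2)[..., 0]                                # (batch, list)


rng = np.random.default_rng(0)
batch, list_size, dim, hidden_units = 2, 5, 8, 16

# Random stand-ins for the tower outputs and the scorer weights (learned in practice).
user_emb = rng.normal(size=(batch, dim))
item_emb = rng.normal(size=(batch, list_size, dim))
w1, b1 = rng.normal(size=(2 * dim, hidden_units)), np.zeros(hidden_units)
w2, b2 = rng.normal(size=(hidden_units, 1)), np.zeros(1)

scores = pointwise_scores(user_emb, item_emb, w1, b1, w2, b2)
```

Because each score depends only on the user and one movie, reordering the movies in a list just reorders the scores; the model never compares items within a list directly.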

We train the model to minimize the mean squared error between the actual and predicted ratings. Because this loss is computed independently for each movie, the training is pointwise.
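A small NumPy worked example of the pointwise MSE, using made-up ratings and predictions for two lists of three movies. It shows that every (user, movie) pair contributes its own squared error, so the loss does not depend on the order of movies within a list.

```python
import numpy as np

# Made-up ratings and predictions: 2 lists x 3 movies.
y_true = np.array([[5.0, 3.0, 4.0],
                   [2.0, 1.0, 4.0]])
y_pred = np.array([[4.5, 3.5, 4.0],
                   [2.0, 2.0, 3.0]])

# Pointwise: one squared error per (user, movie) pair.
per_item = (y_true - y_pred) ** 2   # [[0.25, 0.25, 0.0], [0.0, 1.0, 1.0]]

# The loss is the mean over all pairs.
mse = per_item.mean()               # 2.5 / 6, about 0.4167
```

For equal-length lists without masking, `tf.keras.losses.MeanSquaredError` with its default reduction (mean over the last axis, then over the batch) should give the same number, although how `RankingModel` reshapes its inputs before calling the loss is not shown here.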


In [None]:
user_tower = FeatureExtractionTower(ratings, cats_to_embedding=["user_id", "user_occupation_text"])

In [None]:
movie_tower = FeatureExtractionTower(ratings, cats_to_embedding=["movie_title"])

In [None]:
# Sanity check: embed a single movie title with the movie tower.
movie_tower({"movie_title": tf.constant(["Speed (1994)"])})

## Training the model

In [None]:
epochs = 30

cached_train = train.shuffle(100_000).batch(8192).cache()
cached_test = test.batch(4096).cache()

In [None]:
mse_model = RankingModel(user_tower, movie_tower, tf.keras.losses.MeanSquaredError(), num_examples_per_list)
mse_model.compile(optimizer=tf.keras.optimizers.Adagrad(0.1))

In [None]:
mse_model.fit(cached_train, epochs=epochs, verbose=False)

In [None]:
mse_model_result = mse_model.evaluate(cached_test, return_dict=True)
print("NDCG of the MSE Model: {:.4f}".format(mse_model_result["ndcg_metric"]))
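Although the model is trained pointwise, it is evaluated with NDCG, a ranking metric over each list. The NumPy sketch below assumes the common definition: gain 2^rel - 1, discount 1 / log2(rank + 1), normalized by the DCG of the ideal ordering. We believe these match the defaults of TF-Ranking's `NDCGMetric`, but `utils.models.RankingModel` may configure its metric differently. The helpers `dcg` and `ndcg` are hypothetical names for illustration.

```python
import numpy as np


def dcg(relevance_in_ranked_order):
    """Discounted cumulative gain of relevance labels listed in ranked order."""
    rel = np.asarray(relevance_in_ranked_order, dtype=float)
    ranks = np.arange(1, rel.size + 1)
    # Gain 2^rel - 1, discounted by log2(rank + 1).
    return float(np.sum((2.0 ** rel - 1.0) / np.log2(ranks + 1)))


def ndcg(y_true, y_score):
    """NDCG of one list: DCG of the predicted order divided by the ideal DCG."""
    y_true = np.asarray(y_true, dtype=float)
    order = np.argsort(-np.asarray(y_score, dtype=float), kind="stable")
    best = dcg(np.sort(y_true)[::-1])
    return dcg(y_true[order]) / best if best > 0 else 0.0


ratings = [5.0, 3.0, 4.0]
perfect = ndcg(ratings, [0.9, 0.1, 0.5])   # scores order the movies 5, 4, 3
worse = ndcg(ratings, [0.1, 0.9, 0.5])     # scores order the movies 3, 4, 5
```

Only the relative order of the predicted scores matters for NDCG, not their absolute values, which is why a model trained on MSE can still be compared with ranking losses on this metric. The reported test NDCG is then, roughly, the average of these per-list values.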