<a href="https://colab.research.google.com/github/DhanashriPetkar/A-transformer-based-recommendation-system/blob/main/Final_movielens_recommendations_transformers.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# A Transformer-based recommendation system


## Introduction
The Behavior Sequence Transformer (BST) model predicts movie ratings using the Movielens dataset. It uses sequences of movies a user has watched, the ratings they've given, and user and movie features.

Inputs to the model are:

1. Sequence of watched movie IDs.
2. Sequence of ratings for these movies.
3. User features: ID, sex, occupation, age group.
4. Genres for each movie and the target movie.
5. Target movie ID to predict the rating.

This example differs from the original BST model in two ways:

1. Incorporate movie genres directly into the movie embeddings.
2. Use the movie ratings, together with the movies' positions in the sequence, to update the sequence embeddings before self-attention.

Requires TensorFlow 2.4+.

## The dataset

We use the Movielens 1M dataset, which has about 1 million ratings from 6000 users on 4000 movies. It includes user features, movie genres, and timestamps for each rating. The timestamps help create sequences of movie ratings for each user, needed for the BST model.

## Setup

In [31]:
# Import the os module to interact with the operating system
import os

# Set the Keras backend to TensorFlow
os.environ["KERAS_BACKEND"] = "tensorflow"

# Import the math module for mathematical functions
import math

# Import the ZipFile class from the zipfile module to handle zip files
from zipfile import ZipFile

# Import the urlretrieve function from urllib.request to download files from a URL
from urllib.request import urlretrieve

# Import the keras module for building neural networks
import keras

# Import numpy for numerical operations on arrays
import numpy as np

# Import pandas for data manipulation and analysis
import pandas as pd

# Import tensorflow for machine learning and deep learning tasks
import tensorflow as tf

# Import layers from keras for building neural network layers
from keras import layers

# Import StringLookup from keras.layers for converting strings to integer indices
from keras.layers import StringLookup

## Prepare the data

### Download and prepare the DataFrames

First, download the Movielens data.

The extracted `ml-1m` folder contains three data files: `users.dat`, `movies.dat`, and `ratings.dat`.

In [32]:
# Download the MovieLens dataset zip file from the specified URL and save it as "movielens.zip"
urlretrieve("http://files.grouplens.org/datasets/movielens/ml-1m.zip", "movielens.zip")

# Extract all the contents of the "movielens.zip" file into the current directory
ZipFile("movielens.zip", "r").extractall()


Then, we load the data into pandas DataFrames with their proper column names.

In [33]:
# Read the users data from the "users.dat" file into a pandas DataFrame
users = pd.read_csv(
    "ml-1m/users.dat",
    sep="::",  # Specify the separator used in the file
    names=["user_id", "sex", "age_group", "occupation", "zip_code"],  # Define column names
    encoding="ISO-8859-1",  # Specify the encoding of the file
    engine="python",  # Specify the parsing engine
)

# Read the ratings data from the "ratings.dat" file into a pandas DataFrame
ratings = pd.read_csv(
    "ml-1m/ratings.dat",
    sep="::",  # Specify the separator used in the file
    names=["user_id", "movie_id", "rating", "unix_timestamp"],  # Define column names
    encoding="ISO-8859-1",  # Specify the encoding of the file
    engine="python",  # Specify the parsing engine
)

# Read the movies data from the "movies.dat" file into a pandas DataFrame
movies = pd.read_csv(
    "ml-1m/movies.dat",
    sep="::",  # Specify the separator used in the file
    names=["movie_id", "title", "genres"],  # Define column names
    encoding="ISO-8859-1",  # Specify the encoding of the file
    engine="python",  # Specify the parsing engine
)
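To make the file layout concrete, here is a minimal sketch of how a single `::`-delimited record maps onto the column names used for `users.dat`. The sample line is an illustrative value, not something read from the file. The two-character separator is also why `engine="python"` is passed above: pandas' default C parser only handles single-character separators.

```python
# Illustrative record in the "::"-delimited layout of users.dat (assumed sample values).
sample_line = "1::F::1::10::48067"
column_names = ["user_id", "sex", "age_group", "occupation", "zip_code"]

# Split on the two-character separator and pair each field with its column name.
record = dict(zip(column_names, sample_line.split("::")))
print(record)
```

All fields come out as strings here; `pd.read_csv` additionally infers numeric types, which is why the next cell rewrites the ID columns as prefixed strings.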

Here, we do some simple data processing to fix the data types of the columns.

In [34]:
# Update the user_id column in the users DataFrame to prepend "user_" to each user_id
users["user_id"] = users["user_id"].apply(lambda x: f"user_{x}")

# Update the age_group column in the users DataFrame to prepend "group_" to each age_group
users["age_group"] = users["age_group"].apply(lambda x: f"group_{x}")

# Update the occupation column in the users DataFrame to prepend "occupation_" to each occupation
users["occupation"] = users["occupation"].apply(lambda x: f"occupation_{x}")

# Update the movie_id column in the movies DataFrame to prepend "movie_" to each movie_id
movies["movie_id"] = movies["movie_id"].apply(lambda x: f"movie_{x}")

# Update the movie_id column in the ratings DataFrame to prepend "movie_" to each movie_id
ratings["movie_id"] = ratings["movie_id"].apply(lambda x: f"movie_{x}")

# Update the user_id column in the ratings DataFrame to prepend "user_" to each user_id
ratings["user_id"] = ratings["user_id"].apply(lambda x: f"user_{x}")

# Convert the rating column in the ratings DataFrame to float
ratings["rating"] = ratings["rating"].astype(float)

Each movie has multiple genres. We split them into separate columns in the `movies`
DataFrame.

In [35]:
# List of genres
genres = ["Action", "Adventure", "Animation", "Children's", "Comedy", "Crime",
          "Documentary", "Drama", "Fantasy", "Film-Noir", "Horror", "Musical",
          "Mystery", "Romance", "Sci-Fi", "Thriller", "War", "Western"]

# Add one binary (0/1) indicator column per genre to the movies DataFrame
for genre in genres:
    # Flag whether this genre appears in the movie's "|"-separated genre string
    movies[genre] = movies["genres"].apply(
        lambda values: int(genre in values.split("|"))  # Convert genre presence into 0 or 1
    )
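As a small illustration of the multi-hot encoding built above, the sketch below encodes one movie against a shortened genre list. The names `toy_genres` and `toy_movie_genres` are hypothetical and exist only for this example.

```python
# Hypothetical genre vocabulary and movie, used only for illustration.
toy_genres = ["Action", "Comedy", "Drama", "Sci-Fi"]
toy_movie_genres = "Action|Sci-Fi"

# One 0/1 flag per vocabulary entry, in vocabulary order.
present = set(toy_movie_genres.split("|"))
multi_hot = [int(genre in present) for genre in toy_genres]
print(multi_hot)  # [1, 0, 0, 1]
```

Splitting on `|` before testing membership matters: a plain substring check would also match genre names that happen to be contained in other genre names.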

### Transform the movie ratings data into sequences

First, sort the ratings data by `unix_timestamp`. Then, group the `movie_id` and `rating` values by `user_id`.

The output DataFrame will have one row per `user_id` with two chronologically ordered lists: the movies the user rated and the ratings they gave.

In [36]:
# Sort the ratings DataFrame by "unix_timestamp" and then group by "user_id"
ratings_group = ratings.sort_values(by=["unix_timestamp"]).groupby("user_id")

# Create a new DataFrame ratings_data from the grouped data
ratings_data = pd.DataFrame(
    data={
        "user_id": list(ratings_group.groups.keys()),  # List of user_ids
        "movie_ids": list(ratings_group.movie_id.apply(list)),  # List of lists of movie_ids per user
        "ratings": list(ratings_group.rating.apply(list)),  # List of lists of ratings per user
        "timestamps": list(ratings_group.unix_timestamp.apply(list)),  # List of lists of timestamps per user
    }
)
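The sort-then-group step can be sketched in plain Python on a handful of rows. The rows below are hypothetical; the point is only that sorting by timestamp before grouping is what makes each per-user list chronological.

```python
from collections import defaultdict

# Hypothetical (user_id, movie_id, rating, unix_timestamp) rows, deliberately out of order.
toy_ratings = [
    ("user_1", "movie_10", 4.0, 300),
    ("user_2", "movie_7", 3.0, 150),
    ("user_1", "movie_3", 5.0, 100),
    ("user_1", "movie_8", 2.0, 200),
]

# Sort by timestamp first so that each per-user list comes out in chronological order.
movie_ids_by_user = defaultdict(list)
ratings_by_user = defaultdict(list)
for user_id, movie_id, rating, _ in sorted(toy_ratings, key=lambda row: row[3]):
    movie_ids_by_user[user_id].append(movie_id)
    ratings_by_user[user_id].append(rating)

print(dict(movie_ids_by_user))
print(dict(ratings_by_user))
```

The two dictionaries stay aligned by position, so the i-th rating of a user always belongs to the i-th movie in that user's list.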

Split the `movie_ids` and `ratings` lists into sequences of a fixed length. Adjust the `sequence_length` variable to change the input length, and `step_size` to control the number of sequences per user.

In [37]:
# Define the sequence length and step size
sequence_length = 4
step_size = 2

# Function to create sequences of a specified window size and step size
def create_sequences(values, window_size, step_size):
    sequences = []
    start_index = 0
    while True:
        end_index = start_index + window_size
        seq = values[start_index:end_index]

        # If the remaining slice is shorter than the window, use the last
        # `window_size` items instead (skipped when there are too few items)
        if len(seq) < window_size:
            seq = values[-window_size:]
            if len(seq) == window_size:
                sequences.append(seq)
            break

        # Append the sequence to the list of sequences
        sequences.append(seq)

        # Move the start index forward by the step size
        start_index += step_size

    return sequences

# Apply create_sequences to the movie_ids column in ratings_data
ratings_data.movie_ids = ratings_data.movie_ids.apply(
    lambda ids: create_sequences(ids, sequence_length, step_size)
)

# Apply create_sequences to the ratings column in ratings_data
ratings_data.ratings = ratings_data.ratings.apply(
    lambda ids: create_sequences(ids, sequence_length, step_size)
)

# Remove the "timestamps" column from ratings_data
del ratings_data["timestamps"]
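To see what the windowing produces, the sketch below restates the same logic under a hypothetical name, `sliding_windows`, and runs it on short integer lists instead of movie IDs.

```python
def sliding_windows(values, window_size, step_size):
    """Re-statement of the windowing logic above under a hypothetical name."""
    windows = []
    start = 0
    while True:
        window = values[start:start + window_size]
        if len(window) < window_size:
            # Tail is too short: fall back to the last `window_size` items, if there are enough.
            tail = values[-window_size:]
            if len(tail) == window_size:
                windows.append(tail)
            break
        windows.append(window)
        start += step_size
    return windows


print(sliding_windows(list(range(7)), window_size=4, step_size=2))
# [[0, 1, 2, 3], [2, 3, 4, 5], [3, 4, 5, 6]]
print(sliding_windows(list(range(3)), window_size=4, step_size=2))
# []
```

A list of seven items yields three windows, the last one anchored to the end of the list. A list shorter than the window yields nothing, so such users contribute no sequences. One detail worth knowing: when a full window already ends exactly on the last item (for example, six items with a window of 4 and a step of 2), this logic emits that final window a second time.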

After that, we process the output so that each sequence becomes a separate record in
the DataFrame. In addition, we join the user features with the ratings data.

In [38]:
# Explode the "movie_ids" column in ratings_data and create a new DataFrame ratings_data_movies
ratings_data_movies = ratings_data[["user_id", "movie_ids"]].explode(
    "movie_ids", ignore_index=True
)

# Explode the "ratings" column in ratings_data and create a new DataFrame ratings_data_rating
ratings_data_rating = ratings_data[["ratings"]].explode("ratings", ignore_index=True)

# Concatenate ratings_data_movies and ratings_data_rating along axis 1 to combine them
ratings_data_transformed = pd.concat([ratings_data_movies, ratings_data_rating], axis=1)

# Join the users DataFrame with ratings_data_transformed on "user_id"
ratings_data_transformed = ratings_data_transformed.join(
    users.set_index("user_id"), on="user_id"
)

# Convert the "movie_ids" column to a comma-separated string of movie IDs
ratings_data_transformed.movie_ids = ratings_data_transformed.movie_ids.apply(
    lambda x: ",".join(x)
)

# Convert the "ratings" column to a comma-separated string of ratings
ratings_data_transformed.ratings = ratings_data_transformed.ratings.apply(
    lambda x: ",".join([str(v) for v in x])
)

# Remove the "zip_code" column from ratings_data_transformed
del ratings_data_transformed["zip_code"]

# Rename the columns "movie_ids" to "sequence_movie_ids" and "ratings" to "sequence_ratings"
ratings_data_transformed.rename(
    columns={"movie_ids": "sequence_movie_ids", "ratings": "sequence_ratings"},
    inplace=True,
)

With `sequence_length` of 4 and `step_size` of 2, we end up with 498,623 sequences.
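The flattening step can be pictured with a plain-Python sketch: every window of every user becomes its own record, and each window is serialized as a comma-separated string. The rows and the name `flat_records` are hypothetical.

```python
# Hypothetical per-user windows, as produced by the sequence-splitting step.
toy_rows = [
    {
        "user_id": "user_1",
        "movie_ids": [["m1", "m2"], ["m2", "m3"]],
        "ratings": [[4.0, 5.0], [5.0, 3.0]],
    },
    {"user_id": "user_2", "movie_ids": [["m9", "m4"]], "ratings": [[2.0, 4.0]]},
]

# One output record per window; movie and rating windows stay aligned by position.
flat_records = [
    {
        "user_id": row["user_id"],
        "sequence_movie_ids": ",".join(movie_window),
        "sequence_ratings": ",".join(str(r) for r in rating_window),
    }
    for row in toy_rows
    for movie_window, rating_window in zip(row["movie_ids"], row["ratings"])
]
for record in flat_records:
    print(record)
```

Because the movie and rating columns are exploded separately in the notebook and then concatenated side by side, the approach relies on both columns holding the same number of windows per user, which the shared windowing parameters guarantee.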

Finally, we split the data into training and testing splits, with 85% and 15% of
the instances, respectively, and store them to CSV files.

In [39]:
# Create a boolean mask for random selection of rows
random_selection = np.random.rand(len(ratings_data_transformed.index)) <= 0.85

# Select rows for the train_data DataFrame based on the random_selection mask
train_data = ratings_data_transformed[random_selection]

# Select rows for the test_data DataFrame based on the inverse of random_selection mask
test_data = ratings_data_transformed[~random_selection]

# Save train_data to a CSV file without index, using "|" as separator and without header
train_data.to_csv("train_data.csv", index=False, sep="|", header=False)

# Save test_data to a CSV file without index, using "|" as separator and without header
test_data.to_csv("test_data.csv", index=False, sep="|", header=False)
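The mask-based split assigns each row to the training set independently with probability 0.85, so the resulting proportions are approximate and change from run to run. A minimal sketch of a repeatable variant, assuming a seeded NumPy generator (the seed value and row count are arbitrary choices for illustration):

```python
import numpy as np

# A seeded generator makes the split repeatable; the seed value is an arbitrary assumption.
rng = np.random.default_rng(seed=42)
num_rows = 10_000  # hypothetical dataset size

# Each row lands in the training set independently with probability 0.85.
train_mask = rng.random(num_rows) <= 0.85
train_indices = np.flatnonzero(train_mask)
test_indices = np.flatnonzero(~train_mask)

print(len(train_indices), len(test_indices))
```

The two index sets are disjoint and together cover every row, which mirrors how `random_selection` and `~random_selection` partition the DataFrame above.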

## Define metadata

In [40]:
# Define CSV_HEADER as a list of column names from ratings_data_transformed
CSV_HEADER = list(ratings_data_transformed.columns)

# Define CATEGORICAL_FEATURES_WITH_VOCABULARY as a dictionary of categorical features and their unique values
CATEGORICAL_FEATURES_WITH_VOCABULARY = {
    "user_id": list(users.user_id.unique()),        # List of unique user IDs
    "movie_id": list(movies.movie_id.unique()),     # List of unique movie IDs
    "sex": list(users.sex.unique()),                # List of unique values from the 'sex' column in users
    "age_group": list(users.age_group.unique()),    # List of unique age groups from the 'age_group' column in users
    "occupation": list(users.occupation.unique()),  # List of unique occupations from the 'occupation' column in users
}

# Define USER_FEATURES as a list of categorical features related to users
USER_FEATURES = ["sex", "age_group", "occupation"]

# Define MOVIE_FEATURES as a list of categorical features related to movies
MOVIE_FEATURES = ["genres"]

## Create `tf.data.Dataset` for training and evaluation

In [41]:
# Define the function get_dataset_from_csv to create a TensorFlow dataset from a CSV file
def get_dataset_from_csv(csv_file_path, shuffle=False, batch_size=128):
    # Define the process function to process features and target from CSV data
    def process(features):
        # Split sequence_movie_ids string into a tensor
        movie_ids_string = features["sequence_movie_ids"]
        sequence_movie_ids = tf.strings.split(movie_ids_string, ",").to_tensor()

        # The last movie id in the sequence is the target movie.
        features["target_movie_id"] = sequence_movie_ids[:, -1]
        features["sequence_movie_ids"] = sequence_movie_ids[:, :-1]

        # Split sequence_ratings string into a tensor and convert to float32
        ratings_string = features["sequence_ratings"]
        sequence_ratings = tf.strings.to_number(
            tf.strings.split(ratings_string, ","), tf.dtypes.float32
        ).to_tensor()

        # The last rating in the sequence is the target for the model to predict.
        target = sequence_ratings[:, -1]
        features["sequence_ratings"] = sequence_ratings[:, :-1]

        return features, target

    # Create a TensorFlow dataset from the CSV file using make_csv_dataset
    dataset = tf.data.experimental.make_csv_dataset(
        csv_file_path,
        batch_size=batch_size,
        column_names=CSV_HEADER,
        num_epochs=1,
        header=False,
        field_delim="|",
        shuffle=shuffle,
    ).map(process)  # Apply the process function to each batch in the dataset

    return dataset
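What `process` does to each record can be sketched without TensorFlow. The field values below are hypothetical; the real function performs the same slicing on batched tensors.

```python
# Hypothetical CSV field values for one record with sequence_length = 4.
sequence_movie_ids = "movie_3186,movie_1270,movie_1721,movie_1022"
sequence_ratings = "4.0,5.0,4.0,5.0"

movie_ids = sequence_movie_ids.split(",")
ratings = [float(r) for r in sequence_ratings.split(",")]

# The final position becomes the prediction target; everything before it is model input.
features = {
    "sequence_movie_ids": movie_ids[:-1],
    "target_movie_id": movie_ids[-1],
    "sequence_ratings": ratings[:-1],
}
target = ratings[-1]
print(features, target)
```

With a `sequence_length` of 4, the model therefore sees three past movies with their ratings, plus the ID of a fourth movie whose rating it has to predict.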

## Create model inputs

In [42]:
# Define a function create_model_inputs to create input layers for the model
def create_model_inputs():
    return {
        # User-related inputs
        "user_id": keras.Input(name="user_id", shape=(1,), dtype="string"),
        "sex": keras.Input(name="sex", shape=(1,), dtype="string"),
        "age_group": keras.Input(name="age_group", shape=(1,), dtype="string"),
        "occupation": keras.Input(name="occupation", shape=(1,), dtype="string"),

        # Sequence inputs
        "sequence_movie_ids": keras.Input(
            name="sequence_movie_ids", shape=(sequence_length - 1,), dtype="string"
        ),
        "target_movie_id": keras.Input(
            name="target_movie_id", shape=(1,), dtype="string"
        ),
        "sequence_ratings": keras.Input(
            name="sequence_ratings", shape=(sequence_length - 1,), dtype=tf.float32
        ),
    }

## Encode input features

The `encode_input_features` function works as follows:

1. Each categorical user feature is encoded using `layers.Embedding`, with an embedding
dimension equal to the square root of the feature's vocabulary size (rounded down to an integer).
The embeddings of these features are concatenated to form a single input tensor.

2. Each movie in the movie sequence, as well as the target movie, is encoded using `layers.Embedding`,
where the embedding dimension is the square root of the number of movies.

3. A multi-hot genres vector for each movie is concatenated with its embedding vector,
and processed using a non-linear `layers.Dense` to output a vector with the same dimension
as the movie embedding.

4. A positional embedding is added to each movie embedding in the sequence, and the sum is then
multiplied by the movie's rating from the ratings sequence.

5. The target movie embedding is concatenated to the sequence movie embeddings, producing
a tensor with the shape of `[batch size, sequence length, embedding size]`, as expected
by the attention layer for the transformer architecture.

6. The function returns a tuple of two elements: `encoded_transformer_features` and
`encoded_other_features`.
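The tensor shapes in steps 2, 4, and 5 can be traced with a NumPy sketch. Random arrays stand in for the learned embeddings, and the vocabulary size of 3883 movies is an assumed value used only to show the square-root rule; the notebook derives the real size from the data.

```python
import math

import numpy as np

# Assumed sizes for illustration; the notebook derives them from the data.
num_movies = 3883  # assumed movie vocabulary size
sequence_length = 4
batch_size = 2
embedding_dims = int(math.sqrt(num_movies))  # square-root rule, rounded down -> 62

rng = np.random.default_rng(0)
# Random stand-ins for learned embeddings.
sequence_movies = rng.normal(size=(batch_size, sequence_length - 1, embedding_dims))
target_movie = rng.normal(size=(batch_size, 1, embedding_dims))
positions = rng.normal(size=(sequence_length - 1, embedding_dims))
ratings = rng.uniform(1.0, 5.0, size=(batch_size, sequence_length - 1))

# Step 4: add the positional embedding (broadcast over the batch), then scale by the rating.
weighted = (sequence_movies + positions) * ratings[..., np.newaxis]

# Step 5: append the target movie along the sequence axis.
transformer_input = np.concatenate([weighted, target_movie], axis=1)
print(transformer_input.shape)  # (2, 4, 62)
```

Note that only the three history positions are scaled by a rating; the target movie is appended unscaled, since its rating is the value being predicted.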

In [43]:
# Define a function encode_input_features to encode input features for the model
def encode_input_features(
    inputs,
    include_user_id=True,
    include_user_features=True,
    include_movie_features=True,
):
    encoded_transformer_features = []
    encoded_other_features = []

    other_feature_names = []
    if include_user_id:
        other_feature_names.append("user_id")
    if include_user_features:
        other_feature_names.extend(USER_FEATURES)

    ## Encode user features
    for feature_name in other_feature_names:
        # Convert the string input values into integer indices.
        vocabulary = CATEGORICAL_FEATURES_WITH_VOCABULARY[feature_name]
        idx = StringLookup(vocabulary=vocabulary, mask_token=None, num_oov_indices=0)(
            inputs[feature_name]
        )
        # Compute embedding dimensions
        embedding_dims = int(math.sqrt(len(vocabulary)))
        # Create an embedding layer with the specified dimensions.
        embedding_encoder = layers.Embedding(
            input_dim=len(vocabulary),
            output_dim=embedding_dims,
            name=f"{feature_name}_embedding",
        )
        # Convert the index values to embedding representations.
        encoded_other_features.append(embedding_encoder(idx))

    ## Create a single embedding vector for the user features
    if len(encoded_other_features) > 1:
        encoded_other_features = layers.concatenate(encoded_other_features)
    elif len(encoded_other_features) == 1:
        encoded_other_features = encoded_other_features[0]
    else:
        encoded_other_features = None

    ## Create a movie embedding encoder
    movie_vocabulary = CATEGORICAL_FEATURES_WITH_VOCABULARY["movie_id"]
    movie_embedding_dims = int(math.sqrt(len(movie_vocabulary)))
    # Create a lookup to convert string values to integer indices.
    movie_index_lookup = StringLookup(
        vocabulary=movie_vocabulary,
        mask_token=None,
        num_oov_indices=0,
        name="movie_index_lookup",
    )
    # Create an embedding layer with the specified dimensions.
    movie_embedding_encoder = layers.Embedding(
        input_dim=len(movie_vocabulary),
        output_dim=movie_embedding_dims,
        name="movie_embedding",
    )
    # Create a vector lookup for movie genres.
    genre_vectors = movies[genres].to_numpy()
    movie_genres_lookup = layers.Embedding(
        input_dim=genre_vectors.shape[0],
        output_dim=genre_vectors.shape[1],
        embeddings_initializer=keras.initializers.Constant(genre_vectors),
        trainable=False,
        name="genres_vector",
    )
    # Create a processing layer for genres.
    movie_embedding_processor = layers.Dense(
        units=movie_embedding_dims,
        activation="relu",
        name="process_movie_embedding_with_genres",
    )

    ## Define a function to encode a given movie id.
    def encode_movie(movie_id):
        # Convert the string input values into integer indices.
        movie_idx = movie_index_lookup(movie_id)
        movie_embedding = movie_embedding_encoder(movie_idx)
        encoded_movie = movie_embedding
        if include_movie_features:
            movie_genres_vector = movie_genres_lookup(movie_idx)
            encoded_movie = movie_embedding_processor(
                layers.concatenate([movie_embedding, movie_genres_vector])
            )
        return encoded_movie

    ## Encoding target_movie_id
    target_movie_id = inputs["target_movie_id"]
    encoded_target_movie = encode_movie(target_movie_id)

    ## Encoding sequence_movie_ids
    sequence_movie_ids = inputs["sequence_movie_ids"]
    encoded_sequence_movies = encode_movie(sequence_movie_ids)

    ## Create positional embedding
    position_embedding_encoder = layers.Embedding(
        input_dim=sequence_length,
        output_dim=movie_embedding_dims,
        name="position_embedding",
    )
    positions = tf.range(start=0, limit=sequence_length - 1, delta=1)
    encoded_positions = position_embedding_encoder(positions)

    ## Retrieve sequence ratings to incorporate them into the movie encoding
    sequence_ratings = inputs["sequence_ratings"]
    sequence_ratings = tf.expand_dims(sequence_ratings, -1)

    ## Add positional encoding to movie encodings and multiply them by rating
    encoded_sequence_movies_with_position_and_rating = layers.Multiply()(
        [(encoded_sequence_movies + encoded_positions), sequence_ratings]
    )

    # Construct the transformer inputs
    for i in range(sequence_length - 1):
        feature = encoded_sequence_movies_with_position_and_rating[:, i, ...]
        feature = tf.expand_dims(feature, 1)
        encoded_transformer_features.append(feature)
    encoded_transformer_features.append(encoded_target_movie)

    encoded_transformer_features = layers.concatenate(
        encoded_transformer_features, axis=1
    )

    return encoded_transformer_features, encoded_other_features

## Create a BST model

In [44]:
# Define global variables
include_user_id = False
include_user_features = False
include_movie_features = False

hidden_units = [256, 128]
dropout_rate = 0.1
num_heads = 3

# Function to create the model
def create_model():
    inputs = create_model_inputs()
    transformer_features, other_features = encode_input_features(
        inputs, include_user_id, include_user_features, include_movie_features
    )

    # Create a multi-headed attention layer
    attention_output = layers.MultiHeadAttention(
        num_heads=num_heads, key_dim=transformer_features.shape[2], dropout=dropout_rate
    )(transformer_features, transformer_features)

    # Transformer block
    attention_output = layers.Dropout(dropout_rate)(attention_output)
    x1 = layers.Add()([transformer_features, attention_output])
    x1 = layers.LayerNormalization()(x1)
    x2 = layers.LeakyReLU()(x1)
    x2 = layers.Dense(units=x2.shape[-1])(x2)
    x2 = layers.Dropout(dropout_rate)(x2)
    transformer_features = layers.Add()([x1, x2])
    transformer_features = layers.LayerNormalization()(transformer_features)
    features = layers.Flatten()(transformer_features)

    # Include the other features
    if other_features is not None:
        features = layers.concatenate(
            [features, layers.Reshape([other_features.shape[-1]])(other_features)]
        )

    # Fully-connected layers
    for num_units in hidden_units:
        features = layers.Dense(num_units)(features)
        features = layers.BatchNormalization()(features)
        features = layers.LeakyReLU()(features)
        features = layers.Dropout(dropout_rate)(features)

    outputs = layers.Dense(units=1)(features)
    model = keras.Model(inputs=inputs, outputs=outputs)
    return model

# Create the model
model = create_model()
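To give some intuition for the first half of the transformer block (attention, residual connection, layer normalization), here is a deliberately simplified NumPy sketch. It is not what `layers.MultiHeadAttention` computes: it uses a single head and omits the learned query, key, value, and output projections, as well as dropout and the learned scale and offset of layer normalization. The function name `toy_self_attention_block` and the input sizes are hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the max for numerical stability before exponentiating.
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def layer_norm(x, eps=1e-6):
    # Normalize each position's feature vector to zero mean and unit variance.
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)


def toy_self_attention_block(x):
    """Single-head self-attention + residual + layer norm, with no learned weights."""
    d = x.shape[-1]
    scores = x @ x.transpose(0, 2, 1) / np.sqrt(d)  # [batch, seq, seq]
    weights = softmax(scores, axis=-1)              # each row sums to 1
    attended = weights @ x                          # [batch, seq, dim]
    return layer_norm(x + attended), weights


rng = np.random.default_rng(0)
x = rng.normal(size=(2, 4, 8))  # hypothetical [batch, sequence, embedding] input
out, weights = toy_self_attention_block(x)
print(out.shape, weights.shape)
```

The output keeps the input shape, which is what allows the residual `Add` layers in `create_model` to sum the block's input and output directly.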

## Run training and evaluation experiment

In [45]:
# Compile the model.
model.compile(
    optimizer=keras.optimizers.Adagrad(learning_rate=0.01),  # Use Adagrad optimizer with learning rate 0.01
    loss=keras.losses.MeanSquaredError(),  # Mean Squared Error loss function
    metrics=[keras.metrics.MeanAbsoluteError()],  # Mean Absolute Error as the evaluation metric
)

# Read the training data.
train_dataset = get_dataset_from_csv("train_data.csv", shuffle=True, batch_size=265)

# Fit the model with the training data.
model.fit(train_dataset, epochs=5)  # Train the model for 5 epochs

# Read the test data.
test_dataset = get_dataset_from_csv("test_data.csv", batch_size=265)

# Evaluate the model on the test data.
_, mae = model.evaluate(test_dataset, verbose=0)  # Evaluate and get Mean Absolute Error
print(f"Test MAE: {round(mae, 3)}")  # Print the Mean Absolute Error rounded to 3 decimal places

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
Test MAE: 0.78


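You should achieve a Mean Absolute Error (MAE) of roughly 0.7 to 0.8 on the test data; the run shown above reached 0.78. Exact values vary between runs because the train/test split and the weight initialization are random.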
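To make the two quantities passed to `model.compile` concrete, the sketch below computes the mean squared error (the training loss) and the mean absolute error (the reported metric) by hand on a few hypothetical ratings.

```python
# Hypothetical true and predicted ratings.
y_true = [4.0, 3.0, 5.0, 2.0]
y_pred = [3.5, 3.0, 4.0, 3.0]

errors = [p - t for p, t in zip(y_pred, y_true)]
mse = sum(e * e for e in errors) / len(errors)   # training loss
mae = sum(abs(e) for e in errors) / len(errors)  # reported metric
print(f"MSE: {mse:.4f}, MAE: {mae:.4f}")  # MSE: 0.5625, MAE: 0.6250
```

An MAE of 0.78 can thus be read as predictions that are off by about 0.78 stars on average on the 1 to 5 rating scale.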

## Conclusion

The BST model uses the Transformer layer in its architecture to capture the sequential signals underlying
users’ behavior sequences for recommendation.

You can try training this model with different configurations, for example, by increasing
the input sequence length and training the model for a larger number of epochs. In addition,
you can try including other features, such as the movie release year and the user's
zip code, and cross features such as sex X genre.
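As a starting point for the extra features mentioned above, here is a minimal sketch of deriving a release year and a sex X genre cross feature. It assumes that movie titles end with the year in parentheses; the sample values and variable names are hypothetical, and nothing here is part of the model as trained in this notebook.

```python
import re

# Hypothetical movie and user values; titles are assumed to end with "(YYYY)".
title = "Toy Story (1995)"
genres_field = "Animation|Children's|Comedy"
sex = "F"

# Pull a trailing four-digit year out of the title, if present.
match = re.search(r"\((\d{4})\)\s*$", title)
release_year = int(match.group(1)) if match else None

# Build one categorical cross-feature value per genre of the movie.
sex_x_genre = [f"{sex}_X_{genre}" for genre in genres_field.split("|")]
print(release_year, sex_x_genre)
```

Each cross value could then be treated like any other categorical feature, with its own vocabulary and embedding.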