In [1]:
# Copyright 2022 NVIDIA Corporation. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================

<img src="https://developer.download.nvidia.com/notebooks/dlsw-notebooks/merlin_transformers4rec_end-to-end-session-based-02-end-to-end-session-based-with-yoochoose-pyt/nvidia_logo.png" style="width: 90px; float: right;">

# End-to-end session-based recommendations with PyTorch

In recent years, several deep learning-based algorithms have been proposed for recommendation systems, and their adoption in industry deployments has grown steeply. In particular, NLP-inspired approaches have been successfully adapted for sequential and session-based recommendation problems, which are important for many domains such as e-commerce, news, and streaming media. Session-Based Recommender Systems (SBRS) model the sequence of interactions within the current user session, where a session is a short sequence of user interactions typically bounded by user inactivity. They have recently gained popularity because they capture short-term or contextual user preferences towards items.

The field of NLP has evolved significantly within the last decade, particularly due to the increased usage of deep learning. As a result, state-of-the-art NLP approaches have inspired RecSys practitioners and researchers to adapt those architectures, especially for sequential and session-based recommendation problems. Here, we leverage one of the state-of-the-art Transformer-based architectures, [XLNet](https://arxiv.org/abs/1906.08237), with the Masked Language Modeling (MLM) training technique (see our [tutorial](https://github.com/NVIDIA-Merlin/Transformers4Rec/tree/main/examples/tutorial) for details) to train a session-based model.

In this end-to-end session-based recommender model example, we use the `Transformers4Rec` library, which leverages the popular [HuggingFace’s Transformers](https://github.com/huggingface/transformers) NLP library and makes it possible to experiment with cutting-edge implementations of such architectures for sequential and session-based recommendation problems. For detailed explanations of the building blocks of the Transformers4Rec meta-architecture, see the [getting-started-session-based](https://github.com/NVIDIA-Merlin/Transformers4Rec/tree/main/examples/getting-started-session-based) and [tutorial](https://github.com/NVIDIA-Merlin/Transformers4Rec/tree/main/examples/tutorial) example notebooks.

## 1. Model definition using Transformers4Rec

In the previous notebook, we created sequential features and saved the processed data frames as parquet files. Now we use these parquet files to train a session-based recommendation model with the XLNet architecture.

### 1.1 Get the schema 

The library uses a schema format to configure the input features and automatically creates the necessary layers. This *protobuf* text file describes each input feature by defining its name, its type, the number of elements of a list column, the cardinality of a categorical feature, and the min and max values of the feature. In addition, the annotation field contains tags that mark, for example, the `continuous` and `categorical` features, the `target` column, or the `item_id` feature.
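To make the role of the cardinality concrete, here is a minimal sketch in plain Python. The dictionary layout is a toy stand-in invented for illustration and is not the library's internal schema format; the two `max` values are copied from the schema printed below. It shows how features can be picked by tag and why the embedding tables in the model printout later in this notebook have `max + 1` rows.

```python
# Toy stand-in for two entries of the schema; the dict layout is an
# assumption made for illustration, not the library's internal format.
toy_schema = [
    {"name": "session_id", "max": 3714319, "tags": ["categorical"]},
    {"name": "pc9-list_seq", "max": 14644,
     "tags": ["categorical", "id", "item", "list", "item_id"]},
]


def select_by_tag(features, tag):
    """Return the features whose annotation contains the given tag."""
    return [f for f in features if tag in f["tags"]]


def embedding_rows(feature):
    """Ids run from 0 to max inclusive, so a table needs max + 1 rows."""
    return feature["max"] + 1


item_id_features = select_by_tag(toy_schema, "item_id")
table_sizes = {f["name"]: embedding_rows(f) for f in toy_schema}
print([f["name"] for f in item_id_features])
print(table_sizes)
```

The resulting sizes agree with the `Embedding(3714320, 64, ...)` and `Embedding(14645, ...)` entries that appear when the model is printed.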

In [2]:
import os
import glob
import torch 
from transformers4rec import torch as tr
from transformers4rec.torch.ranking_metric import NDCGAt, AvgPrecisionAt, RecallAt
from transformers4rec.torch.utils.examples_utils import wipe_memory




In [3]:
# If this cell errors, manually edit the schema file and remove the "extra meta data" entries

from merlin_standard_lib import Schema
SCHEMA_PATH = "../data/schema_generated/schema.pbtxt"
schema = Schema().from_proto_text(SCHEMA_PATH)
!cat $SCHEMA_PATH

# We can select the subset of features we want to use for training the model by their tags or their names.

# schema = schema.select_by_name(
#    ['pc9-list_seq', 
#     'product_item_type-list_seq', 
#     'color_group-list_seq',
#     'price_log_norm-list_seq',
#     'product_recency_days_log_norm-list_seq',
#     'et_dayofweek_sin-list_seq']
# )

feature {
  name: "session_id"
  type: INT
  int_domain {
    name: "session_id"
    max: 3714319
    is_categorical: true
  }
  annotation {
    tag: "categorical"
  }
}
feature {
  name: "pc9-count"
  type: INT
  int_domain {
    name: "pc9"
    max: 14644
    is_categorical: true
  }
  annotation {
    tag: "categorical"
    tag: "list"
    tag: "item"

  }
}
feature {
  name: "et_dayofweek_sin-first"
  type: FLOAT
  annotation {
    tag: "categorical"
    tag: "list"
    tag: "item"

  }
}
feature {
  name: "pc9-list_seq"
  value_count {
  }
  type: INT
  int_domain {
    name: "pc9"
    max: 14644
    is_categorical: true
  }
  annotation {
    tag: "categorical"
    tag: "id"
    tag: "item"
    tag: "list"
    tag: "item_id"

  }
}
feature {
  name: "product_item_type-list_seq"
  value_count {
  }
  type: INT
  int_domain {
    name: "product_item_type"
    max: 50
    is_categorical: true
  }
  annotation {
    tag: "categorical"
    tag: "list"
    tag: "item"

  }
}
feature {

### 1.2 Define the end-to-end Session-based Transformer-based recommendation model

For defining a session-based recommendation model, the end-to-end model definition requires four steps:

1. Instantiate the [TabularSequenceFeatures](https://nvidia-merlin.github.io/Transformers4Rec/main/api/transformers4rec.tf.features.html?highlight=tabularsequence#transformers4rec.tf.features.sequence.TabularSequenceFeatures) input module from the schema to prepare the embedding tables of categorical variables and project continuous features, if specified. In addition, the module provides different aggregation methods (e.g. 'concat', 'elementwise-sum') to merge input features and generate the sequence of interaction embeddings. The module also supports language modeling tasks to prepare masked labels for training and evaluation (e.g. 'mlm' for masked language modeling).

2. Next, we need to define one or multiple prediction tasks. For this demo, we are going to use [NextItemPredictionTask](https://nvidia-merlin.github.io/Transformers4Rec/main/api/transformers4rec.tf.model.html?highlight=nextitem#transformers4rec.tf.model.prediction_task.NextItemPredictionTask) with `Masked Language modeling`: during training, randomly selected items are masked and predicted using the unmasked sequence items. For inference, it is meant to always predict the next item to be interacted with.

3. Then we construct a `transformer_config` based on the architectures provided by the [Hugging Face Transformers](https://github.com/huggingface/transformers) framework.

4. Finally, we link the transformer body to the inputs and the prediction tasks to get the final PyTorch `Model` class.
    
For more details about the features supported by each sub-module, please check out the library [documentation](https://nvidia-merlin.github.io/Transformers4Rec/main/index.html) page.
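The masking idea in step 2 can be sketched in a few lines of plain Python. This is only an illustration of the concept, not the library's implementation: the `MASK` sentinel, the function names, and the masking probability are all hypothetical, and the "hide only the last item" evaluation variant is an assumption about how next-item evaluation is usually set up.

```python
import random

MASK = -1  # hypothetical sentinel id standing in for a masked position


def mask_for_training(item_ids, mask_prob, rng):
    """Randomly hide items; labels keep the hidden ids and are None elsewhere."""
    inputs, labels = [], []
    for item in item_ids:
        if rng.random() < mask_prob:
            inputs.append(MASK)
            labels.append(item)
        else:
            inputs.append(item)
            labels.append(None)
    # Guarantee at least one target so every session contributes to the loss.
    if all(label is None for label in labels):
        last = len(item_ids) - 1
        inputs[last], labels[last] = MASK, item_ids[last]
    return inputs, labels


def mask_last_item(item_ids):
    """Hide only the last item, so it must be predicted from the earlier ones."""
    inputs = item_ids[:-1] + [MASK]
    labels = [None] * (len(item_ids) - 1) + [item_ids[-1]]
    return inputs, labels


session = [11, 12, 13, 14, 15]  # toy item ids
rng = random.Random(0)
train_inputs, train_labels = mask_for_training(session, 0.3, rng)
eval_inputs, eval_labels = mask_last_item(session)
print(train_inputs, train_labels)
print(eval_inputs, eval_labels)
```

During training, the loss is computed only at positions whose label is not `None`; the unmasked items on both sides of a hidden position serve as context.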

In [4]:

max_sequence_length = 20
d_model = 64
d_output = 100

# Define input module to process tabular input-features and to prepare masked inputs
inputs = tr.TabularSequenceFeatures.from_schema(
    schema,
    max_sequence_length=max_sequence_length,
    continuous_projection=64,
    aggregation="concat",
    d_output=d_output,
    masking="mlm",
)
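The difference between the 'concat' and 'elementwise-sum' aggregation options can be illustrated with toy vectors. The sketch below is plain Python with hypothetical helper names and made-up numbers; it only shows how the width of the merged representation behaves for one sequence position, not how the library implements aggregation.

```python
def concat(feature_vectors):
    """Join the per-feature vectors of one position into a single wider vector."""
    merged = []
    for vector in feature_vectors:
        merged.extend(vector)
    return merged


def elementwise_sum(feature_vectors):
    """Add the per-feature vectors of one position; all must share one width."""
    widths = {len(vector) for vector in feature_vectors}
    if len(widths) != 1:
        raise ValueError("elementwise sum needs equally sized vectors")
    return [sum(values) for values in zip(*feature_vectors)]


# One sequence position described by three toy feature vectors of width 2.
position = [[1.0, 2.0], [0.5, 0.5], [3.0, -1.0]]
concat_out = concat(position)        # width grows to 3 * 2 = 6
sum_out = elementwise_sum(position)  # width stays 2
print(concat_out, sum_out)
```

With `aggregation="concat"` the merged width grows with the number of features, which is why the cell above also passes `d_output` to project the concatenated vector to a fixed size.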


In [5]:
from transformers4rec.torch.ranking_metric import NDCGAt, AvgPrecisionAt, RecallAt

# Define XLNetConfig class and set default parameters for HF XLNet config  
transformer_config = tr.XLNetConfig.build(
    d_model=d_model, n_head=4, n_layer=2, total_seq_length=max_sequence_length
)
# Define the model block including: inputs, masking, projection and transformer block.
body = tr.SequentialBlock(
    inputs, tr.MLPBlock([64]), tr.TransformerBlock(transformer_config, masking=inputs.masking)
)

# Define the evaluation top-N metrics and the cut-offs
metrics = [NDCGAt(top_ks=[10, 20], labels_onehot=True),  
           RecallAt(top_ks=[10, 20], labels_onehot=True)]

# Define a head related to next item prediction task 
head = tr.Head(
    body,
    tr.NextItemPredictionTask(weight_tying=True, 
                              metrics=metrics),
    inputs=inputs,
)

# Get the end-to-end Model class 
model = tr.Model(head)

You can print out the model structure by evaluating `model` in a cell, as shown below.

In [6]:
model

Model(
  (heads): ModuleList(
    (0): Head(
      (body): SequentialBlock(
        (0): TabularSequenceFeatures(
          (_aggregation): ConcatFeatures()
          (to_merge): ModuleDict(
            (continuous_module): SequentialBlock(
              (0): ContinuousFeatures(
                (filter_features): FilterFeatures()
                (_aggregation): ConcatFeatures()
              )
              (1): SequentialBlock(
                (0): DenseBlock(
                  (0): Linear(in_features=2, out_features=64, bias=True)
                  (1): ReLU(inplace=True)
                )
              )
              (2): AsTabular()
            )
            (categorical_module): SequenceEmbeddingFeatures(
              (filter_features): FilterFeatures()
              (embedding_tables): ModuleDict(
                (session_id): Embedding(3714320, 64, padding_idx=0)
                (pc9-count): Embedding(14645, 64, padding_idx=0)
                (pc9-list_seq): Embedding(14645, 6

### 1.3. Daily Fine-Tuning: Training over a time window

Now that the model is defined, we can launch training. Transformers4Rec extends the HF Transformers `Trainer` class to adapt the evaluation loop to the session-based recommendation task and the calculation of ranking metrics. The original `train()` method is not modified, so we still benefit from that library's efficient training implementation, which manages, for example, half-precision (FP16) training.

#### Set the training arguments

An additional argument, `data_loader_engine`, selects the data loader that automatically loads the features needed for training according to the schema. The default value is `nvtabular`, for optimized GPU-based data loading. Optionally, a `PyarrowDataLoader` (`pyarrow`) can be used as a basic alternative, but it is slower and works only for small datasets, because the full data is loaded into CPU memory.

In [7]:

# Set hyperparameters for training
training_args = tr.trainer.T4RecTrainingArguments(
    data_loader_engine='nvtabular',
    dataloader_drop_last=True,
    gradient_accumulation_steps=1,
    per_device_train_batch_size=256,
    per_device_eval_batch_size=32,
    output_dir="./tmp",
    learning_rate=0.0001,
    lr_scheduler_type='cosine',
    learning_rate_num_cosine_cycles_by_epoch=1.5,
    num_train_epochs=9,
    max_sequence_length=20,
    report_to=[],
    logging_steps=500,
    no_cuda=False,
)




#### Instantiate the trainer

In [8]:
trainer = tr.Trainer(
    model=model,
    args=training_args,
    schema=schema,
    compute_metrics=True)

#### Launch daily training and evaluation

In this demo, we conduct time-based fine-tuning by iteratively training and evaluating over a sliding time window, written as an explicit loop: at each iteration, we train the model on the data of a specific time index $t$, then evaluate it on the validation data of the next index $t + 1$. Here, the start index is set to 1 and the final index to 2, so the loop runs once: it trains on window 1 and evaluates on window 2.
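The pairing of training and evaluation windows can be sketched independently of the trainer. The helper names below are hypothetical; the directory layout (`<index>/train.parquet` and `<index>/valid.parquet`) and the index values 1 and 2 follow the cells further down in this notebook.

```python
def window_pairs(start_index, final_index):
    """Pair each training window t with the evaluation window t + 1."""
    return [(t, t + 1) for t in range(start_index, final_index)]


def window_paths(output_dir, start_index, final_index):
    """Build the train/valid parquet paths for every (t, t + 1) pair."""
    return [
        (f"{output_dir}/{train_t}/train.parquet",
         f"{output_dir}/{eval_t}/valid.parquet")
        for train_t, eval_t in window_pairs(start_index, final_index)
    ]


pairs = window_pairs(1, 2)
paths = window_paths("../data/preproc_sessions_by_day", 1, 2)
print(pairs)
print(paths)
```

Because `range` excludes its upper bound, a start index of 1 and a final index of 2 yield a single iteration; widening the range adds one train/evaluate step per extra window.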

In [10]:
import cudf

In [11]:
cudf.io.read_parquet('../data/preproc_sessions_by_day/1/valid.parquet').head()

Unnamed: 0,session_id,pc9-count,et_dayofweek_sin-first,pc9-list_seq,product_item_type-list_seq,color_group-list_seq,action-list_seq,price_log_norm-list_seq,relative_price_to_avg_product_item_type-list_seq
7,198,299,-0.433885,"[1037, 1037, 1037, 1048, 1048, 2286, 2286, 203...","[7, 7, 7, 7, 7, 29, 29, 29, 6, 6, 6, 6, 7, 7, ...","[8, 8, 8, 8, 8, 5, 5, 5, 8, 8, 5, 6, 8, 8, 8, ...","[3, 2, 4, 3, 3, 3, 2, 3, 3, 2, 2, 2, 3, 2, 3, ...","[-1.9052631, -1.9052631, -1.9052631, -1.333002...","[-0.44284406, -0.44284406, -0.44284406, -0.263..."
35,785,187,-0.433885,"[8, 8, 7, 1114, 1114, 1114, 381, 381, 381, 788...","[2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, ...","[4, 4, 2, 14, 14, 14, 8, 8, 8, 11, 11, 11, 9, ...","[3, 2, 2, 3, 2, 4, 3, 2, 4, 3, 2, 4, 3, 2, 3, ...","[0.2555583, 0.2555583, 0.2555583, -1.833608, -...","[-0.033521444, -0.033521444, -0.033521444, -0...."
41,847,182,-0.433885,"[21, 45, 18, 21, 13, 42, 55, 28, 10, 111, 89, ...","[2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 4, 4, 2, 2, ...","[4, 3, 3, 4, 2, 2, 3, 5, 2, 3, 2, 4, 2, 3, 2, ...","[2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, ...","[0.69261944, 0.69261944, 0.89606464, 0.6926194...","[0.1913089, 0.1913089, 0.3129329, 0.1913089, 0..."
48,1031,170,-0.433885,"[207, 207, 1581, 1581, 271, 1285, 18, 21, 429,...","[2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, ...","[2, 2, 2, 2, 2, 3, 3, 4, 2, 2, 2, 2, 2, 3, 2, ...","[3, 2, 3, 2, 2, 2, 2, 2, 3, 2, 3, 2, 2, 2, 2, ...","[-0.024955014, -0.024955014, -0.4382261, -0.43...","[-0.15514545, -0.15514545, -0.30731583, -0.307..."
67,1579,146,-0.433885,"[207, 207, 409, 423, 487, 97, 207, 1297, 507, ...","[2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, ...","[2, 2, 2, 5, 3, 3, 2, 2, 3, 6, 2, 2, 4, 4, 3, ...","[3, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 4, 2, ...","[-0.024955014, -0.024955014, -0.024955014, -0....","[-0.15514545, -0.15514545, -0.15514545, -0.155..."


In [12]:
cudf.io.read_parquet('../data/preproc_sessions_by_day/1/valid.parquet').info()

<class 'cudf.core.dataframe.DataFrame'>
Int64Index: 9196 entries, 7 to 91786
Data columns (total 9 columns):
 #   Column                                            Non-Null Count  Dtype
---  ------                                            --------------  -----
 0   session_id                                        9196 non-null   int64
 1   pc9-count                                         9196 non-null   int32
 2   et_dayofweek_sin-first                            9196 non-null   float32
 3   pc9-list_seq                                      9196 non-null   list
 4   product_item_type-list_seq                        9196 non-null   list
 5   color_group-list_seq                              9196 non-null   list
 6   action-list_seq                                   9196 non-null   list
 7   price_log_norm-list_seq                           9196 non-null   list
 8   relative_price_to_avg_product_item_type-list_seq  9196 non-null   list
dtypes: float32(1), int32(1), int64(1), list(6)


In [13]:


INPUT_DATA_DIR = '../data'
OUTPUT_DIR = f'{INPUT_DATA_DIR}/preproc_sessions_by_day'
start_time_window_index = 1
final_time_window_index = 2

In [14]:
%%time
# Iterate over the time windows (days)
for time_index in range(start_time_window_index, final_time_window_index):
    # Set data 
    time_index_train = time_index
    time_index_eval = time_index + 1
    train_paths = glob.glob(os.path.join(OUTPUT_DIR, f"{time_index_train}/train.parquet"))
    eval_paths = glob.glob(os.path.join(OUTPUT_DIR, f"{time_index_eval}/valid.parquet"))
    print(train_paths)
    
    # Train on day related to time_index 
    print('*'*20)
    print("Launch training for day %s are:" %time_index)
    print('*'*20 + '\n')
    trainer.train_dataset_or_path = train_paths
    trainer.reset_lr_scheduler()
    trainer.train()
    trainer.state.global_step +=1
    print('finished')
    
    # Evaluate on the following day
    trainer.eval_dataset_or_path = eval_paths
    eval_metrics = trainer.evaluate(metric_key_prefix='eval')
    print('*'*20)
    print("Eval results for day %s are:\t" %time_index_eval)
    print('\n' + '*'*20 + '\n')
    for key in sorted(eval_metrics.keys()):
        print(" %s = %s" % (key, str(eval_metrics[key])))
    # Release cached memory before the next time window
    wipe_memory()

['../data/preproc_sessions_by_day/1/train.parquet']
********************
Launch training for day 1 are:
********************



***** Running training *****
  Num examples = 73472
  Num Epochs = 9
  Instantaneous batch size per device = 256
  Total train batch size (w. parallel, distributed & accumulation) = 256
  Gradient Accumulation steps = 1
  Total optimization steps = 2583


Step,Training Loss
500,8.9136
1000,7.8472
1500,7.4616
2000,7.3423
2500,7.2696


Saving model checkpoint to ./tmp/checkpoint-500
Trainer.model is not a `PreTrainedModel`, only saving its state dict.
Saving model checkpoint to ./tmp/checkpoint-1000
Trainer.model is not a `PreTrainedModel`, only saving its state dict.
Saving model checkpoint to ./tmp/checkpoint-1500
Trainer.model is not a `PreTrainedModel`, only saving its state dict.
Saving model checkpoint to ./tmp/checkpoint-2000
Trainer.model is not a `PreTrainedModel`, only saving its state dict.
Saving model checkpoint to ./tmp/checkpoint-2500
Trainer.model is not a `PreTrainedModel`, only saving its state dict.


Training completed. Do not forget to share your model on huggingface.co/models =)




finished


********************
Eval results for day 2 are:	

********************

 eval_/loss = 7.305938720703125
 eval_/next-item/ndcg_at_10 = 0.035522446036338806
 eval_/next-item/ndcg_at_20 = 0.04520158842206001
 eval_/next-item/recall_at_10 = 0.06450892984867096
 eval_/next-item/recall_at_20 = 0.10234375298023224
 eval_runtime = 6.3355
 eval_samples_per_second = 1414.263
 eval_steps_per_second = 44.196
CPU times: user 2min 12s, sys: 3min 40s, total: 5min 53s
Wall time: 7min 41s
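
The step counts in the training log above can be checked by hand. The sketch below assumes a single device and uses only numbers taken from the log and from the training arguments.

```python
num_examples = 73472    # "Num examples" in the training log
batch_size = 256        # per_device_train_batch_size
grad_accumulation = 1   # gradient_accumulation_steps
num_epochs = 9          # num_train_epochs
logging_steps = 500     # logging_steps

# dataloader_drop_last=True discards an incomplete final batch, hence floor division.
steps_per_epoch = num_examples // (batch_size * grad_accumulation)
total_steps = steps_per_epoch * num_epochs

# Steps at which a training loss is reported.
logged_steps = list(range(logging_steps, total_steps + 1, logging_steps))

print(steps_per_epoch, total_steps, logged_steps)
```

The computed total agrees with "Total optimization steps = 2583", and the logged steps correspond to the five rows of the loss table. The checkpoint saved later is named `checkpoint-2584`, which appears to be one higher because the loop increments `trainer.state.global_step` after training.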


In [15]:
model.compute_metrics()

{'next-item/ndcg_at_10': tensor(0.0355, device='cuda:0'),
 'next-item/ndcg_at_20': tensor(0.0452, device='cuda:0'),
 'next-item/recall_at_10': tensor(0.0645, device='cuda:0'),
 'next-item/recall_at_20': tensor(0.1023, device='cuda:0')}
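
For readers unfamiliar with these metrics, here is a minimal plain-Python sketch of Recall@k and NDCG@k for next-item prediction, where each session has exactly one relevant item. It follows the standard textbook definitions; the function names and toy data are hypothetical, and the library's own implementation may differ in details.

```python
import math


def recall_at_k(ranked_items, target, k):
    """1.0 if the target is among the top-k recommendations, else 0.0."""
    return 1.0 if target in ranked_items[:k] else 0.0


def ndcg_at_k(ranked_items, target, k):
    """With a single relevant item, NDCG reduces to 1 / log2(rank + 1)."""
    top_k = ranked_items[:k]
    if target not in top_k:
        return 0.0
    rank = top_k.index(target) + 1  # 1-based position of the hit
    return 1.0 / math.log2(rank + 1)


# Two toy sessions: ranked recommendations and the item that actually came next.
examples = [
    (["a", "b", "c", "d"], "c"),  # hit at rank 3
    (["e", "f", "g", "h"], "z"),  # miss
]
k = 3
mean_recall = sum(recall_at_k(r, t, k) for r, t in examples) / len(examples)
mean_ndcg = sum(ndcg_at_k(r, t, k) for r, t in examples) / len(examples)
print(mean_recall, mean_ndcg)
```

Read this way, a `recall_at_20` of about 0.10 means that the true next item appeared among the top 20 recommendations in roughly one out of ten evaluated positions, while NDCG additionally rewards hits that are ranked closer to the top.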

#### Save the model

In [16]:
trainer._save_model_and_checkpoint(save_model_class=True)

Saving model checkpoint to ./tmp/checkpoint-2584
Trainer.model is not a `PreTrainedModel`, only saving its state dict.


#### Export the preprocessing workflow and model in the format required by Triton server

NVTabular’s `export_pytorch_ensemble()` function creates the model files and config files that Triton Inference Server needs to serve the model.

In [17]:
!rm -rf ../models
!mkdir ../models

In [18]:
# Save the model's ranking metrics (the model trained above is XLNet, not GRU)
with open("../models/results.txt", 'w') as f:
    f.write('XLNet ranking metric results:')
    f.write('\n')
    for key, value in model.compute_metrics().items():
        f.write('%s:%s\n' % (key, value.item()))

In [19]:
from nvtabular.inference.triton import export_pytorch_ensemble
from nvtabular.workflow import Workflow
workflow = Workflow.load("../data/workflow_etl")

export_pytorch_ensemble(
    model,
    workflow,
    sparse_max=trainer.get_train_dataloader().dataset.sparse_max,
    name="t4r_pytorch",
    model_path="../models/",
    label_columns=[],
)