In [1]:
# Copyright 2022 NVIDIA Corporation. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================

# Each user is responsible for checking the content of datasets and the
# applicable licenses and determining if suitable for the intended use.

<img src="https://developer.download.nvidia.com/notebooks/dlsw-notebooks/merlin_models-transformers-net-item-prediction/nvidia_logo.png" style="width: 90px; float: right;">

# Transformer-based architecture for next-item prediction task with pretrained embeddings

This notebook is created using the latest stable [merlin-tensorflow](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/merlin/containers/merlin-tensorflow/tags) container.

## Overview

In this use case, we will train a Transformer-based architecture for the next-item prediction task, using pretrained embeddings.

**You can choose to download the full dataset manually or use synthetic data.**

We will use the [SIGIR eCOM 2021 Data Challenge Dataset](https://github.com/coveooss/SIGIR-ecom-data-challenge) to train a session-based model. The dataset contains 36M events of users browsing an online store.

We will reshape the data to organize it into 'sessions'. Each session will be a full customer online journey in chronological order. The goal will be to predict the `url` of the next action taken.


### Learning objectives

- Training a Transformer-based architecture for the next-item prediction task

## Downloading and preparing the dataset

In [2]:
import os
import cudf
import numpy as np
import pandas as pd
import nvtabular as nvt
from merlin.schema import ColumnSchema, Schema, Tags

OUTPUT_DATA_DIR = os.environ.get('OUTPUT_DATA_DIR', '/workspace/data')
NUM_EPOCHS = int(os.environ.get('NUM_EPOCHS', 5))
NUM_EXAMPLES = int(os.environ.get('NUM_EXAMPLES', 100_000))
MINIMUM_SESSION_LENGTH = int(os.environ.get('MINIMUM_SESSION_LENGTH', 5))


You can download the full dataset by registering [here](https://www.coveo.com/en/ailabs/sigir-ecom-data-challenge). If you choose to download the data, please place it alongside this notebook in the `sigir_dataset` directory and extract it.

To process the downloaded data uncomment the cell below.

By default, in this notebook, we will be using synthetically generated data based on the SIGIR dataset.

In [3]:
# # Uncomment this cell to use the original SIGIR dataset.

# train = nvt.Dataset('/workspace/sigir_dataset/train/browsing_train.csv', part_size='500MB')
# skus = nvt.Dataset('/workspace/sigir_dataset/train/sku_to_content.csv')

# skus = pd.read_csv('/workspace/sigir_dataset/train/sku_to_content.csv')

# skus['description_vector'] = skus['description_vector'].replace(np.nan, '')
# skus['image_vector'] = skus['image_vector'].replace(np.nan, '')

# skus['description_vector'] = skus['description_vector'].apply(lambda x: [] if len(x) == 0 else eval(x))
# skus['image_vector'] = skus['image_vector'].apply(lambda x: [] if len(x) == 0 else eval(x))
# skus = skus[skus.description_vector.apply(len) > 0]
# skus = nvt.Dataset(skus)

In [4]:
# Comment out this cell to use the original SIGIR dataset.

from merlin.datasets.synthetic import generate_data

train = generate_data('sigir-browsing', NUM_EXAMPLES)
skus = generate_data('sigir-sku', NUM_EXAMPLES)

The `skus` dataset maps each `product_sku_hash` (essentially an item id) to its `description_vector`, an embedding obtained from the product description.

To use this information in our model, we need to map the `product_sku_hash` information to an id.

We must process `skus` and the `train` dataset (event information) consistently, so that the same `product_sku_hash` is mapped to the same id in both.

We do so by defining and fitting a `Categorify` op and using it to process both datasets.
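The idea of fitting a mapping once and reusing it on a second dataset can be illustrated without NVTabular. The sketch below is a plain-Python stand-in and not how `Categorify` is implemented: the `VocabularyEncoder` class, the toy hash strings, and the choice to reserve id 0 for unseen values are all assumptions made for illustration.

```python
class VocabularyEncoder:
    """Hypothetical stand-in for a fitted categorical encoder."""

    UNKNOWN_ID = 0  # assumption: unseen values share a single reserved id

    def __init__(self):
        self.mapping = {}

    def fit(self, values):
        # Assign consecutive ids in order of first appearance.
        for value in values:
            if value not in self.mapping:
                self.mapping[value] = len(self.mapping) + 1
        return self

    def transform(self, values):
        # Reuse the fitted mapping; never add new entries here.
        return [self.mapping.get(value, self.UNKNOWN_ID) for value in values]


train_hashes = ["a1f", "9c2", "a1f", "77b"]  # toy values
sku_hashes = ["77b", "a1f", "zzz"]  # "zzz" never appears in the train set

encoder = VocabularyEncoder().fit(train_hashes)
train_ids = encoder.transform(train_hashes)
sku_ids = encoder.transform(sku_hashes)

print(train_ids)
print(sku_ids)
```

Because both lists go through the same fitted encoder, `"a1f"` and `"77b"` receive the same ids in `train_ids` and `sku_ids`. Fitting a second encoder on `sku_hashes` alone would break that correspondence.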

In [5]:
cat_op = nvt.ops.Categorify()
out = ['product_sku_hash'] >> cat_op >> nvt.ops.TagAsItemID()
out += ['event_type', 'product_action', 'session_id_hash', 'hashed_url'] >> nvt.ops.Categorify()
out += ['server_timestamp_epoch_ms'] >> nvt.ops.NormalizeMinMax()

wf = nvt.Workflow(out)

train = wf.fit_transform(train)

train.head()

Unnamed: 0,product_sku_hash,event_type,product_action,session_id_hash,hashed_url,server_timestamp_epoch_ms
0,414,3,4,18,319,0.451051
1,367,4,3,3,281,0.201051
2,770,4,5,39,22,0.828151
3,253,4,4,10,430,0.574333
4,233,3,5,67,802,0.990511


Now that we have processed the train set, we can use the mapping preserved in the `cat_op` to process the `skus` dataset containing the embeddings we are after.

Let's now `Categorify` the `product_sku_hash` in `skus` and grab just the description embedding information.

In [6]:
skus.head()

Unnamed: 0,product_sku_hash,description_vector,category_hash,price_bucket
0,19,"[0.3705129044640459, 0.47817123716100624, -0.2...",103,0.824568
1,14,"[-0.12168216732028825, -0.36741614058090766, 0...",126,0.898105
2,7,"[-0.15603990752781077, -0.05013981585010846, 0...",8,0.042153
3,41,"[0.5273695884672907, -0.20898938492964325, 0.3...",97,0.206373
4,9,"[0.5498125175648669, 0.40719267183015934, -0.4...",93,0.809931


In [7]:
out = ['product_sku_hash'] >> cat_op
wf = nvt.Workflow(out + 'description_vector')
skus_ds = wf.transform(skus)

skus_ds.head()

Unnamed: 0,product_sku_hash,description_vector
0,898,"[0.3705129044640459, 0.47817123716100624, -0.2..."
1,575,"[-0.12168216732028825, -0.36741614058090766, 0..."
2,650,"[-0.15603990752781077, -0.05013981585010846, 0..."
3,206,"[0.5273695884672907, -0.20898938492964325, 0.3..."
4,734,"[0.5498125175648669, 0.40719267183015934, -0.4..."


Let us now export the embedding information to a `numpy` array and write it to disk.

We will later pass this array to the `Loader` so that it loads the correct embedding for the product at each step of a customer journey.

The embeddings are linked to the train set using the `product_sku_hash` information.

In [8]:
skus_ds.to_npy('skus.npy')

How will the `Loader` know which embedding to associate with a given row of the train set?

The `product_sku_hash` ids have been exported along with the embeddings and are contained in the first column of the output `numpy` array.

Here is the id of the first embedding stored in `skus.npy`.

In [9]:
np.load('skus.npy')[0, 0]

898.0

And here is the embedding vector for the `product_sku_hash` id shown above:

In [10]:
np.load('skus.npy')[0, 1:]

array([ 0.3705129 ,  0.47817124, -0.24102604, -0.37363357, -0.2522079 ,
       -0.33410996,  0.09743571, -0.36023316, -0.20305507,  0.06740313,
       -0.40918888, -0.28355211, -0.25972716,  0.57435913,  0.3128781 ,
        0.10486589,  0.38189105,  0.31816563,  0.51948144,  0.36713079,
       -0.28067433,  0.17459828,  0.44254675,  0.05245209,  0.57712163,
       -0.32762393, -0.08714026,  0.30571312,  0.5466538 ,  0.35925525,
        0.26257309, -0.29264912,  0.28919014, -0.01429584,  0.30158994,
        0.24051505,  0.14223966, -0.22407006, -0.19739325,  0.12602873,
        0.29371442, -0.07461826, -0.39044766,  0.25260037, -0.20516685,
        0.55645921,  0.45233973, -0.40245309,  0.2398928 , -0.43392385])
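The layout described above (id in the first column, embedding values in the remaining columns) can be reproduced on a toy array. This is a minimal sketch with made-up ids and three-dimensional vectors. It builds the array in memory instead of reading `skus.npy`, and the variable names are hypothetical.

```python
import numpy as np

# Toy stand-in for the exported array: one row per product,
# id in column 0, embedding values in the remaining columns.
ids = np.array([898, 575, 650])
vectors = np.array([
    [0.37, 0.48, -0.24],
    [-0.12, -0.37, 0.05],
    [-0.16, -0.05, 0.31],
])
exported = np.column_stack([ids, vectors])  # integer ids are upcast to float64

# Split the array back into integer ids and float32 embeddings.
recovered_ids = exported[:, 0].astype(int)
recovered_vectors = exported[:, 1:].astype(np.float32)

print(exported.dtype, recovered_ids, recovered_vectors.shape)
```

Storing ids and vectors in one array forces a single dtype, which is why the id printed earlier appears as `898.0` and has to be cast back to an integer before it is used as a lookup key.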

Let us now construct the `Loader` that will provide the data to our model.

Let us first rearrange the `train` dataset to group the actions by `session_id_hash`. Actions within a session will be contained in a single row.

In [11]:
groupby_features = train.head().columns.tolist() >> nvt.ops.Groupby(
    groupby_cols=['session_id_hash'],
    aggs={
        'product_sku_hash': ['list'],
        'event_type': ['list'],
        'product_action': ['list'],
        'hashed_url': ['list', 'count'],
        'server_timestamp_epoch_ms': ['list']
    },
    sort_cols="server_timestamp_epoch_ms"
)

filtered_sessions = groupby_features >> nvt.ops.Filter(f=lambda df: df["hashed_url_count"] >= MINIMUM_SESSION_LENGTH)

# We won't be needing the `session_id_hash` nor the `hashed_url_count` any longer
wf = nvt.Workflow(
    filtered_sessions[
        'product_sku_hash_list',
        'event_type_list',
        'product_action_list',
        'hashed_url_list',
    ]
)
train_processed = wf.fit_transform(train)

train_processed.head()

Unnamed: 0,product_sku_hash_list,event_type_list,product_action_list,hashed_url_list
0,"[348, 167, 67, 874, 390, 660, 340, 338, 687, 5...","[3, 3, 3, 3, 4, 4, 3, 3, 4, 3, 3, 3, 3, 4, 4, ...","[3, 5, 6, 4, 4, 3, 6, 6, 3, 5, 5, 6, 6, 4, 3, ...","[749, 106, 380, 100, 576, 135, 14, 640, 41, 13..."
1,"[952, 387, 534, 440, 974, 190, 598, 268, 335, ...","[4, 3, 3, 4, 3, 3, 4, 3, 4, 3, 4, 3, 4, 4, 4, ...","[4, 6, 3, 4, 6, 6, 4, 4, 4, 3, 4, 3, 3, 3, 6, ...","[356, 148, 531, 947, 8, 23, 780, 104, 217, 383..."
2,"[17, 733, 668, 734, 150, 209, 428, 314, 510, 8...","[4, 4, 3, 4, 3, 4, 4, 4, 3, 4, 4, 3, 3, 3, 4, ...","[4, 6, 3, 4, 3, 3, 5, 5, 6, 4, 6, 6, 3, 5, 6, ...","[163, 464, 752, 679, 884, 41, 408, 713, 274, 9..."
3,"[610, 656, 267, 33, 900, 636, 95, 429, 736, 50...","[3, 4, 4, 3, 4, 3, 3, 3, 3, 3, 4, 4, 3, 3, 3, ...","[5, 4, 5, 6, 6, 4, 3, 6, 6, 5, 4, 3, 5, 3, 5, ...","[616, 544, 776, 913, 827, 651, 943, 788, 197, ..."
4,"[449, 658, 731, 515, 959, 814, 9, 330, 175, 36...","[3, 4, 4, 3, 4, 4, 3, 4, 3, 4, 4, 4, 4, 4, 4, ...","[5, 5, 6, 6, 3, 5, 4, 4, 3, 3, 3, 6, 3, 5, 5, ...","[669, 123, 911, 431, 46, 229, 794, 657, 939, 7..."
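To make the reshaping concrete, here is a small plain-Python sketch of the same three steps: group events by session, order them chronologically, and drop short sessions. It is an illustration on toy data, not the NVTabular implementation, and `MIN_LENGTH`, `events`, and `url_lists` are hypothetical names.

```python
from collections import defaultdict

MIN_LENGTH = 3  # toy threshold; the notebook uses MINIMUM_SESSION_LENGTH

# (session_id, timestamp, url_id) events in arbitrary order; toy values.
events = [
    ("s1", 0.30, 12),
    ("s2", 0.10, 7),
    ("s1", 0.10, 5),
    ("s1", 0.20, 9),
    ("s2", 0.40, 3),
]

# Collect the events that belong to each session.
sessions = defaultdict(list)
for session_id, timestamp, url_id in events:
    sessions[session_id].append((timestamp, url_id))

# Sort each session by timestamp, keep only the url ids,
# and drop sessions shorter than MIN_LENGTH.
url_lists = {}
for session_id, session_events in sessions.items():
    ordered = [url_id for _, url_id in sorted(session_events)]
    if len(ordered) >= MIN_LENGTH:
        url_lists[session_id] = ordered

print(url_lists)
```

Session `s2` has only two events, so it is filtered out, while `s1` becomes a single chronologically ordered list.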


We are now ready to construct the `Loader` that will feed the data to our model.

We begin by reading in the embeddings information.

In [12]:
embeddings = np.load('skus.npy')

We are now ready to define the `Loader`.

In [13]:
from merlin.dataloader.tensorflow import Loader
from merlin.dataloader.ops.embeddings import EmbeddingOperator
import merlin.models.tf as mm

embedding_operator = EmbeddingOperator(
    embeddings[:, 1:].astype(np.float32),
    id_lookup_table=embeddings[:, 0].astype(int),
    lookup_key="product_sku_hash_list",
    embedding_name='product_embeddings'
)

loader = Loader(
    train_processed,
    batch_size=10,
    transforms=[
        embedding_operator
    ],
    shuffle=True
)


Through the `EmbeddingOperator`, we pointed the loader at our `embeddings` and specified which column to use as the key when looking them up.
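Conceptually, the lookup amounts to translating each product id in a session into a row of the embedding table. The sketch below shows that idea in plain Python on toy data. It assumes that position `i` of the id lookup table corresponds to row `i` of the embedding table, and it is not the `EmbeddingOperator` implementation. The names `embedding_table`, `row_of_id`, and `lookup_session` are hypothetical.

```python
# Toy lookup table: position i of `id_lookup_table` holds the product id
# whose vector is stored in row i of `embedding_table` (assumed alignment).
id_lookup_table = [898, 575, 650]
embedding_table = [
    [0.1, 0.2],
    [0.3, 0.4],
    [0.5, 0.6],
]

# Invert the table once so each id can be resolved to its row.
row_of_id = {product_id: row for row, product_id in enumerate(id_lookup_table)}


def lookup_session(product_ids):
    """Return one embedding vector per product id in the session."""
    return [embedding_table[row_of_id[product_id]] for product_id in product_ids]


toy_batch = [[650, 898], [575, 575, 650]]  # two sessions of different length
batch_embeddings = [lookup_session(session) for session in toy_batch]
print(batch_embeddings)
```

Each session keeps its own length, so the result is ragged in the same way as the id lists it was built from.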

Below is an example batch of data that our model will consume.

In [14]:
batch = mm.sample_batch(loader, batch_size=10, include_targets=False, prepare_features=True)
batch

{'product_sku_hash_list': <tf.RaggedTensor [[[737],
   [617],
   [565],
   [345],
   [420],
   ...


`product_embeddings` are included in the batch.

In [15]:
batch.keys()

dict_keys(['product_sku_hash_list', 'event_type_list', 'product_action_list', 'hashed_url_list', 'product_embeddings'])

## Creating and training the model

We are now ready to construct our model.

In [16]:
import merlin.models.tf as mm

input_block = mm.InputBlockV2(
    loader.output_schema,
    embeddings=mm.Embeddings(
        loader.output_schema.select_by_tag(Tags.CATEGORICAL),
        sequence_combiner=None,
    ),
    pretrained_embeddings=mm.PretrainedEmbeddings(
        loader.output_schema.select_by_tag(Tags.EMBEDDING),
        sequence_combiner=None,
        normalizer="l2-norm",
        output_dims={"product_embeddings": 128},
    )
)

We have now constructed an `input_block` that will take our batch and transform it in a fashion that will make it amenable for further processing by subsequent layers of our model.

To test that everything has worked, we can pass our example `batch` through the `input_block`.

In [17]:
input_batch = input_block(batch)

Let us now construct the remaining layers of our model.

In [18]:
target = 'hashed_url_list'

dmodel = 128
mlp_block = mm.MLPBlock(
    [128, dmodel],
    activation='relu',
    no_activation_last_layer=True,
)
transformer_block = mm.XLNetBlock(d_model=dmodel, n_head=4, n_layer=2)
model = mm.Model(
    input_block,
    mlp_block,
    transformer_block,
    mm.CategoricalOutput(
        train_processed.schema.select_by_name(target),
        default_loss="categorical_crossentropy",
    ),
)

And let us train it.

In [19]:
model.compile(run_eagerly=False, optimizer='adam', loss="categorical_crossentropy")
model.fit(
    loader,
    batch_size=64,
    epochs=NUM_EPOCHS,
    pre=mm.SequenceMaskRandom(
        schema=loader.output_schema,
        target=target,
        masking_prob=0.3,
        transformer=transformer_block,
    ),
)



Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


<keras.callbacks.History at 0x7f2e3a087370>
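The training call above masks positions in each sequence with `masking_prob=0.3`. As a rough intuition for what random masking does, the sketch below hides each item of a toy session independently with a given probability and records the hidden items as targets. This is a simplified illustration under stated assumptions, not the `SequenceMaskRandom` implementation, whose details may differ. `MASK` and `mask_sequence` are hypothetical names.

```python
import random

MASK = None  # placeholder marking a hidden position (hypothetical)


def mask_sequence(sequence, masking_prob, rng):
    """Hide each item independently with probability `masking_prob`.

    Returns the masked inputs and a {position: original item} dict of targets.
    """
    inputs, targets = [], {}
    for position, item in enumerate(sequence):
        if rng.random() < masking_prob:
            inputs.append(MASK)
            targets[position] = item
        else:
            inputs.append(item)
    return inputs, targets


rng = random.Random(0)  # seeded for repeatability
session = [749, 106, 380, 100, 576, 135]  # toy url ids
masked_inputs, targets = mask_sequence(session, masking_prob=0.3, rng=rng)
print(masked_inputs, targets)
```

The model sees `masked_inputs` and is trained to recover the entries in `targets`, which is what lets it learn from every position of a session and not only from the last one.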

## Serving predictions

Now that we have prepared a workflow for processing our data (`wf`), defined the embedding operator (`embedding_operator`) and trained our model (`model`), we have all the components we need to serve our model using the Triton Inference Server (TIS).

Let us define a set of inference operators (a pipeline for processing our data all the way to obtaining predictions) and export them as an ensemble that we will be able to serve using TIS.

In [20]:
from merlin.systems.dag.ops.tensorflow import PredictTensorflow
from merlin.systems.dag.ensemble import Ensemble
from merlin.systems.dag.ops.workflow import TransformWorkflow

In [21]:
inference_operators = wf.input_schema.column_names >> TransformWorkflow(wf) >> embedding_operator >> PredictTensorflow(model)


In [22]:
ensemble = Ensemble(inference_operators, wf.input_schema)
ensemble.export(os.path.join(OUTPUT_DATA_DIR, 'ensemble'));


After we export the ensemble, we are ready to start the Triton Inference Server.

The server is installed in the Merlin TensorFlow and Merlin PyTorch containers. If you are not using one of these containers, ensure that it is installed in your environment. For more information, see the Triton Inference Server [documentation](https://github.com/triton-inference-server/server/blob/r22.03/README.md#documentation).

You can start the server by running the following command:

```
tritonserver --model-repository={OUTPUT_DATA_DIR}/ensemble/
```

For the `--model-repository` argument, specify the same value that you passed as the `export_path` to the `ensemble.export` method.

After you run the `tritonserver` command, wait until your terminal shows messages like the following example:

I0414 18:29:50.741833 4067 grpc_server.cc:4421] Started GRPCInferenceService at 0.0.0.0:8001<br>
I0414 18:29:50.742197 4067 http_server.cc:3113] Started HTTPService at 0.0.0.0:8000<br>
I0414 18:29:50.783470 4067 http_server.cc:178] Started Metrics Service at 0.0.0.0:8002
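Instead of watching the terminal, a script can poll the server until it reports that it is ready. The helper below is a generic sketch: `wait_until_ready` is a hypothetical function that takes any zero-argument readiness check, and the example exercises it with a stand-in check so that it runs without a server.

```python
import time


def wait_until_ready(is_ready, timeout_s=60.0, interval_s=1.0):
    """Poll `is_ready()` until it returns True or `timeout_s` elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            if is_ready():
                return True
        except Exception:
            # The server may refuse connections while it is still starting.
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)


# Example with a stand-in check that succeeds on the third call.
calls = {"count": 0}


def fake_check():
    calls["count"] += 1
    return calls["count"] >= 3


ready = wait_until_ready(fake_check, timeout_s=5.0, interval_s=0.01)
print(ready, calls["count"])
```

Against a running server, the check could be a small function that creates a `tritonclient.grpc.InferenceServerClient("localhost:8001")` and returns the result of its `is_server_ready()` method. The address is an assumption taken from the log lines above.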

Let us now package our data for inference. We will send 5 rows of data, which correspond to a single customer journey (session) through the online store. The data will first be processed by the `NVTabular` workflow and subsequently passed to our transformer model for prediction.

In [23]:
# obtaining five rows of data
df = train.head(5)
# making sure all the rows correspond to the same online session (have the same `session_id_hash`)
df['session_id_hash'] = df['session_id_hash'].iloc[0]

Let us now send the data to the Triton Inference Server for inference.

In [24]:
from merlin.systems.triton import convert_df_to_triton_input
import tritonclient.grpc as grpcclient

inputs = convert_df_to_triton_input(wf.input_schema, df)

with grpcclient.InferenceServerClient("localhost:8001") as client:
    response = client.infer('executor_model', inputs)

Let's parse the response.

In [25]:
predictions = response.as_numpy("hashed_url_list/categorical_output")
predictions

array([[-3.5892355 , -3.8800368 , -3.9715683 , ..., -0.68930686,
        -0.30754495, -0.70330954]], dtype=float32)

The response contains logits predicting the id of the url the customer is most likely to arrive at as the next step of their journey through the online store.

Here is the predicted hashed url id:

In [26]:
predicted_hashed_url_id = predictions.argmax()
predicted_hashed_url_id

326
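Taking the `argmax` yields a single id, but it is often useful to look at several candidates and their probabilities. The sketch below turns a short list of toy logits into probabilities with a softmax and picks the top-k indices. The values and the helper names `softmax` and `top_k` are illustrative assumptions and are not part of the Merlin API.

```python
import math


def softmax(logits):
    """Convert logits to probabilities in a numerically stable way."""
    peak = max(logits)
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [value / total for value in exps]


def top_k(scores, k):
    """Return the indices of the k largest scores, best first."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]


toy_logits = [-3.59, -3.88, -0.31, -0.69, -0.70]  # toy values
probabilities = softmax(toy_logits)
best_ids = top_k(toy_logits, k=3)
print(best_ids)
```

Softmax preserves the ordering of the logits, so the top-ranked index is the same as the `argmax`. Note that these indices are the ids assigned to `hashed_url` during preprocessing, not the original url hashes.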

## Summary

We have trained a transformer model for the next item prediction task using language model masking.

For another session-based example that goes deeper into data preprocessing and that covers several advanced techniques (Weight Tying, Temperature Scaling) please see [Session-Based Next Item Prediction for Fashion E-Commerce](https://github.com/NVIDIA-Merlin/models/blob/t4rec_use_case/examples/usecases/ecommerce-session-based-next-item-prediction-for-fashion.ipynb). 