In [1]:
# Copyright 2022 NVIDIA Corporation. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================

# Each user is responsible for checking the content of datasets and the
# applicable licenses and determining if suitable for the intended use.

<img src="https://developer.download.nvidia.com/notebooks/dlsw-notebooks/merlin_models-transformers-net-item-prediction/nvidia_logo.png" style="width: 90px; float: right;">

# Transformer-based architecture for next-item prediction task with pretrained embeddings

This notebook is created using the latest stable [merlin-tensorflow](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/merlin/containers/merlin-tensorflow/tags) container.

## Overview

In this use case we will train a Transformer-based architecture for the next-item prediction task, using pretrained embeddings as an additional input feature.

**You can choose to download the full dataset manually or use synthetic data.**

We will use the [SIGIR eCOM 2021 Data Challenge Dataset](https://github.com/coveooss/SIGIR-ecom-data-challenge) to train a session-based model. The dataset contains 36M events of users browsing an online store.

We will reshape the data to organize it into 'sessions'. Each session will be a full customer online journey in chronological order. The goal will be to predict the `url` of the next action taken.


### Learning objectives

- Training a Transformer-based architecture for next-item prediction task

## Downloading and preparing the dataset

In [2]:
import os
import cudf
import numpy as np
import pandas as pd
import nvtabular as nvt
from merlin.schema import ColumnSchema, Schema, Tags

OUTPUT_DATA_DIR = os.environ.get('OUTPUT_DATA_DIR', '/workspace/data')
NUM_EPOCHS = int(os.environ.get('NUM_EPOCHS', 5))
NUM_EXAMPLES = int(os.environ.get('NUM_EXAMPLES', 100_000))
MINIMUM_SESSION_LENGTH = int(os.environ.get('MINIMUM_SESSION_LENGTH', 5))

You can download the full dataset by registering [here](https://www.coveo.com/en/ailabs/sigir-ecom-data-challenge). If you choose to download the data, please place it alongside this notebook in the `sigir_dataset` directory and extract it.

By default, in this notebook we will be using synthetically generated data based on the SIGIR dataset, but you can run on the full dataset by changing the value of the boolean flag below.

In [3]:
RUN_ON_SYNTHETIC_DATA = True

### Clean downloaded data

If you are training on the full SIGIR dataset, the following code will pre-process it.

Here we deal with `nan` values, drop rows with missing information and parse strings containing lists to lists.

The synthetically generated data is already clean -- it doesn't require this pre-processing.

In [4]:
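if not RUN_ON_SYNTHETIC_DATA:
    from ast import literal_eval

    train = nvt.Dataset('/workspace/sigir_dataset/train/browsing_train.csv', part_size='500MB')

    # Read the sku table with pandas so that it can be cleaned before wrapping it in a Dataset.
    skus = pd.read_csv('/workspace/sigir_dataset/train/sku_to_content.csv')

    # Replace missing vectors with an empty string so that every cell is a string.
    skus['description_vector'] = skus['description_vector'].replace(np.nan, '')
    skus['image_vector'] = skus['image_vector'].replace(np.nan, '')

    # Parse the string representation of each vector into a Python list.
    # `literal_eval` only accepts Python literals, which is safer than `eval` on file contents.
    skus['description_vector'] = skus['description_vector'].apply(lambda x: [] if len(x) == 0 else literal_eval(x))
    skus['image_vector'] = skus['image_vector'].apply(lambda x: [] if len(x) == 0 else literal_eval(x))

    # Drop products that have no description embedding.
    skus = skus[skus.description_vector.apply(len) > 0]
    skus = nvt.Dataset(skus)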
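
The cleaning steps described above can be tried on a tiny hand-made frame without downloading anything. This is a minimal sketch: the `toy_skus` frame and its values are made up for illustration and only mimic the shape of the real `sku_to_content.csv` columns.

```python
import ast

import numpy as np
import pandas as pd

# Hypothetical stand-in for a few rows of the sku table (values are invented).
toy_skus = pd.DataFrame({
    "product_sku_hash": ["a", "b", "c"],
    "description_vector": ["[0.1, 0.2]", np.nan, "[0.3, 0.4]"],
})

# Replace missing values with an empty string so every cell is a string.
toy_skus["description_vector"] = toy_skus["description_vector"].fillna("")

# Parse "[0.1, 0.2]" into a Python list; empty strings become empty lists.
toy_skus["description_vector"] = toy_skus["description_vector"].apply(
    lambda s: ast.literal_eval(s) if s else []
)

# Keep only the rows that actually carry an embedding.
toy_skus = toy_skus[toy_skus["description_vector"].apply(len) > 0]

print(toy_skus["product_sku_hash"].tolist())
```

The row with the missing vector is dropped, and the remaining cells hold real lists of floats rather than strings.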

### Generate synthetic data

If you are not running on the full dataset, the following lines of code will generate its synthetic counterpart.

In [5]:
if RUN_ON_SYNTHETIC_DATA:
    from merlin.datasets.synthetic import generate_data

    train = generate_data('sigir-browsing', NUM_EXAMPLES)
    skus = generate_data('sigir-sku', NUM_EXAMPLES)

## Constructing a workflow

We need to process our data further before we can use it to train our model.

In particular, the `skus` dataset contains the mapping from the `product_sku_hash` (essentially an item id) to the `description_vector` -- an embedding obtained from the description.

This is a piece of information that we would like to use in our model. In order to do so, we need to map the `product_sku_hash` information to an id.

But we need to make sure that the way we process `skus` and the `train` dataset (event information) is consistent, that is, that the same `product_sku_hash` is mapped to the same id in both datasets.

We do so by defining and fitting a `Categorify` op once and using it to process both the `skus` and the `train` datasets.
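
To see why a single fitted mapping matters, here is a minimal pandas sketch of the idea. It does not show how `Categorify` is implemented; the toy tables, the `encode` helper, and the convention of reserving id 0 for unseen values are all assumptions made for this illustration.

```python
import pandas as pd

# Hypothetical toy tables standing in for `train` and `skus`.
toy_train = pd.DataFrame({"product_sku_hash": ["x", "y", "x", "z"]})
toy_skus = pd.DataFrame({"product_sku_hash": ["z", "x", "y"]})

# "Fit" once: build the vocabulary from the event data only.
# Ids start at 1 so that 0 can stand for unseen values (an assumption of this sketch).
unique_hashes = sorted(toy_train["product_sku_hash"].unique())
vocab = {sku: i for i, sku in enumerate(unique_hashes, start=1)}


def encode(column: pd.Series) -> pd.Series:
    """Map raw hashes to ids with the shared vocabulary; unseen values map to 0."""
    return column.map(vocab).fillna(0).astype("int64")


# "Transform" both tables with the same vocabulary.
toy_train["sku_id"] = encode(toy_train["product_sku_hash"])
toy_skus["sku_id"] = encode(toy_skus["product_sku_hash"])

print(toy_train["sku_id"].tolist())
print(toy_skus["sku_id"].tolist())
```

Because both tables go through the same `vocab`, the hash `"x"` receives the same id wherever it appears. Fitting a second, independent mapping on `toy_skus` would not guarantee that.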

Additionally, we apply some further processing to the `train` dataset. We group rows of data by `session_id_hash` so that each training example will contain events from a single customer visit to the online store arranged in chronological order.
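
The grouping step can be pictured with plain pandas on a handful of made-up events. This is only a sketch of the logic (sort by time, collect each session into a list, drop short sessions); the `events` frame and the `MIN_LEN` threshold are hypothetical, and the notebook itself performs this step with NVTabular operators.

```python
import pandas as pd

MIN_LEN = 2  # hypothetical threshold for this toy example

# Invented events from three sessions, deliberately out of order.
events = pd.DataFrame({
    "session_id_hash": ["s1", "s2", "s1", "s1", "s2", "s3"],
    "hashed_url": [10, 20, 11, 12, 21, 30],
    "server_timestamp_epoch_ms": [3, 1, 1, 2, 2, 5],
})

# Sort chronologically, then collect each session's urls into a list.
sessions = (
    events.sort_values("server_timestamp_epoch_ms")
    .groupby("session_id_hash", sort=True)
    .agg(
        hashed_url_list=("hashed_url", list),
        hashed_url_count=("hashed_url", "count"),
    )
    .reset_index()
)

# Drop sessions that are too short to learn from.
sessions = sessions[sessions["hashed_url_count"] >= MIN_LEN]

print(sessions)
```

Session `s3` has a single event and is filtered out, while the urls of `s1` come out in timestamp order rather than in the order they appeared in the frame.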

If you would like to learn more about leveraging `NVTabular` to process tabular data on the GPU using a set of industry standard operators, please consult the examples available [here](https://github.com/NVIDIA-Merlin/NVTabular/tree/main/examples).

Let's first process the `train` dataset and retain the `Categorify` operator (`cat_op`) for processing of `skus`.

In [6]:
cat_op = nvt.ops.Categorify()
out = ['product_sku_hash'] >> cat_op >> nvt.ops.TagAsItemID()
out += ['event_type', 'product_action', 'session_id_hash', 'hashed_url'] >> nvt.ops.Categorify()
out += ['server_timestamp_epoch_ms'] >> nvt.ops.NormalizeMinMax()

groupby_features = out >> nvt.ops.Groupby(
    groupby_cols=['session_id_hash'],
    aggs={
        'product_sku_hash': ['list'],
        'event_type': ['list'],
        'product_action': ['list'],
        'hashed_url': ['list', 'count'],
        'server_timestamp_epoch_ms': ['list']
    },
    sort_cols="server_timestamp_epoch_ms"
)

filtered_sessions = groupby_features >> nvt.ops.Filter(f=lambda df: df["hashed_url_count"] >= MINIMUM_SESSION_LENGTH)

# We won't be needing the `session_id_hash` nor the `hashed_url_count` any longer
wf = nvt.Workflow(
    filtered_sessions[
        'product_sku_hash_list',
        'event_type_list',
        'product_action_list',
        'hashed_url_list',
    ]
)

# Let's save the output of our workflow -- transformed `train` for later use (training of our model).
wf.fit_transform(train).to_parquet('train_transformed')

Here are a few example rows from the transformed `train` dataset.

In [7]:
nvt.Dataset('train_transformed', engine='parquet').head()

| | product_sku_hash_list | event_type_list | product_action_list | hashed_url_list |
|---|---|---|---|---|
| 0 | [406, 151, 787, 718, 332, 939, 466, 772, 467, ... | [3, 3, 4, 4, 3, 3, 4, 3, 3, 4, 3, 4, 3, 3, 3, ... | [3, 3, 3, 6, 5, 4, 4, 5, 5, 4, 3, 4, 6, 5, 6, ... | [483, 75, 900, 346, 57, 997, 767, 406, 792, 66... |
| 1 | [492, 148, 892, 482, 777, 401, 186, 958, 549, ... | [4, 4, 4, 3, 3, 3, 4, 4, 4, 3, 4, 3, 4, 3, 3, ... | [6, 5, 3, 6, 5, 4, 4, 6, 6, 5, 3, 4, 5, 6, 4, ... | [90, 194, 75, 45, 866, 169, 278, 702, 113, 147... |
| 2 | [501, 480, 511, 160, 196, 929, 567, 305, 781, ... | [3, 3, 3, 4, 4, 4, 3, 4, 3, 4, 3, 4, 4, 4, 3, ... | [5, 3, 4, 3, 6, 3, 6, 5, 4, 6, 6, 4, 5, 3, 6, ... | [970, 881, 956, 669, 707, 616, 682, 723, 243, ... |
| 3 | [72, 649, 642, 535, 103, 755, 152, 158, 815, 9... | [3, 4, 4, 3, 4, 3, 4, 4, 4, 4, 3, 3, 3, 4, 3, ... | [5, 4, 5, 3, 4, 3, 4, 6, 3, 3, 5, 3, 5, 5, 6, ... | [427, 925, 881, 641, 82, 151, 108, 954, 14, 37... |
| 4 | [613, 240, 229, 39, 339, 723, 476, 897, 678, 5... | [4, 3, 3, 3, 4, 4, 3, 3, 3, 4, 3, 3, 4, 4, 4, ... | [4, 4, 6, 6, 5, 3, 5, 3, 5, 6, 6, 6, 5, 3, 5, ... | [161, 322, 752, 313, 990, 790, 743, 984, 420, ... |


Now that we have processed the train set, we can use the mapping preserved in the `cat_op` to process the `skus` dataset containing the embeddings we are after.

Let's now `Categorify` the `product_sku_hash` in `skus` and grab just the description embedding information.

In [8]:
skus.head()

| | product_sku_hash | description_vector | category_hash | price_bucket |
|---|---|---|---|---|
| 0 | 1 | [-0.0726655802618047, -0.11872599678889867, -0... | 131 | 0.379625 |
| 1 | 17 | [0.039721458967498124, -0.33268736739852606, -... | 93 | 0.072194 |
| 2 | 8 | [0.5538721321346369, -0.20581214837288417, -0.... | 69 | 0.895861 |
| 3 | 15 | [0.2653254056823971, -0.3550436895508866, 0.57... | 25 | 0.38587 |
| 4 | 6 | [0.1684285401357472, -0.39828157446368395, 0.3... | 10 | 0.752337 |


In [9]:
out = ['product_sku_hash'] >> cat_op
wf_skus = nvt.Workflow(out + 'description_vector')
skus_ds = wf_skus.transform(skus)

skus_ds.head()

| | product_sku_hash | description_vector |
|---|---|---|
| 0 | 555 | [-0.0726655802618047, -0.11872599678889867, -0... |
| 1 | 105 | [0.039721458967498124, -0.33268736739852606, -... |
| 2 | 75 | [0.5538721321346369, -0.20581214837288417, -0.... |
| 3 | 701 | [0.2653254056823971, -0.3550436895508866, 0.57... |
| 4 | 402 | [0.1684285401357472, -0.39828157446368395, 0.3... |


Let us now export the embedding information to a `numpy` array and write it to disk.

We will later pass this information to the `Loader` so that it loads the correct embedding for the product corresponding to each step of a customer journey.

The embeddings are linked to the train set using the `product_sku_hash` information.

In [10]:
skus_ds.to_npy('skus.npy')

How will the `Loader` know which embedding to associate with a given row of the train set?

The `product_sku_hash` ids have been exported along with the embeddings and are contained in the first column of the output `numpy` array.

Here is the id of the first embedding stored in `skus.npy`.

In [11]:
np.load('skus.npy')[0, 0]

555.0

And here is the embedding vector corresponding to the `product_sku_hash` id shown above:

In [12]:
np.load('skus.npy')[0, 1:]

array([-0.07266558, -0.118726  , -0.2043746 ,  0.24600699,  0.14059197,
        0.57885933, -0.09720917,  0.05745345,  0.38007381,  0.0089302 ,
       -0.14735974, -0.26629164,  0.52613226,  0.19372434, -0.05036817,
        0.34785588, -0.2789483 ,  0.55859555,  0.07543182, -0.19300545,
       -0.40272555, -0.35062145,  0.05617401, -0.43663303,  0.16449501,
       -0.38836731,  0.2188186 , -0.17435912,  0.21507282, -0.33047473,
        0.29786609, -0.02070877,  0.56022643,  0.42869981,  0.28771574,
       -0.33769037, -0.27026837, -0.04175104, -0.43697575,  0.52561801,
       -0.24987143,  0.54709774,  0.51354977,  0.32771942,  0.4961752 ,
        0.07266798, -0.03690612, -0.12020759,  0.17333933, -0.18343979])
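
The layout described above (id in the first column, embedding in the remaining columns) can be unpacked as follows. This is a sketch on a small invented array rather than on `skus.npy` itself; the ids reuse values from the table above, but the vectors and the `row_of_id` helper are made up for illustration.

```python
import numpy as np

# Hypothetical 3 x (1 + 4) array with the same layout as the exported file:
# column 0 holds the categorified id, the remaining columns hold the vector.
toy = np.array([
    [555.0, 0.1, 0.2, 0.3, 0.4],
    [105.0, 0.5, 0.6, 0.7, 0.8],
    [75.0, 0.9, 1.0, 1.1, 1.2],
])

# Split the array into an integer id column and a float32 embedding matrix.
ids = toy[:, 0].astype(np.int64)
vectors = toy[:, 1:].astype(np.float32)

# Row position of each id, so that a vector can be fetched by id.
row_of_id = {int(i): row for row, i in enumerate(ids)}
vec_105 = vectors[row_of_id[105]]

print(ids, vectors.shape, vec_105)
```

The ids are stored as floats only because a NumPy array has a single dtype; casting the first column back to an integer type recovers them.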

We are now ready to construct the `Loader` that will feed the data to our model.

We begin by reading in the embeddings information.

In [13]:
embeddings = np.load('skus.npy')

We are now ready to define the `Loader`.

We are passing in an `EmbeddingOperator` that ensures the correct `sku` information (the correct `description_vector`) is associated with each step in the customer journey. The lookup key is the id contained in `product_sku_hash_list`.

When specifying the dataset, we are creating a `Merlin Dataset` based on the `train_transformed` data we saved above.

Depending on the hardware that you will be running this on and the size of the dataset that you will be using, should you run out of GPU memory, you can specify one of the several parameters that can ease the memory load (`npartitions`, `part_size`, or `part_mem_fraction`).

A `BATCH_SIZE` of 16 should work on a broad range of hardware. If you are training on a lot of data and your hardware permits, you might want to increase it significantly.

In [14]:
BATCH_SIZE = 16

from merlin.dataloader.tensorflow import Loader
from merlin.dataloader.ops.embeddings import EmbeddingOperator
import merlin.models.tf as mm

embedding_operator = EmbeddingOperator(
    embeddings[:, 1:].astype(np.float32),
    id_lookup_table=embeddings[:, 0].astype(int),
    lookup_key="product_sku_hash_list",
    embedding_name='product_embeddings'
)

loader = Loader(
    dataset=nvt.Dataset('train_transformed', engine='parquet'),
    batch_size=BATCH_SIZE,
    transforms=[
        embedding_operator
    ],
    shuffle=True
)

Through the `EmbeddingOperator` we pointed the `Loader` at our `embeddings` array and specified which column to use as the key for looking up each embedding.
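
Conceptually, a keyed embedding lookup gathers one row of the embedding matrix for every id in a session. The NumPy sketch below illustrates that idea only; it is not the `EmbeddingOperator` implementation, the `lookup` helper is hypothetical, and it assumes every id in the session is present in the id table.

```python
import numpy as np

# Hypothetical id table and embedding matrix (row i belongs to ids[i]).
ids = np.array([555, 105, 75])
vectors = np.arange(12, dtype=np.float32).reshape(3, 4)

# Sort the ids once so that np.searchsorted can locate them.
order = np.argsort(ids)
sorted_ids = ids[order]


def lookup(session_ids: np.ndarray) -> np.ndarray:
    """Return one embedding row per id, in session order.

    Assumes every id in `session_ids` exists in `ids`.
    """
    pos = np.searchsorted(sorted_ids, session_ids)
    return vectors[order[pos]]


# A session that visits product 75 twice and product 555 once.
session = np.array([75, 555, 75])
session_embeddings = lookup(session)

print(session_embeddings)
```

The result has one row per step of the session, which matches the shape of the per-step embedding feature the model receives alongside the categorical id lists.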

Below is an example batch of data that our model will consume.

In [15]:
batch = mm.sample_batch(loader, batch_size=BATCH_SIZE, include_targets=False, prepare_features=True)
batch

{'product_sku_hash_list': <tf.RaggedTensor [[[478],
   [627],
   [551],
   [41],
   [834],
   [491],
   [854],
   [87],
   [937],
   [619],
   [832],
   [496],
   [312],
   [954],
   [399],
   [405],
   [270],
   [706],
   [600],
   [175],
   [111],
   [839],
   [131],
   [436],
   [771],
   [316],
   [141],
   [820],
   [264],
   [730],
   [361],
   [168],
   [529],
   [578],
   [707],
   [821],
   [203],
   [816],
   [130],
   [263],
   [215],
   [906],
   [703],
   [440],
   [20],
   [589],
   [731],
   [605],
   [120],
   [499],
   [47],
   [153],
   [587],
   [391],
   [946],
   [527],
   [730],
   [21],
   [946],
   [304],
   [464],
   [121],
   [386],
   [422],
   [665],
   [804],
   [533],
   [447],
   [463],
   [945],
   [631],
   [428],
   [725],
   [66],
   [180],
   [720],
   [391],
   [157],
   [155],
   [448],
   [169],
   [473],
   [180],
   [211],
   [902],
   [692],
   [366],
   [925],
   [348],
   [151],
   [227],
   [666],
   [862],
   [864],
   [824],
   [188],
   [

`product_embeddings` are included in the batch.

In [16]:
batch.keys()

dict_keys(['product_sku_hash_list', 'event_type_list', 'product_action_list', 'hashed_url_list', 'product_embeddings'])

## Creating and training the model

We are now ready to construct our model.

In [17]:
import merlin.models.tf as mm

input_block = mm.InputBlockV2(
    loader.output_schema,
    embeddings=mm.Embeddings(
        loader.output_schema.select_by_tag(Tags.CATEGORICAL),
        sequence_combiner=None,
    ),
    pretrained_embeddings=mm.PretrainedEmbeddings(
        loader.output_schema.select_by_tag(Tags.EMBEDDING),
        sequence_combiner=None,
        normalizer="l2-norm",
        output_dims={"product_embeddings": 64},
    )
)

We have now constructed an `input_block` that will take our batch and transform it in a fashion that will make it amenable for further processing by subsequent layers of our model.
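
The `normalizer="l2-norm"` and `output_dims` arguments suggest two operations on the pretrained vectors: scaling each vector to unit length and projecting it to 64 dimensions. The NumPy sketch below illustrates both on random data. It is an assumption-laden illustration, not the Merlin Models implementation: the order of the two steps, the random projection weights, and the epsilon guard are all choices made for this example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical pretrained vectors for a 3-step session, 50 dimensions each.
pretrained = rng.normal(size=(3, 50)).astype(np.float32)

# L2-normalize each vector so that it has unit length.
# The small epsilon guards against division by zero for all-zero vectors.
norms = np.linalg.norm(pretrained, axis=-1, keepdims=True)
normalized = pretrained / np.maximum(norms, 1e-12)

# Project from 50 to 64 dimensions with a weight matrix.
# In a real model these weights would be learned; here they are random.
projection = rng.normal(scale=0.05, size=(50, 64)).astype(np.float32)
projected = normalized @ projection

print(np.linalg.norm(normalized, axis=-1))
print(projected.shape)
```

Normalizing puts all pretrained vectors on the same scale, and projecting gives them a width that can be combined with the other input features.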

To test that everything has worked, we can pass our example `batch` through the `input_block`.

In [18]:
input_batch = input_block(batch)
input_batch

<tf.RaggedTensor [[[0.040083896, 0.042916153, -0.0039782748, ..., 0.027303409, -0.038241126,
   0.019656722],
  [0.040083896, 0.042916153, -0.0039782748, ..., -0.042541612,
   -0.0053907745, -0.040022336],
  [0.040083896, 0.042916153, -0.0039782748, ..., 0.041293588, 0.023040716,
   0.0042234547],
  ...,
  [0.040083896, 0.042916153, -0.0039782748, ..., 0.028160218, -0.020016467,
   -0.021396875],
  [0.040083896, 0.042916153, -0.0039782748, ..., -0.021709753,
   -0.006258916, 0.030134965],
  [-0.04305446, -0.047425237, 0.045270134, ..., -0.019296885, 0.026841488,
   0.047015373]]                                                           ,
 [[0.040083896, 0.042916153, -0.0039782748, 0.037083637, -0.046425987,
   -0.029759957, -0.035465933, 0.012391172, 0.021638181, 0.045441154,
   0.021133985, 0.02455667, 0.04394945, -0.020857288, 0.04652107,
   0.047459994, 0.04200275, -0.0031253807, -0.016242288, 0.020285297,
   0.03347284, 0.0075957775, -0.0017552488, 0.042558257, 0.02921087,
   0.030

Let us now construct the remaining layers of our model.

In [19]:
target = 'hashed_url_list'

# We do not need the `train_transformed` dataset here, but we do need
# to access the schema.
# It contains important information that will help our model construct itself.
schema = nvt.Dataset('train_transformed', engine='parquet').schema

dmodel = 64
mlp_block = mm.MLPBlock(
    [128, dmodel],
    activation='relu',
    no_activation_last_layer=True,
)
transformer_block = mm.XLNetBlock(d_model=dmodel, n_head=4, n_layer=2)
model = mm.Model(
    input_block,
    mlp_block,
    transformer_block,
    mm.CategoricalOutput(
        schema.select_by_name(target),
        default_loss="categorical_crossentropy",
    ),
)

And let us train it.

In [20]:
model.compile(run_eagerly=False, optimizer='adam', loss="categorical_crossentropy")
model.fit(
    loader,
    batch_size=BATCH_SIZE,
    epochs=NUM_EPOCHS,
    pre=mm.SequenceMaskRandom(
        schema=loader.output_schema,
        target=target,
        masking_prob=0.3,
        transformer=transformer_block,
    ),
)

Epoch 1/5


Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


<keras.callbacks.History at 0x7f81b81b1910>
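
The training call above passes `mm.SequenceMaskRandom` with `masking_prob=0.3`. The general idea of masked language modeling is to hide a random subset of positions in each sequence and ask the model to predict the hidden items. The NumPy sketch below illustrates that idea on a single session of ids. It is not Merlin's implementation: `MASK_ID`, the placeholder replacement, and the rule that at least one position is masked are assumptions of this sketch.

```python
import numpy as np

rng = np.random.default_rng(42)

MASK_ID = 0          # hypothetical id reserved for masked positions
MASKING_PROB = 0.3   # same probability as in the training call

# One session of categorified url ids (values are illustrative).
session = np.array([483, 75, 900, 346, 57, 997])

# Decide independently for each position whether it is masked.
mask = rng.random(session.shape[0]) < MASKING_PROB

# Assumption of this sketch: always mask at least one position,
# otherwise the session would contribute nothing to the loss.
if not mask.any():
    mask[rng.integers(session.shape[0])] = True

# The model sees the placeholder at masked positions ...
inputs = np.where(mask, MASK_ID, session)
# ... and is trained to recover the original ids there.
targets = session[mask]

print(mask)
print(inputs)
print(targets)
```

Only the masked positions contribute targets, so the model has to use the surrounding, unmasked steps of the session as context.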

## Serving predictions

Now that we have prepared a workflow for processing our data (`wf`), defined the embedding operator (`embedding_operator`) and trained our model (`model`), we have all the components we need to serve our model using the Triton Inference Server (TIS).

Let us define a set of inference operators (a pipeline for processing our data all the way to obtaining predictions) and export them as an ensemble that we will be able to serve using TIS.

In [21]:
from merlin.systems.dag.ops.tensorflow import PredictTensorflow
from merlin.systems.dag.ensemble import Ensemble
from merlin.systems.dag.ops.workflow import TransformWorkflow

In [22]:
inference_operators = wf.input_schema.column_names >> TransformWorkflow(wf) >> embedding_operator >> PredictTensorflow(model)

In [23]:
ensemble = Ensemble(inference_operators, wf.input_schema)
ensemble.export(os.path.join(OUTPUT_DATA_DIR, 'ensemble'));

After we export the ensemble, we are ready to start the Triton Inference Server.

The server is installed in the Merlin TensorFlow and Merlin PyTorch containers. If you are not using one of our containers, ensure that it is installed in your environment. For more information, see the Triton Inference Server [documentation](https://github.com/triton-inference-server/server/blob/r22.03/README.md#documentation).

You can start the server by running the following command:

```
tritonserver --model-repository={OUTPUT_DATA_DIR}/ensemble/
```

For the `--model-repository` argument, specify the same value that you passed as the `export_path` to the `ensemble.export` method.

After you run the `tritonserver` command, wait until your terminal shows messages like the following example:

I0414 18:29:50.741833 4067 grpc_server.cc:4421] Started GRPCInferenceService at 0.0.0.0:8001<br>
I0414 18:29:50.742197 4067 http_server.cc:3113] Started HTTPService at 0.0.0.0:8000<br>
I0414 18:29:50.783470 4067 http_server.cc:178] Started Metrics Service at 0.0.0.0:8002

Let us now package our data for inference. We will send 5 rows of data, which correspond to a single customer journey (session) through the online store. The data will first be processed by the `NVTabular` workflow and subsequently passed to our transformer model for prediction.

In [24]:
# obtaining five rows of data
df = train.head(5)
# making sure all the rows correspond to the same online session (have the same `session_id_hash`)
df['session_id_hash'] = df['session_id_hash'].iloc[0]

Let us now send the data to the Triton Inference Server for inference.

In [25]:
from merlin.systems.triton import convert_df_to_triton_input
import tritonclient.grpc as grpcclient

inputs = convert_df_to_triton_input(wf.input_schema, df)

with grpcclient.InferenceServerClient("localhost:8001") as client:
    response = client.infer('executor_model', inputs)

Let's parse the response.

In [26]:
predictions = response.as_numpy("hashed_url_list/categorical_output")
predictions

array([[-2.3481152 , -2.2816658 , -2.2346332 , ..., -0.48081234,
        -0.5075794 , -0.4446747 ]], dtype=float32)

The response contains logits, one per url id. The highest logit identifies the url the customer is most likely to arrive at as the next step of their journey through the online store.

Here is the predicted hashed url id:

In [27]:
predicted_hashed_url_id = predictions.argmax()
predicted_hashed_url_id

80
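
Taking the `argmax` returns only the single most likely id. If you want probabilities or a short ranked list instead, the logits can be post-processed with a softmax and a top-k selection. The sketch below uses a small invented `logits` array with the same two-dimensional layout as `predictions`; the values and the choice of `K` are hypothetical.

```python
import numpy as np

# Hypothetical logits for one session over five url ids.
logits = np.array([[-2.3, -2.2, 0.4, -0.5, 1.1]], dtype=np.float32)

# Numerically stable softmax: subtract the row maximum before exponentiating.
shifted = logits - logits.max(axis=-1, keepdims=True)
exp = np.exp(shifted)
probs = exp / exp.sum(axis=-1, keepdims=True)

# Ids of the K highest-scoring urls, best first.
K = 3
top_k_ids = np.argsort(-logits, axis=-1)[:, :K]

print(probs.round(3))
print(top_k_ids)
```

The softmax does not change the ranking, so the first entry of `top_k_ids` is the same id that `argmax` returns. The ids are the categorified values produced by the workflow, not the original hashed urls.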

## Summary

We have trained a Transformer model for the next-item prediction task using language model masking, with pretrained embeddings as an additional input, and served it with the Triton Inference Server.

For another session-based example that goes deeper into data preprocessing and that covers several advanced techniques (Weight Tying, Temperature Scaling) please see [Session-Based Next Item Prediction for Fashion E-Commerce](https://github.com/NVIDIA-Merlin/models/blob/t4rec_use_case/examples/usecases/ecommerce-session-based-next-item-prediction-for-fashion.ipynb). 