In [1]:
# %%bash

# cd /models && git checkout main && git pull && pip install .
# cd /nvtabular && git checkout main && git pull && pip install .
# cd /core && git checkout main && git pull && pip install .
# cd /systems && git checkout main && git pull && pip install .
# cd /dataloader && git checkout main && git pull && pip install .
# cd /workspace

In [2]:
# Copyright 2022 NVIDIA Corporation. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================

# Each user is responsible for checking the content of datasets and the
# applicable licenses and determining if suitable for the intended use.

<img src="https://developer.download.nvidia.com/notebooks/dlsw-notebooks/merlin_models-transformers-net-item-prediction/nvidia_logo.png" style="width: 90px; float: right;">

# Transformer-based architecture for next-item prediction task

## Overview

In this use case, we will train a Transformer-based architecture for the next-item prediction task.

**Note, the data for this notebook will be automatically downloaded to the folder specified in the cells below.**

We will use the [booking.com dataset](https://github.com/bookingcom/ml-dataset-mdt) to train a session-based model. The dataset contains 1,166,835 anonymized hotel reservations in the train set and 378,667 in the test set. Each reservation is part of a customer's trip (identified by `utrip_id`), which includes consecutive reservations.

We will reshape the data to organize it into 'sessions'. Each session will be a full customer itinerary in chronological order. The goal will be to predict the `city_id` of the final reservation of each trip.
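
To make the idea of a session concrete, here is a minimal pure-Python sketch of the reshaping, independent of the GPU pipeline used later in this notebook. The trip names, dates, and city ids are made up for illustration, and `build_sessions` is a hypothetical helper, not part of any Merlin library.

```python
from collections import defaultdict

# Toy reservations as (utrip_id, checkin, city_id); all values are illustrative.
reservations = [
    ("trip_a", "2016-08-14", 20),
    ("trip_b", "2016-04-09", 70),
    ("trip_a", "2016-08-13", 10),
    ("trip_a", "2016-08-16", 30),
    ("trip_b", "2016-04-11", 80),
]


def build_sessions(rows):
    """Group rows by trip and order each trip's cities by check-in date."""
    by_trip = defaultdict(list)
    for utrip_id, checkin, city_id in rows:
        by_trip[utrip_id].append((checkin, city_id))
    # ISO-formatted dates sort chronologically when compared as strings.
    return {trip: [city for _, city in sorted(visits)] for trip, visits in by_trip.items()}


sessions = build_sessions(reservations)

# The model sees all cities but the last; the last city of the trip is the target.
examples = {trip: (cities[:-1], cities[-1]) for trip, cities in sessions.items()}
```

Each value in `sessions` is one itinerary in chronological order, and `examples` pairs the visited prefix of a trip with the final city we want to predict.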


### Learning objectives

- Training a Transformer-based architecture for the next-item prediction task

## Downloading and preparing the dataset

We will download the dataset using functionality provided by Merlin Models. The dataset can be found on GitHub [here](https://github.com/bookingcom/ml-dataset-mdt).

**Read more about libraries used in the import statements below**

- [get_lib](https://github.com/NVIDIA-Merlin/core/blob/main/merlin/core/dispatch.py)
- [get_booking](https://github.com/NVIDIA-Merlin/models/tree/main/merlin/datasets/ecommerce)
- [nvtabular](https://github.com/NVIDIA-Merlin/NVTabular/tree/main/nvtabular)
- [nvtabular ops](https://github.com/NVIDIA-Merlin/NVTabular/tree/main/nvtabular/ops)
- [schema tags](https://github.com/NVIDIA-Merlin/core/blob/main/merlin/schema/tags.py)
- [merlin models tensorflow](https://github.com/NVIDIA-Merlin/models/tree/main/merlin/models/tf)
- [get_booking](https://github.com/NVIDIA-Merlin/models/blob/main/merlin/datasets/ecommerce/booking/dataset.py)

In [3]:
# Resetting the TF memory allocation to not be 50% by default. 
import os
os.environ["TF_GPU_ALLOCATOR"]="cuda_malloc_async"

from merlin.core.dispatch import get_lib
from merlin.datasets.ecommerce import get_booking

import numpy as np
import cudf

from nvtabular import Dataset, Workflow
from nvtabular import ops

from merlin.schema.tags import Tags
import merlin.models.tf as mm

get_booking('/workspace/data')
train = get_lib().read_csv('/workspace/data/train_set.csv', parse_dates=['checkin', 'checkout'])

print('Training data import complete')

2023-04-04 00:44:09.226877: I tensorflow/core/platform/cpu_feature_guard.cc:194] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  SSE3 SSE4.1 SSE4.2 AVX
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.




2023-04-04 00:44:10.336814: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:996] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
  warn(f"PyTorch dtype mappings did not load successfully due to an error: {exc.msg}")

[INFO]: sparse_operation_kit is imported
[SOK INFO] Import /usr/local/lib/python3.8/dist-packages/merlin_sok-1.1.4-py3.8-linux-x86_64.egg/sparse_operation_kit/lib/libsok_experiment.so
[SOK INFO] Initialize finished, communication tool: horovod

Training data import complete




Each reservation belongs to a trip identified by `utrip_id`. During each trip a customer visits several destinations.

In [4]:
# When displaying cudf dataframes use print() or display(), otherwise Jupyter creates hidden copies.
print(train.head())

   user_id    checkin   checkout  city_id device_class  affiliate_id  \
0  1000027 2016-08-13 2016-08-14     8183      desktop          7168   
1  1000027 2016-08-14 2016-08-16    15626      desktop          7168   
2  1000027 2016-08-16 2016-08-18    60902      desktop          7168   
3  1000027 2016-08-18 2016-08-21    30628      desktop           253   
4  1000033 2016-04-09 2016-04-11    38677       mobile           359   

  booker_country hotel_country   utrip_id  
0        Elbonia        Gondal  1000027_1  
1        Elbonia        Gondal  1000027_1  
2        Elbonia        Gondal  1000027_1  
3        Elbonia        Gondal  1000027_1  
4         Gondal  Cobra Island  1000033_1  


We will train on sequences of `city_id`, and based on this information our model will attempt to predict the next `city_id` (the next hop in the journey).

We will train a transformer model that can work with sequences of variable length within a batch. This functionality is provided to us out of the box and doesn't require any changes to the architecture. Thanks to it we do not have to pad or trim our sequences to any particular length -- our model can make effective use of all of the data!

*With one exception.* For a masked language model that we will be training, we need to discard sequences that are shorter than two hops. This makes sense as there is nothing our model could learn if it was only presented with an itinerary with a single destination on it!

Let us begin by splitting the data into a train and validation set based on trip ID.

Let's see how many unique trips there are in the dataset. Also, let us shuffle the trips along the way so that our validation set consists of a random sample of our train set.

In [5]:
# Unique trip ids.
utrip_ids = train.sample(frac=1).utrip_id.unique()
print('Number of unique trips is :', len(utrip_ids))

Number of unique trips is : 217686


Now let's assign data to our train and validation sets. Furthermore, we sort the data by `utrip_id` and `checkin`. This way we ensure our sequences of visited `city_ids` will be in proper order!

In [6]:
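# Convert to pandas once, count the reservations per trip, and attach the count to every row.
train_pd = train.to_pandas()
trip_sizes = train_pd.groupby('utrip_id').size().rename('num_examples')
train = cudf.from_pandas(train_pd.join(trip_sizes, on='utrip_id'))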

In [7]:
train = train[train.num_examples > 1]

In [8]:
train.checkin = train.checkin.astype('int')
train.checkout = train.checkout.astype('int')

train_set_utrip_ids = utrip_ids[:int(0.8 * utrip_ids.shape[0])]
validation_set_utrip_ids = utrip_ids[int(0.8 * utrip_ids.shape[0]):]

train_set = train[train.utrip_id.isin(train_set_utrip_ids)].sort_values(['utrip_id', 'checkin'])
validation_set = train[train.utrip_id.isin(validation_set_utrip_ids)].sort_values(['utrip_id', 'checkin'])
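
The split above is done on trip IDs rather than on individual rows, so every reservation of a trip ends up on the same side. A small pure-Python sketch of that idea follows. The `seed` parameter and the helper name `split_trip_ids` are assumptions added for reproducibility of the example; the notebook's own shuffle (`train.sample(frac=1)`) is unseeded.

```python
import random


def split_trip_ids(trip_ids, train_fraction=0.8, seed=0):
    """Shuffle the unique trip IDs and split them into train and validation ID sets."""
    unique_ids = sorted(set(trip_ids))       # deterministic starting order
    random.Random(seed).shuffle(unique_ids)  # seeded shuffle (an assumption of this sketch)
    cut = int(train_fraction * len(unique_ids))
    return set(unique_ids[:cut]), set(unique_ids[cut:])


# Ten toy trips with three reservation rows each.
trip_ids = [f"trip_{i}" for i in range(10) for _ in range(3)]
train_ids, valid_ids = split_trip_ids(trip_ids)

# Rows are then assigned by membership, mirroring `utrip_id.isin(...)` above.
train_rows = [t for t in trip_ids if t in train_ids]
valid_rows = [t for t in trip_ids if t in valid_ids]
```

Because membership is decided per trip, no itinerary is cut in half between the train and validation sets.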

## Preprocessing with NVTabular

We can now begin with data preprocessing.

We will combine the reservations of each trip into a "session": one list of visited cities per trip, ordered by check-in date. Trips that are too short were already discarded above.

We will use NVTabular for this work. It offers optimized tabular data preprocessing operators that run on the GPU. If you would like to learn more about the NVTabular library, please take a look [here](https://github.com/NVIDIA-Merlin/NVTabular).

Read more about [Merlin's Dataset API](https://github.com/NVIDIA-Merlin/core/blob/main/merlin/io/dataset.py)  
Read more about how [parquet files are read in and processed by Merlin](https://github.com/NVIDIA-Merlin/core/blob/main/merlin/io/parquet.py)  
Read more about [Tags](https://github.com/NVIDIA-Merlin/core/blob/main/merlin/schema/tags.py)  
- [schema_select_by_tag](https://github.com/NVIDIA-Merlin/core/blob/main/merlin/schema/schema.py)  

Read more about [NVTabular Workflows](https://github.com/NVIDIA-Merlin/NVTabular/blob/main/nvtabular/workflow/workflow.py)  
- [fit_transform](https://github.com/NVIDIA-Merlin/NVTabular/blob/main/nvtabular/workflow/workflow.py)
- [transform](https://github.com/NVIDIA-Merlin/NVTabular/blob/main/nvtabular/workflow/workflow.py)  

Read more about the [NVTabular Operators](https://github.com/NVIDIA-Merlin/NVTabular/tree/main/nvtabular/ops)  
- [Categorify](https://github.com/NVIDIA-Merlin/NVTabular/blob/main/nvtabular/ops/categorify.py)
- [AddTags](https://github.com/NVIDIA-Merlin/NVTabular/blob/main/nvtabular/ops/add_metadata.py)
- [LambdaOp](https://github.com/NVIDIA-Merlin/NVTabular/blob/main/nvtabular/ops/lambdaop.py)
- [Rename](https://github.com/NVIDIA-Merlin/NVTabular/blob/main/nvtabular/ops/rename.py)
- [Filter](https://github.com/NVIDIA-Merlin/NVTabular/blob/main/nvtabular/ops/filter.py)



In [9]:
train_set_dataset = Dataset(train_set)
validation_set_dataset = Dataset(validation_set)

In [10]:
# Encode raw city_id values as integer categories.
categorical_features = ['city_id'] >> ops.Categorify(start_index=1)

# Collect each trip's encoded city ids into one list, ordered by check-in date.
groupby_features = categorical_features + ['utrip_id', 'checkin'] >> ops.Groupby(
    groupby_cols=['utrip_id'],
    aggs={
        'city_id': ['list'],
    },
    sort_cols="checkin"
)

list_features = groupby_features['city_id_list'] >> ops.AddTags([Tags.SEQUENCE])

# Trips with fewer than MINIMUM_SESSION_LENGTH reservations were already removed
# above (num_examples > 1), so no Filter op is applied in this workflow.
MINIMUM_SESSION_LENGTH = 2
features = list_features >> ops.AddTags([Tags.CATEGORICAL])
filtered_sessions = features

In [11]:
wf = Workflow(filtered_sessions)

In [12]:
train_set_processed = wf.fit_transform(train_set_dataset)
validation_set_processed = wf.transform(validation_set_dataset)

Our data now consists of a single column, `city_id_list`, which holds the sequence of visited `city_id` values for each trip (represented as integer categories).

In [13]:
train_set_processed.compute().head()

| | city_id_list |
|---|---|
| 0 | [8239, 157, 2279, 2098] |
| 1 | [64, 1161, 88, 619, 64] |
| 2 | [8, 7, 25, 1051, 66, 53, 4] |
| 3 | [1033, 758, 141, 4] |
| 4 | [3604, 263, 663, 251, 360] |


We are now ready to train our model.

Here is the schema of the data that our model will use.

In [14]:
seq_schema = train_set_processed.schema.select_by_tag(Tags.SEQUENCE)
seq_schema

| | name | tags | dtype | is_list | is_ragged | properties.num_buckets | properties.freq_threshold | properties.max_size | properties.start_index | properties.cat_path | properties.domain.min | properties.domain.max | properties.domain.name | properties.embedding_sizes.cardinality | properties.embedding_sizes.dimension | properties.value_count.min | properties.value_count.max |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | city_id_list | (Tags.CATEGORICAL, Tags.SEQUENCE) | `DType(name='int64', element_type=<ElementType....` | True | True | | 0 | 0 | 1 | .//categories/unique.city_id.parquet | 0 | 37203 | city_id | 37204 | 512 | 0 | |


Align the schema of the train and validation datasets with the model's schema.

In [15]:
train_set_processed.schema = seq_schema
validation_set_processed.schema = seq_schema

Let's also identify the target column.

In [16]:
target = train_set_processed.schema.select_by_tag(Tags.SEQUENCE).column_names[0]
target

'city_id_list'

## Constructing the model

Let's construct our model.

We can specify various hyperparameters, such as the number of heads and number of layers to use.

For the transformer portion of our model, we will use the `XLNet` architecture.

Later, when we run the `fit` method on our model, we will specify a `masking_prob` of `0.3` and link it to the transformer block defined in our model. With this combination of parameters, our model will train on sequences in which any given timestep is masked with a probability of 0.3, and the model's training task will be to infer the target value for that step!

To summarize, Masked Language Modeling is implemented by:

* `SequenceMaskRandom()` - Used as the `pre` argument of `model.fit()`, it randomly selects items from the sequence to be masked for prediction as targets, using Keras masking. This block also adds the necessary configuration to the specified `transformer` block so that it is pre-configured with the layers needed to prepare the inputs to the HuggingFace transformer layer and to post-process its outputs. For example, one pre-processing operation replaces the input embeddings at masked positions with a dummy trainable embedding, to avoid leaking the targets.
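
The random masking described above can be illustrated with a small pure-Python sketch. This is not the Merlin implementation: `mask_sequence` and the `MASK` placeholder are hypothetical, and the rule that at least one item is masked and at least one is left visible is an assumption of this sketch rather than something stated in this notebook.

```python
import random

MASK = 0  # hypothetical placeholder id standing in for the dummy trainable embedding


def mask_sequence(sequence, masking_prob=0.3, rng=None):
    """Return (inputs, targets); targets is None wherever no prediction is required."""
    if len(sequence) < 2:
        raise ValueError("a sequence needs at least two items to be masked")
    rng = rng or random.Random()
    masked = [rng.random() < masking_prob for _ in sequence]
    # Assumption of this sketch: keep at least one masked and one visible position.
    if not any(masked):
        masked[rng.randrange(len(sequence))] = True
    if all(masked):
        masked[rng.randrange(len(sequence))] = False
    # Masked positions are hidden from the model and become prediction targets.
    inputs = [MASK if m else item for item, m in zip(sequence, masked)]
    targets = [item if m else None for item, m in zip(sequence, masked)]
    return inputs, targets


sequence = [64, 1161, 88, 619, 64]
inputs, targets = mask_sequence(sequence, masking_prob=0.3, rng=random.Random(0))
```

Replacing the masked inputs with a placeholder is what prevents the model from simply copying the answer from its own input.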


**Read more about the APIs used to construct models**
- [blocks](https://github.com/NVIDIA-Merlin/models/tree/main/merlin/models/tf/blocks)
- [MLPBlock](https://github.com/NVIDIA-Merlin/models/blob/main/merlin/models/tf/blocks/mlp.py)
- [InputBlockV2](https://github.com/NVIDIA-Merlin/models/blob/main/merlin/models/tf/inputs/base.py)
- [Embeddings](https://github.com/NVIDIA-Merlin/models/blob/main/merlin/models/tf/inputs/embedding.py)
- [XLNetBlock](https://github.com/NVIDIA-Merlin/models/blob/main/merlin/models/tf/transformers/block.py)
- [CategoricalOutput](https://github.com/NVIDIA-Merlin/models/blob/main/merlin/models/tf/outputs/classification.py)
- [.schema.select_by_name](https://github.com/NVIDIA-Merlin/core/blob/main/merlin/schema/schema.py)
- [.schema.select_by_tag](https://github.com/NVIDIA-Merlin/core/blob/main/merlin/schema/schema.py)
- [model.compile()](https://github.com/NVIDIA-Merlin/models/blob/main/merlin/models/tf/models/base.py)
- [model.fit()](https://github.com/NVIDIA-Merlin/models/blob/main/merlin/models/tf/models/base.py)
- [model.evaluate()](https://github.com/NVIDIA-Merlin/models/blob/main/merlin/models/tf/models/base.py)
- [mm.SequenceMaskRandom](https://github.com/NVIDIA-Merlin/models/blob/main/merlin/models/tf/transforms/sequence.py)
- [mm.SequenceMaskLast](https://github.com/NVIDIA-Merlin/models/blob/main/merlin/models/tf/transforms/sequence.py)

In [17]:
dmodel = 48
mlp_block = mm.MLPBlock(
    [128, dmodel],
    activation='relu',
    no_activation_last_layer=True,
)
transformer_block = mm.XLNetBlock(d_model=dmodel, n_head=4, n_layer=2)
model = mm.Model(
    mm.InputBlockV2(
        seq_schema,
        embeddings=mm.Embeddings(
            train_set_processed.schema.select_by_tag(Tags.CATEGORICAL), sequence_combiner=None
        ),
    ),
    mlp_block,
    transformer_block,
    mm.CategoricalOutput(
        train_set_processed.schema.select_by_name(target),
        default_loss="categorical_crossentropy",
    ),
)
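
The hyperparameters above are linked: the last MLP layer is given the same width as the transformer's `d_model`, and in standard multi-head attention the model dimension is split evenly across the heads. A quick arithmetic sketch of those two relationships is shown below; `per_head_dim` is a hypothetical helper written for this example, not a Merlin function.

```python
def per_head_dim(d_model: int, n_head: int) -> int:
    """Return the width of each attention head, requiring an even split."""
    if d_model % n_head != 0:
        raise ValueError(f"d_model={d_model} is not divisible by n_head={n_head}")
    return d_model // n_head


mlp_dims = [128, 48]    # same layer sizes as the MLPBlock above
d_model, n_head = 48, 4

head_width = per_head_dim(d_model, n_head)  # 48 // 4 = 12

# The MLP output feeds the transformer, so its last width is kept equal to d_model.
widths_agree = mlp_dims[-1] == d_model
```

When changing `dmodel`, it is worth re-checking both conditions before training.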

## Model training

In [18]:
model.compile(run_eagerly=False, optimizer='adam', loss="categorical_crossentropy")
model.fit(
    train_set_processed,
    batch_size=64,
    epochs=1,
    pre=mm.SequenceMaskRandom(
        schema=seq_schema, target=target, masking_prob=0.3, transformer=transformer_block
    ),
)

2023-04-04 00:44:18.668090: I tensorflow/stream_executor/cuda/cuda_dnn.cc:424] Loaded cuDNN version 8700








2023-04-04 00:44:27.604085: W tensorflow/core/grappler/optimizers/loop_optimizer.cc:907] Skipping loop optimization for Merge node with control input: model/xl_net_block/sequential_block_5/replace_masked_embeddings/RaggedWhere/Assert/AssertGuard/branch_executed/_31




<keras.callbacks.History at 0x7fc794ee8ee0>

In [19]:
from merlin.systems.dag.ops.tensorflow import PredictTensorflow
from merlin.systems.dag.ensemble import Ensemble
from merlin.systems.dag.ops.workflow import TransformWorkflow

inf_ops = wf.input_schema.column_names >> TransformWorkflow(wf) >> PredictTensorflow(model)

ensemble = Ensemble(inf_ops, wf.input_schema)
ensemble.export('/workspace/models_for_benchmarking');

  (_feature_shapes): Dict(
    (city_id_list): TensorShape([64, None, 1])
  )
  (_feature_dtypes): Dict(
    (city_id_list): tf.int64
  )
), because it is not built.
INFO:tensorflow:Assets written to: /tmp/tmpw3pw0y1l/model.savedmodel/assets
  config[key] = tf.keras.utils.serialize_keras_object(maybe_value)
  config[i] = tf.keras.utils.serialize_keras_object(layer)
  return generic_utils.serialize_keras_object(obj)

INFO:tensorflow:Assets written to: /workspace/models_for_benchmarking/0_predicttensorflowtriton/1/model.savedmodel/assets

In [20]:
import nvtabular.inference.triton as nvt_triton
import tritonclient.grpc as grpcclient
import subprocess

subprocess.Popen(['tritonserver', '--model-repository=/workspace/models_for_benchmarking/'])

<subprocess.Popen at 0x7fc4cd6dd9d0>

I0404 00:46:38.812106 643 pinned_memory_manager.cc:240] Pinned memory pool is created at '0x7f618a000000' with size 268435456
I0404 00:46:38.813393 643 cuda_memory_manager.cc:105] CUDA memory pool is created on device 0 with size 67108864
I0404 00:46:38.816888 643 model_lifecycle.cc:459] loading: 0_predicttensorflowtriton:1
I0404 00:46:38.816907 643 model_lifecycle.cc:459] loading: executor_model:1
I0404 00:46:39.177492 643 tensorflow.cc:2536] TRITONBACKEND_Initialize: tensorflow
I0404 00:46:39.177677 643 tensorflow.cc:2546] Triton TRITONBACKEND API version: 1.10
I0404 00:46:39.177681 643 tensorflow.cc:2552] 'tensorflow' TRITONBACKEND API version: 1.10
I0404 00:46:39.177683 643 tensorflow.cc:2576] backend configuration:
{"cmdline":{"auto-complete-config":"true","min-compute-capability":"6.000000","backend-directory":"/opt/tritonserver/backends","default-max-batch-size":"4"}}
I0404 00:46:39.182139 643 tensorflow.cc:2642] TRITONBACKEND_ModelInitialize: 0_predicttensorflowtriton (version 

2023-04-04 00:46:48.147818: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:996] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-04-04 00:46:48.148202: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:996] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-04-04 00:46:48.148385: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:996] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
I0404 00:46:48.660257 643 model_lifecycle.cc:694] successfully loaded 'executor_model' version 1
I0404 00:46:48.667171 643 server.cc:563] 
+------------------+------+
| Repository Agent | Path |
+------------------+------+
+------------------+------+

I0404 00:46:48.667219 643 server.cc:590] 
+------------+--------------------------------

In [21]:
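import tritonclient.http as httpclient

# `tritonhttpclient` is a deprecated alias; `tritonclient.http` is the supported module.
try:
    triton_client = httpclient.InferenceServerClient(url="localhost:8000", verbose=True)
    print("client created.")
except Exception as e:
    print("channel creation failed: " + str(e))
    raise  # without a client, the liveness check below cannot run
triton_client.is_server_live()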

client created.
GET /v2/health/live, headers None
<HTTPSocketPoolResponse status=200 headers={'content-length': '0', 'content-type': 'text/plain'}>




True

In [22]:
validation_data = validation_set_dataset.compute()
validation_data = validation_data[['city_id', 'checkin', 'utrip_id']]

In [23]:
validation_data.columns

Index(['city_id', 'checkin', 'utrip_id'], dtype='object')

In [24]:
from merlin.systems.triton import convert_df_to_triton_input

In [25]:
inputs = convert_df_to_triton_input(
    validation_set_dataset.schema.select_by_name(['city_id', 'checkin', 'utrip_id']),
    validation_data.iloc[:10],
)

In [26]:
with grpcclient.InferenceServerClient("localhost:8001") as client:
    response = client.infer('executor_model', inputs)

  warn(f"PyTorch dtype mappings did not load successfully due to an error: {exc.msg}")
Failed to transform operator <merlin.systems.dag.runtimes.triton.ops.tensorflow.PredictTensorflowTriton object at 0x7f85c98b3a30>
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/merlin/systems/triton/conversions.py", line 164, in triton_response_to_tensor_table
    values = _array_from_triton_tensor(response, f"{out_col_name}__values")
  File "/usr/local/lib/python3.8/dist-packages/merlin/systems/triton/conversions.py", line 201, in _array_from_triton_tensor
    raise ValueError(f"Column {name} not found in {type(triton_obj)}")
ValueError: Column city_id_list/categorical_output__values not found in <class 'c_python_backend_utils.InferenceResponse'>

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/merlin/dag/executors.py", line 183, in _transform_data
    output_data

InferenceServerException: [StatusCode.INTERNAL] Column city_id_list/categorical_output not found in <class 'c_python_backend_utils.InferenceResponse'>

In [24]:
!pkill triton

Signal (15) received.


I0331 04:09:46.862105 643 server.cc:264] Waiting for in-flight requests to complete.
I0331 04:09:46.862120 643 server.cc:280] Timeout 30: Found 0 model versions that have in-flight inferences
I0331 04:09:46.862226 643 server.cc:295] All models are stopped, unloading models
I0331 04:09:46.862232 643 server.cc:302] Timeout 30: Found 3 live models and 0 in-flight non-inference requests
I0331 04:09:46.862354 643 tensorflow.cc:2729] TRITONBACKEND_ModelInstanceFinalize: delete instance state
I0331 04:09:46.862435 643 tensorflow.cc:2668] TRITONBACKEND_ModelFinalize: delete model state
I0331 04:09:46.930082 643 model_lifecycle.cc:579] successfully unloaded '1_predicttensorflowtriton' version 1
I0331 04:09:47.862379 643 server.cc:302] Timeout 29: Found 2 live models and 0 in-flight non-inference requests
  warn(f"PyTorch dtype mappings did not load successfully due to an error: {exc.msg}")
  warn(f"PyTorch dtype mappings did not load successfully due to an error: {exc.msg}")
I0331 04:09:48.6330

In [25]:
cat /workspace/models_for_benchmarking/executor_model/config.pbtxt

name: "executor_model"
platform: "merlin_executor"
input {
  name: "city_id"
  data_type: TYPE_INT64
  dims: -1
}
input {
  name: "checkin"
  data_type: TYPE_INT64
  dims: -1
}
input {
  name: "utrip_id"
  data_type: TYPE_STRING
  dims: -1
}
output {
  name: "city_id_list/categorical_output"
  data_type: TYPE_FP32
  dims: -1
  dims: 37204
}
backend: "python"


In [26]:
%%writefile /workspace/models_for_benchmarking/executor_model/config.pbtxt

name: "executor_model"
platform: "merlin_executor"
input {
  name: "city_id"
  data_type: TYPE_INT64
  dims: -1
  dims: 1
}
input {
  name: "checkin"
  data_type: TYPE_INT64
  dims: -1
  dims: 1
}
input {
  name: "utrip_id"
  data_type: TYPE_STRING
  dims: -1
  dims: 1
}
output {
  name: "city_id_list/categorical_output"
  data_type: TYPE_FP32
  dims: -1
  dims: 37204
}
backend: "python"

Overwriting /workspace/models_for_benchmarking/executor_model/config.pbtxt
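
The only change in the rewritten config is that each input gains a second dimension (`dims: -1` becomes `dims: -1` followed by `dims: 1`). In a Triton model configuration, `-1` denotes a variable-size dimension. The sketch below illustrates what that difference means for an incoming tensor shape, assuming the listed dims describe the full shape (neither config sets `max_batch_size`). `shape_matches` is a hypothetical helper, and the 10-row shapes are illustrative rather than taken from the request above.

```python
def shape_matches(dims, shape):
    """Check a concrete tensor shape against config dims, where -1 matches any size."""
    return len(dims) == len(shape) and all(d == -1 or d == s for d, s in zip(dims, shape))


original_dims = [-1]    # 'dims: -1' in the exported config
edited_dims = [-1, 1]   # 'dims: -1' and 'dims: 1' in the rewritten config

flat_column = (10,)     # a 1-D column with 10 rows
column_vector = (10, 1) # the same 10 rows as a 2-D column vector

checks = {
    "original accepts flat": shape_matches(original_dims, flat_column),
    "original accepts 2-D": shape_matches(original_dims, column_vector),
    "edited accepts flat": shape_matches(edited_dims, flat_column),
    "edited accepts 2-D": shape_matches(edited_dims, column_vector),
}
```

Under these assumptions the original config describes 1-D inputs and the edited config describes 2-D column vectors, so the client and the config need to agree on which layout is sent.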


In [27]:
subprocess.Popen(['tritonserver', '--model-repository=/workspace/models_for_benchmarking/'])

<subprocess.Popen at 0x7fc669ea8b80>

I0331 04:10:21.241112 1140 pinned_memory_manager.cc:240] Pinned memory pool is created at '0x7f0136000000' with size 268435456
I0331 04:10:21.241478 1140 cuda_memory_manager.cc:105] CUDA memory pool is created on device 0 with size 67108864
I0331 04:10:21.243629 1140 model_lifecycle.cc:459] loading: 0_transformworkflowtriton:1
I0331 04:10:21.243663 1140 model_lifecycle.cc:459] loading: 1_predicttensorflowtriton:1
I0331 04:10:21.243687 1140 model_lifecycle.cc:459] loading: executor_model:1
I0331 04:10:21.426653 1140 tensorflow.cc:2536] TRITONBACKEND_Initialize: tensorflow
I0331 04:10:21.426673 1140 tensorflow.cc:2546] Triton TRITONBACKEND API version: 1.10
I0331 04:10:21.426678 1140 tensorflow.cc:2552] 'tensorflow' TRITONBACKEND API version: 1.10
I0331 04:10:21.426680 1140 tensorflow.cc:2576] backend configuration:
{"cmdline":{"auto-complete-config":"true","min-compute-capability":"6.000000","backend-directory":"/opt/tritonserver/backends","default-max-batch-size":"4"}}
2023-03-31 04:10

2023-03-31 04:10:35.325589: I tensorflow/cc/saved_model/loader.cc:215] Running initialization op on SavedModel bundle at path: /workspace/models_for_benchmarking/1_predicttensorflowtriton/1/model.savedmodel
2023-03-31 04:10:35.439054: I tensorflow/cc/saved_model/loader.cc:325] SavedModel load for tags { serve }; Status: success: OK. Took 427522 microseconds.
I0331 04:10:35.439161 1140 python_be.cc:1856] TRITONBACKEND_ModelInstanceInitialize: executor_model (GPU device 0)
I0331 04:10:35.439379 1140 model_lifecycle.cc:694] successfully loaded '1_predicttensorflowtriton' version 1
2023-03-31 04:10:36.712996: I tensorflow/core/platform/cpu_feature_guard.cc:194] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  SSE3 SSE4.1 SSE4.2 AVX
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-03-31 04:10:38.035153: I tensorflow/stream_executor/cu

In [28]:
with grpcclient.InferenceServerClient("localhost:8001") as client:
    response = client.infer('executor_model', inputs)

  warn(f"PyTorch dtype mappings did not load successfully due to an error: {exc.msg}")
Failed to transform operator <merlin.systems.dag.runtimes.triton.ops.workflow.TransformWorkflowTriton object at 0x7f074ca31280>
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/merlin/dag/executors.py", line 183, in _transform_data
    output_data = node.op.transform(selection, input_data)
  File "/usr/local/lib/python3.8/dist-packages/merlin/systems/dag/runtimes/triton/ops/workflow.py", line 97, in transform
    raise tritonclient.utils.InferenceServerException(
tritonclient.utils.InferenceServerException: unexpected inference output 'city_id_list' for model '0_transformworkflowtriton'


InferenceServerException: [StatusCode.INTERNAL] unexpected inference output 'city_id_list' for model '0_transformworkflowtriton'

In [31]:
cat /workspace/models_for_benchmarking/0_transformworkflowtriton/config.pbtxt

name: "0_transformworkflowtriton"
input {
  name: "city_id"
  data_type: TYPE_INT64
  dims: -1
}
input {
  name: "checkin"
  data_type: TYPE_INT64
  dims: -1
}
input {
  name: "utrip_id"
  data_type: TYPE_STRING
  dims: -1
}
output {
  name: "city_id_list__values"
  data_type: TYPE_INT64
  dims: -1
  dims: -1
}
output {
  name: "city_id_list__offsets"
  data_type: TYPE_INT32
  dims: -1
  dims: -1
}
parameters {
  key: "cats"
  value {
  }
}
parameters {
  key: "conts"
  value {
  }
}
parameters {
  key: "output_model"
  value {
  }
}
parameters {
  key: "python_module"
  value {
    string_value: "merlin.systems.triton.models.workflow_model"
  }
}
backend: "python"
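
Note that this config declares two outputs for the list column, `city_id_list__values` and `city_id_list__offsets`, rather than a single `city_id_list`. Assuming these follow the usual flattened-values-plus-offsets layout for ragged lists, the pure-Python sketch below shows how a batch of variable-length sessions maps to that pair and back. The helper names are hypothetical and the sample rows are taken from the preprocessed output shown earlier.

```python
def to_values_and_offsets(rows):
    """Flatten ragged rows into one values list plus the start/end offsets of each row."""
    values, offsets = [], [0]
    for row in rows:
        values.extend(row)
        offsets.append(len(values))
    return values, offsets


def from_values_and_offsets(values, offsets):
    """Rebuild the ragged rows from the flattened representation."""
    return [values[start:end] for start, end in zip(offsets, offsets[1:])]


rows = [[8239, 157, 2279, 2098], [64, 1161, 88, 619, 64]]
values, offsets = to_values_and_offsets(rows)
# values  == [8239, 157, 2279, 2098, 64, 1161, 88, 619, 64]
# offsets == [0, 4, 9]

restored = from_values_and_offsets(values, offsets)
```

Row `i` occupies `values[offsets[i]:offsets[i + 1]]`, which lets sequences of different lengths travel in one flat tensor without padding.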
