In [1]:
# Copyright (c) 2022, NVIDIA CORPORATION.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

<img src="https://developer.download.nvidia.com/notebooks/dlsw-notebooks/merlin_merlin_getting-started-movielens-01-download-convert/nvidia_logo.png" style="width: 90px; float: right;">

# Training and Serving Merlin on AWS SageMaker

This notebook is created using the latest stable [merlin-tensorflow](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/merlin/containers/merlin-tensorflow/tags) container.
Note that AWS libraries in this notebook require AWS credentials, and if you are running this notebook in a container, you might need to restart the container with the AWS credentials mounted, e.g., `-v $HOME/.aws:$HOME/.aws`.


With AWS SageMaker, you can package your own models, which can then be trained and deployed in the SageMaker environment. This notebook shows you how to use Merlin for training and inference in the SageMaker environment.

It assumes that readers are familiar with some basic concepts in NVIDIA Merlin,
such as:

- Using NVTabular to GPU-accelerate preprocessing and feature engineering,
- Training a ranking model using Merlin Models, and
- Inference with the Triton Inference Server and Merlin Models for TensorFlow.

To learn more about these concepts in NVIDIA Merlin, see for example
[Deploying a Multi-Stage Recommender System](../Building-and-deploying-multi-stage-RecSys/README.md)
in this repository or example notebooks in
[Merlin Models](https://github.com/NVIDIA-Merlin/models/tree/main/examples).

To run this notebook, you need to have [Amazon SageMaker Python SDK](https://sagemaker.readthedocs.io/en/stable/) installed.

In [2]:
! python -m pip install sagemaker

Collecting sagemaker
  Downloading sagemaker-2.116.0.tar.gz (592 kB)
Collecting importlib-metadata<5.0,>=1.4.0
  Downloading importlib_metadata-4.13.0-py3-none-any.whl (23 kB)
Collecting pathos
  Downloading pathos-0.3.0-py3-none-any.whl (79 kB)
Collecting protobuf3-to-dict<1.0,>=0.1.5
  Downloading protobuf3-to-dict-0.1.5.tar.gz (3.5 kB)
Collecting schema
  Downloading schema-0.7.5-py2.py3-none-any.whl (17 kB)
Collecting smdebug_rulesconfig==1.0.1
  Downloading smdebug_rulesconfig-1.0.1-py2.py3-none-any.whl (20 kB)
Collecting pox>=0.3.2
  Downloading pox-0.3.2-py3-none-any.whl (29 kB)
Collecting dill>=0.3.6
  Downloading dill-0.3.6-py3-none-any.whl (110 kB)
Collecting multiprocess>=0.70.14
  Downloading multiprocess-0.70.14-py38-none-any.whl (132 kB)

## Part 1: Generating Dataset and Docker image

### Generating Dataset

In this notebook, we build our recommender system ranking models on synthetic train and validation datasets that mimic the real [Ali-CCP](https://tianchi.aliyun.com/dataset/dataDetail?dataId=408#1) (Alibaba Click and Conversion Prediction) dataset. Ali-CCP is a dataset gathered from real-world traffic logs of the recommender system in Taobao, the largest online retail platform in the world.

If you would like to use the real Ali-CCP dataset instead, you can download the training and test datasets from [tianchi.aliyun.com](https://tianchi.aliyun.com/dataset/dataDetail?dataId=408#1). You can then use the [get_aliccp()](https://github.com/NVIDIA-Merlin/models/blob/main/merlin/datasets/ecommerce/aliccp/dataset.py#L43) function to curate the raw CSV files and save them as Parquet files.

In [3]:
import os

from merlin.datasets.synthetic import generate_data

DATA_FOLDER = os.environ.get("DATA_FOLDER", "/workspace/data/")
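# Environment variables are strings, so convert them explicitly.
NUM_ROWS = int(os.environ.get("NUM_ROWS", 1_000_000))
# Compare against known truthy strings instead of calling eval() on the value.
SYNTHETIC_DATA = os.environ.get("SYNTHETIC_DATA", "True").strip().lower() in ("true", "1")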
BATCH_SIZE = int(os.environ.get("BATCH_SIZE", 512))

if SYNTHETIC_DATA:
    train, valid = generate_data("aliccp-raw", int(NUM_ROWS), set_sizes=(0.7, 0.3))
    # save the datasets as parquet files
    train.to_ddf().to_parquet(os.path.join(DATA_FOLDER, "train"))
    valid.to_ddf().to_parquet(os.path.join(DATA_FOLDER, "valid"))



### Training Script

The training script [train.py](./train.py) in this example starts with the synthetic dataset we created in the previous cell and produces a ranking model by performing the following tasks:

- Perform feature engineering and preprocessing with [NVTabular](https://github.com/NVIDIA-Merlin/NVTabular). NVTabular implements common feature engineering and preprocessing operators in easy-to-use, high-level APIs.
- Use [Merlin Models](https://github.com/NVIDIA-Merlin/models/) to train [Facebook's DLRM model](https://arxiv.org/pdf/1906.00091.pdf) in TensorFlow.
- Prepare [ensemble models](https://github.com/triton-inference-server/server/blob/main/docs/user_guide/architecture.md#ensemble-models) for serving on [Triton Inference Server](https://github.com/triton-inference-server/server).

The training script writes the final NVTabular workflow and the trained DLRM model to `model_dir` as an ensemble model. Make sure that your script generates all of its artifacts within `model_dir`, since SageMaker packages the files in this directory into a compressed tar archive and makes the archive available in S3. The ensemble model that is uploaded to S3 is used later in this notebook to handle predictions in Triton Inference Server.
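
To make the packaging step more concrete, the following is a minimal sketch of how a model directory can be packed into a `model.tar.gz` archive with Python's standard `tarfile` module. It only illustrates the idea and is not SageMaker's actual implementation; the helper names `pack_model_dir` and `list_archive` and the temporary directory that stands in for `model_dir` are assumptions made for this example.

```python
import os
import tarfile
import tempfile


def pack_model_dir(model_dir, archive_path):
    """Pack every entry under model_dir into a gzip-compressed tar archive."""
    with tarfile.open(archive_path, "w:gz") as tar:
        for name in sorted(os.listdir(model_dir)):
            # arcname keeps member paths relative to model_dir, so the archive
            # unpacks directly to ensemble_model/ and its sibling directories.
            tar.add(os.path.join(model_dir, name), arcname=name)
    return archive_path


def list_archive(archive_path):
    """Return the sorted member names stored in the archive."""
    with tarfile.open(archive_path, "r:gz") as tar:
        return sorted(tar.getnames())


# Hypothetical stand-in for the model directory with one exported artifact.
demo_model_dir = tempfile.mkdtemp()
os.makedirs(os.path.join(demo_model_dir, "ensemble_model", "1"))
with open(os.path.join(demo_model_dir, "ensemble_model", "config.pbtxt"), "w") as f:
    f.write('name: "ensemble_model"\n')

demo_archive = pack_model_dir(
    demo_model_dir, os.path.join(tempfile.mkdtemp(), "model.tar.gz")
)
print(list_archive(demo_archive))
```

Anything written outside the model directory (for example, the temporary `output_path` used by `train.py` for intermediate Parquet files) would not end up in such an archive.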

In [4]:
%%writefile train.py
#
# Copyright (c) 2022, NVIDIA CORPORATION.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
import argparse
import json
import logging
import os
import sys
import tempfile

# We can control how much memory to give tensorflow with this environment variable
# IMPORTANT: make sure you do this before you initialize TF's runtime, otherwise
# TF will have claimed all free GPU memory
os.environ["TF_MEMORY_ALLOCATION"] = "0.7"  # fraction of free memory

import merlin.io
import merlin.models.tf as mm
import nvtabular as nvt
import tensorflow as tf
from merlin.schema.tags import Tags
from merlin.systems.dag.ops.workflow import TransformWorkflow
from merlin.systems.dag.ops.tensorflow import PredictTensorflow
from merlin.systems.dag.ensemble import Ensemble
import numpy as np
from nvtabular.ops import *


logger = logging.getLogger(__name__)
logger.setLevel(logging.DEBUG)
logger.addHandler(logging.StreamHandler(sys.stdout))


def parse_args():
    """
    Parse arguments passed from the SageMaker API to the container.
    """

    parser = argparse.ArgumentParser()

    # Hyperparameters sent by the client are passed as command-line arguments to the script
    parser.add_argument("--epochs", type=int, default=1)
    parser.add_argument("--batch_size", type=int, default=1024)

    # Data directories
    parser.add_argument(
        "--train_dir", type=str, default=os.environ.get("SM_CHANNEL_TRAIN")
    )
    parser.add_argument(
        "--valid_dir", type=str, default=os.environ.get("SM_CHANNEL_VALID")
    )

    # Model directory: we will use the default set by SageMaker, /opt/ml/model
    parser.add_argument("--model_dir", type=str, default=os.environ.get("SM_MODEL_DIR"))

    return parser.parse_known_args()


def create_nvtabular_workflow(train_path, valid_path):
    user_id = ["user_id"] >> Categorify() >> TagAsUserID()
    item_id = ["item_id"] >> Categorify() >> TagAsItemID()
    targets = ["click"] >> AddMetadata(tags=[Tags.BINARY_CLASSIFICATION, "target"])

    item_features = (
        ["item_category", "item_shop", "item_brand"]
        >> Categorify()
        >> TagAsItemFeatures()
    )

    user_features = (
        [
            "user_shops",
            "user_profile",
            "user_group",
            "user_gender",
            "user_age",
            "user_consumption_2",
            "user_is_occupied",
            "user_geography",
            "user_intentions",
            "user_brands",
            "user_categories",
        ]
        >> Categorify()
        >> TagAsUserFeatures()
    )

    outputs = user_id + item_id + item_features + user_features + targets

    workflow = nvt.Workflow(outputs)

    return workflow


def create_ensemble(workflow, model):
    serving_operators = (
        workflow.input_schema.column_names
        >> TransformWorkflow(workflow)
        >> PredictTensorflow(model)
    )
    ensemble = Ensemble(serving_operators, workflow.input_schema)
    return ensemble


def train():
    """
    Train the Merlin model.
    """
    train_path = os.path.join(args.train_dir, "*.parquet")
    valid_path = os.path.join(args.valid_dir, "*.parquet")

    workflow = create_nvtabular_workflow(
        train_path=train_path,
        valid_path=valid_path,
    )

    train_dataset = nvt.Dataset(train_path)
    valid_dataset = nvt.Dataset(valid_path)

    output_path = tempfile.mkdtemp()
    workflow_path = os.path.join(output_path, "workflow")

    workflow.fit(train_dataset)
    workflow.transform(train_dataset).to_parquet(
        output_path=os.path.join(output_path, "train")
    )
    workflow.transform(valid_dataset).to_parquet(
        output_path=os.path.join(output_path, "valid")
    )

    workflow.save(workflow_path)
    logger.info(f"Workflow saved to {workflow_path}.")

    train_data = merlin.io.Dataset(os.path.join(output_path, "train", "*.parquet"))
    valid_data = merlin.io.Dataset(os.path.join(output_path, "valid", "*.parquet"))

    schema = train_data.schema
    target_column = schema.select_by_tag(Tags.TARGET).column_names[0]

    model = mm.DLRMModel(
        schema,
        embedding_dim=64,
        bottom_block=mm.MLPBlock([128, 64]),
        top_block=mm.MLPBlock([128, 64, 32]),
        prediction_tasks=mm.BinaryClassificationTask(target_column),
    )

    model.compile("adam", run_eagerly=False, metrics=[tf.keras.metrics.AUC()])

    batch_size = args.batch_size
    epochs = args.epochs
    logger.info(f"batch_size = {batch_size}, epochs = {epochs}")

    model.fit(
        train_data,
        validation_data=valid_data,
        batch_size=args.batch_size,
        epochs=epochs,
        verbose=2,
    )

    model_path = os.path.join(output_path, "dlrm")
    model.save(model_path)
    logger.info(f"Model saved to {model_path}.")

    # We remove the label columns from its inputs.
    # This removes all columns with the TARGET tag from the workflow.
    # We do this because we need to set the workflow to only require the
    # features needed to predict, not train, when creating an inference
    # pipeline.
    label_columns = workflow.output_schema.select_by_tag(Tags.TARGET).column_names
    workflow.remove_inputs(label_columns)

    ensemble = create_ensemble(workflow, model)
    ensemble_path = args.model_dir
    ensemble.export(ensemble_path)
    logger.info(f"Ensemble graph saved to {ensemble_path}.")


if __name__ == "__main__":
    args, _ = parse_args()
    train()

Overwriting train.py


### The `Dockerfile`

The `Dockerfile` describes the image that will be used on SageMaker for training and inference.
We start from the latest stable [merlin-tensorflow](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/merlin/containers/merlin-tensorflow/tags) Docker image and install the [sagemaker-training-toolkit](https://github.com/aws/sagemaker-training-toolkit) library, which makes the image compatible with SageMaker for training models.

In [5]:
%%writefile container/Dockerfile

FROM nvcr.io/nvidia/merlin/merlin-tensorflow:22.10

RUN pip3 install sagemaker-training

Overwriting container/Dockerfile


### Building and registering the container

The following shell code shows how to build the container image using `docker build` and push the container image to ECR using `docker push`. This code is available as the shell script `build_and_push_image.sh`. If you are running this notebook inside the [merlin-tensorflow](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/merlin/containers/merlin-tensorflow/tags) docker container, you probably need to execute the script outside the container (e.g., in your terminal where you can run the `docker` command).

You need to have the AWS CLI installed to run this code. To install the AWS CLI, see [Installing or updating the latest version of the AWS CLI](https://docs.aws.amazon.com/cli/latest/userguide/getting-started-install.html#getting-started-install-instructions).

This code looks for an ECR repository in the account you're using and the current default region (if you're using a SageMaker notebook instance, this is the region where the notebook instance was created). If the repository doesn't exist, the script will create it.

Note that running the following script requires permissions to create new repositories on Amazon ECR.

In [6]:
%%writefile ./build_and_push_image.sh
#!/bin/bash

set -euo pipefail

# The name of our algorithm
ALGORITHM_NAME=sagemaker-merlin-tensorflow
REGION=us-east-1

cd container

ACCOUNT=$(aws sts get-caller-identity --query Account --output text --region ${REGION})

# Registry host and full image URI for the account and region set above

REPOSITORY="${ACCOUNT}.dkr.ecr.${REGION}.amazonaws.com"
IMAGE_URI="${REPOSITORY}/${ALGORITHM_NAME}:latest"

# Get the login command from ECR and execute it directly
aws ecr get-login-password --region ${REGION} | docker login --username AWS --password-stdin ${REPOSITORY}

# If the repository doesn't exist in ECR, create it.
# The check runs inside `if` so that `set -e` does not abort the script
# when describe-repositories fails for a missing repository.

if ! aws ecr describe-repositories --repository-names "${ALGORITHM_NAME}" --region ${REGION} > /dev/null 2>&1
then
    aws ecr create-repository --repository-name "${ALGORITHM_NAME}" --region ${REGION} > /dev/null
fi

# Build the docker image locally with the image name and then push it to ECR
# with the full name.

docker build  -t ${ALGORITHM_NAME} .
docker tag ${ALGORITHM_NAME} ${IMAGE_URI}

docker push ${IMAGE_URI}

Overwriting ./build_and_push_image.sh


In [7]:
# If you are able to run `docker` from the notebook environment, you can uncomment and run the below script.
# ! ./build_and_push_image.sh

## Part 2: Training your Merlin model on SageMaker

To deploy the training script onto SageMaker, we use the SageMaker Python SDK.
Here, we create a SageMaker session that we will use to perform our SageMaker operations, and we specify the S3 prefix to use and the role for working with SageMaker.

In [8]:
import sagemaker

sess = sagemaker.Session()

# S3 prefix
prefix = "DEMO-merlin-tensorflow-aliccp"

role = sagemaker.get_execution_role()

print(role)

Couldn't call 'get_role' to get Role ARN from role name AWSOS-AD-Engineer to get Role path.


arn:aws:iam::843263297212:role/AWSOS-AD-Engineer


We can use the SageMaker Python SDK to upload the Ali-CCP synthetic data to our S3 bucket.

In [9]:
data_location = sess.upload_data(DATA_FOLDER, key_prefix=prefix)

print(data_location)

s3://sagemaker-us-east-1-843263297212/DEMO-merlin-tensorflow-aliccp


### Training on SageMaker using the Python SDK

The SageMaker Python SDK lets us launch a training job on SageMaker.

Here, we start by constructing the ECR image URL of the image we pushed in the previous section.

In [10]:
import boto3

sts_client = boto3.client("sts")
account = sts_client.get_caller_identity()["Account"]

my_session = boto3.session.Session()
region = my_session.region_name

algorithm_name = "sagemaker-merlin-tensorflow"

ecr_image = "{}.dkr.ecr.{}.amazonaws.com/{}:latest".format(
    account, region, algorithm_name
)

print(ecr_image)

843263297212.dkr.ecr.us-east-1.amazonaws.com/sagemaker-merlin-tensorflow:latest


We can call `Estimator.fit()` to start training on SageMaker. Here, we use a `g4dn` GPU instance, which is equipped with an NVIDIA T4 GPU.
Our training script `train.py` is passed to the Estimator through the `entry_point` parameter.
Behind the scenes, the SageMaker Python SDK uploads the training script specified in the `entry_point` field (`train.py` in our case)
to the S3 bucket and sets the `SAGEMAKER_PROGRAM` environment variable in the training instance so that the training instance
can locate and run the script it downloads from S3.
We also adjust our hyperparameters in the `hyperparameters` field.
We uploaded our training dataset to our S3 bucket in a previous code cell, and the S3 URLs of our training and validation sets are passed into the `fit()` method.
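
As a rough illustration of how the `hyperparameters` field reaches `train.py`, the sketch below flattens a hyperparameter dictionary into command-line tokens and parses them with the same two `argparse` options that `train.py` declares. The helper `hyperparameters_to_argv` is hypothetical, and the exact `--name value` token format is an assumption about how the training toolkit invokes the script, so treat this as a sketch rather than a description of the toolkit's internals.

```python
import argparse


def hyperparameters_to_argv(hyperparameters):
    """Flatten a hyperparameter dict into `--name value` command-line tokens."""
    argv = []
    for name, value in hyperparameters.items():
        argv.extend([f"--{name}", str(value)])
    return argv


def parse_training_args(argv):
    """Parse the two hyperparameters that train.py declares."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--epochs", type=int, default=1)
    parser.add_argument("--batch_size", type=int, default=1024)
    # parse_known_args returns unrecognized tokens instead of exiting.
    return parser.parse_known_args(argv)


demo_argv = hyperparameters_to_argv({"batch_size": 1024, "epochs": 10})
demo_args, demo_unknown = parse_training_args(demo_argv)
print(demo_argv)
print(demo_args.batch_size, demo_args.epochs, demo_unknown)
```

Because values arrive as strings, the `type=int` conversions in the parser matter, and hyperparameter names are easiest to reason about when they match the option names declared in the script.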

In [11]:
import os
from sagemaker.estimator import Estimator


training_instance_type = "ml.g4dn.xlarge"  # GPU instance, T4

estimator = Estimator(
    role=role,
    instance_count=1,
    instance_type=training_instance_type,
    image_uri=ecr_image,
    entry_point="train.py",
    hyperparameters={
        "batch_size": 1_024,
        "epochs": 10,  # matches the --epochs argument declared in train.py
    },
)

estimator.fit(
    {
        "train": f"{data_location}/train/",
        "valid": f"{data_location}/valid/",
    }
)

2022-11-09 10:18:31 Starting - Starting the training job...
2022-11-09 10:18:54 Starting - Preparing the instances for trainingProfilerReport-1667989110: InProgress
......
2022-11-09 10:19:54 Downloading - Downloading input data...
== Triton Inference Server Base ==
NVIDIA Release 22.08 (build 42766143)
Copyright (c) 2018-2022, NVIDIA CORPORATION & AFFILIATES.  All rights reserved.
Various files include modifications (c) NVIDIA CORPORATION & AFFILIATES.  All rights reserved.
This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license
NOTE: CUDA Forward Compatibility mode ENABLED.
  Using CUDA 11.7 driver version 515.65.01 with kernel driver version 510.47.03.
  See https://docs.nvidia.com/deploy/cuda-compatibility/ for deta

In [12]:
print(estimator.model_data)

s3://sagemaker-us-east-1-843263297212/sagemaker-merlin-tensorflow-2022-11-09-10-18-29-376/output/model.tar.gz


In [13]:
from sagemaker.s3 import S3Downloader as s3down

s3down.download(estimator.model_data, "/tmp/ensemble/")

In [14]:
! cd /tmp/ensemble && tar xvzf model.tar.gz

1_predicttensorflow/
1_predicttensorflow/config.pbtxt
1_predicttensorflow/1/
1_predicttensorflow/1/model.savedmodel/
1_predicttensorflow/1/model.savedmodel/assets/
1_predicttensorflow/1/model.savedmodel/variables/
1_predicttensorflow/1/model.savedmodel/variables/variables.index
1_predicttensorflow/1/model.savedmodel/variables/variables.data-00000-of-00001
1_predicttensorflow/1/model.savedmodel/saved_model.pb
1_predicttensorflow/1/model.savedmodel/keras_metadata.pb
ensemble_model/
ensemble_model/config.pbtxt
ensemble_model/1/
0_transformworkflow/
0_transformworkflow/config.pbtxt
0_transformworkflow/1/
0_transformworkflow/1/model.py
0_transformworkflow/1/workflow/
0_transformworkflow/1/workflow/categories/
0_transformworkflow/1/workflow/categories/unique.user_profile.parquet
0_transformworkflow/1/workflow/categories/unique.user_age.parquet
0_transformworkflow/1/workflow/categories/unique.user_group.parquet
0_transformworkflow/1/workflow/categories/unique.user_intentions.parquet
0_transfo

## Part 3: Retrieving Recommendations from Triton Inference Server

Although we use the SageMaker Python SDK to train our model, here we use `boto3` to launch our inference endpoint because it offers more low-level control than the Python SDK.

The model artifact `model.tar.gz` uploaded to S3 by the SageMaker training job contains three directories: `0_transformworkflow` for the NVTabular workflow, `1_predicttensorflow` for the TensorFlow model, and `ensemble_model` for the ensemble graph that we can use in Triton.

```shell
/tmp/ensemble/
├── 0_transformworkflow
│   ├── 1
│   │   ├── model.py
│   │   └── workflow
│   │       ├── categories
│   │       │   ├── unique.item_brand.parquet
│   │       │   ├── unique.item_category.parquet
│   │       │   ├── unique.item_id.parquet
│   │       │   ├── unique.item_shop.parquet
│   │       │   ├── unique.user_age.parquet
│   │       │   ├── unique.user_brands.parquet
│   │       │   ├── unique.user_categories.parquet
│   │       │   ├── unique.user_consumption_2.parquet
│   │       │   ├── unique.user_gender.parquet
│   │       │   ├── unique.user_geography.parquet
│   │       │   ├── unique.user_group.parquet
│   │       │   ├── unique.user_id.parquet
│   │       │   ├── unique.user_intentions.parquet
│   │       │   ├── unique.user_is_occupied.parquet
│   │       │   ├── unique.user_profile.parquet
│   │       │   └── unique.user_shops.parquet
│   │       ├── metadata.json
│   │       └── workflow.pkl
│   └── config.pbtxt
├── 1_predicttensorflow
│   ├── 1
│   │   └── model.savedmodel
│   │       ├── assets
│   │       ├── keras_metadata.pb
│   │       ├── saved_model.pb
│   │       └── variables
│   │           ├── variables.data-00000-of-00001
│   │           └── variables.index
│   └── config.pbtxt
├── ensemble_model
│   ├── 1
│   └── config.pbtxt
└── model.tar.gz
```

We specify that we only want to use `ensemble_model` in Triton by passing the environment variable `SAGEMAKER_TRITON_DEFAULT_MODEL_NAME`.
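
Before creating the model, it can be useful to check that the name passed in `SAGEMAKER_TRITON_DEFAULT_MODEL_NAME` actually exists in the extracted archive. The sketch below treats every top-level directory that contains a `config.pbtxt` file as a Triton model, which matches the layout shown above. The helper `find_triton_models` is hypothetical, and the demo runs against a miniature temporary copy of the layout rather than `/tmp/ensemble/`.

```python
import os
import tempfile


def find_triton_models(repository_root):
    """Return names of subdirectories that contain a config.pbtxt file."""
    models = []
    for entry in sorted(os.listdir(repository_root)):
        config_path = os.path.join(repository_root, entry, "config.pbtxt")
        if os.path.isfile(config_path):
            models.append(entry)
    return models


# Hypothetical miniature copy of the model repository layout.
demo_root = tempfile.mkdtemp()
for model in ("0_transformworkflow", "1_predicttensorflow", "ensemble_model"):
    os.makedirs(os.path.join(demo_root, model, "1"))
    with open(os.path.join(demo_root, model, "config.pbtxt"), "w") as f:
        f.write(f'name: "{model}"\n')
# A stray archive at the top level is not a model directory.
open(os.path.join(demo_root, "model.tar.gz"), "wb").close()

demo_models = find_triton_models(demo_root)
default_model = "ensemble_model"
print(demo_models, default_model in demo_models)
```

With the real artifacts, the same helper could be pointed at `/tmp/ensemble/` after the archive has been extracted.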

In [15]:
import time

import boto3

sm_client = boto3.client(service_name="sagemaker")

container = {
    "Image": ecr_image,
    "ModelDataUrl": estimator.model_data,
    "Environment": {
        "SAGEMAKER_TRITON_TENSORFLOW_VERSION": "2",
        "SAGEMAKER_TRITON_DEFAULT_MODEL_NAME": "ensemble_model",
    },
}

model_name = "model-triton-merlin-ensemble-" + time.strftime(
    "%Y-%m-%d-%H-%M-%S", time.gmtime()
)

create_model_response = sm_client.create_model(
    ModelName=model_name, ExecutionRoleArn=role, PrimaryContainer=container
)

model_arn = create_model_response["ModelArn"]

print(f"Model Arn: {model_arn}")

Model Arn: arn:aws:sagemaker:us-east-1:843263297212:model/model-triton-merlin-ensemble-2022-11-09-10-29-57


We again use a `g4dn` GPU instance, which is equipped with an NVIDIA T4 GPU, to launch the Triton Inference Server.

In [16]:
endpoint_instance_type = "ml.g4dn.xlarge"

endpoint_config_name = "endpoint-config-triton-merlin-ensemble-" + time.strftime(
    "%Y-%m-%d-%H-%M-%S", time.gmtime()
)

create_endpoint_config_response = sm_client.create_endpoint_config(
    EndpointConfigName=endpoint_config_name,
    ProductionVariants=[
        {
            "InstanceType": endpoint_instance_type,
            "InitialVariantWeight": 1,
            "InitialInstanceCount": 1,
            "ModelName": model_name,
            "VariantName": "AllTraffic",
        }
    ],
)

endpoint_config_arn = create_endpoint_config_response["EndpointConfigArn"]

print(f"Endpoint Config Arn: {endpoint_config_arn}")

Endpoint Config Arn: arn:aws:sagemaker:us-east-1:843263297212:endpoint-config/endpoint-config-triton-merlin-ensemble-2022-11-09-10-29-58


In [17]:
endpoint_name = "endpoint-triton-merlin-ensemble-" + time.strftime(
    "%Y-%m-%d-%H-%M-%S", time.gmtime()
)

create_endpoint_response = sm_client.create_endpoint(
    EndpointName=endpoint_name, EndpointConfigName=endpoint_config_name
)

endpoint_arn = create_endpoint_response["EndpointArn"]

print(f"Endpoint Arn: {endpoint_arn}")

Endpoint Arn: arn:aws:sagemaker:us-east-1:843263297212:endpoint/endpoint-triton-merlin-ensemble-2022-11-09-10-29-58


In [18]:
# Keep the full response so that `rv` is defined even if the loop body never runs.
rv = sm_client.describe_endpoint(EndpointName=endpoint_name)
status = rv["EndpointStatus"]
print(f"Endpoint Creation Status: {status}")

while status == "Creating":
    time.sleep(60)
    rv = sm_client.describe_endpoint(EndpointName=endpoint_name)
    status = rv["EndpointStatus"]
    print(f"Endpoint Creation Status: {status}")

endpoint_arn = rv["EndpointArn"]

print(f"Endpoint Arn: {endpoint_arn}")
print(f"Endpoint Status: {status}")

Endpoint Creation Status: Creating
Endpoint Creation Status: Creating
Endpoint Creation Status: Creating
Endpoint Creation Status: Creating
Endpoint Creation Status: Creating
Endpoint Creation Status: Creating
Endpoint Creation Status: Creating
Endpoint Creation Status: Creating
Endpoint Creation Status: InService
Endpoint Arn: arn:aws:sagemaker:us-east-1:843263297212:endpoint/endpoint-triton-merlin-ensemble-2022-11-09-10-29-58
Endpoint Status: InService
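
The polling loop above waits for as long as the endpoint stays in the `Creating` state. A minimal sketch of the same pattern with an upper bound on the waiting time is shown below. The helper `wait_for_status` and its parameters are hypothetical, and the demo uses a fake status source instead of a real endpoint; with a real endpoint, `get_status` could wrap the `describe_endpoint` call used in the cell above.

```python
import time


def wait_for_status(
    get_status,
    pending="Creating",
    poll_seconds=60,
    timeout_seconds=1800,
    sleep=time.sleep,
):
    """Poll get_status() until it stops returning `pending` or time runs out."""
    elapsed = 0
    status = get_status()
    while status == pending:
        if elapsed >= timeout_seconds:
            raise TimeoutError(f"still '{pending}' after {elapsed} seconds")
        sleep(poll_seconds)
        elapsed += poll_seconds
        status = get_status()
    return status


# Fake status source that mimics an endpoint becoming available.
fake_statuses = iter(["Creating", "Creating", "InService"])
final_status = wait_for_status(
    lambda: next(fake_statuses),
    poll_seconds=0,
    sleep=lambda seconds: None,  # skip real sleeping in the demo
)
print(final_status)
```

A final status other than `InService` (for example `Failed`) still ends the loop, so the caller should check the returned value before sending requests.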


### Send a Request to Triton Inference Server to Transform a Raw Dataset

Once we have an endpoint running, we can test it by sending requests.
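Here, we take a few rows of the raw validation set and load the saved NVTabular workflow that we downloaded from S3 in the previous section. The workflow supplies the input schema for the request; the transformation itself runs in the ensemble on the endpoint.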

In [19]:
from merlin.schema.tags import Tags
from merlin.core.dispatch import get_lib
from nvtabular.workflow import Workflow

df_lib = get_lib()

workflow = Workflow.load("/tmp/ensemble/0_transformworkflow/1/workflow/")

label_columns = workflow.output_schema.select_by_tag(Tags.TARGET).column_names
workflow.remove_inputs(label_columns)

# read in data for request
batch = df_lib.read_parquet(
    os.path.join(DATA_FOLDER, "valid", "part.0.parquet"),
    columns=workflow.input_schema.column_names,
)[:10]
print(batch)

                     user_id  item_id  item_category  item_shop  item_brand  \
__null_dask_index__                                                           
700000                    12        2              3        194          67   
700001                    12       30             80       5621        1936   
700002                    18        5             12        776         267   
700003                    35        6             14        970         334   
700004                    51       11             28       1939         668   
700005                    22       83            226      15893        5474   
700006                    13       38            102       7172        2470   
700007                    10        7             17       1163         401   
700008                     4        4              9        582         201   
700009                     4       24             64       4458        1536   

                     user_shops  user_profile  user

In the following code cell, we use a utility function provided in [Merlin Systems](https://github.com/NVIDIA-Merlin/systems) to convert our dataframe into the payload format that Triton accepts for inference requests.

In [20]:
from merlin.systems.triton import convert_df_to_triton_input
import tritonclient.http as httpclient

inputs = convert_df_to_triton_input(workflow.input_schema, batch, httpclient.InferInput)

request_body, header_length = httpclient.InferenceServerClient.generate_request_body(
    inputs
)

print(request_body)

b'{"inputs":[{"name":"user_id","shape":[10,1],"datatype":"INT32","parameters":{"binary_data_size":40}},{"name":"item_id","shape":[10,1],"datatype":"INT32","parameters":{"binary_data_size":40}},{"name":"item_category","shape":[10,1],"datatype":"INT32","parameters":{"binary_data_size":40}},{"name":"item_shop","shape":[10,1],"datatype":"INT32","parameters":{"binary_data_size":40}},{"name":"item_brand","shape":[10,1],"datatype":"INT32","parameters":{"binary_data_size":40}},{"name":"user_shops","shape":[10,1],"datatype":"INT32","parameters":{"binary_data_size":40}},{"name":"user_profile","shape":[10,1],"datatype":"INT32","parameters":{"binary_data_size":40}},{"name":"user_group","shape":[10,1],"datatype":"INT32","parameters":{"binary_data_size":40}},{"name":"user_gender","shape":[10,1],"datatype":"INT32","parameters":{"binary_data_size":40}},{"name":"user_age","shape":[10,1],"datatype":"INT32","parameters":{"binary_data_size":40}},{"name":"user_consumption_2","shape":[10,1],"datatype":"INT3

Triton uses the [KServe community standard inference protocols](https://github.com/triton-inference-server/server/blob/main/docs/protocol/README.md).
Here, we use the [binary+json format](https://github.com/triton-inference-server/server/blob/main/docs/protocol/extension_binary_data.md) for optimal performance in the inference request.

In order for Triton to correctly parse the binary payload, we have to specify the length of the request metadata in the header `json-header-size`.
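
To show why the header length is needed, the sketch below assembles a simplified binary+json body by hand: a JSON header that describes one `INT32` input, followed directly by the raw tensor bytes. The receiver can only split the two parts if it knows where the JSON ends. The helpers `build_binary_json_body` and `split_binary_json_body` are hypothetical and cover a single input only; in the notebook, `generate_request_body` produces the real payload.

```python
import json
import struct


def build_binary_json_body(name, values):
    """Return (body, header_length) for one INT32 input of shape [n, 1]."""
    # Raw tensor data, packed as little-endian 32-bit integers.
    raw = struct.pack(f"<{len(values)}i", *values)
    header = json.dumps(
        {
            "inputs": [
                {
                    "name": name,
                    "shape": [len(values), 1],
                    "datatype": "INT32",
                    "parameters": {"binary_data_size": len(raw)},
                }
            ]
        },
        separators=(",", ":"),
    ).encode("utf-8")
    return header + raw, len(header)


def split_binary_json_body(body, header_length):
    """Split a body into its parsed JSON header and the raw binary payload."""
    return json.loads(body[:header_length]), body[header_length:]


demo_body, demo_header_length = build_binary_json_body("user_id", [12, 18, 35])
demo_header, demo_payload = split_binary_json_body(demo_body, demo_header_length)
demo_values = list(struct.unpack(f"<{len(demo_payload) // 4}i", demo_payload))

content_type = (
    "application/vnd.sagemaker-triton.binary+json;"
    f"json-header-size={demo_header_length}"
)
print(demo_header["inputs"][0]["parameters"], demo_values)
print(content_type)
```

Ten `INT32` values occupy 40 bytes, which is consistent with the `binary_data_size` of 40 reported for each input in the request body printed above.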

In [21]:
runtime_sm_client = boto3.client("sagemaker-runtime")

response = runtime_sm_client.invoke_endpoint(
    EndpointName=endpoint_name,
    ContentType=f"application/vnd.sagemaker-triton.binary+json;json-header-size={header_length}",
    Body=request_body,
)

# Parse json header size length from the response
header_length_prefix = "application/vnd.sagemaker-triton.binary+json;json-header-size="
header_length_str = response["ContentType"][len(header_length_prefix) :]

# Read response body
result = httpclient.InferenceServerClient.parse_response_body(
    response["Body"].read(), header_length=int(header_length_str)
)
output_data = result.as_numpy("click/binary_classification_task")
print("predicted sigmoid result:\n", output_data)

predicted sigmoid result:
 [[0.48595208]
 [0.4647554 ]
 [0.50048226]
 [0.53553176]
 [0.5209902 ]
 [0.54944164]
 [0.5032344 ]
 [0.475241  ]
 [0.5077254 ]
 [0.5009623 ]]


## Terminate endpoint and clean up artifacts

Don't forget to clean up artifacts and terminate the endpoint, or the endpoint will continue to incur costs.

In [22]:
sm_client.delete_model(ModelName=model_name)
sm_client.delete_endpoint_config(EndpointConfigName=endpoint_config_name)
sm_client.delete_endpoint(EndpointName=endpoint_name)

{'ResponseMetadata': {'RequestId': '6ad24616-5c7c-4525-a63c-62d1b06ee8ad',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'x-amzn-requestid': '6ad24616-5c7c-4525-a63c-62d1b06ee8ad',
   'content-type': 'application/x-amz-json-1.1',
   'content-length': '0',
   'date': 'Wed, 09 Nov 2022 10:38:12 GMT'},
  'RetryAttempts': 0}}