# Training and Serving Merlin on AWS SageMaker

## Testing your algorithm on your local machine

We use synthetic train and validation datasets, generated to mimic the real Ali-CCP (Alibaba Click and Conversion Prediction) dataset, to build our recommender-system ranking models.

If you would like to use the real Ali-CCP dataset instead, you can download the training and test datasets from tianchi.aliyun.com. You can then use the `get_aliccp()` function to curate the raw CSV files and save them as Parquet files.


```python
import os

from merlin.datasets.synthetic import generate_data

DATA_FOLDER = os.environ.get("DATA_FOLDER", "/workspace/data/aliccp-raw-synthetic")
NUM_ROWS = int(os.environ.get("NUM_ROWS", 1_000_000))
# Parse the flag as a string instead of eval()-ing arbitrary environment input.
SYNTHETIC_DATA = os.environ.get("SYNTHETIC_DATA", "True").lower() in ("1", "true", "yes")
BATCH_SIZE = int(os.environ.get("BATCH_SIZE", 512))

if SYNTHETIC_DATA:
    # 70% of the rows go to the training set, 30% to the validation set.
    train, valid = generate_data("aliccp-raw", NUM_ROWS, set_sizes=(0.7, 0.3))
    # save the datasets as parquet files
    train.to_ddf().to_parquet(os.path.join(DATA_FOLDER, "train"))
    valid.to_ddf().to_parquet(os.path.join(DATA_FOLDER, "valid"))
```

In [1]:
DATA_DIRECTORY = "/workspace/data/aliccp-raw-synthetic/"

In [2]:
! ls {DATA_DIRECTORY}

train  valid


In [3]:
! python3 container/train.py \
    --train_dir=/workspace/data/aliccp-raw-synthetic/train/ \
    --valid_dir=/workspace/data/aliccp-raw-synthetic/valid/ \
    --model_dir=/tmp/local_training/ \
    --batch_size=512 \
    --epochs=2

python3: can't open file 'container/train.py': [Errno 2] No such file or directory
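
The `container/train.py` script called above is not reproduced in this notebook. As a hypothetical sketch of how its command line could serve both the local run above and a SageMaker training job, the parser below takes the same flags, but falls back to the environment variables that the `sagemaker-training` toolkit (installed in the `Dockerfile`) sets inside a job: `SM_CHANNEL_TRAIN`, `SM_CHANNEL_VALID` (one per channel name) and `SM_MODEL_DIR`. The defaults for `--batch_size` and `--epochs` are assumptions.

```python
import argparse
import os
from typing import List, Optional


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Hypothetical CLI for container/train.py.

    Explicit flags win; otherwise the paths come from the environment variables
    that the sagemaker-training toolkit sets for the "train" and "valid" channels.
    """
    parser = argparse.ArgumentParser()
    parser.add_argument("--train_dir", default=os.environ.get("SM_CHANNEL_TRAIN"))
    parser.add_argument("--valid_dir", default=os.environ.get("SM_CHANNEL_VALID"))
    parser.add_argument("--model_dir", default=os.environ.get("SM_MODEL_DIR", "/opt/ml/model"))
    # Hyperparameters arrive as strings, so let argparse convert them.
    parser.add_argument("--batch_size", type=int, default=512)
    parser.add_argument("--epochs", type=int, default=1)
    return parser.parse_args(argv)
```

Inside a job, the toolkit passes the estimator's hyperparameters to the entry point as command-line flags (for example `--batch_size 1024`), so the hyperparameter keys need to match the flag names the script defines.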


### How Amazon SageMaker runs your Docker container

Because you can run the same image in training or hosting, Amazon SageMaker runs your container with the argument `train` or `serve`. How your container processes this argument depends on the container.

* In this example, we don't define an `ENTRYPOINT` in the `Dockerfile`, so Docker runs the command [`train` at training time](https://docs.aws.amazon.com/sagemaker/latest/dg/your-algorithms-training-algo.html) and [`serve` at serving time](https://docs.aws.amazon.com/sagemaker/latest/dg/your-algorithms-inference-code.html). Here, we define these as executable Python scripts, but they could be any program that we want to start in that environment.
* If you specify a program as an `ENTRYPOINT` in the `Dockerfile`, that program will be run at startup and its first argument will be `train` or `serve`. The program can then look at that argument and decide what to do.
* If you are building separate containers for training and hosting (or building only for one or the other), you can define a program as an `ENTRYPOINT` in the `Dockerfile` and ignore (or verify) the first argument passed in.
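
A minimal sketch of the `ENTRYPOINT` approach, assuming a single Python program that receives `train` or `serve` as its first argument. The `run_training` and `run_serving` functions are hypothetical placeholders, not part of this example's container.

```python
from typing import Callable, Dict, List


def run_training() -> str:
    # Hypothetical placeholder: a real container would launch training here.
    return "training"


def run_serving() -> str:
    # Hypothetical placeholder: a real container would start the HTTP server here.
    return "serving"


# Map the first argument SageMaker passes ("train" or "serve") to its handler.
HANDLERS: Dict[str, Callable[[], str]] = {"train": run_training, "serve": run_serving}


def dispatch(argv: List[str]) -> str:
    """Run the handler named by argv[0]; exit with an error for anything else."""
    if not argv or argv[0] not in HANDLERS:
        raise SystemExit(f"expected 'train' or 'serve', got {argv[:1]}")
    return HANDLERS[argv[0]]()
```

In a hypothetical `entrypoint.py`, the module would end with `dispatch(sys.argv[1:])`, and the `Dockerfile` would declare `ENTRYPOINT ["python3", "entrypoint.py"]`, so `docker run <image> train` becomes `python3 entrypoint.py train`.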

#### Running your container during training

When Amazon SageMaker runs training, your `train` script is run like a regular Python program. A number of files are laid out for your use under the `/opt/ml` directory:

```
    /opt/ml
    |-- input
    |   |-- config
    |   |   |-- hyperparameters.json
    |   |    -- resourceConfig.json
    |    -- data
    |        -- <channel_name>
    |            -- <input data>
    |-- model
    |   -- <model files>
     -- output
        -- failure
```

##### The input

* `/opt/ml/input/config` contains information to control how your program runs. `hyperparameters.json` is a JSON-formatted dictionary of hyperparameter names to values. These values are always strings, so you may need to convert them. `resourceConfig.json` is a JSON-formatted file that describes the network layout used for distributed training.
* `/opt/ml/input/data/<channel_name>/` (for File mode) contains the input data for that channel. The channels are created based on the call to `CreateTrainingJob`, but it's generally important that channels match algorithm expectations. The files for each channel are copied from S3 to this directory, preserving the tree structure indicated by the S3 key structure.
* `/opt/ml/input/data/<channel_name>_<epoch_number>` (for Pipe mode) is the pipe for a given epoch. Epochs start at zero and go up by one each time you read them. There is no limit to the number of epochs that you can run, but you must close each pipe before reading the next epoch.
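
For a script that reads its configuration directly from `/opt/ml/input/config`, the string-to-type conversion might look like the sketch below. It assumes plain string values such as `"1024"`; the `CASTS` table uses the hyperparameter names from this notebook and is our own illustration.

```python
import json
import os
from typing import Any, Callable, Dict

CONFIG_DIR = "/opt/ml/input/config"
DATA_DIR = "/opt/ml/input/data"

# Hypothetical casts for the hyperparameters this notebook passes to training.
CASTS: Dict[str, Callable[[str], Any]] = {"batch_size": int, "epochs": int}


def load_hyperparameters(config_dir: str = CONFIG_DIR) -> Dict[str, Any]:
    """Read hyperparameters.json and convert the string values SageMaker writes."""
    with open(os.path.join(config_dir, "hyperparameters.json")) as f:
        raw = json.load(f)
    # Keys without a cast are kept as strings.
    return {key: CASTS.get(key, str)(value) for key, value in raw.items()}


def channel_dir(channel: str, data_dir: str = DATA_DIR) -> str:
    """Local path of a File-mode channel such as 'train' or 'valid'."""
    return os.path.join(data_dir, channel)
```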

##### The output

* `/opt/ml/model/` is the directory where you write the model that your algorithm generates. Your model can be in any format that you want. It can be a single file or a whole directory tree. SageMaker packages all files in this directory into a compressed tar archive, which is made available at the S3 location returned in the `DescribeTrainingJob` result.
* `/opt/ml/output` is a directory where the algorithm can write a file `failure` that describes why the job failed. The contents of this file are returned in the `FailureReason` field of the `DescribeTrainingJob` result. For jobs that succeed, there is no reason to write this file, as it is ignored.
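
A sketch of how a training program could report failures through `/opt/ml/output/failure`. The wrapper name is our own; it writes the traceback to the failure file and returns a nonzero exit code, which is what marks the training job as failed.

```python
import os
import sys
import traceback
from typing import Callable

OUTPUT_DIR = "/opt/ml/output"


def run_with_failure_report(train_fn: Callable[[], None], output_dir: str = OUTPUT_DIR) -> int:
    """Run train_fn; on any exception, write the traceback to <output_dir>/failure.

    Returns a process exit code: 0 on success, 1 on failure.
    """
    try:
        train_fn()
        return 0
    except Exception:
        os.makedirs(output_dir, exist_ok=True)
        with open(os.path.join(output_dir, "failure"), "w") as f:
            f.write(traceback.format_exc())
        # Also print to stderr so the error shows up in the training job logs.
        traceback.print_exc(file=sys.stderr)
        return 1
```

A training script would then end with `sys.exit(run_with_failure_report(main))`, where `main` is its (hypothetical) training function.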

#### Running your container during hosting

Hosting works very differently from training because the container must respond to inference requests that arrive over HTTP. In this example, we use [NVIDIA Triton Inference Server](https://github.com/triton-inference-server/server), but the hosting solution can be customized. One example is the [Python serving stack in the scikit-learn example](https://github.com/awslabs/amazon-sagemaker-examples/blob/master/advanced_functionality/scikit_bring_your_own/scikit_bring_your_own.ipynb).

Amazon SageMaker uses two URLs in the container:

* `/ping` receives `GET` requests from the infrastructure. Your program returns 200 if the container is up and accepting requests.
* `/invocations` is the endpoint that receives client inference `POST` requests. The format of the request and the response is up to the algorithm. If the client supplied `ContentType` and `Accept` headers, these are passed in as well. 

The container has the model files in the same place that they were written to during training:

    /opt/ml
    `-- model
        `-- <model files>
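
The Merlin image in this notebook appears to delegate hosting to Triton Inference Server (note the `SAGEMAKER_TRITON_*` settings used in this notebook), so you don't write these routes yourself. Purely to illustrate the `/ping` and `/invocations` contract, here is a minimal sketch using Python's standard `http.server`. The JSON `{"instances": [...]}` payload and the placeholder "model" are our own assumptions.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


class InferenceHandler(BaseHTTPRequestHandler):
    """Toy handler for the two routes SageMaker calls on a hosting container."""

    def _empty(self, status: int) -> None:
        self.send_response(status)
        self.send_header("Content-Length", "0")
        self.end_headers()

    def do_GET(self):
        # Health check: 200 means the container is up and accepting requests.
        self._empty(200 if self.path == "/ping" else 404)

    def do_POST(self):
        if self.path != "/invocations":
            self._empty(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length) or b"{}")
        # Placeholder "model": report how many instances were received.
        body = json.dumps({"num_instances": len(payload.get("instances", []))}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        # Silence per-request logging in this sketch.
        pass


def make_server(port: int = 8080) -> HTTPServer:
    # SageMaker sends hosting traffic to port 8080 inside the container.
    return HTTPServer(("", port), InferenceHandler)
```

Calling `make_server().serve_forever()` would run it in the foreground.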

In [4]:
! cat container/Dockerfile

FROM nvcr.io/nvidia/merlin/merlin-tensorflow:22.09

RUN pip3 install sagemaker-training

ENV SAGEMAKER_TRITON_TENSORFLOW_VERSION 2

#EXPOSE 8080


### Building and registering the container

The following shell script, `build_and_push_image.sh`, builds the container image with `docker build` and pushes it to Amazon ECR with `docker push`.

The script looks for an ECR repository named `ALGORITHM_NAME` in the AWS account you're using, in the region set by the `REGION` variable (hardcoded to `us-east-1` here). If the repository doesn't exist, the script creates it.

In [5]:
! cat build_and_push_image.sh

#!/bin/bash

set -euo pipefail

# The name of our algorithm
ALGORITHM_NAME=sagemaker-merlin-tensorflow
REGION=us-east-1

cd container

ACCOUNT=$(aws sts get-caller-identity --query Account --output text --region ${REGION})

# ECR registry for this account and region, and the full image URI
REPOSITORY="${ACCOUNT}.dkr.ecr.${REGION}.amazonaws.com"
IMAGE_URI="${REPOSITORY}/${ALGORITHM_NAME}:latest"

# Get the login command from ECR and execute it directly
aws ecr get-login-password --region ${REGION} | docker login --username AWS --password-stdin ${REPOSITORY}

# If the repository doesn't exist in ECR, create it.
# Test the command inside `if`: with `set -e`, a failing command followed by
# a separate `$?` check would abort the script before the check runs.
if ! aws ecr describe-repositories --repository-names "${ALGORITHM_NAME}" --region ${REGION} > /dev/null 2>&1
then
    aws ecr create-repository --repository-name "${ALGORITHM_NAME}" --region ${REGION} > /dev/null
fi

# Build the docker image locally with the image name and then push it to ECR
# with the full name.

docker build  -t ${ALGOR
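
If you prefer to manage the repository from Python, the same "create it if it doesn't exist" step can be sketched with the boto3 ECR client's `describe_repositories` and `create_repository` calls. The function name is our own, and it takes the client as a parameter (for example `boto3.client("ecr", region_name="us-east-1")`) so it can be exercised without AWS credentials.

```python
def ensure_ecr_repository(ecr_client, name: str) -> str:
    """Return the URI of the ECR repository `name`, creating it if it is missing.

    `ecr_client` is expected to behave like a boto3 ECR client.
    """
    try:
        resp = ecr_client.describe_repositories(repositoryNames=[name])
        return resp["repositories"][0]["repositoryUri"]
    except ecr_client.exceptions.RepositoryNotFoundException:
        # Only "not found" triggers creation; other errors propagate.
        resp = ecr_client.create_repository(repositoryName=name)
        return resp["repository"]["repositoryUri"]
```

Catching only `RepositoryNotFoundException` avoids the problem of the shell version, where any failure (for example, missing permissions) would be treated as "repository missing".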

In [6]:
import sagemaker as sage

sess = sage.Session()

In [7]:
# S3 prefix
prefix = "DEMO-merlin-tensorflow-aliccp"

In [8]:
data_location = sess.upload_data(DATA_DIRECTORY, key_prefix=prefix)

In [9]:
print(data_location)

s3://sagemaker-us-east-1-843263297212/DEMO-merlin-tensorflow-aliccp


## Training the model with the SageMaker Python SDK

In [10]:
from sagemaker import get_execution_role

role = get_execution_role()

print(role)

Couldn't call 'get_role' to get Role ARN from role name AWSOS-AD-Engineer to get Role path.


arn:aws:iam::843263297212:role/AWSOS-AD-Engineer


In [11]:
import boto3

client = boto3.client("sts")
account = client.get_caller_identity()["Account"]

my_session = boto3.session.Session()
region = my_session.region_name

algorithm_name = "sagemaker-merlin-tensorflow"

ecr_image = "{}.dkr.ecr.{}.amazonaws.com/{}:testing".format(account, region, algorithm_name)

print(ecr_image)

843263297212.dkr.ecr.us-east-1.amazonaws.com/sagemaker-merlin-tensorflow:testing


In [12]:
import os
from sagemaker.estimator import Estimator


training_instance_type = "ml.g4dn.xlarge"  # GPU instance, T4

estimator = Estimator(
    role=role,
    instance_count=1,
    instance_type=training_instance_type,
    image_uri=ecr_image,
    entry_point="train.py",
    hyperparameters={
        "batch_size": 1_024,
        "epochs": 10,
    },
)

estimator.fit(
    {
        "train": f"{data_location}/train/",
        "valid": f"{data_location}/valid/",
    }
)

2022-10-18 08:15:09 Starting - Starting the training job...
2022-10-18 08:15:34 Starting - Preparing the instances for trainingProfilerReport-1666080908: InProgress
.........
2022-10-18 08:17:14 Downloading - Downloading input data...
2022-10-18 08:17:34 Training - Downloading the training image....................................
== Triton Inference Server Base ==
NVIDIA Release 22.08 (build 42766143)
Copyright (c) 2018-2022, NVIDIA CORPORATION & AFFILIATES.  All rights reserved.
Various files include modifications (c) NVIDIA CORPORATION & AFFILIATES.  All rights reserved.
This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license
NOTE: CUDA Forward Compatibility mode ENABLED.
  Using CUDA 11.7 driver version 515.65.01 wi

In [13]:
print(estimator.model_data)

s3://sagemaker-us-east-1-843263297212/sagemaker-merlin-tensorflow-2022-10-18-08-15-06-593/output/model.tar.gz


In [14]:
! aws s3 cp {estimator.model_data} /tmp/ensemble/

download: s3://sagemaker-us-east-1-843263297212/sagemaker-merlin-tensorflow-2022-10-18-08-15-06-593/output/model.tar.gz to ../../../../tmp/ensemble/model.tar.gz


In [15]:
! tar xvzf /tmp/ensemble/model.tar.gz -C /tmp/ensemble/

ensemble_model/
ensemble_model/1/
ensemble_model/config.pbtxt
1_predicttensorflow/
1_predicttensorflow/1/
1_predicttensorflow/1/model.savedmodel/
1_predicttensorflow/1/model.savedmodel/keras_metadata.pb
1_predicttensorflow/1/model.savedmodel/assets/
1_predicttensorflow/1/model.savedmodel/variables/
1_predicttensorflow/1/model.savedmodel/variables/variables.data-00000-of-00001
1_predicttensorflow/1/model.savedmodel/variables/variables.index
1_predicttensorflow/1/model.savedmodel/saved_model.pb
1_predicttensorflow/config.pbtxt
0_transformworkflow/
0_transformworkflow/1/
0_transformworkflow/1/model.py
0_transformworkflow/1/workflow/
0_transformworkflow/1/workflow/categories/
0_transformworkflow/1/workflow/categories/unique.item_shop.parquet
0_transformworkflow/1/workflow/categories/unique.item_category.parquet
0_transformworkflow/1/workflow/categories/unique.user_categories.parquet
0_transformworkflow/1/workflow/categories/unique.item_brand.parquet
0_transformworkflow/1/workflow/categorie
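
Judging by the listing, the archive is a Triton model repository: `ensemble_model` plus the `0_transformworkflow` (NVTabular workflow) and `1_predicttensorflow` (TensorFlow SavedModel) steps. A Python alternative to the `tar` call, which extracts into an explicit destination and lists the model directories, could look like this sketch. Treating "has a `config.pbtxt`" as "is a Triton model" is our heuristic.

```python
import os
import tarfile
from typing import List


def extract_model_repository(archive: str, dest: str) -> List[str]:
    """Extract model.tar.gz into `dest` and return the Triton model names found.

    Only extract archives you trust: tar members can contain unsafe paths.
    """
    os.makedirs(dest, exist_ok=True)
    with tarfile.open(archive, "r:gz") as tar:
        tar.extractall(dest)
    # A Triton model directory is recognised here by its config.pbtxt file.
    return sorted(
        name
        for name in os.listdir(dest)
        if os.path.isfile(os.path.join(dest, name, "config.pbtxt"))
    )
```

For example, `extract_model_repository("/tmp/ensemble/model.tar.gz", "/tmp/ensemble")` would place the workflow where the following cells load it from.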

## Retrieving Recommendations from Triton Inference Server


In [16]:
import time

import boto3

sm_client = boto3.client(service_name="sagemaker")

container = {
    "Image": ecr_image,
    "ModelDataUrl": estimator.model_data,
    "Environment": {
        "SAGEMAKER_TRITON_DEFAULT_MODEL_NAME": "ensemble_model",
    },
}

model_name = "model-triton-merlin-ensemble-" + time.strftime("%Y-%m-%d-%H-%M-%S", time.gmtime())

create_model_response = sm_client.create_model(
    ModelName=model_name, ExecutionRoleArn=role, PrimaryContainer=container
)

model_arn = create_model_response["ModelArn"]

print(f"Model Arn: {model_arn}")

Model Arn: arn:aws:sagemaker:us-east-1:843263297212:model/model-triton-merlin-ensemble-2022-10-18-08-27-45


In [17]:
endpoint_instance_type = "ml.g4dn.xlarge"
endpoint_config_name = "endpoint-config-triton-merlin-ensemble-" + time.strftime("%Y-%m-%d-%H-%M-%S", time.gmtime())

create_endpoint_config_response = sm_client.create_endpoint_config(
    EndpointConfigName=endpoint_config_name,
    ProductionVariants=[
        {
            "InstanceType": endpoint_instance_type,
            "InitialVariantWeight": 1,
            "InitialInstanceCount": 1,
            "ModelName": model_name,
            "VariantName": "AllTraffic",
        }
    ],
)

endpoint_config_arn = create_endpoint_config_response["EndpointConfigArn"]

print(f"Endpoint Config Arn: {endpoint_config_arn}")

Endpoint Config Arn: arn:aws:sagemaker:us-east-1:843263297212:endpoint-config/endpoint-config-triton-merlin-ensemble-2022-10-18-08-27-46


In [18]:
endpoint_name = "endpoint-triton-merlin-ensemble-" + time.strftime("%Y-%m-%d-%H-%M-%S", time.gmtime())

create_endpoint_response = sm_client.create_endpoint(
    EndpointName=endpoint_name, EndpointConfigName=endpoint_config_name
)

endpoint_arn = create_endpoint_response["EndpointArn"]

print(f"Endpoint Arn: {endpoint_arn}")

Endpoint Arn: arn:aws:sagemaker:us-east-1:843263297212:endpoint/endpoint-triton-merlin-ensemble-2022-10-18-08-27-46


In [19]:
rv = sm_client.describe_endpoint(EndpointName=endpoint_name)
status = rv["EndpointStatus"]
print(f"Endpoint Creation Status: {status}")

while status == "Creating":
    time.sleep(60)
    rv = sm_client.describe_endpoint(EndpointName=endpoint_name)
    status = rv["EndpointStatus"]
    print(f"Endpoint Creation Status: {status}")

# rv holds the latest DescribeEndpoint response, even if the loop never ran.
endpoint_arn = rv["EndpointArn"]

print(f"Endpoint Arn: {endpoint_arn}")
print(f"Endpoint Status: {status}")

Endpoint Creation Status: Creating
Endpoint Creation Status: Creating
Endpoint Creation Status: Creating
Endpoint Creation Status: Creating
Endpoint Creation Status: Creating
Endpoint Creation Status: Creating
Endpoint Creation Status: Creating
Endpoint Creation Status: InService
Endpoint Arn: arn:aws:sagemaker:us-east-1:843263297212:endpoint/endpoint-triton-merlin-ensemble-2022-10-18-08-27-46
Endpoint Status: InService
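
The polling loop above waits indefinitely and does not distinguish `InService` from `Failed`. A sketch of a more defensive variant, with a timeout and an explicit failure check, is below. The function name and defaults are our own; `describe` is any zero-argument callable returning a `DescribeEndpoint`-style dict, and `sleep` is injectable for testing.

```python
import time
from typing import Callable


def wait_for_endpoint(
    describe: Callable[[], dict],
    poll_seconds: float = 60.0,
    timeout_seconds: float = 3600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll `describe()` until EndpointStatus leaves 'Creating'/'Updating'.

    Returns the last response; raises RuntimeError on a non-InService final
    status and TimeoutError if the endpoint is still pending after the timeout.
    """
    waited = 0.0
    resp = describe()
    while resp["EndpointStatus"] in ("Creating", "Updating"):
        if waited >= timeout_seconds:
            raise TimeoutError(f"endpoint still {resp['EndpointStatus']} after {waited:.0f}s")
        sleep(poll_seconds)
        waited += poll_seconds
        resp = describe()
    if resp["EndpointStatus"] != "InService":
        raise RuntimeError(f"endpoint ended in {resp['EndpointStatus']}: {resp.get('FailureReason', '')}")
    return resp
```

With the clients above, this would be called as `wait_for_endpoint(lambda: sm_client.describe_endpoint(EndpointName=endpoint_name))`. boto3 also provides a built-in waiter for the same purpose: `sm_client.get_waiter("endpoint_in_service").wait(EndpointName=endpoint_name)`.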


In [21]:
from merlin.schema.tags import Tags
from merlin.core.dispatch import get_lib
from nvtabular.workflow import Workflow

df_lib = get_lib()

original_data_path = DATA_DIRECTORY
workflow = Workflow.load("/tmp/ensemble/0_transformworkflow/1/workflow/")

label_columns = workflow.output_schema.select_by_tag(Tags.TARGET).column_names
workflow.remove_inputs(label_columns)

# read in data for request
batch = df_lib.read_parquet(
    os.path.join(original_data_path, "valid", "part.0.parquet"),
    columns=workflow.input_schema.column_names
)[:3]
print(batch)

                     user_id  item_id  item_category  item_shop  item_brand  \
__null_dask_index__                                                           
700000                    23       23             66       4590        1581   
700001                    11       10             27       1878         647   
700002                    30       25             72       5007        1725   

                     user_shops  user_profile  user_group  user_gender  \
__null_dask_index__                                                      
700000                     1024             1           1            1   
700001                      466             1           1            1   
700002                     1349             2           1            1   

                     user_age  user_consumption_2  user_is_occupied  \
__null_dask_index__                                                   
700000                      1                   1                 1   
700001              

In [22]:
from merlin.systems.triton import convert_df_to_triton_input
import tritonclient.http as httpclient


inputs = convert_df_to_triton_input(workflow.input_schema, batch, httpclient.InferInput)

request_body, header_length = httpclient.InferenceServerClient.generate_request_body(inputs)

print(request_body)

b'{"inputs":[{"name":"user_id","shape":[3,1],"datatype":"INT32","parameters":{"binary_data_size":12}},{"name":"item_id","shape":[3,1],"datatype":"INT32","parameters":{"binary_data_size":12}},{"name":"item_category","shape":[3,1],"datatype":"INT32","parameters":{"binary_data_size":12}},{"name":"item_shop","shape":[3,1],"datatype":"INT32","parameters":{"binary_data_size":12}},{"name":"item_brand","shape":[3,1],"datatype":"INT32","parameters":{"binary_data_size":12}},{"name":"user_shops","shape":[3,1],"datatype":"INT32","parameters":{"binary_data_size":12}},{"name":"user_profile","shape":[3,1],"datatype":"INT32","parameters":{"binary_data_size":12}},{"name":"user_group","shape":[3,1],"datatype":"INT32","parameters":{"binary_data_size":12}},{"name":"user_gender","shape":[3,1],"datatype":"INT32","parameters":{"binary_data_size":12}},{"name":"user_age","shape":[3,1],"datatype":"INT32","parameters":{"binary_data_size":12}},{"name":"user_consumption_2","shape":[3,1],"datatype":"INT32","paramet

In [23]:
runtime_sm_client = boto3.client("sagemaker-runtime")

response = runtime_sm_client.invoke_endpoint(
    EndpointName=endpoint_name,
    ContentType=f"application/vnd.sagemaker-triton.binary+json;json-header-size={header_length}",
    Body=request_body,
)

# Parse json header size length from the response
header_length_prefix = "application/vnd.sagemaker-triton.binary+json;json-header-size="
header_length_str = response["ContentType"][len(header_length_prefix) :]

# Read response body
result = httpclient.InferenceServerClient.parse_response_body(
    response["Body"].read(), header_length=int(header_length_str)
)
output_data = result.as_numpy("click/binary_classification_task")
print(output_data)

[[0.5311371 ]
 [0.51526797]
 [0.4596776 ]]
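
The request and response above use the `binary+json` framing: the `json-header-size` value in the content type says where the JSON header ends and the raw tensor bytes begin. `parse_response_body` handles this for you; purely as an illustration, here is a stdlib sketch that decodes FP32 outputs by hand. It assumes each output's bytes follow the header in the order the outputs are listed, with sizes given by `parameters.binary_data_size` (as in the request header printed earlier), and that the floats are little-endian, as on x86 hosts.

```python
import json
import struct
from typing import Dict, List

PREFIX = "application/vnd.sagemaker-triton.binary+json;json-header-size="


def header_size_from_content_type(content_type: str) -> int:
    """Extract N from '...;json-header-size=N'."""
    if not content_type.startswith(PREFIX):
        raise ValueError(f"unexpected content type: {content_type}")
    return int(content_type[len(PREFIX):])


def split_fp32_outputs(body: bytes, header_size: int) -> Dict[str, List[float]]:
    """Decode FP32 outputs from a binary+json body (illustrative, FP32 only)."""
    header = json.loads(body[:header_size])
    offset = header_size
    decoded = {}
    for out in header["outputs"]:
        size = out["parameters"]["binary_data_size"]
        chunk = body[offset : offset + size]
        offset += size
        # 4 bytes per little-endian float32 value.
        decoded[out["name"]] = list(struct.unpack(f"<{size // 4}f", chunk))
    return decoded
```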


## Terminate endpoint and clean up artifacts

In [24]:
sm_client.delete_model(ModelName=model_name)
sm_client.delete_endpoint_config(EndpointConfigName=endpoint_config_name)
sm_client.delete_endpoint(EndpointName=endpoint_name)

{'ResponseMetadata': {'RequestId': '4c4d2158-9744-4eff-88ee-725e66bd12e9',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'x-amzn-requestid': '4c4d2158-9744-4eff-88ee-725e66bd12e9',
   'content-type': 'application/x-amz-json-1.1',
   'content-length': '0',
   'date': 'Tue, 18 Oct 2022 08:36:27 GMT'},
  'RetryAttempts': 0}}

## Next steps

- Polish notebook.
- Hyperparameter tuning: SageMaker parses the training logs with regular expressions to extract metrics. Write a regex for parsing the logs.
- SageMaker Feature Store
- kNN using SageMaker algorithms
- Distributed/multi-GPU training
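
As a starting point for the regex item above: the SageMaker `Estimator` accepts a `metric_definitions` list of `{"Name": ..., "Regex": ...}` dicts, where the first capture group of each regex is the metric value. The sketch below assumes a Keras-style log line such as `loss: 0.6931 - auc: 0.5012 - val_loss: ...`; the actual metric names printed by `train.py` are not shown in this notebook, so the patterns are hypothetical and should be checked against real job logs.

```python
import re

# Hypothetical metric definitions for a Keras-style training log line.
# The leading space in " loss:" / " auc:" keeps them from matching "val_loss:" / "val_auc:".
METRIC_DEFINITIONS = [
    {"Name": "train:loss", "Regex": r" loss: ([0-9.]+)"},
    {"Name": "train:auc", "Regex": r" auc: ([0-9.]+)"},
    {"Name": "valid:loss", "Regex": r"val_loss: ([0-9.]+)"},
    {"Name": "valid:auc", "Regex": r"val_auc: ([0-9.]+)"},
]


def extract_metrics(log_line: str) -> dict:
    """Apply each regex to one log line, returning the first captured value per metric."""
    found = {}
    for metric in METRIC_DEFINITIONS:
        match = re.search(metric["Regex"], log_line)
        if match:
            found[metric["Name"]] = float(match.group(1))
    return found
```

Once the patterns match the real logs locally, the same list can be passed as `Estimator(..., metric_definitions=METRIC_DEFINITIONS)`.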