**<div style='font-size:200%'>Batch Transform using the gluonts entrypoint</div>**

In this notebook, we first register a model artifact as a SageMaker model, then run a batch transform job against it. Optionally, we deregister the model at the end.

In [None]:
%matplotlib inline
%load_ext autoreload
%autoreload 2
%config InlineBackend.figure_format = 'retina'

import logging
import sagemaker as sm
from sagemaker.mxnet.model import MXNetModel

from smallmatter.sm import get_sm_execution_role, get_model_and_output_tgz

# smallmatter.sm.get_sm_execution_role() will:
# - on a SageMaker classic notebook instance, simply call sagemaker.get_execution_role()
# - outside of a SageMaker classic notebook instance, return the first role whose name
#   starts with "AmazonSageMaker-ExecutionRole-"
role: str = get_sm_execution_role()

sess = sm.Session()
region: str = sess.boto_session.region_name

# Global config

In [None]:
bucket = 'BUCKETNAME'

# I/O S3 paths MUST have trailing '/'
bt_input = f's3://{bucket}/gluonts-examples-dataset/synthetic-dataset/test/'   # Reuse test-split from notebook 01.
bt_output = f's3://{bucket}/bt_output/'

# Use artifacts from this training job.
train_job = "mxnet-training-2021-09-29-08-04-10-326"
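
The trailing-slash requirement noted in the cell above can be enforced rather than remembered. The helper below is a minimal sketch; the name `as_s3_prefix` is hypothetical and not part of the SageMaker SDK or `smallmatter`.

```python
def as_s3_prefix(uri: str) -> str:
    """Return `uri` with exactly one trailing '/', after a basic S3 URI check."""
    scheme = "s3://"
    # Require the scheme and a non-empty bucket name.
    if not uri.startswith(scheme) or not uri[len(scheme):].split("/", 1)[0]:
        raise ValueError(f"Not an S3 URI with a bucket: {uri!r}")
    return uri.rstrip("/") + "/"


print(as_s3_prefix("s3://BUCKETNAME/bt_output"))   # slash is appended
print(as_s3_prefix("s3://BUCKETNAME/bt_output/"))  # already normalized
```

Wrapping `bt_input` and `bt_output` in such a helper would make a missing slash fail early instead of surfacing as a confusing transform-job error.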

# Observe training results

As in any SageMaker training job, the entrypoint script generates two artifacts in S3: `model.tar.gz` and `output.tar.gz`.

The `model.tar.gz` contains the persisted model that can be used later on for inference.

The `output.tar.gz` contains the following:
- individual plot of each test timeseries
- montage of plots of all test timeseries
- backtest evaluation metrics.
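
To inspect such an archive from Python rather than from the shell, a minimal sketch using the standard-library `tarfile` module could look like the following. The archive is built in memory so the example runs offline, and its member names are hypothetical stand-ins, not the actual contents of `output.tar.gz`.

```python
import io
import tarfile


def list_tgz_members(fileobj):
    """Return the member names of a gzip-compressed tarball read as a stream."""
    # Mode "r|gz" reads sequentially, so non-seekable streams work as well.
    with tarfile.open(fileobj=fileobj, mode="r|gz") as tar:
        return [member.name for member in tar]


# Build a tiny stand-in archive in memory (file names are hypothetical).
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w:gz") as tar:
    for name, payload in [("agg_metrics.json", b"{}"), ("plots/ts-000.png", b"")]:
        info = tarfile.TarInfo(name=name)
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))
buf.seek(0)

names = list_tgz_members(buf)
print(names)
```

Because the function only needs a file-like object with `read()`, it should also accept a streamed S3 download, although that path is untested here.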

In [None]:
model_tgz, output_tgz = (str(path) for path in get_model_and_output_tgz(train_job))

%set_env MODEL_S3=$model_tgz
%set_env OUTPUT_S3=$output_tgz

In [None]:
%%bash
echo -e "\nModel artifacts $MODEL_S3:"
aws s3 cp $MODEL_S3 - | tar -tzvf -

echo -e "\nOutput $OUTPUT_S3:"
aws s3 cp $OUTPUT_S3 - | tar -tzvf - | head  # NOTE: "[Errno 32] Broken pipe" can be safely ignored.

# Create model

Let the SDK auto-generate the new model name, so that this notebook can be safely re-run.

In [None]:
mxnet_model = MXNetModel(
    model_data=model_tgz,
    role=role,
    entry_point='inference.py',
    source_dir='../src/entrypoint',
    py_version="py3",
    framework_version="1.7.0",
    sagemaker_session=sess,
    container_log_level=logging.DEBUG,   # Comment out this line to reduce the amount of logs in CloudWatch.
)
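
Re-running is safe because every run gets a fresh, unique name. The sketch below only illustrates that idea by mimicking the timestamp format visible in `train_job`; it does not reproduce the SDK's own naming routine. The function name and the 63-character limit are assumptions made for illustration.

```python
from datetime import datetime, timezone


def timestamped_name(base, now=None, max_length=63):
    """Append a millisecond-resolution UTC timestamp to `base`."""
    now = now or datetime.now(timezone.utc)
    stamp = now.strftime("%Y-%m-%d-%H-%M-%S-") + f"{now.microsecond // 1000:03d}"
    # Trim the base so that base + '-' + stamp fits within max_length.
    trimmed = base[: max_length - len(stamp) - 1]
    return f"{trimmed}-{stamp}"


example = timestamped_name("mxnet-training", datetime(2021, 9, 29, 8, 4, 10, 326000))
print(example)  # mxnet-training-2021-09-29-08-04-10-326
```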

A bit of reverse engineering to confirm the environment variables that the model will end up using. This will be useful when I need to do all of this directly in boto3 or botocore.

In [None]:
# Before create model
mxnet_model._framework_env_vars()

In [None]:
# Create model
mxnet_model._create_sagemaker_model(instance_type='ml.m5.xlarge')

In [None]:
# Model name
mxnet_model.name

In [None]:
# After create model
mxnet_model._framework_env_vars()

In [None]:
# Peek into model's model.tar.gz (which is different from training artifact model.tar.gz).
model_s3 = mxnet_model._framework_env_vars()['SAGEMAKER_SUBMIT_DIRECTORY']
%set_env MODEL_S3=$model_s3
!aws s3 cp $MODEL_S3 - | tar -tzvf -
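
For the boto3 route mentioned earlier, the environment variables inspected above would end up in the `PrimaryContainer.Environment` field of a `create_model` request. The sketch below only assembles such a request and does not call AWS. The model name, role ARN, image URI, S3 paths, region, and the exact set of environment keys are placeholders or assumptions; confirm them against the output of the cells above.

```python
def build_create_model_request(name, role_arn, image_uri, model_data_url, env):
    """Assemble keyword arguments for boto3's SageMaker `create_model` call."""
    return {
        "ModelName": name,
        "ExecutionRoleArn": role_arn,
        "PrimaryContainer": {
            "Image": image_uri,
            "ModelDataUrl": model_data_url,
            # The API expects environment values as strings.
            "Environment": {key: str(value) for key, value in env.items()},
        },
    }


request = build_create_model_request(
    name="my-gluonts-model",  # hypothetical
    role_arn="arn:aws:iam::123456789012:role/AmazonSageMaker-ExecutionRole-EXAMPLE",
    image_uri="<mxnet-inference-image-uri>",  # placeholder
    model_data_url="s3://BUCKETNAME/path/to/model.tar.gz",  # placeholder
    env={
        # Assumed keys; compare with mxnet_model._framework_env_vars().
        "SAGEMAKER_PROGRAM": "inference.py",
        "SAGEMAKER_SUBMIT_DIRECTORY": "s3://BUCKETNAME/path/to/model.tar.gz",
        "SAGEMAKER_CONTAINER_LOG_LEVEL": 10,
        "SAGEMAKER_REGION": "us-east-1",
    },
)
print(request["PrimaryContainer"]["Environment"])

# With credentials configured, the request could then be sent with:
# boto3.client("sagemaker").create_model(**request)
```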

# Batch Transform

In [None]:
instance_type = 'ml.m5.4xlarge'

# By default, GluonTS runs inference on multiple cores.
# On ml.m5.4xlarge with 8 CPU cores (= vcpu_count / 2), a single request
# already showed 75% CPU utilization (as seen in CloudWatch metrics; measured
# with gluonts-0.5).
#
# Note that this number is specific to the DeepAR example in gluonts-0.5.
# Other algorithms and gluonts versions may need different configurations.
max_concurrent_transforms = 1

bt = mxnet_model.transformer(
    instance_count=1,
    instance_type=instance_type,
    strategy='MultiRecord',
    assemble_with='Line',
    output_path=bt_output,
    accept='application/json',
    env={
        'SAGEMAKER_MODEL_SERVER_TIMEOUT': '3600',
        'SAGEMAKER_MODEL_SERVER_WORKERS': str(max_concurrent_transforms),
    },
    max_payload=1,
    max_concurrent_transforms=max_concurrent_transforms,
)

In [None]:
bt.base_transform_job_name

In [None]:
# Setting wait=False frees this notebook from getting blocked by the
# transform job. Pass it explicitly: SageMaker Python SDK v2 defaults to wait=True.
bt.transform(
    data=bt_input,
    data_type='S3Prefix',
    content_type='application/json',
    split_type='Line',
    join_source='Input',
    output_filter='$',
    wait=False,
    logs=False,
)
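
With `join_source='Input'` and JSON records, each output line should contain the original input record plus the model's response under a `SageMakerOutput` key, and `output_filter='$'` keeps the whole joined record. The sketch below mimics that join locally to show the expected shape. The `start` and `target` fields follow the usual GluonTS record layout, while the forecast payload is hypothetical, since its real structure depends on `inference.py`.

```python
import json


def join_with_input(input_line: str, model_output: dict) -> str:
    """Mimic the JSON join: the input record gains a `SageMakerOutput` key."""
    record = json.loads(input_line)
    record["SageMakerOutput"] = model_output
    return json.dumps(record)


# One line of the JSON Lines input (split_type='Line' sends it as one record).
input_line = json.dumps({"start": "2021-01-01 00:00:00", "target": [1.0, 2.0, 3.0]})
fake_forecast = {"mean": [3.5, 4.0]}  # hypothetical model response

joined_line = join_with_input(input_line, fake_forecast)
print(joined_line)
```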

With `wait=False` (set explicitly, since SageMaker Python SDK v2 defaults to `wait=True`), the call returns as soon as the job is submitted. While the transform job is running, you may shut down this notebook's kernel, close this notebook, and go to the SageMaker console to monitor the batch-transform progress. The job's console page also contains links to its CloudWatch logs.

Once the job finishes, follow the link on the batch-transform job's console page to the S3 output location, where you can preview or download the output.
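
As an alternative to watching the console, the job status can be polled programmatically. The sketch below loops over `describe_transform_job` until a terminal status is reached. To stay runnable offline it uses a `FakeClient` stand-in; the job name, helper name, and polling interval are hypothetical.

```python
import time

TERMINAL_STATES = {"Completed", "Failed", "Stopped"}


def wait_for_transform_job(client, job_name, poll_seconds=30, sleep=time.sleep):
    """Poll describe_transform_job until the job reaches a terminal status."""
    while True:
        desc = client.describe_transform_job(TransformJobName=job_name)
        status = desc["TransformJobStatus"]
        if status in TERMINAL_STATES:
            return status
        sleep(poll_seconds)


class FakeClient:
    """Stand-in for boto3.client("sagemaker") so the sketch runs offline."""

    def __init__(self, statuses):
        self._statuses = iter(statuses)

    def describe_transform_job(self, TransformJobName):
        return {
            "TransformJobName": TransformJobName,
            "TransformJobStatus": next(self._statuses),
        }


final_status = wait_for_transform_job(
    FakeClient(["InProgress", "InProgress", "Completed"]),
    job_name="my-transform-job",  # hypothetical
    poll_seconds=0,
)
print(final_status)  # Completed
```

With real credentials, `FakeClient(...)` would be replaced by `boto3.client("sagemaker")` and the job name by that of the submitted transform job.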

# Delete model

Uncomment and execute the cell below to "deregister" the model from SageMaker. The inference model artifacts remain untouched in S3.

In [None]:
#mxnet_model.delete_model()