**<div style='font-size:200%'>Batch Transform using the sm-gluonts entrypoint</div>**

This notebook registers a trained model artifact as a SageMaker model, runs a batch transform job against it, and optionally deregisters the model afterwards.

In [3]:
%matplotlib inline
%load_ext autoreload
%autoreload 2
%config InlineBackend.figure_format = 'retina'

import logging
import sagemaker as sm
from sagemaker.mxnet.model import MXNetModel

role: str = sm.get_execution_role()    # When running on SageMaker notebook instance.
sess = sm.Session()
region: str = sess.boto_session.region_name

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload


# Global config

In [4]:
# I/O S3 paths MUST have trailing '/'
# Good data from yesterday (20200924)
#
# Today's data 20200925 from:
# s3://app01-nvsgisrssr07-be-sh-modelartifactbucketdd4cb-9qo6k8mkeset/planning/inference-runs-NIBR-Cambridge/weekly/inference-230911/preprocess-output/DeepAR-NGB-LSTM-XGB/

bt_input = 's3://BUCKET/BT_INPUT/'
bt_output = 's3://BUCKET/BT_OUTPUT/'
data_json = 'fcast-input.json'

train_model_artifact = "model_s3_FROM_NOTEBOOK_02-hpo-train"

%set_env BT_INPUT=$bt_input
%set_env BT_OUTPUT=$bt_output
%set_env DATA_JSON=$data_json

env: BT_INPUT=s3://app01-nvsgisrssr07-be-sh-modelartifactbucketdd4cb-9qo6k8mkeset/planning/inference-runs-NIBR-Cambridge/weekly/inference-23232/pogr/preprocess-output/
env: BT_OUTPUT=s3://app01-nvsgisrssr07-be-sh-modelartifactbucketdd4cb-9qo6k8mkeset/planning/inference-runs-NIBR-Cambridge/weekly/inference-23232/pogr/batch-output/DeepAR/
env: DATA_JSON=fcast-input.json
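The trailing-slash requirement matters because later cells build the object path by plain concatenation (`${BT_INPUT}${DATA_JSON}`). A minimal sketch of a guard that could be applied before `%set_env`; the helper name `ensure_trailing_slash` is hypothetical and not part of this repository:

```python
def ensure_trailing_slash(s3_uri: str) -> str:
    """Return the S3 prefix with exactly one trailing '/'.

    Hypothetical helper: it only normalizes the string and does not
    check that the bucket or prefix exists.
    """
    scheme = "s3://"
    if not s3_uri.startswith(scheme):
        raise ValueError(f"Not an S3 URI: {s3_uri!r}")
    rest = s3_uri[len(scheme):]
    if not rest.strip("/"):
        raise ValueError(f"S3 URI has no bucket: {s3_uri!r}")
    return scheme + rest.rstrip("/") + "/"


# Placeholder prefix, same style as the cell above.
bt_input_checked = ensure_trailing_slash("s3://BUCKET/BT_INPUT")
```

With this guard, `bt_input_checked + data_json` always yields a well-formed object path, whether or not the prefix was typed with a trailing slash.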


In [5]:
train_model_artifact

's3://app01-nvsgisrssr07-be-sh-modelartifactbucketdd4cb-9qo6k8mkeset/planning/training-runs-NIBR-Cambridge/weekly/training-23232/pogr/train-output/DeepAR/nibr-deepar-tuning-23232-pogr-001-7061ec72/repacked/model.tar.gz'

# Create model

Let the SDK auto-generate the model name, so that this notebook can safely be re-run.

In [6]:
mxnet_model = MXNetModel(
    model_data=train_model_artifact,
    role=role,
    entry_point='entrypoint.py',
    source_dir='../../src/entrypoint',
    py_version="py3",
    framework_version="1.6.0",
    sagemaker_session=sess,
    container_log_level=logging.DEBUG,
)

Parameter image will be renamed to image_uri in SageMaker Python SDK v2.


A bit of reverse engineering to confirm the environment variables that the model will end up using. This will be useful if these steps ever need to be reproduced directly with boto3 or botocore.

In [7]:
# Before create model
mxnet_model._framework_env_vars()

{'SAGEMAKER_PROGRAM': 'entrypoint.py',
 'SAGEMAKER_SUBMIT_DIRECTORY': 'file://../../src/sm_gluonts',
 'SAGEMAKER_ENABLE_CLOUDWATCH_METRICS': 'false',
 'SAGEMAKER_CONTAINER_LOG_LEVEL': '10',
 'SAGEMAKER_REGION': 'us-east-1'}

In [8]:
# Create model
mxnet_model._create_sagemaker_model(instance_type='ml.m5.xlarge')

'create_image_uri' will be deprecated in favor of 'ImageURIProvider' class in SageMaker Python SDK v2.


In [9]:
# Model name
mxnet_model.name

'mxnet-inference-2020-10-07-05-46-29-183'

In [10]:
mxnet_model._framework_env_vars()

{'SAGEMAKER_PROGRAM': 'entrypoint.py',
 'SAGEMAKER_SUBMIT_DIRECTORY': 's3://sagemaker-us-east-1-925515096152/mxnet-inference-2020-10-07-05-46-28-631/model.tar.gz',
 'SAGEMAKER_ENABLE_CLOUDWATCH_METRICS': 'false',
 'SAGEMAKER_CONTAINER_LOG_LEVEL': '10',
 'SAGEMAKER_REGION': 'us-east-1'}
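As a rough illustration of what the boto3 equivalent might look like, the sketch below assembles a request for the SageMaker `create_model` API from environment variables like the ones printed above. The function name, the image URI, the role ARN, and the S3 paths are all placeholders (assumptions), not values used by this notebook:

```python
from typing import Dict


def build_create_model_request(
    model_name: str,
    role_arn: str,
    image_uri: str,
    model_data_url: str,
    env: Dict[str, str],
) -> dict:
    """Assemble keyword arguments for the boto3 SageMaker create_model call.

    Hypothetical helper: it only builds the request dict and makes no AWS call.
    """
    return {
        "ModelName": model_name,
        "ExecutionRoleArn": role_arn,
        "PrimaryContainer": {
            "Image": image_uri,
            "ModelDataUrl": model_data_url,
            "Environment": dict(env),  # copy so the caller's dict is not shared
        },
    }


# Environment variables in the shape reported by _framework_env_vars();
# the S3 location is a placeholder.
env_vars = {
    "SAGEMAKER_PROGRAM": "entrypoint.py",
    "SAGEMAKER_SUBMIT_DIRECTORY": "s3://BUCKET/PREFIX/model.tar.gz",
    "SAGEMAKER_ENABLE_CLOUDWATCH_METRICS": "false",
    "SAGEMAKER_CONTAINER_LOG_LEVEL": "10",
    "SAGEMAKER_REGION": "us-east-1",
}

request = build_create_model_request(
    model_name="MODEL_NAME",        # placeholder
    role_arn="ROLE_ARN",            # placeholder
    image_uri="IMAGE_URI",          # placeholder: MXNet inference image
    model_data_url="s3://BUCKET/PREFIX/model.tar.gz",  # placeholder
    env=env_vars,
)
```

The resulting dict could then be passed as `boto3.client("sagemaker").create_model(**request)`. Whether that reproduces everything the SDK does here (for example, repacking the artifact with the `code/` directory) has not been verified.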

In [11]:
# Peek into model's model.tar.gz (which is different from training artifact model.tar.gz).
model_s3 = mxnet_model._framework_env_vars()['SAGEMAKER_SUBMIT_DIRECTORY']
%set_env MODEL_S3=$model_s3
!aws s3 cp $MODEL_S3 - | tar -tzvf -

env: MODEL_S3=s3://sagemaker-us-east-1-925515096152/mxnet-inference-2020-10-07-05-46-28-631/model.tar.gz
tar: Removing leading `/' from member names
drwxr-xr-x ec2-user/ec2-user 0 2020-10-07 05:46 /
-rw-r--r-- ec2-user/ec2-user 51 2020-10-05 11:21 type.txt
-rw-r--r-- ec2-user/ec2-user 161067 2020-10-05 11:21 prediction_net-0000.params
drwxrwxr-x ec2-user/ec2-user      0 2020-10-07 05:38 code/
-rw-rw-r-- ec2-user/ec2-user   2604 2020-09-25 13:47 code/metrics.py
-rw-r--r-- ec2-user/ec2-user  18241 2020-10-07 05:00 code/inf_deep_NIBR_cam_entrypoint.py
-rw-rw-r-- ec2-user/ec2-user  20593 2020-09-25 14:18 code/entrypoint-Copy1.py
-rw-rw-r-- ec2-user/ec2-user  14689 2020-09-25 13:47 code/evaluator.py
-rw-rw-r-- ec2-user/ec2-user     26 2020-09-17 08:30 code/requirements.txt
-rw-rw-r-- ec2-user/ec2-user   4270 2020-09-25 13:47 code/sm_util.py
-rw-rw-r-- ec2-user/ec2-user  18683 2020-10-07 05:38 code/entrypoint.py
drwxrwxr-x ec2-user/ec2-user      0 2020-09-25 14:18 code/.ipynb_checkpoints/
-r
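The same peek can be done without the shell, which may be handy once this workflow moves out of the notebook. A minimal sketch using only the standard library; `list_tar_members` is a hypothetical helper, and how the archive bytes are fetched is left to the caller:

```python
import io
import tarfile
from typing import List


def list_tar_members(tar_bytes: bytes) -> List[str]:
    """Return the member names of a gzip-compressed tar archive held in memory."""
    with tarfile.open(fileobj=io.BytesIO(tar_bytes), mode="r:gz") as tar:
        return [member.name for member in tar.getmembers()]
```

The bytes could come from, for example, `boto3.client("s3").get_object(Bucket=..., Key=...)["Body"].read()`. For a large artifact, streaming it the way the `aws s3 cp ... -` pipeline does would use less memory.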

# Batch Transform

In [12]:
# Batch Transform
bt = mxnet_model.transformer(
    instance_count=1,
    instance_type='ml.m5.4xlarge',
    strategy='MultiRecord',
    assemble_with='Line',
    output_path=bt_output,
    accept='application/json',
    env={'SAGEMAKER_MODEL_SERVER_TIMEOUT': '3600'},
    max_concurrent_transforms=8,
    max_payload=1,
)

'create_image_uri' will be deprecated in favor of 'ImageURIProvider' class in SageMaker Python SDK v2.
Using already existing model: mxnet-inference-2020-10-07-05-46-29-183


In [13]:
bt.base_transform_job_name

'mxnet-inference-2020-10-07-05-46-29-183'
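For reference, the transformer settings above correspond fairly directly to fields of the low-level `CreateTransformJob` API. The sketch below builds such a request as a plain dict. The function name and the job name, model name, and S3 prefixes are placeholders; the input-side values (`S3Prefix`, `application/json`, `Line`) are the ones this notebook passes when it starts the job:

```python
def build_transform_job_request(
    job_name: str,
    model_name: str,
    input_s3_prefix: str,
    output_s3_prefix: str,
) -> dict:
    """Assemble keyword arguments for the boto3 SageMaker create_transform_job call.

    Hypothetical helper: it only builds the request dict and makes no AWS call.
    """
    return {
        "TransformJobName": job_name,
        "ModelName": model_name,
        "BatchStrategy": "MultiRecord",     # strategy=
        "MaxConcurrentTransforms": 8,       # max_concurrent_transforms=
        "MaxPayloadInMB": 1,                # max_payload=
        "Environment": {"SAGEMAKER_MODEL_SERVER_TIMEOUT": "3600"},  # env=
        "TransformInput": {
            "DataSource": {
                "S3DataSource": {
                    "S3DataType": "S3Prefix",
                    "S3Uri": input_s3_prefix,
                }
            },
            "ContentType": "application/json",
            "SplitType": "Line",
        },
        "TransformOutput": {
            "S3OutputPath": output_s3_prefix,   # output_path=
            "Accept": "application/json",       # accept=
            "AssembleWith": "Line",             # assemble_with=
        },
        "TransformResources": {
            "InstanceType": "ml.m5.4xlarge",    # instance_type=
            "InstanceCount": 1,                 # instance_count=
        },
    }


job_request = build_transform_job_request(
    job_name="JOB_NAME",                      # placeholder
    model_name="MODEL_NAME",                  # placeholder
    input_s3_prefix="s3://BUCKET/BT_INPUT/",
    output_s3_prefix="s3://BUCKET/BT_OUTPUT/",
)
```

Such a dict could be submitted with `boto3.client("sagemaker").create_transform_job(**job_request)`. Unlike the SDK call with `wait=True`, that API call returns immediately, so polling for completion would be needed separately.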

In [14]:
bt.transform(
    data=bt_input,
    data_type='S3Prefix',
    content_type='application/json',
    split_type='Line',
    wait=True,
    logs=True,
    #wait=False,
    #logs=False,
)

.........................Collecting gluonts==0.4.3
  Downloading gluonts-0.4.3-py3-none-any.whl (323 kB)
Collecting holidays<0.10,>=0.9
  Downloading holidays-0.9.12.tar.gz (85 kB)
Collecting ujson~=1.35
  Downloading ujson-1.35.tar.gz (192 kB)
Collecting pandas<0.26,>=0.25
  Downloading pandas-0.25.3-cp36-cp36m-manylinux1_x86_64.whl (10.4 MB)
Collecting pydantic~=1.1
  Downloading pydantic-1.6.1-cp36-cp36m-manylinux2014_x86_64.whl (8.7 MB)
Collecting boto3~=1.0
  Downloading boto3-1.15.13.tar.gz (97 kB)
Collecting python-dateutil==2.8.0
  Downloading python_dateutil-2.8.0-py2.py3-none-any.whl (226 kB)
Collecting pytz>=2017.2
  Downloading pytz-2020.1-py2.py3-none-any.whl (510 kB)
Collecting dataclasses>=0.6; python_version < "3.7"
  Downloading dataclasses-0.7-py3-none-any.whl (18 kB)
Collecting botocore<1.19.0,>=1.18.13
  Downloading botocore-1.18.13-py2.py3-none-any.whl (6.7 MB)
Collecting

# Quick check on the results

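**<font color="firebrick">NOTE:</font>** the two line counts printed below must be identical, since the job is expected to emit one output line per input line. If they differ, something went wrong in the batch transform; raise the alarm immediately.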

In [None]:
!echo $(aws s3 cp ${BT_INPUT}${DATA_JSON} - | wc -l)
!echo $(aws s3 cp ${BT_OUTPUT}${DATA_JSON}.out - | wc -l)
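The comparison can also be automated rather than checked by eye. A minimal sketch that counts newline bytes the way `wc -l` does and fails loudly on a mismatch; both function names are hypothetical, and the byte chunks are assumed to be supplied by the caller:

```python
from typing import Iterable


def count_lines(chunks: Iterable[bytes]) -> int:
    """Count newline bytes across a stream of chunks (same notion as `wc -l`)."""
    return sum(chunk.count(b"\n") for chunk in chunks)


def assert_same_line_count(
    input_chunks: Iterable[bytes], output_chunks: Iterable[bytes]
) -> int:
    """Return the shared line count, or raise ValueError if the counts differ."""
    n_in = count_lines(input_chunks)
    n_out = count_lines(output_chunks)
    if n_in != n_out:
        raise ValueError(f"Line count mismatch: input={n_in}, output={n_out}")
    return n_in
```

The chunks could be streamed from S3 with, for example, `boto3.client("s3").get_object(Bucket=..., Key=...)["Body"].iter_chunks()`, so neither file has to fit in memory.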

In [None]:
!aws s3 cp ${BT_OUTPUT}${DATA_JSON}.out - | head -1 | jq

# Delete model

Uncomment and run the cell below to "deregister" the model from SageMaker. The inference model artifacts remain untouched in S3.

In [32]:
#mxnet_model.delete_model()