**<div style='font-size:200%'>Minimalistic example to train using gluonts meta-entrypoint script</div>**

In [None]:
%matplotlib inline
%load_ext autoreload
%autoreload 2
%config InlineBackend.figure_format = 'retina'

import json
from time import gmtime, strftime

import boto3
import sagemaker as sm
from sagemaker.mxnet.estimator import MXNet
from sagemaker.tuner import CategoricalParameter, ContinuousParameter, HyperparameterTuner, IntegerParameter
from smallmatter.pathlib import Path2
from smallmatter.sm import get_sm_execution_role

# smallmatter.sm.get_sm_execution_role() will:
# - on a SageMaker classic notebook instance, simply call sagemaker.get_execution_role()
# - outside of a SageMaker classic notebook instance, return the first role whose name
#   starts with "AmazonSageMaker-ExecutionRole-"
role: str = get_sm_execution_role()

sess = sm.Session()
region: str = sess.boto_session.region_name

# Global config

NOTE: you may want to reduce the value of `epochs` for quicker tests.

In [None]:
# Input data for training. Ensure that bucket and prefix match the bucket, prefix, and dataset_name
# used in 01-convert-csv-to-gluonts-format.ipynb.
bucket = 'BUCKETNAME'
prefix = 'gluonts-examples-dataset/synthetic-dataset'   # Ensure no trailing '/'.
data_channels = {'s3_dataset': f's3://{bucket}/{prefix}'}
print(data_channels)

# Export S3 path as environment variable for cells that run shell commands.
%set_env BUCKET=$bucket
%set_env PREFIX=$prefix

# Location of entrypoint train script.
source_dir = '../src/entrypoint/'

# Model hyperparameters.
epochs = 200

# Forecast length -- demonstrate that:
# - the dataset embeds its recommended forecast length (=30 days) in its metadata.
# - However, each train job is free to override the forecast length by changing
#   fcast_length to a different value.
fcast_length = 30

# Metric emitted by each training job. The entrypoint script may emit even more metrics, but
# we choose to capture only a few.
metric=[
    {"Name": "train:loss", "Regex": r"Epoch\[\d+\] Evaluation metric 'epoch_loss'=(\S+)"},
    {"Name": "train:learning_rate", "Regex": r"Epoch\[\d+\] Learning rate is (\S+)"},
    {"Name": "test:abs_error", "Regex": r"gluonts\[metric-abs_error\]: (\S+)"},
    {"Name": "test:rmse", "Regex": r"gluonts\[metric-RMSE\]: (\S+)"},
    {"Name": "test:mape", "Regex": r"gluonts\[metric-MAPE\]: (\S+)"},
    {"Name": "test:smape", "Regex": r"gluonts\[metric-sMAPE\]: (\S+)"},
    {"Name": "test:wmape", "Regex": r"gluonts\[metric-wMAPE\]: (\S+)"},
]
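
Before submitting a job, it can help to sanity-check the regexes against a few log lines. The sketch below does this with Python's `re` module. The sample log lines are made up for illustration: they only mimic the shape that the regexes expect and were not captured from a real training job. `extract_metrics` is a hypothetical helper.

```python
import re

# Subset of the metric definitions above, repeated so that the sketch is self-contained.
metric_defs = [
    {"Name": "train:loss", "Regex": r"Epoch\[\d+\] Evaluation metric 'epoch_loss'=(\S+)"},
    {"Name": "train:learning_rate", "Regex": r"Epoch\[\d+\] Learning rate is (\S+)"},
    {"Name": "test:wmape", "Regex": r"gluonts\[metric-wMAPE\]: (\S+)"},
]

# Hypothetical log lines (assumed format, for illustration only).
sample_log = [
    "Epoch[0] Learning rate is 0.001",
    "Epoch[0] Evaluation metric 'epoch_loss'=5.4321",
    "gluonts[metric-wMAPE]: 0.1234",
]


def extract_metrics(lines, definitions):
    """Return {metric name: [captured strings]} for every line matched by a regex."""
    found = {d["Name"]: [] for d in definitions}
    for line in lines:
        for d in definitions:
            m = re.search(d["Regex"], line)
            if m:
                found[d["Name"]].append(m.group(1))
    return found


captured = extract_metrics(sample_log, metric_defs)
print(captured)
```

A metric whose list stays empty on real log output usually points to a regex that no longer matches the log format.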

# Read cardinality of static feature

In [None]:
# If this cell fails, then please upgrade s3fs.
with Path2(f's3://{bucket}/{prefix}/metadata/metadata.json').open('rb') as f:
    cardinality = [int(json.load(f)['feat_static_cat'][0]['cardinality'])]
cardinality

As an exercise, read the recommended forecast length from the dataset's metadata in `f's3://{bucket}/{prefix}/metadata/metadata.json'`.
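
One possible way to approach this is sketched below. It parses an in-memory sample rather than the S3 object, and it assumes that the metadata stores the forecast length under a `prediction_length` key (the field name used by the gluonts metadata schema); please verify this against your own `metadata.json`. All sample values are made up.

```python
import json

# Made-up sample that mimics metadata.json; the real file lives on S3.
sample_metadata = """
{
    "freq": "D",
    "prediction_length": 30,
    "feat_static_cat": [{"name": "feat_static_cat_000", "cardinality": "5"}]
}
"""

metadata = json.loads(sample_metadata)

# Same expression as in the cell above, applied to the sample.
sample_cardinality = [int(metadata["feat_static_cat"][0]["cardinality"])]

# Assumed key name; .get() returns None if the dataset does not record it.
recommended_fcast_length = metadata.get("prediction_length")

print(sample_cardinality, recommended_fcast_length)
```

To apply this to the real dataset, replace `json.loads(sample_metadata)` with `json.load(f)` inside the same `Path2(...).open('rb')` block used above.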

# Training job

`train.py` is a meta-entrypoint script: the same script supports various gluonts estimators and their specific hyperparameters.

To run the meta-entrypoint script as a SageMaker training job, two key pieces of information are needed:

1. Set hyperparameter `algo` to `gluonts.model.deepar.DeepAREstimator` (i.e., the fully-qualified class name of the DeepAR estimator).
2. Set additional hyperparameters that correspond to the estimator's
   [kwargs](https://ts.gluon.ai/api/gluonts/gluonts.model.deepar.html#module-gluonts.model.deepar).

Feel free to experiment with another estimator, e.g., `gluonts.model.deepstate.DeepStateEstimator`.

**TIP**: always consult the estimator's documentation for its hyperparameters (a.k.a. `kwargs`),
especially when experimenting with a different estimator or a different version of gluonts.

> Some well-known cases:
> - hyperparameter `trainer.__class__`, which corresponds to the `trainer` kwarg, was changed from
>   `gluonts.trainer.Trainer` (gluonts-0.5) to `gluonts.mx.trainer.Trainer` (gluonts-0.8).
> - hyperparameter `distr_output.__class__`, which corresponds to the `distr_output` kwarg, was
>   changed from `gluonts.distribution.gaussian.GaussianOutput` (gluonts-0.5) to
>   `gluonts.mx.distribution.GaussianOutput` (gluonts-0.8).
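
To make the dotted naming convention concrete, the sketch below shows one way a meta-entrypoint could turn flat hyperparameters such as `trainer.__class__` and `trainer.epochs` into an instantiated kwarg. This is not the actual `train.py` implementation: `parse_value`, `load_class`, and `build_kwargs` are hypothetical helpers, and a standard-library class (`datetime.timedelta`) stands in for the gluonts classes so that the example runs without gluonts installed.

```python
import importlib
import json
from collections import defaultdict


def parse_value(raw):
    """Convert a hyperparameter string to a Python value (hypothetical helper)."""
    if not isinstance(raw, str):
        return raw
    special = {"True": True, "False": False, "None": None}
    if raw in special:
        return special[raw]
    try:
        return json.loads(raw)  # numbers, lists, dicts
    except ValueError:
        return raw  # plain string such as 'gru'


def load_class(qualified_name):
    """Import 'pkg.module.ClassName' and return the class object."""
    module_name, _, class_name = qualified_name.rpartition(".")
    return getattr(importlib.import_module(module_name), class_name)


def build_kwargs(hyperparameters):
    """Group 'name.attr' entries and instantiate each group's '__class__'."""
    plain, nested = {}, defaultdict(dict)
    for key, raw in hyperparameters.items():
        if "." in key:
            name, attr = key.split(".", 1)
            nested[name][attr] = raw
        else:
            plain[key] = parse_value(raw)
    for name, attrs in nested.items():
        cls = load_class(attrs.pop("__class__"))
        plain[name] = cls(**{k: parse_value(v) for k, v in attrs.items()})
    return plain


kwargs = build_kwargs({
    "use_feat_static_cat": "True",
    "cardinality": "[5]",
    "cell_type": "gru",
    # Stand-in for 'trainer.__class__': 'gluonts.mx.trainer.Trainer'.
    "trainer.__class__": "datetime.timedelta",
    "trainer.days": "200",
})
print(kwargs)
```

Under this scheme, a renamed class between gluonts versions only requires changing the string value of the `__class__` hyperparameter, which is why the cases listed above matter.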

In [None]:
# This cell submits an MXNet estimator to run as a SageMaker training job.
#
# It is equivalent to directly running the train.py script as:
#
# python train.py \
#     --plot_transparent 0 \
#     --num_samples 1000 \
#     --algo 'gluonts.model.deepar.DeepAREstimator' \
#     --use_feat_static_cat True \
#     --cardinality <json string in cardinality variable> \
#     --prediction_length <fcast_length> \
#     --cell_type gru \
#     --distr_output.__class__ gluonts.mx.distribution.GaussianOutput \
#     --trainer.__class__ gluonts.mx.trainer.Trainer \
#     --trainer.epochs <epochs> \
#     --s3_dataset /opt/ml/input/s3_dataset
mxnet_estimator = MXNet(
                    entry_point='train.py',
                    source_dir=source_dir,
                    framework_version='1.7.0',
                    py_version='py3',
                    role=role,
                    instance_count=1,
                    instance_type='ml.m5.2xlarge',
                    sagemaker_session=sess,
                    hyperparameters={
                        # Let's start with non-algorithm hyperparameters
                        'plot_transparent': 0,   # Whether plot should be transparent or white background
                        'num_samples': 1000,     # Number of samples during backtesting.
                        #'y_transform': 'log1p',  # Uncomment to train model with log-transformed targets.

                        # Here, you specify the algorithm to use, such as DeepAR, DeepFactor, DeepState, Transformer,
                        # etc. See gluonts.model packages for the list of available algorithms.
                        #
                        # If 'algo' is not specified, then defaults to 'gluonts.model.deepar.DeepAREstimator'.
                        'algo': 'gluonts.model.deepar.DeepAREstimator',

                        # The remaining entries are kwargs to the chosen estimator. For example, for DeepAR, consult
                        # the documentation for gluonts.model.deepar.DeepAREstimator.
                        #
                        # There are two types of kwargs hyperparameters:
                        # - primitive python types (incl. dictionaries & lists that can be deserialized from JSON).
                        #   Note that string "True", "False", and "None" will automatically become True, False, and
                        #   None, respectively.
                        # - Custom classes, notably Trainer and distribution output.
                        #   Note that time_feat is unsupported at this point in time.
                        
                        # Kwargs: Primitive python types.
                        'use_feat_static_cat': 'True',
                        'cardinality': cardinality,
                        'prediction_length': fcast_length,
                        'cell_type': 'gru',

                        # Equivalent to DeepAREstimator(..., distr_output=GaussianOutput(), ...)
                        'distr_output.__class__': 'gluonts.mx.distribution.GaussianOutput',

                        # Equivalent to DeepAREstimator(..., trainer=Trainer(epochs=...), ...)
                        'trainer.__class__': 'gluonts.mx.trainer.Trainer',
                        'trainer.epochs': epochs,
                    },

                    # Metric emitted by each training job. The entrypoint script may emit even more metrics,
                    # but we choose to capture only a few.
                    metric_definitions=metric,
)
 
# Setting wait=False frees this notebook from getting blocked by the training job.
mxnet_estimator.fit(data_channels, wait=False)

# Display reminder.
print()
print('Training job name:', mxnet_estimator.latest_training_job.job_name)
print()
print('While the training job runs, you can safely shut down this notebook kernel and close this notebook,')
print('and go to the SageMaker console to monitor the training progress. The training job\'s console')
print('also contains links to the CloudWatch log.')

Keep track of the expected S3 locations of the model and output artifacts. Once the training job completes successfully, the files will be available in these locations.
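
The sketch below builds these locations as strings. It assumes SageMaker's usual layout of `<output_path>/<job name>/output/`, with `model.tar.gz` for the model artifact and `output.tar.gz` for the output data; `expected_artifact_uris` is a hypothetical helper, and the bucket, prefix, and job name are placeholders.

```python
def expected_artifact_uris(output_path: str, job_name: str) -> dict:
    """Build the S3 URIs where a training job is expected to place its artifacts."""
    base = f"{output_path.rstrip('/')}/{job_name}/output"
    return {
        "model_s3": f"{base}/model.tar.gz",    # Model artifact.
        "output_s3": f"{base}/output.tar.gz",  # Output data, e.g., metrics and plots.
    }


# Placeholder values for illustration only.
uris = expected_artifact_uris(
    "s3://BUCKETNAME/some-output-prefix/",
    "mxnet-training-2021-01-01-00-00-00-000",
)
for name, uri in uris.items():
    print(name, uri)
```

In this notebook, the real values would come from `mxnet_estimator.output_path` and `mxnet_estimator.latest_training_job.job_name` after `fit()` has been called.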

While the training job is running, you may shut down this notebook's kernel, close this notebook, and go to the SageMaker console to monitor the training progress. The training job's console page also contains links to the CloudWatch logs.

<div style="color:green;font-weight:bold;font-size:250%">IMPORTANT:</div>

Please note the *training job name* printed by the above cell. Once the training job completes, open the notebook `03-batch-transform.ipynb` to:

1. Download, extract, and inspect the content of `output_s3` to observe the training output: metrics and forecast plots.

2. Register the model artifact to Amazon SageMaker, then perform Batch Transform.

---

# Optional, Advanced Topics: HPO

We provide a few more examples of how to run the entrypoint script through SageMaker HPO using other built-in gluonts algorithms.

In [None]:
def create_tuning_job(objective_metric_name, estimator_hp, tuner_hp, metric, role, sess):
    estimator = MXNet(entry_point='train.py',
                      source_dir=source_dir,
                      framework_version='1.7.0',
                      py_version='py3',
                      role=role,
                      instance_count=1,
                      instance_type='ml.m5.2xlarge',
                      sagemaker_session=sess,
                      hyperparameters=estimator_hp,
                      metric_definitions=metric,
    )

    tuner = HyperparameterTuner(
                estimator,
                objective_metric_name,
                tuner_hp,
                metric,   # Also needed for custom algo. (i.e., entrypoint script).
                objective_type='Minimize',
                max_jobs=4,   # Hardcoded (for testing).
                max_parallel_jobs=1)
    return tuner

def get_ts():
    return strftime("%y%m%d-%H%M%S", gmtime())

## Create a tuner for DeepAR

\[SFG17\] Salinas, David, Valentin Flunkert, and Jan Gasthaus. “DeepAR: Probabilistic forecasting with autoregressive recurrent networks.” arXiv preprint arXiv:1704.04110 (2017).

In [None]:
tuner_deepar = create_tuning_job(
    objective_metric_name='test:wmape',

    # Fixed hyperparameters, i.e., same for all training jobs.
    estimator_hp={
        # Let's start with non-algorithm hyperparameters
        'plot_transparent': 0,   # Whether plot should be transparent or white background
        'num_samples': 1000,     # Number of samples during backtesting.
        #'y_transform': 'log1p',  # Uncomment to train model with log-transformed targets.

        # Here, you specify the algorithm to use, such as DeepAR, DeepFactor, DeepState, Transformer,
        # etc. See gluonts.model packages for the list of available algorithms.
        #
        # If 'algo' is not specified, then defaults to 'gluonts.model.deepar.DeepAREstimator'.
        'algo': 'gluonts.model.deepar.DeepAREstimator',

        # The remaining entries are kwargs to the chosen estimator. For example, for DeepAR, consult the
        # documentation for gluonts.model.deepar.DeepAREstimator.
        #
        # There are two types of kwargs hyperparameters:
        # - primitive python types (incl. dictionaries & lists that can be deserialized from JSON).
        #   Note that string "True", "False", and "None" will automatically become True, False, and
        #   None, respectively.
        # - Custom classes, notably Trainer and distribution output.
        #   Note that time_feat is unsupported at this point in time.

        # Kwargs: Primitive python types.
        'use_feat_static_cat': 'True',
        'cardinality': cardinality,
        'prediction_length': fcast_length,
        'cell_type': 'gru',

        # Equivalent to DeepAREstimator(..., distr_output=GaussianOutput(), ...)
        'distr_output.__class__': 'gluonts.mx.distribution.GaussianOutput',

        # Equivalent to DeepAREstimator(..., trainer=Trainer(epochs=...), ...)
        'trainer.__class__': 'gluonts.mx.trainer.Trainer',
        'trainer.epochs': epochs,
    },

    # Tunable hyperparameters, i.e., may vary across training jobs.
    tuner_hp={
        "num_cells": IntegerParameter(30, 100),
        "num_layers": IntegerParameter(2, 4),

        # The lower bound is 5e-5, which equals the default `minimum_learning_rate`
        # kwarg of the trainer. See: gluonts.mx.trainer.Trainer
        "trainer.learning_rate": ContinuousParameter(5e-5, 1e-3, scaling_type='Logarithmic'),
    },

    # Metric emitted by each training job. The entrypoint script may emit even more metrics,
    # but we choose to capture only a few.
    metric=metric,

    role=role,
    sess=sess,
)
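
The `scaling_type='Logarithmic'` setting matters because the learning-rate range spans more than one order of magnitude. The sketch below only illustrates the effect of log scaling with plain random sampling; it is not how SageMaker's tuner picks candidates, and `sample_linear`, `sample_log`, and `share_below` are hypothetical helpers.

```python
import math
import random

LOW, HIGH = 5e-5, 1e-3  # Same bounds as trainer.learning_rate above.


def sample_linear(rng):
    """Draw uniformly on the raw scale."""
    return rng.uniform(LOW, HIGH)


def sample_log(rng):
    """Draw uniformly on the log scale, then map back to the raw scale."""
    return math.exp(rng.uniform(math.log(LOW), math.log(HIGH)))


def share_below(samples, threshold=1e-4):
    """Fraction of samples that fall below the threshold."""
    return sum(s < threshold for s in samples) / len(samples)


rng = random.Random(0)
linear = [sample_linear(rng) for _ in range(10_000)]
logarithmic = [sample_log(rng) for _ in range(10_000)]

print(f"linear: {share_below(linear):.1%} of samples below 1e-4")
print(f"log:    {share_below(logarithmic):.1%} of samples below 1e-4")
```

In expectation, linear sampling places about 5% of its draws below 1e-4, whereas log-scale sampling places about 23% there, so small learning rates get explored far more often.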

## Create a tuner for DeepState

\[RSG+18\] Rangapuram, Syama Sundar, et al. “Deep state space models for time series forecasting.” Advances in Neural Information Processing Systems. 2018.

In [None]:
tuner_deepstate = create_tuning_job(
    objective_metric_name='test:wmape',

    # Fixed hyperparameters, i.e., same for all training jobs.
    estimator_hp={
        # Let's start with non-algorithm hyperparameters
        'plot_transparent': 0,   # Whether plot should be transparent or white background
        'num_samples': 1000,     # Number of samples during backtesting.
        #'y_transform': 'log1p',  # Uncomment to train model with log-transformed targets.

        # Here, you specify the algorithm to use, such as DeepAR, DeepFactor, DeepState, Transformer,
        # etc. See gluonts.model packages for the list of available algorithms.
        #
        # If 'algo' is not specified, then defaults to 'gluonts.model.deepar.DeepAREstimator'.
        'algo': 'gluonts.model.deepstate.DeepStateEstimator',

        # The remaining entries are kwargs to the chosen estimator. For example, for DeepState, consult the
        # documentation for gluonts.model.deepstate.DeepStateEstimator.
        #
        # There are two types of kwargs hyperparameters:
        # - primitive python types (incl. dictionaries & lists that can be deserialized from JSON).
        #   Note that string "True", "False", and "None" will automatically become True, False, and
        #   None, respectively.
        # - Custom classes, notably Trainer and distribution output.
        #   Note that time_feat is unsupported at this point in time.

        # Kwargs: Primitive python types.
        'use_feat_static_cat': 'True',
        'cardinality': cardinality,
        'prediction_length': fcast_length,
        'cell_type': 'gru',

        # Equivalent to DeepStateEstimator(..., noise_std_bounds=ParameterBounds(lower=1e-6, upper=1e-1), ...)
        # The default is ParameterBounds(lower=1e-6, upper=1.0), so these hyperparameters are optional.
        # We specify them here, with a tighter upper bound, to show how you can customize the bounds.
        # Please consult the DeepState documentation for all the bounds it supports.
        "noise_std_bounds.__class__": "gluonts.mx.distribution.lds.ParameterBounds",
        "noise_std_bounds.lower": "1e-6",
        "noise_std_bounds.upper": "1e-1",

        # Equivalent to DeepStateEstimator(..., trainer=Trainer(epochs=...), ...)
        'trainer.__class__': 'gluonts.mx.trainer.Trainer',
        'trainer.epochs': epochs,
    },

    # Tunable hyperparameters, i.e., may vary across training jobs.
    tuner_hp={
        "num_cells": IntegerParameter(30, 100),
        "num_layers": IntegerParameter(2, 4),

        # The lower bound is 5e-5, which equals the default `minimum_learning_rate`
        # kwarg of the trainer. See: gluonts.mx.trainer.Trainer
        "trainer.learning_rate": ContinuousParameter(5e-5, 1e-3, scaling_type='Logarithmic'),
    },

    # Metric emitted by each training job. The entrypoint script may emit even more metrics, but
    # we choose to capture only a few.
    metric=metric,

    role=role,
    sess=sess,
)

## Start each tuner

In [None]:
tuner_deepar.fit(data_channels, job_name='gtsdeepar-'+get_ts(), include_cls_metadata=False)
tuner_deepstate.fit(data_channels, job_name='gtsdeepstate-'+get_ts(), include_cls_metadata=False)

# Show tuning names
for tuner in [tuner_deepar, tuner_deepstate]:
    print(tuner.latest_tuning_job.job_name)

Next, monitor the tuning jobs in the console until they complete.

**<div style="color:firebrick">NOTE: If you restart this notebook kernel, then you need to run the next cell to attach to an existing tuning job.
Sample code is shown below. Be aware that you must update the cell with your tuning jobs before running it.</div>**

In [None]:
#for tuner in completed_tuners:
for tuner in [tuner_deepar, tuner_deepstate]:
    # Go all the way down to the boto3 client level to query the tuning status.
    if int(
            tuner
               .sagemaker_session
               .boto_session
               .client('sagemaker')
               .describe_hyper_parameter_tuning_job(
                   HyperParameterTuningJobName=tuner.latest_tuning_job.name
               )['ObjectiveStatusCounters']['Pending']
    ) > 0:
        status = 'NOT_DONE'
    else:
        status = 'DONE'

    try:
        best_training_job = tuner.best_training_job()
    except Exception:
        # E.g., "Exception: Best training job not available for tuning job: gtsdeepar-200423-064644"
        best_training_job = None

    print('\n',
        status,
        tuner.latest_tuning_job.name,
        best_training_job,
        #sess.sagemaker_client.describe_training_job(TrainingJobName=tuner.best_training_job())['HyperParameters']['likelihood'],
        tuner.objective_metric_name,
        tuner.analytics().dataframe()['FinalObjectiveValue'].min(),
          
        sep='\n',
    )
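
The status cell above relies on `tuner_deepar` and `tuner_deepstate` still being in memory. After a kernel restart, one way to re-create them is `HyperparameterTuner.attach()` from the SageMaker Python SDK, as sketched below. `attach_tuners` is a hypothetical helper, and the job names in the commented example are placeholders that you must replace with your own tuning job names.

```python
def attach_tuners(job_names, attach_fn=None):
    """Re-create tuner objects from the names of existing tuning jobs."""
    if attach_fn is None:
        # Imported lazily so that the helper can be exercised without the SageMaker SDK.
        from sagemaker.tuner import HyperparameterTuner

        attach_fn = HyperparameterTuner.attach
    return [attach_fn(name) for name in job_names]


# Example: replace the placeholders with your own tuning job names, then uncomment.
# tuner_deepar, tuner_deepstate = attach_tuners(
#     ['gtsdeepar-<timestamp>', 'gtsdeepstate-<timestamp>']
# )
```

Once attached, the tuners can be passed to the status loop above in place of the original objects.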