# Deep Demand Forecasting with Amazon SageMaker

In [None]:
import sagemaker
from sagemaker import get_execution_role

In [None]:
session = sagemaker.Session()
role = get_execution_role()

## Copy raw data to S3

The dataset we use here is the **multivariate time-series** [electricity](https://archive.ics.uci.edu/ml/datasets/ElectricityLoadDiagrams20112014) data taken from *Dua, D. and Graff, C. (2019). [UCI Machine Learning Repository](http://archive.ics.uci.edu/ml), Irvine, CA: University of California, School of Information and Computer Science.* A cleaned version of the data containing **321** time-series with **1H** frequency, starting from **2012-01-01** with **26304** time-steps, is available to download directly via [gluonts](https://github.com/awslabs/gluon-ts).
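
As a quick sanity check on these figures, the sketch below rebuilds the hourly index from the quoted start date and number of time-steps. It uses only the numbers in the description above, not the data itself, and the variable names are illustrative.

```python
import pandas as pd

# Values quoted in the dataset description above.
start = pd.Timestamp("2012-01-01")
num_steps = 26304

# Build the hourly index; a Timedelta is used as the frequency to avoid
# depending on a particular pandas frequency alias.
index = pd.date_range(start=start, periods=num_steps, freq=pd.Timedelta(hours=1))

# 26304 / 24 = 1096 days = 366 (leap year 2012) + 365 + 365
num_days = num_steps / 24
print(index[0], index[-1], num_days)
```

The last timestamp falls on 2014-12-31 at 23:00, so the series covers the three calendar years 2012 to 2014.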

For ease of access, we have made the cleaned data available in the following S3 bucket:

In [None]:
from sagemaker.s3 import S3Downloader

original_data_bucket = 'sagemaker-solutions-us-west-2'
original_data_prefix = 'sagemaker-deep-demand-forecast/electricity'
original_data = 's3://{}/{}'.format(original_data_bucket, original_data_prefix)
print("original data: ")
S3Downloader.list(original_data)

We first copy the data to our own S3 bucket. Replace `your-s3-bucket-name` below with the name of a bucket you own.

In [None]:
import boto3

bucket = 'your-s3-bucket-name'
prefix = 'tst'

s3 = boto3.client('s3')

for file in s3.list_objects(Bucket=original_data_bucket, Prefix=original_data_prefix)['Contents']:
    copy_source = {
      'Bucket': original_data_bucket,
      'Key': file['Key']
    }
    s3.copy(copy_source, bucket, file['Key'].replace(original_data_prefix, prefix))
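
Note that `list_objects` returns at most 1,000 keys per call, and `str.replace` rewrites every occurrence of the prefix within a key. Neither matters for this small dataset, but a paginated variant may be safer for larger prefixes. The following is a minimal sketch under those assumptions; `copy_prefix` is a hypothetical helper name, and it expects a boto3 S3 client to be passed in.

```python
def copy_prefix(s3_client, src_bucket, src_prefix, dst_bucket, dst_prefix):
    """Copy every object under src_prefix to dst_prefix; return the new keys."""
    copied = []
    paginator = s3_client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=src_bucket, Prefix=src_prefix):
        # "Contents" is absent from pages that hold no objects.
        for obj in page.get("Contents", []):
            key = obj["Key"]
            # Swap only the leading prefix, leaving the rest of the key intact.
            dst_key = dst_prefix + key[len(src_prefix):]
            s3_client.copy({"Bucket": src_bucket, "Key": key}, dst_bucket, dst_key)
            copied.append(dst_key)
    return copied
```

With the variables defined above, this could be called as `copy_prefix(s3, original_data_bucket, original_data_prefix, bucket, prefix)`.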

In [None]:
input_data = 's3://{}/{}'.format(bucket, prefix)
print(f"input data: {S3Downloader.list(input_data)}")
train_data = input_data
preprocessed_data = 's3://{}/{}/processed_data'.format(bucket, prefix)
train_output = 's3://{}/{}/output'.format(bucket, prefix)
code_location = 's3://{}/{}/code'.format(bucket, prefix)

## Build container for Preprocessing and Feature Engineering

Data preprocessing and feature engineering are important components of the ML lifecycle, and Amazon SageMaker Processing lets you run them on managed infrastructure. Here, we create a lightweight container that serves as the environment for our data preprocessing. The container can easily be customized to add more dependencies when needed.

In [None]:
region = boto3.session.Session().region_name
account_id = boto3.client('sts').get_caller_identity().get('Account')
ecr_repository = 'sagemaker-deep-demand-forecast-preprocessing-container'
ecr_repository_uri = '{}.dkr.ecr.{}.amazonaws.com/{}:latest'.format(account_id,
                                                                    region,
                                                                    ecr_repository)

!bash preprocess/container/build_and_push.sh $ecr_repository docker

### Run Preprocessing job with Amazon SageMaker Processing

Since the data is already clean, the script `src/preprocess/data_preprocessor.py` only demonstrates schematically how to use the SageMaker `ScriptProcessor` to run data preprocessing and feature engineering transformations on your raw data.

In [None]:
from sagemaker.processing import ScriptProcessor

script_processor = ScriptProcessor(command=['python3'],
                                   image_uri=ecr_repository_uri,  # we build and push above
                                   role=role,
                                   instance_count=1,
                                   instance_type='ml.c4.xlarge')

In [None]:
from sagemaker.processing import ProcessingInput, ProcessingOutput

script_processor.run(code='preprocess/data_preprocessor.py',
                     inputs=[ProcessingInput(source=input_data,
                                             destination='/opt/ml/processing/input')],
                     outputs=[ProcessingOutput(destination=preprocessed_data,
                                                source='/opt/ml/processing/output')],
                    )
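
The contents of `data_preprocessor.py` are not shown in this notebook. As a rough illustration of the shape such a script might take, the sketch below reads every file under an input directory and writes it unchanged to an output directory. The function name `preprocess` and the pass-through behavior are assumptions made for illustration, not the actual script.

```python
import shutil
from pathlib import Path


def preprocess(input_dir, output_dir):
    """Pass-through placeholder: copy each input file to the output directory.

    A real script would transform the data at this point (for example,
    fill gaps or rescale values) before writing it out.
    """
    input_dir, output_dir = Path(input_dir), Path(output_dir)
    written = []
    for src in sorted(input_dir.rglob("*")):
        if not src.is_file():
            continue
        # Preserve the relative layout of the input directory.
        dst = output_dir / src.relative_to(input_dir)
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)
        written.append(dst)
    return written
```

Inside the processing container, such a function would be called with `/opt/ml/processing/input` and `/opt/ml/processing/output`, the paths given to `ProcessingInput` and `ProcessingOutput` above. SageMaker then uploads whatever is written to the output path to `preprocessed_data`.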

### View Results of Data Preprocessing

Once the preprocessing job is complete, we can take a look at the contents of the S3 bucket.

In [None]:
from sagemaker.s3 import S3Downloader
processed_files = S3Downloader.list(preprocessed_data)
print('\n'.join(processed_files))

# optionally download processed data
# S3Downloader.download(preprocessed_data, preprocessed_data.split("/")[-1])

## Train your LSTNet model with GluonTS

**LSTNet** is a deep learning model that runs a traditional *auto-regressive* linear model *in parallel* with its non-linear neural network part. The linear component makes the otherwise *non-linear* model more *robust* for time series whose *scale changes* over time.

For more details, please check out the paper [Modeling Long- and Short-Term Temporal Patterns with Deep Neural Networks](https://arxiv.org/abs/1703.07015).
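
To make the role of the auto-regressive part concrete, here is a toy NumPy sketch of the idea: the final forecast is the sum of a non-linear component and a linear AR component over the last few observations. This is not the GluonTS implementation; `forecast_with_ar` and `bounded_net` are hypothetical names, and the "network" is a simple bounded function standing in for the neural part.

```python
import numpy as np


def forecast_with_ar(window, nonlinear_fn, ar_weights, ar_bias=0.0):
    """Combine a non-linear forecast with a linear AR forecast.

    window: array of shape (context_length, num_series)
    ar_weights: array of shape (ar_window,), shared across series here
    """
    ar_window = len(ar_weights)
    # Linear part: weighted sum of the last `ar_window` observations per series.
    ar_part = ar_weights @ window[-ar_window:] + ar_bias
    # Final forecast: the two components are simply added.
    return nonlinear_fn(window) + ar_part


def bounded_net(window):
    # Stand-in for the neural network: tanh bounds its output, so on its
    # own it cannot follow a change in the scale of the input.
    return np.tanh(window.mean(axis=0))


window = np.ones((12, 3))        # 12 time steps, 3 series
ar_weights = np.full(4, 0.25)    # average of the last 4 steps

small = forecast_with_ar(window, bounded_net, ar_weights)
large = forecast_with_ar(100 * window, bounded_net, ar_weights)
print(small, large)
```

When the input is scaled by 100, the bounded component barely moves, while the AR component scales with the data. That is the sense in which the linear part helps with scale changes.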

### Hyperparameters

Here is a set of hyperparameters for the LSTNet model, set to train for **1 epoch** (for demonstration).

In [None]:
hyperparameters = {
    'context_length': 12,
    'prediction_length': 6,
    'skip_size': 4,
    'ar_window': 4,
    'rnn_num_layers': 50,
    'skip_rnn_num_layers': 50,
    'channels': 72,
    'epochs': 1,
}

### Create and Fit SageMaker Estimator

With the hyperparameters defined, we can execute the training job. We use [GluonTS](https://gluon-ts.mxnet.io/), with **MXNet** as the backend deep learning framework, to define and train our *LSTNet* model. **Amazon SageMaker** makes this easy with its Framework estimators, which come with the deep learning frameworks already set up. Here, we create a SageMaker MXNet estimator and pass in our model training script, the hyperparameters, and the number and type of training instances we want.

We can then `fit` the estimator on the training data location in S3.

In [None]:
from sagemaker.mxnet import MXNet

estimator = MXNet(entry_point='train.py',
                  source_dir='deep_demand_forecast',
                  role=role,
                  train_instance_count=1, 
                  train_instance_type='ml.p3.2xlarge', # 'ml.c4.2xlarge'
                  framework_version="1.6.0",
                  py_version='py3',
                  hyperparameters=hyperparameters,
                  output_path=train_output,
                  code_location=code_location,
                  sagemaker_session=session,
                  # container_log_level=10,  # debug logs
                 )

estimator.fit(input_data)

### Examine the training evaluation

We can now access the training artifacts from the `output_path` specified in the estimator above and visualize the training results.

In [None]:
output_files = S3Downloader.list(train_output)
print('\n'.join(output_files))

In [None]:
import os
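# Build the S3 URI with string formatting and read the job name from the
# public `latest_training_job` attribute rather than a private one.
output_path = '{}/{}/output'.format(train_output, estimator.latest_training_job.name)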

S3Downloader.download(output_path, 'output')
!tar -xvf output/output.tar.gz -C output/

In [None]:
import pandas as pd

item_metrics = pd.read_csv('output/item_metrics.csv.gz', compression='gzip')
item_metrics.head()

### Visualizing the outputs

For the visualization, we use the [altair package](https://github.com/altair-viz/altair) and its declarative API. If you want to export to other file formats, see [altair_saver](https://github.com/altair-viz/altair_saver).

Note that after exporting to `html`, you can go to the `output` directory and open the generated `html` files inside the notebook.

Here, we compare the [**Mean Absolute Scaled Error (MASE)**](https://en.wikipedia.org/wiki/Mean_absolute_scaled_error) against the [**symmetric Mean Absolute Percentage Error (sMAPE)**](https://en.wikipedia.org/wiki/Symmetric_mean_absolute_percentage_error).
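
For reference, the sketch below computes both metrics with NumPy, following the common definitions on the linked pages (the sMAPE variant that ranges from 0 to 2). The values in `item_metrics.csv.gz` come from the training script and may use different conventions, for example a different seasonality; the function names and the `season` parameter here are illustrative. Zero denominators are not handled.

```python
import numpy as np


def smape(target, forecast):
    """Symmetric MAPE: mean of 2 * |y - y_hat| / (|y| + |y_hat|), in [0, 2]."""
    target = np.asarray(target, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    return np.mean(2.0 * np.abs(target - forecast) / (np.abs(target) + np.abs(forecast)))


def mase(target, forecast, past, season=1):
    """Forecast MAE divided by the in-sample MAE of a seasonal naive forecast."""
    target = np.asarray(target, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    past = np.asarray(past, dtype=float)
    # Error of predicting each past value with the value `season` steps earlier.
    naive_error = np.mean(np.abs(past[season:] - past[:-season]))
    return np.mean(np.abs(target - forecast)) / naive_error


past = [1, 2, 3, 4]     # history used to scale the error
target = [5, 6]         # true future values
forecast = [4, 8]       # predicted future values
print(mase(target, forecast, past), smape(target, forecast))
```

A MASE below 1 means the forecast beats the naive baseline on average, which is why the two metrics can rank the same series differently.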

In [None]:
!pip install --upgrade -q pip && pip install -q altair==4.1

In [None]:
import altair as alt

col_a = 'MASE'
col_b = 'sMAPE'

scatter = alt.Chart(item_metrics).mark_circle(size=100, fillOpacity=0.8).encode(
    alt.X(col_a, scale=alt.Scale(domain=[-0.5, 9])),
    alt.Y(col_b, scale=alt.Scale(domain=[0, 2.5])),
    tooltip=[col_a, col_b]
).interactive()
scatter.save(os.path.join('output', f'{col_a}_vs_{col_b}.html'))
scatter

In [None]:
col_a_plot = alt.Chart(item_metrics).mark_bar().encode(
        alt.X(col_a, bin=True),
        y='count()',
)
col_b_plot = alt.Chart(item_metrics).mark_bar().encode(
    alt.X(col_b, bin=True),
    y='count()',
)

col_a_b_plot = col_a_plot | col_b_plot
col_a_b_plot.save(os.path.join('output', f'{col_a}_{col_b}_barplots.html'))
col_a_b_plot

## Deploy an endpoint

To serve the model, we can deploy an endpoint where the `src/deep_demand_forecast/inference.py` script handles the predictions using the trained model as follows

In [None]:
from sagemaker.mxnet import MXNetModel

model = MXNetModel(model_data=os.path.join(output_path, 'model.tar.gz'),
                   role=role,
                   entry_point='inference.py',
                   source_dir='deep_demand_forecast',
                   py_version='py3',
                   framework_version='1.6.0',
                  )

predictor = model.deploy(instance_type='ml.m4.xlarge', initial_instance_count=1)

### Testing the endpoint

Here we test the endpoint by requesting predictions for randomly generated data. The `predictor` handles serialization and deserialization of the requests.

In [None]:
import numpy as np

np.random.seed(1)
random_test = np.random.randn(321, 6)

# json serializable request format
test_data = {}
test_data['target'] = random_test.tolist()
test_data['start'] = '2014-01-01'
test_data['source'] = []

ret = predictor.predict(test_data)
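
The request above can be wrapped in a small helper that also checks the payload is JSON serializable before it is sent. This is a sketch: `build_request` is a hypothetical name, and it assumes that `target` holds one row per time series, as suggested by the `(321, 6)` array used above.

```python
import json

import numpy as np


def build_request(target, start):
    """Return a request dict with the same keys as the cell above."""
    target = np.asarray(target, dtype=float)
    if target.ndim != 2:
        raise ValueError("target must be 2-D: (num_series, num_time_steps)")
    # NumPy arrays are not JSON serializable, so convert to nested lists.
    return {"target": target.tolist(), "start": start, "source": []}


payload = build_request(np.zeros((321, 6)), "2014-01-01")
body = json.dumps(payload)   # raises TypeError if anything is not serializable
```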

We then load the returned JSON objects:

In [None]:
import json
import numpy as np

forecasts = np.array(ret["forecasts"]["samples"])
print("Forecasts shape with 10 samples: {}".format(forecasts.shape))
print("RMSE: {}".format(json.loads(ret["agg_metrics"])["RMSE"]))
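
Sample-based forecasts are usually summarized as a point forecast plus an interval. The sketch below does this on a synthetic array, assuming the first axis indexes the samples; check the shape printed above before applying it to `forecasts`, since the axis order returned by the endpoint is not confirmed here.

```python
import numpy as np

# Synthetic stand-in for `forecasts`; samples-first axis order is an assumption.
rng = np.random.default_rng(0)
samples = rng.normal(size=(10, 6, 321))   # (num_samples, prediction_length, num_series)

# Point forecast: average over the sample axis.
point_forecast = samples.mean(axis=0)

# 80% interval: 10th and 90th percentiles over the sample axis.
lower, upper = np.quantile(samples, [0.1, 0.9], axis=0)

print(point_forecast.shape, lower.shape, upper.shape)
```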

## Optional: Delete the endpoint and model

When you're done with the endpoint, you should clean it up.

All of the training jobs, models and endpoints we created can be viewed through the SageMaker console of your AWS account.

In [None]:
predictor.delete_endpoint()

In [None]:
predictor.delete_model()