## DeepAR Model - Predict Bike Rental with Dynamic Features

Note: This dataset is not a true time series because it contains many gaps.

We have data only for the first 20 days of each month, and the model needs to predict the rentals for
the remaining days of the month. The dataset covers two years. DeepAR works best with a true multiple-timeseries dataset, such as the electricity example linked below.
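
For reference, DeepAR reads JSON Lines input: one series per line with a `start` timestamp, a `target` array, and optionally a `dynamic_feat` array of arrays. The sketch below shows one way a series with gaps could be encoded. Missing target hours become the string `"NaN"`, while every dynamic feature stays fully populated, because DeepAR accepts missing values in `target` but not in `dynamic_feat`. The helper `make_record` and the toy values are hypothetical; this is not the preprocessing that produced the files used in this notebook.

```python
import json
import math

def make_record(start, target, dynamic_feat):
    """Build one DeepAR JSON Lines record (hypothetical helper)."""
    for feat in dynamic_feat:
        # For training, each dynamic feature has the same length as the target
        if len(feat) != len(target):
            raise ValueError("each dynamic feature must match the target length")
        if any(v is None or math.isnan(v) for v in feat):
            raise ValueError("dynamic features cannot contain missing values")
    # Encode gaps in the target as the string "NaN"
    encoded = ["NaN" if v is None or math.isnan(v) else v for v in target]
    return json.dumps({"start": start, "target": encoded, "dynamic_feat": dynamic_feat})

# Toy series: four hourly counts with one missing hour, plus one dynamic feature
line = make_record("2011-01-01 00:00:00", [16, 40, None, 13], [[9.84, 9.02, 9.02, 9.84]])
print(line)
```

Each call produces one line of the training file; a full file would hold one such line per series.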

In [None]:
import time
import numpy as np
import pandas as pd
import json
import matplotlib.pyplot as plt
import datetime

import boto3
import sagemaker
from sagemaker import get_execution_role

# This code is derived from AWS SageMaker Samples:
# https://github.com/awslabs/amazon-sagemaker-examples/tree/master/introduction_to_amazon_algorithms/deepar_electricity
# https://github.com/awslabs/amazon-sagemaker-examples/tree/master/introduction_to_amazon_algorithms/deepar_synthetic

In [None]:
with_categories = False
# Set a good base job name
# It will help in identifying trained models and endpoints
base_job_name = 'deepar-biketrain-with-dynamic-feat'

In [None]:
bucket = 'chandra-ml-sagemaker'
prefix = 'deepar/bikerental'

# This structure allows multiple training and test files for model development and testing
s3_data_path = "{}/{}/data_dynamic".format(bucket, prefix)
s3_output_path = "{}/{}/output".format(bucket, prefix)

In [None]:
s3_data_path,s3_output_path

In [None]:
# In S3, the file name is referred to as the key
# Files stored in S3 are automatically replicated across
# multiple availability zones in the region where the bucket was created.
# http://boto3.readthedocs.io/en/latest/guide/s3.html
def write_to_s3(filename, bucket, key):
    with open(filename,'rb') as f: # Read in binary mode
        return boto3.Session().resource('s3').Bucket(bucket).Object(key).upload_fileobj(f)

In [None]:
# Upload one or more training files and test files to S3
write_to_s3('train_dynamic_feat.json', bucket, f'{prefix}/data_dynamic/train/train_dynamic_feat.json')
write_to_s3('test_dynamic_feat.json', bucket, f'{prefix}/data_dynamic/test/test_dynamic_feat.json')

In [None]:
# Use Spot Instance - Save up to 90% of training cost by using spot instances when compared to on-demand instances
# Reference: https://github.com/aws-samples/amazon-sagemaker-managed-spot-training/blob/main/xgboost_built_in_managed_spot_training_checkpointing/xgboost_built_in_managed_spot_training_checkpointing.ipynb

# If you are still in the two-month free tier, you can use an on-demand instance by setting:
#   use_spot_instances = False

# We will use spot for training
use_spot_instances = True
max_run = 3600 # in seconds
max_wait = 3600 if use_spot_instances else None # in seconds

job_name = base_job_name

checkpoint_s3_uri = None

if use_spot_instances:
    checkpoint_s3_uri = f's3://{bucket}/{prefix}/checkpoints/{job_name}'
    
print (f'Checkpoint uri: {checkpoint_s3_uri}')
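
Managed spot training expects `max_wait` to be at least `max_run`, because the wait time covers both the time spent waiting for spot capacity and the training time itself. A minimal sketch of a sanity check for the values above; `check_spot_settings` is a hypothetical helper and not part of the SageMaker SDK.

```python
def check_spot_settings(use_spot_instances, max_run, max_wait):
    """Return True if the timing values are consistent (hypothetical helper)."""
    if not use_spot_instances:
        # On-demand training: this notebook leaves max_wait as None
        return max_wait is None
    # Spot training: the total wait must cover at least the run limit
    return max_wait is not None and max_wait >= max_run

print(check_spot_settings(True, 3600, 3600))  # values used in this notebook
print(check_spot_settings(True, 3600, 1800))  # wait shorter than the run limit
```

Running a check like this before creating the estimator surfaces a bad combination locally instead of when the training job is submitted.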

In [None]:
# Establish a session with AWS
sess = sagemaker.Session()
role = get_execution_role()

In [None]:
# This role contains the permissions needed to train, deploy models
# SageMaker Service is trusted to assume this role
print(role)

In [None]:
# https://sagemaker.readthedocs.io/en/stable/api/utility/image_uris.html#sagemaker.image_uris.retrieve

# SDK 2.x uses image_uris.retrieve to look up the container image location

# Use DeepAR Container
container = sagemaker.image_uris.retrieve("forecasting-deepar",sess.boto_region_name)

print (f'Using DeepAR Container {container}')

In [None]:
container

In [None]:
freq='H' # The time series consists of hourly data, and we need to predict the hourly rental count

# how far in the future predictions can be made
# 12 days worth of hourly forecast 
prediction_length = 288 

# AWS recommends setting the context length equal to the prediction length as a starting point.
# This controls how far into the past the network can see
context_length = 288
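
The value 288 is 12 days of hourly steps (12 × 24). As a small sketch, the same number can be derived from the forecast horizon instead of being hard-coded; `steps_for` is a hypothetical helper name.

```python
from datetime import timedelta

def steps_for(horizon, step):
    """Number of whole time steps of size `step` in `horizon` (hypothetical helper)."""
    return horizon // step

# 12 days of hourly forecasts
prediction_length = steps_for(timedelta(days=12), timedelta(hours=1))
# Same starting point as above: context length equals prediction length
context_length = prediction_length
print(prediction_length, context_length)
```

Deriving the value this way keeps the two lengths consistent if the horizon or the frequency changes.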

In [None]:
# Configure the training job
# Specify type and number of instances to use
#   Reference: http://sagemaker.readthedocs.io/en/latest/estimators.html
# SDK 2.x does not require the train_ prefix for instance count and type


# With dynamic features - using a large instance: ml.c5.4xlarge = 16 vCPUs, 32 GiB
# Smaller instances ran into out-of-memory errors

estimator = sagemaker.estimator.Estimator(
    container,
    role,
    instance_count=1,
    instance_type='ml.c5.4xlarge', # using larger instance for this dynamic feature training - resource intensive
    output_path="s3://" + s3_output_path,
    sagemaker_session=sess,
    base_job_name = job_name,
    use_spot_instances=use_spot_instances,
    max_run=max_run,
    max_wait=max_wait,
    checkpoint_s3_uri=checkpoint_s3_uri)

In [None]:
freq, context_length, prediction_length

In [None]:
# https://docs.aws.amazon.com/sagemaker/latest/dg/deepar_hyperparameters.html
hyperparameters = {
    "time_freq": freq,
    "epochs": "400",
    "early_stopping_patience": "10",
    "mini_batch_size": "64",
    "learning_rate": "5E-4",
    "context_length": str(context_length),
    "prediction_length": str(prediction_length),
    "cardinality" : "auto" if with_categories else ''
}

In [None]:
hyperparameters

In [None]:
estimator.set_hyperparameters(**hyperparameters)

In [None]:
# Here, we are simply referring to train path and test path
# You can have multiple files in each path
# SageMaker will use all the files
data_channels = {
    "train": "s3://{}/train/".format(s3_data_path),
    "test": "s3://{}/test/".format(s3_data_path)
}

In [None]:
data_channels

In [None]:
# Start training. The figure of around 35 minutes was noted for an m4.xlarge instance;
# this job runs on ml.c5.4xlarge, so the actual time may differ
estimator.fit(inputs=data_channels)

In [None]:
job_name = estimator.latest_training_job.name

In [None]:
print ('job name: {0}'.format(job_name))

In [None]:
# Create an endpoint for real-time predictions
# In SDK 2.x, the parameter name for the container is image_uri

endpoint_name = sess.endpoint_from_job(
    job_name=job_name,
    initial_instance_count=1,
    instance_type='ml.m4.xlarge',
    image_uri=container,
    role=role)

In [None]:
print ('endpoint name: {0}'.format(endpoint_name))

In [None]:
# In the next lab, we will use the above endpoint for inference
# We will delete the endpoint in the next lab
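
As a preview of what the endpoint expects, a DeepAR inference request is a JSON document with an `instances` list and a `configuration` object. With dynamic features, each instance's `dynamic_feat` has to extend `prediction_length` steps beyond the end of `target`, so that the model sees the feature values for the hours being forecast. The sketch below only builds and checks the payload; the helper name `build_request` and the toy values are hypothetical, and no endpoint is called.

```python
import json

def build_request(start, target, dynamic_feat, prediction_length,
                  quantiles=("0.1", "0.5", "0.9"), num_samples=100):
    """Build a DeepAR inference request body (hypothetical helper)."""
    for feat in dynamic_feat:
        # Features must cover the known history plus the forecast horizon
        if len(feat) != len(target) + prediction_length:
            raise ValueError("dynamic_feat must have len(target) + prediction_length values")
    instance = {"start": start, "target": target, "dynamic_feat": dynamic_feat}
    configuration = {
        "num_samples": num_samples,
        "output_types": ["mean", "quantiles"],
        "quantiles": list(quantiles),
    }
    return json.dumps({"instances": [instance], "configuration": configuration})

# Toy example: three known hours, forecast the next two
payload = build_request("2012-12-01 00:00:00", [10, 12, 9],
                        [[8.2, 8.0, 7.9, 7.5, 7.4]], prediction_length=2)
print(payload)
```

For the model trained here, `prediction_length` would be 288, so each dynamic feature needs 288 values beyond the last known target hour.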