# Time Series Explainability DeepAR notebook

---

This notebook's CI test result for us-west-2 is as follows. CI test results in other regions can be found at the end of the notebook. 

![This us-west-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/us-west-2/sagemaker-clarify|time_series_deepar|time_series_deepar.ipynb)

---

## Runtime

This notebook takes approximately 30 minutes to run.

### Contents

1. [Summary](#Summary)

2. [Prerequisites](#Prerequisites)    

    a. [Install Mercury](#Install-Mercury)

    b. [Install SageMaker](#Install-SageMaker)
    
    c. [Loading the data: Everlane Dataset](#Loading-the-data:-Everlane-Dataset)
    
    d. [Train TimeSeries Model](#Train-TimeSeries-Model)
    
    e. [Create an Endpoint from Training Job](#Create-an-Endpoint-from-Training-Job)

3. [Time Series Explainability](#Time-Series-Explainability)

    a. [Create `TimeSeriesDataConfig`](#Create-TimeSeriesDataConfig)

    b. [Create `TimeSeriesModelConfig`](#Create-TimeSeriesModelConfig)

    c. [Create `AsymmetricShapleyValueConfig`](#Create-AsymmetricShapleyValueConfig)

    d. [Create `DataConfig`](#Create-DataConfig)

    e. [Create `ModelConfig`](#Create-ModelConfig)

    f. [Setup Processor](#Setup-Processor)

    g. [Run Explainability Call](#Run-Explainability-Call)

4. [Analysis Config](#Analysis-Config)

    a. [Retrieve Config From s3](#Retrieve-Config-From-s3)

    b. [Display Config](#Display-Config)

5. [Explainability Results](#Explainability-Results)

    a. [Retrieve Results From s3](#Retrieve-Results-From-s3)

    b. [Display Results](#Display-Results)

6. [Clean Up](#Clean-Up)

## Summary

This notebook demonstrates how to invoke a SageMaker Clarify explainability processing job for a time series DeepAR forecasting model. Given a trained model and a dataset, a processing job is created to compute the SHAP-style attribution score of each feature.

## Prerequisites

### Install Mercury

If not already installed, the following cell will install the `mercury` package in order to display the `analysis_config.json` and explainability job output within the notebook.

In [1]:
!pip install mercury -q

[0m[31mERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
awscli 1.31.7 requires botocore==1.33.7, but you have botocore 1.29.165 which is incompatible.
awscli 1.31.7 requires s3transfer<0.9.0,>=0.8.0, but you have s3transfer 0.6.2 which is incompatible.
sagemaker 2.219.0 requires boto3<2.0,>=1.33.3, but you have boto3 1.26.83 which is incompatible.[0m[31m
[0m

### Install SageMaker

In [2]:
!pip install --force-reinstall sagemaker

Collecting sagemaker
  Using cached sagemaker-2.219.0-py3-none-any.whl.metadata (14 kB)
Collecting attrs<24,>=23.1.0 (from sagemaker)
  Using cached attrs-23.2.0-py3-none-any.whl.metadata (9.5 kB)
Collecting boto3<2.0,>=1.33.3 (from sagemaker)
  Downloading boto3-1.34.103-py3-none-any.whl.metadata (6.6 kB)
Collecting cloudpickle==2.2.1 (from sagemaker)
  Using cached cloudpickle-2.2.1-py3-none-any.whl.metadata (6.9 kB)
Collecting google-pasta (from sagemaker)
  Using cached google_pasta-0.2.0-py3-none-any.whl.metadata (814 bytes)
Collecting numpy<2.0,>=1.9.0 (from sagemaker)
  Using cached numpy-1.26.4-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (61 kB)
Collecting protobuf<5.0,>=3.12 (from sagemaker)
  Using cached protobuf-4.25.3-cp37-abi3-manylinux2014_x86_64.whl.metadata (541 bytes)
Collecting smdebug-rulesconfig==1.0.1 (from sagemaker)
  Using cached smdebug_rulesconfig-1.0.1-py2.py3-none-any.whl.metadata (943 bytes)
Collecting importlib-metadata<7.0,>=1.4

    Uninstalling schema-0.7.7:
      Successfully uninstalled schema-0.7.7
  Attempting uninstall: pytz
    Found existing installation: pytz 2024.1
    Uninstalling pytz-2024.1:
      Successfully uninstalled pytz-2024.1
  Attempting uninstall: zipp
    Found existing installation: zipp 3.18.1
    Uninstalling zipp-3.18.1:
      Successfully uninstalled zipp-3.18.1
  Attempting uninstall: urllib3
    Found existing installation: urllib3 1.26.18
    Uninstalling urllib3-1.26.18:
      Successfully uninstalled urllib3-1.26.18
  Attempting uninstall: tzdata
    Found existing installation: tzdata 2024.1
    Uninstalling tzdata-2024.1:
      Successfully uninstalled tzdata-2024.1
  Attempting uninstall: tqdm
    Found existing installation: tqdm 4.66.4
    Uninstalling tqdm-4.66.4:
      Successfully uninstalled tqdm-4.66.4
  Attempting uninstall: tblib
    Found existing installation: tblib 3.0.0
    Uninstalling tblib-3.0.0:
      Successfully uninstalled tblib-3.0.0
  Attempting uninst

### Import Libraries

The model used in this example notebook is the DeepAR forecasting algorithm built into Amazon SageMaker. For more information, see https://docs.aws.amazon.com/sagemaker/latest/dg/deepar.html

A separate notebook shows how to bring your own time series model to time series explainability (TSX).

In [3]:
import yaml
import pandas as pd
import numpy as np
import sagemaker
import boto3
import json
import mercury
import pprint

from sagemaker import get_execution_role

session = boto3.Session()  # boto3 session, used for the S3 client and the region name
s3_client = session.client("s3")
sagemaker_session = sagemaker.Session()
sm_client = boto3.client("sagemaker")
region = session.region_name
role = get_execution_role()
bucket = sagemaker_session.default_bucket()
image_uri = sagemaker.image_uris.retrieve("forecasting-deepar", region)
training_job_name = "DeepArTest"

sagemaker.config INFO - Not applying SDK defaults from location: /etc/xdg/sagemaker/config.yaml
sagemaker.config INFO - Not applying SDK defaults from location: /home/zicanl/.config/sagemaker/config.yaml


### Loading the data: Everlane Dataset

The example dataset below uses the following column names for the time series model attributes. The same column names are used in `time_series_mock_data.json`.

- `item_id`: Unique identifier of the object to be analyzed.
- `target_value`: The univariate time series for which the residuals are computed; called `series` in some open source models.
- `dynamic_feature_x`: Past-observed or future-known covariate time series; also called `past_covariates` or `future_covariates`.
- `static_feature`: Characteristics of a time series that do not change over time; also called `static_covariates`.

In [4]:
data_tts = pd.read_json("training_dataset_lines.json", orient="records", lines=True)
data_tts.head()

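|   | item_id | timestamp | target_value | dynamic_feature_1 | dynamic_feature_2 | dynamic_feature_3 | static_feature_1 | static_feature_2 |
|---|---------|-----------|--------------|-------------------|-------------------|-------------------|------------------|------------------|
| 0 | mosfets | 2019-09-11 | 47650.3 | 0.4576 | 0.2164 | 0.1906 | 1 | 1 |
| 1 | mosfets | 2019-09-12 | 47380.3 | 0.4839 | 0.2274 | 0.1889 | 1 | 1 |
| 2 | mosfets | 2019-09-13 | 50905.9 | 0.5047 | 0.2391 | 0.1877 | 1 | 1 |
| 3 | mosfets | 2019-09-14 | 52401.3 | 0.5189 | 0.2516 | 0.1871 | 1 | 1 |
| 4 | mosfets | 2019-09-15 | 68734.1 | 0.5258 | 0.2647 | 0.187 | 1 | 1 |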


Group the data by `item_id` and convert each group into the record format accepted by the DeepAR model.

In [5]:
grouped_data = data_tts.groupby("item_id")
prediction_length = 14
dynamic_feat_list = ["dynamic_feature_1", "dynamic_feature_2", "dynamic_feature_3"]
deepar_training = []
deepar_test = []

for item_id, group in grouped_data:
    deepar_training.append(
        {
            "start": str(group["timestamp"].min()),
            "target": group["target_value"].fillna("NaN").tolist()[:-prediction_length],
            "dynamic_feat": [
                group[feature_name].tolist()[:-prediction_length]
                for feature_name in dynamic_feat_list
            ],
            "cat": [int(group.iloc[0]["static_feature_1"]), int(group.iloc[0]["static_feature_2"])],
        }
    )

    deepar_test.append(
        {
            "start": str(group["timestamp"].min()),
            "target": group["target_value"].fillna("NaN").tolist()[:-prediction_length],
            "dynamic_feat": [
                group[feature_name].tolist()[:-prediction_length]
                for feature_name in dynamic_feat_list
            ],
            "cat": [int(group.iloc[0]["static_feature_1"]), int(group.iloc[0]["static_feature_2"])],
        }
    )
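The DeepAR documentation describes a hold-out convention in which the training series omits the last `prediction_length` points while the test series keeps the full history, so that the final window can be scored. The sketch below illustrates that convention on a plain list. `split_for_deepar` and the sample values are hypothetical and not part of this notebook's pipeline, where both channels receive the same truncated series.

```python
def split_for_deepar(target, prediction_length):
    """Return (train_target, test_target) for one series (hypothetical helper)."""
    if prediction_length <= 0 or prediction_length >= len(target):
        raise ValueError("prediction_length must be between 1 and len(target) - 1")
    train_target = target[:-prediction_length]  # hold out the final window
    test_target = list(target)  # full history, including the held-out window
    return train_target, test_target


series = [10.0, 11.0, 12.0, 13.0, 14.0, 15.0]  # made-up values
train_target, test_target = split_for_deepar(series, prediction_length=2)
print(train_target)
print(test_target)
```

The same slicing would apply to each `dynamic_feat` series so that features and target stay aligned.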

Write the training and test files in JSON Lines format, then upload them to S3.

In [6]:
def write_dicts_to_json(path, data):
    with open(path, "wb") as file_path:
        for ts in data:
            file_path.write(json.dumps(ts).encode("utf-8"))
            file_path.write("\n".encode("utf-8"))


deepar_training_path = "train.json"
deepar_test_path = "test.json"
write_dicts_to_json(deepar_training_path, deepar_training)
write_dicts_to_json(deepar_test_path, deepar_test)

# upload the training and test file to S3
deepar_s3_training_path = "sagemaker/endpoint_test/train.json"
deepar_s3_test_path = "sagemaker/endpoint_test/test.json"

s3_client.upload_file(deepar_training_path, bucket, deepar_s3_training_path)
s3_client.upload_file(deepar_test_path, bucket, deepar_s3_test_path)

### Train TimeSeries Model

The example time series model is the built-in [DeepAR forecasting model](https://docs.aws.amazon.com/sagemaker/latest/dg/algorithms-time-series.html) in SageMaker.

In [7]:
estimator = sagemaker.estimator.Estimator(
    sagemaker_session=sagemaker_session,
    image_uri=image_uri,
    role=role,
    instance_count=1,
    instance_type="ml.c5.2xlarge",
    base_job_name=training_job_name,
    use_spot_instances=False,
    output_path=f"s3://{bucket}/deepar/endpoint_test/model_output",  # specify a model output path in S3
)

hyperparameters = {
    "epochs": "10",
    "time_freq": "H",  # sampling frequency ("H" = hourly); should match the interval between timestamps in the dataset
    "prediction_length": prediction_length,
    "context_length": prediction_length,
}

estimator.set_hyperparameters(**hyperparameters)

estimator.fit(
    inputs={
        "train": "s3://{}/{}".format(bucket, deepar_s3_training_path),
        "test": "s3://{}/{}".format(bucket, deepar_s3_test_path),
    }
)

INFO:sagemaker:Creating training-job with name: DeepArTest-2024-05-10-23-05-33-616


2024-05-10 23:05:33 Starting - Starting the training job...
2024-05-10 23:05:48 Starting - Preparing the instances for training...
2024-05-10 23:06:25 Downloading - Downloading the training image..................
2024-05-10 23:09:30 Training - Training image download completed. Training in progress...Docker entrypoint called with argument(s): train
Running default environment configuration script
Running custom environment configuration script
  if num_device is 1 and 'dist' not in kvstore:
[05/10/2024 23:09:47 INFO 140087617984320] Reading default configuration from /opt/amazon/lib/python3.8/site-packages/algorithm/resources/default-input.json: {'_kvstore': 'auto', '_num_gpus': 'auto', '_num_kv_servers': 'auto', '_tuning_objective_metric': '', 'cardinality': 'auto', 'dropout_rate': '0.10', 'early_stopping_patience': '', 'embedding_dimension': '10', 'learning_rate': '0.001', 'likelihood': 'student-t', 'mini_batch_size': '128', 'num_cells': '40', 'num_dynamic_feat': 'auto', 'num_eval_s

[05/10/2024 23:09:50 INFO 140087617984320] Epoch[2] Batch[0] avg_epoch_loss=11.122544
[05/10/2024 23:09:50 INFO 140087617984320] #quality_metric: host=algo-1, epoch=2, batch=0 train loss <loss>=11.122544288635254
[05/10/2024 23:09:50 INFO 140087617984320] Epoch[2] Batch[5] avg_epoch_loss=11.161819
[05/10/2024 23:09:50 INFO 140087617984320] #quality_metric: host=algo-1, epoch=2, batch=5 train loss <loss>=11.161818663279215
[05/10/2024 23:09:50 INFO 140087617984320] Epoch[2] Batch [5]#011Speed: 1728.28 samples/sec#011loss=11.161819
[05/10/2024 23:09:50 INFO 140087617984320] Epoch[2] Batch[10] avg_epoch_loss=11.044181
[05/10/2024 23:09:50 INFO 140087617984320] #quality_metric: host=algo-1, epoch=2, batch=10 train loss <loss>=10.903014755249023
[05/10/2024 23:09:50 INFO 140087617984320] Epoch[2] Batch [10]#011Speed: 1637.96 samples/sec#011loss=10.903015
[05/10/2024 23:09:50 INFO 140087617984320] processed a total of 1319 examples
#metrics {"StartTime": 1715382589.60822, "EndTime": 17153825

[05/10/2024 23:09:56 INFO 140087617984320] #quality_metric: host=algo-1, epoch=7, batch=10 train loss <loss>=11.110967826843261
[05/10/2024 23:09:56 INFO 140087617984320] Epoch[7] Batch [10]#011Speed: 1685.69 samples/sec#011loss=11.110968
[05/10/2024 23:09:56 INFO 140087617984320] processed a total of 1325 examples
#metrics {"StartTime": 1715382595.5593684, "EndTime": 1715382596.7048838, "Dimensions": {"Algorithm": "AWS/DeepAR", "Host": "algo-1", "Operation": "training"}, "Metrics": {"update.time": {"sum": 1145.308017730713, "count": 1, "min": 1145.308017730713, "max": 1145.308017730713}}}
[05/10/2024 23:09:56 INFO 140087617984320] #throughput_metric: host=algo-1, train throughput=1156.8152766274286 records/second
[05/10/2024 23:09:56 INFO 140087617984320] #progress_metric: host=algo-1, completed 80.0 % of epochs
[05/10/2024 23:09:56 INFO 140087617984320] #quality_metric: host=algo-1, epoch=7, train loss <loss>=10.883915034207432
[05/10/2024 23:09:57 INFO 140087617984320] Epoch[8] Batc

### Create an Endpoint from Training Job

In [8]:
job_name = estimator.latest_training_job.name

# Deploy the endpoint from a training job
endpoint_name = sagemaker_session.endpoint_from_job(
    job_name=job_name,
    initial_instance_count=1,
    instance_type="ml.c5.large",
    image_uri=image_uri,
    role=role,
)
print(endpoint_name)

INFO:sagemaker:Creating model with name: DeepArTest-2024-05-10-23-05-33-616
INFO:sagemaker:Creating endpoint-config with name DeepArTest-2024-05-10-23-05-33-616
INFO:sagemaker:Creating endpoint with name DeepArTest-2024-05-10-23-05-33-616


---------------!DeepArTest-2024-05-10-23-05-33-616


In [9]:
boto3.client("sagemaker").describe_endpoint(EndpointName=endpoint_name)

{'EndpointName': 'DeepArTest-2024-05-10-23-05-33-616',
 'EndpointArn': 'arn:aws:sagemaker:us-west-2:678264136642:endpoint/DeepArTest-2024-05-10-23-05-33-616',
 'EndpointConfigName': 'DeepArTest-2024-05-10-23-05-33-616',
 'ProductionVariants': [{'VariantName': 'AllTraffic',
   'DeployedImages': [{'SpecifiedImage': '156387875391.dkr.ecr.us-west-2.amazonaws.com/forecasting-deepar:1',
     'ResolvedImage': '156387875391.dkr.ecr.us-west-2.amazonaws.com/forecasting-deepar@sha256:8b63df4ba9d9c28fda01804397d5226a78dd3ac12fed2bedbd2a12b05e85c7b8',
     'ResolutionTime': datetime.datetime(2024, 5, 10, 23, 10, 48, 222000, tzinfo=tzlocal())}],
   'CurrentWeight': 1.0,
   'DesiredWeight': 1.0,
   'CurrentInstanceCount': 1,
   'DesiredInstanceCount': 1}],
 'EndpointStatus': 'InService',
 'CreationTime': datetime.datetime(2024, 5, 10, 23, 10, 47, 718000, tzinfo=tzlocal()),
 'LastModifiedTime': datetime.datetime(2024, 5, 10, 23, 18, 43, 225000, tzinfo=tzlocal()),
 'ResponseMetadata': {'RequestId': '28

### Verify the endpoint

In [10]:
# Each dynamic feature covers the 5 observed target values plus the 14 forecast steps (19 values).
dynamic_feature = [
    1.0, 2.0, 3.0, 4.0, 5.0,
    1.0, 2.0, 3.0, 4.0, 5.0,
    1.0, 2.0, 3.0, 4.0,
    1.0, 2.0, 3.0, 4.0, 5.0,
]  # fmt: skip

request = {
    "target": [33.1595, 30.1788, 28.5022, 27.5708, 27.7571],
    "start": "2014-05-30 01:00:00",
    "dynamic_feat": [list(dynamic_feature) for _ in range(3)],  # three identical feature series
    "cat": [0, 1],
}

input_instances = [request]
predictor_input = {
    "instances": input_instances,
}

In [11]:
from sagemaker.serializers import JSONSerializer

predictor = sagemaker.predictor.Predictor(
    endpoint_name=endpoint_name, sagemaker_session=sagemaker_session, serializer=JSONSerializer()
)
prediction = predictor.predict(predictor_input)
print(json.loads(prediction))

{'predictions': [{'mean': [25.1382884979, 20.8679351807, 18.1757259369, 13.9214792252, 10.9966878891, 18.6233577728, 15.5454044342, 12.0037488937, 9.954161644, 15.3388795853, 12.2299814224, 10.1631307602, 10.1644105911, 6.5872197151]}]}
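In the request above, each `dynamic_feat` series has 19 entries: the 5 observed `target` values plus the 14 steps (`prediction_length`) to forecast, because DeepAR needs dynamic features to cover the forecast horizon at inference time. A small check along the following lines can catch length mismatches before calling the endpoint. `check_inference_request` is a hypothetical helper, not part of the SageMaker SDK, and the feature values are placeholders.

```python
def check_inference_request(request, prediction_length):
    """Raise ValueError if any dynamic feature has the wrong length (hypothetical helper)."""
    expected = len(request["target"]) + prediction_length
    for position, feature in enumerate(request.get("dynamic_feat", [])):
        if len(feature) != expected:
            raise ValueError(
                f"dynamic_feat[{position}] has {len(feature)} values, expected {expected}"
            )
    return expected


example_request = {
    "start": "2014-05-30 01:00:00",
    "target": [33.1595, 30.1788, 28.5022, 27.5708, 27.7571],
    "dynamic_feat": [[1.0] * 19, [2.0] * 19, [3.0] * 19],  # placeholder values
    "cat": [0, 1],
}
expected_length = check_inference_request(example_request, prediction_length=14)
print(expected_length)
```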


## Time Series Explainability

### Import Components

Import the components needed to make a TSX call.

In [12]:
from sagemaker.clarify import (
    AsymmetricShapleyValueConfig,  # config for the explainability algorithm
    DataConfig,  # general-purpose DataConfig; the time series-specific data config object is passed to it
    ModelConfig,  # general-purpose ModelConfig; the time series-specific model config object is passed to it
    SageMakerClarifyProcessor,  # processor object, the job call is made via this
    TimeSeriesDataConfig,  # time series-specific data config object
    TimeSeriesModelConfig,  # time series-specific predictor config object
    TimeSeriesJSONDatasetFormat,  # time series-specific dataset format
)

### Set Configurations

The cell below shows an example `content_template` and `record_template` for a time series model. For more information, see https://docs.aws.amazon.com/sagemaker/latest/dg/clarify-processing-job-data-format-time-series-request-jsonlines.html

In [13]:
dataset_name = "time_series_mock_data.json"

# Content template
c_template = '{"instances": $records}'
# Record template
r_template = '{"start": $start_time, "target": $target_time_series, "dynamic_feat": $related_time_series, "cat": $static_covariates}'

s3 = boto3.client("s3")  # s3 client
bucket_name = sagemaker.Session().default_bucket()
bucket_uri = "s3://" + bucket_name + "/deepar/"

s3_client.upload_file(dataset_name, bucket_name, f"deepar/data/{dataset_name}")
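To see how the two templates combine, the following sketch fills them with Python's `string.Template`. It only illustrates the placeholder substitution; how Clarify renders the templates internally is not shown in this notebook, and the values are a few sample numbers used for illustration.

```python
import json
from string import Template

content_template = '{"instances": $records}'
record_template = (
    '{"start": $start_time, "target": $target_time_series, '
    '"dynamic_feat": $related_time_series, "cat": $static_covariates}'
)

# Each placeholder is replaced by the JSON encoding of the corresponding value.
record = Template(record_template).substitute(
    start_time=json.dumps("2019-09-11"),
    target_time_series=json.dumps([47650.3, 47380.3]),
    related_time_series=json.dumps([[0.4576, 0.4839], [0.2164, 0.2274]]),
    static_covariates=json.dumps([1, 1]),
)
# The content template wraps a JSON array of rendered records.
payload_text = Template(content_template).substitute(records=f"[{record}]")

payload = json.loads(payload_text)  # the rendered text parses as JSON
print(json.dumps(payload, indent=2))
```

The rendered payload has the same shape as the request sent to the endpoint in the verification step: an `instances` list whose entries carry `start`, `target`, `dynamic_feat`, and `cat`.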

### Asymmetric Shapley value

Our time series forecasting explainability algorithm builds on asymmetric Shapley values (ASV) from cooperative game theory. The ASV is a modification of the well-known Shapley value (used, for example, by SHAP) that discards the symmetry axiom but retains the efficiency axiom (that is, attributions sum up to the prediction). Coalitions of features are generated from a given probability distribution over feature *permutations* (rather than over *subsets*, as in the case of the Shapley value). For time series, the distribution we use puts zero probability on permutations of features that do not respect the temporal dependencies, that is, permutations that have "holes".

**References:**
- Our main scientific reference is https://arxiv.org/abs/1910.06358. We extend the approach of the paper to also include static covariates and related time series, and implement a stochastic estimator for efficiency.
- A very useful mathematical reference is [Probabilistic values by RJ Weber](http://www.library.fa.ru/files/Roth2.pdf#page=109); specifically, section 8 on random-order values (the same mathematical construction as ASV).
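As a rough illustration of a random-order value, the sketch below averages marginal contributions over a weighted set of permutations of the timesteps. With a single permutation, the attributions follow one insertion order, and in every case they sum to the difference between the prediction on the full input and the prediction on the baseline. The toy model, the `random_order_values` helper, and the zero baseline are assumptions made for illustration; this is not Clarify's implementation.

```python
def random_order_values(model, x, baseline, permutations, weights):
    """Weighted average of marginal contributions along each insertion order."""
    attributions = [0.0] * len(x)
    for order, weight in zip(permutations, weights):
        current = list(baseline)  # start with every timestep out of the coalition
        previous_output = model(current)
        for index in order:
            current[index] = x[index]  # timestep joins the coalition
            output = model(current)
            attributions[index] += weight * (output - previous_output)
            previous_output = output
    return attributions


def toy_forecast(values):
    """Hypothetical model with an interaction between the first and last timestep."""
    return 0.5 * values[0] + 1.0 * values[1] + 2.0 * values[2] + values[0] * values[2]


x = [1.0, 2.0, 3.0]
baseline = [0.0, 0.0, 0.0]

# Two orderings without "holes": oldest timestep first, or newest timestep first.
oldest_first = random_order_values(toy_forecast, x, baseline, [(0, 1, 2)], [1.0])
newest_first = random_order_values(toy_forecast, x, baseline, [(2, 1, 0)], [1.0])
print(oldest_first, sum(oldest_first))
print(newest_first, sum(newest_first))
```

The two orderings assign the interaction term to different timesteps, yet both sets of attributions sum to the same total, which is the efficiency property described above.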

### Create `AsymmetricShapleyValueConfig`

An `AsymmetricShapleyValueConfig` is used to configure the algorithm Clarify uses for time series explainability. It takes the following arguments:

- `direction`: Direction of explanation to be used. Available explanation types are `"chronological"`, `"anti_chronological"`, and `"bidirectional"`. The chronological direction highlights the effect of older timesteps over more recent ones, while the anti-chronological direction highlights the effect of timesteps closer to the forecast. Bidirectional is a combination of the previous two modes.
- `granularity`: Granularity of explanation to be used. Available granularities are `"timewise"` and `"fine_grained"`. The timewise granularity is fast and computes the attribution of individual timesteps toward the forecast, without distinguishing between related time series. The fine-grained mode is slower, but computes an attribution for every timestep and every dynamic feature, distinguishing between the related and target time series.
- `num_samples`: Number of samples to be used in the Asymmetric Shapley Value forecasting algorithm. Only applicable when using `"fine_grained"` explanations. This is the number of permutations sampled for computing the ASV.
- `baseline`: Baseline configuration (dictionary). The baseline config is used to replace out-of-coalition values for the corresponding datasets (also known as background data). For temporal data (target time series, related time series), the baseline value types are `"zero"`, where all out-of-coalition values are replaced with `0.0`, and `"mean"`, where all out-of-coalition values are replaced with the average of the time series. For static data (static covariates), a baseline value for each covariate should be provided for each possible item id. An example config follows, where `item1` and `item2` are item ids:
```
{
 "target_time_series": "zero",
 "related_time_series": "zero",
 "static_covariates": {
  "item1": [1, 1],
  "item2": [0, 1]
 }
}
```

The notebook sets `direction` and `granularity` as variables for later reference.

In [14]:
direction = "chronological"
granularity = "fine_grained"

Next, the notebook creates the `AsymmetricShapleyValueConfig` object.

In [15]:
asym_shap_val_config = AsymmetricShapleyValueConfig(
    direction=direction,
    granularity=granularity,
    num_samples=9,  # (dimension of target_time_series + dimension of related_time_series) ^ 2
    baseline={
        "target_time_series": "zero",
        "related_time_series": "zero",
        "static_covariates": {
            "mosfets": [0, 1],
            "interpol": [1, 1],
        },
    },
)
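The `static_covariates` baseline needs one entry per item id. When the dataset has many items, the dictionary can be assembled from the records instead of being typed by hand. The sketch below assumes each record carries `item_id` and the static feature columns, and it uses each item's first observed values as its baseline, which is only one possible choice. `build_static_baseline` is a hypothetical helper and the rows are made up.

```python
def build_static_baseline(records, static_fields):
    """Map each item id to the static feature values of its first record."""
    baseline = {}
    for record in records:
        # setdefault keeps the first values seen for an item and ignores later rows
        baseline.setdefault(record["item_id"], [record[name] for name in static_fields])
    return baseline


sample_records = [  # made-up rows in the same column layout as the dataset
    {"item_id": "mosfets", "static_feature_1": 1, "static_feature_2": 1},
    {"item_id": "mosfets", "static_feature_1": 1, "static_feature_2": 1},
    {"item_id": "interpol", "static_feature_1": 0, "static_feature_2": 1},
]
baseline_config = {
    "target_time_series": "zero",
    "related_time_series": "zero",
    "static_covariates": build_static_baseline(
        sample_records, ["static_feature_1", "static_feature_2"]
    ),
}
print(baseline_config)
```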

### Create `TimeSeriesDataConfig`

A `TimeSeriesDataConfig` object is used to configure data I/O settings specific to TSX. It takes the following arguments:

- `target_time_series`: A string or a zero-based integer index. Used to locate the target time series in the shared input dataset. If this parameter is a string, then all other parameters must also be strings or lists of strings. If this parameter is an int, then all others must be ints or lists of ints.
- `item_id`: A string or a zero-based integer index. Used to locate item id in the shared input dataset.
- `timestamp`: A string or a zero-based integer index. Used to locate timestamp in the shared input dataset.
- `related_time_series`: Optional. An array of strings or array of zero-based integer indices. Used to locate all related time series in the shared input dataset (if present).
- `static_covariates`: Optional. An array of strings or array of zero-based integer indices. Used to locate all item metadata fields in the shared input dataset (if present).
- `dataset_format`: Optional. A string which describes the format of the data files provided for analysis. Should only be provided when dataset is in JSON format. Currently, we support `columns` and `timestamp_records` where example mock data files `ts_cols.json` and `time_series_mock_data.json` are provided respectively.

This `TimeSeriesDataConfig` helps the container parse the data needed for the analysis. Any additional data column is excluded unless a corresponding JMESPath expression is provided to locate it.

In [16]:
ts_data_config = TimeSeriesDataConfig(
    target_time_series="[].target_value",
    item_id="[].item_id",
    timestamp="[].timestamp",
    related_time_series=[f"[].dynamic_feature_{x+1}" for x in range(3)],
    static_covariates=["[].static_feature_1", "[].static_feature_2"],
    dataset_format=TimeSeriesJSONDatasetFormat.TIMESTAMP_RECORDS,
)
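Each expression above is a JMESPath projection over a list of records: `[].target_value` collects the `target_value` field of every record. The sketch below mimics that with a list comprehension on a small sample. The record layout shown, one object per item and timestamp, is an assumption about the `timestamp_records` format based on the column names used here, and the `project` helper is hypothetical and only emulates this simple form of expression.

```python
sample_dataset = [  # assumed timestamp_records layout: one object per item and timestamp
    {"item_id": "mosfets", "timestamp": "2019-09-11", "target_value": 47650.3,
     "dynamic_feature_1": 0.4576, "static_feature_1": 1},
    {"item_id": "mosfets", "timestamp": "2019-09-12", "target_value": 47380.3,
     "dynamic_feature_1": 0.4839, "static_feature_1": 1},
    {"item_id": "mosfets", "timestamp": "2019-09-13", "target_value": 50905.9,
     "dynamic_feature_1": 0.5047, "static_feature_1": 1},
]


def project(records, field):
    """Emulate the JMESPath expression `[].<field>` for a flat list of dicts."""
    return [record[field] for record in records if field in record]


targets = project(sample_dataset, "target_value")
timestamps = project(sample_dataset, "timestamp")
unlisted = project(sample_dataset, "some_other_column")  # no matching field
print(targets)
print(timestamps)
print(unlisted)
```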

### Create `TimeSeriesModelConfig`

A `TimeSeriesModelConfig` is used to configure model settings specific to TSX. At the moment it has only one argument:

- `forecast`: JMESPath expression to extract the forecast result.

In [17]:
ts_model_config = TimeSeriesModelConfig(
    forecast="predictions[*].mean",
)
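For a response shaped like the one returned by the endpoint earlier, `predictions[*].mean` selects the `mean` list of every entry in `predictions`. A plain-Python equivalent, applied to a made-up response with shortened forecasts, might look like this:

```python
import json

# Made-up response text with the same structure as the endpoint output shown earlier.
response_text = '{"predictions": [{"mean": [25.1, 20.9, 18.2]}, {"mean": [12.0, 11.5, 10.7]}]}'
response = json.loads(response_text)

# Equivalent of the JMESPath expression predictions[*].mean: one forecast list per instance.
forecasts = [prediction["mean"] for prediction in response["predictions"]]
print(forecasts)
```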

### Create DataConfig

General information about the dataset the time series model uses is provided to `DataConfig`. Here, we specify where to retrieve the dataset, where to write the explainability job results, what format the dataset is in, and our TSX-specific data settings.

In [18]:
input_uri = bucket_uri + "data/" + dataset_name
output_path = bucket_uri + "output"

data_config = DataConfig(
    s3_data_input_path=input_uri,
    s3_output_path=output_path,
    dataset_type="application/json",
    time_series_data_config=ts_data_config,
    headers=[
        "item_id",
        "timestamp",
        "target_value",
        "dynamic_feature_1",
        "dynamic_feature_2",
        "dynamic_feature_3",
        "static_feature_1",
        "static_feature_2",
    ],
)

### Create ModelConfig

`ModelConfig` tells Clarify how to reach the model and how to format requests. Here, `endpoint_name` is provided, so Clarify sends its requests to the endpoint created earlier.

In [19]:
model_config = ModelConfig(
    endpoint_name=endpoint_name,
    content_type="application/json",
    accept_type="application/json",
    content_template=c_template,
    record_template=r_template,
    time_series_model_config=ts_model_config,
)

It is also possible to have Clarify deploy the model to a temporary endpoint of its own, with the following modifications to the `ModelConfig` call:

1. Omit `endpoint_name`.
2. Provide `model_name`, `instance_count`, and `instance_type` (and, optionally, `endpoint_name_prefix`).

### Setup Processor

Create the `SageMakerClarifyProcessor` object that will set up the explainability job.

In [20]:
instance_count = 1
instance_type = "ml.c5.2xlarge"

clarify_processor = SageMakerClarifyProcessor(
    role=role,
    sagemaker_session=sagemaker_session,
    instance_count=instance_count,
    instance_type=instance_type,
    job_name_prefix="clarify-tsx-job-demo",
)

INFO:sagemaker.image_uris:Defaulting to the only supported framework/algorithm version: 1.0.
INFO:sagemaker.image_uris:Ignoring unnecessary instance type: None.


### Run Explainability Call

In [21]:
clarify_processor.run_explainability(
    data_config=data_config,
    model_config=model_config,
    explainability_config=asym_shap_val_config,
)

INFO:sagemaker.clarify:Analysis Config: {'dataset_type': 'application/json', 'headers': ['item_id', 'timestamp', 'target_value', 'dynamic_feature_1', 'dynamic_feature_2', 'dynamic_feature_3', 'static_feature_1', 'static_feature_2'], 'time_series_data_config': {'target_time_series': '[].target_value', 'item_id': '[].item_id', 'timestamp': '[].timestamp', 'related_time_series': ['[].dynamic_feature_1', '[].dynamic_feature_2', '[].dynamic_feature_3'], 'static_covariates': ['[].static_feature_1', '[].static_feature_2'], 'dataset_format': 'timestamp_records'}, 'predictor': {'endpoint_name': 'DeepArTest-2024-05-10-23-05-33-616', 'accept_type': 'application/json', 'content_type': 'application/json', 'content_template': '{"instances": $records}', 'record_template': '{"start": $start_time, "target": $target_time_series, "dynamic_feat": $related_time_series, "cat": $static_covariates}', 'time_series_predictor_config': {'forecast': 'predictions[*].mean'}}, 'methods': {'report': {'name': 'report',

INFO:sagemaker-clarify-processing:Starting SageMaker Clarify Processing job
INFO:analyzer.data_loading.data_loader_util:Analysis config path: /opt/ml/processing/input/config/analysis_config.json
INFO:analyzer.data_loading.data_loader_util:Analysis result path: /opt/ml/processing/output
INFO:analyzer.data_loading.data_loader_util:This host is algo-1.
INFO:analyzer.data_loading.data_loader_util:This host is the leader.
INFO:analyzer.data_loading.data_loader_util:Number of hosts in the cluster is 1.
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
24/05/10 23:24:22 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
INFO:analyzer.predictor.managed_endpoint:Checking endpoint status:
Legend:
(OutOfService: x, Creating: -, Updating: -, InService: !, RollingBack: <, Deleting: o, Failed: *)
INFO:analyzer.predictor.managed_endpoint:Endpoint is

## Analysis Config

### Retrieve Config From s3

In [22]:
s3.download_file(bucket_name, "deepar/output/analysis_config.json", "analysis_config.json")

### Display Config

In [23]:
with open("./analysis_config.json", "r") as analysis_config_file:
    analysis_config = json.load(analysis_config_file)
    # mercury.JSON(analysis_config, level=3)
    config_printer = pprint.PrettyPrinter(width=200, compact=False)
    config_printer.pprint(analysis_config)

{'dataset_type': 'application/json',
 'headers': ['item_id', 'timestamp', 'target_value', 'dynamic_feature_1', 'dynamic_feature_2', 'dynamic_feature_3', 'static_feature_1', 'static_feature_2'],
 'methods': {'asymmetric_shapley_value': {'baseline': {'related_time_series': 'zero', 'static_covariates': {'interpol': [1, 1], 'mosfets': [0, 1]}, 'target_time_series': 'zero'},
                                          'direction': 'chronological',
                                          'granularity': 'fine_grained',
                                          'num_samples': 9},
             'report': {'name': 'report', 'title': 'Analysis Report'}},
 'predictor': {'accept_type': 'application/json',
               'content_template': '{"instances": $records}',
               'content_type': 'application/json',
               'endpoint_name': 'DeepArTest-2024-05-10-23-05-33-616',
               'record_template': '{"start": $start_time, "target": $target_time_series, "dynamic_feat": $related_ti

## Explainability Results

### Retrieve Results From s3

In [24]:
full_result_path = f"deepar/output/asymmetric_shapley_value/{granularity}_{direction}/out.jsonl"

s3.download_file(bucket_name, full_result_path, "results.jsonl")
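The result file is in JSON Lines format. Judging by the output displayed in the next section, each line holds an `explanations` list whose entries carry a `feature_name`, a `timestamp`, and one score per forecast step. Under that assumption, a per-feature summary could be computed as sketched below. The sample values and the `total_abs_attribution` helper are hypothetical.

```python
import json
from collections import defaultdict


def total_abs_attribution(jsonl_lines):
    """Sum absolute scores per feature across all timestamps and forecast steps."""
    totals = defaultdict(float)
    for line in jsonl_lines:
        if not line.strip():
            continue  # skip blank lines
        for explanation in json.loads(line)["explanations"]:
            totals[explanation["feature_name"]] += sum(
                abs(score) for score in explanation["scores"]
            )
    return dict(totals)


sample_lines = [  # made-up scores in the assumed result layout
    json.dumps(
        {
            "explanations": [
                {"feature_name": "target_value", "timestamp": "2019-09-11", "scores": [-2.0, 1.0]},
                {"feature_name": "dynamic_feature_1", "timestamp": "2019-09-11", "scores": [0.5, -0.5]},
                {"feature_name": "target_value", "timestamp": "2019-09-12", "scores": [3.0, 0.0]},
            ]
        }
    )
]
summary = total_abs_attribution(sample_lines)
print(summary)
```

With the real file, the lines read from `results.jsonl` would take the place of `sample_lines`.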

### Display Results

In [25]:
with open("./results.jsonl", "r") as results_file:
    results_lines = results_file.readlines()
    explainability_results = [json.loads(jsonline) for jsonline in results_lines]
    # mercury.JSON(explainability_results, level=5)
    results_printer = pprint.PrettyPrinter(width=200, depth=5, compact=False)
    results_printer.pprint(explainability_results)

[{'explanations': [{'feature_name': 'target_value',
                    'scores': [-82043.3488498264,
                               -87784.30300564234,
                               -95570.80181206595,
                               -100703.38704427084,
                               -108422.38612196181,
                               -115080.98632812497,
                               -121441.78054470489,
                               -131199.48415798612,
                               -134081.09966362847,
                               -137848.36431206597,
                               -143613.4234483507,
                               -144842.0398763021,
                               -146705.1466471354,
                               -147841.47526041666],
                    'timestamp': '2019-09-11'},
                   {'feature_name': 'target_value',
                    'scores': [7689.5171440972435,
                               6485.490559895823,
                         

                    'scores': [-563.064453125,
                               -562.7842881944445,
                               -718.6601562499999,
                               -346.71918402777777,
                               -347.92664930555554,
                               -139.30642361111111,
                               -468.6245659722222,
                               -12.202907986111,
                               -605.9900173611111,
                               -99.1861979166667,
                               -505.09461805555543,
                               310.63541666666663,
                               -134.44704861111111,
                               -24.413628472222186],
                    'timestamp': '2019-09-15'},
                   {'feature_name': 'dynamic_feature_1',
                    'scores': [-46.96440972222223,
                               37.3315972222222,
                               -767.8138020833333,
                              

                    'scores': [489.5473090277778,
                               -0.12369791666662877,
                               -559.84765625,
                               267.04427083333326,
                               -160.44574652777786,
                               -45.55338541666664,
                               -750.3932291666667,
                               -750.2881944444445,
                               -620.0759548611112,
                               -476.6644965277777,
                               -27.673177083333336,
                               -564.2052951388889,
                               -55.33159722222226,
                               -384.2868923611111],
                    'timestamp': '2019-09-25'},
                   {'feature_name': 'dynamic_feature_1',
                    'scores': [388.1753472222223,
                               208.61979166666663,
                               111.1050347222222,
                               

                    'scores': [197.22135416666663,
                               261.11783854166663,
                               262.8357204861111,
                               -96.21875000000001,
                               -101.90364583333333,
                               -215.96093749999994,
                               -281.478515625,
                               -242.67708333333334,
                               757.7749565972221,
                               87.9772135416667,
                               438.9995659722222,
                               130.95203993055557,
                               496.5668402777777,
                               260.42057291666663],
                    'timestamp': '2019-09-15'},
                   {'feature_name': 'dynamic_feature_2',
                    'scores': [389.94791666666663,
                               -19.075520833333314,
                               407.54774305555554,
                               25

                    'scores': [-131.72960069444446,
                               -508.34331597222223,
                               185.19184027777783,
                               583.2400173611111,
                               455.5464409722222,
                               -1127.2196180555554,
                               241.14800347222217,
                               -110.51736111111116,
                               -221.92968749999994,
                               129.87847222222223,
                               -565.3042534722222,
                               529.3055555555558,
                               -205.93402777777777,
                               100.5355902777778],
                    'timestamp': '2019-09-25'},
                   {'feature_name': 'dynamic_feature_2',
                    'scores': [-151.48567708333337,
                               -491.49131944444434,
                               -625.3285590277778,
                       

                               -28.083767361111114,
                               126.58854166666664,
                               -72.64800347222227,
                               287.26475694444446,
                               99.79079861111111,
                               97.09244791666669,
                               -40.82291666666657,
                               -570.3268229166666,
                               -525.1814236111112,
                               780.3298611111111,
                               28.94748263888897,
                               -372.99348958333337,
                               194.08680555555554],
                    'timestamp': '2019-09-15'},
                   {'feature_name': 'dynamic_feature_3',
                    'scores': [72.60416666666664,
                               -648.7157118055555,
                               -668.7304687499999,
                               -285.97829861111114,
                             

                               194.72699652777777,
                               -425.32812500000006,
                               -313.32291666666663,
                               93.29079861111111,
                               96.75390625,
                               689.0373263888888,
                               91.44618055555554,
                               350.39539930555554,
                               176.57725694444449,
                               -392.16276041666663,
                               -395.5551215277777,
                               336.7803819444444],
                    'timestamp': '2019-09-25'},
                   {'feature_name': 'dynamic_feature_3',
                    'scores': [-548.1961805555554,
                               360.04991319444434,
                               128.63020833333334,
                               129.4691840277778,
                               320.2430555555556,
                               26.194

                    'timestamp': '2020-04-06'},
                   {'feature_name': 'target_value',
                    'scores': [64727.56944444444,
                               68681.20399305556,
                               59555.7126736111,
                               71931.05121527777,
                               56163.52951388888,
                               57878.43402777778,
                               56298.8125

-6242.9236111111095,
                               -2361.6249999999995,
                               1269.552083333333,
                               -1675.9062500000002,
                               -2840.9756944444443,
                               2166.701388888889,
                               3390.6076388888887,
                               6338.232638888888,
                               2333.9027777777774,
                               -5401.322916666667,
                               1423.3472222222217],
                    'timestamp': '2020-04-20'},
                   {'feature_name': 'dynamic_feature_1',
                    'scores': [2976.7847222222213,
                               2883.4583333333335,
                               -2344.5763888888887,
                               5402.187499999999,
                               1556.0277777777778,
                               1672.0416666666665,
                               2931.4201388888887,
      

                               884.0937499999999,
                               -3206.1701388888887,
                               -1030.6249999999995,
                               -426.40624999999966,
                               -3520.715277777778,
                               848.4444444444446,
                               4196.760416666666,
                               -2037.1874999999998,
                               -4840.243055555555,
                               -6562.885416666666],
                    'timestamp': '2020-04-10'},
                   {'feature_name': 'dynamic_feature_2',
                    'scores': [-2478.0902777777774,
                               6971.85763888889,
                               -3095.8125000000005,
                               -752.8263888888889,
                               2977.895833333333,
                               -3053.6770833333335,
                               2253.8055555555557,
                          

                               -2297.5,
                               -6882.1423611111095,
                               -3430.777777777777,
                               3102.7152777777783,
                               3994.1527777777774,
                               -857.6111111111104,
                               4963.857638888888,
                               -2983.475694444444,
                               9046.923611111111,
                               -3318.3715277777774],
                    'timestamp': '2020-04-20'},
                   {'feature_name': 'dynamic_feature_2',
                    'scores': [1264.5,
                               5942.038194444444,
                               5776.211805555556,
                               -2278.027777777778,
                               17.534722222221944,
                               4893.451388888887,
                               36.309027777777885,
                               7534.628472222223,
   

                               4457.052083333332,
                               2090.322916666667,
                               1322.201388888889,
                               7918.211805555553,
                               4547.215277777777,
                               3524.107638888889,
                               -3745.4826388888887,
                               1478.8368055555554],
                    'timestamp': '2020-04-10'},
                   {'feature_name': 'dynamic_feature_3',
                    'scores': [3654.3368055555557,
                               3086.5972222222217,
                               246.72222222222177,
                               1385.9062499999998,
                               1881.2743055555552,
                               2920.4513888888887,
                               5863.270833333332,
                               313.6180555555552,
                               7925.489583333332,
                               435.

                               -2181.8541666666665,
                               -3910.052083333333,
                               3539.833333333333,
                               -5.249999999999943,
                               -158.19097222222194,
                               -1041.4131944444453,
                               1981.9131944444443],
                    'timestamp': '2020-04-20'},
                   {'feature_name': 'dynamic_feature_3',
                    'scores': [-5243.788194444445,
                               -9694.184027777777,
                               -834.3055555555552,
                               -718.0069444444441,
                               1505.6215277777778,
                               -2263.78125,
                               3751.3750000000005,
                               -4682.649305555556,
                               -4790.541666666667,
                               2113.7986111111113,
                               1
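
The full dump above is long and hard to compare by eye. As a minimal sketch, the snippet below condenses results of the same shape into one number per feature: the mean absolute score across all records, timestamps, and score positions. The helper name `mean_abs_score_by_feature` and the toy input are assumptions made for illustration; they are not part of the Clarify output or the SageMaker SDK, and the toy values are made up. Averaging absolute values is only one possible way to summarize the scores.

```python
from collections import defaultdict


def mean_abs_score_by_feature(results):
    """Return the mean absolute score per feature name.

    `results` is assumed to be a list of records shaped like the output above:
    each record has an "explanations" list whose items carry "feature_name",
    "timestamp", and a list of numeric "scores".
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for record in results:
        for explanation in record.get("explanations", []):
            name = explanation["feature_name"]
            for score in explanation["scores"]:
                totals[name] += abs(score)
                counts[name] += 1
    return {name: totals[name] / counts[name] for name in totals}


# Toy input that only mimics the structure of the real results (made-up values).
toy_results = [
    {
        "explanations": [
            {"feature_name": "target_value", "timestamp": "2019-09-11", "scores": [-4.0, 2.0]},
            {"feature_name": "dynamic_feature_1", "timestamp": "2019-09-11", "scores": [1.0, -1.0]},
        ]
    }
]

summary = mean_abs_score_by_feature(toy_results)
# Print features from largest to smallest mean absolute score.
for name, value in sorted(summary.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: {value:.2f}")
```

In the notebook, passing `explainability_results` instead of `toy_results` would apply the same summary to the downloaded results, assuming they keep the structure shown above.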

## Clean Up

Remove downloaded/installed files and deployed resources as necessary.

In [26]:
# Remove the model, endpoint configuration, and endpoint from SageMaker
sagemaker_client = boto3.client("sagemaker")
sagemaker_client.delete_model(ModelName=endpoint_name)
sagemaker_client.delete_endpoint_config(EndpointConfigName=endpoint_name)
sagemaker_client.delete_endpoint(EndpointName=endpoint_name)

{'ResponseMetadata': {'RequestId': '8d05daba-cdf0-407c-96c6-606bba8e0f9a',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'x-amzn-requestid': '8d05daba-cdf0-407c-96c6-606bba8e0f9a',
   'content-type': 'application/x-amz-json-1.1',
   'date': 'Fri, 10 May 2024 23:25:47 GMT',
   'content-length': '0'},
  'RetryAttempts': 0}}

In [27]:
# Uncomment to remove the local sagemaker-python-sdk directory, the analysis config, and the results file
# !rm -r ./sagemaker-python-sdk -f
# !rm analysis_config.json
# !rm results.jsonl
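
If shell commands are not available in your environment, the same local cleanup can be sketched in plain Python. The function name `remove_local_artifacts` and its `base_dir` parameter are hypothetical; the file and directory names are the ones used in the commented-out commands above. The call is left commented out, as in the original cell, so that nothing is deleted by accident.

```python
import shutil
from pathlib import Path


def remove_local_artifacts(base_dir="."):
    """Delete the notebook's local files under base_dir and return the names removed."""
    base = Path(base_dir)
    removed = []
    # Files downloaded from S3 earlier in the notebook.
    for file_name in ("analysis_config.json", "results.jsonl"):
        path = base / file_name
        if path.is_file():
            path.unlink()
            removed.append(path.name)
    # Local SDK directory, if it exists.
    sdk_dir = base / "sagemaker-python-sdk"
    if sdk_dir.is_dir():
        shutil.rmtree(sdk_dir)
        removed.append(sdk_dir.name)
    return removed


# Uncomment to delete the files in the current working directory.
# remove_local_artifacts()
```

Missing files are skipped, so the function can be run more than once without raising an error.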

## Notebook CI Test Results

This notebook was tested in multiple regions. The results for each region are shown below; the us-west-2 result appears at the top of the notebook.

![This us-east-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/us-east-1/sagemaker-clarify|time_series_deepar|time_series_deepar.ipynb)

![This us-east-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/us-east-2/sagemaker-clarify|time_series_deepar|time_series_deepar.ipynb)

![This us-west-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/us-west-1/sagemaker-clarify|time_series_deepar|time_series_deepar.ipynb)

![This ca-central-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/ca-central-1/sagemaker-clarify|time_series_deepar|time_series_deepar.ipynb)

![This sa-east-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/sa-east-1/sagemaker-clarify|time_series_deepar|time_series_deepar.ipynb)

![This eu-west-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/eu-west-1/sagemaker-clarify|time_series_deepar|time_series_deepar.ipynb)

![This eu-west-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/eu-west-2/sagemaker-clarify|time_series_deepar|time_series_deepar.ipynb)

![This eu-west-3 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/eu-west-3/sagemaker-clarify|time_series_deepar|time_series_deepar.ipynb)

![This eu-central-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/eu-central-1/sagemaker-clarify|time_series_deepar|time_series_deepar.ipynb)

![This eu-north-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/eu-north-1/sagemaker-clarify|time_series_deepar|time_series_deepar.ipynb)

![This ap-southeast-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/ap-southeast-1/sagemaker-clarify|time_series_deepar|time_series_deepar.ipynb)

![This ap-southeast-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/ap-southeast-2/sagemaker-clarify|time_series_deepar|time_series_deepar.ipynb)

![This ap-northeast-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/ap-northeast-1/sagemaker-clarify|time_series_deepar|time_series_deepar.ipynb)

![This ap-northeast-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/ap-northeast-2/sagemaker-clarify|time_series_deepar|time_series_deepar.ipynb)

![This ap-south-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/ap-south-1/sagemaker-clarify|time_series_deepar|time_series_deepar.ipynb)
