# Getting Started with Amazon Forecast (Python SDK)

> *This notebook should work well in the `Python 3 (Data Science)` kernel in SageMaker Studio, or `conda_python3` in SageMaker Notebook Instances*

Now that the only mandatory dataset, the Target Time-Series (TTS), is prepared in a compatible CSV format and uploaded to Amazon S3, we're ready to start our experiments with Amazon Forecast.

**In this notebook** we'll use the **AWS Python SDK** to:

- Create the "Dataset Group" wrapper in Forecast to store our project
- Define our TTS dataset schema and import the prepared data
- Create some **predictors** - training models on our data
- Evaluate metrics on how good our predictors seem from the training process
- Create and export some **forecasts**

Here we use Python commands to automate the various steps. Check out [Notebook 2a](2a.%20Getting%20Started%20with%20Forecast%20(Console).ipynb) for an alternative guide through the same steps using the [Amazon Forecast Console](https://console.aws.amazon.com/forecast/home) instead!

Before starting we'll load the required libraries, restore our saved variables from previous notebooks, and establish a connection to the Forecast service:

In [None]:
%load_ext autoreload
%autoreload 2

# Python Built-Ins:
import json
import os
from pprint import pprint as prettyprint
import secrets
import string
import time
from types import SimpleNamespace  # (because Python dict ["key"] notation can get boring)

# External Dependencies:
import boto3
from IPython.display import Markdown
import pandas as pd

# Local Dependencies:
import util

In [None]:
%store -r

In [None]:
session = boto3.Session(region_name=region)

forecast = session.client("forecast")
forecast_query = session.client("forecastquery")

s3 = session.resource("s3")
export_bucket = s3.Bucket(export_bucket_name)

## Setting up the Dataset Group

A **Dataset Group** is the highest level of abstraction in Amazon Forecast, and contains all the source datasets for a particular collection of Forecasts. A Dataset Group contains **up to one of each type** of dataset (Target Time-Series, Related Time-Series, and Item Metadata) and no information is shared between Dataset Groups - so if you'd like to try out various alternatives to the schemas we create below, you could create a new Dataset Group and make your changes inside its corresponding Datasets.

To create our Dataset Group, all we need is:

- **A name** - We'll create a semi-random name so you can re-run this notebook to create new trials.
- **Our chosen [domain](https://docs.aws.amazon.com/forecast/latest/dg/howitworks-domains-ds-types.html)** - here using `CUSTOM` because the traffic forecasting use case doesn't map clearly to any of the other predefined domains.

In [None]:
project = "forecast_poc_{}".format(
    # Add a random suffix for uniqueness:
    "".join(secrets.choice(string.ascii_lowercase + string.digits) for i in range(4))
)
%store project
print(f"Project name '{project}'")

dataset_group_name = project + "_dsg"
%store dataset_group_name

In [None]:
# Create the DatasetGroup
create_dataset_group_response = forecast.create_dataset_group(
    DatasetGroupName=dataset_group_name,
    Domain="CUSTOM",
)
dataset_group_arn = create_dataset_group_response["DatasetGroupArn"]
%store dataset_group_arn
print(f"Created Dataset Group {dataset_group_arn}")

In [None]:
forecast.describe_dataset_group(DatasetGroupArn=dataset_group_arn)

## Defining the TTS Dataset

Next we'll define the **schema** of our TTS dataset, including both the **mandatory** fields for our chosen [domain](https://docs.aws.amazon.com/forecast/latest/dg/howitworks-domains-ds-types.html) and any **optional** fields we've chosen to add.

> ⚠️ The schema must **match the prepared data exactly**, *including the order of columns*, because Forecast will validate the data against it when importing!

In [None]:
schema = {
    "Attributes": [
        {
            "AttributeName": "timestamp",
            "AttributeType": "timestamp",
        },
        {
            "AttributeName": "target_value",
            "AttributeType": "float",
        },
        {
            "AttributeName": "item_id",
            "AttributeType": "string",
        },
        # A TTS schema may include extra columns to split the forecast out by other dimensions e.g. location, beyond item_id.
    ],
}
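
Since a column-order mismatch only shows up once the import job fails, it can be worth sanity-checking a few rows locally first. The following is a minimal sketch rather than part of the Forecast SDK: the `check_rows_against_schema` helper and its `PARSERS` table are hypothetical, the sample rows are made up, and it assumes a header-less CSV with hourly-style timestamps.

```python
import csv
import io
from datetime import datetime

# Illustrative copy of the schema above (same attribute order):
example_schema = {
    "Attributes": [
        {"AttributeName": "timestamp", "AttributeType": "timestamp"},
        {"AttributeName": "target_value", "AttributeType": "float"},
        {"AttributeName": "item_id", "AttributeType": "string"},
    ],
}

# Hypothetical parsers for each attribute type (assumes hourly-style timestamps):
PARSERS = {
    "timestamp": lambda s: datetime.strptime(s, "%Y-%m-%d %H:%M:%S"),
    "float": float,
    "integer": int,
    "string": str,
}


def check_rows_against_schema(csv_text, schema, max_rows=5):
    """Return a list of problems found in the first `max_rows` rows of a header-less CSV."""
    attributes = schema["Attributes"]
    problems = []
    for row_num, row in enumerate(csv.reader(io.StringIO(csv_text)), start=1):
        if row_num > max_rows:
            break
        if len(row) != len(attributes):
            problems.append(f"Row {row_num}: expected {len(attributes)} columns, got {len(row)}")
            continue
        for value, attr in zip(row, attributes):
            try:
                PARSERS[attr["AttributeType"]](value)
            except ValueError:
                problems.append(
                    f"Row {row_num}: {value!r} is not a valid {attr['AttributeType']} "
                    f"for column {attr['AttributeName']!r}"
                )
    return problems


# Made-up sample rows, for illustration only:
good_csv = "2017-01-01 00:00:00,1513.0,all\n2017-01-01 01:00:00,1550.0,all\n"
swapped_csv = "all,2017-01-01 00:00:00,1513.0\n"  # Columns in the wrong order

good_problems = check_rows_against_schema(good_csv, example_schema)
swapped_problems = check_rows_against_schema(swapped_csv, example_schema)
print(good_problems)
print(swapped_problems)
```

A check like this only catches type and ordering problems in the first few rows; the import job itself remains the authoritative validation.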

Together with this schema, a few other core attributes define our dataset:

- The **domain**, which must match our target Dataset Group
- The [**frequency**](https://docs.aws.amazon.com/forecast/latest/dg/howitworks-datasets-groups.html#howitworks-data-alignment) (e.g. hourly, daily, etc), which will determine what frequencies of forecasts we can build from our data (e.g. can't build hourly forecasts from weekly source data!)
- A **name**, which we'll set in line with our overall project

As discussed in the ['Resolving Conflicts in Data Collection Frequency' doc](https://docs.aws.amazon.com/forecast/latest/dg/howitworks-datasets-groups.html#howitworks-data-alignment), the raw CSV data will be automatically mapped and aggregated to the chosen frequency time-steps if it doesn't already match. Just check that the right aggregation is being applied in later steps, in case you have any mismatches!

![Illustration of Time Step Binning](https://docs.aws.amazon.com/forecast/latest/dg/images/data-alignment.png)
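
To make the binning idea concrete, here is a rough local sketch of aligning irregular records to hourly time-steps. It assumes the values in each bin are combined by summation; the `aggregate_to_hourly` helper and the sample records are hypothetical and only illustrate the concept, not the service's internal implementation.

```python
from collections import defaultdict
from datetime import datetime


def aggregate_to_hourly(records):
    """Sum (timestamp, value) records into hourly bins keyed by the start of each hour."""
    bins = defaultdict(float)
    for timestamp, value in records:
        bin_start = timestamp.replace(minute=0, second=0, microsecond=0)
        bins[bin_start] += value
    return dict(sorted(bins.items()))


# Made-up records that don't fall exactly on the hour:
raw = [
    (datetime(2017, 1, 1, 0, 10), 100.0),
    (datetime(2017, 1, 1, 0, 40), 150.0),
    (datetime(2017, 1, 1, 1, 5), 90.0),
]
hourly = aggregate_to_hourly(raw)
for bin_start, total in hourly.items():
    print(bin_start, total)
```

Here the two records within the first hour are combined into a single time-step, which is why it's worth checking that summation (rather than, say, averaging) is what you want for your data.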

In [None]:
response = forecast.create_dataset(
    Domain="CUSTOM",
    DatasetType="TARGET_TIME_SERIES",
    DatasetName=project + "_tts",
    DataFrequency="H",  # Hourly data in our case 
    Schema=schema,
)

tts_arn = response["DatasetArn"]
%store tts_arn
print(f"Created dataset {tts_arn}")

In [None]:
forecast.describe_dataset(DatasetArn=tts_arn)

Finally we must **attach** our new dataset to the dataset group, to associate it and have it show in the console:

In [None]:
# Attach the Dataset to the Dataset Group:
forecast.update_dataset_group(DatasetGroupArn=dataset_group_arn, DatasetArns=[tts_arn])

## Importing the TTS Data

We're now ready to populate our Amazon Forecast dataset with the contents of our CSV file from S3.

Since this requires the Amazon Forecast service to access the Amazon S3 bucket, this is where we need the *service role* created in Notebook 0, which has access to the target bucket and trusts the Forecast service. If you don't have such a role set up in your account yet, refer to Notebook 0 for details:

In [None]:
%store -r forecast_role_arn
assert isinstance(forecast_role_arn, str), "`forecast_role_arn` must be an IAM role ARN (string)"

Below we trigger a **dataset import job**. This is a batch **overwriting** process: it clears out any pre-existing data in the dataset, rather than appending data to existing records.

Triggering the import requires:

- **Naming** the import job, which will be trackable as an entity in the console
- Identifying the **target dataset** by its Amazon Resource Name (ARN)
- Configuring the **data source**, including the S3 location and also the IAM role used to grant access
- Specifying the **timestamp format**, since some variations are permitted according to the [dataset guidelines](https://docs.aws.amazon.com/forecast/latest/dg/dataset-import-guidelines-troubleshooting.html)

In [None]:
ds_import_job_response = forecast.create_dataset_import_job(
    # You might append a timestamp to the import name in practice, to keep it unique... But here we choose a
    # *static* value deliberately, to avoid accidentally & unnecessarily re-importing the PoC data!
    DatasetImportJobName="poc_import_tts",
    DatasetArn=tts_arn,
    DataSource={
        "S3Config": {
            "Path": target_s3uri,
            "RoleArn": forecast_role_arn,
        },
    },
    # Hourly data uses a 24-hour "HH" component (daily data might omit the HH:mm:ss part entirely)
    TimestampFormat="yyyy-MM-dd HH:mm:ss",
)

ds_import_job_arn = ds_import_job_response["DatasetImportJobArn"]
print(ds_import_job_arn)
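
If you're unsure which timestamp layout your CSV actually uses, a quick local check on a few raw values can help before submitting the import. This sketch is our own illustration: `detect_timestamp_layout` is a hypothetical helper, and the two candidate layouts are expressed as Python `strptime` patterns rather than the pattern syntax passed to the Forecast API.

```python
from datetime import datetime

# Python strptime patterns for the two timestamp layouts discussed above:
CANDIDATE_LAYOUTS = {
    "date and time": "%Y-%m-%d %H:%M:%S",
    "date only": "%Y-%m-%d",
}


def detect_timestamp_layout(value):
    """Return the name of the first layout that parses `value`, or None if none match."""
    for name, pattern in CANDIDATE_LAYOUTS.items():
        try:
            datetime.strptime(value, pattern)
            return name
        except ValueError:
            continue
    return None


print(detect_timestamp_layout("2017-01-01 13:00:00"))
print(detect_timestamp_layout("2017-01-01"))
print(detect_timestamp_layout("01/01/2017"))
```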

> ⏰ The import process can **take a little time** (on the order of ~10-15 minutes for our sample dataset) because of validation, filling & aggregation, and the overhead of spinning up infrastructure to execute the import.

On small datasets like this, overheads can dominate the run-time and you should expect much better-than-linear scaling as dataset size is increased from this level.

Below we'll set up a poll every 30 seconds to wait for the import to complete, after which we'll be ready to use the dataset to train predictors:

In [None]:
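def is_import_status_finished(desc):
    """Return True once the import is ACTIVE, False while in progress; raise if it failed"""
    status = desc["Status"]
    if status == "ACTIVE":
        return True
    elif "FAILED" in status:
        raise ValueError(f"Data import failed!\n{desc}")
    return False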

util.progress.polling_spinner(
    fn_poll_result=lambda: forecast.describe_dataset_import_job(DatasetImportJobArn=ds_import_job_arn),
    fn_is_finished=is_import_status_finished,
    fn_stringify_result=lambda d: d["Status"],
    poll_secs=30,
    timeout_secs=60*60,  # Max 1 hour
)
print("Data imported")
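
The `util.progress.polling_spinner` helper lives in this repository's local `util` module, so its internals aren't shown here. As a rough idea of the pattern, a bare-bones polling loop might look like the sketch below. The `poll_until` function is hypothetical (not the actual helper), and the job statuses are simulated so that the example runs without calling AWS.

```python
import time


def poll_until(fn_poll_result, fn_is_finished, poll_secs=30, timeout_secs=3600, sleep=time.sleep):
    """Call fn_poll_result repeatedly until fn_is_finished(result) is truthy, or time out."""
    waited = 0
    while True:
        result = fn_poll_result()
        if fn_is_finished(result):
            return result
        if waited >= timeout_secs:
            raise TimeoutError(f"Gave up after {waited} seconds")
        sleep(poll_secs)
        waited += poll_secs


# Simulated job that becomes ACTIVE on the third status check:
statuses = iter(["CREATE_PENDING", "CREATE_IN_PROGRESS", "ACTIVE"])
final = poll_until(
    fn_poll_result=lambda: {"Status": next(statuses)},
    fn_is_finished=lambda desc: desc["Status"] == "ACTIVE",
    poll_secs=30,
    sleep=lambda secs: None,  # Skip real waiting in this demonstration
)
print(final["Status"])
```

Passing the `sleep` function as a parameter is a design choice that keeps the loop easy to test; in real use the default `time.sleep` would apply.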

Once the import is complete, we can also query the API or console UI for some metrics to verify that the data was processed as we expect:

In [None]:
forecast.describe_dataset_import_job(DatasetImportJobArn=ds_import_job_arn)

## Creating and Training Predictors

As soon as the only mandatory dataset (TTS) is set up and populated with data, we're able to start training forecast models or "Predictors".

We can create multiple predictors within our Dataset Group, and that's exactly what we'll do in this notebook to compare the results of a few different algorithms offered by the service.

Amazon Forecast offers 6 (at the time of writing) algorithms as described in more detail on the ["Choosing an Algorithm" doc page](https://docs.aws.amazon.com/forecast/latest/dg/aws-forecast-choosing-recipes.html) - which we've grouped here into 3 rough categories:

In [None]:
# Baseline/statistical methods:
arima_algorithm_arn = "arn:aws:forecast:::algorithm/ARIMA"
ets_algorithm_arn = "arn:aws:forecast:::algorithm/ETS"

# Probabilistic methods:
npts_algorithm_arn = "arn:aws:forecast:::algorithm/NPTS"
prophet_algorithm_arn = "arn:aws:forecast:::algorithm/Prophet"

# Deep learning methods:
deeparp_algorithm_arn = "arn:aws:forecast:::algorithm/Deep_AR_Plus"
cnnqr_algorithm_arn = "arn:aws:forecast:::algorithm/CNN-QR"

An **AutoML** option is also available, which automatically tries each algorithm, lists the metrics of each, and keeps the best model. In this example, though, we'll create a set of models manually so that we're able to generate a forecast output for each, to plot and compare.

When creating our predictors we'll need to specify, as well as the algorithm (by ARN):

- A **Name**, which we'll use to identify which algorithm is used in each experiment
- The **Forecast Frequency** (e.g. hourly, daily, etc.), which can't be finer-grained than the dataset frequency
- The **Forecast Horizon** in time-steps of this frequency, which can't exceed the lesser of 500 time-steps or 1/3 of the dataset length per the [quotas page](https://docs.aws.amazon.com/forecast/latest/dg/limits.html)
- Whether to use the **Built-In Holiday Dataset** for any country, to augment the data
- How to **Backtest** the model for accuracy metrics from the training data, as described in the ["Evaluating Predictor Accuracy" doc page](https://docs.aws.amazon.com/forecast/latest/dg/metrics.html)
- The **Featurization** configuration for how to [handle missing values](https://docs.aws.amazon.com/forecast/latest/dg/howitworks-missing-values.html) in each field.
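
The horizon limit in the list above is simple arithmetic, sketched below. The `max_forecast_horizon` helper is hypothetical, and the dataset lengths are example figures rather than the length of our actual sample data.

```python
def max_forecast_horizon(dataset_length, hard_limit=500):
    """Largest horizon allowed: the lesser of `hard_limit` steps or one third of the data."""
    return min(hard_limit, dataset_length // 3)


hourly_steps_in_one_year = 365 * 24  # 8760 time-steps
print(max_forecast_horizon(hourly_steps_in_one_year))  # Capped by the 500-step limit
print(max_forecast_horizon(600))  # Capped by one third of the data
```

For a year of hourly data the 500-step cap applies, so a 240-step horizon like the one we use fits comfortably.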


In [None]:
forecast_frequency = "H"
forecast_horizon = 240

evaluation_parameters = {
    "NumberOfBacktestWindows": 1,
    "BackTestWindowOffset": 240,
}

input_data_config = {
    "DatasetGroupArn": dataset_group_arn,
    "SupplementaryFeatures": [
        { "Name": "holiday", "Value": "US" },
    ],
}

featurization_config = {
    "ForecastFrequency": forecast_frequency,
    "Featurizations": [
        {
            "AttributeName": "target_value",
            "FeaturizationPipeline": [
                {
                    "FeaturizationMethodName": "filling",
                    "FeaturizationMethodParameters": {
                        "frontfill": "none",
                        "middlefill": "zero",
                        "backfill": "zero",
                    },
                },
            ],
        },
    ],
}
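
To illustrate what these filling options mean, the sketch below applies the same settings to a toy series indexed by integer time-steps. It reflects our reading of the missing-values documentation: gaps between an item's first and last records are "middle", and gaps between its last record and the end of the dataset are "back". The `fill_target` helper is hypothetical and the data is made up.

```python
def fill_target(observed, global_start, global_end, middlefill=0.0, backfill=0.0):
    """Fill gaps in one item's series. `observed` maps time-step index to value."""
    first, last = min(observed), max(observed)
    filled = {}
    for step in range(global_start, global_end + 1):
        if step in observed:
            filled[step] = observed[step]
        elif step < first:
            continue  # frontfill "none": steps before the item's first record stay empty
        elif step <= last:
            filled[step] = middlefill  # Gap between the item's first and last records
        else:
            filled[step] = backfill  # Gap between the item's last record and the global end
    return filled


# The item has records at steps 2 and 4 only; the dataset as a whole spans steps 0 to 6:
filled = fill_target({2: 5.0, 4: 7.0}, global_start=0, global_end=6)
print(filled)
```

Zero-filling suits data where a missing record genuinely means "nothing happened"; for a sensor that was simply offline, a different fill value may be more appropriate.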

In the following sections, we'll set up a top-level dictionary to store the results from our experiments and then kick off each Predictor's training:

In [None]:
results = {}

### ARIMA

In [None]:
arima_create_predictor_response = forecast.create_predictor(
    PredictorName=f"{project}_arima_algo_1",
    AlgorithmArn=arima_algorithm_arn,
    ForecastHorizon=forecast_horizon,
    PerformAutoML=False,
    PerformHPO=False,
    EvaluationParameters=evaluation_parameters,
    InputDataConfig=input_data_config,
    FeaturizationConfig=featurization_config,
)
results["ARIMA"] = SimpleNamespace(predictor_arn=arima_create_predictor_response["PredictorArn"])

### Prophet

In [None]:
prophet_create_predictor_response = forecast.create_predictor(
    PredictorName=f"{project}_prophet_algo_1",
    AlgorithmArn=prophet_algorithm_arn,
    ForecastHorizon=forecast_horizon,
    PerformAutoML=False,
    PerformHPO=False,
    EvaluationParameters=evaluation_parameters,
    InputDataConfig=input_data_config,
    FeaturizationConfig=featurization_config,
)
results["Prophet"] = SimpleNamespace(predictor_arn=prophet_create_predictor_response["PredictorArn"])

### DeepAR+

In [None]:
deeparp_create_predictor_response = forecast.create_predictor(
    PredictorName=f"{project}_deeparp_algo_1",
    AlgorithmArn=deeparp_algorithm_arn,
    ForecastHorizon=forecast_horizon,
    PerformAutoML=False,
    PerformHPO=False,
    EvaluationParameters=evaluation_parameters,
    InputDataConfig=input_data_config,
    FeaturizationConfig=featurization_config,
)
results["DeepAR+"] = SimpleNamespace(predictor_arn=deeparp_create_predictor_response["PredictorArn"])

### CNN-QR

We've commented out CNN-QR sections because in our tests the algorithm takes longer to train on the small sample dataset than DeepAR+ - with comparable accuracy. On many larger "real" datasets CNN-QR can be much faster, so we'd recommend experimenting with it on your own data!

In [None]:
# cnnqr_create_predictor_response = forecast.create_predictor(
#     PredictorName=f"{project}_cnnqr_algo_1",
#     AlgorithmArn=cnnqr_algorithm_arn,
#     ForecastHorizon=forecast_horizon,
#     PerformAutoML=False,
#     PerformHPO=False,
#     EvaluationParameters=evaluation_parameters,
#     InputDataConfig=input_data_config,
#     FeaturizationConfig=featurization_config,
# )
# results["CNN-QR"] = SimpleNamespace(predictor_arn=cnnqr_create_predictor_response["PredictorArn"])

Training runs asynchronously, so we need to poll the status of each predictor until it finishes. The cell below checks every predictor we created and returns once they have all either become active or failed, so you can move on to the next step.

In [None]:
in_progress_predictors = [results[r].predictor_arn for r in results]
failed_predictors = []

def check_status():
    """Check and update in_progress_predictors"""
    just_stopped = []  # Can't edit the in_progress list directly inside the loop!
    for arn in in_progress_predictors:
        predictor_desc = forecast.describe_predictor(PredictorArn=arn)
        status = predictor_desc["Status"]
        if status == "ACTIVE":
            print(f"\nBuild succeeded for {arn}")
            just_stopped.append(arn)
        elif "FAILED" in status:
            print(f"\nBuild failed for {arn}")
            just_stopped.append(arn)
            failed_predictors.append(arn)
    for arn in just_stopped:
        in_progress_predictors.remove(arn)
    return in_progress_predictors

util.progress.polling_spinner(
    fn_poll_result=check_status,
    fn_is_finished=lambda l: len(l) == 0,
    fn_stringify_result=lambda l: f"{len(l)} predictor builds in progress",
    poll_secs=60,  # Poll every minute
    timeout_secs=3*60*60,  # Max 3 hours
)

if len(failed_predictors):
    raise RuntimeError(f"The following predictors failed to train:\n{failed_predictors}")

## Examining the Predictors

Once each of the Predictors is in an `Active` state you can get metrics about it to better understand its accuracy and behavior. These are computed based on the hold out periods we defined when building the Predictor. The metrics are meant to guide our decisions when we use a particular Predictor to generate a forecast.

Below we'll define a utility function which retrieves (and prints) the raw accuracy metrics response, and also builds up a leaderboard. In the following cells, we'll run the function against each trained predictor.

In [None]:
def evaluate_trial_metrics(trial_name=None) -> pd.DataFrame:
    """Utility to fetch the accuracy metrics for a predictor and output the leaderboard so far"""
    if trial_name:
        # Print the raw API response:
        metrics_response = forecast.get_accuracy_metrics(PredictorArn=results[trial_name].predictor_arn)
        print(f"Raw metrics for {trial_name}:")
        prettyprint(metrics_response)

        # Save the payload section to results:
        evaluation_results = metrics_response["PredictorEvaluationResults"]
        results[trial_name].evaluation_results = evaluation_results

        # Construct simplified version for our comparison:
        try:
            summary_metrics = next(
                w for w in evaluation_results[0]["TestWindows"] if w["EvaluationType"] == "SUMMARY"
            )["Metrics"]
        except StopIteration:
            raise ValueError("Couldn't find SUMMARY metrics in Forecast API response")
        results[trial_name].summary_metrics = {
            "RMSE": summary_metrics["RMSE"],
            "10% wQL": next(
                l["LossValue"] for l in summary_metrics["WeightedQuantileLosses"] if l["Quantile"] == 0.1
            ),
            "50% wQL (MAPE)": next(
                l["LossValue"] for l in summary_metrics["WeightedQuantileLosses"] if l["Quantile"] == 0.5
            ),
            "90% wQL": next(
                l["LossValue"] for l in summary_metrics["WeightedQuantileLosses"] if l["Quantile"] == 0.9
            ),
        }
    # Render the leaderboard:
    return pd.DataFrame([
        { "Predictor": name, **results[name].summary_metrics } for name in results
        if "summary_metrics" in results[name].__dict__
    ]).set_index("Predictor")
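
For intuition about the weighted quantile loss (wQL) figures in the leaderboard, here is a small worked sketch of the metric as we understand it from the "Evaluating Predictor Accuracy" doc page: twice the summed quantile (pinball) loss, normalized by the sum of absolute actual values. The `weighted_quantile_loss` function and the numbers are illustrative only, not taken from our trained predictors.

```python
def weighted_quantile_loss(actuals, quantile_forecasts, tau):
    """wQL[tau] = 2 * sum(pinball losses) / sum(|actuals|)."""
    total_loss = 0.0
    for y, q in zip(actuals, quantile_forecasts):
        # Under-forecasts are weighted by tau, over-forecasts by (1 - tau):
        total_loss += tau * max(y - q, 0.0) + (1 - tau) * max(q - y, 0.0)
    return 2 * total_loss / sum(abs(y) for y in actuals)


# Made-up actuals and quantile forecasts:
actuals = [10.0, 20.0]
p50_forecasts = [8.0, 25.0]
p90_forecasts = [12.0, 30.0]

wql_50 = weighted_quantile_loss(actuals, p50_forecasts, 0.5)
wql_90 = weighted_quantile_loss(actuals, p90_forecasts, 0.9)
print(round(wql_50, 4))
print(round(wql_90, 4))
```

At the 50% quantile both directions of error are weighted equally, so the result reduces to the total absolute error divided by the total absolute actuals.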

### ARIMA

ARIMA is one of the gold standards for time series forecasting. This algorithm is not particularly sophisticated, but it is reliable and gives us a baseline of performance. Note that it does not capture seasonality very well, and it does not support item metadata or related time series. For that reason we'll explore it here, but not after adding other datasets.

In [None]:
evaluate_trial_metrics("ARIMA")

In our test, ARIMA scored an RMSE of ~1950 and a 50% weighted quantile loss of ~0.4715 (the leaderboard labels this metric MAPE, although strictly it is a weighted absolute percentage error, or WAPE). These ARIMA results give us a benchmark for the other predictors, where we'll look for a reduction versus these baseline loss figures. Your results may vary a little across the predictors.

### Prophet

As with ARIMA, let's explore the metrics:

In [None]:
evaluate_trial_metrics("Prophet")

In our test, Prophet achieved a slightly lower RMSE than ARIMA at around 1910 - but the weighted quantile losses were pretty similar and worse on some quantiles.

At this point, Prophet has not had a chance to use any of its abilities to integrate related time-series information since only the target time-series has been uploaded.

### DeepAR+

As with Prophet and ARIMA, let's explore the metrics:

In [None]:
evaluate_trial_metrics("DeepAR+")

In our test, DeepAR+ achieved a significant improvement in accuracy as measured both by RMSE (~1570) and at the 50% and 90% quantiles. The 10% quantile showed somewhat poorer performance, but overall accuracy was still good.

To see what all this looks like in a visual format, we'll now create a Forecast with each Predictor and then export them to S3 to explore.

### CNN-QR

In [None]:
# evaluate_trial_metrics("CNN-QR")

## Creating Forecasts

Inside Amazon Forecast a Forecast is a rendered collection of all of your items, at every time interval, for all selected quantiles, for your given forecast horizon. This process takes the Predictor you just created and uses it to generate these inferences and to store them in a useful state. Once a Forecast exists within the service you can query it and obtain a JSON response or use another API call to export it to a CSV that is stored in S3.

First, we'll *create* forecasts from our models using the commands below:

In [None]:
create_forecast_response = forecast.create_forecast(
    ForecastName=f"{project}_arima_algo_forecast",
    PredictorArn=results["ARIMA"].predictor_arn,
)
results["ARIMA"].forecast_arn = create_forecast_response["ForecastArn"]

In [None]:
create_forecast_response = forecast.create_forecast(
    ForecastName=f"{project}_prophet_algo_forecast",
    PredictorArn=results["Prophet"].predictor_arn,
)
results["Prophet"].forecast_arn = create_forecast_response["ForecastArn"]

In [None]:
create_forecast_response = forecast.create_forecast(
    ForecastName=f"{project}_deeparp_algo_forecast",
    PredictorArn=results["DeepAR+"].predictor_arn,
)
results["DeepAR+"].forecast_arn = create_forecast_response["ForecastArn"]

In [None]:
# create_forecast_response = forecast.create_forecast(
#     ForecastName=f"{project}_cnnqr_algo_forecast",
#     PredictorArn=results["CNN-QR"].predictor_arn,
# )
# results["CNN-QR"].forecast_arn = create_forecast_response["ForecastArn"]

As in the training step, we need to poll until the forecasts are complete before proceeding to the next step. The cell below will do that:

In [None]:
in_progress_forecasts = [results[r].forecast_arn for r in results]
failed_forecasts = []

def check_status():
    """Check and update in_progress_forecasts"""
    just_stopped = []  # Can't edit the in_progress list directly inside the loop!
    for arn in in_progress_forecasts:
        desc_response = forecast.describe_forecast(ForecastArn=arn)
        status = desc_response["Status"]
        if status == "ACTIVE":
            print(f"\nBuild succeeded for {arn}")
            just_stopped.append(arn)
        elif "FAILED" in status:
            print(f"\nBuild failed for {arn}")
            just_stopped.append(arn)
            failed_forecasts.append(arn)
    for arn in just_stopped:
        in_progress_forecasts.remove(arn)
    return in_progress_forecasts

util.progress.polling_spinner(
    fn_poll_result=check_status,
    fn_is_finished=lambda l: len(l) == 0,
    fn_stringify_result=lambda l: f"{len(l)} forecast builds in progress",
    poll_secs=60,  # Poll every 60s
    timeout_secs=2*60*60,  # Max 2 hours
)

if len(failed_forecasts):
    raise RuntimeError(f"The following forecasts failed:\n{failed_forecasts}")
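
Once a forecast is active, it can also be queried directly through the `forecastquery` client we created at the start of the notebook. The sketch below shows one way to flatten such a response into rows. The `flatten_predictions` helper is hypothetical, and the sample response is made up purely to illustrate the shape we'd expect back from `query_forecast`, so that the example runs without live resources.

```python
def flatten_predictions(query_response):
    """Turn a query_forecast-style response into a sorted list of (timestamp, quantile, value)."""
    rows = []
    for quantile, points in query_response["Forecast"]["Predictions"].items():
        for point in points:
            rows.append((point["Timestamp"], quantile, point["Value"]))
    return sorted(rows)


# With live resources, the response would come from a call like:
#   forecast_query.query_forecast(ForecastArn=..., Filters={"item_id": ...})
# The values below are made up purely to illustrate the response shape:
sample_response = {
    "Forecast": {
        "Predictions": {
            "p10": [{"Timestamp": "2018-01-01T00:00:00", "Value": 400.0}],
            "p50": [{"Timestamp": "2018-01-01T00:00:00", "Value": 650.0}],
            "p90": [{"Timestamp": "2018-01-01T00:00:00", "Value": 900.0}],
        }
    }
}
flat_rows = flatten_predictions(sample_response)
for row in flat_rows:
    print(row)
```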

## Exporting Forecasts

Although forecasts may be queried directly through the API (or the *Forecast lookup* tab of the [Amazon Forecast Console](https://console.aws.amazon.com/forecast/home)), it's also possible to **export** a forecast to an S3 bucket, and we'll use this method to visualize our results in this notebook.

Once the forecasts have entered `Active` status, they are ready to be exported. Below, we create **export jobs** for each forecast in our experiment:

In [None]:
export_locations = {
    "ARIMA": f"s3://{export_bucket_name}/exports/arima/tts-only",
    "Prophet": f"s3://{export_bucket_name}/exports/prophet/tts-only",
    "DeepAR+": f"s3://{export_bucket_name}/exports/deeparp/tts-only",
    #"CNN-QR": f"s3://{export_bucket_name}/exports/cnnqr/tts-only",
}

In [None]:
export_response = forecast.create_forecast_export_job(
    ForecastExportJobName="arima_export",
    ForecastArn=results["ARIMA"].forecast_arn,
    Destination={
        "S3Config": {
            "Path": export_locations["ARIMA"],
            "RoleArn": forecast_role_arn,
        },
    },
)
results["ARIMA"].export_arn = export_response["ForecastExportJobArn"]

In [None]:
export_response = forecast.create_forecast_export_job(
    ForecastExportJobName="prophet_export",
    ForecastArn=results["Prophet"].forecast_arn,
    Destination={
        "S3Config": {
            "Path": export_locations["Prophet"],
            "RoleArn": forecast_role_arn,
        },
    },
)
results["Prophet"].export_arn = export_response["ForecastExportJobArn"]

In [None]:
export_response = forecast.create_forecast_export_job(
    ForecastExportJobName="deeparp_export",
    ForecastArn=results["DeepAR+"].forecast_arn,
    Destination={
        "S3Config": {
            "Path": export_locations["DeepAR+"],
            "RoleArn": forecast_role_arn,
        },
    },
)
results["DeepAR+"].export_arn = export_response["ForecastExportJobArn"]

In [None]:
# export_response = forecast.create_forecast_export_job(
#     ForecastExportJobName="cnnqr_export",
#     ForecastArn=results["CNN-QR"].forecast_arn,
#     Destination={
#         "S3Config": {
#             "Path": export_locations["CNN-QR"],
#             "RoleArn": forecast_role_arn,
#         },
#     },
# )
# results["CNN-QR"].export_arn = export_response["ForecastExportJobArn"]

Exporting is another process that takes several minutes to complete. Once again, poll with the cell below and then move on to the next section.

In [None]:
in_progress_exports = [results[r].export_arn for r in results]
failed_exports = []

def check_status():
    """Check and update in_progress_exports"""
    just_stopped = []  # Can't edit the in_progress list directly inside the loop!
    for arn in in_progress_exports:
        desc_response = forecast.describe_forecast_export_job(ForecastExportJobArn=arn)
        status = desc_response["Status"]
        if status == "ACTIVE":
            print(f"\nExport succeeded for {arn}")
            just_stopped.append(arn)
        elif "FAILED" in status:
            print(f"\nExport failed for {arn}")
            just_stopped.append(arn)
            failed_exports.append(arn)
    for arn in just_stopped:
        in_progress_exports.remove(arn)
    return in_progress_exports

util.progress.polling_spinner(
    fn_poll_result=check_status,
    fn_is_finished=lambda l: len(l) == 0,
    fn_stringify_result=lambda l: f"{len(l)} forecast exports in progress",
    poll_secs=20,  # Poll every 20s
    timeout_secs=60*60,  # Max 1 hour
)

if len(failed_exports):
    raise RuntimeError(f"The following exports failed:\n{failed_exports}")

In [None]:
# Note that it's also possible to retrieve the output location for a completed export by ARN:
job_desc = forecast.describe_forecast_export_job(ForecastExportJobArn=results["Prophet"].export_arn)
s3uri = job_desc["Destination"]["S3Config"]["Path"]
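
If you'd rather work with the export through the Boto3 S3 APIs than through an S3 URI, you'll typically need the bucket name and key prefix separately. Here is a small sketch using only the standard library; `split_s3_uri` is a hypothetical helper and the bucket name is a placeholder.

```python
from urllib.parse import urlparse


def split_s3_uri(s3uri):
    """Split 's3://bucket/some/prefix' into ('bucket', 'some/prefix')."""
    parsed = urlparse(s3uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"Not a valid S3 URI: {s3uri}")
    return parsed.netloc, parsed.path.lstrip("/")


# Placeholder URI for illustration:
bucket_name, prefix = split_s3_uri("s3://example-bucket/exports/prophet/tts-only")
print(bucket_name)
print(prefix)
```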

## Validation

Now it's time to explore the exported results - downloading the CSV files and plotting the forecasts against our held-out validation data to check the format and forecasts are as expected.

> ⚠️ **Note:** This first cell is provided for those who have followed the AWS Console notebook 2a and are picking up this notebook here - you don't need to run it if you've already run the earlier sections of this notebook!

In [None]:
# Catch-up cell for console tutorial users joining here for visualization:

%load_ext autoreload
%autoreload 2

import json
import os
from pprint import pprint as prettyprint
import secrets
import string
import time
from types import SimpleNamespace  # (because Python dict ["key"] notation can get boring)

import boto3
from IPython.display import Markdown
import pandas as pd

import util

%store -r


# CHECK that these locations match yours!
export_locations = {
    "ARIMA": f"s3://{export_bucket_name}/exports/arima/tts-only",
    "Prophet": f"s3://{export_bucket_name}/exports/prophet/tts-only",
    "DeepAR+": f"s3://{export_bucket_name}/exports/deeparp/tts-only",
    #"CNN-QR": f"s3://{export_bucket_name}/exports/cnnqr/tts-only",
}
print("Forecast exports at:\n" + json.dumps(export_locations, indent=2))

### Downloading the Forecasts

To load and visualize the forecasts in our notebook, we first need to download the files from S3.

Although [Boto3 S3](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/s3.html) provides functions for listing and fetching files, we'll simplify the process a bit by calling the `aws s3 sync` command from the [AWS CLI](https://aws.amazon.com/cli/).

> ⚠️ **Note:** Large exports may split output into multiple files, so here we store *lists* of filenames for each export.

In [None]:
forecast_filenames = {}

In [None]:
s3uri = export_locations["ARIMA"]
local_folder = "data/exports/arima/tts-only"

!aws s3 sync $s3uri $local_folder

forecast_filenames["ARIMA"] = util.list_files_with_extension(
    local_folder,
    ext="csv",
)

In [None]:
s3uri = export_locations["Prophet"]
local_folder = "data/exports/prophet/tts-only"

!aws s3 sync $s3uri $local_folder

forecast_filenames["Prophet"] = util.list_files_with_extension(
    local_folder,
    ext="csv",
)

In [None]:
s3uri = export_locations["DeepAR+"]
local_folder = "data/exports/deeparp/tts-only"

!aws s3 sync $s3uri $local_folder

forecast_filenames["DeepAR+"] = util.list_files_with_extension(
    local_folder,
    ext="csv",
)

In [None]:
# s3uri = export_locations["CNN-QR"]
# local_folder = f"data/exports/cnn-qr/tts-only"

# !aws s3 sync $s3uri $local_folder

# forecast_filenames["CNN-QR"] = util.list_files_with_extension(
#     local_folder,
#     ext="csv",
# )

Now that the forecast CSVs are downloaded, we're ready to plot them!

### ARIMA

We'll take things slowly with our first (ARIMA) forecast to demonstrate the details, and then speed things up for the others.

First, let's take a look at the raw DataFrame as loaded by Pandas:

In [None]:
arima_predicts = util.read_multipart_csv(forecast_filenames["ARIMA"])
arima_predicts.head()

In [None]:
arima_predicts.plot()

We can tidy things up a bit by parsing the timestamps and indexing the dataframe by them:

In [None]:
# Convert the column to datetime
arima_predicts["date"] = pd.to_datetime(arima_predicts["date"])
arima_predicts.head()

In [None]:
# Remove the timezone and make date the index
arima_predicts['date'] = arima_predicts['date'].dt.tz_convert(None)
arima_predicts.set_index('date', inplace=True)
arima_predicts.plot()

In [None]:
print(arima_predicts.index.min())
print(arima_predicts.index.max())

Here we can see our prediction runs from Jan 01 to Jan 10, as expected given our forecast horizon of 240 hourly intervals (10 days). We can also see the cyclical nature of the predictions over the entire timeframe.
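
The date range follows directly from the horizon arithmetic, as this short sketch shows. The start timestamp is assumed here to be midnight on January 1st 2018, in line with the printed index range.

```python
from datetime import datetime, timedelta

forecast_start = datetime(2018, 1, 1, 0, 0)  # First forecast time-step (assumed)
horizon_steps = 240
step = timedelta(hours=1)

last_step = forecast_start + (horizon_steps - 1) * step
total_span = horizon_steps * step
print(last_step)  # Final hourly time-step covered by the forecast
print(total_span)  # Total span covered by the horizon
```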

To visualize our forecast against the validation data, it would be helpful to:

- Remove/deal with the `item_id` column (which is always constant for our single-item sample data)
- Overlay the actual value from the validation data

Let's do that now:

In [None]:
# Cut out item_id:
arima_simple = arima_predicts[['p10', 'p50', 'p90']]
arima_simple.plot()

In [None]:
# Take the time window of validation data we'd like to overlay:
validation_df = validation_time_series_df.rename(columns={"traffic_volume": "actual"}).loc["2018-01-01":"2018-01-10"]
print(validation_df.index.min())
print(validation_df.index.max())
validation_df.plot()

In [None]:
# Join the dataframes together:
arima_val_df = arima_simple.join(validation_df, how="outer")
arima_val_df.plot()

Since the full plot is hard to read, let's zoom in on a shorter window around January 5th to compare:

In [None]:
arima_val_df.loc["2018-01-05":"2018-01-06"].plot()

As we can see, the actual traffic tracks the p50 (median) prediction quite closely, and should stay within the p10-p90 band for most or all of the forecast horizon.
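
Rather than judging the band by eye alone, we could also count how often the actual values fall inside it. The sketch below uses plain Python lists and made-up numbers; `interval_coverage` is a hypothetical helper, and with the DataFrames in this notebook you would pass the corresponding columns instead.

```python
def interval_coverage(actuals, lower, upper):
    """Fraction of actual values lying within [lower, upper] at each time-step."""
    inside = sum(1 for a, lo, hi in zip(actuals, lower, upper) if lo <= a <= hi)
    return inside / len(actuals)


# Made-up values for illustration only:
actual = [500.0, 900.0, 3000.0, 4200.0]
p10 = [300.0, 700.0, 2500.0, 4500.0]
p90 = [800.0, 1200.0, 3600.0, 5600.0]

coverage = interval_coverage(actual, p10, p90)
print(coverage)
```

For a well-calibrated model we'd expect roughly 80% of actuals to fall between the p10 and p90 quantiles, so coverage far above or below that level suggests the band is too wide or too narrow.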

As a final step, we've implemented a utility function to improve the clarity a little further by expanding the plot area and plotting the p10/p90 interval as a **confidence interval** rather than independent lines.

Check you're comfortable with the plot below, and then we'll move on to comparing ARIMA with our other predictors:

In [None]:
util.plot_forecasts(arima_predicts, actuals=validation_df, ylabel="Traffic Volume")

util.plot_forecasts(
    arima_predicts.loc["2018-01-05":"2018-01-06"],
    actuals=validation_df.loc["2018-01-05":"2018-01-06"],
    ylabel="Traffic Volume"
)

### Prophet

In [None]:
print("\n".join(["Loading..."] + forecast_filenames["Prophet"]))
prophet_predicts = util.read_multipart_csv(forecast_filenames["Prophet"])

# Set up date index:
prophet_predicts["date"] = pd.to_datetime(prophet_predicts["date"]).dt.tz_convert(None)
prophet_predicts.set_index("date", inplace=True)

util.plot_forecasts(prophet_predicts, actuals=validation_df, ylabel="Traffic Volume")

util.plot_forecasts(
    prophet_predicts.loc["2018-01-05":"2018-01-06"],
    actuals=validation_df.loc["2018-01-05":"2018-01-06"],
    ylabel="Traffic Volume"
)

### DeepAR+

In [None]:
print("\n".join(["Loading..."] + forecast_filenames["DeepAR+"]))
deeparp_predicts = util.read_multipart_csv(forecast_filenames["DeepAR+"])

# Set up date index:
deeparp_predicts["date"] = pd.to_datetime(deeparp_predicts["date"]).dt.tz_convert(None)
deeparp_predicts.set_index("date", inplace=True)

util.plot_forecasts(deeparp_predicts, actuals=validation_df, ylabel="Traffic Volume")

util.plot_forecasts(
    deeparp_predicts.loc["2018-01-05":"2018-01-06"],
    actuals=validation_df.loc["2018-01-05":"2018-01-06"],
    ylabel="Traffic Volume"
)

What is particularly interesting here is that even the p90 prediction is significantly below the actual numbers for a good portion of some days.

Consider how the evaluation metrics of these algorithms relate to the observed performance characteristics in the validation plots. Clearly for probabilistic forecasts, central metrics like RMSE and the 50% wQL tell only part of the story of model accuracy - and this is why additional quantile loss metrics are presented.

Note that different algorithms generate quantiles by different methods. In particular, CNN-QR does not guarantee the ordering of quantiles (although good-quality fits should converge towards quantiles being ordered) - so there might be brief periods where e.g. the `p10` forecast quantile is higher than `p50`!
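
A quick way to spot such quantile crossings in an exported forecast is to check the ordering row by row. This sketch works on a list of dictionaries with made-up values; `find_quantile_crossings` is a hypothetical helper, and the same comparison could be applied to the `p10`, `p50` and `p90` columns of the forecast DataFrames.

```python
def find_quantile_crossings(rows):
    """Return indexes of rows where p10 <= p50 <= p90 does not hold."""
    return [i for i, row in enumerate(rows) if not (row["p10"] <= row["p50"] <= row["p90"])]


# Made-up forecast rows for illustration:
rows = [
    {"p10": 100.0, "p50": 150.0, "p90": 210.0},
    {"p10": 160.0, "p50": 155.0, "p90": 220.0},  # p10 above p50
    {"p10": 120.0, "p50": 170.0, "p90": 165.0},  # p50 above p90
]
crossings = find_quantile_crossings(rows)
print(crossings)
```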

### CNN-QR

In [None]:
# print("\n".join(["Loading..."] + forecast_filenames["CNN-QR"]))
# cnnqr_predicts = util.read_multipart_csv(forecast_filenames["CNN-QR"])

# # Set up date index:
# cnnqr_predicts["date"] = pd.to_datetime(cnnqr_predicts["date"]).dt.tz_convert(None)
# cnnqr_predicts.set_index("date", inplace=True)

# util.plot_forecasts(cnnqr_predicts, actuals=validation_df, ylabel="Traffic Volume")

# util.plot_forecasts(
#     cnnqr_predicts.loc["2018-01-05":"2018-01-06"],
#     actuals=validation_df.loc["2018-01-05":"2018-01-06"],
#     ylabel="Traffic Volume"
# )

## Recap and Next Steps

We've now explored some initial models on the target timeseries alone, and can start exploring additional **related data** as a way to improve forecast accuracy. The [next notebook, #3](3.%20Preparing%20Related%20Time-Series%20Data.ipynb) will guide you through the process of preparing a *related time-series* file ready to upload to Amazon Forecast.

In [None]:
%store results