# Satellite Communications Capacity prediction using Amazon Forecast

Using Amazon Forecast involves the following 3 steps.

![Amazon Forecast Workflow](https://github.com/aws-samples/amazon-forecast-samples/raw/main/notebooks/basic/Getting_Started/images/workflow.png)

This notebook focuses on a maritime shipping use case. It uses Amazon Forecast to build a time series predictor model for each satellite beam along the ship's path, accounting for the impact of weather conditions in a given location.

We start by importing historical bandwidth data for a set of spot-beams along the ship's route, along with related NOAA buoy data that records the air pressure (weather) at a location within each spot-beam.
Next, we train a Predictor using this data. 
Finally, we generate a forecast for 1 day at 10 minute intervals and compare it with the actual data from that day.
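
The forecast length used later in this notebook follows directly from that description: one day at 10-minute intervals. A minimal sketch of the arithmetic, where `forecast_horizon` is a hypothetical helper introduced only for illustration:

```python
from datetime import timedelta


def forecast_horizon(window: timedelta, step: timedelta) -> int:
    """Number of forecast points needed to cover `window` at a fixed `step`."""
    points, remainder = divmod(window, step)
    if remainder:
        # A partial step would leave the end of the window uncovered.
        raise ValueError("window must be a whole multiple of step")
    return points


# One day of forecasts at 10-minute granularity: 6 points per hour * 24 hours.
horizon = forecast_horizon(timedelta(days=1), timedelta(minutes=10))
print(horizon)
```

If you change the data frequency or the forecast window, the horizon passed to the predictor has to change with it.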

## Table of Contents
* [Pre-requisites](#prerequisites)
* Step 1: [Import your data](#import)
* Step 2: [Train a predictor](#predictor)
* Step 3: [Generate forecasts](#forecast)
* BONUS! [Explaining the predictor](#explaining)
* [Clean-up](#cleanup)

## Pre-requisites <a name="prerequisites"></a>
Before we get started, let's set up the notebook environment, the AWS SDK client for Amazon Forecast, and the IAM role used by Amazon Forecast to access your data.

#### Set Up Notebook Environment

In [None]:
!pip install pandas s3fs matplotlib ipywidgets
!pip install boto3 --upgrade

#### Set Up Imports

In [None]:
# Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
# SPDX-License-Identifier: MIT-0

import sys
import os
import pandas as pd
import boto3
import botocore
#import s3fs
import numpy as np
import json
from sagemaker import get_execution_role

# This notebook uses the util module from amazon-forecast-samples
# https://github.com/aws-samples/amazon-forecast-samples
# Make sure it is present in the util folder
import util

#### Create an instance of AWS SDK client for Amazon Forecast

In [None]:
region = 'us-west-2'                 # edit to use your AWS region
session = boto3.Session(region_name=region) 
forecast = session.client(service_name='forecast')
forecastquery = session.client(service_name='forecastquery')

# Checking to make sure we can communicate with Amazon Forecast
assert forecast.list_predictors()

#### Set Up IAM Role used by Amazon Forecast to access your data
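
For Amazon Forecast to use a role, the role's trust policy has to allow the Forecast service principal to assume it. The sketch below only builds such a trust policy document. It is an illustration of what a role like this needs, not a description of how `util.get_or_create_iam_role` is implemented, and `forecast_trust_policy` is a hypothetical name.

```python
import json

# Service principal that Amazon Forecast uses when it assumes a role.
FORECAST_PRINCIPAL = "forecast.amazonaws.com"


def forecast_trust_policy() -> dict:
    """Build a trust policy that lets Amazon Forecast assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": FORECAST_PRINCIPAL},
                "Action": "sts:AssumeRole",
            }
        ],
    }


# IAM expects the policy document as a JSON string.
trust_policy_json = json.dumps(forecast_trust_policy(), indent=2)
print(trust_policy_json)
```

The permissions side (S3 and Forecast access, as noted in the cell below) is attached to the role separately.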

In [None]:
# IAM role with full S3 access and full Forecast access is required
role_name = "ForecastNotebookRole-Basic"
print(f"Creating Role {role_name}...")
role_arn = util.get_or_create_iam_role( role_name = role_name )

# echo user inputs without account
print(f"Success! Created role = {role_arn.split('/')[1]}")

## Step 1: Import your data. <a name="import"></a>

In this step, we will create a **Dataset** and import the target time series (TTS) dataset of historical satellite bandwidth usage from S3 into Amazon Forecast. To train a Predictor we will need a **DatasetGroup** that groups the input **Datasets**. So, we will end this step by creating a **DatasetGroup** with the imported **Dataset**.

#### Peek at the data and upload it to S3.

The dataset has the following 3 columns:
1. **timestamp:** Timestamp at which the bandwidth usage was recorded.
2. **target_value:** SatCom bandwidth usage in MHz by spot-beam.
3. **item_id:** Spot-beam name.
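
Before uploading, it can help to check that every row matches the schema and timestamp format used later in this notebook. The sketch below is a minimal validator under stated assumptions: the CSV has no header row, the columns appear in schema order (timestamp, target_value, item_id), and the sample rows are made up for illustration. `validate_tts_rows` is a hypothetical helper, not part of the notebook's `util` module.

```python
import csv
import io
from datetime import datetime

EXPECTED_COLUMNS = 3
# Python strptime equivalent of the "yyyy-MM-dd HH:mm:ss" format used for import.
PY_TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"


def validate_tts_rows(csv_text):
    """Return a list of (line_number, problem) for rows that break the schema."""
    problems = []
    reader = csv.reader(io.StringIO(csv_text))
    for line_number, row in enumerate(reader, start=1):
        if len(row) != EXPECTED_COLUMNS:
            problems.append((line_number, "expected 3 columns"))
            continue
        timestamp, target_value, item_id = row
        try:
            datetime.strptime(timestamp, PY_TIMESTAMP_FORMAT)
        except ValueError:
            problems.append((line_number, "bad timestamp"))
        try:
            float(target_value)
        except ValueError:
            problems.append((line_number, "target_value is not a number"))
        if not item_id.strip():
            problems.append((line_number, "empty item_id"))
    return problems


# Made-up sample: the second row has a bad value, the third a bad timestamp.
sample = (
    "2023-08-01 00:00:00,41.5,SpotH12\n"
    "2023-08-01 00:10:00,not-a-number,SpotH12\n"
    "2023/08/01 00:20:00,39.0,SpotH12\n"
)
problems = validate_tts_rows(sample)
print(problems)
```

Catching these rows locally is cheaper than waiting for an import job to fail.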

In [None]:
# *** Edit the following bucket name and key to access the TTS dataset *** 
bucket_name = "forecast-satcom-capacity"
key="dataset/tts/satcom-cap_1691527103.csv"

def check_bucket_permission(bucket):
    # check if the bucket exists
    permission = False
    try:
        boto3.Session().client("s3").head_bucket(Bucket=bucket)
    except botocore.exceptions.ParamValidationError as e:
        print(
            "Hey! You either forgot to specify your S3 bucket"
            " or you gave your bucket an invalid name!"
        )
    except botocore.exceptions.ClientError as e:
        if e.response["Error"]["Code"] == "403":
            print(f"Hey! You don't have permission to access the bucket, {bucket}.")
        elif e.response["Error"]["Code"] == "404":
            print(f"Hey! Your bucket, {bucket}, doesn't exist!")
        else:
            raise
    else:
        permission = True
    return permission


if check_bucket_permission(bucket_name):
    print(f"Using TTS dataset: s3://{bucket_name}/{key}")

#### Create the Target Time Series Dataset Group and Dataset

In [None]:
DATA_VERSION = 3      # Increment the DATA_VERSION each time datasets are pushed into Amazon Forecast
DATASET_FREQUENCY = "10min"           # set the frequency to whatever your timeseries intervals are
PROJECT = 'SATCOM_BW'
TS_DATASET_GRP = "SATCOM_TS_GRP"
TS_DATASET = "SATCOM_TTS"
TS_SCHEMA = {
	"Attributes": [
		{
			"AttributeName": "timestamp",
			"AttributeType": "timestamp"
		},
		{
			"AttributeName": "target_value",
			"AttributeType": "float"
		},
		{
			"AttributeName": "item_id",
			"AttributeType": "string"
		}
	]
}

dataset_group = f"{PROJECT}_{TS_DATASET_GRP}_{DATA_VERSION}"

dataset_arns = []
create_dataset_group_response = \
    forecast.create_dataset_group(Domain="CUSTOM",
                                  DatasetGroupName=dataset_group,
                                  DatasetArns=dataset_arns)

dataset_group_arn = create_dataset_group_response['DatasetGroupArn']

forecast.describe_dataset_group(DatasetGroupArn=dataset_group_arn)


ts_dataset_name = f"{PROJECT}_{TS_DATASET}_{DATA_VERSION}"

create_dataset_response = forecast.create_dataset(Domain="CUSTOM",
                                                  DatasetType='TARGET_TIME_SERIES',
                                                  DatasetName=ts_dataset_name,
                                                  DataFrequency=DATASET_FREQUENCY,
                                                  Schema=TS_SCHEMA)

ts_dataset_arn = create_dataset_response['DatasetArn']
describe_dataset_response = forecast.describe_dataset(DatasetArn=ts_dataset_arn)

print(f"The Dataset with ARN {ts_dataset_arn} is now {describe_dataset_response['Status']}.")

dataset_arns = []
dataset_arns.append(ts_dataset_arn)
forecast.update_dataset_group(DatasetGroupArn=dataset_group_arn, DatasetArns=dataset_arns)

#### Importing the Dataset
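
Import jobs, predictors and forecasts are all created asynchronously, so this notebook repeatedly blocks on `util.wait` until a resource reaches a final status. How that helper is implemented is not shown here. The sketch below is only one plausible shape for such a polling loop; `wait_for_status` and its parameters are hypothetical, and it assumes the describe call returns a dict with a `Status` key, as the cells in this notebook do.

```python
import time


def wait_for_status(describe, poll_seconds=10, max_polls=360, sleep=time.sleep):
    """Poll `describe()` until its 'Status' is ACTIVE or a *_FAILED state."""
    for _ in range(max_polls):
        status = describe()["Status"]
        print(status)
        if status == "ACTIVE" or status.endswith("FAILED"):
            return status
        sleep(poll_seconds)
    raise TimeoutError("resource did not reach a final status in time")


# Offline demonstration with a fake describe call and a no-op sleep.
_fake_statuses = iter(["CREATE_PENDING", "CREATE_IN_PROGRESS", "ACTIVE"])
final_status = wait_for_status(
    lambda: {"Status": next(_fake_statuses)},
    sleep=lambda seconds: None,
)
```

Passing the sleep function in as a parameter keeps the loop testable without real waiting.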

In [None]:
TIMESTAMP_FORMAT = "yyyy-MM-dd HH:mm:ss"

ts_s3_data_path = "s3://"+bucket_name+"/"+key
print(f"S3 URI for your data file = {ts_s3_data_path}")

# Here we import a single CSV; you can also specify an S3 folder path containing up to 10,000 files
ts_dataset_import_job_response = \
    forecast.create_dataset_import_job(DatasetImportJobName=dataset_group,
                                       DatasetArn=ts_dataset_arn,
                                       DataSource= {
                                         "S3Config" : {
                                             "Path": ts_s3_data_path,
                                             "RoleArn": role_arn
                                         } 
                                       },
                                       TimestampFormat=TIMESTAMP_FORMAT)

ts_dataset_import_job_arn = ts_dataset_import_job_response['DatasetImportJobArn']
describe_dataset_import_job_response = forecast.describe_dataset_import_job(DatasetImportJobArn=ts_dataset_import_job_arn)

print(f"Waiting for Dataset Import Job with ARN {ts_dataset_import_job_arn} to become ACTIVE. This process could take 5-10 minutes.\n\nCurrent Status:")

status = util.wait(lambda: forecast.describe_dataset_import_job(DatasetImportJobArn=ts_dataset_import_job_arn))

describe_dataset_import_job_response = forecast.describe_dataset_import_job(DatasetImportJobArn=ts_dataset_import_job_arn)
print(f"\n\nThe Dataset Import Job with ARN {ts_dataset_import_job_arn} is now {describe_dataset_import_job_response['Status']}.")

#### Create the Related Time Series Dataset
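
The related time series schema below includes two calendar attributes, `day_of_week` and `hour_of_day`, alongside `air_pressure`. How the source CSV encodes them is not stated in this notebook. The sketch below shows one way such attributes could be derived from a timestamp, assuming Python's convention of Monday = 0; `calendar_features` is a hypothetical helper.

```python
from datetime import datetime

PY_TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"


def calendar_features(timestamp_text):
    """Derive integer calendar attributes from a timestamp string."""
    ts = datetime.strptime(timestamp_text, PY_TIMESTAMP_FORMAT)
    return {
        "day_of_week": ts.weekday(),  # assumption: Monday = 0 ... Sunday = 6
        "hour_of_day": ts.hour,       # 0 ... 23
    }


features = calendar_features("2023-08-08 14:30:00")
print(features)
```

Whatever encoding you choose, keep it identical between the training data and any future related data you supply.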

In [None]:
related_ts_dataset_name = f"{PROJECT}_RTS_{DATA_VERSION}"
RTS_SCHEMA = {
	"Attributes": [
		{
			"AttributeName": "timestamp",
			"AttributeType": "timestamp"
		},
		{
			"AttributeName": "air_pressure",
			"AttributeType": "integer"
		},
		{
			"AttributeName": "item_id",
			"AttributeType": "string"
		},
		{
			"AttributeName": "day_of_week",
			"AttributeType": "integer"
		},
		{
			"AttributeName": "hour_of_day",
			"AttributeType": "integer"
		}
	]
}

response = \
forecast.create_dataset(Domain="CUSTOM",
                        DatasetType='RELATED_TIME_SERIES',
                        DatasetName=related_ts_dataset_name,
                        DataFrequency=DATASET_FREQUENCY,
                        Schema=RTS_SCHEMA
                       )

related_ts_dataset_arn = response['DatasetArn']

forecast.describe_dataset(DatasetArn=related_ts_dataset_arn)

dataset_arns.append(related_ts_dataset_arn)
forecast.update_dataset_group(DatasetGroupArn=dataset_group_arn, DatasetArns=dataset_arns)

#### Create Related Time Series Dataset Import Job

In [None]:
related_time_series_key = "dataset/rts/maritime-weather_1691527103.csv"
related_ts_s3_data_path = "s3://"+bucket_name+"/"+related_time_series_key
print(f"S3 URI for your data file = {related_ts_s3_data_path}")

# Here we import a single CSV; you can also specify an S3 folder path containing up to 10,000 files
related_ts_dataset_import_job_response = \
    forecast.create_dataset_import_job(DatasetImportJobName=dataset_group,
                                       DatasetArn=related_ts_dataset_arn,
                                       DataSource= {
                                         "S3Config" : {
                                             "Path": related_ts_s3_data_path,
                                             "RoleArn": role_arn
                                         } 
                                       },
                                       TimestampFormat=TIMESTAMP_FORMAT)

related_ts_dataset_import_job_arn=related_ts_dataset_import_job_response['DatasetImportJobArn']

print(f"Waiting for Dataset Import Job with ARN {related_ts_dataset_import_job_arn} to become ACTIVE. This process could take 5-10 minutes.\n\nCurrent Status:")

status = util.wait(lambda: forecast.describe_dataset_import_job(DatasetImportJobArn=related_ts_dataset_import_job_arn))

describe_dataset_import_job_response = forecast.describe_dataset_import_job(DatasetImportJobArn=related_ts_dataset_import_job_arn)
print(f"\n\nThe Dataset Import Job with ARN {related_ts_dataset_import_job_arn} is now {describe_dataset_import_job_response['Status']}.")

## <a class="anchor" id="predictor"></a>Step 2: Train a predictor 

In this step, we will create a **Predictor** using the **DatasetGroup** that was created above. After creating the predictor, we will review the accuracy obtained through the backtesting process to get a quantitative understanding of the performance of the predictor.

#### Train a predictor

In [None]:
PREDICTOR_NAME = "SATCOM_PREDICTOR"
FORECAST_HORIZON = 144        # RTS forecast : 1d (10 min granularity : 6 * 24)
FORECAST_FREQUENCY = "10min"

create_auto_predictor_response = \
    forecast.create_auto_predictor(PredictorName = PREDICTOR_NAME,
                                   ForecastHorizon = FORECAST_HORIZON,
                                   ForecastFrequency = FORECAST_FREQUENCY,
                                   DataConfig = {
                                       'DatasetGroupArn': dataset_group_arn
                                    },
                                   ExplainPredictor = True)

predictor_arn = create_auto_predictor_response['PredictorArn']
print(f"Waiting for Predictor with ARN {predictor_arn} to become ACTIVE. Depending on data size and predictor settings, it can take several hours to be ACTIVE.\n\nCurrent Status:")

status = util.wait(lambda: forecast.describe_auto_predictor(PredictorArn=predictor_arn))

describe_auto_predictor_response = forecast.describe_auto_predictor(PredictorArn=predictor_arn)
print(f"\n\nThe Predictor with ARN {predictor_arn} is now {describe_auto_predictor_response['Status']}.")

#### Review accuracy metrics

* **Weighted Quantile Loss (wQL)** measures the accuracy of a model at a specified quantile. It is particularly useful when underpredicting and overpredicting have different costs. The P90 wQL is especially relevant here, since satellite operators typically want to overprovision slightly so that consumers have enough bandwidth the majority of the time.

* **Root Mean Square Error (RMSE)** uses the squared value of the residuals, which amplifies the impact of outliers. In use cases where only a few large mispredictions can be very costly, the RMSE is the more relevant metric.

* **Weighted Absolute Percentage Error (WAPE)** is more robust to outliers than Root Mean Square Error (RMSE) because it uses the absolute error instead of the squared error.

* **Mean Absolute Percentage Error (MAPE)** is useful for cases where values differ significantly between time points and outliers have a significant impact.

* **Mean Absolute Scaled Error (MASE)** is ideal for datasets that are cyclical in nature or have seasonal properties.
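
To make these definitions concrete, the sketch below computes wQL, RMSE, WAPE and MAPE on a tiny made-up series, following the commonly published formulas. It is an illustration, not the service's own implementation, and the function names are hypothetical. MASE is omitted because it also needs the training history and a seasonality period.

```python
import math


def rmse(actual, predicted):
    """Root mean square error: squaring amplifies large misses."""
    squared = [(y - p) ** 2 for y, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def wape(actual, predicted):
    """Sum of absolute errors divided by the sum of absolute actuals."""
    abs_errors = sum(abs(y - p) for y, p in zip(actual, predicted))
    return abs_errors / sum(abs(y) for y in actual)


def mape(actual, predicted):
    """Mean of per-point absolute percentage errors (actuals must be non-zero)."""
    ratios = [abs(y - p) / abs(y) for y, p in zip(actual, predicted)]
    return sum(ratios) / len(ratios)


def weighted_quantile_loss(actual, quantile_forecast, tau):
    """wQL at quantile tau: under-forecasts cost tau, over-forecasts cost 1 - tau."""
    loss = sum(
        tau * max(y - q, 0) + (1 - tau) * max(q - y, 0)
        for y, q in zip(actual, quantile_forecast)
    )
    return 2 * loss / sum(abs(y) for y in actual)


actual = [10.0, 20.0, 30.0]       # made-up bandwidth values in MHz
over = [12.0, 22.0, 32.0]         # every forecast 2 MHz too high
under = [8.0, 18.0, 28.0]         # every forecast 2 MHz too low

# RMSE and WAPE cannot tell the two forecasts apart ...
print(rmse(actual, over), rmse(actual, under))
print(wape(actual, over), wape(actual, under))
# ... but at P90, under-forecasting is penalized far more heavily.
wql_over = weighted_quantile_loss(actual, over, 0.9)
wql_under = weighted_quantile_loss(actual, under, 0.9)
print(wql_over, wql_under)
```

The asymmetry in the last two numbers is why the P90 wQL suits a use case where running short of bandwidth costs more than overprovisioning.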

In [None]:
get_accuracy_metrics_response = forecast.get_accuracy_metrics(PredictorArn=predictor_arn)
wql = get_accuracy_metrics_response['PredictorEvaluationResults'][0]['TestWindows'][0]['Metrics']['WeightedQuantileLosses']
accuracy_scores = get_accuracy_metrics_response['PredictorEvaluationResults'][0]['TestWindows'][0]['Metrics']['ErrorMetrics'][0]

print(f"Weighted Quantile Loss (wQL): {json.dumps(wql, indent=2)}\n")

print(f"Root Mean Square Error (RMSE): {accuracy_scores['RMSE']}\n")

print(f"Weighted Absolute Percentage Error (WAPE): {accuracy_scores['WAPE']}\n")

print(f"Mean Absolute Percentage Error (MAPE): {accuracy_scores['MAPE']}\n")

print(f"Mean Absolute Scaled Error (MASE): {accuracy_scores['MASE']}\n")

## Step 3: Generate forecasts <a name="forecast"></a>
Finally, we will generate the forecasts using the above predictor and demonstrate actual performance of Amazon Forecast on this dataset.
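
One simple way to compare a P90 forecast with the actual data for the same day is to measure its coverage: the fraction of time points where actual usage stayed at or below the forecast. For a well-calibrated P90 forecast this should be close to 0.9. The sketch below uses made-up values, and `quantile_coverage` is a hypothetical helper.

```python
def quantile_coverage(actual, quantile_forecast):
    """Fraction of points where the actual value is at or below the forecast."""
    if len(actual) != len(quantile_forecast):
        raise ValueError("series must have the same length")
    hits = sum(1 for y, q in zip(actual, quantile_forecast) if y <= q)
    return hits / len(actual)


# Made-up bandwidth values in MHz for five consecutive 10-minute intervals.
actual_mhz = [40.0, 42.5, 47.0, 51.0, 44.0]
p90_mhz = [45.0, 45.0, 46.0, 52.0, 48.0]

coverage = quantile_coverage(actual_mhz, p90_mhz)
print(f"P90 coverage: {coverage:.0%}")
```

With a full day of data (144 points) the estimate becomes much more meaningful than in this five-point example.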

#### Generate forecasts

In [None]:
FORECAST_NAME = "SATCOM_FORECAST"

# By default we get P10, P50 and P90 quantile forecasts; we are most interested in the P90 forecast
create_forecast_response = \
    forecast.create_forecast(ForecastName=FORECAST_NAME,
                             PredictorArn=predictor_arn)

forecast_arn = create_forecast_response['ForecastArn']
print(f"Waiting for Forecast with ARN {forecast_arn} to become ACTIVE. Depending on data size and predictor settings, it can take several hours to be ACTIVE.\n\nCurrent Status:")

status = util.wait(lambda: forecast.describe_forecast(ForecastArn=forecast_arn))

describe_forecast_response = forecast.describe_forecast(ForecastArn=forecast_arn)
print(f"\n\nThe Forecast with ARN {forecast_arn} is now {describe_forecast_response['Status']}.")

#### Export forecast to S3

In [None]:
FORECAST_EXPORT_JOB_NAME = FORECAST_NAME + "_export_job"
FORECAST_EXPORT_JOB_DESTINATION = f"s3://{bucket_name}/forecast-export-results/"

forecast_export_response = forecast.create_forecast_export_job(
    ForecastExportJobName=FORECAST_EXPORT_JOB_NAME,
    ForecastArn=forecast_arn,
    Destination={
        "S3Config": {
            "Path": FORECAST_EXPORT_JOB_DESTINATION,
            "RoleArn": role_arn
        }
    }
)

forecast_export_arn = forecast_export_response['ForecastExportJobArn']

status = util.wait(lambda: forecast.describe_forecast_export_job(ForecastExportJobArn=forecast_export_arn))

describe_forecast_export_response = forecast.describe_forecast_export_job(ForecastExportJobArn=forecast_export_arn)
print(f"\n\nThe forecast_export with ARN {forecast_export_arn} is now {describe_forecast_export_response['Status']}.")

#### Query forecast for a specific item_id

In [None]:
ITEM_ID = "SpotH12"

forecast_response = forecastquery.query_forecast(
    ForecastArn=forecast_arn,
    Filters={"item_id": ITEM_ID}
)

# The P90 forecast is the value that the true value is expected to fall below 90% of the time
forecasts_p90_df = pd.DataFrame.from_dict(forecast_response['Forecast']['Predictions']['p90'])

In [None]:
forecasts_p90_df.head(5)

In [None]:
import matplotlib.pyplot as plt
from matplotlib.ticker import MaxNLocator


p90_title = ITEM_ID + '-plot-P90-forecast'
x = forecasts_p90_df['Timestamp']
y = forecasts_p90_df['Value']

fig, ax = plt.subplots(1, 1, figsize=(15, 5))
ax.plot(x,y)
ax.set_title(p90_title)
ax.set_xlabel('timestamp')
ax.set_ylabel('MHz')
ax.tick_params(axis='x', rotation=90)

locator=MaxNLocator(prune='both', nbins=24)
ax.xaxis.set_major_locator(locator)

#plt.show()
fig

## BONUS! Explaining the predictor <a class="anchor" id="explaining"></a>
In Step 1, we added an additional RTS dataset - weather data - before creating the predictor. Let us now see how impactful the additional dataset feature was. You can do the same for additional datasets that you bring in.

In [None]:
account_id = session.client("sts").get_caller_identity()["Account"]
explainability_arn = "arn:aws:forecast:" + region + ":" + account_id + ":explainability/" + PREDICTOR_NAME
print(explainability_arn)

In [None]:
EXPLAINABILITY_EXPORT_NAME = "SATCOM_PREDICTOR_EXPLANATION_EXPORT"
EXPLAINABILITY_EXPORT_DESTINATION = f"s3://{bucket_name}/explanation/{EXPLAINABILITY_EXPORT_NAME}"

explainability_export_response = forecast.create_explainability_export(ExplainabilityExportName=EXPLAINABILITY_EXPORT_NAME, 
                                                                       ExplainabilityArn=explainability_arn, 
                                                                       Destination={
                                                                           "S3Config": {
                                                                               "Path": EXPLAINABILITY_EXPORT_DESTINATION,
                                                                               "RoleArn": role_arn}
                                                                            }
                                                                      )

explainability_export_arn = explainability_export_response['ExplainabilityExportArn']

status = util.wait(lambda: forecast.describe_explainability_export(ExplainabilityExportArn=explainability_export_arn))

describe_explainability_export_response = forecast.describe_explainability_export(ExplainabilityExportArn=explainability_export_arn)
print(f"\n\nThe explainability_export with ARN {explainability_export_arn} is now {describe_explainability_export_response['Status']}.")

* **Impact scores** measure the relative impact attributes have on forecast values. The attribute with the largest score has the most impact on the model.
* **Impact scores** also provide information on whether an attribute increases or decreases the forecasted value. A negative impact score indicates that the attribute tends to decrease the value of the forecast.
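
Once the export has landed in S3, the scores can be ranked to see which attribute matters most. The sketch below works on a plain dictionary of made-up scores, not on the actual export file, whose layout is not shown here. It assumes that "largest" should be read as largest magnitude, since scores can be negative. `rank_by_impact` is a hypothetical helper.

```python
def rank_by_impact(scores):
    """Sort attributes by absolute impact score and label the direction."""
    ranked = sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)
    result = []
    for name, score in ranked:
        if score > 0:
            direction = "increases"
        elif score < 0:
            direction = "decreases"
        else:
            direction = "neutral"
        result.append((name, score, direction))
    return result


# Made-up impact scores for the related time series attributes.
example_scores = {"air_pressure": -0.62, "hour_of_day": 0.85, "day_of_week": 0.10}

ranking = rank_by_impact(example_scores)
for name, score, direction in ranking:
    print(f"{name}: {score:+.2f} ({direction} the forecast)")
```

An attribute near zero contributes little, which is a hint that it may not be worth collecting for future datasets.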

## Clean-up <a class="anchor" id="cleanup"></a>
Uncomment the code below to delete all resources that were created in this notebook.
The delete calls return immediately, but the actual resource deletion may take up to 15 minutes.
See https://docs.aws.amazon.com/forecast/latest/dg/delete-resource.html to review the resource tree hierarchy.
NOTE: none of the S3 artifacts will be deleted!

In [None]:
# forecast.delete_resource_tree(ResourceArn = dataset_group_arn)
# forecast.delete_resource_tree(ResourceArn = ts_dataset_arn)
# forecast.delete_resource_tree(ResourceArn = related_ts_dataset_arn)