# AWS Machine Learning Nanodegree Capstone Project
# Forecasting with Amazon Forecast

## Setup


### References
Note: these steps are adapted from the Amazon Forecast reference walkthrough and utility module below:
https://github.com/aws-samples/amazon-forecast-samples/blob/main/notebooks/basic/Getting_Started/Amazon_Forecast_Quick_Start_Guide.ipynb
https://github.com/aws-samples/amazon-forecast-samples/blob/main/notebooks/common/util/fcst_utils.py

### Setup Notebook Environment

In [6]:
%%capture --no-stderr setup

!pip install pandas s3fs matplotlib ipywidgets
!pip install boto3 --upgrade

%reload_ext autoreload

### Setup Imports

In [2]:
import sys
import os
import glob 
#sys.path.insert( 0, os.path.abspath("../../common") )

import json
import util  # later cells call util.wait(...), so the package name itself must be bound
from util import * #.fcst_utils import *
import boto3
import s3fs
import pandas as pd

In [115]:
import matplotlib.pyplot as plt
plt.close("all")
import numpy as np

### Setup IAM Role used by Amazon Forecast to access your data

In [3]:
#role was manually set up in the AWS console, with AmazonS3FullAccess attached
role_arn = 'arn:aws:iam::054619787751:role/my-forecast-role'

### Create an instance of AWS SDK client for Amazon Forecast

In [33]:
region = 'us-east-1'
session = boto3.Session(region_name=region) 
forecast = session.client(service_name='forecast')
forecastquery = session.client(service_name='forecastquery')

# Checking to make sure we can communicate with Amazon Forecast
assert forecast.list_predictors()

## Step 1: Import your data. <a class="anchor" id="import"></a>

In this step, we will create a **Dataset** and **Import** the Taiwan stock dataset from S3 to Amazon Forecast. To train a Predictor we will need a **DatasetGroup** that groups the input **Datasets**. So, we will end this step by creating a **DatasetGroup** with the imported **Dataset**.

In [5]:
s3 = boto3.Session().resource('s3')
bucket_name = "forecast-exp-1111"

In [6]:
keys=[]
files = glob.glob(os.path.join(os.getcwd(), "forecast_import", "*"))
for file in files:
    keys.append(r"forecast_import/"+os.path.split(file)[1])

In [7]:
keys

['forecast_import/target_wl.parquet']

In [16]:
for key in keys:
    s3.Bucket(bucket_name).Object(key).upload_file(key)
    ts_s3_path = f"s3://{bucket_name}/{key}"

print(f"\nDone, the dataset is uploaded to S3 at {ts_s3_path}.")


Done, the dataset is uploaded to S3 at s3://forecast-exp-1111/forecast_import/target_wl.parquet.
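The key-building loop above can be wrapped in a small helper. This is a minimal sketch, assuming the local folder name doubles as the S3 key prefix, as it does in this notebook. `build_s3_keys` is a hypothetical name and is not part of boto3 or the sample utilities.

```python
from pathlib import Path


def build_s3_keys(local_dir, prefix=None):
    """Return (local_path, s3_key) pairs for every file directly inside local_dir."""
    local_dir = Path(local_dir)
    # Assumption: the folder name is reused as the S3 key prefix.
    prefix = prefix if prefix is not None else local_dir.name
    pairs = []
    for path in sorted(local_dir.iterdir()):
        if path.is_file():
            # S3 keys use forward slashes regardless of the local OS.
            pairs.append((str(path), f"{prefix}/{path.name}"))
    return pairs
```

Each pair could then be passed to `s3.Bucket(bucket_name).Object(key).upload_file(local_path)`, which keeps the local path and the S3 key separate instead of reusing one string for both.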


#### Creating the Dataset

In [None]:
#ONLY NEED TO RUN THIS ONCE. SKIP TO NEXT CELL IF CREATED PREVIOUSLY.
DATASET_FREQUENCY = "D" # H for hourly.
TS_DATASET_NAME = "WATCHLIST_TS"
TS_SCHEMA = {
   "Attributes":[
      {
         "AttributeName":"item_id",
         "AttributeType":"string"
      },
      {
         "AttributeName":"timestamp",
         "AttributeType":"timestamp"
      },
      
      {
         "AttributeName":"target_value",
         "AttributeType":"integer"
      }
   ]
}

create_dataset_response = forecast.create_dataset(Domain="CUSTOM",
                                                  DatasetType='TARGET_TIME_SERIES',
                                                  DatasetName=TS_DATASET_NAME,
                                                  DataFrequency=DATASET_FREQUENCY,
                                                  Schema=TS_SCHEMA)

ts_dataset_arn = create_dataset_response['DatasetArn']
describe_dataset_response = forecast.describe_dataset(DatasetArn=ts_dataset_arn)

print(f"The Dataset with ARN {ts_dataset_arn} is now {describe_dataset_response['Status']}")

In [6]:
#ONLY RUN IF YOU HAVE ALREADY RUN THE ABOVE CELL
#Obtained the arn from the error message raised when re-running the above cell after the dataset already existed
ts_dataset_arn = 'arn:aws:forecast:us-east-1:054619787751:dataset/WATCHLIST_TS'
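Instead of copying the ARN out of an error message, the ARN can be looked up by dataset name. The sketch below assumes the client's `list_datasets` call returns pages with a `Datasets` list (each entry carrying `DatasetName` and `DatasetArn`) and an optional `NextToken`. `find_dataset_arn` is a hypothetical helper name.

```python
def find_dataset_arn(forecast_client, dataset_name):
    """Return the ARN of the dataset called dataset_name, or None if it is absent."""
    kwargs = {}
    while True:
        page = forecast_client.list_datasets(**kwargs)
        for summary in page.get("Datasets", []):
            if summary.get("DatasetName") == dataset_name:
                return summary["DatasetArn"]
        token = page.get("NextToken")
        if not token:
            # No further pages: the dataset does not exist in this region.
            return None
        kwargs = {"NextToken": token}
```

With this in place, `ts_dataset_arn = find_dataset_arn(forecast, "WATCHLIST_TS")` would work whether or not the creation cell has been run in the current session.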

#### Importing the Dataset

In [None]:
#ONLY NEED TO RUN THIS ONCE. SKIP TO NEXT CELL IF CREATED PREVIOUSLY.
TIMESTAMP_FORMAT = "yyyy-MM-dd hh:mm:ss"
TS_IMPORT_JOB_NAME = "PREFUNDING_TTS_IMPORT"
TIMEZONE = "EST"

ts_dataset_import_job_response = \
    forecast.create_dataset_import_job(DatasetImportJobName=TS_IMPORT_JOB_NAME,
                                       DatasetArn=ts_dataset_arn,
                                       DataSource= {
                                         "S3Config" : {
                                             "Path": ts_s3_path,
                                             "RoleArn": role_arn
                                         } 
                                       },
                                       Format="PARQUET",
                                       TimestampFormat=TIMESTAMP_FORMAT,
                                       TimeZone = TIMEZONE)

ts_dataset_import_job_arn = ts_dataset_import_job_response['DatasetImportJobArn']
describe_dataset_import_job_response = forecast.describe_dataset_import_job(DatasetImportJobArn=ts_dataset_import_job_arn)

print(f"Waiting for Dataset Import Job with ARN {ts_dataset_import_job_arn} to become ACTIVE. This process could take 5-10 minutes.\n\nCurrent Status:")

status = util.wait(lambda: forecast.describe_dataset_import_job(DatasetImportJobArn=ts_dataset_import_job_arn))

describe_dataset_import_job_response = forecast.describe_dataset_import_job(DatasetImportJobArn=ts_dataset_import_job_arn)
print(f"\n\nThe Dataset Import Job with ARN {ts_dataset_import_job_arn} is now {describe_dataset_import_job_response['Status']}.")

Waiting for Dataset Import Job with ARN arn:aws:forecast:us-east-1:054619787751:dataset-import-job/WATCHLIST_TS/PREFUNDING_TTS_IMPORT to become ACTIVE. This process could take 5-10 minutes.

Current Status:
CREATE_PENDING .
CREATE_IN_PROGRESS ............................................................

In [8]:
#ONLY RUN IF YOU HAVE ALREADY RUN THE ABOVE CELL
# target dataset (watchlist) imported at:
ts_dataset_import_job_arn = 'arn:aws:forecast:us-east-1:054619787751:dataset-import-job/WATCHLIST_TS/PREFUNDING_TTS_IMPORT'

#### Creating a DatasetGroup

In [99]:
#ONLY NEED TO RUN THIS ONCE. SKIP TO NEXT CELL IF CREATED PREVIOUSLY.
DATASET_GROUP_NAME = "TAIWAN_PREFUNDING"
DATASET_ARNS = [ts_dataset_arn]

create_dataset_group_response = \
    forecast.create_dataset_group(Domain="CUSTOM",
                                  DatasetGroupName=DATASET_GROUP_NAME,
                                  DatasetArns=DATASET_ARNS)

dataset_group_arn = create_dataset_group_response['DatasetGroupArn']
describe_dataset_group_response = forecast.describe_dataset_group(DatasetGroupArn=dataset_group_arn)

print(f"The DatasetGroup with ARN {dataset_group_arn} is now {describe_dataset_group_response['Status']}.")

The DatasetGroup with ARN arn:aws:forecast:us-east-1:054619787751:dataset-group/TAIWAN_PREFUNDING is now ACTIVE.


In [11]:
#ONLY RUN IF YOU HAVE ALREADY RUN THE ABOVE CELL
#Obtained arn from the original cell execution above
dataset_group_arn = 'arn:aws:forecast:us-east-1:054619787751:dataset-group/TAIWAN_PREFUNDING'

## Step 2: Train a predictor - Experiment 01 <a class="anchor" id="predictor"></a>

In this step, we will create a **Predictor** using the **DatasetGroup** that was created above. After creating the predictor, we will review the accuracy obtained through the backtesting process to get a quantitative understanding of the performance of the predictor.

This is the baseline experiment; later experiments expand on it with related datasets.

In [None]:
#ONLY NEED TO RUN THIS ONCE. SKIP TO NEXT CELL IF CREATED PREVIOUSLY.
PREDICTOR_NAME = "PREFUNDING_PREDICTOR_01"
FORECAST_HORIZON = 1
FORECAST_FREQUENCY = "D"
#HOLIDAY_DATASET = [{
#        'Name': 'holiday',
#        'Configuration': {
#        'CountryCode': ['TW']
#    }
#}]

create_auto_predictor_response = \
    forecast.create_auto_predictor(PredictorName = PREDICTOR_NAME,
                                   ForecastHorizon = FORECAST_HORIZON,
                                   ForecastFrequency = FORECAST_FREQUENCY,
                                   DataConfig = {
                                       'DatasetGroupArn': dataset_group_arn
                                       #,'AdditionalDatasets': HOLIDAY_DATASET
                                        },
                                   ExplainPredictor = True)

predictor_arn = create_auto_predictor_response['PredictorArn']
print(f"Waiting for Predictor with ARN {predictor_arn} to become ACTIVE. Depending on data size and predictor setting, it can take several hours to be ACTIVE.\n\nCurrent Status:")

status = util.wait(lambda: forecast.describe_auto_predictor(PredictorArn=predictor_arn))

describe_auto_predictor_response = forecast.describe_auto_predictor(PredictorArn=predictor_arn)
print(f"\n\nThe Predictor with ARN {predictor_arn} is now {describe_auto_predictor_response['Status']}.")

Waiting for Predictor with ARN arn:aws:forecast:us-east-1:054619787751:predictor/PREFUNDING_PREDICTOR_01_01GKW0Q4KG85PR41MRZXVXR7F5 to become ACTIVE. Depending on data size and predictor setting, it can take several hours to be ACTIVE.

Current Status:
CREATE_PENDING ..
CREATE_IN_PROGRESS .................

In [5]:
#ONLY RUN IF YOU HAVE ALREADY RUN THE ABOVE CELL
#Obtained arn from the original cell execution above
predictor_arn = 'arn:aws:forecast:us-east-1:054619787751:predictor/PREFUNDING_PREDICTOR_01_01GKW0Q4KG85PR41MRZXVXR7F5'
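The predictor request above keeps the holiday dataset commented out. One way to toggle it without editing the call is to build the keyword arguments in a function. This is a sketch only: `build_auto_predictor_kwargs` is a hypothetical helper, and the holiday entry simply mirrors the commented-out `HOLIDAY_DATASET` structure from the cell above.

```python
def build_auto_predictor_kwargs(name, horizon, frequency, dataset_group_arn,
                                holiday_country=None):
    """Assemble keyword arguments for create_auto_predictor."""
    data_config = {"DatasetGroupArn": dataset_group_arn}
    if holiday_country:
        # Same shape as the commented-out HOLIDAY_DATASET in the notebook.
        data_config["AdditionalDatasets"] = [{
            "Name": "holiday",
            "Configuration": {"CountryCode": [holiday_country]},
        }]
    return {
        "PredictorName": name,
        "ForecastHorizon": horizon,
        "ForecastFrequency": frequency,
        "DataConfig": data_config,
        "ExplainPredictor": True,
    }
```

The baseline would then be `forecast.create_auto_predictor(**build_auto_predictor_kwargs("PREFUNDING_PREDICTOR_01", 1, "D", dataset_group_arn))`, and passing `holiday_country="TW"` would add the holiday calendar.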

#### Review accuracy metrics

* **Weighted Quantile Loss (wQL)** metric measures the accuracy of a model at a specified quantile. It is particularly useful when there are different costs for underpredicting and overpredicting.

* **Root Mean Square Error (RMSE)** uses the squared value of the residuals, which amplifies the impact of outliers. In use cases where only a few large mispredictions can be very costly, the RMSE is the more relevant metric.

* **Weighted Absolute Percentage Error (WAPE)** is more robust to outliers than Root Mean Square Error (RMSE) because it uses the absolute error instead of the squared error.

* **Mean Absolute Percentage Error (MAPE)** is useful for cases where values differ significantly between time points and outliers have a significant impact.

* **Mean Absolute Scaled Error (MASE)** is ideal for datasets that are cyclical in nature or have seasonal properties.
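To make three of these definitions concrete, here is a small pure-Python sketch of RMSE, WAPE, and wQL using their standard formulas as I understand them from the Amazon Forecast documentation. The function names are my own, and the service computes these values per backtest window rather than on a flat list as done here.

```python
import math


def rmse(actuals, preds):
    """Root mean square error: squaring amplifies large misses."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actuals, preds)) / len(actuals))


def wape(actuals, preds):
    """Sum of absolute errors divided by the sum of absolute actuals."""
    return sum(abs(a - p) for a, p in zip(actuals, preds)) / sum(abs(a) for a in actuals)


def weighted_quantile_loss(actuals, quantile_preds, tau):
    """wQL at quantile tau: under-prediction costs tau, over-prediction costs 1 - tau."""
    loss = sum(
        tau * max(a - q, 0) + (1 - tau) * max(q - a, 0)
        for a, q in zip(actuals, quantile_preds)
    )
    return 2 * loss / sum(abs(a) for a in actuals)
```

When the same forecast values are scored, wQL at `tau=0.5` reduces to WAPE, because both kinds of error are then weighted equally.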

In [7]:
get_accuracy_metrics_response = forecast.get_accuracy_metrics(PredictorArn=predictor_arn)
wql = get_accuracy_metrics_response['PredictorEvaluationResults'][0]['TestWindows'][0]['Metrics']['WeightedQuantileLosses']
accuracy_scores = get_accuracy_metrics_response['PredictorEvaluationResults'][0]['TestWindows'][0]['Metrics']['ErrorMetrics'][0]

print(f"Weighted Quantile Loss (wQL): {json.dumps(wql, indent=2)}")

print(f"Root Mean Square Error (RMSE): {accuracy_scores['RMSE']}")

print(f"Weighted Absolute Percentage Error (WAPE): {accuracy_scores['WAPE']}")

print(f"Mean Absolute Percentage Error (MAPE): {accuracy_scores['MAPE']}")

print(f"Mean Absolute Scaled Error (MASE): {accuracy_scores['MASE']}")

Weighted Quantile Loss (wQL): [
  {
    "Quantile": 0.9,
    "LossValue": 0.13846153846153844
  },
  {
    "Quantile": 0.5,
    "LossValue": 0.38461538461538464
  },
  {
    "Quantile": 0.1,
    "LossValue": 0.10769230769230768
  }
]
Root Mean Square Error (RMSE): 0.38682272190477407
Weighted Absolute Percentage Error (WAPE): 0.4653846153846153
Mean Absolute Percentage Error (MAPE): 0.051818181818181826
Mean Absolute Scaled Error (MASE): 1e-130
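Since the same nested lookups are repeated for every experiment, they can be collected into one flat dictionary. A minimal sketch, assuming the response has the shape indexed in the cell above; `summarize_metrics` is a hypothetical helper name.

```python
def summarize_metrics(response, result_index=0, window_index=0):
    """Flatten one test window of a get_accuracy_metrics response into a dict."""
    metrics = (response["PredictorEvaluationResults"][result_index]
               ["TestWindows"][window_index]["Metrics"])
    # One entry per quantile, e.g. "wQL[0.9]".
    summary = {f"wQL[{entry['Quantile']}]": entry["LossValue"]
               for entry in metrics["WeightedQuantileLosses"]}
    errors = metrics["ErrorMetrics"][0]
    for name in ("RMSE", "WAPE", "MAPE", "MASE"):
        summary[name] = errors[name]
    return summary
```

Calling `summarize_metrics(get_accuracy_metrics_response)` for each predictor yields dictionaries that are easy to compare side by side, for example with `pd.DataFrame([...])`.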


#### Reviewing forecast
I generated a forecast using the AWS console, but noticed that it did not cover the full universe of stocks. See below:

In [15]:
!aws s3 sync s3://forecast-exp-1111/my_forecast_exp01/ ./exp_01/forecast_01

download: s3://forecast-exp-1111/my_forecast_exp01/_SUCCESS to exp_01/forecast_01/_SUCCESS
download: s3://forecast-exp-1111/my_forecast_exp01/my_forecast_exp1_01_export_2022-12-10T22-22-10Z_part1.csv to exp_01/forecast_01/my_forecast_exp1_01_export_2022-12-10T22-22-10Z_part1.csv
download: s3://forecast-exp-1111/my_forecast_exp01/my_forecast_exp1_01_export_2022-12-10T22-22-10Z_part0.csv to exp_01/forecast_01/my_forecast_exp1_01_export_2022-12-10T22-22-10Z_part0.csv
download: s3://forecast-exp-1111/my_forecast_exp01/_CHECK to exp_01/forecast_01/_CHECK
download: s3://forecast-exp-1111/my_forecast_exp01/my_forecast_exp1_01_export_2022-12-10T22-22-10Z_part3.csv to exp_01/forecast_01/my_forecast_exp1_01_export_2022-12-10T22-22-10Z_part3.csv
download: s3://forecast-exp-1111/my_forecast_exp01/my_forecast_exp1_01_export_2022-12-10T22-22-10Z_part2.csv to exp_01/forecast_01/my_forecast_exp1_01_export_2022-12-10T22-22-10Z_part2.csv


In [23]:
dfs = [pd.read_csv(f) for f in glob.glob(os.path.join(os.getcwd(),"exp_01","forecast_01","*.csv"))]
forecasts_01 = pd.concat(dfs)

In [27]:
forecasts_01.sort_values('item_id')

Unnamed: 0,item_id,date,p50,p90,p95
3,1213,2022-11-01T00:00:00Z,1.0,1.0,1.0
2,1418,2022-11-01T00:00:00Z,0.0,1.0,1.0
1,1472,2022-11-01T00:00:00Z,1.0,1.0,1.0
5,1512,2022-11-01T00:00:00Z,1.0,1.0,1.0
0,1538,2022-11-01T00:00:00Z,1.0,1.0,1.0
1,2025,2022-11-01T00:00:00Z,1.0,1.0,1.0
2,2321,2022-11-01T00:00:00Z,1.0,1.0,1.0
0,2364,2022-11-01T00:00:00Z,0.0,1.0,1.0
2,2443,2022-11-01T00:00:00Z,1.0,1.0,1.0
5,2841,2022-11-01T00:00:00Z,0.0,1.0,1.0


In [68]:
#download backtest exports
#!aws s3 sync s3://forecast-exp-1111/backtest_exports/ ./backtest_exports

In [69]:
#accuracy_metric_values_files = glob.glob(
#                                    os.path.join(os.getcwd(),
#                                                 'backtest_exports', 
#                                                 'accuracy-metrics-values',
#                                                 '*.csv'))

In [70]:
#accuracy_metric_values_files

In [71]:
#accuracy_metric_values = [pd.read_csv(f,sep=',') for f in accuracy_metric_values_files]

In [72]:
#acc_01 = pd.concat(accuracy_metric_values, axis=0).sort_values(by=['item_id','backtestwindow_end_time'])

In [73]:
#acc_01[acc_01['backtest_window']=='Summary']

In [4]:
msci_tw = ['2330',
'2317',
'2454',
'2308',
'2303',
'2881',
'2412',
'2891',
'1301',
'2882',
'1303',
'2886',
'2002',
'3711',
'1216',
'2884',
'5871',
'2892',
'5880',
'1326',
'2885',
'3008',
'1101',
'2880',
'2883',
'2382',
'2357',
'3037',
'2207',
'5876',
'2890',
'3034',
'2327',
'3045',
'2887',
'2603',
'2912',
'6415',
'8069',
'2395',
'2379',
'1590',
'2301',
'4938',
'2345',
'1605',
'2888',
'6409',
'2474',
'2609',
'3481',
'4904',
'1402',
'6446',
'2409',
'6488',
'6770',
'3529',
'1102',
'6505',
'1476',
'2324',
'9910',
'2801',
'2377',
'2347',
'6669',
'2834',
'9945',
'3702',
'4958',
'5347',
'9904',
'2618',
'9921',
'2353',
'2408',
'8046',
'4966',
'2105',
'2344',
'2356',
'2610',
'2633',
'3105',
'2615',
'8464',
'8454',]
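The coverage gap noted earlier can be quantified by comparing the forecasted `item_id` values against a reference list. The sketch below assumes `msci_tw` is the reference universe to compare against, which this notebook does not state explicitly. `coverage_gap` is a hypothetical helper name.

```python
def coverage_gap(forecast_ids, universe):
    """Return (share of the universe that was forecast, sorted list of missing ids)."""
    # Normalise to strings so integer and string tickers compare equal.
    forecasted = {str(i) for i in forecast_ids}
    wanted = {str(i) for i in universe}
    missing = sorted(wanted - forecasted)
    covered = len(wanted) - len(missing)
    return covered / len(wanted), missing
```

For example, `share, missing = coverage_gap(forecasts_01['item_id'], msci_tw)` would report which tickers in the list received no forecast.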

## Experiment 01.2 <a class="anchor" id="predictor"></a>
### Training a new predictor on updated watchlist data (with nulls handled correctly) and extending the forecast horizon to 5 days.

#### Upload updated data to S3

In [21]:
!aws s3 sync ./forecast_import/ s3://forecast-exp-1111/forecast_import/ #--dryrun

upload: forecast_import/target_wl2.parquet to s3://forecast-exp-1111/forecast_import/target_wl2.parquet


#### Creating new dataset

In [13]:
#ONLY NEED TO RUN THIS ONCE. SKIP TO NEXT CELL IF CREATED PREVIOUSLY.
DATASET_FREQUENCY = "D" # H for hourly.
TS_DATASET_NAME = "WATCHLIST_TS_2"
TS_SCHEMA = {
   "Attributes":[
      {
         "AttributeName":"item_id",
         "AttributeType":"string"
      },
      {
         "AttributeName":"timestamp",
         "AttributeType":"timestamp"
      },
      
      {
         "AttributeName":"target_value",
         "AttributeType":"integer"
      }
   ]
}

create_dataset_response = forecast.create_dataset(Domain="CUSTOM",
                                                  DatasetType='TARGET_TIME_SERIES',
                                                  DatasetName=TS_DATASET_NAME,
                                                  DataFrequency=DATASET_FREQUENCY,
                                                  Schema=TS_SCHEMA)

ts2_dataset_arn = create_dataset_response['DatasetArn']
describe_dataset_response = forecast.describe_dataset(DatasetArn=ts2_dataset_arn)

print(f"The Dataset with ARN {ts2_dataset_arn} is now {describe_dataset_response['Status']}")

The Dataset with ARN arn:aws:forecast:us-east-1:054619787751:dataset/WATCHLIST_TS_2 is now ACTIVE


In [6]:
#ONLY RUN IF YOU HAVE ALREADY RUN THE ABOVE CELL
#Obtained arn from the original cell execution above
ts2_dataset_arn = 'arn:aws:forecast:us-east-1:054619787751:dataset/WATCHLIST_TS_2'

#### Importing new dataset

In [14]:
ts2_s3_path = 's3://forecast-exp-1111/forecast_import/target_wl2.parquet'

In [22]:
#ONLY NEED TO RUN THIS ONCE. SKIP TO NEXT CELL IF CREATED PREVIOUSLY.
TIMESTAMP_FORMAT = "yyyy-MM-dd hh:mm:ss"
TS_IMPORT_JOB_NAME = "PREFUNDING_TTS_IMPORT_02"
TIMEZONE = "EST"

ts_dataset_import_job_response = \
    forecast.create_dataset_import_job(DatasetImportJobName=TS_IMPORT_JOB_NAME,
                                       DatasetArn=ts2_dataset_arn,
                                       DataSource= {
                                         "S3Config" : {
                                             "Path": ts2_s3_path,
                                             "RoleArn": role_arn
                                         } 
                                       },
                                       Format="PARQUET",
                                       TimestampFormat=TIMESTAMP_FORMAT,
                                       TimeZone = TIMEZONE)

ts_dataset_import_job_arn = ts_dataset_import_job_response['DatasetImportJobArn']
describe_dataset_import_job_response = forecast.describe_dataset_import_job(DatasetImportJobArn=ts_dataset_import_job_arn)

print(f"Waiting for Dataset Import Job with ARN {ts_dataset_import_job_arn} to become ACTIVE. This process could take 5-10 minutes.\n\nCurrent Status:")

status = util.wait(lambda: forecast.describe_dataset_import_job(DatasetImportJobArn=ts_dataset_import_job_arn))

describe_dataset_import_job_response = forecast.describe_dataset_import_job(DatasetImportJobArn=ts_dataset_import_job_arn)
print(f"\n\nThe Dataset Import Job with ARN {ts_dataset_import_job_arn} is now {describe_dataset_import_job_response['Status']}.")

Waiting for Dataset Import Job with ARN arn:aws:forecast:us-east-1:054619787751:dataset-import-job/WATCHLIST_TS_2/PREFUNDING_TTS_IMPORT_02 to become ACTIVE. This process could take 5-10 minutes.

Current Status:
CREATE_PENDING .
CREATE_IN_PROGRESS .........
ACTIVE 


The Dataset Import Job with ARN arn:aws:forecast:us-east-1:054619787751:dataset-import-job/WATCHLIST_TS_2/PREFUNDING_TTS_IMPORT_02 is now ACTIVE.


In [7]:
#ONLY RUN IF YOU HAVE ALREADY RUN THE ABOVE CELL
#Obtained arn from the original cell execution above
ts_dataset_import_job_arn = 'arn:aws:forecast:us-east-1:054619787751:dataset-import-job/WATCHLIST_TS_2/PREFUNDING_TTS_IMPORT_02'

#### Creating a DatasetGroup

In [24]:
#ONLY NEED TO RUN THIS ONCE. SKIP TO NEXT CELL IF CREATED PREVIOUSLY.
DATASET_GROUP_NAME = "TAIWAN_PREFUNDING_01_02"
DATASET_ARNS = [ts2_dataset_arn]

create_dataset_group_response = \
    forecast.create_dataset_group(Domain="CUSTOM",
                                  DatasetGroupName=DATASET_GROUP_NAME,
                                  DatasetArns=DATASET_ARNS)

dataset_group_arn = create_dataset_group_response['DatasetGroupArn']
describe_dataset_group_response = forecast.describe_dataset_group(DatasetGroupArn=dataset_group_arn)

print(f"The DatasetGroup with ARN {dataset_group_arn} is now {describe_dataset_group_response['Status']}.")

The DatasetGroup with ARN arn:aws:forecast:us-east-1:054619787751:dataset-group/TAIWAN_PREFUNDING_01_02 is now ACTIVE.


In [8]:
#ONLY RUN IF YOU HAVE ALREADY RUN THE ABOVE CELL
#Obtained arn from the original cell execution above
dataset_group_arn = 'arn:aws:forecast:us-east-1:054619787751:dataset-group/TAIWAN_PREFUNDING_01_02'

#### Train a predictor

In [9]:
#ONLY NEED TO RUN THIS ONCE. SKIP TO NEXT CELL IF CREATED PREVIOUSLY.
PREDICTOR_NAME = "PREFUNDING_PREDICTOR_01_02"
FORECAST_HORIZON = 5
FORECAST_FREQUENCY = "D"
#HOLIDAY_DATASET = [{
#        'Name': 'holiday',
#        'Configuration': {
#        'CountryCode': ['TW']
#    }
#}]

create_auto_predictor_response = \
    forecast.create_auto_predictor(PredictorName = PREDICTOR_NAME,
                                   ForecastHorizon = FORECAST_HORIZON,
                                   ForecastFrequency = FORECAST_FREQUENCY,
                                   DataConfig = {
                                       'DatasetGroupArn': dataset_group_arn
                                       #,'AdditionalDatasets': HOLIDAY_DATASET
                                        },
                                   ExplainPredictor = True)

predictor_arn = create_auto_predictor_response['PredictorArn']
print(f"Waiting for Predictor with ARN {predictor_arn} to become ACTIVE. Depending on data size and predictor setting, it can take several hours to be ACTIVE.")

#status = util.wait(lambda: forecast.describe_auto_predictor(PredictorArn=predictor_arn))

Waiting for Predictor with ARN arn:aws:forecast:us-east-1:054619787751:predictor/PREFUNDING_PREDICTOR_01_02_01GM3KN277EYERQD7VNAMN254K to become ACTIVE. Depending on data size and predictor setting, it can take several hours to be ACTIVE.

Current Status:


In [13]:
#Get current status of predictor
describe_auto_predictor_response = forecast.describe_auto_predictor(PredictorArn=predictor_arn)
print(f"\n\nThe Predictor with ARN {predictor_arn} is now {describe_auto_predictor_response['Status']}.")



The Predictor with ARN arn:aws:forecast:us-east-1:054619787751:predictor/PREFUNDING_PREDICTOR_01_02_01GM3KN277EYERQD7VNAMN254K is now ACTIVE.


In [14]:
#ONLY RUN IF YOU HAVE ALREADY RUN THE ABOVE CELL
#Obtained arn from the original cell execution above
predictor_arn = 'arn:aws:forecast:us-east-1:054619787751:predictor/PREFUNDING_PREDICTOR_01_02_01GM3KN277EYERQD7VNAMN254K'

#### Accuracy metrics

In [15]:
get_accuracy_metrics_response = forecast.get_accuracy_metrics(PredictorArn=predictor_arn)
wql = get_accuracy_metrics_response['PredictorEvaluationResults'][0]['TestWindows'][0]['Metrics']['WeightedQuantileLosses']
accuracy_scores = get_accuracy_metrics_response['PredictorEvaluationResults'][0]['TestWindows'][0]['Metrics']['ErrorMetrics'][0]

print(f"Weighted Quantile Loss (wQL): {json.dumps(wql, indent=2)}")

print(f"Root Mean Square Error (RMSE): {accuracy_scores['RMSE']}")

print(f"Weighted Absolute Percentage Error (WAPE): {accuracy_scores['WAPE']}")

print(f"Mean Absolute Percentage Error (MAPE): {accuracy_scores['MAPE']}")

print(f"Mean Absolute Scaled Error (MASE): {accuracy_scores['MASE']}")

Weighted Quantile Loss (wQL): [
  {
    "Quantile": 0.9,
    "LossValue": 0.033315385946153776
  },
  {
    "Quantile": 0.5,
    "LossValue": 0.024813485476922994
  },
  {
    "Quantile": 0.1,
    "LossValue": 0.038043852469230935
  }
]
Root Mean Square Error (RMSE): 0.0069419847508758
Weighted Absolute Percentage Error (WAPE): 0.06988694954358965
Mean Absolute Percentage Error (MAPE): 0.00024693699124686194
Mean Absolute Scaled Error (MASE): 0.0008089736696903767


#### Generate forecasts

In [16]:
FORECAST_NAME = "MY_FORECAST_EXP_01_02"

create_forecast_response = \
    forecast.create_forecast(ForecastName=FORECAST_NAME,
                             PredictorArn=predictor_arn)

forecast_arn = create_forecast_response['ForecastArn']
print(f"Waiting for Forecast with ARN {forecast_arn} to become ACTIVE. Depending on data size and predictor settings, it can take several hours to be ACTIVE.\n\nCurrent Status:")

#status = util.wait(lambda: forecast.describe_forecast(ForecastArn=forecast_arn))

Waiting for Forecast with ARN arn:aws:forecast:us-east-1:054619787751:forecast/MY_FORECAST_EXP_01_02 to become ACTIVE. Depending on data size and predictor settings, it can take several hours to be ACTIVE.

Current Status:


In [20]:
describe_forecast_response = forecast.describe_forecast(ForecastArn=forecast_arn)
print(f"\n\nThe Forecast with ARN {forecast_arn} is now {describe_forecast_response['Status']}.")



The Forecast with ARN arn:aws:forecast:us-east-1:054619787751:forecast/MY_FORECAST_EXP_01_02 is now ACTIVE.


In [21]:
#ONLY RUN IF YOU HAVE ALREADY RUN THE ABOVE CELL
#Obtained arn from the original cell execution above
forecast_arn = 'arn:aws:forecast:us-east-1:054619787751:forecast/MY_FORECAST_EXP_01_02'

#### Create Forecast export job

In [35]:
response = forecast.create_forecast_export_job(
    ForecastExportJobName='my_forecast_exp_01_02',
    ForecastArn=forecast_arn,
    Destination={
        'S3Config': {
            'Path': 's3://forecast-exp-1111/my_forecast_exp_01_02/',
            'RoleArn': role_arn
            #'KMSKeyArn': 'string'
        }
    }
    #Format='CSV'
)

In [38]:
#!aws s3 sync s3://forecast-exp-1111/my_forecast_exp_01_02 ./exp_01_02/forecast_01_02 #--dryrun

In [39]:
dfs = [pd.read_csv(f) for f in glob.glob(os.path.join(os.getcwd(),"exp_01_02","forecast_01_02","*.csv"))]
forecasts_01_02 = pd.concat(dfs)

In [60]:
forecasts_01_02['item_id'].nunique()

1195

In [126]:
forecasts_01_02#['item_id'].unique()

Unnamed: 0,item_id,date,p10,p50,p90
0,1235,2022-11-01T00:00:00Z,-0.000050,-0.000024,0.000014
1,1235,2022-11-02T00:00:00Z,-0.000053,-0.000022,0.000009
2,1235,2022-11-03T00:00:00Z,-0.000044,-0.000021,0.000007
3,1235,2022-11-04T00:00:00Z,-0.000050,-0.000019,0.000022
4,1235,2022-11-05T00:00:00Z,-0.000045,-0.000017,0.000018
...,...,...,...,...,...
30,9136,2022-11-01T00:00:00Z,-0.000044,-0.000020,0.000012
31,9136,2022-11-02T00:00:00Z,-0.000053,-0.000024,0.000002
32,9136,2022-11-03T00:00:00Z,-0.000044,-0.000019,0.000024
33,9136,2022-11-04T00:00:00Z,-0.000048,-0.000020,0.000014


In [66]:
actuals_1101 = ['1213','1472','1512','1538','2025','2321','2443','3018','3043','3536','6225','8101','9110']

In [142]:
#showing the 5-day predictions for the 13 stocks that show up on the actual watchlist for 11/01/2022:
forecasts_01_02[forecasts_01_02['item_id'].isin(actuals_1101)].sort_values(['item_id','date'])

Unnamed: 0,item_id,date,p10,p50,p90
15,1213,2022-11-01T00:00:00Z,0.988473,1.001231,1.014586
16,1213,2022-11-02T00:00:00Z,0.986222,0.9986,1.009422
17,1213,2022-11-03T00:00:00Z,0.988338,0.997103,1.017501
18,1213,2022-11-04T00:00:00Z,0.922382,0.977046,1.015014
19,1213,2022-11-05T00:00:00Z,-0.000159,0.005434,0.011387
25,2443,2022-11-01T00:00:00Z,0.989733,1.000846,1.016798
26,2443,2022-11-02T00:00:00Z,0.985751,0.999528,1.012005
27,2443,2022-11-03T00:00:00Z,0.987436,0.996902,1.007511
28,2443,2022-11-04T00:00:00Z,0.928538,0.980124,1.031793
29,2443,2022-11-05T00:00:00Z,-0.000949,0.006161,0.010979


## Experiment 02 <a class="anchor" id="predictor"></a>
### Incorporating related data into the predictions