# AWS Machine Learning Nanodegree Capstone Project
# Forecasting with Amazon Forecast

## Note: these steps are adapted from the following Amazon Forecast reference walkthrough and utility module:
https://github.com/aws-samples/amazon-forecast-samples/blob/main/notebooks/basic/Getting_Started/Amazon_Forecast_Quick_Start_Guide.ipynb
https://github.com/aws-samples/amazon-forecast-samples/blob/main/notebooks/common/util/fcst_utils.py

#### Setup Notebook Environment

In [5]:
%%capture --no-stderr setup

!pip install pandas s3fs matplotlib ipywidgets
!pip install boto3 --upgrade

%reload_ext autoreload

#### Setup Imports

In [71]:
import sys
import os
import glob 
sys.path.insert( 0, os.path.abspath("../../common") )

import json
import util  # binds the package name used by the util.wait(...) calls below
from util.fcst_utils import *
import boto3
import s3fs
import pandas as pd

#### Setup IAM Role used by Amazon Forecast to access your data

In [65]:
# The role was created manually in the AWS console with the AmazonS3FullAccess policy attached.
role_arn = 'arn:aws:iam::054619787751:role/my-forecast-role'

#### Create an instance of AWS SDK client for Amazon Forecast

In [68]:
region = 'us-east-1'
session = boto3.Session(region_name=region) 
forecast = session.client(service_name='forecast')
forecastquery = session.client(service_name='forecastquery')

# Checking to make sure we can communicate with Amazon Forecast
assert forecast.list_predictors()

## Step 1: Import your data. <a class="anchor" id="import"></a>

In this step, we will create a **Dataset** and **Import** the Taiwan stock dataset from S3 to Amazon Forecast. To train a Predictor we will need a **DatasetGroup** that groups the input **Datasets**. So, we will end this step by creating a **DatasetGroup** with the imported **Dataset**.

In [69]:
s3 = boto3.Session().resource('s3')
bucket_name = "forecast-exp-1111"

In [92]:
keys=[]
files = glob.glob(os.path.join(os.getcwd(), "forecast_import", "*"))
for file in files:
    keys.append(r"forecast_import/"+os.path.split(file)[1])

In [93]:
keys

['forecast_import/ratios_rel.parquet',
 'forecast_import/stockquote_rel.parquet',
 'forecast_import/shortsales_rel.parquet',
 'forecast_import/forecast_target.parquet']

In [94]:
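for key in keys:
    s3.Bucket(bucket_name).Object(key).upload_file(key)

# Point at the target time series explicitly; glob order is not guaranteed,
# so the last key in the loop is not reliably forecast_target.parquet.
ts_s3_path = f"s3://{bucket_name}/forecast_import/forecast_target.parquet"

print(f"\nDone, the dataset is uploaded to S3 at {ts_s3_path}.")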


Done, the dataset is uploaded to S3 at s3://forecast-exp-1111/forecast_import/forecast_target.parquet.


#### Creating the Dataset

In [98]:
DATASET_FREQUENCY = "D"  # "D" = daily; use "H" for hourly data.
TS_DATASET_NAME = "WATCHLIST_TS"
TS_SCHEMA = {
   "Attributes":[
      {
         "AttributeName":"item_id",
         "AttributeType":"string"
      },
      {
         "AttributeName":"timestamp",
         "AttributeType":"timestamp"
      },
      
      {
         "AttributeName":"target_value",
         "AttributeType":"integer"
      }
   ]
}

create_dataset_response = forecast.create_dataset(Domain="CUSTOM",
                                                  DatasetType='TARGET_TIME_SERIES',
                                                  DatasetName=TS_DATASET_NAME,
                                                  DataFrequency=DATASET_FREQUENCY,
                                                  Schema=TS_SCHEMA)

ts_dataset_arn = create_dataset_response['DatasetArn']
describe_dataset_response = forecast.describe_dataset(DatasetArn=ts_dataset_arn)

print(f"The Dataset with ARN {ts_dataset_arn} is now {describe_dataset_response['Status']}")

The Dataset with ARN arn:aws:forecast:us-east-1:054619787751:dataset/WATCHLIST_TS is now ACTIVE


#### Importing the Dataset

In [103]:
TIMESTAMP_FORMAT = "yyyy-MM-dd hh:mm:ss"
TS_IMPORT_JOB_NAME = "PREFUNDING_TTS_IMPORT"
TIMEZONE = "EST"

ts_dataset_import_job_response = \
    forecast.create_dataset_import_job(DatasetImportJobName=TS_IMPORT_JOB_NAME,
                                       DatasetArn=ts_dataset_arn,
                                       DataSource= {
                                         "S3Config" : {
                                             "Path": ts_s3_path,
                                             "RoleArn": role_arn
                                         } 
                                       },
                                       Format="PARQUET",
                                       TimestampFormat=TIMESTAMP_FORMAT,
                                       TimeZone = TIMEZONE)

ts_dataset_import_job_arn = ts_dataset_import_job_response['DatasetImportJobArn']
describe_dataset_import_job_response = forecast.describe_dataset_import_job(DatasetImportJobArn=ts_dataset_import_job_arn)

print(f"Waiting for Dataset Import Job with ARN {ts_dataset_import_job_arn} to become ACTIVE. This process could take 5-10 minutes.\n\nCurrent Status:")

status = util.wait(lambda: forecast.describe_dataset_import_job(DatasetImportJobArn=ts_dataset_import_job_arn))

describe_dataset_import_job_response = forecast.describe_dataset_import_job(DatasetImportJobArn=ts_dataset_import_job_arn)
print(f"\n\nThe Dataset Import Job with ARN {ts_dataset_import_job_arn} is now {describe_dataset_import_job_response['Status']}.")

InvalidInputException: An error occurred (InvalidInputException) when calling the CreateDatasetImportJob operation: Parquet input data has unspecified attributes. Please ensure only these attributes are present: [[item_id, timestamp, target_value]]. Found [[sec_code, file_date, __index_level_0__]] attributes in input data.
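
The error above reports that the Parquet file carries `sec_code`, `file_date`, and a pandas index column (`__index_level_0__`), none of which appear in `TS_SCHEMA`. One possible fix, sketched below, is to rename the columns to the schema's names and drop the index before the file is written and uploaded. The mapping of `sec_code` to `item_id` and `file_date` to `timestamp` is an assumption based on the column names, the helper `to_forecast_schema` is hypothetical, and the file is assumed to already contain a `target_value` column. Column data types may also need to match the schema; that is not checked here.

```python
import pandas as pd

# Assumed mapping from the source column names to the names in TS_SCHEMA.
COLUMN_MAP = {"sec_code": "item_id", "file_date": "timestamp"}
SCHEMA_COLUMNS = ["item_id", "timestamp", "target_value"]


def to_forecast_schema(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df holding only the TS_SCHEMA columns, in schema order."""
    out = df.rename(columns=COLUMN_MAP)
    missing = [c for c in SCHEMA_COLUMNS if c not in out.columns]
    if missing:
        raise ValueError(f"Missing columns required by the schema: {missing}")
    # Discard the original index so it is not serialized as __index_level_0__.
    return out[SCHEMA_COLUMNS].reset_index(drop=True)


# Usage (path assumed from the upload step above):
# df = pd.read_parquet("forecast_import/forecast_target.parquet")
# to_forecast_schema(df).to_parquet(
#     "forecast_import/forecast_target.parquet", index=False
# )
```

Writing with `index=False` keeps pandas from adding the index column back to the Parquet file. The file would then need to be uploaded again before the import job is retried.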

#### Creating a DatasetGroup

In [99]:
DATASET_GROUP_NAME = "TAIWAN_PREFUNDING"
DATASET_ARNS = [ts_dataset_arn]

create_dataset_group_response = \
    forecast.create_dataset_group(Domain="CUSTOM",
                                  DatasetGroupName=DATASET_GROUP_NAME,
                                  DatasetArns=DATASET_ARNS)

dataset_group_arn = create_dataset_group_response['DatasetGroupArn']
describe_dataset_group_response = forecast.describe_dataset_group(DatasetGroupArn=dataset_group_arn)

print(f"The DatasetGroup with ARN {dataset_group_arn} is now {describe_dataset_group_response['Status']}.")

The DatasetGroup with ARN arn:aws:forecast:us-east-1:054619787751:dataset-group/TAIWAN_PREFUNDING is now ACTIVE.


## Step 2: Train a predictor - Experiment 01 <a class="anchor" id="predictor"></a>

In this step, we will create a **Predictor** using the **DatasetGroup** that was created above. After creating the predictor, we will review the accuracy obtained through the backtesting process to get a quantitative understanding of the performance of the predictor.

This predictor is the baseline experiment; later experiments will extend it with related datasets.
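
Once a predictor reaches ACTIVE, the backtest review described above could be done along the following lines. This is a minimal sketch: `summary_metrics` is a hypothetical helper, and it assumes the response shape of the boto3 `get_accuracy_metrics` call, where each test window carries an `EvaluationType` and a `Metrics` dictionary.

```python
def summary_metrics(accuracy_response):
    """Collect the SUMMARY test-window metrics from a GetAccuracyMetrics response."""
    rows = []
    for result in accuracy_response.get("PredictorEvaluationResults", []):
        for window in result.get("TestWindows", []):
            # SUMMARY windows aggregate the individual backtest windows.
            if window.get("EvaluationType") != "SUMMARY":
                continue
            metrics = window.get("Metrics", {})
            rows.append({
                "RMSE": metrics.get("RMSE"),
                "AverageWeightedQuantileLoss": metrics.get("AverageWeightedQuantileLoss"),
            })
    return rows


# Usage, once the predictor is ACTIVE:
# response = forecast.get_accuracy_metrics(PredictorArn=predictor_arn)
# print(summary_metrics(response))
```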

In [100]:
PREDICTOR_NAME = "PREFUNDING_PREDICTOR_01"
FORECAST_HORIZON = 1
FORECAST_FREQUENCY = "D"
HOLIDAY_DATASET = [{
    'Name': 'holiday',
    'Configuration': {
        'CountryCode': ['TW']
    }
}]

create_auto_predictor_response = \
    forecast.create_auto_predictor(PredictorName = PREDICTOR_NAME,
                                   ForecastHorizon = FORECAST_HORIZON,
                                   ForecastFrequency = FORECAST_FREQUENCY,
                                   DataConfig = {
                                       'DatasetGroupArn': dataset_group_arn, 
                                       'AdditionalDatasets': HOLIDAY_DATASET
                                    },
                                   ExplainPredictor = True)

predictor_arn = create_auto_predictor_response['PredictorArn']
print(f"Waiting for Predictor with ARN {predictor_arn} to become ACTIVE. Depending on data size and predictor settings, it can take several hours to become ACTIVE.\n\nCurrent Status:")

status = util.wait(lambda: forecast.describe_auto_predictor(PredictorArn=predictor_arn))

describe_auto_predictor_response = forecast.describe_auto_predictor(PredictorArn=predictor_arn)
print(f"\n\nThe Predictor with ARN {predictor_arn} is now {describe_auto_predictor_response['Status']}.")

ResourceNotFoundException: An error occurred (ResourceNotFoundException) when calling the CreateAutoPredictor operation: Datasets: [arn:aws:forecast:us-east-1:054619787751:dataset/WATCHLIST_TS] have never been successfully imported.
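
This failure follows from the earlier import error: the dataset exists, but no import job ever completed, so there is nothing to train on. A guard like the one below could be run before `create_auto_predictor` to catch this earlier. It is a sketch only: `has_active_import` and `dataset_is_imported` are hypothetical helpers built on the boto3 `list_dataset_import_jobs` call, and pagination of the results is ignored.

```python
def has_active_import(import_jobs):
    """True if at least one import job summary reports Status == 'ACTIVE'."""
    return any(job.get("Status") == "ACTIVE" for job in import_jobs)


def dataset_is_imported(forecast_client, dataset_arn):
    """Ask Amazon Forecast whether dataset_arn has a completed import job."""
    response = forecast_client.list_dataset_import_jobs(
        Filters=[{"Key": "DatasetArn", "Value": dataset_arn, "Condition": "IS"}]
    )
    return has_active_import(response.get("DatasetImportJobs", []))


# Usage before training:
# if not dataset_is_imported(forecast, ts_dataset_arn):
#     raise RuntimeError("Import the target time series before creating a predictor.")
```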