# AWS Machine Learning Nanodegree Capstone Project
# Forecasting with Amazon Forecast

## Note: These steps are adapted from the following Amazon Forecast reference walkthrough and utility module:
https://github.com/aws-samples/amazon-forecast-samples/blob/main/notebooks/basic/Getting_Started/Amazon_Forecast_Quick_Start_Guide.ipynb
https://github.com/aws-samples/amazon-forecast-samples/blob/main/notebooks/common/util/fcst_utils.py

#### Setup Notebook Environment

In [5]:
%%capture --no-stderr setup

!pip install pandas s3fs matplotlib ipywidgets
!pip install boto3 --upgrade

%reload_ext autoreload

#### Setup Imports

In [71]:
import sys
import os
import glob 
sys.path.insert( 0, os.path.abspath("../../common") )

import json
from util.fcst_utils import *
import boto3
import s3fs
import pandas as pd

#### Setup IAM Role used by Amazon Forecast to access your data

In [65]:
# Role was set up manually in the AWS console, with the AmazonS3FullAccess policy attached
role_arn = 'arn:aws:iam::054619787751:role/my-forecast-role'

#### Create an instance of AWS SDK client for Amazon Forecast

In [68]:
region = 'us-east-1'
session = boto3.Session(region_name=region) 
forecast = session.client(service_name='forecast')
forecastquery = session.client(service_name='forecastquery')

# Checking to make sure we can communicate with Amazon Forecast
assert forecast.list_predictors()

## Step 1: Import your data. <a class="anchor" id="import"></a>

In this step, we will create a **Dataset** and **Import** the Taiwan stock dataset from S3 to Amazon Forecast. To train a Predictor we will need a **DatasetGroup** that groups the input **Datasets**. So, we will end this step by creating a **DatasetGroup** with the imported **Dataset**.
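
The order of these resources can be outlined as plain request dictionaries for the boto3 calls `create_dataset_import_job` and `create_dataset_group`, both of which need the ARN returned by `create_dataset`. This is a rough sketch rather than the notebook's actual cells: the helper `build_import_requests`, the resource names, the `yyyy-MM-dd` timestamp format, and the ARNs are all assumptions or placeholders for illustration.

```python
def build_import_requests(dataset_arn, s3_path, role_arn):
    """Return keyword arguments for the import job and dataset group calls.

    All names below are illustrative assumptions, not values from the notebook.
    """
    import_job = {
        "DatasetImportJobName": "WATCHLIST_TS_IMPORT",  # hypothetical name
        "DatasetArn": dataset_arn,
        "DataSource": {"S3Config": {"Path": s3_path, "RoleArn": role_arn}},
        "TimestampFormat": "yyyy-MM-dd",  # assumed format for daily data
        "Format": "PARQUET",  # the uploaded files are Parquet
    }
    dataset_group = {
        "DatasetGroupName": "WATCHLIST_DSG",  # hypothetical name
        "Domain": "CUSTOM",
        "DatasetArns": [dataset_arn],  # groups the imported dataset
    }
    return import_job, dataset_group


import_job, dataset_group = build_import_requests(
    dataset_arn="arn:aws:forecast:us-east-1:123456789012:dataset/EXAMPLE",  # placeholder
    s3_path="s3://forecast-exp-1111/forecast_import/forecast_target.parquet",
    role_arn="arn:aws:iam::123456789012:role/example-role",  # placeholder
)

# With a real client, the dictionaries would be passed as:
#   forecast.create_dataset_import_job(**import_job)
#   forecast.create_dataset_group(**dataset_group)
```

Keeping the requests as dictionaries makes it easy to inspect them before any billable resource is created.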

In [69]:
s3 = boto3.Session().resource('s3')
bucket_name = "forecast-exp-1111"

In [86]:
# Build the S3 object keys from the local files in ./forecast_import
files = glob.glob(os.path.join(os.getcwd(), "forecast_import", "*"))
keys = ["forecast_import/" + os.path.basename(file) for file in files]

In [87]:
keys

['forecast_import/ratios_rel.parquet',
 'forecast_import/stockquote_rel.parquet',
 'forecast_import/shortsales_rel.parquet',
 'forecast_import/forecast_target.parquet']

In [88]:
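for key in keys:
    # Upload each local file to the bucket under the same relative key
    s3.Bucket(bucket_name).Object(key).upload_file(key)
    # Overwritten on every pass, so it ends up holding the last key uploaded
    ts_s3_path = f"s3://{bucket_name}/{key}"

print(f"\nDone, the dataset is uploaded to S3 at {ts_s3_path}.")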


Done, the dataset is uploaded to S3 at s3://forecast-exp-1111/forecast_import/forecast_target.parquet.
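
Note that `ts_s3_path` is reassigned on every pass through the upload loop, so the printed path is simply the last entry in `keys`. A small sketch that keeps every uploaded path and selects the target file by name may be less dependent on the order in which `glob` returns files. The helper `to_s3_uris` is hypothetical, and treating `forecast_target.parquet` as the target time series is an assumption based on its file name.

```python
import posixpath


def to_s3_uris(bucket_name, keys):
    """Map each object key to its s3:// URI, indexed by file name."""
    return {posixpath.basename(key): f"s3://{bucket_name}/{key}" for key in keys}


keys = [
    "forecast_import/ratios_rel.parquet",
    "forecast_import/stockquote_rel.parquet",
    "forecast_import/shortsales_rel.parquet",
    "forecast_import/forecast_target.parquet",
]

uris = to_s3_uris("forecast-exp-1111", keys)

# Pick the target time series explicitly instead of relying on loop order
ts_s3_path = uris["forecast_target.parquet"]
print(ts_s3_path)
```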


#### Creating the Dataset

In [91]:
DATASET_FREQUENCY = "D"  # "D" = daily; use "H" for hourly
TS_DATASET_NAME = "WATCHLIST_TS"
TS_SCHEMA = {
   "Attributes":[
      {
         "AttributeName":"file_date",
         "AttributeType":"timestamp"
      },
      {
         "AttributeName":"Security Code Clean",
         "AttributeType":"string"
      },
      {
         "AttributeName":"on_watchlist",
         "AttributeType":"integer"
      }
   ]
}

create_dataset_response = forecast.create_dataset(Domain="CUSTOM",
                                                  DatasetType='TARGET_TIME_SERIES',
                                                  DatasetName=TS_DATASET_NAME,
                                                  DataFrequency=DATASET_FREQUENCY,
                                                  Schema=TS_SCHEMA)

ts_dataset_arn = create_dataset_response['DatasetArn']
describe_dataset_response = forecast.describe_dataset(DatasetArn=ts_dataset_arn)

print(f"The Dataset with ARN {ts_dataset_arn} is now {describe_dataset_response['Status']}")

ClientError: An error occurred (ValidationException) when calling the CreateDataset operation: 1 validation error detected: Value 'Security Code Clean' at 'schema.attributes.2.member.attributeName' failed to satisfy constraint: Member must satisfy regular expression pattern: ^[a-zA-Z][a-zA-Z0-9_]*
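
The error points at the second schema attribute: `Security Code Clean` contains spaces, which the pattern `^[a-zA-Z][a-zA-Z0-9_]*` does not allow. One possible fix, sketched below, is to rename offending attributes before calling `create_dataset`. The helper `sanitize_attribute_name` is hypothetical, and the corresponding column in the Parquet file may also need the same new name so that the schema still matches the data.

```python
import re

# Pattern quoted in the ValidationException, anchored with $ to require a full match
ATTRIBUTE_NAME_PATTERN = re.compile(r"^[a-zA-Z][a-zA-Z0-9_]*$")


def sanitize_attribute_name(name):
    """Return a version of `name` that satisfies ATTRIBUTE_NAME_PATTERN."""
    # Replace every run of disallowed characters with a single underscore
    cleaned = re.sub(r"[^a-zA-Z0-9_]+", "_", name.strip())
    # The name must start with a letter; prefix one if it does not
    if not cleaned or not cleaned[0].isalpha():
        cleaned = "a_" + cleaned
    return cleaned


schema = {
    "Attributes": [
        {"AttributeName": "file_date", "AttributeType": "timestamp"},
        {"AttributeName": "Security Code Clean", "AttributeType": "string"},
        {"AttributeName": "on_watchlist", "AttributeType": "integer"},
    ]
}

# Copy the schema, rewriting only the attribute names
fixed_schema = {
    "Attributes": [
        {**attr, "AttributeName": sanitize_attribute_name(attr["AttributeName"])}
        for attr in schema["Attributes"]
    ]
}

names = [attr["AttributeName"] for attr in fixed_schema["Attributes"]]
print(names)
```

Names that already satisfy the pattern, such as `file_date` and `on_watchlist`, pass through unchanged.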