# How to use Amazon Forecast

This notebook helps advanced users get started with Amazon Forecast quickly. It runs through a typical end-to-end use case for a simple time series forecasting scenario.

Prerequisites:
[AWS CLI](https://docs.aws.amazon.com/cli/latest/userguide/installing.html).

For more information about the APIs, please check the [documentation](https://docs.aws.amazon.com/forecast/latest/dg/what-is-forecast.html).

## Table Of Contents
* [Setting up](#setup)
* [Test Setup - Running first API](#hello)
* [Forecasting Example with Amazon Forecast](#forecastingExample)

**Read Every Cell FULLY before executing it**


## Set up Preview SDK<a class="anchor" id="setup"></a>

In [1]:
# Configures your AWS CLI to now understand our up and coming service Amazon Forecast
!aws configure add-model --service-model file://../sdk/forecastquery-2018-06-26.normal.json --service-name forecastquery
!aws configure add-model --service-model file://../sdk/forecast-2018-06-26.normal.json --service-name forecast

In [2]:
# Prerequisites : 1 time install only, remove the comments to execute the lines.
#!pip install boto3
#!pip install pandas

In [3]:
import boto3
from time import sleep
import subprocess

In [4]:
session = boto3.Session(region_name='us-west-2') #us-east-1 is also supported

forecast = session.client(service_name='forecast')
forecastquery = session.client(service_name='forecastquery')

## Test Setup <a class="anchor" id="hello"></a>
Let's say hi to Amazon Forecast by calling a simple API, ListRecipes. It returns a list of the global recipes Forecast offers that you could use as part of your forecasting solution.

In [5]:
forecast.list_recipes()

{'RecipeNames': ['forecast_ARIMA',
  'forecast_DEEP_AR',
  'forecast_DEEP_AR_PLUS',
  'forecast_ETS',
  'forecast_MDN',
  'forecast_MQRNN',
  'forecast_NPTS',
  'forecast_PROPHET',
  'forecast_SQF'],
 'ResponseMetadata': {'RequestId': 'bb5de440-357f-4e45-aa47-5a0d40d8a448',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'content-type': 'application/x-amz-json-1.1',
   'date': 'Fri, 01 Mar 2019 01:57:05 GMT',
   'x-amzn-requestid': 'bb5de440-357f-4e45-aa47-5a0d40d8a448',
   'content-length': '174',
   'connection': 'keep-alive'},
  'RetryAttempts': 0}}

If this ran successfully, kudos! If you hit any errors running `list_recipes` at this point, please contact us at the [AWS support forum](https://forums.aws.amazon.com/forum.jspa?forumID=327).

## Forecasting with Amazon Forecast<a class="anchor" id="forecastingExample"></a>
### Preparing your Data

In Amazon Forecast, a dataset is a collection of one or more files containing data that is relevant to a forecasting task. A dataset must conform to a schema provided by Amazon Forecast.

For this exercise, we use the individual household electric power consumption dataset. (Dua, D. and Karra Taniskidou, E. (2017). UCI Machine Learning Repository [http://archive.ics.uci.edu/ml]. Irvine, CA: University of California, School of Information and Computer Science.) We aggregate the usage data hourly. 
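
The hourly aggregation mentioned above could look roughly like the sketch below. It is not the script used to prepare `item-demand-time.csv`: the function name `aggregate_hourly`, the sample readings, and the choice to sum values within each hour (rather than average them) are all assumptions made for illustration.

```python
from collections import defaultdict
from datetime import datetime


def aggregate_hourly(readings):
    """Sum (timestamp, value, item_id) readings into hourly buckets per item.

    Summing is an assumption; the text does not say how the data was aggregated.
    """
    totals = defaultdict(float)
    for timestamp, value, item_id in readings:
        # Truncate the timestamp to the start of its hour.
        hour = timestamp.replace(minute=0, second=0, microsecond=0)
        totals[(item_id, hour)] += value
    # Emit rows in the column order used in this notebook:
    # timestamp, target value, item id.
    return [
        (hour.strftime("%Y-%m-%d %H:%M:%S"), total, item_id)
        for (item_id, hour), total in sorted(totals.items())
    ]


# Hypothetical sub-hourly readings for a single item.
sample = [
    (datetime(2014, 1, 1, 1, 15), 10.0, "client_12"),
    (datetime(2014, 1, 1, 1, 45), 5.0, "client_12"),
    (datetime(2014, 1, 1, 2, 5), 7.5, "client_12"),
]
hourly = aggregate_hourly(sample)
print(hourly)
```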

### Data Type

Amazon Forecast can import data from Amazon S3. We first explore the data locally to see the fields.

In [6]:
import pandas as pd
df = pd.read_csv("../data/item-demand-time.csv", dtype = object)
df.head(3)

Unnamed: 0,2014-01-01 01:00:00,38.34991708126038,client_12
0,2014-01-01 02:00:00,33.5820895522388,client_12
1,2014-01-01 03:00:00,34.41127694859037,client_12
2,2014-01-01 04:00:00,39.800995024875625,client_12


Now upload the data to S3. Before doing that, go to the AWS Console, select S3 as the service, and create a new bucket in the `Oregon` (`us-west-2`) region. Use the bucket naming convention `amazon-forecast-unique-value-data`. The name must be unique; if you get an error, adjust the name until it works, then update the `bucketName` cell below.
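
Before creating the bucket, it can help to check a candidate name locally. The sketch below builds a name following the convention above and validates it against a subset of the S3 naming rules (3 to 63 characters, lowercase letters, digits, and hyphens, starting and ending with a letter or digit). The helper `make_bucket_name` is hypothetical and deliberately stricter than S3 itself; it cannot tell you whether the name is already taken.

```python
import re

# Subset of the S3 bucket naming rules: 3-63 characters, lowercase letters,
# digits, and hyphens, starting and ending with a letter or digit.
# (S3 also allows dots; they are rejected here to keep the check simple.)
_BUCKET_RE = re.compile(r"^[a-z0-9][a-z0-9-]{1,61}[a-z0-9]$")


def make_bucket_name(unique_value):
    """Build amazon-forecast-<unique_value>-data and check it locally."""
    name = "amazon-forecast-{}-data".format(unique_value)
    if not _BUCKET_RE.match(name):
        raise ValueError("not a valid S3 bucket name: {!r}".format(name))
    return name


print(make_bucket_name("chngyan"))
```

Uniqueness can only be confirmed by actually creating the bucket, so an error in the console is still possible after this check passes.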

In [7]:
s3 = session.client('s3')

In [8]:
accountId = boto3.client('sts').get_caller_identity().get('Account')

In [9]:
bucketName = 'amazon-forecast-chngyan-data'# Update the unique-value bit here.
key="elec_data/item-demand-time.csv"

In [10]:
#s3.upload_file(Filename="../data/item-demand-time.csv", Bucket=bucketName, Key=key)

In [11]:
bucketName

'amazon-forecast-chngyan-data'

In [12]:
# One-time setup only: run the helper script that creates the role to provide to Amazon Forecast.
# Reuse the generated role for all future calls that import or export data.

cmd = 'python ../setup_forecast_permissions.py '+bucketName
p = subprocess.Popen(cmd.split(' '), stdout=subprocess.PIPE, stderr=subprocess.PIPE)
stdout, stderr = p.communicate()  # Wait for the script to finish before using the role.

In [13]:
roleArn = 'arn:aws:iam::%s:role/amazonforecast'%accountId

### CreateDataset

More details about `Domain` and dataset types can be found in the [documentation](https://docs.aws.amazon.com/forecast/latest/dg/howitworks-domains-ds-types.html). For this example, we use the [CUSTOM](https://docs.aws.amazon.com/forecast/latest/dg/custom-domain.html) domain with 3 required attributes: `timestamp`, `target_value`, and `item_id`. Also update the project name to reflect your name, in lowercase.

In [14]:
DATASET_FREQUENCY = "H" 
TIMESTAMP_FORMAT = "yyyy-MM-dd hh:mm:ss"

In [15]:
project = 'try_new_sdk' # Replace this with a unique name; make sure the entire name is < 30 characters.
datasetName= project+'_ds'
datasetGroupName= project +'_gp'
s3DataPath = "s3://"+bucketName+"/"+key

In [16]:
datasetName

'try_new_sdk_ds'
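
The naming constraints above (lowercase, fewer than 30 characters) can be checked before any resource is created. The helper below is a hypothetical sketch: it assumes the length limit applies to each derived name (`_ds`, `_gp`, and the `_mqrnn` predictor suffix used later in this notebook), which the comment in the cell does not state explicitly.

```python
def derived_names(project, suffixes=("_ds", "_gp", "_mqrnn"), max_len=30):
    """Return project + suffix for each suffix, enforcing lowercase and length."""
    if project != project.lower():
        raise ValueError("project name must be lowercase: {!r}".format(project))
    names = [project + suffix for suffix in suffixes]
    # Assumption: every derived name must be shorter than max_len characters.
    too_long = [name for name in names if len(name) >= max_len]
    if too_long:
        raise ValueError(
            "names must be shorter than {} characters: {}".format(max_len, too_long)
        )
    return names


print(derived_names("try_new_sdk"))
```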

In [17]:
# Specify the schema of your dataset here. Make sure the order of columns matches the raw data files.
schema ={
   "Attributes":[
      {
         "AttributeName":"timestamp",
         "AttributeType":"timestamp"
      },
      {
         "AttributeName":"target_value",
         "AttributeType":"float"
      },
      {
         "AttributeName":"item_id",
         "AttributeType":"string"
      }
   ]
}

response=forecast.create_dataset(
                    Domain="CUSTOM",
                    DatasetType='TARGET_TIME_SERIES',
                    DataFormat='CSV',
                    DatasetName=datasetName,
                    DataFrequency=DATASET_FREQUENCY, 
                    TimeStampFormat=TIMESTAMP_FORMAT,
                    Schema = schema
                   )
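
Because the schema must list attributes in the same order as the CSV columns, a quick local check on a few rows can catch a mismatch before the import job runs. The sketch below is an illustration only: `check_row` is a hypothetical helper, the sample row is taken from the `df.head(3)` output above, and the Python `strptime` pattern is an assumed equivalent of the timestamp format passed to Forecast.

```python
import csv
import io
from datetime import datetime

schema = {
    "Attributes": [
        {"AttributeName": "timestamp", "AttributeType": "timestamp"},
        {"AttributeName": "target_value", "AttributeType": "float"},
        {"AttributeName": "item_id", "AttributeType": "string"},
    ]
}


def check_row(row, schema, timestamp_format="%Y-%m-%d %H:%M:%S"):
    """Raise ValueError if a CSV row does not fit the schema, column by column."""
    attributes = schema["Attributes"]
    if len(row) != len(attributes):
        raise ValueError(
            "expected {} columns, got {}".format(len(attributes), len(row))
        )
    for value, attribute in zip(row, attributes):
        kind = attribute["AttributeType"]
        if kind == "timestamp":
            datetime.strptime(value, timestamp_format)  # ValueError if malformed
        elif kind == "float":
            float(value)  # ValueError if not numeric
        # "string" attributes accept any value.
    return True


sample_csv = "2014-01-01 01:00:00,38.34991708126038,client_12\n"
rows = list(csv.reader(io.StringIO(sample_csv)))
print(check_row(rows[0], schema))
```

A row with its columns in the wrong order, for example the item id first, fails on the timestamp parse and raises `ValueError`.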

In [18]:
forecast.describe_dataset(DatasetName=datasetName)

{'DatasetName': 'try_new_sdk_ds',
 'DatasetType': 'TARGET_TIME_SERIES',
 'DataFormat': 'CSV',
 'DataFrequency': 'H',
 'TimeStampFormat': 'yyyy-MM-dd hh:mm:ss',
 'Schema': {'Attributes': [{'AttributeName': 'timestamp',
    'AttributeType': 'timestamp',
    'AggregationMethod': 'min',
    'FillMethod': 'previous',
    'FeatureType': 'NONE'},
   {'AttributeName': 'target_value',
    'AttributeType': 'float',
    'AggregationMethod': 'sum',
    'FillMethod': 'zero',
    'FrontFillMethod': 'none',
    'BackFillMethod': 'zero',
    'FeatureType': 'TIME_SERIES'},
   {'AttributeName': 'item_id',
    'AttributeType': 'string',
    'AggregationMethod': 'min',
    'FillMethod': 'previous',
    'FeatureType': 'NONE'}]},
 'Domain': 'CUSTOM',
 'ScheduleExpression': 'none',
 'DatasetArn': 'arn:aws:forecast:us-west-2:938097332257:ds/try_new_sdk_ds',
 'Status': 'ACTIVE',
 'ResponseMetadata': {'RequestId': 'f98d9c02-ae2c-43b1-a48b-373676dc62dc',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'content-type':

In [19]:
forecast.create_dataset_group(DatasetGroupName=datasetGroupName,RoleArn=roleArn,DatasetNames=[datasetName])

{'DatasetGroupName': 'try_new_sdk_gp',
 'DatasetGroupArn': 'arn:aws:forecast:us-west-2:938097332257:dsgroup/try_new_sdk_gp',
 'ResponseMetadata': {'RequestId': '4d947b47-4e37-473d-8bde-463e8063a4c0',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'content-type': 'application/x-amz-json-1.1',
   'date': 'Fri, 01 Mar 2019 01:57:24 GMT',
   'x-amzn-requestid': '4d947b47-4e37-473d-8bde-463e8063a4c0',
   'content-length': '120',
   'connection': 'keep-alive'},
  'RetryAttempts': 0}}

If you have an existing dataset group, you can update it.

In [20]:
forecast.describe_dataset_group(DatasetGroupName=datasetGroupName)

{'DatasetGroupName': 'try_new_sdk_gp',
 'DatasetGroupArn': 'arn:aws:forecast:us-west-2:938097332257:dsgroup/try_new_sdk_gp',
 'Datasets': ['try_new_sdk_ds'],
 'RoleArn': 'arn:aws:iam::938097332257:role/amazonforecast',
 'ResponseMetadata': {'RequestId': 'd36d7d41-fb36-4b0e-a739-c854eb606b2b',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'content-type': 'application/x-amz-json-1.1',
   'date': 'Fri, 01 Mar 2019 01:57:26 GMT',
   'x-amzn-requestid': 'd36d7d41-fb36-4b0e-a739-c854eb606b2b',
   'content-length': '208',
   'connection': 'keep-alive'},
  'RetryAttempts': 0}}

### Create Data Import Job
This step imports the raw data into Amazon Forecast so that it is ready for forecasting.

In [21]:
ds_import_job_response=forecast.create_dataset_import_job(DatasetName=datasetName,Delimiter=',', DatasetGroupName =datasetGroupName ,S3Uri= s3DataPath)


In [22]:
ds_versionId=ds_import_job_response['VersionId']
print(ds_versionId)

7de98e6b


Check the status of the dataset import job. When the status changes from **CREATING** to **ACTIVE**, we can continue to the next steps. Depending on the data size, this usually takes 5 to 10 minutes.

In [23]:
while True:
    dataImportStatus = forecast.describe_dataset_import_job(DatasetName=datasetName,VersionId=ds_versionId)['Status']
    print(dataImportStatus)
    if dataImportStatus != 'ACTIVE' and dataImportStatus != 'FAILED':
        sleep(30)
    else:
        break

QUEUED
CREATING
CREATING
CREATING
CREATING
ACTIVE
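
The same polling pattern recurs several times in this notebook, and the loops as written never give up. One possible refactoring is a small helper with a timeout, sketched below. The name `wait_for_status` and its parameters are hypothetical; the demo uses a canned sequence of statuses instead of calling the service so that it runs anywhere.

```python
import time


def wait_for_status(get_status, done=("ACTIVE", "FAILED"), interval=30,
                    timeout=3600, sleep=time.sleep):
    """Call get_status() until it returns a value in `done` or time runs out."""
    waited = 0
    while True:
        status = get_status()
        print(status)
        if status in done:
            return status
        if waited >= timeout:
            raise TimeoutError(
                "still {} after {} seconds".format(status, waited)
            )
        sleep(interval)
        waited += interval


# Demo with canned statuses and a no-op sleep, so no AWS call is made.
fake_statuses = iter(["QUEUED", "CREATING", "CREATING", "ACTIVE"])
final = wait_for_status(lambda: next(fake_statuses), sleep=lambda seconds: None)
```

In the notebook, `get_status` would wrap the existing call, for example `lambda: forecast.describe_dataset_import_job(DatasetName=datasetName, VersionId=ds_versionId)['Status']`.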


In [24]:
forecast.describe_dataset_import_job(DatasetName=datasetName,VersionId=ds_versionId)

{'DatasetArn': 'arn:aws:forecast:us-west-2:938097332257:ds/try_new_sdk_ds',
 'DatasetName': 'try_new_sdk_ds',
 'VersionId': '7de98e6b',
 'Status': 'ACTIVE',
 'FieldStatistics': {'date': {'Count': 26280,
   'CountDistinct': 8760,
   'CountNull': 0,
   'Min': '2014-01-01T01:00:00Z',
   'Max': '2015-01-01T00:00:00Z'},
  'item': {'Count': 26280, 'CountDistinct': 3, 'CountNull': 0},
  'target': {'Count': 26280,
   'CountDistinct': 5059,
   'CountNull': 0,
   'CountNan': 0,
   'Min': '0.0',
   'Max': '212.27197346600326',
   'Avg': 50.82350576202014,
   'Stddev': 37.9125549309785}},
 'S3Uri': 's3://amazon-forecast-chngyan-data/elec_data/item-demand-time.csv',
 'StartTime': datetime.datetime(2019, 2, 28, 17, 57, 37, 842000, tzinfo=tzlocal()),
 'LastModificationTime': datetime.datetime(2019, 2, 28, 17, 59, 53, 652000, tzinfo=tzlocal()),
 'ResponseMetadata': {'RequestId': '1cf33faf-a716-4c72-86ab-7c8f59e57612',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'content-type': 'application/x-amz-json-1

### Recipe

In [25]:
recipesResponse=forecast.list_recipes()
recipesResponse

{'RecipeNames': ['forecast_ARIMA',
  'forecast_DEEP_AR',
  'forecast_DEEP_AR_PLUS',
  'forecast_ETS',
  'forecast_MDN',
  'forecast_MQRNN',
  'forecast_NPTS',
  'forecast_PROPHET',
  'forecast_SQF'],
 'ResponseMetadata': {'RequestId': '7ce7c58d-14e7-4eba-89d2-502b4e7b4726',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'content-type': 'application/x-amz-json-1.1',
   'date': 'Fri, 01 Mar 2019 02:00:33 GMT',
   'x-amzn-requestid': '7ce7c58d-14e7-4eba-89d2-502b4e7b4726',
   'content-length': '174',
   'connection': 'keep-alive'},
  'RetryAttempts': 0}}

Get details about each recipe.

In [26]:
forecast.describe_recipe(RecipeName='forecast_MQRNN')

{'Recipe': {'Name': 'forecast_MQRNN',
  'Train': [{'TrainingInfo': {'TrainedModelName': 'algorithm_MQRNN',
     'AlgorithmName': 'MQRNN',
     'TrainingParameters': {'epochs': '60',
      'learning_rate': '3E-3',
      'mini_batch_size': '32',
      'quantiles': '[0.1,0.5,0.9]'}},
    'BackTestWindowCount': 1,
    'MetricsBuckets': []}]},
 'ResponseMetadata': {'RequestId': 'c353275d-64f1-42a0-9e19-398252d8ae3e',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'content-type': 'application/x-amz-json-1.1',
   'date': 'Fri, 01 Mar 2019 02:00:33 GMT',
   'x-amzn-requestid': 'c353275d-64f1-42a0-9e19-398252d8ae3e',
   'content-length': '281',
   'connection': 'keep-alive'},
  'RetryAttempts': 0}}

### Create Solution with custom forecast horizon

The forecast horizon is how far into the future the forecast should predict, counted in steps of the data frequency. For weekly data, a value of 12 means 12 weeks. Our example uses hourly data and we want to forecast the next day, so we set it to 24.
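
The arithmetic behind the horizon can be written down as a one-line division of the period you want to cover by the data frequency. The helper `horizon_steps` below is a hypothetical illustration, not part of the Forecast SDK.

```python
from datetime import timedelta


def horizon_steps(span, step):
    """Number of `step`-sized periods needed to cover `span` exactly."""
    steps, remainder = divmod(span, step)
    if remainder:
        raise ValueError("span is not a whole number of steps")
    return steps


# Hourly data, forecasting the next day.
print(horizon_steps(timedelta(days=1), timedelta(hours=1)))   # 24
# Weekly data, forecasting the next 12 weeks.
print(horizon_steps(timedelta(weeks=12), timedelta(weeks=1)))  # 12
```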

In [27]:
predictorName= project+'_mqrnn'

In [28]:
forecastHorizon = 24

In [29]:
createPredictorResponse=forecast.create_predictor(RecipeName='forecast_MQRNN',DatasetGroupName= datasetGroupName ,PredictorName=predictorName, 
  ForecastHorizon = forecastHorizon)

In [30]:
predictorVerionId=createPredictorResponse['VersionId']

In [31]:
forecast.list_predictor_versions(PredictorName=predictorName)

{'PredictorVersions': [{'PredictorName': 'try_new_sdk_mqrnn',
   'VersionId': '7ecf0cdf'}],
 'ResponseMetadata': {'RequestId': 'ce5db5aa-1264-4e1c-b9d2-10aee3959817',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'content-type': 'application/x-amz-json-1.1',
   'date': 'Fri, 01 Mar 2019 02:00:37 GMT',
   'x-amzn-requestid': 'ce5db5aa-1264-4e1c-b9d2-10aee3959817',
   'content-length': '84',
   'connection': 'keep-alive'},
  'RetryAttempts': 0}}

Check the status of the solution. When the status changes from **CREATING** to **ACTIVE**, we can continue to the next steps. Depending on data size, model selection, and hyperparameters, it can take from 10 minutes to more than an hour to become **ACTIVE**.

In [None]:
while True:
    predictorStatus = forecast.describe_predictor(PredictorName=predictorName,VersionId=predictorVerionId)['Status']
    print(predictorStatus)
    if predictorStatus != 'ACTIVE' and predictorStatus != 'FAILED':
        sleep(30)
    else:
        break

CREATING
CREATING
CREATING
CREATING
CREATING
CREATING
CREATING
CREATING
CREATING
CREATING
CREATING
CREATING
CREATING
CREATING
CREATING


### Get Error Metrics

In [None]:
forecastquery.get_accuracy_metrics(PredictorName=predictorName)

### Deploy Predictor

In [None]:
forecast.deploy_predictor(PredictorName=predictorName)

In [None]:
deployedPredictorsResponse=forecast.list_deployed_predictors()
print(deployedPredictorsResponse)

Please note that the following cell can also take 10 minutes or more before the deployed predictor is fully operational. There may be no output at first, but that is fine as long as the cell still shows `*` (running).

In [None]:
while True:
    deployedPredictorStatus = forecast.describe_deployed_predictor(PredictorName=predictorName)['Status']
    print(deployedPredictorStatus)
    if deployedPredictorStatus != 'ACTIVE' and deployedPredictorStatus != 'FAILED':
        sleep(30)
    else:
        break
print(deployedPredictorStatus)

### Get Forecast

When the solution is deployed and forecast results are ready, you can view them. 

In [None]:
forecastResponse = forecastquery.get_forecast(
    PredictorName=predictorName,
    Interval="hour",
    Filters={"item_id":"client_12"}
)
print(forecastResponse)

### Export Forecast

You can batch export the forecast to an S3 bucket. This requires a role with S3 put access, which has already been created above.

In [None]:
forecastInfoList= forecast.list_predictors()

In [None]:
forecastInfoList

In [None]:
outputPath="s3://"+bucketName+"/output"

In [None]:
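# Note: forecastId is not defined earlier in this notebook; set it to the ID of the forecast to export before running this cell.
forecastExportResponse = forecast.create_forecast_export_job(ForecastId=forecastId, OutputPath={"S3Uri": outputPath,"RoleArn":roleArn})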

In [None]:
forecastExportJobId = forecastExportResponse['ForecastExportJobId']

In [None]:
while True:
    forecastExportStatus = forecast.describe_forecast_export_job(ForecastExportJobId=forecastExportJobId)['Status']
    print(forecastExportStatus)
    if forecastExportStatus != 'ACTIVE' and forecastExportStatus != 'FAILED':
        sleep(30)
    else:
        break

Check the S3 bucket for the results.

In [None]:
s3.list_objects(Bucket=bucketName,Prefix="output")

## Cleanup

While Forecast is in preview there are no charges for using it, but to future-proof this work, the cells below clean up your workspace.

In [None]:
# Delete Deployed Predictor 
forecast.delete_deployed_predictor(PredictorName=predictorName)

In [None]:
# Delete the Predictor: 
forecast.delete_predictor(PredictorName=predictorName)

In [None]:
# Delete Import
forecast.delete_dataset_import(DatasetName=datasetName)

In [None]:
# Delete Dataset Group
forecast.delete_dataset_group(DatasetGroupName=datasetGroupName)