# How to use Amazon Forecast

Although you can use Amazon Forecast by interacting with the API directly, we created this notebook to help users get started quickly. We picked a typical time series forecasting scenario, so stepping through this notebook will show you how to set up access, import your data, create predictors, and, finally, use your predictors.

### Prerequisites
You will need the [AWS CLI](https://docs.aws.amazon.com/cli/latest/userguide/installing.html) to set up access to the service and interact with the API. For more information about the APIs, please check the [documentation](https://docs.aws.amazon.com/forecast/latest/dg/what-is-forecast.html).

### Table of contents
* [Setup](#setup)
* [Test Setup](#hello)
* [Forecasting with Amazon Forecast](#forecastingExample)

## Set up Preview Access<a class="anchor" id="setup"></a>

Amazon Forecast is not generally available yet.  To use the service during the preview, we need to configure the CLI (once for every CLI instance) so it knows about the service and set up IAM Policies in the AWS Account (once for every AWS Account) so Forecast can use data and deploy predictors in your account.

### Configure CLI
Remember that you need to do this once for each CLI instance you would like to use to interact with ForecastQuery and Forecast, the two APIs provided by Amazon Forecast.

First, confirm that your AWS CLI is configured to interact with your AWS Account as expected using:
```
aws ec2 describe-regions
```

Next, use the following commands to "teach" the CLI about the two new APIs (remember to specify the correct location of the files you obtained from the GitHub repo where you got this notebook).
```
aws configure add-model --service-model file://../sdk/forecastquery-2018-06-26.normal.json --service-name forecastquery
aws configure add-model --service-model file://../sdk/forecast-2018-06-26.normal.json --service-name forecast

```

To confirm that this worked as expected, try getting help for the `list-recipes` command.
```
aws forecast list-recipes help
```

### Configure AWS Account (IAM setup)
During the preview period, you'll need to do this once for every AWS Account where you intend to use Amazon Forecast.

First, please create an Amazon S3 bucket in `us-west-2` that you intend to use for the rest of this exercise.
Remember that you'll need to find a unique name for your bucket.
```
aws s3 mb s3://my-amazon-forecast-labs --region us-west-2
```
*And yes, you can reuse an existing bucket, but you will need to tweak the IAM policies to use it.*
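S3 rejects bucket names that break its naming rules, so a quick local check can save a round trip. The sketch below is a hypothetical helper (`looks_like_valid_bucket_name` is not part of any AWS tool) that covers the main rules: 3 to 63 characters; only lowercase letters, digits, dots, and hyphens; starting and ending with a letter or digit; no adjacent dots; and not shaped like an IP address. It cannot check uniqueness, since only S3 can tell you whether a name is already taken.

```python
import re

# Basic shape: 3-63 chars of [a-z0-9.-], first and last char a letter or digit.
_BUCKET_RE = re.compile(r"^[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]$")
# Names formatted like an IPv4 address are not allowed.
_IPV4_RE = re.compile(r"^\d{1,3}(\.\d{1,3}){3}$")


def looks_like_valid_bucket_name(name: str) -> bool:
    """Hypothetical helper: basic local sanity check of an S3 bucket name."""
    if not _BUCKET_RE.match(name):
        return False
    if _IPV4_RE.match(name):
        return False
    # Adjacent dots are not allowed.
    if ".." in name:
        return False
    return True


print(looks_like_valid_bucket_name("my-amazon-forecast-labs"))  # True
print(looks_like_valid_bucket_name("My_Bucket"))                # False
```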


Finally, use the `setup_forecast_permissions.py` utility (also available at the GitHub repo where you got this notebook) to set up the relevant IAM policies.
```
# remember to use your bucket name here
python3 ./setup_forecast_permissions.py my-amazon-forecast-labs
```

When this is done, you should see that the relevant IAM user or role will have a `ForecastUserPolicy` and a `PassRoleToForecastPolicy` attached.

## Test Setup <a class="anchor" id="hello"></a>
Let's say "Hi" to Amazon Forecast and use the simple ListRecipes API method. This method returns a list of the global recipes that you can use as a part of your forecasting solution.

```python
# Prerequisites: one-time install only
# !pip install boto3
# !pip install pandas
```

```python
import boto3
from time import sleep

# remember to use your bucket name here
bucketName = 'my-amazon-forecast-labs'
```

```python
session = boto3.Session(region_name='us-west-2')  # us-east-1 is also supported

forecast = session.client(service_name='forecast')
forecastquery = session.client(service_name='forecastquery')

# look up the account ID through the same session as the other clients
accountId = session.client('sts').get_caller_identity().get('Account')
# this role should have been created in the IAM setup step above
roleArn = 'arn:aws:iam::%s:role/amazonforecast' % accountId
print('IAM role for Amazon Forecast:  %s' % roleArn)
```

```python
recipeList = forecast.list_recipes()
recipeList['RecipeNames']
```

*If this ran successfully, kudos! If `list_recipes` returned any errors, please contact us at the [AWS support forum](https://forums.aws.amazon.com/forum.jspa?forumID=327).*

Check out the documentation for more information about these recipes.  For this notebook, we'll be using the [Multi-Quantile Recurrent Neural Network (MQRNN)](https://docs.aws.amazon.com/forecast/latest/dg/aws-forecast-recipe-mqrnn.html) recipe to forecast demand with a one-dimensional time series dataset.


## Forecasting with Amazon Forecast<a class="anchor" id="forecastingExample"></a>
### Preparing your Data

With Amazon Forecast, a dataset is a collection of one or more files containing data relevant to a forecasting task. A dataset must conform to the schema provided by Amazon Forecast.

For this exercise, we use the individual household electric power consumption dataset. (Dua, D. and Karra Taniskidou, E. (2017). UCI Machine Learning Repository [http://archive.ics.uci.edu/ml]. Irvine, CA: University of California, School of Information and Computer Science.) 

We aggregate the usage data hourly. 
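The notebook doesn't show the aggregation step itself. As a minimal sketch, assuming raw readings arrive as a timestamp column plus a usage column (hypothetical names and made-up values, not the real dataset), pandas can roll them up into hourly totals with `resample`:

```python
import pandas as pd

# Hypothetical sub-hourly readings for a single meter.
raw = pd.DataFrame(
    {
        "timestamp": pd.to_datetime(
            ["2014-01-01 00:00:00", "2014-01-01 00:30:00", "2014-01-01 01:15:00"]
        ),
        "usage": [1.0, 2.0, 3.0],
    }
)

# Sum every reading that falls into the same clock hour.
hourly = raw.set_index("timestamp")["usage"].resample("60min").sum()
print(hourly)
```

Summing suits consumption-style data; if your readings are instantaneous levels rather than amounts, `.mean()` may be the better aggregate. With several series, group by the item column before resampling.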

### Get the data

Amazon Forecast can import data from Amazon S3. We first explore the data locally to see the fields.

```python
# for example, do this once to get the dataset
# !wget "https://raw.githubusercontent.com/aws-samples/amazon-forecast-samples/master/data/item-demand-time.csv"
```

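```python
import pandas as pd

# the file has no header row, so name the columns to match the schema used below
df = pd.read_csv("item-demand-time.csv", dtype=object, header=None,
                 names=["timestamp", "target_value", "item_id"])
df.head(3)
```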

Now upload the data to S3.

```python
s3 = session.client('s3')
accountId = boto3.client('sts').get_caller_identity().get('Account')
```

```python
# bucketName = 'amazon-forecast-%s-data'%accountId
# remember that we've already set bucketName above
```

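```python
# upload the dataset to S3 (set file location to where you have this file saved);
# the key must sit under the s3DataPath prefix that the import job reads from
s3.upload_file(Filename="../data/item-demand-time.csv", Bucket=bucketName,
               Key="private/labs/forecast/elec_data/item-demand-time.csv")
```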
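The import job reads whatever lives under the `S3Uri` prefix it is given, so the key passed to `upload_file` has to sit under that prefix. A small check like the one below (hypothetical helpers using only the standard library) can catch a mismatch before you start an import:

```python
from typing import Tuple
from urllib.parse import urlparse


def split_s3_uri(uri: str) -> Tuple[str, str]:
    """Split 's3://bucket/some/prefix' into ('bucket', 'some/prefix')."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an S3 URI: {uri!r}")
    return parsed.netloc, parsed.path.lstrip("/")


def key_is_under_uri(bucket: str, key: str, uri: str) -> bool:
    """True if s3://bucket/key lies under the folder-like prefix `uri`."""
    uri_bucket, prefix = split_s3_uri(uri)
    if bucket != uri_bucket:
        return False
    prefix = prefix.rstrip("/")
    # Require a "/" boundary so "elec_data_old/..." is not under "elec_data".
    return prefix == "" or key == prefix or key.startswith(prefix + "/")


bucket = "my-amazon-forecast-labs"
key = "elec_data/item-demand-time.csv"
print(key_is_under_uri(bucket, key, "s3://my-amazon-forecast-labs/elec_data"))  # True
print(key_is_under_uri(bucket, key, "s3://my-amazon-forecast-labs/private/labs/forecast/elec_data"))  # False
```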

### CreateDataset

For more on `Domain` and dataset types, check out the [documentation](https://docs.aws.amazon.com/forecast/latest/dg/howitworks-domains-ds-types.html). For this example, we are using the [CUSTOM](https://docs.aws.amazon.com/forecast/latest/dg/custom-domain.html) domain with three required attributes: `timestamp`, `target_value`, and `item_id`.

```python
DATASET_FREQUENCY = "H"
TIMESTAMP_FORMAT = "yyyy-MM-dd hh:mm:ss"
```

```python
project = 'workshop'
datasetName = project + '_ds'
datasetGroupName = project + '_gp'
s3DataPath = "s3://" + bucketName + "/private/labs/forecast/elec_data"
```

```python
datasetName
```

```python
# Specify the schema of your dataset here. Make sure the order of columns matches the raw data files.
schema = {
    "Attributes": [
        {
            "AttributeName": "timestamp",
            "AttributeType": "timestamp"
        },
        {
            "AttributeName": "target_value",
            "AttributeType": "float"
        },
        {
            "AttributeName": "item_id",
            "AttributeType": "string"
        }
    ]
}

response = forecast.create_dataset(
    Domain="CUSTOM",
    DatasetType='TARGET_TIME_SERIES',
    DataFormat='CSV',
    DatasetName=datasetName,
    DataFrequency=DATASET_FREQUENCY,
    TimeStampFormat=TIMESTAMP_FORMAT,
    Schema=schema
)
```
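Because the schema must match the raw files column for column, it can be worth validating a sample of rows locally before importing. This is a sketch only: `check_value` and `bad_rows` are hypothetical helpers, the sample rows are made up, and it assumes timestamps use a 24-hour clock (Python's `%H`), which you should confirm against your data and the `TIMESTAMP_FORMAT` you pass to the service.

```python
import csv
import io
from datetime import datetime

# Same structure as the `schema` dict above, repeated so this sketch runs on its own.
SCHEMA = {
    "Attributes": [
        {"AttributeName": "timestamp", "AttributeType": "timestamp"},
        {"AttributeName": "target_value", "AttributeType": "float"},
        {"AttributeName": "item_id", "AttributeType": "string"},
    ]
}

# Assumption: timestamps look like "2014-01-01 13:00:00" (24-hour clock).
PY_TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"


def check_value(value: str, attr_type: str) -> bool:
    """Return True if `value` can be parsed as `attr_type`."""
    try:
        if attr_type == "timestamp":
            datetime.strptime(value, PY_TIMESTAMP_FORMAT)
        elif attr_type == "float":
            float(value)
        elif attr_type == "string":
            return value != ""
        else:
            return False  # attribute type this sketch does not handle
    except ValueError:
        return False
    return True


def bad_rows(csv_text: str, schema: dict) -> list:
    """Return (line_number, row) pairs whose column count or types don't match the schema."""
    types = [a["AttributeType"] for a in schema["Attributes"]]
    problems = []
    for lineno, row in enumerate(csv.reader(io.StringIO(csv_text)), start=1):
        if len(row) != len(types) or not all(
            check_value(v, t) for v, t in zip(row, types)
        ):
            problems.append((lineno, row))
    return problems


sample = "2014-01-01 01:00:00,38.35,client_12\n2014-01-01 02:00:00,oops,client_12\n"
print(bad_rows(sample, SCHEMA))  # [(2, ['2014-01-01 02:00:00', 'oops', 'client_12'])]
```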

```python
# did it work?
forecast.describe_dataset(DatasetName=datasetName)
```

Next, add the new Dataset to a Dataset Group so we can use it to train a predictor.

```python
forecast.create_dataset_group(DatasetGroupName=datasetGroupName, RoleArn=roleArn, DatasetNames=[datasetName])
```

If you have an existing Dataset Group, you can also update it.

```python
# did it work?
forecast.describe_dataset_group(DatasetGroupName=datasetGroupName)
```

### Create Import Job
Next, let's create an import job to make the data available to Amazon Forecast for training.

```python
ds_import_job_response = forecast.create_dataset_import_job(
    DatasetName=datasetName, Delimiter=',', DatasetGroupName=datasetGroupName, S3Uri=s3DataPath)
```

```python
ds_versionId = ds_import_job_response['VersionId']
print(ds_versionId)
```

Check the status of the dataset import job; we can continue once the status changes from **CREATING** to **ACTIVE**. Depending on the data size, this can take a few minutes.

```python
while True:
    dataImportStatus = forecast.describe_dataset_import_job(DatasetName=datasetName, VersionId=ds_versionId)['Status']
    print('.', end='')
    if dataImportStatus != 'ACTIVE' and dataImportStatus != 'FAILED':
        sleep(10)
    else:
        break
print('')
print(dataImportStatus)
```
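This poll-until-done loop appears three times in the notebook (dataset import, predictor training, and deployment). A reusable version with a timeout might look like the sketch below. `wait_for_status` is a hypothetical helper, the default timeout is an arbitrary choice, and `sleep` is a parameter only so the function can be exercised offline.

```python
import time
from typing import Callable, Iterable


def wait_for_status(
    get_status: Callable[[], str],
    done: Iterable[str] = ("ACTIVE", "FAILED"),
    poll_seconds: float = 10,
    timeout_seconds: float = 3 * 60 * 60,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call get_status() until it returns a value in `done`, then return that value.

    Raises TimeoutError once roughly timeout_seconds of sleeping has passed.
    """
    terminal = set(done)
    waited = 0.0
    while True:
        status = get_status()
        if status in terminal:
            return status
        if waited >= timeout_seconds:
            raise TimeoutError(f"still {status!r} after {waited:.0f}s")
        print(".", end="", flush=True)
        sleep(poll_seconds)
        waited += poll_seconds


# In the notebook it would wrap the describe call, for example:
#   wait_for_status(lambda: forecast.describe_dataset_import_job(
#       DatasetName=datasetName, VersionId=ds_versionId)['Status'])

# Offline demo with a fake status sequence instead of a real API call.
fake = iter(["CREATING", "CREATING", "ACTIVE"])
print(wait_for_status(lambda: next(fake), sleep=lambda s: None))
```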

```python
# did it work?
forecast.describe_dataset_import_job(DatasetName=datasetName, VersionId=ds_versionId)
```

### Choose your recipe
We will use the MQRNN recipe.

```python
recipesResponse = forecast.list_recipes()
recipesResponse['RecipeNames']
```

To get more information about a recipe, call `describe_recipe`:

```python
forecast.describe_recipe(RecipeName='forecast_MQRNN')
```

### Create a solution with a custom forecast horizon

The MQRNN recipe lets us choose how many intervals into the future to predict by setting the forecast horizon. For weekly data, a value of 1 means one week. Our example uses hourly data and we want to forecast the next day, so we set the forecast horizon to `24`.
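The horizon is just the target duration divided by the length of one interval. A minimal sketch of that arithmetic, where `horizon_for` is a hypothetical helper and the frequency-to-length table is an assumption covering only the codes shown here:

```python
from datetime import timedelta

# Assumed mapping from frequency codes (e.g. DATASET_FREQUENCY = "H") to interval length.
INTERVAL_LENGTH = {
    "H": timedelta(hours=1),
    "D": timedelta(days=1),
    "W": timedelta(weeks=1),
}


def horizon_for(duration: timedelta, frequency: str) -> int:
    """Number of whole `frequency` intervals needed to cover `duration`."""
    step = INTERVAL_LENGTH[frequency]
    intervals, remainder = divmod(duration, step)
    if remainder:
        raise ValueError(f"{duration} is not a whole number of {frequency!r} intervals")
    return intervals


print(horizon_for(timedelta(days=1), "H"))   # 24
print(horizon_for(timedelta(weeks=1), "W"))  # 1
```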

```python
predictorName = project + '_mqrnn'
forecastHorizon = 24
createPredictorResponse = forecast.create_predictor(
    RecipeName='forecast_MQRNN',
    DatasetGroupName=datasetGroupName,
    PredictorName=predictorName,
    ForecastHorizon=forecastHorizon)
```

```python
predictorVersionId = createPredictorResponse['VersionId']
```

```python
forecast.list_predictor_versions(PredictorName=predictorName)['PredictorVersions']
```

Check the status of the predictor; when the status changes from **CREATING** to **ACTIVE**, we can continue to the next step. Depending on data size, model selection, and hyperparameters, it can take from 10 minutes to more than an hour to become **ACTIVE**.

```python
while True:
    predictorStatus = forecast.describe_predictor(PredictorName=predictorName, VersionId=predictorVersionId)['Status']
    print('.', end='')
    if predictorStatus != 'ACTIVE' and predictorStatus != 'FAILED':
        sleep(10)
    else:
        break
print('')
print(predictorStatus)
```

### Get Error Metrics
Let's take a look at how well we can expect our predictor to perform.

```python
# metrics show the error loss data for each quantile
forecastquery.get_accuracy_metrics(PredictorName=predictorName)
```
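The notebook doesn't spell out how the per-quantile loss is computed, and the exact metric returned by this preview API may differ. For intuition, here is a sketch of one common definition, the weighted quantile (pinball) loss: at quantile τ, under-predictions are penalized with weight τ, over-predictions with weight 1 - τ, and the sum is scaled by 2 and normalized by the total absolute actual value. The function name and the sample numbers are illustrative only.

```python
from typing import Sequence


def weighted_quantile_loss(actual: Sequence[float], predicted: Sequence[float], tau: float) -> float:
    """Weighted quantile loss at quantile `tau` (0 < tau < 1)."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    total = 0.0
    for y, q in zip(actual, predicted):
        # Under-prediction (y > q) costs tau per unit; over-prediction costs (1 - tau).
        total += tau * max(y - q, 0.0) + (1 - tau) * max(q - y, 0.0)
    return 2 * total / sum(abs(y) for y in actual)


# p50 penalizes misses symmetrically; p90 punishes under-prediction much more.
print(weighted_quantile_loss([10, 20], [8, 25], tau=0.5))
print(weighted_quantile_loss([10, 20], [8, 25], tau=0.9))
```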

### Deploy Predictor

```python
forecast.deploy_predictor(PredictorName=predictorName)
```

This can take anywhere from 10 minutes to an hour; the predictor will be ready for use when the status becomes **ACTIVE**.

```python
deployedPredictorsResponse = forecast.list_deployed_predictors()
```

```python
while True:
    deployedPredictorStatus = forecast.describe_deployed_predictor(PredictorName=predictorName)['Status']
    print('.', end='')
    if deployedPredictorStatus != 'ACTIVE' and deployedPredictorStatus != 'FAILED':
        sleep(10)
    else:
        break
print('')
print(deployedPredictorStatus)
```

### Get Forecast

When the predictor is deployed and forecast results are ready, you can view them.

```python
forecastResponse = forecastquery.get_forecast(
    PredictorName=predictorName,
    Interval="hour",
    Filters={"item_id": "client_12"}
)

predictions = forecastResponse['Forecast']['Predictions']

# name each value column after the series it actually holds;
# pd.merge joins on the columns the frames share (the timestamp)
df = pd.merge(
    left=pd.DataFrame(predictions['mean']).rename(columns={'Val': 'mean'}),
    right=pd.DataFrame(predictions['p10']).rename(columns={'Val': 'p10'})
)
df = pd.merge(
    left=df,
    right=pd.DataFrame(predictions['p90']).rename(columns={'Val': 'p90'})
)
df.tail(3)
```
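Chaining `pd.merge` calls and renaming `Val` by hand makes it easy to label a column with the wrong quantile. An alternative sketch builds the frame in one pass, taking each column name from its prediction key. It assumes each prediction entry is a dict with a `Val` field (as used above) plus a timestamp field; `predictions_to_frame` is a hypothetical helper, and `Timestamp` is a hypothetical default key name, so pass whatever key your response actually uses.

```python
import pandas as pd


def predictions_to_frame(predictions: dict, time_key: str = "Timestamp") -> pd.DataFrame:
    """Turn {series_name: [{time_key: ..., "Val": ...}, ...]} into one frame,
    indexed by time, with one column per series."""
    columns = {}
    for name, points in predictions.items():
        columns[name] = pd.DataFrame(points).set_index(time_key)["Val"]
    # Series are aligned on the time index; column names come straight from the keys.
    return pd.DataFrame(columns).sort_index()


# Made-up sample shaped like the assumed response structure.
sample = {
    "p10": [{"Timestamp": "2014-11-01T00:00:00", "Val": 1.0},
            {"Timestamp": "2014-11-01T01:00:00", "Val": 2.0}],
    "p90": [{"Timestamp": "2014-11-01T00:00:00", "Val": 3.0},
            {"Timestamp": "2014-11-01T01:00:00", "Val": 4.0}],
}
print(predictions_to_frame(sample))
```

Against the live response, `predictions_to_frame(forecastResponse['Forecast']['Predictions'], time_key=...)` should work, provided every value under `Predictions` is a list of such points.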

Let's plot the forecast:

```python
df.plot(title="Forecast", sharey=True, figsize=(18, 9), xticks=df.index)
```

### Export Forecast

You can batch-export the forecast to an S3 bucket. To do so, you need a role with S3 put (write) access.

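```python
# forecastId is not defined earlier in this notebook: set it to the ID of the forecast you want to export
forecast.create_forecast_export_job(ForecastId=forecastId, OutputPath={"S3Uri": s3DataPath, "RoleArn": roleArn})
```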