# Multiple models comparison

This notebook runs three forecasting algorithms on the same dataset and compares their performance.

The algorithms are:
  - Prophet
  - ETS
  - DeepAR+
 

## Setup

In [None]:
import boto3
from time import sleep
import pandas as pd
import seaborn as sns
import pprint
pp = pprint.PrettyPrinter(indent=2)  # Better display for dictionaries

The line below restores the variables shared from the first notebook with `%store`, including `project` and `datasetGroupArn`, which are used further down.

In [None]:
%store -r

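The last part of the setup is to create the two boto3 clients used to talk to Amazon Forecast: `forecast` for managing resources and `forecastquery` for querying forecasts. If your credentials or region are misconfigured, the first API call made through these clients will fail.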

In [None]:
forecast = boto3.client(service_name='forecast')
forecastquery = boto3.client(service_name='forecastquery')
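
To confirm that the account can actually reach the service, one option is to issue a cheap read-only call. The sketch below assumes `list_dataset_groups` is an acceptable probe; the helper name `can_reach_forecast` is hypothetical and not part of the original notebook.

```python
def can_reach_forecast(client):
    """Return True if a cheap read-only call succeeds (hypothetical helper).

    `client` is expected to be a boto3 'forecast' client. Any exception
    (missing credentials, missing permissions, wrong region) counts as failure.
    """
    try:
        client.list_dataset_groups(MaxResults=1)
    except Exception as error:  # broad on purpose: this is only a smoke test
        print('Amazon Forecast is not reachable:', error)
        return False
    return True
```

With the clients created above, `can_reach_forecast(forecast)` should return `True` when credentials and region are set up correctly.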

## Create the predictors

The first step is to create a dictionary that stores useful information about each algorithm: its name, its ARN and, eventually, its performance metrics.

In [None]:
algos = ['Prophet', 'ETS', 'Deep_AR_Plus']

predictors = {a:{} for a in algos}

for p in predictors:
    predictors[p]['predictor_name'] = project + '_' + p + '_algo'
    predictors[p]['algorithm_arn'] = 'arn:aws:forecast:::algorithm/' + p

pp.pprint(predictors)
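
Predictor names are built from the `project` variable, so an unsuitable project name only shows up when `create_predictor` rejects it. The sketch below checks names up front. It assumes the naming constraint given in the CreatePredictor API reference (a leading letter followed by letters, digits or underscores, at most 63 characters); check the current documentation before relying on it. The helper `is_valid_predictor_name` and the project name `demo_project` are hypothetical.

```python
import re

# Assumed naming rule for Amazon Forecast predictors; verify against the API reference.
_NAME_PATTERN = re.compile(r'[a-zA-Z][a-zA-Z0-9_]*')
_MAX_NAME_LENGTH = 63

def is_valid_predictor_name(name):
    """Return True if `name` matches the assumed naming rule (hypothetical helper)."""
    return len(name) <= _MAX_NAME_LENGTH and _NAME_PATTERN.fullmatch(name) is not None

# Example with a made-up project name, mirroring how the names are built above
for algo in ['Prophet', 'ETS', 'Deep_AR_Plus']:
    name = 'demo_project' + '_' + algo + '_algo'
    print(name, is_valid_predictor_name(name))
```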

Here we also define the forecast horizon: the number of time points to predict into the future. The horizon is expressed in units of the data frequency, so for weekly data a value of 12 means 12 weeks. Our data is hourly and we want to forecast the next day, so we set it to 24.

In [None]:
forecastHorizon = 24
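
Because the horizon is counted in steps of the forecast frequency, it can be derived from the time span to cover rather than hard-coded. A small sketch using only the standard library; `horizon_steps` is a hypothetical helper, not part of the notebook.

```python
from datetime import timedelta

def horizon_steps(span, step):
    """Number of whole `step` intervals that fit in `span` (hypothetical helper)."""
    if step <= timedelta(0):
        raise ValueError('step must be positive')
    return span // step

# One day ahead at hourly frequency
print(horizon_steps(timedelta(days=1), timedelta(hours=1)))    # 24
# Twelve weeks ahead at weekly frequency
print(horizon_steps(timedelta(weeks=12), timedelta(weeks=1)))  # 12
```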

The following function creates a predictor from the given name, algorithm ARN and forecast horizon, with AutoML and hyperparameter optimization turned off. We will call it once for each of the three algorithms.

In [None]:
def create_predictor_response(pred_name, algo_arn, forecast_horizon):
    # Missing target values: no front filling, zeros for gaps in the middle and at the end
    filling = {"FeaturizationMethodName": "filling",
               "FeaturizationMethodParameters": {"frontfill": "none",
                                                 "middlefill": "zero",
                                                 "backfill": "zero"}}
    featurization_config = {"ForecastFrequency": "H",  # hourly data
                            "Featurizations": [{"AttributeName": "target_value",
                                                "FeaturizationPipeline": [filling]}]}
    response = forecast.create_predictor(
        PredictorName=pred_name,
        AlgorithmArn=algo_arn,
        ForecastHorizon=forecast_horizon,
        PerformAutoML=False,  # the algorithm is chosen explicitly
        PerformHPO=False,
        # Hold out the last 24 hours as a single backtest window
        EvaluationParameters={"NumberOfBacktestWindows": 1,
                              "BackTestWindowOffset": 24},
        InputDataConfig={"DatasetGroupArn": datasetGroupArn},
        FeaturizationConfig=featurization_config,
    )
    return response

For each of the three algorithms, we start the predictor creation and wait until it completes. We then store the accuracy metrics in our dictionary.

In [None]:
for p in predictors:
    predictor_response = create_predictor_response(predictors[p]['predictor_name'],
                                                   predictors[p]['algorithm_arn'],
                                                   forecastHorizon)
    predictorArn = predictor_response['PredictorArn']
    predictors[p]['predictor_arn'] = predictorArn  # save it, just for reference

    # wait for the predictor to be actually created
    print('------------------ Creating ' + p)
    while True:
        predictorStatus = forecast.describe_predictor(PredictorArn=predictorArn)['Status']
        print(predictorStatus)
        if predictorStatus in ('ACTIVE', 'CREATE_FAILED'):
            break
        sleep(30)

    # only request metrics for a predictor that was created successfully
    if predictorStatus != 'ACTIVE':
        print('Skipping metrics for ' + p + ': predictor creation failed')
        continue

    # compute and store performance metrics, then proceed with the next algorithm
    predictors[p]['accuracy'] = forecast.get_accuracy_metrics(PredictorArn=predictorArn)
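
The loop above keeps polling for as long as the predictor stays in an intermediate state. A minimal sketch of the same polling pattern with a timeout follows. The helper `wait_for_status` and its parameters are hypothetical; `get_status` stands for any zero-argument callable that returns the current status string.

```python
import time

def wait_for_status(get_status, done=('ACTIVE', 'CREATE_FAILED'),
                    interval=30, timeout=4 * 3600):
    """Poll `get_status()` until it returns a value in `done` (hypothetical helper).

    Returns the final status, or raises TimeoutError after `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        status = get_status()
        if status in done:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError('still ' + str(status) + ' after ' + str(timeout) + ' seconds')
        time.sleep(interval)
```

In this notebook it could be called as `wait_for_status(lambda: forecast.describe_predictor(PredictorArn=predictorArn)['Status'])`. The default timeout of four hours is an arbitrary choice and should be adapted to the dataset size.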

**TODO:** (Bar?)plot RMSE, 0.9-, 0.5- and 0.1-quantile LossValues for each algorithm

This is what we stored so far for DeepAR+:

In [None]:
pp.pprint(predictors['Deep_AR_Plus'])

## Visualize results

We use `seaborn` as it interacts well with `pandas` DataFrames.

Looping over our dictionary, we can retrieve the Root Mean Square Error (RMSE) for each predictor and plot it as a bar plot.

In [None]:
# Collect one row per predictor; DataFrame.append was removed in pandas 2.0
rows = []
for p in predictors:
    if 'accuracy' not in predictors[p]:
        continue  # no metrics stored for this predictor
    score = predictors[p]['accuracy']['PredictorEvaluationResults'][0]['TestWindows'][0]['Metrics']['RMSE']
    rows.append({'predictor': p, 'RMSE': score})
scores = pd.DataFrame(rows, columns=['predictor', 'RMSE'])

In [None]:
ax = sns.barplot(data=scores, x='predictor', y='RMSE')  # barplot returns a matplotlib Axes
ax.set_title('Root Mean Square Error')
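
The accuracy response also carries weighted quantile losses next to the RMSE. The sketch below flattens them into one row per predictor and quantile. It assumes each test window's `Metrics` contains a `WeightedQuantileLosses` list of entries with `Quantile` and `LossValue` keys, as described in the `get_accuracy_metrics` documentation. The helper `quantile_loss_rows` is hypothetical and the sample values are made up for illustration.

```python
def quantile_loss_rows(predictors):
    """Return one dict per (predictor, quantile) pair (hypothetical helper)."""
    rows = []
    for name, info in predictors.items():
        if 'accuracy' not in info:
            continue  # no metrics stored for this predictor
        window = info['accuracy']['PredictorEvaluationResults'][0]['TestWindows'][0]
        for entry in window['Metrics'].get('WeightedQuantileLosses', []):
            rows.append({'predictor': name,
                         'quantile': entry['Quantile'],
                         'loss': entry['LossValue']})
    return rows

# Made-up response fragment with the assumed shape, for illustration only
sample = {'ETS': {'accuracy': {'PredictorEvaluationResults': [{'TestWindows': [
    {'Metrics': {'RMSE': 1.0,
                 'WeightedQuantileLosses': [{'Quantile': 0.9, 'LossValue': 0.2},
                                            {'Quantile': 0.5, 'LossValue': 0.3}]}}]}]}}}
print(quantile_loss_rows(sample))
```

Passing the rows for the real `predictors` dictionary to `pd.DataFrame` and then to `sns.barplot` with `x='predictor'`, `y='loss'` and `hue='quantile'` would give a grouped bar plot of the losses.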