# Backtesting on Historical Data for Amazon Lookout for Metrics

Amazon Lookout for Metrics supports backtesting against your historical data, and this notebook demonstrates that functionality on the same dataset you just prepared. Once the backtesting job has completed, you can see all of the anomalies that Amazon Lookout for Metrics detected in the last 30% of your historical data. From there you can begin to unpack the kinds of results you will see from Amazon Lookout for Metrics when you start streaming in new data. **NOTE: YOU MUST CREATE A NEW DETECTOR TO LEVERAGE REAL-TIME DATA. BACKTESTING IS ONLY FOR EXPLORATION.**

This notebook assumes that you have already completed the prerequisites in `1.PrereqSetupPackages.ipynb` and `2.PrereqSetupData.ipynb`. If you have not, go back and complete those first.

## Initial Steps

First restore the variables from the previous notebook and then import the libraries needed.

In [None]:
%store -r

Just as in the last notebook, connect to AWS through the SDK.

In [None]:
import boto3
import utility

**IF THE CELL BELOW GENERATES AN ERROR:** This is totally normal; it may simply mean that your version of Boto3 is out of date. To correct this, go to the cell below the one that errored and run it to update to the latest version of Boto3 inside SageMaker. If you are not using a SageMaker Notebook, follow the upgrade instructions for your Python environment.

After running the upgrade cell, please click `Kernel` then `Restart Kernel` in the menu at the top. Once that has completed, start over at the top of this notebook again.

**DO NOT RUN THE UPGRADE CELL UNLESS YOU NEED TO**



In [None]:
L4M = boto3.client("lookoutmetrics")

In [None]:
# THIS IS OPTIONAL, DO NOT RUN IT IF YOU DO NOT NEED TO 
!pip install --upgrade boto3

## Creating A Detector

Now that the basic external resources are ready, it is time to get started with Amazon Lookout for Metrics, which begins with creating a `Detector`.

### Detectors

To detect outliers, Amazon Lookout for Metrics builds a machine learning model that is trained with your source data. This model, called a `Detector`, is automatically trained with the machine learning algorithm that best fits your data and use case. You can either provide your historical data for training, if you have any, or get started with real-time data, and Amazon Lookout for Metrics will learn on-the-go. In this example for `Backtesting` you will only be providing historical data.

Here you will specify the S3 location of the historical data. If you were creating a `Continuous` detector, you would instead specify the Amazon S3 location that Amazon Lookout for Metrics should continuously monitor for new data. When you create a `Detector`, you also specify a `detecting domain` and an `anomaly detection frequency`.

The `anomaly detection frequency` specifies how often the detector should wake up, look for new data, run its analysis, and alert you to any interesting findings.
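
To make the frequency codes more concrete, here is a minimal sketch that maps the four codes used in this notebook to Python `timedelta` values and counts how many detection intervals fit in one day. The dictionary and the helper name `intervals_per_day` are illustrative only and are not part of the service or the SDK.

```python
from datetime import timedelta

# The four frequency codes listed in this notebook, read as ISO 8601 durations.
FREQUENCY_TO_INTERVAL = {
    "P1D": timedelta(days=1),
    "PT1H": timedelta(hours=1),
    "PT10M": timedelta(minutes=10),
    "PT5M": timedelta(minutes=5),
}


def intervals_per_day(frequency_code):
    """Return how many detection intervals fit in one day (hypothetical helper)."""
    interval = FREQUENCY_TO_INTERVAL[frequency_code]
    return int(timedelta(days=1) / interval)


for code in FREQUENCY_TO_INTERVAL:
    print(code, intervals_per_day(code))
```

With `PT1H`, the value used in this notebook, the detector would look at 24 intervals per day.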


In [None]:
project = "initial-lookoutmetrics-backtesting-test"

frequency = "PT1H" # one of 'P1D', 'PT1H', 'PT10M' and 'PT5M', this one means every one hour

In [None]:
response = L4M.create_anomaly_detector( 
    AnomalyDetectorName = project + "-detector",
    AnomalyDetectorDescription = "My Detector",
    AnomalyDetectorConfig = {
        # Reuse the frequency defined above so the detector and metric set stay in sync
        "AnomalyDetectorFrequency" : frequency,
    },
)

anomaly_detector_arn = response["AnomalyDetectorArn"]
anomaly_detector_arn

## Define Metrics

### Measures and Dimensions

`Measures` are variables or key performance indicators on which customers want to detect outliers, and `Dimensions` are metadata that represent categorical information about the measures.

In this e-commerce example, views and revenue are our measures, and platform and marketplace are our dimensions. Customers may want to monitor their data for anomalies in the number of views or in revenue for every platform, every marketplace, and every combination of both. You can designate up to five measures and five dimensions per dataset.

### Metrics 


After you create a detector and map your measures and dimensions, Amazon Lookout for Metrics will analyze each combination of these measures and dimensions. For the above example, you have 7 unique values (us, jp, de, etc.) for marketplace and 3 unique values (mobile web, mobile app, pc web) for platform, for a total of 21 unique combinations. Each unique combination of a measure with the dimension values (e.g. us/mobile app/revenue) is a time-series `metric`. In this case, you have 21 dimension-value combinations and 2 measures, for a total of 42 time-series `metrics`.
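
The arithmetic above can be reproduced with a short sketch. Only `us`, `jp` and `de` are named in this notebook, so the other four marketplace labels below are hypothetical placeholders used purely to reach seven values.

```python
from itertools import product

# "us", "jp" and "de" come from the text; "mkt4" to "mkt7" are placeholders (assumption).
marketplaces = ["us", "jp", "de", "mkt4", "mkt5", "mkt6", "mkt7"]
platforms = ["mobile web", "mobile app", "pc web"]
measures = ["views", "revenue"]

# Every marketplace/platform pair is one dimension-value combination.
dimension_combinations = list(product(marketplaces, platforms))

# Pairing each combination with each measure gives one time-series metric.
metrics = [
    f"{marketplace}/{platform}/{measure}"
    for (marketplace, platform), measure in product(dimension_combinations, measures)
]

print(len(dimension_combinations), len(metrics))
```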

Amazon Lookout for Metrics detects anomalies at the most granular level so you are able to pin-point any unexpected behavior in your data.

### Datasets

Measures, dimensions and metrics map to `datasets`, which also contain the Amazon S3 locations of your source data, an IAM role that has both read and write permissions to those Amazon S3 locations, and the rate at which data should be ingested from the source location (the upload frequency and data ingestion delay).


Now, create a metric set for our detector that points to the backtest data in S3.

First, the cell below will create a backtesting path for S3 which is then passed to our arguments and then the API.

In [None]:
s3_path_backtest = 's3://'+ s3_bucket + '/ecommerce/backtest/'
s3_path_backtest

In [None]:
params = {
    "AnomalyDetectorArn": anomaly_detector_arn,
    "MetricSetName" : project + '-metric-set-1',
    "MetricList" : [
        {
            "MetricName" : "views",
            "AggregationFunction" : "SUM",
        },
        {
            "MetricName" : "revenue",
            "AggregationFunction" : "SUM",
        },
    ],

    "DimensionList" : [ "platform", "marketplace" ],

    "TimestampColumn" : {
        "ColumnName" : "timestamp",
        "ColumnFormat" : "yyyy-MM-dd HH:mm:ss",
    },

    #"Delay" : 120, # seconds the detector will wait before attempting to read latest data per current time and detection frequency below
    "MetricSetFrequency" : frequency,

    "MetricSource" : {
        "S3SourceConfig": {
            "RoleArn" : role_arn,
            "HistoricalDataPathList": [
                s3_path_backtest,
            ],
#            "TemplatedPathList": [
#                s3_path_format,
#            ],

            "FileFormatDescriptor" : {
                "CsvFormatDescriptor" : {
                    "FileCompression" : "NONE",
                    "Charset" : "UTF-8",
                    "ContainsHeader" : True,
                    "Delimiter" : ",",
#                    "HeaderList" : [
#                        "platform",
#                        "marketplace",
#                        "timestamp",
#                        "views",
#                        "revenue"
#                    ],
                    "QuoteSymbol" : '"'
                },
            }
        }
    },
}

params
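
Before creating the metric set, it can be worth checking that the timestamps in your data match the `ColumnFormat` pattern above. The sketch below assumes that `yyyy-MM-dd HH:mm:ss` corresponds to the Python `strptime` pattern `%Y-%m-%d %H:%M:%S`. The helper name and the sample values are hypothetical and are not taken from the dataset.

```python
from datetime import datetime

# Assumed Python equivalent of the "yyyy-MM-dd HH:mm:ss" ColumnFormat pattern.
PYTHON_TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"


def matches_column_format(value):
    """Return True if the value parses with the assumed timestamp format."""
    try:
        datetime.strptime(value, PYTHON_TIMESTAMP_FORMAT)
        return True
    except ValueError:
        return False


# Hypothetical sample values for illustration only.
print(matches_column_format("2021-01-01 13:00:00"))
print(matches_column_format("01/01/2021 1pm"))
```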

The cell below will take those arguments and create our `MetricSet` so we are ready to activate in the next step.

In [None]:
response = L4M.create_metric_set(**params)

metric_set_arn = response["MetricSetArn"]
metric_set_arn

## Activate the Detector and Execute Backtesting

Now that the `MetricSet` has been specified, you are ready to start backtesting. You do that by activating the backtest anomaly detector. The backtesting process can take 25 minutes or so, so feel free to take a break, grab a snack, and catch up on any articles you have saved. Note that when the status is `BACK_TEST_ACTIVE`, the service has trained a model and is now evaluating the holdout period.

In [None]:
L4M.back_test_anomaly_detector(AnomalyDetectorArn = anomaly_detector_arn)

Note that the cell below will first report that the detector is activating, then that the backtesting job is active. This merely means that the inference process is running; give it more time, until the cell fully completes, before looking for anomalies.

In [None]:
utility.wait_anomaly_detector( L4M, anomaly_detector_arn )
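
The internals of `utility.wait_anomaly_detector` are not shown in this notebook. As a rough illustration of what such a wait loop might look like, here is a minimal sketch that polls the Boto3 `describe_anomaly_detector` call until the detector reports `BACK_TEST_COMPLETE` or `FAILED`. The function name, polling interval, and poll limit are assumptions, not the helper's actual implementation.

```python
import time


def wait_for_backtest(client, detector_arn, poll_seconds=30, max_polls=120):
    """Poll the detector status until backtesting completes or fails (hypothetical helper)."""
    for _ in range(max_polls):
        response = client.describe_anomaly_detector(AnomalyDetectorArn=detector_arn)
        status = response["Status"]
        print("Detector status:", status)
        if status == "BACK_TEST_COMPLETE":
            return status
        if status == "FAILED":
            raise RuntimeError("Backtesting failed for " + detector_arn)
        # Any other status (for example BACK_TEST_ACTIVE) means the job is still running.
        time.sleep(poll_seconds)
    raise TimeoutError("Gave up waiting for " + detector_arn)
```

In this notebook it would be called as `wait_for_backtest(L4M, anomaly_detector_arn)`.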

## Validate results

After backtesting finishes, you can validate the historical anomalies visually in the console or inspect the results by running the commands below. We recommend starting your exploration in the console: it will be your tool for viewing and understanding alerts in online mode later, so this is a good way to get familiar with the process.

In [None]:
anomaly_groups = []
next_token = None
first_response = None

while True:
    params = {
        "AnomalyDetectorArn" : anomaly_detector_arn,
        "SensitivityThreshold" : 50,
        "MaxResults" : 100,
    }
    
    if next_token:
        params["NextToken"] = next_token
    
    response = L4M.list_anomaly_group_summaries( **params )
    if first_response is None:
        first_response = response
    
    anomaly_groups += response["AnomalyGroupSummaryList"]
    
    if "NextToken" in response:
        next_token = response["NextToken"]
        continue
    break

first_response

To dive even deeper into a specific anomaly group, simply choose your anomaly group of interest and drill down to its time series. Here you will use the first anomaly group in the list.
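
A minimal sketch of that drill-down, assuming the Boto3 `list_anomaly_group_time_series` call and the `AnomalyGroupId` and `PrimaryMetricName` fields returned in each anomaly group summary. The helper name `first_group_time_series` and the `MaxResults` value are illustrative choices.

```python
def first_group_time_series(client, detector_arn, groups, max_results=10):
    """Fetch the time series behind the first anomaly group (hypothetical helper)."""
    if not groups:
        # Nothing was detected at the chosen sensitivity threshold.
        return None
    group = groups[0]
    return client.list_anomaly_group_time_series(
        AnomalyDetectorArn=detector_arn,
        AnomalyGroupId=group["AnomalyGroupId"],
        MetricName=group["PrimaryMetricName"],
        MaxResults=max_results,
    )
```

In this notebook it would be called as `first_group_time_series(L4M, anomaly_detector_arn, anomaly_groups)`.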


## Exporting Results

To export the results, open `4.ExportingAnomalies.ipynb`, which will guide you through creating a CSV file of all the anomalies found.

## Clean up resources 

Once you have completed backtesting, you can start to clean up the resources that were created. Before cleaning up, you can visit the "Anomalies" page of the Amazon Lookout for Metrics console and visually check the detected anomalies.

Note that the cell below will erase all the resources that have been created, so wait to run it until you are sure you wish to delete everything.

**NOTE: IF YOU DELETE THE ROLE BELOW, YOU NEED TO CREATE IT AGAIN BEFORE YOU CAN BUILD THE CONTINUOUS DETECTOR**

In [None]:
answer = input("Delete resources? (y/n)")
# Only an explicit "y" (in any case) triggers deletion; anything else keeps the resources.
delete_resources = answer.strip().lower() == "y"

if delete_resources:
    # Delete the detector, wait for the deletion to finish, then remove the IAM role.
    L4M.delete_anomaly_detector( AnomalyDetectorArn = anomaly_detector_arn )
    utility.wait_delete_anomaly_detector( L4M, anomaly_detector_arn )
    utility.delete_iam_role(role_name)
else:
    print("Not deleting resources.")