# Exporting Anomalies

Whether you have chosen a backtesting project or a continuous Detector, you may want to export its findings into a Pandas dataframe or a CSV file for use in other tools or systems. This notebook walks you through connecting to a Detector, querying for anomalies, and writing them to a file for later use. Set `Detector_ARN` to the ARN of a Detector in your account, then run the cells below in order.

## Imports and Setup

In [None]:
import datetime

import boto3
import pandas as pd

In [None]:
lookoutmetrics_client = boto3.client("lookoutmetrics")

In [None]:
# Paste the ARN of your Detector between the quotes
Detector_ARN = ""

Once you have specified your Detector's ARN, the next step is to look up the frequency it uses. The export relies on this frequency to expand each anomaly group into one row per time interval, so the cell below reads it from the Detector's configuration.

In [None]:
response = lookoutmetrics_client.describe_anomaly_detector(AnomalyDetectorArn=Detector_ARN)
anomaly_detector_frequency = response["AnomalyDetectorConfig"]["AnomalyDetectorFrequency"]

# Map the detector's ISO 8601 duration to a pandas offset alias and a timedelta step
FREQUENCIES = {
    "PT5M": ("5min", datetime.timedelta(minutes=5)),
    "PT10M": ("10min", datetime.timedelta(minutes=10)),
    "PT1H": ("1h", datetime.timedelta(hours=1)),
    "P1D": ("1D", datetime.timedelta(days=1)),
}

# Fail loudly (unlike assert, this is not skipped when Python runs with -O)
if anomaly_detector_frequency not in FREQUENCIES:
    raise ValueError("unknown frequency: " + anomaly_detector_frequency)

frequency, frequency_timedelta = FREQUENCIES[anomaly_detector_frequency]
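The `frequency` string computed above is a pandas offset alias, so the timestamps covered by one anomaly group can also be generated with `pd.date_range` rather than a manual loop. A minimal sketch, with a made-up one-hour window standing in for a real anomaly group, showing that both approaches yield the same timestamps:

```python
import datetime

import pandas as pd

# Hypothetical anomaly window; in the notebook these come from an anomaly group summary
start_time = datetime.datetime(2021, 6, 1, 0, 0)
end_time = datetime.datetime(2021, 6, 1, 1, 0)
frequency = "10min"  # pandas alias for a 10-minute step
frequency_timedelta = datetime.timedelta(minutes=10)

# Manual loop: step from start to end, inclusive of end_time
manual = []
t = start_time
while t <= end_time:
    manual.append(t)
    t += frequency_timedelta

# date_range includes both endpoints, so it produces the same timestamps
timestamps = pd.date_range(start=start_time, end=end_time, freq=frequency)
print(len(timestamps))  # 7
```

The manual loop only needs the `timedelta`, while `date_range` needs the alias; keeping both in the mapping leaves either option open.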


## Fetching Anomalies 

Next, page through every anomaly group the Detector has recorded and collect the summaries into a list for later parsing. Setting `SensitivityThreshold` to 0 includes every group regardless of its severity score.

In [None]:
anomaly_groups = []
next_token = None

while True:
    params = {
        "AnomalyDetectorArn": Detector_ARN,
        "SensitivityThreshold": 0,  # include every group regardless of score
        "MaxResults": 100,
    }
    if next_token:
        params["NextToken"] = next_token

    response = lookoutmetrics_client.list_anomaly_group_summaries(**params)
    anomaly_groups += response["AnomalyGroupSummaryList"]

    # Keep paging until the service stops returning a NextToken
    next_token = response.get("NextToken")
    if not next_token:
        break
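The same `NextToken` loop appears again in the next cell when fetching time series. As a sketch, that pattern can be factored into a small generator. `paginate` is a hypothetical helper, not part of boto3, and `FakeClient` is a made-up stand-in for `lookoutmetrics_client` so the example runs without AWS credentials:

```python
from typing import Any, Callable, Dict, Iterator, List


def paginate(call: Callable[..., Dict[str, Any]], result_key: str, **params: Any) -> Iterator[Any]:
    """Yield every item under result_key, following NextToken until it is absent.

    Hypothetical helper: assumes a boto3-style list_* call that accepts
    NextToken and returns one only while more pages remain.
    """
    next_token = None
    while True:
        if next_token:
            params["NextToken"] = next_token
        response = call(**params)
        yield from response.get(result_key, [])
        next_token = response.get("NextToken")
        if not next_token:
            break


class FakeClient:
    """Made-up stand-in for lookoutmetrics_client that serves three pages."""

    def __init__(self) -> None:
        self.pages = [["g1", "g2"], ["g3", "g4"], ["g5"]]

    def list_anomaly_group_summaries(self, **params: Any) -> Dict[str, Any]:
        page = int(params.get("NextToken", "0"))
        response: Dict[str, Any] = {"AnomalyGroupSummaryList": self.pages[page]}
        if page + 1 < len(self.pages):
            response["NextToken"] = str(page + 1)
        return response


client = FakeClient()
anomaly_groups: List[Any] = list(
    paginate(
        client.list_anomaly_group_summaries,
        "AnomalyGroupSummaryList",
        AnomalyDetectorArn="arn:placeholder",  # placeholder, not a real ARN
        SensitivityThreshold=0,
        MaxResults=100,
    )
)
print(anomaly_groups)  # ['g1', 'g2', 'g3', 'g4', 'g5']
```

Against the real service, the call would pass `lookoutmetrics_client.list_anomaly_group_summaries` and `AnomalyDetectorArn=Detector_ARN`; the time series loop would use `list_anomaly_group_time_series` with `"TimeSeriesList"` as the result key.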

## Building the Dataframe

Finally, iterate over the anomaly groups and build a dataframe to hold them. For each group, the cell below fetches the affected time series for the group's primary metric and emits one row per time interval in the anomaly window. Because it makes at least one API call per anomaly group, it can take a minute or so to run.

In [None]:
def datetime_from_string(s):
    # Drop anything from the first "[" onward before parsing
    s = s.split("[")[0]
    try:
        return datetime.datetime.fromisoformat(s)
    except ValueError:
        # Before Python 3.11, fromisoformat() rejects a trailing "Z"
        return datetime.datetime.strptime(s, "%Y-%m-%dT%H:%MZ")


records = []
dimension_names = set()

for anomaly_group in anomaly_groups:
    start_time = datetime_from_string(anomaly_group["StartTime"])
    end_time = datetime_from_string(anomaly_group["EndTime"])
    anomaly_group_id = anomaly_group["AnomalyGroupId"]
    anomaly_group_score = anomaly_group["AnomalyGroupScore"]
    primary_metric_name = anomaly_group["PrimaryMetricName"]

    # Page through every time series affected by this group's primary metric
    time_series_list = []
    next_token = None
    while True:
        params = {
            "AnomalyDetectorArn": Detector_ARN,
            "AnomalyGroupId": anomaly_group_id,
            "MetricName": primary_metric_name,
            "MaxResults": 100,
        }
        if next_token:
            params["NextToken"] = next_token

        response = lookoutmetrics_client.list_anomaly_group_time_series(**params)
        time_series_list += response["TimeSeriesList"]

        next_token = response.get("NextToken")
        if not next_token:
            break

    # Emit one record per time series per interval in [start_time, end_time]
    for time_series in time_series_list:
        dimensions = {d["DimensionName"]: d["DimensionValue"] for d in time_series["DimensionList"]}
        dimension_names.update(dimensions)

        t = start_time
        while t <= end_time:
            records.append({
                "timestamp": t,
                **dimensions,
                primary_metric_name + "_group_score": anomaly_group_score,
            })
            t += frequency_timedelta

# Building the dataframe once is much faster than concatenating one-row frames
df = pd.DataFrame(records)

# Fold multiple metrics into the same rows; dropna=False keeps rows whose
# time series lacks one of the dimensions instead of silently dropping them
if not df.empty:
    df = df.groupby(["timestamp", *sorted(dimension_names)], as_index=False, dropna=False).max()
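To see what the final `groupby(...).max()` does, consider anomaly groups for different metrics that flag the same timestamp and dimension values. Each produces its own row with a score in only one `<metric>_group_score` column; grouping on timestamp plus dimensions and taking the max collapses them into one row, because `max` skips the empty (NaN) cells. The metric names `revenue` and `orders` and the dimension `region` are made up for illustration:

```python
import pandas as pd

# Three rows as the export loop might emit them for two hypothetical metrics
rows = pd.DataFrame(
    [
        {"timestamp": pd.Timestamp("2021-06-01 00:00"), "region": "us-east", "revenue_group_score": 80.0},
        {"timestamp": pd.Timestamp("2021-06-01 00:00"), "region": "us-east", "orders_group_score": 65.0},
        {"timestamp": pd.Timestamp("2021-06-01 00:00"), "region": "eu-west", "orders_group_score": 40.0},
    ]
)

# Rows sharing timestamp and region merge; NaN scores are ignored by max()
folded = rows.groupby(["timestamp", "region"], as_index=False).max()
print(folded)
```

The two `us-east` rows become one row carrying both scores, while `eu-west` keeps NaN for `revenue_group_score` because no revenue anomaly touched it.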

The cell below will render the first few rows of your anomalies:

In [None]:
df.head()

## Exporting the Results

The final cell writes the dataframe to a CSV file named after your Detector. Once the file has been created, you can download it from JupyterLab by right-clicking it in the file browser sidebar and choosing Download.

In [None]:
# Name the file after the detector (the last segment of its ARN)
filename = Detector_ARN.split(":")[-1] + "_anomalies.csv"
df.to_csv(filename, index=False)
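When the CSV is loaded again in another notebook or tool, the `timestamp` column comes back as plain text unless it is parsed. A minimal round-trip sketch, using an in-memory buffer and a made-up two-row frame in place of the real export:

```python
import io

import pandas as pd

# Made-up stand-in for the exported anomalies dataframe
df = pd.DataFrame(
    {
        "timestamp": pd.to_datetime(["2021-06-01 00:00", "2021-06-01 00:10"]),
        "region": ["us-east", "us-east"],
        "revenue_group_score": [80.0, 80.0],
    }
)

# Write to an in-memory buffer instead of a file, then rewind it
buffer = io.StringIO()
df.to_csv(buffer, index=False)
buffer.seek(0)

# parse_dates converts the timestamp column back into datetime values
restored = pd.read_csv(buffer, parse_dates=["timestamp"])
print(restored.dtypes)
```

Reading the real export works the same way with the filename in place of the buffer.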