# Benchmark

We use the following public datasets to benchmark all the algorithms in StreamAD. Thanks to the dataset providers!

1. AIOPS_KPI, [AIOps Challenge public dataset for KPI anomaly detection](https://github.com/NetManAIOps/KPI-Anomaly-Detection)

In [None]:

import pandas as pd
import numpy as np
from tqdm import tqdm
from time import perf_counter
from streamad.util import StreamGenerator, CustomDS
from streamad.model import SpotDetector
from streamad.evaluate import NumentaAwareMetircs, PointAwareMetircs, SeriesAwareMetircs
from dataset import prepare_ds, read_ds

We download the dataset, unzip it, and reorganize its directory structure with **prepare_ds()**, then load it with **read_ds()**.

Currently, the supported **ds_name** and **file_name** values are:

```python

DS = {"AIOPS_KPI": ["preliminary_train", "finals_train", "finals_ground_truth"]}
```
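
Since the mapping above fixes which names are valid, a typo in either argument can be caught before any download or parsing starts. The following is a minimal sketch of such a check; `check_ds` is a hypothetical helper written for illustration and is not part of StreamAD or of the `dataset` module.

```python
# Same mapping as shown above.
DS = {"AIOPS_KPI": ["preliminary_train", "finals_train", "finals_ground_truth"]}


def check_ds(ds_name, ds_file):
    """Hypothetical helper: raise ValueError unless ds_file is listed under ds_name."""
    if ds_name not in DS:
        raise ValueError(f"Unknown ds_name {ds_name!r}; expected one of {sorted(DS)}")
    if ds_file not in DS[ds_name]:
        raise ValueError(
            f"Unknown file {ds_file!r} for {ds_name!r}; expected one of {DS[ds_name]}"
        )
    return ds_name, ds_file


checked = check_ds("AIOPS_KPI", "preliminary_train")
print(checked)
```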



In [None]:
path = './streamad-benchmark-dataset'
ds_name = 'AIOPS_KPI'
prepare_ds(ds_name=ds_name, path=path)

In [None]:
dfs = read_ds(ds_name=ds_name, ds_file="preliminary_train")
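
The benchmark loop in this notebook unpacks `dfs` as a mapping from a series key to a `(df, label)` pair. Before a long run it can help to check that shape and the share of anomalous points per series. The sketch below does this on synthetic stand-in data; the `summarize` helper, the `value` column, and the toy values are assumptions for illustration, not the real output of `read_ds()`.

```python
import numpy as np
import pandas as pd


def summarize(dfs):
    """Return one row per series: key, number of points, and anomaly ratio."""
    rows = []
    for key, (df, label) in dfs.items():
        label = np.asarray(label)
        rows.append(
            {"Key": key, "Size(#)": len(df), "AnomalyRatio": float(label.mean())}
        )
    return pd.DataFrame(rows)


# Synthetic stand-in for the output of read_ds(); not real benchmark data.
toy_dfs = {
    "kpi_a": (pd.DataFrame({"value": [1.0, 1.1, 9.0, 1.2]}), np.array([0, 0, 1, 0])),
    "kpi_b": (pd.DataFrame({"value": [0.5, 0.4]}), np.array([0, 0])),
}

summary = summarize(toy_dfs)
print(summary)
```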

In [None]:
benchmark_items = [
    "Dataset",
    "Key",
    "Size(#)",
    "Time(s)",
    "Point_Precision",
    "Point_Recall",
    "Point_Fbeta",
    "Series_Precision",
    "Series_Recall",
    "Series_Fbeta",
    "Numenta_Precision",
    "Numenta_Recall",
    "Numenta_Fbeta",
]
benchmark_df = pd.DataFrame(columns=benchmark_items)

In [None]:
for key, (df, label) in dfs.items():

    ds = CustomDS(df, label)
    stream = StreamGenerator(ds.data)
    model = SpotDetector(window_len=200)

    # Reset per series so the scores stay aligned with this series' labels.
    scores = []

    start_time = perf_counter()
    for x in tqdm(stream.iter_item(), total=len(ds.data)):
        score = model.fit_score(x)
        scores.append(score)

    # Wall-clock time spent scoring this series, in seconds.
    elapsed = perf_counter() - start_time

    benchmark_values = [ds_name, key, len(ds.data), elapsed]

    label = ds.label
    for metric in [
        PointAwareMetircs(),
        SeriesAwareMetircs(),
        NumentaAwareMetircs(),
    ]:

        # scores = np.nan_to_num(np.array(scores, dtype=float), nan=0)
        benchmark_values.extend(list(metric.evaluate(label, scores)))

    benchmark_df.loc[len(benchmark_df)] = benchmark_values

    # Only the first series is benchmarked; remove this break to run all of them.
    break
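
To make the `Point_Precision`, `Point_Recall`, and `Point_Fbeta` columns more concrete, here is a rough sketch of point-wise metrics computed from thresholded scores. It is not StreamAD's implementation: the `point_metrics` helper, the threshold of 0.8, and `beta=1` are assumptions, and missing scores (`None` or NaN) are treated as 0, in the spirit of the commented-out `np.nan_to_num` line in the loop.

```python
import numpy as np


def point_metrics(labels, scores, threshold=0.8, beta=1.0):
    """Point-wise precision, recall, and F-beta for thresholded anomaly scores."""
    labels = np.asarray(labels).astype(bool)
    # None becomes NaN under dtype=float; NaN is then mapped to 0 (not anomalous).
    scores = np.nan_to_num(np.array(scores, dtype=float), nan=0.0)
    preds = scores > threshold

    tp = np.sum(preds & labels)    # predicted anomalous and labeled anomalous
    fp = np.sum(preds & ~labels)   # predicted anomalous but labeled normal
    fn = np.sum(~preds & labels)   # predicted normal but labeled anomalous

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = beta**2 * precision + recall
    fbeta = (1 + beta**2) * precision * recall / denom if denom else 0.0
    return float(precision), float(recall), float(fbeta)


# Toy inputs for illustration only.
labels = [0, 0, 1, 1, 0]
scores = [None, 0.1, 0.9, 0.3, 0.95]
precision, recall, fbeta = point_metrics(labels, scores)
print(precision, recall, fbeta)
```

Point-wise scoring penalizes a detector for every missed point inside a long anomalous segment, which is one reason the notebook also reports series-aware and Numenta-style metrics alongside it.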


In [None]:
benchmark_df.to_csv('./benchmark_results.csv', index=False)