# Benchmark

We benchmark all the algorithms in StreamAD on the following public datasets. Many thanks to their providers!

1. AIOPS_KPI: [AIOps Challenge public dataset for KPI anomaly detection](https://github.com/NetManAIOps/KPI-Anomaly-Detection)

In [None]:

import pandas as pd
import numpy as np
from tqdm import tqdm
from time import perf_counter
from streamad.util import StreamGenerator, CustomDS
from streamad.model import SpotDetector
from streamad.evaluate import NumentaAwareMetircs, PointAwareMetircs, SeriesAwareMetircs
from dataset import prepare_ds, read_ds

We download and unzip the dataset and reorganize its directory structure with **prepare_ds()**, then load it with **read_ds()**.

Currently, the supported values of **ds_name** (dictionary keys) and **ds_file** (list entries) are:

```python
DS = {"AIOPS_KPI": ["preliminary_train", "finals_train", "finals_ground_truth"]}
```
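The benchmark loop unpacks `dfs` as a mapping from a KPI key to a `(df, label)` pair. As a rough illustration of what such a mapping looks like, the sketch below groups a long-format KPI table by KPI. It is hypothetical: `split_by_kpi` is a made-up helper, not the implementation of **read_ds()**, and the column names `timestamp`, `value`, `label` and `KPI ID` are assumed from the AIOps KPI CSV files.

```python
import pandas as pd


def split_by_kpi(raw: pd.DataFrame) -> dict:
    """Hypothetical helper: group a long-format KPI table into
    {kpi_id: (values_df, labels)}. Column names are assumptions."""
    out = {}
    for kpi_id, group in raw.groupby("KPI ID", sort=False):
        # Streaming order matters, so sort each series by time
        group = group.sort_values("timestamp")
        values = group[["value"]].reset_index(drop=True)
        labels = group["label"].to_numpy()
        out[kpi_id] = (values, labels)
    return out


# Toy long-format table with two KPIs and shuffled timestamps
raw = pd.DataFrame({
    "timestamp": [3, 1, 2, 1, 2],
    "value": [0.3, 0.1, 0.2, 5.0, 9.0],
    "label": [0, 0, 1, 0, 0],
    "KPI ID": ["a", "a", "a", "b", "b"],
})

dfs = split_by_kpi(raw)
for key, (df, label) in dfs.items():
    print(key, df["value"].tolist(), label.tolist())
```

Each KPI is evaluated as its own stream, which is why the benchmark reports one row per key.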



In [None]:
path = "./streamad-benchmark-dataset"  # local directory for the downloaded files
ds_name = "AIOPS_KPI"
prepare_ds(ds_name=ds_name, path=path)

In [None]:
dfs = read_ds(ds_name=ds_name, ds_file="preliminary_train")  # {key: (df, label)}

In [None]:
benchmark_items = [
    "Detector",
    "Dataset",
    "Key",
    "Size(#)",
    "Time(s)",
    "Point_Precision",
    "Point_Recall",
    "Point_Fbeta",
    "Series_Precision",
    "Series_Recall",
    "Series_Fbeta",
    "Numenta_Precision",
    "Numenta_Recall",
    "Numenta_Fbeta",
]
benchmark_df = pd.DataFrame(columns=benchmark_items)
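Each metric family contributes a precision, a recall and an F-beta column, with `F_beta = (1 + beta^2) * P * R / (beta^2 * P + R)`. The sketch below shows a generic point-wise definition on binary labels and binary predictions. It is a textbook definition, not StreamAD's `PointAwareMetircs` code; how StreamAD turns raw scores into predictions is not shown and may differ. The F-beta values in the results table appear consistent with beta = 1 (for example, P = 1.0 and R = 0.121 give F of about 0.216).

```python
import numpy as np


def f_beta(precision, recall, beta=1.0):
    """Weighted harmonic mean of precision and recall."""
    b2 = beta ** 2
    denom = b2 * precision + recall
    if denom == 0:
        return 0.0
    return (1 + b2) * precision * recall / denom


def point_metrics(labels, predictions, beta=1.0):
    """Point-wise precision, recall and F-beta for binary arrays."""
    labels = np.asarray(labels, dtype=bool)
    predictions = np.asarray(predictions, dtype=bool)
    tp = int(np.sum(labels & predictions))   # anomalies correctly flagged
    fp = int(np.sum(~labels & predictions))  # normal points flagged
    fn = int(np.sum(labels & ~predictions))  # anomalies missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall, f_beta(precision, recall, beta)


print(point_metrics([0, 1, 1, 1, 0, 0], [0, 1, 0, 0, 1, 0]))
print(round(f_beta(1.0, 0.121), 3))
```

A very low recall drags F-beta toward zero even when precision is 1.0, which explains rows such as `046ec29ddf80d62e` in the results.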

In [None]:
# Factories, so that every KPI series gets a freshly initialised detector
models = {"spot": lambda: SpotDetector(window_len=200)}

In [None]:
for key, (df, label) in dfs.items():

    ds = CustomDS(df, label)
    stream = StreamGenerator(ds.data)

    for model_name, make_model in models.items():
        # Fresh detector and score buffer per series and model,
        # so no state leaks from one run into the next
        model = make_model()
        scores = []

        start_time = perf_counter()
        for x in tqdm(stream.iter_item(), total=len(ds.data)):
            scores.append(model.fit_score(x))
        elapsed = perf_counter() - start_time

        benchmark_values = [model_name, ds_name, key, len(ds.data), elapsed]

        # Optional: map warm-up scores (None/NaN) to 0 before evaluation
        # scores = np.nan_to_num(np.array(scores, dtype=float), nan=0)
        for metric in [
            PointAwareMetircs(),
            SeriesAwareMetircs(),
            NumentaAwareMetircs(),
        ]:
            # Each metric contributes (precision, recall, fbeta)
            benchmark_values.extend(list(metric.evaluate(ds.label, scores)))

        benchmark_df.loc[len(benchmark_df)] = benchmark_values
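The loop above follows a score-as-you-go protocol: every point is passed once to `fit_score`, which updates the detector and returns a score. The sketch below reproduces that protocol with a toy rolling z-score detector. `RollingZScore` and `run_stream` are hypothetical names, not StreamAD classes, and SPOT itself works differently. The toy detector returns `None` while its window fills up; we assume SPOT has a similar warm-up, as the commented-out `nan_to_num` line suggests. `np.array(..., dtype=float)` turns `None` into NaN, and `np.nan_to_num` then maps NaN to 0.

```python
import math
from collections import deque

import numpy as np


class RollingZScore:
    """Toy online detector: |z-score| of x against the previous window_len points.

    Returns None until the window is full (a warm-up period).
    """

    def __init__(self, window_len=3):
        self.window = deque(maxlen=window_len)

    def fit_score(self, x):
        score = None
        if len(self.window) == self.window.maxlen:
            mean = sum(self.window) / len(self.window)
            var = sum((v - mean) ** 2 for v in self.window) / len(self.window)
            std = math.sqrt(var)
            score = abs(x - mean) / std if std > 0 else 0.0
        self.window.append(x)  # update state after scoring
        return score


def run_stream(make_detector, series):
    detector = make_detector()  # fresh state for every series
    scores = [detector.fit_score(x) for x in series]
    # None (warm-up) -> NaN -> 0.0, so metrics receive a numeric array
    return np.nan_to_num(np.array(scores, dtype=float), nan=0.0)


series = [1.0, 2.0, 3.0, 2.0, 10.0]
print(run_stream(lambda: RollingZScore(window_len=3), series))
```

Passing a factory rather than a detector instance is what keeps runs independent: calling `run_stream` twice on the same series gives identical scores.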


In [None]:
benchmark_df.to_csv("./benchmark_results.csv", index=False)

## Render the benchmark results as a table

In [2]:
import plotly.graph_objects as go
import pandas as pd
import numpy as np
benchmark_df = pd.read_csv("./benchmark_results.csv")
# fig = go.Figure(
#     data=[
#         go.Table(
#             header=dict(values=list(benchmark_df.columns)),
#             cells=dict(
#                 values=[
#                     benchmark_df[i].round(decimals=5)
#                     if pd.api.types.is_numeric_dtype(benchmark_df[i])
#                     else benchmark_df[i]
#                     for i in benchmark_df.columns.tolist()
#                 ],
#                 format=[""]*3 + [".3f"] * (len(benchmark_df.columns)-3),
#                 fill_color='white',
#                 line_color='lightgrey'
#             ),
#         ),
#     ],
# )
# fig.update_layout(margin=dict(l=0, r=0, t=0, b=0))
# fig.write_image("benchmark_results.svg")


In [3]:
benchmark_df.round(3)  # round numeric columns, leave text columns as-is

| | Dataset | Key | Size(#) | Time(s) | PointAwarePrecision | PointAwareRecall | PointAwareFbeta | SeriesAwarePrecision | SeriesAwareRecall | SeriesAwareFbeta | NumentaAwarePrecision | NumentaAwareRecall | NumentaAwareFbeta |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | dataset1 | 02e99bd4f6cfb33f | 128562 | 211.26 | 0.571 | 0.001 | 0.002 | 0.9 | 0.002 | 0.004 | 0.571 | 0.002 | 0.004 |
| 1 | dataset1 | 046ec29ddf80d62e | 8784 | 6.628 | 1.0 | 0.013 | 0.025 | 1.0 | 0.013 | 0.025 | 1.0 | 0.013 | 0.025 |
| 2 | dataset1 | 07927a9a18fa19ae | 10960 | 28.547 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 3 | dataset1 | 09513ae3e75778a3 | 128971 | 751.832 | 0.417 | 0.021 | 0.04 | 0.714 | 0.052 | 0.096 | 0.417 | 0.052 | 0.092 |
| 4 | dataset1 | 18fbb1d5a5dc099d | 129128 | 110.694 | 0.491 | 0.02 | 0.038 | 0.642 | 0.139 | 0.229 | 0.491 | 0.224 | 0.307 |
| 5 | dataset1 | 1c35dbf57f55f5e4 | 128853 | 74.565 | 0.669 | 0.009 | 0.018 | 0.892 | 0.116 | 0.205 | 0.669 | 0.138 | 0.229 |
| 6 | dataset1 | 40e25005ff8992bd | 100254 | 48.765 | 1.0 | 0.121 | 0.216 | 1.0 | 0.349 | 0.518 | 1.0 | 0.349 | 0.518 |
| 7 | dataset1 | 54e8a140f6237526 | 8248 | 13.913 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 8 | dataset1 | 71595dd7171f4540 | 147668 | 155.894 | 1.0 | 0.071 | 0.133 | 1.0 | 0.282 | 0.44 | 1.0 | 0.282 | 0.44 |
| 9 | dataset1 | 769894baefea4e9e | 8784 | 2.502 | 1.0 | 0.111 | 0.2 | 1.0 | 0.111 | 0.2 | 1.0 | 0.111 | 0.2 |
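Because both `Size(#)` and `Time(s)` are recorded, throughput and averages across KPIs can be derived directly. The sketch below uses three rows copied from the table above; in the notebook, the same two expressions can be applied to `benchmark_df`. The `Throughput(points/s)` column name is our own choice, and absolute timings depend on the hardware used.

```python
import pandas as pd

# Three rows copied from the SPOT results table above
results = pd.DataFrame({
    "Key": ["046ec29ddf80d62e", "40e25005ff8992bd", "769894baefea4e9e"],
    "Size(#)": [8784, 100254, 8784],
    "Time(s)": [6.628, 48.765, 2.502],
    "PointAwareFbeta": [0.025, 0.216, 0.2],
})

# Points processed per second of wall-clock time
results["Throughput(points/s)"] = results["Size(#)"] / results["Time(s)"]
print(results.round(1))

# Unweighted mean over KPIs: every series counts equally, regardless of length
print("Mean PointAwareFbeta:", round(results["PointAwareFbeta"].mean(), 3))
```

An unweighted mean treats short and long series alike; weighting by `Size(#)` would instead favour the long KPIs.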


Write the benchmark results as a Markdown table into the documentation source.

In [11]:
# DataFrame.to_markdown() requires the optional `tabulate` package
content = benchmark_df.to_markdown()

with open("../docs/source/benchmark.md", "w", encoding="utf-8") as f:
    f.write("# Benchmark\n\n" + content + "\n")

