# Run ADBench 
- This demo shows how to test anomaly detection (AD) algorithms on the datasets provided with ADBench.
- You can also evaluate your own customized algorithm in ADBench.
- To reproduce the complete experimental results of ADBench, run the code in the run.py file.

In [1]:
%load_ext autoreload
%autoreload 2

In [2]:
# import basic packages
import os
import pandas as pd

import warnings

warnings.filterwarnings("ignore")

# import the necessary ADBench utilities
from utils.data_generator import DataGenerator
from utils.myutils import Utils

datagenerator = DataGenerator()  # data generator
utils = Utils()  # utils function

2023-11-03 16:21:56.718803: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  SSE4.1 SSE4.2 AVX AVX2 AVX_VNNI FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-11-03 16:21:56.757045: I tensorflow/core/util/port.cc:104] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.


- All ADBench datasets are included in the "datasets" folder, with filenames of the form "number_data_class.npz". See the table in the markdown for details.
    - Specify a dataset in the data generator by its filename without the ".npz" suffix, e.g., "10_cover.npz" becomes "10_cover".
    
    
- All the algorithms included in ADBench are listed in the table in the markdown.
    - Specify the model name at initialization, because some algorithms (e.g., the supervised ones) are integrated into a single class. See the table in the markdown for details.
    - You can also test your own AD algorithm on the generated datasets, as long as it outputs an anomaly score for evaluation.
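
As a minimal sketch of what "your own AD algorithm" could look like, the class below follows the calling convention used by the evaluation loop in this demo: a constructor taking `seed` and `model_name`, a `fit(X_train, y_train)` method, and a `predict_score(X)` method. The class name `ZScoreDetector` and its scoring rule (mean absolute z-score) are hypothetical and chosen only for illustration; they are not part of ADBench.

```python
import numpy as np


class ZScoreDetector:
    """Hypothetical custom detector: mean absolute z-score per sample as anomaly score."""

    def __init__(self, seed=42, model_name="ZScore"):
        self.seed = seed  # unused here; kept so the constructor matches the loop's call
        self.model_name = model_name

    def fit(self, X_train, y_train=None):
        # Unsupervised: y_train is accepted but ignored.
        X_train = np.asarray(X_train, dtype=float)
        self.mean_ = X_train.mean(axis=0)
        self.std_ = X_train.std(axis=0) + 1e-12  # avoid division by zero on constant features
        return self

    def predict_score(self, X):
        X = np.asarray(X, dtype=float)
        z = np.abs((X - self.mean_) / self.std_)
        return z.mean(axis=1)  # higher score = more anomalous


# Toy data: five normal test points followed by one obvious outlier.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 5))
X_test = np.vstack([rng.normal(size=(5, 5)), np.full((1, 5), 8.0)])

clf = ZScoreDetector(seed=42).fit(X_train=X_train, y_train=None)
score = clf.predict_score(X_test)
```

A class with this interface could in principle be added to the model dictionary next to the built-in models, provided its scores are oriented so that larger values mean "more anomalous".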

In [3]:
os.listdir("datasets/Classical")

['01_ALOI.npz',
 '02_annthyroid.npz',
 '03_backdoor.npz',
 '04_breastw.npz',
 '05_campaign.npz',
 '06_cardio.npz',
 '07_Cardiotocography.npz',
 '08_celeba.npz',
 '09_census.npz',
 '99_circles.npz',
 '99_clusters.npz',
 '99_linear.npz',
 '99_moons.npz']
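
The listing above can be turned into dataset names by stripping the ".npz" suffix. The sketch below does that with a hypothetical helper, `dataset_names`, and also reads one archive directly with NumPy. It runs against a temporary folder so that it is self-contained. The array keys `"X"` and `"y"` are an assumption about the archive layout, so check them against your own files before relying on them.

```python
import os
import tempfile

import numpy as np


def dataset_names(folder):
    """Return sorted dataset names in `folder` with the '.npz' suffix removed."""
    return sorted(
        os.path.splitext(f)[0] for f in os.listdir(folder) if f.endswith(".npz")
    )


# Self-contained demo on a temporary folder; real use would point at "datasets/Classical".
with tempfile.TemporaryDirectory() as tmp:
    # Assumption: each archive stores features under "X" and binary labels under "y".
    np.savez(os.path.join(tmp, "06_cardio.npz"), X=np.zeros((4, 3)), y=np.array([0, 0, 1, 0]))
    np.savez(os.path.join(tmp, "04_breastw.npz"), X=np.ones((2, 3)), y=np.array([0, 1]))

    names = dataset_names(tmp)

    # Load one archive by name; the context manager closes the file afterwards.
    with np.load(os.path.join(tmp, names[1] + ".npz")) as archive:
        X, y = archive["X"], archive["y"]
```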

In [4]:
from baseline.PyOD import PYOD
from aeb_gplvm import *

# dataset and model list / dict
dataset_list = [
    "01_ALOI",
    "02_annthyroid",
    "03_backdoor",
    "04_breastw",
    "05_campaign",
    "06_cardio",
    "07_Cardiotocography",
    "08_celeba",
    "09_census",
]
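model_dict = {
    "IForest": PYOD,
    "PCA": PYOD,
    # Placeholders: "" is not callable; assign the model class
    # (presumably provided by aeb_gplvm) to actually evaluate these entries.
    "GPLVM": "",
    "AEB_GPLVM": "",
}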

# save the results
df_AUCROC = pd.DataFrame(data=None, index=dataset_list, columns=model_dict.keys())
df_AUCPR = pd.DataFrame(data=None, index=dataset_list, columns=model_dict.keys())

In [10]:
# seed for reproducible results
seed = 42

for dataset in dataset_list:
    # la: ratio of labeled anomalies, from 0.0 to 1.0
    # realistic_synthetic_mode: type of synthetic anomalies; can be local, global, dependency or cluster
    # noise_type: data noise injected to test model robustness; can be duplicated_anomalies,
    #     irrelevant_features or label_contamination

    # import the dataset
    datagenerator.dataset = dataset  # specify the dataset name
    data = datagenerator.generator(
        la=0.1, realistic_synthetic_mode=None, noise_type=None
    )  # only 10% of the anomalies are labeled

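    for name, model_cls in model_dict.items():
        # skip placeholder entries that have no model class assigned yet
        if not callable(model_cls):
            continue

        # model initialization
        clf = model_cls(seed=seed, model_name=name)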

        # training, for unsupervised models the y label will be discarded
        clf = clf.fit(X_train=data["X_train"], y_train=data["y_train"])

        # output predicted anomaly score on testing set
        score = clf.predict_score(data["X_test"])

        # evaluation
        result = utils.metric(y_true=data["y_test"], y_score=score)

        # save results
        df_AUCROC.loc[dataset, name] = result["aucroc"]
        df_AUCPR.loc[dataset, name] = result["aucpr"]

subsampling for dataset 01_ALOI...
current noise type: None
{'Samples': 10000, 'Features': 27, 'Anomalies': 302, 'Anomalies Ratio(%)': 3.02}
best param: None
best param: None

Model: "model_2"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 input_2 (InputLayer)        [(None, 27)]              0         
                                                                 
 dense_1 (Dense)             (None, 64)                1728      
                                                                 
 net_output (Dense)          (None, 32)                2048      
                                                                 
 tf.math.subtract_1 (TFOpLam  (None, 32)               0         
 bda)                                                            
                                                                 
 tf.math.pow_1 (TFOpLambda)  (None, 32)                0         
                                                                 
 tf.math.reduce_sum_1 (TFOpL  (None,)                  0         
 ambda)                                                    
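
The evaluation step stores two values per dataset and model, `result["aucroc"]` and `result["aucpr"]`. The implementation of `utils.metric` is not shown in this demo, so the sketch below only illustrates what these two metrics conventionally measure, using plain NumPy. The helper names `auc_roc` and `average_precision` are hypothetical, and the exact numbers may differ from a library implementation when scores are tied.

```python
import numpy as np


def auc_roc(y_true, y_score):
    """Probability that a random anomaly outscores a random normal sample (ties count 0.5)."""
    y_true = np.asarray(y_true).astype(bool)
    y_score = np.asarray(y_score, dtype=float)
    pos, neg = y_score[y_true], y_score[~y_true]
    # Pairwise comparison is O(n_pos * n_neg); fine for a sketch, slow for large test sets.
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))


def average_precision(y_true, y_score):
    """Mean of precision@k over the ranks k where an anomaly appears (assumes no tied scores)."""
    y_true = np.asarray(y_true)
    order = np.argsort(-np.asarray(y_score, dtype=float), kind="stable")
    hits = y_true[order] == 1
    precision_at_k = np.cumsum(hits) / np.arange(1, len(hits) + 1)
    return precision_at_k[hits].mean()


# Toy example: label 1 marks an anomaly, higher score means "more anomalous".
y_true = np.array([0, 0, 1, 0, 1])
y_score = np.array([0.1, 0.4, 0.35, 0.2, 0.8])
result = {
    "aucroc": auc_roc(y_true, y_score),
    "aucpr": average_precision(y_true, y_score),
}
```

An AUC-ROC near 0.5 means the scores rank anomalies no better than chance, whereas the chance level of AUC-PR is roughly the anomaly ratio of the test set, which is why the two result tables can look very different for the same model.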

In [11]:
df_AUCROC

| Dataset | IForest | DeepSVDD |
|---|---|---|
| 01_ALOI | 0.496581 | 0.47864 |
| 02_annthyroid | 0.826387 | 0.75085 |
| 03_backdoor | 0.737342 | 0.58035 |
| 04_breastw | 0.979938 | 0.484761 |
| 05_campaign | 0.696507 | 0.571301 |
| 06_cardio | 0.944193 | 0.680346 |
| 07_Cardiotocography | 0.708773 | 0.484776 |
| 08_celeba | 0.736814 | 0.38509 |
| 09_census | 0.620219 | 0.511897 |


In [12]:
df_AUCPR

| Dataset | IForest | DeepSVDD |
|---|---|---|
| 01_ALOI | 0.031745 | 0.031912 |
| 02_annthyroid | 0.353499 | 0.205219 |
| 03_backdoor | 0.050942 | 0.366501 |
| 04_breastw | 0.954298 | 0.382458 |
| 05_campaign | 0.249056 | 0.198592 |
| 06_cardio | 0.615718 | 0.312177 |
| 07_Cardiotocography | 0.478403 | 0.299519 |
| 08_celeba | 0.094863 | 0.050561 |
| 09_census | 0.075204 | 0.069327 |
