# Robustness Analysis of Decision Tree Classifiers Against Adversarial Attacks Using TTTS

### Description:

##### This notebook uses the TTTS (Tree Test Time Simulation) package to systematically assess the robustness of several Decision Tree classifiers against adversarial attacks. The analysis covers a collection of 50 datasets so that classifier behavior under adversarial conditions can be compared across many data distributions. Key components of the notebook include:

#### <ins>Dataset Preparation and Organization:</ins>

##### - Initialization of base_path to set the directory for dataset storage.
##### - Curation of dataset_list, bigger_datasets_list, and more_bigger_datasets_list to segregate datasets based on size for differentiated processing.

#### <ins>Adversarial Robustness Assessment of Decision Tree Classifiers:</ins>

##### - Implementation of a robust testing framework using 5-fold StratifiedKFold cross-validation.
##### - Configuration of various Decision Tree classifiers, each with unique probability adjustment strategies.
##### - Functionality for training classifiers, generating adversarial examples, applying defensive techniques, and evaluating multiple performance metrics.
##### - Parallel compilation of evaluations in results_df_dtattack DataFrame, offering a structured analysis of each classifier's resilience against adversarial attacks.
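
The cross-validation loop described in this list can be sketched as follows. This is a minimal illustration, not the notebook's evaluation code: it uses scikit-learn's built-in breast cancer data as a stand-in dataset (an assumption, it is not part of the collection used here), a plain `DecisionTreeClassifier`, and no attack or defense.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import StratifiedKFold
from sklearn.tree import DecisionTreeClassifier

# Stand-in binary dataset (assumption: not one of the notebook's CSV files).
X, y = load_breast_cancer(return_X_y=True)

# Same splitter settings as the notebook: 5 shuffled, class-balanced folds.
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

fold_scores = []
for train_index, test_index in skf.split(X, y):
    clf = DecisionTreeClassifier(random_state=123)  # fresh model for each fold
    clf.fit(X[train_index], y[train_index])
    probs = clf.predict_proba(X[test_index])
    preds = clf.predict(X[test_index])
    fold_scores.append(
        (
            roc_auc_score(y[test_index], probs[:, 1]),
            accuracy_score(y[test_index], preds),
        )
    )

# Average the per-fold (AUC, accuracy) pairs.
mean_auc, mean_acc = np.mean(fold_scores, axis=0)
print(f"mean AUC={mean_auc:.3f}, mean accuracy={mean_acc:.3f}")
```

In the notebook itself, each fold becomes one parallel task, and the test fold is replaced by its adversarial counterpart before scoring.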

#### <ins>Evaluation Against Zoo Adversarial Attacks:</ins>

##### - Detailed robustness analysis of Decision Tree classifiers, inclusive of a custom MonteCarlo DT implementation, when faced with Zoo adversarial attacks.
##### - Parallel processing to efficiently train classifiers, generate adversarial samples, apply optional defenses, and compute key performance metrics.
##### - Aggregation and visualization of results, providing insights into the defensive capabilities of each classifier against Zoo attacks.
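
Zoo (zeroth-order optimization) attacks only query the model's outputs and approximate gradients from those queries. The sketch below shows the core idea, a symmetric finite-difference gradient estimate, on a hypothetical toy loss. The function names are illustrative, and the real attack adds coordinate sampling and an optimizer on top of this step.

```python
import numpy as np


def zoo_gradient_estimate(f, x, h=1e-4):
    """Estimate the gradient of f at x from function evaluations only."""
    x = np.asarray(x, dtype=float)
    grad = np.zeros_like(x)
    for i in range(x.size):
        step = np.zeros_like(x)
        step[i] = h
        # Symmetric difference quotient along coordinate i.
        grad[i] = (f(x + step) - f(x - step)) / (2.0 * h)
    return grad


def toy_loss(x):
    """Hypothetical smooth black-box loss used only for this illustration."""
    return float(np.sum(x ** 2) + 3.0 * x[0])


x0 = np.array([1.0, -2.0, 0.5])
approx = zoo_gradient_estimate(toy_loss, x0)
exact = 2.0 * x0 + np.array([3.0, 0.0, 0.0])  # analytic gradient of toy_loss
print(np.allclose(approx, exact, atol=1e-6))
```

A decision tree's predicted probabilities are piecewise constant, so such an estimate is zero unless the step `h` crosses a split threshold. This may be one reason to use a comparatively large step size with tree models.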

#### <ins>Baseline Evaluation with Feature Squeezing and Gaussian Augmentation Defenses:</ins>

##### - Testing of Decision Tree classifiers, including the MonteCarlo DT variant, on unperturbed test data using 5-fold cross-validation, with FeatureSqueezing applied as an optional preprocessing defense (the GaussianAugmentation variant is present in the code but disabled).
##### - Efficient parallel execution of training, optional defense application, and performance metric computation.
##### - Aggregation and presentation of results, which serve as a no-attack reference for the two attack experiments.
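
Feature squeezing reduces the precision of the inputs so that small perturbations are rounded away. The following is a minimal NumPy sketch of bit-depth reduction under stated assumptions: `squeeze_features` is an illustrative helper, not the ART implementation, and the clip range is a single scalar pair, as in this notebook.

```python
import numpy as np


def squeeze_features(x, clip_min, clip_max, bit_depth):
    """Quantize x to 2**bit_depth evenly spaced values in [clip_min, clip_max]."""
    levels = 2 ** bit_depth - 1
    # Normalize to [0, 1], round to the nearest level, then map back.
    x_norm = (np.clip(x, clip_min, clip_max) - clip_min) / (clip_max - clip_min)
    x_quant = np.rint(x_norm * levels) / levels
    return x_quant * (clip_max - clip_min) + clip_min


x = np.array([0.0, 0.49, 0.51, 3.3, 10.0])
squeezed = squeeze_features(x, clip_min=0.0, clip_max=10.0, bit_depth=4)
print(squeezed)  # 0.49 and 0.51 are mapped to the same quantized value
```

Because one global minimum and maximum are used for all columns, a feature whose range is much smaller than the global range can collapse to very few distinct values. Per-feature clip ranges are one possible alternative.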

In [None]:
import warnings
warnings.filterwarnings('ignore')
import pandas as pd
import numpy as np
import time
import re
import sklearn
from joblib import Parallel, delayed

from sklearn.model_selection import StratifiedKFold
from sklearn.metrics import roc_auc_score, f1_score, log_loss, accuracy_score
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import load_breast_cancer,fetch_lfw_pairs,load_digits,load_iris,load_wine
sklearn.base.BaseEstimator.n_features_ = property(lambda self: self._n_features)
from sklearn.utils import Bunch

from art.attacks.evasion import FastGradientMethod,AutoProjectedGradientDescent,ThresholdAttack
from art.attacks.evasion import ZooAttack,HopSkipJump, BoundaryAttack, DecisionTreeAttack
from art.attacks.evasion import HighConfidenceLowUncertainty, ProjectedGradientDescent

from art.estimators.classification import SklearnClassifier

from art.defences.preprocessor import FeatureSqueezing, GaussianAugmentation

In [2]:
# Import the MonteCarloDecisionTreeClassifier class from our TTTS module.
from TTTS import MonteCarloDecisionTreeClassifier

## Parallel Loading and Processing of Multiple Datasets

In [3]:
# Define the base path for file storage and lists of dataset filenames
# 'base_path' represents the directory where the dataset files are stored.
# 'dataset_list' contains filenames of datasets to be processed.
# 'bigger_datasets_list' and 'more_bigger_datasets_list' are subsets of larger datasets for additional processing options.

# Base path where the files are stored
base_path = "../data/"

# List of files
dataset_list = [
    "!ar4.csv",
    "!bodyfat.csv",
    "Kaggle_Surgical-deepnet.csv",
    "MaternalBinary.csv",
    "OPENML_philippine.csv",
    "AcousticExtinguisherFire.csv",
    "acute-inflammation.csv",
    "acute-nephritis.csv",
    "backache.csv",
    "blood.csv",
    "chess-krvkp.csv",
    "cloud.csv",
    "congressional-voting.csv",
    "credit-approval.csv",
    "dresses-salesN.csv",
    "echocardiogram.csv",
    "haberman-survival.csv",
    "heart_failure_clinical_records_dataset.csv",
    "heart-hungarian.csv",
    "hill-valley.csv",
    "horse-colic.csv",
    "ilpd-indian-liver.csv",
    "no2.csv",
    "kaggle_REWEMA.csv",
    "lowbwt.csv",
    "madelon.csv",
    "Mesothelioma.csv",
    "MIMIC2.csv",
    "molec-biol-promoter.csv",
    "oil_spill.csv",
    "oocytes_merluccius_nucleus_4d.csv",
    "oocytes_trisopterus_nucleus_2f.csv",
    "ozone.csv",
    "Parkinson_Multiple_Sound_Recording.csv",
    "PC1 Software defect prediction.csv",
    "pd_speech_features.csv",
    "pima.csv",
    "Pistachio_28_Features_Dataset.csv",
    "plasma_retinol.csv",
    "primary-tumorNumeric.csv",
    "seismic-bumps.csv",
    "sleuth_case2002.csv",
    "spambase.csv",
    "spect.csv",
    "spectf.csv",
    "statlog-australian-credit.csv",
    "statlog-heart_.csv",
    "ThoraricSurgery.csv",
    "triazines.csv",
]
bigger_datasets_list = [
    "Kaggle_Surgical-deepnet.csv",
    "AcousticExtinguisherFire.csv",
    "chess-krvkp.csv",
    "kaggle_REWEMA.csv",
    "madelon.csv",
    "OPENML_philippine.csv",
    "ozone.csv",
    "Pistachio_28_Features_Dataset.csv",
    "seismic-bumps.csv",
    "spambase.csv",
]
more_bigger_datasets_list = [
    "kaggle_fraud_detection_bank_dataset.csv",
    "mushroom.csv",
    "musk.csv",
    "bank.csv",
    "sick_numeric2.csv",
]

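# Every file in bigger_datasets_list already appears in dataset_list, so appending it would
# duplicate those datasets; more_bigger_datasets_list holds additional files.
dataset_list = dataset_list  # + bigger_datasets_list #+ more_bigger_datasets_list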

# Function to load dataset
def load_dataset(file_name, path):
    try:
        data = pd.read_csv(path + file_name)
        # Use all columns except the last one as features
        X = data.iloc[:, :-1]
        # Use the last column as the target class
        y = data.iloc[:, -1]
        return (file_name, Bunch(data=X, target=y))
    except Exception as e:
        print(f"Error loading {file_name}: {e}")
        return None


# Parallel loading of datasets (using all available cores with n_jobs=-1)
datasets = Parallel(n_jobs=-1)(
    delayed(load_dataset)(file_name, base_path) for file_name in dataset_list
)

# Filter out None values in case of loading errors
datasets = [dataset for dataset in datasets if dataset is not None]

# Now 'datasets' is a list of tuples, where each tuple contains file_name and the corresponding dataset as a Bunch object.
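
The loader assumes that every CSV file stores the features in all columns but the last and the class label in the last column. A small sketch of that convention, using an in-memory CSV with hypothetical column names in place of a file on disk:

```python
import io

import pandas as pd

# Hypothetical three-row dataset; the empty field becomes NaN when parsed.
csv_text = "f1,f2,label\n1.0,,0\n2.5,3.0,1\n4.0,1.5,0\n"
data = pd.read_csv(io.StringIO(csv_text))

X = data.iloc[:, :-1]  # all columns except the last are features
y = data.iloc[:, -1]   # the last column is the target class

# The evaluation cells replace missing feature values with 0 before training.
X_values = X.fillna(0).values
print(X_values.shape, y.tolist())
```

Files that keep the label in a different column, or that contain non-numeric features, would need extra preprocessing before they fit this convention.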


## Adversarial Robustness Assessment of Decision Tree Classifiers

In [None]:
# This cell assesses the adversarial robustness of several Decision Tree classifiers using 5-fold cross-validation. The process involves:

# Initializing StratifiedKFold so that every fold preserves the class proportions of the full dataset.
# Defining a suite of Decision Tree classifiers, each with its own probability adjustment strategy (prob_type).
# Implementing the evaluate_classifier function to train classifiers, generate adversarial examples, apply defensive techniques (if applicable),
# and evaluate performance metrics such as AUC, F1 score, Log Loss, Accuracy, and Runtime.
# Iterating over datasets and classifiers to build one task per fold, running the tasks in parallel,
# and collecting the results in the DataFrame results_df_dtattack for a structured analysis of each classifier's resilience against adversarial attacks.

# 5-fold cross-validation
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

# List to store evaluation results
results = []

# Define classifiers
classifiers = {
    "DecisionTree_DecisionTreeAttack": DecisionTreeClassifier(random_state=123),
    "MonteCarloDecisionTree_Fix_Prob_DecisionTreeAttack": MonteCarloDecisionTreeClassifier(
        random_state=123, prob_type="fixed"
    ),
    "MonteCarloDecisionTree_Depth_Prob_DecisionTreeAttack": MonteCarloDecisionTreeClassifier(
        random_state=123, prob_type="depth"
    ),
    "MonteCarloDecisionTree_Agreement_Prob_DecisionTreeAttack": MonteCarloDecisionTreeClassifier(
        random_state=123, prob_type="agreement"
    ),
    "MonteCarloDecisionTree_Confidence_Prob_DecisionTreeAttack": MonteCarloDecisionTreeClassifier(
        random_state=123, prob_type="confidence"
    ),
    "MonteCarloDecisionTree_Distance_Prob_DecisionTreeAttack": MonteCarloDecisionTreeClassifier(
        random_state=123, prob_type="distance"
    ),
    "FeatureSqueezing_DecisionTreeAttack": DecisionTreeClassifier(random_state=123),
    #   'GaussianAugmentation_DecisionTreeAttack': DecisionTreeClassifier(random_state=123),
}


# The evaluate_classifier function trains a specified classifier, generates adversarial samples using a Decision Tree attack,
# applies defenses if necessary, and evaluates the classifier's performance on metrics like AUC, F1 score, Log Loss,
# and Accuracy, returning a summarized performance report along with the runtime.


def evaluate_classifier(dataset_name, dataset, clf_name, clf, train_index, test_index):
    X, y = dataset.data.fillna(0).values, dataset.target
    X_train, X_test = X[train_index], X[test_index]
    y_train, y_test = y[train_index], y[test_index]

    # Train classifier
    clf.fit(X_train, y_train)

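    # For every variant except the plain baseline, adversarial examples are crafted on a
    # surrogate DecisionTreeClassifier trained on the same fold.
    if clf_name != "DecisionTree_DecisionTreeAttack":
        dummy_clf = DecisionTreeClassifier(random_state=123)  # seeded like the other cells
        dummy_clf.fit(X_train, y_train)
        classifier = SklearnClassifier(model=dummy_clf, use_logits=True)
    else:
        classifier = SklearnClassifier(model=clf, use_logits=True)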

    attack = DecisionTreeAttack(classifier=classifier)
    x_test_adv = attack.generate(x=X_test)

    if clf_name == "FeatureSqueezing_DecisionTreeAttack":
        # Initialize the feature squeezing defence
        defence = FeatureSqueezing(
            clip_values=(X_train.min(), X_train.max()), bit_depth=4
        )

        # Fit the defence with training data
        defence.fit(X_train)

        # Apply the defence on testing data
        x_test_adv = defence(x_test_adv)[0]

    if clf_name == "GaussianAugmentation_DecisionTreeAttack":
        # Initialize the Gaussian Augmentation defence
        defence = GaussianAugmentation(sigma=1.0)

        # Apply the Gaussian Augmentation on training data
        X_train_augmented, y_train_augmented = defence(X_train, y_train)

        # Retrain the model on the augmented data
        clf.fit(X_train_augmented, y_train_augmented)

    start_time = time.time()
    pred_probs = clf.predict_proba(x_test_adv)
    preds = clf.predict(x_test_adv)
    runtime = time.time() - start_time

    # Evaluate
    auc = roc_auc_score(y_test, pred_probs[:, 1], multi_class="ovr")
    f1 = f1_score(y_test, preds, average="macro")
    logloss = log_loss(y_test, pred_probs)
    accuracy = accuracy_score(y_test, preds)

    return [dataset_name, clf_name, auc, f1, logloss, accuracy, runtime]


# with warnings.catch_warnings():
#     warnings.simplefilter("ignore")

all_tasks = []

for dataset_name, dataset in datasets:
    print(dataset_name)
    X, y = dataset.data.fillna(0).values, dataset.target

    # Perform 5-fold cross-validation
    for clf_name, clf in classifiers.items():
        for i, (train_index, test_index) in enumerate(skf.split(X, y)):
            task = delayed(evaluate_classifier)(
                dataset_name, dataset, clf_name, clf, train_index, test_index
            )
            all_tasks.append(task)

# Execute all tasks in parallel
results = Parallel(n_jobs=-1)(all_tasks)

# Convert results to DataFrame for easy visualization
results_df_dtattack = pd.DataFrame(
    results,
    columns=["Dataset", "Classifier", "AUC", "F1", "LogLoss", "Accuracy", "Runtime"],
)


In [5]:
# This cell aggregates the performance metrics for each classifier and dataset combination from results_df_dtattack, 
# calculating the mean and standard deviation of AUC, F1 score, Log Loss, Accuracy, and Runtime.

summary_dtattack = results_df_dtattack.groupby(['Dataset','Classifier']).agg({
    'AUC': ['mean', 'std'],
    'F1': ['mean', 'std'],
    'LogLoss': ['mean', 'std'],
    'Accuracy': ['mean', 'std'],
    'Runtime': ['mean', 'std']
}).reset_index()

summary_dtattack[0:30]

| | Dataset | Classifier | AUC mean | AUC std | F1 mean | F1 std | LogLoss mean | LogLoss std | Accuracy mean | Accuracy std | Runtime mean | Runtime std |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | !ar4.csv | DecisionTree_DecisionTreeAttack | 0.324183 | 0.12147 | 0.211553 | 0.092686 | 28.335617 | 3.387842 | 0.213853 | 0.093993 | 0.000154 | 4e-06 |
| 1 | !ar4.csv | FeatureSqueezing_DecisionTreeAttack | 0.5 | 0.0 | 0.448421 | 0.001441 | 6.740631 | 0.170926 | 0.812987 | 0.004742 | 0.000177 | 4.1e-05 |
| 2 | !ar4.csv | MonteCarloDecisionTree_Agreement_Prob_Decision... | 0.556699 | 0.209812 | 0.377033 | 0.243936 | 2.448832 | 3.232074 | 0.453247 | 0.294442 | 0.015285 | 0.00352 |
| 3 | !ar4.csv | MonteCarloDecisionTree_Confidence_Prob_Decisio... | 0.451797 | 0.218409 | 0.369818 | 0.179426 | 20.249184 | 8.829845 | 0.412121 | 0.214655 | 0.041057 | 0.010039 |
| 4 | !ar4.csv | MonteCarloDecisionTree_Depth_Prob_DecisionTree... | 0.552451 | 0.193971 | 0.414267 | 0.238587 | 4.678998 | 4.56175 | 0.506926 | 0.300376 | 0.003804 | 0.000314 |
| 5 | !ar4.csv | MonteCarloDecisionTree_Distance_Prob_DecisionT... | 0.442647 | 0.120221 | 0.255833 | 0.214449 | 7.952718 | 4.647213 | 0.281385 | 0.256724 | 0.01127 | 0.001476 |
| 6 | !ar4.csv | MonteCarloDecisionTree_Fix_Prob_DecisionTreeAt... | 0.394363 | 0.200162 | 0.389719 | 0.232867 | 5.22031 | 1.848801 | 0.44632 | 0.283122 | 0.006109 | 0.002429 |
| 7 | !bodyfat.csv | DecisionTree_DecisionTreeAttack | 0.004 | 0.008944 | 0.003846 | 0.0086 | 35.902306 | 0.316063 | 0.003922 | 0.008769 | 0.000149 | 3e-06 |
| 8 | !bodyfat.csv | FeatureSqueezing_DecisionTreeAttack | 0.5 | 0.0 | 0.336819 | 0.003663 | 17.736304 | 0.300742 | 0.507922 | 0.008344 | 0.000144 | 5e-06 |
| 9 | !bodyfat.csv | MonteCarloDecisionTree_Agreement_Prob_Decision... | 0.465662 | 0.094903 | 0.003846 | 0.0086 | 0.9035 | 0.341616 | 0.003922 | 0.008769 | 0.015009 | 0.003162 |
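
Aggregating with a dictionary of `['mean', 'std']` lists yields two-level column labels, which are awkward to read once exported. One possible way to flatten them into single names such as `AUC_mean` is sketched below on a tiny made-up results frame (the values are illustrative only, not results from this notebook).

```python
import pandas as pd

# Made-up per-fold results with the same layout as the results DataFrames.
results = pd.DataFrame(
    {
        "Dataset": ["a.csv"] * 4,
        "Classifier": ["DT", "DT", "MC", "MC"],
        "AUC": [0.6, 0.8, 0.7, 0.9],
        "Accuracy": [0.5, 0.7, 0.6, 0.8],
    }
)

summary = (
    results.groupby(["Dataset", "Classifier"])
    .agg({"AUC": ["mean", "std"], "Accuracy": ["mean", "std"]})
    .reset_index()
)

# Join the two header levels; the grouping keys have an empty second level.
summary.columns = [
    "_".join(parts).rstrip("_") for parts in summary.columns.to_flat_index()
]
print(summary.columns.tolist())
```

Flattened names also survive a round trip through `to_csv` and `read_csv` without a second header row.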


## Performance Evaluation of DT Classifiers and Monte Carlo DT Classifier under Zoo Attacks

In [None]:
# This cell evaluates the robustness of Decision Tree classifiers, including our MonteCarlo DT implementation, against Zoo adversarial attacks using 5-fold cross-validation.
# It trains classifiers, generates adversarial samples, optionally applies defenses, and measures performance metrics.
# The evaluation is executed in parallel for efficiency, and results are aggregated and displayed, providing insights into each classifier's defense against Zoo attacks.

from sklearn.model_selection import StratifiedKFold
from art.defences.preprocessor import FeatureSqueezing, GaussianAugmentation

# MonteCarloDecisionTreeClassifier is imported from TTTS in an earlier cell

# 5-fold cross-validation
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

# List to store evaluation results
results = []

# Define classifiers
classifiers = {
    "DecisionTree_ZooAttack": DecisionTreeClassifier(random_state=123),
    "MonteCarloDecisionTree_Fix_Prob_ZooAttack": MonteCarloDecisionTreeClassifier(
        random_state=123, prob_type="fixed"
    ),
    "MonteCarloDecisionTree_Depth_Prob_ZooAttack": MonteCarloDecisionTreeClassifier(
        random_state=123, prob_type="depth"
    ),
    "MonteCarloDecisionTree_Certainty_Prob_ZooAttack": MonteCarloDecisionTreeClassifier(
        random_state=123, prob_type="certainty"
    ),
    "MonteCarloDecisionTree_Confidence_Prob_ZooAttack": MonteCarloDecisionTreeClassifier(
        random_state=123, prob_type="confidence"
    ),
    "MonteCarloDecisionTree_Distance_Prob_ZooAttack": MonteCarloDecisionTreeClassifier(
        random_state=123, prob_type="distance"
    ),
    "FeatureSqueezing_ZooAttack": DecisionTreeClassifier(random_state=123),
    #   'GaussianAugmentation_ZooAttack': DecisionTreeClassifier(random_state=123),
}


def evaluate_classifier(dataset_name, dataset, clf_name, clf, train_index, test_index):
    X, y = dataset.data.fillna(0).values, dataset.target
    X_train, X_test = X[train_index], X[test_index]
    y_train, y_test = y[train_index], y[test_index]

    # Train classifier
    clf.fit(X_train, y_train)

    if clf_name != "DecisionTree_ZooAttack":
        dummy_clf = DecisionTreeClassifier(random_state=123)
        dummy_clf.fit(X_train, y_train)
        classifier = SklearnClassifier(model=dummy_clf, use_logits=True)
    else:
        classifier = SklearnClassifier(model=clf, use_logits=True)

    attack = ZooAttack(
        classifier=classifier,
        confidence=0.0,
        targeted=False,
        learning_rate=1e-1,
        max_iter=5,
        binary_search_steps=5,
        initial_const=1e-3,
        abort_early=True,
        use_resize=False,
        use_importance=False,
        nb_parallel=2,
        batch_size=1,
        variable_h=0.2,
    )
    x_test_adv = attack.generate(x=X_test)

    if clf_name == "FeatureSqueezing_ZooAttack":
        # Initialize the feature squeezing defence
        defence = FeatureSqueezing(
            clip_values=(X_train.min(), X_train.max()), bit_depth=4
        )

        # Fit the defence with training data
        defence.fit(X_train)

        # Apply the defence on testing data
        x_test_adv = defence(x_test_adv)[0]

    if clf_name == "GaussianAugmentation_ZooAttack":
        # Initialize the Gaussian Augmentation defence
        defence = GaussianAugmentation(sigma=1.0)

        # Apply the Gaussian Augmentation on training data
        X_train_augmented, y_train_augmented = defence(X_train, y_train)

        # Retrain the model on the augmented data
        clf.fit(X_train_augmented, y_train_augmented)

    start_time = time.time()
    pred_probs = clf.predict_proba(x_test_adv)
    preds = clf.predict(x_test_adv)
    runtime = time.time() - start_time

    # Evaluate
    auc = roc_auc_score(y_test, pred_probs[:, 1], multi_class="ovr")
    f1 = f1_score(y_test, preds, average="macro")
    logloss = log_loss(y_test, pred_probs)
    accuracy = accuracy_score(y_test, preds)

    return [dataset_name, clf_name, auc, f1, logloss, accuracy, runtime]


# Loop through datasets
all_tasks = []

for dataset_name, dataset in datasets:
    print(dataset_name)
    X, y = dataset.data.fillna(0).values, dataset.target
    try:
        # Perform 5-fold cross-validation
        for clf_name, clf in classifiers.items():
            for i, (train_index, test_index) in enumerate(skf.split(X, y)):
                task = delayed(evaluate_classifier)(
                    dataset_name, dataset, clf_name, clf, train_index, test_index
                )
                all_tasks.append(task)
    except Exception as e:
        print(f"Error in {clf_name} on {dataset_name}: {e}")

# Execute all tasks in parallel
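try:
    results = Parallel(n_jobs=-1)(all_tasks)
except Exception as e:
    # A failure in any single task aborts the whole batch, so the error cannot be
    # attributed to one classifier or dataset here, and results stays empty.
    print(f"Error while running the ZooAttack evaluation tasks: {e}")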

# Convert results to DataFrame for easy visualization
results_df_ZooAttack = pd.DataFrame(
    results,
    columns=["Dataset", "Classifier", "AUC", "F1", "LogLoss", "Accuracy", "Runtime"],
)
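
When a single fold fails inside a parallel batch, the exception propagates and the results of every other task are lost. One possible mitigation, sketched here with hypothetical helper names and a toy evaluation function, is to catch errors inside each task and return them alongside the results.

```python
def safe_call(func, *args, **kwargs):
    """Run func and return (result, None), or (None, error message) on failure."""
    try:
        return func(*args, **kwargs), None
    except Exception as exc:
        return None, f"{type(exc).__name__}: {exc}"


def toy_evaluate(dataset_name, value):
    """Hypothetical stand-in for evaluate_classifier that fails on bad input."""
    if value < 0:
        raise ValueError(f"negative input for {dataset_name}")
    return [dataset_name, value ** 0.5]


tasks = [("a.csv", 4.0), ("b.csv", -1.0), ("c.csv", 9.0)]
outcomes = [safe_call(toy_evaluate, name, value) for name, value in tasks]

# Keep successful rows for the DataFrame and report failures separately.
results = [row for row, err in outcomes if err is None]
errors = [err for row, err in outcomes if err is not None]
print(results)
print(errors)
```

With joblib, the same wrapper could be scheduled as `delayed(safe_call)(evaluate_classifier, ...)`, so that a failing dataset is reported without discarding the remaining folds.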


In [16]:
summary_ZooAttack = results_df_ZooAttack.groupby(['Dataset','Classifier']).agg({
    'AUC': ['mean', 'std'],
    'F1': ['mean', 'std'],
    'LogLoss': ['mean', 'std'],
    'Accuracy': ['mean', 'std'],
    'Runtime': ['mean', 'std']
}).reset_index()

print(summary_ZooAttack[0:30])

                         Dataset  \
                                   
0                       !ar4.csv   
1                       !ar4.csv   
2                       !ar4.csv   
3                       !ar4.csv   
4                       !ar4.csv   
5                       !ar4.csv   
6                       !ar4.csv   
7                   !bodyfat.csv   
8                   !bodyfat.csv   
9                   !bodyfat.csv   
10                  !bodyfat.csv   
11                  !bodyfat.csv   
12                  !bodyfat.csv   
13                  !bodyfat.csv   
14  AcousticExtinguisherFire.csv   
15  AcousticExtinguisherFire.csv   
16  AcousticExtinguisherFire.csv   
17  AcousticExtinguisherFire.csv   
18  AcousticExtinguisherFire.csv   
19  AcousticExtinguisherFire.csv   
20  AcousticExtinguisherFire.csv   
21   Kaggle_Surgical-deepnet.csv   
22   Kaggle_Surgical-deepnet.csv   
23   Kaggle_Surgical-deepnet.csv   
24   Kaggle_Surgical-deepnet.csv   
25   Kaggle_Surgical-deepnet

In [9]:
## Performance Evaluation of DT Classifiers and Monte Carlo DT Classifier without Attacks (Optional FeatureSqueezing and GaussianAugmentation Defenses)

In [11]:
# This cell evaluates Decision Tree classifiers, including our MonteCarlo DT implementation, on the unperturbed test folds using 5-fold
# cross-validation. No adversarial samples are generated here; FeatureSqueezing (and, if enabled, GaussianAugmentation) is applied as an optional defense before measuring performance metrics.
# The evaluation is executed in parallel for efficiency, and results are collected in results_df_wo as a no-attack reference for the DecisionTreeAttack and Zoo experiments.

from sklearn.model_selection import StratifiedKFold
from art.defences.preprocessor import FeatureSqueezing, GaussianAugmentation

# MonteCarloDecisionTreeClassifier is imported from TTTS in an earlier cell

# 5-fold cross-validation
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

# List to store evaluation results
results = []

# Define classifiers
classifiers = {
    "DecisionTree": DecisionTreeClassifier(random_state=123),
    "MonteCarloDecisionTree_Fix_Prob": MonteCarloDecisionTreeClassifier(
        random_state=123, prob_type="fixed"
    ),
    "MonteCarloDecisionTree_Depth_Prob": MonteCarloDecisionTreeClassifier(
        random_state=123, prob_type="depth"
    ),
    "MonteCarloDecisionTree_Certainty_Prob": MonteCarloDecisionTreeClassifier(
        random_state=123, prob_type="certainty"
    ),
    "MonteCarloDecisionTree_Confidence": MonteCarloDecisionTreeClassifier(
        random_state=123, prob_type="confidence"
    ),
    "MonteCarloDecisionTree_Distance_Prob": MonteCarloDecisionTreeClassifier(
        random_state=123, prob_type="distance"
    ),
    "FeatureSqueezing": DecisionTreeClassifier(random_state=123),
    #  'GaussianAugmentation': DecisionTreeClassifier(random_state=123),
}


def evaluate_classifier(dataset_name, dataset, clf_name, clf, train_index, test_index):
    X, y = dataset.data.fillna(0).values, dataset.target
    X_train, X_test = X[train_index], X[test_index]
    y_train, y_test = y[train_index], y[test_index]

    # Train classifier
    clf.fit(X_train, y_train)

    if clf_name != "DecisionTree":
        dummy_clf = DecisionTreeClassifier(random_state=123)
        dummy_clf.fit(X_train, y_train)
        classifier = SklearnClassifier(model=dummy_clf, use_logits=True)
    else:
        classifier = SklearnClassifier(model=clf, use_logits=True)

    # No attack is run in this cell: the "adversarial" input is the clean test fold.
    x_test_adv = X_test

    if clf_name == "FeatureSqueezing":
        # Initialize the feature squeezing defence
        defence = FeatureSqueezing(
            clip_values=(X_train.min(), X_train.max()), bit_depth=4
        )

        # Fit the defence with training data
        defence.fit(X_train)

        # Apply the defence on testing data
        x_test_adv = defence(x_test_adv)[0]

    if clf_name == "GaussianAugmentation":
        # Initialize the Gaussian Augmentation defence
        defence = GaussianAugmentation(sigma=1.0)

        # Apply the Gaussian Augmentation on training data
        X_train_augmented, y_train_augmented = defence(X_train, y_train.to_numpy())

        # Retrain the model on the augmented data (the defence already returns NumPy arrays)
        clf.fit(X_train_augmented, y_train_augmented)

    start_time = time.time()
    pred_probs = clf.predict_proba(x_test_adv)
    preds = clf.predict(x_test_adv)
    runtime = time.time() - start_time

    # Evaluate
    auc = roc_auc_score(y_test, pred_probs[:, 1], multi_class="ovr")
    f1 = f1_score(y_test, preds, average="macro")
    logloss = log_loss(y_test, pred_probs)
    accuracy = accuracy_score(y_test, preds)

    return [dataset_name, clf_name, auc, f1, logloss, accuracy, runtime]


# Loop through datasets
all_tasks = []

for dataset_name, dataset in datasets:
    print(dataset_name)
    X, y = dataset.data.fillna(0).values, dataset.target

    # Perform 5-fold cross-validation
    for clf_name, clf in classifiers.items():
        for i, (train_index, test_index) in enumerate(skf.split(X, y)):
            task = delayed(evaluate_classifier)(
                dataset_name, dataset, clf_name, clf, train_index, test_index
            )
            all_tasks.append(task)

# Execute all tasks in parallel
results = Parallel(n_jobs=-1)(all_tasks)

# Convert results to DataFrame for easy visualization
results_df_wo = pd.DataFrame(
    results,
    columns=["Dataset", "Classifier", "AUC", "F1", "LogLoss", "Accuracy", "Runtime"],
)


!ar4.csv
!bodyfat.csv
Kaggle_Surgical-deepnet.csv
MaternalBinary.csv
OPENML_philippine.csv
AcousticExtinguisherFire.csv
acute-inflammation.csv
acute-nephritis.csv
backache.csv
blood.csv
chess-krvkp.csv
cloud.csv
congressional-voting.csv
credit-approval.csv
dresses-salesN.csv
echocardiogram.csv
haberman-survival.csv
heart_failure_clinical_records_dataset.csv
heart-hungarian.csv
hill-valley.csv
horse-colic.csv
ilpd-indian-liver.csv
no2.csv
kaggle_REWEMA.csv
lowbwt.csv
madelon.csv
Mesothelioma.csv
MIMIC2.csv
molec-biol-promoter.csv
oil_spill.csv
oocytes_merluccius_nucleus_4d.csv
oocytes_trisopterus_nucleus_2f.csv
ozone.csv
Parkinson_Multiple_Sound_Recording.csv
PC1 Software defect prediction.csv
pd_speech_features.csv
pima.csv
Pistachio_28_Features_Dataset.csv
plasma_retinol.csv
primary-tumorNumeric.csv
seismic-bumps.csv
sleuth_case2002.csv
spambase.csv
spect.csv
spectf.csv
statlog-australian-credit.csv
statlog-heart_.csv
ThoraricSurgery.csv
triazines.csv
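
The Gaussian augmentation variant is disabled in the classifier dictionaries above. As a rough sketch of what such a defense does, the snippet below appends noisy copies of randomly chosen training rows to the training set. `gaussian_augment` and its `ratio` and `seed` parameters are illustrative assumptions, not the ART API.

```python
import numpy as np


def gaussian_augment(X, y, sigma=1.0, ratio=1.0, seed=0):
    """Append int(len(X) * ratio) noisy copies of randomly chosen rows."""
    rng = np.random.default_rng(seed)
    n_extra = int(len(X) * ratio)
    idx = rng.integers(0, len(X), size=n_extra)
    # Perturb the sampled rows with zero-mean Gaussian noise.
    X_noisy = X[idx] + rng.normal(0.0, sigma, size=X[idx].shape)
    # Original rows come first; noisy copies keep the label of their source row.
    return np.vstack([X, X_noisy]), np.concatenate([y, y[idx]])


X = np.array([[0.0, 1.0], [2.0, 3.0], [4.0, 5.0]])
y = np.array([0, 1, 0])
X_aug, y_aug = gaussian_augment(X, y, sigma=0.1)
print(X_aug.shape, y_aug.shape)
```

Both outputs are plain NumPy arrays, so they can be passed directly to `fit`. Note that a fixed `sigma` means very different things for features on different scales, which may matter for unscaled tabular data.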


In [12]:
summary_wo = results_df_wo.groupby(['Dataset','Classifier']).agg({
    'AUC': ['mean', 'std'],
    'F1': ['mean', 'std'],
    'LogLoss': ['mean', 'std'],
    'Accuracy': ['mean', 'std'],
    'Runtime': ['mean', 'std']
}).reset_index()

print(summary_wo[0:30])

                         Dataset                             Classifier  \
                                                                          
0                       !ar4.csv                           DecisionTree   
1                       !ar4.csv                       FeatureSqueezing   
2                       !ar4.csv  MonteCarloDecisionTree_Certainty_Prob   
3                       !ar4.csv      MonteCarloDecisionTree_Confidence   
4                       !ar4.csv      MonteCarloDecisionTree_Depth_Prob   
5                       !ar4.csv   MonteCarloDecisionTree_Distance_Prob   
6                       !ar4.csv        MonteCarloDecisionTree_Fix_Prob   
7                   !bodyfat.csv                           DecisionTree   
8                   !bodyfat.csv                       FeatureSqueezing   
9                   !bodyfat.csv  MonteCarloDecisionTree_Certainty_Prob   
10                  !bodyfat.csv      MonteCarloDecisionTree_Confidence   
11                  !body

In [14]:
pd.concat([summary_wo, summary_dtattack, summary_ZooAttack]).sort_values(by='Dataset').to_csv('50ds_summary.csv',index=False)

##### The End