Tutorial for MMD with the TorchDrift library: https://towardsai.net/p/machine-learning/drift-detection-using-torchdrift-for-tabular-and-time-series-data

More documentation on TorchDrift MMD: https://torchdrift.org/notebooks/note_on_mmd.html

In [11]:
import pandas as pd
import torch
import numpy as np
import os
import torchdrift.detectors as detectors

##### On the (Statistical) Detection of Adversarial Examples

**Two-sample hypothesis testing**: As stated before, the test we chose is appropriate to handle high dimensional inputs and small sample sizes. We compute the biased estimate of MMD using a **Gaussian kernel**, and then apply **10 000 bootstrapping iterations** to estimate the distributions. Based on this, we compute the **p-value** and compare it to the threshold, in our experiments **0.05**. For samples of **legitimate data, the observed p-value should always be very high**, whereas for sample sets containing adversarial examples, we expect it to be low, since they are sampled from a different distribution and thus the hypothesis should be rejected. The test is more likely to detect a difference in two distributions when it considers samples of large size (i.e., the sample contains more inputs from the distribution).
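
To make the procedure in the quote concrete, here is a minimal NumPy sketch of a biased MMD² estimate with a Gaussian kernel, followed by a permutation test. It is not TorchDrift's implementation. The function names, the fixed `bandwidth=1.0` default, and the `>=` comparison used for the p-value are all assumptions made for illustration.

```python
import numpy as np


def gaussian_kernel_matrix(a, b, bandwidth):
    """Return k(a_i, b_j) = exp(-||a_i - b_j||^2 / (2 * bandwidth^2))."""
    sq_dists = (
        np.sum(a ** 2, axis=1)[:, None]
        + np.sum(b ** 2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    sq_dists = np.maximum(sq_dists, 0.0)  # guard against tiny negative round-off
    return np.exp(-sq_dists / (2.0 * bandwidth ** 2))


def biased_mmd2(k, n):
    """Biased MMD^2 from a pooled kernel matrix whose first n rows belong to x."""
    k_xx = k[:n, :n]
    k_yy = k[n:, n:]
    k_xy = k[:n, n:]
    return k_xx.mean() + k_yy.mean() - 2.0 * k_xy.mean()


def mmd_permutation_test(x, y, n_perm=1000, bandwidth=1.0, seed=0):
    """Return (observed MMD^2, p-value) for the null hypothesis 'same distribution'."""
    rng = np.random.default_rng(seed)
    pooled = np.concatenate([x, y], axis=0)
    n = len(x)
    # The kernel matrix is computed once; each permutation only reindexes it.
    k = gaussian_kernel_matrix(pooled, pooled, bandwidth)
    observed = biased_mmd2(k, n)
    exceed = 0
    for _ in range(n_perm):
        perm = rng.permutation(len(pooled))
        if biased_mmd2(k[np.ix_(perm, perm)], n) >= observed:
            exceed += 1
    # Fraction of relabelled statistics at least as large as the observed one.
    return observed, exceed / n_perm
```

Calling `mmd_permutation_test(x, y)` with two arrays of shape `(n_samples, n_features)` returns the statistic and a p-value that can be compared against the 0.05 threshold. In practice the bandwidth is usually chosen from the data rather than fixed.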

In [12]:
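BOOTSTRAP = 10_000  # number of permutations used to estimate the null distribution
PVAL = 0.05  # significance threshold for rejecting the null hypothesis
kernel = detectors.mmd.GaussianKernel()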

Because our dataset is a time series, we will compute MMD on different time segments rather than shuffling the dataset.
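
One way to form such time segments is to cut the row index into contiguous windows so that temporal order is preserved. The helper below is a hypothetical sketch; `contiguous_windows`, `window_len`, and `step` are illustrative names, not part of TorchDrift or pandas.

```python
def contiguous_windows(n_rows, window_len, step=None):
    """Return (start, stop) row-index pairs for time-ordered windows.

    With step=None the windows do not overlap; a smaller step gives
    overlapping (sliding) windows. A trailing partial window is dropped.
    """
    if step is None:
        step = window_len
    return [
        (start, start + window_len)
        for start in range(0, n_rows - window_len + 1, step)
    ]
```

Each pair can then be used to slice a DataFrame, for example `df.iloc[start:stop]`, and the resulting segment can be converted to a tensor and compared against a reference segment with `detectors.kernel_mmd`.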

Load unperturbed observations from untargeted adversarial attack

In [13]:
df_adv_obs = pd.read_csv('run 0 obs 1.csv', 
                        header=None,
                        dtype='float32')
# All data is loaded as float32, but the index should be an int.
df_adv_obs.set_index(df_adv_obs.index.astype(int), inplace=True)

Load perturbed observations from untargeted adversarial attack (100% adversarial)

In [14]:
df_adv_perturbed_obs = pd.read_csv('run 0 adv_obs 1.csv', 
                        header=None,
                        dtype='float32')
df_adv_perturbed_obs.set_index(df_adv_perturbed_obs.index.astype(int), inplace=True) 

Here we compute the MMD between two full distributions from evaluation: the observations from the environment and the same observations after they have been perturbed by BB.

In [15]:
clean_obs = torch.from_numpy(df_adv_obs.values).to('cuda')  # clean obs from adv trace
perturbed_obs = torch.from_numpy(df_adv_perturbed_obs.dropna().values).to('cuda')  # perturbed obs from adv trace
result = detectors.kernel_mmd(clean_obs, perturbed_obs, n_perm=BOOTSTRAP, kernel=kernel)
print(f'mmd:{result[0]}, p-value:{result[1]}')

mmd:0.0012357234954833984, p-value:0.0
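
A p-value of exactly 0.0 from a permutation test is best read as "below the resolution of the test" rather than as a true zero. The sketch below assumes that the p-value is the fraction of permuted statistics exceeding the observed one, so the smallest nonzero value it can take is `1 / n_perm`. The helper `drift_decision` and its argument names are hypothetical.

```python
def drift_decision(p_value, n_perm, alpha=0.05):
    """Return (reject_null, p_value_upper_bound) for a permutation-test p-value."""
    # Smallest nonzero p-value that n_perm permutations can resolve.
    resolution = 1.0 / n_perm
    # Reject "same distribution" when the p-value falls below the threshold.
    reject_null = p_value < alpha
    # Report 0.0 as "at most 1 / n_perm" instead of an exact zero.
    return reject_null, max(p_value, resolution)
```

With the values above, `drift_decision(0.0, 10_000, alpha=0.05)` rejects the null hypothesis and reports the p-value as at most 1/10 000.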
