Tutorial for MMD with the TorchDrift library: https://towardsai.net/p/machine-learning/drift-detection-using-torchdrift-for-tabular-and-time-series-data

More documentation on TorchDrift MMD: https://torchdrift.org/notebooks/note_on_mmd.html

In [1]:
RED = "\033[91m"
AUTO = "\033[0m"

In [2]:
import pandas as pd
import torch
import numpy as np
import os
import torchdrift.detectors as detectors
from joblib import Parallel, delayed

In [3]:
baseline_path = os.path.join(os.getcwd(), '..','20 bin PPO 500 results/baseline_obs.csv')
baseline_path = os.path.normpath(baseline_path) #resolve '..'
df_baseline_obs = pd.read_csv(baseline_path, index_col=0, dtype='float32')
df_baseline_obs.set_index(df_baseline_obs.index.astype(int), inplace=True) #line above makes the index a float32

##### On the (Statistical) Detection of Adversarial Examples

**Two-sample hypothesis testing** — As stated before, the test we chose is appropriate to handle high dimensional inputs and small sample sizes. We compute the biased estimate of MMD using a **Gaussian kernel**, and then apply **10 000 bootstrapping iterations** to estimate the distributions. Based on this, we compute the **pvalue** and compare it to the threshold, in our experiments **0.05**. For samples of **legitimate data, the observed p-value should always be very high**, whereas for sample sets containing adversarial examples, we expect it to be low—since they are sampled from a different distribution and thus the hypothesis should be rejected. The test is more likely to detect a difference in two distributions when it considers samples of large size (i.e., the sample contains more inputs from the distribution).
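
To make the quoted procedure concrete, here is a minimal NumPy sketch of a biased MMD² estimate with a Gaussian kernel and a resampling-based p-value. It is not TorchDrift's implementation: the function name, the median-heuristic bandwidth, and the default of 1,000 resamples are assumptions made for illustration, and it permutes the pooled samples rather than bootstrapping them.

```python
import numpy as np

def mmd_permutation_test(x, y, n_perm=1000, sigma=None, seed=0):
    """Biased MMD^2 with a Gaussian kernel plus a permutation p-value.

    Illustrative sketch only; names and defaults are assumptions.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    n = len(x)
    pooled = np.concatenate([x, y], axis=0)

    # Pairwise squared Euclidean distances over the pooled sample.
    sq_norms = np.sum(pooled**2, axis=1)
    d2 = sq_norms[:, None] + sq_norms[None, :] - 2.0 * pooled @ pooled.T
    d2 = np.maximum(d2, 0.0)  # clip tiny negatives from rounding

    if sigma is None:
        # Median heuristic for the bandwidth (an assumption of this sketch).
        positive = d2[d2 > 0]
        sigma = np.sqrt(np.median(positive)) if positive.size else 1.0

    k = np.exp(-d2 / (2.0 * sigma**2))

    def mmd2(idx):
        # Biased (V-statistic) estimate: mean k(x,x) + mean k(y,y) - 2 mean k(x,y).
        ix, iy = idx[:n], idx[n:]
        return (
            k[np.ix_(ix, ix)].mean()
            + k[np.ix_(iy, iy)].mean()
            - 2.0 * k[np.ix_(ix, iy)].mean()
        )

    observed = mmd2(np.arange(len(pooled)))
    # Null distribution: reassign pooled rows to the two groups at random.
    null = np.array([mmd2(rng.permutation(len(pooled))) for _ in range(n_perm)])
    p_value = float(np.mean(null >= observed))
    return float(observed), p_value
```

The kernel matrix is computed once and only the group assignment is reshuffled, which keeps each resample cheap. A p-value below the chosen threshold (0.05 in the quoted experiments) would lead to rejecting the hypothesis that both samples come from the same distribution.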

In [4]:
BOOTSTRAP = 10_000
PVAL = 0.05
kernel = detectors.mmd.GaussianKernel()

Load unperturbed observations from untargeted adversarial attack

In [5]:
df_adv_obs = pd.read_csv(os.path.normpath(os.path.join(os.getcwd(), '..','20 bin PPO 500 results/adv_obs.csv')), #navigate to another folder in parent dir
                        index_col=0,
                        dtype='float32')
df_adv_obs.set_index(df_adv_obs.index.astype(int), inplace=True) #all data is loaded as float32, but the index should be an int

Load perturbed observations from untargeted adversarial attack (100% adversarial)

In [6]:
df_adv_perturbed_obs = pd.read_csv(os.path.normpath(os.path.join(os.getcwd(), '..','20 bin PPO 500 results/adv_perturbed_obs.csv')), #navigate to another folder in parent dir
                        index_col=0,
                        dtype='float32')
df_adv_perturbed_obs.set_index(df_adv_perturbed_obs.index.astype(int), inplace=True) #all data is loaded as float32, but the index should be an int

Here we get the MMD between two full distributions from evaluation: the observations from the environment and the same observations after perturbation by ACG.

In [9]:
result = detectors.kernel_mmd(torch.from_numpy(df_adv_obs.values).to('cuda'), #clean obs from adv trace
                                  torch.from_numpy(df_adv_perturbed_obs.values).to('cuda'), #perturbed obs from adv trace
                                  n_perm=BOOTSTRAP,
                                  kernel=kernel)
print(f'mmd:{result[0]}, p-value:{result[1]}')

mmd:0.00014674663543701172, p-value:1.0


MMD sees no difference between the perturbed and unperturbed distributions! The MMD between these two distributions is smaller than the MMD between segments of the baseline distribution. Would the result be different if the min/max normalization were undone?
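
The question about undoing the min/max normalization could be explored with a sketch like the one below. It assumes the observations were scaled feature-wise to [0, 1] as (x - min) / (max - min). `undo_min_max`, `feature_min`, and `feature_max` are hypothetical names, and the real bounds would have to come from wherever the normalization was applied.

```python
import numpy as np

def undo_min_max(x_norm, feature_min, feature_max):
    """Invert feature-wise min/max scaling: x = x_norm * (max - min) + min."""
    x_norm = np.asarray(x_norm, dtype=np.float64)
    lo = np.asarray(feature_min, dtype=np.float64)
    hi = np.asarray(feature_max, dtype=np.float64)
    return x_norm * (hi - lo) + lo

# Toy round trip with made-up bounds (not values from this notebook's data).
raw = np.array([[0.0, 10.0], [5.0, 30.0], [10.0, 20.0]])
lo, hi = raw.min(axis=0), raw.max(axis=0)
normalized = (raw - lo) / (hi - lo)
restored = undo_min_max(normalized, lo, hi)
```

The restored arrays could then be passed to `detectors.kernel_mmd` in the same way as the normalized ones to see whether the p-value changes.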

In [7]:
result = detectors.kernel_mmd(torch.from_numpy(df_baseline_obs.values).to('cuda'), #clean obs from clean trace
                                  torch.from_numpy(df_adv_perturbed_obs.values).to('cuda'),#perturbed obs from adv trace
                                  n_perm=BOOTSTRAP,
                                  kernel=kernel)
print(f'mmd:{result[0]:.4f}, p-value:{result[1]}')

mmd:0.0002, p-value:0.8951999545097351


In [9]:
df_bb_obs = pd.read_csv(os.path.normpath(os.path.join(os.getcwd(), '..','20 bin PPO 500 results/bb results/bb obs.csv')), #navigate to another folder in parent dir
                        index_col=0,
                        dtype='float32')
df_bb_obs.set_index(df_bb_obs.index.astype(int), inplace=True) #all data is loaded as float32, but the index should be an int

Load perturbed observations from untargeted adversarial attack (100% adversarial)

In [10]:
df_bb_perturbed_obs = pd.read_csv(os.path.normpath(os.path.join(os.getcwd(), '..','20 bin PPO 500 results/bb results/clean obs.csv')), #navigate to another folder in parent dir
                        index_col=0,
                        dtype='float32')
df_bb_perturbed_obs.set_index(df_bb_perturbed_obs.index.astype(int), inplace=True) #all data is loaded as float32, but the index should be an int

It is strange that bb is detected when ACG is not, given that the bb perturbation norm is smaller:

In [11]:
result = detectors.kernel_mmd(torch.from_numpy(df_baseline_obs.values).to('cuda'), #clean obs from clean trace
                                  torch.from_numpy(df_bb_perturbed_obs.values).to('cuda'), #perturbed obs from bb trace
                                  n_perm=BOOTSTRAP,
                                  kernel=kernel)
print(f'mmd:{result[0]:.4f}, p-value:{result[1]}')

mmd:0.0006, p-value:0.0


In [12]:
result = detectors.kernel_mmd(torch.from_numpy(df_bb_obs.values).to('cuda'), #unperturbed obs from bb trace
                                  torch.from_numpy(df_bb_perturbed_obs.values).to('cuda'), #perturbed obs from bb trace
                                  n_perm=BOOTSTRAP,
                                  kernel=kernel)
print(f'mmd:{result[0]:.4f}, p-value:{result[1]}')

mmd:0.0006, p-value:0.0
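
The remark that the bb norm is smaller than ACG's could be checked directly from the paired observations. The sketch below assumes the rows of the clean and perturbed arrays are aligned step by step. `perturbation_norms` is a hypothetical helper, not part of TorchDrift.

```python
import numpy as np

def perturbation_norms(clean, perturbed):
    """Mean per-row L2 and L-infinity norms of (perturbed - clean)."""
    delta = np.asarray(perturbed, dtype=np.float64) - np.asarray(clean, dtype=np.float64)
    return {
        "l2_mean": float(np.linalg.norm(delta, ord=2, axis=1).mean()),
        "linf_mean": float(np.abs(delta).max(axis=1).mean()),
    }

# Toy example with made-up values (not taken from the notebook's CSV files).
toy_clean = np.zeros((2, 2))
toy_perturbed = np.array([[3.0, 4.0], [0.0, 0.0]])
toy_norms = perturbation_norms(toy_clean, toy_perturbed)
```

Calling it once with `df_adv_obs.values` and `df_adv_perturbed_obs.values`, and once with the bb pair, would put the two attacks' perturbation sizes side by side.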


In [16]:
df_rebaseline = pd.read_csv(os.path.normpath(os.path.join(os.getcwd(), '..','20 bin PPO 500 results/rebaseline obs.csv')), #navigate to another folder in parent dir
                        #index_col=0,
                        dtype='float32')
df_rebaseline.set_index(df_rebaseline.index.astype(int), inplace=True) #all data is loaded as float32, but the index should be an int

In [17]:
result = detectors.kernel_mmd(torch.from_numpy(df_rebaseline.values).to('cuda'), #clean obs from clean trace
                                  torch.from_numpy(df_bb_perturbed_obs.values).to('cuda'), #perturbed obs from bb trace
                                  n_perm=BOOTSTRAP,
                                  kernel=kernel)
print(f'mmd:{result[0]:.4f}, p-value:{result[1]}')

mmd:0.0006, p-value:0.0


Since the previous action is part of the state, we can check whether removing the electrical_storage_soc feature changes this result.

In [20]:
df_baseline_obs.columns.get_loc('electrical_storage_soc')

25

In [27]:
result = detectors.kernel_mmd(torch.from_numpy(df_baseline_obs.drop(columns='electrical_storage_soc').values).to('cuda'), #clean obs from clean trace, without electrical_storage_soc
                                  torch.from_numpy(df_bb_perturbed_obs.drop(columns='25').values).to('cuda'), #perturbed obs from bb trace; assumes column '25' is electrical_storage_soc (position 25 per get_loc above)
                                  n_perm=BOOTSTRAP,
                                  kernel=kernel)
print(f'mmd:{result[0]:.4f}, p-value:{result[1]}')

mmd:0.0001, p-value:1.0


Without accounting for actions, these appear to be drawn from the same distribution. So it seems that the actions taken are what separate the two, and MMD is not detecting the actual adversarial perturbations.
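
One way to follow up on this observation is to rank features by how far their marginal distributions differ between two samples, which may show whether a single feature drives the MMD result. The sketch below uses a standardized mean difference per column. `rank_features_by_shift` is a hypothetical helper, and this score is a simple stand-in, not something MMD itself computes.

```python
import numpy as np

def rank_features_by_shift(reference, candidate, names=None):
    """Rank columns by |mean difference| / pooled standard deviation."""
    reference = np.asarray(reference, dtype=np.float64)
    candidate = np.asarray(candidate, dtype=np.float64)
    pooled_std = np.sqrt(0.5 * (reference.var(axis=0) + candidate.var(axis=0)))
    # Avoid division by zero for constant features.
    pooled_std = np.where(pooled_std > 0, pooled_std, 1.0)
    score = np.abs(reference.mean(axis=0) - candidate.mean(axis=0)) / pooled_std
    order = np.argsort(score)[::-1]  # largest shift first
    if names is None:
        names = list(range(reference.shape[1]))
    return [(names[i], float(score[i])) for i in order]

# Synthetic example: only column 1 is shifted between the two samples.
rng = np.random.default_rng(0)
ref = rng.normal(size=(200, 3))
cand = rng.normal(size=(200, 3))
cand[:, 1] += 2.0
ranking = rank_features_by_shift(ref, cand)
```

On the notebook's data, passing `df_baseline_obs.values` and `df_bb_perturbed_obs.values` (with `names=list(df_baseline_obs.columns)`, assuming both frames share the same column order) would indicate whether electrical_storage_soc stands out.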