Tutorial for MMD with the TorchDrift library: https://towardsai.net/p/machine-learning/drift-detection-using-torchdrift-for-tabular-and-time-series-data

More documentation on TorchDrift MMD: https://torchdrift.org/notebooks/note_on_mmd.html

In [1]:
RED = "\033[91m"
AUTO = "\033[0m"

In [41]:
import pandas as pd
import torch
import numpy as np
import os
import torchdrift.detectors as detectors
from joblib import Parallel, delayed

In [3]:
baseline_path = os.path.join(os.getcwd(), '..','20 bin PPO 500 results/baseline_obs.csv')
baseline_path = os.path.normpath(baseline_path) #resolve '..'
df_baseline_obs = pd.read_csv(baseline_path, index_col=0, dtype='float32')
df_baseline_obs.set_index(df_baseline_obs.index.astype(int), inplace=True) #line above makes the index a float32

##### On the (Statistical) Detection of Adversarial Examples

**Two-sample hypothesis testing** — As stated before, the test we chose is appropriate to handle high dimensional inputs and small sample sizes. We compute the biased estimate of MMD using a **Gaussian kernel**, and then apply **10 000 bootstrapping iterations** to estimate the distributions. Based on this, we compute the **pvalue** and compare it to the threshold, in our experiments **0.05**. For samples of **legitimate data, the observed p-value should always be very high**, whereas for sample sets containing adversarial examples, we expect it to be low—since they are sampled from a different distribution and thus the hypothesis should be rejected. The test is more likely to detect a difference in two distributions when it considers samples of large size (i.e., the sample contains more inputs from the distribution).
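To make the quoted procedure concrete, here is a minimal NumPy sketch of a biased MMD² estimate with a Gaussian kernel, followed by a permutation test for the p-value. It is an illustration only, not TorchDrift's implementation: the fixed bandwidth `sigma`, the function names, and the toy Gaussian data are all assumptions, and TorchDrift's `GaussianKernel` may choose its bandwidth differently.

```python
import numpy as np

def gaussian_kernel(a, b, sigma):
    """Gaussian (RBF) kernel matrix between the rows of a and the rows of b."""
    sq_dists = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dists / (2.0 * sigma ** 2))

def biased_mmd2(x, y, sigma=1.0):
    """Biased (V-statistic) estimate of the squared MMD between samples x and y."""
    return (gaussian_kernel(x, x, sigma).mean()
            + gaussian_kernel(y, y, sigma).mean()
            - 2.0 * gaussian_kernel(x, y, sigma).mean())

def permutation_pvalue(x, y, n_perm=200, sigma=1.0, seed=0):
    """Return (observed MMD^2, p-value).

    The p-value is the fraction of random re-splits of the pooled data whose
    MMD^2 is at least as large as the observed one.
    """
    rng = np.random.default_rng(seed)
    observed = biased_mmd2(x, y, sigma)
    pooled = np.concatenate([x, y])
    n = len(x)
    count = 0
    for _ in range(n_perm):
        perm = rng.permutation(len(pooled))
        if biased_mmd2(pooled[perm[:n]], pooled[perm[n:]], sigma) >= observed:
            count += 1
    return observed, count / n_perm

# Toy data (hypothetical, not the notebook's observations)
rng = np.random.default_rng(0)
reference = rng.normal(size=(100, 3))
shifted = rng.normal(loc=1.0, size=(100, 3))  # clearly different distribution

mmd_shift, p_shift = permutation_pvalue(reference, shifted)
mmd_self, p_self = permutation_pvalue(reference, reference)  # a sample against itself
```

Comparing a sample against itself gives an observed MMD² of zero, so every re-split is at least as large and the p-value is 1. The shifted sample yields a much larger MMD² and a small p-value.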

In [4]:
BOOTSTRAP = 10_000
PVAL = 0.05
kernel = detectors.mmd.GaussianKernel()

Because our dataset is a time series, we will use MMD on different time segments rather than shuffling the dataset
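A minimal sketch of the intended pairing, assuming each contiguous segment is compared with the segment that follows it. The helper name `consecutive_segment_pairs` and the toy array are hypothetical; the point is only that `np.array_split` preserves row order, and that zipping the segment list with itself shifted by one yields the (i, i+1) pairs.

```python
import numpy as np

def consecutive_segment_pairs(data, n_segments):
    """Split rows into contiguous segments (time order preserved) and pair
    each segment with its immediate successor."""
    segments = np.array_split(data, n_segments)
    return list(zip(segments[:-1], segments[1:]))

# Toy stand-in for time-ordered observations: 10 rows, 2 features
series = np.arange(20).reshape(10, 2)
pairs = consecutive_segment_pairs(series, 5)

for earlier, later in pairs:
    # each pair could be handed to a two-sample test such as kernel_mmd
    print(earlier[0], "->", later[0])
```

Iterating over pairs this way avoids manual index arithmetic in the loop body.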

In [5]:
results = []  # list of (mmd, p-value) tuples, one per comparison
segments = 10
samples = np.array_split(df_baseline_obs, segments)
for i in range(len(samples)-1):
    # BUG: samples[1+1] should be samples[i+1]. As written, every segment is compared
    # with segment 2, and the output below was produced with this bug. It is being redone
    # in the baseline MMDs and did not make it into the thesis.
    result = detectors.kernel_mmd(torch.from_numpy(samples[i].values).to('cuda'),
                                  torch.from_numpy(samples[1+1].values).to('cuda'),
                                  n_perm=BOOTSTRAP,
                                  kernel=kernel)
    print(f'mmd:{result[0]:.3f}, p-value:{result[1]}')
    results.append(result)

mmd:0.073, p-value:0.0
mmd:0.030, p-value:0.0
mmd:0.001, p-value:1.0
mmd:0.026, p-value:0.0
mmd:0.063, p-value:0.0
mmd:0.106, p-value:0.0
mmd:0.133, p-value:0.0
mmd:0.139, p-value:0.0
mmd:0.131, p-value:0.0


Clearly the p-value is not a useful metric in this test for finding adversarial samples: the only comparison it identifies as coming from the same distribution is the third, where (because of the `1+1` index) segment 2 is compared with itself. Let's try shuffled data.

In [6]:
results = []  # list of (mmd, p-value) tuples, one per comparison
segments = 10
samples = np.array_split(df_baseline_obs.sample(frac=1), segments)
print(f'Using a p-value threshold of {PVAL}')
for i in range(len(samples)-1):
    # Same indexing bug as in the previous cell: samples[1+1] should be samples[i+1],
    # so every shuffled segment is compared with segment 2 in the output below.
    result = detectors.kernel_mmd(torch.from_numpy(samples[i].values).to('cuda'),
                                  torch.from_numpy(samples[1+1].values).to('cuda'),
                                  n_perm=BOOTSTRAP,
                                  kernel=kernel)
    if result[1] > PVAL:
        dist = 'identical'
        colour = AUTO
    else:
        dist = 'distinct'
        colour = RED
    print(f'mmd:{result[0]:.3f}, p-value:{result[1]}, {colour}distributions are {dist}{AUTO}')
    results.append(result)

Using a p-value threshold of 0.05
mmd:0.003, p-value:0.05469999834895134, distributions are identical
mmd:0.003, p-value:0.032600000500679016, distributions are distinct
mmd:0.001, p-value:1.0, distributions are identical
mmd:0.003, p-value:0.15770000219345093, distributions are identical
mmd:0.003, p-value:0.040699999779462814, distributions are distinct
mmd:0.004, p-value:0.002099999925121665, distributions are distinct
mmd:0.004, p-value:0.007999999448657036, distributions are distinct
mmd:0.002, p-value:0.307699978351593, distributions are identical
mmd:0.003, p-value:0.01719999872148037, distributions are distinct


Load unperturbed observations from untargeted adversarial attack

In [7]:
df_adv_obs = pd.read_csv(os.path.normpath(os.path.join(os.getcwd(), '..','20 bin PPO 500 results/adv_obs.csv')), #navigate to another folder in parent dir
                        index_col=0,
                        dtype='float32')
df_adv_obs.set_index(df_adv_obs.index.astype(int), inplace=True) #all data is loaded as float32, but the index should be an int

Load perturbed observations from untargeted adversarial attack (100% adversarial)

In [8]:
df_adv_perturbed_obs = pd.read_csv(os.path.normpath(os.path.join(os.getcwd(), '..','20 bin PPO 500 results/adv_perturbed_obs.csv')), #navigate to another folder in parent dir
                        index_col=0,
                        dtype='float32')
df_adv_perturbed_obs.set_index(df_adv_perturbed_obs.index.astype(int), inplace=True) #all data is loaded as float32, but the index should be an int

Here we'll get the MMD between two full distributions during evaluation: the observations from the environment and the same observations once perturbed by ACG.

In [9]:
result = detectors.kernel_mmd(torch.from_numpy(df_adv_obs.values).to('cuda'), #clean obs from adv trace
                                  torch.from_numpy(df_adv_perturbed_obs.values).to('cuda'), #perturbed obs from adv trace
                                  n_perm=BOOTSTRAP,
                                  kernel=kernel)
print(f'mmd:{result[0]}, p-value:{result[1]}')

mmd:0.00014674663543701172, p-value:1.0


MMD sees no difference between the perturbed and unperturbed distributions! The MMD is smaller between these two distributions than between segments of the baseline distribution. Would it be different if the min/max normalization were undone?
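As a sketch of what undoing the normalization could look like: this assumes each feature was scaled to [0, 1] with per-feature minima and maxima that are still available. The notebook does not say how the normalization was done, so `feature_min`, `feature_max`, and the toy values below are hypothetical.

```python
import numpy as np

def undo_min_max(x_norm, feature_min, feature_max):
    """Invert x_norm = (x - min) / (max - min), feature by feature."""
    return x_norm * (feature_max - feature_min) + feature_min

# Toy round trip with made-up feature ranges
raw = np.array([[10.0, 200.0], [15.0, 400.0], [20.0, 300.0]])
feature_min = raw.min(axis=0)
feature_max = raw.max(axis=0)
normalized = (raw - feature_min) / (feature_max - feature_min)
restored = undo_min_max(normalized, feature_min, feature_max)
```

The restored arrays could then be passed to the same two-sample test to see whether the feature scales change the outcome.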

In [10]:
result = detectors.kernel_mmd(torch.from_numpy(df_baseline_obs.values).to('cuda'), #clean obs from clean trace
                                  torch.from_numpy(df_adv_perturbed_obs.values).to('cuda'),#perturbed obs from adv trace
                                  n_perm=BOOTSTRAP,
                                  kernel=kernel)
print(f'mmd:{result[0]:.3f}, p-value:{result[1]}')

mmd:0.000, p-value:0.9019999504089355


In [12]:
df_adv_obs['action'] = pd.read_csv(os.path.normpath(os.path.join(os.getcwd(), '..','20 bin PPO 500 results/adv_obs_a.csv')), 
                                   dtype=int)

In [18]:
df_adv_perturbed_obs['action'] = pd.read_csv(os.path.normpath(os.path.join(os.getcwd(), '..','20 bin PPO 500 results/adv_perturbed_obs_a.csv')), 
                                   dtype=int)

Here we are grouping our samples (observations/states) by class (action), to see if the normal and adversarial samples are drawn from the same distributions for each distinct class
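The grouping step on its own can be sketched as follows, using plain NumPy arrays instead of the DataFrames above. The helper names and toy arrays are hypothetical; the idea is to build one (clean, perturbed) pair of observation sets per action, keeping only actions that occur in both traces.

```python
import numpy as np

def split_by_action(observations, actions):
    """Map each distinct action to the rows of `observations` that led to it."""
    return {int(a): observations[actions == a] for a in np.unique(actions)}

def paired_groups(clean_obs, clean_actions, pert_obs, pert_actions):
    """Pair clean and perturbed observation sets for every action seen in both."""
    clean = split_by_action(clean_obs, clean_actions)
    pert = split_by_action(pert_obs, pert_actions)
    return {a: (clean[a], pert[a]) for a in sorted(clean.keys() & pert.keys())}

# Toy data: 6 observations with 2 features each, and the action taken for each
clean_obs = np.arange(12, dtype=float).reshape(6, 2)
clean_actions = np.array([0, 1, 0, 2, 1, 0])
pert_obs = clean_obs + 0.1
pert_actions = np.array([1, 1, 0, 0, 3, 0])

groups = paired_groups(clean_obs, clean_actions, pert_obs, pert_actions)
for action, (clean_rows, pert_rows) in groups.items():
    print(action, clean_rows.shape, pert_rows.shape)
```

Each pair in `groups` is what a per-action two-sample test would receive.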

In [55]:
def show_results(results): #results is an iterable of (action, (mmd, pval)) tuples
    for result in results:
        if result[1][1] > PVAL:
            dist = 'identical'
            colour = AUTO
        else:
            dist = 'distinct'
            colour = RED
        print(f'For action {result[0]}: mmd:{result[1][0]:.3f}, p-value:{result[1][1]}, {colour}distributions are {dist}{AUTO}')

In [56]:
from joblib import Parallel, delayed
#import torch

def process_action(i):
    return i, detectors.kernel_mmd(torch.from_numpy(df_adv_obs[df_adv_obs['action']==i].iloc[:,:-1].values).to('cuda'), #slice excludes actions column
                                  torch.from_numpy(df_adv_perturbed_obs[df_adv_perturbed_obs['action']==i].iloc[:,:-1].values).to('cuda'),
                                  n_perm=BOOTSTRAP,
                                  kernel=kernel)

#%%time
results = Parallel(n_jobs=10, #set n_jobs so you don't run out of VRAM; 10 is faster than 12, probably because that's exactly half of the threads needed, so 12 just results in more threads tripping over each other to use the GPU
            prefer='threads' #threads are about 8 times faster than multiprocessing: less overhead, and the CPU work is negligible
            )(delayed(process_action)(i) for i in range(df_adv_obs['action'].max().astype(int)+1))

show_results(results)


For action 0: mmd:0.042, p-value:0.02499999850988388, distributions are distinct
For action 1: mmd:0.024, p-value:0.0, distributions are distinct
For action 2: mmd:0.055, p-value:0.0, distributions are distinct
For action 3: mmd:0.009, p-value:0.0, distributions are distinct
For action 4: mmd:0.080, p-value:0.016099998727440834, distributions are distinct
For action 5: mmd:0.021, p-value:0.1103999987244606, distributions are identical
For action 6: mmd:0.006, p-value:0.00969999935477972, distributions are distinct
For action 7: mmd:0.006, p-value:9.999999747378752e-05, distributions are distinct
For action 8: mmd:0.022, p-value:0.0, distributions are distinct
For action 9: mmd:0.021, p-value:0.0, distributions are distinct
For action 10: mmd:0.040, p-value:0.0017999999690800905, distributions are distinct
For action 11: mmd:0.021, p-value:0.0, distributions are distinct
For action

This works where the other test failed because a sample which originally led to action X is different from a sample which leads to action Y plus a perturbation which pushes it to action X... OR this difference is an artifact of the time series data: we are comparing samples from different times of day or seasons, and the difference is not due to perturbations.
- confirm whether the difference is due to perturbations or time series artifacts
- if due to perturbations:
     - how many samples do we need for detection (this could be a metric for Ranwa's competition)? we can use a binary search, starting with half our adversarial samples
     - does different regularization evade detection?
     - does another attack evade detection?
     - this was detected using like a year's (?) worth of data; could this feasibly detect an attack before it's too late?
     - does this still work if we are using last year's samples to detect an attack next year? this detection was demonstrated with the before and after from perturbations. IRL you would only have the after. **Will detection work when the detector is fitted before an episode so it detects adversarial samples during an episode?**
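The binary search over sample counts mentioned in the list can be sketched as below. It assumes detection is monotone in the sample count (if n samples are enough, any larger n is too), which may not hold for a noisy statistical test. The predicate `detects` is a hypothetical stand-in for running the MMD test on the first n adversarial samples.

```python
def min_samples_for_detection(detects, n_max):
    """Smallest n in [1, n_max] for which detects(n) is True, or None if
    even n_max samples are not detected.

    Assumes monotonicity: once detection succeeds at n, it succeeds for all
    larger n.
    """
    if not detects(n_max):
        return None
    lo, hi = 1, n_max
    while lo < hi:
        mid = (lo + hi) // 2  # the first probe uses about half of the samples
        if detects(mid):
            hi = mid       # detected: try fewer samples
        else:
            lo = mid + 1   # missed: more samples are needed
    return lo

# Toy predicate: pretend the test needs at least 37 samples to reject
threshold_found = min_samples_for_detection(lambda n: n >= 37, n_max=1000)
never_found = min_samples_for_detection(lambda n: False, n_max=1000)
```

This needs roughly log2(n_max) runs of the test rather than one run per candidate sample size.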

We want to see if the detection occurs because we are comparing samples from different times, so we will now only look at unperturbed samples.

In [64]:
def process_action(i):
    return i, detectors.kernel_mmd(torch.from_numpy(df_adv_obs[df_adv_obs['action']==i].iloc[:,:-1].values).to('cuda'), #slice excludes actions column
                                  torch.from_numpy(df_adv_obs[:-1][df_adv_perturbed_obs['action']==i].iloc[:,:-1].values).to('cuda'), #there is no action for the final observation, so there is one fewer adversarial sample than clean sample
                                  n_perm=BOOTSTRAP,
                                  kernel=kernel)

#%%time
results = Parallel(n_jobs=10, #set n_jobs so you don't run out of VRAM; 10 is faster than 12, probably because that's exactly half of the threads needed, so 12 just results in more threads tripping over each other to use the GPU
            prefer='threads' #threads are about 8 times faster than multiprocessing: less overhead, and the CPU work is negligible
            )(delayed(process_action)(i) for i in range(df_adv_obs['action'].max().astype(int)+1))

show_results(results)


For action 0: mmd:0.042, p-value:0.021399999037384987, distributions are distinct
For action 1: mmd:0.024, p-value:9.999999747378752e-05, distributions are distinct
For action 2: mmd:0.056, p-value:0.0, distributions are distinct
For action 3: mmd:0.009, p-value:0.0, distributions are distinct
For action 4: mmd:0.080, p-value:0.016999999061226845, distributions are distinct
For action 5: mmd:0.021, p-value:0.1127999946475029, distributions are identical
For action 6: mmd:0.007, p-value:0.006099999882280827, distributions are distinct
For action 7: mmd:0.006, p-value:0.00029999998514540493, distributions are distinct
For action 8: mmd:0.022, p-value:0.0, distributions are distinct
For action 9: mmd:0.021, p-value:0.0, distributions are distinct
For action 10: mmd:0.040, p-value:0.002099999925121665, distributions are distinct
For action 11: mmd:0.021, p-value:0.0, distributions are dis

So we aren't detecting adversarial examples; these observations are just fundamentally different...

**TODO** compare the difference in MMDs
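One possible shape for this comparison, assuming both runs are kept as lists of `(action, (mmd, pval))` tuples like the ones passed to `show_results`. The function name and the toy numbers are hypothetical and are not results from this notebook.

```python
def mmd_differences(results_a, results_b):
    """Per-action difference in MMD between two runs (run a minus run b).

    Each run is an iterable of (action, (mmd, pval)) tuples. Actions missing
    from run b are skipped.
    """
    mmd_b = {action: float(stats[0]) for action, stats in results_b}
    return {action: float(stats[0]) - mmd_b[action]
            for action, stats in results_a
            if action in mmd_b}

# Toy values only, chosen to be easy to check by hand
run_a = [(0, (0.5, 0.01)), (1, (0.25, 0.2)), (2, (0.75, 0.0))]
run_b = [(0, (0.25, 0.02)), (1, (0.25, 0.2))]

diffs = mmd_differences(run_a, run_b)
for action, delta in diffs.items():
    print(f'For action {action}: mmd difference {delta:+.3f}')
```

A difference near zero for every action would suggest the two runs are measuring the same underlying gap.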