# Sample notebook: permutation test with MMD

A two-sample statistical test checks whether two samples were drawn from the same distribution.

The permutation test is one well-known class of such tests.

This notebook shows how to run a permutation test with MMD (Maximum Mean Discrepancy) as the test statistic.
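
The idea behind the test can be illustrated without the library. The following is a minimal NumPy sketch, not the `model_criticism_mmd` implementation: it assumes a biased MMD^2 estimate with a fixed-bandwidth RBF kernel, and the helper names (`rbf_kernel`, `mmd2_biased`, `permutation_test`) and the add-one p-value correction are choices made only for this illustration.

```python
import numpy as np


def rbf_kernel(a, b, sigma=1.0):
    """Gaussian RBF kernel matrix between the rows of a and b."""
    sq_dists = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dists / (2.0 * sigma ** 2))


def mmd2_biased(k_pooled, idx_x, idx_y):
    """Biased MMD^2 estimate read off a precomputed pooled kernel matrix."""
    k_xx = k_pooled[np.ix_(idx_x, idx_x)].mean()
    k_yy = k_pooled[np.ix_(idx_y, idx_y)].mean()
    k_xy = k_pooled[np.ix_(idx_x, idx_y)].mean()
    return k_xx + k_yy - 2.0 * k_xy


def permutation_test(x, y, n_permutations=200, sigma=1.0, seed=0):
    """Return (observed MMD^2, permutation p-value) for samples x and y."""
    rng = np.random.default_rng(seed)
    pooled = np.concatenate([x, y], axis=0)
    k_pooled = rbf_kernel(pooled, pooled, sigma)
    n_x = len(x)
    indices = np.arange(len(pooled))
    observed = mmd2_biased(k_pooled, indices[:n_x], indices[n_x:])
    null_stats = np.empty(n_permutations)
    for i in range(n_permutations):
        # Shuffle the group labels and recompute the statistic.
        perm = rng.permutation(indices)
        null_stats[i] = mmd2_biased(k_pooled, perm[:n_x], perm[n_x:])
    # Add-one correction keeps the p-value strictly positive.
    p_value = (1 + np.sum(null_stats >= observed)) / (1 + n_permutations)
    return observed, p_value


rng = np.random.default_rng(1)
same_a = rng.normal(3, 0.5, size=(50, 2))
same_b = rng.normal(3, 0.5, size=(50, 2))
shifted = rng.normal(0, 0.5, size=(50, 2))

print(permutation_test(same_a, same_b))   # both samples from N(3, 0.5^2)
print(permutation_test(same_a, shifted))  # means differ by 3 per dimension
```

Precomputing the pooled kernel matrix once means each permutation only re-indexes it, which keeps the loop cheap for small samples.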

In [1]:
import sys
# Make the package importable when the notebook runs from inside the repository.
sys.path.append("../")
sys.path.append(".")
import numpy as np
import torch
from model_criticism_mmd import ModelTrainerTorchBackend, MMD, TwoSampleDataSet, split_data
from model_criticism_mmd.backends import kernels_torch
from model_criticism_mmd.backends.kernels_torch import BasicRBFKernelFunction
from model_criticism_mmd.supports.permutation_tests import PermutationTest



In [2]:
device_obj = torch.device('cpu')

Next, we prepare the dataset. The `PermutationTest` class takes its input as a `TwoSampleDataSet`.
The two samples can be given as either `numpy.ndarray` or `torch.Tensor`.

In [3]:
np.random.seed(seed=1)
x = np.random.normal(3, 0.5, size=(500, 2))
y = np.random.normal(3, 0.5, size=(500, 2))

Then we run the permutation test at a significance level of 0.05.

In [4]:
# Initial scales passed to the MMD estimator (one value per input dimension here).
init_scale = torch.tensor(np.array([0.05, 0.55]))
# Use the GPU when one is available, otherwise fall back to the CPU.
device_obj = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
kernel_function = BasicRBFKernelFunction(log_sigma=0.0, device_obj=device_obj, opt_sigma=True)
mmd_estimator = MMD(kernel_function_obj=kernel_function, device_obj=device_obj, scales=init_scale)
dataset_train = TwoSampleDataSet(x, y, device_obj)

permutation_tester = PermutationTest(is_normalize=True, mmd_estimator=mmd_estimator, dataset=dataset_train)
statistics = permutation_tester.compute_statistic()
threshold = permutation_tester.compute_threshold(alpha=0.05)
p_value = permutation_tester.compute_p_value(statistics)
print(f'statistics: {statistics}, threshold: {threshold}, p-value: {p_value}')
# p > 0.05 means we fail to reject the null hypothesis of equal distributions;
# it is not a proof that the two distributions are identical.
if p_value > 0.05:
    print('Same distribution!')
else:
    print('Probably different distribution!')

100%|██████████| 1000/1000 [00:09<00:00, 104.60it/s]

statistics: 0.005706118359427581, threshold: 0.26126070815185587, p-value: 0.895
Same distribution!
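
One common way to turn the permuted statistics into a threshold and a p-value is sketched below. It assumes the threshold is the (1 - alpha) quantile of the null statistics and the p-value is the fraction of null statistics at least as large as the observed one. `threshold_and_p_value` is a hypothetical helper written for this illustration; `PermutationTest` may compute these quantities differently.

```python
import numpy as np


def threshold_and_p_value(null_stats, observed, alpha=0.05):
    """Summarise a permutation null distribution.

    Returns the (1 - alpha) quantile of the null statistics and the
    fraction of null statistics at least as large as the observed one.
    """
    null_stats = np.asarray(null_stats, dtype=float)
    threshold = np.quantile(null_stats, 1.0 - alpha)
    p_value = float(np.mean(null_stats >= observed))
    return float(threshold), p_value


# Toy null distribution: 100 evenly spaced values 0.00, 0.01, ..., 0.99.
toy_null = np.arange(100) / 100.0
toy_threshold, toy_p_value = threshold_and_p_value(toy_null, observed=0.5)
print(f"threshold: {toy_threshold:.4f}, p-value: {toy_p_value:.2f}")
```

Under these definitions, comparing the statistic with the threshold and comparing the p-value with alpha are two views of the same decision, up to ties and quantile interpolation.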





## Permutation test with optimized kernels

To run the permutation test, we must define an MMD estimator with a chosen kernel function.
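
For reference, the population quantity that an MMD estimator targets, for distributions $P$ and $Q$ and a kernel $k$, is the standard squared MMD:

$$
\mathrm{MMD}^2(P, Q) = \mathbb{E}_{x, x' \sim P}\left[k(x, x')\right] + \mathbb{E}_{y, y' \sim Q}\left[k(y, y')\right] - 2\,\mathbb{E}_{x \sim P,\, y \sim Q}\left[k(x, y)\right]
$$

A Gaussian RBF kernel with bandwidth $\sigma$ is commonly written as

$$
k(x, y) = \exp\left(-\frac{\lVert x - y \rVert^2}{2\sigma^2}\right)
$$

This is the textbook form. How `BasicRBFKernelFunction` parameterizes the bandwidth through `log_sigma`, and how `scales` enters, is not shown in this notebook, so the exact expression used by the library may differ.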

Usually, we search for the optimal kernel on the given dataset (i.e. by training, or optimization).

In [5]:
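# Use the first 400 samples of each set for training and the remaining 100 for validation.
n_train = 400
x_train = x[:n_train]
y_train = y[:n_train]
x_test = x[n_train:]
y_test = y[n_train:]
# Only the validation set is wrapped here. `dataset_train` from the earlier cell
# still holds all 500 samples, so `x_train` and `y_train` are not used below.
dataset_val = TwoSampleDataSet(x_test, y_test, device_obj=device_obj)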

In [6]:
init_scale = torch.tensor(np.array([0.05, 0.55]))
kernel_function = BasicRBFKernelFunction(log_sigma=0.0, device_obj=device_obj, opt_sigma=True)
mmd_estimator = MMD(kernel_function_obj=kernel_function, device_obj=device_obj, scales=init_scale)
trainer = ModelTrainerTorchBackend(mmd_estimator=mmd_estimator, device_obj=device_obj)
trained_obj = trainer.train(dataset_training=dataset_train, 
                            dataset_validation=dataset_val, 
                            num_epochs=500, batchsize=200)

2021-08-09 15:32:32,130 - model_criticism_mmd.logger_unit - INFO - Validation at 0. MMD^2 = 0.0054279060877084895, ratio = [1.42118584] obj = [-0.35149162]
2021-08-09 15:32:32,491 - model_criticism_mmd.logger_unit - INFO -      5: [avg train] MMD^2 0.014181100859385306 obj [-1.8767617] val-MMD^2 0.0274158374440569 val-ratio [2.24433956] val-obj [-0.8084113]  elapsed: 0.0
2021-08-09 15:32:33,911 - model_criticism_mmd.logger_unit - INFO -     25: [avg train] MMD^2 0.013349085295267188 obj [-4.83800576] val-MMD^2 0.01998787532532578 val-ratio [199.87875325] val-obj [-5.29771095]  elapsed: 0.0
2021-08-09 15:32:36,105 - model_criticism_mmd.logger_unit - INFO -     50: [avg train] MMD^2 0.013348356957317481 obj [-4.83780986] val-MMD^2 0.019994455136139996 val-ratio [199.94455136] val-obj [-5.29804008]  elapsed: 0.0
2021-08-09 15:32:39,968 - model_criticism_mmd.logger_unit - INFO -    100: [avg train] MMD^2 0.013348379533296746 obj [-4.8378148] val-MMD^2 0.019994291901512036 val-ratio [199.94
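
The logged numbers suggest how the training objective relates to the ratio. The check below copies the validation `ratio` and `obj` pairs from the complete log lines above and compares `obj` with `-log(ratio)`. That relation is an inference from these values, not something the notebook states about the library.

```python
import math

# (validation ratio, validation objective) pairs copied from the log above.
logged = [
    (1.42118584, -0.35149162),
    (2.24433956, -0.8084113),
    (199.87875325, -5.29771095),
    (199.94455136, -5.29804008),
]

for ratio, obj in logged:
    reconstructed = -math.log(ratio)
    print(f"ratio={ratio:<14} logged obj={obj:<12} -log(ratio)={reconstructed:.8f}")

# Largest gap between the logged objective and -log(ratio).
max_abs_error = max(abs(-math.log(r) - o) for r, o in logged)
print(f"max abs error: {max_abs_error:.2e}")
```

If the relation holds, minimizing `obj` is the same as maximizing the ratio. From epoch 25 on, the logged `val-ratio` also equals `val-MMD^2` times 10^4, which may indicate a lower bound on the ratio's denominator, though the log alone cannot confirm this.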

Training is done. We now rebuild an MMD estimator from the trained parameters.

In [7]:
trained_mmd_estimator = MMD.from_trained_parameters(trained_obj, device_obj=device_obj)

Finally, we run a permutation test with the `PermutationTest` class.

In [8]:
permutation_tester = PermutationTest(n_permutation_test=1000, 
                                     mmd_estimator=trained_mmd_estimator, 
                                     dataset=dataset_train, 
                                     batch_size=-1)
statistics = permutation_tester.compute_statistic()
threshold = permutation_tester.compute_threshold(alpha=0.05)
p_value = permutation_tester.compute_p_value(statistics)
print(f"MMD-statistics: {statistics}, threshold: {threshold}, p-value: {p_value}")
if p_value > 0.05:
    print('Same distribution!')
else:
    print('Probably different distribution!')

100%|██████████| 1000/1000 [00:25<00:00, 39.73it/s]

MMD-statistics: 0.004013310439378724, threshold: 0.004015453203178707, p-value: 0.07599999999999996
Same distribution!
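
Because the p-value is estimated from a finite number of permutations, it carries Monte Carlo error. The sketch below gives a rough normal-approximation interval for the result above (p-value 0.076 from 1000 permutations). It assumes the permutations behave like independent draws, and `permutation_p_value_interval` is a hypothetical helper, not part of `model_criticism_mmd`.

```python
import math


def permutation_p_value_interval(p_hat, n_permutations, z=1.96):
    """Normal-approximation interval for a Monte Carlo p-value estimate."""
    std_err = math.sqrt(p_hat * (1.0 - p_hat) / n_permutations)
    lower = max(0.0, p_hat - z * std_err)
    upper = min(1.0, p_hat + z * std_err)
    return std_err, lower, upper


std_err, lower, upper = permutation_p_value_interval(0.076, 1000)
print(f"std. error: {std_err:.4f}, approx. 95% interval: [{lower:.4f}, {upper:.4f}]")
```

When the interval sits close to the significance level, increasing `n_permutation_test` is a simple way to make the decision less sensitive to the random permutations drawn.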



