# Sample notebook of Permutation test with MMD

Statistical test is a test that we confirm if two-distribution is same or not.

As a one class of the statistical tests, Permutation test is well known.

In this notebook, we present you how to run Permutation test with MMD.

In [2]:
import sys
sys.path.append("../")
sys.path.append(".")
import numpy as np
import torch
from model_criticism_mmd import MMD
from model_criticism_mmd.backends.kernels_torch import BasicRBFKernelFunction
from model_criticism_mmd.supports.permutation_tests import PermutationTest
from model_criticism_mmd.models import TwoSampleDataSet
from model_criticism_mmd import ModelTrainerTorchBackend, MMD, TwoSampleDataSet, split_data
from model_criticism_mmd.models.static import DEFAULT_DEVICE



In [3]:
device_obj = DEFAULT_DEVICE

Next, we set dataset. The input type into the Permutation class is `TwoSampleDataSet`.
We can set either `numpy.ndarray` or `torch.tensor`.

In [4]:
np.random.seed(seed=1)
x = np.random.normal(3, 0.5, size=(500, 2))
y = np.random.normal(3, 0.5, size=(500, 2))

Then, we run the Permutation test.

In [5]:
init_scale = torch.tensor(np.array([0.05, 0.55]))
device_obj = torch.device(torch.device('cuda' if torch.cuda.is_available() else 'cpu'))
kernel_function = BasicRBFKernelFunction(log_sigma=0.0, device_obj=device_obj, opt_sigma=True)
mmd_estimator = MMD(kernel_function_obj=kernel_function, device_obj=device_obj, scales=init_scale)
dataset_train = TwoSampleDataSet(x, y, device_obj)

permutation_tester = PermutationTest(is_normalize=True, mmd_estimator=mmd_estimator, dataset=dataset_train)
statistics = permutation_tester.compute_statistic()
threshold = permutation_tester.compute_threshold(alpha=0.05)
p_value = permutation_tester.compute_p_value(statistics)
print(f'statistics: {statistics}, threshold: {threshold}, p-value: {p_value}')
if p_value > 0.05:
    print('Same distribution!')
else:
    print('Probably different distribution!')

100%|██████████| 1000/1000 [00:09<00:00, 102.61it/s]

statistics: 0.005706118359427581, threshold: 0.2574213420057925, p-value: 0.878
Same distribution!





## Permutation test with optimized kernels

To run the permutation test, we have to define a MMD estimator who has a designed kernel function.

In normal cases, we search the optimal kernel on the given datset (i.e. trainings, optimizations).

In [6]:
n_train = 400
x_train = x[:n_train]
y_train = y[:n_train]
x_test = x[n_train:]
y_test = y[n_train:]
dataset_val = TwoSampleDataSet(x_test, y_test, device_obj=device_obj)

In [7]:
init_scale = torch.tensor(np.array([0.05, 0.55]))
kernel_function = BasicRBFKernelFunction(log_sigma=0.0, device_obj=device_obj, opt_sigma=True)
mmd_estimator = MMD(kernel_function_obj=kernel_function, device_obj=device_obj, scales=init_scale)
trainer = ModelTrainerTorchBackend(mmd_estimator=mmd_estimator, device_obj=device_obj)
trained_obj = trainer.train(dataset_training=dataset_train, 
                            dataset_validation=dataset_val, 
                            num_epochs=500, batchsize=200)

2021-08-26 10:09:56,935 - model_criticism_mmd.logger_unit - INFO - Getting median initial sigma value...
2021-08-26 10:09:56,998 - model_criticism_mmd.logger_unit - INFO - initial by median-heuristics -0.352 with is_log=True
2021-08-26 10:09:57,002 - model_criticism_mmd.logger_unit - INFO - Validation at 0. MMD^2 = 0.006344559736931377, ratio = [1.26598805] obj = [-0.23585288]
2021-08-26 10:09:57,278 - model_criticism_mmd.logger_unit - INFO -      5: [avg train] MMD^2 0.015077468287632567 obj [-1.91112737] val-MMD^2 0.028746948926260896 val-ratio [2.20834747] val-obj [-0.79224449]  elapsed: 0.0
2021-08-26 10:09:58,390 - model_criticism_mmd.logger_unit - INFO -     25: [avg train] MMD^2 0.013349298026780948 obj [-4.83804213] val-MMD^2 0.019986890681881194 val-ratio [199.86890682] val-obj [-5.29766169]  elapsed: 0.0
2021-08-26 10:09:59,800 - model_criticism_mmd.logger_unit - INFO -     50: [avg train] MMD^2 0.013348413561665128 obj [-4.83783625] val-MMD^2 0.019993196609990115 val-ratio [

Now, we have the trained MMD estimator.

In [8]:
trained_mmd_estimator = MMD.from_trained_parameters(trained_obj, device_obj=device_obj)

Finally, we run a permutation test. For that, we call a class named `PermutationTest`.

In [9]:
permutation_tester = PermutationTest(n_permutation_test=1000, 
                                     mmd_estimator=trained_mmd_estimator, 
                                     dataset=dataset_val, 
                                     batch_size=-1)
statistics = permutation_tester.compute_statistic()
threshold = permutation_tester.compute_threshold(alpha=0.05)
p_value = permutation_tester.compute_p_value(statistics)
print(f"MMD-statistics: {statistics}, threshold: {threshold}, p-value: {p_value}")
if p_value > 0.05:
    print('Same distribution!')
else:
    print('Probably different distribution!')

100%|██████████| 1000/1000 [00:02<00:00, 436.10it/s]

MMD-statistics: 0.019989082557885524, threshold: 0.02003272012882909, p-value: 0.655
Same distribution!



