# Validation of statistical tests

We would like to validate that our test settings are correct.

We prepare data in advance for both cases, $X=Y$ and $X \neq Y$, and run the tests on them. If a test's result matches the known truth, we regard the test as valid.

This notebook shows examples of how to validate a statistical test.
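The validity rule described above can be sketched in a few lines. This is only an illustration: `judge_test` is a hypothetical helper, not part of `model_criticism_mmd`, and the comparison `p_value > threshold` is an assumption about how the decision is made.

```python
def judge_test(p_value: float, is_same_truth: bool, threshold: float = 0.05) -> str:
    """Return "pass" when the test's decision agrees with the known truth."""
    # Assumption: a p-value above the threshold means we do not reject H0 (X = Y).
    is_same_test = p_value > threshold
    return "pass" if is_same_test == is_same_truth else "fail"


# X = Y data and a large p-value: the test agrees with the truth.
print(judge_test(0.44, is_same_truth=True))
# X != Y data and a tiny p-value: the test agrees with the truth.
print(judge_test(0.0, is_same_truth=False))
# X = Y data but a small p-value: the test disagrees with the truth.
print(judge_test(0.01, is_same_truth=True))
```

A test setting is considered valid only when it passes on both the $X=Y$ data and the $X \neq Y$ data.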

In [2]:
import sys
sys.path.append("../")
sys.path.append(".")

In [3]:
from model_criticism_mmd import ModelTrainerTorchBackend, MMD, TwoSampleDataSet
from model_criticism_mmd import kernels_torch
from model_criticism_mmd import PermutationTest, SelectionKernels
from model_criticism_mmd.models.static import DEFAULT_DEVICE
from model_criticism_mmd.supports.evaluate_stats_tests import StatsTestEvaluator, TestResultGroupsFormatter



In [4]:
import torch
import numpy as np
import tqdm
import typing
%matplotlib inline
import matplotlib.pyplot as plt

In [31]:
N_DATA_SIZE = 500
N_FEATURE = 100
NOISE_MU_X = 0
NOISE_SIGMA_X = 0.5
NOISE_MU_Y = 0
NOISE_SIGMA_Y = 0.5
THRESHOLD_P_VALUE = 0.05

# The number of epochs should normally be > 500. A small value is used here for the example.
num_epochs_selection = 50
# The number of permutations should normally be > 500. A small value is used here for the example.
n_permutation_test = 100

In [6]:
device_obj = torch.device('cpu')

In [18]:
x_train = torch.tensor(np.random.normal(NOISE_MU_X, NOISE_SIGMA_X, (N_DATA_SIZE, N_FEATURE)))
x_eval = torch.tensor(np.random.normal(NOISE_MU_X, NOISE_SIGMA_X, (N_DATA_SIZE, N_FEATURE)))
y_train_same = torch.tensor(np.random.normal(NOISE_MU_X, NOISE_SIGMA_X, (N_DATA_SIZE, N_FEATURE)))
y_eval_same = torch.tensor(np.random.normal(NOISE_MU_X, NOISE_SIGMA_X, (N_DATA_SIZE, N_FEATURE)))
y_train_diff = torch.tensor(np.random.laplace(NOISE_MU_X, NOISE_SIGMA_X, (N_DATA_SIZE, N_FEATURE)))
y_eval_diff = torch.tensor(np.random.laplace(NOISE_MU_X, NOISE_SIGMA_X, (N_DATA_SIZE, N_FEATURE)))
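The "different" samples above reuse the same location and scale arguments as the Normal samples, so it may help to see why the two distributions still differ. In NumPy, the second argument of `laplace` is the scale `b`, and a Laplace distribution has standard deviation `sqrt(2) * b`, whereas the second argument of `normal` is the standard deviation itself. The following small check is a sketch that is independent of the notebook's variables:

```python
import numpy as np

rng = np.random.default_rng(0)
scale = 0.5

normal_sample = rng.normal(0.0, scale, 200_000)
laplace_sample = rng.laplace(0.0, scale, 200_000)

# Normal: std equals the scale argument (0.5).
# Laplace: std equals sqrt(2) * scale (about 0.707), with heavier tails.
normal_std = normal_sample.std()
laplace_std = laplace_sample.std()
print(f"normal std  = {normal_std:.3f}")
print(f"laplace std = {laplace_std:.3f}")
```

So even with identical parameters, the Laplace data has a larger spread and a different shape, which is what the tests are expected to detect.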

In [12]:
# lengthscale=-1.0 means that the "median heuristic" is used
rbf_kernel = kernels_torch.BasicRBFKernelFunction(device_obj=device_obj, log_sigma=-1.0)
matern_0_5 = kernels_torch.MaternKernelFunction(nu=0.5, device_obj=device_obj, lengthscale=-1.0)
matern_1_5 = kernels_torch.MaternKernelFunction(nu=1.5, device_obj=device_obj, lengthscale=-1.0)
matern_2_5 = kernels_torch.MaternKernelFunction(nu=2.5, device_obj=device_obj, lengthscale=-1.0)

# Each tuple is (initial-scales, kernel-function). If initial-scales is None, the scales are initialized randomly.
kernels_optimization = [(None, rbf_kernel), (None, matern_0_5), (None, matern_1_5), (None, matern_2_5)]
kernels_non_optimization = [rbf_kernel, matern_2_5]
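For readers unfamiliar with the "median heuristic" mentioned in the comment above, here is a generic sketch: the kernel length scale is set to the median pairwise Euclidean distance over the pooled sample. The function `median_heuristic` is a hypothetical name, and the library's own computation may differ in detail (for example, the logs suggest it works on a log scale).

```python
import numpy as np


def median_heuristic(x: np.ndarray, y: np.ndarray) -> float:
    """Median of pairwise Euclidean distances over the pooled sample of x and y."""
    z = np.concatenate([x, y], axis=0)
    sq_norms = (z ** 2).sum(axis=1)
    # Squared distances via ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * z @ z.T
    sq_dists = np.maximum(sq_dists, 0.0)  # guard against tiny negative values
    # Use only the upper triangle so each pair is counted once, without the diagonal.
    upper = np.triu_indices(len(z), k=1)
    return float(np.sqrt(np.median(sq_dists[upper])))


rng = np.random.default_rng(0)
x_demo = rng.normal(0.0, 0.5, (50, 100))
y_demo = rng.normal(0.0, 0.5, (50, 100))
print(median_heuristic(x_demo, y_demo))
```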

`StatsTestEvaluator` runs all of the following operations automatically:

1. Optimization of the kernels.
2. Running the permutation tests.
3. Deciding whether each test's result matches our expectation.
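To make step 2 concrete, the following is a minimal NumPy sketch of an MMD permutation test with a fixed RBF kernel. It is not the library's implementation: the function names, the biased MMD estimator, the fixed `sigma`, and the plain-fraction p-value are all assumptions chosen for brevity, and kernel optimization (step 1) is skipped.

```python
import numpy as np


def rbf_gram(z: np.ndarray, sigma: float) -> np.ndarray:
    """Gram matrix of the RBF kernel exp(-||a - b||^2 / (2 sigma^2))."""
    sq = (z ** 2).sum(axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * z @ z.T, 0.0)
    return np.exp(-d2 / (2.0 * sigma ** 2))


def mmd2_from_gram(k: np.ndarray, idx_x: np.ndarray, idx_y: np.ndarray) -> float:
    """Biased estimate of MMD^2 from a precomputed Gram matrix."""
    k_xx = k[np.ix_(idx_x, idx_x)].mean()
    k_yy = k[np.ix_(idx_y, idx_y)].mean()
    k_xy = k[np.ix_(idx_x, idx_y)].mean()
    return float(k_xx + k_yy - 2.0 * k_xy)


def permutation_test(x, y, sigma, n_permutations=100, seed=0):
    """Return (observed MMD^2, p-value) by shuffling the pooled sample."""
    rng = np.random.default_rng(seed)
    z = np.concatenate([x, y], axis=0)
    k = rbf_gram(z, sigma)  # computed once, reused for every permutation
    n = len(x)
    idx = np.arange(len(z))
    observed = mmd2_from_gram(k, idx[:n], idx[n:])
    null = np.empty(n_permutations)
    for i in range(n_permutations):
        perm = rng.permutation(len(z))
        null[i] = mmd2_from_gram(k, perm[:n], perm[n:])
    # Fraction of permuted statistics at least as large as the observed one.
    p_value = float((null >= observed).mean())
    return observed, p_value


rng = np.random.default_rng(42)
x = rng.normal(0.0, 0.5, (100, 10))
y_same = rng.normal(0.0, 0.5, (100, 10))
y_diff = rng.laplace(0.0, 0.5, (100, 10))

obs_same, p_same = permutation_test(x, y_same, sigma=2.0)
obs_diff, p_diff = permutation_test(x, y_diff, sigma=2.0)
print(f"X=Y : MMD^2={obs_same:.4f}, p={p_same:.2f}")
print(f"X!=Y: MMD^2={obs_diff:.4f}, p={p_diff:.2f}")
```

Step 3 then compares each p-value against a threshold (such as `THRESHOLD_P_VALUE`) and checks the decision against the known truth.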

In [16]:
test_eval = StatsTestEvaluator(candidate_kernels=kernels_optimization, 
                               kernels_no_optimization=kernels_non_optimization, 
                               device_obj=device_obj, 
                               num_epochs=num_epochs_selection, 
                               n_permutation_test=n_permutation_test)

At least one of the pairs (y_train_same, y_eval_same) or (y_train_diff, y_eval_diff) must be given. This example passes both.

In [19]:
stats_tests = test_eval.interface(code_approach='tests', 
                                  x_train=x_train,
                                  y_train_same=y_train_same,
                                  y_train_diff=y_train_diff,
                                  x_eval=x_eval,
                                  y_eval_same=y_eval_same,
                                  y_eval_diff=y_eval_diff)

2021-08-26 10:23:47,538 - model_criticism_mmd.logger_unit - INFO - Set the initial scales value
  scales = torch.tensor(init_scale.clone().detach().cpu(), requires_grad=True, device=self.device_obj)
2021-08-26 10:23:47,541 - model_criticism_mmd.logger_unit - INFO - Getting median initial sigma value...
2021-08-26 10:23:47,656 - model_criticism_mmd.logger_unit - INFO - initial by median-heuristics 1.78 with is_log=True
2021-08-26 10:23:47,689 - model_criticism_mmd.logger_unit - INFO - Validation at 0. MMD^2 = 0.009395547543890603, ratio = [93.95547544] obj = [-4.542821]
2021-08-26 10:23:48,374 - model_criticism_mmd.logger_unit - INFO -      5: [avg train] MMD^2 0.004749546037662811 obj [-3.83669794] val-MMD^2 0.009564010162189884 val-ratio [95.64010162] val-obj [-4.56059221]  elapsed: 0.0
2021-08-26 10:23:49,685 - model_criticism_mmd.logger_unit - INFO -     25: [avg train] MMD^2 0.004896354518263313 obj [-3.8594957] val-MMD^2 0.010425010347658525 val-ratio [104.25010348] val-obj [-4.64

`TestResultGroupsFormatter` is a class that formats the test results in a readable form.

In [28]:
test_formatter = TestResultGroupsFormatter(stats_tests)
df_results = test_formatter.format_result_table()
df_results_summary = test_formatter.format_result_summary_table()
text_tests = test_formatter.format_test_result_summary()

`format_result_summary_table()` shows the test results for both X=Y and X!=Y.

In [23]:
df_results_summary

| | test-key | X=Y | X!=Y |
|---|---|---|---|
| 0 | tests-BasicRBFKernelFunction-False | pass | pass |
| 1 | tests-BasicRBFKernelFunction-True | pass | pass |
| 2 | tests-MaternKernelFunction-nu=0.5-True | pass | pass |
| 3 | tests-MaternKernelFunction-nu=1.5-True | pass | pass |
| 4 | tests-MaternKernelFunction-nu=2.5-False | pass | pass |
| 5 | tests-MaternKernelFunction-nu=2.5-True | pass | pass |


`format_result_table()` shows the details of the test results.

In [24]:
df_results

| | codename_experiment | kernel | kernel_parameter | is_optimized | test_result | p_value | is_same_distribution_truth | is_same_distribution_test | ratio |
|---|---|---|---|---|---|---|---|---|---|
| 0 | tests | MaternKernelFunction-nu=0.5 | `[[tensor(4.9679, grad_fn=<UnbindBackward>)]]` | True | pass | 0.44 | True | True | 149.548907 |
| 1 | tests | MaternKernelFunction-nu=1.5 | `[[tensor(4.9679, grad_fn=<UnbindBackward>)]]` | True | pass | 0.38 | True | True | 142.026242 |
| 2 | tests | MaternKernelFunction-nu=2.5 | `[[tensor(4.9679, grad_fn=<UnbindBackward>)]]` | True | pass | 0.43 | True | True | 139.85009 |
| 3 | tests | BasicRBFKernelFunction | 1.7762933595325865 | True | pass | 0.37 | True | True | 112.636156 |
| 4 | tests | BasicRBFKernelFunction | 1.7762933595325865 | False | pass | 0.5 | True | True | |
| 5 | tests | MaternKernelFunction-nu=2.5 | `[[tensor(4.9679, grad_fn=<UnbindBackward>)]]` | False | pass | 0.54 | True | True | |
| 6 | tests | MaternKernelFunction-nu=0.5 | `[[tensor(6.0542, grad_fn=<UnbindBackward>)]]` | True | pass | 0.0 | False | False | 23.928723 |
| 7 | tests | BasicRBFKernelFunction | 1.9740368666226569 | True | pass | 0.0 | False | False | 20.115482 |
| 8 | tests | MaternKernelFunction-nu=1.5 | `[[tensor(6.0542, grad_fn=<UnbindBackward>)]]` | True | pass | 0.0 | False | False | 20.109416 |
| 9 | tests | MaternKernelFunction-nu=2.5 | `[[tensor(6.0542, grad_fn=<UnbindBackward>)]]` | True | pass | 0.0 | False | False | 19.271291 |


`format_test_result_summary()` shows a cross-table for each test.
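As a rough illustration of what such a cross-table counts, the sketch below tallies (truth, test decision) pairs with rows for the truth and columns for the test, each ordered True then False. `cross_table` is a hypothetical helper, not the formatter's actual code.

```python
from collections import Counter


def cross_table(truths, decisions):
    """2x2 counts: rows are the truth (True, False), columns are the test decision (True, False)."""
    counts = Counter(zip(truths, decisions))
    return [[counts[(t, d)] for d in (True, False)] for t in (True, False)]


# One run where X=Y is the truth and the test also decides "same distribution".
print(cross_table([True], [True]))
# Three runs: one agreement on "same", one false rejection, one agreement on "different".
print(cross_table([True, True, False], [True, False, False]))
```

Counts on the diagonal are agreements between truth and test; off-diagonal counts are errors.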

In [30]:
print(text_tests)

exp-code=tests, Kernel=BasicRBFKernelFunction with length_scale=1.7762933595325865 optimization=True
p-value=0.37
+----------------+--------+---------+
| Truth / Test   |   True |   False |
|----------------+--------+---------|
| True           |      1 |       0 |
| False          |      0 |       0 |
+----------------+--------+---------+

exp-code=tests, Kernel=BasicRBFKernelFunction with length_scale=1.7762933595325865 optimization=False
p-value=0.5
+----------------+--------+---------+
| Truth / Test   |   True |   False |
|----------------+--------+---------|
| True           |      1 |       0 |
| False          |      0 |       0 |
+----------------+--------+---------+

exp-code=tests, Kernel=BasicRBFKernelFunction with length_scale=1.9740368666226569 optimization=True
p-value=0.0
+----------------+--------+---------+
| Truth / Test   |   True |   False |
|----------------+--------+---------|
| True           |      0 |       0 |
| False          |      0 |       1 |
+----------