This notebook reproduces the results presented in the two-sample experiments. To compare against other methods, add any of 'tst.GP_test', 'tst.MMD', or 'tst.C2ST' to the methods list in the cells below.

In [9]:
%load_ext autoreload
%autoreload 2
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
import general_utils
import data
import kernel_utils
import two_sample_test_utils as tst



### Comparisons of type I error for different sample sizes and numbers of observations
We generate data under the null hypothesis that both samples come from the same distribution. Below we define the methods to be compared, the desired level $\alpha$, and the number of runs over which results are averaged.

In [5]:
methods = [tst.RMMD]  # note: tst.GP_test takes a long time to run
alpha = 0.05  # significance level
num_runs = 100  # number of times the experiment is replicated
variance = 0.1  # variance of the error terms

# iterate over different sample sizes
params_size = [100, 250, 500]
type_I_error = general_utils.performance_comparisons(methods, num_runs, param_name='size', params=params_size,
                                                     var1=variance, var2=variance, alpha=alpha, num_obs=20,
                                                     meta_mu=6)

print("Type I error across different number of samples")
for key, value in type_I_error.items():
    print(key, '--> ', value)

100%|████████████████████████████████████████████████████████████████████████████████| 100/100 [00:07<00:00, 12.72it/s]
100%|████████████████████████████████████████████████████████████████████████████████| 100/100 [00:19<00:00,  5.02it/s]
100%|████████████████████████████████████████████████████████████████████████████████| 100/100 [00:49<00:00,  2.00it/s]


Type I error across different number of samples
method: RMMD; param value: 100  -->  0.09999999999999999
method: RMMD; param value: 250  -->  0.09999999999999999
method: RMMD; param value: 500  -->  0.17
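
Each rate printed above is a proportion of rejections over `num_runs` = 100 replications, so it carries Monte Carlo error. As a rough guide, and assuming the runs are independent, a normal-approximation interval can be attached to each rate. The helper `rejection_rate_interval` below is hypothetical and is not part of `general_utils`.

```python
import math


def rejection_rate_interval(rate, num_runs, z=1.96):
    """Normal-approximation interval for a Monte Carlo rejection rate.

    Hypothetical helper: treats the rate as a binomial proportion
    estimated from `num_runs` independent replications.
    """
    standard_error = math.sqrt(rate * (1.0 - rate) / num_runs)
    lower = max(0.0, rate - z * standard_error)
    upper = min(1.0, rate + z * standard_error)
    return lower, upper


# Rates taken from the output above (0.0999... is rounded to 0.10).
for rate in (0.10, 0.17):
    low, high = rejection_rate_interval(rate, num_runs=100)
    print(f"rate={rate:.2f}: approx. 95% interval [{low:.3f}, {high:.3f}]")
```

Comparing such an interval with the nominal level `alpha` gives a sense of whether a deviation could be explained by the limited number of runs alone.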


In [None]:
methods = [tst.RMMD, tst.MMD]
alpha = 0.05
num_runs = 500
variance = 0.25

# iterate over different numbers of observations
params = [5, 10, 20, 50, 100]
type_I_error = general_utils.performance_comparisons(methods, num_runs, param_name='num_obs', params=params,
                                                     var1=variance, var2=variance, alpha=alpha)
print("Type I error across different number of observations")
for key, value in type_I_error.items():
    print(key, '--> ', value)

  4%|███▏                                                                             | 20/500 [00:10<04:20,  1.84it/s]

### Comparisons on power for convergent mean functions
We generate data under the alternative hypothesis that the two underlying distributions differ. In this experiment both mean functions are sine waves, and we vary the amplitude of one of them to make the problem progressively harder; the variance is the same in both samples. Below we define the methods to be compared, the desired level $\alpha$, and the number of runs over which results are averaged.

In [10]:
methods = [tst.RMMD, tst.MMD]  # optionally add tst.C2ST
alpha = 0.05  # significance level
num_runs = 50


# iterate over different mean functions; the reference sample has amplitude 1
params = [1.5, 1.25, 1.1, 1.05]
power = general_utils.performance_comparisons(methods, num_runs, param_name='mean_scale', params=params,
                                              mean_scale=1, alpha=alpha, num_obs=20, size=300)
print("Power across different scales of the mean function")
for key, value in power.items():
    print(key, '--> ', value)

100%|██████████████████████████████████████████████████████████████████████████████████| 50/50 [00:29<00:00,  1.70it/s]
100%|██████████████████████████████████████████████████████████████████████████████████| 50/50 [00:29<00:00,  1.70it/s]
100%|██████████████████████████████████████████████████████████████████████████████████| 50/50 [00:28<00:00,  1.76it/s]
100%|██████████████████████████████████████████████████████████████████████████████████| 50/50 [00:27<00:00,  1.80it/s]


Power across different scales of the mean function
method: RMMD; param value: 1.5  -->  1.0000000000000004
method: MMD; param value: 1.5  -->  1.0000000000000004
method: RMMD; param value: 1.25  -->  1.0000000000000004
method: MMD; param value: 1.25  -->  0.9200000000000005
method: RMMD; param value: 1.1  -->  0.6400000000000002
method: MMD; param value: 1.1  -->  0.21999999999999997
method: RMMD; param value: 1.05  -->  0.23999999999999996
method: MMD; param value: 1.05  -->  0.08
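
To make the idea of power as a rejection rate concrete, here is a minimal, self-contained sketch of a kernel MMD permutation test applied to noisy sine curves of different amplitude. It is not the implementation behind `tst.MMD` or `tst.RMMD`: the data generator `sample_curves`, the fixed bandwidth, the biased MMD estimate, and the small sample sizes are all assumptions chosen to keep the example short and fast.

```python
import numpy as np


def rbf_kernel(z, bandwidth):
    """Gaussian (RBF) kernel matrix of the rows of z with themselves."""
    sq_norms = np.sum(z ** 2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * z @ z.T
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * bandwidth ** 2))


def mmd2(kernel, n):
    """Biased squared-MMD estimate; the first n rows form sample 1."""
    return (kernel[:n, :n].mean() + kernel[n:, n:].mean()
            - 2.0 * kernel[:n, n:].mean())


def permutation_p_value(x, y, rng, num_perms=200, bandwidth=2.0):
    """p-value obtained by permuting the pooled sample labels."""
    pooled = np.vstack([x, y])
    n = len(x)
    kernel = rbf_kernel(pooled, bandwidth)
    observed = mmd2(kernel, n)
    exceed = 0
    for _ in range(num_perms):
        idx = rng.permutation(len(pooled))
        exceed += int(mmd2(kernel[np.ix_(idx, idx)], n) >= observed)
    return (exceed + 1) / (num_perms + 1)


def sample_curves(rng, size, amplitude, num_obs=20, variance=0.1):
    """Hypothetical generator: sine curves on a fixed grid plus Gaussian noise."""
    t = np.linspace(0.0, 2.0 * np.pi, num_obs)
    noise = rng.normal(0.0, np.sqrt(variance), size=(size, num_obs))
    return amplitude * np.sin(t)[None, :] + noise


def rejection_rate(amplitude, num_runs=20, size=50, alpha=0.05, seed=0):
    """Fraction of runs in which the test rejects at level alpha."""
    rng = np.random.default_rng(seed)
    rejections = 0
    for _ in range(num_runs):
        x = sample_curves(rng, size, amplitude=1.0)  # reference sample
        y = sample_curves(rng, size, amplitude=amplitude)
        rejections += int(permutation_p_value(x, y, rng) <= alpha)
    return rejections / num_runs


for amplitude in (1.5, 1.0):
    print(f"amplitude={amplitude}: rejection rate {rejection_rate(amplitude):.2f}")
```

With `amplitude=1.0` both samples share the same distribution, so the printed rate estimates the type I error; with any other amplitude it estimates power.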


### Comparisons on power for different error variances

In [8]:
methods = [tst.RMMD, tst.C2ST, tst.MMD]
alpha = 0.05  # significance level
num_runs = 50


# iterate over different Gaussian error variances
# the variance of the fixed sample is 0.1
params = [0.11, 0.15, 0.2, 0.25]
power = general_utils.performance_comparisons(methods, num_runs, param_name='var1', params=params,
                                              alpha=alpha)
print("Power across different error variance")
for key, value in power.items():
    print(key, '--> ', value)

100%|██████████████████████████████████████████████████████████████████████████████████| 50/50 [02:44<00:00,  3.28s/it]
100%|██████████████████████████████████████████████████████████████████████████████████| 50/50 [02:44<00:00,  3.29s/it]
100%|██████████████████████████████████████████████████████████████████████████████████| 50/50 [02:42<00:00,  3.25s/it]
100%|██████████████████████████████████████████████████████████████████████████████████| 50/50 [02:45<00:00,  3.30s/it]


Power across different error variance
method: RMMD; param value: 0.11  -->  0.21999999999999997
method: C2ST; param value: 0.11  -->  0.04
method: MMD; param value: 0.11  -->  0.06
method: RMMD; param value: 0.15  -->  0.9800000000000005
method: C2ST; param value: 0.15  -->  0.1
method: MMD; param value: 0.15  -->  0.34
method: RMMD; param value: 0.2  -->  1.0000000000000004
method: C2ST; param value: 0.2  -->  0.38000000000000006
method: MMD; param value: 0.2  -->  0.7000000000000003
method: RMMD; param value: 0.25  -->  1.0000000000000004
method: C2ST; param value: 0.25  -->  0.6800000000000003
method: MMD; param value: 0.25  -->  1.0000000000000004


### Comparisons of time complexity across different models

In [None]:
methods = [tst.RMMD, tst.C2ST, tst.MMD]
sizes = [1000, 2000]
times = general_utils.time_complexity(methods, num_runs=10, sizes=sizes, num_obs=10)
print("Average run times for 1 test computation with varying sample size of 10 obs")
for key, value in times.items():
    print(key, '--> ', value)
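
For readers without access to `general_utils.time_complexity`, the following sketch shows one plausible way to average wall-clock time over repeated runs. The helper `average_run_time` and the timed workload (a Gram-matrix product) are assumptions for illustration only; the actual function may measure time differently.

```python
import time

import numpy as np


def average_run_time(fn, num_runs=10):
    """Average wall-clock seconds per call of fn over num_runs calls."""
    start = time.perf_counter()
    for _ in range(num_runs):
        fn()
    return (time.perf_counter() - start) / num_runs


rng = np.random.default_rng(0)
for size in (200, 400):
    x = rng.normal(size=(size, 10))  # `size` samples with 10 observations each
    seconds = average_run_time(lambda: x @ x.T, num_runs=10)
    print(f"size={size}: {seconds:.6f} s per Gram-matrix computation")
```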

### Performance comparisons as a function of the number of random features

In [12]:
perf = general_utils.perf_num_features(num_runs=100, num_features=[10, 50, 100, 250, 500], mean_scale=1.1)
print("Performance as a function of the number of random features")
for key, value in perf.items():
    print(key, '--> ', value)

100%|██████████| 100/100 [00:52<00:00,  1.89it/s]
100%|██████████| 100/100 [00:54<00:00,  1.83it/s]
100%|██████████| 100/100 [00:56<00:00,  1.78it/s]
100%|██████████| 100/100 [00:56<00:00,  1.77it/s]
100%|██████████| 100/100 [01:00<00:00,  1.66it/s]


Performance as a function of the number of random features
method: RMMD; number of features: 10  -->  0.6400000000000003
method: RMMD; number of features: 50  -->  0.6100000000000003
method: RMMD; number of features: 100  -->  0.6300000000000003
method: RMMD; number of features: 250  -->  0.6700000000000004
method: RMMD; number of features: 500  -->  0.6100000000000003
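
The source does not say which kind of random features RMMD uses. Assuming they are random Fourier features for a Gaussian (RBF) kernel, the sketch below illustrates how the kernel approximation error shrinks as the number of features grows. The function `random_fourier_features`, the bandwidth, and the toy data are all hypothetical.

```python
import numpy as np


def random_fourier_features(x, num_features, bandwidth, rng):
    """Random Fourier feature map for the kernel exp(-||x - y||^2 / (2 * bandwidth^2))."""
    dim = x.shape[1]
    # Frequencies drawn from the kernel's spectral density, phases uniform on [0, 2*pi).
    weights = rng.normal(0.0, 1.0 / bandwidth, size=(dim, num_features))
    phases = rng.uniform(0.0, 2.0 * np.pi, size=num_features)
    return np.sqrt(2.0 / num_features) * np.cos(x @ weights + phases)


rng = np.random.default_rng(0)
bandwidth = 1.5
x = rng.normal(size=(200, 5))  # toy data, not the notebook's data

# Exact kernel matrix for reference.
sq_dists = ((x[:, None, :] - x[None, :, :]) ** 2).sum(axis=-1)
exact = np.exp(-sq_dists / (2.0 * bandwidth ** 2))

errors = {}
for num_features in [10, 50, 100, 250, 500]:
    phi = random_fourier_features(x, num_features, bandwidth, rng)
    errors[num_features] = np.abs(phi @ phi.T - exact).mean()
    print(f"num_features={num_features}: mean abs. error {errors[num_features]:.4f}")
```

A better kernel approximation need not translate into higher test power, which is what the experiment above measures directly.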
