This script reproduces the results presented in the independence-testing experiments. To compare against other methods, include any of 'tst.RDC', 'tst.COR', 'tst.HSIC' in the methods list below.

In [2]:
%load_ext autoreload
%autoreload 2
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
import general_utils
import data
import kernel_utils
import independence_test_utils as tst



### Comparisons on type I error for different sample sizes and number of observations
We generate data under the null hypothesis of independence between the functional samples. Below we define the methods to be compared, the significance level $\alpha$, and the number of runs over which the results are averaged.

In [3]:
methods = [tst.RHSIC,tst.RDC,tst.COR] #,tst.HSIC
alpha = 0.05 # significance level
num_runs= 100 # number of times experiment is replicated
variance = 0.1

# iterate over different sample sizes
params_size = [100]
error = general_utils.performance_comparisons_indep(methods,num_runs,param_name='size',params=params_size,
                                                     var=variance,alpha=alpha, data_type='ind')

print("Type I error with different number of samples")
for key, value in error.items():
    print(key, '--> ', value)

100%|████████████████████████████████████████████████████████████████████████████████| 100/100 [02:34<00:00,  1.55s/it]


Type I error different number of samples
method: RHSIC; param value: 100  -->  0.07
method: RDC; param value: 100  -->  0.05
method: COR; param value: 100  -->  0.08
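
The type I error reported above is the fraction of runs in which a test rejects at level $\alpha$ even though the samples are independent. The sketch below illustrates that estimate with a simple permutation test on the Pearson correlation of scalar Gaussian samples. The test, the data generator, and all function names are assumptions made for illustration; they are not the repository's `tst.COR` or `general_utils.performance_comparisons_indep`.

```python
import numpy as np

def perm_corr_pvalue(x, y, rng, num_perms=200):
    """Permutation p-value for |Pearson correlation| (stand-in test, not tst.COR)."""
    observed = abs(np.corrcoef(x, y)[0, 1])
    # Permuting y breaks any dependence on x, which simulates the null.
    null_stats = np.array([
        abs(np.corrcoef(x, rng.permutation(y))[0, 1]) for _ in range(num_perms)
    ])
    # Add-one correction keeps the p-value strictly positive.
    return (1 + np.sum(null_stats >= observed)) / (1 + num_perms)

def rejection_rate(sample_fn, rng, num_runs=100, alpha=0.05):
    """Fraction of runs in which the test rejects at level alpha."""
    rejections = 0
    for _ in range(num_runs):
        x, y = sample_fn(rng)
        rejections += int(perm_corr_pvalue(x, y, rng) <= alpha)
    return rejections / num_runs

def independent_pair(rng, size=100):
    """Two independent Gaussian samples, so the null hypothesis holds."""
    return rng.normal(size=size), rng.normal(size=size)

rng = np.random.default_rng(0)
type_one_error = rejection_rate(independent_pair, rng)
# Monte Carlo standard error of a rejection rate at the nominal level 0.05
mc_std_error = np.sqrt(0.05 * 0.95 / 100)
print(f"type I error: {type_one_error:.2f} (MC std. error about {mc_std_error:.3f})")
```

With 100 runs the Monte Carlo standard error at the nominal level is about 0.022, so the rates of 0.07, 0.05 and 0.08 reported above all lie within roughly one and a half standard errors of $\alpha = 0.05$.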


### Comparisons of power for less dependent data by increasing the variance of error terms
We generate data under the alternative hypothesis of dependence between the functional samples. Below we define the methods to be compared, the significance level $\alpha$, and the number of runs over which the results are averaged.

In [5]:
methods = [tst.RHSIC,tst.COR,tst.HSIC,tst.RDC]
alpha = 0.05 # significance level
num_runs= 50
variance = 0.1

# iterate over different variances with a random transformation (among a set of choices) 
# of the original signal
params_var = [0.1,0.25,0.5,0.75,1,1.25,1.5]
power = general_utils.performance_comparisons_indep(methods,num_runs,param_name='var',
                                                    params=params_var,alpha=alpha, num_obs=100,
                                                    data_type='dep',transformation = None)

print("Power across different values of the variance")
for key, value in power.items():
    print(key, '--> ', value)

100%|██████████████████████████████████████████████████████████████████████████████████| 50/50 [01:27<00:00,  1.75s/it]
100%|██████████████████████████████████████████████████████████████████████████████████| 50/50 [01:31<00:00,  1.82s/it]
100%|██████████████████████████████████████████████████████████████████████████████████| 50/50 [01:31<00:00,  1.84s/it]
100%|██████████████████████████████████████████████████████████████████████████████████| 50/50 [01:30<00:00,  1.81s/it]
100%|██████████████████████████████████████████████████████████████████████████████████| 50/50 [01:29<00:00,  1.78s/it]
100%|██████████████████████████████████████████████████████████████████████████████████| 50/50 [01:28<00:00,  1.77s/it]
100%|██████████████████████████████████████████████████████████████████████████████████| 50/50 [01:26<00:00,  1.72s/it]


Power across different values of the variance
method: RHSIC; param value: 0.1  -->  1.0000000000000004
method: COR; param value: 0.1  -->  0.36000000000000004
method: HSIC; param value: 0.1  -->  1.0000000000000004
method: RDC; param value: 0.1  -->  1.0000000000000004
method: RHSIC; param value: 0.25  -->  1.0000000000000004
method: COR; param value: 0.25  -->  0.34
method: HSIC; param value: 0.25  -->  0.9600000000000005
method: RDC; param value: 0.25  -->  1.0000000000000004
method: RHSIC; param value: 0.5  -->  0.9800000000000005
method: COR; param value: 0.5  -->  0.34
method: HSIC; param value: 0.5  -->  0.8600000000000004
method: RDC; param value: 0.5  -->  0.8800000000000004
method: RHSIC; param value: 0.75  -->  0.9800000000000005
method: COR; param value: 0.75  -->  0.4000000000000001
method: HSIC; param value: 0.75  -->  0.7400000000000003
method: RDC; param value: 0.75  -->  0.6800000000000003
method: RHSIC; param value: 1  -->  0.8000000000000004
method: COR; param value: 
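
Power is estimated the same way as type I error, except that the data are generated under dependence, so a rejection is now the correct decision. The sketch below shows how power can fall as the noise variance grows. It assumes a toy model with a weak linear signal, `y = 0.3 * x + noise`, and a permutation test on the Pearson correlation; both are illustrative assumptions and not the data model or tests used by the notebook.

```python
import numpy as np

def perm_pvalue(x, y, rng, num_perms=200):
    """Permutation p-value based on |Pearson correlation| (illustrative stand-in)."""
    observed = abs(np.corrcoef(x, y)[0, 1])
    null = [abs(np.corrcoef(x, rng.permutation(y))[0, 1]) for _ in range(num_perms)]
    return (1 + sum(s >= observed for s in null)) / (1 + num_perms)

def estimate_power(var, rng, num_runs=50, size=100, alpha=0.05):
    """Rejection rate when y depends on x through a weak linear signal plus noise."""
    rejections = 0
    for _ in range(num_runs):
        x = rng.normal(size=size)
        # Assumed toy model: larger var means a weaker dependence on x.
        y = 0.3 * x + rng.normal(scale=np.sqrt(var), size=size)
        rejections += int(perm_pvalue(x, y, rng) <= alpha)
    return rejections / num_runs

rng = np.random.default_rng(0)
power_by_var = {var: estimate_power(var, rng) for var in [0.1, 0.5, 1.5]}
for var, power in power_by_var.items():
    print(f"var={var}: power={power:.2f}")
```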

### Comparisons of power as a function of the number of observations in each trajectory
We generate data under the alternative hypothesis of dependence between the functional samples. Below we define the methods to be compared, the significance level $\alpha$, and the number of runs over which the results are averaged.

In [14]:
methods = [tst.COR,tst.RDC] #,tst.HSIC
alpha = 0.05 # significance level
num_runs= 100
variance = 1

# iterate over different numbers of observations per trajectory
params_num_obs = [5,10,20,50,100,200]
power = general_utils.performance_comparisons_indep(methods,num_runs,param_name='num_obs',
                                                    params=params_num_obs,alpha=alpha, 
                                                    data_type='dep',var = variance)

print("Power across different number of observations")
for key, value in power.items():
    print(key, '--> ', value)

100%|██████████| 100/100 [03:16<00:00,  1.97s/it]
100%|██████████| 100/100 [03:29<00:00,  2.09s/it]
100%|██████████| 100/100 [03:31<00:00,  2.12s/it]
100%|██████████| 100/100 [03:18<00:00,  1.99s/it]
100%|██████████| 100/100 [03:30<00:00,  2.10s/it]
100%|██████████| 100/100 [03:28<00:00,  2.09s/it]


Power across different number of observations
method: COR; param value: 5  -->  0.2800000000000001
method: RDC; param value: 5  -->  0.36000000000000015
method: COR; param value: 10  -->  0.34000000000000014
method: RDC; param value: 10  -->  0.47000000000000025
method: COR; param value: 20  -->  0.2800000000000001
method: RDC; param value: 20  -->  0.5200000000000002
method: COR; param value: 50  -->  0.2700000000000001
method: RDC; param value: 50  -->  0.47000000000000025
method: COR; param value: 100  -->  0.22000000000000006
method: RDC; param value: 100  -->  0.4300000000000002
method: COR; param value: 200  -->  0.36000000000000015
method: RDC; param value: 200  -->  0.45000000000000023
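
Values such as 0.2800000000000001 in the output are consistent with floating-point rounding, for example if a fraction like `1/num_runs` is added once per rejection. How `general_utils` actually accumulates the rate is not shown here, so that explanation is an assumption. The sketch below uses hypothetical outcomes to contrast accumulating fractions with counting rejections and dividing once, and shows rounding at display time.

```python
num_runs = 50
rejected = [True] * num_runs  # hypothetical outcomes: every run rejects

# Accumulating 1/num_runs per rejection can pick up rounding error.
accumulated = 0.0
for was_rejected in rejected:
    if was_rejected:
        accumulated += 1 / num_runs

# Counting rejections as integers and dividing once avoids the drift.
exact = sum(rejected) / num_runs

print(repr(accumulated), repr(exact))
print(f"{accumulated:.2f}")  # rounding at display time hides the noise
```

Either approach gives the same rate to within rounding error; the difference only affects how the numbers print.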


### Comparisons of time complexity across different models

In [5]:
methods = [tst.RDC,tst.COR,tst.RHSIC,tst.HSIC]
sizes = [1000,2000]
times = general_utils.time_complexity(methods,num_runs=10, sizes=sizes, num_obs = 10)
print("Average run times for 1 test computation with varying sample size of 10 obs")
for key, value in times.items():
    print(key, '--> ', value)

100%|██████████| 2/2 [07:34<00:00, 227.01s/it]


Average run times for 1 test computation with varying sample size of 10 obs
method: RDC, number of samples 1000 -->  7.632645606994629
method: COR, number of samples 1000 -->  0.1326216220855713
method: RHSIC, number of samples 1000 -->  2.2937623262405396
method: HSIC, number of samples 1000 -->  2.1175427198410035
method: RDC, number of samples 2000 -->  13.836620783805847
method: COR, number of samples 2000 -->  0.1805337905883789
method: RHSIC, number of samples 2000 -->  9.569473600387573
method: HSIC, number of samples 2000 -->  9.487808418273925
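
In the timings above, doubling the sample size from 1000 to 2000 raises the HSIC and RHSIC run times roughly fourfold, which would be consistent with a cost that is quadratic in the number of samples. The sketch below shows one way to measure an average run time per test computation. The helper `average_runtime` and the quadratic-cost `toy_statistic` are hypothetical names introduced for illustration and are not the implementation of `general_utils.time_complexity`.

```python
import time
import numpy as np

def average_runtime(test_fn, make_data, num_runs=10):
    """Mean wall-clock seconds of test_fn over num_runs fresh datasets."""
    elapsed = []
    for _ in range(num_runs):
        x, y = make_data()
        start = time.perf_counter()
        test_fn(x, y)
        elapsed.append(time.perf_counter() - start)  # excludes data generation
    return float(np.mean(elapsed))

def toy_statistic(x, y):
    """Hypothetical statistic built from two n-by-n Gaussian kernel matrices."""
    kx = np.exp(-0.5 * (x[:, None] - x[None, :]) ** 2)
    ky = np.exp(-0.5 * (y[:, None] - y[None, :]) ** 2)
    return float(np.mean(kx * ky))

rng = np.random.default_rng(0)
times = {}
for size in [250, 500]:
    def make_data(n=size):
        return rng.normal(size=n), rng.normal(size=n)
    times[size] = average_runtime(toy_statistic, make_data)
    print(f"number of samples {size} --> {times[size]:.4f} s")
```

Timing only the call to the test, and not the data generation, keeps the comparison between methods focused on the test computation itself.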
