This notebook is a tutorial accompanying the supplementary material for the paper *Dual-Efficient Ensemble Conditional Independence Testing* (ECIT).

The required dependencies can be installed via pip:

In [None]:
!pip install -r requirements.txt

Our ensemble framework can be applied on top of any conditional independence test (CIT) method.
The current implementation supports the following six widely used CIT methods:

1. `KCIT`
2. `RCIT` 
3. `LPCIT` (requires a short setup step, as described in the original repository: https://github.com/meyerscetbon/lp-ci-test)
4. `CMIknn`
5. `CCIT`
6. `FisherZ`  

Each of these CIT methods can be called directly as a function that returns a p-value:

In [None]:
import numpy as np
from ecit import *

cit_methods = [kcit, rcit, lpcit, cmiknn, ccit, fisherz]

np.random.seed(1)
x, y, z = generate_samples(n=800, indp="C")  # Conditionally independent data

In [3]:
p_value = kcit(x,y,z)
p_value

0.14060121965645966

To apply our method, use the `ECIT` class as follows. We recommend using one of the predefined functions listed here, which combine p-values via a stable distribution:

In [None]:
data = np.hstack([x,y,z])

# Each function takes a list of p-values as input and returns a single aggregated p-value.
p_combination_list = [
    p_alpha2,    # Uses stable distribution(alpha=2, beta=0, loc=0, scale=1)
    p_alpha175,  # Uses stable distribution(alpha=1.75, beta=0, loc=0, scale=1)
    p_alpha15,   # Uses stable distribution(alpha=1.5, beta=0, loc=0, scale=1)
    p_alpha125,  # Uses stable distribution(alpha=1.25, beta=0, loc=0, scale=1)
    p_alpha1,    # Uses stable distribution(alpha=1, beta=0, loc=0, scale=1)
    p_mean       # Simple mean of p-values
]

# Or define custom parameters for the stable distribution:
# p_combination = lambda p_list: p_stable(p_list, alpha=2, beta=0, loc=0, scale=1)
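
For intuition, the following is a minimal sketch of how a stable-distribution combination can work. It is not the `p_stable` implementation from `ecit`: the function name `combine_pvalues_sketch` is hypothetical, only the two closed-form cases alpha=2 (Gaussian) and alpha=1 (Cauchy) are covered, and the normalization by `k ** (1 / alpha)` is an assumption based on the stability property of symmetric stable laws. The sketch uses only the Python standard library.

```python
import math
from statistics import NormalDist


def combine_pvalues_sketch(p_list, alpha):
    """Combine p-values via a symmetric stable law (hypothetical sketch).

    Only alpha=2 (Gaussian) and alpha=1 (Cauchy) are handled, because
    their quantile and distribution functions have closed forms.
    """
    k = len(p_list)
    if k == 0:
        raise ValueError("p_list must not be empty")

    if alpha == 2:
        normal = NormalDist()  # the scale cancels out, so unit variance is fine
        quantile, cdf = normal.inv_cdf, normal.cdf
    elif alpha == 1:
        def quantile(u):
            return math.tan(math.pi * (u - 0.5))

        def cdf(x):
            return 0.5 + math.atan(x) / math.pi
    else:
        raise ValueError("this sketch only supports alpha=1 and alpha=2")

    # Keep p-values strictly inside (0, 1) so the quantiles stay finite.
    eps = 1e-12
    clipped = [min(max(p, eps), 1.0 - eps) for p in p_list]

    # Map each p-value to a score; a small p-value gives a large score.
    scores = [quantile(1.0 - p) for p in clipped]

    # Assumption: a sum of k i.i.d. symmetric alpha-stable variables has the
    # same law as k**(1/alpha) times a single one, so rescale accordingly.
    statistic = sum(scores) / k ** (1.0 / alpha)

    # Upper-tail probability of the rescaled statistic.
    return 1.0 - cdf(statistic)


p_gauss = combine_pvalues_sketch([0.01, 0.01], alpha=2)
p_cauchy = combine_pvalues_sketch([0.01, 0.01], alpha=1)
print(p_gauss, p_cauchy)
```

Under these assumptions, alpha=2 corresponds to Stouffer's method and alpha=1 to an equally weighted Cauchy combination, and a single p-value is returned unchanged in both cases.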

In [None]:
# Directly create an ECIT instance by specifying:
# - the base CIT method (callable),
# - the p-value combination function (callable),
# - and the number of data splits k (int).
ekcit = ECIT(data, kcit, p_alpha2, k=2)  # Or use custom combination: ECIT(data, kcit, p_combination, k=2)

# Apply the ECIT instance to test conditional independence of X and Y given Z.
p_value = ekcit([0], [1], [2])
p_value

0.42434490839326344
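
Conceptually, the call above splits the data into `k` parts, applies the base test to each part, and combines the resulting p-values. The sketch below illustrates that pattern with the standard library only. The function `ensemble_pvalue_sketch`, its signature, the random partitioning, and the placeholder `toy_base_test` are all hypothetical and are not taken from the `ecit` package; how `ECIT` actually partitions the data is not shown in this notebook.

```python
import random
from statistics import mean


def ensemble_pvalue_sketch(rows, base_test, combine, k, seed=0):
    """Split rows into k disjoint blocks, test each block, combine p-values."""
    if not 1 <= k <= len(rows):
        raise ValueError("k must be between 1 and the number of rows")

    # Assumption: the partition is random; a fixed seed keeps it reproducible.
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)

    # Every index lands in exactly one of the k blocks.
    blocks = [indices[i::k] for i in range(k)]

    # Run the base test once per block, then aggregate the p-values.
    p_values = [base_test([rows[j] for j in block]) for block in blocks]
    return combine(p_values)


block_sizes = []


def toy_base_test(block):
    # Placeholder only, NOT a real CIT: records the block size and
    # returns a fixed p-value so the control flow is easy to follow.
    block_sizes.append(len(block))
    return 0.5


rows = [(i, 2 * i, 3 * i) for i in range(10)]
p_value = ensemble_pvalue_sketch(rows, toy_base_test, mean, k=3)
print(p_value, sorted(block_sizes))
```

With `k=1` the single block contains every row, so the sketch reduces to one call of the base test on the full data.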

The following cells compare the power of a base test with that of its ensemble version on data that is not conditionally independent:

In [None]:
from tqdm import tqdm

def simple_compare(cit, k=2, n=1200, t=10):
    """Estimate power as the fraction of t runs that reject at the 0.05 level."""
    rejections = 0
    for i in tqdm(range(t)):
        np.random.seed(i)
        # indp="N": X and Y are NOT conditionally independent given Z
        data = np.hstack(generate_samples(n=n, indp="N"))
        ekcit = ECIT(data, cit, p_alpha2, k=k)
        p_value = ekcit([0], [1], [2])
        if p_value < 0.05:
            rejections += 1
    power = rejections / t
    if k == 1:
        print("Power:", power)
    else:
        print("Ensemble Power:", power)

In [7]:
simple_compare(kcit, k=1, n=1200, t=10)  # With k = 1, ECIT reduces to applying the base test directly.
# Note: this may take a few minutes.

100%|██████████| 10/10 [04:37<00:00, 27.78s/it]

Power: 0.4





In [8]:
simple_compare(kcit, k=3, n=1200, t=10)

100%|██████████| 10/10 [01:24<00:00,  8.40s/it]

Ensemble Power: 0.7
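
Because each power value above is a proportion estimated from only `t` runs, it can help to attach an interval to it. The sketch below computes a Wilson score interval for a binomial proportion using the standard library. The helper name `wilson_interval` is hypothetical and not part of `ecit`, and the 95% level (`z=1.96`) is an assumption.

```python
import math


def wilson_interval(successes, trials, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 for about 95%)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    if not 0 <= successes <= trials:
        raise ValueError("successes must lie between 0 and trials")

    p_hat = successes / trials
    denom = 1.0 + z * z / trials
    center = (p_hat + z * z / (2.0 * trials)) / denom
    half_width = (
        z * math.sqrt(p_hat * (1.0 - p_hat) / trials + z * z / (4.0 * trials ** 2))
        / denom
    )
    return center - half_width, center + half_width


# Rejection counts matching the two runs above: 4 of 10 and 7 of 10.
base_low, base_high = wilson_interval(4, 10)
ens_low, ens_high = wilson_interval(7, 10)
print(f"base:     [{base_low:.2f}, {base_high:.2f}]")
print(f"ensemble: [{ens_low:.2f}, {ens_high:.2f}]")
```

With `t=10` these intervals are wide, so increasing `t` in `simple_compare` gives a more precise estimate at the cost of a longer run time.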





All experiments and results presented in the paper are provided under the `./experiment` directory.

- `./experiment/eff` corresponds to **Section 4.1**.

- `./experiment/cit` corresponds to **Section 4.2** and **Appendix D.3**.
  (Note: Due to high computational cost, results for **CCIT** and **CMIknn** are split into three separate files each.)
- `./experiment/real_data` corresponds to **Section 4.3**  
  (Please refer to `Flow-Cytometry.txt` for detailed information.)

- `./experiment/eff/alpha_choose.ipynb` corresponds to **Appendix D.1**.

- `./experiment/pc` corresponds to **Appendix D.5**  
  (Note: This experiment is relatively time-consuming.)