In [14]:
import pandas as pd
from tester import Ktest

# Ktest: two-sample kernel tests for large datasets
## Functionalities demonstrated on real data

Loading data:

In [2]:
data = pd.read_csv('v5_data/RTqPCR_reversion_logcentered.csv', index_col=0)

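Constructing the metadata (one condition label per observation, taken from the second dot-separated field of each row name) and instantiating Ktest to compare the `48HREV` and `48HDIFF` conditions: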

In [3]:
# Condition label = second dot-separated field of each row name
meta = pd.Series(data.index.str.split('.').str[1], index=data.index)
kt_1 = Ktest(data=data, metadata=meta, sample_names=['48HREV','48HDIFF'])
print(kt_1)

An object of class Ktest.
83 features across 339 observations
Comparison: 48HREV (171 observations) and 48HDIFF (168 observations).
___Multivariate test results___
MMD:
not computed, run ktest.test.
kFDA:
not computed, run ktest.test.
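A minimal sketch of what the metadata construction does, on a toy frame. The row names below (`c1.48HREV` and so on) are hypothetical; the only assumption carried over from the cell above is that the condition label is the second dot-separated field. It also shows that only observations whose label appears in `sample_names` enter the comparison, which is consistent with the 339 = 171 + 168 observations reported in the summary.

```python
import pandas as pd

# Hypothetical row names of the form "<cell_id>.<condition>"
data = pd.DataFrame(
    {"gene_a": [0.1, -0.3, 0.5, 0.2], "gene_b": [1.0, 0.8, -0.2, -0.4]},
    index=["c1.48HREV", "c2.48HREV", "c3.48HDIFF", "c4.0H"],
)

# Split each row name on '.' and keep field 1 as the condition label
meta = pd.Series(data.index.str.split(".").str[1], index=data.index)

# Observations from conditions outside sample_names (here "0H") are left out
sample_names = ["48HREV", "48HDIFF"]
in_comparison = meta.isin(sample_names)
print(meta[in_comparison].value_counts())
```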


Performing the multivariate test with default settings (kFDA statistic with asymptotic p-values):

In [4]:
kt_1.test()

- Computing kFDA statistic
- Computing asymptotic p-values


In [5]:
print(kt_1)

An object of class Ktest.
83 features across 339 observations
Comparison: 48HREV (171 observations) and 48HDIFF (168 observations).
___Multivariate test results___
MMD:
not computed, run ktest.test.
kFDA:
Truncation 1: 117.43187874099617, pvalue: asymptotic - 2.3089574670908834e-27,
permutation - not computed.
Truncation 2: 159.66777872775367, pvalue: asymptotic - 2.1309947530545922e-35,
permutation - not computed.
Truncation 3: 208.18560203300848, pvalue: asymptotic - 7.18302843947566e-45,
permutation - not computed.
Truncation 4: 415.74023080466327, pvalue: asymptotic - 1.1041605715456346e-88,
permutation - not computed.
Truncation 5: 468.52865047574164, pvalue: asymptotic - 4.94307382314933e-99,
permutation - not computed.
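To give an intuition for the truncation index, here is a sketch of a truncated kFDA statistic with a *linear* kernel, where the feature map is explicit. For truncation t, it sums over the first t eigendirections of the pooled within-group covariance the squared projection of the mean difference divided by the corresponding eigenvalue, scaled by n1·n2/n. Ktest works with a kernel feature space, and its exact normalization and regularization may differ; `truncated_kfda` is a hypothetical helper. Since every term is non-negative, the statistic can only grow with t, as in the output above.

```python
import numpy as np

def truncated_kfda(x1, x2, max_trunc):
    """Truncated kFDA statistic for a linear kernel (explicit features).

    Assumed normalization: pooled within-group covariance, each group
    centered on its own mean, divided by n = n1 + n2.
    """
    n1, n2 = len(x1), len(x2)
    n = n1 + n2
    mean_diff = x2.mean(axis=0) - x1.mean(axis=0)
    centered = np.vstack([x1 - x1.mean(axis=0), x2 - x2.mean(axis=0)])
    within_cov = centered.T @ centered / n
    # eigh returns ascending eigenvalues; reverse so leading directions come first
    eigvals, eigvecs = np.linalg.eigh(within_cov)
    eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]
    proj_sq = (eigvecs.T @ mean_diff) ** 2
    terms = (n1 * n2 / n) * proj_sq[:max_trunc] / eigvals[:max_trunc]
    # Entry t-1 of the cumulative sum is the statistic at truncation t
    return np.cumsum(terms)

rng = np.random.default_rng(0)
x1 = rng.normal(0.0, 1.0, size=(171, 5))
x2 = rng.normal(0.3, 1.0, size=(168, 5))
stats = truncated_kfda(x1, x2, max_trunc=5)
print(stats)
```

With all directions kept, the sum reduces to a Hotelling-type statistic (n1·n2/n) dᵀ S⁻¹ d on the mean difference d.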


As a result, the kFDA statistic and its associated asymptotic p-value were computed for each truncation. They are stored in the attributes `kfdat` and `pval_kfdat_asymp`, respectively:

In [5]:
kt_1.kfdat

1       117.431879
2       159.667779
3       208.185602
4       415.740231
5       468.528650
          ...     
335    5505.910999
336    5508.706471
337    5943.961903
338    6394.955057
339    7601.363086
Length: 339, dtype: float64

In [6]:
kt_1.pval_kfdat_asymp

0      2.308957e-27
1      2.130995e-35
2      7.183028e-45
3      1.104161e-88
4      4.943074e-99
           ...     
334    0.000000e+00
335    0.000000e+00
336    0.000000e+00
337    0.000000e+00
338    0.000000e+00
Length: 339, dtype: float64
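The reported asymptotic p-values agree with the upper tail of a χ² distribution whose degrees of freedom equal the truncation, which is the asymptotic null distribution of the truncated kFDA statistic for a fixed truncation. The check below recomputes them with SciPy from the printed statistics; it does not call Ktest. Using the survival function `chi2.sf` rather than `1 - chi2.cdf` matters here: at these magnitudes `1 - cdf` rounds to exactly 0. For very large truncations even `sf` underflows double precision, which is consistent with the `0.000000e+00` entries in `pval_kfdat_asymp`.

```python
import numpy as np
from scipy.stats import chi2

# kFDA statistics and asymptotic p-values copied from the printed output above
kfda_stats = np.array([117.43187874099617, 159.66777872775367, 208.18560203300848,
                       415.74023080466327, 468.52865047574164])
reported_pvals = np.array([2.3089574670908834e-27, 2.1309947530545922e-35,
                           7.18302843947566e-45, 1.1041605715456346e-88,
                           4.94307382314933e-99])

truncations = np.arange(1, len(kfda_stats) + 1)
# Upper tail of chi2 with df = truncation; sf avoids the cancellation in 1 - cdf
recomputed = chi2.sf(kfda_stats, df=truncations)

for t, p_rep, p_new in zip(truncations, reported_pvals, recomputed):
    print(f"truncation {t}: reported {p_rep:.4e}, chi2.sf {p_new:.4e}")
```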

In addition to asymptotic p-values, permutation-based p-values can be computed; they are stored in `pval_kfdat_perm`:

In [8]:
kt_1.test(permutation=True, n_permutations=int(1e4))

- Computing kFDA statistic
- Performing permutations to compute p-values:


100%|█████████████████████████████████████| 10000/10000 [06:59<00:00, 23.85it/s]


In [9]:
print(kt_1)

An object of class Ktest.
83 features across 339 observations
Comparison: 48HREV (171 observations) and 48HDIFF (168 observations).
___Multivariate test results___
MMD:
not computed, run ktest.test.
kFDA:
Truncation 1: 117.43187874099617, pvalue: asymptotic - 2.3089574670908834e-27,
permutation - 0.0.
Truncation 2: 159.66777872775367, pvalue: asymptotic - 2.1309947530545922e-35,
permutation - 0.0.
Truncation 3: 208.18560203300848, pvalue: asymptotic - 7.18302843947566e-45,
permutation - 0.0.
Truncation 4: 415.74023080466327, pvalue: asymptotic - 1.1041605715456346e-88,
permutation - 0.0.
Truncation 5: 468.52865047574164, pvalue: asymptotic - 4.94307382314933e-99,
permutation - 0.0.
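A permutation p-value is the fraction of label reshufflings whose statistic is at least as large as the observed one. The generic sketch below uses a stand-in statistic (squared distance between group means) on simulated data; it is not Ktest's code, and `permutation_pvalue` and `mean_diff_sq` are hypothetical helpers. A printed `0.0` means that none of the permuted statistics reached the observed value, so it only tells us that p is below the resolution of about 1/10,000 that 10,000 permutations provide. A common convention, shown as `corrected`, adds one to numerator and denominator so the estimate is never exactly zero; the `0.0` above suggests that Ktest reports the raw proportion.

```python
import numpy as np

def permutation_pvalue(x1, x2, statistic, n_permutations=10_000, seed=0):
    """Generic two-sample permutation test (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    pooled = np.concatenate([x1, x2])
    n1 = len(x1)
    observed = statistic(x1, x2)
    n_extreme = 0
    for _ in range(n_permutations):
        # Reassign group labels at random, keeping the group sizes fixed
        perm = rng.permutation(len(pooled))
        if statistic(pooled[perm[:n1]], pooled[perm[n1:]]) >= observed:
            n_extreme += 1
    raw = n_extreme / n_permutations
    corrected = (n_extreme + 1) / (n_permutations + 1)
    return observed, raw, corrected

def mean_diff_sq(a, b):
    # Squared Euclidean distance between group means: a simple stand-in statistic
    return float(np.sum((a.mean(axis=0) - b.mean(axis=0)) ** 2))

rng = np.random.default_rng(1)
x1 = rng.normal(0.0, 1.0, size=(171, 5))
x2 = rng.normal(1.0, 1.0, size=(168, 5))
obs, raw, corrected = permutation_pvalue(x1, x2, mean_diff_sq, n_permutations=2000)
print(obs, raw, corrected)
```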


As an alternative to kFDA, the MMD test statistic can be computed (only permutation-based p-values are available for it). The statistic and its p-value are stored in `mmd` and `pval_mmd`, respectively:

In [10]:
kt_1.test(stat='mmd', permutation=True, n_permutations=int(1e4))

- Computing MMD statistic
- Performing permutations to compute p-values:


100%|████████████████████████████████████| 10000/10000 [01:09<00:00, 144.30it/s]


In [11]:
print(kt_1)

An object of class Ktest.
83 features across 339 observations
Comparison: 48HREV (171 observations) and 48HDIFF (168 observations).
___Multivariate test results___
MMD:
1.5076821888197004e-06, pvalue (permutation test): 0.0.
kFDA:
Truncation 1: 117.43187874099617, pvalue: asymptotic - 2.3089574670908834e-27,
permutation - 0.0.
Truncation 2: 159.66777872775367, pvalue: asymptotic - 2.1309947530545922e-35,
permutation - 0.0.
Truncation 3: 208.18560203300848, pvalue: asymptotic - 7.18302843947566e-45,
permutation - 0.0.
Truncation 4: 415.74023080466327, pvalue: asymptotic - 1.1041605715456346e-88,
permutation - 0.0.
Truncation 5: 468.52865047574164, pvalue: asymptotic - 4.94307382314933e-99,
permutation - 0.0.
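For reference, the squared MMD is the squared RKHS distance between the two kernel mean embeddings, estimated from within-group and between-group kernel averages. The sketch below uses a Gaussian kernel with a median-heuristic bandwidth and the biased (V-statistic) estimator; these choices are assumptions and need not match Ktest's defaults. The absolute value depends on the kernel and bandwidth, so a small number such as the `1.5e-06` printed above is not by itself evidence of a small difference; the permutation p-value is what calibrates it.

```python
import numpy as np

def gaussian_kernel(a, b, bandwidth):
    # Pairwise squared distances, clipped at 0 to absorb rounding error
    sq_dists = (np.sum(a**2, axis=1)[:, None] + np.sum(b**2, axis=1)[None, :]
                - 2.0 * a @ b.T)
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * bandwidth**2))

def mmd2_biased(x1, x2, bandwidth):
    """Biased (V-statistic) estimate of squared MMD with a Gaussian kernel."""
    k11 = gaussian_kernel(x1, x1, bandwidth)
    k22 = gaussian_kernel(x2, x2, bandwidth)
    k12 = gaussian_kernel(x1, x2, bandwidth)
    return float(k11.mean() + k22.mean() - 2.0 * k12.mean())

def median_heuristic(x):
    # Median pairwise Euclidean distance, a common bandwidth default
    sq = np.sum(x**2, axis=1)[:, None] + np.sum(x**2, axis=1)[None, :] - 2.0 * x @ x.T
    d = np.sqrt(np.maximum(sq, 0.0))
    return float(np.median(d[np.triu_indices_from(d, k=1)]))

rng = np.random.default_rng(2)
x1 = rng.normal(0.0, 1.0, size=(171, 5))
x2_same = rng.normal(0.0, 1.0, size=(168, 5))
x2_shift = rng.normal(1.0, 1.0, size=(168, 5))
bw = median_heuristic(np.vstack([x1, x2_shift]))
print(mmd2_biased(x1, x2_same, bw), mmd2_biased(x1, x2_shift, bw))
```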


To speed up computations, a Nyström approximation of the kernel can be used. It is activated when instantiating Ktest (`nystrom=True`), and several parameters can be customized, such as the number of landmarks and anchors or the landmark selection method:

In [12]:
n_landmarks = 100
landmark_method = 'kmeans++'
n_anchors = 30
kt_2 = Ktest(data=data, metadata=meta, nystrom=True, n_landmarks=n_landmarks,
             landmark_method=landmark_method, n_anchors=n_anchors)
# kFDA:
kt_2.test(permutation=True, n_permutations=int(1e5))

# MMD:
kt_2.test(stat='mmd', permutation=True, n_permutations=int(1e5))

- Computing kFDA statistic
- Performing permutations to compute p-values:


100%|███████████████████████████████████| 100000/100000 [37:56<00:00, 43.92it/s]


- Computing MMD statistic
- Performing permutations to compute p-values:


100%|███████████████████████████████████| 100000/100000 [31:22<00:00, 53.13it/s]


In [13]:
print(kt_2)

An object of class Ktest.
83 features across 346 observations
Comparison: 0H (173 observations) and 24H (173 observations).
Nystrom approximation with 50 landmarks.
___Multivariate test results___
MMD:
nan, pvalue (permutation test): 0.0.
kFDA:
Truncation 1: 61.228711240255166, pvalue: asymptotic - not computed,
permutation - 0.0.
Truncation 2: 88.26828792827772, pvalue: asymptotic - not computed,
permutation - 0.0.
Truncation 3: 123.17619453259113, pvalue: asymptotic - not computed,
permutation - 0.0.
Truncation 4: 418.5170170943204, pvalue: asymptotic - not computed,
permutation - 0.0.
Truncation 5: 446.46639300102913, pvalue: asymptotic - not computed,
permutation - 0.0.
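A Nyström approximation replaces the n×n kernel matrix K by C W⁺ Cᵀ, where C is the n×m kernel between all observations and m landmarks and W is the m×m kernel among landmarks, so the cost grows with n·m instead of n². The sketch below builds the corresponding low-dimensional features and reports the approximation error for two landmark counts. Landmarks are drawn uniformly at random here rather than with `'kmeans++'`, the Gaussian kernel and bandwidth are assumptions, `nystrom_factor` is a hypothetical helper, and the `n_anchors` parameter is not modeled.

```python
import numpy as np

def gaussian_kernel(a, b, bandwidth=1.0):
    sq = np.sum(a**2, axis=1)[:, None] + np.sum(b**2, axis=1)[None, :] - 2.0 * a @ b.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * bandwidth**2))

def nystrom_factor(x, landmarks, bandwidth=1.0, eps=1e-10):
    """Return features phi such that phi @ phi.T approximates K(x, x)."""
    c = gaussian_kernel(x, landmarks, bandwidth)          # n x m cross-kernel
    w = gaussian_kernel(landmarks, landmarks, bandwidth)  # m x m landmark kernel
    eigvals, eigvecs = np.linalg.eigh(w)
    keep = eigvals > eps  # drop numerically null directions (pseudo-inverse)
    # phi = C U diag(lambda^{-1/2}), hence phi phi^T = C W^+ C^T
    return c @ eigvecs[:, keep] / np.sqrt(eigvals[keep])

rng = np.random.default_rng(3)
x = rng.normal(size=(339, 5))
k_full = gaussian_kernel(x, x, bandwidth=2.0)
for m in (25, 100):
    # Uniform random landmarks (swapped in for kmeans++ selection)
    idx = rng.choice(len(x), size=m, replace=False)
    phi = nystrom_factor(x, x[idx], bandwidth=2.0)
    rel_err = np.linalg.norm(k_full - phi @ phi.T) / np.linalg.norm(k_full)
    print(m, phi.shape, round(rel_err, 4))
```

The kernel entries among the landmarks themselves are reproduced exactly; the approximation error comes from the other observations.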
