Jonckheere-Terpstra test -- tests whether the medians of several groups follow a significant, pre-specified ordering

- We want to know whether synergy scores increase in the order minor < moderate < major toxicity. This would be bad: synergy scores would rise significantly with toxicity, so optimizing on synergy would prioritize more toxic candidates.
- We also want to know whether synergy scores increase in the order major < moderate < minor toxicity. This would be good: higher synergy scores would be associated with lower toxicity.
- If there is no significant ordering, that is acceptable for the field of synergy prediction: prioritizing by synergy score is at least not optimizing for high toxicity.
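
The three cases above can be written as one null hypothesis and two ordered alternatives. The notation is introduced here for illustration only: $\theta_g$ stands for the median synergy score of toxicity group $g$.

```latex
\begin{aligned}
H_0 &: \theta_{\text{minor}} = \theta_{\text{moderate}} = \theta_{\text{major}} \\
H_1^{\uparrow} &: \theta_{\text{minor}} \le \theta_{\text{moderate}} \le \theta_{\text{major}}
  \quad \text{(at least one strict inequality)} \\
H_1^{\downarrow} &: \theta_{\text{major}} \le \theta_{\text{moderate}} \le \theta_{\text{minor}}
  \quad \text{(at least one strict inequality)}
\end{aligned}
```

Each alternative is tested separately with a one-tailed test, with the groups passed in the order the alternative claims.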

In [2]:
# import everything you need here
import math
from scipy.stats import norm

The Jonckheere-Terpstra test rests on 5 assumptions:
1. The dependent variable is ordinal or continuous (synergy score is continuous).
2. The independent variable consists of two or more ordinal, independent groups (toxicity is categorized as minor, moderate, or major).
3. Observations are independent: there is no relationship between observations within a group or between groups (the toxicity categories do not depend on each other).
4. The distributions in each group have the same shape and the same variability (the shapes were confirmed to be roughly the same, but the variances can differ considerably).
5. The order of the groups of the independent variable, and with it the alternative hypothesis, is decided a priori (minor < moderate < major toxicity, and also major < moderate < minor).
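
Assumption 4 can be checked informally with a per-group spread summary. The following is a minimal sketch using only the standard library. The helper `spread_summary` and the toy scores are hypothetical and are not taken from the notebook's data. A formal test of equal variances may be preferable for a real analysis.

```python
from statistics import median, quantiles, variance


def spread_summary(scores_by_group):
    """Return size, median, interquartile range and sample variance per group."""
    summary = {}
    for name, values in scores_by_group.items():
        q1, _, q3 = quantiles(values, n=4)  # quartile cut points
        summary[name] = {
            "n": len(values),
            "median": median(values),
            "iqr": q3 - q1,
            "variance": variance(values),
        }
    return summary


# Hypothetical toy scores, chosen only to illustrate unequal spread
toy_scores = {
    "Minor": [1.0, 2.0, 3.0, 4.0, 5.0],
    "Moderate": [0.0, 2.0, 4.0, 6.0, 8.0],
    "Major": [-5.0, 0.0, 5.0, 10.0, 15.0],
}

summary = spread_summary(toy_scores)
variances = [row["variance"] for row in summary.values()]
variance_ratio = max(variances) / min(variances)  # far above 1 suggests unequal spread

for name, row in summary.items():
    print(name, row)
print("largest / smallest variance:", variance_ratio)
```

A large variance ratio does not invalidate the test by itself, but it means a significant result may partly reflect a difference in spread rather than a shift in the medians.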

In [40]:
def jonckheere_terpestra_test(samples):
    """
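    Perform the Jonckheere-Terpstra test on the given samples.

    Parameters:
        samples: An array of arrays, where each inner array is a group containing the samples.
            Groups must be listed in the hypothesized increasing order.

    Returns:
        A tuple containing the standardized test statistic (z) and the
        one-tailed p-value from the normal approximation.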
    """
    
    if not samples or len(samples) < 2:
        raise ValueError("At least two groups are required")
    
    jt_stat = 0 # initialize the test statistic
    n_i = [len(sample) for sample in samples] # get group sizes
    N = sum(n_i) # total number of samples

    for i in range(len(samples) - 1): # for each group except the last
        for j in range(i + 1, len(samples)): # compare with every later group
            for first_sample in samples[i]: # for each sample in the first group
                for second_sample in samples[j]: # for each sample in the second group
                    if first_sample < second_sample:
                        jt_stat += 1
                    elif first_sample == second_sample:
                        jt_stat += 0.5

    # Calculate mean under null hypothesis
    mean = ((N**2) - sum(size**2 for size in n_i)) / 4
    
    # Calculate variance under null hypothesis
    term1 = N**2 * (2*N + 3)
    term2 = sum(size**2 * (2*size + 3) for size in n_i)
    variance = (term1 - term2) / 72

    # Calculate standardized statistic
    z_stat = (jt_stat - mean) / math.sqrt(variance)

    # Calculate one-tailed (upper-tail) p-value; small when values increase across groups
    p_value = 1 - norm.cdf(z_stat)
    
    return z_stat, p_value
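
Written out, the function above computes the following quantities. Here $k$ is the number of groups, $n_i$ the size of group $i$, $N = \sum_i n_i$, $x_{ia}$ the $a$-th observation in group $i$, and $\Phi$ the standard normal CDF. These symbols are notation added here to mirror the code.

```latex
\begin{aligned}
J &= \sum_{i<j} \sum_{a=1}^{n_i} \sum_{b=1}^{n_j}
     \left[ \mathbf{1}(x_{ia} < x_{jb}) + \tfrac{1}{2}\,\mathbf{1}(x_{ia} = x_{jb}) \right] \\
\mathrm{E}[J] &= \frac{N^2 - \sum_{i=1}^{k} n_i^2}{4} \\
\mathrm{Var}[J] &= \frac{N^2 (2N + 3) - \sum_{i=1}^{k} n_i^2 (2 n_i + 3)}{72} \\
z &= \frac{J - \mathrm{E}[J]}{\sqrt{\mathrm{Var}[J]}}, \qquad p = 1 - \Phi(z)
\end{aligned}
```

Ties contribute one half to $J$, but the variance above carries no tie correction, so p-values are approximate when the data contain many ties. As a worked check, the first test case below has group sizes 1, 4, 3, 2 and $J = 32.5$, which gives $\mathrm{E}[J] = (100 - 30)/4 = 17.5$, $\mathrm{Var}[J] = (2300 - 290)/72 \approx 27.917$, and $z = 15/\sqrt{27.917} \approx 2.839$.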

In [None]:
# Test cases

# https://www.usu.edu/math/jrstevens/biostat/projects2013/rep_ordnonpar.pdf
usu_edu_samples = [
    [48],
    [33, 59, 48, 56],
    [60, 101, 67],
    [85, 107],
]

print(jonckheere_terpestra_test(usu_edu_samples)) # Expected: 2.8489, p value 0.002263 - matches p value

# https://www.statext.com/practice/JonckheereTest01.php
statext_1_samples = [
    [7, 1, 2, 6, 11, 8],
    [4, 7, 16, 11, 21],
    [20, 25, 13, 9, 14, 11],
]

print(jonckheere_terpestra_test(statext_1_samples)) # Expected: 2.553308, 0.005206 - matches z stat

# https://www.statext.com/practice/JonckheereTest02.php
statext_2_samples = [
    [16, 8, 6],
    [27, 16, 15],
    [31, 29, 18, 42],
]

print(jonckheere_terpestra_test(statext_2_samples)) # Expected: 2.690981, 0.003562 - matches neither

# https://www.statext.com/practice/JonckheereTest03.php
statext_3_samples = [
    [40, 35, 38, 43, 44, 41],
    [38, 40, 47, 44, 40, 42],
    [48, 40, 45, 43, 46, 44],
]

print(jonckheere_terpestra_test(statext_3_samples)) # Expected: 2.02113, 0.021633 - matches both

(2.838961340444303, 0.002263032003871568)
(2.5533076283443092, 0.005335260246766338)
(2.6819135986192237, 0.0036601181678723727)
(2.0211302086361083, 0.021633143978495584)
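
The test groups are small (10 to 18 observations in total), so the normal approximation is only rough there. One possible cross-check, which is not part of the original notebook, is a Monte Carlo permutation test: shuffle the pooled observations, rebuild groups of the same sizes, and count how often the shuffled statistic is at least as large as the observed one. The sketch below uses only the standard library. The names `jt_statistic` and `jt_permutation_p_value`, the number of permutations, and the seed are all illustrative choices.

```python
import random


def jt_statistic(groups):
    """Unstandardized JT statistic: pairs in increasing order count 1, ties 0.5."""
    total = 0.0
    for i in range(len(groups) - 1):
        for j in range(i + 1, len(groups)):
            for a in groups[i]:
                for b in groups[j]:
                    if a < b:
                        total += 1.0
                    elif a == b:
                        total += 0.5
    return total


def jt_permutation_p_value(groups, n_permutations=2000, seed=0):
    """Estimate the upper-tail p-value by randomly reassigning observations to groups."""
    rng = random.Random(seed)
    sizes = [len(group) for group in groups]
    pooled = [value for group in groups for value in group]
    observed = jt_statistic(groups)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        regrouped, start = [], 0
        for size in sizes:  # cut the shuffled pool back into the original group sizes
            regrouped.append(pooled[start:start + size])
            start += size
        if jt_statistic(regrouped) >= observed:
            hits += 1
    # add-one correction keeps the estimate strictly above zero
    return observed, (hits + 1) / (n_permutations + 1)


usu_edu_samples = [[48], [33, 59, 48, 56], [60, 101, 67], [85, 107]]
observed_j, perm_p = jt_permutation_p_value(usu_edu_samples)
print(observed_j, perm_p)
```

The permutation estimate depends on the seed and the number of permutations, so it should agree with the normal-approximation p-value only in order of magnitude.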


Run the test for each synergy score across the toxicity categories

In [43]:
from toxicity_ranking import *

ddinter_df = get_ddinter_data()
drugcomb_df = get_drug_comb_data(bliss=True, loewe=True, hsa=True, zip=True)
drug_syntox_df, major_pairs, moderate_pairs, minor_pairs, unknown_pairs = find_drugcomb_ddinter_intersect(drugcomb_df, ddinter_df)

# remove pairs that have unknown toxicity
drug_syntox_df = drug_syntox_df[~drug_syntox_df['toxicity_category'].str.contains('Unknown')]


Original shape of drugcomb data:  (1432351, 26)
Final shape of filtered drugcomb data:  (123882, 26)
Number of drugs in common between drugcomb and ddinter [lowercase enforced]:  486
Major pairs in both DrugComb and in DDInter:  335
Moderate pairs in both DrugComb and in DDInter:  1027
Minor pairs in both DrugComb and in DDInter:  59
Unknown toxicity pairs in both DrugComb and in DDInter:  573
Total common pairs:  1994
Total known pairs:  1421


In [49]:
drug_syntox_major = drug_syntox_df[drug_syntox_df['toxicity_category'] == 'Major']
drug_syntox_moderate = drug_syntox_df[drug_syntox_df['toxicity_category'] == 'Moderate']
drug_syntox_minor = drug_syntox_df[drug_syntox_df['toxicity_category'] == 'Minor']

# Bliss JT Test
bliss_major_samples = drug_syntox_major['synergy_bliss'].values.tolist()
bliss_moderate_samples = drug_syntox_moderate['synergy_bliss'].values.tolist()
bliss_minor_samples = drug_syntox_minor['synergy_bliss'].values.tolist()

bliss_increasing_tox_samples = [
    bliss_minor_samples,
    bliss_moderate_samples,
    bliss_major_samples,
]

print("Increasing toxicity - Bliss", jonckheere_terpestra_test(bliss_increasing_tox_samples))

bliss_decreasing_tox_samples = [
    bliss_major_samples,
    bliss_moderate_samples,
    bliss_minor_samples,
]

print("Decreasing toxicity - Bliss", jonckheere_terpestra_test(bliss_decreasing_tox_samples))

# Loewe JT Test
loewe_major_samples = drug_syntox_major['synergy_loewe'].values.tolist()
loewe_moderate_samples = drug_syntox_moderate['synergy_loewe'].values.tolist()
loewe_minor_samples = drug_syntox_minor['synergy_loewe'].values.tolist()

loewe_increasing_tox_samples = [
    loewe_minor_samples,
    loewe_moderate_samples,
    loewe_major_samples,
]

print("Increasing toxicity - Loewe", jonckheere_terpestra_test(loewe_increasing_tox_samples))

loewe_decreasing_tox_samples = [
    loewe_major_samples,
    loewe_moderate_samples,
    loewe_minor_samples,
]

print("Decreasing toxicity - Loewe", jonckheere_terpestra_test(loewe_decreasing_tox_samples))

# HSA JT Test
hsa_major_samples = drug_syntox_major['synergy_hsa'].values.tolist()
hsa_moderate_samples = drug_syntox_moderate['synergy_hsa'].values.tolist()
hsa_minor_samples = drug_syntox_minor['synergy_hsa'].values.tolist()

hsa_increasing_tox_samples = [
    hsa_minor_samples,
    hsa_moderate_samples,
    hsa_major_samples,
]

print("Increasing toxicity - HSA", jonckheere_terpestra_test(hsa_increasing_tox_samples))

hsa_decreasing_tox_samples = [
    hsa_major_samples,
    hsa_moderate_samples,
    hsa_minor_samples,
]

print("Decreasing toxicity - HSA", jonckheere_terpestra_test(hsa_decreasing_tox_samples))

# ZIP JT Test
zip_major_samples = drug_syntox_major['synergy_zip'].values.tolist()
zip_moderate_samples = drug_syntox_moderate['synergy_zip'].values.tolist()
zip_minor_samples = drug_syntox_minor['synergy_zip'].values.tolist()

zip_increasing_tox_samples = [
    zip_minor_samples,
    zip_moderate_samples,
    zip_major_samples,
]

print("Increasing toxicity - ZIP", jonckheere_terpestra_test(zip_increasing_tox_samples))

zip_decreasing_tox_samples = [
    zip_major_samples,
    zip_moderate_samples,
    zip_minor_samples,
]

print("Decreasing toxicity - ZIP", jonckheere_terpestra_test(zip_decreasing_tox_samples))

Increasing toxicity - Bliss (5.81130571557779, 3.0993722033301196e-09)
Decreasing toxicity - Bliss (-5.81130571557779, 0.9999999969006278)
Increasing toxicity - Loewe (0.9590017542667599, 0.16877893071920358)
Decreasing toxicity - Loewe (-0.9590017542667599, 0.8312210692807964)
Increasing toxicity - HSA (-5.928971194953452, 0.9999999984758072)
Decreasing toxicity - HSA (5.928971194953452, 1.5241927719955584e-09)
Increasing toxicity - ZIP (-0.8434123572414516, 0.8005010683918299)
Decreasing toxicity - ZIP (0.8434123572414516, 0.19949893160817012)
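
In the output above, each decreasing-order z is exactly the negative of the increasing-order z. This follows from the statistic itself: reversing the group order turns every counted pair into an uncounted one, so the two one-tailed p-values sum to 1 and a single run per score is enough. The sketch below illustrates this and also drops missing values first, since a NaN compares false against everything and would otherwise inflate the group size without adding to the count. It is a hypothetical helper under stated assumptions, not the notebook's code: `jt_z_and_p`, `drop_missing`, `run_both_orderings`, and the toy scores are all made up for illustration, and the standard library's `NormalDist` stands in for `scipy.stats.norm`.

```python
import math
from statistics import NormalDist


def jt_z_and_p(groups):
    """Standardized JT statistic and upper-tail p-value (normal approximation, no tie correction)."""
    sizes = [len(group) for group in groups]
    total = sum(sizes)
    j_stat = 0.0
    for i in range(len(groups) - 1):
        for j in range(i + 1, len(groups)):
            for a in groups[i]:
                for b in groups[j]:
                    if a < b:
                        j_stat += 1.0
                    elif a == b:
                        j_stat += 0.5
    mean = (total**2 - sum(n**2 for n in sizes)) / 4
    variance = (total**2 * (2 * total + 3) - sum(n**2 * (2 * n + 3) for n in sizes)) / 72
    z = (j_stat - mean) / math.sqrt(variance)
    return z, 1 - NormalDist().cdf(z)


def drop_missing(values):
    """Remove NaN entries so they do not count toward the group size."""
    return [v for v in values if not (isinstance(v, float) and math.isnan(v))]


def run_both_orderings(scores_by_category, order=("Minor", "Moderate", "Major")):
    """Run the test in the given order and in the reverse order."""
    groups = [drop_missing(scores_by_category[name]) for name in order]
    return {
        "increasing": jt_z_and_p(groups),
        "decreasing": jt_z_and_p(groups[::-1]),
    }


# Hypothetical toy scores; in the notebook these would be the per-category lists
toy_scores = {
    "Minor": [-3.1, 0.4, 1.2, float("nan")],
    "Moderate": [0.9, 2.5, 3.3, 1.8],
    "Major": [2.2, 4.1, 5.0],
}

results = run_both_orderings(toy_scores)
for label, (z, p) in results.items():
    print(f"{label} toxicity: z = {z:.4f}, p = {p:.4f}")
```

Looping such a helper over the four score columns would replace the four near-identical blocks above, at the cost of losing the individually named sample lists.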
