Jonckheere-Terpstra test -- helps determine whether there is a significant ordering of the medians across several groups

- Want to know if synergy score increases with toxicity, minor < moderate < major (this would be bad: synergy scores would rise significantly with toxicity, so optimizing on synergy would push prioritized candidates toward higher toxicity)
- Want to know if synergy score increases as toxicity decreases, major < moderate < minor (this would be good: higher synergy scores would go along with lower toxicity)
- If there is no significant ordering, this is still okay for the field of synergy prediction -- at least you're not selecting for high toxicity when you prioritize by synergy score
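The two directional questions above can be sketched with a small stand-alone function. This is a minimal sketch, not the `jonckheere_terpestra_test` imported from `preprocessing_functions`: the helper name `jt_z_and_p` is hypothetical, ties are assumed to count as 0.5, and the variance uses the no-ties normal approximation. The data are the three groups from https://www.statext.com/practice/JonckheereTest03.php used as a stand-in, not real synergy scores.

```python
import math
from itertools import combinations


def jt_z_and_p(groups):
    """Hypothetical helper: Jonckheere-Terpstra z and one-sided p value.

    `groups` must be listed in the hypothesized increasing order.
    Assumptions: ties count as 0.5, variance uses the no-ties formula.
    """
    # J counts, for every pair of groups (earlier, later), how many
    # observation pairs have the larger value in the later group.
    j_stat = 0.0
    for earlier, later in combinations(groups, 2):
        for x in earlier:
            for y in later:
                if x < y:
                    j_stat += 1.0
                elif x == y:
                    j_stat += 0.5

    sizes = [len(g) for g in groups]
    n = sum(sizes)
    mean = (n**2 - sum(s**2 for s in sizes)) / 4
    var = (n**2 * (2 * n + 3) - sum(s**2 * (2 * s + 3) for s in sizes)) / 72
    z = (j_stat - mean) / math.sqrt(var)
    p = 0.5 * math.erfc(z / math.sqrt(2))  # upper-tail normal probability
    return z, p


# Stand-in data, listed in the order of the first hypothesized ordering.
samples = [
    [40, 35, 38, 43, 44, 41],
    [38, 40, 47, 44, 40, 42],
    [48, 40, 45, 43, 46, 44],
]

z_inc, p_inc = jt_z_and_p(samples)        # tests group 1 < group 2 < group 3
z_dec, p_dec = jt_z_and_p(samples[::-1])  # tests group 3 < group 2 < group 1

print("increasing order:", z_inc, p_inc)
print("decreasing order:", z_dec, p_dec)
```

Under these assumptions, reversing the group order only flips the sign of z, so the two one-sided p values sum to 1. At most one of the two orderings can come out significant, and if neither does, that corresponds to the "no significant ordering" case.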

In [None]:
# import everything you need here

from preprocessing_functions import jonckheere_terpestra_test


The Jonckheere-Terpstra test has 5 assumptions:
1. Dependent variable is ordinal or continuous (synergy score is continuous)
2. Independent variable should consist of two or more ordinal, independent groups (toxicity is categorized into minor, moderate, and major)
3. Should have independence of observations - no relationship between observations within each group or between groups (toxicity categories are not dependent on each other)
4. Distributions in each group have the same shape and the same variability (shapes confirmed to be roughly the same, but variances can be pretty different)
5. Decide a priori the order of the groups of the independent variable and the alternative hypothesis (minor < moderate < major toxicity; also major < moderate < minor)
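Assumption 4 is the one that is only roughly met here, so it can help to put numbers on how different the group spreads are. The sketch below is one possible way to do that with the standard library. The helper name `spread_summary` and the toy values are hypothetical, and no particular cutoff for an acceptable variance ratio is implied.

```python
import statistics


def spread_summary(groups):
    """Hypothetical helper: median, sample variance, and IQR for each group."""
    summary = {}
    for label, values in groups.items():
        # quantiles(n=4) returns the three quartile cut points
        q1, _, q3 = statistics.quantiles(values, n=4)
        summary[label] = {
            "n": len(values),
            "median": statistics.median(values),
            "variance": statistics.variance(values),
            "iqr": q3 - q1,
        }
    return summary


# Made-up numbers for illustration only, not real synergy scores.
toy_scores = {
    "minor": [1.0, 2.0, 3.0, 4.0, 5.0],
    "moderate": [2.0, 4.0, 6.0, 8.0, 10.0],
    "major": [3.0, 3.5, 4.0, 4.5, 5.0],
}

summary = spread_summary(toy_scores)
variances = [group_stats["variance"] for group_stats in summary.values()]
variance_ratio = max(variances) / min(variances)

for label, group_stats in summary.items():
    print(label, group_stats)
print("largest / smallest variance:", variance_ratio)
```

A large ratio between the largest and smallest variance is a reason to interpret the test result with more caution, since the test then reflects differences in spread as well as a shift in location.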

In [2]:
# Test cases

# https://www.usu.edu/math/jrstevens/biostat/projects2013/rep_ordnonpar.pdf
usu_edu_samples = [
    [48],
    [33, 59, 48, 56],
    [60, 101, 67],
    [85, 107],
]

print(jonckheere_terpestra_test(usu_edu_samples))  # Expected: z = 2.8489, p = 0.002263 - p value matches, z does not

# https://www.statext.com/practice/JonckheereTest01.php
statext_1_samples = [
    [7, 1, 2, 6, 11, 8],
    [4, 7, 16, 11, 21],
    [20, 25, 13, 9, 14, 11],
]

print(jonckheere_terpestra_test(statext_1_samples))  # Expected: z = 2.553308, p = 0.005206 - z matches, p value does not

# https://www.statext.com/practice/JonckheereTest02.php
statext_2_samples = [
    [16, 8, 6],
    [27, 16, 15],
    [31, 29, 18, 42],
]

print(jonckheere_terpestra_test(statext_2_samples))  # Expected: z = 2.690981, p = 0.003562 - neither matches exactly

# https://www.statext.com/practice/JonckheereTest03.php
statext_3_samples = [
    [40, 35, 38, 43, 44, 41],
    [38, 40, 47, 44, 40, 42],
    [48, 40, 45, 43, 46, 44],
]

print(jonckheere_terpestra_test(statext_3_samples))  # Expected: z = 2.02113, p = 0.021633 - both match

(2.838961340444303, 0.002263032003871568)
(2.5533076283443092, 0.005335260246766338)
(2.6819135986192237, 0.0036601181678723727)
(2.0211302086361083, 0.021633143978495584)
