**NONPARAMETRIC TESTS**

**Parametric vs Nonparametric Tests**
> Nonparametric tests are statistical methods that do not assume a specific distribution for the data. Parametric tests are generally more powerful than nonparametric tests when their assumptions are met, so prefer a parametric test whenever its assumptions hold. Nonparametric tests are often based on the ranks of the data rather than the actual values.
> 
> *Spearman correlation* is similar to the *Pearson correlation*, but ranks are used instead of raw values. The *t-test for independent samples* assesses whether there is a mean difference, whereas the *Mann-Whitney U test* checks for a difference in rank sums.
>
> **Rank Sum Calculation:** Pool the data from both groups and sort them from smallest to largest. Assign a rank to each value, giving tied values the average of the ranks they span. Then add up the ranks separately for the first group and for the second group.
>
> <img src="images/Parametric vs Nonparametric.png" alt="Parametric vs Nonparametric" width="750" style="display: block; margin: 0 auto;"/>
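
The rank sum calculation can be sketched in a few lines. The two groups here are made-up values chosen only for illustration (they are not part of this notebook), and `scipy.stats.rankdata` is used because it assigns average ranks to ties by default.

```python
import numpy as np
from scipy.stats import rankdata

# Hypothetical example data (not from the notebook)
group_a = np.array([12, 15, 11])
group_b = np.array([18, 14, 20, 15])

# Rank all observations together; tied values share the average rank
ranks = rankdata(np.concatenate([group_a, group_b]))

# Split the pooled ranks back into the two groups and sum them
rank_sum_a = ranks[:len(group_a)].sum()
rank_sum_b = ranks[len(group_a):].sum()

print(rank_sum_a, rank_sum_b)
```

As a quick sanity check, the two rank sums should always add up to n(n+1)/2, where n is the total number of observations.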


**Wilcoxon Test**
> Tests whether there is a difference between two dependent samples. The Wilcoxon signed-rank test is the nonparametric counterpart to the t-test for dependent samples.
> 
> |  |  |
> |--|--|
> | $H_0$ | → The central tendencies of the two dependent samples are the same. |
> | $H_1$ | → The central tendencies of the two dependent samples are unequal.  |
> 
> **ASSUMPTIONS**
> > → Only two dependent random samples with at least ordinally scaled characteristics need to be available. <br>
> > → The variables do not have to follow a particular distribution. However, the distribution of the differences between the two dependent samples should be approximately symmetric.
>


**Mann-Whitney U Test**
> Tests whether there is a difference between two independent samples. The Mann–Whitney U test is the nonparametric counterpart to the t-test for independent samples.
>
> |  |  |
> |--|--|
> | Two independent samples t-test | → Mean difference |
> | Mann–Whitney U | → Rank sum difference |
>
> |  |  |
> |--|--|
> | $H_0$ | → In two samples, the rank sums do not differ significantly. |
> | $H_1$ | → In two samples, the rank sums do differ significantly.  |


**Kruskal–Wallis Test**
> The Kruskal-Wallis test is used to compare three or more independent groups on a continuous or ordinal outcome. It is the nonparametric counterpart to the one-way ANOVA.
>
> When the Kruskal-Wallis test detects a difference between groups, it is necessary to determine which groups differ. A **post-hoc test** such as **Conover's Test** or **Dunn's Test** is applied for this purpose.
> 
> |  |  |
> |--|--|
> | $H_0$ | → All groups of the independent variable have the same central tendency and therefore come from the same population. <br> → No difference in the rank sum. <br> → The medians of all groups are equal. |
> | $H_1$ | → At least one group of the independent variable does not have the same central tendency as the other groups and therefore comes from a different population. <br> → At least one group differs in the rank sum. <br> → At least one group's median is different from the others. |
>
> **ASSUMPTIONS**
> > → The independent variable is nominal or ordinal with more than two levels (groups). <br>
> > → The dependent variable must have a metric or ordinal scale.
>

**Friedman Test**
> The Friedman test is used to detect differences across three or more related groups (conditions or treatments), where the same subjects are measured under each condition (within-subjects design). It is the non-parametric counterpart to the repeated measures ANOVA.
> > **Independent variable:** refers to the conditions or treatments. <br>
> > **Dependent variable:** refers to the outcome being measured under each condition. The same dependent variable is measured repeatedly across different conditions.
>
> |  |  |
> |--|--|
> | $H_0$ | → There are no significant differences between the dependent groups. <br> → The distributions of the ranks of the dependent variable are the same across the conditions or time points. |
> | $H_1$ | → At least one dependent group differs. <br> → At least one condition differs from the others in terms of the distribution of ranks. |
>

<p style="background-image: linear-gradient(to right, #0aa98f, #68dab2)"> &nbsp; </p>

**Likert Scale**
> A Likert scale is a psychometric scale commonly used in surveys to measure attitudes, opinions, or perceptions. It typically consists of a series of statements where respondents indicate their level of agreement or disagreement on a symmetrical scale, often ranging from strongly agree to strongly disagree.
>
> The Likert scale helps quantify subjective data, making it easier to analyze statistically. When analyzing Likert data, it's important to consider whether we treat the data as ordinal (ranking without assuming equal intervals between responses) or interval (assuming equal intervals between responses), as this affects the choice of statistical tests.

**Cronbach's Alpha**
> Cronbach's alpha is a measure of internal consistency or reliability, commonly used to assess the reliability of a set of scale or test items (e.g., in surveys or questionnaires). It evaluates how well the individual items in a scale correlate with one another, essentially checking if they measure the same underlying construct.
>
> | Cronbach's alpha | Interpretation |
> |-|-|
> | α ≥ 0.9 | Excellent (very high reliability) |
> | 0.8 ≤ α < 0.9 | Good |
> | 0.7 ≤ α < 0.8 | Acceptable |
> | 0.6 ≤ α < 0.7 | Questionable |
> | 0.5 ≤ α < 0.6 | Poor |
> | α < 0.5 | Unacceptable |
> 
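
As a rough illustration of what the coefficient measures, the sketch below computes Cronbach's alpha from its usual definition, $\alpha = \frac{k}{k-1}\left(1 - \frac{\sum_i s_i^2}{s_T^2}\right)$, where $k$ is the number of items, $s_i^2$ is the variance of item $i$, and $s_T^2$ is the variance of the summed score. The helper name `cronbach_alpha` and the Likert responses are assumptions made up for this example; they do not come from the notebook.

```python
import pandas as pd

def cronbach_alpha(items: pd.DataFrame) -> float:
    """Cronbach's alpha for a respondents-by-items table of scores."""
    k = items.shape[1]                              # number of items
    item_variances = items.var(axis=0, ddof=1)      # sample variance of each item
    total_variance = items.sum(axis=1).var(ddof=1)  # variance of the summed score
    return k / (k - 1) * (1 - item_variances.sum() / total_variance)

# Hypothetical 5-point Likert responses: 5 respondents x 3 items
responses = pd.DataFrame({
    'item_1': [4, 5, 3, 4, 2],
    'item_2': [4, 4, 3, 5, 2],
    'item_3': [5, 5, 2, 4, 1],
})

alpha_value = cronbach_alpha(responses)
print(f'Cronbach alpha: {alpha_value:.3f}')
```

pingouin also provides `pg.cronbach_alpha`, which should give the same point estimate together with a confidence interval.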

<p style="background-image: linear-gradient(to right, #0aa98f, #68dab2)"> &nbsp; </p>

In [1]:
import pandas as pd
import pingouin as pg
from scipy import stats
import scikit_posthocs as sp

<p style="background-image: linear-gradient(#0aa98f, #ffffff 10%); font-weight:bold;"> 
    &nbsp; Functions to Use </p>

In [2]:
α = alpha = 0.05

def decision(p, alpha=0.05):
    """Return the test decision as text for p-value `p` at level `alpha`."""
    if p < alpha:
        return 'H0 rejected.'
    return 'H0 cannot be rejected.'

def _decision(p, alpha=0.05):
    """Return False if H0 is rejected (p < alpha), True if H0 is retained."""
    return not (p < alpha)

<p style="background-image: linear-gradient(to right, #0aa98f, #68dab2)"> &nbsp; </p>

<p style="background-image: linear-gradient(#0aa98f, #ffffff 10%); font-weight:bold;"> 
 &nbsp; SIGN TEST </p>
    
|  |  |
|--|--|
| $H_0$ | $M = X$ (the population median equals $X$) |
| $H_1$ | $M ≠ X$ (the population median differs from $X$) |

In [3]:
X = 30
data = {
    'Participant': range(1,26),
    'Score': [1,1,2,2,3,3,4,5,5,6,7,7,8,10,20,22,25,27,33,40,42,50,55,75,80]
}

data = pd.DataFrame(data)

In [4]:
pg.normality(data['Score'])

Unnamed: 0,W,pval,normal
Score,0.811745,0.000359,False


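Since the **normality assumption is not met** and there is **not enough data to invoke the central limit theorem**, a nonparametric test is used instead of a one-sample t-test. The cell below applies the one-sample Wilcoxon signed-rank test to the differences from $X$; unlike the classical sign test, it also uses the magnitudes of the differences, not only their signs.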

In [5]:
# t, p = stats.wilcoxon(data['Score']-X)
# print(f'p: {p:.4f} \t Decision: {decision(p)} ')

test = pg.wilcoxon(data['Score']-X)
test['Null Hypothesis'] = test['p-val'].map(_decision)
test

Unnamed: 0,W-val,alternative,p-val,RBC,CLES,Null Hypothesis
Wilcoxon,86.5,two-sided,0.039339,-0.467692,,False


In [6]:
print('Median:',data['Score'].median())

Median: 8.0
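
For comparison, a classical sign test can be sketched as a binomial test on the number of observations above and below $X$. This is only an illustrative alternative to the Wilcoxon call above, not the method the notebook runs; it assumes SciPy 1.7 or later for `stats.binomtest`, and the variable names are chosen for this example.

```python
from scipy import stats

X = 30
scores = [1, 1, 2, 2, 3, 3, 4, 5, 5, 6, 7, 7, 8, 10,
          20, 22, 25, 27, 33, 40, 42, 50, 55, 75, 80]

# Count observations above and below the hypothesized median;
# values exactly equal to X would be dropped
n_above = sum(s > X for s in scores)
n_below = sum(s < X for s in scores)

# Under H0, each remaining observation falls above X with probability 0.5
result = stats.binomtest(n_above, n_above + n_below, p=0.5,
                         alternative='two-sided')
print(f'above: {n_above}, below: {n_below}, p: {result.pvalue:.4f}')
```

Because the sign test ignores how far each score lies from $X$, it typically has less power than the signed-rank test when the differences are roughly symmetric.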


<p style="background-image: linear-gradient(to right, #0aa98f, #68dab2)"> &nbsp; </p>

<p style="background-image: linear-gradient(#0aa98f, #ffffff 10%); font-weight:bold;"> 
 &nbsp; MANN–WHITNEY U TEST </p>
    
|  |  |
|--|--|
| $H_0$ | → There is no difference between the duration of practicing sports according to gender. |
| $H_1$ | → There is a difference between the duration of practicing sports according to gender. |

In [7]:
data = {
    'Gender': ['F','F','F','F','F','F', 'M','M','M','M','M','M'],
    'Duration': [55,25,66,22,35,55,33,75,29,63,27,29]
}

data = pd.DataFrame(data)

male = data[data['Gender']=='M']['Duration']
female = data[data['Gender']=='F']['Duration']

In [8]:
pg.normality(data, dv='Duration', group='Gender')

Unnamed: 0_level_0,W,pval,normal
Gender,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
F,0.899448,0.370678,True
M,0.765136,0.027905,False


In [9]:
test = pg.mwu(female, male, alternative='two-sided')
test['Null Hypothesis'] = test['p-val'].map(_decision)
test

Unnamed: 0,U-val,alternative,p-val,RBC,CLES,Null Hypothesis
MWU,17.0,two-sided,0.935962,-0.055556,0.472222,True
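
To see where the `U-val` comes from, the sketch below rebuilds the U statistic from the rank sums of the same data. It is a hand calculation for illustration only, using the standard relation $U = R - \frac{n(n+1)}{2}$ for each group; the variable names are chosen for this example.

```python
import numpy as np
from scipy.stats import rankdata

female = np.array([55, 25, 66, 22, 35, 55])
male = np.array([33, 75, 29, 63, 27, 29])

# Rank the pooled data (ties get average ranks), then sum ranks per group
ranks = rankdata(np.concatenate([female, male]))
r_female = ranks[:len(female)].sum()
r_male = ranks[len(female):].sum()

# U for each group: rank sum minus the smallest possible rank sum
u_female = r_female - len(female) * (len(female) + 1) / 2
u_male = r_male - len(male) * (len(male) + 1) / 2

print(u_female, u_male)
```

The U value for the first sample corresponds to the statistic reported in the table above, and the two U values always sum to the product of the group sizes.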


<p style="background-image: linear-gradient(to right, #0aa98f, #68dab2)"> &nbsp; </p>

<p style="background-image: linear-gradient(#0aa98f, #ffffff 10%); font-weight:bold;"> 
 &nbsp; WILCOXON T TEST </p>
    
|  |  |
|--|--|
| $H_0$ | → There is no difference in the patients' values before and after treatment. |
| $H_1$ | → There is a difference in the patients' values before and after treatment. |

In [10]:
data = {
    'Patient':[1,2,3,4,5,6,7],
    'Before':[54,53,61,51,48,60,58],
    'After': [60,40,66,60,55,62,60]
}

data = pd.DataFrame(data)

data['Difference'] = data['Before'] - data['After']

In [11]:
pg.normality(data['Difference'])

Unnamed: 0,W,pval,normal
Difference,0.774784,0.022864,False


In [12]:
test = pg.wilcoxon(data['Before'], data['After'])
# test = pg.wilcoxon(data['Difference'])
test['Null Hypothesis'] = test['p-val'].map(_decision)
test

Unnamed: 0,W-val,alternative,p-val,RBC,CLES,Null Hypothesis
Wilcoxon,7.0,two-sided,0.296875,-0.5,0.295918,True
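
The `W-val` can be reproduced by hand from the signed ranks of the paired differences. The following is a minimal sketch of that calculation on the same data, assuming the common convention that the statistic is the smaller of the positive and negative rank sums; the variable names are chosen for this example.

```python
import numpy as np
from scipy.stats import rankdata

before = np.array([54, 53, 61, 51, 48, 60, 58])
after = np.array([60, 40, 66, 60, 55, 62, 60])

diff = before - after
diff = diff[diff != 0]            # zero differences carry no sign information
ranks = rankdata(np.abs(diff))    # rank absolute differences, averaging ties

w_plus = ranks[diff > 0].sum()    # sum of ranks of positive differences
w_minus = ranks[diff < 0].sum()   # sum of ranks of negative differences
w_stat = min(w_plus, w_minus)

print(w_plus, w_minus, w_stat)
```

Only one patient has a positive difference here, so the positive rank sum is small relative to the negative one, yet with seven pairs this is not enough evidence to reject $H_0$.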


<p style="background-image: linear-gradient(to right, #0aa98f, #68dab2)"> &nbsp; </p>

<p style="background-image: linear-gradient(#0aa98f, #ffffff 10%); font-weight:bold;"> 
 &nbsp; KRUSKAL-WALLIS H TEST </p>
    
|  |  |
|--|--|
| $H_0$ | → There is no difference between the results according to the methods applied. |
| $H_1$ | → At least one of the applied methods has different results. |

In [13]:
data = {
    'Method_1': [81,32,42,62,37,44,38,47,49,41],
    'Method_2': [48,31,25,22,30,30,32,15,40,77],
    'Method_3': [18,49,33,19,24,17,48,22,31,17]
}

data = pd.DataFrame(data)

data = data.melt(value_vars=['Method_1', 'Method_2', 'Method_3'], var_name='Method', value_name='Result')
display(data.sample(3))

Unnamed: 0,Method,Result
19,Method_2,77
1,Method_1,32
5,Method_1,44


In [14]:
pg.normality(data['Result'])

Unnamed: 0,W,pval,normal
Result,0.912326,0.01704,False


In [15]:
test = pg.kruskal(data, dv='Result', between='Method')
test['Null Hypothesis'] = test['p-unc'].map(_decision)
test

Unnamed: 0,Source,ddof1,H,p-unc,Null Hypothesis
Kruskal,Method,2,8.733601,0.012692,False
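
The H statistic can also be derived step by step from the pooled ranks. The sketch below is an illustrative hand calculation on the same data, using the textbook formula $H = \frac{12}{n(n+1)}\sum_j \frac{R_j^2}{n_j} - 3(n+1)$ with a tie correction from `scipy.stats.tiecorrect`; the variable names are assumptions for this example.

```python
import numpy as np
from scipy import stats

groups = [
    [81, 32, 42, 62, 37, 44, 38, 47, 49, 41],  # Method_1
    [48, 31, 25, 22, 30, 30, 32, 15, 40, 77],  # Method_2
    [18, 49, 33, 19, 24, 17, 48, 22, 31, 17],  # Method_3
]

values = np.concatenate(groups)
ranks = stats.rankdata(values)    # pooled ranks, ties averaged
n = len(values)

# Split the pooled ranks back into their groups and sum them
sizes = [len(g) for g in groups]
rank_sums = [r.sum() for r in np.split(ranks, np.cumsum(sizes)[:-1])]

# H statistic, then divide by the tie correction factor
h_stat = 12 / (n * (n + 1)) * sum(r**2 / m for r, m in zip(rank_sums, sizes)) - 3 * (n + 1)
h_stat /= stats.tiecorrect(ranks)

# H is approximately chi-square distributed with (groups - 1) degrees of freedom
p_value = stats.chi2.sf(h_stat, df=len(groups) - 1)
print(f'H: {h_stat:.4f}, p: {p_value:.4f}')
```

The rank sums make the direction of the effect visible: Method_1 has a clearly larger rank sum than the other two methods, which is consistent with the post-hoc comparison that follows.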


In [16]:
test = sp.posthoc_conover(data, val_col='Result', group_col='Method', p_adjust='bonf')
test

Unnamed: 0,Method_1,Method_2,Method_3
Method_1,1.0,0.065296,0.008453
Method_2,0.065296,1.0,1.0
Method_3,0.008453,1.0,1.0


<p style="background-image: linear-gradient(to right, #0aa98f, #68dab2)"> &nbsp; </p>

<p style="background-image: linear-gradient(#0aa98f, #ffffff 10%); font-weight:bold;"> 
 &nbsp; FRIEDMAN TEST </p>
    
|  |  |
|--|--|
| $H_0$ | → There is no difference between the rank distributions of the applied test results. |
| $H_1$ | → At least one of the applied tests has results that differ in rank distribution. |

In [17]:
data = {
    'Patient': ['Jehan', 'Georgina', 'Emel', 'Parveen'],
    'Test_0': [14,7,13,3],
    'Test_1': [78,87,24,17],
    'Test_2': [99,11,4,10],
    'Test_3': [55,17,14,20]
}

data = pd.DataFrame(data)
data_melt = data.melt(id_vars='Patient', value_vars=data.columns[1:],
                      var_name='Test', value_name='Result')
display(data)

Unnamed: 0,Patient,Test_0,Test_1,Test_2,Test_3
0,Jehan,14,78,99,55
1,Georgina,7,87,11,17
2,Emel,13,24,4,14
3,Parveen,3,17,10,20


In [18]:
pg.normality(data_melt, dv='Result', group='Test')

Unnamed: 0_level_0,W,pval,normal
Test,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
Test_0,0.906973,0.466514,True
Test_1,0.833805,0.177942,True
Test_2,0.691725,0.009263,False
Test_3,0.746922,0.036132,False


In [19]:
test = pg.friedman(data_melt, dv='Result', within='Test', subject='Patient')
test['Null Hypothesis'] = test['p-unc'].map(_decision)
test

Unnamed: 0,Source,W,ddof1,Q,p-unc,Null Hypothesis
Friedman,Test,0.575,3,6.9,0.075154,True
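
The `Q` and `W` values can be checked by ranking each patient's results across the four tests. This is a minimal sketch of the textbook formula $Q = \frac{12}{nk(k+1)}\sum_j R_j^2 - 3n(k+1)$ without a tie correction, which is sufficient here because no patient has tied results; the variable names are chosen for this example.

```python
import numpy as np
from scipy.stats import rankdata, chi2

# Rows are patients, columns are Test_0 ... Test_3
results = np.array([
    [14, 78, 99, 55],
    [7, 87, 11, 17],
    [13, 24, 4, 14],
    [3, 17, 10, 20],
])
n, k = results.shape

# Rank within each patient (row), then sum the ranks per test (column)
ranks = rankdata(results, axis=1)
rank_sums = ranks.sum(axis=0)

q_stat = 12 / (n * k * (k + 1)) * (rank_sums**2).sum() - 3 * n * (k + 1)
kendall_w = q_stat / (n * (k - 1))     # Kendall's W as an effect size
p_value = chi2.sf(q_stat, df=k - 1)

print(f'Q: {q_stat:.3f}, W: {kendall_w:.3f}, p: {p_value:.4f}')
```

With only four patients the chi-square approximation is rough, so the p-value should be read with some caution.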


<p style="background-image: linear-gradient(#f87674, #ffffff 10%);"> 
 &nbsp; Post-hoc / Conover - <b>UNNECESSARY IN THIS CASE</b> (the Friedman test did not reject $H_0$) </p>

In [20]:
test = sp.posthoc_conover_friedman(data.iloc[:,1:],  p_adjust='bonf')
# test = sp.posthoc_conover_friedman(data_melt, group_col='Test',  
#                             y_col='Result', block_col='Patient', 
#                             melted=True, p_adjust='bonf')
test

Unnamed: 0,Test_0,Test_1,Test_2,Test_3
Test_0,1.0,0.057705,1.0,0.188208
Test_1,0.057705,1.0,0.613575,1.0
Test_2,1.0,0.613575,1.0,1.0
Test_3,0.188208,1.0,1.0,1.0


<p style="background-image: linear-gradient(to right, #ee2965, #e31837)"> &nbsp; </p>

<p style="background-image: linear-gradient(to right, #0aa98f, #68dab2)"> &nbsp; </p>