# Why Non-Parametric Tests?
- Some experiments yield response measurements that defy exact quantification.
    - e.g. it is impossible to make statements such as “teacher A is twice as good as teacher B.”
- There is insufficient knowledge to judge whether the population(s) satisfy the assumptions required by parametric tests (e.g. normality).

In [3]:
import pandas as pd
import numpy as np
import pingouin as pg
from scipy import stats
# Aliased so that the custom sign_test defined below does not shadow it
from statsmodels.stats.descriptivestats import sign_test as sm_sign_test
from pydataset import data

In [2]:
df = data('iris')

# Paired Sign Test 
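- Let $D_i = X_i - Y_i$ be the paired differences and let $M$ be the number of positive differences. Under $H_0$, $M$ follows a binomial distribution with $n$ trials (the number of pairs) and success probability $0.5$.
- Test statistic: $M$ = number of pairs with $D_i > 0$
- $H_0$: $p = 0.5$, where $p = P(D_i > 0)$
- $H_a$: $p \neq 0.5$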
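
The p-values implied by these hypotheses can be written out explicitly. This is a sketch under one common convention (doubling the smaller tail for the two-sided case); other two-sided definitions exist. The symbols $B$, $n$ and $p_0$ are introduced here for illustration: $n$ is the number of pairs, $p_0 = 0.5$ is the null value of $p$, and $B \sim \text{Binomial}(n, p_0)$.

$$
\begin{aligned}
p_{\text{greater}} &= P(B \ge M) = \sum_{k=M}^{n} \binom{n}{k} p_0^{k} (1 - p_0)^{n-k} \\
p_{\text{less}} &= P(B \le M) = \sum_{k=0}^{M} \binom{n}{k} p_0^{k} (1 - p_0)^{n-k} \\
p_{\text{two-sided}} &= \min\{1,\; 2 \min(p_{\text{greater}}, p_{\text{less}})\}
\end{aligned}
$$

Note that both one-sided tails include the observed value $M$ itself.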

In [30]:
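def sign_test(x, y, tail='two-sided', p0=0.5):
    """Paired sign test; returns (M, p-value)."""
    x, y = np.asarray(x), np.asarray(y)
    n = len(x)  # ties (x == y) stay in n and count as non-positive
    M = (x > y).sum()  # number of positive differences
    left_p = stats.binom.cdf(M, n, p0)      # P(B <= M)
    right_p = stats.binom.sf(M - 1, n, p0)  # P(B >= M), includes the observed M
    if tail == 'greater':
        return M, right_p
    if tail == 'less':
        return M, left_p
    # two-sided: double the smaller tail, capped at 1
    return M, min(1.0, 2 * min(left_p, right_p))

In [32]:
# The two species are independent samples; rows are paired by position
# purely to illustrate the mechanics of the test.
M, pval = sign_test(
    x = df.loc[df['Species']=='setosa','Sepal.Width'].values,
    y = df.loc[df['Species']=='versicolor','Sepal.Width'].values,
    tail='two-sided',
    p0=0.5
)
print("M:", M, "p-value:", f"{pval:.3e}")

M: 43 p-value: 2.099e-07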
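
As a cross-check, the same test can be sketched with `scipy.stats.binomtest` (available in SciPy 1.7 and later), here with the conventional step of discarding tied pairs before counting. The helper name `sign_test_drop_ties` and the toy data are hypothetical and not part of the notebook above; the sketch assumes at least one non-tied pair remains.

```python
import numpy as np
from scipy import stats

def sign_test_drop_ties(x, y, alternative="two-sided"):
    """Paired sign test that discards tied pairs (hypothetical helper).

    Returns (number of positive differences, number of non-tied pairs, p-value).
    """
    d = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    d = d[d != 0]                 # discard ties: they carry no sign information
    n_pos = int((d > 0).sum())    # positive differences among non-tied pairs
    n = int(d.size)               # effective sample size after dropping ties
    res = stats.binomtest(n_pos, n=n, p=0.5, alternative=alternative)
    return n_pos, n, res.pvalue

# Toy paired data (made up for illustration): one tie, six positives, one negative
before = [5, 7, 6, 8, 9, 4, 6, 7]
after = [4, 5, 6, 6, 7, 5, 5, 6]
n_pos, n, pval = sign_test_drop_ties(before, after)
print(f"positives: {n_pos}, non-tied pairs: {n}, p-value: {pval:.3f}")
```

For the toy data the two-sided p-value is $2 \cdot P(B \ge 6) = 2 \cdot 8/128 = 0.125$ with $n = 7$. Dropping ties shrinks $n$, so results can differ from an implementation that keeps tied pairs in the count.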


# Wilcoxon Signed-Rank Test

