The Van der Waerden test is a non-parametric test of the hypothesis that $k$ population distribution
functions are equal. Like the Kruskal-Wallis one-way analysis of variance test, it first converts the data to
ranks. It then goes one step further and converts the ranks to standard normal distribution quantiles, which are
called the 'normal scores'.

The benefit of Van der Waerden's test is that it offers efficiency comparable to ANOVA (analysis of variance) when
the samples are normally distributed, and robustness comparable to the Kruskal-Wallis test when they are not.

The null and alternative hypotheses of the Van der Waerden test can be stated generally as follows:

* $H_0$: All of the $k$ population distribution functions are equal
* $H_A$: At least one of the $k$ populations tends to yield larger observations than at least one of the other populations.

### Test Procedure

Let $n_j$ be the number of observations in the $j$-th of the $k$ groups, where $j = 1, 2, \cdots, k$.
$N$ is the total number of observations across all groups, while $X_{ij}$ is the $i$-th value of the $j$-th group.
The normal scores used in the Van der Waerden test are calculated as:

$$ A_{ij} = \Phi^{-1} \left( \frac{R \left( X_{ij} \right)}{N + 1} \right) $$

where $R(X_{ij})$ is the rank of observation $X_{ij}$ among all $N$ observations and $\Phi^{-1}$ is the normal
quantile function (percent point function). The average normal score of each group can then be calculated as:

$$ \bar{A}_j = \frac{1}{n_j} \sum^{n_j}_{i=1} A_{ij} \qquad j = 1, 2, \cdots, k $$
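The two formulas above can be sketched on a small example. The data and the names `values`, `labels`, and `group_means` are hypothetical and chosen only for illustration. Ranks are computed over the pooled sample, with ties assumed to receive average ranks.

```python
import numpy as np
from scipy.stats import norm, rankdata

# Hypothetical toy data: two groups of three observations each.
values = np.array([4.2, 5.1, 3.9, 6.0, 5.8, 6.3])
labels = np.array(["a", "a", "a", "b", "b", "b"])

N = values.size

# R(X_ij): ranks over the pooled sample (ties would get average ranks).
ranks = rankdata(values, method="average")

# A_ij: normal scores, i.e. standard normal quantiles of rank / (N + 1).
normal_scores = norm.ppf(ranks / (N + 1))

# Average normal score for each group.
group_means = {g: normal_scores[labels == g].mean() for g in np.unique(labels)}
```

Because group `a` holds the three smallest values, its average normal score is negative and mirrors that of group `b`.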

The variance $s^2$ of the normal scores is defined as:

$$ s^2 = \frac{1}{N - 1} \sum^k_{j=1} \sum^{n_j}_{i=1} A^2_{ij} $$

The Van der Waerden test statistic, $T_1$, is defined as:

$$ T_1 = \frac{1}{s^2} \sum^k_{j=1} n_j (\bar{A}_j)^2 $$

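Under the null hypothesis, $T_1$ approximately follows a chi-square distribution with $k - 1$ degrees of freedom,
so the critical region for a significance level $\alpha$ is:

$$ T_1 > \chi^2_{1 - \alpha, k-1} $$

where $\chi^2_{1 - \alpha, k-1}$ is the $1 - \alpha$ quantile of that distribution.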
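As a rough sketch of the whole procedure up to this point, the function below computes $T_1$ and its chi-square p-value for any number of groups. The function name `van_der_waerden` and the toy data are hypothetical, and the p-value relies on the chi-square approximation with $k - 1$ degrees of freedom.

```python
import numpy as np
from scipy.stats import chi2, norm, rankdata


def van_der_waerden(*groups):
    """Return (T1, p-value) for k samples; an illustrative sketch only."""
    groups = [np.asarray(g, dtype=float) for g in groups]
    k = len(groups)
    sizes = np.array([g.size for g in groups])
    N = sizes.sum()

    # Normal scores from ranks of the pooled sample.
    ranks = rankdata(np.concatenate(groups), method="average")
    scores = norm.ppf(ranks / (N + 1))

    # Variance of the normal scores (their mean is taken to be zero).
    s2 = np.sum(scores ** 2) / (N - 1)

    # Split the pooled scores back into groups and average them.
    group_scores = np.split(scores, np.cumsum(sizes)[:-1])
    means = np.array([g.mean() for g in group_scores])

    t1 = np.sum(sizes * means ** 2) / s2
    return t1, chi2.sf(t1, k - 1)


alpha = 0.05
t1, p = van_der_waerden([4.2, 5.1, 3.9], [6.0, 5.8, 6.3], [5.0, 5.5, 4.8])

# Critical value: the (1 - alpha) quantile of chi-square with k - 1 = 2 df.
critical = chi2.ppf(1 - alpha, 3 - 1)
reject = t1 > critical
```

Comparing `t1` with `critical` and comparing `p` with `alpha` are two views of the same decision, so they always agree.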

When the null hypothesis is rejected (the test statistic falls within the critical region), at least one of the
population distribution functions differs from the others, and a post-hoc multiple comparisons test can be
performed to get a better sense of which populations differ. Two populations, $j_1$ and $j_2$, tend to be
different if the following is true:

$$ | \bar{A}_{j_1} - \bar{A}_{j_2} | > s \, t_{1-\frac{\alpha}{2}} \sqrt{\frac{N-1-T_1}{N-k}} \sqrt{\frac{1}{n_{j_1}} + \frac{1}{n_{j_2}}} $$

where $s$ is the square root of $s^2$ and $t_{1-\frac{\alpha}{2}}$ is the $1 - \frac{\alpha}{2}$ quantile of the
$t$ distribution with $N - k$ degrees of freedom.
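The pairwise rule can be sketched as a small helper. The function name `posthoc_pairs` and the input values are hypothetical and serve only to show the mechanics. The $t$ quantile is assumed to use $N - k$ degrees of freedom.

```python
from itertools import combinations

import numpy as np
from scipy.stats import t


def posthoc_pairs(means, sizes, s2, t1, alpha=0.05):
    """Flag each pair of groups whose average normal scores differ."""
    means = np.asarray(means, dtype=float)
    sizes = np.asarray(sizes, dtype=float)
    N, k = sizes.sum(), sizes.size

    # Part of the critical value shared by every pair.
    scale = (np.sqrt(s2) * t.ppf(1 - alpha / 2, N - k)
             * np.sqrt((N - 1 - t1) / (N - k)))

    results = {}
    for a, b in combinations(range(k), 2):
        threshold = scale * np.sqrt(1 / sizes[a] + 1 / sizes[b])
        results[(a, b)] = bool(abs(means[a] - means[b]) > threshold)
    return results


# Hypothetical summary values for three groups of ten observations.
means = np.array([-0.06, -0.54, 0.60])
sizes = np.array([10, 10, 10])
s2 = 0.84
t1 = np.sum(sizes * means ** 2) / s2  # T_1 from its definition

pairs = posthoc_pairs(means, sizes, s2, t1)
```

With these inputs only the pair `(1, 2)` is flagged, since its difference of 1.14 is the only one above the critical value of roughly 0.75.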


### Van der Waerden's Test in Python

In [30]:
import numpy as np
import pandas as pd
from scipy.stats import rankdata, norm, chi2, t
import numpy_indexed as npi
from itertools import combinations

In [2]:
plants = pd.read_csv('../../data/PlantGrowth.csv')
plants = plants.to_numpy()
plants[:3]

In [8]:
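# Rank the weights over the pooled sample; tied values receive average ranks.
# The column is cast to float because `plants` is an object array.
ranks = rankdata(plants[:, 1].astype(float), method='average')
ranks = np.column_stack([plants, ranks])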

In [34]:
ranks[:10]

array([[1, 4.17, 'ctrl', 3.5],
       [2, 5.58, 'ctrl', 24.0],
       [3, 5.18, 'ctrl', 17.0],
       [4, 6.11, 'ctrl', 28.0],
       [5, 4.5, 'ctrl', 7.0],
       [6, 4.61, 'ctrl', 9.0],
       [7, 5.17, 'ctrl', 16.0],
       [8, 4.53, 'ctrl', 8.0],
       [9, 5.33, 'ctrl', 20.0],
       [10, 5.14, 'ctrl', 15.0]], dtype=object)

In [14]:
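n = plants.shape[0]                # total number of observations
k = len(np.unique(plants[:, 2]))   # number of groups, not the number of columns

# Normal scores: standard normal quantiles of rank / (n + 1)
aij = norm.ppf(ranks[:, 3].astype(float) / (n + 1))
plants_score = np.column_stack([plants, aij])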

In [33]:
plants_score[:3]

array([[1, 4.17, 'ctrl', -1.2112321309213452],
       [2, 5.58, 'ctrl', 0.7527287942581697],
       [3, 5.18, 'ctrl', 0.1215873827504829]], dtype=object)

In [18]:
avg_scores = npi.group_by(plants_score[:, 2], plants_score[:, 3], np.mean)

In [19]:
score_variance = np.sum(plants_score[:, 3] ** 2) / (n - 1)

In [24]:
average_scores = np.array([i for _, i in avg_scores])
group_obs = np.array([i[1] for i in npi.group_by(plants[:, 2], plants[:, 2], len)])
t1 = np.sum(group_obs * average_scores ** 2) / score_variance

p_value = chi2.sf(t1, k - 1)

In [25]:
print(t1)
print(p_value)

7.925272519897477
0.019012925151783353


### Post-Hoc Analysis

In [31]:
# Group sizes and average scores for every combination of two groups
size_pairs = np.array(list(combinations(group_obs, 2)))
score_pairs = np.array(list(combinations(average_scores, 2)))

sample_sizes = 1 / size_pairs[:, 0] + 1 / size_pairs[:, 1]
average_score_differences = np.abs(score_pairs[:, 0] - score_pairs[:, 1])

group_names = np.unique(plants[:, 2])

groups = pd.DataFrame(np.array(list(combinations(group_names, 2))))

groups['groups'] = groups[0] + ' - ' + groups[1]
# Absolute difference in average normal scores for each pair (rounded for display)
groups['score_difference'] = np.round(average_score_differences, 3)

# A pair differs when its score difference exceeds the critical value
groups['difference'] = average_score_differences > np.sqrt(score_variance) * \
                       t.ppf(1 - 0.05 / 2, n - k) * \
                       np.sqrt((n - 1 - t1) / (n - k)) * np.sqrt(sample_sizes)

del groups[0]
del groups[1]

In [32]:
groups

| | groups | score_difference | difference |
|---|---|---|---|
| 0 | ctrl - trt1 | 0.481 | False |
| 1 | ctrl - trt2 | 0.668 | False |
| 2 | trt1 - trt2 | 1.149 | True |


### References

Conover, W. J. (1999). Practical Nonparametric Statistics (Third ed.). Wiley.

Wikipedia contributors. "Van der Waerden test." Wikipedia, The Free Encyclopedia.
    Wikipedia, The Free Encyclopedia, 8 Feb. 2017. Web. 8 Mar. 2020.