<img src="which_test.gif">

<img src="which_test.jpg"/>

# one-way ANOVA with repeated measures (paired)




In [1]:
from IPython.display import HTML
HTML('<iframe width="560" height="315" src="https://www.youtube.com/embed/VPB3xrsFl4o" frameborder="0" gesture="media" allow="encrypted-media" allowfullscreen></iframe>')

# one-way ANOVA (unpaired-equal variance)

First we look at how to measure the variance within a single group and how to compare the variances of two groups. Then we extend the idea to multiple groups.

##### Sample Variance for a single group

# $ s^2 = {\frac{\sum_{i=1}^N (x_i - \overline{x})^2}{N-1} } $
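
As a quick check of the formula above, here is a minimal sketch that computes the sample variance by hand and compares it with NumPy. The helper name `sample_variance` and the data values are made up for illustration.

```python
import numpy as np

def sample_variance(values):
    """Sample variance: sum of squared deviations from the mean, divided by N - 1."""
    x = np.asarray(values, dtype=float)
    n = x.size
    return np.sum((x - x.mean()) ** 2) / (n - 1)

# Illustrative data only (not taken from the notebook)
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
s2 = sample_variance(data)

# np.var divides by N by default; ddof=1 switches to the N - 1 denominator
s2_numpy = np.var(data, ddof=1)
print(s2, s2_numpy)
```

Both values should agree, since `ddof=1` tells NumPy to use the same $N - 1$ denominator.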


##### F-test for two groups

# $ F = \frac{S_X^2}{S_Y^2} $
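
The two-group ratio can be sketched in a few lines. This is only an illustration under the usual normality assumption: the helper name `variance_ratio_test` and the data are hypothetical, and the two-sided p-value is obtained by doubling the smaller tail of the F distribution.

```python
import numpy as np
from scipy import stats

def variance_ratio_test(x, y):
    """F = s_x^2 / s_y^2 with (len(x) - 1, len(y) - 1) degrees of freedom."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    f_stat = np.var(x, ddof=1) / np.var(y, ddof=1)
    df_x, df_y = x.size - 1, y.size - 1
    # Two-sided p-value: double the smaller tail probability, capped at 1
    upper_tail = stats.f.sf(f_stat, df_x, df_y)
    lower_tail = stats.f.cdf(f_stat, df_x, df_y)
    p_value = min(1.0, 2.0 * min(upper_tail, lower_tail))
    return f_stat, p_value

# Illustrative data only
x = [4.1, 5.3, 6.2, 5.8, 4.9, 6.6]
y = [5.0, 5.1, 4.9, 5.2, 5.0, 4.8]
f_stat, p_value = variance_ratio_test(x, y)
print(f_stat, p_value)
```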

##### For multiple groups!

# $ F = \frac{MSS_B}{MSS_W} $

where $ MSS_B = $ mean sum of squares between the groups and $ MSS_W = $ the mean sum of squares within the groups



# $ MSS_W = \frac {\sum_{g \in G} \sum_{i=1}^{n_g} (X_{gi} - \overline X_g)^2} {n - k} $

Thus, for each group $ g $ in the set of all groups $ G $, we subtract the group mean $\overline X_g$ from each value $X_{gi}$ in that group and square the result. We sum these squared deviations over all groups, then divide by the total number of values $n$ minus the number of groups $k$.



# $ MSS_B = \frac {\sum_{g \in G} n_g(\overline X_g - \overline X_G)^2} {k - 1} $

Thus, for each group $ g $, we multiply the number of values in the group $n_g$ by the squared difference between the group mean $\overline X_g$ and the mean of all values $\overline X_G$. We sum these terms over all groups and divide by the number of groups minus 1, $k - 1$.

Then we look up in an F table using:

# $ df_W = n - k $

# $ df_B = k - 1 $


So, our hypotheses are...

$ {{H_0}: {\mu_1} = {\mu_2} = {\mu_3}} $

$ {H_1} $: at least one group mean differs from the others

If $ F_{STAT} > F_{CRIT} $ we reject $ H_0 $
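
The whole procedure can be sketched by hand and checked against SciPy. The function name `one_way_anova`, the three small groups, and the choice of $\alpha = 0.05$ are assumptions made for this illustration. The critical value comes from `scipy.stats.f.ppf` instead of a printed F table.

```python
import numpy as np
from scipy import stats

def one_way_anova(*groups, alpha=0.05):
    """Return (F_STAT, F_CRIT, p-value) for a one-way ANOVA computed by hand."""
    groups = [np.asarray(g, dtype=float) for g in groups]
    k = len(groups)                      # number of groups
    n = sum(g.size for g in groups)      # total number of values
    grand_mean = np.concatenate(groups).mean()

    # Between-group and within-group sums of squares
    ss_between = sum(g.size * (g.mean() - grand_mean) ** 2 for g in groups)
    ss_within = sum(np.sum((g - g.mean()) ** 2) for g in groups)

    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    f_crit = stats.f.ppf(1 - alpha, df_between, df_within)
    p_value = stats.f.sf(f_stat, df_between, df_within)
    return f_stat, f_crit, p_value

# Illustrative data only
a = [1.0, 2.0, 3.0]
b = [2.0, 3.0, 4.0]
c = [5.0, 6.0, 7.0]
f_stat, f_crit, p_value = one_way_anova(a, b, c)
reference = stats.f_oneway(a, b, c)

print(f_stat, f_crit, p_value)
print("reject H0:", f_stat > f_crit)
```

For these values $MSS_B = 13$ and $MSS_W = 1$, so $F = 13$, which should match `reference.statistic`.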

In [1]:
from IPython.display import HTML
HTML('<iframe width="560" height="315" src="https://www.youtube.com/embed/WUjsSB7E-ko" frameborder="0" gesture="media" allow="encrypted-media" allowfullscreen></iframe>')

# one-way ANOVA in Scipy

##### scipy.stats.f_oneway(*args)

The one-way ANOVA tests the null hypothesis that two or more groups have the same population mean. The test is applied to samples from two or more groups, possibly with differing sizes.

Parameters:	
- sample1, sample2, ... : array_like
    - The sample measurements for each group.

Returns:	
- statistic : float
    - The computed F-value of the test.
- pvalue : float
    - The associated p-value from the F-distribution.

The ANOVA test has important assumptions that must be satisfied in order for the associated p-value to be valid.
1. The samples are independent.
2. Each sample is from a normally distributed population.
3. The population standard deviations of the groups are all equal. This property is known as homoscedasticity.

If these assumptions are not true for a given set of data, it may still be possible to use the Kruskal-Wallis H-test (scipy.stats.kruskal) although with some loss of power.
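
One possible way to probe assumptions 2 and 3 before running the ANOVA is sketched below. The choice of `scipy.stats.shapiro` (normality) and `scipy.stats.levene` (equal variances) is an assumption of this example, not something the text above prescribes, and the data are randomly generated for illustration.

```python
import numpy as np
from scipy import stats

# Illustrative data only: three normal samples with equal spread
rng = np.random.default_rng(0)
groups = [rng.normal(loc=mu, scale=1.0, size=20) for mu in (5.0, 5.5, 6.0)]

# Assumption 2: a small p-value suggests a sample is not normally distributed
shapiro_pvalues = []
for g in groups:
    _, p = stats.shapiro(g)
    shapiro_pvalues.append(p)

# Assumption 3: a small p-value suggests the group variances differ
_, levene_p = stats.levene(*groups)

print(shapiro_pvalues, levene_p)
```

If either check gives a small p-value, the Kruskal-Wallis H-test mentioned above is a reasonable fallback.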

In [4]:
import scipy.stats as stats

# One list of measurements per location (example data from the SciPy f_oneway documentation)
tillamook = [0.0571, 0.0813, 0.0831, 0.0976, 0.0817, 0.0859, 0.0735]
newport = [0.0873, 0.0662, 0.0672, 0.0819, 0.0749, 0.0649, 0.0835]
petersburg = [0.0974, 0.1352, 0.0817, 0.1016, 0.0968, 0.1064, 0.105]
magadan = [0.1033, 0.0915, 0.0781, 0.0685, 0.0677, 0.0697, 0.0764, 0.0689]
tvarminne = [0.0703, 0.1026, 0.0956, 0.0973, 0.1039, 0.1045]

# Each group is passed as a separate argument; group sizes may differ
stats.f_oneway(tillamook, newport, petersburg, magadan, tvarminne)

F_onewayResult(statistic=6.3888861848013461, pvalue=0.00076477673906445525)

# Kruskal-Wallis H test (non-para-unpaired-unequal variance)

Non-parametric version of ANOVA, for ordinal data (ratings etc.) or data that are not normally distributed.

$ H_0 $: The three probability distributions are the same

$ H_1 $: The three probability distributions are not the same

# $ H = \frac{12}{n(n+1)} \sum \frac{{R_i}^2}{n_i} - 3(n+1) $

$n$ = total number of values across all groups

$R_i$ = sum of the ranks in group $i$

$n_i$ = number of values in group $i$

##### Step 1
# $ \frac{12}{n(n+1)} $

##### Step 2
# $ \sum \frac{{R_i}^2}{n_i} $
Pool and rank all data. For each group, sum its ranks, square the sum, and divide by the number of values in that group. Then add up the results across all groups.

##### Step 3
$ - 3(n+1) $

##### Step 4
This yields $ H_{STAT} $.

The $df = k - 1$, where $k$ is the number of groups.

We then look up $ ChiSquare_{CRIT} $ in a chi-square table using this $df$.

If $ H_{STAT} > ChiSquare_{CRIT} $ we reject $ H_0 $
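
The four steps can be sketched by hand as follows. The function name `kruskal_wallis_h`, the data, and $\alpha = 0.05$ are assumptions for illustration. This sketch applies no correction for ties, so it should only match `scipy.stats.kruskal` when all values are distinct, as they are here.

```python
import numpy as np
from scipy import stats

def kruskal_wallis_h(*groups, alpha=0.05):
    """Return (H_STAT, chi-square critical value), without tie correction."""
    groups = [np.asarray(g, dtype=float) for g in groups]
    sizes = [g.size for g in groups]
    n = sum(sizes)                                   # total number of values

    # Pool and rank all data, then split the ranks back into their groups
    ranks = stats.rankdata(np.concatenate(groups))
    rank_groups = np.split(ranks, np.cumsum(sizes)[:-1])
    rank_sums = [r.sum() for r in rank_groups]       # R_i for each group

    step1 = 12.0 / (n * (n + 1))
    step2 = sum(r_i ** 2 / n_i for r_i, n_i in zip(rank_sums, sizes))
    step3 = 3.0 * (n + 1)
    h_stat = step1 * step2 - step3

    df = len(groups) - 1                             # number of groups minus 1
    chi2_crit = stats.chi2.ppf(1 - alpha, df)
    return h_stat, chi2_crit

# Illustrative data only, with no tied values
g1 = [27, 2, 4, 18, 7, 9]
g2 = [20, 8, 14, 36, 21, 22]
g3 = [34, 31, 3, 23, 30, 6]
h_stat, chi2_crit = kruskal_wallis_h(g1, g2, g3)
reference = stats.kruskal(g1, g2, g3)

print(h_stat, chi2_crit)
print("reject H0:", h_stat > chi2_crit)
```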

In [2]:
from IPython.display import HTML
HTML('<iframe width="560" height="315" src="https://www.youtube.com/embed/q1D4Di1KWLc" frameborder="0" gesture="media" allow="encrypted-media" allowfullscreen></iframe>')

# Kruskal-Wallis H test in Scipy

##### scipy.stats.kruskal(*args, **kwargs)

Compute the Kruskal-Wallis H-test for independent samples

The Kruskal-Wallis H-test tests the null hypothesis that the population medians of all of the groups are equal. It is a non-parametric version of ANOVA. The test works on 2 or more independent samples, which may have different sizes. Note that rejecting the null hypothesis does not indicate which of the groups differ. Post-hoc comparisons between groups are required to determine which groups are different.

- Parameters:	
    - sample1, sample2, ... : array_like
        - Two or more arrays with the sample measurements can be given as arguments.
    - nan_policy : {‘propagate’, ‘raise’, ‘omit’}, optional
        - Defines how to handle when input contains nan. ‘propagate’ returns nan, ‘raise’ throws an error, ‘omit’ performs the calculations ignoring nan values. Default is ‘propagate’.

- Returns:	
    - statistic : float
        - The Kruskal-Wallis H statistic, corrected for ties
    - pvalue : float
        - The p-value for the test using the assumption that H has a chi square distribution
        
Due to the assumption that H has a chi square distribution, the number of samples in each group must not be too small. A typical rule is that each sample must have at least 5 measurements.
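
A small sketch of how `nan_policy` changes the result. The data are made up for illustration, and each group keeps at least 5 valid measurements to respect the rule of thumb above.

```python
import numpy as np
from scipy import stats

# Illustrative data only; group a contains one missing value
a = [2.9, 3.0, 2.5, 2.6, 3.2, np.nan]
b = [3.8, 2.7, 4.0, 2.4, 3.1]
c = [2.8, 3.4, 3.7, 2.2, 2.0]

# Default nan_policy='propagate': the nan carries through to the result
propagated = stats.kruskal(a, b, c)

# nan_policy='omit': nan values are ignored in the calculation
omitted = stats.kruskal(a, b, c, nan_policy='omit')

# Dropping the nan by hand should give the same statistic as 'omit'
cleaned = stats.kruskal(a[:-1], b, c)

print(propagated.statistic, omitted.statistic, cleaned.statistic)
```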

In [5]:
x = [1, 1, 1]
y = [2, 2, 2]
z = [2, 2]

stats.kruskal(x, y, z)

KruskalResult(statistic=7.0, pvalue=0.030197383422318501)