## Permutation

The permutation test is a type of statistical significance test. It is a resampling technique, just like the [bootstrap](https://github.com/data-sandbox/stats-sandbox/blob/main/bootstrap.ipynb).

References:
- [ThinkStats2 by Allen Downey](https://github.com/AllenDowney/ThinkStats2)

In [1]:
from pathlib import Path
import numpy as np


In the [2002 NSFG Cycle 6](https://www.cdc.gov/nchs/nsfg/nsfg_cycle6.htm) data, the mean pregnancy length for first babies is slightly longer than for other babies. Let's use a permutation test to see whether the difference is statistically significant. To do so, we combine both groups into one large group, shuffle the data, then split it into two groups whose sizes match the originals. The test statistic is the absolute difference between the two group means.

**Null hypothesis:** There is no difference in pregnancy length for first babies versus other babies.

In [2]:
firstpreg = np.load(Path('firstpreg.npy'))
otherpreg = np.load(Path('otherpreg.npy'))

print(f'Array lengths: {len(firstpreg)} and {len(otherpreg)}')
print(f'First pregnancy mean length (weeks): {firstpreg.mean():.3f}')
print(f'Other pregnancy mean length (weeks): {otherpreg.mean():.3f}')
print(f'Difference in length: {abs(firstpreg.mean() - otherpreg.mean()):.3f}')


Array lengths: 4413 and 4735
First pregnancy mean length (weeks): 38.601
Other pregnancy mean length (weeks): 38.523
Difference in length: 0.078


In [3]:
def shuffle_groups(group1, group2, seed=False):
    """Combine groups, shuffle data, then return two new groups
    of the original sizes.

    Note: np.random.shuffle() shuffles in place, so
    np.random.permutation() is used here for clarity.
    """
    if seed:
        np.random.seed(42)
    combined = np.hstack((group1, group2))
    shuffled = np.random.permutation(combined)
    new1 = shuffled[:len(group1)]
    new2 = shuffled[len(group1):]
    return new1, new2
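
Note that calling `shuffle_groups` with `seed=True` resets NumPy's global random state to 42 on every call, so repeated seeded calls return the same shuffle. A minimal sketch of one alternative, assuming a hypothetical `shuffle_groups_rng` helper (not part of the original notebook) that takes a `numpy.random.Generator` seeded once by the caller:

```python
import numpy as np


def shuffle_groups_rng(group1, group2, rng):
    """Hypothetical variant of shuffle_groups that draws from a Generator."""
    combined = np.hstack((group1, group2))
    # Generator.permutation returns a shuffled copy; the inputs are untouched.
    shuffled = rng.permutation(combined)
    return shuffled[:len(group1)], shuffled[len(group1):]


# Illustrative values only, not the NSFG data.
a = np.array([39.0, 40.0, 38.0])
b = np.array([37.0, 41.0, 39.0, 38.0])

rng = np.random.default_rng(42)  # seed once, then reuse the generator
new_a, new_b = shuffle_groups_rng(a, b, rng)
print(len(new_a), len(new_b))  # group sizes are preserved: 3 4
```

Passing the generator in keeps a run reproducible while still producing a different shuffle on each call.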


In [4]:
group1, group2 = shuffle_groups(firstpreg, otherpreg)
test_stat = abs(group1.mean() - group2.mean())

print(f'Difference in means: {test_stat:.3f}')


Difference in means: 0.062


Next, let's wrap the shuffling in a function that builds the distribution of the test statistic under the null hypothesis and computes the p-value from it.

In [5]:
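def get_pvalue(group1, group2, actual_stat, iters=1000):
    """Compute the p-value as the fraction of shuffled test
    statistics that are at least as large as actual_stat.
    """
    # Distribution of the test statistic under the null hypothesis
    test_stat = []
    for _ in range(iters):
        new1, new2 = shuffle_groups(group1, group2)
        test_stat.append(abs(new1.mean() - new2.mean()))

    # Number of shuffles at least as extreme as the observed difference
    count = sum(1 for x in test_stat if x >= actual_stat)

    return count / iters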


In [6]:
actual_stat = abs(firstpreg.mean() - otherpreg.mean())
pvalue = get_pvalue(firstpreg, otherpreg, actual_stat)
print(f'p-value: {pvalue:.3f}')


p-value: 0.184
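
The estimator `count / iters` returns exactly 0 whenever no shuffle reaches the observed statistic. A commonly used alternative counts the observed arrangement as one of the permutations, giving `(count + 1) / (iters + 1)`. The sketch below illustrates that variant on made-up data; the function name `get_pvalue_plus_one` and its `rng` parameter are assumptions for illustration, not part of the original notebook.

```python
import numpy as np


def get_pvalue_plus_one(group1, group2, iters=1000, rng=None):
    """Hypothetical permutation p-value with the +1 correction."""
    rng = np.random.default_rng() if rng is None else rng
    combined = np.hstack((group1, group2))
    n1 = len(group1)
    actual = abs(np.mean(group1) - np.mean(group2))

    count = 0
    for _ in range(iters):
        shuffled = rng.permutation(combined)
        # Same test statistic as above: absolute difference in means.
        if abs(shuffled[:n1].mean() - shuffled[n1:].mean()) >= actual:
            count += 1

    # The observed arrangement counts as one permutation, so p is never 0.
    return (count + 1) / (iters + 1)


# Made-up, clearly separated groups (not the NSFG data).
low = np.zeros(20)
high = np.full(20, 10.0)
p = get_pvalue_plus_one(low, high, iters=200, rng=np.random.default_rng(0))
print(f'p-value: {p:.4f}')
```

With groups this far apart, essentially no shuffle matches the observed difference, so the result sits at the floor of `1 / (iters + 1)` rather than at 0.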


The p-value is well above the conventional 0.05 threshold, so we cannot reject the null hypothesis that there is no difference in pregnancy lengths. A difference as large as the observed 0.078 weeks arises in roughly 18% of random shuffles, so it is plausibly due to chance alone.

Interestingly, the permutation test here results in a much higher p-value than the ~0.06 obtained through a [bootstrap test](https://github.com/data-sandbox/stats-sandbox/blob/main/bootstrap.ipynb).
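
Because `get_pvalue` uses a finite number of random shuffles, the reported p-value is itself an estimate. As a rough check, and treating each shuffle as an independent draw, the count of shuffles that reach the observed statistic is binomial, which gives a simple standard error for the estimate. The variable names below are illustrative:

```python
import math

p_hat = 0.184   # p-value reported above
iters = 1000    # default number of shuffles in get_pvalue

# Binomial standard error of a proportion estimated from `iters` draws.
std_err = math.sqrt(p_hat * (1 - p_hat) / iters)
print(f'Monte Carlo standard error: {std_err:.3f}')  # 0.012
```

Run-to-run variation of a couple of hundredths is therefore expected with 1000 shuffles, which is small compared with the distance from 0.184 to the 0.05 threshold.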