<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#Resampling-Methods" data-toc-modified-id="Resampling-Methods-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>Resampling Methods</a></span><ul class="toc-item"><li><span><a href="#Bootstrapping" data-toc-modified-id="Bootstrapping-1.1"><span class="toc-item-num">1.1&nbsp;&nbsp;</span>Bootstrapping</a></span><ul class="toc-item"><li><span><a href="#Example" data-toc-modified-id="Example-1.1.1"><span class="toc-item-num">1.1.1&nbsp;&nbsp;</span>Example</a></span></li></ul></li><li><span><a href="#Jackknife" data-toc-modified-id="Jackknife-1.2"><span class="toc-item-num">1.2&nbsp;&nbsp;</span>Jackknife</a></span><ul class="toc-item"><li><span><a href="#Example" data-toc-modified-id="Example-1.2.1"><span class="toc-item-num">1.2.1&nbsp;&nbsp;</span>Example</a></span></li></ul></li><li><span><a href="#Permutation-Tests" data-toc-modified-id="Permutation-Tests-1.3"><span class="toc-item-num">1.3&nbsp;&nbsp;</span>Permutation Tests</a></span></li></ul></li></ul></div>

# Resampling Methods

Resampling methods repeatedly draw subsamples from a single observed sample

Goal: gauge the variability of a point estimate of a population parameter, for example through its standard error or a confidence interval


> https://young.physics.ucsc.edu/jackboot.pdf

In [None]:
import numpy as np
random_sample = np.random.normal(size=1000) + np.pi/3

## Bootstrapping

Draw random samples with replacement, each the same size as the original sample

We sample from our original sample rather than from the population (meta)

### Example

In [None]:
np.random.choice(random_sample, size=len(random_sample), replace=True)

In [None]:
# Alternative

# https://scikit-learn.org/stable/modules/generated/sklearn.utils.resample.html
from sklearn.utils import resample

resample(random_sample)
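
A single resample is not very informative on its own. The usual pattern is to repeat the draw many times and look at how the statistic varies. Below is a minimal sketch of a percentile bootstrap interval for the mean. The seed, the number of resamples (`n_boot`) and the 95% level are assumptions picked for illustration, and `data` is regenerated here so the cell runs on its own.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, only for reproducibility
data = rng.normal(size=1000) + np.pi / 3

n_boot = 2000  # number of bootstrap resamples (illustrative choice)

# Each row of idx indexes one resample drawn with replacement,
# with the same size as the original sample.
idx = rng.integers(0, data.size, size=(n_boot, data.size))
boot_means = data[idx].mean(axis=1)

# Spread of the resampled means estimates the standard error of the mean.
boot_se = boot_means.std(ddof=1)

# Percentile interval: middle 95% of the bootstrap distribution.
ci_low, ci_high = np.percentile(boot_means, [2.5, 97.5])

print(f"mean = {data.mean():.4f}, bootstrap SE = {boot_se:.4f}")
print(f"95% percentile interval: ({ci_low:.4f}, {ci_high:.4f})")
```

The percentile interval is only one of several bootstrap intervals. It is used here because it is the simplest to write down.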

## Jackknife

Builds samples by removing one observation (or a small group of observations) at a time

Deterministic: unlike the bootstrap, it gives the same resamples every time

### Example

In [None]:
# Note this has not been optimized
jackknife_resample = np.empty((0,random_sample.size-1), 'float64')

for i in range(random_sample.size):
    first_part = random_sample[0:i]
    second_part = random_sample[i+1:]
    # Make as one array
    removed = np.concatenate((first_part,second_part))
    jackknife_resample = np.append(jackknife_resample, [removed], axis=0)

In [None]:
jackknife_resample.shape

In [None]:
# Sum of the leave-one-out means (np.sum on a generator is deprecated)
total = jackknife_resample.mean(axis=1).sum()
total

In [None]:
avg_of_avgs = total / jackknife_resample.shape[0]
avg_of_avgs

In [None]:
diff = np.pi/3 - avg_of_avgs
diff
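
The leave-one-out means can also be used to estimate the standard error of the statistic. Below is a minimal sketch using the standard jackknife variance formula, $\widehat{\mathrm{SE}}^2 = \frac{n-1}{n}\sum_i (\hat\theta_{(i)} - \bar\theta_{(\cdot)})^2$. It assumes the statistic is the mean, which allows each leave-one-out mean to be computed directly from the total instead of building the full resample array. The seed and the name `data` are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, only for reproducibility
data = rng.normal(size=1000) + np.pi / 3
n = data.size

# Leave-one-out means: drop x_i from the total, divide by n - 1.
# This shortcut only works because the statistic is the mean.
loo_means = (data.sum() - data) / (n - 1)

# Average of the leave-one-out estimates.
jack_mean = loo_means.mean()

# Jackknife standard error of the mean.
jack_se = np.sqrt((n - 1) / n * np.sum((loo_means - jack_mean) ** 2))

print(f"jackknife mean = {jack_mean:.4f}, jackknife SE = {jack_se:.4f}")
```

For the mean, this reproduces the familiar `data.std(ddof=1) / np.sqrt(n)`. The jackknife becomes more useful for statistics that have no closed-form standard error.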

## Permutation Tests

Used instead of tests that assume a parametric distribution for the data

Essentially, calculate the p-value from the distribution of the statistic over all rearrangements of the group labels

<!---We calculate the means (or other statistic) from a random sample -->

1. Find the observed difference of means (between groups A & B)
2. Find the difference of means for every possible way of splitting the pooled samples
    - pool the two samples A & B
    - draw out samples of sizes n_A & n_B
3. The p-value is the proportion of these differences that are at least as extreme as the observed difference
    - what counts as "at least as extreme" depends on whether the test is 2-tailed or 1-tailed
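
The steps above can be sketched as follows. Enumerating every possible split is rarely feasible, so this sketch approximates it with random shuffles of the pooled data (a Monte Carlo permutation test). The two groups `a` and `b`, their sizes, the seed and `n_perm` are made-up values for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, only for reproducibility

# Hypothetical groups A and B (synthetic data for illustration)
a = rng.normal(loc=0.5, size=30)
b = rng.normal(loc=0.0, size=40)

# Step 1: observed difference of means
observed = a.mean() - b.mean()

# Step 2: pool, then repeatedly split into groups of sizes n_A and n_B
pooled = np.concatenate([a, b])
n_perm = 5000  # number of random splits (illustrative choice)
perm_diffs = np.empty(n_perm)
for k in range(n_perm):
    shuffled = rng.permutation(pooled)
    perm_diffs[k] = shuffled[:a.size].mean() - shuffled[a.size:].mean()

# Step 3: proportion of splits at least as extreme as the observed one
p_two_tailed = np.mean(np.abs(perm_diffs) >= abs(observed))
p_one_tailed = np.mean(perm_diffs >= observed)  # alternative: mean(A) > mean(B)

print(f"observed difference = {observed:.4f}")
print(f"two-tailed p = {p_two_tailed:.4f}, one-tailed p = {p_one_tailed:.4f}")
```

With only `n_perm` random splits, the p-value is itself an estimate, and more shuffles give a finer resolution.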