# Question 2: Standard Error of the Mean

In this question, we will use resampling to examine how broadly the *means* of samples from the same population are distributed.

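First, load the blood pressure data as before, but this time from the `bloodpressure2.txt` file. The cells below expect the AA and GG measurements to be stored in variables named `aa_bp` and `gg_bp`.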

In [None]:
import numpy

# YOUR ANSWER HERE

print(len(aa_bp), len(gg_bp))
print(numpy.mean(aa_bp), numpy.mean(gg_bp))

%matplotlib inline
import matplotlib.pyplot as plt
plt.style.use('ggplot')
plt.rcParams['figure.figsize'] = [12, 4]

gg_mean = numpy.mean(gg_bp)
aa_mean = numpy.mean(aa_bp)
hist_data = plt.hist(gg_bp, bins='auto', density=True, label='GG')
plt.axvline(gg_mean, color='orange', label='GG mean')
bin_edges = hist_data[1]
hist_data = plt.hist(aa_bp, bins=bin_edges, density=True, alpha=0.5, label='AA')
plt.axvline(aa_mean, color='blue', label='AA mean')
legend = plt.legend()

## Question 2.1
Write a function called `resample_means` that repeatedly resamples a given dataset and returns a list containing the mean of each resample.

Then generate means from 10000 resamples of the GG dataset and plot a histogram of those means, overlaid transparently on top of a histogram of the original GG dataset. (Use `bins='auto'` and `density=True` for both histograms.)

In [None]:
def resample_means(data, n_resamples=10000):
    # YOUR ANSWER HERE

gg_means = resample_means(gg_bp)
# YOUR ANSWER HERE
print(numpy.mean(gg_bp), numpy.mean(gg_means))
print(numpy.std(gg_bp), numpy.std(gg_means))

In [None]:
assert len(gg_means) == 10000
assert 1.20 < numpy.std(gg_means) < 1.26

While the GG blood pressure data has a pretty wide spread (standard deviation of 17.7), the means of repeated resamples are very narrowly clustered around the mean of the GG data. 

As we discussed in class, the *standard deviation* of a statistic (such as the mean) upon repeated sampling is called the "standard error" of that statistic. So here, the standard error of the mean is ±1.23 or so.
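To make the definition concrete, here is a minimal sketch of estimating a standard error by resampling. It uses synthetic, normally distributed numbers rather than the real blood pressure data, and the names `bootstrap_means` and `fake_bp` are made up for this illustration. It is one possible way to do it, not the only one.

```python
import numpy

def bootstrap_means(data, n_resamples=10000, seed=0):
    """Return the mean of each of n_resamples bootstrap resamples of data."""
    rng = numpy.random.default_rng(seed)
    data = numpy.asarray(data)
    means = []
    for _ in range(n_resamples):
        # Draw len(data) points from data, with replacement.
        resample = rng.choice(data, size=len(data), replace=True)
        means.append(resample.mean())
    return means

# Synthetic stand-in for a blood pressure dataset (NOT the real GG data).
rng = numpy.random.default_rng(42)
fake_bp = rng.normal(loc=120, scale=18, size=200)

fake_means = bootstrap_means(fake_bp)
# The standard error of the mean is the standard deviation of the resampled means.
standard_error = numpy.std(fake_means)
print(numpy.std(fake_bp), standard_error)
```

The first printed number is the spread of the individual measurements and the second is the spread of the resampled means, which should be much smaller.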

## Question 2.2
Repeat the above for the AA dataset:

In [None]:
# YOUR ANSWER HERE
print(numpy.mean(aa_bp), numpy.mean(aa_means))
print(numpy.std(aa_bp), numpy.std(aa_means))

In [None]:
assert len(aa_means) == 10000
assert 2.10 < numpy.std(aa_means) < 2.21

While the standard deviation of the AA dataset is slightly smaller than that of the GG dataset, the standard error is almost double in size! What's going on?

One hint is that the AA dataset is much smaller (56 measurements, versus 205 for the GG data). Maybe there's a relationship between sample size and how accurately the mean can be measured...

## Question 2.3
To examine this relationship, we will need to pretend that we have datasets of different sizes. So we are going to depart briefly from the usual resampling strategy of drawing exactly as many points as the original dataset contains. Write a variant of `resample_means`, called `resample_means_of_size`, that takes a `sample_size` parameter and draws that many points for each resample. Then calculate the standard error of the mean of the GG data for each of the specified sample sizes, appending each result to `standard_errors`. Use the default 10000 resamples. The code below will plot this relationship.

In [None]:
def resample_means_of_size(data, sample_size, n_resamples=10000):
    # YOUR ANSWER HERE

sample_sizes = [10, 20, 40, 80, 160, 320]
standard_errors = []
# YOUR ANSWER HERE
plt.plot(sample_sizes, standard_errors)
plt.xlabel('sample size')
plt.ylabel('standard error')
print(standard_errors)

## Question 2.4
So, what is the relationship between the standard error and the sample size? Specifically, to cut the error in half (from 4 to 2, say, or from 2 to 1), by what factor do you have to increase the sample size? What about to decrease the error 4-fold (from 4 to 1)?

YOUR ANSWER HERE

A little thought about the above answer shows that the sample size is inversely proportional to the square of the standard error of the mean. For compactness, we'll call the sample size 'n' and the standard error of the mean 'SEM'. Thus:

$\textit{n} \propto 1/\textit{SEM}^2$ (where $\propto$ means "is proportional to").

Rearranging, we have $\textit{SEM} \propto \frac{1}{\sqrt{\textit{n}}}$. Or equivalently, 
$\textit{SEM} = \frac{p}{\sqrt{\textit{n}}}$
for some constant of proportionality $p$.

In other words, $p = \textit{SEM} \cdot \sqrt{\textit{n}}$.
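As a quick worked example of this formula, the snippet below plugs in the approximate figures already quoted for the full GG dataset (a standard error of about 1.23 from 205 measurements). The variable names are illustrative only, and 1.23 is a rounded value, so the result is approximate.

```python
import math

sem_gg = 1.23  # approximate standard error of the GG mean, as quoted above
n_gg = 205     # number of GG measurements

# p = SEM * sqrt(n)
p_gg = sem_gg * math.sqrt(n_gg)
print(round(p_gg, 1))  # → 17.6
```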

Let's calculate this constant, for each of the SEM / sample-size pairs we calculated above. That is, make a list `ps` containing the value of $\textit{SEM} \cdot \sqrt{\textit{n}}$ for each sample size and standard error that you calculated above. (Use a for loop...) And recall that `x**2` squares something in Python, while `x**0.5` takes the square root...

In [None]:
ps = []
# YOUR ANSWER HERE
print(ps)

In [None]:
assert len(ps) == len(sample_sizes)
for p in ps:
    assert 16.5 < p < 18.5

If everything went right, you should have gotten basically the same constant for each pair. What does the constant of proportionality we got mean?

Specifically, the numbers should be right around 17.7... what property of the GG data is right around that number?

YOUR ANSWER HERE

So, for the mean, we get $\textit{standard error} = \frac{\textit{standard deviation}}{\sqrt{\textit{n}}}$... probably some of you remember this from statistics class.
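One way to check this formula numerically is sketched below. It uses synthetic data again, not the blood pressure measurements, and every name in it is made up for this illustration. For a few sample sizes it compares the resampled standard error with the standard deviation divided by $\sqrt{n}$. All resamples for a given size are drawn in one call by asking `choice` for a two-dimensional array.

```python
import numpy

rng = numpy.random.default_rng(0)
# Synthetic data standing in for a real dataset.
data = rng.normal(loc=120, scale=18, size=200)
sigma = numpy.std(data)

n_resamples = 10000
results = []
for n in [10, 40, 160]:
    # Draw all resamples at once: one row per resample, n columns.
    resamples = rng.choice(data, size=(n_resamples, n), replace=True)
    # Standard error observed by resampling: spread of the row means.
    observed = numpy.std(resamples.mean(axis=1))
    # Standard error predicted by the formula.
    predicted = sigma / n**0.5
    results.append((n, observed, predicted))
    print(n, round(observed, 2), round(predicted, 2))
```

The observed and predicted columns should agree closely at every sample size, differing only by random resampling noise.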