#### Problem
This question is to help you understand the idea of a **sampling distribution**.
Let $X_1,\,\dots,\,X_n$ be IID with mean $\mu$ and variance $\sigma^2$.
Let $\overline{X}_n = n^{-1} \sum_{i=1}^n X_i$.
Then $\overline{X}_n$ is a **statistic**, that is, a function of the data.
Since $\overline{X}_n$ is a random variable, it has a distribution.
This distribution is called the *sampling distribution of the statistic*.
Recall that $\mathbb{E} \big( \overline{X}_n \big) = \mu$
and $\mathbb{V} \big( \overline{X}_n \big) = \frac{\sigma^2}{n}$.
Don't confuse the distribution of the data $f_X$ and the distribution of the statistic $f_{\overline{X}_n}$.

To make this clear, let $X_1,\,\dots,\,X_n \sim \text{Uniform}(0,1)$.
Plot $f_X$.
Now let $\overline{X}_n = n^{-1} \sum_{i=1}^n X_i$.
Find $\mathbb{E} \big( \overline{X}_n \big)$ and $\mathbb{V} \big( \overline{X}_n \big)$.
Plot them as a function of $n$.
Interpret.
Now simulate the distribution of $\overline{X}_n$ for $n=1,\,5,\,25,\,100$.
Check that the simulated values of $\mathbb{E} \big( \overline{X}_n \big)$
and $\mathbb{V} \big( \overline{X}_n \big)$ agree with your theoretical calculations.
What do you notice about the sampling distribution of $\overline{X}_n$ as $n$ increases?

#### Solution
For $\text{Uniform}(0,1)$ data, $\mu = \frac{1}{2}$ and $\sigma^2 = \frac{1}{12}$, so
$\mathbb{E} \big( \overline{X}_n \big) = \mu = \frac{1}{2}$
and $\mathbb{V} \big( \overline{X}_n \big) = \frac{\sigma^2}{n} = \frac{1}{12n}$.
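
The problem also asks for plots of $f_X$ and of the mean and variance of $\overline{X}_n$ as functions of $n$. The following is a minimal sketch of one way to draw them; the helper names `theoretical_mean`, `theoretical_variance` and `plot_theory`, and the range $n = 1,\,\dots,\,100$, are illustrative choices rather than part of the original solution.

```python
import numpy as np
import matplotlib.pyplot as plt


def theoretical_mean(n):
    """Mean of the sample mean of n Uniform(0, 1) draws: 1/2 for every n."""
    return np.full_like(np.asarray(n, dtype=float), 0.5)


def theoretical_variance(n):
    """Variance of the sample mean of n Uniform(0, 1) draws: 1 / (12 n)."""
    return 1.0 / (12.0 * np.asarray(n, dtype=float))


def plot_theory(ns):
    """Plot f_X, then the theoretical mean and variance of the sample mean against n."""
    # Density of a single observation: 1 on [0, 1] and 0 elsewhere.
    x = np.linspace(-0.5, 1.5, 401)
    f_X = np.where((x >= 0) & (x <= 1), 1.0, 0.0)

    fig, axes = plt.subplots(1, 3, figsize=(12, 3.5))
    axes[0].plot(x, f_X)
    axes[0].set(title="Density of Uniform(0, 1)", xlabel="x", ylabel="f_X(x)")
    axes[1].plot(ns, theoretical_mean(ns))
    axes[1].set(title="Mean of the sample mean", xlabel="n", ylim=(0, 1))
    axes[2].plot(ns, theoretical_variance(ns))
    axes[2].set(title="Variance of the sample mean", xlabel="n")
    fig.tight_layout()
    return fig


fig = plot_theory(np.arange(1, 101))
```

In a notebook the figure renders inline; in a script, `plt.show()` would be needed afterwards. Interpretation: the mean stays at $\frac{1}{2}$ for every $n$, so $\overline{X}_n$ is unbiased for $\mu$, while the variance decays like $\frac{1}{n}$, so the sampling distribution concentrates around $\frac{1}{2}$ as $n$ grows.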

In [1]:
import numpy as np
import matplotlib.pyplot as plt

from scipy.stats import uniform

In [2]:
def sample_mean(n):
    """Produce a SINGLE realisation of the sample mean."""
    X = uniform.rvs(size=n)
    return np.mean(X)

def sample_mean_trials(n, trials):
    """Produce MANY realisations of the sample mean."""
    return [sample_mean(n) for _ in range(trials)]

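def result_string(n, trials, Xn):
    """Compare the simulated mean and variance of the sample mean with theory."""
    mean = np.mean(Xn)
    variance = np.var(Xn)  # np.var defaults to ddof=0 (divides by the number of trials)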
    return (
        f"We performed {trials} trials with n = {n} for each trial:\n"
        f"The numerical mean is       {mean:.5f}\n"
        f"The theoretical mean is     {0.5:.5f}\n"
        f"The numerical variance is   {variance:.5f}\n"
        f"The theoretical variance is {1/(12*n):.5f}"
    )

In [3]:
n = 5
trials = int(1e4)
Xn = sample_mean_trials(n, trials)
print(result_string(n, trials, Xn))

We performed 10000 trials with n = 5 for each trial:
The numerical mean is       0.49863
The theoretical mean is     0.50000
The numerical variance is   0.01682
The theoretical variance is 0.01667


In [4]:
n = 25
trials = int(1e4)
Xn = sample_mean_trials(n, trials)
print(result_string(n, trials, Xn))

We performed 10000 trials with n = 25 for each trial:
The numerical mean is       0.50157
The theoretical mean is     0.50000
The numerical variance is   0.00333
The theoretical variance is 0.00333


In [5]:
n = 100
trials = int(1e4)
Xn = sample_mean_trials(n, trials)
print(result_string(n, trials, Xn))

We performed 10000 trials with n = 100 for each trial:
The numerical mean is       0.49980
The theoretical mean is     0.50000
The numerical variance is   0.00084
The theoretical variance is 0.00083
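
The runs above cover $n = 5,\,25,\,100$; the problem also asks for $n = 1$ and for a comment on the shape of the sampling distribution. The sketch below is one possible way to cover all four values in a single loop. It swaps in NumPy's `default_rng` generator with a vectorised draw in place of the `scipy.stats.uniform` helpers used earlier; the function name `simulate_sample_means`, the seed, and the "within 1.96 standard deviations" check are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)  # seed chosen arbitrarily, for reproducibility
trials = 10_000


def simulate_sample_means(n, trials, rng):
    """Draw `trials` independent realisations of the mean of n Uniform(0, 1) draws."""
    # Each row is one sample of size n; averaging along axis 1 gives one sample mean per row.
    return rng.uniform(0.0, 1.0, size=(trials, n)).mean(axis=1)


summary = {}
for n in (1, 5, 25, 100):
    means = simulate_sample_means(n, trials, rng)
    sd_theory = np.sqrt(1.0 / (12.0 * n))
    # Fraction of sample means within 1.96 theoretical standard deviations of 1/2.
    # A Normal distribution would give about 0.95.
    within = np.mean(np.abs(means - 0.5) <= 1.96 * sd_theory)
    summary[n] = (means.mean(), means.var(), within)
    print(
        f"n = {n:3d}: mean = {means.mean():.5f}, variance = {means.var():.5f}, "
        f"theory = {1 / (12 * n):.5f}, within 1.96 sd = {within:.3f}"
    )
```

For $n = 1$ the sample mean is just a single uniform draw, so every value lies within $1.96\,\sigma \approx 0.566$ of $\frac{1}{2}$ and the fraction is exactly 1. As $n$ increases the fraction should approach about 0.95, which is what the central limit theorem predicts: the sampling distribution of $\overline{X}_n$ stays centred at $\frac{1}{2}$, narrows at rate $\frac{1}{\sqrt{n}}$ in standard deviation, and looks increasingly Normal. A histogram of `means` for each $n$, for example with `plt.hist(means, bins=50, density=True)`, makes the same point visually.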
