# Bessel's Correction

Or, why computing the sample standard deviation requires dividing by $n-1$ rather than $n$.

## Imports 

In [2]:
%run ~/.jupyter/config.ipy
import numpy as np



## Theory

Let's say there is some normal distribution $\mathcal{N}(\mu, \sigma)$, with mean $\mu$ and standard deviation $\sigma$, whose parameters we don't know. We want to learn those parameters, so we draw $n$ samples $x_1, \dots, x_n$ from it.

Using those samples, we compute the sample mean,

$$\bar{\mu} = \frac{1}{n} \sum_{i=1}^{n} x_i$$

We then compute the unnormalized sample variance, i.e. the sum of squared deviations from $\bar{\mu}$ with no $1/n$ factor,

$$
\begin{align}
\bar{\sigma}^2  
&= \sum_{i=1}^{n} (x_i - \bar{\mu})^2 \\
&= \sum_{i=1}^{n} (x_i^2 - 2 x_i \bar{\mu} + \bar{\mu}^2) \\
&= \sum_{i=1}^{n} x_i^2 -  \sum_{i=1}^{n} 2 x_i \bar{\mu} +  \sum_{i=1}^{n} \bar{\mu}^2 \\
&= \sum_{i=1}^{n} x_i^2 - 2 n \bar{\mu}^2 +  n \bar{\mu}^2 \\
&= \sum_{i=1}^{n} x_i^2 - n \bar{\mu}^2
\end{align}
$$
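
To see where the $n-1$ comes from, one can take the expectation of the last line. The following is a sketch that assumes the $x_i$ are independent draws and that $\sigma$ denotes the standard deviation, so $\mathrm{Var}(x_i) = \sigma^2$ and $\mathrm{Var}(\bar{\mu}) = \sigma^2 / n$. It also uses the identity $\mathbb{E}[y^2] = \mathrm{Var}(y) + \mathbb{E}[y]^2$.

$$
\begin{align}
\mathbb{E}[\bar{\sigma}^2]
&= \sum_{i=1}^{n} \mathbb{E}[x_i^2] - n \, \mathbb{E}[\bar{\mu}^2] \\
&= n (\sigma^2 + \mu^2) - n \left( \frac{\sigma^2}{n} + \mu^2 \right) \\
&= (n - 1) \sigma^2
\end{align}
$$

Under these assumptions, dividing the sum of squared deviations by $n-1$ rather than $n$ gives an estimate whose expected value is $\sigma^2$.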


In [185]:
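mu = 5            # true mean
sigma = 2         # true standard deviation, so the true variance is sigma**2 = 4
n_samples = 10    # size of each sample
n_trials = 10000  # number of independent samples drawn

# Each row is one sample of n_samples draws from N(mu, sigma).
samples = np.random.normal(mu, sigma, size=(n_trials, n_samples))

# ddof=1 divides by n - 1; the average estimate should be close to sigma**2.
print(np.mean(np.var(samples, axis=1, ddof=1)))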

4.010035908935977
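
For comparison, here is a minimal sketch of the same simulation with both normalizations side by side. It reuses the parameter values from the simulation above. The seeded generator and the names `rng`, `sum_sq`, `mean_biased` and `mean_unbiased` are assumptions added for illustration, not part of the original notebook.

```python
import numpy as np

# Seed chosen arbitrarily so the run is reproducible (an assumption, not from the notebook).
rng = np.random.default_rng(0)

mu, sigma = 5, 2
n_samples, n_trials = 10, 10000

# Each row is one sample of n_samples draws from N(mu, sigma).
samples = rng.normal(mu, sigma, size=(n_trials, n_samples))

# Sum of squared deviations from each row's own sample mean.
row_means = samples.mean(axis=1, keepdims=True)
sum_sq = np.sum((samples - row_means) ** 2, axis=1)

mean_biased = np.mean(sum_sq / n_samples)          # divide by n
mean_unbiased = np.mean(sum_sq / (n_samples - 1))  # divide by n - 1

print("divide by n:    ", mean_biased)
print("divide by n - 1:", mean_unbiased)
print("sigma**2:       ", sigma**2)
```

Dividing by $n$ should average out near $\frac{n-1}{n}\sigma^2 = 3.6$ for these parameters, while dividing by $n-1$ should land near $\sigma^2 = 4$.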
