
**Definition - Unbiased sample mean estimator**

The mean of a random variable \\(X\\) can be estimated from \\(n\\) realizations \\(x_{i}=X(\omega_{i})\\), where each \\(\omega_{i}\\) is an outcome in the sample space, \\(\omega_{i} \in \Omega \\), by the sample mean

$$\overline x = \frac{1}{n} \sum_\{i=1}^n x_{i}$$

**Definition - Bias of sample mean estimator**

The bias of the sample mean estimator is defined as the expected value of the estimator minus the true mean \\(\mu_x\\)

$$Bias_\{\overline x} =  E[\overline x] - \mu_x$$

Using linearity of expectation and \\(E[x_{i}]=\mu_{x}\\) for every sample, the expected value of the sample mean estimator is

$$E[\overline x]=E[\frac{1}{n} \sum_\{i=1}^n x_{i}]=\frac{1}{n} \sum_\{i=1}^n E[x_{i}]=\frac{1}{n} \sum_\{i=1}^n \mu_{x}=\frac{1}{n} n \mu_{x}=\mu_{x}$$

Since the expected value of the sample mean estimator equals the true mean, the bias is zero and the estimator is unbiased.
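
The result can be checked numerically. The sketch below is only an illustration, assuming i.i.d. normal samples; the values of `mu`, `sigma`, `n`, `trials` and the seed are arbitrary choices and do not come from the text.

```python
import numpy as np

# Assumed illustration values: true mean, standard deviation,
# samples per realization, and number of repeated realizations.
mu, sigma, n, trials = 2.0, 3.0, 10, 100_000

rng = np.random.default_rng(0)  # fixed seed for reproducibility

# Each row is one realization of n i.i.d. samples.
samples = rng.normal(mu, sigma, size=(trials, n))
sample_means = samples.mean(axis=1)

# Averaging the sample means approximates E[x_bar]; subtracting mu gives
# an empirical estimate of the bias, which should be close to zero.
empirical_bias = sample_means.mean() - mu
print(f"empirical bias of the sample mean: {empirical_bias:.4f}")
```

The printed value is not exactly zero because only a finite number of realizations is averaged, but it shrinks as `trials` grows.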

**Definition - Unbiased sample variance estimator**

The variance of a random variable \\(X\\) can be estimated from \\(n\\) realizations \\(x_{i}=X(\omega_{i})\\), where each \\(\omega_{i}\\) is an outcome in the sample space, \\(\omega_{i} \in \Omega \\), by the sample variance

$$\overline \sigma_{x}^2 = \frac{1}{n-1} \sum_\{i=1}^n (x_{i}-\overline x)^2$$

**Definition - Bias of sample variance estimator**

The bias of the sample variance estimator is defined as the expected value of the estimator minus the true variance \\(\sigma_{x}^2\\)

$$Bias_\{\overline \sigma_{x}^2} =  E[\overline \sigma_{x}^2] - \sigma_{x}^2$$

The expected value of the sample variance estimator is given as 

$$E[\overline \sigma_{x}^2]=E[\frac{1}{n-1} \sum_\{i=1}^n (x_{i}-\overline x)^2]=\frac{1}{n-1} \sum_\{i=1}^n E[x_{i}^2-2 \overline x x_{i}+ \overline x ^2]$$

For independent and identically distributed samples, \\(E[x_{i}^2]=\sigma_{x}^2+\mu_{x}^2\\), \\(E[x_{i} \overline x]=\frac{\sigma_{x}^2}{n}+\mu_x^2\\) and \\(E[\overline x ^2]=\frac{\sigma_{x}^2}{n}+\mu_x^2\\), so this can be rewritten as

$$E[\overline \sigma_{x}^2]=\frac{1}{n-1} \sum_\{i=1}^n (\sigma_{x}^2+\mu_{x}^2-\frac{2\sigma_{x}^2}{n}-2\mu_x^2+\frac{\sigma_{x}^2}{n}+\mu_x^2)=\frac{1}{n-1} \sum_\{i=1}^n (\sigma_{x}^2-\frac{\sigma_{x}^2}{n})=\frac{n}{n-1} \sigma_{x}^2\frac{n-1}{n}=\sigma_{x}^2$$

Since the expected value of the sample variance estimator equals the true variance, the bias is zero and the estimator is unbiased.
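
The three moments used in the derivation above rely on the samples being independent and identically distributed with mean \\(\mu_x\\) and variance \\(\sigma_{x}^2\\). As a short sketch of the intermediate steps: the first follows from the definition of variance, the second from splitting the sum into the term \\(j=i\\) and the \\(n-1\\) independent terms \\(j \neq i\\), and the third from \\(Var[\overline x]=\frac{\sigma_{x}^2}{n}\\).

$$E[x_{i}^2]=Var[x_{i}]+E[x_{i}]^2=\sigma_{x}^2+\mu_{x}^2$$

$$E[x_{i} \overline x]=\frac{1}{n} \sum_\{j=1}^n E[x_{i} x_{j}]=\frac{1}{n}(\sigma_{x}^2+\mu_{x}^2+(n-1)\mu_{x}^2)=\frac{\sigma_{x}^2}{n}+\mu_{x}^2$$

$$E[\overline x ^2]=Var[\overline x]+E[\overline x]^2=\frac{\sigma_{x}^2}{n}+\mu_{x}^2$$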

**Biased sample variance estimator**

The biased sample variance estimator is defined as 

$$\overline \sigma_{x}^2 = \frac{1}{n} \sum_\{i=1}^n (x_{i}-\overline x)^2$$

Its expected value is

$$E[\overline \sigma_{x}^2]=\sigma_{x}^2\frac{n-1}{n}$$

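The bias is therefore \\(-\frac{\sigma_{x}^2}{n}\\), so the estimator underestimates the variance on average. The bias vanishes and the expected value converges towards \\(\sigma_{x}^2\\) as the number of samples \\(n\\) grows.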
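
The factor \\(\frac{n-1}{n}\\) can also be seen on a single sample: for any fixed data, the biased estimate is exactly \\(\frac{n-1}{n}\\) times the unbiased one. The sketch below illustrates this with NumPy's `np.var`, whose `ddof` argument sets the divisor to `n - ddof`. The sample and the seed are arbitrary illustration values.

```python
import numpy as np

rng = np.random.default_rng(1)  # fixed seed, arbitrary choice
x = rng.normal(0.0, 1.0, size=5)  # small sample so the factor is noticeable
n = len(x)

biased = np.var(x, ddof=0)    # divides by n
unbiased = np.var(x, ddof=1)  # divides by n - 1

# Both estimates share the same sum of squared deviations,
# so their ratio depends only on n.
ratio = biased / unbiased
print(ratio, (n - 1) / n)
```

With only five samples the biased estimate is about 20 % smaller than the unbiased one. With a few hundred samples the difference is well under 1 %.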

In [0]:
import numpy as np
import matplotlib.pyplot as plt

In [0]:
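def variance_estimator(x, biased=False):
    """Estimate the variance of the samples in x (divisor n if biased, else n-1)."""
    x = np.asarray(x)
    n = len(x)
    mean = np.mean(x)
    # Sum of squared deviations from the sample mean
    squared_deviations = np.sum((x - mean)**2)
    if biased:
        return squared_deviations / n
    return squared_deviations / (n - 1)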

In [0]:
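num_samples = [10 * i for i in range(1, 20)]  # sample sizes 10, 20, ..., 190
var_estimates_biased = []
var_estimates_unbiased = []
test = 10000  # number of repeated experiments per sample size
mu, sigma = 0, 1  # true mean and standard deviation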

for num_sample in num_samples:

    var_estimate_biased = 0
    var_estimate_unbiased = 0
    for i in range(test):
        x = np.random.normal(mu, sigma, num_sample)
        var_estimate_biased += variance_estimator(x, biased=True)
        var_estimate_unbiased += variance_estimator(x)

    var_estimates_biased += [var_estimate_biased/test]
    var_estimates_unbiased += [var_estimate_unbiased/test]

plt.figure()
true_var = sigma**2  # the estimators target the variance, not the standard deviation
plt.plot(num_samples, np.abs(true_var-np.array(var_estimates_biased)), label="Biased variance estimates mean error")
plt.plot(num_samples, np.abs(true_var-np.array(var_estimates_unbiased)), label="Unbiased variance estimates mean error")
plt.legend()
plt.xlabel("Number of samples")
plt.ylabel("Absolute error of mean variance estimate")
plt.title("Variance estimation")
plt.show()