# Variance exercise

Write three functions that compute the variance of a data set. Compare their numerical behaviour on random samples with different means, variances, and numbers of entries.

For a description of the algorithms and their issues see:

* https://www.johndcook.com/blog/2008/09/28/theoretical-explanation-for-numerical-results/
* https://www.johndcook.com/blog/standard_deviation/
* https://jonisalonen.com/2013/deriving-welfords-method-for-computing-variance/

Definitions:

* Mean $\mu$, sample mean $\bar{x}$, standard deviation $\sigma$, sample standard deviation $s$.

## Method 1

This is based on the definition of the mean and variance:

$$\bar{x} = \frac{1}{N}\sum_{i=1}^N x_i$$

$$s^2 = \frac{1}{N-1} \sum_{i=1}^N (x_i-\bar{x})^2$$ 

## Method 2

This method is based on the identity $\sigma^2 = E((X-E(X))^2) = E(X^2) - (E(X))^2$.

$$ M = \sum_{i=1}^N x_i, \quad S = \sum_{i=1}^N x_i^2, \quad s^2 = \frac{1}{N(N-1)}(N S - M^2)$$
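
As a sketch of where this formula comes from (using only the definitions above), expand the square in the Method 1 sum and substitute $\bar{x} = M/N$:

$$\sum_{i=1}^N (x_i-\bar{x})^2 = \sum_{i=1}^N x_i^2 - 2\bar{x}\sum_{i=1}^N x_i + N\bar{x}^2 = S - \frac{2M^2}{N} + \frac{M^2}{N} = S - \frac{M^2}{N}$$

Dividing by $N-1$ gives $s^2 = \frac{1}{N-1}\left(S - \frac{M^2}{N}\right) = \frac{1}{N(N-1)}(N S - M^2)$, so Methods 1 and 2 agree in exact arithmetic.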

## Method 3

This method defines a recurrence that lets you compute the variance as the data arrive and update the result whenever new data are added.

Define $M_1=x_1$ and $S_1=0$, then for $k = 2, \dots, N$ compute

$$ M_k = M_{k-1} + (x_k-M_{k-1})/k$$
$$ S_k = S_{k-1} + (x_k-M_{k-1})(x_k-M_k)$$

After $k \ge 2$ data points, the variance is $s^2 = S_k/(k-1)$.
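
As an illustrative check of the recurrence, take the made-up data $x = (2, 4, 4, 6)$ (chosen here for easy arithmetic, not taken from the references above):

$$M_1 = 2,\quad S_1 = 0$$
$$M_2 = 2 + \frac{4-2}{2} = 3,\quad S_2 = 0 + (4-2)(4-3) = 2$$
$$M_3 = 3 + \frac{4-3}{3} = \frac{10}{3},\quad S_3 = 2 + (4-3)\left(4-\frac{10}{3}\right) = \frac{8}{3}$$
$$M_4 = \frac{10}{3} + \frac{6-10/3}{4} = 4,\quad S_4 = \frac{8}{3} + \left(6-\frac{10}{3}\right)(6-4) = 8$$

This gives $s^2 = S_4/3 = 8/3$, the same value Method 1 produces: $\bar{x} = 4$ and the squared deviations sum to $4+0+0+4 = 8$.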


One of these methods requires two passes through the data: the first to compute the mean, the second to compute the variance.

The other two require only one pass. This is an advantage if you need to estimate the variance as the data arrive, so that you never hold the whole dataset, or if the dataset is too large to load all at once.
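
To make the "data arrive one at a time" idea concrete, here is a minimal sketch of a streaming accumulator built on the Method 3 recurrence. The type `RunningVariance` and the functions `update!` and `variance` are hypothetical names introduced for illustration; starting from an empty state with $k=0$ is an assumption that reproduces $M_1 = x_1$, $S_1 = 0$ after the first value.

```julia
# Hypothetical accumulator type; the names are illustrative only.
mutable struct RunningVariance
    k::Int        # number of values seen so far
    M::Float64    # running mean M_k
    S::Float64    # running sum of squared deviations S_k
end

# Empty state: after the first update this yields M_1 = x_1 and S_1 = 0.
RunningVariance() = RunningVariance(0, 0.0, 0.0)

# Fold one new value into the running state.
function update!(acc::RunningVariance, x)
    acc.k += 1
    δ = x - acc.M                # x_k - M_{k-1}
    acc.M += δ / acc.k           # M_k = M_{k-1} + (x_k - M_{k-1})/k
    acc.S += δ * (x - acc.M)     # S_k = S_{k-1} + (x_k - M_{k-1})(x_k - M_k)
    acc
end

# Sample variance of everything seen so far (meaningful once k >= 2).
variance(acc::RunningVariance) = acc.S / (acc.k - 1)

# Feed values one at a time; the full data set is never stored.
acc = RunningVariance()
for value in (2.0, 4.0, 4.0, 6.0)
    update!(acc, value)
end
println("n = ", acc.k, ", mean = ", acc.M, ", variance = ", variance(acc))
```

Only three numbers are kept between updates, so memory use does not grow with the amount of data.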

It turns out that one of the one-pass methods is numerically much worse than the other two methods. It's your job to find out which, and why.


## Generate random samples from a Normal distribution

In [7]:
random_normal(N, μ, σ) = randn(N) .* σ .+ μ

random_normal (generic function with 1 method)

In [10]:
random_normal(5, 5, 10)

5-element Vector{Float64}:
 -10.219048365860367
  27.64094749349729
  13.615100733792989
  23.400130254018784
  -6.445464600772288

## Method 1

In [12]:
function var1(x)
    N = length(x)
    xbar = sum(x)/N              # first pass: sample mean
    sum((x .- xbar).^2)/(N-1)    # second pass: sum of squared deviations
end

var1 (generic function with 1 method)

## Method 2

In [13]:
function var2(x)
    N = length(x)
    M = sum(x)         # sum of the data
    S = sum(x .^ 2)    # sum of the squared data
    (N*S-M^2)/(N*(N-1))
end

var2 (generic function with 1 method)

## Method 3

In [18]:
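function var3(x)
    N = length(x)
    M1 = float(x[1])    # running mean, M_1 = x_1
    S1 = zero(M1)       # running sum of squared deviations, S_1 = 0 (same type as M1)
    for i in 2:N
        M2 = M1 + (x[i]-M1)/i
        S2 = S1 + (x[i]-M1)*(x[i]-M2)
        M1, S1 = M2, S2
    end
    S1 / (N-1)
end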

var3 (generic function with 1 method)

## Test them all with the same data

In [21]:
x1 = random_normal(100, 6, 1);
var1(x1), var2(x1), var3(x1)

(1.0562894231618603, 1.0562894231618696, 1.05628942316186)

In [22]:
x1 = random_normal(10000, 1000, 1);
var1(x1), var2(x1), var3(x1)

(1.0052605061341233, 1.0052605062068707, 1.005260506134081)

In [25]:
x1 = random_normal(1_000_000, 1e9, 1.0);
var1(x1), var2(x1), var3(x1)

(1.0010776085237365, -140.7376290929571, 1.0010775918191157)
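
One possible way to quantify the differences is to measure a function's relative error against a higher-precision reference. The sketch below is not part of the original exercise: `reference_variance` and `relative_error` are hypothetical helper names, the reference is assumed to be the Method 1 formula evaluated in `BigFloat`, and `var` from the `Statistics` standard library stands in for the function under test so that the block runs on its own.

```julia
using Statistics  # standard library; provides var

# Reference value: the two-pass formula evaluated in BigFloat arithmetic.
function reference_variance(x)
    xb = big.(x)                     # each Float64 converts to BigFloat exactly
    xbar = sum(xb) / length(xb)
    sum((xb .- xbar) .^ 2) / (length(xb) - 1)
end

# Relative error of f(x) with respect to the high-precision reference.
function relative_error(f, x)
    ref = reference_variance(x)
    Float64(abs(f(x) - ref) / ref)
end

# Sweep the mean while holding σ = 1 and N = 10_000 fixed.
for μ in (1.0, 1e3, 1e6, 1e9)
    sample = randn(10_000) .+ μ
    println("μ = ", μ, "  relative error = ", relative_error(var, sample))
end
```

Passing `var1`, `var2`, or `var3` in place of `var`, and varying the standard deviation and sample size as well as the mean, would give the comparison the exercise asks for.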

## Extensions

Compute skewness and kurtosis using online (one-pass) algorithms.