# Law of Large Numbers

> According to the law, the average of the results obtained from a large number of trials should be close to the expected value and will tend to become closer to the expected value as more trials are performed.
> 
> -- <cite>Wikipedia</cite>

The *Law of Large Numbers* (LLN) is what makes statistical inference possible. It guarantees that, under certain conditions, the moments we calculate from samples converge asymptotically to the true moments. Of course, we ought to ask ourselves the following questions:

1. Under which conditions does the Law of Large Numbers hold?
2. When the Law of Large Numbers holds, how *fast* does the sample moment converge to the true one?

## Convergence of Moments of the Normal Distribution: An experimental approach

Let's develop a feeling for how moments converge, using the Normal distribution as an example:

In [43]:
from sympy.stats import Normal
from sympy.stats import sample
import plotly.graph_objects as go

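NUMBER_OF_SAMPLES = 1000
TRUE_MEAN = 5
TRUE_VARIANCE = 3

# sympy's Normal takes the standard deviation, not the variance
random_variable = Normal('X', TRUE_MEAN, TRUE_VARIANCE ** 0.5)

x_axis = list(range(NUMBER_OF_SAMPLES))
# Draw one independent sample per x-axis position
experiments = [float(sample(random_variable)) for _ in x_axis]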

# Plot
fig = go.Figure(data=go.Scatter(x=x_axis, y=experiments))
fig.show()

`experiments` contains samples of a Normally distributed random variable with mean `TRUE_MEAN` and variance `TRUE_VARIANCE`. To calculate the first moment, we can use the recursive mean formula:

In [44]:
from sympy import symbols

past_mean, index, current_sample = symbols("M_{n-1} n Xn")

recursive_mean = past_mean*(index-1)/index + current_sample/index
recursive_mean

M_{n-1}*(n - 1)/n + Xn/n
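
For reference, the recursive form follows from the definition of the sample mean. Assuming $M_n$ denotes the mean of the first $n$ samples $X_1, \dots, X_n$ (the same symbols as in the cell above), a short derivation is:

$$
M_n = \frac{1}{n}\sum_{i=1}^{n} X_i
    = \frac{1}{n}\left(\sum_{i=1}^{n-1} X_i + X_n\right)
    = \frac{n-1}{n}\,M_{n-1} + \frac{X_n}{n}
$$

The last step uses $\sum_{i=1}^{n-1} X_i = (n-1)\,M_{n-1}$.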

Note that what we are going to do isn't computationally efficient. That said, we want to see how `Mn` evolves as we increase the number of samples:

In [45]:
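# means[i] holds the mean of the first i + 1 samples
means = [experiments[0]]

for index in range(1, NUMBER_OF_SAMPLES):
    n = index + 1  # number of samples seen so far
    current_mean = (means[index-1] * (n-1)/n) + (experiments[index]/n)
    means.append(current_mean)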

# Plot
fig = go.Figure(data=go.Scatter(x=x_axis, y=means))
fig.show()
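
As a sanity check, the recursive formula should agree with the direct definition of the mean (cumulative sum divided by the sample count). The sketch below compares the two on stand-in data. It is an illustration rather than part of the notebook: it draws samples with the standard library's `random.gauss` instead of `sympy.stats`, and the names `samples`, `recursive_means`, and `direct_means` are hypothetical.

```python
import random
from itertools import accumulate

random.seed(0)  # fixed seed so the sketch is reproducible

# Stand-in data (assumption): 1000 draws with mean 5 and variance 3.
# random.gauss takes the standard deviation, hence the square root.
samples = [random.gauss(5, 3 ** 0.5) for _ in range(1000)]

# Recursive form: M_n = M_{n-1} * (n - 1) / n + X_n / n, with n starting at 1.
recursive_means = []
previous_mean = 0.0  # M_0 is multiplied by zero when n = 1, so its value is irrelevant
for n, x_n in enumerate(samples, start=1):
    previous_mean = previous_mean * (n - 1) / n + x_n / n
    recursive_means.append(previous_mean)

# Direct form: M_n = (X_1 + ... + X_n) / n, built from cumulative sums.
direct_means = [total / n for n, total in enumerate(accumulate(samples), start=1)]

largest_gap = max(abs(r - d) for r, d in zip(recursive_means, direct_means))
print(f"largest gap between the two forms: {largest_gap:.2e}")
```

Any gap between the two lists should come from floating-point rounding only.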

We can also plot the evolution of the relative error (%):


In [63]:
errors = [100*(mean-TRUE_MEAN)/TRUE_MEAN for mean in means]

# Plot
fig = go.Figure()
fig.add_trace(go.Scatter(x=x_axis, y=errors,
                    mode='lines',
                    name='Error relative to the true mean (%)'))
fig.show()
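
One way to read this plot is against the standard error of the sample mean, `sqrt(variance / n)`, which for independent samples with finite variance gives the typical size of the deviation after `n` samples. The sketch below expresses it as a percentage of the true mean. `TRUE_MEAN` and `TRUE_VARIANCE` are redeclared to keep the block self-contained, and `expected_relative_error_percent` is a hypothetical helper name.

```python
import math

TRUE_MEAN = 5
TRUE_VARIANCE = 3

def expected_relative_error_percent(n):
    """Standard error of the mean after n samples, as a percentage of TRUE_MEAN."""
    return 100 * math.sqrt(TRUE_VARIANCE / n) / TRUE_MEAN

for n in (10, 100, 1000):
    print(n, round(expected_relative_error_percent(n), 2))
```

Under these assumptions the figure is roughly 11% at 10 samples, 3.5% at 100, and 1.1% at 1000. Quadrupling the number of samples halves it.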

We can also see the evolution of each term of the recursive mean formula:

In [61]:
means = [experiments[0]]
left_terms = [0]                # M_0 * 0/1: nothing accumulated before the first sample
right_terms = [experiments[0]]  # X_1 / 1

for index in range(1, NUMBER_OF_SAMPLES):
    n = index + 1  # number of samples seen so far
    current_left_term = means[index-1] * (n-1)/n
    current_right_term = experiments[index]/n
    current_mean = current_left_term + current_right_term
    left_terms.append(current_left_term)
    right_terms.append(current_right_term)
    means.append(current_mean)

# Plot
fig = go.Figure()
fig.add_trace(go.Scatter(x=x_axis, y=left_terms,
                    mode='lines',
                    name='Left Term'))
fig.add_trace(go.Scatter(x=x_axis, y=right_terms,
                    mode='lines',
                    name='Right Term'))
fig.show()

From the recursive formula we can see that, as `n` gets bigger, `(n-1)/n` approaches `1` and, as long as `Xn` stays small compared to `n`, `Xn/n` approaches zero. The formula then reduces to `Mn ≈ Mn-1`: each new sample moves the mean less and less, and `Mn` settles down.
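
Rearranging the recursion gives `Mn - Mn-1 = (Xn - Mn-1)/n`, so each new sample moves the mean by a step scaled by `1/n`. The sketch below records those step sizes on stand-in data. It is an illustration only: it uses `random.gauss` from the standard library rather than `sympy.stats`, and the names `samples`, `steps`, `early`, and `late` are hypothetical.

```python
import random

random.seed(1)  # fixed seed so the sketch is reproducible

# Stand-in data (assumption): 1000 draws with mean 5 and variance 3.
samples = [random.gauss(5, 3 ** 0.5) for _ in range(1000)]

mean = 0.0
steps = []  # |M_n - M_{n-1}| for n = 1, 2, ...
for n, x_n in enumerate(samples, start=1):
    step = (x_n - mean) / n  # same update as M_{n-1}*(n-1)/n + X_n/n, rearranged
    mean += step
    steps.append(abs(step))

early = max(steps[1:10])   # largest step among n = 2..10
late = max(steps[-100:])   # largest step among the last 100 samples
print(f"largest early step: {early:.4f}, largest late step: {late:.4f}")
```

The late steps should come out much smaller than the early ones, which is the `Mn ≈ Mn-1` behaviour described above.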

An important result is that the convergence of the first moment, for any distribution, depends only on the asymptotic behaviour of the term `Xn/n`. If `Xn` grows as fast as `n` or faster, the sequence of means will not settle. Otherwise, it will converge. But since `X` is a random variable, we must ask ourselves the following question:

> What is the probability of `Xn` being larger than `n` as `n` gets larger?

Note that this is exactly the question a random variable's distribution answers: it is the tail probability `P(X > n)`, the integral of the PDF from `n` upward. For the Normal distribution, this probability decays to zero at least exponentially fast as `n` grows. Hence:

1. The sample mean converges to the true mean.
2. The sample mean converges to the true mean *very fast*.
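
To put numbers on the tail probability, the sketch below evaluates `P(X > n)` for the Normal distribution used in this notebook via the complementary error function. `TRUE_MEAN` and `TRUE_VARIANCE` are redeclared to keep the block self-contained, and `normal_tail_probability` is a hypothetical helper name, not something defined elsewhere in the notebook.

```python
import math

TRUE_MEAN = 5
TRUE_VARIANCE = 3

def normal_tail_probability(n, mean=TRUE_MEAN, variance=TRUE_VARIANCE):
    """P(X > n) for X ~ Normal(mean, variance), via the complementary error function."""
    z = (n - mean) / math.sqrt(variance)  # distance from the mean in standard deviations
    return 0.5 * math.erfc(z / math.sqrt(2))

for n in (5, 10, 15, 20):
    print(n, normal_tail_probability(n))
```

For these parameters the probability is 0.5 at `n = 5` and already below one in a million by `n = 15`.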

Naturally, this isn't a proof of the Law of Large Numbers. It says nothing about other moments or other distributions. But it gives us a good intuition of *how* and *why* the LLN works.
