(primers-law_largeN)=
# Law of Large Numbers

This notebook introduces the law of large numbers: as the number of independent and identically distributed (iid) samples drawn from a random variable with a finite mean grows, the sample mean converges to the true mean.
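
As a minimal sketch of this idea, assuming a standard normal distribution (true mean 0) and an arbitrarily chosen seed, the running sample mean can be tracked as more samples are included. The names `rng`, `samples`, and `running_mean` are illustrative and not part of the notebook's own cells.

```python
import numpy as np

# Seed chosen arbitrarily so the sketch is reproducible (an assumption, not from the notebook)
rng = np.random.default_rng(seed=0)

# Draw iid samples from a standard normal distribution (true mean is 0)
samples = rng.standard_normal(10_000)

# running_mean[k] is the mean of the first k + 1 samples
running_mean = np.cumsum(samples) / np.arange(1, samples.size + 1)

# Report the sample mean after increasingly many samples
for n in (10, 100, 1_000, 10_000):
    print(f"n = {n:>6}: sample mean = {running_mean[n - 1]:+.4f}")
```

The printed sample means should drift toward 0 as `n` grows, although any single run can fluctuate along the way.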




1) This notebook starts by importing the packages listed in `requirements.txt`.

In [2]:
import matplotlib.pyplot as plt
import numpy as np
import scipy as sp
import scipy.stats  # load the stats submodule explicitly so that sp.stats is available
import ipywidgets as widgets
import pandas as pd

2) Next, a function is defined that takes a parameter `N` and plots a density-normalized histogram of `N` samples drawn from a standard normal distribution, with the theoretical probability density function (PDF) overlaid for comparison. The plot also reports the mean-squared error (MSE) between the sample mean and the true mean (0), estimated over 100 independent trials that each draw `N` samples and compute their sample mean. The vertical dashed line marks the sample mean of the `N` samples shown in the histogram.

In [3]:
def plot_PDF_of_noise_with_samples(N):
    # Draw N samples from a standard normal distribution and compute their mean
    noise = np.random.randn(N)
    sample_mean = noise.mean()

    # Theoretical standard normal PDF, plotted for comparison
    x = np.linspace(-4, 4, 1000)
    pdf = sp.stats.norm.pdf(x, loc=0, scale=1)

    # Estimate the MSE of the sample mean: each column is one trial of N samples.
    # The true mean is 0, so the squared error is the squared sample mean.
    num_trials = 100
    noise_batches = np.random.randn(N, num_trials)
    noise_batches_sample_means = noise_batches.mean(axis=0)
    mse = (noise_batches_sample_means**2).mean()

    fig, pdf_ax = plt.subplots(1, 1, figsize=(7, 3))
    pdf_ax.set_title(f'Probability Density Function ({N} samples)')
    pdf_ax.hist(noise, bins=100, density=True, zorder=2)
    pdf_ax.plot(x, pdf, label='Theoretical Standard Normal Distribution')

    # Mark the sample mean of the plotted samples
    pdf_ax.axvline(x=sample_mean, label='Sample mean',
                   color='k', linestyle='dashed', linewidth=2, zorder=3)
    pdf_ax.text(x=-3.5, y=0.65, s=f'MSE over {num_trials} trials: {mse:.4f}')
    pdf_ax.set_ylim(0, 1)
    pdf_ax.set_xlim(-4, 4)
    pdf_ax.grid(which='both')
    pdf_ax.set_ylabel('Probability density')
    pdf_ax.set_xlabel('Noise values')
    pdf_ax.legend()

    plt.show()

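3) Finally, an interactive widget is used to draw the above plot for any `N` from 100 to 5000 in steps of 10. As `N` increases, the MSE between the sample mean and the true mean decreases toward 0. For a standard normal distribution its expected value is 1/`N`, because the variance of the mean of `N` iid samples is the distribution's variance divided by `N`.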

In [None]:
N_slider = widgets.IntSlider(
    value=100,
    min=100,
    max=5000,
    step=10,
    description='Number of samples:', 
    style={'description_width': 'initial'},
    disabled=False,
    continuous_update=True,
    orientation='horizontal',
    readout=True,
    readout_format='d',
    layout=widgets.Layout(width="650px")
)

out = widgets.interact(plot_PDF_of_noise_with_samples,N=N_slider)

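
The trend shown by the slider can also be checked without the widget. The following is a minimal sketch, assuming standard normal samples as in the cells above. The helper `sample_mean_mse`, the seed, and the choice of 1000 trials are illustrative assumptions and not part of the notebook.

```python
import numpy as np

def sample_mean_mse(n, num_trials=1000, rng=None):
    """Estimate the MSE between the sample mean of n standard normal
    samples and the true mean (0), averaged over num_trials trials."""
    rng = np.random.default_rng() if rng is None else rng
    # Each column is one independent trial of n samples
    batches = rng.standard_normal((n, num_trials))
    sample_means = batches.mean(axis=0)
    # The true mean is 0, so the squared error is the squared sample mean
    return float((sample_means ** 2).mean())

# Seed chosen arbitrarily for reproducibility (an assumption)
rng = np.random.default_rng(seed=0)

# Compare the estimated MSE with the theoretical value 1/n
results = {n: sample_mean_mse(n, rng=rng) for n in (100, 1_000, 5_000)}
for n, mse in results.items():
    print(f"N = {n:>5}: estimated MSE = {mse:.6f}, 1/N = {1 / n:.6f}")
```

The estimated values should stay close to 1/`N`, with some scatter because the MSE itself is estimated from a finite number of trials.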