# Example usage of the Central Limit Theorem 
Link to the book example: https://cnx.org/contents/tWu56V64@35.8:Mjy3YF-Z@20/7-2-Using-the-Central-Limit-Theorem#eip-id1170498321883

>The law of large numbers says that if you take samples of larger and larger size from any population, then the mean of the sampling distribution, $\mu_{\bar{x}}$, tends to get closer and closer to the true population mean, $\mu$. From the Central Limit Theorem, we know that as n gets larger and larger, the sample means follow a normal distribution. The larger n gets, the smaller the standard deviation of the sampling distribution gets.
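
The law of large numbers in the first sentence of the quote can be checked with a quick standalone simulation. This is a minimal sketch that assumes, like the notebook, a population of integers from 0 to 100 that are all equally likely, so the true mean is $\mu = 50$; the seed and sample sizes are arbitrary choices.

```python
import random
import statistics

# Assumed population: integers 0..100, each equally likely, so the true mean mu is 50.
random.seed(0)  # fixed seed so the run is reproducible

means_by_n = {}
for n in (10, 1_000, 100_000):
    sample = [random.randint(0, 100) for _ in range(n)]
    means_by_n[n] = statistics.fmean(sample)
    # As n grows, the sample mean should settle ever closer to mu = 50.
    print(f"n={n:>7}: sample mean = {means_by_n[n]:.2f}")
```

With 100,000 draws the sample mean typically lands within a few tenths of 50, while 10 draws can miss by several units.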

1. For sufficiently large samples, the sampling distribution of the sample mean is approximately normal, regardless of the shape of the underlying population distribution.
2. The standard deviation of the sampling distribution (the standard error) decreases as the size of the samples used to calculate the means increases.
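
Point 2 can be made precise: for independent draws, the standard deviation of the sample mean is $\sigma/\sqrt{n}$, so quadrupling $n$ halves it. The sketch below checks this by simulation. It assumes the same population as the notebook, integers from 0 to 100 that are all equally likely; for that discrete uniform distribution $\sigma = \sqrt{(101^2 - 1)/12} = \sqrt{850} \approx 29.15$. The helper `sample_mean_sd` is hypothetical, and it samples with replacement via `random.choices` rather than from the notebook's `df`.

```python
import math
import random
import statistics

random.seed(1)

# Population: integers 0..100, equally likely (a discrete uniform over 101 values).
# Its standard deviation is sqrt((101**2 - 1) / 12) = sqrt(850), about 29.15.
SIGMA = math.sqrt((101**2 - 1) / 12)

def sample_mean_sd(n, reps=2000):
    """Empirical standard deviation of `reps` sample means, each computed from n draws."""
    means = [statistics.fmean(random.choices(range(101), k=n)) for _ in range(reps)]
    return statistics.stdev(means)

results = {}
for n in (10, 40, 160):
    observed = sample_mean_sd(n)
    predicted = SIGMA / math.sqrt(n)
    results[n] = (observed, predicted)
    print(f"n={n:>3}: observed SD = {observed:.3f}, sigma/sqrt(n) = {predicted:.3f}")
```

Each step multiplies $n$ by 4, so the observed standard deviation should roughly halve from one row to the next.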

If you scroll to the bottom, you will see that the sample means form an approximately normal distribution (bottom chart), even though every individual number is equally likely to occur (top chart).
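
The normal shape of the sample means can also be checked without a chart. For a normal distribution, about 68.3% of values fall within one standard deviation of the mean and about 95.4% within two. This sketch draws 5000 samples of size 30 (both numbers are arbitrary choices) from the same assumed uniform 0 to 100 population and counts how many sample means land within one and two standard errors of 50.

```python
import math
import random
import statistics

random.seed(2)

MU = 50
SIGMA = math.sqrt((101**2 - 1) / 12)  # SD of a uniform integer on 0..100
N = 30        # draws per sample (an assumed, moderate sample size)
REPS = 5000   # number of sample means to generate

means = [statistics.fmean(random.choices(range(101), k=N)) for _ in range(REPS)]
se = SIGMA / math.sqrt(N)  # standard error predicted by the CLT

within_1 = sum(abs(m - MU) <= se for m in means) / REPS
within_2 = sum(abs(m - MU) <= 2 * se for m in means) / REPS
print(f"within 1 SE: {within_1:.3f} (normal: 0.683)")
print(f"within 2 SE: {within_2:.3f} (normal: 0.954)")
```

If the two fractions come out close to 0.683 and 0.954, the sample means behave like a normal distribution even though the individual values are flat.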

In [None]:
import altair as alt
import pandas as pd
import random
import ipywidgets as widgets
from ipywidgets import interact

alt.themes.enable('opaque')

In [None]:
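# Population: one million integers drawn uniformly at random from 0 to 100 (randint includes both ends)
data = [random.randint(0, 100) for _ in range(1_000_000)]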

In [None]:
df = pd.DataFrame(data=data, columns=['values'])

In [None]:
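# Count how often each value occurs, sorted by value. rename_axis/reset_index(name=...)
# yield the columns 'index' (the value) and 'values' (its count) on both pandas 1.x and 2.x.
df_done = df['values'].value_counts().sort_index().rename_axis('index').reset_index(name='values')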

In [None]:
df_done.head()

In [None]:
alt.Chart(df_done).mark_bar().encode(
    x='index:O',
    y='values:Q').properties(title="Counts are roughly flat because every integer from 0 to 100 is equally likely")

Here is what a single random sample drawn from this population looks like; use the slider to change the sample size.

In [None]:
def sample_it(num_samples):
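    # Draw one random sample (without replacement) and count each value, sorted by value
    the_data = (df['values'].sample(n=num_samples)
                .value_counts().sort_index()
                .rename_axis('index').reset_index(name='values'))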
    the_chart = alt.Chart(the_data).mark_bar().encode(
                    x='index:O',
                    y='values:Q'
                    )
    return the_chart

interact(sample_it, num_samples=widgets.IntSlider(value=300, min=10, max=500000));
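
The slider shows that small samples look ragged while large samples look much flatter. One rough way to quantify this is to measure how far the most over- or under-represented value strays from its expected count $n/101$. This is a standalone sketch: the helper `max_relative_deviation` is hypothetical, and values are drawn with replacement from 0 to 100 rather than from `df`.

```python
import random
from collections import Counter

random.seed(3)

def max_relative_deviation(n):
    """Largest gap between any value's count and its expected count n/101, as a fraction of n/101."""
    counts = Counter(random.choices(range(101), k=n))
    expected = n / 101
    # Counter returns 0 for values that never appeared, so missing values are included.
    return max(abs(counts[v] - expected) / expected for v in range(101))

deviations = {n: max_relative_deviation(n) for n in (300, 10_000, 200_000)}
for n, dev in deviations.items():
    print(f"n={n:>7}: max relative deviation = {dev:.2f}")
```

The deviation shrinks roughly like $1/\sqrt{n/101}$, which is why the bar chart only looks flat once the sample is large.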

Below, the means of many samples are aggregated and plotted; as the Central Limit Theorem predicts, they form an approximately normal distribution.

_Note_: This cell draws many samples, so it can take about 30 seconds to run.

In [None]:
def sample_it(rounding, samples_within_sample, total_samples):
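    # Mean of each of `total_samples` random samples, each of size `samples_within_sample`
    means = pd.Series([df['values'].sample(n=samples_within_sample).mean() for _ in range(total_samples)])
    # Round the means so nearby values share a bar, then count each rounded mean, sorted by value
    the_data = (means.round(rounding)
                .value_counts().sort_index()
                .rename_axis('index').reset_index(name='values'))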
    bar_chart = alt.Chart(the_data).mark_bar().encode(x='index:O',y='values:Q')
    line_chart = alt.Chart(the_data).mark_line(interpolate="basis", color='orange').encode(x='index:O', y='values:Q')
    return bar_chart+line_chart

interact(sample_it,
         rounding=widgets.Dropdown(value=0, options=[0,1,2]),
         samples_within_sample=widgets.IntSlider(value=300, min=10, max=500000, continuous_update=False),
         total_samples=widgets.IntSlider(value=1539, min=10, max=5000, continuous_update=False)
         );
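
Most of the runtime in the cell above comes from calling `df.sample` once per sample inside a Python loop. A possible speed-up, assuming NumPy is available (pandas already depends on it), is to draw every sample at once as a 2-D array and average along each row. The function `vectorized_sample_means` is a hypothetical helper, and it draws values with replacement from 0 to 100 directly instead of sampling without replacement from the finite `df`; with a population of one million rows the difference should be negligible at these sample sizes.

```python
import numpy as np

def vectorized_sample_means(samples_within_sample, total_samples, seed=0):
    """Means of `total_samples` samples, each containing `samples_within_sample` values."""
    rng = np.random.default_rng(seed)
    # One row per sample; integers(0, 101) excludes the upper bound, so values span 0..100.
    # Assumption: draws come from the uniform 0..100 distribution, not from the notebook's df.
    draws = rng.integers(0, 101, size=(total_samples, samples_within_sample))
    return draws.mean(axis=1)

means = vectorized_sample_means(samples_within_sample=300, total_samples=1539)
print(means.shape, round(float(means.mean()), 2))
```

The array holds `total_samples × samples_within_sample` integers, so at the sliders' maximums (5000 × 500000) it would not fit in memory and would need to be built in chunks. The result can be wrapped in `pd.Series` and fed to the same `.round(rounding).value_counts()` step used for the chart.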