In [None]:
#######################
# standard code block #
#######################

# see https://ipython.readthedocs.io/en/stable/interactive/magics.html
%pylab inline

%config InlineBackend.figure_formats = ['retina']

## Central Limit Theorem

- How it works: 
    - Draw a sample of k values from a distribution; it can be almost any kind of distribution (it needs a finite mean and variance), and we'll call it **D**
        - Call this sample $k_{1}$
        - Take the mean of the sample; now you have an estimate of the underlying distribution's mean
        - $E(k_{1})$ = one sample mean of **D**
    - Take a second random sample, again of size k, from the same distribution **D**; compute that sample's mean 
        - Let's call this second sample $k_{2}$
        - $E(k_{2})$ = a second sample mean of **D**
    - Now you have two estimates of the underlying distribution's mean (each computed from a sample of size k, and both samples come from distribution **D**)
        - We have $E(k_{1})$ and $E(k_{2})$ from **D**
    - Keep taking samples (up to n) of size k from distribution **D**
        - e.g. $E(k_{1})$, $E(k_{2})$, $E(k_{3})$, $E(k_{4})$
        - Call the number of samples n, so we have n = 4 now
    - Now you have n sample means, each computed from k elements, and each one is an estimate of distribution **D**'s mean
    - CLT says that the distribution of these sample means is approximately normal when the sample size k is large, and it approaches an exact normal in the limit as k grows; taking more samples (larger n) only gives a clearer picture of that distribution
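
The steps above can be sketched as a small helper. This is only an illustration: the function name `draw_sample_means`, the use of `np.random.default_rng`, the seed, and the sample size k = 25 are assumptions made for the sketch, not part of the notebook.

```python
import numpy as np

def draw_sample_means(draw, n, k):
    """Return n sample means, each computed from k values produced by `draw`.

    `draw` is a hypothetical callable that takes a shape tuple and returns
    an array of random values from the distribution D.
    """
    samples = draw((n, k))        # one row per sample, k values per row
    return samples.mean(axis=1)   # one mean per sample

rng = np.random.default_rng(0)    # seed chosen arbitrarily
means = draw_sample_means(lambda size: rng.uniform(1, 10, size), n=4, k=25)
print(means)                      # E(k_1), E(k_2), E(k_3), E(k_4)
```

Passing a different `draw` callable swaps in a different distribution **D** without touching the rest of the code.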

## Example drawing from a uniform distribution

In [None]:
uniform_draws = np.random.uniform(1, 10, (10, 4))
# draw 10 samples from a uniform distribution on [1, 10),
# each of size 4 (one sample per row)

uniform_draws

In [None]:
sample_means = np.mean(uniform_draws, axis=1) 
# calculate the mean of each sample
# (where a sample is a four-element array)
 
sample_means

In [None]:
sample_means.mean() # now average the 10 sample means; the true mean of this uniform distribution is 5.5

In [None]:
# Now let's run this for many more samples and plot the means
import scipy.stats as stats

# 10,000 samples (rows), each of size 100 (columns)
uniform_draws = np.random.uniform(1, 10, (10**4, 100))
# one mean per sample, so 10,000 sample means
sample_means = np.mean(uniform_draws, axis=1)

plt.hist(
    sample_means, bins=50, density=True, label="sample means");
xmin, xmax = plt.xlim()
x = np.linspace(xmin, xmax, 1000)
y = stats.norm.pdf(x,np.mean(sample_means),np.std(sample_means))
plt.plot(x,y,'k',label="normal");
plt.legend();

## In conclusion

- Note that we originally drew from a **uniform** distribution! 
- Then we took 10,000 samples of size 100 and took the mean of each sample
- The distribution of our sample means looks **normal**! That's the CLT
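
The CLT also predicts where that normal curve sits: the sample means should center on the mean of the original distribution, with a standard deviation of roughly $\sigma / \sqrt{k}$, where $\sigma$ is the standard deviation of the original distribution and k is the sample size. A rough numerical check follows, assuming the same uniform distribution on [1, 10) and the same sizes as the plot; the seed and variable names are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)   # seed chosen arbitrarily for reproducibility
low, high = 1, 10
n, k = 10**4, 100                # 10,000 samples, each of size 100

sample_means = rng.uniform(low, high, (n, k)).mean(axis=1)

true_mean = (low + high) / 2             # mean of a uniform distribution
true_std = (high - low) / np.sqrt(12)    # std of a uniform distribution
predicted_std = true_std / np.sqrt(k)    # spread of the sample means per the CLT

print("mean of sample means:", sample_means.mean(), "vs predicted", true_mean)
print("std of sample means: ", sample_means.std(), "vs predicted", predicted_std)
```

The observed and predicted values should agree closely but not exactly, since the sample means are themselves random.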

# Your Code

Try this with a different distribution. Are there any that you expect will not follow the CLT?
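
One possible starting point, assuming an exponential distribution as the replacement (the choice of distribution, the seed, and the hand-rolled `skewness` helper are all illustrative assumptions). Skewness is 0 for a normal distribution, so comparing it before and after averaging gives a quick numerical sense of how normal the sample means look. Swap in other distributions to explore the question above.

```python
import numpy as np

def skewness(values):
    """Sample skewness: third central moment divided by std cubed."""
    centered = values - values.mean()
    return (centered**3).mean() / values.std()**3

rng = np.random.default_rng(0)               # seed chosen arbitrarily
n, k = 10**4, 100                            # 10,000 samples, each of size 100

draws = rng.exponential(scale=1.0, size=(n, k))
sample_means = draws.mean(axis=1)

raw_skew = skewness(draws.ravel())           # skewness of the raw draws
means_skew = skewness(sample_means)          # skewness of the sample means

print("skewness of raw draws:   ", raw_skew)
print("skewness of sample means:", means_skew)
```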