# DATA 5600: Introduction to Regression and Machine Learning for Analytics

## __Sampling Distributions and the Monte Carlo Method__ <br>

Author:      Tyler J. Brough <br>
Updated: September 20, 2021 <br>

---

<br>

In [1]:
import numpy as np
from scipy import stats
import seaborn as sns
import matplotlib.pyplot as plt

sns.set_style('darkgrid')
plt.rcParams['figure.figsize'] = [10, 5]

<br>

---


# __The Monte Carlo Simulation Algorithm for Sampling Distributions__

1. Determine the population distribution of interest. 

2. Use a pseudorandom number generator (PRNG) to draw $M$ samples, each of size $n$, from the population. 
   (We use the PRNG built into NumPy's `np.random` module.) 

3. Apply the sample statistic of interest to each of the $M$ samples. Obtain $M$ values of the sample
   statistic. 

4. Plot a histogram of the $M$ simulated values; it approximates the sampling distribution of the statistic.

5. Calculate the mean and standard deviation of the simulated sample. If theoretical values are available
   (e.g., when the Central Limit Theorem, or CLT, applies), compare the simulated values with them.
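
<br>

The five steps above can be sketched in a few lines of NumPy. This is only a minimal illustration under stated assumptions: the exponential population, its scale of 2.0, the values $M = 10{,}000$ and $n = 50$, and the seed are all chosen here for the example and do not come from the notes. The statistic is the sample mean, so the theoretical comparison in step 5 uses $\mu$ and $\sigma / \sqrt{n}$.

```python
import numpy as np

# Assumptions for illustration only: an Exponential population with
# scale 2.0 (so mean = 2.0 and standard deviation = 2.0), M = 10,000
# samples, and n = 50 observations per sample.
rng = np.random.default_rng(seed=42)
M, n = 10_000, 50
scale = 2.0                       # Step 1: the population distribution

# Step 2: draw M samples of size n (one sample per row).
samples = rng.exponential(scale=scale, size=(M, n))

# Step 3: apply the statistic (here, the sample mean) to each sample.
sample_means = samples.mean(axis=1)

# Step 4: a histogram could be drawn with, e.g.,
#   plt.hist(sample_means, bins=50)

# Step 5: compare simulated moments with the theoretical values.
sim_mean = sample_means.mean()
sim_std = sample_means.std(ddof=1)
theo_mean = scale                 # population mean
theo_std = scale / np.sqrt(n)     # sigma / sqrt(n)

print(f"mean: simulated = {sim_mean:.4f}, theoretical = {theo_mean:.4f}")
print(f"std:  simulated = {sim_std:.4f}, theoretical = {theo_std:.4f}")
```

Storing one sample per row lets a single `mean(axis=1)` call compute all $M$ statistics at once, which is usually much faster than a Python loop.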

<br>

## __Glossary__

<br>

The __Law of Large Numbers__ (__LLN__) is a theorem that describes the result of performing the same experiment a large number of times. According to the LLN, the average of the results obtained from a large number of independent trials should be close to the expected value, and it tends to get closer as more trials are performed. 

<br>

See the Wikipedia article on the [Law of Large Numbers](https://en.wikipedia.org/wiki/Law_of_large_numbers) for more details. 
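
<br>

A small simulation can make the LLN concrete. The sketch below assumes, purely for illustration, that each trial is a roll of a fair six-sided die (expected value 3.5); the number of rolls and the seed are likewise arbitrary choices.

```python
import numpy as np

# Assumption for illustration: each trial is one roll of a fair
# six-sided die, so the expected value is 3.5.
rng = np.random.default_rng(seed=0)
rolls = rng.integers(low=1, high=7, size=100_000)  # high is exclusive

# Running average after 1, 2, ..., 100,000 trials.
running_mean = np.cumsum(rolls) / np.arange(1, rolls.size + 1)

for k in (10, 1_000, 100_000):
    print(f"average of first {k:>7,} rolls: {running_mean[k - 1]:.4f}")
```

The printed averages should drift toward 3.5 as the number of rolls grows, although any single run can wander along the way.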

<br>
<br>

The __Central Limit Theorem__ (__CLT__) says that under random sampling from a population
with mean $\mu$ and finite variance $\sigma^{2}$, as the sample size $n$ gets large, the
sampling distribution of the sample mean $\bar{X}$ approaches a normal distribution with
mean $\mu$ and variance $\sigma^{2}/n$. Put another way, with a 'sufficiently large'
sample size we can treat the sample mean as approximately normally distributed, so it
can be assumed that

<br>

$$
\large{Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}}}
$$

<br>

has approximately a standard normal distribution.
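
<br>

As a rough numerical check of this standardization, the sketch below simulates sample means from a non-normal population and standardizes them. The Uniform(0, 1) population (with $\mu = 1/2$ and $\sigma^{2} = 1/12$), the values of $M$ and $n$, and the seed are assumptions made for the example only.

```python
import numpy as np

# Assumptions for illustration: a Uniform(0, 1) population, for which
# mu = 1/2 and sigma^2 = 1/12, with M = 20,000 samples of size n = 40.
rng = np.random.default_rng(seed=123)
M, n = 20_000, 40
mu, sigma = 0.5, np.sqrt(1 / 12)

# Sample mean of each of the M samples (one sample per row).
x_bar = rng.uniform(0.0, 1.0, size=(M, n)).mean(axis=1)

# Standardize: Z = (X_bar - mu) / (sigma / sqrt(n)).
z = (x_bar - mu) / (sigma / np.sqrt(n))

# For a standard normal, about 95% of values lie within +/- 1.96.
coverage = np.mean(np.abs(z) < 1.96)

print(f"mean of Z = {z.mean():.4f}")
print(f"std of Z  = {z.std(ddof=1):.4f}")
print(f"share of |Z| < 1.96 = {coverage:.4f}")
```

If the CLT approximation is working well at this sample size, the mean of `z` should be near 0, its standard deviation near 1, and the coverage near 0.95.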

<br>