# The Normal Distribution
The normal distribution is super important in stats, because the central limit theorem (CLT) says that, for a large enough sample size, the sample average (i.e. the average of the independent observations you make) is approximately normally distributed. That means you can use it for modelling errors, i.e. unexplained variation in observations.

Say you take a sample from a population and measure it. How close is the mean of those measurements (the sample average) to the true mean of the population? The CLT says you can figure that out, because sample means are (for sufficiently large samples) approximately normally distributed around the population mean.
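To make that concrete, here is a rough simulation sketch. The exponential population, its mean of 2, the sample size of 50 and the seed are all assumptions picked for illustration; the point is only that the sample means cluster around the population mean with a spread close to $\sigma/\sqrt{n}$, even though the population itself is skewed.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable

pop_mean = 2.0       # mean of an exponential population (illustrative choice)
n = 50               # observations per sample
n_samples = 20_000   # number of repeated samples

# Each row is one sample of n observations from a skewed, non-normal population
samples = rng.exponential(scale=pop_mean, size=(n_samples, n))
sample_means = samples.mean(axis=1)

# For an exponential distribution the standard deviation equals the mean
predicted_spread = pop_mean / np.sqrt(n)

# Fraction of sample means within 1.96 predicted spreads of the population mean;
# for an exactly normal distribution this would be about 0.95
within = np.mean(np.abs(sample_means - pop_mean) < 1.96 * predicted_spread)

print("average of sample means:", sample_means.mean())
print("spread of sample means: ", sample_means.std())
print("sigma / sqrt(n):        ", predicted_spread)
print("fraction within 1.96 spreads:", within)
```

Plotting a histogram of `sample_means` should show a roughly bell-shaped curve centred near the population mean.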

## Describing the normal distribution

$X \sim N(\mu, \sigma^2)$

$f(x|\mu, \sigma^2) = \frac{1}{\sqrt{2 \pi \sigma^2}} \text{exp}\left[\frac{-1}{2\sigma^2}(x-\mu)^2\right]$

$E[X] = \mu$

$Var(X) = \sigma^2$

Graphed, $X \sim N(3, 2^2)$ would look like a bell curve, with the peak at 3 and the 'width' of the bell proportional to $\sigma$:

In [1]:
import matplotlib.pyplot as plt
import numpy as np

plt.rcParams['figure.figsize'] = [3, 3]

def gaussian(x, mu, sigsq):
    """Normal density with mean mu and variance sigsq, evaluated at each x."""
    a = 1 / np.sqrt(2 * np.pi * sigsq)  # normalising constant
    b = 1 / (2 * sigsq)
    return a * np.exp(-b * (np.asarray(x) - mu) ** 2)

# 1401 evenly spaced points from -4 to 10, i.e. steps of 0.01
x = np.linspace(-4, 10, 1401)

plt.plot(x, gaussian(x, 3, 2**2))
plt.show()

<Figure size 300x300 with 1 Axes>

### Algebra with the normal distribution
The normal is also nice in that it's easy to combine normal distributions with simple algebra.

If $X \sim N(\mu, \sigma^2)$ then $Y=aX+b \sim N(a\mu + b, a^2\sigma^2)$ 

If $X_1 \sim N(\mu_1, \sigma_1^2)$ and $X_2 \sim N(\mu_2, \sigma_2^2)$ and $X_1 \perp X_2$ then $X_1 + X_2 \sim N(\mu_1+\mu_2, \sigma_1^2 + \sigma_2^2)$

(If $X_1$ and $X_2$ are not independent it's more tricky, but still not so bad)
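Both rules are easy to sanity-check by simulation. The sketch below uses made-up parameter values (the $a$, $b$, means and standard deviations are assumptions for illustration only) and compares the sample mean and variance against what the formulas predict. Note that `rng.normal` takes the standard deviation, not the variance, as its `scale` argument.

```python
import numpy as np

rng = np.random.default_rng(1)  # fixed seed so the run is repeatable
size = 200_000

# Rule 1: X ~ N(3, 2^2), so Y = aX + b should be N(a*3 + b, a^2 * 2^2)
a, b = -0.5, 4.0
x = rng.normal(loc=3.0, scale=2.0, size=size)
y = a * x + b
print("Y mean:", y.mean(), "predicted:", a * 3.0 + b)
print("Y var: ", y.var(), "predicted:", a**2 * 2.0**2)

# Rule 2: independent X1 ~ N(1, 1.5^2) and X2 ~ N(-2, 0.5^2),
# so X1 + X2 should be N(1 + (-2), 1.5^2 + 0.5^2)
x1 = rng.normal(loc=1.0, scale=1.5, size=size)
x2 = rng.normal(loc=-2.0, scale=0.5, size=size)
total = x1 + x2
print("sum mean:", total.mean(), "predicted:", 1.0 + (-2.0))
print("sum var: ", total.var(), "predicted:", 1.5**2 + 0.5**2)
```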

## T Distribution and samples from a normal distribution
A consequence of the algebra above is that you can extend it to $n$ random variables $X_i \sim N(\mu,\sigma^2)$ (the $X_i$ are independently selected and identically distributed), so the above equation becomes:

$$X_1 + X_2 + \dots + X_n \sim N(\mu + \mu + \dots + \mu, \sigma^2 + \sigma^2 + \dots + \sigma^2)$$

$$\sum_{i=1}^n X_i \sim N(n\mu, n\sigma^2)$$

$\frac{1}{n} \sum_{i=1}^n X_i $ is the mean of the random variables $= \overline{\rm X}$

so if we use the equation 'If $X \sim N(\mu, \sigma^2)$ then $Y=aX+b \sim N(a\mu + b, a^2\sigma^2)$' where $a = \frac{1}{n}$ and $b=0$

$$ \overline{\rm X} \sim N \left(\mu, \frac{\sigma^2}{n} \right)$$

In words, the mean of $n$ independent random variables/observations $X_i$, each with the same $\mu$ and $\sigma$, is itself normally distributed around that same $\mu$, with the variance shrinking in proportion to $1/n$ (so the standard deviation is $\sigma/\sqrt{n}$).
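A small simulation can check the result $\overline{\rm X} \sim N \left(\mu, \frac{\sigma^2}{n} \right)$ for a few sample sizes. The values of $\mu$, $\sigma$ and $n$ below are assumptions chosen for illustration; each time $n$ is quadrupled, the variance of the mean should drop by a factor of about four.

```python
import numpy as np

rng = np.random.default_rng(2)  # fixed seed so the run is repeatable
mu, sigma = 3.0, 2.0            # illustrative population parameters
n_repeats = 50_000              # how many sample means to simulate per n

results = {}
for n in (4, 16, 64):
    # Each row holds n observations; the row mean is one draw of X-bar
    xbar = rng.normal(mu, sigma, size=(n_repeats, n)).mean(axis=1)
    results[n] = (xbar.mean(), xbar.var())
    print(f"n={n:3d}  mean of X-bar={xbar.mean():.4f}  "
          f"var of X-bar={xbar.var():.4f}  sigma^2/n={sigma**2 / n:.4f}")
```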

What this means in practice is that you can use a set of observations to make inferences about the true mean $\mu$ of a random variable, by finding the linear transformation $a\overline{\rm X} + b$ that takes you from $N \left(\mu, \frac{\sigma^2}{n} \right)$ to $N \left(0, 1 \right)$ (the 'Standard Normal'). For that we need:

$a \mu + b = 0$ and $a^2 \frac{\sigma^2}{n} = 1$

$$a = \sqrt{\frac{n}{\sigma^2}} = \frac{\sqrt{n}}{\sigma}$$

$$b = -a\mu = \frac{-\sqrt{n}\mu}{\sigma}$$

$$a \overline{\rm{X}} + b = \frac{\sqrt{n}\overline{\rm{X}}}{\sigma} - \frac{\sqrt{n}\mu}{\sigma} = \frac{\sqrt{n}\left(\overline{\rm{X}}-\mu\right)}{\sigma} = \frac{\overline{\rm{X}}-\mu}{\sigma / \sqrt{n}}$$

$$\frac{\overline{\rm{X}}-\mu}{\sigma / \sqrt{n}} \sim N(0,1)$$
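As a rough check of the standardised quantity above, the sketch below simulates many sample means with a known $\sigma$ and standardises each one. The parameter values are assumptions for illustration. If the result holds, the standardised values should have mean about 0, variance about 1, and fall inside $\pm 1.96$ roughly 95% of the time.

```python
import numpy as np

rng = np.random.default_rng(3)   # fixed seed so the run is repeatable
mu, sigma, n = 3.0, 2.0, 25      # illustrative values; sigma is treated as known
n_repeats = 100_000

# One sample mean per row of n observations
xbar = rng.normal(mu, sigma, size=(n_repeats, n)).mean(axis=1)

# Standardise: subtract mu, divide by sigma / sqrt(n)
z = (xbar - mu) / (sigma / np.sqrt(n))

z_mean = z.mean()
z_var = z.var()
frac_inside = np.mean(np.abs(z) < 1.96)

print("mean of z:", z_mean)
print("variance of z:", z_var)
print("fraction with |z| < 1.96:", frac_inside)
```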

Thus if you have a set of observations, you can use their mean $\overline{\rm{X}}$ to pin down the true population mean $\mu$: the plausible values of $\mu$ are the ones for which the quantity above looks like a typical draw from a standard normal distribution.

The problem here is that you don't have the population's standard deviation $\sigma$. Instead, you have to approximate it using $S$, the Sample Standard Deviation (i.e. the standard deviation of all the observations in your sample):

$$ S = \sqrt{\sum_i \left( X_i - \overline{X}\right)^2/\left(n-1\right)} $$

So the formula becomes

$$\frac{\overline{\rm{X}}-\mu}{S / \sqrt{n}} \sim t(\nu)$$

which is a t distribution, with $\nu = n-1$ degrees of freedom. So it's no longer normally distributed. It *looks* kind of normal, but with fatter tails. 
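The fatter tails show up clearly in a simulation with a small sample. The sketch below (parameter values are assumptions for illustration) standardises the same sample means twice, once with the true $\sigma$ and once with $S$, and counts how often each statistic lands beyond $\pm 1.96$. With $n = 5$ the version using $S$ should land out there noticeably more often than the roughly 5% expected for a standard normal.

```python
import numpy as np

rng = np.random.default_rng(4)   # fixed seed so the run is repeatable
mu, sigma, n = 3.0, 2.0, 5       # illustrative values; small n exaggerates the effect
n_repeats = 100_000

x = rng.normal(mu, sigma, size=(n_repeats, n))
xbar = x.mean(axis=1)
s = x.std(axis=1, ddof=1)        # ddof=1 gives the (n - 1) divisor used in S

z = (xbar - mu) / (sigma / np.sqrt(n))   # uses the true sigma: standard normal
t = (xbar - mu) / (s / np.sqrt(n))       # uses S: t with n - 1 degrees of freedom

tail_z = np.mean(np.abs(z) > 1.96)
tail_t = np.mean(np.abs(t) > 1.96)

print("fraction of |z| > 1.96:", tail_z)
print("fraction of |t| > 1.96:", tail_t)
```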

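As $n$ gets large, $S$ gets closer to $\sigma$ and the t distribution gets more standard normal, i.e. tends to $N(0,1)$. (The CLT is a separate result: it says the standardised sample mean tends to $N(0,1)$ for large $n$ even when the $X_i$ themselves are not normal.)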

### Describing the T distribution

$$Y \sim t(\nu)$$

$$f(y) = \frac{\Gamma\left(\frac{\nu+1}{2}\right)}{\Gamma\left(\frac{\nu}{2}\right)\sqrt{\nu \pi}} \left(1 + \frac{y^2}{\nu} \right)^{-\left(\frac{\nu+1}{2}\right)}$$

$$E[Y] = 0 \text{ if } \nu \gt 1 $$

$$Var[Y] = \frac{\nu}{\nu-2} \text{ if } \nu \gt 2$$
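The density formula can be checked numerically. The sketch below codes up $f(y)$ directly and approximates the integrals with a simple Riemann sum on a wide grid; the choice of $\nu = 5$, the grid limits and the step size are assumptions for illustration. The total mass should come out close to 1, the mean close to 0, and the variance close to $\frac{\nu}{\nu-2} = \frac{5}{3}$.

```python
import math
import numpy as np

def t_pdf(y, nu):
    """Density of the t distribution with nu degrees of freedom."""
    const = math.gamma((nu + 1) / 2) / (math.gamma(nu / 2) * math.sqrt(nu * math.pi))
    return const * (1 + np.asarray(y) ** 2 / nu) ** (-(nu + 1) / 2)

nu = 5                                   # illustrative degrees of freedom (needs nu > 2)
y = np.linspace(-200, 200, 400_001)      # wide grid because the tails decay slowly
dy = y[1] - y[0]
f = t_pdf(y, nu)

# Riemann-sum approximations of the integrals
total_mass = np.sum(f) * dy
mean = np.sum(y * f) * dy
variance = np.sum(y**2 * f) * dy

print("total mass:", total_mass)
print("mean:", mean)
print("variance:", variance, "predicted:", nu / (nu - 2))
```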