In [1]:
'''
Import the libraries used throughout this notebook.
Run this cell first.
'''
import numpy as np
from scipy import stats
import scipy
import warnings
warnings.simplefilter('ignore', DeprecationWarning)

# Chapter 5 - Normal Distribution

## Probability Calculation Using the Normal Distribution

### Definition of the Normal Distribution



- Normal distribution $N(\mu, \sigma^2)$:
\begin{equation}
    f(x; \mu, \sigma) = \frac{1}{\sqrt{2 \pi} \sigma } e ^{- \frac{(x - \mu) ^2}{2 \sigma ^2}}
\end{equation}
- $\mu$ is the mean and $\sigma$ the standard deviation of the distribution, hence $\sigma^2$ is its variance
- Symmetric about $\mu$
- Also called the Gaussian distribution
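
As a quick sanity check, the density formula above can be written out by hand and compared with `scipy.stats.norm.pdf`. This is only a minimal sketch: the helper name `normal_pdf` and the numeric values are illustrative assumptions, not part of the original notes.

```python
import math
from scipy.stats import norm

def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x, written directly from the formula (hypothetical helper)."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)

# Illustrative values (assumed for this example)
x, mu, sigma = 1300, 1320, 15

by_hand = normal_pdf(x, mu, sigma)
from_scipy = norm.pdf(x, loc=mu, scale=sigma)  # scale is the standard deviation, not the variance
print(by_hand, from_scipy)
```

Note that SciPy parameterizes the normal distribution by `loc` (mean) and `scale` (standard deviation), whereas the notation $N(\mu, \sigma^2)$ lists the variance.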

In [5]:
from scipy.stats import norm

# Parameters
x = 1300 # value to look for
mu = 1320 # mean
sigma = 15 # standard deviation

print("Mean: ", norm.mean(loc = mu, scale = sigma))
print("Variance: ", norm.var(loc = mu, scale = sigma)) 
print("Probability density function: ", norm.pdf(x, loc = mu, scale = sigma))
print("Cumulative distribution function: ", norm.cdf(x, loc = mu, scale = sigma))
print("Survival function (1-cdf): ", norm.sf(x, loc = mu, scale = sigma))

Mean:  1320.0
Variance:  225.0
Probability density function:  0.010934004978399577
Cumulative distribution function:  0.09121121972586788
Survival function (1-cdf):  0.9087887802741321


### Standard Normal Distribution

- Normal distribution with $\mu = 0$ and $\sigma = 1$
- Probability density function:
\begin{equation}
    f(x) = \frac{1}{\sqrt{2 \pi}} e ^{- \frac{x ^2}{2}}
\end{equation}
- Cumulative distribution function: $\Phi(x)$ (special notation for the Gaussian)
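
$\Phi$ has no closed form, but it can be expressed through the error function as $\Phi(x) = \frac{1}{2}\left(1 + \operatorname{erf}(x / \sqrt{2})\right)$. The sketch below compares that identity with `norm.cdf`; the helper name `phi` and the evaluation points are assumptions chosen for illustration.

```python
import math
from scipy.stats import norm

def phi(x):
    """Standard normal cdf via the error function (hypothetical helper)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

# Compare with SciPy at a few illustrative points
for z in (-1.0, 0.0, 1.96):
    print(z, phi(z), norm.cdf(z))
```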

### Probability Calculation for General Normal Distributions

- $X \sim N(\mu, \sigma^2) \Longrightarrow Z = \frac{X - \mu}{\sigma} \sim N(0,1)$
- So, $P(a \leq X \leq b) = P \left( \frac{ a - \mu}{ \sigma} \leq \frac{ X - \mu}{ \sigma} \leq \frac{ b - \mu}{ \sigma} \right) = P \left( \frac{ a - \mu}{ \sigma} \leq Z \leq \frac{ b - \mu}{ \sigma} \right) = \Phi \left(\frac{ b - \mu}{ \sigma} \right) - \Phi \left( \frac{a - \mu}{ \sigma} \right)$
- Other properties:
    - $P( \mu -c \sigma \leq X \leq \mu + c \sigma) = P(-c \leq Z \leq c)$
    - $P(X \leq \mu + \sigma z_{\alpha}) = P(Z \leq z_ {\alpha}) = 1 - \alpha$, where $z_{\alpha}$ is the critical point with $P(Z > z_{\alpha}) = \alpha$
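
The identities above can be checked numerically. The following sketch computes an interval probability both directly and after standardizing, then verifies the critical-point property. The interval endpoints `a`, `b` and the level `alpha = 0.05` are assumed values, and `z_alpha` is taken as the upper-tail critical point returned by `norm.isf`.

```python
from scipy.stats import norm

mu, sigma = 1320, 15   # illustrative mean and standard deviation
a, b = 1300, 1340      # illustrative interval endpoints

# P(a <= X <= b) computed directly on X ...
p_direct = norm.cdf(b, loc=mu, scale=sigma) - norm.cdf(a, loc=mu, scale=sigma)
# ... and through the standardized variable Z = (X - mu) / sigma
p_standardized = norm.cdf((b - mu) / sigma) - norm.cdf((a - mu) / sigma)
print(p_direct, p_standardized)

# Critical point: z_alpha satisfies P(Z > z_alpha) = alpha
alpha = 0.05
z_alpha = norm.isf(alpha)          # inverse survival function
x_alpha = mu + sigma * z_alpha     # corresponding point on the X scale
p_check = norm.cdf(x_alpha, loc=mu, scale=sigma)
print(p_check)                     # should be close to 1 - alpha
```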

In [12]:
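# This time we standardize first: compute Z = (x - mu) / sigma and evaluate the standard normal distribution at Z.
# The cdf and survival function match the previous approach; the standard normal density at Z must be divided
# by sigma to recover the density of X at x. Standardizing is the usual route for manual (table-based) calculation.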
from scipy.stats import norm

# Parameters 
x = 1300 # value to look for
mu = 1320 # mean
sigma = 15 # standard deviation
# Optional
alpha = 0.99 # cumulative probability passed to ppf, the inverse of the cumulative distribution function

# Parameter transformation
Z = (x - mu)/sigma


print("Mean: ", mu) # the mean is given 
print("Variance: ", sigma**2) # we calculate the variance using the provided standard deviation
print("Standard normal density at Z: ", norm.pdf(Z)) # divide by sigma to get the density of X at x
print("Cumulative distribution function: ", norm.cdf(Z))
print("Survival function (1-cdf): ", norm.sf(Z))
print("Percent point function (inverse of cdf): ", norm.ppf(alpha))

Mean:  1320
Variance:  225
Standard normal density at Z:  0.16401007467599366
Cumulative distribution function:  0.09121121972586788
Survival function (1-cdf):  0.9087887802741321
Percent point function (inverse of cdf):  2.3263478740408408


## Linear Combinations of Normal Random Variables

### Linear Functions of a Normal Random Variable

- $X \sim N(\mu, \sigma^2) $
    - $\Longrightarrow Y = aX +b \sim N(a \mu + b, a^2 \sigma ^2)$ for constant $a,b$ 
- Given $X_1 \sim N(\mu_1, \sigma_1^2)$ and $X_2 \sim N(\mu_2, \sigma_2^2)$ independent
    - $\Longrightarrow Y = X_1 + X_2 \sim N(\mu_1 + \mu_2, \sigma_1^2 + \sigma_2^2)$ 
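
A small simulation can illustrate both rules. This is a sketch under assumed parameter values (all numbers below are made up for the example); it compares sample moments with the theoretical ones rather than proving anything.

```python
import numpy as np

rng = np.random.default_rng(0)   # seeded for reproducibility
size = 200_000

# Linear function Y = aX + b with X ~ N(mu, sigma^2)
mu, sigma = 10.0, 2.0            # assumed parameters
a, b = 3.0, -5.0                 # assumed constants
x = rng.normal(mu, sigma, size=size)
y = a * x + b
print(y.mean(), a * mu + b)          # sample mean vs. a*mu + b
print(y.var(), a ** 2 * sigma ** 2)  # sample variance vs. a^2 * sigma^2

# Sum of two independent normals
x1 = rng.normal(1.0, 2.0, size=size)
x2 = rng.normal(4.0, 3.0, size=size)
s = x1 + x2
print(s.mean(), 1.0 + 4.0)             # means add
print(s.var(), 2.0 ** 2 + 3.0 ** 2)    # variances add (not standard deviations)
```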

### Properties of Independent Normal Random Variables

- $ X_i \sim N(\mu_i, \sigma_i^2), 1\leq i \leq n$ are independent and $a_i$, $1 \leq i \leq n$, and $b$ are constants
    - $ Y = a_1X_1 + \cdots + a_n X_n + b \sim N(\mu, \sigma^2)$
    - where $\mu = a_1\mu_1 + \cdots + a_n\mu_n + b, \sigma^2 = a_1^2\sigma_1^2 + \cdots +a_n^2\sigma_n^2$ 
- $X_i \sim N( \mu, \sigma^2)$, $  1\leq i \leq n$ are independent
    - $ \bar{X} \sim N(\mu, \frac{\sigma^2}{n})$ where $\bar{X} = \frac{1}{n} \sum_{i=1}^n X_i$
- If $X_1, \cdots , X_n$ are not independent, these properties may no longer hold
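
The formulas for a linear combination translate directly into code. In the sketch below, `linear_combination_params` is a hypothetical helper, and the sample size and parameters are assumed values; the sample mean is treated as the special case $a_i = 1/n$, $b = 0$.

```python
import numpy as np
from scipy.stats import norm

def linear_combination_params(a, mus, sigmas, b=0.0):
    """Mean and variance of Y = a_1 X_1 + ... + a_n X_n + b for independent normal X_i."""
    a = np.asarray(a, dtype=float)
    mus = np.asarray(mus, dtype=float)
    sigmas = np.asarray(sigmas, dtype=float)
    mean = float(a @ mus + b)
    var = float((a ** 2) @ (sigmas ** 2))
    return mean, var

# Sample mean of n iid N(mu, sigma^2) variables: a_i = 1/n, b = 0
n, mu, sigma = 25, 1320.0, 15.0   # assumed values
mean_bar, var_bar = linear_combination_params([1 / n] * n, [mu] * n, [sigma] * n)
print(mean_bar, var_bar)          # close to mu and sigma**2 / n

# Probability that the sample mean falls below an illustrative threshold
p_below = norm.cdf(1315, loc=mean_bar, scale=np.sqrt(var_bar))
print(p_below)
```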


## Approximating Distributions with the Normal Distribution

### The Normal Approximation to the Binomial Distribution

Theorem (approximation of a binomial to a normal distribution):
- If $X$ is a binomial random variable with mean $\mu = np$ and variance $\sigma^2 = npq$, where $q = 1 - p$, then the limiting form of the distribution of
\begin{equation}
    Z = \frac{X - np}{\sqrt{npq}}
\end{equation}
as $n \to \infty$ is the standard normal distribution $N(0,1)$
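
To see the limit at work, one can measure how far the binomial cdf is from the standard normal cdf evaluated at the standardized points. The sketch below does this for a few sample sizes; `p = 0.3` and the chosen values of `n` are assumptions for illustration, and no continuity correction is applied here.

```python
import numpy as np
from scipy.stats import binom, norm

p = 0.3          # assumed success probability
q = 1 - p
gaps = {}

for n in (10, 100, 1000):
    k = np.arange(n + 1)                      # all possible outcomes
    z = (k - n * p) / np.sqrt(n * p * q)      # standardized outcomes
    # Largest absolute difference between the exact cdf and the normal cdf
    gaps[n] = float(np.max(np.abs(binom.cdf(k, n, p) - norm.cdf(z))))
    print(n, gaps[n])
```

The maximum gap shrinks as `n` grows, which is what the theorem predicts.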

### Continuity correction in the Normal approximation

- If $X \sim B(n,p)$ and $Z \sim N(0,1)$, then for an integer $x$, with $q = 1 - p$:
    - $P(X \leq x) \approx P \left( Z \leq \frac{x + 0.5 -np}{\sqrt{npq}} \right)$
    - $P(X \geq x) \approx P \left( Z \geq \frac{x - 0.5 -np}{\sqrt{npq}} \right)$
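
The effect of the $\pm 0.5$ shift is easiest to see on a small example. The sketch below compares exact binomial probabilities with the normal approximation, with and without the correction; `n = 20`, `p = 0.4` and the thresholds are assumed values.

```python
import numpy as np
from scipy.stats import binom, norm

n, p = 20, 0.4                 # assumed binomial parameters
q = 1 - p
mean, sd = n * p, np.sqrt(n * p * q)

# Lower tail: P(X <= 8)
x = 8
exact = binom.cdf(x, n, p)
uncorrected = norm.cdf((x - mean) / sd)
corrected = norm.cdf((x + 0.5 - mean) / sd)
print(exact, uncorrected, corrected)

# Upper tail: P(X >= 10), using P(X >= x) = P(X > x - 1) for the exact value
x_up = 10
upper_exact = binom.sf(x_up - 1, n, p)
upper_approx = norm.sf((x_up - 0.5 - mean) / sd)
print(upper_exact, upper_approx)
```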

### The Central Limit Theorem

- Let $X_1, \cdots , X_n$ be _iid_ (independent and identically distributed) with mean $\mu$ and finite variance $\sigma^2$; then
\begin{equation}
    \bar{X} = \frac{1}{n} \sum_{i=1}^n X_i \approx N \left( \mu, \frac{\sigma^2}{n} \right)
\end{equation}
for large $n$

- The approximation becomes accurate at smaller $n$ the closer the distribution of the $X_i$ is to a normal distribution
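
A simulation with a clearly non-normal distribution makes the theorem concrete. The sketch below draws sample means from an exponential distribution with mean 1 and variance 1 (an assumed choice, as are the sample sizes and the number of repetitions) and tracks their standard deviation and skewness as `n` grows.

```python
import numpy as np
from scipy.stats import skew

rng = np.random.default_rng(42)   # seeded for reproducibility
reps = 10_000                     # number of simulated sample means per n
stds, skews = {}, {}

for n in (2, 30, 200):
    # Each row is one sample of size n from Exp(1); take the mean of every row
    means = rng.exponential(scale=1.0, size=(reps, n)).mean(axis=1)
    stds[n] = float(means.std())
    skews[n] = float(skew(means))
    # Theory: std of the sample mean is sigma / sqrt(n); a normal distribution has skewness 0
    print(n, stds[n], 1.0 / np.sqrt(n), skews[n])
```

The standard deviation tracks $\sigma / \sqrt{n}$ for every `n`, while the skewness moves toward zero only gradually, which reflects the strongly asymmetric starting distribution.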