## Theory: normal (Gaussian) distribution

### Concepts
* distribution of continuous variable
* Gaussian distribution and normality assumption
* 68-95-99.7 rule
* Probability Density Function (PDF)
* Cumulative Distribution Function (CDF)
* Percent Point Function (PPF), which is the inverse of the CDF
* normal probability plot

In [1]:
%pwd

'C:\\Users\\sc522\\PROJECTS\\STAT\\IS5\\code\\ch05'

In [2]:
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

In [3]:
from ipywidgets import interact

In [4]:
import ipywidgets as widgets

In [19]:
def std_normal(cut):
    """Plot the standard normal PDF with a cut-off line and report P[x > cut]."""
    gauss = stats.norm(loc=0, scale=1)
    x = np.linspace(gauss.ppf(0.0001), gauss.ppf(0.9999), 10_000)
    fig, ax = plt.subplots()
    ax.plot(x, gauss.pdf(x),
            color='k',
            lw=2,
            label='Standard Normal')
    # Vertical line at the cut-off value
    ax.axvline(cut, label='cut-off')
    ax.set_xlim(-5, 5)
    ax.set_ylim(0, gauss.pdf(x).max() * 1.2)
    ax.set_xlabel('x')
    ax.set_title(f'P [ x > {cut:.2f} ]: {1 - gauss.cdf(cut):.4f}')
    ax.legend()
    # Close the figure so only the returned copy is rendered by interact
    plt.close(fig)
    return fig

In [20]:
interact(std_normal, cut=widgets.FloatSlider(min=-3.7, max=3.7, step=0.001));

interactive(children=(FloatSlider(value=0.0, description='cut', max=3.7, min=-3.7, step=0.001), Output()), _do…

In [22]:
stats.norm.cdf(1.959963984540054)  # cumulative distribution function

0.975
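
The 68-95-99.7 rule from the concept list can be checked with the same `cdf` call. The sketch below is illustrative only; the helper name `within_k_sd` is an assumption and not part of the original notebook.

```python
from scipy import stats


def within_k_sd(k):
    """Probability that a standard normal value lies within k SDs of the mean."""
    return stats.norm.cdf(k) - stats.norm.cdf(-k)


for k in (1, 2, 3):
    # Prints roughly 0.6827, 0.9545 and 0.9973
    print(f"within {k} SD: {within_k_sd(k):.4f}")
```

The rounded values 68%, 95% and 99.7% are where the rule gets its name.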

In [13]:
stats.norm.pdf(0)  # probability density function

0.3989422804014327
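
The value above is the peak of the standard normal density, 1/sqrt(2π). As a rough cross-check, the density formula can be written out by hand and compared with `stats.norm.pdf`. The function `normal_pdf` is a hypothetical helper introduced here for illustration.

```python
import numpy as np
from scipy import stats


def normal_pdf(x, mu=0.0, sigma=1.0):
    """Normal density written out from its formula (illustrative helper)."""
    z = (x - mu) / sigma
    return np.exp(-0.5 * z ** 2) / (sigma * np.sqrt(2 * np.pi))


# The hand-written formula should agree with SciPy at x = 0
print(normal_pdf(0.0))
print(np.isclose(normal_pdf(0.0), stats.norm.pdf(0)))
```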

In [15]:
stats.norm.ppf(0.975)  # percent point function: the inverse of the CDF

1.959963984540054

In [21]:
stats.norm.ppf(0.95)

1.6448536269514722
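
Since the PPF is the inverse of the CDF, feeding the output of one into the other should return the starting value. A minimal sketch of that round trip, using a few arbitrarily chosen probabilities:

```python
import numpy as np
from scipy import stats

# Probabilities chosen for illustration only
p = np.array([0.5, 0.9, 0.95, 0.975])

z = stats.norm.ppf(p)          # quantiles for each probability
p_back = stats.norm.cdf(z)     # mapping back should recover p

print(np.allclose(p_back, p))
```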

In [24]:
# SAT scores: mean 500, SD 100; probability of scoring between 450 and 600
mu =  500
sd = 100
y1 = 450
y2 = 600
z1 = (y1 - mu) / sd
z2 = (y2 - mu) / sd
c1 = stats.norm.cdf(z1)
c2 = stats.norm.cdf(z2)
c2 - c1

0.532807207342556
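
The manual z-score step can also be skipped by passing the mean and SD to `stats.norm` as `loc` and `scale`. This is an alternative sketch of the same calculation, not the notebook's own method; the names `sat` and `p_between` are assumptions.

```python
from scipy import stats

# Frozen normal distribution with the SAT parameters used above
sat = stats.norm(loc=500, scale=100)

# P(450 < score < 600) computed directly on the original scale
p_between = sat.cdf(600) - sat.cdf(450)
print(p_between)
```

Both approaches should give the same probability, because standardizing to z-scores is exactly what `loc` and `scale` do internally.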

In [25]:
# SAT: minimum score needed to be in the top 10%
c = 1 - 0.1
z = stats.norm.ppf(c)
y = mu + z * sd

In [26]:
y

628.1551565544601
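
The concept list also mentions the normal probability plot, which none of the cells above demonstrate. One possible sketch uses `stats.probplot` on simulated data. The sample below is generated for illustration (it is not real SAT data), and the seed and sample size are arbitrary assumptions.

```python
import numpy as np
from scipy import stats

# Simulated scores, assumed normal with mean 500 and SD 100 (not real data)
rng = np.random.default_rng(0)
sample = rng.normal(loc=500, scale=100, size=1000)

# probplot pairs theoretical normal quantiles with the ordered sample values
# and fits a least-squares line through them
(theoretical_q, ordered_values), (slope, intercept, r) = stats.probplot(sample, dist="norm")

# For near-normal data the points fall close to a straight line (r near 1),
# with slope close to the SD and intercept close to the mean
print(f"slope={slope:.1f}, intercept={intercept:.1f}, r={r:.4f}")

# Passing plot=ax (a matplotlib Axes) to probplot draws the points and the fitted line
```

Strong curvature in such a plot, or an `r` well below 1, would suggest the normality assumption is questionable for the data at hand.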