# Content
- Application of Normal Distribution (find values based on probabilities)
- Random Numbers
- Law of Large Numbers
- Central Limit Theorem
- Normal Approximation to Binomial Distribution

Parametric statistics assumes that sample data come from a population that follows a probability distribution defined by a fixed set of parameters, often the Gaussian (normal) distribution.

If a data sample is not Gaussian, the assumptions of parametric tests that rely on normality are violated, and nonparametric methods should generally be used instead.

In [2]:
# Imports
from scipy import stats
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import math

In [4]:
# Generic functions

def display_probability_density_chart():
    r'''
    Display probability density chart.
    '''
    # Plot between -10 and 10 with .001 steps.
    x_axis = np.arange(-10, 10, 0.001)
    # Mean = 0, SD = 2.
    plt.plot(x_axis, stats.norm.pdf(x_axis, 0, 2))
    plt.show()

# Application of Normal Distribution

## Normal Distribution
It is a **continuous distribution**, not a discrete one.
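
Because the distribution is continuous, probabilities correspond to areas under the density curve rather than to the density at a single point. A minimal sketch of that idea, assuming `scipy.integrate.quad` for the numerical integration (it is not used elsewhere in this notebook):

```python
from scipy import integrate, stats

# Area under the standard normal density between z = -1 and z = 1.
area, _abs_error = integrate.quad(stats.norm.pdf, -1, 1)

# The same probability from the cumulative distribution function.
area_from_cdf = stats.norm.cdf(1) - stats.norm.cdf(-1)

print('area by integration =', round(area, 4))
print('area from cdf       =', round(area_from_cdf, 4))
```

Both approaches should agree, which is why the helper functions in this notebook can rely on `cdf` alone.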

## Standard Normal Distribution
Properties:
- μ = 0
- σ = 1

**All normally distributed variables can be transformed into standard normally distributed variables.**

Standard score = z = (X - μ)/σ

<img src="attachment:Standard%20Normal%20Distribution.PNG" width="700" align="left">

**Standardization** is the process of putting different variables on the same scale.

It is useful for practical problems whose quantities are given in absolute units (e.g. dollars spent on clothes).
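
As a rough illustration of the standard score formula, the sketch below standardizes one observation. The mean, standard deviation, and observed amount are hypothetical figures chosen only for the example.

```python
from scipy import stats

# Hypothetical figures, chosen only for illustration.
mu = 50.0     # mean dollars spent
sigma = 10.0  # standard deviation in dollars
x = 65.0      # one observed amount

# Standard score: z = (X - mu) / sigma
z = (x - mu) / sigma

# The cumulative probability is the same on the raw and standardized scales.
p_raw = stats.norm.cdf(x, loc=mu, scale=sigma)
p_std = stats.norm.cdf(z)

print('z =', z)
print('P(X <= x) =', round(p_raw, 4))
```

Here `z` is 1.5, meaning the observation lies one and a half standard deviations above the mean, and `p_raw` matches `p_std`.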

In [32]:
def get_zscore_from_probability(
    p_left=None, # probability left side of target range (if any)
    p_right=None, # probability right side of target range (if any)
):
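    r'''
    Calculate a z-score from probabilities (with ppf, the inverse of cdf).

    p_left only: z with area p_left to its left.
    p_right only: z with area p_right to its right.
    Both: distance in z between the two cumulative probabilities.
    '''
    z = 0
    # Compare against None so that valid edge values are not skipped.
    if p_left is not None and p_right is None:
        z = stats.norm.ppf(p_left)

    elif p_right is not None and p_left is None:
        # A right-tail area of p_right is a left-tail area of 1 - p_right.
        z = stats.norm.ppf(1 - p_right)

    elif p_left is not None and p_right is not None:
        z_left = stats.norm.ppf(p_left)
        z_right = stats.norm.ppf(p_right)
        z = z_right - z_left

    return z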


def get_probability_from_zscore(
    z_left=None, # z-score left side of target range (if any)
    z_right=None, # z-score right side of target range (if any)
):
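    r'''
    Calculate a probability from z-scores (with cdf).

    z_left only: area to the left of z_left.
    z_right only: area to the right of z_right.
    Both: area between z_left and z_right.
    '''
    p = 0
    # Compare against None: z = 0 is a valid z-score but is falsy.
    if z_left is not None and z_right is None:
        p = stats.norm.cdf(z_left)

    elif z_right is not None and z_left is None:
        p = 1 - stats.norm.cdf(z_right)

    elif z_left is not None and z_right is not None:
        p_left = stats.norm.cdf(z_left)
        p_right = stats.norm.cdf(z_right)
        p = p_right - p_left

    return p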

## Examples

In [24]:
# Find area left of z=2.06
# p = get_probability_from_zscore(
#     z_left=2.06, # z-score left side of target range (if any)
#     z_right=None, # z-score right side of target range (if any)
# )

# Find area right of z=-1.19
p = get_probability_from_zscore(
    z_left=None, # z-score left side of target range (if any)
    z_right=-1.19, # z-score right side of target range (if any)
)

# Find area between z=1.68 and z=-1.37
# p = get_probability_from_zscore(
#     z_left=-1.37, # z-score left side of target range (if any)
#     z_right=1.68, # z-score right side of target range (if any)
# )

print('p =', round(p, 4))

p = 0.883


In [46]:
# Find the z-value such that the area between 0 and z is 0.2123
z = get_zscore_from_probability(
    p_left=0.5, # probability left side of target range (if any) -> here the mean
    p_right=0.5+0.2123, # probability right side of target range (if any)
)
print('z-value =', round(z, 2))

z-value = 0.56
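
One way to sanity-check a z-value found this way is to pass it back through `cdf` and confirm that the original area is recovered. The sketch below does this directly with `stats.norm`; the variable names are illustrative only.

```python
from scipy import stats

target_area = 0.2123  # area between the mean (z = 0) and the unknown z

# Cumulative probability up to z is 0.5 (the left half) plus the target area.
z = stats.norm.ppf(0.5 + target_area)

# Going back through cdf should recover the target area.
recovered_area = stats.norm.cdf(z) - stats.norm.cdf(0)

print('z =', round(z, 2))
print('recovered area =', round(recovered_area, 4))
```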


## Normality Tests
https://machinelearningmastery.com/a-gentle-introduction-to-normality-tests-in-python/

Techniques:
- Graphical methods (check chart curve)
- Statistical tests
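
The two techniques can be sketched as follows. This is only an illustration under stated assumptions: the sample is synthetic (seeded random draws, not the notebook's data), the Shapiro-Wilk test stands in for "statistical tests" in general, and the significance level of 0.05 is a conventional choice.

```python
import numpy as np
from scipy import stats

# Synthetic sample, assumed only for illustration (seeded for repeatability).
rng = np.random.default_rng(0)
sample = rng.normal(loc=50, scale=5, size=200)

# Statistical test: Shapiro-Wilk. The null hypothesis is that the sample is Gaussian.
statistic, p_value = stats.shapiro(sample)

# Numeric counterpart of a Q-Q plot: correlation r between the ordered sample
# values and the theoretical normal quantiles (close to 1 for Gaussian data).
(_theoretical_q, _ordered), (_slope, _intercept, r) = stats.probplot(sample, dist='norm')

alpha = 0.05  # conventional significance level, an assumption here
if p_value > alpha:
    print('Sample looks Gaussian (fail to reject H0)')
else:
    print('Sample does not look Gaussian (reject H0)')
print('Q-Q correlation r =', round(r, 4))
```

Passing `plot=plt` to `stats.probplot` would draw the Q-Q chart itself for a visual check.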

In [47]:
starbucks_branches_per_city = [
    67, 84, 80, 77, 97, 59, 62, 37, 33, 42,
    
]