## 1 Gaussian Distribution

### 1.1 What is a Gaussian Distribution?

#### Definition

A Gaussian distribution, also called a Normal distribution, is a symmetric, continuous probability distribution in which most data points cluster around the mean.

#### Properties of Normal Distribution

* The Normal distribution applies only to continuous random variables.
* A histogram of normally distributed data (data points grouped into bins) has the shape of a bell curve.
* The mean and standard deviation are called the parameters of the Normal distribution, because these two numbers are enough to reconstruct the entire distribution.
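
A minimal sketch of the last property, assuming simulated data: 10,000 heights drawn with NumPy's `default_rng` from a Normal distribution with mean 65 and SD 2.5 (values borrowed from the quizzes below; seed and sample size are arbitrary). `stats.norm.fit` estimates the two parameters, and those two numbers alone define the fitted distribution.

```python
import numpy as np
from scipy import stats

# Simulated (not real) heights: 10,000 draws from N(65, 2.5^2); seed is arbitrary
rng = np.random.default_rng(seed=0)
heights = rng.normal(loc=65, scale=2.5, size=10_000)

# Maximum-likelihood estimates of the two parameters: mean and SD
mu_hat, sigma_hat = stats.norm.fit(heights)

# The two estimates alone define the whole fitted distribution
fitted = stats.norm(loc=mu_hat, scale=sigma_hat)
print(round(mu_hat, 2), round(sigma_hat, 2))
print(round(float(fitted.cdf(mu_hat)), 2))  # half of the mass lies below the mean
```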

> **Note**:
>
> Mean, median, and mode all lying at the center does not by itself mean the distribution is Normal.  
> A quick check is the Empirical Rule.
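
A minimal sketch of that check on simulated data, assuming a hypothetical helper `within_k_sd` that measures the fraction of points within k sample standard deviations of the sample mean. A uniform distribution is symmetric with mean equal to median, yet puts only about 57.7% of its mass within 1 SD, so it fails the rule (seeds and sample sizes are arbitrary).

```python
import numpy as np

def within_k_sd(data, ks=(1, 2, 3)):
    """Hypothetical helper: fraction of points within k sample SDs of the sample mean."""
    data = np.asarray(data, dtype=float)
    mean, sd = data.mean(), data.std()
    return [float(np.mean(np.abs(data - mean) <= k * sd)) for k in ks]

rng = np.random.default_rng(seed=1)
normal_data = rng.normal(loc=0, scale=1, size=100_000)
uniform_data = rng.uniform(low=0, high=1, size=100_000)  # symmetric, but not Normal

normal_fracs = within_k_sd(normal_data)    # roughly 0.68, 0.95, 0.997
uniform_fracs = within_k_sd(uniform_data)  # roughly 0.577, 1.0, 1.0
print([round(f, 3) for f in normal_fracs])
print([round(f, 3) for f in uniform_fracs])
```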

### 1.2 Empirical Rule

#### Rule

$
\begin{align}
\large
\text{68 / 95 / 99.7}
\end{align}
$

#### Explanation

For a Normal distribution (centered on its mean):

1. About 68% of data points lie within 1 standard deviation of the mean.
2. About 95% of data points lie within 2 standard deviations of the mean.
3. About 99.7% of data points lie within 3 standard deviations of the mean.
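
The three percentages are rounded values. A short sketch that computes them exactly from the standard Normal CDF, $P(-k < Z < k) = \text{cdf}(k) - \text{cdf}(-k)$:

```python
from scipy import stats

# P(-k < Z < k) for a standard Normal Z: the unrounded Empirical Rule values
exact = {k: float(stats.norm.cdf(k) - stats.norm.cdf(-k)) for k in (1, 2, 3)}
for k, p in exact.items():
    print(f"within {k} SD: {p:.4f}")
# within 1 SD: 0.6827
# within 2 SD: 0.9545
# within 3 SD: 0.9973
```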

### 1.3 Examples

In [1]:
import numpy as np
from scipy import stats

#### Quiz #1

The height of people is Gaussian with mean 65 inches and standard deviation 2.5 inches.  
What fraction of people have a height between 60 and 72.5 inches?

##### Solution

In [2]:
mu = 65
std = 2.5
# Find: P (60 <= x <= 72.5)
# P(x <= 72.5) - P(x <= 60)

In [3]:
mu - 2 * std, mu + 3 * std

(60.0, 72.5)

In [4]:
# Half of the 2-SD band (60 to 65) + half of the 3-SD band (65 to 72.5), in percent
(95 / 2) + (99.7 / 2)

97.35

or

In [5]:
z1 = -2
z2 = 3
(stats.norm.cdf(z2) - stats.norm.cdf(z1)).round(4)

np.float64(0.9759)
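
SciPy's `norm.cdf` also accepts `loc` (mean) and `scale` (standard deviation), so the probability can be computed directly in inches without converting to z-scores first. A minimal sketch for the same question:

```python
from scipy import stats

mu, std = 65, 2.5
# P(60 <= x <= 72.5) computed on the original scale (inches)
p = stats.norm.cdf(72.5, loc=mu, scale=std) - stats.norm.cdf(60, loc=mu, scale=std)
print(round(float(p), 4))  # 0.9759
```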

#### Quiz #2

The height of people is Gaussian with mean 65 inches and standard deviation 2.5 inches.  
What fraction of people are shorter than 67.5?

##### Solution

In [6]:
65 - (1 * 2.5), 65 + (1 * 2.5)

(62.5, 67.5)

In [7]:
# Why 0.5?
# Ans: 50% of the data lies below the median (which equals the mean for a Normal distribution)
res = 0.5 + (0.68 / 2)
round(res, 4)

0.84

or

In [8]:
mu = 65
std = 2.5
# Find: P(x <= 67.5)

In [9]:
mu - 1 * std, mu + 1 * std

(62.5, 67.5)

In [10]:
z_score = 1
stats.norm.cdf(z_score).round(4).item()

0.8413
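
As a sanity check on the CDF result, a Monte Carlo sketch: simulate many heights from the same distribution and count the fraction below 67.5. The sample size and seed are arbitrary assumptions; the simulated fraction should land close to 0.8413.

```python
import numpy as np

rng = np.random.default_rng(seed=42)  # arbitrary seed
heights = rng.normal(loc=65, scale=2.5, size=1_000_000)

# Fraction of simulated people shorter than 67.5 inches
frac = float(np.mean(heights < 67.5))
print(round(frac, 3))  # should be near 0.841
```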

#### Quiz #3

The height of people is Gaussian with mean 65 inches and standard deviation 2.5 inches.  
What fraction of people are shorter than 70?

##### Solution

In [11]:
mu = 65
std = 2.5
# Find: P (x <= 70)

In [12]:
mu - 2 * std, mu + 2 * std

(60.0, 70.0)

In [13]:
# 50% of the data lies below the median (= mean)
0.5 + (0.95 / 2)

0.975

or

In [14]:
z_score = 2
stats.norm.cdf(z_score).round(4).item()

0.9772
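
A short sketch comparing the Empirical Rule estimates from Quiz #1 to #3 with the exact CDF values. The z-ranges are taken from the solutions above, and `float("-inf")` stands for "no lower bound".

```python
from scipy import stats

# (question, Empirical Rule estimate, z-range) for Quiz #1 to #3 above
cases = [
    ("60 <= x <= 72.5", 0.95 / 2 + 0.997 / 2, (-2, 3)),
    ("x <= 67.5", 0.5 + 0.68 / 2, (float("-inf"), 1)),
    ("x <= 70", 0.5 + 0.95 / 2, (float("-inf"), 2)),
]
results = []
for label, rule, (z_lo, z_hi) in cases:
    exact = float(stats.norm.cdf(z_hi) - stats.norm.cdf(z_lo))
    results.append((label, rule, exact))
    print(f"{label:<16} rule={rule:.4f}  exact={exact:.4f}")
```

In each case the rule slightly underestimates, because 68 / 95 / 99.7 are rounded down from 68.27 / 95.45 / 99.73.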

## 2 Z-Score

### 2.1 What is a Z-Score?

#### Definition

A Z-score (or standard score) measures how many standard deviations $\sigma$ a data point $x$ is above or below the mean $\mu$ of a distribution.

$x = \mu + (z \cdot \sigma)$

$
\begin{align}
P\bigl((\mu - 1\sigma) < x < (\mu + 1\sigma)\bigr) \approx 0.68 \text{, when } z = 1\\
P\bigl((\mu - 2\sigma) < x < (\mu + 2\sigma)\bigr) \approx 0.95 \text{, when } z = 2\\
P\bigl((\mu - 3\sigma) < x < (\mu + 3\sigma)\bigr) \approx 0.997 \text{, when } z = 3
\end{align}
$

#### Formula

$
\begin{align}
\large
\text{Z-Score} = \frac{(x - \mu)}{\sigma}
\end{align}
$

1. A z-score can be zero, positive, or negative.  
1. A **z-score of 0** means the data point is **equal to the mean**.  
1. A **positive z-score** means the data point lies **to the right of (above) the mean**.  
1. A **negative z-score** means the data point lies **to the left of (below) the mean**.
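
The formula and its inverse $x = \mu + z\sigma$ can be sketched as two small vectorized functions. The names `to_z` and `from_z` are hypothetical helpers for illustration, not SciPy functions; the example heights give one negative, one zero, and one positive z-score.

```python
import numpy as np

def to_z(x, mu, sigma):
    """Hypothetical helper: how many SDs x lies above (+) or below (-) the mean."""
    return (np.asarray(x, dtype=float) - mu) / sigma

def from_z(z, mu, sigma):
    """Hypothetical helper: inverse transform, x = mu + z * sigma."""
    return mu + np.asarray(z, dtype=float) * sigma

heights = [60, 65, 67.5]
z = to_z(heights, mu=65, sigma=2.5)     # negative, zero, positive
back = from_z(z, mu=65, sigma=2.5)      # recovers the original heights
print(z, back)
```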

### 2.2 Examples

#### Quiz #1

The height of people is Gaussian with mean 65 inches and standard deviation 2.5 inches.  
What fraction of people are shorter than 69.1?

##### Solution

In [15]:
mu = 65
std = 2.5
# Find: P(x <= 69.1)
x = 69.1

In [16]:
z_score = (x - mu) / std
z_score

1.6399999999999977

In [17]:
stats.norm.cdf(z_score).round(4).item()

0.9495

#### Quiz #2

The height of people is Gaussian with mean 65 inches and standard deviation 2.5 inches.  
What fraction of people are shorter than 67.5?

##### Solution

In [18]:
mu = 65
std = 2.5
# Find: P(x < 67.5)
x = 67.5

In [19]:
z_score = (x - mu) / std
z_score

1.0

In [20]:
stats.norm.cdf(z_score).round(4).item()

0.8413

#### Quiz #3

Balls produced by the manufacturer have a mean diameter of 50 mm and a standard deviation of 2 mm.  
What fraction of balls have a diameter smaller than 53 mm?

##### Solution

In [21]:
mu = 50
std = 2
# Find: P(x < 53)
x = 53

In [22]:
z_score = (x - mu) / std
stats.norm.cdf(z_score).round(4).item()

0.9332
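
When several questions use the same distribution, SciPy lets you "freeze" it once with `stats.norm(loc=..., scale=...)` and then call methods on the frozen object. A minimal sketch; the $P(47 < d < 53)$ line is an extra illustration, not part of the quiz.

```python
from scipy import stats

balls = stats.norm(loc=50, scale=2)  # frozen distribution: parameters fixed once

print(round(float(balls.cdf(53)), 4))                  # P(d < 53) = 0.9332
print(round(float(balls.cdf(53) - balls.cdf(47)), 4))  # P(47 < d < 53) = 0.8664
print(balls.mean(), balls.std())                       # 50.0 2.0
```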

## 3 Percent Point Function `ppf`

### 3.1 What is the Percent Point Function `ppf`?

The percent point function (PPF) is the inverse of the cumulative distribution function (CDF).

#### `cdf` vs `ppf`

* Use `ppf()` to get Z-Score from Percentile.
* Use `cdf()` to get Percentile from Z-Score.

> **Note**:
>
> For the standard Normal, `stats.norm.cdf(z)` returns the probability $P(Z \le z)$.  
> `stats.norm.ppf(p)` returns the z-score for which $P(Z \le z) = p$.  
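
Because `ppf` is the inverse of `cdf`, applying one after the other returns the starting value. A minimal round-trip sketch (z = 1.5 is an arbitrary example):

```python
from scipy import stats

z = 1.5
p = stats.norm.cdf(z)       # z-score -> probability
z_back = stats.norm.ppf(p)  # probability -> z-score
print(round(float(p), 4), round(float(z_back), 4))  # 0.9332 1.5
```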

### 3.2 Examples

#### Quiz #1

The height of people follows a Gaussian distribution with mean 65 inches and standard deviation 2.5 inches.  
One person says, "96% of people are shorter than me." What is that person's height?

##### Solution

In [23]:
mu = 65
std = 2.5
prob = 0.96

In [24]:
z_score = stats.norm.ppf(prob).round(2)
print(f"z-score for the probability of {prob} is {z_score}")

z-score for the probability of 0.96 is 1.75


In [25]:
height = mu + (z_score * std)
print("Person's Height is:", height)

Person's Height is: 69.375
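
The solution above rounds z to two decimals before converting back to inches. Passing `loc` and `scale` to `ppf` gives the height directly, without that rounding step; a minimal sketch:

```python
from scipy import stats

# Height at the 96th percentile, without rounding z to 1.75 first
height = stats.norm.ppf(0.96, loc=65, scale=2.5)
print(round(float(height), 2))  # 69.38 (vs 69.375 with z rounded to 1.75)
```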


#### Quiz #2

A skater's 500-meter time has a mean of 7.42 seconds and a standard deviation of 0.34 seconds.  
What speed must a skater reach to be faster than 95% of competitors?

##### Solution

In [26]:
mu = 7.42
std = 0.34
distance = 500
prob = round(1 - 0.95, 2)  # Fastest 5% = lowest 5% of times

In [27]:
z_score = stats.norm.ppf(prob).round(2).item()
print(f"z-score value for probability {prob} is {z_score}.")

z-score value for probability 0.05 is -1.64.


In [28]:
time = mu + (z_score * std)
time = round(time, 2)
print(f"Time at z-score {z_score} is {time}")

Time at z-score -1.64 is 6.86


In [29]:
speed = distance / time
speed = round(speed, 2)
print(f"Speed must be {speed} m/s to be in the top 5%")

Speed must be 72.89 m/s to be in the top 5%
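
A faster skater has a *lower* time, so the cutoff is the time that 95% of skaters are *above*. SciPy's inverse survival function `isf(q)` returns the x with $P(X > x) = q$, which equals `ppf(1 - q)`. A minimal sketch that also skips rounding the intermediate time, so the speed comes out near 72.88 m/s rather than 72.89:

```python
from scipy import stats

mu, std, distance = 7.42, 0.34, 500  # seconds, seconds, meters

# Time that 95% of skaters are slower than: P(T > t_cut) = 0.95
t_cut = stats.norm.isf(0.95, loc=mu, scale=std)

speed = distance / t_cut  # meters per second
print(round(float(t_cut), 2), round(float(speed), 2))
```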


#### Quiz #3

A retail outlet sells about 1000 units of toothpaste a week, with a standard deviation of 200. Assume weekly sales are normally distributed.  
If the outlet holds 1300 units in inventory:

1. What is the probability that stock will need to be replenished (weekly sales exceed 1300)?
2. How much inventory is needed so that there is only a 3% chance of running out of stock mid-week?

##### Solution

PART 1:

In [30]:
mu = 1000
std = 200

# Stock must be replenished if weekly sales exceed 1300.
# Hence find: P(x > 1300)
# P(x > 1300) = 1 - P(x < 1300)
x = 1300

In [31]:
z_score = (x - mu) / std
z_score

1.5

In [32]:
prob_x_gt_1300 = 1 - stats.norm.cdf(z_score)
print(round(prob_x_gt_1300, 4))

0.0668
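
SciPy also provides the survival function `sf(x)`, defined as `1 - cdf(x)`, i.e. $P(X > x)$. A minimal sketch of Part 1 with it:

```python
from scipy import stats

# Survival function: sf(x) = 1 - cdf(x) = P(X > x)
p_stockout = stats.norm.sf(1300, loc=1000, scale=200)
print(round(float(p_stockout), 4))  # 0.0668
```

For far-tail probabilities, `sf` can be more accurate than `1 - cdf`, since subtracting a CDF value very close to 1 loses precision.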


PART 2:

In [33]:
# Target: P(sales <= stock) = 0.97, i.e. a 3% chance of running out
prob = 1 - 0.03

In [34]:
z_score = stats.norm.ppf(prob).round(2)
x = mu + (z_score * std)
print(x)

1376.0
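
Inventory comes in whole units, and rounding z to 1.88 lands exactly on 1376, which leaves a stockout probability of about 3.005%, marginally above the target. A minimal sketch that uses the unrounded percentile and rounds *up* so the 3% target holds:

```python
import math
from scipy import stats

mu, std = 1000, 200
exact = stats.norm.ppf(0.97, loc=mu, scale=std)  # about 1376.16 units
stock = math.ceil(exact)                         # whole units, rounded up

# Stockout probability at this stock level (at or below 3%)
print(stock, round(float(stats.norm.sf(stock, loc=mu, scale=std)), 4))
```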


## 4 Standard Normal Distribution

### 4.1 What is the Standard Normal Distribution?

#### Definition

A Normal distribution with mean 0 and standard deviation 1 is called the Standard Normal Distribution.

Any Normal distribution can be converted to the Standard Normal Distribution by standardization: subtract the mean from each value and divide by the standard deviation.

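#### Formula

$
\begin{align}
\large
Z = \frac{(X - \mu)}{\sigma}
\end{align}
$

If $X$ is Normal with mean $\mu$ and standard deviation $\sigma$, then $Z$ is Standard Normal (mean 0, standard deviation 1).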
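
A minimal sketch of standardization applied to a small hypothetical sample of heights. Here the sample's own mean and population SD (`ddof=0`) are used; for a known Normal distribution you would use its $\mu$ and $\sigma$ instead. `scipy.stats.zscore` performs the same computation (its default is `ddof=0`).

```python
import numpy as np
from scipy import stats

heights = np.array([60.0, 62.5, 65.0, 67.5, 70.0])  # hypothetical sample

# Standardize: subtract the mean, divide by the SD (ddof=0)
z_manual = (heights - heights.mean()) / heights.std()
z_scipy = stats.zscore(heights)

print(np.allclose(z_manual, z_scipy))  # True
print(z_scipy.mean(), z_scipy.std())   # approximately 0 and 1
```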

## 5 Appendix

```
  P(a < x < b) 
= P(x < b) - P(x < a)
= cdf(b) - cdf(a)

  P(x < a) 
= cdf(a)

  P(x > a) 
= 1 - P(x < a)
= 1 - cdf(a)
```
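
The three identities above can be wrapped as small functions that accept any mean and standard deviation. The names `p_less`, `p_greater`, and `p_between` are hypothetical helpers for illustration; the example calls reproduce two answers from earlier sections.

```python
from scipy import stats

def p_less(a, mu=0.0, sigma=1.0):
    """Hypothetical helper: P(x < a) = cdf(a)."""
    return stats.norm.cdf(a, loc=mu, scale=sigma)

def p_greater(a, mu=0.0, sigma=1.0):
    """Hypothetical helper: P(x > a) = 1 - cdf(a)."""
    return 1 - stats.norm.cdf(a, loc=mu, scale=sigma)

def p_between(a, b, mu=0.0, sigma=1.0):
    """Hypothetical helper: P(a < x < b) = cdf(b) - cdf(a)."""
    return p_less(b, mu, sigma) - p_less(a, mu, sigma)

print(round(float(p_between(60, 72.5, mu=65, sigma=2.5)), 4))  # 0.9759 (Section 1.3, Quiz #1)
print(round(float(p_greater(1300, mu=1000, sigma=200)), 4))    # 0.0668 (Section 3.2, Quiz #3)
```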