## 1 Advanced Probability Distributions

1. Log Normal Distribution
2. Poisson Distribution
3. Exponential Distribution
4. Geometric Distribution

## 2 Log Normal Distribution

## 3 Poisson Distribution

### 3.1 What is the Poisson Distribution?

#### Definition

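1. The Poisson distribution models the **number of events occurring in a fixed interval of time**, assuming events occur independently at a constant average rate.
2. The Poisson distribution is a discrete distribution: the count $k$ takes the values $0, 1, 2, \ldots$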

#### Variables

1. Fixed interval: the window over which events are counted
2. Average rate ($\lambda$): the expected number of events per interval

#### Formula

$
\begin{align}
\large
P(x = k) = \frac{\lambda^k \cdot e^{-\lambda}}{k!}
\end{align}
$
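The formula can be evaluated directly with the standard library. The following is a minimal sketch; the helper name `poisson_pmf` is an assumption made for this illustration and is not a SciPy function.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k) for a Poisson variable with average rate lam."""
    return (lam**k) * math.exp(-lam) / math.factorial(k)


# The probabilities over all counts sum to 1; 100 terms are plenty for lam = 2.
total = sum(poisson_pmf(k, 2) for k in range(100))
print(round(poisson_pmf(0, 2), 4), round(total, 4))
```

Using `math.exp` rather than a rounded constant for $e$ keeps the result accurate, since small errors in the base are amplified by the exponent $-\lambda$.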

#### Expected Value

$
\large
\begin{align}
E[X] = \lambda
\end{align}
$

#### Properties

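1. The variance equals the mean, $\mathrm{Var}(X) = \lambda$, so the spread of the distribution grows as $\lambda$ increases.
2. As $\lambda$ increases, the distribution becomes more symmetric and approaches a normal distribution with mean $\lambda$ and variance $\lambda$.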
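Both properties can be checked numerically. The sketch below sums the probability mass function over a truncated range to estimate the mean, variance, and skewness for a few rates; the helper names and the cutoff `k_max = 400` are assumptions chosen for this illustration. A skewness that shrinks toward 0 is one sign of the distribution becoming more symmetric and bell-shaped.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    # Work in log space so that large k and lam do not overflow.
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))


def moments(lam: float, k_max: int = 400):
    """Return (mean, variance, skewness) by summing over k = 0..k_max."""
    ks = range(k_max + 1)
    probs = [poisson_pmf(k, lam) for k in ks]
    mean = sum(k * p for k, p in zip(ks, probs))
    var = sum((k - mean) ** 2 * p for k, p in zip(ks, probs))
    skew = sum((k - mean) ** 3 * p for k, p in zip(ks, probs)) / var**1.5
    return mean, var, skew


for lam in (1, 10, 100):
    mean, var, skew = moments(lam)
    print(lam, round(mean, 3), round(var, 3), round(skew, 3))
```

For each rate the mean and variance both come out equal to $\lambda$, and the skewness falls as $1 / \sqrt{\lambda}$.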

### 3.2 Examples

In [1]:
import math
from fractions import Fraction as F
import numpy as np
from scipy import stats

#### Quiz #1

On average, 10 cars pass through a toll booth per hour:

1. What is the probability of exactly 25 vehicles passing through the toll booth in the next hour?
2. What is the probability of more than 40 vehicles passing through the toll booth in the next hour?

##### Solution

In [2]:
mu = 10
k = 25
e = math.e  # full-precision constant; the rounded value 2.71 skews the result by about 3%

In [3]:
round(((mu**k) * (e ** (-mu))) / math.factorial(k), 7)

2.93e-05

In [4]:
# Find: P(x = 25)
x = 25

In [5]:
p_x_25 = stats.poisson.pmf(k=x, mu=mu)
p_x_25.round(4).item()

0.0

In [6]:
# Find: P(x > 40)
# 1 - P(x <= 40)
x = 40

In [7]:
p_x_gt_40 = 1 - stats.poisson.cdf(k=x, mu=mu)
p_x_gt_40.round(4).item()

0.0

What is the probability of exactly 20 errors in the log file of a particular server in the next 10 hours?

> **Note**:
>
> The question does not specify a rate, so we assume an arbitrary rate of 3 errors per hour.

In [8]:
#  1hr -> 3
# 10hr -> ?

mu = 10 * 3
k = 20
e = math.e

In [9]:
p_x_20 = ((mu**k) * (e ** (-mu))) / math.factorial(k)
round(p_x_20, 4)

0.0134

#### Quiz #2

Suppose a particular hospital experiences an average of 2 births per hour.  
We can use the formula above to determine the probability of 0, 1, 2, or 3 births in a given hour:

##### Solution

In [10]:
mu = 2  # Birth rate
# k = 0, 1, 2, 3

In [11]:
p_x_0 = stats.poisson.pmf(k=0, mu=mu)
p_x_0.round(4).item()

0.1353

In [12]:
p_x_1 = stats.poisson.pmf(k=1, mu=mu)
p_x_1.round(4).item()

0.2707

In [13]:
p_x_2 = stats.poisson.pmf(k=2, mu=mu)
p_x_2.round(4).item()

0.2707

In [14]:
p_x_3 = stats.poisson.pmf(k=3, mu=mu)
p_x_3.round(4).item()

0.1804

#### Quiz #3

Suppose we have collected data on every football match ever played and want to analyze the distribution of goals.  
We observe an average of 2.5 goals per 90-minute match, so the rate is 2.5 goals per match (λ = 2.5).

1. What is the probability of exactly 1 goal in the last 30 minutes?

##### Solution

In [15]:
# 90min -> 2.5
# 30min -> ?
mu = 30 * 2.5 / 90

# Find: P(x = 1)
x = 1

In [16]:
p_x_1 = stats.poisson.pmf(k=x, mu=mu)
p_x_1.round(4).item()

0.3622

#### Quiz #4

A shop is open for 8 hours, during which it receives 74 customers on average. Assume the number of customers is Poisson distributed.

1. What is the probability that in 2 hours, there will be at most 15 customers?
2. What is the probability that in 2 hours, there will be at least 7 customers?

##### Solution

In [17]:
# 8hrs -> 74
# 2hrs -> ?
mu = 2 * 74 / 8

# Find: P(x <= 15)
x = 15

In [18]:
p_x_le_15 = stats.poisson.cdf(k=x, mu=mu)
p_x_le_15.round(4).item()

0.249

In [19]:
# Find: P(x >= 7)
# 1 - P(x <= 6)
x = 6

In [20]:
p_x_ge_7 = 1 - stats.poisson.cdf(k=x, mu=mu)
p_x_ge_7.round(4).item()

0.9993

#### Quiz #5

It is known that a certain website makes 10 sales per hour on average.  
In a given hour, what is the probability that the site makes exactly 8 sales?

##### Solution

In [21]:
mu = 10
k = 8

In [22]:
stats.poisson.pmf(mu=mu, k=k).round(4).item()

0.1126

### 3.3 Poisson approximation to Binomial

#### Conditions for Poisson approximation to Binomial

The Binomial distribution converges to the Poisson distribution as the number of trials $n$ goes to infinity while the product $n \cdot p$ converges to a finite limit.   
Therefore, the Poisson distribution with parameter $\lambda = n \cdot p$ can be used as an approximation to the Binomial distribution $B(n, p)$   
**if $n$ is sufficiently large and $p$ is sufficiently small**.

For a reasonable approximation:  

- if $n \geq 20$ and $p \leq 0.05$ such that $n \cdot p \leq 1$
- or if $n > 50$ and $p < 0.1$ such that $n \cdot p < 5$
- or if $n \geq 100$ and $n \cdot p \leq 10$.

If the above conditions are met, we can use the Poisson distribution to estimate the probabilities of different event counts.  
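To get a feel for how well the approximation works, the sketch below compares the exact Binomial probabilities with the Poisson probabilities for $\lambda = n \cdot p$ and reports the largest pointwise gap. The helper names and the chosen $(n, p)$ pairs are assumptions made for this illustration.

```python
import math


def binom_pmf(k: int, n: int, p: float) -> float:
    """Exact Binomial probability P(X = k) for n trials."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def poisson_pmf(k: int, lam: float) -> float:
    """Poisson probability P(X = k), computed in log space to avoid overflow."""
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))


def max_abs_error(n: int, p: float) -> float:
    """Largest pointwise gap between B(n, p) and Poisson(n * p)."""
    lam = n * p
    return max(abs(binom_pmf(k, n, p) - poisson_pmf(k, lam)) for k in range(n + 1))


# (80, 0.015) and (100, 0.05) satisfy the rules of thumb; (20, 0.5) does not.
for n, p in [(80, 0.015), (100, 0.05), (20, 0.5)]:
    print(f"n={n}, p={p}: max error = {max_abs_error(n, p):.4f}")
```

The gap stays small when $n$ is large and $p$ is small, and grows noticeably for the pair that violates the conditions.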

### 3.4 Examples

#### Quiz #1

There are 80 students in a kindergarten class.  
Each one of them has a 0.015 probability of forgetting their lunch on any given day.

1. What is the average or expected number of students in the class who forget their lunch?  
2. What is the probability that exactly 3 of them will forget their lunch today?

##### Solution

In [23]:
n = 80
p = 0.015
mu = n * p  # lambda = n * p = 1.2

# Find: P(x = 3)
x = 3

In [24]:
exp = n * p
exp

1.2

In [25]:
stats.poisson.pmf(mu=mu, k=x).round(4).item()

0.0867

This is the Poisson approximation to the Binomial distribution.

**In the context of our problem**, $n = 80$ and $p = 0.015$, so $n \cdot p = 1.2$. The rule $n > 50$ and $p < 0.1$ with $n \cdot p < 5$ is satisfied.  
We can therefore use the Poisson distribution with $\lambda = 80 \cdot 0.015 = 1.2$ as an approximation to the Binomial distribution. The exact Binomial probability is computed below for comparison.

In [26]:
n = 80
p = 0.015

In [27]:
stats.binom.pmf(k=x, n=n, p=p).round(4).item()

0.0866

#### Quiz #2

You receive 240 messages per hour on average. Assume the number of messages is Poisson distributed.  
This corresponds to a rate of 1/15 messages per second.  

What is the probability of receiving no messages in 10 seconds?

##### Solution

Verify rate:

In [28]:
# 3600 -> 240
#    1 -> ?
print(F(240, 3600))
mu = 1 / 15

1/15


In [29]:
#  1 -> 1/15
# 10 -> ?
mu = 10 * (1 / 15)

# Find: P(x = 0)
x = 0

In [30]:
p_x_0 = stats.poisson.pmf(mu=mu, k=x)
p_x_0.round(4).item()

0.5134

## 4 Exponential Distribution

### Definition

1. Exponential distribution models the **time between consecutive events**.
2. Exponential distribution is a Continuous Distribution.

### Formula

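Cumulative distribution function (CDF), the probability that the waiting time $T$ is at most $x$:

$
\begin{align}
\large
P(T \le x) = 1 - e^{-\lambda x}
\end{align}
$

Probability density function (PDF), for $x \ge 0$:

$
\begin{align}
\large
f(x; \lambda) = \lambda \cdot e^{-\lambda x}
\end{align}
$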

#### Scale

$
\begin{align}
\beta = \frac{1}{\lambda}
\end{align}
$

#### Rate

$
\begin{align}
\lambda = \frac{1}{\beta}
\end{align}
$

#### Expected Value

$
\large
\begin{align}
E[X] = \beta = \frac{1}{\lambda}
\end{align}
$
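The relationship between rate, scale, and expected value can be checked with a short numerical sketch. The helper names, the step size `dx`, and the integration cutoff of 200 are assumptions chosen for this illustration; the mean is estimated with a simple Riemann sum of $x \cdot f(x)$.

```python
import math


def expon_cdf(x: float, lam: float) -> float:
    """P(T <= x) for an exponential variable with rate lam."""
    return 1 - math.exp(-lam * x) if x >= 0 else 0.0


def expon_pdf(x: float, lam: float) -> float:
    """Density of an exponential variable with rate lam."""
    return lam * math.exp(-lam * x) if x >= 0 else 0.0


lam = 1 / 5     # rate: events per unit of time
beta = 1 / lam  # scale: mean waiting time between events

# Riemann-sum estimate of E[X], the integral of x * f(x) over [0, 200).
dx = 0.001
mean_estimate = sum(i * dx * expon_pdf(i * dx, lam) * dx for i in range(200_000))
print(round(beta, 4), round(mean_estimate, 3))
```

The estimate agrees with $\beta = 1 / \lambda$. Note that `scipy.stats.expon` takes the scale $\beta$ rather than the rate $\lambda$, which is why the solutions in this section compute `scale = 1 / mu` first.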

### Examples

#### Quiz #1

You receive 240 messages per hour on average. Assume the number of messages is Poisson distributed.  
This corresponds to a rate of 1/15 messages per second.

What is the probability of waiting more than 10 seconds for the next message?

##### Solution

In [31]:
mu = 1 / 15  # rate (lambda): messages per second
scale = 1 / mu  # scipy's expon is parameterized by scale = 1 / lambda

# Find: P(x > 10)
# 1 - P(x <= 10)
x = 10

In [32]:
p_x_gt_10 = 1 - stats.expon.cdf(x=x, scale=scale)
p_x_gt_10.round(4).item()

0.5134
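This result equals the Poisson answer for "no message in 10 seconds" from the earlier quiz, because waiting more than 10 seconds for the next message is the same event as seeing zero messages in those 10 seconds. A minimal check using only the standard library (the variable names are illustrative):

```python
import math

rate = 1 / 15  # messages per second
t = 10         # seconds

# Poisson view: P(X = 0) with mean rate * t, since lam**0 / 0! = 1.
p_no_events = math.exp(-rate * t)

# Exponential view: P(T > t) = 1 - CDF(t).
p_wait_longer = 1 - (1 - math.exp(-rate * t))

print(round(p_no_events, 4), round(p_wait_longer, 4))
```

Both expressions reduce to $e^{-\lambda t}$, which is the link between the Poisson and exponential distributions.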

#### Quiz #2

You are working as a data engineer who has to resolve any bugs/failures of machine learning models in production.  
The time taken to debug is exponentially distributed with mean of 5 minutes.  

Find the probability that debugging takes between 4 and 5 minutes.

##### Solution

First convert the mean debugging time into a rate: one bug per 5 minutes is $\lambda = 1/5$ bugs per minute.

In [33]:
# 5 mins -> 1 bug
# 1 min  -> ?
mu = 1 / 5
scale = 1 / mu

# Find: P(4 <= x <= 5) = P(x <= 5) - P(x <= 4)
# x = 4, 5

In [34]:
x1, x2 = 4, 5
p_x_4 = stats.expon.cdf(x=x1, scale=scale)
p_x_5 = stats.expon.cdf(x=x2, scale=scale)

(p_x_5 - p_x_4).round(4).item()

0.0814

Find the probability of needing more than 6 minutes to debug.

##### Solution

In [35]:
# Find: P(x > 6)
# 1 - P(x <= 6)
x = 6

In [36]:
p_x_gt_6 = 1 - stats.expon.cdf(x=x, scale=scale)
p_x_gt_6.round(4).item()

0.3012

### Memoryless property of the exponential distribution

The memoryless property means that the time already spent waiting for an event does not affect the remaining waiting time.  
For the exponential distribution, the probability of needing a given amount of additional time is the same regardless of how much time has already passed.

For example,  

Given that you have already spent 3 minutes, what is the probability of needing more than 9 minutes in total?

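$
\begin{align}
P(x > 9 \mid x > 3) = \frac{P(x > 9 \; \cap \; x > 3)}{P(x > 3)}
\end{align}
$

Since $x > 9$ implies $x > 3$, the intersection is just the event $x > 9$:

$
\begin{align}
P(x > 9 \mid x > 3) = \frac{P(x > 9)}{P(x > 3)} = \frac{e^{-9\lambda}}{e^{-3\lambda}} = e^{-6\lambda} = P(x > 6)
\end{align}
$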
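The conditional probability above can be checked numerically for several elapsed times. This is a small sketch with an illustrative helper name `survival`; it assumes the rate $\lambda = 1/5$ used in the debugging example.

```python
import math


def survival(x: float, lam: float) -> float:
    """P(X > x) for an exponential variable with rate lam."""
    return math.exp(-lam * x)


lam = 1 / 5
t = 6  # additional minutes needed

# s is the time already spent; the conditional probability should not depend on it.
for s in (0.5, 3, 10):
    conditional = survival(s + t, lam) / survival(s, lam)
    print(s, round(conditional, 4), round(survival(t, lam), 4))
```

Every row shows the same value in both columns: the probability of needing 6 more minutes does not depend on the time already spent.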

#### Quiz #1

You are working as a data engineer who has to resolve any bugs/failures of machine learning models in production.  
The time taken to debug is exponentially distributed with mean of 5 minutes.  

Given that you have already spent 3 minutes, what is the probability of needing more than 9 minutes in total?

##### Solution

In [37]:
# 5mins -> 1 bug
# 1mins -> ?
mu = 1 / 5
scale = 1 / mu

# Find P(x > 9), P(x > 3)
# 1 - P(x < 9), 1 - P(x < 3)
# x = 9, 3

In [38]:
x1, x2 = 9, 3
p_x_gt_9 = 1 - stats.expon.cdf(x=x1, scale=scale)
p_x_gt_3 = 1 - stats.expon.cdf(x=x2, scale=scale)

(p_x_gt_9 / p_x_gt_3).round(4).item()

0.3012

## 5 Geometric Distribution

### 5.1 What is Geometric Distribution?

#### Definition

1. The Geometric distribution models the **number of independent trials needed to achieve the first success** in a series of Bernoulli trials, where each trial has a constant probability of success $p$.
2. The Geometric distribution is a **discrete probability distribution**: $k$ takes the values $1, 2, 3, \ldots$

#### Formula

$
\begin{align}
\large
P(x = k) = (1 - p)^{k - 1} \cdot p
\end{align}
$

#### Expected value

$
\begin{align}
\large
E[X] = \frac{1}{p}, \quad p = \text{probability of success}
\end{align}
$
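The expected value can be sanity-checked in two ways: by summing $k \cdot P(x = k)$ over a truncated range, and by simulating repeated Bernoulli trials. The sketch below uses only the standard library; the helper names, the seed, the sample size, and the value $p = 0.7$ are assumptions chosen for this illustration.

```python
import random


def geom_pmf(k: int, p: float) -> float:
    """P(X = k): k - 1 failures followed by the first success."""
    return (1 - p) ** (k - 1) * p


def trials_until_success(p: float, rng: random.Random) -> int:
    """Simulate Bernoulli trials and count them up to the first success."""
    trials = 1
    while rng.random() >= p:  # a draw below p counts as a success
        trials += 1
    return trials


p = 0.7
rng = random.Random(0)

# Series estimate: terms beyond k = 200 are negligible for p = 0.7.
series_mean = sum(k * geom_pmf(k, p) for k in range(1, 200))

# Simulation estimate from 100,000 independent experiments.
samples = [trials_until_success(p, rng) for _ in range(100_000)]
sample_mean = sum(samples) / len(samples)

print(round(1 / p, 4), round(series_mean, 4), round(sample_mean, 2))
```

Both estimates land close to $1 / p$, with the simulated mean differing only by sampling noise.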

### 5.2 Examples

#### Quiz #1

You flip a fair coin repeatedly until you get heads for the first time.  
On average, how many flips does it take to get the first heads, counting the flip that lands heads?

##### Solution

In [39]:
p = 0.5
# Find: E[X]

In [40]:
exp = 1 / p
exp

2.0

#### Quiz #2

Suppose you are playing a game where the probability of winning a prize on each attempt is 0.7.  

1. What is the probability of winning a prize on the 4th attempt?
2. What is the probability that you don't win in the first 2 attempts?
3. What is the expected number of trials to get the first success?

##### Solution

In [41]:
p = 0.7

In [42]:
# Find: P(x = 4)
x = 4

In [43]:
p_x_4 = stats.geom.pmf(k=x, p=p)
p_x_4.round(4).item()

0.0189

In [44]:
# Find: P(x > 2), i.e. no win in the first 2 attempts
# 1 - P(x <= 2)
x = 2

In [45]:
p_x_gt_2 = 1 - stats.geom.cdf(k=x, p=p)
p_x_gt_2.round(4).item()

0.09

In [46]:
exp = 1 / p
round(exp, 4)

1.4286