In [1]:
#renders matplotlib plots in the notebook
%matplotlib notebook 

#enables tab autocomplete feature
%config IPCompleter.greedy=True 

import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm

## Random Variable
A random variable is a function that maps the sample space S into a subset of the real line.

$X$ - random variable: a known, deterministic function that maps each outcome $s_i$ to a number $x_i$ <br/>

$X :: S \rightarrow R$, $s_i \in S, x_i \in R$

A discrete random variable is one that takes on a finite or countably infinite number of values.

$S$ - Original sample space <br/>
$S_X$ - new sample space

### Probability Mass Function

$p_X[x_i] = P[X(s) = x_i]$

The probability mass function maps the real numbers $x_i$ to their probabilities.

$P :: S \rightarrow [0,1]$

$X :: S \rightarrow Z$

$p_X :: Z \rightarrow [0,1]$

In summary, the probability mass function gives, for each possible $x_i$, the probability that the random variable $X$ takes on the value $x_i$.

**Convention**: $X(s)$ is represented as just $X$

![random variable and PMF](files/random_variable_and_PMF.jpg)
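
To make the mapping from outcomes to probabilities concrete, here is a minimal sketch. The experiment (two fair coin tosses) and the names `sample_space`, `prob`, `X` and `pmf` are illustrative assumptions, not taken from the book.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

# Sample space S: all outcomes of two fair coin tosses, each equally likely.
sample_space = list(product("HT", repeat=2))
prob = {s: Fraction(1, len(sample_space)) for s in sample_space}


def X(s):
    """Random variable: number of heads in outcome s."""
    return s.count("H")


# p_X[x_i] = P[X(s) = x_i]: add up P[s] over every outcome s that X maps to x_i.
pmf = defaultdict(Fraction)
for s, p_s in prob.items():
    pmf[X(s)] += p_s

for x in sorted(pmf):
    print(x, pmf[x])
```

Exact fractions are used so that the probabilities can be compared without rounding error.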

### Bernoulli PMF

A Bernoulli experiment has only two outcomes: success and failure.

$X  \sim  Ber(p)$ means $X$ is distributed according to the Bernoulli probability mass function, where the probability of success is $p$.
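
A minimal sketch of the Bernoulli PMF, assuming the usual convention that success is coded as 1 and failure as 0 (the function name `bernoulli_pmf` is a hypothetical choice):

```python
def bernoulli_pmf(k, p):
    """p_X[k] for X ~ Ber(p): P[X = 1] = p (success), P[X = 0] = 1 - p (failure)."""
    if k == 1:
        return p
    if k == 0:
        return 1 - p
    return 0.0  # any other value has probability zero


p_success = 0.3
print(bernoulli_pmf(1, p_success), bernoulli_pmf(0, p_success))
```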

### Binomial PMF

$bin(M,k) = \left ( {M \atop k} \right )p^k(1-p)^{M-k}$

Since $bin(M,k)$ is a PMF, does $\sum_{k=0}^{M} bin(M,k) = 1$?

In [12]:
from utils import utils

M = 50
p = 0.5

K = np.arange(0,M+1)
S = np.sum([utils.binomial_probability(M, k, p) for k in K])
print(S)
assert np.abs(S - 1) < 1e-6

1.0
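
The helper `utils.binomial_probability` lives in a local module that is not shown here. A minimal stand-in using only the standard library, assuming the helper implements the $bin(M,k)$ formula given earlier, might look like this (`binomial_pmf` is a hypothetical name):

```python
from math import comb


def binomial_pmf(M, k, p):
    """Probability of exactly k successes in M independent trials."""
    return comb(M, k) * p**k * (1 - p) ** (M - k)


M, p = 50, 0.5
total = sum(binomial_pmf(M, k, p) for k in range(M + 1))
print(abs(total - 1) < 1e-9)
```

The sum equals 1 by the binomial theorem, since $\sum_k \binom{M}{k} p^k (1-p)^{M-k} = (p + (1-p))^M$.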


### Geometric PMF
page 113 (5.7)

The probability that the first success occurs at trial $k$, i.e. after $k-1$ failures:

$p_X[k] = (1-p)^{k-1}p, \quad k = 1, 2, \ldots$
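
As a sanity check, here is a short sketch of the geometric PMF in its standard form, $(1-p)^{k-1}p$ for $k = 1, 2, \ldots$ ($k-1$ failures followed by one success). The function name and the value `p_geo = 0.25` are illustrative choices, not from the book.

```python
def geometric_pmf(k, p):
    """P[first success occurs on trial k] = (1 - p)**(k - 1) * p, for k = 1, 2, ..."""
    return (1 - p) ** (k - 1) * p


p_geo = 0.25
# The probabilities over k = 1, 2, ... form a geometric series that sums to 1.
partial_sum = sum(geometric_pmf(k, p_geo) for k in range(1, 200))
print(geometric_pmf(1, p_geo), abs(partial_sum - 1) < 1e-9)
```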

### Poisson PMF
page 113 (5.8)

$p_X[k] = e^{-\lambda} \frac{\lambda^k}{k!}, \quad k = 0, 1, 2, \ldots$


## Random Variable Transformation

![random variable transformation](files/transformed_random_variable.jpg)


If the transformation is one-to-one, finding the new PMF is straightforward: each element of the transformed set has the same probability as its preimage in the original set.

If the transformation is many-to-one, then $p_Y[y_i]$ is the sum of the PMF values of all $x_j$ that map to $y_i$:

$$p_Y[y_i] = \sum_{\{j : g(x_j) = y_i\}} p_X[x_j]$$
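
The many-to-one case can be sketched in a few lines. The example below assumes a made-up PMF (uniform on $\{-2, \ldots, 2\}$) and the transformation $g(x) = x^2$. Neither is from the book.

```python
from collections import defaultdict
from fractions import Fraction

# Hypothetical PMF of X: uniform on {-2, -1, 0, 1, 2}.
p_X = {x: Fraction(1, 5) for x in (-2, -1, 0, 1, 2)}


def g(x):
    """Many-to-one transformation Y = g(X) = X**2."""
    return x * x


# p_Y[y] is the sum of p_X[x] over all x with g(x) = y.
p_Y = defaultdict(Fraction)
for x, p_x in p_X.items():
    p_Y[g(x)] += p_x

print(dict(p_Y))
```

Here both $x = -1$ and $x = 1$ map to $y = 1$, so their probabilities add up, and likewise for $x = \pm 2$.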

## Cumulative Distribution Function

Also called the distribution function.

$$F_X(x) = P[X \leq x], \quad -\infty < x < \infty$$

The cumulative distribution function of a random variable $X$ is the probability that $X$ is less than or equal to $x$. For a discrete random variable it can be computed by summing the probabilities of all values of $X$ that are less than or equal to $x$.

### For discrete random variables, the cdf is right-continuous.

$P[a < X < b] = F_X(b^{-}) - F_X(a^{+})$

$a^{+}$ is the right neighborhood of $a$ and $b^{-}$ is the left neighborhood of $b$

### cdf approaches 1 as $x \rightarrow \infty$
This is nothing but the sum of the probabilities of all possible values of $X$

### cdf is monotonically increasing
Since probabilities are never negative, a running sum of probabilities can never decrease
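
These properties can be checked on a small example. The sketch below assumes a fair six-sided die. The names `die_values`, `die_pmf` and `F` are illustrative.

```python
from fractions import Fraction

die_values = [1, 2, 3, 4, 5, 6]
die_pmf = [Fraction(1, 6)] * 6


def F(x):
    """F_X(x) = P[X <= x]: sum of p_X[x_i] over all x_i <= x."""
    return sum((p for v, p in zip(die_values, die_pmf) if v <= x), Fraction(0))


# The cdf reaches 1 once x covers every possible value, and is 0 below all of them.
print(F(0), F(6))

# The cdf never decreases, and it is flat between jumps: F(2.5) equals F(2).
print(F(2) <= F(2.5) <= F(3), F(2.5) == F(2))

# P[2 < X < 5] = p_X[3] + p_X[4]. For this integer-valued die, the cdf just
# below 5 equals F(4), so the probability is F(4) - F(2).
prob_open_interval = F(4) - F(2)
print(prob_open_interval)
```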

### Real-World Example - Servicing Customers

Pg No: 124

Prof. Poisson observes that, on average, 70 people per hour come to a lane on a weekday. He needs to determine the probability that no more than 2 people arrive at a lane in any one-minute time span, because it takes one minute to service a customer and the queue length should be at most 2.

This can be modelled as a sequence of Bernoulli trials, where a success means a customer arrives in a given time slice and a failure means no one does.

If the time slice is 1 second, then there are 60 trials per minute, and the probability that no more than 2 people arrive = 
bin(X <= 2) = bin(0) + bin(1) + bin(2)  
M = 60, p = 70/3600

The above model fails if two people arrive in the same second.
If we shrink the time slice until it is infinitesimally small, we get a Poisson distribution with lambda = 70/60 = 7/6, the expected number of customers per minute.

P(X<=2) = P(0,7/6) + P(1,7/6) + P(2,7/6) ≈ 0.887; this is less than the required 95%, so another lane is needed
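
The step from Bernoulli trials to the Poisson model can be illustrated numerically. This is a sketch under the assumptions stated in the text (70 arrivals per hour, at most one arrival per time slice). The helper names `binomial_cdf` and `poisson_cdf` are hypothetical and use only the standard library.

```python
from math import comb, exp, factorial


def binomial_cdf(k_max, M, p):
    """P[X <= k_max] for X ~ bin(M, p)."""
    return sum(comb(M, k) * p**k * (1 - p) ** (M - k) for k in range(k_max + 1))


def poisson_cdf(k_max, lam):
    """P[X <= k_max] for X ~ Poisson(lam)."""
    return sum(exp(-lam) * lam**k / factorial(k) for k in range(k_max + 1))


rate_per_minute = 70 / 60  # expected arrivals in one minute

# Split the minute into more and more slices; each slice is one Bernoulli trial
# with success probability rate_per_minute / slices.
for slices in (60, 600, 6000):
    print(slices, binomial_cdf(2, slices, rate_per_minute / slices))

print("Poisson", poisson_cdf(2, rate_per_minute))
```

As the slices get finer, the binomial value moves toward the Poisson value.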

#### Book's Method
With two lanes, the expected no:of people arriving at each lane in any minute halves: lambda = 7/12

Probability of X<=2 in both lanes, assuming arrivals at the two lanes are independent = P_lane1(X<=2) and P_lane2(X<=2)  
= P(X<=2) * P(X<=2)  
= poisson.cdf(2,7/12) * poisson.cdf(2,7/12)  
= 0.9574605417971194

#### My Method
Since there are two lanes now, together they can take at most 4 arrivals in any one-minute span  
P(X<=4) with lambda=7/6 = 0.9930883484362362

In [2]:
from scipy.stats import poisson

poisson.cdf(2,7/12)**2, poisson.cdf(4,7/6)

(0.9574605417971194, 0.9930883484362362)

### Variation 1
processing time = 1 minute  
no:of customers per hour = 600  
expected no:of customers in any one minute = 600/60 = 10  
Outcome: no:of lanes required such that there are at most 2 people in the line 95% of the time

#### Book's method
P(X<=2)^L >= 95% and lambda = 10/L; L - number of lanes

L = 54 lanes required for 600 customers per hour

In [3]:
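from scipy.stats import poisson

mu = 600/60  # expected arrivals per minute across all lanes

# Book's method: each of the i lanes sees rate mu/i and must get at most
# 2 arrivals; find the smallest i whose joint probability reaches 95%.
i = 1
while True:
    p = poisson.cdf(2, mu/i)**i
    if p >= 0.95:
        break
    i = i+1
print(i)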

54


#### My Method
P(X<=2 * L) >= 95% and lambda = 10; L - number of lanes

L = 8 lanes required for 600 customers per hour

In [6]:
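mu = 600/60  # expected arrivals per minute across all lanes

# My method: i lanes together absorb up to 2*i arrivals at the full rate mu;
# find the smallest i for which that probability reaches 95%.
i = 1
while True:
    p = poisson.cdf(2*i, mu)
    if p >= 0.95:
        break
    i = i+1
print(i)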

8


### Variation 2

processing time = 5 minutes  
no:of customers per hour = 70  
Outcome: No:of lanes required such that there are at most 2 people in the line 95% of the time

Now, in any 5 minute time span no more than 2 customers should arrive at a lane.
Expected no:of customers for a 5 minute time span = 70/12


#### Book's method
P(X<=2)^L >= 95% and lambda = 70/(12 * L); L - number of lanes

24 lanes required

In [15]:
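from scipy.stats import poisson

mu = 70/12  # expected arrivals per 5-minute span across all lanes

# Book's method: each of the i lanes sees rate mu/i and must get at most
# 2 arrivals; find the smallest i whose joint probability reaches 95%.
i = 1
while True:
    p = poisson.cdf(2, mu/i)**i
    if p >= 0.95:
        break
    i = i+1
print(i)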

24


#### My Method

P(X<=2 * L) >= 95% and lambda = 70/12; L - number of lanes

L = 5 lanes required for 70 customers per hour with processing time of 5 minutes each.

In [17]:
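mu = 70/12  # expected arrivals per 5-minute span across all lanes

# My method: i lanes together absorb up to 2*i arrivals at the full rate mu;
# find the smallest i for which that probability reaches 95%.
i = 1
while True:
    p = poisson.cdf(2*i, mu)
    if p >= 0.95:
        break
    i = i+1
print(i)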

5


My method calculates the probability of the total no:of customers arriving in any 5-minute time span. This only works if there is one lane. In the multiple-lane scenario, the 5-minute window is different for different lanes and needs to be calculated separately for each lane.