# <center> <font color=navy> BE_18- Data Analysis and Applied Statistics WS 22/23 </font> </center>
## <center> <font color=	#FF4500> Chapter 3 : Random variables and Distributions </font> </center>  

In [3]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats as st
from scipy.special import comb
from scipy.stats import hypergeom
from scipy.stats import binom
from scipy.stats import geom
from scipy.stats import poisson
from scipy.stats import uniform
from scipy.stats import expon
import math

## <font color= 	#483D8B> Combinatorics </font>

Number of ways to choose *x* items out of *n* without replacement, where the order of selection does not matter. This is called the <font color=	#FF4500>**binomial coefficient**</font>.

<br> 
<div style="border: 2px solid royalblue">
<br> $$ \pmatrix{n \\ x} = \frac{n!}{x!\,(n-x)!}$$ <p style="text-align:right">  (1) &nbsp;
</div> 
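As a quick check of formula (1), the following sketch compares the factorial formula with Python's built-in `math.comb`. The helper name `n_choose_x` is an illustrative assumption and not part of the course material.

```python
import math

def n_choose_x(n, x):
    """Binomial coefficient from the factorial formula n! / (x! (n-x)!)."""
    return math.factorial(n) // (math.factorial(x) * math.factorial(n - x))

# Number of distinct 5-card hands that can be dealt from a 52-card deck
hands = n_choose_x(52, 5)
print(hands, hands == math.comb(52, 5))
```

Integer division is used so that the result stays an exact integer even for large *n*.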



## <font color= 	#483D8B> 1. Hypergeometric distribution </font>

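A discrete probability distribution for a population split into two disjoint groups (e.g. aces/non-aces); it gives the probability of obtaining a given number of successes when drawing without replacement. It is similar to the binomial distribution, but the probability of success changes from draw to draw because drawn items are not returned.

<font color=	#FF4500>**Probability Mass Function** (PMF)</font> gives the probability of exactly *x* successes in *n* draws, where *N* is the population size and *A* is the number of successes in the population: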


<br> 
<div style="border: 2px solid royalblue">
<br> $$ P(x)= \frac{\pmatrix{A \\ x}  \pmatrix{ N-A \\ n-x}}{\pmatrix{N \\ n}}$$ <p style="text-align:right">  (2) &nbsp;
</div> 




#### EX 1. Pick 5 cards out of a deck of 52. What is the probability of getting exactly 2 aces?

N = 52, A = 4, n = 5, x = 2

In [10]:
# PMF = probability mass function: probability of exactly 2 successes.

def hypergeom_pmf(N, A, n, x):
    
    '''
    Probability Mass Function for Hypergeometric Distribution
    :param N: population size
    :param A: total number of desired items in N
    :param n: number of draws made from N
    :param x: number of desired items in our draw of n items
    :returns: PMF computed at x
    '''
    Achoosex = comb(A, x)          # ways to pick x desired items
    NAchoosenx = comb(N-A, n-x)    # ways to pick the remaining n-x items
    Nchoosen = comb(N, n)          # all possible draws of n items
    
    return Achoosex * NAchoosenx / Nchoosen

hypergeom_pmf(52, 4, 5, 2)

0.03992981808107859
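One way to sanity-check a hand-written PMF is to confirm that the probabilities over all possible values of x add up to 1. The sketch below does this with exact fractions, assuming a standard deck with 4 aces in 52 cards and a 5-card draw. `hypergeom_pmf_exact` is a hypothetical helper introduced here for illustration only.

```python
import math
from fractions import Fraction

def hypergeom_pmf_exact(N, A, n, x):
    """Hypergeometric PMF from formula (2), kept as an exact fraction."""
    return Fraction(math.comb(A, x) * math.comb(N - A, n - x), math.comb(N, n))

# Assumed setup: 52 cards, 4 aces, 5 cards drawn
N, A, n = 52, 4, 5
pmf = {x: hypergeom_pmf_exact(N, A, n, x) for x in range(min(A, n) + 1)}

for x, p in pmf.items():
    print(x, float(p))

# The PMF covers every possible outcome, so the total must be exactly 1
print(sum(pmf.values()) == 1)
```

Working with `Fraction` avoids floating-point rounding, so the total can be compared to 1 exactly.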

#### EX 2. Pick 5 cards out of a deck of 52. What is the probability of getting 2 or fewer aces?

N = 52, A = 4, n = 5, t = 2

In this example, the <font color=	#FF4500>**Cumulative Distribution Function** (CDF)</font> sums the PMF over all values of x from 0 up to t.

<br> 
<div style="border: 2px solid royalblue">
<br> $$ P(X \le t)= \sum_{x=0}^t \frac{\pmatrix{A \\ x}  \pmatrix{ N-A \\ n-x}}{\pmatrix{N \\ n}}$$ <p style="text-align:right">  (3) &nbsp;
</div>

In [12]:
# CDF: probability of at most 2 successes.

def hypergeom_cdf(N, A, n, t, min_value=None):
    
    '''
    Cumulative Distribution Function for Hypergeometric Distribution
    :param N: population size
    :param A: total number of desired items in N
    :param n: number of draws made from N
    :param t: largest number of desired items to include in the sum
    :param min_value: optional smallest number of desired items to include (default 0)
    :returns: P(min_value <= X <= t)
    '''
    if min_value is not None:
        return np.sum([hypergeom_pmf(N, A, n, x) for x in range(min_value, t+1)])
    
    return np.sum([hypergeom_pmf(N, A, n, x) for x in range(t+1)])

hypergeom_cdf(52, 4, 5, 2)

0.9982454520269646

#### EX 3. Pick 5 cards out of a deck of 52. What is the probability of getting more than 2 aces?

N = 52, A = 4, n = 5, t = 2

In this example, subtract the probability of getting 2 or fewer aces (the CDF at 2) from 1. This is called the **survival function** (SF). Note that 1 - CDF(2) is P(X > 2), i.e. 3 or more aces; the probability of 2 or more aces would be 1 - CDF(1).

<br> 
<div style="border: 2px solid royalblue">
<br> $$ SF = 1- CDF$$ <p style="text-align:right">  (4) &nbsp;
</div> 



In [13]:
# SF = survival function: probability of more than 2 successes.

1 - hypergeom_cdf(52, 4, 5, 2)

0.0017545479730354252
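Because the survival function is one minus the CDF at t, it gives P(X > t) and not P(X >= t). The sketch below makes the difference concrete for a standard deck (assumed: 4 aces, 52 cards, 5 cards drawn). The helpers `hypergeom_pmf_py` and `hypergeom_sf` are hypothetical names that use only the standard library.

```python
import math

def hypergeom_pmf_py(N, A, n, x):
    """Hypergeometric PMF from formula (2)."""
    return math.comb(A, x) * math.comb(N - A, n - x) / math.comb(N, n)

def hypergeom_sf(N, A, n, t):
    """P(X > t): one minus the CDF evaluated at t."""
    return 1 - sum(hypergeom_pmf_py(N, A, n, x) for x in range(t + 1))

N, A, n = 52, 4, 5
more_than_2 = hypergeom_sf(N, A, n, 2)   # P(X > 2), same as P(X >= 3)
at_least_2 = hypergeom_sf(N, A, n, 1)    # P(X >= 2), same as P(X > 1)
print(more_than_2, at_least_2)
```

The two results differ by exactly P(X = 2), so choosing the wrong cut-off changes the answer by the PMF at the boundary.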

## <font color= 	#483D8B> 2. Binomial Distribution </font>

A discrete probability distribution for a fixed number of independent trials, each with only two possible outcomes (success/failure); it gives the probability of obtaining an exact number of successes, as when sampling with replacement. The probability of success (p) is the same on every trial.

<br> 
<div style="border: 2px solid royalblue">
<br> $$ P(x)= \pmatrix{n \\ x} p^x q^{n-x}$$ <p style="text-align:right">  (5) &nbsp;
</div> 


where p is the probability of success on a single trial, x is the number of successes out of n trials, and q is the probability of failure on a single trial.

<br> 
<div style="border: 2px solid royalblue">
<br> $$ q= 1-p$$ <p style="text-align:right">  (6) &nbsp;
</div> 



#### EX 4. A trainer is teaching a dolphin to do tricks. The probability that the dolphin successfully performs the trick is 35%, and the probability that the dolphin does not successfully perform the trick is 65%. Out of 20 attempts, you want to find the probability that the dolphin succeeds 12 times. Find the P(X=12).

n= 20
x= 12
p= 0.35

In [17]:

def binom_pmf(n,x,p):
    
    '''
    Probability Mass Function for Binomial Distribution
    :param n: number of attempts
    :param x: number of successes out of n
    :param p: probability of success
    :returns: PMF computed at x
    '''
    
    return comb(n,x)* p**x * (1-p)**(n-x)
binom_pmf(20,12,0.35)

0.013564085376714451

#### EX 4. A trainer is teaching a dolphin to do tricks. The probability that the dolphin successfully performs the trick is 35%, and the probability that the dolphin does not successfully perform the trick is 65%. Out of 20 attempts, you want to find the probability that the dolphin succeeds at most 12 times. Find P(X<=12).

n= 20
x= 12
p= 0.35

In this example, the **Cumulative Distribution Function** (CDF) sums the PMF over all values of x from 0 up to t.

<br> 
<div style="border: 2px solid royalblue">
<br> $$ P(X \le t)= \sum_{x=0}^t \pmatrix{n \\ x} p^x q^{n-x}$$ <p style="text-align:right">  (7) &nbsp;
</div> 


In [19]:
# CDF: probability of at most 12 successes.

def binom_cdf(n, t, p, min_value=None):
    
    '''
    Cumulative Distribution Function for Binomial Distribution
    :param n: number of attempts
    :param t: largest number of successes to include in the sum
    :param p: probability of success
    :param min_value: optional smallest number of successes to include (default 0)
    :returns: P(min_value <= X <= t)
    '''
    if min_value is not None:
        return np.sum([binom_pmf(n, x, p) for x in range(min_value, t+1)])
    
    return np.sum([binom_pmf(n, x, p) for x in range(t+1)])
binom_cdf(20,12,0.35)

0.9939847300418231

#### EX 5. A trainer is teaching a dolphin to do tricks. The probability that the dolphin successfully performs the trick is 35%, and the probability that the dolphin does not successfully perform the trick is 65%. Out of 20 attempts, you want to find the probability that the dolphin succeeds more than 12 times. Find P(X>12).

n= 20
x= 12
p= 0.35

In this example, subtract the probability of 12 or fewer successes (the CDF at 12) from 1. This is called the <font color=	#FF4500>**survival function**</font> (SF). Note that 1 - CDF(12) is P(X > 12), i.e. 13 or more successes; P(X >= 12) would be 1 - CDF(11).

<br> 
<div style="border: 2px solid royalblue">
<br> $$ SF = 1- CDF$$ <p style="text-align:right">  (4) &nbsp;
</div>


In [20]:
# SF = survival function: probability of more than 12 successes.

1-binom_cdf(20,12,0.35)   

0.0060152699581769165
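An upper-tail probability can also be obtained by summing the PMF directly over the tail, which makes the cut-off explicit. The sketch below does this for the dolphin example with n = 20 and p = 0.35; `binom_pmf_py` is a hypothetical standard-library helper, not the notebook's own function.

```python
import math

def binom_pmf_py(n, x, p):
    """Binomial PMF from formula (5)."""
    return math.comb(n, x) * p**x * (1 - p)**(n - x)

n, p = 20, 0.35
more_than_12 = sum(binom_pmf_py(n, x, p) for x in range(13, n + 1))  # P(X > 12)
at_least_12 = sum(binom_pmf_py(n, x, p) for x in range(12, n + 1))   # P(X >= 12)
print(more_than_12, at_least_12)
```

The second sum includes the term for x = 12, so it is larger than the first by P(X = 12).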

## <font color= 	#483D8B> 3. Geometric distribution </font>

A discrete probability distribution for the number of independent trials *X* needed to obtain the first success, where every trial succeeds with the same probability *p*.

For $X_s$ = first round of success;

<br> 
<div style="border: 2px solid royalblue">
<br> $$ P(X_s=x) = (1-p)^{x-1}  p $$ <p style="text-align:right">  (8) &nbsp;
</div> 


For $X_f$ = number of failures before first success;

<br> 
<div style="border: 2px solid royalblue">
<br> $$ P(X_f=x) = (1-p)^x  p$$ <p style="text-align:right">  (9) &nbsp;
</div>
 

#### EX 6. The probability that Bob hits a free throw in basketball is 20%. What is the probability that he will miss five times before making one (i.e. he makes his first shot on the 6th try)?

$X_f$ = 5 (equivalently $X_s$ = 6), p= 0.20

In [23]:
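# Named geom_pmf so it does not shadow scipy.stats.geom imported above
def geom_pmf(s, p):
    
    '''
    Probability Mass Function for Geometric distribution
    :param s: attempt of first success
    :param p: probability of success
    :returns: Probability that the first success occurs on attempt s
    '''
    
    return (1-p)**(s-1) *p

geom_pmf(6,0.2)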

0.06553600000000002
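Formulas (8) and (9) describe the same event counted in two ways: the first success on trial x is the same as x - 1 failures before the first success. The sketch below checks this for p = 0.2 and also sums formula (8) to get the probability that the first success comes within 6 tries. Both function names are hypothetical.

```python
def first_success_pmf(x, p):
    """P(X_s = x): first success on trial x, formula (8)."""
    return (1 - p) ** (x - 1) * p

def failures_before_success_pmf(x, p):
    """P(X_f = x): x failures before the first success, formula (9)."""
    return (1 - p) ** x * p

p = 0.2
# Same event, two parameterizations: 6th try vs. 5 misses first
print(first_success_pmf(6, p), failures_before_success_pmf(5, p))

# Probability that the first success comes within the first 6 tries
within_6 = sum(first_success_pmf(x, p) for x in range(1, 7))
print(within_6, 1 - (1 - p) ** 6)
```

The last line compares the sum with the closed form 1 - (1 - p)^6, which is the probability of not missing all six tries.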

## <font color= 	#483D8B> 4. Poisson distribution </font>

A discrete probability distribution for the number of events that occur in a fixed interval of time when events happen at a constant average rate. Events are independent of each other and the rate doesn't change.

<br> 
<div style="border: 2px solid royalblue">
<br> $$ P(x)= \frac {\mu^x e^{-\mu}}{x!}$$ <p style="text-align:right">  (10) &nbsp;
</div>


where $\mu$ is the mean number of events per unit time, x is the number of events observed and e is Euler's number (approximately 2.71828).

#### EX 6. According to a survey a university professor gets, on average, 7 emails per day. Let X = the number of emails a professor receives per day. The discrete random variable X takes on the values x = 0, 1, 2 …. The random variable X has a Poisson distribution: X ~ P(7). The mean is 7 emails. What is the probability that the professor receives exactly 2 emails per day?


In [13]:
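def poisson_pmf(x, mu):
    
    '''
    Probability mass function for Poisson distribution
    :param x: number of successes
    :param mu: mean number of successes in unit time
    :returns: PMF computed at x
    '''
    
    # mu**x * e**(-mu) / x!  (formula 10)
    return mu**x * math.exp(-mu) / math.factorial(x)

# SciPy reference value; poisson_pmf(2, 7) agrees up to floating-point rounding
poisson.pmf(2,7)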

0.022341108156085643

In [11]:
def poisson_pmf(x, mu):
    
    '''
    Probability mass function for Poisson distribution
    :param x: number of successes
    :param mu: mean number of successes in unit time
    :returns: PMF computed at x
    '''
    
    # Same formula, using NumPy's exponential
    return mu**x * np.exp(-mu) / math.factorial(x)

poisson.pmf(2,7)

0.022341108156085643

In [12]:
# CDF: probability of at most t successes.

def poisson_cdf(t, mu, min_value=None):
    
    '''
    Cumulative Distribution Function for Poisson distribution
    :param t: largest number of successes to include in the sum
    :param mu: mean number of successes in unit time
    :param min_value: optional smallest number of successes to include (default 0)
    :returns: P(min_value <= X <= t)
    '''
    if min_value is not None:
        return np.sum([poisson_pmf(x, mu) for x in range(min_value, t+1)])
    
    return np.sum([poisson_pmf(x, mu) for x in range(t+1)])

# Check the hand-written CDF against SciPy's implementation
math.isclose(poisson_cdf(2, 7), poisson.cdf(2, 7))

True

#### EX 7. According to a survey a university professor gets, on average, 7 emails per day. Let X = the number of emails a professor receives per day. The discrete random variable X takes on the values x = 0, 1, 2 …. The random variable X has a Poisson distribution: X ~ P(7). The mean is 7 emails. What is the probability that the professor receives at most 2 emails per day?


In [5]:
poisson.cdf(2,7)

0.029636163880521763
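Formula (10) implies that consecutive Poisson probabilities are related by P(x) = P(x-1) · μ / x, which avoids computing large powers and factorials separately. The following sketch uses that recurrence for the email example with μ = 7; the function name `poisson_pmf_values` is an assumption made for illustration.

```python
import math

def poisson_pmf_values(mu, t):
    """Return [P(0), P(1), ..., P(t)] using P(x) = P(x-1) * mu / x."""
    values = [math.exp(-mu)]              # P(0) = e^(-mu)
    for x in range(1, t + 1):
        values.append(values[-1] * mu / x)
    return values

probs = poisson_pmf_values(7, 2)
print(probs[2])      # P(X = 2)
print(sum(probs))    # P(X <= 2)
```

Summing the list gives the CDF at t, so one pass produces both the PMF and the CDF values.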

## <font color= 	#483D8B> 5. Uniform distribution </font>

A continuous probability distribution in which every value in the interval from a to b is equally likely to occur.

**Probability Density Function:**
<br> 
<div style="border: 2px solid royalblue">
<br> $$f(x) = \frac{1}{(b-a)}$$ <p style="text-align:right">  (11) &nbsp;
</div> 

 
for $a \le x \le b$, and $f(x) = 0$ otherwise; all values between a and b are equally likely.

**Cumulative Distribution Function:**
<br> 
<div style="border: 2px solid royalblue">
<br> $$ F(x) = \frac{(x-a)}{(b-a)}$$ <p style="text-align:right">  (12) &nbsp;
</div> 



for $a \le x \le b$, with $F(x) = 0$ for x<a and $F(x) = 1$ for x>b

**Mean:**
<br> 
<div style="border: 2px solid royalblue">
<br> $$\mu = \frac {a+b}{2}$$ <p style="text-align:right">  (13) &nbsp;
</div> 


**Standard deviation:**
<br> 
<div style="border: 2px solid royalblue">
<br> $$ \sigma = \sqrt{\frac{(b-a)^2}{12}}$$ <p style="text-align:right">  (14) &nbsp;
</div> 



#### EX 8. The amount of time, in minutes, that a person must wait for a bus is uniformly distributed between zero and 15 minutes, inclusive. What is the value of the probability density at 12.5 minutes? (For a continuous distribution, the probability of waiting exactly 12.5 minutes is zero.)

In [1]:
def uniform_pdf(x, a, b):
    
    '''
    Probability density function for Uniform distribution
    :param x: value at which to evaluate the density
    :param a: lowest value of x
    :param b: highest value of x
    :returns: PDF computed at x
    '''
    if a <= x <= b:
        return 1/(b-a)
    else:
        return 0

uniform_pdf(12.5,0,15)

0.06666666666666667

#### EX 9. The amount of time, in minutes, that a person must wait for a bus is uniformly distributed between zero and 15 minutes, inclusive. What is the probability that a person waits fewer than 12.5 minutes?

In [2]:
def uniform_cdf(x, a, b):
    
    '''
    Cumulative distribution function for Uniform distribution
    :param x: value at which to evaluate the CDF
    :param a: lowest value of x
    :param b: highest value of x
    :returns: CDF computed at x
    '''
    if x < a:
        return 0
    elif x <= b:
        return (x-a) /(b-a)
    else:
        return 1

uniform_cdf(12.5,0,15)

0.8333333333333334
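For a continuous distribution, probabilities belong to intervals, and the probability of an interval is the difference of two CDF values. The sketch below applies this to the bus example on [0, 15] and also evaluates formulas (13) and (14). The helper `uniform_interval_prob` and the interval from 5 to 12.5 minutes are assumptions chosen for illustration.

```python
import math

def uniform_interval_prob(lo, hi, a, b):
    """P(lo <= X <= hi) for X uniform on [a, b], i.e. F(hi) - F(lo)."""
    lo, hi = max(lo, a), min(hi, b)       # clip the interval to the support
    return max(hi - lo, 0) / (b - a)

a, b = 0, 15
mean = (a + b) / 2                        # formula (13)
std = math.sqrt((b - a) ** 2 / 12)        # formula (14)

print(uniform_interval_prob(5, 12.5, a, b))   # waiting between 5 and 12.5 minutes
print(mean, std)
```

Clipping the interval to [a, b] reproduces the rule that the CDF is 0 below a and 1 above b.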

## <font color= 	#483D8B> 6. Exponential distribution </font>

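A continuous probability distribution for the waiting time until an event that occurs at a constant average rate (e.g. the lifetime of a part). Its density decreases as x grows, so shorter times are more likely than longer ones.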

**Probability Density Function:**
<br> 
<div style="border: 2px solid royalblue">
<br> $$ f(x) = me^{-mx}
$$
    
    
<center>OR</center>
    
$$
f(x) = \frac {1}{\mu}e^{\frac{-1}{\mu}x}
$$ <p style="text-align:right">  (15) &nbsp;
</div> 


**Cumulative Distribution function:**
<br> 
<div style="border: 2px solid royalblue">
<br> $$
F(x)= 1- e^{-mx}
$$
    
<center>OR </center>
    
$$
F(x) = 1- e^{\frac{-1}{\mu}x}
$$ <p style="text-align:right">  (16) &nbsp;
</div> 

where m is the number of events per unit time (average rate) and $\mu$ is the average time between events.

<br> 
<div style="border: 2px solid royalblue">
<br> $$\mu = \frac{1}{m}$$ <p style="text-align:right">  (17) &nbsp;
</div>

#### EX 10. On the average, a certain computer part lasts ten years. The length of time the computer part lasts is exponentially distributed. What is the value of the probability density at 7 years? (The probability of lasting exactly 7 years is zero for a continuous distribution.)

In [5]:
def expon_pdf(x, m):
    
    '''
    Probability density function for Exponential distribution
    :param x: value at which to evaluate the density
    :param m: average rate
    :returns: PDF computed at x
    '''
    return m * np.exp(-m * x)

expon_pdf(7,0.1)

0.04965853037914095

#### EX 11. On the average, a certain computer part lasts ten years. The length of time the computer part lasts is exponentially distributed. What is the probability that a computer part lasts at most 7 years?

In [6]:
def expon_cdf(x, m):
    
    '''
    Cumulative distribution function for Exponential distribution
    :param x: value at which to evaluate the CDF
    :param m: average rate
    :returns: CDF computed at x, i.e. P(X <= x)
    '''
    return 1 - np.exp(-m * x)

expon_cdf(7,0.1)

0.5034146962085906
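The CDF gives the probability that the part fails by time x, so the probability that it lasts longer than x is the complement, 1 - F(x) = e^{-mx}. The sketch below evaluates both for the computer-part example, assuming a mean lifetime of 10 years and therefore a rate m = 1/10 from formula (17). `expon_sf` is a hypothetical helper name.

```python
import math

def expon_sf(x, m):
    """P(X > x) = 1 - CDF(x) = exp(-m * x) for an exponential with rate m."""
    return math.exp(-m * x)

mu = 10          # average lifetime in years
m = 1 / mu       # rate, formula (17)

print(expon_sf(7, m))        # probability the part lasts more than 7 years
print(1 - expon_sf(7, m))    # probability it fails within 7 years (the CDF)
```

The two printed values always add up to 1, which is a quick way to check which tail a given calculation returns.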