# Discrete Probability Distributions

## Random Variables

### Discrete Random Variable: Definition of a Random Variable

A random variable is a variable whose possible values are numerical outcomes of a random phenomenon. There are two types of random variables, discrete and continuous.

A discrete random variable is one that can take only a countable number of distinct values. For example, let $X$ be the number that comes up when you roll a fair die. $X$ takes values in $\{1, 2, 3, 4, 5, 6\}$ and is therefore a discrete random variable.

Formally, a random variable $X$ is a mapping from the sample space $S$ to the real line $\mathbb{R}$: it assigns a numerical value $X(w)$ to each outcome $w$ of a particular experiment.
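
To make the mapping view concrete, here is a minimal sketch for the die example. The names `sample_space`, `face_value`, and `X` are illustrative choices, not part of any library.

```python
# Sample space S of a fair six-sided die: each outcome w is a face label.
sample_space = ["one", "two", "three", "four", "five", "six"]

# Lookup table used by the random variable to turn an outcome into a number.
face_value = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5, "six": 6}

def X(w):
    """Random variable: maps an outcome w in S to the real number X(w)."""
    return face_value[w]

# The set of values X can take is countable, so X is discrete.
values = sorted({X(w) for w in sample_space})
print(values)
```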

### Probability Mass Function

The probability mass function (p.m.f.) is the set of probability values $p_i$ assigned to the values $x_i$ taken by the discrete random variable, so that $P(X = x_i) = p_i$.
The probabilities satisfy $0 \leq p_i \leq 1$ and $\sum_i p_i = 1$.
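
As a quick check of these two conditions, the sketch below builds the p.m.f. of a fair die with exact fractions. The dictionary name `pmf` is an illustrative choice.

```python
from fractions import Fraction

# p.m.f. of a fair die: each value x_i in 1..6 has probability p_i = 1/6.
pmf = {x: Fraction(1, 6) for x in range(1, 7)}

# Condition 1: every p_i lies in [0, 1].
all_in_unit_interval = all(0 <= p_i <= 1 for p_i in pmf.values())

# Condition 2: the p_i sum to 1 (exact, thanks to Fraction arithmetic).
total = sum(pmf.values())

print(all_in_unit_interval, total)
```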

### Cumulative Distribution Function

Cumulative Distribution Function (CDF): $F(x) = P(X \leq x)$
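
For a discrete random variable, $F(x)$ is the running sum of the probabilities $p_i$ of all values $x_i \leq x$. The sketch below computes it for a fair die; the names `values`, `probs`, and `cdf` are illustrative.

```python
from fractions import Fraction
from itertools import accumulate

values = list(range(1, 7))       # x_i for a fair die
probs = [Fraction(1, 6)] * 6     # p_i for each x_i

# F(x_i) = p_1 + ... + p_i, computed as a cumulative sum over the p.m.f.
cdf = dict(zip(values, accumulate(probs)))

print("P(X <= 4) =", cdf[4])
```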

### Expectation of a Random Variable: Expectations of Discrete Random Variables

Expectation of a discrete random variable $X$ with p.m.f. $p$:$$
  E(X) = \sum_i p_i x_i
$$

In [1]:
from scipy.stats import rv_discrete

x = [10, 20, 30]
p = [0.2, 0.3, 0.5]
distribution = rv_discrete(values=(x, p))
print("Expected value: ", distribution.expect())

Expected value:  23.0
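
The same number can be obtained directly from the definition $E(X) = \sum_i p_i x_i$. This is a plain-Python sketch with the same values as the cell above, and the variable name `expected_value` is illustrative.

```python
# Values x_i and probabilities p_i, as in the scipy example.
x = [10, 20, 30]
p = [0.2, 0.3, 0.5]

# E(X) = sum_i p_i * x_i
expected_value = sum(p_i * x_i for p_i, x_i in zip(p, x))
print("Expected value (from the definition):", expected_value)
```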


### Variance of a Random Variable: Definition and Interpretation of Variance

The variance ($\sigma^2$) is $Var(X) = E\left[(X - E(X))^2\right] = E(X^2) - \mu^2$, where $\mu = E(X)$.
It is a non-negative quantity measuring the spread of the distribution about its mean value.
The standard deviation ($\sigma$) is $\sqrt{Var(X)}$.

In [2]:
from scipy.stats import rv_discrete

x = [10, 20, 30]
p = [0.2, 0.3, 0.5]
distribution = rv_discrete(values=(x, p))
print("Variance: ", distribution.var())
print("Standard Deviation: ", distribution.std())

Variance:  61.0
Standard Deviation:  7.810249675906654
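
Both forms of the variance formula can be checked by hand on the same distribution. The sketch below uses plain Python, and the names `var_definition` and `var_shortcut` are illustrative.

```python
import math

x = [10, 20, 30]
p = [0.2, 0.3, 0.5]

# Mean: mu = E(X)
mu = sum(p_i * x_i for p_i, x_i in zip(p, x))

# Definition: Var(X) = E[(X - mu)^2]
var_definition = sum(p_i * (x_i - mu) ** 2 for p_i, x_i in zip(p, x))

# Shortcut: Var(X) = E(X^2) - mu^2
var_shortcut = sum(p_i * x_i**2 for p_i, x_i in zip(p, x)) - mu**2

# Standard deviation: square root of the variance
std = math.sqrt(var_definition)

print(var_definition, var_shortcut, std)
```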


### The Binomial Distribution

#### Bernoulli Random Variables

A Bernoulli random variable models a process with two possible outcomes, labeled 0 and 1.
It is defined by the parameter $p$, $0 \leq p \leq 1$, which is the probability that the outcome is 1.
The Bernoulli distribution $Ber(p)$ is:$$
  f(x;p) = p^x(1-p)^{1-x}, \text{   } x= 0,1
$$
$E(X) = p$
$Var(X) = p(1-p)$

In [3]:
from scipy.stats import bernoulli
p = 0.3 # probability of success (outcome 1)
print("Mean: ", bernoulli.mean(p))
print("Variance: ", bernoulli.var(p))

Mean:  0.3
Variance:  0.21
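
The mean and variance can also be recovered from the p.m.f. $f(x;p) = p^x(1-p)^{1-x}$ by summing over the two outcomes. In this sketch, `bernoulli_pmf` is an illustrative helper, not a scipy function.

```python
p = 0.3  # probability that the outcome is 1

def bernoulli_pmf(x, p):
    """f(x; p) = p^x (1 - p)^(1 - x) for x in {0, 1}."""
    return p**x * (1 - p) ** (1 - x)

# E(X) = sum_x x * f(x; p), which reduces to p
mean = sum(x * bernoulli_pmf(x, p) for x in (0, 1))

# Var(X) = sum_x (x - E(X))^2 * f(x; p), which reduces to p(1 - p)
variance = sum((x - mean) ** 2 * bernoulli_pmf(x, p) for x in (0, 1))

print(mean, variance)
```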


#### Definition of the Binomial Distribution

Consider an experiment consisting of $n$ independent Bernoulli trials $X_1, \cdots, X_n$, each with a constant probability $p$ of success.
Then the total number of successes $X = \sum_{i=1}^n X_i$ is a random variable that follows the Binomial distribution with parameters $n$ (number of trials) and $p$ (probability of success):$$
  X \sim B(n,p)
$$
Probability mass function of a $B(n, p)$ random variable is:$$
  f(x;n,p) = \binom{n}{x}p^x(1-p)^{n-x}, \text{   } x= 0,1, \cdots, n
$$
$E(X) = np$
$Var(X) = np(1-p)$

In [4]:
from scipy.stats import binom

# Parameters
n = 10 # number of trials
x = 7 # number of successes
p = 0.2 # probability of success

print("Mean: ", binom.mean(n, p))
print("Variance: ", binom.var(n, p))
print("Probability mass function: ", binom.pmf(x, n, p))
print("Cumulative distribution function: ", binom.cdf(x,n,p))

Mean:  2.0
Variance:  1.6
Probability mass function:  0.0007864320000000006
Cumulative distribution function:  0.9999220736
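
The values above can be reproduced from the formula $f(x;n,p) = \binom{n}{x}p^x(1-p)^{n-x}$ with the standard library alone. In this sketch, `binomial_pmf` is an illustrative helper name.

```python
from math import comb

def binomial_pmf(x, n, p):
    """f(x; n, p) = C(n, x) * p^x * (1 - p)^(n - x)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

n, p = 10, 0.2

# P(X = 7)
pmf_7 = binomial_pmf(7, n, p)

# P(X <= 7) is the sum of the p.m.f. over x = 0, ..., 7
cdf_7 = sum(binomial_pmf(k, n, p) for k in range(8))

# E(X) computed from the p.m.f.; should agree with n * p
mean = sum(k * binomial_pmf(k, n, p) for k in range(n + 1))

print(pmf_7, cdf_7, mean)
```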


### The Geometric Distribution

The number $X$ of trials up to and including the first success in a sequence of independent Bernoulli trials with a constant success probability $p$ has a geometric distribution with parameter $p$.
Probability mass function:$$
  P(X = x) = (1 - p)^{x-1}p, \text{   } x=1,2, \cdots.
$$
Cumulative distribution function:$$
  P(X \leq x) = 1 - (1-p)^x
$$
$E(X) = \frac{1}{p}$
$Var(X) = \frac{1-p}{p^2}$

In [5]:
from scipy.stats import geom
x = 5 # number of trials up to and including the first success
p = 0.23 # probability of success
print("Mean: ", geom.mean(p))
print("Variance: ", geom.var(p))
print("Probability mass function: ", geom.pmf(x, p))
print("Cumulative distribution function: ", geom.cdf(x, p))

Mean:  4.3478260869565215
Variance:  14.555765595463136
Probability mass function:  0.08085199430000001
Cumulative distribution function:  0.7293215843
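
The closed-form p.m.f. and CDF can be checked against each other, and the mean $1/p$ can be approximated by a truncated series. The helper names and the cutoff of 2000 terms are illustrative assumptions; the neglected tail is negligible for $p = 0.23$.

```python
def geometric_pmf(x, p):
    """P(X = x) = (1 - p)^(x - 1) * p for x = 1, 2, ..."""
    return (1 - p) ** (x - 1) * p

def geometric_cdf(x, p):
    """P(X <= x) = 1 - (1 - p)^x."""
    return 1 - (1 - p) ** x

p = 0.23

pmf_5 = geometric_pmf(5, p)
cdf_5 = geometric_cdf(5, p)

# The CDF should equal the sum of the p.m.f. over x = 1, ..., 5.
cdf_5_by_sum = sum(geometric_pmf(k, p) for k in range(1, 6))

# Truncated series for E(X) = sum_x x * P(X = x); should be close to 1 / p.
approx_mean = sum(k * geometric_pmf(k, p) for k in range(1, 2001))

print(pmf_5, cdf_5, approx_mean)
```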


### Hypergeometric Distribution

#### Definition of the Hypergeometric Distribution

Consider a collection of $N$ items of which $r$ are of a certain kind.
The probability that a randomly chosen item is of the special kind is $p = \frac{r}{N}$.
If $n$ items are chosen at random with replacement, then the number $X$ of special items drawn is distributed as $X \sim B(n,p)$.
If instead the $n$ items are chosen at random without replacement, $X$ has a hypergeometric distribution.
Probability mass function:$$
  f(x; N, n, r) = \frac{ \binom{r}{x} \binom{N-r}{n-x} }{ \binom{N}{n} },
  \quad \max \{ 0, n-(N-r) \} \leq x \leq \min \{ n, r \}
$$
$E(X) = n\frac{r}{N}$
$Var(X) = \frac{N-n}{N-1} n \frac{r}{N}(1- \frac{r}{N})$
Comparison with $B(n,p)$ when $p = \frac{r}{N}$ and $q = 1 - p$:
$E_B(X) = E_H(X) = np$
$\sigma_B ^2 (X) = npq \geq \sigma_H ^2(X) = \frac{N-n}{N-1} npq$

In [6]:
from scipy.stats import hypergeom

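x = 2 # number of special-kind elements among those drawn
N = 15 # total number of elements
r = 9 # number of special-kind elements in the collection
n = 5 # number of elements drawn without replacement

# scipy's argument order is (total, number of special elements, number drawn)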

print("Mean: ", hypergeom.mean(N, r, n))
print("Variance: ", hypergeom.var(N, r, n))
print("Probability mass function: ", hypergeom.pmf(x, N, r, n))
print("Cumulative distribution function: ", hypergeom.cdf(x, N, r, n))

Mean:  3.0
Variance:  0.8571428571428571
Probability mass function:  0.23976023976023988
Cumulative distribution function:  0.2867132867132869
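
The p.m.f. and the variance comparison with the binomial can be verified with the standard library. In this sketch, `hypergeom_pmf` is an illustrative helper that takes its arguments in the order of the formula above, $(x, N, n, r)$, which differs from scipy's order.

```python
from math import comb

def hypergeom_pmf(x, N, n, r):
    """f(x; N, n, r) = C(r, x) * C(N - r, n - x) / C(N, n)."""
    return comb(r, x) * comb(N - r, n - x) / comb(N, n)

N, r, n = 15, 9, 5

# Support of X: max{0, n - (N - r)} <= x <= min{n, r}
lo, hi = max(0, n - (N - r)), min(n, r)

pmf_2 = hypergeom_pmf(2, N, n, r)
cdf_2 = sum(hypergeom_pmf(k, N, n, r) for k in range(lo, 3))
total = sum(hypergeom_pmf(k, N, n, r) for k in range(lo, hi + 1))

# Variance comparison with B(n, p), where p = r / N
p = r / N
var_binomial = n * p * (1 - p)
var_hypergeom = (N - n) / (N - 1) * var_binomial

print(pmf_2, cdf_2, var_binomial, var_hypergeom)
```

Sampling without replacement reduces the spread, so `var_hypergeom` is never larger than `var_binomial`.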


### The Poisson Distribution

The Poisson distribution describes the number of "events" occurring within certain specified boundaries of space and time.
A random variable $X$ distributed as a Poisson random variable with parameter $\lambda$ is written as:$$
  X \sim P(\lambda)
$$
Probability mass function:$$
  P(X = x) = \frac{ e^{- \lambda} \lambda ^ {x}} {x!} \text{   } x=0,1,2, \cdots.
$$
$E(X) = Var(X) = \lambda$

In [7]:
from scipy.stats import poisson

# Parameters
x = 1 # number of events
Lambda = 2/3 # lambda parameter

print("Mean: ", poisson.mean(Lambda))
print("Variance: ", poisson.var(Lambda))
print("Probability mass function: ", poisson.pmf(x, Lambda))
print("Cumulative distribution function: ", poisson.cdf(x, Lambda))

Mean:  0.6666666666666666
Variance:  0.6666666666666666
Probability mass function:  0.3422780793550613
Cumulative distribution function:  0.8556951983876534
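
The same quantities follow from the formula $P(X = x) = e^{-\lambda}\lambda^x / x!$. In this sketch, `poisson_pmf` is an illustrative helper, and the cutoff of 60 terms for the series is an assumption that is more than enough for $\lambda = 2/3$.

```python
from math import exp, factorial

def poisson_pmf(x, lam):
    """P(X = x) = exp(-lam) * lam^x / x! for x = 0, 1, 2, ..."""
    return exp(-lam) * lam**x / factorial(x)

lam = 2 / 3

pmf_1 = poisson_pmf(1, lam)

# P(X <= 1) = P(X = 0) + P(X = 1)
cdf_1 = sum(poisson_pmf(k, lam) for k in range(2))

# Truncated series for the mean and variance; both should be close to lam.
approx_mean = sum(k * poisson_pmf(k, lam) for k in range(60))
approx_var = sum((k - lam) ** 2 * poisson_pmf(k, lam) for k in range(60))

print(pmf_1, cdf_1, approx_mean, approx_var)
```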
