# Probability

## Fundamental rules
+ Probability of a union of two events:
$$\begin{aligned} p ( A \vee B ) & = p ( A ) + p ( B ) - p ( A \wedge B ) \\ & = p ( A ) + p ( B ) \text { if } A \text { and } B \text { are mutually exclusive } \end{aligned}$$

+ Joint probability (product rule):
$$p ( A , B ) = p ( A \wedge B ) = p ( A | B ) p ( B )$$

+ Marginal distribution (sum rule, rule of total probability):
$$p ( A ) = \sum _ { b } p ( A , B = b ) = \sum _ { b } p ( A | B = b ) p ( B = b )$$

+ Chain rule:
$$p \left( X _ { 1 : D } \right) = p \left( X _ { 1 } \right) p \left( X _ { 2 } | X _ { 1 } \right) p \left( X _ { 3 } | X _ { 1 } , X _ { 2 } \right) p \left( X _ { 4 } | X _ { 1 } , X _ { 2 } , X _ { 3 } \right) \ldots p \left( X _ { D } | X _ { 1 : D - 1 } \right)$$

+ Conditional Probability:
$$p ( A | B ) = \frac { p ( A , B ) } { p ( B ) } \text { if } p ( B ) > 0$$

+ Bayes rule: 
$$p ( X = x | Y = y ) = \frac { p ( X = x , Y = y ) } { p ( Y = y ) } = \frac { p ( X = x ) p ( Y = y | X = x ) } { \sum _ { x ^ { \prime } } p \left( X = x ^ { \prime } \right) p ( Y = y | X = x ^ { \prime } ) }$$

+ Independence: 
$$X \perp Y \Longleftrightarrow p ( X , Y ) = p ( X ) p ( Y )$$

+ Conditional Independence:
$$X \perp Y | Z \Longleftrightarrow p ( X , Y | Z ) = p ( X | Z ) p ( Y | Z )$$
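
The product rule, sum rule, and Bayes rule above can be combined in a few lines. The following is a minimal sketch for discrete variables; the function name `posterior`, the dictionary layout, and the numbers (a rare condition and an imperfect test) are illustrative assumptions, not part of the notes.

```python
def posterior(prior, likelihood, y):
    """Return p(X = x | Y = y) for every x via Bayes rule.

    prior:      dict mapping x -> p(X = x)
    likelihood: dict mapping x -> dict mapping y -> p(Y = y | X = x)
    """
    # Product rule: p(X = x, Y = y) = p(X = x) p(Y = y | X = x)
    joint = {x: prior[x] * likelihood[x][y] for x in prior}
    # Sum rule: p(Y = y) = sum over x' of p(X = x', Y = y)
    evidence = sum(joint.values())
    if evidence == 0:
        raise ValueError("p(Y = y) must be positive")
    # Conditional probability: joint divided by the evidence
    return {x: joint[x] / evidence for x in joint}


# Hypothetical numbers chosen only for illustration.
prior = {"ill": 0.01, "healthy": 0.99}
likelihood = {
    "ill": {"pos": 0.9, "neg": 0.1},
    "healthy": {"pos": 0.05, "neg": 0.95},
}
post = posterior(prior, likelihood, "pos")
print(post)
```

With these made-up numbers the posterior probability of "ill" given a positive test stays well below one half, because the prior is small.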

## Some common discrete distributions

+ Binomial distribution: number of heads $k$ in $n$ tosses of a coin (2 sides) that lands heads with probability $\theta$
$$\operatorname { Bin } ( k | n , \theta ) \triangleq \left( \begin{array} { l } { n } \\ { k } \end{array} \right) \theta ^ { k } ( 1 - \theta ) ^ { n - k }$$

where the binomial coefficient ("n choose k") is $\left( \begin{array} { l } { n } \\ { k } \end{array} \right) \triangleq \frac { n ! } { ( n - k ) ! k ! }$

$$\text { mean } = n \theta , \quad \text { var } = n \theta ( 1 - \theta )$$
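
As a quick numerical check of the binomial pmf, the sketch below evaluates it for every $k$ and computes the normalization, mean, and variance by direct summation. The helper name `binom_pmf` and the values $n = 10$, $\theta = 0.3$ are arbitrary choices for illustration.

```python
from math import comb


def binom_pmf(k, n, theta):
    """Bin(k | n, theta) = C(n, k) * theta^k * (1 - theta)^(n - k)."""
    return comb(n, k) * theta**k * (1 - theta) ** (n - k)


n, theta = 10, 0.3  # illustrative values
pmf = [binom_pmf(k, n, theta) for k in range(n + 1)]

total = sum(pmf)                                         # should be 1
mean = sum(k * p for k, p in enumerate(pmf))             # compare with n * theta
var = sum((k - mean) ** 2 * p for k, p in enumerate(pmf))  # compare with n * theta * (1 - theta)

print(total, mean, var)
print(n * theta, n * theta * (1 - theta))
```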

+ Bernoulli distribution: outcome of tossing a coin 1 time
$$\operatorname { Ber } ( x | \theta ) = \theta ^ { \mathbb { I } ( x = 1 ) } ( 1 - \theta ) ^ { \mathbb { I } ( x = 0 ) }$$ 
In other words,
$$\operatorname { Ber } ( x | \theta ) = \left\{ \begin{array} { l l } { \theta } & { \text { if } x = 1 } \\ { 1 - \theta } & { \text { if } x = 0 } \end{array} \right.$$

+ Multinomial distribution: outcome of tossing a K-sided die n times 
$$\operatorname { Mu } ( \mathbf { x } | n , \boldsymbol { \theta } ) \triangleq \left( \begin{array} { c } { n } \\ { x _ { 1 } \ldots x _ { K } } \end{array} \right) \prod _ { j = 1 } ^ { K } \theta _ { j } ^ { x _ { j } }$$

where $\theta_j$ is the probability that side $j$ shows up, $x_j$ is the number of times side $j$ shows up, and:
$\left( \begin{array} { c } { n } \\ { x _ { 1 } \ldots x _ { K } } \end{array} \right) \triangleq \frac { n ! } { x _ { 1 } ! x _ { 2 } ! \cdots x _ { K } ! }$
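
The multinomial pmf can be evaluated directly from the definition. This is a minimal sketch; the function name `multinomial_pmf` is an assumption for illustration, and `x` is taken to be the tuple of counts $(x_1, \ldots, x_K)$ with $n = \sum_j x_j$.

```python
from math import comb, factorial, prod


def multinomial_pmf(x, theta):
    """Mu(x | n, theta) with n = sum(x); x holds the count of each side."""
    n = sum(x)
    # Multinomial coefficient n! / (x_1! x_2! ... x_K!), kept as an exact integer
    coef = factorial(n)
    for xj in x:
        coef //= factorial(xj)
    return coef * prod(t**xj for t, xj in zip(theta, x))


# Fair 3-sided die tossed 3 times, each side showing up once: 3! / 3^3
print(multinomial_pmf((1, 1, 1), (1 / 3, 1 / 3, 1 / 3)))

# With K = 2 the multinomial reduces to the binomial
print(multinomial_pmf((3, 7), (0.3, 0.7)), comb(10, 3) * 0.3**3 * 0.7**7)
```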

+ Multinoulli (Categorical) distribution: outcome of tossing a K-sided die 1 time, with $\mathbf{x}$ a one-hot vector ($x_j = 1$ if side $j$ shows up, 0 otherwise)
$$\operatorname { Cat } ( \mathbf { x } | \boldsymbol { \theta } ) \triangleq \operatorname { Mu } ( \mathbf { x } | 1 , \boldsymbol { \theta } ) = \prod _ { j = 1 } ^ { K } \theta _ { j } ^ { \mathbb { I } \left( x _ { j } = 1 \right) }$$

+ Poisson distribution: used for counting rare events such as radioactive decays or traffic accidents, with $X \in \{0, 1, 2, \ldots\}$ and rate $\lambda > 0$
$$\operatorname { Poi } ( x | \lambda ) = e ^ { - \lambda } \frac { \lambda ^ { x } } { x ! }$$
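
A short sketch of the Poisson pmf, with a numerical check that it sums to one and that its mean equals $\lambda$ (a standard property of the Poisson distribution). The name `poisson_pmf`, the value $\lambda = 2.5$, and the truncation at 60 terms are illustrative assumptions; the truncated tail is negligible for this $\lambda$.

```python
from math import exp, factorial


def poisson_pmf(x, lam):
    """Poi(x | lam) = exp(-lam) * lam^x / x!"""
    return exp(-lam) * lam**x / factorial(x)


lam = 2.5  # illustrative rate
pmf = [poisson_pmf(x, lam) for x in range(60)]  # truncated support

total = sum(pmf)                              # should be close to 1
mean = sum(x * p for x, p in enumerate(pmf))  # should be close to lam

print(total, mean)
```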

+ Empirical distribution:
Given dataset $D = \{x_1, \ldots, x_N\}$:
$$p _ { \mathrm { emp } } ( A ) \triangleq \frac { 1 } { N } \sum _ { i = 1 } ^ { N } \delta _ { x _ { i } } ( A )$$

where $\delta_x(A)$ is the Dirac measure: $\delta _ { x } ( A ) = \left\{ \begin{array} { l l } { 0 } & { \text { if } x \notin A } \\ { 1 } & { \text { if } x \in A } \end{array} \right.$
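
In code, the empirical distribution is simply the fraction of data points that fall in the set $A$. The sketch below assumes $A$ is given as a Python set and uses a small made-up dataset; the names `dirac` and `p_emp` are illustrative.

```python
def dirac(x, A):
    """Dirac measure: 1 if x is in A, otherwise 0."""
    return 1 if x in A else 0


def p_emp(data, A):
    """Empirical probability of A: the average of the Dirac measures."""
    return sum(dirac(x, A) for x in data) / len(data)


data = [1, 2, 2, 3, 3, 3]  # made-up dataset, N = 6
print(p_emp(data, {3}))     # 3 of 6 points -> 0.5
print(p_emp(data, {1, 2}))  # 3 of 6 points -> 0.5
print(p_emp(data, {4}))     # no points -> 0.0
```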

## Some common continuous distributions
+ Uniform distribution:
$$\operatorname { Unif } ( x | a , b ) = \frac { 1 } { b - a } \mathbb { I } ( a \leq x \leq b )$$

+ Gaussian (normal) distribution: most widely used distribution in statistics
$$\mathcal { N } ( x | \mu , \sigma ^ { 2 } ) \triangleq \frac { 1 } { \sqrt { 2 \pi \sigma ^ { 2 } } } e ^ { - \frac { 1 } { 2 \sigma ^ { 2 } } ( x - \mu ) ^ { 2 } }$$

where $\mu = E[X]$ is the mean, $\sigma^2 = \operatorname{Var}[X]$ is the variance, and $\lambda = 1/\sigma^2$ is the precision. A high precision means a narrow distribution (low variance) centered on $\mu$.

  **cdf**: $$\Phi \left( x ; \mu , \sigma ^ { 2 } \right) \triangleq \int _ { - \infty } ^ { x } \mathcal { N } ( z | \mu , \sigma ^ { 2 } ) d z$$

  **error function (erf)**: the cdf can be written in terms of erf as $$\Phi \left( x ; \mu , \sigma ^ { 2 } \right) = \frac { 1 } { 2 } [ 1 + \operatorname { erf } ( z / \sqrt { 2 } ) ]$$

where $z = (x - \mu)/\sigma$ and $\operatorname { erf } ( x ) \triangleq \frac { 2 } { \sqrt { \pi } } \int _ { 0 } ^ { x } e ^ { - t ^ { 2 } } d t$
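
The erf form of the Gaussian cdf translates directly into code. The sketch below uses `math.erf` from the Python standard library and cross-checks the result against `statistics.NormalDist`; the function name `normal_cdf` and the test points are illustrative. Note that the function takes the standard deviation $\sigma$, not the variance.

```python
from math import erf, sqrt
from statistics import NormalDist


def normal_cdf(x, mu=0.0, sigma=1.0):
    """Gaussian cdf via the error function; sigma is the standard deviation."""
    z = (x - mu) / sigma
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


# Compare with the standard library's own Gaussian cdf at a few points
for x, mu, sigma in [(0.0, 0.0, 1.0), (1.96, 0.0, 1.0), (3.0, 1.0, 2.0)]:
    print(x, normal_cdf(x, mu, sigma), NormalDist(mu, sigma).cdf(x))
```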
 
  **central limit theorem**: the sum of a large number of independent, identically distributed random variables with finite variance has an approximately Gaussian distribution
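
The central limit theorem can be illustrated with a small simulation. The sketch below sums $N$ independent $\operatorname{Unif}(0, 1)$ variables, standardizes the sums, and measures how often they fall within one and two standard deviations of the mean; for a Gaussian these fractions are roughly 0.68 and 0.95. The choices $N = 30$, 20000 trials, and the fixed seed are arbitrary.

```python
import random
from math import sqrt

random.seed(0)  # fixed seed so the run is repeatable

N = 30          # number of uniform variables per sum (arbitrary choice)
TRIALS = 20000  # number of simulated sums (arbitrary choice)

# Each Unif(0, 1) variable has mean 1/2 and variance 1/12,
# so the sum of N of them has mean N/2 and variance N/12.
mu_sum = N / 2
sigma_sum = sqrt(N / 12)

sums = [sum(random.random() for _ in range(N)) for _ in range(TRIALS)]
z = [(s - mu_sum) / sigma_sum for s in sums]  # standardized sums

frac_within_1 = sum(1 for v in z if abs(v) <= 1) / TRIALS
frac_within_2 = sum(1 for v in z if abs(v) <= 2) / TRIALS
print(frac_within_1, frac_within_2)
```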

+ Student t distribution (more robust to outliers than the normal distribution):
$$\mathcal { T } ( x | \mu , \sigma ^ { 2 } , \nu ) \propto \left[ 1 + \frac { 1 } { \nu } \left( \frac { x - \mu } { \sigma } \right) ^ { 2 } \right] ^ { - \left( \frac { \nu + 1 } { 2 } \right) }$$

where $\mu$ is the location, $\sigma > 0$ is the scale parameter, and $\nu > 0$ is the degrees of freedom:
$$\operatorname { mean } = \mu , \operatorname { mode } = \mu , \operatorname { var } = \frac { \nu \sigma ^ { 2 } } { ( \nu - 2 ) }$$

The mean is defined only for $\nu > 1$ and the variance only for $\nu > 2$.

+ The Laplace distribution (heavy-tailed; also called the double-sided exponential distribution):
$$\operatorname { Lap } ( x | \mu , b ) \triangleq \frac { 1 } { 2 b } \exp \left( - \frac { | x - \mu | } { b } \right)$$

where $\mu$ is the location, $b > 0$ is the scale parameter: $$\text { mean } = \mu , \text { mode } = \mu , \text { var } = 2 b ^ { 2 }$$

![Gaussian, Student-t and Laplace](../images/2.gaussian.png)
![Gaussian, Student-t and Laplace: robustness to noise](../images/2.robust.png)

+ The gamma distribution: for $T > 0$, with shape $a > 0$ and rate $b > 0$:
$$\mathrm { Ga } ( T | \text { shape } = a , \text { rate } = b ) \triangleq \frac { b ^ { a } } { \Gamma ( a ) } T ^ { a - 1 } e ^ { - T b }$$

where $\Gamma(a)$ is the gamma function: $\Gamma ( x ) \triangleq \int _ { 0 } ^ { \infty } u ^ { x - 1 } e ^ { - u } d u$

$$\text { mean } = \frac { a } { b } , \text { mode } = \frac { a - 1 } { b } \text { (for } a \geq 1 \text { )} , \text { var } = \frac { a } { b ^ { 2 } }$$
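
The gamma moments can be checked numerically. The sketch below evaluates the density with `math.gamma` and approximates the integrals with a midpoint Riemann sum; the values $a = 3$, $b = 2$, the step size, and the truncation of the integral at $T = 40$ are assumptions made for illustration.

```python
from math import exp, gamma


def gamma_pdf(t, a, b):
    """Ga(t | shape = a, rate = b) for t > 0."""
    return b**a / gamma(a) * t ** (a - 1) * exp(-b * t)


a, b = 3.0, 2.0  # illustrative shape and rate
h = 1e-3         # step of the midpoint rule
grid = [(i + 0.5) * h for i in range(40000)]  # midpoints covering (0, 40)
dens = [gamma_pdf(t, a, b) for t in grid]

total = sum(dens) * h                                          # close to 1
mean = sum(t * d for t, d in zip(grid, dens)) * h              # close to a / b
var = sum((t - mean) ** 2 * d for t, d in zip(grid, dens)) * h  # close to a / b^2
mode = grid[dens.index(max(dens))]                             # close to (a - 1) / b

print(total, mean, var, mode)
```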