# Probability and Distributions

## Nomenclature for probability distributions

| Type | "Point probability" | "Interval probability" |
| --- | --- | --- |
| Discrete | *P(X = x)* <br>Probability mass function (pmf) | Not applicable |
| Continuous | *p(x)* <br>Probability density function (pdf) | *P(X ≤ x)* <br>Cumulative distribution function (cdf) |

## Sum Rule, Product Rule and Bayes' Theorem

Sum rule:

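$$
p({\bf x}) = \begin{cases}
    \sum \limits _{{\bf y} \in \mathcal{Y}} p({\bf x},{\bf y}) & \textrm{if } {\bf y} \textrm{ is discrete} \\
    \int _{\mathcal{Y}} p({\bf x},{\bf y}) \, d{\bf y} & \textrm{if } {\bf y} \textrm{ is continuous}
\end{cases}
$$

where $\mathcal{Y}$ is the set of states of the random variable $Y$.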

Product rule:

$$
p({\bf x},{\bf y}) = p({\bf y}|{\bf x})p({\bf x})
$$

Bayes' theorem:

$$
p({\bf x}|{\bf y}) = \frac {p({\bf y}|{\bf x})p({\bf x})}{p({\bf y})}
$$
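
The three rules can be checked numerically on a small discrete example. The sketch below assumes a made-up joint pmf over $x \in \{0, 1\}$ and $y \in \{0, 1, 2\}$; the table values and all variable names are illustrative and not taken from the source.

```python
from fractions import Fraction as F

# Hypothetical joint pmf p(x, y); the six entries sum to 1.
joint = {
    (0, 0): F(1, 10), (0, 1): F(2, 10), (0, 2): F(1, 10),
    (1, 0): F(3, 10), (1, 1): F(1, 10), (1, 2): F(2, 10),
}
xs = sorted({x for x, _ in joint})
ys = sorted({y for _, y in joint})

# Sum rule: marginalise out the other variable.
p_x = {x: sum(joint[(x, y)] for y in ys) for x in xs}
p_y = {y: sum(joint[(x, y)] for x in xs) for y in ys}

# Product rule, rearranged: p(y | x) = p(x, y) / p(x).
p_y_given_x = {(y, x): joint[(x, y)] / p_x[x] for x in xs for y in ys}

# Bayes' theorem: p(x | y) = p(y | x) p(x) / p(y).
p_x_given_y = {
    (x, y): p_y_given_x[(y, x)] * p_x[x] / p_y[y] for x in xs for y in ys
}

print(p_x)                  # marginal of x
print(p_x_given_y[(1, 0)])  # posterior p(x=1 | y=0)
```

Exact fractions are used instead of floats so that the identities hold without rounding error.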

## Means and Covariances

**Expected Value:** The expected value of a function $g : \mathbb{R} \rightarrow \mathbb{R}$ of a univariate continuous random variable $X \sim p(x)$ is given by:

$$
\mathbb{E}_X [g(x)] = \int _{\mathcal{X}} g(x)p(x)dx
$$

The expected value of a function $g$ of a discrete random variable $X \sim p(x)$ is given by:

$$
\mathbb{E}_X [g(x)] = \sum \limits _{x \in \mathcal{X}} g(x)p(x)
$$

where $\mathcal{X}$ is the set of possible outcomes (the target space) of the random variable $X$.
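
As a small illustration of the discrete case, the sketch below evaluates the sum for a fair six-sided die. The helper name `expectation` and the die example are assumptions made for this illustration.

```python
from fractions import Fraction as F

def expectation(pmf, g=lambda x: x):
    """Return E[g(x)] = sum of g(x) * p(x) over the outcomes of a discrete pmf.

    `pmf` is a dict mapping each outcome to its probability.
    """
    return sum(g(x) * p for x, p in pmf.items())

# Fair six-sided die: every face has probability 1/6.
die = {face: F(1, 6) for face in range(1, 7)}

mean = expectation(die)                            # g(x) = x
second_moment = expectation(die, lambda x: x * x)  # g(x) = x^2

print(mean, second_moment)
```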

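**Mean:** The mean of a random variable $X$ with states $x \in \mathbb{R}^D$ is an average, defined component-wise as:

$$
\mathbb{E}_X [x] = \begin{bmatrix} \mathbb{E}_{X_1}[x_1] \\ \vdots \\ \mathbb{E}_{X_D}[x_D] \end{bmatrix} \in \mathbb{R}^D
$$

where, for $d = 1, \dots, D$:

$$
\mathbb{E} _{X_d} [x_d] := \begin{cases}
    \int _{\mathcal{X}} x_d p(x_d)dx_d & \textrm{if $X$ is a continuous random variable} \\
    \sum \limits _{x_i \in \mathcal{X}} x_i p(x_d=x_i) & \textrm{if $X$ is a discrete random variable}
\end{cases}
$$

Here the subscript $d$ denotes the $d$-th dimension of $x$, and the integral and sum run over the states $\mathcal{X}$ of the target space of $X$.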

**Univariate Covariance:** The covariance between two univariate random variables $X,Y \in \mathbb{R}$ is given by the expected product of their deviations from their respective means:

$$
Cov[x,y] = \mathbb{E}[(x - \mathbb{E}[x])(y - \mathbb{E}[y])] = \mathbb{E}[xy] - \mathbb{E}[x]\mathbb{E}[y]
$$
 
**Multivariate Covariance:** For two multivariate random variables $X$ and $Y$ with states $x \in \mathbb{R}^D$ and $y \in \mathbb{R}^E$, the covariance between $X$ and $Y$ is defined as:

$$
Cov[x,y] = \mathbb{E}[xy^T] - \mathbb{E}[x]\mathbb{E}[y]^T = Cov[y,x]^T \in \mathbb{R}^{D \times E}
$$

**Variance:** The variance of a random variable $X$ with states $x \in \mathbb{R}^D$ and a mean vector $\mu \in \mathbb{R}^D$ is defined as:

$$
\mathbb{V}_X [x] = Cov_X [x,x] = \mathbb{E} _X [(x - \mu)(x - \mu)^T] = \mathbb{E}_X[xx^T] - \mathbb{E}_X [x] \mathbb{E}_X [x]^T
$$

**Correlation:** The correlation between two random variables $X, Y$ is given by:

$$
corr[x,y] = \frac{Cov[x,y]}{\sqrt{\mathbb{V}[x]\mathbb{V}[y]}} \in [-1,1]
$$
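
For two univariate discrete variables, covariance and correlation can be computed directly from a joint pmf. The following is a minimal sketch under the assumption of a made-up joint pmf over binary $x$ and $y$; the helper `expect` is hypothetical. It evaluates the covariance both as the expected product of deviations and as $\mathbb{E}[xy] - \mathbb{E}[x]\mathbb{E}[y]$.

```python
import math
from fractions import Fraction as F

# Hypothetical joint pmf p(x, y) in which x and y tend to agree.
joint = {(0, 0): F(2, 5), (0, 1): F(1, 10), (1, 0): F(1, 10), (1, 1): F(2, 5)}

def expect(g):
    """Return E[g(x, y)] under the joint pmf above."""
    return sum(g(x, y) * p for (x, y), p in joint.items())

mean_x = expect(lambda x, y: x)
mean_y = expect(lambda x, y: y)

# Covariance as the expected product of deviations from the means ...
cov_deviation = expect(lambda x, y: (x - mean_x) * (y - mean_y))
# ... and as E[xy] - E[x]E[y]; both forms agree.
cov_shortcut = expect(lambda x, y: x * y) - mean_x * mean_y

var_x = expect(lambda x, y: (x - mean_x) ** 2)
var_y = expect(lambda x, y: (y - mean_y) ** 2)

# Correlation: covariance normalised by the two standard deviations.
corr = float(cov_shortcut) / math.sqrt(float(var_x * var_y))
print(cov_shortcut, corr)
```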

## Empirical Means and Covariances

**Empirical Mean:** The empirical mean vector is the arithmetic average of the observations for each variable and it is defined as:

$$
{\bar x} := \frac{1}{N} \sum \limits _{n=1}^{N} x_n
$$
 where $x_n \in \mathbb{R}^D$.

**Empirical Covariance:** The empirical covariance matrix is a $D \times D$ matrix defined as:

$$
\Sigma := \frac{1}{N} \sum \limits _{n=1}^{N} (x_n-{\bar x})(x_n-{\bar x})^T
$$
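
Both empirical quantities can be sketched in a few lines of plain Python. The function names and the toy dataset below are assumptions for illustration. The code uses the $\frac{1}{N}$ normalisation from the formula above; some libraries default to $\frac{1}{N-1}$ instead, so results may differ slightly.

```python
def empirical_mean(data):
    """Arithmetic average of N observations, each a list of D numbers."""
    n = len(data)
    d = len(data[0])
    return [sum(x[j] for x in data) / n for j in range(d)]

def empirical_covariance(data):
    """D x D empirical covariance matrix with 1/N normalisation."""
    n = len(data)
    mean = empirical_mean(data)
    d = len(mean)
    # Subtract the mean from every observation.
    centered = [[x[j] - mean[j] for j in range(d)] for x in data]
    # Entry (i, j) averages the products of the i-th and j-th deviations.
    return [
        [sum(c[i] * c[j] for c in centered) / n for j in range(d)]
        for i in range(d)
    ]

# Toy dataset with N = 3 observations in D = 2 dimensions.
data = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]
x_bar = empirical_mean(data)
sigma = empirical_covariance(data)
print(x_bar)
print(sigma)
```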

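**Variance:** For a univariate random variable $X$ with mean $\mu = \mathbb{E}_X[x]$, the variance is a scalar:

$$
\mathbb{V}_X[x] := \mathbb{E}_X[(x - \mu)^2] = \mathbb{E}_X[x^2] - (\mathbb{E}_X[x])^2
$$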
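
The second equality follows from expanding the square and using the linearity of expectation. A short derivation, writing $\mu = \mathbb{E}_X[x]$:

$$
\begin{aligned}
\mathbb{E}_X[(x - \mu)^2] &= \mathbb{E}_X[x^2 - 2\mu x + \mu^2] \\
&= \mathbb{E}_X[x^2] - 2\mu \mathbb{E}_X[x] + \mu^2 \\
&= \mathbb{E}_X[x^2] - 2\mu^2 + \mu^2 \\
&= \mathbb{E}_X[x^2] - (\mathbb{E}_X[x])^2
\end{aligned}
$$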

## Sums and Transformations of Random Variables

For two random variables $X,Y$ with states $x,y \in \mathbb{R}^D$ the following holds:

$$
\begin{aligned}
\mathbb{E}[x+y] &= \mathbb{E}[x] + \mathbb{E}[y] \\
\mathbb{E}[x-y] &= \mathbb{E}[x] - \mathbb{E}[y] \\
\mathbb{V}[x+y] &= \mathbb{V}[x] + \mathbb{V}[y] + Cov[x,y] + Cov[y,x] \\
\mathbb{V}[x-y] &= \mathbb{V}[x] + \mathbb{V}[y] - Cov[x,y] - Cov[y,x]
\end{aligned}
$$
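
The variance identities also hold exactly for empirical (sample) quantities with $\frac{1}{N}$ normalisation, which makes them easy to check numerically. The sketch below assumes two short, made-up univariate samples; `mean` and `cov` are hypothetical helper names.

```python
def mean(values):
    return sum(values) / len(values)

def cov(a, b):
    """Empirical covariance of two equally long samples (1/N normalisation)."""
    mean_a, mean_b = mean(a), mean(b)
    return sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b)) / len(a)

# Made-up paired observations of x and y.
x = [1.0, 2.0, 4.0, 7.0]
y = [3.0, 1.0, 5.0, 2.0]

sums = [xi + yi for xi, yi in zip(x, y)]
diffs = [xi - yi for xi, yi in zip(x, y)]

# V[x + y] = V[x] + V[y] + Cov[x, y] + Cov[y, x]
lhs_sum = cov(sums, sums)
rhs_sum = cov(x, x) + cov(y, y) + cov(x, y) + cov(y, x)

# V[x - y] = V[x] + V[y] - Cov[x, y] - Cov[y, x]
lhs_diff = cov(diffs, diffs)
rhs_diff = cov(x, x) + cov(y, y) - cov(x, y) - cov(y, x)

print(lhs_sum, rhs_sum)
print(lhs_diff, rhs_diff)
```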

## Statistical Independence

Two random variables $X,Y$ are statistically independent if and only if:

$$
p(x,y) = p(x)p(y)
$$

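If $X$ and $Y$ are statistically independent, the following properties hold:

$$
\begin{aligned}
p(y|x) &= p(y) \\
p(x|y) &= p(x) \\
\mathbb{V}_{X,Y}[x+y] &= \mathbb{V}_X[x] + \mathbb{V}_Y[y] \\
Cov_{X,Y}[x,y] &= 0
\end{aligned}
$$

The converse of the last property does not hold in general: two random variables can have zero covariance and still be statistically dependent.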
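
A small numerical sketch can make the relationship between independence and covariance concrete. It assumes two made-up discrete joint pmfs: one built as a product of marginals, and one where $y = x^2$ with $x$ uniform on $\{-1, 0, 1\}$. The second has zero covariance although $y$ is fully determined by $x$. All function names are hypothetical.

```python
from fractions import Fraction as F

def marginals(joint):
    """Apply the sum rule to a joint pmf given as {(x, y): probability}."""
    p_x, p_y = {}, {}
    for (x, y), p in joint.items():
        p_x[x] = p_x.get(x, 0) + p
        p_y[y] = p_y.get(y, 0) + p
    return p_x, p_y

def is_independent(joint):
    """Check p(x, y) == p(x) p(y) for every pair of states."""
    p_x, p_y = marginals(joint)
    return all(
        joint.get((x, y), 0) == p_x[x] * p_y[y] for x in p_x for y in p_y
    )

def covariance(joint):
    """Cov[x, y] = E[xy] - E[x]E[y] for a discrete joint pmf."""
    e_x = sum(x * p for (x, _), p in joint.items())
    e_y = sum(y * p for (_, y), p in joint.items())
    e_xy = sum(x * y * p for (x, y), p in joint.items())
    return e_xy - e_x * e_y

# Independent by construction: the joint is the product of two marginals.
marg_x = {0: F(1, 4), 1: F(3, 4)}
marg_y = {0: F(1, 3), 1: F(2, 3)}
independent = {(x, y): marg_x[x] * marg_y[y] for x in marg_x for y in marg_y}

# Dependent but uncorrelated: x uniform on {-1, 0, 1} and y = x^2.
dependent = {(-1, 1): F(1, 3), (0, 0): F(1, 3), (1, 1): F(1, 3)}

print(is_independent(independent), covariance(independent))
print(is_independent(dependent), covariance(dependent))
```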

## Sources

- [Deisenroth, M. P., Faisal, A. A., Ong, C. S. (2020). Mathematics for Machine Learning. Cambridge University Press.](https://mml-book.github.io/book/mml-book.pdf)