```python
# Name: Chaeyoon Kim
# City Email: Chaeyoon.Kim@city.ac.uk
# Reference: Chris Bishop, "Pattern Recognition and Machine Learning", Springer, 2006 (https://g.co/kgs/CsLSX8)

import numpy as np
```

# Intro

Let $\mathbf{X}$ be a random variable giving the side a coin lands on, with range
$R_{\mathbf{x}} = \left\{\text{Heads}, \text{Tails} \right\}$

The probability of each possible value is $P_{\mathbf{x}}(x) =\left\{\begin{array}{rl}{0.5} & {\text{ if } x=\text{Heads}} \\ {0.5} & {\text{ if } x=\text{Tails}} \end{array}\right.$
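One way to see what $P_{\mathbf{x}}$ means is to sample from it. The sketch below assumes NumPy's `default_rng` generator with an arbitrary seed and sample size; it draws fair coin flips and compares the observed frequency of Heads with $P_{\mathbf{x}}(\text{Heads}) = 0.5$.

```python
import numpy as np

# Fair-coin distribution P_x over the range R_x = {Heads, Tails}
outcomes = ["Heads", "Tails"]
probs = [0.5, 0.5]

rng = np.random.default_rng(seed=0)  # arbitrary seed, for reproducibility
flips = rng.choice(outcomes, size=10_000, p=probs)

# Empirical frequency of Heads; should be close to P_x(Heads) = 0.5
freq_heads = np.mean(flips == "Heads")
print(freq_heads)
```

With more flips the frequency settles closer to $0.5$; the exact value printed depends on the seed.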

Let $\mathbf{X}$ be a random variable giving the sides shown by two coins, with range
$R_{\mathbf{x}} = \left\{\left(\begin{array}{l} {T} \\ {T}\end{array}\right), \left(\begin{array}{l} {T} \\ {H}\end{array}\right), \left(\begin{array}{l} {H} \\ {T}\end{array}\right), \left(\begin{array}{l} {H} \\ {H}\end{array}\right)\right\}$

For two fair coins, the probability of each of these values (in the order listed) is $P_{\mathbf{x}} = \left\{0.25, 0.25, 0.25, 0.25\right\}$
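The four outcomes and their probabilities can be enumerated directly. This sketch assumes the two coins are fair and flipped independently, so each pair has probability $0.5 \times 0.5 = 0.25$; `itertools.product` lists the pairs in the same order as $R_{\mathbf{x}}$ above.

```python
from itertools import product

p_single = {"T": 0.5, "H": 0.5}  # fair coin; flips assumed independent

# Each outcome is a pair (first coin, second coin); independence => multiply
p_pair = {
    (a, b): p_single[a] * p_single[b]
    for a, b in product(["T", "H"], repeat=2)
}
print(p_pair)
# {('T', 'T'): 0.25, ('T', 'H'): 0.25, ('H', 'T'): 0.25, ('H', 'H'): 0.25}
```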

Let $\mathbf{X}$ be a random variable for the winner of the 2020 presidential election. The possible values this random variable can take are $R_{\mathbf{x}} = \left\{ A, B \right\}$

The probability of each possible value is $P_{\mathbf{x}}(x) =\left\{\begin{array}{rl}{0.1} & {\text{ if } x=A} \\ {0.9} & {\text{ if } x=B} \end{array}\right.$

Let $\mathbf{X}$ be a random variable describing the weather on Wednesday, answering 'Is it going to rain on Wednesday?'. The range of values $\mathbf{X}$ can take is $R_{\mathbf{x}} = \left\{ \text{rain}, \text{sunny}, \text{snow} \right\}$

The probability of each possible value is $P_{\mathbf{x}}(x) =\left\{\begin{array}{rl}{0.3} & {\text{ for } x=\text{rain}} \\ {0.3} & {\text{ for } x=\text{sunny}} \\ {0.4} & {\text{ for } x=\text{snow}} \end{array}\right.$, and these probabilities sum to $0.3+0.3+0.4 = 1$.

Encoding the outcomes as rain $=1$, sunny $=2$, snow $=3$, the expectation weights each value by its probability:
$\mathbb{E}[\mathbf{X}] = \sum_{x \in R_{\mathbf{x}}}{xP_{\mathbf{x}}(x)} = \sum_{x \in \{1,2,3\}}{x\,P(\mathbf{X}=x)} = 1\times P(\mathbf{X}=1)+2\times P(\mathbf{X}=2)+3\times P(\mathbf{X}=3)=0.3+0.6+1.2 = 2.1$
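The same computation in NumPy, assuming the numeric encoding rain $=1$, sunny $=2$, snow $=3$. Because the outcomes are labels, the value of $\sum_x x P_{\mathbf{x}}(x)$ depends entirely on this choice of encoding, whereas $\sum_x P_{\mathbf{x}}(x) = 1$ holds for any valid distribution.

```python
import numpy as np

# Assumed numeric encoding of R_x: rain = 1, sunny = 2, snow = 3
values = np.array([1, 2, 3])
probs = np.array([0.3, 0.3, 0.4])  # P_x(rain), P_x(sunny), P_x(snow)

total_prob = probs.sum()              # a valid pmf sums to 1
expectation = np.dot(values, probs)   # sum over x of x * P_x(x)

print(total_prob, expectation)
```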

# Expectations

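The average value of some function $f(x)$ under a probability distribution $p(x)$ is called the expectation of $f(x)$ and is denoted by $\mathbb{E}[f]$. For a discrete distribution it is the weighted average $\mathbb{E}[f] = \sum_x p(x)f(x)$. For example, suppose $X=\left\{1, 2, 2, 3, 3, 3\right\}$, so that $p(1)=\frac{1}{6}$, $p(2)=\frac{2}{6}$ and $p(3)=\frac{3}{6}$. Then $\mathbb{E}[\mathbf{X}] = 1\times\frac{1}{6}+2\times\frac{2}{6}+3\times\frac{3}{6}=\frac{14}{6}\approx 2.33$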
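To check the arithmetic, the expectation of the sample $X=\{1,2,2,3,3,3\}$ can be computed two ways: as the plain mean of the six values, and as a weighted sum in which each distinct value is weighted by its relative frequency. This sketch uses NumPy's `np.unique(..., return_counts=True)` to obtain those frequencies.

```python
import numpy as np

X = np.array([1, 2, 2, 3, 3, 3])

# Empirical distribution p(x): relative frequency of each distinct value
values, counts = np.unique(X, return_counts=True)
p = counts / counts.sum()          # [1/6, 2/6, 3/6]

weighted = np.sum(values * p)      # sum over x of x * p(x)
plain_mean = X.mean()              # the same quantity as an ordinary average

print(values, p, weighted, plain_mean)
```

Both routes give $14/6 \approx 2.33$, because averaging the raw sample automatically weights each value by how often it appears.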

Expectation is linear for every distribution. Recall that a function $g:\mathbb{R} \rightarrow \mathbb{R}$ is linear if the following two properties hold: <br>
(1) for any $\lambda\in\mathbb{R}$, $g({\lambda}x)={\lambda}{g(x)}$ <br>
(2) for any $a, b \in \mathbb{R}$, $g(a+b) = g(a) + g(b)$

Expectation satisfies both properties, directly from its definition: <br>
(1) $\mathbb{E}[\lambda\mathbf{X}]= \sum_{x \in R_{\mathbf{x}}}{\lambda}{x}{P_{\mathbf{x}}(x)} = {\lambda}\sum_{x \in R_{\mathbf{x}}}{x}{P_{\mathbf{x}}(x)} = \lambda\mathbb{E}[\mathbf{X}]$ <br>
(2) using the marginals $\sum_{y \in R_{\mathbf{y}}} P_{\mathbf{x},\mathbf{y}}(x,y) = P_{\mathbf{x}}(x)$ and $\sum_{x \in R_{\mathbf{x}}} P_{\mathbf{x},\mathbf{y}}(x,y) = P_{\mathbf{y}}(y)$, <br>
$\mathbb{E}[\mathbf{X}+\mathbf{Y}] = \sum_{x \in R_{\mathbf{x}}}\sum_{y \in R_{\mathbf{y}}}(x+y)P_{\mathbf{x},\mathbf{y}}(x,y) = \sum_{x \in R_{\mathbf{x}}}x\sum_{y \in R_{\mathbf{y}}}P_{\mathbf{x},\mathbf{y}}(x,y) + \sum_{y \in R_{\mathbf{y}}}y\sum_{x \in R_{\mathbf{x}}}P_{\mathbf{x},\mathbf{y}}(x,y) = \sum_{x \in R_{\mathbf{x}}}{x}{P_{\mathbf{x}}(x)} + \sum_{y \in R_{\mathbf{y}}}{y}{P_{\mathbf{y}}(y)} = \mathbb{E}[\mathbf{X}]+\mathbb{E}[\mathbf{Y}]$ <br>
This holds whether or not $\mathbf{X}$ and $\mathbf{Y}$ are independent.
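Both linearity properties can be checked on a small example. The sketch below uses a made-up joint pmf $P_{\mathbf{x},\mathbf{y}}(x,y)$ in which $\mathbf{X}$ and $\mathbf{Y}$ are deliberately dependent; the values and probabilities are illustrative assumptions. It compares $\mathbb{E}[\mathbf{X}+\mathbf{Y}]$ computed from the joint table with $\mathbb{E}[\mathbf{X}]+\mathbb{E}[\mathbf{Y}]$ computed from the marginals, and $\mathbb{E}[\lambda\mathbf{X}]$ with $\lambda\mathbb{E}[\mathbf{X}]$.

```python
import numpy as np

x_vals = np.array([0, 1])
y_vals = np.array([0, 1, 2])

# Illustrative joint pmf P(x, y): rows index x, columns index y
joint = np.array([
    [0.10, 0.20, 0.05],
    [0.30, 0.05, 0.30],
])
assert np.isclose(joint.sum(), 1.0)

# E[X + Y] directly from the joint distribution
X_grid, Y_grid = np.meshgrid(x_vals, y_vals, indexing="ij")
e_sum = np.sum((X_grid + Y_grid) * joint)

# Marginals: sum out the other variable
p_x = joint.sum(axis=1)
p_y = joint.sum(axis=0)
e_x = np.dot(x_vals, p_x)
e_y = np.dot(y_vals, p_y)

# Homogeneity: E[lam * X] = lam * E[X]
lam = 3.0
e_lam_x = np.dot(lam * x_vals, p_x)

print(e_sum, e_x + e_y, e_lam_x, lam * e_x)
```

Here $P_{\mathbf{x},\mathbf{y}}(0,0) = 0.10$ differs from $P_{\mathbf{x}}(0)P_{\mathbf{y}}(0) = 0.35 \times 0.4 = 0.14$, so the variables are not independent, yet additivity still holds.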

# Covariances

For random vectors $\mathbf{x}$ and $\mathbf{y}$, the covariance is defined as <br>
$Cov(\mathbf{x},\mathbf{y}) = \mathbb{E}_{\mathbf{x},\mathbf{y}}[(\mathbf{x}-\mathbb{E}_\mathbf{x}[\mathbf{x}])(\mathbf{y}^\top-\mathbb{E}_\mathbf{y}[\mathbf{y}^\top])]$ <br><br>
Expanding the product gives <br>
$\qquad\qquad = \mathbb{E}_{\mathbf{x},\mathbf{y}}[\mathbf{x}\mathbf{y}^\top - \mathbf{x}\mathbb{E}_\mathbf{y}[\mathbf{y}^\top] - \mathbb{E}_\mathbf{x}[\mathbf{x}]\mathbf{y}^\top + \mathbb{E}_\mathbf{x}[\mathbf{x}]\mathbb{E}_\mathbf{y}[\mathbf{y}^\top]]$ <br><br>
and, by linearity of expectation, <br>
$\qquad\qquad = \mathbb{E}_{\mathbf{x},\mathbf{y}}[\mathbf{x}\mathbf{y}^\top] - \mathbb{E}_{\mathbf{x},\mathbf{y}}[\mathbf{x}\mathbb{E}_\mathbf{y}[\mathbf{y}^\top]] - \mathbb{E}_{\mathbf{x},\mathbf{y}}[\mathbb{E}_\mathbf{x}[\mathbf{x}]\mathbf{y}^\top] + \mathbb{E}_{\mathbf{x},\mathbf{y}}[\mathbb{E}_\mathbf{x}[\mathbf{x}]\mathbb{E}_\mathbf{y}[\mathbf{y}^\top]]$ <br><br>
Once an expectation such as $\mathbb{E}_\mathbf{y}[\mathbf{y}^\top]$ has been taken, it is a constant rather than a random variable, so it can be pulled outside the outer expectation. The terms therefore simplify as follows:

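(1) $\mathbb{E}_{\mathbf{x},\mathbf{y}}[\mathbf{x}\mathbf{y}^\top]$ stays as it is. For two-dimensional vectors it is $\mathbb{E}\left[\begin{array}{cc}{\mathbf{x}_1\mathbf{y}_1} & {\mathbf{x}_1\mathbf{y}_2} \\ {\mathbf{x}_2\mathbf{y}_1} & {\mathbf{x}_2\mathbf{y}_2} \end{array} \right]$, where the expectation of a vector or matrix is taken element by element [note: e.g. if $\mathbb{E}[\mathbf{X}_1] = \mathbb{E}[\mathbf{X}_2] = \frac{1}{2}$, then $\mathbb{E}\left[\begin{array}{c}{\mathbf{X}_1} \\ {\mathbf{X}_2} \end{array}\right] = \left[\begin{array}{c}{\mathbb{E}[\mathbf{X}_1]} \\ {\mathbb{E}[\mathbf{X}_2]} \end{array}\right] = \left[\begin{array}{c}{\frac{1}{2}} \\ {\frac{1}{2}} \end{array}\right]$]<br>
(2) $\mathbb{E}_{\mathbf{x},\mathbf{y}}[\mathbf{x}\mathbb{E}_\mathbf{y}[\mathbf{y}^\top]] = \mathbb{E}[\mathbf{x}]\mathbb{E}[\mathbf{y}^\top]$ <br>
(3) $\mathbb{E}_{\mathbf{x},\mathbf{y}}[\mathbb{E}_\mathbf{x}[\mathbf{x}]\mathbf{y}^\top] = \mathbb{E}[\mathbf{x}]\mathbb{E}[\mathbf{y}^\top]$ <br>
(4) $\mathbb{E}_{\mathbf{x},\mathbf{y}}[\mathbb{E}_\mathbf{x}[\mathbf{x}]\mathbb{E}_\mathbf{y}[\mathbf{y}^\top]] = \mathbb{E}[\mathbf{x}]\mathbb{E}[\mathbf{y}^\top]$ <br><br>
Substituting, the last three terms combine as $-\mathbb{E}[\mathbf{x}]\mathbb{E}[\mathbf{y}^\top] - \mathbb{E}[\mathbf{x}]\mathbb{E}[\mathbf{y}^\top] + \mathbb{E}[\mathbf{x}]\mathbb{E}[\mathbf{y}^\top] = -\mathbb{E}[\mathbf{x}]\mathbb{E}[\mathbf{y}^\top]$, giving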

$\qquad\qquad = \mathbb{E}_{\mathbf{x},\mathbf{y}}[\mathbf{x}\mathbf{y^\top}] - \mathbb{E}_\mathbf{x}[\mathbf{x}]\mathbb{E}_\mathbf{y}[\mathbf{y^\top}]$
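The identity $Cov(\mathbf{x},\mathbf{y}) = \mathbb{E}[\mathbf{x}\mathbf{y}^\top] - \mathbb{E}[\mathbf{x}]\mathbb{E}[\mathbf{y}^\top]$ can be checked numerically. In this sketch the expectation is replaced by an average over $N$ randomly generated samples (the empirical distribution), which is an assumption for illustration. Because the derivation holds for any distribution, including an empirical one, the two sides agree up to floating-point error.

```python
import numpy as np

rng = np.random.default_rng(seed=1)   # arbitrary seed
N = 1000
xs = rng.normal(size=(N, 2))          # N samples of a 2-D vector x
# y is built from x plus noise, so x and y are correlated
ys = xs @ np.array([[1.0, 0.5], [0.0, 2.0]]) + rng.normal(size=(N, 2))

# Empirical expectations: average over the N samples
e_x = xs.mean(axis=0)                 # E[x], shape (2,)
e_y = ys.mean(axis=0)                 # E[y], shape (2,)

# Definition: E[(x - E[x])(y - E[y])^T], averaging the outer products
lhs = np.einsum("ni,nj->ij", xs - e_x, ys - e_y) / N

# Expanded form: E[x y^T] - E[x] E[y]^T
rhs = np.einsum("ni,nj->ij", xs, ys) / N - np.outer(e_x, e_y)

print(np.allclose(lhs, rhs))   # True
```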

For two scalar random variables $x$ and $y$, the correlation coefficient normalises the covariance by the standard deviations and always lies between $-1$ and $1$: $-1 \leq \rho(x,y) = \frac{Cov(x,y)}{\sqrt{Var(x)Var(y)}} \leq 1$
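For scalar variables, the correlation coefficient can be computed from the covariance and variances and compared with NumPy's `np.corrcoef`. The data below are made up for illustration. Population (divide-by-$N$) moments are used for both the covariance and the variances; the normalisation cancels in the ratio, so the result matches `np.corrcoef`.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.1, 5.9, 8.2, 9.8])   # illustrative data, roughly y = 2x

cov_xy = np.mean((x - x.mean()) * (y - y.mean()))
rho = cov_xy / np.sqrt(x.var() * y.var())  # np.var divides by N by default

print(rho, np.corrcoef(x, y)[0, 1])
```

Since $y$ is nearly a positive multiple of $x$, $\rho$ comes out close to $1$; the covariance itself ($3.94$ here) has no such bound and changes if either variable is rescaled.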

For two-dimensional vectors, $Cov(\mathbf{x},\mathbf{y})$ is the matrix of pairwise covariances between the components: $Cov(\mathbf{x},\mathbf{y}) = \left[\begin{array}{cc}{Cov(\mathbf{x}_1,\mathbf{y}_1)} & {Cov(\mathbf{x}_1,\mathbf{y}_2)} \\ {Cov(\mathbf{x}_2,\mathbf{y}_1)} & {Cov(\mathbf{x}_2,\mathbf{y}_2)} \end{array}\right] = \mathbb{E}_{\mathbf{x},\mathbf{y}}[(\mathbf{x}-\mathbb{E}_\mathbf{x}[\mathbf{x}])(\mathbf{y}^\top-\mathbb{E}_\mathbf{y}[\mathbf{y}^\top])]$
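NumPy's `np.cov` can produce the same matrix. Given two arrays of shape (2, N), with each row a variable and each column an observation, it stacks them into four variables and returns a $4\times4$ matrix whose top-right $2\times2$ block is $Cov(\mathbf{x},\mathbf{y})$. The sketch passes `bias=True` so NumPy divides by $N$, matching an expectation over the empirical distribution; the data are randomly generated for illustration.

```python
import numpy as np

rng = np.random.default_rng(seed=2)   # arbitrary seed
N = 500
x = rng.normal(size=(2, N))                     # rows: x_1, x_2; columns: samples
y = np.array([x[0] + 0.1 * rng.normal(size=N),  # y_1 closely tied to x_1
              rng.normal(size=N)])              # y_2 generated independently of x

# Entry (i, j) is Cov(x_i, y_j) = E[(x_i - E[x_i])(y_j - E[y_j])]
xc = x - x.mean(axis=1, keepdims=True)
yc = y - y.mean(axis=1, keepdims=True)
cross_cov = xc @ yc.T / N

# np.cov stacks x and y into 4 variables; take the x-by-y block
full = np.cov(x, y, bias=True)
print(np.allclose(cross_cov, full[:2, 2:]))   # True
```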

## Computing the $\mathbf{x}\mathbf{y}^\top$ term of the covariance for vectors

If $\mathbf{x} = [1, 2, 3]$ and $\mathbf{y} = [2, 3, 1]$ are row vectors, then $\mathbf{x}\mathbf{y}^\top = 1\times2+2\times3+3\times1 = 11$ is a scalar (the inner product), and since these vectors are fixed, $\mathbb{E}[\mathbf{x}\mathbf{y}^\top]=11$ as well. <br>
If $\mathbf{x} =\left[\begin{array}{c}{1} \\ {2} \\ {3} \end{array}\right]$ and $\mathbf{y} =\left[\begin{array}{c}{2} \\ {3} \\ {1} \end{array}\right]$ are column vectors, then $\mathbf{x}\mathbf{y}^\top = \left[\begin{array}{rrr}{2} & {3} & {1} \\ {4} & {6} & {2} \\ {6} & {9} & {3} \end{array}\right]$ is a $3 \times 3$ matrix (the outer product).

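```python
import numpy as np

x = [[1], [2], [3]]  # 3x1 column vector
y = [[2], [3], [1]]  # 3x1 column vector
xy = np.matmul(x, np.transpose(y))  # (3x1)(1x3) -> 3x3 outer product
xy
```

```
array([[2, 3, 1],
       [4, 6, 2],
       [6, 9, 3]])
```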
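The two cases differ only in the shapes of the vectors. Assuming plain 1-D NumPy arrays, `np.inner` gives the scalar of the row-vector case and `np.outer` gives the $3\times3$ matrix of the column-vector case, the same result as the `np.matmul` call on explicit column vectors.

```python
import numpy as np

x = np.array([1, 2, 3])
y = np.array([2, 3, 1])

inner = np.inner(x, y)   # 1*2 + 2*3 + 3*1 = 11
outer = np.outer(x, y)   # entry (i, j) is x_i * y_j

print(inner)   # 11
print(outer)
```

Using `np.outer` avoids having to reshape the vectors into explicit columns before multiplying.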