```python
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
import scipy as sp
from sympy import Matrix, init_printing
from IPython.display import HTML

init_printing(use_latex=True)

def out(mat, n=2):
    """Round a NumPy array to n decimals and display it as a SymPy matrix."""
    return Matrix(np.round(mat, decimals=n))

HTML('<link href="https://fonts.googleapis.com/css?family=Cabin|Quicksand" rel="stylesheet"><style>.container{width:90% !important; font-family: "Cabin", sans-serif;}em{color: red !important;}</style><style>.output_png {display: table-cell;text-align: center;vertical-align: middle;}</style>')
```

# Covariance

- the covariance is a measure of the joint variability of two random variables
- if the variables tend to show similar behavior, the covariance is positive
  - i.e. the greater values of one variable mainly correspond with the greater values of the other variable
  - and the same holds for the lesser values
- if the greater values of one variable mainly correspond with the lesser values of the other, the covariance is negative
- the sign of the covariance therefore shows the tendency in the *linear* relationship between the variables

# Covariance

- the covariance between two jointly distributed real-valued random variables $X$ and $Y$ is defined as the expected product of their deviations from their individual expected values

$$\operatorname {cov} (X,Y)=\operatorname {E} {{\big [}(X-\operatorname {E} [X])(Y-\operatorname {E} [Y]){\big ]}}$$

- it is also denoted $\sigma_{XY}$ or $\sigma(X,Y)$, in analogy to the variance

# Linearity of expectation

- the expected value operator (or expectation operator) $\operatorname {E}[\cdot ]$ is linear
  - for any random variables $X$ and $Y$ and constant $a$

$${\displaystyle {\begin{aligned}\operatorname {E} [X+Y]&=\operatorname {E} [X]+\operatorname {E} [Y]
\\\operatorname {E} [aX]&=a\operatorname {E} [X]
\end{aligned}}}$$


So:
- the expected value of the sum of random variables is the sum of the expected values
- and the expected value scales linearly with a multiplicative scalar

# Covariance formulation

$${\displaystyle {\begin{aligned}\operatorname {cov} (X,Y)&=\operatorname {E} \left[\left(X-\operatorname {E} \left[X\right]\right)\left(Y-\operatorname {E} \left[Y\right]\right)\right]\\&=\operatorname {E} \left[XY-X\operatorname {E} \left[Y\right]-\operatorname {E} \left[X\right]Y+\operatorname {E} \left[X\right]\operatorname {E} \left[Y\right]\right]\\&=\operatorname {E} \left[XY\right]-\operatorname {E} \left[X\right]\operatorname {E} \left[Y\right]-\operatorname {E} \left[X\right]\operatorname {E} \left[Y\right]+\operatorname {E} \left[X\right]\operatorname {E} \left[Y\right]\\&=\operatorname {E} \left[XY\right]-\operatorname {E} \left[X\right]\operatorname {E} \left[Y\right]\end{aligned}}}$$
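The derivation can be checked numerically by replacing each expectation with a sample mean over simulated data. In this sketch the data, seed and sample size are arbitrary assumptions; with the $1/n$ normalization (`ddof=0` in `np.cov`) the definition and the expanded form agree up to rounding, because sample means are linear just like $\operatorname{E}$.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=10_000)
y = 2 * x + rng.normal(size=10_000)   # y is built to co-vary with x

# definition: mean of the product of the deviations from the means
cov_def = np.mean((x - x.mean()) * (y - y.mean()))
# expanded form from the derivation: E[XY] - E[X]E[Y]
cov_expanded = np.mean(x * y) - x.mean() * y.mean()
# NumPy's estimator with the same 1/n normalization
cov_numpy = np.cov(x, y, ddof=0)[0, 1]

print(cov_def, cov_expanded, cov_numpy)   # agree up to floating-point rounding
```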

# Properties of covariance

- ${\displaystyle \operatorname {cov} (X,X)=\operatorname {var} (X)\equiv \sigma ^{2}(X)\equiv \sigma _{X}^{2}}$
- when $X, Y, W$ and $V$ are real-valued random variables and $a, b, c, d$ are constants

$${\displaystyle {\begin{aligned}\operatorname {cov} (X,a)&=0\\\operatorname {cov} (X,X)&=\operatorname {var} (X)\\\operatorname {cov} (X,Y)&=\operatorname {cov} (Y,X)\\\operatorname {cov} (aX,bY)&=ab\,\operatorname {cov} (X,Y)\\\operatorname {cov} (X+a,Y+b)&=\operatorname {cov} (X,Y)\\\operatorname {cov} (aX+bY,cW+dV)&=ac\,\operatorname {cov} (X,W)+ad\,\operatorname {cov} (X,V)+bc\,\operatorname {cov} (Y,W)+bd\,\operatorname {cov} (Y,V)\end{aligned}}}$$
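Because the sample covariance is built from sample means, it obeys the same rules exactly (up to rounding), so the properties can be sanity-checked on random data. The helper `cov` below is a hypothetical name using $1/n$ normalization, and the constants are arbitrary.

```python
import numpy as np

def cov(u, v):
    """Sample covariance with 1/n normalization (an estimate of cov(U, V))."""
    return np.mean((u - u.mean()) * (v - v.mean()))

rng = np.random.default_rng(1)
X, Y, W, V = rng.normal(size=(4, 1_000))   # four independent samples
a, b, c, d = 2.0, -3.0, 0.5, 4.0

# bilinearity: cov(aX + bY, cW + dV)
lhs = cov(a * X + b * Y, c * W + d * V)
rhs = (a * c * cov(X, W) + a * d * cov(X, V)
       + b * c * cov(Y, W) + b * d * cov(Y, V))

print(np.isclose(lhs, rhs))                        # True
print(np.isclose(cov(X + a, Y + b), cov(X, Y)))    # True: shifts drop out
print(np.isclose(cov(X, X), X.var()))              # True: cov(X, X) = var(X)
print(np.isclose(cov(X, np.full_like(X, a)), 0))   # True: cov(X, a) = 0
```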

# Cross covariance matrix

- for random vectors $\mathbf {X} \in \mathbb {R} ^{m}$ and $\mathbf {Y} \in \mathbb {R} ^{n}$
- the $m × n$ cross covariance matrix is equal to

$${\begin{aligned}\operatorname {cov} (\mathbf {X} ,\mathbf {Y} )&=\operatorname {E} \left[(\mathbf {X} -\operatorname {E} [\mathbf {X} ])(\mathbf {Y} -\operatorname {E} [\mathbf {Y} ])^{\mathrm {T} }\right]\\&=\operatorname {E} \left[\mathbf {X} \mathbf {Y} ^{\mathrm {T} }\right]-\operatorname {E} [\mathbf {X} ]\operatorname {E} [\mathbf {Y} ]^{\mathrm {T} }\end{aligned}}$$

- the $(i, j)$-th element of this matrix is equal to the covariance $\operatorname {cov}(X_i, Y_j)$ between the $i$-th scalar component of $\mathbf {X}$ and the $j$-th scalar component of $\mathbf {Y}$
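A minimal sketch of estimating the cross covariance matrix from $N$ paired draws, assuming each row of `X` and `Y` holds one draw of the random vectors (the sizes and data are arbitrary). It also shows that `np.cov` on both inputs returns the full $(m+n) \times (m+n)$ matrix, whose upper-right block is $\operatorname{cov}(\mathbf{X}, \mathbf{Y})$.

```python
import numpy as np

rng = np.random.default_rng(2)
N, m, n = 5_000, 3, 2                   # N draws; X in R^m, Y in R^n
X = rng.normal(size=(N, m))             # each row is one draw of the random vector X
Y = X[:, :n] + rng.normal(size=(N, n))  # Y is built to share its first components with X

# E[(X - E[X])(Y - E[Y])^T] estimated from the centred samples
Xc = X - X.mean(axis=0)
Yc = Y - Y.mean(axis=0)
cross = Xc.T @ Yc / (N - 1)             # m x n, unbiased (N - 1) normalization

# np.cov stacks X and Y into one (m + n) x (m + n) matrix;
# its upper-right m x n block is cov(X, Y)
full = np.cov(X, Y, rowvar=False)

print(cross.shape)                                               # (3, 2)
print(np.allclose(cross, full[:m, m:]))                          # True
print(np.isclose(cross[0, 1], np.cov(X[:, 0], Y[:, 1])[0, 1]))   # True: entry (i, j) is cov(X_i, Y_j)
```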

# Variance-Covariance matrix

- for a vector ${\displaystyle \mathbf {X} ={\begin{bmatrix}X_{1}&X_{2}&\dots &X_{m}\end{bmatrix}}^{\mathrm {T} }}$ of $m$ jointly distributed random variables 
- its covariance matrix (also known as the variance–covariance matrix) is


$${\displaystyle \Sigma (\mathbf {X} )=\operatorname {cov} (\mathbf {X} ,\mathbf {X} )}$$
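As a sketch, `np.cov` with `rowvar=False` estimates $\Sigma(\mathbf{X})$ when each row is one draw of $\mathbf{X}$. The simulated components below are arbitrary; the checks confirm that the estimate is symmetric and that its diagonal holds the variances $\operatorname{cov}(X_i, X_i) = \operatorname{var}(X_i)$.

```python
import numpy as np

rng = np.random.default_rng(3)
N = 10_000
x1 = rng.normal(size=N)
x2 = 0.5 * x1 + rng.normal(size=N)   # correlated with x1
x3 = rng.normal(size=N)              # independent of x1 and x2
X = np.stack([x1, x2, x3], axis=1)   # shape (N, 3): rows are draws of the vector X

S = np.cov(X, rowvar=False)          # 3 x 3 estimate of Sigma(X) = cov(X, X)

print(np.allclose(S, S.T))                              # True: symmetric
print(np.allclose(np.diag(S), X.var(axis=0, ddof=1)))   # True: diagonal holds the variances
```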

# Properties of the covariance matrix

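- if $\mathbf {X}$ is a random vector with covariance matrix $\Sigma(\mathbf {X})$, and $\mathbf {A}$ is a constant (non-random) matrix with as many columns as $\mathbf {X}$ has components
- then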

$${\displaystyle \Sigma (\mathbf {A} \mathbf {X} )=\operatorname {E} [\mathbf {A} \mathbf {X} \mathbf {X} ^{\mathrm {T} }\mathbf {A} ^{\mathrm {T} }]-\operatorname {E} [\mathbf {A} \mathbf {X} ]\operatorname {E} [\mathbf {X} ^{\mathrm {T} }\mathbf {A} ^{\mathrm {T} }]=\mathbf {A} \Sigma (\mathbf {X} )\mathbf {A} ^{\mathrm {T} }}$$
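The identity also holds exactly for the sample covariance, so it can be checked on simulated data. In this sketch the mixing matrix used to generate `X` and the $2 \times 3$ matrix `A` are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(4)
N = 10_000
X = rng.normal(size=(N, 3)) @ np.array([[1.0, 0.3, 0.0],
                                         [0.0, 1.0, 0.5],
                                         [0.0, 0.0, 1.0]])   # correlated 3-vectors, one per row
A = np.array([[1.0, 2.0, 0.0],
              [0.0, -1.0, 3.0]])      # a constant 2 x 3 matrix (arbitrary)

AX = X @ A.T                          # each row is A applied to one draw of X
lhs = np.cov(AX, rowvar=False)        # Sigma(AX), shape (2, 2)
rhs = A @ np.cov(X, rowvar=False) @ A.T

print(np.allclose(lhs, rhs))          # True
```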

# Covariance

- the magnitude of the covariance is not easy to interpret 
  - it is not normalized and hence depends on the magnitudes of the variables
- the normalized version of the covariance is called the *correlation coefficient*
  - the magnitude of the correlation coefficient shows the strength of the linear relation
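To illustrate the dependence on magnitudes, this sketch measures the same hypothetical heights once in metres and once in centimetres. The data are simulated and purely illustrative. The covariance grows by the unit factor of 100, while the correlation coefficient does not change.

```python
import numpy as np

rng = np.random.default_rng(5)
height_m = rng.normal(1.75, 0.1, size=1_000)               # hypothetical heights in metres
weight_kg = 50 * height_m + rng.normal(0, 5, size=1_000)   # hypothetical weights in kg
height_cm = 100 * height_m                                 # same heights, different unit

cov_m = np.cov(height_m, weight_kg)[0, 1]
cov_cm = np.cov(height_cm, weight_kg)[0, 1]
corr_m = np.corrcoef(height_m, weight_kg)[0, 1]
corr_cm = np.corrcoef(height_cm, weight_kg)[0, 1]

print(cov_cm / cov_m)     # 100 (up to rounding): covariance follows the unit
print(corr_m, corr_cm)    # the same value: correlation is unit-free
```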

# Correlation: Pearson's product-moment coefficient

- it is defined as the covariance of the two variables divided by the product of their standard deviations

$$\rho _{X,Y}=\mathrm {corr} (X,Y)={\mathrm {cov} (X,Y) \over \sigma _{X}\sigma _{Y}}={E[(X-\mu _{X})(Y-\mu _{Y})] \over \sigma _{X}\sigma _{Y}}$$

- $\rho$ is symmetric: $\operatorname{corr}(X,Y) = \operatorname{corr}(Y,X)$
- $\rho$ is +1 in the case of a perfect direct (increasing) linear relationship (correlation)
- $\rho$ is −1 in the case of a perfect decreasing (inverse) linear relationship (anticorrelation)
- the closer $\rho$ is to zero, the weaker the *linear* relationship; $\rho = 0$ means the variables are uncorrelated
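A minimal sketch of Pearson's coefficient computed directly from its definition. `pearson` is a hypothetical helper name, and the data are simulated. The $1/n$ normalization cancels between numerator and denominator, so it matches `np.corrcoef`, and exact linear relationships give $\pm 1$.

```python
import numpy as np

def pearson(x, y):
    """rho = cov(X, Y) / (sigma_X * sigma_Y), using 1/n normalization throughout."""
    xc, yc = x - x.mean(), y - y.mean()
    return np.mean(xc * yc) / (x.std() * y.std())

rng = np.random.default_rng(6)
x = rng.normal(size=1_000)
y = x + rng.normal(size=1_000)

print(np.isclose(pearson(x, y), np.corrcoef(x, y)[0, 1]))  # True
print(np.isclose(pearson(x, y), pearson(y, x)))             # True: symmetric
print(pearson(x, 3 * x + 1), pearson(x, -2 * x + 5))       # 1 and -1 up to rounding
```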

# Independence

- two events are independent if the occurrence of one does not affect the probability of occurrence of the other
- two random variables are *independent* if the realization of one does not affect the probability distribution of the other



# Correlation vs dependency 

- random variables are *dependent* if they are not probabilistically independent
- correlation refers to a specific *type* of dependency: 
<br>a *linear* relationship

# Note

if the variables are independent, then the correlation coefficient is 0, <br>but the converse is not true 

<center><img src="img/rho.png" width="700"/></center>


in the special case when $X$ and $Y$ are *jointly normal*,
<br> <u>uncorrelatedness is equivalent to independence</u>

# Independence

- two random variables $X$ and $Y$ 
  - with cumulative distribution functions $F_X(x)$ and $F_Y(y)$ 
  - and probability densities $f_{X}(x)$ and $f_Y(y)$
- are independent iff the combined random variable $(X, Y)$ has a joint cumulative distribution function $F_{X,Y}(x,y) = F_X(x) F_Y(y)$
- or equivalently, if the joint density exists, $f_{X,Y}(x,y) = f_X(x) f_Y(y)$

# Independence example

- if two cards are drawn *with* replacement from a deck of cards
  - the event of drawing a red card on the first trial and that of drawing a red card on the second trial 
- are *independent*


- if two cards are drawn *without* replacement from a deck of cards
  - the event of drawing a red card on the first trial and that of drawing a red card on the second trial
- are *not independent*
  - because a deck that has had a red card removed has proportionately fewer red cards
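The card example can be worked out exactly with the factorization criterion: the events are independent when $P(R_1 \cap R_2) = P(R_1)\,P(R_2)$. This sketch uses Python's `fractions` module for exact arithmetic and assumes a standard 52-card deck with 26 red cards.

```python
from fractions import Fraction

RED, TOTAL = 26, 52
p_r1 = Fraction(RED, TOTAL)                  # P(first card red) = 1/2
p_b1 = 1 - p_r1                              # P(first card black) = 1/2

# with replacement: the deck is restored, so the second draw sees 26 red of 52
p_r2_with = Fraction(RED, TOTAL)
p_both_with = p_r1 * Fraction(RED, TOTAL)

# without replacement: the second draw sees 51 cards
# (law of total probability over the colour of the first card)
p_r2_without = p_r1 * Fraction(RED - 1, TOTAL - 1) + p_b1 * Fraction(RED, TOTAL - 1)
p_both_without = p_r1 * Fraction(RED - 1, TOTAL - 1)

print(p_both_with, p_r1 * p_r2_with)         # 1/4 1/4    -> joint = product: independent
print(p_both_without, p_r1 * p_r2_without)   # 25/102 1/4 -> joint != product: dependent
```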

# Uncorrelatedness and independence


- if $X$ and $Y$ are independent, then their covariance is zero
- in fact, independence implies
$${\displaystyle \operatorname {E} [XY]=\operatorname {E} [X]\cdot \operatorname {E} [Y]}$$
- and therefore
$${\displaystyle {\begin{aligned}\operatorname {cov} (X,Y)&=\operatorname {E} \left[XY\right]-\operatorname {E} \left[X\right]\operatorname {E} \left[Y\right]=0\end{aligned}}}$$
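A quick simulation of this implication, assuming $X \sim \operatorname{Exp}(1)$ and $Y \sim \mathcal{N}(2, 1)$ drawn independently (arbitrary distributions chosen so the means are non-zero). The sample estimate of $\operatorname{E}[XY]$ is close to $\operatorname{E}[X]\operatorname{E}[Y] = 2$, and the sample covariance is close to 0, though not exactly 0 because of sampling noise.

```python
import numpy as np

rng = np.random.default_rng(7)
N = 1_000_000
x = rng.exponential(scale=1.0, size=N)   # X ~ Exp(1), E[X] = 1
y = rng.normal(2.0, 1.0, size=N)         # Y ~ N(2, 1), independent of X, E[Y] = 2

print(np.mean(x * y), x.mean() * y.mean())  # both close to E[X]E[Y] = 2
print(np.cov(x, y)[0, 1])                   # close to 0
```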

# Uncorrelatedness and independence


- but if two variables are uncorrelated, that does not in general imply that they are independent
- e.g. if  $X$ is uniformly distributed in [−1, 1] and $Y = X^2$
  - $X$ and $Y$ are dependent, but

$${\displaystyle {\begin{aligned}\sigma (X,Y)&=\sigma (X,X^{2})\\&=\operatorname {E} [X\cdot X^{2}]-\operatorname {E} [X]\cdot \operatorname {E} [X^{2}]\\&=\operatorname {E} \!\left[X^{3}\right]-\operatorname {E} [X]\operatorname {E} [X^{2}]\\&=0-0\cdot \operatorname {E} [X^{2}]=0\end{aligned}}}$$

- the relationship between $Y$ and $X$ is non-linear
- while correlation and covariance measure only *linear* dependence between two variables
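The same example can be simulated. In this sketch the sample size and seed are arbitrary. The sample correlation between $X$ and $Y = X^2$ is close to 0, yet knowing $X$ clearly changes the distribution of $Y$: the conditional mean of $Y$ given $|X| > 0.9$ is about $0.90$, far from $\operatorname{E}[Y] = 1/3$.

```python
import numpy as np

rng = np.random.default_rng(8)
x = rng.uniform(-1, 1, size=1_000_000)   # X uniform on [-1, 1]
y = x ** 2                                # Y is a deterministic function of X

print(np.corrcoef(x, y)[0, 1])            # close to 0: no linear relationship
print(y.mean())                           # close to E[Y] = E[X^2] = 1/3
print(y[np.abs(x) > 0.9].mean())          # about 0.90: Y depends strongly on X
```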

# Correlation and dependence

- a *dependence* or association is any statistical relationship between two random variables
  - it can be *causal* or not

- the *correlation* indicates a specific type of dependency: a *linear* relationship 

# Correlation does not imply causation!

- when two variables are correlated, it is tempting to conclude that one variable causes the other
- "correlation proves causation" is a logical *fallacy*

# B causes A (reverse causation)

- the faster windmills are observed to rotate, the stronger the wind is observed to be
  - therefore, wind is caused by the rotation of windmills
- when a country's debt rises above 90% of GDP, growth slows
  - therefore, high debt causes slow growth
  

# Third factor C (the common-causal variable) 

- sleeping with one's shoes on is strongly correlated with waking up with a headache
  - therefore, sleeping with one's shoes on causes headaches
  - third factor: going to bed drunk
- as ice cream sales increase, the rate of drowning deaths increases sharply
  - therefore, ice cream consumption causes drowning
  - third factor: in summer people swim more
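A common cause can be simulated directly. In this sketch all data are hypothetical: a factor `c` (think of temperature) drives both `a` and `b`, and there is no direct link between `a` and `b`. They are still clearly correlated (about 0.5 by construction). After the part that is linear in `c` is removed from each (a least-squares fit via `np.polyfit`), the remaining correlation, i.e. the partial correlation given `c`, is close to 0.

```python
import numpy as np

rng = np.random.default_rng(9)
N = 100_000
c = rng.normal(size=N)              # hypothetical third factor C (e.g. temperature)
a = c + rng.normal(size=N)          # A depends on C only
b = c + rng.normal(size=N)          # B depends on C only; no direct A -> B link

def residual(v, c):
    """Remove the part of v that is linear in c (least-squares line fit)."""
    slope, intercept = np.polyfit(c, v, 1)
    return v - (slope * c + intercept)

corr_ab = np.corrcoef(a, b)[0, 1]
partial_ab = np.corrcoef(residual(a, c), residual(b, c))[0, 1]

print(corr_ab)       # about 0.5: A and B look related
print(partial_ab)    # close to 0 once C is accounted for
```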

# The relationship between A and B is coincidental

- there is a 99.79% correlation for the period 1999-2009 between U.S. spending on science and the number of suicides by suffocation
- a bald state leader of Russia has succeeded a hairy one for 200 years