## Covariance and Correlation

- Definition: Let $(X,Y)$ be a random vector with $E(X)=\mu_X$, $Var(X)=\sigma_X^2$, $E(Y)=\mu_Y$, $Var(Y)=\sigma_Y^2$. We define

    $Cov(X,Y)=E((X-\mu_X)(Y-\mu_Y))=\int\int(x-\mu_X)(y-\mu_Y)f(x,y)dxdy$

    $Cor(X,Y)=\rho_{XY}=\frac{Cov(X,Y)}{\sigma_X\sigma_Y}$

- Both quantify the direction and strength of the linear relationship between two random variables; correlation is the scale-free version of covariance.
- Note: $Cov(X,X)=E((X-\mu_X)(X-\mu_X))=Var(X)$
- Theorem: $Cov(X,Y)=E(XY)-\mu_X\mu_Y$
    - Proof:

        $\begin{aligned}
        Cov(X,Y)&=E((X-\mu_X)(Y-\mu_Y))\\
                &=E(XY-X\mu_Y-Y\mu_X+\mu_X\mu_Y)\\
                &=E(XY)-\mu_X\mu_Y-\mu_X\mu_Y+\mu_X\mu_Y\\
                &=E(XY)-\mu_X\mu_Y
        \end{aligned}$

![1](16-1.jpg)

### Expand

This image illustrates the effect of different correlation ($\rho$) values on the shape of a bivariate Gaussian (or normal) distribution. 

Each row shows the distribution for one correlation value. In each pair of plots, the left side displays a 3D surface plot of the density, while the right side shows the corresponding contour plot.

Here's a breakdown of the configurations:

1. **Top Row**: $\rho = 0$ (no correlation)
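   - When the correlation is zero, the contours are aligned with the coordinate axes: circles when $\sigma_X = \sigma_Y$ and axis-aligned ellipses otherwise. For a bivariate Gaussian, $\rho = 0$ also implies that $X$ and $Y$ are independent.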

2. **Middle Row**: $\rho = 0.75$ (positive correlation)
   - With positive correlation, the contours elongate along an upward-sloping direction (the line $y = x$ when $\sigma_X = \sigma_Y$), indicating that as $X$ increases, $Y$ tends to increase as well. The surface in the 3D plot is stretched along the same direction.

3. **Bottom Row**: $\rho = -0.75$ (negative correlation)
   - With negative correlation, the contours elongate along a downward-sloping direction (the line $y = -x$ when $\sigma_X = \sigma_Y$), meaning that as $X$ increases, $Y$ tends to decrease. The surface in the 3D plot is stretched along this anti-diagonal direction.

The left and right columns within each row use two different ratios of the standard deviations of $X$ and $Y$:
- **Left Column**: $\sigma_X = \sigma_Y$
- **Right Column**: $2\sigma_X = \sigma_Y$

The standard deviations control the spread of the distribution along each axis. When $\sigma_Y$ is twice $\sigma_X$, the distribution is stretched more along the $Y$-axis.
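The tilt described above can be checked numerically. The Python sketch below computes the direction of the long axis of the contour ellipses from $\sigma_X$, $\sigma_Y$ and $\rho$, using the standard principal-axis formula for a $2\times 2$ covariance matrix. The function name `major_axis_angle` and the parameter values are illustrative choices, not taken from the figure.

```python
import math

def major_axis_angle(sigma_x, sigma_y, rho):
    """Angle (radians, measured from the x-axis) of the long axis of the contours.

    The covariance matrix is [[sx^2, rho*sx*sy], [rho*sx*sy, sy^2]]; its leading
    eigenvector points along the long axis of the Gaussian contour ellipses.
    """
    cov_xy = rho * sigma_x * sigma_y
    return 0.5 * math.atan2(2.0 * cov_xy, sigma_x**2 - sigma_y**2)

# Illustrative settings: equal spreads with rho = +/-0.75, then sigma_Y = 2*sigma_X.
for sx, sy, rho in [(1, 1, 0.75), (1, 1, -0.75), (1, 2, 0.75)]:
    angle = math.degrees(major_axis_angle(sx, sy, rho))
    print(f"sigma_X={sx}, sigma_Y={sy}, rho={rho}: long axis at {angle:.1f} degrees")
```

With equal standard deviations the long axis sits at $\pm 45$ degrees, that is, along $y=x$ or $y=-x$. With $\sigma_Y=2\sigma_X$ and $\rho=0.75$ it steepens to 67.5 degrees, so the diagonal description only holds when the two spreads are equal. When $\rho=0$ and $\sigma_X=\sigma_Y$ the contours are circles and the returned angle carries no meaning.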


### Continuous example 2
- Joint pdf and marginal pdfs:

    $f(x,y) = \left\{\begin{array}{ll}8xy&\mathrm{if~}0<y<x<1\\0&\mathrm{otherwise}\end{array}\right.$

    $f_X(x) = 4x^3I_{[0,1]}(x)$

    $f_{Y}(y) =4(y-y^3)~I_{[0,1]}(y)$

- Find the covariance and correlation for this pdf.
- First find $\mu_X$ and $\mu_Y$

    $\mu_X=\int_{0}^1 x\cdot 4x^3 dx=\frac{4}{5}x^5|_0^1=\frac{4}{5}$

    $\mu_Y=\int_{0}^1 y\cdot 4(y-y^3)dy=\frac{4}{3}y^3-\frac{4}{5}y^5|_0^1=\frac{8}{15}$

    $\begin{aligned}
    E(XY)
    &=\int\int xyf(x,y)dxdy\\
    &=\int\int 8x^2y^2dxdy,\,\big(y\in(0,x),x\in(0,1)\big)\\
    &=\int_{0}^1 8x^2 \int_{0}^x y^2 dydx\\
    &=\int_{0}^1 \frac{8}{3}x^5 dx \\
    &=\frac{8}{18}x^6|_{0}^1\\
    &=\frac{4}{9}
    \end{aligned}$

    $Cov(X,Y)=\frac{4}{9}-\frac{4}{5}\cdot\frac{8}{15}=\frac{4}{225}$

    $\sigma_X^2=\int_{0}^1 x^2\cdot 4x^3dx - (\frac{4}{5})^2=\frac{2}{75}$

    $\sigma_Y^2=\int_{0}^{1} y^2\cdot4(y-y^3)dy - (\frac{8}{15})^2=\frac{1}{3}-\frac{64}{225}=\frac{11}{225}$

    $Cor(X,Y)=\frac{\frac{4}{225}}{\sqrt{\frac{2}{75}}\sqrt{\frac{11}{225}}}=0.4924$
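The arithmetic in this example can be double-checked with exact fractions. For this pdf every mixed moment has a closed form, $E(X^aY^b)=\int_0^1\int_0^x 8x^{a+1}y^{b+1}\,dy\,dx=\frac{8}{(b+2)(a+b+4)}$, so a short Python sketch suffices. The helper name `moment` is an illustrative choice.

```python
from fractions import Fraction
import math

def moment(a, b):
    """E[X^a Y^b] for the pdf f(x, y) = 8xy on 0 < y < x < 1."""
    return Fraction(8, (b + 2) * (a + b + 4))

mu_x, mu_y = moment(1, 0), moment(0, 1)
cov_xy = moment(1, 1) - mu_x * mu_y      # E(XY) - mu_X * mu_Y
var_x = moment(2, 0) - mu_x**2
var_y = moment(0, 2) - mu_y**2
rho = float(cov_xy) / math.sqrt(float(var_x * var_y))

print(mu_x, mu_y, cov_xy, var_x, var_y, round(rho, 4))
```

The exact value of the correlation is $4/\sqrt{66}$, which rounds to the 0.4924 obtained above.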



### 3 coins example

$\begin{array}{c|cccc|c}\mathrm{f(x,y)}&0&1&2&3&f_X(x)\\\hline0&\frac18&\frac28&\frac18&0&\frac12\\1&0&\frac18&\frac28&\frac18&\frac12\\\hline f_Y(y)&\frac18&\frac38&\frac38&\frac18\end{array}$

- $X\in\{0,1\}$ (rows of the table) and $Y\in\{0,1,2,3\}$ (columns of the table)
- $Y\sim Binomial(3,0.5)$ and $X\sim Bernoulli(0.5)$
    
    $\mu_X=0.5,\sigma_X=\sqrt{1\cdot 0.5\cdot(1-0.5)}=0.5$

    $\mu_Y=3\cdot 0.5=1.5, \sigma_Y=\sqrt{3\cdot 0.5\cdot 0.5}=0.5\sqrt{3}$


- Find the Covariance and Correlation for this pmf.

    $\begin{aligned}
    E(XY)&=\sum_{x,y}xyf(x,y)=\sum_{x=0}^1\sum_{y=0}^3xyf(x,y),\,(\text{values read from the table})\\
         &=0\cdot0\cdot \frac{1}{8}+0\cdot1\cdot\frac{2}{8}+0\cdot 2\cdot \frac{1}{8}+0\cdot 3\cdot 0+1\cdot 0\cdot 0\\
         &~~~~+1\cdot 1\cdot\frac{1}{8}+1\cdot 2\cdot\frac{2}{8}+1\cdot 3\cdot\frac{1}{8}=1
    \end{aligned}$

    $Cov(X,Y)=E(XY)-\mu_X\mu_Y=1-1.5\cdot 0.5=0.25$

    $Cor(X,Y)=\frac{Cov(X,Y)}{\sigma_X\sigma_Y}=\frac{0.25}{0.5\cdot 0.5\sqrt{3}}=\frac{1}{\sqrt{3}}\approx 0.577$
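The same numbers can be reproduced directly from the joint pmf table. The sketch below, in Python, stores the table as a dictionary and evaluates the expectations as sums; the helper name `expect` is an illustrative choice.

```python
from fractions import Fraction
import math

# Joint pmf copied from the table, in eighths: keys are (x, y).
eighths = {(0, 0): 1, (0, 1): 2, (0, 2): 1, (0, 3): 0,
           (1, 0): 0, (1, 1): 1, (1, 2): 2, (1, 3): 1}
pmf = {xy: Fraction(k, 8) for xy, k in eighths.items()}

def expect(g):
    """E[g(X, Y)] as a sum over the table."""
    return sum(g(x, y) * p for (x, y), p in pmf.items())

mu_x = expect(lambda x, y: x)
mu_y = expect(lambda x, y: y)
cov_xy = expect(lambda x, y: x * y) - mu_x * mu_y
var_x = expect(lambda x, y: x * x) - mu_x**2
var_y = expect(lambda x, y: y * y) - mu_y**2
rho = float(cov_xy) / math.sqrt(float(var_x * var_y))

print(cov_xy, round(rho, 3))
```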

## Variance and covariance

- Theorem: Let $(X,Y)$ be a random vector and let $a,b$ be constants, then 
        
    $Var(aX+bY)=a^2Var(X)+b^2Var(Y)+2abCov(X,Y)$

#### Proof:
$\begin{aligned}
&Var(aX+bY)\\
&=E\big((aX+bY-E(aX+bY))^2\big)\\
&=E\big((aX+bY-a\mu_X-b\mu_Y)^2\big)\\
&=E\big((a(X-\mu_X)+b(Y-\mu_Y))^2\big)\\
&=E\big(a^2(X-\mu_X)^2\big)+E\big(b^2(Y-\mu_Y)^2\big)+E\big(2ab(X-\mu_X)(Y-\mu_Y)\big)\\
&=a^2Var(X)+b^2Var(Y)+2abCov(X,Y)
\end{aligned}$
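As a sanity check of the identity, the Python sketch below evaluates both sides exactly on the joint pmf of the 3 coins example. The constants $a=2$ and $b=-3$ are arbitrary illustrative choices, and the helper names `expect` and `var` are not from the notes.

```python
from fractions import Fraction

# Nonzero entries of the 3 coins joint pmf, in eighths: keys are (x, y).
pmf = {(0, 0): 1, (0, 1): 2, (0, 2): 1, (1, 1): 1, (1, 2): 2, (1, 3): 1}
pmf = {xy: Fraction(k, 8) for xy, k in pmf.items()}

def expect(g):
    """E[g(X, Y)] as a sum over the joint pmf."""
    return sum(g(x, y) * p for (x, y), p in pmf.items())

def var(g):
    """Var(g(X, Y)) = E[g^2] - (E[g])^2."""
    return expect(lambda x, y: g(x, y) ** 2) - expect(g) ** 2

a, b = 2, -3  # arbitrary constants, chosen only for illustration
lhs = var(lambda x, y: a * x + b * y)
cov_xy = expect(lambda x, y: x * y) - expect(lambda x, y: x) * expect(lambda x, y: y)
rhs = a**2 * var(lambda x, y: x) + b**2 * var(lambda x, y: y) + 2 * a * b * cov_xy

print(lhs, rhs)
```

Both sides evaluate to $19/4$. Because $ab<0$ and the covariance is positive here, the cross term reduces the variance of the combination.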

## Correlation

#### Theorem
Let $(X,Y)$ be a random vector and let $\rho_{XY}=Cor(X,Y)$. Then
- a. $-1 \leq \rho_{XY} \leq 1$
- b. $|\rho_{XY}| = 1$ **if and only if** there exist constants $a \neq 0$ and $b$ such that 
    - $P(Y=aX+b)=1$

#### Proof (a): $-1 \leq \rho_{XY} \leq 1$

$\begin{aligned}
&Var\Big(\frac{1}{\sigma_X}X+\frac{1}{\sigma_Y}Y\Big)\geq 0,\,(\text{Variance is always } \geq 0)\\
&\Rightarrow \frac{1}{\sigma_X^2}Var(X)+\frac{1}{\sigma_Y^2}Var(Y)+2\frac{1}{\sigma_X}\frac{1}{\sigma_Y}Cov(X,Y)\geq 0\\
&\Rightarrow 1+1+2\rho_{XY}\geq 0 \Rightarrow \rho_{XY}\geq -1
\end{aligned}$

$\begin{aligned}
&Var\Big(\frac{1}{\sigma_X}X-\frac{1}{\sigma_Y}Y\Big)\geq 0,\,(\text{Variance is always } \geq 0)\\
&\Rightarrow \frac{1}{\sigma_X^2}Var(X)+\frac{1}{\sigma_Y^2}Var(Y)-2\frac{1}{\sigma_X}\frac{1}{\sigma_Y}Cov(X,Y)\geq 0\\
&\Rightarrow 1+1-2\rho_{XY}\geq 0 \Rightarrow \rho_{XY}\leq 1
\end{aligned}$

#### What part (b) tells us:

- $\rho_{XY}=1$ or $\rho_{XY}=-1$ can only happen if there is an **exact linear relationship** between $X$ and $Y$
- With $Y=aX+b$: $\rho_{XY}=1\Leftrightarrow a>0$ and $\rho_{XY}=-1\Leftrightarrow a<0$
    - sign of correlation = sign of the slope



#### Extension

- Slope $a$: if $Y=aX+b$ with probability 1, then $Cov(X,Y)=Cov(X,aX+b)=a\sigma_X^2$, so

    $\rho_{XY}=\frac{Cov(X,Y)}{\sigma_X\sigma_Y}=\frac{a\cdot\sigma_X^2}{\sigma_X\cdot\sigma_Y}=\frac{a\cdot\sigma_X}{\sigma_Y}=\pm1$

    $\Rightarrow a=\frac{\sigma_Y}{\sigma_X}\quad\mathrm{or}\quad a=-\frac{\sigma_Y}{\sigma_X}$
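A small exact computation illustrates both the sign rule and the slope formula. The Python sketch below takes $X$ uniform on a hypothetical finite set of values (chosen only for illustration), sets $Y=aX+b$, and checks that $\rho_{XY}^2=1$, that $Cov(X,Y)=a\sigma_X^2$, and that $a^2=\sigma_Y^2/\sigma_X^2$.

```python
from fractions import Fraction

def summary(a, b, xs=(0, 1, 2, 5)):
    """Exact Cov(X, Y), Var(X), Var(Y) for X uniform on xs and Y = a*X + b.

    The support xs is an arbitrary illustrative choice.
    """
    n = len(xs)
    ys = [a * x + b for x in xs]
    mu_x = Fraction(sum(xs), n)
    mu_y = Fraction(sum(ys), n)
    var_x = sum((x - mu_x) ** 2 for x in xs) / n
    var_y = sum((y - mu_y) ** 2 for y in ys) / n
    cov = sum((x - mu_x) * (y - mu_y) for x, y in zip(xs, ys)) / n
    return cov, var_x, var_y

for a, b in [(3, -1), (-2, 4)]:
    cov, var_x, var_y = summary(a, b)
    rho_squared = cov**2 / (var_x * var_y)
    sign = "+" if cov > 0 else "-"
    print(f"a={a}: rho^2={rho_squared}, sign of rho is {sign}, var_y/var_x={var_y / var_x}")
```

Working with $\rho_{XY}^2$ keeps the arithmetic exact; the sign of $\rho_{XY}$ is the sign of the covariance, which matches the sign of the slope $a$.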

## Covariance and independence
- Theorem: If $X$ and $Y$ are independent random variables then
    - $Cov(X,Y)=0$ and $\rho_{XY}=0$
- Proof:
    - Recall: If $X, Y$ are independent, $E(g(X)h(Y))=E(g(X))E(h(Y))$.

        $\begin{aligned}
        &Cov(X,Y)\\
        &=E(XY)-\mu_X\mu_Y\\
        &=E(X)E(Y)-\mu_X\mu_Y\\
        &=\mu_X\mu_Y-\mu_X\mu_Y=0
        \end{aligned}$

- But: $Cov(X,Y)=0$ does not imply that $X$ and $Y$ are independent.
    - Example:
        $
        X \sim \text{Uniform}( -1, 1) \quad \text{and} \quad Y = X^2
        $

        - The expected values are:
            $
            \mathbb{E}[X] = 0, \quad \mathbb{E}[Y] = \mathbb{E}[X^2] = \frac{1}{3}
            $
        - The covariance is:
            $
            \text{cov}(X, Y) = \mathbb{E}[XY] - \mathbb{E}[X] \mathbb{E}[Y] = \mathbb{E}[X \cdot X^2] - 0 \cdot \frac{1}{3} = \mathbb{E}[X^3]
            $
            Since $X$ is uniformly distributed over $[-1, 1]$, $\mathbb{E}[X^3] = 0$, so the covariance $\text{cov}(X, Y) = 0$.
            
        However, $X$ and $Y$ are clearly **not independent**, because knowing $X$ gives you complete information about $Y$ (since $Y = X^2$).
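The counterexample can be verified with exact moments. For $X\sim\text{Uniform}(-1,1)$, $E(X^k)$ is $0$ for odd $k$ and $\frac{1}{k+1}$ for even $k$. The Python sketch below confirms that the covariance vanishes, and then shows dependence by exhibiting one pair of functions, $g(x)=x^2$ and $h(y)=y$, for which $E(g(X)h(Y))\neq E(g(X))E(h(Y))$. The helper name `moment_x` is an illustrative choice.

```python
from fractions import Fraction

def moment_x(k):
    """E[X^k] for X ~ Uniform(-1, 1): the integral of x^k / 2 over (-1, 1)."""
    return Fraction(0) if k % 2 else Fraction(1, k + 1)

# Y = X^2, so E[XY] = E[X^3] and E[Y] = E[X^2].
cov_xy = moment_x(3) - moment_x(1) * moment_x(2)

# Independence would force E[g(X) h(Y)] = E[g(X)] E[h(Y)] for every g and h.
# Take g(x) = x^2 and h(y) = y: the left side is E[X^4], the right side is E[X^2]^2.
lhs = moment_x(4)
rhs = moment_x(2) * moment_x(2)

print(cov_xy, lhs, rhs)
```

The covariance is $0$, yet $E(X^2Y)=\frac15$ while $E(X^2)E(Y)=\frac19$, so the factorization required by independence fails.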







## Multivariate random vectors

- $n$-dimensional random vector
    - $X=(X_1,X_2,...,X_n)$
- $f(\underline{x})=f(x_1,x_2,\cdots,x_n)$
- Most things follow naturally from the $n=2$ setting
- Example: $n=5$, $X=(X_1,X_2,...,X_5)$
    - (joint) marginal distribution of $(X_1,X_3,X_4)$: integrate out the remaining coordinates $x_2$ and $x_5$

        $f_{1,3,4}(x_1,x_3,x_4)=\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}f(x_1,x_2,\cdots,x_5)dx_2dx_5$
    - (joint) conditional distribution of $(X_1,X_3)$ given $(X_2,X_4,X_5)$: divide the joint pdf by the marginal pdf of the conditioning variables

        $f(x_1,x_3|x_2,x_4,x_5)=\frac{f(x_1,x_2,\cdots,x_5)}{f_{2,4,5}(x_2,x_4,x_5)}$
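A discrete analogue makes the bookkeeping concrete: sums replace the integrals, but the marginal and conditional are built the same way. The Python sketch below uses a hypothetical joint pmf on $\{0,1\}^5$, proportional to $1+x_1+\cdots+x_5$, chosen only for illustration; the function names are likewise illustrative.

```python
from fractions import Fraction
from itertools import product

# Hypothetical joint pmf on {0,1}^5 with f(x) proportional to 1 + x1 + ... + x5.
points = list(product((0, 1), repeat=5))
total = sum(1 + sum(x) for x in points)
joint = {x: Fraction(1 + sum(x), total) for x in points}

def marginal(pmf, keep):
    """Sum the joint pmf over every coordinate whose 0-based index is not in `keep`."""
    out = {}
    for x, p in pmf.items():
        key = tuple(x[i] for i in keep)
        out[key] = out.get(key, 0) + p
    return out

f_134 = marginal(joint, (0, 2, 3))   # marginal pmf of (X1, X3, X4)
f_245 = marginal(joint, (1, 3, 4))   # marginal pmf of (X2, X4, X5)

def conditional_13_given_245(x):
    """f(x1, x3 | x2, x4, x5) evaluated at the full point x = (x1, ..., x5)."""
    return joint[x] / f_245[(x[1], x[3], x[4])]

print(f_134[(0, 0, 0)], conditional_13_given_245((1, 0, 1, 1, 0)))
```

For any fixed values of $(x_2,x_4,x_5)$, the conditional pmf sums to 1 over $(x_1,x_3)$, which is a quick way to confirm that the denominator is the right marginal.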
    

## Mutually independent random variables
- There are some generalizations for $n>2$ to take note of.
- Definition

    Let $X=(X_1,X_2,...,X_n)$ be a random vector with joint pdf/pmf $f(x_1,x_2,...,x_n)$ and marginal pdfs/pmfs $f_1(x_1),f_2(x_2),...,f_n(x_n)$. $X_1,X_2,...,X_n$ are called mutually independent random variables if

    $f(\underline{x})=f(x_1,x_2,...,x_n)=f_1(x_1)f_2(x_2)...f_n(x_n),\forall \underline{x}\in \mathbb{R}^n$

    $\Rightarrow Cov(X_i,X_j)=0,\forall i\neq j$

    - If $X_1, X_2, ..., X_n$ are random vectors with joint pdf/pmf $f(x_1,x_2,...,x_n)$ and (joint) marginal pdfs/pmfs $f_1(x_1),f_2(x_2),...,f_n(x_n)$, we say that $X_1,X_2,...,X_n$ are mutually independent random vectors if

        $f(x_1,x_2,...,x_n)=\prod_{i=1}^nf_i(x_i),\forall(x_1,x_2,...,x_n)$

## Mutually independent random variables

- Theorem:
    - Let $X_1,X_2,...,X_n$ be **mutually independent** random variables. Then for any functions $g_1(\cdot),...,g_n(\cdot)$
        - $E(g_1(X_1)g_2(X_2)\cdots g_n(X_n))=E(g_1(X_1))E(g_2(X_2))\cdots E(g_n(X_n))$
        - $M_{X_1+X_2+\cdots+X_n}(t)=M_{X_1}(t)M_{X_2}(t)\cdots M_{X_n}(t)$
            - $=(M_{X_1}(t))^n$, if all the $X_i$ have the same distribution.
        - $M_{b+a_1X_1+a_2X_2+...+a_nX_n}(t)=e^{tb} M_{X_1}(a_1t)M_{X_2}(a_2t)...M_{X_n}(a_nt)$
        - $g_1(X_1),g_2(X_2),...,g_n(X_n)$ are mutually independent


- Theorem
    - $X_1,X_2,...,X_n$ are mutually independent random variables if and only if the joint pdf/pmf can be written as

        $f(x_1,x_2,...,x_n)=g_1(x_1)g_2(x_2)...g_n(x_n),\forall (x_1,x_2,...,x_n)\in\mathbb{R}^n$



### Example

- Let $X_1,X_2,...,X_n$ be **mutually independent** random variables where

    $X_i\sim N(\mu_i,\sigma_i^2),\,i=1,2,...,n$

    Find the distribution of 

    $Y = a_1X_1+a_2X_2+\cdots+a_nX_n+b$

    Known: $M_{X_i}(t)=e^{t\mu_i+t^2\sigma_i^2/2}$ for $i=1,2,...,n$



$\begin{aligned}
&Y=a_1X_1+a_2X_2+...+a_nX_n+b\\
&\Rightarrow M_{Y}(t)=e^{tb}\prod_{i=1}^{n} M_{X_i}(a_it)\\
&=e^{tb}\prod_{i=1}^{n}e^{a_{i}t\mu_i+a_{i}^2t^{2}\sigma_{i}^2/2}\\
&=\exp\Big(tb+\sum_{i=1}^{n}a_it\mu_i+\sum_{i=1}^{n}a_{i}^2t^2\sigma_i^2/2\Big)\\
&=\exp\Big(t\big(b+\sum_{i=1}^{n}a_i\mu_i\big)+\frac{t^2}{2}\sum_{i=1}^{n}a_i^2\sigma_i^2\Big)\\
&=\text{mgf of }N\Big(b+\sum_{i=1}^{n}a_i\mu_i,\sum_{i=1}^{n}a_i^2\sigma_i^2\Big)
\end{aligned}$

Hence $Y$ is normal with mean $b+\sum_{i=1}^{n}a_i\mu_i$ and variance $\sum_{i=1}^{n}a_i^2\sigma_i^2$.
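The result can be checked numerically by comparing the product of the individual mgfs with the mgf of the claimed normal distribution. The Python sketch below does this at one value of $t$; the coefficients, means and variances are arbitrary illustrative values, and the function names are not from the notes.

```python
import math

def normal_mgf(t, mu, var):
    """mgf of N(mu, var) evaluated at t."""
    return math.exp(t * mu + t**2 * var / 2)

def linear_combination_params(a, mu, var, b):
    """Mean and variance of Y = b + sum(a_i X_i) for independent X_i ~ N(mu_i, var_i)."""
    mean_y = b + sum(ai * mi for ai, mi in zip(a, mu))
    var_y = sum(ai**2 * vi for ai, vi in zip(a, var))
    return mean_y, var_y

# Arbitrary illustrative parameters.
a = [2.0, -1.0, 0.5]
mu = [1.0, 0.0, -2.0]
var = [1.0, 4.0, 9.0]
b = 3.0

mean_y, var_y = linear_combination_params(a, mu, var, b)
t = 0.3
lhs = math.exp(t * b) * math.prod(normal_mgf(ai * t, mi, vi) for ai, mi, vi in zip(a, mu, var))
rhs = normal_mgf(t, mean_y, var_y)

print(mean_y, var_y, math.isclose(lhs, rhs))
```

Note that each coefficient $a_i$ stays inside its own sum: the mean uses $a_i\mu_i$ and the variance uses $a_i^2\sigma_i^2$ term by term.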

## Multivariate transformations 
#### (continuous case, one-to-one transformation)

- Let $X=(X_1,X_2,...,X_n)$ be a random vector with joint pdf $f_X(x_1,x_2,...,x_n)$
    - The transformation maps $\mathbb{R}^n\to \mathbb{R}^n$
- Want the joint pdf of $U_1=g_1(\underline{X}),U_2=g_2(\underline{X}),...,U_n=g_n(\underline{X})$
    - $g(\underline{X})=(g_1(\underline{X}),g_2(\underline{X}),...,g_n(\underline{X}))$ is required to be one-to-one.
- Find the inverse functions:

    $\begin{array}{lcl}
    u_1=g_1(x_1,x_2,\ldots,x_n) & & x_1=h_1(u_1,u_2,\ldots,u_n) \\
    \quad\vdots & \Rightarrow & \quad\vdots \\
    u_n=g_n(x_1,x_2,\ldots,x_n) & & x_n=h_n(u_1,u_2,\ldots,u_n)
    \end{array}$

- Then the joint pdf of $U$ is

    $f_U(u)=f_X(h_1(u),h_2(u),...,h_n(u))|J|$

    where $J$ is the Jacobian determinant of the inverse transformation:

    $\boldsymbol{J}=\det\left(\begin{bmatrix}\frac{\partial h_1(\mathbf{u})}{\partial u_1}&\cdots&\frac{\partial h_1(\mathbf{u})}{\partial u_n}\\\frac{\partial h_2(\mathbf{u})}{\partial u_1}&\cdots&\frac{\partial h_2(\mathbf{u})}{\partial u_n}\\\vdots&\ddots&\vdots\\\frac{\partial h_n(\mathbf{u})}{\partial u_1}&\cdots&\frac{\partial h_n(\mathbf{u})}{\partial u_n}\end{bmatrix}\right)$
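
As a worked instance of the formula, consider an assumed example that is not part of the notes: $X_1,X_2$ independent $N(0,1)$ with $U_1=X_1+X_2$ and $U_2=X_1-X_2$. The inverse functions are $h_1(u)=(u_1+u_2)/2$ and $h_2(u)=(u_1-u_2)/2$, so $J=-\frac12$ and $|J|=\frac12$. The Python sketch below evaluates $f_U$ through the change-of-variables formula and compares it with the product of two $N(0,2)$ densities.

```python
import math

def f_x(x1, x2):
    """Joint pdf of two independent N(0, 1) variables (assumed example density)."""
    return math.exp(-(x1**2 + x2**2) / 2) / (2 * math.pi)

def f_u(u1, u2):
    """Joint pdf of U1 = X1 + X2 and U2 = X1 - X2 via f_X(h(u)) * |J|."""
    x1 = (u1 + u2) / 2   # h_1(u)
    x2 = (u1 - u2) / 2   # h_2(u)
    # Jacobian of the inverse map: det [[1/2, 1/2], [1/2, -1/2]] = -1/2
    jac = 0.5 * (-0.5) - 0.5 * 0.5
    return f_x(x1, x2) * abs(jac)

def normal_pdf(u, var):
    """pdf of N(0, var) evaluated at u."""
    return math.exp(-u**2 / (2 * var)) / math.sqrt(2 * math.pi * var)

u1, u2 = 0.7, -1.2   # arbitrary evaluation point
print(f_u(u1, u2), normal_pdf(u1, 2) * normal_pdf(u2, 2))
```

The two printed values agree, which shows that in this example $U_1$ and $U_2$ are independent $N(0,2)$ variables. Omitting $|J|$ would give a function that integrates to 2 rather than 1.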