$$
\newcommand{\theorem}{\textbf{Theorem: }}
\newcommand{\proof}{\textbf{Proof: }}
$$

# Two dimensional random variables
There are experiments where we are interested in more than one random variable.
For instance, a researcher might want to know the joint distribution of the height and weight of people in a population.

Thus, instead of a single random variable $X$, we are now interested in two random variables $X$ and $Y$, each of which assigns a value to every $s \in S$.

Our range space is now reformulated to be:
$$
R_{X,Y} = \left\{(x,y) | x = X(s), y = Y(s) , s \in S\right\}
$$

Note that, if we wish, we can also look at more than two random variables.
In that case we denote the random variables of interest by $X_1, X_2, \dots, X_n$.

## Joint probability density function
As in the single random variable case, we define a function $f_{X,Y}(x,y)$ that assigns a probability (in the discrete case) or a probability density (in the continuous case) to each value of the two-dimensional random variable.

### Discrete

Similar to the one-dimensional case, the function must satisfy:
* $f_{X,Y}(x_i, y_j) \geq 0$ for all $(x_i, y_j) \in R_{X,Y}$
* $\sum_i \sum_j f_{X,Y}(x_i, y_j) = 1$
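
As a minimal sketch of these two conditions, the snippet below stores a joint distribution as a Python dictionary keyed by $(x, y)$ pairs and checks both requirements. The table values, the dictionary representation, and the helper name `is_valid_joint_pmf` are illustrative assumptions, not part of the original notes; `Fraction` is used only to keep the arithmetic exact.

```python
from fractions import Fraction as F

# Hypothetical joint distribution: keys are (x, y) pairs, values are probabilities.
joint = {
    (0, 0): F(1, 10), (0, 1): F(2, 10),
    (1, 0): F(3, 10), (1, 1): F(4, 10),
}

def is_valid_joint_pmf(pmf):
    """Check that every probability is non-negative and that they sum to 1."""
    return all(p >= 0 for p in pmf.values()) and sum(pmf.values()) == 1

print(is_valid_joint_pmf(joint))  # True
```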

### Continuous

Similarly, the function must satisfy:
* $f_{X,Y}(x, y) \geq 0$ for all $(x, y) \in R_{X,Y}$
* $\iint_{(x,y) \in R_{X, Y}} f_{X,Y}(x, y) \, dx \, dy = 1$

## Marginal probability distribution

The **marginal probability distributions** of $X$ and $Y$ are defined as follows:

### Discrete
$$
f_X(x) = \sum_y f_{X,Y}(x,y) \quad 
f_Y(y) = \sum_x f_{X,Y}(x,y)
$$

### Continuous
$$
f_X(x) = \int^\infty _{-\infty} f_{X,Y}(x,y) dy\quad 
f_Y(y) = \int^\infty _{-\infty} f_{X,Y}(x,y) dx
$$

This can be viewed as "squashing" the 2-D table of probabilities of two variables into a 1-D table of one variable, by summing (or integrating) over the variable we are not interested in.

Suppose that we had a census which gives us the joint distribution of the height and weight of people, but we only care about the distribution of heights.
Summing (or integrating) the joint distribution over all weights gives us the desired distribution of heights.
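
The "squashing" operation can be sketched in a few lines for the discrete case. The joint table below is hypothetical, and the dictionary representation and the helper name `marginals` are assumptions made for illustration only.

```python
from collections import defaultdict
from fractions import Fraction as F

# Hypothetical joint distribution: keys are (x, y) pairs, values are probabilities.
joint = {
    (0, 0): F(1, 10), (0, 1): F(2, 10),
    (1, 0): F(3, 10), (1, 1): F(4, 10),
}

def marginals(pmf):
    """Return (f_X, f_Y) by summing the joint table over the other variable."""
    f_x, f_y = defaultdict(F), defaultdict(F)
    for (x, y), p in pmf.items():
        f_x[x] += p  # sum over y for each fixed x
        f_y[y] += p  # sum over x for each fixed y
    return dict(f_x), dict(f_y)

f_x, f_y = marginals(joint)
print(f_x[0], f_x[1])  # 3/10 7/10
print(f_y[0], f_y[1])  # 2/5 3/5
```

Each marginal is itself a valid one-dimensional distribution, so its values sum to 1.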

## Conditional distribution
The **conditional distribution** of $X$ given that $Y = y$ is defined as:
$$
f_{X | Y}(x | y)= \frac{f_{X,Y}(x,y)}{f_Y(y)} \quad \text{if } f_Y(y) > 0
$$

$f_{Y|X}(y|x)$ is defined similarly.

Note that marginal and conditional probability distributions both satisfy the [conditions for the single variable case](./random_variables.ipynb#variable-props).

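Conditional distributions also follow the *multiplication rule*:

$$
f_{X,Y}(x,y) = f_{Y|X} (y|x) f_X(x)
$$

Dividing both sides by $f_Y(y)$ (when $f_Y(y) > 0$) gives *Bayes' law*:

$$
f_{X|Y}(x|y) = \frac{f_{Y|X}(y|x) f_X(x)}{f_Y(y)}
$$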

**Example**

One way to contextualize the conditional distribution: suppose we have a census of the height and weight of people, and we wish to find the distribution of weight among all people who are 1.6 m tall.
<span hidden>TODO: add example</span>
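
The census scenario can be sketched with a tiny made-up table. All numbers below are hypothetical (two height bins in metres, two weight bins in kilograms), and the helper name `conditional_y_given_x` is an assumption for illustration, not something defined in these notes.

```python
from fractions import Fraction as F

# Hypothetical census table: keys are (height_m, weight_kg), values are probabilities.
census = {
    (1.6, 50): F(3, 10), (1.6, 60): F(1, 10),
    (1.7, 50): F(2, 10), (1.7, 60): F(4, 10),
}

def conditional_y_given_x(pmf, x0):
    """Return f_{Y|X}(y | x0) as a dict over y; requires f_X(x0) > 0."""
    f_x0 = sum(p for (x, _), p in pmf.items() if x == x0)  # marginal f_X(x0)
    if f_x0 == 0:
        raise ValueError("f_X(x0) must be positive")
    # Divide each joint probability in the x0 slice by the marginal.
    return {y: p / f_x0 for (x, y), p in pmf.items() if x == x0}

weight_given_160 = conditional_y_given_x(census, 1.6)
print(weight_given_160[50], weight_given_160[60])  # 3/4 1/4
```

The conditional probabilities sum to 1, as required of any single-variable distribution.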

## Independent random variable
Random variables $X$ and $Y$ are independent if and only if:
$$
f_{X, Y}(x,y) = f_X(x)f_Y(y) \quad \text{for all } x,y
$$
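
For a finite discrete table, this condition can be checked cell by cell. The sketch below is one possible way to do so; both tables and the helper name `is_independent` are hypothetical, and exact `Fraction` arithmetic is assumed so that the equality test is meaningful.

```python
from fractions import Fraction as F

def is_independent(pmf):
    """Check f_{X,Y}(x, y) == f_X(x) * f_Y(y) for every (x, y) pair."""
    f_x, f_y = {}, {}
    for (x, y), p in pmf.items():
        f_x[x] = f_x.get(x, 0) + p
        f_y[y] = f_y.get(y, 0) + p
    # Pairs missing from the table have probability 0 and must be checked too.
    return all(pmf.get((x, y), 0) == f_x[x] * f_y[y] for x in f_x for y in f_y)

# Hypothetical table built as a product of marginals, hence independent.
product_table = {
    (0, 0): F(1, 12), (0, 1): F(2, 12),
    (1, 0): F(3, 12), (1, 1): F(6, 12),
}
# Hypothetical table that does not factorize.
dependent_table = {
    (0, 0): F(1, 10), (0, 1): F(2, 10),
    (1, 0): F(3, 10), (1, 1): F(4, 10),
}

print(is_independent(product_table))    # True
print(is_independent(dependent_table))  # False
```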

## Expectation
(For brevity, we jump straight to discussing ["expectation with transformation"](./random_variables.ipynb#function-expectation))

The expectation of $g(X,Y)$ is defined as follows.

### Discrete
    
$$
E(g(X,Y)) = \sum_x \sum_y g(x,y) f_{X,Y}(x,y)
$$
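
A minimal sketch of the discrete formula, assuming the joint distribution is stored as a dictionary keyed by $(x, y)$ pairs. The table and the helper name `expectation` are hypothetical.

```python
from fractions import Fraction as F

# Hypothetical joint distribution: keys are (x, y) pairs, values are probabilities.
joint = {
    (0, 0): F(1, 10), (0, 1): F(2, 10),
    (1, 0): F(3, 10), (1, 1): F(4, 10),
}

def expectation(pmf, g):
    """E(g(X, Y)): sum of g(x, y) * f(x, y) over all (x, y) in the table."""
    return sum(g(x, y) * p for (x, y), p in pmf.items())

print(expectation(joint, lambda x, y: x * y))  # 2/5
print(expectation(joint, lambda x, y: x + y))  # 13/10
```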

### Continuous

$$
E(g(X,Y)) = \int^\infty_{-\infty}\int^\infty_{-\infty} g(x,y) f_{X,Y}(x,y) dx dy
$$
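
For the continuous case, the double integral can be approximated numerically. The sketch below assumes a hypothetical density $f_{X,Y}(x,y) = x + y$ on the unit square (and zero elsewhere), so the infinite limits reduce to $[0,1] \times [0,1]$. It uses a plain midpoint rule; the helper name `expectation_2d` and the grid size are illustrative choices.

```python
def density(x, y):
    """Hypothetical joint density on the unit square; integrates to 1."""
    return x + y

def expectation_2d(g, f, n=200):
    """Approximate the integral of g(x, y) * f(x, y) over [0, 1]^2 (midpoint rule)."""
    h = 1.0 / n
    mids = [(i + 0.5) * h for i in range(n)]  # midpoints of the n sub-intervals
    return sum(g(x, y) * f(x, y) for x in mids for y in mids) * h * h

total_mass = expectation_2d(lambda x, y: 1.0, density)  # should be close to 1
e_xy = expectation_2d(lambda x, y: x * y, density)      # exact value is 1/3
print(round(total_mass, 6), round(e_xy, 4))
```

For this density the exact value is $E(XY) = \int_0^1\int_0^1 xy(x+y)\,dx\,dy = \frac{1}{3}$, which the approximation should match to several decimal places.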

## Covariance
The **covariance** of $(X,Y)$ is defined as
$$
Cov(X,Y) = E((X-\mu_X)(Y-\mu_Y))
$$

This is the expectation defined above with $g(X,Y) = (X-\mu_X)(Y-\mu_Y)$.

### Discrete    

$$
Cov(X,Y) = \sum_x \sum_y (x-\mu_X)(y-\mu_Y) f_{X,Y}(x,y)
$$
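
The discrete formula translates directly into code. This is a sketch under the same assumptions as before: a hypothetical joint table stored as a dictionary, with illustrative helper names `expectation` and `covariance`.

```python
from fractions import Fraction as F

# Hypothetical joint distribution: keys are (x, y) pairs, values are probabilities.
joint = {
    (0, 0): F(1, 10), (0, 1): F(2, 10),
    (1, 0): F(3, 10), (1, 1): F(4, 10),
}

def expectation(pmf, g):
    """E(g(X, Y)) for a discrete joint table."""
    return sum(g(x, y) * p for (x, y), p in pmf.items())

def covariance(pmf):
    """Cov(X, Y) = E((X - mu_X)(Y - mu_Y)), computed from the definition."""
    mu_x = expectation(pmf, lambda x, y: x)
    mu_y = expectation(pmf, lambda x, y: y)
    return expectation(pmf, lambda x, y: (x - mu_x) * (y - mu_y))

print(covariance(joint))  # -1/50
```

Here $\mu_X = 7/10$, $\mu_Y = 3/5$ and $E(XY) = 2/5$, so the result agrees with $E(XY) - \mu_X\mu_Y = 2/5 - 21/50 = -1/50$.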

### Continuous

$$
Cov(X,Y) = \int^\infty_{-\infty}\int^\infty_{-\infty} (x-\mu_X)(y-\mu_Y) f_{X,Y}(x,y) dx dy
$$

### Properties
* $Cov(X,Y) = E(XY) - \mu_X \mu_Y$
* If $X$ and $Y$ are independent, then $Cov(X,Y) = 0$
    * However, the converse is not true
* $Cov(aX+b, cY+d) = ac(Cov(X, Y))$
* $V(aX + bY) = a^2V(X) + b^2V(Y) + 2abCov(X,Y)$

<details>
<summary style="color: blue">$\proof$ (Click to expand)</summary>
<div style="background: aliceblue">
$$
\begin{align}
Cov(X, Y) &= \sum _x \sum_y (x - \mu_X)(y-\mu_Y) f_{X, Y}(x, y) &\\
&= \sum_x \sum_y (xy - y\mu_X-x\mu_Y + \mu_X \mu_Y) f_{X, Y}(x, y) &\\
&= \sum _x \sum_y xy f_{X, Y}(x, y) - \sum _x \sum_y y\mu_X f_{X, Y}(x, y) -\sum_x\sum_y x\mu_Y f_{X, Y}(x, y) + \sum_x\sum_y\mu_X \mu_Y f_{X, Y}(x, y) &\\
&= E(XY) - \sum_x \sum_y y\mu_X f_{X, Y}(x, y) -\sum_x\sum_y x\mu_Y f_{X, Y}(x, y) + \sum_x\sum_y\mu_X \mu_Y f_{X, Y}(x, y) &\\
&= E(XY) -\mu_X \sum_x \sum_y y f_{X, Y}(x, y) -\mu_Y\sum_x\sum_y x f_{X, Y}(x, y) +\mu_X \mu_Y \sum_x\sum_y f_{X, Y}(x, y) &\\
&= E(XY) -\mu_X \sum_y \sum_x y f_{X, Y}(x, y) -\mu_Y\sum_x x f_{X}(x) +\mu_X \mu_Y &\\
&= E(XY) -\mu_X \sum_y y f_{Y}(y) -\mu_Y\mu_X +\mu_X \mu_Y &\\
&= E(XY) -\mu_X \mu_Y -\mu_Y\mu_X +\mu_X \mu_Y &\\
&= E(XY) -\mu_X \mu_Y &\\
& QED
\end{align} \\
$$

---    
 
$$
\begin{align}
Cov(X, Y) &= E(XY) -\mu_X \mu_Y \\
&=\sum _x \sum _y xy f_{X,Y} (x,y) - \mu_X \mu_Y \\
&=\sum _x \sum _y xy f_X(x) f_Y(y) - \mu_X \mu_Y && \text{(since } X \text{ and } Y \text{ are independent)} \\
&=\sum _x x f_X(x) \sum _y y f_Y(y) - \mu_X \mu_Y \\
&=\mu_X \mu_Y - \mu_X \mu_Y \\
&=0 \\
& QED
\end{align}      
$$
        
---

To see that the converse does not hold, consider the following distribution:
        
| $f_{X, Y}(x,y)$ | $x = -2$ | $x = 0$ | $x = 2$ |
| --- | --- | --- | --- |
| $y = -1$ | 0 | 0.2 | 0 |
| $y = 0$ | 0.2 | 0.2 | 0.2 |
| $y = 1$ | 0 | 0.2 | 0 |

        
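Every outcome with non-zero probability has $x = 0$ or $y = 0$, so $E(XY) = 0$, and by symmetry $\mu_X = \mu_Y = 0$.
We can therefore see that
$$
\begin{align}
Cov(X, Y) &= E(XY) -\mu_X \mu_Y \\
&= 0 - 0 \\
&= 0
\end{align}
$$

However, $f_{X}(-2) = 0.2$ and $f_{Y}(0) = 0.2 + 0.2 + 0.2 = 0.6$, so $f_{X,Y}(-2, 0) = 0.2 \neq f_X(-2) f_Y(0) = (0.2)(0.6) = 0.12$.

Therefore $X$ and $Y$ are not independent even though $Cov(X,Y) = 0$, which gives us a simple counterexample.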

---

$$
\begin{align}
Cov(aX + b, cY + d) &= E\left( (aX+b - \mu_{aX + b}) (cY+d - \mu_{cY + d}) \right) \\
&= E\left((aX+b-(a \mu_X + b)) (cY+d - (c\mu_Y + d)) \right) \\
&= E\left((aX-a \mu_X) (cY - c\mu_Y) \right) \\
&= E\left(a(X- \mu_X) c(Y - \mu_Y) \right) \\
&= acE\left((X- \mu_X)(Y - \mu_Y) \right) \\
&= acCov(X, Y) \\
& QED
\end{align}
$$

---

$$
\begin{align}
V(aX + bY) &= E\left( (aX + bY - \mu_{aX + bY})^2 \right) \\
&= E\left( a^2 X^2 + b^2Y^2 + \mu_{aX + bY}^2 + 2abXY - 2(aX + bY) \mu_{aX+bY} \right) \\
&= E\left( a^2 X^2 + b^2Y^2 + (\mu_{aX} + \mu_{bY})^2 + 2abXY - 2(aX + bY) (\mu_{aX}+\mu_{bY}) \right) \\
&= E\big( a^2 X^2 + b^2Y^2 + (\mu^2_{aX} + \mu^2_{bY} + 2\mu_{aX} \mu_{bY}) + 2abXY \\
& \qquad - 2(aX\mu_{aX} + bY\mu_{aX}+ aX\mu_{bY} + bY\mu_{bY}) \big) \\
&= E\left( a^2 X^2 - 2 aX\mu_{aX} +  \mu^2_{aX}\right) 
+ E\left(b^2Y^2 - 2bY\mu_{bY} + \mu^2_{bY}\right) \\
& \qquad
+ E \left( 2\mu_{aX} \mu_{bY} + 2abXY - 2(aX\mu_{bY} + bY\mu_{aX}) \right) \\
&= V(aX) + V(bY) +  E \left( 2\mu_{aX} \mu_{bY} + 2abXY - 2(aX\mu_{bY} + bY\mu_{aX})\right) \\
&= V(aX) + V(bY) +  E \left( 2a\mu_{X} b\mu_{Y} + 2abXY - 2(abX\mu_{Y} + abY\mu_{X})\right) \\
&= V(aX) + V(bY) + 2abE\left( (X - \mu_X)(Y - \mu_Y) \right) \\
&= a^2V(X) + b^2V(Y) + 2abCov(X, Y) \\
& QED
\end{align}
$$

</div>
</details>
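
The covariance properties can also be spot-checked numerically. The sketch below uses exact `Fraction` arithmetic, one hypothetical joint table, and the table from the counterexample in the proof. The constants `a, b, c, d` and all helper names are arbitrary illustrative choices, and a numerical check on one table is of course not a proof.

```python
from fractions import Fraction as F

def expectation(pmf, g):
    """E(g(X, Y)) for a discrete joint table."""
    return sum(g(x, y) * p for (x, y), p in pmf.items())

def cov(pmf, u, v):
    """Covariance of u(X, Y) and v(X, Y), computed from the definition."""
    mu_u, mu_v = expectation(pmf, u), expectation(pmf, v)
    return expectation(pmf, lambda x, y: (u(x, y) - mu_u) * (v(x, y) - mu_v))

def var(pmf, u):
    """Variance as the covariance of a variable with itself."""
    return cov(pmf, u, u)

def X(x, y):
    return x

def Y(x, y):
    return y

# Hypothetical joint table and arbitrary constants.
joint = {(0, 0): F(1, 10), (0, 1): F(2, 10), (1, 0): F(3, 10), (1, 1): F(4, 10)}
a, b, c, d = 2, 3, -1, 5

# Cov(X, Y) = E(XY) - mu_X * mu_Y
shortcut = expectation(joint, lambda x, y: x * y) - expectation(joint, X) * expectation(joint, Y)
assert cov(joint, X, Y) == shortcut

# Cov(aX + b, cY + d) = ac * Cov(X, Y)
assert cov(joint, lambda x, y: a * x + b, lambda x, y: c * y + d) == a * c * cov(joint, X, Y)

# V(aX + bY) = a^2 V(X) + b^2 V(Y) + 2ab Cov(X, Y)
lhs = var(joint, lambda x, y: a * x + b * y)
rhs = a**2 * var(joint, X) + b**2 * var(joint, Y) + 2 * a * b * cov(joint, X, Y)
assert lhs == rhs

# Counterexample table from the proof: zero covariance without independence.
table = {(0, -1): F(1, 5), (-2, 0): F(1, 5), (0, 0): F(1, 5), (2, 0): F(1, 5), (0, 1): F(1, 5)}
f_x_minus2 = sum(p for (x, _), p in table.items() if x == -2)
f_y_0 = sum(p for (_, y), p in table.items() if y == 0)
assert cov(table, X, Y) == 0
assert table[(-2, 0)] != f_x_minus2 * f_y_0
print("all covariance checks passed")
```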

## Correlation coefficient
The **correlation coefficient** of $X$ and $Y$ is defined as follows:
$$
Cor(X,Y) = \rho_{X,Y} = \frac{Cov(X,Y)}{\sqrt{V(X)} \sqrt{V(Y)}}
$$

### Properties
* $-1 \leq \rho_{X,Y} \leq 1$
* It measures the degree of linear relationship between $X$ and $Y$
* If $X$ and $Y$ are independent, then $\rho_{X,Y} = 0$
    * The converse is not true
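
As a rough illustration of the definition and of the bound $-1 \leq \rho_{X,Y} \leq 1$, the sketch below computes $\rho_{X,Y}$ for two hypothetical tables: a weakly dependent one, and one where $Y = 2X + 1$ exactly. The tables and the helper names are assumptions made for illustration.

```python
import math
from fractions import Fraction as F

def expectation(pmf, g):
    """E(g(X, Y)) for a discrete joint table."""
    return sum(g(x, y) * p for (x, y), p in pmf.items())

def correlation(pmf):
    """rho = Cov(X, Y) / (sqrt(V(X)) * sqrt(V(Y))); assumes both variances are positive."""
    mu_x = expectation(pmf, lambda x, y: x)
    mu_y = expectation(pmf, lambda x, y: y)
    cov_xy = expectation(pmf, lambda x, y: (x - mu_x) * (y - mu_y))
    var_x = expectation(pmf, lambda x, y: (x - mu_x) ** 2)
    var_y = expectation(pmf, lambda x, y: (y - mu_y) ** 2)
    return float(cov_xy) / math.sqrt(float(var_x * var_y))

# Hypothetical weakly dependent table.
joint = {(0, 0): F(1, 10), (0, 1): F(2, 10), (1, 0): F(3, 10), (1, 1): F(4, 10)}
# Hypothetical table where Y = 2X + 1 with probability 1 (perfect linear relationship).
linear = {(0, 1): F(1, 2), (1, 3): F(1, 2)}

print(round(correlation(joint), 3))  # roughly -0.089
print(correlation(linear))           # 1.0
```

The first table has a covariance of $-1/50$ and a correlation close to zero, while the exact linear relationship attains the upper bound of 1.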
    

<span hidden> TODO: add proof </span>
