# Sum of Real Random Variables

In [None]:
%%html
<link rel="stylesheet" type="text/css" href="../styles/styles.css">

## Learning Objectives

By the end of this lesson, you will be able to:

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from scipy import stats
from mpl_toolkits.mplot3d import Axes3D

# Set style for better-looking plots
plt.style.use('seaborn-v0_8-darkgrid')
#sns.set_palette("husl")

# Set random seed for reproducibility
np.random.seed(42)

In [None]:
import sys
from pathlib import Path

# Add the "resources" directory to the path
project_root = Path().resolve().parent
resources_path = project_root / 'resources'
sys.path.insert(0, str(resources_path))

In [None]:
from multivariate import (
    get_viz_movie_data, viz_joint_distr_cont, covariance_demo, demo_corr_dependence,
    get_xy_plane, get_all_pdf_cont, demo_correlation, generate_movie_data,
)

## Functions of Two Random Variables

So far we have studied pairs of r.v. and seen a few examples of functions of two r.v. We now formalize how to obtain the distribution and the expectation of such functions.

<div class="alert alert-success">
<h4>Definition: Function of two r.v.</h4>

<h5>Discrete case</h5>

Let $X$ and $Y$ be two discrete r.v. Let $Z = g(X, Y)$ where $g: \mathbb{R}^2 \rightarrow \mathbb{R}$.
Then:

$$\mathbb{P}_Z(z) = \mathbb{P}(g(X, Y) = z) = \sum_{(x_i, y_j)\in A} \mathbb{P}_{XY}(x_i,y_j)$$

where $A = \{(x_i, y_j) \in \mathbb{R}_{XY} : \ g(x_i,y_j) = z \}$

<h5>Continuous case</h5>

In the continuous case, the CDF of $Z = g(X, Y)$ can be defined as:

$$F_Z(z) = \mathbb{P}(Z \leq z) = \mathbb{P}(g(X,Y) \leq z) = \iint\limits_D f_{XY}(x,y)dxdy$$

where $D = \{(x,y)\ | \ g(x,y) \leq z\}$

</div>
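
The discrete formula above can be sketched in a few lines of Python. This is a minimal illustration, not part of the lesson's `multivariate` helpers: the function name `pmf_of_function` and the joint PMF (two independent fair coin flips coded as 0/1) are assumptions chosen for the example.

```python
from collections import defaultdict
from fractions import Fraction


def pmf_of_function(joint_pmf, g):
    """Return the PMF of Z = g(X, Y) from a joint PMF given as {(x, y): probability}."""
    pmf_z = defaultdict(Fraction)
    for (x, y), p in joint_pmf.items():
        # Every pair (x, y) with g(x, y) = z contributes its mass to P(Z = z)
        pmf_z[g(x, y)] += p
    return dict(pmf_z)


# Hypothetical joint PMF: two independent fair coin flips coded as 0/1
joint = {(x, y): Fraction(1, 4) for x in (0, 1) for y in (0, 1)}

pmf_max = pmf_of_function(joint, max)          # Z = max(X, Y)
pmf_total = pmf_of_function(joint, lambda x, y: x + y)  # Z = X + Y
print(pmf_max)
print(pmf_total)
```

Using `Fraction` keeps the probabilities exact, which makes it easy to compare the result with a hand calculation.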

What if we want the expected value of a function of two random variables, but we don't know (or don't want to derive) the distribution of the transformed variable?

To find the expectation of $Z = g(X, Y)$, we can sum or integrate $g$ directly against the joint distribution of $(X, Y)$:

<div class="alert alert-success" >
<h4>Definition: LOTUS - Law of the Unconscious Statistician</h4>

<h5>Discrete case</h5>

Let $X$ and $Y$ be two discrete r.v. Let $Z = g(X, Y)$ where $g: \mathbb{R}^2 \rightarrow \mathbb{R}$. Then, the expectation of $Z$ is given by (Law of the Unconscious Statistician - LOTUS):

$$\mathbb{E}[Z] = \mathbb{E}[g(X,Y)] =  \sum_{(x_i, y_j)\in \mathbb{R}_{XY}} g(x_i,y_j) \mathbb{P}_{XY}(x_i,y_j)$$

<h5>Continuous case</h5>

Let $(X,Y)$ be a pair of jointly continuous r.v. with PDF $f_{XY}(x,y)$. Let $Z = g(X, Y)$ where $g: \mathbb{R}^2 \rightarrow \mathbb{R}$. Then, the expectation of $Z$ is given by:

$$\mathbb{E}[Z] = \mathbb{E}[g(X,Y)] = \int\limits_{-\infty}^{\infty}\int\limits_{-\infty}^{\infty}g(x,y)f_{XY}(x,y)dxdy$$

Note that in the case where $X$ and $Y$ are independent, $f_{XY}(x,y) = f_X(x)\times f_Y(y)$.

</div>

<div class="alert example">
<h4>Calculated Example</h4>

Let $(X, Y)$ have joint PDF:
$$f_{X,Y}(x,y) = \left\{\begin{array}{ll} 2 & \text{ if } 0\leq x \leq y \leq 1 \\ 0 & \text{otherwise}\end{array}\right.$$

Find $E[XY]$.

</div>

<details>
<summary>Reveal solution</summary>

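Since we want $E[XY]$, the function is $g(x, y) = xy$. The PDF is non-zero only on the region $0 \leq x \leq y \leq 1$, so for each $y \in [0, 1]$ the inner variable $x$ runs from $0$ to $y$.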

Using multivariate LOTUS:

$$E[XY] = \int_0^1\int_0^y xy \times 2 dxdy = 2 \int_0^1 \bigg[\frac{x^2y}{2}\bigg]_{x=0}^{x=y} dy = 2 \int_0^1 \bigg[\frac{y^2y}{2} - \frac{0^2y}{2} \bigg] dy =$$
$$= 2 \int_0^1 \frac{y^3}{2} dy = \int_0^1 y^3 dy = \bigg[\frac{y^4}{4}\bigg]_{y=0}^{y=1} = \frac{1^4}{4} - \frac{0^4}{4} = \frac{1}{4}$$

</details>
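
The value $E[XY] = 1/4$ can be checked numerically. The sketch below uses `scipy.integrate.dblquad`, which expects the integrand as `func(inner, outer)`; here the inner variable is $x \in [0, y]$ and the outer variable is $y \in [0, 1]$. The variable names are illustrative.

```python
from scipy import integrate

# Integrand g(x, y) * f_XY(x, y) = x * y * 2 on the region 0 <= x <= y <= 1.
# dblquad calls func(inner, outer): inner = x, outer = y.
expected_xy, abs_error = integrate.dblquad(
    lambda x, y: x * y * 2.0,  # g(x, y) times the joint PDF
    0.0, 1.0,                  # outer limits: y from 0 to 1
    lambda y: 0.0,             # inner lower limit: x = 0
    lambda y: y,               # inner upper limit: x = y
)
print(expected_xy)
```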

### Method of Transformations

In the case of a pair of jointly continuous r.v. $(X,Y)$, we can formulate the following theorem:

<div class="alert alert-success" style='background-color:white'>

Let $(X,Y)$ be a pair of jointly continuous r.v. Let $g : \mathbb{R}^2 \to \mathbb{R}^2$ be a one-to-one (invertible) mapping with continuous partial derivatives.

Let $(Z,W)$ be a pair of r.v. defined by:
$$(Z,W) = g(X,Y) = \left(g_1(X,Y), g_2(X,Y)\right)$$

Let $h = g^{-1}$, i.e.:
$$(X,Y) = h(Z,W) = \left(h_1(Z, W), h_2(Z, W)\right)$$

Then $(Z,W)$ is a pair of jointly continuous r.v. with joint PDF $f_{ZW}(z,w)$ defined by:
$$\forall (z, w)\in \mathbb{R}_{ZW}, \ f_{ZW}(z,w) = f_{XY}(h_1(z,w), h_2(z, w)) |J|$$

where $J$ is the Jacobian of $h$ given by:

$$J = \det \left[\begin{matrix}\frac{\partial h_1}{\partial z} & \frac{\partial h_1}{\partial w} \\ \frac{\partial h_2}{\partial z} & \frac{\partial h_2}{\partial w} \end{matrix}\right] = \frac{\partial h_1}{\partial z}\cdot \frac{\partial h_2}{\partial w} - \frac{\partial h_2}{\partial z}\cdot \frac{\partial h_1}{\partial w}$$

</div>

Following the development from [@pishro-nik_introduction_2014], consider two r.v. $X$ and $Y$ with joint PDF $f_{XY}(x,y)$. Let's define a new r.v. $Z$ as follows: $Z = X + Y$.

> Let $Z = X + Y$. What is its PDF $f_Z(z)$?

To apply the previous theorem, we need a second r.v. $W$ alongside $Z$. A convenient choice is $W = X$. The function $g$ then defines the following transformation:

$$\left\{\begin{array}{ll}z = x + y \\ w = x\end{array}\right.$$

Now let's find the inverse function $h$ for $x$ and $y$:

$$\left\{\begin{array}{ll} x = w \\ y = z - w \end{array}\right.$$

The Jacobian is therefore given by:

$$J = \det \left[\begin{matrix}\frac{\partial h_1}{\partial z} & \frac{\partial h_1}{\partial w} \\ \frac{\partial h_2}{\partial z} & \frac{\partial h_2}{\partial w} \end{matrix}\right] = \det \left[\begin{matrix}0 & 1 \\ 1 & -1\end{matrix}\right] = 0\times(-1) - 1\times 1 = -1$$

Therefore:
$$|J| = |-1| = 1$$

Then:

$$f_{ZW}(z,w) = f_{XY}(h_1(z,w), h_2(z, w)) |J| = f_{XY}(w, z-w) \times 1$$

We are looking for $f_Z(z)$, which is the marginal PDF of $Z$, obtained by integrating the joint PDF $f_{ZW}(z,w)$ over $w$:

$$f_Z(z) = \int\limits_{-\infty}^{\infty}f_{XY}(w,z-w)dw$$

Note that if $X$ and $Y$ are independent, then $f_{XY}(x,y) = f_X(x)f_Y(y)$. In this case:

$$f_Z(z) = \int\limits_{-\infty}^{\infty}f_{XY}(w,z-w)dw = \int\limits_{-\infty}^{\infty}f_{X}(w)f_{Y}(z-w)dw$$

This integral is called the **convolution** or **convolution product** of $f_X$ and $f_Y$. The following notation is used:

$$f_Z(z) = f_X(z) \mathbf{*}f_Y(z) = \int\limits_{-\infty}^{\infty}f_{X}(w)f_{Y}(z-w)dw = \int\limits_{-\infty}^{\infty}f_{Y}(w)f_{X}(z-w)dw$$
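
As a short worked illustration (the choice of distribution is an assumption made for this example), take $X$ and $Y$ independent, both exponential with rate $\lambda > 0$, so that $f_X(w) = \lambda e^{-\lambda w}$ for $w \geq 0$ and $0$ otherwise. In the convolution integral the product $f_X(w)f_Y(z-w)$ is non-zero only when $0 \leq w \leq z$, so for $z \geq 0$:

$$f_Z(z) = \int\limits_{0}^{z}\lambda e^{-\lambda w}\,\lambda e^{-\lambda (z-w)}dw = \lambda^2 e^{-\lambda z}\int\limits_{0}^{z}dw = \lambda^2 z e^{-\lambda z}$$

and $f_Z(z) = 0$ for $z < 0$. Most of the work in a convolution is identifying where both factors are non-zero, which fixes the limits of integration.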

<div class="alert alert-success">
<h4>Definition: Distribution of the Sum of Two RV</h4>

Let $X$ and $Y$ be two independent r.v. Let $Z = X + Y$. The distribution of $Z$ is obtained by performing the **convolution product** of the distributions of $X$ and $Y$ as follows:

<h5>Discrete case</h5>

For integer-valued $X$ and $Y$:

$$\forall k \in \mathbb{Z}, \ \mathbb{P}(Z = k) = \sum_{i \in \mathbb{Z}}\mathbb{P}(X = k-i)\times \mathbb{P}(Y=i)$$

<h5>Continuous case</h5>

Let $f_X(x)$ and $f_Y(y)$ be the PDFs of $X$ and $Y$ respectively. We obtain:

$$\forall z\in \mathbb{R}, \ f_Z(z) = \int_{\mathbb{R}}f_X(x)f_Y(z-x)dx$$

</div>
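
The continuous convolution can be approximated on a grid with `numpy.convolve`. The sketch below assumes two independent $\text{Uniform}(0,1)$ variables (an example chosen here, not taken from the lesson), whose sum is known to have the triangular density $f_Z(z) = z$ on $[0,1]$ and $2 - z$ on $[1,2]$.

```python
import numpy as np

n = 1000
dx = 1.0 / n

# Uniform(0, 1) densities sampled at the n cell midpoints of [0, 1]
f_x = np.ones(n)
f_y = np.ones(n)

# Riemann-sum approximation of the convolution integral
f_z = np.convolve(f_x, f_y) * dx

# Entry k of the convolution corresponds to z = (k + 1) * dx
z = (np.arange(len(f_z)) + 1) * dx

# Known triangular density of the sum of two independent Uniform(0, 1)
f_exact = np.where(z <= 1.0, z, 2.0 - z)

max_err = np.max(np.abs(f_z - f_exact))
total_mass = f_z.sum() * dx
print(max_err, total_mass)
```

For smooth but non-constant densities the same code gives an approximation whose error shrinks as `dx` decreases.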


<div class="alert example">
<h4>Calculated Example</h4>

Let $X$ and $Y$ be two independent discrete r.v.

$X$ is defined by:

| $x$ | $-1$ | $1$ |
|--:|:--:|:--:|
|$\mathbb{P}(X=x)$ | $1/3$ | $2/3$ |

$Y$ is defined by:

| $y$ | $-2$ | $0$ | $2$ |
|--:|:--:|:--:|:--:|
|$\mathbb{P}(Y=y)$ | $2/6$ | $1/6$ | $3/6$|

What is the probability that $X+Y$ equals 1?

</div>

<details>
<summary>Reveal solution</summary>

Let's look at the cases where the sum $X + Y$ will equal $1$:

| $\downarrow X\ Y \rightarrow$| $-2$ | $0$ | $2$ |
|--:|:--:|:--:|:--:|
|$-1$ | $-3$ | $-1$ | $\mathbf{1}$|
|$1$ | $-1$ | $\mathbf{1}$ | $3$|

The two possibilities are:

* $X = -1$ and $Y = 2$
* $X = 1$ and $Y = 0$

Then:

$$\mathbb{P}_{X+Y}(1) = \mathbb{P}_X(X=-1)\times \mathbb{P}_Y(Y=2) + \mathbb{P}_X(X=1)\times \mathbb{P}_Y(Y=0) =$$
$$= \frac{1}{3}\times \frac{3}{6} + \frac{2}{3} \times \frac{1}{6} = \frac{1}{6} + \frac{1}{9} = \frac{5}{18}$$

</details>
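
The same computation can be reproduced with exact fractions. This sketch assumes $X$ and $Y$ are independent, as the product of marginal probabilities in the solution implies; it builds the full PMF of $X + Y$ by discrete convolution.

```python
from collections import defaultdict
from fractions import Fraction

pmf_x = {-1: Fraction(1, 3), 1: Fraction(2, 3)}
pmf_y = {-2: Fraction(2, 6), 0: Fraction(1, 6), 2: Fraction(3, 6)}

# Discrete convolution: add up P(X = x) * P(Y = y) over all pairs with the same sum
pmf_sum = defaultdict(Fraction)
for x, px in pmf_x.items():
    for y, py in pmf_y.items():
        pmf_sum[x + y] += px * py

print(pmf_sum[1])
print(dict(pmf_sum))
```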

### Special Cases

<div class="alert alert-success" style='background-color:white'>
<h4>Special Case: Sum of Normal Distributions</h4>

Let $X\sim \mathcal{N}(\mu_X, \sigma_X^2)$ and $Y\sim \mathcal{N}(\mu_Y, \sigma_Y^2)$. Let $X$ and $Y$ be independent. Then:
$$X + Y \sim \mathcal{N}(\mu_X + \mu_Y, \sigma_X^2 + \sigma_Y^2)$$

</div>
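
A quick simulation can make this result concrete. The parameter values below are arbitrary choices for illustration; the check compares the sample mean and variance of $X + Y$ with $\mu_X + \mu_Y$ and $\sigma_X^2 + \sigma_Y^2$.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples = 200_000

# Arbitrary illustrative parameters
mu_x, sigma_x = 1.0, 2.0
mu_y, sigma_y = -3.0, 1.5

# Independent draws of X and Y, then their sum
z = rng.normal(mu_x, sigma_x, n_samples) + rng.normal(mu_y, sigma_y, n_samples)

sample_mean = z.mean()
sample_var = z.var(ddof=1)
print(sample_mean, mu_x + mu_y)
print(sample_var, sigma_x**2 + sigma_y**2)
```

Matching moments alone does not prove normality; a histogram or `scipy.stats.probplot` of `z` would be a natural follow-up check.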

<div class="alert alert-success" style='background-color:white'>
<h4>Special Case: Sum of Poisson Distributions</h4>

Let $X\sim \mathcal{P}(\lambda)$ and $Y\sim \mathcal{P}(\mu)$. Let $X$ and $Y$ be independent. Then:
$$X + Y \sim \mathcal{P}(\lambda + \mu)$$

</div>
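
This property can be verified numerically by convolving the two PMFs. The rates below are arbitrary values chosen for the sketch, and the support is truncated at 60, which does not affect the entries that are compared.

```python
import numpy as np
from scipy import stats

lam, mu = 2.0, 3.5            # arbitrary illustrative rates
k = np.arange(0, 60)          # truncated support 0..59

pmf_x = stats.poisson.pmf(k, lam)
pmf_y = stats.poisson.pmf(k, mu)

# Discrete convolution; entries 0..59 only involve terms inside the truncated support
pmf_sum = np.convolve(pmf_x, pmf_y)[: len(k)]

pmf_expected = stats.poisson.pmf(k, lam + mu)
max_diff = np.max(np.abs(pmf_sum - pmf_expected))
print(max_diff)
```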

## Functions of $n$ Random Variables

<div class="alert alert-success">
<h4>Definition: LOTUS for the Function of n r.v.</h4>

Let $\mathbf{X} = (X_1, ..., X_n)$ be a random vector defined on the probability space $(\Omega, \mathcal{A}, \mathbb{P})$.

Let $Z = h(X_1,...,X_n)$ where $h:\mathbb{R}^n \to \mathbb{R}$ is a bounded and piecewise continuous function.

<h5>Discrete case</h5>

Let $\mathbf{X}$ have values in $D = \mathbf{X}(\Omega) \subset \mathbb{R}^n$ a finite or countable subset.

Then, the expectation $\mathbb{E}Z$ is given by:

$$\mathbb{E}Z = \sum\limits_{k=(k_1,...,k_n)\in D} h(k_1,...,k_n)\mathbb{P}(X_1 = k_1, ..., X_n = k_n)$$

<h5>Continuous case</h5>

Let $f_\mathbf{X} : \mathbb{R}^n \to \mathbb{R}$ be the PDF of $\mathbf{X}$.

Then, the expectation $\mathbb{E}Z$ is given by:

$$\mathbb{E}Z = \int_{\mathbb{R}^n}h(t_1,...,t_n)f_{\mathbf{X}}(t_1,...,t_n) dt_1...dt_n$$

when this integral exists.

</div>

<div class="alert alert-primary">
<h4>🤖 ML Application: Expected Loss Computation</h4>

Problem: Training a neural network with custom loss.

Given:

- Model prediction: $\hat{Y} = f(X; \theta)$
- True label: $Y$
- Loss function: $L(\hat{Y}, Y) = (\hat{Y} - Y)^2 + \lambda|\hat{Y}|$

Goal: Compute expected loss $E[L(\hat{Y}, Y)]$

Using LOTUS:
$E[L(\hat{Y}, Y)] = \iint L(\hat{y}, y) \cdot f_{\hat{Y}Y}(\hat{y}, y) \, d\hat{y} \, dy$

No need to derive the distribution of $L$ itself!

Practical Application:

- Gradient descent minimizes $E[L]$ over $\theta$
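- LOTUS expresses $E[L]$ as an integral over the joint distribution of the data, so it can be estimated by averaging the loss over samples without knowing the distribution of $L$
- When examples are drawn at random, each mini-batch gradient is an unbiased estimate of $E[\nabla_\theta L]$, which is what stochastic gradient descent relies on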

</div>
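
The idea can be sketched with a Monte Carlo estimate. Everything specific below is an assumption made for illustration: the data-generating process ($X \sim \mathcal{N}(0,1)$, $Y = 2X + \varepsilon$), the toy linear model $\hat{Y} = \theta X$, and the values of $\theta$ and $\lambda$. The setup is chosen so that a closed form exists for comparison.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples = 200_000

# Hypothetical data-generating process (assumption for illustration only)
x = rng.standard_normal(n_samples)
y = 2.0 * x + rng.normal(0.0, 0.5, n_samples)

theta, lam = 1.5, 0.1   # hypothetical model parameter and penalty weight
y_hat = theta * x       # toy linear model standing in for f(X; theta)

# LOTUS in practice: average L(y_hat, y) over samples of (Y_hat, Y)
loss = (y_hat - y) ** 2 + lam * np.abs(y_hat)
mc_loss = loss.mean()

# Closed form for this toy setup: (theta - 2)^2 + 0.5^2 + lam * |theta| * E|X|
exact_loss = (theta - 2.0) ** 2 + 0.5**2 + lam * abs(theta) * np.sqrt(2.0 / np.pi)
print(f"Monte Carlo: {mc_loss:.4f}, closed form: {exact_loss:.4f}")
```

At no point is the distribution of the loss itself derived; only samples of $(\hat{Y}, Y)$ are needed.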

<div class="alert example">
<h4>Calculated Example: Feature Engineering Challenge</h4>

You're building a fraud detection model with 3 features:

- $X_1 = \text{transaction amount (\$)}$
- $X_2 = \text{time since last transaction (hours)}$
- $X_3 = \text{distance from usual location (km)}$

Your ML team proposes a "risk score":
$R = 0.01X_1 + 10/X_2 + 0.5X_3$

Assume the joint distribution is known: $f_{X_1X_2X_3}(x_1, x_2, x_3)$.

Tasks:

1. Write the formula for $E[R]$ using LOTUS
2. If you only cared about transaction amount, write the formula for the marginal density $f_{X_1}(x_1)$
3. Explain why LOTUS is helpful here (vs. deriving the distribution of $R$)

</div>

<details>
<summary>Reveal solution</summary>

1. $E[R] = \iiint (0.01x_1 + 10/x_2 + 0.5x_3) \cdot f_{X_1X_2X_3}(x_1, x_2, x_3) \, dx_1 \, dx_2 \, dx_3$

2. $f_{X_1}(x_1) = \int_0^\infty\int_0^\infty f_{X_1X_2X_3}(x_1, x_2, x_3) dx_2 dx_3$

3. LOTUS is helpful because:

- $R$ is a complex nonlinear function (especially the $10/X_2$ term)
- Finding the distribution of $R$ would require transformation techniques
- LOTUS lets us compute $E[R]$ directly without finding the distribution of $R$
- In practice, we can estimate $E[R]$ from data without ever knowing $f_R$

</details>
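
The last point can be illustrated by simulation. The feature distributions below are hypothetical and assumed independent purely to keep the sketch short; they are not part of the exercise. The example also shows that plugging the mean of $X_2$ into $10/X_2$ gives a different (and wrong) answer, because the term is nonlinear.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples = 400_000

# Hypothetical, independent feature distributions (assumptions for illustration)
x1 = rng.exponential(scale=80.0, size=n_samples)      # amount, mean 80
x2 = rng.gamma(shape=5.0, scale=1.0, size=n_samples)  # hours, mean 5
x3 = rng.exponential(scale=5.0, size=n_samples)       # km, mean 5

# LOTUS estimate: average the risk score over samples of (X1, X2, X3)
risk = 0.01 * x1 + 10.0 / x2 + 0.5 * x3
expected_risk_mc = risk.mean()

# For Gamma(shape=k, scale=1) with k > 1, E[1/X] = 1/(k - 1)
expected_risk_exact = 0.01 * 80.0 + 10.0 / (5.0 - 1.0) + 0.5 * 5.0

# Naive plug-in of the means, which ignores the nonlinearity of 10 / X2
naive_plug_in = 0.01 * 80.0 + 10.0 / 5.0 + 0.5 * 5.0

print(expected_risk_mc, expected_risk_exact, naive_plug_in)
```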

<div class="alert alert-success">
<h4>Definition: Variance of the Sum of <strong>n</strong> r.v.</h4>

Let $Z = X_1 + ... + X_n$. Then the **variance** can be calculated as follows:

$$Var(Z) = Cov\left(\sum_{i=1}^{n}X_i, \ \sum_{j=1}^{n}X_j\right) = \sum_{i=1}^{n}\sum_{j=1}^{n}Cov(X_i, X_j) = \sum_{i=1}^{n}Var(X_i) + 2\sum_{i<j}Cov(X_i, X_j)$$

</div>
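
The identity can be checked on simulated data. The covariance matrix below is an arbitrary positive-definite example chosen for this sketch. Because the identity is algebraic, it holds for the sample covariance matrix up to floating-point error, not just in expectation.

```python
import numpy as np

rng = np.random.default_rng(1)

# Arbitrary illustrative covariance matrix for (X1, X2, X3)
cov = np.array([
    [1.0, 0.5, 0.2],
    [0.5, 2.0, -0.3],
    [0.2, -0.3, 1.5],
])
samples = rng.multivariate_normal(np.zeros(3), cov, size=100_000)

sample_cov = np.cov(samples, rowvar=False)       # 3 x 3 sample covariance (ddof=1)
var_of_sum = samples.sum(axis=1).var(ddof=1)     # Var(X1 + X2 + X3) from the data

# Double sum of all covariances
sum_of_cov = sample_cov.sum()
# Variances plus twice the covariances above the diagonal
var_plus_cov = np.trace(sample_cov) + 2.0 * np.triu(sample_cov, k=1).sum()

print(var_of_sum, sum_of_cov, var_plus_cov, cov.sum())
```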

### Special Cases

<div class="alert alert-success"  style='background-color:white'>
<h4>Special Case: Sum of Normal Distributions</h4>

Let $X_1, X_2, ..., X_n$ be $n$ mutually independent random variables, where $X_k \sim \mathcal{N}(\mu_k, \sigma_k^2)$ for each $k \in \{1,...,n\}$.

Then, the random variable defined as the sum $\sum_{i=1}^n X_i$ follows a normal distribution $\mathcal{N}(\mu, \sigma^2)$, where:

* $\mu = \sum\limits_{k=1}^{n}\mu_k$
* $\sigma^2 = \sum\limits_{k=1}^{n}\sigma^2_k$

If $X_k\sim\mathcal{N}(0,1)$ for all $k \in \{1,...,n\}$, then the sum of squares $T = \sum\limits_{k=1}^{n}X_k^2$ follows the Chi-squared distribution with $n$ degrees of freedom, denoted $\chi^2_n$ or $\chi^2(n)$.

</div>
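
The Chi-squared statement can be checked by simulation. The sketch below uses $n = 5$ (an arbitrary choice) and compares the sample mean and variance of $T$ with the standard values for $\chi^2_n$, namely mean $n$ and variance $2n$.

```python
import numpy as np

rng = np.random.default_rng(2)
n_dof = 5               # arbitrary number of independent N(0, 1) variables
n_samples = 200_000

# Each row holds one realization of (X_1, ..., X_n), all independent N(0, 1)
draws = rng.standard_normal((n_samples, n_dof))

# T = sum of squares, one value per row
t = (draws**2).sum(axis=1)

t_mean = t.mean()
t_var = t.var(ddof=1)
print(t_mean, n_dof)        # mean of chi-squared with n degrees of freedom is n
print(t_var, 2 * n_dof)     # its variance is 2n
```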