# Conditional Distribution of Y Given X

The conditional distribution of a random variable $Y$ given another random variable $X$ is a fundamental concept in probability and statistics. It describes the probability distribution of $Y$ when $X$ is known to take a specific value. Let's denote the conditional distribution of $Y$ given $X = x$ as $Y | X = x$.

### Discrete Case
If $X$ and $Y$ are discrete random variables, the conditional probability mass function (pmf) of $Y$ given $X = x$ is defined as:

$ P(Y = y | X = x) = \frac{P(Y = y, X = x)}{P(X = x)} $

where $ P(Y = y, X = x) $ is the joint probability of $Y = y$ and $X = x$, and $ P(X = x) $ is the marginal probability of $X = x$. The definition requires $ P(X = x) > 0 $.
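
As a minimal sketch of this definition, the snippet below conditions a small joint pmf on $X = 0$. The joint table and the helper names `marginal_x` and `conditional_pmf` are illustrative assumptions, not part of the text above.

```python
from fractions import Fraction

# Hypothetical joint pmf P(X = x, Y = y), stored as {(x, y): probability}.
joint = {
    (0, 0): Fraction(1, 8), (0, 1): Fraction(3, 8),
    (1, 0): Fraction(1, 4), (1, 1): Fraction(1, 4),
}

def marginal_x(joint, x):
    """P(X = x): sum the joint pmf over all values of y."""
    return sum(p for (xi, _), p in joint.items() if xi == x)

def conditional_pmf(joint, x):
    """Return {y: P(Y = y | X = x)}; only defined when P(X = x) > 0."""
    px = marginal_x(joint, x)
    if px == 0:
        raise ValueError("P(X = x) is zero, so the conditional pmf is undefined")
    # Divide each joint probability in the row X = x by the marginal P(X = x).
    return {y: p / px for (xi, y), p in joint.items() if xi == x}

pmf_given_x0 = conditional_pmf(joint, 0)
print(pmf_given_x0)
```

Using `Fraction` keeps the arithmetic exact, so the conditional probabilities sum to exactly 1.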

### Continuous Case
If $X$ and $Y$ are continuous random variables, the conditional probability density function (pdf) of $Y$ given $X = x$ is defined as:

$ f_{Y|X}(y|x) = \frac{f_{Y,X}(y,x)}{f_X(x)} $

where $ f_{Y,X}(y,x) $ is the joint pdf of $Y$ and $X$, and $ f_X(x) $ is the marginal pdf of $X$. The definition applies at points where $ f_X(x) > 0 $.

### General Case
In a more general setting, whether $X$ and $Y$ are discrete, continuous, or mixed, the concept of conditional expectation is useful. The conditional expectation of $Y$ given $X = x$, denoted $E[Y | X = x]$, is a key tool in understanding the conditional distribution.

#### Conditional Expectation
The conditional expectation is given by:

$ E[Y | X = x] = \int_{-\infty}^{\infty} y f_{Y|X}(y|x) \, dy $

in the continuous case, or

$ E[Y | X = x] = \sum_{y} y P(Y = y | X = x) $

in the discrete case.

#### Conditional Cumulative Distribution Function (CDF)
The conditional cumulative distribution function (CDF) of $Y$ given $X = x$ is defined as:

$ F_{Y|X}(y|x) = P(Y \leq y | X = x) $
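
A short worked example may help tie the continuous-case density, expectation, and CDF together. The joint density below is an assumed illustration chosen because every integral is elementary; it does not come from the text above.

```latex
\begin{align*}
% Assumed joint density on the unit square
f_{Y,X}(y,x) &= x + y, \qquad 0 \le x \le 1,\ 0 \le y \le 1 \\
% Marginal pdf of X
f_X(x) &= \int_0^1 (x + y)\,dy = x + \tfrac{1}{2} \\
% Conditional pdf of Y given X = x
f_{Y|X}(y|x) &= \frac{x + y}{x + \tfrac{1}{2}}, \qquad 0 \le y \le 1 \\
% Conditional expectation
E[Y | X = x] &= \int_0^1 y\,\frac{x + y}{x + \tfrac{1}{2}}\,dy
  = \frac{\tfrac{x}{2} + \tfrac{1}{3}}{x + \tfrac{1}{2}}
  = \frac{3x + 2}{6x + 3} \\
% Conditional CDF
F_{Y|X}(y|x) &= \int_0^y \frac{x + t}{x + \tfrac{1}{2}}\,dt
  = \frac{xy + \tfrac{y^2}{2}}{x + \tfrac{1}{2}}, \qquad 0 \le y \le 1
\end{align*}
```

As a sanity check, $F_{Y|X}(1|x) = 1$ for every $x$ in $[0, 1]$, as a CDF must satisfy.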

### Properties
1. **Law of Total Probability**:
   $ P(Y = y) = \sum_{x} P(Y = y | X = x) P(X = x) $
   for discrete random variables, or
   $ f_Y(y) = \int_{-\infty}^{\infty} f_{Y|X}(y|x) f_X(x) \, dx $
   for continuous random variables.

2. **Law of Total Expectation (Adam's Law)**:
   $ E[Y] = E[E[Y | X]] $
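
Both properties can be checked on a small discrete example. The sketch below uses an assumed joint pmf (the numbers are illustrative only) and compares each side of the two identities using exact fractions.

```python
from fractions import Fraction

# Hypothetical joint pmf P(X = x, Y = y), stored as {(x, y): probability}.
joint = {
    (0, 1): Fraction(1, 10), (0, 2): Fraction(3, 10),
    (1, 1): Fraction(2, 5),  (1, 2): Fraction(1, 5),
}
xs = sorted({x for x, _ in joint})
ys = sorted({y for _, y in joint})

# Marginal P(X = x) and conditional P(Y = y | X = x).
p_x = {x: sum(joint[(x, y)] for y in ys) for x in xs}
p_y_given_x = {x: {y: joint[(x, y)] / p_x[x] for y in ys} for x in xs}

# Law of total probability: P(Y = y) = sum_x P(Y = y | X = x) P(X = x).
p_y = {y: sum(p_y_given_x[x][y] * p_x[x] for x in xs) for y in ys}

# Conditional expectation E[Y | X = x] for each x.
e_y_given_x = {x: sum(y * p_y_given_x[x][y] for y in ys) for x in xs}

# Law of total expectation: E[Y] = sum_x E[Y | X = x] P(X = x).
e_y_total = sum(e_y_given_x[x] * p_x[x] for x in xs)
e_y_direct = sum(y * p for (_, y), p in joint.items())

print(p_y, e_y_total, e_y_direct)
```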

### Example: Normal Distribution
A common example is when $X$ and $Y$ are jointly normally distributed. If $(X, Y) \sim \mathcal{N}(\mu, \Sigma)$, where

$ \mu = \begin{pmatrix} \mu_X \\ \mu_Y \end{pmatrix}, \quad \Sigma = \begin{pmatrix} \sigma_X^2 & \rho \sigma_X \sigma_Y \\ \rho \sigma_X \sigma_Y & \sigma_Y^2 \end{pmatrix}, $

the conditional distribution of $Y$ given $X = x$ is also normally distributed:

$ Y | X = x \sim \mathcal{N} \left( \mu_Y + \rho \frac{\sigma_Y}{\sigma_X} (x - \mu_X), (1 - \rho^2) \sigma_Y^2 \right) $

where $\rho$ is the correlation coefficient between $X$ and $Y$.
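
The conditional mean and variance above translate directly into code. The following is a minimal sketch; the function name `bivariate_normal_conditional` and the example parameter values are assumptions made for illustration.

```python
def bivariate_normal_conditional(mu_x, mu_y, sigma_x, sigma_y, rho, x):
    """Mean and variance of Y | X = x when (X, Y) is bivariate normal."""
    if sigma_x <= 0 or sigma_y <= 0:
        raise ValueError("standard deviations must be positive")
    if not -1 < rho < 1:
        raise ValueError("rho must lie strictly between -1 and 1")
    # Mean shifts linearly in x; the slope is rho * sigma_y / sigma_x.
    cond_mean = mu_y + rho * (sigma_y / sigma_x) * (x - mu_x)
    # Variance shrinks by the factor (1 - rho^2) and does not depend on x.
    cond_var = (1 - rho**2) * sigma_y**2
    return cond_mean, cond_var

# Illustrative parameter values (assumed, not from the text).
cond_mean, cond_var = bivariate_normal_conditional(
    mu_x=0.0, mu_y=1.0, sigma_x=2.0, sigma_y=3.0, rho=0.5, x=2.0
)
print(cond_mean, cond_var)
```

Note that the conditional variance is the same for every observed $x$; only the mean depends on the observation.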

In summary, the conditional distribution $Y | X = x$ can be determined using the joint distribution of $X$ and $Y$ and the marginal distribution of $X$. This framework applies to both discrete and continuous random variables, providing a comprehensive understanding of how one random variable behaves given information about another.

# The Mean and Covariance of the Conditional Gaussian Distribution
To find the conditional distribution of $ \mathbf{x} $ given $ \mathbf{y} $ when both are part of a joint Gaussian distribution, we start with the assumption that the joint distribution of $ \mathbf{x} $ and $ \mathbf{y} $ is multivariate normal. 

Let $ \mathbf{z} $ be the concatenation of $ \mathbf{x} $ and $ \mathbf{y} $:
$ \mathbf{z} = \begin{bmatrix} \mathbf{x} \\ \mathbf{y} \end{bmatrix} $

Assume $ \mathbf{z} $ follows a multivariate normal distribution:
$ \mathbf{z} \sim \mathcal{N}(\mathbf{\mu_z}, \mathbf{\Sigma_z}) $
where
$ \mathbf{\mu_z} = \begin{bmatrix} \mathbf{\mu_x} \\ \mathbf{\mu_y} \end{bmatrix}, \quad \mathbf{\Sigma_z} = \begin{bmatrix} \mathbf{\Sigma_{xx}} & \mathbf{\Sigma_{xy}} \\ \mathbf{\Sigma_{yx}} & \mathbf{\Sigma_{yy}} \end{bmatrix} $

Given this structure, we aim to find the conditional distribution of $ \mathbf{x} $ given $ \mathbf{y} $. The result is a conditional normal distribution, which we can derive as follows.

### Conditional Mean and Covariance

1. **Conditional Mean:**
   The conditional mean of $ \mathbf{x} $ given $ \mathbf{y} $ is:
   $ \mathbf{\mu_{x|y}} = \mathbf{\mu_x} + \mathbf{\Sigma_{xy}} \mathbf{\Sigma_{yy}}^{-1} (\mathbf{y} - \mathbf{\mu_y}) $

2. **Conditional Covariance:**
   The conditional covariance of $ \mathbf{x} $ given $ \mathbf{y} $ is:
   $ \mathbf{\Sigma_{x|y}} = \mathbf{\Sigma_{xx}} - \mathbf{\Sigma_{xy}} \mathbf{\Sigma_{yy}}^{-1} \mathbf{\Sigma_{yx}} $

### Resulting Conditional Distribution

Thus, the conditional distribution of $ \mathbf{x} $ given $ \mathbf{y} $ is:
$ \mathbf{x} | \mathbf{y} \sim \mathcal{N}(\mathbf{\mu_{x|y}}, \mathbf{\Sigma_{x|y}}) $
where:
$ \mathbf{\mu_{x|y}} = \mathbf{\mu_x} + \mathbf{\Sigma_{xy}} \mathbf{\Sigma_{yy}}^{-1} (\mathbf{y} - \mathbf{\mu_y}) $
$ \mathbf{\Sigma_{x|y}} = \mathbf{\Sigma_{xx}} - \mathbf{\Sigma_{xy}} \mathbf{\Sigma_{yy}}^{-1} \mathbf{\Sigma_{yx}} $
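
A minimal NumPy sketch of these two formulas is shown below. The function name `condition_gaussian`, its argument order, and the example numbers are assumptions for illustration. It solves a linear system with $\mathbf{\Sigma_{yy}}$ rather than forming the inverse explicitly, which is usually the more numerically stable choice.

```python
import numpy as np

def condition_gaussian(mu_x, mu_y, S_xx, S_xy, S_yy, y):
    """Return (mu_x|y, Sigma_x|y) for jointly Gaussian vectors x and y."""
    mu_x, mu_y, y = (np.asarray(v, dtype=float) for v in (mu_x, mu_y, y))
    S_xx, S_xy, S_yy = (np.asarray(m, dtype=float) for m in (S_xx, S_xy, S_yy))
    # K = S_xy S_yy^{-1}, obtained by solving S_yy K^T = S_yx (S_yy is symmetric).
    K = np.linalg.solve(S_yy, S_xy.T).T
    cond_mean = mu_x + K @ (y - mu_y)
    cond_cov = S_xx - K @ S_xy.T
    return cond_mean, cond_cov

# Illustrative example: x is 2-dimensional, y is 1-dimensional.
cond_mean, cond_cov = condition_gaussian(
    mu_x=[1.0, 2.0],
    mu_y=[0.0],
    S_xx=[[2.0, 0.5], [0.5, 1.0]],
    S_xy=[[0.8], [0.3]],
    S_yy=[[1.0]],
    y=[1.0],
)
print(cond_mean)
print(cond_cov)
```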

### Summary
- The mean of the conditional distribution $ \mathbf{x} | \mathbf{y} $ shifts from $ \mathbf{\mu_x} $ to account for the information provided by $ \mathbf{y} $.
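- The covariance of the conditional distribution is never larger than $ \mathbf{\Sigma_{xx}} $: the difference $ \mathbf{\Sigma_{xx}} - \mathbf{\Sigma_{x|y}} = \mathbf{\Sigma_{xy}} \mathbf{\Sigma_{yy}}^{-1} \mathbf{\Sigma_{yx}} $ is positive semidefinite, reflecting the decreased uncertainty about $ \mathbf{x} $ given $ \mathbf{y} $. It also does not depend on the observed value of $ \mathbf{y} $.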

This approach leverages the properties of the multivariate normal distribution, ensuring that the resulting conditional distribution is also Gaussian.

## Proof

Let's derive the equations for the conditional mean and conditional covariance of $\mathbf{x}$ given $\mathbf{y}$ in the context of a joint Gaussian distribution.

### Setup

Consider the joint Gaussian distribution of the random vectors $\mathbf{x}$ and $\mathbf{y}$:
$ \mathbf{z} = \begin{bmatrix} \mathbf{x} \\ \mathbf{y} \end{bmatrix} \sim \mathcal{N}\left( \begin{bmatrix} \mathbf{\mu_x} \\ \mathbf{\mu_y} \end{bmatrix}, \begin{bmatrix} \mathbf{\Sigma_{xx}} & \mathbf{\Sigma_{xy}} \\ \mathbf{\Sigma_{yx}} & \mathbf{\Sigma_{yy}} \end{bmatrix} \right) $

Here:
- $\mathbf{x}$ is an $n$-dimensional random vector.
- $\mathbf{y}$ is a $p$-dimensional random vector.
- $\mathbf{\mu_x}$ and $\mathbf{\mu_y}$ are the means of $\mathbf{x}$ and $\mathbf{y}$, respectively.
- $\mathbf{\Sigma_{xx}}$ is the covariance matrix of $\mathbf{x}$.
- $\mathbf{\Sigma_{yy}}$ is the covariance matrix of $\mathbf{y}$.
- $\mathbf{\Sigma_{xy}}$ is the cross-covariance matrix between $\mathbf{x}$ and $\mathbf{y}$.
- $\mathbf{\Sigma_{yx}} = \mathbf{\Sigma_{xy}}^\top$.

### Derivation of Conditional Mean

The goal is to find the conditional distribution of $\mathbf{x}$ given $\mathbf{y} = \mathbf{y_0}$.

The joint Gaussian distribution can be written as:
$ f(\mathbf{z}) = f\left( \begin{bmatrix} \mathbf{x} \\ \mathbf{y} \end{bmatrix} \right) = \frac{1}{(2\pi)^{\frac{n+p}{2}} |\mathbf{\Sigma_z}|^{\frac{1}{2}}} \exp \left( -\frac{1}{2} (\mathbf{z} - \mathbf{\mu_z})^\top \mathbf{\Sigma_z}^{-1} (\mathbf{z} - \mathbf{\mu_z}) \right) $

Using properties of the multivariate normal distribution, the conditional distribution of $\mathbf{x}$ given $\mathbf{y}$ is also normal:
$ \mathbf{x} | \mathbf{y} \sim \mathcal{N}(\mathbf{\mu_{x|y}}, \mathbf{\Sigma_{x|y}}) $

#### Conditional Mean

The conditional mean $\mathbf{\mu_{x|y}}$ is given by:
$ \mathbf{\mu_{x|y}} = \mathbf{\mu_x} + \mathbf{\Sigma_{xy}} \mathbf{\Sigma_{yy}}^{-1} (\mathbf{y} - \mathbf{\mu_y}) $

#### Derivation:

1. Consider the partitioned form of the joint covariance matrix:
   $ \mathbf{\Sigma_z} = \begin{bmatrix} \mathbf{\Sigma_{xx}} & \mathbf{\Sigma_{xy}} \\ \mathbf{\Sigma_{yx}} & \mathbf{\Sigma_{yy}} \end{bmatrix} $

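2. The inverse of the partitioned covariance matrix $\mathbf{\Sigma_z}^{-1}$ can be expressed using block matrix inversion formulas:
   $
   \mathbf{\Sigma_z}^{-1} = \begin{bmatrix}
   \mathbf{A} & \mathbf{B} \\
   \mathbf{C} & \mathbf{D}
   \end{bmatrix}
   $
   where:
   $
   \mathbf{A} = \mathbf{\Sigma_{xx}}^{-1} + \mathbf{\Sigma_{xx}}^{-1} \mathbf{\Sigma_{xy}} (\mathbf{\Sigma_{yy}} - \mathbf{\Sigma_{yx}} \mathbf{\Sigma_{xx}}^{-1} \mathbf{\Sigma_{xy}})^{-1} \mathbf{\Sigma_{yx}} \mathbf{\Sigma_{xx}}^{-1}
   $
   $
   \mathbf{B} = -\mathbf{\Sigma_{xx}}^{-1} \mathbf{\Sigma_{xy}} (\mathbf{\Sigma_{yy}} - \mathbf{\Sigma_{yx}} \mathbf{\Sigma_{xx}}^{-1} \mathbf{\Sigma_{xy}})^{-1}
   $
   $
   \mathbf{C} = -(\mathbf{\Sigma_{yy}} - \mathbf{\Sigma_{yx}} \mathbf{\Sigma_{xx}}^{-1} \mathbf{\Sigma_{xy}})^{-1} \mathbf{\Sigma_{yx}} \mathbf{\Sigma_{xx}}^{-1}
   $
   $
   \mathbf{D} = (\mathbf{\Sigma_{yy}} - \mathbf{\Sigma_{yx}} \mathbf{\Sigma_{xx}}^{-1} \mathbf{\Sigma_{xy}})^{-1}
   $
   By the Woodbury identity, the first two blocks can equivalently be written as $\mathbf{A} = (\mathbf{\Sigma_{xx}} - \mathbf{\Sigma_{xy}} \mathbf{\Sigma_{yy}}^{-1} \mathbf{\Sigma_{yx}})^{-1}$ and $\mathbf{B} = -\mathbf{A} \mathbf{\Sigma_{xy}} \mathbf{\Sigma_{yy}}^{-1}$, and $\mathbf{C} = \mathbf{B}^\top$ because $\mathbf{\Sigma_z}$ is symmetric.

3. The conditional mean follows by completing the square in the exponent of the joint Gaussian density. With $\mathbf{y}$ held fixed, the terms of the exponent that involve $\mathbf{x}$ are
   $ -\frac{1}{2} (\mathbf{x} - \mathbf{\mu_x})^\top \mathbf{A} (\mathbf{x} - \mathbf{\mu_x}) - (\mathbf{x} - \mathbf{\mu_x})^\top \mathbf{B} (\mathbf{y} - \mathbf{\mu_y}) $
   which is the exponent of a Gaussian in $\mathbf{x}$ with precision matrix $\mathbf{A}$ and mean $\mathbf{\mu_x} - \mathbf{A}^{-1} \mathbf{B} (\mathbf{y} - \mathbf{\mu_y})$. Substituting $\mathbf{B} = -\mathbf{A} \mathbf{\Sigma_{xy}} \mathbf{\Sigma_{yy}}^{-1}$ gives $\mathbf{\mu_{x|y}} = \mathbf{\mu_x} + \mathbf{\Sigma_{xy}} \mathbf{\Sigma_{yy}}^{-1} (\mathbf{y} - \mathbf{\mu_y})$: the mean shifts by the deviation of $\mathbf{y}$ from its mean, mapped through $\mathbf{\Sigma_{xy}} \mathbf{\Sigma_{yy}}^{-1}$.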
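
The completing-the-square argument rests on two facts about the blocks $\mathbf{A}$ and $\mathbf{B}$ of $\mathbf{\Sigma_z}^{-1}$: $\mathbf{A}^{-1}$ equals $\mathbf{\Sigma_{xx}} - \mathbf{\Sigma_{xy}} \mathbf{\Sigma_{yy}}^{-1} \mathbf{\Sigma_{yx}}$, and $-\mathbf{A}^{-1}\mathbf{B}$ equals $\mathbf{\Sigma_{xy}} \mathbf{\Sigma_{yy}}^{-1}$. The sketch below checks both numerically on a randomly generated covariance matrix; the dimensions, seed, and variable names are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 3, 2  # dimensions of x and y (arbitrary choice)

# Build a random symmetric positive definite joint covariance matrix.
M = rng.standard_normal((n + p, n + p))
Sigma = M @ M.T + (n + p) * np.eye(n + p)
S_xx, S_xy = Sigma[:n, :n], Sigma[:n, n:]
S_yx, S_yy = Sigma[n:, :n], Sigma[n:, n:]

# Blocks A (top-left) and B (top-right) of the joint precision matrix.
Lambda = np.linalg.inv(Sigma)
A, B = Lambda[:n, :n], Lambda[:n, n:]

# Conditional covariance and the matrix multiplying (y - mu_y) in the mean.
cond_cov = S_xx - S_xy @ np.linalg.solve(S_yy, S_yx)
gain = S_xy @ np.linalg.inv(S_yy)

# Check 1: inverse of A is the conditional covariance.
top_left_is_cond_precision = np.allclose(np.linalg.inv(A), cond_cov)
# Check 2: -A^{-1} B reproduces S_xy S_yy^{-1}.
gain_from_blocks = np.allclose(-np.linalg.solve(A, B), gain)

print(top_left_is_cond_precision, gain_from_blocks)
```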

### Derivation of Conditional Covariance

The conditional covariance $\mathbf{\Sigma_{x|y}}$ is given by:
$ \mathbf{\Sigma_{x|y}} = \mathbf{\Sigma_{xx}} - \mathbf{\Sigma_{xy}} \mathbf{\Sigma_{yy}}^{-1} \mathbf{\Sigma_{yx}} $

#### Derivation:

1. The conditional covariance matrix is exactly the Schur complement of $\mathbf{\Sigma_{yy}}$ in $\mathbf{\Sigma_z}$, which is well defined whenever $\mathbf{\Sigma_{yy}}$ is invertible.

2. Intuitively, the conditional covariance matrix represents the reduction in uncertainty about $\mathbf{x}$ after observing $\mathbf{y}$. It accounts for the correlation between $\mathbf{x}$ and $\mathbf{y}$ and adjusts the variance accordingly.
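
An alternative route to the same expression, different from the block-inverse argument, is to subtract from $\mathbf{x}$ the part that is linearly predictable from $\mathbf{y}$. The sketch below uses the auxiliary symbols $\mathbf{K}$ and $\mathbf{w}$, which are introduced here only for this derivation.

```latex
\begin{align*}
% Define the gain and the residual
\mathbf{K} &= \mathbf{\Sigma_{xy}} \mathbf{\Sigma_{yy}}^{-1}, \qquad
\mathbf{w} = \mathbf{x} - \mathbf{K}\mathbf{y} \\
% The residual is uncorrelated with y
\operatorname{Cov}(\mathbf{w}, \mathbf{y}) &= \mathbf{\Sigma_{xy}} - \mathbf{K}\mathbf{\Sigma_{yy}} = \mathbf{0} \\
% Jointly Gaussian and uncorrelated implies independent, so conditioning on y leaves w unchanged
\mathbf{x} \mid \mathbf{y} = \mathbf{y_0} &\;\sim\; \mathbf{w} + \mathbf{K}\mathbf{y_0} \\
% Conditional mean
E[\mathbf{x} \mid \mathbf{y} = \mathbf{y_0}] &= \mathbf{\mu_x} - \mathbf{K}\mathbf{\mu_y} + \mathbf{K}\mathbf{y_0}
  = \mathbf{\mu_x} + \mathbf{\Sigma_{xy}} \mathbf{\Sigma_{yy}}^{-1} (\mathbf{y_0} - \mathbf{\mu_y}) \\
% Conditional covariance
\operatorname{Cov}(\mathbf{w}) &= \mathbf{\Sigma_{xx}} - \mathbf{K}\mathbf{\Sigma_{yx}} - \mathbf{\Sigma_{xy}}\mathbf{K}^\top + \mathbf{K}\mathbf{\Sigma_{yy}}\mathbf{K}^\top
  = \mathbf{\Sigma_{xx}} - \mathbf{\Sigma_{xy}} \mathbf{\Sigma_{yy}}^{-1} \mathbf{\Sigma_{yx}}
\end{align*}
```

The independence step relies on $\mathbf{w}$ and $\mathbf{y}$ being jointly Gaussian, which holds because both are linear functions of $\mathbf{z}$.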



The same result is often stated in a more compact block notation. Here $ A $, $ B $, and $ C $ denote blocks of the covariance matrix; they are not the blocks of $\mathbf{\Sigma_z}^{-1}$ used in the proof above. Given that:

$
\begin{bmatrix} \mathbf{x} \\ \mathbf{y} \end{bmatrix} \sim \mathcal{N}\left(\begin{bmatrix} \mu_{\mathbf{x}} \\ \mu_{\mathbf{y}} \end{bmatrix}, \begin{bmatrix} A & C \\ C^T & B \end{bmatrix}\right)
$

The conditional distribution $ \mathbf{x} | \mathbf{y} $ is also normally distributed where the mean and the covariance are calculated as follows:

### Mean of $ \mathbf{x} | \mathbf{y} $:

$
\text{Mean} = \mu_{\mathbf{x}} + C B^{-1} (\mathbf{y} - \mu_{\mathbf{y}})
$

This equation represents the expected value of $ \mathbf{x} $ given $ \mathbf{y} $, where:
- $ \mu_{\mathbf{x}} $ and $ \mu_{\mathbf{y}} $ are the mean vectors of $ \mathbf{x} $ and $ \mathbf{y} $ respectively.
- $ C $ is the cross-covariance matrix between $ \mathbf{x} $ and $ \mathbf{y} $.
- $ B $ is the covariance matrix of $ \mathbf{y} $, and $ B^{-1} $ is its inverse.
- $ \mathbf{y} $ is the observed value of the random vector $ \mathbf{y} $.

### Covariance of $ \mathbf{x} | \mathbf{y} $:

$
\text{Covariance} = A - C B^{-1} C^T
$

This formula represents the covariance of $ \mathbf{x} $ conditional on $ \mathbf{y} $ and indicates how $ \mathbf{x} $ varies around its new mean given $ \mathbf{y} $:
- $ A $ is the covariance matrix of $ \mathbf{x} $.
- $ C $, $ B $, and $ C^T $ are as defined above.

### Conditional Distribution Expression

Thus, the conditional distribution of $ \mathbf{x} $ given $ \mathbf{y} $ is:

$
\mathbf{x} | \mathbf{y} \sim \mathcal{N}(\mu_{\mathbf{x}} + C B^{-1} (\mathbf{y} - \mu_{\mathbf{y}}), A - C B^{-1} C^T)
$

### Interpretation

This result highlights a fundamental property of Gaussian vectors: the conditional distribution of a subset of the vector given the other subset is also Gaussian. The conditional mean $ \mu_{\mathbf{x}} + C B^{-1} (\mathbf{y} - \mu_{\mathbf{y}}) $ adjusts the mean $ \mu_{\mathbf{x}} $ based on the deviation of $ \mathbf{y} $ from its mean $ \mu_{\mathbf{y}} $, weighted by the covariance between $ \mathbf{x} $ and $ \mathbf{y} $ relative to the variance of $ \mathbf{y} $. The conditional covariance $ A - C B^{-1} C^T $ reduces the uncertainty in $ \mathbf{x} $ due to the knowledge of $ \mathbf{y} $, reflecting less variability in $ \mathbf{x} $ once $ \mathbf{y} $ is known.

## Example 1: Robot Localization


Let's create a physical example in the context of robotics that illustrates the conditional distribution of one variable given another when they follow a joint Gaussian distribution.

### Scenario: Robot Localization

Imagine a robot navigating a two-dimensional space, equipped with a GPS sensor and a compass. The robot's state can be described by two variables:
1. $ x $: The robot's position along the x-axis.
2. $ y $: The robot's heading angle (orientation) measured by the compass.

The robot's state vector is:
$ \mathbf{z} = \begin{bmatrix} x \\ y \end{bmatrix} $

Due to sensor noise and environmental factors, both $ x $ and $ y $ are random variables and are jointly Gaussian distributed.

### Joint Gaussian Distribution

Assume the robot's state follows this joint Gaussian distribution:
$ \mathbf{z} \sim \mathcal{N} \left( \begin{bmatrix} 5 \\ 0 \end{bmatrix}, \begin{bmatrix} 2 & 0.5 \\ 0.5 & 1 \end{bmatrix} \right) $

- Mean position ($ x $): 5 meters along the x-axis.
- Mean heading ($ y $): 0 radians (pointing straight forward).
- Variance in position: 2 $(\text{meters}^2)$
- Variance in heading: 1 $(\text{radians}^2)$
- Covariance between position and heading: 0.5 $(\text{meters} \cdot \text{radians})$

### Problem

Given a specific heading measurement, $ y = y_0 $, we want to find the conditional distribution of the robot's position $ x $.

### Conditional Distribution Calculation

#### 1. Extract Parameters

From the joint distribution:
$ \mathbf{\mu_z} = \begin{bmatrix} 5 \\ 0 \end{bmatrix} $
$ \mathbf{\Sigma_z} = \begin{bmatrix} 2 & 0.5 \\ 0.5 & 1 \end{bmatrix} $

#### 2. Conditional Mean

The conditional mean of $ x $ given $ y = y_0 $:
$ \mu_{x|y} = \mu_x + \Sigma_{xy} \Sigma_{yy}^{-1} (y_0 - \mu_y) $

Plugging in the values:
- $\mu_x = 5$
- $\mu_y = 0$
- $\Sigma_{xy} = 0.5$
- $\Sigma_{yy} = 1$

$ \mu_{x|y} = 5 + 0.5 \cdot 1^{-1} (y_0 - 0) = 5 + 0.5 \cdot y_0 $

#### 3. Conditional Covariance

The conditional covariance of $ x $ given $ y $:
$ \Sigma_{x|y} = \Sigma_{xx} - \Sigma_{xy} \Sigma_{yy}^{-1} \Sigma_{yx} $

Plugging in the values:
- $\Sigma_{xx} = 2$
- $\Sigma_{xy} = \Sigma_{yx} = 0.5$
- $\Sigma_{yy} = 1$

$ \Sigma_{x|y} = 2 - 0.5 \cdot 1^{-1} \cdot 0.5 = 2 - 0.25 = 1.75 $

### Conditional Distribution

Given $ y = y_0 $, the conditional distribution of $ x $ is:
$ x | y = y_0 \sim \mathcal{N} \left( 5 + 0.5 y_0, 1.75 \right) $
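
The closed form above can be wrapped in a small helper to see how the belief about position changes with the measured heading. This is a sketch only; the function name `robot_position_given_heading` and the sample headings are assumptions, while the default parameters are the ones from the joint distribution in this example.

```python
def robot_position_given_heading(y0, mu_x=5.0, mu_y=0.0,
                                 var_x=2.0, var_y=1.0, cov_xy=0.5):
    """Mean and variance of position x given a heading measurement y0."""
    gain = cov_xy / var_y              # scalar version of Sigma_xy Sigma_yy^{-1}
    cond_mean = mu_x + gain * (y0 - mu_y)
    cond_var = var_x - gain * cov_xy   # scalar Schur complement
    return cond_mean, cond_var

# Evaluate the belief for a few hypothetical heading measurements (radians).
beliefs = {y0: robot_position_given_heading(y0) for y0 in (-1.0, 0.0, 2.0)}
for y0, (m, v) in beliefs.items():
    print(f"y0 = {y0:+.1f} rad -> mean = {m:.2f} m, variance = {v:.2f} m^2")
```

The variance is identical for every heading; only the mean moves with the measurement.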

### Physical Interpretation

1. **Prior Distribution:**
   - Before any heading measurement, the robot's position $ x $ is normally distributed with mean 5 meters and variance 2 $(\text{meters}^2)$.
   - The heading $ y $ is normally distributed with mean 0 radians and variance 1 $(\text{radians}^2)$.

2. **Conditional Distribution:**
   - Once the robot measures its heading $ y = y_0 $, it updates its belief about its position $ x $.
   - The new mean position $ x $ is adjusted based on the measured heading $ y_0 $, specifically by the amount $ 0.5 y_0 $.
   - The uncertainty (variance) about the position $ x $ is reduced to 1.75 $(\text{meters}^2)$.

### Example Calculation

Suppose the robot measures its heading to be $ y_0 = 2 $ radians.

1. **Conditional Mean:**
   $ \mu_{x|y} = 5 + 0.5 \cdot 2 = 5 + 1 = 6 $

2. **Conditional Covariance:**
   $ \Sigma_{x|y} = 1.75 $

Given this heading measurement, the robot's updated belief about its position is:
$ x | y = 2 \sim \mathcal{N} \left( 6, 1.75 \right) $

This means the robot's belief is now centered at 6 meters along the x-axis, with variance 1.75 $(\text{meters}^2)$ instead of 2 $(\text{meters}^2)$, that is, a standard deviation of about 1.32 meters instead of about 1.41 meters.
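
One way to sanity-check this result is by simulation: draw many samples from the joint distribution, keep those whose heading falls in a narrow window around 2 radians, and compare the empirical mean and variance of the retained positions with 6 and 1.75. The sample size, window half-width, and seed below are arbitrary assumptions, and the agreement is only approximate.

```python
import numpy as np

rng = np.random.default_rng(0)

# Joint distribution of (position x, heading y) from the example.
mean = np.array([5.0, 0.0])
cov = np.array([[2.0, 0.5],
                [0.5, 1.0]])
samples = rng.multivariate_normal(mean, cov, size=1_000_000)

# Approximate conditioning on y = 2 by keeping samples with y close to 2.
y0, half_width = 2.0, 0.05
near_y0 = np.abs(samples[:, 1] - y0) < half_width
x_given_y0 = samples[near_y0, 0]

empirical_mean = x_given_y0.mean()
empirical_var = x_given_y0.var(ddof=1)

print(f"kept {x_given_y0.size} samples")
print(f"empirical mean of x: {empirical_mean:.3f} (theory: 6)")
print(f"empirical variance of x: {empirical_var:.3f} (theory: 1.75)")
```

A narrower window reduces the approximation error of the conditioning but keeps fewer samples, so the estimates become noisier.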

### Summary

- **Joint Distribution:** $\mathbf{z} = \begin{bmatrix} x \\ y \end{bmatrix} \sim \mathcal{N} \left( \begin{bmatrix} 5 \\ 0 \end{bmatrix}, \begin{bmatrix} 2 & 0.5 \\ 0.5 & 1 \end{bmatrix} \right)$
- **Conditional Distribution (given $ y = 2 $):** $ x | y = 2 \sim \mathcal{N} \left( 6, 1.75 \right) $

This example shows how a robot can use joint Gaussian properties to update its position estimate based on a heading measurement, reducing uncertainty in its localization process.

## Example 2: Weight and Height



Refs: [1](https://online.stat.psu.edu/stat414/lesson/21)