In [38]:
import numpy as np
import scipy.stats as stats
import matplotlib.pyplot as plt
import scipy.integrate as integrate

## 1. Understanding conditioning

A cohort of $N=100$ students sat an exam. We know that the marks are distributed according to the Beta distribution (scaled to 100 marks) with parameters $\alpha=4$ and $\beta=2.5$. The expected value of the mark, $X$, is
$${\rm I\!E}[X] = \frac{\alpha}{\alpha+\beta}\cdot 100 \approx 61.538.$$

In [31]:
np.random.seed(1)
num_students = 100
alpha = 4
beta = 2.5
marks = 100 * np.random.beta(alpha, beta, size=num_students)

# Unconditional mean
mean_uncond = marks.mean()
print(f"The unconditional (sample) mean is {mean_uncond:.2f}")

# Increase the number of students and see what happens

The unconditional (sample) mean is 62.51
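
The comment in the cell above suggests increasing the number of students. A minimal sketch of that experiment follows; the sample sizes and the use of `np.random.default_rng` (instead of `np.random.seed`) are assumptions made for illustration, so the printed values will not reproduce the 62.51 above. The sample mean should settle close to the exact value of about 61.538 as the cohort grows.

```python
import numpy as np

alpha, beta = 4, 2.5
true_mean = 100 * alpha / (alpha + beta)  # exact expected mark, about 61.538

# Assumption: a fresh Generator with an arbitrary seed, not the notebook's global seed
rng = np.random.default_rng(1)

# Hypothetical cohort sizes, chosen only to show the trend
sample_sizes = [100, 1_000, 10_000, 1_000_000]
sample_means = {}
for n in sample_sizes:
    marks_n = 100 * rng.beta(alpha, beta, size=n)
    sample_means[n] = marks_n.mean()
    print(f"N = {n:>9,d}: sample mean = {sample_means[n]:.3f} (exact: {true_mean:.3f})")
```
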


Suppose a student has found out that they have passed (their mark is at least 50), but they don't know their exact mark. What is their expected mark?

In [32]:
# Conditional (sample) mean GIVEN that the mark is at least 50
mean_cond = marks[marks >= 50].mean()
print(f"The conditional sample mean is {mean_cond:.2f}")

The conditional sample mean is 69.74


What is the theoretical value of the conditional expected mark?

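Write $X = 100Y$, where $Y$ follows the (unscaled) Beta distribution with parameters $\alpha$ and $\beta$, so that $X \geq 50$ exactly when $Y \geq 0.5$. The conditional expectation is
$${\rm I\!E}[X \mid X \geq 50] = 100\,{\rm I\!E}[Y \mid Y \geq 0.5] = 100\,\frac{{\rm I\!E}[Y 1_{\geq 0.5}(Y)]}{{\rm P}[Y \geq 0.5]} = 100\,\frac{\int_{0.5}^{1} y\, p_Y(y)\, {\rm d}y}{1 - F_Y(0.5)},$$
where $p_Y$ and $F_Y$ are the pdf and the cdf of $Y$.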

The cdf of the Beta distribution at $0.5$, that is, the probability that the mark is at most 50, can be computed with `stats.beta.cdf` as follows:

In [65]:
prob_x_leq_50 = stats.beta.cdf(0.5, alpha, beta)

We will determine the integral numerically in Python using `scipy.integrate.quad`.

In [68]:
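# Integrand of the numerator, on the unscaled [0, 1] range
my_integrand = lambda x: x * stats.beta.pdf(x, alpha, beta)
# quad returns (value, error estimate); keep only the value
integral = integrate.quad(my_integrand, 0.5, 1)[0]
# Divide by the probability of passing and rescale to marks out of 100
cond_expectation_theoretical = 100 * integral / (1 - prob_x_leq_50)
print(f"E[X | X ≥ 50] = {cond_expectation_theoretical:.2f} (theoretical)")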

E[X | X ≥ 50] = 69.87 (theoretical)
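
As a cross-check that avoids numerical integration, note that $y\,p_Y(y)$ is $\frac{\alpha}{\alpha+\beta}$ times the pdf of a Beta distribution with parameters $\alpha+1$ and $\beta$, so the numerator is a scaled tail probability. The sketch below uses `stats.beta.sf` (the survival function, $1 - F$); the variable names are chosen here for illustration and are not part of the original notebook. It should agree with the value obtained with `quad`.

```python
import scipy.stats as stats

alpha, beta = 4, 2.5
threshold = 0.5  # pass mark on the unscaled [0, 1] range

# P[Y >= 0.5] for Y ~ Beta(alpha, beta)
tail_prob = stats.beta.sf(threshold, alpha, beta)
# Same tail probability for Beta(alpha + 1, beta), which arises from y * p_Y(y)
tail_prob_shifted = stats.beta.sf(threshold, alpha + 1, beta)

# E[X | X >= 50] = 100 * alpha / (alpha + beta) * tail_prob_shifted / tail_prob
cond_expectation_closed_form = (
    100 * alpha / (alpha + beta) * tail_prob_shifted / tail_prob
)
print(f"E[X | X ≥ 50] = {cond_expectation_closed_form:.2f} (closed form)")
```
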


## 2. Conditioning of multivariate normal distributions

Let $X \sim \mathcal{N}(\mu, \Sigma)$ be an $n$-dimensional random vector. Let us partition $X$ into two random vectors $X_1$ and $X_2$ as
$$X = \begin{bmatrix}X_1\\X_2\end{bmatrix},$$
where $X_1 \in {\rm I\!R}^{n_1}$, $X_2 \in {\rm I\!R}^{n_2}$, and $n = n_1 + n_2$.
Partition the mean and the covariance matrix accordingly,
$$\mu=\begin{bmatrix}\mu_1\\\mu_2\end{bmatrix} \quad\text{and}\quad \Sigma = \begin{bmatrix}\Sigma_{11} & \Sigma_{12}\\\Sigma_{21} & \Sigma_{22}\end{bmatrix},$$
and assume that $\Sigma_{22}\in\mathbb{S}_{++}^{n_2}$, that is, $\Sigma_{22}$ is symmetric positive definite. Then, the conditional distribution of $X_1$ given that $X_2 = x_2$ is normal with mean
$${\rm I\!E}[X_1 {}\mid{} X_2 = x_2] {}={} \mu_1 + \Sigma_{12}\Sigma_{22}^{-1}(x_2 - \mu_2),$$
and covariance matrix
$${\rm Var}[X_1{}\mid{} X_2 = x_2] {}={} \Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1} \Sigma_{21}.$$
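
The two formulas can be sketched in NumPy as follows. The helper `condition_gaussian` and the bivariate example with correlation `rho = 0.8` are hypothetical, introduced here only for illustration. The sketch uses `np.linalg.solve` rather than forming $\Sigma_{22}^{-1}$ explicitly, which is a design choice and not something the formulas require.

```python
import numpy as np

def condition_gaussian(mu, Sigma, n1, x2):
    """Mean and covariance of X1 given X2 = x2, for X ~ N(mu, Sigma).

    Hypothetical helper: the first n1 entries of X form X1, the rest form X2.
    Assumes the block Sigma_22 is symmetric positive definite.
    """
    mu = np.asarray(mu, dtype=float)
    Sigma = np.asarray(Sigma, dtype=float)
    x2 = np.atleast_1d(np.asarray(x2, dtype=float))

    # Partition the mean and the covariance matrix
    mu1, mu2 = mu[:n1], mu[n1:]
    S11, S12 = Sigma[:n1, :n1], Sigma[:n1, n1:]
    S21, S22 = Sigma[n1:, :n1], Sigma[n1:, n1:]

    # mu_1 + Sigma_12 Sigma_22^{-1} (x_2 - mu_2)
    cond_mean = mu1 + S12 @ np.linalg.solve(S22, x2 - mu2)
    # Sigma_11 - Sigma_12 Sigma_22^{-1} Sigma_21
    cond_cov = S11 - S12 @ np.linalg.solve(S22, S21)
    return cond_mean, cond_cov

# Illustrative bivariate case: zero means, unit variances, correlation rho
rho = 0.8
m, C = condition_gaussian([0.0, 0.0], [[1.0, rho], [rho, 1.0]], n1=1, x2=[1.5])
print("conditional mean:", m)
print("conditional covariance:", C)
```

In this bivariate case the formulas reduce to a conditional mean of $\rho x_2$ and a conditional variance of $1 - \rho^2$, which gives a quick sanity check on the output.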