# FTML Project - Exercise 3

We recall the following notations and settings :

- $\epsilon$, vector of centered Gaussian noise with covariance matrix $\sigma^2I_n$.
- $\mathcal{X} = \mathbb{R}^d$, input space
- $\mathcal{Y} = \mathbb{R}$, output space

The dataset is stored in the **design matrix** $X \in \mathbb{R}^{n \times d}$.

$$ 
X =
\begin{bmatrix}
x_1^T \\ \ldots \\ x_i^T \\ \ldots \\ x_n^T
\end{bmatrix} = 
\begin{bmatrix}
x_{11} & \ldots & x_{1j} & \ldots & x_{1d} \\
\vdots & \ddots & \vdots & \ddots & \vdots \\
x_{i1} & \ldots & x_{ij} & \ldots & x_{id} \\
\vdots & \ddots & \vdots & \ddots & \vdots \\
x_{n1} & \ldots & x_{nj} & \ldots & x_{nd} \\
\end{bmatrix}
$$

We write $\Vert\cdot\Vert = \Vert\cdot\Vert_2$ for the Euclidean norm.

## Question 1

We aim to show that :

$$
E\left[R_n(\hat{\theta})\right] = E_{\epsilon}\left[\frac{1}{n}\Vert(I_n - X(X^TX)^{-1}X^T)\epsilon\Vert^2\right]
$$

where $E_\epsilon$ means that the expected value is taken over $\epsilon$.

$$
\begin{aligned}
E\left[R_n(\hat{\theta})\right] &= E\left[\frac{1}{n}\sum_{i=1}^n(y_i - \hat{\theta}^T x_i)^2\right] \\
&= E\left[\frac{1}{n}\Vert y - X\hat{\theta}\Vert^2\right]
\end{aligned}
$$

We recall that, given $X$ and $y$, the ordinary least squares estimator, which minimizes the empirical risk, is defined as :

$$
\hat{\theta} = (X^TX)^{-1}X^Ty
$$

Thus,

$$
\begin{aligned}
E\left[R_n(\hat{\theta})\right] &= E\left[\frac{1}{n}\Vert y - X(X^TX)^{-1}X^Ty\Vert^2\right]
\end{aligned}
$$

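In the **linear model**, we assume that there exists $\theta^* \in \mathbb{R}^d$ such that

$$
y = X\theta^* + \epsilon
$$

Therefore,

$$
\begin{aligned}
E\left[R_n(\hat{\theta})\right] &= E\left[\frac{1}{n}\Vert X\theta^* + \epsilon - X(X^TX)^{-1}X^T(X\theta^* + \epsilon)\Vert^2\right] \\
&= E\left[\frac{1}{n}\Vert X \theta^* + \epsilon - X(X^T X)^{-1} X^T X \theta^* - X(X^T X)^{-1} X^T \epsilon \Vert^2\right] \\
&= E\left[\frac{1}{n}\Vert X \theta^* + \epsilon - X \theta^* - X(X^T X)^{-1} X^T \epsilon \Vert^2\right] \\
&= E_{\epsilon}\left[\frac{1}{n}\Vert (I_n - X(X^T X)^{-1} X^T) \epsilon \Vert^2\right]
\end{aligned}
$$

since $(X^TX)^{-1}X^TX = I_d$. The design matrix $X$ is fixed, so the only remaining randomness is that of $\epsilon$.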
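The identity behind Question 1 can be checked numerically: when $y$ is generated by the linear model, the OLS residual $y - X\hat{\theta}$ should coincide with $(I_n - X(X^TX)^{-1}X^T)\epsilon$. The following is only a minimal sketch; the sizes, the seed, and the name `theta_star` (standing for the true parameter of the linear model) are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary fixed seed, for reproducibility
n, d, sigma = 50, 5, 2.0        # illustrative sizes, not taken from the exercise

X = rng.uniform(size=(n, d))              # design matrix, full column rank almost surely
theta_star = rng.uniform(size=d)          # assumed true parameter of the linear model
epsilon = rng.normal(0.0, sigma, size=n)  # centered Gaussian noise
y = X @ theta_star + epsilon

# OLS estimator, obtained by solving the normal equations X^T X theta = X^T y
theta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# H = X (X^T X)^{-1} X^T, the orthogonal projector onto the column space of X
H = X @ np.linalg.solve(X.T @ X, X.T)

residual = y - X @ theta_hat
projected_noise = (np.eye(n) - H) @ epsilon

# The two vectors agree up to floating-point error
residuals_match = np.allclose(residual, projected_noise)
print(residuals_match)
```

Note that `theta_star` does not appear in `projected_noise`: the residual depends on the noise only, which is what the derivation states.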

## Question 3

Let us show that :

$$
\begin{aligned}
E_{\epsilon} \left[ \frac{1}{n} \Vert A \epsilon \Vert^{2} \right] &= \frac{\sigma^{2}}{n} tr(A^{T}A)
\end{aligned}
$$

$$
\begin{aligned}
E_{\epsilon} \left[ \frac{1}{n} \Vert A \epsilon \Vert^{2} \right] &= \frac{1}{n} E_{\epsilon} \left[ \Vert A \epsilon \Vert^{2} \right]
\end{aligned}
$$

Now,

$$
\Vert A \epsilon \Vert^2 = (A \epsilon)^T (A \epsilon) = \epsilon^T A^T A \epsilon
$$

Thus,

$$
E_\epsilon \left[ \frac{1}{n} \Vert A \epsilon \Vert^2 \right] = \frac{1}{n} E_\epsilon \left[ \epsilon^T A^T A \epsilon \right]
$$

But, for a centered random vector $\epsilon$ with covariance matrix $\Sigma$ and any fixed matrix $B \in \mathbb{R}^{n \times n}$ :

$$
E \left[ \epsilon^T B \epsilon \right] = tr(B \Sigma)
$$

Here, $B = A^T A$ and $\Sigma = \sigma^2 I$ so :

$$
E_\epsilon \left[ \epsilon^T A^T A \epsilon \right] = tr(A^T A \cdot \sigma^2 I) = \sigma^2 tr(A^T A)
$$

So :

$$
\begin{aligned}
E_{\epsilon} \left[ \frac{1}{n} \Vert A \epsilon \Vert^{2} \right] &= \frac{1}{n} \cdot \sigma^2 \cdot tr(A^T A) \\
&= \frac{\sigma^2}{n} tr(A^T A)
\end{aligned}
$$
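The result of Question 3 can be illustrated with a small Monte Carlo experiment. This is a sketch under assumed values: the matrix `A`, the sizes, the seed, and the number of samples are arbitrary choices, and the empirical mean only approximates the expectation.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary fixed seed
n, sigma, n_samples = 6, 1.5, 200_000

A = rng.normal(size=(n, n))  # arbitrary fixed matrix, drawn once
epsilon = rng.normal(0.0, sigma, size=(n_samples, n))  # each row is one draw of epsilon

# Row i of (epsilon @ A.T) is A @ epsilon_i, so this gives ||A epsilon_i||^2 for every draw
squared_norms = np.sum((epsilon @ A.T) ** 2, axis=1)

empirical = squared_norms.mean() / n            # Monte Carlo estimate of E[(1/n) ||A eps||^2]
theoretical = sigma**2 / n * np.trace(A.T @ A)  # closed form derived in this question
relative_gap = abs(empirical - theoretical) / theoretical

print(f"empirical = {empirical:.3f}, theoretical = {theoretical:.3f}")
```

With this many samples, the two printed values should differ by well under one percent.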

## Question 6

We aim to find the expected value of

$$
\frac{\Vert y - X \hat{\theta} \Vert^2}{n - d}
$$

First, we recall that :

$$
\begin{aligned}
R_n(\hat{\theta}) &= \frac{1}{n} \sum_{i=1}^n (y_i - \hat{\theta}^T x_i)^2 \\
&= \frac{1}{n} \Vert y - X \hat{\theta} \Vert^2
\end{aligned}
$$

Thus,

$$
\begin{aligned}
E \left[ \frac{\Vert y - X \hat{\theta} \Vert^2}{n - d} \right] &= E \left[ \frac{nR_n(\hat{\theta})}{n - d} \right] \\
&= \frac{n}{n - d} E \left[ R_n(\hat{\theta}) \right]
\end{aligned}
$$

From Proposition 1, we know that :

$$
E \left[ R_n(\hat{\theta}) \right] = \frac{n - d}{n} \sigma^2
$$

Therefore, for $n > d$,

$$
\begin{aligned}
E \left[ \frac{\Vert y - X \hat{\theta} \Vert^2}{n - d} \right] &= \frac{n}{n - d} \cdot \frac{n - d}{n} \sigma^2 \\
&= \sigma^2
\end{aligned}
$$

In other words, $\frac{\Vert y - X \hat{\theta} \Vert^2}{n - d}$ is an unbiased estimator of $\sigma^2$.
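Proposition 1 is consistent with Questions 1 and 3: taking $A = I_n - X(X^TX)^{-1}X^T$, the matrix $A$ is an orthogonal projector, so $A^TA = A$ and $tr(A) = n - d$. The sketch below checks these trace identities numerically and recovers $\sigma^2$ after rescaling by $\frac{n}{n-d}$; the sizes and the seed are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary fixed seed
n, d, sigma = 40, 7, 3.0        # illustrative values with n > d

X = rng.uniform(size=(n, d))  # full column rank almost surely
A = np.eye(n) - X @ np.linalg.solve(X.T @ X, X.T)

trace_A = np.trace(A)          # close to n - d
trace_AtA = np.trace(A.T @ A)  # same value, because A is an orthogonal projector

# Expected empirical risk from Questions 1 and 3, then the rescaling of Question 6
expected_empirical_risk = sigma**2 / n * trace_AtA
rescaled = n / (n - d) * expected_empirical_risk

print(trace_A, trace_AtA, rescaled)
```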

## Question 7

In [1]:
import numpy as np

In [7]:
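def OLS_estimator(X: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Return the OLS estimator (X^T X)^{-1} X^T y, one column per column of y."""
    # X^T X is the (unnormalized) Gram matrix; it must be invertible, i.e. X has rank d.
    gram_matrix = X.T @ X
    # Solving the normal equations is more stable than forming the inverse explicitly.
    theta_hat = np.linalg.solve(gram_matrix, X.T @ y)
    return theta_hat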

In [3]:
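def generate_output_data(
    X: np.ndarray, theta_star: np.ndarray, sigma: float, rng: np.random.Generator, n_repetitions: int
) -> np.ndarray:
    """Draw n_repetitions output vectors y = X theta_star + noise, one per column."""
    n = X.shape[0]
    # One independent column of N(0, sigma^2) noise per repetition.
    noise = rng.normal(0, sigma, size=(n, n_repetitions))
    # X @ theta_star has shape (n, 1) and broadcasts across the columns of noise.
    y = X @ theta_star + noise
    return y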

In [4]:
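def test_error(sigma: float, n_train: int, d: int, n_repetitions: int) -> float:
    """Monte Carlo average of ||y - X theta_hat||^2 / (n_train - d), an estimate of sigma^2."""
    rng = np.random.default_rng()

    # Fixed design: X and theta_star are drawn once and shared by all repetitions.
    X = rng.uniform(low=0, high=1, size=(n_train, d))
    theta_star = rng.uniform(low=0, high=1, size=(d, 1))

    # y has shape (n_train, n_repetitions): one noisy output vector per column.
    y = generate_output_data(
        X=X,
        theta_star=theta_star,
        sigma=sigma,
        rng=rng,
        n_repetitions=n_repetitions,
    )

    # One OLS estimator per column of y, shape (d, n_repetitions).
    theta_hat = OLS_estimator(
        X=X,
        y=y,
    )

    # The squared Frobenius norm sums the squared residual norms over all repetitions.
    squared_residuals = np.linalg.norm(y - (X @ theta_hat)) ** 2
    mean_test_error = squared_residuals / (n_train - d) / n_repetitions
    return mean_test_error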

In [6]:
sigma = 8
mean_test_error = test_error(sigma, 200, 30, 100000)
print(f'sigma = {sigma}, sigma^2 = {sigma**2}, E = {mean_test_error:.2f}')
# See https://github.com/numpy/numpy/issues/28687 regarding the warnings...

sigma = 8, sigma^2 = 64, E = 63.99


