<a href="https://colab.research.google.com/github/Jandsy/ml_finance_imperial/blob/main/Additional_Materials/Optional_Reading_Session_2.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Bias-Variance Tradeoff Analysis

## General Model Framework

Consider a prediction model where the output $Y$ is related to inputs $X$ through a true function $f$, with added noise $\epsilon$:
$$
Y = f(X) + \epsilon
$$
where $\epsilon$ is a noise term with $E[\epsilon] = 0$ and $\text{Var}(\epsilon) = \sigma^2$, independent of $X$.

Let $f_D$ be the model trained on dataset $D$. The prediction for a new input $X = x$ is given by $f_D(x)$.

### Expected Prediction Error (EPE)

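The EPE at a point $x$ for predictions made by $f_D$, with the expectation taken over both the training set $D$ and the noise $\epsilon$ in the new observation, is: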
$$
E[(Y - f_D(x))^2 | X = x]
$$
Expanding this, we find:
$$
E[(f(x) + \epsilon - f_D(x))^2 | X = x] = E[(f(x) - f_D(x) + \epsilon)^2 | X = x]
$$
$$
= E[(f(x) - f_D(x))^2 | X = x] + 2E[\epsilon(f(x) - f_D(x)) | X = x] + E[\epsilon^2 | X = x]
$$
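Because $\epsilon$ is the noise in a new observation, it is independent of the training set $D$ (and hence of $f_D(x)$) as well as of $X$, and it has mean 0, so the cross term vanishes:
$$
E[\epsilon(f(x) - f_D(x)) | X = x] = E[\epsilon] \, E[f(x) - f_D(x) | X = x] = 0
$$
The last term is the noise variance, $E[\epsilon^2 | X = x] = \sigma^2$. Adding and subtracting $E[f_D(x) | X = x]$ inside the first term splits it into squared bias plus variance (the resulting cross term is again zero). The EPE then simplifies to:
$$
EPE = \text{Bias}^2(f_D(x)) + \text{Var}(f_D(x)) + \sigma^2
$$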

#### Bias and Variance Definitions

- **Bias** of $f_D$ at $x$ (the expectation is over training sets $D$):
  $$
  \text{Bias}(f_D(x)) = E[f_D(x) | X = x] - f(x)
  $$
  Squared bias:
  $$
  \text{Bias}^2(f_D(x)) = (E[f_D(x) | X = x] - f(x))^2
  $$

- **Variance** of $f_D$ at $x$:
  $$
  \text{Var}(f_D(x)) = E[(f_D(x) - E[f_D(x) | X = x])^2 | X = x]
  $$
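
The decomposition can be checked numerically. The sketch below is only an illustration under stated assumptions: the true function `f`, the noise level `sigma`, the query point `x0`, and the choice of a straight-line fit are all hypothetical and not taken from the text. It draws many training sets, fits $f_D$ on each, and compares $\text{Bias}^2 + \text{Var} + \sigma^2$ with a direct Monte Carlo estimate of the EPE at `x0`.

```python
import numpy as np

rng = np.random.default_rng(0)

def f(x):
    # Hypothetical true function, chosen only for illustration
    return np.sin(2 * np.pi * x)

sigma = 0.3        # noise standard deviation (assumed)
n_train = 30       # size of each training set D
n_datasets = 2000  # number of independent training sets
degree = 1         # polynomial degree of the fitted model (a straight line)
x0 = 0.25          # query point x

preds = np.empty(n_datasets)
sq_errors = np.empty(n_datasets)
for i in range(n_datasets):
    # Draw a fresh training set D and fit f_D by least squares
    x_train = rng.uniform(0.0, 1.0, n_train)
    y_train = f(x_train) + rng.normal(0.0, sigma, n_train)
    coeffs = np.polyfit(x_train, y_train, degree)
    preds[i] = np.polyval(coeffs, x0)
    # Draw a new observation Y at x0 and record its squared prediction error
    y_new = f(x0) + rng.normal(0.0, sigma)
    sq_errors[i] = (y_new - preds[i]) ** 2

bias_sq = (preds.mean() - f(x0)) ** 2   # squared bias at x0
variance = preds.var()                  # variance of f_D(x0) across training sets
epe_decomposed = bias_sq + variance + sigma ** 2
epe_direct = sq_errors.mean()           # direct Monte Carlo estimate of the EPE

print(f"bias^2 = {bias_sq:.4f}, variance = {variance:.4f}, noise = {sigma**2:.4f}")
print(f"decomposed EPE = {epe_decomposed:.4f}, direct EPE = {epe_direct:.4f}")
```

The two EPE estimates should agree up to Monte Carlo error. Because a straight line cannot follow the sine curve, squared bias dominates in this particular setup.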

### Impact of Increasing Dataset Size

Increasing the dataset size generally results in:
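- **Reduced Variance:** $\text{Var}(f_D(x))$ decreases because the fitted parameters average over more observations and therefore fluctuate less from one training set to another, assuming the model is well-posed.
- **Unchanged Bias:** For a fixed model class, bias is determined mainly by the model's capacity to approximate $f$, so adding data leaves it essentially unchanged unless the functional form of $f_D$ is altered.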
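
A small simulation can illustrate both effects. This is a sketch under assumptions not found in the text: the true function `f`, the helper `bias_variance_at`, the noise level, and the training sizes are hypothetical choices. A straight line is fitted to data from a sine curve, so the model is deliberately misspecified and its bias should persist as the training size grows.

```python
import numpy as np

def f(x):
    # Hypothetical true function, chosen only for illustration
    return np.sin(2 * np.pi * x)

def bias_variance_at(x0, n_train, n_datasets=1000, sigma=0.3, degree=1, seed=0):
    """Estimate squared bias and variance of a polynomial fit at x0."""
    rng = np.random.default_rng(seed)
    preds = np.empty(n_datasets)
    for i in range(n_datasets):
        # Fresh training set of size n_train for every repetition
        x_train = rng.uniform(0.0, 1.0, n_train)
        y_train = f(x_train) + rng.normal(0.0, sigma, n_train)
        preds[i] = np.polyval(np.polyfit(x_train, y_train, degree), x0)
    return (preds.mean() - f(x0)) ** 2, preds.var()

# Same model class, increasing training size
results = {n: bias_variance_at(0.25, n) for n in (20, 200, 2000)}
for n, (bias_sq, variance) in results.items():
    print(f"n = {n:5d}: bias^2 = {bias_sq:.4f}, variance = {variance:.6f}")
```

In this setup the variance should fall by roughly a factor of ten for each tenfold increase in training size, while the squared bias stays close to the same value.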

## Special Case: Linear Regression

In the context of linear regression, consider the model:
$$
Y = X\beta + \epsilon
$$
where $X$ is the $n \times p$ matrix of input features (treated as fixed), $\beta$ is the vector of $p$ coefficients, and $\epsilon \sim N(0, \sigma^2I)$.

### OLS Estimator

The ordinary least squares (OLS) estimator for $\beta$ is:
$$
\hat{\beta} = (X^T X)^{-1} X^T Y
$$
The prediction at a new point $X = x_0$ is:
$$
\hat{Y}_0 = x_0^T \hat{\beta}
$$

### Bias and Variance in Linear Regression

- **Bias**:
  Assuming the model is correctly specified (it includes all relevant variables and interactions), OLS is unbiased, $E[\hat{\beta}] = \beta$, so:
  $$
  \text{Bias}(\hat{Y}_0) = 0
  $$

- **Variance**:
  The variance of predictions follows from $\text{Var}(\hat{\beta}) = \sigma^2 (X^T X)^{-1}$, so it depends on the inverse of the Gram matrix $X^T X$:
  $$
  \text{Var}(\hat{Y}_0) = \sigma^2 x_0^T (X^T X)^{-1} x_0
  $$
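
The zero-bias and variance results can be checked by simulation. The following is a minimal sketch with assumed values: the design matrix `X`, the coefficients `beta`, the query point `x0`, and `sigma` are made up for illustration. The design is held fixed and only the noise is redrawn, which matches the setting in which the formula holds.

```python
import numpy as np

rng = np.random.default_rng(0)

n, p = 50, 3
sigma = 0.5                                   # noise standard deviation (assumed)
# Fixed design matrix with an intercept column (illustrative values)
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
beta = np.array([1.0, 2.0, -1.0])             # hypothetical true coefficients
x0 = np.array([1.0, 0.5, -0.5])               # new point, including the intercept

# Theoretical variance: sigma^2 * x0^T (X^T X)^{-1} x0
var_theory = sigma ** 2 * (x0 @ np.linalg.solve(X.T @ X, x0))

# Monte Carlo: redraw the noise many times with X held fixed
n_sims = 20000
eps = rng.normal(0.0, sigma, size=(n_sims, n))
Y = X @ beta + eps                            # each row is one simulated response vector
# Solve all least-squares problems at once; result has shape (p, n_sims)
beta_hats, *_ = np.linalg.lstsq(X, Y.T, rcond=None)
preds = x0 @ beta_hats                        # one prediction per simulation

print(f"mean prediction = {preds.mean():.4f}, true value = {x0 @ beta:.4f}")
print(f"empirical variance = {preds.var():.5f}, theoretical = {var_theory:.5f}")
```

The mean prediction should match $x_0^T \beta$ (zero bias), and the empirical variance should match the formula up to Monte Carlo error.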

### Impact of Dataset Size in Linear Regression

Increasing the dataset size:
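- **Reduces Variance:** Each entry of $X^T X$ is a sum over the $n$ observations, so the matrix grows roughly in proportion to $n$ and $(X^T X)^{-1}$ shrinks roughly like $1/n$, provided the new rows are not degenerate. This reduces the variance of the estimator $\hat{\beta}$ and of the predictions $\hat{Y}_0$.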
- **Bias Unchanged:** The bias remains zero if the model is correctly specified.
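
The shrinking of $(X^T X)^{-1}$ can be seen directly by evaluating the prediction variance for designs of increasing size. This is a sketch under assumptions: the features are drawn as standard normal variables, and `x0` and `sigma` are hypothetical values chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

p = 3
sigma = 0.5                          # noise standard deviation (assumed)
x0 = np.array([1.0, 0.5, -0.5])      # new point, including the intercept

pred_var = {}
for n in (50, 500, 5000):
    # Design with an intercept and standard normal features (illustrative)
    X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
    # Quadratic form x0^T (X^T X)^{-1} x0, computed without an explicit inverse
    quad_form = x0 @ np.linalg.solve(X.T @ X, x0)
    pred_var[n] = sigma ** 2 * quad_form
    print(f"n = {n:5d}: Var(Y0_hat) = {pred_var[n]:.6f}, n * Var = {n * pred_var[n]:.4f}")
```

With these assumed features, `n * Var` should settle near a constant, which is the $1/n$ decay of the prediction variance.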

## Conclusion

The bias-variance decomposition splits expected prediction error into squared bias, variance, and irreducible noise $\sigma^2$. For both general and linear models, increasing the dataset size improves accuracy by reducing variance while leaving bias essentially unchanged, so it helps most when the model's complexity is appropriate for the underlying function $f$. No amount of data removes $\sigma^2$ or the bias of a misspecified model.









