# Relationship between R² and MSE

Consider the following statement:

> _"r² is just a scaled (and translated) version of the MSE"_ 

This statement reflects the mathematical relationship between **R² (coefficient of determination)** and the **MSE (mean squared error)** when comparing a regression model's predictions to a baseline. Let’s break it down with definitions, derivation, and a concrete example.

### Definitions

For a regression problem with:
- true values: $y_1, y_2, \dots, y_n$
- predicted values: $\hat{y}_1, \hat{y}_2, \dots, \hat{y}_n$
- mean of true values: $\bar{y} = \frac{1}{n} \sum_{i=1}^n y_i$

We define:

- **MSE (Mean Squared Error)**:
  $$
  \text{MSE} = \frac{1}{n} \sum_{i=1}^n (y_i - \hat{y}_i)^2
  $$

- **TSS (Total Sum of Squares)** (variance of the target, scaled by $n$):
  $$
  \text{TSS} = \sum_{i=1}^n (y_i - \bar{y})^2
  $$

- **RSS (Residual Sum of Squares)**:
  $$
  \text{RSS} = \sum_{i=1}^n (y_i - \hat{y}_i)^2 = n \cdot \text{MSE}
  $$

- **R²**:
  $$
  R^2 = 1 - \frac{\text{RSS}}{\text{TSS}} = 1 - \frac{n \cdot \text{MSE}}{\text{TSS}}
  $$
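The definitions above translate almost line for line into code. The following is a minimal sketch in plain Python; the helper names `mse`, `tss`, `rss`, and `r2` are illustrative choices, not functions from any library.

```python
def mse(y_true, y_pred):
    """Mean squared error: the average of the squared residuals."""
    n = len(y_true)
    return sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / n


def tss(y_true):
    """Total sum of squares: squared deviations of y_true from its mean."""
    y_bar = sum(y_true) / len(y_true)
    return sum((y - y_bar) ** 2 for y in y_true)


def rss(y_true, y_pred):
    """Residual sum of squares, which equals n * MSE."""
    return sum((y - p) ** 2 for y, p in zip(y_true, y_pred))


def r2(y_true, y_pred):
    """Coefficient of determination, 1 - RSS / TSS (undefined when TSS is 0)."""
    return 1 - rss(y_true, y_pred) / tss(y_true)
```

Note that `r2` divides by TSS, so it is only meaningful when the true values are not all identical.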

---

### Rearranging R² in terms of MSE

From the formula:
$$
R^2 = 1 - \frac{n \cdot \text{MSE}}{\text{TSS}}
$$

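For a fixed dataset, $n$ and TSS are constants, so R² is a **linear (strictly speaking, affine) function** of MSE:
$$
R^2 = -\left(\frac{n}{\text{TSS}}\right) \cdot \text{MSE} + 1
$$

This is a **scaling** (by $-\frac{n}{\text{TSS}}$) and **translation** (adding 1). That’s exactly what the original statement refers to. Because the scale factor is negative, a lower MSE always means a higher R² on the same data. The scale factor depends on the dataset, so the mapping holds only when comparing models on the same targets.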
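One way to check this is to compute R² twice for several sets of predictions on the same targets: once from the definition and once as `slope * MSE + 1`. The sketch below uses made-up targets and three hypothetical models (`model_a`, `model_b`, `model_c`); none of these values come from a real dataset.

```python
y_true = [2.0, 4.0, 6.0, 8.0]  # hypothetical targets
n = len(y_true)
y_bar = sum(y_true) / n
tss = sum((y - y_bar) ** 2 for y in y_true)
slope = -n / tss  # constant once the dataset is fixed

# Hypothetical predictions from three different models.
candidates = {
    "model_a": [2.5, 4.0, 5.5, 8.0],
    "model_b": [3.0, 3.0, 7.0, 7.0],
    "model_c": [5.0, 5.0, 5.0, 5.0],  # always predicts the mean
}

results = {}
for name, y_pred in candidates.items():
    rss = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    mse = rss / n
    r2_direct = 1 - rss / tss   # definition of R²
    r2_affine = slope * mse + 1  # scaled and translated MSE
    results[name] = (mse, r2_direct, r2_affine)
    print(f"{name}: MSE={mse:.3f}  R2={r2_direct:.3f}  slope*MSE+1={r2_affine:.3f}")
```

The two R² columns agree for every model, and sorting the models by increasing MSE gives the same order as sorting them by decreasing R².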

### Example

Let’s walk through a simple example.

#### Given:
- True values: $y = [3, 4, 5]$
- Predicted values: $\hat{y} = [2.5, 4.0, 5.5]$

#### Step 1: Compute MSE

$$
\text{MSE} = \frac{1}{3}[(3 - 2.5)^2 + (4 - 4.0)^2 + (5 - 5.5)^2] = \frac{1}{3}(0.25 + 0 + 0.25) = \frac{0.5}{3} \approx 0.167
$$

#### Step 2: Compute TSS

$$
\bar{y} = \frac{3 + 4 + 5}{3} = 4
$$

$$
\text{TSS} = (3 - 4)^2 + (4 - 4)^2 + (5 - 4)^2 = 1 + 0 + 1 = 2
$$

#### Step 3: Compute R²

$$
R^2 = 1 - \frac{n \cdot \text{MSE}}{\text{TSS}} = 1 - \frac{3 \cdot \frac{0.5}{3}}{2} = 1 - \frac{0.5}{2} = 1 - 0.25 = 0.75
$$
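The three steps can be reproduced in a few lines of plain Python. This is a small sketch using the same numbers as the example; the variable names are illustrative.

```python
y_true = [3.0, 4.0, 5.0]
y_pred = [2.5, 4.0, 5.5]
n = len(y_true)

# Step 1: MSE is the mean of the squared residuals.
mse = sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / n

# Step 2: TSS is the sum of squared deviations from the mean of y_true.
y_bar = sum(y_true) / n
tss = sum((y - y_bar) ** 2 for y in y_true)

# Step 3: R² from MSE and TSS.
r2 = 1 - n * mse / tss

print(f"MSE = {mse:.3f}")  # MSE = 0.167
print(f"TSS = {tss:.1f}")  # TSS = 2.0
print(f"R2  = {r2:.2f}")   # R2  = 0.75
```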

---

### Summary

So, the equation:
$$
R^2 = -\left(\frac{n}{\text{TSS}}\right) \cdot \text{MSE} + 1
$$
shows that **R² is essentially MSE rescaled and shifted**, which supports the claim that it's a scaled (and translated) version of MSE.

While MSE measures error in absolute terms (in squared units of the target), R² provides a dimensionless, **relative measure** of how much better the model is than simply predicting the mean. Mathematically, though, on a fixed dataset the two are tightly linked through a linear transformation.
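The "relative" reading can be made explicit: since $\text{TSS}/n$ is the MSE of a model that always predicts $\bar{y}$, R² equals one minus the ratio of the model's MSE to that baseline MSE. The sketch below illustrates this with the targets from the example; `r2_relative` and the deliberately poor predictions `[5.0, 4.0, 3.0]` are made up for illustration.

```python
def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / len(y_true)


y_true = [3.0, 4.0, 5.0]
y_bar = sum(y_true) / len(y_true)

# Baseline model: always predict the mean of the targets.
baseline_pred = [y_bar] * len(y_true)
mse_baseline = mse(y_true, baseline_pred)  # equals TSS / n


def r2_relative(y_pred):
    """R² written as 1 - MSE(model) / MSE(mean baseline)."""
    return 1 - mse(y_true, y_pred) / mse_baseline


r2_model = r2_relative([2.5, 4.0, 5.5])  # predictions from the example
r2_mean = r2_relative(baseline_pred)     # the baseline itself scores 0
r2_bad = r2_relative([5.0, 4.0, 3.0])    # worse than the baseline: negative
print(r2_model, r2_mean, r2_bad)
```

A model whose MSE matches the baseline gets R² = 0, and one with a larger MSE gets a negative R², which is why R² has no lower bound.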

The code below lets you explore how R² changes as a function of MSE using interactive widgets. You can vary the values of $n$ (sample size) and TSS (total sum of squares) to see how they affect the relationship.

```python
import matplotlib.pyplot as plt
import numpy as np
import ipywidgets as widgets
from ipywidgets import interact
```

```python
def plot_r2_vs_mse(n=100, TSS=50):
    """Plot R² as a function of MSE for a fixed sample size n and fixed TSS."""
    # Sweep MSE over a fixed range and apply R² = 1 - n * MSE / TSS.
    mse_values = np.linspace(0, 5, 200)
    r2_values = 1 - (n * mse_values / TSS)

    plt.figure(figsize=(8, 5))
    plt.plot(mse_values, r2_values, label=r'$R^2 = 1 - \frac{n \cdot MSE}{TSS}$')
    # Horizontal reference line at R² = 0 (no better than predicting the mean).
    plt.axhline(0, color='gray', linestyle='--', linewidth=0.8)
    # Vertical line at the MSE of the mean baseline, where R² crosses zero.
    plt.axvline(TSS / n, color='red', linestyle='--', linewidth=0.8, label='MSE = TSS/n → R² = 0')
    plt.title("Interactive: R² vs MSE")
    plt.xlabel("Mean Squared Error (MSE)")
    plt.ylabel("R²")
    plt.grid(True)
    plt.legend()
    plt.tight_layout()
    plt.show()
```

```python
# Sliders for n and TSS; the plot is redrawn whenever either value changes.
interact(plot_r2_vs_mse,
         n=widgets.IntSlider(value=100, min=10, max=500, step=10, description='n'),
         TSS=widgets.FloatSlider(value=50, min=10, max=200, step=5, description='TSS'))
```
