# MSELoss

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Mitchell-Mirano/sorix/blob/qa/docs/learn/loss/01-MSELoss.ipynb)
[![Open in GitHub](https://img.shields.io/badge/Open%20in-GitHub-black?logo=github)](https://github.com/Mitchell-Mirano/sorix/blob/qa/docs/learn/loss/01-MSELoss.ipynb)
[![Open in Docs](https://img.shields.io/badge/Open%20in-Docs-blue?logo=readthedocs)](https://mitchell-mirano.github.io/sorix/latest/learn/loss/01-MSELoss)

The **Mean Squared Error** (MSE) loss is the average of the squared differences between the targets and the predictions. It is a standard loss function for **regression** tasks.

$$MSE = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$

Where:
- $n$ is the number of values averaged over (the batch size in the examples below).
- $y_i$ is the target value.
- $\hat{y}_i$ is the predicted value.

In [1]:
# Uncomment the next line and run this cell to install sorix
#!pip install 'sorix @ git+https://github.com/Mitchell-Mirano/sorix.git@qa/docs_learn/docs_learn/docs_learn/docs_learn'

In [2]:
import numpy as np
from sorix import tensor
from sorix.nn import MSELoss

# Create data
y_pred = tensor([2.5, 0.0, 2.1], requires_grad=True)
y_true = tensor([3.0, 0.0, 2.0])

criterion = MSELoss()
loss = criterion(y_pred, y_true)

print(f"Predictions: {y_pred.numpy()}")
print(f"Targets:     {y_true.numpy()}")
print(f"MSE Loss:    {loss.item():.4f}")

Predictions: [2.5 0.  2.1]
Targets:     [3. 0. 2.]
MSE Loss:    0.0867
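The printed value can be cross-checked against the formula with plain NumPy. This is a minimal sketch that does not use Sorix at all; the names `y_pred_np`, `y_true_np`, and `squared_errors` are introduced here only for illustration.

```python
import numpy as np

# Same values as the cell above, as plain NumPy arrays
y_pred_np = np.array([2.5, 0.0, 2.1])
y_true_np = np.array([3.0, 0.0, 2.0])

# MSE = (1/n) * sum((y_i - y_hat_i)^2)
squared_errors = (y_true_np - y_pred_np) ** 2
mse = squared_errors.mean()

print(f"Squared errors: {squared_errors}")
print(f"MSE:            {mse:.4f}")
```

The squared errors are 0.25, 0, and 0.01, so the mean is 0.26 / 3 ≈ 0.0867, which matches the value reported by `MSELoss`.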


### Verification with Autograd

`MSELoss` in Sorix is fully differentiable. Running the backward pass gives the gradients of the loss with respect to the predictions.

In [3]:
loss.backward()
print(f"Gradients w.r.t y_pred: {y_pred.grad}")

# Manual verification:
# d/dy_pred_i [1/n * sum_j (y_pred_j - y_true_j)^2] = 2/n * (y_pred_i - y_true_i)
n = y_pred.data.size
manual_grad = 2/n * (y_pred.data - y_true.data)
print(f"Manual Gradients:     {manual_grad}")

Gradients w.r.t y_pred: [-0.33333334  0.          0.0666666 ]
Manual Gradients:     [-0.33333334  0.          0.0666666 ]
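The analytic formula can also be checked independently of any autograd engine by using central finite differences. The sketch below uses only NumPy; the helper `mse` and the step size `eps` are assumptions made for this illustration and are not part of Sorix.

```python
import numpy as np

def mse(y_pred, y_true):
    """Mean squared error over all elements."""
    return np.mean((y_pred - y_true) ** 2)

y_pred_np = np.array([2.5, 0.0, 2.1])
y_true_np = np.array([3.0, 0.0, 2.0])

eps = 1e-6  # illustrative step size
numeric_grad = np.zeros_like(y_pred_np)
for i in range(y_pred_np.size):
    step = np.zeros_like(y_pred_np)
    step[i] = eps
    # Central difference: (f(x + eps) - f(x - eps)) / (2 * eps)
    numeric_grad[i] = (
        mse(y_pred_np + step, y_true_np) - mse(y_pred_np - step, y_true_np)
    ) / (2 * eps)

# Analytic gradient: 2/n * (y_pred - y_true)
analytic_grad = 2 / y_pred_np.size * (y_pred_np - y_true_np)

print(f"Numeric gradient:  {numeric_grad}")
print(f"Analytic gradient: {analytic_grad}")
```

Because MSE is quadratic in the predictions, the central difference agrees with the analytic gradient up to floating-point rounding.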


### Training Example

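Let's see how `MSELoss` guides a single value toward a target. In each printed row, the loss is the value computed before the update and the weight is the value after it.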

In [4]:
from sorix.optim import SGD

weight = tensor([10.0], requires_grad=True)
target = tensor([42.0])
optimizer = SGD([weight], lr=0.1)

print(f"Initial weight: {weight.item():.2f}")

for i in range(21):
    loss = criterion(weight, target)
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    if i % 5 == 0:
        print(f"Step {i:2d} | Weight: {weight.item():.4f} | Loss: {loss.item():.4f}")

print(f"Final weight: {weight.item():.2f}")

Initial weight: 10.00
Step  0 | Weight: 16.4000 | Loss: 1024.0000
Step  5 | Weight: 33.6114 | Loss: 109.9512
Step 10 | Weight: 39.2512 | Loss: 11.8059
Step 15 | Weight: 41.0993 | Loss: 1.2676
Step 20 | Weight: 41.7049 | Loss: 0.1361
Final weight: 41.70
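The trajectory above can be replayed by hand. The sketch below assumes that `SGD` applies the plain update `w <- w - lr * grad` with no momentum or weight decay; under that assumption each step multiplies the error `w - target` by `1 - 2 * lr = 0.8`. The names `history` and `closed_form` are introduced here for illustration.

```python
lr, target, w0 = 0.1, 42.0, 10.0

# Replay the loop with plain Python floats (assumes vanilla SGD)
w = w0
history = []
for i in range(21):
    loss = (w - target) ** 2    # MSE with a single element (n = 1)
    grad = 2.0 * (w - target)   # d(loss)/dw
    w -= lr * grad              # SGD update
    history.append((w, loss))   # weight after the update, loss before it

# Closed form: after k updates the error has shrunk by a factor of 0.8 ** k
closed_form = [target + (w0 - target) * (1 - 2 * lr) ** (i + 1) for i in range(21)]

for i in range(0, 21, 5):
    print(f"Step {i:2d} | Weight: {history[i][0]:.4f} | Closed form: {closed_form[i]:.4f}")
```

Since the error shrinks geometrically, the weight approaches 42 but does not reach it in 21 steps, which is consistent with the final value of about 41.70 printed above.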
