# Mean Squared Error (MSE) Loss

**Mean Squared Error** is a widely used loss function, <u>particularly in regression tasks</u>, where the goal is to predict continuous numerical values.  

*It measures the average squared difference between the predicted values and the actual values*.

It is calculated by squaring the difference between each predicted value and its corresponding actual value, then averaging over all samples:

$$MSE = \frac{1}{N}\sum_{i=1}^{N}(y_{i} - \hat{y}_{i})^2$$

In this formula, $N$ is the total number of samples, $y_{i}$ is the actual value of the $i$-th sample, and $\hat{y}_{i}$ is the corresponding predicted value.

Minimizing this loss during training reduces the discrepancy between the model's predictions and the actual values.
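
To make the formula concrete, here is a minimal pure-Python sketch that follows it term by term. The function name `mse` and the four sample values are made up for illustration and do not come from any library.

```python
def mse(y_true, y_pred):
    """Average of the squared differences, following the formula above."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    n = len(y_true)
    return sum((y - y_hat) ** 2 for y, y_hat in zip(y_true, y_pred)) / n


# Illustrative values (not taken from a real dataset)
y_true = [3.0, 5.0, 2.0, 7.0]
y_pred = [2.0, 5.0, 4.0, 6.0]

# Differences: 1, 0, -2, 1 -> squares: 1, 0, 4, 1 -> sum: 6 -> 6 / 4 = 1.5
example_mse = mse(y_true, y_pred)
print(example_mse)  # 1.5
```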



### Some additional insights on the Mean Squared Error:

1. **Squared Differences**: Because each difference is squared before averaging, MSE weights larger errors disproportionately: an error twice as large contributes four times as much to the loss, <span style="font-size: 11pt; color: orange; font-weight: normal">making it sensitive to outliers.</span>

2. **Differentiability**: <span style="font-size: 11pt; color: green; font-weight: normal">MSE is differentiable everywhere with respect to the predictions, and its derivative is smooth and continuous</span>. This property is crucial for gradient-based optimization, where backpropagation computes the gradients used to update the model's parameters during training.

3. **Non-Negative Values**: The MSE loss is always non-negative since it involves squared differences. A value of 0 indicates a perfect match between the predicted and actual values.

4. **Units of Measurement**: The <span style="font-size: 11pt; color: orange; font-weight: normal">units of MSE are the square of the units of the target variable.</span> For example, if the target variable represents distances in meters, the MSE loss will be expressed in square meters. This makes the raw value harder to interpret, which is why the square root of MSE (RMSE) is often reported when an error in the original units is wanted.

5. **Scale Sensitivity**: <span style="font-size: 11pt; color: orange; font-weight: normal">MSE is sensitive to the scale of the data</span>. Variables with larger magnitudes can dominate the loss calculation, potentially affecting the training process. It is often recommended to scale the input features (and, where appropriate, the targets) to a similar range to mitigate this issue.

The following examples illustrate how sensitive MSE is to outliers.

Even a single prediction that deviates greatly from the true value can significantly affect the MSE. Squaring amplifies large differences, so they contribute far more to the overall error. As a result, MSE is a less robust metric when the data contain extreme values.

```python
y_true = [100, 100, 100, 100, 100, 100, 100, 100, 100, 100]

# Five predictions are each off by 10: 5 * 10**2 / 10
y_pred = [90, 90, 90, 90, 90, 100, 100, 100, 100, 100]
# MSE = 50

# One prediction is off by 50: 50**2 / 10
y_pred = [50, 100, 100, 100, 100, 100, 100, 100, 100, 100]
# MSE = 250
```
6. **Application to Regression**: MSE is commonly used both as a loss function and as an evaluation metric for regression tasks. Minimizing MSE during training drives the model toward predictions with a small average squared error on the training data.

7. **Comparing Models**: MSE allows for easy comparison of different models evaluated on the same data: a lower MSE indicates smaller prediction errors on average.
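
As a sketch of the differentiability point in the list above: differentiating the MSE formula with respect to a single prediction gives $\frac{\partial MSE}{\partial \hat{y}_{i}} = \frac{2}{N}(\hat{y}_{i} - y_{i})$. The pure-Python example below checks this analytic gradient against a finite-difference approximation. The helper names (`mse`, `mse_grad`, `numeric_grad`) and the sample values are hypothetical, chosen only for illustration.

```python
def mse(y_true, y_pred):
    n = len(y_true)
    return sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / n


def mse_grad(y_true, y_pred):
    """Analytic gradient of MSE with respect to each prediction."""
    n = len(y_true)
    return [2.0 * (p - y) / n for y, p in zip(y_true, y_pred)]


def numeric_grad(y_true, y_pred, eps=1e-6):
    """Central finite-difference approximation, used only as a sanity check."""
    grads = []
    for i in range(len(y_pred)):
        up = list(y_pred)
        down = list(y_pred)
        up[i] += eps
        down[i] -= eps
        grads.append((mse(y_true, up) - mse(y_true, down)) / (2 * eps))
    return grads


# Illustrative values; the last prediction plays the role of an outlier
y_true = [100.0, 100.0, 100.0, 100.0]
y_pred = [90.0, 100.0, 104.0, 60.0]

analytic = mse_grad(y_true, y_pred)
numeric = numeric_grad(y_true, y_pred)
print(analytic)  # [-5.0, 0.0, 2.0, -20.0]
```

The gradient is proportional to the error itself, so the prediction with the largest error receives the largest update signal. This is the same mechanism that makes MSE sensitive to outliers.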

It's worth noting that while MSE has several advantages, it is not the most appropriate loss function for every scenario. Depending on the specific characteristics of the problem, other loss functions, such as MAE (Mean Absolute Error) or custom loss functions, might be more suitable.
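
To illustrate how the choice between MSE and MAE can matter, the sketch below reuses the two outlier examples from earlier and computes both metrics in pure Python. The helper functions `mse` and `mae` are written here for illustration and are not library functions.

```python
def mse(y_true, y_pred):
    return sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / len(y_true)


def mae(y_true, y_pred):
    return sum(abs(y - p) for y, p in zip(y_true, y_pred)) / len(y_true)


y_true = [100] * 10
y_pred_spread = [90] * 5 + [100] * 5   # five errors of 10
y_pred_outlier = [50] + [100] * 9      # one error of 50

mse_spread = mse(y_true, y_pred_spread)
mse_outlier = mse(y_true, y_pred_outlier)
mae_spread = mae(y_true, y_pred_spread)
mae_outlier = mae(y_true, y_pred_outlier)

print("MSE:", mse_spread, mse_outlier)  # MSE: 50.0 250.0
print("MAE:", mae_spread, mae_outlier)  # MAE: 5.0 5.0
```

Both prediction sets have the same total absolute error, so MAE rates them equally, while MSE penalizes the single large error five times more heavily. Which behavior is preferable depends on whether large errors should be treated as especially costly.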

**Below we will use three different methods to compute Mean Squared Error and compare the results**.

### Importing libraries and preparing data

```python
import torch
import numpy as np
from sklearn.metrics import mean_squared_error
```

```python
# Simulated true values and predicted values

# For PyTorch
y_true_torch = torch.tensor([100, 100, 100, 100, 100, 100, 100, 100, 100, 100], dtype=torch.float32)
y_pred_torch = torch.tensor([80, 100, 90, 95, 105, 101, 110, 99, 87, 100], dtype=torch.float32)

# For scikit-learn and NumPy
y_true = [100, 100, 100, 100, 100, 100, 100, 100, 100, 100]
y_pred = [80, 100, 90, 95, 105, 101, 110, 99, 87, 100]
```

### Compute Mean Squared Error Loss with PyTorch

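```python
# Create an instance of MSELoss
mse_loss = torch.nn.MSELoss()

# Compute MSE; MSELoss expects (input, target), i.e. predictions first.
# The value is the same in either order because the difference is squared.
torch_mse = mse_loss(y_pred_torch, y_true_torch).item()

# Round the result to one decimal place (float32 leaves a small rounding error)
torch_mse = round(torch_mse, 1)

print('Mean Squared Error Loss:', torch_mse)
```

```
Mean Squared Error Loss: 82.1
```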


### Compute Mean Squared Error Loss with scikit-learn

```python
# Use the mean_squared_error function from sklearn to compute MSE
sklearn_mse = mean_squared_error(y_true, y_pred)

print('Mean Squared Error Loss:', sklearn_mse)
```

```
Mean Squared Error Loss: 82.1
```


### Compute Mean Squared Error Loss with NumPy

```python
# Create a custom function to compute MSE
def mean_squared_err(y_true, y_pred):
    y_true, y_pred = np.array(y_true), np.array(y_pred)
    return np.mean((y_true - y_pred)**2)

numpy_mse = mean_squared_err(y_true, y_pred)

print('Mean Squared Error Loss:', numpy_mse)
```

```
Mean Squared Error Loss: 82.1
```
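
One pitfall with a hand-written NumPy version like the one above is broadcasting. If one array has shape `(N,)` and the other `(N, 1)`, the subtraction silently produces an `(N, N)` matrix and the mean is no longer the MSE. The sketch below adds a shape check. The function name `mse_checked` and the sample arrays are hypothetical, introduced only for this illustration.

```python
import numpy as np


def mse_checked(y_true, y_pred):
    """MSE that refuses inputs whose shapes differ, to avoid silent broadcasting."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    if y_true.shape != y_pred.shape:
        raise ValueError(f"shape mismatch: {y_true.shape} vs {y_pred.shape}")
    return float(np.mean((y_true - y_pred) ** 2))


y_true = np.array([100.0, 100.0, 100.0])               # shape (3,)
y_pred_column = np.array([[90.0], [100.0], [110.0]])   # shape (3, 1)

# Without a check, broadcasting builds a (3, 3) matrix of differences
broadcast_shape = (y_true - y_pred_column).shape

try:
    mse_checked(y_true, y_pred_column)
    raised = False
except ValueError:
    raised = True

# Flattening the column vector restores the intended element-wise comparison
flat_mse = mse_checked(y_true, y_pred_column.ravel())
print(broadcast_shape, raised, flat_mse)
```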


### Comparison of the MSE computation results between PyTorch, scikit-learn and NumPy

Let's compare computed Mean Squared Error values.

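```python
# torch_mse was rounded to one decimal place earlier, so exact equality holds here
print(torch_mse == sklearn_mse == numpy_mse)
```

```
True
```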

All three methods of computing MSE produced the same result.
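
Exact equality between floating-point results is fragile in general: the PyTorch value matches here only because it was rounded first. A tolerance-based comparison is usually safer. The sketch below uses `math.isclose` from the standard library. The unrounded value is an illustrative stand-in for what a float32 computation of 82.1 might return, not output captured from the cells above.

```python
import math

# Illustrative stand-in for an unrounded float32 result (assumption, not captured output)
unrounded_float32_mse = 82.0999984741211
reference_mse = 82.1

# Exact comparison fails because of the float32 rounding error
exact_equal = unrounded_float32_mse == reference_mse

# A relative tolerance absorbs the small discrepancy
close_enough = math.isclose(unrounded_float32_mse, reference_mse, rel_tol=1e-6)

print(exact_equal, close_enough)  # False True
```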