# Mean Squared Error (MSE)

**Definition:**  
Mean Squared Error (MSE) is the average of the squared errors, that is, the average squared difference between the predicted values and the actual values. It is commonly used in regression analysis to evaluate the performance of a model.

**Formula:**

$$
\text{MSE} = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2
$$

where:
- \( y_i \) is the actual value,
- \( \hat{y}_i \) is the predicted value,
- \( n \) is the total number of observations.
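
The formula translates almost directly into NumPy. The sketch below is one possible implementation; the function name `mse` and the shape check are illustrative choices, not part of any library.

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean of the squared differences between actual and predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    if y_true.shape != y_pred.shape:
        raise ValueError("y_true and y_pred must have the same shape")
    # (1/n) * sum((y_i - y_hat_i)^2)
    return float(np.mean((y_true - y_pred) ** 2))

# Two observations with errors of 1 and -2: (1 + 4) / 2
print(mse([3.0, 5.0], [2.0, 7.0]))  # 2.5
```

Because the errors are squared, the sign of each error does not matter and the result is never negative.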

**Importance of MSE:**
MSE quantifies how far a model's predictions fall from the observed values, which makes it a standard choice in regression tasks. Its applications include:

- **Machine Learning:** In supervised learning, MSE is used as a loss function that regression models minimize during training.
- **Signal Processing:** MSE is used to compare signals and to assess the quality of estimates.
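
To illustrate the loss-function role, here is a minimal sketch of fitting a straight line by gradient descent on MSE. The function name `fit_line_gd`, the learning rate, the step count, and the synthetic data are all assumptions chosen for the example; real libraries typically use more robust solvers.

```python
import numpy as np

def fit_line_gd(x, y, lr=0.05, steps=2000):
    """Fit y ≈ w * x + b by plain gradient descent on the MSE loss."""
    w, b = 0.0, 0.0
    n = len(x)
    for _ in range(steps):
        residual = y - (w * x + b)                # y_i - y_hat_i
        grad_w = -2.0 / n * np.sum(x * residual)  # d(MSE)/dw
        grad_b = -2.0 / n * np.sum(residual)      # d(MSE)/db
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Synthetic, noise-free data generated from y = 2x + 1 (illustrative only)
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2.0 * x + 1.0

w, b = fit_line_gd(x, y)
print(f"w = {w:.3f}, b = {b:.3f}")  # w = 2.000, b = 1.000
```

MSE is convenient here because it is differentiable everywhere, so its gradient with respect to the model parameters has a simple closed form.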

**Interpretation:**
- **Low MSE:** The model's predictions are close to the actual values, suggesting good performance. What counts as low depends on the scale of the target, because MSE is expressed in squared units of the target variable.
- **High MSE:** The model's predictions deviate substantially from the actual values, which may call for model tuning or improvement. Because errors are squared, a few large errors can dominate the result.

**Example:**
Consider a regression problem where we are predicting the daily temperature in degrees Celsius over five days. Suppose we have the following actual and predicted temperature data:

| Day | Actual Temperature (\(y\)) | Predicted Temperature (\(\hat{y}\)) |
|-----|-----------------------------|-------------------------------------|
| 1   | 20                          | 18                                  |
| 2   | 22                          | 24                                  |
| 3   | 25                          | 23                                  |
| 4   | 19                          | 21                                  |
| 5   | 30                          | 29                                  |

To calculate MSE, we first compute the squared errors:

- Day 1: (20 - 18)² = 4
- Day 2: (22 - 24)² = 4
- Day 3: (25 - 23)² = 4
- Day 4: (19 - 21)² = 4
- Day 5: (30 - 29)² = 1

Now we calculate the MSE:

$$
\text{MSE} = \frac{1}{5} (4 + 4 + 4 + 4 + 1) = \frac{17}{5} = 3.4
$$

This means that the average squared deviation of the model's predictions from the actual values is 3.4 °C² (squared degrees Celsius).
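
Because the result is in squared units, its size depends on the measurement scale. The sketch below recomputes MSE for the same five days after converting to Fahrenheit; the helper names `mse` and `celsius_to_fahrenheit` are illustrative, not from any library.

```python
import numpy as np

def mse(y_true, y_pred):
    """Average squared difference between two equal-length arrays."""
    return float(np.mean((y_true - y_pred) ** 2))

def celsius_to_fahrenheit(c):
    return c * 9.0 / 5.0 + 32.0

# Actual and predicted temperatures from the table, in °C
actual_c = np.array([20.0, 22.0, 25.0, 19.0, 30.0])
predicted_c = np.array([18.0, 24.0, 23.0, 21.0, 29.0])

mse_c = mse(actual_c, predicted_c)
mse_f = mse(celsius_to_fahrenheit(actual_c), celsius_to_fahrenheit(predicted_c))

print(f"MSE in °C²: {mse_c:.3f}")      # 3.400
print(f"MSE in °F²: {mse_f:.3f}")      # 11.016
print(f"Ratio: {mse_f / mse_c:.2f}")   # 3.24, i.e. (9/5)²
```

The predictions are identical in both cases, yet the MSE differs by a factor of (9/5)², so MSE values are only comparable when the targets share the same units.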

**Relation to Other Metrics:**
MSE is often compared to other error metrics:
- **Mean Absolute Error (MAE):** MAE averages the absolute values of the errors, so each error contributes in proportion to its size regardless of direction. MSE squares the errors first, which gives larger errors disproportionately more weight.
- **Root Mean Squared Error (RMSE):** RMSE is the square root of MSE, providing a measure of error in the same units as the target variable.
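
The difference between these metrics shows up most clearly when one prediction is far off. The sketch below computes all three for the temperature example, then again with a hypothetical Day 5 prediction of 20 instead of 29; `error_metrics` and the modified prediction are assumptions made for illustration.

```python
import numpy as np

def error_metrics(y_true, y_pred):
    """Return (MAE, MSE, RMSE) for two equal-length sequences."""
    errors = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    mae = float(np.mean(np.abs(errors)))
    mse = float(np.mean(errors ** 2))
    rmse = float(np.sqrt(mse))
    return mae, mse, rmse

y_true = [20, 22, 25, 19, 30]
y_pred = [18, 24, 23, 21, 29]          # predictions from the table
y_pred_outlier = [18, 24, 23, 21, 20]  # hypothetical: Day 5 is off by 10

for label, pred in [("original", y_pred), ("with outlier", y_pred_outlier)]:
    mae, mse, rmse = error_metrics(y_true, pred)
    print(f"{label}: MAE={mae:.2f}, MSE={mse:.2f}, RMSE={rmse:.2f}")

# original: MAE=1.80, MSE=3.40, RMSE=1.84
# with outlier: MAE=3.60, MSE=23.20, RMSE=4.82
```

A single large error doubles MAE but multiplies MSE by almost seven, which is the extra weight on large errors described above.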

**Conclusion:**
Mean Squared Error (MSE) is a widely used metric for evaluating the accuracy of regression models. It measures the average squared difference between predicted and actual values, penalizes large errors more heavily than small ones, and is expressed in squared units of the target. Considering MSE alongside other metrics, such as MAE and RMSE, gives practitioners a more complete view of model performance when selecting and evaluating models.

```python
import numpy as np
from sklearn.metrics import mean_squared_error

# Actual and predicted temperatures (°C) from the example table
y_true = np.array([20, 22, 25, 19, 30])
y_pred = np.array([18, 24, 23, 21, 29])

# Average of the squared differences over all five days
mse = mean_squared_error(y_true, y_pred)

print(f"Actual Values: {y_true}")
print(f"Predicted Values: {y_pred}")
print(f"Mean Squared Error (MSE): {mse:.2f}")

# Per-day squared errors, matching the manual calculation
squared_errors = (y_true - y_pred) ** 2

print("\nSquared Errors:")
for day, (actual, predicted, sq_err) in enumerate(zip(y_true, y_pred, squared_errors), start=1):
    print(f"Day {day}: ({actual} - {predicted})² = {sq_err}")
```

```
Actual Values: [20 22 25 19 30]
Predicted Values: [18 24 23 21 29]
Mean Squared Error (MSE): 3.40

Squared Errors:
Day 1: (20 - 18)² = 4
Day 2: (22 - 24)² = 4
Day 3: (25 - 23)² = 4
Day 4: (19 - 21)² = 4
Day 5: (30 - 29)² = 1
```
