## Mean Absolute Error (MAE) Loss

**Mean Absolute Error** (MAE), also known as **L1 Loss**, is a popular loss function used in <u>regression tasks</u>, where the goal is to predict continuous numerical values. It measures the average absolute difference between the predicted values and the actual values.

It is calculated by taking the absolute difference between each predicted value and its corresponding actual value, then averaging over all samples:

$$\mathrm{MAE} = \frac{1}{N}\sum_{i=1}^{N}\left| y_{i} - \hat{y}_{i} \right|$$

In this formula, $N$ represents the total number of samples, $y_{i}$ is the actual value of the $i$-th sample, and $\hat{y}_{i}$ is the corresponding predicted value.

The Mean Absolute Error loss <u>quantifies the overall average absolute difference between the predicted values and the true values</u>. By minimizing this loss, the model aims to reduce the discrepancy between its predictions and the actual values.
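
As a quick check of the formula, the sketch below computes MAE by hand for three made-up samples. The values and the helper name `mae_by_hand` are illustrative assumptions, not part of the dataset used later.

```python
def mae_by_hand(y_true, y_pred):
    """Average of |y_i - y_hat_i| over all samples."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    n = len(y_true)
    return sum(abs(y - y_hat) for y, y_hat in zip(y_true, y_pred)) / n


# Hypothetical values chosen so the arithmetic is easy to follow
y_true_example = [3.0, 5.0, 2.0]
y_pred_example = [2.0, 5.0, 4.0]

# Absolute errors are 1, 0 and 2, so MAE = (1 + 0 + 2) / 3 = 1.0
example_mae = mae_by_hand(y_true_example, y_pred_example)
print(example_mae)
```

The result is expressed in the same units as the target values.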

### Additional Insights on Mean Absolute Error (MAE):

1. **Application to Regression**: MAE is commonly used as <span style="font-size: 10pt; color: green; font-weight: normal">both an evaluation metric and a loss function in regression tasks</span>.

2. **Interpretability**: <span style="font-size: 10pt; color: green; font-weight: normal">MAE has a straightforward interpretation</span>. The value of MAE represents the average absolute difference between the predicted and actual values. For example, an MAE of 5 means, on average, the model's predictions deviate from the true values by 5 units.

3. **Units of Measurement**: <span style="font-size: 10pt; color: green; font-weight: normal">The units of MAE are the same as the units of the target variable</span>. This property makes it easier to interpret the magnitude of the error in the context of the problem domain.

4. **Robustness to Outliers**: <span style="font-size: 10pt; color: green; font-weight: normal">MAE is more robust to outliers</span> than Mean Squared Error (MSE). In MSE, squaring the differences amplifies the impact of large errors, whereas <span style="font-size: 10pt; color: green; font-weight: normal">MAE weights each error in proportion to its magnitude</span>, so a single extreme value contributes linearly rather than quadratically.  
Therefore, <u>MAE provides a more robust measure of error when dealing with data containing extreme values</u>.

5. **Behavior Under Scaling**: <span style="font-size: 10pt; color: green; font-weight: normal">MAE scales linearly with the data</span>. Multiplying the targets and predictions by a constant $c$ multiplies MAE by $|c|$, whereas MSE is multiplied by $c^2$. MAE is therefore not scale-invariant (it carries the units of the target variable), but it does not exaggerate large-magnitude errors the way MSE does. This makes MAE a reasonable choice when errors should count in proportion to their size.

6. **Gradient Behavior**: MAE is less smooth than MSE. Because it uses absolute rather than squared differences, <span style="font-size: 10pt; color: orange; font-weight: normal">its derivative is undefined at zero</span>, where it jumps from $-1$ to $+1$, and has constant magnitude everywhere else. This can make optimization more challenging for gradient-based methods such as gradient descent, because the gradient does not shrink as the error approaches zero. However, subgradients can be used to handle the non-differentiable point at zero.

7. **Computational Efficiency**: <span style="font-size: 10pt; color: green; font-weight: normal">MAE is cheap to compute</span>. It needs only a subtraction, an absolute value, and a mean per sample. MSE is comparably cheap, since squaring is a single multiplication, so the difference in cost between the two is usually negligible. <u>Both scale linearly with the number of samples, which keeps them practical for large datasets or complex models</u>.
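
The outlier behavior described in the list above can be checked numerically. The sketch below uses plain Python and made-up values (the names `mae`, `mse`, `y_true_demo`, `y_pred_clean` and `y_pred_outlier` are illustrative assumptions) to show how a single extreme prediction affects each metric.

```python
def mae(y_true, y_pred):
    """Mean of absolute errors."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mse(y_true, y_pred):
    """Mean of squared errors."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical data: every target is 10
y_true_demo = [10.0, 10.0, 10.0, 10.0, 10.0]
y_pred_clean = [9.0, 11.0, 10.0, 9.0, 11.0]    # absolute errors: 1, 1, 0, 1, 1
y_pred_outlier = [9.0, 11.0, 10.0, 9.0, 60.0]  # last prediction is off by 50

mae_clean, mae_outlier = mae(y_true_demo, y_pred_clean), mae(y_true_demo, y_pred_outlier)
mse_clean, mse_outlier = mse(y_true_demo, y_pred_clean), mse(y_true_demo, y_pred_outlier)

print("MAE:", mae_clean, "->", mae_outlier)
print("MSE:", mse_clean, "->", mse_outlier)
print("MAE grew by a factor of", round(mae_outlier / mae_clean, 2))
print("MSE grew by a factor of", round(mse_outlier / mse_clean, 2))
```

With these values the outlier adds 50 to the sum of absolute errors but 2500 to the sum of squared errors, so MSE grows far more than MAE.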

**Below we will use three different methods to compute Mean Absolute Error and compare the results**.

### Importing libraries and preparing data

In [1]:
import torch
import numpy as np
from sklearn.metrics import mean_absolute_error

In [2]:
# Simulated true values and predicted values

# For PyTorch
y_true_torch = torch.tensor([100, 100, 100, 100, 100, 100, 100, 100, 100, 100], dtype=torch.float32)
y_pred_torch = torch.tensor([80, 100, 90, 95, 105, 101, 110, 99, 87, 100], dtype=torch.float32)

# For scikit-learn and NumPy
y_true = [100, 100, 100, 100, 100, 100, 100, 100, 100, 100]
y_pred = [80, 100, 90, 95, 105, 101, 110, 99, 87, 100]

### Compute Mean Absolute Error Loss with PyTorch

In [3]:
# Create an instance of L1Loss (PyTorch's name for the MAE loss)
mae_loss = torch.nn.L1Loss()

# Use it to compute MAE; L1Loss expects (input, target), i.e. predictions first
torch_mae = mae_loss(y_pred_torch, y_true_torch).item()

# Round the result to one decimal place
torch_mae = round(torch_mae, 1)

print('Mean Absolute Error Loss:', torch_mae)

Mean Absolute Error Loss: 6.5
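
The gradient behavior mentioned earlier can also be inspected with autograd. The following is a minimal sketch with made-up tensors (`y_true_demo` and `y_pred_demo` are illustrative names, not the data defined above). It assumes the default `reduction='mean'` of `torch.nn.L1Loss`.

```python
import torch

# Hypothetical predictions around a target of 100; requires_grad lets autograd track them
y_true_demo = torch.tensor([100.0, 100.0, 100.0, 100.0])
y_pred_demo = torch.tensor([80.0, 99.0, 100.0, 110.0], requires_grad=True)

# Mean of |prediction - target| = (20 + 1 + 0 + 10) / 4
loss = torch.nn.L1Loss()(y_pred_demo, y_true_demo)
loss.backward()

# Each entry is sign(prediction - target) / N, so an error of 20 and an error
# of 1 produce gradients of the same size
print(y_pred_demo.grad)
```

For the sample whose error is exactly zero, the gradient entry is expected to be 0, which is a valid subgradient of the absolute value at that point.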


### Compute Mean Absolute Error Loss with scikit-learn

In [4]:
# Use mean_absolute_error function from sklearn to compute MAE
sklearn_mae = mean_absolute_error(y_true, y_pred)

print('Mean Absolute Error Loss:', sklearn_mae)

Mean Absolute Error Loss: 6.5


### Compute Mean Absolute Error Loss with NumPy

In [5]:
# Create a custom function to compute MAE
def mean_absolute_err(y_true, y_pred):
    y_true, y_pred = np.array(y_true), np.array(y_pred)
    return np.mean(np.abs(y_true - y_pred))

numpy_mae = mean_absolute_err(y_true, y_pred)

print('Mean Absolute Error Loss:', numpy_mae)

Mean Absolute Error Loss: 6.5


### Comparison of the MAE computation results between PyTorch, scikit-learn and NumPy

Let's compare the computed Mean Absolute Error values.

In [6]:
torch_mae == sklearn_mae == numpy_mae

True

All three methods of computing MAE produced the same result.
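
The exact `==` comparison works here because every result is exactly 6.5, a value that floating-point numbers represent without error. For less convenient values, results from different libraries may differ by tiny rounding errors, so comparing with a tolerance is usually safer. A minimal sketch using `math.isclose` from the standard library, with hypothetical stand-in values and an illustrative helper named `all_close`:

```python
import math

# Hypothetical results that differ only by floating-point noise
result_a = 0.1 + 0.2
result_b = 0.3
result_c = 0.3

# Exact equality fails even though the values agree for practical purposes
print(result_a == result_b)  # False


def all_close(values, rel_tol=1e-9, abs_tol=0.0):
    """Return True if every value is within tolerance of the first one."""
    first = values[0]
    return all(
        math.isclose(first, v, rel_tol=rel_tol, abs_tol=abs_tol)
        for v in values[1:]
    )


print(all_close([result_a, result_b, result_c]))  # True
```

The tolerances shown are the defaults of `math.isclose`; a looser `rel_tol` may be appropriate when one of the results comes from a `float32` computation.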