# 04.05 - Metrics for Regression

In regression analysis, several metrics let us compare the performance of different models, including an adjusted model against a baseline model. These metrics quantify the difference between predicted and actual values, giving us an idea of how well a model is performing. Throughout this section, the implementations use PyTorch, a machine learning library for Python.

One common metric is Mean Squared Error (MSE), the average of the squared differences between predicted and actual values. Squaring removes the sign of each error, so over- and under-predictions count alike, and it gives larger errors disproportionately more weight. In PyTorch, you can compute the MSE with the `torch.nn.MSELoss` module.

Another metric is Root Mean Squared Error (RMSE), which is the square root of the MSE. Taking the square root is useful because it brings the error metric back to the same units as the target variable, making it more interpretable. You can compute the RMSE in PyTorch by first calculating the MSE and then taking the square root using `torch.sqrt()`.

Mean Absolute Error (MAE) is another commonly used metric. It is the average absolute difference between predicted and actual values, which is useful when you want the typical size of the error in the target's own units and don't care whether it comes from over- or under-prediction. You can compute the MAE in PyTorch with the `torch.nn.L1Loss` module.

We also often use R-squared (R²) to compare an adjusted model to a baseline. R-squared quantifies the proportion of the variance in the target variable that is explained by the input variables; a higher R-squared indicates a better fit to the data it is computed on. PyTorch has no built-in R-squared function, but you can calculate it manually from the total sum of squares and the residual sum of squares.

Remember that no single metric tells the whole story. Consider these metrics together, and in the context of the specific problem and dataset you're working with.

## Train a Simple Linear Regression Model using PyTorch

In [1]:
# Importing necessary libraries
import pandas as pd
import numpy as np
import torch

# Path to the diamonds.csv file
diamonds_csv: str = './diamonds.csv'

# Reading the csv file into pandas DataFrame
diamonds: pd.DataFrame = pd.read_csv(diamonds_csv)

# Selecting the features for the model
df: pd.DataFrame = diamonds.loc[:, ['carat', 'depth', 'table', 'x', 'y', 'z']]

# Extracting the target variable (price) from the original dataset
price: np.ndarray = diamonds['price'].values
target: pd.DataFrame = pd.DataFrame(price)

# Determining the device to be used for computations (CUDA GPU, then Apple MPS, else CPU)
device = torch.device("cuda" if torch.cuda.is_available() else \
                      "mps" if torch.backends.mps.is_available() else \
                      "cpu")

# Converting the target and features to tensors and moving them to the appropriate device
y = torch.tensor(target.values).float().to(device)
y = y.view(y.shape[0], 1)
X = torch.tensor(df[['carat']].values).float().to(device)

# Defining the model (a simple linear regression)
model = torch.nn.Linear(in_features = 1, out_features = 1).to(device)

# Defining the loss function (Mean Squared Error) and the optimizer (SGD)
loss_function = torch.nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-4)

# Training the model for 5000 epochs
for epoch in range(5000):
    # Forward pass: compute predicted y by passing x to the model
    y_pred = model(X)

    # Compute and print loss
    loss = loss_function(y_pred, y)
    if epoch % 100 == 0:
        print(f'Epoch {epoch}, Loss {loss.item()}')

    # Backward pass, update the weights, then reset the gradients for the next epoch
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()

# Print final loss, weights and bias
print(f'\nFinal loss: {loss.item()}')
print(f'Beta coefficients (weights): {model.weight.item()}')
print(f'Bias: {model.bias.item()}')

# Getting the model's predictions and converting them back to a numpy array
predictions = model.cpu()(X.cpu()).detach().numpy()

Epoch 0, Loss 31386426.0
Epoch 100, Loss 29866386.0
Epoch 200, Loss 28445910.0
Epoch 300, Loss 27118346.0
Epoch 400, Loss 25877488.0
Epoch 500, Loss 24717536.0
Epoch 600, Loss 23633092.0
Epoch 700, Loss 22619116.0
Epoch 800, Loss 21670894.0
Epoch 900, Loss 20784046.0
Epoch 1000, Loss 19954470.0
Epoch 1100, Loss 19178342.0
Epoch 1200, Loss 18452098.0
Epoch 1300, Loss 17772408.0
Epoch 1400, Loss 17136170.0
Epoch 1500, Loss 16540483.0
Epoch 1600, Loss 15982638.0
Epoch 1700, Loss 15460123.0
Epoch 1800, Loss 14970578.0
Epoch 1900, Loss 14511807.0
Epoch 2000, Loss 14081762.0
Epoch 2100, Loss 13678531.0
Epoch 2200, Loss 13300329.0
Epoch 2300, Loss 12945487.0
Epoch 2400, Loss 12612458.0
Epoch 2500, Loss 12299786.0
Epoch 2600, Loss 12006125.0
Epoch 2700, Loss 11730203.0
Epoch 2800, Loss 11470850.0
Epoch 2900, Loss 11226967.0
Epoch 3000, Loss 10997527.0
Epoch 3100, Loss 10781570.0
Epoch 3200, Loss 10578207.0
Epoch 3300, Loss 10386606.0
Epoch 3400, Loss 10205983.0
Epoch 3500, Loss 10035620.0
Epoc
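The loss is still falling steadily when the printout is cut off, which suggests that plain SGD with `lr=1e-4` has not converged after 5000 epochs, so the metrics below describe a partially trained model rather than the best possible straight-line fit. One way to gauge the gap is to compare against the closed-form least-squares solution, which minimizes the training MSE exactly. The sketch below does this with `torch.linalg.lstsq` on synthetic single-feature data (an assumption for illustration; it does not read `diamonds.csv`, and the slope and intercept used to generate it are made up).

```python
import torch

torch.manual_seed(0)

# Hypothetical single-feature data standing in for carat -> price
n = 1000
x = torch.rand(n, 1) * 3.0
y = 7000.0 * x - 2000.0 + 500.0 * torch.randn(n, 1)

# Closed-form least squares: a column of ones provides the intercept
A = torch.cat([torch.ones(n, 1), x], dim=1)
coef = torch.linalg.lstsq(A, y).solution  # shape (2, 1): [[bias], [weight]]

# The same model trained with full-batch SGD, mirroring the cell above
model = torch.nn.Linear(in_features=1, out_features=1)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-4)
loss_function = torch.nn.MSELoss()
for epoch in range(5000):
    loss = loss_function(model(x), y)
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()

# Compare the training MSE reached by each approach
with torch.no_grad():
    mse_lstsq = loss_function(A @ coef, y).item()
    mse_sgd = loss_function(model(x), y).item()

print(f"least squares: weight={coef[1].item():.1f}, bias={coef[0].item():.1f}, MSE={mse_lstsq:.1f}")
print(f"SGD:           weight={model.weight.item():.1f}, bias={model.bias.item():.1f}, MSE={mse_sgd:.1f}")
```

Since least squares is the exact minimizer of training MSE, SGD can at best match it; a large gap between the two MSE values is a sign that more epochs, a tuned learning rate, or feature scaling is needed.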

## Mean Absolute Error (MAE)

Mean Absolute Error (MAE) is another metric used in regression analysis to evaluate the performance of a model. It is calculated as the average absolute difference between the actual target values and the predicted values from our model. The formula for MAE is:

```
MAE = (1/n) * Σ|actual - prediction|

```

Where:

- `n` is the total number of data points
- `Σ` represents the sum
- `actual` is the actual target value
- `prediction` is the predicted value from our model

In PyTorch, we can calculate the MAE with `torch.nn.L1Loss`, a loss module that, with its default `reduction='mean'`, computes the mean absolute error between the predicted and actual values.
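As a hand-checkable sketch (the four values below are made up for illustration), the MAE formula translated directly into tensor operations matches `L1Loss` with its default reduction, while `reduction='none'` returns the per-sample absolute errors instead of their mean.

```python
import torch

# Small made-up example that is easy to verify by hand
actual = torch.tensor([3.0, -0.5, 2.0, 7.0])
prediction = torch.tensor([2.5, 0.0, 2.0, 8.0])

# Direct translation of MAE = (1/n) * Σ|actual - prediction|
mae_manual = torch.mean(torch.abs(actual - prediction))

# L1Loss with its default reduction='mean' gives the same value
mae_l1 = torch.nn.L1Loss()(prediction, actual)

# reduction='none' keeps the individual absolute errors
per_sample = torch.nn.L1Loss(reduction='none')(prediction, actual)

print(mae_manual.item(), mae_l1.item())  # 0.5 0.5
print(per_sample)
```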

After you've trained your linear regression model and generated predictions, you can calculate the MAE as follows:

In [2]:
# Defining the loss function (Mean Absolute Error)
loss_function_mae = torch.nn.L1Loss()

# Compute the Mean Absolute Error
mae = loss_function_mae(torch.tensor(predictions), y.cpu())
print(f'Mean Absolute Error: {mae.item()}')


Mean Absolute Error: 2012.4505615234375


MAE provides a linear score: each error contributes in proportion to its size, and all errors are weighted equally in the average. This makes MAE less sensitive to outliers than Mean Squared Error, but it also means MAE does not penalize large errors as heavily. If large errors are particularly undesirable in your use case, a squared-error metric such as MSE or RMSE may be a better choice.
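To make the trade-off concrete, the toy sketch below (made-up numbers) starts with five predictions that are each off by 1 and then changes one of them to be off by 10. MAE rises from 1.0 to 2.8, while MSE jumps from 1.0 to 20.8, because the outlier's contribution is squared.

```python
import torch

actual = torch.tensor([10.0, 12.0, 11.0, 13.0, 12.0])
pred_clean = torch.tensor([11.0, 11.0, 12.0, 12.0, 13.0])  # every error has size 1
pred_outlier = pred_clean.clone()
pred_outlier[-1] = 22.0  # one error of size 10 instead of 1

mae_fn = torch.nn.L1Loss()
mse_fn = torch.nn.MSELoss()

# Store (MAE, MSE) for each set of predictions
results = {}
for name, pred in [("clean", pred_clean), ("one outlier", pred_outlier)]:
    results[name] = (mae_fn(pred, actual).item(), mse_fn(pred, actual).item())
    print(name, results[name])
```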

## Residual Sum of Squares (RSS)

Residual Sum of Squares (RSS) is another important metric used in regression analysis. It is the sum of the squares of the residuals, which are the differences between the actual and predicted values. The formula for RSS is:

```
RSS = Σ(actual - prediction)^2

```

Where:

- `Σ` represents the sum
- `actual` is the actual target value
- `prediction` is the predicted value from our model

Although PyTorch has no dedicated RSS function, RSS is easy to compute with basic tensor operations:

In [3]:
# Compute the Residual Sum of Squares
residuals = y.cpu() - torch.tensor(predictions)
rss = torch.sum(residuals**2)
print(f'Residual Sum of Squares: {rss.item()}')

Residual Sum of Squares: 449957199872.0
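As a cross-check, `torch.nn.MSELoss` accepts `reduction='sum'`, which sums the squared errors instead of averaging them and therefore returns the RSS directly; equivalently, RSS equals the MSE multiplied by the number of observations. The sketch below confirms this on synthetic data (the targets and predictions are hypothetical).

```python
import torch

torch.manual_seed(0)
actual = torch.randn(100, 1) * 1000 + 4000        # hypothetical targets
prediction = actual + torch.randn(100, 1) * 300   # hypothetical predictions

# RSS from basic operations, as in the cell above
rss_manual = torch.sum((actual - prediction) ** 2)

# The same quantity from MSELoss by summing instead of averaging
rss_builtin = torch.nn.MSELoss(reduction='sum')(prediction, actual)

# RSS is the MSE times the number of observations
mse = torch.nn.MSELoss()(prediction, actual)
n = actual.shape[0]

print(rss_manual.item(), rss_builtin.item(), (n * mse).item())
```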


A limitation of RSS is that squaring penalizes large errors heavily: even if your model performs well on most predictions, a few large errors can inflate the RSS and suggest a worse fit than is actually the case. RSS is also scale-dependent, since it is expressed in the squared units of the target variable, and because it is a sum rather than an average, it grows with the number of observations. Consequently, comparing RSS across datasets of different sizes, or across different target variables within the same dataset, can be misleading.
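A quick sketch of both dependencies, on synthetic data assumed for illustration: expressing the same target in cents instead of dollars multiplies every residual by 100 and therefore the RSS by 100² = 10,000, and duplicating every observation doubles the RSS even though the quality of the fit is unchanged.

```python
import torch

torch.manual_seed(0)
actual = torch.rand(200) * 10000              # hypothetical prices in dollars
prediction = actual + torch.randn(200) * 500  # hypothetical predictions

def rss(a, p):
    # Residual sum of squares as a Python float
    return torch.sum((a - p) ** 2).item()

base = rss(actual, prediction)
# Same data in cents: every residual is multiplied by 100
in_cents = rss(actual * 100, prediction * 100)
# Same data duplicated: identical fit, twice as many observations
doubled = rss(torch.cat([actual, actual]), torch.cat([prediction, prediction]))

print(in_cents / base)  # approximately 10000
print(doubled / base)   # approximately 2
```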

## Mean Squared Error (MSE)

Mean Squared Error (MSE) is a frequently used regression metric that assesses the quality of a model by calculating the average of squared differences between actual and predicted values. The formula for MSE is:

```
MSE = (1/n) * Σ(actual - prediction)^2

```

Where:

- `n` is the total number of data points
- `Σ` represents the sum
- `actual` is the actual target value
- `prediction` is the predicted value from our model

The squaring ensures that each term is non-negative and emphasizes larger errors over smaller ones, which is valuable when large errors are particularly undesirable.

In PyTorch, you can calculate the MSE with the `torch.nn.MSELoss` module. After training your linear regression model and making predictions, you can compute the MSE as follows:

In [4]:
# Defining the loss function (Mean Squared Error)
loss_function_mse = torch.nn.MSELoss()

# Compute the Mean Squared Error
mse = loss_function_mse(torch.tensor(predictions), y.cpu())
print(f'Mean Squared Error: {mse.item()}')

Mean Squared Error: 8341809.5


While MSE is a useful metric, it has limitations. Because errors are squared before they are averaged, MSE places a heavier penalty on larger errors, so even a single outlier can inflate it and suggest a worse fit than is actually the case. MSE is also expressed in the squared units of the target variable, which can make it hard to interpret intuitively. Finally, like RSS, MSE is scale-dependent, so comparing it across datasets or across different target variables within the same dataset can be misleading.

## Root Mean Squared Error (RMSE)

Root Mean Squared Error (RMSE) is a standard way to measure the error of a model in predicting quantitative data. Formally, it is the square root of the average of squared differences between prediction and actual observation. The formula for RMSE is:

```
RMSE = sqrt[(1/n) * Σ(actual - prediction)^2]

```

Where:

- `n` is the total number of data points
- `Σ` represents the sum
- `actual` is the actual target value
- `prediction` is the predicted value from our model

The square root operation is useful because it brings the error metric back to the same units as the target variable, making it more interpretable than the Mean Squared Error (MSE).
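The sketch below illustrates the units argument on synthetic data (assumed for illustration): converting the target and predictions from dollars to cents scales the MSE by 10,000, since it lives in squared units, but scales the RMSE by only 100, the same factor as the target itself.

```python
import torch

torch.manual_seed(0)
actual = torch.rand(500) * 10000              # hypothetical prices in dollars
prediction = actual + torch.randn(500) * 800  # hypothetical predictions

mse_fn = torch.nn.MSELoss()

def rmse(a, p):
    # RMSE as the square root of PyTorch's mean squared error
    return torch.sqrt(mse_fn(p, a)).item()

mse_dollars = mse_fn(prediction, actual).item()
mse_cents = mse_fn(prediction * 100, actual * 100).item()
rmse_dollars = rmse(actual, prediction)
rmse_cents = rmse(actual * 100, prediction * 100)

print(mse_cents / mse_dollars)    # approximately 10000 (squared units)
print(rmse_cents / rmse_dollars)  # approximately 100 (target's units)
```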

Let's compute RMSE in PyTorch using the predictions from our linear regression model:

In [5]:
# Compute the Root Mean Squared Error
rmse = torch.sqrt(torch.mean((y.cpu() - torch.tensor(predictions))**2))
print(f'Root Mean Squared Error: {rmse.item()}')

Root Mean Squared Error: 2888.218994140625


RMSE shares the limitations of MSE. Because errors are squared before they are averaged (the square root is taken only at the end), larger errors carry extra weight, so even a single outlier can inflate the RMSE and suggest a worse fit than is actually the case. RMSE is also scale-dependent, so comparing it across datasets or across different target variables within the same dataset can be misleading.
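A consequence of the squaring is that RMSE is always at least as large as MAE, with equality only when every absolute error has the same size; the gap between them therefore hints at how unevenly the errors are distributed (in this notebook, the single-feature model has MAE ≈ 2012 and RMSE ≈ 2888). The toy sketch below (made-up numbers) keeps MAE fixed at 2 and moves all of the error into one prediction.

```python
import torch

mae_fn = torch.nn.L1Loss()
mse_fn = torch.nn.MSELoss()

def mae_rmse(pred, actual):
    # Return (MAE, RMSE) as Python floats
    return mae_fn(pred, actual).item(), torch.sqrt(mse_fn(pred, actual)).item()

actual = torch.zeros(4)
# All absolute errors equal: RMSE equals MAE
equal_errors = torch.tensor([2.0, -2.0, 2.0, -2.0])
# Same MAE, but concentrated in one large error: RMSE exceeds MAE
one_big_error = torch.tensor([0.0, 0.0, 0.0, 8.0])

print(mae_rmse(equal_errors, actual))   # (2.0, 2.0)
print(mae_rmse(one_big_error, actual))  # (2.0, 4.0)
```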

## Coefficient of Determination (R-Squared)

The Coefficient of Determination, also known as R-squared (R²), is a statistical measure that represents the proportion of the variance for a dependent variable that's explained by an independent variable or variables in a regression model. It provides a measure of how well observed outcomes are replicated by the model, based on the proportion of total variation of outcomes explained by the model. The formula for R-squared is:

```
R² = 1 - (RSS/TSS)

```

Where:

- `RSS` is the residual sum of squares
- `TSS` is the total sum of squares, Σ(actual - mean(actual))^2, i.e. the RSS of a baseline that always predicts the mean

In PyTorch, while there isn't a built-in function to compute R-squared, you can calculate it manually as follows:

In [6]:
# Compute the total sum of squares
tss = torch.sum((y.cpu() - torch.mean(y.cpu()))**2)

# Compute the residual sum of squares
residuals = y.cpu() - torch.tensor(predictions)
rss = torch.sum(residuals**2)

# Compute R-squared
r_squared = 1 - (rss / tss)
print(f'R-Squared: {r_squared.item()}')

R-Squared: 0.47586339712142944


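For a least-squares fit with an intercept, evaluated on the data it was fitted to, R-squared lies between 0 and 1. An R-squared of 1 (100 percent) means the model explains all of the variability of the target around its mean, while an R-squared of 0 means it explains none of it and does no better than always predicting the mean. When R-squared is computed as 1 - RSS/TSS for other predictions, such as those of a model that has not fully converged or predictions on held-out data, it can be negative, meaning the predictions are worse than the mean.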
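The three toy cases below (made-up numbers) show this range in practice, using the same 1 - RSS/TSS computation as the cell above: perfect predictions give 1, always predicting the mean gives 0, and predictions that run opposite to the data give a negative value.

```python
import torch

def r_squared(actual, prediction):
    # R² = 1 - RSS/TSS, computed as in the cell above
    rss = torch.sum((actual - prediction) ** 2)
    tss = torch.sum((actual - torch.mean(actual)) ** 2)
    return (1 - rss / tss).item()

actual = torch.tensor([1.0, 2.0, 3.0, 4.0, 5.0])

perfect = actual.clone()
mean_only = torch.full_like(actual, actual.mean().item())
reversed_pred = torch.flip(actual, dims=[0])

print(r_squared(actual, perfect))        # 1.0
print(r_squared(actual, mean_only))      # 0.0
print(r_squared(actual, reversed_pred))  # -3.0
```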

However, R-squared has limitations. A high R-squared does not necessarily mean that a regression model is adequate, and a low R-squared does not necessarily mean that it is poor. R-squared also provides no formal hypothesis test of the model's adequacy, and it cannot reveal whether the coefficient estimates and predictions are biased.

## Adjusted R-Squared

Adjusted R-squared modifies R-squared to account for the number of predictors in a model. Adding a predictor to a least-squares model never lowers R-squared, but adjusted R-squared increases only if the new term improves the model more than would be expected by chance, and decreases otherwise. Adjusted R-squared can be negative, although it usually is not, and it never exceeds R-squared.

The formula for adjusted R-squared is:

```
Adjusted R² = 1 - [(1-R²)*(n-1)/(n-p-1)]

```

Where:

- `n` is the total number of data points
- `p` is the number of predictors

The penalty works through the degrees of freedom: the factor (n-1)/(n-p-1) grows as p increases, so each additional predictor must reduce the residual sum of squares enough to offset it.

It is particularly useful when comparing different regression models that predict the same outcome variable: a model with more predictors will typically have a higher R-squared simply because it has more predictors, and adjusted R-squared puts these models on a more even footing.
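The sketch below illustrates the idea on synthetic data (assumed for illustration): the target depends on one real predictor, and the model is then refitted, in closed form with `torch.linalg.lstsq`, after appending ten columns of pure noise. R-squared cannot go down when columns are added to a least-squares fit, whereas adjusted R-squared applies a penalty that grows with p; whether it actually falls in a given run depends on the random draw.

```python
import torch

torch.manual_seed(0)
n = 50
# Hypothetical data: y depends on one real predictor plus noise
x = torch.randn(n, 1, dtype=torch.float64)
y = 3.0 * x + torch.randn(n, 1, dtype=torch.float64)

def fit_r_squared(features, target):
    # Ordinary least squares with an intercept column, solved in closed form
    ones = torch.ones(features.shape[0], 1, dtype=features.dtype)
    A = torch.cat([ones, features], dim=1)
    coef = torch.linalg.lstsq(A, target).solution
    rss = torch.sum((target - A @ coef) ** 2)
    tss = torch.sum((target - target.mean()) ** 2)
    return (1 - rss / tss).item()

def adjusted_r_squared(r2, n, p):
    # Adjusted R² = 1 - (1 - R²) * (n - 1) / (n - p - 1)
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

noise = torch.randn(n, 10, dtype=torch.float64)  # 10 columns unrelated to y
results = {}
for name, features in [("real predictor only", x),
                       ("plus 10 noise columns", torch.cat([x, noise], dim=1))]:
    p = features.shape[1]
    r2 = fit_r_squared(features, y)
    results[name] = (r2, adjusted_r_squared(r2, n, p))
    print(f"{name}: R² = {r2:.4f}, adjusted R² = {results[name][1]:.4f}")
```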

In PyTorch, while there isn't a built-in function to compute adjusted R-squared, you can calculate it manually as follows. Let's say `p` is the number of predictors:

In [7]:
# Compute the total sum of squares
tss = torch.sum((y.cpu() - torch.mean(y.cpu()))**2)

# Compute the residual sum of squares
residuals = y.cpu() - torch.tensor(predictions)
rss = torch.sum(residuals**2)

# Compute R-squared
r_squared = 1 - (rss / tss)

# Number of observations
n = y.shape[0]

# Number of predictors
p = X.shape[1]

# Compute Adjusted R-squared
adjusted_r_squared = 1 - (1 - r_squared) * ((n - 1) / (n - p - 1))
print(f'Adjusted R-Squared: {adjusted_r_squared.item()}')

Adjusted R-Squared: 0.4758536219596863


By penalizing additional predictors, adjusted R-squared helps guard against overfitting, where a model with too many parameters performs well on the training data but poorly on unseen data. It is still computed on the training data, however, so it complements rather than replaces evaluation on held-out data when selecting a model.

## Train an Adjusted Model (Multiple Linear Regression)

In [8]:
# Importing necessary libraries
import pandas as pd
import numpy as np
import torch

# Define the path to the diamonds.csv file
diamonds_csv: str = './diamonds.csv'

# Read the csv file into a pandas DataFrame
diamonds: pd.DataFrame = pd.read_csv(diamonds_csv)

# Select multiple features for the model
df: pd.DataFrame = diamonds.loc[:,['carat', 'depth', 'table', 'x', 'y', 'z']]

# Extract the target variable (price) from the original dataset
price: np.ndarray = diamonds['price'].values
target: pd.DataFrame = pd.DataFrame(price)

# Determine the device to be used for computations (CUDA GPU, then Apple MPS, else CPU)
device = torch.device("cuda" if torch.cuda.is_available() else \
                      "mps" if torch.backends.mps.is_available() else \
                      "cpu")

# Convert the target and features to tensors and move them to the appropriate device
y = torch.tensor(target.values).float().to(device)
y = y.view(y.shape[0], 1)
X = torch.tensor(df.values).float().to(device) # Here we are using all the features ['carat', 'depth', 'table', 'x', 'y', 'z']

# Define the model (a multiple linear regression)
model = torch.nn.Linear(in_features = 6, out_features = 1).to(device) # We have 6 features, so in_features is set to 6

# Define the loss function (Mean Squared Error) and the optimizer (SGD)
loss_function = torch.nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-4)

# Train the model for 5000 epochs
for epoch in range(5000):
    # Forward pass: compute predicted y by passing x to the model
    y_pred = model(X)

    # Compute and print loss
    loss = loss_function(y_pred, y)
    if epoch % 100 == 0:
        print(f'Epoch {epoch}, Loss {loss.item()}')

    # Backward pass, update the weights, then reset the gradients for the next epoch
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()

# Print final loss, weights and bias
print(f'\nFinal loss: {loss.item()}')
print(f'Beta coefficients (weights): {model.weight.detach().cpu().numpy()}')
print(f'Bias: {model.bias.item()}')

# Getting the model's predictions and converting them back to a numpy array
predictions = model.cpu()(X.cpu()).detach().numpy()

Epoch 0, Loss 31678758.0
Epoch 100, Loss 14326399.0
Epoch 200, Loss 13036275.0
Epoch 300, Loss 11901428.0
Epoch 400, Loss 10902685.0
Epoch 500, Loss 10023317.0
Epoch 600, Loss 9248715.0
Epoch 700, Loss 8566113.0
Epoch 800, Loss 7964342.5
Epoch 900, Loss 7433629.5
Epoch 1000, Loss 6965412.5
Epoch 1100, Loss 6552189.5
Epoch 1200, Loss 6187382.5
Epoch 1300, Loss 5865213.5
Epoch 1400, Loss 5580614.0
Epoch 1500, Loss 5329129.0
Epoch 1600, Loss 5106840.0
Epoch 1700, Loss 4910308.0
Epoch 1800, Loss 4736500.5
Epoch 1900, Loss 4582750.0
Epoch 2000, Loss 4446706.5
Epoch 2100, Loss 4326301.0
Epoch 2200, Loss 4219711.0
Epoch 2300, Loss 4125325.0
Epoch 2400, Loss 4041726.5
Epoch 2500, Loss 3967662.5
Epoch 2600, Loss 3902029.5
Epoch 2700, Loss 3843851.25
Epoch 2800, Loss 3792267.5
Epoch 2900, Loss 3746516.75
Epoch 3000, Loss 3705926.75
Epoch 3100, Loss 3669904.25
Epoch 3200, Loss 3637924.0
Epoch 3300, Loss 3609521.0
Epoch 3400, Loss 3584286.0
Epoch 3500, Loss 3561854.5
Epoch 3600, Loss 3541907.25
Ep

## Compute Metrics Function

In [9]:
# Define a function to compute all the metrics
def compute_metrics(y_actual, y_pred, n, p):

    # Convert the actual and predicted values to tensors
    y_actual = torch.tensor(y_actual).float()
    y_pred = torch.tensor(y_pred).float()

    # Define the loss functions
    loss_function_mae = torch.nn.L1Loss()
    loss_function_mse = torch.nn.MSELoss()

    # Compute Mean Absolute Error
    mae = loss_function_mae(y_pred, y_actual)
    print(f'Mean Absolute Error: {mae.item()}')

    # Compute Mean Squared Error
    mse = loss_function_mse(y_pred, y_actual)
    print(f'Mean Squared Error: {mse.item()}')

    # Compute Root Mean Squared Error
    rmse = torch.sqrt(mse)
    print(f'Root Mean Squared Error: {rmse.item()}')

    # Compute the Residual Sum of Squares
    residuals = y_actual - y_pred
    rss = torch.sum(residuals**2)
    print(f'Residual Sum of Squares: {rss.item()}')

    # Compute the total sum of squares
    tss = torch.sum((y_actual - torch.mean(y_actual))**2)

    # Compute R-squared
    r_squared = 1 - (rss / tss)
    print(f'R-Squared: {r_squared.item()}')

    # Compute Adjusted R-squared
    adjusted_r_squared = 1 - (1 - r_squared) * ((n - 1) / (n - p - 1))
    print(f'Adjusted R-Squared: {adjusted_r_squared.item()}')

In [10]:
# Call the function with the actual and predicted values, number of observations (n), and number of predictors (p)
compute_metrics(y_actual = target.values,
                y_pred = predictions,
                n = y.shape[0],
                p = X.shape[1])

Mean Absolute Error: 1292.947021484375
Mean Squared Error: 3410216.25
Root Mean Squared Error: 1846.6771240234375
Residual Sum of Squares: 183947067392.0
R-Squared: 0.7857276201248169
Adjusted R-Squared: 0.7857037782669067
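`compute_metrics` prints its results, so comparing the baseline model (adjusted R-squared ≈ 0.476) with the adjusted model (≈ 0.786) means reading two separate outputs. A variant that returns the values instead makes the comparison programmatic. The function name `regression_metrics` below is hypothetical, not part of the notebook, and the four-point example uses made-up predictions, chosen tiny so every number can be checked by hand.

```python
import torch

def regression_metrics(y_actual, y_pred, p):
    """Hypothetical helper: return MAE, MSE, RMSE, RSS, R² and adjusted R² as floats."""
    # float64 reduces the rounding seen in large float32 sums such as the RSS
    y_actual = torch.as_tensor(y_actual, dtype=torch.float64).reshape(-1)
    y_pred = torch.as_tensor(y_pred, dtype=torch.float64).reshape(-1)
    n = y_actual.shape[0]
    residuals = y_actual - y_pred
    rss = torch.sum(residuals ** 2)
    tss = torch.sum((y_actual - y_actual.mean()) ** 2)
    r2 = 1 - rss / tss
    mse = rss / n
    return {
        "mae": torch.mean(torch.abs(residuals)).item(),
        "mse": mse.item(),
        "rmse": torch.sqrt(mse).item(),
        "rss": rss.item(),
        "r2": r2.item(),
        "adj_r2": (1 - (1 - r2) * (n - 1) / (n - p - 1)).item(),
    }

# Made-up predictions from two models of the same target
actual = [3.0, 5.0, 7.0, 9.0]
baseline_metrics = regression_metrics(actual, [4.0, 4.0, 8.0, 8.0], p=1)
adjusted_metrics = regression_metrics(actual, [3.0, 5.0, 7.0, 10.0], p=2)
for key in baseline_metrics:
    print(f"{key:>7}: baseline={baseline_metrics[key]:.3f}  adjusted={adjusted_metrics[key]:.3f}")
```

Returning a dictionary rather than printing lets you tabulate, plot, or assert on the comparison between models.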
