## Evaluation metrics

Linear regression models are evaluated with metrics that quantify how far the predictions fall from the actual values. The most common ones are:

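Mean Squared Error (MSE): MSE is one of the most widely used metrics for regression problems. It is the average squared difference between the predicted and actual values. Because the errors are squared, large errors are penalized more heavily than small ones, and the result is in squared units of the target. A lower MSE indicates better model performance.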

Root Mean Squared Error (RMSE): RMSE is the square root of the MSE. Taking the root brings the metric back to the same units as the target, which makes it easier to interpret than MSE. Like MSE, a lower RMSE indicates better model performance.

Mean Absolute Error (MAE): MAE is the average absolute difference between the predicted and actual values. It is in the same units as the target and, because errors are not squared, it is less sensitive to outliers than MSE or RMSE. A lower MAE indicates better model performance.

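R-squared (R^2) Score: R-squared is the proportion of the variance in the dependent variable (target) that is explained by the independent variables (features). For an ordinary least squares model with an intercept, evaluated on its training data, it lies between 0 and 1, and a higher value indicates a better fit. On other data, or for a model that predicts worse than simply using the mean of the target, it can be negative. R-squared also never decreases when predictors are added, so on its own it does not account for the complexity of the model.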

Adjusted R-squared: Adjusted R-squared corrects R-squared for the number of predictors in the model relative to the number of samples. It penalizes predictors that do not improve the fit enough to justify their inclusion, which helps flag overfitting.

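Mean Percentage Error (MPE): MPE is the average signed percentage difference between the predicted and actual values. Because positive and negative errors can cancel out, it mainly indicates the direction of the error (systematic over- or under-prediction) rather than its size. It is undefined when an actual value is zero.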

Mean Absolute Percentage Error (MAPE): MAPE is the average of the absolute percentage differences between the predicted and actual values, so errors of opposite sign do not cancel. It is commonly used for evaluating forecasting models. Like MPE, it is undefined when an actual value is zero.

Residual Analysis: In addition to the above metrics, analyzing the residuals (the differences between predicted and actual values) is essential to understand the model's performance. Plotting the residuals as scatter plots, histograms, or Q-Q plots can reveal systematic patterns, heteroscedasticity (non-constant error variance), or departures from normality.
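As a plot-free starting point for the residual analysis described above, the following sketch computes the residuals for a small sample and prints a crude text histogram. The sample lists and the sign convention (actual minus predicted) are assumptions made for illustration; a real analysis would typically use proper plots.

```python
from collections import Counter
from statistics import mean

# Illustrative sample data (assumed for this sketch)
y_true = [1, 2, 3, 1, 2, 3, 1, 2, 3]
y_pred = [2, 1, 3, 1, 2, 3, 3, 1, 2]

# Residual = actual - predicted (sign convention assumed here)
residuals = [yt - yp for yt, yp in zip(y_true, y_pred)]

# A mean residual far from zero suggests systematic over- or under-prediction
mean_residual = mean(residuals)
print("mean residual:", mean_residual)

# Crude text histogram: one '#' per occurrence of each residual value
counts = Counter(residuals)
for value in sorted(counts):
    print(f"{value:>3}: {'#' * counts[value]}")
```

A roughly symmetric histogram centered on zero is what one hopes to see; a skewed or shifted one is a reason to look closer.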

Choose evaluation metrics that match the problem and its requirements. For example, MAE is less affected by outliers than MSE or RMSE, which weight large errors more heavily, while percentage metrics such as MAPE suit targets where relative error matters and the actual values are never zero.

## Code 

In [4]:
y_true = [1, 2, 3, 1, 2, 3, 1, 2, 3]
y_pred = [2, 1, 3, 1, 2, 3, 3, 1, 2]

In [2]:
import numpy as np
import pandas as pd
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

### Mean Absolute Error from scratch

In [20]:
def mean_absolute_error_scratch(y_true, y_pred):
    """
    Calculate the mean absolute error (MAE).
    Named differently from sklearn's mean_absolute_error so the import is not shadowed.
    :param y_true: list of real numbers, true values
    :param y_pred: list of real numbers, predicted values
    :return: mean absolute error
    """
    # running sum of absolute errors
    error = 0

    for yt, yp in zip(y_true, y_pred):
        error += np.abs(yt - yp)

    return error / len(y_true)

In [27]:
mean_absolute_error_scratch(y_true, y_pred)

0.6666666666666666

### Mean Absolute Error Using Sklearn

In [5]:
mean_absolute_error(y_true,y_pred)

0.6666666666666666
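The percentage-based metrics described earlier (MPE and MAPE) can be sketched in the same from-scratch style. The function names `mpe` and `mape`, the sign convention (actual minus predicted, divided by actual), and the choice to return values in percent are assumptions made for this illustration. Both functions assume no actual value is zero.

```python
def mpe(y_true, y_pred):
    """
    Mean percentage error, in percent (hypothetical helper).
    Signed errors can cancel, so this mostly shows the direction of the bias.
    Assumes no value in y_true is zero.
    """
    total = 0.0
    for yt, yp in zip(y_true, y_pred):
        total += (yt - yp) / yt
    return 100 * total / len(y_true)


def mape(y_true, y_pred):
    """
    Mean absolute percentage error, in percent (hypothetical helper).
    Assumes no value in y_true is zero.
    """
    total = 0.0
    for yt, yp in zip(y_true, y_pred):
        total += abs((yt - yp) / yt)
    return 100 * total / len(y_true)


# Same sample data as above, repeated so the block runs on its own
y_true = [1, 2, 3, 1, 2, 3, 1, 2, 3]
y_pred = [2, 1, 3, 1, 2, 3, 3, 1, 2]

print(mpe(y_true, y_pred))   # negative: predictions are too high on average
print(mape(y_true, y_pred))
```

With this sign convention, a negative MPE means the model over-predicts on average, while MAPE reports the typical relative size of the error regardless of direction.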

### Mean Squared Error from scratch

In [22]:
def mean_squared_error_scratch(y_true, y_pred):
    """
    Calculate the mean squared error (MSE).
    Named differently from sklearn's mean_squared_error so the import is not shadowed.
    :param y_true: list of real numbers, true values
    :param y_pred: list of real numbers, predicted values
    :return: mean squared error
    """
    # running sum of squared errors
    error = 0

    for yt, yp in zip(y_true, y_pred):
        error += (yt - yp) ** 2

    return error / len(y_true)

In [23]:
mean_squared_error_scratch(y_true, y_pred)

0.8888888888888888
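Since RMSE is defined as the square root of MSE, it can be sketched with one extra step. The function name `rmse` is an assumption for this illustration, and the sample data is repeated so the block runs on its own.

```python
import math


def rmse(y_true, y_pred):
    """
    Root mean squared error (hypothetical helper).
    :param y_true: list of real numbers, true values
    :param y_pred: list of real numbers, predicted values
    :return: square root of the mean squared error
    """
    squared_error = 0
    for yt, yp in zip(y_true, y_pred):
        squared_error += (yt - yp) ** 2
    # divide by the number of samples to get MSE, then take the root
    return math.sqrt(squared_error / len(y_true))


y_true = [1, 2, 3, 1, 2, 3, 1, 2, 3]
y_pred = [2, 1, 3, 1, 2, 3, 3, 1, 2]

print(rmse(y_true, y_pred))
```

The result is in the same units as the target, unlike the MSE computed above.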

### R-squared from scratch

In [24]:
def r2(y_true, y_pred):
    """
    This function calculates r-squared score
    :param y_true: list of real numbers, true values
    :param y_pred: list of real numbers, predicted values
    :return: r2 score
    """

    # calculate the mean value of true values
    mean_true_value = np.mean(y_true)

    # initialize numerator with 0
    numerator = 0
    # initialize denominator with 0
    denominator = 0

    # loop over all true and predicted values
    for yt, yp in zip(y_true, y_pred):
        # update numerator (residual sum of squares)
        numerator += (yt - yp) ** 2
        # update denominator (total sum of squares)
        denominator += (yt - mean_true_value) ** 2

    # calculate the ratio once, after both sums are complete
    ratio = numerator / denominator
    # return 1 - ratio
    return 1 - ratio

In [26]:
r2(y_true, y_pred)

-0.33333333333333326
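The adjusted R-squared described earlier can be derived from an R-squared value with the usual correction 1 - (1 - R^2)(n - 1) / (n - p - 1), where n is the number of samples and p the number of predictors. The sketch below is a minimal illustration; the function name `adjusted_r2` is hypothetical, and `n_features=1` in the usage line is an assumed value, since the sample data here has no feature matrix.

```python
def adjusted_r2(r2_value, n_samples, n_features):
    """
    Adjusted R-squared from a plain R-squared value (hypothetical helper).
    :param r2_value: R-squared score of the model
    :param n_samples: number of observations (n)
    :param n_features: number of predictors (p)
    :return: adjusted R-squared
    """
    degrees_of_freedom = n_samples - n_features - 1
    if degrees_of_freedom <= 0:
        raise ValueError("need n_samples > n_features + 1")
    return 1 - (1 - r2_value) * (n_samples - 1) / degrees_of_freedom


# R-squared of -1/3 matches the example above; n_features=1 is assumed
print(adjusted_r2(-1 / 3, n_samples=9, n_features=1))
```

For a fixed R-squared, the adjusted value drops as `n_features` grows, which is the penalty for adding predictors.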