# <FONT COLOR = "red">***HANDMADE METRICS CODE***</FONT>
---
---

The main objective of this notebook is to understand how common regression metrics are computed. To that end, it provides handmade implementations of the following metrics:

1.   Mean Squared Error (MSE).
2.   Mean Absolute Error (MAE).
3.   Determination Coefficient ($R^2$).

In [11]:
# IMPORT COMMON LIBRARIES
import numpy as np
import pandas as pd

# IMPORT METRICS LIBRARIES
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score

# IMPORT LIBRARIES FOR SIMULATION
from random import randint

## <FONT COLOR = "orange">**Mean Squared Error (MSE)**</FONT>
---
---

Mean Squared Error (MSE) measures the average squared difference between predicted and actual values in a dataset. It is a common metric for evaluating regression models. MSE ranges from 0 to infinity (∞), where a lower MSE indicates better model performance and 0 represents a *perfect prediction*.

The MSE metric is described by the following equation:
$$MSE = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$
Where:

*   $n$ is the number of data points.
*   $y_{i}$ represents the real values.
*   $\hat{y}_{i}$ represents the predicted values.
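
As a quick check of the equation above, the following sketch evaluates it step by step on four made-up data points. The values and the name `mse_example` are illustrative assumptions, not part of this notebook's simulation.

```python
import numpy as np

# Illustrative values (an assumption, not data from this notebook)
y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5, 0.0, 2.0, 8.0])

# Squared differences per point: [0.25, 0.25, 0.0, 1.0]
squared_errors = (y_true - y_pred) ** 2

# Average over n = 4 points: 1.5 / 4 = 0.375
mse_example = float(squared_errors.mean())
print(mse_example)  # 0.375
```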

In [4]:
# MSE FUNCTION
def mse_metric(y_true:np.ndarray, y_pred:np.ndarray) -> float:

  # TRANSFORM THE LABELS INTO FLOAT VALUES
  y_true = y_true.astype(float)
  y_pred = y_pred.astype(float)

  # CALCULATE THE LABEL DIFFERENCE
  dif = y_true - y_pred
  square_dif = np.square(dif)

  # CALCULATE THE MEAN VALUE OF THE SQUARED DIFFERENCE
  mse = np.mean(square_dif)
  return float(mse)

## <FONT COLOR = "orange">**Mean Absolute Error (MAE)**</FONT>
---
---
Mean Absolute Error (MAE) measures the average absolute difference between predicted and actual values in a dataset. It is a common metric for evaluating regression models. MAE ranges from 0 to infinity (∞), where a lower MAE indicates better model performance and 0 represents a *perfect prediction*.

The MAE metric is described by the following equation:
$$MAE = \frac{1}{n} \sum_{i=1}^{n} |y_i - \hat{y}_i|$$
Where:

*   $n$ is the number of data points.
*   $y_{i}$ represents the real values.
*   $\hat{y}_{i}$ represents the predicted values.
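
The same kind of hand calculation works for the MAE equation. The sketch below uses four made-up data points; the values and the name `mae_example` are illustrative assumptions.

```python
import numpy as np

# Illustrative values (an assumption, not data from this notebook)
y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5, 0.0, 2.0, 8.0])

# Absolute differences per point: [0.5, 0.5, 0.0, 1.0]
absolute_errors = np.abs(y_true - y_pred)

# Average over n = 4 points: 2.0 / 4 = 0.5
mae_example = float(absolute_errors.mean())
print(mae_example)  # 0.5
```

Unlike MSE, the errors are not squared, so the result stays in the same units as the target variable.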



In [5]:
# MAE FUNCTION
def mae_metric(y_true:np.ndarray, y_pred:np.ndarray) -> float:

  # TRANSFORM THE LABELS INTO FLOAT VALUES
  y_true = y_true.astype(float)
  y_pred = y_pred.astype(float)

  # CALCULATE THE LABEL DIFFERENCE
  dif = y_true - y_pred
  abs_dif = np.abs(dif)

  # CALCULATE THE MEAN VALUE OF THE ABSOLUTE DIFFERENCE
  mae = np.mean(abs_dif)
  return float(mae)

## <FONT COLOR = "orange">**Determination coefficient**</FONT>
---
---

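R-squared ($R^2$), also called the coefficient of determination, is a statistical measure of the proportion of the variance in a dependent variable that is explained by the independent variable or variables in a regression model. For a least-squares linear model with an intercept, evaluated on its training data, $R^2$ ranges from 0 to 1. In general it can be negative, which happens when a model predicts worse than always predicting the mean of the real values. Within the 0 to 1 range: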

*   0 indicates that the model explains none of the variability of the response data around its mean.
*   1 indicates that the model explains all the variability of the response data around its mean.

$R^2$ is a goodness-of-fit measure for linear regression models. It indicates the percentage of the variance in the dependent variable that the independent variables explain collectively, so it is often read on a $0\%$ to $100\%$ scale.

The $R^2$ metric is described by the following equation:
$$R^2 = 1 - \frac{SS_{res}}{SS_{tot}}$$
Where:

*   $SS_{res}$ is the sum of squares of residuals (the difference between predicted and actual values).
*   $SS_{tot}$ is the total sum of squares (the difference between actual values and the mean of actual values).

To make $SS_{res}$ and $SS_{tot}$ concrete, their equations are given below:
$$SS_{res} = \sum_{i=1}^{n} (y_{i} - \hat{y}_{i})^{2}$$
$$SS_{tot} = \sum_{i=1}^{n} (y_{i} - \bar{y})^{2}$$
Where:

*   $n$ is the number of data points.
*   $y_{i}$ represents the real values.
*   $\hat{y}_{i}$ represents the predicted values.
*   $\bar{y}$ represents the mean of all real values.
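
To see how the two sums combine into $R^2$, the sketch below evaluates them on four made-up data points. The values and the `*_example` names are illustrative assumptions.

```python
import numpy as np

# Illustrative values (an assumption, not data from this notebook)
y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5, 0.0, 2.0, 8.0])

# SS_res: squared residuals 0.25 + 0.25 + 0.0 + 1.0 = 1.5
ss_res_example = float(np.sum((y_true - y_pred) ** 2))

# SS_tot: squared deviations from the mean of the real values (2.875)
ss_tot_example = float(np.sum((y_true - y_true.mean()) ** 2))

# R^2 = 1 - SS_res / SS_tot
r2_example = 1.0 - ss_res_example / ss_tot_example

print(ss_res_example)         # 1.5
print(ss_tot_example)         # 29.1875
print(round(r2_example, 4))   # 0.9486
```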

In [6]:
# R^2 FUNCTION
def r2_metric(y_true:np.ndarray, y_pred:np.ndarray) -> float:

  # TRANSFORM THE LABELS INTO FLOAT VALUES
  y_true = y_true.astype(float)
  y_pred = y_pred.astype(float)

  # CALCULATE SS_RES
  ss_res_dif = y_true - y_pred
  ss_res_square_dif = np.square(ss_res_dif)
  ss_res = np.sum(ss_res_square_dif)

  # CALCULATE SS_TOT
  ss_tot_dif = y_true - np.mean(y_true)
  ss_tot_square_dif = np.square(ss_tot_dif)
  ss_tot = np.sum(ss_tot_square_dif)

  # CALCULATE R^2
  r2 = 1 - (ss_res / ss_tot)
  return float(r2)
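
Two edge cases of the $R^2$ formula are worth checking by hand: predictions that are worse than the mean give a negative value, and constant real values make $SS_{tot}$ zero, so the ratio is undefined. The sketch below uses a hypothetical helper, `r2_formula`, that applies the same formula with a guard for the second case. Returning NaN there is only one possible convention and is an assumption of this example.

```python
import numpy as np

def r2_formula(y_true, y_pred):
    # Hypothetical helper: same R^2 formula, plus a guard for SS_tot == 0
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    if ss_tot == 0.0:
        # R^2 is undefined for constant real values; NaN is an assumed convention
        return float("nan")
    return float(1.0 - ss_res / ss_tot)

# Case 1: predictions worse than always predicting the mean
# SS_res = 8, SS_tot = 2, so R^2 = 1 - 4 = -3
r2_negative = r2_formula([1.0, 2.0, 3.0], [3.0, 2.0, 1.0])
print(r2_negative)  # -3.0

# Case 2: constant real values, so SS_tot = 0
r2_constant = r2_formula([5.0, 5.0, 5.0], [4.0, 5.0, 6.0])
print(r2_constant)  # nan
```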

## <FONT COLOR = "orange">**Proof of Concept**</FONT>
---
---

This section simulates data to compare the results of the handmade metric functions with those of the library's metric functions.

In [13]:
# DEFINE THE LINEAR EQUATION PARAMETERS
slope = randint(-100, 100)
intercept = randint(-100, 100)

# GENERATE X VALUES
x = np.arange(0, 10, 0.1)

# GENERATE PREDICTED VALUES
predicted_values = slope * x + intercept

# GENERATE REAL VALUES
noise = np.random.normal(0, 1, size=len(x))
real_values = predicted_values + noise
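
The simulation above draws its parameters and noise without a fixed seed, so each run produces different numbers. A minimal sketch of a reproducible variant follows, assuming NumPy's `default_rng` generator replaces both `randint` and `np.random.normal`. The seed value 42 is arbitrary.

```python
import numpy as np

# Seeded generator (the seed value is an arbitrary assumption)
rng = np.random.default_rng(42)

# rng.integers excludes the upper bound, so 101 mirrors randint(-100, 100)
slope = int(rng.integers(-100, 101))
intercept = int(rng.integers(-100, 101))

# Same x grid and linear model as in the simulation above
x = np.arange(0, 10, 0.1)
predicted_values = slope * x + intercept

# Gaussian noise with mean 0 and standard deviation 1
noise = rng.normal(0, 1, size=len(x))
real_values = predicted_values + noise
```

Re-running this cell with the same seed yields the same `real_values`, which makes the comparison of metrics repeatable.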

In [14]:
# CALCULATE METRICS WITH THE LIBRARIES FUNCTIONS
mse_lib = mean_squared_error(real_values, predicted_values)
mae_lib = mean_absolute_error(real_values, predicted_values)
r2_lib = r2_score(real_values, predicted_values)

In [15]:
# CALCULATE METRICS WITH MY FUNCTIONS
mse_my = mse_metric(real_values, predicted_values)
mae_my = mae_metric(real_values, predicted_values)
r2_my = r2_metric(real_values, predicted_values)

In [17]:
# CREATE A PANDAS DATAFRAME TO COMPARE RESULTS
df_metrics = pd.DataFrame({
    'Metrics': ['MSE', 'MAE', 'R^2'],
    'Libraries': [mse_lib, mae_lib, r2_lib],
    'Handmade': [mse_my, mae_my, r2_my]
})

# DISPLAY DATAFRAME
display(df_metrics)

|   | Metrics | Libraries | Handmade |
|---|---------|-----------|----------|
| 0 | MSE     | 1.290774  | 1.290774 |
| 1 | MAE     | 0.919392  | 0.919392 |
| 2 | R^2     | 0.999983  | 0.999983 |
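
Rather than comparing the two columns by eye, the agreement can be checked numerically. The sketch below copies the rounded values from the displayed output into arrays and uses `np.allclose`. In the notebook itself, the `Libraries` and `Handmade` columns of `df_metrics` could be passed instead, which is an assumption about how the check would be wired in.

```python
import numpy as np

# Values copied from the displayed output (rounded to six decimals)
library_values = np.array([1.290774, 0.919392, 0.999983])
handmade_values = np.array([1.290774, 0.919392, 0.999983])

# True when every pair of values agrees within floating-point tolerance
all_match = bool(np.allclose(library_values, handmade_values))
print(all_match)  # True
```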
