**A. Error-Based Metrics**

These metrics focus directly on the magnitude of the errors (residuals).

**25. Mean Absolute Error (MAE)**

* **Concept:** Calculates the average of the absolute differences between the predicted values and the actual values. It tells you, on average, how far off your predictions are.
* **Formula:**
    $MAE = \frac{1}{n} \sum_{i=1}^{n} |y_i - \hat{y}_i|$
    Where $n$ is the number of samples.
* **Interpretation:**
    * Represents the average absolute prediction error.
    * Measured in the **same units** as the target variable (e.g., dollars, degrees Celsius).
    * Ranges from 0 to $\infty$. A score of 0 means perfect prediction. Lower values are better.
    * An MAE of 5 means, on average, the predictions are off by 5 units from the true values.
* **Pros:**
    * **Easy to understand and interpret** due to being in the original units.
    * **More robust to outliers than MSE/RMSE:** Errors are not squared, so each one contributes in proportion to its magnitude and a single large error cannot dominate the way it does under a squared penalty.
* **Cons:**
    * Doesn't penalize large errors significantly more than small ones, which might be undesirable if large errors are particularly costly.
    * The absolute value function is not differentiable at zero, which can be a disadvantage when MAE is used directly as a loss function for some gradient-based optimization methods.
* **Example:**
    Suppose true house prices (`y_true`) and predicted prices (`y_pred`) in $1000s are:
    `y_true = [200, 350, 150, 500, 275]`
    `y_pred = [210, 330, 165, 480, 280]`

    Errors ($y_i - \hat{y}_i$): `[-10, 20, -15, 20, -5]`
    Absolute Errors ($|y_i - \hat{y}_i|$): `[10, 20, 15, 20, 5]`
    $MAE = \frac{10 + 20 + 15 + 20 + 5}{5} = \frac{70}{5} = 14$
    The MAE is $14k (or $14,000). On average, the price prediction is off by $14,000.

In [4]:
# Implementation (Scikit-learn)

from sklearn.metrics import mean_absolute_error
import numpy as np

y_true = np.array([200, 350, 150, 500, 275])
y_pred = np.array([210, 330, 165, 480, 280])

mae = mean_absolute_error(y_true, y_pred)
print(f"Mean Absolute Error (MAE): {mae}")

Mean Absolute Error (MAE): 14.0


* **Context:** A good choice when you need a metric that is easily interpretable in the original units and when you don't want outliers to dominate the error measure. Useful for reporting prediction accuracy to stakeholders.
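
To connect the formula to code, here is a minimal NumPy sketch that computes MAE directly from its definition, without scikit-learn. The helper name `mae` is introduced only for this illustration.

```python
import numpy as np

def mae(y_true, y_pred):
    """Mean absolute error: the average of |y_i - y_hat_i|."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(y_true - y_pred)))

# House price example from above (in $1000s)
y_true = [200, 350, 150, 500, 275]
y_pred = [210, 330, 165, 480, 280]

mae_value = mae(y_true, y_pred)
print(f"MAE (NumPy): {mae_value}")  # 14.0, matching the worked example
```

The result should agree with `mean_absolute_error` on the same inputs.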

---

**26. Mean Squared Error (MSE)**

* **Concept:** Calculates the average of the *squared* differences between the predicted values and the actual values.
* **Formula:**
    $MSE = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$
* **Interpretation:**
    * Represents the average squared prediction error.
    * Measured in the **square of the units** of the target variable (e.g., dollars squared, degrees Celsius squared). This makes direct interpretation difficult.
    * Ranges from 0 to $\infty$. A score of 0 means perfect prediction. Lower values are better.
    * Penalizes large errors much more heavily than small errors due to the squaring. An error of 10 contributes 100 to the sum, while an error of 2 contributes only 4.
* **Pros:**
    * **Penalizes large errors significantly,** which is often desirable.
    * **Mathematically convenient:** The squared term makes it smoothly differentiable, which is useful for optimization algorithms (it's the standard loss function for linear regression).
* **Cons:**
    * **Highly sensitive to outliers:** A single large error can inflate the MSE substantially.
    * **Units are squared,** making it hard to interpret the value directly in the context of the problem (e.g., an MSE of 250 dollars-squared doesn't have an intuitive meaning).
* **Example:**
    Using the same house price data:
    `y_true = [200, 350, 150, 500, 275]`
    `y_pred = [210, 330, 165, 480, 280]`
    Errors: `[-10, 20, -15, 20, -5]`
    Squared Errors ($(y_i - \hat{y}_i)^2$): `[100, 400, 225, 400, 25]`
    $MSE = \frac{100 + 400 + 225 + 400 + 25}{5} = \frac{1150}{5} = 230$
    The MSE is 230 (in units of thousands-of-dollars squared).

    *Outlier Impact:* Let's say the last prediction was way off: `y_pred = [210, 330, 165, 480, 575]`. True value was 275.
    New Errors: `[-10, 20, -15, 20, -300]`
    New Abs Errors: `[10, 20, 15, 20, 300]` -> New MAE = (10+20+15+20+300)/5 = 365/5 = 73 (Increased significantly, but linearly)
    New Squared Errors: `[100, 400, 225, 400, 90000]` -> New MSE = (100+400+225+400+90000)/5 = 91125/5 = 18225 (Exploded due to the outlier!)

In [3]:
# Implementation (Scikit-learn)

from sklearn.metrics import mean_squared_error
import numpy as np

y_true = np.array([200, 350, 150, 500, 275])
y_pred = np.array([210, 330, 165, 480, 280])
y_pred_outlier = np.array([210, 330, 165, 480, 575]) # With outlier

mse = mean_squared_error(y_true, y_pred)
mse_outlier = mean_squared_error(y_true, y_pred_outlier)

print(f"Mean Squared Error (MSE): {mse}")
print(f"MSE with Outlier: {mse_outlier}")

Mean Squared Error (MSE): 230.0
MSE with Outlier: 18225.0


* **Context:** Commonly used as a loss function for training models. Useful as an evaluation metric when large errors should be penalized heavily. Be cautious about its sensitivity to outliers and the non-intuitive units.
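
The outlier comparison above can be made concrete by measuring how much each metric grows when the single bad prediction is introduced. This is a small NumPy sketch; the helper names `mae` and `mse` are chosen for this illustration only.

```python
import numpy as np

y_true = np.array([200, 350, 150, 500, 275], dtype=float)
y_pred = np.array([210, 330, 165, 480, 280], dtype=float)
y_pred_outlier = np.array([210, 330, 165, 480, 575], dtype=float)

def mae(y, y_hat):
    """Mean absolute error."""
    return float(np.mean(np.abs(y - y_hat)))

def mse(y, y_hat):
    """Mean squared error."""
    return float(np.mean((y - y_hat) ** 2))

# Ratio of "with outlier" to "without outlier" for each metric
mae_growth = mae(y_true, y_pred_outlier) / mae(y_true, y_pred)
mse_growth = mse(y_true, y_pred_outlier) / mse(y_true, y_pred)

print(f"MAE grows by a factor of {mae_growth:.1f}")  # 73 / 14
print(f"MSE grows by a factor of {mse_growth:.1f}")  # 18225 / 230
```

One bad prediction multiplies MAE by roughly 5 but MSE by roughly 79, which is the squared penalty at work.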

---

**27. Root Mean Squared Error (RMSE)**

* **Concept:** The square root of the Mean Squared Error (MSE). This effectively brings the units back to the original scale of the target variable.
* **Formula:**
    $RMSE = \sqrt{MSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2}$
* **Interpretation:**
    * Measures the typical magnitude of the error. It equals the **standard deviation of the residuals** (prediction errors) when the residuals have zero mean; otherwise it also includes the effect of the mean error (bias).
    * Measured in the **same units** as the target variable (like MAE).
    * Ranges from 0 to $\infty$. A score of 0 means perfect prediction. Lower values are better.
    * An RMSE of 15 means the typical deviation of the prediction from the true value is about 15 units.
* **Pros:**
    * **Interpretable units:** Same units as the target variable, easier to understand than MSE.
    * **Penalizes large errors:** Retains the property of MSE where large errors have a disproportionately large impact (though dampened by the square root).
    * Very commonly used and reported metric.
* **Cons:**
    * **Sensitive to outliers:** Like MSE, it can be significantly affected by outliers (though the impact is somewhat reduced compared to MSE due to the square root).
    * Mathematically slightly more complex than MAE.
* **Example:**
    Using the MSE values from the previous example:
    * Original data: $MSE = 230$.
        $RMSE = \sqrt{230} \approx 15.17$
        The RMSE is $15.17k (or $15,170).
    * Data with outlier: $MSE = 18225$.
        $RMSE = \sqrt{18225} = 135$
        The RMSE is $135k. Compare this to the MAE of 73k for the outlier case. RMSE is larger, reflecting the stronger penalty for the large error.


In [9]:
# Implementation (Scikit-learn)

from sklearn.metrics import mean_squared_error
import numpy as np # Needed for np.sqrt

y_true = np.array([200, 350, 150, 500, 275])
y_pred = np.array([210, 330, 165, 480, 280])
y_pred_outlier = np.array([210, 330, 165, 480, 575]) # With outlier

# Method 1: Calculate MSE then take sqrt
mse = mean_squared_error(y_true, y_pred)
rmse = np.sqrt(mse)
print(f"Root Mean Squared Error (RMSE - Method 1): {rmse:.2f}")

Root Mean Squared Error (RMSE - Method 1): 15.17


In [8]:
mse_outlier = mean_squared_error(y_true, y_pred_outlier)
rmse_outlier = np.sqrt(mse_outlier)
print(f"RMSE with Outlier (Method 1): {rmse_outlier:.2f}")

RMSE with Outlier (Method 1): 135.00


In [12]:
pip install -U scikit-learn



In [15]:
# Method 2: Use root_mean_squared_error (scikit-learn 1.4+).
# The older squared=False argument of mean_squared_error was deprecated in 1.4
# and removed in 1.6, so it raises a TypeError on current versions.
try:
    from sklearn.metrics import root_mean_squared_error
    rmse_direct = root_mean_squared_error(y_true, y_pred)
    rmse_outlier_direct = root_mean_squared_error(y_true, y_pred_outlier)
    print(f"RMSE (Method 2): {rmse_direct:.2f}")
    print(f"RMSE with Outlier (Method 2): {rmse_outlier_direct:.2f}")
except ImportError as e:
    print(f"\nNote: root_mean_squared_error is unavailable; use Method 1. Error: {e}")

RMSE (Method 2): 15.17
RMSE with Outlier (Method 2): 135.00


* **Context:** Perhaps the most frequently used regression metric. It offers a good balance between interpretability (original units) and sensitivity to large errors. It's often the default metric reported for regression tasks, but always consider the potential impact of outliers.
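
RMSE matches the standard deviation of the residuals only when the mean residual is zero. The sketch below checks the general identity $RMSE^2 = \sigma_{res}^2 + \text{bias}^2$ on the house price data, using the population standard deviation (`ddof=0`); the variable names are illustrative.

```python
import numpy as np

y_true = np.array([200, 350, 150, 500, 275], dtype=float)
y_pred = np.array([210, 330, 165, 480, 280], dtype=float)

residuals = y_true - y_pred
rmse = float(np.sqrt(np.mean(residuals ** 2)))
bias = float(np.mean(residuals))       # mean residual
resid_std = float(np.std(residuals))   # population std (ddof=0)

print(f"RMSE:                 {rmse:.2f}")
print(f"Mean residual (bias): {bias:.2f}")
print(f"Std of residuals:     {resid_std:.2f}")

# RMSE^2 decomposes into residual variance plus squared bias
identity_holds = bool(np.isclose(rmse ** 2, resid_std ** 2 + bias ** 2))
print(f"RMSE^2 == std^2 + bias^2: {identity_holds}")
```

Here the mean residual is 2, so the RMSE (about 15.17) is slightly larger than the residual standard deviation (about 15.03).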

---

**28. Median Absolute Error (MedAE)**

* **Concept:** Calculates the median of all the absolute differences between the predicted values and the actual values.
* **Formula:**
    $MedAE = \text{median}(|y_1 - \hat{y}_1|, |y_2 - \hat{y}_2|, ..., |y_n - \hat{y}_n|)$
* **Interpretation:**
    * Represents the median absolute prediction error. Tells you the error magnitude for the "middle" data point if you were to sort all absolute errors.
    * Measured in the **same units** as the target variable.
    * Ranges from 0 to $\infty$. Lower is better.
* **Pros:**
    * **Highly robust to outliers:** The median is not affected by extreme values, making this metric excellent when outliers are present and shouldn't influence the overall error assessment.
    * Easy to interpret units.
* **Cons:**
    * Ignores the magnitude and distribution of errors beyond the median point. A model could have very large errors on nearly half the data, yet MedAE would reflect only the middle value of the sorted absolute errors.
    * Less common than MAE or RMSE.
* **Example:**
    Using the house price data:
    * Original data: Absolute Errors: `[10, 20, 15, 20, 5]`. Sorted: `[5, 10, 15, 20, 20]`.
        $MedAE = 15$ (the middle value). The median error is $15k.
    * Data with outlier: Absolute Errors: `[10, 20, 15, 20, 300]`. Sorted: `[10, 15, 20, 20, 300]`.
        $MedAE = 20$ (the middle value). The median error is $20k. Notice how the huge outlier (300) had very little impact on MedAE (it only shifted from 15 to 20), unlike its drastic effect on MAE and RMSE.

In [17]:
# Implementation (Scikit-learn)

from sklearn.metrics import median_absolute_error

y_true = np.array([200, 350, 150, 500, 275])
y_pred = np.array([210, 330, 165, 480, 280])
y_pred_outlier = np.array([210, 330, 165, 480, 575]) # With outlier

medae = median_absolute_error(y_true, y_pred)
medae_outlier = median_absolute_error(y_true, y_pred_outlier)

print(f"Median Absolute Error (MedAE): {medae}")

print(f"MedAE with Outlier: {medae_outlier}")

Median Absolute Error (MedAE): 15.0
MedAE with Outlier: 20.0


* **Context:** Use when you need a measure of central tendency for the error that is insensitive to outliers. Good for understanding the typical error magnitude in skewed or outlier-prone datasets.
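
The same robustness that makes MedAE attractive can also hide problems. The following sketch uses a hypothetical set of absolute errors (not taken from the house price data) in which two of five predictions are badly wrong, and compares MedAE with MAE.

```python
import numpy as np

# Hypothetical absolute errors: three small, two very large
abs_errors = np.array([1.0, 1.0, 1.0, 100.0, 100.0])

medae = float(np.median(abs_errors))
mae = float(np.mean(abs_errors))

print(f"MedAE: {medae}")      # only sees the middle value
print(f"MAE:   {mae:.1f}")    # reflects the two large errors
```

MedAE reports 1.0 even though 40% of the predictions are off by 100, so it is usually worth reporting it alongside MAE or RMSE.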

---

**29. Max Error**

* **Concept:** Identifies the single largest absolute difference between any predicted value and its corresponding actual value across the entire dataset.
* **Formula:**
    $MaxError = \max_{i} (|y_i - \hat{y}_i|)$
* **Interpretation:**
    * Represents the **worst-case scenario** error for any single prediction.
    * Measured in the **same units** as the target variable.
    * Ranges from 0 to $\infty$. Lower is better.
* **Pros:**
    * Directly captures the magnitude of the largest prediction error.
    * Useful for understanding the largest error observed on the evaluated data (it is not a guaranteed bound on errors for unseen data).
* **Cons:**
    * **Extremely sensitive to outliers:** Determined by just one data point.
    * Provides no information about the typical error or the distribution of errors.
* **Example:**
    Using the house price data:
    * Original data: Absolute Errors: `[10, 20, 15, 20, 5]`.
        $MaxError = 20$. The worst prediction was off by $20k.
    * Data with outlier: Absolute Errors: `[10, 20, 15, 20, 300]`.
        $MaxError = 300$. The worst prediction was off by $300k.

In [18]:
# Implementation (Scikit-learn)
from sklearn.metrics import max_error

y_true = np.array([200, 350, 150, 500, 275])
y_pred = np.array([210, 330, 165, 480, 280])
y_pred_outlier = np.array([210, 330, 165, 480, 575]) # With outlier

max_err = max_error(y_true, y_pred)
max_err_outlier = max_error(y_true, y_pred_outlier)

print(f"Max Error: {max_err}")

print(f"Max Error with Outlier: {max_err_outlier}")

Max Error: 20
Max Error with Outlier: 300


* **Context:** Relevant in applications where the maximum possible error is critical, such as in engineering safety tolerances, financial predictions needing guarantees, or any domain where large individual errors are unacceptable.
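
When the worst-case error matters, it is often useful to know *which* sample produced it, not just its size. A minimal NumPy sketch, assuming the outlier variant of the house price data:

```python
import numpy as np

y_true = np.array([200, 350, 150, 500, 275])
y_pred_outlier = np.array([210, 330, 165, 480, 575])

abs_errors = np.abs(y_true - y_pred_outlier)
worst_idx = int(np.argmax(abs_errors))  # index of the (first) largest error

print(f"Max error: {abs_errors[worst_idx]} at index {worst_idx}")
print(f"true={y_true[worst_idx]}, predicted={y_pred_outlier[worst_idx]}")
```

Note that `np.argmax` returns the first index in case of ties.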

---

**B. Relative Performance Metrics**

These metrics evaluate the model's performance relative to the variability inherent in the data itself.

---

**30. R-squared (R²) - Coefficient of Determination**

* **Concept:** Measures the proportion of the total variance in the target variable ($y$) that is explained by the model's predictions ($\hat{y}$). It compares the model's errors ($SS_{res}$) to the variance of the target variable around its mean ($SS_{tot}$).
* **Formula:**
    $R^2 = 1 - \frac{SS_{res}}{SS_{tot}} = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}$
    Where:
    * $SS_{res} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$ is the Sum of Squared Residuals (model errors).
    * $SS_{tot} = \sum_{i=1}^{n} (y_i - \bar{y})^2$ is the Total Sum of Squares (proportional to the variance of $y$).
    * $\bar{y}$ is the mean of the true values $y_i$.
* **Interpretation:**
    * Ranges theoretically from $-\infty$ to 1. Practically often seen between 0 and 1 on training data.
    * $R^2 = 1$: The model perfectly explains all the variance in the target variable. ($SS_{res}=0$).
    * $R^2 = 0$: The model explains none of the variance; it performs no better than simply predicting the mean $\bar{y}$ for all instances. ($SS_{res}=SS_{tot}$).
    * $R^2 < 0$: The model performs *worse* than predicting the mean ($SS_{res} > SS_{tot}$). This can happen on test data or with cross-validation when the model generalizes poorly or makes systematically worse predictions than the mean.
    * Often expressed as a percentage: $R^2 = 0.75$ means "the model explains 75% of the variability in the target variable".
* **Pros:**
    * Provides a **relative measure of fit** (unitless).
    * Gives an intuitive percentage interpretation of how much variance the model accounts for.
    * Very common in statistical modeling, especially linear regression.
* **Cons:**
    * **On training data, R² never decreases** when more features (predictors) are added to a least-squares linear model, even if they are irrelevant. This makes it unsuitable for comparing models with different numbers of features, as it rewards overfitting.
    * A high R² doesn't necessarily mean the model makes accurate predictions in an absolute sense (MAE/RMSE could still be high).
    * Doesn't indicate if the model is biased or if the relationship is truly linear (if assuming linear regression).
* **Example:**
    Using the original house price data:
    `y_true = [200, 350, 150, 500, 275]` -> Mean $\bar{y} = (200+350+150+500+275)/5 = 1475/5 = 295$.
    `y_pred = [210, 330, 165, 480, 280]`
    $SS_{res}$: We know $MSE = 230$, and $MSE = SS_{res}/n$, so $SS_{res} = MSE \times n = 230 \times 5 = 1150$.
    $SS_{tot} = (200-295)^2 + (350-295)^2 + (150-295)^2 + (500-295)^2 + (275-295)^2$
    $SS_{tot} = (-95)^2 + (55)^2 + (-145)^2 + (205)^2 + (-20)^2$
    $SS_{tot} = 9025 + 3025 + 21025 + 42025 + 400 = 75500$.
    $R^2 = 1 - \frac{1150}{75500} = 1 - 0.01523... \approx 0.9848$
    The model explains about 98.5% of the variance in house prices.

In [19]:
# Implementation (Scikit-learn)

from sklearn.metrics import r2_score

y_true = np.array([200, 350, 150, 500, 275])
y_pred = np.array([210, 330, 165, 480, 280])

r2 = r2_score(y_true, y_pred)
print(f"R-squared (R2): {r2:.4f}")

R-squared (R2): 0.9848


* **Context:** A standard measure of goodness-of-fit, particularly in linear regression contexts. Useful for understanding how much of the data's variability is captured by the model, but should be used cautiously for model comparison, especially if models differ in complexity (number of features). Always complement with absolute error metrics (MAE, RMSE).
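
The three reference points of R² (close to 1, exactly 0, and negative) can be reproduced with a short NumPy sketch built from $SS_{res}$ and $SS_{tot}$. The helper `r2` and the deliberately poor predictions in `bad_pred` are hypothetical and exist only for this illustration.

```python
import numpy as np

def r2(y_true, y_pred):
    """R^2 = 1 - SS_res / SS_tot (assumes y_true is not constant)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1 - ss_res / ss_tot)

y_true = np.array([200, 350, 150, 500, 275])
y_pred = np.array([210, 330, 165, 480, 280])
mean_baseline = np.full(len(y_true), y_true.mean())  # always predict the mean
bad_pred = np.array([500, 150, 500, 150, 500])       # hypothetical poor model

r2_model = r2(y_true, y_pred)
r2_baseline = r2(y_true, mean_baseline)
r2_bad = r2(y_true, bad_pred)

print(f"Model:         {r2_model:.4f}")
print(f"Mean baseline: {r2_baseline:.4f}")
print(f"Poor model:    {r2_bad:.4f}")
```

The mean baseline scores exactly 0, and the poor model scores well below 0 because its squared errors exceed the total sum of squares.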

---

**31. Adjusted R-squared**

* **Concept:** A modified version of R² that penalizes the score for including extra predictors (features) that do not significantly improve the model's fit. It adjusts R² based on the number of data points ($n$) and the number of features ($p$).
* **Formula:**
    $Adjusted \ R^2 = 1 - (1 - R^2) \frac{n - 1}{n - p - 1}$
    Where:
    * $R^2$ is the standard R-squared value.
    * $n$ is the number of samples (data points).
    * $p$ is the number of predictors (features) in the model.
* **Interpretation:**
    * Similar interpretation to R², but the value will only increase if adding a new feature improves $R^2$ enough to compensate for the penalty of adding a feature.
    * Adjusted $R^2$ is always less than or equal to $R^2$.
    * Can be negative.
    * More suitable for comparing models with different numbers of features. A higher adjusted R² suggests a better model considering complexity.
* **Pros:**
    * **Accounts for model complexity:** Penalizes the addition of non-informative features.
    * **Better for model comparison:** More reliable than R² when comparing models with different numbers of predictors.
* **Cons:**
    * Interpretation is slightly less direct than the R² percentage.
    * Still doesn't indicate absolute prediction accuracy or model bias.
    * Requires knowing the number of features ($p$), which might not always be straightforward (e.g., after complex feature engineering).
* **Example:**
    Using the previous R² = 0.9848. Let $n=5$. Assume our model used $p=2$ features.
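    $Adjusted \ R^2 = 1 - (1 - 0.9848) \frac{5 - 1}{5 - 2 - 1} = 1 - (0.0152) \frac{4}{2}$
    $Adjusted \ R^2 = 1 - (0.0152 \times 2) = 1 - 0.0304 = 0.9696$
    (With the unrounded $R^2 \approx 0.98477$ the result is $0.9695$, which is the value the code prints.)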

    Now, suppose we added another useless feature ($p=3$) and $R^2$ only slightly increased to 0.9850.
    $Adjusted \ R^2_{new} = 1 - (1 - 0.9850) \frac{5 - 1}{5 - 3 - 1} = 1 - (0.0150) \frac{4}{1}$
    $Adjusted \ R^2_{new} = 1 - 0.0600 = 0.9400$
    Even though $R^2$ slightly increased, Adjusted $R^2$ decreased, correctly indicating that adding the third feature wasn't worthwhile.

In [22]:
# Implementation (Scikit-learn): Not a direct function. Calculate manually.

from sklearn.metrics import r2_score

y_true = np.array([200, 350, 150, 500, 275])
y_pred = np.array([210, 330, 165, 480, 280])
n = len(y_true) # Number of samples
p = 2 # Assume 2 features were used for this model

r2 = r2_score(y_true, y_pred)

# Calculate Adjusted R-squared manually
if n - p - 1 > 0: # Formula is only meaningful when n > p + 1
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1)
else:
    adj_r2 = np.nan # Too few samples for this many features

print(f"R-squared: {r2:.4f}")

print(f"Adjusted R-squared (n={n}, p={p}): {adj_r2:.4f}")

R-squared: 0.9848
Adjusted R-squared (n=5, p=2): 0.9695


* **Context:** Use Adjusted R² instead of R² when comparing models with different numbers of features or during feature selection processes. It provides a more honest assessment of model fit by penalizing unnecessary complexity.
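
The worked example, in which adding a third feature lowers the adjusted score, can be reproduced with a small helper. This is a sketch: the function name `adjusted_r2` is an assumption for this illustration, and the R² values are the rounded ones from the example rather than values fitted by a model.

```python
def adjusted_r2(r2, n, p):
    """Adjusted R^2 for n samples and p predictors."""
    if n - p - 1 <= 0:
        raise ValueError("adjusted R^2 requires n > p + 1")
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

# Rounded R^2 values from the worked example (n = 5 samples)
adj_two = adjusted_r2(0.9848, n=5, p=2)
adj_three = adjusted_r2(0.9850, n=5, p=3)

print(f"p=2: {adj_two:.4f}")
print(f"p=3: {adj_three:.4f}")
```

Raising an error when $n \le p + 1$ is one possible design choice; returning `nan` is another.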

---

**C. Percentage Error Metrics**

These metrics express the error relative to the magnitude of the true value, often resulting in a percentage.

---

**32. Mean Absolute Percentage Error (MAPE)**

* **Concept:** Calculates the average of the absolute errors taken as a percentage of the actual values.
* **Formula:**
    $MAPE = \frac{1}{n} \sum_{i=1}^{n} \left| \frac{y_i - \hat{y}_i}{y_i} \right| \times 100\%$
* **Interpretation:**
    * Represents the average percentage deviation of the predictions from the actual values.
    * Unitless (a percentage). Lower is better. Ranges from 0% to $\infty$.
    * A MAPE of 10% suggests the model's predictions are, on average, within 10% of the true values.
* **Pros:**
    * **Intuitive percentage interpretation,** making it easy to communicate.
    * Scale-independent, allowing comparison across datasets or variables with different scales.
* **Cons:**
    * **Undefined if any true value $y_i$ is zero.** Can explode if $y_i$ is close to zero.
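    * **Asymmetric:** Because the error is divided by the true value, the same absolute error counts for more when the true value is small. For example, if True=10, Pred=5 (error +5), |error/true| = 50%. If True=5, Pred=10 (error -5), |error/true| = 100%. In addition, for non-negative predictions an under-prediction ($\hat{y}_i < y_i$) can contribute at most 100%, while an over-prediction ($\hat{y}_i > y_i$) is unbounded, so MAPE tends to favor models that under-predict.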
    * Assumes percentage errors are meaningful (e.g., a 10% error on 1,000,000 is much larger in absolute terms than a 10% error on 10).
* **Example:**
    Using the original house price data (ensure no zeros):
    `y_true = [200, 350, 150, 500, 275]`
    `y_pred = [210, 330, 165, 480, 280]`
    Errors: `[-10, 20, -15, 20, -5]`
    Percentage Errors ($ (y_i - \hat{y}_i) / y_i $):
    `[-10/200, 20/350, -15/150, 20/500, -5/275]`
    `[-0.05, 0.057, -0.10, 0.04, -0.018]`
    Absolute Percentage Errors: `[0.05, 0.057, 0.10, 0.04, 0.018]`
    $MAPE = \frac{0.05 + 0.057 + 0.10 + 0.04 + 0.018}{5} \times 100\%$
    $MAPE = \frac{0.265}{5} \times 100\% = 0.053 \times 100\% = 5.3\%$
    On average, the predictions are about 5.3% off the actual price.

In [23]:
# Implementation (Scikit-learn)

from sklearn.metrics import mean_absolute_percentage_error

y_true = np.array([200, 350, 150, 500, 275])
y_pred = np.array([210, 330, 165, 480, 280])

# Ensure no zeros in y_true if calculating manually or using libraries that might not handle it
if np.any(y_true == 0):
    print("Warning: y_true contains zeros, MAPE is undefined or problematic.")
    mape = np.nan
else:
    mape = mean_absolute_percentage_error(y_true, y_pred)

# Output from sklearn is a proportion, multiply by 100 for percentage
print(f"Mean Absolute Percentage Error (MAPE): {mape * 100:.2f}%")

# Note: Slight diff from manual due to precision in intermediate steps.

Mean Absolute Percentage Error (MAPE): 5.31%


* **Context:** Often used in business forecasting (e.g., sales, demand) because of its intuitive percentage interpretation. However, be extremely careful if your target variable can be zero or close to zero, and be aware of its asymmetric penalization.
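
The two main pitfalls of MAPE, its asymmetry and its behavior near zero, are easy to see with single-point examples. The sketch below uses a hand-written `mape` helper (an assumption for this illustration, not the scikit-learn function) that returns a percentage and refuses zero true values.

```python
import numpy as np

def mape(y_true, y_pred):
    """MAPE in percent; raises if any true value is zero."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    if np.any(y_true == 0):
        raise ValueError("MAPE is undefined when a true value is zero")
    return float(np.mean(np.abs((y_true - y_pred) / y_true)) * 100)

under = mape([100], [0])        # largest possible non-negative under-prediction
over = mape([100], [300])       # over-predictions have no upper limit
near_zero = mape([0.01], [1])   # a tiny true value inflates the percentage

print(f"Under-prediction (100 -> 0):   {under:.0f}%")
print(f"Over-prediction  (100 -> 300): {over:.0f}%")
print(f"Near-zero true value:          {near_zero:.0f}%")
```

With non-negative forecasts, under-prediction saturates at 100% while over-prediction does not, and a true value of 0.01 turns an absolute error of about 1 into an error of several thousand percent.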

---

**33. Symmetric Mean Absolute Percentage Error (sMAPE)**

* **Concept:** An alternative to MAPE that attempts to correct its asymmetry and division-by-zero issues by normalizing the absolute error by the *average of the absolute values* of the actual and predicted figures.
* **Formula:** (Using a definition common in forecasting competitions. sMAPE is *not* provided by `sklearn.metrics`, so it is implemented manually here):
    $sMAPE = \frac{1}{n} \sum_{i=1}^{n} \frac{2 \times |y_i - \hat{y}_i|}{|y_i| + |\hat{y}_i|} \times 100\%$
    *(Note: Different formulas exist. This one ensures the result is between 0% and 200%).*
* **Interpretation:**
    * Represents a percentage error, adjusted for symmetry.
    * Ranges from 0% to 200%. Lower is better.
    * The interpretation is less direct than MAPE (it's roughly the absolute error as a percentage of the average magnitude of the true and predicted values).
* **Pros:**
    * **More symmetric** in penalizing over- and under-predictions compared to MAPE.
    * **Avoids division by zero** unless *both* $y_i$ and $\hat{y}_i$ are zero (in which case the term is typically defined as 0).
    * Bounded range [0%, 200%].
* **Cons:**
    * **Less intuitive interpretation** than MAPE.
    * Can produce **strange results** if one value is zero and the other is non-zero (the term becomes $\frac{2|y_i|}{ |y_i|} = 2$, resulting in a 200% error for that point, which might be unexpected).
    * Not as widely used or standardized as MAPE or RMSE.
    * **Not directly available in `sklearn.metrics`** (as of common versions, always check documentation for updates).
* **Example:**
    Using the original house price data:
    `y_true = [200, 350, 150, 500, 275]`
    `y_pred = [210, 330, 165, 480, 280]`
    Absolute Errors $|y_i - \hat{y}_i|$: `[10, 20, 15, 20, 5]`
    Sum of Abs Values $|y_i| + |\hat{y}_i|$: `[410, 680, 315, 980, 555]`
    Term $2 \times |err| / (|y| + |\hat{y}|)$:
    `[2*10/410, 2*20/680, 2*15/315, 2*20/980, 2*5/555]`
    `[0.0488, 0.0588, 0.0952, 0.0408, 0.0180]`
    $sMAPE = \frac{0.0488 + 0.0588 + 0.0952 + 0.0408 + 0.0180}{5} \times 100\%$
    $sMAPE = \frac{0.2616}{5} \times 100\% = 0.0523 \times 100\% = 5.23\%$
    (In this case, very similar to MAPE because predictions are close to actuals).

In [27]:
# Implementation (Manual)

import numpy as np

def smape(y_true, y_pred):
    """Calculate Symmetric Mean Absolute Percentage Error (sMAPE), in percent."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    numerator = 2 * np.abs(y_true - y_pred)
    denominator = np.abs(y_true) + np.abs(y_pred)
    # Where both values are zero, the term is defined as 0 (no error)
    ratio = np.divide(numerator, denominator, out=np.zeros_like(numerator), where=denominator != 0)
    return np.mean(ratio) * 100

y_true = np.array([200, 350, 150, 500, 275])
y_pred = np.array([210, 330, 165, 480, 280])

# Formula: mean( 2 * |y - yhat| / (|y| + |yhat|) ) * 100
smape_val = smape(y_true, y_pred)
print(f"sMAPE (Manual Calculation): {smape_val:.2f}%")

sMAPE (Manual Calculation): 5.23%


* **Context:** Consider using sMAPE as an alternative to MAPE if dealing with potential zeros or near-zeros in your data, or if the asymmetry of MAPE is a significant concern. Be aware of its own definition variations and interpretation nuances.
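
The edge cases discussed above (both values zero, exactly one value zero, and swapping true and predicted values) can be checked with a short sketch. It assumes the 0%-200% definition used in this section and the convention that a 0/0 term counts as 0; the helper name `smape_percent` is illustrative.

```python
import numpy as np

def smape_percent(y_true, y_pred):
    """sMAPE in percent, with 0/0 terms treated as zero error."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    numerator = 2 * np.abs(y_true - y_pred)
    denominator = np.abs(y_true) + np.abs(y_pred)
    ratio = np.divide(numerator, denominator,
                      out=np.zeros_like(numerator), where=denominator != 0)
    return float(np.mean(ratio) * 100)

both_zero = smape_percent([0], [0])    # defined as 0 by convention
one_zero = smape_percent([0], [5])     # hits the 200% ceiling
swap_a = smape_percent([10], [5])
swap_b = smape_percent([5], [10])      # same value after swapping roles

print(f"Both zero: {both_zero:.1f}%")
print(f"One zero:  {one_zero:.1f}%")
print(f"True=10, Pred=5: {swap_a:.2f}%  |  True=5, Pred=10: {swap_b:.2f}%")
```

Swapping the true and predicted values leaves the score unchanged, which is the sense in which this definition is symmetric.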

---

In summary for Regression Metrics:

* Use **MAE** or **MedAE** for interpretable error in original units, especially if robustness to outliers is needed (MedAE being more robust).
* Use **RMSE** if you want interpretable units but also want to penalize large errors more heavily (most common choice).
* Use **MSE** primarily as a loss function during training or if the squared penalty is specifically desired for evaluation (less common for reporting due to units).
* Use **Max Error** if the worst-case prediction error is critical.
* Use **R²** for a quick relative measure of variance explained, but be wary of its increase with model complexity.
* Use **Adjusted R²** when comparing models with different numbers of features.
* Use **MAPE** for intuitive percentage errors in forecasting, but *only* if true values are reliably non-zero and its asymmetry is acceptable.
* Consider **sMAPE** if MAPE's issues are problematic, but understand its own limitations.

As with classification, relying on a single metric can be misleading. It's often best to evaluate regression models using a combination of metrics (e.g., RMSE and R², or MAE and R²) to get a more complete picture of performance.
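
One way to follow this advice is to compute several metrics in a single pass and report them together. The sketch below uses only NumPy; the function name `regression_report` and the choice of metrics are assumptions for this illustration, not an established API.

```python
import numpy as np

def regression_report(y_true, y_pred):
    """Return a dict of common regression metrics (assumes y_true is not constant)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    abs_err = np.abs(y_true - y_pred)
    ss_res = np.sum(abs_err ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return {
        "MAE": float(abs_err.mean()),
        "MedAE": float(np.median(abs_err)),
        "RMSE": float(np.sqrt(ss_res / len(y_true))),
        "MaxError": float(abs_err.max()),
        "R2": float(1 - ss_res / ss_tot),
    }

report = regression_report([200, 350, 150, 500, 275],
                           [210, 330, 165, 480, 280])
for name, value in report.items():
    print(f"{name:>8}: {value:.4f}")
```

On the house price data this reproduces the values derived in the individual sections.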