## Explanation of Evaluation Metrics for Regression Models

### Mean Squared Error (MSE)
**Pros:**
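- Penalizes large errors heavily because each error is squared, which helps when big misses are especially costly.
- Summarizes overall error magnitude in a single non-negative number, with 0 indicating a perfect fit.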

**Cons:**
- Not intuitive because it is expressed in squared units of the target variable.
- Can be dominated by a few outliers, which makes it misleading on data with many extreme values.
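
To make the squaring effect concrete, here is a minimal sketch in plain Python. The helper `mse` and the house-price figures are hypothetical and chosen only for illustration. Both sets of predictions have the same total absolute error, yet the one with a single large miss has four times the MSE.

```python
def mse(y_true, y_pred):
    """Mean of the squared differences between actual and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical house prices, in thousands of dollars
y_true = [200.0, 250.0, 300.0, 350.0]
y_pred_spread = [210.0, 240.0, 310.0, 340.0]  # every prediction is off by 10
y_pred_single = [200.0, 250.0, 300.0, 390.0]  # one prediction is off by 40

mse_spread = mse(y_true, y_pred_spread)  # 100.0, in (thousand dollars) squared
mse_single = mse(y_true, y_pred_single)  # 400.0, driven by one large error

print(mse_spread, mse_single)
```

Note that the result is in squared units, which is why the value is hard to relate back to the prices themselves.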

### Mean Absolute Error (MAE)
**Pros:**
- Less sensitive to outliers compared to MSE.
- More interpretable because it is in the same units as the target variable.

**Cons:**
- Does not penalize large errors heavily, which might not always be desirable.
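
The trade-off can be seen in a small sketch (plain Python; the helper `mae` and the data are hypothetical). MAE treats four errors of 10 and one error of 40 as equally bad, because it only averages absolute deviations.

```python
def mae(y_true, y_pred):
    """Mean of the absolute differences between actual and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical house prices, in thousands of dollars
y_true = [200.0, 250.0, 300.0, 350.0]
y_pred_spread = [210.0, 240.0, 310.0, 340.0]  # every prediction is off by 10
y_pred_single = [200.0, 250.0, 300.0, 390.0]  # one prediction is off by 40

mae_spread = mae(y_true, y_pred_spread)  # 10.0, in thousands of dollars
mae_single = mae(y_true, y_pred_single)  # 10.0, the large miss is not singled out

print(mae_spread, mae_single)
```

Whether this is a strength or a weakness depends on whether a single large miss is worse for the application than several small ones.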

### Root Mean Squared Error (RMSE)
**Pros:**
- Penalizes large errors in the same way as MSE, since it is the square root of the mean squared error.
- Interpretable and in the same units as the target variable.

**Cons:**
- Sensitive to outliers, similar to MSE.
- A few large errors can dominate the value even when most errors are small.
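
As a rough illustration, the sketch below (plain Python, hypothetical data) computes RMSE as the square root of MSE and compares it with MAE on the same predictions. One large error pulls RMSE well above MAE, although both are in the units of the target.

```python
import math

y_true = [10.0, 20.0, 30.0, 40.0]
y_pred = [11.0, 21.0, 31.0, 49.0]  # three errors of 1 and one error of 9

errors = [t - p for t, p in zip(y_true, y_pred)]

mae = sum(abs(e) for e in errors) / len(errors)  # 3.0
mse = sum(e ** 2 for e in errors) / len(errors)  # 21.0
rmse = math.sqrt(mse)                            # about 4.58

print(f"MAE: {mae:.2f}, RMSE: {rmse:.2f}")
```

RMSE is never smaller than MAE on the same data, and the gap widens as the errors become more uneven.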

### R-Squared (R²)
**Pros:**
- Provides an indication of how well future samples are likely to be predicted by the model, provided it is computed on held-out data rather than on the training set.
- Indicates the proportion of variance in the dependent variable that is predictable from the independent variable(s).

**Cons:**
- Can be misleading when predictors are added: on the training data, R² never decreases as predictors are added, regardless of their relevance.
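
The effect of adding a predictor can be sketched with synthetic data. The example below is an illustration under stated assumptions, not a general proof: it uses plain Python, a hand-written least-squares fit (`fit_simple_ols`, a hypothetical helper), and a predictor drawn as pure noise. An intercept-only model has a training R² of 0, and adding the noise predictor still does not lower it.

```python
import random

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

def fit_simple_ols(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b * x."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

random.seed(0)
y = [random.gauss(0, 1) for _ in range(30)]
noise = [random.gauss(0, 1) for _ in range(30)]  # unrelated to y by construction

# Model without predictors: always predict the mean of y
mean_y = sum(y) / len(y)
r2_intercept_only = r2(y, [mean_y] * len(y))

# Model with the irrelevant predictor added
a, b = fit_simple_ols(noise, y)
r2_with_noise = r2(y, [a + b * xi for xi in noise])

print(f"intercept only: {r2_intercept_only:.4f}, with noise predictor: {r2_with_noise:.4f}")
```

The second value is measured on the same data used for fitting, which is exactly why it cannot be taken as evidence that the predictor is useful.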

### Adjusted R-Squared
**Pros:**
- Provides a more honest and reliable measure of the model's fit by adjusting for the number of predictors in the model.
- Penalizes adding irrelevant variables, thus providing a better indication of the model’s true explanatory power.

**Cons:**
- Can be less intuitive than R², since it no longer reads directly as a proportion of explained variance.
- Can decrease, or even become negative, when added variables contribute little.
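
A minimal sketch of the usual adjustment, 1 - (1 - R²)(n - 1)/(n - p - 1), where n is the number of samples and p the number of predictors. The function name `adjusted_r2` and the R² values are hypothetical, chosen to show a case where a small gain in R² does not offset the penalty for an extra predictor.

```python
def adjusted_r2(r2, n_samples, n_predictors):
    """Adjusted R² = 1 - (1 - R²) * (n - 1) / (n - p - 1)."""
    dof = n_samples - n_predictors - 1
    if dof <= 0:
        raise ValueError("need n_samples > n_predictors + 1")
    return 1 - (1 - r2) * (n_samples - 1) / dof

# Hypothetical case: a third predictor nudges R² from 0.800 to 0.801 on 20 samples
before = adjusted_r2(0.800, n_samples=20, n_predictors=2)  # about 0.776
after = adjusted_r2(0.801, n_samples=20, n_predictors=3)   # about 0.764

print(f"before: {before:.3f}, after: {after:.3f}")
```

The guard on the degrees of freedom matters in practice: passing the number of samples where the number of predictors is expected makes the denominator zero or negative and produces meaningless values.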

### Code Examples


    from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score

    # Assume y_true and y_pred are the actual and predicted values respectively
    y_true = [3, -0.5, 2, 7]
    y_pred = [2.5, 0.0, 2, 8]

    # Number of samples, and number of predictors in the fitted model
    # (the predictor count is assumed to be 1 for this example)
    n_samples = len(y_true)
    n_predictors = 1

    mse = mean_squared_error(y_true, y_pred)
    mae = mean_absolute_error(y_true, y_pred)
    rmse = mse ** 0.5  # RMSE is the square root of MSE
    r2 = r2_score(y_true, y_pred)
    # Adjusted R² = 1 - (1 - R²) * (n - 1) / (n - p - 1)
    adjusted_r2 = 1 - (1 - r2) * (n_samples - 1) / (n_samples - n_predictors - 1)

    print(f"MSE: {mse:.4f}")
    print(f"MAE: {mae:.4f}")
    print(f"RMSE: {rmse:.4f}")
    print(f"R²: {r2:.4f}")
    print(f"Adjusted R²: {adjusted_r2:.4f}")

Output:

    MSE: 0.3750
    MAE: 0.5000
    RMSE: 0.6124
    R²: 0.9486
    Adjusted R²: 0.9229


## Scenarios for Using Different Evaluation Metrics

### Mean Squared Error (MSE)
**Use Case:**
- When large errors are disproportionately costly and the data contains few outliers.
- When the error magnitudes are important and not just their directions.

### Mean Absolute Error (MAE)
**Use Case:**
- When you want a measure of average absolute deviation from the true values, which is easier to interpret.
- In applications where the scale of the target variable is meaningful.

### Root Mean Squared Error (RMSE)
**Use Case:**
- Similar to MSE but more interpretable, as it is in the same units as the target variable.
- Suitable when large errors are particularly undesirable and should be penalized heavily.

### R-Squared (R²)
**Use Case:**
- When you want to assess how well a model explains the variability of the outcome data around its mean.
- For comparing models that predict continuous outcomes on different scales, since R² is unitless.
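
The scale point can be checked with a short sketch (plain Python; the helper names are hypothetical). Multiplying the targets and the predictions by 1000, as when switching units, multiplies RMSE by 1000 but leaves R² unchanged up to floating-point rounding.

```python
import math

def rmse(y_true, y_pred):
    """Square root of the mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

y_true = [3.0, -0.5, 2.0, 7.0]
y_pred = [2.5, 0.0, 2.0, 8.0]

# Same data expressed in units that are 1000 times smaller
y_true_scaled = [1000 * v for v in y_true]
y_pred_scaled = [1000 * v for v in y_pred]

rmse_original = rmse(y_true, y_pred)
rmse_scaled = rmse(y_true_scaled, y_pred_scaled)
r2_original = r2(y_true, y_pred)
r2_scaled = r2(y_true_scaled, y_pred_scaled)

print(f"RMSE: {rmse_original:.4f} vs {rmse_scaled:.1f}")
print(f"R²:   {r2_original:.4f} vs {r2_scaled:.4f}")
```

This is what makes R² usable across differently scaled targets, although a unitless score alone does not show whether the errors are acceptable in absolute terms.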

### Adjusted R-Squared
**Use Case:**
- When comparing models with different numbers of predictors and you need a measure of explained variance that accounts for model size.
- To penalize adding irrelevant variables to the model.

## Summary

- **MSE:** Best when large errors are especially costly and outliers are rare.
- **MAE:** Better for interpreting errors in terms of target variable units, and less affected by outliers.
- **RMSE:** Preferred for applications where large errors are particularly undesirable and the result should stay in the target variable's units.
- **R² and Adjusted R²:** Useful for comparing models and assessing explanatory power, especially in multiple regression scenarios.


