## Model Evaluation in Regression Models
To build a model that accurately predicts an unknown case, we have to evaluate the regression model after building it. This section covers two evaluation approaches:

* train and test on the same dataset
* train/test split

It then covers the regression evaluation metrics.


## Train and test on the same dataset

In [None]:
import pandas as pd
df = pd.read_csv("../data/FuelConsumption.csv")
cdf = df[['ENGINESIZE','CYLINDERS','FUELCONSUMPTION_COMB','CO2EMISSIONS']]

In [None]:
cdf.head(10)

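In this approach we use the entire dataset for training. For example, assume that we have 10 records in our dataset. After training, we select a small portion of that same dataset, such as rows 6 to 9, and pass only its independent variables to the model to get predictions.

The labels of those rows are called the "actual values" of the test set. We compare them with the predicted values to measure the model's accuracy.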

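Training and testing on the same dataset tends to give:

* High "training accuracy". A high training accuracy isn't necessarily a good thing, because it can mean the model over-fits the training data.

* Low "out-of-sample accuracy". The model has already seen the test rows, so the result says little about how it performs on new data.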

It's important that our models have high out-of-sample accuracy.
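
The gap between training accuracy and out-of-sample accuracy can be illustrated with a small sketch. The data here is synthetic (it is not the FuelConsumption data), and the names `x_seen`, `x_unseen`, and `mae` are made up for this illustration. A degree-9 polynomial is fitted to 10 points, so it reproduces the seen points almost exactly, but it does much worse on points it has not seen.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: a noisy linear relationship (not the FuelConsumption data).
x = np.linspace(0.0, 1.0, 20)
y = 2.0 * x + 1.0 + rng.normal(scale=0.2, size=x.size)

# Hold out every other point so the model never sees it during fitting.
x_seen, y_seen = x[::2], y[::2]
x_unseen, y_unseen = x[1::2], y[1::2]


def mae(y_true, y_pred):
    """Mean absolute error between two arrays."""
    return float(np.mean(np.abs(y_true - y_pred)))


# A degree-9 polynomial can pass through all 10 seen points (over-fit).
coeffs = np.polyfit(x_seen, y_seen, deg=9)

train_mae = mae(y_seen, np.polyval(coeffs, x_seen))
out_of_sample_mae = mae(y_unseen, np.polyval(coeffs, x_unseen))

print("MAE on the data used for training:", train_mae)
print("MAE on unseen data:", out_of_sample_mae)
```

The error on the training points is close to zero, which looks excellent, while the error on the unseen points is far larger. That is why a high training accuracy alone is not a reliable signal.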

## Train/test split
* The training and testing sets are mutually exclusive.
* It gives a more accurate evaluation of out-of-sample accuracy, because the testing dataset is not part of the training dataset.
* It is more realistic for real-world problems.
* The result is highly dependent on which rows end up in the training set and which in the testing set.

K-fold cross-validation is another evaluation approach that addresses this dependency by averaging the results over several different splits.
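
A minimal sketch of both approaches with scikit-learn follows. The data is synthetic and only stands in for one feature and a target; the 80/20 split, the 5 folds, and the use of `LinearRegression` are assumptions chosen for illustration, not settings taken from this notebook.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import KFold, cross_val_score, train_test_split

rng = np.random.default_rng(0)

# Hypothetical data standing in for one feature and a target.
X = rng.uniform(1.0, 6.0, size=(100, 1))
y = 40.0 * X[:, 0] + 120.0 + rng.normal(scale=15.0, size=100)

# Train/test split: 80% of rows for training, 20% held out for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)
model = LinearRegression().fit(X_train, y_train)
split_mae = mean_absolute_error(y_test, model.predict(X_test))
print("Train/test split MAE:", split_mae)

# K-fold cross-validation: every row is used for testing exactly once.
cv = KFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(
    LinearRegression(), X, y, cv=cv, scoring="neg_mean_absolute_error"
)
fold_maes = -scores  # scikit-learn negates MAE so that higher is better
print("MAE per fold:", fold_maes)
print("Mean MAE over folds:", fold_maes.mean())
```

Changing `random_state` in `train_test_split` changes the single-split MAE, which shows the dependency on the split. The mean over folds is usually a steadier estimate.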

# Evaluation Metrics in Regression

## Mean Absolute Error (MAE)

In [None]:
import numpy as np

expected = [1.0] * 11

predicted = [round(1.0 - i * 0.1, 1) for i in range(11)]

In [None]:
print("real values:", expected)
print("predicted values:", predicted)

In [None]:
from sklearn.metrics import mean_absolute_error

mae = mean_absolute_error(expected, predicted)
print("Mean Absolute Error:", mae)

In [None]:
# Absolute error of each prediction: |expected - predicted|
errors = []
for e, p in zip(expected, predicted):
    error = abs(e - p)
    errors.append(error)
    print(f"|{e} - {p}| = {error:.2f}")

In [None]:
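from matplotlib import pyplot
# Plot the absolute error against each predicted value
pyplot.xticks(ticks=range(len(errors)), labels=predicted)
pyplot.plot(errors)
pyplot.xlabel("Predicted value")
pyplot.ylabel("Absolute error")
pyplot.show()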

## Mean Squared Error (MSE)

In [None]:
from sklearn.metrics import mean_squared_error

mse = mean_squared_error(expected, predicted)
print("MSE:", mse)

In [None]:
# Squared error of each prediction: (expected - predicted)^2
errors = []
for e, p in zip(expected, predicted):
    error = (e - p) ** 2
    errors.append(error)
    print(f"({e} - {p})^2 = {error:.2f}")

In [None]:
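from matplotlib import pyplot
# Plot the squared error against each predicted value
pyplot.xticks(ticks=range(len(errors)), labels=predicted)
pyplot.plot(errors)
pyplot.xlabel("Predicted value")
pyplot.ylabel("Squared error")
pyplot.show()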


## Root Mean Squared Error (RMSE)

RMSE is the square root of MSE, so it is interpretable in the same units as the response.

In [None]:
rmse = np.sqrt(mean_squared_error(expected, predicted))
print("RMSE:", rmse)

## Relative Absolute Error (RAE)
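
RAE divides the total absolute error of the model by the total absolute error of a baseline that always predicts the mean of the actual values. A value below 1 means the model beats that baseline. The sketch below is a hand-written helper, not a scikit-learn function, and the sample values are made up. The `expected` list used earlier is constant, which would make the denominator zero, so it is not used here.

```python
import numpy as np


def relative_absolute_error(y_true, y_pred):
    """Sum of |actual - predicted| divided by sum of |actual - mean(actual)|."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    baseline = np.sum(np.abs(y_true - y_true.mean()))
    if baseline == 0:
        raise ValueError("RAE is undefined when all actual values are equal")
    return float(np.sum(np.abs(y_true - y_pred)) / baseline)


# Hypothetical actual and predicted values.
y_true = [2.0, 4.0, 6.0, 8.0]
y_pred = [2.5, 3.5, 6.5, 7.5]

rae = relative_absolute_error(y_true, y_pred)
print("RAE:", rae)  # 2.0 / 8.0 = 0.25
```
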
## Relative Squared Error (RSE)
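
RSE is the squared-error counterpart of RAE: the total squared error of the model divided by the total squared error of a baseline that always predicts the mean of the actual values. The helper below is a hand-written sketch with made-up sample values, not a scikit-learn function.

```python
import numpy as np


def relative_squared_error(y_true, y_pred):
    """Sum of (actual - predicted)^2 divided by sum of (actual - mean(actual))^2."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    baseline = np.sum((y_true - y_true.mean()) ** 2)
    if baseline == 0:
        raise ValueError("RSE is undefined when all actual values are equal")
    return float(np.sum((y_true - y_pred) ** 2) / baseline)


# Hypothetical actual and predicted values.
y_true = [2.0, 4.0, 6.0, 8.0]
y_pred = [2.5, 3.5, 6.5, 7.5]

rse = relative_squared_error(y_true, y_pred)
print("RSE:", rse)  # 1.0 / 20.0 = 0.05
```
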
## R-squared (r2) Score

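The higher the r2 score, the better the model fits your data. The best possible score is 1.0, and the score can be negative when the model fits worse than always predicting the mean of the actual values. In the example below `expected` is constant, so its variance is zero and R-squared is not well defined; scikit-learn returns 0.0 in that case.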

In [None]:
from sklearn.metrics import r2_score

r2 = r2_score(expected, predicted)
print("R-squared Score:", r2)
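
R-squared can also be computed by hand as 1 minus the ratio of the residual sum of squares to the total sum of squares. The sketch below checks that against `r2_score` on made-up, non-constant values, and also shows a case where the score is negative. The variable names are chosen only for this illustration.

```python
import numpy as np
from sklearn.metrics import r2_score

# Hypothetical actual values and two sets of predictions.
y_true = np.array([2.0, 4.0, 6.0, 8.0])
y_pred = np.array([2.5, 3.5, 6.5, 7.5])       # close to the actual values
y_pred_bad = np.array([8.0, 6.0, 4.0, 2.0])   # worse than predicting the mean

ss_res = np.sum((y_true - y_pred) ** 2)         # residual sum of squares
ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares
r2_manual = 1.0 - ss_res / ss_tot

print("Manual R-squared:", r2_manual)                   # 1 - 1/20 = 0.95
print("r2_score:        ", r2_score(y_true, y_pred))

r2_bad = r2_score(y_true, y_pred_bad)
print("R-squared for the bad predictions:", r2_bad)     # 1 - 80/20 = -3.0
```
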

https://developer.nvidia.com/blog/a-comprehensive-overview-of-regression-evaluation-metrics/