# Performance Measures

We need metrics to evaluate our model, and the metrics we use differ between regression and classification tasks.

For regression tasks there are plenty of metrics to choose from. We are going to cover the three you are most likely to encounter:

**MAE: Mean Absolute Error**

**MSE: Mean Squared Error**

**RMSE: Root Mean Squared Error**

--------------------------------------------


![img_01.png](attachment:img_01.png)

## MAE

**Simply put, the MAE is the average absolute difference between the predicted and actual values across the whole test set.**

The metric takes the difference between each predicted value and the corresponding actual value, adds up the absolute values of those differences, and divides the total by the number of observations. Because only the absolute value is used, it doesn't matter whether a prediction is higher or lower than the actual value. **A lower value indicates better accuracy.**

![image017.png](attachment:image017.png)
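Written out in common notation (an assumption; the figure may use different symbols), with $n$ observations, true values $y_i$ and predictions $\hat{y}_i$, and then applied to the 14 values used in the code below, whose absolute errors are 3, 1, 1, 1, 0, 5, 2, 27, 3, 20, 7, 3, 0, 3:

$$
\text{MAE} = \frac{1}{n}\sum_{i=1}^{n} \lvert y_i - \hat{y}_i \rvert = \frac{76}{14} \approx 5.43
$$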

**Keep in mind that the MAE treats all errors equally, while other metrics like the MSE give more weight to larger errors.**
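To see the difference in weighting, here is a small sketch with made-up numbers (not the dataset used below): two sets of predictions for the same four true values, each with a total absolute error of 8. The helper names `mean_abs_err` and `mean_sq_err` are hypothetical, chosen so they don't overwrite the `mae` and `mse` variables used later in the notebook.

```python
import numpy as np

# Hypothetical example data, separate from the arrays used later in this notebook
y_demo = np.array([10.0, 10.0, 10.0, 10.0])
spread_out = np.array([12.0, 8.0, 12.0, 8.0])      # four errors of size 2
concentrated = np.array([10.0, 10.0, 10.0, 18.0])  # one error of size 8

def mean_abs_err(y, y_hat):
    # Average of the absolute errors: every unit of error counts the same
    return np.mean(np.abs(y - y_hat))

def mean_sq_err(y, y_hat):
    # Average of the squared errors: large errors are amplified
    return np.mean((y - y_hat) ** 2)

print(mean_abs_err(y_demo, spread_out), mean_abs_err(y_demo, concentrated))  # 2.0 2.0
print(mean_sq_err(y_demo, spread_out), mean_sq_err(y_demo, concentrated))    # 4.0 16.0
```

Both sets miss by 8 in total, so the MAE cannot tell them apart, yet the MSE of the concentrated set is four times larger.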

##### Using NumPy

In [None]:
import numpy as np

# Define some arrays to act as the true values and our predictions
y_true = [40, 20, 30, 20, 25, 15, 4, 77, 60, 93, 56, 44, 23, 5]
y_predicted = [37, 21, 29, 19, 25, 20, 2, 50, 57, 73, 49, 41, 23, 8]

# Convert the Python lists to NumPy arrays so we can perform element-wise operations
y_true = np.array(y_true)
y_predicted = np.array(y_predicted)

# Take your time to understand this line: it is a direct implementation of the equation above
mae = np.sum(np.abs(y_true - y_predicted)) / len(y_true)

print(f'Mean Absolute Error MAE = {mae}')

Mean Absolute Error MAE = 5.428571428571429


##### Fortunately, we don't have to do this every time: scikit-learn has functions that compute these metrics directly

In [None]:
from sklearn.metrics import mean_absolute_error

mae = mean_absolute_error(y_true, y_predicted)


print(f'Mean Absolute Error MAE = {mae}')

Mean Absolute Error MAE = 5.428571428571429


## MSE

**The MSE squares each error, so a difference of 2 becomes 4 and a difference of 3 becomes 9.**

Because of the squaring, bigger errors receive disproportionately more weight. The squared errors are then added up and averaged.
**As before, the lower the number, the better.**

![image003-1.png](attachment:image003-1.png)
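In the same notation as the MAE formula, and applied to the data from the NumPy example above (squared errors 9, 1, 1, 1, 0, 25, 4, 729, 9, 400, 49, 9, 0, 9):

$$
\text{MSE} = \frac{1}{n}\sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \frac{1246}{14} = 89
$$

The two largest errors (27 and 20) contribute $729 + 400 = 1129$ of the total 1246, which shows how strongly squaring favors big misses.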

**One disadvantage of this metric is that it is not in the same units as our dependent variable (the target feature): it is in squared units. This makes the MSE hard to interpret and to compare with metrics such as the MAE.**
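A quick way to see the units problem is to express the same data in a different unit. In this sketch we assume, purely for illustration, that the notebook's values are in metres and convert them to centimetres (multiplying by 100). The MAE scales by 100, just like the data, while the MSE scales by 100² = 10,000.

```python
import numpy as np

# Values from this notebook; treating them as metres is an assumption for illustration
y_true_m = np.array([40, 20, 30, 20, 25, 15, 4, 77, 60, 93, 56, 44, 23, 5], dtype=float)
y_pred_m = np.array([37, 21, 29, 19, 25, 20, 2, 50, 57, 73, 49, 41, 23, 8], dtype=float)

# The same data expressed in centimetres
y_true_cm = y_true_m * 100
y_pred_cm = y_pred_m * 100

mae_m = np.mean(np.abs(y_true_m - y_pred_m))
mae_cm = np.mean(np.abs(y_true_cm - y_pred_cm))
mse_m = np.mean((y_true_m - y_pred_m) ** 2)
mse_cm = np.mean((y_true_cm - y_pred_cm) ** 2)

print(mae_cm / mae_m)  # about 100: MAE is in the same units as the target
print(mse_cm / mse_m)  # about 10000: MSE is in squared units
```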

In [None]:
from sklearn.metrics import mean_squared_error

mse = mean_squared_error(y_true, y_predicted)

# Remember, this value is in squared units, and squaring lets the two largest errors (27 and 20) dominate; that's why it is much greater than the MAE
print(f'Mean Squared Error MSE = {mse}')

Mean Squared Error MSE = 89.0


## RMSE

**The RMSE is simply the square root of the MSE.**

Taking the square root brings the metric back to the same units as the value being predicted.

![13.-RMSE-formula.png](attachment:13.-RMSE-formula.png)
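In the same notation as before, and applied to the MSE of 89 computed above:

$$
\text{RMSE} = \sqrt{\text{MSE}} = \sqrt{\frac{1}{n}\sum_{i=1}^{n} (y_i - \hat{y}_i)^2} = \sqrt{89} \approx 9.43
$$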

##### Older versions of scikit-learn have no function that computes the RMSE directly (version 1.4 added `root_mean_squared_error`), but we can always find it by taking the square root of the MSE:

In [None]:
rmse = np.sqrt(mse)

print(f'Root Mean Squared Error RMSE = {rmse}')

Root Mean Squared Error RMSE = 9.433981132056603
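For symmetry with the NumPy MAE implementation earlier, here is a minimal sketch that computes the RMSE straight from its formula without scikit-learn. The arrays are repeated so the cell runs on its own; it should print the same value as the cell above.

```python
import numpy as np

y_true = np.array([40, 20, 30, 20, 25, 15, 4, 77, 60, 93, 56, 44, 23, 5])
y_predicted = np.array([37, 21, 29, 19, 25, 20, 2, 50, 57, 73, 49, 41, 23, 8])

# Square each error, average the squares (this is the MSE), then take the square root
errors = y_true - y_predicted
rmse_manual = np.sqrt(np.mean(errors ** 2))

print(f'Root Mean Squared Error RMSE = {rmse_manual}')
```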


## What is the best metric to use?

There is no single best metric; the choice depends on your problem and on how much you want to penalize large errors.

MAE is a simple way to find out how far, on average, your predictions are from the true values.

MSE and RMSE are useful when you want to check whether outliers are distorting your predictions, because squaring amplifies large errors.

You might then decide to investigate those outliers, and possibly remove them from your dataset altogether. MSE and RMSE make such large errors stand out more than MAE does.

But since the MSE is in squared units, it's usually better to report the RMSE.
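One way to put the outlier check into practice, sketched here as a heuristic rather than a standard procedure, is to compare RMSE with MAE. RMSE is always at least as large as MAE, and the two are equal only when every error has the same magnitude, so a large ratio hints that a few big errors dominate. The helper name `rmse_mae_ratio` is hypothetical.

```python
import numpy as np

def rmse_mae_ratio(y_true, y_pred):
    """Hypothetical helper: RMSE / MAE. Always >= 1; larger values mean the error
    is concentrated in a few large misses. Undefined if every error is zero."""
    errors = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    mae = np.mean(np.abs(errors))
    rmse = np.sqrt(np.mean(errors ** 2))
    return rmse / mae

y_t = np.array([40, 20, 30, 20, 25, 15, 4, 77, 60, 93, 56, 44, 23, 5])
y_p = np.array([37, 21, 29, 19, 25, 20, 2, 50, 57, 73, 49, 41, 23, 8])

# Mask that drops the two largest misses (absolute errors 27 and 20)
keep = np.abs(y_t - y_p) < 20

print(rmse_mae_ratio(y_t, y_p))              # about 1.74 with the two big misses
print(rmse_mae_ratio(y_t[keep], y_p[keep]))  # about 1.29 without them
```

Dropping the two largest misses here is only to show their effect on the ratio; whether to remove such points from a real dataset is a separate decision.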

## How to decide if a certain value for MAE or RMSE is good?

**A rough way to judge this is to compare the error with the mean of the target column.**

Suppose that we have an RMSE of 200, and the mean of our target column is 8000. The error is then about 2.5% of a typical value (200 / 8000 = 0.025), which seems pretty good for many applications.
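The comparison above is simple arithmetic: divide the error metric by the mean of the target. A minimal sketch, where `error_relative_to_mean` is a hypothetical helper name; it only makes sense when the target mean is not close to zero.

```python
import numpy as np

def error_relative_to_mean(metric_value, y_true):
    """Hypothetical helper: express an error metric (MAE or RMSE) as a fraction
    of the target's mean. Meaningless when the mean is close to zero."""
    return metric_value / np.mean(y_true)

# The example from the text: RMSE of 200 against a target mean of 8000
print(error_relative_to_mean(200, [8000]))  # 0.025, i.e. 2.5%

# This notebook's data: RMSE of sqrt(89) against a target mean of 512 / 14
y_t = np.array([40, 20, 30, 20, 25, 15, 4, 77, 60, 93, 56, 44, 23, 5])
print(error_relative_to_mean(np.sqrt(89), y_t))  # about 0.26
```

For this notebook's toy data the RMSE is roughly 26% of the target mean, a much weaker result than the 2.5% in the example.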

# Good Luck