## Regression Models Evaluation

In [1]:
# import numpy
import numpy as np

In [2]:
# We will generate 10 observations (y_true) and 10 predictions (y_pred) from a model.

# generate 'ground truth'
y_true = np.random.normal(0,1,10)

# generate random errors
errors = np.random.normal(0,0.02,10)

# simulate predictions
y_pred = y_true + errors

### Mean Squared Error (MSE) / Root Mean Squared Error (RMSE)


In [3]:
# import MSE from sklearn
from sklearn.metrics import mean_squared_error

# compute MSE
MSE = mean_squared_error(y_true,y_pred)  

# print MSE
print(MSE)

0.00046649956601397993


All regression metrics in sklearn.metrics take two mandatory array arguments: the ground-truth values first (here, y_true) and the predictions second (here, y_pred).

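#### Root Mean Squared Error (RMSE)
To get RMSE from MSE we have two options: take the square root of MSE with NumPy, or pass squared=False to mean_squared_error. Note that the squared parameter has been deprecated in newer scikit-learn releases, which provide a separate root_mean_squared_error function, so the second option may warn or fail depending on the installed version.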

In [4]:
# RMSE by Numpy
RMSE = np.sqrt(MSE)
print(RMSE)

# RMSE by sklearn
RMSE = mean_squared_error(y_true,y_pred,squared=False)
print(RMSE)

0.02159860101983413
0.02159860101983413
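Both metrics can also be checked by hand: MSE is the mean of the squared differences between ground truth and prediction, and RMSE is its square root. The following is a minimal sketch in plain Python; toy_true and toy_pred are hypothetical values chosen for illustration, not the random arrays generated above.

```python
import math

# Hypothetical toy values, chosen only for illustration
toy_true = [3.0, -0.5, 2.0, 7.0]
toy_pred = [2.5, 0.0, 2.0, 8.0]

# MSE: average of the squared residuals
squared_errors = [(t - p) ** 2 for t, p in zip(toy_true, toy_pred)]
mse_manual = sum(squared_errors) / len(squared_errors)

# RMSE: square root of MSE, expressed in the same units as the target
rmse_manual = math.sqrt(mse_manual)

print(mse_manual)   # 0.375
print(rmse_manual)
```

Because the residuals are squared, swapping the two lists gives the same result, but keeping the ground truth first matches the sklearn convention described above.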


## Classification Models Evaluation
We will follow the same approach as in the regression section and use synthetic data. The only difference is that we simulate predicted probabilities and then derive the class labels from them, instead of simulating the predictions directly. Note that we are simulating a binary classifier: the predicted class is either positive (1) or negative (0).

In [6]:
# ground truth
y_true = [1,1,0,1,0,0,1,0,0,1]

# simulate probabilities of the positive class
# (the predicted probability that each observation belongs to the positive class)
y_proba = [0.9,0.7,0.2,0.99,0.7,0.1,0.5,0.2,0.4,0.6]

# set the threshold to predict positive class
thres = 0.5

# class predictions
# (all observations with probabilities strictly above this threshold are assigned to the positive class)
y_pred = [int(value > thres) for value in y_proba]
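The choice of threshold directly controls how many observations are labeled positive. As a rough sketch, the snippet below applies three thresholds to the same simulated probabilities; predict_labels is a hypothetical helper introduced here for illustration. Because the comparison is strict, an observation with probability exactly 0.5 is labeled negative at a threshold of 0.5.

```python
# Same simulated probabilities as in the cell above
probs = [0.9, 0.7, 0.2, 0.99, 0.7, 0.1, 0.5, 0.2, 0.4, 0.6]

def predict_labels(probabilities, threshold):
    """Assign the positive class (1) to probabilities strictly above the threshold."""
    return [int(p > threshold) for p in probabilities]

# Lower thresholds label more observations as positive, higher ones fewer
for threshold in (0.3, 0.5, 0.7):
    labels = predict_labels(probs, threshold)
    print(threshold, labels, "positives:", sum(labels))
```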

### Accuracy

In [7]:
# import accuracy_score from sklearn
from sklearn.metrics import accuracy_score

# compute accuracy
accuracy = accuracy_score(y_true,y_pred)

# print accuracy
print(accuracy)

0.8
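Accuracy is simply the share of observations whose predicted label matches the ground truth. A minimal plain-Python sketch that reproduces the value above, with the labels written out explicitly (labels_pred is y_proba thresholded at 0.5):

```python
# Ground truth and thresholded predictions from the cells above
labels_true = [1, 1, 0, 1, 0, 0, 1, 0, 0, 1]
labels_pred = [1, 1, 0, 1, 1, 0, 0, 0, 0, 1]

# Count matching positions and divide by the number of observations
correct = sum(int(t == p) for t, p in zip(labels_true, labels_pred))
accuracy_manual = correct / len(labels_true)

print(correct)          # 8
print(accuracy_manual)  # 0.8
```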


### F1-score

In [8]:
# import f1_score from sklearn
from sklearn.metrics import f1_score

# compute F1-score
# (stored as f1 so the imported f1_score function is not overwritten)
f1 = f1_score(y_true,y_pred)

# print F1-score
print(f1)

0.8000000000000002
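The F1-score is the harmonic mean of precision and recall, both of which follow from the counts of true positives, false positives, and false negatives. The sketch below recomputes it by hand in plain Python for the same labels; the variable names are illustrative and not part of sklearn.

```python
# Ground truth and thresholded predictions from the cells above
labels_true = [1, 1, 0, 1, 0, 0, 1, 0, 0, 1]
labels_pred = [1, 1, 0, 1, 1, 0, 0, 0, 0, 1]

pairs = list(zip(labels_true, labels_pred))
tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

precision = tp / (tp + fp)  # share of predicted positives that are correct
recall = tp / (tp + fn)     # share of actual positives that were found

# Harmonic mean of precision and recall
f1_manual = 2 * precision * recall / (precision + recall)

print(tp, fp, fn)  # 4 1 1
print(f1_manual)   # approximately 0.8
```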


### AUC-score
NOTE: roc_auc_score expects scores or probabilities (y_proba), not class labels. Passing class labels raises no error, but the result is then the AUC of the thresholded predictions, which generally differs from the AUC of the underlying probabilities. Read the documentation before using sklearn functions.

In [None]:
# import roc_auc_score from sklearn
from sklearn.metrics import roc_auc_score

# compute AUC-score
auc = roc_auc_score(y_true,y_proba)

# print AUC-score
print(auc)
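One way to see why probabilities matter is the pairwise interpretation of AUC: the fraction of (positive, negative) pairs in which the positive observation receives the higher score, with ties counted as half. The sketch below implements that interpretation in plain Python; pairwise_auc is a hypothetical helper, not an sklearn function, although its result should agree with roc_auc_score up to floating-point error.

```python
def pairwise_auc(labels, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count as half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Same data as in the cells above
labels_true = [1, 1, 0, 1, 0, 0, 1, 0, 0, 1]
probs = [0.9, 0.7, 0.2, 0.99, 0.7, 0.1, 0.5, 0.2, 0.4, 0.6]
hard_labels = [int(p > 0.5) for p in probs]

auc_from_probs = pairwise_auc(labels_true, probs)         # uses the full ranking
auc_from_labels = pairwise_auc(labels_true, hard_labels)  # ranking collapsed to 0/1

print(auc_from_probs)   # 0.9
print(auc_from_labels)  # 0.8
```

Thresholding collapses the ranking into two groups, so most pairs within a group become ties, and the score drops from 0.9 to 0.8 for this data.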