# ML Algorithm Performance Metrics

Let's see how to select and use different ML performance metrics in Python with *scikit-learn*. 

## Algorithm Evaluation Metrics

We will look at various algorithm evaluation metrics and demonstrate them on both classification and regression ML problems.

This section focuses on how to evaluate and compare the algorithms themselves, so we need more than one model and, to fit them, more than one input dataset:

* For CLASSIFICATION metrics, the Pima Indians onset of diabetes dataset is used for demonstration. This is a binary classification problem where all of the input variables are numeric.

* For REGRESSION metrics, we introduce the Boston House Price dataset and use it for demonstration. This is a regression problem where all of the input variables are also numeric.

All recipes evaluate the same algorithms, ***Logistic Regression*** for classification and ***Linear Regression*** for the regression problems. A 10-fold CV test harness is used to demonstrate each metric.

More about ML algorithm performance metrics supported by scikit-learn can be found [here](http://scikit-learn.org/stable/modules/model_evaluation.html) on the page "Model evaluation: quantifying the quality of predictions". 

# CLASSIFICATION Metrics

We will review how to use the following metrics:

* [CLAS-1] Classification Accuracy
* [CLAS-2] Logarithmic Loss
* [CLAS-3] Area Under ROC Curve
* [CLAS-4] Confusion Matrix
* [CLAS-5] Classification Report

## [CLAS-1] Classification Accuracy

In [1]:
from pandas import read_csv
from sklearn.model_selection import KFold
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LogisticRegression

In [2]:
filename = 'pima-indians-diabetes.data.csv'
names = ['preg', 'plas', 'pres', 'skin', 'test', 'mass', 'pedi', 'age', 'class']
dataframe = read_csv(filename, names=names)
array = dataframe.values
X = array[:,0:8]
Y = array[:,8]

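# No shuffling, so the folds are deterministic. Recent scikit-learn versions
# raise an error if random_state is set while shuffle is left at False.
kfold = KFold(n_splits=10)
# 'liblinear' was the default solver before scikit-learn 0.22; set it explicitly.
model = LogisticRegression(solver='liblinear')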

# Cross Validation Classification Accuracy
scoring = 'accuracy'                                             # <--- 
results = cross_val_score(model, X, Y, cv=kfold, scoring=scoring)
print("Accuracy: %.3f (%.3f)" % (results.mean(), results.std()))

Accuracy: 0.770 (0.048)
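Classification accuracy is simply the fraction of predictions that match the true labels. The sketch below computes it by hand on a small set of made-up labels (the `accuracy` helper and the toy lists are illustrative assumptions, not part of the recipe above), to show what the `'accuracy'` scorer measures on each fold.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that equal the true label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)

# Made-up labels for illustration only
y_true = [0, 1, 1, 0, 1, 0, 0, 1]
y_pred = [0, 1, 0, 0, 1, 1, 0, 1]

print("Accuracy: %.3f" % accuracy(y_true, y_pred))
```

Accuracy is easiest to interpret when the classes are roughly balanced; on a skewed dataset a model can score well just by predicting the majority class.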


## [CLAS-2] Logarithmic Loss

In [3]:
# Cross Validation Classification LogLoss
scoring = 'neg_log_loss'
results = cross_val_score(model, X, Y, cv=kfold, scoring=scoring)
print("Logloss: %.3f (%.3f)" % (results.mean(), results.std()))

Logloss: -0.493 (0.047)
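The value is negative because the `'neg_log_loss'` scorer reports the negated log loss, so that larger is better like every other scorer; the underlying log loss here is 0.493. As a rough sketch of what is being averaged, the helper below computes binary log loss by hand on made-up probabilities (the function name, the clipping constant `eps` and the toy data are assumptions for illustration).

```python
import math

def binary_log_loss(y_true, p_pred, eps=1e-15):
    """Mean negative log-likelihood of the true labels under the predicted
    probabilities of class 1."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)   # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)

# Made-up labels and predicted probabilities of class 1
y_true = [1, 0, 1, 0]
p_pred = [0.9, 0.1, 0.6, 0.4]

loss = binary_log_loss(y_true, p_pred)
print("Logloss: %.3f" % loss)
print("Negated (scorer convention): %.3f" % -loss)
```

Log loss punishes confident wrong predictions heavily: a probability of 0.01 for a true positive costs far more than a probability of 0.4.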


## [CLAS-3] Area Under ROC Curve

In [4]:
# Cross Validation Classification ROC AUC
scoring = 'roc_auc'
results = cross_val_score(model, X, Y, cv=kfold, scoring=scoring)
print("AUC: %.3f (%.3f)" % (results.mean(), results.std()))

AUC: 0.824 (0.041)
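One way to read AUC is as the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one. The sketch below computes it that way by comparing every positive/negative pair; it is a slow illustrative version on made-up scores, not how scikit-learn computes it internally.

```python
def roc_auc_pairwise(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.
    Ties count as half a correct pair."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Made-up labels and classifier scores
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]

print("AUC: %.3f" % roc_auc_pairwise(y_true, scores))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to a perfect ranking, so the 0.824 above indicates a reasonably good separation of the two classes.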


## [CLAS-4] Confusion Matrix

In [5]:
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix              # <---

In [6]:
test_size = 0.33
seed = 7

# Confusion matrix on a single train/test split (not cross-validation)
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=test_size,
    random_state=seed)
model = LogisticRegression(solver='liblinear')   # default solver before scikit-learn 0.22
model.fit(X_train, Y_train)
predicted = model.predict(X_test)
matrix = confusion_matrix(Y_test, predicted)              # <---
print(matrix)

[[141  21]
 [ 41  51]]
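In scikit-learn's layout the rows are the true classes and the columns are the predicted classes, so the matrix above reads as 141 true negatives, 21 false positives, 41 false negatives and 51 true positives. The sketch below builds such a matrix by hand for made-up binary labels and then recovers the accuracy from the printed counts (the helper name and toy labels are illustrative assumptions).

```python
def confusion_matrix_2x2(y_true, y_pred):
    """Rows index the true class, columns the predicted class (labels 0 and 1)."""
    matrix = [[0, 0], [0, 0]]
    for t, p in zip(y_true, y_pred):
        matrix[int(t)][int(p)] += 1
    return matrix

# Made-up labels for illustration only
y_true = [0, 0, 0, 1, 1, 1, 1, 0]
y_pred = [0, 1, 0, 1, 0, 1, 1, 0]
print(confusion_matrix_2x2(y_true, y_pred))

# Counts read from the matrix printed above
tn, fp, fn, tp = 141, 21, 41, 51
accuracy_from_matrix = (tn + tp) / (tn + fp + fn + tp)
print("Accuracy from the matrix: %.3f" % accuracy_from_matrix)
```

The diagonal holds the correct predictions, so accuracy is the diagonal sum divided by the total number of test samples.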


## [CLAS-5] Classification Report

In [7]:
from sklearn.metrics import classification_report               # <---

In [9]:
test_size = 0.33
seed = 7

# Classification report on a single train/test split (not cross-validation)
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=test_size,
    random_state=seed)
model = LogisticRegression(solver='liblinear')   # default solver before scikit-learn 0.22
model.fit(X_train, Y_train)
predicted = model.predict(X_test)
report = classification_report(Y_test, predicted)               # <---
print(report)                  

             precision    recall  f1-score   support

        0.0       0.77      0.87      0.82       162
        1.0       0.71      0.55      0.62        92

avg / total       0.75      0.76      0.75       254
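The per-class rows of the report can be derived from the confusion matrix of the same split shown in [CLAS-4]. The sketch below does that arithmetic by hand (the helper name is an illustrative assumption, and it assumes none of the denominators is zero).

```python
def precision_recall_f1(tp, fp, fn):
    """Precision, recall and F1 for one class from its hit/miss counts."""
    precision = tp / (tp + fp)   # of everything predicted as this class, how much was right
    recall = tp / (tp + fn)      # of everything truly in this class, how much was found
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Counts read from the confusion matrix in [CLAS-4]
tn, fp, fn, tp = 141, 21, 41, 51

# Class 1 is the "positive" class
class1 = precision_recall_f1(tp=tp, fp=fp, fn=fn)
# For class 0 the roles swap: true negatives are its hits,
# false negatives are its false alarms, false positives are its misses
class0 = precision_recall_f1(tp=tn, fp=fn, fn=fp)

print("class 0: precision=%.2f recall=%.2f f1=%.2f" % class0)
print("class 1: precision=%.2f recall=%.2f f1=%.2f" % class1)
```

The `support` column is just the number of true samples per class (162 and 92), and the last row of the report averages the per-class scores weighted by that support.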



# REGRESSION Metrics

Here we will review 3 of the most common metrics for evaluating predictions on regression ML problems:
* [REGR-1] Mean Absolute Error
* [REGR-2] Mean Squared Error
* [REGR-3] $R^2$

In the regression examples, we will use the Boston house price dataset, which you should download from the link below and save as `housing.data.csv`, the filename used in the code:
* https://archive.ics.uci.edu/ml/machine-learning-databases/housing/housing.data


## [REGR-1] Mean Absolute Error

In [10]:
from pandas import read_csv
from sklearn.model_selection import KFold
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LinearRegression

In [11]:
# input data
filename = 'housing.data.csv'
names=['CRIM', 'ZN', 'INDUS', 'CHAS', 'NOX', 'RM', 'AGE', 'DIS', 'RAD', 'TAX', 'PTRATIO',
'B', 'LSTAT', 'MEDV']
# sep=r'\s+' splits on runs of whitespace (delim_whitespace is deprecated in recent pandas)
dataframe = read_csv(filename, sep=r'\s+', names=names)
array = dataframe.values
X = array[:,0:13]
Y = array[:,13]

In [12]:
# Cross Validation Regression MAE
kfold = KFold(n_splits=10)   # no shuffling; random_state is only valid with shuffle=True
model = LinearRegression()
scoring = 'neg_mean_absolute_error'                                  # <---
results = cross_val_score(model, X, Y, cv=kfold, scoring=scoring)
print("MAE: %.3f (%.3f)" % (results.mean(), results.std()))

MAE: -4.005 (2.084)
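As with log loss, the `'neg_mean_absolute_error'` scorer negates the metric so that larger is better; the MAE itself is 4.005. A minimal hand computation on made-up values (the helper name and toy data are illustrative assumptions) shows what is being averaged:

```python
def mean_absolute_error(y_true, y_pred):
    """Average of the absolute differences between targets and predictions."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

# Made-up targets and predictions
y_true = [3.0, -0.5, 2.0, 7.0]
y_pred = [2.5, 0.0, 2.0, 8.0]

mae = mean_absolute_error(y_true, y_pred)
print("MAE: %.3f" % mae)
print("Negated (scorer convention): %.3f" % -mae)
```

MAE is expressed in the same units as the target variable, and it says how far off the predictions are on average without indicating the direction of the error.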


## [REGR-2] Mean Squared Error

In [14]:
kfold = KFold(n_splits=10)   # no shuffling; random_state is only valid with shuffle=True
model = LinearRegression()
scoring = 'neg_mean_squared_error'
results = cross_val_score(model, X, Y, cv=kfold, scoring=scoring)
print("MSE: %.3f (%.3f)" % (results.mean(), results.std()))

MSE: -34.705 (45.574)
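Here too the scorer reports the negated value, so the MSE is 34.705. Because the errors are squared, the result is in squared units of the target; taking the square root (RMSE) brings it back to the original units, and the square root of 34.705 is roughly 5.89. The sketch below shows both computations on made-up values (the helper name and toy data are illustrative assumptions).

```python
import math

def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

# Made-up targets and predictions
y_true = [3.0, -0.5, 2.0, 7.0]
y_pred = [2.5, 0.0, 2.0, 8.0]

mse = mean_squared_error(y_true, y_pred)
rmse = math.sqrt(mse)
print("MSE:  %.3f" % mse)
print("RMSE: %.3f" % rmse)
```

Squaring makes large errors dominate the score, which is one plausible reason for the large spread across folds in the output above.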


## [REGR-3] $R^2$ metric

In [15]:
# Cross Validation Regression R^2
kfold = KFold(n_splits=10)   # no shuffling; random_state is only valid with shuffle=True
model = LinearRegression()
scoring = 'r2'                                                     # <---
results = cross_val_score(model, X, Y, cv=kfold, scoring=scoring)
print("R^2: %.3f (%.3f)" % (results.mean(), results.std()))

R^2: 0.203 (0.595)
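$R^2$ compares the model's squared error with that of a baseline that always predicts the mean of the targets: 1 is a perfect fit, 0 is no better than the mean, and the value becomes negative when the model is worse than the mean. The sketch below computes it by hand on made-up values (the helper name and toy data are illustrative assumptions).

```python
def r2_score_manual(y_true, y_pred):
    """1 - (residual sum of squares / total sum of squares)."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Made-up targets with reasonably good predictions
good = r2_score_manual([3.0, -0.5, 2.0, 7.0], [2.5, 0.0, 2.0, 8.0])

# Predictions that are worse than always predicting the mean
bad = r2_score_manual([1.0, 2.0, 3.0], [3.0, 2.0, 1.0])

print("R^2 (good fit): %.3f" % good)
print("R^2 (worse than the mean): %.3f" % bad)
```

Since individual folds can score below zero, a low mean with a large standard deviation, as in the output above, may simply reflect a few folds where the model fits poorly.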


## Summary

What we did:

* We showed metrics you can use to evaluate your ML algorithms and how to use them in scikit-learn: 3 classification metrics (Accuracy, LogLoss and AUC), 2 convenience methods for classification prediction results (Confusion Matrix and Classification Report), and 3 metrics for regression problems (MAE, MSE and $R^2$). There are more, of course!

## What's next 

We know how to evaluate the performance of ML algorithms using various metrics. We know how to use those metrics to estimate the performance of algorithms on new unseen data using resampling. It is time to start looking at the ML algorithms themselves, starting with CLASSIFICATION techniques.