### 10.1 Algorithm Evaluation Metrics

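This section demonstrates algorithm evaluation metrics for both classification and regression machine learning problems.
One caveat applies to every recipe: performance is reported with the `cross_validation.cross_val_score` function, which returns all scores so that larger is better. Metrics that are naturally minimized, such as logarithmic loss and mean absolute error, are therefore reported as negative values. The recipes import the function from the `sklearn.cross_validation` module, which was deprecated in scikit-learn 0.18 and removed in 0.20 in favor of `sklearn.model_selection`.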
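
To make the sign convention concrete, here is a minimal pure-Python sketch of the idea. It is not scikit-learn's implementation: the helper names `k_fold_indices` and `cross_val_neg_mae` and the mean-predicting baseline "model" are hypothetical, introduced only for illustration. Each fold is held out in turn, an error is computed on it, and the error is negated so that a larger score is better.

```python
def k_fold_indices(n_samples, n_folds):
    """Yield (train_idx, test_idx) for contiguous, unshuffled folds."""
    fold_sizes = [n_samples // n_folds] * n_folds
    for i in range(n_samples % n_folds):
        fold_sizes[i] += 1  # spread the remainder over the first folds
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size


def cross_val_neg_mae(y, n_folds):
    """Score a mean-predicting baseline per fold; return negated MAE values."""
    scores = []
    for train_idx, test_idx in k_fold_indices(len(y), n_folds):
        # "Fit": the baseline predicts the mean of the training targets.
        prediction = sum(y[i] for i in train_idx) / len(train_idx)
        mae = sum(abs(y[i] - prediction) for i in test_idx) / len(test_idx)
        scores.append(-mae)  # negate: an error of 0 becomes the best score
    return scores


y = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
scores = cross_val_neg_mae(y, n_folds=3)
print(scores)
```

Every score is zero or negative, and the fold with the score closest to zero is the one with the smallest error.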


### 10.2 Classification Metrics
- Classification Accuracy. 
- Logarithmic Loss.
- Area Under ROC Curve. 
- Confusion Matrix. 
- Classification Report.

#### 10.2.1 Classification Accuracy
Classification accuracy is the number of correct predictions made as a ratio of all predictions made. 

```python
# Cross Validation Classification Accuracy
import pandas
from sklearn import cross_validation
from sklearn.linear_model import LogisticRegression
url = "http://ftp.ics.uci.edu/pub/machine-learning-databases/pima-indians-diabetes/pima-indians-diabetes.data"
names = ["preg", "plas", "pres", "skin", "test", "mass", "pedi", "age", "class"]
dataframe = pandas.read_csv(url, names=names)
array = dataframe.values
X = array[:,0:8]
Y = array[:,8]
num_folds = 10
num_instances = len(X)
seed = 7
kfold = cross_validation.KFold(n=num_instances, n_folds=num_folds, random_state=seed)
model = LogisticRegression()
scoring = 'accuracy'
results = cross_validation.cross_val_score(model, X, Y, cv=kfold, scoring=scoring)
print("Accuracy: %.3f (%.3f)" % (results.mean(), results.std()))
```

Accuracy: 0.770 (0.048)
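
The ratio itself is simple to compute by hand. The following is a minimal sketch in plain Python, using made-up labels rather than the Pima Indians data; the function name `accuracy` is an illustrative choice, not a scikit-learn API.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# Made-up labels: 6 of the 8 predictions are correct.
y_true = [0, 1, 1, 0, 1, 0, 0, 1]
y_pred = [0, 1, 0, 0, 1, 1, 0, 1]
print("Accuracy: %.3f" % accuracy(y_true, y_pred))
```

With imbalanced classes, a model that always predicts the majority class can still score a high accuracy, which is one reason to look at the other metrics in this section as well.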


#### 10.2.2 Logarithmic Loss
Logarithmic loss (or logloss) is a performance metric for evaluating predicted probabilities of membership in a given class.
The probability, a scalar between 0 and 1, can be seen as a measure of the algorithm's confidence in a prediction. Correct predictions are rewarded and incorrect predictions are punished in proportion to that confidence.

```python
# Cross Validation Classification LogLoss
import pandas
from sklearn import cross_validation
from sklearn.linear_model import LogisticRegression
url = "http://ftp.ics.uci.edu/pub/machine-learning-databases/pima-indians-diabetes/pima-indians-diabetes.data"
names = ["preg", "plas", "pres", "skin", "test", "mass", "pedi", "age", "class"]
dataframe = pandas.read_csv(url, names=names)
array = dataframe.values
X = array[:,0:8]
Y = array[:,8]
num_folds = 10
num_instances = len(X)
seed = 7
kfold = cross_validation.KFold(n=num_instances, n_folds=num_folds, random_state=seed)
model = LogisticRegression()
scoring = 'log_loss'
results = cross_validation.cross_val_score(model, X, Y, cv=kfold, scoring=scoring)
print("Logloss: %.3f (%.3f)" % (results.mean(), results.std()))
```


Logloss: -0.493 (0.047)



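Smaller logloss is better, with 0 representing a perfect logloss. Because `cross_val_score()` negates the metric so that larger is better, the reported -0.493 corresponds to a logloss of 0.493.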
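
The way confidence is rewarded or punished can be seen in a small sketch. It assumes the standard binary log loss formula (the mean negative log-likelihood of the true labels) and uses made-up probabilities. The function below is a plain-Python illustration, not the scikit-learn implementation, and the clipping constant `eps` is an assumption added to avoid taking the log of zero.

```python
import math


def log_loss(y_true, p_pred, eps=1e-15):
    """Mean negative log-likelihood for binary labels (0 or 1).

    p_pred holds the predicted probability of class 1 for each instance.
    """
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return -total / len(y_true)


y_true = [1, 0, 1, 1]
mostly_right = [0.9, 0.1, 0.8, 0.7]
one_confident_miss = [0.9, 0.1, 0.8, 0.01]  # last prediction is confidently wrong

print("logloss: %.3f" % log_loss(y_true, mostly_right))
print("logloss: %.3f" % log_loss(y_true, one_confident_miss))
```

A single confident mistake raises the loss sharply, whereas a prediction of 0.5 for every instance always yields a logloss of ln 2.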

#### 10.2.3 Area Under ROC Curve
The AUC represents a model’s ability to discriminate between positive and negative classes. 
An area of 1.0 represents a model that made all predictions perfectly. An area of 0.5 represents a model that is as good as random. 

```python
# Cross Validation Classification ROC AUC
import pandas
from sklearn import cross_validation
from sklearn.linear_model import LogisticRegression
url = "http://ftp.ics.uci.edu/pub/machine-learning-databases/pima-indians-diabetes/pima-indians-diabetes.data"
names = ["preg", "plas", "pres", "skin", "test", "mass", "pedi", "age", "class"]
dataframe = pandas.read_csv(url, names=names)
array = dataframe.values
X = array[:,0:8]
Y = array[:,8]
num_folds = 10
num_instances = len(X)
seed = 7
kfold = cross_validation.KFold(n=num_instances, n_folds=num_folds, random_state=seed)
model = LogisticRegression()
scoring = 'roc_auc'
results = cross_validation.cross_val_score(model, X, Y, cv=kfold, scoring=scoring)
print("AUC: %.3f (%.3f)" % (results.mean(), results.std()))
```

AUC: 0.824 (0.041)
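
One way to build intuition for the AUC is its ranking interpretation: it equals the probability that a randomly chosen positive instance receives a higher score than a randomly chosen negative one. The sketch below computes it that way in plain Python on made-up scores. The function name `roc_auc` is illustrative, and the pairwise loop is chosen for clarity rather than speed.

```python
def roc_auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a correct pair.
    """
    positives = [s for y, s in zip(y_true, scores) if y == 1]
    negatives = [s for y, s in zip(y_true, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one instance of each class")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]  # made-up predicted probabilities of class 1
print("AUC: %.3f" % roc_auc(y_true, scores))
```

Three of the four positive/negative pairs are ordered correctly here, so the sketch reports 0.75. Giving every instance the same score yields 0.5, which matches the "as good as random" case.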


#### 10.2.4 Confusion Matrix
The confusion matrix is a handy presentation of the accuracy of a model with two or more classes.

```python
# Cross Validation Classification Confusion Matrix
import pandas
from sklearn import cross_validation
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
url = "http://ftp.ics.uci.edu/pub/machine-learning-databases/pima-indians-diabetes/pima-indians-diabetes.data"
names = ["preg", "plas", "pres", "skin", "test", "mass", "pedi", "age", "class"]
dataframe = pandas.read_csv(url, names=names)
array = dataframe.values
X = array[:,0:8]
Y = array[:,8]
test_size = 0.33
seed = 7
X_train, X_test, Y_train, Y_test = cross_validation.train_test_split(X, Y, test_size=test_size, random_state=seed)
model = LogisticRegression()
model.fit(X_train, Y_train)
predicted = model.predict(X_test)
matrix = confusion_matrix(Y_test, predicted)
print(matrix)
```

```
[[141  21]
 [ 41  51]]
```


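Rows correspond to the actual classes and columns to the predicted classes, so the diagonal holds the correct predictions. You can see that the majority of the predictions fall on the diagonal of the matrix.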
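
As a minimal sketch of how such a matrix is tallied, the plain-Python function below counts (actual, predicted) pairs on made-up labels. The name `confusion_matrix_counts` is hypothetical and is not the scikit-learn function. The second half reuses the matrix printed in this section to recover the accuracy from its diagonal.

```python
def confusion_matrix_counts(y_true, y_pred, labels):
    """Return matrix[i][j] = count of actual labels[i] predicted as labels[j]."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


# Made-up labels for illustration.
y_true = [0, 0, 0, 1, 1, 1, 1, 0]
y_pred = [0, 1, 0, 1, 0, 1, 1, 0]
toy_matrix = confusion_matrix_counts(y_true, y_pred, labels=[0, 1])
print(toy_matrix)

# The matrix printed in this section: rows are actual, columns are predicted.
reported = [[141, 21], [41, 51]]
correct = sum(reported[i][i] for i in range(len(reported)))
total = sum(sum(row) for row in reported)
print("Accuracy from the diagonal: %d / %d = %.3f" % (correct, total, correct / total))
```

The off-diagonal cells show where the errors go: in the reported matrix, 41 instances of class 1 were predicted as class 0, against 21 mistakes in the other direction.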

#### 10.2.5 Classification Report
The scikit-learn library provides a convenience report for classification problems that gives a quick idea of the accuracy of a model using a number of measures.
The `classification_report()` function displays the precision, recall, F1-score and support for each class.

```python
# Cross Validation Classification Report
import pandas
from sklearn import cross_validation
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
url = "http://ftp.ics.uci.edu/pub/machine-learning-databases/pima-indians-diabetes/pima-indians-diabetes.data"
names = ["preg", "plas", "pres", "skin", "test", "mass", "pedi", "age", "class"]
dataframe = pandas.read_csv(url, names=names)
array = dataframe.values
X = array[:,0:8]
Y = array[:,8]
test_size = 0.33
seed = 7
X_train, X_test, Y_train, Y_test = cross_validation.train_test_split(X, Y,
    test_size=test_size, random_state=seed)
model = LogisticRegression()
model.fit(X_train, Y_train)
predicted = model.predict(X_test)
report = classification_report(Y_test, predicted)
print(report)
```

```
             precision    recall  f1-score   support

        0.0       0.77      0.87      0.82       162
        1.0       0.71      0.55      0.62        92

avg / total       0.75      0.76      0.75       254
```
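
The per-class figures in the report can be derived from a confusion matrix. The sketch below assumes the standard definitions: precision is the number of true positives divided by all instances predicted as the class, recall is the number of true positives divided by all actual instances of the class, and F1 is their harmonic mean. It applies them to the matrix from section 10.2.4, which uses the same split and seed as this recipe. The helper name `per_class_report` is hypothetical.

```python
def per_class_report(matrix):
    """Return {class_index: (precision, recall, f1, support)}.

    matrix[i][j] counts instances of actual class i predicted as class j.
    """
    n = len(matrix)
    report = {}
    for k in range(n):
        true_pos = matrix[k][k]
        predicted_k = sum(matrix[i][k] for i in range(n))  # column total
        support = sum(matrix[k])                           # row total
        precision = true_pos / predicted_k if predicted_k else 0.0
        recall = true_pos / support if support else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        report[k] = (precision, recall, f1, support)
    return report


# Confusion matrix from section 10.2.4.
matrix = [[141, 21], [41, 51]]
result = per_class_report(matrix)
for k, (precision, recall, f1, support) in result.items():
    print("class %d  precision=%.2f  recall=%.2f  f1=%.2f  support=%d"
          % (k, precision, recall, f1, support))
```

Rounded to two decimals, these values agree with the per-class rows of the report above.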



### 10.3 Regression Metrics
- Mean Absolute Error. 
- Mean Squared Error. 
- R².

#### 10.3.1 Mean Absolute Error
The Mean Absolute Error (or MAE) is the average of the absolute differences between predictions and actual values.

```python
# Cross Validation Regression MAE
import pandas
from sklearn import cross_validation
from sklearn.linear_model import LinearRegression
url = "https://goo.gl/sXleFv"
names = ["CRIM", "ZN", "INDUS", "CHAS", "NOX", "RM", "AGE", "DIS", "RAD", "TAX", "PTRATIO", "B", "LSTAT", "MEDV"]
dataframe = pandas.read_csv(url, delim_whitespace=True, names=names)
array = dataframe.values
X = array[:,0:13]
Y = array[:,13]
num_folds = 10
num_instances = len(X)
seed = 7
kfold = cross_validation.KFold(n=num_instances, n_folds=num_folds, random_state=seed)
model = LinearRegression()
scoring = 'mean_absolute_error'
results = cross_validation.cross_val_score(model, X, Y, cv=kfold, scoring=scoring)
print("MAE: %.3f (%.3f)" % (results.mean(), results.std()))
```

MAE: -4.005 (2.084)



A value of 0 indicates no error, that is, perfect predictions. Like logloss, this metric is negated by the `cross_val_score()` function, so the reported -4.005 corresponds to an MAE of 4.005.
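
As a small worked example, the sketch below computes the MAE in plain Python on made-up values and then negates it to mimic the sign convention described above. The function is an illustration, not the scikit-learn implementation.

```python
def mean_absolute_error(y_true, y_pred):
    """Average of the absolute differences between actual and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Made-up actual and predicted values; absolute errors are 0.5, 0.0, 1.5, 1.0.
y_true = [3.0, 5.0, 2.5, 7.0]
y_pred = [2.5, 5.0, 4.0, 8.0]

mae = mean_absolute_error(y_true, y_pred)
print("MAE: %.3f" % mae)
print("Negated score (larger is better): %.3f" % -mae)
```

Because the MAE is expressed in the units of the output variable, an MAE of 0.75 here means the predictions are off by 0.75 units on average.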

#### 10.3.2 Mean Squared Error
The Mean Squared Error (or MSE) is much like the mean absolute error in that it provides a gross idea of the magnitude of error. Taking the square root of the mean squared error, known as the Root Mean Squared Error (or RMSE), converts the units back to the original units of the output variable, which can be meaningful for description and presentation.

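
The relationship between the two quantities can be sketched in plain Python on made-up values. The function below is an illustration, not the scikit-learn implementation.

```python
import math


def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between actual and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Made-up actual and predicted values; squared errors are 0.25, 0.0, 2.25, 1.0.
y_true = [3.0, 5.0, 2.5, 7.0]
y_pred = [2.5, 5.0, 4.0, 8.0]

mse = mean_squared_error(y_true, y_pred)
rmse = math.sqrt(mse)  # back in the units of the output variable
print("MSE: %.3f" % mse)
print("RMSE: %.3f" % rmse)
```

Squaring weights large errors more heavily than small ones, so a few poor predictions can dominate the MSE in a way they do not dominate the MAE.
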
#### 10.3.3 R² Metric
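
A minimal sketch follows, assuming the standard definition of the coefficient of determination: R² = 1 - SS_res / SS_tot, where SS_res is the sum of squared prediction errors and SS_tot is the sum of squared deviations of the actual values from their mean. The function name `r2_score_sketch` and the data are hypothetical, used only for illustration.

```python
def r2_score_sketch(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all actual values are equal")
    return 1 - ss_res / ss_tot


y_true = [1.0, 2.0, 3.0, 4.0]
close_fit = [1.5, 2.0, 3.0, 3.5]     # small errors on two instances
mean_only = [2.5, 2.5, 2.5, 2.5]     # always predicts the mean of y_true

print("R^2 (close fit): %.3f" % r2_score_sketch(y_true, close_fit))
print("R^2 (mean only): %.3f" % r2_score_sketch(y_true, mean_only))
```

Under this definition, a value of 1 means perfect predictions and a value of 0 means the model does no better than always predicting the mean of the actual values.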