# scikit-learn - Evaluating Classification Models

## Outline
- Purpose of model evaluation
- Confusion matrix
- Calculating metrics from a confusion matrix
- Adjusting classifier performance by changing the classification threshold
- The purpose of an ROC curve
- The difference between Area Under the Curve (AUC) and classification accuracy

## Review of model evaluation
- Need a way to choose between models: different model types, tuning parameters, and features
- Use a model evaluation procedure to estimate how well a model will generalize to out-of-sample data
- Requires a model evaluation metric to quantify model performance

## Model evaluation metrics
- Regression: MAE, MSE, RMSE
- Classification: Classification accuracy
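
As a quick illustration of the metrics listed above, the following sketch computes each one with `sklearn.metrics`. The arrays are small hypothetical values chosen for easy arithmetic; they are not taken from the diabetes dataset used below.

```python
import numpy as np
from sklearn import metrics

# Hypothetical regression targets and predictions (illustration only)
y_true_reg = [100, 50, 30, 20]
y_pred_reg = [90, 50, 50, 30]

mae = metrics.mean_absolute_error(y_true_reg, y_pred_reg)  # mean of |error|
mse = metrics.mean_squared_error(y_true_reg, y_pred_reg)   # mean of error squared
rmse = np.sqrt(mse)                                        # same units as the target

# Hypothetical class labels and predictions (illustration only)
y_true_cls = [1, 0, 0, 1]
y_pred_cls = [1, 0, 1, 1]
acc = metrics.accuracy_score(y_true_cls, y_pred_cls)       # fraction predicted correctly

print(f'MAE: {mae}, MSE: {mse}, RMSE: {rmse:.3f}, accuracy: {acc}')
```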

In [46]:
import pandas as pd
url = 'data/pima-indians-diabetes.csv'
col_names = ['pregnant', 'glucose', 'bp', 'skin', 'insulin', 'bmi', 'pedigree', 'age', 'label']
pima = pd.read_csv(url, header=None, names=col_names)
pima.head()

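| | pregnant | glucose | bp | skin | insulin | bmi | pedigree | age | label |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 6 | 148 | 72 | 35 | 0 | 33.6 | 0.627 | 50 | 1 |
| 1 | 1 | 85 | 66 | 29 | 0 | 26.6 | 0.351 | 31 | 0 |
| 2 | 8 | 183 | 64 | 0 | 0 | 23.3 | 0.672 | 32 | 1 |
| 3 | 1 | 89 | 66 | 23 | 94 | 28.1 | 0.167 | 21 | 0 |
| 4 | 0 | 137 | 40 | 35 | 168 | 43.1 | 2.288 | 33 | 1 |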


## Question: Can we predict the diabetes status of a patient given their health measurements?

In [47]:
# define X and y
feature_cols = ['pregnant', 'insulin', 'bmi', 'age']
X = pima[feature_cols]
y = pima.label

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

In [48]:
# train a logistic regression model on the training set
from sklearn.linear_model import LogisticRegression
logreg = LogisticRegression(solver='liblinear')
fit_output = logreg.fit(X_train, y_train)
print(fit_output)
# make class predictions for the testing set
y_pred_class = logreg.predict(X_test)
print(y_pred_class)

LogisticRegression(solver='liblinear')
[0 0 0 0 0 0 0 1 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 1
 0 0 1 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 1 0 1 0 0 0 0 0 0 0 0 0 0
 1 0 0 0 0 0 0 0 1 0 1 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 1 0 1 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 1 0 0 0 0 1 0 0 0
 0 0 0 0 0 0 0]


## Classification accuracy: percentage of correct predictions

In [49]:
from sklearn import metrics
print(metrics.accuracy_score(y_test, y_pred_class))

0.6927083333333334


### Null accuracy: accuracy that could be achieved by always predicting the most frequent class


In [50]:
# examine the class distribution of the testing set
print(y_test.value_counts())

0    130
1     62
Name: label, dtype: int64


In [51]:
# proportion of ones (the mean of a 0/1 label)
y_test.mean()

0.3229166666666667

In [52]:
# proportion of zeros
1 - y_test.mean()

0.6770833333333333

In [53]:
# calculate null accuracy (for binary classification problems coded as 0/1)
max(y_test.mean(), 1 - y_test.mean())

0.6770833333333333

In [54]:
# calculate null accuracy (for multi-class classification problems)
null_accuracy = y_test.value_counts().head(1)/len(y_test)
print(null_accuracy)

0    0.677083
Name: label, dtype: float64
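
Null accuracy can also be obtained from a baseline estimator. The sketch below uses scikit-learn's `DummyClassifier` with `strategy='most_frequent'`. To stay self-contained, it rebuilds a label vector from the class counts printed above (130 zeros, 62 ones) and uses a placeholder feature matrix, both of which are assumptions for illustration. In practice the dummy model would be fit on the training set, so its score matches the null accuracy only when the majority class is the same in both sets.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score

# Rebuild labels from the testing-set class counts shown above (assumption:
# only the counts matter here, not the original row order)
y_demo = np.array([0] * 130 + [1] * 62)
X_demo = np.zeros((len(y_demo), 1))  # placeholder features; the dummy model ignores them

# Baseline that always predicts the most frequent class it saw during fit
dummy = DummyClassifier(strategy='most_frequent')
dummy.fit(X_demo, y_demo)

null_accuracy_demo = accuracy_score(y_demo, dummy.predict(X_demo))
print(null_accuracy_demo)
```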


In [55]:
# print the first 25 true and predicted responses
print(f'True: {y_test.values[0:25]}')
print(f'Pred: {y_pred_class[0:25]}')

True: [1 0 0 1 0 0 1 1 0 0 1 1 0 0 0 0 1 0 0 0 1 1 0 0 0]
Pred: [0 0 0 0 0 0 0 1 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0]
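
The 25 responses printed above can be compared element by element to see where the predictions go wrong. This sketch copies the two arrays from that output; the variable names are introduced here for illustration.

```python
import numpy as np

# First 25 true and predicted responses, copied from the output above
y_true_25 = np.array([1, 0, 0, 1, 0, 0, 1, 1, 0, 0, 1, 1, 0,
                      0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 0, 0])
y_pred_25 = np.array([0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0,
                      0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0])

# Accuracy is the fraction of positions where prediction equals truth
accuracy_25 = (y_true_25 == y_pred_25).mean()

actual_ones = int(y_true_25.sum())     # how many patients are labeled 1
predicted_ones = int(y_pred_25.sum())  # how many times the model predicted 1
missed_ones = int(((y_true_25 == 1) & (y_pred_25 == 0)).sum())  # 1s predicted as 0

print(accuracy_25, actual_ones, predicted_ones, missed_ones)
```

In this slice the model predicts class 1 far less often than it actually occurs, so most of its errors are missed ones.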


## Conclusion
- Classification accuracy is the easiest classification metric to understand
- However, it does not tell you the underlying distribution of response values: here the model's accuracy (0.693) is only slightly above the null accuracy (0.677)
- It also does not tell you what "types" of errors your classifier is making
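
To make the last point concrete, here is a minimal sketch with two hypothetical sets of predictions that have identical accuracy but make opposite kinds of mistakes. The arrays and the `error_summary` helper are invented for illustration and are not part of the analysis above.

```python
import numpy as np

def error_summary(y_true, y_pred):
    """Return accuracy plus counts of the two error types (hypothetical helper)."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    return {
        'accuracy': float((y_true == y_pred).mean()),
        'false_positives': int(((y_true == 0) & (y_pred == 1)).sum()),
        'false_negatives': int(((y_true == 1) & (y_pred == 0)).sum()),
    }

# Hypothetical labels: 4 positives, 6 negatives
y_true_demo = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
pred_a = [0, 0, 1, 1, 0, 0, 0, 0, 0, 0]  # misses two positives
pred_b = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]  # raises two false alarms

summary_a = error_summary(y_true_demo, pred_a)
summary_b = error_summary(y_true_demo, pred_b)
print(summary_a)
print(summary_b)
```

Both sets of predictions score 0.8, yet one never raises a false alarm and the other never misses a positive case. Accuracy alone cannot distinguish them.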

## Confusion Matrix

In [56]:
# pass the true values first: rows are actual classes, columns are predicted classes
print(metrics.confusion_matrix(y_test, y_pred_class))

[[118  12]
 [ 47  15]]
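
The four cells of the matrix can be unpacked and turned into metrics. This sketch hard-codes the matrix printed above so that it runs on its own, and relies on scikit-learn's convention that rows are actual classes and columns are predicted classes. The variable names are chosen here for illustration.

```python
import numpy as np

# Confusion matrix copied from the output above
# (rows: actual 0, actual 1; columns: predicted 0, predicted 1)
confusion = np.array([[118, 12],
                      [47, 15]])

# ravel() flattens row by row: TN, FP, FN, TP
tn, fp, fn, tp = confusion.ravel()

accuracy = (tp + tn) / confusion.sum()  # matches accuracy_score above
sensitivity = tp / (tp + fn)            # share of actual 1s that were caught (recall)
specificity = tn / (tn + fp)            # share of actual 0s that were caught
precision = tp / (tp + fp)              # share of predicted 1s that were correct

print(f'accuracy:    {accuracy:.4f}')
print(f'sensitivity: {sensitivity:.4f}')
print(f'specificity: {specificity:.4f}')
print(f'precision:   {precision:.4f}')
```

The accuracy recovered from the matrix equals the 0.6927 reported earlier, while the low sensitivity shows that the classifier misses most of the actual positive cases.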
