Here we evaluate an ML algorithm (logistic regression) on a classification problem.
Classification problems are commonly evaluated using the following metrics:
1. Classification Accuracy-  
    The number of correct predictions made as a ratio of all predictions made. It is suitable only when there is an equal number of observations in each class (which is rarely the case) and when all predictions and prediction errors are equally important.
2. Logarithmic loss- 
    A metric for evaluating predicted probabilities of membership in a given class. Lower values are better, and confident predictions that turn out to be wrong are penalized heavily.
3. Area Under ROC Curve-  
    A metric for binary classification problems. An area of 1 represents a model that made all predictions perfectly, while an area of 0.5 represents a model that is no better than random guessing. The ROC curve can be broken down into sensitivity and specificity: it plots sensitivity against 1 - specificity (the false positive rate) as the decision threshold varies.
    A binary classification problem is really a trade-off between sensitivity and specificity. Sensitivity is the true positive rate, also called recall: the proportion of instances from the positive class (class 1.0 in this dataset) that were predicted correctly. Specificity, also called the true negative rate, is the proportion of instances from the negative class (class 0.0) that were predicted correctly.
4. Confusion Matrix-  
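    A table of actual classes against predicted classes that presents the accuracy of a model with two or more classes; each cell counts how many instances of one actual class were predicted as a given class.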
5. Classification report-  
    The scikit-learn library provides a convenience report for classification problems that gives a quick idea of a model's accuracy using a number of measures. The classification_report() function displays the precision, recall, F1 score, and support for each class.
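To make the sensitivity/specificity trade-off concrete, here is a minimal from-scratch sketch (not scikit-learn's implementation) that sweeps a decision threshold over predicted scores and records the false positive rate (1 - specificity) and the true positive rate (sensitivity) at each step; these pairs are the points of an ROC curve. The helper name roc_points() and the toy labels and scores are hypothetical, chosen only for illustration.

```python
def roc_points(y_true, scores):
    """Return (false positive rate, true positive rate) pairs, one per threshold.

    An instance is predicted positive when its score >= threshold.
    Thresholds run from +inf (nothing predicted positive) down through each
    distinct score (everything predicted positive at the lowest one).
    Assumes both classes (0 and 1) are present in y_true.
    """
    positives = sum(1 for y in y_true if y == 1)
    negatives = len(y_true) - positives
    thresholds = [float("inf")] + sorted(set(scores), reverse=True)
    points = []
    for t in thresholds:
        tp = sum(1 for y, s in zip(y_true, scores) if s >= t and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= t and y == 0)
        tpr = tp / positives  # sensitivity (true positive rate)
        fpr = fp / negatives  # 1 - specificity (false positive rate)
        points.append((fpr, tpr))
    return points


# Hypothetical toy data: two negatives (0) and two positives (1).
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
for fpr, tpr in roc_points(y_true, scores):
    print(f"FPR={fpr:.2f}  TPR={tpr:.2f}")
```

Lowering the threshold raises sensitivity at the cost of specificity: on this toy data, moving the threshold from 0.8 down to 0.35 lifts the TPR from 0.5 to 1.0 while the FPR rises from 0.0 to 0.5.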
    

In [1]:
# load libraries 
from pandas import read_csv
from sklearn.model_selection import KFold, cross_val_score, train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, classification_report

In [2]:
# load data
file = '../data/pima-indians-diabetes.data.csv'
names = ['preg', 'plas', 'pres', 'skin', 'test', 'mass', 'pedi', 'age', 'class'] 
df = read_csv(file, names=names)
array = df.values

In [3]:
X = array[:,0:8]
Y = array[:,8]
print("-- Classification Accuracy --")
kfold = KFold(n_splits=10, random_state=7, shuffle=True)
model = LogisticRegression(solver='lbfgs', max_iter=1000)
scoring = 'accuracy'
results = cross_val_score(model, X, Y, cv=kfold, scoring=scoring)
print("Accuracy: %.3f (%.3f)" %(results.mean(), results.std()))


-- Classification Accuracy --
Accuracy: 0.772 (0.050)
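Accuracy is simply the fraction of predictions that match the labels. The sketch below (a hypothetical accuracy() helper and made-up labels, not part of the original notebook) computes it by hand and shows why the metric misleads on imbalanced classes: a "model" that always predicts the majority class still scores well.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that equal the true label."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# Hypothetical imbalanced labels: 8 negatives, 2 positives.
y_true = [0] * 8 + [1] * 2
always_negative = [0] * len(y_true)  # majority-class "model"
print(accuracy(y_true, always_negative))  # 0.8, yet every positive is missed
```

For comparison, the test split used later in this notebook has 162 instances of class 0.0 and 92 of class 1.0 (the support column of the classification report), so always predicting 0.0 there would already reach about 162/254 ≈ 0.64 accuracy.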


In [4]:
print("-- Logarithmic Loss -- ")
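# scikit-learn scorers follow "greater is better", so log loss is reported negated (closer to 0 is better)
scoring = 'neg_log_loss'
results = cross_val_score(model, X, Y, cv=kfold, scoring=scoring)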
print("Logloss %.3f (%.3f)" % (results.mean(), results.std()))


-- Logarithmic Loss -- 
Logloss -0.485 (0.057)
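Log loss is the average negative log-likelihood of the true labels under the predicted probabilities: LogLoss = -(1/N) Σ [y log p + (1 - y) log(1 - p)], where p is the predicted probability of class 1. The sketch below implements this binary form by hand; the function name binary_log_loss(), the clipping constant eps, and the toy probabilities are assumptions for illustration. scikit-learn's neg_log_loss scorer reports the negated value, which is why the cross-validated result above is negative (-0.485).

```python
import math


def binary_log_loss(y_true, p_pred, eps=1e-15):
    """Mean of -[y*log(p) + (1 - y)*log(1 - p)], with p = P(class 1).

    Probabilities are clipped to [eps, 1 - eps] so log(0) never occurs.
    """
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return -total / len(y_true)


# Hypothetical labels with confident-correct vs. confident-wrong probabilities.
y_true = [1, 0, 1, 0]
confident_right = [0.9, 0.1, 0.8, 0.2]
confident_wrong = [0.1, 0.9, 0.2, 0.8]
print(round(binary_log_loss(y_true, confident_right), 3))  # 0.164
print(round(binary_log_loss(y_true, confident_wrong), 3))  # 1.956
```

Both sets of predictions get the same labels "half right" in magnitude terms, but the confidently wrong probabilities are penalized far more, which is what makes log loss useful for judging probability estimates rather than hard class predictions.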


In [5]:
print("-- Cross validation classification ROC AUC -- ")
scoring = 'roc_auc'
results = cross_val_score(model, X, Y, cv=kfold, scoring=scoring)
# An AUC well above 0.5 and reasonably close to 1 suggests some skill in the predictions
print("AUC: %.3f (%.3f)" %(results.mean(), results.std()))

-- Cross validation classification ROC AUC -- 
AUC: 0.829 (0.047)
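One useful reading of ROC AUC is the probability that a randomly chosen positive instance receives a higher score than a randomly chosen negative one, with ties counting as half. The sketch below computes AUC that way by comparing every positive/negative pair; the helper name pairwise_auc() and the toy scores are hypothetical, and the nested loop (n_pos * n_neg comparisons) is meant for illustration rather than large datasets.

```python
def pairwise_auc(y_true, scores):
    """ROC AUC as P(score of random positive > score of random negative).

    Ties between a positive and a negative count as half a win.
    Assumes both classes (0 and 1) are present in y_true.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: two negatives (0) and two positives (1).
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(pairwise_auc(y_true, scores))  # 0.75: 3 of the 4 positive/negative pairs are ranked correctly
```

A model that ranks every positive above every negative gets 1.0, and one whose scores carry no ranking information lands around 0.5, matching the interpretation given in the metric list.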


In [6]:
print("-- Confusion matrix -- ")
test_size = 0.33
seed = 7
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=test_size, random_state=seed)
model = LogisticRegression(solver='lbfgs', max_iter=1000)
model.fit(X_train, Y_train)
predicted = model.predict(X_test)
# Rows are actual classes and columns are predicted classes, in sorted label order (0.0, 1.0)
matrix = confusion_matrix(Y_test, predicted)
print(matrix)

-- Confusion matrix -- 
[[142  20]
 [ 34  58]]
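The printed matrix follows scikit-learn's convention that rows are actual classes and columns are predicted classes, so with the labels 0.0 and 1.0 it reads [[TN, FP], [FN, TP]]. The sketch below plugs the counts from the output above into the sensitivity and specificity definitions from the metric list; the variable names are just for illustration.

```python
# Counts copied from the confusion matrix printed above.
matrix = [[142, 20],
          [34, 58]]
(tn, fp), (fn, tp) = matrix

sensitivity = tp / (tp + fn)  # true positive rate (recall of class 1.0)
specificity = tn / (tn + fp)  # true negative rate (recall of class 0.0)
overall_accuracy = (tp + tn) / (tn + fp + fn + tp)

print(f"Sensitivity: {sensitivity:.3f}")
print(f"Specificity: {specificity:.3f}")
print(f"Accuracy:    {overall_accuracy:.3f}")
```

These agree with the recall values of 0.63 (class 1.0) and 0.88 (class 0.0) and the accuracy of 0.79 in the classification report.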


In [7]:
print("-- Classification report from sklearn --")
report = classification_report(Y_test, predicted)
print(report)


-- Classification report from sklearn --
              precision    recall  f1-score   support

         0.0       0.81      0.88      0.84       162
         1.0       0.74      0.63      0.68        92

    accuracy                           0.79       254
   macro avg       0.78      0.75      0.76       254
weighted avg       0.78      0.79      0.78       254
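Every number in the report can be recovered from the confusion matrix. The sketch below (a hypothetical per_class_scores() helper, working from the counts printed earlier) computes per-class precision, recall, F1 and support, plus the macro average (unweighted mean over classes) and the weighted average (mean weighted by support).

```python
def per_class_scores(matrix):
    """Precision, recall, F1 and support for each class of a square confusion matrix.

    matrix[i][j] counts instances of actual class i predicted as class j.
    """
    n = len(matrix)
    rows = []
    for k in range(n):
        tp = matrix[k][k]
        predicted_k = sum(matrix[i][k] for i in range(n))  # column sum
        actual_k = sum(matrix[k])                           # row sum (support)
        precision = tp / predicted_k if predicted_k else 0.0
        recall = tp / actual_k if actual_k else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        rows.append((precision, recall, f1, actual_k))
    return rows


# Counts copied from the confusion matrix printed earlier (classes 0.0, 1.0).
matrix = [[142, 20],
          [34, 58]]
scores = per_class_scores(matrix)
total = sum(row[3] for row in scores)

for label, (p, r, f, s) in zip(["0.0", "1.0"], scores):
    print(f"{label:>12} {p:9.2f} {r:9.2f} {f:9.2f} {s:9d}")

# Macro: equal weight per class. Weighted: weight each class by its support.
for name, weights in [("macro avg", [1 / len(scores)] * len(scores)),
                      ("weighted avg", [row[3] / total for row in scores])]:
    avg = [sum(w * row[i] for w, row in zip(weights, scores)) for i in range(3)]
    print(f"{name:>12} {avg[0]:9.2f} {avg[1]:9.2f} {avg[2]:9.2f} {total:9d}")
```

With these counts it reproduces the rows of the report (for example, precision 0.81 and 0.74, recall 0.88 and 0.63). The accuracy row equals the weighted-average recall, since weighting each class's recall by its support just sums the correct predictions over all instances.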

