In [2]:
import numpy as np

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, zero_one_loss, jaccard_score, confusion_matrix, \
    precision_score, recall_score, fbeta_score


# For reproducibility
np.random.seed(1000)

nb_samples = 500

In [3]:
# Create dataset
X, Y = make_classification(n_samples=nb_samples, n_features=2, n_informative=2, n_redundant=0,
                           n_clusters_per_class=1)

# Split dataset
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.25)

In [8]:
# Create and train logistic regressor
lr = LogisticRegression()
lr.fit(X_train, Y_train)

print('Accuracy score: %.3f' % accuracy_score(Y_test, lr.predict(X_test)))
print('Zero-one loss (normalized): %.3f' % zero_one_loss(Y_test, lr.predict(X_test)))
print('Zero-one loss (unnormalized): %.3f' % zero_one_loss(Y_test, lr.predict(X_test), normalize=False))
print('Jaccard similarity score: %.3f' % jaccard_score(Y_test, lr.predict(X_test)))

Accuracy score: 0.992
Zero-one loss (normalized): 0.008
Zero-one loss (unnormalized): 1.000
Jaccard similarity score: 0.986


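It seems that zero-one loss is the number of misclassifications divided by the total number of samples, ***without differentiating the types of error***. The normalized value is therefore 1 - accuracy (here 0.008 = 1 - 0.992), and the unnormalized value is the raw count of misclassified test samples (here 1). Another detail from the documentation: ***"If normalize is True, return the fraction of misclassifications (float), else it returns the number of misclassifications (int). The best performance is 0."***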
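
To make the relationship concrete, here is a minimal sketch in plain Python that counts misclassifications by hand. The labels are hypothetical toy data, not the notebook's dataset, and the variable names are illustrative only.

```python
# Hypothetical toy labels (not the notebook's data)
y_true = [0, 1, 1, 0, 1, 0, 1, 1]
y_pred = [0, 1, 0, 0, 1, 1, 1, 1]

# Count mismatches without distinguishing false positives from false negatives
n_errors = sum(t != p for t, p in zip(y_true, y_pred))
n_samples = len(y_true)

loss_unnormalized = n_errors              # the quantity reported with normalize=False
loss_normalized = n_errors / n_samples    # the quantity reported with normalize=True
accuracy = 1 - loss_normalized            # accuracy and normalized loss sum to 1

print(loss_unnormalized, loss_normalized, accuracy)
```

One false negative and one false positive contribute equally to the loss, which is why the score alone cannot tell the two kinds of error apart.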

In [6]:
# Compute confusion matrix
cm = confusion_matrix(y_true=Y_test, y_pred=lr.predict(X_test))
print('Confusion matrix:')
print(cm)

print('Precision score: %.3f' % precision_score(Y_test, lr.predict(X_test)))
print('Recall score: %.3f' % recall_score(Y_test, lr.predict(X_test)))
print('F-Beta score (1): %.3f' % fbeta_score(Y_test, lr.predict(X_test), beta=1))
print('F-Beta score (0.75): %.3f' % fbeta_score(Y_test, lr.predict(X_test), beta=0.75))
print('F-Beta score (1.25): %.3f' % fbeta_score(Y_test, lr.predict(X_test), beta=1.25))

Confusion matrix:
[[56  1]
 [ 0 68]]
Precision score: 0.986
Recall score: 1.000
F-Beta score (1): 0.993
F-Beta score (0.75): 0.991
F-Beta score (1.25): 0.994


confusion_matrix: By definition a confusion matrix C is such that C<sub>i,j</sub> is equal to the number of observations known to be in group i and predicted to be in group j.\
\
Thus in binary classification, the count of true negatives is C<sub>0,0</sub>, false negatives is C<sub>1,0</sub>, true positives is C<sub>1,1</sub> and false positives is C<sub>0,1</sub> (so on both axes, the negative class comes first by default).\
\
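The metric functions treat class 1 as the positive class by default. If you don't flip the confusion matrix but want the measures computed with class 0 as the positive class, it's necessary to add the ***pos_label=0*** parameter to the metric functions that accept it (precision_score, recall_score, fbeta_score, jaccard_score).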
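
The indexing convention can be sketched by building the matrix by hand in plain Python. The labels below are hypothetical toy data, and the second half only illustrates what choosing class 0 as positive means for the same matrix; it is an assumption-level illustration, not a reimplementation of the library.

```python
# Hypothetical toy labels (not the notebook's data); classes are 0 and 1
y_true = [0, 0, 0, 1, 1, 1, 1, 1]
y_pred = [0, 1, 0, 1, 1, 0, 1, 1]

# cm[i][j] counts samples whose true class is i and predicted class is j
cm = [[0, 0], [0, 0]]
for t, p in zip(y_true, y_pred):
    cm[t][p] += 1

# Default view: class 1 is positive, so the negative class comes first on both axes
(tn, fp), (fn, tp) = cm
precision_pos1 = tp / (tp + fp)

# Class 0 as positive: same matrix, but the roles of the cells are swapped
tp0, fn0, fp0, tn0 = cm[0][0], cm[0][1], cm[1][0], cm[1][1]
precision_pos0 = tp0 / (tp0 + fp0)

print(cm)
print('Precision (class 1 positive): %.3f' % precision_pos1)
print('Precision (class 0 positive): %.3f' % precision_pos0)
```

The matrix itself never changes; only the reading of which cell counts as a true positive does.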

***Precision = TP/(TP + FP)\
\
Recall = TP/(TP + FN)\
\
Jaccard Similarity Score = TP/(TP + FP + FN)\
\
Accuracy = (TP + TN)/(TP + FP + FN + TN)***\
\
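Therefore, for what it's worth, the Jaccard similarity score can never be larger than any of the other three: its denominator adds FN to that of precision and FP to that of recall, and accuracy adds TN to both the numerator and the denominator.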
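
As a quick check of these definitions, the following sketch plugs in the four counts read from the confusion matrix printed earlier (`[[56, 1], [0, 68]]`). Only plain Python arithmetic is used; the variable names are illustrative.

```python
# Counts read from the printed confusion matrix [[56, 1], [0, 68]]
tn, fp, fn, tp = 56, 1, 0, 68

precision = tp / (tp + fp)
recall = tp / (tp + fn)
jaccard = tp / (tp + fp + fn)
accuracy = (tp + tn) / (tp + fp + fn + tn)

print('Precision: %.3f' % precision)
print('Recall: %.3f' % recall)
print('Jaccard similarity score: %.3f' % jaccard)
print('Accuracy: %.3f' % accuracy)

# Jaccard is bounded above by each of the other three measures
print(jaccard <= min(precision, recall, accuracy))
```

The rounded values agree with the scores reported by the metric functions in the cells above (0.986, 1.000, 0.986 and 0.992).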

\\(F_{\beta} = (\beta^2 + 1) \cdot \frac{Precision \cdot Recall}{\beta^2 \cdot Precision + Recall}\\)
\
\
Or, it can also be expressed as, \\(F_{\beta} = \frac{Precision \cdot Recall}{\frac{\beta^2}{\beta^2 + 1} \cdot Precision + \frac{1}{\beta^2 + 1} \cdot Recall}\\)\
\
The most common F score is the case of \\(\beta = 1\\), where equal weights are given to precision and recall. The bigger the \\(\beta\\), the bigger the emphasis placed on recall; values of \\(\beta\\) below 1 favor precision instead.

Here the highest score (\\(\beta = 1.25\\), 0.994) is achieved by giving more importance to recall (which is higher, 1.000 versus 0.986 for precision), while the lowest one (\\(\beta = 0.75\\), 0.991) corresponds to a precision predominance. \\(F_{\beta}\\) is
hence useful to have a compact picture of the accuracy as a trade-off between high precision and a limited number of false negatives.
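
The formula can be checked against the printed scores with a small sketch. `f_beta` is a hypothetical helper written for this note, not a library function, and the counts are those read from the confusion matrix printed earlier.

```python
def f_beta(precision, recall, beta):
    """F-beta computed directly from the formula given in the text."""
    b2 = beta ** 2
    return (b2 + 1) * precision * recall / (b2 * precision + recall)


# Counts read from the printed confusion matrix [[56, 1], [0, 68]]
tn, fp, fn, tp = 56, 1, 0, 68
precision = tp / (tp + fp)
recall = tp / (tp + fn)

for beta in (0.75, 1, 1.25):
    print('F-Beta score (%s): %.3f' % (beta, f_beta(precision, recall, beta)))
```

Because recall is higher than precision on this test set, the score rises as \\(\beta\\) grows, matching the 0.991, 0.993 and 0.994 reported above.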