### Import Libraries

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (
    accuracy_score,
    balanced_accuracy_score,
    confusion_matrix,
    f1_score,
    precision_score,
    recall_score,
)
```

Scikit-learn has a built-in function for each of the metrics we have introduced: accuracy, precision, recall and F1 score each have their own function (`accuracy_score`, `precision_score`, `recall_score` and `f1_score`).

### Prepare Data and Fit Model

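```python
# Load the Titanic dataset
df = pd.read_csv('https://sololearn.com/uploads/files/titanic.csv')
# Encode sex as a boolean feature
df['male'] = df['Sex'] == 'male'
# Feature matrix and target vector (1 = survived, 0 = did not survive)
X = df[
    ['Pclass', 'male', 'Age', 'Siblings/Spouses', 'Parents/Children', 'Fare']
].values
y = df['Survived'].values
# Fit a logistic regression model and predict on the same data it was trained on
model = LogisticRegression()
model.fit(X, y)
y_pred = model.predict(X)
```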


### Confusion Matrix

```python
print(confusion_matrix(y, y_pred))
```

```
[[475  70]
 [103 239]]
```


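Scikit-learn orders the rows and columns by sorted label, so with labels 0 and 1 the negative class comes first. Rows are the actual classes and columns are the predicted classes, which gives the layout [[TN, FP], [FN, TP]]: the reverse of the common layout that lists the positives first. Here is how this confusion matrix should be labeled.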

![image.png](attachment:image.png)   
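To work with the four counts individually, you can unpack the matrix in scikit-learn's order. The sketch below is plain Python and simply hard-codes the counts printed above rather than recomputing them from the model; with the real arrays, scikit-learn's documentation shows the equivalent one-liner `tn, fp, fn, tp = confusion_matrix(y, y_pred).ravel()`.

```python
# Confusion matrix as printed above: rows are actual classes, columns are
# predicted classes, and label 0 (did not survive) comes before label 1 (survived).
cm = [[475, 70],
      [103, 239]]

# Unpack in scikit-learn's layout: [[TN, FP], [FN, TP]]
(tn, fp), (fn, tp) = cm

print(f"TN={tn}, FP={fp}, FN={fn}, TP={tp}")  # TN=475, FP=70, FN=103, TP=239
print("total passengers:", tn + fp + fn + tp)  # total passengers: 887
```

The same four names (`tn`, `fp`, `fn`, `tp`) are enough to recompute every metric below by hand.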

### accuracy (ACC)   

![image.png](attachment:image.png)   

```python
print("accuracy:", round(100*accuracy_score(y, y_pred), 2))
```

```
accuracy: 80.5
```
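Accuracy is the share of all predictions that are correct, (TP + TN) / (TP + TN + FP + FN). As a sanity check, a minimal plain-Python sketch that recomputes it from the confusion-matrix counts above (hard-coded here, not read from the model) reproduces scikit-learn's value:

```python
# Counts from the confusion matrix above
tn, fp, fn, tp = 475, 70, 103, 239

# Correct predictions (both classes) divided by all predictions
accuracy = (tp + tn) / (tp + tn + fp + fn)
print("accuracy:", round(100 * accuracy, 2))  # accuracy: 80.5
```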


### sensitivity, recall, hit rate, or true positive rate (TPR)   

![image.png](attachment:image.png)  


```python
print("recall:", round(100*recall_score(y, y_pred), 2))
```

```
recall: 69.88
```
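Recall only looks at the passengers who actually survived (the bottom row of the matrix) and asks what fraction of them the model predicted as survivors: TP / (TP + FN). A minimal plain-Python check, again hard-coding the counts from the confusion matrix above:

```python
# Counts from the confusion matrix above
tn, fp, fn, tp = 475, 70, 103, 239

# Fraction of actual survivors that the model predicted as survivors
recall = tp / (tp + fn)
print("recall:", round(100 * recall, 2))  # recall: 69.88
```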


### precision or positive predictive value (PPV)   

![image.png](attachment:image.png)   


```python
print("precision:", round(100*precision_score(y, y_pred), 2))
```

```
precision: 77.35
```
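Precision looks at the right-hand column of the matrix instead: of the passengers the model predicted to survive, what fraction actually did, TP / (TP + FP). A minimal plain-Python check using the counts from the confusion matrix above:

```python
# Counts from the confusion matrix above
tn, fp, fn, tp = 475, 70, 103, 239

# Fraction of predicted survivors who actually survived
precision = tp / (tp + fp)
print("precision:", round(100 * precision, 2))  # precision: 77.35
```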


### specificity, selectivity or true negative rate (TNR)

![image.png](attachment:image.png)  


```python
print("Specificity:", round(100*recall_score(y, y_pred, pos_label=0), 2))
```

```
Specificity: 87.16
```
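The cell above gets specificity from `recall_score` with `pos_label=0`, that is, by treating "did not survive" as the positive class. The reason this works: when label 0 is the positive class, the true negatives play the role of true positives and the false positives play the role of false negatives, so recall becomes TN / (TN + FP). The sketch below illustrates that role swap with a small hypothetical helper, `recall`, written for this example (it is not a scikit-learn function), using the counts from the confusion matrix above:

```python
def recall(true_pos, false_neg):
    """Fraction of actual positives that were predicted positive."""
    return true_pos / (true_pos + false_neg)

# Counts from the confusion matrix above
tn, fp, fn, tp = 475, 70, 103, 239

# Recall with survived (1) as the positive class
recall_1 = recall(tp, fn)
# Recall with did-not-survive (0) as the positive class: the roles swap,
# so TN counts as a "true positive" and FP counts as a "false negative"
specificity = recall(tn, fp)

print("recall:", round(100 * recall_1, 2))          # recall: 69.88
print("Specificity:", round(100 * specificity, 2))  # Specificity: 87.16
```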


### F1 score   
The F1 score is the harmonic mean of precision and sensitivity (i.e., recall):   

![image.png](attachment:image.png)   


```python
print("f1 score:", round(100*f1_score(y, y_pred), 2))
```

```
f1 score: 73.43
```
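The harmonic mean of precision P and recall R is 2PR / (P + R), which simplifies to 2TP / (2TP + FP + FN) when written in terms of the counts. A minimal plain-Python sketch that computes both forms from the confusion-matrix counts above and compares the result with the ordinary (arithmetic) average:

```python
# Counts from the confusion matrix above
tn, fp, fn, tp = 475, 70, 103, 239

precision = tp / (tp + fp)
recall = tp / (tp + fn)

# Harmonic mean of precision and recall
f1 = 2 * precision * recall / (precision + recall)

# Equivalent form written directly in terms of the counts
f1_from_counts = 2 * tp / (2 * tp + fp + fn)

# Ordinary average, for comparison
arithmetic_mean = (precision + recall) / 2

print("f1 score:", round(100 * f1, 2))                   # f1 score: 73.43
print(abs(f1 - f1_from_counts) < 1e-12)                  # True
print("arithmetic mean:", round(100 * arithmetic_mean, 2))  # arithmetic mean: 73.61
```

The harmonic mean sits slightly below the ordinary average because it is pulled toward the smaller of the two values; the gap grows when precision and recall differ a lot.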


https://en.wikipedia.org/wiki/Sensitivity_and_specificity

```python
print("accuracy:", accuracy_score(y, y_pred))
print("precision:", precision_score(y, y_pred))
print("recall:", recall_score(y, y_pred))
print("f1 score:", f1_score(y, y_pred))
print("Specificity:", recall_score(y, y_pred, pos_label=0))
print("balanced_accuracy_score:", balanced_accuracy_score(y, y_pred))
```

```
accuracy: 0.8049605411499436
precision: 0.7734627831715211
recall: 0.6988304093567251
f1 score: 0.7342549923195083
Specificity: 0.8715596330275229
balanced_accuracy_score: 0.785195021192124
```
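The cell above also prints `balanced_accuracy_score`. For binary labels it is the average of the recall obtained on each class, i.e. the mean of recall and specificity, so both classes count equally even though there are more non-survivors (475 + 70 = 545) than survivors (103 + 239 = 342). A minimal plain-Python check using the counts from the confusion matrix above:

```python
# Counts from the confusion matrix above
tn, fp, fn, tp = 475, 70, 103, 239

recall = tp / (tp + fn)        # recall of class 1 (survived)
specificity = tn / (tn + fp)   # recall of class 0 (did not survive)

# Balanced accuracy: unweighted mean of the per-class recalls
balanced_accuracy = (recall + specificity) / 2

# Plain accuracy weights each passenger equally, so the larger class dominates
accuracy = (tp + tn) / (tp + tn + fp + fn)

print("balanced accuracy:", round(balanced_accuracy, 4))  # balanced accuracy: 0.7852
print("accuracy:", round(accuracy, 4))                    # accuracy: 0.805
```

Balanced accuracy comes out lower than plain accuracy here because the model does better on the larger, negative class.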


We see that the accuracy is 80%, which means that 80% of the model’s predictions are correct. The precision is 77%, which, as a reminder, is the percentage of the model’s positive predictions that are correct. The recall is 70%, which is the percentage of the actual positive cases that the model predicted correctly. The F1 score is 73%, which is the harmonic mean of the precision and recall.

With a single model, the metric values do not tell us a lot. For some problems a value of 60% is good, and for others a value of 90% is good, depending on the difficulty of the problem. We will use the metric values to compare different models to pick the best one.