<h1>Calculating Metrics in Scikit-learn</h1>

<h3>Accuracy, Precision, Recall & F1 Score in Sklearn</h3>

<p>Scikit-learn has a built-in function for each of these metrics.</p>

<p>Below we use these built-in functions on our Logistic Regression model for the Titanic dataset.</p>

In [1]:
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

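# Load the Titanic dataset and encode sex as a boolean feature
df = pd.read_csv('https://sololearn.com/uploads/files/titanic.csv')
df['male'] = df['Sex'] == 'male'

# Feature matrix X and target vector y
X = df[['Pclass', 'male', 'Age', 'Siblings/Spouses', 'Parents/Children', 'Fare']].values
y = df['Survived'].values

# Fit the model, then predict on the same data it was trained on
model = LogisticRegression()
model.fit(X, y)
y_pred = model.predict(X)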

print("accuracy:", accuracy_score(y, y_pred))
print("precision:", precision_score(y, y_pred))
print("recall:", recall_score(y, y_pred))
print("f1 score:", f1_score(y, y_pred))

accuracy: 0.8049605411499436
precision: 0.7734627831715211
recall: 0.6988304093567251
f1 score: 0.7342549923195084


<strong>Each function takes two 1-dimensional numpy arrays, in this order: the true values of the target, then the predicted values of the target.</strong>

<p>From the metrics scores above we see that:</p>
<ul>
    <li>The <strong>accuracy is 80%</strong>, which means that 80% of the model’s predictions are correct.</li>
    <li>The <strong>precision is 77%</strong>, which, as we recall, is the percent of the model’s positive predictions that are correct.</li>
    <li>The <strong>recall is 70%</strong>, which is the percent of the positive cases that the model predicted correctly.</li>
    <li>The <strong>F1 score is 73%</strong>, which is the harmonic mean of the precision and recall.</li>
</ul>
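<p>The F1 score is the harmonic mean of the precision and recall. As a minimal sketch, the snippet below recomputes it from the precision and recall values printed above. The helper name <code>harmonic_mean_f1</code> is made up for illustration and is not part of scikit-learn.</p>

```python
def harmonic_mean_f1(precision, recall):
    """Return the F1 score, i.e. the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0  # convention for the degenerate case, avoids dividing by zero
    return 2 * precision * recall / (precision + recall)


# Precision and recall copied from the printed output above.
precision = 0.7734627831715211
recall = 0.6988304093567251

f1 = harmonic_mean_f1(precision, recall)
print("f1 score:", f1)
```

<p>The harmonic mean is never larger than the ordinary (arithmetic) mean, so the F1 score is pulled toward the lower of the two values.</p>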

<strong>Note!</strong>
<ul>
    <li>With a single model, the metric values do not tell us a lot.</li>
    <li>For some problems a value of 60% is good, and for others a value of 90% is good, depending on the difficulty of the problem.</li>
</ul>
<p>We will use the metric values to compare different models to pick the best one.</p>

<h3>Confusion Matrix in Sklearn</h3>

<p>Scikit-learn has a confusion matrix function that we can use to get the four values in the confusion matrix (true positives, false positives, false negatives, and true negatives).</p>

In [2]:
from sklearn.metrics import confusion_matrix

print(confusion_matrix(y, y_pred))

[[475  70]
 [103 239]]


<strong>Note that scikit-learn shows the negative counts first! Rows are the actual values and columns are the predicted values, so the matrix above should be labeled as follows:</strong>

<table border="1">
  <tr>
      <th></th>
      <th>Predicted Negative</th>
      <th>Predicted Positive</th>
  </tr>
  <tr style="background-color: white;">
      <th>Actual Negative</th>
      <td style="background-color: lightblue;">TN</td>
      <td>FP</td>
  </tr>
  <tr>
      <th>Actual Positive</th>
      <td>FN</td>
      <td style="background-color: lightblue;">TP</td>
  </tr>
</table>
<br/>
<p>Filling in our results from scikit-learn, we get the following confusion matrix:</p>
<table border="1">
  <tr>
      <th></th>
      <th>Predicted Negative</th>
      <th>Predicted Positive</th>
  </tr>
  <tr style="background-color: white;">
      <th>Actual Negative</th>
      <td style="background-color: lightblue;">475</td>
      <td>70</td>
  </tr>
  <tr>
      <th>Actual Positive</th>
      <td>103</td>
      <td style="background-color: lightblue;">239</td>
  </tr>
</table>
<br/>
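<p>As a sanity check, the four metrics from the previous section can be recomputed by hand from these counts. The sketch below is plain Python with the numbers copied from the matrix above; the variable names are chosen for illustration only.</p>

```python
# Counts copied from the scikit-learn output above.
# Rows are actual values, columns are predicted values, negative class first.
matrix = [[475, 70],
          [103, 239]]

# Unpack in scikit-learn's order: TN, FP on the first row, FN, TP on the second.
(tn, fp), (fn, tp) = matrix

total = tn + fp + fn + tp
accuracy = (tp + tn) / total           # share of all predictions that are correct
precision = tp / (tp + fp)             # share of positive predictions that are correct
recall = tp / (tp + fn)                # share of actual positives that were found
f1 = 2 * precision * recall / (precision + recall)

print("accuracy:", accuracy)
print("precision:", precision)
print("recall:", recall)
print("f1 score:", f1)
```

<p>The results agree with the values printed by the scikit-learn metric functions. With the NumPy array that <code>confusion_matrix</code> returns, calling <code>.ravel()</code> yields the same four values in the same order.</p>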
<p>For the record, this is how we would typically draw the confusion matrix, with the positive class first and the predicted values as rows:</p>
<table border="1">
  <tr>
      <th></th>
      <th>Actual Positive</th>
      <th>Actual Negative</th>
  </tr>
  <tr style="background-color: white;">
      <th>Predicted Positive</th>
      <td style="background-color: lightblue;">239</td>
      <td>70</td>
  </tr>
  <tr>
      <th>Predicted Negative</th>
      <td>103</td>
      <td style="background-color: lightblue;">475</td>
  </tr>
</table>
<br/>
<strong>Note! Scikit-learn sorts the classes in ascending order, so the negative class (0) comes before the positive class (1). Make sure you double check that you are interpreting the values correctly!</strong>
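<p>To move between the two layouts without mixing up the cells, one option is a small helper like the one sketched below. The function name <code>to_positive_first</code> is hypothetical and is not a scikit-learn function; it assumes a 2x2 matrix in scikit-learn's layout.</p>

```python
def to_positive_first(matrix):
    """Convert scikit-learn's layout [[TN, FP], [FN, TP]]
    (rows = actual, columns = predicted, negative first)
    into the layout [[TP, FP], [FN, TN]]
    (rows = predicted, columns = actual, positive first)."""
    (tn, fp), (fn, tp) = matrix
    return [[tp, fp], [fn, tn]]


# Counts copied from the scikit-learn output earlier in this section.
sklearn_layout = [[475, 70],
                  [103, 239]]

typical_layout = to_positive_first(sklearn_layout)
print(typical_layout)
```

<p>Only the two diagonal cells (TN and TP) trade places; FP and FN stay where they are, which makes it easy to misread one layout as the other.</p>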