# Performance Metrics

In [1]:
import pandas as pd
import math

In [2]:
df = pd.read_csv('datasets/fraud_detection.csv')
# Features: every column except the first and the last one
X = df.iloc[:, 1:-1]
# Labels: the 'targets' column
y = df['targets']

In [3]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=20)
total_training_examples = X_train.shape[0]


In [4]:
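from sklearn.preprocessing import MinMaxScaler
mm = MinMaxScaler()
# Fit the scaler on the training split only, then apply the same scaling
# to both splits so that no information from the test set leaks into training
mm.fit(X_train)
X_train = mm.transform(X_train)
X_test = mm.transform(X_test)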

In [5]:
from sklearn.neighbors import KNeighborsClassifier
# Rule of thumb: set k to the square root of the number of training examples
k = math.floor(math.sqrt(total_training_examples))
knn = KNeighborsClassifier(n_neighbors=k)

In [6]:
knn.fit(X_train, y_train)

In [7]:
from sklearn.metrics import confusion_matrix, classification_report
y_pred = knn.predict(X_test)

### Confusion Matrix
![Image](images/confusion-matrix.webp)

In [8]:
# Confusion matrix: rows are actual classes, columns are predicted classes.
# With labels [0, 1] and class 1 treated as the positive class:
#   cm[0][0] = TN, cm[0][1] = FP, cm[1][0] = FN, cm[1][1] = TP
cm = confusion_matrix(y_test, y_pred)
print('Confusion Matrix:\n', cm, '\n')
print('True negative (Actually negative, classified as negative):', cm[0][0])
print('False positive (Actually negative, classified as positive):', cm[0][1])
print('False negative (Actually positive, classified as negative):', cm[1][0])
print('True positive (Actually positive, classified as positive):', cm[1][1])

Confusion Matrix:
 [[2883  110]
 [ 456  645]] 

True negative (Actually negative, classified as negative): 2883
False positive (Actually negative, classified as positive): 110
False negative (Actually positive, classified as negative): 456
True positive (Actually positive, classified as positive): 645
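
To make the cell layout concrete, here is a minimal sketch that counts the four outcomes by hand. The helper `confusion_counts` and the toy labels are made up for illustration and are not part of the notebook's dataset. The result is arranged the way scikit-learn arranges a binary confusion matrix: rows are actual classes, columns are predicted classes, so the top-left cell holds the true negatives and the bottom-right cell holds the true positives.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tn, fp, fn, tp) for binary labels, treating `positive` as the positive class."""
    tn = fp = fn = tp = 0
    for actual, predicted in zip(y_true, y_pred):
        if actual == positive and predicted == positive:
            tp += 1  # actually positive, classified as positive
        elif actual == positive:
            fn += 1  # actually positive, classified as negative
        elif predicted == positive:
            fp += 1  # actually negative, classified as positive
        else:
            tn += 1  # actually negative, classified as negative
    return tn, fp, fn, tp


# Toy labels, invented for this example only
toy_true = [0, 0, 0, 1, 1, 1, 1, 0]
toy_pred = [0, 1, 0, 1, 0, 1, 1, 0]

tn, fp, fn, tp = confusion_counts(toy_true, toy_pred)
# Same layout as a binary confusion matrix with labels [0, 1]
print([[tn, fp], [fn, tp]])  # [[3, 1], [1, 3]]
```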


### Accuracy
Accuracy is the fraction of all decisions that the classifier got right.

Formula:
$$\text{Accuracy} = \frac{\text{Correct Decisions}}{\text{Total Decisions}} = \frac{TP + TN}{TP + TN + FP + FN}$$

For skewed datasets, accuracy may not give a fair estimate of the algorithm's performance. For instance, suppose that in a binary classification problem class 1 has 2 instances and class 0 has 98 instances, and that the algorithm predicts class 0 regardless of the input. Out of the 100 decisions, 98 are correct and 2 are wrong, giving an accuracy of 98%. Yet this classifier is useless for class 1, because it never identifies a single instance of it. For such scenarios, precision and recall come to the rescue.
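
As a rough check, accuracy can be computed directly from the four confusion-matrix counts. The sketch below uses the counts printed above (reading class 1 as the positive class) and the skewed 98/2 example from the text. The helper name `accuracy` is chosen here for illustration.

```python
def accuracy(tn, fp, fn, tp):
    """Fraction of all decisions that were correct."""
    return (tn + tp) / (tn + fp + fn + tp)


# Counts from the confusion matrix above, with class 1 as the positive class
knn_accuracy = accuracy(tn=2883, fp=110, fn=456, tp=645)

# A classifier that always predicts class 0 on 98 negatives and 2 positives
always_zero_accuracy = accuracy(tn=98, fp=0, fn=2, tp=0)

print(round(knn_accuracy, 2))   # 0.86
print(always_zero_accuracy)     # 0.98
```

The always-0 classifier scores higher on accuracy than the KNN model, even though it never finds a single instance of class 1.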

### Precision
Precision measures how trustworthy the predictions of a particular class are: of all the times the algorithm predicted that class, how many times was it correct?

Formula:
$$\text{Precision} = \frac{\text{Correct Positive}}{\text{Correct Positive} + \text{Wrong Positive}} = \frac{TP}{TP + FP}$$

- High precision means that when the algorithm predicted a particular class, it was mostly correct.
- Low precision means that when the algorithm predicted a particular class, it was mostly wrong.

Notice that precision says nothing about how often the algorithm predicted a particular class when it should have. For instance, suppose the algorithm predicted a particular class 10 times and all 10 instances actually belonged to that class. The precision is 100%. You might think this is the best possible algorithm, but here's the catch: if that class had 100 instances in total, the algorithm found only 10 of them and misclassified the other 90. To measure this second aspect, we use recall.
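
A minimal sketch of the precision calculation, using the class 1 counts from the confusion matrix above and the 10-out-of-10 example from the text. The helper `precision` is illustrative, and returning 0.0 when nothing was predicted positive is a convention assumed here, not something the notebook specifies.

```python
def precision(tp, fp):
    """Share of positive predictions that were actually positive."""
    predicted_positive = tp + fp
    # Assumed convention: report 0.0 when the class was never predicted
    return tp / predicted_positive if predicted_positive else 0.0


# Class 1 in the confusion matrix above: 645 correct and 110 wrong positive predictions
precision_class_1 = precision(tp=645, fp=110)

# The example from the text: 10 positive predictions, all of them correct
precision_example = precision(tp=10, fp=0)

print(round(precision_class_1, 2))  # 0.85
print(precision_example)            # 1.0
```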

### Recall
Recall measures how well the algorithm finds the instances of a particular class: of all the instances that actually belong to that class, how many did the algorithm identify?

Formula:
$$\text{Recall} = \frac{\text{Correct Positive}}{\text{Actual Positive}} = \frac{TP}{TP + FN}$$

- High recall means the algorithm identified most instances of a particular class.
- Low recall means the algorithm missed many instances of a particular class.

Continuing the previous example, where the class had 100 instances and the algorithm correctly identified 10 of them with a precision of 100%, the recall is very low: 10/100 = 10%.

That's how we can use precision and recall together to get insight into the algorithm's performance.
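
The recall side of the same example can be sketched in the same way. The helper `recall` is illustrative, and returning 0.0 when the class has no actual instances is an assumed convention.

```python
def recall(tp, fn):
    """Share of actual positives that the classifier identified."""
    actual_positive = tp + fn
    # Assumed convention: report 0.0 when the class has no actual instances
    return tp / actual_positive if actual_positive else 0.0


# The example from the text: 100 actual instances, only 10 of them identified
recall_example = recall(tp=10, fn=90)

# Class 1 in the confusion matrix above: 645 identified, 456 missed
recall_class_1 = recall(tp=645, fn=456)

print(recall_example)            # 0.1
print(round(recall_class_1, 2))  # 0.59
```

A precision of 100% next to a recall of 10% is exactly the combination that accuracy or precision alone would hide.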

### The Precision and Recall Tradeoff
In practice, there is a tradeoff between precision and recall. Here's how:
- High precision requires that most predictions of a particular class are correct. To achieve this, the algorithm predicts that class only when its certainty, or confidence, is very high. When the confidence is low, the algorithm does not predict the class, even though some low-confidence instances do belong to it. As a result, many instances that belong to the class are missed, which means low recall.

- High recall requires that most instances of a particular class are identified. To achieve this, the algorithm predicts that class even when its confidence is somewhat low. As a result, many instances are assigned to the class when they do not actually belong to it, which means low precision.

<div style="text-align: center;">
    <img src="images/precision-recall-tradeoff.png" width="300"/>
</div>

You would favor high precision (a high confidence threshold) when it matters more that a predicted class is correct than that every instance of the class is found. For example, this is the case when the treatment for a disease is expensive and invasive, but the disease is not life-threatening if left untreated.

Conversely, you would favor high recall (a relatively low confidence threshold) when it matters more to find every instance of a class than for every prediction to be correct. For example, this is the case when a disease is life-threatening and the risk of leaving it untreated cannot be taken.
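
The effect of the threshold can be sketched with a handful of made-up confidence scores. The `scores` and `labels` below are hypothetical values invented for this illustration and do not come from the KNN model above. Lowering the threshold turns more instances into positive predictions, which raises recall and lowers precision.

```python
# Hypothetical confidence scores for class 1 and the corresponding true labels
scores = [0.95, 0.90, 0.80, 0.70, 0.60, 0.50, 0.40, 0.30, 0.20, 0.10]
labels = [1, 1, 1, 0, 1, 0, 1, 0, 0, 0]


def precision_recall_at(threshold):
    """Predict class 1 when the score reaches the threshold, then score the result."""
    predicted = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for p, y in zip(predicted, labels) if p == 1 and y == 1)
    fp = sum(1 for p, y in zip(predicted, labels) if p == 1 and y == 0)
    fn = sum(1 for p, y in zip(predicted, labels) if p == 0 and y == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


for threshold in (0.75, 0.55, 0.35):
    p, r = precision_recall_at(threshold)
    print(f"threshold={threshold:.2f}  precision={p:.2f}  recall={r:.2f}")
# threshold=0.75  precision=1.00  recall=0.60
# threshold=0.55  precision=0.80  recall=0.80
# threshold=0.35  precision=0.71  recall=1.00
```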

### The F1 Score
The F1 score is the harmonic mean of precision and recall, which combines both into a single performance metric. The harmonic mean is dominated by the smaller of its inputs, so if either precision or recall is low, the F1 score is pulled down as well.

Formula:
$$\text{F1 Score} = 2\cdot\frac{\text{Precision}\times\text{Recall}}{\text{Precision + Recall}}$$
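
A short sketch of the formula, applied to class 1 of the confusion matrix above. The helper name `f1` is illustrative. The arithmetic mean is printed alongside to show that the harmonic mean sits closer to the lower of the two values.

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Class 1 in the confusion matrix above: TP = 645, FP = 110, FN = 456
precision_1 = 645 / (645 + 110)
recall_1 = 645 / (645 + 456)

f1_class_1 = f1(precision_1, recall_1)
arithmetic_mean = (precision_1 + recall_1) / 2

print(round(f1_class_1, 2))       # 0.7
print(round(arithmetic_mean, 2))  # 0.72
```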

### Support
Support is the number of instances (samples or examples) of a particular class in the data being evaluated.


In [9]:
cr = classification_report(y_test, y_pred)
print('Classification Report\n', cr)

Classification Report
               precision    recall  f1-score   support

           0       0.86      0.96      0.91      2993
           1       0.85      0.59      0.70      1101

    accuracy                           0.86      4094
   macro avg       0.86      0.77      0.80      4094
weighted avg       0.86      0.86      0.85      4094



### Classification Report Explanation:
- Precision and recall are shown separately for each class, i.e. 0 and 1.
- Both precision (0.86) and recall (0.96) are good for class 0, which means that the algorithm performs well at classifying class 0.
- For class 1, precision is high (0.85) but recall is low (0.59). When the algorithm predicts class 1 it is correct most of the time, but it fails to identify many instances of class 1.
- The support of 2993 for class 0 is the number of instances of class 0 in the test set.
- The support of 1101 for class 1 is the number of instances of class 1 in the test set.
- The accuracy of 86% is the ratio of correct decisions to total decisions.
- The support of 4094 in the last 3 rows is the total number of instances of all classes in the test set (2993+1101).
- Macro avg is the unweighted mean of a column: the values for all classes are added and divided by the number of classes.
- Weighted avg uses the support of each class as its weight. For instance, the weighted avg of the F1 score is calculated as:

$$\text{F1 Score (Weighted Avg)} = \frac{2993\times 0.91 + 1101\times 0.70}{2993+1101} = 0.85$$

The weighted average of the F1 score is a reasonable single-number summary of this classification report, because it reflects both precision and recall and accounts for the size of each class. However, the precision and recall of individual classes should also be analyzed, especially for classes where correct classification is critical.
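
The two averages can be reproduced with a few lines of plain Python. This is a sketch that recomputes the per-class F1 scores from the confusion-matrix counts above rather than from the rounded report values, so the variable names and the per-class reading of the counts are choices made here for illustration.

```python
# Number of test instances per class, as listed in the support column
support = {0: 2993, 1: 1101}

# Per-class F1 from the confusion-matrix counts: 2*TP / (2*TP + FP + FN),
# where TP, FP and FN are taken from the point of view of each class in turn
f1_per_class = {
    0: 2 * 2883 / (2 * 2883 + 456 + 110),
    1: 2 * 645 / (2 * 645 + 110 + 456),
}

# Macro average: every class counts equally
macro_f1 = sum(f1_per_class.values()) / len(f1_per_class)

# Weighted average: every class counts in proportion to its support
weighted_f1 = sum(f1_per_class[c] * support[c] for c in support) / sum(support.values())

print(round(macro_f1, 2))     # 0.8
print(round(weighted_f1, 2))  # 0.85
```

The weighted value is higher than the macro value because the larger class 0 also has the higher F1 score.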

