<div class="alert alert-block alert-success">
    <h1 align="center">Machine Learning in Python</h1>
    <h3 align="center">Confusion Matrix</h3>
    <h4 align="center"><a href="https://github.com/AliBinary">Ali Ghanbari</a></h4>
</div>

![image.png](attachment:image.png)

# Topics:

- [ ] What is a Confusion Matrix
- [ ] Confusion Matrix Metrics
- [ ] Displaying the Confusion Matrix using seaborn
- [ ] Confusion Matrix with Scikit-learn
- [ ] Multi-Class Confusion Matrix

## What is a Confusion Matrix?

A confusion matrix summarizes a classifier's performance as a table that counts how many samples of each actual class were assigned to each predicted class, so correct and incorrect predictions can be read off directly.

![image.png](attachment:image.png)

* Positive (P): Observation is positive.
* Negative (N): Observation is negative.
* True Positive (TP): Outcome where the model correctly predicts the positive class.
* True Negative (TN): Outcome where the model correctly predicts the negative class.
* False Positive (FP): Also called a Type I error; the model predicts the positive class when the observation is actually negative.
* False Negative (FN): Also called a Type II error; the model predicts the negative class when the observation is actually positive.

![image.png](attachment:image.png)

The count of correct predictions for a class goes into the cell where the expected (actual) row and the predicted column both correspond to that class.

Likewise, the count of incorrect predictions for a class goes into the expected row for that class and the predicted column of the class the model chose instead.

The diagonal elements represent the number of points for which the predicted label is equal to the true label, while off-diagonal elements are those that are mislabelled by the classifier. The higher the diagonal values of the confusion matrix the better, indicating many correct predictions.
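
The four counts described above can be tallied directly from the label arrays. The following is a minimal sketch, assuming NumPy and the same labels used in the 2x2 example of this notebook; the variable names are chosen here purely for illustration.

```python
import numpy as np

# Illustrative labels (1 = positive, 0 = negative).
y_actual = np.array([0, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0])
y_predicted = np.array([0, 1, 0, 0, 0, 1, 0, 1, 0, 0, 1, 0])

# Count each outcome by combining boolean masks.
tp = int(np.sum((y_actual == 1) & (y_predicted == 1)))
tn = int(np.sum((y_actual == 0) & (y_predicted == 0)))
fp = int(np.sum((y_actual == 0) & (y_predicted == 1)))
fn = int(np.sum((y_actual == 1) & (y_predicted == 0)))

# Rows are actual classes, columns are predicted classes.
matrix = np.array([[tn, fp],
                   [fn, tp]])
print(matrix)

# The diagonal holds the correct predictions.
correct = int(np.trace(matrix))
print(correct, "of", int(matrix.sum()), "predictions are correct")
```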

![image.png](attachment:image.png)

In [None]:
%matplotlib inline
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import sklearn

## Example of 2x2 Confusion Matrix

In this synthetic dataset, logistic regression would classify each value as either 0 or 1 (that is, as the first or the second class) using the logistic curve.

![image.png](attachment:image.png)

In [None]:
data = {'y_Actual':    [0, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0],
        'y_Predicted': [0, 1, 0, 0, 0, 1, 0, 1, 0, 0, 1, 0]
        }

In [None]:
df = pd.DataFrame(data, columns=['y_Actual','y_Predicted'])
df

In [None]:
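# Rows are the actual classes and columns the predicted classes,
# matching the layout returned by sklearn.metrics.confusion_matrix.
cm_df = pd.crosstab(df['y_Actual'], df['y_Predicted'], rownames=['Actual'], colnames=['Predicted'])
cm_df

In [None]:
# fmt='d' prints the cell counts as integers
sns.heatmap(cm_df, annot=True, fmt='d')
plt.show()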

In [None]:
# Confusion Matrix
from sklearn.metrics import confusion_matrix
confusion_matrix(df["y_Actual"], df["y_Predicted"])
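
For a binary problem, the array returned by scikit-learn has actual classes in rows and predicted classes in columns, with labels in sorted order, so flattening it yields the four counts in the order TN, FP, FN, TP. A short sketch, using the example labels as plain lists so the block runs on its own:

```python
from sklearn.metrics import confusion_matrix

y_actual = [0, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0]
y_predicted = [0, 1, 0, 0, 0, 1, 0, 1, 0, 0, 1, 0]

cm = confusion_matrix(y_actual, y_predicted)

# ravel() flattens the 2x2 array row by row: [[TN, FP], [FN, TP]]
tn, fp, fn, tp = cm.ravel()
print(f"TN={tn}, FP={fp}, FN={fn}, TP={tp}")
```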

In [None]:
# Accuracy
from sklearn.metrics import accuracy_score
accuracy_score(df["y_Actual"], df["y_Predicted"])

In [None]:
# Recall
from sklearn.metrics import recall_score
recall_score(df["y_Actual"], df["y_Predicted"])

In [None]:
# Precision
from sklearn.metrics import precision_score
precision_score(df["y_Actual"], df["y_Predicted"])
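
For reference, the three scores computed above follow the standard definitions in terms of the confusion-matrix counts:

```latex
\begin{aligned}
\text{Accuracy}  &= \frac{TP + TN}{TP + TN + FP + FN} \\
\text{Precision} &= \frac{TP}{TP + FP} \\
\text{Recall}    &= \frac{TP}{TP + FN}
\end{aligned}
```

With the example data (TP = 2, TN = 7, FP = 2, FN = 1) these give an accuracy of 9/12 = 0.75, a precision of 2/4 = 0.50, and a recall of 2/3 ≈ 0.67.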

#### Precision is a useful metric when false positives are a greater concern than false negatives.

Precision matters in music or video recommendation systems, e-commerce websites, and similar products, where irrelevant results (false positives) can lead to customer churn and harm the business.

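#### Recall is a useful metric when false negatives are a greater concern than false positives.

Example: COVID-19 testing, where telling an infected person they are healthy (a false negative) is more costly than a false alarm.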

![image.png](attachment:image.png)

In [None]:
# F1-score, Method 1: sklearn
from sklearn.metrics import f1_score
f1_score(df["y_Actual"], df["y_Predicted"])

In [None]:
# Method 2: Manual Calculation
recall = recall_score(df["y_Actual"], df["y_Predicted"])
precision = precision_score(df["y_Actual"], df["y_Predicted"])

F1 = 2 * (precision * recall) / (precision + recall)
F1

In [None]:
# Method 3: Classification report 
from sklearn.metrics import classification_report
print(classification_report(df["y_Actual"], df["y_Predicted"]))
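
When the numbers are needed programmatically rather than as printed text, `classification_report` can also return a dictionary via `output_dict=True`. A minimal sketch, again with the example labels as plain lists; the per-class entries are keyed by the label converted to a string:

```python
from sklearn.metrics import classification_report

y_actual = [0, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0]
y_predicted = [0, 1, 0, 0, 0, 1, 0, 1, 0, 0, 1, 0]

report = classification_report(y_actual, y_predicted, output_dict=True)

# Scores for the positive class (label 1).
positive = report["1"]
print(f"precision={positive['precision']:.2f}, "
      f"recall={positive['recall']:.2f}, "
      f"f1={positive['f1-score']:.2f}")
```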

# Confusion Matrix in a nutshell

![image.png](attachment:image.png)

# Confusion Matrix for Multi-Class Classification

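In a multi-class problem, each class is evaluated in turn as the positive class, with all other classes grouped together as negative (one-vs-rest). For the class examined in the figure below:

* TP = 7
* TN = (2+3+2+1) = 8
* FP = (8+9) = 17
* FN = (1+3) = 4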

![image.png](attachment:image.png)

> Precision = 7/(7+17) = 0.29

> Recall = 7/(7+4) = 0.64

> F1-score = 2 × (0.29 × 0.64) / (0.29 + 0.64) = 0.40
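
The same one-vs-rest bookkeeping can be sketched with NumPy. The 3x3 matrix below is a hypothetical layout (rows = actual, columns = predicted) arranged only so that class 0 reproduces the counts listed above; the placement of the remaining cells is an assumption, not taken from the figure.

```python
import numpy as np

# Hypothetical 3x3 confusion matrix: rows = actual, columns = predicted.
# Arranged so that class 0 gives TP=7, FN=1+3, FP=8+9, TN=2+3+2+1.
cm = np.array([[7, 1, 3],
               [8, 2, 3],
               [9, 2, 1]])

k = 0                          # class treated as positive
tp = cm[k, k]                  # diagonal cell of class k
fn = cm[k, :].sum() - tp       # rest of the actual row
fp = cm[:, k].sum() - tp       # rest of the predicted column
tn = cm.sum() - tp - fn - fp   # everything else

precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)

print(f"TP={tp}, TN={tn}, FP={fp}, FN={fn}")
print(f"precision={precision:.2f}, recall={recall:.2f}, F1={f1:.2f}")
```

Changing `k` to 1 or 2 gives the corresponding counts for the other classes of this illustrative matrix.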