# Confusion Matrix Quick Demo

The following code demonstrates how to create a confusion matrix from a model's predictions.

For this, we import the `confusion_matrix` function from the `sklearn.metrics` module, which generates the confusion matrix for us. We'll also have a look at `accuracy_score` and `classification_report`.

In [1]:
from sklearn.metrics import confusion_matrix 
from sklearn.metrics import accuracy_score 
from sklearn.metrics import classification_report

In [2]:
# A confusion matrix compares the actual target values (y_test) with the values predicted by our model (often called y_pred).
# Here, we make up small lists of actual and predicted values as an example.
actual    = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0] 
predicted = [1, 0, 0, 1, 0, 0, 1, 1, 1, 0] 

# Create the confusion matrix, which compares actual and predicted values
results = confusion_matrix(actual, predicted)

print('Confusion Matrix :')
print(results)

print('Accuracy Score :', accuracy_score(actual, predicted))

print('Report : ')
print(classification_report(actual, predicted))

Confusion Matrix :
[[4 2]
 [1 3]]
Accuracy Score : 0.7
Report : 
              precision    recall  f1-score   support

           0       0.80      0.67      0.73         6
           1       0.60      0.75      0.67         4

    accuracy                           0.70        10
   macro avg       0.70      0.71      0.70        10
weighted avg       0.72      0.70      0.70        10



## Orientation
Take a moment to orient yourself so you don't mix things up:  
- In the confusion matrix, what do rows mean and what do columns mean?  
- Where are the TP, FP, TN, and FN? 
> Hint: They are easily identifiable by counting.

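Here's a generic confusion matrix. Note that it lists the positive class first, whereas the scikit-learn output above sorts the labels (0 before 1), so its layout is `[[TN, FP], [FN, TP]]`.

|                 | Predicted Positive  | Predicted Negative  |
|-----------------|---------------------|---------------------|
| Actual Positive | True Positive (TP)  | False Negative (FN) |
| Actual Negative | False Positive (FP) | True Negative (TN)  |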
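
To see our example data in the positive-first layout of the generic table, one option is to pass `labels=[1, 0]`. This is a small sketch that reuses the `actual` and `predicted` lists from the demo; the variable names `default_cm` and `positive_first_cm` are only for illustration.

```python
from sklearn.metrics import confusion_matrix

# Same example data as in the demo above
actual    = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0]
predicted = [1, 0, 0, 1, 0, 0, 1, 1, 1, 0]

# Default: labels are sorted (0, then 1), so the layout is [[TN, FP], [FN, TP]]
default_cm = confusion_matrix(actual, predicted)

# labels=[1, 0] puts the positive class first: [[TP, FN], [FP, TN]]
positive_first_cm = confusion_matrix(actual, predicted, labels=[1, 0])

print(default_cm)
print(positive_first_cm)
```

Both matrices hold the same four counts; only the order of rows and columns changes.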

- If FP is low (say, 1) but FN is high, the model misses too many true cases.
- If FN is low (say, 1) but FP is high, the model raises too many false alarms (in a medical setting, healthy people are repeatedly flagged).
- The best model minimizes both, but how much of each you can tolerate depends on the application.

For our example, counting gives:

- TN: 4
- FP: 2
- FN: 1
- TP: 3
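
Rather than counting by hand, the four values can also be unpacked directly from the matrix. This is a minimal sketch for the binary case, reusing the demo's `actual` and `predicted` lists: flattening the 2x2 array with `.ravel()` yields the counts in the order TN, FP, FN, TP.

```python
from sklearn.metrics import confusion_matrix

# Same example data as in the demo above
actual    = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0]
predicted = [1, 0, 0, 1, 0, 0, 1, 1, 1, 0]

# ravel() flattens [[TN, FP], [FN, TP]] row by row
tn, fp, fn, tp = confusion_matrix(actual, predicted).ravel()

print(f"TN: {tn}, FP: {fp}, FN: {fn}, TP: {tp}")
```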

### Quick Recap on Orientation

- Rows = Actual
- Columns = Predicted
- Each cell tells you "How many times did the model predict X when the actual value was Y?"

If you count:

- The sum of row 1 (actual 0) is 6 (4+2)
- The sum of row 2 (actual 1) is 4 (1+3)

This matches the "support" values from our report.
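
The other numbers in the report can be reproduced from the same four counts. The sketch below applies the standard definitions of precision, recall, F1, and accuracy to class 1 in plain Python; the variable names are only for illustration.

```python
# Counts read off the confusion matrix of the demo
tn, fp, fn, tp = 4, 2, 1, 3

# Precision: of everything predicted positive, how much was really positive?
precision = tp / (tp + fp)
# Recall: of everything actually positive, how much did the model find?
recall = tp / (tp + fn)
# F1: harmonic mean of precision and recall
f1 = 2 * precision * recall / (precision + recall)
# Accuracy: share of all predictions that were correct
accuracy = (tp + tn) / (tp + tn + fp + fn)

print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f} accuracy={accuracy:.2f}")
```

Rounded to two decimals, these agree with the row for class 1 and the accuracy line in the classification report (0.60, 0.75, 0.67, and 0.70).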

## Further Reading


- https://machinelearningmastery.com/confusion-matrix-machine-learning/
- http://www.dataschool.io/simple-guide-to-confusion-matrix-terminology/


## False Positives vs. False Negatives

A classic Data Science interview question is to ask "Which is better: more false positives or more false negatives?"


This is a trick question designed to test your critical thinking on the topics of precision and recall. 



As you're probably thinking, the answer is "It depends on the problem!". 



Sometimes, our model may be focused on a problem where False Positives are much worse than False Negatives, or vice versa. For instance, consider detecting credit card fraud. A False Positive would be when our model flags a transaction as fraudulent when it isn't. This results in a slightly annoyed customer. On the other hand, a False Negative might be a fraudulent transaction that the company mistakenly lets through as normal consumer behavior. In this case, the credit card company could be on the hook for reimbursing the customer for thousands of dollars because it missed the signs that the transaction was fraudulent! Although being wrong is never ideal, it makes sense that credit card companies tend to build their models to be a bit too sensitive, because high recall saves them more money than high precision.
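
One common way to make a model "a bit too sensitive" is to lower the score threshold above which a transaction is flagged. The sketch below illustrates the resulting trade-off in plain Python. The scores and labels are made up purely for illustration, and `precision_recall` is a hypothetical helper, not a library function.

```python
# Hypothetical fraud scores (higher = more suspicious) and true labels (1 = fraud).
# These numbers are invented for illustration only.
scores = [0.95, 0.80, 0.65, 0.55, 0.45, 0.35, 0.30, 0.20, 0.10, 0.05]
labels = [1,    1,    0,    1,    0,    1,    0,    0,    0,    0]

def precision_recall(threshold):
    """Flag a transaction as fraud when its score is >= threshold."""
    flagged = [s >= threshold for s in scores]
    tp = sum(f and y == 1 for f, y in zip(flagged, labels))
    fp = sum(f and y == 0 for f, y in zip(flagged, labels))
    fn = sum((not f) and y == 1 for f, y in zip(flagged, labels))
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

# Lowering the threshold flags more transactions
for threshold in (0.7, 0.5, 0.3):
    p, r = precision_recall(threshold)
    print(f"threshold={threshold}: precision={p:.2f}, recall={r:.2f}")
```

On this toy data, moving the threshold from 0.7 down to 0.3 raises recall from 0.50 to 1.00 while precision falls from 1.00 to 0.57: more fraud is caught, at the price of more annoyed customers.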

Take a few minutes and see if you can think of at least 2 examples each of situations where a high precision might be preferable to high recall, and 2 examples where high recall might be preferable to high precision. 

In summary, the "best" model is not always the one with the fewest false positives. The best model depends on your objective and on the actual cost of each type of error, which matters especially in healthcare and other sensitive sectors. In health-related cases, either metric can be the priority, depending on what each kind of mistake costs. Likewise, you would want higher precision in a model that predicts a contaminated spot in soil in contact with groundwater, and higher recall when predicting a good match between pets and new pet owners.
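
One way to make "it depends on the costs" concrete is to attach a cost to each error type and compare models by total cost. The sketch below is illustrative only: the two models, their error counts, the cost figures, and the helper names `total_cost` and `cheapest` are all hypothetical.

```python
# Hypothetical error counts for two models evaluated on the same test set
models = {
    "model_a": {"fp": 1, "fn": 8},   # few false alarms, many misses
    "model_b": {"fp": 12, "fn": 1},  # many false alarms, few misses
}

def total_cost(counts, cost_fp, cost_fn):
    """Total cost of a model's errors, given a per-error cost for each type."""
    return counts["fp"] * cost_fp + counts["fn"] * cost_fn

def cheapest(cost_fp, cost_fn):
    """Name of the model with the lowest total error cost."""
    return min(models, key=lambda name: total_cost(models[name], cost_fp, cost_fn))

# Misses are expensive (e.g. undetected fraud): the model with few FN wins
print(cheapest(cost_fp=5, cost_fn=1000))   # model_b

# False alarms are expensive (e.g. a risky treatment): the model with few FP wins
print(cheapest(cost_fp=1000, cost_fn=5))   # model_a
```

The same two models swap places as soon as the relative costs change, which is why neither error count alone identifies the better model.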

### High Precision Preferred

High precision matters when you really want to trust that a positive prediction is correct. You want very few false positives, even if you miss some real positives.

Examples:

- Identifying contaminated soil spots near groundwater: Here, a false positive might lead to unnecessary and expensive soil remediation, so you want to avoid incorrectly labeling a clean spot as contaminated.
- Spam email filters: You want high precision so that when an email is marked as spam, you can trust it really is spam. Otherwise, an important non-spam message might end up in the spam folder and be missed (a damaging false positive).
- Medical diagnosis of a rare disease with dangerous treatment: If the treatment is risky or costly, you want to be sure those labeled "disease positive" actually have it (e.g., before recommending chemotherapy).

### High Recall Preferred

High recall matters when catching every possible positive is more important than avoiding a few mistakes. You want very few false negatives, even if some false alarms slip through.

Examples:

- Health screenings for contagious diseases: Missing an infected person (false negative) could let an illness spread, so you want to catch every possible case, even if it means some healthy people are told to isolate unnecessarily.
- Predicting good matches between pets and new pet owners: If the goal is to make sure every potentially suitable pairing is considered, you'd rather have a few mismatches than overlook a great adoption opportunity.
- Fire alarm systems: Better to have too many false alarms (people evacuate unnecessarily) than to miss a real fire.