In [15]:
from sklearn.metrics import confusion_matrix, accuracy_score, precision_score, recall_score, f1_score

In [16]:
# Example predictions
y_true = [0, 1, 1, 0, 1, 0, 1, 1, 0, 0]
y_pred = [0, 1, 1, 1, 1, 0, 0, 1, 1, 0]

In [17]:
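# Confusion matrix: rows are true labels, columns are predicted labels
cm = confusion_matrix(y_true, y_pred)
# For binary 0/1 labels, ravel() flattens the 2x2 matrix in the order TN, FP, FN, TP
tn, fp, fn, tp = cm.ravel()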

In [18]:
print("Confusion Matrix:")
print(cm)
print("\nTrue Positives (TP):", tp)
print("True Negatives (TN):", tn)
print("False Positives (FP):", fp)
print("False Negatives (FN):", fn)

Confusion Matrix:
[[3 2]
 [1 4]]

True Positives (TP): 4
True Negatives (TN): 3
False Positives (FP): 2
False Negatives (FN): 1


In [19]:
# Evaluation metrics
accuracy = accuracy_score(y_true, y_pred)
precision = precision_score(y_true, y_pred)
recall = recall_score(y_true, y_pred)
f1 = f1_score(y_true, y_pred)

print("\nAccuracy:", accuracy)
print("Precision:", precision)
print("Recall:", recall)
print("F1 Score:", f1)



Accuracy: 0.7
Precision: 0.6666666666666666
Recall: 0.8
F1 Score: 0.7272727272727272
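
To make the link between the confusion matrix and the scores explicit, the sketch below recomputes each metric by hand from the four counts printed earlier. It uses the standard definitions; the `*_manual` variable names are introduced here for illustration only and are not part of the original notebook.

```python
# Counts copied from the confusion matrix output above
tp, tn, fp, fn = 4, 3, 2, 1

# Accuracy: share of all predictions that are correct
accuracy_manual = (tp + tn) / (tp + tn + fp + fn)

# Precision: share of predicted positives that are truly positive
precision_manual = tp / (tp + fp)

# Recall: share of actual positives that the model found
recall_manual = tp / (tp + fn)

# F1: harmonic mean of precision and recall
f1_manual = 2 * precision_manual * recall_manual / (precision_manual + recall_manual)

print("Accuracy (manual):", accuracy_manual)
print("Precision (manual):", precision_manual)
print("Recall (manual):", recall_manual)
print("F1 Score (manual):", f1_manual)
```

Up to floating-point rounding, these values should agree with the scikit-learn results shown above.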


2. Why Accuracy Can Be Misleading:

Imagine a large dataset in which only 1% of the instances belong to the positive class (e.g., detecting a rare disease). A model that predicts the negative class for every instance would still reach 99% accuracy, even though it never identifies a single positive case.

In [20]:
# Simulated rare disease dataset: 1 positive among 100 samples, and a model that always predicts negative
y_true_rare = [1] + [0] * 99
y_pred_rare = [0] * 100

In [23]:
# The all-negative predictor is correct on 99 of the 100 samples
accuracy_rare = accuracy_score(y_true_rare, y_pred_rare)
print("\nAccuracy for rare disease dataset:", accuracy_rare)



Accuracy for rare disease dataset: 0.99
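
The 99% figure hides the fact that the single positive case is missed. As a minimal sketch, the block below counts the outcomes in plain Python (no scikit-learn needed) and computes recall for the same simulated data. The `*_rare` count variables are names chosen here for illustration.

```python
# Same simulated data as above: one positive case, all predictions negative
y_true_rare = [1] + [0] * 99
y_pred_rare = [0] * 100

pairs = list(zip(y_true_rare, y_pred_rare))

# Count each outcome type directly from the (true, predicted) pairs
tp_rare = sum(1 for t, p in pairs if t == 1 and p == 1)
fn_rare = sum(1 for t, p in pairs if t == 1 and p == 0)
fp_rare = sum(1 for t, p in pairs if t == 0 and p == 1)
tn_rare = sum(1 for t, p in pairs if t == 0 and p == 0)

# Recall: share of actual positives that were detected
recall_rare = tp_rare / (tp_rare + fn_rare)

print("TP:", tp_rare, "FN:", fn_rare, "FP:", fp_rare, "TN:", tn_rare)
print("Recall for rare disease dataset:", recall_rare)
```

Recall comes out as 0 because the only positive case is a false negative. Precision is undefined here, since the model makes no positive predictions at all (the denominator TP + FP is 0).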


3. Different Costs for FP and FN:
 
In some scenarios, false positives and false negatives carry different costs. Consider spam email detection:

False Positive (FP): Marking a legitimate email as spam.
False Negative (FN): Failing to mark a spam email as spam.

Here, an FP is likely to be costlier, because an important email could be missed, whereas an FN only means that some spam reaches the inbox.

In [25]:
# Simulated email predictions
y_true_email = [1, 0, 1, 1, 0, 0, 1, 0, 0, 0]  # 1 is spam, 0 is legitimate
y_pred_email = [0, 0, 1, 1, 0, 0, 1, 1, 0, 0]

cm_email = confusion_matrix(y_true_email, y_pred_email)
tn_email, fp_email, fn_email, tp_email = cm_email.ravel()

# Illustrative per-error costs (arbitrary units)
cost_fp = 5  # cost of marking a legitimate email as spam
cost_fn = 2  # cost of failing to mark a spam email

total_cost = fp_email * cost_fp + fn_email * cost_fn
print("\nTotal cost for email predictions:", total_cost)



Total cost for email predictions: 7
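
A total cost is easier to interpret next to a baseline. The sketch below wraps the same calculation in a small helper and compares the predictions above with a filter that never flags anything. Both the helper `total_cost` and the baseline `y_pred_never_flag` are hypothetical names introduced for illustration, and the costs are the same illustrative values as above.

```python
def total_cost(y_true, y_pred, cost_fp, cost_fn):
    """Weighted error cost: FP count * cost_fp + FN count * cost_fn."""
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return fp * cost_fp + fn * cost_fn


y_true_email = [1, 0, 1, 1, 0, 0, 1, 0, 0, 0]  # 1 is spam, 0 is legitimate
y_pred_email = [0, 0, 1, 1, 0, 0, 1, 1, 0, 0]

# Hypothetical baseline: a filter that lets every email through
y_pred_never_flag = [0] * len(y_true_email)

model_cost = total_cost(y_true_email, y_pred_email, cost_fp=5, cost_fn=2)
baseline_cost = total_cost(y_true_email, y_pred_never_flag, cost_fp=5, cost_fn=2)

print("Cost of the predictions above:", model_cost)
print("Cost of never flagging spam:", baseline_cost)
```

With these costs, the predictions above (one FP, one FN) cost 7, while the never-flag baseline (four FNs) costs 8. Changing `cost_fp` or `cost_fn` can reverse that ordering, which is why the costs need to be chosen to reflect the application.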
