# Precision, Recall & F1-Score — Complete Theory (Binary + Multi-Class)
---

## 1. Why Accuracy Fails

Accuracy becomes **misleading** when the dataset is **imbalanced**.

Example: Airport terrorist detection

* 99,999 normal passengers
* 1 terrorist

A model that predicts **"no terrorist" for everyone**:

$$
Accuracy = \frac{99999}{100000} = 99.999\%
$$

Yet the model is **useless**: it never detects the one case that matters.
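A quick sketch of the example above. It uses hypothetical label arrays that mirror the 99,999 / 1 split. The always-negative "model" gets near-perfect accuracy, but its recall is zero because it never finds the one positive.

```python
import numpy as np
from sklearn.metrics import accuracy_score, recall_score

# Hypothetical labels mirroring the example: 99,999 normal passengers (0), 1 terrorist (1)
y_true = np.zeros(100_000, dtype=int)
y_true[-1] = 1

# A "model" that predicts "no terrorist" (0) for everyone
y_pred = np.zeros_like(y_true)

acc = accuracy_score(y_true, y_pred)
rec = recall_score(y_true, y_pred)  # TP / (TP + FN) for the positive class (1)
print(f"Accuracy: {acc:.5f}")
print(f"Recall:   {rec:.1f}")
```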

➡️ Solution: use **Precision, Recall, F1-score**.

---

## 2. Confusion Matrix Refresher

|              | Predicted + | Predicted − |
| ------------ | ----------- | ----------- |
| **Actual +** | TP          | FN          |
| **Actual −** | FP          | TN          |

* **Type-I Error** = False Positive (FP)
* **Type-II Error** = False Negative (FN)
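A small sketch that reads TP, FN, FP and TN from scikit-learn's `confusion_matrix`. The labels are hypothetical toy data. By default scikit-learn sorts labels ascending (0, 1), so its matrix is laid out as `[[TN, FP], [FN, TP]]`. Passing `labels=[1, 0]` reproduces the layout of the table above.

```python
from sklearn.metrics import confusion_matrix

# Hypothetical binary labels: 1 = positive, 0 = negative
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 1, 0, 0, 0, 0]

# Default label order (0, 1): rows = actual, columns = predicted
# [[TN, FP],
#  [FN, TP]]
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(f"TP={tp} FN={fn} FP={fp} TN={tn}")

# labels=[1, 0] puts the positive class first: [[TP, FN], [FP, TN]]
cm_table = confusion_matrix(y_true, y_pred, labels=[1, 0])
print(cm_table)
```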

---

## 3. Precision

### Definition

> Of all points predicted as positive, how many are actually positive?

$$
Precision = \frac{TP}{TP + FP}
$$
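The formula can be checked by hand on hypothetical toy labels. The sketch counts TP and FP directly and compares the result with `precision_score`. That function treats label `1` as the positive class by default.

```python
from sklearn.metrics import precision_score

# Hypothetical binary labels: 1 = positive, 0 = negative
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 1, 0, 0, 0, 0]

# Count the two terms of the formula directly
tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)

manual_precision = tp / (tp + fp)               # 3 / (3 + 2)
sk_precision = precision_score(y_true, y_pred)  # pos_label=1 by default
print(manual_precision, sk_precision)
```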

### When to use

* False positives are **dangerous**
* Examples:

  * Spam detection: an important email (e.g. a job offer) wrongly marked as spam

### Intuition

"Don’t mark something as positive unless you’re very sure."

---

## 4. Recall (Sensitivity)

### Definition

> Of all actual positives, how many did the model correctly find?

$$
Recall = \frac{TP}{TP + FN}
$$
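The same kind of check works for recall, using the same hypothetical toy labels. Only the denominator changes: it now counts the positives the model missed (FN) instead of the false alarms (FP).

```python
from sklearn.metrics import recall_score

# Hypothetical binary labels: 1 = positive, 0 = negative
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 1, 0, 0, 0, 0]

# Count the two terms of the formula directly
tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

manual_recall = tp / (tp + fn)            # 3 / (3 + 1)
sk_recall = recall_score(y_true, y_pred)  # pos_label=1 by default
print(manual_recall, sk_recall)
```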

### When to use

* False negatives are **dangerous**
* Examples:

  * Cancer detection
  * Fraud detection
  * Terrorist detection

### Intuition

"Don’t miss any real positives."

---

## 5. Precision vs Recall Trade-off

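* Increasing **precision** usually ↓ recall (e.g. by raising the decision threshold)
* Increasing **recall** usually ↓ precision (e.g. by lowering the decision threshold)

For a fixed model you generally cannot maximize both at once; moving the threshold trades one for the other.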
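A minimal sketch of the trade-off. It assumes a model that outputs a score (probability of the positive class) for each sample. The labels and scores are hypothetical. Sweeping the decision threshold shows precision rising while recall falls.

```python
import numpy as np
from sklearn.metrics import precision_score, recall_score

# Hypothetical true labels and model scores (probability of the positive class)
y_true = np.array([0, 0, 1, 0, 1, 1, 0, 1, 1, 1])
scores = np.array([0.10, 0.30, 0.35, 0.40, 0.55, 0.60, 0.65, 0.80, 0.90, 0.95])

results = []
for threshold in (0.3, 0.5, 0.7):
    # Predict positive when the score reaches the threshold
    y_pred = (scores >= threshold).astype(int)
    p = precision_score(y_true, y_pred)
    r = recall_score(y_true, y_pred)
    results.append((threshold, p, r))
    print(f"threshold={threshold:.1f}  precision={p:.2f}  recall={r:.2f}")
```

On this toy data, raising the threshold from 0.3 to 0.7 moves precision from about 0.67 to 1.00, while recall drops from 1.00 to 0.50.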

---

## 6. F1-Score

### Why F1?

When **both FP and FN matter** and neither can be ignored.

### Formula (Harmonic Mean)

$$
F1 = \frac{2 \cdot Precision \cdot Recall}{Precision + Recall}
$$

### Why Harmonic Mean?

* Penalizes **low values**
* If either precision or recall is low → F1 drops sharply

Example:

* Precision = 0.9
* Recall = 0.1

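$$
F1 = \frac{2 \cdot 0.9 \cdot 0.1}{0.9 + 0.1} = 0.18
$$

The arithmetic mean, (0.9 + 0.1) / 2 = 0.5, would hide the poor recall.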
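The comparison can also be run in code. `f1_from_pr` is a hypothetical helper that applies the formula above. It is cross-checked against scikit-learn's `f1_score` on toy labels whose precision is 0.6 and recall is 0.75.

```python
from sklearn.metrics import f1_score

def f1_from_pr(precision, recall):
    """Harmonic mean of precision and recall (returns 0 when both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# The example above: harmonic mean vs. arithmetic mean
p, r = 0.9, 0.1
print(f1_from_pr(p, r), (p + r) / 2)

# Cross-check on hypothetical labels with precision 0.6 and recall 0.75
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 1, 0, 0, 0, 0]
print(f1_from_pr(0.6, 0.75), f1_score(y_true, y_pred))
```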

---

## 7. Binary Classification Summary

| Scenario            | Priority Metric |
| ------------------- | --------------- |
| Spam filtering      | Precision       |
| Cancer detection    | Recall          |
| Balanced importance | F1-Score        |

---

## 8. Multi-Class Classification Metrics

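For **K classes**, metrics are computed **per class**. Each class c is treated in turn as the positive class, and all other classes are treated as negative (one-vs-rest). Here TP_c counts class-c samples predicted as c, FP_c counts samples of other classes predicted as c, and FN_c counts class-c samples predicted as some other class.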

### Per-Class Precision

$$
Precision_c = \frac{TP_c}{TP_c + FP_c}
$$

### Per-Class Recall

$$
Recall_c = \frac{TP_c}{TP_c + FN_c}
$$

### Per-Class F1

$$
F1_c = \frac{2 \cdot Precision_c \cdot Recall_c}{Precision_c + Recall_c}
$$
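A sketch of the one-vs-rest computation from a multi-class confusion matrix (rows = actual, columns = predicted). The matrix reuses the Logistic Regression confusion matrix from the Iris example later in these notes. The per-class precision and recall match the `average=None` outputs shown there.

```python
import numpy as np

# Rows = actual class, columns = predicted class
cm = np.array([[11, 0, 0],
               [0, 12, 1],
               [0, 0, 6]])

tp = np.diag(cm)          # correctly predicted samples of each class
fp = cm.sum(axis=0) - tp  # predicted as class c but actually another class
fn = cm.sum(axis=1) - tp  # actually class c but predicted as another class

precision_c = tp / (tp + fp)
recall_c = tp / (tp + fn)
f1_c = 2 * precision_c * recall_c / (precision_c + recall_c)

print("Precision per class:", precision_c)
print("Recall per class:   ", recall_c)
print("F1 per class:       ", f1_c)
```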

---

## 9. Averaging Methods (Multi-Class)

### Macro Average

Simple mean across classes:

$$
Macro = \frac{1}{K} \sum_{c=1}^{K} metric_c
$$

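✔️ Use when classes are **balanced**, or when every class (including rare ones) should count equally. A poorly predicted minority class pulls the macro score down.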
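A minimal check of the macro formula on hypothetical 3-class toy labels. The macro value is just the plain mean of the per-class scores returned by `average=None`.

```python
import numpy as np
from sklearn.metrics import precision_score

# Hypothetical 3-class labels (class 0 is the majority)
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 2]
y_pred = [0, 0, 0, 0, 0, 1, 1, 1, 0, 2]

per_class = precision_score(y_true, y_pred, average=None)  # one value per class
macro_manual = np.mean(per_class)                           # every class counts equally
macro_sklearn = precision_score(y_true, y_pred, average='macro')
print(per_class, macro_manual, macro_sklearn)
```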

---

### Weighted Average

$$
Weighted = \sum_{c=1}^{K} w_c \cdot metric_c
$$

where

$$
w_c = \frac{samples_c}{total\ samples}
$$

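✔️ Use when classes are **imbalanced** and the score should reflect class frequencies. Large classes dominate the result, so weak performance on a rare class can be hidden.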
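The weighted average can be checked on the same kind of hypothetical toy labels. Here w_c is computed from the class counts in `y_true` (the support). The weighted sum of the per-class scores is then compared with `average='weighted'`.

```python
import numpy as np
from sklearn.metrics import precision_score

# Hypothetical 3-class labels (class 0 is the majority)
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 2]
y_pred = [0, 0, 0, 0, 0, 1, 1, 1, 0, 2]

per_class = precision_score(y_true, y_pred, average=None)
support = np.bincount(y_true)           # samples per class in y_true
weights = support / support.sum()       # w_c = samples_c / total samples
weighted_manual = np.sum(weights * per_class)
weighted_sklearn = precision_score(y_true, y_pred, average='weighted')
print(support, weighted_manual, weighted_sklearn)
```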

---

## 10. Python Code (Binary + Multi-Class)

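```python
from sklearn.metrics import (
    confusion_matrix,
    precision_score,
    recall_score,
    f1_score,
    classification_report
)

# Binary classification (toy labels, 1 = positive class; replace with your own)
y_test_bin = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred_bin = [1, 1, 1, 0, 1, 1, 0, 0, 0, 0]
print(confusion_matrix(y_test_bin, y_pred_bin))
precision = precision_score(y_test_bin, y_pred_bin)
recall = recall_score(y_test_bin, y_pred_bin)
f1 = f1_score(y_test_bin, y_pred_bin)

# Multi-class (macro / weighted); toy labels for classes 0, 1, 2
y_test = [0, 0, 0, 0, 0, 0, 1, 1, 1, 2]
y_pred = [0, 0, 0, 0, 0, 1, 1, 1, 0, 2]
precision_macro = precision_score(y_test, y_pred, average='macro')
recall_weighted = recall_score(y_test, y_pred, average='weighted')
f1_macro = f1_score(y_test, y_pred, average='macro')

print(classification_report(y_test, y_pred))
```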

---

## 11. classification_report Output Explained

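* Precision per class
* Recall per class
* F1 per class
* Support = number of true samples of each class in `y_test`
* Accuracy = overall accuracy (a single value, not per class)
* Macro avg = unweighted mean of the per-class scores
* Weighted avg = support-weighted mean of the per-class scores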
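The same report is also available as a nested dictionary through the `output_dict=True` parameter, which is handy when individual numbers are needed. The sketch uses hypothetical toy labels. Class labels become string keys (`'0'`, `'1'`, ...), and the averages appear under `'macro avg'` and `'weighted avg'`.

```python
from sklearn.metrics import classification_report

# Hypothetical 3-class labels
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 2]
y_pred = [0, 0, 0, 0, 0, 1, 1, 1, 0, 2]

print(classification_report(y_true, y_pred))

# Same numbers as a nested dict
report = classification_report(y_true, y_pred, output_dict=True)
print(report['1']['precision'])          # per-class row (label 1 becomes key '1')
print(report['0']['support'])            # true samples of class 0
print(report['macro avg']['f1-score'])   # unweighted mean over classes
print(report['weighted avg']['recall'])  # support-weighted mean
print(report['accuracy'])                # overall accuracy (single number)
```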

---

## 12. Interview-Ready One-Liners

* **Accuracy fails on imbalanced data**
* **Precision controls false positives**
* **Recall controls false negatives**
* **F1 balances precision & recall**
* **Macro → balanced classes**
* **Weighted → imbalanced classes**

---

## 13. Final Takeaway

Metric choice is **problem-dependent**, not fixed.

> A good ML engineer chooses metrics **before** training models.

---


## 14. Worked Example: Iris Dataset

```python
import numpy as np
import pandas as pd
from sklearn.datasets import load_iris

# Load the Iris dataset; 'species' holds the class label (0, 1, 2)
iris = load_iris()
df = pd.DataFrame(data=iris.data, columns=iris.feature_names)
df['species'] = iris.target
df.head()
```

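|       | sepal length (cm) | sepal width (cm) | petal length (cm) | petal width (cm) | species |
| ----- | ----------------- | ---------------- | ----------------- | ---------------- | ------- |
| **0** | 5.1               | 3.5              | 1.4               | 0.2              | 0       |
| **1** | 4.9               | 3.0              | 1.4               | 0.2              | 0       |
| **2** | 4.7               | 3.2              | 1.3               | 0.2              | 0       |
| **3** | 4.6               | 3.1              | 1.5               | 0.2              | 0       |
| **4** | 5.0               | 3.6              | 1.4               | 0.2              | 0       |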


```python
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

# Hold out 20% of the rows for testing; random_state fixes the split
X_train, X_test, y_train, y_test = train_test_split(
    df.iloc[:, 0:-1], df.iloc[:, -1], test_size=0.2, random_state=1
)

# Train two classifiers on the same split
clf1 = LogisticRegression()
clf2 = DecisionTreeClassifier()
clf1.fit(X_train, y_train)
clf2.fit(X_train, y_train)

y_pred1 = clf1.predict(X_test)
y_pred2 = clf2.predict(X_test)
```

```python
from sklearn.metrics import accuracy_score, confusion_matrix

print("Accuracy of Logistic Regression", accuracy_score(y_test, y_pred1))
print("Accuracy of Decision Trees", accuracy_score(y_test, y_pred2))
```

```
Accuracy of Logistic Regression 0.9666666666666667
Accuracy of Decision Trees 0.9666666666666667
```


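```python
print("Logistic Regression Confusion Matrix\n")
# Rows = actual class, columns = predicted class
pd.DataFrame(confusion_matrix(y_test, y_pred1), columns=list(range(0, 3)))
```

**Logistic Regression Confusion Matrix** (rows = actual class, columns = predicted class)

| Actual / Predicted | 0  | 1  | 2 |
| ------------------ | -- | -- | - |
| **0**              | 11 | 0  | 0 |
| **1**              | 0  | 12 | 1 |
| **2**              | 0  | 0  | 6 |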


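```python
print("Decision Tree Confusion Matrix\n")
# Rows = actual class, columns = predicted class
pd.DataFrame(confusion_matrix(y_test, y_pred2), columns=list(range(0, 3)))
```

**Decision Tree Confusion Matrix** (rows = actual class, columns = predicted class)

| Actual / Predicted | 0  | 1  | 2 |
| ------------------ | -- | -- | - |
| **0**              | 11 | 0  | 0 |
| **1**              | 0  | 12 | 1 |
| **2**              | 0  | 0  | 6 |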


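```python
# Side-by-side comparison of true labels and both models' predictions
result = pd.DataFrame()
result['Actual Label'] = y_test
result['Logistic Regression Prediction'] = y_pred1
result['Decision Tree Prediction'] = y_pred2
result.sample(10)  # random rows; pass random_state for a reproducible sample
```

| Index | Actual Label | Logistic Regression Prediction | Decision Tree Prediction |
| ----- | ------------ | ------------------------------ | ------------------------ |
| 66    | 1            | 1                              | 1                        |
| 99    | 1            | 1                              | 1                        |
| 56    | 1            | 1                              | 1                        |
| 120   | 2            | 2                              | 2                        |
| 5     | 0            | 0                              | 0                        |
| 78    | 1            | 1                              | 1                        |
| 19    | 0            | 0                              | 0                        |
| 90    | 1            | 1                              | 1                        |
| 146   | 2            | 2                              | 2                        |
| 98    | 1            | 1                              | 1                        |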


```python
from sklearn.metrics import precision_score, recall_score

# average=None returns one score per class (0, 1, 2) instead of a single average
precision_score(y_test, y_pred1, average=None)
```

```
array([1.        , 1.        , 0.85714286])
```

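```python
recall_score(y_test, y_pred1, average=None)
```

```
array([1.        , 0.92307692, 1.        ])
```

Both per-class results follow from the confusion matrix above. The single class-1 flower predicted as class 2 lowers class-2 precision to 6/7 ≈ 0.857 and class-1 recall to 12/13 ≈ 0.923.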
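The notebook stops at per-class precision and recall. The sketch below extends it to per-class, macro and weighted F1 plus the full `classification_report`. It is self-contained, so it reloads Iris and repeats the split with the same `test_size` and `random_state`, assuming the same scikit-learn version as the cells above. If the predictions match the confusion matrix above, per-class F1 should come out at roughly 1.00, 0.96 and 0.92.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, f1_score
from sklearn.model_selection import train_test_split

# Reproduce the split used above (same test_size and random_state)
iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=1
)

clf = LogisticRegression()
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)

f1_per_class = f1_score(y_test, y_pred, average=None)
f1_macro = f1_score(y_test, y_pred, average='macro')
f1_weighted = f1_score(y_test, y_pred, average='weighted')
support = np.bincount(y_test, minlength=3)  # true samples per class

print("Per-class F1:", f1_per_class)
print("Macro F1:    ", f1_macro)
print("Weighted F1: ", f1_weighted)
print(classification_report(y_test, y_pred, target_names=iris.target_names))
```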