# Classification Metrics: Accuracy & Confusion Matrix (Complete Theory + Code)

---

## 1. Why Classification Metrics?

After training a **classification model** (e.g. Logistic Regression, Decision Tree), the most important question is:

> **How good is the model?**

In **regression**, we used metrics like MAE, MSE, RMSE, R².

In **classification**, we use **classification metrics** to evaluate model performance.

This document covers **Accuracy** and **Confusion Matrix** in full depth, with intuition, math, edge cases, and Python code.

---

## 2. Problem Setup (Binary Classification)

Example dataset:

* Features: CGPA, IQ
* Target: Placement

  * `1` → Placed
  * `0` → Not Placed

Steps:

1. Split data into **Train** and **Test**
2. Train multiple models
3. Predict on test data
4. Compare predictions with actual labels
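
The four steps above can be sketched as follows. This is only an illustration: the placement data is synthetic (generated with a made-up rule), and the names `cgpa`, `iq`, `score`, and `correct_counts` are assumptions introduced for the example, not part of any real dataset.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the placement data (assumption: not a real dataset).
rng = np.random.default_rng(0)
cgpa = rng.uniform(5.0, 10.0, size=200)
iq = rng.normal(100.0, 15.0, size=200)
X = np.column_stack([cgpa, iq])
# Hypothetical rule: higher CGPA and IQ make placement more likely.
score = (cgpa - 7.5) + (iq - 100.0) / 15.0 + rng.normal(0.0, 0.5, size=200)
y = (score > 0).astype(int)  # 1 = Placed, 0 = Not Placed

# 1. Split data into train and test sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# 2. Train multiple models.
models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
}

# 3. Predict on test data, then 4. compare predictions with actual labels.
correct_counts = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    correct_counts[name] = int((y_pred == y_test).sum())
    print(f"{name}: {correct_counts[name]} of {len(y_test)} correct")
```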

---

## 3. Accuracy (Simplest Classification Metric)

### Definition

**Accuracy** measures how many predictions are correct.

$$
\text{Accuracy} = \frac{\text{Number of Correct Predictions}}{\text{Total Predictions}}
$$

---

### Example (Manual)

Suppose we have 10 test samples:

| Actual | Predicted |
| ------ | --------- |
| 1      | 1         |
| 1      | 1         |
| 0      | 1 ❌       |
| 1      | 1         |
| 0      | 0         |
| 1      | 1         |
| 0      | 0         |
| 1      | 1         |
| 0      | 0         |
| 1      | 0 ❌       |

* Correct predictions = 8
* Total predictions = 10

$$
\text{Accuracy} = \frac{8}{10} = 0.8 = 80\%
$$
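
The manual calculation can be reproduced in a few lines. The sketch below copies the ten rows of the table into two arrays (the names `actual` and `predicted` are chosen here for illustration) and counts the matches.

```python
import numpy as np

# The ten (Actual, Predicted) pairs from the table, in row order.
actual = np.array([1, 1, 0, 1, 0, 1, 0, 1, 0, 1])
predicted = np.array([1, 1, 1, 1, 0, 1, 0, 1, 0, 0])

correct = int((actual == predicted).sum())  # rows where the two columns agree
total = len(actual)
accuracy = correct / total

print(correct, total, accuracy)  # 8 10 0.8
```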

---

## 4. Accuracy Using Python (Binary Case)

```python
from sklearn.metrics import accuracy_score

# y_test: true labels of the test split; y_pred: the model's predictions for it
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)
```

---

## 5. Accuracy for Multi-class Classification

The accuracy formula **does not change** for multi-class problems.

Example (Iris dataset):

* Classes: Setosa (0), Versicolor (1), Virginica (2)

$$
\text{Accuracy} = \frac{\text{Correct Predictions}}{\text{Total Predictions}}
$$

Same logic. Same formula.
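
As a small illustration, `accuracy_score` can be called on three-class labels exactly as in the binary case. The label lists below are made up for the example and are not real Iris predictions.

```python
from sklearn.metrics import accuracy_score

# Hypothetical labels: 0 = Setosa, 1 = Versicolor, 2 = Virginica
y_true = [0, 0, 1, 1, 2, 2, 2, 1, 0, 2]
y_pred = [0, 0, 1, 2, 2, 2, 1, 1, 0, 2]

# Same call as in the binary case: fraction of positions where labels match.
acc = accuracy_score(y_true, y_pred)
print(acc)  # 8 of the 10 labels match
```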

---

## 6. Why Accuracy Can Be Misleading ❌

Accuracy gives **only one number**.

It does **NOT** tell you:

* What type of mistakes are happening
* Which class is misclassified
* Whether mistakes are dangerous
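
To make this concrete, here is a minimal sketch with two hypothetical models (`pred_a`, `pred_b`; the labels are invented for illustration). Both reach the same accuracy, yet one only misses positives and the other only raises false alarms.

```python
from sklearn.metrics import accuracy_score, confusion_matrix

y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
# Hypothetical model A: both of its mistakes are missed positives.
pred_a = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
# Hypothetical model B: both of its mistakes are false alarms.
pred_b = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]

acc_a = accuracy_score(y_true, pred_a)
acc_b = accuracy_score(y_true, pred_b)
print(acc_a, acc_b)  # identical accuracy for both models

# The confusion matrices (rows: actual 0, 1; columns: predicted 0, 1)
# show that the two models fail in different ways.
cm_a = confusion_matrix(y_true, pred_a)
cm_b = confusion_matrix(y_true, pred_b)
print(cm_a)
print(cm_b)
```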

---

## 7. Confusion Matrix (Core Classification Tool)

A **Confusion Matrix** shows:

* Correct predictions
* Type of errors

For **binary classification**:

| Actual \ Predicted | 1  | 0  |
| ------------------ | -- | -- |
| **1**              | TP | FN |
| **0**              | FP | TN |
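
Note that the table lists class `1` first, whereas scikit-learn's `confusion_matrix` orders rows and columns by sorted label value by default (`0`, then `1`). A minimal sketch, using made-up labels, of how the `labels` argument can reproduce the table's layout:

```python
from sklearn.metrics import confusion_matrix

# Made-up labels with TP = 3, FN = 1, FP = 2, TN = 2.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 1, 0, 0]

# Default ordering (0 first): [[TN, FP], [FN, TP]]
default_cm = confusion_matrix(y_true, y_pred)

# Ordering used in the table (1 first): [[TP, FN], [FP, TN]]
table_cm = confusion_matrix(y_true, y_pred, labels=[1, 0])

print(default_cm)
print(table_cm)
```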

---

## 8. Terminology (Must Memorize)

### True Positive (TP)

* Predicted: 1
* Actual: 1

### True Negative (TN)

* Predicted: 0
* Actual: 0

### False Positive (FP): **Type-I Error**

* Predicted: 1
* Actual: 0

### False Negative (FN): **Type-II Error**

* Predicted: 0
* Actual: 1

---

### Memory Trick 🧠

* **Positive / Negative → what the model predicted**
* **True / False → whether that prediction matches the actual label**
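
The four definitions translate directly into boolean masks. The sketch below reuses the ten-sample example from Section 3; the variable names are chosen here for illustration.

```python
import numpy as np

actual = np.array([1, 1, 0, 1, 0, 1, 0, 1, 0, 1])
predicted = np.array([1, 1, 1, 1, 0, 1, 0, 1, 0, 0])

# Positive/Negative comes from `predicted`; True/False from agreement with `actual`.
tp = int(((predicted == 1) & (actual == 1)).sum())  # predicted 1, actually 1
tn = int(((predicted == 0) & (actual == 0)).sum())  # predicted 0, actually 0
fp = int(((predicted == 1) & (actual == 0)).sum())  # predicted 1, actually 0
fn = int(((predicted == 0) & (actual == 1)).sum())  # predicted 0, actually 1

print(tp, tn, fp, fn)  # 5 3 1 1
```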

---

## 9. Confusion Matrix in Python

```python
from sklearn.metrics import confusion_matrix

cm = confusion_matrix(y_test, y_pred)
print(cm)
```

Example output:

```
[[22  1]
 [ 6 26]]
```

Meaning (scikit-learn sorts the labels, so class `0` comes first and the layout is `[[TN, FP], [FN, TP]]`):

* TN = 22
* FP = 1
* FN = 6
* TP = 26
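
For a binary matrix in scikit-learn's default ordering, the four counts can be unpacked in one line with `ravel()`. The sketch below hard-codes the example output instead of calling a model.

```python
import numpy as np

# The example output above, in scikit-learn's default order [[TN, FP], [FN, TP]].
cm = np.array([[22, 1],
               [6, 26]])

# ravel() flattens row by row, giving TN, FP, FN, TP in that order.
tn, fp, fn, tp = cm.ravel()
print(tn, fp, fn, tp)  # 22 1 6 26
```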

---

## 10. Accuracy from Confusion Matrix

$$
\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}
$$
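
Applying this formula to the example counts from Section 9 (TN = 22, FP = 1, FN = 6, TP = 26) might look like the sketch below; `accuracy_from_counts` is a helper name made up for this illustration.

```python
def accuracy_from_counts(tp, tn, fp, fn):
    """Accuracy = correct predictions (TP + TN) over all predictions."""
    return (tp + tn) / (tp + tn + fp + fn)


# Counts taken from the example confusion matrix in Section 9.
acc = accuracy_from_counts(tp=26, tn=22, fp=1, fn=6)
print(round(acc, 4))  # 48 correct out of 55
```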

---

## 11. Type-I vs Type-II Errors

### Type-I Error (False Positive)

* Model predicts **Positive**, but actually **Negative**

Example:

* Predict cancer → patient is healthy

### Type-II Error (False Negative)

* Model predicts **Negative**, but actually **Positive**

Example:

* Predict healthy → patient has cancer ❌

---

## 12. Accuracy Fails on Imbalanced Datasets ⚠️

### Example: Airport Security

* Terrorists: 1
* Normal passengers: 9999

Model predicts **everyone as normal**.

Confusion Matrix:

| Actual \ Predicted | Terrorist | Normal |
| ------------------ | --------- | ------ |
| Terrorist          | 0         | 1      |
| Normal             | 0         | 9999   |

$$
\text{Accuracy} = \frac{9999}{10000} = 99.99\%
$$

The model is **useless** (it never flags the one terrorist), yet its accuracy is high.
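
The airport example can be checked numerically. In this sketch the label `1` stands for "terrorist" and `0` for "normal" (an encoding assumed here), and the "model" is simply an all-zeros prediction.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix

# 10,000 passengers, exactly one of whom is a positive case.
y_true = np.zeros(10_000, dtype=int)
y_true[0] = 1

# A "model" that predicts normal (0) for everyone.
y_pred = np.zeros(10_000, dtype=int)

acc = accuracy_score(y_true, y_pred)
# labels=[1, 0] puts the positive class first, matching the table above.
cm = confusion_matrix(y_true, y_pred, labels=[1, 0])

print(acc)  # 9999 of 10000 predictions are correct
print(cm)   # [[TP, FN], [FP, TN]]
```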

---

## 13. Key Takeaways

* Accuracy is **simple** and **fast**, but **dangerous** when used alone
* Confusion Matrix explains **where the model fails**
* Accuracy is misleading on **imbalanced datasets**
* Always inspect **FP and FN** before deploying models

---

## 14. What Comes Next?

To address the limitations of accuracy, we use:

* **Precision**
* **Recall**
* **F1-Score**
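
As a brief preview, all three can be computed from the confusion-matrix counts using their standard definitions: Precision = TP / (TP + FP), Recall = TP / (TP + FN), and F1 as the harmonic mean of the two. The helper `precision_recall_f1` below is a name invented for this sketch, and returning `0.0` when a denominator is zero is a convention assumed here.

```python
def precision_recall_f1(tp, fp, fn):
    """Precision, recall, and F1 for the positive class from raw counts."""
    # Assumption: a score with a zero denominator is reported as 0.0.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Counts from the example matrix in Section 9: TP = 26, FP = 1, FN = 6.
p, r, f1 = precision_recall_f1(tp=26, fp=1, fn=6)
print(p, r, f1)

# Airport example from Section 12: TP = 0, FP = 0, FN = 1.
# Accuracy was 99.99%, but all three of these scores are 0.
print(precision_recall_f1(tp=0, fp=0, fn=1))
```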

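---

## 15. Hands-On Code (`heart.csv`)

```python
import numpy as np
import pandas as pd

df = pd.read_csv('heart.csv')
df.head()
```

|   | age | sex | cp | trestbps | chol | fbs | restecg | thalach | exang | oldpeak | slope | ca | thal | target |
| - | --- | --- | -- | -------- | ---- | --- | ------- | ------- | ----- | ------- | ----- | -- | ---- | ------ |
| 0 | 63  | 1   | 3  | 145      | 233  | 1   | 0       | 150     | 0     | 2.3     | 0     | 0  | 1    | 1      |
| 1 | 37  | 1   | 2  | 130      | 250  | 0   | 1       | 187     | 0     | 3.5     | 0     | 0  | 2    | 1      |
| 2 | 41  | 0   | 1  | 130      | 204  | 0   | 0       | 172     | 0     | 1.4     | 2     | 0  | 2    | 1      |
| 3 | 56  | 1   | 1  | 120      | 236  | 0   | 1       | 178     | 0     | 0.8     | 2     | 0  | 2    | 1      |
| 4 | 57  | 0   | 0  | 120      | 354  | 0   | 1       | 163     | 1     | 0.6     | 2     | 0  | 2    | 1      |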


```python
from sklearn.model_selection import train_test_split

# Features: every column except the last; target: the last column
X_train, X_test, y_train, y_test = train_test_split(
    df.iloc[:, 0:-1], df.iloc[:, -1], test_size=0.2, random_state=2
)
```

```python
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

clf1 = LogisticRegression()
clf2 = DecisionTreeClassifier()

clf1.fit(X_train, y_train)
clf2.fit(X_train, y_train)
```

Fitting the logistic regression emits a convergence warning (output abridged):

```
STOP: TOTAL NO. OF ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
```

```python
from sklearn.metrics import accuracy_score, confusion_matrix

y_pred1 = clf1.predict(X_test)  # Logistic Regression predictions
y_pred2 = clf2.predict(X_test)  # Decision Tree predictions

print("Accuracy of Logistic Regression", accuracy_score(y_test, y_pred1))
print("Accuracy of Decision Trees", accuracy_score(y_test, y_pred2))
```

```
Accuracy of Logistic Regression 0.9016393442622951
Accuracy of Decision Trees 0.8360655737704918
```


```python
confusion_matrix(y_test, y_pred1)
```

```
array([[26,  6],
       [ 0, 29]])
```

```python
print("Logistic Regression Confusion Matrix\n")
pd.DataFrame(confusion_matrix(y_test, y_pred1), columns=list(range(0, 2)))
```

Logistic Regression Confusion Matrix

| Actual \ Predicted | 0  | 1  |
| ------------------ | -- | -- |
| **0**              | 26 | 6  |
| **1**              | 0  | 29 |

```python
print("Decision Tree Confusion Matrix\n")
pd.DataFrame(confusion_matrix(y_test, y_pred2), columns=list(range(0, 2)))
```

Decision Tree Confusion Matrix

| Actual \ Predicted | 0  | 1  |
| ------------------ | -- | -- |
| **0**              | 24 | 8  |
| **1**              | 2  | 27 |
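
The accuracies printed earlier can be recovered from these two matrices: the diagonal holds the correct predictions, so accuracy is the diagonal sum divided by the total. The sketch below hard-codes the matrices shown above; `accuracy_from_cm` is a helper name made up for this illustration.

```python
import numpy as np

# Confusion matrices copied from the outputs above (rows: actual, columns: predicted).
cm_lr = np.array([[26, 6], [0, 29]])   # Logistic Regression
cm_dt = np.array([[24, 8], [2, 27]])   # Decision Tree


def accuracy_from_cm(cm):
    """Correct predictions (the diagonal) divided by all predictions."""
    return np.trace(cm) / cm.sum()


acc_lr = accuracy_from_cm(cm_lr)  # 55 / 61
acc_dt = accuracy_from_cm(cm_dt)  # 51 / 61
print(acc_lr, acc_dt)
```

Because it relies only on the diagonal, the same helper also works for a multi-class confusion matrix.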


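```python
# Collect actual labels and both models' predictions side by side
result = pd.DataFrame()
result['Actual Label'] = y_test
result['Logistic Regression Prediction'] = y_pred1
result['Decision Tree Prediction'] = y_pred2

result.sample(10)
```

|     | Actual Label | Logistic Regression Prediction | Decision Tree Prediction |
| --- | ------------ | ------------------------------ | ------------------------ |
| 147 | 1            | 1                              | 1                        |
| 13  | 1            | 1                              | 0                        |
| 267 | 0            | 0                              | 1                        |
| 173 | 0            | 0                              | 1                        |
| 251 | 0            | 0                              | 0                        |
| 29  | 1            | 1                              | 1                        |
| 94  | 1            | 1                              | 1                        |
| 66  | 1            | 1                              | 0                        |
| 169 | 0            | 0                              | 0                        |
| 276 | 0            | 0                              | 0                        |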


```python
from sklearn.metrics import recall_score, precision_score, f1_score

print("For Logistic regression Model")
print("-" * 50)
cdf = pd.DataFrame(confusion_matrix(y_test, y_pred1), columns=list(range(0, 2)))
print(cdf)
print("-" * 50)
print("Precision - ", precision_score(y_test, y_pred1))
print("Recall - ", recall_score(y_test, y_pred1))
print("F1 score - ", f1_score(y_test, y_pred1))
```

```
For Logistic regression Model
--------------------------------------------------
    0   1
0  26   6
1   0  29
--------------------------------------------------
Precision -  0.8285714285714286
Recall -  1.0
F1 score -  0.90625
```

```python
print("For DT Model")
print("-" * 50)
cdf = pd.DataFrame(confusion_matrix(y_test, y_pred2), columns=list(range(0, 2)))
print(cdf)
print("-" * 50)
print("Precision - ", precision_score(y_test, y_pred2))
print("Recall - ", recall_score(y_test, y_pred2))
print("F1 score - ", f1_score(y_test, y_pred2))
```

```
For DT Model
--------------------------------------------------
    0   1
0  24   8
1   2  27
--------------------------------------------------
Precision -  0.7714285714285715
Recall -  0.9310344827586207
F1 score -  0.84375
```


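With `average=None`, scikit-learn returns one score per class (class `0` first, then class `1`) instead of a single score for the positive class:

```python
precision_score(y_test, y_pred1, average=None)
```

```
array([1.        , 0.82857143])
```

```python
precision_score(y_test, y_pred2, average=None)
```

```
array([0.92307692, 0.77142857])
```

```python
recall_score(y_test, y_pred2, average=None)
```

```
array([0.75      , 0.93103448])
```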
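
These per-class values can be traced back to the Decision Tree confusion matrix: per-class precision divides each diagonal entry by its column total (everything predicted as that class), and per-class recall divides it by its row total (everything that actually belongs to that class). A minimal sketch, with the matrix hard-coded from the output above:

```python
import numpy as np

# Decision Tree confusion matrix (rows: actual, columns: predicted).
cm_dt = np.array([[24, 8], [2, 27]])

correct_per_class = np.diag(cm_dt)

# Precision per class: correct / number of samples predicted as that class.
per_class_precision = correct_per_class / cm_dt.sum(axis=0)

# Recall per class: correct / number of samples actually in that class.
per_class_recall = correct_per_class / cm_dt.sum(axis=1)

print(per_class_precision)
print(per_class_recall)
```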