# Day 4 — Error Analysis on Classification Models
### Machine Learning Roadmap — Week 3
### Author — N Manish Kumar
---

In previous days, we evaluated and compared models using cross-validation.
However, overall accuracy does not tell us *where* the model is making mistakes.

Error analysis focuses on inspecting **misclassified examples** to understand:
- What types of cases the model fails on
- Whether errors follow certain patterns
- What kind of data or features could improve performance

In this notebook, we will:
- Train a classification model
- Predict on the test set
- Identify false positives and false negatives
- Analyze feature patterns in misclassified samples
- Discuss how the model could be improved

Dataset used: **Breast Cancer Dataset (sklearn)**
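
Because the terms "positive" and "negative" depend on how the labels are encoded, it may help to confirm the encoding before interpreting any errors. The following is a minimal check, assuming scikit-learn is installed; `label_map` and `class_counts` are illustrative names that are not used elsewhere in the notebook.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer

data = load_breast_cancer()

# target_names[i] is the class name that corresponds to label i
label_map = {i: str(name) for i, name in enumerate(data.target_names)}
print(label_map)  # {0: 'malignant', 1: 'benign'}

# Number of samples per label, in label order
class_counts = np.bincount(data.target)
print({label_map[i]: int(n) for i, n in enumerate(class_counts)})
```

In scikit-learn's copy of this dataset, label 1 is benign, so the "positive" class in the code below is the benign class unless the labels are flipped.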

---

## 1. Import Libraries, Load Dataset and Train-Test Split

In [1]:
import pandas as pd
import numpy as np

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.metrics import confusion_matrix

# Load dataset
data = load_breast_cancer()
X = pd.DataFrame(data.data, columns=data.feature_names)
y = pd.Series(data.target)

# Train-test split
X_train, X_test, y_train, y_test = train_test_split(
    X, y,
    test_size=0.2,
    random_state=42,
    stratify=y
)

print("Training set shape:", X_train.shape)
print("Test set shape:", X_test.shape)

Training set shape: (455, 30)
Test set shape: (114, 30)


---
## 2. Train Model and Generate Predictions

To analyze model errors, we first train a classification model on the
training data and generate predictions on the test set.

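We use a Pipeline with StandardScaler and Logistic Regression so that the
scaler is fitted on the training data only and the same scaling is reused
at prediction time. Standardized features also help the solver converge.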


In [2]:
# Build pipeline
model = Pipeline(steps=[
    ("scaler", StandardScaler()),
    ("model", LogisticRegression(solver="saga", max_iter=10000))
])

# Train model
model.fit(X_train, y_train)

# Predictions on test set
y_pred = model.predict(X_test)


### Interpretation

The model is now trained using the training data and has generated
predictions on the unseen test set.

We can now compare predicted labels with true labels to identify
which samples were classified correctly and which were misclassified.

---
## 3. Confusion Matrix and Error Types

A confusion matrix shows how many predictions fall into each category:

- True Positives (TP)
- True Negatives (TN)
- False Positives (FP)
- False Negatives (FN)

False Positives and False Negatives represent different types of mistakes
and often have different real-world consequences.

Analyzing these errors helps us understand which cases the model struggles with.

In [3]:
cm = confusion_matrix(y_test, y_pred)
cm

array([[41,  1],
       [ 1, 71]])

### Interpretation

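The confusion matrix shows how many samples were correctly and incorrectly
classified for each class. In scikit-learn's convention, rows are true labels
and columns are predicted labels, with classes in sorted order (0, then 1).

Reading the matrix above:

- TN = 41: true label 0, predicted 0
- FP = 1: true label 0, predicted 1
- FN = 1: true label 1, predicted 0
- TP = 71: true label 1, predicted 1

The model therefore makes one error of each type on the 114 test samples.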

In the next step, we will isolate these misclassified samples and inspect
their feature values.
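
As a small sketch of how the four counts can be pulled out of the matrix and turned into per-class rates, the block below copies the matrix values from the output above rather than retraining the model. The variable names are illustrative.

```python
import numpy as np

# Confusion matrix copied from the notebook output.
# scikit-learn convention: rows are true labels, columns are predicted labels.
cm = np.array([[41, 1],
               [1, 71]])

# For binary labels {0, 1}, ravel() yields TN, FP, FN, TP in that order
tn, fp, fn, tp = cm.ravel()

accuracy = (tp + tn) / cm.sum()
recall = tp / (tp + fn)          # share of true label-1 samples recovered
specificity = tn / (tn + fp)     # share of true label-0 samples recovered
precision = tp / (tp + fp)       # share of label-1 predictions that are right

print(f"TN={tn}, FP={fp}, FN={fn}, TP={tp}")
print(f"accuracy={accuracy:.3f}, recall={recall:.3f}, "
      f"specificity={specificity:.3f}, precision={precision:.3f}")
# accuracy=0.982, recall=0.986, specificity=0.976, precision=0.986
```

Reporting recall and specificity separately makes it visible when one class absorbs most of the errors, which a single accuracy figure would hide.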

---
## 4. Inspect Misclassified Samples

To understand model errors, we isolate the samples where predictions
do not match the true labels.

We separate:
- False Positives (predicted positive, actually negative)
- False Negatives (predicted negative, actually positive)

By examining their feature values, we can look for patterns that may
explain why the model struggled with these cases.

In [5]:
# Identify misclassified indices
misclassified_idx = np.where(y_test != y_pred)[0]

# True labels and predictions
y_test_np = y_test.values

# False Positives: predicted 1 (benign), actual 0 (malignant)
fp_idx = np.where((y_pred == 1) & (y_test_np == 0))[0]

# False Negatives: predicted 0 (malignant), actual 1 (benign)
fn_idx = np.where((y_pred == 0) & (y_test_np == 1))[0]

print("Number of False Positives:", len(fp_idx))
print("Number of False Negatives:", len(fn_idx))

# View a few samples
X_test_fp = X_test.iloc[fp_idx]
X_test_fn = X_test.iloc[fn_idx]

X_test_fp.head(), X_test_fn.head()

Number of False Positives: 1
Number of False Negatives: 1


(    mean radius  mean texture  mean perimeter  mean area  mean smoothness  \
 73         13.8         15.79           90.43      584.1           0.1007   
 
     mean compactness  mean concavity  mean concave points  mean symmetry  \
 73             0.128         0.07789              0.05069         0.1662   
 
     mean fractal dimension  ...  worst radius  worst texture  worst perimeter  \
 73                 0.06566  ...         16.57          20.86            110.3   
 
     worst area  worst smoothness  worst compactness  worst concavity  \
 73       812.4            0.1411             0.3542           0.2779   
 
     worst concave points  worst symmetry  worst fractal dimension  
 73                0.1383          0.2589                    0.103  
 
 [1 rows x 30 columns],
      mean radius  mean texture  mean perimeter  mean area  mean smoothness  \
 541        14.47         24.99           95.81      656.4          0.08837   
 
      ... (remaining columns truncated)

### Interpretation

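In scikit-learn's version of this dataset, label 0 is malignant and label 1
is benign, so the "positive" class in the code above is benign.

False Positive samples (predicted 1, actual 0) are therefore malignant tumors
that the model labeled as benign. These are missed cancers, which is usually
the more dangerous error in medical applications.

False Negative samples (predicted 0, actual 1) are benign tumors that the
model flagged as malignant, which could lead to unnecessary concern and
follow-up tests.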

Inspecting feature values of these samples can reveal whether they
look similar to the opposite class, making them difficult to classify.
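
An alternative to tracking positional indices is to tag every test sample with its outcome and filter on that tag. The sketch below uses hypothetical toy labels and illustrative index values; in the notebook, `y_true` would be `y_test` and `y_hat` would be `y_pred`.

```python
import numpy as np
import pandas as pd

# Hypothetical toy labels; the index mimics original dataset row labels
y_true = pd.Series([0, 1, 1, 0, 1, 0], index=[73, 541, 12, 255, 8, 90])
y_hat = np.array([1, 0, 1, 0, 1, 0])

# Map every (true, predicted) pair to a named outcome
t = y_true.to_numpy()
conditions = [
    (t == 1) & (y_hat == 1),
    (t == 0) & (y_hat == 0),
    (t == 0) & (y_hat == 1),
    (t == 1) & (y_hat == 0),
]
outcome = np.select(conditions, ["TP", "TN", "FP", "FN"], default="?")

errors = pd.DataFrame(
    {"true": t, "pred": y_hat, "outcome": outcome},
    index=y_true.index,
)

# Keep only misclassified rows; the index still identifies the original rows
print(errors[errors["outcome"].isin(["FP", "FN"])])
```

Keeping the original index makes it straightforward to look the same rows up again in `X_test` with `.loc`.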

---
## 5. Compare Feature Distributions for Errors vs Correct Predictions

To understand why the model is confused, we compare feature values of:

- Correctly classified samples
- Misclassified samples (FP and FN)

If misclassified samples have feature distributions similar to the opposite
class, it indicates overlapping regions where linear models may struggle.

We will compare simple summary statistics for selected important features.

In [6]:
# Indices of correctly classified samples
correct_idx = np.where(y_test_np == y_pred)[0]

X_test_correct = X_test.iloc[correct_idx]

# Compare mean feature values for a few important features
features_to_check = [
    "mean radius",
    "mean texture",
    "mean smoothness",
    "mean concavity"
]

comparison_df = pd.DataFrame({
    "Correct Mean": X_test_correct[features_to_check].mean(),
    "False Positive Mean": X_test_fp[features_to_check].mean(),
    "False Negative Mean": X_test_fn[features_to_check].mean()
})

comparison_df

| Feature | Correct Mean | False Positive Mean | False Negative Mean |
|---|---|---|---|
| mean radius | 14.371223 | 13.8 | 14.47 |
| mean texture | 19.441786 | 15.79 | 24.99 |
| mean smoothness | 0.097176 | 0.1007 | 0.08837 |
| mean concavity | 0.087188 | 0.07789 | 0.1009 |


### Interpretation

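Each error column here is based on a single sample (one false positive and
one false negative), and the Correct Mean column pools both classes. The
table is therefore a rough indication rather than a true comparison of
distributions. A stricter check would compare each error sample against the
per-class means.

If a misclassified sample's feature values are closer to the typical values
of the opposite class than to those of its own class, it suggests that the
sample lies near, or on the wrong side of, the decision boundary.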

This indicates that:
- The classes overlap in feature space
- A linear model may struggle to separate them perfectly

In such cases, using more expressive models or engineering better features
could help improve performance.
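
One way to make "closer to the opposite class" concrete is to express a misclassified sample as z-scores relative to each class. The sketch below is an illustration only: `class_zscores` is a hypothetical helper, and the toy feature values are invented, except for the sample, which reuses the mean radius and mean texture of the false positive row. In the notebook, `X_train`, `y_train` and `X_test_fp.iloc[0]` could be passed instead.

```python
import pandas as pd

def class_zscores(X, y, sample):
    """Express `sample` as z-scores relative to each class in (X, y)."""
    cols = {}
    for label in sorted(y.unique()):
        group = X[y == label]
        # Distance from the class mean, in units of the class std deviation
        cols[f"z vs class {label}"] = (sample - group.mean()) / group.std()
    return pd.DataFrame(cols)

# Invented toy data with two well-separated classes
X_toy = pd.DataFrame({
    "mean radius": [17.0, 19.0, 18.0, 12.0, 13.0, 11.0],
    "mean texture": [21.0, 23.0, 22.0, 17.0, 18.0, 19.0],
})
y_toy = pd.Series([0, 0, 0, 1, 1, 1])

# Two feature values taken from the false positive row (true label 0)
sample = pd.Series({"mean radius": 13.8, "mean texture": 15.79})

z = class_zscores(X_toy, y_toy, sample)
print(z.round(2))
```

In this toy setup, the sample sits several standard deviations from class 0 but much closer to class 1, which is the pattern one would expect for a class-0 sample that gets predicted as class 1.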

---
## 6. What Could Improve This Model?

After identifying where the model makes mistakes, the next step is to
decide how to improve performance systematically rather than by tuning
models at random.

Based on error patterns, possible improvement strategies include:
- Adding more informative features
- Using non-linear models
- Adjusting the classification threshold
- Collecting more training data in difficult regions

Error analysis helps prioritize which action is most likely to help.


In [8]:
# Get prediction probabilities
y_prob = model.predict_proba(X_test)[:, 1]

# Probabilities of misclassified samples
fp_probs = y_prob[fp_idx]
fn_probs = y_prob[fn_idx]

print("False Positive probabilities (first 5):", fp_probs[:5])
print("False Negative probabilities (first 5):", fn_probs[:5])

False Positive probabilities (first 5): [0.90864595]
False Negative probabilities (first 5): [0.3642837]


### Interpretation

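Here `y_prob` is the predicted probability of class 1. If misclassified
samples have probabilities close to 0.5, it indicates that the model was
uncertain and the samples lie near the decision boundary.

The two errors behave differently. The false negative (0.364) is moderately
close to 0.5, so it is a borderline case. The false positive (0.909) is a
confident mistake: the model placed it well inside the class-1 region, so a
small threshold change would not correct it.

This suggests that:
- Small changes in features or in the threshold could flip predictions for borderline cases
- For confident errors, better features or a more expressive model (e.g., tree-based) may help more than threshold tuning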

Error analysis therefore guides whether to:
- Improve features
- Change model family
- Tune thresholds depending on application needs
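
To illustrate the threshold option, the sketch below counts both error types at several thresholds. The labels and probabilities are hypothetical toy values, loosely including one confident error (0.91) and one borderline error (0.36); `error_counts` is an illustrative helper. In the notebook, `y_test_np` and `y_prob` would take their place.

```python
import numpy as np

# Hypothetical toy labels and predicted probabilities of class 1
y_true = np.array([0, 0, 0, 0, 1, 1, 1, 1])
p_class1 = np.array([0.05, 0.20, 0.45, 0.91, 0.36, 0.60, 0.85, 0.97])

def error_counts(y_true, p_class1, threshold):
    """Count FP and FN when class 1 is predicted for p_class1 >= threshold."""
    pred = (p_class1 >= threshold).astype(int)
    fp = int(np.sum((pred == 1) & (y_true == 0)))
    fn = int(np.sum((pred == 0) & (y_true == 1)))
    return fp, fn

for threshold in (0.3, 0.5, 0.7, 0.95):
    fp, fn = error_counts(y_true, p_class1, threshold)
    print(f"threshold={threshold:.2f}  FP={fp}  FN={fn}")
# threshold=0.30  FP=2  FN=0
# threshold=0.50  FP=1  FN=1
# threshold=0.70  FP=1  FN=2
# threshold=0.95  FP=0  FN=3
```

In this toy data, removing the confident false positive requires a threshold above 0.91, at the cost of three false negatives. Whether that trade is acceptable depends on which error type is more costly in the application.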

---
# Notebook Summary — Week 3 Day 4

In this notebook, we performed error analysis to understand where and why
a classification model makes mistakes instead of relying only on accuracy.

### What was done
- Trained a Logistic Regression model using a proper pipeline
- Generated predictions on the untouched test set
- Built a confusion matrix to identify error types
- Isolated false positives and false negatives
- Compared feature patterns of correct and incorrect predictions
- Analyzed prediction probabilities for misclassified samples

### Key Learnings
- Overall accuracy does not reveal specific failure cases
- Error analysis helps identify overlapping class regions
- False negatives and false positives have different real-world impacts
- Misclassified samples may lie near the decision boundary, but some are confident errors (here, one was predicted with probability 0.91)
- Improvement strategies should be guided by error patterns

### Final Outcome
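The model made only two errors on 114 test samples: one borderline case
(probability 0.36) and one confident mistake (probability 0.91). This
suggests that better features or more expressive models could improve
performance more effectively than blind tuning, although conclusions drawn
from so few errors are tentative.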