# 5) Pattern Recognition & Classification

### 5.1 Confusion Matrix and Metrics

**What is this?**
A table showing the performance of a binary classifier by comparing predictions to true labels.

**Confusion Matrix:**
```
                Predicted
              0 (Neg)  1 (Pos)
Actual  0     TN       FP
        1     FN       TP
```

**Key Metrics:**
- **Precision** = $\frac{TP}{TP+FP}$ = Of all positive predictions, how many were correct?
  - High precision = Few false alarms
- **Recall (Sensitivity)** = $\frac{TP}{TP+FN}$ = Of all actual positives, how many did we catch?
  - High recall = Few missed positives
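- **Accuracy** = $\frac{TP+TN}{TP+TN+FP+FN}$ = Overall correctness
  - Can be misleading on imbalanced data: if 99% of samples are negative, always predicting 0 already scores 99%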
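A small worked example turns the four counts into the three metrics. The labels are made up purely to illustrate the arithmetic:

```python
import numpy as np

# Hypothetical toy labels, chosen only to illustrate the arithmetic
y_true = np.array([1, 1, 1, 0, 0, 0, 0, 0])
y_pred = np.array([1, 1, 0, 1, 0, 0, 0, 0])

TP = int(np.sum((y_true == 1) & (y_pred == 1)))  # positives caught
FN = int(np.sum((y_true == 1) & (y_pred == 0)))  # positives missed
FP = int(np.sum((y_true == 0) & (y_pred == 1)))  # false alarms
TN = int(np.sum((y_true == 0) & (y_pred == 0)))  # negatives correctly rejected

precision = TP / (TP + FP)
recall = TP / (TP + FN)
accuracy = (TP + TN) / (TP + TN + FP + FN)
print(TP, FN, FP, TN, round(precision, 3), round(recall, 3), accuracy)
```

Here TP = 2, FN = 1, FP = 1 and TN = 4, so precision and recall are both 2/3 and accuracy is 6/8 = 0.75.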

![Confusion matrix](img/confusion_matrix.png)

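**Trade-offs:**
- Moving the decision threshold trades precision against recall: raising it usually increases precision and lowers recall, and lowering it does the opposite
- Spam filter: favor high precision, so few legitimate emails are marked as spam
- Disease screening: favor high recall, so few sick patients are missed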
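To see the trade-off concretely, the sketch below thresholds one set of scores at three values. The scores and labels are hypothetical, and `precision_recall_at` is an illustrative helper, not part of the notebook:

```python
import numpy as np

# Hypothetical scores and labels, for illustration only
y_true = np.array([1, 1, 1, 1, 0, 0, 0, 0, 0, 0])
scores = np.array([0.95, 0.80, 0.55, 0.30, 0.70, 0.40, 0.35, 0.20, 0.10, 0.05])

def precision_recall_at(threshold):
    """Precision and recall when predicting 1 for every score >= threshold."""
    y_pred = (scores >= threshold).astype(int)
    tp = np.sum((y_true == 1) & (y_pred == 1))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    precision = tp / (tp + fp) if tp + fp > 0 else 0.0
    recall = tp / (tp + fn) if tp + fn > 0 else 0.0
    return float(precision), float(recall)

for t in (0.25, 0.5, 0.75):
    print(t, precision_recall_at(t))
```

Raising the threshold from 0.25 to 0.75 moves precision from 4/7 to 1.0 while recall falls from 1.0 to 0.5.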

```python
import numpy as np

def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP, FN for binary labels encoded as 0/1."""
    y_true = np.asarray(y_true, dtype=int)
    y_pred = np.asarray(y_pred, dtype=int)
    TP = int(np.sum((y_true == 1) & (y_pred == 1)))
    TN = int(np.sum((y_true == 0) & (y_pred == 0)))
    FP = int(np.sum((y_true == 0) & (y_pred == 1)))
    FN = int(np.sum((y_true == 1) & (y_pred == 0)))
    return {"TP": TP, "TN": TN, "FP": FP, "FN": FN}

def precision_recall_accuracy(counts):
    """Return (precision, recall, accuracy); a ratio with a zero denominator is reported as 0.0."""
    TP, TN, FP, FN = counts["TP"], counts["TN"], counts["FP"], counts["FN"]
    precision = TP / (TP + FP) if (TP + FP) > 0 else 0.0
    recall = TP / (TP + FN) if (TP + FN) > 0 else 0.0
    total = TP + TN + FP + FN
    accuracy = (TP + TN) / total if total > 0 else 0.0
    return precision, recall, accuracy
```


### F1 Score

**What is this?**
The harmonic mean of precision and recall, providing a balanced measure of classifier performance.

**Formula:** $F_1 = 2 \cdot \frac{\text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}$

**Alternative form:** $F_1 = \frac{2 \cdot TP}{2 \cdot TP + FP + FN}$

**Why harmonic mean?**
- Pulled toward the smaller of the two values, unlike the arithmetic mean
- If either precision or recall is low, F1 is low
- Example: Precision = 1.0, Recall = 0.1 → F1 ≈ 0.18 (the arithmetic mean would be 0.55)
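The example can be checked with both forms of the formula. The counts TP = 1, FP = 0, FN = 9 are hypothetical, picked because they give precision 1.0 and recall 0.1:

```python
def f1_from_pr(precision, recall):
    """Harmonic mean of precision and recall (0 if both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

def f1_from_counts(tp, fp, fn):
    """Equivalent count-based form: 2TP / (2TP + FP + FN)."""
    return 2 * tp / (2 * tp + fp + fn)

# Precision = 1.0, recall = 0.1
print(round(f1_from_pr(1.0, 0.1), 4))   # harmonic mean
print((1.0 + 0.1) / 2)                  # arithmetic mean, for comparison

# Hypothetical counts with the same precision and recall
print(round(f1_from_counts(1, 0, 9), 4))
```

Both forms give 2/11 ≈ 0.18, far below the arithmetic mean of 0.55.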

**When to use:**
- Imbalanced datasets
- Need balance between precision and recall
- Single metric for model comparison

**Variants:**
- **F-beta score**: $F_\beta = (1+\beta^2) \cdot \frac{\text{Precision} \cdot \text{Recall}}{\beta^2 \cdot \text{Precision} + \text{Recall}}$
  - β > 1: Favor recall
  - β < 1: Favor precision
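A quick numeric check of how β shifts the score, assuming a hypothetical classifier with precision 0.8 and recall 0.4 (precise, but it misses many positives):

```python
def fbeta_from_pr(precision, recall, beta):
    """F-beta from precision and recall; returns 0.0 when the denominator is 0."""
    b2 = beta ** 2
    denom = b2 * precision + recall
    return 0.0 if denom == 0 else (1 + b2) * precision * recall / denom

p, r = 0.8, 0.4
for beta in (0.5, 1.0, 2.0):
    print(beta, round(fbeta_from_pr(p, r, beta), 4))
```

Because recall is the weak side here, $F_2$ (≈ 0.444) drops below $F_1$ (≈ 0.533), while $F_{0.5}$ (≈ 0.667) rises above it.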

```python
def f1_score(y_true, y_pred):
    """
    Compute F1 score for binary classification.
    
    Parameters:
    -----------
    y_true : array-like
        True binary labels
    y_pred : array-like
        Predicted binary labels
    
    Returns:
    --------
    f1 : float
        F1 score
    """
    counts = confusion_counts(y_true, y_pred)
    precision, recall, _ = precision_recall_accuracy(counts)
    
    if precision + recall == 0:
        return 0.0
    
    f1 = 2 * precision * recall / (precision + recall)
    return float(f1)

def fbeta_score(y_true, y_pred, beta=1.0):
    """
    Compute F-beta score.
    
    Parameters:
    -----------
    y_true : array-like
        True binary labels
    y_pred : array-like
        Predicted binary labels
    beta : float, default=1.0
        Weight of recall vs precision
        beta > 1: favor recall
        beta < 1: favor precision
    
    Returns:
    --------
    fbeta : float
        F-beta score
    """
    counts = confusion_counts(y_true, y_pred)
    precision, recall, _ = precision_recall_accuracy(counts)
    
    if (beta**2 * precision + recall) == 0:
        return 0.0
    
    fbeta = (1 + beta**2) * precision * recall / (beta**2 * precision + recall)
    return float(fbeta)
```

### ROC Curve and AUC

**What is this?**
ROC (Receiver Operating Characteristic) curve visualizes classifier performance across all threshold values.

**How it works:**
1. For each threshold, compute TPR and FPR:
   - **TPR (True Positive Rate / Recall)**: $\frac{TP}{TP+FN}$ (y-axis)
   - **FPR (False Positive Rate)**: $\frac{FP}{FP+TN}$ (x-axis)
2. Plot (FPR, TPR) points for all thresholds
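3. Connect the points to form the ROC curve, which starts at (0, 0) (threshold above every score, nothing predicted positive) and ends at (1, 1) (threshold at the lowest score, everything predicted positive)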
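Looping over thresholds, as described above, rescans the data once per threshold. A common alternative sorts the scores once and reads TP and FP counts off cumulative sums. The sketch below assumes both classes are present. Runs of tied scores are collapsed into a single point, so the curve moves diagonally across ties. `roc_points_sorted` is a hypothetical helper name:

```python
import numpy as np

def roc_points_sorted(y_true, y_scores):
    """Hypothetical helper: ROC points from one sort plus cumulative sums.

    Assumes y_true contains both classes (0 and 1).
    """
    y_true = np.asarray(y_true, dtype=int)
    y_scores = np.asarray(y_scores, dtype=float)

    # Sort by descending score (mergesort is stable; ties are collapsed below anyway)
    order = np.argsort(-y_scores, kind="mergesort")
    y_sorted = y_true[order]
    s_sorted = y_scores[order]

    # TP/FP counts if every sample up to position i is predicted positive
    tp = np.cumsum(y_sorted == 1)
    fp = np.cumsum(y_sorted == 0)

    # Keep only the last position of each run of tied scores
    last = np.append(np.diff(s_sorted) != 0, True)

    n_pos, n_neg = tp[-1], fp[-1]
    tpr = np.concatenate(([0.0], tp[last] / n_pos))
    fpr = np.concatenate(([0.0], fp[last] / n_neg))
    thresholds = np.concatenate(([np.inf], s_sorted[last]))
    return fpr, tpr, thresholds

fpr, tpr, thresholds = roc_points_sorted([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print(fpr, tpr, thresholds)
```

The leading `np.inf` threshold supplies the (0, 0) starting point, and the lowest score supplies (1, 1).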

**AUC (Area Under Curve):**
- AUC = 1.0: Perfect classifier
- AUC = 0.5: Random classifier (diagonal line)
- AUC < 0.5: Worse than random; inverting the scores turns it into 1 − AUC > 0.5

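**Interpretation:**
- Closer to the top-left corner = better classifier
- AUC summarizes performance across all thresholds in a single number
- Threshold-independent metric: it equals the probability that a randomly chosen positive receives a higher score than a randomly chosen negative (ties count as 1/2)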
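The ranking interpretation gives a second way to compute AUC. Both routes agree on a small example: one compares every positive with every negative, the other applies the trapezoidal rule to ROC points worked out by hand for these scores. The helper names are illustrative, and the pairwise version uses O(P·N) memory, so it is only meant for small inputs:

```python
import numpy as np

def auc_by_ranking(y_true, y_scores):
    """AUC as P(score of random positive > score of random negative), ties count 1/2."""
    y_true = np.asarray(y_true, dtype=int)
    y_scores = np.asarray(y_scores, dtype=float)
    pos = y_scores[y_true == 1]
    neg = y_scores[y_true == 0]
    # Compare every positive with every negative via broadcasting
    diff = pos[:, None] - neg[None, :]
    return float(np.mean((diff > 0) + 0.5 * (diff == 0)))

def auc_by_trapezoid(fpr, tpr):
    """Trapezoidal area under ROC points already ordered from (0, 0) to (1, 1)."""
    fpr = np.asarray(fpr, dtype=float)
    tpr = np.asarray(tpr, dtype=float)
    return float(np.sum(np.diff(fpr) * (tpr[:-1] + tpr[1:]) / 2.0))

y = [0, 0, 1, 1]
s = [0.1, 0.4, 0.35, 0.8]
# ROC points for these scores, from threshold = +inf down to the lowest score
fpr = [0.0, 0.0, 0.5, 0.5, 1.0]
tpr = [0.0, 0.5, 0.5, 1.0, 1.0]
print(auc_by_ranking(y, s), auc_by_trapezoid(fpr, tpr))
```

Three of the four positive/negative pairs are ranked correctly, so both methods give 0.75.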

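**When to use:** Comparing classifiers, or when misclassification costs (and hence the operating threshold) are unknown. ROC curves do not depend on class prevalence; under heavy class imbalance, a precision-recall curve is often more informative.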

```python
def compute_roc_curve(y_true, y_scores):
    """
    Compute ROC curve points.
    
    Parameters:
    -----------
    y_true : array-like
        True binary labels (0 or 1)
    y_scores : array-like
        Predicted scores/probabilities (higher = more likely positive)
    
    Returns:
    --------
    fpr : array
        False positive rates
    tpr : array
        True positive rates
    thresholds : array
        Thresholds used; the first is +inf, which predicts nothing as
        positive and anchors the curve at (0, 0)
    """
    y_true = np.asarray(y_true, int)
    y_scores = np.asarray(y_scores, float)
    
    # Unique scores in descending order, preceded by +inf so the curve starts at (0, 0)
    thresholds = np.concatenate(([np.inf], np.unique(y_scores)[::-1]))
    
    fpr_list = []
    tpr_list = []
    
    for threshold in thresholds:
        y_pred = (y_scores >= threshold).astype(int)
        counts = confusion_counts(y_true, y_pred)
        
        # TPR = TP / (TP + FN)
        pos = counts["TP"] + counts["FN"]
        tpr = counts["TP"] / pos if pos > 0 else 0.0
        # FPR = FP / (FP + TN)
        neg = counts["FP"] + counts["TN"]
        fpr = counts["FP"] / neg if neg > 0 else 0.0
        
        fpr_list.append(fpr)
        tpr_list.append(tpr)
    
    return np.array(fpr_list), np.array(tpr_list), thresholds

def compute_auc(fpr, tpr):
    """Compute Area Under ROC Curve using the trapezoidal rule."""
    fpr = np.asarray(fpr, float)
    tpr = np.asarray(tpr, float)
    # Sort by fpr, breaking ties by tpr, so vertical segments are traversed bottom-up
    indices = np.lexsort((tpr, fpr))
    fpr_sorted = fpr[indices]
    tpr_sorted = tpr[indices]
    
    # Trapezoidal integration, written out because np.trapz is deprecated in NumPy 2.x
    auc = float(np.sum(np.diff(fpr_sorted) * (tpr_sorted[:-1] + tpr_sorted[1:]) / 2.0))
    return auc
```

### 5.2 Thresholding Score into Labels

**What is this?**
Converts continuous scores (e.g., probabilities, confidence values) into binary class labels.

**Algorithm:** 
- If score ≥ threshold → Predict class 1 (positive)
- If score < threshold → Predict class 0 (negative)

**Threshold selection:**
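- **threshold = 0.5** (probabilities): The usual default; with calibrated probabilities and equal FP/FN costs it minimizes the expected error rate. On the logit or decision-function scale the equivalent threshold is 0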
- **Lower threshold**: More positive predictions (↑ recall, ↓ precision)
- **Higher threshold**: Fewer positive predictions (↓ recall, ↑ precision)
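The sigmoid is monotonic and σ(0) = 0.5. As a result, thresholding probabilities at 0.5 gives exactly the same labels as thresholding the raw logit at 0, which is why a 0.0 default makes sense for signed scores. The logits below are hypothetical:

```python
import numpy as np

# Hypothetical logits (decision-function values w^T x + b)
z = np.array([-3.0, -0.2, 0.0, 0.4, 2.5])
p = 1.0 / (1.0 + np.exp(-z))           # sigmoid -> probabilities

labels_prob = (p >= 0.5).astype(int)    # threshold probabilities at 0.5
labels_logit = (z >= 0.0).astype(int)   # threshold raw scores at 0.0
labels_low = (p >= 0.3).astype(int)     # lower threshold -> more positives

print(labels_prob, labels_logit, labels_low)
```

Lowering the probability threshold to 0.3 flips the sample with logit −0.2 (p ≈ 0.45) to the positive class.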

**Use cases:**
- After logistic regression (threshold probability)
- After SVM (threshold decision function)
- Tune threshold based on business costs

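```python
def score_to_label(scores, threshold=0.0):
    """Return 1 where score >= threshold, else 0.

    The default 0.0 suits signed scores such as an SVM decision function or a
    logit; for probabilities, pass threshold=0.5.
    """
    scores = np.asarray(scores, float)
    return (scores >= threshold).astype(int)
```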


### 5.3 Cost-Sensitive Decisions

**What is this?**
Making classification decisions when different types of errors have different costs.

**Cost Components:**
- `c_fp`: Cost of False Positive (e.g., unnecessary treatment)
- `c_fn`: Cost of False Negative (e.g., missed disease)
- `c_tp`: Cost of True Positive (usually 0)
- `c_tn`: Cost of True Negative (usually 0)

**Total Cost:** $\text{Cost} = c_{FP} \cdot FP + c_{FN} \cdot FN + c_{TP} \cdot TP + c_{TN} \cdot TN$
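The cost formula can rank two thresholds differently from a plain error count. The confusion counts below are hypothetical, not real results. They describe two thresholds on the same 100 samples, 10 of them positive:

```python
def total_cost(tp, tn, fp, fn, c_fp=1.0, c_fn=1.0, c_tp=0.0, c_tn=0.0):
    """Cost = c_FP*FP + c_FN*FN + c_TP*TP + c_TN*TN."""
    return c_fp * fp + c_fn * fn + c_tp * tp + c_tn * tn

# Hypothetical confusion counts for two thresholds on the same 100 samples
high_t = dict(tp=6, tn=88, fp=2, fn=4)   # 6 errors
low_t = dict(tp=10, tn=75, fp=15, fn=0)  # 15 errors

# A false negative is 100x as costly as a false positive
print(total_cost(**high_t, c_fp=1, c_fn=100))
print(total_cost(**low_t, c_fp=1, c_fn=100))
```

The lower threshold makes more than twice as many errors, yet costs 15 instead of 402 because it misses no positives.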

**Threshold Sweeping:**
Try many threshold values and choose the one that minimizes total cost.

**Example:**
- Medical test: FN (missed disease) might cost 100x more than FP (unnecessary follow-up)
- Lower threshold to catch more cases (↑ recall) even if more false alarms
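If the scores are *calibrated* probabilities $p = P(y=1 \mid x)$, the cost-minimizing threshold has a closed form. Predict 1 whenever the expected cost of predicting 1 does not exceed that of predicting 0. The sketch assumes calibration, $c_{FN} > c_{TP}$ and $c_{FP} > c_{TN}$. With uncalibrated scores, sweeping thresholds on held-out data (as `sweep_thresholds` does) remains the safer check. `bayes_threshold` is a hypothetical helper name:

```python
def bayes_threshold(c_fp, c_fn, c_tp=0.0, c_tn=0.0):
    """Probability threshold minimizing expected cost, assuming calibrated p.

    Predict 1 when  p*c_tp + (1-p)*c_fp <= p*c_fn + (1-p)*c_tn,
    which rearranges to  p >= (c_fp - c_tn) / ((c_fp - c_tn) + (c_fn - c_tp)).
    """
    return (c_fp - c_tn) / ((c_fp - c_tn) + (c_fn - c_tp))

print(bayes_threshold(c_fp=1, c_fn=1))     # equal costs
print(bayes_threshold(c_fp=1, c_fn=100))   # FN 100x as costly as FP
```

Equal costs recover the familiar 0.5. With a false negative 100 times as costly, the threshold drops to 1/101 ≈ 0.0099, matching the intuition of lowering the threshold to catch more cases.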

```python
def total_cost_from_counts(counts, c_fp=1.0, c_fn=1.0, c_tp=0.0, c_tn=0.0):
    """Total cost = c_fp*FP + c_fn*FN + c_tp*TP + c_tn*TN."""
    return (counts["FP"]*c_fp + counts["FN"]*c_fn + counts["TP"]*c_tp + counts["TN"]*c_tn)

def sweep_thresholds(y_true, scores, thresholds, c_fp=1.0, c_fn=1.0, c_tp=0.0, c_tn=0.0):
    """Return the row (threshold, cost, counts) with the lowest total cost.

    On ties, the first threshold encountered in `thresholds` is kept.
    """
    y_true = np.asarray(y_true, int)
    scores = np.asarray(scores, float)
    best = None
    for t in thresholds:
        y_pred = (scores >= t).astype(int)
        counts = confusion_counts(y_true, y_pred)
        cost = total_cost_from_counts(counts, c_fp=c_fp, c_fn=c_fn, c_tp=c_tp, c_tn=c_tn)
        row = {"threshold": float(t), "cost": float(cost), **counts}
        if best is None or row["cost"] < best["cost"]:
            best = row
    return best
```


### 5.4 Logistic Regression (from scratch)

**What is this?**
A linear model for binary classification that predicts probabilities using the sigmoid function.

**Model:** $P(y=1|x) = \sigma(w^T x + b) = \frac{1}{1 + e^{-(w^T x + b)}}$

**How it works:**
1. **Predict probability:** Apply sigmoid to linear combination of features
2. **Loss function:** Negative log-likelihood (cross-entropy)
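   - $L = -\sum_{i=1}^{n} \left[ y_i \log p_i + (1-y_i)\log(1-p_i) \right]$, where $p_i = \sigma(w^T x_i + b)$
3. **Training:** Minimize the loss with a numerical optimizer (e.g., gradient descent, or conjugate gradient (CG))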
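Differentiating this loss gives a compact gradient: $\nabla_w L = \sum_i (p_i - y_i)\,x_i = X^T(p - y)$ and $\partial L / \partial b = \sum_i (p_i - y_i)$. Gradient-based optimizers can use it directly. The sketch below checks that formula against central finite differences; the helper names and the random data are illustrative only:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def nll(w, b, X, y):
    """Negative log-likelihood of logistic regression."""
    p = sigmoid(X @ w + b)
    return -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

def nll_grad(w, b, X, y):
    """Analytic gradient: X^T (p - y) for w, sum(p - y) for b."""
    r = sigmoid(X @ w + b) - y       # residuals p_i - y_i
    return X.T @ r, np.sum(r)

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
y = (rng.random(20) < 0.5).astype(float)
w = rng.normal(size=3)
b = 0.3

gw, gb = nll_grad(w, b, X, y)

# Central finite differences along each coordinate direction
h = 1e-6
num_gw = np.array([(nll(w + h * e, b, X, y) - nll(w - h * e, b, X, y)) / (2 * h)
                   for e in np.eye(3)])
num_gb = (nll(w, b + h, X, y) - nll(w, b - h, X, y)) / (2 * h)
print(np.max(np.abs(gw - num_gw)), abs(gb - num_gb))
```

Both printed discrepancies should be tiny, which is what justifies passing the analytic gradient to an optimizer instead of letting it approximate one numerically.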

**Output:**
- `w`: Feature weights (coefficients)
- `b`: Bias (intercept)
- Higher probability → More confident in class 1

**Advantages:**
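- Outputs probabilities, which are often reasonably well calibrated (not guaranteed, e.g. under heavy regularization or a misspecified model)
- Interpretable coefficients
- Works well when the classes are roughly separable by a linear boundary; if they are *perfectly* separable, the unregularized maximum-likelihood weights grow without bound, so add a penalty such as L2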
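On perfectly separable data the unregularized likelihood keeps improving as $|w| \to \infty$, so there is no finite optimum. The sketch below shows this with plain batch gradient descent, a deliberately simple optimizer chosen for transparency, on four hypothetical 1-D points. Adding an L2 penalty $\frac{\lambda}{2} w^2$ makes the weight converge:

```python
import numpy as np

def gd_logistic_1d(x, y, steps, lr=0.1, lam=0.0):
    """Batch gradient descent on the 1-D logistic NLL plus an optional (lam/2)*w**2 penalty."""
    w, b = 0.0, 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(w * x + b)))
        grad_w = np.sum((p - y) * x) + lam * w
        grad_b = np.sum(p - y)
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Perfectly separable toy data: every x < 0 is class 0, every x > 0 is class 1
x = np.array([-2.0, -1.0, 1.0, 2.0])
y = np.array([0.0, 0.0, 1.0, 1.0])

for steps in (1_000, 10_000):
    w_plain, _ = gd_logistic_1d(x, y, steps)
    w_l2, _ = gd_logistic_1d(x, y, steps, lam=1.0)
    print(steps, round(w_plain, 3), round(w_l2, 3))
```

The unpenalized weight keeps growing (roughly logarithmically in the number of steps), while the penalized weight settles at about 1.0 and stays there.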

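```python
import numpy as np
from scipy import optimize
from scipy.special import expit

def logistic_predict_proba(X, w, b):
    X = np.asarray(X, float)
    z = X @ w + b
    # expit(z) = 1 / (1 + exp(-z)), evaluated without overflow warnings for large |z|
    return expit(z)

def logistic_neg_loglik(params, X, y, eps=1e-12):
    """Negative log-likelihood; params = [w_1, ..., w_d, b]."""
    X = np.asarray(X, float)
    y = np.asarray(y, float)
    w = params[:-1]
    b = params[-1]
    p = logistic_predict_proba(X, w, b)
    p = np.clip(p, eps, 1-eps)  # avoid log(0)
    return float(np.sum(-(y*np.log(p) + (1-y)*np.log(1-p))))

def logistic_neg_loglik_grad(params, X, y):
    """Gradient of the (unclipped) NLL: X^T (p - y) for w, sum(p - y) for b."""
    X = np.asarray(X, float)
    y = np.asarray(y, float)
    r = logistic_predict_proba(X, params[:-1], params[-1]) - y
    return np.append(X.T @ r, np.sum(r))

def fit_logistic(X, y):
    X = np.asarray(X, float)
    y = np.asarray(y, float)
    init = np.zeros(X.shape[1] + 1)
    # Supplying the analytic gradient avoids finite-difference approximations inside CG
    res = optimize.minimize(logistic_neg_loglik, init, args=(X, y),
                            jac=logistic_neg_loglik_grad, method="CG")
    w = res.x[:-1]
    b = res.x[-1]
    return w, b, res
```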
