## Logistic Regression
Estimate $\Pr(y=1 \mid \mathbf{x})$ for binary labels $y\in\{0,1\}$.

<br>

<img src="visualizations/logistic_regression.png" width="600">
<img src="visualizations/logistic_regression2.png" width="600">


* **Hypothesis**

  $$
    h_\theta(\mathbf{x}) = \sigma(\theta^\top \mathbf{x})
    = \frac{1}{1 + e^{-\theta^\top \mathbf{x}}}.
  $$

* **Decision Rule**
  Predict 1 if $h_\theta(\mathbf{x}) \ge 0.5$; else 0.

* **Loss Function**
  Cross‐entropy (negative log-likelihood):

  $$
    J(\theta)
    = -\frac{1}{m}\sum_{i=1}^m \bigl[y^{(i)}\log h_\theta(\mathbf{x}^{(i)}) + (1-y^{(i)})\log(1 - h_\theta(\mathbf{x}^{(i)}))\bigr].
  $$

* **Optimization**
  Gradient descent, or a faster optimizer such as L-BFGS, using the partial derivatives of the loss:

  $$
    \frac{\partial J(\theta)}{\partial \theta_j}
    = \frac{1}{m}\sum_{i=1}^m \bigl(h_\theta(\mathbf{x}^{(i)}) - y^{(i)}\bigr)\,x_j^{(i)}.
  $$
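
The four pieces above (hypothesis, decision rule, loss, gradient) can be sketched in a few lines of NumPy. This is a minimal illustration rather than the notebook's implementation: the function names, the toy data, the learning rate, and the step count are all assumptions chosen for the example.

```python
import numpy as np

def sigmoid(z):
    """Logistic function sigma(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

def cost_and_grad(theta, X, y):
    """Cross-entropy J(theta) and its gradient for X of shape (m, n), y in {0, 1}."""
    m = X.shape[0]
    h = sigmoid(X @ theta)
    eps = 1e-12  # guards against log(0)
    J = -np.mean(y * np.log(h + eps) + (1 - y) * np.log(1 - h + eps))
    grad = X.T @ (h - y) / m
    return J, grad

def fit(X, y, lr=0.5, steps=2000):
    """Plain batch gradient descent starting from theta = 0."""
    theta = np.zeros(X.shape[1])
    for _ in range(steps):
        _, grad = cost_and_grad(theta, X, y)
        theta -= lr * grad
    return theta

def predict(theta, X):
    """Decision rule: predict 1 when h_theta(x) >= 0.5."""
    return (sigmoid(X @ theta) >= 0.5).astype(int)

# Made-up 1-D problem; the leading column of ones plays the role of the intercept.
X = np.array([[1.0, -2.0], [1.0, -1.0], [1.0, 1.0], [1.0, 2.0]])
y = np.array([0, 0, 1, 1])

J0, g0 = cost_and_grad(np.zeros(2), X, y)  # at theta = 0 every h is 0.5, so J = log 2
theta = fit(X, y)
preds = predict(theta, X)
```

At $\theta = 0$ the model outputs 0.5 for every sample, so the loss equals $\log 2$ regardless of the data, which is a handy sanity check for any implementation.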

---

### ℓ₂ Regularization (Ridge Penalty)

Adds $\tfrac{\lambda}{2m}\sum_{j=1}^n \theta_j^2$ to $J(\theta)$, which:

* **Shrinks** weights toward zero (but not exactly to zero).
* **Reduces variance**, helping prevent overfitting.
* Keeps the overall cost convex and easy to optimize.
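
To make the shrinkage effect concrete, here is a small sketch of the penalty term and its gradient. It assumes `theta[0]` is an intercept that is left unpenalized (consistent with the sum starting at $j=1$); the helper name `l2_penalty` and the numbers are made up for illustration.

```python
import numpy as np

def l2_penalty(theta, lam, m):
    """Return (lam / 2m) * sum_{j>=1} theta_j^2 and its gradient.

    Assumption: theta[0] is the intercept and is excluded from the penalty.
    """
    mask = np.ones_like(theta)
    mask[0] = 0.0
    penalty = lam / (2.0 * m) * np.sum((mask * theta) ** 2)
    grad = (lam / m) * mask * theta
    return penalty, grad

theta = np.array([0.5, 3.0, -4.0])
penalty, grad = l2_penalty(theta, lam=2.0, m=10)

# One gradient step on the penalty alone rescales each penalized weight
# by (1 - lr * lam / m): it moves toward zero but does not land on it.
lr = 0.1
shrunk = theta - lr * grad
```

The penalty's gradient is proportional to the weight itself, so large weights are pulled back harder than small ones, and the intercept is untouched.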

---

### Softmax (Multinomial Logistic)

<p align="center">
<img src="visualizations/logistic_regression_softmax.png" width="600">
</p>


For $K$-class problems, generalize with parameters $\{\theta_k\}$:

$$
  \Pr(y=k\mid \mathbf{x})
  = \frac{\exp(\theta_k^\top \mathbf{x})}{\sum_{j=1}^K \exp(\theta_j^\top \mathbf{x})}.
$$

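Use the multiclass cross-entropy loss and optimize all $\theta_k$ jointly. Because the $K$ outputs are normalized to sum to one, every input receives a single consistent probability distribution over the classes, which avoids the ambiguous regions that can arise when independent one-vs-rest classifiers disagree.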
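
Two properties of the softmax formula are easy to check numerically: each row of probabilities sums to one, and for $K=2$ the result equals a sigmoid applied to the difference of the two logits. The sketch below uses made-up inputs, and the names `Theta` and `cross_entropy` are illustrative assumptions.

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax; subtracting the row max avoids overflow in exp."""
    shifted = logits - np.max(logits, axis=1, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=1, keepdims=True)

def cross_entropy(P, y):
    """Mean multiclass cross-entropy for integer class labels y."""
    m = y.shape[0]
    return -np.mean(np.log(P[np.arange(m), y] + 1e-12))

# Made-up example: 2 samples, 3 features, K = 3 classes.
X = np.array([[1.0, 2.0, 0.5],
              [0.0, -1.0, 3.0]])
Theta = np.array([[0.2, -0.1, 0.0],
                  [0.4, 0.3, -0.5],
                  [-0.3, 0.1, 0.6]])  # column k holds theta_k
P = softmax(X @ Theta)
row_sums = P.sum(axis=1)
loss = cross_entropy(P, np.array([0, 2]))

# K = 2: softmax collapses to a sigmoid of the logit difference.
z = np.array([[0.7, -0.4]])
p_first = softmax(z)[0, 0]
sig = 1.0 / (1.0 + np.exp(-(0.7 - (-0.4))))
```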


In [1]:
import numpy as np
from tqdm import tqdm
from cifar10.unpickle import get_all_data, get_test_data

### Load & Preprocess CIFAR-10

In [2]:
# 1) Load
x_train, y_train = get_all_data()
x_test, y_test = get_test_data()

# 2) Normalize to [0,1] and flatten
x_train = x_train.astype(np.float32) / 255.0
x_test = x_test.astype(np.float32) / 255.0

n_samples, h, w, c = x_train.shape  # h=32, w=32, c=3
x_train = x_train.reshape(n_samples, h * w * c)  # flatten to (N, 3072)
x_test = x_test.reshape(x_test.shape[0], h * w * c)  # flatten to (N, 3072)

# 3) One-hot encode labels
num_classes = 10
def one_hot(y, K):
    m = y.shape[0]
    oh = np.zeros((m, K))
    oh[np.arange(m), y] = 1
    return oh
# So if y = [3, 1, 0], then:
# one_hot(y, 10) → [
#   [0,0,0,1,0,0,0,0,0,0],
#   [0,1,0,0,0,0,0,0,0,0],
#   [1,0,0,0,0,0,0,0,0,0]
# ]


Y_train = one_hot(np.array(y_train), num_classes)
Y_test  = one_hot(np.array(y_test),  num_classes)

## Model Components (Softmax & Loss with ℓ₂)

In [3]:
def softmax(logits):
    # logits: (batch, K)
    exp = np.exp(logits - np.max(logits, axis=1, keepdims=True))
    return exp / exp.sum(axis=1, keepdims=True)

def compute_loss_and_grad(X, Y, W, b, reg_strength):
    """
    X: (batch, D), Y: (batch, K) one-hot
    W: (D, K), b: (K,)
    returns: loss (scalar), dW, db
    """
    m = X.shape[0]
    logits = X.dot(W) + b          # (m, K)
    P = softmax(logits)            # (m, K)
    
    # Cross-entropy loss
    data_loss = -np.sum(Y * np.log(P + 1e-12)) / m
    
    # ℓ₂ penalty on W only (bias excluded); reg_strength plays the role of λ/m
    reg_loss = 0.5 * reg_strength * np.sum(W*W)
    loss = data_loss + reg_loss
    
    # Gradient
    dlogits = (P - Y) / m          # (m, K)
    dW = X.T.dot(dlogits)          # (D, K)
    db = dlogits.sum(axis=0)       # (K,)
    
    # add regularization gradient
    dW += reg_strength * W
    return loss, dW, db
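
A finite-difference check is a common way to gain confidence in hand-derived gradients like the ones above. The sketch below re-declares a compact copy of the loss (named `loss_and_grad` here so the block runs on its own) and compares its analytic gradients with central differences on small random data. The helper `numerical_grad`, the shapes, and the seed are assumptions made for the example.

```python
import numpy as np

def softmax(logits):
    e = np.exp(logits - np.max(logits, axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def loss_and_grad(X, Y, W, b, reg):
    """Softmax cross-entropy with an l2 penalty on W (same math as the cell above)."""
    m = X.shape[0]
    P = softmax(X.dot(W) + b)
    loss = -np.sum(Y * np.log(P + 1e-12)) / m + 0.5 * reg * np.sum(W * W)
    dlogits = (P - Y) / m
    return loss, X.T.dot(dlogits) + reg * W, dlogits.sum(axis=0)

def numerical_grad(f, A, h=1e-5):
    """Central-difference gradient of the scalar f() with respect to array A.

    A is perturbed in place and restored after each evaluation.
    """
    g = np.zeros_like(A)
    for idx in np.ndindex(A.shape):
        old = A[idx]
        A[idx] = old + h
        f_plus = f()
        A[idx] = old - h
        f_minus = f()
        A[idx] = old
        g[idx] = (f_plus - f_minus) / (2 * h)
    return g

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 4))
Y = np.eye(3)[rng.integers(0, 3, size=5)]  # one-hot labels
W = 0.1 * rng.normal(size=(4, 3))
b = np.zeros(3)
reg = 1e-2

_, dW, db = loss_and_grad(X, Y, W, b, reg)
f = lambda: loss_and_grad(X, Y, W, b, reg)[0]
max_err_W = np.max(np.abs(dW - numerical_grad(f, W)))
max_err_b = np.max(np.abs(db - numerical_grad(f, b)))
```

If either error is far above roughly `1e-6`, the analytic gradient and the loss are probably out of sync, for example a missing `1/m` factor or a regularization term applied to one but not the other.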

## Training Loop

Mini-batch SGD

In [4]:
def train(X, Y, X_val, Y_val,
          lr=1e-2, reg=1e-3,
          batch_size=256, epochs=20):
    D, K = X.shape[1], Y.shape[1]
    # Initialize parameters
    W = 0.001 * np.random.randn(D, K)
    b = np.zeros(K)
    
    for epoch in range(epochs):
        # Shuffle
        perm = np.random.permutation(X.shape[0])
        X_shuf, Y_shuf = X[perm], Y[perm]
        
        for i in range(0, X.shape[0], batch_size):
            xb = X_shuf[i:i+batch_size]
            yb = Y_shuf[i:i+batch_size]
            loss, dW, db = compute_loss_and_grad(xb, yb, W, b, reg)
            
            # Parameter update
            W -= lr * dW
            b -= lr * db
        
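        # Report accuracy on the first epoch and every fifth epoch.
        # Note: `loss` below is the loss of the last mini-batch, not an epoch average.
        if (epoch + 1) % 5 == 0 or epoch == 0: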
            def predict_acc(X, Y):
                probs = softmax(X.dot(W) + b)
                preds = np.argmax(probs, axis=1)
                return np.mean(preds == np.argmax(Y, axis=1))
            train_acc = predict_acc(X, Y)
            val_acc   = predict_acc(X_val, Y_val)
            print(f"Epoch {epoch+1}/{epochs} — loss: {loss:.4f}, "
                  f"train_acc: {train_acc:.3f}, val_acc: {val_acc:.3f}")
    
    return W, b

# Example usage (note: the test split doubles as the validation set here):
W, b = train(x_train, Y_train, x_test, Y_test,
             lr=1e-2, reg=1e-3,
             batch_size=512, epochs=30)

Epoch 1/30 — loss: 1.9981, train_acc: 0.297, val_acc: 0.299
Epoch 5/30 — loss: 1.8212, train_acc: 0.368, val_acc: 0.362
Epoch 10/30 — loss: 1.8155, train_acc: 0.377, val_acc: 0.365
Epoch 15/30 — loss: 1.8038, train_acc: 0.385, val_acc: 0.381
Epoch 20/30 — loss: 1.7696, train_acc: 0.392, val_acc: 0.385
Epoch 25/30 — loss: 1.7475, train_acc: 0.397, val_acc: 0.392
Epoch 30/30 — loss: 1.8043, train_acc: 0.401, val_acc: 0.395


<h1 style="font-size: 40px;">Integrate filters</h1>

In [5]:
import sys
import os

project_root = os.path.abspath(os.path.join(os.getcwd(), '..'))
sys.path.append(project_root)

from images.image_preprocessing import (
    extract_raw_pixels,
    extract_color_histogram,
    extract_hog,
    extract_lbp,
)

In [7]:
def batch_extract(fn, X):
    """Apply single‐image fn over a batch X of shape (n,H,W,C)."""
    return np.stack([fn(im) for im in X], axis=0)


x_train, y_train = get_all_data()
x_test, y_test = get_test_data()

Xraw_train = batch_extract(extract_raw_pixels, x_train)
Xraw_test = batch_extract(extract_raw_pixels, x_test)

Xhist_train = batch_extract(extract_color_histogram, x_train)
Xhist_test = batch_extract(extract_color_histogram, x_test)

Xhog_train = batch_extract(extract_hog, x_train)
Xhog_test = batch_extract(extract_hog, x_test)

Xlbp_train = batch_extract(extract_lbp, x_train)
Xlbp_test = batch_extract(extract_lbp, x_test)
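
The extractors imported from `images.image_preprocessing` are not shown in this notebook. To illustrate the contract that `batch_extract` relies on (each function maps one `(H, W, C)` image to a fixed-length 1-D vector), here is a hypothetical stand-in for a color-histogram extractor. The name `color_histogram_sketch`, the bin count, and the normalization are assumptions and may differ from the real `extract_color_histogram`.

```python
import numpy as np

def color_histogram_sketch(image, bins=8):
    """Hypothetical extractor: per-channel intensity histograms, concatenated.

    Each channel's histogram is normalized to sum to 1, so the output
    has length bins * C for an image of shape (H, W, C).
    """
    feats = []
    for ch in range(image.shape[2]):
        hist, _ = np.histogram(image[:, :, ch], bins=bins, range=(0, 256))
        feats.append(hist / hist.sum())
    return np.concatenate(feats)

def batch_extract(fn, X):
    """Apply a single-image fn over a batch X of shape (n, H, W, C)."""
    return np.stack([fn(im) for im in X], axis=0)

# Random stand-in images with CIFAR-10's shape (not real data).
rng = np.random.default_rng(0)
fake_images = rng.integers(0, 256, size=(4, 32, 32, 3), dtype=np.uint8)
features = batch_extract(color_histogram_sketch, fake_images)
```

Because every extractor returns a flat vector per image, the per-extractor feature matrices all share the same first dimension and can be concatenated along `axis=1`.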

In [8]:
X_train = np.concatenate([Xraw_train, Xhist_train, Xhog_train, Xlbp_train], axis=1)
X_test  = np.concatenate([Xraw_test,  Xhist_test,  Xhog_test,  Xlbp_test], axis=1)

mean = X_train.mean(axis=0)
std = X_train.std(axis=0) + 1e-10  # avoid division by zero

X_train = (X_train - mean) / std
X_test  = (X_test  - mean) / std

In [9]:
# Features were extracted, concatenated, and standardized in the cells above.

# Prepare labels as before:
y_train = np.array(y_train).flatten()
y_test  = np.array(y_test).flatten()
Y_train = one_hot(y_train, num_classes)
Y_test  = one_hot(y_test, num_classes)

# Train with the processed features
W, b = train(X_train, Y_train, X_test, Y_test,
             lr=1e-2, reg=1e-3,
             batch_size=512, epochs=30)


Epoch 1/30 — loss: 1.3658, train_acc: 0.570, val_acc: 0.546
Epoch 5/30 — loss: 1.0555, train_acc: 0.651, val_acc: 0.608
Epoch 10/30 — loss: 1.0408, train_acc: 0.682, val_acc: 0.621
Epoch 15/30 — loss: 0.8835, train_acc: 0.694, val_acc: 0.622
Epoch 20/30 — loss: 0.9000, train_acc: 0.704, val_acc: 0.621
Epoch 25/30 — loss: 0.8704, train_acc: 0.710, val_acc: 0.624
Epoch 30/30 — loss: 0.9066, train_acc: 0.716, val_acc: 0.623


## Just HOG

In [None]:
# Load data
x_train, y_train = get_all_data()
x_test, y_test = get_test_data()

# Extract HOG features only
X_train = batch_extract(extract_hog, x_train)
X_test  = batch_extract(extract_hog, x_test)

# Normalize features
mean = X_train.mean(axis=0)
std = X_train.std(axis=0) + 1e-10  # avoid division by zero

X_train = (X_train - mean) / std
X_test  = (X_test  - mean) / std

# Prepare labels
y_train = np.array(y_train).flatten()
y_test  = np.array(y_test).flatten()

num_classes = 10  # CIFAR-10 classes

Y_train = one_hot(y_train, num_classes)
Y_test  = one_hot(y_test, num_classes)

# Train on the HOG features only (train() and one_hot() are defined in earlier cells)
W, b = train(X_train, Y_train, X_test, Y_test, lr=1e-2, reg=1e-3, batch_size=512, epochs=30)


Epoch 1/30 — loss: 1.7308, train_acc: 0.456, val_acc: 0.450
Epoch 5/30 — loss: 1.4953, train_acc: 0.504, val_acc: 0.496
Epoch 10/30 — loss: 1.4324, train_acc: 0.516, val_acc: 0.506
Epoch 15/30 — loss: 1.3030, train_acc: 0.521, val_acc: 0.514
Epoch 20/30 — loss: 1.5365, train_acc: 0.524, val_acc: 0.514
Epoch 25/30 — loss: 1.3289, train_acc: 0.526, val_acc: 0.516
Epoch 30/30 — loss: 1.3856, train_acc: 0.528, val_acc: 0.516
