# CS454-554 Homework 4 Report: KMNIST Classification

**Due:** May 14, 2025, 23:00
**Author:** DEMBA SOW

---
## 1. Abstract
This report compares six neural network models on the KMNIST (Kuzushiji-MNIST) handwritten character classification task: a baseline linear classifier, a multilayer perceptron (MLP), a simple CNN, and three extended CNN variants, CNN-A (BatchNorm + Dropout), CNN-B (Global Pooling), and CNN-C (Residual Blocks). All models are implemented in PyTorch. CNN-A achieved the best performance, with 95.7% test accuracy. The experiments highlight the benefits of normalization and dropout for generalization, and show that residual connections let a CNN reach its best validation loss within a few epochs.

**Contents**

1. **Abstract**
2. **Introduction**
3. **Environment & Data Loading**
4. **Baseline Models**
   * Linear model
   * Multilayer Perceptron (40 hidden units)
   * Simple CNN
5. **Extended CNN Variants**
   * CNN‑A (Deeper + BatchNorm + Dropout)
   * CNN‑B (Wider Filters + Global Pooling)
   * CNN‑C (Residual Connections)
   * Training Loop with Early Stopping
6. **Final Comparative Analysis**
7. **Submission Details**

Each model section includes:

* Model architecture description
* Code snippet
* Training & validation loss/accuracy curves
* Numerical results
* Per‑model analysis

---

## 2. Introduction

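The objective of this assignment is to implement six neural network models of increasing complexity in PyTorch and compare them on the KMNIST dataset: 10 classes of 28×28 grayscale images of cursive Japanese (Kuzushiji) characters, with 60,000 training and 10,000 test images. We evaluate three baselines (a linear classifier, a multilayer perceptron with 40 hidden units, and a simple two-layer CNN) and three extended CNN variants (CNN-A with BatchNorm and dropout, CNN-B with wider 5×5 filters and global average pooling, and CNN-C with residual blocks), comparing their learning curves, test accuracy, and train–test gaps.
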
## 3. Environment & Data Loading

**Description:** Set up the device, transforms, datasets, and data loaders, and define the shared hyperparameters.

**Code:**

```python
# In a notebook, install dependencies first with: %pip install -r requirements.txt

import torch, torch.nn as nn, torch.optim as optim
from torchvision.datasets import KMNIST
from torchvision import transforms
from torch.utils.data import DataLoader
import matplotlib.pyplot as plt

# Device
DEVICE = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

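# Hyperparameters: baseline models (Section 4)
BATCH_SIZE = 64
LR = 0.001
EPOCHS_BASELINE = 20

# Hyperparameters: extended CNN variants (Section 5)
BATCH_SIZE_EXTENDED = 128
LR_EXTENDED = 5e-4
EPOCHS_EXTENDED = 50   # maximum; early stopping may end training sooner
PATIENCE = 3           # epochs without validation-loss improvement before stopping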

# Transforms & DataLoaders
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.5,), (0.5,))
])
train_ds = KMNIST(root='./data', train=True,  transform=transform, download=True)
test_ds  = KMNIST(root='./data', train=False, transform=transform, download=True)

train_loader = DataLoader(
    train_ds,
    batch_size=BATCH_SIZE,
    shuffle=True,
    num_workers=4,        # <— parallel data loading
    pin_memory=True       # <— faster transfers to GPU
)
test_loader = DataLoader(
    test_ds,
    batch_size=BATCH_SIZE,
    num_workers=4,
    pin_memory=True
)

```

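**Analysis:** `Normalize((0.5,), (0.5,))` computes (x - 0.5) / 0.5, mapping pixel values from [0, 1] to [-1, 1] so that inputs are roughly zero-centered, which tends to stabilize training; batch size 64 balances gradient noise against per-epoch efficiency.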
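
The early-stopping loop in Section 5.4 reads `loaders['train']` and `loaders['val']`, while the code above only builds train and test loaders. The sketch below shows one way to hold out a validation split from the 60,000 training images with `torch.utils.data.random_split`, so that early stopping never looks at the test set. The helper name `make_loaders`, the 10% validation fraction, and the seed are hypothetical choices for illustration, not necessarily the configuration behind the reported results.

```python
import torch
from torch.utils.data import DataLoader, Dataset, random_split


def make_loaders(train_ds: Dataset, test_ds: Dataset, batch_size: int = 64,
                 val_fraction: float = 0.1, seed: int = 0) -> dict:
    """Hypothetical helper: split train_ds into train/val and wrap all three in DataLoaders."""
    n_val = int(len(train_ds) * val_fraction)
    n_train = len(train_ds) - n_val
    # A seeded generator keeps the split identical across runs and across models.
    generator = torch.Generator().manual_seed(seed)
    train_subset, val_subset = random_split(train_ds, [n_train, n_val], generator=generator)
    return {
        'train': DataLoader(train_subset, batch_size=batch_size, shuffle=True),
        'val': DataLoader(val_subset, batch_size=batch_size),    # used for early stopping
        'test': DataLoader(test_ds, batch_size=batch_size),      # used only for final reporting
    }
```

For the extended variants, a call such as `make_loaders(train_ds, test_ds, batch_size=128)` would match the batch size in Section 5; `num_workers` and `pin_memory` are omitted for brevity and can be passed to each `DataLoader` as in the code above.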

---

## 4. Baseline Models

### 4.1 Linear Model

* **Architecture:** Single fully‑connected layer (784→10)
* **Code:**
```python
# LinearModel class definition
class LinearModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(28*28, 10)
    def forward(self, x):
        x = x.view(x.size(0), -1)
        return self.fc(x)
```

* **Training Setup:** Adam (lr=1e-3), CrossEntropyLoss, epochs=20, batch=64

* **Learning Curves:** 
![Linear Accuracy & Loss Curves](./output/linear-model.png)


* **Results:**

  | Epoch | Train Loss | Test Loss | Train Acc | Test Acc |
  | :---: | :--------: | :-------: | :-------: | :------: |
  |   20  |    0.567   |   1.059   |   82.9%   |   68.8%  |
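* **Analysis:** Underfits: a single linear layer has high bias (only 82.9% train accuracy), and the \~14-point train–test gap (82.9% vs. 68.8%) shows that its linear decision boundaries also transfer poorly to the test set.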

### 4.2 Multilayer Perceptron (MLP40)

* **Architecture:** Flatten → FC(784→40) → ReLU → FC(40→10)
* **Code:**

```python
# MLP40 class definition
class MLP40(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Flatten(),
            nn.Linear(28*28, 40),
            nn.ReLU(),
            nn.Linear(40, 10)
        )
    def forward(self, x):
        return self.net(x)
```

* **Training Setup:** Same hyperparameters as the linear model (Adam, lr=1e-3, batch size 64, 20 epochs)
* **Learning Curves:** 

![MLP40 Accuracy & Loss Curves](./output/mlp-40-model.png)

* **Results:**

  | Epoch | Train Loss | Test Loss | Train Acc | Test Acc |
  | :---: | :--------: | :-------: | :-------: | :------: |
  |   20  |    0.006   |   0.450   |   99.8%   |   94.8%  |
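* **Analysis:** The 40-unit hidden layer provides enough capacity to nearly memorize the training set (99.8% train accuracy), but the model overfits, with a \~5-point train–test gap; flattening the input also discards the images' spatial structure.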

### 4.3 Simple CNN

* **Architecture:**

  1. Conv(1→16,3) → ReLU → MaxPool(2)
  2. Conv(16→32,3) → ReLU → MaxPool(2)
  3. FC(32×7×7→64) → ReLU → FC(64→10)
* **Code:**

```python
# SimpleCNN class definition
class SimpleCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),  # 16×14×14
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2)  # 32×7×7
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(32*7*7, 64), nn.ReLU(),
            nn.Linear(64, 10)
        )
    def forward(self, x):
        x = self.features(x)
        return self.classifier(x)
```

* **Training Setup:** Same hyperparameters as the linear model (Adam, lr=1e-3, batch size 64, 20 epochs)
* **Learning Curves:** 
![Simple CNN Accuracy & Loss Curves](./output/simple-cnn-model.png)

* **Results:**

  | Epoch | Train Loss | Test Loss | Train Acc | Test Acc |
  | :---: | :--------: | :-------: | :-------: | :------: |
  |   20  |    0.001   |   0.527   |   99.9%   |   94.6%  |
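* **Analysis:** The convolutional inductive bias reaches test accuracy comparable to MLP40 (94.6% vs. 94.8%), but, trained for a fixed 20 epochs without regularization, it overfits about as much (\~5-point train–test gap) and ends with a higher test loss (0.527 vs. 0.450). This motivates the normalized and regularized variants in Section 5.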
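
The `32*7*7` input size of SimpleCNN's first linear layer follows from the standard output-size formula for convolution and pooling layers, out = floor((in + 2·padding - kernel) / stride) + 1. The plain-Python check below (the `out_size` helper is hypothetical, written for illustration) traces a 28×28 image through the feature extractor; the same arithmetic gives the `64*7*7` input of CNN-A's classifier.

```python
def out_size(size: int, kernel: int, stride: int = 1, padding: int = 0) -> int:
    """Spatial output size of a Conv2d or MaxPool2d layer (dilation 1, floor mode)."""
    return (size + 2 * padding - kernel) // stride + 1


# SimpleCNN feature extractor applied to a 28x28 input.
h = 28
h = out_size(h, kernel=3, padding=1)   # Conv2d(1, 16, 3, padding=1): stays 28
h = out_size(h, kernel=2, stride=2)    # MaxPool2d(2): halves to 14
h = out_size(h, kernel=3, padding=1)   # Conv2d(16, 32, 3, padding=1): stays 14
h = out_size(h, kernel=2, stride=2)    # MaxPool2d(2): halves to 7
flat_features = 32 * h * h             # channels x height x width after flattening
print(h, flat_features)  # 7 1568
```

The `padding=1` on each 3×3 convolution is what keeps the spatial size unchanged, so only the two pooling layers shrink the map.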

---

## 5. Extended CNN Variants

All three variants were trained with early stopping on validation loss (patience=3), lr=5e-4, batch size 128, and at most 50 epochs.

### 5.1 CNN‑A (Deep + BN + Dropout)

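* **Architecture Description:** Two conv blocks, each Conv(3×3, padding=1) → BatchNorm → ReLU → MaxPool(2), with 32 and 64 channels (twice the width of SimpleCNN at the same depth) and Dropout(0.25) after the second block; then FC(64×7×7→128) → ReLU → Dropout(0.5) → FC(128→10)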
* **Code:**

```python
# CNN_A class definition
class CNN_A(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1,32,3,padding=1), nn.BatchNorm2d(32), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32,64,3,padding=1), nn.BatchNorm2d(64), nn.ReLU(),
            nn.MaxPool2d(2), nn.Dropout(0.25)
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(64*7*7,128), nn.ReLU(), nn.Dropout(0.5),
            nn.Linear(128,10)
        )
    def forward(self,x):
        x = self.features(x)
        return self.classifier(x)
```

* **Learning Curves:** 

![CNN-A Loss & Accuracy Curves](./output/cnn_a-model.png)

* **Results:** Stop epoch \~18, Test Acc = 95.68%
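* **Analysis:** BatchNorm stabilizes and speeds up optimization, and the two dropout layers (p=0.25 and p=0.5) restrain overfitting in the large 3136→128 classifier layer; together with the wider convolutions, this yields the best test accuracy of all six models.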
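
CNN-A's dropout layers only act in training mode, which is why the training loop in Section 5.4 switches between `model.train()` and `model.eval()`. The minimal check below, on a toy tensor rather than the trained model, illustrates PyTorch's inverted dropout: in training mode `nn.Dropout(p)` zeroes each element with probability p and scales the survivors by 1/(1-p), and in evaluation mode it is the identity. BatchNorm switches modes in a similar way, using batch statistics during training and its running averages in `eval()` mode.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
drop = nn.Dropout(p=0.5)
x = torch.ones(1000)

drop.train()
y_train = drop(x)   # each entry is 0 (dropped) or 1 / (1 - 0.5) = 2 (kept and rescaled)

drop.eval()
y_eval = drop(x)    # identity at evaluation time

print(sorted(set(y_train.tolist())), torch.equal(y_eval, x))
```

The 1/(1-p) rescaling keeps the expected activation unchanged, so no correction is needed when dropout is switched off at test time.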

### 5.2 CNN‑B (Wide Filters + Global Pool)

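* **Architecture Description:** Two 5×5 conv layers (16 and 32 channels) at full 28×28 resolution, one 2×2 max pool, a 3×3 conv (64 channels), then global average pooling feeding a single linear layer (64→10); no BatchNorm or dropout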
* **Code:**

```python
# CNN_B class definition
class CNN_B(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1,16,5,padding=2), nn.ReLU(),
            nn.Conv2d(16,32,5,padding=2), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32,64,3,padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1)
        )
        self.fc = nn.Linear(64,10)
    def forward(self,x):
        x = self.features(x).view(x.size(0), -1)
        return self.fc(x)

```

* **Learning Curves:** 
![CNN-B Loss & Accuracy Curves](./output/cnn_b-model.png)

* **Results:** Stop epoch \~23, Test Acc = 84.02%
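* **Analysis:** CNN-B has no normalization, and global average pooling collapses each 14×14 feature map to a single value, discarding where features occur before classification. With only one 64→10 linear layer after pooling, it also has the lowest capacity of the CNNs, which plausibly explains the \~12-point gap to CNN-A.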
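
The loss of spatial information under global average pooling can be illustrated directly: `nn.AdaptiveAvgPool2d(1)` reduces each channel to its spatial mean, so two feature maps with the same values in different positions become indistinguishable. The toy tensors below are illustrative only and are not taken from the trained model.

```python
import torch
import torch.nn as nn

gap = nn.AdaptiveAvgPool2d(1)

# Two 1-channel 4x4 "feature maps", each with one active unit in opposite corners.
a = torch.zeros(1, 1, 4, 4)
a[0, 0, 0, 0] = 1.0
b = torch.zeros(1, 1, 4, 4)
b[0, 0, 3, 3] = 1.0

pooled_a = gap(a).flatten(1)  # shape (1, 1): the spatial mean, 1/16
pooled_b = gap(b).flatten(1)
print(pooled_a.item(), pooled_b.item())  # 0.0625 0.0625
```

In CNN-B the same reduction is applied to 64 channels of size 14×14, so the final linear layer sees only 64 per-channel averages.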

### 5.3 CNN‑C (Residual Blocks)

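* **Architecture Description:** Two residual blocks (1→16 and 16→32 channels), each made of two 3×3 convs plus a 1×1 projection shortcut and followed by 2×2 max pooling; then FC(32×7×7→64) → ReLU → FC(64→10); no BatchNorm or dropout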
* **Code Placeholder:**

```python
# ResidualBlock and CNN_C class definitions

class ResidualBlock(nn.Module):
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(in_ch,out_ch,3,padding=1), nn.ReLU(),
            nn.Conv2d(out_ch,out_ch,3,padding=1)
        )
        self.skip = nn.Conv2d(in_ch,out_ch,1) if in_ch!=out_ch else nn.Identity()
    def forward(self,x):
        return torch.relu(self.conv(x) + self.skip(x))  # functional ReLU; no module built per call

class CNN_C(nn.Module):
    def __init__(self):
        super().__init__()
        self.layer1 = ResidualBlock(1,16)
        self.pool1  = nn.MaxPool2d(2)
        self.layer2 = ResidualBlock(16,32)
        self.pool2  = nn.MaxPool2d(2)
        self.classifier = nn.Sequential(
            nn.Flatten(), nn.Linear(32*7*7,64), nn.ReLU(), nn.Linear(64,10)
        )
    def forward(self,x):
        x = self.pool1(self.layer1(x))
        x = self.pool2(self.layer2(x))
        return self.classifier(x)

```

* **Learning Curves:** 
![CNN-C Loss & Accuracy Curves](./output/cnn_c-model.png)


* **Results:** Stop epoch \~8, Test Acc = 93.61%
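* **Analysis:** Early stopping triggered at epoch \~8, the earliest of the three variants, so with patience 3 the validation loss stopped improving after about epoch 5. Residual connections let the model fit quickly, but without BatchNorm or dropout it reaches its best validation loss early and finishes at 93.6% test accuracy, below CNN-A and slightly below SimpleCNN.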
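
One common intuition for why residual blocks train quickly: if the convolutional branch outputs zero, a block with an identity shortcut reduces to `relu(x)`, so it starts close to passing its input through. The sketch below re-declares a block with the same layout as the report's `ResidualBlock` (using `torch.relu` instead of building `nn.ReLU()` on each call) and zeroes its last convolution to check this. Note that both blocks in CNN-C change the channel count, so they actually use a 1×1 projection shortcut; the equal-channel case here is an illustrative assumption.

```python
import torch
import torch.nn as nn


class ResidualBlock(nn.Module):
    """Same layout as the report's block: two 3x3 convs plus a shortcut path."""

    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.ReLU(),
            nn.Conv2d(out_ch, out_ch, 3, padding=1),
        )
        # 1x1 projection when channel counts differ, identity otherwise
        self.skip = nn.Conv2d(in_ch, out_ch, 1) if in_ch != out_ch else nn.Identity()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.relu(self.conv(x) + self.skip(x))


# Equal channel counts give an identity shortcut.
block = ResidualBlock(16, 16)
with torch.no_grad():
    last_conv = block.conv[2]      # second 3x3 conv of the branch
    last_conv.weight.zero_()
    last_conv.bias.zero_()         # the branch now outputs exactly zero

x = torch.randn(2, 16, 14, 14)
print(torch.equal(block(x), torch.relu(x)))
```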

### 5.4 Training Loop with Early Stopping

* **Code:**

```python
# Train with early stopping on validation loss; restores the best checkpoint before returning
def train_with_early_stopping(model, loaders, criterion, optimizer, epochs, patience):
    best_val = float('inf')
    counter = 0
    # Per-epoch metrics used for the learning-curve plots
    history = {
        'train_loss': [],
        'train_acc': [],    # track train accuracy
        'val_loss': [],
        'val_acc': []
    }

    print(f"Training {model.__class__.__name__} with early stopping...")
    for ep in range(1, epochs+1):
        # ----- Training Phase -----
        model.train()
        train_loss = 0
        total_correct = 0
        for X, y in loaders['train']:
            X, y = X.to(DEVICE), y.to(DEVICE)
            optimizer.zero_grad()
            outputs = model(X)
            loss = criterion(outputs, y)
            loss.backward()
            optimizer.step()

            train_loss    += loss.item() * X.size(0)
            total_correct += (outputs.argmax(1) == y).sum().item()

        train_loss /= len(loaders['train'].dataset)
        train_acc   = total_correct / len(loaders['train'].dataset)

        # ----- Validation Phase -----
        model.eval()
        val_loss = 0
        correct  = 0
        with torch.no_grad():
            for X, y in loaders['val']:
                X, y = X.to(DEVICE), y.to(DEVICE)
                outputs = model(X)
                val_loss += criterion(outputs, y).item() * X.size(0)
                correct  += (outputs.argmax(1) == y).sum().item()

        val_loss /= len(loaders['val'].dataset)
        val_acc   = correct / len(loaders['val'].dataset)

        # ----- Record History -----
        history['train_loss'].append(train_loss)
        history['train_acc'].append(train_acc)   # store train accuracy
        history['val_loss'].append(val_loss)
        history['val_acc'].append(val_acc)

        print(f"Epoch {ep}: "
            f"train_loss={train_loss:.4f}, train_acc={train_acc:.4f}, "
            f"val_loss={val_loss:.4f}, val_acc={val_acc:.4f}")

        # ----- Early Stopping -----
        if val_loss < best_val:
            best_val = val_loss
            counter = 0
            torch.save(model.state_dict(), 'best.pt')
        else:
            counter += 1
            if counter >= patience:
                print("Early stopping triggered.")
                break

    # Load best checkpoint
    model.load_state_dict(torch.load('best.pt'))
    return history

```
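
The loop above saves the best weights to a single `best.pt` file. A variant that keeps the best `state_dict` in memory avoids disk I/O and a shared filename when several models are trained in one script. The `EarlyStopping` class below is a hypothetical helper (not part of PyTorch) that applies the same rule as the loop: stop after `patience` consecutive epochs without a lower validation loss.

```python
import copy

import torch.nn as nn


class EarlyStopping:
    """Hypothetical helper: track the best validation loss and keep those weights in memory."""

    def __init__(self, patience: int = 3):
        self.patience = patience
        self.best_loss = float('inf')
        self.best_state = None
        self.counter = 0

    def step(self, val_loss: float, model: nn.Module) -> bool:
        """Record one epoch; return True when training should stop."""
        if val_loss < self.best_loss:
            self.best_loss = val_loss
            self.counter = 0
            # deepcopy so later optimizer steps do not overwrite the saved tensors
            self.best_state = copy.deepcopy(model.state_dict())
        else:
            self.counter += 1
        return self.counter >= self.patience

    def restore(self, model: nn.Module) -> None:
        """Load the best weights seen so far back into the model."""
        if self.best_state is not None:
            model.load_state_dict(self.best_state)
```

In the training loop, the early-stopping block would become `if stopper.step(val_loss, model): break`, followed by `stopper.restore(model)` after the loop in place of `torch.load('best.pt')`.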

---

## 6. Final Comparative Analysis

| Model     | Test Acc  | Epochs Trained | Remarks                                          |
| --------- | --------- | -------------- | ------------------------------------------------ |
| Linear    | 68.8%     | 20             | Underfits                                        |
| MLP40     | 94.8%     | 20             | Overfits; ignores spatial structure              |
| SimpleCNN | 94.6%     | 20             | Strong baseline; \~5-point gap, like MLP40       |
| **CNN-A** | **95.7%** | \~18           | Best: BN + Dropout; wider conv layers            |
| CNN-C     | 93.6%     | \~8            | Earliest stop; residual blocks                   |
| CNN-B     | 84.0%     | \~23           | No normalization; global pooling; underperforms  |
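
The capacity remarks in the table can be made concrete by counting trainable parameters. The plain-Python sketch below applies the usual formulas (a k×k convolution has in·out·k² weights plus out biases, a linear layer has in·out weights plus out biases, and BatchNorm2d has a learnable weight and bias per channel) to the layer definitions in Sections 4 and 5. The helper names are hypothetical; calling `sum(p.numel() for p in model.parameters())` on the actual PyTorch modules should give the same totals.

```python
def conv_params(in_ch: int, out_ch: int, k: int) -> int:
    return in_ch * out_ch * k * k + out_ch


def linear_params(n_in: int, n_out: int) -> int:
    return n_in * n_out + n_out


def bn_params(ch: int) -> int:
    return 2 * ch  # learnable weight and bias; running mean/var are buffers, not parameters


params = {
    'Linear': linear_params(784, 10),
    'MLP40': linear_params(784, 40) + linear_params(40, 10),
    'SimpleCNN': (conv_params(1, 16, 3) + conv_params(16, 32, 3)
                  + linear_params(32 * 7 * 7, 64) + linear_params(64, 10)),
    'CNN_A': (conv_params(1, 32, 3) + bn_params(32) + conv_params(32, 64, 3) + bn_params(64)
              + linear_params(64 * 7 * 7, 128) + linear_params(128, 10)),
    'CNN_B': (conv_params(1, 16, 5) + conv_params(16, 32, 5) + conv_params(32, 64, 3)
              + linear_params(64, 10)),
    'CNN_C': (conv_params(1, 16, 3) + conv_params(16, 16, 3) + conv_params(1, 16, 1)     # block 1 + 1x1 skip
              + conv_params(16, 32, 3) + conv_params(32, 32, 3) + conv_params(16, 32, 1)  # block 2 + 1x1 skip
              + linear_params(32 * 7 * 7, 64) + linear_params(64, 10)),
}
for name, n in params.items():
    print(f"{name:<10} {n:>8,}")
```

By this count CNN-B is the smallest convolutional model (about 32K parameters, roughly the size of MLP40), consistent with the lower-capacity explanation given for it in Section 5.2, while most of CNN-A's roughly 422K parameters sit in its 3136→128 classifier layer.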

**General Conclusion:**
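After evaluating six models of increasing complexity, **CNN_A** achieves the highest test accuracy (**95.7%**), about one percentage point above MLP40 (94.8%) and SimpleCNN (94.6%). We attribute this to batch normalization, dropout regularization, and its wider convolutional layers (32 and 64 channels versus 16 and 32 in SimpleCNN). Its stable training/validation curves and resistance to overfitting make it the recommended architecture for KMNIST among the models tested. Because the baselines (lr=1e-3, batch size 64, fixed 20 epochs) and the extended variants (lr=5e-4, batch size 128, early stopping) were trained with different settings, the roughly one-point margin over the baselines should be interpreted with some caution.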



**Future Improvements:**

* **Data Augmentation:** small rotations, shifts, or elastic distortions to improve invariance.
* **LR Scheduling:** step decay or cosine annealing for smoother convergence.
* **Deeper Residual Stacks:** extend CNN\_C for higher capacity.
* **Regularization:** weight decay, label smoothing to further reduce overfitting.
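
The scheduling and regularization items above map onto standard PyTorch components. The sketch below is one possible configuration, not a tested recipe for KMNIST: `AdamW` for decoupled weight decay, `CrossEntropyLoss(label_smoothing=...)` (available since PyTorch 1.10), and `CosineAnnealingLR` stepped once per epoch. The values `weight_decay=1e-4`, `label_smoothing=0.1`, and `T_max=50` are assumptions chosen for illustration, and a small `nn.Linear` stands in for a real model.

```python
import torch.nn as nn
import torch.optim as optim

# Stand-in model; any of the report's modules could be used instead.
model = nn.Linear(28 * 28, 10)

# Decoupled weight decay (AdamW) and label smoothing as extra regularization.
optimizer = optim.AdamW(model.parameters(), lr=5e-4, weight_decay=1e-4)
criterion = nn.CrossEntropyLoss(label_smoothing=0.1)

# Cosine annealing from the initial lr down to eta_min over T_max epochs.
EPOCHS = 50
scheduler = optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=EPOCHS, eta_min=0.0)

lrs = []
for epoch in range(EPOCHS):
    # A real epoch would loop over batches: forward, criterion, backward, optimizer.step().
    optimizer.step()   # stands in for those per-batch steps (no gradients here, so no update)
    scheduler.step()   # advance the schedule once per epoch, after the optimizer steps
    lrs.append(optimizer.param_groups[0]['lr'])

print(f"lr after epoch 1: {lrs[0]:.2e}, after epoch {EPOCHS}: {lrs[-1]:.2e}")
```

Stepping the scheduler per epoch (rather than per batch) matches how `T_max` is expressed here; with early stopping, training may end before the learning rate reaches its minimum.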


----


## 7. Submission Details

* **Report:** `hw4_report.pdf`
* **Source Code:** `hw4_kmnist.py`
* **Requirements:** `requirements.txt`
* **Environment:** Python >= 3.12, PyTorch >= 2.2.0, torchvision >= 0.17.0 (the earliest releases with Python 3.12 support)
* **Dataset:** KMNIST (downloaded automatically)
* **Hardware:** NVIDIA GeForce RTX 3060
* **Training Time:** \~1 hour for all models

