# CS454-554 Homework 4 Report: KMNIST Classification

**Due:** May 14, 2025, 23:00
**Author:** **DEMBA SOW**

---

## 1. Introduction

The objective of this assignment is to implement and compare several neural network architectures in PyTorch on the KMNIST dataset, a 10-class classification problem over 28×28 grayscale images of Japanese (Kuzushiji) characters. The report covers:

1. **Baseline Models**

   * Linear
   * Multilayer Perceptron (40 hidden units)
   * Simple CNN
2. **Extended CNN Variants**

   * CNN‑A (Deeper + BatchNorm + Dropout)
   * CNN‑B (Wider Filters + Global Pooling)
   * CNN‑C (Residual Connections)
3. **Hyperparameter Tuning & Early Stopping**
4. **Comparison & Discussion**

---

## 2. Environment & Data Loading

```python
import torch
from torch import nn, optim
from torchvision.datasets import KMNIST
from torchvision import transforms
from torch.utils.data import DataLoader

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.5,), (0.5,))
])

train_ds = KMNIST('./data', train=True,  transform=transform, download=True)
test_ds  = KMNIST('./data', train=False, transform=transform, download=True)

train_loader = DataLoader(train_ds, batch_size=64,  shuffle=True)
test_loader  = DataLoader(test_ds,  batch_size=64)
```
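
The early stopping described in Section 4 monitors a validation loss, while the loaders above only cover the train and test sets. The following is a minimal sketch of one way to hold out a validation subset. The 90/10 split, the seed, and the helper name `split_train_val` are assumptions for illustration and are not taken from the submitted code. A synthetic `TensorDataset` stands in for KMNIST so that the sketch runs without a download.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset, random_split

def split_train_val(dataset, val_fraction=0.1, seed=0):
    """Split a dataset into (train, val) subsets reproducibly (hypothetical helper)."""
    n_val = int(len(dataset) * val_fraction)
    n_train = len(dataset) - n_val
    generator = torch.Generator().manual_seed(seed)  # fixed seed: same split every run
    return random_split(dataset, [n_train, n_val], generator=generator)

# Synthetic stand-in with KMNIST-shaped samples (1×28×28 images, 10 classes)
fake_ds = TensorDataset(torch.randn(1000, 1, 28, 28), torch.randint(0, 10, (1000,)))
train_subset, val_subset = split_train_val(fake_ds)

demo_train_loader = DataLoader(train_subset, batch_size=64, shuffle=True)
demo_val_loader = DataLoader(val_subset, batch_size=64)  # no shuffling for evaluation
```

With the real data, `train_ds` would be passed in place of `fake_ds`, and the test set would stay untouched until the final evaluation.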

---

## 3. Model Architectures

### 3.1. Linear Model

* A single `nn.Linear(28*28, 10)` layer applied to the flattened image (784 inputs → 10 class logits)

```python
class LinearModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(28*28, 10)
    def forward(self, x):
        x = x.view(x.size(0), -1)
        return self.fc(x)
```

### 3.2. MLP (40 Hidden Units)

* Flatten → Linear(784→40) → ReLU → Linear(40→10)

```python
class MLP40(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Flatten(),
            nn.Linear(28*28, 40),
            nn.ReLU(),
            nn.Linear(40, 10)
        )
    def forward(self, x):
        return self.net(x)
```

### 3.3. Simple CNN

* Conv(1→16, 3×3, padding=1) → ReLU → MaxPool(2), giving 16×14×14
* Conv(16→32, 3×3, padding=1) → ReLU → MaxPool(2), giving 32×7×7
* FC(32×7×7→64) → ReLU → FC(64→10)

```python
class SimpleCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),  # 16×14×14
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2)  # 32×7×7
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(32*7*7, 64), nn.ReLU(),
            nn.Linear(64, 10)
        )
    def forward(self, x):
        x = self.features(x)
        return self.classifier(x)
```
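
To put the three baselines on a common scale, their trainable parameter counts can be worked out by hand from the layer sizes above. The sketch below does this arithmetic in plain Python and also checks the 7×7 feature-map size that the first fully connected layer of `SimpleCNN` relies on. The helper names are illustrative and do not appear in the submitted code.

```python
def conv_out(n, k=3, p=1, s=1):
    """Spatial output size of a convolution: floor((n + 2p - k) / s) + 1."""
    return (n + 2 * p - k) // s + 1

def linear_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out

def conv2d_params(c_in, c_out, k):
    """Weights plus biases of a square k×k convolution."""
    return c_in * c_out * k * k + c_out

# 28 -> conv(3, pad 1) keeps 28 -> pool(2) gives 14 -> conv keeps 14 -> pool(2) gives 7
size = conv_out(28) // 2
size = conv_out(size) // 2

param_counts = {
    "Linear": linear_params(28 * 28, 10),
    "MLP40": linear_params(28 * 28, 40) + linear_params(40, 10),
    "SimpleCNN": (conv2d_params(1, 16, 3) + conv2d_params(16, 32, 3)
                  + linear_params(32 * size * size, 64) + linear_params(64, 10)),
}
for name, n in param_counts.items():
    print(f"{name}: {n:,} parameters")
```

This gives 7,850 parameters for the linear model, 31,810 for MLP40, and 105,866 for the Simple CNN, most of which sit in the first fully connected layer.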

---

## 4. Training & Evaluation

* **Loss:** `nn.CrossEntropyLoss`
* **Optimizer:** `optim.Adam` (LR=0.001)
* **Batch size:** 64
* **Epochs:** 20 (Baseline), 50 (Extended)
* **Early Stopping:** Monitor validation loss, stop if no improvement for 3 epochs
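
The settings above can be sketched as a training loop along the following lines. This is an illustrative reconstruction, not the submitted `hw4_kmnist.py`: the function names `run_epoch` and `fit`, the choice to restore the best weights after stopping, and the synthetic demo data are all assumptions.

```python
import copy
import torch
from torch import nn, optim
from torch.utils.data import DataLoader, TensorDataset

def run_epoch(model, loader, criterion, device, optimizer=None):
    """One pass over loader; trains if an optimizer is given, otherwise evaluates."""
    training = optimizer is not None
    model.train(training)
    total_loss, correct, count = 0.0, 0, 0
    with torch.set_grad_enabled(training):
        for x, y in loader:
            x, y = x.to(device), y.to(device)
            logits = model(x)
            loss = criterion(logits, y)
            if training:
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()
            total_loss += loss.item() * x.size(0)
            correct += (logits.argmax(dim=1) == y).sum().item()
            count += x.size(0)
    return total_loss / count, correct / count  # mean loss, accuracy

def fit(model, train_loader, val_loader, device, max_epochs=50, lr=1e-3, patience=3):
    """Adam + cross-entropy, stopping after `patience` epochs without a better val loss."""
    criterion = nn.CrossEntropyLoss()
    optimizer = optim.Adam(model.parameters(), lr=lr)
    best_loss, best_state, bad_epochs = float("inf"), None, 0
    history = []
    for epoch in range(1, max_epochs + 1):
        train_loss, train_acc = run_epoch(model, train_loader, criterion, device, optimizer)
        val_loss, val_acc = run_epoch(model, val_loader, criterion, device)
        history.append((epoch, train_loss, train_acc, val_loss, val_acc))
        if val_loss < best_loss:
            best_loss, bad_epochs = val_loss, 0
            best_state = copy.deepcopy(model.state_dict())
        else:
            bad_epochs += 1
            if bad_epochs >= patience:
                break
    if best_state is not None:
        model.load_state_dict(best_state)  # assumption: keep the best checkpoint
    return history

# Tiny synthetic demo (random data, so the accuracy itself is meaningless)
torch.manual_seed(0)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
demo_ds = TensorDataset(torch.randn(256, 1, 28, 28), torch.randint(0, 10, (256,)))
demo_loader = DataLoader(demo_ds, batch_size=64)
demo_model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10)).to(device)
history = fit(demo_model, demo_loader, demo_loader, device, max_epochs=5)
```

Each entry of `history` holds the epoch number followed by train loss, train accuracy, validation loss, and validation accuracy, which is the information plotted in the curves of Sections 5 and 6.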

---

## 5. Baseline Results

### 5.1. Accuracy & Loss Curves

* **Linear:** 
![Linear Accuracy & Loss Curves](./output/linear-model.png)

* **MLP40:** 
![MLP40 Accuracy & Loss Curves](./output/mlp-40-model.png)

* **Simple CNN:** 
![Simple CNN Accuracy & Loss Curves](./output/simple-cnn-model.png)


### 5.2. Numerical Results

| Model      | Epoch | Train Loss | Train Acc | Test Loss | Test Acc |
| ---------- | :---: | :--------: | :-------: | :-------: | :------: |
| Linear     |   10  |    0.560   |   0.835   |   1.036   |   0.694  |
| MLP40      |   10  |    0.159   |   0.954   |   0.545   |   0.847  |
| Simple CNN |   10  |    0.016   |   0.995   |   0.307   |   0.940  |


> **Observations:**
>
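> * The Linear model underfits: it plateaus at \~69.4% test accuracy and reaches only 83.5% on the training set.
> * MLP40 improves test accuracy to \~84.7% by adding a hidden layer with a ReLU non-linearity.
> * Simple CNN reaches \~94.0% test accuracy by exploiting spatial structure through convolution and pooling. Its 99.5% training accuracy next to a test loss of 0.307 suggests some overfitting by epoch 10.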
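
As a small worked check on these observations, the gap between training and test accuracy can be computed directly from the table in Section 5.2. The snippet only re-uses the reported numbers; the variable names are illustrative.

```python
# (train_acc, test_acc) copied from the table in Section 5.2
results = {
    "Linear": (0.835, 0.694),
    "MLP40": (0.954, 0.847),
    "Simple CNN": (0.995, 0.940),
}

# Train-minus-test accuracy, rounded to the table's three decimals
gaps = {name: round(train - test, 3) for name, (train, test) in results.items()}
for name, gap in gaps.items():
    print(f"{name}: train-test accuracy gap = {gap:.3f}")
```

The gap shrinks from 0.141 (Linear) to 0.107 (MLP40) to 0.055 (Simple CNN), so the convolutional model is both the most accurate and the one whose training accuracy transfers best to the test set.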

---

## 6. Extended CNN Variants

### 6.1. Loss & Accuracy Curves

* **CNN‑A:** 
![CNN-A Loss & Accuracy Curves](./output/cnn-a-model.png)

* **CNN‑B:**
![CNN-B Loss & Accuracy Curves](./output/cnn-b-model.png)

* **CNN‑C:** 
![CNN-C Loss & Accuracy Curves](./output/cnn-c-model.png)

### 6.2. Numerical Comparison

| Model | Best Epoch | Test Loss | Test Acc   |
| ----- | ---------- | --------- | ---------- |
| CNN‑A | 14         | \~0.169   | **0.9566** |
| CNN‑B | 7          | \~0.623   | 0.8206     |
| CNN‑C | 6          | \~0.252   | 0.9347     |


---

## 7. Discussion & Conclusion

Of the three extended CNN variants, only CNN‑A improved on the Simple CNN baseline:

* **CNN‑A**, with additional depth, BatchNorm, and Dropout, achieved the highest test accuracy of **95.66%** (test loss \~0.169). Its regularization and normalization likely helped generalization.
* **CNN‑B**, despite wider filters and global pooling, plateaued early (best epoch 7) at **82.06%**, likely because global pooling discards too much spatial detail.
* **CNN‑C** used residual connections and reached **93.47%**, close to but slightly below the Simple CNN, which suggests that skip connections kept training stable without adding accuracy at this depth.
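
The report does not list the code of the extended variants, so the following is only a hypothetical sketch of the ingredients named above: a residual block with BatchNorm, a global-average-pooling head, and Dropout. The class names, channel width, and dropout rate are assumptions and should not be read as the actual CNN‑A, CNN‑B, or CNN‑C definitions.

```python
import torch
from torch import nn

class ResidualBlock(nn.Module):
    """Two 3×3 convs with BatchNorm; the input is added back before the final ReLU."""
    def __init__(self, channels):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
            nn.ReLU(),
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
        )
        self.act = nn.ReLU()

    def forward(self, x):
        # Skip connection: shapes match because padding=1 preserves H and W
        return self.act(self.body(x) + x)

class TinyResNet(nn.Module):
    """Illustrative model: conv stem -> one residual block -> pooled classifier head."""
    def __init__(self, channels=32, num_classes=10, p_drop=0.25):
        super().__init__()
        self.stem = nn.Sequential(nn.Conv2d(1, channels, 3, padding=1), nn.ReLU())
        self.block = ResidualBlock(channels)
        self.head = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),  # global average pooling: C×H×W -> C×1×1
            nn.Flatten(),
            nn.Dropout(p_drop),
            nn.Linear(channels, num_classes),
        )

    def forward(self, x):
        return self.head(self.block(self.stem(x)))

# Shape check on a random KMNIST-sized batch
logits = TinyResNet()(torch.randn(4, 1, 28, 28))
```

Global average pooling reduces each feature map to a single number, which keeps the head small but removes all positional information before classification; this is the mechanism the CNN‑B bullet points to.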

Compared to the Simple CNN (94.0%), only CNN‑A showed an improvement (+1.66 percentage points). CNN‑C fell slightly short at 93.47%, and CNN‑B was well below at 82.06%, leaving CNN‑A as the best model.

**Best Model: CNN‑A** with hyperparameters (lr=0.0005, batch=128), early stopped at epoch 14.

---

## 8. Submission Details

* **Report:** `hw4_report.pdf`
* **Source Code:** `hw4_kmnist.py`
* **Requirements:** `requirements.txt`
* **Environment:** Python >= 3.12, PyTorch >= 1.12.0, torchvision >= 0.13.0
* **Dataset:** KMNIST (downloaded automatically)
* **Hardware:** NVIDIA GeForce RTX 3060
* **Training Time:** \~1 hour in total for all models
* **GPU Utilization:** 100% during training
* **Memory Usage:** \~4GB
* **Batch Size:** 64 by default for baseline and extended models (the best CNN‑A configuration in Section 7 used 128)
* **Epochs:** 20 for baseline, 50 for extended models
* **Early Stopping:** Implemented for extended models
