<a href="https://colab.research.google.com/github/divya-r-kamat/MnistLite-8k/blob/main/MnistLite_8k_modelrun.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## MNIST Neural Network

The goal is to consistently reach 99.4% validation/test accuracy within 15 epochs, using fewer than 8k parameters.



In [44]:
!rm -rf /content/MnistLite-8k/

In [45]:
%cd /content

/content


In [46]:
!git clone https://github.com/divya-r-kamat/MnistLite-8k.git

Cloning into 'MnistLite-8k'...
remote: Enumerating objects: 45, done.
remote: Counting objects: 100% (45/45), done.
remote: Compressing objects: 100% (44/44), done.
remote: Total 45 (delta 21), reused 0 (delta 0), pack-reused 0 (from 0)
Receiving objects: 100% (45/45), 340.50 KiB | 1.84 MiB/s, done.
Resolving deltas: 100% (21/21), done.


In [47]:
%cd MnistLite-8k/

/content/MnistLite-8k


## Model 1

### Target:
- Get the set-up right
- Set Transforms
- Set Data Loader
- Set Basic Working Code
- Set Basic Training & Test Loop
- Get the basic skeleton right
### Results:
- Parameters: 26.5k
- Best Training Accuracy: 99.64%
- Best Test Accuracy: 99.13%
### Analysis
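- In the initial epochs, test accuracy climbs quickly from 93.93% (Epoch 1) to 97.74% (Epoch 2) and passes 98% by Epoch 3–4. The 62.38% logged in Epoch 1 is the training accuracy, which is presumably a running figure over the whole epoch and so includes the earliest, untrained batches. Either way, the network picks up MNIST features very quickly.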
- Training accuracy steadily rises and exceeds 99% around Epoch 10, while test accuracy stabilizes between 98.8–99.1%.
- No major overfitting observed — the gap between training and test accuracy remains within ~0.5%, which is acceptable.
- The model is relatively lightweight (26.5k params) compared to typical CNNs on MNIST, yet it achieves strong performance close to larger networks.
- Slight oscillations in test accuracy after Epoch 10 (e.g., 99.06 → 98.85 → 98.97 → 99.13) are normal variance, not a sign of instability.
- The one-off spike in training loss at Epoch 8 (0.2567), despite high accuracy, could be due to a noisy batch or optimizer behavior.
- Early training shows very fast convergence. Performance plateaus around 99%, suggesting further gains may require architectural changes or regularization.
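
One detail worth checking in the run below is the scheduler: the command passes `StepLR(optimizer, step_size=15, gamma=0.1)`. The sketch below evaluates the closed-form StepLR rule in plain Python. It assumes that `train.py` calls `scheduler.step()` once per epoch and trains for 15 epochs; neither is confirmed by the notebook, and `step_lr` and `NUM_EPOCHS` are illustrative names. Under those assumptions the learning rate never decays during the run.

```python
def step_lr(base_lr: float, epoch: int, step_size: int, gamma: float) -> float:
    """Closed-form StepLR: multiply the rate by gamma once every step_size epochs."""
    return base_lr * gamma ** (epoch // step_size)


# lr, step_size and gamma are taken from the training command.
# NUM_EPOCHS = 15 is an assumption about how long train.py runs.
NUM_EPOCHS = 15
step_schedule = [
    step_lr(0.01, epoch, step_size=15, gamma=0.1) for epoch in range(NUM_EPOCHS)
]

for epoch, lr in enumerate(step_schedule, start=1):
    print(f"Epoch {epoch:2d}: lr={lr:.4f}")
```

If that reading is right, the first decay would only take effect after the 15th scheduler step, so the plateau near 99% is reached at a constant learning rate.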

In [48]:
# !python train.py --model model1.py --optimizer optimizer --scheduler scheduler --train_transforms train_transforms --test_transforms test_transforms

# !python train.py \
#   --model model1.py \
#   --optimizer "optim.SGD(model.parameters(), lr=0.01, momentum=0.9)" \
#   --scheduler "optim.lr_scheduler.StepLR(optimizer, step_size=15, gamma=0.1)" \
#   --train_transforms train_transforms \
#   --test_transforms test_transforms


!python train.py \
  --model model1.py \
  --optimizer "optim.SGD(model.parameters(), lr=0.01, momentum=0.9)" \
  --scheduler "optim.lr_scheduler.StepLR(optimizer, step_size=15, gamma=0.1)" \
  --train_transforms "transforms.Compose([transforms.ToTensor(), transforms.Normalize((0.1307,), (0.3081,))])" \
  --test_transforms "transforms.Compose([transforms.ToTensor(), transforms.Normalize((0.1307,), (0.3081,))])"

✅ Loaded model class 'Net' from model1.py
CUDA Available? True

Download the train and test dataset, with transforms....
100% 9.91M/9.91M [00:00<00:00, 20.6MB/s]
100% 28.9k/28.9k [00:00<00:00, 491kB/s]
100% 1.65M/1.65M [00:00<00:00, 4.61MB/s]
100% 4.54k/4.54k [00:00<00:00, 15.2MB/s]

Model Training....

Epoch 1
Train loss=0.1072 batch_id=468 Accuracy=62.38: 100% 469/469 [00:12<00:00, 38.92it/s]
Test set: Average loss: 0.1845, Accuracy: 9393/10000 (93.93%)

Epoch 2
Train loss=0.1953 batch_id=468 Accuracy=96.42: 100% 469/469 [00:12<00:00, 38.63it/s]
Test set: Average loss: 0.0693, Accuracy: 9774/10000 (97.74%)

Epoch 3
Train loss=0.0355 batch_id=468 Accuracy=97.80: 100% 469/469 [00:12<00:00, 38.51it/s]
Test set: Average loss: 0.0575, Accuracy: 9804/10000 (98.04%)

Epoch 4
Train loss=0.0486 batch_id=468 Accuracy=98.38: 100% 469/469 [00:12<00:00, 38.97it/s]
Test set: Average loss: 0.0494, Accuracy: 9838/10000 (98.38%)

Epoch 5
Train loss=0.0223 batch_id=468 Accuracy=98.69: 100% 469/469 [00

## Model 2

### Target:
- Make the model lighter (reduce parameter count, stay < 8k)
- Use Global Average Pooling (GAP) → reduces overfitting, removes need for large dense layer
- Add BatchNorm after convs → stabilize training, faster convergence, better generalization
### Results:
- Parameters: 5.6k
- Best Training Accuracy: 99.57%
- Best Test Accuracy: 99.13%
### Analysis
- The lighter model with GAP and BatchNorm trains more stably with far fewer parameters, and test accuracy rises above 99% by Epochs 13–15.
- BatchNorm and GAP contribute to stable training and reduced overfitting: test accuracy stays close to training accuracy throughout.
- The model is still short of the 99.4% target: test accuracy peaks at 99.13% and plateaus around 99.1%.
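
To see why GAP helps the parameter budget, the sketch below compares two classifier heads using standard parameter-count formulas. The 16-channel, 7x7 feature map is a hypothetical shape chosen for illustration, not the actual layout of `model2.py`, and the helper names are made up for this example.

```python
def conv2d_params(in_ch: int, out_ch: int, kernel: int, bias: bool = True) -> int:
    """Learnable parameters of a 2D convolution with a square kernel."""
    return out_ch * (in_ch * kernel * kernel + (1 if bias else 0))


def linear_params(in_features: int, out_features: int, bias: bool = True) -> int:
    """Learnable parameters of a fully connected layer."""
    return out_features * (in_features + (1 if bias else 0))


def batchnorm2d_params(channels: int) -> int:
    """Learnable scale and shift per channel (running statistics are not counted)."""
    return 2 * channels


# Hypothetical final feature map: 16 channels at 7x7, classified into 10 digits.
CHANNELS, HEIGHT, WIDTH, CLASSES = 16, 7, 7, 10

# Head A: flatten the feature map and feed it to a dense layer.
dense_head = linear_params(CHANNELS * HEIGHT * WIDTH, CLASSES)

# Head B: 1x1 conv down to 10 channels, then global average pooling (no parameters).
gap_head = conv2d_params(CHANNELS, CLASSES, kernel=1, bias=False)

print(f"dense head: {dense_head} params")
print(f"1x1 conv + GAP head: {gap_head} params")
print(f"BatchNorm on {CHANNELS} channels: {batchnorm2d_params(CHANNELS)} params")
```

With these assumed shapes, the dense head alone would nearly exhaust an 8k budget, while the GAP head and a BatchNorm layer together cost a few hundred parameters.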

In [49]:
!python train.py \
  --model model2.py \
  --optimizer "optim.SGD(model.parameters(), lr=0.01, momentum=0.9)" \
  --scheduler "optim.lr_scheduler.StepLR(optimizer, step_size=15, gamma=0.1)" \
  --train_transforms "transforms.Compose([transforms.ToTensor(), transforms.Normalize((0.1307,), (0.3081,))])" \
  --test_transforms "transforms.Compose([transforms.ToTensor(), transforms.Normalize((0.1307,), (0.3081,))])"

✅ Loaded model class 'Net' from model2.py
CUDA Available? True

Download the train and test dataset, with transforms....

Model Training....

Epoch 1
Train loss=0.0637 batch_id=468 Accuracy=91.11: 100% 469/469 [00:12<00:00, 37.89it/s]
Test set: Average loss: 0.0782, Accuracy: 9796/10000 (97.96%)

Epoch 2
Train loss=0.0363 batch_id=468 Accuracy=98.12: 100% 469/469 [00:12<00:00, 38.95it/s]
Test set: Average loss: 0.0612, Accuracy: 9843/10000 (98.43%)

Epoch 3
Train loss=0.0202 batch_id=468 Accuracy=98.53: 100% 469/469 [00:12<00:00, 38.82it/s]
Test set: Average loss: 0.0428, Accuracy: 9861/10000 (98.61%)

Epoch 4
Train loss=0.0086 batch_id=468 Accuracy=98.81: 100% 469/469 [00:12<00:00, 38.22it/s]
Test set: Average loss: 0.0356, Accuracy: 9885/10000 (98.85%)

Epoch 5
Train loss=0.0729 batch_id=468 Accuracy=98.94: 100% 469/469 [00:12<00:00, 38.02it/s]
Test set: Average loss: 0.0353, Accuracy: 9885/10000 (98.85%)

Epoch 6
Train loss=0.0816 batch_id=468 Accuracy=99.08: 100% 469/469 [00:12<00:

## Model 3

### Target:

- Add Dropout to reduce overfitting and improve generalization
- Increase parameter count from ~5.6k to ~7.8k for better capacity
- Use CosineAnnealingLR scheduler for smoother learning rate decay
- Use SGD with Nesterov momentum and weight decay for better convergence
- Apply data augmentation: RandomRotation plus RandomAffine (translation and shear)
- Retain BatchNorm and GAP for stable training and compact output
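
As a rough illustration of the optimizer choice, here is a scalar sketch of one SGD step with Nesterov momentum and weight decay, following the update order used by `torch.optim.SGD` (dampening 0). The toy objective f(x) = x² and the function name `sgd_nesterov_step` are assumptions for this example; the hyperparameters match the training command.

```python
def sgd_nesterov_step(param, grad, buf, lr, momentum, weight_decay):
    """One scalar SGD update with Nesterov momentum and L2 weight decay."""
    # Weight decay is folded into the gradient.
    grad = grad + weight_decay * param
    # The momentum buffer starts as the first gradient, then accumulates.
    buf = grad if buf is None else momentum * buf + grad
    # Nesterov: step along the gradient plus the look-ahead momentum term.
    update = grad + momentum * buf
    return param - lr * update, buf


# Toy problem (hypothetical): minimise f(x) = x^2, whose gradient is 2x.
x, momentum_buf = 5.0, None
for _ in range(200):
    x, momentum_buf = sgd_nesterov_step(
        x, 2.0 * x, momentum_buf, lr=0.03, momentum=0.9, weight_decay=1e-4
    )

print(f"x after 200 steps: {x:.2e}")
```

On this toy problem the iterate spirals in towards zero. It says nothing about MNIST accuracy by itself and only shows the mechanics of the update.
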
### Results:
- Parameters: 7,864
- Best Training Accuracy: 99.36%
- Best Test Accuracy: 99.42%
- Epochs Run: 15
- Achieved 99.4%+ test accuracy consistently by Epoch 11–15
### Analysis
- The model shows strong early performance, reaching 98.27% test accuracy in Epoch 1 and 99.16% by Epoch 3, indicating fast and stable convergence.
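- Dropout layers help maintain generalization, and no overfitting is visible: the best test accuracy (99.42%) is slightly above the best training accuracy (99.36%), which is expected because dropout and augmentation are active only during training.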
- CosineAnnealingLR scheduler contributes to smooth learning rate decay, helping the model avoid sharp drops or spikes in performance.
- SGD with Nesterov momentum and weight decay improves optimization dynamics, especially in later epochs.
- Data augmentation adds robustness, helping the model generalize better to unseen data and pushing test accuracy past 99.4%.
- The model holds 99.4%+ test accuracy over the final epochs, meeting the target goal within 15 epochs.
- Despite increasing parameters to ~7.8k, the model remains lightweight and efficient, balancing capacity and generalization well.
- Final test accuracy of 99.42% with stable loss (~0.018) confirms that the model is well-regularized and optimized.
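
The smooth decay mentioned above can be tabulated from the closed-form cosine annealing rule. The sketch uses the values from the training command (`lr=0.03`, `T_max=15`, `eta_min=0.0005`) and assumes the scheduler is stepped once per epoch, which the notebook does not show; `cosine_annealing_lr` is an illustrative name.

```python
import math


def cosine_annealing_lr(base_lr: float, epoch: int, t_max: int, eta_min: float) -> float:
    """Closed-form cosine annealing from base_lr down to eta_min over t_max epochs."""
    return eta_min + 0.5 * (base_lr - eta_min) * (1 + math.cos(math.pi * epoch / t_max))


# Index 0 is the rate used in the first epoch; index 15 is the floor
# that would be reached after 15 scheduler steps.
cosine_schedule = [
    cosine_annealing_lr(0.03, epoch, t_max=15, eta_min=0.0005) for epoch in range(16)
]

for epoch, lr in enumerate(cosine_schedule):
    print(f"after {epoch:2d} scheduler steps: lr={lr:.5f}")
```

The rate starts at 0.03, falls slowly at first, fastest around the midpoint, and flattens out as it approaches 0.0005, so the last few epochs run with small steps.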

In [50]:
!python train.py \
  --model model3.py \
  --optimizer "optim.SGD(model.parameters(), lr=0.03, momentum=0.9, nesterov=True, weight_decay=1e-4)" \
  --scheduler "optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=15, eta_min=0.0005)" \
  --train_transforms "transforms.Compose([transforms.RandomRotation(3), transforms.RandomAffine(0, translate=(0.05, 0.05), shear=5), transforms.ToTensor(), transforms.Normalize((0.1307,), (0.3081,))])" \
  --test_transforms "transforms.Compose([transforms.ToTensor(), transforms.Normalize((0.1307,), (0.3081,))])"

✅ Loaded model class 'Net' from model3.py
CUDA Available? True

Download the train and test dataset, with transforms....

Model Training....

Epoch 1
Train loss=0.0953 batch_id=468 Accuracy=93.35: 100% 469/469 [00:20<00:00, 23.31it/s]
Test set: Average loss: 0.0573, Accuracy: 9827/10000 (98.27%)

Epoch 2
Train loss=0.0164 batch_id=468 Accuracy=97.98: 100% 469/469 [00:21<00:00, 21.79it/s]
Test set: Average loss: 0.0383, Accuracy: 9875/10000 (98.75%)

Epoch 3
Train loss=0.0280 batch_id=468 Accuracy=98.38: 100% 469/469 [00:21<00:00, 21.72it/s]
Test set: Average loss: 0.0275, Accuracy: 9916/10000 (99.16%)

Epoch 4
Train loss=0.0396 batch_id=468 Accuracy=98.62: 100% 469/469 [00:21<00:00, 22.23it/s]
Test set: Average loss: 0.0293, Accuracy: 9902/10000 (99.02%)

Epoch 5
Train loss=0.0102 batch_id=468 Accuracy=98.75: 100% 469/469 [00:19<00:00, 23.73it/s]
Test set: Average loss: 0.0222, Accuracy: 9919/10000 (99.19%)

Epoch 6
Train loss=0.0113 batch_id=468 Accuracy=98.85: 100% 469/469 [00:21<00: