## Cifar10 Neural Network

The goal is to consistently achieve 99.4% validation/test accuracy in fewer than 15 epochs with fewer than 8k parameters.



In [17]:
!rm -rf /content/Cifar10/

In [18]:
%cd /content

/content


In [19]:
!git clone https://github.com/divya-r-kamat/Cifar10.git

Cloning into 'Cifar10'...
remote: Enumerating objects: 30, done.
remote: Counting objects: 100% (30/30), done.
remote: Compressing objects: 100% (30/30), done.
remote: Total 30 (delta 15), reused 0 (delta 0), pack-reused 0 (from 0)
Receiving objects: 100% (30/30), 411.31 KiB | 1.68 MiB/s, done.
Resolving deltas: 100% (15/15), done.


In [20]:
%cd Cifar10/

/content/Cifar10


## Model 1

### Target:
- Get the set-up right
- Set Transforms
- Set Data Loader
- Set Basic Working Code
- Set Basic Training & Test Loop
- Get the basic skeleton right: 4 convolution blocks, no max pooling, and a receptive field > 44
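
The receptive-field target above can be checked by hand with the usual layer-by-layer recurrence. The sketch below is only an illustration: the layer list is hypothetical (3x3 convolutions, with a stride-2 convolution standing in for max pooling at the end of each block) and is not read from `model1.py`.

```python
def receptive_field(layers):
    """Receptive field of a stack of conv layers.

    layers: list of (kernel_size, stride, dilation) tuples, input to output.
    """
    rf, jump = 1, 1  # jump = distance in input pixels between adjacent outputs
    for kernel, stride, dilation in layers:
        effective_kernel = dilation * (kernel - 1) + 1
        rf += (effective_kernel - 1) * jump
        jump *= stride
    return rf


# Hypothetical 4-block layout (an assumption, not the layers in model1.py).
example_layers = [
    (3, 1, 1), (3, 1, 1), (3, 2, 1),  # block 1, ends with a strided conv
    (3, 1, 1), (3, 1, 1), (3, 2, 1),  # block 2
    (3, 1, 1), (3, 1, 1), (3, 2, 1),  # block 3
    (3, 1, 1), (3, 1, 1),             # block 4
]

print(receptive_field(example_layers))  # 75, comfortably above 44
```

Swapping a strided layer for a dilated one, e.g. `(3, 1, 2)`, widens the field without reducing spatial resolution.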

### Results:
- Parameters: 1,605,600
- Best Training Accuracy: 99.9%
- Best Test Accuracy: 80%

### Analysis
- Model is clearly overfitting
- Also model parameters can be reduced further

In [None]:
# !python train.py --model model1.py --optimizer optimizer --scheduler scheduler --train_transforms train_transforms --test_transforms test_transforms

# !python train.py \
#   --model model1.py \
#   --optimizer "optim.SGD(model.parameters(), lr=0.01, momentum=0.9)" \
#   --scheduler "optim.lr_scheduler.StepLR(optimizer, step_size=15, gamma=0.1)" \
#   --train_transforms train_transforms \
#   --test_transforms test_transforms


!python train.py \
  --model model1.py \
  --optimizer "optim.SGD(model.parameters(), lr=0.01, momentum=0.9)" \
  --scheduler "optim.lr_scheduler.StepLR(optimizer, step_size=15, gamma=0.1)" \
  --train_transforms "transforms.Compose([transforms.ToTensor(), transforms.Normalize((0.4919, 0.4829, 0.4467), (0.2444, 0.2408, 0.2582))])" \
  --test_transforms "transforms.Compose([transforms.ToTensor(), transforms.Normalize((0.4919, 0.4829, 0.4467), (0.2444, 0.2408, 0.2582))])"

✅ Loaded model class 'CIFAR10CustomNet' from model1.py
CUDA Available? True

Download the train and test dataset, with transforms....
100% 170M/170M [00:04<00:00, 42.4MB/s]

Model Training....

Epoch 1
Train loss=0.9923 batch_id=390 Accuracy=49.17: 100% 391/391 [00:14<00:00, 27.47it/s]
Test set: Average loss: 1.2222, Accuracy: 5701/10000 (57.01%)

Epoch 2
Train loss=0.9642 batch_id=390 Accuracy=66.98: 100% 391/391 [00:13<00:00, 29.77it/s]
Test set: Average loss: 0.9293, Accuracy: 6715/10000 (67.15%)

Epoch 3
Train loss=0.8995 batch_id=390 Accuracy=74.28: 100% 391/391 [00:13<00:00, 29.59it/s]
Test set: Average loss: 0.7599, Accuracy: 7361/10000 (73.61%)

Epoch 4
Train loss=0.4666 batch_id=390 Accuracy=79.37: 100% 391/391 [00:13<00:00, 29.75it/s]
Test set: Average loss: 0.8419, Accuracy: 7156/10000 (71.56%)

Epoch 5
Train loss=0.3889 batch_id=390 Accuracy=83.18: 100% 391/391 [00:13<00:00, 29.86it/s]
Test set: Average loss: 0.8037, Accuracy: 7387/10000 (73.87%)

Epoch 6
Train loss=0.4388 

## Model 2

### Target:
- Reduce the model parameters by reducing the number of channels and adding depthwise convolution
- Add dilated convolution to increase the receptive field
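
To see why depthwise convolution cuts the parameter count, the arithmetic can be sketched as below. The channel sizes (64 in, 128 out) are made up for illustration and are not taken from `model2.py`; the separable variant assumes a depthwise 3x3 followed by a pointwise 1x1.

```python
def conv_params(in_ch, out_ch, kernel, bias=False):
    """Weights in a standard kernel x kernel convolution."""
    return kernel * kernel * in_ch * out_ch + (out_ch if bias else 0)


def depthwise_separable_params(in_ch, out_ch, kernel, bias=False):
    """Weights in a depthwise conv followed by a 1x1 pointwise conv."""
    depthwise = kernel * kernel * in_ch + (in_ch if bias else 0)
    pointwise = in_ch * out_ch + (out_ch if bias else 0)
    return depthwise + pointwise


# Hypothetical layer sizes, chosen only to show the ratio.
standard = conv_params(64, 128, 3)                  # 73,728
separable = depthwise_separable_params(64, 128, 3)  # 576 + 8,192 = 8,768
print(standard, separable, round(standard / separable, 1))
```

For this example layer, the separable version needs roughly 8x fewer weights.
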
### Results:
- Parameters: 97,264
- Best Training Accuracy: 72%
- Best Test Accuracy: 67%
### Analysis
- Train and test accuracy both dropped after the parameter count was reduced
- Some overfitting is still visible

In [None]:
!python train.py \
  --model model2.py \
  --optimizer "optim.SGD(model.parameters(), lr=0.01, momentum=0.9)" \
  --scheduler "optim.lr_scheduler.StepLR(optimizer, step_size=15, gamma=0.1)" \
  --train_transforms "transforms.Compose([transforms.ToTensor(), transforms.Normalize((0.4919, 0.4829, 0.4467), (0.2444, 0.2408, 0.2582))])" \
  --test_transforms "transforms.Compose([transforms.ToTensor(), transforms.Normalize((0.4919, 0.4829, 0.4467), (0.2444, 0.2408, 0.2582))])"

✅ Loaded model class 'CIFAR10' from model2.py
CUDA Available? True

Download the train and test dataset, with transforms....
100% 170M/170M [00:03<00:00, 42.8MB/s]

Model Training....

Epoch 1
Train loss=1.4954 batch_id=390 Accuracy=28.32: 100% 391/391 [00:13<00:00, 28.62it/s]
Test set: Average loss: 1.5520, Accuracy: 4264/10000 (42.64%)

Epoch 2
Train loss=1.2150 batch_id=390 Accuracy=51.61: 100% 391/391 [00:13<00:00, 29.56it/s]
Test set: Average loss: 1.2355, Accuracy: 5731/10000 (57.31%)

Epoch 3
Train loss=0.9533 batch_id=390 Accuracy=61.74: 100% 391/391 [00:13<00:00, 29.52it/s]
Test set: Average loss: 1.0080, Accuracy: 6472/10000 (64.72%)

Epoch 4
Train loss=0.8682 batch_id=390 Accuracy=68.16: 100% 391/391 [00:13<00:00, 29.15it/s]
Test set: Average loss: 0.8727, Accuracy: 6934/10000 (69.34%)

Epoch 5
Train loss=0.7018 batch_id=390 Accuracy=72.30: 100% 391/391 [00:13<00:00, 29.25it/s]
Test set: Average loss: 0.8102, Accuracy: 7194/10000 (71.94%)

Epoch 6
Train loss=0.9820 batch_id=

## Model 3

### Target:

- Add data augmentation: HorizontalFlip, ShiftScaleRotate, ColorJitter, CoarseDropout (Cutout)
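
As a rough picture of what two of these augmentations do, here is a dependency-free sketch of a horizontal flip and a single-hole cutout on an H x W x C image stored as nested lists. It is not the Albumentations implementation, and the fill colour is an illustrative value (the run itself fills with the dataset mean scaled to 0-255).

```python
import random


def horizontal_flip(image):
    """Return a mirrored copy of an H x W x C nested-list image."""
    return [[list(px) for px in reversed(row)] for row in image]


def cutout(image, size, fill, rng=random):
    """Return a copy with one size x size square overwritten by `fill`."""
    height, width = len(image), len(image[0])
    top = rng.randint(0, height - size)   # randint is inclusive on both ends
    left = rng.randint(0, width - size)
    out = [[list(px) for px in row] for row in image]
    for y in range(top, top + size):
        for x in range(left, left + size):
            out[y][x] = list(fill)
    return out


# 32x32 RGB image of zeros, the CIFAR-10 image size.
image = [[[0, 0, 0] for _ in range(32)] for _ in range(32)]
augmented = cutout(horizontal_flip(image), size=16, fill=(125, 123, 114))
```

A 16x16 hole covers a quarter of a 32x32 image, so the model has to rely on the remaining context rather than any single region.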

### Results:
- Parameters: 97,264
- Best Training Accuracy: 84%
- Best Test Accuracy: 85.4%
- Epochs Run: 50
### Analysis
- Starting performance:

    - Epoch 1 test accuracy = 47.8%, which is higher than raw baselines (~35–40%) → augmentations helped the model generalize even in the very first epoch.

- Steady improvements:

    - By Epoch 10: ~80% accuracy.
    - By Epoch 20: ~81–82%, but with some plateauing.
    - By Epoch 30–40: Accuracy stabilized around 83–85%

- Augmentation helped prevent overfitting: test accuracy tracked training accuracy closely (85.4% vs. 84% at best), with no large gap.
- The Albumentations pipeline helped the model learn more invariant and robust features, giving higher test accuracy (~85.4%) and stronger generalization than training without augmentation (Model 2).
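
The plateaus noted above may be partly related to the learning-rate schedule: the run below uses cosine annealing with `T_max=15` over 50 epochs. A minimal sketch of the closed-form cosine schedule, assuming the values from that command (`lr=0.03`, `eta_min=0.0005`) and assuming the scheduler keeps following the cosine past `T_max`, shows the rate falling and then rising again every 15 epochs:

```python
import math


def cosine_annealing_lr(epoch, base_lr=0.03, eta_min=0.0005, t_max=15):
    """Closed-form cosine annealing; has period 2 * t_max if run past t_max."""
    cosine = math.cos(math.pi * epoch / t_max)
    return eta_min + (base_lr - eta_min) * (1 + cosine) / 2


for epoch in (0, 15, 30, 45):
    print(epoch, f"{cosine_annealing_lr(epoch):.4f}")
# 0 0.0300
# 15 0.0005
# 30 0.0300
# 45 0.0005
```

If a single decay over the whole run is wanted instead, setting `T_max` equal to the number of epochs is the usual choice.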

In [22]:
!python train.py \
  --model model3.py \
  --optimizer "optim.SGD(model.parameters(), lr=0.03, momentum=0.9,nesterov=True, weight_decay=1e-4)" \
  --scheduler "optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=15, eta_min=0.0005)" \
  --train_transforms """A.Compose([A.HorizontalFlip(p=horizontalflip_prob),\
    A.ShiftScaleRotate(shift_limit=0.1, scale_limit=0.1, rotate_limit=rotate_limit, p=shiftscalerotate_prob),\
    A.CoarseDropout(\
        max_holes=num_holes, min_holes=1, \
        max_height=16, max_width=16, \
        min_height=16, min_width=16,\
        p=cutout_prob,\
        fill_value=tuple([x * 255.0 for x in mean])),\
    A.ColorJitter(p=0.25, brightness=0.3, contrast=0.3, saturation=0.3, hue=0.2),\
    A.ToGray(p=0.15),\
    A.Normalize(mean=mean, std=std, always_apply=True),\
    ToTensorV2()\
    ])""" \
  --test_transforms "A.Compose([A.Normalize(mean=mean, std=std, always_apply=True),ToTensorV2()])"\
  --epochs "50"

✅ Loaded model class 'CIFAR10' from model3.py
CUDA Available? True
  original_init(self, **validated_kwargs)

Download the train and test dataset, with transforms....

Model Training....

Epoch 1
Train loss=1.3733 batch_id=390 Accuracy=36.28: 100% 391/391 [00:22<00:00, 17.39it/s]
Test set: Average loss: 1.4182, Accuracy: 4784/10000 (47.84%)

Epoch 2
Train loss=1.0619 batch_id=390 Accuracy=53.56: 100% 391/391 [00:21<00:00, 17.82it/s]
Test set: Average loss: 1.1051, Accuracy: 6072/10000 (60.72%)

Epoch 3
Train loss=1.0864 batch_id=390 Accuracy=61.09: 100% 391/391 [00:22<00:00, 17.69it/s]
Test set: Average loss: 0.9158, Accuracy: 6668/10000 (66.68%)

Epoch 4
Train loss=1.0844 batch_id=390 Accuracy=66.29: 100% 391/391 [00:21<00:00, 18.30it/s]
Test set: Average loss: 0.8054, Accuracy: 7171/10000 (71.71%)

Epoch 5
Train loss=0.8940 batch_id=390 Accuracy=68.69: 100% 391/391 [00:21<00:00, 18.57it/s]
Test set: Average loss: 0.7619, Accuracy: 7283/10000 (72.83%)

Epoch 6
Train loss=0.8382 batch_