<a href="https://colab.research.google.com/github/abhi1021/resnet50-imagenet-1k/blob/main/train.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Training a CIFAR-100 Model (Google Colab)
This notebook trains a WideResNet-28-10 model (depth=28, widen_factor=10) on CIFAR-100 for 3 epochs.
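
For orientation, the parameter count of this architecture can be worked out by hand. The sketch below assumes the standard pre-activation WideResNet layout: a 3x3 stem with 16 channels, three groups of (depth - 4) / 6 basic blocks, bias-free convolutions, and 1x1 projection shortcuts where the channel count changes. The model defined in this repository may differ in details, and `wrn_param_count` is a hypothetical helper, not part of the repository.

```python
def wrn_param_count(depth: int = 28, widen_factor: int = 10, num_classes: int = 100) -> int:
    """Count learnable parameters of a standard pre-activation WideResNet (assumed layout)."""
    assert (depth - 4) % 6 == 0, "depth must have the form 6n + 4"
    blocks_per_group = (depth - 4) // 6
    widths = [16, 16 * widen_factor, 32 * widen_factor, 64 * widen_factor]

    def conv(c_in: int, c_out: int, k: int) -> int:
        return c_in * c_out * k * k          # bias-free convolution

    def bn(channels: int) -> int:
        return 2 * channels                  # BatchNorm scale and shift

    total = conv(3, widths[0], 3)            # stem convolution
    for c_in, c_out in zip(widths[:-1], widths[1:]):
        for i in range(blocks_per_group):
            block_in = c_in if i == 0 else c_out
            total += bn(block_in) + conv(block_in, c_out, 3)
            total += bn(c_out) + conv(c_out, c_out, 3)
            if block_in != c_out:
                total += conv(block_in, c_out, 1)   # 1x1 projection shortcut
    total += bn(widths[-1])                          # final BatchNorm
    total += widths[-1] * num_classes + num_classes  # linear classifier with bias
    return total


print(f"{wrn_param_count() / 1e6:.1f}M parameters")  # 36.5M parameters
```

Under these assumptions the count for 100 classes is 36,536,884, which is consistent with the 36.5M figure quoted in the training configuration.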

## Clone Repository

In [1]:
# Clone the repository
!git clone https://github.com/abhi1021/resnet50-imagenet-1k.git

Cloning into 'resnet50-imagenet-1k'...
remote: Enumerating objects: 85, done.
remote: Counting objects: 100% (85/85), done.
remote: Compressing objects: 100% (59/59), done.
remote: Total 85 (delta 37), reused 73 (delta 25), pack-reused 0 (from 0)
Receiving objects: 100% (85/85), 367.52 KiB | 8.96 MiB/s, done.
Resolving deltas: 100% (37/37), done.


In [2]:
# Navigate to the repository directory
%cd resnet50-imagenet-1k

/content/resnet50-imagenet-1k


In [3]:
# Install dependencies from requirements.txt
!pip install -r requirements.txt -q

  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
  Building wheel for plotille (pyproject.toml) ... done


## Verify Installed Dependencies

In [4]:
# Verify torchinfo and tqdm are installed (should be in requirements.txt)
!pip list | grep -E 'torch|tqdm'

torch                                    2.8.0+cu126
torchao                                  0.10.0
torchaudio                               2.8.0+cu126
torchdata                                0.11.0
torchinfo                                1.8.0
torchsummary                             1.5.1
torchtune                                0.6.1
torchvision                              0.23.0+cu126
tqdm                                     4.67.1


## Import Libraries and Check GPU

In [5]:
import torch
from train import CIFAR100Trainer

# Check if CUDA is available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Using device: {device}")
if device.type == 'cuda':
    print(f"GPU: {torch.cuda.get_device_name(0)}")
    print(f"Memory Allocated: {torch.cuda.memory_allocated(0) / 1024**2:.2f} MB")
    print(f"Memory Cached: {torch.cuda.memory_reserved(0) / 1024**2:.2f} MB")
else:
    print("WARNING: Running on CPU. Training will be slower.")
    print("In Colab, go to Runtime > Change runtime type > Hardware accelerator > GPU")

Using device: cuda
GPU: Tesla T4
Memory Allocated: 0.00 MB
Memory Cached: 0.00 MB


## Initialize and Train CIFAR-100 Model with WideResNet-28-10

In [6]:
# Create trainer with specified settings
# Configuration:
#   - Model: WideResNet-28-10 (36.5M parameters)
#   - Depth: 28
#   - Widen Factor: 10
#   - Epochs: 3
#   - Batch Size: 256 (default)
#   - Learning Rate: Warmup (0.01→0.1) + Cosine annealing
#   - MixUp augmentation: Enabled (alpha=0.2)
#   - Label smoothing: 0.1
#   - Mixed precision: Enabled
#   - Gradient clipping: 1.0
trainer = CIFAR100Trainer(
    model_name='wideresnet',
    depth=28,
    widen_factor=10,
    epochs=3
)

print("Starting CIFAR-100 Model Training with WideResNet-28-10")
print("="*70)
print(f"Configuration:")
print(f"  - Model: WideResNet-28-10 (36.5M parameters)")
print(f"  - Depth: 28")
print(f"  - Widen Factor: 10")
print(f"  - Dataset: CIFAR-100 (100 classes)")
print(f"  - Batch Size: 256")
print(f"  - Epochs: 3")
print(f"  - Optimizer: SGD (momentum=0.9, weight_decay=1e-3)")
print(f"  - Learning Rate: Warmup (0.01→0.1) + Cosine annealing")
print(f"  - MixUp: Enabled (alpha=0.2)")
print(f"  - Label Smoothing: 0.1")
print(f"  - Mixed Precision: Enabled")
print(f"  - Gradient Clipping: 1.0")
print("="*70)

✓ Using NVIDIA GPU: Tesla T4


100%|██████████| 169M/169M [00:13<00:00, 12.4MB/s]


Starting CIFAR-100 Model Training with WideResNet-28-10
Configuration:
  - Model: WideResNet-28-10 (36.5M parameters)
  - Depth: 28
  - Widen Factor: 10
  - Dataset: CIFAR-100 (100 classes)
  - Batch Size: 256
  - Epochs: 3
  - Optimizer: SGD (momentum=0.9, weight_decay=1e-3)
  - Learning Rate: Warmup (0.01→0.1) + Cosine annealing
  - MixUp: Enabled (alpha=0.2)
  - Label Smoothing: 0.1
  - Mixed Precision: Enabled
  - Gradient Clipping: 1.0
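
The MixUp setting listed above (alpha=0.2) can be illustrated with a minimal, framework-free sketch. It assumes the common formulation in which a mixing weight is drawn from Beta(alpha, alpha) and applied to both the inputs and the loss. How `CIFAR100Trainer` implements MixUp is not shown in this notebook, and `mixup_pair` and `mixup_loss` are hypothetical names.

```python
import random


def mixup_pair(x_a, x_b, alpha=0.2, rng=random):
    """Blend two samples using a mixing weight lam ~ Beta(alpha, alpha)."""
    lam = rng.betavariate(alpha, alpha) if alpha > 0 else 1.0
    mixed = [lam * a + (1.0 - lam) * b for a, b in zip(x_a, x_b)]
    return mixed, lam


def mixup_loss(loss_a, loss_b, lam):
    """Weight the losses against both original labels by the same lam."""
    return lam * loss_a + (1.0 - lam) * loss_b


rng = random.Random(0)                     # seeded for repeatability
mixed, lam = mixup_pair([1.0, 0.0], [0.0, 1.0], alpha=0.2, rng=rng)
print(f"lam={lam:.3f}, mixed={mixed}")
```

With alpha=0.2 the Beta distribution is U-shaped, so most draws put `lam` close to 0 or 1 and the mixed sample stays near one of the two originals.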


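Label smoothing of 0.1 can be sketched in the same framework-free way. The example assumes the convention used by PyTorch's `CrossEntropyLoss(label_smoothing=...)`, where the true class keeps 1 - 0.1 of the probability mass and the remaining 0.1 is spread uniformly over all 100 classes. Whether the trainer uses exactly this convention is an assumption, and both function names are hypothetical.

```python
import math


def smoothed_targets(label, num_classes=100, smoothing=0.1):
    """Target distribution: (1 - smoothing) on the true class plus a uniform share."""
    uniform = smoothing / num_classes
    targets = [uniform] * num_classes
    targets[label] += 1.0 - smoothing
    return targets


def cross_entropy(logits, targets):
    """Cross-entropy between softmax(logits) and a soft target distribution."""
    m = max(logits)                                   # subtract max for stability
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))
    return -sum(t * (z - log_z) for t, z in zip(targets, logits))


targets = smoothed_targets(label=3)
print(f"true-class target: {targets[3]:.3f}, other classes: {targets[0]:.3f}")
print(f"loss for uniform logits: {cross_entropy([0.0] * 100, targets):.4f}")
```

For uniform predictions the loss equals ln(100), about 4.605, regardless of the smoothing value, which is a useful reference point when reading the first-epoch losses.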

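The gradient clipping value of 1.0 is sketched below under the assumption that it is a maximum global L2 norm, as in `torch.nn.utils.clip_grad_norm_`, rather than a per-element value clip. The notebook does not show which variant the trainer uses, and `clip_grad_norm` here is a plain-Python stand-in, not the PyTorch function.

```python
import math


def clip_grad_norm(grads, max_norm=1.0, eps=1e-6):
    """Rescale all gradients so their combined L2 norm is at most max_norm."""
    total_norm = math.sqrt(sum(g * g for vec in grads for g in vec))
    scale = min(1.0, max_norm / (total_norm + eps))   # never scale gradients up
    clipped = [[g * scale for g in vec] for vec in grads]
    return clipped, total_norm


# Two "parameter tensors" whose combined gradient norm is 5.0
clipped, norm_before = clip_grad_norm([[3.0], [4.0]], max_norm=1.0)
norm_after = math.sqrt(sum(g * g for vec in clipped for g in vec))
print(f"norm before: {norm_before:.2f}, norm after: {norm_after:.2f}")
```

Because every gradient is multiplied by the same factor, the update direction is preserved and only its length is limited.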

In [7]:
# Run training
final_accuracy = trainer.run()


Training wideresnet for 3 epochs on CIFAR-100

Model Architecture Summary
Device: cuda
Model: wideresnet
Mixed Precision: True
MixUp: True (alpha=0.2)
Label Smoothing: 0.1

Model Summary:



Epoch 1 Loss=4.4725 Acc=3.84% LR=0.027908: 100%|██████████| 196/196 [01:47<00:00,  1.83it/s]



Test set: Average loss: 4.1322, Accuracy: 698/10000 (6.98%)

*** New best model! Test Accuracy: 6.98% ***
✓ Checkpoint saved: best_model.pth
Best Test Accuracy so far: 6.98%



Epoch 2 Loss=3.7481 Acc=9.68% LR=0.045908: 100%|██████████| 196/196 [01:47<00:00,  1.82it/s]



Test set: Average loss: 3.9508, Accuracy: 1061/10000 (10.61%)

*** New best model! Test Accuracy: 10.61% ***
✓ Checkpoint saved: best_model.pth
Best Test Accuracy so far: 10.61%



Epoch 3 Loss=3.4351 Acc=15.67% LR=0.063908: 100%|██████████| 196/196 [01:47<00:00,  1.82it/s]



Test set: Average loss: 3.3465, Accuracy: 1930/10000 (19.30%)

*** New best model! Test Accuracy: 19.30% ***
✓ Checkpoint saved: best_model.pth
Best Test Accuracy so far: 19.30%


📦 Saving final model...
✓ Checkpoint saved: final_model.pth

Training completed. Best test accuracy: 19.30%
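
A few numbers in the log above can be cross-checked with simple arithmetic. The sketch assumes the standard CIFAR-100 split (50,000 training and 10,000 test images) and that the last partial batch of each epoch is kept; the variable names are illustrative only.

```python
import math

train_images, test_images = 50_000, 10_000
batch_size, num_classes = 256, 100

# Batches per epoch, matching the 196/196 shown in the progress bars
steps_per_epoch = math.ceil(train_images / batch_size)

# Reference points for an untrained model that guesses uniformly
chance_accuracy = 100.0 / num_classes      # 1.00 %
chance_loss = math.log(num_classes)        # ln(100), about 4.605

# Final test accuracy recomputed from the raw count in the log
test_accuracy = 100.0 * 1930 / test_images

# Epoch duration implied by the logged throughput of about 1.83 it/s
epoch_seconds = steps_per_epoch / 1.83

print(f"steps per epoch: {steps_per_epoch}")
print(f"chance accuracy: {chance_accuracy:.2f}%, chance loss: {chance_loss:.3f}")
print(f"test accuracy:   {test_accuracy:.2f}%")
print(f"epoch duration:  {epoch_seconds:.0f} s")
```

The first-epoch training loss of 4.4725 sits just below the chance-level loss, and the final 19.30% test accuracy is well above the 1% chance level, so the run is learning even though 3 epochs are far too few for convergence.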

TRAINING AND TESTING LOSS
   (Y)     ^
4.87331457 |
4.62964884 | ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
4.38598311 | ⠤⠤⠤⣀⣀⣀⣀⣀⣀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
4.14231739 | ⣀⣀⣀⣀⣀⣀⠀⠀⠀⠉⠉⠉⠉⠉⠉⠒⠒⠒⠒⠒⠢⠤⠤⠤⠤⠤⢄⣀⣀⣀⣀⣀⡀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
3.89865166 | ⠀⠀⠀⠀⠀⠀⠉⠉⠉⠉⠉⠉⠉⠉⠉⠉⠉⠑⠒⠒⠒⠒⠒⠒⠒⠒⠒⠒⠒⠢⠤⠤⠬⠭⠭⠭⠭⣉⣉⣑⡒⠒⠒⠒⠒⠒⠒⠒⠢⠤⠤⠤⠤⠤⠤⠤⠤⣀⣀⣀⣀⣀⣀⣀⣀⣀⠀⠀⠀⠀
3.65498593 | ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠈⠉⠉⠉⠒⠒⠒⠢⠤⠤⠤⣀⣀⣀⡀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠉⠉⠉⠉
3.41132020 | ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠈⠉⠉⠉⠒⠒⠒⠢⠤⠤⠤⣀⣀⣀⡀⠀
3.16765447 | ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠈⠉
2.92398874 | ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
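
The learning rates reported in the progress bars (0.027908, 0.045908, 0.063908) rise by a constant 0.018 per epoch. That is consistent with a per-step linear warmup from 0.01 to 0.1 over 5 epochs of 196 steps. The sketch below encodes that reading: the 5-epoch warmup length is inferred from the logged values, not confirmed by the source, the cosine branch is a generic form rather than the repository's code, and `lr_at_step` is a hypothetical name.

```python
import math


def lr_at_step(step, steps_per_epoch=196, warmup_epochs=5, total_epochs=3,
               base_lr=0.01, peak_lr=0.1, min_lr=0.0):
    """Linear warmup followed by cosine annealing, evaluated per optimizer step."""
    warmup_steps = warmup_epochs * steps_per_epoch   # warmup length is an assumption
    if step < warmup_steps:
        return base_lr + (peak_lr - base_lr) * step / warmup_steps
    # Generic cosine decay from peak_lr to min_lr over the remaining steps
    decay_steps = max(1, total_epochs * steps_per_epoch - warmup_steps)
    progress = min(1.0, (step - warmup_steps) / decay_steps)
    return min_lr + 0.5 * (peak_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


for epoch in (1, 2, 3):
    last_step = epoch * 196 - 1                      # zero-based index of the epoch's final step
    print(f"Epoch {epoch}: LR={lr_at_step(last_step):.6f}")
# Epoch 1: LR=0.027908
# Epoch 2: LR=0.045908
# Epoch 3: LR=0.063908
```

If that inference holds, this 3-epoch run ends while still in warmup and never reaches the cosine annealing phase, so the peak learning rate of 0.1 is not used.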

## Training Summary

In [8]:
print("\n" + "="*70)
print("TRAINING COMPLETED")
print("="*70)
print(f"Model: WideResNet-28-10")
print(f"Dataset: CIFAR-100 (100 classes)")
print(f"Final Best Test Accuracy: {final_accuracy:.2f}%")
print("="*70)
print("\nCheckpoints saved in ./checkpoints/")
print("  - best_model.pth (highest test accuracy)")
print("  - final_model.pth (final epoch)")
print("  - training_curves.png (visualization)")
print("  - metrics.json (complete training history)")


TRAINING COMPLETED
Model: WideResNet-28-10
Dataset: CIFAR-100 (100 classes)
Final Best Test Accuracy: 19.30%

Checkpoints saved in ./checkpoints/
  - best_model.pth (highest test accuracy)
  - final_model.pth (final epoch)
  - training_curves.png (visualization)
  - metrics.json (complete training history)
