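- AlexNet (2012): The first breakthrough CNN on ImageNet. Shallow (8 learned layers: 5 convolutional + 3 fully connected), fast, but less accurate by today’s standards.

- ResNet152 (2015): Very deep (152 layers). Uses skip (residual) connections that let gradients flow past each block, which mitigates the vanishing-gradient problem in deep networks. Much more accurate, but heavier and slower.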
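
To make the skip-connection idea concrete, here is a minimal sketch of a residual block. The class name `TinyResidualBlock` and the layer sizes are made up for illustration. The real ResNet152 uses larger bottleneck blocks with batch normalization, so treat this as a simplified picture of the `F(x) + x` pattern only.

```python
import torch
import torch.nn as nn


class TinyResidualBlock(nn.Module):
    """Hypothetical, simplified block: output = relu(F(x) + x)."""

    def __init__(self, channels):
        super().__init__()
        # padding=1 with a 3x3 kernel keeps height and width unchanged,
        # so the input can be added to the output directly
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.relu = nn.ReLU()

    def forward(self, x):
        out = self.relu(self.conv1(x))
        out = self.conv2(out)
        # Skip connection: add the input back before the final activation
        return self.relu(out + x)


block = TinyResidualBlock(channels=8)
x = torch.randn(2, 8, 16, 16)   # (batch, channels, height, width)
y = block(x)
print(x.shape, "->", y.shape)
```

Because the input is added back unchanged, the gradient has a direct path around the two convolutions, which is the property the bullet above refers to.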

In [1]:
import torch 
import torch.nn as nn 
import torch.optim as optim
import torchvision 
import torchvision.transforms as transforms
from sklearn.metrics import classification_report, confusion_matrix

# Preprocessing (Transforms)
### Pretrained models expect images that are:
- resized to 224x224
- normalized using ImageNet's mean & std

In [2]:
transform = transforms.Compose([
    transforms.Resize((224,224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225])
])
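
As a quick worked example of what the Normalize step computes: for each channel it applies `(value - mean) / std`. The sketch below does that arithmetic in plain Python on single RGB pixels that are already scaled to [0, 1]. The helper name `normalize_pixel` is made up for illustration and is not part of torchvision.

```python
# ImageNet channel statistics, same values as in the transform above
mean = [0.485, 0.456, 0.406]
std = [0.229, 0.224, 0.225]


def normalize_pixel(rgb):
    """Apply (value - mean) / std per channel to one RGB pixel in [0, 1]."""
    return [(v - m) / s for v, m, s in zip(rgb, mean, std)]


white = normalize_pixel([1.0, 1.0, 1.0])   # brightest possible pixel
black = normalize_pixel([0.0, 0.0, 0.0])   # darkest possible pixel
average = normalize_pixel(mean)            # a pixel equal to the channel means

print("white  :", [round(v, 3) for v in white])
print("black  :", [round(v, 3) for v in black])
print("average:", average)
```

A pixel equal to the ImageNet mean maps to zero, brighter pixels become positive and darker ones negative, which is the input range the pretrained weights were trained on.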

**'transforms.ToTensor()'**

What does it do?

- PyTorch expects images as tensors in channels-first order → (C, H, W), where C = channels (3 for RGB), H = height, W = width.

- PIL images and NumPy arrays (including images loaded with OpenCV) are stored channels-last → (H, W, C).

- 👉 transforms.ToTensor() converts a PIL image or NumPy array to a float tensor, changes the order from (H, W, C) to (C, H, W), and rescales 8-bit pixel values from [0, 255] to [0.0, 1.0].
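
The reordering can be mimicked with plain NumPy. This is only a sketch of the equivalent array operations on a made-up 2x3 image, not a call into torchvision itself. It also includes the rescaling of 8-bit values to [0.0, 1.0] that ToTensor performs on uint8 input.

```python
import numpy as np

# Hypothetical 2x3 RGB image in channels-last layout (H, W, C), uint8 in [0, 255]
hwc = np.zeros((2, 3, 3), dtype=np.uint8)
hwc[..., 0] = 255   # red channel fully on, green and blue off

# Move the channel axis to the front, then rescale to floats in [0.0, 1.0]
chw = np.transpose(hwc, (2, 0, 1)).astype(np.float32) / 255.0

print(hwc.shape, "->", chw.shape)
print("red channel:\n", chw[0])
```

After the transpose, `chw[0]` is the whole red plane, `chw[1]` the green plane and `chw[2]` the blue plane.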

In [3]:
trainset = torchvision.datasets.CIFAR10(
    root = './data', train = True, download = True,
    transform = transform 
)

100%|██████████| 170M/170M [00:01<00:00, 85.4MB/s]


In [4]:
trainloader = torch.utils.data.DataLoader(trainset, batch_size = 32, shuffle = True)

In [5]:
testset = torchvision.datasets.CIFAR10(root = './data', train = False, download = True,
    transform = transform )

In [6]:
testloader = torch.utils.data.DataLoader(testset, batch_size = 32, shuffle = False)

In [7]:
classes = trainset.classes
print(classes)

['airplane', 'automobile', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck']


## Load Models

In [8]:
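import torchvision.models as models
# ResNet152 pretrained on ImageNet
resnet = models.resnet152(pretrained = True)
# Freeze all pretrained parameters
for param in resnet.parameters():
    param.requires_grad = False
# Replace the final layer with a new, trainable head (one output per CIFAR-10 class)
resnet.fc = nn.Linear(resnet.fc.in_features, len(classes))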

Downloading: "https://download.pytorch.org/models/resnet152-394f9c45.pth" to /root/.cache/torch/hub/checkpoints/resnet152-394f9c45.pth
100%|██████████| 230M/230M [00:01<00:00, 200MB/s]


In [9]:
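import torchvision.models as models
# AlexNet pretrained on ImageNet
alexnet = models.alexnet(pretrained = True)
# Freeze all pretrained parameters
for param in alexnet.parameters():
    param.requires_grad = False
# Replace the last classifier layer with a new, trainable head (one output per CIFAR-10 class)
alexnet.classifier[6] = nn.Linear(alexnet.classifier[6].in_features, len(classes))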

Downloading: "https://download.pytorch.org/models/alexnet-owt-7be5be79.pth" to /root/.cache/torch/hub/checkpoints/alexnet-owt-7be5be79.pth
100%|██████████| 233M/233M [00:01<00:00, 216MB/s]


- param.requires_grad = False

In PyTorch, every parameter (weight or bias) has a flag called requires_grad.

If True → gradients are computed for it during backpropagation, so the optimizer can update it.

If False → no gradient is computed for it, so the parameter is frozen (not updated during training).

👉 Why do this?

We are doing transfer learning.

The pretrained networks (ResNet152 and AlexNet) already know useful features (edges, shapes, textures) from ImageNet.

We don’t want to retrain those millions of parameters: it takes too long and needs a huge amount of data.

So we freeze them by setting requires_grad = False and train only the new final layer, whose parameters have requires_grad = True by default.
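
To check that freezing does what we expect, here is a minimal sketch on a tiny stand-in network. The layer sizes and the name `model` are made up for illustration so the example runs without downloading any pretrained weights. It freezes everything, swaps in a new last layer, and counts which parameters are still trainable.

```python
import torch.nn as nn

# Hypothetical stand-in for a pretrained backbone plus its original head
model = nn.Sequential(
    nn.Linear(16, 8),   # plays the role of the pretrained feature layers
    nn.ReLU(),
    nn.Linear(8, 4),    # plays the role of the original classifier head
)

# Freeze every existing parameter
for param in model.parameters():
    param.requires_grad = False

# Replace the head; a freshly created layer has requires_grad=True by default
model[2] = nn.Linear(8, 10)

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
frozen = sum(p.numel() for p in model.parameters() if not p.requires_grad)
print(f"trainable: {trainable}, frozen: {frozen}")
```

Running the same two counts on the real models is a cheap sanity check before training. It would also catch a misspelled attribute name such as `require_grad`, which PyTorch accepts silently as a new attribute while leaving the parameter trainable.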

In [10]:
# ==============================
# 5. Training
# ==============================
# What happens in each epoch:
# - Set model to training mode
# - Loop through batches
# - Forward pass → get predictions
# - Compute loss (CrossEntropy)
# - Backward pass → compute gradients
# - Optimizer step → update weights
# - Print average loss per epoch

In [11]:
def train(model, trainloader, epochs = 5):
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    model = model.to(device)

    criterion = nn.CrossEntropyLoss()
    optimizer = optim.Adam(model.parameters(), lr = 0.001)

    for epoch in range(epochs):
        model.train()          # Training mode
        running_loss = 0.0

        for images, labels in trainloader:
            images, labels = images.to(device), labels.to(device)
            optimizer.zero_grad()              # Clear gradients from the previous batch
            outputs = model(images)            # Forward pass
            loss = criterion(outputs, labels)  # Compute loss
            loss.backward()                    # Backward pass (compute gradients)
            optimizer.step()                   # Update the trainable weights
            running_loss += loss.item()
        # Average loss over all batches in this epoch
        print(f"Epoch [{epoch+1}/{epochs}], Loss: {running_loss/len(trainloader):.4f}")
    return model

In [12]:
def evaluate_model(model, testloader, classes):
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    model = model.to(device)   # Keep model and data on the same device
    model.eval()               # Evaluation mode
    y_true, y_pred = [], []

    with torch.no_grad():   # Disable gradient calculation
        for images, labels in testloader:
            images, labels = images.to(device), labels.to(device)
            outputs = model(images)       # Forward pass (raw class scores)
            _, preds = torch.max(outputs, 1)  # Index of the highest score = predicted class
            y_true.extend(labels.cpu().numpy())
            y_pred.extend(preds.cpu().numpy())

    print("Classification Report:")
    print(classification_report(y_true, y_pred, target_names=classes))

    cm = confusion_matrix(y_true, y_pred)
    print("Confusion Matrix:\n", cm)


In [13]:
print("\nTraining ResNet152...")
resnet_trained = train(resnet, trainloader, epochs=5)
print("\nEvaluating ResNet152...")
evaluate_model(resnet_trained, testloader, classes)

# Train & Evaluate AlexNet
print("\nTraining AlexNet...")
alexnet_trained = train(alexnet, trainloader, epochs=5)
print("\nEvaluating AlexNet...")
evaluate_model(alexnet_trained, testloader, classes)


Training ResNet152...
Epoch [1/5], Loss: 0.6698
Epoch [2/5], Loss: 0.5508
Epoch [3/5], Loss: 0.5356
Epoch [4/5], Loss: 0.5090
Epoch [5/5], Loss: 0.5044

Evaluating ResNet152...
Classification Report:
              precision    recall  f1-score   support

    airplane       0.85      0.84      0.85      1000
  automobile       0.84      0.95      0.89      1000
        bird       0.72      0.86      0.79      1000
         cat       0.75      0.73      0.74      1000
        deer       0.84      0.80      0.82      1000
         dog       0.85      0.79      0.82      1000
        frog       0.95      0.85      0.90      1000
       horse       0.84      0.88      0.86      1000
        ship       0.90      0.88      0.89      1000
       truck       0.93      0.85      0.89      1000

    accuracy                           0.84     10000
   macro avg       0.85      0.84      0.84     10000
weighted avg       0.85      0.84      0.84     10000

Confusion Matrix:
 [[840  26  39   7   8

