<h1 style="font-size: 120%; text-align: center;">VGG16 Architecture (From Scratch)</h1>

<p style="font-size: 100%;">
VGG16 is a deep convolutional neural network with 16 weight layers (hence the name): 13 convolutional layers
followed by 3 fully connected layers. What makes VGG16 special is its simplicity:
it uses only <b>3×3 convolution filters</b> and <b>2×2 max-pooling</b> throughout the network.
This results in a clean, uniform architecture that is easy to understand and implement.
</p>
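<p style="font-size: 95%;">
Every convolution is 3×3 with padding 1 (which preserves height and width) and every pool is 2×2 with stride 2 (which halves them), so the whole feature extractor can be summarized as a list of channel counts. The sketch below is illustrative: <code>VGG16_CFG</code> and <code>trace_shapes</code> are hypothetical names (not part of PyTorch), and it assumes the 224×224 RGB input used later in this notebook.
</p>

```python
# VGG16 feature extractor as a list: numbers are output channels of 3x3 convs
# (padding=1), "M" marks a 2x2 max-pool with stride 2.
VGG16_CFG = [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
             512, 512, 512, "M", 512, 512, 512, "M"]

def trace_shapes(cfg, size=224, in_channels=3):
    """Return (channels, height/width) after every layer for a square input."""
    shapes = []
    channels = in_channels
    for layer in cfg:
        if layer == "M":
            size //= 2          # 2x2 pooling with stride 2 halves the resolution
        else:
            channels = layer    # 3x3 conv with padding=1 keeps the resolution
        shapes.append((channels, size))
    return shapes

shapes = trace_shapes(VGG16_CFG)
num_convs = sum(1 for layer in VGG16_CFG if layer != "M")
print(num_convs, shapes[-1])   # prints: 13 (512, 7)
```

<p style="font-size: 95%;">
Five halvings take 224 down to 224 / 2<sup>5</sup> = 7, which is where the <code>512 * 7 * 7</code> input size of the first fully connected layer comes from.
</p>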


In [None]:
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader
from torchvision import datasets, transforms
from torchvision.datasets import ImageFolder
from PIL import ImageFile
import numpy as np
import copy
import os
import time



In [None]:

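print(torch.cuda.is_available())
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))  # only query the GPU name when one exists
ImageFile.LOAD_TRUNCATED_IMAGES = True     # let PIL load partially corrupted JPEGs
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print("Using device:", device)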

True
Tesla T4
Using device: cuda



<div style="
  max-width:830px;
  margin:22px auto;
  padding:22px 28px;
  background:#ffffff;
  border-radius:18px;
  box-shadow:0 12px 30px rgba(0,0,0,0.08);
  font-family: -apple-system, BlinkMacSystemFont, 'Segoe UI', Roboto, 'Helvetica Neue', Arial;
  line-height:1.55;
">

  <h1 style="text-align:center; font-size:175%; margin-top:0; color:#111827;">
     Cats vs Dogs Image Dataset
  </h1>

  <p style="font-size:15px; color:#4b5563;">
    This dataset contains a curated collection of <strong>cat and dog images</strong>, designed for binary image
    classification tasks. It is commonly used in computer vision to build and evaluate convolutional neural networks,
    understand transfer learning, and experiment with image preprocessing and augmentation.  
  </p>

  <hr style="border:none; height:1px; background:#e5e7eb; margin:20px 0;">

  <h2 style="font-size:140%; color:#1f2937; margin-bottom:8px;"> Dataset Overview</h2>
  <ul style="font-size:15px; color:#374151; margin-left:22px; margin-top:6px;">
    <li>Two primary classes: <strong>Cats</strong> and <strong>Dogs</strong>.</li>
    <li>Images vary in pose, lighting, background, and resolution.</li>
    <li>Ideal for training deep models like VGG16, ResNet, MobileNet, etc.</li>
    <li>Often used to demonstrate transfer learning and convolutional feature extraction.</li>
  </ul>

  <h2 style="font-size:140%; color:#1f2937; margin-bottom:8px; margin-top:20px;"> Why This Dataset?</h2>
  <ul style="font-size:15px; color:#374151; margin-left:22px; margin-top:6px;">
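    <li><strong>Perfect for beginners</strong>: a simple binary classification problem.</li>
    <li><strong>Great for transfer learning</strong>: pretrained VGG16, ResNet, EfficientNet, etc. typically perform very well.</li>
    <li><strong>Moderately sized</strong>: large enough to train a CNN meaningfully, yet small enough to iterate on a single GPU (augmentation and regularization are still needed to limit overfitting, especially when training from scratch).</li>
    <li><strong>Diverse images</strong> (varied poses, lighting, and backgrounds) make it a reasonable test of generalization to real-world photos.</li>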
  </ul>

  <h2 style="font-size:140%; color:#1f2937; margin-bottom:8px; margin-top:20px;"> Typical Use Cases</h2>
  <ul style="font-size:15px; color:#374151; margin-left:22px; margin-top:6px;">
    <li>Transfer Learning demonstrations</li>
    <li>Training CNNs from scratch</li>
    <li>Experimenting with data augmentation techniques</li>
    <li>Evaluating classification accuracy & model generalization</li>
  </ul>

  <div style="margin-top:20px; padding:12px 16px; border-left:4px solid #10b981; background:#ecfdf5; color:#065f46;">
    <strong>Note:</strong> This dataset is used solely for educational and research purposes.
  </div>

</div>


<h2 style="font-size: 115%;"> Dataset File Structure</h2>



<pre>
cat_dog_data/
   training_set/      (8,000 images)
      cats/
      dogs/
   test_set/          (1,991 images, used as the validation set in this notebook)
      cats/
      dogs/
</pre>
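<p style="font-size: 95%;">
Because <code>ImageFolder</code> infers labels from folder names, a missing or misspelled folder only surfaces later as a wrong class count. A minimal pre-flight check might look like this; <code>check_layout</code> is a hypothetical helper, and its default split and class names are assumed from the <code>training_set</code>/<code>test_set</code> and <code>cats</code>/<code>dogs</code> folders used in the cells below.
</p>

```python
from pathlib import Path

def check_layout(root, splits=("training_set", "test_set"), classes=("cats", "dogs")):
    """Hypothetical helper: return {split: {class: file_count}} for the expected
    folders, raising FileNotFoundError if any class folder is missing."""
    root = Path(root)
    counts = {}
    for split in splits:
        counts[split] = {}
        for cls in classes:
            folder = root / split / cls
            if not folder.is_dir():
                raise FileNotFoundError(f"missing folder: {folder}")
            # Counts every regular file, not only images
            counts[split][cls] = sum(1 for p in folder.iterdir() if p.is_file())
    return counts

# Usage (path taken from the cells below):
# check_layout("/content/drive/MyDrive/DATASETZ/cat_dog_data")
```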


In [None]:
from google.colab import drive
drive.mount('/content/drive')

!ls "/content/drive/MyDrive/DATASETZ"
!ls "/content/drive/MyDrive/DATASETZ/cat_dog_data"
!ls "/content/drive/MyDrive/DATASETZ/cat_dog_data/training_set"
!ls "/content/drive/MyDrive/DATASETZ/cat_dog_data/test_set"




Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).
cat_dog_data
test_set  training_set
cats  dogs
cats  dogs


In [None]:

train_dir = "/content/drive/MyDrive/DATASETZ/cat_dog_data/training_set"
val_dir   = "/content/drive/MyDrive/DATASETZ/cat_dog_data/test_set"  # test_set doubles as the validation split


# Verify contents: count files in every folder, skipping Jupyter checkpoint folders
for root, dirs, files in os.walk("/content/drive/MyDrive/DATASETZ/cat_dog_data"):
    if ".ipynb_checkpoints" not in root:
        print(root, "->", len(files), "files")

/content/drive/MyDrive/DATASETZ/cat_dog_data -> 0 files
/content/drive/MyDrive/DATASETZ/cat_dog_data/test_set -> 0 files
/content/drive/MyDrive/DATASETZ/cat_dog_data/test_set/cats -> 991 files
/content/drive/MyDrive/DATASETZ/cat_dog_data/test_set/dogs -> 1000 files
/content/drive/MyDrive/DATASETZ/cat_dog_data/training_set -> 0 files
/content/drive/MyDrive/DATASETZ/cat_dog_data/training_set/cats -> 4000 files
/content/drive/MyDrive/DATASETZ/cat_dog_data/training_set/dogs -> 4000 files


In [None]:
# Both paths must exist before ImageFolder can read them
print(os.path.exists(train_dir))
print(os.path.exists(val_dir))


True
True


<h2 style="font-size: 115%;"> Transformations / Preprocessing</h2>

<p style="font-size: 95%;">
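Although we train <b>from scratch</b>, the transforms below still normalize with the standard ImageNet channel mean and std.
No pretrained weights depend on these values here; they simply give each channel roughly zero mean and unit variance,
which tends to make optimization better behaved. Training images are augmented with random resized crops (224×224),
horizontal flips, small rotations (up to ±10°), and occasional color jitter; validation images are only resized to 224×224.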
</p>
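<p style="font-size: 95%;">
In numbers, <code>ToTensor()</code> scales 8-bit pixel values to [0, 1], and <code>Normalize(mean, std)</code> then computes (x - mean) / std separately for each channel. The pure-Python sketch below reproduces that arithmetic for a single pixel using the ImageNet statistics from the next cell; <code>normalize_pixel</code> is an illustrative helper, not a torchvision function.
</p>

```python
# ImageNet channel statistics, as used in the transform cell
MEAN = (0.485, 0.456, 0.406)
STD = (0.229, 0.224, 0.225)

def normalize_pixel(rgb, mean=MEAN, std=STD):
    """Mimic ToTensor() + Normalize() for one pixel given as 0-255 integers."""
    scaled = [c / 255.0 for c in rgb]                            # ToTensor: [0, 255] -> [0, 1]
    return [(c - m) / s for c, m, s in zip(scaled, mean, std)]   # Normalize: (x - mean) / std

print(normalize_pixel((0, 0, 0)))        # black pixel: about [-2.12, -2.04, -1.80]
print(normalize_pixel((255, 255, 255)))  # white pixel: about [2.25, 2.43, 2.64]
```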


In [None]:

# TRANSFORMS

mean = [0.485, 0.456, 0.406]
std  = [0.229, 0.224, 0.225]

train_transform = transforms.Compose([
    transforms.RandomResizedCrop(224, scale=(0.85, 1.0)),
    transforms.RandomHorizontalFlip(),
    transforms.RandomRotation(10),
    transforms.RandomApply([
        transforms.ColorJitter(brightness=0.2, contrast=0.2, saturation=0.2)
    ], p=0.3),

    transforms.ToTensor(),
    transforms.Normalize(mean, std),
])

val_transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean, std),
])


<h2 style="font-size: 115%;"> Loading the Dataset</h2>

<p style="font-size: 95%;">
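torchvision's <code>ImageFolder</code> builds a labeled dataset from the folder structure: each subfolder is a class, and class
indices follow the sorted (alphabetical) order of the folder names, so here <code>cats</code> → 0 and <code>dogs</code> → 1.
We then wrap each dataset in a <code>DataLoader</code> to feed images to the model in batches of 32 (shuffled for training only).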
</p>
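<p style="font-size: 95%;">
Both the label assignment and the number of batches per epoch can be worked out by hand. The sketch below is a hypothetical re-implementation following the documented <code>ImageFolder</code> convention (one class per subdirectory, indices in sorted order); it is not torchvision's own code, and <code>batches_per_epoch</code> is an illustrative helper.
</p>

```python
import math
import os

def find_classes_like_imagefolder(root):
    """Hypothetical sketch: every subdirectory is a class; indices follow sorted names."""
    classes = sorted(entry.name for entry in os.scandir(root) if entry.is_dir())
    return classes, {name: idx for idx, name in enumerate(classes)}

def batches_per_epoch(num_images, batch_size, drop_last=False):
    """Number of iterations a DataLoader yields per epoch."""
    if drop_last:
        return num_images // batch_size          # incomplete final batch is skipped
    return math.ceil(num_images / batch_size)    # incomplete final batch is kept

# This notebook's counts: 8000 train images and 1991 validation images, batch_size=32
print(batches_per_epoch(8000, 32), batches_per_epoch(1991, 32))   # prints: 250 63
```

<p style="font-size: 95%;">
With <code>drop_last=False</code> (the DataLoader default), the final validation batch holds only 1991 - 62 × 32 = 7 images.
</p>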


In [None]:
# DATASETS & DATALOADERS
batch_size = 32

train_dataset = ImageFolder(train_dir, transform=train_transform)
val_dataset   = ImageFolder(val_dir, transform=val_transform)

train_loader = DataLoader(train_dataset, batch_size=batch_size,
                          shuffle=True, num_workers=0)

val_loader = DataLoader(val_dataset, batch_size=batch_size,
                        shuffle=False, num_workers=0)


print("Classes:", train_dataset.classes)
print("Train images:", len(train_dataset))
print("Val images:", len(val_dataset))

num_classes = len(train_dataset.classes)


Classes: ['cats', 'dogs']
Train images: 8000
Val images: 1991


<h2 style="font-size: 115%;"> VGG16 Architecture (Full From-Scratch Model)</h2>

<p style="font-size: 95%;">
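Below is the VGG16 architecture defined manually, with randomly initialized weights (no pretrained weights are loaded).
It includes:
<ul>
<li>5 convolutional blocks with 13 conv layers in total (2 in each of blocks 1 and 2, 3 in each of blocks 3 to 5), all 3×3 with padding 1</li>
<li>A 2×2 max-pool (stride 2) after each block, shrinking a 224×224 input to a 7×7 feature map</li>
<li>A fully connected classifier at the end (512·7·7 → 4096 → 4096 → num_classes) with dropout; unlike the original VGG16, this version also adds <code>BatchNorm1d</code> after the first two linear layers</li>
</ul>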
</p>
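<p style="font-size: 95%;">
Most of VGG16's parameters live in the classifier rather than the convolutions. The count below is plain arithmetic (no PyTorch needed), assuming the layer sizes of the model defined in the next cell with <code>num_classes = 2</code>. Each <code>BatchNorm1d</code> contributes two learnable vectors (weight and bias); its running statistics are buffers, not parameters.
</p>

```python
# (in_channels, out_channels) of the 13 conv layers in the model below
CONV_CHANNELS = [(3, 64), (64, 64),
                 (64, 128), (128, 128),
                 (128, 256), (256, 256), (256, 256),
                 (256, 512), (512, 512), (512, 512),
                 (512, 512), (512, 512), (512, 512)]

def conv_params(c_in, c_out, k=3):
    return c_in * c_out * k * k + c_out          # weights + biases

def linear_params(n_in, n_out):
    return n_in * n_out + n_out                  # weights + biases

conv_total = sum(conv_params(i, o) for i, o in CONV_CHANNELS)
fc_total = (linear_params(512 * 7 * 7, 4096)
            + linear_params(4096, 4096)
            + linear_params(4096, 2))            # num_classes = 2
bn_total = 2 * (2 * 4096)                        # weight + bias for each BatchNorm1d(4096)

print(f"conv: {conv_total:,}")                           # conv: 14,714,688
print(f"classifier: {fc_total + bn_total:,}")            # classifier: 119,570,434
print(f"total: {conv_total + fc_total + bn_total:,}")    # total: 134,285,122
```

<p style="font-size: 95%;">
The first linear layer alone (25,088 × 4,096 weights plus biases) holds 102,764,544 parameters. For comparison, the original 1000-class VGG16 without batch norm has 138,357,544 parameters.
</p>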


In [None]:
num_classes = len(train_dataset.classes)

class VGG16(nn.Module):
    def __init__(self, num_classes):
        super(VGG16, self).__init__()

        self.features = nn.Sequential(
            # Block 1
            nn.Conv2d(3, 64, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(64, 64, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2),

            # Block 2
            nn.Conv2d(64, 128, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(128, 128, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2),

            # Block 3
            nn.Conv2d(128, 256, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(256, 256, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(256, 256, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2),

            # Block 4
            nn.Conv2d(256, 512, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(512, 512, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(512, 512, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2),

            # Block 5
            nn.Conv2d(512, 512, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(512, 512, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(512, 512, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2),
        )

        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(512 * 7 * 7, 4096),  # 7x7 feature map assumes 224x224 inputs
            nn.ReLU(True),
            nn.BatchNorm1d(4096),          # not part of the original VGG16
            nn.Dropout(0.5),

            nn.Linear(4096, 4096),
            nn.ReLU(True),
            nn.BatchNorm1d(4096),
            nn.Dropout(0.5),

            nn.Linear(4096, num_classes)
        )

    def forward(self, x):
        x = self.features(x)
        x = self.classifier(x)
        return x


# Instantiate the model on the device selected earlier (CUDA if available, else CPU)
model = VGG16(num_classes).to(device)
print(model)

# Training from scratch: all weights start random, so every layer stays trainable
# (requires_grad=True by default). Freezing blocks 1-4, as one would when
# fine-tuning pretrained weights, would leave them as fixed random filters.
num_trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
print("Trainable parameters:", num_trainable)
print("Model device:", next(model.parameters()).device)




VGG16(
  (features): Sequential(
    (0): Conv2d(3, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): ReLU(inplace=True)
    (2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (3): ReLU(inplace=True)
    (4): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (5): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (6): ReLU(inplace=True)
    (7): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (8): ReLU(inplace=True)
    (9): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (10): Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (11): ReLU(inplace=True)
    (12): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (13): ReLU(inplace=True)
    (14): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (15): ReLU(inplace=True)
    (16): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation

In [None]:
inputs, labels = next(iter(train_loader))
inputs = inputs.to(device)
labels = labels.to(device)

print("Batch device:", inputs.device)


Batch device: cuda:0


<h2 style="font-size: 115%;"> Loss Function and Optimizer</h2>

<p style="font-size: 95%;">
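We use <code>CrossEntropyLoss</code> with label smoothing of 0.1, which softens the one-hot targets to discourage overconfident predictions, and the AdamW optimizer (Adam with decoupled weight decay) with a learning rate of 1e-4 and weight decay of 1e-4.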
</p>
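<p style="font-size: 95%;">
Label smoothing replaces the one-hot target with (1 - ε) on the true class plus ε/K spread over all K classes; with ε = 0.1 and K = 2 the targets become 0.95 and 0.05. The single-sample sketch below spells out that loss in plain Python; <code>smoothed_cross_entropy</code> is an illustrative helper, not PyTorch's implementation, though it follows the same definition of the smoothed target.
</p>

```python
import math

def smoothed_cross_entropy(logits, target, eps=0.1):
    """Cross-entropy of one sample against a label-smoothed target distribution."""
    k = len(logits)
    m = max(logits)                                    # subtract max for numerical stability
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))
    log_probs = [z - log_z for z in logits]            # log-softmax
    # Smoothed target: (1 - eps) on the true class, plus eps / k on every class
    targets = [((1 - eps) if i == target else 0.0) + eps / k for i in range(k)]
    return -sum(q * lp for q, lp in zip(targets, log_probs))

print(smoothed_cross_entropy([2.0, -1.0], target=0))            # with smoothing
print(smoothed_cross_entropy([2.0, -1.0], target=0, eps=0.0))   # plain cross-entropy
```

<p style="font-size: 95%;">
Setting <code>eps=0.0</code> recovers ordinary cross-entropy, so the difference between the two printed values is the extra penalty smoothing places on a confident prediction.
</p>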


In [None]:
criterion = nn.CrossEntropyLoss(label_smoothing=0.1)

# Pass only parameters with requires_grad=True to the optimizer
optimizer = optim.AdamW(filter(lambda p: p.requires_grad, model.parameters()),
                        lr=1e-4, weight_decay=1e-4)


<h2 style="font-size: 115%;"> LR Scheduler</h2>

In [None]:
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=5, gamma=0.5)
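<p style="font-size: 95%;">
<code>StepLR(step_size=5, gamma=0.5)</code> halves the learning rate every 5 scheduler steps. Assuming <code>scheduler.step()</code> is called once at the end of each epoch (as in the training loop), the rate used during each epoch can be computed directly; <code>step_lr</code> is an illustrative helper, not a PyTorch function.
</p>

```python
def step_lr(epoch, base_lr=1e-4, step_size=5, gamma=0.5):
    """Learning rate in effect during a 0-indexed epoch under StepLR,
    assuming one scheduler.step() call at the end of every epoch."""
    return base_lr * gamma ** (epoch // step_size)

for epoch in (0, 5, 10, 30, 34):
    print(f"epoch {epoch + 1}: lr = {step_lr(epoch):.2e}")
```

<p style="font-size: 95%;">
For a 35-epoch run, epochs 31 to 35 train at 1e-4 / 2<sup>6</sup> ≈ 1.6e-6, so the last few epochs make only small updates.
</p>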


<h2 style="font-size: 115%;"> Training Loop</h2>

<p style="font-size: 95%;">
This function handles:
<ul>
<li>Forward pass</li>
<li>Backpropagation</li>
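<li>Loss & accuracy calculation, recorded per epoch for plotting</li>
<li>Tracking the best model weights by validation accuracy and saving them to <code>vgg16_best_checkpoint.pth</code></li>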
</ul>
</p>
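<p style="font-size: 95%;">
One detail worth spelling out: <code>loss.item()</code> is the mean loss over a batch, and the last batch of an epoch is usually smaller than the rest. Multiplying by <code>inputs.size(0)</code> before summing, then dividing by the dataset size, gives the exact per-image mean. The sketch below contrasts that with naively averaging batch means; the batch losses are made up, not results from this notebook, and <code>epoch_mean</code> / <code>naive_mean</code> are illustrative helpers.
</p>

```python
def epoch_mean(batch_means, batch_sizes):
    """Per-image mean from per-batch means, weighting each batch by its size
    (the idea behind running_loss += loss.item() * inputs.size(0))."""
    total = sum(m * n for m, n in zip(batch_means, batch_sizes))
    return total / sum(batch_sizes)

def naive_mean(batch_means):
    """Unweighted average of batch means: over-weights a small final batch."""
    return sum(batch_means) / len(batch_means)

# 1991 validation images, batch_size=32 -> 62 full batches plus a final batch of 7
sizes = [32] * 62 + [7]
# Made-up per-batch mean losses: 0.2 everywhere except a harder final batch
means = [0.2] * 62 + [1.0]

print(f"weighted: {epoch_mean(means, sizes):.4f}")
print(f"naive:    {naive_mean(means):.4f}")
```

<p style="font-size: 95%;">
Here the weighted mean is about 0.203 while the naive average is about 0.213, because the naive version gives the 7-image batch the same weight as a full batch of 32.
</p>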


In [None]:
# TRAINING LOOP
train_losses, val_losses = [], []
train_accs, val_accs = [], []

def train_model(model, criterion, optimizer, scheduler, epochs=20):
    best_wts = copy.deepcopy(model.state_dict())
    best_acc = 0.0

    for epoch in range(epochs):
        print(f"\nEpoch {epoch+1}/{epochs}")
        print("-" * 50)

        for phase in ["train", "val"]:
            # train(): dropout active, BatchNorm uses batch statistics;
            # eval(): dropout off, BatchNorm uses its running statistics
            model.train() if phase == "train" else model.eval()

            dataloader = train_loader if phase == "train" else val_loader

            running_loss = 0.0
            running_corrects = 0

            for inputs, labels in dataloader:
                inputs = inputs.to(device)
                labels = labels.to(device)

                optimizer.zero_grad()

                with torch.set_grad_enabled(phase == "train"):
                    outputs = model(inputs)
                    loss = criterion(outputs, labels)
                    _, preds = torch.max(outputs, 1)

                    if phase == "train":
                        loss.backward()
                        optimizer.step()

                running_loss += loss.item() * inputs.size(0)
                running_corrects += torch.sum(preds == labels)

            epoch_loss = running_loss / len(dataloader.dataset)
            epoch_acc  = running_corrects.double() / len(dataloader.dataset)

            print(f"{phase.upper()} | Loss: {epoch_loss:.4f} | Acc: {epoch_acc:.4f}")

            # Store metrics for plotting
            if phase == "train":
                train_losses.append(epoch_loss)
                train_accs.append(epoch_acc.item())
            else:
                val_losses.append(epoch_loss)
                val_accs.append(epoch_acc.item())

            #  SAVE BEST MODEL CHECKPOINT
            if phase == "val" and epoch_acc > best_acc:
                best_acc = epoch_acc
                best_wts = copy.deepcopy(model.state_dict())

                torch.save({
                    'epoch': epoch + 1,
                    'model_state_dict': best_wts,
                    'optimizer_state_dict': optimizer.state_dict(),
                    'best_acc': best_acc
                }, "vgg16_best_checkpoint.pth")

                print(f"Checkpoint saved! New best accuracy: {best_acc:.4f}")


        # Step scheduler AFTER each epoch
        scheduler.step()

    print(f"\nBest Validation Accuracy: {best_acc:.4f}")
    model.load_state_dict(best_wts)
    return model


In [None]:
def estimate_training_time(model, loader, epochs=35, batches_to_test=10):
    """Rough estimate from forward passes only (data loading included).
    A real training step also runs backward() and optimizer.step(),
    so actual epochs will take longer than this estimate."""
    model.eval()

    with torch.no_grad():
        # Warm-up pass so one-time CUDA setup is not included in the timing
        inputs, _ = next(iter(loader))
        _ = model(inputs.to(device))
        if device.type == "cuda":
            torch.cuda.synchronize()

        # Time a few batches
        start = time.time()
        for i, (inputs, _) in enumerate(loader):
            _ = model(inputs.to(device))
            if i + 1 >= batches_to_test:
                break
        if device.type == "cuda":
            torch.cuda.synchronize()  # wait for queued GPU work before stopping the clock
        end = time.time()

    sec_per_batch = (end - start) / batches_to_test
    batches_per_epoch = len(loader)
    epoch_time = sec_per_batch * batches_per_epoch
    total_time = epoch_time * epochs

    print(f"\nEstimated time per batch: {sec_per_batch:.4f} sec")
    print(f"Estimated time per epoch: {epoch_time/60:.2f} minutes")
    print(f"Estimated total time ({epochs} epochs): {total_time/60:.2f} minutes")


estimate_training_time(model, train_loader, epochs=35)



Estimated time per batch: 3.4153 sec
Estimated time per epoch: 14.23 minutes
Estimated total time (35 epochs): 498.06 minutes


<h2 style="font-size: 115%;"> Start Training</h2>


In [None]:
#  TRAIN MODEL

trained_model = train_model(model, criterion, optimizer, scheduler, epochs=35)




In [None]:
#  save
torch.save(trained_model.cpu().state_dict(), "vgg16_from_scratch.pth")
trained_model.to(device)

print("\nSaved model as vgg16_from_scratch.pth")


<h2 style="font-size: 115%;"> Prediction Function</h2>

<p style="font-size: 95%;">
A simple helper function to classify new images using the trained model.
</p>


In [None]:
from PIL import Image  # needed for Image.open below

def predict_image(path, model):
    model.eval()   # <-- always ensure evaluation mode

    # Load and preprocess image
    img = Image.open(path).convert("RGB")
    img = val_transform(img).unsqueeze(0).to(device)

    with torch.no_grad():
        outputs = model(img)
        _, pred = torch.max(outputs, 1)

    # Return class name
    class_idx = pred.item()
    return train_dataset.classes[class_idx]
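<p style="font-size: 95%;">
The helper returns only a class name. If a confidence score is also wanted, the logits can be turned into probabilities with a softmax (in PyTorch, <code>torch.softmax(outputs, dim=1)</code> inside <code>predict_image</code>). The plain-Python sketch below shows the computation; <code>label_with_confidence</code> is a hypothetical helper, its class order assumes <code>train_dataset.classes == ['cats', 'dogs']</code>, and the example logits are made up.
</p>

```python
import math

def softmax(logits):
    """Convert raw model outputs (logits) into probabilities that sum to 1."""
    m = max(logits)                        # shift for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def label_with_confidence(logits, classes=("cats", "dogs")):
    """Return the predicted class name and its softmax probability."""
    probs = softmax(logits)
    idx = max(range(len(probs)), key=probs.__getitem__)
    return classes[idx], probs[idx]

print(label_with_confidence([0.3, 2.1]))   # made-up logits for one image
```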


<h2 style="font-size: 115%;"> ACCURACY & LOSS CURVES</h2>


In [None]:
import matplotlib.pyplot as plt

# -------------------- LOSS PLOT --------------------
plt.figure(figsize=(8,5))
plt.plot(train_losses, label="Train Loss")
plt.plot(val_losses, label="Val Loss")
plt.xlabel("Epoch")
plt.ylabel("Loss")
plt.title("Training vs Validation Loss")
plt.legend()
plt.show()

# -------------------- ACCURACY PLOT --------------------
plt.figure(figsize=(8,5))
plt.plot(train_accs, label="Train Acc")
plt.plot(val_accs, label="Val Acc")
plt.xlabel("Epoch")
plt.ylabel("Accuracy")
plt.title("Training vs Validation Accuracy")
plt.legend()
plt.show()


<h2 style="font-size: 115%;">CONFUSION MATRIX</h2>


In [None]:
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay

def evaluate_confusion_matrix(model, dataloader):
    model.eval()
    preds_list = []
    labels_list = []

    with torch.no_grad():
        for inputs, labels in dataloader:
            inputs = inputs.to(device)
            labels = labels.to(device)

            outputs = model(inputs)
            _, preds = torch.max(outputs, 1)

            preds_list.extend(preds.cpu().numpy())
            labels_list.extend(labels.cpu().numpy())

    cm = confusion_matrix(labels_list, preds_list)
    disp = ConfusionMatrixDisplay(cm, display_labels=['Cat', 'Dog'])
    disp.plot(cmap="Blues", values_format="d")
    plt.title("Confusion Matrix")
    plt.show()

# -------------------- CALL IT --------------------
evaluate_confusion_matrix(model, val_loader)
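<p style="font-size: 95%;">
A confusion matrix also yields per-class precision and recall. In scikit-learn's layout, rows are true labels and columns are predictions, so a class's recall divides its diagonal entry by the row sum and its precision divides it by the column sum. The sketch below uses illustrative counts chosen to match the 991 cat / 1000 dog validation split; they are not results from this notebook, and <code>binary_metrics</code> is a hypothetical helper.
</p>

```python
def binary_metrics(cm):
    """Accuracy and per-class (precision, recall) from a 2x2 confusion matrix
    with rows = true labels and columns = predicted labels."""
    per_class = {}
    for k in range(2):
        tp = cm[k][k]
        predicted_k = cm[0][k] + cm[1][k]     # column sum: everything predicted as k
        actual_k = cm[k][0] + cm[k][1]        # row sum: everything truly k
        precision = tp / predicted_k if predicted_k else 0.0
        recall = tp / actual_k if actual_k else 0.0
        per_class[k] = (precision, recall)
    accuracy = (cm[0][0] + cm[1][1]) / sum(map(sum, cm))
    return accuracy, per_class

# Illustrative counts only (not results from this notebook)
cm = [[900, 91],
      [120, 880]]
accuracy, per_class = binary_metrics(cm)
print(f"accuracy: {accuracy:.4f}")
for k, name in enumerate(["cats", "dogs"]):
    precision, recall = per_class[k]
    print(f"{name}: precision={precision:.3f} recall={recall:.3f}")
```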
