# ResNet-34 From Scratch

This notebook trains ResNet-34 from scratch on Tiny ImageNet, which contains 64×64 images instead of ImageNet’s standard 224×224.  
The architecture is modified for the smaller input: the first 7×7 convolution is replaced with a 3×3 convolution with stride 1 and padding 1, so the height and width remain the same, and the first MaxPool layer is removed.
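
The effect of the stem change on spatial size can be checked with the standard convolution output-size formula. This is a minimal sketch: `conv_out` is a hypothetical helper (not part of the notebook), and the original stem is assumed to be a 7×7 convolution with stride 2 and padding 3 followed by a 3×3 MaxPool with stride 2 and padding 1, as in the standard ResNet.

```python
def conv_out(size, kernel, stride, padding):
    """Output height/width of a conv or pooling layer on a square input."""
    return (size + 2 * padding - kernel) // stride + 1

# Standard ResNet stem (7x7 conv, stride 2, then 3x3 max pool, stride 2)
standard_on_imagenet = conv_out(conv_out(224, 7, 2, 3), 3, 2, 1)
standard_on_tiny = conv_out(conv_out(64, 7, 2, 3), 3, 2, 1)

# Modified stem used in this notebook (3x3 conv, stride 1, no max pool)
modified_on_tiny = conv_out(64, 3, 1, 1)

print("standard stem, 224x224 input:", standard_on_imagenet)
print("standard stem, 64x64 input:  ", standard_on_tiny)
print("modified stem, 64x64 input:  ", modified_on_tiny)
```

With the standard stem, a 64×64 image would already be reduced to 16×16 before the first residual stage, which is why the stem is kept at full resolution here.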

For augmentation I used random cropping with padding and horizontal flipping, then normalized with dataset-specific mean and std values computed in mean_std.py.
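
mean_std.py is not shown in this notebook. One way such per-channel statistics could be computed is sketched below, under assumptions: `channel_mean_std` is a hypothetical helper, and random tensors stand in for the real images, so the printed values are not the Tiny ImageNet statistics.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

def channel_mean_std(loader):
    """Per-channel mean and (population) std over every pixel the loader yields."""
    n_pixels = 0
    channel_sum = torch.zeros(3, dtype=torch.float64)
    channel_sq_sum = torch.zeros(3, dtype=torch.float64)
    for images, _ in loader:
        images = images.double()                       # (batch, 3, H, W)
        n_pixels += images.size(0) * images.size(2) * images.size(3)
        channel_sum += images.sum(dim=(0, 2, 3))
        channel_sq_sum += (images ** 2).sum(dim=(0, 2, 3))
    mean = channel_sum / n_pixels
    std = (channel_sq_sum / n_pixels - mean ** 2).sqrt()
    return mean, std

# Stand-in data: 32 random "images" in [0, 1], shaped like ToTensor() output
torch.manual_seed(0)
fake_images = torch.rand(32, 3, 64, 64)
fake_labels = torch.zeros(32, dtype=torch.long)
loader = DataLoader(TensorDataset(fake_images, fake_labels), batch_size=8)

mean, std = channel_mean_std(loader)
print("mean:", mean.tolist())
print("std: ", std.tolist())
```

On the real data, the loader would wrap the training images with only `ToTensor()` applied, so the statistics are computed before normalization and augmentation.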


In [None]:
import torch
import torchvision
from torch import nn, optim
import torchvision.transforms as transforms
from torch.utils.data import DataLoader, random_split
from torchvision import datasets

transform = transforms.Compose([
    transforms.RandomCrop(size=(64,64), padding=4),         # slightly pad and then crop back to 64x64
    transforms.RandomHorizontalFlip(),                      # randomly flip images left and right
    transforms.ToTensor(),                                  # convert image to tensor
    transforms.Normalize(mean=([0.4802, 0.4481, 0.3975]),   # normalize using mean & std
                         std=([0.2296, 0.2263, 0.2255]))
])

data_dir = '/kaggle/input/tiny-imagenet-200/tiny-imagenet-200/train'
full_dataset = datasets.ImageFolder(root=data_dir, transform=transform)

train_size = int(0.9 * len(full_dataset))
val_size = len(full_dataset) - train_size
train_dataset, val_dataset = random_split(full_dataset, [train_size, val_size])

train_loader = DataLoader(train_dataset, batch_size=64, shuffle=True, num_workers=4, pin_memory=True)
val_loader = DataLoader(val_dataset, batch_size=64, shuffle=False, num_workers=4, pin_memory=True)
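
Both subsets returned by `random_split` read from the same `full_dataset`, so the validation images above also pass through the random crop and flip. If deterministic validation inputs are preferred, one option is to split the indices once and give each part its own transform. This is a sketch under assumptions: `TransformedSubset` and `split_indices` are hypothetical helpers, and a toy list stands in for an `ImageFolder` created without a transform.

```python
import torch
from torch.utils.data import Dataset

class TransformedSubset(Dataset):
    """View of `base` restricted to `indices`, applying `transform` per sample."""
    def __init__(self, base, indices, transform=None):
        self.base = base
        self.indices = list(indices)
        self.transform = transform

    def __len__(self):
        return len(self.indices)

    def __getitem__(self, i):
        x, y = self.base[self.indices[i]]
        if self.transform is not None:
            x = self.transform(x)
        return x, y

def split_indices(n, train_fraction=0.9, seed=0):
    """Shuffle 0..n-1 reproducibly and cut into train/val index lists."""
    generator = torch.Generator().manual_seed(seed)
    perm = torch.randperm(n, generator=generator).tolist()
    n_train = int(train_fraction * n)
    return perm[:n_train], perm[n_train:]

# Toy stand-in for the image dataset: (sample, label) pairs
base = [(torch.tensor(float(i)), i % 2) for i in range(100)]
train_idx, val_idx = split_indices(len(base))

# Train view gets a (stand-in) augmentation, val view gets none
train_view = TransformedSubset(base, train_idx, transform=lambda x: x + 1)
val_view = TransformedSubset(base, val_idx)
print(len(train_view), len(val_view))
```

With the real data, the train view would receive the augmenting `transform` and the validation view only `ToTensor()` and `Normalize`.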


## Residual Block (BasicBlock)

The architecture uses the standard ResNet basic block:
- Two 3×3 convolutions, each followed by BatchNorm, with a ReLU after the first BatchNorm.
- A **skip connection (shortcut)** which either:
  - uses an Identity if the input and output shapes are the same, or
  - uses a 1×1 convolution with matching stride and output channels to align shapes when the block changes the spatial resolution or the number of channels.
- The skip path is added to the output of the second BatchNorm, followed by a final ReLU activation.
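
To see why the projection is needed, the shapes of the two paths can be compared directly. A minimal sketch, assuming a block that goes from 64 to 128 channels with stride 2 on a 32×32 feature map (the tensor sizes are illustrative):

```python
import torch
from torch import nn

x = torch.randn(1, 64, 32, 32)

# Main path: first 3x3 conv of a downsampling block
main = nn.Conv2d(64, 128, kernel_size=3, stride=2, padding=1)
# Skip path: 1x1 projection with the same stride and output channels
proj = nn.Conv2d(64, 128, kernel_size=1, stride=2)

main_shape = tuple(main(x).shape)
proj_shape = tuple(proj(x).shape)
print("main path:", main_shape)
print("skip path:", proj_shape)

# Without the projection, the raw input cannot be added to the main path
try:
    _ = x + main(x)
    identity_works = True
except RuntimeError:
    identity_works = False
print("identity skip works here:", identity_works)
```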

The image below comes from d2l.ai. It shows the ResNet block with and without the 1×1 convolution on the skip path.

![Res_block](figures/resnet-block.png)

In [None]:
class ResNetBlock(nn.Module):
    def __init__(self, in_ch:int, out_ch:int, stride:int):
        super().__init__()
        self.sequence = nn.Sequential(
            nn.Conv2d(in_channels=in_ch, out_channels=out_ch, kernel_size=3, stride=stride, padding=1),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(),
            nn.Conv2d(in_channels=out_ch, out_channels=out_ch, kernel_size=3, padding=1),
            nn.BatchNorm2d(out_ch)
        )

        # Skip connection (identity or projection to match dims)
        self.skip = nn.Identity()
        if stride != 1 or in_ch != out_ch:
            # If the spatial size or channel count changes, adjust using a 1x1 conv
            self.skip = nn.Conv2d(in_channels=in_ch, out_channels=out_ch, kernel_size=1, stride=stride)
        self.relu = nn.ReLU()

    def forward(self, x):
        out = self.sequence(x)
        x = self.skip(x)
        return self.relu(x + out)

## ResNetStack
This builds a stack of multiple residual blocks:
- The first block handles any downsampling and channel change.
- The remaining blocks always use stride 1 and keep the channel count, so dimensions are preserved.

Each stage (conv2_x, conv3_x, ...) is implemented as a ResNetStack.

In [9]:
class ResNetStack(nn.Module):
    def __init__(self, in_ch:int, out_ch:int, stride:int, blocks:int):
        super().__init__()
        layers = []

        # First block handles stride or channel change
        layers.append(ResNetBlock(in_ch, out_ch, stride))

        # Remaining blocks keep stride=1
        for _ in range(1, blocks):
            layers.append(ResNetBlock(out_ch, out_ch, 1))
        self.net = nn.Sequential(*layers)
    
    def forward(self, x):
        return self.net(x)

## ResNet-34 Block Overview

| Stage    | kernel size | first-block stride | padding | out channels | blocks | downsample |
| -------- | ----------- | ------------------ | ------- | ------------ | ------ | ---------- |
| conv2\_x | 3           | 1                  | 1       | 64           | 3      | No         |
| conv3\_x | 3           | 2                  | 1       | 128          | 4      | Yes        |
| conv4\_x | 3           | 2                  | 1       | 256          | 6      | Yes        |
| conv5\_x | 3           | 2                  | 1       | 512          | 3      | Yes        |
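
The table can be turned into a quick arithmetic check of the feature-map size per stage and of where the "34" comes from. This is a small sketch, assuming a 64×64 map entering conv2_x (the modified stem keeps the input resolution) and counting only the stem convolution, the two 3×3 convolutions per block, and the final linear layer:

```python
# (stage name, output channels, stride of first block, number of blocks)
stages = [
    ("conv2_x", 64, 1, 3),
    ("conv3_x", 128, 2, 4),
    ("conv4_x", 256, 2, 6),
    ("conv5_x", 512, 2, 3),
]

size = 64  # spatial size after the modified stem
sizes = {}
for name, channels, stride, blocks in stages:
    # 3x3 conv, padding 1, with the stage's first-block stride
    size = (size + 2 * 1 - 3) // stride + 1
    sizes[name] = size
    print(f"{name}: {blocks} blocks, {channels} channels, {size}x{size}")

conv_layers_in_blocks = 2 * sum(blocks for _, _, _, blocks in stages)
weighted_layers = 1 + conv_layers_in_blocks + 1  # stem conv + block convs + linear
print("weighted layers:", weighted_layers)
```

The 1×1 projection convolutions on the skip paths are conventionally not included in this count.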

In [10]:
class ResNet34(nn.Module):
    def __init__(self):
        super().__init__()

        # Changed from 7x7 (stride=2) to 3x3 (stride=1) to keep high resolution for small images (64x64)
        self.conv1 = nn.Sequential(
            nn.Conv2d(in_channels=3, out_channels=64, kernel_size=3, stride=1, padding=1),
            nn.BatchNorm2d(64),
            nn.ReLU()
        )

        # conv2_x: keeps resolution, repeat=3
        self.conv2_x = ResNetStack(in_ch=64, out_ch=64, stride=1, blocks=3)
        
        # conv3_x: first block downsamples (stride=2), then repeat=4
        self.conv3_x = ResNetStack(in_ch=64, out_ch=128, stride=2, blocks=4)
        
        # conv4_x: first block downsamples (stride=2), then repeat=6
        self.conv4_x = ResNetStack(in_ch=128, out_ch=256, stride=2, blocks=6)
        
        # conv5_x: first block downsamples (stride=2), then repeat=3
        self.conv5_x = ResNetStack(in_ch=256, out_ch=512, stride=2, blocks=3)

        # Global average pooling (outputs 1x1x512)
        self.pool = nn.AdaptiveAvgPool2d((1, 1))
        self.flatten = nn.Flatten(1)
        self.dropout = nn.Dropout(0.4)

        # Fully connected layer to 200 Tiny ImageNet classes
        self.linear = nn.Linear(in_features=512, out_features=200)
    
    def forward(self, x):
        x = self.conv1(x)
        x = self.conv2_x(x)
        x = self.conv3_x(x)
        x = self.conv4_x(x)
        x = self.conv5_x(x)
        x = self.pool(x)
        x = self.flatten(x)
        x = self.dropout(x)
        x = self.linear(x)
        return x


| Parameter    | Value                                           |
| ------------ | ----------------------------------------------- |
| optimizer    | SGD, momentum=0.9                               |
| lr           | 0.01, cosine annealing                          |
| weight decay | 4e-4                                            |
| batch size   | 64                                              |
| epochs       | 100                                             |
| dropout      | 0.4 before the final linear layer               |
| augmentation | RandomCrop(64, padding=4), RandomHorizontalFlip |


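Training uses the CosineAnnealingLR scheduler, starting at lr=0.01.
With T_max=100 and one scheduler step per epoch, the learning rate decays smoothly to nearly 0 by the end of the 100 epochs.
The small learning rates late in training often improve final validation accuracy.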
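
The shape of this schedule follows the closed-form cosine annealing formula given in the PyTorch documentation. A minimal sketch, assuming `eta_min=0` (the scheduler's default) and one step per epoch; `cosine_lr` is a hypothetical helper used only for illustration:

```python
import math

def cosine_lr(epoch, base_lr=0.01, t_max=100, eta_min=0.0):
    """Learning rate in effect during `epoch` (0-based) under cosine annealing."""
    return eta_min + 0.5 * (base_lr - eta_min) * (1 + math.cos(math.pi * epoch / t_max))

for epoch in (0, 25, 50, 75, 99):
    print(f"epoch {epoch:3d}: lr = {cosine_lr(epoch):.6f}")
```

The rate is halved at the midpoint (epoch 50) and is close to zero during the last few epochs.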

The training loop runs for 100 epochs, tracking both training and validation loss and accuracy. It saves the best model checkpoint based on validation accuracy and writes a JSON history of all metrics after every epoch.

In [None]:
import json

from torch.optim.lr_scheduler import CosineAnnealingLR

best_val_acc = 0.0
history = {
    "train_loss": [],
    "train_acc": [],
    "val_loss": [],
    "val_acc": [],
    "lr": []
}
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print("Using device:", device)

model = ResNet34().to(device)

epochs = 100

loss_fn = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9, weight_decay=4e-4)
# Anneal the learning rate from 0.01 towards 0 over the full run (one step per epoch)
scheduler = CosineAnnealingLR(optimizer, T_max=epochs)

for epoch in range(epochs):
    model.train()
    total_loss = 0.0
    correct = 0
    total = 0
    
    for xb, yb in train_loader:
        xb, yb = xb.to(device), yb.to(device)
        
        optimizer.zero_grad()
        yhat = model(xb)
        loss = loss_fn(yhat, yb)
        loss.backward()
        optimizer.step()

        total_loss += loss.item() * xb.size(0)
        preds = torch.argmax(yhat, dim=1)
        correct += (preds == yb).sum().item()
        total += xb.size(0)

    avg_loss = total_loss / total
    accuracy = correct / total

    model.eval()
    val_loss = 0.0
    val_correct = 0
    val_total = 0
    with torch.no_grad():
        for xb, yb in val_loader:
            xb, yb = xb.to(device), yb.to(device)
            yhat = model(xb)
            loss = loss_fn(yhat, yb)
            val_loss += loss.item() * xb.size(0)
            preds = torch.argmax(yhat, dim=1)
            val_correct += (preds == yb).sum().item()
            val_total += xb.size(0)
    
    avg_val_loss = val_loss / val_total
    val_accuracy = val_correct / val_total
    
    current_lr = scheduler.get_last_lr()[0]

    history["train_loss"].append(avg_loss)
    history["train_acc"].append(accuracy)
    history["val_loss"].append(avg_val_loss)
    history["val_acc"].append(val_accuracy)
    history["lr"].append(current_lr)

    with open("training_history.json", "w") as f:
        json.dump(history, f)

    if val_accuracy > best_val_acc:
        best_val_acc = val_accuracy
        torch.save(model.state_dict(), "ResNet34.pth")
        print("Saved new best model.")

    scheduler.step()

    print(f"Epoch {epoch+1}: "
          f"Train Loss: {avg_loss:.4f}, Train Acc: {accuracy:.4f}, "
          f"Val Loss: {avg_val_loss:.4f}, Val Acc: {val_accuracy:.4f}")

Using device: cuda
Saved new best model.
Epoch 1: Train Loss: 4.7311, Train Acc: 0.0477, Val Loss: 4.2720, Val Acc: 0.0943
Saved new best model.
Epoch 2: Train Loss: 3.9806, Train Acc: 0.1268, Val Loss: 3.8197, Val Acc: 0.1540
Saved new best model.
Epoch 3: Train Loss: 3.5161, Train Acc: 0.1955, Val Loss: 3.3646, Val Acc: 0.2255
Saved new best model.
Epoch 4: Train Loss: 3.1915, Train Acc: 0.2562, Val Loss: 3.0194, Val Acc: 0.2880
Saved new best model.
Epoch 5: Train Loss: 2.9295, Train Acc: 0.3056, Val Loss: 2.9033, Val Acc: 0.3241
Saved new best model.
Epoch 6: Train Loss: 2.7297, Train Acc: 0.3485, Val Loss: 2.8040, Val Acc: 0.3380
Saved new best model.
Epoch 7: Train Loss: 2.5605, Train Acc: 0.3794, Val Loss: 2.6095, Val Acc: 0.3713
Saved new best model.
Epoch 8: Train Loss: 2.4083, Train Acc: 0.4116, Val Loss: 2.4567, Val Acc: 0.4050
Saved new best model.
Epoch 9: Train Loss: 2.2734, Train Acc: 0.4418, Val Loss: 2.3369, Val Acc: 0.4312
Saved new best model.
Epoch 10: Train Loss: 2

In [None]:
import json

import plotly.graph_objs as go

# Load the per-epoch metrics written by the training loop
file_path = 'training_history.json'
with open(file_path, 'r') as f:
    data = json.load(f)

fig_loss = go.Figure()
fig_loss.add_trace(go.Scatter(y=data["train_loss"], mode='lines', name='Train Loss'))
fig_loss.add_trace(go.Scatter(y=data["val_loss"], mode='lines', name='Val Loss'))
fig_loss.update_layout(title='Training and Validation Loss over Epochs',
                       xaxis_title='Epoch',
                       yaxis_title='Loss')

fig_acc = go.Figure()
fig_acc.add_trace(go.Scatter(y=data["train_acc"], mode='lines', name='Train Accuracy'))
fig_acc.add_trace(go.Scatter(y=data["val_acc"], mode='lines', name='Val Accuracy'))
fig_acc.update_layout(title='Training and Validation Accuracy over Epochs',
                      xaxis_title='Epoch',
                      yaxis_title='Accuracy')

fig_loss.show()
fig_acc.show()

![Loss](figures/Loss.png)

![Accuracy](figures/Accuracy.png)

## Results
- Achieved a Top-1 validation accuracy of ~61% on the held-out 10% split, which is a solid result on Tiny ImageNet given the small image resolution and the 200 classes.
- Saved the best-performing model as ResNet34.pth and the per-epoch training history as training_history.json.
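
The checkpoint holds only a `state_dict`, so restoring it requires building the model first and then loading the weights into it. The round trip is sketched below with a small stand-in layer and a temporary file. The stand-in `nn.Linear` and the file name `checkpoint.pth` are assumptions for illustration; with the real model, `ResNet34()` and `"ResNet34.pth"` would take their place.

```python
import os
import tempfile

import torch
from torch import nn

# Stand-in for the trained network
model = nn.Linear(512, 200)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "checkpoint.pth")
    torch.save(model.state_dict(), path)

    # Rebuild the same architecture, then load the saved weights into it
    restored = nn.Linear(512, 200)
    restored.load_state_dict(torch.load(path, map_location="cpu"))

restored.eval()  # disable dropout / use BatchNorm running stats at inference

same = all(
    torch.equal(a, b)
    for a, b in zip(model.state_dict().values(), restored.state_dict().values())
)
print("weights restored exactly:", same)
```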

Overall, this project shows how to adapt a classical deep convolutional architecture to a smaller-scale image dataset: a 3×3 stride-1 stem without the early MaxPool preserves spatial resolution for 64×64 inputs while keeping the rest of ResNet-34 unchanged.
