# **<center> <span style="color: pink;">ATML PA_0 </span> </center>**

### <span style="color: DeepPink;"> Name : Khadeeja Toseef </span>
### <span style="color: DeepPink;"> Roll Number : 27100309 </span>

## <center> <span style="color: pink;">Task 1: Inner Workings of ResNet-152 </span> </center>

### <span style="color: HotPink;">1. Baseline Setup </span>

<span style="color: #D8BFD8;">importing libraries </span>

In [None]:
import torch
import torch.nn as nn
import torch.optim as optim
import torchvision
import torchvision.transforms as T
from torchvision import models

<span style="color: #D8BFD8;">device setup </span>

In [None]:
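# prefer Apple's MPS backend when it is available, otherwise fall back to the CPU
device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")
# device = torch.device("cpu")  # uncomment to force CPU, e.g. while debugging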
print(device)

<span style="color: #D8BFD8;">data </span>

In [None]:
# data transformations 
transform = T.Compose([
    # CIFAR-10 images are 32x32, resnet expects 224x224 images
    T.Resize(224),
    # transform PIL images to tensors
    T.ToTensor(),
    # normalise using imagenet mean and sd
    T.Normalize(
        mean = [0.485,0.456,0.406],
        std = [0.229,0.224,0.225]
    )
])

print(type(T))  # sanity check: T should still be the torchvision.transforms module, not overwritten

# load dataset
train_dataset = torchvision.datasets.CIFAR10(root="./task1data", train=True, transform=transform, download=True)
test_dataset = torchvision.datasets.CIFAR10(root="./task1data", train=False, transform=transform, download=True)

train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=32, shuffle=True)
test_loader = torch.utils.data.DataLoader(test_dataset, batch_size=32, shuffle=False)


<span style="color: MediumPurple;">(a) Use a pre-trained ResNet-152 from PyTorch. </span>

In [None]:
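# `pretrained=True` is deprecated since torchvision 0.13; IMAGENET1K_V1 selects the same weights
model = models.resnet152(weights=models.ResNet152_Weights.IMAGENET1K_V1)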

<span style="color: MediumPurple;">(b) Replace the final classification layer to match a smaller dataset such as CIFAR-10. </span>

In [None]:
number_of_features = model.fc.in_features
model.fc = nn.Linear(number_of_features , 10)

<span style="color: MediumPurple;">(c) Train only the classification head while freezing the rest of the backbone. </span>

In [None]:
# freeze all layers
for param in model.parameters():
    param.requires_grad = False

# unfreeze only final classification layer
for param in model.fc.parameters():
    param.requires_grad = True

In [None]:
model = model.to(device)
print(device)

In [None]:
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.fc.parameters(), lr=0.001)  # lr=0.001 is Adam's default learning rate

<span style="color: MediumPurple;">(d) Record training and validation performance for a few epochs. </span>

In [None]:
# training loop
for epoch in range(3):
    # train mode: BatchNorm layers in the frozen backbone still update their running statistics
    model.train()
    running_loss = 0.0

    for images, labels in train_loader:
        images, labels = images.to(device), labels.to(device)

        # reset gradients
        optimizer.zero_grad()
        # forward pass
        outputs = model(images)
        loss = criterion(outputs, labels)
        # backpropagation
        loss.backward()
        # update weights        
        optimizer.step()

        running_loss += loss.item()

    print(f"Epoch [{epoch+1}/3], Loss: {running_loss/len(train_loader):.4f}")

# testing loop (eval)
model.eval()
correct, total = 0, 0
with torch.no_grad():  # no gradient computation needed
    for images, labels in test_loader:
        images, labels = images.to(device), labels.to(device)
        outputs = model(images)
        _, predicted = torch.max(outputs, 1)   # highest score = prediction
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

print(f"Test Accuracy: {100 * correct / total:.2f}%")
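
The loop above prints the training loss per epoch and evaluates once at the end. To record training and validation performance for every epoch, one option is to wrap the evaluation in a function and store the numbers in a dictionary. The sketch below is a minimal illustration under stated assumptions: `evaluate` and `history` are hypothetical names, and a tiny linear model on random data stands in for ResNet-152 and CIFAR-10 (the same toy loader is reused for validation) so that it runs in seconds.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset


def evaluate(model, loader, criterion, device):
    """Return (mean per-sample loss, accuracy in %) of `model` over `loader`."""
    model.eval()
    loss_sum, correct, total = 0.0, 0, 0
    with torch.no_grad():
        for inputs, labels in loader:
            inputs, labels = inputs.to(device), labels.to(device)
            outputs = model(inputs)
            # weight the batch-mean loss by batch size to get a per-sample mean
            loss_sum += criterion(outputs, labels).item() * labels.size(0)
            correct += (outputs.argmax(dim=1) == labels).sum().item()
            total += labels.size(0)
    return loss_sum / total, 100.0 * correct / total


# Toy stand-ins (assumption): random data and a linear classifier.
torch.manual_seed(0)
toy_x, toy_y = torch.randn(64, 8), torch.randint(0, 10, (64,))
toy_loader = DataLoader(TensorDataset(toy_x, toy_y), batch_size=16)
toy_model = nn.Linear(8, 10)
toy_criterion = nn.CrossEntropyLoss()
toy_optimizer = torch.optim.Adam(toy_model.parameters(), lr=0.001)

history = {"train_loss": [], "val_loss": [], "val_acc": []}
for epoch in range(3):
    toy_model.train()
    running_loss = 0.0
    for inputs, labels in toy_loader:
        toy_optimizer.zero_grad()
        loss = toy_criterion(toy_model(inputs), labels)
        loss.backward()
        toy_optimizer.step()
        running_loss += loss.item()
    # evaluate after every epoch instead of only once at the end
    val_loss, val_acc = evaluate(toy_model, toy_loader, toy_criterion, "cpu")
    history["train_loss"].append(running_loss / len(toy_loader))
    history["val_loss"].append(val_loss)
    history["val_acc"].append(val_acc)

print(history)
```

In the notebook itself the call would take `model`, `test_loader`, `criterion`, and `device` instead of the toy objects.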

### <span style="color: DarkCyan;">Why is it unnecessary (and impractical) to train ResNet-152 from scratch on small datasets? What does freezing most of the network tell us about the transferability of features? </span>

ANSWER = <span style="color: #D8BFD8;"> ... </span>

### <span style="color: HotPink;">2. Residual Connections in Practice </span>

<span style="color: MediumPurple;">(a) Disable skip connections in a few selected residual blocks and re-train the modified network head. </span>

<span style="color: MediumPurple;">(b) Compare training dynamics and validation accuracy with the baseline. </span>

### <span style="color:DarkCyan;">How do skip connections change gradient flow in very deep networks? What happens to convergence speed and performance when residuals are removed?  </span>

ANSWER = <span style="color: #D8BFD8;"> ... </span>

### <span style="color: HotPink;">3. Feature Hierarchies and Representations </span>

<span style="color: MediumPurple;">(a) Collect features from early, middle, and late layers of the network.  </span>

<span style="color: MediumPurple;">(b) Visualize these features using dimensionality reduction (t-SNE or UMAP). </span>
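
As a rough sketch of the reduction step, scikit-learn's `TSNE` can project a feature matrix of shape `(n_samples, n_features)` to two dimensions. The data below is synthetic (three well-separated Gaussian clusters) and the perplexity value is an arbitrary assumption. In the notebook the input would be the features collected from a given layer.

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)

# Synthetic stand-in (assumption): 3 classes, 20 samples each, 64-dimensional features.
labels = np.repeat(np.arange(3), 20)
feats = rng.normal(size=(60, 64)) + 5.0 * labels[:, None]

# perplexity must be smaller than the number of samples
tsne = TSNE(n_components=2, perplexity=10, init="pca", random_state=0)
embedding = tsne.fit_transform(feats)

print(embedding.shape)
```

The result can then be plotted with, for example, `plt.scatter(embedding[:, 0], embedding[:, 1], c=labels)` from matplotlib, with one plot per layer to compare separability.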

### <span style="color:DarkCyan;">How does class separability evolve across layers? What differences can you observe between low-level and high-level representations?  </span>

ANSWER = <span style="color: #D8BFD8;"> ... </span>

### <span style="color: HotPink;"> 4. Transfer Learning and Generalization </span>

<span style="color: MediumPurple;">(a) Fine-tune the model on a dataset different from ImageNet.  </span>

<span style="color: MediumPurple;">(b) Compare performance between: </span>

<span style="color: MediumPurple;">1. (a) using ImageNet-pretrained weights, and </span>

<span style="color: MediumPurple;">2. (b) training from random initialization. </span>

<span style="color: MediumPurple;">(c) Experiment with fine-tuning only the final block versus the full backbone.  </span>

### <span style="color:DarkCyan;"> Which setting provides the best trade-off between compute and accuracy? Which layers seem most transferable across datasets, and why?  </span>

ANSWER = <span style="color: #D8BFD8;"> ... </span>

### <span style="color: HotPink;"> 5. Optional Experiments </span>

<span style="color: MediumPurple;">(a) Compare t-SNE vs UMAP in representing feature separability. </span>

<span style="color: MediumPurple;">(b) Analyze feature similarities between classes that ResNet tends to confuse.  </span>

<span style="color: MediumPurple;">(c) Compare feature quality from ResNet-152 with a shallower ResNet (e.g., ResNet-18).   </span>