In [1]:
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision import models
from data import getDataLoaders

In [7]:
# Load ResNet-18 with its default ImageNet-pretrained weights from torchvision and display its architecture
resnet18 = models.resnet18(weights='DEFAULT')
resnet18

ResNet(
  (conv1): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
  (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (relu): ReLU(inplace=True)
  (maxpool): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
  (layer1): Sequential(
    (0): BasicBlock(
      (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
    (1): BasicBlock(
      (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
  

In this pre-trained ResNet model, two layers need to be adjusted: the first and the last. ResNet was pre-trained on the ImageNet dataset, using 224x224 input crops classified into 1,000 categories. Here are the two main issues and their solutions:

1) **Input image size and normalization**: Pre-trained models expect mini-batches of 3-channel RGB images of shape (3 x H x W), where H and W are at least 224 pixels (see the [PyTorch ResNet Documentation](https://pytorch.org/hub/pytorch_vision_resnet/)). The pixel values must first be scaled to the range [0, 1] and then normalized with the following mean and standard deviation values (calculated from ImageNet data):
    - `mean = [0.485, 0.456, 0.406]`
    - `std = [0.229, 0.224, 0.225]`
   
***Solution***: CIFAR-10 images are only 32x32 pixels and have different statistical properties. We could upsample them to 224x224 and normalize them as above, but that multiplies the number of pixels per image by (224/32)² = 49. Instead, we adapt the first layer to the small input size. The original 7x7 convolution with stride 2, followed by the stride-2 max-pool, shrinks a 32x32 image to 8x8 before the first residual block and discards fine detail; a 3x3 convolution with stride 1 preserves the input resolution. Note that the replacement layer is randomly initialized, so the pre-trained weights of the original first layer are lost.
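To make the downsampling argument concrete, the sketch below applies the standard output-size formula for convolution and pooling layers, `floor((n + 2p - k) / s) + 1`, to a 32x32 input. It assumes the 3x3, stride-2 max-pool is kept (as in the cells below) and models each of `layer2`, `layer3` and `layer4` by the 3x3, stride-2, padding-1 convolution that opens it. The helper names are hypothetical, not part of torchvision.

```python
def conv_out(size, kernel, stride, padding):
    """Side length after a conv or pooling layer (dilation 1, floor rounding)."""
    return (size + 2 * padding - kernel) // stride + 1


def stem_output(size, kernel, stride, padding):
    """Side length after conv1 followed by ResNet's 3x3, stride-2, padding-1 max-pool."""
    after_conv = conv_out(size, kernel, stride, padding)
    return conv_out(after_conv, kernel=3, stride=2, padding=1)


def final_feature_map(size):
    """Side length after layer2, layer3 and layer4, each of which halves the resolution."""
    for _ in range(3):
        size = conv_out(size, kernel=3, stride=2, padding=1)
    return size


original = stem_output(32, kernel=7, stride=2, padding=3)  # ImageNet stem: 7x7, stride 2
modified = stem_output(32, kernel=3, stride=1, padding=1)  # 3x3, stride-1 stem used here
print(original, modified)  # 8 16
print(final_feature_map(original), final_feature_map(modified))  # 1 2
```

With the original stem, `layer4` ends up with a 1x1 feature map per channel; with the 3x3 stem it still has a 2x2 grid.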

2) **Output layer**: The final fully connected layer, `(fc): Linear(in_features=512, out_features=1000, bias=True)`, outputs 1,000 values (logits), one for each of the 1,000 ImageNet classes.

***Solution***: The CIFAR-10 dataset has 10 classes. Therefore, we replace the final fully connected layer with one that has `out_features=10`: `(fc): Linear(in_features=512, out_features=10, bias=True)`. The new layer is randomly initialized, so it has to be trained before the model can classify CIFAR-10 images.
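Only the number of outputs changes: the 512 input features produced by the global average pool stay the same. As a quick check of how much smaller the new head is, the sketch below counts the parameters of a fully connected layer (an `in_features x out_features` weight matrix plus one bias per output). `linear_param_count` is a hypothetical helper, not a PyTorch function.

```python
def linear_param_count(in_features, out_features, bias=True):
    """Trainable parameters of a fully connected layer: weights plus optional biases."""
    return in_features * out_features + (out_features if bias else 0)


imagenet_head = linear_param_count(512, 1000)  # original fc layer
cifar_head = linear_param_count(512, 10)       # replacement fc layer
print(imagenet_head, cifar_head)  # 513000 5130
```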

In [8]:
resnet_cifar = models.resnet18(weights='DEFAULT')

In [9]:
resnet_cifar.conv1 = nn.Conv2d(3, 64, kernel_size=3, stride=1, padding=1, bias=False)
resnet_cifar.conv1, resnet18.conv1

(Conv2d(3, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False),
 Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False))

In [10]:
resnet_cifar.fc = nn.Linear(in_features=512, out_features=10, bias=True)
resnet_cifar.fc, resnet18.fc

(Linear(in_features=512, out_features=10, bias=True),
 Linear(in_features=512, out_features=1000, bias=True))

As a baseline, we first evaluate the modified model without any training on CIFAR-10. We normalize the CIFAR-10 images with the ImageNet mean and standard deviation given above, because the pre-trained layers were trained on inputs normalized this way and expect the same input distribution.
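The transform printed in the dataset summary below scales the images to [0, 1] (`ToDtype(scale=True)`) and then applies `Normalize`, which standardizes each channel as `(x - mean) / std`. Here is a minimal pure-Python sketch of that per-pixel arithmetic, assuming 8-bit RGB values in 0 to 255; `normalize_pixel` is a hypothetical helper, not the `data` module's code.

```python
IMAGENET_MEAN = [0.485, 0.456, 0.406]
IMAGENET_STD = [0.229, 0.224, 0.225]


def normalize_pixel(rgb, mean=IMAGENET_MEAN, std=IMAGENET_STD):
    """Scale an 8-bit RGB pixel to [0, 1], then standardize each channel."""
    return [(value / 255 - m) / s for value, m, s in zip(rgb, mean, std)]


print([round(v, 4) for v in normalize_pixel((0, 0, 0))])        # [-2.1179, -2.0357, -1.8044]
print([round(v, 4) for v in normalize_pixel((255, 255, 255))])  # [2.2489, 2.4286, 2.64]
```

After normalization the inputs are no longer confined to [0, 1]; black and white pixels map to roughly -2 and +2.5 respectively.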

In [11]:
mean = [0.485, 0.456, 0.406]
std = [0.229, 0.224, 0.225]

train_loader, val_loader = getDataLoaders(mean=mean, std=std)
train_loader.dataset

Files already downloaded and verified
Files already downloaded and verified


Dataset CIFAR10
    Number of datapoints: 50000
    Root location: ./data/cifar-10-batches-py/
    Split: Train
    StandardTransform
Transform: Compose(
                 ToImage()
                 ToDtype(scale=True)
                 Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225], inplace=False)
           )

In [15]:
import torch
from tqdm import tqdm

def evaluate(model, dataloader, device, criterion=None):
    """Compute top-1 accuracy on `dataloader`.

    If a loss function `criterion` is given, also return the average loss per sample.
    """
    model.eval()  # Set the model to evaluation mode (BatchNorm uses running statistics)
    running_loss = 0.0
    correct = 0
    total = 0
    
    # Disable gradient calculation: no backward pass is needed for evaluation
    with torch.no_grad():
        for inputs, labels in tqdm(dataloader, desc="Evaluating"):
            inputs, labels = inputs.to(device), labels.to(device)
            
            # Forward pass: logits of shape (batch_size, num_classes)
            outputs = model(inputs)
            
            # Accumulate the summed loss if a loss function is provided
            # (assumes the criterion averages over the batch, i.e. reduction='mean')
            if criterion is not None:
                loss = criterion(outputs, labels)
                running_loss += loss.item() * inputs.size(0)
            
            # Predicted class = index of the largest logit
            _, predicted = torch.max(outputs, 1)
            
            # Count correct predictions
            correct += (predicted == labels).sum().item()
            total += labels.size(0)
    
    accuracy = correct / total
    
    if criterion is not None:
        return accuracy, running_loss / total
    return accuracy

# Example usage
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = resnet_cifar.to(device)  # Move model to the appropriate device

# val_loader is the DataLoader used for evaluation
accuracy = evaluate(model, val_loader, device)
print(f"Validation Accuracy: {accuracy * 100:.2f}%")


Evaluating: 100%|█████████████████████████████| 157/157 [00:18<00:00,  8.57it/s]

Validation Accuracy: 10.54%
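This result is essentially chance level for a 10-class problem: the new `fc` layer (like the new `conv1`) is randomly initialized, so before training the logits carry no information about the CIFAR-10 classes. As a rough check, the sketch below compares the observed accuracy with the 10% chance rate using the binomial standard error. It assumes balanced classes and a validation set equal to the 10,000-image CIFAR-10 test split; that size would be consistent with the 157 batches above at a batch size of 64, but the `data` module's settings are not shown.

```python
import math


def binomial_std_error(p, n):
    """Standard error of an accuracy estimate p measured on n independent samples."""
    return math.sqrt(p * (1.0 - p) / n)


n_val = 10_000       # assumption: size of the validation split
p0 = 1.0 / 10        # chance accuracy for 10 balanced classes
observed = 0.1054    # validation accuracy reported above
se = binomial_std_error(p0, n_val)
z = (observed - p0) / se
print(f"chance={p0:.2%}, std error={se:.2%}, z={z:.2f}")  # chance=10.00%, std error=0.30%, z=1.80
```

A z-score below about 2 means the 0.54-point gap over chance is within what random guessing would plausibly produce on a sample of this size.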



