Evaluating a VGG Model on an Independent and Identically Distributed Dataset

Importing the required libraries

In [2]:
import torch
import torch.nn as nn
import torch.optim as optim
import torchvision.transforms as transforms
import torchvision.datasets as datasets
from torch.utils.data import DataLoader
from tqdm import tqdm
import torchvision.models as models
import os
from git import Repo  # GitPython; required by the Repo.clone_from calls in later cells
from PIL import Image
from torch.utils.data import Dataset

Loading the VGG model and selecting the device (GPU if one is available, otherwise CPU)

In [3]:
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

model = models.vgg19(weights=models.VGG19_Weights.IMAGENET1K_V1)

Downloading: "https://download.pytorch.org/models/vgg19-dcbb9e9d.pth" to /root/.cache/torch/hub/checkpoints/vgg19-dcbb9e9d.pth
100%|██████████| 548M/548M [00:08<00:00, 67.5MB/s]


Getting the pretrained VGG model ready for training

This step freezes all existing parameters of the pretrained model and replaces the final classifier layer with a new layer that has 10 outputs, since CIFAR-10 has 10 classes. Only this new layer is trained. We then move the model to the available device and define the loss function (cross-entropy) and the optimizer (Adam with a learning rate of 0.001).
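As a rough check on how little of the network remains trainable after this step, the calculation below counts the parameters of the replacement head. It assumes the last VGG19 classifier layer takes 4096 input features, which is the value `model.classifier[6].in_features` returns for torchvision's VGG19; the helper name `linear_param_count` is ours and used for illustration only.

```python
def linear_param_count(in_features: int, out_features: int) -> int:
    """Number of learnable values in a fully connected layer (weights plus biases)."""
    return in_features * out_features + out_features

# Assumption: VGG19's final classifier layer has 4096 input features.
original_head = linear_param_count(4096, 1000)  # ImageNet head with 1000 classes
new_head = linear_param_count(4096, 10)         # CIFAR-10 head with 10 classes

print(original_head)  # 4097000
print(new_head)       # 40970
```

Because every other parameter is frozen, the optimizer only ever updates these roughly 41 thousand values.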

In [4]:
for param in model.parameters():
    param.requires_grad = False

num_ftrs = model.classifier[6].in_features
model.classifier[6] = nn.Linear(num_ftrs, 10)

model.to(device)

criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.classifier[6].parameters(), lr=0.001)

In this step we define the image transformations applied to the CIFAR-10 dataset. Each image is resized to 224 pixels (the input size the pretrained VGG expects), converted to a tensor, and normalized with the ImageNet channel means and standard deviations. The transformed dataset is stored in the variable trainset.
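To make the normalization step concrete, here is a small pure-Python sketch of the per-channel rule that `transforms.Normalize` applies, `(value - mean) / std`, using the same ImageNet statistics as the cell below. The helper `normalize_pixel` and the mid-grey example pixel are illustrative assumptions, not part of the notebook.

```python
def normalize_pixel(value, mean, std):
    """Per-channel normalization rule: (value - mean) / std."""
    return (value - mean) / std

imagenet_mean = [0.485, 0.456, 0.406]
imagenet_std = [0.229, 0.224, 0.225]

# After ToTensor(), pixel values lie in [0, 1]; a mid-grey pixel is 0.5 in every channel.
pixel = [0.5, 0.5, 0.5]
normalized = [
    normalize_pixel(v, m, s)
    for v, m, s in zip(pixel, imagenet_mean, imagenet_std)
]

print([round(v, 3) for v in normalized])  # [0.066, 0.196, 0.418]
```

The same grey pixel maps to a different value in each channel, which is why evaluation data should be normalized with the same statistics as the training data.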

We also create trainloader, which serves the training set in shuffled batches of 32 images. Mini-batches keep memory use bounded and let the optimizer update the weights after every batch instead of once per pass over the data.
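The number of batches per epoch follows directly from the dataset size and the batch size. The sketch below assumes the loader keeps the final, smaller batch (the `DataLoader` default, `drop_last=False`); `num_batches` is a hypothetical helper name.

```python
import math

def num_batches(num_samples: int, batch_size: int) -> int:
    """Batches per epoch when the last, possibly smaller, batch is kept."""
    return math.ceil(num_samples / batch_size)

print(num_batches(50_000, 32))  # CIFAR-10 training set: 1563
print(num_batches(10_000, 32))  # CIFAR-10 test set: 313
```

These counts match the totals shown in the progress bars of the training and evaluation cells.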

In [5]:
transform = transforms.Compose([
    transforms.Resize(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

trainset = datasets.CIFAR10(root='./data', train=True, download=True, transform=transform)
trainloader = DataLoader(trainset, batch_size=32, shuffle=True, num_workers=4)

Downloading https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz to ./data/cifar-10-python.tar.gz


100%|██████████| 170498071/170498071 [00:04<00:00, 38746264.53it/s]


Extracting ./data/cifar-10-python.tar.gz to ./data




Training and Finetuning the VGG Model

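In this step the VGG model is fine-tuned for 3 epochs. For each batch we run a forward pass, compute the loss, backpropagate, and let the optimizer update the weights of the new classifier layer. The loss shown in the progress bar is the running mean over the batches seen so far in the current epoch.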
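The logged value `running_loss / (i + 1)` is a cumulative mean, not the loss of the most recent batch. A minimal sketch with made-up per-batch losses (the numbers are hypothetical) shows how it behaves:

```python
batch_losses = [0.92, 0.80, 0.74, 0.70]  # hypothetical per-batch losses

running_loss = 0.0
running_means = []
for i, loss in enumerate(batch_losses):
    running_loss += loss
    # Same expression the training loop passes to the progress bar
    running_means.append(running_loss / (i + 1))

print([round(m, 3) for m in running_means])  # [0.92, 0.86, 0.82, 0.79]
```

Since early batches stay in the average, the reported number changes slowly late in an epoch even when individual batch losses fluctuate.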

In [6]:

model.train()
num_epochs = 3

for epoch in range(num_epochs):
    running_loss = 0.0
    with tqdm(total=len(trainloader), desc=f'Epoch {epoch + 1}/{num_epochs}', unit='batch') as pbar:
        for i, (images, labels) in enumerate(trainloader):
            images, labels = images.to(device), labels.to(device)

            optimizer.zero_grad()

            outputs = model(images)
            loss = criterion(outputs, labels)

            loss.backward()
            optimizer.step()

            running_loss += loss.item()
            pbar.set_postfix(loss=running_loss / (i + 1))
            pbar.update(1)

            if (i + 1) % 100 == 0:
                print(f'Epoch [{epoch + 1}], Step [{i + 1}], Loss: {running_loss / (i + 1):.4f}')


Epoch 1/3:   6%|▋         | 101/1563 [00:20<04:38,  5.25batch/s, loss=0.914]

Epoch [1], Step [100], Loss: 0.9166


Epoch 1/3:  13%|█▎        | 201/1563 [00:40<04:29,  5.05batch/s, loss=0.798]

Epoch [1], Step [200], Loss: 0.7982


Epoch 1/3:  19%|█▉        | 301/1563 [01:00<04:20,  4.84batch/s, loss=0.757]

Epoch [1], Step [300], Loss: 0.7580


Epoch 1/3:  26%|██▌       | 401/1563 [01:21<03:53,  4.98batch/s, loss=0.724]

Epoch [1], Step [400], Loss: 0.7236


Epoch 1/3:  32%|███▏      | 501/1563 [01:41<03:35,  4.94batch/s, loss=0.71]

Epoch [1], Step [500], Loss: 0.7093


Epoch 1/3:  38%|███▊      | 601/1563 [02:01<03:10,  5.04batch/s, loss=0.693]

Epoch [1], Step [600], Loss: 0.6931


Epoch 1/3:  45%|████▍     | 700/1563 [02:21<03:09,  4.55batch/s, loss=0.685]

Epoch [1], Step [700], Loss: 0.6854


Epoch 1/3:  51%|█████     | 800/1563 [02:41<02:33,  4.98batch/s, loss=0.678]

Epoch [1], Step [800], Loss: 0.6778


Epoch 1/3:  58%|█████▊    | 900/1563 [03:01<02:14,  4.94batch/s, loss=0.668]

Epoch [1], Step [900], Loss: 0.6682


Epoch 1/3:  64%|██████▍   | 1001/1563 [03:22<01:51,  5.02batch/s, loss=0.666]

Epoch [1], Step [1000], Loss: 0.6664


Epoch 1/3:  70%|███████   | 1101/1563 [03:42<01:32,  5.00batch/s, loss=0.665]

Epoch [1], Step [1100], Loss: 0.6643


Epoch 1/3:  77%|███████▋  | 1201/1563 [04:02<01:12,  5.02batch/s, loss=0.662]

Epoch [1], Step [1200], Loss: 0.6619


Epoch 1/3:  83%|████████▎ | 1301/1563 [04:22<00:52,  4.98batch/s, loss=0.663]

Epoch [1], Step [1300], Loss: 0.6629


Epoch 1/3:  90%|████████▉ | 1401/1563 [04:42<00:32,  4.97batch/s, loss=0.661]

Epoch [1], Step [1400], Loss: 0.6610


Epoch 1/3:  96%|█████████▌| 1501/1563 [05:02<00:12,  5.01batch/s, loss=0.66]

Epoch [1], Step [1500], Loss: 0.6600


Epoch 1/3: 100%|██████████| 1563/1563 [05:15<00:00,  4.96batch/s, loss=0.66]
Epoch 2/3:   6%|▋         | 101/1563 [00:20<04:58,  4.89batch/s, loss=0.582]

Epoch [2], Step [100], Loss: 0.5827


Epoch 2/3:  13%|█▎        | 200/1563 [00:40<04:30,  5.04batch/s, loss=0.586]

Epoch [2], Step [200], Loss: 0.5868


Epoch 2/3:  19%|█▉        | 300/1563 [01:00<04:13,  4.98batch/s, loss=0.594]

Epoch [2], Step [300], Loss: 0.5944


Epoch 2/3:  26%|██▌       | 401/1563 [01:20<03:50,  5.04batch/s, loss=0.6]

Epoch [2], Step [400], Loss: 0.5997


Epoch 2/3:  32%|███▏      | 500/1563 [01:40<03:46,  4.69batch/s, loss=0.607]

Epoch [2], Step [500], Loss: 0.6073


Epoch 2/3:  38%|███▊      | 601/1563 [02:01<03:11,  5.03batch/s, loss=0.608]

Epoch [2], Step [600], Loss: 0.6078


Epoch 2/3:  45%|████▍     | 701/1563 [02:21<02:55,  4.92batch/s, loss=0.611]

Epoch [2], Step [700], Loss: 0.6105


Epoch 2/3:  51%|█████     | 800/1563 [02:41<02:32,  5.00batch/s, loss=0.612]

Epoch [2], Step [800], Loss: 0.6116


Epoch 2/3:  58%|█████▊    | 901/1563 [03:02<02:12,  5.01batch/s, loss=0.615]

Epoch [2], Step [900], Loss: 0.6149


Epoch 2/3:  64%|██████▍   | 1001/1563 [03:22<01:51,  5.03batch/s, loss=0.62]

Epoch [2], Step [1000], Loss: 0.6198


Epoch 2/3:  70%|███████   | 1101/1563 [03:42<01:32,  4.99batch/s, loss=0.624]

Epoch [2], Step [1100], Loss: 0.6238


Epoch 2/3:  77%|███████▋  | 1200/1563 [04:02<01:12,  4.98batch/s, loss=0.624]

Epoch [2], Step [1200], Loss: 0.6238


Epoch 2/3:  83%|████████▎ | 1301/1563 [04:22<00:52,  5.04batch/s, loss=0.626]

Epoch [2], Step [1300], Loss: 0.6263


Epoch 2/3:  90%|████████▉ | 1401/1563 [04:43<00:32,  4.94batch/s, loss=0.631]

Epoch [2], Step [1400], Loss: 0.6304


Epoch 2/3:  96%|█████████▌| 1501/1563 [05:03<00:12,  5.05batch/s, loss=0.634]

Epoch [2], Step [1500], Loss: 0.6340


Epoch 2/3: 100%|██████████| 1563/1563 [05:15<00:00,  4.95batch/s, loss=0.635]
Epoch 3/3:   6%|▋         | 101/1563 [00:20<04:59,  4.88batch/s, loss=0.608]

Epoch [3], Step [100], Loss: 0.6088


Epoch 3/3:  13%|█▎        | 201/1563 [00:40<04:37,  4.90batch/s, loss=0.61]

Epoch [3], Step [200], Loss: 0.6111


Epoch 3/3:  19%|█▉        | 301/1563 [01:01<04:15,  4.94batch/s, loss=0.608]

Epoch [3], Step [300], Loss: 0.6081


Epoch 3/3:  26%|██▌       | 400/1563 [01:21<03:50,  5.05batch/s, loss=0.616]

Epoch [3], Step [400], Loss: 0.6153


Epoch 3/3:  32%|███▏      | 501/1563 [01:41<03:34,  4.96batch/s, loss=0.626]

Epoch [3], Step [500], Loss: 0.6267


Epoch 3/3:  38%|███▊      | 601/1563 [02:01<03:13,  4.96batch/s, loss=0.631]

Epoch [3], Step [600], Loss: 0.6313


Epoch 3/3:  45%|████▍     | 701/1563 [02:22<02:53,  4.96batch/s, loss=0.64]

Epoch [3], Step [700], Loss: 0.6387


Epoch 3/3:  51%|█████     | 801/1563 [02:42<02:33,  4.97batch/s, loss=0.642]

Epoch [3], Step [800], Loss: 0.6410


Epoch 3/3:  58%|█████▊    | 901/1563 [03:02<02:11,  5.04batch/s, loss=0.643]

Epoch [3], Step [900], Loss: 0.6424


Epoch 3/3:  64%|██████▍   | 1000/1563 [03:22<01:59,  4.71batch/s, loss=0.641]

Epoch [3], Step [1000], Loss: 0.6410


Epoch 3/3:  70%|███████   | 1100/1563 [03:43<01:32,  4.99batch/s, loss=0.642]

Epoch [3], Step [1100], Loss: 0.6424


Epoch 3/3:  77%|███████▋  | 1200/1563 [04:03<01:16,  4.74batch/s, loss=0.645]

Epoch [3], Step [1200], Loss: 0.6448


Epoch 3/3:  83%|████████▎ | 1301/1563 [04:23<00:53,  4.89batch/s, loss=0.644]

Epoch [3], Step [1300], Loss: 0.6437


Epoch 3/3:  90%|████████▉ | 1401/1563 [04:46<00:32,  4.94batch/s, loss=0.647]

Epoch [3], Step [1400], Loss: 0.6467


Epoch 3/3:  96%|█████████▌| 1501/1563 [05:06<00:12,  4.90batch/s, loss=0.644]

Epoch [3], Step [1500], Loss: 0.6444


Epoch 3/3: 100%|██████████| 1563/1563 [05:19<00:00,  4.89batch/s, loss=0.645]


Evaluating the model and reporting accuracy

In this step we define the test set and the test loader, run the fine-tuned model on the test images, and report the resulting accuracy.
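The accuracy computation boils down to taking the highest-scoring class per image and counting matches with the labels. The following pure-Python sketch mirrors that logic on a tiny hand-made example; the scores and labels are invented for illustration, and `argmax` and `accuracy_percent` are hypothetical helper names.

```python
def argmax(scores):
    """Index of the largest score, i.e. the predicted class."""
    return max(range(len(scores)), key=lambda k: scores[k])

def accuracy_percent(logits, labels):
    """Percentage of rows whose highest-scoring class equals the label."""
    correct = sum(1 for row, y in zip(logits, labels) if argmax(row) == y)
    return 100 * correct / len(labels)

# Four hypothetical samples with three classes each
logits = [[2.0, 0.1, -1.0], [0.3, 0.2, 1.5], [0.0, 3.0, 0.5], [1.2, 0.9, 0.1]]
labels = [0, 2, 1, 1]

print(accuracy_percent(logits, labels))  # 75.0
```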

In [None]:
testset = datasets.CIFAR10(root='./data', train=False, download=True, transform=transform)
testloader = torch.utils.data.DataLoader(testset, batch_size=32, shuffle=False)

model.eval()
correct = 0
total = 0

with torch.no_grad():
    for images, labels in testloader:
        images, labels = images.to(device), labels.to(device)

        outputs = model(images)
        _, predicted = torch.max(outputs, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

cifar10acc = 100 * correct / total  # baseline accuracy, reused in later cells

print(f'Accuracy on CIFAR-10 test set: {cifar10acc:.2f}%')


Accuracy on CIFAR-10 test set: 82.97%




# TASK 4 Inductive Biases of Models: Semantic Biases

We will do the following:

*   Use the VGG model (fine-tuned on CIFAR-10) and validate it on a variation of CIFAR-10 that probes shape bias
*   Use the VGG model (fine-tuned on CIFAR-10) and validate it on a variation of CIFAR-10 that probes texture bias
*   Fine-tune the VGG model on the MNIST dataset and then evaluate it on a colourized MNIST dataset to measure the colour bias






In [None]:
repo_url = 'https://github.com/bdevans/CIFAR-10G.git'
clone_dir = 'CIFAR-10G'


if not os.path.exists(clone_dir):
    print("Cloning the CIFAR-10G repository...")
    Repo.clone_from(repo_url, clone_dir)
    print("Repository cloned successfully.")
else:
    print("Repository already exists. Skipping clone.")


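data_dir = os.path.join(clone_dir, '224x224')

# Note: these normalization statistics differ from the ImageNet mean/std used
# during fine-tuning above, which may affect the accuracies reported below.
transform = transforms.Compose([
    transforms.Resize(224),
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
])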

Cloning the CIFAR-10G repository...
Repository cloned successfully.


**SHAPE BIAS**

In [None]:

subdirs = ['contours', 'contours_inverted', 'line_drawings', 'line_drawings_inverted', 'silhouettes', 'silhouettes_inverted']
datasets_dict = {}
validation_loaders = {}
accuracy_dict = {}

for subdir in subdirs:
    dataset_path = os.path.join(data_dir, subdir)

    dataset = datasets.ImageFolder(root=dataset_path, transform=transform)
    dataset_size = len(dataset)
    val_size = int(0.2 * dataset_size)
    train_size = dataset_size - val_size
    train_dataset, val_dataset = torch.utils.data.random_split(dataset, [train_size, val_size])

    # Load datasets into DataLoader
    train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True, num_workers=4)
    val_loader = DataLoader(val_dataset, batch_size=32, shuffle=False, num_workers=4)

    datasets_dict[subdir] = train_loader
    validation_loaders[subdir] = val_loader
    accuracy_dict[subdir] = []



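Here we evaluate the model on each of the subdirectories and store the accuracy for each one for later calculations. Because the model is in evaluation mode and its weights do not change, the three passes over each validation split give identical results.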

In [None]:
num_epochs = 3

for subdir, validation_loader in validation_loaders.items():
    print(f"Validating on {subdir} dataset")

    for epoch in range(num_epochs):
        model.eval()
        correct = 0
        total = 0

        with tqdm(total=len(validation_loader), desc=f'Epoch {epoch + 1}/{num_epochs}', unit='batch') as pbar:
            with torch.no_grad():
                running_loss = 0.0
                for i, (images, labels) in enumerate(validation_loader):

                    images, labels = images.to(device), labels.to(device)
                    outputs = model(images)
                    loss = criterion(outputs, labels)
                    running_loss += loss.item()


                    _, predicted = torch.max(outputs, 1)
                    total += labels.size(0)
                    correct += (predicted == labels).sum().item()

                    pbar.set_postfix(loss=running_loss / (i + 1))
                    pbar.update(1)

        accuracy = 100 * correct / total
        accuracy_dict[subdir].append(accuracy)

        print(f"Validation Accuracy on {subdir} dataset after epoch {epoch + 1}: {accuracy:.2f}%")

    print(f"Finished validating on {subdir} dataset\n")



for subdir, accuracies in accuracy_dict.items():
    print(f"Validation accuracies for {subdir} dataset over {num_epochs} epochs: {accuracies}")


Validating on contours dataset


Epoch 1/3: 100%|██████████| 1/1 [00:00<00:00,  2.53batch/s, loss=9.43]


Validation Accuracy on contours dataset after epoch 1: 10.00%


Epoch 2/3: 100%|██████████| 1/1 [00:00<00:00,  2.98batch/s, loss=9.43]


Validation Accuracy on contours dataset after epoch 2: 10.00%


Epoch 3/3: 100%|██████████| 1/1 [00:00<00:00,  2.82batch/s, loss=9.43]


Validation Accuracy on contours dataset after epoch 3: 10.00%
Finished validating on contours dataset

Validating on contours_inverted dataset


Epoch 1/3: 100%|██████████| 1/1 [00:00<00:00,  2.85batch/s, loss=5.67]


Validation Accuracy on contours_inverted dataset after epoch 1: 15.00%


Epoch 2/3: 100%|██████████| 1/1 [00:00<00:00,  2.86batch/s, loss=5.67]


Validation Accuracy on contours_inverted dataset after epoch 2: 15.00%


Epoch 3/3: 100%|██████████| 1/1 [00:00<00:00,  3.00batch/s, loss=5.67]


Validation Accuracy on contours_inverted dataset after epoch 3: 15.00%
Finished validating on contours_inverted dataset

Validating on line_drawings dataset


Epoch 1/3: 100%|██████████| 1/1 [00:00<00:00,  2.98batch/s, loss=4.95]


Validation Accuracy on line_drawings dataset after epoch 1: 35.00%


Epoch 2/3: 100%|██████████| 1/1 [00:00<00:00,  2.21batch/s, loss=4.95]


Validation Accuracy on line_drawings dataset after epoch 2: 35.00%


Epoch 3/3: 100%|██████████| 1/1 [00:00<00:00,  1.71batch/s, loss=4.95]


Validation Accuracy on line_drawings dataset after epoch 3: 35.00%
Finished validating on line_drawings dataset

Validating on line_drawings_inverted dataset


Epoch 1/3: 100%|██████████| 1/1 [00:00<00:00,  1.14batch/s, loss=4.67]


Validation Accuracy on line_drawings_inverted dataset after epoch 1: 45.00%


Epoch 2/3: 100%|██████████| 1/1 [00:00<00:00,  1.60batch/s, loss=4.67]


Validation Accuracy on line_drawings_inverted dataset after epoch 2: 45.00%


Epoch 3/3: 100%|██████████| 1/1 [00:00<00:00,  2.02batch/s, loss=4.67]


Validation Accuracy on line_drawings_inverted dataset after epoch 3: 45.00%
Finished validating on line_drawings_inverted dataset

Validating on silhouettes dataset


Epoch 1/3: 100%|██████████| 1/1 [00:00<00:00,  2.49batch/s, loss=5.02]


Validation Accuracy on silhouettes dataset after epoch 1: 40.00%


Epoch 2/3: 100%|██████████| 1/1 [00:00<00:00,  2.88batch/s, loss=5.02]


Validation Accuracy on silhouettes dataset after epoch 2: 40.00%


Epoch 3/3: 100%|██████████| 1/1 [00:00<00:00,  2.90batch/s, loss=5.02]


Validation Accuracy on silhouettes dataset after epoch 3: 40.00%
Finished validating on silhouettes dataset

Validating on silhouettes_inverted dataset


Epoch 1/3: 100%|██████████| 1/1 [00:00<00:00,  2.77batch/s, loss=9.11]


Validation Accuracy on silhouettes_inverted dataset after epoch 1: 20.00%


Epoch 2/3: 100%|██████████| 1/1 [00:00<00:00,  2.99batch/s, loss=9.11]


Validation Accuracy on silhouettes_inverted dataset after epoch 2: 20.00%


Epoch 3/3: 100%|██████████| 1/1 [00:00<00:00,  3.03batch/s, loss=9.11]

Validation Accuracy on silhouettes_inverted dataset after epoch 3: 20.00%
Finished validating on silhouettes_inverted dataset

Validation accuracies for contours dataset over 3 epochs: [10.0, 10.0, 10.0]
Validation accuracies for contours_inverted dataset over 3 epochs: [15.0, 15.0, 15.0]
Validation accuracies for line_drawings dataset over 3 epochs: [35.0, 35.0, 35.0]
Validation accuracies for line_drawings_inverted dataset over 3 epochs: [45.0, 45.0, 45.0]
Validation accuracies for silhouettes dataset over 3 epochs: [40.0, 40.0, 40.0]
Validation accuracies for silhouettes_inverted dataset over 3 epochs: [20.0, 20.0, 20.0]





Here we calculate the average accuracy over all the subdirectories, then divide it by the model's original accuracy on the CIFAR-10 test set to obtain the shape bias of the VGG model.
This is obtained via the formula: shape_bias = avg_accuracy / cifar10accuracy
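As a worked example of this formula, the sketch below plugs in the six per-subset accuracies printed above and the 82.97% CIFAR-10 test accuracy reported earlier. The function name `relative_accuracy` is ours; since every subset was evaluated the same number of times, averaging one value per subset gives the same mean as averaging all repeated passes.

```python
def relative_accuracy(subset_accuracies, baseline_accuracy):
    """Mean accuracy over the perturbed subsets, divided by the baseline accuracy."""
    mean_accuracy = sum(subset_accuracies) / len(subset_accuracies)
    return mean_accuracy / baseline_accuracy

# contours, contours_inverted, line_drawings, line_drawings_inverted,
# silhouettes, silhouettes_inverted (values from the validation output)
shape_accuracies = [10.0, 15.0, 35.0, 45.0, 40.0, 20.0]

shape_bias = relative_accuracy(shape_accuracies, 82.97)
print(round(shape_bias, 4))  # 0.3314
```

The result depends on the value `cifar10acc` holds when the cell runs. The 0.3290 printed by the notebook may reflect a slightly different baseline from an earlier run of the evaluation cell.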

In [None]:
total_accuracy_sum = 0
accuracy_count = 0

for subdir, accuracies in accuracy_dict.items():
    total_accuracy_sum += sum(accuracies)
    accuracy_count += len(accuracies)

average_accuracy = total_accuracy_sum / accuracy_count

final_value = average_accuracy / cifar10acc

print(f"Shape Bias using the CIFAR10G Dataset turn out to be: {final_value:.4f}")

Shape Bias using the CIFAR10G Dataset turn out to be: 0.3290


**TEXTURE BIAS:** This dataset is a variation of CIFAR-10 in which the image textures have been altered.

In [None]:
!unzip texture_bias_dataset.zip

Archive:  texture_bias_dataset.zip
   creating: airplane/
   creating: automobile/
   creating: bird/
   creating: cat/
   creating: deer/
   creating: dog/
   creating: frog/
   creating: horse/
   creating: ship/
   creating: truck/
  inflating: ship/094.png            
  inflating: ship/004.png            
  inflating: ship/031.png            
  inflating: ship/099.png            
  inflating: ship/037.png            
  inflating: ship/001.png            
  inflating: ship/097.png            
  inflating: ship/076.png            
  inflating: ship/100.png            
  inflating: ship/048.png            
  inflating: ship/007.png            
  inflating: ship/058.png            
  inflating: ship/056.png            
  inflating: ship/096.png            
  inflating: ship/079.png            
  inflating: ship/071.png            
  inflating: ship/026.png            
  inflating: ship/002.png            
  inflating: ship/068.png            
  inflating: ship/013.png            
  inf

Our code uses a CIFAR10GCustomDataset class that reads images from per-class folders on disk and maps each folder name to its CIFAR-10 class index, so the labels match those the VGG model was fine-tuned on. It also divides the texture-bias dataset into train and validation splits; however, we only use the validation split for evaluation.
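The folder-to-label indexing that such a dataset class performs can be sketched without PyTorch. The example below builds a throwaway directory tree with two class folders and lists `(path, label)` pairs; the function `index_image_folder` and the temporary files are illustrative assumptions, not part of the notebook.

```python
import os
import tempfile

def index_image_folder(root_dir, class_map):
    """Return (path, label) pairs for every file under root_dir/<class_name>/."""
    samples = []
    for class_name, class_idx in class_map.items():
        class_dir = os.path.join(root_dir, class_name)
        # Sorting makes the sample order independent of the filesystem
        for img_name in sorted(os.listdir(class_dir)):
            samples.append((os.path.join(class_dir, img_name), class_idx))
    return samples

with tempfile.TemporaryDirectory() as root:
    # Create two empty placeholder "images", one per class folder
    for name in ("airplane", "ship"):
        os.makedirs(os.path.join(root, name))
        open(os.path.join(root, name, "001.png"), "w").close()
    samples = index_image_folder(root, {"airplane": 0, "ship": 8})
    labels = [label for _, label in samples]

print(labels)  # [0, 8]
```

Passing the class map explicitly keeps the label indices tied to the CIFAR-10 class order rather than to whatever order the directories happen to be listed in.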

In [None]:

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
class CIFAR10GCustomDataset(Dataset):
    def __init__(self, root_dir, class_map, transform=None):
        self.root_dir = root_dir
        self.class_map = class_map
        self.transform = transform
        self.image_paths = []
        self.labels = []

        for class_name, class_idx in class_map.items():
            class_dir = os.path.join(root_dir, class_name)
            for img_name in os.listdir(class_dir):
                img_path = os.path.join(class_dir, img_name)
                self.image_paths.append(img_path)
                self.labels.append(class_idx)

    def __len__(self):
        return len(self.image_paths)

    def __getitem__(self, idx):
        img_path = self.image_paths[idx]
        label = self.labels[idx]
        image = Image.open(img_path).convert('RGB')

        if self.transform:
            image = self.transform(image)

        return image, label

data_dir = './'

# Mapping folder names to class indices (CIFAR-10 class names)
class_map = {
    'airplane': 0,
    'automobile': 1,
    'bird': 2,
    'cat': 3,
    'deer': 4,
    'dog': 5,
    'frog': 6,
    'horse': 7,
    'ship': 8,
    'truck': 9
}

transform = transforms.Compose([
    transforms.Resize(224),
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
])

dataset = CIFAR10GCustomDataset(root_dir=data_dir, class_map=class_map, transform=transform)
train_size = int(0.8 * len(dataset))
val_size = len(dataset) - train_size
train_dataset, val_dataset = torch.utils.data.random_split(dataset, [train_size, val_size])

train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True, num_workers=4)
val_loader = DataLoader(val_dataset, batch_size=32, shuffle=False, num_workers=4)



Here we evaluate the VGG model fine-tuned on CIFAR-10 on the texture-altered data.

In [None]:
model.eval()
correct = 0
total = 0
with torch.no_grad():
    for images, labels in tqdm(val_loader, desc="Validating", unit="batch"):
        images, labels = images.to(device), labels.to(device)
        outputs = model(images)
        _, predicted = torch.max(outputs.data, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

accuracy = 100 * correct / total
print(f'Validation Accuracy: {accuracy:.2f}%')
print("Texture Bias is: ", accuracy/cifar10acc)

Validating: 100%|██████████| 7/7 [00:01<00:00,  4.69batch/s]

Validation Accuracy: 37.50%
Texture Bias is:  0.4486719310839914





**COLOUR BIAS:** In the following cell we:

*   Load the MNIST dataset
*   Freeze the pretrained parameters of a new VGG model and replace its final layer, ready for fine-tuning
*   Use MNIST's predefined training and test splits






In [None]:

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
transform = transforms.Compose([
    transforms.Resize(224),
    transforms.Grayscale(3),
    transforms.ToTensor(),
    transforms.Normalize((0.5,), (0.5,))
])

trainset = datasets.MNIST(root='./data', train=True, download=True, transform=transform)
trainloader = DataLoader(trainset, batch_size=32, shuffle=True, num_workers=4)

valset = datasets.MNIST(root='./data', train=False, download=True, transform=transform)
valloader = DataLoader(valset, batch_size=32, shuffle=False, num_workers=4)

model2 = models.vgg19(weights=models.VGG19_Weights.IMAGENET1K_V1)


for param in model2.parameters():
    param.requires_grad = False


num_ftrs = model2.classifier[6].in_features
model2.classifier[6] = nn.Linear(num_ftrs, 10)

model2.to(device)

criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model2.classifier[6].parameters(), lr=0.001)




Here we fine-tune the VGG model for 3 epochs.

In [None]:

num_epochs = 3
model2.train()

for epoch in range(num_epochs):
    running_loss = 0.0
    with tqdm(total=len(trainloader), desc=f'Epoch {epoch + 1}/{num_epochs}', unit='batch') as pbar:
        for i, (images, labels) in enumerate(trainloader):

            images, labels = images.to(device), labels.to(device)

            optimizer.zero_grad()

            outputs = model2(images)

            loss = criterion(outputs, labels)

            loss.backward()

            optimizer.step()

            running_loss += loss.item()

            pbar.set_postfix(loss=running_loss / (i + 1))
            pbar.update(1)

    print(f'Epoch [{epoch + 1}/{num_epochs}], Loss: {running_loss / len(trainloader):.4f}')

print("Training complete!")


Epoch 1/3: 100%|██████████| 1875/1875 [05:51<00:00,  5.33batch/s, loss=0.479]


Epoch [1/3], Loss: 0.4792


Epoch 2/3: 100%|██████████| 1875/1875 [05:55<00:00,  5.28batch/s, loss=0.391]


Epoch [2/3], Loss: 0.3908


Epoch 3/3: 100%|██████████| 1875/1875 [05:55<00:00,  5.27batch/s, loss=0.378]

Epoch [3/3], Loss: 0.3780
Training complete!





Here we validate on the MNIST test set to check that the model was trained properly.

In [None]:

model2.eval()
correct = 0
total = 0
with torch.no_grad():
    for images, labels in tqdm(valloader, desc="Validating", unit="batch"):
        images, labels = images.to(device), labels.to(device)

        outputs = model2(images)
        _, predicted = torch.max(outputs, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

accuracy = 100 * correct / total
print(f'Validation Accuracy: {accuracy:.2f}%')

Validating: 100%|██████████| 313/313 [00:59<00:00,  5.24batch/s]

Validation Accuracy: 94.54%





Now we repeat the evaluation, this time on the colourized MNIST dataset, without fine-tuning on the new dataset.

In [None]:

repo_url = 'https://github.com/jayaneetha/colorized-MNIST.git'
clone_dir = 'colorized-MNIST'

if not os.path.exists(clone_dir):
    print("Cloning the colorized MNIST repository...")
    Repo.clone_from(repo_url, clone_dir)
    print("Repository cloned successfully.")
else:
    print("Repository already exists. Skipping clone.")

test_data_dir = os.path.join(clone_dir, 'testing')

transform_color = transforms.Compose([
    transforms.Resize(224),
    transforms.ToTensor(),
    transforms.Normalize((0.5,), (0.5,))
])

colorized_mnist_test = datasets.ImageFolder(root=test_data_dir, transform=transform_color)
testloader = DataLoader(colorized_mnist_test, batch_size=32, shuffle=False, num_workers=4)



Cloning the colorized MNIST repository...
Repository cloned successfully.




We evaluate our model (trained on MNIST) on the colourized MNIST dataset to observe colour biases.


*   colour bias = colorizedMNISTaccuracy/MNISTaccuracy




In [None]:
model2.eval()

correct = 0
total = 0
with torch.no_grad():
    for images, labels in tqdm(testloader, desc="Evaluating on Colorized MNIST", unit="batch"):
        images, labels = images.to(device), labels.to(device)
        outputs = model2(images)
        _, predicted = torch.max(outputs, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

accuracy2 = 100 * correct / total
print(f'Accuracy on Colorized MNIST test set: {accuracy2:.2f}%')

Evaluating on Colorized MNIST: 100%|██████████| 313/313 [00:58<00:00,  5.35batch/s]

Accuracy on Colorized MNIST test set: 38.45%





In [None]:
color_bias = accuracy2/accuracy
print("The colour bias is: ",color_bias)

The colour bias is:  0.4067061561243918


# TASK 5 Inductive Biases of Models: Locality Biases


In [13]:
!unzip noised_cifar10_test.zip

unzip:  cannot find or open noised_cifar10_test.zip, noised_cifar10_test.zip.zip or noised_cifar10_test.zip.ZIP.


Now we use further variations of the CIFAR-10 dataset to observe how different perturbations affect the accuracy of the model.

*   We once again use the CIFAR10GCustomDataset class, which loads our modified CIFAR-10 images from per-class folders and assigns the matching class labels
*   We use the entire dataset as test data, since the model is not fine-tuned again on these sets



In [12]:

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
class CIFAR10GCustomDataset(Dataset):
    def __init__(self, root_dir, class_map, transform=None):
        self.root_dir = root_dir
        self.class_map = class_map
        self.transform = transform
        self.image_paths = []
        self.labels = []

        for class_name, class_idx in class_map.items():
            class_dir = os.path.join(root_dir, class_name)
            for img_name in os.listdir(class_dir):
                img_path = os.path.join(class_dir, img_name)
                self.image_paths.append(img_path)
                self.labels.append(class_idx)

    def __len__(self):
        return len(self.image_paths)

    def __getitem__(self, idx):
        img_path = self.image_paths[idx]
        label = self.labels[idx]
        image = Image.open(img_path).convert('RGB')

        if self.transform:
            image = self.transform(image)

        return image, label

data_dir = './'

# Mapping folder names to class indices (CIFAR-10 class names)
class_map = {
    'airplane': 0,
    'automobile': 1,
    'bird': 2,
    'cat': 3,
    'deer': 4,
    'dog': 5,
    'frog': 6,
    'horse': 7,
    'ship': 8,
    'truck': 9
}

transform = transforms.Compose([
    transforms.Resize(224),
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
])

dataset = CIFAR10GCustomDataset(root_dir=data_dir, class_map=class_map, transform=transform)
test_loader = DataLoader(dataset, batch_size=32, shuffle=False, num_workers=4)


FileNotFoundError: [Errno 2] No such file or directory: './airplane'

Here we validate the model on the dataset with added noise. We then compare the accuracy with and without noise and report how much the model's accuracy drops.
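The drop is a plain difference in percentage points. Subtracting two floats can print long tails such as `16.519999999999996`, so the sketch below rounds the result before reporting it. The accuracies used here are hypothetical placeholders, and `accuracy_drop` is an illustrative helper name.

```python
def accuracy_drop(baseline_acc: float, perturbed_acc: float) -> float:
    """Accuracy lost in percentage points, rounded to hide floating-point noise."""
    return round(baseline_acc - perturbed_acc, 2)

# Hypothetical accuracies, in percent
clean_accuracy = 80.0
noisy_accuracy = 65.5

print(accuracy_drop(clean_accuracy, noisy_accuracy))  # 14.5
```

Alternatively, the value can be formatted at print time with an f-string such as `f"{drop:.2f}"`, which leaves the stored number untouched.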

In [None]:
model.eval()
correct = 0
total = 0
with torch.no_grad():
    for images, labels in tqdm(test_loader, desc="Validating", unit="batch"):

        images, labels = images.to(device), labels.to(device)

        outputs = model(images)
        _, predicted = torch.max(outputs.data, 1)

        total += labels.size(0)
        correct += (predicted == labels).sum().item()

accuracy = 100 * correct / total
print(f'Validation Accuracy: {accuracy:.2f}%')
print("Change in accuracy: ", cifar10acc - accuracy)


Validating: 100%|██████████| 313/313 [00:58<00:00,  5.34batch/s]

Validation Accuracy: 67.06%
Change in accuracy:  16.519999999999996





In [14]:
!unzip scrambled_cifar10_test.zip

Streaming output truncated to the last 5000 lines.
  inflating: ship/4361.png           
  inflating: ship/8785.png           
  inflating: ship/2874.png           
  inflating: ship/4846.png           
  inflating: ship/4351.png           
  inflating: ship/3848.png           
  inflating: ship/4769.png           
  inflating: ship/202.png            
  inflating: ship/7144.png           
  inflating: ship/8704.png           
  inflating: ship/4497.png           
  inflating: ship/1366.png           
  inflating: ship/6691.png           
  inflating: ship/9720.png           
  inflating: ship/5879.png           
  inflating: ship/6961.png           
  inflating: ship/6747.png           
  inflating: ship/8608.png           
  inflating: ship/8303.png           
  inflating: ship/6474.png           
  inflating: ship/3243.png           
  inflating: ship/4389.png           
  inflating: ship/1897.png           
  inflating: ship/8208.png           
  inflating: ship/1358.

**Now we look at the effect of scrambled images on the accuracy:**
Here we load the CIFAR-10 test set with scrambled images.
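The notebook does not state how the scrambled images in the archive were produced. One common way to destroy global structure while keeping local content is to shuffle non-overlapping patches, and the pure-Python sketch below illustrates that idea on a tiny 4x4 "image". It is an assumption for illustration only: the function `scramble_patches`, the patch size, and the seed are all hypothetical.

```python
import random

def scramble_patches(image, patch, seed=0):
    """Shuffle non-overlapping patch x patch tiles of a square 2-D list."""
    n = len(image)
    if n % patch != 0:
        raise ValueError("image side must be divisible by the patch size")
    tiles_per_side = n // patch
    # Cut the image into tiles in row-major order
    tiles = [
        [row[c * patch:(c + 1) * patch] for row in image[r * patch:(r + 1) * patch]]
        for r in range(tiles_per_side)
        for c in range(tiles_per_side)
    ]
    random.Random(seed).shuffle(tiles)
    # Stitch the shuffled tiles back into full rows
    out = []
    for r in range(tiles_per_side):
        row_tiles = tiles[r * tiles_per_side:(r + 1) * tiles_per_side]
        for y in range(patch):
            out.append([px for tile in row_tiles for px in tile[y]])
    return out

image = [[r * 4 + c for c in range(4)] for r in range(4)]
scrambled = scramble_patches(image, patch=2)
for row in scrambled:
    print(row)
```

Every pixel value survives the operation and only the tile positions change, which is what makes this kind of perturbation useful for probing how much a model relies on spatial arrangement.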

In [15]:
dataset = CIFAR10GCustomDataset(root_dir=data_dir, class_map=class_map, transform=transform)

test_loader = DataLoader(dataset, batch_size=32, shuffle=False, num_workers=4)



Here we validate the model on the scrambled dataset. We then compare the accuracy with and without scrambling and report how much the model's accuracy drops.

In [17]:
model.eval()
correct = 0
total = 0
with torch.no_grad():
    for images, labels in tqdm(test_loader, desc="Validating", unit="batch"):

        images, labels = images.to(device), labels.to(device)

        outputs = model(images)
        _, predicted = torch.max(outputs.data, 1)

        total += labels.size(0)
        correct += (predicted == labels).sum().item()

accuracy = 100 * correct / total

print(f'Validation Accuracy: {accuracy:.2f}%')
print("Change in accuracy: ", cifar10acc - accuracy)


Validating: 100%|██████████| 313/313 [00:59<00:00,  5.22batch/s]

Validation Accuracy: 20.56%
Change in accuracy:  62.41





In [7]:
!unzip cifar10_styled_100.zip

Archive:  cifar10_styled_100.zip
  inflating: styled_image_0_20.png   
  inflating: styled_image_5_45.png   
  inflating: styled_image_2_59.png   
  inflating: styled_image_9_71.png   
  inflating: styled_image_0_72.png   
  inflating: styled_image_9_29.png   
  inflating: styled_image_6_69.png   
  inflating: styled_image_7_28.png   
  inflating: styled_image_1_78.png   
  inflating: styled_image_6_84.png   
  inflating: styled_image_6_80.png   
  inflating: styled_image_1_75.png   
  inflating: styled_image_2_47.png   
  inflating: styled_image_7_12.png   
  inflating: styled_image_5_93.png   
  inflating: styled_image_7_55.png   
  inflating: styled_image_6_13.png   
  inflating: styled_image_2_86.png   
  inflating: styled_image_8_44.png   
  inflating: styled_image_6_54.png   
  inflating: styled_image_3_65.png   
  inflating: styled_image_0_27.png   
  inflating: styled_image_9_48.png   
  inflating: styled_image_6_42.png   
  inflating: styled_image_2_38.png   
  inflating: styl

Global style changes: First we created a style-transferred version of CIFAR-10 images, using Van Gogh's The Starry Night as the style image. We evaluate our model (fine-tuned on CIFAR-10) on these new images and then calculate the change in accuracy caused by the style change.
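The styled images carry their class label in the filename. Judging from the unzip listing and the dataset class in the next cell, the names appear to follow `styled_image_<label>_<index>.png`; under that assumption, a stricter parser than chained `split` calls could look like the sketch below. The names `FILENAME_PATTERN` and `parse_styled_filename` are hypothetical.

```python
import re

# Assumed naming scheme: styled_image_<label>_<index>.png
FILENAME_PATTERN = re.compile(r"^styled_image_(\d+)_(\d+)\.png$")

def parse_styled_filename(filename):
    """Return (label, index) parsed from a styled-image filename."""
    match = FILENAME_PATTERN.match(filename)
    if match is None:
        raise ValueError(f"unexpected filename: {filename}")
    return int(match.group(1)), int(match.group(2))

print(parse_styled_filename("styled_image_0_20.png"))  # (0, 20)
print(parse_styled_filename("styled_image_9_71.png"))  # (9, 71)
```

Raising on unexpected names makes a stray file fail loudly instead of silently producing a wrong label.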

In [14]:
class StyledImageDataset(Dataset):
    def __init__(self, root_dir, transform=None):
        self.root_dir = root_dir
        self.transform = transform

        self.image_filenames = [f for f in os.listdir(root_dir) if f.startswith('styled_image_')]

    def __len__(self):
        return len(self.image_filenames)

    def __getitem__(self, idx):
        img_name = os.path.join(self.root_dir, self.image_filenames[idx])
        image = Image.open(img_name).convert('RGB')
        # Filenames look like styled_image_<label>_<index>.png, so field 2 is the class label
        label = int(self.image_filenames[idx].split('_')[2].split('.')[0])

        if self.transform:
            image = self.transform(image)

        return image, label

transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

styled_images_path = './'
dataset = StyledImageDataset(root_dir=styled_images_path, transform=transform)
dataloader = DataLoader(dataset, batch_size=1, shuffle=False)

model.to(device)
model.eval()

correct = 0
total = 0

with torch.no_grad():
    for images, labels in dataloader:
        images, labels = images.to(device), labels.to(device)
        outputs = model(images)
        _, predicted = torch.max(outputs.data, 1)

        total += labels.size(0)
        correct += (predicted == labels).sum().item()

acc = 100 * correct / total
print(f'Accuracy: {100 * correct / total:.2f}%')


Accuracy: 19.00%


In [17]:
print("Drop in accuracy is: ", cifar10acc - acc)

Drop in accuracy is:  63.97
