<h1> ECE4179 - Semi-Supervised Learning Project</h1>
<h2>Data</h2>

We will be using the STL10 dataset, which can be obtained directly from the torchvision package. It has 10 classes, and we will train a CNN for the image classification task. The training, validation and test sets are labelled with the class, and there is also a large unlabelled set.

We will simulate a low training data scenario by sampling only a small percentage (10%) of the labelled data as training data. The remaining examples will be used as the validation set.

To get the labelled data, change the dataset_dir to something suitable for your machine, and execute the following (you will then probably want to wrap the dataset objects in a PyTorch DataLoader):

In [31]:
import torch
import torch.nn as nn
from torchvision.datasets import STL10 as STL10
import torchvision.transforms as transforms
from torch.utils.data import random_split
import torchvision
from torch.utils.data import DataLoader
from torch.utils.data import Dataset
from torch.utils.data import Subset
from copy import deepcopy
from torch.optim import Adam
import torch.optim as optim
from torchvision import models
from sklearn.metrics import f1_score, classification_report
import torch.nn.functional as F
import csv
import os
import random
import math

####### CHANGE TO APPROPRIATE DIRECTORY TO STORE DATASET
dataset_dir = "../../CNN-VAE/data"
#For MonARCH
# dataset_dir = "/mnt/lustre/projects/ds19/SHARED"

#All images are 3x96x96
image_size = 96
#Example batch size
batch_size = 16
# Define the device
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Using device: {device}")
print(torch.cuda.is_available())  # True only when a CUDA-capable GPU is visible to PyTorch
# Define the number of classes
num_classes = 10
num_epochs = 10
learning_rate = 0.001

Using device: cuda
True


<h3>Create the appropriate transforms</h3>

In [2]:
#Perform random crops and mirroring for data augmentation
transform_train = transforms.Compose(
    [transforms.RandomCrop(image_size, padding=4),
     transforms.RandomHorizontalFlip(p=0.5),
     transforms.ToTensor(),
     transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])

transform_unlabelled = transforms.Compose(
    [transforms.RandomHorizontalFlip(p=0.5),
     transforms.ToTensor(),
     transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])

#No random augmentation at test time
transform_test = transforms.Compose(
    [transforms.CenterCrop(image_size),
     transforms.ToTensor(),
     transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])


<h3>Create training and validation split</h3>

In [3]:
#Load train and validation sets
trainval_set = STL10(dataset_dir, split='train', transform=transform_train, download=True)

#Use 10% of data for training - simulating low data scenario
num_train = int(len(trainval_set)*0.1)

#Split data into train/val sets
torch.manual_seed(0) #Set torch's random seed so that random split of data is reproducible
train_set, val_set = random_split(trainval_set, [num_train, len(trainval_set)-num_train])

#Load test set
test_set = STL10(dataset_dir, split='test', transform=transform_test, download=False)

Files already downloaded and verified
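
Because `trainval_set` is built with `transform_train`, the validation subset produced by `random_split` also receives the random crop and flip. If that is not wanted, one option is to load the labelled data without a transform and attach a transform per subset. The sketch below illustrates the idea on stand-in tensors; `TransformedSubset` is a hypothetical helper (not part of torchvision), and the lambda transforms stand in for `transform_train` and `transform_test`.

```python
import torch
from torch.utils.data import Dataset, random_split


class TransformedSubset(Dataset):
    """Hypothetical helper: applies its own transform to the items of a subset."""

    def __init__(self, subset, transform):
        self.subset = subset
        self.transform = transform

    def __len__(self):
        return len(self.subset)

    def __getitem__(self, index):
        image, label = self.subset[index]
        return self.transform(image), label


# Stand-in for a labelled dataset loaded with transform=None:
# 50 (image, label) pairs where every pixel of image i equals i.
raw = [(torch.full((3, 4, 4), float(i)), i % 10) for i in range(50)]

# Same 10% / 90% split as above, with a local generator for reproducibility
num_train = int(len(raw) * 0.1)
generator = torch.Generator().manual_seed(0)
train_part, val_part = random_split(
    raw, [num_train, len(raw) - num_train], generator=generator
)

# Each subset gets its own transform (placeholders for the real pipelines)
train_view = TransformedSubset(train_part, transform=lambda img: img + 1.0)
val_view = TransformedSubset(val_part, transform=lambda img: img)

print(len(train_view), len(val_view))
```

With the real data, the base dataset would yield PIL images, so the actual transform pipelines (which end in `ToTensor` and `Normalize`) could be passed in directly.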


<h3>Get the unlabelled data</h3>

In [4]:
unlabelled_set = STL10(dataset_dir, split='unlabeled', transform=transform_unlabelled, download=True)

Files already downloaded and verified


### Print the length of unlabelled data

In [5]:
len(unlabelled_set)

100000

### Use only 1/1000 of the unlabelled data

In [6]:
# Determine the size of the subset (1/1000 of the full dataset)
subset_size = len(unlabelled_set) // 1000  # This will be 100 samples

# Randomly select indices for the subset
random_indices = random.sample(range(len(unlabelled_set)), subset_size)

# Create a subset of the unlabelled dataset
unlabelled_subset = Subset(unlabelled_set, random_indices)

# Now, create the DataLoader using the subset
unlabelled_loader = DataLoader(unlabelled_subset, shuffle=True, batch_size=batch_size, num_workers=2)

You may find later that you want to change how the unlabelled data is loaded. This might require sub-classing the STL10 class used above, or creating your own dataloader similar to the PyTorch one:
https://pytorch.org/docs/stable/_modules/torchvision/datasets/stl10.html#STL10
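
As a lighter alternative to sub-classing, a thin wrapper around an existing dataset is often enough. The sketch below is an assumption about what such a change might look like, not a required design: `ImagesOnly` is a hypothetical class that drops the placeholder label so each batch is just an image tensor, and the list of zero tensors stands in for the real unlabelled split.

```python
import torch
from torch.utils.data import Dataset, DataLoader


class ImagesOnly(Dataset):
    """Hypothetical wrapper: returns only the image from an (image, label) dataset."""

    def __init__(self, base_dataset):
        self.base_dataset = base_dataset

    def __len__(self):
        return len(self.base_dataset)

    def __getitem__(self, index):
        image, _ = self.base_dataset[index]  # discard the placeholder label
        return image


# Stand-in for the unlabelled split: 8 blank 3x96x96 images with a dummy label
fake_unlabelled = [(torch.zeros(3, 96, 96), -1) for _ in range(8)]

images_only = ImagesOnly(fake_unlabelled)
loader = DataLoader(images_only, batch_size=4, shuffle=False)

first_batch = next(iter(loader))
print(first_batch.shape)
```

A training loop using such a loader would iterate with `for inputs in loader:` instead of unpacking `(inputs, _)`.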

<h3>Create the four dataloaders</h3>

In [7]:
train_loader = DataLoader(train_set, shuffle=True, batch_size=batch_size, num_workers=2)

valid_loader = DataLoader(val_set, batch_size=batch_size, num_workers=2)
test_loader = DataLoader(test_set, batch_size=batch_size, num_workers=2)

<h3>Accuracy</h3>

In [8]:
# Define the test function
def test_model(model, test_loader):
    # Define the device inside the function
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    
    model.to(device)  # Move the model to the appropriate device
    model.eval()  # Set the model to evaluation mode
    correct = 0
    total = 0

    with torch.no_grad():  # Disable gradient calculation
        for inputs, labels in test_loader:
            inputs, labels = inputs.to(device), labels.to(device)  # Move data to the appropriate device
            outputs = model(inputs)
            _, predicted = torch.max(outputs, 1)
            total += labels.size(0)
            correct += (predicted == labels).sum().item()

    accuracy = 100 * correct / total
    print(f"Test Accuracy: {accuracy}%")


<h3>Macro F1 Score</h3>

In [9]:
# Define the test function to calculate F1 score
def test_model_with_f1(model, test_loader):
    # Define the device inside the function
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    
    model.to(device)  # Move the model to the appropriate device
    model.eval()  # Set model to evaluation mode
    
    all_labels = []
    all_preds = []

    with torch.no_grad():  # Disable gradient calculation
        for inputs, labels in test_loader:
            inputs, labels = inputs.to(device), labels.to(device)
            outputs = model(inputs)
            _, predicted = torch.max(outputs, 1)
            
            # Collect all predictions and labels for F1-score calculation
            all_labels.extend(labels.cpu().numpy())
            all_preds.extend(predicted.cpu().numpy())

    # Calculate the Macro F1-score (unweighted mean of the per-class F1-scores)
    f1 = f1_score(all_labels, all_preds, average='macro')
    
    # Alternatively, you can get a detailed report for all classes
    report = classification_report(all_labels, all_preds, target_names=[f"Class {i}" for i in range(10)])
    
    print(f"Macro F1-score: {f1}")
    print("Classification Report:\n", report)
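
To see why macro F1 is reported alongside accuracy, it can help to compute it by hand on a tiny example. The sketch below is a plain-Python illustration of the definition (per-class F1 = 2TP / (2TP + FP + FN), then an unweighted mean over classes); `macro_f1` is a hypothetical helper written for this example, and the labels are made up.

```python
def macro_f1(labels, preds):
    """Unweighted mean of per-class F1 over every class seen in labels or preds."""
    classes = sorted(set(labels) | set(preds))
    scores = []
    for c in classes:
        tp = sum(1 for y, p in zip(labels, preds) if y == c and p == c)
        fp = sum(1 for y, p in zip(labels, preds) if y != c and p == c)
        fn = sum(1 for y, p in zip(labels, preds) if y == c and p != c)
        denom = 2 * tp + fp + fn
        # A class that is never predicted correctly contributes an F1 of 0
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# A classifier that always predicts class 0 on an imbalanced toy set
toy_labels = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
toy_preds = [0] * 10

accuracy = sum(y == p for y, p in zip(toy_labels, toy_preds)) / len(toy_labels)
score = macro_f1(toy_labels, toy_preds)
print(f"accuracy={accuracy:.2f}, macro F1={score:.4f}")
```

Here accuracy looks respectable while macro F1 is pulled down by the class that is never predicted. The STL10 test set is balanced, so the two metrics tend to be close, as the reports further down show.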

## Network

Let's use a ResNet18 architecture for our CNN...

### Define the training function

In [29]:
# Define the Autoencoder class with ResNet, EfficientNet, and ViT implementations
class Autoencoder(nn.Module):
    def __init__(self, base_model, model_name='resnet'):
        super(Autoencoder, self).__init__()
        self.model_name = model_name

        if model_name == 'resnet':
            # Directly assign the encoder layers from the base model
            self.conv1 = base_model.conv1
            self.bn1 = base_model.bn1
            self.relu = base_model.relu
            self.maxpool = base_model.maxpool
            self.layer1 = base_model.layer1
            self.layer2 = base_model.layer2
            self.layer3 = base_model.layer3
            self.layer4 = base_model.layer4
            # Encoder output channels
            encoder_output_dim = 512

            # Define decoder layers
            self.decoder = nn.Sequential(
                nn.ConvTranspose2d(encoder_output_dim, 256, kernel_size=4, stride=2, padding=1),
                nn.BatchNorm2d(256),
                nn.ReLU(True),
                nn.ConvTranspose2d(256, 128, kernel_size=4, stride=2, padding=1),
                nn.BatchNorm2d(128),
                nn.ReLU(True),
                nn.ConvTranspose2d(128, 64, kernel_size=4, stride=2, padding=1),
                nn.BatchNorm2d(64),
                nn.ReLU(True),
                nn.ConvTranspose2d(64, 32, kernel_size=4, stride=2, padding=1),
                nn.BatchNorm2d(32),
                nn.ReLU(True),
                nn.ConvTranspose2d(32, 3, kernel_size=4, stride=2, padding=1),
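                # NOTE: Sigmoid outputs lie in [0, 1], but the inputs are normalised to [-1, 1],
                # so the MSE target is not fully reachable; nn.Tanh() would match that range.
                nn.Sigmoid()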
            )

        elif model_name == 'efficientnet':
            # Directly assign the encoder layers from the base model
            self.features = base_model.features
            # Encoder output channels (depends on the EfficientNet version)
            encoder_output_dim = 1280  # For EfficientNet-B0

            # Define decoder layers
            self.decoder = nn.Sequential(
                nn.ConvTranspose2d(encoder_output_dim, 512, kernel_size=3, stride=2, padding=1, output_padding=1),
                nn.BatchNorm2d(512),
                nn.ReLU(True),
                nn.ConvTranspose2d(512, 256, kernel_size=3, stride=2, padding=1, output_padding=1),
                nn.BatchNorm2d(256),
                nn.ReLU(True),
                nn.ConvTranspose2d(256, 128, kernel_size=3, stride=2, padding=1, output_padding=1),
                nn.BatchNorm2d(128),
                nn.ReLU(True),
                nn.ConvTranspose2d(128, 64, kernel_size=3, stride=2, padding=1, output_padding=1),
                nn.BatchNorm2d(64),
                nn.ReLU(True),
                nn.ConvTranspose2d(64, 3, kernel_size=3, stride=2, padding=1, output_padding=1),
                # NOTE: output range is [0, 1] while the normalised inputs span [-1, 1]
                nn.Sigmoid()
            )

        elif model_name == 'vit':
            # Directly assign the encoder layers from the base model
            self.conv_proj = base_model.conv_proj  # Patch embedding
            self.encoder_layers = base_model.encoder  # Transformer encoder
            self.class_token = base_model.class_token  # [CLS] token
            self.encoder_norm = base_model.encoder.ln  # Layer norm after encoder
            # ViT parameters
            self.hidden_dim = base_model.hidden_dim  # Hidden dimension (e.g., 768)
            self.image_size = base_model.image_size  # Input image size (e.g., 224)
            self.patch_size = base_model.patch_size  # Patch size (e.g., 16)
            self.num_patches = (self.image_size // self.patch_size) ** 2  # Number of patches

            # Decoder: map encoded tokens back to image patches
            self.decoder_linear = nn.Linear(self.hidden_dim, self.patch_size * self.patch_size * 3)

        else:
            raise ValueError("Model name not recognized.")

    def encoder(self, x):
        if self.model_name == 'resnet':
            x = self.conv1(x)
            x = self.bn1(x)
            x = self.relu(x)
            x = self.maxpool(x)

            x = self.layer1(x)
            x = self.layer2(x)
            x = self.layer3(x)
            x = self.layer4(x)
            return x

        elif self.model_name == 'efficientnet':
            x = self.features(x)
            return x

        elif self.model_name == 'vit':
            # Patch embedding
            x = self.conv_proj(x)  # [batch_size, hidden_dim, H', W']
            x = x.flatten(2).transpose(1, 2)  # [batch_size, num_patches, hidden_dim]
            # Add class token
            batch_size = x.size(0)
            cls_tokens = self.class_token.expand(batch_size, -1, -1)  # [batch_size, 1, hidden_dim]
            x = torch.cat((cls_tokens, x), dim=1)  # [batch_size, num_patches+1, hidden_dim]
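            # Encoder (torchvision's Encoder.forward adds the positional embedding and
            # already ends with its LayerNorm, so encoder_norm below applies that norm a second time)
            x = self.encoder_layers(x)
            x = self.encoder_norm(x)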
            return x  # [batch_size, num_patches+1, hidden_dim]

    def forward(self, x):
        x = self.encoder(x)

        if self.model_name in ['resnet', 'efficientnet']:
            # Decoder expects input of shape [batch_size, encoder_output_dim, H, W]
            x = self.decoder(x)
            return x

        elif self.model_name == 'vit':
            # Remove class token
            x = x[:, 1:, :]  # [batch_size, num_patches, hidden_dim]
            # Decoder: map tokens back to patches
            x = self.decoder_linear(x)  # [batch_size, num_patches, patch_size*patch_size*3]
            # Reshape to image patches
            batch_size = x.size(0)
            x = x.view(batch_size, self.num_patches, 3, self.patch_size, self.patch_size)
            # Rearrange patches into images
            x = x.permute(0, 2, 1, 3, 4)  # [batch_size, 3, num_patches, patch_size, patch_size]
            grid_size = int(math.sqrt(self.num_patches))
            x = x.reshape(batch_size, 3, grid_size, grid_size, self.patch_size, self.patch_size)
            x = x.permute(0, 1, 2, 4, 3, 5)
            x = x.reshape(batch_size, 3, self.image_size, self.image_size)
            return x
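
The convolutional decoders above are meant to undo the encoder's downsampling, so it is worth checking the spatial sizes. The sketch below applies the output-size formula from the PyTorch `ConvTranspose2d` documentation to both decoder settings; the helper names are hypothetical, and the factor-of-32 reduction is what the ResNet18 and EfficientNet-B0 feature extractors produce for 96x96 inputs.

```python
def conv_transpose_out(size, kernel_size, stride, padding, output_padding=0, dilation=1):
    """Output side length of nn.ConvTranspose2d for a square input."""
    return (size - 1) * stride - 2 * padding + dilation * (kernel_size - 1) + output_padding + 1


def decoder_sizes(start, num_layers, **layer_kwargs):
    """Side length after each of num_layers identical transposed convolutions."""
    sizes = [start]
    for _ in range(num_layers):
        sizes.append(conv_transpose_out(sizes[-1], **layer_kwargs))
    return sizes


# Both encoders reduce each side by a factor of 32: 96 -> 3
encoded_side = 96 // 32

# ResNet branch: kernel_size=4, stride=2, padding=1
resnet_sizes = decoder_sizes(encoded_side, 5, kernel_size=4, stride=2, padding=1)

# EfficientNet branch: kernel_size=3, stride=2, padding=1, output_padding=1
efficientnet_sizes = decoder_sizes(
    encoded_side, 5, kernel_size=3, stride=2, padding=1, output_padding=1
)

print(resnet_sizes)
print(efficientnet_sizes)
```

Both settings double the side length at every layer, so five layers bring the 3x3 feature map back to the 96x96 input size.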


In [11]:
# Training function for Autoencoder
def train_autoencoder(autoencoder, unlabelled_loader, num_epochs=10, learning_rate=0.001):
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    autoencoder = autoencoder.to(device)

    criterion = nn.MSELoss()
    optimizer = optim.Adam(autoencoder.parameters(), lr=learning_rate)

    autoencoder.train()
    for epoch in range(num_epochs):
        running_loss = 0.0
        for inputs, _ in unlabelled_loader:
            inputs = inputs.to(device)

            optimizer.zero_grad()

            outputs = autoencoder(inputs)
            loss = criterion(outputs, inputs)

            loss.backward()
            optimizer.step()

            running_loss += loss.item()

        avg_loss = running_loss / len(unlabelled_loader)
        print(f"Autoencoder Training Epoch [{epoch+1}/{num_epochs}], Loss: {avg_loss:.4f}")
    return autoencoder
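
One thing worth checking in the loop above is that the reconstruction target and the decoder output share a value range. The transforms normalise each channel with mean 0.5 and standard deviation 0.5, while the convolutional decoders end in a sigmoid. The plain-Python sketch below works through that arithmetic; `normalise` and `sigmoid` are small stand-ins written for this example, not library calls.

```python
import math


def normalise(pixel, mean=0.5, std=0.5):
    """Per-channel arithmetic performed by transforms.Normalize."""
    return (pixel - mean) / std


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


# ToTensor yields pixels in [0, 1]; after normalisation they span [-1, 1]
darkest, brightest = normalise(0.0), normalise(1.0)

# Squared error between a sigmoid output and the darkest target, for several logits.
# The sigmoid never drops below 0, so the error on this pixel stays above 1.
logits = [-20.0, -5.0, 0.0, 5.0]
sigmoid_errors = [(sigmoid(z) - darkest) ** 2 for z in logits]

# A tanh output, by contrast, can approach the darkest target
tanh_error = (math.tanh(-20.0) - darkest) ** 2

print(darkest, brightest)
print(min(sigmoid_errors), tanh_error)
```

This may explain part of the plateau in the autoencoder loss logged further down. Possible remedies, offered here as assumptions rather than the notebook's method, are ending the decoder with `nn.Tanh()` or computing the loss against de-normalised inputs.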

In [12]:
# Main training function with Autoencoder and Transfer Learning
def train_model_with_autoencoder_and_grid_search(
    model,
    train_loader,
    valid_loader,
    unlabelled_loader,
    num_classes,
    num_epochs=10,
    learning_rate=0.001,
    log_filename='training_log.csv',
    model_name='resnet',
    batch_size=64
):
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    print(f"Using device: {device}")

    # Ensure 'logs' directory exists for saving the best model
    if not os.path.exists('logs'):
        os.makedirs('logs')

    ### Phase 1: Initial Training on Labeled Data ###

    print("Starting Phase 1: Initial Training on Labeled Data...")
    # Copy the model for initial training
    initial_model = deepcopy(model).to(device)

    # Define loss function and optimizer
    criterion = nn.CrossEntropyLoss()
    optimizer = optim.Adam(initial_model.parameters(), lr=learning_rate)

    # Training loop
    initial_model.train()
    for epoch in range(num_epochs):
        running_loss = 0.0
        for inputs, labels in train_loader:
            inputs = inputs.to(device)
            labels = labels.to(device)

            optimizer.zero_grad()

            outputs = initial_model(inputs)
            loss = criterion(outputs, labels)

            loss.backward()
            optimizer.step()

            running_loss += loss.item()

        avg_loss = running_loss / len(train_loader)
        print(f"Initial Training Epoch [{epoch+1}/{num_epochs}], Loss: {avg_loss:.4f}")

    ### Phase 2: Train Autoencoder on Unlabeled Data ###

    print("Starting Phase 2: Training Autoencoder on Unlabeled Data...")
    # Initialize autoencoder with the initial model's encoder
    autoencoder = Autoencoder(initial_model, model_name=model_name).to(device)

    # Train the autoencoder
    autoencoder = train_autoencoder(autoencoder, unlabelled_loader, num_epochs=num_epochs, learning_rate=learning_rate)

    ### Phase 3: Use Encoder Weights from Autoencoder ###

    print("Starting Phase 3: Updating Model with Autoencoder Encoder...")
    # Update initial_model's encoder weights with autoencoder's encoder weights

    # Get the encoder state_dict from the autoencoder
    encoder_state_dict = {}
    for name, param in autoencoder.state_dict().items():
        if name in initial_model.state_dict() and 'decoder' not in name:
            encoder_state_dict[name] = param

    # Update initial_model's state_dict with the encoder weights
    initial_model_state_dict = initial_model.state_dict()
    initial_model_state_dict.update(encoder_state_dict)

    # Load the updated state_dict into initial_model
    initial_model.load_state_dict(initial_model_state_dict)

    ### Phase 4: Transfer Learning with Grid Search ###

    print("Starting Phase 4: Transfer Learning with Grid Search...")
    # Define different layer unfreezing configurations based on model type
    if model_name == 'resnet':
        unfreeze_configs = {
            'fc': ['fc'],
            'fc+layer4': ['layer4', 'fc'],
            'fc+layer3+layer4': ['layer3', 'layer4', 'fc'],
        }
    elif model_name == 'efficientnet':
        unfreeze_configs = {
            'fc': ['classifier.1'],
            'fc+features8': ['features.8', 'classifier.1'],
            'fc+features7+features8': ['features.7', 'features.8', 'classifier.1'],
        }
    elif model_name == 'vit':
        unfreeze_configs = {
            'fc': ['heads.head'],
            'fc+encoder11': ['encoder.layers.encoder_layer_11', 'heads.head'],
            'fc+encoder10+encoder11': ['encoder.layers.encoder_layer_10', 'encoder.layers.encoder_layer_11', 'heads.head'],
        }
    else:
        raise ValueError("Model name not recognized.")

    best_f1 = 0.0
    best_config = ''
    best_model_state = None

    # Open log file for recording training progress
    with open(os.path.join('logs', log_filename), mode='w', newline='') as log_file:
        log_writer = csv.writer(log_file)
        log_writer.writerow(['Configuration', 'Epoch', 'Training Loss', 'Validation Macro F1'])

        for config_name, layers_to_unfreeze in unfreeze_configs.items():
            print(f"\nStarting Transfer Learning Phase with configuration: {config_name}")

            # Create a new model by copying the initial model
            finetune_model = deepcopy(initial_model)

            # Modify the final layer based on the model's layer names
            if model_name == 'resnet':
                feature_dim = finetune_model.fc.in_features
                finetune_model.fc = nn.Linear(feature_dim, num_classes)
            elif model_name == 'efficientnet':
                feature_dim = finetune_model.classifier[1].in_features
                finetune_model.classifier[1] = nn.Linear(feature_dim, num_classes)
            elif model_name == 'vit':
                feature_dim = finetune_model.heads.head.in_features
                finetune_model.heads.head = nn.Linear(feature_dim, num_classes)
            else:
                raise ValueError("Model name not recognized.")

            # Freeze all layers first
            for param in finetune_model.parameters():
                param.requires_grad = False

            # Unfreeze specified layers
            for name, param in finetune_model.named_parameters():
                for layer_name in layers_to_unfreeze:
                    if name.startswith(layer_name):
                        param.requires_grad = True
                        print(f"Unfreezing layer: {name}")

            finetune_model = finetune_model.to(device)

            # Define loss function and optimizer for fine-tuning
            criterion = nn.CrossEntropyLoss()
            optimizer = optim.Adam(filter(lambda p: p.requires_grad, finetune_model.parameters()), lr=learning_rate)

            # Fine-tuning training loop
            for epoch in range(num_epochs):
                finetune_model.train()
                running_loss = 0.0
                for inputs, labels in train_loader:
                    inputs, labels = inputs.to(device), labels.to(device)
                    optimizer.zero_grad()
                    outputs = finetune_model(inputs)
                    loss = criterion(outputs, labels)
                    loss.backward()
                    optimizer.step()
                    running_loss += loss.item()
                avg_loss = running_loss / len(train_loader)

                # Validation
                finetune_model.eval()
                all_labels = []
                all_preds = []
                with torch.no_grad():
                    for inputs, labels in valid_loader:
                        inputs, labels = inputs.to(device), labels.to(device)
                        outputs = finetune_model(inputs)
                        _, predicted = torch.max(outputs, 1)
                        all_labels.extend(labels.cpu().numpy())
                        all_preds.extend(predicted.cpu().numpy())

                # Calculate Macro F1 score
                f1 = f1_score(all_labels, all_preds, average='macro')

                # Log results
                log_writer.writerow([config_name, epoch + 1, avg_loss, f1])
                print(f"Config: {config_name}, Epoch [{epoch+1}/{num_epochs}], Loss: {avg_loss:.4f}, Validation Macro F1: {f1:.4f}")

            # Update best model if this config's final-epoch F1 beats the best so far
            if f1 > best_f1:
                best_f1 = f1
                best_config = config_name
                best_model_state = deepcopy(finetune_model.state_dict())

    print(f"\nBest Configuration: {best_config} with Macro F1 Score: {best_f1}")
    # Save the best model
    torch.save(best_model_state, os.path.join('logs', f"best_model_{model_name}.pth"))
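
The `Autoencoder` class assigns the base model's layers directly (for example `self.conv1 = base_model.conv1`), which in PyTorch shares the module objects rather than copying them. The sketch below uses hypothetical toy modules (`TinyBase` and `TinyAutoencoder`, not part of the notebook) to suggest what that implies: one optimiser step on the autoencoder already changes the base model's encoder weights, so the Phase 3 state-dict copy is likely redundant.

```python
import torch
import torch.nn as nn


class TinyBase(nn.Module):
    """Toy stand-in for the classifier backbone."""

    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 4, kernel_size=3, padding=1)
        self.fc = nn.Linear(4, 2)


class TinyAutoencoder(nn.Module):
    """Toy stand-in that reuses the base model's encoder layer by reference."""

    def __init__(self, base_model):
        super().__init__()
        self.conv1 = base_model.conv1  # same module object, not a copy
        self.decoder = nn.Conv2d(4, 3, kernel_size=3, padding=1)

    def forward(self, x):
        return self.decoder(torch.relu(self.conv1(x)))


torch.manual_seed(0)
base = TinyBase()
weights_before = base.conv1.weight.detach().clone()

autoencoder = TinyAutoencoder(base)
optimizer = torch.optim.SGD(autoencoder.parameters(), lr=0.1)

# One reconstruction step on random data
x = torch.rand(2, 3, 8, 8)
loss = nn.functional.mse_loss(autoencoder(x), x)
optimizer.zero_grad()
loss.backward()
optimizer.step()

shares_parameter = autoencoder.conv1.weight is base.conv1.weight
base_changed = not torch.equal(weights_before, base.conv1.weight)
print(shares_parameter, base_changed)
```

If independent copies were intended instead, wrapping the base model in `deepcopy` before building the autoencoder would decouple the two, at which point an explicit weight transfer becomes necessary.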

## ResNet18

In [13]:
# We will keep this for later
model0 = torch.hub.load('pytorch/vision:v0.10.0', 'resnet18', pretrained=True)


for name, param in model0.named_parameters():
    print(f"Name: {name}, Shape: {param.shape}")

Using cache found in C:\Users\weita/.cache\torch\hub\pytorch_vision_v0.10.0


Name: conv1.weight, Shape: torch.Size([64, 3, 7, 7])
Name: bn1.weight, Shape: torch.Size([64])
Name: bn1.bias, Shape: torch.Size([64])
Name: layer1.0.conv1.weight, Shape: torch.Size([64, 64, 3, 3])
Name: layer1.0.bn1.weight, Shape: torch.Size([64])
Name: layer1.0.bn1.bias, Shape: torch.Size([64])
Name: layer1.0.conv2.weight, Shape: torch.Size([64, 64, 3, 3])
Name: layer1.0.bn2.weight, Shape: torch.Size([64])
Name: layer1.0.bn2.bias, Shape: torch.Size([64])
Name: layer1.1.conv1.weight, Shape: torch.Size([64, 64, 3, 3])
Name: layer1.1.bn1.weight, Shape: torch.Size([64])
Name: layer1.1.bn1.bias, Shape: torch.Size([64])
Name: layer1.1.conv2.weight, Shape: torch.Size([64, 64, 3, 3])
Name: layer1.1.bn2.weight, Shape: torch.Size([64])
Name: layer1.1.bn2.bias, Shape: torch.Size([64])
Name: layer2.0.conv1.weight, Shape: torch.Size([128, 64, 3, 3])
Name: layer2.0.bn1.weight, Shape: torch.Size([128])
Name: layer2.0.bn1.bias, Shape: torch.Size([128])
Name: layer2.0.conv2.weight, Shape: torch.Size(

In [14]:
# Example usage with ResNet18
model_resnet18 = deepcopy(model0)  # assuming model0 is a pretrained resnet18
model_resnet18 = model_resnet18.to(device)
train_model_with_autoencoder_and_grid_search(
    model=model_resnet18,
    train_loader=train_loader,
    valid_loader=valid_loader,
    unlabelled_loader=unlabelled_loader,
    num_classes=num_classes,
    num_epochs=num_epochs,
    learning_rate=learning_rate,
    log_filename='resnet18_training_log.csv',
    model_name='resnet',
    batch_size=batch_size
)


Using device: cuda
Starting Phase 1: Initial Training on Labeled Data...
Initial Training Epoch [1/10], Loss: 3.5700
Initial Training Epoch [2/10], Loss: 1.6348
Initial Training Epoch [3/10], Loss: 1.0263
Initial Training Epoch [4/10], Loss: 0.9492
Initial Training Epoch [5/10], Loss: 1.2221
Initial Training Epoch [6/10], Loss: 0.8824
Initial Training Epoch [7/10], Loss: 0.6621
Initial Training Epoch [8/10], Loss: 0.7683
Initial Training Epoch [9/10], Loss: 0.9326
Initial Training Epoch [10/10], Loss: 0.7037
Starting Phase 2: Training Autoencoder on Unlabeled Data...
Autoencoder Training Epoch [1/10], Loss: 0.6369
Autoencoder Training Epoch [2/10], Loss: 0.5238
Autoencoder Training Epoch [3/10], Loss: 0.4404
Autoencoder Training Epoch [4/10], Loss: 0.3997
Autoencoder Training Epoch [5/10], Loss: 0.3601
Autoencoder Training Epoch [6/10], Loss: 0.3378
Autoencoder Training Epoch [7/10], Loss: 0.3268
Autoencoder Training Epoch [8/10], Loss: 0.3245
Autoencoder Training Epoch [9/10], Loss: 0

In [15]:
# Initialize the ResNet18 model
best_model_resnet = models.resnet18(weights=None)  # `pretrained=False` is deprecated in recent torchvision
best_model_resnet.fc = nn.Linear(best_model_resnet.fc.in_features, num_classes)
best_model_resnet = best_model_resnet.to(device)

# Load the best model weights onto the current device
best_model_resnet.load_state_dict(torch.load('logs/best_model_resnet.pth', map_location=device))

# Set the model to evaluation mode
best_model_resnet.eval()



ResNet(
  (conv1): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
  (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (relu): ReLU(inplace=True)
  (maxpool): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
  (layer1): Sequential(
    (0): BasicBlock(
      (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
    (1): BasicBlock(
      (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
  

In [16]:
# Call the test functions
test_model(best_model_resnet, test_loader)

Test Accuracy: 46.4125%


In [17]:
test_model_with_f1(best_model_resnet, test_loader)

Macro F1-score: 0.4616945313204985
Classification Report:
               precision    recall  f1-score   support

     Class 0       0.62      0.62      0.62       800
     Class 1       0.39      0.45      0.42       800
     Class 2       0.77      0.41      0.54       800
     Class 3       0.35      0.32      0.34       800
     Class 4       0.36      0.66      0.46       800
     Class 5       0.28      0.33      0.30       800
     Class 6       0.56      0.38      0.45       800
     Class 7       0.62      0.23      0.33       800
     Class 8       0.63      0.64      0.64       800
     Class 9       0.45      0.60      0.52       800

    accuracy                           0.46      8000
   macro avg       0.50      0.46      0.46      8000
weighted avg       0.50      0.46      0.46      8000



## EfficientNet

In [18]:
# Load pretrained EfficientNet-B0 model from torchvision hub
model1 = torch.hub.load('pytorch/vision', 'efficientnet_b0', weights="EfficientNet_B0_Weights.IMAGENET1K_V1")

for name, param in model1.named_parameters():
    print(f"Name: {name}, Shape: {param.shape}")

Name: features.0.0.weight, Shape: torch.Size([32, 3, 3, 3])
Name: features.0.1.weight, Shape: torch.Size([32])
Name: features.0.1.bias, Shape: torch.Size([32])
Name: features.1.0.block.0.0.weight, Shape: torch.Size([32, 1, 3, 3])
Name: features.1.0.block.0.1.weight, Shape: torch.Size([32])
Name: features.1.0.block.0.1.bias, Shape: torch.Size([32])
Name: features.1.0.block.1.fc1.weight, Shape: torch.Size([8, 32, 1, 1])
Name: features.1.0.block.1.fc1.bias, Shape: torch.Size([8])
Name: features.1.0.block.1.fc2.weight, Shape: torch.Size([32, 8, 1, 1])
Name: features.1.0.block.1.fc2.bias, Shape: torch.Size([32])
Name: features.1.0.block.2.0.weight, Shape: torch.Size([16, 32, 1, 1])
Name: features.1.0.block.2.1.weight, Shape: torch.Size([16])
Name: features.1.0.block.2.1.bias, Shape: torch.Size([16])
Name: features.2.0.block.0.0.weight, Shape: torch.Size([96, 16, 1, 1])
Name: features.2.0.block.0.1.weight, Shape: torch.Size([96])
Name: features.2.0.block.0.1.bias, Shape: torch.Size([96])
Nam

Using cache found in C:\Users\weita/.cache\torch\hub\pytorch_vision_main


In [19]:
# Example usage with EfficientNetB0
model_efficientnetb0 = deepcopy(model1)  # assuming model1 is a pretrained efficientnetb0
model_efficientnetb0 = model_efficientnetb0.to(device)
# Call the training function
train_model_with_autoencoder_and_grid_search(
    model=model_efficientnetb0,
    train_loader=train_loader,
    valid_loader=valid_loader,
    unlabelled_loader=unlabelled_loader,
    num_classes=num_classes,
    num_epochs=num_epochs,
    learning_rate=learning_rate,
    log_filename='efficientnetb0_training_log.csv',
    model_name='efficientnet',
    batch_size=batch_size
)

Using device: cuda
Starting Phase 1: Initial Training on Labeled Data...
Initial Training Epoch [1/10], Loss: 3.9637
Initial Training Epoch [2/10], Loss: 1.2338
Initial Training Epoch [3/10], Loss: 0.8843
Initial Training Epoch [4/10], Loss: 0.6483
Initial Training Epoch [5/10], Loss: 0.7510
Initial Training Epoch [6/10], Loss: 0.5487
Initial Training Epoch [7/10], Loss: 0.4331
Initial Training Epoch [8/10], Loss: 0.3330
Initial Training Epoch [9/10], Loss: 0.2836
Initial Training Epoch [10/10], Loss: 0.3497
Starting Phase 2: Training Autoencoder on Unlabeled Data...
Autoencoder Training Epoch [1/10], Loss: 0.6489
Autoencoder Training Epoch [2/10], Loss: 0.5431
Autoencoder Training Epoch [3/10], Loss: 0.4582
Autoencoder Training Epoch [4/10], Loss: 0.4067
Autoencoder Training Epoch [5/10], Loss: 0.3699
Autoencoder Training Epoch [6/10], Loss: 0.3471
Autoencoder Training Epoch [7/10], Loss: 0.3314
Autoencoder Training Epoch [8/10], Loss: 0.3236
Autoencoder Training Epoch [9/10], Loss: 0

In [20]:
# Initialize the EfficientNetB0 model
best_model_efficientnet = models.efficientnet_b0(weights=None)  # `pretrained=False` is deprecated in recent torchvision
best_model_efficientnet.classifier[1] = nn.Linear(best_model_efficientnet.classifier[1].in_features, num_classes)
best_model_efficientnet = best_model_efficientnet.to(device)

# Load the best model weights (the file name does not depend on the unfreezing configuration)
best_model_efficientnet.load_state_dict(torch.load('logs/best_model_efficientnet.pth', map_location=device))

# Set the model to evaluation mode
best_model_efficientnet.eval()



EfficientNet(
  (features): Sequential(
    (0): Conv2dNormActivation(
      (0): Conv2d(3, 32, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
      (1): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (2): SiLU(inplace=True)
    )
    (1): Sequential(
      (0): MBConv(
        (block): Sequential(
          (0): Conv2dNormActivation(
            (0): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=32, bias=False)
            (1): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (2): SiLU(inplace=True)
          )
          (1): SqueezeExcitation(
            (avgpool): AdaptiveAvgPool2d(output_size=1)
            (fc1): Conv2d(32, 8, kernel_size=(1, 1), stride=(1, 1))
            (fc2): Conv2d(8, 32, kernel_size=(1, 1), stride=(1, 1))
            (activation): SiLU(inplace=True)
            (scale_activation): Sigmoid()
          )
          (2): Conv2dNormActivat

In [21]:
# Call the test function
test_model(best_model_efficientnet, test_loader)

Test Accuracy: 62.4625%


In [22]:
# Call the function to calculate and print F1-scores
test_model_with_f1(best_model_efficientnet, test_loader)

Macro F1-score: 0.6246351501282467
Classification Report:
               precision    recall  f1-score   support

     Class 0       0.65      0.83      0.73       800
     Class 1       0.58      0.61      0.60       800
     Class 2       0.82      0.82      0.82       800
     Class 3       0.53      0.53      0.53       800
     Class 4       0.61      0.53      0.57       800
     Class 5       0.45      0.51      0.48       800
     Class 6       0.68      0.60      0.64       800
     Class 7       0.54      0.56      0.55       800
     Class 8       0.77      0.60      0.67       800
     Class 9       0.68      0.66      0.67       800

    accuracy                           0.62      8000
   macro avg       0.63      0.62      0.62      8000
weighted avg       0.63      0.62      0.62      8000
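
The macro F1-score printed above is the unweighted mean of the ten per-class F1 values. A minimal sketch of that arithmetic, using the rounded f1-score column from the report (so it agrees with the reported 0.6246 only to about two decimal places); the variable names are illustrative:

```python
# Per-class F1 values copied from the f1-score column of the report above (rounded to 2 d.p.)
per_class_f1 = [0.73, 0.60, 0.82, 0.53, 0.57, 0.48, 0.64, 0.55, 0.67, 0.67]

# Macro averaging gives every class equal weight, regardless of support
macro_f1 = sum(per_class_f1) / len(per_class_f1)
print(f"Macro F1 from rounded rows: {macro_f1:.3f}")
```

Because every class has the same support (800 test images), the macro and weighted averages in the report coincide.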



## Vision Transformer (ViT)

In [23]:
# Set image size to 224x224 to match the input size of ViT
transform_train = transforms.Compose([
    transforms.Resize((224, 224)),  # Resize images to 224x224
    transforms.RandomCrop(224, padding=4),
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
])

transform_unlabelled = transforms.Compose([
    transforms.Resize((224, 224)),  # Resize images to 224x224
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
])

transform_test = transforms.Compose([
    transforms.Resize((224, 224)),  # Resize images to 224x224
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
])

In [24]:
# Load train and validation sets without redownloading data
trainval_set = STL10(dataset_dir, split='train', transform=transform_train, download=False)

# Use 10% of the data for training (simulating a low data scenario)
num_train = int(len(trainval_set) * 0.1)

# Split data into train/validation sets with a fixed random seed
# Note: both subsets wrap trainval_set, so validation images also pass through transform_train
torch.manual_seed(0)  # Ensure reproducibility
train_set, val_set = random_split(trainval_set, [num_train, len(trainval_set) - num_train])

# Load test set without redownloading data
test_set = STL10(dataset_dir, split='test', transform=transform_test, download=False)

In [25]:
unlabelled_set = STL10(dataset_dir, split='unlabeled', transform=transform_unlabelled, download=False)

# Determine the size of the subset (1/1000 of the full dataset)
subset_size = len(unlabelled_set) // 1000  # This will be 100 samples

# Randomly select indices for the subset
random_indices = random.sample(range(len(unlabelled_set)), subset_size)

# Create a subset of the unlabelled dataset
unlabelled_subset = Subset(unlabelled_set, random_indices)

# Now, create the DataLoader using the subset
unlabelled_loader = DataLoader(unlabelled_subset, shuffle=True, batch_size=batch_size, num_workers=2)

In [26]:
# Create DataLoader for train, validation, and test sets
train_loader = DataLoader(train_set, shuffle=True, batch_size=batch_size, num_workers=2)

valid_loader = DataLoader(val_set, batch_size=batch_size, num_workers=2)
test_loader = DataLoader(test_set, batch_size=batch_size, num_workers=2)
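
With `batch_size = 16`, the number of batches each loader yields per epoch follows from a ceiling division, since `DataLoader` keeps the final partial batch by default (`drop_last=False`). A small sketch, assuming the usual STL10 split sizes (5,000 labelled images, 8,000 test images) and the 100-image unlabelled subset created above; the dictionary names are illustrative:

```python
import math

batch_size = 16  # same value as defined at the top of the notebook

# Assumed dataset sizes: 10% / 90% of 5,000 labelled images, 8,000 test images, 100 unlabelled
sizes = {"train": 500, "validation": 4500, "test": 8000, "unlabelled": 100}

# ceil(n / batch_size) batches per epoch when the last partial batch is kept
batches = {name: math.ceil(n / batch_size) for name, n in sizes.items()}
for name, count in batches.items():
    print(f"{name}: {count} batches")
```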

In [27]:
# Load a Vision Transformer (ViT-B/16) with ImageNet-pretrained weights from torchvision
model2 = models.vit_b_16(weights=models.ViT_B_16_Weights.IMAGENET1K_V1)

# Print each parameter's name and shape to inspect the model structure
for name, param in model2.named_parameters():
    print(f"Name: {name}, Shape: {param.shape}")



Name: class_token, Shape: torch.Size([1, 1, 768])
Name: conv_proj.weight, Shape: torch.Size([768, 3, 16, 16])
Name: conv_proj.bias, Shape: torch.Size([768])
Name: encoder.pos_embedding, Shape: torch.Size([1, 197, 768])
Name: encoder.layers.encoder_layer_0.ln_1.weight, Shape: torch.Size([768])
Name: encoder.layers.encoder_layer_0.ln_1.bias, Shape: torch.Size([768])
Name: encoder.layers.encoder_layer_0.self_attention.in_proj_weight, Shape: torch.Size([2304, 768])
Name: encoder.layers.encoder_layer_0.self_attention.in_proj_bias, Shape: torch.Size([2304])
Name: encoder.layers.encoder_layer_0.self_attention.out_proj.weight, Shape: torch.Size([768, 768])
Name: encoder.layers.encoder_layer_0.self_attention.out_proj.bias, Shape: torch.Size([768])
Name: encoder.layers.encoder_layer_0.ln_2.weight, Shape: torch.Size([768])
Name: encoder.layers.encoder_layer_0.ln_2.bias, Shape: torch.Size([768])
Name: encoder.layers.encoder_layer_0.mlp.0.weight, Shape: torch.Size([3072, 768])
Name: encoder.layers.
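
The shapes printed above follow directly from the ViT-B/16 configuration. As a quick sanity check (a sketch, with the patch size, input resolution, and hidden width read off the printed shapes and the transforms): a 224x224 image cut into 16x16 patches gives 196 patch tokens plus one class token, matching the 197 positions in `encoder.pos_embedding`, and the fused query/key/value projection has 3 x 768 = 2304 rows.

```python
image_size_vit = 224   # input resolution used in the ViT transforms
patch_size = 16        # kernel/stride of conv_proj: [768, 3, 16, 16]
hidden_dim = 768       # embedding width

# Patches per side, then total patch tokens plus the class token
patches_per_side = image_size_vit // patch_size
seq_len = patches_per_side ** 2 + 1

# self_attention.in_proj_weight stacks the Q, K and V projections
in_proj_rows = 3 * hidden_dim
# mlp.0 expands the hidden width by a factor of 4
mlp_dim = 4 * hidden_dim

print(seq_len, in_proj_rows, mlp_dim)
```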

In [32]:
# Example usage with Vision Transformer (ViT)
model_vit = deepcopy(model2)  # assuming model2 is a pretrained Vision Transformer (ViT)
model_vit = model_vit.to(device)

# Call the training function
train_model_with_autoencoder_and_grid_search(
    model=model_vit,
    train_loader=train_loader,
    valid_loader=valid_loader,
    unlabelled_loader=unlabelled_loader,
    num_classes=num_classes,
    num_epochs=num_epochs,
    learning_rate=learning_rate,
    log_filename='vit_training_log.csv',
    model_name='vit',
    batch_size=batch_size
)


Using device: cuda
Starting Phase 1: Initial Training on Labeled Data...
Initial Training Epoch [1/10], Loss: 3.0538
Initial Training Epoch [2/10], Loss: 2.3295
Initial Training Epoch [3/10], Loss: 2.2346
Initial Training Epoch [4/10], Loss: 2.1957
Initial Training Epoch [5/10], Loss: 2.2515
Initial Training Epoch [6/10], Loss: 2.0731
Initial Training Epoch [7/10], Loss: 2.1408
Initial Training Epoch [8/10], Loss: 2.1249
Initial Training Epoch [9/10], Loss: 2.0998
Initial Training Epoch [10/10], Loss: 2.0570
Starting Phase 2: Training Autoencoder on Unlabeled Data...
Autoencoder Training Epoch [1/10], Loss: 0.2217
Autoencoder Training Epoch [2/10], Loss: 0.0796
Autoencoder Training Epoch [3/10], Loss: 0.0633
Autoencoder Training Epoch [4/10], Loss: 0.0515
Autoencoder Training Epoch [5/10], Loss: 0.0428
Autoencoder Training Epoch [6/10], Loss: 0.0374
Autoencoder Training Epoch [7/10], Loss: 0.0319
Autoencoder Training Epoch [8/10], Loss: 0.0289
Autoencoder Training Epoch [9/10], Loss: 0
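
To put the Phase 1 losses in context: if the training loss is the standard cross-entropy over `num_classes = 10` classes (an assumption, since the loss function is defined outside this section), a classifier that outputs a uniform distribution scores ln(10), roughly 2.303. The final logged loss of 2.0570 is only modestly below that baseline. A minimal sketch of the comparison:

```python
import math

num_classes = 10

# Cross-entropy of a uniform prediction over num_classes classes: -ln(1 / num_classes)
chance_loss = math.log(num_classes)

# Final Phase 1 loss copied from the log above
final_loss = 2.0570

print(f"Chance-level loss: {chance_loss:.4f}")
print(f"Margin below chance: {chance_loss - final_loss:.4f}")
```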

In [33]:
# Initialize the ViT model without pretrained weights (they are loaded from the checkpoint below)
best_model_vit = models.vit_b_16(weights=None)
best_model_vit.heads.head = nn.Linear(best_model_vit.heads.head.in_features, num_classes)
best_model_vit = best_model_vit.to(device)

# Load the best model weights
best_config = 'fc+encoder11'  # Name of the best configuration (for reference only; not used in the checkpoint path)
best_model_vit.load_state_dict(torch.load('logs/best_model_vit.pth', map_location=device))

# Set the model to evaluation mode
best_model_vit.eval()


VisionTransformer(
  (conv_proj): Conv2d(3, 768, kernel_size=(16, 16), stride=(16, 16))
  (encoder): Encoder(
    (dropout): Dropout(p=0.0, inplace=False)
    (layers): Sequential(
      (encoder_layer_0): EncoderBlock(
        (ln_1): LayerNorm((768,), eps=1e-06, elementwise_affine=True)
        (self_attention): MultiheadAttention(
          (out_proj): NonDynamicallyQuantizableLinear(in_features=768, out_features=768, bias=True)
        )
        (dropout): Dropout(p=0.0, inplace=False)
        (ln_2): LayerNorm((768,), eps=1e-06, elementwise_affine=True)
        (mlp): MLPBlock(
          (0): Linear(in_features=768, out_features=3072, bias=True)
          (1): GELU(approximate='none')
          (2): Dropout(p=0.0, inplace=False)
          (3): Linear(in_features=3072, out_features=768, bias=True)
          (4): Dropout(p=0.0, inplace=False)
        )
      )
      (encoder_layer_1): EncoderBlock(
        (ln_1): LayerNorm((768,), eps=1e-06, elementwise_affine=True)
        (self_a

In [34]:
# Call the test function
test_model(best_model_vit, test_loader)

Test Accuracy: 29.925%


In [35]:
# Call the function to calculate and print F1-scores
test_model_with_f1(best_model_vit, test_loader)

Macro F1-score: 0.2644254083813572
Classification Report:
               precision    recall  f1-score   support

     Class 0       0.64      0.22      0.33       800
     Class 1       0.22      0.47      0.30       800
     Class 2       0.41      0.56      0.47       800
     Class 3       0.20      0.26      0.23       800
     Class 4       0.44      0.23      0.30       800
     Class 5       0.00      0.00      0.00       800
     Class 6       0.23      0.54      0.32       800
     Class 7       0.18      0.01      0.03       800
     Class 8       0.39      0.53      0.45       800
     Class 9       0.31      0.17      0.22       800

    accuracy                           0.30      8000
   macro avg       0.30      0.30      0.26      8000
weighted avg       0.30      0.30      0.26      8000
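
Each f1-score in the report is the harmonic mean of that class's precision and recall. The sketch below recomputes two rows from the rounded values above; the helper `f1_from_pr` is illustrative and not part of the notebook. When precision and recall are both zero, as for Class 5, the score is taken to be 0, which matches what the report shows.

```python
def f1_from_pr(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; returns 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# (precision, recall) pairs copied from the report above
class_0 = f1_from_pr(0.64, 0.22)
class_5 = f1_from_pr(0.00, 0.00)

print(f"Class 0: {class_0:.2f}")
print(f"Class 5: {class_5:.2f}")
```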



