# Comparative Analysis of Self-Supervised and Supervised Pretraining Approaches for Genshin Impact Character Classification Using ResNet-18

This notebook implements an experimental framework to compare self-supervised and supervised pretraining approaches for the task of Genshin Impact character classification using the ResNet-18 architecture. The goal is to evaluate the effectiveness of different pretraining strategies in leveraging limited labeled data for improved classification performance.

## Required Libraries
- PyTorch
- Torchvision
- NumPy
- Matplotlib
- Pandas
- Scikit-learn
- Seaborn
- Hugging Face (Optional for model uploading)
- Kaggle (for dataset management)

## Key elements of the implementation
- **Image Preprocessing**:
    - Resizing to 256x256 pixels by stretching (the aspect ratio is not preserved).
    - Normalization using ImageNet statistics.

- **Data Augmentation**:
    - Random cropping, horizontal flipping, color jittering, and Gaussian blur for SimCLR pretraining.
    - Random scaling by a factor in $[1.0, 1.5]$ with center cropping, random rotation in $[-15^\circ, 15^\circ]$, and horizontal flipping for fine-tuning.

- **Model Architecture**:
    - ResNet-18 with a projection head for SimCLR pretraining.
    - Classification MLP head for fine-tuning with 6 output classes.

- **Training Configuration**:
    - SimCLR pretraining with 500 epochs, a batch size of 256, and a learning rate of 0.5.
    - Fine-tuning with 100 epochs, a batch size of 32, and a learning rate of 0.01, with gradual unfreezing of the ResNet layers.

- **Loss Functions**:
    - Contrastive loss with temperature scaling for SimCLR pretraining.
    - Cross-entropy loss for classification fine-tuning.

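> The contrastive (NT-Xent) loss is computed from temperature-scaled cosine similarities between projected features: each positive pair (two augmented views of the same image) is contrasted against all other samples in the batch, which act as negatives. The classification loss is the cross-entropy of the softmax over the logits of the final classification layer.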
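
The loss described above is the NT-Xent (normalized temperature-scaled cross-entropy) loss used by SimCLR. A minimal PyTorch sketch, assuming the projections of the two augmented views arrive as two `(N, D)` tensors whose rows are aligned (row `i` of each comes from the same image). The function name `nt_xent_loss` and the default temperature of 0.5 are illustrative assumptions; the notebook does not state its temperature value.

```python
import torch
from torch.nn import functional as nnf


def nt_xent_loss(z1: torch.Tensor, z2: torch.Tensor, temperature: float = 0.5) -> torch.Tensor:
    """NT-Xent loss for a batch of positive pairs (hypothetical helper).

    z1, z2: projections of two augmented views, shape (N, D); row i of z1 and
    row i of z2 form a positive pair, every other row acts as a negative.
    """
    n = z1.size(0)
    z = nnf.normalize(torch.cat([z1, z2], dim=0), dim=1)  # (2N, D), unit-length rows
    sim = z @ z.t() / temperature                          # cosine similarity / temperature
    # An embedding must never be scored against itself
    self_mask = torch.eye(2 * n, dtype=torch.bool, device=z.device)
    sim = sim.masked_fill(self_mask, float("-inf"))
    # Row i's positive is row i + N, and row i + N's positive is row i
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)]).to(z.device)
    # Softmax cross-entropy per row: the positive against the 2N - 2 negatives
    return nnf.cross_entropy(sim, targets)
```

With a batch size of 256, each anchor is contrasted against 510 negatives; a lower temperature sharpens the softmax and puts more weight on hard negatives.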


- **Evaluation Metrics**:
    - Cross-validation with 5 folds to ensure robustness.
    - Top-1 and Top-5 accuracy, F1-score, precision, recall, and confusion matrix analysis.
    - Visualization of learned features using t-SNE and Grad-CAM for interpretability.

- **Model Training Approaches**:
    - Purely supervised training on the final dataset (comparison baseline).
    - Self-supervised pretraining on the unlabeled dataset, followed by supervised fine-tuning on the final dataset.
    - Self-supervised pretraining on the unlabeled dataset, followed by semi-supervised fine-tuning on the final dataset plus the unlabeled dataset.
    - Supervised ImageNet pretraining, followed by supervised fine-tuning on the final dataset.
    - ImageNet supervised pretraining followed by fine-tuning by supervised training on the final dataset.

## 1. Import dependencies

In [1]:
import torch
import torch.nn as nn
import torch.optim as optim
import torchvision.transforms as transforms
import torchvision.transforms.functional as F
import torchvision.datasets as datasets
import torchvision.models as models
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import kagglehub
import os
from tqdm import tqdm
from PIL import Image

## 2. Check GPU availability

In [2]:
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Using device: {device}")

if device.type == 'cuda':
    print(torch.cuda.get_device_name(0))
    print(f"Total Memory: {torch.cuda.get_device_properties(0).total_memory / (1024 ** 3):.2f} GB")
    print(f"Memory Allocated: {torch.cuda.memory_allocated(0) / (1024 ** 3):.2f} GB")
    print(f"Memory Reserved (cached): {torch.cuda.memory_reserved(0) / (1024 ** 3):.2f} GB")

Using device: cuda
NVIDIA GeForce RTX 3050 Laptop GPU
Total Memory: 4.00 GB
Memory Allocated: 0.00 GB
Memory Reserved (cached): 0.00 GB


In [None]:
torch.backends.cudnn.allow_tf32 = True
torch.backends.cuda.matmul.allow_tf32 = True

# Set default tensor type to float32
torch.set_default_dtype(torch.float32)

print(f"TF32 for cuDNN: {torch.backends.cudnn.allow_tf32}")
print(f"TF32 for matmul: {torch.backends.cuda.matmul.allow_tf32}")
print(f"Default tensor dtype: {torch.get_default_dtype()}")

TF32 for cuDNN: False
TF32 for matmul: False
Default tensor dtype: torch.float32


In [None]:
!nvidia-smi

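## 3. Dataset retrieval and transformation for self-supervised learning
- Download three unlabeled anime-illustration datasets from Kaggle (anime faces, Pixiv top daily illustration faces, and full-body anime girls).
- Apply the preprocessing transforms (256x256 resize and ImageNet normalization).
- Combine them and save the result as a single dataset for self-supervised learning.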

In [None]:
IMAGENET_MEAN = [0.485, 0.456, 0.406]  # ImageNet channel means/stds used for normalization
IMAGENET_STD = [0.229, 0.224, 0.225]
N = 10  # Number of images to visualize

# Deterministic preprocessing: stretch-resize to 256x256 (aspect ratio not preserved),
# convert to a tensor and normalize with ImageNet statistics.
# Defined outside the if/else so it is available even when the cached dataset is loaded.
transform = transforms.Compose([
    transforms.Resize((256, 256)),
    transforms.ToTensor(),
    transforms.Normalize(mean=IMAGENET_MEAN, std=IMAGENET_STD),
])

ssl_dataset_dir = os.path.join(os.getcwd(), "datasets")
ssl_dataset_file = os.path.join(ssl_dataset_dir, "ssl-dataset.pt")

if os.path.exists(ssl_dataset_file):  # reuse the combined dataset if it was already saved
    print("Combined dataset already exists. Loading...")
    combined_dataset = torch.load(ssl_dataset_file, weights_only=False)
else:  # otherwise download the datasets, combine them and save the result
    print("Combined dataset does not exist. Downloading and processing datasets...")

    ds1_path = kagglehub.dataset_download("soumikrakshit/anime-faces")
    ds2_path = kagglehub.dataset_download("stevenevan99/face-of-pixiv-top-daily-illustration-2020")
    ds3_path = kagglehub.dataset_download("hirunkulphimsiri/fullbody-anime-girls-datasets")

    # Load each dataset (ImageFolder expects one sub-folder per class)
    dataset1 = datasets.ImageFolder(root=ds1_path, transform=transform)
    dataset2 = datasets.ImageFolder(root=ds2_path, transform=transform)
    dataset3 = datasets.ImageFolder(root=ds3_path, transform=transform)
    # Combine the datasets
    combined_dataset = torch.utils.data.ConcatDataset([dataset1, dataset2, dataset3])

    # This pickles the dataset object (file paths + transform), not decoded image tensors,
    # so the Kaggle download cache must still exist when the file is reloaded.
    os.makedirs(ssl_dataset_dir, exist_ok=True)
    torch.save(combined_dataset, ssl_dataset_file)


def visualize_dataset(dataset, num_images=5):
    """Show `num_images` random samples, undoing the ImageNet normalization for display."""
    mean = torch.tensor(IMAGENET_MEAN).view(3, 1, 1)
    std = torch.tensor(IMAGENET_STD).view(3, 1, 1)
    indices = np.random.choice(len(dataset), num_images, replace=False)

    fig, axes = plt.subplots(1, num_images, figsize=(15, 5))
    for ax, idx in zip(np.atleast_1d(axes), indices):
        img, label = dataset[idx]  # load each sample only once
        img = (img * std + mean).clamp(0, 1)  # back to [0, 1] so colours display correctly
        ax.imshow(F.to_pil_image(img))
        ax.axis('off')
        ax.set_title(f"Label: {label}")
    fig.suptitle(f"Sample Images from Dataset ({num_images} images)")
    plt.show()

visualize_dataset(combined_dataset, num_images=N)

## 4. Dataset retrieval and transformation for supervised fine-tuning

In [None]:
from google.colab import drive # Mount Google Drive to access dataset
drive.mount('/content/gdrive/', force_remount=True)

In [None]:
import zipfile
import shutil

N = 10  # Number of images to visualize

dataset_path_compressed = "/content/gdrive/MyDrive/GenshinImageClassifier/dataset.zip"
datasets_dir = os.path.join(os.getcwd(), "datasets")
finetune_dataset_file = os.path.join(datasets_dir, "finetune-dataset.pt")
tmp_dir = os.path.join(os.getcwd(), "tmp")

# Same deterministic preprocessing as the SSL dataset: 256x256 stretch-resize + ImageNet normalization
finetune_transform = transforms.Compose([
    transforms.Resize((256, 256)),
    transforms.ToTensor(),
    transforms.Normalize(mean=IMAGENET_MEAN, std=IMAGENET_STD),
])

# Reuse the processed dataset if it already exists
if os.path.exists(finetune_dataset_file):
    print("Processed dataset already exists. Loading...")
    dataset = torch.load(finetune_dataset_file, weights_only=False)
else:
    # Stop early if the unprocessed archive is missing
    if not os.path.exists(dataset_path_compressed):
        raise FileNotFoundError(f"Dataset file at {dataset_path_compressed} does not exist. Please download it or update the path.")

    # Copy the archive into a local tmp/ folder (created if needed) and extract it there
    os.makedirs(tmp_dir, exist_ok=True)
    local_zip = os.path.join(tmp_dir, "dataset.zip")
    print(f"Copying dataset file to {tmp_dir} ...")
    shutil.copy(dataset_path_compressed, local_zip)

    with zipfile.ZipFile(local_zip, 'r') as zip_ref:
        zip_ref.extractall(tmp_dir)

    dataset_path = os.path.join(tmp_dir, "dataset")
    if not os.path.exists(dataset_path):
        raise FileNotFoundError(f"Dataset path {dataset_path} does not exist. Please check the extraction.")

    # Load the dataset using ImageFolder (one sub-folder per character class)
    dataset = datasets.ImageFolder(root=dataset_path, transform=finetune_transform)

    # Save the dataset object (file paths + transform) as a .pt file
    os.makedirs(datasets_dir, exist_ok=True)
    torch.save(dataset, finetune_dataset_file)

# Visualize N random images from the dataset
visualize_dataset(dataset, num_images=N)

## 5. Model training, evaluation, and feature extraction/visualization
- Train the baseline model on the final dataset.
- Pretrain the self-supervised model on the unlabeled dataset.
- Fine-tune the self-supervised model on the final dataset using supervised training.
- Fine-tune the self-supervised model on the final dataset using semi-supervised training.
- Fine-tune the ImageNet-pretrained model on the final dataset using supervised training.

### 5.1. Baseline Model

In [None]:
from sklearn.model_selection import KFold
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score, confusion_matrix
import torch
import torch.nn as nn
import torch.optim as optim
import torchvision.models as models
import copy
import numpy as np

EPOCHS = 200
BATCH_SIZE = 64
LEARNING_RATE = 0.001
K_FOLDS = 5

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
use_amp = device.type == "cuda"  # FP16 mixed precision only on the GPU

# Keep TF32 disabled so FP32 ops run in full FP32; FP16 speed-ups come from autocast below
torch.backends.cuda.matmul.allow_tf32 = False
torch.backends.cudnn.allow_tf32 = False

# Allow reduced-precision reductions inside FP16 matmuls (faster, slightly less accurate)
torch.backends.cuda.matmul.allow_fp16_reduced_precision_reduction = True

# Enable the cuDNN autotuner to pick the fastest convolution algorithms for fixed input sizes
torch.backends.cudnn.benchmark = True

# Get all indices for the dataset
dataset_size = len(dataset)
indices = list(range(dataset_size))

print(f"Starting {K_FOLDS}-fold cross-validation on {dataset_size} samples...")
print(f"FP16 mixed precision (autocast + gradient scaling): {use_amp}. Model weights are kept in FP32.")

kfold = KFold(n_splits=K_FOLDS, shuffle=True, random_state=42)
fold_results = {
    'train_losses': [],
    'train_accuracies': [],
    'val_losses': [],
    'val_accuracies': [],
    'test_metrics': []
}

for fold, (train_idx, val_idx) in enumerate(kfold.split(indices)):
    print(f"\n{'='*50}")
    print(f"FOLD {fold + 1}/{K_FOLDS}")
    print(f"{'='*50}")

    train_sampler = torch.utils.data.SubsetRandomSampler(train_idx)
    val_sampler = torch.utils.data.SubsetRandomSampler(val_idx)

    train_loader = torch.utils.data.DataLoader(
        dataset, batch_size=BATCH_SIZE, sampler=train_sampler
    )
    val_loader = torch.utils.data.DataLoader(
        dataset, batch_size=BATCH_SIZE, sampler=val_sampler
    )

    # Fresh, randomly initialised ResNet-18 per fold. Weights stay in FP32 and autocast runs
    # eligible ops in FP16 (pure .half() training with Adam is prone to underflow and NaNs).
    model = models.resnet18(weights=None, num_classes=6).to(device)
    # Scales the loss before backward so small FP16 gradients do not underflow to zero
    scaler = torch.amp.GradScaler("cuda", enabled=use_amp)

    print(f"Model parameters dtype: {next(model.parameters()).dtype}")

    loss_criterion = nn.CrossEntropyLoss().to(device)
    optimizer = optim.Adam(model.parameters(), lr=LEARNING_RATE)

    fold_train_losses = []
    fold_train_accuracies = []
    fold_val_losses = []
    fold_val_accuracies = []

    best_val_accuracy = -1.0
    best_model_state = copy.deepcopy(model.state_dict())

    for epoch in range(EPOCHS):
        model.train()
        running_loss = 0.0
        correct = 0
        total = 0

        for images, labels in train_loader:
            images = images.to(device, non_blocking=True)
            labels = labels.to(device, non_blocking=True)

            optimizer.zero_grad(set_to_none=True)
            with torch.autocast(device_type=device.type, dtype=torch.float16, enabled=use_amp):
                outputs = model(images)
                loss = loss_criterion(outputs, labels)
            scaler.scale(loss).backward()
            scaler.step(optimizer)  # skips the update if inf/NaN gradients were found
            scaler.update()

            running_loss += loss.item() * images.size(0)
            _, predicted = torch.max(outputs.detach(), 1)
            total += labels.size(0)
            correct += (predicted == labels).sum().item()

        epoch_train_loss = running_loss / len(train_idx)
        epoch_train_accuracy = 100 * correct / total

        # Validation phase
        model.eval()
        val_running_loss = 0.0
        val_correct = 0
        val_total = 0

        with torch.no_grad():
            for images, labels in val_loader:
                images = images.to(device, non_blocking=True)
                labels = labels.to(device, non_blocking=True)
                with torch.autocast(device_type=device.type, dtype=torch.float16, enabled=use_amp):
                    outputs = model(images)
                    loss = loss_criterion(outputs, labels)

                val_running_loss += loss.item() * images.size(0)
                _, predicted = torch.max(outputs, 1)
                val_total += labels.size(0)
                val_correct += (predicted == labels).sum().item()

        epoch_val_loss = val_running_loss / len(val_idx)
        epoch_val_accuracy = 100 * val_correct / val_total

        fold_train_losses.append(epoch_train_loss)
        fold_train_accuracies.append(epoch_train_accuracy)
        fold_val_losses.append(epoch_val_loss)
        fold_val_accuracies.append(epoch_val_accuracy)

        if epoch_val_accuracy > best_val_accuracy:
            best_val_accuracy = epoch_val_accuracy
            # Snapshot of the (FP32) weights from the best validation epoch
            best_model_state = copy.deepcopy(model.state_dict())

        if (epoch + 1) % 50 == 0:
            print(f"Epoch [{epoch+1}/{EPOCHS}]")
            print(f"  Train - Loss: {epoch_train_loss:.4f}, Acc: {epoch_train_accuracy:.2f}%")
            print(f"  Val   - Loss: {epoch_val_loss:.4f}, Acc: {epoch_val_accuracy:.2f}%")

    # Restore the best weights before computing the fold metrics
    model.load_state_dict(best_model_state)

    model.eval()
    all_predictions = []
    all_labels = []

    with torch.no_grad():
        for images, labels in val_loader:
            images = images.to(device, non_blocking=True)
            labels = labels.to(device, non_blocking=True)
            with torch.autocast(device_type=device.type, dtype=torch.float16, enabled=use_amp):
                outputs = model(images)
            _, predicted = torch.max(outputs, 1)
            all_predictions.extend(predicted.cpu().numpy())
            all_labels.extend(labels.cpu().numpy())

    fold_accuracy = accuracy_score(all_labels, all_predictions)
    fold_f1 = f1_score(all_labels, all_predictions, average='macro', zero_division=0)
    fold_precision = precision_score(all_labels, all_predictions, average='macro', zero_division=0)
    fold_recall = recall_score(all_labels, all_predictions, average='macro', zero_division=0)

    fold_results['train_losses'].append(fold_train_losses)
    fold_results['train_accuracies'].append(fold_train_accuracies)
    fold_results['val_losses'].append(fold_val_losses)
    fold_results['val_accuracies'].append(fold_val_accuracies)
    fold_results['test_metrics'].append({
        'accuracy': fold_accuracy,
        'f1': fold_f1,
        'precision': fold_precision,
        'recall': fold_recall
    })

    print(f"\nFold {fold + 1} Results:")
    print(f"  Best Validation Accuracy: {best_val_accuracy:.2f}%")
    print(f"  Test Accuracy: {fold_accuracy*100:.2f}%")
    print(f"  Test F1-Score: {fold_f1:.4f}")
    print(f"  Test Precision: {fold_precision:.4f}")
    print(f"  Test Recall: {fold_recall:.4f}")

# Calculate and display overall results
print(f"\n{'='*60}")
print("CROSS-VALIDATION RESULTS SUMMARY")
print(f"{'='*60}")

# Extract test metrics
test_accuracies = [fold['accuracy'] for fold in fold_results['test_metrics']]
test_f1_scores = [fold['f1'] for fold in fold_results['test_metrics']]
test_precisions = [fold['precision'] for fold in fold_results['test_metrics']]
test_recalls = [fold['recall'] for fold in fold_results['test_metrics']]

# Calculate statistics
print(f"Test Accuracy:  {np.mean(test_accuracies)*100:.2f}% ± {np.std(test_accuracies)*100:.2f}%")
print(f"Test F1-Score:  {np.mean(test_f1_scores):.4f} ± {np.std(test_f1_scores):.4f}")
print(f"Test Precision: {np.mean(test_precisions):.4f} ± {np.std(test_precisions):.4f}")
print(f"Test Recall:    {np.mean(test_recalls):.4f} ± {np.std(test_recalls):.4f}")

# Plot training curves for all folds
fig, axes = plt.subplots(2, 2, figsize=(15, 12))

# Average training loss
avg_train_losses = np.mean(fold_results['train_losses'], axis=0)
std_train_losses = np.std(fold_results['train_losses'], axis=0)
epochs_range = range(1, EPOCHS + 1)

axes[0, 0].plot(epochs_range, avg_train_losses, 'b-', label='Mean Training Loss')
axes[0, 0].fill_between(epochs_range,
                       avg_train_losses - std_train_losses,
                       avg_train_losses + std_train_losses,
                       alpha=0.3, color='blue')
axes[0, 0].set_xlabel('Epoch')
axes[0, 0].set_ylabel('Loss')
axes[0, 0].set_title('Average Training Loss Across Folds')
axes[0, 0].legend()
axes[0, 0].grid(True)

# Average validation loss
avg_val_losses = np.mean(fold_results['val_losses'], axis=0)
std_val_losses = np.std(fold_results['val_losses'], axis=0)

axes[0, 1].plot(epochs_range, avg_val_losses, 'r-', label='Mean Validation Loss')
axes[0, 1].fill_between(epochs_range,
                       avg_val_losses - std_val_losses,
                       avg_val_losses + std_val_losses,
                       alpha=0.3, color='red')
axes[0, 1].set_xlabel('Epoch')
axes[0, 1].set_ylabel('Loss')
axes[0, 1].set_title('Average Validation Loss Across Folds')
axes[0, 1].legend()
axes[0, 1].grid(True)

# Average training accuracy
avg_train_acc = np.mean(fold_results['train_accuracies'], axis=0)
std_train_acc = np.std(fold_results['train_accuracies'], axis=0)

axes[1, 0].plot(epochs_range, avg_train_acc, 'g-', label='Mean Training Accuracy')
axes[1, 0].fill_between(epochs_range,
                       avg_train_acc - std_train_acc,
                       avg_train_acc + std_train_acc,
                       alpha=0.3, color='green')
axes[1, 0].set_xlabel('Epoch')
axes[1, 0].set_ylabel('Accuracy (%)')
axes[1, 0].set_title('Average Training Accuracy Across Folds')
axes[1, 0].legend()
axes[1, 0].grid(True)

# Average validation accuracy
avg_val_acc = np.mean(fold_results['val_accuracies'], axis=0)
std_val_acc = np.std(fold_results['val_accuracies'], axis=0)

axes[1, 1].plot(epochs_range, avg_val_acc, 'm-', label='Mean Validation Accuracy')
axes[1, 1].fill_between(epochs_range,
                       avg_val_acc - std_val_acc,
                       avg_val_acc + std_val_acc,
                       alpha=0.3, color='magenta')
axes[1, 1].set_xlabel('Epoch')
axes[1, 1].set_ylabel('Accuracy (%)')
axes[1, 1].set_title('Average Validation Accuracy Across Folds')
axes[1, 1].legend()
axes[1, 1].grid(True)

plt.tight_layout()
plt.show()

# Bar plot for test metrics comparison
fig, ax = plt.subplots(1, 1, figsize=(10, 6))
metrics_names = ['Accuracy', 'F1-Score', 'Precision', 'Recall']
metrics_means = [np.mean(test_accuracies)*100, np.mean(test_f1_scores)*100,
                np.mean(test_precisions)*100, np.mean(test_recalls)*100]
metrics_stds = [np.std(test_accuracies)*100, np.std(test_f1_scores)*100,
               np.std(test_precisions)*100, np.std(test_recalls)*100]

x_pos = np.arange(len(metrics_names))
bars = ax.bar(x_pos, metrics_means, yerr=metrics_stds, capsize=5,
              color=['skyblue', 'lightgreen', 'lightcoral', 'lightsalmon'])

ax.set_xlabel('Metrics')
ax.set_ylabel('Score (%)')
ax.set_title('Cross-Validation Test Metrics (Mean ± Std)')
ax.set_xticks(x_pos)
ax.set_xticklabels(metrics_names)
ax.grid(True, alpha=0.3)

# Add value labels on bars
for bar, mean_val, std_val in zip(bars, metrics_means, metrics_stds):
    height = bar.get_height()
    ax.text(bar.get_x() + bar.get_width()/2., height + std_val + 0.5,
            f'{mean_val:.1f}±{std_val:.1f}', ha='center', va='bottom')

plt.tight_layout()
plt.show()
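
The Evaluation Metrics list also mentions Top-5 accuracy and confusion-matrix analysis, and `confusion_matrix` is imported above but not used yet. A minimal sketch of both, assuming raw logits are kept for the top-k computation (the loop above stores only arg-max predictions) and using a hypothetical `fold_confusion_matrix` helper that row-normalizes, so each row shows how one character's images were classified. With only 6 classes, random guessing already reaches 5/6 ≈ 83% Top-5 accuracy, so Top-1 is the more informative number here.

```python
import numpy as np
import torch
from sklearn.metrics import confusion_matrix


def topk_accuracy(logits: torch.Tensor, labels: torch.Tensor, ks=(1, 5)) -> dict:
    """Fraction of samples whose true label is among the k highest-scoring classes."""
    maxk = min(max(ks), logits.size(1))      # k cannot exceed the number of classes
    topk = logits.topk(maxk, dim=1).indices  # (N, maxk), highest score first
    hits = topk.eq(labels.view(-1, 1))       # (N, maxk) boolean matches
    return {k: hits[:, :min(k, maxk)].any(dim=1).float().mean().item() for k in ks}


def fold_confusion_matrix(labels, predictions, num_classes: int = 6, normalize: bool = True) -> np.ndarray:
    """Confusion matrix (rows = true class, columns = predicted class), optionally row-normalized."""
    cm = confusion_matrix(labels, predictions, labels=list(range(num_classes)))
    if not normalize:
        return cm
    row_sums = cm.sum(axis=1, keepdims=True)
    # Rows with no samples stay at zero instead of dividing by zero
    return np.divide(cm, row_sums, out=np.zeros(cm.shape, dtype=float), where=row_sums > 0)
```

Inside the fold loop, `fold_confusion_matrix(all_labels, all_predictions)` can be passed to `sns.heatmap(cm, annot=True, fmt=".2f")` to visualize which characters are confused with each other.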