# Fine-Tuning a Swin Transformer on CIFAR-10
## Chapter 5 Module 2

## Setup and Imports
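We begin by installing the libraries this notebook relies on: Hugging Face Transformers (model loading and the `Trainer` API), Torchvision (the CIFAR-10 dataset and image transforms), and Accelerate (which `Trainer` uses under the hood for device placement and training), plus Hugging Face Datasets and Evaluate.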

In [14]:
!pip install -U transformers datasets torchvision evaluate accelerate --quiet

In [None]:
import torch
from torchvision import transforms
from torchvision.datasets import CIFAR10
from torch.utils.data import DataLoader
from transformers import AutoImageProcessor, SwinForImageClassification, Trainer, TrainingArguments
import numpy as np


## Preprocessing CIFAR-10 with a Swin-Compatible Pipeline
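We define a transform that upsamples the 32x32 CIFAR-10 images to 224x224 (the resolution the `swin-tiny-patch4-window7-224` checkpoint was pretrained at), converts them to tensors, and normalizes them with the per-channel mean and standard deviation stored in the pretrained model's Hugging Face image processor. Only these statistics are taken from the processor; the resizing itself is done by Torchvision.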
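Numerically, the `ToTensor` and `Normalize` steps of this transform scale each 8-bit pixel to [0, 1], move channels first, and then standardize each channel. The NumPy sketch below reproduces that arithmetic on a tiny fake image. It assumes the processor carries the standard ImageNet statistics (mean 0.485, 0.456, 0.406; std 0.229, 0.224, 0.225), so compare against `processor.image_mean` and `processor.image_std` in your own environment; the bilinear resize to 224x224 is left out.

```python
import numpy as np

# Assumed ImageNet statistics; verify against processor.image_mean / processor.image_std.
MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)

def to_normalized_chw(image_hwc_uint8, mean=MEAN, std=STD):
    """Mimic ToTensor + Normalize: HWC uint8 -> CHW float32, standardized per channel."""
    x = image_hwc_uint8.astype(np.float32) / 255.0  # ToTensor scales to [0, 1]
    x = np.transpose(x, (2, 0, 1))                  # HWC -> CHW, as ToTensor does
    # Broadcast the per-channel statistics over the spatial dimensions.
    return (x - mean[:, None, None]) / std[:, None, None]

# A 2x2 RGB "image" in which every pixel is pure white (255, 255, 255).
white = np.full((2, 2, 3), 255, dtype=np.uint8)
out = to_normalized_chw(white)
print(out.shape)     # (3, 2, 2)
print(out[:, 0, 0])  # (1 - mean) / std for each channel
```

A white pixel therefore lands well above zero in every channel, while a black pixel maps to `-mean / std`; the pretrained weights expect inputs on this standardized scale.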

In [None]:
model_name = "microsoft/swin-tiny-patch4-window7-224"
processor = AutoImageProcessor.from_pretrained(model_name)

transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=processor.image_mean, std=processor.image_std),
])

train_dataset = CIFAR10(root="./data", train=True, download=True, transform=transform)
test_dataset = CIFAR10(root="./data", train=False, download=True, transform=transform)

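# Wrap the torchvision dataset so each item is a dict with the keys that Trainer
# and SwinForImageClassification expect: "pixel_values" and "label".
class CIFAR10_HFDataset(torch.utils.data.Dataset):
    def __init__(self, dataset):
        self.dataset = dataset

    def __len__(self):
        return len(self.dataset)

    def __getitem__(self, idx):
        # image: normalized float tensor of shape (3, 224, 224); label: int in [0, 9]
        image, label = self.dataset[idx]
        return {"pixel_values": image, "label": label}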

hf_train = CIFAR10_HFDataset(train_dataset)
hf_test = CIFAR10_HFDataset(test_dataset)

Using a slow image processor as `use_fast` is unset and a slow processor was saved with this model. `use_fast=True` will be the default behavior in v4.52, even if the model was saved with a slow processor. This will result in minor differences in outputs. You'll still be able to use a slow processor with `use_fast=False`.
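Since no `data_collator` is passed to `Trainer` later on, our understanding is that recent Transformers releases fall back to `default_data_collator` for an image processor: it stacks the `pixel_values` tensors into a batch and moves the integer `label` field into a `labels` tensor, which `SwinForImageClassification` uses to compute the cross-entropy loss. The NumPy function below is a hypothetical, simplified stand-in (`toy_collate` is not a library function) that only illustrates the resulting shapes.

```python
import numpy as np

def toy_collate(examples):
    """Hypothetical simplified collator: stack pixel_values, gather labels."""
    pixel_values = np.stack([ex["pixel_values"] for ex in examples])     # (B, 3, H, W)
    labels = np.array([ex["label"] for ex in examples], dtype=np.int64)  # (B,)
    # The model's forward() takes the key "labels" (plural).
    return {"pixel_values": pixel_values, "labels": labels}

# Two fake examples shaped like CIFAR10_HFDataset's output.
examples = [
    {"pixel_values": np.zeros((3, 224, 224), dtype=np.float32), "label": 3},
    {"pixel_values": np.ones((3, 224, 224), dtype=np.float32), "label": 8},
]
batch = toy_collate(examples)
print(batch["pixel_values"].shape)  # (2, 3, 224, 224)
print(batch["labels"])              # [3 8]
```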


## Loading the Pretrained Swin Transformer
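We load the Swin-Tiny model from the Hugging Face Hub with `num_labels=10` to match CIFAR-10's classes. The checkpoint's classification head was trained for the 1,000 ImageNet-1k classes, so `ignore_mismatched_sizes=True` tells `from_pretrained` to discard those head weights and initialize a fresh 10-way head instead of raising a shape error; the warning below lists exactly these tensors. We then freeze the whole network and re-enable gradients only for `model.classifier`, so the first training phase updates the new head alone.
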
In [None]:
model = SwinForImageClassification.from_pretrained(
    model_name,
    num_labels=10,
    ignore_mismatched_sizes=True
)

# Freeze everything except the classification head
for param in model.parameters():
    param.requires_grad = False

for param in model.classifier.parameters():
    param.requires_grad = True

Some weights of SwinForImageClassification were not initialized from the model checkpoint at microsoft/swin-tiny-patch4-window7-224 and are newly initialized because the shapes did not match:
- classifier.bias: found shape torch.Size([1000]) in the checkpoint and torch.Size([10]) in the model instantiated
- classifier.weight: found shape torch.Size([1000, 768]) in the checkpoint and torch.Size([10, 768]) in the model instantiated
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
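The shapes in this warning follow from Swin-Tiny's layout. Assuming the standard Swin-T configuration (embedding dimension 96, four stages, and a patch-merging step between stages that halves the spatial resolution and doubles the channels), a 224x224 input cut into 4x4 patches yields the feature sizes below, ending at the 768-dimensional vector that the new head maps to 10 logits. This is plain arithmetic under those assumptions, not a call into the model; `model.config` is the place to confirm the real values.

```python
# Assumed Swin-T hyperparameters (check model.config in practice).
IMAGE_SIZE = 224
PATCH_SIZE = 4
EMBED_DIM = 96
WINDOW_SIZE = 7
NUM_STAGES = 4
NUM_CLASSES = 10

def swin_stage_shapes(image_size=IMAGE_SIZE, patch_size=PATCH_SIZE,
                      embed_dim=EMBED_DIM, num_stages=NUM_STAGES):
    """Return (tokens_per_side, channels) for each stage."""
    side = image_size // patch_size  # 56 tokens per side after patch embedding
    dim = embed_dim
    shapes = []
    for stage in range(num_stages):
        shapes.append((side, dim))
        if stage < num_stages - 1:
            side //= 2  # patch merging halves the resolution...
            dim *= 2    # ...and doubles the channel count
    return shapes

shapes = swin_stage_shapes()
for i, (side, dim) in enumerate(shapes, start=1):
    windows = (side // WINDOW_SIZE) ** 2
    print(f"stage {i}: {side}x{side} tokens, {dim} channels, "
          f"{windows} windows of {WINDOW_SIZE}x{WINDOW_SIZE}")

final_dim = shapes[-1][1]
head_params = final_dim * NUM_CLASSES + NUM_CLASSES  # weight + bias
print(final_dim, head_params)  # 768 7690
```

So the frozen-backbone phase trains only 7,690 parameters, a tiny fraction of Swin-T's roughly 28 million.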


## Defining Evaluation Metrics
This function computes evaluation metrics (currently just accuracy) from the model's predictions. During evaluation, `Trainer` passes it the logits and the true labels as NumPy arrays, and the arg-max over the logits gives each predicted class.

In [None]:
def compute_metrics(eval_pred):
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    acc = (preds == labels).mean()
    return {"accuracy": acc}
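`Trainer` calls `compute_metrics` with an `EvalPrediction`, which unpacks into the predictions (here, logits) and the label ids. A quick check on made-up logits (three examples and three classes, for brevity) confirms the behaviour; the function is repeated so the snippet runs on its own.

```python
import numpy as np

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)  # highest-scoring class per example
    acc = (preds == labels).mean()
    return {"accuracy": acc}

logits = np.array([
    [2.0, 0.1, -1.0],  # predicts class 0 (correct)
    [0.3, 1.5, 0.2],   # predicts class 1 (correct)
    [0.9, 0.8, 0.1],   # predicts class 0 (true class is 2)
])
labels = np.array([0, 1, 2])
result = compute_metrics((logits, labels))
print(result)  # accuracy of 2/3
```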

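## Setting Up TrainingArguments for Transfer Learning
Here we define the hyperparameters for the transfer-learning phase with the Hugging Face `TrainingArguments` API: batch size, per-epoch evaluation and checkpointing, output directory, and a relatively high learning rate (5e-4), since only the freshly initialized head is being trained. With `load_best_model_at_end=True` and no `metric_for_best_model`, the checkpoint with the lowest evaluation loss is restored when training ends. Note that the CIFAR-10 test split doubles as the evaluation set, so it also drives checkpoint selection; a separate validation split would give a less optimistic final estimate.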

In [None]:
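training_args = TrainingArguments(
    output_dir="./swin-cifar10-transfer",
    per_device_train_batch_size=32,
    per_device_eval_batch_size=32,
    eval_strategy="epoch",        # evaluate at the end of every epoch
    save_strategy="epoch",        # must match eval_strategy for load_best_model_at_end
    num_train_epochs=3,
    learning_rate=5e-4,           # higher rate: only the new head is trainable
    logging_dir='./logs',
    load_best_model_at_end=True,  # restore the best checkpoint (lowest eval loss by default)
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=hf_train,
    eval_dataset=hf_test,
    processing_class=processor,   # replaces the deprecated `tokenizer` argument
    compute_metrics=compute_metrics,
)

trainer.train()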
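With 50,000 training images and a batch of 32 on a single device, each epoch is ceil(50000 / 32) = 1563 optimizer steps, so three epochs give 4689 steps, the `global_step` reported in the training log further down. Assuming the `Trainer` defaults (`lr_scheduler_type="linear"`, no warmup, no gradient accumulation), the learning rate decays linearly from its initial value to zero over those steps. The helpers below are a hypothetical re-derivation of that arithmetic, not Transformers functions.

```python
import math

def total_steps(num_examples, batch_size, num_epochs):
    """Optimizer steps on one device, no gradient accumulation, last partial batch kept."""
    steps_per_epoch = math.ceil(num_examples / batch_size)
    return steps_per_epoch * num_epochs

def linear_lr(step, base_lr, num_training_steps, num_warmup_steps=0):
    """Linear warmup followed by linear decay to zero (hypothetical re-derivation)."""
    if step < num_warmup_steps:
        return base_lr * step / max(1, num_warmup_steps)
    remaining = max(0, num_training_steps - step)
    return base_lr * remaining / max(1, num_training_steps - num_warmup_steps)

steps = total_steps(50_000, 32, 3)
print(steps)  # 4689

# Learning rate at the start, middle and end of each phase (5e-4 head-only, 1e-5 full).
for base_lr in (5e-4, 1e-5):
    print([linear_lr(s, base_lr, steps) for s in (0, steps // 2, steps)])
```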

## Fine-tuning the Entire Swin Model
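Now we unfreeze all model parameters and define a new training configuration for full fine-tuning. The previous stage trained only the classification head on top of a frozen backbone; this stage updates every layer with a much smaller learning rate (1e-5) to help avoid overwriting the pretrained features.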
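Creating a second `Trainer` after unfreezing is what lets the backbone train: as we understand `Trainer.create_optimizer`, it includes only parameters whose `requires_grad` is `True` at the moment the optimizer is built. The toy PyTorch sketch below mimics that filtering on a small hypothetical `backbone` + `classifier` model (arbitrary layer sizes, not Swin) and shows that an optimizer built during the head-only phase keeps tracking only the head.

```python
import torch
from torch import nn

# Hypothetical stand-in for Swin: a "backbone" feeding a 10-way "classifier".
model = nn.ModuleDict({
    "backbone": nn.Linear(32, 16),
    "classifier": nn.Linear(16, 10),
})

def count(params):
    return sum(p.numel() for p in params)

# Phase 1: freeze everything, then unfreeze the head (as in the notebook).
for p in model.parameters():
    p.requires_grad = False
for p in model["classifier"].parameters():
    p.requires_grad = True

# Mimic Trainer's filtering: build the optimizer from trainable parameters only.
head_opt = torch.optim.AdamW([p for p in model.parameters() if p.requires_grad], lr=5e-4)
print(count(head_opt.param_groups[0]["params"]))  # 170 (16*10 + 10)

# Phase 2: unfreeze everything. The old optimizer still holds only the head.
for p in model.parameters():
    p.requires_grad = True
print(count(head_opt.param_groups[0]["params"]))  # still 170

# A freshly built optimizer now covers the whole model.
full_opt = torch.optim.AdamW([p for p in model.parameters() if p.requires_grad], lr=1e-5)
print(count(full_opt.param_groups[0]["params"]))  # 698 (32*16 + 16 + 170)
```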

In [None]:
# Unfreeze the backbone so every layer is updated in this phase
for param in model.parameters():
    param.requires_grad = True

finetune_args = TrainingArguments(
    output_dir="./swin-cifar10-finetuned",
    per_device_train_batch_size=32,
    per_device_eval_batch_size=32,
    eval_strategy="epoch",
    save_strategy="epoch",
    num_train_epochs=3,
    learning_rate=1e-5,  # Lower learning rate for fine-tuning
    logging_dir='./logs',
    load_best_model_at_end=True,
)

# A fresh Trainer builds a fresh optimizer over the now-trainable parameters
finetune_trainer = Trainer(
    model=model,
    args=finetune_args,
    train_dataset=hf_train,
    eval_dataset=hf_test,
    processing_class=processor,  # replaces the deprecated `tokenizer` argument
    compute_metrics=compute_metrics,
)

finetune_trainer.train()


| Epoch | Training Loss | Validation Loss | Accuracy |
|---|---|---|---|
| 1 | 0.2086 | 0.117522 | 0.9621 |
| 2 | 0.0561 | 0.106177 | 0.9723 |
| 3 | 0.0143 | 0.108439 | 0.9774 |




TrainOutput(global_step=4689, training_loss=0.09301996749711967, metrics={'train_runtime': 1682.2373, 'train_samples_per_second': 89.167, 'train_steps_per_second': 2.787, 'total_flos': 3.7292317913088e+18, 'train_loss': 0.09301996749711967, 'epoch': 3.0})

## Running Evaluation on the Fine-Tuned Model
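Finally, we evaluate the fully fine-tuned Swin model on the CIFAR-10 test set using the Trainer's `evaluate()` method. Because `load_best_model_at_end=True`, the evaluated weights are those of the best fine-tuning checkpoint.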

In [None]:
finetune_trainer.evaluate()
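`evaluate()` returns a dictionary whose keys carry the `eval_` prefix (for example `eval_loss`, plus `eval_accuracy` from `compute_metrics`). For a finer-grained view, `finetune_trainer.predict(hf_test)` returns a `PredictionOutput` with the raw `predictions` (logits) and `label_ids`, which can be broken down by class. The `per_class_accuracy` helper below is a hypothetical NumPy sketch, run here on toy arrays; the names are CIFAR-10's ten categories in label order (torchvision also exposes them as `test_dataset.classes`).

```python
import numpy as np

CIFAR10_CLASSES = ["airplane", "automobile", "bird", "cat", "deer",
                   "dog", "frog", "horse", "ship", "truck"]

def per_class_accuracy(logits, labels, num_classes=10):
    """Fraction of correctly classified examples for each true class (NaN if absent)."""
    preds = np.argmax(logits, axis=-1)
    acc = np.full(num_classes, np.nan)
    for c in range(num_classes):
        mask = labels == c
        if mask.any():
            acc[c] = (preds[mask] == c).mean()
    return acc

# Toy data with 4 examples. In the notebook one would instead use:
#   out = finetune_trainer.predict(hf_test)
#   per_class_accuracy(out.predictions, out.label_ids)
logits = np.zeros((4, 10))
logits[0, 3] = 1.0  # true cat, predicted cat
logits[1, 5] = 1.0  # true cat, predicted dog
logits[2, 8] = 1.0  # true ship, predicted ship
logits[3, 8] = 1.0  # true ship, predicted ship
labels = np.array([3, 3, 8, 8])

acc = per_class_accuracy(logits, labels)
for name, a in zip(CIFAR10_CLASSES, acc):
    if not np.isnan(a):
        print(f"{name}: {a:.2f}")  # cat: 0.50, then ship: 1.00
```

On the real test set every class has 1,000 images, so no entry would be NaN, and the per-class numbers show which categories (often visually similar pairs) account for the remaining errors.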