<a href="https://colab.research.google.com/github/Kazi-Rakib-Hasan-Jawwad/Histo-FSL/blob/master/iBOT_project.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Steps to connect Colab to a local runtime:
1. In a terminal with the virtual environment activated, run:

> jupyter notebook --NotebookApp.allow_origin='https://colab.research.google.com' --port=8888 --NotebookApp.port_retries=0

2. Copy the URL printed by Jupyter (including the token) and paste it into Colab's "Connect to a local runtime" dialog.


Check availability of GPU.

In [None]:
import torch
use_cuda = torch.cuda.is_available()
if use_cuda:
    print('__CUDNN VERSION:', torch.backends.cudnn.version())
    print('__Number CUDA Devices:', torch.cuda.device_count())
    print('__CUDA Device Name:',torch.cuda.get_device_name(0))
    print('__CUDA Device Total Memory [GB]:',torch.cuda.get_device_properties(0).total_memory/1e9)

__CUDNN VERSION: 8500
__Number CUDA Devices: 1
__CUDA Device Name: NVIDIA GeForce RTX 3080 Ti
__CUDA Device Total Memory [GB]: 12.636192768


In [None]:
from pathlib import Path
working_directory = Path("/home/rakib/jupyter_notebooks/iBOT_project")
cache_dir = working_directory / "cache"

In [None]:
import sys
from pathlib import Path

from huggingface_hub import hf_hub_download

# We first download the custom scripts from Hugging Face:
#   - ``data`` handles data loading
#   - ``module`` loads the Chowder model to make predictions (described below)
#   - ``trainer`` is responsible for updating the model based on data batches
#   - ``utils`` is a set of utility functions
for script in ["data", "module", "trainer", "utils"]:
    data_file = hf_hub_download(
        repo_id="owkin/camelyon16-features",
        filename=f"scripts/{script}.py",
        repo_type="dataset",
        cache_dir=cache_dir
    )

# Using sys.path.append, we enable the import of functions inside
# the Python scripts we just downloaded (they all share the same folder)
scripts_dir = Path(data_file).parent
sys.path.append(str(scripts_dir))

In [None]:
import numpy as np

from data import SlideFeaturesDataset

# ``SlideFeaturesDataset`` first downloads the data from
# Hugging Face, then creates a torch dataset that inherits
# from the ``torch.utils.data.Dataset`` class. The ``__getitem__``
# method of this dataset returns (X, y). X is the matrix of
# features for the slide, with shape (n_tiles, 768) where n_tiles
# is at most 1000: it is the number of features (i.e. number of tiles)
# sampled from the given slide, while 768 is the dimension of
# the features (i.e. the output dimension of Phikon).
# y is the label (0 or 1 for absence or presence of metastasis).

# ``cam16_design_dataset`` contains 269 WSIs
cam16_design_dataset = SlideFeaturesDataset(
    "owkin/camelyon16-features",
    split="Phikon_train",
    cache_dir=cache_dir
)

# ``cam16_test_dataset`` contains 130 WSIs
cam16_test_dataset = SlideFeaturesDataset(
    "owkin/camelyon16-features",
    split="Phikon_test",
    cache_dir=cache_dir
)

# We store the indices and labels of the whole training dataset
# for cross-validation
cam16_design_indices = np.arange(len(cam16_design_dataset))
cam16_design_labels = cam16_design_dataset.labels

In [None]:
import torch

from module import Chowder

chowder = Chowder(
    in_features=768,                     # output dimension of Phikon
    out_features=1,                      # dimension of predictions (a probability for class "1")
    n_top=5,                             # number of top (highest) tile scores kept by Chowder
    n_bottom=5,                          # number of bottom (lowest) tile scores kept by Chowder
    mlp_hidden=[200, 100],               # MLP hidden layers after the max-min layer
    mlp_activation=torch.nn.Sigmoid(),   # MLP activation
    bias=True                            # bias for first 1D convolution which computes scores
)

def print_trainable_parameters(model: torch.nn.Module) -> None:
    """Print number of trainable parameters."""
    trainable_params = 0
    all_param = 0
    for _, param in model.named_parameters():
        all_param += param.numel()
        if param.requires_grad:
            trainable_params += param.numel()
    print(
        f"trainable params: {trainable_params} || all params: {all_param}"
        f" || trainable%: {100 * trainable_params / all_param:.2f}"
    )

# Chowder has 23,170 parameters: it's a very small model!
print_trainable_parameters(chowder)


trainable params: 23170 || all params: 23170 || trainable%: 100.00
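To make ``n_top``/``n_bottom`` and the 23,170 figure concrete, here is a minimal NumPy sketch of the max-min pooling idea behind Chowder, plus a hand count of its parameters. It assumes the layout implied by the constructor arguments above (a per-tile linear scorer, i.e. a kernel-size-1 convolution 768 → 1 with bias, followed by an MLP 10 → 200 → 100 → 1); the function names are hypothetical and this is not the code in ``module.py``.

```python
import numpy as np


def max_min_pool(tile_scores: np.ndarray, n_top: int, n_bottom: int) -> np.ndarray:
    """Keep the n_top highest and n_bottom lowest tile scores (illustrative sketch)."""
    sorted_scores = np.sort(tile_scores)[::-1]  # descending order
    top = sorted_scores[:n_top]
    bottom = sorted_scores[len(sorted_scores) - n_bottom:]  # also correct when n_bottom == 0
    return np.concatenate([top, bottom])


def linear_params(n_in: int, n_out: int, bias: bool = True) -> int:
    """Number of weights (plus biases) in a fully connected layer."""
    return n_in * n_out + (n_out if bias else 0)


def chowder_param_count(in_features=768, out_features=1, n_top=5, n_bottom=5,
                        mlp_hidden=(200, 100)) -> int:
    """Parameter count, assuming a per-tile linear scorer followed by an MLP."""
    # Per-tile scoring layer: 768 weights + 1 bias, one score per tile
    total = linear_params(in_features, out_features)
    # MLP on the n_top + n_bottom extreme scores (out_features=1 here)
    sizes = [n_top + n_bottom, *mlp_hidden, out_features]
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        total += linear_params(n_in, n_out)
    return total


rng = np.random.default_rng(0)
scores = rng.normal(size=1000)  # toy scores, one per tile of a 1,000-tile slide
pooled = max_min_pool(scores, n_top=5, n_bottom=5)
print(pooled.shape)            # (10,)
print(chowder_param_count())   # 23170
```

The count breaks down as 769 (scorer) + 2,200 + 20,100 + 101 (MLP), which matches the value printed above.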


In [None]:
from utils import auc, pad_collate_fn

# We define the loss function, optimizer and metrics for the training
criterion = torch.nn.BCEWithLogitsLoss()  # Binary Cross-Entropy Loss
optimizer = torch.optim.Adam              # Adam optimizer
metrics = {"auc": auc}                    # AUC will be the tracking metric

# ``collator`` is a function that applies a deterministic
# transformation to a batch of samples before it is processed
# by the GPU. Here, this function is ``pad_collate_fn``. Its
# goal is to align the feature matrices (the inputs) in terms
# of shape. Indeed, some WSIs may have 200 features (a very
# small piece of tissue) and others 1,000 (the maximum we set).
# All matrices in a batch are therefore padded to the shape of the
# largest matrix in that batch: if it has 1,000 tiles, our (200, 768)
# input matrix becomes a (1000, 768) matrix with 800 padding rows of
# ``inf`` values. A boolean mask is stored so that torch ignores these
# 800 padded rows and only processes the 200 real ones.

collator = pad_collate_fn
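The padding idea can be sketched in NumPy as follows. This is a hypothetical stand-in, not the ``pad_collate_fn`` from ``utils.py``: the padding value (``np.inf``, following the comment above) and the mask convention (``True`` marks padded rows) are assumptions.

```python
import numpy as np


def pad_with_mask(features_list, pad_value=np.inf):
    """Pad (n_tiles, n_features) matrices to the largest n_tiles in the batch.

    Returns the padded batch, shape (batch, max_tiles, n_features), and a
    boolean mask, shape (batch, max_tiles), where True marks padded rows.
    """
    max_tiles = max(f.shape[0] for f in features_list)
    n_features = features_list[0].shape[1]
    batch = np.full((len(features_list), max_tiles, n_features), pad_value, dtype=np.float32)
    mask = np.ones((len(features_list), max_tiles), dtype=bool)
    for i, f in enumerate(features_list):
        batch[i, : f.shape[0]] = f      # copy the real tiles
        mask[i, : f.shape[0]] = False   # real tiles are not masked
    return batch, mask


small_slide = np.zeros((200, 768), dtype=np.float32)
large_slide = np.zeros((1000, 768), dtype=np.float32)
batch, mask = pad_with_mask([small_slide, large_slide])
print(batch.shape)     # (2, 1000, 768)
print(mask[0].sum())   # 800
```

A model consuming this batch would use ``mask`` to exclude the padded rows, for example by giving them scores that can never be selected.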


In [None]:
import warnings
from copy import deepcopy
import multiprocessing
from datetime import datetime

from IPython.display import clear_output

from sklearn.model_selection import StratifiedKFold
from trainer import TorchTrainer, slide_level_train_step, slide_level_val_step

# We run a 5-fold cross-validation with 1 repeat (you can tweak these parameters)
n_repeats = 1
n_folds = 5
train_metrics, val_metrics = [], []
test_logits = []

cv_start_time = datetime.now()

for repeat in range(n_repeats):
    print(f"Running cross-validation #{repeat+1}")
    # We stratify with respect to the training labels
    cv_skfold = StratifiedKFold(
        n_splits=n_folds,
        shuffle=True,
        random_state=repeat,
    )
    cv_splits = cv_skfold.split(cam16_design_indices, y=cam16_design_labels)

    # One training fold takes approximately 25 seconds
    for i, (train_indices, val_indices) in enumerate(cv_splits):
        fold_start_time = datetime.now()
        trainer = TorchTrainer(
            model=deepcopy(chowder),
            criterion=criterion,
            metrics=metrics,
            batch_size=16,                           # you can tweak this
            num_epochs=15,                           # you can tweak this
            learning_rate=1e-3,                      # you can tweak this
            weight_decay=0.0,                        # you can tweak this
            device="cuda:0",
            num_workers=multiprocessing.cpu_count(), # you can tweak this
            optimizer=deepcopy(optimizer),
            train_step=slide_level_train_step,
            val_step=slide_level_val_step,
            collator=pad_collate_fn,
        )

        print(f"Running cross-validation on split #{i+1}")
        cam16_train_dataset = torch.utils.data.Subset(
            cam16_design_dataset, indices=train_indices
        )
        cam16_val_dataset = torch.utils.data.Subset(
            cam16_design_dataset, indices=val_indices
        )

        with warnings.catch_warnings():
            warnings.filterwarnings("ignore", category=UserWarning)
            # Training step for the given number of epochs
            local_train_metrics, local_val_metrics = trainer.train(
                cam16_train_dataset, cam16_val_dataset
            )
            # Predictions on test (logits, sigmoid(logits) = probability)
            local_test_logits = trainer.predict(cam16_test_dataset)[1]

        train_metrics.append(local_train_metrics)
        val_metrics.append(local_val_metrics)
        test_logits.append(local_test_logits)
        fold_end_time = datetime.now()
        fold_running_time = fold_end_time - fold_start_time
        print("\n-----------------------------Finished in {}---------------------------------------\n".format(fold_running_time))
    #clear_output()
cv_end_time = datetime.now()
cv_running_time = cv_end_time - cv_start_time
print("\nFinished cross-validation in {}".format(cv_running_time))

Running cross-validation #1
Running cross-validation on split #1
Epoch 1: train_loss=0.70871, train_auc=0.4995, val_loss=0.64457, val_auc=0.5028
Epoch 2: train_loss=0.69223, train_auc=0.4939, val_loss=0.64451, val_auc=0.5142
Epoch 3: train_loss=0.68629, train_auc=0.4952, val_loss=0.64741, val_auc=0.5455
Epoch 4: train_loss=0.67597, train_auc=0.5329, val_loss=0.67830, val_auc=0.5810
Epoch 5: train_loss=0.66828, train_auc=0.6040, val_loss=0.63151, val_auc=0.6264
Epoch 6: train_loss=0.62713, train_auc=0.7494, val_loss=0.59019, val_auc=0.8111
Epoch 7: train_loss=0.53927, train_auc=0.9172, val_loss=0.49161, val_auc=0.9205
Epoch 8: train_loss=0.42296, train_auc=0.9301, val_loss=0.36481, val_auc=0.9446
Epoch 9: train_loss=0.32791, train_auc=0.9474, val_loss=0.31733, val_auc=0.9375
Epoch 10: train_loss=0.27063, train_auc=0.9557, val_loss=0.26647, val_auc=0.9531
Epoch 11: train_loss=0.25160, train_auc=0.9589, val_loss=0.24450, val_auc=0.9560
Epoch 12: train_loss=0.23698, train_auc=0.9615, val_l

In [None]:
from utils import get_cv_metrics, roc_auc_score

cv_train_metrics = get_cv_metrics(train_metrics)
cv_val_metrics = get_cv_metrics(val_metrics)
test_metrics = trainer.evaluate(cam16_test_dataset)

print("Cross-validation results:")
for k, v in cv_train_metrics.items():
    print(f"mean_train_{k}: {v}")

for k, v in cv_val_metrics.items():
    print(f"mean_val_{k}: {v}")

print("\nEnsembling results on test set:")
test_auc = roc_auc_score(
    cam16_test_dataset.labels,
    np.mean(test_logits, axis=0)
)
print(f"test_auc: {test_auc:.4f}")


Cross-validation results:
mean_train_auc: 0.9796 ± 0.0063
mean_val_auc: 0.9474 ± 0.0300

Ensembling results on test set:
test_auc: 0.8855
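The ensembling step averages the per-fold test logits (one row per fold) before scoring. A minimal NumPy sketch of that step is below; ``rank_auc`` is a hypothetical helper using the rank-based (Mann-Whitney) formulation of ROC AUC as a stand-in for ``roc_auc_score``, and the toy logits are made up for illustration.

```python
import numpy as np


def rank_auc(labels, scores):
    """ROC AUC as the probability that a random positive outranks a random negative.

    Ties count as half a win. O(n_pos * n_neg), fine for a few hundred slides.
    """
    labels = np.asarray(labels)
    scores = np.asarray(scores, dtype=float)
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    wins = (pos[:, None] > neg[None, :]).sum() + 0.5 * (pos[:, None] == neg[None, :]).sum()
    return wins / (len(pos) * len(neg))


# Toy test logits from 3 folds for 4 slides (one row per fold)
fold_logits = np.array([
    [-2.0, 0.5, 1.0, -0.5],
    [-1.0, -0.2, 2.0, 0.1],
    [-1.5, 0.3, 1.5, -0.3],
])
labels = np.array([0, 1, 1, 0])

ensemble_logits = fold_logits.mean(axis=0)           # same as np.mean(test_logits, axis=0)
ensemble_probs = 1 / (1 + np.exp(-ensemble_logits))  # sigmoid: probabilities, same ranking
print(rank_auc(labels, ensemble_logits))             # 1.0
```

Because the sigmoid is monotonic, computing the AUC on averaged logits or on their probabilities gives the same result.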


In [None]:
import os
from typing import Optional
import random

from datasets import load_dataset
from transformers import set_seed as set_seed_hf
from transformers import AutoImageProcessor

dataset_name = "/home/rakib/data/NCT-CRC-HE-100K-NONORM"
# You can change the dataset path above if you wish to fine-tune the model on your own dataset.


# We set a seed globally for data loading and training
SEED = 123

def set_seed(seed: Optional[int] = None):
    """Set all seeds to make results reproducible (deterministic mode).
    When seed is None, nothing is changed.
    Credits @BramVanroy
    """
    if seed is not None:
        set_seed_hf(seed)
        torch.backends.cudnn.deterministic = True
        torch.backends.cudnn.benchmark = False
        os.environ['PYTHONHASHSEED'] = str(seed)

set_seed(SEED)
dataset = load_dataset("imagefolder", data_dir=dataset_name, cache_dir=cache_dir)



In [None]:
# Debug dataset properties
print(dataset.keys())
print(dataset.items())
print(dataset)

dict_keys(['train'])
dict_items([('train', Dataset({
    features: ['image', 'label'],
    num_rows: 100000
}))])
DatasetDict({
    train: Dataset({
        features: ['image', 'label'],
        num_rows: 100000
    })
})


In [None]:
nct_data = dataset['train']

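# Get labels and images (accessing the ``image`` column decodes all
# 100,000 images into a Python list, which can be slow and memory-intensive)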
labels = nct_data['label']
images = nct_data['image']

In [None]:
# This strategy does not take the class imbalance issue into account

# Define the number of samples you want to randomly select
num_samples = 1000  # Change this number to your desired value

# Randomly sample from the dataset according to the number of samples
random_indices = random.sample(range(len(labels)), num_samples)

# Extract sampled labels and images
sampled_labels = [labels[i] for i in random_indices]
sampled_images = [images[i] for i in random_indices]

# Alternative strategy (disabled) that addresses class imbalance:
# sample the same number of images from each class
'''
from collections import defaultdict

# Define the number of samples you want to randomly select per class
num_samples_per_class = 10  # Change this number to your desired value per class

# Collect the dataset indices of each class label
class_indices = defaultdict(list)
for i, label in enumerate(labels):
    class_indices[label].append(i)

# Randomly sample the same number of indices from each class
sampled_indices = []
for class_label, indices in class_indices.items():
    sampled_indices.extend(random.sample(indices, num_samples_per_class))

# Extract sampled labels and images
sampled_labels = [labels[i] for i in sampled_indices]
sampled_images = [images[i] for i in sampled_indices]
'''

In [None]:
from sklearn.model_selection import train_test_split

# Fraction of the sampled images used for the validation set
split_percentage = 0.5

# Split the sampled data into train and validation sets
train_labels, val_labels, train_images, val_images = train_test_split(sampled_labels, sampled_images, test_size=split_percentage)

# Alternative (disabled): stratified split, to be used with the class-balanced sampling above
'''
# Split the sampled indices into train and validation sets, keeping class proportions
train_indices, val_indices = train_test_split(sampled_indices, test_size=split_percentage, stratify=sampled_labels)

# ``labels`` and ``images`` are Python lists, so we index them explicitly
# to extract the labels and images of the train and validation sets
train_labels = [labels[i] for i in train_indices]
train_images = [images[i] for i in train_indices]

val_labels = [labels[i] for i in val_indices]
val_images = [images[i] for i in val_indices]
'''

In [None]:
# Create train and validation datasets
train_dataset = {'image': train_images, 'label': train_labels}
val_dataset = {'image': val_images, 'label': val_labels}

# Print the number of samples in each set
print(f"Number of samples in the train set: {len(train_labels)}")
print(f"Number of samples in the validation set: {len(val_labels)}")

Number of samples in the train set: 500
Number of samples in the validation set: 500
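Because the 1,000 images were drawn uniformly at random and the split above is not stratified, the per-class counts in ``train_labels`` and ``val_labels`` only approximately follow those of the full dataset. A minimal helper to inspect them is sketched below; the name ``class_counts`` is hypothetical, and the toy labels stand in for ``train_labels`` or ``val_labels``.

```python
from collections import Counter


def class_counts(label_list, id_to_name=None):
    """Return {class: count} sorted by class id, optionally mapping ids to names."""
    counts = Counter(label_list)
    result = {}
    for label_id in sorted(counts):
        name = id_to_name[label_id] if id_to_name is not None else label_id
        result[name] = counts[label_id]
    return result


# Toy example; in the notebook, pass ``train_labels`` or ``val_labels`` instead
toy_labels = [0, 0, 1, 8, 8, 8]
names = {0: "ADI", 1: "BACK", 8: "TUM"}
print(class_counts(toy_labels, names))  # {'ADI': 2, 'BACK': 1, 'TUM': 3}
```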


In [None]:
# From the 1,000 sampled NCT-CRC images, we created train and validation sets of 500 images each

test_dataset_path = "/home/rakib/data/CRC-VAL-HE-7K"

# The test dataset (CRC-VAL-HE-7K) contains 7,180 images
test_dataset = load_dataset("imagefolder", data_dir=test_dataset_path, cache_dir=cache_dir)
print(f"Training dataset size: {len(train_labels)}\n" f"Validation dataset size: {len(val_labels)}\n" f"Test dataset size: {len(test_dataset['train'])}\n")

Training dataset size: 500
Validation dataset size: 500
Test dataset size: 7180



In [None]:
test_dataset

DatasetDict({
    train: Dataset({
        features: ['image', 'label'],
        num_rows: 7180
    })
})

In [None]:
from datasets import Dataset

# Convert the train, validation and test sets to Hugging Face ``Dataset`` objects
train_dataset = Dataset.from_dict({'image': train_images, 'label': train_labels})
val_dataset = Dataset.from_dict({'image': val_images, 'label': val_labels})
test_dataset = Dataset.from_dict({'image': test_dataset['train']['image'], 'label': test_dataset['train']['label']})
print(f"Training dataset size: {len(train_dataset)}\n" f"Validation dataset size: {len(val_dataset)}\n" f"Test dataset size: {len(test_dataset)}\n")

Training dataset size: 500
Validation dataset size: 500
Test dataset size: 7180



In [None]:
image_processor = AutoImageProcessor.from_pretrained("owkin/phikon")
print(image_processor)

ViTImageProcessor {
  "do_normalize": true,
  "do_rescale": true,
  "do_resize": true,
  "image_mean": [
    0.485,
    0.456,
    0.406
  ],
  "image_processor_type": "ViTImageProcessor",
  "image_std": [
    0.229,
    0.224,
    0.225
  ],
  "resample": 2,
  "rescale_factor": 0.00392156862745098,
  "size": {
    "height": 224,
    "width": 224
  }
}



In [None]:
from typing import Dict, Any
from torchvision.transforms import (
    CenterCrop,
    Compose,
    Normalize,
    RandomHorizontalFlip,
    RandomResizedCrop,
    Resize,
    ToTensor,
)

# ImageNet normalization
normalize = Normalize(
    mean=image_processor.image_mean,
    std=image_processor.image_std
)

# train transforms = random crop, resizing to 224x224, random flip, normalization
train_transforms = Compose(
    [
        RandomResizedCrop(image_processor.size["height"]),
        RandomHorizontalFlip(),
        ToTensor(),
        normalize,
    ]
)

# val transforms = resizing to 224x224, normalization
val_transforms = Compose(
    [
        Resize(image_processor.size["height"]),
        CenterCrop(image_processor.size["height"]),
        ToTensor(),
        normalize,
    ]
)
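The image processor printed above rescales pixels by ``rescale_factor`` (1/255) and then normalizes each channel with the ImageNet mean and standard deviation; ``ToTensor`` followed by ``Normalize`` performs the same arithmetic. A worked NumPy check for a single RGB pixel (``normalize_pixel`` is a hypothetical helper):

```python
import numpy as np

# Values copied from the ViTImageProcessor configuration above
image_mean = np.array([0.485, 0.456, 0.406])
image_std = np.array([0.229, 0.224, 0.225])
rescale_factor = 1 / 255  # 0.00392156862745098


def normalize_pixel(rgb_uint8):
    """(pixel / 255 - mean) / std, channel by channel."""
    rescaled = np.asarray(rgb_uint8, dtype=np.float64) * rescale_factor
    return (rescaled - image_mean) / image_std


white = normalize_pixel([255, 255, 255])
black = normalize_pixel([0, 0, 0])
print(white.round(4))
print(black.round(4))
```

A white pixel maps to roughly (2.2489, 2.4286, 2.64) and a black pixel to roughly (-2.1179, -2.0357, -1.8044), so normalized inputs lie in about [-2.1, 2.6].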


In [None]:

# ``typing.Dict`` is used instead of the built-in ``dict[str, Any]``:
# subscripting ``dict`` in annotations requires Python 3.9+
# and raises a TypeError on Python 3.8
def preprocess_train(example_batch: Dict[str, Any]) -> Dict[str, Any]:
    """Apply ``train_transforms`` across a batch."""
    example_batch["pixel_values"] = [
        train_transforms(image) for image in example_batch["image"]
    ]
    return example_batch


def preprocess_val(example_batch: Dict[str, Any]) -> Dict[str, Any]:
    """Apply ``val_transforms`` across a batch."""
    example_batch["pixel_values"] = [
        val_transforms(image) for image in example_batch["image"]
    ]
    return example_batch

# Apply the transformations
train_dataset.set_transform(preprocess_train)
val_dataset.set_transform(preprocess_val)
test_dataset.set_transform(preprocess_val)

In [None]:
from transformers import AutoModelForImageClassification

# Labels from our dataset (this dict maps class ids, as strings, to tissue names)
label2id = {
    '0': "ADI",
    '1': "BACK",
    '2': "DEB",
    '3': "LYM",
    '4': "MUC",
    '5': "MUS",
    '6': "NORM",
    '7': "STR",
    '8': "TUM"
}
id2label = {v: k for (k, v) in label2id.items()}

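# Load the model. Hugging Face configs expect ``id2label`` to map integer
# ids to names and ``label2id`` to map names to integer ids.
model = AutoModelForImageClassification.from_pretrained(
    "owkin/phikon",
    id2label={int(k): v for k, v in label2id.items()},
    label2id={v: int(k) for k, v in label2id.items()},
    ignore_mismatched_sizes=False,
    cache_dir=cache_dir,
)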
print_trainable_parameters(model)

In [None]:
# We also create a version of Phikon where the model is kept frozen and only the classifier head is trained (0.01% of the training parameters).
from copy import deepcopy

frozen_model = deepcopy(model)

for name, param in frozen_model.named_parameters():
    if not name.startswith("classifier."):
        param.requires_grad = False
print_trainable_parameters(frozen_model)

trainable params: 6921 || all params: 85805577 || trainable%: 0.01


In [None]:
# LoRA fine-tuning only trains about 0.7% of the model's parameters!
from peft import LoraConfig, get_peft_model


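# Load and configure LoRA with the Hugging Face ``peft`` library
# (``get_peft_model`` modifies ``model`` in place, which is why ``frozen_model`` was copied beforehand)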
config = LoraConfig(
    r=16,
    lora_alpha=16,
    target_modules=["query", "value"],
    lora_dropout=0.1,
    bias="none",
    modules_to_save=["classifier"],
)
lora_model = get_peft_model(model, config)
print_trainable_parameters(lora_model)

trainable params: 596745 || all params: 86402322 || trainable%: 0.69
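These counts can be reproduced by hand. Assuming Phikon's ViT-Base layout (12 encoder layers, hidden size 768, consistent with the numbers printed above), each adapted ``query`` and ``value`` projection gains a rank-16 pair A (16 x 768) and B (768 x 16), and ``modules_to_save`` adds a trainable copy of the 9-class classifier (768 x 9 weights + 9 biases). The NumPy sketch below checks the arithmetic and illustrates the LoRA update h = Wx + (alpha / r) BAx with ``lora_alpha=16`` and ``r=16``; ``lora_linear`` is a hypothetical illustration of the technique, not the ``peft`` implementation.

```python
import numpy as np

hidden, n_layers, n_classes, r, alpha = 768, 12, 9, 16, 16

# Frozen model: only the linear classifier head is trainable
classifier_params = hidden * n_classes + n_classes
# LoRA: A (r x hidden) + B (hidden x r) for query and value in every layer
lora_params = n_layers * 2 * (r * hidden + hidden * r)
base_total = 85_805_577  # "all params" printed for the frozen model

trainable = lora_params + classifier_params
total = base_total + lora_params + classifier_params  # plus the classifier copy from modules_to_save
print(classifier_params, lora_params, trainable, total)  # 6921 589824 596745 86402322
print(f"trainable%: {100 * trainable / total:.2f}")     # trainable%: 0.69


def lora_linear(x, W, A, B, alpha, r):
    """Frozen weight W plus the scaled low-rank update B @ A."""
    return x @ W.T + (alpha / r) * (x @ A.T @ B.T)


rng = np.random.default_rng(0)
W = rng.normal(size=(hidden, hidden))
A = rng.normal(size=(r, hidden)) * 0.01
B = np.zeros((hidden, r))  # zero-initialized B: the adapted layer starts equal to the frozen one
x = rng.normal(size=(2, hidden))
print(np.allclose(lora_linear(x, W, A, B, alpha, r), x @ W.T))  # True
```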


In [None]:
# Training configuration

import numpy as np
import torch

import evaluate
from transformers import TrainingArguments, Trainer

# Training arguments (shared by the LoRA and frozen runs)

batch_size = 24
args = TrainingArguments(
    "phikon-finetuned-nct-1k",
    remove_unused_columns=False,
    evaluation_strategy="epoch",
    save_strategy="epoch",
    learning_rate=5e-3,
    gradient_accumulation_steps=1,
    per_device_train_batch_size=batch_size,
    per_device_eval_batch_size=batch_size,
    fp16=True,
    seed=SEED,
    num_train_epochs=10,
    logging_steps=1,
    load_best_model_at_end=True,
    metric_for_best_model="accuracy",  # dataset is roughly balanced
    push_to_hub=False,
    label_names=["labels"],
)

# Metric configuration

metric = evaluate.load("accuracy")

def compute_metrics(eval_pred) -> Dict[str, float]:
    """Compute accuracy from the logits and label ids of an ``EvalPrediction``."""
    predictions = np.argmax(eval_pred.predictions, axis=1)
    return metric.compute(predictions=predictions, references=eval_pred.label_ids)

# Batch creation for training

# ``typing.Dict`` is used instead of ``dict[str, torch.Tensor]`` for Python 3.8 compatibility
def collate_fn(examples) -> Dict[str, torch.Tensor]:
    """Stack a list of dataset examples into a batch of model inputs."""
    pixel_values = torch.stack([example["pixel_values"] for example in examples])
    labels = torch.tensor([example["label"] for example in examples])
    return {"pixel_values": pixel_values, "labels": labels}

# Here is the final trainer
trainer_lora = Trainer(
    model=lora_model,
    args=args,
    train_dataset=train_dataset,
    eval_dataset=val_dataset,
    tokenizer=image_processor,
    compute_metrics=compute_metrics,
    data_collator=collate_fn,
)
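``compute_metrics`` turns logits into class predictions with an arg-max and compares them with the references, while ``collate_fn`` stacks per-example tensors into a batch. The NumPy sketch below mirrors both steps on toy data so the shapes are easy to follow; ``accuracy_from_logits`` and ``collate_numpy`` are hypothetical names, whereas the notebook itself uses ``evaluate`` and ``torch.stack``.

```python
import numpy as np


def accuracy_from_logits(logits, references):
    """Fraction of rows whose arg-max matches the reference label."""
    predictions = np.argmax(logits, axis=1)
    return {"accuracy": float((predictions == np.asarray(references)).mean())}


def collate_numpy(examples):
    """Stack (3, H, W) arrays into (batch, 3, H, W) and gather the labels."""
    pixel_values = np.stack([ex["pixel_values"] for ex in examples])
    labels = np.array([ex["label"] for ex in examples])
    return {"pixel_values": pixel_values, "labels": labels}


examples = [{"pixel_values": np.zeros((3, 224, 224)), "label": i} for i in range(4)]
batch = collate_numpy(examples)
print(batch["pixel_values"].shape)  # (4, 3, 224, 224)

logits = np.array([[2.0, 0.1], [0.3, 1.5], [0.9, 0.2], [0.0, 0.1]])
print(accuracy_from_logits(logits, [0, 1, 1, 1]))  # {'accuracy': 0.75}
```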

Detected kernel version 5.4.0, which is below the recommended minimum of 5.5.0; this can cause the process to hang. It is recommended to upgrade the kernel to the minimum version or higher.


In [None]:
# Train

import warnings

from transformers.utils import logging


# We display the accuracy on the test set at the end
with warnings.catch_warnings():
    warnings.filterwarnings("ignore", category=UserWarning)
    train_results_lora = trainer_lora.train()
    metrics_lora = trainer_lora.evaluate(test_dataset)
    trainer_lora.log_metrics("Fine-tuned model: VAL-CRC-7K", metrics_lora)

Epoch,Training Loss,Validation Loss,Accuracy
1,0.0845,0.344558,0.94
2,0.038,0.255675,0.956
3,0.0042,0.196676,0.97
4,0.1994,0.218456,0.97
5,0.2426,0.205281,0.974
6,0.7117,0.197085,0.976
7,0.0036,0.22619,0.974
8,0.0538,0.243062,0.968
9,0.0001,0.215306,0.966
10,0.0007,0.229913,0.968


Checkpoint destination directory phikon-finetuned-nct-1k/checkpoint-21 already exists and is non-empty.Saving will proceed but saved results may be invalid.
Checkpoint destination directory phikon-finetuned-nct-1k/checkpoint-42 already exists and is non-empty.Saving will proceed but saved results may be invalid.
Checkpoint destination directory phikon-finetuned-nct-1k/checkpoint-63 already exists and is non-empty.Saving will proceed but saved results may be invalid.
Checkpoint destination directory phikon-finetuned-nct-1k/checkpoint-84 already exists and is non-empty.Saving will proceed but saved results may be invalid.
Checkpoint destination directory phikon-finetuned-nct-1k/checkpoint-105 already exists and is non-empty.Saving will proceed but saved results may be invalid.
Checkpoint destination directory phikon-finetuned-nct-1k/checkpoint-126 already exists and is non-empty.Saving will proceed but saved results may be invalid.
Checkpoint destination directory phikon-finetuned-nct-1k

***** Fine-tuned model: VAL-CRC-7K metrics *****
  epoch                   =       10.0
  eval_accuracy           =     0.8703
  eval_loss               =     0.7006
  eval_runtime            = 0:00:23.42
  eval_samples_per_second =    306.465
  eval_steps_per_second   =     12.805
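The checkpoint names in the log (``checkpoint-21``, ``checkpoint-42``, ...) follow from the training setup. Assuming a single GPU and the ``Trainer`` default of keeping the last incomplete batch, 500 training images with ``per_device_train_batch_size=24`` and ``gradient_accumulation_steps=1`` give ceil(500 / 24) = 21 optimizer steps per epoch, and ``save_strategy="epoch"`` writes a checkpoint at the end of each epoch:

```python
import math

n_train, batch_size, grad_accum, n_epochs = 500, 24, 1, 10
steps_per_epoch = math.ceil(n_train / (batch_size * grad_accum))
checkpoint_steps = [steps_per_epoch * (epoch + 1) for epoch in range(n_epochs)]
print(steps_per_epoch)   # 21
print(checkpoint_steps)  # [21, 42, 63, 84, 105, 126, 147, 168, 189, 210]
```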



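We now run the same training with the fully frozen Phikon model, where only the classifier head is trained.

On the CRC-VAL-HE-7K test set, LoRA fine-tuning reaches an accuracy of 0.8703 versus 0.8435 for the frozen model (see the results below), an increase of about 2.7 points in multi-class accuracy for only about 30 seconds of extra training time.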


In [None]:

trainer_frozen = Trainer(
    frozen_model,
    args,
    train_dataset=train_dataset,
    eval_dataset=val_dataset,
    tokenizer=image_processor,
    compute_metrics=compute_metrics,
    data_collator=collate_fn,
)

with warnings.catch_warnings():
    warnings.filterwarnings("ignore", category=UserWarning)
    train_results_frozen = trainer_frozen.train()
    metrics_frozen = trainer_frozen.evaluate(test_dataset)
    trainer_frozen.log_metrics("Frozen model: VAL-CRC-7K", metrics_frozen)

Detected kernel version 5.4.0, which is below the recommended minimum of 5.5.0; this can cause the process to hang. It is recommended to upgrade the kernel to the minimum version or higher.


Epoch,Training Loss,Validation Loss,Accuracy
1,0.0216,0.205745,0.962
2,0.0317,0.208488,0.972
3,0.0001,0.245477,0.962
4,0.153,0.301633,0.958
5,0.0078,0.279823,0.966
6,0.5184,0.228675,0.968
7,0.2187,0.217447,0.97
8,0.0051,0.242014,0.964
9,0.0,0.230358,0.964
10,0.0002,0.226605,0.966


Checkpoint destination directory phikon-finetuned-nct-1k/checkpoint-21 already exists and is non-empty.Saving will proceed but saved results may be invalid.
Checkpoint destination directory phikon-finetuned-nct-1k/checkpoint-42 already exists and is non-empty.Saving will proceed but saved results may be invalid.
Checkpoint destination directory phikon-finetuned-nct-1k/checkpoint-63 already exists and is non-empty.Saving will proceed but saved results may be invalid.
Checkpoint destination directory phikon-finetuned-nct-1k/checkpoint-84 already exists and is non-empty.Saving will proceed but saved results may be invalid.
Checkpoint destination directory phikon-finetuned-nct-1k/checkpoint-105 already exists and is non-empty.Saving will proceed but saved results may be invalid.
Checkpoint destination directory phikon-finetuned-nct-1k/checkpoint-126 already exists and is non-empty.Saving will proceed but saved results may be invalid.
Checkpoint destination directory phikon-finetuned-nct-1k

***** Frozen model: VAL-CRC-7K metrics *****
  epoch                   =       10.0
  eval_accuracy           =     0.8435
  eval_loss               =     0.5429
  eval_runtime            = 0:00:22.38
  eval_samples_per_second =    320.777
  eval_steps_per_second   =     13.403
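With both test accuracies available, the gap between the two models can be made concrete. The sketch below uses the two ``eval_accuracy`` values reported above; since those are rounded to four decimals, the count of extra correctly classified images is approximate.

```python
n_test = 7180
acc_lora, acc_frozen = 0.8703, 0.8435  # eval_accuracy reported for each model above
gap_points = 100 * (acc_lora - acc_frozen)
extra_correct = round(n_test * (acc_lora - acc_frozen))
print(f"{gap_points:.2f} percentage points")  # 2.68 percentage points
print(extra_correct)                           # 192
```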


# **Visualizing features**

We can then visualize the embeddings of both the frozen and the LoRA fine-tuned models to examine how they differ.

In [None]:
from tqdm.notebook import tqdm
import pandas as pd

from matplotlib.axes import Axes
import matplotlib.pyplot as plt
import seaborn as sns

from sklearn.manifold import TSNE

# First we define a set of functions to
# 1) get the embeddings from the models
# 2) compute the 2D projections using the t-SNE algorithm
# 3) visualize these projections using ``seaborn``

def get_raw_embeddings(model, dataset, use_fp16: bool = True):
    """Retrieve tile embeddings (CLS token of the last hidden layer) from a model equipped with a classifier head."""
    device = "cuda:0" if torch.cuda.is_available() else "cpu"
    model.eval()
    embeddings = []
    with torch.no_grad():  # no gradients are needed for feature extraction
        for pixel_values in tqdm(dataset["pixel_values"]):
            image = pixel_values.unsqueeze(0).to(
                device, torch.float16 if use_fp16 else torch.float32
            )
            output = model(image, output_hidden_states=True)
            _embeddings = output.hidden_states[-1][:, 0, :].detach().cpu().numpy()
            embeddings.append(_embeddings)
    return np.concatenate(embeddings, axis=0)


def get_tsne_embeddings(raw_embeddings: np.ndarray, **kwargs):
    """Compute 2-dimensional tsne projections from raw embeddings."""
    tsne = TSNE(**kwargs)
    tsne_embeddings = tsne.fit_transform(raw_embeddings)
    tsne_embeddings = pd.DataFrame(tsne_embeddings, columns=["tsne-1", "tsne-2"])
    tsne_embeddings["Tissue type"] = test_subset_labels
    tsne_embeddings["Tissue type"] = tsne_embeddings["Tissue type"].astype(str).replace(label2id)
    return tsne_embeddings

def plot_tsne_embeddings(tsne_embeddings: pd.DataFrame, title: str, ax: Axes):
    """Plot tsne embeddings in the 2D space."""
    sns.scatterplot(
        x="tsne-1", y="tsne-2",
        hue="Tissue type",
        palette=sns.color_palette("hls", 9),
        data=tsne_embeddings,
        legend="full",
        alpha=0.3,
        ax=ax
    )
    ax.set_title(title)
    return ax
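``get_raw_embeddings`` runs the model one image at a time. A common alternative is to process images in mini-batches; the sketch below shows only the batching and concatenation logic, with ``cls_token_stub`` as a hypothetical placeholder for a call such as ``model(batch, output_hidden_states=True)`` followed by taking the CLS token. It uses NumPy so it runs without a GPU; ``embed_in_batches`` is also a hypothetical name.

```python
import numpy as np


def embed_in_batches(pixel_values, forward_fn, batch_size=64):
    """Apply forward_fn to stacked mini-batches and concatenate the results.

    pixel_values: sequence of arrays with identical shape, e.g. (3, 224, 224).
    forward_fn: maps a (batch, ...) array to a (batch, embed_dim) array.
    """
    outputs = []
    for start in range(0, len(pixel_values), batch_size):
        batch = np.stack(pixel_values[start:start + batch_size])
        outputs.append(forward_fn(batch))
    return np.concatenate(outputs, axis=0)


def cls_token_stub(batch):
    """Stand-in for a ViT: fake hidden states (batch, 197, 768), keep token 0 (CLS)."""
    n = batch.shape[0]
    hidden_states = np.broadcast_to(batch.mean(axis=(1, 2, 3))[:, None, None], (n, 197, 768))
    return hidden_states[:, 0, :]


images = [np.full((3, 224, 224), float(i)) for i in range(10)]
embeddings = embed_in_batches(images, cls_token_stub, batch_size=4)
print(embeddings.shape)  # (10, 768)
```

The batch size only changes how many images go through the model at once, not the resulting embeddings, so it can be tuned to the available GPU memory.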

We consider the first 7,000 images (``subset_size``) of the 7,180-image test set.

In [None]:
subset_size = 7000
# ``test_dataset`` has ``preprocess_val`` attached through ``set_transform``, so slicing it
# returns a dict of columns that includes the preprocessed ``pixel_values`` tensors.
# Note that this takes the first ``subset_size`` images, not a random sample.
test_subset = test_dataset[:subset_size]
test_subset_labels = np.array(test_subset["label"])  # used by ``get_tsne_embeddings``

print(f"Computing LoRA and frozen models embeddings on {subset_size} test images...")
test_dataset_embeddings_lora = get_raw_embeddings(
    model=lora_model, dataset=test_subset
)
test_dataset_embeddings_frozen = get_raw_embeddings(
    model=frozen_model, dataset=test_subset
)

print("Computing tsne projections...")
tsne_embeddings_lora = get_tsne_embeddings(
    test_dataset_embeddings_lora, n_components=2
)
tsne_embeddings_frozen = get_tsne_embeddings(
    test_dataset_embeddings_frozen, n_components=2
)

The differences between the LoRA fine-tuned and frozen models are small because the NCT-CRC prediction task is highly separable (different tissues can easily be distinguished by the naked eye). However, we notice that LoRA fine-tuning better disentangles clusters such as Lymphocytes (yellow) and Tumor (red), which can play a significant role in cancer diagnosis.

In [None]:
print("Plotting in 2 dimensions.")
fig, axes = plt.subplots(1, 2, figsize=(15, 6))
axes[0] = plot_tsne_embeddings(
    tsne_embeddings_lora, title="LoRA embeddings", ax=axes[0]
)
axes[1] = plot_tsne_embeddings(
    tsne_embeddings_frozen, title="Frozen embeddings", ax=axes[1]
)
plt.show()