In [1]:
from torchvision import datasets, transforms

train = datasets.MNIST(root="data", train=True, download=True, 
                       transform=transforms.ToTensor())
test = datasets.MNIST(root="data", train=False, download=True, 
                      transform=transforms.ToTensor())

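We build the classifier from a ResNet18 backbone followed by a head with no hidden layers, which maps the 512 pooled features to the 10 digit classes.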

In [2]:
import deeplay as dl

backbone = dl.models.BackboneResnet18(in_channels=1, pool_output=True)
head = dl.MultiLayerPerceptron(512, [], 10)

classifier_net = dl.Sequential(backbone, head)

First, we train a model on the full dataset as a baseline. We should see a test accuracy between 99.0% and 99.5%.

In [3]:
import torch
import torchmetrics as tm

train_dataloader = torch.utils.data.DataLoader(train, batch_size=128, shuffle=True)
test_dataloader = torch.utils.data.DataLoader(test, batch_size=1024, shuffle=False)

accuracy = tm.Accuracy(task="multiclass", num_classes=10)

classifier = dl.CategoricalClassifier(classifier_net.new(),
                                      optimizer=dl.Adam(lr=1e-3),
                                      num_classes=10,
                                      metrics=[accuracy]).build()

# Quick-run setting; use max_epochs=30 for the full experiment.
trainer = dl.Trainer(max_epochs=1)
trainer.fit(classifier, train_dataloader)
full_results = trainer.test(classifier, test_dataloader)
print(full_results[0])

Next, we test three active learning strategies. Each one uses the same ResNet18 model and trains it on a small subset of the data:
- Random sampling (uniform)
- Uncertainty sampling (smallest margin)
- Adversarial sampling

In the full experiment, each run is repeated five times to reduce the effect of run-to-run variation. We compare the test-set performance of the models and the number of samples required to reach a given accuracy.

First, we define the configuration of the experiments.

In [4]:
import torch
import numpy as np

# Quick-run settings; the values for the full experiment are given in the comments.
budget_per_iteration = 2  # full run: 120
max_budget = 10           # full run: 1800
trials = 1                # full run: 5

# Number of rounds per trial (full run: 1800 // 120 - 1 = 14)
rounds = max_budget // budget_per_iteration - 1

uniform_experiment_accuracy = np.empty((trials, rounds))
uncertainty_experiment_accuracy = np.empty((trials, rounds))
# Only one trial for the adversarial strategy, because it is slow and stable.
adversarial_experiment_accuracy = np.empty((1, rounds))

Next, we define a reusable active learning loop. This loop will be used to test the three strategies.
In the loop, we will:
1. Train the model on the current training set (trainer.fit)
2. Evaluate the model on the test set (trainer.test)
3. Use the active learning strategy to select the next samples (strategy.query_and_update)
4. Reset the model to the starting state, such that each round of active learning starts training from scratch.

An alternative formulation would omit the fourth step and continue training from the previous model state, which is useful when training the model is expensive.
In this example, however, training is relatively fast, so we reset the model to the starting state.
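
The four steps, including the choice between resetting and continuing from the previous state, can be sketched independently of any library. The function `run_active_learning` and the callables passed to it are hypothetical names used only for illustration. The toy "model" below is a dictionary whose accuracy grows with the number of labeled samples, so this is a minimal sketch of the control flow, not of real training.

```python
def run_active_learning(train_fn, evaluate_fn, query_fn, reset_fn, rounds, reset=True):
    """Run `rounds` of train -> evaluate -> query, optionally resetting each round."""
    accuracies = []
    for _ in range(rounds):
        train_fn()                         # 1. train on the current labeled set
        accuracies.append(evaluate_fn())   # 2. evaluate on the test set
        query_fn()                         # 3. select and annotate new samples
        if reset:
            reset_fn()                     # 4. start the next round from scratch
    return accuracies


# Toy stand-ins (assumptions for illustration only).
state = {"labeled": 2, "trained_on": 0}


def train_fn():
    state["trained_on"] = state["labeled"]


def evaluate_fn():
    # Made-up accuracy curve that increases with the number of labeled samples.
    return state["trained_on"] / (state["trained_on"] + 10)


def query_fn():
    state["labeled"] += 2


def reset_fn():
    state["trained_on"] = 0


history = run_active_learning(train_fn, evaluate_fn, query_fn, reset_fn, rounds=4)
print(history)
```

Passing `reset=False` gives the alternative formulation, in which each round continues from the previous model state.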

In [5]:
def active_learning_loop(strategy, epochs):
    trainer = dl.Trainer(max_epochs=epochs,
                         enable_checkpointing=False,
                         enable_model_summary=False)
    trainer.fit(strategy)

    # Evaluate on the test set before querying new samples.
    test_results = trainer.test(strategy, test_dataloader)
    test_accuracy = test_results[0]["testMulticlassAccuracy"]

    # Select and annotate the next batch of samples.
    strategy.query_and_update(budget_per_iteration)

    # Reset the model to the initial state.
    strategy.reset_model()
    return test_accuracy

The first strategy is uniform random sampling. To run an active learning strategy, we first wrap the training data in an `ActiveLearningDataset` object, which keeps track of the samples that have been annotated and the samples that are still unannotated. The `ActiveLearningDataset` object also provides a method to query the next samples to annotate. At the start, all data is assumed to be unannotated.

Then, we initialize the training dataset by randomly annotating a small subset of the data. This is required for all three active learning strategies we will test.
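
To make the bookkeeping concrete, here is a minimal sketch of a pool that tracks annotated and unannotated indices. The class `ToyPool` and its methods are hypothetical stand-ins written for this illustration; they are not deeplay's implementation of `ActiveLearningDataset`.

```python
import random


class ToyPool:
    """Hypothetical stand-in for an active learning pool (illustration only)."""

    def __init__(self, num_samples, seed=0):
        # At the start, every sample is unannotated.
        self.annotated = set()
        self.unannotated = set(range(num_samples))
        self._rng = random.Random(seed)

    def annotate(self, indices):
        """Move the given indices from the unannotated set to the annotated set."""
        indices = set(indices) & self.unannotated
        self.unannotated -= indices
        self.annotated |= indices
        return sorted(indices)

    def annotate_random(self, n):
        """Annotate n indices drawn uniformly without replacement."""
        n = min(n, len(self.unannotated))
        chosen = self._rng.sample(sorted(self.unannotated), n)
        return self.annotate(chosen)


pool = ToyPool(num_samples=60_000)
first_batch = pool.annotate_random(2)
print(first_batch, len(pool.annotated), len(pool.unannotated))
```

Every query strategy then reduces to a different way of choosing which indices to pass to `annotate`.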

Next, we create the strategy object, which contains the query strategy. It takes a model as input, together with the training data pool, the test set, a batch size, and a list of metrics.

Finally, we run the active learning loop for `rounds` iterations, each round training for 40 epochs.

In [6]:
import deeplay.activelearning as al

Let's see whether margin uncertainty sampling can help us improve the model's performance.

Margin sampling does indeed work better than random sampling. However, neural networks are generally not very good at estimating their own uncertainty. More advanced methods try to mitigate this by estimating uncertainty in other ways, for example with ensemble methods, with Monte Carlo dropout, or by estimating the loss of the model.
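
For reference, the smallest-margin criterion itself is simple: for each unannotated sample, take the difference between the two highest predicted class probabilities and query the samples where that difference is smallest. The sketch below is a plain-Python illustration under that definition; the function name `smallest_margin_query` and the example probabilities are made up, and this is not the code behind `al.Margin()`.

```python
def smallest_margin_query(probabilities, n):
    """Return the indices of the n samples with the smallest top-two margin."""
    margins = []
    for index, probs in enumerate(probabilities):
        top_two = sorted(probs, reverse=True)[:2]
        margins.append((top_two[0] - top_two[1], index))
    margins.sort()  # smallest margin (most ambiguous prediction) first
    return [index for _, index in margins[:n]]


# Made-up class probabilities for four unannotated samples.
predicted = [
    [0.98, 0.01, 0.01],  # confident prediction, large margin
    [0.40, 0.35, 0.25],  # ambiguous between classes 0 and 1
    [0.55, 0.05, 0.40],  # moderately ambiguous
    [0.34, 0.33, 0.33],  # nearly uniform, smallest margin
]

print(smallest_margin_query(predicted, 2))  # prints [3, 1]
```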

Another issue with uncertainty sampling is that it can be biased towards outliers or datapoints with incomplete information. These are generally the most uncertain datapoints, but they are not necessarily the most informative. Other measures of informativeness try to mitigate this issue, for example the expected model change, or the diversity or representativeness of the data. In fact, a combination of these measures can perform better than any single measure.

We'll explore a combination of uncertainty sampling and diversity sampling, using an adversarial approach. The idea is to adversarially train a discriminator to distinguish between the embeddings of images that have been annotated and those that have not. This has two advantages. First, the discriminator can indicate diversity: if it predicts that an unlabeled image is labeled, the image is similar to already labeled images and might not be very informative. Second, by adversarially training the backbone to fool the discriminator, we impose structure on the embeddings using all the data in the dataset. This additional structure can help the model generalize better on small training sets.
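
As a rough illustration of how a discriminator score and an uncertainty score could be combined into a single ranking, consider the sketch below. The scoring rule (the product of "unlike the labeled set" and "small margin") is an assumption chosen for simplicity, and `adversarial_query` is a hypothetical name. The way `al.AdversarialStrategy` combines the two signals may differ.

```python
def adversarial_query(p_labeled, margins, n):
    """Rank unlabeled samples by (1 - p_labeled) * (1 - margin), highest first.

    p_labeled: discriminator probability that each sample is already labeled.
    margins:   top-two class-probability margin of the classifier per sample.
    """
    scores = [(1.0 - p) * (1.0 - m) for p, m in zip(p_labeled, margins)]
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:n]


# Made-up scores for four unlabeled samples.
p_labeled = [0.9, 0.2, 0.1, 0.8]      # sample 0 looks like the labeled data
margins = [0.05, 0.10, 0.90, 0.60]    # sample 2 is classified confidently

print(adversarial_query(p_labeled, margins, 2))  # prints [1, 0]
```

Sample 1 ranks first because it is both unlike the labeled set and uncertain. Neither property alone is enough under this rule: sample 2 is novel but confidently classified, and sample 0 is uncertain but similar to data that is already labeled.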

In [7]:
discriminator = dl.MultiLayerPerceptron(512, [512, 512], 1,
                                        out_activation=torch.nn.Sigmoid())
discriminator.initialize(dl.initializers.Kaiming())


adversarial_train_pool = al.ActiveLearningDataset(train)
adversarial_train_pool.annotate_random(budget_per_iteration)

array([38067, 59485])

In [8]:
adversarial_strategy = al.AdversarialStrategy(
    backbone=backbone.new(),
    classification_head=head.new(),
    discriminator_head=discriminator.new(),
    train_pool=adversarial_train_pool,
    criterion=al.Margin(),
    batch_size=128,
    test_metrics=[accuracy]
).build()

In [9]:
for i in range(rounds):
    # Quick-run setting; use 5 epochs per round for the full experiment.
    adversarial_experiment_accuracy[0, i] = active_learning_loop(adversarial_strategy, 1)



We plot the accuracy as a function of the number of annotated images. We find that the adversarial approach performs better than random sampling and margin sampling, and that it is even competitive with the model trained on the full dataset. Moreover, the adversarial approach is more stable than the other methods.

In [None]:
import matplotlib.pyplot as plt

x = np.arange(budget_per_iteration, max_budget, budget_per_iteration)

plt.plot(x, np.median(uniform_experiment_accuracy, 0), label="Uniform", linestyle="--")
plt.plot(x, np.median(uncertainty_experiment_accuracy, 0), label="Uncertainty", linestyle="-.")
plt.plot(x, adversarial_experiment_accuracy[0], label="Adversarial", linestyle="-")
plt.axhline(full_results[0]["testMulticlassAccuracy_epoch"], label="Full Test Accuracy", color="black", linestyle=":")

plt.xlabel("Number of Annotated Samples")
plt.ylabel("Test Accuracy")
plt.ylim(0.9, 1)
plt.yticks([0.9, 0.95, full_results[0]["testMulticlassAccuracy_epoch"]])
plt.legend()

In [None]:
accuracy_levels = np.linspace(0.90, 1.0, 25)
num_samples = np.arange(budget_per_iteration, max_budget, budget_per_iteration)

def samples_to_reach(accuracies, level):
    """Average number of annotated samples needed to exceed `level`.

    Trials that never exceed `level` count as `max_budget`. Checking `any`
    explicitly is needed because `argmax` also returns 0 when no round qualifies.
    """
    reached = accuracies > level
    first_round = np.argmax(reached, axis=1)
    samples = np.where(reached.any(axis=1), num_samples[first_round], max_budget)
    return samples.mean()

average_samples_uniform = [samples_to_reach(uniform_experiment_accuracy, level) for level in accuracy_levels]
average_samples_uncertainty = [samples_to_reach(uncertainty_experiment_accuracy, level) for level in accuracy_levels]
average_samples_adversarial = [samples_to_reach(adversarial_experiment_accuracy, level) for level in accuracy_levels]

plt.figure()
plt.plot(accuracy_levels, average_samples_uniform, label="Uniform", linestyle="--")
plt.plot(accuracy_levels, average_samples_uncertainty, label="Uncertainty", linestyle="-.")
plt.plot(accuracy_levels, average_samples_adversarial, label="Adversarial", linestyle="-")
plt.xlabel("Test Accuracy")
plt.ylabel("Average Number of Annotated Samples")
plt.legend()
plt.show()