<h1>ALBench project: BatchBALD</h1>

<p>This JupyterLab notebook demonstrates the al_bench Active Learning Benchmark Tool with a Bayesian neural network and the BALD and BatchBALD selection strategies.</p>

<h2>Install needed Python packages</h2>

<p>If you haven't yet installed these packages, remove the "<code>#</code>" characters and run this code block.</p>

In [1]:
#!pip install -e ../../ALBench  # Installs al_bench and dependencies
#!pip install ipywidgets
#!jupyter labextension install @jupyter-widgets/jupyterlab-manager

import al_bench as alb
import batchbald_redux as bbald
import batchbald_redux.active_learning
import batchbald_redux.consistent_mc_dropout
import batchbald_redux.repeated_mnist
import numpy as np
import os
import random
import shutil
import torch


<h2>Overview</h2>

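<p>The notebook loads a subset of MNIST, registers it with an al_bench dataset handler, builds a Bayesian convolutional network in PyTorch using the Monte Carlo dropout layers from batchbald_redux, and then simulates the BALD and BatchBALD active learning strategies, writing logs that TensorBoard can display.</p>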

<h2>Find a dataset and create a Dataset Handler</h2>

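<p>MNIST is loaded through batchbald_redux. al_bench expects NumPy arrays, so the cells below take the first 600 training examples and the first 100 test examples, stack them into one array of feature vectors with a matching array of labels, and register both with a <code>GenericDatasetHandler</code>. The 100 test examples become the validation set, and 20 randomly chosen training examples form the initial labeled set.</p>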

In [2]:
mnist = bbald.repeated_mnist.create_repeated_MNIST_dataset(
    num_repetitions=1, add_noise=False
)
train_dataset, test_dataset = mnist

print("Dataset is read")

Dataset is read


In [3]:
# al_bench datasets are supplied as numpy arrays.
# Build the numpy arrays from a subset of the data
train_dataset_list = [d for d in train_dataset][:600]  # 600 (data, label) pairs
test_dataset_list = [d for d in test_dataset][:100]
number_of_training_indices = len(train_dataset_list)
number_of_validation_indices = len(test_dataset_list)
dataset_list = train_dataset_list + test_dataset_list
my_feature_vectors = np.concatenate([d[0].numpy() for d in dataset_list])  # data only
my_labels = np.array([[d[1]] for d in dataset_list])  # labels only
# This dataset is the digits "0" through "9" which will be enumerated with the
# values 0 through 9.
num_classes = 10
my_label_definitions = [{i: {"description": repr(i)} for i in range(num_classes)}]
print("Dataset is converted to numpy")

# Tell al_bench about the dataset
my_dataset_handler = alb.dataset.GenericDatasetHandler()
my_dataset_handler.set_all_feature_vectors(my_feature_vectors)
my_dataset_handler.set_all_label_definitions(my_label_definitions)
my_dataset_handler.set_all_labels(my_labels)

# Set aside disjoint sets of examples for use in validation and as the initial training
# set
my_dataset_handler.set_validation_indices(
    np.arange(
        number_of_training_indices,
        number_of_training_indices + number_of_validation_indices,
    )
)
number_of_initial_training = 20
random_samples = random.sample(
    range(number_of_training_indices), number_of_initial_training
)
currently_labeled_examples = np.array(random_samples)
print("Datahandler is initialized")

Dataset is converted to numpy
Datahandler is initialized
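
<p>The index layout used above can be checked in isolation. The following is a minimal sketch, not part of al_bench: it rebuilds the same two index sets with the sizes from this notebook (600 training, 100 validation, 20 initially labeled) and confirms that they do not overlap. The variable names are illustrative only.</p>

```python
import random

import numpy as np

# Sizes taken from the notebook: 600 training rows come first in the
# combined array, followed by 100 validation rows.
num_train, num_val, num_initial = 600, 100, 20

# Validation examples occupy the tail of the combined array.
validation_indices = np.arange(num_train, num_train + num_val)

# The initial labeled set is sampled without replacement from the training rows.
initial_indices = np.array(random.sample(range(num_train), num_initial))

# The two sets must be disjoint, otherwise validation data leaks into training.
overlap = np.intersect1d(validation_indices, initial_indices)
print(f"{overlap.size = }")
```

<p>Because the initial set is drawn only from <code>range(num_train)</code>, the overlap is empty by construction; the check is mainly useful if the sizes or the ordering of the combined array are changed later.</p>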


<h2>Create a model and a Model Handler</h2>

<p>The model must accept inputs with the shape of a single feature vector, so start by checking that shape.</p>

In [4]:
feature_shape = my_feature_vectors.shape[1:]
print(f"{feature_shape = }")

feature_shape = (28, 28)


<h3>Build a TensorFlow model and its Model Handler</h3>

In [5]:
# batchbald_redux does not support tensorflow at this time

<h3>Build a Torch model and its Model Handler</h3>

In [6]:
class BayesianCNN(bbald.consistent_mc_dropout.BayesianModule):
    def __init__(self, num_classes=10):
        super().__init__()

        self.conv1 = torch.nn.Conv2d(1, 32, kernel_size=5)
        self.conv1_drop = bbald.consistent_mc_dropout.ConsistentMCDropout2d()
        self.conv2 = torch.nn.Conv2d(32, 64, kernel_size=5)
        self.conv2_drop = bbald.consistent_mc_dropout.ConsistentMCDropout2d()
        self.fc1 = torch.nn.Linear(1024, 128)
        self.fc1_drop = bbald.consistent_mc_dropout.ConsistentMCDropout()
        self.fc2 = torch.nn.Linear(128, num_classes)

    def mc_forward_impl(self, input: torch.Tensor):
        input = torch.nn.functional.relu(
            torch.nn.functional.max_pool2d(self.conv1_drop(self.conv1(input)), 2)
        )
        input = torch.nn.functional.relu(
            torch.nn.functional.max_pool2d(self.conv2_drop(self.conv2(input)), 2)
        )
        input = input.view(-1, 1024)
        input = torch.nn.functional.relu(self.fc1_drop(self.fc1(input)))
        input = self.fc2(input)
        input = torch.nn.functional.log_softmax(input, dim=1)

        return input


my_pytorch_model = BayesianCNN(num_classes)
print("Created torch model")

Created torch model
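
<p>The value 1024 used for <code>fc1</code> and for <code>input.view(-1, 1024)</code> follows from the 28×28 input shape. As a small sketch, the helper functions below (hypothetical names, not part of PyTorch or batchbald_redux) apply the standard output-size arithmetic for an unpadded, stride-1 convolution and for a max pool whose stride equals its kernel size, which is the default for <code>max_pool2d</code>.</p>

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output side length of a convolution along one spatial dimension."""
    return (size + 2 * padding - kernel) // stride + 1


def pool_out(size, kernel):
    """Output side length of a max pool with stride == kernel and no padding."""
    return size // kernel


side = 28
side = pool_out(conv_out(side, kernel=5), kernel=2)  # 28 -> 24 -> 12
side = pool_out(conv_out(side, kernel=5), kernel=2)  # 12 -> 8 -> 4

channels = 64  # output channels of conv2
flattened = channels * side * side
print(flattened)  # 1024
```

<p>If the input size or the kernel sizes change, the same arithmetic gives the new input width for <code>fc1</code>.</p>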


In [7]:
# Tell al_bench about the model
my_pytorch_model_handler = alb.model.SamplingBayesianPyTorchModelHandler()
my_pytorch_model_handler.set_model(my_pytorch_model)
print("PyTorch model handler built")

PyTorch model handler built


<h3>Choose one of the models to proceed with</h3>

<p>Only the PyTorch model handler was built in this notebook, so it is the one used from here on.</p>

In [8]:
# my_model_handler = my_tensorflow_model_handler
my_model_handler = my_pytorch_model_handler

<h2>Make use of Strategy Handlers for active learning</h2>

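<p>Each strategy handler is given the dataset handler and the model handler, then simulates active learning: starting from the initial labeled set, it runs up to 8 queries and selects 10 examples per query. Training and confidence logs are written under <code>runs-SamplingBayesian</code>, one subdirectory per strategy. In the run recorded below, the BALD handler stops with a <code>NotImplementedError</code>.</p>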


In [9]:
all_logs_dir = "runs-SamplingBayesian"
# DELETE OLD LOG FILES; a missing directory is not an error
shutil.rmtree(all_logs_dir, ignore_errors=True)

for name, my_strategy_handler in (
    ("BALD", alb.strategy.BaldStrategyHandler()),
    ("BatchBALD", alb.strategy.BatchBaldStrategyHandler()),
):
    print(f"=== Begin Strategy: {name} ===")
    my_strategy_handler.set_dataset_handler(my_dataset_handler)
    my_strategy_handler.set_model_handler(my_model_handler)
    my_strategy_handler.set_learning_parameters(
        label_of_interest=0,  # We've supplied only one label per feature vector
        maximum_queries=8,
        number_to_select_per_query=10,
    )

    # ################################################################
    # Simulate the strategy.
    my_strategy_handler.run(currently_labeled_examples)
    # ################################################################

    # We will write out collected information to disk.  First say where:
    log_dir = os.path.join(all_logs_dir, name)
    # Write accuracy and loss information during training
    my_strategy_handler.write_train_log_for_tensorboard(log_dir=log_dir)
    # Write confidence statistics during active learning
    my_strategy_handler.write_confidence_log_for_tensorboard(log_dir=log_dir)
print("=== Done ===")

=== Begin Strategy: BALD ===
Training with 20 examples
Predicting for 700 examples


NotImplementedError: BaldStrategyHandler::select_next_indices is not yet implemented.
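
<p>For reference, the quantity that BALD ranks examples by can be sketched independently of al_bench. This is an illustrative NumPy sketch under stated assumptions, not the al_bench or batchbald_redux implementation: the function name <code>bald_scores</code> and the array layout (examples, Monte Carlo samples, classes) are assumptions made here. The score is the mutual information between the prediction and the model parameters, estimated as the entropy of the averaged prediction minus the average entropy of the individual sampled predictions.</p>

```python
import numpy as np


def bald_scores(log_probs):
    """Estimate BALD scores from sampled log-probabilities.

    log_probs has shape (num_examples, num_mc_samples, num_classes) and must
    be finite, i.e. every class probability is strictly positive.
    """
    probs = np.exp(log_probs)
    mean_probs = probs.mean(axis=1)
    # Entropy of the prediction averaged over the Monte Carlo samples.
    predictive_entropy = -np.sum(mean_probs * np.log(mean_probs), axis=1)
    # Average of the entropies of the individual sampled predictions.
    expected_entropy = -np.mean(np.sum(probs * log_probs, axis=2), axis=1)
    return predictive_entropy - expected_entropy


# Three toy examples, two Monte Carlo samples each, two classes.
probs = np.array(
    [
        [[0.5, 0.5], [0.5, 0.5]],      # samples agree, but are unsure
        [[0.99, 0.01], [0.01, 0.99]],  # samples disagree strongly
        [[0.9, 0.1], [0.9, 0.1]],      # samples agree and are confident
    ]
)
scores = bald_scores(np.log(probs))

# Pick the single highest-scoring example.
ranking = np.argsort(-scores)[:1]
print(scores, ranking)
```

<p>Only the second example gets a clearly positive score: its samples disagree, which is the kind of uncertainty that more labels can reduce. The first example is uncertain in every sample, so its score is close to zero even though its predictive entropy is high.</p>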

<h2>Use with TensorBoard</h2>

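<p>TensorBoard can display the training and confidence logs written by the strategy handlers. Point <code>--logdir</code> at the directory that holds them.</p>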

In [None]:
%load_ext tensorboard
%tensorboard --logdir runs-SamplingBayesian

In [None]:
print(currently_labeled_examples)