<h1>al_bench</h1>
Example use of the al_bench Active Learning Benchmark Tool

In [None]:
# Install needed packages
!pip install h5py numpy tensorflow tensorboard torch
!pip install -e /tf/notebooks/al_bench

<h2>Find a dataset and create a Dataset Handler</h2>
Fetch a dataset of 4598 feature vectors of length 1280 and their 4598 labels.

In [1]:
import al_bench as alb
import h5py as h5
import numpy as np
import random

filename = "../test/TCGA-A2-A0D0-DX1_xmin68482_ymin39071_MPP-0.2500.h5py"
with h5.File(filename, "r") as ds:
    my_feature_vectors = np.array(ds["features"])
    print(
        f"Read in {my_feature_vectors.shape[0]} feature vectors of length {my_feature_vectors.shape[1]}."
    )
    my_labels = np.array(ds["labels"])
    print(f"Read in {my_labels.shape[0]} labels for the feature vectors.")
my_label_definitions = [
    {
        0: {"description": "other"},
        1: {"description": "tumor"},
        2: {"description": "stroma"},
        3: {"description": "infiltrate"},
    }
]
my_dataset_handler = alb.dataset.GenericDatasetHandler()
my_dataset_handler.set_all_feature_vectors(my_feature_vectors)
my_dataset_handler.set_all_label_definitions(my_label_definitions)
my_dataset_handler.set_all_labels(my_labels)

# Set aside disjoint sets of examples for use in validation and as the initial training set
number_of_validation_indices = my_feature_vectors.shape[0] // 10
number_of_initial_training = 20
random_samples = random.sample(
    range(my_feature_vectors.shape[0]),
    number_of_validation_indices + number_of_initial_training,
)
my_dataset_handler.set_validation_indices(
    np.array(random_samples[:number_of_validation_indices])
)
currently_labeled_examples = np.array(random_samples[number_of_validation_indices:])

Read in 4598 feature vectors of length 1280.
Read in 4598 labels for the feature vectors.
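The validation set and the initial training set above are drawn with a single call to `random.sample`, which samples without replacement, so slicing the result yields two disjoint index sets. A minimal sketch of that idea follows, using only the standard library; the helper name `split_indices` and the `seed` parameter are illustrative and not part of al_bench.

```python
import random


def split_indices(total, number_of_validation, number_of_initial_training, seed=None):
    """Draw disjoint validation and initial-training indices from range(total).

    Hypothetical helper for illustration; al_bench does not provide it.
    """
    rng = random.Random(seed)
    # One sample without replacement guarantees that no index is repeated
    samples = rng.sample(
        range(total), number_of_validation + number_of_initial_training
    )
    validation = samples[:number_of_validation]
    initial_training = samples[number_of_validation:]
    return validation, initial_training


validation, initial_training = split_indices(4598, 4598 // 10, 20, seed=0)
# The two index sets never overlap
assert set(validation).isdisjoint(initial_training)
print(len(validation), len(initial_training))  # prints "459 20"
```

Passing a fixed `seed` makes the split reproducible across runs, which can help when comparing strategies on the same starting point.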


<h2>Create a model and a Model Handler</h2>
Build the model to be trained. For illustration, we build both a TensorFlow model and a PyTorch model, although one model is normally sufficient. One of them is then chosen for use with the active learning strategies.

In [2]:
number_of_categories = len(my_label_definitions[0])
number_of_features = my_feature_vectors.shape[1]
hidden_units = 128
dropout = 0.3

<h3>Build a TensorFlow model and its Model Handler</h3>

In [3]:
import tensorflow as tf

my_tensorflow_model = tf.keras.models.Sequential(
    [
        tf.keras.Input(shape=(number_of_features,)),
        tf.keras.layers.Dense(hidden_units, activation="relu"),
        tf.keras.layers.Dropout(dropout, noise_shape=None, seed=20220909),
        tf.keras.layers.Dense(number_of_categories, activation="softmax"),
    ],
    name=(
        f"{number_of_categories}_labels_from_{number_of_features}_features_with_"
        f"dropout_{dropout}"
    ),
)
my_tensorflow_model_handler = alb.model.TensorFlowModelHandler()
my_tensorflow_model_handler.set_model(my_tensorflow_model)
print("TensorFlow model handler built")

TensorFlow model handler built


<h3>Build a PyTorch model and its Model Handler</h3>

In [4]:
import torch


class MyTorchModel(torch.nn.Module):
    def __init__(self, number_of_features, number_of_categories):
        super().__init__()
        self.fc1 = torch.nn.Linear(number_of_features, hidden_units)
        self.relu1 = torch.nn.ReLU()
        self.dropout1 = torch.nn.Dropout(p=dropout)
        self.fc2 = torch.nn.Linear(hidden_units, number_of_categories)
        self.softmax1 = torch.nn.Softmax(dim=-1)

    def forward(self, x):
        x = self.fc1(x)
        x = self.relu1(x)
        x = self.dropout1(x)
        x = self.fc2(x)
        x = self.softmax1(x)
        return x


my_torch_model = MyTorchModel(number_of_features, number_of_categories)

my_pytorch_model_handler = alb.model.PyTorchModelHandler()
my_pytorch_model_handler.set_model(my_torch_model)
print("PyTorch model handler built")

PyTorch model handler built


<h3>Choose one of the models to proceed with</h3>

In [5]:
my_model_handler = my_tensorflow_model_handler
# my_model_handler = my_pytorch_model_handler

<h2>Make use of Strategy Handlers for active learning</h2>
Let's run and compare four active learning strategies: random selection, least confidence, least margin, and entropy.
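For orientation, the three uncertainty-based strategies are usually defined in terms of the model's predicted class probabilities. The sketch below shows the textbook scores computed with NumPy. It is an assumption about what such strategies compute in general, not a copy of al_bench's implementation, and the function names `uncertainty_scores` and `select_most_uncertain` are hypothetical.

```python
import numpy as np


def uncertainty_scores(probabilities):
    """Return (least_confidence, margin, entropy) for each row of probabilities."""
    probabilities = np.asarray(probabilities, dtype=float)
    sorted_probs = np.sort(probabilities, axis=-1)
    top1 = sorted_probs[..., -1]  # highest predicted probability
    top2 = sorted_probs[..., -2]  # second-highest predicted probability
    least_confidence = 1.0 - top1  # larger means more uncertain
    margin = top1 - top2  # smaller means more uncertain
    # Clip before taking the log so that zero probabilities do not produce NaN
    clipped = np.clip(probabilities, 1e-12, 1.0)
    entropy = -np.sum(probabilities * np.log(clipped), axis=-1)  # larger means more uncertain
    return least_confidence, margin, entropy


def select_most_uncertain(scores, k, larger_is_more_uncertain=True):
    """Return the indices of the k most uncertain examples."""
    order = np.argsort(scores)
    if larger_is_more_uncertain:
        order = order[::-1]
    return order[:k]


example_predictions = [
    [0.97, 0.01, 0.01, 0.01],  # confident prediction
    [0.40, 0.35, 0.15, 0.10],  # two classes compete
    [0.25, 0.25, 0.25, 0.25],  # maximally uncertain
]
least_confidence, margin, entropy = uncertainty_scores(example_predictions)
print(select_most_uncertain(least_confidence, 1))
print(select_most_uncertain(margin, 1, larger_is_more_uncertain=False))
print(select_most_uncertain(entropy, 1))
```

In this toy input all three criteria pick the uniform prediction (index 2) first. On real predictions they can rank examples differently, which is what makes a side-by-side comparison worthwhile.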

In [6]:
import shutil
import os

# Delete old log files, if any; a missing "runs" directory is not an error
shutil.rmtree("runs", ignore_errors=True)

for name, my_strategy_handler in (
    ("Random", alb.strategy.RandomStrategyHandler()),
    ("LeastConfidence", alb.strategy.LeastConfidenceStrategyHandler()),
    ("LeastMargin", alb.strategy.LeastMarginStrategyHandler()),
    ("Entropy", alb.strategy.EntropyStrategyHandler()),
):
    print(f"=== Begin Strategy: {name} ===")
    my_strategy_handler.set_dataset_handler(my_dataset_handler)
    my_strategy_handler.set_model_handler(my_model_handler)
    my_strategy_handler.set_learning_parameters(
        label_of_interest=0,  # We've supplied only one label per feature vector
        maximum_queries=8,
        number_to_select_per_query=10,
    )
    # Simulate the strategy once
    my_strategy_handler.run(currently_labeled_examples)
    # Write out collected information to disk
    log_dir = os.path.join("runs", name)
    # Write accuracy and loss information during training
    my_strategy_handler.write_train_log_for_tensorboard(log_dir=log_dir)
    # Write confidence statistics during active learning
    my_strategy_handler.write_confidence_log_for_tensorboard(log_dir=log_dir)
print("=== Done ===")

=== Begin Strategy: Random ===
Training with 20 examples
Predicting for 4598 examples
Training with 30 examples
Predicting for 4598 examples
Training with 40 examples
Predicting for 4598 examples
Training with 50 examples
Predicting for 4598 examples
Training with 60 examples
Predicting for 4598 examples
Training with 70 examples
Predicting for 4598 examples
Training with 80 examples
Predicting for 4598 examples
Training with 90 examples
Predicting for 4598 examples
Training with 100 examples
Predicting for 4598 examples
=== Begin Strategy: LeastConfidence ===
Training with 20 examples
Predicting for 4598 examples
Training with 30 examples
Predicting for 4598 examples
Training with 40 examples
Predicting for 4598 examples
Training with 50 examples
Predicting for 4598 examples
Training with 60 examples
Predicting for 4598 examples
Training with 70 examples
Predicting for 4598 examples
Training with 80 examples
Predicting for 4598 examples
Training with 90 examples
Predicting for 4598 examples
Training with 100 examples
Predicting for 4598 examples
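The training-set sizes in the output follow directly from the learning parameters: the run starts with 20 labeled examples and each of the 8 queries adds 10 more, so the model is trained 9 times. The arithmetic can be sketched as follows; `labeled_pool_sizes` is a hypothetical helper, and the assumption that each query adds exactly `number_to_select_per_query` new labels is inferred from the log above rather than from al_bench's source.

```python
def labeled_pool_sizes(initial, maximum_queries, number_to_select_per_query):
    """List the number of labeled examples at each training round."""
    sizes = [initial]
    for _ in range(maximum_queries):
        # Each query is assumed to add a fixed number of newly labeled examples
        sizes.append(sizes[-1] + number_to_select_per_query)
    return sizes


sizes = labeled_pool_sizes(
    initial=20, maximum_queries=8, number_to_select_per_query=10
)
print(sizes)  # prints [20, 30, 40, 50, 60, 70, 80, 90, 100]
```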

<h2>Use with TensorBoard</h2>

In [7]:
%load_ext tensorboard
%tensorboard --logdir runs