## How to add new AL query strategies / unlabeled pool subsampling strategies

This notebook shows how to benchmark a new active learning (AL) query strategy or a new unlabeled pool subsampling strategy in three steps.

#### 1. Prepare the file and the global variable

In [1]:
# Folder where the strategy files are stored
FOLDER_WITH_STRATEGIES = "custom_strategy"
# -p avoids an error if the folder already exists (e.g. when the cell is re-run)
!mkdir -p $FOLDER_WITH_STRATEGIES
# File name of the AL query strategy
AL_STRATEGY_NAME = "least_confidence.py"
# File name of the unlabeled pool subsampling strategy
SUBSAMPLING_STRATEGY_NAME = "top_from_previous_iteration_subsampling.py"
CUR_PATH = !pwd
# Absolute path to the AL strategy
PATH_TO_AL_STRATEGY = f"{CUR_PATH[0]}/{FOLDER_WITH_STRATEGIES}/{AL_STRATEGY_NAME}"
# Absolute path to the unlabeled pool subsampling strategy
PATH_TO_SUBSAMPLING_STRATEGY = (
    f"{CUR_PATH[0]}/{FOLDER_WITH_STRATEGIES}/{SUBSAMPLING_STRATEGY_NAME}"
)

#### 2. Write your strategies

In [2]:
%%writefile $PATH_TO_AL_STRATEGY

import numpy as np

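def least_confidence(model, X_pool, n_instances, **kwargs):
    # Predicted class probabilities for every instance in the unlabeled pool
    probas = model.predict_proba(X_pool)
    # Least confidence: 1 minus the probability of the most likely class
    uncertainty_estimates = 1 - probas.max(axis=1)
    # Indices of the n_instances most uncertain instances
    query_idx = np.argsort(-uncertainty_estimates)[:n_instances]
    query = X_pool.select(query_idx)
    return query_idx, query, uncertainty_estimates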

Writing /Users/tsvigun/Anaconda/papers/active_learning/examples/strategy_folder/least_confidence.py
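
Before launching a full benchmark run, the query strategy can be sanity-checked on a toy pool. The following is a minimal sketch, assuming the pool only needs to expose a `select(indices)` method and the model only needs `predict_proba`. `ToyPool` and `ToyModel` are hypothetical stand-ins introduced for illustration; they are not part of the benchmark.

```python
import numpy as np


def least_confidence(model, X_pool, n_instances, **kwargs):
    probas = model.predict_proba(X_pool)
    uncertainty_estimates = 1 - probas.max(axis=1)
    query_idx = np.argsort(-uncertainty_estimates)[:n_instances]
    query = X_pool.select(query_idx)
    return query_idx, query, uncertainty_estimates


class ToyPool:
    """Hypothetical pool: a list of texts with a `select` method."""

    def __init__(self, texts):
        self.texts = list(texts)

    def select(self, indices):
        return ToyPool(self.texts[i] for i in indices)


class ToyModel:
    """Hypothetical model that returns fixed class probabilities."""

    def __init__(self, probas):
        self.probas = np.asarray(probas)

    def predict_proba(self, X_pool):
        return self.probas


pool = ToyPool(["a", "b", "c", "d"])
model = ToyModel([[0.9, 0.1], [0.55, 0.45], [0.7, 0.3], [0.6, 0.4]])

query_idx, query, uncertainty = least_confidence(model, pool, n_instances=2)
print(query_idx, query.texts)
```

The two instances whose top class probability is lowest (`"b"` and `"d"`) are selected, and the uncertainty estimates for the whole pool are returned alongside them.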


In [3]:
%%writefile $PATH_TO_SUBSAMPLING_STRATEGY

import numpy as np

def top_from_previous_iteration_subsampling(uncertainty_estimates, gamma_or_k_confident_to_save, **kwargs):
    # A float is treated as a fraction of the pool, an int as an absolute count
    if isinstance(gamma_or_k_confident_to_save, float):
        gamma_or_k_confident_to_save = int(
            gamma_or_k_confident_to_save * len(uncertainty_estimates)
        )
    # Keep the indices with the highest uncertainty estimates
    argsort = np.argsort(-uncertainty_estimates)
    return argsort[:gamma_or_k_confident_to_save]

Writing /Users/tsvigun/Anaconda/papers/active_learning/examples/strategy_folder/top_from_previous_iteration_subsampling.py
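
The subsampling function interprets its second argument by type, which is easy to get wrong. A small sketch on made-up uncertainty values (the numbers are illustrative only) shows the difference between passing a fraction and passing a count:

```python
import numpy as np


def top_from_previous_iteration_subsampling(
    uncertainty_estimates, gamma_or_k_confident_to_save, **kwargs
):
    if isinstance(gamma_or_k_confident_to_save, float):
        gamma_or_k_confident_to_save = int(
            gamma_or_k_confident_to_save * len(uncertainty_estimates)
        )
    argsort = np.argsort(-uncertainty_estimates)
    return argsort[:gamma_or_k_confident_to_save]


# Illustrative uncertainty estimates for a pool of six instances
uncertainty = np.array([0.10, 0.45, 0.30, 0.40, 0.05, 0.20])

# Float: keep a fraction of the pool (0.5 * 6 = 3 instances)
kept_fraction = top_from_previous_iteration_subsampling(uncertainty, 0.5)

# Int: keep an absolute number of instances
kept_count = top_from_previous_iteration_subsampling(uncertainty, 2)

print(kept_fraction, kept_count)
```

Note that `1.0` (a float) keeps the whole pool, whereas `1` (an int) keeps a single instance.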


#### 3. Use your strategies

- AL strategy: `config.al.strategy=$PATH_TO_AL_STRATEGY`

- Unlabeled pool subsampling strategy: `config.al.sampling_type=$PATH_TO_SUBSAMPLING_STRATEGY`

Test on a single GPU (replace `custom_strategy/least_confidence` and `custom_strategy/top_from_previous_iteration_subsampling` with the names of your own strategies):

In [None]:
%%bash
CUDA_VISIBLE_DEVICES='0' HYDRA_CONFIG_PATH=../al_benchmark/configs \
HYDRA_CONFIG_NAME=al_cls python ../al_benchmark/run_active_learning.py \
al.strategy=custom_strategy/least_confidence \
al.sampling_type=custom_strategy/top_from_previous_iteration_subsampling \
acquisition_model.checkpoint=roberta-base

The results are written to `workdir/run_active_learning/TODAY_DATE/TIME_SEED_MODEL/acquisition_metrics.json`, where `TODAY_DATE` and `TIME_SEED_MODEL` stand for the run-specific directory names.
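
To inspect the output of the most recent run, the metrics file can be located and loaded with the standard library. This is a sketch under assumptions: `load_latest_metrics` is a hypothetical helper, "most recent" is taken to mean the latest file modification time, and nothing is assumed about the structure of the JSON beyond it being valid JSON.

```python
import json
from pathlib import Path


def load_latest_metrics(root="workdir/run_active_learning"):
    """Return (path, parsed JSON) of the newest acquisition_metrics.json.

    Returns (None, None) if the folder or the file does not exist.
    """
    root = Path(root)
    if not root.is_dir():
        return None, None
    # Expected layout: <root>/<date dir>/<run dir>/acquisition_metrics.json
    candidates = sorted(
        root.glob("*/*/acquisition_metrics.json"),
        key=lambda p: p.stat().st_mtime,
    )
    if not candidates:
        return None, None
    latest = candidates[-1]
    with latest.open() as f:
        return latest, json.load(f)


path, metrics = load_latest_metrics()
if path is None:
    print("No acquisition_metrics.json found yet")
else:
    print(f"Loaded metrics from {path}")
```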