# Reproducing or downloading the benchmark results and running new configurations on the benchmark

## Reproducing or downloading our results

In this notebook, we discuss how to reproduce the results from our paper and how to benchmark your own methods. Before running this notebook, please follow the installation and data download instructions from the README.md file of the repository.

We will now change the working directory from the examples subfolder to the main folder, which is required for the imports to work correctly.

In [1]:
%load_ext autoreload
%autoreload 2

In [2]:
import os
os.chdir('../..')   # change directory inside the notebook to the main directory

In [3]:
!pwd

/home/blackhc/PycharmProjects/bmdal_reg


## Running custom configurations on the benchmark

If you want to run your own configurations on the benchmark, you may want to take a look at the code in `run_experiments.py`. Here, we show a minimal example of how to run three custom benchmark configurations, which can run on a CPU in a few minutes. A few other files you may find helpful are:
- `test_single_task.py` allows you to run a single BMDAL configuration on a single split of a single data set, for fast exploration.
- `rename_algs.py` contains a few helper functions to modify/rename/remove saved results. It can be used for example if the names of some experiment results should be changed.

First, we need to create a list of configurations that will be executed:

In [68]:
from bmdal_reg.run_experiments import RunConfigList
from bmdal_reg.train import ModelTrainer
from bmdal_reg.sklearn_models import RandomForestRegressor, CatBoostRegressor, HistGradientBoostingRegressor

# general settings shared by all configurations (NN and active learning)
kwargs = dict(post_sigma=1e-3, maxdet_sigma=1e-3, weight_gain=0.2, bias_gain=0.2, lr=0.375, act='relu')
run_configs = RunConfigList()
# disabled configuration, kept for reference:
# run_configs.append(1e-6, ModelTrainer('NN_random', selection_method='random', create_model=RandomForestRegressor,
#                                       base_kernel='linear', kernel_transforms=[], **kwargs))
run_configs.append(4e-6, ModelTrainer('HGR_lcmd-tp_predictions', selection_method='lcmd', sel_with_train=True,
                                      create_model=HistGradientBoostingRegressor,
                                      base_kernel='predictions', kernel_transforms=[], **kwargs))
run_configs.append(4e-6, ModelTrainer('CAT_lcmd-tp_predictions', selection_method='lcmd', sel_with_train=True,
                                      create_model=CatBoostRegressor,
                                      base_kernel='predictions', kernel_transforms=[], **kwargs))
run_configs.append(4e-6, ModelTrainer('RF_lcmd-tp_predictions', selection_method='lcmd', sel_with_train=True,
                                      create_model=RandomForestRegressor,
                                      base_kernel='predictions', kernel_transforms=[], **kwargs))

In [69]:
from bmdal_reg.run_experiments import run_experiments
run_experiments(exp_name='relu_small', n_splits=2, run_config_list=run_configs,
                batch_sizes_configs=[[64, 128]], task_descs=['64-128'], use_pool_for_normalization=True,
                max_jobs_per_device=4, n_train_initial=64, ds_names=['ct', 'kegg_undir_uci'], sequential_split=9)

Task ct has n_pool=41712, n_test=10700, n_features=379
Task kegg_undir_uci has n_pool=50599, n_test=12921, n_features=27
Running all configurations on split 0
Results already exist for HGR_lcmd-tp_predictions on split 0 of task ct_64-128
Results already exist for HGR_lcmd-tp_predictions on split 0 of task kegg_undir_uci_64-128
Results already exist for CAT_lcmd-tp_predictions on split 0 of task ct_64-128
Results already exist for CAT_lcmd-tp_predictions on split 0 of task kegg_undir_uci_64-128
Results already exist for RF_lcmd-tp_predictions on split 0 of task ct_64-128
Results already exist for RF_lcmd-tp_predictions on split 0 of task kegg_undir_uci_64-128
Running all configurations on split 1
Results already exist for HGR_lcmd-tp_predictions on split 1 of task ct_64-128
Results already exist for HGR_lcmd-tp_predictions on split 1 of task kegg_undir_uci_64-128
Results already exist for CAT_lcmd-tp_predictions on split 1 of task kegg_undir_uci_64-128
Results already exist for RF_lcmd-

Here, the meaning of the parameters is as follows:
- `exp_name` is the name of the subfolder in which the results are saved. This can be used to group experiments together; for example, we used separate groups for relu and silu experiments in our paper.
- `n_splits` is the number of random splits that the configurations should be run on. The random splits are run in order.
- `run_config_list` is the list of run configurations created previously.
- `batch_sizes_configs` is a list of lists of batch sizes. In our case, we only have one batch size configuration, which is to acquire 64 samples in the first BMDAL step and 128 samples in the second BMDAL step. For experiments in our paper, we mostly used `batch_sizes_configs=[[256]*16]`.
- `task_descs` is a corresponding list of suffixes for the task names. For example, here the data set `ct` combined with the batch size configuration `[64, 128]` will get the name `ct_64-128`.
- `use_pool_for_normalization` specifies whether the dataset standardization should use statistics from the training and pool set or only from the training set. In our experiments, we standardized using only the training set; especially for smaller initial training set sizes, however, it may be important to also include the pool set.
- `max_jobs_per_device` specifies the maximum number of jobs that are run in parallel on a single device (CPU or GPU). Fewer jobs may be executed in parallel if their estimated RAM usage (see above) would otherwise exceed the remaining RAM capacity (measured at the start of `run_experiments`).
- `n_train_initial` specifies the initial training set size, which was 256 in our experiments.
- `ds_names` specifies the names of the data sets that experiments should be run on. Possible names can be found in the data folder specified in `custom_paths.py`. By default, all 15 data sets from the benchmark are used.
- `sequential_split` specifies the index of the random split for which `max_jobs_per_device=1` is used; the results from this split can then be used for runtime evaluation. By default, this is set to 9. Since we only use `n_splits=2` here, this case is not reached.
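
To make the relationship between `ds_names`, `batch_sizes_configs`, `task_descs` and `n_train_initial` concrete, the following sketch rebuilds the task names and the training set sizes for the call above. The helper `describe_tasks` is hypothetical and not part of `bmdal_reg`; it only assumes, as described above, that a task name is the data set name joined to its suffix by an underscore.

```python
from itertools import accumulate


def describe_tasks(ds_names, batch_sizes_configs, task_descs, n_train_initial):
    """Hypothetical helper (not part of bmdal_reg): task names and training set sizes."""
    if len(batch_sizes_configs) != len(task_descs):
        raise ValueError('need exactly one suffix per batch size configuration')
    tasks = {}
    for batch_sizes, desc in zip(batch_sizes_configs, task_descs):
        # training set size before the first step and after each BMDAL step
        sizes = list(accumulate(batch_sizes, initial=n_train_initial))
        for ds_name in ds_names:
            tasks[f'{ds_name}_{desc}'] = sizes
    return tasks


tasks = describe_tasks(ds_names=['ct', 'kegg_undir_uci'], batch_sizes_configs=[[64, 128]],
                       task_descs=['64-128'], n_train_initial=64)
for name, sizes in tasks.items():
    print(name, sizes)
# ct_64-128 [64, 128, 256]
# kegg_undir_uci_64-128 [64, 128, 256]
```

With the settings mostly used in the paper (`batch_sizes_configs=[[256]*16]` and `n_train_initial=256`), the same arithmetic gives a final training set size of 256 + 16 * 256 = 4352.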

Since the experiments above were run on a CPU, they took 5 minutes to complete; on a GPU, they would be much faster, especially with a higher `max_jobs_per_device`. If we ran this code again, it would detect that the results already exist and would not recompute them.
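
The skipping of existing results can be pictured with the following sketch. It is not the code used by `run_experiments`: `run_if_missing` and `compute_fn` are hypothetical names, the stored value is a placeholder, and the sketch assumes only that a finished run can be recognized by its `results.json` file in the folder layout described later in this notebook.

```python
import json
import tempfile
from pathlib import Path


def run_if_missing(results_dir, compute_fn):
    """Hypothetical sketch: call compute_fn only if results.json does not exist yet."""
    results_file = Path(results_dir) / 'results.json'
    if results_file.exists():
        print(f'Results already exist in {results_dir}')
        return False
    results_file.parent.mkdir(parents=True, exist_ok=True)
    with open(results_file, 'w') as f:
        json.dump(compute_fn(), f)
    return True


with tempfile.TemporaryDirectory() as tmp:
    split_dir = Path(tmp) / 'relu_small' / 'ct_64-128' / 'RF_lcmd-tp_predictions' / '0'
    first = run_if_missing(split_dir, lambda: {'placeholder': 0.5})   # computes and saves
    second = run_if_missing(split_dir, lambda: {'placeholder': 0.5})  # skipped, file exists
```

Keying the check on the result file rather than on in-memory state is what would allow an interrupted benchmark run to be resumed by simply calling the same code again.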

Next, we want to evaluate the results. Unfortunately, we cannot directly use `run_evaluation.py` since its current implementation filters results by the suffix `256x16`, while our results use the suffix `64-128`. Therefore, we give a small example showing how to print a table for the results:

In [70]:
from bmdal_reg.evaluation.analysis import ExperimentResults, print_avg_results
results = ExperimentResults.load('relu_small')
print_avg_results(results, relative_to=None, filter_suffix='')

Averaged results across tasks:
Results for metric mae:
CAT_lcmd-tp_predictions: -1.189 +- 0.016
RF_lcmd-tp_predictions:  -1.070 +- 0.019
NN_lcmd-tp_grad_rp-512:  -1.070 +- 0.019
HGR_lcmd-tp_predictions: -0.968 +- 0.029

Results for metric rmse:
CAT_lcmd-tp_predictions: -0.557 +- 0.018
RF_lcmd-tp_predictions:  -0.548 +- 0.020
NN_lcmd-tp_grad_rp-512:  -0.548 +- 0.020
HGR_lcmd-tp_predictions: -0.506 +- 0.011

Results for metric q95:
RF_lcmd-tp_predictions:  0.205 +- 0.028
NN_lcmd-tp_grad_rp-512:  0.205 +- 0.028
CAT_lcmd-tp_predictions: 0.217 +- 0.001
HGR_lcmd-tp_predictions: 0.269 +- 0.025

Results for metric q99:
HGR_lcmd-tp_predictions: 0.730 +- 0.029
RF_lcmd-tp_predictions:  0.761 +- 0.023
NN_lcmd-tp_grad_rp-512:  0.761 +- 0.023
CAT_lcmd-tp_predictions: 0.772 +- 0.049

Results for metric maxe:
RF_lcmd-tp_predictions:  1.371 +- 0.018
NN_lcmd-tp_grad_rp-512:  1.371 +- 0.018
HGR_lcmd-tp_predictions: 1.430 +- 0.019
CAT_lcmd-tp_predictions: 1.582 +- 0.016


We can also print results on individual data sets:

In [10]:
from bmdal_reg.evaluation.analysis import print_all_task_results
print_all_task_results(results)

Results for task ct_64-128:
Results for metric mae:
NN_lcmd-tp_grad_rp-512: -1.573 +- 0.016
NN_random:              -1.551 +- 0.010

Results for metric rmse:
NN_lcmd-tp_grad_rp-512: -1.176 +- 0.010
NN_random:              -1.035 +- 0.011

Results for metric q95:
NN_lcmd-tp_grad_rp-512: -0.461 +- 0.012
NN_random:              -0.257 +- 0.011

Results for metric q99:
NN_lcmd-tp_grad_rp-512: 0.139 +- 0.003
NN_random:              0.396 +- 0.015

Results for metric maxe:
NN_lcmd-tp_grad_rp-512: 0.821 +- 0.039
NN_random:              0.960 +- 0.009


Results for task kegg_undir_uci_64-128:
Results for metric mae:
NN_lcmd-tp_grad_rp-512: -1.154 +- 0.201
NN_random:              -0.961 +- 0.057

Results for metric rmse:
NN_lcmd-tp_grad_rp-512: -0.424 +- 0.136
NN_random:              -0.161 +- 0.035

Results for metric q95:
NN_lcmd-tp_grad_rp-512: 0.310 +- 0.109
NN_random:              0.435 +- 0.030

Results for metric q99:
NN_lcmd-tp_grad_rp-512: 1.016 +- 0.117
NN_random:              1.280

The results are saved using the folder structure `results_folder/exp_name/task_name/alg_name/split_idx/results.json`. For example, you can view the tasks we ran experiments on as follows:

In [11]:
from pathlib import Path
from bmdal_reg import custom_paths
os.listdir(Path(custom_paths.get_results_path()) / 'relu_small')

['ct_64-128', 'kegg_undir_uci_64-128']
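
Given this folder structure, the individual result files can also be collected directly. The sketch below is one possible way to do so, not the loader used by `ExperimentResults`; `collect_results` is a hypothetical helper, and since the keys inside `results.json` are not described here, the demonstration writes placeholder content into a temporary directory instead of reading real results.

```python
import json
import tempfile
from pathlib import Path


def collect_results(exp_path):
    """Hypothetical helper: map (task_name, alg_name, split_idx) to each parsed results.json."""
    collected = {}
    # layout: exp_path/task_name/alg_name/split_idx/results.json
    for results_file in sorted(Path(exp_path).glob('*/*/*/results.json')):
        split_idx, alg_name, task_name = (p.name for p in list(results_file.parents)[:3])
        with open(results_file) as f:
            collected[(task_name, alg_name, int(split_idx))] = json.load(f)
    return collected


with tempfile.TemporaryDirectory() as tmp:
    exp_path = Path(tmp) / 'relu_small'
    for split_idx in range(2):
        split_dir = exp_path / 'ct_64-128' / 'RF_lcmd-tp_predictions' / str(split_idx)
        split_dir.mkdir(parents=True)
        # placeholder content; real result files contain the benchmark metrics
        (split_dir / 'results.json').write_text(json.dumps({'placeholder': split_idx}))
    collected = collect_results(exp_path)
    print(sorted(collected))
```

To apply this to real results, `exp_path` would be the experiment subfolder inside the results folder, as in the `os.listdir` call above.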

## Implementing your own methods

If you want to go beyond our already implemented combinations of selection methods, kernels and kernel transformations, you have to modify the code so that your method can be used as above. Depending on how different your method is, we suggest three ways of including it:
- If your method fits into our framework and simply provides new selection methods, kernels and/or kernel transformations, we suggest extending `BatchSelector.select()` in `bmdal/algorithms.py` so that it can use your new component(s) given the corresponding configuration string(s).
- If your method is a different BMDAL method that does not fit into our framework but does not require modifying the NN training process, you can modify the BMDAL part in `ModelTrainer.__call__()` in `train.py` so that it calls your method. While you could do this by passing a custom BMDAL class or factory method directly to `ModelTrainer`, note that arguments to `ModelTrainer` are currently serialized to a JSON file, which is why we prefer native data types such as strings as arguments to `ModelTrainer`. This serialization can be helpful, for example, for automatically generating figure captions later on.
- If you also want to modify the NN training process, you may want to change other code in `ModelTrainer` or replace it with your own custom class. Note that the NN model creation itself can be customized in `ModelTrainer` through the `create_model` argument.
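
The preference for configuration strings over callables can be illustrated with a small, self-contained sketch. It does not reproduce how `BatchSelector.select()` dispatches on its arguments; the registry `SELECTION_METHODS`, the decorator `register` and the toy method `first_k` are all hypothetical names chosen for illustration.

```python
import json

# hypothetical registry mapping configuration strings to selection functions
SELECTION_METHODS = {}


def register(name):
    """Register a selection function under a configuration string."""
    def decorator(fn):
        SELECTION_METHODS[name] = fn
        return fn
    return decorator


@register('first_k')
def select_first_k(pool_indices, batch_size):
    """Toy selection method: take the first batch_size pool indices."""
    return list(pool_indices)[:batch_size]


# a configuration made of native types survives a JSON round trip
config = dict(alg_name='NN_first-k', selection_method='first_k')
restored = json.loads(json.dumps(config))
selected = SELECTION_METHODS[restored['selection_method']](range(100), 3)

# a configuration holding the function itself cannot be serialized
try:
    json.dumps(dict(selection_method=select_first_k))
    callable_is_serializable = True
except TypeError:
    callable_is_serializable = False

print(selected, callable_is_serializable)
```

In this pattern, adding a new method means registering one more function, while the saved configuration remains a plain string that can be read back later.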