# Reproducing or downloading the benchmark results and running new configurations on the benchmark

## Reproducing or downloading our results

In this notebook, we discuss how to reproduce the results from our paper and how to benchmark your own methods. Before running this notebook, please follow the installation and data download instructions from the README.md file of the repository.

We will now change the working directory from the examples subfolder to the main folder, which is required for the imports to work correctly.

In [1]:
import os
os.chdir('..')   # change directory inside the notebook to the main directory

Our benchmark results can be generated using the file `run_experiments.py`. This takes around ten days on a workstation with four NVIDIA GeForce RTX 3090 GPUs! To use our pre-computed results instead, download the file `results.tar.gz`, which is published [here](https://doi.org/10.18419/darus-3394), and unpack its contents into the results folder you specified in `custom_paths.py`.
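
If you prefer to unpack `results.tar.gz` from Python rather than with `tar -xzf`, the following sketch uses the standard-library `tarfile` module. The helper name `unpack_archive` is our own and is not part of the repository. We assume that the archive is gzip-compressed, as its extension suggests. The helper refuses to write files outside the target folder and returns the member names, so you can check whether the `relu` and `silu` subfolders end up directly inside the results folder.

```python
import tarfile
from pathlib import Path
from typing import List


def unpack_archive(archive_path, target_dir) -> List[str]:
    """Extract a .tar.gz archive into target_dir and return the member names."""
    target = Path(target_dir).resolve()
    target.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive_path, 'r:gz') as tar:
        members = tar.getmembers()
        # Reject members that would be written outside target_dir (e.g. '../x').
        for member in members:
            dest = (target / member.name).resolve()
            if dest != target and target not in dest.parents:
                raise ValueError(f'refusing to extract {member.name!r} outside {target}')
        if hasattr(tarfile, 'data_filter'):
            # Newer Python versions support extraction filters; 'data' blocks unsafe members.
            tar.extractall(target, filter='data')
        else:
            tar.extractall(target)
    return [m.name for m in members]
```

For example, `unpack_archive('results.tar.gz', custom_paths.get_results_path())` would unpack the archive into the configured results folder. It uses `custom_paths` from `bmdal_reg`, which is also imported later in this notebook.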

If you want to run `run_experiments.py` yourself, for example because you modified it to run your own benchmark configurations, you can execute the following command in the main folder:
```
systemd-run --scope --user python3 -u run_experiments.py > ./out.log 2> ./err.log &
```
This command writes the output and error messages to the files `out.log` and `err.log`, respectively. Since a lot of output will be generated, redirecting it to files is a good idea. You can monitor the progress of the experiments, for example, with the following commands:
```
watch 'cat out.log | grep Running | tail -n 30'
watch 'cat out.log | grep Finished | tail -n 30'
watch 'cat out.log | grep time | tail -n 30'
```
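
The `watch` commands above require a Unix shell. Below is a rough Python counterpart of `grep PATTERN | tail -n 30`, which can be useful on a machine without these tools. `tail_matching` is a hypothetical helper. It uses plain substring matching rather than grep's regular expressions, which makes no difference for the fixed words used above.

```python
from collections import deque
from typing import List


def tail_matching(log_path, pattern: str, n: int = 30) -> List[str]:
    """Return the last n lines of log_path containing pattern, like `grep pattern | tail -n n`."""
    last_lines = deque(maxlen=n)  # keeps only the n most recent matching lines
    with open(log_path, encoding='utf-8', errors='replace') as f:
        for line in f:
            if pattern in line:
                last_lines.append(line.rstrip('\n'))
    return list(last_lines)
```

For example, `print('\n'.join(tail_matching('out.log', 'Finished')))` shows the most recently finished configurations.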

Note that `run_experiments.py` uses all visible GPUs as devices or, if no GPU is visible, the CPU as the only device. It tries to run multiple configurations in parallel without exceeding the available RAM on the devices, and it runs at most 20 jobs per device at the same time. The available RAM is measured only at the start of each new data split, so no other jobs with strongly varying RAM demand should run in parallel.

Once a configuration has finished running, its results are saved. On a GPU, a configuration takes only a few minutes to run. If you kill `run_experiments.py` and restart it, finished jobs will not be run again, so interrupting it only costs the progress of unfinished jobs. Split 9 takes longer to run since it runs only one process per GPU. If out-of-memory errors occurred while running an experiment, the code will try again when running the next split. If some results are still missing at the end, try executing `run_experiments.py` again.
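
The following simplified, hypothetical sketch makes the parallelization rule above more concrete by showing one way to pack jobs onto a single device. Each job's RAM demand is estimated as a per-sample estimate times the number of train+pool samples, in the same way as the per-sample values passed to `RunConfigList.append()` later in this notebook. Jobs are started in order as long as they fit into the free RAM and the per-device cap of 20 jobs has not been reached. This is not the actual scheduler of `run_experiments.py`, which may differ in details such as job ordering.

```python
from typing import List


def estimate_job_ram_gb(ram_per_sample_gb: float, n_train: int, n_pool: int) -> float:
    """RAM estimate (GB) for one job: per-sample estimate times the number of train+pool samples."""
    return ram_per_sample_gb * (n_train + n_pool)


def select_parallel_jobs(job_ram_gb: List[float], free_ram_gb: float, max_jobs: int = 20) -> List[int]:
    """Greedily pick job indices, in order, whose summed RAM fits into free_ram_gb (at most max_jobs)."""
    selected: List[int] = []
    used_gb = 0.0
    for idx, ram_gb in enumerate(job_ram_gb):
        if len(selected) >= max_jobs:
            break
        if used_gb + ram_gb <= free_ram_gb:
            selected.append(idx)
            used_gb += ram_gb
    return selected
```

For example, the `ct` task from the small experiment below (64 initial training samples, `n_pool=41712`) with an estimate of 4e-6 GB per sample gives about 0.167 GB per job.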

After generating or downloading the result files, we can print and save tables summarizing the results and generate corresponding plots using `run_evaluation.py`. This might take a few minutes for the results from the paper, as these contain data for over a million trained NNs! Note that some of the plot configurations are tailored towards our specific results and might need to be changed if you want to plot your own results. The generated files will be saved in the plots folder that has been configured in `custom_paths.py`. If you are only interested in the generated files, these are also published [here](https://doi.org/10.18419/darus-3394) in the file `plots.tar.gz`.

In [2]:
!python3 run_evaluation.py

----- Running evaluation for relu experiments -----
Loading experiment results...
Loaded experiment results
Averaged results across tasks:
Results for metric mae:
NN_bait-f-p_ll_ens-3_rp-512_train:          -2.041 +- 0.001
NN_lcmd-tp_grad_ens-3_rp-512:               -2.040 +- 0.001
NN_lcmd-tp_grad:                            -2.036 +- 0.001
NN_lcmd-tp_grad_rp-512:                     -2.033 +- 0.001
NN_lcmd-tp_ll_ens-3_rp-512:                 -2.026 +- 0.001
NN_kmeanspp-tp_grad:                        -2.026 +- 0.001
NN_kmeanspp-tp_grad_ens-3_rp-512:           -2.025 +- 0.001
NN_kmeanspp-tp_grad_rp-512:                 -2.025 +- 0.001
NN_kmeanspp-p_grad_rp-512_train:            -2.020 +- 0.001
NN_kmeanspp-tp_ll_ens-3_rp-512:             -2.020 +- 0.001
NN_lcmd-tp_ll:                              -2.017 +- 0.001
NN_kmeanspp-tp_ll:                          -2.016 +- 0.001
NN_bait-f-p_grad_rp-512_train:              -2.013 +- 0.002
NN_bait-f-p_grad_ens-3_rp-512_train:        -2.011 +- 0.0

NN_lcmd-tp_grad_ens-3_rp-512:               -0.948 +- 0.001
NN_lcmd-tp_grad:                            -0.944 +- 0.001
NN_lcmd-tp_grad_rp-512:                     -0.940 +- 0.001
NN_bait-f-p_ll_ens-3_rp-512_train:          -0.940 +- 0.002
NN_bait-f-p_grad_ens-3_rp-512_train:        -0.927 +- 0.002
NN_kmeanspp-tp_grad:                        -0.927 +- 0.001
NN_kmeanspp-tp_grad_rp-512:                 -0.927 +- 0.001
NN_kmeanspp-tp_grad_ens-3_rp-512:           -0.927 +- 0.001
NN_bait-f-p_grad_rp-512_train:              -0.926 +- 0.002
NN_bait-fb-p_grad_rp-512_train:             -0.921 +- 0.001
NN_lcmd-tp_ll_ens-3_rp-512:                 -0.913 +- 0.001
NN_kmeanspp-p_grad_rp-512_acs-rf-512:       -0.912 +- 0.001
NN_lcmd-p_grad_rp-512_acs-rf-512:           -0.905 +- 0.001
NN_kmeanspp-p_grad_rp-512_acs-grad:         -0.905 +- 0.001
NN_kmeanspp-p_grad_rp-512_train:            -0.905 +- 0.001
NN_lcmd-tp_ll:                              -0.901 +- 0.001
NN_bait-f-p_ll_train:                   

NN_bait-fb-p_grad_rp-512_train:             0.852 +- 0.009
NN_maxdet-tp_grad_scale:                    0.856 +- 0.009
NN_bait-f-p_grad_ens-3_rp-512_train:        0.859 +- 0.009
NN_bait-f-p_grad_rp-512_train:              0.862 +- 0.009
NN_lcmd-p_grad_rp-512_train:                0.864 +- 0.009
NN_maxdist-p_grad_rp-512_train:             0.866 +- 0.009
NN_maxdet-p_grad_rp-512_train:              0.867 +- 0.009
NN_maxdist-tp_grad:                         0.867 +- 0.010
NN_maxdet-p_grad_ens-3_rp-512_train:        0.868 +- 0.009
NN_maxdet-p_ll_ens-3_rp-512_train:          0.873 +- 0.008
NN_maxdist-tp_grad_ens-3_rp-512:            0.874 +- 0.009
NN_maxdist-tp_grad_rp-512:                  0.874 +- 0.010
NN_maxdet-p_grad_rp-512_acs-grad:           0.878 +- 0.009
NN_lcmd-p_grad_rp-512_acs-rf-hyper-512:     0.878 +- 0.009
NN_bait-f-p_ll_ens-3_rp-512_train:          0.880 +- 0.008
NN_lcmd-p_grad_rp-512_acs-grad:             0.880 +- 0.010
NN_maxdet-p_grad_rp-512_acs-rf-512:         0.886 +- 0.0

NN_lcmd-tp_grad_ens-3_rp-512:               -1.599 +- 0.002
NN_lcmd-tp_grad:                            -1.597 +- 0.002
NN_lcmd-tp_grad_rp-512:                     -1.597 +- 0.002
NN_bait-f-p_grad_ens-3_rp-512_train:        -1.594 +- 0.001
NN_bait-f-p_grad_rp-512_train:              -1.588 +- 0.001
NN_bait-fb-p_grad_rp-512_train:             -1.585 +- 0.001
NN_kmeanspp-tp_grad_ens-3_rp-512:           -1.569 +- 0.002
NN_kmeanspp-tp_grad_rp-512:                 -1.568 +- 0.002
NN_kmeanspp-tp_grad:                        -1.567 +- 0.002
NN_kmeanspp-p_grad_rp-512_acs-rf-512:       -1.558 +- 0.002
NN_lcmd-tp_ll_ens-3_rp-512:                 -1.554 +- 0.002
NN_lcmd-tp_ll:                              -1.550 +- 0.002
NN_maxdet-tp_grad_scale:                    -1.546 +- 0.002
NN_kmeanspp-p_grad_rp-512_acs-grad:         -1.542 +- 0.002
NN_kmeanspp-p_grad_rp-512_acs-rf-hyper-512: -1.541 +- 0.002
NN_bait-f-p_ll_ens-3_rp-512_train:          -1.541 +- 0.001
NN_maxdist-tp_grad_ens-3_rp-512:        

NN_bait-f-p_grad_ens-3_rp-512_train:        -0.436 +- 0.002
NN_bait-f-p_grad_rp-512_train:              -0.429 +- 0.002
NN_bait-fb-p_grad_rp-512_train:             -0.426 +- 0.001
NN_lcmd-tp_grad_ens-3_rp-512:               -0.423 +- 0.002
NN_lcmd-tp_grad:                            -0.422 +- 0.002
NN_lcmd-tp_grad_rp-512:                     -0.421 +- 0.002
NN_maxdist-tp_grad:                         -0.415 +- 0.002
NN_maxdist-tp_grad_ens-3_rp-512:            -0.414 +- 0.003
NN_maxdist-tp_grad_rp-512:                  -0.408 +- 0.003
NN_maxdet-tp_grad_scale:                    -0.400 +- 0.002
NN_lcmd-p_grad_rp-512_train:                -0.392 +- 0.002
NN_maxdet-p_grad_ens-3_rp-512_train:        -0.390 +- 0.002
NN_lcmd-p_grad_rp-512_acs-grad:             -0.389 +- 0.002
NN_kmeanspp-tp_grad_ens-3_rp-512:           -0.384 +- 0.001
NN_kmeanspp-tp_grad_rp-512:                 -0.383 +- 0.001
NN_kmeanspp-tp_grad:                        -0.382 +- 0.002
NN_maxdet-p_grad_rp-512_train:          

Alg NN_kmeanspp-tp_ll failed on step 7 of split 0 of task diamonds_256x16: filling up with random samples because selection failed after n_selected = 202
Alg NN_kmeanspp-tp_ll_ens-3_rp-512 failed on step 15 of split 1 of task diamonds_256x16: filling up with random samples because selection failed after n_selected = 131
Total number of DBAL steps across all experiments: 364800
eff dim was larger for grad_rp-512 than for ll in 89.7083% of cases
avg eff dim for grad_rp-512: 3.95829
avg eff dim for ll: 2.33616
Generating tables...
Creating learning curve plots...
Creating individual learning curve plots with subplots...
Creating error variation plots...
plot_skewness_ax: R^2 = 0.788092
Creating correlation plots...
Creating individual learning curve plots...
Finished plotting

Creating lcmd visualization...


Note that running `run_evaluation.py` a second time will load the data faster unless the results folder has been modified, since the results are cached in a more efficient format in the cache folder configured in `custom_paths.py`. We can see that the results are grouped into `relu` and `silu` experiments, which matches the two subfolders in the results folder. We can choose to run the evaluation only on the `silu` subfolder:

In [3]:
!python3 run_evaluation.py silu

----- Running evaluation for silu experiments -----
Loading experiment results...
Loaded experiment results
Averaged results across tasks:
Results for metric mae:
NN_lcmd-tp_grad_ens-3_rp-512:               -2.039 +- 0.001
NN_lcmd-tp_grad:                            -2.037 +- 0.002
NN_lcmd-tp_grad_rp-512:                     -2.036 +- 0.001
NN_lcmd-tp_ll_ens-3_rp-512:                 -2.027 +- 0.001
NN_bait-f-p_grad_ens-3_rp-512_train:        -2.025 +- 0.001
NN_kmeanspp-tp_grad_ens-3_rp-512:           -2.025 +- 0.001
NN_kmeanspp-tp_grad_rp-512:                 -2.024 +- 0.001
NN_bait-f-p_grad_rp-512_train:              -2.023 +- 0.001
NN_kmeanspp-tp_grad:                        -2.023 +- 0.001
NN_lcmd-tp_ll:                              -2.021 +- 0.001
NN_bait-fb-p_grad_rp-512_train:             -2.019 +- 0.001
NN_kmeanspp-tp_ll:                          -2.011 +- 0.002
NN_kmeanspp-tp_ll_ens-3_rp-512:             -2.008 +- 0.002
NN_kmeanspp-p_grad_rp-512_acs-grad:         -2.006 +- 0.0

NN_lcmd-tp_grad_ens-3_rp-512:               -0.959 +- 0.002
NN_lcmd-tp_grad:                            -0.956 +- 0.002
NN_lcmd-tp_grad_rp-512:                     -0.955 +- 0.001
NN_bait-f-p_grad_ens-3_rp-512_train:        -0.948 +- 0.001
NN_bait-f-p_grad_rp-512_train:              -0.943 +- 0.001
NN_bait-fb-p_grad_rp-512_train:             -0.938 +- 0.001
NN_kmeanspp-tp_grad_ens-3_rp-512:           -0.934 +- 0.001
NN_kmeanspp-tp_grad_rp-512:                 -0.933 +- 0.001
NN_kmeanspp-tp_grad:                        -0.933 +- 0.002
NN_lcmd-tp_ll_ens-3_rp-512:                 -0.916 +- 0.002
NN_kmeanspp-p_grad_rp-512_acs-rf-512:       -0.914 +- 0.002
NN_lcmd-tp_ll:                              -0.910 +- 0.002
NN_kmeanspp-p_grad_rp-512_acs-grad:         -0.902 +- 0.001
NN_kmeanspp-p_grad_rp-512_acs-rf-hyper-512: -0.898 +- 0.001
NN_kmeanspp-p_grad_rp-512_train:            -0.895 +- 0.002
NN_maxdet-tp_grad_scale:                    -0.895 +- 0.002
NN_kmeanspp-tp_ll:                      

NN_maxdist-tp_grad_ens-3_rp-512:            0.818 +- 0.008
NN_maxdist-tp_grad_rp-512:                  0.820 +- 0.008
NN_maxdist-tp_grad:                         0.823 +- 0.008
NN_maxdet-tp_grad_scale:                    0.834 +- 0.008
NN_maxdet-p_grad_ens-3_rp-512_train:        0.843 +- 0.008
NN_maxdist-p_grad_rp-512_acs-grad:          0.843 +- 0.008
NN_maxdet-p_grad_rp-512_acs-grad:           0.846 +- 0.008
NN_maxdet-p_grad_rp-512_train:              0.846 +- 0.008
NN_lcmd-p_grad_rp-512_train:                0.847 +- 0.008
NN_maxdist-p_grad_rp-512_train:             0.849 +- 0.008
NN_lcmd-p_grad_rp-512_acs-grad:             0.849 +- 0.009
NN_bait-fb-p_grad_rp-512_train:             0.853 +- 0.008
NN_maxdet-p_grad_rp-512_acs-rf-512:         0.855 +- 0.008
NN_bait-f-p_grad_ens-3_rp-512_train:        0.855 +- 0.008
NN_bait-f-p_grad_rp-512_train:              0.859 +- 0.008
NN_maxdet-p_grad_rp-512_acs-rf-hyper-512:   0.863 +- 0.008
NN_maxdist-p_grad_rp-512_acs-rf-512:        0.865 +- 0.0
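
The caching behavior of `run_evaluation.py` described before the `silu` run means that data loads faster unless the results folder has been modified. A modification-time check can illustrate this idea. The sketch below is hypothetical: `cache_is_fresh` is not a function from the repository, and the real implementation may detect changes differently.

```python
from pathlib import Path


def cache_is_fresh(results_dir, cache_file) -> bool:
    """Return True if cache_file is at least as new as everything inside results_dir."""
    cache_file = Path(cache_file)
    if not cache_file.exists():
        return False
    results_dir = Path(results_dir)
    # Directory mtimes change when entries are added or removed, so directories are included too.
    mtimes = [results_dir.stat().st_mtime]
    mtimes.extend(p.stat().st_mtime for p in results_dir.rglob('*'))
    return cache_file.stat().st_mtime >= max(mtimes)
```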

## Running custom configurations on the benchmark

If you want to run your own configurations on the benchmark, you may want to take a look at the code in `run_experiments.py`. Here, we show a minimal example of how to run two custom benchmark configurations, which can run on a CPU in a few minutes. A few other files you may find helpful are:
- `test_single_task.py` allows you to run a single BMDAL configuration on a single split of a single data set, for fast exploration.
- `rename_algs.py` contains a few helper functions to modify/rename/remove saved results. It can be used for example if the names of some experiment results should be changed.

First, we need to create a list of configurations that will be executed:

In [4]:
from bmdal_reg.run_experiments import RunConfigList
from bmdal_reg.train import ModelTrainer

# some general configuration for the NN and active learning
kwargs = dict(post_sigma=1e-3, maxdet_sigma=1e-3, weight_gain=0.2, bias_gain=0.2, lr=0.375, act='relu')
run_configs = RunConfigList()
run_configs.append(1e-6, ModelTrainer('NN_random', selection_method='random',
                                      base_kernel='linear', kernel_transforms=[], **kwargs))
run_configs.append(4e-6, ModelTrainer('NN_lcmd-tp_grad_rp-512', selection_method='lcmd', sel_with_train=True,
                                      base_kernel='grad', kernel_transforms=[('rp', [512])], **kwargs))

Let us take this apart: We first define a dictionary with common arguments for convenience. These arguments are as follows:
- `post_sigma` is the $\sigma$ value for the posterior transformation, and is not needed for these two configurations.
- `maxdet_sigma` is the $\sigma$ value for the MaxDet selection method, and is not needed for these two configurations.
- `weight_gain` corresponds to $\sigma_w$ from the paper.
- `bias_gain` corresponds to $\sigma_b$ from the paper.
- `lr` is the initial learning rate (decayed linearly to zero).
- `act` is the name of the activation function.

There are many more possible arguments, which can be found by inspecting the code in `train.py` and `bmdal/algorithms.py`.
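
The `lr` argument above is the initial learning rate, which is decayed linearly to zero. A minimal sketch of such a schedule follows. We assume that the rate is updated per optimization step (the repository might update it per epoch instead). `linear_decay_lr` is a hypothetical helper and not part of `train.py`.

```python
def linear_decay_lr(initial_lr: float, step: int, total_steps: int) -> float:
    """Learning rate after `step` of `total_steps` updates, decaying linearly from initial_lr to 0."""
    if total_steps <= 0:
        raise ValueError('total_steps must be positive')
    progress = min(max(step / total_steps, 0.0), 1.0)  # clamp progress to [0, 1]
    return initial_lr * (1.0 - progress)
```

With `lr=0.375` as in `kwargs`, this gives 0.375 at the start, 0.1875 halfway through training, and 0 at the end.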

Next, we create a `RunConfigList` object, which is a wrapper around an ordinary list with a slightly more convenient `append()` function. For the configurations, we use the following arguments:
- The values 1e-6 and 4e-6 are upper bound estimates for how much RAM (in GB) per train+pool sample will be used. These are relevant if you want to run as many experiments in parallel as possible while avoiding out-of-memory errors. Otherwise you can just use small values and limit the number of jobs per device (see below).
- The first argument to `ModelTrainer` is the name of the configuration. Our names start with `NN`, followed by the selection method (with its mode), the base kernel and the kernel transformations. This naming scheme is sometimes exploited in our evaluation code (mainly for plotting), but you may use a different one.
- The arguments `selection_method`, `base_kernel` and `kernel_transforms` correspond to the choices in our framework, and valid options are described in detail in `bmdal/algorithms.py`. For `NN_random`, the kernel has no influence on the results, hence we use the linear kernel since it requires the least computation.
- The argument `sel_with_train` chooses between P-mode and TP-mode (`sel_with_train=True` selects TP-mode). Here we use TP-mode for LCMD, which is also the default for LCMD, so specifying `sel_with_train=True` would not have been necessary here.
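
Because the evaluation code partly relies on this naming scheme, it can be handy to split names back into their components, for example to group results. The parser below is a hypothetical helper. Its rules are inferred only from the names printed in this notebook, such as `NN_lcmd-tp_grad_rp-512` and `NN_bait-f-p_ll_ens-3_rp-512_train`. It treats everything after the base kernel as generic suffixes, and it is not the parsing logic used in the repository.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ParsedName:
    selection_method: str
    mode: Optional[str]          # 'p' or 'tp'; None if the name has no mode
    base_kernel: Optional[str]   # e.g. 'grad' or 'll'; None for names like 'NN_random'
    suffixes: List[str] = field(default_factory=list)  # e.g. ['ens-3', 'rp-512', 'train']


def parse_config_name(name: str) -> ParsedName:
    """Split a name like 'NN_lcmd-tp_grad_rp-512' into its components (hypothetical scheme)."""
    parts = name.split('_')
    if len(parts) < 2 or parts[0] != 'NN':
        raise ValueError(f'unexpected configuration name: {name!r}')
    method_part = parts[1]
    # The mode is the last '-'-separated token of the method part if it is 'p' or 'tp'.
    head, sep, tail = method_part.rpartition('-')
    if sep and tail in ('p', 'tp'):
        method, mode = head, tail
    else:
        method, mode = method_part, None
    base_kernel = parts[2] if len(parts) > 2 else None
    return ParsedName(method, mode, base_kernel, parts[3:])
```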

Note that the configuration `NN_random` is simply the random selection baseline, and the configuration `NN_lcmd-tp_grad_rp-512` is the method suggested in our paper.

Next, we run these configurations, which can take a few minutes. The meaning of the parameters will be explained below.

In [5]:
from bmdal_reg.run_experiments import run_experiments
run_experiments(exp_name='relu_small', n_splits=2, run_config_list=run_configs,
                batch_sizes_configs=[[64, 128]], task_descs=['64-128'], use_pool_for_normalization=True,
                max_jobs_per_device=4, n_train_initial=64, ds_names=['ct', 'kegg_undir_uci'], sequential_split=9)

Task ct has n_pool=41712, n_test=10700, n_features=379
Task kegg_undir_uci has n_pool=50599, n_test=12921, n_features=27
Running all configurations on split 0
Start time: 2023-01-25 11:18:24
Starting job 1/4 after 2s
Starting job 2/4 after 2s
Starting job 3/4 after 2s
Starting job 4/4 after 2s
Running NN_lcmd-tp_grad_rp-512 on split 0 of task kegg_undir_uci_64-128 on device cpu
Running NN_random on split 0 of task ct_64-128 on device cpu
Running NN_random on split 0 of task kegg_undir_uci_64-128 on device cpu
Running NN_lcmd-tp_grad_rp-512 on split 0 of task ct_64-128 on device cpu
................................................................................................................................................................................................................................................................
Test results: MAE=0.34011, RMSE=0.507062, MAXE=2.53953, q95=1.17227, q99=1.83336


Performing AL step 1/2 with n_train=64, n_pool=41712, al_batch_size=64


Test results: MAE=0.249944, RMSE=0.376696, MAXE=2.27435, q95=0.796896, q99=1.44544


Performing AL step 2/2 with n_train=128, n_pool=41648, al_batch_size=128
................................................................................................................................................................................................................................................................
Test results: MAE=0.416239, RMSE=0.839556, MAXE=9.89747, q95=1.58267, q99=3.21139


Performing AL step 2/2 with n_train=128, n_pool=50535, al_batch_size=128
................................................................................................................................................................................................................................................................
Test results: MAE=0.278637, RMSE=0.63115, MAXE=5.83391, q95=1.36879, q99=2.71179


Performing AL step 2/2 with n_train=128, n_pool=50535, al_batch_size=128
...............

Here, the meaning of the parameters is as follows:
- `exp_name` is the name of the subfolder in which the results will be saved. This can be used to group experiments together; for example, we used separate groups for relu and silu experiments in our paper.
- `n_splits` is the number of random splits that the configurations should be run on. The random splits will be run in order.
- `run_config_list` is the list of run configurations created previously.
- `batch_sizes_configs` is a list of lists of batch sizes. In our case, we only have one batch size configuration, which acquires 64 samples in the first BMDAL step and 128 samples in the second BMDAL step. For the experiments in our paper, we mostly used `batch_sizes_configs=[[256]*16]`.
- `task_descs` is a corresponding list of suffixes for the task names. For example, here the data set `ct` combined with the batch size configuration `[64, 128]` gets the name `ct_64-128`.
- `use_pool_for_normalization` specifies whether the data set standardization should use statistics from the training and pool sets or only from the training set. In our experiments, we standardized using only the training set, but especially for smaller initial training sets, it may be important to include the pool set in the standardization statistics.
- `max_jobs_per_device` specifies the maximum number of jobs that are run in parallel on a single device (CPU or GPU). Fewer jobs may be executed in parallel if their estimated RAM usage (see above) would otherwise exceed the remaining RAM capacity (measured at the start of `run_experiments`).
- `n_train_initial` specifies the initial training set size, which was 256 in our experiments.
- `ds_names` specifies the names of the data sets that experiments should be run on. Possible names can be found in the data folder specified in `custom_paths.py`. By default, all 15 data sets from the benchmark are used.
- `sequential_split` specifies the index of the random split for which `max_jobs_per_device=1` is used; the results from this split can then be used for runtime evaluation. By default, this is set to 9. Since we only use `n_splits=2` here, this case is not reached.
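
The sketch below shows how `n_train_initial` and a batch size configuration translate into the individual BMDAL steps by computing the training and pool set sizes before each step. It is a hypothetical helper. For `ct` (`n_pool=41712`) with `n_train_initial=64` and `[64, 128]`, it reproduces the step sizes printed in the log above. It also shows that the paper's `[[256]*16]` configuration with 256 initial samples ends with 4352 training samples.

```python
from typing import List, Tuple


def al_schedule(n_train_initial: int, n_pool_initial: int,
                batch_sizes: List[int]) -> List[Tuple[int, int, int, int]]:
    """Return (step, n_train, n_pool, al_batch_size) before each BMDAL step."""
    steps = []
    n_train, n_pool = n_train_initial, n_pool_initial
    for step, batch_size in enumerate(batch_sizes, start=1):
        if batch_size > n_pool:
            raise ValueError(f'step {step}: batch size {batch_size} exceeds pool size {n_pool}')
        steps.append((step, n_train, n_pool, batch_size))
        n_train += batch_size  # the selected samples move from the pool ...
        n_pool -= batch_size   # ... into the training set
    return steps


def final_train_size(n_train_initial: int, batch_sizes: List[int]) -> int:
    """Training set size after all BMDAL steps."""
    return n_train_initial + sum(batch_sizes)
```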

Since the experiments above were run on a CPU, they took about 5 minutes to complete; on a GPU they would run much faster, especially with a higher `max_jobs_per_device`. If we ran this code again, it would detect that the results have already been computed and would not recompute them.

Next, we want to evaluate the results. Unfortunately, we cannot directly use `run_evaluation.py` since its current implementation filters results by the suffix `256x16`, while our results use the suffix `64-128`. Therefore, we give a small example showing how to print a table for the results:

In [6]:
from bmdal_reg.evaluation.analysis import ExperimentResults, print_avg_results
results = ExperimentResults.load('relu_small')
print_avg_results(results, relative_to=None, filter_suffix='')

Averaged results across tasks:
Results for metric mae:
NN_lcmd-tp_grad_rp-512: -1.363 +- 0.101
NN_random:              -1.256 +- 0.029

Results for metric rmse:
NN_lcmd-tp_grad_rp-512: -0.800 +- 0.068
NN_random:              -0.598 +- 0.018

Results for metric q95:
NN_lcmd-tp_grad_rp-512: -0.075 +- 0.055
NN_random:              0.089  +- 0.016

Results for metric q99:
NN_lcmd-tp_grad_rp-512: 0.577 +- 0.058
NN_random:              0.838 +- 0.047

Results for metric maxe:
NN_lcmd-tp_grad_rp-512: 1.408 +- 0.095
NN_random:              1.667 +- 0.007






We can also print results on individual data sets:

In [7]:
from bmdal_reg.evaluation.analysis import print_all_task_results
print_all_task_results(results)

Results for task ct_64-128:
Results for metric mae:
NN_lcmd-tp_grad_rp-512: -1.573 +- 0.016
NN_random:              -1.551 +- 0.010

Results for metric rmse:
NN_lcmd-tp_grad_rp-512: -1.176 +- 0.010
NN_random:              -1.035 +- 0.011

Results for metric q95:
NN_lcmd-tp_grad_rp-512: -0.461 +- 0.012
NN_random:              -0.257 +- 0.011

Results for metric q99:
NN_lcmd-tp_grad_rp-512: 0.139 +- 0.003
NN_random:              0.396 +- 0.015

Results for metric maxe:
NN_lcmd-tp_grad_rp-512: 0.821 +- 0.039
NN_random:              0.960 +- 0.009




Results for task kegg_undir_uci_64-128:
Results for metric mae:
NN_lcmd-tp_grad_rp-512: -1.154 +- 0.201
NN_random:              -0.961 +- 0.057

Results for metric rmse:
NN_lcmd-tp_grad_rp-512: -0.424 +- 0.136
NN_random:              -0.161 +- 0.035

Results for metric q95:
NN_lcmd-tp_grad_rp-512: 0.310 +- 0.109
NN_random:              0.435 +- 0.030

Results for metric q99:
NN_lcmd-tp_grad_rp-512: 1.016 +- 0.117
NN_random:              1.280

The results are saved using the folder structure `results_folder/exp_name/task_name/alg_name/split_idx/results.json`. For example, you can view the tasks we ran experiments on as follows:

In [8]:
from pathlib import Path
from bmdal_reg import custom_paths
os.listdir(Path(custom_paths.get_results_path()) / 'relu_small')

['ct_64-128', 'kegg_undir_uci_64-128']
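
The sketch below builds on the folder structure `results_folder/exp_name/task_name/alg_name/split_idx/results.json`. It lists which (task, algorithm, split) combinations already have a results file, which can help you spot missing results before rerunning `run_experiments.py`. The helper only inspects file paths and does not parse `results.json`. `list_completed_runs` is a hypothetical helper and not part of `bmdal_reg`.

```python
from pathlib import Path
from typing import List, Tuple


def list_completed_runs(results_root, exp_name: str) -> List[Tuple[str, str, str]]:
    """Return sorted (task_name, alg_name, split_idx) triples that have a results.json file."""
    exp_dir = Path(results_root) / exp_name
    runs = []
    # Layout: exp_name/task_name/alg_name/split_idx/results.json
    for results_file in exp_dir.glob('*/*/*/results.json'):
        split_dir = results_file.parent
        alg_dir = split_dir.parent
        task_dir = alg_dir.parent
        runs.append((task_dir.name, alg_dir.name, split_dir.name))
    return sorted(runs)
```

For the small experiment above, `list_completed_runs(custom_paths.get_results_path(), 'relu_small')` should return one entry per task, configuration, and split, assuming all runs finished.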

## Implementing your own methods

If you want to go beyond our already implemented combinations of selection methods, kernels and kernel transformations, you have to modify the code so that your method can be used as above. Depending on how different your method is, we suggest three ways of including it:
- If your method fits into our framework and simply provides new selection methods, kernels and/or kernel transformations, we suggest extending `BatchSelector.select()` in `bmdal/algorithms.py` so that it can use your new component(s) given the corresponding configuration string(s).
- If your method is a different BMDAL method that does not fit into our framework but does not require modifying the NN training process, you can modify the BMDAL part in `ModelTrainer.__call__()` in `train.py` so that it calls your method. You could also pass a custom BMDAL class or factory method directly to `ModelTrainer`, but note that the arguments to `ModelTrainer` are currently serialized to a JSON file, which is why we prefer native data types like strings as arguments. This serialization can be helpful, for example, for automatically generating figure captions later on.
- If you also want to modify the NN training process, you may want to change other code in `ModelTrainer` or replace it with your own custom class. Note that the NN model creation itself can be customized in `ModelTrainer` through the `create_model` argument.
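
A small, self-contained sketch can illustrate the preference for native data types. Configuration values that are plain strings and numbers survive a JSON round trip, and a string can be mapped to an implementation through a lookup table, whereas a function object cannot be serialized. The registry, the `'top_score'` method and `run_selection` are hypothetical and do not reflect how `BatchSelector.select()` actually dispatches its configuration strings.

```python
import json
from typing import Callable, Dict, List

SelectionFn = Callable[[List[float], int], List[int]]

# Hypothetical registry mapping configuration strings to selection functions.
SELECTION_METHODS: Dict[str, SelectionFn] = {}


def register(name: str) -> Callable[[SelectionFn], SelectionFn]:
    """Decorator that stores a selection function under a configuration string."""
    def decorator(fn: SelectionFn) -> SelectionFn:
        SELECTION_METHODS[name] = fn
        return fn
    return decorator


@register('top_score')
def select_top_score(scores: List[float], batch_size: int) -> List[int]:
    """Pick the indices of the batch_size highest-scoring pool samples."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:batch_size]


def run_selection(config: dict, scores: List[float], batch_size: int) -> List[int]:
    # The configuration only stores a string; the function is looked up at call time.
    return SELECTION_METHODS[config['selection_method']](scores, batch_size)


config = {'selection_method': 'top_score', 'lr': 0.375}
config_json = json.dumps(config)  # works because the config only contains native types
```

Putting `select_top_score` itself into the configuration would make `json.dumps` raise a `TypeError`.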