## Data generation

This notebook writes out text files containing the shell commands that start each of the MML experiments used to generate the predictions. To run the notebook and the actual experiments, the MML framework ('mml-core') must be installed. The 'mml-data' package provides the code needed to create the data locally. In addition, the provided MML plugin for the data splitting tag ('mml-prevalences') needs to be installed.

In [1]:
import mml.api.interactive as nb
import os
from pathlib import Path

mml_env_path = Path(os.getcwd()).parent / 'mml.env'
nb.init(mml_env_path)

 _____ ______   _____ ______   ___
|\   _ \  _   \|\   _ \  _   \|\  \
\ \  \\\__\ \  \ \  \\\__\ \  \ \  \
 \ \  \\|__| \  \ \  \\|__| \  \ \  \
  \ \  \    \ \  \ \  \    \ \  \ \  \____
   \ \__\    \ \__\ \__\    \ \__\ \_______\
    \|__|     \|__|\|__|     \|__|\|_______|
         ____  _  _    __  _  _  ____  _  _
        (  _ \( \/ )  (  )( \/ )/ ___)( \/ )
         ) _ ( )  /    )( / \/ \\___ \ )  /
        (____/(__/    (__)\_)(_/(____/(__/
Interactive MML API initialized.
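
The first cell expects 'mml.env' in the parent of the current working directory. As a minimal sketch (not part of MML), the lookup can be wrapped in a small helper that fails early with a readable message if the file is missing. The helper name `find_env_file` is hypothetical; only `pathlib` from the standard library is used.

```python
from pathlib import Path
from typing import Optional


def find_env_file(start: Optional[Path] = None, name: str = "mml.env") -> Path:
    """Return <parent of start>/<name>, raising if the file does not exist.

    `start` defaults to the current working directory, which mirrors
    Path(os.getcwd()).parent / 'mml.env' from the cell above.
    """
    start = Path.cwd() if start is None else Path(start)
    candidate = start.parent / name
    if not candidate.is_file():
        raise FileNotFoundError(f"Expected {name} at {candidate}")
    return candidate
```

With this helper, the initialization would read `nb.init(find_env_file())`, assuming `nb.init` accepts the path exactly as in the cell above.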


In [2]:
all_tasks = ['lapgyn4_surgical_actions', 'lapgyn4_instrument_count', 'lapgyn4_anatomical_actions', 'nerthus_bowel_cleansing_quality', 'hyperkvasir_therapeutic-interventions', 'cholec80_grasper_presence', 'cholec80_hook_presence', 'idle_action_recognition', 'brain_tumor_classification', 'brain_tumor_type_classification', 'chexpert_enlarged_cardiomediastinum', 'chexpert_cardiomegaly', 'chexpert_edema', 'chexpert_consolidation', 'chexpert_pneumonia', 'chexpert_pneumothorax', 'chexpert_pleural_effusion', 'chexpert_fracture', 'pneumonia_classification', 'covid_xray_classification', 'deep_drid_dr_level', 'deep_drid_quality', 'kvasir_capsule_anatomy', 'mura_xr_wrist', 'mura_xr_shoulder', 'mura_xr_humerus', 'mura_xr_hand', 'mura_xr_forearm', 'mura_xr_finger', 'mura_xr_elbow']

In [3]:
# Configuration of the execution environment
# Option 1: Run on the LSF cluster
cluster_reqs = nb.LSFSubmissionRequirements(
    # the following is recommended
    special_requirements=['tensorcore'],
    num_gpus=1,
    vram_per_gpu=11.0,
    queue='gpu')
# Option 2: run everything locally
local_reqs = nb.DefaultRequirements()
# Choose the option here: set run_locally to False to submit to the LSF cluster
run_locally = True
reqs = local_reqs if run_locally else cluster_reqs
# recommended project folder and number of reruns
project = 'mic23_predictions_reproduce'
reruns = 3

In [4]:
# prepare steps
create_cmds = list()
tag_cmds = list()
# step one: task creation
create_cmds.append(nb.MMLJobDescription(prefix_req=reqs,
                                      config_options={'mode': 'create', 'task_list': all_tasks, 'proj': project}))
# step two: redistribute the splits
for t in all_tasks:
    tag_cmds.append(nb.MMLJobDescription(prefix_req=reqs,
                                          config_options={'mode': 'info', 'task_list': [t], 'tagging.all': '+miccai?1337',
                                                          'preprocessing': 'default',
                                                          'proj': project}))
nb.write_out_commands(cmd_list=create_cmds, suffix='create')
nb.write_out_commands(cmd_list=tag_cmds, suffix='tag')

Stored 1 commands at output_create.txt.
Stored 30 commands at output_tag.txt.


In [5]:
base_cmds = list()
for ix in range(reruns):
    for t in all_tasks:
        # base options shared by all runs
        opts = {'mode': 'opt', 'mode.store_parameters': False, 'sampling.balanced': True,
                'sampling.batch_size': 300, 'callbacks': 'early', 'lr_scheduler': 'step',
                'trainer.max_epochs': 40, 'augmentations': 'baseline256', 'mode.val_is_test': False,
                'preprocessing': 'default_copy', 'trainer.min_epochs': 5}
        # run-specific options; the update overrides 'mode.store_parameters' (now True)
        # and 'lr_scheduler' (now 'plateau') from the base options above
        opts.update(
            {'proj': f'{project}_{ix}', 'seed': ix, 'mode.subroutines': '[train_fold,predict_val,predict_test]',
             'task_list': [t], 'mode.store_parameters': True, 'tagging.all': '+miccai?1337', 'reuse.clean_up.parameters': True,
             'lr_scheduler': 'plateau', 'lr_scheduler.patience': 5, '+callbacks.early.patience': 7})
        base_cmds.append(nb.MMLJobDescription(prefix_req=reqs, config_options=opts))
nb.write_out_commands(cmd_list=base_cmds, suffix='predict')

Stored 90 commands at output_predict.txt.
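
The options in the last cell are built in two steps, and `dict.update` lets the second dictionary win wherever a key appears in both. The following sketch reproduces only that dictionary logic in plain Python, without MML, to show which values take effect and how project name and seed vary per rerun. The function name `build_options` is hypothetical, and the task name is just one entry from `all_tasks`.

```python
project = "mic23_predictions_reproduce"
reruns = 3


def build_options(task: str, rerun: int) -> dict:
    """Rebuild the option dictionary of the prediction cell for one task and rerun."""
    # base options, copied from the cell above
    opts = {
        "mode": "opt", "mode.store_parameters": False, "sampling.balanced": True,
        "sampling.batch_size": 300, "callbacks": "early", "lr_scheduler": "step",
        "trainer.max_epochs": 40, "augmentations": "baseline256",
        "mode.val_is_test": False, "preprocessing": "default_copy",
        "trainer.min_epochs": 5,
    }
    # run-specific options; keys already present in opts are overwritten
    opts.update({
        "proj": f"{project}_{rerun}", "seed": rerun,
        "mode.subroutines": "[train_fold,predict_val,predict_test]",
        "task_list": [task], "mode.store_parameters": True,
        "tagging.all": "+miccai?1337", "reuse.clean_up.parameters": True,
        "lr_scheduler": "plateau", "lr_scheduler.patience": 5,
        "+callbacks.early.patience": 7,
    })
    return opts


example = build_options("chexpert_edema", rerun=0)
print(example["lr_scheduler"])           # plateau
print(example["mode.store_parameters"])  # True
print([build_options("chexpert_edema", ix)["proj"] for ix in range(reruns)])
# ['mic23_predictions_reproduce_0', 'mic23_predictions_reproduce_1', 'mic23_predictions_reproduce_2']
```

Each rerun therefore gets its own project folder and uses the rerun index as its seed, which matches the 30 tasks times 3 reruns giving the 90 stored commands.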
