# ETAB Benchmark Suite and Model Zoo

The ETAB benchmark suite encapsulates a diverse set of tasks that test the quality of visual representations of echocardiograms with respect to different downstream setups of interest across different datasets. The benchmark tasks fall into four categories: 🔴 cardiac structure identification tasks, where the goal is to automatically identify anatomical regions of interest; 🔵 cardiac function estimation tasks, where the goal is to evaluate cardiac hemodynamics and left ventricle measurements; 🟢 view recognition tasks, where the goal is to automate view annotations for echocardiography clips; and 🟡 clinical prediction tasks, where the goal is to predict clinical outcomes or issue diagnoses based on observed echocardiograms. Combinations of these tasks constitute adaptation benchmarks that can be used to evaluate the transferability of features across views, datasets and annotations. In this Section, we provide an overview of the ETAB benchmark suite and the supported built-in vision models, along with code snippets and demo notebooks illustrating how users can run a benchmark experiment out-of-the-box.

## Benchmark task categorization and encoding

Each benchmark is encoded with a 5-character code that designates the source dataset, the echocardiography view and the downstream task. The structure of the benchmark code follows the layout below:

- Task code (2 characters)
- View code (2 characters)
- Dataset code (1 character)

The 1-character dataset code can be interpreted using the following table:

| Dataset | Dataset code |
| --- | --- |
| EchoNet | E |
| CAMUS | C |
| TMED | T |
| Unity | U |
The echocardiographic views are encoded as follows:
| View code | Echocardiographic view |
| --- | --- |
| An | Apical n-chamber |
| PL | Parasternal long axis |
| PS | Parasternal short axis |

Currently, ETAB includes 9 core tasks across the 4 task categories. All tasks and their corresponding 2-character codes are summarized in the table below. Tasks with strikethrough marks are still under implementation and will be included in the next release.

 
| Task code | Description | Datasets (Views) |
| --- | --- | --- |
| **🔴 Cardiac Structure Identification Tasks (Category: a)** | | |
| a0 | Segmenting the left ventricle (LV) | EchoNet (AP4CH), CAMUS (AP2CH and AP4CH) |
| a1 | Segmenting the left atrium (LA) | CAMUS (AP2CH and AP4CH) |
| a2 | Segmenting the myocardial wall (MY) | CAMUS (AP2CH and AP4CH) |
| **🔵 Cardiac Function Estimation Tasks (Category: b)** | | |
| b0 | Estimating LV ejection fraction | EchoNet (AP4CH), CAMUS (AP2CH and AP4CH) |
| b1 | Classifying end-systole and end-diastole frames | EchoNet (AP4CH), CAMUS (AP2CH and AP4CH) |
| b2 | Longitudinal strain estimation | Unity (AP4CH) |
| b3 | Interventricular septum thickness estimation | Unity (PLAX) |
| b4 | Posterior wall thickness estimation | Unity (PLAX) |
| **🟢 View Recognition Tasks (Category: c)** | | |
| c0 | Classifying apical 2- and 4-chamber views | CAMUS (AP2CH vs. AP4CH) |
| c1 | Classifying parasternal short and long axis views | TMED (PLAX vs. PSAX) |
| c2 | Classifying all apical and parasternal views | Unity (AP2CH vs. AP3CH vs. AP4CH vs. AP5CH vs. PLAX vs. PSAX) |
| **🟡 Clinical Prediction Tasks (Category: d)** | | |
| d0 | Diagnose cardiomyopathy | EchoNet (AP4CH), CAMUS (AP2CH and AP4CH) |
| d1 | Diagnose aortic stenosis | TMED (PSAX and PLAX) |
The benchmark codes are represented as strings whose characters encode the dataset, view and task as described above. As an example of how to interpret a benchmark code, consider the following string-valued variable:
benchmark_code = "a0-A4-E"

This code designates the benchmark task of segmenting the LV using apical 4-chamber echoes sampled from the EchoNet dataset.
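
As a minimal illustration of the encoding (a sketch for clarity only, not part of the ETAB API; the helper function and dictionaries below are hypothetical), a benchmark code can be split into its three components using the tables above:

```python
# Hypothetical helper, for illustration only; not part of the ETAB API.
def parse_benchmark_code(code):
    """Split a benchmark code such as "a0-A4-E" into its task, view and dataset parts."""
    task_code, view_code, dataset_code = code.split("-")

    categories = {"a": "cardiac structure identification",
                  "b": "cardiac function estimation",
                  "c": "view recognition",
                  "d": "clinical prediction"}
    datasets   = {"E": "EchoNet", "C": "CAMUS", "T": "TMED", "U": "Unity"}

    view = f"Apical {view_code[1]}-chamber" if view_code[0] == "A" else view_code

    return {"task category": categories[task_code[0]],
            "task number": task_code[1],
            "view": view,
            "dataset": datasets[dataset_code]}


print(parse_benchmark_code("a0-A4-E"))
# {'task category': 'cardiac structure identification', 'task number': '0',
#  'view': 'Apical 4-chamber', 'dataset': 'EchoNet'}
```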

## ETAB model zoo

The ETAB library provides a unified API for training a number of baseline models on all the benchmark tasks listed above. Each baseline model comprises a backbone representation and a task-specific head as illustrated below.

The backbone representation is a general-purpose representation of echocardiographic images (or clips) that is independent of the task, whereas the head changes based on the task. The backbone representations supported in ETAB fall into two categories: convolutional neural networks and vision transformers. All backbone representations available in ETAB are listed below.

| Backbone category | Available backbones | Reference |
| --- | --- | --- |
| Convolutional Neural Networks (CNN) | ResNet | [1] |
| | ResNeXt | [2] |
| | DenseNet | [3] |
| | Inception | [4] |
| | MobileNet | [5] |
| | ConvNeXt | [6] |
| Vision Transformers (ViT) | Mix Transformer encoders (MiT) | [7] |
| | Pyramid Vision Transformer (PVT) | [8] |
| | Multi-scale vision Transformer (ResT) | [9] |
| | PoolFormer | |
| | UniFormer | |
| | Dual Attention Vision Transformers (DaViT) | |

The available task-specific heads (for classification, regression and segmentation tasks) are listed below.

| Head category | Available task-specific heads | Reference |
| --- | --- | --- |
| Classification and regression heads (still image) | Standard linear probe | --- |
| Classification and regression heads (video clips) | RNN + Linear output layer | --- |
| | LSTM + Linear output layer | --- |
| Segmentation heads | U-Net | |
| | U-Net++ | |
| | MAnet | |
| | Linknet | |
| | PSPNet | |
| | DeepLabV3 | |
| | SegFormer | |
| | TopFormer | |

To display all available baseline models, you can print the output of the available_backbones() and available_heads() functions in the etab.baselines.models module as follows.

from etab.baselines.models import *

print(available_backbones())
print(available_heads())

### Video data vs. still images

Note that some benchmark tasks (e.g., estimation of the LV ejection fraction) are defined with respect to video clips, whereas other tasks and datasets are limited to still 2D images. In the current release of ETAB, we restrict the backbone representations to frame embeddings: the backbone is applied repeatedly to the individual frames of a clip, and the modeling of temporal correlations between the resulting embeddings is deferred to the head through variants of RNNs. By limiting the backbone representations to frame embeddings, we can evaluate the quality of a backbone representation by tuning the attached task-specific heads across all benchmark tasks above to obtain the ETAB score, as we discuss in the next Section.
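
To make this division of labor concrete, the sketch below (plain PyTorch, not the ETAB API; the class name, dimensions and GRU choice are illustrative assumptions) applies a frame-level backbone to every frame of a clip and lets a recurrent head aggregate the sequence of embeddings into a clip-level prediction:

```python
import torch.nn as nn


class FrameBackbonePlusRNNHead(nn.Module):
    """Illustrative sketch: a frame-level backbone followed by a recurrent head."""

    def __init__(self, backbone, embed_dim, hidden_dim=256, n_outputs=1):
        super().__init__()
        self.backbone = backbone                       # maps one frame to an embedding of size embed_dim
        self.rnn      = nn.GRU(embed_dim, hidden_dim, batch_first=True)
        self.out      = nn.Linear(hidden_dim, n_outputs)

    def forward(self, clips):                          # clips: (batch, time, channels, height, width)
        b, t = clips.shape[:2]
        frames     = clips.flatten(0, 1)               # fold time into the batch dimension
        embeddings = self.backbone(frames).view(b, t, -1)
        _, last_hidden = self.rnn(embeddings)          # temporal modeling happens only in the head
        return self.out(last_hidden[-1])               # e.g., a scalar such as the ejection fraction
```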

## Running a benchmark experiment out-of-the-box (demo notebook)

In what follows, we describe how users can run a benchmark experiment out-of-the-box using the ETAB API. We then show how an experiment can be run from the terminal using our built-in scripts.

### Composing a model and training on a benchmark task

The first step in running a benchmark experiment is to load the relevant dataset. Consider again the benchmark task "a0-A4-E". This task involves segmenting the LV using apical 4-chamber views from the EchoNet dataset. The dataset can be loaded as follows:

from etab.datasets import ETAB_dataset

echonet = ETAB_dataset(name="echonet",
                       target="LV_seg", 
                       view="A4",
                       video=False,
                       normalize=True,
                       frame_l=224,
                       frame_w=224,
                       clip_l=1)

echonet.load_data(n_clips=7000)

# batch_size is defined together with the other training parameters below
train_loader, valid_loader, test_loader = training_data_split(echonet.data, train_frac=0.6, val_frac=0.1, 
                                                              batch_size=batch_size, return_loaders=True)

We have covered the data loading and processing tools in the previous section. More details can be found in this demo notebook. The next step is to compose a baseline model by creating an instance of the ETABmodel class as follows.

from etab.baselines.models import ETABmodel

model  = ETABmodel(task="segmentation",
                   backbone="ResNet-50",
                   head="U-Net")

The model.backbone and model.head attributes are both PyTorch modules, and their hyper-parameters can be altered by modifying the attributes of model.backbone and model.head after instantiating the model. Here, we instantiate a standard segmentation model with a ResNet-50 backbone and a U-Net head, but the user can create alternative models using the options specified in the tables above. Now, to start training the instantiated model on task "a0-A4-E", we need to set the optimizer and training parameters as follows:

batch_size    = 32
learning_rate = 0.001
n_epoch       = 100
ckpt_dir      = "/directory for saving the trained model"

We can then train the model by invoking the .fit method in the ETABmodel class after passing the training and validation loaders along with the optimization and training parameters.

model.fit(train_loader, 
          valid_loader, 
          n_epoch=n_epoch,
          task_code="EA40", 
          learning_rate=learning_rate,
          ckpt_dir=ckpt_dir)

After the model is trained, we can inspect its predictions on samples from test data as follows:

inputs, ground_truths = next(iter(test_loader))
preds                 = model.predict(inputs.cuda())

index = 0    # set index to an integer to select a test sample

plot_segment(inputs[index, :, :, :], 
             preds[index, :, :], 
             overlay=True, color="r")

### Evaluating a model on test data

To evaluate the performance of the model on the testing sample, you can use the evaluate_model function in etab.utils.metrics as follows:

from etab.utils.metrics import *

dice_coeff = evaluate_model(model, test_loader, task_code="a0")

Because the task code is passed to this general-purpose evaluation function, it automatically selects the evaluation metric corresponding to the task. Since this is a segmentation task, the output is a Jaccard index/Dice coefficient; the function computes the AUC-ROC for classification tasks and the mean squared error for regression tasks.
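
The kind of metric dispatch described above can be sketched as follows; this is illustrative only, and the actual selection logic inside evaluate_model may differ (for instance, category b mixes regression and classification tasks):

```python
# Illustrative sketch of metric selection by task category; not ETAB's actual implementation.
def select_metric(task_code):
    category = task_code[0]
    if category == "a":                # cardiac structure identification -> segmentation
        return "dice"
    elif category in ("c", "d"):       # view recognition / clinical prediction -> classification
        return "auc_roc"
    else:                              # cardiac function estimation -> mostly regression
        return "mse"


print(select_metric("a0"))   # dice
```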

### Freezing the backbone and tuning the head

In the example above, we have trained a model by fully optimizing all its parameters for the task at hand. In many cases, we might be interested in only tuning the task-specific head and keeping the backbone representation frozen. We can do so by calling the freeze_backbone method in ETABmodel after the model instantiation command as follows:

model  = ETABmodel(task="segmentation",
                   backbone="ResNet-50",
                   head="U-Net")
                   
model.freeze_backbone()                   

As we will show in the next Section, when computing the ETAB score we are interested in evaluating a pre-trained representation; hence, we freeze the backbone model for all benchmark tasks, tune only the head, and evaluate the performance of the model on test data.
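
Under the hood, freezing a backbone typically amounts to disabling gradient updates for its parameters. The snippet below is a generic PyTorch sketch of this idea, not necessarily how ETAB implements freeze_backbone:

```python
# Generic PyTorch freezing sketch; ETAB's freeze_backbone may differ in its details.
def freeze_module(module):
    for param in module.parameters():
        param.requires_grad = False    # exclude these weights from gradient updates
    module.eval()                      # also fix batch-norm statistics, if any


freeze_module(model.backbone)          # only the parameters of model.head remain trainable
```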

To run the above experiments yourself, please refer to the following demo notebook.

## CLI for running a benchmark experiment from the terminal

You can also run any benchmark task directly from the terminal using the following command:

$ python run_benchmark.py --task "a0-A4-E" --backbone "ResNet-50" --head "U-Net" --freeze_backbone False \
                          --train_frac 0.6 --val_frac 0.1 --lr 0.001 --epochs 100 --batch 32  

To run a task adaptation benchmark, where the backbone representation is trained on a source task and then tuned on a target task, you can use the following command:

$ python run_benchmark.py --source_task "a0-A4-E" --target_task "a1-A2-C" --backbone "ResNet-50" --head "U-Net" \
                          --freeze_backbone False --train_frac 0.6 --val_frac 0.1 --lr 0.001 --epochs 100 --batch 32  

In the example above, the experiment proceeds by training a model to segment the LV using AP4CH views from the EchoNet dataset, and then tuning the resulting model to segment the LA using AP2CH views from the CAMUS dataset.

## References and acknowledgments

Our model API builds on the implementations of the following libraries and resources:

[1] https://poutyne.org/

[2] https://github.com/sithu31296/semantic-segmentation

[3] https://github.com/qubvel/segmentation_models.pytorch

[4] https://github.com/rwightman/pytorch-image-models