When using SimpleWORC, or WORC with similar simple configuration settings, you can already benefit from the main functionality of WORC, i.e. the automatic algorithm optimization. However, several additional functionalities are provided, which are discussed in this chapter.
For a description of the radiomics features, please see the radiomics features chapter <features-chapter>. For a description of the data mining components, see the data mining chapter <datamining-chapter>. All other components are discussed here.
For a comprehensive overview of all functions and parameters, please look at the config chapter <config-chapter>.
Preprocessing of the image, and accordingly the mask, is done in the WORC.processing.preprocessing and WORC.processing.segmentix scripts, respectively. Options for preprocessing the image include, in the following order:
1. N4 bias field correction, see also https://simpleitk.readthedocs.io/en/master/link_N4BiasFieldCorrection_docs.html.
2. Checking and optionally correcting the spacing if it's 1x1x1 and the DICOM metadata says otherwise.
3. Clipping of the image intensities above and below a certain value.
4. Normalization, see WORC.processing.preprocessing.normalize_image for all options.
5. Transposing the image to another "main" orientation, e.g. axial.
6. Resampling the image to a different spacing.
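As a rough illustration of the clipping and normalization steps listed above, the sketch below clips intensities and then applies z-score normalization with NumPy. It is not WORC's implementation: the function name `clip_and_normalize`, the clipping bounds, and the choice of z-scoring are assumptions made for this example. See WORC.processing.preprocessing.normalize_image for the options WORC actually offers.

```python
import numpy as np


def clip_and_normalize(image, lower, upper):
    """Clip intensities to [lower, upper], then z-score the result.

    Hypothetical helper for illustration only; not part of WORC.
    """
    image = np.asarray(image, dtype=float)
    clipped = np.clip(image, lower, upper)
    std = clipped.std()
    if std == 0:
        # A constant image cannot be scaled, so it is only centered.
        return clipped - clipped.mean()
    return (clipped - clipped.mean()) / std


# Toy 2D "image" with one extreme outlier intensity.
image = np.array([[0.0, 10.0], [20.0, 1000.0]])
normalized = clip_and_normalize(image, lower=0.0, upper=30.0)
```

Clipping before normalizing keeps the single outlier from dominating the mean and standard deviation used for scaling.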
Options for preprocessing the segmentation include:
1. Hole filling. Many feature computations cannot deal with holes.
2. Removing small objects. Many feature computations cannot deal with multiple objects in a single segmentation.
3. Extracting the largest blob. Many feature computations cannot deal with multiple objects in a single segmentation.
4. Instead of using the full segmentation, extracting a ring around the border of the segmentation to compute the features on. The ring captures both the inner and outer border.
5. Dilating the contour.
6. Masking the contour with another contour.
7. If the image and segmentation are assumed to have the same metadata, copying the metadata of the image to the segmentation.
8. Checking and optionally correcting the spacing if it's 1x1x1 and the DICOM metadata says otherwise. Same as image preprocessing step 2.
9. Transposing the segmentation to another "main" orientation, e.g. axial. Same as image preprocessing step 5.
10. Resampling the segmentation to a different spacing. Same as image preprocessing step 6.
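To make the ring option more concrete, the sketch below builds a ring from a binary 2D mask as "dilated mask minus eroded mask", so that it covers both the inner and the outer side of the border. This is an illustration under stated assumptions, not the code in WORC.processing.segmentix: the function names, the 4-connected neighbourhood, and the `width` parameter are hypothetical, and morphology routines from a library such as scipy.ndimage would normally be used instead of hand-written shifts.

```python
import numpy as np


def _neighbour_views(mask):
    """Return the mask and its four axis-aligned neighbours (zero-padded)."""
    padded = np.pad(mask, 1, mode="constant", constant_values=False)
    center = padded[1:-1, 1:-1]
    up = padded[:-2, 1:-1]
    down = padded[2:, 1:-1]
    left = padded[1:-1, :-2]
    right = padded[1:-1, 2:]
    return center, up, down, left, right


def dilate(mask):
    """Grow the mask by one pixel (4-connectivity)."""
    center, up, down, left, right = _neighbour_views(mask)
    return center | up | down | left | right


def erode(mask):
    """Shrink the mask by one pixel (4-connectivity)."""
    center, up, down, left, right = _neighbour_views(mask)
    return center & up & down & left & right


def extract_ring(mask, width=1):
    """Ring of `width` pixels on both sides of the mask border."""
    mask = np.asarray(mask, dtype=bool)
    outer, inner = mask, mask
    for _ in range(width):
        outer = dilate(outer)
        inner = erode(inner)
    return outer & ~inner


# Toy segmentation: a 3x3 square inside a 7x7 image.
mask = np.zeros((7, 7), dtype=bool)
mask[2:5, 2:5] = True
ring = extract_ring(mask, width=1)
```

In this toy case the ring excludes the central pixel of the square and includes the layer of pixels directly outside it.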
The default method for feature scaling in WORC is a robust version of z-scoring. Additional options include:
- Regular z-scoring
- MinMax scaling, i.e., scaling to a range between 0 and 1
- Robust scaling, i.e., centering on the median and scaling by the interquartile range (IQR)
- A combination of z-scoring with a logarithmic transform and a correction term to better cope with outliers and non-normally distributed features [CIT1].
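The differences between these scaling options are easiest to see on a feature with an outlier. The sketch below implements them with NumPy for a single feature vector. The function names are hypothetical, and `robust_z_score` is only one plausible reading of "a robust version of z-scoring": it assumes the mean and standard deviation are estimated from values between the 5th and 95th percentile, which may differ from what WORC does internally.

```python
import numpy as np


def z_score(x):
    """Regular z-scoring: zero mean, unit standard deviation."""
    return (x - x.mean()) / x.std()


def min_max(x):
    """Scale to the range [0, 1]."""
    return (x - x.min()) / (x.max() - x.min())


def median_iqr(x):
    """Center on the median and scale by the interquartile range."""
    q1, q3 = np.percentile(x, [25, 75])
    return (x - np.median(x)) / (q3 - q1)


def robust_z_score(x, lower=5, upper=95):
    """z-score with mean/std estimated without the extreme values.

    The 5th-95th percentile range is an assumption for this sketch.
    """
    low, high = np.percentile(x, [lower, upper])
    core = x[(x >= low) & (x <= high)]
    return (x - core.mean()) / core.std()


# One feature measured on six samples; the last value is an outlier.
feature = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 100.0])
scaled = {
    "z_score": z_score(feature),
    "min_max": min_max(feature),
    "median_iqr": median_iqr(feature),
    "robust_z_score": robust_z_score(feature),
}
```

With regular z-scoring the outlier inflates the standard deviation and compresses the remaining values, whereas the two robust variants keep the bulk of the values on a comparable scale.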
When using multiple modalities or sequences with a segmentation on only a single image, image registration is applied to spatially align all sequences and warp the segmentation to the other images through elastix [CIT2]. Usage of elastix is automatically included in WORC when only a single segmentation and multiple modalities are supplied. The image on which the segmentation is provided is used as the moving image and the others as the fixed images, as the segmentation will be moved from the segmented image to the others.
Registration is by default performed using a rigid transformation model, with mutual information as the similarity metric and the adaptive stochastic gradient descent optimizer. Manual overrides of these defaults are included in the WORC configuration.
When using elastix, parameter files have to be provided in the network.Elastix_Para object, e.g.
network.Elastix_Para = [['Parameters_Rigid.txt', 'Parameters_BSpline.txt']]
The outer list defines the parameter files used per modality. If only one element is provided, the same will be applied for all modalities. Each element of the list should be a list of its own, containing the filenames of the elastix parameter files. In the example, we provided two files, resulting in first a rigid registration being performed, followed by a B-spline registration. Examples of elastix parameter files can be found at https://github.com/SuperElastix/ElastixModelZoo/tree/master/models/default
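The nesting described above can be illustrated with plain Python lists. In the sketch below, `expand_parameter_files` is a hypothetical helper, not a WORC function: it only mimics the stated rule that a single inner list is reused for all modalities, while otherwise one inner list per modality is expected. The filenames are the ones from the example above.

```python
def expand_parameter_files(elastix_para, n_modalities):
    """Return one list of parameter files per modality.

    Hypothetical helper for illustration; WORC handles this internally.
    """
    if len(elastix_para) == 1:
        # A single entry is shared by all modalities.
        return [list(elastix_para[0]) for _ in range(n_modalities)]
    if len(elastix_para) != n_modalities:
        raise ValueError(
            "Provide either one list of parameter files or one per modality."
        )
    return [list(files) for files in elastix_para]


# Same two-stage registration (rigid, then B-spline) for every modality.
shared = [['Parameters_Rigid.txt', 'Parameters_BSpline.txt']]

# Different registration per modality: rigid only for the first,
# rigid followed by B-spline for the second.
per_modality = [
    ['Parameters_Rigid.txt'],
    ['Parameters_Rigid.txt', 'Parameters_BSpline.txt'],
]

expanded_shared = expand_parameter_files(shared, n_modalities=3)
expanded_per_modality = expand_parameter_files(per_modality, n_modalities=2)
```

Within each inner list, the order of the filenames is the order in which the registrations are applied.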
Radiomics studies commonly include multicenter data, resulting in heterogeneity in the acquisition protocols. As radiomics features are generally sensitive to these variations, this limits their repeatability and reproducibility. To compensate for the differences in acquisition, feature harmonization techniques may be used, of which ComBat is one of the most frequently used. In ComBat, feature distributions are harmonized for variations in the imaging acquisition, e.g. due to differences in hospitals, manufacturers, or acquisition parameters. The dataset is divided into groups (batches) based on these differences, and a correction for the error caused by these differences is estimated using empirical Bayes.
ComBat is included in WORC and can be turned on in the configuration, including options to use empirical Bayes or not, a parametric or non-parametric approach, and a moderation variable.
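The core idea of harmonization, removing batch-specific shifts and scalings from each feature, can be sketched without the empirical Bayes step. The block below is a deliberately simplified location-and-scale adjustment. It is neither ComBat itself nor the wrapper shipped with WORC: it standardizes each feature within each batch and maps it back to the pooled mean and standard deviation. ComBat additionally shrinks the per-batch estimates using empirical Bayes and can preserve the effect of a moderation variable. The function name and the toy data are assumptions.

```python
import numpy as np


def harmonize_location_scale(features, batches):
    """Per-batch standardization, rescaled to the pooled statistics.

    Simplified stand-in for illustration; no empirical Bayes shrinkage
    and no moderation variable, unlike ComBat.
    """
    features = np.asarray(features, dtype=float)
    batches = np.asarray(batches)
    pooled_mean = features.mean(axis=0)
    pooled_std = features.std(axis=0)

    harmonized = np.empty_like(features)
    for batch in np.unique(batches):
        rows = batches == batch
        batch_mean = features[rows].mean(axis=0)
        batch_std = features[rows].std(axis=0)
        batch_std[batch_std == 0] = 1.0  # avoid division by zero
        standardized = (features[rows] - batch_mean) / batch_std
        harmonized[rows] = standardized * pooled_std + pooled_mean
    return harmonized


# One feature, six samples from two hospitals with a systematic offset.
features = np.array([[1.0], [2.0], [3.0], [11.0], [12.0], [13.0]])
batches = np.array(["A", "A", "A", "B", "B", "B"])
harmonized = harmonize_location_scale(features, batches)
```

After the adjustment both hospitals have the same mean and spread for this feature, so the offset between them no longer separates the samples.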
ComBat feature harmonization is embedded in WORC. A wrapper around the original ComBat code, compatible with the other tools provided by WORC, is included in the WORC installation.
When using ComBat, configure the following:
- Set config['General']['ComBat'] to 'True'.
- To change the ComBat parameters (i.e. which batch and moderation variable to use), change the relevant config fields, see the Config chapter <config-chapter>.
- WORC extracts the batch and moderation variables from the label file which you also use to give WORC the actual label you want to predict. The same format therefore applies, see the User manual <usermanual-chapter> for more details.
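As a minimal sketch of the first setting, the snippet below uses a standard-library `configparser.ConfigParser` as a stand-in for the WORC configuration object. That stand-in is an assumption made so the example runs on its own; in a real workflow the configuration comes from WORC. Only the `config['General']['ComBat']` field named above is used; the names of the batch and moderation fields are documented in the config chapter and are not reproduced here.

```python
import configparser

# Stand-in for a WORC configuration object (assumption for this sketch).
config = configparser.ConfigParser()
config['General'] = {}

# Configuration values are strings, hence 'True' rather than True.
config['General']['ComBat'] = 'True'

combat_enabled = config['General'].getboolean('ComBat')
```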
Note
In line with current literature, ComBat is applied once on the full dataset straight after the feature extraction, thus before the actual hyperoptimization. Hence, to avoid serious overfitting, we advise you to NEVER use the variable you are trying to predict as the moderation variable.
While WORC was primarily designed for binary classification, as also demonstrated in the main manuscript, various other types of machine learning workflows have been included as well.
In multilabel classification, several mutually exclusive classes are predicted at the same time. This is a special form of multiclass classification, in which the classes do not have to be mutually exclusive. When using multilabel classification in WORC, the only difference with binary classification in the workflows is in the machine learning component. Methods in the other components, e.g. feature selection and resampling, that do not support multiclass classification are performed per class in a one-vs-rest approach. Some of the binary classifiers naturally support multilabel classification (i.e., random forest, AdaBoost, and extreme gradient boosting) and are thus used as usual. Others only support binary classification (i.e., LDA, QDA, Naive Bayes, SVM, logistic regression), and are therefore also performed per class in a one-vs-rest approach and combined in a single multilabel model. In the evaluation, the same metrics as in the binary classification are evaluated per class. Additionally, the multiclass AUC [CIT3] and multiclass BCR are computed.
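The one-vs-rest approach can be sketched in two steps: turning the mutually exclusive class labels into one binary target per class, and combining the per-class outputs into a single prediction. The helper names, class names, and scores below are hypothetical, and combining by taking the highest score is an assumption; the text above does not specify how WORC combines the per-class models.

```python
import numpy as np


def one_vs_rest_targets(labels):
    """Return sorted class names and a (n_samples, n_classes) 0/1 matrix."""
    classes = sorted(set(labels))
    targets = np.array(
        [[int(label == name) for name in classes] for label in labels]
    )
    return classes, targets


def combine_scores(classes, scores):
    """Per sample, pick the class whose binary model scored highest."""
    return [classes[index] for index in np.argmax(scores, axis=1)]


labels = ["benign", "malignant", "borderline", "benign"]
classes, targets = one_vs_rest_targets(labels)

# Made-up scores from three binary models, one column per class in the
# order of `classes`; a real workflow would obtain these from fitted models.
scores = np.array([
    [0.8, 0.1, 0.1],
    [0.2, 0.3, 0.5],
    [0.1, 0.7, 0.2],
    [0.6, 0.3, 0.1],
])
predicted = combine_scores(classes, scores)
```

Each column of `targets` defines one binary problem ("this class versus all others"), which is what allows binary-only methods to be reused per class.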
In regression, a continuous label is predicted. As there are no classes, all class-based feature and sample preprocessing methods (RELIEF, univariate testing, and all resampling methods) cannot be used. In the machine learning component, WORC includes the following regressors:
- linear regression;
- support vector machines;
- random forest;
- elastic net;
- LASSO;
- ridge regression;
- AdaBoost;
- extreme gradient boosting (XGBoost).
The optimization is by default based on the R2-score. Performance metrics computed are the R2-score, mean squared error, intraclass correlation coefficient (ICC), Pearson coefficient and p-value, and Spearman coefficient and p-value.
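For reference, several of these regression metrics can be written out in a few lines of NumPy. The sketch below covers the R2-score, the mean squared error, and the Pearson and Spearman coefficients; p-values and the ICC are left out. It is not WORC's evaluation code, the function names and toy values are assumptions, and in practice library routines such as scipy.stats.pearsonr and scipy.stats.spearmanr would return both coefficient and p-value.

```python
import numpy as np


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


def mean_squared_error(y_true, y_pred):
    """Average squared difference between truth and prediction."""
    return np.mean((y_true - y_pred) ** 2)


def pearson(a, b):
    """Pearson correlation coefficient."""
    return np.corrcoef(a, b)[0, 1]


def spearman(a, b):
    """Pearson correlation of the ranks; assumes no tied values."""
    def ranks(values):
        return np.argsort(np.argsort(values))

    return pearson(ranks(a), ranks(b))


# Toy ground truth and predictions.
y_true = np.array([1.0, 2.0, 3.0, 4.0])
y_pred = np.array([1.1, 1.9, 3.2, 3.8])

metrics = {
    "r2": r2_score(y_true, y_pred),
    "mse": mean_squared_error(y_true, y_pred),
    "pearson": pearson(y_true, y_pred),
    "spearman": spearman(y_true, y_pred),
}
```

The Spearman coefficient only depends on the ordering of the predictions, so it reaches 1 here even though the predictions are not exact.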
[CIT1] Chen, Jianan, et al. AMINN: Autoencoder-based Multiple Instance Neural Network for Outcome Prediction of Multifocal Liver Metastases. arXiv preprint arXiv:2012.06875 (2020).
[CIT2] Klein, Stefan, et al. Elastix: a toolbox for intensity-based medical image registration. IEEE Transactions on Medical Imaging 29.1 (2009): 196-205.
[CIT3] Hand, David J., and Robert J. Till. A simple generalisation of the area under the ROC curve for multiple class classification problems. Machine Learning 45.2 (2001): 171-186.