Skip to content

Commit

Permalink
ENH: Add support for using a fixed bin count
Browse files Browse the repository at this point in the history
In most literature on Radiomics to date, gray value discretization is done using either a fixed bin count or a fixed bin width. Although there is consensus on the need for discretization prior to texture-feature calculation, there is no definitive evidence favouring either the fixed bin count or fixed bin width method in a general application.
There is some literature available showing a better reproducibility using a fixed bin width for features extracted from PET imaging. On the other hand, in many studies a fixed bin count is used.
At PyRadiomics, we recommend using a fixed bin width, but implement a fixed bin count for completeness.

See [this FAQ topic](http://pyradiomics.readthedocs.io/en/latest/faq.html#radiomics_fixed_bin_width) for a more detailed explanation of our rationale for recommending usage of a fixed bin width.

Additionally, update the documentation to reflect the new voxel-based extraction functionality.
  • Loading branch information
JoostJM committed Jun 27, 2018
1 parent e6435e7 commit 885f981
Show file tree
Hide file tree
Showing 8 changed files with 115 additions and 29 deletions.
3 changes: 3 additions & 0 deletions docs/customization.rst
Original file line number Diff line number Diff line change
Expand Up @@ -277,6 +277,9 @@ Feature Class Level
*Image discretization*

- ``binWidth`` [25]: Float, > 0, size of the bins when making a histogram and for discretization of the image gray level.
- ``binCount`` [None]: integer, > 0, specifies the number of bins to create. The width of the bin is
then determined by the range in the ROI. No definitive evidence is available on which method of discretization is
superior, we advise a fixed bin width. See more :ref:`here <radiomics_fixed_bin_width>`.

*Forced 2D extraction*

Expand Down
61 changes: 42 additions & 19 deletions docs/faq.rst
Original file line number Diff line number Diff line change
Expand Up @@ -43,20 +43,39 @@ do this by providing a :ref:`parameter file <radiomics-parameter-file-label>`. I
`yaml structured <http://yaml.org/>`_ text file you can define your custom settings and which features and input image
types to enable.

Why is there no parameter to specify a fixed bin count?
#######################################################

PyRadiomics does not have the option for setting a fixed bin count, as a fixed bin count makes the values less
comparable, instead of more. This is because a fixed bin count means that the “meaning” of difference between gray
values is dependent on the range of gray values in the ROI. Take for example 2 images with 2 ROIs, with the range of
gray values in the first being {0-100} and in the second {0-10}. If you use a fixed bin count, the “meaning” of 1 gray
value difference is different (in the first it means 10 gray values different, in the second just 1). This means you are
looking at texture based on very different contrasts.
Therefore, PyRadiomics uses a fixed bin width (parameter ``binWidth``), which ensures texture feature values are
calculated using the same “contrast” between gray values [1]_. There are currently no specific guidelines as to what
constitutes an optimal bin width. We try to choose a bin width in such a way, that the resulting amount of bins is
somewhere between 30 and 130 bins. This allows for differing ranges of intensity in ROIs, while still keeping the
texture features informative (and comparable inter lesion!).
.. _radiomics_fixed_bin_width:

What about gray value discretization? Fixed bin width? Fixed bin count?
#######################################################################

Currently, although many studies favour a fixed bin count over a fixed bin width, there is no hard evidence favouring
either a fixed bin width or a fixed bin count in all cases.
Therefore PyRadiomics implements both the option for setting a fixed bin count (``binCount``) and a fixed bin width
(``binWidth``, default).

The reason the a fixed bin width has been chosen as the default parameter is based in part on studies in PET that show
a better reproducibility of features when implementing a fixed bin width [1]_.
Furthermore, our reasoning is best illustrated by the following example:
Given an input with 2 images with 2 ROIs, with the range of gray values in the first being {0-100} and in the second
{0-10}. If you use a fixed bin count, the “meaning” of 1 (discretized) gray value difference is different (in the first
it means 10 gray values different, in the second just 1). This means you are looking at texture based on very different
contrasts.

This example does assume that the original gray values mean the same thing in both images, and in case of images with
definite/absolute gray values (e.g. HU in CT, SUV in PET imaging), this holds true. However, in case of
arbitrary/relative gray values (e.g. signal intensity in MR), this is not necessarily the case.
In this latter case, we still recommend a fixed bin width, but with additional pre-processing (e.g. normalization) to
ensure better comparability of gray values. Use of a fixed bin count would be possible here, but then the calculated
features may still be very influenced by the range of gray values seen in the image, as well as noise caused by the fact
that the original gray values are less comparable. Moreover, regardless of type of gray value discretization, steps must
be taken to ensure good comparability, as the first order features largely use the original gray values
(without discretization).

Finally, there is the issue of what value to use for the width of the bin. Again, there are currently no specific
guidelines from literature as to what constitutes an optimal bin width. We try to choose a bin width in such a way, that
the resulting amount of bins is somewhere between 30 and 130 bins, which shows good reproducibility and performance in
literature for a fixed bin count [2]_. This allows for differing ranges of intensity in
ROIs, while still keeping the texture features informative (and comparable inter lesion!).

Error loading parameter file
############################
Expand Down Expand Up @@ -99,6 +118,9 @@ the optimal settings may differ between modalities. There are some constraints o
2. 3D or slice: Although PyRadiomics supports single slice (2D) feature extraction, the input is still required to have
3 dimensions (where in case of 2D, a dimension may be of size 1).

If you want to use 2D, color and/or 4D volumes, additional preprocessing is required to convert the images.
See `this thread <https://groups.google.com/forum/#!topic/pyradiomics/QLdD_qEw3PY>`_ for some tips and tricks on how to achieve this.

Can I use DICOM-RT struct for the input mask?
#############################################

Expand All @@ -111,7 +133,7 @@ Usage
-----

How should the input file for ``pyradiomics`` in batch-mode be structured?
#################################################################
##########################################################################

Currently, the batch input file for ``pyradiomics`` is a csv file specifying the combinations of images and masks for
which to extract features. It must contain a header line, where at least header "Image" and "Mask" should be specified
Expand Down Expand Up @@ -143,7 +165,7 @@ logger can be accessed via ``radiomics.logger``. See also :ref:`here <radiomics-
included in the repository on how to set up logging.

I'm able to extract features, but many are NaN, 0 or 1. What happened?
#####################################################################
######################################################################

It is possible that the segmentation was too small to extract a valid texture. Check the value of ``VoxelNum``, which is
part of the additional information in the output. This is the number of voxels in the ROI after pre processing and
Expand All @@ -160,9 +182,10 @@ differ significantly.
Does PyRadiomics support voxel-wise feature extraction?
#######################################################

No, currently PyRadiomics only supports lesion-based feature extraction. However, voxel-based feature extraction may be
a good addition in the future. If you have thoughts or ideas on how to implement this, we'd welcome your input on the
`pyradiomics email list <https://groups.google.com/forum/#!forum/pyradiomics>`_.
Yes, as of version 2.0, voxelwise calculation has been implemented. However, as this entails the calculations of
features for each voxel, performing a voxelwise extraction is much slower and as the output consists of a feature map
for each feature, output size is also much larger. See more on enabling a voxel-based extraction in the
:ref:`usage section<radiomics-usage-label>`.

Miscellaneous
-------------
Expand Down
35 changes: 33 additions & 2 deletions docs/usage.rst
Original file line number Diff line number Diff line change
Expand Up @@ -114,6 +114,37 @@ Interactive Use

* See the :ref:`feature extractor class<radiomics-featureextractor-label>` for more information on using this core class.

----------------------
Voxel-based extraction
----------------------

As of version 2.0, pyradiomics also implements a voxel-based extraction.
Currently, this is only available in the interactive mode, and is as simple as telling the feature extractor to
extract a parameter map::

from radiomics import featureextractor, getTestCase
import six
import sys, os

import SimpleITK as sitk

dataDir = '/path/to/pyradiomics'

imageName, maskName = getTestCase('brain1', dataDir)
params = os.path.join(dataDir, "examples", "exampleSettings", "exampleVoxel.yaml")

extractor = featureextractor.RadiomicsFeaturesExtractor(params)

result = extractor.execute(imageName, maskName, voxelBased=True)

for key, val in six.iteritems(result):
sitk.WriteImage(val, key + 'nrrd')

Important to know here is that this extraction takes longer (features have to be calculated for each voxel), and that
the output is a SimpleITK image of the parameter map in stead of a float value *for each feature*.

Be sure to also check out the ``helloVoxel.py`` example available in the repository (folder ``examples``).

------------------------
PyRadiomics in 3D Slicer
------------------------
Expand All @@ -126,7 +157,7 @@ Using feature classes directly
------------------------------

* This represents an example where feature classes are used directly, circumventing checks and preprocessing done by
the radiomics feature extractor class, and is not intended as standard use example.
the radiomics feature extractor class, and is not intended as standard use.

* (LINUX) To run from source code, add pyradiomics to the environment variable PYTHONPATH (Not necessary when
PyRadiomics is installed):
Expand Down Expand Up @@ -196,4 +227,4 @@ handler to the pyradiomics logger::

To store a log file when running pyradiomics from the commandline, specify a file location in the optional
``--log-file`` argument. The amount of logging that is stored is controlled by the ``--log-level`` argument
(default level INFO and up).
(default level WARNING and up).
6 changes: 5 additions & 1 deletion radiomics/base.py
Original file line number Diff line number Diff line change
Expand Up @@ -66,6 +66,9 @@ def __init__(self, inputImage, inputMask, **kwargs):
self.settings = kwargs

self.binWidth = kwargs.get('binWidth', 25)
# Although binCount is available, we advise use of binWidth!
# See documentation/FAQ/Input-Customization for more details.
self.binCount = kwargs.get('binCount', None)
self.label = kwargs.get('label', 1)
self.voxelBased = kwargs.get('voxelBased', False)
self.initValue = kwargs.get('initValue', 0)
Expand Down Expand Up @@ -176,7 +179,8 @@ def _initCalculation(self):
def _applyBinning(self):
self.matrix, self.binEdges = imageoperations.binImage(self.binWidth,
self.imageArray,
self.maskArray)
self.maskArray,
self.settings.get('binCount', None))
self.coefficients['grayLevels'] = numpy.unique(self.matrix[self.maskArray])
self.coefficients['Ng'] = int(numpy.max(self.coefficients['grayLevels'])) # max gray level in the ROI

Expand Down
5 changes: 5 additions & 0 deletions radiomics/featureextractor.py
Original file line number Diff line number Diff line change
Expand Up @@ -368,6 +368,11 @@ def execute(self, imageFilepath, maskFilepath, label=None, voxelBased=False):
self.logger.debug('Enabled features: %s', self._enabledFeatures)
self.logger.debug('Current settings: %s', self.settings)

if self.settings.get('binCount', None) is not None:
self.logger.warning('Fixed bin Count enabled! However, we recommend using a fixed bin Width. See '
'http://pyradiomics.readthedocs.io/en/latest/faq.html#radiomics-fixed-bin-width for more '
'details')

# 1. Load the image and mask
featureVector = collections.OrderedDict()
image, mask = self.loadImage(imageFilepath, maskFilepath)
Expand Down
18 changes: 14 additions & 4 deletions radiomics/firstorder.py
Original file line number Diff line number Diff line change
Expand Up @@ -37,6 +37,7 @@ def __init__(self, inputImage, inputMask, **kwargs):
def _initCalculation(self):
super(RadiomicsFirstOrder, self)._initSegmentBasedCalculation()
self.targetVoxelArray = self.imageArray[self.labelledVoxelCoordinates].astype('float')
self.discretizedTargetVoxelArray = None # Lazy instantiation

self.logger.debug('First order feature class initialized')

Expand All @@ -52,6 +53,17 @@ def _moment(self, a, moment=1, axis=0):
s = numpy.power((a - mn), moment)
return numpy.mean(s, axis)

def _getDiscretizedTargetVoxelArray(self):
if self.discretizedTargetVoxelArray is None:
if self.binCount is not None:
binEdges = self.binCount
else:
binEdges = imageoperations.getBinEdges(self.binWidth, self.targetVoxelArray)

self.discretizedTargetVoxelArray = numpy.histogram(self.targetVoxelArray, binEdges)[0]

return self.discretizedTargetVoxelArray

def getEnergyFeatureValue(self):
r"""
**1. Energy**
Expand Down Expand Up @@ -116,8 +128,7 @@ def getEntropyFeatureValue(self):
"""

eps = numpy.spacing(1)
binEdges = imageoperations.getBinEdges(self.binWidth, self.targetVoxelArray)
bins = numpy.histogram(self.targetVoxelArray, binEdges)[0]
bins = self._getDiscretizedTargetVoxelArray()

sumBins = bins.sum()
if sumBins == 0: # No segmented voxels
Expand Down Expand Up @@ -384,8 +395,7 @@ def getUniformityFeatureValue(self):
"""

eps = numpy.spacing(1)
binEdges = imageoperations.getBinEdges(self.binWidth, self.targetVoxelArray)
bins = numpy.histogram(self.targetVoxelArray, binEdges)[0]
bins = self._getDiscretizedTargetVoxelArray()
sumBins = bins.sum()

if sumBins == 0: # No segmented voxels
Expand Down
12 changes: 9 additions & 3 deletions radiomics/imageoperations.py
Original file line number Diff line number Diff line change
Expand Up @@ -52,7 +52,7 @@ def getBinEdges(binwidth, parameterValues):
return binEdges # numpy.histogram(parameterValues, bins=binedges)


def binImage(binwidth, parameterMatrix, parameterMatrixCoordinates=None):
def binImage(binwidth, parameterMatrix, parameterMatrixCoordinates=None, bincount=None):
r"""
Discretizes the parameterMatrix (matrix representation of the gray levels in the ROI) using the binEdges calculated
using :py:func:`getBinEdges`. Only voxels defined by parameterMatrixCoordinates (defining the segmentation) are used
Expand Down Expand Up @@ -84,10 +84,16 @@ def binImage(binwidth, parameterMatrix, parameterMatrixCoordinates=None):
logger.debug('Discretizing gray levels inside ROI')

if parameterMatrixCoordinates is None:
binEdges = getBinEdges(binwidth, parameterMatrix[:])
if bincount is not None:
binEdges = numpy.histogram(parameterMatrix[:], bincount)[1]
else:
binEdges = getBinEdges(binwidth, parameterMatrix[:])
parameterMatrix = numpy.digitize(parameterMatrix, binEdges)
else:
binEdges = getBinEdges(binwidth, parameterMatrix[parameterMatrixCoordinates])
if bincount is not None:
binEdges = numpy.histogram(parameterMatrix[parameterMatrixCoordinates], bincount)
else:
binEdges = getBinEdges(binwidth, parameterMatrix[parameterMatrixCoordinates])
parameterMatrix[parameterMatrixCoordinates] = numpy.digitize(parameterMatrix[parameterMatrixCoordinates], binEdges)

return parameterMatrix, binEdges
Expand Down
4 changes: 4 additions & 0 deletions radiomics/schemas/paramSchema.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -29,6 +29,10 @@ mapping:
type: float
range:
min-ex: 0
binCount:
type: int
range:
min-ex: 0
normalize:
type: bool
normalizeScale:
Expand Down

0 comments on commit 885f981

Please sign in to comment.