ENH: Add support for using a fixed bin count

In most literature on Radiomics to date, gray value discretization is done using either a fixed bin count or a fixed bin width. Although there is consensus on the need for discretization prior to texture-feature calculation, there is no definitive evidence favouring either the fixed bin count or fixed bin width method in a general application. There is some literature available showing a better reproducibility using a fixed bin width for features extracted from PET imaging. On the other hand, in many studies a fixed bin count is used. At PyRadiomics, we recommend using a fixed bin width, but implement a fixed bin count for completeness. See [this FAQ topic](http://pyradiomics.readthedocs.io/en/latest/faq.html#radiomics_fixed_bin_width) for a more detailed explanation of our rationale for recommending usage of a fixed bin width. Additionally, update the documentation to reflect the new voxel-based extraction functionality.
AIM-Harvard · Jun 27, 2018 · 885f981 · 885f981
1 parent e6435e7
commit 885f981
Show file tree

Hide file tree

Showing 8 changed files with 115 additions and 29 deletions.
diff --git a/docs/customization.rst b/docs/customization.rst
@@ -277,6 +277,9 @@ Feature Class Level
 *Image discretization*
 
 - ``binWidth`` [25]: Float, > 0, size of the bins when making a histogram and for discretization of the image gray level.
+- ``binCount`` [None]: integer, > 0, specifies the number of bins to create. The width of the bin is
+  then determined by the range in the ROI. No definitive evidence is available on which method of discretization is
+  superior, we advise a fixed bin width. See more :ref:`here <radiomics_fixed_bin_width>`.
 
 *Forced 2D extraction*
 

diff --git a/docs/faq.rst b/docs/faq.rst
@@ -43,20 +43,39 @@ do this by providing a :ref:`parameter file <radiomics-parameter-file-label>`. I
 `yaml structured <http://yaml.org/>`_ text file you can define your custom settings and which features and input image
 types to enable.
 
-Why is there no parameter to specify a fixed bin count?
-#######################################################
-
-PyRadiomics does not have the option for setting a fixed bin count, as a fixed bin count makes the values less
-comparable, instead of more. This is because a fixed bin count means that the “meaning” of difference between gray
-values is dependent on the range of gray values in the ROI. Take for example 2 images with 2 ROIs, with the range of
-gray values in the first being {0-100} and in the second {0-10}. If you use a fixed bin count, the “meaning” of 1 gray
-value difference is different (in the first it means 10 gray values different, in the second just 1). This means you are
-looking at texture based on very different contrasts.
-Therefore, PyRadiomics uses a fixed bin width (parameter ``binWidth``), which ensures texture feature values are
-calculated using the same “contrast” between gray values [1]_. There are currently no specific guidelines as to what
-constitutes an optimal bin width. We try to choose a bin width in such a way, that the resulting amount of bins is
-somewhere between 30 and 130 bins. This allows for differing ranges of intensity in ROIs, while still keeping the
-texture features informative (and comparable inter lesion!).
+.. _radiomics_fixed_bin_width:
+
+What about gray value discretization? Fixed bin width? Fixed bin count?
+#######################################################################
+
+Currently, although many studies favour a fixed bin count over a fixed bin width, there is no hard evidence favouring
+either a fixed bin width or a fixed bin count in all cases.
+Therefore PyRadiomics implements both the option for setting a fixed bin count (``binCount``) and a fixed bin width
+(``binWidth``, default).
+
+The reason the a fixed bin width has been chosen as the default parameter is based in part on studies in PET that show
+a better reproducibility of features when implementing a fixed bin width [1]_.
+Furthermore, our reasoning is best illustrated by the following example:
+Given an input with 2 images with 2 ROIs, with the range of gray values in the first being {0-100} and in the second
+{0-10}. If you use a fixed bin count, the “meaning” of 1 (discretized) gray value difference is different (in the first
+it means 10 gray values different, in the second just 1). This means you are looking at texture based on very different
+contrasts.
+
+This example does assume that the original gray values mean the same thing in both images, and in case of images with
+definite/absolute gray values (e.g. HU in CT, SUV in PET imaging), this holds true. However, in case of
+arbitrary/relative gray values (e.g. signal intensity in MR), this is not necessarily the case.
+In this latter case, we still recommend a fixed bin width, but with additional pre-processing (e.g. normalization) to
+ensure better comparability of gray values. Use of a fixed bin count would be possible here, but then the calculated
+features may still be very influenced by the range of gray values seen in the image, as well as noise caused by the fact
+that the original gray values are less comparable. Moreover, regardless of type of gray value discretization, steps must
+be taken to ensure good comparability, as the first order features largely use the original gray values
+(without discretization).
+
+Finally, there is the issue of what value to use for the width of the bin. Again, there are currently no specific
+guidelines from literature as to what constitutes an optimal bin width. We try to choose a bin width in such a way, that
+the resulting amount of bins is somewhere between 30 and 130 bins, which shows good reproducibility and performance in
+literature for a fixed bin count [2]_. This allows for differing ranges of intensity in
+ROIs, while still keeping the texture features informative (and comparable inter lesion!).
 
 Error loading parameter file
 ############################
@@ -99,6 +118,9 @@ the optimal settings may differ between modalities. There are some constraints o
 2. 3D or slice: Although PyRadiomics supports single slice (2D) feature extraction, the input is still required to have
    3 dimensions (where in case of 2D, a dimension may be of size 1).
 
+If you want to use 2D, color and/or 4D volumes, additional preprocessing is required to convert the images.
+See `this thread <https://groups.google.com/forum/#!topic/pyradiomics/QLdD_qEw3PY>`_ for some tips and tricks on how to achieve this.
+
 Can I use DICOM-RT struct for the input mask?
 #############################################
 
@@ -111,7 +133,7 @@ Usage
 -----
 
 How should the input file for ``pyradiomics`` in batch-mode be structured?
-#################################################################
+##########################################################################
 
 Currently, the batch input file for ``pyradiomics`` is a csv file specifying the combinations of images and masks for
 which to extract features. It must contain a header line, where at least header "Image" and "Mask" should be specified
@@ -143,7 +165,7 @@ logger can be accessed via ``radiomics.logger``. See also :ref:`here <radiomics-
 included in the repository on how to set up logging.
 
 I'm able to extract features, but many are NaN, 0 or 1. What happened?
-#####################################################################
+######################################################################
 
 It is possible that the segmentation was too small to extract a valid texture. Check the value of ``VoxelNum``, which is
 part of the additional information in the output. This is the number of voxels in the ROI after pre processing and
@@ -160,9 +182,10 @@ differ significantly.
 Does PyRadiomics support voxel-wise feature extraction?
 #######################################################
 
-No, currently PyRadiomics only supports lesion-based feature extraction. However, voxel-based feature extraction may be
-a good addition in the future. If you have thoughts or ideas on how to implement this, we'd welcome your input on the
-`pyradiomics email list <https://groups.google.com/forum/#!forum/pyradiomics>`_.
+Yes, as of version 2.0, voxelwise calculation has been implemented. However, as this entails the calculations of
+features for each voxel, performing a voxelwise extraction is much slower and as the output consists of a feature map
+for each feature, output size is also much larger. See more on enabling a voxel-based extraction in the
+:ref:`usage section<radiomics-usage-label>`.
 
 Miscellaneous
 -------------

diff --git a/docs/usage.rst b/docs/usage.rst
@@ -114,6 +114,37 @@ Interactive Use
 
 * See the :ref:`feature extractor class<radiomics-featureextractor-label>` for more information on using this core class.
 
+----------------------
+Voxel-based extraction
+----------------------
+
+As of version 2.0, pyradiomics also implements a voxel-based extraction.
+Currently, this is only available in the interactive mode, and is as simple as telling the feature extractor to
+extract a parameter map::
+
+    from radiomics import featureextractor, getTestCase
+    import six
+    import sys, os
+
+    import SimpleITK as sitk
+
+    dataDir = '/path/to/pyradiomics'
+
+    imageName, maskName = getTestCase('brain1', dataDir)
+    params = os.path.join(dataDir, "examples", "exampleSettings", "exampleVoxel.yaml")
+
+    extractor = featureextractor.RadiomicsFeaturesExtractor(params)
+
+    result = extractor.execute(imageName, maskName, voxelBased=True)
+
+    for key, val in six.iteritems(result):
+      sitk.WriteImage(val, key + 'nrrd')
+
+Important to know here is that this extraction takes longer (features have to be calculated for each voxel), and that
+the output is a SimpleITK image of the parameter map in stead of a float value *for each feature*.
+
+Be sure to also check out the ``helloVoxel.py`` example available in the repository (folder ``examples``).
+
 ------------------------
 PyRadiomics in 3D Slicer
 ------------------------
@@ -126,7 +157,7 @@ Using feature classes directly
 ------------------------------
 
 * This represents an example where feature classes are used directly, circumventing checks and preprocessing done by
-  the radiomics feature extractor class, and is not intended as standard use example.
+  the radiomics feature extractor class, and is not intended as standard use.
 
 * (LINUX) To run from source code, add pyradiomics to the environment variable PYTHONPATH (Not necessary when
   PyRadiomics is installed):
@@ -196,4 +227,4 @@ handler to the pyradiomics logger::
 
 To store a log file when running pyradiomics from the commandline, specify a file location in the optional
 ``--log-file`` argument. The amount of logging that is stored is controlled by the ``--log-level`` argument
-(default level INFO and up).
+(default level WARNING and up).
diff --git a/radiomics/base.py b/radiomics/base.py
@@ -66,6 +66,9 @@ def __init__(self, inputImage, inputMask, **kwargs):
     self.settings = kwargs
 
     self.binWidth = kwargs.get('binWidth', 25)
+    # Although binCount is available, we advise use of binWidth!
+    # See documentation/FAQ/Input-Customization for more details.
+    self.binCount = kwargs.get('binCount', None)
     self.label = kwargs.get('label', 1)
     self.voxelBased = kwargs.get('voxelBased', False)
     self.initValue = kwargs.get('initValue', 0)
@@ -176,7 +179,8 @@ def _initCalculation(self):
   def _applyBinning(self):
     self.matrix, self.binEdges = imageoperations.binImage(self.binWidth,
                                                           self.imageArray,
-                                                          self.maskArray)
+                                                          self.maskArray,
+                                                          self.settings.get('binCount', None))
     self.coefficients['grayLevels'] = numpy.unique(self.matrix[self.maskArray])
     self.coefficients['Ng'] = int(numpy.max(self.coefficients['grayLevels']))  # max gray level in the ROI
 

diff --git a/radiomics/featureextractor.py b/radiomics/featureextractor.py
@@ -368,6 +368,11 @@ def execute(self, imageFilepath, maskFilepath, label=None, voxelBased=False):
     self.logger.debug('Enabled features: %s', self._enabledFeatures)
     self.logger.debug('Current settings: %s', self.settings)
 
+    if self.settings.get('binCount', None) is not None:
+      self.logger.warning('Fixed bin Count enabled! However, we recommend using a fixed bin Width. See '
+                          'http://pyradiomics.readthedocs.io/en/latest/faq.html#radiomics-fixed-bin-width for more '
+                          'details')
+
     # 1. Load the image and mask
     featureVector = collections.OrderedDict()
     image, mask = self.loadImage(imageFilepath, maskFilepath)

diff --git a/radiomics/firstorder.py b/radiomics/firstorder.py
@@ -37,6 +37,7 @@ def __init__(self, inputImage, inputMask, **kwargs):
   def _initCalculation(self):
     super(RadiomicsFirstOrder, self)._initSegmentBasedCalculation()
     self.targetVoxelArray = self.imageArray[self.labelledVoxelCoordinates].astype('float')
+    self.discretizedTargetVoxelArray = None  # Lazy instantiation
 
     self.logger.debug('First order feature class initialized')
 
@@ -52,6 +53,17 @@ def _moment(self, a, moment=1, axis=0):
       s = numpy.power((a - mn), moment)
       return numpy.mean(s, axis)
 
+  def _getDiscretizedTargetVoxelArray(self):
+    if self.discretizedTargetVoxelArray is None:
+      if self.binCount is not None:
+        binEdges = self.binCount
+      else:
+        binEdges = imageoperations.getBinEdges(self.binWidth, self.targetVoxelArray)
+
+      self.discretizedTargetVoxelArray = numpy.histogram(self.targetVoxelArray, binEdges)[0]
+
+    return self.discretizedTargetVoxelArray
+
   def getEnergyFeatureValue(self):
     r"""
     **1. Energy**
@@ -116,8 +128,7 @@ def getEntropyFeatureValue(self):
     """
 
     eps = numpy.spacing(1)
-    binEdges = imageoperations.getBinEdges(self.binWidth, self.targetVoxelArray)
-    bins = numpy.histogram(self.targetVoxelArray, binEdges)[0]
+    bins = self._getDiscretizedTargetVoxelArray()
 
     sumBins = bins.sum()
     if sumBins == 0:  # No segmented voxels
@@ -384,8 +395,7 @@ def getUniformityFeatureValue(self):
     """
 
     eps = numpy.spacing(1)
-    binEdges = imageoperations.getBinEdges(self.binWidth, self.targetVoxelArray)
-    bins = numpy.histogram(self.targetVoxelArray, binEdges)[0]
+    bins = self._getDiscretizedTargetVoxelArray()
     sumBins = bins.sum()
 
     if sumBins == 0:  # No segmented voxels

diff --git a/radiomics/imageoperations.py b/radiomics/imageoperations.py
@@ -52,7 +52,7 @@ def getBinEdges(binwidth, parameterValues):
   return binEdges  # numpy.histogram(parameterValues, bins=binedges)
 
 
-def binImage(binwidth, parameterMatrix, parameterMatrixCoordinates=None):
+def binImage(binwidth, parameterMatrix, parameterMatrixCoordinates=None, bincount=None):
   r"""
   Discretizes the parameterMatrix (matrix representation of the gray levels in the ROI) using the binEdges calculated
   using :py:func:`getBinEdges`. Only voxels defined by parameterMatrixCoordinates (defining the segmentation) are used
@@ -84,10 +84,16 @@ def binImage(binwidth, parameterMatrix, parameterMatrixCoordinates=None):
   logger.debug('Discretizing gray levels inside ROI')
 
   if parameterMatrixCoordinates is None:
-    binEdges = getBinEdges(binwidth, parameterMatrix[:])
+    if bincount is not None:
+      binEdges = numpy.histogram(parameterMatrix[:], bincount)[1]
+    else:
+      binEdges = getBinEdges(binwidth, parameterMatrix[:])
     parameterMatrix = numpy.digitize(parameterMatrix, binEdges)
   else:
-    binEdges = getBinEdges(binwidth, parameterMatrix[parameterMatrixCoordinates])
+    if bincount is not None:
+      binEdges = numpy.histogram(parameterMatrix[parameterMatrixCoordinates], bincount)
+    else:
+      binEdges = getBinEdges(binwidth, parameterMatrix[parameterMatrixCoordinates])
     parameterMatrix[parameterMatrixCoordinates] = numpy.digitize(parameterMatrix[parameterMatrixCoordinates], binEdges)
 
   return parameterMatrix, binEdges

diff --git a/radiomics/schemas/paramSchema.yaml b/radiomics/schemas/paramSchema.yaml
@@ -29,6 +29,10 @@ mapping:
         type: float
         range:
           min-ex: 0
+      binCount:
+        type: int
+        range:
+          min-ex: 0
       normalize:
         type: bool
       normalizeScale: