BUG: Ensure consistency between numpy.histogram and numpy.digitize
Related to #219. Ensure numpy.histogram and numpy.digitize yield comparable results. The difference arises because numpy.histogram treats the rightmost bin as a closed interval (i.e. values equal to the rightmost edge are included in the last bin), whereas numpy.digitize treats all bins as half open (including the rightmost). In the latter case, values equal to the rightmost bin edge are assigned index `len(bins)`, one past the index of the last bin.
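
The difference can be reproduced with a minimal sketch. The data and bin edges below are made up for illustration and chosen so that the maximum value coincides with the rightmost edge; only `numpy.histogram` and `numpy.digitize` themselves are taken from the text above.

```python
import numpy as np

# Hypothetical data and edges: the maximum value equals the rightmost edge.
values = np.array([0, 1, 2, 3, 4])
edges = np.array([0, 2, 4])  # two bins: [0, 2) and [2, 4]

# numpy.histogram closes the last bin, so the value 4 is counted in bin 2.
counts, _ = np.histogram(values, bins=edges)

# numpy.digitize (default right=False) keeps every bin half open, so the
# value 4 falls past the last bin and receives index len(edges).
indices = np.digitize(values, edges)

print(counts)   # [2 3]
print(indices)  # [1 1 2 2 3]
```

With two bins, the index 3 returned for the value 4 does not correspond to any bin that `numpy.histogram` reports, which is the inconsistency this commit addresses.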

Ensure consistent behaviour (with half-open bins for all bins) by adding an extra bin edge when using a fixed bin width (i.e. the rightmost edge is > max(targetValues)).
When using a fixed bin count, correct behaviour was already ensured by adding +1 to the last bin edge (which is otherwise guaranteed to be equal to max(targetValues)). In this case, the +1 ensures numpy.digitize considers these maximum values as part of the last bin, since the edges are arranged such that a specific number of bins is obtained.
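
As a rough sketch of both cases, the snippet below compares the edges before and after the change. The helpers `fixed_width_edges` and `digitize_counts` and the sample data are hypothetical and only mirror the computation described here; they are not the pyradiomics API.

```python
import numpy as np

def fixed_width_edges(values, bin_width, extra_edge=True):
    """Hypothetical helper mirroring the fixed-bin-width edge computation."""
    minimum, maximum = values.min(), values.max()
    low_bound = minimum - (minimum % bin_width)
    # One bin width reaches past the maximum; the second adds the extra edge.
    high_bound = maximum + (2 if extra_edge else 1) * bin_width
    return np.arange(low_bound, high_bound, bin_width)

def digitize_counts(values, edges):
    """Per-bin counts from numpy.digitize, plus the count at/after the last edge."""
    idx = np.digitize(values, edges)
    per_index = np.bincount(idx, minlength=len(edges) + 1)
    return per_index[1:len(edges)], per_index[len(edges)]

values = np.array([0, 5, 10, 15, 20])  # maximum is a multiple of the bin width

# Old behaviour: the rightmost edge equals the maximum.
old_edges = fixed_width_edges(values, 10, extra_edge=False)  # [0 10 20]
old_hist = np.histogram(values, bins=old_edges)[0]           # [2 3]
old_bins, old_overflow = digitize_counts(values, old_edges)  # [2 2], 1

# New behaviour: the extra edge keeps the maximum strictly inside a bin.
new_edges = fixed_width_edges(values, 10)                    # [0 10 20 30]
new_hist = np.histogram(values, bins=new_edges)[0]           # [2 2 1]
new_bins, new_overflow = digitize_counts(values, new_edges)  # [2 2 1], 0

# Fixed bin count: the last edge equals the maximum, so it is shifted by +1.
count_edges = np.histogram(values, bins=4)[1]
count_edges[-1] += 1
count_idx = np.digitize(values, count_edges)                 # [1 2 3 4 4]
```

With the extra edge, no value lands at or beyond the rightmost edge, so the closed last interval of `numpy.histogram` no longer matters and both functions report the same counts.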

In relation to #219: in that PR, the +1 to the rightmost bin edge was removed to allow 'switching off' binning by specifying a bin width of 1. However, that change reintroduced the inconsistency between numpy.histogram and numpy.digitize, which is now corrected by this commit (by making numpy.histogram consistent with numpy.digitize, instead of the other way around).
JoostJM committed Feb 20, 2019
1 parent 5bb18fc commit 12b512f
Showing 1 changed file with 5 additions and 2 deletions.
7 changes: 5 additions & 2 deletions radiomics/imageoperations.py
@@ -86,8 +86,11 @@ def getBinEdges(parameterValues, **kwargs):

# Start binning form the first value lesser than or equal to the minimum value and evenly dividable by binwidth
lowBound = minimum - (minimum % binWidth)
- # Add + binwidth to ensure the maximum value is included in the range generated by numpy.arange
- highBound = maximum + binWidth
+ # Add + 2* binwidth to ensure the maximum value is included in the range generated by numpy.arange, and that values
+ # equal to highbound are binned into a separate bin by numpy.histogram (This ensures ALL bins are half open, as
+ # numpy.histogram treats the last bin as a closed interval. Moreover, this ensures consistency with numpy.digitize,
+ # which will assign len(bins) + 1 to values equal to rightmost bin edge, treating all bins as half-open)
+ highBound = maximum + 2 * binWidth

binEdges = numpy.arange(lowBound, highBound, binWidth)
