threshold quality #161

AndreZeug · 2024-05-17T14:31:32Z

Component
Win10, Matlab R2024a
DIPimage toolbox for quantitative image analysis
Version 3.4.3 14-May-2024

Describe the problem that this feature would solve
I have the impression that the quality of the threshold function(s), based on original DIPlib code 1995-2014 (see threshold.cpp), is rather coarse. To optimize threshold.m in DIPimage 2.9 I simply used substantially more bins in the underlying "smoothhistogram" function from N = 200 to N = 20000. Please compare both versions:
DIPimage 2.9 threshold.m threshold.zip
AZeug myThreshold.m myThreshold.zip

When using a test dataset dd.zip
I get the following results
AZeug modifies DIPimage 2.9

[ ~ , thr] = myThreshold(dd,'background')
thr =
   65.5151

DIPimage 3.4 (original DIPimage 2.9 ?)

[ ~ ,thr]=threshold(dd,'background')
thr =
       677.01

When looking to the histogram of this dataset carefully, it looks like this:

Here I calculated the histogram precisely and fitted a gauss function to guess the most frequent value and the sigma of its distribution. From the FWHM I would set the "background" threshold to 60. Therefore, I was quite satisfied with my myThreshold function.

Probably, the other methods profit as well from the higher bin of the "smoothhistogram" function.

I have to admit that I was not able to check in detail how you calculate the thresholds in DIPimage 3.xxx. If the threshold function works as desired, please let me know.

Many Thanks for providing DIPimage
Andre

The text was updated successfully, but these errors were encountered:

crisluengo · 2024-05-17T19:49:59Z

Hi Andre,

Yes, your data totally breaks the logic of this code. My assumption that a small histogram is always sufficient was clearly wrong. Thanks again for the detailed feedback!

It looks like the background threshold algorithm is taking a histogram with 256 bins, which means that there's one bin all the way on the left that is huge, and the rest is nearly zero. So it cannot properly compute the FWHM of the background peak. I think the 677.01 value comes from the subsequent interpolation going wrong.

I can see different things to fix this:

The default histogram for a 16-bit image should have 2^16 bins, not 2^8. But this won't fix the case where the input is floating-point.
Alternatively, estimate the number of histogram bins based on the size of the image, using a method like Freedman–Diaconis. I'm not sure this would fix the threshold in this case.
Add the number of bins used in the histogram as a parameter to the the threshold functions (only the ones that use a histogram of course). This gives the user the ability to fix things, but does not fix the default.
Add a function to DIPimage that computes a threshold based on a histogram given by the user. DIPlib has these functions, but they're not accessible from MATLAB.

I'll be working on some of these things soon.

AndreZeug · 2024-05-20T17:49:06Z

Dear Cris,
Thank you for your analysis and suggestions.
I think 8/16 bit data is straightforward to deal with.
One more thought in the case of floating point numbers is "where the data might come from", typically systems with dynamic ranges no greater than 16 bit. As long as the data range is 'linear', 2^16 bins might be sufficient.
As soon as the data is 'non-linear', such as ratio data or some logarithmic manipulations, the histogram is skewed anyway and the thresholding methods implemented in DIPimage are not applicable. One could still use "number of bins" and maybe also "[min max]" as optional input.

The question is what happens if the bin size is too large and empty bins occur. Does this affect the further analysis?

Surely my "myThreshold" function is a lazy workaround and probably not the right solution for many other applications. 🙈

Have a nice rest of your holiday - if you had one at all.

All the best
Andre

crisluengo · 2024-05-20T23:04:31Z

The concern for empty bins was the main reason that the histograms were limited to 256 bins. You can't say all 16-bit images should use 2^16 bins, because often the data is from a 10-bit or a 12-bit camera. Plus, it is not always necessary to use a histogram with such high definition, fewer bins means some noise gets averaged out.

I've added the Freedman-Diaconis rule to DIPlib in 7b7c8d9.
I tried to implement this in a way that doesn't affect current behavior (i.e. it needs to be explicitly selected).

I'm now playing with the segmentation algorithms using this for the histogram they generate, and at least for your data it seems to do the right thing. [~,thr]=threshold(dd,'background') gives me 60.9158.

I want to add the option to the segmentation functions to choose the number of bins, and one specific value (e.g. 0) could be a flag to trigger the Freedman-Diaconis rule. And then we need to decide whether to change the default or not. This is a case, I think, where it pays changing the default behavior to a more robust solution. Hopefully it doesn't even affect results significantly in a majority of cases.

crisluengo · 2024-05-22T23:21:23Z

I've decided to change these functions to always use this "optimal" histogram configuration. I've seen it work better than the default even on some standard 8-bit test images.

Next I will add to DIPimage the threshold estimation functions that take the histogram as input. In C++ and Python this is already a good alternative if the histogram computation used in the threshold function is not good enough.

AndreZeug added the enhancement label May 17, 2024

crisluengo closed this as completed in dca3d5c May 22, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

threshold quality #161

threshold quality #161

AndreZeug commented May 17, 2024

crisluengo commented May 17, 2024

AndreZeug commented May 20, 2024

crisluengo commented May 20, 2024

crisluengo commented May 22, 2024

threshold quality #161

threshold quality #161

Comments

AndreZeug commented May 17, 2024

crisluengo commented May 17, 2024

AndreZeug commented May 20, 2024

crisluengo commented May 20, 2024

crisluengo commented May 22, 2024