Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

threshold quality #161

Closed
AndreZeug opened this issue May 17, 2024 · 4 comments
Closed

threshold quality #161

AndreZeug opened this issue May 17, 2024 · 4 comments

Comments

@AndreZeug
Copy link

Component
Win10, Matlab R2024a
DIPimage toolbox for quantitative image analysis
Version 3.4.3 14-May-2024

Describe the problem that this feature would solve
I have the impression that the quality of the threshold function(s), based on original DIPlib code 1995-2014 (see threshold.cpp), is rather coarse. To optimize threshold.m in DIPimage 2.9 I simply used substantially more bins in the underlying "smoothhistogram" function from N = 200 to N = 20000. Please compare both versions:
DIPimage 2.9 threshold.m threshold.zip
AZeug myThreshold.m myThreshold.zip

When using a test dataset dd.zip
I get the following results
AZeug modifies DIPimage 2.9

[ ~ , thr] = myThreshold(dd,'background')
thr =
   65.5151

DIPimage 3.4 (original DIPimage 2.9 ?)

[ ~ ,thr]=threshold(dd,'background')
thr =
       677.01

When looking to the histogram of this dataset carefully, it looks like this:
BGTestFig
Here I calculated the histogram precisely and fitted a gauss function to guess the most frequent value and the sigma of its distribution. From the FWHM I would set the "background" threshold to 60. Therefore, I was quite satisfied with my myThreshold function.

Probably, the other methods profit as well from the higher bin of the "smoothhistogram" function.

I have to admit that I was not able to check in detail how you calculate the thresholds in DIPimage 3.xxx. If the threshold function works as desired, please let me know.

Many Thanks for providing DIPimage
Andre

@crisluengo
Copy link
Member

Hi Andre,

Yes, your data totally breaks the logic of this code. My assumption that a small histogram is always sufficient was clearly wrong. Thanks again for the detailed feedback!

It looks like the background threshold algorithm is taking a histogram with 256 bins, which means that there's one bin all the way on the left that is huge, and the rest is nearly zero. So it cannot properly compute the FWHM of the background peak. I think the 677.01 value comes from the subsequent interpolation going wrong.

I can see different things to fix this:

  1. The default histogram for a 16-bit image should have 2^16 bins, not 2^8. But this won't fix the case where the input is floating-point.
  2. Alternatively, estimate the number of histogram bins based on the size of the image, using a method like Freedman–Diaconis. I'm not sure this would fix the threshold in this case.
  3. Add the number of bins used in the histogram as a parameter to the the threshold functions (only the ones that use a histogram of course). This gives the user the ability to fix things, but does not fix the default.
  4. Add a function to DIPimage that computes a threshold based on a histogram given by the user. DIPlib has these functions, but they're not accessible from MATLAB.

I'll be working on some of these things soon.

@AndreZeug
Copy link
Author

Dear Cris,
Thank you for your analysis and suggestions.
I think 8/16 bit data is straightforward to deal with.
One more thought in the case of floating point numbers is "where the data might come from", typically systems with dynamic ranges no greater than 16 bit. As long as the data range is 'linear', 2^16 bins might be sufficient.
As soon as the data is 'non-linear', such as ratio data or some logarithmic manipulations, the histogram is skewed anyway and the thresholding methods implemented in DIPimage are not applicable. One could still use "number of bins" and maybe also "[min max]" as optional input.

The question is what happens if the bin size is too large and empty bins occur. Does this affect the further analysis?

Surely my "myThreshold" function is a lazy workaround and probably not the right solution for many other applications. 🙈

Have a nice rest of your holiday - if you had one at all.

All the best
Andre

@crisluengo
Copy link
Member

The concern for empty bins was the main reason that the histograms were limited to 256 bins. You can't say all 16-bit images should use 2^16 bins, because often the data is from a 10-bit or a 12-bit camera. Plus, it is not always necessary to use a histogram with such high definition, fewer bins means some noise gets averaged out.

I've added the Freedman-Diaconis rule to DIPlib in 7b7c8d9.
I tried to implement this in a way that doesn't affect current behavior (i.e. it needs to be explicitly selected).

I'm now playing with the segmentation algorithms using this for the histogram they generate, and at least for your data it seems to do the right thing. [~,thr]=threshold(dd,'background') gives me 60.9158.

I want to add the option to the segmentation functions to choose the number of bins, and one specific value (e.g. 0) could be a flag to trigger the Freedman-Diaconis rule. And then we need to decide whether to change the default or not. This is a case, I think, where it pays changing the default behavior to a more robust solution. Hopefully it doesn't even affect results significantly in a majority of cases.

@crisluengo
Copy link
Member

I've decided to change these functions to always use this "optimal" histogram configuration. I've seen it work better than the default even on some standard 8-bit test images.

Next I will add to DIPimage the threshold estimation functions that take the histogram as input. In C++ and Python this is already a good alternative if the histogram computation used in the threshold function is not good enough.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants