add worst_tiles sampling #77

Merged: 3 commits merged into constantinpape:more-s2d on Jul 21, 2022

Conversation

JonasHell (Contributor):

The current implementation looks quite complicated. Unfortunately, the idea is not compatible with your _score_based_points(). But maybe there is still some room to improve my implementation...

constantinpape (Owner) left a comment:

The idea looks very good!
I left a couple of comments on how I think we could improve it.

@@ -429,6 +431,7 @@ def worst_points(
forests, forests_per_stage,
sample_fraction_per_stage,
accumulate_samples=True,
sampling_kwargs={},
constantinpape (Owner):

Passing a dict as a default argument is generally discouraged (see https://stackoverflow.com/questions/26320899/why-is-the-empty-dictionary-a-dangerous-default-value-in-python). Can you use the **sampling_kwargs pattern instead?
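For illustration, a minimal sketch of the difference between the two patterns (the function names, bodies, and the tiles_shape key are made up; only the signature mirrors the diff above):

```python
# Mutable default (discouraged): the same dict instance is shared across calls,
# so any mutation leaks into subsequent calls.
def worst_points_old(forests, forests_per_stage, sample_fraction_per_stage,
                     accumulate_samples=True, sampling_kwargs={}):
    sampling_kwargs.setdefault("tiles_shape", (25, 25))  # mutates the shared default!
    return sampling_kwargs

# The **sampling_kwargs pattern: extra keyword arguments are collected into a
# fresh dict on every call and can be forwarded to the chosen sampler.
def worst_points_new(forests, forests_per_stage, sample_fraction_per_stage,
                     accumulate_samples=True, **sampling_kwargs):
    sampling_kwargs.setdefault("tiles_shape", (25, 25))  # local to this call
    return sampling_kwargs
```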

JonasHell (Contributor, Author):

I thought in this case I would have to adapt the arguments of all the other sampling strategies as well, but thinking about it again, that's not true. So sure, I'll fix it.

# reshape diff to image shape and apply convolution with 1-kernel
diff_img = torch.Tensor(diff.reshape(img_shape)[None, None])
kernel = torch.Tensor(np.ones(tiles_shape)[None, None])
diff_img_smooth = torch.nn.functional.conv2d(diff_img, weight=kernel) if len(img_shape) == 2 \
    else torch.nn.functional.conv3d(diff_img, weight=kernel)  # continuation truncated in the quoted diff; 3d branch assumed
constantinpape (Owner):

I think using torch for the convolution here is a bit of an overkill. Two points:

  • Do we actually want to apply a uniform kernel here, or do we rather want to apply smoothing with a large sigma? (For this you can use vigra.filters.gaussianSmoothing.)
  • If we do want the uniform kernel, you can use the convolve functionality from scipy: scipy.ndimage.convolve. Speed doesn't matter here, and it's easier to read because you don't need to convert to torch tensors.
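For illustration, both options sketched with scipy alone (the variable names follow the diff under discussion; the shapes and sigma are placeholders, and vigra.filters.gaussianSmoothing would work for the gaussian variant as well):

```python
import numpy as np
from scipy.ndimage import convolve, gaussian_filter

# placeholder inputs standing in for the values from the PR
img_shape, tiles_shape = (128, 128), (25, 25)
diff = np.random.rand(int(np.prod(img_shape)))

diff_img = diff.reshape(img_shape)  # works for 2d and 3d alike, no torch needed

# option 1: gaussian smoothing with a large sigma
smoothed = gaussian_filter(diff_img, sigma=10.0)

# option 2: uniform kernel of the tile shape (normalization only rescales
# the values; it does not change where the maxima are)
kernel = np.ones(tiles_shape) / np.prod(tiles_shape)
smoothed = convolve(diff_img, kernel)
```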

JonasHell (Contributor, Author):

Sure, I'll replace torch. As for which option is better, I honestly don't know. What do you think? Maybe it's possible to allow the smoothing function to be given as an input. Or do you think that's a bit overkill?

constantinpape (Owner):

> What do you think? Maybe it's possible to allow the smoothing function to be given as an input. Or do you think that's a bit overkill?

Maybe we can keep it a bit simpler and have a single parameter that is either a scalar (then we interpret it as sigma and use gaussian smoothing) or a tuple (then we interpret it as the tile shape and apply the uniform kernel).
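A minimal sketch of that dispatch, assuming a scipy-based implementation (the helper name smooth_diff is made up for illustration):

```python
from collections.abc import Sequence

import numpy as np
from scipy.ndimage import convolve, gaussian_filter

def smooth_diff(diff_img, smoothing):
    # tuple/list -> interpret as tile shape and apply a uniform kernel
    if isinstance(smoothing, Sequence):
        kernel = np.ones(smoothing) / np.prod(smoothing)
        return convolve(diff_img, kernel)
    # scalar -> interpret as sigma and apply gaussian smoothing
    return gaussian_filter(diff_img, sigma=smoothing)
```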

indices_per_class = [np.where(labels == class_id)[0] for class_id in range(nc)]

# get maxima of the label-prediction diff
max_centers = np.argsort(diff_img_smooth)[::-1]
constantinpape (Owner):

Wouldn't we rather use a function that finds local maxima here? I think that would make more sense, because otherwise we can end up oversampling the global maximum.
There's https://scikit-image.org/docs/dev/api/skimage.feature.html#skimage.feature.peak_local_max or vigra.analysis.localMaxima you can use for this.

JonasHell (Contributor, Author):

Yeah, you're totally right. I could even add the min_distance parameter; then I could make sure that there is no overlap between the tiles at all. Maybe then I can do the entire thing more efficiently. I'll have a look into that :)
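For reference, a sketch of the local-maxima variant with skimage (the min_distance value here is just a placeholder; to rule out tile overlap it would be derived from the tile shape):

```python
import numpy as np
from skimage.feature import peak_local_max

# dummy smoothed diff map standing in for diff_img_smooth
diff_img_smooth = np.random.rand(128, 128)

# local maxima instead of a global argsort; min_distance enforces a
# minimal spacing between the returned peaks
max_centers = peak_local_max(diff_img_smooth, min_distance=25)
# max_centers: one (row, col) coordinate per detected peak
```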

constantinpape (Owner):

Oh, and just disregard the CI failures; this looks like a conda issue. Hopefully it's transient and things will work again tomorrow.

JonasHell (Contributor, Author):

I adapted all your suggestions and additionally made the code faster and easier to read.
In the current version I removed the early stopping. While trying a few things, it didn't feel like it brought any speed boost; rather the opposite was the case, it felt like the overhead of the if statements made it slower. But maybe this impression is wrong... What do you think: better to keep the current easy-to-read version with 2 separate for loops (where we iterate over all maxima even if we don't need all of them), or add the nested for loop with early stopping back in?

forests, forests_per_stage,
sample_fraction_per_stage,
img_shape,
tiles_shape=[51, 51],
constantinpape (Owner):

Do you think such a big tile shape makes sense as a default?

JonasHell (Contributor, Author):

To be honest, I don't have a really good metric for that. I just tried 7, 25, and 51, and the DiceLoss was smallest for 51. But you already mentioned that that's not a good proxy. So I guess we can use something smaller. What would you suggest?
With 51, usually 2-3 maxima were sampled :)

JonasHell (Contributor, Author):

(That was with a patch size between 256 and 512.)

constantinpape (Owner):

> With 51, usually 2-3 maxima were sampled :)

But did it actually sample points from all of them after class balancing?

constantinpape (Owner):

Also, now that I think more about the class balancing here: would it maybe make sense to compute the maxima for the classes independently? I think we might run into cases where we only sample one class otherwise.

JonasHell (Contributor, Author):

> But did it actually sample points from all of them after class balancing?

At least one class did sample from all of them, but you are right, for the others that's not guaranteed.

JonasHell (Contributor, Author):

> Would it maybe make sense to compute the maxima for the classes independently? I think we might run into cases where we only sample one class otherwise.

True, this could happen. I'll try to adapt it :)
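A rough sketch of the per-class variant (the shapes and names are assumptions for illustration, not the PR's actual code):

```python
import numpy as np
from skimage.feature import peak_local_max

# dummy stand-ins: one (already smoothed) diff map per class
img_shape, n_classes = (128, 128), 3
diff_per_class = np.random.rand(n_classes, *img_shape)

# find the maxima independently per class, so that every class
# contributes its own candidate tile centers
max_centers_per_class = [
    peak_local_max(diff_per_class[class_id], min_distance=25)
    for class_id in range(n_classes)
]
```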

JonasHell (Contributor, Author):

But then we lose a bit of the "a human would label locally" idea, because locality is then only guaranteed within a class, not across classes. But I think this still makes sense.

constantinpape (Owner) commented on Jul 21, 2022:

> I adapted all your suggestions and additionally made the code faster and easier to read.

Yes, it looks much better now.

> better to keep the current easy-to-read version with 2 separate for loops

Yes, let's keep the current version; it reads much more easily, and we don't really care about speed here.

What I am not quite sure about yet: [51, 51] sounds like a fairly large shape to take around the maxima. How did you choose it?
(I am a bit worried that, given the class balancing, we will end up only sampling from a single maximum with this.)

JonasHell (Contributor, Author):

Finding the maxima is now done per class.
I also changed the default tile size to 25. Would you prefer it to be even smaller?

constantinpape (Owner) left a comment:

Thanks, I think the changes make sense now, and the per-class maxima should ensure a more balanced sampling of classes; 25 sounds like a good default.

@constantinpape constantinpape merged commit 280c052 into constantinpape:more-s2d Jul 21, 2022
@JonasHell JonasHell deleted the more-s2d branch July 21, 2022 16:01
@JonasHell JonasHell restored the more-s2d branch July 22, 2022 14:47