Notable performance difference between nnUNet v1 and v2 #1779
Replies: 3 comments 5 replies
-
Hey, we recently had a problem where predicted logits could overflow, resulting in erroneously predicted background in some locations. This manifested as holes in the segmentations. The issue was also related to the dataset having many classes (this was TotalSegmentator). We pushed a new release last week that addresses it. Can you please try that? Just rerun the validation/inference, no need to retrain!
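For anyone curious what "logits overflowing into background" can look like, here is a hypothetical NumPy illustration (not nnU-Net's actual code): when large logits are stored in float16, several class logits can saturate to inf, and argmax then returns the first tied index, i.e. class 0 (background).

```python
import numpy as np

# Illustrative logits for three classes at one voxel; class 1 should win.
logits_fp32 = np.array([66000.0, 70000.0, 68000.0], dtype=np.float32)

# float16 can only represent values up to ~65504, so all three
# logits overflow to inf when cast down.
logits_fp16 = logits_fp32.astype(np.float16)

print(np.argmax(logits_fp32))  # 1 -> correct foreground class
print(np.argmax(logits_fp16))  # 0 -> ties on inf, background "wins"
```

This is why the artifact shows up as background holes rather than random noise: the tie-break always favors index 0.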
-
Hi, apologies for the delay, I ran into some issues. Upon rereading I noticed that instead of pulling at v2.2, as I thought you meant, you actually meant the most recent commits in git (commit 2bc504d, for posterity). Unfortunately, the pipeline to rebuild the Docker->Singularity image takes a long while to complete, so I can't quite test this until next week.
-
Hey @MathijsdeBoer , apologies for the delay in our response. I have been on parental leave this past month. Have you been able to run additional experiments, or were you able to narrow down the problem in some other way?
-
Hey there!
I've been using nnUNet for a little while and am in the process of moving some earlier v1-based models to v2 for easier distribution. Virtually all my transferred datasets result in models with statistically identical performance, except one. This dataset consists of TOF-MRA scans with individual arterial segments labelled, resulting in some 21 classes.
Whereas the v1 model showed remarkably good performance, I have been unable to replicate this in the v2 environment. I got close by attempting a two-stage variation, where the first model predicts a binary foreground/background label that is fed into the next stage. When using a binary mask based on the manual GT in the second stage, I got excellent performance, roughly on par with v1, but when I swapped this binary mask for the predictions of the first-stage model, performance dropped back to the regular v2 model's level. Finally, I attempted to create a similar binary mask using some basic automatic seedpoint selection and region growing techniques. Unfortunately still no dice, if you'll excuse the pun.
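For context, the seedpoint + region-growing mask I mean is roughly the following sketch (the threshold, connectivity, and seed choice here are placeholders, not my actual values): bright TOF-MRA voxels are thresholded, then only the connected component containing the seed is kept as the a-priori vessel mask.

```python
import numpy as np
from scipy import ndimage

def grow_from_seed(image: np.ndarray, seed: tuple, threshold: float) -> np.ndarray:
    """Keep only the bright connected component containing the seed voxel."""
    candidates = image > threshold            # bright (vessel-like) voxels
    labeled, _ = ndimage.label(candidates)    # connected-component labelling
    return labeled == labeled[seed]           # component the seed falls in

# Toy volume with one bright 4x4x4 blob standing in for a vessel.
img = np.zeros((16, 16, 16), dtype=np.float32)
img[4:8, 4:8, 4:8] = 1.0
mask = grow_from_seed(img, seed=(5, 5, 5), threshold=0.5)
print(mask.sum())  # 64 voxels in the grown region
```

The resulting binary mask is then stacked with the image as the second input channel, analogous to the GT-based variant.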
- v1 Base: standard nnUNet v1 preprocessing and training
- v2 Base: standard nnUNet v2 preprocessing and training, data directly copied from the v1 Task
- v2 TwoStage w/ GT: model with a two-channel input: one channel is the image, the other is the manual GT binarized into one overall foreground class
- v2 TwoStage w/ previous model: same as w/ GT, but with the output of a single-class nnU-Net instead
- v2 a-priori: two-channel input, with a region-growing label in the second channel to mimic TwoStage w/ GT
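The two-channel input used in the TwoStage and a-priori variants above can be sketched like this (function and variable names are illustrative, not nnU-Net internals): the 21-class label map is collapsed into a single binary foreground mask and stacked with the image.

```python
import numpy as np

def make_two_channel_input(image: np.ndarray, labels: np.ndarray) -> np.ndarray:
    """Stack the image with a binarized label map as a second channel."""
    foreground = (labels > 0).astype(image.dtype)  # collapse all 21 classes to 1
    return np.stack([image, foreground], axis=0)   # shape: (2, *spatial)

# Toy example: an 8^3 volume with random labels in [0, 21].
img = np.random.rand(8, 8, 8).astype(np.float32)
lab = np.random.randint(0, 22, size=(8, 8, 8))
x = make_two_channel_input(img, lab)
print(x.shape)  # (2, 8, 8, 8)
```

In the GT variant the mask comes from the manual labels; in the other variants it is swapped for the stage-one prediction or the region-growing result.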
I'm running out of things to try, and my one remaining theory is that the v1 model simply had a very lucky initialization. I haven't tested this theory yet by retraining a v1 model and collecting the metrics. I was wondering whether people who are a little more intimately familiar with both codebases have any other ideas.