Hi, thanks for the amazing paper.

My question is about which patches are dropped from the image with the DINO model. In the code, evaluate.py line 132 sets head_number = 1. I want to understand why this particular head was chosen (the other parameters used to index the attention maps seem to make sense). Wouldn't averaging the attention maps across heads give better segmentation?

Thanks,
Ravi
Thanks for your interest and positive comments. Sorry for the delayed response.

We follow the approach used in the DINO code, and the selected head seemed to give the best segmentations on our data (judged qualitatively on randomly chosen visualizations). We saw the same pattern on a different, quantitative evaluation: mIoU on PASCAL.

Averaging the maps reduces performance. I think averaging the attention maps degrades foreground segmentation because some heads focus on parts of the background (e.g. grass in some images), and the average mixes that background attention into the foreground map.
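For concreteness, here is a minimal sketch of the difference between indexing a single head's CLS-to-patch attention and averaging across heads. This is not the repo's actual code: the random attention tensor, the token count (ViT-S, 14×14 patches), and the keep fraction are all illustrative stand-ins.

```python
import torch

# Stand-in attention tensor: (batch, heads, tokens, tokens),
# with tokens = 1 CLS token + 14*14 = 197 patch tokens.
batch, heads, tokens = 1, 6, 197
attn = torch.rand(batch, heads, tokens, tokens).softmax(dim=-1)

head_number = 1  # a single head, as in the repo's evaluate.py

# CLS-token attention over the 196 patch tokens, for one head...
cls_attn_head = attn[0, head_number, 0, 1:]   # shape (196,)
# ...versus the head-averaged alternative discussed above.
cls_attn_mean = attn[0].mean(dim=0)[0, 1:]    # shape (196,)

# Drop the patches the chosen head attends to least, keeping the
# top 60% as "foreground" (the fraction here is arbitrary).
keep_fraction = 0.6
k = int(keep_fraction * cls_attn_head.numel())
keep_mask = torch.zeros_like(cls_attn_head, dtype=torch.bool)
keep_mask[cls_attn_head.topk(k).indices] = True
print(keep_mask.sum().item())  # number of patches kept
```

The point of the single-head choice is visible here: cls_attn_head reflects only that head's notion of saliency, while cls_attn_mean blends in heads that may attend to background, flattening the foreground/background contrast before thresholding.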