Thank you for your fantastic work and for the effort you put into properly evaluating zero-shot open-vocabulary segmentation models.
I am curious how you handle predictions for the background class. Specifically, how do you predict background, given that a prompt such as "a photo of a background" cannot be used directly as a text embedding for probing the model?
Best regards,