Usage Question #5
Hi @mranzinger, thanks! We believe RADIO is great work. Input:
All of these take 0-255 values. Resolution: BibTex:
Okay, great. Thank you. Given that the mIOU was roughly the same at both resolutions, your model seems reasonably resilient to changes in resolution. I'll keep playing with your model. Something I definitely learned from your work is that we should have considered a regular ViT as a teacher. I did not expect it to be so important, but you proved how valuable it is. When we started RADIO, we had no idea how difficult SAM was going to be for us. We figured "hey, it should help with segmentation", and then spent the next while trying to figure out how to integrate it without it poisoning the model. Based on the format of your paper, are you targeting ICLR?
Thanks so much for sharing your valuable findings! I think we found something similar about poisoning. We did some preliminary analysis and found that SAM features are pretty easy to predict from other features. SAM also took up a lot of our compute and storage during distillation, and in the end didn't contribute much improvement. ViT is interesting. We were simply motivated by the fact that it's a classification model, which makes it different from all the other teachers we considered. Some good news to share: Theia has been accepted to CoRL this year :) Sorry for not replying earlier while the paper was under review :)
Congrats on CoRL!
Hello, excellent work!
In the readme, I don't see any reference to how inputs need to be transformed before usage. Crawling through the code, I found this:
https://github.com/bdaiinstitute/theia/blob/main/src/theia/models/backbones.py#L337-L338
This suggests to me that the right way to use the model is to pass it an input tensor with values between 0 and 255. Is that the correct usage?
Also, do you have any studies on the resolution interpolation ability of your model? I'm testing it out in an ADE20k semantic segmentation linear probe harness with the following:
just so that it matches our settings for AM-RADIO. I've also tried it with 224px resolution. In both cases, I'm using a sliding window.
My results:
224px: 35.61 mIOU
512px: 35.58
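For readers unfamiliar with sliding-window evaluation, here is a minimal sketch of how the crop boxes can be laid out. It is not the harness used for the numbers above: the function names, the 224px window, and the stride are all assumptions chosen for illustration, and the per-window predictions would still need to be merged (for example by averaging logits where windows overlap).

```python
def window_starts(size: int, window: int, stride: int) -> list[int]:
    """Start offsets along one axis so that the windows cover [0, size)."""
    if size <= window:
        # The image is no larger than one window; a single crop at 0 suffices.
        return [0]
    starts = list(range(0, size - window + 1, stride))
    # If the stride leaves an uncovered strip at the border, add one more
    # window that sits flush against the far edge.
    if starts[-1] + window < size:
        starts.append(size - window)
    return starts


def sliding_windows(height: int, width: int, window: int, stride: int):
    """Return (top, left, bottom, right) crop boxes, clipped to the image."""
    return [
        (top, left, min(top + window, height), min(left + window, width))
        for top in window_starts(height, window, stride)
        for left in window_starts(width, window, stride)
    ]


# Hypothetical setting: a 512x512 image tiled with 224px windows, stride 224.
boxes = sliding_windows(512, 512, window=224, stride=224)
```

With these assumed values the last window on each axis is shifted back to start at 288, so it overlaps its neighbour instead of running past the image border.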
Also, would you be willing to update the bibtex for your reference for AM-RADIO to
I nearly missed your paper because it didn't show up in my "Cited By" section, I think because the citation wasn't complete, and I was thrilled to see your work building in the agglomerative direction.