Question about Referring QAs! #9

Closed
zhang-jr opened this issue Oct 31, 2023 · 3 comments

Comments

@zhang-jr

Thank you for sharing. This question has also been raised in the context of LRV.

Using a visual encoder requires resizing images. Since no pre-processing steps are predefined, how can we keep the image coordinates in the referring questions and answers consistent when different models resize images differently?

@zhang-jr
Author

zhang-jr commented Nov 1, 2023

Is it handled in the same way as in LLaVA?

@BoyaWu10
Collaborator

BoyaWu10 commented Nov 2, 2023

Hi @zhang-jr, thanks for raising this issue. The coordinates in referring QAs are based on the size of the original images.

In LLaVA, images are padded to square before the visual encoder (by setting --image_aspect_ratio pad), so the coordinates also change because of the padding. Here we provide a script to help expand the bounding boxes to square as well. After running the script, you can feed the output data to LLaVA.
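
For illustration only, the padding adjustment could look roughly like the sketch below. This is not the script mentioned above. It assumes boxes are stored as pixel coordinates [x1, y1, x2, y2] on the original image and that the image is centered on the square canvas; the function names `pad_box_to_square` and `normalize_box` are hypothetical.

```python
def pad_box_to_square(box, width, height):
    """Shift a pixel-space box [x1, y1, x2, y2] into the frame of the
    square-padded image. Assumes the original image is centered on the
    square canvas (an assumption, not confirmed by this thread)."""
    side = max(width, height)
    dx = (side - width) // 2   # left padding added to a tall image
    dy = (side - height) // 2  # top padding added to a wide image
    x1, y1, x2, y2 = box
    return [x1 + dx, y1 + dy, x2 + dx, y2 + dy], side


def normalize_box(box, side):
    """Express a box on the padded square as fractions of its side length."""
    return [v / side for v in box]


# A 640x480 image gets 80 px of padding above and below.
padded_box, side = pad_box_to_square([10, 20, 110, 120], 640, 480)
relative_box = normalize_box(padded_box, side)
```

Whether the model expects pixel or normalized coordinates depends on the training format, so `normalize_box` may not be needed.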

For other image resizing strategies, I think the referring QAs will need to be preprocessed accordingly, just as we do in the steps above.
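
As one example of such a strategy, a plain resize that stretches the image to a fixed size without padding would scale each axis independently. The sketch below assumes pixel coordinates [x1, y1, x2, y2] and a hypothetical helper name `scale_box`; it is not part of this repository.

```python
def scale_box(box, orig_size, new_size):
    """Rescale a pixel-space box [x1, y1, x2, y2] after the image is
    resized from orig_size (width, height) to new_size (width, height)
    without preserving aspect ratio."""
    orig_w, orig_h = orig_size
    new_w, new_h = new_size
    sx = new_w / orig_w  # horizontal scale factor
    sy = new_h / orig_h  # vertical scale factor
    x1, y1, x2, y2 = box
    return [x1 * sx, y1 * sy, x2 * sx, y2 * sy]


# Example: a 640x480 image stretched to a square 336x336 encoder input.
resized_box = scale_box([10, 20, 110, 120], (640, 480), (336, 336))
```

If the coordinates are already normalized by the original width and height, a plain stretch leaves them unchanged, which is why only padding or cropping needs an explicit offset.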

@zhang-jr
Author

zhang-jr commented Nov 3, 2023

Thanks for your reply! It really helps a lot.

zhang-jr closed this as completed Nov 3, 2023