Thank you for sharing. This question has also been raised in the context of LRV.
Using a visual encoder requires resizing the input images. Since no pre-processing steps are predefined, a concern arises: how can we keep the image coordinates in referring questions and answers consistent when images are resized differently across models?
Hi @zhang-jr, thanks for raising this issue. The coordinates in referring QAs are based on the size of original images.
In LLaVA, images are padded to square before being passed to the visual encoder (by setting --image_aspect_ratio pad), so the coordinates also change due to the padding. Here we provide a script that expands the bounding boxes to square accordingly. After running the script, you can feed the output data to LLaVA.
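As a rough illustration of what such a script does, here is a minimal sketch of the coordinate shift, assuming LLaVA-style centered square padding (the image is pasted at the center of a square canvas whose side equals the longer edge); the function name and bbox format `(x1, y1, x2, y2)` are illustrative, not the repository's actual API:

```python
def pad_bbox_to_square(bbox, width, height):
    """Map a bounding box from original-image pixel coordinates to the
    coordinate frame of the square-padded image.

    Assumes centered padding: the original image is pasted at
    ((side - width) // 2, (side - height) // 2) on a side x side canvas.
    """
    side = max(width, height)
    dx = (side - width) // 2   # horizontal offset added by padding
    dy = (side - height) // 2  # vertical offset added by padding
    x1, y1, x2, y2 = bbox
    return (x1 + dx, y1 + dy, x2 + dx, y2 + dy)
```

For example, a 100x200 portrait image is padded to 200x200 with 50 pixels added on each side horizontally, so every x coordinate shifts by 50 while y coordinates are unchanged.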
For other image resizing strategies, the referring QAs will need to be preprocessed accordingly, just as in the steps above.
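For a plain (non-padding) resize, the corresponding adjustment is a per-axis rescale. A minimal sketch, again with an illustrative function name and `(x1, y1, x2, y2)` bbox convention:

```python
def scale_bbox(bbox, orig_w, orig_h, new_w, new_h):
    """Rescale a bounding box when the image is directly resized
    from (orig_w, orig_h) to (new_w, new_h)."""
    sx = new_w / orig_w  # horizontal scale factor
    sy = new_h / orig_h  # vertical scale factor
    x1, y1, x2, y2 = bbox
    return (x1 * sx, y1 * sy, x2 * sx, y2 * sy)
```

The same idea applies to normalized coordinates: if a model expects coordinates in [0, 1], divide by the original width and height instead.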