
Question about image transformation: short edge is still 384 for the fine-tuning task? #14

Closed
Jxu-Thu opened this issue Jul 8, 2021 · 2 comments

Comments


Jxu-Thu commented Jul 8, 2021

Thanks for your great code!
I read your paper carefully.

From your paper: "We resize the shorter edge of input images to 384 and limit the longer edge to under 640 while preserving the aspect ratio. This resizing scheme is also used during object detection in other VLP models, but with a larger size of the shorter edge (800). Patch projection of ViLT-B/32 yields 12 × 20 = 240 patches for an image with a resolution of 384 × 640."
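A minimal sketch of the resizing rule quoted above, assuming the shorter edge is scaled to 384, the scale is reduced further if the longer edge would exceed 640, and ViLT-B/32 tiles the result into non-overlapping 32 × 32 patches. The function names resize_dims and num_patches are hypothetical, and the repository's actual transform may differ in rounding details.

```python
from typing import Tuple


def resize_dims(h: int, w: int, shorter: int = 384, longer_max: int = 640) -> Tuple[int, int]:
    """Scale (h, w) so the shorter edge equals `shorter`; if that would push
    the longer edge past `longer_max`, scale down so the longer edge fits.
    Aspect ratio is preserved in both cases."""
    scale = shorter / min(h, w)
    if max(h, w) * scale > longer_max:
        scale = longer_max / max(h, w)
    return round(h * scale), round(w * scale)


def num_patches(h: int, w: int, patch: int = 32) -> int:
    """Count the full patch x patch tiles that fit in an h x w image."""
    return (h // patch) * (w // patch)


for h, w in [(600, 1000), (500, 1000)]:
    nh, nw = resize_dims(h, w)
    print((h, w), "->", (nh, nw), "patches:", num_patches(nh, nw))
# (600, 1000) -> (384, 640) patches: 240
# (500, 1000) -> (320, 640) patches: 200
```

The second image is wide enough that the 640 cap binds, so its shorter edge ends up below 384 and it yields fewer than 240 patches.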

However, I find that this code uses "image_size=384" for all downstream tasks. Is the shorter edge still 384 for fine-tuning?

Would this affect performance on downstream tasks? A shorter edge of 800 would greatly increase the sequence length, so using it would require a smaller batch size.
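A rough sketch of how much longer the image sequence gets, assuming 32 × 32 patches and, for the 800 setting, a longer-edge cap of 1333 (a common choice in detection pipelines, but not stated in this thread). It counts image patches only, ignoring text tokens and any special tokens, and uses the fact that self-attention cost grows quadratically with sequence length. The function name max_patches is hypothetical.

```python
def max_patches(shorter: int, longer_max: int, patch: int = 32) -> int:
    # Upper bound on image tokens: an image resized to exactly shorter x longer_max.
    return (shorter // patch) * (longer_max // patch)


vilt = max_patches(384, 640)              # 12 * 20 patches
detector_style = max_patches(800, 1333)   # 25 * 41 patches, assuming a 1333 cap
ratio = detector_style / vilt

# Sequence length grows about 4.3x; pairwise attention scores grow about 18x.
print(vilt, detector_style, round(ratio, 2), round(ratio ** 2, 1))
# 240 1025 4.27 18.2
```

Under these assumptions, activation memory per sample grows at least linearly and attention-score memory roughly quadratically, which is why the 800 setting would need a noticeably smaller batch size.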

dandelin (Owner) commented Jul 9, 2021

We do use a shorter edge of 384 for downstream tasks.
def config() is the default configuration, and its values are used as-is unless named configs or command-line arguments override them.

You can check the final configuration of a run with the print_config option.
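The terminology here (config(), named configs, print_config) matches the Sacred experiment framework. The sketch below is a toy model of the precedence described above, written in plain Python rather than with Sacred itself. Only image_size=384 comes from this thread; the batch_size key, its values, and the named config hypothetical_finetune_task are made-up placeholders.

```python
from typing import Any, Dict, List, Optional

# Stand-in for the defaults set in config(); batch_size is a hypothetical key.
DEFAULTS: Dict[str, Any] = {"image_size": 384, "batch_size": 256}

# Hypothetical named config: it lists only the keys it overrides.
NAMED_CONFIGS: Dict[str, Dict[str, Any]] = {
    "hypothetical_finetune_task": {"batch_size": 64},
}


def resolve_config(named: List[str], cli_updates: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Apply defaults first, then named configs in order, then command-line updates."""
    config = dict(DEFAULTS)
    for name in named:
        config.update(NAMED_CONFIGS[name])
    config.update(cli_updates or {})
    return config


print(resolve_config(["hypothetical_finetune_task"]))
# {'image_size': 384, 'batch_size': 64}
print(resolve_config(["hypothetical_finetune_task"], {"image_size": 800}))
# {'image_size': 800, 'batch_size': 64}
```

Because the named config never mentions image_size, the default of 384 survives into the final configuration. Only an explicit command-line override changes it, which is what printing the resolved configuration lets you confirm.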

Jxu-Thu (Author) commented Jul 11, 2021

Thanks.

Jxu-Thu closed this as completed Jul 11, 2021