
Size 14x14 to 16x16 patch interpolation for smaller EVA 2 models #153

Open
tobiasvanderwerff opened this issue Apr 17, 2024 · 2 comments

@tobiasvanderwerff

Hi,

First of all, thank you for the great work you've published. I am trying to train EVA 2 on a custom object detection dataset and noticed that the *_p14to16 pre-trained models are only available for EVA-B and EVA-L (in this table), but not for the other model sizes. I would like to use the smaller EVA-S and/or EVA-Ti models instead. As far as I understand, the conversion from p14 to p16 involves an interpolation of the pos_embed parameters, as mentioned here. This would mean that it could also be applied as a post-processing step to the checkpoint files of the smaller models.

I have tried to do the interpolation myself using the interpolate_patch_14to16.py script. However, this does not seem to work for the EVA 2 checkpoints because of an error when accessing keys in the checkpoint:

Traceback (most recent call last):
  File "/home/tobias/EVA/EVA-01/eva/interpolate_patch_14to16.py", line 53, in <module>
    patch_embed = checkpoint["model"]['patch_embed.proj.weight']
KeyError: 'model'
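
To check which top-level keys the checkpoint actually exposes (the path below is just a placeholder for the downloaded checkpoint), something like the following can be used:

    import torch

    # Load the checkpoint on the CPU and list its top-level keys.
    checkpoint = torch.load("path/to/eva02_p14_checkpoint.pt", map_location="cpu")
    print(list(checkpoint.keys()))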

I am not quite sure whether modifying this script is the right approach or whether something else is needed. Could you provide any feedback on this? Thanks in advance!

@tobiasvanderwerff
Author

I think I found a decent solution. The interpolate_patch_14to16.py script can be modified in the following way:

  • The p14 checkpoints contain the weights under the module key, not model, i.e. use checkpoint['module'] instead of checkpoint['model'].
  • Bicubic interpolation does not work for float16 (half) precision. As far as I can see, this can be solved by converting to float32 as an intermediate step, i.e.:
        pos_tokens = pos_tokens.float()  # convert to float32 because float16 is not supported for bicubic interpolation
        pos_tokens = torch.nn.functional.interpolate(
            pos_tokens, size=(new_size, new_size), mode='bicubic', align_corners=False)
        pos_tokens = pos_tokens.half()  # convert back to float16
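
Putting both changes together, the relevant part of the modified post-processing looks roughly like this. This is a sketch rather than the exact script: the checkpoint paths, the pos_embed key name, and the target grid size are assumptions based on the usual ViT checkpoint layout.

    import torch

    # Load the p14 checkpoint; the EVA 2 checkpoints keep their weights
    # under 'module' rather than 'model'.
    checkpoint = torch.load("path/to/eva02_p14_checkpoint.pt", map_location="cpu")
    state_dict = checkpoint["module"]

    pos_embed = state_dict["pos_embed"]  # (1, 1 + old_size**2, embed_dim)
    embed_dim = pos_embed.shape[-1]
    old_size = int((pos_embed.shape[1] - 1) ** 0.5)  # grid size, excluding the class token
    new_size = 14  # target grid after switching to 16x16 patches, e.g. 224 // 16

    # Split off the class token and reshape the patch-position tokens into a
    # 2D grid so that 2D bicubic interpolation can be applied.
    class_token = pos_embed[:, :1]
    pos_tokens = pos_embed[:, 1:].reshape(1, old_size, old_size, embed_dim).permute(0, 3, 1, 2)

    pos_tokens = pos_tokens.float()  # bicubic is not implemented for float16
    pos_tokens = torch.nn.functional.interpolate(
        pos_tokens, size=(new_size, new_size), mode='bicubic', align_corners=False)
    pos_tokens = pos_tokens.half()  # convert back to float16

    pos_tokens = pos_tokens.permute(0, 2, 3, 1).reshape(1, new_size * new_size, embed_dim)
    state_dict["pos_embed"] = torch.cat([class_token, pos_tokens], dim=1)

    torch.save(checkpoint, "path/to/eva02_p16_checkpoint.pt")

The patch_embed.proj.weight kernel needs the same float32 round-trip, as pointed out in the comment below.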

@matteot11

Hi @tobiasvanderwerff,
I think the same holds for patch_embed:

patch_embed = torch.nn.functional.interpolate(patch_embed.float(), size=(16, 16), mode='bicubic', align_corners=False)

While the .float() is already there, so the interpolation works correctly, the .half() to convert back to float16 is missing. Btw, thanks for the hint!
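
With the missing conversion added back, that line would become something like:

    # Interpolate the 14x14 patch-embedding kernel to 16x16, then restore float16.
    patch_embed = torch.nn.functional.interpolate(
        patch_embed.float(), size=(16, 16), mode='bicubic', align_corners=False).half()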
