
ImageNet Pretrained TimesFormer #5

Closed

RaivoKoot opened this issue Apr 3, 2021 · 3 comments

Comments

@RaivoKoot

I see you have recently added the TimesFormer model to this repository. In the paper, they initialize their model weights from ImageNet pretrained weights of ViT. Does your implementation offer this too? Thanks!

@black0017
Contributor

Hello,

Is the space attention pretrained from ViT on ImageNet?

I was not aware of that. Maybe we can work out a solution by loading a pretrained ViT, like the one in this repo.

Would you be interested in something like that?

@RaivoKoot
Author

Hey. That would definitely be good. I think most people, myself included, are not able to make the TimesFormer converge without ViT initialization. Unfortunately, I don't have any experience with porting weights or with how exactly weights are organized in torch modules.

I see, however, that the author of this repo, https://github.com/m-bain/video-transformers, has been able to include ViT weight initialization. Maybe this helps you in case you are looking to include this in your repo. Sorry, I would help if I could 😬👍.

@black0017
Contributor

black0017 commented Jun 30, 2021

Kinda late on this, but I tried it out and it works!

Check it out here:
https://github.com/The-AI-Summer/self-attention-cv/blob/main/examples/timesformer_vit_test.py

I managed to load 176 layers from ViT. This is a pretty awesome functionality of PyTorch that I had no idea about!

Thanks a ton for pointing out this GitHub repo! @RaivoKoot

The basic idea is this:

from timm.models import vision_transformer

# `model` is the TimesFormer instance from this repo, built as in the linked
# examples/timesformer_vit_test.py script.
vit_model = vision_transformer.vit_base_patch16_224(pretrained=True)

# strict=False ignores keys that exist on only one side, so the matching
# (spatial) attention weights are loaded from ViT while the extra temporal
# layers keep their random initialization.
model.load_state_dict(vit_model.state_dict(), strict=False)

(don't forget to $ pip install timm if you use this)
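
If it helps, here is a minimal sketch of how the "176 layers" figure could be checked. The `Timesformer` import path and its constructor arguments below are assumptions on my part (see the linked example script for the exact names); the key point is that load_state_dict(strict=False) returns the keys it could not match, so you can count what was actually copied from ViT.

import timm
from self_attention_cv import Timesformer  # assumed import path; check the repo

# Constructor arguments are illustrative only.
model = Timesformer(img_dim=224, num_classes=1000)

vit = timm.create_model('vit_base_patch16_224', pretrained=True)

# With strict=False, load_state_dict returns a named tuple of missing and
# unexpected keys, so you can count how many ViT tensors were copied over.
result = model.load_state_dict(vit.state_dict(), strict=False)
loaded = len(vit.state_dict()) - len(result.unexpected_keys)
print(f"copied {loaded} tensors from ViT; "
      f"{len(result.missing_keys)} TimesFormer tensors keep their random init")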
