No acceleration compared with timm vit block #410
Comments
It seems like you are using fp32; could you try fp16?
Sure, I tested it with fp16. With backward=False:

With backward=True:
The test environment is:
Do you have official test results for ViT comparing native PyTorch and LightSeq?
I used a single A100-80G GPU.
Thanks for testing. The LightSeq result is the same as mine, around 9.5 ms, so I will close this issue. BTW, do you have plans to implement FlashAttention in LightSeq?
Hi @woolpeeker, I'm also trying to integrate LightSeq into my project with a timm ViT model. After switching to the LightSeq layer, can we still load the same pretrained model weights?
Yes, you just need to organize the weight tensors following LightSeq's docs. I have tested it; the results align with timm's.
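The thread does not include the actual remapping code, and LightSeq's real export API is not shown here. As a generic, hedged sketch of the kind of reorganization involved: timm's ViT attention stores q, k, and v stacked in one fused `qkv` weight matrix, and porting to another layer layout typically means splitting (or re-concatenating) that tensor. All names below are illustrative, not LightSeq's API:

```python
import numpy as np

dim = 8  # hypothetical embedding size for illustration
# timm's Attention holds a fused projection of shape (3 * dim, dim),
# with the q, k, and v weights stacked along the first axis.
qkv_weight = np.arange(3 * dim * dim, dtype=np.float32).reshape(3 * dim, dim)

# Split the fused matrix back into the three per-projection weights,
# which can then be laid out however the target layer expects.
q_w, k_w, v_w = np.split(qkv_weight, 3, axis=0)

assert q_w.shape == (dim, dim)
# Reassembling in the same order must recover the original tensor exactly,
# which is a cheap sanity check that no rows were reordered by mistake.
assert np.array_equal(np.concatenate([q_w, k_w, v_w], axis=0), qkv_weight)
```

Checking round-trip equality like this is a quick way to verify a weight-conversion script before comparing model outputs.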
@woolpeeker Thanks! This saves me a lot of time.
Hi @woolpeeker, I followed the docs and integrated the LightSeq transformer layer into the timm ViT, but the speed improvement is trivial. I'm wondering whether you observed any speedup, since I'm using the same GPU (A100) and the snippet above gives me exactly the same results as yours. I guess the relative improvement depends heavily on the batch size or other hyperparameters. BTW, switching to FlashAttention gives me a speed boost of around 10% for the whole timm ViT model, and for a single attention block the speedup is 45%. I hope this helps.
I used the code below to test the ViT block speed. The output shows the speed is almost the same between PyTorch and LightSeq.
Did I miss something?
Output for forward only:
Output for forward + backward:
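The benchmark snippet and its outputs were not captured in this thread. For reference, a minimal timing harness in the same spirit (warmup runs discarded, then averaging over timed iterations; `bench`, `n_warmup`, and `n_iters` are illustrative names, not from the original code) might look like:

```python
import time

def bench(fn, n_warmup=10, n_iters=100):
    """Average wall-clock time of fn in milliseconds per call.

    Warmup iterations are discarded so one-time costs (allocation,
    JIT/kernel compilation, caches) do not skew the measurement.
    Note: when timing CUDA kernels, call torch.cuda.synchronize()
    before reading the clock, since GPU launches are asynchronous.
    """
    for _ in range(n_warmup):
        fn()
    start = time.perf_counter()
    for _ in range(n_iters):
        fn()
    return (time.perf_counter() - start) / n_iters * 1000.0

# Example: compare two implementations of the same computation.
def slow_sum():
    total = 0
    for i in range(10000):
        total += i
    return total

def fast_sum():
    return sum(range(10000))

ms_slow = bench(slow_sum)
ms_fast = bench(fast_sum)
print(f"slow: {ms_slow:.4f} ms/call, fast: {ms_fast:.4f} ms/call")
```

Forgetting the warmup or the device synchronization is a common reason two implementations appear to run at the same speed, which may be relevant to the numbers reported above.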