WIP: Make LLaMA torch.compile compatible #103

Conversation
Co-authored-by: Luca Antiga <luca@lightning.ai>
I'm not that worried about the added conditionals; they look localized to the rope call.
Something is surely off. Compilation almost never hurts: at the very least it optimizes the graph, making it smaller, which makes CUDA allocations faster, etc. Super curious what is not working. For me, in FSDP mode, compilation (without activation checkpointing, which doesn't work with compilation)
Needs to be updated after #174 lands.
What's the intention with this one?
I'm closing the PR because adding torch.compile to our scripts is not worth it at the moment. In summary:
@awaelchli did you try dynamic=True?
@ezyang Thanks for your suggestion. Yes, I did try it. For the generate.py inference script, compiling the iterations took a very long time even with dynamic=True (378.39 sec total, 0.13 tokens/sec), which is about 100x slower than running without compilation. And for the finetune_adapter.py script with dynamic=True, I got a strange error that I didn't understand:
(using torch nightly). If it's helpful, I could produce a minimized version for further debugging.
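For reference, a minimal sketch of how torch.compile with dynamic=True would be applied to an autoregressive model; the model below is a stand-in, not the actual lit-llama LLaMA class:

```python
import torch
import torch.nn as nn

# Stand-in model; in the real script this would be the LLaMA model from lit-llama.
model = nn.TransformerEncoderLayer(d_model=64, nhead=4, batch_first=True)
compiled = torch.compile(model, dynamic=True)  # mark shapes as dynamic to limit recompilations

# The sequence length grows every step during generation, which is what
# dynamic=True is meant to handle without recompiling each time.
for seq_len in (16, 17, 18):
    x = torch.randn(1, seq_len, 64)
    with torch.no_grad():
        _ = compiled(x)
```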
I'm happy to run the lit-llama repo, but I was unable to get the OpenLLAMA weight download instructions working.
If you could try running with TORCHDYNAMO_REPRO_AFTER="dynamo", this may or may not generate a minified repro; it's often hard to say. Alternatively, fix the model setup instructions (or tell me to go get the Meta weights, I can probably scrounge them up somewhere ;)
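One possible way to wire this up from Python, assuming the TORCHDYNAMO_REPRO_AFTER environment variable is picked up at import time (the finetune_adapter.py name comes from earlier in the thread):

```python
import os

# Ask TorchDynamo to dump a minified repro after a dynamo-level failure.
# Set this before torch is imported so the config picks it up.
os.environ["TORCHDYNAMO_REPRO_AFTER"] = "dynamo"

import torch  # noqa: E402

# Setting torch._dynamo.config.repro_after = "dynamo" after import should have the same effect.
# ...then run the failing workload as usual, e.g. the finetune_adapter.py training loop.
```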
I'm going to track lit-llama compilation problems separately, but these fixes are of interest:
Builds on top of #100
Attempt to address #62 by introducing a switch between the complex and non-complex variants of the RoPE implementation (a rough sketch of the idea follows):
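A sketch of what such a switch could look like (function and argument names are illustrative, not the actual lit-llama code): the complex-number formulation stays the default, and an equivalent real-valued cos/sin formulation is used when the model is meant to be compiled, since torch.compile did not support complex ops at the time.

```python
import torch

def apply_rope_complex(x: torch.Tensor, freqs_cis: torch.Tensor) -> torch.Tensor:
    # Pair up the last dimension into complex numbers and rotate by freqs_cis.
    x_c = torch.view_as_complex(x.float().reshape(*x.shape[:-1], -1, 2))
    out = torch.view_as_real(x_c * freqs_cis).flatten(-2)
    return out.type_as(x)

def apply_rope_real(x: torch.Tensor, cos: torch.Tensor, sin: torch.Tensor) -> torch.Tensor:
    # The same rotation expressed with real-valued cos/sin tensors only
    # (cos and sin must broadcast against the even/odd halves of x).
    x1, x2 = x[..., 0::2], x[..., 1::2]
    rotated = torch.stack((x1 * cos - x2 * sin, x1 * sin + x2 * cos), dim=-1)
    return rotated.flatten(-2).type_as(x)

def apply_rope(x: torch.Tensor, rope_cache, use_complex: bool = True) -> torch.Tensor:
    # The switch: complex variant by default, real variant when compiling.
    if use_complex:
        return apply_rope_complex(x, rope_cache)
    cos, sin = rope_cache
    return apply_rope_real(x, cos, sin)
```

Both paths compute the same rotation; only the non-complex path traces cleanly under torch.compile.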
Drawback: the switch makes the code less readable and introduces indirection. The reader may struggle to understand the nuance here. If this goes too much against our minimalistic principles, we should drop the idea. Suggestions welcome!
Adapter
- no compile + dynamic padding: between 90 ms and 160 ms
- no compile + full padding: 352 ms
- torch.compile + full padding: 118 ms
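For context, a small sketch of the padding distinction above (pad id and fixed length are made up): with "full padding" every batch is padded to the same fixed length, so tensor shapes stay static and torch.compile does not keep recompiling, at the cost of wasted compute on pad tokens; "dynamic padding" only pads to the longest sequence in each batch.

```python
import torch

def pad_batch(seqs, pad_id=0, fixed_len=None):
    # Dynamic padding when fixed_len is None, full padding otherwise.
    max_len = fixed_len if fixed_len is not None else max(s.size(0) for s in seqs)
    out = torch.full((len(seqs), max_len), pad_id, dtype=torch.long)
    for i, s in enumerate(seqs):
        out[i, : s.size(0)] = s
    return out

seqs = [torch.arange(5), torch.arange(9)]
dynamic = pad_batch(seqs)              # shape (2, 9)   - varies from batch to batch
full = pad_batch(seqs, fixed_len=256)  # shape (2, 256) - identical for every batch
```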