
Support DDP #81

Merged: 7 commits merged into main from ddp-support on Apr 2, 2023
Conversation

awaelchli (Contributor)

Fixes #80

Note: while we can enable DDP here, training/finetuning the model with DDP won't work on most systems, as the model, optimizer, and gradients simply can't fit in memory. This PR is just for correctness and because of the question that popped up in #80.
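
For readers coming from #80, here is a minimal sketch of what enabling DDP through Lightning Fabric's strategy argument looks like. This is not the PR's diff: the stand-in model, device count, learning rate, and precision below are placeholders for illustration only.

```python
# Minimal sketch (not this PR's diff): enabling DDP via Lightning Fabric.
# The stand-in model, device count, and hyperparameters are placeholders.
import torch
import lightning as L


def main(devices: int = 4):
    # strategy="ddp" launches one process per GPU and keeps a full replica of
    # the model, gradients, and optimizer state on every device -- which is why
    # finetuning LLaMA this way runs out of memory on most systems.
    fabric = L.Fabric(accelerator="cuda", devices=devices, strategy="ddp", precision="bf16-mixed")
    fabric.launch()

    model = torch.nn.Linear(4096, 4096)  # stand-in for the LLaMA model
    optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)
    model, optimizer = fabric.setup(model, optimizer)

    x = torch.randn(8, 4096, device=fabric.device)
    loss = model(x).sum()
    fabric.backward(loss)  # all-reduces gradients across ranks
    optimizer.step()
    optimizer.zero_grad()


if __name__ == "__main__":
    main()
```

Because DDP replicates everything per device, this only makes the multi-GPU code path correct; it does not reduce per-device memory.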

Review threads: lit_llama/model.py (outdated, resolved), finetune.py (resolved)
lantiga (Collaborator) commented on Apr 2, 2023:

I think the original request was referring to multi-GPU, be it DDP, FSDP, or DeepSpeed.

I'm in favor of merging this after the RoPE fix, but I'd probably avoid showing DDP since it can't possibly work, and rather make sure FSDP works instead.

WDYT?
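
For reference, a rough sketch of the FSDP alternative suggested above, assuming Fabric's FSDPStrategy and PyTorch's transformer auto-wrap policy; the device count, precision, and the choice to wrap at lit_llama's Block are illustrative assumptions, not code from this PR.

```python
# Rough sketch of the FSDP alternative (assumptions, not this PR's code):
# shard parameters, gradients, and optimizer state across GPUs instead of
# replicating them as DDP does.
from functools import partial

import lightning as L
from lightning.fabric.strategies import FSDPStrategy
from torch.distributed.fsdp.wrap import transformer_auto_wrap_policy

from lit_llama.model import Block  # wrap/shard at the transformer-block level

auto_wrap_policy = partial(transformer_auto_wrap_policy, transformer_layer_cls={Block})
fabric = L.Fabric(
    accelerator="cuda",
    devices=4,
    strategy=FSDPStrategy(auto_wrap_policy=auto_wrap_policy),
    precision="bf16-mixed",
)
fabric.launch()
# model/optimizer setup then proceeds as usual via fabric.setup(...)
```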

lantiga merged commit 2aef01d into main on Apr 2, 2023.
lantiga deleted the ddp-support branch on Apr 2, 2023 at 19:19.
timothylimyl referenced this pull request in timothylimyl/lit-llama-qa May 21, 2023
gkroiz added a commit to gkroiz/lit-llama that referenced this pull request May 22, 2023
Development
Successfully merging this pull request may close the issue: How to finetune with the multi-GPU

2 participants