Add tensor parallelism support for HF wrapper forward and lm_eval integration #340
Merged
Commits (12)
1b60413 added model and sequence parallel to forward (bigximik)
5eac621 added asserts for pipeline and sequence parallel to be 1 as not suppo… (bigximik)
d882d7b changed logits gathering for only tp and stp dimensions (bigximik)
b9851c2 added more broadcast primitives and changed _object_to_tensor to be f… (bigximik)
750ea1c added support to TP in forward for generate (bigximik)
0f196da added suppport to other parallelism additionally to data parallelism (bigximik)
82b901d removed out of date comment (bigximik)
543f3d6 added extended wait in key places, fix to right batch config, fix mov… (bigximik)
be8050c added more docs (bigximik)
d51f584 Merge branch 'main' of github.com:ServiceNow/Fast-LLM into denis/eval… (bigximik)
f48574a Merge branch 'main' of github.com:ServiceNow/Fast-LLM into denis/eval… (bigximik)
32ea639 Merge branch 'main' into denis/evaluate_tp (jlamypoirier)
Review comment on `timeout`: Unnecessarily long timeouts are often bad, so I recommend making it optional (default `None`) and enabling it only as needed.
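A minimal sketch of that suggestion, with hypothetical names (`make_worker_group` is not from the Fast-LLM codebase): the long timeout stays off unless the caller opts in, and is passed to `torch.distributed` as a process-group timeout.

```python
import datetime
import typing

import torch.distributed as dist


def make_worker_group(
    ranks: list[int],
    timeout: typing.Optional[float] = None,  # seconds; None keeps the backend default
) -> dist.ProcessGroup:
    """Hypothetical helper: only override the collective timeout when explicitly requested."""
    kwargs = {}
    if timeout is not None:
        # torch.distributed expects process-group timeouts as a datetime.timedelta
        kwargs["timeout"] = datetime.timedelta(seconds=timeout)
    return dist.new_group(ranks=ranks, **kwargs)
```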
Context

Conceptually, the wait primitives in places like worker_forward or data-parallel_worker should only exit under three conditions. However, this is not how torch.distributed works: it is designed for more or less synchronous communication, while here we are trying to adapt it for asynchronous communication.
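For illustration only (hypothetical names, not the PR's actual code), a wait built on a blocking torch.distributed collective with the Gloo backend looks roughly like this: the worker simply blocks until the source rank broadcasts a message code, and the only other way out is the process group's collective timeout.

```python
import torch
import torch.distributed as dist

# Hypothetical message codes agreed between coordinator and workers.
MSG_WORK = 0
MSG_FINISH = 1


def worker_wait(group: dist.ProcessGroup, src: int = 0) -> int:
    """Block until the coordinator broadcasts a message code.

    With a blocking backend such as Gloo this call only returns when rank
    `src` actually calls broadcast, or raises once the group's collective
    timeout elapses; there is no way to wake up early on failure or
    cancellation.
    """
    msg = torch.zeros(1, dtype=torch.int64)
    dist.broadcast(msg, src=src, group=group)
    return int(msg.item())
```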
Problem

If we set the default timeout to None, users will end up seeing random timeouts in different places.
Discussion

A better long-term solution would be to use a distributed messaging framework that is more appropriate for sending work and finish messages. However, introducing another communication layer into fast_llm is likely outside the scope of this PR.

Proposal