[distributed][Tensor Parallelism] Implement early transforms for column-wise and row-wise linear and embedding #410
Conversation
Amazing work @crcrpar! Mostly nitpicks; it was great fun reviewing this.
Super good, LGTM. Very excited to see test_tensor_parallel_both_column_and_row be the first actual example of composing early_transforms! I added a few minor nits.
Signed-off-by: Masaki Kozuki <mkozuki@nvidia.com>
This is to avoid passing preprocessed input into other ops when they are supposed to take the original input. For example, suppose we have two embeddings, one column-parallel and the other not; the previous implementation modified the input regardless of each embedding's parallelism. Signed-off-by: Masaki Kozuki <mkozuki@nvidia.com>
Looks great! Ship it! 🚀
This implements a trace transform that converts one or more linear and/or embedding layers into column-wise or row-wise tensor-parallel ones by (1) sharding their weight and bias and (2) inserting the needed communication and/or input scattering before and/or after the modified layers.
Of the four supported ops, only row-wise parallel linear leads to a BoundSymbol modification: the bias term is omitted from the sharded op, and the bias is instead added to the result of the communication (after post-processing). (example)
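The sharding and reduction scheme described above can be sketched in a single process. This is a minimal illustration, not Thunder's actual transform: the in-process `cat`/`sum` below stand in for the all-gather and all-reduce collectives a real tensor-parallel run would issue, and all names are hypothetical. It also shows why row-wise linear must drop the bias from the sharded op: the bias can only be added once, after the reduction.

```python
# Hypothetical single-process sketch of column-wise and row-wise
# tensor-parallel linear; real TP would use torch.distributed collectives
# in place of the in-process concatenation/summation below.
import torch

torch.manual_seed(0)
world_size = 2
x = torch.randn(4, 8)       # (batch, in_features)
weight = torch.randn(6, 8)  # (out_features, in_features)
bias = torch.randn(6)

reference = torch.nn.functional.linear(x, weight, bias)

# Column-wise parallel: shard weight and bias along out_features.
# Each "rank" computes a slice of the output; an all-gather
# (here: torch.cat) reconstructs the full result.
w_cols = weight.chunk(world_size, dim=0)
b_cols = bias.chunk(world_size, dim=0)
col_out = torch.cat(
    [torch.nn.functional.linear(x, w, b) for w, b in zip(w_cols, b_cols)],
    dim=-1,
)

# Row-wise parallel: shard weight along in_features and scatter the input.
# Each "rank" produces a partial sum; an all-reduce (here: sum) combines
# them, and the bias is added once, after the reduction -- adding it per
# shard would count it world_size times.
w_rows = weight.chunk(world_size, dim=1)
x_rows = x.chunk(world_size, dim=-1)
partials = [torch.nn.functional.linear(xi, wi) for xi, wi in zip(x_rows, w_rows)]
row_out = torch.stack(partials).sum(dim=0) + bias

assert torch.allclose(col_out, reference, atol=1e-5)
assert torch.allclose(row_out, reference, atol=1e-5)
```

Both variants reproduce the unsharded result; only the row-wise path needs the bias deferred past the reduction, which matches the BoundSymbol change described above.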
cc @Borda @apaz-cli @carmocca @awaelchli @crcrpar