fix: skip compressed allreduce for empty tensors #7769
When the buffer has numel==0, scaling divides by sqrt(0) and the
pack/unpack path becomes undefined. Early-return in the compressed
allreduce backends and clear error buffers to avoid NaNs and useless
communication.

Test plan:
- Not run (torch not available in this environment)

Signed-off-by: T1mn <136770748@qq.com>
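The failure mode described in the commit message can be sketched without torch. The snippet below is only an illustration: it assumes the scale has the form `norm / sqrt(numel)` (the commit message says only that scaling divides by sqrt(0)), and the names `compression_scale` and `unguarded_scale` are hypothetical, not taken from the PR. With plain Python floats the unguarded division raises `ZeroDivisionError`; with tensors the same 0/0 division would instead produce NaN.

```python
import math


def unguarded_scale(values):
    """Hypothetical scale computation with no empty-buffer check."""
    norm = math.sqrt(sum(v * v for v in values))
    # For an empty buffer this is 0.0 / 0.0.
    return norm / math.sqrt(len(values))


def compression_scale(values):
    """Same computation, but return None early for an empty buffer.

    Assumption: scale = ||values||_2 / sqrt(numel). The PR text does not
    spell out the formula.
    """
    numel = len(values)
    if numel == 0:
        # Nothing to scale and nothing worth communicating.
        return None
    norm = math.sqrt(sum(v * v for v in values))
    return norm / math.sqrt(numel)
```

A caller would treat a `None` result as the signal to skip compression and communication for that buffer.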
8cfe2c8 to 84be831
Hi @T1mn, thank you for your fix! As we need the same code in four files, can you create a helper like the following?

def check_and_handle_empty_buffer(
    buffer_m: torch.Tensor,
    original_shape: torch.Size,
    original_size: int,
    worker_error: torch.Tensor,
    server_error: torch.Tensor,
) -> Optional[torch.Tensor]:
    if original_size == 0:
        if worker_error.numel():
            worker_error.zero_()
        if server_error.numel():
            server_error.zero_()
        if len(original_shape) > 1:
            return buffer_m.reshape(original_shape)
        return buffer_m
    return None

Then you can do

result = check_and_handle_empty_buffer(
    buffer_m, original_shape, original_size, worker_error, server_error
)
if result is not None:
    return result
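Since the test plan notes that torch was not available, the helper's contract can still be exercised with a stand-in object. The sketch below is an assumption-laden illustration, not the PR's test: `FakeTensor` is a hypothetical class exposing only the three methods the helper calls (`numel`, `zero_`, `reshape`), and the helper body is copied from the suggestion above with the torch type annotations dropped.

```python
from typing import List, Tuple


class FakeTensor:
    """Hypothetical stand-in for torch.Tensor; not part of the PR."""

    def __init__(self, data: List[float], shape: Tuple[int, ...]):
        self.data = list(data)
        self.shape = tuple(shape)

    def numel(self) -> int:
        return len(self.data)

    def zero_(self) -> "FakeTensor":
        # In-place zeroing, mirroring the trailing-underscore convention.
        self.data = [0.0] * len(self.data)
        return self

    def reshape(self, shape) -> "FakeTensor":
        return FakeTensor(self.data, tuple(shape))


def check_and_handle_empty_buffer(buffer_m, original_shape, original_size,
                                  worker_error, server_error):
    # Same logic as the reviewer's suggestion, without torch annotations.
    if original_size == 0:
        if worker_error.numel():
            worker_error.zero_()
        if server_error.numel():
            server_error.zero_()
        if len(original_shape) > 1:
            return buffer_m.reshape(original_shape)
        return buffer_m
    return None


# Empty buffer: error buffers are cleared and the buffer comes back reshaped.
worker = FakeTensor([1.0, -2.0], (2,))
server = FakeTensor([0.5], (1,))
empty_result = check_and_handle_empty_buffer(
    FakeTensor([], (0,)), (0, 4), 0, worker, server
)

# Non-empty buffer: the helper returns None and the caller proceeds as usual.
full_result = check_and_handle_empty_buffer(
    FakeTensor([1.0, 2.0], (2,)), (2,), 2,
    FakeTensor([3.0], (1,)), FakeTensor([4.0], (1,)),
)
```

Returning `None` for the non-empty case lets each backend keep a single `if result is not None: return result` guard at the top of its allreduce path.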
725f0ed to 94a093d
Hi @tohtana, thanks for the detailed suggestion. I’ve followed your guidance and extracted the empty-buffer handling into

Thank you @T1mn! This looks good. Can you fix the formatting?
94a093d to a9bcbb3
factor the empty-buffer early return into a small helper so the four backends stay consistent.

Signed-off-by: T1mn <136770748@qq.com>
a9bcbb3 to 2c6e42c
I’ve run the formatter and pushed the updated commit. Please take another look when you have a chance.

Hi @tohtana, I may have missed some formatting details; I ran the formatter and pushed the update.
Handle empty buffers in compressed allreduce by early-return and clearing error buffers to avoid NaNs and needless communication.

---------

Signed-off-by: T1mn <136770748@qq.com>
Co-authored-by: Masahiro Tanaka <81312776+tohtana@users.noreply.github.com>
Co-authored-by: Olatunji Ruwase <tunji.ruwase@snowflake.com>
Signed-off-by: Phalani Paladugu <mailofphalani@gmail.com>