
Why is allgather's busbw a little worse than allreduce/reducescatter for the same nccl environment variables #1281

Open
pkuleo opened this issue May 10, 2024 · 1 comment

pkuleo commented May 10, 2024

Why is allgather's busbw a little worse than allreduce/reducescatter for the same environment variables (e.g. same number of channels)?

For example, in nccl-tests results on H100, reduce-scatter's busbw is 360 GB/s, while allgather's busbw is only 350 GB/s.
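I was running nccl-tests roughly like this (the message sizes and GPU count below are just representative placeholders, not my exact command lines):

```
./build/reduce_scatter_perf -b 8 -e 8G -f 2 -g 8
./build/all_gather_perf     -b 8 -e 8G -f 2 -g 8
```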

Does this have anything to do with the processing efficiency of the kernel functions? Intuitively, allgather only needs to copy data, whereas reduce-scatter needs both copies and compute (the reduction), so shouldn't allgather be faster?

I'm not good at kernel performance analysis, so I hope you can point out where I'm wrong. Thanks.

jbachan (Collaborator) commented May 10, 2024

There isn't a known reason for this. The CUDA compiler may make better choices when building reduce_scatter than allgather. We frequently deal with innocuous changes to the code regressing some ops but improving others, so we just see it as noise in the compiler and move on.

The only interesting difference between the two is that allgather is compiled only for byte elements, while reduce_scatter has a version compiled for every datatype (and reduction op). Even though the hot path for well-aligned (16-byte) data should be equivalent between the two, the slow path for allgather will be much worse than the slow path for reduce_scatter f32, since the latter can assume 4-byte alignment while the former cannot. While the slow path isn't taken in nccl-tests, it might be bloating the icache or something.
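To make the alignment point a bit more concrete, here is a rough sketch (definitely not NCCL's actual kernels; the names and structure are made up) of why a copy compiled only for bytes needs a worse fallback than one compiled for float, even when the 16-byte fast path is identical:

```cuda
// Illustrative sketch only -- not NCCL's code.
#include <cuda_runtime.h>
#include <stdint.h>

// Copy treating the buffer as raw bytes (how a kernel compiled only for byte
// elements has to think about its data): the fallback can assume nothing
// better than 1-byte alignment, so it moves one byte per iteration.
__global__ void copy_as_bytes(uint8_t* dst, const uint8_t* src, size_t nbytes) {
  size_t tid    = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
  size_t stride = (size_t)gridDim.x * blockDim.x;
  bool aligned16 = (((uintptr_t)dst | (uintptr_t)src | nbytes) & 15) == 0;
  if (aligned16) {
    // Fast path: 16-byte vector copies, same cost as the float version below.
    const float4* s = reinterpret_cast<const float4*>(src);
    float4*       d = reinterpret_cast<float4*>(dst);
    for (size_t i = tid; i < nbytes / 16; i += stride) d[i] = s[i];
  } else {
    // Slow path: byte-granularity copies.
    for (size_t i = tid; i < nbytes; i += stride) dst[i] = src[i];
  }
}

// Copy treating the buffer as float (how a kernel compiled for f32 can think
// about its data): even the fallback may assume 4-byte alignment, so it moves
// a whole element per iteration.
__global__ void copy_as_floats(float* dst, const float* src, size_t nelem) {
  size_t tid    = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
  size_t stride = (size_t)gridDim.x * blockDim.x;
  bool aligned16 =
      (((uintptr_t)dst | (uintptr_t)src | (nelem * sizeof(float))) & 15) == 0;
  if (aligned16) {
    // Same 16-byte fast path as above.
    const float4* s = reinterpret_cast<const float4*>(src);
    float4*       d = reinterpret_cast<float4*>(dst);
    for (size_t i = tid; i < nelem / 4; i += stride) d[i] = s[i];
  } else {
    // Slow path: still 4 bytes at a time -- much better than the byte case.
    for (size_t i = tid; i < nelem; i += stride) dst[i] = src[i];
  }
}
```

With the large, well-aligned buffers nccl-tests uses, only the fast path should run, which is why the remaining gap is more likely compiler/icache noise than the fallback itself.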
