Why NVLS not available on 4 or more machines? #1031

Closed
holmes313 opened this issue Oct 21, 2023 · 2 comments

Comments

@holmes313

I enabled NVLS on 4 H100 servers, each with 8 GPUs, but saw no acceleration: the all_reduce bus bandwidth (busbw) is 180 GB/s.
I then tested NVLS on 2 servers, where the all_reduce bus bandwidth is 310 GB/s.
Why is NVLS not available on 4 or more machines?
Thanks.

@AddyLaddy
Collaborator

NVLS is an acronym for NVLink SHARP.
NVLink SHARP is an AllReduce offload system over NVLink only.
Currently NVLink is only supported within a single node (e.g. 8 or 16 GPUs).
Scaling beyond a single node requires IB or RoCE.
There is a similar AllReduce offload system for IB networks which we now call IB SHARP.

The performance reported by the NCCL tests on 2 nodes can be misleading when comparing it to >2 nodes unless you set NCCL_ALGO=RING when testing on 2 nodes.
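As background for that comparison, here is a minimal sketch of how nccl-tests derives all_reduce bus bandwidth from algorithm bandwidth, using the factor 2(n-1)/n for n ranks. The function name `allreduce_busbw` is hypothetical and only for illustration. The factor comes from ring all_reduce traffic, so a run where NCCL picks a different algorithm may report a busbw that is not directly comparable.

```python
def allreduce_busbw(algbw_gbs: float, nranks: int) -> float:
    """Bus bandwidth as nccl-tests defines it for all_reduce:
    busbw = algbw * 2 * (n - 1) / n, where n is the number of ranks."""
    if nranks < 2:
        raise ValueError("all_reduce bus bandwidth needs at least 2 ranks")
    return algbw_gbs * 2 * (nranks - 1) / nranks


# Conversion factors for the setups in this thread (8 GPUs per node).
factor_2_nodes = allreduce_busbw(1.0, 16)  # 2 nodes -> 16 ranks
factor_4_nodes = allreduce_busbw(1.0, 32)  # 4 nodes -> 32 ranks
print(factor_2_nodes, factor_4_nodes)
```

The factor barely changes between 16 and 32 ranks, so a large gap in reported busbw between 2 and 4 nodes is more likely to come from the algorithm chosen or the network than from the formula itself.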

How many NICs are there per node and what speed are they?
Is this an IB or RoCE network?
What is the node architecture?

Perhaps you can attach the NCCL_DEBUG=INFO log for us to examine.
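One way to collect that log is to set the variables in the environment passed to the test launcher. The sketch below is only an illustration: the helper name `nccl_debug_env` and the log file pattern are assumptions, and how the variables reach remote ranks depends on the launcher in use. `NCCL_DEBUG_FILE` accepts `%h` (hostname) and `%p` (process ID), which keeps one file per rank.

```python
import os


def nccl_debug_env(log_dir: str, force_ring: bool = False) -> dict:
    """Return a copy of the current environment with NCCL debug logging on.

    log_dir and the file name pattern are illustrative choices, not defaults.
    """
    env = dict(os.environ)
    env["NCCL_DEBUG"] = "INFO"
    # %h expands to the hostname and %p to the PID, giving one log per rank.
    env["NCCL_DEBUG_FILE"] = os.path.join(log_dir, "nccl.%h.%p.log")
    if force_ring:
        # Force the ring algorithm, e.g. for a 2-node baseline run.
        env["NCCL_ALGO"] = "RING"
    return env


env = nccl_debug_env("/tmp/nccl-logs", force_ring=True)
print(env["NCCL_DEBUG"], env["NCCL_DEBUG_FILE"], env["NCCL_ALGO"])
```

The returned dictionary can be handed to `subprocess.run(..., env=env)` together with whatever launch command the cluster uses.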

@holmes313
Author

Thanks for your reply. I tested it again, and NVLS works with fewer than 4 nodes. But I have a new question:
Each node has 8 dual-port CX7 NICs, and each port runs at 200 Gb/s. The network is in RoCE mode. I found that nccl-tests hangs when the number of nodes is >= 4 and NCCL_NVLS_ENABLE=1 and NCCL_ALGO=NVLSTree are set.
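
For reference, the line-rate ceiling implied by this NIC configuration can be worked out as below. This is a back-of-the-envelope sketch that assumes all 16 ports carry NCCL traffic and ignores protocol overhead; the constant names are only for illustration.

```python
NICS_PER_NODE = 8       # dual-port CX7 adapters per node
PORTS_PER_NIC = 2
PORT_GBIT_PER_S = 200   # line rate per port, in Gb/s

# Aggregate line rate per node, in Gb/s and in GB/s (8 bits per byte).
total_gbit_per_s = NICS_PER_NODE * PORTS_PER_NIC * PORT_GBIT_PER_S
total_gbyte_per_s = total_gbit_per_s / 8

print(total_gbit_per_s, total_gbyte_per_s)
```

Under these assumptions the per-node ceiling is 3200 Gb/s, or 400 GB/s, which gives a scale for the 180 GB/s and 310 GB/s figures reported earlier.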
