I enabled NVLS on 4 H100 servers, each with 8 GPUs, but I see no acceleration: the all_reduce bus bandwidth (busbw) is 180 GB/s.
Then I tested NVLS on 2 servers, and the all_reduce busbw is 310 GB/s.
Why is NVLS not available on 4 or more machines?
Thanks.
NVLS is an acronym for NVLink SHARP.
NVLink SHARP is an AllReduce offload system over NVLink only.
Currently NVLink is only supported within a single node (e.g. 8 or 16 GPUs).
Scaling beyond a single node requires IB or RoCE.
There is a similar AllReduce offload system for IB networks which we now call IB SHARP.
The performance reported by the NCCL tests on 2 nodes can be misleading when comparing it to >2 nodes unless you set NCCL_ALGO=RING when testing on 2 nodes.
How many NICs are there per node and what speed are they?
Is this an IB or RoCE network?
What is the node architecture?
Perhaps you can attach the NCCL_DEBUG=INFO log for us to examine.
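
A minimal sketch of how such a run could be set up, assuming Python is used as the launcher. The helper names `nccl_env` and `all_reduce_perf_cmd` and the binary path `./build/all_reduce_perf` are illustrative assumptions, not part of NCCL or nccl-tests. The environment variables (`NCCL_DEBUG`, `NCCL_ALGO`) and the `-b`, `-e`, `-f`, `-g` flags are the ones nccl-tests accepts. Multi-node launching (for example through MPI) is left out.

```python
import os
import shlex


def nccl_env(algo=None, debug="INFO", base=None):
    """Return a copy of the environment with NCCL settings applied.

    algo  -- value for NCCL_ALGO (e.g. "RING"); left unset when None
    debug -- value for NCCL_DEBUG
    base  -- environment to start from; defaults to os.environ
    """
    env = dict(os.environ if base is None else base)
    env["NCCL_DEBUG"] = debug
    if algo is not None:
        env["NCCL_ALGO"] = algo
    return env


def all_reduce_perf_cmd(binary="./build/all_reduce_perf",
                        min_bytes="8", max_bytes="8G", gpus_per_proc=1):
    """Build an all_reduce_perf command line (binary path is an assumption)."""
    return [binary, "-b", min_bytes, "-e", max_bytes,
            "-f", "2", "-g", str(gpus_per_proc)]


if __name__ == "__main__":
    env = nccl_env(algo="RING")
    cmd = all_reduce_perf_cmd()
    # Print the equivalent shell line rather than launching anything here.
    print("NCCL_DEBUG=%s NCCL_ALGO=%s %s"
          % (env["NCCL_DEBUG"], env["NCCL_ALGO"], shlex.join(cmd)))
```

Running the 2-node and 4-node tests with the same `NCCL_ALGO` value, and capturing the output of each, should make the two results easier to compare.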
Thanks for your reply. I tested it again, and NVLS works with fewer than 4 nodes. But I have a new question:
Each node has 8 dual-port CX7 NICs, and each port runs at 200 Gb/s. The network is in RoCE mode. I found that nccl-tests hangs when there are 4 or more nodes and I set NCCL_NVLS_ENABLE=1 and NCCL_ALGO=NVLSTree.
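
For reference, the NIC configuration described above can be turned into an aggregate per-node figure with some simple arithmetic. This is a rough sketch, assuming all 16 ports carry NCCL traffic at line rate. The function names are hypothetical. The second helper applies the all_reduce bus bandwidth factor documented by nccl-tests, `busbw = algbw * 2 * (n - 1) / n`, where `n` is the number of ranks.

```python
def aggregate_nic_gbytes_per_s(nics, ports_per_nic, gbits_per_port):
    """Aggregate per-node NIC bandwidth in GB/s (8 bits per byte)."""
    return nics * ports_per_nic * gbits_per_port / 8.0


def allreduce_busbw_factor(ranks):
    """Factor nccl-tests applies to algbw to report all_reduce busbw."""
    return 2.0 * (ranks - 1) / ranks


# 8 dual-port NICs at 200 Gb/s per port
print(aggregate_nic_gbytes_per_s(8, 2, 200))   # 400.0
# 2 nodes x 8 GPUs and 4 nodes x 8 GPUs
print(allreduce_busbw_factor(16))              # 1.875
print(allreduce_busbw_factor(32))              # 1.9375
```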