
How could I carefully control which NIC to use when running ring-based collective operation? #687

Open
TarzanZhao opened this issue May 31, 2022 · 17 comments


@TarzanZhao

TarzanZhao commented May 31, 2022

I want to run multiple broadcasts concurrently. They will all send data into one host with multiple NICs. I do not want one single NIC to be the bottleneck. So I prefer to let these broadcasts use different NICs. How could I carefully control this?

Also, is the order in which a broadcast sends data decided when the communicator is created, or on the fly at run time? This also matters for configuring which NIC is used.

Thanks!

@sjeaugey
Member

I'm not sure I understand the problem. Are all GPUs part of the communicator?

@TarzanZhao
Author

Each broadcast has its own communicator that involves all devices used in this broadcast.

@TarzanZhao
Author

A simple example: we have 2 hosts; each host has two devices, and each device has its own NIC. Broadcast A sends data from (host0, device0) to (host1, device0) and (host1, device1), whereas broadcast B sends data from (host0, device1) to (host1, device0) and (host1, device1). If both broadcasts use the first NIC (the one attached to the first device on host1), they cannot run concurrently, which will be slow. So I want the two broadcasts to use different NICs when entering host1.
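To put numbers on that contention, here is a toy model (my own illustration, assuming one ring per broadcast and a nominal 25 GB/s per NIC, not figures from this thread):

```python
# Toy model of the NIC contention described above: each broadcast drives
# one ring into host1, and rings sharing a NIC split its bandwidth evenly.
def ring_bandwidth(rings_on_nic: int, nic_bw_gbps: float) -> float:
    """Per-ring bandwidth when `rings_on_nic` rings share one NIC."""
    return nic_bw_gbps / rings_on_nic

# Both broadcasts funneled through host1's first NIC: each gets half.
shared = ring_bandwidth(2, 25.0)    # 12.5 GB/s per broadcast
# One broadcast per NIC: each gets the full link.
separate = ring_bandwidth(1, 25.0)  # 25.0 GB/s per broadcast
```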

@sjeaugey
Member

sjeaugey commented May 31, 2022

I see. With a recent NCCL, if both GPUs are at the same distance from both NICs, I think each GPU would use a different NIC. Could you provide the node topology by setting NCCL_TOPO_DUMP_FILE=system.txt?

Edit: that might work on the "source" node but on the "destination" node, it would not. One trick would be to force all communicators to use both NICs. For that, you could edit the topology to set the network speed to half of what it is.
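For illustration, a minimal sketch of that edit (assuming the dumped XML exposes NIC bandwidth as a speed="..." attribute on its <net> nodes, which is what recent NCCL topology dumps use; the file names and helper below are hypothetical). The edited file is fed back via NCCL_TOPO_FILE:

```python
# Hypothetical sketch: halve every advertised NIC speed in a topology
# dumped with NCCL_TOPO_DUMP_FILE=system.txt, then point NCCL at the
# edited file with NCCL_TOPO_FILE=system_half.txt for the real run.
import xml.etree.ElementTree as ET

def halve_net_speeds(in_path: str, out_path: str) -> int:
    """Rewrite the topology file with all <net> speeds halved.

    Returns the number of NICs touched."""
    tree = ET.parse(in_path)
    touched = 0
    for net in tree.getroot().iter("net"):
        if "speed" in net.attrib:
            net.set("speed", str(float(net.get("speed")) / 2))
            touched += 1
    tree.write(out_path)
    return touched

# Usage (assumed paths):
#   halve_net_speeds("system.txt", "system_half.txt")
#   then launch with NCCL_TOPO_FILE=system_half.txt
```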

@TarzanZhao
Author

Sorry, I do not understand this trick. What does "force all communicators to use both NICs" mean? In a single broadcast we will only have one ring and thus use only one NIC.

Also, what does "edit the topology" mean? How could I edit it?

I am designing the algorithm and have not started coding or running experiments yet, so I do not have a topology to export.

@sjeaugey
Member

I thought you wanted to create two communicators, each having 3 ranks, one GPU on the "source" node and the two GPUs on the "destination" node, then use ncclBroadcast. Is that right or did I misunderstand?

@TarzanZhao
Author

Yes, you understood my example exactly. My general question is how to finely control which NIC an NCCL collective operation uses.

@sjeaugey
Member

You can't. But we may be able to find tricks to still get good performance. Can you run the all_reduce_perf test on all 4 GPUs with NCCL_TOPO_DUMP_FILE=system.txt set and post the result here? That would tell me which strategy is best.

@TarzanZhao
Author

Thanks! This is just a hypothetical example.

@kimtaehoon-dev

kimtaehoon-dev commented Nov 8, 2022

@sjeaugey Hello! While reading this issue I came across your comment: "With a recent NCCL, if both GPUs are at the same distance from both NICs, I think each GPU would use a different NIC."

But when I tested with nccl-tests, it did not work like that. Let me describe my test.

First, the test environment:

- 2 GPU nodes (each node has 8 GPU cards)
- Each node has 8 InfiniBand HCAs (ConnectX-6)
- NCCL 2.11.4-1+cuda11.4

For the test, I enabled SR-IOV and added 4 VFs to 1 PF. This is the lspci result:

$ lspci | grep Mella
0e:00.0 Infiniband controller: Mellanox Technologies MT2892 Family [ConnectX-6 Dx]
11:00.0 Infiniband controller: Mellanox Technologies MT2892 Family [ConnectX-6 Dx]
51:00.0 Infiniband controller: Mellanox Technologies MT2892 Family [ConnectX-6 Dx]
51:00.1 Infiniband controller: Mellanox Technologies ConnectX Family mlx5Gen Virtual Function
51:00.2 Infiniband controller: Mellanox Technologies ConnectX Family mlx5Gen Virtual Function
51:00.3 Infiniband controller: Mellanox Technologies ConnectX Family mlx5Gen Virtual Function
51:00.4 Infiniband controller: Mellanox Technologies ConnectX Family mlx5Gen Virtual Function
52:00.0 Infiniband controller: Mellanox Technologies MT2892 Family [ConnectX-6 Dx]
89:00.0 Infiniband controller: Mellanox Technologies MT2892 Family [ConnectX-6 Dx]
8c:00.0 Infiniband controller: Mellanox Technologies MT2892 Family [ConnectX-6 Dx]
a7:00.0 Infiniband controller: Mellanox Technologies MT28908 Family [ConnectX-6]
a7:00.1 Infiniband controller: Mellanox Technologies MT28908 Family [ConnectX-6]
c6:00.0 Infiniband controller: Mellanox Technologies MT2892 Family [ConnectX-6 Dx]
c9:00.0 Infiniband controller: Mellanox Technologies MT2892 Family [ConnectX-6 Dx]
  • 51:00.0 device is PF (device name mlx5_0)
  • 51:00.1 device is VF (device name mlx5_10)
  • 51:00.2 device is VF (device name mlx5_11)
  • 51:00.3 device is VF (device name mlx5_12)
  • 51:00.4 device is VF (device name mlx5_13)

And this is the nvidia-smi topo -m result:

	GPU0	GPU1	GPU2	GPU3	GPU4	GPU5	GPU6	GPU7	mlx5_0	mlx5_1	mlx5_2	mlx5_3	mlx5_4	mlx5_5	mlx5_6	mlx5_7	mlx5_8	mlx5_9	mlx5_10	mlx5_11	mlx5_12	mlx5_13	CPU Affinity	NUMA Affinity
GPU0	 X 	NV12	NV12	NV12	NV12	NV12	NV12	NV12	NODE	NODE	PXB	PXB	SYS	SYS	SYS	SYS	SYS	SYS	NODE	NODE	NODE	NODE	0-63,128-191	0
GPU1	NV12	 X 	NV12	NV12	NV12	NV12	NV12	NV12	NODE	NODE	PXB	PXB	SYS	SYS	SYS	SYS	SYS	SYS	NODE	NODE	NODE	NODE	0-63,128-191	0
GPU2	NV12	NV12	 X 	NV12	NV12	NV12	NV12	NV12	PXB	PXB	NODE	NODE	SYS	SYS	SYS	SYS	SYS	SYS	PXB	PXB	PXB	PXB	0-63,128-191	0
GPU3	NV12	NV12	NV12	 X 	NV12	NV12	NV12	NV12	PXB	PXB	NODE	NODE	SYS	SYS	SYS	SYS	SYS	SYS	PXB	PXB	PXB	PXB	0-63,128-191	0
GPU4	NV12	NV12	NV12	NV12	 X 	NV12	NV12	NV12	SYS	SYS	SYS	SYS	NODE	NODE	NODE	NODE	PXB	PXB	SYS	SYS	SYS	SYS	64-127,192-254	1
GPU5	NV12	NV12	NV12	NV12	NV12	 X 	NV12	NV12	SYS	SYS	SYS	SYS	NODE	NODE	NODE	NODE	PXB	PXB	SYS	SYS	SYS	SYS	64-127,192-254	1
GPU6	NV12	NV12	NV12	NV12	NV12	NV12	 X 	NV12	SYS	SYS	SYS	SYS	PXB	PXB	NODE	NODE	NODE	NODE	SYS	SYS	SYS	SYS	64-127,192-254	1
GPU7	NV12	NV12	NV12	NV12	NV12	NV12	NV12	 X 	SYS	SYS	SYS	SYS	PXB	PXB	NODE	NODE	NODE	NODE	SYS	SYS	SYS	SYS	64-127,192-254	1
mlx5_0	NODE	NODE	PXB	PXB	SYS	SYS	SYS	SYS	 X 	PIX	NODE	NODE	SYS	SYS	SYS	SYS	SYS	SYS	PIX	PIX	PIX	PIX
mlx5_1	NODE	NODE	PXB	PXB	SYS	SYS	SYS	SYS	PIX	 X 	NODE	NODE	SYS	SYS	SYS	SYS	SYS	SYS	PIX	PIX	PIX	PIX
mlx5_2	PXB	PXB	NODE	NODE	SYS	SYS	SYS	SYS	NODE	NODE	 X 	PXB	SYS	SYS	SYS	SYS	SYS	SYS	NODE	NODE	NODE	NODE
mlx5_3	PXB	PXB	NODE	NODE	SYS	SYS	SYS	SYS	NODE	NODE	PXB	 X 	SYS	SYS	SYS	SYS	SYS	SYS	NODE	NODE	NODE	NODE
mlx5_4	SYS	SYS	SYS	SYS	NODE	NODE	PXB	PXB	SYS	SYS	SYS	SYS	 X 	PXB	NODE	NODE	NODE	NODE	SYS	SYS	SYS	SYS
mlx5_5	SYS	SYS	SYS	SYS	NODE	NODE	PXB	PXB	SYS	SYS	SYS	SYS	PXB	 X 	NODE	NODE	NODE	NODE	SYS	SYS	SYS	SYS
mlx5_6	SYS	SYS	SYS	SYS	NODE	NODE	NODE	NODE	SYS	SYS	SYS	SYS	NODE	NODE	 X 	PIX	NODE	NODE	SYS	SYS	SYS	SYS
mlx5_7	SYS	SYS	SYS	SYS	NODE	NODE	NODE	NODE	SYS	SYS	SYS	SYS	NODE	NODE	PIX	 X 	NODE	NODE	SYS	SYS	SYS	SYS
mlx5_8	SYS	SYS	SYS	SYS	PXB	PXB	NODE	NODE	SYS	SYS	SYS	SYS	NODE	NODE	NODE	NODE	 X 	PXB	SYS	SYS	SYS	SYS
mlx5_9	SYS	SYS	SYS	SYS	PXB	PXB	NODE	NODE	SYS	SYS	SYS	SYS	NODE	NODE	NODE	NODE	PXB	 X 	SYS	SYS	SYS	SYS
mlx5_10	NODE	NODE	PXB	PXB	SYS	SYS	SYS	SYS	PIX	PIX	NODE	NODE	SYS	SYS	SYS	SYS	SYS	SYS	 X 	PIX	PIX	PIX
mlx5_11	NODE	NODE	PXB	PXB	SYS	SYS	SYS	SYS	PIX	PIX	NODE	NODE	SYS	SYS	SYS	SYS	SYS	SYS	PIX	 X 	PIX	PIX
mlx5_12	NODE	NODE	PXB	PXB	SYS	SYS	SYS	SYS	PIX	PIX	NODE	NODE	SYS	SYS	SYS	SYS	SYS	SYS	PIX	PIX	 X 	PIX
mlx5_13	NODE	NODE	PXB	PXB	SYS	SYS	SYS	SYS	PIX	PIX	NODE	NODE	SYS	SYS	SYS	SYS	SYS	SYS	PIX	PIX	PIX	 X

If I run nccl-tests with the command below:

mpirun -v -H {{ node ip }}:4,{{ node ip }}:4 -map-by slot --mca btl ^openib --mca btl_tcp_if_include bond0 -x NCCL_IB_HCA==mlx5_10:1,mlx5_11:1,mlx5_12:1,mlx5_13:1 -x NCCL_DEBUG=INFO \
{{ nccl-tests path }}/build/all_reduce_perf -b 10G -e 20G -f 2 -c 0 -n 20 -w 5 -t 1 -g 1

I expected the 4 processes on each node to each use one of the InfiniBand HCAs mlx5_10, 11, 12, 13 (with no duplicates, e.g. process A uses mlx5_10, process B uses mlx5_11, and so on), but only mlx5_10 is used!
This is the NCCL standard output:

# nThread 1 nGpus 1 minBytes 10737418240 maxBytes 21474836480 step: 2(factor) warmup iters: 5 iters: 20 agg iters: 1 validation: 0 graph: 0
#
# Using devices
#  Rank  0 Group  0 Pid 985924 on cosmos-hpc-100a45 device  0 [0x07] NVIDIA A100-SXM4-80GB
#  Rank  1 Group  0 Pid 985925 on cosmos-hpc-100a45 device  1 [0x0a] NVIDIA A100-SXM4-80GB
#  Rank  2 Group  0 Pid 985926 on cosmos-hpc-100a45 device  2 [0x44] NVIDIA A100-SXM4-80GB
#  Rank  3 Group  0 Pid 985927 on cosmos-hpc-100a45 device  3 [0x4a] NVIDIA A100-SXM4-80GB
#  Rank  4 Group  0 Pid 3138488 on cosmos-hpc-100a55 device  0 [0x07] NVIDIA A100-SXM4-80GB
#  Rank  5 Group  0 Pid 3138489 on cosmos-hpc-100a55 device  1 [0x0a] NVIDIA A100-SXM4-80GB
#  Rank  6 Group  0 Pid 3138490 on cosmos-hpc-100a55 device  2 [0x44] NVIDIA A100-SXM4-80GB
#  Rank  7 Group  0 Pid 3138491 on cosmos-hpc-100a55 device  3 [0x4a] NVIDIA A100-SXM4-80GB
cosmos-hpc-100a45:985924:985924 [0] NCCL INFO Bootstrap : Using ib0:10.1.21.11<0>
cosmos-hpc-100a45:985924:985924 [0] NCCL INFO NET/Plugin : Plugin load returned 17 : libnccl-net.so: cannot open shared object file: No such file or directory.
cosmos-hpc-100a45:985924:985924 [0] NCCL INFO NET/IB : Using [0]mlx5_10:1/IB [1]mlx5_11:1/IB [2]mlx5_12:1/IB [3]mlx5_13:1/IB ; OOB ib0:10.1.21.11<0>
cosmos-hpc-100a45:985924:985924 [0] NCCL INFO Using network IB
NCCL version 2.11.4+cuda11.4
cosmos-hpc-100a45:985926:985926 [2] NCCL INFO Bootstrap : Using ib0:10.1.21.11<0>
cosmos-hpc-100a45:985926:985926 [2] NCCL INFO NET/Plugin : No plugin found (libnccl-net.so), using internal implementation
cosmos-hpc-100a45:985925:985925 [1] NCCL INFO Bootstrap : Using ib0:10.1.21.11<0>
cosmos-hpc-100a45:985925:985925 [1] NCCL INFO NET/Plugin : No plugin found (libnccl-net.so), using internal implementation
cosmos-hpc-100a45:985927:985927 [3] NCCL INFO Bootstrap : Using ib0:10.1.21.11<0>
cosmos-hpc-100a45:985927:985927 [3] NCCL INFO NET/Plugin : No plugin found (libnccl-net.so), using internal implementation
cosmos-hpc-100a55:3138488:3138488 [0] NCCL INFO Bootstrap : Using ib0:10.1.24.11<0>
cosmos-hpc-100a55:3138488:3138488 [0] NCCL INFO NET/Plugin : No plugin found (libnccl-net.so), using internal implementation
cosmos-hpc-100a45:985926:985926 [2] NCCL INFO NET/IB : Using [0]mlx5_10:1/IB [1]mlx5_11:1/IB [2]mlx5_12:1/IB [3]mlx5_13:1/IB ; OOB ib0:10.1.21.11<0>
cosmos-hpc-100a45:985926:985926 [2] NCCL INFO Using network IB
cosmos-hpc-100a45:985927:985927 [3] NCCL INFO NET/IB : Using [0]mlx5_10:1/IB [1]mlx5_11:1/IB [2]mlx5_12:1/IB [3]mlx5_13:1/IB ; OOB ib0:10.1.21.11<0>
cosmos-hpc-100a45:985927:985927 [3] NCCL INFO Using network IB
cosmos-hpc-100a45:985925:985925 [1] NCCL INFO NET/IB : Using [0]mlx5_10:1/IB [1]mlx5_11:1/IB [2]mlx5_12:1/IB [3]mlx5_13:1/IB ; OOB ib0:10.1.21.11<0>
cosmos-hpc-100a45:985925:985925 [1] NCCL INFO Using network IB
cosmos-hpc-100a55:3138489:3138489 [1] NCCL INFO Bootstrap : Using ib0:10.1.24.11<0>
cosmos-hpc-100a55:3138489:3138489 [1] NCCL INFO NET/Plugin : No plugin found (libnccl-net.so), using internal implementation
cosmos-hpc-100a55:3138491:3138491 [3] NCCL INFO Bootstrap : Using ib0:10.1.24.11<0>
cosmos-hpc-100a55:3138491:3138491 [3] NCCL INFO NET/Plugin : No plugin found (libnccl-net.so), using internal implementation
cosmos-hpc-100a55:3138490:3138490 [2] NCCL INFO Bootstrap : Using ib0:10.1.24.11<0>
cosmos-hpc-100a55:3138490:3138490 [2] NCCL INFO NET/Plugin : No plugin found (libnccl-net.so), using internal implementation
cosmos-hpc-100a55:3138488:3138488 [0] NCCL INFO NET/IB : Using [0]mlx5_10:1/IB [1]mlx5_11:1/IB [2]mlx5_12:1/IB [3]mlx5_13:1/IB ; OOB ib0:10.1.24.11<0>
cosmos-hpc-100a55:3138488:3138488 [0] NCCL INFO Using network IB
cosmos-hpc-100a55:3138489:3138489 [1] NCCL INFO NET/IB : Using [0]mlx5_10:1/IB [1]mlx5_11:1/IB [2]mlx5_12:1/IB [3]mlx5_13:1/IB ; OOB ib0:10.1.24.11<0>
cosmos-hpc-100a55:3138489:3138489 [1] NCCL INFO Using network IB
cosmos-hpc-100a55:3138490:3138490 [2] NCCL INFO NET/IB : Using [0]mlx5_10:1/IB [1]mlx5_11:1/IB [2]mlx5_12:1/IB [3]mlx5_13:1/IB ; OOB ib0:10.1.24.11<0>
cosmos-hpc-100a55:3138490:3138490 [2] NCCL INFO Using network IB
cosmos-hpc-100a55:3138491:3138491 [3] NCCL INFO NET/IB : Using [0]mlx5_10:1/IB [1]mlx5_11:1/IB [2]mlx5_12:1/IB [3]mlx5_13:1/IB ; OOB ib0:10.1.24.11<0>
cosmos-hpc-100a55:3138491:3138491 [3] NCCL INFO Using network IB
cosmos-hpc-100a45:985925:986027 [1] NCCL INFO Trees [0] -1/-1/-1->1->0 [1] -1/-1/-1->1->0
cosmos-hpc-100a45:985926:986018 [2] NCCL INFO Trees [0] 3/6/-1->2->-1 [1] 3/-1/-1->2->6
cosmos-hpc-100a45:985926:986018 [2] NCCL INFO Setting affinity for GPU 2 to ffffffff,ffffffff,00000000,00000000,ffffffff,ffffffff
cosmos-hpc-100a45:985927:986028 [3] NCCL INFO Trees [0] 0/-1/-1->3->2 [1] 0/-1/-1->3->2
cosmos-hpc-100a45:985924:986013 [0] NCCL INFO Channel 00/02 :    0   3   6   5   4   7   2   1
cosmos-hpc-100a45:985924:986013 [0] NCCL INFO Channel 01/02 :    0   3   6   5   4   7   2   1
cosmos-hpc-100a45:985924:986013 [0] NCCL INFO Trees [0] 1/-1/-1->0->3 [1] 1/-1/-1->0->3
cosmos-hpc-100a45:985924:986013 [0] NCCL INFO Setting affinity for GPU 0 to ffffffff,ffffffff,00000000,00000000,ffffffff,ffffffff
cosmos-hpc-100a55:3138490:3138595 [2] NCCL INFO Trees [0] 7/-1/-1->6->2 [1] 7/2/-1->6->-1
cosmos-hpc-100a55:3138490:3138595 [2] NCCL INFO Setting affinity for GPU 2 to ffffffff,ffffffff,00000000,00000000,ffffffff,ffffffff
cosmos-hpc-100a55:3138488:3138580 [0] NCCL INFO Trees [0] 5/-1/-1->4->7 [1] 5/-1/-1->4->7
cosmos-hpc-100a55:3138488:3138580 [0] NCCL INFO Setting affinity for GPU 0 to ffffffff,ffffffff,00000000,00000000,ffffffff,ffffffff
cosmos-hpc-100a55:3138491:3138597 [3] NCCL INFO Trees [0] 4/-1/-1->7->6 [1] 4/-1/-1->7->6
cosmos-hpc-100a55:3138489:3138587 [1] NCCL INFO Trees [0] -1/-1/-1->5->4 [1] -1/-1/-1->5->4
cosmos-hpc-100a45:985926:986018 [2] NCCL INFO Channel 00 : 7[4a000] -> 2[44000] [receive] via NET/IB/0/GDRDMA
cosmos-hpc-100a45:985926:986018 [2] NCCL INFO Channel 01 : 7[4a000] -> 2[44000] [receive] via NET/IB/0/GDRDMA
cosmos-hpc-100a55:3138490:3138595 [2] NCCL INFO Channel 00 : 3[4a000] -> 6[44000] [receive] via NET/IB/0/GDRDMA
cosmos-hpc-100a45:985924:986013 [0] NCCL INFO Channel 00 : 0[7000] -> 3[4a000] via P2P/IPC/read
cosmos-hpc-100a55:3138490:3138595 [2] NCCL INFO Channel 01 : 3[4a000] -> 6[44000] [receive] via NET/IB/0/GDRDMA
cosmos-hpc-100a55:3138488:3138580 [0] NCCL INFO Channel 00 : 4[7000] -> 7[4a000] via P2P/IPC/read
cosmos-hpc-100a45:985924:986013 [0] NCCL INFO Channel 01 : 0[7000] -> 3[4a000] via P2P/IPC/read
cosmos-hpc-100a45:985927:986028 [3] NCCL INFO Channel 00 : 3[4a000] -> 6[44000] [send] via NET/IB/0/GDRDMA
cosmos-hpc-100a45:985927:986028 [3] NCCL INFO Channel 01 : 3[4a000] -> 6[44000] [send] via NET/IB/0/GDRDMA
cosmos-hpc-100a45:985925:986027 [1] NCCL INFO Channel 00 : 1[a000] -> 0[7000] via P2P/IPC/read
cosmos-hpc-100a55:3138488:3138580 [0] NCCL INFO Channel 01 : 4[7000] -> 7[4a000] via P2P/IPC/read
cosmos-hpc-100a55:3138491:3138597 [3] NCCL INFO Channel 00 : 7[4a000] -> 2[44000] [send] via NET/IB/0/GDRDMA
cosmos-hpc-100a55:3138491:3138597 [3] NCCL INFO Channel 01 : 7[4a000] -> 2[44000] [send] via NET/IB/0/GDRDMA
cosmos-hpc-100a55:3138489:3138587 [1] NCCL INFO Channel 00 : 5[a000] -> 4[7000] via P2P/IPC/read
cosmos-hpc-100a45:985925:986027 [1] NCCL INFO Channel 01 : 1[a000] -> 0[7000] via P2P/IPC/read
cosmos-hpc-100a55:3138489:3138587 [1] NCCL INFO Channel 01 : 5[a000] -> 4[7000] via P2P/IPC/read
cosmos-hpc-100a45:985924:986013 [0] NCCL INFO Connected all rings
cosmos-hpc-100a45:985924:986013 [0] NCCL INFO Channel 00 : 0[7000] -> 1[a000] via P2P/IPC/read
cosmos-hpc-100a55:3138488:3138580 [0] NCCL INFO Connected all rings
cosmos-hpc-100a45:985924:986013 [0] NCCL INFO Channel 01 : 0[7000] -> 1[a000] via P2P/IPC/read
cosmos-hpc-100a55:3138488:3138580 [0] NCCL INFO Channel 00 : 4[7000] -> 5[a000] via P2P/IPC/read
cosmos-hpc-100a55:3138488:3138580 [0] NCCL INFO Channel 01 : 4[7000] -> 5[a000] via P2P/IPC/read
cosmos-hpc-100a55:3138491:3138597 [3] NCCL INFO Connected all rings
cosmos-hpc-100a45:985926:986018 [2] NCCL INFO Channel 00 : 2[44000] -> 1[a000] via P2P/IPC/read
cosmos-hpc-100a45:985926:986018 [2] NCCL INFO Channel 01 : 2[44000] -> 1[a000] via P2P/IPC/read
cosmos-hpc-100a45:985926:986018 [2] NCCL INFO Connected all rings
cosmos-hpc-100a45:985925:986027 [1] NCCL INFO Connected all rings
cosmos-hpc-100a45:985926:986018 [2] NCCL INFO Channel 00 : 2[44000] -> 3[4a000] via P2P/IPC/read
cosmos-hpc-100a55:3138490:3138595 [2] NCCL INFO Channel 00 : 6[44000] -> 5[a000] via P2P/IPC/read
cosmos-hpc-100a45:985926:986018 [2] NCCL INFO Channel 01 : 2[44000] -> 3[4a000] via P2P/IPC/read
cosmos-hpc-100a45:985927:986028 [3] NCCL INFO Connected all rings
cosmos-hpc-100a55:3138490:3138595 [2] NCCL INFO Channel 01 : 6[44000] -> 5[a000] via P2P/IPC/read
cosmos-hpc-100a55:3138490:3138595 [2] NCCL INFO Connected all rings
cosmos-hpc-100a45:985925:986027 [1] NCCL INFO Connected all trees
cosmos-hpc-100a45:985925:986027 [1] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 8/8/512
cosmos-hpc-100a45:985925:986027 [1] NCCL INFO 2 coll channels, 2 p2p channels, 1 p2p channels per peer
cosmos-hpc-100a55:3138489:3138587 [1] NCCL INFO Connected all rings
cosmos-hpc-100a55:3138490:3138595 [2] NCCL INFO Channel 00 : 6[44000] -> 7[4a000] via P2P/IPC/read
cosmos-hpc-100a55:3138490:3138595 [2] NCCL INFO Channel 01 : 6[44000] -> 7[4a000] via P2P/IPC/read
cosmos-hpc-100a45:985926:986018 [2] NCCL INFO Channel 00 : 6[44000] -> 2[44000] [receive] via NET/IB/0/GDRDMA
cosmos-hpc-100a45:985926:986018 [2] NCCL INFO Channel 01 : 6[44000] -> 2[44000] [receive] via NET/IB/0/GDRDMA
cosmos-hpc-100a45:985926:986018 [2] NCCL INFO Channel 00 : 2[44000] -> 6[44000] [send] via NET/IB/0/GDRDMA
cosmos-hpc-100a45:985927:986028 [3] NCCL INFO Channel 00 : 3[4a000] -> 0[7000] via P2P/IPC/read
cosmos-hpc-100a55:3138490:3138595 [2] NCCL INFO Channel 00 : 2[44000] -> 6[44000] [receive] via NET/IB/0/GDRDMA
cosmos-hpc-100a45:985926:986018 [2] NCCL INFO Channel 01 : 2[44000] -> 6[44000] [send] via NET/IB/0/GDRDMA
cosmos-hpc-100a55:3138489:3138587 [1] NCCL INFO Connected all trees
cosmos-hpc-100a55:3138489:3138587 [1] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 8/8/512
cosmos-hpc-100a55:3138489:3138587 [1] NCCL INFO 2 coll channels, 2 p2p channels, 1 p2p channels per peer
cosmos-hpc-100a45:985927:986028 [3] NCCL INFO Channel 01 : 3[4a000] -> 0[7000] via P2P/IPC/read
cosmos-hpc-100a55:3138490:3138595 [2] NCCL INFO Channel 01 : 2[44000] -> 6[44000] [receive] via NET/IB/0/GDRDMA
cosmos-hpc-100a55:3138490:3138595 [2] NCCL INFO Channel 00 : 6[44000] -> 2[44000] [send] via NET/IB/0/GDRDMA
cosmos-hpc-100a45:985924:986013 [0] NCCL INFO Connected all trees
cosmos-hpc-100a45:985924:986013 [0] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 8/8/512
cosmos-hpc-100a45:985924:986013 [0] NCCL INFO 2 coll channels, 2 p2p channels, 1 p2p channels per peer
cosmos-hpc-100a55:3138491:3138597 [3] NCCL INFO Channel 00 : 7[4a000] -> 4[7000] via P2P/IPC/read
cosmos-hpc-100a55:3138490:3138595 [2] NCCL INFO Channel 01 : 6[44000] -> 2[44000] [send] via NET/IB/0/GDRDMA
cosmos-hpc-100a45:985927:986028 [3] NCCL INFO Channel 00 : 3[4a000] -> 2[44000] via P2P/IPC/read
cosmos-hpc-100a55:3138491:3138597 [3] NCCL INFO Channel 01 : 7[4a000] -> 4[7000] via P2P/IPC/read
cosmos-hpc-100a45:985927:986028 [3] NCCL INFO Channel 01 : 3[4a000] -> 2[44000] via P2P/IPC/read
cosmos-hpc-100a55:3138488:3138580 [0] NCCL INFO Connected all trees
cosmos-hpc-100a55:3138488:3138580 [0] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 8/8/512
cosmos-hpc-100a55:3138488:3138580 [0] NCCL INFO 2 coll channels, 2 p2p channels, 1 p2p channels per peer
cosmos-hpc-100a55:3138491:3138597 [3] NCCL INFO Channel 00 : 7[4a000] -> 6[44000] via P2P/IPC/read
cosmos-hpc-100a55:3138491:3138597 [3] NCCL INFO Channel 01 : 7[4a000] -> 6[44000] via P2P/IPC/read
cosmos-hpc-100a55:3138491:3138597 [3] NCCL INFO Connected all trees
cosmos-hpc-100a55:3138491:3138597 [3] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 8/8/512
cosmos-hpc-100a55:3138491:3138597 [3] NCCL INFO 2 coll channels, 2 p2p channels, 1 p2p channels per peer
cosmos-hpc-100a55:3138490:3138595 [2] NCCL INFO Connected all trees
cosmos-hpc-100a55:3138490:3138595 [2] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 8/8/512
cosmos-hpc-100a55:3138490:3138595 [2] NCCL INFO 2 coll channels, 2 p2p channels, 1 p2p channels per peer
cosmos-hpc-100a55:3138490:3138595 [2] NCCL INFO comm 0x7fed98000fa0 rank 6 nranks 8 cudaDev 2 busId 44000 - Init COMPLETE
cosmos-hpc-100a55:3138488:3138580 [0] NCCL INFO comm 0x7f038c000fa0 rank 4 nranks 8 cudaDev 0 busId 7000 - Init COMPLETE
cosmos-hpc-100a55:3138491:3138597 [3] NCCL INFO comm 0x7fc430000fa0 rank 7 nranks 8 cudaDev 3 busId 4a000 - Init COMPLETE
cosmos-hpc-100a55:3138489:3138587 [1] NCCL INFO comm 0x7f689c000fa0 rank 5 nranks 8 cudaDev 1 busId a000 - Init COMPLETE
cosmos-hpc-100a45:985926:986018 [2] NCCL INFO Connected all trees
cosmos-hpc-100a45:985926:986018 [2] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 8/8/512
cosmos-hpc-100a45:985926:986018 [2] NCCL INFO 2 coll channels, 2 p2p channels, 1 p2p channels per peer
cosmos-hpc-100a45:985927:986028 [3] NCCL INFO Connected all trees
cosmos-hpc-100a45:985927:986028 [3] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 8/8/512
cosmos-hpc-100a45:985927:986028 [3] NCCL INFO 2 coll channels, 2 p2p channels, 1 p2p channels per peer
cosmos-hpc-100a45:985927:986028 [3] NCCL INFO comm 0x7fde18000fa0 rank 3 nranks 8 cudaDev 3 busId 4a000 - Init COMPLETE
cosmos-hpc-100a45:985925:986027 [1] NCCL INFO comm 0x7fbb68000fa0 rank 1 nranks 8 cudaDev 1 busId a000 - Init COMPLETE
cosmos-hpc-100a45:985926:986018 [2] NCCL INFO comm 0x7f4698000fa0 rank 2 nranks 8 cudaDev 2 busId 44000 - Init COMPLETE
cosmos-hpc-100a45:985924:986013 [0] NCCL INFO comm 0x7f0410000fa0 rank 0 nranks 8 cudaDev 0 busId 7000 - Init COMPLETE
#
#                                                              out-of-place                       in-place
#       size         count      type   redop    root     time   algbw   busbw #wrong     time   algbw   busbw #wrong
#        (B)    (elements)                               (us)  (GB/s)  (GB/s)            (us)  (GB/s)  (GB/s)
cosmos-hpc-100a45:985924:985924 [0] NCCL INFO Launch mode Parallel

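(A hypothetical per-process pin, untested here, that might yield the one-HCA-per-process assignment I was hoping for: export a distinct, exact-match NCCL_IB_HCA per local rank before NCCL initializes. The helper below is made up; the leading "=" is the exact-match prefix already used in the mpirun command above.)

```python
# Hypothetical workaround: pin a distinct HCA per local rank by setting
# NCCL_IB_HCA *before* the process makes its first NCCL call. The leading
# "=" asks NCCL for an exact interface-name match; ":1" selects port 1.
import os

def pin_hca(local_rank: int,
            hcas=("mlx5_10", "mlx5_11", "mlx5_12", "mlx5_13")) -> str:
    """Map a local rank to one HCA and export it for NCCL."""
    value = "=" + hcas[local_rank % len(hcas)] + ":1"
    os.environ["NCCL_IB_HCA"] = value
    return value

# e.g. under Open MPI, before the first NCCL call:
# pin_hca(int(os.environ["OMPI_COMM_WORLD_LOCAL_RANK"]))
```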
And this is the NCCL graph log:

NCCL version 2.11.4+cuda11.4
cosmos-hpc-100a55:3149021:3149131 [0] NCCL INFO Attribute coll of node net not found
cosmos-hpc-100a55:3149021:3149131 [0] NCCL INFO Attribute coll of node net not found
cosmos-hpc-100a55:3149021:3149131 [0] NCCL INFO Attribute coll of node net not found
cosmos-hpc-100a55:3149021:3149131 [0] NCCL INFO Attribute coll of node net not found
cosmos-hpc-100a55:3149023:3149144 [2] NCCL INFO Attribute coll of node net not found
cosmos-hpc-100a55:3149023:3149144 [2] NCCL INFO Attribute coll of node net not found
cosmos-hpc-100a55:3149023:3149144 [2] NCCL INFO Attribute coll of node net not found
cosmos-hpc-100a55:3149023:3149144 [2] NCCL INFO Attribute coll of node net not found
cosmos-hpc-100a45:995855:995929 [0] NCCL INFO Attribute coll of node net not found
cosmos-hpc-100a45:995855:995929 [0] NCCL INFO Attribute coll of node net not found
cosmos-hpc-100a45:995855:995929 [0] NCCL INFO Attribute coll of node net not found
cosmos-hpc-100a45:995855:995929 [0] NCCL INFO Attribute coll of node net not found
cosmos-hpc-100a55:3149024:3149142 [3] NCCL INFO Attribute coll of node net not found
cosmos-hpc-100a55:3149024:3149142 [3] NCCL INFO Attribute coll of node net not found
cosmos-hpc-100a55:3149024:3149142 [3] NCCL INFO Attribute coll of node net not found
cosmos-hpc-100a55:3149024:3149142 [3] NCCL INFO Attribute coll of node net not found
cosmos-hpc-100a55:3149022:3149146 [1] NCCL INFO Attribute coll of node net not found
cosmos-hpc-100a55:3149022:3149146 [1] NCCL INFO Attribute coll of node net not found
cosmos-hpc-100a55:3149022:3149146 [1] NCCL INFO Attribute coll of node net not found
cosmos-hpc-100a55:3149022:3149146 [1] NCCL INFO Attribute coll of node net not found
cosmos-hpc-100a45:995857:995934 [2] NCCL INFO Attribute coll of node net not found
cosmos-hpc-100a45:995857:995934 [2] NCCL INFO Attribute coll of node net not found
cosmos-hpc-100a45:995857:995934 [2] NCCL INFO Attribute coll of node net not found
cosmos-hpc-100a45:995857:995934 [2] NCCL INFO Attribute coll of node net not found
cosmos-hpc-100a45:995858:995947 [3] NCCL INFO Attribute coll of node net not found
cosmos-hpc-100a45:995858:995947 [3] NCCL INFO Attribute coll of node net not found
cosmos-hpc-100a45:995858:995947 [3] NCCL INFO Attribute coll of node net not found
cosmos-hpc-100a45:995858:995947 [3] NCCL INFO Attribute coll of node net not found
cosmos-hpc-100a45:995856:995950 [1] NCCL INFO Attribute coll of node net not found
cosmos-hpc-100a45:995856:995950 [1] NCCL INFO Attribute coll of node net not found
cosmos-hpc-100a45:995856:995950 [1] NCCL INFO Attribute coll of node net not found
cosmos-hpc-100a45:995856:995950 [1] NCCL INFO Attribute coll of node net not found
cosmos-hpc-100a55:3149021:3149131 [0] NCCL INFO === System : maxWidth 24.0 totalWidth 264.0 ===
cosmos-hpc-100a55:3149021:3149131 [0] NCCL INFO CPU/0 (1/2/-1)
cosmos-hpc-100a55:3149021:3149131 [0] NCCL INFO + PCI[24.0] - PCI/1000 (1000c01010000000)
cosmos-hpc-100a55:3149021:3149131 [0] NCCL INFO               + PCI[24.0] - PCI/5000 (1000c01010de13b8)
cosmos-hpc-100a55:3149021:3149131 [0] NCCL INFO                             + PCI[24.0] - GPU/7000 (4)
cosmos-hpc-100a55:3149021:3149131 [0] NCCL INFO                                           + NVL[264.0] - NVS/0
cosmos-hpc-100a55:3149021:3149131 [0] NCCL INFO               + PCI[24.0] - PCI/8000 (1000c01010de13b8)
cosmos-hpc-100a55:3149021:3149131 [0] NCCL INFO                             + PCI[24.0] - GPU/A000 (5)
cosmos-hpc-100a55:3149021:3149131 [0] NCCL INFO                                           + NVL[264.0] - NVS/0
cosmos-hpc-100a55:3149021:3149131 [0] NCCL INFO + PCI[24.0] - PCI/3E000 (1000c01010000000)
cosmos-hpc-100a55:3149021:3149131 [0] NCCL INFO               + PCI[24.0] - PCI/42000 (1000c01010de13b8)
cosmos-hpc-100a55:3149021:3149131 [0] NCCL INFO                             + PCI[24.0] - GPU/44000 (6)
cosmos-hpc-100a55:3149021:3149131 [0] NCCL INFO                                           + NVL[264.0] - NVS/0
cosmos-hpc-100a55:3149021:3149131 [0] NCCL INFO               + PCI[24.0] - PCI/48000 (1000c01010de13b8)
cosmos-hpc-100a55:3149021:3149131 [0] NCCL INFO                             + PCI[24.0] - GPU/4A000 (7)
cosmos-hpc-100a55:3149021:3149131 [0] NCCL INFO                                           + NVL[264.0] - NVS/0
cosmos-hpc-100a55:3149021:3149131 [0] NCCL INFO               + PCI[24.0] - NIC/51000
cosmos-hpc-100a55:3149021:3149131 [0] NCCL INFO                             + NET[25.0] - NET/0 (400ea10003fd7010/1/25.000000)
cosmos-hpc-100a55:3149021:3149131 [0] NCCL INFO                             + NET[25.0] - NET/1 (400ea10003fd7010/2/25.000000)
cosmos-hpc-100a55:3149021:3149131 [0] NCCL INFO                             + NET[25.0] - NET/2 (400ea10003fd7010/3/25.000000)
cosmos-hpc-100a55:3149021:3149131 [0] NCCL INFO                             + NET[25.0] - NET/3 (400ea10003fd7010/4/25.000000)
cosmos-hpc-100a55:3149021:3149131 [0] NCCL INFO ==========================================
cosmos-hpc-100a55:3149021:3149131 [0] NCCL INFO GPU/7000 :GPU/7000 (0/5000.000000/LOC) GPU/A000 (2/264.000000/NVL) GPU/44000 (2/264.000000/NVL) GPU/4A000 (2/264.000000/NVL) CPU/0 (3/24.000000/PHB) NET/0 (6/24.000000/PHB) NET/1 (6/24.000000/PHB) NET/2 (6/24.000000/PHB) NET/3 (6/24.000000/PHB)
cosmos-hpc-100a55:3149021:3149131 [0] NCCL INFO GPU/A000 :GPU/7000 (2/264.000000/NVL) GPU/A000 (0/5000.000000/LOC) GPU/44000 (2/264.000000/NVL) GPU/4A000 (2/264.000000/NVL) CPU/0 (3/24.000000/PHB) NET/0 (6/24.000000/PHB) NET/1 (6/24.000000/PHB) NET/2 (6/24.000000/PHB) NET/3 (6/24.000000/PHB)
cosmos-hpc-100a55:3149021:3149131 [0] NCCL INFO GPU/44000 :GPU/7000 (2/264.000000/NVL) GPU/A000 (2/264.000000/NVL) GPU/44000 (0/5000.000000/LOC) GPU/4A000 (2/264.000000/NVL) CPU/0 (3/24.000000/PHB) NET/0 (4/24.000000/PXB) NET/1 (4/24.000000/PXB) NET/2 (4/24.000000/PXB) NET/3 (4/24.000000/PXB)
cosmos-hpc-100a55:3149021:3149131 [0] NCCL INFO GPU/4A000 :GPU/7000 (2/264.000000/NVL) GPU/A000 (2/264.000000/NVL) GPU/44000 (2/264.000000/NVL) GPU/4A000 (0/5000.000000/LOC) CPU/0 (3/24.000000/PHB) NET/0 (4/24.000000/PXB) NET/1 (4/24.000000/PXB) NET/2 (4/24.000000/PXB) NET/3 (4/24.000000/PXB)
cosmos-hpc-100a55:3149021:3149131 [0] NCCL INFO NET/0 :GPU/7000 (6/24.000000/PHB) GPU/A000 (6/24.000000/PHB) GPU/44000 (4/24.000000/PXB) GPU/4A000 (4/24.000000/PXB) CPU/0 (3/24.000000/PHB) NET/0 (0/5000.000000/LOC) NET/1 (2/25.000000/LOC) NET/2 (2/25.000000/LOC) NET/3 (2/25.000000/LOC)
cosmos-hpc-100a55:3149021:3149131 [0] NCCL INFO NET/1 :GPU/7000 (6/24.000000/PHB) GPU/A000 (6/24.000000/PHB) GPU/44000 (4/24.000000/PXB) GPU/4A000 (4/24.000000/PXB) CPU/0 (3/24.000000/PHB) NET/0 (2/25.000000/LOC) NET/1 (0/5000.000000/LOC) NET/2 (2/25.000000/LOC) NET/3 (2/25.000000/LOC)
cosmos-hpc-100a55:3149021:3149131 [0] NCCL INFO NET/2 :GPU/7000 (6/24.000000/PHB) GPU/A000 (6/24.000000/PHB) GPU/44000 (4/24.000000/PXB) GPU/4A000 (4/24.000000/PXB) CPU/0 (3/24.000000/PHB) NET/0 (2/25.000000/LOC) NET/1 (2/25.000000/LOC) NET/2 (0/5000.000000/LOC) NET/3 (2/25.000000/LOC)
cosmos-hpc-100a55:3149021:3149131 [0] NCCL INFO NET/3 :GPU/7000 (6/24.000000/PHB) GPU/A000 (6/24.000000/PHB) GPU/44000 (4/24.000000/PXB) GPU/4A000 (4/24.000000/PXB) CPU/0 (3/24.000000/PHB) NET/0 (2/25.000000/LOC) NET/1 (2/25.000000/LOC) NET/2 (2/25.000000/LOC) NET/3 (0/5000.000000/LOC)
cosmos-hpc-100a55:3149023:3149144 [2] NCCL INFO === System : maxWidth 24.0 totalWidth 264.0 ===
cosmos-hpc-100a55:3149023:3149144 [2] NCCL INFO CPU/0 (1/2/-1)
cosmos-hpc-100a55:3149023:3149144 [2] NCCL INFO + PCI[24.0] - PCI/1000 (1000c01010000000)
cosmos-hpc-100a55:3149023:3149144 [2] NCCL INFO               + PCI[24.0] - PCI/5000 (1000c01010de13b8)
cosmos-hpc-100a55:3149023:3149144 [2] NCCL INFO                             + PCI[24.0] - GPU/7000 (4)
cosmos-hpc-100a55:3149023:3149144 [2] NCCL INFO                                           + NVL[264.0] - NVS/0
cosmos-hpc-100a55:3149023:3149144 [2] NCCL INFO               + PCI[24.0] - PCI/8000 (1000c01010de13b8)
cosmos-hpc-100a55:3149023:3149144 [2] NCCL INFO                             + PCI[24.0] - GPU/A000 (5)
cosmos-hpc-100a55:3149023:3149144 [2] NCCL INFO                                           + NVL[264.0] - NVS/0
cosmos-hpc-100a55:3149023:3149144 [2] NCCL INFO + PCI[24.0] - PCI/3E000 (1000c01010000000)
cosmos-hpc-100a55:3149023:3149144 [2] NCCL INFO               + PCI[24.0] - PCI/42000 (1000c01010de13b8)
cosmos-hpc-100a55:3149023:3149144 [2] NCCL INFO                             + PCI[24.0] - GPU/44000 (6)
cosmos-hpc-100a55:3149023:3149144 [2] NCCL INFO                                           + NVL[264.0] - NVS/0
cosmos-hpc-100a55:3149023:3149144 [2] NCCL INFO               + PCI[24.0] - PCI/48000 (1000c01010de13b8)
cosmos-hpc-100a55:3149023:3149144 [2] NCCL INFO                             + PCI[24.0] - GPU/4A000 (7)
cosmos-hpc-100a55:3149023:3149144 [2] NCCL INFO                                           + NVL[264.0] - NVS/0
cosmos-hpc-100a55:3149023:3149144 [2] NCCL INFO               + PCI[24.0] - NIC/51000
cosmos-hpc-100a55:3149023:3149144 [2] NCCL INFO                             + NET[25.0] - NET/0 (400ea10003fd7010/1/25.000000)
cosmos-hpc-100a55:3149023:3149144 [2] NCCL INFO                             + NET[25.0] - NET/1 (400ea10003fd7010/2/25.000000)
cosmos-hpc-100a55:3149023:3149144 [2] NCCL INFO                             + NET[25.0] - NET/2 (400ea10003fd7010/3/25.000000)
cosmos-hpc-100a55:3149023:3149144 [2] NCCL INFO                             + NET[25.0] - NET/3 (400ea10003fd7010/4/25.000000)
cosmos-hpc-100a55:3149023:3149144 [2] NCCL INFO ==========================================
cosmos-hpc-100a55:3149023:3149144 [2] NCCL INFO GPU/7000 :GPU/7000 (0/5000.000000/LOC) GPU/A000 (2/264.000000/NVL) GPU/44000 (2/264.000000/NVL) GPU/4A000 (2/264.000000/NVL) CPU/0 (3/24.000000/PHB) NET/0 (6/24.000000/PHB) NET/1 (6/24.000000/PHB) NET/2 (6/24.000000/PHB) NET/3 (6/24.000000/PHB)
cosmos-hpc-100a55:3149023:3149144 [2] NCCL INFO GPU/A000 :GPU/7000 (2/264.000000/NVL) GPU/A000 (0/5000.000000/LOC) GPU/44000 (2/264.000000/NVL) GPU/4A000 (2/264.000000/NVL) CPU/0 (3/24.000000/PHB) NET/0 (6/24.000000/PHB) NET/1 (6/24.000000/PHB) NET/2 (6/24.000000/PHB) NET/3 (6/24.000000/PHB)
cosmos-hpc-100a55:3149023:3149144 [2] NCCL INFO GPU/44000 :GPU/7000 (2/264.000000/NVL) GPU/A000 (2/264.000000/NVL) GPU/44000 (0/5000.000000/LOC) GPU/4A000 (2/264.000000/NVL) CPU/0 (3/24.000000/PHB) NET/0 (4/24.000000/PXB) NET/1 (4/24.000000/PXB) NET/2 (4/24.000000/PXB) NET/3 (4/24.000000/PXB)
cosmos-hpc-100a55:3149023:3149144 [2] NCCL INFO GPU/4A000 :GPU/7000 (2/264.000000/NVL) GPU/A000 (2/264.000000/NVL) GPU/44000 (2/264.000000/NVL) GPU/4A000 (0/5000.000000/LOC) CPU/0 (3/24.000000/PHB) NET/0 (4/24.000000/PXB) NET/1 (4/24.000000/PXB) NET/2 (4/24.000000/PXB) NET/3 (4/24.000000/PXB)
cosmos-hpc-100a55:3149023:3149144 [2] NCCL INFO NET/0 :GPU/7000 (6/24.000000/PHB) GPU/A000 (6/24.000000/PHB) GPU/44000 (4/24.000000/PXB) GPU/4A000 (4/24.000000/PXB) CPU/0 (3/24.000000/PHB) NET/0 (0/5000.000000/LOC) NET/1 (2/25.000000/LOC) NET/2 (2/25.000000/LOC) NET/3 (2/25.000000/LOC)
cosmos-hpc-100a55:3149023:3149144 [2] NCCL INFO NET/1 :GPU/7000 (6/24.000000/PHB) GPU/A000 (6/24.000000/PHB) GPU/44000 (4/24.000000/PXB) GPU/4A000 (4/24.000000/PXB) CPU/0 (3/24.000000/PHB) NET/0 (2/25.000000/LOC) NET/1 (0/5000.000000/LOC) NET/2 (2/25.000000/LOC) NET/3 (2/25.000000/LOC)
cosmos-hpc-100a55:3149023:3149144 [2] NCCL INFO NET/2 :GPU/7000 (6/24.000000/PHB) GPU/A000 (6/24.000000/PHB) GPU/44000 (4/24.000000/PXB) GPU/4A000 (4/24.000000/PXB) CPU/0 (3/24.000000/PHB) NET/0 (2/25.000000/LOC) NET/1 (2/25.000000/LOC) NET/2 (0/5000.000000/LOC) NET/3 (2/25.000000/LOC)
cosmos-hpc-100a55:3149023:3149144 [2] NCCL INFO NET/3 :GPU/7000 (6/24.000000/PHB) GPU/A000 (6/24.000000/PHB) GPU/44000 (4/24.000000/PXB) GPU/4A000 (4/24.000000/PXB) CPU/0 (3/24.000000/PHB) NET/0 (2/25.000000/LOC) NET/1 (2/25.000000/LOC) NET/2 (2/25.000000/LOC) NET/3 (0/5000.000000/LOC)
cosmos-hpc-100a55:3149024:3149142 [3] NCCL INFO === System : maxWidth 24.0 totalWidth 264.0 ===
cosmos-hpc-100a55:3149024:3149142 [3] NCCL INFO CPU/0 (1/2/-1)
cosmos-hpc-100a55:3149024:3149142 [3] NCCL INFO + PCI[24.0] - PCI/1000 (1000c01010000000)
cosmos-hpc-100a55:3149024:3149142 [3] NCCL INFO               + PCI[24.0] - PCI/5000 (1000c01010de13b8)
cosmos-hpc-100a55:3149024:3149142 [3] NCCL INFO                             + PCI[24.0] - GPU/7000 (4)
cosmos-hpc-100a55:3149024:3149142 [3] NCCL INFO                                           + NVL[264.0] - NVS/0
cosmos-hpc-100a55:3149024:3149142 [3] NCCL INFO               + PCI[24.0] - PCI/8000 (1000c01010de13b8)
cosmos-hpc-100a55:3149024:3149142 [3] NCCL INFO                             + PCI[24.0] - GPU/A000 (5)
cosmos-hpc-100a55:3149024:3149142 [3] NCCL INFO                                           + NVL[264.0] - NVS/0
cosmos-hpc-100a55:3149024:3149142 [3] NCCL INFO + PCI[24.0] - PCI/3E000 (1000c01010000000)
cosmos-hpc-100a55:3149024:3149142 [3] NCCL INFO               + PCI[24.0] - PCI/42000 (1000c01010de13b8)
cosmos-hpc-100a55:3149024:3149142 [3] NCCL INFO                             + PCI[24.0] - GPU/44000 (6)
cosmos-hpc-100a55:3149024:3149142 [3] NCCL INFO                                           + NVL[264.0] - NVS/0
cosmos-hpc-100a55:3149024:3149142 [3] NCCL INFO               + PCI[24.0] - PCI/48000 (1000c01010de13b8)
cosmos-hpc-100a55:3149024:3149142 [3] NCCL INFO                             + PCI[24.0] - GPU/4A000 (7)
cosmos-hpc-100a55:3149024:3149142 [3] NCCL INFO                                           + NVL[264.0] - NVS/0
cosmos-hpc-100a55:3149024:3149142 [3] NCCL INFO               + PCI[24.0] - NIC/51000
cosmos-hpc-100a55:3149024:3149142 [3] NCCL INFO                             + NET[25.0] - NET/0 (400ea10003fd7010/1/25.000000)
cosmos-hpc-100a55:3149024:3149142 [3] NCCL INFO                             + NET[25.0] - NET/1 (400ea10003fd7010/2/25.000000)
cosmos-hpc-100a55:3149024:3149142 [3] NCCL INFO                             + NET[25.0] - NET/2 (400ea10003fd7010/3/25.000000)
cosmos-hpc-100a55:3149024:3149142 [3] NCCL INFO                             + NET[25.0] - NET/3 (400ea10003fd7010/4/25.000000)
cosmos-hpc-100a55:3149024:3149142 [3] NCCL INFO ==========================================
cosmos-hpc-100a55:3149024:3149142 [3] NCCL INFO GPU/7000 :GPU/7000 (0/5000.000000/LOC) GPU/A000 (2/264.000000/NVL) GPU/44000 (2/264.000000/NVL) GPU/4A000 (2/264.000000/NVL) CPU/0 (3/24.000000/PHB) NET/0 (6/24.000000/PHB) NET/1 (6/24.000000/PHB) NET/2 (6/24.000000/PHB) NET/3 (6/24.000000/PHB)
cosmos-hpc-100a55:3149024:3149142 [3] NCCL INFO GPU/A000 :GPU/7000 (2/264.000000/NVL) GPU/A000 (0/5000.000000/LOC) GPU/44000 (2/264.000000/NVL) GPU/4A000 (2/264.000000/NVL) CPU/0 (3/24.000000/PHB) NET/0 (6/24.000000/PHB) NET/1 (6/24.000000/PHB) NET/2 (6/24.000000/PHB) NET/3 (6/24.000000/PHB)
cosmos-hpc-100a55:3149024:3149142 [3] NCCL INFO GPU/44000 :GPU/7000 (2/264.000000/NVL) GPU/A000 (2/264.000000/NVL) GPU/44000 (0/5000.000000/LOC) GPU/4A000 (2/264.000000/NVL) CPU/0 (3/24.000000/PHB) NET/0 (4/24.000000/PXB) NET/1 (4/24.000000/PXB) NET/2 (4/24.000000/PXB) NET/3 (4/24.000000/PXB)
cosmos-hpc-100a55:3149024:3149142 [3] NCCL INFO GPU/4A000 :GPU/7000 (2/264.000000/NVL) GPU/A000 (2/264.000000/NVL) GPU/44000 (2/264.000000/NVL) GPU/4A000 (0/5000.000000/LOC) CPU/0 (3/24.000000/PHB) NET/0 (4/24.000000/PXB) NET/1 (4/24.000000/PXB) NET/2 (4/24.000000/PXB) NET/3 (4/24.000000/PXB)
cosmos-hpc-100a55:3149024:3149142 [3] NCCL INFO NET/0 :GPU/7000 (6/24.000000/PHB) GPU/A000 (6/24.000000/PHB) GPU/44000 (4/24.000000/PXB) GPU/4A000 (4/24.000000/PXB) CPU/0 (3/24.000000/PHB) NET/0 (0/5000.000000/LOC) NET/1 (2/25.000000/LOC) NET/2 (2/25.000000/LOC) NET/3 (2/25.000000/LOC)
cosmos-hpc-100a55:3149024:3149142 [3] NCCL INFO NET/1 :GPU/7000 (6/24.000000/PHB) GPU/A000 (6/24.000000/PHB) GPU/44000 (4/24.000000/PXB) GPU/4A000 (4/24.000000/PXB) CPU/0 (3/24.000000/PHB) NET/0 (2/25.000000/LOC) NET/1 (0/5000.000000/LOC) NET/2 (2/25.000000/LOC) NET/3 (2/25.000000/LOC)
cosmos-hpc-100a55:3149024:3149142 [3] NCCL INFO NET/2 :GPU/7000 (6/24.000000/PHB) GPU/A000 (6/24.000000/PHB) GPU/44000 (4/24.000000/PXB) GPU/4A000 (4/24.000000/PXB) CPU/0 (3/24.000000/PHB) NET/0 (2/25.000000/LOC) NET/1 (2/25.000000/LOC) NET/2 (0/5000.000000/LOC) NET/3 (2/25.000000/LOC)
cosmos-hpc-100a55:3149024:3149142 [3] NCCL INFO NET/3 :GPU/7000 (6/24.000000/PHB) GPU/A000 (6/24.000000/PHB) GPU/44000 (4/24.000000/PXB) GPU/4A000 (4/24.000000/PXB) CPU/0 (3/24.000000/PHB) NET/0 (2/25.000000/LOC) NET/1 (2/25.000000/LOC) NET/2 (2/25.000000/LOC) NET/3 (0/5000.000000/LOC)
cosmos-hpc-100a55:3149022:3149146 [1] NCCL INFO === System : maxWidth 24.0 totalWidth 264.0 ===
cosmos-hpc-100a55:3149022:3149146 [1] NCCL INFO CPU/0 (1/2/-1)
cosmos-hpc-100a55:3149022:3149146 [1] NCCL INFO + PCI[24.0] - PCI/1000 (1000c01010000000)
cosmos-hpc-100a55:3149022:3149146 [1] NCCL INFO               + PCI[24.0] - PCI/5000 (1000c01010de13b8)
cosmos-hpc-100a55:3149022:3149146 [1] NCCL INFO                             + PCI[24.0] - GPU/7000 (4)
cosmos-hpc-100a55:3149022:3149146 [1] NCCL INFO                                           + NVL[264.0] - NVS/0
cosmos-hpc-100a55:3149022:3149146 [1] NCCL INFO               + PCI[24.0] - PCI/8000 (1000c01010de13b8)
cosmos-hpc-100a55:3149022:3149146 [1] NCCL INFO                             + PCI[24.0] - GPU/A000 (5)
cosmos-hpc-100a55:3149022:3149146 [1] NCCL INFO                                           + NVL[264.0] - NVS/0
cosmos-hpc-100a55:3149022:3149146 [1] NCCL INFO + PCI[24.0] - PCI/3E000 (1000c01010000000)
cosmos-hpc-100a55:3149022:3149146 [1] NCCL INFO               + PCI[24.0] - PCI/42000 (1000c01010de13b8)
cosmos-hpc-100a55:3149022:3149146 [1] NCCL INFO                             + PCI[24.0] - GPU/44000 (6)
cosmos-hpc-100a55:3149022:3149146 [1] NCCL INFO                                           + NVL[264.0] - NVS/0
cosmos-hpc-100a55:3149022:3149146 [1] NCCL INFO               + PCI[24.0] - PCI/48000 (1000c01010de13b8)
cosmos-hpc-100a55:3149022:3149146 [1] NCCL INFO                             + PCI[24.0] - GPU/4A000 (7)
cosmos-hpc-100a55:3149022:3149146 [1] NCCL INFO                                           + NVL[264.0] - NVS/0
cosmos-hpc-100a55:3149022:3149146 [1] NCCL INFO               + PCI[24.0] - NIC/51000
cosmos-hpc-100a55:3149022:3149146 [1] NCCL INFO                             + NET[25.0] - NET/0 (400ea10003fd7010/1/25.000000)
cosmos-hpc-100a55:3149022:3149146 [1] NCCL INFO                             + NET[25.0] - NET/1 (400ea10003fd7010/2/25.000000)
cosmos-hpc-100a55:3149022:3149146 [1] NCCL INFO                             + NET[25.0] - NET/2 (400ea10003fd7010/3/25.000000)
cosmos-hpc-100a55:3149022:3149146 [1] NCCL INFO                             + NET[25.0] - NET/3 (400ea10003fd7010/4/25.000000)
cosmos-hpc-100a55:3149022:3149146 [1] NCCL INFO ==========================================
cosmos-hpc-100a55:3149022:3149146 [1] NCCL INFO GPU/7000 :GPU/7000 (0/5000.000000/LOC) GPU/A000 (2/264.000000/NVL) GPU/44000 (2/264.000000/NVL) GPU/4A000 (2/264.000000/NVL) CPU/0 (3/24.000000/PHB) NET/0 (6/24.000000/PHB) NET/1 (6/24.000000/PHB) NET/2 (6/24.000000/PHB) NET/3 (6/24.000000/PHB)
cosmos-hpc-100a55:3149022:3149146 [1] NCCL INFO GPU/A000 :GPU/7000 (2/264.000000/NVL) GPU/A000 (0/5000.000000/LOC) GPU/44000 (2/264.000000/NVL) GPU/4A000 (2/264.000000/NVL) CPU/0 (3/24.000000/PHB) NET/0 (6/24.000000/PHB) NET/1 (6/24.000000/PHB) NET/2 (6/24.000000/PHB) NET/3 (6/24.000000/PHB)
cosmos-hpc-100a55:3149022:3149146 [1] NCCL INFO GPU/44000 :GPU/7000 (2/264.000000/NVL) GPU/A000 (2/264.000000/NVL) GPU/44000 (0/5000.000000/LOC) GPU/4A000 (2/264.000000/NVL) CPU/0 (3/24.000000/PHB) NET/0 (4/24.000000/PXB) NET/1 (4/24.000000/PXB) NET/2 (4/24.000000/PXB) NET/3 (4/24.000000/PXB)
cosmos-hpc-100a55:3149022:3149146 [1] NCCL INFO GPU/4A000 :GPU/7000 (2/264.000000/NVL) GPU/A000 (2/264.000000/NVL) GPU/44000 (2/264.000000/NVL) GPU/4A000 (0/5000.000000/LOC) CPU/0 (3/24.000000/PHB) NET/0 (4/24.000000/PXB) NET/1 (4/24.000000/PXB) NET/2 (4/24.000000/PXB) NET/3 (4/24.000000/PXB)
cosmos-hpc-100a55:3149022:3149146 [1] NCCL INFO NET/0 :GPU/7000 (6/24.000000/PHB) GPU/A000 (6/24.000000/PHB) GPU/44000 (4/24.000000/PXB) GPU/4A000 (4/24.000000/PXB) CPU/0 (3/24.000000/PHB) NET/0 (0/5000.000000/LOC) NET/1 (2/25.000000/LOC) NET/2 (2/25.000000/LOC) NET/3 (2/25.000000/LOC)
cosmos-hpc-100a55:3149022:3149146 [1] NCCL INFO NET/1 :GPU/7000 (6/24.000000/PHB) GPU/A000 (6/24.000000/PHB) GPU/44000 (4/24.000000/PXB) GPU/4A000 (4/24.000000/PXB) CPU/0 (3/24.000000/PHB) NET/0 (2/25.000000/LOC) NET/1 (0/5000.000000/LOC) NET/2 (2/25.000000/LOC) NET/3 (2/25.000000/LOC)
cosmos-hpc-100a55:3149022:3149146 [1] NCCL INFO NET/2 :GPU/7000 (6/24.000000/PHB) GPU/A000 (6/24.000000/PHB) GPU/44000 (4/24.000000/PXB) GPU/4A000 (4/24.000000/PXB) CPU/0 (3/24.000000/PHB) NET/0 (2/25.000000/LOC) NET/1 (2/25.000000/LOC) NET/2 (0/5000.000000/LOC) NET/3 (2/25.000000/LOC)
cosmos-hpc-100a55:3149022:3149146 [1] NCCL INFO NET/3 :GPU/7000 (6/24.000000/PHB) GPU/A000 (6/24.000000/PHB) GPU/44000 (4/24.000000/PXB) GPU/4A000 (4/24.000000/PXB) CPU/0 (3/24.000000/PHB) NET/0 (2/25.000000/LOC) NET/1 (2/25.000000/LOC) NET/2 (2/25.000000/LOC) NET/3 (0/5000.000000/LOC)
cosmos-hpc-100a55:3149021:3149131 [0] NCCL INFO Pattern 4, crossNic 0, nChannels 1, speed 24.000000/24.000000, type NVL/PXB, sameChannels 1
cosmos-hpc-100a55:3149021:3149131 [0] NCCL INFO  0 : NET/0 GPU/6 GPU/5 GPU/4 GPU/7 NET/0
cosmos-hpc-100a55:3149021:3149131 [0] NCCL INFO Pattern 1, crossNic 0, nChannels 1, speed 48.000000/24.000000, type NVL/PXB, sameChannels 1
cosmos-hpc-100a55:3149021:3149131 [0] NCCL INFO  0 : NET/0 GPU/6 GPU/7 GPU/4 GPU/5 NET/0
cosmos-hpc-100a55:3149021:3149131 [0] NCCL INFO Pattern 3, crossNic 0, nChannels 0, speed 0.000000/0.000000, type NVL/PIX, sameChannels 1
cosmos-hpc-100a55:3149023:3149144 [2] NCCL INFO Pattern 4, crossNic 0, nChannels 1, speed 24.000000/24.000000, type NVL/PXB, sameChannels 1
cosmos-hpc-100a55:3149023:3149144 [2] NCCL INFO  0 : NET/0 GPU/6 GPU/5 GPU/4 GPU/7 NET/0
cosmos-hpc-100a55:3149023:3149144 [2] NCCL INFO Pattern 1, crossNic 0, nChannels 1, speed 48.000000/24.000000, type NVL/PXB, sameChannels 1
cosmos-hpc-100a55:3149023:3149144 [2] NCCL INFO  0 : NET/0 GPU/6 GPU/7 GPU/4 GPU/5 NET/0
cosmos-hpc-100a55:3149023:3149144 [2] NCCL INFO Pattern 3, crossNic 0, nChannels 0, speed 0.000000/0.000000, type NVL/PIX, sameChannels 1
cosmos-hpc-100a55:3149022:3149146 [1] NCCL INFO Pattern 4, crossNic 0, nChannels 1, speed 24.000000/24.000000, type NVL/PXB, sameChannels 1
cosmos-hpc-100a55:3149022:3149146 [1] NCCL INFO  0 : NET/0 GPU/6 GPU/5 GPU/4 GPU/7 NET/0
cosmos-hpc-100a55:3149022:3149146 [1] NCCL INFO Pattern 1, crossNic 0, nChannels 1, speed 48.000000/24.000000, type NVL/PXB, sameChannels 1
cosmos-hpc-100a55:3149022:3149146 [1] NCCL INFO  0 : NET/0 GPU/6 GPU/7 GPU/4 GPU/5 NET/0
cosmos-hpc-100a55:3149022:3149146 [1] NCCL INFO Pattern 3, crossNic 0, nChannels 0, speed 0.000000/0.000000, type NVL/PIX, sameChannels 1
cosmos-hpc-100a55:3149024:3149142 [3] NCCL INFO Pattern 4, crossNic 0, nChannels 1, speed 24.000000/24.000000, type NVL/PXB, sameChannels 1
cosmos-hpc-100a55:3149024:3149142 [3] NCCL INFO  0 : NET/0 GPU/6 GPU/5 GPU/4 GPU/7 NET/0
cosmos-hpc-100a55:3149024:3149142 [3] NCCL INFO Pattern 1, crossNic 0, nChannels 1, speed 48.000000/24.000000, type NVL/PXB, sameChannels 1
cosmos-hpc-100a55:3149024:3149142 [3] NCCL INFO  0 : NET/0 GPU/6 GPU/7 GPU/4 GPU/5 NET/0
cosmos-hpc-100a55:3149024:3149142 [3] NCCL INFO Pattern 3, crossNic 0, nChannels 0, speed 0.000000/0.000000, type NVL/PIX, sameChannels 1
cosmos-hpc-100a45:995855:995929 [0] NCCL INFO === System : maxWidth 24.0 totalWidth 264.0 ===
cosmos-hpc-100a45:995855:995929 [0] NCCL INFO CPU/0 (1/2/-1)
cosmos-hpc-100a45:995855:995929 [0] NCCL INFO + PCI[24.0] - PCI/1000 (1000c01010000000)
cosmos-hpc-100a45:995855:995929 [0] NCCL INFO               + PCI[24.0] - PCI/5000 (1000c01010de13b8)
cosmos-hpc-100a45:995855:995929 [0] NCCL INFO                             + PCI[24.0] - GPU/7000 (0)
cosmos-hpc-100a45:995855:995929 [0] NCCL INFO                                           + NVL[264.0] - NVS/0
cosmos-hpc-100a45:995855:995929 [0] NCCL INFO               + PCI[24.0] - PCI/8000 (1000c01010de13b8)
cosmos-hpc-100a45:995855:995929 [0] NCCL INFO                             + PCI[24.0] - GPU/A000 (1)
cosmos-hpc-100a45:995855:995929 [0] NCCL INFO                                           + NVL[264.0] - NVS/0
cosmos-hpc-100a45:995855:995929 [0] NCCL INFO + PCI[24.0] - PCI/3E000 (1000c01010000000)
cosmos-hpc-100a45:995855:995929 [0] NCCL INFO               + PCI[24.0] - PCI/42000 (1000c01010de13b8)
cosmos-hpc-100a45:995855:995929 [0] NCCL INFO                             + PCI[24.0] - GPU/44000 (2)
cosmos-hpc-100a45:995855:995929 [0] NCCL INFO                                           + NVL[264.0] - NVS/0
cosmos-hpc-100a45:995855:995929 [0] NCCL INFO               + PCI[24.0] - PCI/48000 (1000c01010de13b8)
cosmos-hpc-100a45:995855:995929 [0] NCCL INFO                             + PCI[24.0] - GPU/4A000 (3)
cosmos-hpc-100a45:995855:995929 [0] NCCL INFO                                           + NVL[264.0] - NVS/0
cosmos-hpc-100a45:995855:995929 [0] NCCL INFO               + PCI[24.0] - NIC/51000
cosmos-hpc-100a45:995855:995929 [0] NCCL INFO                             + NET[25.0] - NET/0 (c5a30003fd7010/1/25.000000)
cosmos-hpc-100a45:995855:995929 [0] NCCL INFO                             + NET[25.0] - NET/1 (c5a30003fd7010/2/25.000000)
cosmos-hpc-100a45:995855:995929 [0] NCCL INFO                             + NET[25.0] - NET/2 (c5a30003fd7010/3/25.000000)
cosmos-hpc-100a45:995855:995929 [0] NCCL INFO                             + NET[25.0] - NET/3 (c5a30003fd7010/4/25.000000)
cosmos-hpc-100a45:995855:995929 [0] NCCL INFO ==========================================
cosmos-hpc-100a45:995855:995929 [0] NCCL INFO GPU/7000 :GPU/7000 (0/5000.000000/LOC) GPU/A000 (2/264.000000/NVL) GPU/44000 (2/264.000000/NVL) GPU/4A000 (2/264.000000/NVL) CPU/0 (3/24.000000/PHB) NET/0 (6/24.000000/PHB) NET/1 (6/24.000000/PHB) NET/2 (6/24.000000/PHB) NET/3 (6/24.000000/PHB)
cosmos-hpc-100a45:995855:995929 [0] NCCL INFO GPU/A000 :GPU/7000 (2/264.000000/NVL) GPU/A000 (0/5000.000000/LOC) GPU/44000 (2/264.000000/NVL) GPU/4A000 (2/264.000000/NVL) CPU/0 (3/24.000000/PHB) NET/0 (6/24.000000/PHB) NET/1 (6/24.000000/PHB) NET/2 (6/24.000000/PHB) NET/3 (6/24.000000/PHB)
cosmos-hpc-100a45:995855:995929 [0] NCCL INFO GPU/44000 :GPU/7000 (2/264.000000/NVL) GPU/A000 (2/264.000000/NVL) GPU/44000 (0/5000.000000/LOC) GPU/4A000 (2/264.000000/NVL) CPU/0 (3/24.000000/PHB) NET/0 (4/24.000000/PXB) NET/1 (4/24.000000/PXB) NET/2 (4/24.000000/PXB) NET/3 (4/24.000000/PXB)
cosmos-hpc-100a45:995855:995929 [0] NCCL INFO GPU/4A000 :GPU/7000 (2/264.000000/NVL) GPU/A000 (2/264.000000/NVL) GPU/44000 (2/264.000000/NVL) GPU/4A000 (0/5000.000000/LOC) CPU/0 (3/24.000000/PHB) NET/0 (4/24.000000/PXB) NET/1 (4/24.000000/PXB) NET/2 (4/24.000000/PXB) NET/3 (4/24.000000/PXB)
cosmos-hpc-100a45:995855:995929 [0] NCCL INFO NET/0 :GPU/7000 (6/24.000000/PHB) GPU/A000 (6/24.000000/PHB) GPU/44000 (4/24.000000/PXB) GPU/4A000 (4/24.000000/PXB) CPU/0 (3/24.000000/PHB) NET/0 (0/5000.000000/LOC) NET/1 (2/25.000000/LOC) NET/2 (2/25.000000/LOC) NET/3 (2/25.000000/LOC)
cosmos-hpc-100a45:995855:995929 [0] NCCL INFO NET/1 :GPU/7000 (6/24.000000/PHB) GPU/A000 (6/24.000000/PHB) GPU/44000 (4/24.000000/PXB) GPU/4A000 (4/24.000000/PXB) CPU/0 (3/24.000000/PHB) NET/0 (2/25.000000/LOC) NET/1 (0/5000.000000/LOC) NET/2 (2/25.000000/LOC) NET/3 (2/25.000000/LOC)
cosmos-hpc-100a45:995855:995929 [0] NCCL INFO NET/2 :GPU/7000 (6/24.000000/PHB) GPU/A000 (6/24.000000/PHB) GPU/44000 (4/24.000000/PXB) GPU/4A000 (4/24.000000/PXB) CPU/0 (3/24.000000/PHB) NET/0 (2/25.000000/LOC) NET/1 (2/25.000000/LOC) NET/2 (0/5000.000000/LOC) NET/3 (2/25.000000/LOC)
cosmos-hpc-100a45:995855:995929 [0] NCCL INFO NET/3 :GPU/7000 (6/24.000000/PHB) GPU/A000 (6/24.000000/PHB) GPU/44000 (4/24.000000/PXB) GPU/4A000 (4/24.000000/PXB) CPU/0 (3/24.000000/PHB) NET/0 (2/25.000000/LOC) NET/1 (2/25.000000/LOC) NET/2 (2/25.000000/LOC) NET/3 (0/5000.000000/LOC)
cosmos-hpc-100a45:995857:995934 [2] NCCL INFO === System : maxWidth 24.0 totalWidth 264.0 ===
cosmos-hpc-100a45:995857:995934 [2] NCCL INFO CPU/0 (1/2/-1)
cosmos-hpc-100a45:995857:995934 [2] NCCL INFO + PCI[24.0] - PCI/1000 (1000c01010000000)
cosmos-hpc-100a45:995857:995934 [2] NCCL INFO               + PCI[24.0] - PCI/5000 (1000c01010de13b8)
cosmos-hpc-100a45:995857:995934 [2] NCCL INFO                             + PCI[24.0] - GPU/7000 (0)
cosmos-hpc-100a45:995857:995934 [2] NCCL INFO                                           + NVL[264.0] - NVS/0
cosmos-hpc-100a45:995857:995934 [2] NCCL INFO               + PCI[24.0] - PCI/8000 (1000c01010de13b8)
cosmos-hpc-100a45:995857:995934 [2] NCCL INFO                             + PCI[24.0] - GPU/A000 (1)
cosmos-hpc-100a45:995857:995934 [2] NCCL INFO                                           + NVL[264.0] - NVS/0
cosmos-hpc-100a45:995857:995934 [2] NCCL INFO + PCI[24.0] - PCI/3E000 (1000c01010000000)
cosmos-hpc-100a45:995857:995934 [2] NCCL INFO               + PCI[24.0] - PCI/42000 (1000c01010de13b8)
cosmos-hpc-100a45:995857:995934 [2] NCCL INFO                             + PCI[24.0] - GPU/44000 (2)
cosmos-hpc-100a45:995857:995934 [2] NCCL INFO                                           + NVL[264.0] - NVS/0
cosmos-hpc-100a45:995857:995934 [2] NCCL INFO               + PCI[24.0] - PCI/48000 (1000c01010de13b8)
cosmos-hpc-100a45:995857:995934 [2] NCCL INFO                             + PCI[24.0] - GPU/4A000 (3)
cosmos-hpc-100a45:995857:995934 [2] NCCL INFO                                           + NVL[264.0] - NVS/0
cosmos-hpc-100a45:995857:995934 [2] NCCL INFO               + PCI[24.0] - NIC/51000
cosmos-hpc-100a45:995857:995934 [2] NCCL INFO                             + NET[25.0] - NET/0 (c5a30003fd7010/1/25.000000)
cosmos-hpc-100a45:995857:995934 [2] NCCL INFO                             + NET[25.0] - NET/1 (c5a30003fd7010/2/25.000000)
cosmos-hpc-100a45:995857:995934 [2] NCCL INFO                             + NET[25.0] - NET/2 (c5a30003fd7010/3/25.000000)
cosmos-hpc-100a45:995857:995934 [2] NCCL INFO                             + NET[25.0] - NET/3 (c5a30003fd7010/4/25.000000)
cosmos-hpc-100a45:995857:995934 [2] NCCL INFO ==========================================
cosmos-hpc-100a45:995857:995934 [2] NCCL INFO GPU/7000 :GPU/7000 (0/5000.000000/LOC) GPU/A000 (2/264.000000/NVL) GPU/44000 (2/264.000000/NVL) GPU/4A000 (2/264.000000/NVL) CPU/0 (3/24.000000/PHB) NET/0 (6/24.000000/PHB) NET/1 (6/24.000000/PHB) NET/2 (6/24.000000/PHB) NET/3 (6/24.000000/PHB)
cosmos-hpc-100a45:995857:995934 [2] NCCL INFO GPU/A000 :GPU/7000 (2/264.000000/NVL) GPU/A000 (0/5000.000000/LOC) GPU/44000 (2/264.000000/NVL) GPU/4A000 (2/264.000000/NVL) CPU/0 (3/24.000000/PHB) NET/0 (6/24.000000/PHB) NET/1 (6/24.000000/PHB) NET/2 (6/24.000000/PHB) NET/3 (6/24.000000/PHB)
cosmos-hpc-100a45:995857:995934 [2] NCCL INFO GPU/44000 :GPU/7000 (2/264.000000/NVL) GPU/A000 (2/264.000000/NVL) GPU/44000 (0/5000.000000/LOC) GPU/4A000 (2/264.000000/NVL) CPU/0 (3/24.000000/PHB) NET/0 (4/24.000000/PXB) NET/1 (4/24.000000/PXB) NET/2 (4/24.000000/PXB) NET/3 (4/24.000000/PXB)
cosmos-hpc-100a45:995857:995934 [2] NCCL INFO GPU/4A000 :GPU/7000 (2/264.000000/NVL) GPU/A000 (2/264.000000/NVL) GPU/44000 (2/264.000000/NVL) GPU/4A000 (0/5000.000000/LOC) CPU/0 (3/24.000000/PHB) NET/0 (4/24.000000/PXB) NET/1 (4/24.000000/PXB) NET/2 (4/24.000000/PXB) NET/3 (4/24.000000/PXB)
cosmos-hpc-100a45:995857:995934 [2] NCCL INFO NET/0 :GPU/7000 (6/24.000000/PHB) GPU/A000 (6/24.000000/PHB) GPU/44000 (4/24.000000/PXB) GPU/4A000 (4/24.000000/PXB) CPU/0 (3/24.000000/PHB) NET/0 (0/5000.000000/LOC) NET/1 (2/25.000000/LOC) NET/2 (2/25.000000/LOC) NET/3 (2/25.000000/LOC)
cosmos-hpc-100a45:995857:995934 [2] NCCL INFO NET/1 :GPU/7000 (6/24.000000/PHB) GPU/A000 (6/24.000000/PHB) GPU/44000 (4/24.000000/PXB) GPU/4A000 (4/24.000000/PXB) CPU/0 (3/24.000000/PHB) NET/0 (2/25.000000/LOC) NET/1 (0/5000.000000/LOC) NET/2 (2/25.000000/LOC) NET/3 (2/25.000000/LOC)
cosmos-hpc-100a45:995857:995934 [2] NCCL INFO NET/2 :GPU/7000 (6/24.000000/PHB) GPU/A000 (6/24.000000/PHB) GPU/44000 (4/24.000000/PXB) GPU/4A000 (4/24.000000/PXB) CPU/0 (3/24.000000/PHB) NET/0 (2/25.000000/LOC) NET/1 (2/25.000000/LOC) NET/2 (0/5000.000000/LOC) NET/3 (2/25.000000/LOC)
cosmos-hpc-100a45:995857:995934 [2] NCCL INFO NET/3 :GPU/7000 (6/24.000000/PHB) GPU/A000 (6/24.000000/PHB) GPU/44000 (4/24.000000/PXB) GPU/4A000 (4/24.000000/PXB) CPU/0 (3/24.000000/PHB) NET/0 (2/25.000000/LOC) NET/1 (2/25.000000/LOC) NET/2 (2/25.000000/LOC) NET/3 (0/5000.000000/LOC)
cosmos-hpc-100a45:995858:995947 [3] NCCL INFO === System : maxWidth 24.0 totalWidth 264.0 ===
cosmos-hpc-100a45:995858:995947 [3] NCCL INFO CPU/0 (1/2/-1)
cosmos-hpc-100a45:995858:995947 [3] NCCL INFO + PCI[24.0] - PCI/1000 (1000c01010000000)
cosmos-hpc-100a45:995858:995947 [3] NCCL INFO               + PCI[24.0] - PCI/5000 (1000c01010de13b8)
cosmos-hpc-100a45:995858:995947 [3] NCCL INFO                             + PCI[24.0] - GPU/7000 (0)
cosmos-hpc-100a45:995858:995947 [3] NCCL INFO                                           + NVL[264.0] - NVS/0
cosmos-hpc-100a45:995858:995947 [3] NCCL INFO               + PCI[24.0] - PCI/8000 (1000c01010de13b8)
cosmos-hpc-100a45:995858:995947 [3] NCCL INFO                             + PCI[24.0] - GPU/A000 (1)
cosmos-hpc-100a45:995858:995947 [3] NCCL INFO                                           + NVL[264.0] - NVS/0
cosmos-hpc-100a45:995858:995947 [3] NCCL INFO + PCI[24.0] - PCI/3E000 (1000c01010000000)
cosmos-hpc-100a45:995858:995947 [3] NCCL INFO               + PCI[24.0] - PCI/42000 (1000c01010de13b8)
cosmos-hpc-100a45:995858:995947 [3] NCCL INFO                             + PCI[24.0] - GPU/44000 (2)
cosmos-hpc-100a45:995858:995947 [3] NCCL INFO                                           + NVL[264.0] - NVS/0
cosmos-hpc-100a45:995858:995947 [3] NCCL INFO               + PCI[24.0] - PCI/48000 (1000c01010de13b8)
cosmos-hpc-100a45:995858:995947 [3] NCCL INFO                             + PCI[24.0] - GPU/4A000 (3)
cosmos-hpc-100a45:995858:995947 [3] NCCL INFO                                           + NVL[264.0] - NVS/0
cosmos-hpc-100a45:995858:995947 [3] NCCL INFO               + PCI[24.0] - NIC/51000
cosmos-hpc-100a45:995858:995947 [3] NCCL INFO                             + NET[25.0] - NET/0 (c5a30003fd7010/1/25.000000)
cosmos-hpc-100a45:995858:995947 [3] NCCL INFO                             + NET[25.0] - NET/1 (c5a30003fd7010/2/25.000000)
cosmos-hpc-100a45:995858:995947 [3] NCCL INFO                             + NET[25.0] - NET/2 (c5a30003fd7010/3/25.000000)
cosmos-hpc-100a45:995858:995947 [3] NCCL INFO                             + NET[25.0] - NET/3 (c5a30003fd7010/4/25.000000)
cosmos-hpc-100a45:995858:995947 [3] NCCL INFO ==========================================
cosmos-hpc-100a45:995858:995947 [3] NCCL INFO GPU/7000 :GPU/7000 (0/5000.000000/LOC) GPU/A000 (2/264.000000/NVL) GPU/44000 (2/264.000000/NVL) GPU/4A000 (2/264.000000/NVL) CPU/0 (3/24.000000/PHB) NET/0 (6/24.000000/PHB) NET/1 (6/24.000000/PHB) NET/2 (6/24.000000/PHB) NET/3 (6/24.000000/PHB)
cosmos-hpc-100a45:995858:995947 [3] NCCL INFO GPU/A000 :GPU/7000 (2/264.000000/NVL) GPU/A000 (0/5000.000000/LOC) GPU/44000 (2/264.000000/NVL) GPU/4A000 (2/264.000000/NVL) CPU/0 (3/24.000000/PHB) NET/0 (6/24.000000/PHB) NET/1 (6/24.000000/PHB) NET/2 (6/24.000000/PHB) NET/3 (6/24.000000/PHB)
cosmos-hpc-100a45:995858:995947 [3] NCCL INFO GPU/44000 :GPU/7000 (2/264.000000/NVL) GPU/A000 (2/264.000000/NVL) GPU/44000 (0/5000.000000/LOC) GPU/4A000 (2/264.000000/NVL) CPU/0 (3/24.000000/PHB) NET/0 (4/24.000000/PXB) NET/1 (4/24.000000/PXB) NET/2 (4/24.000000/PXB) NET/3 (4/24.000000/PXB)
cosmos-hpc-100a45:995858:995947 [3] NCCL INFO GPU/4A000 :GPU/7000 (2/264.000000/NVL) GPU/A000 (2/264.000000/NVL) GPU/44000 (2/264.000000/NVL) GPU/4A000 (0/5000.000000/LOC) CPU/0 (3/24.000000/PHB) NET/0 (4/24.000000/PXB) NET/1 (4/24.000000/PXB) NET/2 (4/24.000000/PXB) NET/3 (4/24.000000/PXB)
cosmos-hpc-100a45:995856:995950 [1] NCCL INFO === System : maxWidth 24.0 totalWidth 264.0 ===
cosmos-hpc-100a45:995856:995950 [1] NCCL INFO CPU/0 (1/2/-1)
cosmos-hpc-100a45:995856:995950 [1] NCCL INFO + PCI[24.0] - PCI/1000 (1000c01010000000)
cosmos-hpc-100a45:995856:995950 [1] NCCL INFO               + PCI[24.0] - PCI/5000 (1000c01010de13b8)
cosmos-hpc-100a45:995858:995947 [3] NCCL INFO NET/0 :GPU/7000 (6/24.000000/PHB) GPU/A000 (6/24.000000/PHB) GPU/44000 (4/24.000000/PXB) GPU/4A000 (4/24.000000/PXB) CPU/0 (3/24.000000/PHB) NET/0 (0/5000.000000/LOC) NET/1 (2/25.000000/LOC) NET/2 (2/25.000000/LOC) NET/3 (2/25.000000/LOC)
cosmos-hpc-100a45:995858:995947 [3] NCCL INFO NET/1 :GPU/7000 (6/24.000000/PHB) GPU/A000 (6/24.000000/PHB) GPU/44000 (4/24.000000/PXB) GPU/4A000 (4/24.000000/PXB) CPU/0 (3/24.000000/PHB) NET/0 (2/25.000000/LOC) NET/1 (0/5000.000000/LOC) NET/2 (2/25.000000/LOC) NET/3 (2/25.000000/LOC)
cosmos-hpc-100a45:995858:995947 [3] NCCL INFO NET/2 :GPU/7000 (6/24.000000/PHB) GPU/A000 (6/24.000000/PHB) GPU/44000 (4/24.000000/PXB) GPU/4A000 (4/24.000000/PXB) CPU/0 (3/24.000000/PHB) NET/0 (2/25.000000/LOC) NET/1 (2/25.000000/LOC) NET/2 (0/5000.000000/LOC) NET/3 (2/25.000000/LOC)
cosmos-hpc-100a45:995858:995947 [3] NCCL INFO NET/3 :GPU/7000 (6/24.000000/PHB) GPU/A000 (6/24.000000/PHB) GPU/44000 (4/24.000000/PXB) GPU/4A000 (4/24.000000/PXB) CPU/0 (3/24.000000/PHB) NET/0 (2/25.000000/LOC) NET/1 (2/25.000000/LOC) NET/2 (2/25.000000/LOC) NET/3 (0/5000.000000/LOC)
cosmos-hpc-100a45:995856:995950 [1] NCCL INFO                             + PCI[24.0] - GPU/7000 (0)
cosmos-hpc-100a45:995856:995950 [1] NCCL INFO                                           + NVL[264.0] - NVS/0
cosmos-hpc-100a45:995856:995950 [1] NCCL INFO               + PCI[24.0] - PCI/8000 (1000c01010de13b8)
cosmos-hpc-100a45:995856:995950 [1] NCCL INFO                             + PCI[24.0] - GPU/A000 (1)
cosmos-hpc-100a45:995856:995950 [1] NCCL INFO                                           + NVL[264.0] - NVS/0
cosmos-hpc-100a45:995856:995950 [1] NCCL INFO + PCI[24.0] - PCI/3E000 (1000c01010000000)
cosmos-hpc-100a45:995856:995950 [1] NCCL INFO               + PCI[24.0] - PCI/42000 (1000c01010de13b8)
cosmos-hpc-100a45:995856:995950 [1] NCCL INFO                             + PCI[24.0] - GPU/44000 (2)
cosmos-hpc-100a45:995856:995950 [1] NCCL INFO                                           + NVL[264.0] - NVS/0
cosmos-hpc-100a45:995856:995950 [1] NCCL INFO               + PCI[24.0] - PCI/48000 (1000c01010de13b8)
cosmos-hpc-100a45:995856:995950 [1] NCCL INFO                             + PCI[24.0] - GPU/4A000 (3)
cosmos-hpc-100a45:995856:995950 [1] NCCL INFO                                           + NVL[264.0] - NVS/0
cosmos-hpc-100a45:995856:995950 [1] NCCL INFO               + PCI[24.0] - NIC/51000
cosmos-hpc-100a45:995856:995950 [1] NCCL INFO                             + NET[25.0] - NET/0 (c5a30003fd7010/1/25.000000)
cosmos-hpc-100a45:995856:995950 [1] NCCL INFO                             + NET[25.0] - NET/1 (c5a30003fd7010/2/25.000000)
cosmos-hpc-100a45:995856:995950 [1] NCCL INFO                             + NET[25.0] - NET/2 (c5a30003fd7010/3/25.000000)
cosmos-hpc-100a45:995856:995950 [1] NCCL INFO                             + NET[25.0] - NET/3 (c5a30003fd7010/4/25.000000)
cosmos-hpc-100a45:995856:995950 [1] NCCL INFO ==========================================
cosmos-hpc-100a45:995856:995950 [1] NCCL INFO GPU/7000 :GPU/7000 (0/5000.000000/LOC) GPU/A000 (2/264.000000/NVL) GPU/44000 (2/264.000000/NVL) GPU/4A000 (2/264.000000/NVL) CPU/0 (3/24.000000/PHB) NET/0 (6/24.000000/PHB) NET/1 (6/24.000000/PHB) NET/2 (6/24.000000/PHB) NET/3 (6/24.000000/PHB)
cosmos-hpc-100a45:995856:995950 [1] NCCL INFO GPU/A000 :GPU/7000 (2/264.000000/NVL) GPU/A000 (0/5000.000000/LOC) GPU/44000 (2/264.000000/NVL) GPU/4A000 (2/264.000000/NVL) CPU/0 (3/24.000000/PHB) NET/0 (6/24.000000/PHB) NET/1 (6/24.000000/PHB) NET/2 (6/24.000000/PHB) NET/3 (6/24.000000/PHB)
cosmos-hpc-100a45:995856:995950 [1] NCCL INFO GPU/44000 :GPU/7000 (2/264.000000/NVL) GPU/A000 (2/264.000000/NVL) GPU/44000 (0/5000.000000/LOC) GPU/4A000 (2/264.000000/NVL) CPU/0 (3/24.000000/PHB) NET/0 (4/24.000000/PXB) NET/1 (4/24.000000/PXB) NET/2 (4/24.000000/PXB) NET/3 (4/24.000000/PXB)
cosmos-hpc-100a45:995856:995950 [1] NCCL INFO GPU/4A000 :GPU/7000 (2/264.000000/NVL) GPU/A000 (2/264.000000/NVL) GPU/44000 (2/264.000000/NVL) GPU/4A000 (0/5000.000000/LOC) CPU/0 (3/24.000000/PHB) NET/0 (4/24.000000/PXB) NET/1 (4/24.000000/PXB) NET/2 (4/24.000000/PXB) NET/3 (4/24.000000/PXB)
cosmos-hpc-100a45:995856:995950 [1] NCCL INFO NET/0 :GPU/7000 (6/24.000000/PHB) GPU/A000 (6/24.000000/PHB) GPU/44000 (4/24.000000/PXB) GPU/4A000 (4/24.000000/PXB) CPU/0 (3/24.000000/PHB) NET/0 (0/5000.000000/LOC) NET/1 (2/25.000000/LOC) NET/2 (2/25.000000/LOC) NET/3 (2/25.000000/LOC)
cosmos-hpc-100a45:995856:995950 [1] NCCL INFO NET/1 :GPU/7000 (6/24.000000/PHB) GPU/A000 (6/24.000000/PHB) GPU/44000 (4/24.000000/PXB) GPU/4A000 (4/24.000000/PXB) CPU/0 (3/24.000000/PHB) NET/0 (2/25.000000/LOC) NET/1 (0/5000.000000/LOC) NET/2 (2/25.000000/LOC) NET/3 (2/25.000000/LOC)
cosmos-hpc-100a45:995856:995950 [1] NCCL INFO NET/2 :GPU/7000 (6/24.000000/PHB) GPU/A000 (6/24.000000/PHB) GPU/44000 (4/24.000000/PXB) GPU/4A000 (4/24.000000/PXB) CPU/0 (3/24.000000/PHB) NET/0 (2/25.000000/LOC) NET/1 (2/25.000000/LOC) NET/2 (0/5000.000000/LOC) NET/3 (2/25.000000/LOC)
cosmos-hpc-100a45:995856:995950 [1] NCCL INFO NET/3 :GPU/7000 (6/24.000000/PHB) GPU/A000 (6/24.000000/PHB) GPU/44000 (4/24.000000/PXB) GPU/4A000 (4/24.000000/PXB) CPU/0 (3/24.000000/PHB) NET/0 (2/25.000000/LOC) NET/1 (2/25.000000/LOC) NET/2 (2/25.000000/LOC) NET/3 (0/5000.000000/LOC)
cosmos-hpc-100a45:995857:995934 [2] NCCL INFO Pattern 4, crossNic 0, nChannels 1, speed 24.000000/24.000000, type NVL/PXB, sameChannels 1
cosmos-hpc-100a45:995857:995934 [2] NCCL INFO  0 : NET/0 GPU/2 GPU/1 GPU/0 GPU/3 NET/0
cosmos-hpc-100a45:995857:995934 [2] NCCL INFO Pattern 1, crossNic 0, nChannels 1, speed 48.000000/24.000000, type NVL/PXB, sameChannels 1
cosmos-hpc-100a45:995857:995934 [2] NCCL INFO  0 : NET/0 GPU/2 GPU/3 GPU/0 GPU/1 NET/0
cosmos-hpc-100a45:995857:995934 [2] NCCL INFO Pattern 3, crossNic 0, nChannels 0, speed 0.000000/0.000000, type NVL/PIX, sameChannels 1
cosmos-hpc-100a45:995855:995929 [0] NCCL INFO Pattern 4, crossNic 0, nChannels 1, speed 24.000000/24.000000, type NVL/PXB, sameChannels 1
cosmos-hpc-100a45:995855:995929 [0] NCCL INFO  0 : NET/0 GPU/2 GPU/1 GPU/0 GPU/3 NET/0
cosmos-hpc-100a45:995855:995929 [0] NCCL INFO Pattern 1, crossNic 0, nChannels 1, speed 48.000000/24.000000, type NVL/PXB, sameChannels 1
cosmos-hpc-100a45:995855:995929 [0] NCCL INFO  0 : NET/0 GPU/2 GPU/3 GPU/0 GPU/1 NET/0
cosmos-hpc-100a45:995856:995950 [1] NCCL INFO Pattern 4, crossNic 0, nChannels 1, speed 24.000000/24.000000, type NVL/PXB, sameChannels 1
cosmos-hpc-100a45:995856:995950 [1] NCCL INFO  0 : NET/0 GPU/2 GPU/1 GPU/0 GPU/3 NET/0
cosmos-hpc-100a45:995856:995950 [1] NCCL INFO Pattern 1, crossNic 0, nChannels 1, speed 48.000000/24.000000, type NVL/PXB, sameChannels 1
cosmos-hpc-100a45:995856:995950 [1] NCCL INFO  0 : NET/0 GPU/2 GPU/3 GPU/0 GPU/1 NET/0
cosmos-hpc-100a45:995855:995929 [0] NCCL INFO Pattern 3, crossNic 0, nChannels 0, speed 0.000000/0.000000, type NVL/PIX, sameChannels 1
cosmos-hpc-100a45:995856:995950 [1] NCCL INFO Pattern 3, crossNic 0, nChannels 0, speed 0.000000/0.000000, type NVL/PIX, sameChannels 1
cosmos-hpc-100a45:995858:995947 [3] NCCL INFO Pattern 4, crossNic 0, nChannels 1, speed 24.000000/24.000000, type NVL/PXB, sameChannels 1
cosmos-hpc-100a45:995858:995947 [3] NCCL INFO  0 : NET/0 GPU/2 GPU/1 GPU/0 GPU/3 NET/0
cosmos-hpc-100a45:995858:995947 [3] NCCL INFO Pattern 1, crossNic 0, nChannels 1, speed 48.000000/24.000000, type NVL/PXB, sameChannels 1
cosmos-hpc-100a45:995858:995947 [3] NCCL INFO  0 : NET/0 GPU/2 GPU/3 GPU/0 GPU/1 NET/0
cosmos-hpc-100a45:995858:995947 [3] NCCL INFO Pattern 3, crossNic 0, nChannels 0, speed 0.000000/0.000000, type NVL/PIX, sameChannels 1
cosmos-hpc-100a45:995858:995947 [3] NCCL INFO Tree 0 : 2 -> 3 -> 0/-1/-1
cosmos-hpc-100a45:995858:995947 [3] NCCL INFO Tree 1 : 2 -> 3 -> 0/-1/-1
cosmos-hpc-100a45:995858:995947 [3] NCCL INFO Ring 00 : 0 -> 3 -> 6
cosmos-hpc-100a45:995858:995947 [3] NCCL INFO Ring 01 : 0 -> 3 -> 6
cosmos-hpc-100a45:995856:995950 [1] NCCL INFO Ring 00 : 2 -> 1 -> 0
cosmos-hpc-100a45:995856:995950 [1] NCCL INFO Ring 01 : 2 -> 1 -> 0
cosmos-hpc-100a45:995857:995934 [2] NCCL INFO Tree 0 : -1 -> 2 -> 3/6/-1
cosmos-hpc-100a45:995857:995934 [2] NCCL INFO Tree 1 : 6 -> 2 -> 3/-1/-1
cosmos-hpc-100a45:995857:995934 [2] NCCL INFO Ring 00 : 7 -> 2 -> 1
cosmos-hpc-100a45:995857:995934 [2] NCCL INFO Ring 01 : 7 -> 2 -> 1
cosmos-hpc-100a55:3149021:3149131 [0] NCCL INFO Ring 00 : 5 -> 4 -> 7
cosmos-hpc-100a55:3149021:3149131 [0] NCCL INFO Ring 01 : 5 -> 4 -> 7
cosmos-hpc-100a45:995855:995929 [0] NCCL INFO Ring 00 : 1 -> 0 -> 3
cosmos-hpc-100a45:995855:995929 [0] NCCL INFO Ring 01 : 1 -> 0 -> 3
cosmos-hpc-100a55:3149022:3149146 [1] NCCL INFO Ring 00 : 6 -> 5 -> 4
cosmos-hpc-100a55:3149022:3149146 [1] NCCL INFO Ring 01 : 6 -> 5 -> 4
cosmos-hpc-100a55:3149023:3149144 [2] NCCL INFO Tree 0 : 2 -> 6 -> 7/-1/-1
cosmos-hpc-100a55:3149023:3149144 [2] NCCL INFO Tree 1 : -1 -> 6 -> 7/2/-1
cosmos-hpc-100a55:3149023:3149144 [2] NCCL INFO Ring 00 : 3 -> 6 -> 5
cosmos-hpc-100a55:3149023:3149144 [2] NCCL INFO Ring 01 : 3 -> 6 -> 5
cosmos-hpc-100a55:3149024:3149142 [3] NCCL INFO Tree 0 : 6 -> 7 -> 4/-1/-1
cosmos-hpc-100a55:3149024:3149142 [3] NCCL INFO Tree 1 : 6 -> 7 -> 4/-1/-1
cosmos-hpc-100a55:3149024:3149142 [3] NCCL INFO Ring 00 : 4 -> 7 -> 2
cosmos-hpc-100a55:3149024:3149142 [3] NCCL INFO Ring 01 : 4 -> 7 -> 2

The purpose of this test: I want to check whether, when multiple processes share one InfiniBand device (a PF), creating VFs via SR-IOV and assigning one VF to each process gives better performance than simply sharing the PF.

Thank you, I look forward to your answer.

@sjeaugey
Member

sjeaugey commented Nov 8, 2022

I'm not sure how your experiment relates to my comment. You have 2 NICs for 2 GPUs so each GPU would use a different NIC by default, e.g. GPU 0 would use mlx5_2 and GPU 1 would use mlx5_3. This has nothing to do with multiple processes or VFs. NCCL is not designed to have multiple processes share GPUs. They should be able to share a NIC though, even with PF -- but I don't have much experience with that, and in your case given you have one NIC per GPU it should not happen.
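If you do need to force a specific NIC per process rather than rely on NCCL's automatic selection, one common approach is to set `NCCL_IB_HCA` per local rank before NCCL initializes. A minimal sketch (the helper and the round-robin mapping are illustrative, and the device names are placeholders for your system's HCAs):

```python
import os

def pin_nic_for_rank(local_rank, hcas):
    """Hypothetical helper: pin each local rank to one HCA so that
    concurrent communicators spread their traffic across the NICs."""
    hca = hcas[local_rank % len(hcas)]
    # A leading '=' makes NCCL match the exact device name instead of a prefix.
    os.environ["NCCL_IB_HCA"] = "=" + hca
    return hca

pin_nic_for_rank(0, ["mlx5_2", "mlx5_3"])  # rank 0 -> mlx5_2
pin_nic_for_rank(1, ["mlx5_2", "mlx5_3"])  # rank 1 -> mlx5_3
```

Note that forcing the NIC this way overrides the distance-based selection, so it can hurt performance if a rank is pinned to a NIC far from its GPU.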

@kimtaehoon-dev

kimtaehoon-dev commented Nov 9, 2022

I am sorry, my long question was confusing. Let me explain again:

  1. There are 2 GPU nodes (8 GPU cards, 8 InfiniBand HCAs).
  2. I have created four VFs for a single PF (via SR-IOV). The PF device name is mlx5_0; the VFs are mlx5_[10:13].

I then run nccl-tests with the command below. It launches 4 processes on each node; each process uses 1 GPU (no sharing: each process gets its own GPU, e.g. process A gets GPU 1, process B gets GPU 2, ...), and all processes perform the all_reduce operation via the 4 InfiniBand VF devices mlx5_[10:13] (only the VFs, not the PFs).

mpirun -v -H {{ node ip }}:4,{{ node ip }}:4 -map-by slot --mca btl ^openib --mca btl_tcp_if_include bond0 -x NCCL_IB_HCA==mlx5_10:1,mlx5_11:1,mlx5_12:1,mlx5_13:1 -x NCCL_DEBUG=INFO \
{{ nccl-tests path }}/build/all_reduce_perf -b 10G -e 20G -f 2 -c 0 -n 20 -w 5 -t 1 -g 1

I cannot understand the nccl-tests result above, because all processes use only one net device, mlx5_10. From your comment "I see. With a recent NCCL, if both GPUs are at the same distance of both NICs, I think each GPU would use a different NIC." I expected the processes to use all of the net devices mlx5_[10:13], but they only use mlx5_10.

I have tried to explain my test as simply as I can despite my poor English. I hope you understand my situation, and I would appreciate any suggestions. Thank you!

@sjeaugey
Member

sjeaugey commented Nov 9, 2022

I think you are running a single allreduce here, so we create a single ring. Hence, only one interface will be used. If you were to run 4 concurrent allreduce operations (across GPUs 0 of each node, GPUs 1 of each node, ...) then maybe each GPU would pick a different VF.

But when NCCL tries to maximize the bandwidth within a node, it can see that all the VFs are actually the same NIC, so it knows there is no point in using all of them because they map to the same port in the end. So once we've found a path using the first port, we know there is no bandwidth left for the other ports and we stop there. Your log indicates a single ring.
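NCCL's actual path search is far more involved, but the reasoning about VFs can be sketched as a toy model (device and port names here are illustrative, not NCCL's internal representation):

```python
def useful_paths(devices):
    """Toy model: VFs backed by the same physical port share its
    bandwidth, so only the first one encountered adds a useful path."""
    seen_ports = set()
    chosen = []
    for name, physical_port in devices:
        if physical_port not in seen_ports:
            seen_ports.add(physical_port)
            chosen.append(name)
    return chosen

# Four VFs carved out of one PF: a single useful path, hence a single ring.
vfs = [("mlx5_10", "mlx5_0/1"), ("mlx5_11", "mlx5_0/1"),
       ("mlx5_12", "mlx5_0/1"), ("mlx5_13", "mlx5_0/1")]
print(useful_paths(vfs))
```

With four VFs on four distinct physical ports, the same model would keep all four paths, which is why separate PFs behave differently from SR-IOV VFs here.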

@kimtaehoon-dev

kimtaehoon-dev commented Nov 9, 2022

I think you are running a single allreduce here, so we create a single ring. Hence, only one interface will be used. If you were to run 4 concurrent allreduce operations (across GPUs 0 of each node, GPUs 1 of each node, ...) then maybe each GPU would pick a different VF.

But when NCCL tries to maximize the bandwidth within a node, it can see that all the VFs are actually the same NIC, so it knows there is no point in using all of them because they map to the same port in the end. So once we've found a path using the first port, we know there is no bandwidth left for the other ports and we stop there. Your log indicates a single ring.

Could you please tell me more about the single ring? How can I configure NCCL to use concurrent rings? And how did you determine that NCCL is using a single ring? Also, in the log above there are lines like "Connected all trees". What does that mean? (I am a complete beginner with NCCL, sorry.)

@sjeaugey
Member

sjeaugey commented Nov 9, 2022

It would use a single ring because more rings would not give better performance, since they all map to the same port. NCCL has fairly advanced topology detection and figures out the GPU, PCI, NIC, and port topology, then searches for the most efficient path between GPUs and NICs.

In the logs, I see you have only 2 channels and we use 2 channels per ring. Also, this log:

cosmos-hpc-100a45:995857:995934 [2] NCCL INFO Pattern 4, crossNic 0, nChannels 1, speed 24.000000/24.000000, type NVL/PXB, sameChannels 1
cosmos-hpc-100a45:995857:995934 [2] NCCL INFO  0 : NET/0 GPU/2 GPU/1 GPU/0 GPU/3 NET/0

shows that for pattern 4 (ring) the best solution we found was 1 channel, going from NET 0 through GPUs 2, 1, 0, and 3, then back to NET 0.
You can also see in the topology NCCL detected that all 4 NETs are attached to the same 24 GB/s PCI port, meaning that using more than one 25 GB/s port will not give better performance:

NCCL INFO === System : maxWidth 24.0 totalWidth 264.0 ===
NCCL INFO CPU/0 (1/2/-1)
NCCL INFO + PCI[24.0] - PCI/1000 (1000c01010000000)
NCCL INFO               + PCI[24.0] - PCI/5000 (1000c01010de13b8)
NCCL INFO                             + PCI[24.0] - GPU/7000 (0)
NCCL INFO                                           + NVL[264.0] - NVS/0
NCCL INFO               + PCI[24.0] - PCI/8000 (1000c01010de13b8)
NCCL INFO                             + PCI[24.0] - GPU/A000 (1)
NCCL INFO                                           + NVL[264.0] - NVS/0
NCCL INFO + PCI[24.0] - PCI/3E000 (1000c01010000000)
NCCL INFO               + PCI[24.0] - PCI/42000 (1000c01010de13b8)
NCCL INFO                             + PCI[24.0] - GPU/44000 (2)
NCCL INFO                                           + NVL[264.0] - NVS/0
NCCL INFO               + PCI[24.0] - PCI/48000 (1000c01010de13b8)
NCCL INFO                             + PCI[24.0] - GPU/4A000 (3)
NCCL INFO                                           + NVL[264.0] - NVS/0
NCCL INFO               + PCI[24.0] - NIC/51000
NCCL INFO                             + NET[25.0] - NET/0 (c5a30003fd7010/1/25.000000)
NCCL INFO                             + NET[25.0] - NET/1 (c5a30003fd7010/2/25.000000)
NCCL INFO                             + NET[25.0] - NET/2 (c5a30003fd7010/3/25.000000)
NCCL INFO                             + NET[25.0] - NET/3 (c5a30003fd7010/4/25.000000)

Connected .... means that we connected GPUs together along the ring(s) or tree(s) that we computed.
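The bandwidth arithmetic behind that conclusion is simply that a path is limited by its slowest link:

```python
def path_bandwidth(*links_gbytes):
    # A path's usable bandwidth is that of its slowest link.
    return min(links_gbytes)

# From the topology above: each NET port is 25 GB/s, but they all sit
# behind one 24 GB/s PCI link, so ports beyond the first add nothing.
one_port = path_bandwidth(25.0, 24.0)
four_ports = path_bandwidth(4 * 25.0, 24.0)
print(one_port, four_ports)  # both capped at 24.0
```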

@kimtaehoon-dev

kimtaehoon-dev commented Nov 10, 2022

I really appreciate your answer! Thank you very much, I have learned a lot. I have a few more questions.

You say like this "But when NCCL tries to maximize the bandwidth within a node, it can see that all VFs are actually the same NIC, so it will know there is no point in using all of them because they map to the same port in the end. So once we've found a path using the first port, we know there is no bandwidth left for the other ports and we stop there."

I understand this to mean that if NCCL detects only one InfiniBand device and all processes (GPUs) must share it, only one process uses the InfiniBand device at a time, and the other processes wait until that process finishes its job. Am I right? (As far as I know, when multiple plain processes (not using NCCL) share an InfiniBand device, each process creates its own QP (Queue Pair) and sends data concurrently, rather than waiting for another process to finish. But processes using NCCL wait. Am I right?)

I ran the nccl-tests alltoall operation like this (mlx5_0 is an InfiniBand HDR device; it is a PF, not a VF):

mpirun -v -H 10.182.60.238:4,10.182.62.4:4 -map-by slot --mca btl ^openib --mca btl_tcp_if_include bond0 -x NCCL_IB_HCA==mlx5_0 -x NCCL_DEBUG=INFO \
> /home/deploy/workspace/mosty/nccl-tests/build/alltoall_perf -b 1G -e 2G -f 2 -c 0 -n 20 -w 5 -t 1 -g 1
---
#
#                                                              out-of-place                       in-place
#       size         count      type   redop    root     time   algbw   busbw #wrong     time   algbw   busbw #wrong
#        (B)    (elements)                               (us)  (GB/s)  (GB/s)            (us)  (GB/s)  (GB/s)
  1073741824      33554432     float    none      -1  1014091    1.06    0.93    N/A  1009359    1.06    0.93    N/A
  2147483648      67108864     float    none      -1  1994010    1.08    0.94    N/A  2016215    1.07    0.93    N/A
# Out of bounds values : 0 OK
# Avg bus bandwidth    : 0.932899
#

But the bandwidth is very low. (As far as I know, InfiniBand HDR can reach up to 200 Gb/s = 25 GB/s.)
I think the reason for the low bandwidth is, as you say, that the other processes wait until the process currently using the InfiniBand device finishes its job. Am I right?

@sjeaugey
Member

sjeaugey commented Nov 10, 2022

I understand this to mean that if NCCL detects only one InfiniBand device and all processes (GPUs) must share it, only one process uses the InfiniBand device at a time, and the other processes wait until that process finishes its job. Am I right?

I'm not sure I'd agree with that. Ring Allreduce requires data to go through each GPU and enter/exit the node once. There is no point in having all GPUs communicate between nodes, we just need to do it once in each direction. In the example above, GPU 2 is receiving data and GPU 3 is sending data. Everything is pipelined; there is a constant flow of data entering the NIC going to GPU 2, then being processed by all GPUs and exiting the node. Feel free to watch my GTC talk this year (2022) for a graphical depiction of the ring algorithm and how the rings map to the hardware.
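Reading the ring from the earlier log, only the two GPUs adjacent to the NET hops ever touch the NIC; the others forward data over NVLink. A small sketch of that mapping:

```python
# Intra-node segment of the ring from the log: NET/0 GPU/2 GPU/1 GPU/0 GPU/3 NET/0
ring = ["NET/0", "GPU/2", "GPU/1", "GPU/0", "GPU/3", "NET/0"]

def nic_endpoints(ring):
    """Only the first GPU after the incoming NET hop receives from the
    NIC, and the last GPU before the outgoing NET hop sends to it."""
    return ring[1], ring[-2]

recv_gpu, send_gpu = nic_endpoints(ring)  # GPU/2 receives, GPU/3 sends
```

So no GPU is "waiting its turn" on the NIC: data streams in through GPU 2 and out through GPU 3 continuously, while GPUs 1 and 0 process chunks in the pipeline.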

I run nccl-tests alltoall operation [...] But bandwidth is very very low.

Alltoall would have each GPU use the NIC because there is no way to fuse data (hence no ring), just direct communication, and they may actually use the different VFs (not that it makes any difference). The expected performance, given that they share a NIC, would be 24 GB/s / 8 GPUs = 3 GB/s. 1 GB/s is indeed much lower than it should be, but that is likely because most GPUs have to use a remote NIC, through the CPU, and that path is slow.
If you use all NICs on the system, and each GPU has a NIC local to its PCI switch, you should see 24GB/s alltoall performance.
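As a back-of-the-envelope check of those numbers:

```python
def expected_alltoall_bw(nic_bw_gbytes, gpus_sharing_nic):
    # Rough upper bound when every GPU funnels its alltoall traffic
    # through one shared NIC.
    return nic_bw_gbytes / gpus_sharing_nic

# 24 GB/s shared by 8 GPUs gives at best ~3 GB/s per GPU; the measured
# ~1 GB/s is lower still because most GPUs reach the NIC via the CPU.
print(expected_alltoall_bw(24.0, 8))  # 3.0
```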
