Rework the P2P/IPC/SHM container fix #248

Closed
wants to merge 3 commits

Conversation

AddyLaddy
Collaborator

Fixes: #155

Now we use /proc/sys/kernel/random/boot_id instead of the
hostname and the uts/mnt namespace info.
A new environment variable, NCCL_HOSTID, can be used to override this string.

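A rough sketch of that host-identification logic (illustrative only, not the actual NCCL source; the helper name getHostId is made up here): read /proc/sys/kernel/random/boot_id unless NCCL_HOSTID is set.

/* Sketch: choose the host identifier. NCCL_HOSTID, if set, overrides
 * the boot_id read from procfs. Helper name is hypothetical. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

static int getHostId(char* hostId, int len) {
  const char* env = getenv("NCCL_HOSTID");
  if (env && env[0] != '\0') {                 /* user-provided override */
    snprintf(hostId, len, "%s", env);
    return 0;
  }
  FILE* f = fopen("/proc/sys/kernel/random/boot_id", "r");
  if (f == NULL) return -1;
  if (fgets(hostId, len, f) == NULL) { fclose(f); return -1; }
  fclose(f);
  hostId[strcspn(hostId, "\n")] = '\0';        /* strip trailing newline */
  return 0;
}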
@AddyLaddy
Collaborator Author

@alsrgv Here is the latest P2P/IPC/SHM fix that we are intending to add to the next NCCL release

@alsrgv
Contributor

alsrgv commented Aug 22, 2019

@AddyLaddy, @sjeaugey, I'm a bit confused by this change. Before, the uts and mnt namespaces were used to differentiate between containers, which helped to properly disable P2P/SHM/IPC as necessary:

asergeev@server:~$ readlink /proc/$$/ns/uts
uts:[4026531838]
asergeev@server:~$ nvidia-docker run -it --shm-size 1g -v `pwd`:/mnt --rm horovod/horovod:0.17.0.post1-tf1.14.0-torch1.2.0-mxnet1.5.0-py3.6
root@e8864a4a48b1:/examples# readlink /proc/$$/ns/uts
uts:[4026533300]
root@e8864a4a48b1:/examples#

How will P2P/IPC/SHM be auto-disabled between unconnected containers (e.g. just two nvidia-docker run xyz w/o any sharing) after this change?

Fixes: #155

Now we use /proc/sys/kernel/random/boot_id instead of the
hostname and the uts/mnt namespace info.
A new environment variable, NCCL_HOSTID, can be used to override this string.

Also checks the MAJOR:MINOR of the /dev/shm device to verify
it can be used between containers.
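A minimal sketch of that extra check (again illustrative, not the PR's code; getShmDev is a hypothetical helper): stat /dev/shm and compare the MAJOR:MINOR of the backing device across ranks; different values mean the containers do not share the same tmpfs, so the SHM path should not be used.

/* Sketch: report MAJOR:MINOR of the device backing /dev/shm.
 * Ranks on the same node can compare these values; a mismatch means
 * they are not using the same /dev/shm mount. */
#include <sys/stat.h>
#include <sys/sysmacros.h>   /* major(), minor() */

static int getShmDev(unsigned int* maj, unsigned int* min) {
  struct stat st;
  if (stat("/dev/shm", &st) != 0) return -1;
  *maj = major(st.st_dev);    /* device containing /dev/shm */
  *min = minor(st.st_dev);
  return 0;
}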
@AddyLaddy
Collaborator Author

Sorry, I missed a component of that fix: it should also check the MAJOR:MINOR of the /dev/shm device before attempting to use it in the SHM transport. I'll try to update the PR. It would be good to know if this is sufficient for your use case.
If not, then you could recreate the old behaviour by setting NCCL_HOSTID using a wrapper script.

@alsrgv
Contributor

alsrgv commented Aug 22, 2019

What about CUDA IPC? It does not work across containers either.

@AddyLaddy
Collaborator Author

The motivation for this patch is to enable P2P/IPC/SHM between containers, as I am told it is typical for HPC apps to launch one container per rank. @3XX0 is our container expert; perhaps he can comment?

@sjeaugey
Member

I think the rationale for IPCs is that if we are sure we are on the same node, then we can work on NVML devices and see if they can use P2P.

@3XX0
Member

3XX0 commented Aug 22, 2019

Yes, comparing /proc/sys/kernel/random/boot_id will tell us whether the containers are on the same node.

The next check is the major:minor of /dev/shm, to know whether they share the same POSIX SHM (which is required for CUDA IPC).

The last check is cuDeviceCanAccessPeer(), which verifies that the GPUs are visible to each other and support P2P.

This should handle things like:

# No P2P, no SHM
nvidia-docker run -e NVIDIA_VISIBLE_DEVICES=none horovod
nvidia-docker run -e NVIDIA_VISIBLE_DEVICES=none horovod

# No P2P, SHM possible
nvidia-docker run --name=A -e NVIDIA_VISIBLE_DEVICES=0 horovod
nvidia-docker run --name=B -e NVIDIA_VISIBLE_DEVICES=1 --ipc=container:A horovod

# P2P
nvidia-docker run -e NVIDIA_VISIBLE_DEVICES=0,1 --ipc=host horovod
nvidia-docker run -e NVIDIA_VISIBLE_DEVICES=0,1 --ipc=host horovod
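For reference, the last check above can be exercised directly with the CUDA driver API; a minimal sketch (error checking omitted):

/* Sketch: ask the CUDA driver whether device 0 can access device 1 over P2P. */
#include <cuda.h>
#include <stdio.h>

int main(void) {
  CUdevice devA, devB;
  int canAccess = 0;
  cuInit(0);
  cuDeviceGet(&devA, 0);
  cuDeviceGet(&devB, 1);
  cuDeviceCanAccessPeer(&canAccess, devA, devB);   /* 1 if P2P is possible */
  printf("P2P 0 -> 1: %s\n", canAccess ? "yes" : "no");
  return 0;
}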

@alsrgv
Contributor

alsrgv commented Aug 22, 2019

Do we check cuDeviceCanAccessPeer() now? I remember CUDA IPC crashing before across containers.

@sjeaugey
Member

@3XX0
Member

3XX0 commented Aug 22, 2019

I remember CUDA IPC crashing before across containers.

Probably a driver issue. IIRC, at one point you also needed to share the PID namespace for IPCs to work, but that's definitely not the case anymore.

@alsrgv
Contributor

alsrgv commented Aug 23, 2019

I've tested this PR and it looks good. Thanks, this makes it much more convenient to use NCCL with various container environments!

@AddyLaddy
Collaborator Author

Great, thanks for the feedback, Alex!

sjeaugey added a commit that referenced this pull request Sep 13, 2019
Add LL128 Protocol.

Rewrite the topology detection and tree/ring creation (#179). Improve
tree performance by sending/receiving from different GPUs. Add
model-based tuning to switch between the different algorithms and
protocols.

Rework P2P/SHM detection in containers (#155, #248).

Detect duplicated devices and return an error (#231).
sjeaugey mentioned this pull request Sep 13, 2019
sjeaugey added a commit that referenced this pull request Oct 8, 2019
sjeaugey added a commit that referenced this pull request Oct 18, 2019
sjeaugey added a commit that referenced this pull request Nov 14, 2019
Add LL128 Protocol.

Rewrite the topology detection and tree/ring creation (#179). Improve
tree performance by sending/receiving from different GPUs. Add
model-based tuning to switch between the different algorithms and
protocols.

Rework P2P/SHM detection in containers (#155, #248).

Detect duplicated devices and return an error (#231).

Add tuning for GCP
sjeaugey added a commit that referenced this pull request Nov 19, 2019
@sjeaugey
Member

Merged in 2.5.6. Closing.

Successfully merging this pull request may close these issues.

Mesos containerizer does not isolate UTS namespace