Rework the P2P/IPC/SHM container fix #248

Closed
wants to merge 3 commits

Conversation

AddyLaddy
Collaborator

Fixes: #155

Now we use /proc/sys/kernel/random/boot_id instead of the
hostname and the uts/mnt namespace info.
A new environment variable, NCCL_HOSTID, can be used to override this string.

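A rough sketch of that host-identification logic (illustrative only, not the actual NCCL source; the helper name getHostId is made up here): read /proc/sys/kernel/random/boot_id unless NCCL_HOSTID is set.

/* Sketch: choose the host identifier. NCCL_HOSTID, if set, overrides
 * the boot_id read from procfs. Helper name is hypothetical. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

static int getHostId(char* hostId, int len) {
  const char* env = getenv("NCCL_HOSTID");
  if (env && env[0] != '\0') {                 /* user-provided override */
    snprintf(hostId, len, "%s", env);
    return 0;
  }
  FILE* f = fopen("/proc/sys/kernel/random/boot_id", "r");
  if (f == NULL) return -1;
  if (fgets(hostId, len, f) == NULL) { fclose(f); return -1; }
  fclose(f);
  hostId[strcspn(hostId, "\n")] = '\0';        /* strip trailing newline */
  return 0;
}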
@AddyLaddy
Collaborator Author

@alsrgv Here is the latest P2P/IPC/SHM fix that we are intending to add to the next NCCL release

@alsrgv
Contributor

alsrgv commented Aug 22, 2019

@AddyLaddy, @sjeaugey, I'm a bit confused by this change. Before, the uts and mnt namespaces were used to differentiate between containers, which helped to properly disable P2P/SHM/IPC as necessary:

asergeev@server:~$ readlink /proc/$$/ns/uts
uts:[4026531838]
asergeev@server:~$ nvidia-docker run -it --shm-size 1g -v `pwd`:/mnt --rm horovod/horovod:0.17.0.post1-tf1.14.0-torch1.2.0-mxnet1.5.0-py3.6
root@e8864a4a48b1:/examples# readlink /proc/$$/ns/uts
uts:[4026533300]
root@e8864a4a48b1:/examples#

How will P2P/IPC/SHM be auto-disabled between unconnected containers (e.g. just two nvidia-docker run xyz w/o any sharing) after this change?

Fixes: #155

Now we use /proc/sys/kernel/random/boot_id instead of the
hostname and the uts/mnt namespace info.
A new environment variable, NCCL_HOSTID, can be used to override this string.

Also checks the MAJOR:MINOR of the /dev/shm device to verify
it can be used between containers.
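A minimal sketch of that extra check (again illustrative, not the PR's code; getShmDev is a hypothetical helper): stat /dev/shm and compare the MAJOR:MINOR of the backing device across ranks; different values mean the containers do not share the same tmpfs, so the SHM path should not be used.

/* Sketch: report MAJOR:MINOR of the device backing /dev/shm.
 * Ranks on the same node can compare these values; a mismatch means
 * they are not using the same /dev/shm mount. */
#include <sys/stat.h>
#include <sys/sysmacros.h>   /* major(), minor() */

static int getShmDev(unsigned int* maj, unsigned int* min) {
  struct stat st;
  if (stat("/dev/shm", &st) != 0) return -1;
  *maj = major(st.st_dev);    /* device containing /dev/shm */
  *min = minor(st.st_dev);
  return 0;
}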
@AddyLaddy
Collaborator Author

Sorry, I missed a component of that fix: it should also check the MAJOR:MINOR of the /dev/shm device before attempting to use it in the SHM transport. I'll try to update the PR. It would be good to know if this is sufficient for your use case.
If not, then you could recreate the old behaviour by setting NCCL_HOSTID using a wrapper script.

@alsrgv
Contributor

alsrgv commented Aug 22, 2019

What about CUDA IPC? It does not work across containers either.

@AddyLaddy
Collaborator Author

The motivation for this patch is to enable P2P/IPC/SHM between containers, as I am told it is typical for HPC apps to launch one container per rank. @3XX0 is our container expert; perhaps he can comment?

@sjeaugey
Member

I think the rationale for IPCs is that if we are sure we are on the same node, then we can work on NVML devices and see if they can use P2P.

@3XX0
Member

3XX0 commented Aug 22, 2019

Yes, comparing /proc/sys/kernel/random/boot_id will tell us whether the containers are on the same node.

The next check is the major:minor of /dev/shm, to know whether they share the same POSIX SHM (which is required for CUDA IPC).

The last check is cuDeviceCanAccessPeer(), which verifies that the GPUs are visible to each other and support P2P.

This should handle things like:

# No P2P, no SHM
nvidia-docker run -e NVIDIA_VISIBLE_DEVICES=none horovod
nvidia-docker run -e NVIDIA_VISIBLE_DEVICES=none horovod

# No P2P, SHM possible
nvidia-docker run --name=A -e NVIDIA_VISIBLE_DEVICES=0 horovod
nvidia-docker run --name=B -e NVIDIA_VISIBLE_DEVICES=1 --ipc=container:A horovod

# P2P
nvidia-docker run -e NVIDIA_VISIBLE_DEVICES=0,1 --ipc=host horovod
nvidia-docker run -e NVIDIA_VISIBLE_DEVICES=0,1 --ipc=host horovod
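For reference, the last check above can be exercised directly with the CUDA driver API; a minimal sketch (error checking omitted):

/* Sketch: ask the CUDA driver whether device 0 can access device 1 over P2P. */
#include <cuda.h>
#include <stdio.h>

int main(void) {
  CUdevice devA, devB;
  int canAccess = 0;
  cuInit(0);
  cuDeviceGet(&devA, 0);
  cuDeviceGet(&devB, 1);
  cuDeviceCanAccessPeer(&canAccess, devA, devB);   /* 1 if P2P is possible */
  printf("P2P 0 -> 1: %s\n", canAccess ? "yes" : "no");
  return 0;
}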

@alsrgv
Contributor

alsrgv commented Aug 22, 2019

Do we check cuDeviceCanAccessPeer() now? I remember CUDA IPC crashing before across containers.

@sjeaugey
Member

@3XX0
Member

3XX0 commented Aug 22, 2019

I remember CUDA IPC crashing before across containers.

Probably a driver issue. IIRC, at one point you also needed to share the PID namespace for IPCs to work, but that's definitely not the case anymore.

@alsrgv
Contributor

alsrgv commented Aug 23, 2019

I've tested this PR and it looks good. Thanks, this makes it much more convenient to use NCCL with various container environments!

@AddyLaddy
Collaborator Author

Great, thanks for the feedback, Alex!

sjeaugey added a commit that referenced this pull request Sep 13, 2019
Add LL128 Protocol.

Rewrite the topology detection and tree/ring creation (#179). Improve
tree performance by sending/receiving from different GPUs. Add
model-based tuning to switch between the different algorithms and
protocols.

Rework P2P/SHM detection in containers (#155, #248).

Detect duplicated devices and return an error (#231).
sjeaugey mentioned this pull request Sep 13, 2019
sjeaugey added a commit that referenced this pull request Oct 8, 2019
sjeaugey added a commit that referenced this pull request Oct 18, 2019
sjeaugey added a commit that referenced this pull request Nov 14, 2019
Add LL128 Protocol.

Rewrite the topology detection and tree/ring creation (#179). Improve
tree performance by sending/receiving from different GPUs. Add
model-based tuning to switch between the different algorithms and
protocols.

Rework P2P/SHM detection in containers (#155, #248).

Detect duplicated devices and return an error (#231).

Add tuning for GCP
sjeaugey added a commit that referenced this pull request Nov 19, 2019
@sjeaugey
Member

Merged in 2.5.6. Closing.

Successfully merging this pull request may close these issues.

Mesos containerizer does not isolate UTS namespace