"add nccl cmake enforce" #4818

Closed
wants to merge 13 commits into from

dzhwinter (Contributor) commented Oct 15, 2017

To support multi-GPU training, we use the NCCL library to aggregate and distribute parameters and gradients. This PR adds an enforce macro helper module and CMake scripts to support NCCL. Because the NCCL library needs to be compatible with different CUDA versions, we load it as a DSO (dynamic shared object), the same technique used for other modules such as cuBLAS and cuDNN.
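To make the "enforce macro" idea concrete, here is a minimal sketch of what such a helper could look like. It is not the code from this PR: `StatusSketch`, `StatusToString`, `EnforceSketch`, `ENFORCE_SKETCH`, and `FakeAllReduce` are all hypothetical names, and the status enum is a stand-in for `ncclResult_t` so that the example compiles without `nccl.h`.

```cpp
#include <cassert>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for ncclResult_t, so the sketch builds without nccl.h.
enum class StatusSketch { kSuccess = 0, kInvalidArgument = 1, kInternalError = 2 };

// Hypothetical stand-in for an error-string lookup such as ncclGetErrorString().
inline const char* StatusToString(StatusSketch s) {
  switch (s) {
    case StatusSketch::kSuccess: return "success";
    case StatusSketch::kInvalidArgument: return "invalid argument";
    case StatusSketch::kInternalError: return "internal error";
  }
  return "unknown status";
}

// Throws std::runtime_error describing the failed call; does nothing on success.
inline void EnforceSketch(StatusSketch s, const char* expr, const char* file, int line) {
  assert(expr != nullptr && file != nullptr);
  if (s == StatusSketch::kSuccess) return;
  std::ostringstream msg;
  msg << expr << " failed with \"" << StatusToString(s) << "\" at " << file << ":" << line;
  throw std::runtime_error(msg.str());
}

// The macro captures the call text and source location at the call site.
#define ENFORCE_SKETCH(call) EnforceSketch((call), #call, __FILE__, __LINE__)

// Hypothetical library call, used only to exercise the macro.
inline StatusSketch FakeAllReduce(int count) {
  return count >= 0 ? StatusSketch::kSuccess : StatusSketch::kInvalidArgument;
}
```

With this sketch, `ENFORCE_SKETCH(FakeAllReduce(4));` returns normally, while `ENFORCE_SKETCH(FakeAllReduce(-1));` throws with a message that names the failing expression and its source location. Throwing an exception rather than aborting is a design assumption here; the helper in this PR may behave differently.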

QiJune (Member) commented Oct 16, 2017

Please add NCCL as a CMake external project, just like pybind/eigen3/... Also, please make CI pass first.

dzhwinter mentioned this pull request Oct 17, 2017 (5 tasks)
dzhwinter changed the title from "add nccl enforce" to "add nccl cmake enforce" on Oct 19, 2017

QiJune (Member) commented Oct 20, 2017

We have two ways to introduce NCCL to Paddle:

  1. Make sure NCCL is already installed on the CI machine, with both the .h and .so files in the correct directories. We can then dynamically load the NCCL .so.
  2. Download the NCCL source code from GitHub and treat NCCL as a CMake external project. Since we pin the NCCL version, we can build NCCL and link its .a into our project.
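The first option relies on loading the shared library at run time rather than linking it at build time. A minimal sketch of that technique on a POSIX system, using `dlopen` and `dlsym`, might look like the following. `DsoHandleSketch` and `HasSymbol` are hypothetical names introduced for illustration; this is not the loader from this PR.

```cpp
#include <dlfcn.h>

#include <cassert>
#include <string>

// Hypothetical RAII wrapper around a dlopen() handle.
class DsoHandleSketch {
 public:
  // Tries to open the library; the handle stays null if it cannot be found.
  explicit DsoHandleSketch(const std::string& soname)
      : handle_(dlopen(soname.c_str(), RTLD_LAZY | RTLD_LOCAL)) {}

  ~DsoHandleSketch() {
    if (handle_ != nullptr) dlclose(handle_);
  }

  DsoHandleSketch(const DsoHandleSketch&) = delete;
  DsoHandleSketch& operator=(const DsoHandleSketch&) = delete;

  bool loaded() const { return handle_ != nullptr; }

  // Looks up a symbol and casts it to the requested function-pointer type.
  // Returns nullptr if the library is not loaded or the symbol is missing.
  template <typename Fn>
  Fn Resolve(const char* symbol) const {
    assert(symbol != nullptr);
    if (handle_ == nullptr) return nullptr;
    return reinterpret_cast<Fn>(dlsym(handle_, symbol));
  }

 private:
  void* handle_;
};

// Returns true only if the library can be opened and exports the symbol.
inline bool HasSymbol(const std::string& soname, const char* symbol) {
  DsoHandleSketch lib(soname);
  return lib.Resolve<void (*)()>(symbol) != nullptr;
}
```

A caller could probe for NCCL with something like `HasSymbol("libnccl.so", "ncclGetErrorString")`; the library file name here is an assumption and depends on how NCCL is installed. Because nothing is linked at build time, the same binary can start on a machine without NCCL and report the missing library instead of failing to load. On older glibc versions this code may need to be linked with `-ldl`.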

dzhwinter (Contributor, Author) commented

NCCL2 has good backward compatibility and can be used with CUDA 7.0 through CUDA 9.x.
Since NCCL2 can be installed manually in the nvidia-docker image, I will remove the CMake scripts that clone NCCL from GitHub.
nvidia-docker

NCCL is not bundled in any CUDA image right now, so you will have to install it manually for now. On 8.0 and 9.0.
Yes, I plan to add NCCL 2.0 to all images soon.
In the meantime, we have a new tag called "base" for CUDA 9.0. It adds our package repositories to the official Ubuntu 16.04 image. Afterwards you can manually install any package:
```dockerfile
FROM nvidia/cuda:9.0-base
RUN apt-get update && apt-get install -y libnccl-dev
```

dzhwinter mentioned this pull request Oct 23, 2017

dzhwinter (Contributor, Author) commented

This will be closed since #5001 cherry-picked this code.

dzhwinter closed this Oct 23, 2017
3 participants