
pytorch: fix CUDA support #57438

Closed
wants to merge 1 commit into from

Conversation

@dnaq (Contributor) commented Mar 11, 2019

This commit fixes CUDA support when building with allowUnfree = true and cudaSupport = true. The previous change to pytorch.nix built, but CUDA support did not work at runtime.

This is a work in progress: the test suite still does not find CUDA, so no CUDA tests are run against the compiled package. The lists of packages in nativeBuildInputs and propagatedBuildInputs are also probably wrong, due to a lack of understanding on my side.

Motivation for this change

CUDA support didn't work in the previous version.

Things done
  • Tested using sandboxing (nix.useSandbox on NixOS, or option sandbox in nix.conf on non-NixOS)
  • Built on platform(s)
    • NixOS
    • macOS
    • other Linux distributions
  • Tested via one or more NixOS test(s) if existing and applicable for the change (look inside nixos/tests)
  • Tested compilation of all pkgs that depend on this change using nix-shell -p nox --run "nox-review wip"
  • Tested execution of all binary files (usually in ./result/bin/)
  • Determined the impact on package closure size (by running nix path-info -S before and after)
  • Assured whether relevant documentation is up to date
  • Fits CONTRIBUTING.md.

] ++ lib.optionals cudaSupport [ cudatoolkit_joined cudnn ]
++ lib.optionals stdenv.isLinux [ numactl ];

propagatedBuildInputs = [
cffi
numpy.blas
@FRidh (Member) Mar 11, 2019

This is unlikely to be correct

@dnaq (Contributor, Author) Mar 12, 2019

Same as above

@@ -79,20 +79,19 @@ in buildPythonPackage rec {

nativeBuildInputs = [
cmake
numpy.blas
@FRidh (Member) Mar 11, 2019

this probably needs to be in both nativeBuildInputs and buildInputs

@dnaq (Contributor, Author) Mar 12, 2019

To be honest, I don’t really know which packages need to be in buildInputs, nativeBuildInputs, or propagatedBuildInputs. This is just something that builds and solves my immediate need. I’d be happy to modify it as needed, but given that each build of pytorch takes a couple of hours, I don’t really have time for a lot of trial and error.
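For background, the conventional split in nixpkgs is roughly: nativeBuildInputs holds build-time tools that execute on the build machine (e.g. cmake, which), buildInputs holds libraries the package compiles and links against, and propagatedBuildInputs holds dependencies that consumers of the package also need at runtime. A minimal sketch of that split (illustrative only, not the actual pytorch.nix):

```nix
{
  # Tools that run during the build on the build machine.
  nativeBuildInputs = [ cmake which ];

  # Libraries the package is compiled and linked against.
  buildInputs = [ numpy.blas ];

  # Dependencies that are added to the environment of anything
  # depending on this package, because they are needed at runtime.
  propagatedBuildInputs = [ cffi numpy pyyaml ];
}
```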

@smatting (Contributor) commented Apr 5, 2019

Thanks @dnaq!
I can confirm that the changes @FRidh proposed also build and work.

Here is what I have tested:

  nativeBuildInputs = [
     cmake
     utillinux
     which
     numpy.blas
  ] ++ lib.optionals cudaSupport [ cudatoolkit_joined cudnn ]
    ++ lib.optionals stdenv.isLinux [ numactl ];

  buildInputs = [
     numpy.blas
  ] ++ lib.optionals cudaSupport [ cudatoolkit_joined cudnn ]
    ++ lib.optionals stdenv.isLinux [ numactl ];

  propagatedBuildInputs = [
    cffi
    numpy
    pyyaml
    numpy.blas
  ] ++ lib.optional (pythonOlder "3.5") typing
    ++ lib.optionals cudaSupport [ cudatoolkit_joined cudnn ];

Could you please add this as a comment at the top of the package?

# NOTE: To be able to use the CUDA version of this package,
# you need to manually load the CUDA library from your installed nvidia driver.
# On a NixOs machine this can be done by adding
#
# environment.variables = {
#     LD_PRELOAD = "${pkgs.linuxPackages.nvidia_x11}/lib/libcuda.so:${pkgs.linuxPackages.nvidia_x11}/lib/libnvidia-fatbinaryloader.so";
# };
#
# to your configuration.nix

@teh (Contributor) commented Apr 14, 2019

@dnaq @smatting based on this discussion (#46032) it sounds like this PR may no longer be needed? Would you mind trying CUDA on current master?

@baracoder (Contributor) commented Apr 15, 2019

I tried a nix-shell on 7d0db6a with python36 and pytorchWithCuda on a project, and I get AssertionError: Torch not compiled with CUDA enabled.

The build on this PR fails for me at some other dependency.

@andersk (Contributor) commented Apr 22, 2019

I confirmed that moving cudatoolkit_joined from buildInputs to nativeBuildInputs is sufficient to get CUDA working (torch.cuda.is_available() returns True, and torch.cuda.get_device_name(0) returns 'GeForce GTX 1080'). It is required to make nvcc available in PATH at build time; otherwise CUDA support is disabled, with a warning buried in the build log:

which: no nvcc in (/nix/store/lmwk7mg8y79m3izdxdlckdn385x7jgl7-python3-3.7.3/bin:/nix/store/r3p6lbws0mp0lp8jwvivl68qcbzdvy8k-python3.7-setuptools-40.8.0/bin:/nix/store/vb8h3l9jvprlb34a0fjw4g6r7dv329ka-cmake-3.13.4/bin:/nix/store/f2bc62h4xcnqhbgppz199aqikxy164jj-util-linux-2.33.1-bin/bin:/nix/store/lil4rsy5ng1dq5232r6xhgfrvjrkmkmf-which-2.21/bin:/nix/store/lmwk7mg8y79m3izdxdlckdn385x7jgl7-python3-3.7.3/bin:/nix/store/409rs332a9qqkg5xd648j0rx01v6f7a7-python3.7-coverage-4.5.2/bin:/nix/store/bd2sn66007fvkvvn2sk3ga69dgpxpsqq-patchelf-0.9/bin:/nix/store/y60j0zq2j50iaaqjn39i18hkhp277zfy-gcc-wrapper-7.4.0/bin:/nix/store/pm4rg0bdiaj5b748kncp9vf7n3x446sd-gcc-7.4.0/bin:/nix/store/f5wl80zkrd3fc1jxsljmnpn7y02lz6v1-glibc-2.27-bin/bin:/nix/store/baylddnb83lh45v3fz15ddhbpxbdb7m7-coreutils-8.31/bin:/nix/store/1n593wk7xhygrxi2nwah6f93ksd4if8i-binutils-wrapper-2.31.1/bin:/nix/store/1kl6ms8x56iyhylb2r83lq7j3jbnix7w-binutils-2.31.1/bin:/nix/store/f5wl80zkrd3fc1jxsljmnpn7y02lz6v1-glibc-2.27-bin/bin:/nix/store/baylddnb83lh45v3fz15ddhbpxbdb7m7-coreutils-8.31/bin:/nix/store/baylddnb83lh45v3fz15ddhbpxbdb7m7-coreutils-8.31/bin:/nix/store/r432g6h0qy7wq18kksdbm9f72h0wx7yv-findutils-4.6.0/bin:/nix/store/2hr6x9f9ivljdr2dkh4sz2wyhmpn8xmc-diffutils-3.7/bin:/nix/store/h67k75i4wm7jkyaan97xzw0g38vm3yxa-gnused-4.7/bin:/nix/store/pyfxqzjkffbs8c0cg28bvspmyb8rvdc8-gnugrep-3.3/bin:/nix/store/b9kmciqh6n9z2b1lg4dlfbh1qzq2pq8z-gawk-4.2.1/bin:/nix/store/4c2akixx0smyz2xbwpfa41bk7gf7rq6f-gnutar-1.31/bin:/nix/store/d9cv4lh32as716x3d9p9ikdh7j2kqrdh-gzip-1.10/bin:/nix/store/plcgyqkiqb599q42cczkqhnrii6pav6w-bzip2-1.0.6.0.1-bin/bin:/nix/store/yg76yir7rkxkfz6p77w4vjasi3cgc0q6-gnumake-4.2.1/bin:/nix/store/yjkch3aia9ny4dq42dbcjrdwqb1y8c33-bash-4.4-p23/bin:/nix/store/xkzym3c0r5368lxs2m9h247c93m0hiv2-patch-2.7.6/bin:/nix/store/5zdqndi3fk72n4drd38wzmgbrqhlaciv-xz-5.2.4-bin/bin)

Possibly other changes may be desirable for cross-compilation; all I know is that this one is necessary for the normal case. Submitted as #60002.
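The fix described above can be sketched as follows (illustrative only; the actual patch is in #60002):

```nix
  # cudatoolkit_joined must be in nativeBuildInputs so that nvcc is on
  # PATH at build time; otherwise CMake silently disables CUDA support
  # and only warns deep inside the build log.
  nativeBuildInputs = [
    cmake
    which
  ] ++ lib.optionals cudaSupport [ cudatoolkit_joined ];
```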

As for @smatting’s comment:

Could you please add this as a comment on the top of the package?

# NOTE: To be able to use the CUDA version of this package,
# you need to manually load the CUDA library from your installed nvidia driver.
# On a NixOs machine this can be done by adding
#
# environment.variables = {
#     LD_PRELOAD = "${pkgs.linuxPackages.nvidia_x11}/lib/libcuda.so:${pkgs.linuxPackages.nvidia_x11}/lib/libnvidia-fatbinaryloader.so";
# };
#
# to your configuration.nix

I didn’t need any such configuration. I simply configured services.xserver.videoDrivers = [ "nvidia" ];, which caused NixOS to add /run/opengl-driver/lib (a symlink to ${nvidia_x11}/lib) to LD_LIBRARY_PATH, which is enough to allow PyTorch to find the needed libraries. Force-loading libraries into every process with LD_PRELOAD may have unintended side effects.
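The driver setup described above amounts to a configuration.nix fragment along these lines (a sketch, assuming NixOS with the proprietary NVIDIA driver):

```nix
{
  # NixOS then adds /run/opengl-driver/lib (a symlink into the
  # nvidia_x11 package) to the library search path, which lets
  # PyTorch locate libcuda at runtime without any LD_PRELOAD.
  services.xserver.videoDrivers = [ "nvidia" ];
}
```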

@dnaq (Contributor, Author) commented May 11, 2019

Seems like #60002 solves this issue in a more idiomatic way.

@dnaq dnaq closed this May 11, 2019
@dnaq dnaq deleted the pytorch-fix branch May 11, 2019
7 participants