tch::Cuda::is_available() returns false using local libtorch 1.8 for CUDA 11.1 #329

Closed
sethmnielsen opened this issue Mar 7, 2021 · 9 comments

@sethmnielsen

I am on Arch Linux with Rust/cargo 1.50, and have set both LIBTORCH and LD_LIBRARY_PATH according to the README. tch-rs is on the latest commit to master, 25ac21d. I am running the example with cargo run --example basics. Both tch::Cuda::is_available() and tch::Cuda::cudnn_is_available() return false.
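
For reference, the check boils down to something like the following minimal sketch (not the exact basics example code, just the two calls in question):

    // Minimal check: print whether the libtorch that got linked in sees
    // CUDA and cuDNN at all.
    fn main() {
        println!("cuda available:  {}", tch::Cuda::is_available());
        println!("cudnn available: {}", tch::Cuda::cudnn_is_available());
    }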

I have triple-checked that the LIBTORCH path is correct, and it must be, since everything builds fine and libtorch was not downloaded inside of tch-rs. I also know that CUDA itself is installed correctly, because with the python-pytorch-cuda package from the Arch community repo (which is on PyTorch 1.8) I can use CUDA tensors just fine and torch.cuda.is_available() returns True.

I should note that my version of CUDA is 11.2, though I haven't seen that cause any issues with PyTorch 1.8 in Python (which apparently was built for CUDA 11.1).

Any suggestions? Are others having this issue? I saw #291, but I am not building in release, so I think this is a different problem.

@sethmnielsen
Author

sethmnielsen commented Mar 7, 2021

Looks like if I unset LIBTORCH, run cargo clean and then cargo run --example basics, it downloads libtorch (the target directory is now 1.4 GB) and builds successfully, but I still get false from both tch::Cuda::is_available() and tch::Cuda::cudnn_is_available(). So it is definitely not an issue with setting the environment variables correctly.

@LaurentMazare
Owner

Thanks for reporting this issue, I just pushed a (hacky) fix that should hopefully help with this.
The culprit here is that the C++ library is split into a CPU and a CUDA version, and the CUDA version often does not get included by the linker because there is no "hard" dependency on it. We have a hack in place to get around this by forcing the dependency, but it broke with the 1.8 release: the CUDA library was split into multiple sub-libraries and one of them (cuda_cu) was removed by the linker. I tweaked the hack to force this one to be included as well.
Longer term, this will be tackled by passing -Wl,--no-as-needed via Cargo's extra-link-args, but that is only available since cargo 1.50 in nightly mode, so we'll wait for it to reach stable before pushing that fix.
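
For context, the "force the dependency" idea amounts to something like the build-script sketch below (the file and symbol names here are illustrative, not the exact torch-sys code):

    // build.rs sketch: compile a tiny C++ shim that calls a symbol defined in
    // libtorch_cuda. Because the shim is reachable from the Rust crate, the
    // linker sees a hard reference and keeps the library even under
    // -Wl,--as-needed. Assumes a `cc` build-dependency and an (illustrative)
    // libtch/fake_cuda_dependency.cpp shim file.
    fn main() {
        // Ask rustc to link the CUDA library explicitly...
        println!("cargo:rustc-link-lib=torch_cuda");
        // ...and build the shim providing the hard reference to it; the Rust
        // side declares `extern "C" { fn fake_cuda_dependency(); }` and calls
        // it once so the whole chain counts as "needed".
        cc::Build::new()
            .cpp(true)
            .file("libtch/fake_cuda_dependency.cpp")
            .compile("fake_cuda_dependency");
    }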

@sethmnielsen
Author

Thanks for the quick reply and fix! Ah, I see - let's hope that makes it to cargo stable soon.

That fix seemed to do the trick! I am now getting true for both function calls. 😄 Thanks for your help!

@danieldk
Contributor

danieldk commented Mar 8, 2021

With the latest change, linking fails against a libtorch compiled for CUDA 10.2:

  = note: /nix/store/cp1sa3xxvl71cypiinw2c62i5s33chlr-binutils-2.35.1/bin/ld: cannot find -ltorch_cuda_cu
          /nix/store/cp1sa3xxvl71cypiinw2c62i5s33chlr-binutils-2.35.1/bin/ld: cannot find -ltorch_cuda_cpp
          collect2: error: ld returned 1 exit status

because that build does not ship these libraries:

❯ unzip -l libtorch-cxx11-abi-shared-with-deps-1.8.0.zip | grep libtorch_
352214112  02-27-2021 00:02   libtorch/lib/libtorch_cpu.so
1158264872  02-27-2021 00:02   libtorch/lib/libtorch_cuda.so
    12640  02-27-2021 00:02   libtorch/lib/libtorch_global_deps.so
 24837016  02-27-2021 00:02   libtorch/lib/libtorch_python.so

Maybe I should switch to libtorch with CUDA 11.1. Hopefully it doesn't have the same regressions for convolutions as PyTorch 1.7.1 with CUDA 11 had.

@LaurentMazare
Owner

Ah, it's a bummer that this depends on the CUDA version. Anyway, I just pushed a small tweak that only links these libs if the corresponding files are present; hopefully this gets 10.2 working.
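
In spirit the tweak boils down to a check like the following (a sketch only, not the exact torch-sys build logic):

    // build.rs sketch: only emit link directives for the split CUDA libraries
    // when the .so files actually exist, so a CUDA 10.2 libtorch (which ships
    // a single libtorch_cuda.so) still links cleanly.
    use std::path::PathBuf;

    fn main() {
        // LIBTORCH points at the extracted libtorch directory, as in the README.
        let lib_dir =
            PathBuf::from(std::env::var("LIBTORCH").expect("LIBTORCH not set")).join("lib");
        for lib in &["torch_cuda", "torch_cuda_cu", "torch_cuda_cpp"] {
            if lib_dir.join(format!("lib{}.so", lib)).exists() {
                println!("cargo:rustc-link-lib={}", lib);
            }
        }
    }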

@danieldk
Contributor

danieldk commented Mar 8, 2021

Works like a charm, thanks!

@sethmnielsen
Author

sethmnielsen commented Mar 13, 2021

So now I am not entirely sure that this is working. Running the basics example works just fine, but if I try to run the reinforcement-learning example, I get a lot of linker errors.
BTW: I am still on the commit where you first made the fix; I haven't pulled in the commit(s) after that.

➜  cargo run --example reinforcement-learning  --features=python a2c2 > log.txt

   Compiling tch v0.4.0 (/home/seth/school/adv_dl/project2/tch-rs)
error: linking with `cc` failed: exit code: 1
  |
  = note: "cc" "-Wl,--as-needed" "-Wl,-z,noexecstack" "-m64" "-Wl,--eh-frame-hdr" "-L" "/usr/lib64/rustlib/x86_64-unknown-linux-gnu/lib" 
  ...

(then there are lots and lots of linker flags; I'll just share the first and last few of them)

    "-Wl,-Bdynamic" "-lstdc++" "-ltorch_cuda" "-ltorch_cuda_cu" "-ltorch_cuda_cpp" "-ltorch" "-ltorch_cpu" "-lc10" "-lgomp" "-lbz2" "-lpython3.9" "-lgcc_s" "-lutil" "-lrt" "-lpthread" "-lm" "-ldl" "-lc"
  = note: /usr/bin/ld: /home/seth/packages/libtorch/lib/libtorch_cuda_cpp.so: undefined reference to `FLAGS_caffe2_rnn_executor'
          /usr/bin/ld: /home/seth/packages/libtorch/lib/libtorch_cuda_cpp.so: undefined reference to `caffe2::TensorShape::TensorShape()'
          /usr/bin/ld: /home/seth/packages/libtorch/lib/libtorch_cuda_cpp.so: undefined reference to `c10::C10FlagsRegistry[abi:cxx11]()'
          /usr/bin/ld: /home/seth/packages/libtorch/lib/libtorch_cuda_cpp.so: undefined reference to `bool c10::C10FlagParser::Parse<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >*)'
          /usr/bin/ld: /home/seth/packages/libtorch/lib/libtorch_cuda_cpp.so: undefined reference to `FLAGS_caffe2_force_shared_col_buffer'
          /usr/bin/ld: /home/seth/packages/libtorch/lib/libtorch_cuda_cpp.so: undefined reference to `caffe2::OperatorDef::OperatorDef()'
          /usr/bin/ld: /home/seth/packages/libtorch/lib/libtorch_cuda_cpp.so: undefined reference to `FLAGS_caffe2_operator_throw_if_fp_exceptions'
          /usr/bin/ld: /home/seth/packages/libtorch/lib/libtorch_cuda_cpp.so: undefined reference to `caffe2::BlobProto::BlobProto()'
          /usr/bin/ld: /home/seth/packages/libtorch/lib/libtorch_cuda_cpp.so: undefined reference to `FLAGS_caffe2_print_blob_sizes_at_exit'
          /usr/bin/ld: /home/seth/packages/libtorch/lib/libtorch_cuda_cu.so: undefined reference to `c10::MessageLogger::~MessageLogger()'
          /usr/bin/ld: /home/seth/packages/libtorch/lib/libtorch_cuda_cpp.so: undefined reference to `bool c10::C10FlagParser::Parse<int>(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, int*)'
          /usr/bin/ld: /home/seth/packages/libtorch/lib/libtorch_cuda_cpp.so: undefined reference to `FLAGS_caffe2_max_keep_on_shrink_memory'
          /usr/bin/ld: /home/seth/packages/libtorch/lib/libtorch_cuda_cpp.so: undefined reference to `FLAGS_caffe2_operator_throw_on_first_occurrence_if_fp_exceptions'
          /usr/bin/ld: /home/seth/packages/libtorch/lib/libtorch_cuda_cpp.so: undefined reference to `caffe2::TensorProtos::TensorProtos()'
          /usr/bin/ld: /home/seth/packages/libtorch/lib/libtorch_cuda_cpp.so: undefined reference to `caffe2::DeviceOption::DeviceOption()'
          /usr/bin/ld: /home/seth/packages/libtorch/lib/libtorch_cuda_cpp.so: undefined reference to `caffe2::NetDef::NetDef()'
          /usr/bin/ld: /home/seth/packages/libtorch/lib/libtorch_cuda_cpp.so: undefined reference to `FLAGS_caffe2_operator_throw_if_fp_overflow_exceptions'
          /usr/bin/ld: /home/seth/packages/libtorch/lib/libtorch_cuda_cu.so: undefined reference to `c10::MessageLogger::MessageLogger(char const*, int, int)'
          /usr/bin/ld: /home/seth/packages/libtorch/lib/libtorch_cuda_cpp.so: undefined reference to `FLAGS_caffe2_keep_on_shrink'
          /usr/bin/ld: /home/seth/packages/libtorch/lib/libtorch_cuda_cpp.so: undefined reference to `FLAGS_caffe2_workspace_stack_debug'
          /usr/bin/ld: /home/seth/packages/libtorch/lib/libtorch_cuda_cpp.so: undefined reference to `bool c10::C10FlagParser::Parse<bool>(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, bool*)'
          collect2: error: ld returned 1 exit status


error: aborting due to previous error

error: could not compile `tch`

To learn more, run the command again with --verbose.

EDIT: Should I make a separate issue? It seems to be correctly trying to link to libtorch_cuda_cpp.so, but is having issues doing so.

@sethmnielsen
Author

I fixed it. There must have been a conflict between /usr/lib/libtorch_cuda.so (installed by the python-pytorch-cuda Arch package) and the locally downloaded libtorch, as uninstalling the python-pytorch-cuda package resulted in a successful build and run of the example program. It's also using Cuda(0) as the device, so everything looks good!
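
For anyone checking the same thing, a quick sanity test along these lines confirms which device tch picks up (a sketch based on my understanding of the tch API):

    // Confirm the device tch selects and that a tensor can be allocated on it.
    use tch::{Device, Kind, Tensor};

    fn main() {
        let device = Device::cuda_if_available();
        println!("device: {:?}", device); // prints Cuda(0) here
        let t = Tensor::zeros(&[2, 3], (Kind::Float, device));
        println!("tensor is on: {:?}", t.device());
    }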

Sorry for the false alarm!

@LaurentMazare
Owner

Glad that you got it to work. Closing this issue for now, but feel free to re-open if you notice any more issues.
