Cannot initialize CUDA without ATen_cuda library #182

Closed
johannesvollmer opened this issue May 3, 2020 · 13 comments

Comments

@johannesvollmer

johannesvollmer commented May 3, 2020

Hi! First, thanks for your work regarding PyTorch.

Background

I have run into several problems when trying to run a project using rust-bert, a Rust-native implementation of Transformer-based models that uses tch-rs. The CPU version ran just fine, but the CUDA version did not. I initially started a thread on the rust-bert repository, which may contain more detailed information, but I'll summarize it here:

Problem

First, switching from Device::CPU to Device::CUDA made it stop working and generated the following error:

TorchError { c_error: "Cannot initialize CUDA without ATen_cuda library. PyTorch splits its backend into two shared libraries: a CPU library and a CUDA library; this error has occurred because you are trying to use some CUDA functionality, but the CUDA library has not been loaded by the dynamic linker for some reason. The CUDA library MUST be loaded, EVEN IF you don\'t directly use any symbols from the CUDA library! One common culprit is a lack of -Wl,--no-as-needed in your link arguments; many dynamic linkers will delete dynamic library dependencies if you don\'t depend on any of their symbols. You can check if this has occurred by using ldd on your binary to see if there is a dependency on *_cuda.so library. (initCUDA at C:\\b\\windows\\pytorch\\aten\\src\\ATen/detail/CUDAHooksInterface.h:63)\n(no backtrace available)" }.
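
The message suggests checking the binary with ldd, which is not available on Windows. As a rough stand-in, the sketch below scans a binary's raw bytes for a DLL name. PE import tables store DLL names as ASCII strings, so a byte search is only a crude heuristic, not a real import-table parser. The function names and the example DLL name `torch_cuda.dll` are assumptions for illustration and are not confirmed anywhere in this thread.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Returns true if `needle` occurs anywhere in `haystack`, ignoring ASCII case.
fn contains_ascii_ci(haystack: &[u8], needle: &[u8]) -> bool {
    if needle.is_empty() {
        // `windows(0)` would panic, and an empty needle trivially matches.
        return true;
    }
    haystack
        .windows(needle.len())
        .any(|window| window.eq_ignore_ascii_case(needle))
}

/// Crude check: does the file at `binary` mention `dll_name` anywhere in its bytes?
/// A hit hints that the DLL is listed as an import; a miss hints that it is not.
fn binary_mentions_dll(binary: &Path, dll_name: &str) -> io::Result<bool> {
    let bytes = fs::read(binary)?;
    Ok(contains_ascii_ci(&bytes, dll_name.as_bytes()))
}
```

Calling `binary_mentions_dll` on the built executable with the (assumed) name `torch_cuda.dll` would give a first hint whether the linker kept the CUDA dependency. A dedicated PE inspection tool is more reliable.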

Trying to fix it, I installed CUDA 10.2.89, updated the graphics drivers, and tried both release and debug modes, always deleting the cargo build directory to force a fresh build. None of this changed anything.

Then I tried manually installing PyTorch 1.5 and setting various environment variables (LIBTORCH and PATH with the LibTorch path, TORCH_CUDA_VERSION as 10.2). After that, the previous error did not even show up, because a different runtime error aborted the process before anything else could happen:

error: process didn't exit successfully: `target\release\phrase-set-variations.exe` (exit code: 0xc0000135, STATUS_DLL_NOT_FOUND)

Now, not even the CPU version runs, not even after reverting the environment variable changes. :(
I was not able to resolve that error with Google, so I'm asking you for help here.
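
STATUS_DLL_NOT_FOUND means the Windows loader could not find a DLL the executable depends on. One way to narrow this down is to check which directories on PATH actually contain a given DLL. The sketch below does that with the standard library only. It looks at PATH alone, whereas the real loader also searches other locations such as the executable's own directory, so treat it as an approximation. The function names are hypothetical, and the DLL names to probe are left to the caller.

```rust
use std::env;
use std::ffi::OsStr;
use std::path::PathBuf;

/// Returns the full path of the first `file_name` found in the directories
/// listed in `path_value` (a PATH-style list), or `None` if no directory has it.
fn find_in_path_list(path_value: &OsStr, file_name: &str) -> Option<PathBuf> {
    env::split_paths(path_value)
        .map(|dir| dir.join(file_name))
        .find(|candidate| candidate.is_file())
}

/// Reports, for each name, where it was found on the current process's PATH (if anywhere).
fn locate_dlls(names: &[&str]) -> Vec<(String, Option<PathBuf>)> {
    let path_value = env::var_os("PATH").unwrap_or_default();
    names
        .iter()
        .map(|name| (name.to_string(), find_in_path_list(&path_value, name)))
        .collect()
}
```

Printing the result of `locate_dlls` for the DLLs shipped in the LibTorch download shows at a glance which ones resolve to `None`.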

Environment

CUDA 10.2.89
tch = "0.1.7"
Windows 10
GeForce GTX 1060
rust-bert = "0.7.0"

Question

If any of you has an idea of what I could do next, I would really appreciate some hints :)

@johannesvollmer
Author

Update:

I created a new project with only tch and the example/basics.rs code. Compiled in debug mode, the same error still remains. Compiling other Rust projects without CUDA works.

@LaurentMazare
Owner

I'm very unfamiliar with how linking works on Windows, so I probably won't be of much help.
Could you try unsetting all environment variables? This should download the CPU version and link and run against it. Then try running cargo run --example basics again.
(The error message seems to indicate that Env:Path is not set properly. I'm actually not sure whether it should point to the libtorch directory or to libtorch\lib.)

@johannesvollmer
Author

Thanks for your quick answer. I already tried unsetting all environment variables, without effect. The only CUDA-related environment variables I could find are

CUDA_PATH = C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.2 and
CUDA_PATH_V10_2 = C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.2.

I assume removing these would not help; I think they come from the CUDA installation itself.
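
To double-check which variables the build actually sees, a small helper can list every environment variable whose name mentions CUDA or TORCH. This is a minimal sketch with hypothetical function names. It does not cover PATH, which would need to be inspected separately.

```rust
use std::env;

/// Keeps only the (name, value) pairs whose name contains "CUDA" or "TORCH",
/// ignoring ASCII case, and returns them sorted.
fn torch_related_vars<I>(vars: I) -> Vec<(String, String)>
where
    I: IntoIterator<Item = (String, String)>,
{
    let mut hits: Vec<(String, String)> = vars
        .into_iter()
        .filter(|(name, _)| {
            let upper = name.to_ascii_uppercase();
            upper.contains("CUDA") || upper.contains("TORCH")
        })
        .collect();
    hits.sort();
    hits
}

/// Convenience wrapper over the current process environment.
/// Variables that are not valid Unicode are skipped instead of causing a panic.
fn current_torch_related_vars() -> Vec<(String, String)> {
    torch_related_vars(env::vars_os().filter_map(|(name, value)| {
        Some((name.into_string().ok()?, value.into_string().ok()?))
    }))
}
```

Running `current_torch_related_vars()` from the same shell that runs cargo would confirm whether LIBTORCH or TORCH_CUDA_VERSION are still set there.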

@johannesvollmer
Author

Is there a way to inspect what exactly is downloaded when building?

@johannesvollmer
Author

I didn't change anything, but I deleted the target build folder, and now the example runs..... what the

@johannesvollmer
Author

johannesvollmer commented May 3, 2020

The example only uses CPU devices, though, and CUDA still does not work.
Changing FLOAT_CPU to FLOAT_CUDA in the basic example produces the previous error again: Cannot initialize CUDA without ATen_cuda library [...]

To recap:

  • up to date graphics drivers
  • tch 0.1.7
  • no environment variables (which means no manually installed torch lib)

@LaurentMazare
Owner

Did you look at the related issue #177?
I would currently expect cuda to work in debug mode on windows but in release mode you probably will need the hack given in the issue.
Also you should certainly run cargo clean or delete your target directory when changing LIBTORCH.
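
For context on why a stale target directory can matter: Cargo reuses a build script's earlier output unless something tells it to run the script again. A build script can declare the environment variables it depends on by printing `cargo:rerun-if-env-changed=NAME`. The sketch below shows that mechanism in isolation. It is a generic illustration with hypothetical function names, and it says nothing about what the torch-sys build script actually does.

```rust
/// Builds the directives a build script would print so that Cargo reruns it
/// whenever one of the named environment variables changes.
fn rerun_directives(vars: &[&str]) -> Vec<String> {
    vars.iter()
        .map(|name| format!("cargo:rerun-if-env-changed={}", name))
        .collect()
}

/// What the `main` of a hypothetical build.rs would call: one directive per line on stdout.
fn emit_rerun_directives(vars: &[&str]) {
    for line in rerun_directives(vars) {
        println!("{}", line);
    }
}
```

Without such directives, running cargo clean or deleting the target directory after changing LIBTORCH remains the safe option.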

@johannesvollmer
Author

Oh yeah right, I looked into this issue but then forgot about it when the other error occurred! haha

@johannesvollmer
Author

The CUDA version does not run in either debug or release mode. It produces the aforementioned error even when inserting unsafe { torch_sys::dummy_cuda_dependency(); }

johannesvollmer changed the title from "Cannot initialize CUDA without ATen_cuda library, then (exit code: 0xc0000135, STATUS_DLL_NOT_FOUND)" to "Cannot initialize CUDA without ATen_cuda library" on May 3, 2020

@johannesvollmer
Author

Is there a way to investigate the build process? Can we inspect the downloaded assets?

@LaurentMazare
Owner

I think cargo has a verbose mode. But if you're using a local installation of libtorch I wouldn't expect 'assets' to be downloaded besides the crates that this crate depends on.
Also worth trying compiling the basics example rather than something more complicated (if that's not already what you're doing).
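
Besides verbose mode, the build script's output directory can be inspected directly. Cargo places each build script's OUT_DIR under `target/<profile>/build/<package>-<hash>/out`. The sketch below lists those directories for a given package prefix. That the downloaded libtorch ends up inside the torch-sys OUT_DIR is an assumption here, not something confirmed in this thread, and the function name is hypothetical.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Lists the existing `<target>/<profile>/build/<prefix>*/out` directories, sorted.
/// Returns an error if the build directory itself cannot be read.
fn build_out_dirs(target: &Path, profile: &str, prefix: &str) -> io::Result<Vec<PathBuf>> {
    let build_dir = target.join(profile).join("build");
    let mut found = Vec::new();
    for entry in fs::read_dir(&build_dir)? {
        let entry = entry?;
        let name = entry.file_name();
        let out = entry.path().join("out");
        // Only directories that have an `out` subdirectory belong to a build script run.
        if name.to_string_lossy().starts_with(prefix) && out.is_dir() {
            found.push(out);
        }
    }
    found.sort();
    Ok(found)
}
```

For example, `build_out_dirs(Path::new("target"), "debug", "torch-sys-")` would return the directories to look into after a debug build.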

@johannesvollmer
Author

I could not get it working. I'll have to try something entirely different due to a tight deadline. Thanks anyways for your suggestions!

@LaurentMazare
Owner

Closing as no update in the last 6 months.
