Error 999 on bumblebee with wrong libGL activated #50

gdkrmr · 2017-07-04T13:19:24Z

Trying to build CUDAdrv I get the following error. Could this be because I am running on a laptop with bumblebee and two graphics cards? I used optirun julia and did Pkg.checkout("CUDAdrv"); bumblebee is working for other programs:

               _
   _       _ _(_)_     |  A fresh approach to technical computing
  (_)     | (_) (_)    |  Documentation: https://docs.julialang.org
   _ _   _| |_  __ _   |  Type "?help" for help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 0.6.1-pre.0 (2017-06-19 13:06 UTC)
 _/ |\__'_|_|_|\__'_|  |  Commit dcf39a1* (15 days old release-0.6)
|__/                   |  x86_64-linux-gnu

julia> Pkg.build("CUDAdrv")
INFO: Building CUDAdrv
=========================================[ ERROR: CUDAdrv ]==========================================

LoadError: Initializing CUDA driver failed: unknown error (code 999).
while loading /home/gkraemer/.julia/v0.6/CUDAdrv/deps/build.jl, in expression starting on line 119

=====================================================================================================

==========================================[ BUILD ERRORS ]===========================================

WARNING: CUDAdrv had build errors.

 - packages with build errors remain installed in /home/gkraemer/.julia/v0.6
 - build the package(s) and all dependencies with `Pkg.build("CUDAdrv")`
 - build a single package by running its `deps/build.jl` script

=====================================================================================================


maleadt · 2017-07-04T13:30:10Z

> could this be, because I am running on a Laptop with bumblebee and two graphics cards?

No idea, you'll need to provide more information for any diagnosis.

Did you see the suggestions in the documentation?

> unknown error (code 999): this often indicates that your set-up is broken, eg. because you didn't load the correct, or any, kernel module. Please verify your set-up, on Linux by executing nvidia-smi or on other platforms by compiling and running CUDA C code using nvcc.
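
The kernel-module check that the documentation suggests can be sketched in shell. This is a minimal sketch and not part of CUDAdrv: the helper name module_loaded and its optional second argument (a path to a /proc/modules-style listing, so the check can also be run against a saved copy) are assumptions made for illustration.

```shell
# Hypothetical helper: succeed if the named kernel module appears in a
# /proc/modules-style listing (defaults to the live /proc/modules).
module_loaded() {
    name=$1
    listing=${2:-/proc/modules}
    # Exit status 2 means the listing itself could not be read.
    [ -r "$listing" ] || return 2
    # Every line of /proc/modules begins with the module name and a space.
    grep -q "^${name} " "$listing"
}

# Report on the two NVIDIA modules mentioned in this thread.
for m in nvidia nvidia_uvm; do
    if module_loaded "$m"; then
        echo "$m: loaded"
    else
        echo "$m: not loaded (or /proc/modules unreadable)"
    fi
done
```

With bumblebee in the picture, it is probably worth running the check both with and without optirun and comparing the results.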

gdkrmr · 2017-07-05T07:36:53Z

Digging a little bit deeper, I found that tensorflow is not working either, probably because the provided binaries are for CUDA 7.5 and I am on 8.0 (I haven't rebuilt tensorflow yet to check this). Could that be an issue? Which nvidia driver versions do you support? I currently have 375 installed.

maleadt · 2017-07-05T07:40:31Z

> What nvidia driver versions do you support, I currently have 375 installed?

Shouldn't be a problem, I'm currently working with (and hence support):

1. 375.39 (installed)
2. 375.66: long-lived (installed)
3. 378.13 (installed)
4. 381.09 (installed)
5. 381.22: short-lived (installed)
6. 384.47: beta (installed)

but in general, our support depends on the CUDA API level exported by the driver library, which is currently 8.0. So unless you have access to the CUDA 9 beta, driver support shouldn't be an issue.
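
When asking about driver support, it helps to report the exact driver version rather than just the series. On Linux one option is to read it from /proc/driver/nvidia/version. The following is a minimal sketch, assuming the "NVRM version: ... Kernel Module  <version>  <date>" line format used by drivers of this generation; the helper name driver_version and its optional file argument are hypothetical.

```shell
# Hypothetical helper: print the NVIDIA kernel driver version, or "unknown"
# if the version file is not readable (e.g. the module is not loaded).
driver_version() {
    file=${1:-/proc/driver/nvidia/version}
    if [ ! -r "$file" ]; then
        echo "unknown"
        return 1
    fi
    # Extract the first dotted number that follows "Kernel Module".
    sed -n 's/^NVRM version:.*Kernel Module  *\([0-9][0-9.]*\).*/\1/p' "$file"
}

driver_version || true
```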

Again, can you execute nvidia-smi and compile & execute regular CUDA C code using nvcc?

gdkrmr · 2017-07-05T09:03:03Z

I could compile all the samples that came with cuda (the folder /usr/share/cuda-8.0/samples) but they wouldn't run.
Then I changed

update-alternatives --config x86_64-linux-gnu_gl_conf

to auto (which is using the nvidia driver) and suddenly I can build CUDAdrv and CUDArt. I hope I did not break anything else. I will let you know if I run into more trouble, thanks for the help!
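
Since the fix came from switching an alternative, it can help to record which one is active before and after the change. update-alternatives --query prints the state of a link group; the sketch below pulls out its Status and Value fields. The helper name gl_conf_state is hypothetical.

```shell
# Hypothetical helper: read `update-alternatives --query` output on stdin
# and print the selection mode and the currently selected alternative.
gl_conf_state() {
    awk -F': ' '
        # Only the header block (before the first blank line) describes the link group.
        /^$/        { exit }
        /^Status: / { status = $2 }
        /^Value: /  { value = $2 }
        END         { printf "%s %s\n", status, value }
    '
}

# Usage (on a Debian/Ubuntu system):
#   update-alternatives --query x86_64-linux-gnu_gl_conf | gl_conf_state
```

Attaching this one line of output to a bug report makes it unambiguous whether the mesa or the nvidia configuration was selected when the error occurred.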

maleadt · 2017-07-05T11:25:29Z

That is strange, using NVIDIA's libgl shouldn't impact raw usage of libcuda.so.
Again, did you (or could you) run nvidia-smi? Assuming it failed, did you see any error in dmesg?
Maybe you have multiple libcuda.so files in your system, and that update-alternatives call cascaded into changing the active symlink to a different one of those?
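
One way to check for multiple libcuda.so files is to list every candidate in the likely library directories and see what each one resolves to. A minimal sketch, assuming GNU find and readlink; the helper name list_libcuda is hypothetical, and the directories passed to it are examples that should be adjusted for the system at hand.

```shell
# Hypothetical helper: for each directory given, print every libcuda.so*
# entry alongside the real file it resolves to.
list_libcuda() {
    for dir in "$@"; do
        [ -d "$dir" ] || continue
        find "$dir" -maxdepth 1 -name 'libcuda.so*' | sort | while read -r f; do
            printf '%s -> %s\n' "$f" "$(readlink -f "$f")"
        done
    done
}

# Example directories; missing ones are skipped silently.
list_libcuda /usr/lib/x86_64-linux-gnu /usr/lib/nvidia-375 /usr/local/cuda-8.0/lib64
```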

gdkrmr · 2017-07-05T12:49:15Z

I could run nvidia-smi before and after (with optirun). I had cuda 7.5 before, but it got removed from the system when I installed cuda 8.0. Graphical applications also worked before with optirun. Compiling cuda applications worked before as well; they just wouldn't run, complaining about not finding a device.

maleadt · 2017-07-05T20:23:40Z

That is all very confusing... can't really use the info to improve the build system. Glad it's working now though!
Next time you run into the issue, would you mind gathering as much information as possible? eg. running Pkg.build with DEBUG=1 (if that option still exists by then, as it is bound to change, just check the documentation at that point), run a compiled CUDA application through strace and ldd to see exactly what it picks up, etc. Thanks!
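
For the strace part, the interesting information is which libcuda candidate is actually opened, as opposed to the failed lookups along the search path. A minimal sketch of a filter for that, assuming strace's usual output in which a failed call is reported as "= -1"; the helper name opened_libcuda is hypothetical.

```shell
# Hypothetical helper: read strace output on stdin and print the libcuda
# paths that were opened successfully (failed attempts return -1).
opened_libcuda() {
    grep 'libcuda' | grep -v ' = -1 ' | sed -n 's/^[^"]*"\([^"]*\)".*/\1/p'
}

# Usage, tracing only the file-open calls of a compiled CUDA program ./test:
#   strace -f -e trace=open,openat ./test 2>&1 | opened_libcuda
```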

gdkrmr · 2017-07-06T09:13:56Z

julia> ENV["DEBUG"] = "1"
julia> Pkg.build("CUDAdrv")
INFO: Building CUDAdrv
DEBUG: Found libcuda at /usr/lib/x86_64-linux-gnu/libcuda.so
DEBUG: Vendor: NVIDIA
===============================[ ERROR: CUDAdrv ]===============================

LoadError: CUDA error 999 calling cuInit
while loading /home/gkraemer/.julia/v0.5/CUDAdrv/deps/build.jl, in expression starting on line 107

================================================================================

================================[ BUILD ERRORS ]================================

WARNING: CUDAdrv had build errors.

 - packages with build errors remain installed in /home/gkraemer/.julia/v0.5
 - build the package(s) and all dependencies with `Pkg.build("CUDAdrv")`
 - build a single package by running its `deps/build.jl` script

================================================================================

gdkrmr · 2017-07-06T09:14:36Z

the good part is that I can control the error now :D

maleadt · 2017-07-06T09:26:39Z

OK, we can simplify all that to the following now:

$ julia -e 'ccall((:cuInit, "/usr/lib/x86_64-linux-gnu/libcuda.so"), Cint, (Cint,), 0)'

This still produces 999, right?

What does ldd /usr/lib/x86_64-linux-gnu/libcuda.so produce?

If you compile the following file, test.cu:

#include <cuda_runtime.h>
int main() { cudaFree(0); return 0; }

with:

nvcc test.cu -o test

what libraries does it open:

strace ./test |& grep libcuda

Of course, add optirun wherever necessary, I'm not familiar with bumblebee.

gdkrmr · 2017-07-06T09:46:33Z

all of this is with

update-alternatives --config x86_64-linux-gnu_gl_conf

set to the mesa driver

> OK, we can simplify all that to the following now:
>
> $ julia -e 'ccall((:cuInit, "/usr/lib/x86_64-linux-gnu/libcuda.so"), Cint, (Cint,), 0)'
>
> This still produces 999, right?

no, does not produce any error message

> What does ldd /usr/lib/x86_64-linux-gnu/libcuda.so produce?

$ ldd /usr/lib/x86_64-linux-gnu/libcuda.so
	linux-vdso.so.1 =>  (0x00007ffd39bd5000)
	libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007fb49289b000)
	libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fb4924d0000)
	libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007fb4922cc000)
	libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007fb4920af000)
	librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007fb491ea6000)
	libnvidia-fatbinaryloader.so.375.66 => /usr/lib/nvidia-375/libnvidia-fatbinaryloader.so.375.66 (0x00007fb491c5a000)
	/lib64/ld-linux-x86-64.so.2 (0x00005560e05de000)

> If you compile the following file, test.cu:
>
> #include <cuda_runtime.h>
> int main() { cudaFree(0); return 0; }
>
> with:
>
> nvcc test.cu -o test

$ nvcc test.cu -o test
nvcc warning : The 'compute_20', 'sm_20', and 'sm_21' architectures are deprecated, and may be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).

> what libraries does it open:
>
> strace ./test |& grep libcuda

$ strace ./test |&grep libcuda
open("/home/gkraemer/progs/deeplearning/torch/install/lib/libcuda.so.1", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
open("/usr/lib/x86_64-linux-gnu/mesa/libcuda.so.1", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
open("/usr/local/lib/libcuda.so.1", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
open("/usr/lib/nvidia-375/tls/libcuda.so.1", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
open("/usr/lib/nvidia-375/libcuda.so.1", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
open("/usr/local/cuda-8.0/lib64/libcuda.so.1", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
open("/usr/lib/x86_64-linux-gnu/libcuda.so.1", O_RDONLY|O_CLOEXEC) = 3

$ optirun strace ./test |&grep libcuda
open("/usr/lib/x86_64-linux-gnu/primus/libcuda.so.1", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
open("/usr/lib/nvidia-375/tls/libcuda.so.1", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
open("/usr/lib/nvidia-375/libcuda.so.1", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
open("/usr/lib32/nvidia-375/tls/libcuda.so.1", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
open("/usr/lib32/nvidia-375/libcuda.so.1", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
open("/home/gkraemer/progs/deeplearning/torch/install/lib/libcuda.so.1", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
open("/usr/lib/x86_64-linux-gnu/mesa/libcuda.so.1", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
open("/usr/local/lib/libcuda.so.1", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
open("/usr/local/cuda-8.0/lib64/libcuda.so.1", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
open("/usr/lib/x86_64-linux-gnu/libcuda.so.1", O_RDONLY|O_CLOEXEC) = 3

> Of course, add optirun wherever necessary, I'm not familiar with bumblebee.

now setting

update-alternatives --config x86_64-linux-gnu_gl_conf

to auto (the nvidia driver)

optirun julia -e 'Pkg.build("CUDAdrv")'

works fine

julia -e 'Pkg.build("CUDAdrv")'

gives the same error with code 999 as before
test.cu compiles fine, same as above

the strace outputs are a little bit different:

$ strace ./test |&grep libcuda
open("/home/gkraemer/progs/deeplearning/torch/install/lib/libcuda.so.1", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
open("/usr/lib/x86_64-linux-gnu/mesa/libcuda.so.1", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
open("/usr/local/lib/libcuda.so.1", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
open("/usr/lib/nvidia-375/tls/libcuda.so.1", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
open("/usr/lib/nvidia-375/libcuda.so.1", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
open("/usr/local/cuda-8.0/lib64/libcuda.so.1", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
open("/usr/lib/x86_64-linux-gnu/libcuda.so.1", O_RDONLY|O_CLOEXEC) = 3
$ optirun strace ./test |&grep libcuda
open("/usr/lib/x86_64-linux-gnu/primus/libcuda.so.1", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
open("/usr/lib/nvidia-375/tls/libcuda.so.1", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
open("/usr/lib/nvidia-375/libcuda.so.1", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
open("/usr/lib32/nvidia-375/tls/libcuda.so.1", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
open("/usr/lib32/nvidia-375/libcuda.so.1", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
open("/home/gkraemer/progs/deeplearning/torch/install/lib/libcuda.so.1", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
open("/usr/lib/x86_64-linux-gnu/mesa/libcuda.so.1", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
open("/usr/local/lib/libcuda.so.1", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
open("/usr/local/cuda-8.0/lib64/libcuda.so.1", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
open("/usr/lib/x86_64-linux-gnu/libcuda.so.1", O_RDONLY|O_CLOEXEC) = 3

gdkrmr · 2017-07-06T09:48:07Z

and:

$ optirun julia -e 'ccall((:cuInit, "/usr/lib/x86_64-linux-gnu/libcuda.so"), Cint, (Cint,), 0)'
$ julia -e 'ccall((:cuInit, "/usr/lib/x86_64-linux-gnu/libcuda.so"), Cint, (Cint,), 0)'

simply return without error messages

maleadt · 2017-07-06T09:56:10Z

Right, add a @show or something similar to display the returned value.

gdkrmr · 2017-07-06T09:58:32Z

With update-alternatives --config x86_64-linux-gnu_gl_conf set to auto (the nvidia driver):

ccall((:cuInit,"/usr/lib/x86_64-linux-gnu/libcuda.so"),Cint,(Cint,),0) = 999
gkraemer@laaja:~$ optirun julia -e '@show ccall((:cuInit, "/usr/lib/x86_64-linux-gnu/libcuda.so"), Cint, (Cint,), 0)'
ccall((:cuInit,"/usr/lib/x86_64-linux-gnu/libcuda.so"),Cint,(Cint,),0) = 0

set to mesa:

gkraemer@laaja:~$ optirun julia -e '@show ccall((:cuInit, "/usr/lib/x86_64-linux-gnu/libcuda.so"), Cint, (Cint,), 0)'
ccall((:cuInit,"/usr/lib/x86_64-linux-gnu/libcuda.so"),Cint,(Cint,),0) = 999

maleadt · 2017-07-06T10:55:14Z

Thanks for the details.

/usr/lib/x86_64-linux-gnu/libcuda.so isn't a symlink, is it?

What I'm gathering from this, and some posts on the internet, is that optirun enables/disables your NVIDIA GPU, but shouldn't impact CUDA in any other way. That would explain the error 999, but shouldn't be impacted by the libGL change. Also, nvidia-smi does run without optirun, and you mention other CUDA applications, when run without optirun, erroring out with no device...

Debugging this remotely is going to be annoying. I might try to replicate your set-up; I take it you're running Ubuntu? Which versions? Any peculiarities, on eg. the bumblebee set-up?

gdkrmr · 2017-07-06T12:22:53Z

it is a symlink, it links to libcuda.so.1 which links to libcuda.so.375.66, all are in /usr/lib/x86_64-linux-gnu/
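
A chain like the one described can be printed hop by hop, which makes it easy to compare before and after an update-alternatives change. A minimal sketch; the helper name follow_links is hypothetical, and the 40-hop limit is an arbitrary guard against symlink loops.

```shell
# Hypothetical helper: print each hop of a symlink chain, one path per line.
follow_links() {
    path=$1
    hops=0
    echo "$path"
    # Stop after 40 hops so a symlink loop cannot hang the script.
    while [ -L "$path" ] && [ "$hops" -lt 40 ]; do
        target=$(readlink "$path")
        case $target in
            /*) path=$target ;;                      # absolute target
            *)  path=$(dirname "$path")/$target ;;   # relative to the directory of the link
        esac
        echo "$path"
        hops=$((hops + 1))
    done
}

follow_links /usr/lib/x86_64-linux-gnu/libcuda.so
```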

gdkrmr · 2017-07-06T12:44:10Z

Getting bumblebee to run is quite annoying, I will try to give you the steps for it as well as I remember, I am sure that there are some details missing.

1. Start from a fresh Ubuntu 16.04 install. You should use a desktop that runs without 3d acceleration, e.g. Xfce or Lxde, because you might lose 3d acceleration from the Intel GPU temporarily.
2. Install CUDA from the official repositories (http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64); this should also give you a fairly new version of the nvidia driver.
3. Install bumblebee from the testing repository (https://launchpad.net/~bumblebee/+archive/ubuntu/testing).
4. Install the following packages with apt: bumblebee bumblebee-nvidia primus linux-headers-generic
5. I am not sure if the following is still necessary:
   - Follow steps 5 and 6 from here for blacklisting drivers, changing the version of the driver accordingly: https://lenovolinux.blogspot.com.es/2016/05/bumblebee-on-lenovo-t440p-nvidia-gt.html
   - I do not remember if I had to follow the other steps from that tutorial.

3d accelerated programs should work on the GPU now if run with optirun.
In case you lost 3d acceleration from the Intel card you have to set $LD_LIBRARY_PATH to include the mesa drivers (see: Bumblebee-Project/Bumblebee#869).

you probably want to include /usr/local/cuda-8.0/bin and /usr/lib/nvidia-375/bin into your $PATH.

gdkrmr · 2017-07-07T20:42:52Z

see here:
Bumblebee-Project/Bumblebee#888 (comment)
and here:
https://bugs.launchpad.net/ubuntu/+source/nvidia-graphics-drivers-375/+bug/1703014

dfdx · 2017-08-22T22:59:26Z

Just fixed the same error by upgrading the driver from version 375 to the latest 384.

For reference, here's my setup:

GeForce GTX 960M
Ubuntu 16.04
NVidia driver 384 installed/upgraded using built-in "Additional drivers" application
CUDA 8
no Bumblebee / Optirun

Before upgrading (driver version 375), running:

$ julia -e '@show ccall((:cuInit, "/usr/lib/x86_64-linux-gnu/libcuda.so"), Cint, (Cint,), 0)'

failed with error 999. CUDA samples compiled, but failed at run-time with error 30, even though nvidia-smi worked fine.

After upgrading (driver version 384) all examples work fine.

maleadt · 2017-08-23T12:33:07Z

> but at run-time failed with error 30

This seems different from the original report here?
Not sure I can do much about that, looks more like a defunct toolkit installation.

I've tested on 375.39 and 375.66, and CUDAdrv works fine in both cases. But then again, the toolkit already worked properly before that... One thing you might want to consider, is to run any of the samples as root. Sometimes, parts of nvidia-uvm aren't initialized properly yet, but only root can do so.

dfdx · 2017-08-23T13:30:57Z

> This seems different from the original report here?

@gdkrmr didn't report the error code from CUDA samples, so I'm not sure about this one. For cuInit, however, the error was 999, i.e. the same as reported here.

> Not sure I can do much about that, looks more like a defunct toolkit installation.

Oh, I didn't mean you should, sorry if I made you think so! At least for me, CUDA on Ubuntu fails every now and then by itself, without any relation to CUDAdrv. I posted the report just for other people who may encounter the same error and have already tried other solutions (like using update-alternatives or running as root, both of which didn't work for me).

gdkrmr · 2017-08-24T08:40:02Z

just tried with nvidia driver 384 and still the same issue.

dfdx · 2017-08-25T06:57:39Z

And one more update. It turns out what really caused the error for me was going to sleep mode, and what fixed it was rebooting.

maleadt · 2018-01-11T15:59:51Z

We just encountered another case of this (or a similar) issue, resolved by loading the nvidia_uvm kernel module.

maleadt closed this as completed Jul 5, 2017

maleadt reopened this Jul 6, 2017

maleadt changed the title ~~error 999 when building~~ Error 999 on bumblebee with wrong libGL activated Jul 6, 2017

maleadt closed this as completed Jul 6, 2017

maleadt reopened this Jul 6, 2017

gdkrmr mentioned this issue Jul 7, 2017

update-alternatives Bumblebee-Project/Bumblebee#888

Open

peastman mentioned this issue Mar 27, 2018

CUDA platform error: Error loading CUDA module: CUDA_ERROR_UNKNOWN (999) openmm/openmm#1962

Closed

maleadt closed this as completed Dec 23, 2019

SPAstef mentioned this issue Apr 1, 2020

New NVIDIA driver guide leads to CUDA error clearlinux/distribution#1869

Open
