ERROR (theano.sandbox.gpuarray): Could not initialize pygpu, support disabled #19

Closed
marcoippolito opened this issue Oct 10, 2014 · 27 comments


@marcoippolito

Hi,
On Ubuntu 14.04 with an NVIDIA GeForce GTX 770, I installed nvidia-cuda-toolkit (5.5.22-3ubuntu1) via sudo apt-get install: http://packages.ubuntu.com/trusty/devel/nvidia-cuda-toolkit
I then followed the step-by-step guide for libgpuarray: http://deeplearning.net/software/libgpuarray/installation.html#requirements

But when testing Theano with CUDA, the output is:

marco@marco-All-Series:~/Theano-Testing$ THEANO_FLAGS=device=cuda0 python check1.py
ERROR (theano.sandbox.gpuarray): Could not initialize pygpu, support disabled
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/theano/sandbox/gpuarray/__init__.py", line 44, in <module>
    init_dev(config.device)
  File "/usr/local/lib/python2.7/dist-packages/theano/sandbox/gpuarray/__init__.py", line 36, in init_dev
    context = pygpu.init(dev)
  File "gpuarray.pyx", line 575, in pygpu.gpuarray.init (pygpu/gpuarray.c:7317)
  File "gpuarray.pyx", line 546, in pygpu.gpuarray.pygpu_init (pygpu/gpuarray.c:7246)
  File "gpuarray.pyx", line 950, in pygpu.gpuarray.GpuContext.__cinit__ (pygpu/gpuarray.c:10820)
GpuArrayException: No CUDA devices avaiable
[Elemwise{exp,no_inplace}(<TensorType(float64, vector)>)]
Looping 1000 times took 3.85683894157 seconds
Result is [ 1.23178032 1.61879341 1.52278065 ..., 2.20771815 2.29967753
1.62323285]
Used the cpu

Any clues on how to solve the problem?

Looking forward to your kind hints.
Kind regards.
Marco

@abergeron
Member

So it seems that the CUDA support was built, but the library can't find any devices. What does nvidia-smi say?
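Theano's gpuarray backend fails at the `pygpu.init(dev)` call shown in the traceback above. As a sketch for narrowing this down outside Theano, the helper below (the name `probe_pygpu` is hypothetical, not part of Theano or pygpu) separates "pygpu cannot be imported" from "pygpu is importable but cannot open the device"; both situations appear in this thread. It assumes only that `pygpu.init` accepts a device string such as `"cuda0"`, as in the traceback:

```python
import importlib


def probe_pygpu(dev="cuda0"):
    """Return (ok, message) instead of raising.

    Distinguishes a missing pygpu module (ImportError) from a pygpu that
    imports fine but cannot initialize the requested device.
    """
    try:
        pygpu = importlib.import_module("pygpu")
    except ImportError as exc:
        return False, "pygpu could not be imported: %s" % exc
    try:
        # Same call that fails inside Theano's gpuarray __init__.py above.
        pygpu.init(dev)
    except Exception as exc:  # pygpu raises GpuArrayException here
        return False, "pygpu.init(%r) failed: %s" % (dev, exc)
    return True, "pygpu initialized %s" % dev


if __name__ == "__main__":
    ok, msg = probe_pygpu("cuda0")
    print(msg)
```

A failure in the second branch points at the driver/CUDA setup rather than at the pygpu installation itself.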

@marcoippolito
Author

Hi,
this is what nvidia-smi says:

marco@marco-All-Series:~/Theano-Testing$ nvidia-smi
Fri Oct 10 21:57:58 2014       
+------------------------------------------------------+                       
| NVIDIA-SMI 4.304...   Driver Version: 304.117        |                       
|-------------------------------+----------------------+----------------------+
| GPU  Name                     | Bus-Id        Disp.  | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap| Memory-Usage         | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 770          | 0000:01:00.0     N/A |                  N/A |
| 30%   34C  N/A     N/A /  N/A |  10%  210MB / 2047MB |     N/A      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Compute processes:                                               GPU Memory |
|  GPU       PID  Process name                                     Usage      |
|=============================================================================|
|    0            Not Supported                                               |
+-----------------------------------------------------------------------------+

Do you see anything strange? Or might Driver Version 304.117 conflict with Theano?

Looking forward to your hints.
Marco

@abergeron
Member

There are newer drivers, but that version should be fine for CUDA 5.5. Are you able to run examples from the toolkit (like deviceQuery)?

@marcoippolito
Author

Sorry for the question... where exactly can I find "deviceQuery"?

@abergeron
Member

It's one of the example programs that come with the toolkit.

@marcoippolito
Author

Thank you very much for your patience.
I'm having difficulty finding where the toolkit's directory was installed.
But it is indeed installed:
marco@marco-All-Series:~$ dpkg -l * | grep nvidia-cuda-toolkit
ii nvidia-cuda-toolkit 5.5.22-3ubuntu1 amd64 NVIDIA CUDA toolkit

Where is it usually installed?

@abergeron
Member

Normally the program is sample source code that you have to compile and then run. On my machine it is at /usr/local/cuda-6.0/samples/1_Utilities/deviceQuery

But since you are using the Ubuntu version of the package (rather than the official NVIDIA one), it might be somewhere else.
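Since the Ubuntu package may not ship the samples, a rough stand-in for the device-count part of deviceQuery can be sketched with Python's `ctypes` and the CUDA driver API (`cuInit`, `cuDeviceGetCount`). This is not the deviceQuery sample itself; it assumes the driver's `libcuda` is on the loader path, and `cuda_device_count` is a hypothetical helper name:

```python
import ctypes
import ctypes.util

CUDA_SUCCESS = 0            # CUresult values from the CUDA driver API
CUDA_ERROR_NO_DEVICE = 100


def cuda_device_count():
    """Return (count, status) from the CUDA driver API; count is None on failure."""
    name = ctypes.util.find_library("cuda") or "libcuda.so.1"
    try:
        libcuda = ctypes.CDLL(name)
    except OSError as exc:
        return None, "could not load %s: %s" % (name, exc)

    rc = libcuda.cuInit(0)  # CUresult cuInit(unsigned int Flags); Flags must be 0
    if rc == CUDA_ERROR_NO_DEVICE:
        return None, "cuInit: no CUDA-capable device is detected"
    if rc != CUDA_SUCCESS:
        return None, "cuInit failed with CUresult %d" % rc

    count = ctypes.c_int(0)
    # CUresult cuDeviceGetCount(int *count)
    rc = libcuda.cuDeviceGetCount(ctypes.byref(count))
    if rc != CUDA_SUCCESS:
        return None, "cuDeviceGetCount failed with CUresult %d" % rc
    return count.value, "ok"


if __name__ == "__main__":
    print(cuda_device_count())
```

Going through the driver API directly skips Theano, pygpu and the toolkit entirely, so a failure here points at the driver installation.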

@abergeron
Member

If you can't find deviceQuery, try running these commands:

$ THEANO_FLAGS=device=gpu0 python check1.py
$ THEANO_FLAGS=device=cuda0 python check1.py
$ nvidia-smi
$ THEANO_FLAGS=device=gpu0 python check1.py
$ THEANO_FLAGS=device=cuda0 python check1.py

and post the results.
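The sequence above can also be scripted so that all five outputs are captured in one run. A minimal sketch, assuming Python 3.5+ (for `subprocess.run`) and that `check1.py` is in the current directory; `run_with_flags` and `run_sequence` are hypothetical helper names:

```python
import os
import subprocess
import sys


def run_with_flags(script, flags, python=sys.executable):
    """Run `script` with THEANO_FLAGS=flags and return stdout+stderr as text."""
    env = dict(os.environ, THEANO_FLAGS=flags)
    proc = subprocess.run(
        [python, script],
        env=env,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr so log messages are captured too
        universal_newlines=True,
    )
    return proc.stdout


def run_nvidia_smi():
    """Return nvidia-smi output, or a note if it is not on PATH."""
    try:
        proc = subprocess.run(["nvidia-smi"], stdout=subprocess.PIPE,
                              stderr=subprocess.STDOUT, universal_newlines=True)
    except FileNotFoundError:
        return "nvidia-smi not found on PATH\n"
    return proc.stdout


def run_sequence(script="check1.py"):
    """Same order as requested: gpu0, cuda0, nvidia-smi, gpu0, cuda0."""
    for flags in ["device=gpu0", "device=cuda0", None, "device=gpu0", "device=cuda0"]:
        if flags is None:
            print("$ nvidia-smi")
            print(run_nvidia_smi())
        else:
            print("$ THEANO_FLAGS=%s python %s" % (flags, script))
            print(run_with_flags(script, flags))


if __name__ == "__main__":
    run_sequence()
```

Passing `python="python"` reproduces the shell commands exactly. Which interpreter is used matters here: the two tracebacks in this thread come from different installations (/usr/local/lib/python2.7 and ~/anaconda).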

@marcoippolito
Author

marco@marco-All-Series:~$ dpkg -L nvidia-cuda-toolkit

found it:
marco@marco-All-Series:/usr/lib/nvidia-cuda-toolkit$ sudo find -name "1_Utilities"
marco@marco-All-Series:/usr/lib/nvidia-cuda-toolkit$ ls -a
. .. bin include lib libdevice

but there is no "samples" directory... strange, isn't it?

I installed nvidia-cuda-toolkit via sudo apt-get install.

dpkg -L nvidia-cuda-toolkit
/.
/usr
/usr/lib
/usr/lib/nvidia-cuda-toolkit
/usr/lib/nvidia-cuda-toolkit/lib
/usr/lib/nvidia-cuda-toolkit/lib/gfec
/usr/lib/nvidia-cuda-toolkit/lib/inline
/usr/lib/nvidia-cuda-toolkit/lib/be
/usr/lib/nvidia-cuda-toolkit/bin
/usr/lib/nvidia-cuda-toolkit/bin/crt
/usr/lib/nvidia-cuda-toolkit/bin/crt/link.stub
/usr/lib/nvidia-cuda-toolkit/bin/crt/prelink.stub
/usr/lib/nvidia-cuda-toolkit/bin/nvcc
/usr/lib/nvidia-cuda-toolkit/bin/nvopencc
/usr/lib/nvidia-cuda-toolkit/bin/cicc
/usr/lib/nvidia-cuda-toolkit/libdevice
/usr/lib/nvidia-cuda-toolkit/libdevice/libdevice.compute_35.10.bc
/usr/lib/nvidia-cuda-toolkit/libdevice/libdevice.compute_30.10.bc
/usr/lib/nvidia-cuda-toolkit/libdevice/libdevice.compute_20.10.bc
/usr/bin
/usr/bin/nvdisasm
/usr/bin/nvcc
/usr/bin/cuda-memcheck
/usr/bin/fatbinary
/usr/bin/cuobjdump
/usr/bin/nvopencc
/usr/bin/fatbin
/usr/bin/ptxas
/usr/bin/cudafe
/usr/bin/cudafe++
/usr/bin/filehash
/usr/bin/bin2c
/usr/bin/nvlink
/usr/share
/usr/share/lintian
/usr/share/lintian/overrides
/usr/share/lintian/overrides/nvidia-cuda-toolkit
/usr/share/doc
/usr/share/doc/nvidia-cuda-toolkit
/usr/share/doc/nvidia-cuda-toolkit/README.Debian
/usr/share/doc/nvidia-cuda-toolkit/copyright
/usr/include
/usr/include/nvvm.h
/etc
/etc/nvcc.profile
/usr/lib/nvidia-cuda-toolkit/bin/nvcc.profile
/usr/share/doc/nvidia-cuda-toolkit/changelog.Debian.gz

@marcoippolito
Author

marco@marco-All-Series:~/Theano-Testing$ ls -a
. .. check1.py .theanorc
marco@marco-All-Series:~/Theano-Testing$ THEANO_FLAGS=device=gpu0 python check1.py
WARNING (theano.sandbox.cuda): CUDA is installed, but device gpu0 is not available (error: Unable to get the number of gpus available: no CUDA-capable device is detected)
[Elemwise{exp,no_inplace}(<TensorType(float32, vector)>)]
Looping 1000 times took 29.4918100834 seconds
Result is [ 1.23178029 1.61879337 1.52278066 ..., 2.20771813 2.29967761
1.62323284]
Used the cpu
marco@marco-All-Series:~/Theano-Testing$ THEANO_FLAGS=device=cuda0 python check1.py
ERROR (theano.sandbox.gpuarray): pygpu was configured but could not be imported
Traceback (most recent call last):
  File "/home/marco/anaconda/lib/python2.7/site-packages/theano/sandbox/gpuarray/__init__.py", line 16, in <module>
    import pygpu
ImportError: No module named pygpu
[Elemwise{exp,no_inplace}(<TensorType(float32, vector)>)]
Looping 1000 times took 30.0467281342 seconds
Result is [ 1.23178029 1.61879337 1.52278066 ..., 2.20771813 2.29967761
1.62323284]
Used the cpu

@abergeron
Member

Something is wrong with your CUDA installation. I would strongly recommend that you remove the Ubuntu packages (including the driver) and reinstall using the official NVIDIA packages.

If you don't want to do that, then I can't really help you, because I've never used any packages other than those.

@marcoippolito
Author

Two days ago I installed Cuda 6.5 Production Release from here: https://developer.nvidia.com/cuda-downloads
Theano's use of the GPU (NVIDIA GeForce GTX 770) worked fine (a speed-up from 3 s to 0.3 s), but the PC was hit by an unfortunately widespread bug (a login problem).
You can see details of the CUDA 6.5 / Ubuntu 14.04 bug here: https://bugs.launchpad.net/ubuntu/+source/lightdm/+bug/1312526
This is why I decided to reinstall Ubuntu 14.04 and install nvidia-cuda-toolkit via sudo apt-get install, which is the normal way to install "secure", "officially tested" packages in Ubuntu.

So I'm in a strange situation: if I install CUDA 6.5, I will probably be affected by the bug, which blocks my PC... but if I install the toolkit via Ubuntu, the GPU is not recognised...

What do you suggest?

@abergeron
Member

My last usual suspect is the device nodes. Try running $ ls -lh /dev/nvidia*.
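The same inspection can be done from Python, which also reports whether the current user can open each node for reading and writing. This is a sketch: it only looks at paths matching `/dev/nvidia*`, reports the major/minor device numbers (the listing below shows major 195 for these nodes), and `check_device_nodes` is a hypothetical name:

```python
import glob
import os
import stat


def check_device_nodes(pattern="/dev/nvidia*"):
    """Describe each path matching `pattern`: type, mode, device numbers, access."""
    report = []
    for path in sorted(glob.glob(pattern)):
        st = os.stat(path)
        report.append({
            "path": path,
            "is_char_device": stat.S_ISCHR(st.st_mode),
            "mode": stat.filemode(st.st_mode),  # same notation as ls, e.g. crw-rw-rw-
            "major_minor": (os.major(st.st_rdev), os.minor(st.st_rdev)),
            "rw_for_me": os.access(path, os.R_OK | os.W_OK),
        })
    return report


if __name__ == "__main__":
    for entry in check_device_nodes():
        print(entry)
```

An empty result, a node that is not a character device, or `rw_for_me` being False for the user running Theano would each be worth investigating.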

@marcoippolito
Author

marco@marco-All-Series:~$ ls -lh /dev/nvidia*
crw-rw-rw- 1 root root 195, 0 ott 10 21:53 /dev/nvidia0
crw-rw-rw- 1 root root 195, 255 ott 10 21:53 /dev/nvidiactl

Do you see anything wrong here?

@abergeron
Member

Well, everything seems fine. I really have no idea what exactly is broken, but something is, and it isn't in Theano or libgpuarray.

So I don't really consider this a bug.

@marcoippolito
Author

I really appreciate your kind help, thank you.
May I ask your name? (Only to say "thank you" personally.)

@marcoippolito
Author

Tomorrow morning (it's 11:30 p.m. here, and since I'm tired I might make mistakes) I will remove the Ubuntu packages, including the NVIDIA driver.
I will first reinstall them all via sudo apt-get (the normal, "secure" way of installing packages in Ubuntu), and then, if that doesn't work, I will remove them all again and install the driver from https://developer.nvidia.com/cuda-downloads. Hopefully the login bug won't block my PC again and force me to reinstall Ubuntu 14.04 once more.

May I contact you again if I have other problems?
Kind regards.
Marco

@marcoippolito
Author

Hi,
with the expert guidance of a knowledgeable friend, I solved the problem.
It was actually an incompatibility between the NVIDIA driver, installed from the official NVIDIA repository, and CUDA, installed from the official Ubuntu repository (apt-get).

I asked my friend to write a detailed blog post describing the problem and the solution, to spread knowledge about these issues as widely as possible.
Once it's ready, I will post the link here.

Kind regards.
Marco
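Incompatibilities like this one are easier to spot when the loaded driver version and the toolkit version are printed side by side. A rough sketch, assuming a Linux system where the NVIDIA kernel module exposes `/proc/driver/nvidia/version` and `nvcc` is on `PATH`; the regular expressions assume the usual wording of those two outputs and may need adjusting, and the helper names are hypothetical:

```python
import re
import subprocess


def kernel_module_version(text):
    """Driver version from /proc/driver/nvidia/version contents, or None."""
    m = re.search(r"Kernel Module\s+(\d+(?:\.\d+)+)", text)
    return m.group(1) if m else None


def nvcc_release(text):
    """CUDA release (e.g. '5.5') from `nvcc --version` output, or None."""
    m = re.search(r"release (\d+\.\d+)", text)
    return m.group(1) if m else None


def collect_versions():
    """Best-effort {'driver': ..., 'toolkit': ...}; None where unavailable."""
    versions = {"driver": None, "toolkit": None}
    try:
        with open("/proc/driver/nvidia/version") as fh:
            versions["driver"] = kernel_module_version(fh.read())
    except OSError:
        pass  # kernel module not loaded, or not an NVIDIA/Linux system
    try:
        out = subprocess.run(["nvcc", "--version"], stdout=subprocess.PIPE,
                             universal_newlines=True).stdout
        versions["toolkit"] = nvcc_release(out)
    except OSError:
        pass  # nvcc not on PATH
    return versions


if __name__ == "__main__":
    print(collect_versions())
```

This only reports versions; whether a given driver supports a given CUDA release still has to be checked against NVIDIA's release notes.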

@abergeron
Member

So it seems the problem is solved.

@antimora

Hi @marcoippolito,

Do you have the link to the blog post on how to resolve this issue? Do I need to use the official version of CUDA from NVIDIA rather than the one from apt-get?

Thanks,

DT

@antimora

Strange. I was able to solve this issue by running as root (sudo) and then changing the ownership of ~/.theano to my current user. After that, the issue was fixed.

This group message helped me: https://groups.google.com/d/msg/theano-users/xW9jmHzOwp0/8SvMA_R0EAUJ

Well... I was told that the problem is with CUDA 5.5: I have to execute CUDA code at least once as root, and after that it all works. I was also told that this issue is gone with CUDA 6.0 RC (but I have not tried that yet).
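The ownership fix described above (giving the Theano cache back to the normal user after a root run) can be sketched in Python as follows. `chown_tree` is a hypothetical helper; it uses `os.lchown`, so symlinks themselves are re-owned rather than their targets, and like `sudo chown -R` it needs root when the files currently belong to root. The `~/.theano` location is the one mentioned in this thread:

```python
import os
import pwd


def chown_tree(root, uid, gid):
    """Give `root` and everything below it to uid:gid; return entries changed."""
    os.lchown(root, uid, gid)
    changed = 1
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            # lchown does not follow symlinks, so link targets are left alone.
            os.lchown(os.path.join(dirpath, name), uid, gid)
            changed += 1
    return changed


if __name__ == "__main__":
    # Under sudo, os.getuid() is 0; sudo records the invoking user's ids
    # in SUDO_UID / SUDO_GID, so prefer those when they are set.
    uid = int(os.environ.get("SUDO_UID", os.getuid()))
    gid = int(os.environ.get("SUDO_GID", os.getgid()))
    # Resolve the home directory from the passwd database rather than $HOME.
    target = os.path.join(pwd.getpwuid(uid).pw_dir, ".theano")
    if os.path.isdir(target):
        print("re-owned %d entries under %s" % (chown_tree(target, uid, gid), target))
```

Looking the home directory up from the uid avoids depending on whatever `HOME` happens to be set to under sudo.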

@fvisin
Member

fvisin commented Feb 23, 2015

I had a similar experience. In my case the apt-get installation did not load the required kernel modules.

I was able to solve the problem by running sudo modprobe nvidia_331_updates nvidia-331-updates-uvm. The modules you need to load depend on your installation; one way to find them is to use TAB autocompletion, i.e., type sudo modprobe nvidia and press Tab twice to see the list of NVIDIA modules available on your system.
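To get the same information without relying on shell completion, a small sketch can list the NVIDIA modules installed for the running kernel and those currently loaded. It assumes the usual Linux layout (`/lib/modules/<kernel release>/...` and `/proc/modules`); the helper names are hypothetical:

```python
import os


def loaded_nvidia_modules(proc_modules_text):
    """Names of loaded kernel modules starting with 'nvidia' (from /proc/modules)."""
    names = []
    for line in proc_modules_text.splitlines():
        fields = line.split()
        if fields and fields[0].startswith("nvidia"):
            names.append(fields[0])
    return names


def available_nvidia_modules(release=None, root="/lib/modules"):
    """Module names (without .ko*) under <root>/<release> starting with 'nvidia'."""
    release = release or os.uname().release
    found = set()
    for _dirpath, _dirnames, filenames in os.walk(os.path.join(root, release)):
        for fname in filenames:
            if fname.startswith("nvidia") and ".ko" in fname:
                found.add(fname.split(".ko", 1)[0])
    return sorted(found)


if __name__ == "__main__":
    print("available:", available_nvidia_modules())
    with open("/proc/modules") as fh:
        print("loaded:", loaded_nvidia_modules(fh.read()))
```

modprobe treats `-` and `_` in module names as equivalent. Without `-a` it loads only the first name and passes the remaining words to that module as parameters, so loading several modules in one call needs `sudo modprobe -a <module> <module>`.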

@fccoelho

fccoelho commented Apr 9, 2015

I had the same problem, but I solved it by following @antimora's recommendation 👍
I ran:

$ sudo theano-test #it detected gpu
$ sudo chown -R <myusername> ~/.theano
$ theano-test # detected gpu!

thanks!

@nouiz
Member

nouiz commented Apr 9, 2015

@abergeron, what do you think of adding a check in Theano that the
compiledir is owned by the user, raising an error by default if it isn't?

I think it would help people understand the problem more quickly.

What about the problem that, when running "sudo theano-test", the compiledir
ends up in the user's home rather than root's home? Does either of you know
why this can happen?
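The check proposed above could look roughly like the following. This is a hypothetical sketch, not Theano code: the function and exception names are made up, and the compiledir path is passed in explicitly rather than read from Theano's configuration:

```python
import os


class CompiledirOwnershipError(RuntimeError):
    """Hypothetical error for a compiledir containing other users' files."""


def check_compiledir_owner(compiledir, uid=None):
    """Raise if any entry under `compiledir` is not owned by `uid`.

    A missing compiledir is treated as fine (there is nothing to check yet).
    """
    uid = os.getuid() if uid is None else uid
    if not os.path.lexists(compiledir):
        return True
    paths = [compiledir]
    for dirpath, dirnames, filenames in os.walk(compiledir):
        paths.extend(os.path.join(dirpath, n) for n in dirnames + filenames)
    # lstat so that a symlink is judged by its own owner, not its target's.
    foreign = [p for p in paths if os.lstat(p).st_uid != uid]
    if foreign:
        raise CompiledirOwnershipError(
            "%d entries under %s are not owned by uid %d (first: %s); "
            "they may have been created by a run under sudo"
            % (len(foreign), compiledir, uid, foreign[0])
        )
    return True
```

Reporting the first offending path gives the user something concrete to `chown`, which is the fix that worked for the commenters above.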


@abergeron
Member

Because sudo changes the uid but not the rest of the environment (especially $HOME).
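Running the sketch below under `sudo` would show the effect described above: `uid` becomes 0 while `HOME`, which `~` expansion uses, may still point at the invoking user's directory, so `~/.theano` resolves there. Whether sudo resets `HOME` depends on the sudoers configuration (e.g. `always_set_home`), so treat this as a diagnostic sketch; `home_sources` is a hypothetical name:

```python
import os
import pwd


def home_sources(env=None, uid=None):
    """Compare $HOME (used for ~ expansion) with the passwd home of `uid`."""
    env = os.environ if env is None else env
    uid = os.getuid() if uid is None else uid
    home_env = env.get("HOME")
    home_passwd = pwd.getpwuid(uid).pw_dir
    return {
        "uid": uid,
        "HOME": home_env,
        "passwd_home": home_passwd,
        "mismatch": home_env is not None
        and os.path.realpath(home_env) != os.path.realpath(home_passwd),
    }


if __name__ == "__main__":
    print(home_sources())
```

A `mismatch` of True with `uid` 0 is the situation in which root ends up writing into the user's `~/.theano`.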

@sjhddh

sjhddh commented Apr 17, 2016

Hey @abergeron @marcoippolito, have you found the final solution?

I think I ran into a similar problem.

I described the issue here: #4384

I hope to get some ideas and help.

Thank you

@nouiz
Member

nouiz commented Apr 18, 2016

It doesn't seem related to libgpuarray, so I replied in your original issue.


7 participants