This repository has been archived by the owner on Jan 3, 2023. It is now read-only.

Enable gpu error #427

Closed
yangyang-zhang opened this issue Dec 6, 2017 · 12 comments

Comments

@yangyang-zhang

I installed CUDA and set up the environment:

nvidia-smi 
Wed Dec  6 13:53:24 2017       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 384.81                 Driver Version: 384.81                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla P100-PCIE...  Off  | 00000000:2F:00.0 Off |                    0 |
| N/A   33C    P0    31W / 250W |  15553MiB / 16276MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla P100-PCIE...  Off  | 00000000:86:00.0 Off |                    0 |
| N/A   36C    P0    31W / 250W |  15479MiB / 16276MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

pip install nervananeon

env | grep PATH
LD_LIBRARY_PATH=/usr/local/cuda/lib64:/usr/local/cuda/lib:/usr/lib/python2.7/site-packages/mklml_lnx_2018.0.1.20171007/lib:/usr/local/nvidia/lib:/usr/local/nvidia/lib64:/.singularity.d/libs
PATH=/usr/local/nvidia/bin:/usr/local/cuda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin

python  mnist.py -b gpu
python : error: argument -b/--backend: invalid choice: 'gpu' (choose from 'cpu', 'mkl')

I don't understand why the CPU and GPU versions are installed from the same package. I expected a separate GPU package, the way TensorFlow provides one:

pip install tensorflow-gpu
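
The error text above is the standard message that Python's argparse prints when a value is not in an option's `choices` list, which suggests that this install only registered the `cpu` and `mkl` backends. A minimal sketch of that mechanism follows. The `build_parser` helper and the `gpu_available` flag are hypothetical names used for illustration; neon's real parser may be built differently.

```python
import argparse


def build_parser(gpu_available):
    """Hypothetical parser: 'gpu' is a valid choice only when a GPU backend was detected."""
    choices = ["cpu", "mkl"]
    if gpu_available:
        choices.append("gpu")
    parser = argparse.ArgumentParser(prog="mnist.py")
    parser.add_argument("-b", "--backend", choices=choices, default="cpu")
    return parser


def backend_is_accepted(backend, gpu_available):
    """Return True if the parser accepts the backend name, False if argparse rejects it."""
    try:
        build_parser(gpu_available).parse_args(["-b", backend])
        return True
    except SystemExit:
        # argparse prints "invalid choice: ..." to stderr and exits with status 2
        return False
```

Under this assumption, the message only changes once GPU support is detected when the package is built or installed, not when extra packages are added afterwards.
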
@baojun-nervana
Contributor

@yangyang-zhang Thanks for the feedback. We will look into this.

@baojun-nervana
Contributor

@yangyang-zhang To run on the GPU backend, you need to install the GPU requirements:

https://github.com/NervanaSystems/neon/blob/master/gpu_requirements.txt

pip install pycuda scikit-cuda pytools
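
A quick way to confirm that these three packages are visible to the interpreter is to look up their import names. This is a small sketch, not part of neon; `missing_gpu_packages` is a hypothetical helper. It only shows that the modules can be found, not that neon itself was built with GPU support.

```python
import importlib.util

# pip name -> import name. Note that scikit-cuda is imported as "skcuda".
GPU_MODULES = {"pycuda": "pycuda", "scikit-cuda": "skcuda", "pytools": "pytools"}


def missing_gpu_packages(modules=GPU_MODULES):
    """Return the pip names of packages whose top-level modules cannot be found."""
    return sorted(
        pip_name
        for pip_name, module_name in modules.items()
        if importlib.util.find_spec(module_name) is None
    )


if __name__ == "__main__":
    missing = missing_gpu_packages()
    print("missing:", ", ".join(missing) if missing else "none")
```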

@armando-fandango

@baojun-nervana I am getting the same error even after installing all pre-req:

: error: argument -b/--backend: invalid choice: 'gpu' (choose from 'cpu', 'mkl')

@baojun-nervana
Contributor

@armando-fandango Can you try to install the gpu dependencies?

pip install pycuda scikit-cuda pytools

@armando-fandango

@baojun-nervana Yes, all the GPU dependencies are installed. As my message above said, the error occurs "even after installing all pre-req".

@armando-fandango

I see that the Makefile does this: nvcc neon/backends/util/check_gpu.c > /dev/null 2>&1 && ./a.out && rm a.out && echo true

This always yields an empty HAS_GPU string, no matter what. I compiled check_gpu.c outside of the Makefile and it compiles.

Is something wrong with the code above?
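
Because the Makefile line sends the compiler output to /dev/null, any failure in the chain is silent and simply leaves HAS_GPU empty. The sketch below runs the same two steps (compile with nvcc, then run the resulting binary) but keeps the error text so the failing step can be seen. The `check_gpu` function and its return format are hypothetical, and it assumes, as the Makefile's `&&` chain does, that the probe exits with status 0 on success.

```python
import os
import shutil
import subprocess
import tempfile


def check_gpu(source="neon/backends/util/check_gpu.c", nvcc="nvcc"):
    """Compile and run the GPU probe, returning (ok, detail) instead of hiding errors."""
    nvcc_path = shutil.which(nvcc)
    if nvcc_path is None:
        return False, "%s not found on PATH" % nvcc
    with tempfile.TemporaryDirectory() as tmp:
        binary = os.path.join(tmp, "check_gpu")
        # Same compile step as the Makefile, but stderr is captured, not discarded.
        build = subprocess.run([nvcc_path, source, "-o", binary],
                               capture_output=True, text=True)
        if build.returncode != 0:
            return False, "compile failed: " + build.stderr.strip()
        run = subprocess.run([binary], capture_output=True, text=True)
        if run.returncode != 0:
            return False, "probe exited with status %d" % run.returncode
        return True, "GPU probe succeeded"
```

Running it from the repository root in the same shell (and as the same user) that runs `make` makes it easier to spot differences such as a PATH that lacks nvcc under sudo.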

@baojun-nervana
Copy link
Contributor

It seems the nvcc compiler is either not installed or not on your PATH.

First make sure that "which nvcc" returns the right path.

@armando-fandango

armando-fandango commented Dec 20, 2017

This command works and produces a.out: nvcc neon/backends/util/check_gpu.c

I am reinstalling the whole nvidia driver and CUDA library to see if there is some problem with that.

@baojun-nervana
Contributor

It sounds like it should work. Is the PATH set up correctly too?

export PATH="/usr/local/cuda/bin:"$PATH
export LD_LIBRARY_PATH="/usr/local/cuda/lib64:/usr/local/cuda/lib:/usr/local/lib:"$LD_LIBRARY_PATH
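
To verify that the two exports took effect in the current shell, the variables can be inspected directly. This is a minimal sketch; `missing_entries` is a hypothetical helper, and the directories checked are the ones from the exports above.

```python
import os


def missing_entries(value, expected):
    """Return the expected directories that are absent from an os.pathsep-separated value."""
    present = {entry.rstrip("/") for entry in value.split(os.pathsep) if entry}
    return [d for d in expected if d.rstrip("/") not in present]


if __name__ == "__main__":
    print("PATH missing:",
          missing_entries(os.environ.get("PATH", ""), ["/usr/local/cuda/bin"]))
    print("LD_LIBRARY_PATH missing:",
          missing_entries(os.environ.get("LD_LIBRARY_PATH", ""),
                          ["/usr/local/cuda/lib64", "/usr/local/cuda/lib"]))
```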

@armando-fandango

After reinstalling CUDA and Neon, now I am getting this error:

python3 examples/mnist_mlp.py -b gpu
Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/pycuda/tools.py", line 426, in context_dependent_memoize
    return ctx_dict[cur_ctx][args]
KeyError: <pycuda._driver.Context object at 0x7f7aa4360500>

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "examples/mnist_mlp.py", line 88, in <module>
    num_epochs=args.epochs, cost=cost, callbacks=callbacks)
  File "/usr/local/lib/python3.5/dist-packages/nervananeon-2.4.0-py3.5.egg/neon/models/model.py", line 183, in fit
    self._epoch_fit(dataset, callbacks)
  File "/usr/local/lib/python3.5/dist-packages/nervananeon-2.4.0-py3.5.egg/neon/models/model.py", line 205, in _epoch_fit
    x = self.fprop(x)
  File "/usr/local/lib/python3.5/dist-packages/nervananeon-2.4.0-py3.5.egg/neon/models/model.py", line 236, in fprop
    res = self.layers.fprop(x, inference)
  File "/usr/local/lib/python3.5/dist-packages/nervananeon-2.4.0-py3.5.egg/neon/layers/container.py", line 395, in fprop
    x = l.fprop(x, inference=inference)
  File "/usr/local/lib/python3.5/dist-packages/nervananeon-2.4.0-py3.5.egg/neon/layers/layer.py", line 1288, in fprop
    bsum=self.batch_sum)
  File "/usr/local/lib/python3.5/dist-packages/nervananeon-2.4.0-py3.5.egg/neon/backends/nervanagpu.py", line 1585, in compound_dot
    kernel = kernel_specs.get_kernel("_".join((clss, op, size)), vec_opt)
  File "", line 2, in get_kernel
  File "/usr/local/lib/python3.5/dist-packages/pycuda/tools.py", line 430, in context_dependent_memoize
    result = func(*args)
  File "/usr/local/lib/python3.5/dist-packages/nervananeon-2.4.0-py3.5.egg/neon/backends/kernel_specs.py", line 849, in get_kernel
    run_command(maxas_i + [sass_file, cubin_file])
  File "/usr/local/lib/python3.5/dist-packages/nervananeon-2.4.0-py3.5.egg/neon/backends/kernel_specs.py", line 785, in run_command
    raise RuntimeError("Error(%d):\n%s\n%s" % (proc.returncode, cmd, err))
RuntimeError: Error(13):
PERL5LIB=/usr/local/lib/python3.5/dist-packages/nervananeon-2.4.0-py3.5.egg/neon/backends/kernels/maxas /usr/local/lib/python3.5/dist-packages/nervananeon-2.4.0-py3.5.egg/neon/backends/kernels/maxas/maxas.pl -i -w -k sgemm_nn_32x128_vec -Dvec 1 /usr/local/lib/python3.5/dist-packages/nervananeon-2.4.0-py3.5.egg/neon/backends/kernels/sass/sgemm_nn_32x128.sass /home/armando/.cache/neon/kernels/cubin/sgemm_nn_32x128_vec.cubin
b'Unescaped left brace in regex is deprecated, passed through in regex; marked by <-- HERE in m/ { <-- HERE (?5)?,?(?4)?,?(?3)?,?(?2)?,?(?1)?,?(?0)?}/ at /usr/local/lib/python3.5/dist-packages/nervananeon-2.4.0-py3.5.egg/neon/backends/kernels/maxas/MaxAs/MaxAsGrammar.pm line 239.\nUnescaped left brace in regex is deprecated, passed through in regex; marked by <-- HERE in m/^(?^:\@(?\!)?P(?[0-6]) )?DEPBAR(?^: { <-- HERE (?5)?,?(?4)?,?(?3)?,?(?2)?,?(?1)?,?(?0)?});/ at /usr/local/lib/python3.5/dist-packages/nervananeon-2.4.0-py3.5.egg/neon/backends/kernels/maxas/MaxAs/MaxAsGrammar.pm line 275.\nError: could not open /home/armando/.cache/neon/kernels/cubin/sgemm_nn_32x128_vec.cubin for writing: Permission denied at /usr/local/lib/python3.5/dist-packages/nervananeon-2.4.0-py3.5.egg/neon/backends/kernels/maxas/MaxAs/Cubin.pm line 640.\n'

@armando-fandango

After I ran this command:

sudo chown -R armando.armando ~/.cache

Now it works:
python3 examples/mnist_mlp.py -b gpu
Epoch 0 [Train |████████████████████| 469/469 batches, 0.24 cost, 6.15s]
Epoch 1 [Train |████████████████████| 469/469 batches, 0.20 cost, 1.03s]
Epoch 2 [Train |████████████████████| 469/469 batches, 0.17 cost, 1.06s]
Epoch 3 [Train |████████████████████| 468/468 batches, 0.14 cost, 1.22s]
Epoch 4 [Train |████████████████████| 468/468 batches, 0.13 cost, 1.14s]
Epoch 5 [Train |████████████████████| 468/468 batches, 0.11 cost, 1.07s]
Epoch 6 [Train |████████████████████| 468/468 batches, 0.10 cost, 1.57s]
Epoch 7 [Train |████████████████████| 468/468 batches, 0.09 cost, 1.14s]
Epoch 8 [Train |████████████████████| 468/468 batches, 0.08 cost, 1.18s]
Epoch 9 [Train |████████████████████| 468/468 batches, 0.07 cost, 1.06s]
2017-12-20 14:15:58,969 - neon - DISPLAY - Misclassification error = 2.5%
@yangyang-zhang let me know if I can help you :-)
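
The "Permission denied" error in the traceback came from writing a compiled kernel under ~/.cache/neon, and the chown fixed it, which is consistent with parts of the cache belonging to another user. A small sketch for spotting this before training starts is shown below. `unwritable_dirs` is a hypothetical helper, and the cache location is taken from the path in the error message.

```python
import os


def unwritable_dirs(root):
    """Return directories under root (root included) that the current user cannot write to."""
    blocked = []
    for dirpath, _dirnames, _filenames in os.walk(root):
        # Creating a file in a directory needs both write and search permission.
        if not os.access(dirpath, os.W_OK | os.X_OK):
            blocked.append(dirpath)
    return blocked


if __name__ == "__main__":
    cache = os.path.expanduser("~/.cache/neon")
    for path in unwritable_dirs(cache):
        print("not writable:", path)
```

Directories that cannot even be listed are skipped silently by `os.walk`, so an empty result is a hint rather than a guarantee.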

@yangyang-zhang
Author

Thank you, I've already solved it.

3 participants