Enable gpu error #427

yangyang-zhang · 2017-12-06T05:56:19Z

I installed CUDA and set the environment variables:

nvidia-smi 
Wed Dec  6 13:53:24 2017       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 384.81                 Driver Version: 384.81                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla P100-PCIE...  Off  | 00000000:2F:00.0 Off |                    0 |
| N/A   33C    P0    31W / 250W |  15553MiB / 16276MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla P100-PCIE...  Off  | 00000000:86:00.0 Off |                    0 |
| N/A   36C    P0    31W / 250W |  15479MiB / 16276MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

pip install nervananeon

env | grep PATH
LD_LIBRARY_PATH=/usr/local/cuda/lib64:/usr/local/cuda/lib:/usr/lib/python2.7/site-packages/mklml_lnx_2018.0.1.20171007/lib:/usr/local/nvidia/lib:/usr/local/nvidia/lib64:/.singularity.d/libs
PATH=/usr/local/nvidia/bin:/usr/local/cuda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin

python  mnist.py -b gpu
python : error: argument -b/--backend: invalid choice: 'gpu' (choose from 'cpu', 'mkl')

I don't understand why the CPU and GPU backends are installed from the same package. I expected a separate GPU package, as TensorFlow provides:

pip install tensorflow-gpu
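One way to read the "invalid choice: 'gpu'" message above: argparse only accepts values listed in `choices`, so if that list is built from whatever optional dependencies are detected, a missing GPU dependency makes `gpu` disappear from it. The sketch below illustrates that idea only. The function names and the detection rule (checking whether a module is importable) are assumptions made for illustration, not neon's actual code.

```python
import argparse
import importlib.util


def available_backends(gpu_modules=("pycuda",)):
    """Hypothetical detection: offer 'gpu' only if every listed
    top-level module can be found on the import path."""
    backends = ["cpu"]
    if all(importlib.util.find_spec(m) is not None for m in gpu_modules):
        backends.append("gpu")
    return backends


def build_parser(backends):
    """Build a parser whose -b/--backend option is limited to `backends`."""
    parser = argparse.ArgumentParser()
    parser.add_argument("-b", "--backend", choices=backends,
                        default=backends[0])
    return parser


if __name__ == "__main__":
    detected = available_backends()
    print("detected backends:", detected)
    # Passing a value outside `detected` would exit with an
    # "invalid choice" error similar to the one reported above.
    print(build_parser(detected).parse_args(["-b", "cpu"]))
```

Printing the detected list before parsing makes it clear whether the problem is in detection or in the command line.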


baojun-nervana · 2017-12-06T19:23:39Z

@yangyang-zhang Thanks for the feedback. We will look into this.

baojun-nervana · 2017-12-14T00:26:53Z

@yangyang-zhang You can install the GPU requirements if you want to run on the GPU backend.

https://github.com/NervanaSystems/neon/blob/master/gpu_requirements.txt

pip install pycuda scikit-cuda pytools

armando-fandango · 2017-12-20T15:49:20Z

@baojun-nervana I am getting the same error even after installing all pre-req:

: error: argument -b/--backend: invalid choice: 'gpu' (choose from 'cpu', 'mkl')

baojun-nervana · 2017-12-20T15:59:42Z

@armando-fandango Can you try to install the gpu dependencies?

pip install pycuda scikit-cuda pytools

armando-fandango · 2017-12-20T16:03:36Z

@baojun-nervana Yes, all the GPU dependencies are installed. As my message above said, I get the error "even after installing all pre-req".

armando-fandango · 2017-12-20T16:04:38Z

I see that the Makefile does this: nvcc neon/backends/util/check_gpu.c > /dev/null 2>&1 && ./a.out && rm a.out && echo true

This always returns an empty HAS_GPU string, no matter what. I compiled check_gpu.c outside of the Makefile and it compiles fine.

Is there something wrong with the code above?
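The Makefile command quoted above discards the compiler output and chains its steps with `&&`, so an empty result does not say which step failed. A minimal sketch of running such a chain one step at a time and reporting the first failure follows. The helper name `first_failing_step` is hypothetical, and the commands in the demo are simply the first two steps of the quoted line.

```python
import subprocess
import sys


def first_failing_step(steps):
    """Run commands in order, like an `&&` chain.

    Returns None if every step exits with 0. Otherwise returns a tuple
    (index, returncode, message) for the first step that fails;
    returncode is None when the command could not be started at all.
    """
    for index, cmd in enumerate(steps):
        try:
            proc = subprocess.run(cmd, stdout=subprocess.PIPE,
                                  stderr=subprocess.PIPE)
        except OSError as exc:  # e.g. the executable is not on PATH
            return index, None, str(exc)
        if proc.returncode != 0:
            return index, proc.returncode, proc.stderr.decode(errors="replace")
    return None


if __name__ == "__main__":
    # First two steps of the quoted Makefile line; run from the neon source tree.
    chain = [["nvcc", "neon/backends/util/check_gpu.c"], ["./a.out"]]
    failure = first_failing_step(chain)
    if failure is None:
        print("all steps succeeded")
    else:
        print("step %d failed: %r" % (failure[0], failure[1:]), file=sys.stderr)
```

Keeping stderr instead of redirecting it to /dev/null is the main point: the message from the failing step usually names the cause directly.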

baojun-nervana · 2017-12-20T16:13:57Z

It seems you don't have the nvcc compiler installed, or it is not in your PATH.

Make sure "which nvcc" returns the right path first.

armando-fandango · 2017-12-20T16:21:33Z

This command works and produces a.out: nvcc neon/backends/util/check_gpu.c

I am reinstalling the whole NVIDIA driver and CUDA library to see whether the problem lies there.

baojun-nervana · 2017-12-20T17:19:20Z

It sounds like it should work. Is the PATH set up correctly too?

export PATH="/usr/local/cuda/bin:$PATH"
export LD_LIBRARY_PATH="/usr/local/cuda/lib64:/usr/local/cuda/lib:/usr/local/lib:$LD_LIBRARY_PATH"
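A quick way to verify the two settings above from the same environment that runs the build is sketched below. It is the programmatic equivalent of "which nvcc", plus a check that each LD_LIBRARY_PATH entry actually exists. The helper names `find_tool` and `missing_dirs` are hypothetical.

```python
import os
import shutil


def find_tool(name, path=None):
    """Return the full path of `name` as resolved through PATH, or None."""
    return shutil.which(name, path=path)


def missing_dirs(path_value):
    """Return the entries of a path list that are not existing directories."""
    return [d for d in path_value.split(os.pathsep)
            if d and not os.path.isdir(d)]


if __name__ == "__main__":
    print("nvcc:", find_tool("nvcc"))
    print("missing LD_LIBRARY_PATH entries:",
          missing_dirs(os.environ.get("LD_LIBRARY_PATH", "")))
```

Running this under the same user and shell as the install matters, since sudo and container runtimes can replace PATH and LD_LIBRARY_PATH.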

armando-fandango · 2017-12-20T19:09:32Z

After reinstalling CUDA and Neon, now I am getting this error:

python3 examples/mnist_mlp.py -b gpu
Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/pycuda/tools.py", line 426, in context_dependent_memoize
    return ctx_dict[cur_ctx][args]
KeyError: <pycuda._driver.Context object at 0x7f7aa4360500>

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "examples/mnist_mlp.py", line 88, in
    num_epochs=args.epochs, cost=cost, callbacks=callbacks)
  File "/usr/local/lib/python3.5/dist-packages/nervananeon-2.4.0-py3.5.egg/neon/models/model.py", line 183, in fit
    self._epoch_fit(dataset, callbacks)
  File "/usr/local/lib/python3.5/dist-packages/nervananeon-2.4.0-py3.5.egg/neon/models/model.py", line 205, in epoch_fit
    x = self.fprop(x)
  File "/usr/local/lib/python3.5/dist-packages/nervananeon-2.4.0-py3.5.egg/neon/models/model.py", line 236, in fprop
    res = self.layers.fprop(x, inference)
  File "/usr/local/lib/python3.5/dist-packages/nervananeon-2.4.0-py3.5.egg/neon/layers/container.py", line 395, in fprop
    x = l.fprop(x, inference=inference)
  File "/usr/local/lib/python3.5/dist-packages/nervananeon-2.4.0-py3.5.egg/neon/layers/layer.py", line 1288, in fprop
    bsum=self.batch_sum)
  File "/usr/local/lib/python3.5/dist-packages/nervananeon-2.4.0-py3.5.egg/neon/backends/nervanagpu.py", line 1585, in compound_dot
    kernel = kernel_specs.get_kernel("".join((clss, op, size)), vec_opt)
  File "", line 2, in get_kernel
  File "/usr/local/lib/python3.5/dist-packages/pycuda/tools.py", line 430, in context_dependent_memoize
    result = func(*args)
  File "/usr/local/lib/python3.5/dist-packages/nervananeon-2.4.0-py3.5.egg/neon/backends/kernel_specs.py", line 849, in get_kernel
    run_command(maxas_i + [sass_file, cubin_file])
  File "/usr/local/lib/python3.5/dist-packages/nervananeon-2.4.0-py3.5.egg/neon/backends/kernel_specs.py", line 785, in run_command
    raise RuntimeError("Error(%d):\n%s\n%s" % (proc.returncode, cmd, err))
RuntimeError: Error(13):
PERL5LIB=/usr/local/lib/python3.5/dist-packages/nervananeon-2.4.0-py3.5.egg/neon/backends/kernels/maxas /usr/local/lib/python3.5/dist-packages/nervananeon-2.4.0-py3.5.egg/neon/backends/kernels/maxas/maxas.pl -i -w -k sgemm_nn_32x128_vec -Dvec 1 /usr/local/lib/python3.5/dist-packages/nervananeon-2.4.0-py3.5.egg/neon/backends/kernels/sass/sgemm_nn_32x128.sass /home/armando/.cache/neon/kernels/cubin/sgemm_nn_32x128_vec.cubin
b'Unescaped left brace in regex is deprecated, passed through in regex; marked by <-- HERE in m/ { <-- HERE (?5)?,?(?4)?,?(?3)?,?(?2)?,?(?1)?,?(?0)?}/ at /usr/local/lib/python3.5/dist-packages/nervananeon-2.4.0-py3.5.egg/neon/backends/kernels/maxas/MaxAs/MaxAsGrammar.pm line 239.\nUnescaped left brace in regex is deprecated, passed through in regex; marked by <-- HERE in m/^(?^:\@(?\!)?P(?[0-6]) )?DEPBAR(?^: { <-- HERE (?5)?,?(?4)?,?(?3)?,?(?2)?,?(?1)?,?(?0)?});/ at /usr/local/lib/python3.5/dist-packages/nervananeon-2.4.0-py3.5.egg/neon/backends/kernels/maxas/MaxAs/MaxAsGrammar.pm line 275.\nError: could not open /home/armando/.cache/neon/kernels/cubin/sgemm_nn_32x128_vec.cubin for writing: Permission denied at /usr/local/lib/python3.5/dist-packages/nervananeon-2.4.0-py3.5.egg/neon/backends/kernels/maxas/MaxAs/Cubin.pm line 640.\n'

armando-fandango · 2017-12-20T19:16:43Z

After I ran this command:

sudo chown -R armando.armando ~/.cache

it works now:
python3 examples/mnist_mlp.py -b gpu
Epoch 0 [Train |████████████████████| 469/469 batches, 0.24 cost, 6.15s]
Epoch 1 [Train |████████████████████| 469/469 batches, 0.20 cost, 1.03s]
Epoch 2 [Train |████████████████████| 469/469 batches, 0.17 cost, 1.06s]
Epoch 3 [Train |████████████████████| 468/468 batches, 0.14 cost, 1.22s]
Epoch 4 [Train |████████████████████| 468/468 batches, 0.13 cost, 1.14s]
Epoch 5 [Train |████████████████████| 468/468 batches, 0.11 cost, 1.07s]
Epoch 6 [Train |████████████████████| 468/468 batches, 0.10 cost, 1.57s]
Epoch 7 [Train |████████████████████| 468/468 batches, 0.09 cost, 1.14s]
Epoch 8 [Train |████████████████████| 468/468 batches, 0.08 cost, 1.18s]
Epoch 9 [Train |████████████████████| 468/468 batches, 0.07 cost, 1.06s]
2017-12-20 14:15:58,969 - neon - DISPLAY - Misclassification error = 2.5%

@yangyang-zhang Let me know if I can help you :-)
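The "Permission denied" line in the traceback above points at the kernel cache under ~/.cache/neon. A minimal sketch for finding which directory blocks the write before changing ownership follows. The helper name `first_unwritable` is hypothetical, and the path in the demo is the one from the error message with the home directory abbreviated.

```python
import os


def first_unwritable(path):
    """Find the nearest existing ancestor of `path` (or `path` itself).

    Returns None if the current user can write there, otherwise returns
    that existing path so its owner and mode can be inspected.
    """
    path = os.path.abspath(os.path.expanduser(path))
    while not os.path.exists(path):
        parent = os.path.dirname(path)
        if parent == path:  # reached the filesystem root
            break
        path = parent
    # Creating a file inside a directory needs write and search permission.
    mode = (os.W_OK | os.X_OK) if os.path.isdir(path) else os.W_OK
    return None if os.access(path, mode) else path


if __name__ == "__main__":
    target = "~/.cache/neon/kernels/cubin/sgemm_nn_32x128_vec.cubin"
    blocked = first_unwritable(target)
    if blocked is None:
        print("cache location is writable")
    else:
        info = os.stat(blocked)
        print("not writable:", blocked, "owner uid:", info.st_uid)
```

If the reported owner is root, the directory was probably created by an earlier run under sudo, though that is an inference and not something the thread confirms.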

yangyang-zhang · 2017-12-21T06:23:23Z

Thank you, I've already solved it.

yangyang-zhang closed this as completed Dec 21, 2017
