pygpu.gpuarray.GpuArrayException #5945

jagiella · 2017-05-16T11:44:26Z

Using keras/theano with the new gpuarray backend, the following error occurred:

Using Theano backend.
Can not use cuDNN on context None: cannot compile with cuDNN. We got this error:
In file included from /tmp/try_flags_Q3q20i.c:4:0:
/usr/include/cudnn.h:63:26: fatal error: driver_types.h: No such file or directory
compilation terminated.

Mapped name None to device cuda: GeForce GTX 1070 (0000:01:00.0)
Epoch 1/10
Traceback (most recent call last):
  File "newBackend.py", line 38, in <module>
    model.fit( x, y)
  File "/usr/local/lib/python2.7/dist-packages/keras/models.py", line 868, in fit
    initial_epoch=initial_epoch)
  File "/usr/local/lib/python2.7/dist-packages/keras/engine/training.py", line 1503, in fit
    initial_epoch=initial_epoch)
  File "/usr/local/lib/python2.7/dist-packages/keras/engine/training.py", line 1155, in _fit_loop
    outs = f(ins_batch)
  File "/usr/local/lib/python2.7/dist-packages/keras/backend/theano_backend.py", line 1196, in __call__
    return self.function(*inputs)
  File "/usr/local/lib/python2.7/dist-packages/theano/compile/function_module.py", line 898, in __call__
    storage_map=getattr(self.fn, 'storage_map', None))
  File "/usr/local/lib/python2.7/dist-packages/theano/gof/link.py", line 325, in raise_with_op
    reraise(exc_type, exc_value, exc_trace)
  File "/usr/local/lib/python2.7/dist-packages/theano/compile/function_module.py", line 884, in __call__
    self.fn() if output_subset is None else\
  File "pygpu/gpuarray.pyx", line 683, in pygpu.gpuarray.pygpu_copy (pygpu/gpuarray.c:9990)
  File "pygpu/gpuarray.pyx", line 396, in pygpu.gpuarray.array_copy (pygpu/gpuarray.c:7083)
pygpu.gpuarray.GpuArrayException: 
Apply node that caused the error: GpuContiguous(GpuSubtensor{::, ::, ::int64, ::int64}.0)
Toposort index: 72
Inputs types: [GpuArrayType<None>(float32, 4D)]
Inputs shapes: [(4, 1, 11, 11)]
Inputs strides: [(4, 9223372036854775807, -176, -16)]
Inputs values: ['not shown']
Outputs clients: [[GpuCorrMM{valid, (1, 1), (1, 1)}(GpuContiguous.0, GpuContiguous.0)]]

HINT: Re-running with most Theano optimization disabled could give you a back-trace of when this node was created. This can be done with by setting the Theano flag 'optimizer=fast_compile'. If that does not work, Theano optimizations can be disabled with 'optimizer=None'.
HINT: Use the Theano flag 'exception_verbosity=high' for a debugprint and storage map footprint of this apply node.

I updated Keras and Theano to the latest versions on both repositories:

sudo -H python -m pip install git+git://github.com/fchollet/keras.git --upgrade --no-deps
Successfully installed Keras-2.0.4

sudo -H python -m pip install git+git://github.com/Theano/Theano.git --upgrade --no-deps
Successfully installed Theano-0.9.0

Code to reproduce the problem:

from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dropout, Dense
import numpy as np
x = np.random.random( size=(100, 1, 200, 300)).astype('float32')
y = np.random.random( size=(100, 3)).astype('float32')
model = Sequential([
	Conv2D( 4, (11,11), activation='relu', padding='same', input_shape=x.shape[1:]),
	MaxPooling2D( (2,2)),
	Conv2D( 8, (7,7), activation='relu', padding='same'),
	MaxPooling2D( (2,2)),
	Conv2D( 16, (5,5), activation='relu', padding='same'),
	MaxPooling2D( (2,2)),
	Conv2D( 32, (3,3), activation='relu', padding='same'),
	MaxPooling2D( (2,2)),
	Conv2D( 64, (3,3), activation='relu', padding='same'),
	MaxPooling2D( (2,2)),
	#Conv2D( 128, (3,3), activation='relu', padding='same'),
	#MaxPooling2D( (2,2)),
	#Conv2D( 256, (3,3), activation='relu', padding='same'),
	#MaxPooling2D( (2,2)),
	Flatten(),
	Dropout(0.05),
	Dense(128, activation='relu'),
	Dropout(0.05),
	Dense(y.shape[1], activation='softmax')
])
#print( model.summary())
model.compile( loss='categorical_crossentropy', optimizer='adadelta', metrics=['accuracy'])
model.fit( x, y)

nouiz · 2017-05-16T13:15:31Z

I get a different error message. It comes from Keras and says that some information is missing from the code you gave:

$ THEANO_FLAGS=device=cuda,floatX=float32 python test.py 
Using Theano backend.
/Tmp/lisa/os_v5/anaconda/lib/python2.7/site-packages/skcuda/cublas.py:282: UserWarning: creating CUBLAS context to get version number
  warnings.warn('creating CUBLAS context to get version number')
/u/bastienf/repos/theano/gpuarray/dnn.py:135: UserWarning: Your cuDNN version is more recent than Theano. If you encounter problems, try updating Theano or downgrading cuDNN to version 5.1.
  warnings.warn("Your cuDNN version is more recent than "
Using cuDNN version 6005 on context None
Mapped name None to device cuda: GeForce GTX 750 (0000:05:00.0)
Traceback (most recent call last):
  File "test.py", line 25, in <module>
    Dense(y.shape[1], activation='softmax')
  File "/u/bastienf/repos/keras/models.py", line 407, in __init__
    self.add(layer)
  File "/u/bastienf/repos/keras/models.py", line 475, in add
    output_tensor = layer(self.outputs[0])
  File "/u/bastienf/repos/keras/engine/topology.py", line 615, in __call__
    output_shape = self.compute_output_shape(input_shape)
  File "/u/bastienf/repos/keras/layers/core.py", line 488, in compute_output_shape
    '(got ' + str(input_shape[1:]) + '. '
ValueError: The shape of the input to "Flatten" is not fully defined (got (0, 6, 64). Make sure to pass a complete "input_shape" or "batch_input_shape" argument to the first layer in your model.

Here are the Theano and Keras commits that I tested:

Theano$ git show
commit 6187a1fa64cc1df6f814cfacd213bd719ab87f38
Merge: 2b3de3e40 78328b7f1
Author: Pascal Lamblin <lamblinp@iro.umontreal.ca>
Date:   Mon May 15 21:24:52 2017 -0400

    Merge pull request #5927 from notoraptor/simplify-and-prepare-gpukernelbase-for-paramstype
    
    Partially factorize GpuKernelBase.get_params() and configure it to use ParamsType.

Theano$ cd ../Keras/
Keras$ git show
commit 0d27d903c295f00d16efaed7f596b3b974e38764
Author: Fariz Rahman <farizrahman4u@gmail.com>
Date:   Sun May 14 23:12:44 2017 +0530

Are you sure there isn't another Keras or Theano version installed in your user directory? That would cause a different version to be used instead of the newly installed one.
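
One way to check for a shadowing install is to ask Python where it would load each package from, without importing it. This is only a sketch: the helper name `locate` is made up for illustration, and it assumes Python 3.8+ (for `importlib.metadata`), which is newer than the Python 2.7 used in the report.

```python
import importlib.util
from importlib import metadata


def locate(name):
    """Return (version, path) for a package Python can find, or None if missing.

    `locate` is a hypothetical helper, not part of Theano or Keras.
    """
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None
    try:
        version = metadata.version(name)
    except metadata.PackageNotFoundError:
        # The module is importable but has no installed distribution metadata.
        version = "unknown"
    return version, spec.origin


# A path under ~/.local (user site) instead of the system-wide location
# would suggest that a second copy is taking precedence.
for package in ("theano", "keras", "pygpu", "numpy"):
    print(package, locate(package))
```

If the printed path is not the one the fresh install went to, the older copy is the one being imported.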

jagiella · 2017-05-16T14:31:03Z

I removed all installed versions (python 2.7 and 3.5) of Theano, libgpuarray and Keras and made a clean install from the respective git repositories. But I ended up with the same error.

jagiella · 2017-05-16T14:32:27Z

Your error seems strange, by the way, since input_shape is specified in the first layer of the example.

nouiz · 2017-05-16T14:56:05Z

I think the problem, like the other one, is in Keras, which doesn't compute the shape correctly. Open an issue on their repo and link to this one.

ozancaglayan · 2017-05-17T10:01:56Z

Can't reproduce the problem with the above code on: Python 3.6, CUDA 8.0, cuDNN 5.1 (5110), theano-0.9, gpuarray and keras latest master as of yesterday. @nouiz, is your image_data_format set to channels_first in keras.json? I think by default it uses the TF scheme for dim ordering.

Python 3.6.0 |Continuum Analytics, Inc.| (default, Dec 23 2016, 12:22:00)
Type 'copyright', 'credits' or 'license' for more information
IPython 6.0.0 -- An enhanced Interactive Python. Type '?' for help.
Using Theano backend.
Using cuDNN version 5110 on context None
Mapped name None to device cuda0: GeForce GTX 1080 (0000:03:00.0)
Epoch 1/10
100/100 [==============================] - 0s - loss: 1.6038 - acc: 0.4300
Epoch 2/10
100/100 [==============================] - 0s - loss: 1.6075 - acc: 0.3100
Epoch 3/10
100/100 [==============================] - 0s - loss: 1.6009 - acc: 0.4500
Epoch 4/10
100/100 [==============================] - 0s - loss: 1.6039 - acc: 0.3400
Epoch 5/10
100/100 [==============================] - 0s - loss: 1.6029 - acc: 0.3600
Epoch 6/10
100/100 [==============================] - 0s - loss: 1.6014 - acc: 0.4100
Epoch 7/10
100/100 [==============================] - 0s - loss: 1.6013 - acc: 0.4200
Epoch 8/10
100/100 [==============================] - 0s - loss: 1.5998 - acc: 0.4000
Epoch 9/10
100/100 [==============================] - 0s - loss: 1.6010 - acc: 0.4300
Epoch 10/10
100/100 [==============================] - 0s - loss: 1.5998 - acc: 0.5000
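
To see why the dim ordering matters here, the following sketch recomputes the shape that reaches Flatten under both orderings. It assumes that the padding='same' convolutions keep the spatial size and that each MaxPooling2D((2,2)) halves it with floor division; `pooled_shape` is a hypothetical helper, not a Keras function.

```python
def pooled_shape(input_shape, data_format, filters=64, n_pools=5):
    """Shape entering Flatten for the model in the report (batch axis omitted)."""
    if data_format == "channels_first":
        _, rows, cols = input_shape
    else:  # channels_last: the last axis is read as channels
        rows, cols, _ = input_shape
    for _ in range(n_pools):
        # 'same' convolutions leave rows/cols unchanged; pooling halves them.
        rows //= 2
        cols //= 2
    if data_format == "channels_first":
        return (filters, rows, cols)
    return (rows, cols, filters)


input_shape = (1, 200, 300)  # x.shape[1:] from the reproduction script
print(pooled_shape(input_shape, "channels_first"))  # (64, 6, 9)
print(pooled_shape(input_shape, "channels_last"))   # (0, 6, 64)
```

Under channels_last the single "row" is pooled down to 0, which matches the (0, 6, 64) in the Flatten error reported above.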

lamblin · 2017-06-02T21:47:43Z

Which version of libgpuarray did you install, and how?

nouiz · 2017-06-05T14:30:28Z

Here is my keras config file:

{
"image_dim_ordering": "channels_last",
"image_dim_ordering": "th",
"image_dim_ordering": "channels_first",
"epsilon": 1e-07,
"floatx": "float32",
"backend": "theano"
}

I still have the problem after updating Keras to 0bc8fac4463c68faa3b3c415c26eab02aa361fd5.
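
Something worth checking in that file: the same key appears three times, and there is no image_data_format key at all. The sketch below shows how Python's json module reads it. It assumes that Keras 2 looks up "image_data_format" and falls back to channels_last when the key is absent; that lookup is imitated here with a plain dict `.get`, not taken from the Keras source.

```python
import json

# Config text copied from the comment above.
config_text = """
{
"image_dim_ordering": "channels_last",
"image_dim_ordering": "th",
"image_dim_ordering": "channels_first",
"epsilon": 1e-07,
"floatx": "float32",
"backend": "theano"
}
"""

config = json.loads(config_text)

# For a repeated key, json.loads keeps only the last value.
print(config["image_dim_ordering"])  # channels_first

# Assumption: the key that matters is "image_data_format", which is missing,
# so a lookup with a default returns the default.
data_format = config.get("image_data_format", "channels_last")
print(data_format)  # channels_last
```

If that assumption holds, this config would be treated as channels_last regardless of the image_dim_ordering entries.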

jagiella · 2017-06-06T13:15:08Z

I updated everything to the current versions:

Keras==2.0.4
Theano==0.10.0.dev1
pygpu==0.6.5 (libgpuarray.so.2.1)
numpy==1.13.0.dev0+b297cb7

and still get a similar error:

Using Theano backend.
Using cuDNN version 5110 on context None
Mapped name None to device cuda: GeForce GTX 1070 (0000:01:00.0)
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
conv2d_1 (Conv2D)            (None, 4, 190, 290)       488       
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 4, 95, 145)        0         
_________________________________________________________________
flatten_1 (Flatten)          (None, 55100)             0         
_________________________________________________________________
dense_1 (Dense)              (None, 128)               7052928   
_________________________________________________________________
dense_2 (Dense)              (None, 3)                 387       
=================================================================
Total params: 7,053,803
Trainable params: 7,053,803
Non-trainable params: 0
_________________________________________________________________
None
Epoch 1/10
Traceback (most recent call last):
  File "newBackend.py", line 15, in <module>
    model.fit( x, y)
  File "/usr/local/lib/python2.7/dist-packages/keras/models.py", line 866, in fit
    initial_epoch=initial_epoch)
  File "/usr/local/lib/python2.7/dist-packages/keras/engine/training.py", line 1504, in fit
    initial_epoch=initial_epoch)
  File "/usr/local/lib/python2.7/dist-packages/keras/engine/training.py", line 1156, in _fit_loop
    outs = f(ins_batch)
  File "/usr/local/lib/python2.7/dist-packages/keras/backend/theano_backend.py", line 1196, in __call__
    return self.function(*inputs)
  File "/usr/local/lib/python2.7/dist-packages/theano/compile/function_module.py", line 898, in __call__
    storage_map=getattr(self.fn, 'storage_map', None))
  File "/usr/local/lib/python2.7/dist-packages/theano/gof/link.py", line 325, in raise_with_op
    reraise(exc_type, exc_value, exc_trace)
  File "/usr/local/lib/python2.7/dist-packages/theano/compile/function_module.py", line 884, in __call__
    self.fn() if output_subset is None else\
  File "pygpu/gpuarray.pyx", line 683, in pygpu.gpuarray.pygpu_copy (pygpu/gpuarray.c:9990)
  File "pygpu/gpuarray.pyx", line 396, in pygpu.gpuarray.array_copy (pygpu/gpuarray.c:7083)
pygpu.gpuarray.GpuArrayException: 
Apply node that caused the error: GpuContiguous(InplaceGpuDimShuffle{3,2,0,1}.0)
Toposort index: 27
Inputs types: [GpuArrayType<None>(float32, 4D)]
Inputs shapes: [(4, 1, 11, 11)]
Inputs strides: [(4, 9223372036854775807, 176, 16)]
Inputs values: ['not shown']
Outputs clients: [[Shape(GpuContiguous.0), Shape_i{3}(GpuContiguous.0), Shape_i{2}(GpuContiguous.0), Shape_i{0}(GpuContiguous.0), GpuDnnConv{algo='small', inplace=True}(GpuContiguous.0, GpuContiguous.0, GpuAllocEmpty{dtype='float32', context_name=None}.0, GpuDnnConvDesc{border_mode='valid', subsample=(1, 1), dilation=(1, 1), conv_mode='conv', precision='float32'}.0, Constant{1.0}, Constant{0.0})]]

My keras.json looks like the following:

{
    "image_data_format": "channels_first", 
    "epsilon": 1e-07, 
    "floatx": "float32", 
    "backend": "theano"
}

My .theanorc:

[cuda]
root = /usr/local/cuda-8.0

[global]
floatX=float32
device=cuda

[nvcc]
flags=-D_FORCE_INLINES

[lib]
cnmem = 0.

[dnn]
enabled = True
include_path=/usr/local/cuda-8.0/include
library_path=/usr/local/cuda-8.0/lib64/

nouiz · 2017-06-06T14:51:31Z

What do you mean by libgpuarray.so.2.1?

Can you check which version of pygpu you have (pygpu.version)? It should be 0.6.5, which gives better errors in many cases. If you are already using it, it seems we need to add a better error for your case.

jagiella · 2017-06-06T15:23:22Z

pygpu==0.6.5

lamblin · 2017-06-06T16:31:30Z

I guess the issue is triggered by:

Inputs shapes: [(4, 1, 11, 11)]
Inputs strides: [(4, 9223372036854775807, 176, 16)]

i.e., a "strange" stride on a dimension of length 1.
This could have been copied from NumPy, which has done that since 1.12, I guess, so the NumPy version could be important for reproducing the bug.
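
For illustration, the stride of a length-1 dimension is never used for addressing (index 0 is the only valid index), so it can be replaced by any conventional value without changing which elements the array refers to. The sketch below does that for the shape and strides from the traceback. `normalize_unit_strides` is a hypothetical helper, not something pygpu or Theano provides, and the replacement rule it uses is just one reasonable choice.

```python
def normalize_unit_strides(shape, strides, itemsize):
    """Return strides in which every length-1 axis gets a conventional value."""
    fixed = list(strides)
    last = len(shape) - 1
    # Walk from the innermost axis outwards so that a replacement can reuse
    # the (possibly already replaced) stride of the axis to its right.
    for axis in range(last, -1, -1):
        if shape[axis] == 1:
            if axis == last:
                fixed[axis] = itemsize
            else:
                fixed[axis] = fixed[axis + 1] * shape[axis + 1]
    return tuple(fixed)


shape = (4, 1, 11, 11)
strides = (4, 9223372036854775807, 176, 16)  # 2**63 - 1 on the length-1 axis

fixed = normalize_unit_strides(shape, strides, itemsize=4)  # float32
print(fixed)  # (4, 1936, 176, 16)
```

The other three strides are left untouched; only the placeholder value on the length-1 axis changes.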

jagiella · 2017-06-06T19:51:06Z

numpy==1.13.0.dev0+b297cb7

jagiella · 2017-06-07T10:37:12Z

@lamblin you were right. It was indeed related to the numpy version. I removed the dev-version and installed the default version with pip:

numpy==1.12.1

Problem solved!

Nevertheless, when the default version of numpy is updated to 1.13, the problem might reoccur!

lamblin · 2017-06-09T04:33:44Z

I think it is more a difference between released versions of NumPy and development ones.
In any case, we should still handle that case correctly, so I'm reopening.
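
A quick way to tell the two kinds of build apart is to look at the version string. This is only a heuristic sketch; `is_dev_build` is a hypothetical helper, and it assumes that development builds carry a ".dev" segment, as the versions quoted in this thread do.

```python
def is_dev_build(version):
    """Heuristic: a '.dev' segment marks a development build."""
    return ".dev" in version


print(is_dev_build("1.13.0.dev0+b297cb7"))  # True  (the NumPy build that failed)
print(is_dev_build("1.12.1"))               # False (the release that worked)
```

In practice the string would come from `numpy.__version__`.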

nikolay-256 · 2017-06-19T21:58:28Z

@jagiella
I installed numpy==1.12.1, but the problem is not solved. I still get this warning:
/python3.6/site-packages/theano/gpuarray/dnn.py:135: UserWarning: Your cuDNN version is more recent than Theano. If you encounter problems, try updating Theano or downgrading cuDNN to version 5.1.
But it is still 3 times faster than the CPU.
How much of a speedup do you get without this warning?

nouiz · 2017-06-20T14:16:40Z

Update Theano to the dev version to get rid of this warning. The GPU speedup over the CPU depends on the convolution sizes. You can use the dnn.conv.algo_bwd_data, dnn.conv.algo_bwd_filter and dnn.conv.algo_fwd flags to change the convolution implementation and get a bigger speedup. time_once and guess_once are good values for those flags. time_once makes the first call much slower, so exclude that call from your timing.
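
As a sketch of how those flags could be set from a script, the snippet below builds the flag string and appends it to THEANO_FLAGS. It assumes the usual THEANO_FLAGS syntax of comma-separated name=value pairs; `dnn_algo_flags` is a hypothetical helper.

```python
import os


def dnn_algo_flags(algo="time_once"):
    """Build a THEANO_FLAGS fragment that sets all three dnn.conv.algo_* flags."""
    names = (
        "dnn.conv.algo_fwd",
        "dnn.conv.algo_bwd_data",
        "dnn.conv.algo_bwd_filter",
    )
    return ",".join("{}={}".format(name, algo) for name in names)


flags = dnn_algo_flags("time_once")

# Append to any flags that are already set (e.g. device=cuda) instead of
# overwriting them. This has to happen before `import theano`, because
# Theano reads THEANO_FLAGS when it is imported.
existing = os.environ.get("THEANO_FLAGS", "")
os.environ["THEANO_FLAGS"] = ",".join(part for part in (existing, flags) if part)

print(os.environ["THEANO_FLAGS"])
```

The same three settings could instead be passed on the command line in front of the python call, as with the THEANO_FLAGS=device=cuda,floatX=float32 invocation earlier in the thread.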


nouiz mentioned this issue May 16, 2017

pygpu.gpuarray.GpuArrayException: Unaligned array #5934

Closed

nouiz added Crash GPU - New back-end labels May 16, 2017

jagiella mentioned this issue Jun 6, 2017

pygpu.gpuarray.GpuArrayException: Unaligned array / Binary mode not supported any more Theano/libgpuarray#432

Closed

jagiella closed this as completed Jun 7, 2017

lamblin reopened this Jun 9, 2017
