pygpu.gpuarray.GpuArrayException #5945

Open
jagiella opened this issue May 16, 2017 · 16 comments

Comments

@jagiella

Using Keras/Theano with the new gpuarray backend, the following error occurred:

Using Theano backend.
Can not use cuDNN on context None: cannot compile with cuDNN. We got this error:
In file included from /tmp/try_flags_Q3q20i.c:4:0:
/usr/include/cudnn.h:63:26: fatal error: driver_types.h: No such file or directory
compilation terminated.

Mapped name None to device cuda: GeForce GTX 1070 (0000:01:00.0)
Epoch 1/10
Traceback (most recent call last):
  File "newBackend.py", line 38, in <module>
    model.fit( x, y)
  File "/usr/local/lib/python2.7/dist-packages/keras/models.py", line 868, in fit
    initial_epoch=initial_epoch)
  File "/usr/local/lib/python2.7/dist-packages/keras/engine/training.py", line 1503, in fit
    initial_epoch=initial_epoch)
  File "/usr/local/lib/python2.7/dist-packages/keras/engine/training.py", line 1155, in _fit_loop
    outs = f(ins_batch)
  File "/usr/local/lib/python2.7/dist-packages/keras/backend/theano_backend.py", line 1196, in __call__
    return self.function(*inputs)
  File "/usr/local/lib/python2.7/dist-packages/theano/compile/function_module.py", line 898, in __call__
    storage_map=getattr(self.fn, 'storage_map', None))
  File "/usr/local/lib/python2.7/dist-packages/theano/gof/link.py", line 325, in raise_with_op
    reraise(exc_type, exc_value, exc_trace)
  File "/usr/local/lib/python2.7/dist-packages/theano/compile/function_module.py", line 884, in __call__
    self.fn() if output_subset is None else\
  File "pygpu/gpuarray.pyx", line 683, in pygpu.gpuarray.pygpu_copy (pygpu/gpuarray.c:9990)
  File "pygpu/gpuarray.pyx", line 396, in pygpu.gpuarray.array_copy (pygpu/gpuarray.c:7083)
pygpu.gpuarray.GpuArrayException: 
Apply node that caused the error: GpuContiguous(GpuSubtensor{::, ::, ::int64, ::int64}.0)
Toposort index: 72
Inputs types: [GpuArrayType<None>(float32, 4D)]
Inputs shapes: [(4, 1, 11, 11)]
Inputs strides: [(4, 9223372036854775807, -176, -16)]
Inputs values: ['not shown']
Outputs clients: [[GpuCorrMM{valid, (1, 1), (1, 1)}(GpuContiguous.0, GpuContiguous.0)]]

HINT: Re-running with most Theano optimization disabled could give you a back-trace of when this node was created. This can be done with by setting the Theano flag 'optimizer=fast_compile'. If that does not work, Theano optimizations can be disabled with 'optimizer=None'.
HINT: Use the Theano flag 'exception_verbosity=high' for a debugprint and storage map footprint of this apply node.
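
The two hints above can also be applied from inside the script rather than on the command line. This is a minimal sketch, assuming the flags are set before Theano (or Keras with the Theano backend) is first imported, since Theano reads `THEANO_FLAGS` at import time. Note that it overwrites any value already present in the environment.

```python
import os

# Flag names and values are taken from the two HINT lines in the traceback.
debug_flags = [
    ("optimizer", "fast_compile"),
    ("exception_verbosity", "high"),
]

theano_flags = ",".join("{}={}".format(name, value) for name, value in debug_flags)
os.environ["THEANO_FLAGS"] = theano_flags

# Import Keras/Theano only after this point so the flags take effect.
print(os.environ["THEANO_FLAGS"])
```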

I updated Keras and Theano to the latest versions on both repositories:

sudo -H python -m pip install git+git://github.com/fchollet/keras.git --upgrade --no-deps
Successfully installed Keras-2.0.4
sudo -H python -m pip install git+git://github.com/Theano/Theano.git --upgrade --no-deps
Successfully installed Theano-0.9.0

Code to reproduce the problem:

from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dropout, Dense
import numpy as np
x = np.random.random( size=(100, 1, 200, 300)).astype('float32')
y = np.random.random( size=(100, 3)).astype('float32')
model = Sequential([
	Conv2D( 4, (11,11), activation='relu', padding='same', input_shape=x.shape[1:]),
	MaxPooling2D( (2,2)),
	Conv2D( 8, (7,7), activation='relu', padding='same'),
	MaxPooling2D( (2,2)),
	Conv2D( 16, (5,5), activation='relu', padding='same'),
	MaxPooling2D( (2,2)),
	Conv2D( 32, (3,3), activation='relu', padding='same'),
	MaxPooling2D( (2,2)),
	Conv2D( 64, (3,3), activation='relu', padding='same'),
	MaxPooling2D( (2,2)),
	#Conv2D( 128, (3,3), activation='relu', padding='same'),
	#MaxPooling2D( (2,2)),
	#Conv2D( 256, (3,3), activation='relu', padding='same'),
	#MaxPooling2D( (2,2)),
	Flatten(),
	Dropout(0.05),
	Dense(128, activation='relu'),
	Dropout(0.05),
	Dense(y.shape[1], activation='softmax')
])
#print( model.summary())
model.compile( loss='categorical_crossentropy', optimizer='adadelta', metrics=['accuracy'])
model.fit( x, y)
@nouiz
Member

nouiz commented May 16, 2017

I get a different error message. It comes from Keras and says that information is missing from the code you gave:

$ THEANO_FLAGS=device=cuda,floatX=float32 python test.py 
Using Theano backend.
/Tmp/lisa/os_v5/anaconda/lib/python2.7/site-packages/skcuda/cublas.py:282: UserWarning: creating CUBLAS context to get version number
  warnings.warn('creating CUBLAS context to get version number')
/u/bastienf/repos/theano/gpuarray/dnn.py:135: UserWarning: Your cuDNN version is more recent than Theano. If you encounter problems, try updating Theano or downgrading cuDNN to version 5.1.
  warnings.warn("Your cuDNN version is more recent than "
Using cuDNN version 6005 on context None
Mapped name None to device cuda: GeForce GTX 750 (0000:05:00.0)
Traceback (most recent call last):
  File "test.py", line 25, in <module>
    Dense(y.shape[1], activation='softmax')
  File "/u/bastienf/repos/keras/models.py", line 407, in __init__
    self.add(layer)
  File "/u/bastienf/repos/keras/models.py", line 475, in add
    output_tensor = layer(self.outputs[0])
  File "/u/bastienf/repos/keras/engine/topology.py", line 615, in __call__
    output_shape = self.compute_output_shape(input_shape)
  File "/u/bastienf/repos/keras/layers/core.py", line 488, in compute_output_shape
    '(got ' + str(input_shape[1:]) + '. '
ValueError: The shape of the input to "Flatten" is not fully defined (got (0, 6, 64). Make sure to pass a complete "input_shape" or "batch_input_shape" argument to the first layer in your model.

Here is the Theano and Keras commit version that I tested:

Theano$ git show
commit 6187a1fa64cc1df6f814cfacd213bd719ab87f38
Merge: 2b3de3e40 78328b7f1
Author: Pascal Lamblin <lamblinp@iro.umontreal.ca>
Date:   Mon May 15 21:24:52 2017 -0400

    Merge pull request #5927 from notoraptor/simplify-and-prepare-gpukernelbase-for-paramstype
    
    Partially factorize GpuKernelBase.get_params() and configure it to use ParamsType.

Theano$ cd ../Keras/
Keras$ git show
commit 0d27d903c295f00d16efaed7f596b3b974e38764
Author: Fariz Rahman <farizrahman4u@gmail.com>
Date:   Sun May 14 23:12:44 2017 +0530

Are you sure there isn't another Keras or Theano version installed for your user? That would cause a different version to be used instead of the newly installed one.
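
One way to check which copy of a package Python actually picks up is to ask the import system where it would load the module from. This is a minimal sketch, assuming Python 3.4 or later (for `importlib.util.find_spec`). `module_location` is a hypothetical helper, and `json` only stands in for `theano` or `keras` so the snippet runs anywhere.

```python
import importlib.util


def module_location(name):
    """Return the file a top-level module would be imported from, or None."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


# Replace "json" with "theano" or "keras" on the affected machine. On Linux, a
# path under ~/.local/ would point to a per-user install shadowing the system one.
print(module_location("json"))
print(module_location("no_such_module_for_this_demo"))
```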

@jagiella
Author

I removed all installed versions (python 2.7 and 3.5) of Theano, libgpuarray and Keras and made a clean install from the respective git repositories. But I ended up with the same error.

@jagiella
Author

By the way, your error seems strange, as input_shape is specified in the first layer of the example.

@nouiz
Member

nouiz commented May 16, 2017

I think the problem, like the other one, is in Keras, which doesn't compute the shape correctly. Open an issue on their repo and link to this one.

@ozancaglayan
Contributor

Can't reproduce the problem with the above code on: Python 3.6, CUDA 8.0, cuDNN 5.1 (5110), theano-0.9, gpuarray and keras latest master as of yesterday. @nouiz is your image_data_format given as channels_first in keras.json? I think by default it uses the TF scheme for dim ordering.

Python 3.6.0 |Continuum Analytics, Inc.| (default, Dec 23 2016, 12:22:00)
Type 'copyright', 'credits' or 'license' for more information
IPython 6.0.0 -- An enhanced Interactive Python. Type '?' for help.
Using Theano backend.
Using cuDNN version 5110 on context None
Mapped name None to device cuda0: GeForce GTX 1080 (0000:03:00.0)
Epoch 1/10
100/100 [==============================] - 0s - loss: 1.6038 - acc: 0.4300
Epoch 2/10
100/100 [==============================] - 0s - loss: 1.6075 - acc: 0.3100
Epoch 3/10
100/100 [==============================] - 0s - loss: 1.6009 - acc: 0.4500
Epoch 4/10
100/100 [==============================] - 0s - loss: 1.6039 - acc: 0.3400
Epoch 5/10
100/100 [==============================] - 0s - loss: 1.6029 - acc: 0.3600
Epoch 6/10
100/100 [==============================] - 0s - loss: 1.6014 - acc: 0.4100
Epoch 7/10
100/100 [==============================] - 0s - loss: 1.6013 - acc: 0.4200
Epoch 8/10
100/100 [==============================] - 0s - loss: 1.5998 - acc: 0.4000
Epoch 9/10
100/100 [==============================] - 0s - loss: 1.6010 - acc: 0.4300
Epoch 10/10
100/100 [==============================] - 0s - loss: 1.5998 - acc: 0.5000
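
The `image_data_format` setting would also explain the `(0, 6, 64)` shape in the `ValueError` reported earlier. Below is a rough sketch of the shape arithmetic. It assumes the `padding='same'` convolutions keep the spatial size, each `MaxPooling2D((2,2))` halves it with floor division, and the last active convolution has 64 filters. `pooled_shape` is a hypothetical helper, not a Keras function.

```python
def pooled_shape(input_shape, data_format, filters=64, n_pools=5):
    """Shape fed to Flatten after n_pools rounds of 2x2 max pooling."""
    if data_format == "channels_first":
        _, rows, cols = input_shape
    else:  # "channels_last"
        rows, cols, _ = input_shape
    for _ in range(n_pools):
        rows, cols = rows // 2, cols // 2
    if data_format == "channels_first":
        return (filters, rows, cols)
    return (rows, cols, filters)


input_shape = (1, 200, 300)  # x.shape[1:] in the reproduction script
print(pooled_shape(input_shape, "channels_first"))  # (64, 6, 9)
print(pooled_shape(input_shape, "channels_last"))   # (0, 6, 64)
```

With `channels_last`, the input `(1, 200, 300)` is read as one row, 200 columns and 300 channels, so the single row is pooled down to zero.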

@lamblin
Member

lamblin commented Jun 2, 2017

Which version of libgpuarray did you install, and how?

@nouiz
Member

nouiz commented Jun 5, 2017

Here is my keras config file:

{
"image_dim_ordering": "channels_last",
"image_dim_ordering": "th",
"image_dim_ordering": "channels_first",
"epsilon": 1e-07,
"floatx": "float32",
"backend": "theano"
}

I still have the problem with Keras updated to 0bc8fac4463c68faa3b3c415c26eab02aa361fd5.

@jagiella
Author

jagiella commented Jun 6, 2017

I updated everything to the current versions:

Keras==2.0.4
Theano==0.10.0.dev1
pygpu==0.6.5 (libgpuarray.so.2.1)
numpy==1.13.0.dev0+b297cb7

and still get a similar error:

Using Theano backend.
Using cuDNN version 5110 on context None
Mapped name None to device cuda: GeForce GTX 1070 (0000:01:00.0)
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
conv2d_1 (Conv2D)            (None, 4, 190, 290)       488       
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 4, 95, 145)        0         
_________________________________________________________________
flatten_1 (Flatten)          (None, 55100)             0         
_________________________________________________________________
dense_1 (Dense)              (None, 128)               7052928   
_________________________________________________________________
dense_2 (Dense)              (None, 3)                 387       
=================================================================
Total params: 7,053,803
Trainable params: 7,053,803
Non-trainable params: 0
_________________________________________________________________
None
Epoch 1/10
Traceback (most recent call last):
  File "newBackend.py", line 15, in <module>
    model.fit( x, y)
  File "/usr/local/lib/python2.7/dist-packages/keras/models.py", line 866, in fit
    initial_epoch=initial_epoch)
  File "/usr/local/lib/python2.7/dist-packages/keras/engine/training.py", line 1504, in fit
    initial_epoch=initial_epoch)
  File "/usr/local/lib/python2.7/dist-packages/keras/engine/training.py", line 1156, in _fit_loop
    outs = f(ins_batch)
  File "/usr/local/lib/python2.7/dist-packages/keras/backend/theano_backend.py", line 1196, in __call__
    return self.function(*inputs)
  File "/usr/local/lib/python2.7/dist-packages/theano/compile/function_module.py", line 898, in __call__
    storage_map=getattr(self.fn, 'storage_map', None))
  File "/usr/local/lib/python2.7/dist-packages/theano/gof/link.py", line 325, in raise_with_op
    reraise(exc_type, exc_value, exc_trace)
  File "/usr/local/lib/python2.7/dist-packages/theano/compile/function_module.py", line 884, in __call__
    self.fn() if output_subset is None else\
  File "pygpu/gpuarray.pyx", line 683, in pygpu.gpuarray.pygpu_copy (pygpu/gpuarray.c:9990)
  File "pygpu/gpuarray.pyx", line 396, in pygpu.gpuarray.array_copy (pygpu/gpuarray.c:7083)
pygpu.gpuarray.GpuArrayException: 
Apply node that caused the error: GpuContiguous(InplaceGpuDimShuffle{3,2,0,1}.0)
Toposort index: 27
Inputs types: [GpuArrayType<None>(float32, 4D)]
Inputs shapes: [(4, 1, 11, 11)]
Inputs strides: [(4, 9223372036854775807, 176, 16)]
Inputs values: ['not shown']
Outputs clients: [[Shape(GpuContiguous.0), Shape_i{3}(GpuContiguous.0), Shape_i{2}(GpuContiguous.0), Shape_i{0}(GpuContiguous.0), GpuDnnConv{algo='small', inplace=True}(GpuContiguous.0, GpuContiguous.0, GpuAllocEmpty{dtype='float32', context_name=None}.0, GpuDnnConvDesc{border_mode='valid', subsample=(1, 1), dilation=(1, 1), conv_mode='conv', precision='float32'}.0, Constant{1.0}, Constant{0.0})]]

My keras.json looks like the following:

{
    "image_data_format": "channels_first", 
    "epsilon": 1e-07, 
    "floatx": "float32", 
    "backend": "theano"
}

My .theanorc:

[cuda]
root = /usr/local/cuda-8.0

[global]
floatX=float32
device=cuda

[nvcc]
flags=-D_FORCE_INLINES

[lib]
cnmem = 0.

[dnn]
enabled = True
include_path=/usr/local/cuda-8.0/include
library_path=/usr/local/cuda-8.0/lib64/

@nouiz
Member

nouiz commented Jun 6, 2017

What do you mean by libgpuarray.so.2.1?

Can you check which version of pygpu you have? pygpu.version. It should be 0.6.5, which gives better errors in many cases. If you are already using it, it seems we need to add a better error for your case.

@jagiella
Author

jagiella commented Jun 6, 2017

pygpu==0.6.5

@lamblin
Member

lamblin commented Jun 6, 2017

I guess the issue is triggered by:

Inputs shapes: [(4, 1, 11, 11)]
Inputs strides: [(4, 9223372036854775807, 176, 16)]

i.e., a "strange" stride on a dimension of length 1.
This could have been copied from NumPy, which has done that since 1.12, I guess, so the version of NumPy could be important to reproduce the bug.
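
To see why such a stride is harmless in principle: the byte offset of an element is the sum of index times stride over all axes, and the index on a length-1 axis is always 0, so that axis's stride never contributes. Here is a small pure-Python sketch using the shape and strides from the traceback. `element_offsets` is a hypothetical helper, and the value 4 in `sanitized` is an arbitrary stand-in.

```python
import itertools


def element_offsets(shape, strides):
    """Byte offset of every element, visited in C (row-major) index order."""
    return [
        sum(i * s for i, s in zip(index, strides))
        for index in itertools.product(*(range(n) for n in shape))
    ]


shape = (4, 1, 11, 11)
reported = (4, 9223372036854775807, 176, 16)  # strides from the traceback
sanitized = (4, 4, 176, 16)                   # any value on the length-1 axis

same_layout = element_offsets(shape, reported) == element_offsets(shape, sanitized)
print(same_layout)  # True
```

Both stride tuples address exactly the same 484 elements. Code that copies or validates strides without special-casing length-1 axes could still trip over the huge value, which would be consistent with the guess above.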

@jagiella
Author

jagiella commented Jun 6, 2017

numpy==1.13.0.dev0+b297cb7

@jagiella
Author

jagiella commented Jun 7, 2017

@lamblin you were right. It was indeed related to the numpy version. I removed the dev-version and installed the default version with pip:

numpy==1.12.1

Problem solved!

Nevertheless, when the default version of numpy is updated to 1.13, the problem might reoccur!

@jagiella jagiella closed this as completed Jun 7, 2017
@lamblin
Member

lamblin commented Jun 9, 2017

I think it is more a difference between released versions of NumPy and development ones.
In any case, we should still handle that case correctly, so I'm reopening.
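
Until that case is handled, a script could at least detect a development build of NumPy up front. This is a minimal sketch, assuming development builds carry a `.dev` segment in their version string, as the one reported in this thread does. `is_dev_build` is a hypothetical helper; in practice it would be called with `numpy.__version__`.

```python
def is_dev_build(version):
    """True if a version string such as '1.13.0.dev0+b297cb7' marks a dev build."""
    return ".dev" in version


# The two NumPy versions mentioned in this thread.
for version in ("1.13.0.dev0+b297cb7", "1.12.1"):
    print(version, is_dev_build(version))
```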

@lamblin lamblin reopened this Jun 9, 2017
@nikolay-256

nikolay-256 commented Jun 19, 2017

@jagiella
I installed numpy==1.12.1, but the problem is not solved. I still get this notice:
/python3.6/site-packages/theano/gpuarray/dnn.py:135: UserWarning: Your cuDNN version is more recent than Theano. If you encounter problems, try updating Theano or downgrading cuDNN to version 5.1.
But it is still 3 times faster than the CPU.
How much faster is it for you, without this notice?

@nouiz
Member

nouiz commented Jun 20, 2017 via email
