ValueError: total size of new array must be unchanged #2270

Closed
cancan101 opened this issue Nov 19, 2014 · 16 comments
@cancan101

Sometime between 52cb8ec and d8ffecc I started getting:

Traceback (most recent call last):
    data = yaml_parse.load(yaml_str)
  File "/home/ubuntu/git/pylearn2/pylearn2/config/yaml_parse.py", line 337, in load
    return _instantiate(proxy_graph)
  File "/home/ubuntu/git/pylearn2/pylearn2/config/yaml_parse.py", line 280, in _instantiate
    return _instantiate_proxy_tuple(proxy, bindings)
  File "/home/ubuntu/git/pylearn2/pylearn2/config/yaml_parse.py", line 230, in _instantiate_proxy_tuple
    obj = checked_call(proxy.callable, kwargs)
  File "/home/ubuntu/git/pylearn2/pylearn2/utils/call_check.py", line 99, in checked_call
    return to_call(**kwargs)
  File "/home/ubuntu/git/pylearn2/pylearn2/models/mlp.py", line 490, in __init__
    self._update_layer_input_spaces()
  File "/home/ubuntu/git/pylearn2/pylearn2/models/mlp.py", line 565, in _update_layer_input_spaces
    layers[i].set_input_space(layers[i-1].get_output_space())
  File "/home/ubuntu/git/pylearn2/pylearn2/models/mlp.py", line 3305, in set_input_space
    self.initialize_output_space()
  File "/home/ubuntu/git/pylearn2/pylearn2/models/mlp.py", line 3245, in initialize_output_space
    dummy_p = dummy_p.eval()
  File "/usr/local/lib/python2.7/dist-packages/Theano-0.6.0-py2.7.egg/theano/gof/graph.py", line 431, in eval
    rval = self._fn_cache[inputs](*args)
  File "/usr/local/lib/python2.7/dist-packages/Theano-0.6.0-py2.7.egg/theano/compile/function_module.py", line 605, in __call__
    self.fn.thunks[self.fn.position_of_error])
  File "/usr/local/lib/python2.7/dist-packages/Theano-0.6.0-py2.7.egg/theano/compile/function_module.py", line 595, in __call__
    outputs = self.fn()
  File "/usr/local/lib/python2.7/dist-packages/Theano-0.6.0-py2.7.egg/theano/gof/op.py", line 753, in rval
    r = p(n, [x[0] for x in i], o)
  File "/usr/local/lib/python2.7/dist-packages/Theano-0.6.0-py2.7.egg/theano/sandbox/cuda/basic_ops.py", line 2347, in perform
    raise ValueError("total size of new array must be unchanged")
ValueError: total size of new array must be unchanged
Apply node that caused the error: GpuReshape{4}(GpuDnnPool.0, Join.0)
Inputs types: [CudaNdarrayType(float32, (False, True, False, False)), TensorType(int64, vector)]
Inputs shapes: [(32, 1, 73, 98), (4,)]
Inputs strides: [(7154, 0, 98, 1), (8,)]
Inputs scalar values: ['not scalar', 'not scalar']
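The invariant that this op enforces can be illustrated in plain NumPy (the mismatched target shape below is hypothetical, chosen just to trigger the same check; in the real graph the 4-element shape vector computed for the pooled output was wrong):

```python
import numpy as np

# The failing GpuReshape node receives the pooled output, shape (32, 1, 73, 98),
# and a 4-element target-shape vector. A reshape is only valid when the total
# element count is unchanged; NumPy enforces the same invariant:
pooled = np.zeros((32, 1, 73, 98), dtype=np.float32)

bad_target = (32, 7, 73, 98)  # hypothetical shape with a different total size

try:
    pooled.reshape(bad_target)
except ValueError as err:
    print("reshape rejected:", err)
```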
@nouiz
Member

nouiz commented Nov 19, 2014

I got this report from someone else, but I'm not able to reproduce it. To help
debug this, can you provide a script that generates this error?

@cancan101
Author

from pylearn2.config import yaml_parse
model_yl = """
!obj:pylearn2.models.mlp.MLP {input_space: !obj:pylearn2.space.Conv2DSpace {shape: [
      300, 400], num_channels: 1}, layers: [!obj:pylearn2.models.mlp.ConvRectifiedLinear {
      layer_name: 'h0', output_channels: 7, irange: .05, kernel_shape: [3, 3], pool_shape: [
        2, 2], pool_stride: [2, 2], max_kernel_norm: 1.9365}, !obj:pylearn2.models.mlp.ConvRectifiedLinear {
      layer_name: 'h1', output_channels: 16, irange: .05, kernel_shape: [3, 3], pool_shape: [
        2, 2], pool_stride: [2, 2], max_kernel_norm: 1.9365}, !obj:pylearn2.models.mlp.ConvRectifiedLinear {
      layer_name: 'h2', output_channels: 32, irange: .05, kernel_shape: [3, 3], pool_shape: [
        2, 2], pool_stride: [2, 2], max_kernel_norm: 1.9365, init_bias: 0.1}, !obj:pylearn2.models.maxout.Maxout {
      layer_name: 'h6', num_units: 57, num_pieces: 2, irange: 0.0575059125935, max_col_norm: 2.0,
      init_bias: 1.}, !obj:pylearn2.models.mlp.Softmax {max_col_norm: 3.9365, layer_name: 'y',
      n_classes: 2, irange: 0.005000}]}
"""
yaml_parse.load(model_yl)

@nouiz
Member

nouiz commented Nov 19, 2014

This does not reproduce the problem here. Can you tell me which versions of
Theano and pylearn2 you are using? Can you update pylearn2 and check whether it
works for you?

@cancan101
Author

I am seeing this issue on pylearn2 lisa-lab/pylearn2@a23672b (master) and Theano 4c513ba (master).

@nouiz
Member

nouiz commented Nov 19, 2014

I was missing that the Theano flags need to be floatX=float32 and
device=gpu. I'm checking what is going on.

@GeertLitjens

In response to your replies regarding changing type.py in theano.gof, I got the following. I'm guessing this does not help you much further:

Input shape: (128, 128)
Using gpu device 0: GeForce GTX TITAN Black
Detector space: (124, 124)
Traceback (most recent call last):

File "", line 1, in
runfile('D:/Experiments/Papers/PatholCAD/scripts/trainConvolutionNetwork.py', wdir='D:/Experiments/Papers/PatholCAD/scripts')

File "C:\Python27\lib\site-packages\spyderlib\widgets\externalshell\sitecustomize.py", line 586, in runfile
execfile(filename, namespace)

File "D:/Experiments/Papers/PatholCAD/scripts/trainConvolutionNetwork.py", line 14, in
train = yaml_parse.load(train)

File "D:/Code/thirdpartylibs/pylearn2\pylearn2\config\yaml_parse.py", line 338, in load
return _instantiate(proxy_graph)

File "D:/Code/thirdpartylibs/pylearn2\pylearn2\config\yaml_parse.py", line 281, in _instantiate
return _instantiate_proxy_tuple(proxy, bindings)

File "D:/Code/thirdpartylibs/pylearn2\pylearn2\config\yaml_parse.py", line 230, in _instantiate_proxy_tuple
for k, v in six.iteritems(proxy.keywords))

File "D:/Code/thirdpartylibs/pylearn2\pylearn2\config\yaml_parse.py", line 230, in
for k, v in six.iteritems(proxy.keywords))

File "D:/Code/thirdpartylibs/pylearn2\pylearn2\config\yaml_parse.py", line 281, in _instantiate
return _instantiate_proxy_tuple(proxy, bindings)

File "D:/Code/thirdpartylibs/pylearn2\pylearn2\config\yaml_parse.py", line 231, in _instantiate_proxy_tuple
obj = checked_call(proxy.callable, kwargs)

File "D:/Code/thirdpartylibs/pylearn2\pylearn2\utils\call_check.py", line 99, in checked_call
return to_call(**kwargs)

File "D:/Code/thirdpartylibs/pylearn2\pylearn2\models\mlp.py", line 464, in __init__
self._update_layer_input_spaces()

File "D:/Code/thirdpartylibs/pylearn2\pylearn2\models\mlp.py", line 529, in _update_layer_input_spaces
layers[0].set_input_space(self.get_input_space())

File "D:/Code/thirdpartylibs/pylearn2\pylearn2\sandbox\rnn\models\mlp_hook.py", line 334, in outer
return set_input_space(self, input_space)

File "D:/Code/thirdpartylibs/pylearn2\pylearn2\sandbox\rnn\models\mlp_hook.py", line 334, in outer
return set_input_space(self, input_space)

File "D:/Code/thirdpartylibs/pylearn2\pylearn2\models\mlp.py", line 3066, in set_input_space
self.initialize_output_space()

File "D:/Code/thirdpartylibs/pylearn2\pylearn2\models\mlp.py", line 3006, in initialize_output_space
dummy_p = dummy_p.eval()

File "C:\Python27\lib\site-packages\theano-0.6.0-py2.7.egg\theano\gof\graph.py", line 431, in eval
rval = self._fn_cache[inputs](*args)

File "C:\Python27\lib\site-packages\theano-0.6.0-py2.7.egg\theano\compile\function_module.py", line 595, in __call__
outputs = self.fn()

File "C:\Python27\lib\site-packages\theano-0.6.0-py2.7.egg\theano\compile\debugmode.py", line 2096, in deco
return f()

File "C:\Python27\lib\site-packages\theano-0.6.0-py2.7.egg\theano\compile\debugmode.py", line 1806, in f
storage_map[r][0] = _lessbroken_deepcopy(r_vals[r])

File "C:\Python27\lib\site-packages\theano-0.6.0-py2.7.egg\theano\compile\debugmode.py", line 880, in _lessbroken_deepcopy
rval = copy.deepcopy(a)

File "C:\Python27\lib\copy.py", line 189, in deepcopy
"un(deep)copyable object of type %s" % cls)

Error: un(deep)copyable object of type <type 'PyCObject'>

My code is similar to that of cancan101, so you should get the same issue with that. If there is anything else I can do to help, let me know.

Kind regards,

Geert

@nouiz
Member

nouiz commented Nov 19, 2014

I have a partial fix in a branch:

https://github.com/nouiz/Theano/tree/mixed

It fixes the crash, but now some tests fail, so I need to fix them. With this branch, your example no longer crashes.

@GeertLitjens

Hi Fred,

After applying your fix it works up until right after epoch 0; as soon as it starts epoch 1 I get:

....
valid_y_nll: 0.693147003651
valid_y_row_norms_max: 0.0427022613585
valid_y_row_norms_mean: 0.0127967474982
valid_y_row_norms_min: 0.000618629215751
Traceback (most recent call last):

File "", line 1, in
runfile('D:/Experiments/Papers/PatholCAD/scripts/trainConvolutionNetwork.py', wdir='D:/Experiments/Papers/PatholCAD/scripts')

File "C:\Python27\lib\site-packages\spyderlib\widgets\externalshell\sitecustomize.py", line 586, in runfile
execfile(filename, namespace)

File "D:/Experiments/Papers/PatholCAD/scripts/trainConvolutionNetwork.py", line 15, in
train.main_loop()

File "D:/Code/thirdpartylibs/pylearn2\pylearn2\train.py", line 207, in main_loop
rval = self.algorithm.train(dataset=self.dataset)

File "D:/Code/thirdpartylibs/pylearn2\pylearn2\training_algorithms\sgd.py", line 453, in train
self.sgd_update(*batch)

File "C:\Python27\lib\site-packages\theano-0.6.0-py2.7.egg\theano\compile\function_module.py", line 605, in __call__
self.fn.thunks[self.fn.position_of_error])

File "C:\Python27\lib\site-packages\theano-0.6.0-py2.7.egg\theano\compile\function_module.py", line 595, in __call__
outputs = self.fn()

RuntimeError: GpuDnnPoolGrad: error doing operation: An incorrect value was passed in.
Apply node that caused the error: GpuDnnPoolGrad(GpuContiguous.0, GpuContiguous.0, GpuContiguous.0, GpuDnnPoolDesc{ws=(2, 2), stride=(2, 2), mode='max'}.0)
Inputs types: [CudaNdarrayType(float32, (False, True, False, False)), CudaNdarrayType(float32, (False, True, False, False)), CudaNdarrayType(float32, (False, True, False, False)), <theano.gof.type.CDataType object at 0x000000002311D668>]
Inputs shapes: [(8192, 1, 27, 27), (8192, 1, 14, 14), (8192, 1, 14, 14), 'No shapes']
Inputs strides: [(729, 0, 27, 1), (196, 0, 14, 1), (196, 0, 14, 1), 'No strides']
Inputs values: ['not shown', 'not shown', 'not shown', <PyCObject object at 0x000000002E43C1C0>]

HINT: Re-running with most Theano optimization disabled could give you a back-trace of when this node was created. This can be done with by setting the Theano flag 'optimizer=fast_compile'. If that does not work, Theano optimizations can be disabled with 'optimizer=None'.
HINT: Use the Theano flag 'exception_verbosity=high' for a debugprint of this apply node.

@nouiz
Member

nouiz commented Nov 20, 2014

Thanks for the info. The example I had didn't cover this case. This means
the grad of the pooling has the same problem. I'll look into it today.

A workaround is to not use the cuDNN pooling for now. This can be done with
the Theano flag: optimizer_excluding=local_pool_dnn
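A sketch of how that workaround flag can be applied from Python (Theano reads THEANO_FLAGS from the environment once, at import time, so it has to be set before the import):

```python
import os

# Excluding the local_pool_dnn optimization keeps Theano from substituting
# the (currently buggy) cuDNN pooling ops into the graph.
os.environ["THEANO_FLAGS"] = "device=gpu,floatX=float32,optimizer_excluding=local_pool_dnn"

# import theano  # must come after THEANO_FLAGS is set
```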

@nouiz
Member

nouiz commented Nov 21, 2014

Just to note that yesterday we disabled this functionality in Theano, so if
you update Theano, it won't use the problematic code.

@cancan101
Author

Okay. Hopefully you are able to turn the example I made into a unit test.

@nouiz
Member

nouiz commented Nov 21, 2014

We can't use pylearn2 stuff in Theano, but I'll make sure to cover the
underlying problem in tests.

@cancan101
Author

Understood on pylearn2. Wouldn't want circular dependencies.

@nouiz
Member

nouiz commented Nov 24, 2014

We re-enabled this functionality in gh-2281.

The consequence of the bug could be a crash, or that the border was ignored when we were asked not to ignore it.
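The border behaviour in question can be sketched with a toy 1-D max pooling (an illustration of the output-shape arithmetic only, not Theano's actual implementation): with window 2 and stride 2 over 5 elements, ignoring the border drops the trailing partial window.

```python
import numpy as np

def max_pool_1d(x, ws, stride, ignore_border):
    """Toy 1-D max pooling. With ignore_border=False, a trailing partial
    window at the edge of the input also produces an output value."""
    n = len(x)
    last_start = n - ws if ignore_border else n - 1
    starts = range(0, last_start + 1, stride)
    return np.array([x[s:s + ws].max() for s in starts])

x = np.array([1, 2, 3, 4, 5])
print(max_pool_1d(x, 2, 2, ignore_border=True))   # windows [1,2], [3,4]
print(max_pool_1d(x, 2, 2, ignore_border=False))  # plus the partial window [5]
```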

@nouiz nouiz closed this as completed Nov 24, 2014
@cancan101
Author

I think with master I am now seeing:

ERROR (theano.gof.opt): Optimization failure due to: LocalOptGroup(local_conv_dnn,local_conv_gemm)
ERROR (theano.gof.opt): TRACEBACK:
ERROR (theano.gof.opt): Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/Theano-0.6.0-py2.7.egg/theano/gof/opt.py", line 1524, in process_node
    fgraph.replace_all_validate(repl_pairs, reason=lopt)
  File "/usr/local/lib/python2.7/dist-packages/Theano-0.6.0-py2.7.egg/theano/gof/toolbox.py", line 258, in replace_all_validate
    fgraph.replace(r, new_r, reason=reason, verbose=False)
  File "/usr/local/lib/python2.7/dist-packages/Theano-0.6.0-py2.7.egg/theano/gof/fg.py", line 467, in replace
    raise TypeError("The type of the replacement must be the same as the type of the original Variable.", r, new_r, r.type, new_r.type, str(reason))
TypeError: ('The type of the replacement must be the same as the type of the original Variable.', GpuConv{valid, (1, 1), None, (148, 198), False, (None, 150, 200), (148, 198)}.0, GpuFromHost.0, CudaNdarrayType(float32, (True, False, False, False)), CudaNdarrayType(float32, 4D), 'LocalOptGroup(local_conv_dnn,local_conv_gemm)')

@cancan101
Author

And here is the Python code that replicates the issue:

from pylearn2.config import yaml_parse
import numpy as np
from pylearn2.datasets.dense_design_matrix import DenseDesignMatrix

y_labels = 4
labels = [0] * 100
y = np.array(labels)[:,np.newaxis]
y[0] = 1
y[0] = 2
y[0] = 3

train_set = DenseDesignMatrix(topo_view=np.zeros((100,300,400,1)), y=y,  y_labels=y_labels)
valid = DenseDesignMatrix(topo_view=np.zeros((100,300,400,1)), y=y,  y_labels=y_labels)

model_yl = """
!obj:pylearn2.models.mlp.MLP {input_space: !obj:pylearn2.space.Conv2DSpace {shape: [
      300, 400], num_channels: 1}, layers: [!obj:pylearn2.models.mlp.ConvRectifiedLinear {
      layer_name: 'h0', output_channels: 7, irange: .05, kernel_shape: [3, 3], pool_shape: [
        2, 2], pool_stride: [2, 2], max_kernel_norm: 1.9365}, !obj:pylearn2.models.mlp.ConvRectifiedLinear {
      layer_name: 'h1', output_channels: 16, irange: .05, kernel_shape: [3, 3], pool_shape: [
        2, 2], pool_stride: [2, 2], max_kernel_norm: 1.9365}, !obj:pylearn2.models.mlp.ConvRectifiedLinear {
      layer_name: 'h2', output_channels: 32, irange: .05, kernel_shape: [3, 3], pool_shape: [
        2, 2], pool_stride: [2, 2], max_kernel_norm: 1.9365, init_bias: 0.1}, !obj:pylearn2.models.maxout.Maxout {
      layer_name: 'h6', num_units: 57, num_pieces: 2, irange: 0.0575059125935, max_col_norm: 2.0,
      init_bias: 1.}, !obj:pylearn2.models.mlp.Softmax {max_col_norm: 3.9365, layer_name: 'y',
      n_classes: 2, irange: 0.005000}]}
"""

model = yaml_parse.load(model_yl)

train_yml = """
!obj:pylearn2.train.Train {dataset: !import '__main__.train_set',
  model: null, algorithm: !obj:pylearn2.training_algorithms.sgd.SGD {batch_size: 10,
    learning_rate: 0.059662, learning_rule: !obj:pylearn2.training_algorithms.learning_rule.Momentum {
      init_momentum: 0.15}, cost: !obj:pylearn2.costs.cost.SumOfCosts {costs: [!obj:pylearn2.models.mlp.WeightDecay {
          coeffs: [0.0, 0.0, 0.0, 0.0, 0.012569075841283767]}, !obj:pylearn2.costs.mlp.dropout.Dropout {
          input_include_probs: {'h0': 1, 'h1': 1, 'h2': 1}, input_scales: {'h0': 1,
            'h1': 1, 'h2': 1}}]}, monitor_iteration_mode: "even_shuffled_sequential",
    train_iteration_mode: "even_shuffled_sequential", monitoring_dataset: {'valid': !import '__main__.valid'},
    termination_criterion: !obj:pylearn2.termination_criteria.And {criteria: [!obj:pylearn2.termination_criteria.MonitorBased {
          channel_name: "valid_y_misclass", prop_decrease: 0.01, N: 30}, !obj:pylearn2.termination_criteria.EpochCounter {
          max_epochs: 50, new_epochs: false}]}}, extensions: [
 !obj:pylearn2.training_algorithms.learning_rule.MomentumAdjustor {
      start: 1, saturate: 40, final_momentum: 0.60}, !obj:pylearn2.training_algorithms.sgd.LinearDecayOverEpoch {
      start: 20, saturate: 100, decay_factor: 0.1}
], 
}
"""
train = yaml_parse.load(train_yml)
train.model = model
train.main_loop()
