BlockGrad Bug #4731

sxjscience · 2017-01-19T14:28:30Z

Environment info

Operating System: Windows

Compiler: Visual Studio Community 2015

Package used (Python/R/Scala/Julia): Python

MXNet commit hash (git rev-parse HEAD): 949300d

Error Message:

MXNetError: [22:22:22] src/executor/graph_executor.cc:511: Check failed: storage_id >= 0 (-1 vs. 0) Do not support runtime shape op yet

Stack trace returned 43 entries:
[bt] (0) /home/data/xingjian/mxnet/python/mxnet/../../lib/libmxnet.so(_ZN4dmlc15LogMessageFatalD1Ev+0x29) [0x7f8ec2c30f69]
[bt] (1) /home/data/xingjian/mxnet/python/mxnet/../../lib/libmxnet.so(_ZN5mxnet4exec13GraphExecutor19InitDataEntryMemoryERKSt6vectorINS_7NDArrayESaIS3_EE+0x267b) [0x7f8ec375b82b]
[bt] (2) /home/data/xingjian/mxnet/python/mxnet/../../lib/libmxnet.so(_ZN5mxnet4exec13GraphExecutor4InitEN4nnvm6SymbolERKNS_7ContextERKSt3mapISsS4_St4lessISsESaISt4pairIKSsS4_EEERKSt6vectorINS_7NDArrayESaISI_EESM_RKSH_INS_9OpReqTypeESaISN_EESM_PNS_8ExecutorE+0x444) [0x7f8ec3760c04]
[bt] (3) /home/data/xingjian/mxnet/python/mxnet/../../lib/libmxnet.so(_ZN5mxnet8Executor4BindEN4nnvm6SymbolERKNS_7ContextERKSt3mapISsS3_St4lessISsESaISt4pairIKSsS3_EEERKSt6vectorINS_7NDArrayESaISH_EESL_RKSG_INS_9OpReqTypeESaISM_EESL_PS0_+0x4f5) [0x7f8ec3761165]
[bt] (4) /home/data/xingjian/mxnet/python/mxnet/../../lib/libmxnet.so(MXExecutorBindEX+0xf99) [0x7f8ec371dfe9]
[bt] (5) /usr/local/software/python2/lib/python2.7/lib-dynload/_ctypes.so(ffi_call_unix64+0x4c) [0x7f8f4fd9a080]
[bt] (6) /usr/local/software/python2/lib/python2.7/lib-dynload/_ctypes.so(ffi_call+0x148) [0x7f8f4fd991e8]
[bt] (7) /usr/local/software/python2/lib/python2.7/lib-dynload/_ctypes.so(_ctypes_callproc+0x292) [0x7f8f4fd90df2]
[bt] (8) /usr/local/software/python2/lib/python2.7/lib-dynload/_ctypes.so(+0x9ce4) [0x7f8f4fd87ce4]
[bt] (9) /usr/local/lib/libpython2.7.so.1.0(PyObject_Call+0x43) [0x7f8f5c04c5f3]
[bt] (10) /usr/local/lib/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x3b76) [0x7f8f5c100a66]
[bt] (11) /usr/local/lib/libpython2.7.so.1.0(PyEval_EvalCodeEx+0x830) [0x7f8f5c103d20]
[bt] (12) /usr/local/lib/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x5907) [0x7f8f5c1027f7]
[bt] (13) /usr/local/lib/libpython2.7.so.1.0(PyEval_EvalCodeEx+0x830) [0x7f8f5c103d20]
[bt] (14) /usr/local/lib/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x5907) [0x7f8f5c1027f7]
[bt] (15) /usr/local/lib/libpython2.7.so.1.0(PyEval_EvalCodeEx+0x830) [0x7f8f5c103d20]
[bt] (16) /usr/local/lib/libpython2.7.so.1.0(PyEval_EvalCode+0x19) [0x7f8f5c103e49]
[bt] (17) /usr/local/lib/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x58af) [0x7f8f5c10279f]
[bt] (18) /usr/local/lib/libpython2.7.so.1.0(PyEval_EvalCodeEx+0x830) [0x7f8f5c103d20]
[bt] (19) /usr/local/lib/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x5907) [0x7f8f5c1027f7]
[bt] (20) /usr/local/lib/libpython2.7.so.1.0(PyEval_EvalCodeEx+0x830) [0x7f8f5c103d20]
[bt] (21) /usr/local/lib/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x5907) [0x7f8f5c1027f7]
[bt] (22) /usr/local/lib/libpython2.7.so.1.0(PyEval_EvalCodeEx+0x830) [0x7f8f5c103d20]
[bt] (23) /usr/local/lib/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x5907) [0x7f8f5c1027f7]
[bt] (24) /usr/local/lib/libpython2.7.so.1.0(PyEval_EvalCodeEx+0x830) [0x7f8f5c103d20]
[bt] (25) /usr/local/lib/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x5907) [0x7f8f5c1027f7]
[bt] (26) /usr/local/lib/libpython2.7.so.1.0(PyEval_EvalCodeEx+0x830) [0x7f8f5c103d20]
[bt] (27) /usr/local/lib/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x5907) [0x7f8f5c1027f7]
[bt] (28) /usr/local/lib/libpython2.7.so.1.0(PyEval_EvalCodeEx+0x830) [0x7f8f5c103d20]
[bt] (29) /usr/local/lib/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x5907) [0x7f8f5c1027f7]
[bt] (30) /usr/local/lib/libpython2.7.so.1.0(PyEval_EvalCodeEx+0x830) [0x7f8f5c103d20]
[bt] (31) /usr/local/lib/libpython2.7.so.1.0(+0xc3095) [0x7f8f5c07e095]
[bt] (32) /usr/local/lib/libpython2.7.so.1.0(PyObject_Call+0x43) [0x7f8f5c04c5f3]
[bt] (33) /usr/local/lib/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x11eb) [0x7f8f5c0fe0db]
[bt] (34) /usr/local/lib/libpython2.7.so.1.0(PyEval_EvalCodeEx+0x830) [0x7f8f5c103d20]
[bt] (35) /usr/local/lib/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x5907) [0x7f8f5c1027f7]
[bt] (36) /usr/local/lib/libpython2.7.so.1.0(PyEval_EvalCodeEx+0x830) [0x7f8f5c103d20]
[bt] (37) /usr/local/lib/libpython2.7.so.1.0(PyEval_EvalCode+0x19) [0x7f8f5c103e49]
[bt] (38) /usr/local/lib/libpython2.7.so.1.0(PyRun_FileExFlags+0x8a) [0x7f8f5c127aca]
[bt] (39) /usr/local/lib/libpython2.7.so.1.0(PyRun_SimpleFileExFlags+0xd7) [0x7f8f5c129057]
[bt] (40) /usr/local/lib/libpython2.7.so.1.0(Py_Main+0xc25) [0x7f8f5c13ef35]
[bt] (41) /lib64/libc.so.6(__libc_start_main+0xf5) [0x7f8f58f45b35]
[bt] (42) /usr/local/bin/python2() [0x4007a1]

Minimum reproducible example

import mxnet as mx
a = mx.sym.Variable('a')
# Wrapping a scalar multiplication in BlockGrad triggers the error at bind time.
b = mx.sym.BlockGrad(2*a)
exe = b.simple_bind(ctx=mx.cpu(), a=(10,10))

What have you tried to solve it?

I ran into this problem while trying to refactor the FGradient code for BlockGrad. The following code works correctly, while the code above raises an error.

import mxnet as mx
a = mx.sym.Variable('a')
# Same graph with a+a in place of 2*a; this version binds without error.
b = mx.sym.BlockGrad(a+a)
exe = b.simple_bind(ctx=mx.cpu(), a=(10,10))


piiswrong · 2017-01-19T20:58:41Z

@tqchen

tqchen · 2017-01-20T16:41:22Z

This happens because the gradient is zeros, and it is a lonely zeros node with no connection to any other node, so shape inference fails.

Fixing this would be a good exercise in hacking on the NNVM gradient module. Please see if you can attempt a fix.

There are two ways:

1. Make BlockGrad always return zeros_like, which carries a shape constraint.

2. Insert a shape-hint identity, as is done at terminal leaves, in the hope that backward shape inference kicks in.
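To illustrate the diagnosis and the first option, here is a toy forward shape-inference pass in plain Python. It is not NNVM code: the `Node` class, the `infer_shapes` function, and the way ops are modeled are all hypothetical stand-ins. It only sketches why a zeros node with no inputs and no shape attribute cannot be resolved, while a zeros_like node can take its shape from its input.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

Shape = Tuple[int, ...]


@dataclass
class Node:
    """Toy graph node (hypothetical; not an NNVM data structure)."""
    name: str
    op: str                                   # "var", "zeros", or "zeros_like"
    inputs: List[str] = field(default_factory=list)
    shape: Optional[Shape] = None             # shape known up front, if any


def infer_shapes(nodes: List[Node]) -> Dict[str, Optional[Shape]]:
    """Forward-only shape inference over nodes listed in topological order."""
    shapes: Dict[str, Optional[Shape]] = {}
    for node in nodes:
        if node.shape is not None:
            # The shape was supplied by the user (e.g. a bound variable).
            shapes[node.name] = node.shape
        elif node.op == "zeros_like":
            # zeros_like is constrained by its input, so the shape propagates.
            shapes[node.name] = shapes.get(node.inputs[0])
        else:
            # A zeros node with no inputs and no shape has nothing to infer from.
            shapes[node.name] = None
    return shapes


a = Node("a", "var", shape=(10, 10))
lonely = Node("grad_lonely", "zeros")                      # disconnected zeros
linked = Node("grad_linked", "zeros_like", inputs=["a"])   # tied to a's shape
result = infer_shapes([a, lonely, linked])
print(result)
```

In this sketch the disconnected zeros node ends up with an unknown shape, which is the analogue of the failed storage_id check, while the zeros_like node resolves to the shape of `a`.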

piiswrong · 2017-01-20T16:59:14Z

zeros_like isn't recognized in gradient aggregation. It also doesn't work for a dangling output from slice.

tqchen · 2017-01-20T17:06:46Z

Recognition of zeros_like can be added to gradient aggregation, which is not a big issue.

Dangling output is a separate issue that zeros also suffers from, so I think it is beyond the scope of this issue.
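As a rough sketch of what such recognition could look like, the toy aggregation below drops gradient entries produced by known-zero ops before summing. This is an assumption about the general idea, not the actual NNVM gradient pass: `ZERO_OPS`, `aggregate`, and the `(op, node)` tuples are hypothetical names used only for illustration.

```python
from typing import List, Tuple

# Ops whose output is known to be all zeros (hypothetical set for this sketch).
ZERO_OPS = {"zeros", "zeros_like"}

GradEntry = Tuple[str, str]  # (producing op name, node name)


def aggregate(grads: List[GradEntry]) -> List[GradEntry]:
    """Return the gradient entries that actually need to be summed.

    Entries produced by a known-zero op contribute nothing to the sum and
    are dropped. If every entry is zero, one is kept so that the aggregated
    gradient still has a producer.
    """
    nonzero = [g for g in grads if g[0] not in ZERO_OPS]
    if nonzero:
        return nonzero
    return grads[:1]


mixed = aggregate([("zeros_like", "g0"), ("_mul_scalar", "g1"), ("zeros", "g2")])
all_zero = aggregate([("zeros_like", "g0"), ("zeros", "g2")])
print(mixed)
print(all_zero)
```

The point of the sketch is only that zeros_like would be treated the same way as zeros when deciding which contributions to sum.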

phunterlau · 2017-09-28T17:30:00Z

This issue is closed due to lack of activity in the last 90 days. Feel free to reopen if this is still an active issue. Thanks!

phunterlau closed this as completed Sep 28, 2017
