This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

BlockGrad Bug #4731

Closed
sxjscience opened this issue Jan 19, 2017 · 5 comments

Comments

@sxjscience
Member

sxjscience commented Jan 19, 2017

Environment info

Operating System: Windows

Compiler: Visual Studio Community 2015

Package used (Python/R/Scala/Julia): Python

MXNet commit hash (git rev-parse HEAD): 949300d

Error Message:

MXNetError: [22:22:22] src/executor/graph_executor.cc:511: Check failed: storage_id >= 0 (-1 vs. 0) Do not support runtime shape op yet

Stack trace returned 43 entries:
[bt] (0) /home/data/xingjian/mxnet/python/mxnet/../../lib/libmxnet.so(_ZN4dmlc15LogMessageFatalD1Ev+0x29) [0x7f8ec2c30f69]
[bt] (1) /home/data/xingjian/mxnet/python/mxnet/../../lib/libmxnet.so(_ZN5mxnet4exec13GraphExecutor19InitDataEntryMemoryERKSt6vectorINS_7NDArrayESaIS3_EE+0x267b) [0x7f8ec375b82b]
[bt] (2) /home/data/xingjian/mxnet/python/mxnet/../../lib/libmxnet.so(_ZN5mxnet4exec13GraphExecutor4InitEN4nnvm6SymbolERKNS_7ContextERKSt3mapISsS4_St4lessISsESaISt4pairIKSsS4_EEERKSt6vectorINS_7NDArrayESaISI_EESM_RKSH_INS_9OpReqTypeESaISN_EESM_PNS_8ExecutorE+0x444) [0x7f8ec3760c04]
[bt] (3) /home/data/xingjian/mxnet/python/mxnet/../../lib/libmxnet.so(_ZN5mxnet8Executor4BindEN4nnvm6SymbolERKNS_7ContextERKSt3mapISsS3_St4lessISsESaISt4pairIKSsS3_EEERKSt6vectorINS_7NDArrayESaISH_EESL_RKSG_INS_9OpReqTypeESaISM_EESL_PS0_+0x4f5) [0x7f8ec3761165]
[bt] (4) /home/data/xingjian/mxnet/python/mxnet/../../lib/libmxnet.so(MXExecutorBindEX+0xf99) [0x7f8ec371dfe9]
[bt] (5) /usr/local/software/python2/lib/python2.7/lib-dynload/_ctypes.so(ffi_call_unix64+0x4c) [0x7f8f4fd9a080]
[bt] (6) /usr/local/software/python2/lib/python2.7/lib-dynload/_ctypes.so(ffi_call+0x148) [0x7f8f4fd991e8]
[bt] (7) /usr/local/software/python2/lib/python2.7/lib-dynload/_ctypes.so(_ctypes_callproc+0x292) [0x7f8f4fd90df2]
[bt] (8) /usr/local/software/python2/lib/python2.7/lib-dynload/_ctypes.so(+0x9ce4) [0x7f8f4fd87ce4]
[bt] (9) /usr/local/lib/libpython2.7.so.1.0(PyObject_Call+0x43) [0x7f8f5c04c5f3]
[bt] (10) /usr/local/lib/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x3b76) [0x7f8f5c100a66]
[bt] (11) /usr/local/lib/libpython2.7.so.1.0(PyEval_EvalCodeEx+0x830) [0x7f8f5c103d20]
[bt] (12) /usr/local/lib/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x5907) [0x7f8f5c1027f7]
[bt] (13) /usr/local/lib/libpython2.7.so.1.0(PyEval_EvalCodeEx+0x830) [0x7f8f5c103d20]
[bt] (14) /usr/local/lib/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x5907) [0x7f8f5c1027f7]
[bt] (15) /usr/local/lib/libpython2.7.so.1.0(PyEval_EvalCodeEx+0x830) [0x7f8f5c103d20]
[bt] (16) /usr/local/lib/libpython2.7.so.1.0(PyEval_EvalCode+0x19) [0x7f8f5c103e49]
[bt] (17) /usr/local/lib/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x58af) [0x7f8f5c10279f]
[bt] (18) /usr/local/lib/libpython2.7.so.1.0(PyEval_EvalCodeEx+0x830) [0x7f8f5c103d20]
[bt] (19) /usr/local/lib/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x5907) [0x7f8f5c1027f7]
[bt] (20) /usr/local/lib/libpython2.7.so.1.0(PyEval_EvalCodeEx+0x830) [0x7f8f5c103d20]
[bt] (21) /usr/local/lib/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x5907) [0x7f8f5c1027f7]
[bt] (22) /usr/local/lib/libpython2.7.so.1.0(PyEval_EvalCodeEx+0x830) [0x7f8f5c103d20]
[bt] (23) /usr/local/lib/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x5907) [0x7f8f5c1027f7]
[bt] (24) /usr/local/lib/libpython2.7.so.1.0(PyEval_EvalCodeEx+0x830) [0x7f8f5c103d20]
[bt] (25) /usr/local/lib/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x5907) [0x7f8f5c1027f7]
[bt] (26) /usr/local/lib/libpython2.7.so.1.0(PyEval_EvalCodeEx+0x830) [0x7f8f5c103d20]
[bt] (27) /usr/local/lib/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x5907) [0x7f8f5c1027f7]
[bt] (28) /usr/local/lib/libpython2.7.so.1.0(PyEval_EvalCodeEx+0x830) [0x7f8f5c103d20]
[bt] (29) /usr/local/lib/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x5907) [0x7f8f5c1027f7]
[bt] (30) /usr/local/lib/libpython2.7.so.1.0(PyEval_EvalCodeEx+0x830) [0x7f8f5c103d20]
[bt] (31) /usr/local/lib/libpython2.7.so.1.0(+0xc3095) [0x7f8f5c07e095]
[bt] (32) /usr/local/lib/libpython2.7.so.1.0(PyObject_Call+0x43) [0x7f8f5c04c5f3]
[bt] (33) /usr/local/lib/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x11eb) [0x7f8f5c0fe0db]
[bt] (34) /usr/local/lib/libpython2.7.so.1.0(PyEval_EvalCodeEx+0x830) [0x7f8f5c103d20]
[bt] (35) /usr/local/lib/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x5907) [0x7f8f5c1027f7]
[bt] (36) /usr/local/lib/libpython2.7.so.1.0(PyEval_EvalCodeEx+0x830) [0x7f8f5c103d20]
[bt] (37) /usr/local/lib/libpython2.7.so.1.0(PyEval_EvalCode+0x19) [0x7f8f5c103e49]
[bt] (38) /usr/local/lib/libpython2.7.so.1.0(PyRun_FileExFlags+0x8a) [0x7f8f5c127aca]
[bt] (39) /usr/local/lib/libpython2.7.so.1.0(PyRun_SimpleFileExFlags+0xd7) [0x7f8f5c129057]
[bt] (40) /usr/local/lib/libpython2.7.so.1.0(Py_Main+0xc25) [0x7f8f5c13ef35]
[bt] (41) /lib64/libc.so.6(__libc_start_main+0xf5) [0x7f8f58f45b35]
[bt] (42) /usr/local/bin/python2() [0x4007a1]

Minimum reproducible example

import mxnet as mx
a = mx.sym.Variable('a')
b = mx.sym.BlockGrad(2*a)
exe = b.simple_bind(ctx=mx.cpu(), a=(10,10))

What have you tried to solve it?

I ran into this problem while refactoring the FGradient code for BlockGrad. The following code works correctly, while the code above raises an error.

import mxnet as mx
a = mx.sym.Variable('a')
b = mx.sym.BlockGrad(a+a)
exe = b.simple_bind(ctx=mx.cpu(), a=(10,10))

@piiswrong
Contributor

@tqchen

@tqchen
Member

tqchen commented Jan 20, 2017

This happens because the gradient is a zeros node that stands alone, with no connection to any other node, so shape inference fails.

Fixing this would be a good exercise in hacking on the NNVM gradient module. Please see if you can attempt a fix.

There are two ways. The first is to make BlockGrad always return zeros_like, which carries a shape constraint.

The second is to insert a shape-hint identity, as is done at a terminal leaf, in the hope that backward shape inference kicks in.
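
To illustrate the failure mode and the first option, here is a minimal toy sketch of forward shape inference in plain Python. It is not NNVM's implementation: the `Node` class, the op names, and `infer_shape` are all hypothetical, chosen only to show why a zeros node with no inputs has nothing to infer its shape from, while a zeros_like node inherits a shape from its input.

```python
# Toy model of forward shape inference; NOT NNVM's actual implementation.
# Node, the op names, and infer_shape are hypothetical, for illustration only.
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

Shape = Tuple[int, ...]


@dataclass
class Node:
    name: str
    op: str  # one of "var", "zeros", "zeros_like", "elemwise"
    inputs: List["Node"] = field(default_factory=list)


def infer_shape(node: Node, known: Dict[str, Shape]) -> Optional[Shape]:
    """Return the node's shape, or None when nothing constrains it."""
    if node.op == "var":
        return known.get(node.name)
    if node.op == "zeros":
        # No inputs and no shape attribute: there is nothing to infer from.
        return None
    if node.op in ("zeros_like", "elemwise"):
        # The output shape follows the first input whose shape is known.
        for inp in node.inputs:
            shape = infer_shape(inp, known)
            if shape is not None:
                return shape
        return None
    raise ValueError(f"unknown op: {node.op}")


a = Node("a", "var")
two_a = Node("two_a", "elemwise", [a])        # stands in for 2*a
lonely_grad = Node("grad_zeros", "zeros")     # gradient with no inputs
linked_grad = Node("grad_zeros_like", "zeros_like", [two_a])

known = {"a": (10, 10)}
print(infer_shape(lonely_grad, known))  # None
print(infer_shape(linked_grad, known))  # (10, 10)
```

In this toy, the lonely zeros node stays shapeless no matter what is known about `a`, which mirrors the unallocated storage reported in the error above.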

@piiswrong
Contributor

zeros_like isn't recognized in gradient aggregation. It also doesn't work for dangling outputs from slice.

@tqchen
Member

tqchen commented Jan 20, 2017

Recognition of zeros_like can be added to gradient aggregation; that is not a big issue.

Dangling outputs are a separate issue that zeros also suffers from, so I think that is beyond the scope of this issue.
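
As a rough illustration of the aggregation point, the toy sketch below skips gradient entries whose op is on a list of known-zero ops. It assumes that "recognition" means treating an op as a zero during aggregation; the `(op, name)` representation, the `aggregate` function, and the `zero_ops` parameter are hypothetical and do not reflect the real MXNet or NNVM code.

```python
# Toy sketch of gradient aggregation that skips known-zero gradients.
# The (op, name) entries and the zero_ops parameter are hypothetical.
from typing import List, Sequence, Tuple

GradEntry = Tuple[str, str]  # (op name, node name)


def aggregate(grads: List[GradEntry],
              zero_ops: Sequence[str] = ("zeros",)) -> List[str]:
    """Return the names of the gradient nodes that actually need summing."""
    return [name for op, name in grads if op not in zero_ops]


grads = [("elemwise_add", "g0"), ("zeros", "g1"), ("zeros_like", "g2")]

# With only "zeros" recognized, the zeros_like gradient is still summed.
print(aggregate(grads))  # ['g0', 'g2']

# Once "zeros_like" is also recognized, it is skipped as well.
print(aggregate(grads, zero_ops=("zeros", "zeros_like")))  # ['g0']
```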

@phunterlau
Contributor

This issue is closed due to lack of activity in the last 90 days. Feel free to reopen if this is still an active issue. Thanks!
