This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

csr binary operator bug #7920

Closed
eric-haibin-lin opened this issue Sep 16, 2017 · 2 comments

Comments

@eric-haibin-lin
Member

For bugs or installation issues, please provide the following information.
The more information you provide, the more likely people will be able to help you.

Environment info

Operating System: DeepLearning Ubuntu AMI

Compiler:

Package used (Python/R/Scala/Julia): Python

MXNet version:

Or if installed from source:

MXNet commit hash (git rev-parse HEAD): 9d56db6

If you are using python package, please provide

Python version and distribution:

If you are using R package, please provide

R sessionInfo():

Error Message:

Please paste the full error message, including stack trace.

>>> import mxnet as mx; a = mx.nd.sparse.zeros('csr', (1,1)); b = mx.nd.elemwise_add(a,a)
>>> [22:57:41] /home/ubuntu/upstream-cpu/dmlc-core/include/dmlc/logging.h:308: [22:57:41] src/operator/tensor/./elemwise_binary_op-inl.h:257: Check failed: output.aux_shape(csr::kIndPtr) == lhs.aux_shape(csr::kIndPtr) ((2,) vs. (0,))

Stack trace returned 10 entries:
[bt] (0) /home/ubuntu/upstream-cpu/python/mxnet/../../lib/libmxnet.so(_ZN4dmlc15LogMessageFatalD1Ev+0x3f) [0x7fb0ffb633db]
[bt] (1) /home/ubuntu/upstream-cpu/python/mxnet/../../lib/libmxnet.so(_ZN5mxnet2op16ElemwiseBinaryOp8CsrCsrOpIfllN7mshadow2op4plusEEEvPNS3_6StreamINS3_3cpuEEERKN4nnvm9NodeAttrsERKN$
_9OpContextERKNS_7NDArrayESJ_NS_9OpReqTypeESJ_+0x47a) [0x7fb1004a4688]
[bt] (2) /home/ubuntu/upstream-cpu/python/mxnet/../../lib/libmxnet.so(_ZN5mxnet2op16ElemwiseBinaryOp9ComputeExIN7mshadow3cpuENS3_2op4plusEEEvRKN4nnvm9NodeAttrsERKNS_9OpContextERKSt$
vectorINS_7NDArrayESaISF_EERKSE_INS_9OpReqTypeESaISK_EESJ_+0x989) [0x7fb100478837]
[bt] (3) /home/ubuntu/upstream-cpu/python/mxnet/../../lib/libmxnet.so(_ZNSt17_Function_handlerIFvRKN4nnvm9NodeAttrsERKN5mxnet9OpContextERKSt6vectorINS4_7NDArrayESaIS9_EERKS8_INS4_9$
pReqTypeESaISE_EESD_EPSJ_E9_M_invokeERKSt9_Any_dataS3_S7_SD_SI_SD_+0x91) [0x7fb0ffdc8897]
[bt] (4) /home/ubuntu/upstream-cpu/python/mxnet/../../lib/libmxnet.so(_ZNKSt8functionIFvRKN4nnvm9NodeAttrsERKN5mxnet9OpContextERKSt6vectorINS4_7NDArrayESaIS9_EERKS8_INS4_9OpReqType$
SaISE_EESD_EEclES3_S7_SD_SI_SD_+0xa6) [0x7fb10134bb28]
[bt] (5) /home/ubuntu/upstream-cpu/python/mxnet/../../lib/libmxnet.so(+0x265f37d) [0x7fb10134237d]
[bt] (6) /home/ubuntu/upstream-cpu/python/mxnet/../../lib/libmxnet.so(+0x2663893) [0x7fb101346893]
[bt] (7) /home/ubuntu/upstream-cpu/python/mxnet/../../lib/libmxnet.so(_ZNKSt8functionIFvN5mxnet10RunContextENS0_6engine18CallbackOnCompleteEEEclES1_S3_+0x85) [0x7fb10137ba1b]
[bt] (8) /home/ubuntu/upstream-cpu/python/mxnet/../../lib/libmxnet.so(_ZN5mxnet6engine14ThreadedEngine15ExecuteOprBlockENS_10RunContextEPNS0_8OprBlockE+0x200) [0x7fb101386cb0]
[bt] (9) /home/ubuntu/upstream-cpu/python/mxnet/../../lib/libmxnet.so(_ZN5mxnet6engine23ThreadedEnginePerDevice9CPUWorkerILN4dmlc19ConcurrentQueueTypeE0EEEvNS_7ContextEPNS1_17Threa$
WorkerBlockIXT_EEE+0x4e) [0x7fb10138a076]

[22:57:41] /home/ubuntu/upstream-cpu/dmlc-core/include/dmlc/logging.h:308: [22:57:41] src/engine/./threaded_engine.h:347: [22:57:41] src/operator/tensor/./elemwise_binary_op-inl.h:$
57: Check failed: output.aux_shape(csr::kIndPtr) == lhs.aux_shape(csr::kIndPtr) ((2,) vs. (0,))

Stack trace returned 10 entries:
[bt] (0) /home/ubuntu/upstream-cpu/python/mxnet/../../lib/libmxnet.so(_ZN4dmlc15LogMessageFatalD1Ev+0x3f) [0x7fb0ffb633db]
[bt] (1) /home/ubuntu/upstream-cpu/python/mxnet/../../lib/libmxnet.so(_ZN5mxnet2op16ElemwiseBinaryOp8CsrCsrOpIfllN7mshadow2op4plusEEEvPNS3_6StreamINS3_3cpuEEERKN4nnvm9NodeAttrsERKN$
_9OpContextERKNS_7NDArrayESJ_NS_9OpReqTypeESJ_+0x47a) [0x7fb1004a4688]
[bt] (2) /home/ubuntu/upstream-cpu/python/mxnet/../../lib/libmxnet.so(_ZN5mxnet2op16ElemwiseBinaryOp9ComputeExIN7mshadow3cpuENS3_2op4plusEEEvRKN4nnvm9NodeAttrsERKNS_9OpContextERKSt$
vectorINS_7NDArrayESaISF_EERKSE_INS_9OpReqTypeESaISK_EESJ_+0x989) [0x7fb100478837]
[bt] (3) /home/ubuntu/upstream-cpu/python/mxnet/../../lib/libmxnet.so(_ZNSt17_Function_handlerIFvRKN4nnvm9NodeAttrsERKN5mxnet9OpContextERKSt6vectorINS4_7NDArrayESaIS9_EERKS8_INS4_9OpReqTypeESaISE_EESD_EPSJ_E9_M_invokeERKSt9_Any_dataS3_S7_SD_SI_SD_+0x91) [0x7fb0ffdc8897]
[bt] (4) /home/ubuntu/upstream-cpu/python/mxnet/../../lib/libmxnet.so(_ZNKSt8functionIFvRKN4nnvm9NodeAttrsERKN5mxnet9OpContextERKSt6vectorINS4_7NDArrayESaIS9_EERKS8_INS4_9OpReqTypeESaISE_EESD_EEclES3_S7_SD_SI_SD_+0xa6) [0x7fb10134bb28]
[bt] (5) /home/ubuntu/upstream-cpu/python/mxnet/../../lib/libmxnet.so(+0x265f37d) [0x7fb10134237d]
[bt] (6) /home/ubuntu/upstream-cpu/python/mxnet/../../lib/libmxnet.so(+0x2663893) [0x7fb101346893]
[bt] (7) /home/ubuntu/upstream-cpu/python/mxnet/../../lib/libmxnet.so(_ZNKSt8functionIFvN5mxnet10RunContextENS0_6engine18CallbackOnCompleteEEEclES1_S3_+0x85) [0x7fb10137ba1b]
[bt] (8) /home/ubuntu/upstream-cpu/python/mxnet/../../lib/libmxnet.so(_ZN5mxnet6engine14ThreadedEngine15ExecuteOprBlockENS_10RunContextEPNS0_8OprBlockE+0x200) [0x7fb101386cb0]
[bt] (9) /home/ubuntu/upstream-cpu/python/mxnet/../../lib/libmxnet.so(_ZN5mxnet6engine23ThreadedEnginePerDevice9CPUWorkerILN4dmlc19ConcurrentQueueTypeE0EEEvNS_7ContextEPNS1_17ThreadWorkerBlockIXT_EEE+0x4e) [0x7fb10138a076]

A fatal error occurred in asynchronous engine operation. If you do not know what caused this error, you can try set environment variable MXNET_ENGINE_TYPE to NaiveEngine and run with debugger (i.e. gdb). This will force all operations to be synchronous and backtrace will give you the series of calls that lead to this error. Remember to set MXNET_ENGINE_TYPE back to empty after debugging.
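
The log above suggests rerunning with MXNET_ENGINE_TYPE set to NaiveEngine so the failure surfaces synchronously. A minimal sketch of one way to do that from Python follows. The helper names `naive_engine_env` and `run_repro` are hypothetical, and launching the one-liner from this issue in a child process is an assumption made so the parent shell's environment stays untouched; it only works where the same MXNet build is importable.

```python
import os
import subprocess
import sys

# The one-liner from the issue, unchanged.
REPRO = (
    "import mxnet as mx; "
    "a = mx.nd.sparse.zeros('csr', (1,1)); "
    "b = mx.nd.elemwise_add(a,a)"
)


def naive_engine_env(base=None):
    """Return a copy of the environment with the synchronous engine selected."""
    env = dict(os.environ if base is None else base)
    # Variable name and value are taken from the error message in the log.
    env["MXNET_ENGINE_TYPE"] = "NaiveEngine"
    return env


def run_repro():
    """Run the reproduction in a child interpreter (hypothetical helper).

    Requires an importable mxnet; the parent environment is not modified,
    so nothing needs to be reset after debugging.
    """
    return subprocess.run([sys.executable, "-c", REPRO], env=naive_engine_env())
```

Calling `run_repro()` on an affected build should hit the same check failure, with a backtrace that reflects the actual call sequence rather than the worker thread.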

Minimum reproducible example

if you are using your own code, please provide a short script that reproduces the error.

Steps to reproduce

or if you are running standard examples, please provide the commands you have run that lead to the error.

  1. python
  2. import mxnet as mx; a = mx.nd.sparse.zeros('csr', (1,1)); b = mx.nd.elemwise_add(a,a)

What have you tried to solve it?

@cjolivier01

@cjolivier01
Member

Taking a look...

cjolivier01 pushed a commit to cjolivier01/mxnet that referenced this issue Sep 18, 2017
@cjolivier01
Member

Looks like the problem was introduced in this commit:
0b13631

In the function
void FillZerosCsrImpl(mshadow::Stream *s, NDArray *dst)

Why?
A CSR matrix with m rows, even if it is all zeros, must have m + 1 entries in its row-pointer (csr::kIndPtr) array. The current implementation leaves all aux arrays empty, which is why the check above sees (2,) vs. (0,) for a 1x1 matrix.
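
To illustrate the invariant, here is a minimal pure-Python sketch of the CSR layout. It is not MXNet code: `csr_zeros` and `csr_add` are hypothetical helpers, and the length check only mirrors the spirit of the failed check in the log.

```python
def csr_zeros(num_rows):
    """Return (data, indices, indptr) for an all-zero CSR matrix."""
    # indptr[i]:indptr[i + 1] delimits row i, so num_rows + 1 entries are
    # needed even when no values are stored; all of them are 0.
    return [], [], [0] * (num_rows + 1)


def csr_add(lhs, rhs):
    """Element-wise add of two same-shape CSR matrices with sorted columns."""
    l_data, l_idx, l_ptr = lhs
    r_data, r_idx, r_ptr = rhs
    # An operand whose row-pointer array was left empty is rejected here.
    if len(l_ptr) != len(r_ptr):
        raise ValueError("indptr length mismatch: %d vs. %d" % (len(l_ptr), len(r_ptr)))
    data, indices, indptr = [], [], [0]
    for row in range(len(l_ptr) - 1):
        i, i_end = l_ptr[row], l_ptr[row + 1]
        j, j_end = r_ptr[row], r_ptr[row + 1]
        # Merge the two sorted column lists of this row.
        while i < i_end or j < j_end:
            if j >= j_end or (i < i_end and l_idx[i] < r_idx[j]):
                indices.append(l_idx[i])
                data.append(l_data[i])
                i += 1
            elif i >= i_end or r_idx[j] < l_idx[i]:
                indices.append(r_idx[j])
                data.append(r_data[j])
                j += 1
            else:
                indices.append(l_idx[i])
                data.append(l_data[i] + r_data[j])
                i += 1
                j += 1
        indptr.append(len(data))
    return data, indices, indptr
```

With this layout, a 1x1 zero matrix has a row-pointer array of length 2, and adding it to itself again yields a row-pointer array of length 2, whereas an operand with an empty row-pointer array fails the length check.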

PR of fix
#7935

piiswrong pushed a commit that referenced this issue Oct 10, 2017
* Fix for: #7920

* lint

* remove unused variable warning

* Since GPU version of FillZerosCsrImpl() is called from a non-cuda-compiled file, the gpu version is compiled via cuda directly in init_op.cu

* Trigger build

* This test case is BS. I can't even tell what's wrong on the CI build because so many errors coming from this test.

* Update test_kvstore.py

* Update CMakeLists.txt

* merge fix

* Trigger build

* Fix 'dest' item in FIllXXX calls

* Revert NDARray as pointer in Fillxxx
mbaijal pushed a commit to mbaijal/incubator-mxnet that referenced this issue Oct 12, 2017
crazy-cat pushed a commit to crazy-cat/incubator-mxnet that referenced this issue Oct 26, 2017