This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

mx.nd.random.shuffle crashes in the master branch #15029

Closed
zheng-da opened this issue May 22, 2019 · 3 comments · Fixed by #15041
Labels
Backend Issues related to the backend of MXNet Bug Operator

Comments

@zheng-da
Contributor

zheng-da commented May 22, 2019

After compiling the latest master branch of MXNet (commit 5854b98), the following simple code fails to run:

import mxnet as mx
arr = mx.nd.arange(0, 153431, dtype='int64')
arr.wait_to_read()
arr = mx.nd.random.shuffle(arr)
arr.wait_to_read()

The code results in a segfault.

If MXNet is compiled with USE_INT64_TENSOR_SIZE = 1, the code above works fine.

This simple code should also work with the int32 tensor size.
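As a stopgap while the operator crashes, the permutation can be generated on the host instead. The sketch below is only an illustration using Python's standard library; `host_side_permutation` is a hypothetical helper name, not part of MXNet, and it does not touch `mx.nd.random.shuffle` at all.

```python
import random

def host_side_permutation(n, seed=None):
    """Return the integers 0..n-1 in random order (hypothetical helper)."""
    rng = random.Random(seed)   # private RNG so the global state is untouched
    indices = list(range(n))
    rng.shuffle(indices)        # in-place shuffle on the host, no MXNet operator involved
    return indices

# Same length as the array in the report above.
perm = host_side_permutation(153431, seed=0)
```

The resulting list could then be handed to `mx.nd.array` with `dtype='int64'` to get an NDArray; that step is left out here so the sketch runs without an MXNet build.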

@mxnet-label-bot
Contributor

Hey, this is the MXNet Label Bot.
Thank you for submitting the issue! I will try and suggest some labels so that the appropriate MXNet community members can help resolve it.
Here are my recommended labels: Bug

@zheng-da zheng-da added the Bug label May 22, 2019
@apeforest
Contributor

The core dump does not happen on macOS with 16GB of memory. It occurs on Ubuntu 16.04.

@apeforest
Contributor

Here is the stack trace from GDB:

#52347 0x00007fff9fcb71e9 in std::uniform_int_distribution<int>::operator()<std::mersenne_twister_engine<unsigned long, 32ul, 624ul, 397ul, 31ul, 2567483615ul, 11ul, 4294967295ul, 7ul, 2636928640ul, 15ul, 4022730752ul, 18ul, 1812433253ul> > (this=0x7fff2a5f4c80, __urng=...)
    at /usr/include/c++/5/bits/uniform_int_dist.h:165
#52348 0x00007fff9fcae86b in mxnet::op::(anonymous namespace)::<lambda(mxnet::index_t)>::operator()(mxnet::index_t) const (__closure=0x7fff2a5f61e0,
    n=-1) at src/operator/random/shuffle_op.cc:50
#52349 0x00007fff9fcb66e3 in __gnu_parallel::__parallel_random_shuffle_drs<long*, void mxnet::op::(anonymous namespace)::Shuffle1D<long, std::mersenne_twister_engine<unsigned long, 32ul, 624ul, 397ul, 31ul, 2567483615ul, 11ul, 4294967295ul, 7ul, 2636928640ul, 15ul, 4022730752ul, 18ul, 1812433253ul> >(long*, int, std::mersenne_twister_engine<unsigned long, 32ul, 624ul, 397ul, 31ul, 2567483615ul, 11ul, 4294967295ul, 7ul, 2636928640ul, 15ul, 4022730752ul, 18ul, 1812433253ul>*)::{lambda(int)#1}> () at /usr/include/c++/5/parallel/random_shuffle.h:384
#52350 0x00007ffff34d0638 in __kmp_api_GOMP_parallel_40_alias () from /home/ubuntu/src/mxnet/python/mxnet/../../lib/libiomp5.so
#52351 0x00007fff9fcb3bbd in __gnu_parallel::__parallel_random_shuffle_drs<long int*, mxnet::op::(anonymous namespace)::Shuffle1D(DType*, mxnet::index_t, Rand*) [with DType = long int; Rand = std::mersenne_twister_engine<long unsigned int, 32ul, 624ul, 397ul, 31ul, 2567483615ul, 11ul, 4294967295ul, 7ul, 2636928640ul, 15ul, 4022730752ul, 18ul, 1812433253ul>; mxnet::index_t = int]::<lambda(mxnet::index_t)> >(long *, long *, std::iterator_traits<long*>::difference_type, __gnu_parallel::_ThreadIndex, mxnet::op::(anonymous namespace)::<lambda(mxnet::index_t)> &) (__begin=0x7fff70010040, __end=0x7fff7013baf8,
    __n=153431, __num_threads=16, __rng=...) at /usr/include/c++/5/parallel/random_shuffle.h:342
#52352 0x00007fff9fcb1a0f in __gnu_parallel::__parallel_random_shuffle<long int*, mxnet::op::(anonymous namespace)::Shuffle1D(DType*, mxnet::index_t, Rand*) [with DType = long int; Rand = std::mersenne_twister_engine<long unsigned int, 32ul, 624ul, 397ul, 31ul, 2567483615ul, 11ul, 4294967295ul, 7ul, 2636928640ul, 15ul, 4022730752ul, 18ul, 1812433253ul>; mxnet::index_t = int]::<lambda(mxnet::index_t)> >(long *, long *, mxnet::op::(anonymous namespace)::<lambda(mxnet::index_t)>) (__begin=0x7fff70010040, __end=0x7fff7013baf8, __rng=...) at /usr/include/c++/5/parallel/random_shuffle.h:528
#52353 0x00007fff9fcaf2b7 in std::__parallel::random_shuffle<long int*, mxnet::op::(anonymous namespace)::Shuffle1D(DType*, mxnet::index_t, Rand*) [with DType = long int; Rand = std::mersenne_twister_engine<long unsigned int, 32ul, 624ul, 397ul, 31ul, 2567483615ul, 11ul, 4294967295ul, 7ul, 2636928640ul, 15ul, 4022730752ul, 18ul, 1812433253ul>; mxnet::index_t = int]::<lambda(mxnet::index_t)>&>(long *, long *, mxnet::op::(anonymous namespace)::<lambda(mxnet::index_t)> &) (__begin=0x7fff70010040, __end=0x7fff7013baf8, __rand=...) at /usr/include/c++/5/parallel/algo.h:1681
#52354 0x00007fff9fcae8d3 in mxnet::op::(anonymous namespace)::Shuffle1D<long, std::mersenne_twister_engine<unsigned long, 32ul, 624ul, 397ul, 31ul, 2567483615ul, 11ul, 4294967295ul, 7ul, 2636928640ul, 15ul, 4022730752ul, 18ul, 1812433253ul> > (out=0x7fff70010040, size=153431, prnd=0x20c0470)
    at src/operator/random/shuffle_op.cc:52
#52355 0x00007fff9fcacac9 in mxnet::op::ShuffleForwardCPU (attrs=..., ctx=..., inputs=std::vector of length 1, capacity 1 = {...},
    req=std::vector of length 1, capacity 1 = {...}, outputs=std::vector of length 1, capacity 1 = {...}) at src/operator/random/shuffle_op.cc:98
#52356 0x00007fff9f7e2ced in std::_Function_handler<void (nnvm::NodeAttrs const&, mxnet::OpContext const&, std::vector<mxnet::TBlob, std::allocator<mxnet::TBlob> > const&, std::vector<mxnet::OpReqType, std::allocator<mxnet::OpReqType> > const&, std::vector<mxnet::TBlob, std::allocator<mxnet::TBlob> > const&), void (*)(nnvm::NodeAttrs const&, mxnet::OpContext const&, std::vector<mxnet::TBlob, std::allocator<mxnet::TBlob> > const&, std::vector<mxnet::OpReqType, std::allocator<mxnet::OpReqType> > const&, std::vector<mxnet::TBlob, std::allocator<mxnet::TBlob> > const&)>::_M_invoke(std::_Any_data const&, nnvm::NodeAttrs const&, mxnet::OpContext const&, std::vector<mxnet::TBlob, std::allocator<mxnet::TBlob> > const&, std::vector<mxnet::OpReqType, std::allocator<mxnet::OpReqType> > const&, std::vector<mxnet::TBlob, std::allocator<mxnet::TBlob> > const&) (__functor=..., __args#0=..., __args#1=...,
    __args#2=std::vector of length 1, capacity 1 = {...}, __args#3=std::vector of length 1, capacity 1 = {...},
    __args#4=std::vector of length 1, capacity 1 = {...}) at /usr/include/c++/5/functional:1871
#52357 0x00007fff9f6bd1c4 in std::function<void (nnvm::NodeAttrs const&, mxnet::OpContext const&, std::vector<mxnet::TBlob, std::allocator<mxnet::TBlob> > const&, std::vector<mxnet::OpReqType, std::allocator<mxnet::OpReqType> > const&, std::vector<mxnet::TBlob, std::allocator<mxnet::TBlob> > const&)>::operator()(nnvm::NodeAttrs const&, mxnet::OpContext const&, std::vector<mxnet::TBlob, std::allocator<mxnet::TBlob> > const&, std::vector<mxnet::OpReqType, std::allocator<mxnet::OpReqType> > const&, std::vector<mxnet::TBlob, std::allocator<mxnet::TBlob> > const&) const (this=0x20c18d8, __args#0=...,
    __args#1=..., __args#2=std::vector of length 1, capacity 1 = {...}, __args#3=std::vector of length 1, capacity 1 = {...},
    __args#4=std::vector of length 1, capacity 1 = {...}) at /usr/include/c++/5/functional:2267
#52358 0x00007fffa25ed60d in mxnet::imperative::PushFCompute(std::function<void (nnvm::NodeAttrs const&, mxnet::OpContext const&, std::vector<mxnet::TBlob, std::allocator<mxnet::TBlob> > const&, std::vector<mxnet::OpReqType, std::allocator<mxnet::OpReqType> > const&, std::vector<mxnet::TBlob, std::allocator<mxnet::TBlob> > const&)> const&, nnvm::Op const*, nnvm::NodeAttrs const&, mxnet::Context const&, std::vector<mxnet::engine::Var*, std::allocator<mxnet::engine::Var*> > const&, std::vector<mxnet::engine::Var*, std::allocator<mxnet::engine::Var*> > const&, std::vector<mxnet::Resource, std::allocator<mxnet::Resource> > const&, std::vector<mxnet::NDArray*, std::allocator<mxnet::NDArray*> > const&, std::vector<mxnet::NDArray*, std::allocator<mxnet::NDArray*> > const&, std::vector<unsigned int, std::allocator<unsigned int> > const&, std::vector<mxnet::OpReqType, std::allocator<mxnet::OpReqType> > const&)::{lambda(mxnet::RunContext)#1}::operator()(mxnet::RunContext) const (__closure=0x20c1850, rctx=...) at src/imperative/./imperative_utils.h:434
#52359 0x00007fffa25f26e6 in std::_Function_handler<void (mxnet::RunContext), mxnet::imperative::PushFCompute(std::function<void (nnvm::NodeAttrs const&, mxnet::OpContext const&, std::vector<mxnet::TBlob, std::allocator<mxnet::TBlob> > const&, std::vector<mxnet::OpReqType, std::allocator<mxnet::OpReqType> > const&, std::vector<mxnet::TBlob, std::allocator<mxnet::TBlob> > const&)> const&, nnvm::Op const*, nnvm::NodeAttrs const&, mxnet::Context const&, std::vector<mxnet::engine::Var*, std::allocator<mxnet::engine::Var*> > const&, std::vector<mxnet::engine::Var*, std::allocator<mxnet::engine::Var*> > const&, std::vector<mxnet::Resource, std::allocator<mxnet::Resource> > const&, std::vector<mxnet::NDArray*, std::allocator<mxnet::NDArray*> > const&, std::vector<mxnet::NDArray*, std::allocator<mxnet::NDArray*> > const&, std::vector<unsigned int, std::allocator<unsigned int> > const&, std::vector<mxnet::OpReqType, std::allocator<mxnet::OpReqType> > const&)::{lambda(mxnet::RunContext)#1}>::_M_invoke(std::_Any_data const&, mxnet::RunContext&&) (__functor=...,
    __args#0=<unknown type in /home/ubuntu/src/mxnet/python/mxnet/../../lib/libmxnet.so, CU 0x7af4db2, DIE 0x7bae75a>)
    at /usr/include/c++/5/functional:1871
#52360 0x00007fffa2eb67d8 in std::function<void (mxnet::RunContext)>::operator()(mxnet::RunContext) const (this=0x209eca0, __args#0=...)
    at /usr/include/c++/5/functional:2267
#52361 0x00007fffa2ecc22f in mxnet::engine::ThreadedEngine::<lambda(mxnet::RunContext, mxnet::Engine::CallbackOnComplete)>::operator()(mxnet::RunContext, mxnet::Engine::CallbackOnComplete) const (__closure=0x209eca0, ctx=..., on_complete=...) at src/engine/threaded_engine.cc:350
#52362 0x00007fffa2ecd7ae in std::_Function_handler<void(mxnet::RunContext, mxnet::engine::CallbackOnComplete), mxnet::engine::ThreadedEngine::PushSync(mxnet::Engine::SyncFn, mxnet::Context, const std::vector<mxnet::engine::Var*>&, const std::vector<mxnet::engine::Var*>&, mxnet::FnProperty, int, char const*)::<lambda(mxnet::RunContext, mxnet::Engine::CallbackOnComplete)> >::_M_invoke(const std::_Any_data &, <unknown type in /home/ubuntu/src/mxnet/python/mxnet/../../lib/libmxnet.so, CU 0x95cc75b, DIE 0x96109b3>, <unknown type in /home/ubuntu/src/mxnet/python/mxnet/../../lib/libmxnet.so, CU 0x95cc75b, DIE 0x96109b8>) (__functor=..., __args#0=<unknown type in /home/ubuntu/src/mxnet/python/mxnet/../../lib/libmxnet.so, CU 0x95cc75b, DIE 0x96109b3>,
    __args#1=<unknown type in /home/ubuntu/src/mxnet/python/mxnet/../../lib/libmxnet.so, CU 0x95cc75b, DIE 0x96109b8>)
    at /usr/include/c++/5/functional:1871
#52363 0x00007fffa2eb783a in std::function<void (mxnet::RunContext, mxnet::engine::CallbackOnComplete)>::operator()(mxnet::RunContext, mxnet::engine::CallbackOnComplete) const (this=0x1ed4000, __args#0=..., __args#1=...) at /usr/include/c++/5/functional:2267
#52364 0x00007fffa2ebf597 in mxnet::engine::ThreadedEngine::ExecuteOprBlock (this=0x1ed2a50, run_ctx=..., opr_block=0x1ed6000)
    at src/engine/./threaded_engine.h:380
#52365 0x00007fffa2ed672a in mxnet::engine::ThreadedEnginePerDevice::CPUWorker<(dmlc::ConcurrentQueueType)0> (this=0x1ed2a50, ctx=..., block=0x2091600,
    ready_event=std::shared_ptr (count 2, weak 0) 0x1e0ac40) at src/engine/threaded_engine_perdevice.cc:300
#52366 0x00007fffa2ed4a70 in mxnet::engine::ThreadedEnginePerDevice::PushToExecute(mxnet::engine::OprBlock*, bool)::{lambda()#1}::operator()() const::{lambda(std::shared_ptr<dmlc::ManualEvent>)#1}::operator()(dmlc::ManualEvent) const (__closure=0xb0ebb0,
    ready_event=std::shared_ptr (count 2, weak 0) 0x1e0ac40) at src/engine/threaded_engine_perdevice.cc:116
#52367 0x00007fffa2ed93a6 in std::_Function_handler<void (std::shared_ptr<dmlc::ManualEvent>), mxnet::engine::ThreadedEnginePerDevice::PushToExecute(mxnet::engine::OprBlock*, bool)::{lambda()#1}::operator()() const::{lambda(std::shared_ptr<dmlc::ManualEvent>)#1}>::_M_invoke(std::_Any_data const&, std::shared_ptr<dmlc::ManualEvent>&&) (__functor=...,
    __args#0=<unknown type in /home/ubuntu/src/mxnet/python/mxnet/../../lib/libmxnet.so, CU 0x963de09, DIE 0x96be975>)
    at /usr/include/c++/5/functional:1871
#52368 0x00007fffa2ecb3d7 in std::function<void (std::shared_ptr<dmlc::ManualEvent>)>::operator()(std::shared_ptr<dmlc::ManualEvent>) const (
    this=0x2081c38, __args#0=std::shared_ptr (empty) 0x0) at /usr/include/c++/5/functional:2267
#52369 0x00007fffa2ecb34a in std::_Bind_simple<std::function<void (std::shared_ptr<dmlc::ManualEvent>)> (std::shared_ptr<dmlc::ManualEvent>)>::_M_invoke<0ul>(std::_Index_tuple<0ul>) (this=0x2081c28) at /usr/include/c++/5/functional:1531
#52370 0x00007fffa2ecb1de in std::_Bind_simple<std::function<void (std::shared_ptr<dmlc::ManualEvent>)> (std::shared_ptr<dmlc::ManualEvent>)>::operator()() (this=0x2081c28) at /usr/include/c++/5/functional:1520
#52371 0x00007fffa2ecb12e in std::thread::_Impl<std::_Bind_simple<std::function<void (std::shared_ptr<dmlc::ManualEvent>)> (std::shared_ptr<dmlc::ManualEvent>)> >::_M_run() (this=0x2081c10) at /usr/include/c++/5/thread:115
#52372 0x00007fffee543c80 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#52373 0x00007ffff7bc16ba in start_thread (arg=0x7fff2a5f7700) at pthread_create.c:333
#52374 0x00007ffff78f741d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
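
A note on frame #52348: the shuffle lambda is called with n=-1 while mxnet::index_t = int. One possible explanation, which is an assumption and not something confirmed in this thread, is that the parallel shuffle passes an unsigned 32-bit maximum to the callback, and the value wraps to -1 when narrowed to a signed 32-bit index. A range of [0, n - 1] would then have its upper bound below its lower bound, which std::uniform_int_distribution does not allow. The narrowing itself can be reproduced in a few lines; `as_int32` is a hypothetical helper for illustration only.

```python
import ctypes

UINT32_MAX = 0xFFFFFFFF  # 4294967295, the largest unsigned 32-bit value

def as_int32(value):
    """Reinterpret the low 32 bits of value as a signed 32-bit integer (hypothetical helper)."""
    return ctypes.c_int32(value).value

n = as_int32(UINT32_MAX)   # what a 32-bit signed index would see (assumed scenario)
upper_bound = n - 1        # upper bound of a [0, n - 1] range built from that n
size = as_int32(153431)    # the array size from the report fits in int32 unchanged
```

With a 64-bit index type the same value would stay positive, which would be consistent with the report that a USE_INT64_TENSOR_SIZE = 1 build does not crash.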

@apeforest apeforest added Backend Issues related to the backend of MXNet Operator labels May 22, 2019