This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

Simple_Bind failure in 1.5.0 #15784

Closed
samskalicky opened this issue Aug 7, 2019 · 6 comments
Labels
Backend (Issues related to the backend of MXNet), Operator


@samskalicky
Contributor

Description

Simple bind fails in 1.5.0 but worked in 1.4.1. Possibly related to #14661.

Error Message:

Traceback (most recent call last):
  File "test.py", line 45, in <module>
    model,inputs = load_a3c_model(mx.cpu())
  File "test.py", line 25, in load_a3c_model
    mod.bind(for_training=False, inputs_need_grad=False, data_shapes=data_shape)
  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/mxnet/module/module.py", line 429, in bind
    state_names=self._state_names)
  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/mxnet/module/executor_group.py", line 279, in __init__
    self.bind_exec(data_shapes, label_shapes, shared_group)
  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/mxnet/module/executor_group.py", line 375, in bind_exec
    shared_group))
  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/mxnet/module/executor_group.py", line 662, in _bind_ith_exec
    shared_buffer=shared_data_arrays, **input_shapes)
  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/mxnet/symbol/symbol.py", line 1629, in simple_bind
    raise RuntimeError(error_msg)
RuntimeError: simple_bind error. Arguments:
data: (1, 12, 210, 160)
Error in operator value: [18:56:17] include/mxnet/./tuple.h:202: Check failed: i >= 0 && i < ndim(): index = 0 must be in range [0, -1)
Stack trace:
  [bt] (0) /home/ubuntu/anaconda3/lib/python3.6/site-packages/mxnet/libmxnet.so(+0x4a3b8b) [0x7f4d47da8b8b]
  [bt] (1) /home/ubuntu/anaconda3/lib/python3.6/site-packages/mxnet/libmxnet.so(+0x534d58) [0x7f4d47e39d58]
  [bt] (2) /home/ubuntu/anaconda3/lib/python3.6/site-packages/mxnet/libmxnet.so(+0x2c023f2) [0x7f4d4a5073f2]
  [bt] (3) /home/ubuntu/anaconda3/lib/python3.6/site-packages/mxnet/libmxnet.so(+0x26b8c32) [0x7f4d49fbdc32]
  [bt] (4) /home/ubuntu/anaconda3/lib/python3.6/site-packages/mxnet/libmxnet.so(+0x26bb51b) [0x7f4d49fc051b]
  [bt] (5) /home/ubuntu/anaconda3/lib/python3.6/site-packages/mxnet/libmxnet.so(mxnet::exec::GraphExecutor::Init(nnvm::Symbol, mxnet::Context const&, std::map<std::string, mxnet::Context, std::less<std::string>, std::allocator<std::pair<std::string const, mxnet::Context> > > const&, std::vector<mxnet::Context, std::allocator<mxnet::Context> > const&, std::vector<mxnet::Context, std::allocator<mxnet::Context> > const&, std::vector<mxnet::Context, std::allocator<mxnet::Context> > const&, std::unordered_map<std::string, mxnet::TShape, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, mxnet::TShape> > > const&, std::unordered_map<std::string, int, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, int> > > const&, std::unordered_map<std::string, int, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, int> > > const&, std::vector<mxnet::OpReqType, std::allocator<mxnet::OpReqType> > const&, std::unordered_set<std::string, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::string> > const&, std::vector<mxnet::NDArray, std::allocator<mxnet::NDArray> >*, std::vector<mxnet::NDArray, std::allocator<mxnet::NDArray> >*, std::vector<mxnet::NDArray, std::allocator<mxnet::NDArray> >*, std::unordered_map<std::string, mxnet::NDArray, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, mxnet::NDArray> > >*, mxnet::Executor*, std::unordered_map<nnvm::NodeEntry, mxnet::NDArray, nnvm::NodeEntryHash, nnvm::NodeEntryEqual, std::allocator<std::pair<nnvm::NodeEntry const, mxnet::NDArray> > > const&)+0x365) [0x7f4d49fac185]
  [bt] (6) /home/ubuntu/anaconda3/lib/python3.6/site-packages/mxnet/libmxnet.so(mxnet::Executor::SimpleBind(nnvm::Symbol, mxnet::Context const&, std::map<std::string, mxnet::Context, std::less<std::string>, std::allocator<std::pair<std::string const, mxnet::Context> > > const&, std::vector<mxnet::Context, std::allocator<mxnet::Context> > const&, std::vector<mxnet::Context, std::allocator<mxnet::Context> > const&, std::vector<mxnet::Context, std::allocator<mxnet::Context> > const&, std::unordered_map<std::string, mxnet::TShape, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, mxnet::TShape> > > const&, std::unordered_map<std::string, int, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, int> > > const&, std::unordered_map<std::string, int, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, int> > > const&, std::vector<mxnet::OpReqType, std::allocator<mxnet::OpReqType> > const&, std::unordered_set<std::string, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::string> > const&, std::vector<mxnet::NDArray, std::allocator<mxnet::NDArray> >*, std::vector<mxnet::NDArray, std::allocator<mxnet::NDArray> >*, std::vector<mxnet::NDArray, std::allocator<mxnet::NDArray> >*, std::unordered_map<std::string, mxnet::NDArray, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, mxnet::NDArray> > >*, mxnet::Executor*)+0x8a8) [0x7f4d49fad408]
  [bt] (7) /home/ubuntu/anaconda3/lib/python3.6/site-packages/mxnet/libmxnet.so(MXExecutorSimpleBindEx+0x221b) [0x7f4d49ef12ab]
  [bt] (8) /home/ubuntu/anaconda3/lib/python3.6/lib-dynload/../../libffi.so.6(ffi_call_unix64+0x4c) [0x7f4e1cc91ec0]
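The key line is the check failure in tuple.h: the range "[0, -1)" shows that ndim() returned -1 for the shape being indexed, which in 1.5.0 marks a shape whose rank is not yet known. The toy model below reproduces that bounds check in pure Python. `ShapeTuple` is a hypothetical stand-in for mxnet::Tuple, and treating the unset label shape as the unknown one is an assumption, not something the trace states.

```python
class ShapeTuple:
    """Hypothetical stand-in for mxnet::Tuple, modelling only the bounds check.

    Assumption: an unknown shape is represented as ndim() == -1, matching the
    "[0, -1)" range printed in the error above.
    """

    def __init__(self, dims=None):
        # dims=None stands for "shape not inferred yet"
        self._dims = None if dims is None else tuple(dims)

    def ndim(self):
        return -1 if self._dims is None else len(self._dims)

    def is_known(self):
        return self._dims is not None

    def __getitem__(self, i):
        n = self.ndim()
        # Same condition as the failing CHECK: i >= 0 && i < ndim()
        if not (0 <= i < n):
            raise IndexError(
                f"Check failed: i >= 0 && i < ndim(): index = {i} must be in range [0, {n})"
            )
        return self._dims[i]


data_shape = ShapeTuple((1, 1))  # e.g. the output shape of fc_value
label_shape = ShapeTuple()       # assumed: value_label shape left unknown at bind time

print(data_shape[0])             # a known shape can be indexed
if label_shape.is_known():       # guarding on is_known() skips the bad access
    print(label_shape[0])
try:
    label_shape[0]               # indexing an unknown shape trips the check
except IndexError as err:
    print(err)
```

Any shape function that reads dimension 0 of a shape without first checking that the shape is known would fail in exactly this way once "unknown" is encoded as -1 rather than 0.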

Minimum reproducible example

import mxnet as mx

# load_a3c_model
# Description: create the A3C model, bind it for inference, and return the Module plus a random input batch
def load_a3c_model(ctx):
    net = mx.symbol.Variable('data')
    net = mx.symbol.Cast(data=net, dtype='float32')
    net = mx.symbol.Convolution(data=net, name='conv1', kernel=(8, 8), stride=(4, 4), num_filter=16)
    net = mx.symbol.Activation(data=net, name='relu1', act_type="relu")
    net = mx.symbol.Convolution(data=net, name='conv2', kernel=(4, 4), stride=(2, 2), num_filter=32)
    net = mx.symbol.Activation(data=net, name='relu2', act_type="relu")
    net = mx.symbol.Flatten(data=net)
    net = mx.symbol.FullyConnected(data=net, name='fc4', num_hidden=256)
    net = mx.symbol.Activation(data=net, name='relu4', act_type="relu")
    fc_policy = mx.symbol.FullyConnected(data=net, name='fc_policy', num_hidden=4)
    policy = mx.symbol.SoftmaxOutput(data=fc_policy, name='policy', out_grad=True)
    entropy = mx.symbol.SoftmaxActivation(data=fc_policy, name='entropy')
    value = mx.symbol.FullyConnected(data=net, name='fc_value', num_hidden=1)
    value = mx.symbol.LinearRegressionOutput(data=value, name='value')
    sym = mx.symbol.Group([policy, entropy, value])

    data_shape = [('data', (1, 12, 210, 160))]
    mod = mx.mod.Module(symbol=sym, label_names=('policy_label', 'value_label'), context=ctx)

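    # Bind for inference only. No label_shapes are passed, so the shapes of
    # policy_label and value_label are left to shape inference.
    mod.bind(for_training=False, inputs_need_grad=False, data_shapes=data_shape)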
    init = mx.init.Mixed(['fc_value_weight|fc_policy_weight', '.*'],
                         [mx.init.Uniform(0.001), mx.init.Xavier(rnd_type='gaussian', factor_type="in", magnitude=2)])
    mod.init_params(initializer=init, arg_params=None, aux_params=None)

    # Random input batch, created on the same context as the module
    data = [mx.random.uniform(-1.0, 1.0, shape=shape, ctx=ctx) for _, shape in mod.data_shapes]
    return mod, data

# infer_a3c_model
# Description: runs inference on an a3c model
def infer_a3c_model(model, inputs):
    batch = mx.io.DataBatch(inputs, [])

    model.forward(batch, is_train=False)
    mx.nd.waitall()
    results = model.get_outputs()

    return results


model,inputs = load_a3c_model(mx.cpu())

out = infer_a3c_model(model,inputs)
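For reference, the shapes that shape inference has to produce along the data path of this network can be worked out by hand. The sketch below assumes MXNet's `Convolution` defaults of `pad=0` and `dilate=1` and uses the usual floor-division output formula; `conv_out` and `a3c_output_shapes` are hypothetical helpers, not MXNet APIs, and they cover only the data path, not the label variables.

```python
def conv_out(size, kernel, stride, pad=0, dilate=1):
    """Output length of one spatial axis of a convolution (floor division)."""
    return (size + 2 * pad - dilate * (kernel - 1) - 1) // stride + 1


def a3c_output_shapes(data_shape):
    """Hand-computed shapes along the data path of the example network."""
    n, _, h, w = data_shape                          # Cast does not change the shape
    h1, w1 = conv_out(h, 8, 4), conv_out(w, 8, 4)    # conv1: kernel 8, stride 4
    h2, w2 = conv_out(h1, 4, 2), conv_out(w1, 4, 2)  # conv2: kernel 4, stride 2
    return {
        "conv1": (n, 16, h1, w1),       # num_filter=16
        "conv2": (n, 32, h2, w2),       # num_filter=32
        "flatten": (n, 32 * h2 * w2),   # Flatten keeps the batch axis
        "fc4": (n, 256),
        "policy": (n, 4),
        "entropy": (n, 4),
        "value": (n, 1),
    }


for name, shape in a3c_output_shapes((1, 12, 210, 160)).items():
    print(name, shape)
```

With data (1, 12, 210, 160), conv1 comes out as (1, 16, 51, 39), conv2 as (1, 32, 24, 18), and Flatten as (1, 13824), so every data-path shape is well defined; the failure reported above is raised in the `value` operator.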
@mxnet-label-bot
Contributor

Hey, this is the MXNet Label Bot.
Thank you for submitting the issue! I will try and suggest some labels so that the appropriate MXNet community members can help resolve it.
Here are my recommended labels: Bug

@samskalicky
Contributor Author

Looks like it was fixed on master: I built from source and the example ran successfully. I still need to pin down the commit that fixed it.

@reminisce any suggestion on what that might have been?

@samskalicky
Contributor Author

Thanks @reminisce for pointing me to #15620 as the fix. I tried a build from the preceding PR, #15137, and it still failed, so #15620 is what fixes this issue.
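Confirming that the build just before #15620 fails while #15620 passes is the last step of a bisection over commits (git bisect automates the same search). A minimal sketch of the search, assuming a hypothetical `passes(commit)` predicate that builds MXNet at a commit and runs the example, and assuming the behaviour is monotonic (once fixed, it stays fixed):

```python
from typing import Callable, Sequence


def first_fixed_commit(commits: Sequence[str], passes: Callable[[str], bool]) -> str:
    """Return the earliest commit in history order for which passes() is True.

    Invariant: commits[lo] fails and commits[hi] passes.
    """
    lo, hi = 0, len(commits) - 1
    if passes(commits[lo]) or not passes(commits[hi]):
        raise ValueError("need a failing first commit and a passing last commit")
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if passes(commits[mid]):
            hi = mid  # fix is at mid or earlier
        else:
            lo = mid  # fix is after mid
    return commits[hi]
```

Usage, with hypothetical commit names: `first_fixed_commit(["c0", "c1", "c2", "c3"], lambda c: c in {"c2", "c3"})` returns "c2". Each call to `passes` costs a full build, so the logarithmic number of probes is the point of bisecting instead of testing commits one by one.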

@vdantu
Contributor

vdantu commented Aug 8, 2019

@samskalicky : Should this be closed? I see that the PR responsible for the fix is being tracked in the Patch Release discussion.

@zachgk added the Backend and Operator labels on Aug 13, 2019
@zachgk
Contributor

zachgk commented Aug 13, 2019

@samskalicky Is this fixed since #15620 was merged?

@samskalicky
Contributor Author

Fixed in #15620
