This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

Java examples broken with mxnet mkldnn build #15267

Closed
arcadiaphy opened this issue Jun 18, 2019 · 7 comments
Labels
Backend Issues related to the backend of MXNet MKLDNN Operator

Comments

@arcadiaphy
Member

arcadiaphy commented Jun 18, 2019

Description

I've built the scala-package against an MXNet MKLDNN build and run the Java demo ImageClassification, but the demo breaks on mobilenet.

Environment info (Required)

  1. Java version: 1.8.0
  2. Maven version: 3.6.0

Build info (Required if built from source)

mac build with clang

latest commit:
cab1dfa

Error Message:

Exception in thread "main" org.apache.mxnet.MXNetError: [01:45:18] src/ndarray/ndarray.cc:634: Check failed: !is_view: 
Stack trace:
  [bt] (0) 1   libmxnet.so                         0x0000000117c23f59 dmlc::LogMessageFatal::~LogMessageFatal() + 57
  [bt] (1) 2   libmxnet.so                         0x00000001195bbb3b mxnet::NDArray::GetMKLDNNData() const + 571
  [bt] (2) 3   libmxnet.so                         0x00000001195d5b8b void mxnet::CopyFromToDnsImpl<mshadow::cpu, mshadow::cpu>(mxnet::NDArray const&, mxnet::NDArray const&, mxnet::RunContext) + 2747
  [bt] (3) 4   libmxnet.so                         0x00000001195d4dca void mxnet::CopyFromToImpl<mshadow::cpu, mshadow::cpu>(mxnet::NDArray const&, mxnet::NDArray const&, mxnet::RunContext, std::__1::vector<mxnet::Resource, std::__1::allocator<mxnet::Resource> > const&) + 2458
  [bt] (4) 5   libmxnet.so                         0x00000001195d43e1 std::__1::__function::__func<mxnet::CopyFromTo(mxnet::NDArray const&, mxnet::NDArray const&, int, bool)::$_7, std::__1::allocator<mxnet::CopyFromTo(mxnet::NDArray const&, mxnet::NDArray const&, int, bool)::$_7>, void (mxnet::RunContext, mxnet::engine::CallbackOnComplete)>::operator()(mxnet::RunContext&&, mxnet::engine::CallbackOnComplete&&) + 65
  [bt] (5) 6   libmxnet.so                         0x000000011945af7d mxnet::engine::ThreadedEngine::ExecuteOprBlock(mxnet::RunContext, mxnet::engine::OprBlock*) + 701
  [bt] (6) 7   libmxnet.so                         0x000000011945f10e mxnet::engine::ThreadedEnginePerDevice::PushToExecute(mxnet::engine::OprBlock*, bool)::'lambda'()::operator()() const::'lambda'(std::__1::shared_ptr<dmlc::ManualEvent>)::operator()(std::__1::shared_ptr<dmlc::ManualEvent>) const + 190
  [bt] (7) 8   libmxnet.so                         0x000000011945efa9 std::__1::__function::__func<mxnet::engine::ThreadedEnginePerDevice::PushToExecute(mxnet::engine::OprBlock*, bool)::'lambda'()::operator()() const::'lambda'(std::__1::shared_ptr<dmlc::ManualEvent>), std::__1::allocator<mxnet::engine::ThreadedEnginePerDevice::PushToExecute(mxnet::engine::OprBlock*, bool)::'lambda'()::operator()() const::'lambda'(std::__1::shared_ptr<dmlc::ManualEvent>)>, void (std::__1::shared_ptr<dmlc::ManualEvent>)>::operator()(std::__1::shared_ptr<dmlc::ManualEvent>&&) + 41
  [bt] (8) 9   libmxnet.so                         0x000000011945c083 void* std::__1::__thread_proxy<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, std::__1::function<void (std::__1::shared_ptr<dmlc::ManualEvent>)>, std::__1::shared_ptr<dmlc::ManualEvent> > >(void*) + 83


	at org.apache.mxnet.Base$.checkCall(Base.scala:111)
	at org.apache.mxnet.NDArray.internal(NDArray.scala:1231)
	at org.apache.mxnet.NDArray.toArray(NDArray.scala:1216)
	at org.apache.mxnet.infer.Predictor$$anonfun$11.apply(Predictor.scala:190)
	at org.apache.mxnet.infer.Predictor$$anonfun$11.apply(Predictor.scala:190)
	at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
	at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
	at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
	at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
	at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
	at scala.collection.AbstractTraversable.map(Traversable.scala:104)
	at org.apache.mxnet.infer.Predictor.predictImpl(Predictor.scala:190)
	at org.apache.mxnet.infer.Predictor.predict(Predictor.scala:156)
	at org.apache.mxnet.infer.javaapi.Predictor.predict(Predictor.scala:74)
	at mxnet.ImageClassification.main(ImageClassification.java:102)

Minimum reproducible example

https://github.com/apache/incubator-mxnet/blob/master/scala-package/mxnet-demo/java-demo/src/main/java/mxnet/ImageClassification.java

I've replaced the resnet-18 in the script with mobilenet:
model.zip

@mxnet-label-bot
Contributor

Hey, this is the MXNet Label Bot.
Thank you for submitting the issue! I will try and suggest some labels so that the appropriate MXNet community members can help resolve it.
Here are my recommended labels: Build

@leleamol
Contributor

@mxnet-label-bot add [Java, Scala, MKLDNN]

@marcoabreu marcoabreu added Java Label to identify Java API component MKLDNN Scala labels Jun 18, 2019
@arcadiaphy
Member Author

I've isolated the bug from the Java interface; it comes from the slice operation in the Java predict path:

  def predict(batch: DataBatch): IndexedSeq[NDArray] = {
    require(binded && paramsInitialized, "bind() and initParams() must be called first.")
    forward(batch, isTrain = Option(false))
    val pad = batch.pad
    getOutputsMerged().map(out => {
      val withoutPadding = out.slice(0, out.shape(0)-pad)
      val copied = withoutPadding.copy()
      withoutPadding.dispose()
      copied
    })
  }
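
To make the padding arithmetic in that snippet easier to follow, here is a minimal plain-Python sketch of the same idea: keep the first `shape(0) - pad` rows, then return an independent copy. It is only an illustration; `strip_padding` and `batch_output` are hypothetical names, nested lists stand in for NDArrays, and nothing here calls MXNet or reproduces the MKLDNN failure.

```python
def strip_padding(rows, pad):
    """Return an independent copy of `rows` without the trailing `pad` rows.

    Hypothetical stand-in for the slice-then-copy step in predict();
    `rows` is a list of rows, not an MXNet NDArray.
    """
    if not 0 <= pad <= len(rows):
        raise ValueError("pad must be between 0 and the batch size")
    keep = len(rows) - pad         # same arithmetic as out.shape(0) - pad
    without_padding = rows[:keep]  # drop the padded tail of the batch
    # Copy every row so the result shares nothing with the input,
    # mirroring the explicit copy() before dispose() in the Scala code.
    return [list(row) for row in without_padding]


# Made-up batch of three rows where the last row is padding.
batch_output = [[0.1, 0.9], [0.8, 0.2], [0.0, 0.0]]
print(strip_padding(batch_output, pad=1))
```

With `pad=0` the function returns a copy of the whole batch, which matches the case where no padding was added.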

I reproduced the bug with python code:

import mxnet as mx

# get symbol from gluon model
net = mx.gluon.model_zoo.vision.get_model('mobilenetv2_0.25')
net.hybridize()
net.initialize()
x = mx.nd.uniform(shape=(1, 3, 224, 224))
y = net(x)
sym = net._cached_graph[1]

exc = sym.simple_bind(ctx=mx.cpu(), grad_req='null', data=(1, 3, 224, 224))
y = exc.forward()[0]
# trigger the bug
print(y.slice(0, 1))

I tested the above script on multiple mxnet versions: it works fine in 1.4, but triggers the error on 1.5.

Traceback (most recent call last):
  File "bug.py", line 14, in <module>
    print y.slice(0, 1)
  File "/Users/arcadia/work/mxnet-src/python/mxnet/ndarray/ndarray.py", line 194, in __repr__
    return '\n%s\n<%s %s @%s>' % (str(self.asnumpy()),
  File "/Users/arcadia/work/mxnet-src/python/mxnet/ndarray/ndarray.py", line 1996, in asnumpy
    ctypes.c_size_t(data.size)))
  File "/Users/arcadia/work/mxnet-src/python/mxnet/base.py", line 253, in check_call
    raise MXNetError(py_str(_LIB.MXGetLastError()))
mxnet.base.MXNetError: std::exception

@pengzhao-intel
Contributor

@TaoLv could you help take a look at slice?

@TaoLv
Member

TaoLv commented Jun 19, 2019

Sure. I will work on this later.

@TaoLv
Member

TaoLv commented Jun 19, 2019

@arcadiaphy @pengzhao-intel We have found the root cause of this issue: the flatten layer before slice is not handled properly. It can be reproduced with the script below. We already have a fix locally and will submit a PR soon. Do you think we should get the fix into the 1.5.0 release?

import mxnet as mx
import numpy as np
from mxnet import Context

np.random.seed(12345)

data = mx.symbol.Variable('data')
weight = mx.symbol.Variable('weight')
bias = mx.symbol.Variable('bias')
conv1 = mx.symbol.Convolution(data=data, weight=weight, bias=bias, name='conv1', num_filter=64, kernel=(3, 3), stride=(1, 1))
flatten1 = mx.symbol.flatten(data=conv1)
slice1 = mx.symbol.slice(data=flatten1, begin=0, end=1)

shape = (2, 16, 224, 224)
val = np.random.rand(2, 16, 224, 224).astype(np.float32)
exe = slice1.simple_bind(Context.default_ctx, data=shape)
exe.arg_arrays[0][:] = val
exe.arg_arrays[1][:] = np.random.normal(size=exe.arg_arrays[1].shape)
p = exe.forward(is_train=False)
p[0].wait_to_read()
print(p[0])

@zachgk zachgk added Backend Issues related to the backend of MXNet Operator and removed Java Label to identify Java API component Scala labels Jun 25, 2019
@pengzhao-intel
Contributor

Fixed and closing now. Thanks for reporting the issue :)
