This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

1.5.0 MKLDNN error "Unknown MKLDNN format" #15737

Closed
samskalicky opened this issue Aug 2, 2019 · 12 comments

@samskalicky (Contributor) commented Aug 2, 2019

The ResNeXt-50 model from the MXNet model zoo fails with the 1.5.0 "mxnet-mkl" pip wheel, raising the MKL-DNN error "Unknown MKLDNN format for 5 dimensions: 108".

Works with:

  • "mxnet" pip wheel
  • "mxnet-mkl==1.4.1"

Error message:

Traceback (most recent call last):
  File "resnext.py", line 43, in <module>
    results = mod.get_outputs()[0].asnumpy()
  File "/home/ubuntu/anaconda3/envs/mxnet_p27/lib/python2.7/site-packages/mxnet/ndarray/ndarray.py", line 1996, in asnumpy
    ctypes.c_size_t(data.size)))
  File "/home/ubuntu/anaconda3/envs/mxnet_p27/lib/python2.7/site-packages/mxnet/base.py", line 253, in check_call
    raise MXNetError(py_str(_LIB.MXGetLastError()))
mxnet.base.MXNetError: [17:48:25] src/operator/nn/mkldnn/mkldnn_base.cc:398: Unknown MKLDNN format for 5 dimensions: 108
Stack trace:
  [bt] (0) /home/ubuntu/anaconda3/envs/mxnet_p27/lib/python2.7/site-packages/mxnet/libmxnet.so(+0x25b2ab) [0x7fb152faf2ab]
  [bt] (1) /home/ubuntu/anaconda3/envs/mxnet_p27/lib/python2.7/site-packages/mxnet/libmxnet.so(+0x267c95) [0x7fb152fbbc95]
  [bt] (2) /home/ubuntu/anaconda3/envs/mxnet_p27/lib/python2.7/site-packages/mxnet/libmxnet.so(mxnet::NDArray::GetMKLDNNData(mkldnn::memory::primitive_desc const&) const+0x1f4) [0x7fb15533c944]
  [bt] (3) /home/ubuntu/anaconda3/envs/mxnet_p27/lib/python2.7/site-packages/mxnet/libmxnet.so(mxnet::GetWeights(mxnet::NDArray const&, mkldnn::memory::primitive_desc const&, int)+0x21) [0x7fb152fc29c1]
  [bt] (4) /home/ubuntu/anaconda3/envs/mxnet_p27/lib/python2.7/site-packages/mxnet/libmxnet.so(mxnet::op::MKLDNNConvolutionForwardFullFeature(mxnet::op::MKLDNNConvFullParam const&, mxnet::OpContext const&, mxnet::op::MKLDNNConvForward*, std::vector<mxnet::NDArray, std::allocator<mxnet::NDArray> > const&, std::vector<mxnet::OpReqType, std::allocator<mxnet::OpReqType> > const&, std::vector<mxnet::NDArray, std::allocator<mxnet::NDArray> > const&)+0x4da) [0x7fb152fde26a]
  [bt] (5) /home/ubuntu/anaconda3/envs/mxnet_p27/lib/python2.7/site-packages/mxnet/libmxnet.so(mxnet::op::MKLDNNConvolutionForward(nnvm::NodeAttrs const&, mxnet::OpContext const&, std::vector<mxnet::NDArray, std::allocator<mxnet::NDArray> > const&, std::vector<mxnet::OpReqType, std::allocator<mxnet::OpReqType> > const&, std::vector<mxnet::NDArray, std::allocator<mxnet::NDArray> > const&)+0x434) [0x7fb152fdf9b4]
  [bt] (6) /home/ubuntu/anaconda3/envs/mxnet_p27/lib/python2.7/site-packages/mxnet/libmxnet.so(+0x73b81a) [0x7fb15348f81a]
  [bt] (7) /home/ubuntu/anaconda3/envs/mxnet_p27/lib/python2.7/site-packages/mxnet/libmxnet.so(+0x24298f7) [0x7fb15517d8f7]
  [bt] (8) /home/ubuntu/anaconda3/envs/mxnet_p27/lib/python2.7/site-packages/mxnet/libmxnet.so(+0x2431dbf) [0x7fb155185dbf]
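ResNeXt differs from a plain ResNet mainly in its grouped 3x3 convolutions, and the stack trace above fails inside mxnet::GetWeights while running a convolution. One plausible reading, which is an assumption and not something confirmed in this thread, is that the 5-dimensional array is a grouped convolution weight: on the MKL-DNN path a grouped weight carries an extra leading "groups" axis, so a 4-D MXNet weight becomes 5-D. The sketch below uses a hypothetical helper, grouped_conv_weight_dims, to show that mapping; the 128-channel, 32-group layer is an assumed example of a ResNeXt-50 (32x4d) stage-one convolution.

```python
# Hypothetical helper (not part of MXNet): shows how a grouped 2-D convolution
# weight is assumed to be reshaped into 5 dimensions on the MKL-DNN path,
# i.e. (groups, out_channels / groups, in_channels / groups, kernel_h, kernel_w).
def grouped_conv_weight_dims(num_filter, in_channels, num_group, kernel):
    """Return (mxnet_4d_shape, mkldnn_5d_shape) for a grouped 2-D convolution."""
    if num_filter % num_group or in_channels % num_group:
        raise ValueError("channel counts must be divisible by num_group")
    kh, kw = kernel
    # MXNet stores Convolution weights as (num_filter, in_channels / num_group, kh, kw).
    mxnet_shape = (num_filter, in_channels // num_group, kh, kw)
    # With num_group > 1 the output-channel axis is split into
    # (num_group, num_filter / num_group), which adds a fifth dimension.
    mkldnn_shape = (num_group, num_filter // num_group,
                    in_channels // num_group, kh, kw)
    return mxnet_shape, mkldnn_shape


if __name__ == "__main__":
    # Assumed example: a 3x3 grouped conv with 128 in/out channels and 32 groups.
    print(grouped_conv_weight_dims(128, 128, 32, (3, 3)))
    # -> ((128, 4, 3, 3), (32, 4, 4, 3, 3))
```

A plain (ungrouped) ResNet convolution would keep a 4-D weight, which would be consistent with the report that only the grouped-convolution model hits this error.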

Failing code:

import cv2
import numpy as np
import mxnet as mx
from collections import namedtuple

# load_image
# Description: load image for imagenet model testing
# Returns formatted image as numpy array
def load_image():
    fname = mx.test_utils.download('https://github.com/dmlc/web-data/blob/master/mxnet/doc/tutorials/python/predict_image/cat.jpg?raw=true')
    img = cv2.imread(fname)
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    img = cv2.resize(img, (224, 224,))
    img = np.swapaxes(img, 0, 2)
    img = np.swapaxes(img, 1, 2)
    img = img[np.newaxis, :]
    return img

# download model
mx.test_utils.download('http://data.mxnet.io/models/imagenet/resnext/50-layers/resnext-50-symbol.json')
mx.test_utils.download('http://data.mxnet.io/models/imagenet/resnext/50-layers/resnext-50-0000.params')

ctx = mx.cpu()

# load model
sym, arg_params, aux_params = mx.model.load_checkpoint('resnext-50', 0)
mod = mx.mod.Module(symbol=sym, context=ctx)

# bind model
mod.bind(for_training=False, data_shapes=[('data', (1,3,224,224))], label_shapes=mod._label_shapes)
mod.set_params(arg_params, aux_params, allow_missing=True)

# setup batch
img = load_image()
Batch = namedtuple('Batch', ['data'])
data = Batch([mx.nd.array(img)])

# inference
mod.forward(data, is_train=False)
results = mod.get_outputs()[0].asnumpy()
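The module is bound with an NCHW data shape of (1, 3, 224, 224), while cv2.imread returns an (H, W, C) array; the two np.swapaxes calls in load_image perform that conversion. The sketch below isolates that step in a hypothetical helper, hwc_to_nchw, and uses a random array in place of cat.jpg (an assumption made so the check needs neither OpenCV nor a download). It confirms the two swaps are equivalent to a single transpose(2, 0, 1).

```python
import numpy as np


def hwc_to_nchw(img):
    """Convert an (H, W, C) image array to a (1, C, H, W) batch, as load_image does."""
    # Same two swaps as load_image: (H, W, C) -> (C, W, H) -> (C, H, W).
    chw = np.swapaxes(np.swapaxes(img, 0, 2), 1, 2)
    # Prepend the batch axis.
    return chw[np.newaxis, :]


if __name__ == "__main__":
    # Random stand-in for the resized 224x224 RGB image.
    img = np.random.randint(0, 256, size=(224, 224, 3), dtype=np.uint8)
    batch = hwc_to_nchw(img)
    print(batch.shape)  # (1, 3, 224, 224)
    # The two swaps match a single transpose to (C, H, W).
    print(np.array_equal(batch[0], img.transpose(2, 0, 1)))  # True
```

Because the resized image is square, a wrong axis order would not change the shape; comparing against transpose(2, 0, 1) is what actually verifies the layout.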

@PatricZhao @ZhennanQin @TaoLv @juliusshufan, can you please help debug this?

@mxnet-label-bot (Contributor)

Hey, this is the MXNet Label Bot.
Thank you for submitting the issue! I will try and suggest some labels so that the appropriate MXNet community members can help resolve it.
Here are my recommended labels: Installation, Build

@vrakesh (Contributor) commented Aug 2, 2019

@mxnet-label-bot add [MKLDNN]

@pengzhao-intel (Contributor)

@samskalicky, thanks for reporting the issue.
@TaoLv, please take a look. If this is a valid issue, we need to fix it in 1.5.1.

@TaoLv (Member) commented Aug 3, 2019

@samskalicky Could you try the latest nightly build? I think it should be fixed on the master branch already.

pip install mxnet-mkl --pre

@samskalicky (Contributor, Author)

Thanks @TaoLv, it works with mxnet-mkl --pre.

Can you share the PR that has the fix? Is it a fix that can go into 1.5.1, or is it part of a new feature that will have to wait until 1.6.0?

@pengzhao-intel (Contributor)

@samskalicky, is this from a real customer case? If so, we need to pick up the fix for 1.5.1.

@samskalicky (Contributor, Author)

@pengzhao-intel, this is from a public model in the MXNet model zoo. Can you share the PR with the fix and describe how complex it would be to include it in the 1.5.1 patch release?

@TaoLv (Member) commented Aug 6, 2019

It should be #15692. However, since v1.5.x uses a different version of MKL-DNN, could you apply this patch to v1.5.x manually and check whether it fixes the issue there? If so, I will add it to the list for the 1.5.1 patch release. Thank you!

@samskalicky (Contributor, Author) commented Aug 6, 2019 via email

@samskalicky (Contributor, Author)

@TaoLv, I built the v1.5.x branch and confirmed that it fails. I then cherry-picked your PR, rebuilt, and the test passed! Let's include this in the 1.5.1 patch release.

@TaoLv (Member) commented Aug 6, 2019

Thanks for the prompt response, @samskalicky. I will update the wiki page accordingly.

@TaoLv (Member) commented Aug 8, 2019

@samskalicky, #15801 has been merged into the v1.5.x branch, so I'm closing this issue. Thanks!

TaoLv closed this as completed on Aug 8, 2019