This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

MKLDNN-1.0 doesn't support slice operator for Large Tensor #16732

Closed
access2rohit opened this issue Nov 5, 2019 · 15 comments

Comments

@access2rohit
Contributor

access2rohit commented Nov 5, 2019

Description

When MXNet is built for CPU with MKL and MKLDNN enabled, the slice operator doesn't work for large tensors.

Error Message

could not initialize a sub-memory

To Reproduce

Use an MXNet CPU build from master with MKL and MKLDNN enabled.

Steps to reproduce


  1. Run command MXNET_TEST_COUNT=1 nosetests --logging-level=DEBUG --verbose -s tests/nightly/test_large_array.py:test_slice
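
For reference, a rough Python sketch of the kind of slice this nightly test exercises (the shapes and begin/end values below are assumptions for illustration, not copied from the test file):

import mxnet as mx

# Assumed sizes: more than 2^32 elements overall, so indexing needs the
# int64 large-tensor support (USE_INT64_TENSOR_SIZE=1).
LARGE_X, SMALL_Y = 100000000, 50
a = mx.nd.ones(shape=(LARGE_X, SMALL_Y))
# Slice the last 1000 rows and drop the first column, matching the 1000x49
# reorder seen in the MKL-DNN verbose logs further down this thread.
res = mx.nd.slice(a, begin=(LARGE_X - 1000, 1), end=(LARGE_X, SMALL_Y))
res.wait_to_read()  # force execution; the "could not initialize a sub-memory" error is reported here
assert res.shape == (1000, SMALL_Y - 1)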

Environment

Ubuntu 16.04 DeepLearning AMI

We recommend using our script for collecting the diagnostic information. Run the following command and paste the outputs below:

curl --retry 10 -s https://raw.githubusercontent.com/dmlc/gluon-nlp/master/tools/diagnose.py | python


----------Python Info----------
Version      : 3.6.4
Compiler     : GCC 7.2.0
Build        : ('default', 'Jan 16 2018 18:10:19')
Arch         : ('64bit', '')
------------Pip Info-----------
Version      : 18.0
Directory    : /home/ubuntu/anaconda3/lib/python3.6/site-packages/pip
----------MXNet Info-----------
/home/ubuntu/anaconda3/lib/python3.6/site-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  from ._conv import register_converters as _register_converters
Version      : 1.6.0
Directory    : /home/ubuntu/incubator-mxnet/python/mxnet
Num GPUs     : 0
Hashtag not found. Not installed from pre-built package.
----------System Info----------
Platform     : Linux-4.4.0-1095-aws-x86_64-with-debian-stretch-sid
system       : Linux
node         : ip-172-31-82-110
release      : 4.4.0-1095-aws
version      : #106-Ubuntu SMP Wed Sep 18 13:33:48 UTC 2019
----------Hardware Info----------
machine      : x86_64
processor    : x86_64
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                32
On-line CPU(s) list:   0-31
Thread(s) per core:    2
Core(s) per socket:    16
Socket(s):             1
NUMA node(s):          1
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 79
Model name:            Intel(R) Xeon(R) CPU E5-2686 v4 @ 2.30GHz
Stepping:              1
CPU MHz:               2700.882
CPU max MHz:           3000.0000
CPU min MHz:           1200.0000
BogoMIPS:              4600.08
Hypervisor vendor:     Xen
Virtualization type:   full
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              46080K
NUMA node0 CPU(s):     0-31
Flags:                 fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology nonstop_tsc aperfmperf pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single kaiser fsgsbase bmi1 hle avx2 smep bmi2 erms invpcid rtm rdseed adx xsaveopt
----------Network Test----------
Setting timeout: 10
Timing for MXNet: https://github.com/apache/incubator-mxnet, DNS: 0.0127 sec, LOAD: 0.4722 sec.
Timing for GluonNLP GitHub: https://github.com/dmlc/gluon-nlp, DNS: 0.0003 sec, LOAD: 0.3578 sec.
Timing for GluonNLP: http://gluon-nlp.mxnet.io, DNS: 0.0976 sec, LOAD: 0.0698 sec.
Timing for D2L: http://d2l.ai, DNS: 0.0259 sec, LOAD: 0.1256 sec.
Timing for D2L (zh-cn): http://zh.d2l.ai, DNS: 0.1084 sec, LOAD: 0.1252 sec.
Timing for FashionMNIST: https://repo.mxnet.io/gluon/dataset/fashion-mnist/train-labels-idx1-ubyte.gz, DNS: 0.0288 sec, LOAD: 0.4309 sec.
Timing for PYPI: https://pypi.python.org/pypi/pip, DNS: 0.0025 sec, LOAD: 0.0944 sec.
Timing for Conda: https://repo.continuum.io/pkgs/free/, DNS: 0.0015 sec, LOAD: 0.0324 sec.
@access2rohit access2rohit changed the title MKLDNN-1.0 doesn't support slice operator MKLDNN-1.0 doesn't support slice operator for Large Tensor Nov 5, 2019
@access2rohit
Contributor Author

@pengzhao-intel @TaoLv

@access2rohit
Contributor Author

@mxnet-label-bot add [MKLDNN]

@pengzhao-intel
Contributor

@rongzha1 @wuxun-zhang could you take a look ASAP?

@pengzhao-intel
Contributor

@access2rohit is this necessary for r1.6, or can we fix it in master?

@wuxun-zhang
Contributor

@pengzhao-intel I am looking into this.

@rongzha1
Contributor

rongzha1 commented Nov 6, 2019

Can't reproduce this case on our Skylake machine. Will keep debugging.
(mxnet) [rongzha1@mlt-skx141 rong_git_mxnet]$ MXNET_TEST_COUNT=1 nosetests --logging-level=DEBUG --verbose -s tests/nightly/test_large_array.py:test_slice
test_large_array.test_slice ... mkldnn_verbose,info,Intel(R) MKL-DNN v1.0.0 (Git Hash 553c23fc020dfda19f8145e92e57b0e40ecdff56),Intel(R) Advanced Vector Extensions 512 (Intel(R) AVX-512) with AVX512BW, AVX512VL, and AVX512DQ extensions
mkldnn_verbose,create,cpu,reorder,simple:any,undef,src_f32:0:blocked:ab:f0 dst_f32::blocked:ab:f0,num:1,1000x49,0.00610352
mkldnn_verbose,exec,cpu,reorder,simple:any,undef,src_f32:0:blocked:ab:f0 dst_f32::blocked:ab:f0,num:1,1000x49,0.092041
ok

@rongzha1
Contributor

rongzha1 commented Nov 6, 2019

@access2rohit could you add some backtrace (bt) info? Thanks.

@TaoLv
Member

TaoLv commented Nov 6, 2019

@rongzha1 Have you ever tried to build MXNet with USE_INT64_TENSOR_SIZE=1?

@wuxun-zhang
Contributor

@TaoLv Yes, we have enabled the int64 flag; the build command is make USE_OPENCV=1 USE_MKLDNN=1 USE_BLAS=mkl USE_INT64_TENSOR_SIZE=1 USE_INTEL_PATH=/opt/intel -j.

I also cannot reproduce this issue. However, I found another bug in the offset assignment of the slice operator and will file a PR soon.

@wuxun-zhang
Contributor

Also tested on an AWS EC2 m5.8 instance and found no error (master commit 3c404a5).

ubuntu@ip-172-29-133-38:~/incubator-mxnet/tests/nightly$ MXNET_TEST_COUNT=1 nosetests --logging-level=DEBUG --verbose -s test_large_array.py:test_slice
test_large_array.test_slice ... mkldnn_verbose,info,Intel MKL-DNN v1.0.4 (commit a0a87d662edeef38d01db4ac5dd25f59a1f0881f)
mkldnn_verbose,info,Detected ISA is Intel AVX-512 with AVX512BW, AVX512VL, and AVX512DQ extensions
mkldnn_verbose,exec,cpu,reorder,simple:any,undef,src_f32:0:blocked:ab:f0 dst_f32::blocked:ab:f0,num:1,1000x49,0.045166
ok

----------------------------------------------------------------------
Ran 1 test in 1.625s

OK

Env Configuration:

----------Python Info----------
Version      : 3.6.6
Compiler     : GCC 7.2.0
Build        : ('default', 'Jun 28 2018 17:14:51')
Arch         : ('64bit', '')
------------Pip Info-----------
Version      : 19.3.1
Directory    : /home/ubuntu/anaconda3/lib/python3.6/site-packages/pip
----------MXNet Info-----------
Version      : 1.6.0
Directory    : /home/ubuntu/incubator-mxnet/python/mxnet
Num GPUs     : 0
Hashtag not found. Not installed from pre-built package.
----------System Info----------
Platform     : Linux-4.4.0-1096-aws-x86_64-with-debian-stretch-sid
system       : Linux
node         : ip-172-29-133-38
release      : 4.4.0-1096-aws
version      : #107-Ubuntu SMP Thu Oct 3 01:51:58 UTC 2019
----------Hardware Info----------
machine      : x86_64
processor    : x86_64
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                32
On-line CPU(s) list:   0-31
Thread(s) per core:    2
Core(s) per socket:    16
Socket(s):             1
NUMA node(s):          1
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 85
Model name:            Intel(R) Xeon(R) Platinum 8175M CPU @ 2.50GHz
Stepping:              4
CPU MHz:               2499.998
BogoMIPS:              4999.99
Hypervisor vendor:     KVM
Virtualization type:   full
L1d cache:             32K
L1i cache:             32K
L2 cache:              1024K
L3 cache:              33792K
NUMA node0 CPU(s):     0-31
Flags:                 fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology nonstop_tsc aperfmperf tsc_known_freq pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single kaiser fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx avx512f rdseed adx smap clflushopt clwb avx512cd xsaveopt xsavec xgetbv1 ida arat pku
----------Network Test----------
Setting timeout: 10
Timing for MXNet: https://github.com/apache/incubator-mxnet, DNS: 0.0013 sec, LOAD: 0.4353 sec.
Timing for GluonNLP GitHub: https://github.com/dmlc/gluon-nlp, DNS: 0.0002 sec, LOAD: 0.5455 sec.
Timing for GluonNLP: http://gluon-nlp.mxnet.io, DNS: 0.1184 sec, LOAD: 0.0858 sec.
Timing for D2L: http://d2l.ai, DNS: 0.0190 sec, LOAD: 0.2150 sec.
Timing for D2L (zh-cn): http://zh.d2l.ai, DNS: 0.0027 sec, LOAD: 0.1508 sec.
Timing for FashionMNIST: https://repo.mxnet.io/gluon/dataset/fashion-mnist/train-labels-idx1-ubyte.gz, DNS: 0.0250 sec, LOAD: 0.4320 sec.
Timing for PYPI: https://pypi.python.org/pypi/pip, DNS: 0.0011 sec, LOAD: 0.1035 sec.
Timing for Conda: https://repo.continuum.io/pkgs/free/, DNS: 0.0017 sec, LOAD: 0.0272 sec.

@access2rohit
Contributor Author

access2rohit commented Nov 6, 2019

@pengzhao-intel I tried with branch v1.6.x. @rongzha1 can you try with that branch too?
Also, I never got this message in my run:

mkldnn_verbose,info,Intel MKL-DNN v1.0.4 (commit a0a87d662edeef38d01db4ac5dd25f59a1f0881f)
mkldnn_verbose,info,Detected ISA is Intel AVX-512 with AVX512BW, AVX512VL, and AVX512DQ extensions
mkldnn_verbose,exec,cpu,reorder,simple:any,undef,src_f32:0:blocked:ab:f0 dst_f32::blocked:ab:f0,num:1,1000x49,0.045166

It seems this was run on an instance that supports AVX-512. Can you try on an instance that has only AVX2?

@access2rohit
Contributor Author

access2rohit commented Nov 6, 2019

@access2rohit is this necessary for r1.6, or can we fix it in master?

@pengzhao-intel
Yes, it is very important for Deep Graph Library (DGL) support, which relies heavily on the slice operator.

@wuxun-zhang
Contributor

wuxun-zhang commented Nov 6, 2019

You can try setting export MKLDNN_VERBOSE=1 to get these logs.
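
For example, by prefixing the same nightly test run with the flag:

MKLDNN_VERBOSE=1 MXNET_TEST_COUNT=1 nosetests --logging-level=DEBUG --verbose -s tests/nightly/test_large_array.py:test_slice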

Also, I just filed a PR related to the slice op, but I'm not sure if it will resolve this issue. Could you help double-check?

@access2rohit
Contributor Author

access2rohit commented Nov 7, 2019

PR #16737 fixes the issue.

@wuxun-zhang
Contributor

Glad it works.
