Inconsistent behavior in split() when num_outputs=1 #14064

jgasthaus · 2019-02-04T11:21:40Z

Description

The behaviour of NDArray.split() is inconsistent and surprising when num_outputs=1: When num_outputs > 1, split() returns a list containing the individual split elements, but when num_outputs=1, the single resulting array is returned directly, without being wrapped in a list.

In[42]: type(nd.ones((2, 1, 4, 5)).split(num_outputs=1, axis=1))
Out[42]: mxnet.ndarray.ndarray.NDArray

In[43]: type(nd.ones((2, 2, 4, 5)).split(num_outputs=2, axis=1))
Out[43]: list

If this is the intended behavior, it appears to be undocumented.

Environment info (Required)

----------Python Info----------
Version      : 3.6.1
Compiler     : GCC 4.2.1 Compatible Apple LLVM 8.0.0 (clang-800.0.42.1)
Build        : ('default', 'Mar 23 2017 16:49:01')
Arch         : ('64bit', '')
------------Pip Info-----------
Version      : 18.1
Directory    : ***/lib/python3.6/site-packages/pip
----------MXNet Info-----------
Version      : 1.3.1
Directory    : ***/lib/python3.6/site-packages/mxnet
Commit Hash   : 19c501680183237d52a862e6ae1dc4ddc296305b
----------System Info----------
Platform     : Darwin-15.6.0-x86_64-i386-64bit
system       : Darwin
node         : ***
release      : 15.6.0
version      : Darwin Kernel Version 15.6.0: Thu Jun 21 20:07:40 PDT 2018; root:xnu-3248.73.11~1/RELEASE_X86_64
----------Hardware Info----------
machine      : x86_64
processor    : i386
b'machdep.cpu.brand_string: Intel(R) Core(TM) i5-5257U CPU @ 2.70GHz'
b'machdep.cpu.features: FPU VME DE PSE TSC MSR PAE MCE CX8 APIC SEP MTRR PGE MCA CMOV PAT PSE36 CLFSH DS ACPI MMX FXSR SSE SSE2 SS HTT TM PBE SSE3 PCLMULQDQ DTES64 MON DSCPL VMX EST TM2 SSSE3 FMA CX16 TPR PDCM SSE4.1 SSE4.2 x2APIC MOVBE POPCNT AES PCID XSAVE OSXSAVE SEGLIM64 TSCTMR AVX1.0 RDRAND F16C'
b'machdep.cpu.leaf7_features: SMEP ERMS RDWRFSGS TSC_THREAD_OFFSET BMI1 AVX2 BMI2 INVPCID SMAP RDSEED ADX IPT FPU_CSDS'
b'machdep.cpu.extfeatures: SYSCALL XD 1GBPAGE EM64T LAHF LZCNT PREFETCHW RDTSCP TSCI'
----------Network Test----------
Setting timeout: 10
Timing for MXNet: https://github.com/apache/incubator-mxnet, DNS: 0.4511 sec, LOAD: 0.6549 sec.
Timing for Gluon Tutorial(en): http://gluon.mxnet.io, DNS: 0.0768 sec, LOAD: 1.3912 sec.
Timing for Gluon Tutorial(cn): https://zh.gluon.ai, DNS: 0.0529 sec, LOAD: 1.3386 sec.
Timing for FashionMNIST: https://apache-mxnet.s3-accelerate.dualstack.amazonaws.com/gluon/dataset/fashion-mnist/train-labels-idx1-ubyte.gz, DNS: 0.0608 sec, LOAD: 1.5901 sec.
Timing for PYPI: https://pypi.python.org/pypi/pip, DNS: 0.0612 sec, LOAD: 1.2379 sec.
Timing for Conda: https://repo.continuum.io/pkgs/free/, DNS: 0.0534 sec, LOAD: 1.0818 sec.

Package used (Python/R/Scala/Julia):
Python

mxnet-label-bot · 2019-02-04T11:21:44Z

Hey, this is the MXNet Label Bot.
Thank you for submitting the issue! I will try and suggest some labels so that the appropriate MXNet community members can help resolve it.
Here are my recommended labels: Bug

ChaiBapchya · 2019-02-04T22:28:39Z

@jgasthaus Thanks for pointing it out. Yes, you are right, it can be documented.
Do you want to push a PR on the same? Looks like a minor doc change. I can help you with that if you want.

It is not a bug, since it follows the definition.
The same behavior is also reflected in numpy.split:
https://docs.scipy.org/doc/numpy-1.15.1/reference/generated/numpy.split.html

@mxnet-label-bot add [Question, Doc, Good First Issue]

szha · 2019-02-05T02:35:29Z

To me, such a diverging return type is an anti-pattern that complicates all downstream code, and it should be addressed as part of MXNet 2.0 #9686.

jgasthaus · 2019-02-05T12:27:14Z

@ChaiBapchya Sure, I can try to submit a doc-change PR.

I agree with @szha that this might be a good candidate for a breaking API change in 2.0. We have a few places in our code where we now have to branch because of this. It can also lead to fairly subtle bugs, especially since both lists and NDArrays support indexing, so you might not notice that you are indexing the returned array along the 0-th axis as opposed to the list as intended.

Also note that this is not consistent with numpy's split() behavior:

In[42]: type(np.split(np.ones((2, 1, 4, 5)), 1, axis=1))
Out[42]: list
In[49]: type(np.split(np.ones((2, 2, 4, 5)), 2, axis=1))
Out[49]: list
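
One way to confine the branching described above to a single place is a small normalizing helper. The following is only a sketch: `as_list` is a hypothetical name, not part of MXNet, and it assumes the 1.x behavior reported in this issue (a bare array when num_outputs=1, a list otherwise).

```python
def as_list(split_result):
    """Return the result of a split-like call as a list.

    Hypothetical helper, not part of MXNet. Assumes the call returns
    either a list/tuple of arrays or one bare array.
    """
    if isinstance(split_result, (list, tuple)):
        return list(split_result)
    # A single bare object (the num_outputs=1 case): wrap it in a list.
    return [split_result]
```

With such a helper, a call like `as_list(x.split(num_outputs=n, axis=1))` would yield a list for every n, so downstream code could index or iterate without checking the type first.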

jgasthaus · 2019-02-05T13:13:48Z

Actually, I just noticed that the behavior differs between mx.symbol and mx.ndarray, so maybe this should be considered a bug after all.

In[1]: x = mx.sym.ones((2, 2)).split(num_outputs=1, axis=0)[0]
In[2]: ex = x.simple_bind(mx.cpu())
In[3]: ex.forward()
Out[3]: 
[
 [[1. 1.]
  [1. 1.]]
 <NDArray 2x2 @cpu(0)>]

In[4]: mx.nd.ones((2, 2)).split(num_outputs=1, axis=0)[0]
Out[4]: 
[1. 1.]
<NDArray 2 @cpu(0)>
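
The effect of the trailing `[0]` can be reproduced without MXNet. In this sketch, plain nested lists stand in for a 2x2 array (an assumption made purely for illustration): indexing a list-wrapped result selects the whole array, while indexing a bare result selects only its first row, which is the shape difference seen in the two outputs above.

```python
# Plain nested lists stand in for a 2x2 array of ones (illustration only).
array_2x2 = [[1.0, 1.0], [1.0, 1.0]]

# List-style return: the single output is wrapped in a list.
wrapped = [array_2x2]
# Bare return: the single output is handed back directly.
bare = array_2x2

first_of_wrapped = wrapped[0]  # the whole 2x2 array
first_of_bare = bare[0]        # only the first row

print(first_of_wrapped)  # [[1.0, 1.0], [1.0, 1.0]]
print(first_of_bare)     # [1.0, 1.0]
```

Both expressions succeed without raising an error, which is why the mismatch tends to go unnoticed until the shapes are inspected.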

ChaiBapchya · 2019-02-05T16:48:20Z

Delving a bit deeper, here's what I found:

Similar behavior between NDArray and Symbol for num_outputs=2

NDArray num_outputs=2

>>> b=mx.nd.ones((2, 2)).split(num_outputs=2, axis=0)
>>> b
[
[[1. 1.]]
<NDArray 1x2 @cpu(0)>, 
[[1. 1.]]
<NDArray 1x2 @cpu(0)>]

Symbol num_outputs=2

>>> x = mx.sym.ones((2, 2)).split(num_outputs=2, axis=0)
>>> ex = x.simple_bind(mx.cpu())
>>> ex.forward()
[
[[1. 1.]]
<NDArray 1x2 @cpu(0)>, 
[[1. 1.]]
<NDArray 1x2 @cpu(0)>]

Abnormal behavior between NDArray and Symbol for num_outputs=1

NDArray num_outputs=1

>>> b=mx.nd.ones((2, 2)).split(num_outputs=1, axis=0)
>>> b

[[1. 1.]
 [1. 1.]]
<NDArray 2x2 @cpu(0)>

Symbol num_outputs=1

>>> x = mx.sym.ones((2, 2)).split(num_outputs=1, axis=0)[0]
>>> ex = x.simple_bind(mx.cpu())
>>> ex.forward()
[
[[1. 1.]
 [1. 1.]]
<NDArray 2x2 @cpu(0)>]

szha · 2020-07-29T18:53:24Z

In 2.x we will focus on the numpy array instead of ndarray. np.split now behaves consistently:

In [15]: mx.np.split(a, 2)
Out[15]: [array([[1., 1., 1.]]), array([[1., 1., 1.]])]

In [16]: mx.np.split(a, 1)
Out[16]:
[array([[1., 1., 1.],
        [1., 1., 1.]])]

marcoabreu added Doc good first issue Question labels Feb 4, 2019

hzfan mentioned this issue Apr 26, 2019: tuple unpacking inconsistent for symbol and ndarray namespaces #14695 (Open)

szha added v1.x Targeting v1.x branch NDArray won't fix and removed good first issue labels Jul 29, 2020

szha closed this as completed Jul 29, 2020
