This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

mx.sym.argsort() cannot sort array with large tensor #7510

Closed
Yre opened this issue Aug 17, 2017 · 6 comments · Fixed by dmlc/mshadow#285

Comments

@Yre

Yre commented Aug 17, 2017


Environment info

Operating System: Ubuntu 14.04
Compiler: GCC 4.4.7
Package used (Python/R/Scala/Julia): Python
MXNet version: 0.11.0
MXNet commit hash: 568b5a2

Minimum reproducible example


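import mxnet as mx
import numpy as np

# argsort along the last axis (length 2048)
value = mx.sym.Variable('data')
sorted_adj = mx.sym.argsort(value, axis=2)

# 32 x 2048 x 2048 = 2^27 elements in total
coord_data = np.random.rand(32, 2048, 2048)
coord_blob = mx.nd.array(coord_data, mx.gpu())
e = sorted_adj.bind(mx.gpu(), {'data': coord_blob})
y = e.forward()

result = y[0].asnumpy()

# Each row of the argsort output should be a permutation of 0..2047,
# so every entry of vis should be set and cnt should end up 0.
vis = np.zeros(2048)
for i in range(2048):
    vis[int(result[4, 0, i])] = 1
cnt = 0
for i in range(2048):
    if vis[i] == 0:
        cnt += 1
print(cnt)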

Steps to reproduce

  1. The code above prints a nonzero cnt. It should print 0, because each row of the argsort output should contain every integer from 0 to 2047 exactly once.
  2. If result[4, 0, i] is changed to result[3, 0, i], the result is correct again.
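The two loops in the reproduction check whether one row of the argsort output is a permutation of 0 to 2047. A vectorized version can check every batch at once, which makes it easy to see exactly where the output starts going wrong. This is a NumPy-only sketch; `count_missing` is a hypothetical helper, and it assumes every index is already in the range 0 to n - 1.

```python
import numpy as np


def count_missing(indices):
    """Hypothetical helper: for each row along the last axis, count how many of
    0..n-1 never appear. Assumes every index is already in 0..n-1."""
    idx = np.asarray(indices).astype(np.int64)
    n = idx.shape[-1]
    flat = idx.reshape(-1, n)
    seen = np.zeros(flat.shape, dtype=bool)
    # Mark each (row, index) pair that occurs; broadcasting pairs every row with its indices.
    seen[np.arange(flat.shape[0])[:, None], flat] = True
    return (n - seen.sum(axis=1)).reshape(idx.shape[:-1])


rng = np.random.default_rng(0)
perm = np.argsort(rng.random((3, 2, 8)), axis=-1)  # every row is a permutation
broken = perm.copy()
broken[1, 0, 0] = broken[1, 0, 1]                   # duplicate one index in one row
print(count_missing(perm).sum(), count_missing(broken)[1, 0])
```

Applied to the reproduction, `count_missing(result[:, 0, :])` would give one count per batch; according to the report, the count for batch 3 should be 0 and the count for batch 4 should not.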

What have you tried to solve it?

1. If coord_data has shape (10000, 2048) and argsort uses axis=1, then result[8000, :] covers 0 to 2047 as expected, but vis[int(result[9000, i])] is incorrect again. (I suspect this is because 8000 < 4*2048 = 8192 < 9000.)
2. Several tests suggest that argsort only handles the first 2^24 elements correctly.
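The 2^24 threshold matches the precision of a 32-bit float, and a little arithmetic shows which batch or row should be the first to break. This sketch assumes the indices are held as float32 and counted from the start of the whole flattened tensor; `first_unsafe_row` is a hypothetical helper, not part of MXNet.

```python
import numpy as np

# float32 has a 24-bit significand: every integer up to 2**24 is exact,
# and 2**24 + 1 is the first integer that gets rounded.
assert int(np.float32(2**24)) == 2**24
assert int(np.float32(2**24 + 1)) == 2**24


def first_unsafe_row(row_len, first_inexact=2**24 + 1):
    """Hypothetical helper: first row whose flat indices reach `first_inexact`.

    Row r covers flat indices r*row_len .. (r+1)*row_len - 1, so it is unsafe
    once (r+1)*row_len - 1 >= first_inexact.
    """
    # Smallest r with (r + 1) * row_len >= first_inexact + 1, via ceil division.
    return -(-(first_inexact + 1) // row_len) - 1


print(first_unsafe_row(2048 * 2048))  # batch index for shape (32, 2048, 2048): 4
print(first_unsafe_row(2048))         # row index for shape (10000, 2048): 8192
```

Both numbers agree with the report: batch 3 and row 8000 come out right, while batch 4 and row 9000 do not.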

@piiswrong
Contributor

piiswrong commented Aug 17, 2017

@sxjscience @reminisce

@sxjscience
Member

I've confirmed that this is a bug. I'll look into the cause; it may be related to the way I implemented the batched sort.
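To see why the index values can grow so large, here is a hypothetical sketch of a batched argsort in NumPy. It assumes, as the thread suggests, that all rows are sorted in one pass over the flattened tensor using global flat indices, which are then converted back to per-row positions; it is not MXNet's actual kernel. Because float32 only runs out of exact integers after 2^24, the demo uses float16 (exact only up to 2^11 = 2048) as a scaled-down stand-in for the index type so the failure appears on a small array.

```python
import numpy as np


def batched_argsort(data, index_dtype=np.int64):
    """Hypothetical sketch: argsort every row of a 2-D array with one global sort.

    The index buffer holds *global* flat indices stored in `index_dtype`,
    mimicking an index buffer kept in a floating-point type.
    """
    n_rows, n_cols = data.shape
    flat = data.ravel()
    # Global flat index of every element, stored in the chosen dtype.
    idx = np.arange(flat.size).astype(index_dtype)
    # Row (batch) id of every element.
    row_id = np.repeat(np.arange(n_rows), n_cols)
    # np.lexsort uses the last key as the primary key: group by row, then by value.
    order = np.lexsort((flat, row_id))
    # Map global flat indices back to column positions within each row.
    return (idx[order].astype(np.int64) % n_cols).reshape(n_rows, n_cols)


rng = np.random.default_rng(0)
data = rng.random((4, 1024))
exact = batched_argsort(data)                          # integer index buffer
lossy = batched_argsort(data, index_dtype=np.float16)  # float16 is exact only up to 2**11
# Rows 0 and 1 use flat indices below 2048 and match; rows 2 and 3 do not.
print([np.array_equal(exact[r], lossy[r]) for r in range(4)])  # [True, True, False, False]
```

In this sketch, sorting globally keeps the work in a single pass, but it ties the index range to the size of the whole tensor rather than to the length of one row, which is exactly where a limited-precision index type breaks down.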

@sxjscience
Member

sxjscience commented Aug 18, 2017

I'll try switching to CUB.

@sxjscience
Member

OK, I've found the problem. It's this line: https://github.com/apache/incubator-mxnet/blob/master/src/operator/tensor/ordering_op-inl.h#L207. When there are more than 2^24 elements, real_t (a 32-bit float by default) is not precise enough to store all the index values exactly.
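The precision loss can be reproduced in NumPy alone. This sketch assumes real_t is a 32-bit float and uses `np.float32` as a stand-in for it; `round_trip_indices` is a hypothetical helper, not part of MXNet or mshadow. It also shows that an integer index type keeps the values exact, which is one way to avoid the problem; it is not a reproduction of the actual fix in dmlc/mshadow#285.

```python
import numpy as np


def round_trip_indices(start, count, dtype):
    """Hypothetical helper: store flat indices start..start+count-1 in `dtype`
    and read them back as int64, as an index buffer would."""
    idx = np.arange(start, start + count, dtype=np.int64)
    return idx.astype(dtype).astype(np.int64)


base = 2 ** 24
expected = np.arange(base, base + 8, dtype=np.int64)

# float32 stands in for real_t: odd indices above 2**24 get rounded.
as_float = round_trip_indices(base, 8, np.float32)
# An integer index buffer keeps every value exact (int32 holds up to 2**31 - 1).
as_int = round_trip_indices(base, 8, np.int32)

print("float32 offsets:", as_float - base)
print("int32 exact:", np.array_equal(as_int, expected))
```

Because several distinct indices collapse onto the same float32 value, the rows that live beyond 2^24 in the flattened tensor can no longer be a permutation, which is the symptom in the original report.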

@sxjscience
Member

I'll open a PR with the fix after the mshadow side is merged.

@Yre
Author

Yre commented Aug 21, 2017

Thank you so much for your timely update! @sxjscience
