
Add the argsort operator #11174

Merged: 13 commits into PaddlePaddle:develop on Jul 2, 2018
Conversation

@kuke (Contributor) commented Jun 5, 2018

Resolve #11399

@reyoung (Collaborator) left a comment

BTW, maybe thrust::sort can be used for GPU implementation.

https://thrust.github.io/doc/group__sorting.html#ga1099d781e06c43805be06a918f7b7499

"Output(Indices) of ArgsortOp should not be null.");

auto in_dims = ctx->GetInputDim("X");
int axis = static_cast<int>(ctx->Attrs().Get<int>("axis"));
Contributor

Remove static_cast<int>() .

Contributor Author

Done

AddInput("X", "(Tensor) The input of Argsort op.");
AddOutput("Out", "(Tensor) The sorted tensor of Argsort op.");
AddOutput("Indices",
"(Tensor) The indices of a tensor giving the sorted order.");
Contributor

Give the shape for Out and Indices.
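
For reference, a sketch of the documentation change the reviewer asks for (wording assumed, not the PR's final text; both outputs of argsort have the same shape as the input):

AddInput("X", "(Tensor) The input of Argsort op.");
AddOutput("Out",
          "(Tensor) The sorted tensor of Argsort op, with the same "
          "shape as Input(X).");
AddOutput("Indices",
          "(Tensor) The indices of a tensor giving the sorted order, "
          "with the same shape as Input(X).");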

Contributor Author

Done

auto* input = ctx.Input<framework::Tensor>("X");
auto* output = ctx.Output<framework::Tensor>("Out");
auto* indices = ctx.Output<framework::Tensor>("Indices");
int axis = static_cast<int>(ctx.Attr<int>("axis"));
Contributor

Remove static_cast<int>()

Contributor Author

Done

PADDLE_ENFORCE(axis >= 0 || axis == -1,
"Attr(axis) %d of ArgsortOp must be nonnegative or equal to "
"-1.",
axis);
Contributor

If axis < 0, we can reset it to axis = in_dims.size() + axis, so negative values are not limited to -1.
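
A minimal sketch of the normalization described above (variable names assumed from the surrounding InferShape code; not the PR's exact text):

// Normalize a negative axis instead of special-casing -1.
if (axis < 0) axis += in_dims.size();
PADDLE_ENFORCE(axis >= 0 && axis < in_dims.size(),
               "Attr(axis) %d of ArgsortOp is out of bounds for an input of "
               "rank %d.",
               axis, in_dims.size());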

Contributor Author

Done

out_data[index] = in_vec[j].first;
idx_data[index] = in_vec[j].second;
}
}
@sneaxiy (Collaborator) commented Jun 11, 2018

Lines 40-73 can be changed to be more efficient and to use less memory:

int64_t part_dims_prod = input->numel() / in_dims[axis];
int64_t step = 1;
for (int64_t i = in_dims.size() - 1; i > axis; --i) step *= in_dims[i];

std::vector<int64_t> org_index_vec(in_dims[axis]);
std::vector<int64_t> idx_vec(in_dims.size());
idx_vec[axis] = 0;
for (int64_t i = 0; i < part_dims_prod; ++i) {
  // Decode the flat slice id i into coordinates of every non-axis dimension.
  int64_t idx = i;
  for (int64_t dim = in_dims.size() - 1; dim >= 0; --dim) {
    if (dim != axis) {
      idx_vec[dim] = idx % in_dims[dim];
      idx /= in_dims[dim];
    }
  }

  // Flat offset of the first element of this slice (coordinate 0 along axis).
  int64_t start_index = idx_vec[0];
  for (int64_t dim = 1; dim < in_dims.size(); ++dim) {
    start_index = start_index * in_dims[dim] + idx_vec[dim];
  }

  // Flat offsets of all elements of the slice along axis.
  for (int64_t j = 0; j < in_dims[axis]; ++j) {
    org_index_vec[j] = start_index + j * step;
  }

  // Sort the offsets by the values they point to, instead of copying the values.
  std::sort(
      org_index_vec.begin(), org_index_vec.end(),
      [in_data](int64_t idx1, int64_t idx2) {
        return in_data[idx1] < in_data[idx2];
      });

  for (size_t j = 0; j < org_index_vec.size(); ++j) {
    int64_t org_index = org_index_vec[j];
    int64_t ret_index = start_index + j * step;
    out_data[ret_index] = in_data[org_index];
    idx_data[ret_index] = org_index;
  }
}

Contributor Author

Thanks! Sorting only the indices is a good idea; I made the change. Please take a look.

Collaborator

Excellent!

@kuke (Contributor Author) commented Jun 12, 2018

@reyoung Yes, we can use thrust::sort here. But it may not be as straightforward as you expect, because the objects to sort are sub-vectors whose elements have discontiguous addresses.
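
For illustration, a hypothetical sketch of the easy case only: when the sort axis is the innermost dimension, every slice is contiguous and thrust can argsort it directly. For any other axis the slice elements are strided, so this approach would first need a gather into contiguous buffers or a segmented sort.

#include <thrust/device_vector.h>
#include <thrust/sequence.h>
#include <thrust/sort.h>

// Hypothetical helper, not the PR's code: argsort each contiguous row of a
// (num_rows x row_len) device buffer, producing sorted values and their indices.
void ArgsortContiguousRows(thrust::device_vector<float>* values,
                           thrust::device_vector<int64_t>* indices,
                           int64_t num_rows, int64_t row_len) {
  for (int64_t r = 0; r < num_rows; ++r) {
    auto val_begin = values->begin() + r * row_len;
    auto idx_begin = indices->begin() + r * row_len;
    thrust::sequence(idx_begin, idx_begin + row_len);  // 0, 1, ..., row_len - 1
    // Sort the row's values and permute the index row along with them.
    thrust::sort_by_key(val_begin, val_begin + row_len, idx_begin);
  }
}

It also launches one sort per slice, which is another reason a hand-written kernel can be preferable when there are many short slices.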


auto stream = reinterpret_cast<const platform::CUDADeviceContext&>(
ctx.device_context())
.stream();
Contributor

auto stream = ctx.cuda_device_context().stream();

Contributor Author

Done

int64_t* med_ids) {
int64_t index = threadIdx.x + blockDim.x * blockIdx.x;
if (index < n) {
const int max_rank = 9; // Max rank of a tensor allowed in Fluid
@qingqing01 (Contributor) commented Jun 29, 2018

Move this constant variable before line 19.

const int kMaxRank = 6;  

Contributor Author

Do you mean outside the kernel function? Then done.
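
For context, the kernel needs a compile-time bound on tensor rank because each thread decodes its flat index into a fixed-size local coordinate array. A rough sketch with assumed names (the actual kernel in this PR may differ):

constexpr int kMaxRank = 9;  // Max rank of a tensor allowed in Fluid.

// Decode a flat element index into per-dimension coordinates (row-major).
__device__ void FlatIndexToCoords(int64_t index, const int64_t* dims, int rank,
                                  int64_t* coords) {
  for (int dim = rank - 1; dim >= 0; --dim) {
    coords[dim] = index % dims[dim];
    index /= dims[dim];
  }
}

// Inside the kernel, each thread would then use:
//   int64_t coords[kMaxRank];
//   FlatIndexToCoords(index, dims, rank, coords);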

class TestArgsortOp(OpTest):
def setUp(self):
self.init_axis()
x = np.random.random((2, 3, 4, 5)).astype("float32")
Contributor

This unit test has no gradient checking, so it is better to use a larger shape here to cover more cases, since PADDLE_CUDA_NUM_THREADS is large.

Contributor Author

Done

@@ -442,6 +443,56 @@ def argmax(x, axis=0):
return out


def argsort(input, axis=-1):
Contributor

1. This needs a unit test in https://github.com/PaddlePaddle/Paddle/blob/develop/python/paddle/fluid/tests/unittests/test_layers.py

2. Better to add a name argument:

def argsort(input, axis=-1, name=None):

Contributor Author

Done

@kuke (Contributor Author) left a comment

All done, thx!



@qingqing01 (Contributor) left a comment

LGTM.

@kuke kuke merged commit 5f79c7f into PaddlePaddle:develop Jul 2, 2018
Computer Vision: Faster-RCNN automation moved this from In Progress to Done Jul 2, 2018
kuke pushed a commit to kuke/Paddle that referenced this pull request Aug 25, 2018