
Add the argsort operator #11174

Merged: 13 commits into PaddlePaddle:develop on Jul 2, 2018
Conversation

@kuke (Contributor) commented Jun 5, 2018

Resolve #11399

@reyoung (Collaborator) left a comment

BTW, maybe thrust::sort can be used for GPU implementation.

https://thrust.github.io/doc/group__sorting.html#ga1099d781e06c43805be06a918f7b7499

"Output(Indices) of ArgsortOp should not be null.");

auto in_dims = ctx->GetInputDim("X");
int axis = static_cast<int>(ctx->Attrs().Get<int>("axis"));
Contributor

Remove static_cast<int>() .

Contributor Author

Done

AddInput("X", "(Tensor) The input of Argsort op.");
AddOutput("Out", "(Tensor) The sorted tensor of Argsort op.");
AddOutput("Indices",
"(Tensor) The indices of a tensor giving the sorted order.");
Contributor

Give the shape for Out and Indices.
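
For reference, a sketch of the documentation change the reviewer asks for (wording assumed, not the PR's final text; both outputs of argsort have the same shape as the input):

AddInput("X", "(Tensor) The input of Argsort op.");
AddOutput("Out",
          "(Tensor) The sorted tensor of Argsort op, with the same "
          "shape as Input(X).");
AddOutput("Indices",
          "(Tensor) The indices of a tensor giving the sorted order, "
          "with the same shape as Input(X).");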

Contributor Author

Done

auto* input = ctx.Input<framework::Tensor>("X");
auto* output = ctx.Output<framework::Tensor>("Out");
auto* indices = ctx.Output<framework::Tensor>("Indices");
int axis = static_cast<int>(ctx.Attr<int>("axis"));
Contributor

Remove static_cast<int>()

Contributor Author

Done

PADDLE_ENFORCE(axis >= 0 || axis == -1,
"Attr(axis) %d of ArgsortOp must be nonnegative or equal to "
"-1.",
axis);
Contributor

If axis < 0, we can reset it to axis = in_dims.size() + axis, so negative values are not limited to -1.
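
A minimal sketch of the normalization described above (variable names assumed from the surrounding InferShape code; not the PR's exact text):

// Normalize a negative axis instead of special-casing -1.
if (axis < 0) axis += in_dims.size();
PADDLE_ENFORCE(axis >= 0 && axis < in_dims.size(),
               "Attr(axis) %d of ArgsortOp is out of bounds for an input of "
               "rank %d.",
               axis, in_dims.size());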

Contributor Author

Done

out_data[index] = in_vec[j].first;
idx_data[index] = in_vec[j].second;
}
}
@sneaxiy (Collaborator) commented Jun 11, 2018

Lines 40-73 can be changed to be more efficient and to use less memory:

int64_t part_dims_prod = input->numel() / in_dims[axis];
int64_t step = 1;
for (int64_t i = in_dims.size() - 1; i > axis; --i) step *= in_dims[i];

std::vector<int64_t> org_index_vec(in_dims[axis]);
std::vector<int64_t> idx_vec(in_dims.size());
idx_vec[axis] = 0;
for (int64_t i = 0; i < part_dims_prod; ++i) {
  // Decode the flat slice id i into coordinates of every non-axis dimension.
  int64_t idx = i;
  for (int64_t dim = in_dims.size() - 1; dim >= 0; --dim) {
    if (dim != axis) {
      idx_vec[dim] = idx % in_dims[dim];
      idx /= in_dims[dim];
    }
  }

  // Flat offset of the first element of this slice (coordinate 0 along axis).
  int64_t start_index = idx_vec[0];
  for (int64_t dim = 1; dim < in_dims.size(); ++dim) {
    start_index = start_index * in_dims[dim] + idx_vec[dim];
  }

  // Flat offsets of all elements of the slice along axis.
  for (int64_t j = 0; j < in_dims[axis]; ++j) {
    org_index_vec[j] = start_index + j * step;
  }

  // Sort the offsets by the values they point to, instead of copying the values.
  std::sort(
      org_index_vec.begin(), org_index_vec.end(),
      [in_data](int64_t idx1, int64_t idx2) {
        return in_data[idx1] < in_data[idx2];
      });

  for (size_t j = 0; j < org_index_vec.size(); ++j) {
    int64_t org_index = org_index_vec[j];
    int64_t ret_index = start_index + j * step;
    out_data[ret_index] = in_data[org_index];
    idx_data[ret_index] = org_index;
  }
}

Contributor Author

Thanks! Sorting only the indices is a good idea; I made the change. Please take a look.

Collaborator

Excellent!

@kuke (Contributor Author) commented Jun 12, 2018

@reyoung Yes, we can use thrust::sort here. But it may not be as straightforward as you expect, because the objects to sort are sub-vectors whose elements have discontiguous addresses.
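
For illustration, a hypothetical sketch of the easy case only: when the sort axis is the innermost dimension, every slice is contiguous and thrust can argsort it directly. For any other axis the slice elements are strided, so this approach would first need a gather into contiguous buffers or a segmented sort.

#include <thrust/device_vector.h>
#include <thrust/sequence.h>
#include <thrust/sort.h>

// Hypothetical helper, not the PR's code: argsort each contiguous row of a
// (num_rows x row_len) device buffer, producing sorted values and their indices.
void ArgsortContiguousRows(thrust::device_vector<float>* values,
                           thrust::device_vector<int64_t>* indices,
                           int64_t num_rows, int64_t row_len) {
  for (int64_t r = 0; r < num_rows; ++r) {
    auto val_begin = values->begin() + r * row_len;
    auto idx_begin = indices->begin() + r * row_len;
    thrust::sequence(idx_begin, idx_begin + row_len);  // 0, 1, ..., row_len - 1
    // Sort the row's values and permute the index row along with them.
    thrust::sort_by_key(val_begin, val_begin + row_len, idx_begin);
  }
}

It also launches one sort per slice, which is another reason a hand-written kernel can be preferable when there are many short slices.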


auto stream = reinterpret_cast<const platform::CUDADeviceContext&>(
ctx.device_context())
.stream();
Contributor

auto stream = ctx.cuda_device_context().stream();

Contributor Author

Done

int64_t* med_ids) {
int64_t index = threadIdx.x + blockDim.x * blockIdx.x;
if (index < n) {
const int max_rank = 9; // Max rank of a tensor allowed in Fluid
@qingqing01 (Contributor) commented Jun 29, 2018

Move this constant variable before line 19.

const int kMaxRank = 6;  

Contributor Author

Do you mean outside the kernel function? Then done.
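
For context, the kernel needs a compile-time bound on tensor rank because each thread decodes its flat index into a fixed-size local coordinate array. A rough sketch with assumed names (the actual kernel in this PR may differ):

constexpr int kMaxRank = 9;  // Max rank of a tensor allowed in Fluid.

// Decode a flat element index into per-dimension coordinates (row-major).
__device__ void FlatIndexToCoords(int64_t index, const int64_t* dims, int rank,
                                  int64_t* coords) {
  for (int dim = rank - 1; dim >= 0; --dim) {
    coords[dim] = index % dims[dim];
    index /= dims[dim];
  }
}

// Inside the kernel, each thread would then use:
//   int64_t coords[kMaxRank];
//   FlatIndexToCoords(index, dims, rank, coords);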

class TestArgsortOp(OpTest):
def setUp(self):
self.init_axis()
x = np.random.random((2, 3, 4, 5)).astype("float32")
Contributor

This unit test has no gradient checking, so it is better to use a larger shape here to cover more cases, since PADDLE_CUDA_NUM_THREADS is large.

Contributor Author

Done

@@ -442,6 +443,56 @@ def argmax(x, axis=0):
return out


def argsort(input, axis=-1):
Contributor

1. This needs a unit test in https://github.com/PaddlePaddle/Paddle/blob/develop/python/paddle/fluid/tests/unittests/test_layers.py

2. Better to add a name argument:

def argsort(input, axis=-1, name=None):

Contributor Author

Done

@kuke (Contributor Author) left a comment

All done, thx!



@qingqing01 (Contributor) left a comment

LGTM.

@kuke kuke merged commit 5f79c7f into PaddlePaddle:develop Jul 2, 2018
Computer Vision: Faster-RCNN automation moved this from In Progress to Done Jul 2, 2018
kuke pushed a commit to kuke/Paddle that referenced this pull request Aug 25, 2018