
Roi pool operator #5831

Merged (6 commits, merged Nov 24, 2017)

Conversation

wanghaox (Contributor):

resolve #5788

Add CPU forward and backward kernel code
Add GPU forward and backward kernel code
Add unit test

wangkuiyi previously approved these changes Nov 22, 2017

#define FLT_MAX __FLT_MAX__

constexpr int PADDLE_OPERATORS_ROIPOOL_CUDA_NUM_THREADS = 512;
Collaborator:

According to https://google.github.io/styleguide/cppguide.html#Constant_Names, constants should be named like

constexpr int kPaddleOperatorsROIPoolNumCUDAThreads = 512;

Also, I vaguely remember (though I am not sure) that we can declare this variable static to limit its usage to this file, which also lets us use a much shorter name:

static constexpr int kNumCUDAThreads = 512;

namespace paddle {
namespace operators {

#define FLT_MAX __FLT_MAX__
Collaborator:

Using __FLT_MAX__ is not a good idea because it is a GCC-specific predefined macro. If we build with a compiler other than GCC, __FLT_MAX__ is probably not defined.

It seems to me that the solution could be to use the standard C library:

#include <float.h>
printf("%f", FLT_MAX);

or the standard C++ library:

#include <limits>
printf("%f", std::numeric_limits<float>::max());

This problem doesn't block the merge of this PR. I created an issue reminding us to fix it.
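A minimal, compiler-agnostic sketch of the suggestion above (function names are illustrative, not from the PR):

```cpp
#include <cfloat>   // C's FLT_MAX
#include <limits>   // std::numeric_limits

// Portable replacements for the GCC-specific __FLT_MAX__ macro:
// both of these are defined by the language standards on any conforming compiler.
inline float MaxFloatC() { return FLT_MAX; }
inline float MaxFloatCpp() { return std::numeric_limits<float>::max(); }
```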

constexpr int PADDLE_OPERATORS_ROIPOOL_CUDA_NUM_THREADS = 512;
constexpr int PADDLE_OPERATORS_ROIPOOL_MAXIMUM_NUM_BLOCKS = 4096;

inline int PADDLE_OPERATORS_ROIPOOL_GET_BLOCKS(const int N) {
Collaborator:

I understand that you might want to mark this function as returning a constant value, but according to the Google C++ Style Guide we should name functions like

static inline int NumBlocks(const int N) {
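A standalone sketch of what such a helper typically computes (the 512-thread count and 4096 cap come from the constants shown earlier; the short names follow the reviewer's suggestion and are illustrative):

```cpp
#include <algorithm>

static constexpr int kNumCUDAThreads = 512;
static constexpr int kNumMaxBlocks = 4096;

// Number of CUDA blocks needed to cover N elements: ceiling division by the
// per-block thread count, capped at the maximum block count.
static inline int NumBlocks(const int N) {
  return std::min((N + kNumCUDAThreads - 1) / kNumCUDAThreads, kNumMaxBlocks);
}
```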

namespace paddle {
namespace operators {

class RoiPoolOp : public framework::OperatorWithKernel {
Collaborator:

I understand that in many places of our codebase we haven't strictly followed the English writing rule that acronyms must be capitalized, but let us start doing so from here by changing Roi to ROI. Thanks!

int c = (index / pooled_width / pooled_height) % channels;
int n = index / pooled_width / pooled_height / channels;

const int64_t* offset_input_rois = input_rois + n * 5;
Collaborator:

How about adding a comment about the magic number 5, or defining it as a constexpr value?

Contributor (author):

done
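The fix can be sketched as naming the row width once (kROISize matches the constant referenced later in the review; the offset helper is illustrative):

```cpp
// Each ROI row is [batch_id, x1, y1, x2, y2], so a row holds 5 values.
static constexpr int kROISize = 5;

// Offset of the n-th ROI in a flat (num_rois, kROISize) buffer,
// replacing the magic "n * 5" from the original code.
inline int RoiOffset(int n) { return n * kROISize; }
```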

@wangkuiyi (Collaborator):

Is this ROIPool operator being added to CMakeLists.txt? @wanghaox

"The spatial scale must be greater than 0");

auto in_dims = in->dims();
int batch_size = in_dims[0];
Contributor:

Might it be better to add a dim ENFORCE on the input before this, or in InferShape, since in_dims is used here?

Contributor (author):

done, added at ROIPoolOp::InferShape()
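Outside the Paddle framework, the checks being requested amount to something like this plain-C++ sketch (the function name and error messages are illustrative, not the PR's actual InferShape code, and it throws instead of using PADDLE_ENFORCE):

```cpp
#include <stdexcept>
#include <vector>

// Validate the shapes ROIPoolOp expects before indexing into them:
// a 4-D NCHW input and a 2-D (num_rois, 5) ROI tensor.
inline void CheckRoiPoolShapes(const std::vector<int>& in_dims,
                               const std::vector<int>& rois_dims) {
  if (in_dims.size() != 4) {
    throw std::invalid_argument("input must be a 4-D NCHW tensor");
  }
  if (rois_dims.size() != 2 || rois_dims[1] != 5) {
    throw std::invalid_argument("ROIs must be a 2-D (num_rois, 5) tensor");
  }
}
```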

int channels = in_dims[1];
int height = in_dims[2];
int width = in_dims[3];
int rois_num = rois->dims()[0];
Contributor:

Similar to in_dims: might it be better to add a dim ENFORCE before this?

Contributor (author):

done, added at ROIPoolOp::InferShape()

"(Tensor), "
"ROIs (Regions of Interest) to pool over. "
"Should be a 2-D tensor of shape (num_rois, 5)"
"given as [[batch_id, x1, y1, x2, y2], …].");
Contributor:

Might it be better to describe what each element of [batch_id, x1, y1, x2, y2] represents? Also, spelling such as "Should" should be corrected.

Contributor (author):

done

: OpProtoAndCheckerMaker(proto, op_checker) {
AddInput("X",
"(Tensor), "
"the input of ROIPoolOp.");
Contributor:

It might be better to add the shape and meaning of X (e.g., feature maps from a conv layer).

Contributor (author):

done

out_dims[1] = channels;
out_dims[2] = pooled_height;
out_dims[3] = pooled_width;
out->Resize(out_dims);
Contributor:

Might it be better to set the shape in InferShape rather than resizing it here? Though I am not sure whether the actual shapes are available in InferShape.

Contributor (author):

done, added at ROIPoolOp::InferShape()

"given as [[batch_id, x1, y1, x2, y2], …].");
AddOutput("Out",
"(Tensor), "
"ROI pooled output 4-D tensor of shape "
Contributor:

Maybe "The output of ROIPoolOp is a 4-D tensor with shape" is better.

Contributor (author):

done


auto in_dims = in->dims();
auto in_stride = framework::stride(in_dims);
int channels = in_dims[1];
Contributor:

Similar to the CPU kernel: might it be better to add a dim ENFORCE before this?

Contributor (author):

done, added at ROIPoolOp::InferShape()

out->mutable_data<T>(ctx.GetPlace()),
argmax->mutable_data<int64_t>(ctx.GetPlace()));

return;
Contributor:

How about deleting the return here?

Contributor (author):

done

pooled_width,
x_grad->mutable_data<T>(ctx.GetPlace()));
}
return;
Contributor:

How about deleting the return here?

Contributor (author):

done

auto spatial_scale = ctx.Attr<float>("spatial_scale");

int rois_num = rois->dims()[0];
int channels = in->dims()[1];
Contributor:

Similar to the CPU kernel: might it be better to add a dim ENFORCE before this?

Contributor (author):

done, added at ROIPoolOp::InferShape()

@wanghaox (author) left a comment:

update code

for w in range(wstart, wend):
if x_i[c, h, w] > out_data[i, c, ph, pw]:
out_data[i, c, ph, pw] = x_i[c, h, w]
argmax_data[i, c, ph, pw] = h * \
Contributor (author):

Python needs the '\' here for line continuation.

size_t pool_channel_offset = pooled_height * pooled_width;
const int64_t* argmax_data = argmax->data<int64_t>();

for (size_t n = 0; n < rois_num; ++n) {
Contributor (author):

done

ops::CPUROIPoolOpKernel<paddle::platform::CPUPlace, float>);
REGISTER_OP_CPU_KERNEL(
roi_pool_grad,
ops::CPUROIPoolGradOpKernel<paddle::platform::CPUPlace, float>);
Contributor:

We need to register the double type as well. Please also check whether slice_sequence_op registered the double type, thanks.

Contributor (author):

done, slice_sequence_op will be fixed in another commit.

#pragma once
#include "paddle/framework/op_registry.h"
#include "paddle/operators/math/math_function.h"
#include "paddle/operators/strided_memcpy.h"
Contributor:

This header file doesn't seem to be used?

Contributor (author):

done


using Tensor = framework::Tensor;
using LoDTensor = framework::LoDTensor;
using LoD = framework::LoD;
Contributor:

see comments: #5826 (comment)

Contributor (author):

done

// Define an empty pooling region to be zero
bool is_empty = (hend <= hstart) || (wend <= wstart);
output_data[pool_index] =
is_empty ? 0 : -std::numeric_limits<float>::max();
Contributor:

-std::numeric_limits<T> (use the template type T instead of hard-coding float)

Contributor (author):

done

int64_t* argmax_data = argmax->mutable_data<int64_t>(ctx.GetPlace());

math::SetConstant<Place, T> set_zero;
set_zero(ctx.device_context(), out, static_cast<T>(0));
Contributor:

There is no need to set out to zero here.

Contributor (author):

done

"The format of input tensor is NCHW.");
PADDLE_ENFORCE(rois_dims.size() == 2,
"ROIs should be a 2-D tensor of shape (num_rois, 5)"
"given as [[batch_id, x1, y1, x2, y2], …].");
Contributor:

We also need to check rois_dims[1] == kROISize.

Contributor (author):

done

set_zero(ctx.device_context(), out, static_cast<T>(0));
argmax->mutable_data<int64_t>(ctx.GetPlace());
math::SetConstant<Place, int64_t> set_init;
set_init(ctx.device_context(), argmax, static_cast<int64_t>(-1));
Contributor:

There is no need to set out to zero and argmax to -1 here; on GPU this launches two extra kernels. Please remove these and handle the initialization inside GPUROIPoolForward instead.

Contributor (author):

done

if (x_grad) {
x_grad->mutable_data<T>(ctx.GetPlace());
math::SetConstant<Place, T> set_zero;
set_zero(ctx.device_context(), x_grad, static_cast<T>(0));
Contributor:

Same as above, there is no need to set zero here.

Contributor (author):

Checked: the backward pass does need the zero initialization.

ops::GPUROIPoolOpKernel<paddle::platform::GPUPlace, float>);
REGISTER_OP_GPU_KERNEL(
roi_pool_grad,
ops::GPUROIPoolGradOpKernel<paddle::platform::GPUPlace, float>);
Contributor:

Register the double type here as well.

Contributor (author):

done

wstart = min(max(wstart + roi_start_w, 0), self.width)
wend = min(max(wend + roi_start_w, 0), self.width)

out_data[i, c, ph, pw] = 0
Contributor:

As @guoshengCS said, why is the code here not consistent with the C++ code:

// Define an empty pooling region to be zero
bool is_empty = (hend <= hstart) || (wend <= wstart);
output_data[pool_index] = is_empty ? 0 : -std::numeric_limits<float>::max();
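The C++ rule being quoted can be isolated as a small helper (a sketch for illustration; the template parameter follows the earlier -std::numeric_limits<T> comment):

```cpp
#include <limits>

// Initial value for max-pooling over a region: 0 when the region is empty
// (so empty ROIs pool to zero), otherwise the most negative finite value so
// that any real element replaces it.
template <typename T>
T PoolInitValue(int hstart, int hend, int wstart, int wend) {
  bool is_empty = (hend <= hstart) || (wend <= wstart);
  return is_empty ? static_cast<T>(0) : -std::numeric_limits<T>::max();
}
```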

@wanghaox (author) left a comment:

update code


static inline int NumBlocks(const int N) {
return std::min((N + kNumCUDAThreads - 1) / kNumCUDAThreads,
kNumMaxinumNumBlocks);
}
Contributor (author):

done

@qingqing01 (Contributor) left a comment:

LGTM.

@wanghaox wanghaox merged commit 0690cca into PaddlePaddle:develop Nov 24, 2017
@wanghaox wanghaox deleted the roi_pool branch November 24, 2017 13:22
Projects: Port Operators (Awaiting triage)
Linked issues this PR may close: ROI Pooling operator
4 participants