ctc batch inference, change im2sequence_op #10923

Closed
wants to merge 1 commit into from
17 changes: 13 additions & 4 deletions paddle/fluid/operators/im2sequence_op.cc
@@ -13,6 +13,7 @@ See the License for the specific language governing permissions and
limitations under the License. */

#include "paddle/fluid/operators/im2sequence_op.h"
#include <vector>

namespace paddle {
namespace operators {
@@ -53,14 +54,14 @@ class Im2SequenceOp : public framework::OperatorWithKernel {

class Im2SequenceOpMaker : public framework::OpProtoAndCheckerMaker {
public:
Im2SequenceOpMaker(OpProto* proto, OpAttrChecker* op_checker)
: OpProtoAndCheckerMaker(proto, op_checker) {
void Make() override {
AddInput("X",
"(Tensor) The input tensor has NCHW format."
"N: batch size"
"C: channels"
"H: height"
"W: width");
AddInput("Image_real_size", "Image real size.");
AddOutput("Out", "(LoDTensor) The output data of im2sequence op.");
AddAttr<std::vector<int>>("kernels",
"(vector<int>), the "
@@ -73,6 +74,13 @@ class Im2SequenceOpMaker : public framework::OpProtoAndCheckerMaker {
"(vector<int> default:{0, 0, 0, 0}), the "
"paddings(up_pad, left_pad, down_pad, right_pad)")
.SetDefault({0, 0, 0, 0});
AddAttr<std::vector<int>>("out_stride",
"(vector<int> default:{1, 1}), the out_stride "
"(out_stride_height, out_stride_width)")
.SetDefault({1, 1});
AddAttr<bool>("is_inference",
"non-zero means inference, 0 means train")
.SetDefault(false);
AddComment(R"DOC(
This op uses kernels to scan images and converts these images to sequences.
After expanding, the number of time steps is output_height * output_width
@@ -147,8 +155,9 @@ class Im2SequenceGradOp : public framework::OperatorWithKernel {
} // namespace paddle

namespace ops = paddle::operators;
REGISTER_OP(im2sequence, ops::Im2SequenceOp, ops::Im2SequenceOpMaker,
im2sequence_grad, ops::Im2SequenceGradOp);
REGISTER_OPERATOR(im2sequence, ops::Im2SequenceOp, ops::Im2SequenceOpMaker,
paddle::framework::DefaultGradOpDescMaker<true>);
REGISTER_OPERATOR(im2sequence_grad, ops::Im2SequenceGradOp);
REGISTER_OP_CPU_KERNEL(
im2sequence,
ops::Im2SequenceKernel<paddle::platform::CPUDeviceContext, float>);
201 changes: 170 additions & 31 deletions paddle/fluid/operators/im2sequence_op.h
@@ -13,7 +13,8 @@
limitations under the License. */

#pragma once

#include <fstream>
#include <string>
#include "paddle/fluid/framework/data_layout.h"
#include "paddle/fluid/framework/eigen.h"
#include "paddle/fluid/framework/op_registry.h"
@@ -38,8 +39,9 @@ class Im2SequenceKernel : public framework::OpKernel<T> {
public:
void Compute(const framework::ExecutionContext& ctx) const override {
const Tensor* in = ctx.Input<Tensor>("X");
// TODO(fuhailong): add new data layer to solve multibatch inference
const Tensor* imgRealSize = ctx.Input<Tensor>("Image_real_size");
LoDTensor* out = ctx.Output<LoDTensor>("Out");
out->mutable_data<T>(ctx.GetPlace());
// TODO(wanghaoshuang): Add layout checker after 'set_layout'
// being available for python API
// PADDLE_ENFORCE_EQ(in->layout(), framework::DataLayout::kNCHW,
@@ -49,41 +51,178 @@ class Im2SequenceKernel : public framework::OpKernel<T> {
int img_channels = in_dim[1];
int img_height = in_dim[2];
int img_width = in_dim[3];

auto imgRealSize_vec = imgRealSize->data<float>();
auto imgRealSize_dim = imgRealSize->dims();
auto kernels = ctx.Attr<std::vector<int>>("kernels");
auto strides = ctx.Attr<std::vector<int>>("strides");
auto paddings = ctx.Attr<std::vector<int>>("paddings");
int output_height = Im2SeqOutputSize(img_height, kernels[0], paddings[0],
paddings[2], strides[0]);
int output_width = Im2SeqOutputSize(img_width, kernels[1], paddings[1],
paddings[3], strides[1]);
auto out_stride = ctx.Attr<std::vector<int>>("out_stride");
auto is_inference = ctx.Attr<bool>("is_inference");
if (is_inference) {
if (batch_size == 1) {
Review comment (Contributor): When batch_size = 1, the image should have no padding; shouldn't this case just be computed as if is_inference were false?

Reply (Contributor Author): When batch_size = 1 we follow the original logic directly; I check for this case inside the inference branch.

out->mutable_data<T>(ctx.GetPlace());
int output_height = Im2SeqOutputSize(img_height,
kernels[0],
paddings[0],
paddings[2],
strides[0]);
int output_width = Im2SeqOutputSize(img_width,
kernels[1],
paddings[1],
paddings[3],
strides[1]);
const std::vector<int> dilations({1, 1});
auto out_dims = out->dims();
out->Resize({batch_size, out->numel() / batch_size});
Review comment (Contributor): Won't this Resize raise an error? The correct out_dims does not seem to be computed before line 63. You should move line 63 after line 73 and write it as out->mutable_data<T>({1 * output_height * output_width, channel * kernel[0] * kernel[1]}, ctx.GetPlace()); or simply let the batch_size = 1 case go through the original train branch.

Reply (Contributor Author): batch_size = 1 goes through the train branch.

for (int i = 0; i < batch_size; i++) {
const Tensor src =
in->Slice(i, i + 1).Resize({img_channels,
img_height,
img_width});
Tensor dst = out->Slice(i, i + 1).Resize({output_height,
output_width,
img_channels,
kernels[0],
kernels[1]});

const std::vector<int> dilations({1, 1});
math::Im2ColFunctor<math::ColFormat::kOCF, DeviceContext, T> f;
auto& dev_ctx = ctx.template device_context<DeviceContext>();
f(dev_ctx, src, dilations, strides, paddings, &dst);
}
out->Resize(out_dims);
// set lod information
// TODO(wanghaoshuang): Move this to InferShape
framework::LoD lod(1);
lod[0].reserve(batch_size + 1);
int offset = 0;
lod[0].push_back(offset);
for (int i = 0; i < batch_size; ++i) {
offset += output_height * output_width;
lod[0].push_back(offset);
}
out->set_lod(lod);
} else {
std::vector<int> imgReal_H;
std::vector<int> imgReal_W;
for (int i = 0; i < batch_size; i++) {
int tmp_real_H = int(imgRealSize_vec[2 * i]);
int tmp_real_W = int(imgRealSize_vec[2 * i + 1]);
for (int j = 0; j < out_stride[0]; j++) {
Review comment (Contributor): Doesn't out_stride[0] mean the factor by which the original image has been downscaled?

Reply (Contributor Author): Taking our current model as an example, the image is downscaled 16x. If the width is divisible by 16, dividing directly is fine; if it is not, the model's logic rounds up while direct division rounds down, so instead of a single division we divide in a loop. We could change the out_stride attribute to the number of convolutions applied and also pass in the conv kernel size; I think that might be better.

tmp_real_H = tmp_real_H / 2 + tmp_real_H % 2;
tmp_real_W = tmp_real_W / 2 + tmp_real_W % 2;
}
imgReal_H.push_back(tmp_real_H);
imgReal_W.push_back(tmp_real_W);
}

auto out_dims = out->dims();
out->Resize({batch_size, out->numel() / batch_size});
for (int i = 0; i < batch_size; i++) {
const Tensor src =
in->Slice(i, i + 1).Resize({img_channels, img_height, img_width});
Tensor dst = out->Slice(i, i + 1).Resize(
{output_height, output_width, img_channels, kernels[0], kernels[1]});
// TODO(fuhailong): for loop to compute real output size
std::vector<int> output_height;
std::vector<int> output_width;
for (int i = 0; i < batch_size; i++) {
output_height.push_back(Im2SeqOutputSize(imgReal_H[i],
kernels[0],
paddings[0],
paddings[2],
strides[0]));
output_width.push_back(Im2SeqOutputSize(imgReal_W[i],
kernels[1],
paddings[1],
paddings[3],
strides[1]));
}
// TODO(fuhailong): compute dims of output
// call: out->mutable_data<T>(ctx.GetPlace(), output_dims);
int result = 0;
for (int i = 0; i < batch_size; i++) {
Review comment (Contributor): There are three consecutive for loops from 0 to batch_size here; could they be merged?

Reply (Contributor Author): Merging them should be possible.

result += output_height[i] * output_width[i];
}

math::Im2ColFunctor<math::ColFormat::kOCF, DeviceContext, T> f;
auto& dev_ctx = ctx.template device_context<DeviceContext>();
f(dev_ctx, src, dilations, strides, paddings, &dst);
}
out->Resize(out_dims);

// set lod information
// TODO(wanghaoshuang): Move this to InferShape
framework::LoD lod(1);
lod[0].reserve(batch_size + 1);
for (int i = 0, offset = 0; i < batch_size + 1; ++i) {
lod[0].push_back(offset);
offset += output_height * output_width;
}
out->set_lod(lod);
}
out->mutable_data<T>({result, img_channels*kernels[0]*kernels[1]},
ctx.GetPlace());
// out->numel();
const std::vector<int> dilations({1, 1});
// TODO(fuhailong): out_dims has two index,
// out_dims[0] and out_dims[1],
// {batchsize*output_height*output_width,channel*kernel[0],*kernel[1]},
// multi batch ,the first place is output_height[i]*output_width[i].
auto out_dims = out->dims();
int offset_out = 0;

for (int i = 0; i < batch_size; i++) {
const Tensor src =
in->Slice(i, i + 1).Resize({img_channels,
img_height,
img_width});
// TODO(fuhailong): add image real size
Tensor dst = out->Slice(offset_out,
offset_out + output_height[i]*output_width[i]).Resize(
{output_height[i], output_width[i],
img_channels, kernels[0], kernels[1]});
offset_out += output_height[i]*output_width[i];

math::Im2ColFunctor<math::ColFormat::kOCF, DeviceContext, T> f;
// eq, kOCF cnn to rnn format
auto& dev_ctx = ctx.template device_context<DeviceContext>();
f(dev_ctx, src, dilations, strides, paddings, &dst);
}
out->Resize(out_dims);
Review comment (Contributor): This Resize does not seem necessary: in the is_inference = true and batch_size > 1 case, you never resize out to another shape after line 140, do you?

Reply (Contributor Author): Right, out->mutable_data already fixes the size at the very beginning; I will take another look.

// set lod information
// TODO(wanghaoshuang): Move this to InferShape
framework::LoD lod(1);
lod[0].reserve(batch_size + 1);
int offset = 0;
lod[0].push_back(offset);
for (int i = 0; i < batch_size; ++i) {
offset += output_height[i] * output_width[i];
lod[0].push_back(offset);
}
out->set_lod(lod);
}
} else {
Review comment (Contributor): OK, batch_size = 1 goes through the train branch, but you copied the train branch's code over; could this be done another way?

out->mutable_data<T>(ctx.GetPlace());
int output_height = Im2SeqOutputSize(img_height,
kernels[0],
paddings[0],
paddings[2],
strides[0]);
int output_width = Im2SeqOutputSize(img_width,
kernels[1],
paddings[1],
paddings[3],
strides[1]);

const std::vector<int> dilations({1, 1});
auto out_dims = out->dims();
out->Resize({batch_size, out->numel() / batch_size});
Review comment (Contributor): This resize is also problematic: the correct out_dims has not been computed before this point, the same issue as the resize in the batch_size = 1 case.

Reply (Contributor Author): I think this one is fine. mutable_data is used because the output dimensions differ when batch_size > 1; in the batch_size = 1 and train logic that step does not exist, and out->dims() fetches out's dims directly, so the dims should not need to be computed by hand.

Reply (Contributor): In the batch_size = 1 and train cases, out_dims also has to be computed. "out->dims() fetches the dims directly" only holds if the correct out_dims is computed in InferShape, i.e. here: https://github.com/PaddlePaddle/Paddle/pull/10923/files#diff-6134b780f1c15f85baabeb43daf9a8cdR50. But the out_dims computed there may currently be wrong; you can verify with a print log.

for (int i = 0; i < batch_size; i++) {
const Tensor src =
in->Slice(i, i + 1).Resize({img_channels,
img_height,
img_width});
Tensor dst = out->Slice(i, i + 1).Resize({output_height,
output_width,
img_channels,
kernels[0],
kernels[1]});

math::Im2ColFunctor<math::ColFormat::kOCF, DeviceContext, T> f;
auto& dev_ctx = ctx.template device_context<DeviceContext>();
f(dev_ctx, src, dilations, strides, paddings, &dst);
}
out->Resize(out_dims);
// set lod information
// TODO(wanghaoshuang): Move this to InferShape
framework::LoD lod(1);
lod[0].reserve(batch_size + 1);
int offset = 0;
lod[0].push_back(offset);
for (int i = 0; i < batch_size; ++i) {
offset += output_height * output_width;
lod[0].push_back(offset);
}
out->set_lod(lod);
}
}
};

template <typename DeviceContext, typename T>