Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature/fc converter #11043

Merged
merged 9 commits into from
Jun 1, 2018
Merged

Conversation

Superjomn
Copy link
Contributor

No description provided.

@Superjomn Superjomn requested a review from luotao1 June 1, 2018 00:26
framework::OpDesc op_desc(op, nullptr, nullptr);
PADDLE_ENFORCE_EQ(op_desc.Input("X").size(), 1);
PADDLE_ENFORCE_EQ(op_desc.Input("Y").size(), 1); // Y is a weight
PADDLE_ENFORCE_EQ(op_desc.Output("Out").size(), 1); // Y is a weight
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

61行的注释错了。

for (int h = 0; h < shape.h(); ++h) {
for (int w = 0; w < shape.w(); ++w) {
odata[h * ostrides.h() + w * ostrides.w()] =
idata[h * ostrides.h() + w * ostrides.w()];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

从函数实现看,odata[i] = idata[i],所以还需要转么?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

从 TF 里拷贝的,试了下,很奇怪,这个函数好像必须要用。还有一个 Reorder4的函数,后面用到的时候我再仔细看下。

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

后面写Reorder4的时候,可以这里再优化下:

for (int h = 0; h < shape.h(); ++h) {
  for (int w = 0; w < shape.w(); ++w) {
      int index = h * ostrides.h() + w * ostrides.w();
      odata[index] = idata[index];

namespace inference {
namespace tensorrt {

template <typename T>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reorder2和ReorderCKtoKC和函数功能和参数含义,请加一下注释。

PADDLE_ENFORCE_NOT_NULL(Y_v);
auto* Y_t = Y_v->GetMutable<framework::LoDTensor>();
// This may trigger a CPU->GPU copy.
// TODO(Superjomn) use some smarter mutable_data.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Y_v从scope里读取的话,可以一开始就在gpu环境下,这里就不用拷贝了。

Copy link
Contributor Author

@Superjomn Superjomn Jun 1, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

更新了 comment

void DeclParamVar(const std::string& name, const nvinfer1::Dims& dims) {
DeclVar(name, dims);
}

void DeclOutputVar(const std::string& name, const nvinfer1::Dims& dims) {
DeclVar(name, dims);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DeclParamVar和DeclOutputVar是一模一样的,需要封两个函数么?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

用的时候的确需要不同语义,这里区分下 program 和 output, 不然代码里两处都是 DeclVar,看不出区别


ASSERT_FALSE(op_desc_->OutputArgumentNames().empty());
for (const auto& output : op_desc_->OutputArgumentNames()) {
std::vector<float> fluid_out;
std::vector<float> trt_out(200);
std::vector<float> trt_out(200, 2008.);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

请注释说下200和2008的含义。

@@ -31,8 +31,10 @@ void paddle::operators::TensorRTEngineKernel<DeviceContext, T>::Prepare(
auto max_workspace = context.Attr<int>("max_workspace");
engine_.reset(new inference::tensorrt::TensorRTEngine(
max_batch_, max_workspace, nullptr));
// TODO(Superjomn) parameters should be passed be analysised and passed from
// outside.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

34-35行注释请更新下,有两个be?

Copy link
Contributor

@luotao1 luotao1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

for (int h = 0; h < shape.h(); ++h) {
for (int w = 0; w < shape.w(); ++w) {
odata[h * ostrides.h() + w * ostrides.w()] =
idata[h * ostrides.h() + w * ostrides.w()];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

后面写Reorder4的时候,可以这里再优化下:

for (int h = 0; h < shape.h(); ++h) {
  for (int w = 0; w < shape.w(); ++w) {
      int index = h * ostrides.h() + w * ostrides.w();
      odata[index] = idata[index];

@Superjomn Superjomn merged commit 0c0c5df into PaddlePaddle:develop Jun 1, 2018
@Superjomn Superjomn deleted the feature/fc_converter branch June 1, 2018 07:39
@Superjomn Superjomn mentioned this pull request Jun 1, 2018
@luotao1 luotao1 mentioned this pull request Jun 12, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants