
Malformed graph of ernie when run with benchmark application #21492

Closed
Sand3r- opened this issue Dec 2, 2019 · 5 comments

@Sand3r-
Contributor

Sand3r- commented Dec 2, 2019

Current behaviour

The error was discovered thanks to level-3 logging enabled via the GLOG_v environment variable. GLOG reported the following:

Some operators use the same variables for reading/writing. For example, when the fp32 model is run, one can observe that both the scale op and the transpose2 op take transpose_4.tmp_0 as their input, although according to the original graph they should not.
An excerpt of the GLOG output documenting this:

operator.cc:172 CPUPlace Op(scale), inputs:{X[transpose_4.tmp_0:float[1, 12, 128, 64]({})]}, outputs:{Out[scale_12.tmp_0:float[1, 12, 128, 64]({})]}.
(...several ops later...)
operator.cc:172 CPUPlace Op(transpose2), inputs:{X[transpose_4.tmp_0:float[1, 12, 128, 64]({})]}, outputs:{Out[fc_66.tmp_0:float[1, 128, 12, 64]({})], XShape[transpose_47.tmp_1:[0, 1, 12, 128, 64]({})]}.

As far as I understand, this is a bug, since variable names should be unique within the same scope.

To illustrate the problem, please see the following figure, which depicts a different model (ernie_quant) suffering from the same problem:
[screenshot attached in the original issue]

This is a blocking issue for the INT8 Ernie quantization task, since our quantization system associates scales with variable names. If a variable name repeats in several places, we end up applying the same scale where we did not intend to.
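To make the quantization failure mode concrete, here is a minimal sketch (not Paddle code; the variable names come from the log above, and the scale values are made up for illustration). Because scales are keyed by variable name, two distinct tensors that end up sharing the name transpose_4.tmp_0 after graph rewriting can no longer receive distinct scales:

```python
# Scales collected on the ORIGINAL graph, where names are unique.
# Values are made up for illustration; transpose_47.tmp_0 stands in for
# the variable the later transpose2 op was supposed to read.
scales_original = {
    "transpose_4.tmp_0": 0.12,
    "transpose_47.tmp_0": 0.87,
}

# After the pass, both ops read a variable named "transpose_4.tmp_0",
# as seen in the GLOG excerpt above.
ops_after_pass = [
    ("scale", "transpose_4.tmp_0"),
    ("transpose2", "transpose_4.tmp_0"),  # should have read transpose_47.tmp_0
]

# A name-keyed lookup now returns the same scale for both ops;
# the 0.87 scale is silently never used.
looked_up = [scales_original[var] for _, var in ops_after_pass]
print(looked_up)  # → [0.12, 0.12]
```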

Reproduction

  • based on commit 8da0cd5
  • CPU: including MKLDNN version v.20
  • OS platform: Ubuntu 16.04
  • CMake flags: -DCMAKE_BUILD_TYPE=RelWithDebInfo -DWITH_GPU=OFF -DON_INFER=ON -DWITH_MKLDNN=ON -DWITH_TESTING=ON -DWITH_PROFILER=ON -DWITH_STYLE_CHECK=OFF -DWITH_INFERENCE_API_TEST=ON

To reproduce:
  1. Build Paddle.
  2. Build the benchmark inference application for ernie: https://github.com/PaddlePaddle/benchmark/tree/master/Inference/c%2B%2B/ernie
  3. Run any 4-input ernie model.

@luotao1 Could you please assign someone to help solve this issue?

@bingyanghuang
Contributor

On the mixed-up variables in the graph: when running ernie via the benchmark repo, variables get connected to the wrong ops for both the fp32 and the int8 model, which may affect the accuracy of ernie's output. As shown below, the left image is the correct graph, and the right image is the graph saved before the passes were applied:
[screenshot attached in the original comment]
The mixed-up variables are shown in the following figure:
[screenshot attached in the original comment]

@wojtuss

wojtuss commented Dec 11, 2019

The graph is malformed when the memory_optimize_pass is enabled (e.g. via the EnableMemoryOptim() method). With the pass disabled, the graph of the model looks fine.
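For context on how a memory optimization pass can produce exactly the symptom reported above, here is a deliberately naive toy simulation of buffer reuse (this is an illustration of the mechanism only, not PaddlePaddle's actual memory_optimize_pass; it assumes every variable dies right after its single reader):

```python
# Each op is (op_type, input_var, output_var). Names loosely follow the log.
ops = [
    ("transpose2", "fc_4.tmp_0", "transpose_4.tmp_0"),
    ("scale", "transpose_4.tmp_0", "scale_12.tmp_0"),
    ("transpose2", "scale_12.tmp_0", "transpose_47.tmp_0"),
]

def reuse_pass(ops):
    """Toy buffer-reuse pass: once a variable's (assumed only) reader has
    run, its name goes into a free pool and is handed to the next op's
    output. Distinct logical tensors then share one variable name."""
    free = []          # names whose tensors are considered dead
    renamed = {}       # output renames applied so far
    new_ops = []
    for op_type, inp, out in ops:
        inp = renamed.get(inp, inp)       # follow earlier renames
        if free:                          # reuse a dead variable's name
            out_final = free.pop()
            renamed[out] = out_final
        else:
            out_final = out
        new_ops.append((op_type, inp, out_final))
        free.append(inp)                  # naive: input dead after this op
    return new_ops

# After the pass, "transpose_4.tmp_0" names both the scale op's input and
# the later transpose2 op's output: two different tensors, one name.
print(reuse_pass(ops))
```

Any consumer that keys metadata (such as quantization scales) by variable name will then conflate the two tensors, which matches the observation in the GLOG excerpt.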

@luotao1
Contributor

luotao1 commented Dec 13, 2019

Related: #21598. How about disabling this pass when using MKLDNN?

@Sand3r-
Contributor Author

Sand3r- commented Dec 18, 2019

> related #21598, how about disabling this pass when using MKLDNN?

We can surely do that. I've opened up a PR implementing this: #21826

@luotao1
Contributor

luotao1 commented Dec 20, 2019

#21826 disables the memory optimization pass when MKLDNN is on.

@luotao1 luotao1 closed this as completed Dec 20, 2019