Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add fc-residual quantization #46917

Merged
merged 20 commits into from
Nov 21, 2022
Merged

Conversation

sfraczek
Copy link
Contributor

PR types

Performance optimization

PR changes

Others

Describe

Add quantization of FC+residual fused op

@paddle-bot
Copy link

paddle-bot bot commented Oct 11, 2022

你的PR提交成功,感谢你对开源项目的贡献!
请关注后续CI自动化测试结果,详情请参考Paddle-CI手册
Your PR has been submitted. Thanks for your contribution!
Please wait for the result of CI firstly. See Paddle CI Manual for details.

Comment on lines 430 to 448
bool residual_fc = false;

std::string last_op_input_name =
FindInputNameByVarName(last_op_op, quant_out->Name());
if (last_op_input_name.empty()) {
last_op_input_name =
FindOutputNameByVarName(last_op_op, quant_out->Name());
PADDLE_ENFORCE_EQ(last_op_input_name,
"ResidualData",
platform::errors::InvalidArgument(
"Only the ResidualData output is allowed to be "
"linked to earlier op."));
PADDLE_ENFORCE_EQ(last_op_op->Type(),
"fc",
platform::errors::InvalidArgument(
"Only fc operator supports ResidualData output "
"linked as input."));
residual_fc = true;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This logic seems a little bit odd for me.
Does residual connection must have an empty name?
Are there any other scenarios where the name can be empty?
What if op will have empty name in graph, will this pass throw an exception?

Copy link
Contributor Author

@sfraczek sfraczek Oct 12, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's empty because FindInputNameByVarName returns empty string if it's not found. In that case we search for name in the outputs instead of inputs because fc is special case when residualData is output instead of input.
If this if is entered and those two checks pass, then we know we are dealing with fc with residual data (in output). And so we set residual_fc= true which later informs decisions whether we should call SetInput or SetOutput

Comment on lines 88 to 97
bool residual_fc = input_name == "ResidualData" && op->Op()->Type() == "fc";
auto inputs = residual_fc ? op->Op()->OutputNames() : op->Op()->InputNames();
bool name_found =
std::find(inputs.begin(), inputs.end(), input_name) != inputs.end();
PADDLE_ENFORCE_EQ(name_found,
true,
platform::errors::InvalidArgument(
"Var(%s) isn't the input of the %s operator.",
"Var(%s) isn't the %s of the %s operator.",
input_name,
residual_fc ? "output" : "input",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's stinky for me, what if we will add more ops with residual connection in the future ?

Copy link
Contributor Author

@sfraczek sfraczek Oct 12, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It depends on what we will agree on with baidu. Previously it was easy because we assumed that ResidualData will be input because on the graph it's linked as input and in OpDesc it was set as input. But recently when FC with residual was added, we were instructed to use Output in OpDesc but it's impossible to set it as output in graph so it's still linked as input. From this comes this confusion in code both in graph pattern detector I had to write custom asserts and here I had to do some hacks that I don't have a good idea for design yet especially that it might be subject to change based on what baidu will say about this. I don't know if there is a great way to make this clear because it is inherently confusing so let's say it's a POC.

@@ -32,7 +32,7 @@ class Graph;
namespace {
void LogEnabledOps(const int counter, const std::string& details) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method is redefined in each pass. Some log only if count is > 0, others log all. Do you think we could extract this fuse pass logger to mkldnn_reuse.h and only call it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I saw one in cpu_quantize_pass.cc from which I copied and modified it but this is only two examples so please give me a third as per the rule of three :D
https://understandlegacycode.com/blog/refactoring-rule-of-three/#use-the-rule-of-three

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean we have multiple fuse pass counters with logs e.g. elt_act_mkldnn_fuse_pass.cc:102, fc_elementwise_add_mkldnn_fuse_pass.cc:119, matmul_transpose_reshape_mkldnn_fuse_pass.cc:108 etc.

I think that same code is repeated each time, with only hardcoded op name being changed. It is not particularly related to your PR, I just wanted to gather opinions if you find it good idea as future improvement.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok there is some repetition even in non mkldnn passes so if we were to add such method in future we could put it in FusePassBase or some other common parent class.

@sfraczek
Copy link
Contributor Author

sfraczek commented Oct 12, 2022

Hi @zhiqiu,
This PR #41776 has created some additional confusion with regards to writing asserts for residual connection in graph_pattern_detector.cc https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/fluid/framework/ir/graph_pattern_detector.cc#L1081-L1094. Also in quantization (this PR) now we have to look for residual data not just in input but also in output. Can you please share your thoughts on how we should proceed with residual connections?

tsocha
tsocha previously approved these changes Oct 13, 2022
jakpiase
jakpiase previously approved these changes Oct 13, 2022
Copy link
Contributor

@jakpiase jakpiase left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, but this logic with residualdata as an output looks weird to me

Silv3S
Silv3S previously approved these changes Oct 13, 2022
Copy link
Member

@Silv3S Silv3S left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

is_residual_unsigned,
"Scale_in_eltwise");
} else {
if (!AreScalesPresentForNodes({input, weights})) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can check if scales are present for input and weights first, because it's common for both conditions. Then if with_residual_data check only if scales are present for node residual_data

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok

call twice AreScalesPresntForNodes instead of if-else
@sfraczek sfraczek dismissed stale reviews from Silv3S, jakpiase, and tsocha via d25018d October 13, 2022 14:55
@zhiqiu
Copy link
Contributor

zhiqiu commented Oct 14, 2022

Hi @zhiqiu, This PR #41776 has created some additional confusion with regards to writing asserts for residual connection in graph_pattern_detector.cc https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/fluid/framework/ir/graph_pattern_detector.cc#L1081-L1094. Also in quantization (this PR) now we have to look for residual data not just in input but also in output. Can you please share your thoughts on how we should proceed with residual connections?

Hi, I don't really know the background of this work. But I think it is not a good design that ResidualData can be both input and output. Can you explain more about your coments above, i.e.,instructed to use Output in OpDesc?

@sfraczek
Copy link
Contributor Author

Hi, I don't really know the background of this work. But I think it is not a good design that ResidualData can be both input and output. Can you explain more about your coments above, i.e.,instructed to use Output in OpDesc?

I was referring to your suggestion that ResidualData should be made output instead of input #40834 (comment). This PR was closed and reopened as new one PR #41776 and merged. Previously, we had already been using ResidualData as input for convolution and now we have ResidualData for FC as output. In Paddle we have quantization code for convolution op with ResidualData as input. Here I am adding quantization of fc with ResidualData as Output.
As you say, it seems wrong to have both input and output so we want to unify it.
We can

  • change CONV's ResidualData to Output,
  • change FC's ResidualData to Input.

To me it seems that Input was better for the following reasons:

  1. In graph_pattern_detector.cc it was easy to write a pattern using existing assert functions but for residual_data as output the pattern got this complicated because it checks if it is input and output at the same time so I added a comment https://github.com/PaddlePaddle/Paddle/pull/46757/files#diff-3363ff47a1f22a111386eeff77a8572f6618d7d0602ffe71e92ecf8682faa84eR1081-R1094,
  2. Netron shows an arrow for ResidualData connection only when it is input but when it's output it doesn't show it so at the first glance there is no connection visible when analyzing graph of model and it requires clicking on a node to see if it has residualData output. Using netron to inspect models is what we are used to do routinely to look for optimizations.

It is logically a little confusing to me that we have to SetOutput and ir_node_link_to residual_data -> fc at the same time https://github.com/PaddlePaddle/Paddle/pull/41776/files#diff-9bebe579d9782bcfbf4b11872065b743461e89a36c3924972e98ee7df8824d16R102-R109

Beside the above a small extra argument for me is that In quantization code (cpu_quantize_pass.cc) we will not be able to reuse the QuantizeInput method like it is used for all inputs but we will have to add another method that will be doing mostly the same thing but replace input with output - perhaps called QuantizeResidualData.

@paddle-bot-old paddle-bot-old bot added contributor External developers and removed contributor External developers labels Oct 17, 2022
@zhiqiu
Copy link
Contributor

zhiqiu commented Oct 19, 2022

Hi, I don't really know the background of this work. But I think it is not a good design that ResidualData can be both input and output. Can you explain more about your coments above, i.e.,instructed to use Output in OpDesc?

I was referring to your suggestion that ResidualData should be made output instead of input #40834 (comment). This PR was closed and reopened as new one PR #41776 and merged. Previously, we had already been using ResidualData as input for convolution and now we have ResidualData for FC as output. In Paddle we have quantization code for convolution op with ResidualData as input. Here I am adding quantization of fc with ResidualData as Output. As you say, it seems wrong to have both input and output so we want to unify it. We can

  • change CONV's ResidualData to Output,
  • change FC's ResidualData to Input.

To me it seems that Input was better for the following reasons:

  1. In graph_pattern_detector.cc it was easy to write a pattern using existing assert functions but for residual_data as output the pattern got this complicated because it checks if it is input and output at the same time so I added a comment https://github.com/PaddlePaddle/Paddle/pull/46757/files#diff-3363ff47a1f22a111386eeff77a8572f6618d7d0602ffe71e92ecf8682faa84eR1081-R1094,
  2. Netron shows an arrow for ResidualData connection only when it is input but when it's output it doesn't show it so at the first glance there is no connection visible when analyzing graph of model and it requires clicking on a node to see if it has residualData output. Using netron to inspect models is what we are used to do routinely to look for optimizations.

It is logically a little confusing to me that we have to SetOutput and ir_node_link_to residual_data -> fc at the same time https://github.com/PaddlePaddle/Paddle/pull/41776/files#diff-9bebe579d9782bcfbf4b11872065b743461e89a36c3924972e98ee7df8824d16R102-R109

Beside the above a small extra argument for me is that In quantization code (cpu_quantize_pass.cc) we will not be able to reuse the QuantizeInput method like it is used for all inputs but we will have to add another method that will be doing mostly the same thing but replace input with output - perhaps called QuantizeResidualData.

Thanks, I got your views. If Input s already used and it is BETTER, I agree that you can change output to input.

@sfraczek
Copy link
Contributor Author

Something is broken about the CI.

@yeliang2258
Copy link
Contributor

@sfraczek Please resolve the conflict of the code, thank you

@yeliang2258
Copy link
Contributor

@sfraczek Please resolve the conflict of the code, thanks

@sfraczek

This comment was marked as outdated.

@sfraczek
Copy link
Contributor Author

sfraczek commented Nov 15, 2022

@chenwhql Please approve because the test that failed is outdated. It uses old Quant2Int8MkldnnPass which was replaced by C++ passes. We are planning to rewrite the test in Q1.

@sfraczek sfraczek requested a review from Silv3S November 15, 2022 16:44
@chenwhql
Copy link
Contributor

@chenwhql Please approve because the test that failed is outdated. It uses old Quant2Int8MkldnnPass which was replaced by C++ passes. We are planning to rewrite the test in Q1.

@sfraczek I don't understand, approving cannot resolve the test failed problem

revert changes to unsupported script
Copy link
Contributor

@yeliang2258 yeliang2258 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Contributor

@jczaja jczaja left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@jczaja jczaja merged commit fed0ed3 into PaddlePaddle:develop Nov 21, 2022
@paddle-bot
Copy link

paddle-bot bot commented Nov 21, 2022

你的PR已合入Paddle库,请关注后续测试结果。
Your PR has been merged into the repository. An official integration test will be conducted later. Stay tuned.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
contributor External developers int8 Intel
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

8 participants