Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FC + elementwise_add (Residual connection) #40834

Closed
wants to merge 50 commits into from

Conversation

Silv3S
Copy link
Member

@Silv3S Silv3S commented Mar 22, 2022

PR types

Performance optimization

PR changes

OPs

Describe

  • Fuse FC with elementwise_add so residual data is added inside FC operator,
  • add unit tests with multiple cases for this fuse pass,

Benchmarks

Test were conducted with oneDNN 2.5.4 , fc_mkldnn_pass and fc_act_mkldnn_pass were enabled:

  • VIT-OCR is performing 4% faster (12 fc+residual fusions),
  • BERT is performing 8% faster (24 fc+residual fusions).

paddle/fluid/operators/fc_op.cc Outdated Show resolved Hide resolved
paddle/fluid/operators/fc_op.cc Outdated Show resolved Hide resolved
paddle/fluid/framework/ir/graph_traits.cc Outdated Show resolved Hide resolved
paddle/fluid/operators/mkldnn/fc_mkldnn_op.cc Outdated Show resolved Hide resolved
paddle/fluid/operators/mkldnn/fc_mkldnn_op.cc Outdated Show resolved Hide resolved
tsocha
tsocha previously approved these changes Apr 5, 2022
jczaja
jczaja previously approved these changes Apr 5, 2022
Copy link
Contributor

@jczaja jczaja left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@jczaja
Copy link
Contributor

jczaja commented Apr 5, 2022

@zhiqiu Could you please review changes at fc_mkldnn_op.cc related to ShareDataWith ? Changes are modeled after conv_mkldnn_op.cc where residual connections part is also using ShareDataWith(). We tried to use TensorCopy but there was significant performance degradation, so FC and CONV both are using ShareDataWith. We need your approval to pass CI. Feel free to ask questions

@jczaja jczaja requested a review from baoachun April 5, 2022 11:54
@jczaja
Copy link
Contributor

jczaja commented Apr 5, 2022

@baoachun Please review

@baoachun
Copy link
Contributor

baoachun commented Apr 7, 2022

Hi @Silv3S, well done!Could you please attach performance and accuracy comparison data for this pass?

@lidanqing-intel
Copy link
Contributor

lidanqing-intel commented Apr 7, 2022

@baoachun After @Silv3S provide performance, could you please help with PR-CI-APPROVAL? Thanks !
From the PR description, I can see with this fuse pass VIT-OCR is performing 4% faster.

@zhiqiu
Copy link
Contributor

zhiqiu commented Apr 7, 2022

@zhiqiu Could you please review changes at fc_mkldnn_op.cc related to ShareDataWith ? Changes are modeled after conv_mkldnn_op.cc where residual connections part is also using ShareDataWith(). We tried to use TensorCopy but there was significant performance degradation, so FC and CONV both are using ShareDataWith. We need your approval to pass CI. Feel free to ask questions

Hi, it is not suggested to ShareData of the input and output of an operator, since it makes the op be an implicit in-place indeed. So is it possible to make residual the same varariable as Output, just like Param and ParamOut in sgd operator.

@Silv3S Silv3S dismissed stale reviews from jczaja and tsocha via cbe267f April 8, 2022 09:04
@Silv3S
Copy link
Member Author

Silv3S commented Apr 8, 2022

Hi @Silv3S, well done!Could you please attach performance and accuracy comparison data for this pass?

Hi @baoachun, thank you! I updated PR description to provide performance comparison for affected models. I also prepared test which compares results between NativeConfig and AnalysisConfig, but I will need your help with uploading needed data.txt file to vit-ocr.tgz repository.

@Silv3S
Copy link
Member Author

Silv3S commented Apr 8, 2022

Hi, it is not suggested to ShareData of the input and output of an operator, since it makes the op be an implicit in-place indeed. So is it possible to make residual the same varariable as Output, just like Param and ParamOut in sgd operator.

Thank you for review @zhiqiu. Since we don't modify residual_param inside FC_op, I set it explicitly as output. Now ShareDataWith is called on two outputs of operator. Please advise if this approach is good.

zhiqiu
zhiqiu previously approved these changes Apr 11, 2022
Copy link
Contributor

@zhiqiu zhiqiu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@zhiqiu
Copy link
Contributor

zhiqiu commented Apr 11, 2022

Hi, it is not suggested to ShareData of the input and output of an operator, since it makes the op be an implicit in-place indeed. So is it possible to make residual the same varariable as Output, just like Param and ParamOut in sgd operator.

Thank you for review @zhiqiu. Since we don't modify residual_param inside FC_op, I set it explicitly as output. Now ShareDataWith is called on two outputs of operator. Please advise if this approach is good.

Thanks, LGTM

@Silv3S
Copy link
Member Author

Silv3S commented Apr 11, 2022

Hi @XieYunshen, could you please approve setting timeout for new unit test?

@Silv3S
Copy link
Member Author

Silv3S commented Apr 13, 2022

Continued in #41776 due to CI bug (non refreshing checks). Please approve second PR (code didn't change)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants