[AMP] fix static promote #53439

Merged
merged 7 commits into from May 8, 2023

zhangting2020 (Contributor) commented Apr 28, 2023

PR types

Bug fixes

PR changes

Others

Description

fix static promote

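Operators that were put in unsupported_list because of performance problems are moved to the black list instead, so that in O2 mode weights stay fp32 in only three cases:

  • The operator does not support fp16
  • The operator is not under fp16_guard
  • Special operators such as bn (batch norm) need to keep fp32 weights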
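
The three cases above can be sketched as a single predicate. This is only an illustration under stated assumptions, not Paddle's actual implementation: the function name `keeps_fp32_weights`, its parameters, and the example op sets are all hypothetical.

```python
def keeps_fp32_weights(op_type, unsupported_ops, fp32_weight_ops,
                       use_fp16_guard, in_fp16_guard):
    """Return True if an op's weights should stay fp32 in O2 mode (illustrative)."""
    # Case 1: the op does not support fp16.
    if op_type in unsupported_ops:
        return True
    # Case 2: fp16_guard is enabled and the op is outside the guarded region.
    if use_fp16_guard and not in_fp16_guard:
        return True
    # Case 3: special ops such as batch norm keep fp32 weights.
    if op_type in fp32_weight_ops:
        return True
    # Everything else, including black-list ops, gets fp16 weights.
    return False
```

Note that black-list membership is deliberately absent from the predicate: under this PR's description, being in the black list alone is not a reason to keep weights in fp32.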

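In some models, the weight of one operator may also be used by a later operator that is in the white list. The weight's name then appears in both keep_fp32_var_names and to_fp16_var_names, which can leave the weight's var.dtype different from the dtype of the data actually stored. Fix: if a var is in keep_fp32_var_names, remove it from to_fp16_var_names.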

This case occurs in transformer models. This PR fixes the following error:
[image: error screenshot]
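
The conflict resolution described above amounts to a set difference. The sketch below assumes both collections are plain sets of variable names; the helper `resolve_cast_sets` and the example variable names are hypothetical and do not appear in the PR.

```python
def resolve_cast_sets(keep_fp32_var_names, to_fp16_var_names):
    """Give keep_fp32_var_names priority over to_fp16_var_names (illustrative)."""
    keep_fp32 = set(keep_fp32_var_names)
    # Any variable that must stay fp32 is dropped from the fp16 cast set,
    # so a weight can never be scheduled for both treatments.
    to_fp16 = set(to_fp16_var_names) - keep_fp32
    return keep_fp32, to_fp16


# Hypothetical example: an embedding weight that is also consumed by a matmul.
keep_fp32, to_fp16 = resolve_cast_sets(
    {"embedding.w_0"}, {"embedding.w_0", "linear.w_0"}
)
```

After the call, the two sets are disjoint, so a variable's declared dtype and its stored data are converted (or left alone) together.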

paddle-bot bot commented Apr 28, 2023

Your PR has been submitted. Thanks for your contribution!
Please wait for the CI results first. See the Paddle CI Manual for details.

paddle-bot bot commented Apr 28, 2023

❌ This PR was not created using the PR template. You can refer to this Demo.
Please use the PR template; it saves our maintainers' time so that more developers get helped.

@@ -196,8 +225,6 @@ def _update_list(self):
    elif op_name in self.gray_list:
        self.gray_list.remove(op_name)
    self.white_list.add(op_name)
    if op_name in _extra_unsupported_list:
        self.unsupported_list.remove(op_name)
Contributor Author


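At line 235 of this file, there is actually no need to put operators from the custom black list into the unsupported list, because they have already been added to the black list.
However, python/paddle/distributed/passes/auto_parallel_fp16.py uses AutoMixedPrecisionLists, and parts of its processing do not account for black-list operators, so deleting line 235 would make the related unit tests fail. It is not yet clear how much impact modifying auto_parallel_fp16.py directly would have, so the line has not been removed for now.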
image
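
For context, the custom white list handling shown in the hunk above can be sketched with plain Python sets. This is a simplified stand-in, not the AutoMixedPrecisionLists class itself: the function `apply_custom_white_list` and its parameters are hypothetical, and the black-list branch is an assumption, since only the `elif` branch is visible in the diff.

```python
def apply_custom_white_list(custom_white, white, gray, black,
                            unsupported, extra_unsupported):
    """Promote user-specified ops to the white list, in place (illustrative)."""
    for op_name in custom_white:
        if op_name in black:  # assumed branch, not visible in the hunk
            black.remove(op_name)
        elif op_name in gray:
            gray.remove(op_name)
        white.add(op_name)
        # Ops marked unsupported only for performance reasons become
        # supported again once the user explicitly whitelists them.
        if op_name in extra_unsupported:
            # discard() is used here so a missing entry is not an error;
            # the hunk above calls remove().
            unsupported.discard(op_name)
```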

# from black_list and unsupport_list.
if op in ['lookup_table', 'lookup_table_v2']:
    continue
if _need_keep_fp32(op, amp_lists.unsupported_list, use_fp16_guard):
Contributor Author


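This change fixes the error seen in transformer models and also makes dynamic-graph and static-graph modes behave consistently when converting model parameter types: operators in the black list still get fp16 weights, and only operators that do not support fp16, or that are not under use_fp16_guard, need to keep fp32 weights.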

Xreki (Contributor) commented May 6, 2023

It would be best to merge develop into this PR.

Xreki (Contributor) left a comment

LGTM

Xreki merged commit 2bf6128 into PaddlePaddle:develop May 8, 2023
24 checks passed
niuliling123 pushed a commit to niuliling123/Paddle that referenced this pull request May 9, 2023
lanxianghit pushed a commit that referenced this pull request May 9, 2023
fix static promote
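Operators that were put in unsupported_list because of performance problems are moved to the black list, so that in O2 mode weights stay fp32 in only three cases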