[PaddlePaddle Hackathon 4 No.49]: Add float16 data type support for Paddle bce_loss #50930

Merged
merged 36 commits into from Apr 17, 2023

thunder95 (Contributor) commented Feb 26, 2023

PR types

Performance optimization

PR changes

OPs

Describe

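Add the float16 data type to bce_loss.

Test device: RTX 2070s

Current performance results for the bce_loss forward and backward passes:

| Case No. | input_shape | fp32 (ms) | fp16 (ms) | diff (ms) | relative diff |
|---|---|---|---|---|---|
| 1 | [16, 3, 64, 64, 1] | 0.024328 | 0.0209348 | 0.003393 | 16.21% faster |

Chinese API docs updated to support the fp16 data type: PaddlePaddle/docs#5704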

paddle-bot bot commented Feb 26, 2023

Your PR has been submitted. Thanks for your contribution!
Please wait for the result of CI first. See Paddle CI Manual for details.

@paddle-bot paddle-bot bot added contributor External developers status: proposed labels Feb 26, 2023
MT x_mt = static_cast<MT>(x);
MT term1 = max((static_cast<MT>(one) - x_mt) * x_mt, static_cast<MT>(eps));
return static_cast<T>(static_cast<MT>(dout) *
(x_mt - static_cast<MT>(label)) / term1);
Contributor

About eps (line 36): 1e-12 underflows to 0 when represented in fp16.
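The underflow can be reproduced outside the kernel. The following is a minimal NumPy sketch, not the Paddle code: NumPy's float16 stands in for the kernel's fp16 type, and the variable names are illustrative.

```python
import numpy as np

# float16 has a 10-bit mantissa and a minimum normal exponent of -14, so the
# smallest positive subnormal value is 2**-24 (about 5.96e-08).
smallest_subnormal = 2.0 ** -24

eps_fp32 = np.float32(1e-12)  # representable in float32
eps_fp16 = np.float16(1e-12)  # far below 2**-24, so it rounds to 0.0

print(eps_fp32 > 0)   # True
print(eps_fp16 == 0)  # True

# With eps == 0, a clamp of the form max((1 - x) * x, eps) no longer keeps
# the denominator away from zero at x == 1 (or x == 0).
x = np.float16(1.0)
denom = max((np.float16(1.0) - x) * x, eps_fp16)
print(denom)  # 0.0
```

Keeping eps in a wider type, or choosing a value that fp16 can represent, avoids the zero denominator.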

Contributor Author

Adjusted. I'm not sure whether it is acceptable to write it this way.

Contributor

Could the code here be simplified? one and eps can be member variables initialized as MT type. The original constructor can then be removed.

@@ -279,6 +280,48 @@ def init_test_cast(self):
self.shape = [2, 3, 20]


@unittest.skipIf(
not core.is_compiled_with_cuda(), "core is not compiled with CUDA"
)
Contributor

This no longer needs to be added.

Contributor Author

Removed.


class TestBceLossOpFP16Case1(OpTest):
def init_test_cast(self):
self.shape = [20, 30, 40, 50]
Contributor

This should inherit from TestBceLossOpFP16, and there is a typo: cast -> case. The same applies to the cases below.

Contributor Author

Fixed.

place = core.CUDAPlace(0)
if core.is_float16_supported(place):
self.check_grad_with_place(
place, ['X'], 'Out', max_relative_error=0.5
Contributor

In addition, the whole unit test should follow the low-precision operator unit test guidelines: https://www.paddlepaddle.org.cn/documentation/docs/zh/develop/dev_guides/amp_precision/amp_test_dev_guide_cn.html

  • You can inherit from TestBceLossOp and make small modifications to simplify the code.
  • Is the relative error for the backward pass reasonable?

Contributor Author

@zhangting2020 Could you please advise? I can't tell which part is written incorrectly; the relative error of the backward pass is consistently too large:
AssertionError: 0.42 not less than or equal to 0.001
AssertionError: 0.81 not less than or equal to 0.001
AssertionError: 0.81 not less than or equal to 0.001

@@ -68,7 +68,7 @@ class BCEWithLogitsLoss(Layer):
Args:
weight (Tensor, optional): A manual rescaling weight given to the loss of each
batch element. If given, it has to be a 1D Tensor whose size is `[N, ]`,
The data type is float32, float64. Default is ``'None'``.
The data type is float16, float32, float64. Default is ``'None'``.
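Contributor

Does this API correspond to bce_loss?

Also, an API implemented as a class usually calls the API under functional in its implementation; check the code for the specifics.

  • The documentation of both APIs needs to be updated in sync.
  • The type check in the static graph branch needs to be updated.
  • A static graph fp16 unit test needs to be added: inherit from unittest and simply call the API. See the static graph unit test in #51168 for reference.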

Contributor Author

Fixed.


static_cast<MT>(neg_100));
return static_cast<T>(
((static_cast<MT>(label) - static_cast<MT>(one)) * term2) -
(static_cast<MT>(label) * term1));
Contributor

This is a similar issue to the one above. I think the original implementation can be modified: one and neg_100 are already member variables, so they can be initialized as MT type directly.

Contributor Author

Fixed.
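For readers following the kernel fragments in this thread, here is a NumPy sketch of the "upcast to MT, compute, cast back to T" pattern. It is an illustration rather than the Paddle implementation: `compute_dtype` plays the role of MT, the function names are hypothetical, and the definitions of term1 and term2 in the forward pass (clamped logarithms) are assumed from the standard BCE formula because the diff only shows the final expression.

```python
import numpy as np

def bce_forward(x, label, compute_dtype=np.float32):
    # Upcast inputs and constants to the compute type (MT in the kernel).
    x_mt = x.astype(compute_dtype)
    label_mt = label.astype(compute_dtype)
    one = compute_dtype(1.0)
    neg_100 = compute_dtype(-100.0)
    # Assumed definitions: logarithms clamped from below at -100.
    term1 = np.maximum(np.log(x_mt), neg_100)
    term2 = np.maximum(np.log(one - x_mt), neg_100)
    out = (label_mt - one) * term2 - label_mt * term1
    return out.astype(x.dtype)  # cast back to the storage type (T)

def bce_backward(x, label, dout, compute_dtype=np.float32, eps=1e-12):
    x_mt = x.astype(compute_dtype)
    one = compute_dtype(1.0)
    # eps stays in the compute type, where 1e-12 is representable.
    denom = np.maximum((one - x_mt) * x_mt, compute_dtype(eps))
    grad = dout.astype(compute_dtype) * (x_mt - label.astype(compute_dtype)) / denom
    return grad.astype(x.dtype)

x = np.array([0.25, 0.5, 0.75], dtype=np.float16)
label = np.array([0.0, 1.0, 1.0], dtype=np.float16)
dout = np.ones_like(x)

out = bce_forward(x, label)
grad = bce_backward(x, label, dout)
print(out.dtype, grad.dtype)  # float16 float16
```

Holding the constants in the compute type from the start is what removes the repeated casts that the review comments point at.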


class TestBceLossOpFP16Case2(TestBceLossOpFP16):
def init_test_case(self):
self.shape = [2, 3, 20]
Contributor

The unit tests above can be simplified further. TestBceLossOpFP16 inherits from TestBceLossOp, so TestBceLossOp can be adjusted, for example so that dtype and shape can be set when a case is initialized. That removes a lot of redundant code.

Why is max_relative_error so large?
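The suggested structure can be sketched with plain unittest. Everything below is a hypothetical stand-in: `BceLossCaseBase` is not Paddle's OpTest or TestBceLossOp, and the hook name `init_test_dtype` is an assumption; only `init_test_case` and the shapes come from the diff.

```python
import unittest
import numpy as np

class BceLossCaseBase(unittest.TestCase):
    """Hypothetical base case: subclasses override only the hooks."""

    def init_test_dtype(self):
        self.dtype = np.float64

    def init_test_case(self):
        self.shape = [10, 10]

    def setUp(self):
        # The hooks run first, so shared setup sees the subclass's choices.
        self.init_test_dtype()
        self.init_test_case()
        rng = np.random.default_rng(0)
        self.x = rng.uniform(0.1, 0.8, tuple(self.shape)).astype(self.dtype)
        self.label = rng.integers(0, 2, tuple(self.shape)).astype(self.dtype)

    def test_shape_and_dtype(self):
        self.assertEqual(list(self.x.shape), self.shape)
        self.assertEqual(self.x.dtype, self.dtype)

class BceLossFP16Case(BceLossCaseBase):
    def init_test_dtype(self):
        self.dtype = np.float16

class BceLossFP16Case2(BceLossFP16Case):
    def init_test_case(self):
        self.shape = [2, 3, 20]
```

Each new case then needs only a one-method override instead of a copy of the whole setup.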

Contributor Author

This is temporary, just to test CI. The relative error of the backward pass is very large and I still haven't found the cause:
AssertionError: 0.42 not less than or equal to 0.001
AssertionError: 0.81 not less than or equal to 0.001
AssertionError: 0.81 not less than or equal to 0.001

feed={'x': x_data, 'y': y_data}, fetch_list=[out]
)[0]
np.testing.assert_allclose(
output_pd, output_np, rtol=1e-3, atol=1e-3
Contributor

Does it pass with atol set to 0?

Contributor Author

@zhangting2020 There is no problem here; it also passes with atol=1e-3.
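As background to the atol question: `np.testing.assert_allclose` accepts a value when `|actual - desired| <= atol + rtol * |desired|`, so a nonzero atol can hide large relative errors on values near zero. The sketch below uses made-up numbers, not outputs from this PR.

```python
import numpy as np

# Made-up values: the first element is off by 100% in relative terms,
# but only by 1e-4 in absolute terms.
desired = np.array([1e-4, 1.0])
actual = np.array([2e-4, 1.0005])

# Passes: atol=1e-3 absorbs the error on the small element.
np.testing.assert_allclose(actual, desired, rtol=1e-3, atol=1e-3)

# With atol=0 only the relative tolerance applies, and the check fails.
try:
    np.testing.assert_allclose(actual, desired, rtol=1e-3, atol=0)
    strict_passed = True
except AssertionError:
    strict_passed = False

print(strict_passed)  # False
```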


self.inputs = {'X': input_np, 'Label': label_np}
self.outputs = {'Out': output_np}

def test_check_output(self):
self.check_output()
self.check_output(check_eager=True)
Contributor Author

@zhangting2020 What is the meaning and purpose of check_eager?

Contributor

It is a parameter that was added to the unit test system during the framework upgrade in order to test dynamic graph mode. It does not affect the test results.

You need to look into the precision of the backward computation; the unit test failure indicates that the precision check does not pass.

Contributor Author

@zhangting2020 I have been checking for a long time and still can't tell where the problem is. So far it looks like the computed numeric_grads differ substantially from the expected values. For example, in the mean computation, np.array([85.02881]).astype(np.float16) => 85.0, so although pos and neg differ in float, both take the value 85.0 in float16 and the resulting gradient is 0. If the input is set to float32 when computing the mean, a gradient value is obtained and the error shrinks from 0.42 to 0.05. I would appreciate further guidance.
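The effect described in this comment can be reproduced in isolation. In the sketch below, only the value 85.02881 comes from the thread; the second loss value and the step size delta are hypothetical, and the central-difference formula is a generic one rather than the test framework's code.

```python
import numpy as np

# Hypothetical mean-loss values at x + delta and x - delta.
loss_pos, loss_neg, delta = 85.02881, 85.02381, 0.005

# Gap between adjacent float16 values in [64, 128): eps * 2**6
fp16_spacing = float(np.finfo(np.float16).eps) * 64
print(fp16_spacing)          # 0.0625
print(np.float16(loss_pos))  # 85.0
print(np.float16(loss_neg))  # 85.0

# Central difference: in float16 both losses collapse to 85.0, so the
# numerator is exactly 0; in float32 the difference survives.
grad_fp16 = (np.float16(loss_pos) - np.float16(loss_neg)) / np.float16(2 * delta)
grad_fp32 = (np.float32(loss_pos) - np.float32(loss_neg)) / np.float32(2 * delta)
print(grad_fp16)  # 0.0
print(grad_fp32)  # close to 0.5
```

Any difference between the two perturbed losses that is smaller than the float16 spacing at that magnitude is lost, which is consistent with the error shrinking once the mean is accumulated in float32.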

Contributor

When you say the computed numeric_grads differ substantially from the expected values, which part of the unit test framework's implementation do you mean? Could you paste a link? The framework itself may be causing a loss of precision in the theoretical gradient values; let me confirm.

Contributor Author

@zhangting2020 Hello, after merging the latest code I found that the previous op_test.py file is gone. That is the file in which I had been printing the relevant outputs for numerical comparison. After I removed both atol and max_relative_error, the check surprisingly passed, as if nothing were being checked. Has this part been significantly reworked since?

Contributor

(1) Some unit test files appear to have been moved; the code is now in this file: python/paddle/fluid/tests/unittests/eager_op_test.py

(2) Based on what you describe, I am worried the test might fail intermittently. I suggest first trying several different shapes in your own development environment and rerunning the unit test repeatedly, for example 100 times: ctest -R test_bce_loss --repeat-until-fail 100. If it passes that way, it should be fine. Running unit tests with ctest requires building with WITH_TESTING enabled (-DWITH_TESTING=ON), for example:

cmake .. -DPY_VERSION=3.7 -DWITH_GPU=ON -DWITH_TESTING=ON -DCMAKE_BUILD_TYPE=Release -DWITH_DISTRIBUTE=OFF

Contributor Author

@zhangting2020 I ran ctest -R test_bce_loss --repeat-until-fail 100 with the shapes [10, 10], [100, 100], [5000, 5000], and [20, 30, 40, 50]; all runs passed.

luotao1 (Contributor) commented Apr 14, 2023

[image]

@thunder95 The ROCM pipeline needs to be fixed; according to the history, every run has failed.

thunder95 (Contributor Author)

@luotao1 Thanks, I have only just noticed the problem here; I had not gone through the whole log when I looked at it.

zhangting2020 (Contributor) left a comment

LGTM

@luotao1 luotao1 merged commit 44e6de9 into PaddlePaddle:develop Apr 17, 2023
24 checks passed
jjyaoao pushed a commit to jjyaoao/Paddle that referenced this pull request Apr 19, 2023
…addlePaddle#50930)

* untracked files

* bce_loss_fp16

* remove unused files

* back max_rel_erro still big

* simplify code

* upd

* fix max_relative_error

* restart ci

* Update test_bce_loss.py

* Update test_bce_loss.py

* Update test_bce_loss.py

* Update test_bce_loss.py

* try to pass test

* restore file

* remove error value

* fix bug

---------

Co-authored-by: Zhang Ting <Douyaer2020@qq.com>