
Support reduce_sum_op float16 #32966

Merged
merged 8 commits into PaddlePaddle:develop on Jun 15, 2021

Conversation

@thisjiang (Contributor) commented on May 18, 2021

PR types

New features

PR changes

OPs

Describe

Background

Mixed-precision (pure_fp16) training does not support optimizers with gradient clipping. One of the underlying problems is that reduce_sum_op fails to compile when it encounters the paddle::platform::float16 type.

Investigation

From the build log, the compilation error occurs at cub::Reduce, which means cub::Reduce does not support the paddle::platform::float16 type.

Digging further, the error is triggered by a paddle::platform::float16 to float conversion. In isolated experiments, both float(float16_num) and static_cast<float>(float16_num) compile fine, but float num = float16_num fails, and with the same error message as the build. We can therefore conclude that cub::Reduce fails because its internal implementation assigns float16 values directly (via =) to float variables.
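The experiment can be reproduced with a minimal sketch along the following lines (assuming a float16 type whose float conversion operator is explicit; this is an illustration, not the actual paddle::platform::float16 definition):

// Stand-in for a half-precision type whose conversion to float is explicit.
struct float16 {
  unsigned short x;
  explicit operator float() const { return 0.0f; }  // body omitted for brevity
};

void experiment(const float16& float16_num) {
  float a = float(float16_num);               // OK: functional-style cast
  float b = static_cast<float>(float16_num);  // OK: explicit cast
  // float c = float16_num;                   // fails: copy-initialization
  //                                          // ignores explicit conversions
  (void)a; (void)b;
}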

Solutions

Option 1

Since this is a type-conversion error, the input and output types must differ: the input is paddle::platform::float16 and the output is float. Looking at the code, at TensorReduceFunctor::apply the input type is Tx, and TransformOp reads the data and also returns Tx after processing, but the output type is Ty, so a conversion occurs whenever Ty is not equal to Tx. A simple fix is to wrap TransformOp in a conversion layer so that the input is Tx and the output is Ty:

template <typename Tx, typename Ty, typename TransformOp>
struct ConversionFunctor {
  const TransformOp& transformer;
  HOSTDEVICE explicit inline ConversionFunctor(const TransformOp& transformer)
        : transformer(transformer) {} 
  HOSTDEVICE inline Ty operator()(const Tx& x) const {
    return static_cast<Ty>(transformer(x));
  }
};
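
A hypothetical call site (ExampleTransform and the surrounding names are illustrative, not code from the PR): the original transform keeps reading and returning float16, while the wrapper hands float to the reduction, so cub::Reduce never has to assign a float16 value to a float variable internally.

// Hypothetical Tx -> Tx transform (illustrative only).
struct ExampleTransform {
  HOSTDEVICE inline paddle::platform::float16 operator()(
      const paddle::platform::float16& x) const {
    return x;
  }
};

void example(const paddle::platform::float16& x) {
  ExampleTransform transform;
  ConversionFunctor<paddle::platform::float16, float, ExampleTransform> conv(
      transform);
  float y = conv(x);  // equivalent to static_cast<float>(transform(x))
  (void)y;
}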

Option 2

Option 1 certainly works and needs no other code changes, which is convenient. The problem is that when both the input and output are float16, the accumulation itself carries a large precision error, so the better approach is to accumulate in the higher-precision float type whenever the input is float16. In code, this means introducing an MPType for the intermediate computation: float when the input is float16, and the input type itself otherwise.
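
A minimal sketch of the MPType idea (the trait name and form here are assumptions; the PR's actual implementation may differ): map float16 to float for intermediate accumulation and leave every other type unchanged.

// Default: accumulate in the input type itself.
template <typename T>
struct GetMPType {
  using type = T;
};

// float16 inputs are accumulated in float to limit rounding error.
template <>
struct GetMPType<paddle::platform::float16> {
  using type = float;
};

template <typename T>
using MPType = typename GetMPType<T>::type;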

Of course, since we cannot modify the internals of cub::Reduce, and cub::Reduce provides no parameter like MPType, we have to implement our own ReduceKernel1D. Internally, ReduceKernel1D calls cub::BlockReduce, but the computation is carried out in MPType. For consistency, MPType is also added to the other hand-written kernels, ReduceKernel and ReduceKernel2D, to preserve precision under float16.
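
A sketch of what such a hand-written kernel could look like (it reuses the MPType trait sketched above; the block size, grid-stride loop, and per-block partial output are illustrative choices, not the exact kernel in commit 5596a88). Each thread accumulates in MPType, cub::BlockReduce combines the partial sums, and only the final store casts back to Ty; a full reduction would still need a second pass over the per-block partials.

#include <cub/cub.cuh>

template <typename Tx, typename Ty, typename TransformOp, int BlockDim>
__global__ void ReduceKernel1D(const Tx* x, Ty* y, TransformOp transformer,
                               int reduce_num) {
  using MPT = typename GetMPType<Tx>::type;  // float when Tx is float16
  using BlockReduce = cub::BlockReduce<MPT, BlockDim>;
  __shared__ typename BlockReduce::TempStorage temp_storage;

  // Grid-stride loop: every thread accumulates its share in MPT precision.
  MPT local_sum = static_cast<MPT>(0);
  for (int i = blockIdx.x * BlockDim + threadIdx.x; i < reduce_num;
       i += gridDim.x * BlockDim) {
    // This is why TransformOp::operator() must accept more than one input type.
    local_sum += static_cast<MPT>(transformer(x[i]));
  }

  // Combine the per-thread partial sums within the block.
  MPT block_sum = BlockReduce(temp_storage).Sum(local_sum);
  if (threadIdx.x == 0) {
    y[blockIdx.x] = static_cast<Ty>(block_sum);  // cast back only at the end
  }
}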

The change is implemented in commit 5596a88.

Remaining issues

  1. The hand-written kernels require changing TransformOp at its definition, i.e. operator() must be allowed to accept inputs of other types. Since many ops use the TensorReduce function, the TransformOp::operator() behind each of these ops' TensorReduce calls has to be modified, which is a very large change.
  2. A hand-written kernel is unlikely to beat the mature cub::Reduce, so there may be a performance loss compared with cub::Reduce.
  3. Besides the new ReduceKernel1D, the existing ReduceKernel2D and ReduceKernel also have to be modified, which is another sizeable change.

Is there a way to avoid these three problems? For example, specializing at TensorReduceFunctor::apply: when the input and output are both float16, first force TensorReduce to output float, then cast the result back to float16?

@paddle-bot-old commented:

Thanks for your contribution!
Please wait for the result of CI firstly. See Paddle CI Manual for details.

@@ -301,7 +301,10 @@ template <typename T>
struct IdentityFunctor {
  HOSTDEVICE explicit inline IdentityFunctor() {}

  HOSTDEVICE inline T operator()(const T& x) const { return x; }
  template <typename T2>
Review comment (Contributor): Why use T2? Could another letter be considered? And what is T still used for?

@@ -38,7 +38,10 @@ template <typename T>
struct IdentityFunctor {
  HOSTDEVICE explicit inline IdentityFunctor() {}

  HOSTDEVICE inline T operator()(const T& x) const { return x; }
  template <typename T2>
Review comment (Contributor): Same as above.

wzzju previously approved these changes on Jun 9, 2021

@wzzju (Contributor) left a comment:

LGTM.

Xreki previously approved these changes on Jun 9, 2021

@Xreki (Contributor) left a comment:

LGTM for op benchmark ci

@@ -241,7 +241,10 @@ template <typename T>
struct IdentityFunctor {
  HOSTDEVICE explicit inline IdentityFunctor() {}

  HOSTDEVICE inline T operator()(const T& x) const { return x; }
  template <typename U>
Review comment (Contributor): One question: is the template type T in the class definition now unused?

Reply (thisjiang, Contributor, Author): Yes. U was added because without it the functor could only accept float16 arguments, so compilation would fail once float is used for accumulation. T is kept for compatibility; dropping it would require changing every place that calls IdentityFunctor.
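
The diff hunks above are truncated; a hedged sketch of the shape being discussed (the exact body of the templated operator() is an assumption) is:

template <typename T>
struct IdentityFunctor {
  HOSTDEVICE explicit inline IdentityFunctor() {}

  // Templated on U so the functor also accepts float (the MPType) when the
  // class is instantiated with T = float16; T itself is kept only so existing
  // IdentityFunctor<T> call sites continue to compile.
  template <typename U>
  HOSTDEVICE inline U operator()(const U& x) const {
    return x;
  }
};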

@thisjiang dismissed stale reviews from Xreki and wzzju via 89e4127 on June 10, 2021 11:52
@Xreki (Contributor) left a comment:

LGTM for op benchmark ci

@wzzju merged commit 606939d into PaddlePaddle:develop on Jun 15, 2021
@thisjiang deleted the optimize-reduce_sum_op_fp16 branch on July 9, 2021 03:12