Optimize the calling logic of the cast and cast_grad kernels to reduce compilation size #57083

AnnaTrainingG · 2023-09-07T13:12:54Z

PR types

Others

PR changes

Others

Description

Pcard-70459
compile test 4.1
Size change:

paddle-bot · 2023-09-07T13:12:58Z

Your PR has been submitted. Thanks for your contribution!
Please wait for the CI results first. See the Paddle CI Manual for details.

Xreki · 2023-09-07T14:51:00Z

paddle/phi/kernels/gpu/cast_grad_kernel.cu

-                       CastCUDAKernelImpl<T, data_t>(
-                           dev_ctx, out_grad, x_grad->dtype(), x_grad);
-                     }));
+  CastImpl<T, Context>(dev_ctx, out_grad, x_grad->dtype(), x_grad);


Won't this still compile two copies? Can it call phi::CastKernel directly?
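To illustrate the concern, here is a minimal sketch of a gradient routine that forwards to the forward cast instead of repeating the dtype dispatch. Every name in it (`DType`, `Buffer`, `CastImpl`, `Cast`, `CastGrad`) is a hypothetical stand-in, not a Paddle type or signature. It fits in a single file, so it only shows the call structure; the duplicated instantiation the comment asks about would arise when a second `.cu` file carries its own copy of the dispatch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-ins for illustration; none of these are Paddle types.
enum class DType { kInt32, kFloat64 };

struct Buffer {
  DType dtype = DType::kFloat64;
  std::vector<double> f64;
  std::vector<int32_t> i32;
};

// Innermost routine: the compiler emits one instantiation per (In, Out) pair.
template <typename In, typename Out>
void CastImpl(const std::vector<In>& x, std::vector<Out>* out) {
  out->resize(x.size());
  for (std::size_t i = 0; i < x.size(); ++i) {
    (*out)[i] = static_cast<Out>(x[i]);
  }
}

// Forward cast: the only place that dispatches on the runtime output dtype.
template <typename In>
void Cast(const std::vector<In>& x, DType out_dtype, Buffer* out) {
  out->dtype = out_dtype;
  switch (out_dtype) {
    case DType::kInt32:
      CastImpl<In, int32_t>(x, &out->i32);
      break;
    case DType::kFloat64:
      CastImpl<In, double>(x, &out->f64);
      break;
  }
}

// Gradient: forwards to Cast rather than repeating the switch above, so it
// adds no CastImpl instantiations of its own.
template <typename In>
void CastGrad(const std::vector<In>& out_grad, DType x_dtype, Buffer* x_grad) {
  Cast<In>(out_grad, x_dtype, x_grad);
}
```

With this shape, the gradient of a cast is simply another cast back to the input's dtype, which is why a single dispatch site can serve both directions.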

Xreki · 2023-09-12T05:54:11Z

paddle/phi/kernels/gpu/cast_impl.h

 }

+template <typename T, typename Context>
+void CastKernel(const Context& dev_ctx,


The CastKernel implementation does not need to move into a header file. Implementing it in cast_kernel.cu is enough; cast_grad_kernel.cu only needs to include cast_kernel.h.
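The layout suggested here can be sketched in one file as follows. The three regions stand for the header, the gradient source file, and the forward source file. The simplified signatures and the use of `extern template` plus an explicit instantiation are assumptions made so that the sketch compiles on its own; the thread does not show how Paddle itself instantiates its kernels.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// ---- What a header such as cast_kernel.h would carry: a declaration only ----
template <typename T>
void CastKernel(const std::vector<T>& x, std::vector<double>* out);

// Explicit instantiation declaration: code that sees only the header must not
// generate its own copy of CastKernel<float>.
extern template void CastKernel<float>(const std::vector<float>&,
                                       std::vector<double>*);

// ---- What cast_grad_kernel.cu would contain: a call through the declaration ----
inline void CastGradKernel(const std::vector<float>& out_grad,
                           std::vector<double>* x_grad) {
  CastKernel<float>(out_grad, x_grad);
}

// ---- What cast_kernel.cu would contain: the body and its instantiation ----
template <typename T>
void CastKernel(const std::vector<T>& x, std::vector<double>* out) {
  out->resize(x.size());
  for (std::size_t i = 0; i < x.size(); ++i) {
    (*out)[i] = static_cast<double>(x[i]);
  }
}

// Explicit instantiation definition: the single compiled copy.
template void CastKernel<float>(const std::vector<float>&,
                                std::vector<double>*);
```

The design point is that the template body is compiled once, in the file that owns it, and every other caller resolves to that copy at link time.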

Xreki · 2023-09-13T12:06:41Z

paddle/phi/kernels/gpu/cast_grad_kernel.cu

-                       CastCUDAKernelImpl<T, data_t>(
-                           dev_ctx, out_grad, x_grad->dtype(), x_grad);
-                     }));
+  CastKernel<T, Context>(dev_ctx, out_grad, x_grad->dtype(), x_grad);


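I think passing x.dtype() is more appropriate here. x_grad is an output, so its data type is not actually determined before Alloc, and it is reset later anyway (line 39 of cast_impl.h: out->set_type(out_dtype);).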
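A minimal sketch of the point about dtypes, using a hypothetical `Tensor` stand-in rather than Paddle's tensor class. It assumes an output tensor starts with an undefined dtype until the kernel sets it, which is how the comment describes `x_grad`; the `kUndefined` value and the simplified signatures are illustrative only.

```cpp
#include <cassert>

// Hypothetical stand-ins for illustration; not Paddle's tensor or dtype types.
enum class DType { kUndefined, kFloat16, kFloat32 };

struct Tensor {
  DType dtype_ = DType::kUndefined;
  bool allocated_ = false;

  DType dtype() const { return dtype_; }
  void set_type(DType t) { dtype_ = t; }
  void Alloc() { allocated_ = true; }
};

// Stand-in for the forward cast: it fixes the output dtype, then allocates.
// The element-wise conversion itself is omitted.
void CastKernel(const Tensor& x, DType out_dtype, Tensor* out) {
  (void)x;
  out->set_type(out_dtype);
  out->Alloc();
}

// The target dtype is read from the input x, whose dtype is already known,
// instead of from x_grad, which has not been given a dtype yet.
void CastGradKernel(const Tensor& x, const Tensor& out_grad, Tensor* x_grad) {
  CastKernel(out_grad, x.dtype(), x_grad);
}
```

Reading the dtype from `x` keeps the gradient correct regardless of what state the output tensor arrives in.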

Xreki

LGTM


Xreki reviewed Sep 7, 2023


Xreki reviewed Sep 12, 2023


AnnaTrainingG force-pushed the compile_4_1 branch 2 times, most recently from 3dc24d2 to e55c537 Compare September 13, 2023 05:36

Xreki reviewed Sep 13, 2023


update

6871280

AnnaTrainingG force-pushed the compile_4_1 branch from e55c537 to 6871280 Compare September 14, 2023 07:23

update

f0256b8

Xreki approved these changes Sep 15, 2023


AnnaTrainingG changed the title ~~compile test 4.1~~ Optimize the calling logic of the cast and cast_grad kernels to reduce compilation size Sep 15, 2023

AnnaTrainingG merged commit b9ee842 into PaddlePaddle:develop Sep 15, 2023
27 checks passed

danleifeng pushed a commit to danleifeng/Paddle that referenced this pull request Nov 14, 2023

Optimize the calling logic of the cast and cast_grad kernels to reduc…

358bfca

…e compilation size (PaddlePaddle#57083)
