Reduction for int8 and bfloat16 by qianfengz · Pull Request #125 · ROCm/composable_kernel

qianfengz · 2022-03-13T12:43:30Z

This PR provides the following:

Reduction for int8 and bfloat16 tensor data
Fix some issues

Some MThreadSlicedSize + OutDstVectorSize configurations are invalid
GetWorkspaceSizeInBytes() of DeviceReduceMultiblockPartialReduce does not calculate the workspace size correctly, which could cause a GPU memory fault

Refine the code in DeviceReduceXXX
Add a tensor reduction configuration for 4-d all-dimension reduction
Some renaming
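
The invalid slice-size/vector-size combinations mentioned above can be rejected at compile time. The following is a minimal sketch, not the code from this PR: the struct ReduceThreadConfig, the helper IsValidSliceVectorConfig, and the divisibility rule itself are assumptions made for illustration. It assumes each thread writes its MThreadSlicedSize outputs in vectors of OutDstVectorSize elements, so the former must be a positive multiple of the latter.

```cpp
// Hypothetical predicate: a thread can only store its outputs with whole
// vectors if the per-thread slice is a positive multiple of the vector size.
constexpr bool IsValidSliceVectorConfig(int mThreadSlicedSize, int outDstVectorSize)
{
    return mThreadSlicedSize > 0 && outDstVectorSize > 0 &&
           mThreadSlicedSize % outDstVectorSize == 0;
}

// Hypothetical stand-in for one tunable configuration. The real DeviceReduce
// templates in composable_kernel take many more parameters.
template <int MThreadSlicedSize, int OutDstVectorSize>
struct ReduceThreadConfig
{
    // Instantiating an invalid combination fails to compile with this message.
    static_assert(IsValidSliceVectorConfig(MThreadSlicedSize, OutDstVectorSize),
                  "MThreadSlicedSize must be a positive multiple of OutDstVectorSize");

    // Number of vector stores each thread issues for its output slice.
    static constexpr int kOutputVectorsPerThread = MThreadSlicedSize / OutDstVectorSize;
};
```

With this sketch, ReduceThreadConfig<4, 2> compiles, while ReduceThreadConfig<2, 4> is rejected by the static_assert instead of failing at run time.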

asroy · 2022-03-21T22:00:46Z

-using kInDataType  = ck::half_t;
-using kOutDataType = ck::half_t;
-using kAccDataType = float;
+using hInDataType  = half_float::half;


HostInDataType
HostOutDataType
HostAccDataType

https://github.com/ROCmSoftwarePlatform/composable_kernel/wiki/Coding-Style#naming-style

asroy · 2022-03-21T22:01:20Z

-using kInDataType  = ck::half_t;
-using kOutDataType = ck::half_t;
-using kAccDataType = float;
+using hInDataType  = half_float::half;


Something wrong with using ck::half_t on host?

The reason is that Reduction needs abs() and isnan() on fp16. For ck::half_t, the equivalent functions __habs() and __hisnan() can only be used in __device__ code. On the other hand, half_float::half has a direct and complete implementation of abs() and isnan() on the host side.
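
To illustrate why host-callable abs() and isnan() matter, here is a minimal sketch of a host-side AMAX reference. It is not the host reduction from this PR: the function name HostAmax and the NaN-propagation behaviour are assumptions. The example runs with float; the unqualified calls are written so that a type such as half_float::half, which provides abs() and isnan() in its own namespace, should be picked up by argument-dependent lookup.

```cpp
#include <cmath>
#include <vector>

// Hypothetical host reference for AMAX: returns max(|x_i|) over the input.
// Assumption: the first NaN encountered is propagated to the result.
template <typename T>
T HostAmax(const std::vector<T>& values)
{
    // Make the standard overloads visible for built-in types; for class types
    // such as half_float::half, ADL finds the overloads in their namespace.
    using std::abs;
    using std::isnan;

    T result = static_cast<T>(0.0f);
    for(const T& v : values)
    {
        if(isnan(v))
            return v; // propagate NaN
        const T a = abs(v);
        if(a > result)
            result = a;
    }
    return result;
}
```

A type whose abs() and isnan() are only available in device code, as described above for ck::half_t, cannot be used with a host routine like this one, which is the motivation for switching the host-side types.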

asroy · 2022-03-21T22:04:28Z

 struct DeviceReduce : public BaseOperator
 {
-    virtual size_t GetWorkspaceSizeInBytes(const std::vector<int>& inLengths)
+    virtual size_t GetWorkspaceSizeInBytes(const std::vector<int> inLengths,


Please use long_index_t.

I'm going to make sure all files in include/ck, including the device operations, meet this standard:

https://github.com/ROCmSoftwarePlatform/composable_kernel/wiki/Coding-Style#integer-type
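
A short sketch of why a 64-bit index type helps when sizes are derived from tensor lengths. The alias below assumes that ck::long_index_t is a 64-bit signed integer. The helpers CountElements and WorkspaceBytes are hypothetical and do not reproduce the formula used by DeviceReduceMultiblockPartialReduce; they only show the overflow-safe accumulation.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumption: stands in for ck::long_index_t, taken here to be 64-bit signed.
using long_index_t = std::int64_t;

// Hypothetical helper: product of all lengths, accumulated in 64 bits so that
// large tensors do not overflow a 32-bit int.
inline long_index_t CountElements(const std::vector<int>& lengths)
{
    long_index_t total = 1;
    for(int len : lengths)
        total *= static_cast<long_index_t>(len);
    return total;
}

// Hypothetical helper: byte size of a buffer holding one value per element.
inline std::size_t WorkspaceBytes(const std::vector<int>& lengths, std::size_t bytesPerElement)
{
    return static_cast<std::size_t>(CountElements(lengths)) * bytesPerElement;
}
```

For example, lengths of {65536, 65536} give 2^32 elements, which does not fit in a 32-bit int but is represented exactly by the 64-bit accumulator.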

asroy

Please fix the naming issue and the merge conflict. Otherwise LGTM.

qianfengz and others added 30 commits March 6, 2022 06:46

Use thread cluster descriptor and explicit M_K 2d descriptor to simply Blockwise Reduction

61054dd

Change by replacing ReduceDims by NumReduceDims as Device Reduce interface template parameter

106951c

Rename the folder name for the pool2d and reduce examples

896e2af

Update to reduction test scripts

7fea393

Add Readme for pool2d_fwd and reduce_blockwise examples

e27fc75

Add support for int8_t reduction (ADD/AVG, MIN/MAX/AMAX)

6b91757

Tiny fix in reduce profiler and tiny update in reduce testing scripts

a2fbd87

Merge branch 'pr82-followup' into ck_reduce_int8_bp16

0e197b8

Tiny fix in testing script profile_reduce_no_index.sh

5881bf8

Tiny fix in testing script profile_reduce_no_index.sh

5357c36

Merge branch 'develop' into pr82-followup

7398cef

Add support for bfp16 reduction (using bhalf_t = ushort)

d2ec785

Tiny fix in amd_buffer_addressing.hpp

55ff757

Tiny change in script/profile_reduce_with_index.sh

ab45ae0

Use AccDataType for Beta value and use element_wise::PassThrough

f95b23c

Use type_convert for type converting in host layer reduction

1600461

Renaming and refining in Reduction profiler/device layer/examples

9327afb

Renaming and refining in Reduction profiler/device layer/examples

a29ccd5

Renaming all NumReduceDims to NumReduceDim

c6e55e8

Fix the leaked type_convert in ThreadwiseTensorSliceTransfer_v2

c5d051d

Update to testing scripts to add bf16 support

5801348

Align the files for int8/bfloat16 with the re-organized directory tree

aec51ed

Merge branch 'develop' into ck_reduce_int8_bp16

5fd206d

added more static_assert

50fc7dd

Remove buggy tunable configurations defined in device_reduce_instance_xxx.hpp

6a0afa5

Add static_assert to give compile-time warning for incorrect thread slice-size/vector-size configurations

48a931d

minor change

43c8b6d

Refine and fix (in GetWorkspaceSizeInBytes of MultiBlockPartialReduce) to make int8 completely pass

60a65c1

Tiny renaming in gridwise_2d_reduction_multiblock_partial_reduce.hpp

f15e568

Tiny fix in script/profile_reduce_no_index.sh

ad71fa5

qianfengz added 4 commits March 13, 2022 05:42

Refine in DeviceReduce layer with regard to using NumInvariantDim/NumReduceDim or InvariantDims/ReduceDims

fd72e6e

Generic renaming in host reduction and DeviceReduce layer

1763be6

Add support for 4-d all dimension reduction in the profiler and add_device_reduce_xxx instances

959dc4c

Merge branch 'develop' into ck_reduce_int8_bp16

d86e66f

qianfengz requested a review from asroy March 13, 2022 15:36

qianfengz added 5 commits March 15, 2022 10:52

Merge branch 'add_more_static_assert_to_threadwise_copy' into ck_reduce_int8_bp16

47e41fb

Use multi-thread and simplification for host Reduction implementation

bccfe3b

Add ctest for reduction

f41118e

Update to clarify the using of data init method in produce_reduce/example_reduce/test_reduce/

12647ee

Update to the reduce CTest executables to enable default testing behavior when no command argument

987668c

asroy reviewed Mar 21, 2022

asroy suggested changes Mar 21, 2022

asroy added the CI - Pass label Mar 21, 2022

qianfengz added 2 commits March 22, 2022 04:09

Merge branch 'develop' into ck_reduce_int8_bp16

3306ac4

Renaming

47504ea

asroy self-requested a review March 22, 2022 19:34

asroy approved these changes Mar 22, 2022

asroy merged commit 9a8ee8a into develop Mar 22, 2022

asroy mentioned this pull request Mar 22, 2022

More static_assert to threadwise_transfer #122

Closed

qianfengz deleted the ck_reduce_int8_bp16 branch March 23, 2022 06:29

asroy merged 41 commits into develop from ck_reduce_int8_bp16
