Skip to content

Reduction for int8 and bfloat16#125

Merged
asroy merged 41 commits into
developfrom
ck_reduce_int8_bp16
Mar 22, 2022
Merged

Reduction for int8 and bfloat16#125
asroy merged 41 commits into
developfrom
ck_reduce_int8_bp16

Conversation

@qianfengz
Copy link
Copy Markdown
Contributor

This P.R provide the following:

  1. Reduction for int8 and bfloat16 tensor data
  2. Fix some issues
  • Some MThreadSlicedSize + OutDstVectorSize configuration are invalidate
  • GetWorkspaceSizeInBytes() of DeviceReduceMultiblockPartialReduce does not calculate workspace correctly, which could cause GPU memory fault
  1. Refine the codes in DeviceReduceXXX
  2. Add tensor reduction configuration 4-d all-dimension reduction
  3. Some re-naming

qianfengz and others added 30 commits March 6, 2022 06:46
@qianfengz qianfengz requested a review from asroy March 13, 2022 15:36
Comment thread example/12_reduce/reduce_blockwise.cpp Outdated
using kInDataType = ck::half_t;
using kOutDataType = ck::half_t;
using kAccDataType = float;
using hInDataType = half_float::half;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Comment thread example/12_reduce/reduce_blockwise.cpp Outdated
using kInDataType = ck::half_t;
using kOutDataType = ck::half_t;
using kAccDataType = float;
using hInDataType = half_float::half;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something wrong with using ck::half_t on host?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason is that Reduction needs to use abs() and isnan() on fp16. But for ck::half_t, the __habs() and __hisnan() can only be used in __device__ mode to do the functionality of abs() and isnan(). In the other side, half_float::half has direct and complete implementation of abs() and isnan() on the host side.

struct DeviceReduce : public BaseOperator
{
virtual size_t GetWorkspaceSizeInBytes(const std::vector<int>& inLengths)
virtual size_t GetWorkspaceSizeInBytes(const std::vector<int> inLengths,
Copy link
Copy Markdown
Contributor

@asroy asroy Mar 21, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please use long_index_t,

I'm going to make sure all files in include/ck including device operation meet this standard

https://github.com/ROCmSoftwarePlatform/composable_kernel/wiki/Coding-Style#integer-type

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Copy link
Copy Markdown
Contributor

@asroy asroy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please fix naming issue and merge conflict. Otherwise LGTM

@asroy asroy self-requested a review March 22, 2022 19:34
@asroy asroy merged commit 9a8ee8a into develop Mar 22, 2022
@qianfengz qianfengz deleted the ck_reduce_int8_bp16 branch March 23, 2022 06:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants