FP16 data in-register transpose #41

Merged: asroy merged 18 commits into develop from fix_16bit_packing on Nov 15, 2021

asroy commented Oct 20, 2021

  • Implement fp16 data transpose in VGPR using v_mov_b32_f16 inline asm. The transpose is used in ThreadwiseTensorSliceTransfer_v3r2 and can be turned off by setting CK_EXPERIMENTAL_USE_IN_REGISTER_SUB_DWORD_TRANSPOSE to 0
  • Add StaticBufferTupleOfVector
  • Add StaticTensor and StaticTensorTupleOfVectorBuffer

Also:

  • Rename StaticBufferV2 to StaticBufferOfVectorTypeV2
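
For readers unfamiliar with sub-dword transposes, the idea can be sketched on the host in plain C++. This is only an illustration under stated assumptions, not the code from this PR: the actual change works on VGPRs with inline asm, while the sketch below emulates the same data movement with integer shifts and masks on fp16 bit patterns. The names `Dword`, `DwordPair`, `pack_halves`, and `transpose_2x2_sub_dword` are hypothetical, and the default value of 1 for the macro is an assumption; only the macro name and the fact that 0 disables the feature come from the description above.

```cpp
#include <cstdint>

// Assumed default for illustration; the PR only states that 0 turns the feature off.
#ifndef CK_EXPERIMENTAL_USE_IN_REGISTER_SUB_DWORD_TRANSPOSE
#define CK_EXPERIMENTAL_USE_IN_REGISTER_SUB_DWORD_TRANSPOSE 1
#endif

// Hypothetical stand-in for one 32-bit register holding two fp16 bit patterns.
using Dword = std::uint32_t;

// Hypothetical result type: the two output registers of a 2x2 transpose.
struct DwordPair
{
    Dword a;
    Dword b;
};

// Pack two 16-bit patterns into one dword: lo in bits [15:0], hi in bits [31:16].
inline Dword pack_halves(std::uint16_t lo, std::uint16_t hi)
{
    return static_cast<Dword>(lo) | (static_cast<Dword>(hi) << 16);
}

// 2x2 transpose of 16-bit elements held in two dwords.
// Input:  row0 = [x0, x1], row1 = [y0, y1]  (low half listed first)
// Output: a    = [x0, y0], b    = [x1, y1]
inline DwordPair transpose_2x2_sub_dword(Dword row0, Dword row1)
{
#if CK_EXPERIMENTAL_USE_IN_REGISTER_SUB_DWORD_TRANSPOSE
    // "In-register" path: shuffle halves without unpacking to scalars.
    const Dword col0 = (row0 & 0x0000FFFFu) | (row1 << 16);
    const Dword col1 = (row0 >> 16) | (row1 & 0xFFFF0000u);
    return DwordPair{col0, col1};
#else
    // Fallback path: unpack every 16-bit element, then repack.
    const std::uint16_t x0 = static_cast<std::uint16_t>(row0 & 0xFFFFu);
    const std::uint16_t x1 = static_cast<std::uint16_t>(row0 >> 16);
    const std::uint16_t y0 = static_cast<std::uint16_t>(row1 & 0xFFFFu);
    const std::uint16_t y1 = static_cast<std::uint16_t>(row1 >> 16);
    return DwordPair{pack_halves(x0, y0), pack_halves(x1, y1)};
#endif
}
```

Both branches produce the same result; the point of the in-register variant is to avoid splitting packed 16-bit data into separate scalars before reassembling it.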

asroy changed the title from "Fix 16bit packing" to "FP16 data in-register transpose" on Oct 20, 2021
asroy requested a review from zjing14 on October 20, 2021 05:32
asroy merged commit b491ebf into develop on Nov 15, 2021
junliume deleted the fix_16bit_packing branch on October 21, 2023 06:09
carlushuang added a commit that referenced this pull request on Feb 22, 2024: Move some header files from xformers to CK