Add attention sink support for FMHA FWD #3368
Conversation
Signed-off-by: Linjun-AMD <Jun.Lin@amd.com>
Updated the pipeline creation logic to include 'sink' parameter in product combinations and adjusted the FmhaFwdPipeline calls accordingly.
Pull request overview
This PR reverts a previous revert, re-introducing the "attention sink" feature originally added in PR #2892. The attention sink mechanism allows certain tokens at the beginning of the sequence to always be attended to, regardless of the attention mask pattern.
Key changes:
- Added `kHasSink` boolean parameter throughout the trait hierarchy and pipeline implementations
- Updated masking logic to support sink-aware bounds checking via `GetSinkTileRangeAlongX` and `IsOutOfSinkBound` methods
- Modified loop calculations in pipelines to handle sink regions separately from regular attention regions
- Updated code generation scripts to emit sink-enabled kernel variants
Reviewed changes
Copilot reviewed 25 out of 25 changed files in this pull request and generated 1 comment.
Show a summary per file
| File | Description |
|---|---|
| `include/ck_tile/ops/fmha/pipeline/tile_fmha_traits.hpp` | Added `kHasSink` parameter to trait templates |
| `include/ck_tile/ops/fmha/pipeline/block_fmha_pipeline_*.hpp` | Updated pipelines to compute sink loop counts and adjust window offsets |
| `include/ck_tile/ops/fmha/kernel/fmha_*_kernel.hpp` | Added `sink_size` to kernel argument structures and kernel name generation |
| `include/ck_tile/ops/fmha/block/block_masking.hpp` | Implemented sink-aware masking methods (`GetSinkTileRangeAlongX`, `IsOutOfSinkBound`) |
| `include/ck_tile/ops/fmha/block/variants.hpp` | Added `LogitsSinkMask` methods to attention variants |
| `example/ck_tile/01_fmha/fmha_fwd*.hpp` | Added sink parameters to trait structures and argument passing |
| `example/ck_tile/01_fmha/codegen/ops/*.py` | Updated code generation to produce sink-enabled kernel instances |
| `example/ck_tile/01_fmha/script/*.sh` | Added new test scripts for sink functionality |
Are the relevant changes ready on the AITER side?
aiter pr: ROCm/aiter#1272 |
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Signed-off-by: Linjun-AMD <Jun.Lin@amd.com>
Proposed changes
This is the second attempt to introduce attention sink. The first attempt (#2892) was reverted by #3250.
Key changes:
- Added `kHasSink` boolean parameter throughout the trait hierarchy and pipeline implementations
- Added sink-aware bounds checking via the `GetSinkTileRangeAlongX` and `IsOutOfSinkBound` methods
- Example mask: `window_size = [2, 0]`, `sink_size = 2`
Checklist
Please put an `x` into the boxes that apply. You can also fill these out after creating the PR. If you're not sure, please don't hesitate to ask.
- [ ] Ran `clang-format` on all changed files