[AIROCMLIR-707] Make attentionSweeps split_kv filter device-memory-aware by bogdan-petkovic · Pull Request #2366 · ROCm/rocMLIR

bogdan-petkovic · 2026-05-07T11:34:59Z

Motivation

Some attention sweep samples with large split_kv generate very high temporary storage pressure and are currently rejected as FAIL after entering the pipeline.
This PR adds an early sweep-side prefilter so memory-heavy splitKV cases can be skipped before expensive execution, reducing wasted sweep effort while keeping existing compiler/runtime behavior unchanged.

Technical Details

Updated mlir/utils/performance/attentionSweeps.py to add a device-memory-aware splitKV prefilter in sampling:

- Added splitKV extra-storage estimator for sampled attention shapes.
- Added default splitKV limit policy based on visible device memory:
  - deviceMem / 8, clamped to [1 GiB, 8 GiB]
  - fallback to 1.5 GiB when memory query is unavailable.
- Added CLI control:
  - --splitkv-extra-bytes-limit (non-negative int64-validated override).
- Refactored sample filtering to track reasons separately:
  - MAX_TOKENS filter
  - splitKV extra-storage filter
  - cumulative reporting across initial and refill sampling batches.
- Added focused unit tests in mlir/utils/performance/tests/test_attentionSweeps.py:
  - limit policy behavior (clamp/fallback),
  - splitKV estimator behavior,
  - per-reason filter accounting.

No compiler/verifier logic was changed in this PR; scope is limited to sweep generation/filtering behavior.
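
The default limit policy described above (deviceMem / 8, clamped to [1 GiB, 8 GiB], with a 1.5 GiB fallback) can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the function name `default_splitkv_limit`, the constant names, and the choice to treat `None` or a non-positive value as "memory query unavailable" are all assumptions.

```python
from typing import Optional

GIB = 1024 ** 3
MIN_LIMIT_BYTES = 1 * GIB
MAX_LIMIT_BYTES = 8 * GIB
FALLBACK_LIMIT_BYTES = 3 * GIB // 2  # 1.5 GiB


def default_splitkv_limit(device_mem_bytes: Optional[int]) -> int:
    """Hypothetical helper: default limit for splitKV extra storage, in bytes.

    device_mem_bytes is the visible device memory, or None when the
    memory query is unavailable (assumed convention).
    """
    if device_mem_bytes is None or device_mem_bytes <= 0:
        # Memory query unavailable: use the fixed fallback.
        return FALLBACK_LIMIT_BYTES
    # One eighth of device memory, clamped to [1 GiB, 8 GiB].
    return max(MIN_LIMIT_BYTES, min(device_mem_bytes // 8, MAX_LIMIT_BYTES))
```

Under this sketch a 32 GiB device gets a 4 GiB limit, a 4 GiB device is raised to the 1 GiB floor, and a 192 GiB device is capped at 8 GiB.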

Test Plan

Run attentionSweeps.py through CI to validate end-to-end sweep behavior with the new splitKV prefilter.
Confirm that memory-heavy splitKV cases are filtered before expensive execution stages.

Test Result

CI attention-sweep run

Submission Checklist

Look over the contributing guidelines at https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.

Signed-off-by: bogdan-petkovic <bogdan.petkovic@htecgroup.com>

Copilot

Pull request overview

This PR updates the attention sweep generator to proactively filter out sampled configurations with large split_kv settings that are likely to create excessive temporary-storage pressure, using a device-memory-aware default limit (with a CLI override). This keeps the compiler/runtime behavior unchanged while reducing wasted sweep effort on configurations likely to OOM or time out later in the pipeline.

Changes:

Add a splitKV extra temporary-storage estimator and use it as an early sampling-time filter.
Add a default splitKV limit policy derived from visible device VRAM (with clamping/fallback) and a --splitkv-extra-bytes-limit override.
Improve filtering accounting/reporting by tracking reasons (MAX_TOKENS vs splitKV extra-storage) and reporting cumulative totals across refill batches.
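
The per-reason accounting summarized above could look something like the sketch below. It is only an illustration of the idea: `filter_batch`, the `checks` list of (reason, predicate) pairs, and the sample fields `tokens` and `extra_bytes` are hypothetical and not taken from the PR. It also assumes that a sample rejected by several filters is counted only under the first matching reason.

```python
from collections import Counter
from typing import Callable, Dict, List, Sequence, Tuple

Sample = Dict[str, int]
Check = Tuple[str, Callable[[Sample], bool]]


def filter_batch(samples: Sequence[Sample], checks: Sequence[Check],
                 totals: Counter) -> List[Sample]:
    """Return the samples that pass every check.

    Each rejected sample increments totals[reason] for the first check
    that rejects it, so totals accumulates across repeated calls.
    """
    kept = []
    for sample in samples:
        reason = next((name for name, rejects in checks if rejects(sample)), None)
        if reason is None:
            kept.append(sample)
        else:
            totals[reason] += 1
    return kept


# Hypothetical thresholds and sample fields, for illustration only.
checks = [
    ("MAX_TOKENS", lambda s: s["tokens"] > 100),
    ("splitKV extra-storage", lambda s: s["extra_bytes"] > 10),
]
totals = Counter()  # shared by the initial batch and every refill batch
initial = [{"tokens": 50, "extra_bytes": 5},
           {"tokens": 200, "extra_bytes": 5},
           {"tokens": 50, "extra_bytes": 20}]
refill = [{"tokens": 200, "extra_bytes": 20},
          {"tokens": 10, "extra_bytes": 1}]
kept = filter_batch(initial, checks, totals) + filter_batch(refill, checks, totals)
```

Passing the same `Counter` into every call is what makes the reported totals cumulative across the initial and refill sampling batches.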

+    parser.add_argument(
+        '--splitkv-extra-bytes-limit',
+        type=_parse_nonnegative_int64,
+        default=None,
+        help=("Max allowed estimated splitKV extra temporary storage (bytes). "
+              "If unset, a device-based default is used (deviceMem/8 clamped to "
+              "[1 GiB, 8 GiB], fallback 1.5 GiB)."))
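
The excerpt above references `_parse_nonnegative_int64` without showing its body. A validator of that kind might be written along these lines; this is a guess at the behavior implied by the name, not the PR's implementation, and the names `parse_nonnegative_int64` and `build_parser` are hypothetical.

```python
import argparse

INT64_MAX = 2 ** 63 - 1


def parse_nonnegative_int64(text: str) -> int:
    """Parse text as an integer in [0, INT64_MAX] for use as an argparse type."""
    try:
        value = int(text)
    except ValueError:
        raise argparse.ArgumentTypeError(f"expected an integer, got {text!r}")
    if value < 0 or value > INT64_MAX:
        raise argparse.ArgumentTypeError(
            f"value must be in [0, {INT64_MAX}], got {value}")
    return value


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser()
    # default=None lets the caller fall back to the device-based limit.
    parser.add_argument('--splitkv-extra-bytes-limit',
                        type=parse_nonnegative_int64,
                        default=None)
    return parser
```

Raising `argparse.ArgumentTypeError` from the `type` callable lets argparse report the problem as a normal usage error instead of a traceback.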


attentionSweeps: add device-memory-based splitKV prefilter

00a837e

Signed-off-by: bogdan-petkovic <bogdan.petkovic@htecgroup.com>

dorde-antic requested a review from Copilot May 8, 2026 11:13

Copilot started reviewing on behalf of dorde-antic May 8, 2026 11:16

Copilot AI reviewed May 8, 2026

bogdan-petkovic mentioned this pull request May 15, 2026

[AIROCMLIR-707] Fix split-kv attention masking and sweep RMS for attention configs #2371

Open

2 tasks

bogdan-petkovic closed this May 15, 2026

bogdan-petkovic wants to merge 1 commit into ROCm:develop from bogdan-petkovic:bogdan-petkovic/attention-sweeps-device-memory-filter
