[https://nvbugs/6117811][fix] Fix XQA IMA for invalid pages with sliding window#14459
[https://nvbugs/6117811][fix] Fix XQA IMA for invalid pages with sliding window#14459pengbowang-nv wants to merge 5 commits into
Conversation
Signed-off-by: Pengbo Wang <221450789+pengbowang-nv@users.noreply.github.com>
Signed-off-by: Pengbo Wang <221450789+pengbowang-nv@users.noreply.github.com>
Signed-off-by: Pengbo Wang <221450789+pengbowang-nv@users.noreply.github.com>
Signed-off-by: Pengbo Wang <221450789+pengbowang-nv@users.noreply.github.com>
Signed-off-by: Pengbo Wang <221450789+pengbowang-nv@users.noreply.github.com>
|
/bot run |
|
/bot help |
GitHub Bot Help
Provide a user friendly way for developers to interact with a Jenkins server. Run See details below for each supported subcommand. Details
Launch build/test pipelines. All previously running jobs will be killed.
kill
Kill all running builds associated with pull request. skip
Skip testing for latest commit on pull request. reuse-pipeline
Reuse a previous pipeline to validate current commit. This action will also kill all currently running builds associated with the pull request. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break. |
|
/bot run --add-multi-gpu-test |
📝 WalkthroughWalkthroughThis PR adds sliding-window support to paged KV cache kernels by tracking and validating leading page skips. A new ChangesSliding-Window KV Cache Leading Page Skip Support
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes 🚥 Pre-merge checks | ✅ 3 | ❌ 2❌ Failed checks (2 warnings)
✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
cpp/kernels/xqa/mha.cu (1)
1-2:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winUpdate the SPDX copyright year.
This file changed in the PR, but the header still ends at 2025.
Suggested fix
- * SPDX-FileCopyrightText: Copyright (c) 2023-2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved. + * SPDX-FileCopyrightText: Copyright (c) 2023-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.As per coding guidelines, "Include NVIDIA copyright header on ALL new files; update year on modified files."
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@cpp/kernels/xqa/mha.cu` around lines 1 - 2, Update the SPDX header year range in the file header comment in cpp/kernels/xqa/mha.cu: replace the trailing year "2025" with "2026" so the header reads "2023-2026" (or the appropriate inclusive end year) to reflect the modification; locate the top-of-file comment block containing "SPDX-FileCopyrightText" and adjust the year range accordingly.cpp/kernels/xqa/mha_sm90.cu (1)
2041-2050:⚠️ Potential issue | 🔴 Critical | 🏗️ Heavy liftDon't pass
kBAD_PAGE_INDEXthrough to TMA loads.
loadPages()now marks skipped leading pages withkBAD_PAGE_INDEX, butKVTilePartLoader::loadData()still unconditionally feeds everypages[i]intotma::loadAsync. On a partially skipped first tile, this still issues a TMA read against the invalid page index the PR is trying to avoid. This needs a real skip/zero-fill path before the TMA call, not just a sentinel inpages.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@cpp/kernels/xqa/mha_sm90.cu` around lines 2041 - 2050, loadPages() now marks skipped leading pages with kBAD_PAGE_INDEX but KVTilePartLoader::loadData() still passes every pages[i] into tma::loadAsync; change loadData() so it does not call tma::loadAsync for entries equal to kBAD_PAGE_INDEX — instead zero-fill the corresponding destination region (or skip/compact the pages array so tma only sees valid page indices and separately memset zeros for skipped slots). Locate the pages[] usage and the tma::loadAsync invocation in KVTilePartLoader::loadData() and add a conditional path that either (a) builds a contiguous list of valid pages and issues TMA loads only for those, mapping results back, or (b) branches per-page to zero-fill when pages[i] == kBAD_PAGE_INDEX before/without calling tma::loadAsync.
🧹 Nitpick comments (3)
cpp/kernels/xqa/ldgsts.cuh (1)
33-38: ⚡ Quick winFIXME comment indicates this is a workaround pending investigation.
The guard prevents unintended global memory reads when
srcSize == 0, but the root cause (race condition or compiler issue) remains unclear. This workaround appears safe, as thecp.asyncinstruction should respect thesrcSizeparameter and handlenullptrcorrectly when filling with zeros.Do you want me to help investigate the root cause or open an issue to track this for deeper analysis?
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@cpp/kernels/xqa/ldgsts.cuh` around lines 33 - 38, The current FIXME should not be left as-is — keep the protective conditional assignment (if (srcSize == 0) { src = nullptr; }) but replace the vague FIXME with a clear TODO that documents why the guard is required (mention srcSize, src and cp.async behavior), what hypotheses remain (possible race or compiler codegen bug), and add a task/issue number linking to a new issue created in your tracker for deeper investigation; ensure the comment explains expected safe behavior and that the guard is a temporary workaround pending the issue resolution so future readers know to remove/reevaluate it after the ticket is closed.cpp/kernels/xqa/test/test.cpp (2)
1633-1644: 💤 Low valueConsider documenting the test parameter rationale.
The test uses
seqLen = 256 + 57without explaining why this specific value was chosen. A comment describing the test scenario would improve maintainability.📝 Suggested clarification
TEST(RefCheck, sliding_window_invalid_prefix_pages) { + // Test with seqLen=313 and slidingWinSize=128 to create ~185 tokens (5-6 pages) + // outside the sliding window, verifying kernels skip poisoned leading pages. `#if` SPEC_DEC runTest<1, HEAD_GRP_SIZE, 3>(16, 256 + 57, false, true, false, false, false, ~0U, 128);🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@cpp/kernels/xqa/test/test.cpp` around lines 1633 - 1644, Add a short inline comment above the TEST named RefCheck::sliding_window_invalid_prefix_pages (or next to the runTest call) explaining why seqLen is set to 256 + 57 (e.g., to cross a page/buffer boundary, force an off-by-one/prefix-page condition, or to simulate X full pages plus Y extra bytes) and what specific behavior the test is validating; reference the runTest invocation and the explicit seqLen value (256 + 57) so future readers understand the scenario being exercised and can adjust the numbers safely.
662-690: ⚡ Quick winConsider adding documentation for the page poisoning logic.
This block implements a test feature that overwrites leading page indices with an invalid value to verify kernel behavior with sliding windows. The logic is correct but complex, and would benefit from comments explaining:
- The purpose of poisoning (testing invalid page handling)
- How
seqBegrepresents the start of the valid sliding window range- Why pages [0, nbPoisonPages) fall outside the window and are safe to poison
- The SPEC_DEC-specific logic for computing the valid range
📝 Suggested documentation
`#if` USE_PAGED_KV_CACHE && SLIDING_WINDOW && XQA_TEST_POISON_SLIDING_WINDOW_PREFIX_PAGES { + // Poison leading pages that fall outside the sliding window to verify + // the kernel correctly skips invalid page indices. constexpr KVCachePageIndex kPoisonPageIdx = static_cast<KVCachePageIndex>(1U << 20); `#if` SPEC_DEC + // For spec decode, compute the position of the first query token, + // then determine where the sliding window begins relative to that position. uint32_t const firstQSeqLen = seqLen - qSeqLen + 1; uint32_t const seqBeg = firstQSeqLen < slidingWinSize ? 0 : firstQSeqLen - slidingWinSize; `#else` + // Sliding window includes the most recent slidingWinSize tokens. uint32_t const seqBeg = seqLen < slidingWinSize ? 0 : seqLen - slidingWinSize; `#endif` + // Poison all complete pages before the sliding window starts. uint32_t const nbPoisonPages = std::min<uint32_t>(seqBeg / tokensPerPage, nbPagesPerSeq);🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@cpp/kernels/xqa/test/test.cpp` around lines 662 - 690, Add clear inline comments to the page-poisoning test block (guarded by USE_PAGED_KV_CACHE, SLIDING_WINDOW and XQA_TEST_POISON_SLIDING_WINDOW_PREFIX_PAGES) explaining the intent and math: state that kPoisonPageIdx is an invalid page index used to verify kernel handling of out‑of‑window pages; document how seqBeg (and the SPEC_DEC variant that uses firstQSeqLen) computes the first token index inside the valid sliding window; explain why nbPoisonPages = min(seqBeg / tokensPerPage, nbPagesPerSeq) yields the count of prefix pages outside the window that are safe to overwrite; and clarify the two layout branches that fill pageList (PAGED_KV_CACHE_LAYOUT == 1 vs else) so reviewers understand which dimensions (batch, beam, kv) are being poisoned.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@3rdparty/fetch_content.json`:
- Line 40: The "git_repository" entry for Eigen (the git URL
"https://gitlab.com/libeigen/eigen.git" and the tag 3.4.0) cannot be reliably
fetched via git transport in this environment due to unauthenticated git access;
update the fetch configuration in 3rdparty/fetch_content.json to use an
archive-based fetch (point to the official release archive like
"eigen-3.4.0.zip" or the HTTPS archive URL for tag 3.4.0) or ensure the fetch
mechanism can supply appropriate credentials/token for git transport, adjusting
the "git_repository" or replacing it with an "archive_url"/"url" entry and
keeping the tag/version metadata (3.4.0) consistent.
---
Outside diff comments:
In `@cpp/kernels/xqa/mha_sm90.cu`:
- Around line 2041-2050: loadPages() now marks skipped leading pages with
kBAD_PAGE_INDEX but KVTilePartLoader::loadData() still passes every pages[i]
into tma::loadAsync; change loadData() so it does not call tma::loadAsync for
entries equal to kBAD_PAGE_INDEX — instead zero-fill the corresponding
destination region (or skip/compact the pages array so tma only sees valid page
indices and separately memset zeros for skipped slots). Locate the pages[] usage
and the tma::loadAsync invocation in KVTilePartLoader::loadData() and add a
conditional path that either (a) builds a contiguous list of valid pages and
issues TMA loads only for those, mapping results back, or (b) branches per-page
to zero-fill when pages[i] == kBAD_PAGE_INDEX before/without calling
tma::loadAsync.
In `@cpp/kernels/xqa/mha.cu`:
- Around line 1-2: Update the SPDX header year range in the file header comment
in cpp/kernels/xqa/mha.cu: replace the trailing year "2025" with "2026" so the
header reads "2023-2026" (or the appropriate inclusive end year) to reflect the
modification; locate the top-of-file comment block containing
"SPDX-FileCopyrightText" and adjust the year range accordingly.
---
Nitpick comments:
In `@cpp/kernels/xqa/ldgsts.cuh`:
- Around line 33-38: The current FIXME should not be left as-is — keep the
protective conditional assignment (if (srcSize == 0) { src = nullptr; }) but
replace the vague FIXME with a clear TODO that documents why the guard is
required (mention srcSize, src and cp.async behavior), what hypotheses remain
(possible race or compiler codegen bug), and add a task/issue number linking to
a new issue created in your tracker for deeper investigation; ensure the comment
explains expected safe behavior and that the guard is a temporary workaround
pending the issue resolution so future readers know to remove/reevaluate it
after the ticket is closed.
In `@cpp/kernels/xqa/test/test.cpp`:
- Around line 1633-1644: Add a short inline comment above the TEST named
RefCheck::sliding_window_invalid_prefix_pages (or next to the runTest call)
explaining why seqLen is set to 256 + 57 (e.g., to cross a page/buffer boundary,
force an off-by-one/prefix-page condition, or to simulate X full pages plus Y
extra bytes) and what specific behavior the test is validating; reference the
runTest invocation and the explicit seqLen value (256 + 57) so future readers
understand the scenario being exercised and can adjust the numbers safely.
- Around line 662-690: Add clear inline comments to the page-poisoning test
block (guarded by USE_PAGED_KV_CACHE, SLIDING_WINDOW and
XQA_TEST_POISON_SLIDING_WINDOW_PREFIX_PAGES) explaining the intent and math:
state that kPoisonPageIdx is an invalid page index used to verify kernel
handling of out‑of‑window pages; document how seqBeg (and the SPEC_DEC variant
that uses firstQSeqLen) computes the first token index inside the valid sliding
window; explain why nbPoisonPages = min(seqBeg / tokensPerPage, nbPagesPerSeq)
yields the count of prefix pages outside the window that are safe to overwrite;
and clarify the two layout branches that fill pageList (PAGED_KV_CACHE_LAYOUT ==
1 vs else) so reviewers understand which dimensions (batch, beam, kv) are being
poisoned.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 492671f6-682b-43eb-8425-124a38eef15c
📒 Files selected for processing (7)
3rdparty/fetch_content.jsoncpp/kernels/xqa/CMakeLists.txtcpp/kernels/xqa/ldgsts.cuhcpp/kernels/xqa/mha.cucpp/kernels/xqa/mhaUtils.cuhcpp/kernels/xqa/mha_sm90.cucpp/kernels/xqa/test/test.cpp
| { | ||
| "name": "eigen", | ||
| "git_repository": "https://github.com/libeigen/eigen", | ||
| "git_repository": "https://gitlab.com/libeigen/eigen.git", |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Description: Verify the eigen GitLab repository URL and tag 3.4.0
# Test 1: Check if the GitLab repository is accessible
echo "Checking GitLab repository accessibility..."
curl -I -s https://gitlab.com/libeigen/eigen.git | head -n 1
# Test 2: Verify tag 3.4.0 exists
echo "Verifying tag 3.4.0..."
git ls-remote https://gitlab.com/libeigen/eigen.git refs/tags/3.4.0Repository: NVIDIA/TensorRT-LLM
Length of output: 594
Ensure eigen@3.4.0 is actually fetchable from gitlab.com (not just via HTTP)
https://gitlab.com/libeigen/eigen.gitis reachable over HTTPS, and the3.4.0tag/zip (eigen-3.4.0.zip) downloads successfully.gittransport access fails in this environment (git ls-remote ... refs/tags/3.4.0→HTTP Basic: Access denied), so the dependency fetch must not rely on unauthenticated cloning/ls-remoteof the*.gitURL—either switch to an archive-based fetch or supply a token/credential compatible with the fetch method used.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@3rdparty/fetch_content.json` at line 40, The "git_repository" entry for Eigen
(the git URL "https://gitlab.com/libeigen/eigen.git" and the tag 3.4.0) cannot
be reliably fetched via git transport in this environment due to unauthenticated
git access; update the fetch configuration in 3rdparty/fetch_content.json to use
an archive-based fetch (point to the official release archive like
"eigen-3.4.0.zip" or the HTTPS archive URL for tag 3.4.0) or ensure the fetch
mechanism can supply appropriate credentials/token for git transport, adjusting
the "git_repository" or replacing it with an "archive_url"/"url" entry and
keeping the tag/version metadata (3.4.0) consistent.
|
PR_Github #49922 [ run ] triggered by Bot. Commit: |
|
PR_Github #49922 [ run ] completed with state
|
|
Thank you for the change, Pengbo! Can you please unwaive the tests in this MR as well? Approved. |
Summary by CodeRabbit
Bug Fixes
Tests
Chores
Description
Test Coverage
PR Checklist
Please review the following before submitting your PR:
PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.
PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.
Test cases are provided for new code paths (see test instructions)
If PR introduces API changes, an appropriate PR label is added - either
api-compatibleorapi-breaking. Forapi-breaking, includeBREAKINGin the PR title.Any new dependencies have been scanned for license and vulnerabilities
CODEOWNERS updated if ownership changes
Documentation updated as needed
Update tava architecture diagram if there is a significant design change in PR.
The reviewers assigned automatically/manually are appropriate for the PR.
Please check this after reviewing the above items as appropriate for this PR.
GitHub Bot Help
To see a list of available CI bot commands, please comment
/bot help.