
[https://nvbugs/6117811][fix] Fix XQA IMA for invalid pages with sliding window #14459

Open
pengbowang-nv wants to merge 5 commits into NVIDIA:main from pengbowang-nv:dev-fix-xqa-ima-placeholder-blocks


@pengbowang-nv (Collaborator) commented May 22, 2026

Summary by CodeRabbit

  • Bug Fixes

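    • Fixed paged KV-cache page-skipping logic for sliding-window scenarios, so leading pages that fall outside the window are no longer looked up in the page list.
    • Added a memory-safety guard for zero-size async copies (copyAsync).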
    • Corrected Hopper-specific compilation guard.
  • Tests

    • Added test coverage for sliding-window attention with invalid prefix pages.
  • Chores

    • Updated the Eigen dependency source URL from GitHub to GitLab.
    • Updated copyright year headers.


Description

Test Coverage

PR Checklist

Please review the following before submitting your PR:

  • PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.

  • PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.

  • Test cases are provided for new code paths (see test instructions)

  • If PR introduces API changes, an appropriate PR label is added - either api-compatible or api-breaking. For api-breaking, include BREAKING in the PR title.

  • Any new dependencies have been scanned for license and vulnerabilities

  • CODEOWNERS updated if ownership changes

  • Documentation updated as needed

  • Update tava architecture diagram if there is a significant design change in PR.

  • The reviewers assigned automatically/manually are appropriate for the PR.

  • Please check this after reviewing the above items as appropriate for this PR.

GitHub Bot Help

To see a list of available CI bot commands, please comment /bot help.

Signed-off-by: Pengbo Wang <221450789+pengbowang-nv@users.noreply.github.com>
@pengbowang-nv pengbowang-nv requested a review from a team as a code owner May 22, 2026 10:10
@pengbowang-nv pengbowang-nv requested review from brb-nv and lowsfer May 22, 2026 10:11
@pengbowang-nv (Collaborator Author)

/bot run

@pengbowang-nv (Collaborator Author)

/bot help

@github-actions

GitHub Bot Help

/bot [-h] ['run', 'kill', 'skip', 'reuse-pipeline'] ...

Provide a user friendly way for developers to interact with a Jenkins server.

Run /bot [-h|--help] to print this help message.

See details below for each supported subcommand.

Details

run [--reuse-test (optional)pipeline-id --disable-fail-fast --skip-test --stage-list "A10-PyTorch-1, xxx" --gpu-type "A30, H100_PCIe" --test-backend "pytorch, cpp" --add-multi-gpu-test --only-multi-gpu-test --disable-multi-gpu-test --post-merge --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx" --detailed-log --debug(experimental) --high-priority]

Launch build/test pipelines. All previously running jobs will be killed.

--reuse-test (optional)pipeline-id (OPTIONAL) : Allow the new pipeline to reuse build artifacts and skip successful test stages from a specified pipeline, or from the last pipeline if no pipeline-id is indicated. If the Git commit ID has changed, this option will always be ignored. The DEFAULT behavior of the bot is to reuse build artifacts and successful test results from the last pipeline.

--disable-reuse-test (OPTIONAL) : Explicitly prevent the pipeline from reusing build artifacts and skipping successful test stages from a previous pipeline. Ensure that all builds and tests are run regardless of previous successes.

--disable-fail-fast (OPTIONAL) : Disable fail fast on build/tests/infra failures.

--skip-test (OPTIONAL) : Skip all test stages, but still run build stages, package stages and sanity check stages. Note: Does NOT update GitHub check status.

--stage-list "A10-PyTorch-1, xxx" (OPTIONAL) : Only run the specified test stages. Supports wildcard * for pattern matching (e.g., "*PerfSanity*" matches all stages containing PerfSanity). Examples: "A10-PyTorch-1, xxx", "PerfSanity". Note: Does NOT update GitHub check status.

--gpu-type "A30, H100_PCIe" (OPTIONAL) : Only run the test stages on the specified GPU types. Examples: "A30, H100_PCIe". Note: Does NOT update GitHub check status.

--test-backend "pytorch, cpp" (OPTIONAL) : Skip test stages which don't match the specified backends. Only supports [pytorch, cpp, tensorrt, triton]. Examples: "pytorch, cpp" (does not run test stages with tensorrt or triton backend). Note: Does NOT update GitHub pipeline status.

--only-multi-gpu-test (OPTIONAL) : Only run the multi-GPU tests. Note: Does NOT update GitHub check status.

--disable-multi-gpu-test (OPTIONAL) : Disable the multi-GPU tests. Note: Does NOT update GitHub check status.

--add-multi-gpu-test (OPTIONAL) : Force run the multi-GPU tests in addition to running L0 pre-merge pipeline.

--post-merge (OPTIONAL) : Run the L0 post-merge pipeline instead of the ordinary L0 pre-merge pipeline.

--extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx" (OPTIONAL) : Run the ordinary L0 pre-merge pipeline and specified test stages. Supports wildcard * for pattern matching. Examples: --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx", --extra-stage "Post-Merge".

--detailed-log (OPTIONAL) : Enable flushing out all logs to the Jenkins console. This will significantly increase the log volume and may slow down the job.

--debug (OPTIONAL) : Experimental feature. Enable access to the CI container for debugging purposes. Note: Specify exactly one stage in the stage-list parameter to access the appropriate container environment. Note: Does NOT update GitHub check status.

--high-priority (OPTIONAL) : Run the pipeline with high priority. This option is restricted to authorized users only and will route the job to a high-priority queue.

kill

kill

Kill all running builds associated with the pull request.

skip

skip --comment COMMENT

Skip testing for the latest commit on the pull request. --comment "Reason for skipping build/test" is required. IMPORTANT NOTE: This is dangerous since a lack of user care and validation can cause top of tree to break.

reuse-pipeline

reuse-pipeline

Reuse a previous pipeline to validate the current commit. This action will also kill all currently running builds associated with the pull request. IMPORTANT NOTE: This is dangerous since a lack of user care and validation can cause top of tree to break.

@pengbowang-nv (Collaborator Author)

/bot run --add-multi-gpu-test

@coderabbitai (Contributor)

coderabbitai Bot commented May 22, 2026

📝 Walkthrough


This PR adds sliding-window support to paged KV cache kernels by tracking and validating leading page skips. A new nbSkipLeadingPages parameter threads through utility functions, main kernel integration points, and specialized tile loaders to enforce valid page ranges during cache lookups.

Changes

Sliding-Window KV Cache Leading Page Skip Support

| Layer | File(s) | Summary |
|---|---|---|
| Dependency and Build Configuration | 3rdparty/fetch_content.json, cpp/kernels/xqa/CMakeLists.txt | Eigen repository migrated to GitLab; CMakeLists.txt updated to conditionally include the 3rdparty directory when XQA is the top-level project. |
| Copy Async Memory Safety Guard | cpp/kernels/xqa/ldgsts.cuh | copyAsync now sets the source pointer to nullptr when size is zero, preventing global memory reads during zero-fill operations. |
| Utility Function Signatures and Page Range Validation | cpp/kernels/xqa/mhaUtils.cuh | getPage and loadPagesForBeamSearchAsync templates now accept nbSkipLeadingPages and validate that page indices fall within the [nbSkipLeadingPages, nbPages) range; invalid pages return kBAD_PAGE_INDEX or copy zero bytes. |
| Main Kernel Page Loading Integration | cpp/kernels/xqa/mha.cu | Computes nbSkipLeadingPages from the token skip count and passes it to the K-cache and V-cache page loading routines for both single-beam and multi-beam execution paths. |
| SM90 Tile Loader with Page Skip Support | cpp/kernels/xqa/mha_sm90.cu | KVTilePartLoader gains an nbSkipLeadingPages member; kernel_mha computes the skip count and initializes both K and V loaders with it; loadPages bounds-checks page indices against the valid range. |
| Test Coverage for Invalid Prefix Pages | cpp/kernels/xqa/test/test.cpp | Introduces a compile-time macro to poison leading page list entries and adds a gtest case validating sliding-window kernels with invalid prefix pages. |
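A minimal host-side C++ sketch of the page-range rule summarized in the table, assuming the sliding window keeps the most recent `slidingWinSize` tokens and each page holds `tokensPerPage` tokens. The helpers `computeNbSkipLeadingPages` and `getPageChecked`, the `int32_t` index type, and the `-1` sentinel are assumptions for illustration only; the real `getPage` in `mhaUtils.cuh` is a device template whose signature and `kBAD_PAGE_INDEX` value may differ.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

using KVCachePageIndex = std::int32_t;            // assumed index type for this sketch
constexpr KVCachePageIndex kBAD_PAGE_INDEX = -1; // assumed sentinel value for this sketch

// Hypothetical helper: pages whose tokens ALL precede the sliding window can be skipped.
// Integer division rounds down, so a page straddling the window start is still loaded.
inline std::uint32_t computeNbSkipLeadingPages(
    std::uint32_t seqLen, std::uint32_t slidingWinSize, std::uint32_t tokensPerPage)
{
    assert(tokensPerPage > 0);
    std::uint32_t const nbSkipTokens = seqLen < slidingWinSize ? 0 : seqLen - slidingWinSize;
    return nbSkipTokens / tokensPerPage;
}

// Hypothetical bounds-checked lookup: only indices in [nbSkipLeadingPages, nbPages)
// ever read the page list, so entries outside that range are never dereferenced.
inline KVCachePageIndex getPageChecked(std::vector<KVCachePageIndex> const& pageList,
    std::uint32_t nbSkipLeadingPages, std::uint32_t nbPages, std::uint32_t idxPage)
{
    assert(nbPages <= pageList.size());
    if (idxPage < nbSkipLeadingPages || idxPage >= nbPages)
    {
        return kBAD_PAGE_INDEX; // do not read the (possibly invalid) entry
    }
    return pageList[idxPage];
}
```

Rounding down is the conservative choice: skipping a partially covered page would drop in-window tokens, while reading one whole extra page only costs bandwidth.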

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (2 warnings)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Description check | ⚠️ Warning | The PR description is entirely blank/template-only, with no actual explanation of the issue, solution, or test coverage provided by the author. | Add a description explaining the bug, why invalid pages occur with sliding windows, the solution approach, and list the test cases that validate the fix. |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 25.00%, which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
✅ Passed checks (3 passed)
| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title accurately describes the main fix: addressing XQA IMA invalid pages with sliding window functionality, with proper NVBugs ticket and fix type notation. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai (Contributor) left a comment

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
cpp/kernels/xqa/mha.cu (1)

1-2: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Update the SPDX copyright year.

This file changed in the PR, but the header still ends at 2025.

Suggested fix
- * SPDX-FileCopyrightText: Copyright (c) 2023-2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+ * SPDX-FileCopyrightText: Copyright (c) 2023-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.

As per coding guidelines, "Include NVIDIA copyright header on ALL new files; update year on modified files."

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@cpp/kernels/xqa/mha.cu` around lines 1 - 2, Update the SPDX header year range
in the file header comment in cpp/kernels/xqa/mha.cu: replace the trailing year
"2025" with "2026" so the header reads "2023-2026" (or the appropriate inclusive
end year) to reflect the modification; locate the top-of-file comment block
containing "SPDX-FileCopyrightText" and adjust the year range accordingly.
cpp/kernels/xqa/mha_sm90.cu (1)

2041-2050: ⚠️ Potential issue | 🔴 Critical | 🏗️ Heavy lift

Don't pass kBAD_PAGE_INDEX through to TMA loads.

loadPages() now marks skipped leading pages with kBAD_PAGE_INDEX, but KVTilePartLoader::loadData() still unconditionally feeds every pages[i] into tma::loadAsync. On a partially skipped first tile, this still issues a TMA read against the invalid page index the PR is trying to avoid. This needs a real skip/zero-fill path before the TMA call, not just a sentinel in pages.
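The per-page zero-fill alternative suggested in this comment can be modeled on the host as a simple branch. This is an assumption-laden emulation, not the PR's code: `std::memcpy` stands in for `tma::loadAsync`, `pool` stands in for the global KV-cache pool, and `loadTilePages`, the index type, and the sentinel value are hypothetical. Skipped slots are zero-filled, so no address is ever formed from `kBAD_PAGE_INDEX`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

using KVCachePageIndex = std::int32_t;            // assumed index type
constexpr KVCachePageIndex kBAD_PAGE_INDEX = -1; // assumed sentinel value

// Hypothetical host emulation: fill a tile from a list of page indices, zero-filling
// slots marked kBAD_PAGE_INDEX instead of issuing a load for them.
void loadTilePages(std::vector<std::uint8_t>& dst, std::vector<std::uint8_t> const& pool,
    std::vector<KVCachePageIndex> const& pages, std::size_t pageBytes)
{
    assert(dst.size() >= pages.size() * pageBytes);
    for (std::size_t i = 0; i < pages.size(); ++i)
    {
        std::uint8_t* const out = dst.data() + i * pageBytes;
        if (pages[i] == kBAD_PAGE_INDEX)
        {
            std::memset(out, 0, pageBytes); // skipped page: no source address is computed
            continue;
        }
        std::size_t const offset = static_cast<std::size_t>(pages[i]) * pageBytes;
        assert(offset + pageBytes <= pool.size());
        std::memcpy(out, pool.data() + offset, pageBytes); // stands in for tma::loadAsync
    }
}
```

Whether zero-filling is actually required, or whether skipped slots are already masked out before softmax, depends on the kernel's masking logic, which this sketch does not model.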

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@cpp/kernels/xqa/mha_sm90.cu` around lines 2041 - 2050, loadPages() now marks
skipped leading pages with kBAD_PAGE_INDEX but KVTilePartLoader::loadData()
still passes every pages[i] into tma::loadAsync; change loadData() so it does
not call tma::loadAsync for entries equal to kBAD_PAGE_INDEX — instead zero-fill
the corresponding destination region (or skip/compact the pages array so tma
only sees valid page indices and separately memset zeros for skipped slots).
Locate the pages[] usage and the tma::loadAsync invocation in
KVTilePartLoader::loadData() and add a conditional path that either (a) builds a
contiguous list of valid pages and issues TMA loads only for those, mapping
results back, or (b) branches per-page to zero-fill when pages[i] ==
kBAD_PAGE_INDEX before/without calling tma::loadAsync.
🧹 Nitpick comments (3)
cpp/kernels/xqa/ldgsts.cuh (1)

33-38: ⚡ Quick win

FIXME comment indicates this is a workaround pending investigation.

The guard prevents unintended global memory reads when srcSize == 0, but the root cause (race condition or compiler issue) remains unclear. This workaround appears safe, as the cp.async instruction should respect the srcSize parameter and handle nullptr correctly when filling with zeros.
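Per the PTX ISA, `cp.async` with a source-size operand copies `src-size` bytes and zero-fills the remainder of the `cp-size` destination. The host-side model below illustrates that behavior together with the `srcSize == 0` guard discussed here; `copyAsyncModel` is hypothetical and is not the `copyAsync` in `ldgsts.cuh`. On the host, the explicit branch also avoids calling `memcpy` with a null pointer, which is undefined behavior even for a zero length.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical host-side model of cp.async with a source-size operand: copy `srcSize`
// bytes from `src`, then zero-fill the remaining `cpSize - srcSize` bytes of `dst`.
void copyAsyncModel(void* dst, void const* src, std::uint32_t cpSize, std::uint32_t srcSize)
{
    assert(srcSize <= cpSize);
    // Mirrors the PR's guard: a pure zero-fill carries no source address at all.
    void const* const effectiveSrc = srcSize == 0 ? nullptr : src;
    if (effectiveSrc != nullptr)
    {
        std::memcpy(dst, effectiveSrc, srcSize);
    }
    // Zero-fill the tail; when srcSize == 0 this covers the whole destination.
    std::memset(static_cast<std::uint8_t*>(dst) + srcSize, 0, cpSize - srcSize);
}
```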

Do you want me to help investigate the root cause or open an issue to track this for deeper analysis?

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@cpp/kernels/xqa/ldgsts.cuh` around lines 33 - 38, The current FIXME should
not be left as-is — keep the protective conditional assignment (if (srcSize ==
0) { src = nullptr; }) but replace the vague FIXME with a clear TODO that
documents why the guard is required (mention srcSize, src and cp.async
behavior), what hypotheses remain (possible race or compiler codegen bug), and
add a task/issue number linking to a new issue created in your tracker for
deeper investigation; ensure the comment explains expected safe behavior and
that the guard is a temporary workaround pending the issue resolution so future
readers know to remove/reevaluate it after the ticket is closed.
cpp/kernels/xqa/test/test.cpp (2)

1633-1644: 💤 Low value

Consider documenting the test parameter rationale.

The test uses seqLen = 256 + 57 without explaining why this specific value was chosen. A comment describing the test scenario would improve maintainability.
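One plausible reading of `256 + 57`, offered as an assumption rather than the author's stated intent: for page sizes of 32, 64, or 128 tokens, it keeps both the end of the sequence and the start of the window (token 185 with `slidingWinSize = 128`, ignoring the SPEC_DEC variant) off page boundaries, so the kernel must handle a partially filled last page and a page that straddles the window start. The compile-time check below verifies that arithmetic for assumed page sizes; `PageLayout` and `describe` are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical summary of how a sequence maps onto pages under a sliding window.
struct PageLayout
{
    std::uint32_t nbPages;           // ceil(seqLen / tokensPerPage)
    std::uint32_t nbSkipPages;       // pages lying entirely before the window
    std::uint32_t windowStartInPage; // offset of the first in-window token within its page
    std::uint32_t lastPageFill;      // number of tokens stored in the final page
};

constexpr PageLayout describe(std::uint32_t seqLen, std::uint32_t slidingWinSize, std::uint32_t tokensPerPage)
{
    assert(tokensPerPage > 0);
    std::uint32_t const seqBeg = seqLen < slidingWinSize ? 0 : seqLen - slidingWinSize;
    std::uint32_t const rem = seqLen % tokensPerPage;
    return PageLayout{(seqLen + tokensPerPage - 1) / tokensPerPage, seqBeg / tokensPerPage,
        seqBeg % tokensPerPage, rem == 0 ? tokensPerPage : rem};
}

// seqLen = 256 + 57 = 313 and slidingWinSize = 128 come from the test, so seqBeg = 185.
// tokensPerPage = 32 and 64 are assumed page sizes, not read from the test configuration.
constexpr PageLayout kP32 = describe(256 + 57, 128, 32);
constexpr PageLayout kP64 = describe(256 + 57, 128, 64);
static_assert(kP32.nbPages == 10 && kP32.nbSkipPages == 5, "5 whole pages precede the window");
static_assert(kP64.nbPages == 5 && kP64.nbSkipPages == 2, "2 whole pages precede the window");
static_assert(kP64.windowStartInPage == 57 && kP64.lastPageFill == 57, "both boundaries fall mid-page");
```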

📝 Suggested clarification
 TEST(RefCheck, sliding_window_invalid_prefix_pages)
 {
+    // Test with seqLen=313 and slidingWinSize=128 to create ~185 tokens (5-6 pages)
+    // outside the sliding window, verifying kernels skip poisoned leading pages.
 #if SPEC_DEC
     runTest<1, HEAD_GRP_SIZE, 3>(16, 256 + 57, false, true, false, false, false, ~0U, 128);
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@cpp/kernels/xqa/test/test.cpp` around lines 1633 - 1644, Add a short inline
comment above the TEST named RefCheck::sliding_window_invalid_prefix_pages (or
next to the runTest call) explaining why seqLen is set to 256 + 57 (e.g., to
cross a page/buffer boundary, force an off-by-one/prefix-page condition, or to
simulate X full pages plus Y extra bytes) and what specific behavior the test is
validating; reference the runTest invocation and the explicit seqLen value (256
+ 57) so future readers understand the scenario being exercised and can adjust
the numbers safely.

662-690: ⚡ Quick win

Consider adding documentation for the page poisoning logic.

This block implements a test feature that overwrites leading page indices with an invalid value to verify kernel behavior with sliding windows. The logic is correct but complex, and would benefit from comments explaining:

  • The purpose of poisoning (testing invalid page handling)
  • How seqBeg represents the start of the valid sliding window range
  • Why pages [0, nbPoisonPages) fall outside the window and are safe to poison
  • The SPEC_DEC-specific logic for computing the valid range
📝 Suggested documentation
 #if USE_PAGED_KV_CACHE && SLIDING_WINDOW && XQA_TEST_POISON_SLIDING_WINDOW_PREFIX_PAGES
     {
+        // Poison leading pages that fall outside the sliding window to verify
+        // the kernel correctly skips invalid page indices.
         constexpr KVCachePageIndex kPoisonPageIdx = static_cast<KVCachePageIndex>(1U << 20);
 #if SPEC_DEC
+        // For spec decode, compute the position of the first query token,
+        // then determine where the sliding window begins relative to that position.
         uint32_t const firstQSeqLen = seqLen - qSeqLen + 1;
         uint32_t const seqBeg = firstQSeqLen < slidingWinSize ? 0 : firstQSeqLen - slidingWinSize;
 #else
+        // Sliding window includes the most recent slidingWinSize tokens.
         uint32_t const seqBeg = seqLen < slidingWinSize ? 0 : seqLen - slidingWinSize;
 #endif
+        // Poison all complete pages before the sliding window starts.
         uint32_t const nbPoisonPages = std::min<uint32_t>(seqBeg / tokensPerPage, nbPagesPerSeq);
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@cpp/kernels/xqa/test/test.cpp` around lines 662 - 690, Add clear inline
comments to the page-poisoning test block (guarded by USE_PAGED_KV_CACHE,
SLIDING_WINDOW and XQA_TEST_POISON_SLIDING_WINDOW_PREFIX_PAGES) explaining the
intent and math: state that kPoisonPageIdx is an invalid page index used to
verify kernel handling of out‑of‑window pages; document how seqBeg (and the
SPEC_DEC variant that uses firstQSeqLen) computes the first token index inside
the valid sliding window; explain why nbPoisonPages = min(seqBeg /
tokensPerPage, nbPagesPerSeq) yields the count of prefix pages outside the
window that are safe to overwrite; and clarify the two layout branches that fill
pageList (PAGED_KV_CACHE_LAYOUT == 1 vs else) so reviewers understand which
dimensions (batch, beam, kv) are being poisoned.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 492671f6-682b-43eb-8425-124a38eef15c

📥 Commits

Reviewing files that changed from the base of the PR and between e796f16 and 37b8f13.

📒 Files selected for processing (7)
  • 3rdparty/fetch_content.json
  • cpp/kernels/xqa/CMakeLists.txt
  • cpp/kernels/xqa/ldgsts.cuh
  • cpp/kernels/xqa/mha.cu
  • cpp/kernels/xqa/mhaUtils.cuh
  • cpp/kernels/xqa/mha_sm90.cu
  • cpp/kernels/xqa/test/test.cpp

3rdparty/fetch_content.json
 {
   "name": "eigen",
-  "git_repository": "https://github.com/libeigen/eigen",
+  "git_repository": "https://gitlab.com/libeigen/eigen.git",
@coderabbitai (Contributor)

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Description: Verify the eigen GitLab repository URL and tag 3.4.0

# Test 1: Check if the GitLab repository is accessible
echo "Checking GitLab repository accessibility..."
curl -I -s https://gitlab.com/libeigen/eigen.git | head -n 1

# Test 2: Verify tag 3.4.0 exists
echo "Verifying tag 3.4.0..."
git ls-remote https://gitlab.com/libeigen/eigen.git refs/tags/3.4.0

Repository: NVIDIA/TensorRT-LLM

Length of output: 594


Ensure eigen@3.4.0 is actually fetchable from gitlab.com by the fetch mechanism, not just reachable over HTTPS

  • https://gitlab.com/libeigen/eigen.git is reachable over HTTPS, and the 3.4.0 tag/zip (eigen-3.4.0.zip) downloads successfully.
  • Git transport access fails in this environment (git ls-remote ... refs/tags/3.4.0 returns "HTTP Basic: Access denied"), so the dependency fetch must not rely on unauthenticated cloning or ls-remote of the *.git URL. Either switch to an archive-based fetch or supply a token/credential compatible with the fetch method used.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@3rdparty/fetch_content.json` at line 40, The "git_repository" entry for Eigen
(the git URL "https://gitlab.com/libeigen/eigen.git" and the tag 3.4.0) cannot
be reliably fetched via git transport in this environment due to unauthenticated
git access; update the fetch configuration in 3rdparty/fetch_content.json to use
an archive-based fetch (point to the official release archive like
"eigen-3.4.0.zip" or the HTTPS archive URL for tag 3.4.0) or ensure the fetch
mechanism can supply appropriate credentials/token for git transport, adjusting
the "git_repository" or replacing it with an "archive_url"/"url" entry and
keeping the tag/version metadata (3.4.0) consistent.

@tensorrt-cicd (Collaborator)

PR_Github #49922 [ run ] triggered by Bot. Commit: 37b8f13 Link to invocation

@tensorrt-cicd (Collaborator)

PR_Github #49922 [ run ] completed with state SUCCESS. Commit: 37b8f13
/LLM/main/L0_MergeRequest_PR pipeline #39500 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

CI Agent Failure Analysis

Link to invocation

@brb-nv (Collaborator) left a comment

LGTM.

@brb-nv (Collaborator) commented May 22, 2026

Thank you for the change, Pengbo! Can you please unwaive the tests in this MR as well? Approved.
