[TRTLLM-10804][infra] add LLM_SBSA_WHEEL_DOCKER_IMAGE by niukuo · Pull Request #12635 · NVIDIA/TensorRT-LLM

niukuo · 2026-03-31T16:50:37Z

Summary by CodeRabbit

New Features
- Added support for building and testing wheels on ARM64 (SBSA) architecture.
- Improved wheel versioning with local version metadata support.
Tests
- Enhanced wheel validation with stricter file parsing and compatibility checks.

Description

Test Coverage

PR Checklist

Please review the following before submitting your PR:

PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.
PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.
Test cases are provided for new code paths (see test instructions)
Any new dependencies have been scanned for license and vulnerabilities
CODEOWNERS updated if ownership changes
Documentation updated as needed
Update tava architecture diagram if there is a significant design change in the PR.
The reviewers assigned automatically/manually are appropriate for the PR.
Please check this after reviewing the above items as appropriate for this PR.

GitHub Bot Help

To see a list of available CI bot commands, please comment /bot help.

niukuo · 2026-03-31T16:51:22Z

/bot run --skip-test --disable-fail-fast

tensorrt-cicd · 2026-03-31T16:56:45Z

PR_Github #40984 [ run ] triggered by Bot. Commit: 780e860 Link to invocation

coderabbitai · 2026-03-31T17:03:34Z

📝 Walkthrough


This pull request introduces wheel-specific Docker image support across the Jenkins pipeline and test infrastructure. A new LLM_SBSA_WHEEL_DOCKER_IMAGE property is defined, threaded through the MR pipeline to SBSA test stages as a wheelDockerImage parameter, and consumed by test scripts. The test build logic is extended to extract local version metadata from image tags and pass it through to wheel selection and validation.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| SBSA Image Configuration & Parameter Wiring (`jenkins/current_image_tags.properties`, `jenkins/L0_MergeRequest.groovy`) | Added `LLM_SBSA_WHEEL_DOCKER_IMAGE` property and extended `getContainerURIs()` to populate it. Updated SBSA test stage parameterization to pass `wheelDockerImage` from the new property. |
| AArch64 Test Build Logic (`jenkins/L0_Test.groovy`) | Extended `runLLMBuild` and `checkPipInstall` signatures to accept and use `version_local` parameter. Added logic to extract PyTorch version from `LLM_DOCKER_IMAGE` tag and format it as `version_local` (e.g., `ngcpytorch2602`). Updated AArch64 wheel path composition to nest under `cpu_arch/`. Switched AArch64 image source from `LLM_DOCKER_IMAGE` to `LLM_WHEEL_DOCKER_IMAGE` for PY312-UB2404 case. |
| Wheel Selection & Validation (`tests/unittest/test_pip_install.py`) | Enhanced `get_wheel_url` to accept `version_local` and `cpython_version` parameters. Replaced simple `.whl` name filtering with `packaging.utils.parse_wheel_filename` parsing. Added stricter matching: distribution name validation, local version metadata comparison against `version_local`, and ABI/interpreter tag matching. Extended CLI with `--version_local` and `--cpython_version` arguments. |
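
The two new `test_pip_install.py` options from the Wheel Selection & Validation entry above can be sketched with `argparse`. Only the flag names `--version_local` and `--cpython_version` come from the summary; `build_parser`, `default_cpython_tag`, and both defaults are assumptions for illustration, not the script's actual code.

```python
import argparse
import sys


def default_cpython_tag() -> str:
    # Assumed default: the interpreter/ABI tag of the running Python, e.g. "cp312".
    return f"cp{sys.version_info.major}{sys.version_info.minor}"


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser; only the two flag names are taken from the PR summary.
    parser = argparse.ArgumentParser(description="Select and validate a wheel")
    parser.add_argument(
        "--version_local",
        default="",
        help="local version label the wheel must carry, e.g. ngcpytorch2602 "
        "(empty means the wheel must have no local label)",
    )
    parser.add_argument(
        "--cpython_version",
        default=default_cpython_tag(),
        help="interpreter and ABI tag the wheel must match, e.g. cp312",
    )
    return parser
```

For example, `build_parser().parse_args(["--version_local", "ngcpytorch2602", "--cpython_version", "cp312"])` yields a namespace with those two values; with no arguments, `version_local` is empty and `cpython_version` follows the running interpreter.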

Sequence Diagram(s)

sequenceDiagram
    participant Jenkins as Jenkins MR<br/>Pipeline
    participant L0Test as L0_Test.groovy
    participant DockerImage as Docker Image<br/>Tag
    participant Build as runLLMBuild
    participant TestUtil as test_pip_install.py
    
    Jenkins->>L0Test: Pass wheelDockerImage
    L0Test->>DockerImage: Extract PyTorch version<br/>(e.g., pytorch-26.02)
    DockerImage-->>L0Test: Version components
    L0Test->>L0Test: Format version_local<br/>(e.g., ngcpytorch2602)
    L0Test->>Build: Call runLLMBuild<br/>(cpu_arch, version_local)
    Build->>Build: Mutate tensorrt_llm/version.py<br/>Add +version_local suffix
    Build-->>L0Test: Wheel built
    L0Test->>TestUtil: Call test_pip_install.py<br/>(wheel_path, version_local)
    TestUtil->>TestUtil: Parse wheel filename<br/>Match local version metadata
    TestUtil-->>L0Test: Wheel validation passed
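
The "Add +version_local suffix" step in the diagram turns a public version such as `1.3.0rc16` into one with a PEP 440 local label, e.g. `1.3.0rc16+ngcpytorch2602`. A minimal sketch, assuming `tensorrt_llm/version.py` holds a single `__version__ = "..."` assignment (the PR's actual rewrite logic is not shown here); `add_local_version` is a hypothetical helper.

```python
import re

# Assumed layout of tensorrt_llm/version.py: one line like __version__ = "1.3.0rc16".
_VERSION_LINE = re.compile(r"""^(__version__\s*=\s*["'])([^"'+]+)(["'])""", re.MULTILINE)


def add_local_version(version_py_text: str, version_local: str) -> str:
    """Append "+<version_local>" to the __version__ string (hypothetical helper)."""
    if not version_local:
        # Assumption: an empty label (non-DLFW build) leaves the version untouched.
        return version_py_text
    new_text, count = _VERSION_LINE.subn(
        lambda m: f"{m.group(1)}{m.group(2)}+{version_local}{m.group(3)}",
        version_py_text,
    )
    if count != 1:
        # Either no __version__ line, or the version already carries a local label.
        raise ValueError("expected exactly one __version__ without a local label")
    return new_text
```

Refusing to add a second suffix keeps a rebuilt tree from producing something like `1.3.0rc16+a+b`, which is not a valid PEP 440 version.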

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Description check | ⚠️ Warning | PR description is entirely empty, containing only the template placeholder comments with no actual content filled in. | Fill out the Description section explaining what changes were made and why. Provide Test Coverage details. Complete the PR Checklist with actual checkmarks and ensure all sections are addressed. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title clearly describes the main infrastructure change: adding a new Docker image variable (LLM_SBSA_WHEEL_DOCKER_IMAGE) and directly aligns with the code modifications in the changeset. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |


coderabbitai

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

tests/unittest/test_pip_install.py (1)

104-140: ⚠️ Potential issue | 🟡 Minor

Use any() to iterate over the frozenset of tags, and catch a specific exception.

parse_wheel_filename returns tags as a frozenset[Tag], not a single Tag. The unpacking (tag, ) = tags raises ValueError for any wheel whose compressed tag set expands to more than one tag (for example, py2.py3-none-any). Additionally, catch ValueError specifically (packaging's InvalidWheelFilename is a ValueError subclass) instead of a broad Exception, per the coding guidelines.

♻️ Proposed fix

         try:
             from packaging.utils import parse_wheel_filename
-            name, ver, build, tags = parse_wheel_filename(filename)
-        except Exception as e:
+            name, ver, _build, tags = parse_wheel_filename(filename)
+        except ValueError as e:
             print(f"error: {e}")
             continue
-        (tag, ) = tags
-        if tag.abi != cpython_version or tag.interpreter != cpython_version:
+        if not any(tag.abi == cpython_version and tag.interpreter == cpython_version
+                   for tag in tags):
             continue
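
To see why `any()` is needed: a compressed tag set such as `py2.py3-none-any` expands to two tags, so single-element unpacking fails. The sketch below is a stdlib-only approximation of `packaging.utils.parse_wheel_filename` (simplified: no version validation, no build-tag checks) so that it runs without `packaging`; in the real test, `packaging` should stay, and there the version is a `Version` whose `.local` attribute holds the label (or `None`). `parse_wheel_filename_simple` and `pick_wheel` are hypothetical names, and the matching order mirrors the walkthrough's description rather than the PR's exact code.

```python
import re
from itertools import product
from typing import FrozenSet, Iterable, NamedTuple, Optional, Tuple


class Tag(NamedTuple):
    interpreter: str
    abi: str
    platform: str


def parse_wheel_filename_simple(filename: str) -> Tuple[str, str, Optional[str], FrozenSet[Tag]]:
    """Simplified stand-in for packaging.utils.parse_wheel_filename."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel filename: {filename!r}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) not in (5, 6):
        raise ValueError(f"invalid wheel filename: {filename!r}")
    name = re.sub(r"[-_.]+", "-", parts[0]).lower()  # canonical form, e.g. tensorrt-llm
    version = parts[1]
    build = parts[2] if len(parts) == 6 else None
    interpreters, abis, platforms = (p.split(".") for p in parts[-3:])
    # Compressed tag sets (e.g. py2.py3) expand into several tags, hence a frozenset.
    tags = frozenset(Tag(i, a, p) for i, a, p in product(interpreters, abis, platforms))
    return name, version, build, tags


def pick_wheel(filenames: Iterable[str], dist_name: str, version_local: str,
               cpython_version: str) -> Optional[str]:
    """Return the first wheel matching name, local label, and cpXY tag."""
    for filename in filenames:
        try:
            name, version, _build, tags = parse_wheel_filename_simple(filename)
        except ValueError as e:  # narrow exception, as the review suggests
            print(f"error: {e}")
            continue
        if name != dist_name:
            continue
        # PEP 440 local label: everything after "+", or "" when absent.
        if version.partition("+")[2] != version_local:
            continue
        if not any(tag.abi == cpython_version and tag.interpreter == cpython_version
                   for tag in tags):
            continue
        return filename
    return None
```

With `["tensorrt_llm-1.3.0rc16-cp312-cp312-linux_aarch64.whl", "tensorrt_llm-1.3.0rc16+ngcpytorch2602-cp312-cp312-linux_aarch64.whl"]`, asking for `version_local="ngcpytorch2602"` picks the second file, while `version_local=""` picks the first, so plain and DLFW wheels in the same directory cannot be confused.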

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@tests/unittest/test_pip_install.py` around lines 104-140, the get_wheel_url
helper incorrectly assumes parse_wheel_filename returns a single Tag and catches
all exceptions; change the except to catch ValueError specifically for
parse_wheel_filename, and replace the unpacking "(tag, ) = tags" with an
any(...) check that iterates over the frozenset tags to see if any tag has
tag.abi == cpython_version and tag.interpreter == cpython_version (use
any(tag.abi == cpython_version and tag.interpreter == cpython_version for tag in
tags)); keep the surrounding name/version filtering logic intact in the
get_wheel_url function.

🧹 Nitpick comments (1)

jenkins/L0_Test.groovy (1)

3549-3558: Use an explicit DLFW flag here.

values[4] is still a path fragment ("" / "dlfw/"), but this block now relies on its truthiness to decide whether to mint a local-version wheel. An explicit boolean or exact comparison would make this much harder to misread.

♻️ Low-friction cleanup

-            def isDlfw = values[4]
+            def wheelVariant = values[4]
             def versionLocal = ""
-            if (isDlfw) {
+            if (wheelVariant == "dlfw/") {
                 // Extract PyTorch version from LLM_DOCKER_IMAGE. e.g. pytorch-25.12 -> 2512
                 def matcher = LLM_DOCKER_IMAGE =~ /:pytorch-(\d+)\.(\d+)-/
                 if (!matcher) {
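
The derivation this nitpick refers to (image tag `pytorch-25.12` to `2512`, then prefixed as `ngcpytorch2512` per the walkthrough), combined with the suggested explicit flag, can be sketched in Python for illustration rather than Groovy. In Groovy, an empty string is false and a non-empty one true, which is why the truthiness check works today; the explicit comparison only makes the intent readable. `derive_version_local` is a hypothetical name, raising on a missing tag is an assumption (the excerpt cuts off at `if (!matcher) {`), and the image strings in the test are made up.

```python
import re

# Same pattern as the Groovy matcher quoted above: /:pytorch-(\d+)\.(\d+)-/
_PYTORCH_TAG = re.compile(r":pytorch-(\d+)\.(\d+)-")


def derive_version_local(wheel_variant: str, docker_image: str) -> str:
    """Return the local version label, or "" for non-DLFW wheels (hypothetical)."""
    is_dlfw = wheel_variant == "dlfw/"  # explicit comparison, not truthiness
    if not is_dlfw:
        return ""
    match = _PYTORCH_TAG.search(docker_image)  # like Groovy's =~ (find, not full match)
    if match is None:
        # Assumption: fail loudly rather than build an unlabeled DLFW wheel.
        raise ValueError(f"cannot find a pytorch-YY.MM tag in {docker_image!r}")
    major, minor = match.groups()
    return f"ngcpytorch{major}{minor}"
```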

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@jenkins/L0_Test.groovy` around lines 3549-3558, the current check uses
def isDlfw = values[4] (a path fragment like "" or "dlfw/") and relies on its
truthiness to decide whether to create versionLocal. Change this to an explicit
comparison that yields a boolean flag (e.g., def isDlfw = values[4] == 'dlfw/')
and use that in the if (isDlfw) block, leaving the extraction logic that uses
LLM_DOCKER_IMAGE and the assignment to versionLocal unchanged (referencing
isDlfw, values[4], LLM_DOCKER_IMAGE, versionLocal).

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@jenkins/L0_Test.groovy`:
- Line 41: LLM_WHEEL_DOCKER_IMAGE is being set directly from
env.wheelDockerImage which makes it required; change the assignment to fall back
to the existing docker image variable when wheelDockerImage is empty (e.g., set
LLM_WHEEL_DOCKER_IMAGE = env.wheelDockerImage ?: env.dockerImage) so older
callers that only provide dockerImage still get a valid image; update the
assignment referencing LLM_WHEEL_DOCKER_IMAGE, env.wheelDockerImage, and
env.dockerImage.
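
The suggested fallback relies on Groovy's Elvis operator, which returns the left operand when it is Groovy-true, so an unset or empty `wheelDockerImage` falls through to `dockerImage`. A Python sketch of the same resolution order, assuming the environment behaves like a plain mapping; `resolve_wheel_image` is a hypothetical helper, and raising when both are missing is an added assumption.

```python
from typing import Mapping


def resolve_wheel_image(env: Mapping[str, str]) -> str:
    """Prefer wheelDockerImage; fall back to dockerImage (mirrors `a ?: b`)."""
    # `or` treats None and "" as false, like Groovy truth does for null and "".
    image = env.get("wheelDockerImage") or env.get("dockerImage")
    if not image:
        # Assumption: neither variable set is a configuration error.
        raise ValueError("neither wheelDockerImage nor dockerImage is set")
    return image
```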


ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 32975d44-6396-47ca-98d2-7a8d0875e035

📥 Commits

Reviewing files that changed from the base of the PR and between 70e8608 and 780e860.

📒 Files selected for processing (4)

jenkins/L0_MergeRequest.groovy
jenkins/L0_Test.groovy
jenkins/current_image_tags.properties
tests/unittest/test_pip_install.py

tensorrt-cicd · 2026-03-31T20:51:56Z

PR_Github #40984 [ run ] completed with state FAILURE. Commit: 780e860
/LLM/main/L0_MergeRequest_PR pipeline #31966 (Partly Tested) completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

Please check the failed tests and fix your PR
If you cannot view the failures, ask the CI triggerer to share details
Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

niukuo · 2026-04-30T09:20:01Z

/bot run --stage-list 'Build-Docker-Images'

tensorrt-cicd · 2026-04-30T09:27:01Z

PR_Github #46352 [ run ] triggered by Bot. Commit: a73c2bb Link to invocation

tensorrt-cicd · 2026-04-30T18:50:15Z

PR_Github #46352 [ run ] completed with state FAILURE. Commit: a73c2bb
/LLM/main/L0_MergeRequest_PR pipeline #36440 (Partly Tested) completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

Please check the failed tests and fix your PR
If you cannot view the failures, ask the CI triggerer to share details
Once fixed, request an NVIDIA team member to trigger CI again

CI Agent Failure Analysis

Link to invocation

Signed-off-by: Yiteng Niu <6831097+niukuo@users.noreply.github.com>

niukuo · 2026-05-11T06:58:40Z

/bot run --stage-list 'Build-Docker-Images'

tensorrt-cicd · 2026-05-11T07:04:00Z

PR_Github #47679 [ run ] triggered by Bot. Commit: a289ee7 Link to invocation

Signed-off-by: Yiteng Niu <6831097+niukuo@users.noreply.github.com>

niukuo · 2026-05-11T08:36:19Z

/bot run --stage-list 'Build-Docker-Images'

tensorrt-cicd · 2026-05-11T08:41:36Z

PR_Github #47699 [ run ] triggered by Bot. Commit: 5aa0240 Link to invocation

niukuo · 2026-05-11T10:13:52Z

/bot run --skip-test --disable-fail-fast

tensorrt-cicd · 2026-05-11T10:19:51Z

PR_Github #47725 [ run ] triggered by Bot. Commit: 5aa0240 Link to invocation

tensorrt-cicd · 2026-05-11T22:55:56Z

PR_Github #47725 [ run ] completed with state SUCCESS. Commit: 5aa0240
/LLM/main/L0_MergeRequest_PR pipeline #37621 (Partly Tested) completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

Please check the failed tests and fix your PR
If you cannot view the failures, ask the CI triggerer to share details
Once fixed, request an NVIDIA team member to trigger CI again

CI Agent Failure Analysis

Link to invocation

tensorrt-cicd · 2026-05-12T01:23:59Z

PR_Github #47819 [ run ] triggered by Bot. Commit: 5aa0240 Link to invocation

Updated installation instructions for TensorRT LLM to include the latest version 1.3.0rc16 and clarify usage with the NGC PyTorch container. Signed-off-by: Yiteng Niu <6831097+niukuo@users.noreply.github.com>

tensorrt-cicd · 2026-05-12T13:39:52Z

PR_Github #47819 [ run ] completed with state FAILURE. Commit: 5aa0240
/LLM/main/L0_MergeRequest_PR pipeline #37707 (Partly Tested) completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

Please check the failed tests and fix your PR
If you cannot view the failures, ask the CI triggerer to share details
Once fixed, request an NVIDIA team member to trigger CI again

CI Agent Failure Analysis

Link to invocation

Signed-off-by: Yiteng Niu <6831097+niukuo@users.noreply.github.com>

tensorrt-cicd · 2026-05-12T16:26:00Z

PR_Github #47991 [ run ] triggered by Bot. Commit: 07545b4 Link to invocation

tensorrt-cicd · 2026-05-12T16:56:51Z

PR_Github #47998 [ run ] triggered by Bot. Commit: 07545b4 Link to invocation

tensorrt-cicd · 2026-05-13T06:24:32Z

PR_Github #47998 [ run ] completed with state SUCCESS. Commit: 07545b4
/LLM/main/L0_MergeRequest_PR pipeline #37835 (Partly Tested) completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

Please check the failed tests and fix your PR
If you cannot view the failures, ask the CI triggerer to share details
Once fixed, request an NVIDIA team member to trigger CI again

CI Agent Failure Analysis

Link to invocation

tensorrt-cicd · 2026-05-13T13:28:51Z

PR_Github #48180 [ run ] triggered by Bot. Commit: 07545b4 Link to invocation

tensorrt-cicd · 2026-05-13T21:09:03Z

PR_Github #48180 [ run ] completed with state SUCCESS. Commit: 07545b4
/LLM/main/L0_MergeRequest_PR pipeline #38000 (Partly Tested) completed with status: 'SUCCESS'

CI Report

Link to invocation

niukuo · 2026-05-13T23:35:51Z

/bot skip --comment "sanity check passed"

tensorrt-cicd · 2026-05-13T23:42:02Z

PR_Github #48247 [ skip ] triggered by Bot. Commit: 07545b4 Link to invocation

tensorrt-cicd · 2026-05-13T23:49:19Z

PR_Github #48247 [ skip ] completed with state SUCCESS. Commit: 07545b4
Skipping testing for commit 07545b4

Link to invocation

niukuo requested review from a team as code owners March 31, 2026 16:50

niukuo requested review from dpitman-nvda and tburt-nv March 31, 2026 16:50

github-actions assigned niukuo Mar 31, 2026

coderabbitai reviewed Mar 31, 2026

Comment thread jenkins/L0_Test.groovy

dpitman-nvda requested changes Apr 1, 2026

Comment thread jenkins/L0_Test.groovy Outdated

niukuo force-pushed the dlfw_image branch from 780e860 to b59fd5f April 30, 2026 08:54

niukuo requested a review from a team as a code owner April 30, 2026 08:54

niukuo requested review from arysef and nv-guomingz April 30, 2026 08:54

niukuo force-pushed the dlfw_image branch from b59fd5f to a73c2bb April 30, 2026 09:18

[TRTLLM-10804][infra] add LLM_SBSA_WHEEL_DOCKER_IMAGE

a289ee7

Signed-off-by: Yiteng Niu <6831097+niukuo@users.noreply.github.com>

niukuo force-pushed the dlfw_image branch from a73c2bb to a289ee7 May 11, 2026 06:58

[None][infra] L0_MergeRequest pass default tag to BuildDockerImages

5aa0240

Signed-off-by: Yiteng Niu <6831097+niukuo@users.noreply.github.com>

niukuo force-pushed the dlfw_image branch from 10a1f85 to 5aa0240 May 11, 2026 08:36

dpitman-nvda approved these changes May 11, 2026

laikhtewari approved these changes May 12, 2026

Update TensorRT LLM installation instructions

b12724e

Updated installation instructions for TensorRT LLM to include the latest version 1.3.0rc16 and clarify usage with the NGC PyTorch container. Signed-off-by: Yiteng Niu <6831097+niukuo@users.noreply.github.com>

niukuo force-pushed the dlfw_image branch from 065684b to b12724e May 12, 2026 02:35

fix: update base image version

07545b4

Signed-off-by: Yiteng Niu <6831097+niukuo@users.noreply.github.com>

niukuo requested a review from a team as a code owner May 12, 2026 16:22

niukuo force-pushed the dlfw_image branch from 18b8e20 to 07545b4 May 12, 2026 16:22

niukuo merged commit fc60281 into NVIDIA:main May 13, 2026
6 checks passed


4 participants
