
[TRTLLM-10804][infra] add LLM_SBSA_WHEEL_DOCKER_IMAGE #12635

Merged
niukuo merged 4 commits into NVIDIA:main from niukuo:dlfw_image on May 13, 2026

Conversation

@niukuo
Collaborator

@niukuo niukuo commented Mar 31, 2026

Summary by CodeRabbit

  • New Features

    • Added support for building and testing wheels on ARM64 (SBSA) architecture.
    • Improved wheel versioning with local version metadata support.
  • Tests

    • Enhanced wheel validation with stricter file parsing and compatibility checks.

Description

Test Coverage

PR Checklist

Please review the following before submitting your PR:

  • PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.

  • PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.

  • Test cases are provided for new code paths (see test instructions)

  • Any new dependencies have been scanned for license and vulnerabilities

  • CODEOWNERS updated if ownership changes

  • Documentation updated as needed

  • Update tava architecture diagram if there is a significant design change in PR.

  • The reviewers assigned automatically/manually are appropriate for the PR.

  • Please check this after reviewing the above items as appropriate for this PR.

GitHub Bot Help

To see a list of available CI bot commands, please comment /bot help.

@niukuo niukuo requested review from a team as code owners March 31, 2026 16:50
@niukuo niukuo requested review from dpitman-nvda and tburt-nv March 31, 2026 16:50
@niukuo
Collaborator Author

niukuo commented Mar 31, 2026

/bot run --skip-test --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #40984 [ run ] triggered by Bot. Commit: 780e860 Link to invocation

@coderabbitai
Contributor

coderabbitai Bot commented Mar 31, 2026

📝 Walkthrough

Walkthrough

This pull request introduces wheel-specific Docker image support across the Jenkins pipeline and test infrastructure. A new LLM_SBSA_WHEEL_DOCKER_IMAGE property is defined, threaded through the MR pipeline to SBSA test stages as a wheelDockerImage parameter, and consumed by test scripts. The test build logic is extended to extract local version metadata from image tags and pass it through to wheel selection and validation.

Changes

| Cohort | File(s) | Summary |
|---|---|---|
| SBSA Image Configuration & Parameter Wiring | jenkins/current_image_tags.properties, jenkins/L0_MergeRequest.groovy | Added LLM_SBSA_WHEEL_DOCKER_IMAGE property and extended getContainerURIs() to populate it. Updated SBSA test stage parameterization to pass wheelDockerImage from the new property. |
| AArch64 Test Build Logic | jenkins/L0_Test.groovy | Extended runLLMBuild and checkPipInstall signatures to accept and use version_local parameter. Added logic to extract PyTorch version from LLM_DOCKER_IMAGE tag and format it as version_local (e.g., ngcpytorch2602). Updated AArch64 wheel path composition to nest under cpu_arch/. Switched AArch64 image source from LLM_DOCKER_IMAGE to LLM_WHEEL_DOCKER_IMAGE for PY312-UB2404 case. |
| Wheel Selection & Validation | tests/unittest/test_pip_install.py | Enhanced get_wheel_url to accept version_local and cpython_version parameters. Replaced simple .whl name filtering with packaging.utils.parse_wheel_filename parsing. Added stricter matching: distribution name validation, local version metadata comparison against version_local, and ABI/interpreter tag matching. Extended CLI with --version_local and --cpython_version arguments. |
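The stricter matching described for get_wheel_url can be sketched with the standard library alone. This is a simplified stand-in, not the code in test_pip_install.py: the regex only approximates the PEP 427 filename layout instead of calling packaging.utils.parse_wheel_filename, select_wheel is a hypothetical name, and any filenames you feed it in testing are made up.

```python
import re
from typing import Iterable, Optional

# Simplified PEP 427 layout: name-version[-build]-python-abi-platform.whl
# (a stand-in for packaging.utils.parse_wheel_filename, which is stricter).
_WHEEL_RE = re.compile(
    r"^(?P<name>[^-]+)-(?P<version>[^-]+)(?:-(?P<build>\d[^-]*))?"
    r"-(?P<py>[^-]+)-(?P<abi>[^-]+)-(?P<plat>[^-]+)\.whl$"
)


def _normalize(name: str) -> str:
    # PEP 503 name normalization: runs of -, _ and . become a single '-'.
    return re.sub(r"[-_.]+", "-", name).lower()


def select_wheel(filenames: Iterable[str], dist_name: str,
                 version_local: str, cpython_version: str) -> Optional[str]:
    """Return the first wheel matching name, local version and CPython tags."""
    for filename in filenames:
        m = _WHEEL_RE.match(filename)
        if m is None:
            continue  # not a wheel filename
        if _normalize(m["name"]) != _normalize(dist_name):
            continue
        # The local version label follows '+'; it is "" for public versions.
        _, _, local = m["version"].partition("+")
        if local != version_local:
            continue
        # Compressed tag sets (e.g. "cp311.cp312") expand to several tags,
        # so check whether ANY (interpreter, abi) pair matches.
        if any(py == cpython_version and abi == cpython_version
               for py in m["py"].split(".")
               for abi in m["abi"].split(".")):
            return filename
    return None
```

Returning the first match suits a "pick one URL" helper; a stricter variant could instead raise when more than one wheel qualifies.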

Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Jenkins as Jenkins MR<br/>Pipeline
    participant L0Test as L0_Test.groovy
    participant DockerImage as Docker Image<br/>Tag
    participant Build as runLLMBuild
    participant TestUtil as test_pip_install.py

    Jenkins->>L0Test: Pass wheelDockerImage
    L0Test->>DockerImage: Extract PyTorch version<br/>(e.g., pytorch-26.02)
    DockerImage-->>L0Test: Version components
    L0Test->>L0Test: Format version_local<br/>(e.g., ngcpytorch2602)
    L0Test->>Build: Call runLLMBuild<br/>(cpu_arch, version_local)
    Build->>Build: Mutate tensorrt_llm/version.py<br/>Add +version_local suffix
    Build-->>L0Test: Wheel built
    L0Test->>TestUtil: Call test_pip_install.py<br/>(wheel_path, version_local)
    TestUtil->>TestUtil: Parse wheel filename<br/>Match local version metadata
    TestUtil-->>L0Test: Wheel validation passed
```
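The "Add +version_local suffix" step in the diagram amounts to appending a PEP 440 local version label to the package version. The sketch below is an assumption about how such a mutation could look, not the logic in L0_Test.groovy: it presumes version.py contains a single `__version__ = "..."` assignment, and add_local_version is a hypothetical helper.

```python
import re

# PEP 440 local version label: ASCII alphanumeric segments separated by '.'.
_LOCAL_LABEL = re.compile(r"^[A-Za-z0-9]+(\.[A-Za-z0-9]+)*$")
# Assumed shape of the version line: __version__ = "X.Y.Z..."
_VERSION_LINE = re.compile(
    r'^(__version__\s*=\s*["\'])([^"\']+)(["\'])', re.MULTILINE)


def add_local_version(source: str, version_local: str) -> str:
    """Append '+<version_local>' to the __version__ assignment in source."""
    if not version_local:
        return source  # non-DLFW builds keep the public version unchanged
    if not _LOCAL_LABEL.match(version_local):
        raise ValueError(f"invalid local version label: {version_local!r}")

    def _repl(m):
        if "+" in m.group(2):
            raise ValueError(f"version already has a local label: {m.group(2)}")
        return f"{m.group(1)}{m.group(2)}+{version_local}{m.group(3)}"

    new_source, count = _VERSION_LINE.subn(_repl, source, count=1)
    if count != 1:
        raise ValueError("no __version__ assignment found")
    return new_source
```

Refusing to stack a second `+` label keeps the result a valid PEP 440 version if the step runs twice.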

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Description check | ⚠️ Warning | PR description is entirely empty, containing only the template placeholder comments with no actual content filled in. | Fill out the Description section explaining what changes were made and why. Provide Test Coverage details. Complete the PR Checklist with actual checkmarks and ensure all sections are addressed. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title clearly describes the main infrastructure change: adding a new Docker image variable (LLM_SBSA_WHEEL_DOCKER_IMAGE) and directly aligns with the code modifications in the changeset. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |


Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
tests/unittest/test_pip_install.py (1)

104-140: ⚠️ Potential issue | 🟡 Minor

Use any() to iterate over frozenset of tags and catch specific exceptions.

parse_wheel_filename returns tags as a frozenset[Tag], not a single Tag. The unpacking (tag, ) = tags fails on wheels with multiple valid tag combinations. Additionally, catch ValueError specifically instead of broad Exception per coding guidelines.

♻️ Proposed fix
```diff
         try:
             from packaging.utils import parse_wheel_filename
-            name, ver, build, tags = parse_wheel_filename(filename)
-        except Exception as e:
+            name, ver, _build, tags = parse_wheel_filename(filename)
+        except ValueError as e:
             print(f"error: {e}")
             continue
-        (tag, ) = tags
-        if tag.abi != cpython_version or tag.interpreter != cpython_version:
+        if not any(tag.abi == cpython_version and tag.interpreter == cpython_version
+                   for tag in tags):
             continue
```
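To see why the unpacking breaks, note that a wheel filename can carry a compressed tag set (for example, two manylinux platform tags), which expands to several Tag objects. The sketch below uses a minimal NamedTuple as a stand-in for packaging.tags.Tag so that it runs without third-party packages; expand_tags and matches_cpython are hypothetical helpers, and the tag string in the check is made up.

```python
from typing import FrozenSet, NamedTuple


class Tag(NamedTuple):
    # Minimal stand-in for packaging.tags.Tag (interpreter, abi, platform).
    interpreter: str
    abi: str
    platform: str


def expand_tags(tag_triple: str) -> FrozenSet[Tag]:
    """Expand a compressed triple like 'py2.py3-none-any' into a frozenset."""
    interps, abis, plats = tag_triple.split("-")
    return frozenset(
        Tag(i, a, p)
        for i in interps.split(".")
        for a in abis.split(".")
        for p in plats.split(".")
    )


def matches_cpython(tags: FrozenSet[Tag], cpython_version: str) -> bool:
    # Works for any number of tags, unlike `(tag, ) = tags`.
    return any(t.abi == cpython_version and t.interpreter == cpython_version
               for t in tags)
```

With "cp312-cp312-manylinux_2_28_aarch64.manylinux_2_34_aarch64" the set holds two tags, so `(tag, ) = tags` raises ValueError while `matches_cpython(tags, "cp312")` returns True.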
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/unittest/test_pip_install.py` around lines 104 - 140, The get_wheel_url
helper incorrectly assumes parse_wheel_filename returns a single Tag and catches
all exceptions; change the except to catch ValueError specifically for
parse_wheel_filename, and replace the unpacking "(tag, ) = tags" with an
any(...) check that iterates over the frozenset tags to see if any tag has
tag.abi == cpython_version and tag.interpreter == cpython_version (use
any(tag.abi == cpython_version and tag.interpreter == cpython_version for tag in
tags)); keep the surrounding name/version filtering logic intact in the
get_wheel_url function.
🧹 Nitpick comments (1)
jenkins/L0_Test.groovy (1)

3549-3558: Use an explicit DLFW flag here.

values[4] is still a path fragment ("" / "dlfw/"), but this block now relies on its truthiness to decide whether to mint a local-version wheel. An explicit boolean or exact comparison would make this much harder to misread.

♻️ Low-friction cleanup
```diff
-            def isDlfw = values[4]
+            def wheelVariant = values[4]
             def versionLocal = ""
-            if (isDlfw) {
+            if (wheelVariant == "dlfw/") {
                 // Extract PyTorch version from LLM_DOCKER_IMAGE. e.g. pytorch-25.12 -> 2512
                 def matcher = LLM_DOCKER_IMAGE =~ /:pytorch-(\d+)\.(\d+)-/
                 if (!matcher) {
```
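A Python rendering of the suggested explicit check, for illustration only: version_local_for is a hypothetical helper, the regex mirrors the Groovy pattern in the diff (Groovy's `=~` finds a match anywhere, like `re.search`), and the ngcpytorch prefix follows the ngcpytorch2602 example from the walkthrough. Any image reference used to exercise it is made up.

```python
import re

# Mirrors the Groovy pattern /:pytorch-(\d+)\.(\d+)-/ from the diff.
_PYTORCH_TAG = re.compile(r":pytorch-(\d+)\.(\d+)-")


def version_local_for(wheel_variant: str, llm_docker_image: str) -> str:
    """Return e.g. 'ngcpytorch2512' for the DLFW variant, '' otherwise."""
    if wheel_variant != "dlfw/":  # explicit comparison, not truthiness
        return ""
    m = _PYTORCH_TAG.search(llm_docker_image)
    if m is None:
        raise ValueError(f"no pytorch version in {llm_docker_image!r}")
    return f"ngcpytorch{m.group(1)}{m.group(2)}"
```

Comparing against the exact fragment means a future variant such as another non-empty path will not silently trigger the local-version branch.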
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@jenkins/L0_Test.groovy` around lines 3549 - 3558, The current check uses def
isDlfw = values[4] (a path fragment like "" or "dlfw/") and relies on its
truthiness to decide creating versionLocal; change this to an explicit boolean
or comparison: compute a boolean flag (e.g., def isDlfw = values[4] == 'dlfw/')
and use that in the if (isDlfw)
block, leaving the extraction logic that uses LLM_DOCKER_IMAGE and assignment to
versionLocal unchanged (referencing isDlfw, values[4], LLM_DOCKER_IMAGE,
versionLocal).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@jenkins/L0_Test.groovy`:
- Line 41: LLM_WHEEL_DOCKER_IMAGE is being set directly from
env.wheelDockerImage which makes it required; change the assignment to fall back
to the existing docker image variable when wheelDockerImage is empty (e.g., set
LLM_WHEEL_DOCKER_IMAGE = env.wheelDockerImage ?: env.dockerImage) so older
callers that only provide dockerImage still get a valid image; update the
assignment referencing LLM_WHEEL_DOCKER_IMAGE, env.wheelDockerImage, and
env.dockerImage.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 32975d44-6396-47ca-98d2-7a8d0875e035

📥 Commits

Reviewing files that changed from the base of the PR and between 70e8608 and 780e860.

📒 Files selected for processing (4)
  • jenkins/L0_MergeRequest.groovy
  • jenkins/L0_Test.groovy
  • jenkins/current_image_tags.properties
  • tests/unittest/test_pip_install.py

Comment thread jenkins/L0_Test.groovy
@tensorrt-cicd
Collaborator

PR_Github #40984 [ run ] completed with state FAILURE. Commit: 780e860
/LLM/main/L0_MergeRequest_PR pipeline #31966 (Partly Tested) completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

Comment thread jenkins/L0_Test.groovy Outdated
@niukuo niukuo requested a review from a team as a code owner April 30, 2026 08:54
@niukuo niukuo requested review from arysef and nv-guomingz April 30, 2026 08:54
@niukuo
Collaborator Author

niukuo commented Apr 30, 2026

/bot run --stage-list 'Build-Docker-Images'

@tensorrt-cicd
Collaborator

PR_Github #46352 [ run ] triggered by Bot. Commit: a73c2bb Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #46352 [ run ] completed with state FAILURE. Commit: a73c2bb
/LLM/main/L0_MergeRequest_PR pipeline #36440 (Partly Tested) completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

CI Agent Failure Analysis

Link to invocation

Signed-off-by: Yiteng Niu <6831097+niukuo@users.noreply.github.com>
@niukuo
Collaborator Author

niukuo commented May 11, 2026

/bot run --stage-list 'Build-Docker-Images'

@tensorrt-cicd
Collaborator

PR_Github #47679 [ run ] triggered by Bot. Commit: a289ee7 Link to invocation

Signed-off-by: Yiteng Niu <6831097+niukuo@users.noreply.github.com>
@niukuo
Collaborator Author

niukuo commented May 11, 2026

/bot run --stage-list 'Build-Docker-Images'

@tensorrt-cicd
Collaborator

PR_Github #47699 [ run ] triggered by Bot. Commit: 5aa0240 Link to invocation

@niukuo
Collaborator Author

niukuo commented May 11, 2026

/bot run --skip-test --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #47725 [ run ] triggered by Bot. Commit: 5aa0240 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #47725 [ run ] completed with state SUCCESS. Commit: 5aa0240
/LLM/main/L0_MergeRequest_PR pipeline #37621 (Partly Tested) completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

CI Agent Failure Analysis

Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #47819 [ run ] triggered by Bot. Commit: 5aa0240 Link to invocation

Updated installation instructions for TensorRT LLM to include the latest version 1.3.0rc16 and clarify usage with the NGC PyTorch container.

Signed-off-by: Yiteng Niu <6831097+niukuo@users.noreply.github.com>
@tensorrt-cicd
Collaborator

PR_Github #47819 [ run ] completed with state FAILURE. Commit: 5aa0240
/LLM/main/L0_MergeRequest_PR pipeline #37707 (Partly Tested) completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

CI Agent Failure Analysis

Link to invocation

Signed-off-by: Yiteng Niu <6831097+niukuo@users.noreply.github.com>
@niukuo niukuo requested a review from a team as a code owner May 12, 2026 16:22
@tensorrt-cicd
Collaborator

PR_Github #47991 [ run ] triggered by Bot. Commit: 07545b4 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #47998 [ run ] triggered by Bot. Commit: 07545b4 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #47998 [ run ] completed with state SUCCESS. Commit: 07545b4
/LLM/main/L0_MergeRequest_PR pipeline #37835 (Partly Tested) completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

CI Agent Failure Analysis

Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #48180 [ run ] triggered by Bot. Commit: 07545b4 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #48180 [ run ] completed with state SUCCESS. Commit: 07545b4
/LLM/main/L0_MergeRequest_PR pipeline #38000 (Partly Tested) completed with status: 'SUCCESS'

CI Report

Link to invocation

@niukuo
Collaborator Author

niukuo commented May 13, 2026

/bot skip --comment "sanity check passed"

@tensorrt-cicd
Collaborator

PR_Github #48247 [ skip ] triggered by Bot. Commit: 07545b4 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #48247 [ skip ] completed with state SUCCESS. Commit: 07545b4
Skipping testing for commit 07545b4

Link to invocation

@niukuo niukuo merged commit fc60281 into NVIDIA:main May 13, 2026
6 checks passed

Labels

None yet

Projects

None yet


4 participants