[NVBUG-6248780][fix] Add --decoupled flag to benchmark_core_model in multi-instance test by karljang · Pull Request #14888 · NVIDIA/TensorRT-LLM

karljang · 2026-06-03T04:55:37Z

Problem

test_llmapi_backend_multi_instance fails with:

tritonclient.utils.InferenceServerException: [StatusCode.UNIMPLEMENTED]
ModelInfer RPC doesn't support models with decoupled transaction policy

Root Cause

The test calls set_llmapi_decoupled_mode(new_model_repo, True) and sets model_config["triton_config"]["decoupled"] = True, but the benchmark_core_model.py invocation was missing the --decoupled flag. Without the flag, the benchmark calls client.infer() (the unary ModelInfer RPC), which Triton rejects for decoupled models.

The mismatch was introduced in #14079, which added decoupled mode to the multi-instance test but did not add the flag to the benchmark command (it did add the flag to the parametrized test_llmapi_backend).
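For anyone debugging a similar mismatch, the check below is a minimal sketch of how a client could find out whether a served model is decoupled before choosing an RPC. It assumes the client behaves like tritonclient.grpc.InferenceServerClient, whose get_model_config() response carries a `.config` message with a `model_transaction_policy.decoupled` field. The helper names and the model name "tensorrt_llm" are illustrative only and are not taken from the test.

```python
from types import SimpleNamespace


def is_decoupled(client, model_name):
    """Return True if the served model uses the decoupled transaction policy.

    Assumes `client` behaves like tritonclient.grpc.InferenceServerClient,
    whose get_model_config() returns a response with a `.config` message.
    """
    config = client.get_model_config(model_name).config
    return bool(config.model_transaction_policy.decoupled)


def usable_rpcs(decoupled):
    """List the gRPC inference methods a client can use for the given policy.

    Decoupled models only accept the streaming RPC; the unary ModelInfer RPC
    is what produced the UNIMPLEMENTED error quoted in the Problem section.
    """
    if decoupled:
        return ("ModelStreamInfer",)
    return ("ModelInfer", "ModelStreamInfer")


if __name__ == "__main__":
    # Stand-in client so the sketch runs without a Triton server.
    fake_config = SimpleNamespace(
        model_transaction_policy=SimpleNamespace(decoupled=True))
    fake_client = SimpleNamespace(
        get_model_config=lambda name: SimpleNamespace(config=fake_config))
    # "tensorrt_llm" is a hypothetical model name used for illustration.
    print(usable_rpcs(is_decoupled(fake_client, "tensorrt_llm")))
```

Against a real server, the stand-in would be replaced by an actual client instance; the rest of the sketch is unchanged.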

Fix

Add --decoupled to the benchmark_core_model.py command. The flag switches the benchmark from client.infer() to async_stream_infer (bidirectional streaming RPC), which matches the server configuration.
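The difference between the two client paths can be sketched roughly as follows. This is not the code in benchmark_core_model.py; it is an illustration that assumes a client exposing the tritonclient.grpc.InferenceServerClient methods infer(), start_stream(), async_stream_infer() and stop_stream(), and it assumes stop_stream() returns only after pending responses have been delivered to the callback. The function names are hypothetical.

```python
import queue
from functools import partial


def _on_response(responses, result, error):
    # Stream callback: tritonclient passes (result, error) for each response.
    responses.put(error if error is not None else result)


def run_inference(client, model_name, inputs, decoupled):
    """Send one request using the RPC that fits the model's transaction policy."""
    if not decoupled:
        # Unary ModelInfer RPC: exactly one response per request.
        return [client.infer(model_name, inputs)]

    # Streaming RPC: a decoupled model may return zero or more responses.
    responses = queue.Queue()
    client.start_stream(callback=partial(_on_response, responses))
    try:
        client.async_stream_infer(model_name, inputs)
    finally:
        # Assumption: closing the stream waits for outstanding responses.
        client.stop_stream()

    results = []
    while not responses.empty():
        item = responses.get_nowait()
        if isinstance(item, Exception):
            raise item
        results.append(item)
    return results
```

Calling run_inference(..., decoupled=False) against a decoupled model is the situation the failing test was in: the unary call is rejected by the server before any response is produced.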

Regression Info

2026-05-25 PASSED (llm20260525_a8cd4fff)
2026-05-30 FAILED (llm20260530_74d7c3ac)

Summary by CodeRabbit

Tests
- Updated test configuration to ensure consistent behavior with decoupled model setup in integration tests.

Note: This release includes internal testing infrastructure improvements. No user-facing changes are present in this update.

karljang · 2026-06-03T04:58:07Z

/bot run --disable-fail-fast

coderabbitai · 2026-06-03T04:58:30Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 51dd5213-988e-458d-b07e-80978d57333b

📥 Commits

Reviewing files that changed from the base of the PR and between c798fd9 and a382c0d.

📒 Files selected for processing (1)

tests/integration/defs/triton_server/test_triton_llm.py

📝 Walkthrough

The test_llmapi_backend_multi_instance test is updated to include the --decoupled CLI flag when invoking benchmark_core_model.py, ensuring the benchmark execution mode aligns with the test's decoupled model repository configuration.
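One way to keep the benchmark flag and the server setting from drifting apart again is to derive both from a single boolean. The helper below is a hypothetical sketch, not code from test_triton_llm.py: the function name, the script path, and the extra arguments are placeholders, and it assumes --decoupled is accepted directly after the script path.

```python
def build_benchmark_cmd(script_path, decoupled, extra_args=()):
    """Build the benchmark command, adding --decoupled only when needed.

    `script_path` and `extra_args` are placeholders supplied by the caller;
    the real test passes its own arguments to benchmark_core_model.py.
    """
    cmd = ["python3", script_path]
    if decoupled:
        # Assumption: the flag is a top-level option, so it goes before
        # any remaining arguments.
        cmd.append("--decoupled")
    cmd.extend(extra_args)
    return cmd


if __name__ == "__main__":
    decoupled = True  # same value the test would pass to the server config
    print(build_benchmark_cmd("benchmark_core_model.py", decoupled))
```

With this shape, the value passed to set_llmapi_decoupled_mode and the value passed to the command builder come from the same variable, so the two cannot disagree.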

Changes

Decoupled mode flag alignment

| Layer / File(s) | Summary |
|---|---|
| Add --decoupled flag to benchmark invocation `tests/integration/defs/triton_server/test_triton_llm.py` | The test_llmapi_backend_multi_instance test adds the `--decoupled` CLI flag to the benchmark_core_model.py command invocation to align the benchmark execution with the test's decoupled model repo setup. |

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~2 minutes

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title clearly and specifically describes the main change: adding the --decoupled flag to the benchmark_core_model command in the multi-instance test, which directly addresses the root cause of the test failure. |
| Description check | ✅ Passed | The description provides clear problem statement, root cause analysis, fix explanation, and regression information. However, it lacks a dedicated 'Test Coverage' section explicitly listing relevant tests as required by the template. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

tensorrt-cicd · 2026-06-03T05:05:20Z

PR_Github #51754 [ run ] triggered by Bot. Commit: a382c0d Link to invocation

tensorrt-cicd · 2026-06-03T08:12:31Z

PR_Github #51754 [ run ] completed with state ABORTED. Commit: a382c0d
/LLM/main/L0_MergeRequest_PR pipeline #41127 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

Please check the failed tests and fix your PR
If you cannot view the failures, ask the CI triggerer to share details
Once fixed, request an NVIDIA team member to trigger CI again

CI Agent Failure Analysis

Link to invocation

karljang · 2026-06-03T16:12:36Z

/bot run --disable-fail-fast

tensorrt-cicd · 2026-06-03T16:20:02Z

PR_Github #51872 [ run ] triggered by Bot. Commit: a382c0d Link to invocation

tensorrt-cicd · 2026-06-03T17:08:18Z

PR_Github #51872 [ run ] completed with state SUCCESS. Commit: a382c0d
/LLM/main/L0_MergeRequest_PR pipeline #41231 completed with status: 'FAILURE'


…multi-instance test test_llmapi_backend_multi_instance sets decoupled mode on the Triton model but the benchmark_core_model.py invocation was missing the --decoupled flag. Without it, the benchmark uses client.infer() (unary ModelInfer RPC) which Triton rejects on decoupled models with: 'ModelInfer RPC doesn't support models with decoupled transaction policy'

Add the --decoupled flag so benchmark_core_model uses async_stream_infer, matching the server configuration.

Signed-off-by: Kanghwan Jang <861393+karljang@users.noreply.github.com>

karljang · 2026-06-03T17:14:30Z

/bot run --disable-fail-fast

tensorrt-cicd · 2026-06-03T17:21:43Z

PR_Github #51882 [ run ] triggered by Bot. Commit: 46506f0 Link to invocation

tensorrt-cicd · 2026-06-03T20:41:16Z

PR_Github #51882 [ run ] completed with state SUCCESS. Commit: 46506f0
/LLM/main/L0_MergeRequest_PR pipeline #41240 completed with status: 'FAILURE'


karljang · 2026-06-03T21:15:56Z

/bot run --disable-fail-fast

tensorrt-cicd · 2026-06-03T21:21:51Z

PR_Github #51904 [ run ] triggered by Bot. Commit: 46506f0 Link to invocation

tensorrt-cicd · 2026-06-03T22:50:28Z

PR_Github #51904 [ run ] completed with state SUCCESS. Commit: 46506f0
/LLM/main/L0_MergeRequest_PR pipeline #41261 completed with status: 'SUCCESS'

CI Report

Link to invocation

karljang requested a review from a team as a code owner June 3, 2026 04:55

github-actions Bot assigned karljang Jun 3, 2026

karljang force-pushed the fix/triton-multi-instance-decoupled branch from a382c0d to 46506f0 Compare June 3, 2026 17:14

xinhe-nv approved these changes Jun 4, 2026

View reviewed changes

karljang merged commit 222d9e8 into NVIDIA:main Jun 4, 2026
7 checks passed
