[NVBUG-6248780][fix] Add --decoupled flag to benchmark_core_model in multi-instance test #14888
/bot run --disable-fail-fast
PR_Github #51754 [ run ] triggered by Bot. Commit:
PR_Github #51754 [ run ] completed with state
/bot run --disable-fail-fast
PR_Github #51872 [ run ] triggered by Bot. Commit:
PR_Github #51872 [ run ] completed with state
…multi-instance test

test_llmapi_backend_multi_instance sets decoupled mode on the Triton model, but the benchmark_core_model.py invocation was missing the --decoupled flag. Without it, the benchmark uses client.infer() (unary ModelInfer RPC), which Triton rejects on decoupled models with:

'ModelInfer RPC doesn't support models with decoupled transaction policy'

Add the --decoupled flag so benchmark_core_model uses async_stream_infer, matching the server configuration.

Signed-off-by: Kanghwan Jang <861393+karljang@users.noreply.github.com>
a382c0d to 46506f0
/bot run --disable-fail-fast
PR_Github #51882 [ run ] triggered by Bot. Commit:
PR_Github #51882 [ run ] completed with state
/bot run --disable-fail-fast
PR_Github #51904 [ run ] triggered by Bot. Commit:
PR_Github #51904 [ run ] completed with state
Problem
test_llmapi_backend_multi_instance fails with:
'ModelInfer RPC doesn't support models with decoupled transaction policy'

Root Cause
The test sets set_llmapi_decoupled_mode(new_model_repo, True) and model_config["triton_config"]["decoupled"] = True, but the benchmark_core_model.py invocation was missing the --decoupled flag. Without it, the benchmark uses client.infer() (unary ModelInfer RPC), which Triton rejects on decoupled models.

This was introduced in #14079, which added decoupled mode to the multi-instance test but missed adding the flag to the benchmark command (while correctly adding it to the parametrized test_llmapi_backend).

Fix
Add --decoupled to the benchmark_core_model.py command, which switches it to use async_stream_infer (bidirectional streaming RPC), matching the server configuration.

Regression Info
Summary by CodeRabbit
Note: This release includes internal testing infrastructure improvements. No user-facing changes are present in this update.
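For readers unfamiliar with the failure mode, the mismatch described under Root Cause can be sketched roughly as follows. This is a minimal illustration, not the code in benchmark_core_model.py: StubTritonClient, stream_infer, and run_request are hypothetical names, and the stub only imitates the rejection that Triton reports. It does not use the real tritonclient API.

```python
class StubTritonClient:
    """Hypothetical stand-in for a Triton client (not the tritonclient API)."""

    # Error text as quoted in the commit message of this PR.
    ERROR = "ModelInfer RPC doesn't support models with decoupled transaction policy"

    def __init__(self, model_is_decoupled):
        self.model_is_decoupled = model_is_decoupled

    def infer(self, request):
        # Unary path: one request maps to exactly one response,
        # so a decoupled model cannot be served this way.
        if self.model_is_decoupled:
            raise RuntimeError(self.ERROR)
        return f"unary:{request}"

    def stream_infer(self, request):
        # Streaming path: a request may produce zero or more responses.
        yield f"stream:{request}"


def run_request(client, request, decoupled):
    """Choose the call path the way a --decoupled style switch might."""
    if decoupled:
        return list(client.stream_infer(request))
    return [client.infer(request)]


if __name__ == "__main__":
    client = StubTritonClient(model_is_decoupled=True)
    print(run_request(client, "ping", decoupled=True))
    try:
        run_request(client, "ping", decoupled=False)
    except RuntimeError as err:
        print("rejected:", err)
```

With the stub's model marked as decoupled, the streaming path returns a response while the unary path raises, which mirrors why the benchmark command needs --decoupled whenever the test enables decoupled mode on the server side.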