
[TRI-978][fix] Fix streaming=None crash and CI test failures in L0_backend_trtllm #13308

Open
mc-nv wants to merge 2 commits into NVIDIA:main from triton-inference-server:mchornyi/TRI-978/fix-test-nvidia-main
Conversation


@mc-nv mc-nv commented Apr 22, 2026

Summary

Fixes multiple CI failures in L0_backend_trtllm caused by the executor requiring an explicit streaming bool.

  • model.py: coerce streaming=None → False when get_input_scalar_by_name() returns None (mirrors [TRI-966] [fix] Fix L0_backend_trtllm #13276)
  • base_metrics_verification_tests.py: pass explicit streaming=False tensor; rename "streaming" → "stream" to match the ensemble model's declared external input
  • custom_metrics_verification_tests.py: fix malformed assertTrue chained comparison; use fromtimestamp instead of utcfromtimestamp to avoid 8-hour offset on UTC-offset runners
  • custom_metrics_reporter.cc: set tm_isdst=-1 before mktime() so DST is auto-determined, matching localtime() used in getCurrentTimestamp()
  • benchmark_core_model.py: pass explicit streaming tensor; switch to gRPC protocol to avoid gevent segfault on aarch64 during Python exit

Related: #13276

Summary by CodeRabbit

Release Notes

  • Bug Fixes

    • Improved streaming parameter handling with proper boolean normalization
    • Enhanced timestamp conversion accuracy for daylight saving time compatibility
  • Tests

    • Updated test infrastructure to include streaming parameter support in inference requests
    • Added explicit protocol specification to benchmarking tools

…tion_tests (#6)

* fix(test): pass explicit streaming=False in base_metrics_verification_tests

tensorrt_llm.bindings.executor.Request.__init__() no longer accepts
streaming=None; the test was omitting the streaming tensor, causing
get_input_scalar_by_name() to return None and the executor to crash.

Relates-to: TRI-978

* fix(trtllm): coerce streaming=None to False; fix ensemble input name in test

- model.py: guard against streaming=None from get_input_scalar_by_name
  by coercing to bool (None → False). Mirrors NVIDIA#13276.
- base_metrics_verification_tests.py: rename tensor "streaming" → "stream"
  to match the ensemble model's declared external input (ensemble maps
  "stream" → "streaming" internally; passing "streaming" directly caused
  a 400 rejection).

Relates-to: TRI-978
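
The None → False coercion described in this commit can be sketched as below. This is a minimal illustration only: `normalize_streaming` is a hypothetical helper name and is not taken from model.py; the only assumption carried over from the commit message is that a missing input arrives as None.

```python
from typing import Any


def normalize_streaming(value: Any) -> bool:
    """Hypothetical helper: turn an optional streaming input into a plain bool.

    A missing input (None) is treated as "not streaming"; anything else is
    converted with bool(), which also handles NumPy boolean scalars.
    """
    if value is None:
        return False
    return bool(value)
```

With a guard like this, downstream code that requires an explicit bool never sees None, regardless of whether the client sent the tensor.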

* fix(test): fix malformed assertTrue chained comparison in custom_metrics_verification_tests

The assertion was split by a comma, making the upper-bound check
(difference <= 1s) a message arg rather than part of the condition.
Only the lower bound (-1s <= difference) was actually verified, causing
failures on B200 SBSA where the log timestamp precedes the metrics
timestamp by more than 1s.

Relates-to: TRI-978
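
The comma pitfall described in this commit can be reproduced in isolation. The sketch below is not the test's actual code: `check_malformed`, `check_fixed`, and `TOLERANCE` are illustrative names, and the 1-second tolerance is taken from the commit message.

```python
import unittest
from datetime import timedelta

# A bare TestCase instance is enough to call assertion helpers directly.
_tc = unittest.TestCase()
TOLERANCE = timedelta(seconds=1)


def check_malformed(difference: timedelta) -> None:
    # The comma turns the upper-bound comparison into the `msg` argument,
    # so only the lower bound is actually asserted.
    _tc.assertTrue(-TOLERANCE <= difference, difference <= TOLERANCE)


def check_fixed(difference: timedelta) -> None:
    # A chained comparison checks both bounds; the second argument is a
    # real failure message.
    _tc.assertTrue(
        -TOLERANCE <= difference <= TOLERANCE,
        f"timestamps differ by {difference}",
    )
```

Calling `check_malformed(timedelta(hours=8))` passes silently, while `check_fixed(timedelta(hours=8))` raises AssertionError, which is the behavior the tolerance check is meant to have.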

* fix(test): use fromtimestamp instead of utcfromtimestamp in custom_metrics_verification_tests

The server writes log timestamps in local time, but dt_curl was computed
via utcfromtimestamp() (UTC). On B200 SBSA runners in UTC-8 this causes
an 8-hour difference, failing the ±1s tolerance check. Use fromtimestamp()
so both dt_log and dt_curl are in the same local timezone.

Relates-to: TRI-978
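
The 8-hour gap can be illustrated without depending on the machine's timezone. In this sketch the runner's zone is modeled as a fixed UTC-8 offset, and `naive_wall_clock`, `RUNNER_TZ`, and the epoch value are assumptions made for the example, not names from the test file. Passing `timezone.utc` stands in for `utcfromtimestamp()`, and passing the runner's zone stands in for `fromtimestamp()` on that runner.

```python
from datetime import datetime, timedelta, timezone


def naive_wall_clock(epoch_seconds: float, tz: timezone) -> datetime:
    """Return the naive wall-clock datetime that `tz` shows for an epoch time."""
    return datetime.fromtimestamp(epoch_seconds, tz=tz).replace(tzinfo=None)


RUNNER_TZ = timezone(timedelta(hours=-8))  # assumed UTC-8 runner
epoch = 1_700_000_000  # arbitrary example instant

dt_log = naive_wall_clock(epoch, RUNNER_TZ)         # server log: local time
dt_curl_utc = naive_wall_clock(epoch, timezone.utc)  # like utcfromtimestamp()
dt_curl_local = naive_wall_clock(epoch, RUNNER_TZ)   # like fromtimestamp()

utc_mismatch = dt_curl_utc - dt_log      # offset leaks into the comparison
local_mismatch = dt_curl_local - dt_log  # same zone on both sides
```

Because both values are naive, subtracting them compares wall-clock readings rather than instants, so mixing zones shows up as a constant offset far outside a ±1s tolerance.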

* chore: update copyright year to 2024-2026 in modified test files

Relates-to: TRI-978

* fix(metrics): set tm_isdst=-1 in convertTimestampToMicroseconds to fix DST timezone offset

std::get_time does not set tm_isdst, leaving it at 0 (zero-initialized).
When mktime() is called on a runner in a DST-observing timezone, it treats
the parsed local time as non-DST time, producing a UTC timestamp that is
off by 1 hour. Setting tm_isdst=-1 lets mktime determine DST automatically,
matching the behavior of localtime() used in getCurrentTimestamp().
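
The effect of the DST flag on `mktime()` can be explored from Python, whose `time.mktime` accepts the same flag as the ninth tuple field. This is a sketch of the idea rather than a port of the C++ change: `local_timestamp_to_epoch` is a hypothetical helper, and only the timestamp format string is taken from the C++ snippet in this PR.

```python
import time

TIMESTAMP_FORMAT = "%m-%d-%Y %H:%M:%S"  # format used by the C++ reporter


def local_timestamp_to_epoch(ts: str, isdst: int) -> float:
    """Convert a local-time string to epoch seconds with an explicit DST flag.

    isdst=0 asserts "standard time", isdst=1 asserts "daylight time", and
    isdst=-1 asks mktime to decide from the local timezone rules.
    """
    parsed = time.strptime(ts, TIMESTAMP_FORMAT)
    # Keep the first eight fields and override the DST flag (ninth field).
    fields = tuple(parsed[:8]) + (isdst,)
    return time.mktime(fields)
```

In a DST-observing zone, converting a summer timestamp with `isdst=0` and with `isdst=-1` typically yields results one hour apart, which corresponds to the 1-hour error the commit describes.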

* fix(bench): pass explicit streaming tensor in benchmark_core_model.py

TRT-LLM executor now requires streaming to be an explicit bool.
Without it, model.py receives streaming=None causing a TypeError crash.
Use FLAGS.decoupled to determine the streaming value (True for decoupled,
False for standard synchronous inference).

* fix(ci): switch benchmark_core_model to grpc to avoid gevent segfault on aarch64
@mc-nv mc-nv marked this pull request as ready for review April 22, 2026 01:47
@mc-nv mc-nv requested a review from a team as a code owner April 22, 2026 01:47
Contributor

coderabbitai Bot commented Apr 22, 2026

Walkthrough

These changes normalize streaming input handling across the model and test infrastructure by explicitly converting and validating streaming parameters as booleans, update test configurations to pass streaming inputs, improve timestamp timezone and daylight saving time interpretation, and add explicit protocol specification to benchmarking tool invocations.

Changes

| Cohort | File(s) | Summary |
|---|---|---|
| Streaming Input Normalization | triton_backend/all_models/inflight_batcher_llm/tensorrt_llm/1/model.py | convert_request() now converts streaming input to Python bool, with explicit False default for None values, normalizing the type before downstream control flow checks. |
| Test Infrastructure Updates | triton_backend/ci/L0_backend_trtllm/base_metrics_verification_tests.py, triton_backend/ci/L0_backend_trtllm/test.sh | Updated copyright year to 2024-2026, modified inference requests to include "stream" tensor input with boolean False value, and added explicit --protocol grpc flag to benchmark tool invocation. |
| Timestamp & DST Handling | triton_backend/ci/L0_backend_trtllm/custom_metrics_verification_tests.py, triton_backend/inflight_batcher_llm/src/custom_metrics_reporter/custom_metrics_reporter.cc | Changed timestamp conversion from UTC-based (utcfromtimestamp) to local time (fromtimestamp), updated assertion format with explicit error message, and set tm_isdst = -1 to enable automatic DST determination. |
| Benchmark Tool Enhancement | triton_backend/tools/inflight_batcher_llm/benchmark_core_model.py | Added streaming tensor input to inference requests in both warm-up and main benchmarking loops, mirroring the --decoupled flag value as a boolean tensor. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00%, which is below the required threshold of 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title clearly and specifically identifies the main issue (streaming=None crash) and the scope (CI test failures in L0_backend_trtllm), with proper JIRA ticket and fix type designation. |
| Description check | ✅ Passed | The PR description provides a clear summary of all changes with specific file-by-file breakdown, explains the rationale for each change, and relates it to the underlying issue. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


Contributor

@coderabbitai coderabbitai Bot left a comment


🧹 Nitpick comments (1)
triton_backend/inflight_batcher_llm/src/custom_metrics_reporter/custom_metrics_reporter.cc (1)

78-78: Use a named constant for the DST sentinel value.

tm.tm_isdst = -1 is functionally correct, but -1 is a magic literal in assignment context. Please use a named constant.

♻️ Proposed refactor
-    tm.tm_isdst = -1; // Let mktime determine DST to match localtime() used in getCurrentTimestamp
+    int const kAutoDetermineDst = -1;
+    tm.tm_isdst = kAutoDetermineDst; // Let mktime determine DST to match localtime() used in getCurrentTimestamp

As per coding guidelines "Except for 0, nullptr, true, and false, all other literal values in C++ should only be used for variable initialization; use named constants instead."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@triton_backend/inflight_batcher_llm/src/custom_metrics_reporter/custom_metrics_reporter.cc`
at line 78, replace the magic literal -1 used to set tm.tm_isdst with a named
constant (e.g., constexpr int kDstUnknown = -1) and use that constant in the
assignment inside convertTimestampToMicroseconds (the tm.tm_isdst = ... line).
Ensure the constant has a clear name and a short comment indicating that it
signals "let mktime determine DST", so the assignment reads
tm.tm_isdst = kDstUnknown.


📥 Commits

Reviewing files that changed from the base of the PR and between f689e60 and 0d5352f.

📒 Files selected for processing (6)
  • triton_backend/all_models/inflight_batcher_llm/tensorrt_llm/1/model.py
  • triton_backend/ci/L0_backend_trtllm/base_metrics_verification_tests.py
  • triton_backend/ci/L0_backend_trtllm/custom_metrics_verification_tests.py
  • triton_backend/ci/L0_backend_trtllm/test.sh
  • triton_backend/inflight_batcher_llm/src/custom_metrics_reporter/custom_metrics_reporter.cc
  • triton_backend/tools/inflight_batcher_llm/benchmark_core_model.py

std::tm tm = {};
std::stringstream ss(ts);
ss >> std::get_time(&tm, "%m-%d-%Y %H:%M:%S");
tm.tm_isdst = -1; // Let mktime determine DST to match localtime() used in getCurrentTimestamp
Collaborator


C++ guidelines require nontrivial literals to be named constants. Use a small function-scope constant, for example int const kAutoDetermineDst = -1; then assign that here.

utils.prepare_tensor("max_tokens", output0_len, "http"),
utils.prepare_tensor("bad_words", bad_words_list, "http"),
utils.prepare_tensor("stop_words", stop_words_list, "http"),
utils.prepare_tensor("stream", np.array([[False]], dtype=bool),
Collaborator


Should this be streaming?
