[TRI-978][fix] Fix streaming=None crash and CI test failures in L0_backend_trtllm#13308
[TRI-978][fix] Fix streaming=None crash and CI test failures in L0_backend_trtllm#13308mc-nv wants to merge 2 commits into
Conversation
…tion_tests (#6) * fix(test): pass explicit streaming=False in base_metrics_verification_tests tensorrt_llm.bindings.executor.Request.__init__() no longer accepts streaming=None; the test was omitting the streaming tensor, causing get_input_scalar_by_name() to return None and the executor to crash. Relates-to: TRI-978 * fix(trtllm): coerce streaming=None to False; fix ensemble input name in test - model.py: guard against streaming=None from get_input_scalar_by_name by coercing to bool (None → False). Mirrors NVIDIA#13276. - base_metrics_verification_tests.py: rename tensor "streaming" → "stream" to match the ensemble model's declared external input (ensemble maps "stream" → "streaming" internally; passing "streaming" directly caused a 400 rejection). Relates-to: TRI-978 * fix(test): fix malformed assertTrue chained comparison in custom_metrics_verification_tests The assertion was split by a comma, making the upper-bound check (difference <= 1s) a message arg rather than part of the condition. Only the lower bound (-1s <= difference) was actually verified, causing failures on B200 SBSA where the log timestamp precedes the metrics timestamp by more than 1s. Relates-to: TRI-978 * fix(test): use fromtimestamp instead of utcfromtimestamp in custom_metrics_verification_tests The server writes log timestamps in local time, but dt_curl was computed via utcfromtimestamp() (UTC). On B200 SBSA runners in UTC-8 this causes an 8-hour difference, failing the ±1s tolerance check. Use fromtimestamp() so both dt_log and dt_curl are in the same local timezone. Relates-to: TRI-978 * chore: update copyright year to 2024-2026 in modified test files Relates-to: TRI-978 * fix(metrics): set tm_isdst=-1 in convertTimestampToMicroseconds to fix DST timezone offset std::get_time does not set tm_isdst, leaving it at 0 (zero-initialized). When mktime() is called on a runner in a DST-observing timezone, it treats the parsed local time as non-DST time, producing a UTC timestamp that is off by 1 hour. Setting tm_isdst=-1 lets mktime determine DST automatically, matching the behavior of localtime() used in getCurrentTimestamp(). * fix(bench): pass explicit streaming tensor in benchmark_core_model.py TRT-LLM executor now requires streaming to be an explicit bool. Without it, model.py receives streaming=None causing a TypeError crash. Use FLAGS.decoupled to determine the streaming value (True for decoupled, False for standard synchronous inference). * fix(ci): switch benchmark_core_model to grpc to avoid gevent segfault on aarch64
📝 WalkthroughWalkthroughThese changes normalize streaming input handling across the model and test infrastructure by explicitly converting and validating streaming parameters as booleans, update test configurations to pass streaming inputs, improve timestamp timezone and daylight saving time interpretation, and add explicit protocol specification to benchmarking tool invocations. Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes 🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
🧹 Nitpick comments (1)
triton_backend/inflight_batcher_llm/src/custom_metrics_reporter/custom_metrics_reporter.cc (1)
78-78: Use a named constant for the DST sentinel value.
tm.tm_isdst = -1is functionally correct, but-1is a magic literal in assignment context. Please use a named constant.♻️ Proposed refactor
- tm.tm_isdst = -1; // Let mktime determine DST to match localtime() used in getCurrentTimestamp + int const kAutoDetermineDst = -1; + tm.tm_isdst = kAutoDetermineDst; // Let mktime determine DST to match localtime() used in getCurrentTimestampAs per coding guidelines "Except for
0,nullptr,true, andfalse, all other literal values in C++ should only be used for variable initialization; use named constants instead."🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@triton_backend/inflight_batcher_llm/src/custom_metrics_reporter/custom_metrics_reporter.cc` at line 78, Replace the magic literal -1 used to set tm.tm_isdst with a named constant (e.g., constexpr int kDstUnknown = -1) and use that constant in the assignment inside the getCurrentTimestamp / custom metrics timestamp construction (the tm.tm_isdst = ... line); ensure the constant has a clear name and a short comment indicating it signals "let mktime determine DST" so the assignment reads tm.tm_isdst = kDstUnknown.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In
`@triton_backend/inflight_batcher_llm/src/custom_metrics_reporter/custom_metrics_reporter.cc`:
- Line 78: Replace the magic literal -1 used to set tm.tm_isdst with a named
constant (e.g., constexpr int kDstUnknown = -1) and use that constant in the
assignment inside the getCurrentTimestamp / custom metrics timestamp
construction (the tm.tm_isdst = ... line); ensure the constant has a clear name
and a short comment indicating it signals "let mktime determine DST" so the
assignment reads tm.tm_isdst = kDstUnknown.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro Plus
Run ID: 15632072-155e-4898-8ac9-ac957a736f05
📒 Files selected for processing (6)
triton_backend/all_models/inflight_batcher_llm/tensorrt_llm/1/model.pytriton_backend/ci/L0_backend_trtllm/base_metrics_verification_tests.pytriton_backend/ci/L0_backend_trtllm/custom_metrics_verification_tests.pytriton_backend/ci/L0_backend_trtllm/test.shtriton_backend/inflight_batcher_llm/src/custom_metrics_reporter/custom_metrics_reporter.cctriton_backend/tools/inflight_batcher_llm/benchmark_core_model.py
| std::tm tm = {}; | ||
| std::stringstream ss(ts); | ||
| ss >> std::get_time(&tm, "%m-%d-%Y %H:%M:%S"); | ||
| tm.tm_isdst = -1; // Let mktime determine DST to match localtime() used in getCurrentTimestamp |
There was a problem hiding this comment.
C++ guidelines require nontrivial literals to be named constants. Use a small function-scope constant, for example int const kAutoDetermineDst = -1; then assign that here.
| utils.prepare_tensor("max_tokens", output0_len, "http"), | ||
| utils.prepare_tensor("bad_words", bad_words_list, "http"), | ||
| utils.prepare_tensor("stop_words", stop_words_list, "http"), | ||
| utils.prepare_tensor("stream", np.array([[False]], dtype=bool), |
There was a problem hiding this comment.
Should this be streaming?
There was a problem hiding this comment.
Summary
Fixes multiple CI failures in
L0_backend_trtllmcaused by the executor requiring an explicitstreamingbool.model.py: coercestreaming=None→Falsewhenget_input_scalar_by_name()returnsNone(mirrors [TRI-966] [fix] Fix L0_backend_trtllm #13276)base_metrics_verification_tests.py: pass explicitstreaming=Falsetensor; rename"streaming"→"stream"to match the ensemble model's declared external inputcustom_metrics_verification_tests.py: fix malformedassertTruechained comparison; usefromtimestampinstead ofutcfromtimestampto avoid 8-hour offset on UTC-offset runnerscustom_metrics_reporter.cc: settm_isdst=-1beforemktime()so DST is auto-determined, matchinglocaltime()used ingetCurrentTimestamp()benchmark_core_model.py: pass explicitstreamingtensor; switch to gRPC protocol to avoid gevent segfault on aarch64 during Python exitRelated: #13276
Summary by CodeRabbit
Release Notes
Bug Fixes
Tests