[TRTLLM-11319][feat] VisualGen public output API + bench timing decomposition by zhenhuaw-me · Pull Request #13635 · NVIDIA/TensorRT-LLM

zhenhuaw-me · 2026-04-30T03:03:07Z

Description

Reworks the public output story of tensorrt_llm.visual_gen so callers get a coherent, request-aware return type with measurable engine-side timing — and reworks the offline bench so its latency and engine generation metrics line up with what the serving path does.

Public output API

VisualGenOutput replaces the raw MediaOutput as the user-facing return type. Carries image/video/audio tensors plus request_id, error, frame_rate / audio_sample_rate, and engine-side VisualGenMetrics.
VisualGenResult is a single Future-like awaitable handle that resolves to VisualGenOutput (single prompt) or List[VisualGenOutput] (batch). Batched generate(List[str]) fans out into per-item outputs with per-item error semantics.
VisualGenOutput.save(path) is the single user-facing way to persist a generated tensor to disk (image PNG/JPG/WEBP, video MP4/AVI).
Public exceptions: dropped VisualGenError / VisualGenParamsError in favor of standard RuntimeError / ValueError / NotImplementedError. Aligns with vLLM/SGLang convention where diffusion engines raise standard exception types and callers catch by built-in type, not by package-specific wrappers.

Internal types and contracts

Internal MediaOutput renamed to PipelineOutput and gains three CUDA-event timing phases (pre_denoise / denoise / post_denoise) plus an executor-measured generation field on the wire response.
Tightened batched-shape contract: image is always (B, H, W, C), video (B, T, H, W, C), audio (B, channels, T_audio). Asserts in split_visual_gen_output make a violating pipeline fail loudly rather than silently corrupt per-item outputs by indexing along the wrong dim. LTX-2's decode_audio no longer unconditionally .squeeze(0)s, restoring the contract for B=1.

Module boundaries

Tensor-encoding code extracted from tensorrt_llm/serve/media_storage.py into a new tensorrt_llm/media/encoding.py module of free functions; MediaStorage keeps its file-storage role under serve/.
The eight video-generation route handlers in openai_server.py are extracted into a sibling mixin (tensorrt_llm/serve/openai_video_routes.py) so openai_server.py stays under the per-file line budget. Behavior is strictly preserved.

Engine reliability

Fixed a timeout-leak in DiffusionRemoteClient: previously, a request that timed out in aresult() could have its late-arriving response pinned in completed_responses for the process lifetime, leaking a full PipelineOutput (including video tensors) per timeout. New abandon_request_id flow uses an abandoned-id set so _store_response drops late responses and abandon_request_id pops cached ones; both orderings handled under the same async lock.

Server endpoints

/v1/images/edits short-circuits to HTTP 501 NotImplementedError. Reason: no in-tree pipeline implements image editing today (Flux/Flux2 are gen-only; Wan/LTX-2 produce video). The handler used to dispatch through the generator and 500 on a downstream None check; 501 is the honest answer until an edit-capable pipeline lands. Route stays registered so re-enabling is a one-line change.

Benchmark (`tensorrt_llm/bench/benchmark/visual_gen.py`)

Per-run persisted media dir: _run_benchmark wraps each run in one tempfile.TemporaryDirectory and saves each result into it. Mirrors the server's persisted-write pattern (no per-request mkstemp / unlink overhead in the timing window).
ffmpeg-optional fallback via resolve_video_format("auto"), matching how the server treats output_format="auto". The bench no longer hard-fails on hosts without ffmpeg.
Timing units: dropped _ms suffix and store wall-clock seconds end-to-end. CudaPhaseTimer.fill() divides Event.elapsed_time by 1000 once at the boundary so every downstream type carries seconds.
Renamed the engine-side metric from pipeline (a class-name leak) to generation (describes what the timer measures, not how). Threaded through VisualGenMetrics, DiffusionResponse, server log lines, bench per-request and aggregate types.
VisualGenBenchmarkMetrics gains mean / median / std / min / max / percentiles_generation alongside the existing latency aggregates. The latency >= generation gap is the encode + persist + IPC overhead the bench measures around the engine, and the report exposes that headroom.

API stability

YAML coverage for the VisualGen public surface is explicitly deferred until the API exits prototype status (all classes are @set_api_status("prototype")).

Test Coverage

tests/unittest/visual_gen/test_output.py — VisualGenOutput dataclass, VisualGenMetrics, to_visual_gen_output / split_visual_gen_output success and error paths, batch fan-out, VisualGenOutput.save() routing, VisualGenResult sync result(), async aresult(), __await__, timeout-leak fix (5 new tests covering abandon_request_id's two race orderings via a minimal DiffusionRemoteClient stub).
tests/unittest/_torch/visual_gen/test_visual_gen_params.py — request-validation surface migrated to plain ValueError.
tests/unittest/_torch/visual_gen/test_trtllm_serve_endpoints.py — FastAPI mock tests; TestImageEdit reduced to a 501 assertion plus the existing pydantic-validation 400 test.
tests/unittest/media/test_encoding.py — PNG round-trip, video_to_bytes AVI bytes, resolve_video_format dispatch, ffmpeg fallback paths.
tests/integration/defs/examples/test_visual_gen.py — Wan T2V and Flux end-to-end public-output contract checks (single + batch + async), including error-path semantics. Field-name assertions updated for the timing rename.
tests/integration/defs/visual_gen/test_visual_gen_benchmark.py — bench JSON-key assertions extended with mean_generation.

Summary by CodeRabbit

New Features
- Added performance metrics (pipeline latency, denoising timing) to generation outputs via VisualGenMetrics
- Introduced VisualGenOutput.save() method for easy output persistence
- Added OpenAI-compatible video generation endpoints
Improvements
- Refactored output handling for cleaner result objects with error tracking
- Consolidated media encoding functionality into dedicated utilities
Documentation
- Updated all visual generation examples to use new output API

PR Checklist

Please review the following before submitting your PR:

PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.
PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.
Test cases are provided for new code paths (see test instructions)
Any new dependencies have been scanned for license and vulnerabilities
CODEOWNERS updated if ownership changes
Documentation updated as needed
Update tava architecture diagram if there is a significant design change in PR.
The reviewers assigned automatically/manually are appropriate for the PR.
Please check this after reviewing the above items as appropriate for this PR.

GitHub Bot Help

To see a list of available CI bot commands, please comment /bot help.

zhenhuaw-me · 2026-04-30T04:08:25Z

/bot run --disable-fail-fast

coderabbitai · 2026-04-30T04:18:15Z

📝 Walkthrough

Walkthrough

The changes refactor the visual generation output architecture, replacing MediaOutput with a new internal PipelineOutput type enhanced with CUDA timing instrumentation via CudaPhaseTimer, introducing public VisualGenOutput and VisualGenMetrics types as the primary user-facing API, migrating media encoding logic from MediaStorage into a new dedicated tensorrt_llm.media.encoding module, updating all example scripts to use output.save(), extracting OpenAI video routes into a reusable _VideoRoutesMixin, and extending test coverage for the new output and encoding APIs.

Changes

Cohort / File(s)	Summary
Example Output Persistence `examples/visual_gen/models/wan_t2v.py`, `examples/visual_gen/quickstart_example.py`, `examples/visual_gen/visual_gen_flux.py`, `examples/visual_gen/visual_gen_ltx2.py`, `examples/visual_gen/visual_gen_wan_i2v.py`, `examples/visual_gen/visual_gen_wan_t2v.py`	Removed `MediaStorage` imports and replaced `MediaStorage.save_*()` calls with `output.save(args.output_path)` for all generated visual content (images and videos).
Public Output API `tensorrt_llm/visual_gen/output.py`	Added `VisualGenMetrics` and `VisualGenOutput` dataclasses as public result containers with timing metadata and a `save()` method for persisting generated media. Replaced `MediaOutput` with these new types.
Internal Pipeline Output `tensorrt_llm/_torch/visual_gen/output.py`	Introduced `PipelineOutput` internal dataclass replacing `MediaOutput` with frame rate and audio sample rate metadata. Added `CudaPhaseTimer` helper for measuring CUDA-event-based phase timings (pre-denoise, denoise, post-denoise). Added conversion utilities `to_visual_gen_output()` and `split_visual_gen_output()` for transforming responses into public output types.
Internal Output Exports `tensorrt_llm/_torch/visual_gen/__init__.py`, `tensorrt_llm/visual_gen/__init__.py`	Updated module exports: replaced `MediaOutput` with `PipelineOutput` and `VisualGenMetrics`/`VisualGenOutput` re-exports in `__all__` lists.
Main Package Exports `tensorrt_llm/__init__.py`	Added `VisualGenMetrics` and `VisualGenOutput` to root-level exports for public API access.
Pipeline Implementations `tensorrt_llm/_torch/visual_gen/models/flux/pipeline_flux.py`, `tensorrt_llm/_torch/visual_gen/models/flux/pipeline_flux2.py`, `tensorrt_llm/_torch/visual_gen/models/ltx2/pipeline_ltx2.py`, `tensorrt_llm/_torch/visual_gen/models/ltx2/pipeline_ltx2_two_stages.py`, `tensorrt_llm/_torch/visual_gen/models/wan/pipeline_wan.py`, `tensorrt_llm/_torch/visual_gen/models/wan/pipeline_wan_i2v.py`	Replaced `MediaOutput` return type with `PipelineOutput` wrapped in `CudaPhaseTimer`. Added CUDA phase markers (`mark_pre_start`, `mark_denoise_start`, `mark_post_start`, `mark_end`) around denoising and decoding stages to instrument per-phase latencies. Updated docstrings to reflect `PipelineOutput`.
Executor Response Type `tensorrt_llm/_torch/visual_gen/executor.py`	Updated `DiffusionResponse` dataclass: changed output field from `MediaOutput` to `PipelineOutput`, added `pipeline_ms` field. Instrumented request processing with host-side wall-clock timing around `pipeline.infer()` using `time.perf_counter()`.
Main VisualGen API `tensorrt_llm/visual_gen/visual_gen.py`	Updated `VisualGenResult` to be awaitable and resolve to `VisualGenOutput` or `List[VisualGenOutput]`. Changed `VisualGen.generate()` return type from `MediaOutput` to `Union[VisualGenOutput, List[VisualGenOutput]]`. Added input validation and batch-size tracking. Updated error semantics: single prompts raise on error, batch items return errors within output objects.
Media Encoding Module `tensorrt_llm/media/__init__.py`, `tensorrt_llm/media/encoding.py`	Created new `tensorrt_llm.media.encoding` module with public tensor-based encoding functions: `save_image()`, `image_to_bytes()`, `save_video()`, `video_to_bytes()`. Supports both ffmpeg-based MP4 encoding and pure-Python MJPEG/AVI fallback. Includes optional audio muxing and format resolution logic.
MediaStorage Reduction `tensorrt_llm/serve/media_storage.py`	Removed all media encoding/persistence methods (`save_image`, `convert_image_to_bytes`, `save_video`, `convert_video_to_bytes`, internal helpers, and ffmpeg handling). Left only documentation redirecting to new `tensorrt_llm.media.encoding` module.
OpenAI Server Video Routes `tensorrt_llm/serve/openai_server.py`, `tensorrt_llm/serve/openai_video_routes.py`	Created `_VideoRoutesMixin` providing eight OpenAI-compatible video endpoints (sync/async generation, job listing/retrieval/deletion, content serving, metadata access). Refactored `OpenAIServer` to inherit from mixin. Switched image encoding from `MediaStorage` to `tensorrt_llm.media.encoding`. Added end-to-end timing and metrics extraction from `VisualGenOutput.metrics`.
Benchmark Metrics `tensorrt_llm/bench/benchmark/visual_gen.py`, `tensorrt_llm/bench/benchmark/visual_gen_utils.py`	Added `_save_to_tempfile()` helper to include encoding latency in end-to-end measurements. Updated `VisualGenRequestOutput` with `pipeline_ms` and `denoise_ms` fields. Refactored result capture to extract metrics from `result.metrics`.
Test Files `tests/integration/defs/examples/test_visual_gen.py`, `tests/unittest/_torch/visual_gen/test_media_storage.py`, `tests/unittest/_torch/visual_gen/test_trtllm_serve_endpoints.py`, `tests/unittest/dynamo/test_imports.py`, `tests/unittest/media/test_encoding.py`, `tests/unittest/visual_gen/test_output.py`	Deleted obsolete `MediaStorage` unit tests. Added comprehensive new tests for `VisualGenOutput`, `VisualGenMetrics`, `PipelineOutput`, media encoding functions, and `VisualGenResult` awaitability. Updated endpoint tests to mock `VisualGenOutput` instead of `MediaOutput`. Updated import validation test to expect `PipelineOutput` export. Updated integration tests to use `output.save()` and validate output metadata.

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant VisualGen
    participant Executor
    participant Pipeline
    participant CudaPhaseTimer
    participant Encoder

    Client->>VisualGen: generate(inputs)
    VisualGen->>Executor: submit request (prompt, params)
    Executor->>Pipeline: create CudaPhaseTimer
    Pipeline->>CudaPhaseTimer: mark_pre_start()
    Pipeline->>Pipeline: initialization
    Pipeline->>CudaPhaseTimer: mark_denoise_start()
    Pipeline->>Pipeline: denoising loop
    Pipeline->>CudaPhaseTimer: mark_post_start()
    Pipeline->>Pipeline: decode video/image
    Pipeline->>CudaPhaseTimer: mark_end()
    CudaPhaseTimer->>Pipeline: fill(PipelineOutput)
    Pipeline->>Executor: return PipelineOutput with metrics
    Executor->>Executor: measure pipeline_ms (host-side)
    Executor->>VisualGen: DiffusionResponse(output=PipelineOutput, pipeline_ms)
    VisualGen->>VisualGen: convert to VisualGenOutput
    VisualGen->>Client: return VisualGenOutput
    Client->>Encoder: output.save(path)
    Encoder->>Encoder: encode image/video
    Encoder->>Encoder: write to file
    Encoder->>Client: return path

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 66.03% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Title check	✅ Passed	The title accurately summarizes the main change: introducing a public VisualGenOutput API and benchmark timing decomposition for the visual generation feature.
Description check	✅ Passed	PR description is comprehensive with clear sections on public API changes, internal types, module boundaries, engine reliability improvements, server endpoint changes, and benchmark updates.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

Tip

💬 Introducing Slack Agent: The best way for teams to turn conversations into code.

Slack Agent is built on CodeRabbit's deep understanding of your code, so your team can collaborate across the entire SDLC without losing context.

Generate code and open pull requests
Plan features and break down work
Investigate incidents and troubleshoot customer tickets together
Automate recurring tasks and respond to alerts with triggers
Summarize progress and report instantly

Built for teams:

Shared memory across your entire org—no repeating context
Per-thread sandboxes to safely plan and execute work
Governance built-in—scoped access, auditability, and budget controls

One agent for your entire SDLC. Right inside Slack.

👉 Get started

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 7

🧹 Nitpick comments (3)

tests/integration/defs/examples/test_visual_gen.py (1)
482-490: ⚡ Quick win

Assert audio_sample_rate on the two-stage LTX-2 path too.

The single-stage path checks that rate propagation worked, but this one doesn't. If two-stage output stops populating audio_sample_rate, output.save(...) can still pass by falling back to its default, so this regression would go uncaught.
Suggested test addition
         assert output.error is None
         assert output.video is not None
         assert output.frame_rate == LTX2_T2V_FRAME_RATE
+        assert output.audio_sample_rate is not None and output.audio_sample_rate > 0
         assert output.metrics is not None
         assert output.metrics.pipeline_ms > 0
         assert output.metrics.denoise_ms > 0
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/integration/defs/examples/test_visual_gen.py` around lines 482 - 490,
Add an assertion that the two-stage LTX-2 output populates audio_sample_rate:
after calling visual_gen.generate(...) and before output.save(...), assert that
output.audio_sample_rate equals the expected constant (e.g.
LTX2_T2V_AUDIO_SAMPLE_RATE) so that regressions where audio_sample_rate is
missing are caught; locate this in the same block that calls visual_gen.generate
and uses output.save to ensure parity with the single-stage test.
tests/unittest/media/test_encoding.py (2)
94-116: ⚡ Quick win

Make the AVI fallback tests deterministic.

These cases currently depend on the host having no ffmpeg. On builders where ffmpeg is present, save_video(..., format="avi") can take the ffmpeg path instead, so the test stops validating the pure-Python fallback it documents. Patch ffmpeg availability to False and reset the cached encoder in the test to pin the intended branch.

As per coding guidelines, "Coverage expectations: Assess whether new/changed tests cover happy path, important edge cases, and failure modes relevant to the feature or fix."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/unittest/media/test_encoding.py` around lines 94 - 116, Tests that
exercise the pure-Python AVI fallback must force the non-ffmpeg branch; in the
two tests calling video_to_bytes and save_video (symbols: video_to_bytes,
save_video, _dummy_video) use monkeypatch to set whatever ffmpeg-availability
flag/function your video module exposes to False (e.g., patch a function like
ffmpeg_available() to return False or set a module-level FFMPEG_AVAILABLE =
False), then clear/reset the module encoder cache before calling the functions
(remove or reinitialize the cached encoder object/attribute used by the
pure-Python path so the fallback is chosen, e.g., delattr(video_module,
"<cached_encoder_name>") or call the provided reset function) so the tests
deterministically exercise the pure-Python AVI encoder.
1-24: QA list updates look unnecessary here.

These additions stay in tests/unittest/, so I don't see a corresponding need for changes under tests/integration/test_lists/qa/ in this PR.

As per coding guidelines, "If the PR only touches unittest/ or narrow unit scope, say explicitly whether QA list updates are unnecessary or optional."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/unittest/media/test_encoding.py` around lines 1 - 24, This PR only
modifies the unit test for the module tensorrt_llm.media.encoding (the test that
exercises image_to_bytes/save_image/video_to_bytes/resolve_video_format), so
explicitly state in the PR description or a commit message that QA list updates
under the integration QA lists are unnecessary for this change; update the PR
text to say "QA list updates unnecessary — change limited to unit tests" (or
similar), and remove or revert any unrelated changes made to the integration QA
list if present.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@examples/visual_gen/visual_gen_ltx2.py`:
- Line 416: The CLI help text for the --output_path argument is out of sync with
the actual save call (output.save(args.output_path)); find the
parser.add_argument call that defines '--output_path' and update its help string
to accurately list the formats output.save supports (replace the old ".gif/.png
video outputs" text with the correct supported file types/containers), so the
help matches the behavior of output.save and args.output_path.

In `@tensorrt_llm/_torch/visual_gen/output.py`:
- Around line 148-149: The current truthiness check on resp.error_msg allows
empty-string errors to be treated as success and lead to dereferencing
resp.output; change both checks (the one around VisualGenOutput creation and the
other at lines handling resp.output) to explicitly test for None (e.g., if
resp.error_msg is not None) so any non-None error_msg — including empty string —
short-circuits and returns VisualGenOutput(request_id=resp.request_id,
error=resp.error_msg) instead of accessing resp.output.

In `@tensorrt_llm/bench/benchmark/visual_gen_utils.py`:
- Around line 36-47: The new timing fields pipeline_ms and denoise_ms on the
visual generation result are not being propagated into the JSON/console outputs;
update calculate_metrics to aggregate these per-request fields (e.g.,
sum/count/min/max/percentiles as appropriate) alongside e2e_latency and then
modify build_visual_gen_result_dict to include the aggregated pipeline_ms and
denoise_ms values in the returned dict and any per-request entries; locate uses
of VisualGenOutput.metrics and the result dataclass fields (pipeline_ms,
denoise_ms, success, e2e_latency) to read the engine-side timings and ensure
they are serialized into the final metrics JSON/console output.

In `@tensorrt_llm/serve/openai_video_routes.py`:
- Around line 98-103: The except blocks currently map all non-ValueError
exceptions to the default (client) error response; update each generic except
Exception handler (the ones that log traceback and call
create_error_response(str(e))) to return an internal server error (500) instead
of the default 400 by invoking create_error_response with an explicit 500 status
(or using an existing create_internal_error_response helper if present).
Concretely, in the handlers around the create_error_response usages (e.g.,
inside the functions that catch ValueError and Exception, and at the other noted
sites) keep the ValueError branch as-is but change the generic Exception branch
to logger.error(traceback.format_exc()) followed by return
self.create_error_response(str(e), status=500) (or equivalent API) so
server-side failures are classified as InternalServerError. Ensure you apply the
same change in the other occurrences mentioned (the other except Exception
blocks).
- Around line 196-202: video_gen_tasks is never cleaned up and delete_video()
does not cancel running jobs; update the flow so tasks are removed and cancelled
when finished or deleted: when creating the task for _generate_video_background,
attach an add_done_callback that removes the entry from self.video_gen_tasks
(and logs exceptions from task.result()); modify delete_video() to look up the
task in self.video_gen_tasks, if present cancel it (task.cancel()), await or
suppress asyncio.CancelledError as appropriate, and then delete the key so a
deleted job cannot keep running or leak memory; apply the same pattern to other
places that spawn background generation tasks referenced in the file.
- Around line 195-217: Persist the VideoJob to VIDEO_STORE before scheduling the
background task: construct the VideoJob (VideoJob(...)) and call await
VIDEO_STORE.upsert(video_id, video_job) first, then create and assign the
background task to self.video_gen_tasks[video_id] which calls
self._generate_video_background(...); this ensures _generate_video_background
can always find the job by id (video_id) even if it completes quickly. Return
the JSONResponse after the upsert and task scheduling.

In `@tensorrt_llm/visual_gen/visual_gen.py`:
- Around line 599-609: The single-request timeout path currently sets
self._finished before populating self._resolved, so the first raise
VisualGenError is not reflected for subsequent aresult()/result() calls; change
both timeout branches (the block handling response is None and the similar block
at the other location) to first set self._resolved =
[VisualGenOutput(request_id=self.request_id, error="Generation timed out")] (or
a single VisualGenOutput for non-batch), then set self._finished = True, and
finally raise VisualGenError("Generation timed out") so that later calls return
the stored error state; update references in the response handling logic that
use self._batch_size, self._resolved, self._finished, VisualGenOutput, and
VisualGenError.

---

Nitpick comments:
In `@tests/integration/defs/examples/test_visual_gen.py`:
- Around line 482-490: Add an assertion that the two-stage LTX-2 output
populates audio_sample_rate: after calling visual_gen.generate(...) and before
output.save(...), assert that output.audio_sample_rate equals the expected
constant (e.g. LTX2_T2V_AUDIO_SAMPLE_RATE) so that regressions where
audio_sample_rate is missing are caught; locate this in the same block that
calls visual_gen.generate and uses output.save to ensure parity with the
single-stage test.

In `@tests/unittest/media/test_encoding.py`:
- Around line 94-116: Tests that exercise the pure-Python AVI fallback must
force the non-ffmpeg branch; in the two tests calling video_to_bytes and
save_video (symbols: video_to_bytes, save_video, _dummy_video) use monkeypatch
to set whatever ffmpeg-availability flag/function your video module exposes to
False (e.g., patch a function like ffmpeg_available() to return False or set a
module-level FFMPEG_AVAILABLE = False), then clear/reset the module encoder
cache before calling the functions (remove or reinitialize the cached encoder
object/attribute used by the pure-Python path so the fallback is chosen, e.g.,
delattr(video_module, "<cached_encoder_name>") or call the provided reset
function) so the tests deterministically exercise the pure-Python AVI encoder.
- Around line 1-24: This PR only modifies the unit test for the module
tensorrt_llm.media.encoding (the test that exercises
image_to_bytes/save_image/video_to_bytes/resolve_video_format), so explicitly
state in the PR description or a commit message that QA list updates under the
integration QA lists are unnecessary for this change; update the PR text to say
"QA list updates unnecessary — change limited to unit tests" (or similar), and
remove or revert any unrelated changes made to the integration QA list if
present.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 34454c9b-615c-41d0-8b29-1ba1656ba7fb

📥 Commits

Reviewing files that changed from the base of the PR and between 37fc0e3 and 13a9b96.

📒 Files selected for processing (34)

examples/visual_gen/models/wan_t2v.py
examples/visual_gen/quickstart_example.py
examples/visual_gen/visual_gen_flux.py
examples/visual_gen/visual_gen_ltx2.py
examples/visual_gen/visual_gen_wan_i2v.py
examples/visual_gen/visual_gen_wan_t2v.py
tensorrt_llm/__init__.py
tensorrt_llm/_torch/visual_gen/__init__.py
tensorrt_llm/_torch/visual_gen/executor.py
tensorrt_llm/_torch/visual_gen/models/flux/pipeline_flux.py
tensorrt_llm/_torch/visual_gen/models/flux/pipeline_flux2.py
tensorrt_llm/_torch/visual_gen/models/ltx2/pipeline_ltx2.py
tensorrt_llm/_torch/visual_gen/models/ltx2/pipeline_ltx2_two_stages.py
tensorrt_llm/_torch/visual_gen/models/wan/pipeline_wan.py
tensorrt_llm/_torch/visual_gen/models/wan/pipeline_wan_i2v.py
tensorrt_llm/_torch/visual_gen/output.py
tensorrt_llm/bench/benchmark/visual_gen.py
tensorrt_llm/bench/benchmark/visual_gen_utils.py
tensorrt_llm/media/__init__.py
tensorrt_llm/media/encoding.py
tensorrt_llm/serve/media_storage.py
tensorrt_llm/serve/openai_server.py
tensorrt_llm/serve/openai_video_routes.py
tensorrt_llm/visual_gen/__init__.py
tensorrt_llm/visual_gen/output.py
tensorrt_llm/visual_gen/visual_gen.py
tests/integration/defs/examples/test_visual_gen.py
tests/unittest/_torch/visual_gen/test_media_storage.py
tests/unittest/_torch/visual_gen/test_trtllm_serve_endpoints.py
tests/unittest/dynamo/test_imports.py
tests/unittest/media/__init__.py
tests/unittest/media/test_encoding.py
tests/unittest/visual_gen/__init__.py
tests/unittest/visual_gen/test_output.py

💤 Files with no reviewable changes (1)

tests/unittest/_torch/visual_gen/test_media_storage.py

- DiffusionResponse.error_msg checks switch from truthy to is-not-None in to_visual_gen_output and split_visual_gen_output so an empty-string error message still routes through the failure branch instead of dereferencing resp.output. - VisualGenResult.aresult timeout path now persists the resolved error state on self._resolved before flipping self._finished, so subsequent aresult/result calls replay the same VisualGenError instead of silently returning None via the fast path. Batch and single-prompt branches both route through _resolved_value. - Video routes (sync gen, async gen, list, get-metadata, get-content, delete) now classify generic exceptions as InternalServerError/500 instead of leaving them at the create_error_response default of 400, while ValueError stays at 400. test_sync_video_failure updated to match. - openai_video_generation_async persists the queued VideoJob to VIDEO_STORE before scheduling the background task, closing the race where a fast-completing task could fail to find the job and leave it stuck at queued. - Background video tasks now register a done callback that drops the entry from self.video_gen_tasks (with an is-identity guard) and logs unexpected exceptions, bounding memory growth. delete_video cancels any in-flight task before removing the file and store entry so a deleted job cannot recreate output afterwards. Signed-off-by: Zhenhua Wang <zhenhuaw@nvidia.com>

zhenhuaw-me · 2026-04-30T06:59:08Z

/bot run --disable-fail-fast

tensorrt-cicd · 2026-04-30T07:06:52Z

PR_Github #46326 [ run ] triggered by Bot. Commit: a3cf840 Link to invocation

tensorrt-cicd · 2026-05-01T02:57:53Z

PR_Github #46326 [ run ] completed with state SUCCESS. Commit: a3cf840
/LLM/main/L0_MergeRequest_PR pipeline #36422 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

Please check the failed tests and fix your PR
If you cannot view the failures, ask the CI triggerer to share details
Once fixed, request an NVIDIA team member to trigger CI again

CI Agent Failure Analysis

Link to invocation

tensorrt-cicd · 2026-05-06T17:08:16Z

PR_Github #46925 [ run ] completed with state SUCCESS. Commit: 45f1b61
/LLM/main/L0_MergeRequest_PR pipeline #36931 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

Please check the failed tests and fix your PR
If you cannot view the failures, ask the CI triggerer to share details
Once fixed, request an NVIDIA team member to trigger CI again

CI Agent Failure Analysis

Link to invocation

Fix the duplicate-timeout bug in VisualGenResult.result() that QiJune flagged: the outer future.result(timeout=timeout) could raise concurrent.futures.TimeoutError before the inner aresult timeout branch finished cleanup, leaking late-arriving responses into completed_responses. VisualGenResult.result() - Drop outer timeout argument; only the inner aresult(timeout=timeout) enforces the wait. The inner branch already does abandon_request_id cleanup and resolves to an error output before returning. - Add a comment naming the constraint so the asymmetry is not lost on future readers. Verified - 37 / 37 unit tests in tests/unittest/visual_gen/test_output.py pass, including the existing test_aresult_timeout_invokes_abandon_request_id pair that covers the cleanup contract. Signed-off-by: Zhenhua Wang <zhenhuaw@nvidia.com>

…ike handle, batch fan-out, encoding extraction Replaces the raw MediaOutput return type with a request-aware public wrapper VisualGenOutput carrying image/video/audio tensors plus request_id, error, frame_rate/audio_sample_rate, and engine-side VisualGenMetrics. Promotes VisualGenResult to a single Future-like awaitable handle that resolves to VisualGenOutput (single prompt) or List[VisualGenOutput] (batch). Fans out batched generate(List[str]) into per-item outputs with per-item error semantics. Adds VisualGenOutput.save(path) as the single user-facing way to write generated media to disk. Extracts tensor-encoding code out of tensorrt_llm/serve/media_storage.py into a new tensorrt_llm/media/encoding.py module of free functions, while keeping MediaStorage in serve/ for its file-storage role. Internally renames MediaOutput -> PipelineOutput with three CUDA-event timing phases (pre_denoise_ms / denoise_ms / post_denoise_ms) plus an executor-measured pipeline_ms on the wire response. Updates every in-tree caller (examples, bench, serve, tests). The eight video-generation route handlers in openai_server.py are extracted into a sibling _VideoRoutesMixin (tensorrt_llm/serve/openai_video_routes.py) so openai_server.py stays under the per-file line budget. Behavior is strictly preserved: mixin methods reach back through self for instance state. API-stability YAML coverage for the VisualGen surface is explicitly deferred until the public API exits prototype status. Signed-off-by: Zhenhua Wang <zhenhuaw@nvidia.com>

- DiffusionResponse.error_msg checks switch from truthy to is-not-None in to_visual_gen_output and split_visual_gen_output so an empty-string error message still routes through the failure branch instead of dereferencing resp.output. - VisualGenResult.aresult timeout path now persists the resolved error state on self._resolved before flipping self._finished, so subsequent aresult/result calls replay the same VisualGenError instead of silently returning None via the fast path. Batch and single-prompt branches both route through _resolved_value. - Video routes (sync gen, async gen, list, get-metadata, get-content, delete) now classify generic exceptions as InternalServerError/500 instead of leaving them at the create_error_response default of 400, while ValueError stays at 400. test_sync_video_failure updated to match. - openai_video_generation_async persists the queued VideoJob to VIDEO_STORE before scheduling the background task, closing the race where a fast-completing task could fail to find the job and leave it stuck at queued. - Background video tasks now register a done callback that drops the entry from self.video_gen_tasks (with an is-identity guard) and logs unexpected exceptions, bounding memory growth. delete_video cancels any in-flight task before removing the file and store entry so a deleted job cannot recreate output afterwards. Signed-off-by: Zhenhua Wang <zhenhuaw@nvidia.com>

- Drop VisualGenError and VisualGenParamsError in favor of standard ValueError / RuntimeError / NotImplementedError, matching vLLM/SGLang conventions. Removed exports from tensorrt_llm/__init__.py and tensorrt_llm/visual_gen/__init__.py. Save errors map to: errored output -> RuntimeError, missing frame_rate / no media -> ValueError, audio-only -> NotImplementedError. _validate_request raises plain ValueError. Tests updated to match. - Fix timeout leak in DiffusionRemoteClient. The aresult timeout branch now schedules abandon_request_id on the executor's event loop. New abandon flow uses an _abandoned_request_ids set so _store_response drops late responses for abandoned ids and abandon_request_id pops responses that arrived between the await timeout and the abandon call; both orderings handled under the same async lock. Added five unit tests covering the abandon contract. - Tighten PipelineOutput shape contract to always-batched: image (B, H, W, C), video (B, T, H, W, C), audio (B, channels, T_audio). split_visual_gen_output now asserts the leading dim for all three modalities. Removed the unconditional .squeeze(0) in decode_audio that produced shape-inconsistent audio for B=1 vs B>1. - Image-edit endpoint /v1/images/edits now short-circuits to 501 NotImplemented because no in-tree pipeline implements image edit (Flux/Flux2 are gen-only, Wan/LTX-2 produce video). Trimmed TestImageEdit happy-path tests to a single 501 assertion plus the pre-existing pydantic-validation 400 test. - Replace lazy-import comment in visual_gen/output.py: the import-cycle reason is gone with VisualGenError, only the encoding-stack lazy rationale remains. Signed-off-by: Zhenhua Wang <zhenhuaw@nvidia.com>

…seconds Bench Option A: align the offline VisualGen bench with what the serving path actually does inside its e2e timing window, and rework the timing field names so they mean what they say. Bench encoding/persist scope (chang-l review feedback #2) - _run_benchmark wraps each run in a single tempfile.TemporaryDirectory and threads media_dir into both _run_sequential and _run_concurrent. - The per-request save now goes to media_dir/{img|vid}_{idx}{ext}; no per-request mkstemp/unlink. The directory is wiped wholesale when the context exits, mirroring the server's persisted-write pattern. - Video extension comes from resolve_video_format("auto") so the bench falls back to .avi (pure-Python) on hosts without ffmpeg, matching how the server handles output_format="auto" instead of hard-failing. Timing-metric rename (drop _ms suffix, store seconds) - VisualGenMetrics: pipeline_ms / pre_denoise_ms / denoise_ms / post_denoise_ms -> pipeline / pre_denoise / denoise / post_denoise. Wall-clock seconds throughout. - PipelineOutput: same rename. CudaPhaseTimer.fill() divides Event.elapsed_time by 1000 once at the boundary so every downstream type carries seconds. - DiffusionResponse.pipeline_ms -> pipeline; executor populates perf_counter() delta directly (no * 1000.0). - VisualGenRequestOutput (bench): e2e_latency / pipeline_ms / denoise_ms -> latency / pipeline / denoise. - VisualGenBenchmarkMetrics: *_e2e_latency_ms -> *_latency. calculate_metrics no longer rescales to ms. - print_visual_gen_results shows "Latency (s)" with :.4f precision and drops the "E2E" qualifier. - build_visual_gen_result_dict JSON keys: mean_e2e_latency_ms -> mean_latency, e2e_latencies -> latencies, etc. - Server log lines (image-gen, sync video, async video): e2e_ms / pipeline_ms / denoise_ms -> latency / pipeline / denoise with :.3f seconds formatting. - Tests in unittest/visual_gen/test_output.py and the integration tests under tests/integration/defs/{examples,visual_gen}/ updated to match. 192 / 192 unit tests passing. Signed-off-by: Zhenhua Wang <zhenhuaw@nvidia.com>

…gate to bench Decouple the public timing metric name from the internal class/method ("pipeline.infer()") and surface the engine-side floor in the benchmark report alongside the externally observed latency. Rename - VisualGenMetrics.pipeline -> generation. The new name describes what the timer measures from the user's perspective (time spent producing the output), not which code path we measured. Field-level docstrings added so per-field semantics sit next to each declaration; the class docstring keeps the cross-cutting CUDA-event methodology note. - DiffusionResponse.pipeline -> generation; executor populator uses generation_start / generation locals. - VisualGenRequestOutput.pipeline -> generation (bench per-request). - to_visual_gen_output and split_visual_gen_output populate the renamed field. - Server log lines (image-gen, sync video, async video): pipeline=... -> generation=... Bench aggregate - VisualGenBenchmarkMetrics gains mean_generation / median_generation / std_generation / min_generation / max_generation / percentiles_generation alongside the existing latency aggregates. The relationship latency >= generation should hold per request; the gap is the encode + persist + IPC overhead the bench measures around the engine, and the report exposes that headroom. - calculate_metrics collects only non-zero generation samples so backends that don't supply per-phase metrics don't pull the mean toward zero and obscure the latency comparison. - print_visual_gen_results adds a "Generation" section mirroring the "Latency" section. - build_visual_gen_result_dict adds the corresponding JSON keys plus a per-request "generations" array. Tests - tests/unittest/visual_gen/test_output.py: field-name assertions, kwarg arguments, attribute accesses, and the function name test_sub_phase_sum_bounded_by_pipeline -> ..._by_generation updated. - tests/integration/defs/examples/test_visual_gen.py: output.metrics.pipeline -> output.metrics.generation. - tests/integration/defs/visual_gen/test_visual_gen_benchmark.py: added mean_generation JSON-key assertions next to mean_latency. 192 / 192 unit tests passing. Signed-off-by: Zhenhua Wang <zhenhuaw@nvidia.com>

…elapsed_time Without the sync, cudaEventElapsedTime raises cudaErrorNotReady whenever the pipeline returns the output tensor while it is still resident on the GPU — the common case at fill() time. Caught by the e2e Flux text-to-image tests, which all returned 400 with "Both events must be completed before calculating elapsed time." Syncing on _end also covers the three earlier events because they record on the same default stream. Updated the corresponding methodology docstrings on both the public VisualGenMetrics and the internal CudaPhaseTimer to match the actual behavior. Signed-off-by: Zhenhua Wang <zhenhuaw@nvidia.com>

@JunyiXu-nv

Round 3 of @JunyiXu-nv's review on PR NVIDIA#13635. Drop empty MediaStorage stub - Remove tensorrt_llm/serve/media_storage.py: encoding moved to tensorrt_llm/media/encoding.py earlier in this PR, leaving only an empty MediaStorage class. No production code referenced it. - Drop the two regression tests in tests/unittest/visual_gen/test_output.py that pinned the class to "no encoding methods" — moot once the file is gone. Switch route call sites to VisualGenOutput.save - openai_video_routes.py (sync + background) and openai_server.py URL-mode image now call output.save(...) instead of save_video/save_image directly. This exercises the public API the PR added and keeps the audio/sample-rate default chain inside one place. Fix stale test-list entry - tests/integration/test_lists/test-db/l0_a10.yml referenced the deleted unittest/_torch/visual_gen/test_media_storage.py. Replace with the two test files this PR adds: unittest/visual_gen/test_output.py and unittest/media/test_encoding.py. Test: - tests/unittest/media/test_encoding.py + tests/unittest/visual_gen/test_output.py: 51 / 51 passing. Signed-off-by: Zhenhua Wang <zhenhuaw@nvidia.com>

Fix the duplicate-timeout bug in VisualGenResult.result() that QiJune flagged: the outer future.result(timeout=timeout) could raise concurrent.futures.TimeoutError before the inner aresult timeout branch finished cleanup, leaking late-arriving responses into completed_responses. VisualGenResult.result() - Drop outer timeout argument; only the inner aresult(timeout=timeout) enforces the wait. The inner branch already does abandon_request_id cleanup and resolves to an error output before returning. - Add a comment naming the constraint so the asymmetry is not lost on future readers. Verified - 37 / 37 unit tests in tests/unittest/visual_gen/test_output.py pass, including the existing test_aresult_timeout_invokes_abandon_request_id pair that covers the cleanup contract. Signed-off-by: Zhenhua Wang <zhenhuaw@nvidia.com>

zhenhuaw-me · 2026-05-07T02:50:31Z

/bot run --disable-fail-fast

tensorrt-cicd · 2026-05-07T02:55:56Z

PR_Github #47085 [ run ] triggered by Bot. Commit: e148156 Link to invocation

QiJune

LGTM

tensorrt-cicd · 2026-05-07T12:11:58Z

PR_Github #47085 [ run ] completed with state SUCCESS. Commit: e148156
/LLM/main/L0_MergeRequest_PR pipeline #37056 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

Please check the failed tests and fix your PR
If you cannot view the failures, ask the CI triggerer to share details
Once fixed, request an NVIDIA team member to trigger CI again

CI Agent Failure Analysis

Link to invocation

…_server refactor This PR's refactor moved save_image out of tensorrt_llm.serve.openai_server into tensorrt_llm.media.encoding. The two TestImageGeneration regression guards in test_trtllm_serve_endpoints.py were still patching "tensorrt_llm.serve.openai_server.save_image", so unittest.mock.patch raised AttributeError on entry, failing both test_image_generation_b64_no_save_image_no_disk_write and test_image_generation_b64_with_4d_batch_pipeline_output. Patch the source module instead. This also tightens the regression guard: it catches a future caller that pulls save_image in via output.save() (the URL path), not just one re-importing it back into openai_server's namespace. Signed-off-by: Zhenhua Wang <zhenhuaw@nvidia.com>

zhenhuaw-me · 2026-05-07T14:05:07Z

/bot run --disable-fail-fast

tensorrt-cicd · 2026-05-07T14:11:53Z

PR_Github #47218 [ run ] triggered by Bot. Commit: 2499db9 Link to invocation

brb-nv

LGTM.

tensorrt-cicd · 2026-05-08T03:38:23Z

PR_Github #47218 [ run ] completed with state SUCCESS. Commit: 2499db9
/LLM/main/L0_MergeRequest_PR pipeline #37174 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

Please check the failed tests and fix your PR
If you cannot view the failures, ask the CI triggerer to share details
Once fixed, request an NVIDIA team member to trigger CI again

CI Agent Failure Analysis

Link to invocation

zhenhuaw-me · 2026-05-08T04:26:13Z

/bot run --disable-fail-fast

tensorrt-cicd · 2026-05-08T04:31:45Z

PR_Github #47311 [ run ] triggered by Bot. Commit: 2499db9 Link to invocation

tensorrt-cicd · 2026-05-08T06:32:41Z

PR_Github #47311 [ run ] completed with state SUCCESS. Commit: 2499db9
/LLM/main/L0_MergeRequest_PR pipeline #37252 completed with status: 'SUCCESS'

CI Report

Link to invocation

Port the batch-inference support from the original PR onto the new media/encoding + openai_video_routes layout introduced by NVIDIA#13635: - tensorrt_llm/media/encoding.py: add save_images() and save_videos() free functions plus a shared _resolve_batch_paths() helper. Both accept either a path prefix or an explicit List[str] of per-item paths. - tensorrt_llm/serve/openai_video_routes.py: sync and async video endpoints now call save_videos(). The async background task records every saved path on VideoJob.output_paths; delete_video() removes all of them. The sync endpoint still returns only the first file (OpenAI Videos API has no multi-file response yet). - tensorrt_llm/serve/openai_protocol.py: add VideoJob.output_paths for the multi-output case. - tensorrt_llm/serve/visual_gen_utils.py: map VideoGenerationRequest.n to VisualGenParams.num_images_per_prompt (already done for image requests on main). - tests: cover save_images/save_videos in tests/unittest/media/test_encoding.py and add n=2 batch tests for sync and async video endpoints. Signed-off-by: JunyiXu-nv <219237550+JunyiXu-nv@users.noreply.github.com>

…position (NVIDIA#13635) Signed-off-by: Zhenhua Wang <zhenhuaw@nvidia.com>

zhenhuaw-me self-assigned this Apr 30, 2026

zhenhuaw-me added the VisualGen label Apr 30, 2026

zhenhuaw-me commented Apr 30, 2026

View reviewed changes

zhenhuaw-me force-pushed the update-api-step4 branch 2 times, most recently from 3d639b8 to 96c319e Compare April 30, 2026 03:58

zhenhuaw-me commented Apr 30, 2026

View reviewed changes

Comment thread tensorrt_llm/visual_gen/output.py Outdated

zhenhuaw-me force-pushed the update-api-step4 branch from 96c319e to 13a9b96 Compare April 30, 2026 04:06

zhenhuaw-me marked this pull request as ready for review April 30, 2026 04:07

zhenhuaw-me requested review from a team as code owners April 30, 2026 04:07

zhenhuaw-me requested review from FrankD412, JunyiXu-nv, chang-l, kaiyux, laikhtewari, mikeiovine, moraxu and tijyojwad April 30, 2026 04:07

coderabbitai Bot reviewed Apr 30, 2026

View reviewed changes

chang-l reviewed May 5, 2026

View reviewed changes

zhenhuaw-me added 8 commits May 7, 2026 02:49

zhenhuaw-me force-pushed the update-api-step4 branch from 44c14b0 to e148156 Compare May 7, 2026 02:50

QiJune approved these changes May 7, 2026

View reviewed changes

kaiyux approved these changes May 7, 2026

View reviewed changes

brb-nv approved these changes May 7, 2026

View reviewed changes

zhenhuaw-me merged commit 8e07cea into NVIDIA:main May 8, 2026
6 checks passed

zhenhuaw-me deleted the update-api-step4 branch May 8, 2026 08:17

yufeiwu-nv pushed a commit to yufeiwu-nv/TensorRT-LLM that referenced this pull request May 19, 2026

[TRTLLM-11319][feat] VisualGen public output API + bench timing decom…

ec70ef4

…position (NVIDIA#13635) Signed-off-by: Zhenhua Wang <zhenhuaw@nvidia.com>

Conversation

zhenhuaw-me commented Apr 30, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Public output API

Internal types and contracts

Module boundaries

Engine reliability

Server endpoints

Benchmark (tensorrt_llm/bench/benchmark/visual_gen.py)

API stability

Test Coverage

Summary by CodeRabbit

PR Checklist

GitHub Bot Help

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

zhenhuaw-me commented Apr 30, 2026

Uh oh!

coderabbitai Bot commented Apr 30, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

zhenhuaw-me commented Apr 30, 2026

Uh oh!

tensorrt-cicd commented Apr 30, 2026

Uh oh!

tensorrt-cicd commented May 1, 2026

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

tensorrt-cicd commented May 6, 2026

Uh oh!

zhenhuaw-me commented May 7, 2026

Uh oh!

tensorrt-cicd commented May 7, 2026

Uh oh!

QiJune left a comment

Choose a reason for hiding this comment

Uh oh!

tensorrt-cicd commented May 7, 2026

Uh oh!

zhenhuaw-me commented May 7, 2026

Uh oh!

tensorrt-cicd commented May 7, 2026

Uh oh!

brb-nv left a comment

Choose a reason for hiding this comment

Uh oh!

tensorrt-cicd commented May 8, 2026

zhenhuaw-me commented Apr 30, 2026 •

edited

Loading

Benchmark (`tensorrt_llm/bench/benchmark/visual_gen.py`)

coderabbitai Bot commented Apr 30, 2026 •

edited

Loading