[TRTLLM-13015][feat] drop complex visual_gen CLI example scripts by zhenhuaw-me · Pull Request #14632 · NVIDIA/TensorRT-LLM

zhenhuaw-me · 2026-05-27T13:43:03Z

Description

The per-model CLI wrappers under examples/visual_gen/ (visual_gen_flux.py, visual_gen_ltx2.py, visual_gen_wan_t2v.py, visual_gen_wan_i2v.py, and the multi-node sbatch shell script) had grown into hundreds of lines of argparse and dispatch logic that are better kept as local dev scripts than maintained as in-tree examples. This PR removes them. Coverage of the VisualGen API is preserved by the slim models/wan_t2v.py example and the trtllm-serve flows under serve/.

Knock-on changes:

examples/visual_gen/models/wan_t2v.py: hardcode the integration-test resolution and prompt so it doubles as the smoke fixture for test_vbench_dimension_score_wan*.
tests/integration/defs/examples/test_visual_gen.py: port _generate_wan_video to invoke models/wan_t2v.py with an on-the-fly VisualGenArgs YAML (cfg_size picked from torch.cuda.device_count()); drop stale docstring references to the removed scripts.
examples/visual_gen/README.md and docs/source/models/visual-generation.md: rewrite as if the CLI wrappers never existed; switch CLI-flag references in the optimization sections to the equivalent VisualGenArgs YAML keys.
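
For readers who have not opened the diff, the on-the-fly YAML idea in the test change can be sketched roughly as follows. This is an illustration only, not the code in the PR: the helper names `pick_cfg_size` and `build_visual_gen_args_yaml` are hypothetical, the rule "2-way CFG when at least two GPUs are visible" is an assumption, and nesting `cfg_size` under `parallel_config` is inferred from the review summary rather than confirmed. The CUDA graph setting mentioned in the review is left out because its key name is not shown in this thread.

```python
import textwrap


def pick_cfg_size(gpu_count: int) -> int:
    """Hypothetical rule: use 2-way CFG parallelism when at least two GPUs are visible."""
    return 2 if gpu_count >= 2 else 1


def build_visual_gen_args_yaml(gpu_count: int) -> str:
    """Render a minimal VisualGenArgs-style YAML document as a string.

    The layout (cfg_size nested under parallel_config) is an assumption.
    """
    cfg_size = pick_cfg_size(gpu_count)
    return textwrap.dedent(f"""\
        parallel_config:
          cfg_size: {cfg_size}
        """)


if __name__ == "__main__":
    # In the real test the count would come from torch.cuda.device_count();
    # a literal is used here so the sketch runs without a GPU or torch.
    print(build_visual_gen_args_yaml(gpu_count=4))
```

Passing the GPU count in as a parameter keeps the YAML rendering testable on machines without CUDA.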

Net diff: −2293 lines.

Test Coverage

tests/integration/defs/examples/test_visual_gen.py::test_wan_t2v_example exercises models/wan_t2v.py end-to-end with the checked-in configs/wan2.2-t2v-fp4-1gpu.yaml (Wan 2.2 A14B NVFP4). Verified locally on GB200 (PASSED, 679s).
The three test_vbench_dimension_score_wan* tests already cover the ported _generate_wan_video invocation pattern in CI.
LPIPS tests (test_*_lpips_against_golden) and LTX-2 VBench tests are untouched — they use the Python API directly.
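
As a rough sketch of the invocation pattern these tests rely on, the command line could be assembled like this. Only the script path and the `--visual_gen_args` flag come from this thread; the helper name `build_wan_t2v_command` is hypothetical, and any other arguments the script may require are not shown.

```python
import sys
from pathlib import Path


def build_wan_t2v_command(example_root: str, args_yaml: str) -> list[str]:
    """Build the argv for running models/wan_t2v.py with a VisualGenArgs YAML file.

    Hypothetical helper: the actual test may pass more arguments.
    """
    script = Path(example_root) / "models" / "wan_t2v.py"
    return [sys.executable, str(script), "--visual_gen_args", str(args_yaml)]


if __name__ == "__main__":
    cmd = build_wan_t2v_command("examples/visual_gen", "visual_gen_args.yaml")
    # A test would hand this list to subprocess.run(cmd, check=True).
    print(cmd)
```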


Summary by CodeRabbit

Release Notes

Documentation
- Updated VisualGen configuration guide with new YAML and programmatic methods for quantization and multi-GPU parallelism setup.
- Streamlined example documentation with directory layout overview and basic usage instructions.
Examples
- Consolidated and reorganized example scripts; model-specific examples now available in the models directory with updated configuration approach.

zhenhuaw-me · 2026-05-27T13:45:23Z

/bot run

coderabbitai · 2026-05-27T13:51:12Z

Walkthrough

This PR consolidates VisualGen example scripts by migrating from individual CLI-based example scripts (e.g., visual_gen_flux.py, visual_gen_ltx2.py, visual_gen_wan_t2v.py, visual_gen_wan_i2v.py) to a unified model-script-based approach using YAML configuration. Documentation is updated to show the new VisualGenArgs API, the main README is condensed, and integration tests are refactored.

Changes

VisualGen CLI to YAML Configuration Migration

| Layer / File(s) | Summary |
|---|---|
| API Configuration Documentation `docs/source/models/visual-generation.md` | Quantization section now documents `VisualGenArgs.quant_config` with YAML and Python examples specifying `quant_algo` (e.g., `FP8`, `NVFP4`) and `dynamic`. Multi-GPU Parallelism section describes `VisualGenArgs.parallel_config` modes (CFG, Ulysses, Parallel VAE, Attention2D, Ring Attention) with fields and combinability constraints, replacing old CLI `--linear_type` guidance. |
| Example README and WAN T2V Model Script `examples/visual_gen/README.md`, `examples/visual_gen/models/wan_t2v.py` | README condensed to point users to full documentation and adds Layout section describing repository structure. `wan_t2v.py` model example now explicitly overrides `height`, `width`, and `num_frames` parameters and uses simplified prompt, establishing the pattern for model-based examples. |
| Integration Test Refactoring for YAML-Based Examples `tests/integration/defs/examples/test_visual_gen.py` | Adds `textwrap` import; `_generate_wan_video` helper now generates `visual_gen_args.yaml` (with GPU-count-derived `cfg_size` and disabled CUDA graph) and invokes `wan_t2v.py` with `--visual_gen_args` instead of prompt/dimension CLI args. Removes LTX-2 helper docstring sentence and updates `test_wan_t2v_example` docstring to reference WAN 2.2 A14B NVFP4 checkpoint and shared YAML config. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Suggested reviewers

liji-nv
chang-l
hyukn

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title clearly and specifically describes the main change: removing complex visual_gen CLI example scripts. It accurately reflects the primary objective of the PR. |
| Description check | ✅ Passed | The PR description is comprehensive and well-structured, clearly explaining the motivation (large argparse/dispatch logic better kept as dev scripts), what is being removed (5 CLI scripts), how coverage is preserved (slim models/wan_t2v.py and trtllm-serve flows), and documenting all knock-on changes with test coverage details. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (1)

tests/integration/defs/examples/test_visual_gen.py (1)
1197-1239: QA list update appears unnecessary for this PR.

This refactor changes invocation/config wiring of an existing integration test path, but does not add a new integration test definition requiring scheduled QA-list enrollment.

As per coding guidelines "If the PR only touches unittest/ or narrow unit scope, say explicitly whether QA list updates are unnecessary or optional." and "If the change adds or materially alters an integration test under tests/integration/defs/, call out whether an entry is needed under tests/integration/test_lists/qa/."
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/integration/defs/examples/test_visual_gen.py` around lines 1197 - 1239,
This change only refactors invocation/config wiring of an existing integration
test and does not add or materially alter an integration test that would require
QA-list enrollment; update the PR description (or add a one-line comment near
the test_wan_t2v_example definition in
tests/integration/defs/examples/test_visual_gen.py) stating explicitly that a QA
list update is unnecessary for this PR per the coding guidelines so reviewers
know you considered QA-list impact.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@docs/source/models/visual-generation.md`:
- Around line 122-123: Update the docs for parallel_config to use the correct
Attention2D key and combinability: replace mentions of attn2d_row_size and
attn2d_col_size with the single attn2d_size: [row, col] form, and change the
combinability note to state that Attention2D can be composed with Ulysses (it is
only mutually exclusive with ring_size > 1 / Ring Attention). Update the bullets
referencing Attention2D and Ring Attention accordingly so they reflect the
current API symbols (Attention2D, attn2d_size, ring_size) and constraints.
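
The constraint described in this comment can be illustrated with a small check over a plain dictionary. This is a sketch under assumptions, not TensorRT-LLM code: `check_parallel_config` is a hypothetical helper, treating Attention2D as active when `row * col > 1` is an assumption, and only the key names `attn2d_size`, `attn2d_row_size`, `attn2d_col_size`, and `ring_size` are taken from the comment.

```python
def check_parallel_config(parallel_config: dict) -> list[str]:
    """Return human-readable problems found in a parallel_config mapping.

    Hypothetical validator; it mirrors the review comment, not the library.
    """
    problems = []

    # The review says these split keys should be replaced by attn2d_size.
    for old_key in ("attn2d_row_size", "attn2d_col_size"):
        if old_key in parallel_config:
            problems.append(f"{old_key} should be replaced by attn2d_size: [row, col]")

    attn2d = parallel_config.get("attn2d_size")
    ring_size = parallel_config.get("ring_size", 1)

    if attn2d is not None:
        if not (isinstance(attn2d, (list, tuple)) and len(attn2d) == 2):
            problems.append("attn2d_size must have the form [row, col]")
        elif attn2d[0] * attn2d[1] > 1 and ring_size > 1:
            # Assumption: Attention2D counts as enabled when row * col > 1.
            problems.append("Attention2D cannot be combined with ring_size > 1")

    return problems


if __name__ == "__main__":
    print(check_parallel_config({"attn2d_size": [2, 2], "ring_size": 2}))
    print(check_parallel_config({"attn2d_size": [2, 2]}))
```

Other keys, such as those for Ulysses, are ignored by this check, which matches the comment's point that Attention2D is only exclusive with Ring Attention.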



ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: bff1e367-6ccd-4cb7-8dcf-37a983c3859c

📥 Commits

Reviewing files that changed from the base of the PR and between 09f5958 and c3823c9.

📒 Files selected for processing (9)

docs/source/models/visual-generation.md
examples/visual_gen/README.md
examples/visual_gen/models/wan_t2v.py
examples/visual_gen/visual_gen_flux.py
examples/visual_gen/visual_gen_ltx2.py
examples/visual_gen/visual_gen_mgmn_distributed.sh
examples/visual_gen/visual_gen_wan_i2v.py
examples/visual_gen/visual_gen_wan_t2v.py
tests/integration/defs/examples/test_visual_gen.py

💤 Files with no reviewable changes (5)

examples/visual_gen/visual_gen_mgmn_distributed.sh
examples/visual_gen/visual_gen_ltx2.py
examples/visual_gen/visual_gen_wan_i2v.py
examples/visual_gen/visual_gen_wan_t2v.py
examples/visual_gen/visual_gen_flux.py

tensorrt-cicd · 2026-05-27T13:51:24Z

PR_Github #50545 [ run ] triggered by Bot. Commit: c3823c9 Link to invocation

tensorrt-cicd · 2026-05-27T14:34:57Z

PR_Github #50545 [ run ] completed with state FAILURE. Commit: c3823c9
/LLM/main/L0_MergeRequest_PR pipeline #40049 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

Please check the failed tests and fix your PR
If you cannot view the failures, ask the CI triggerer to share details
Once fixed, request an NVIDIA team member to trigger CI again

CI Agent Failure Analysis

Link to invocation

zhenhuaw-me · 2026-05-28T00:50:11Z

/bot run

tensorrt-cicd · 2026-05-28T00:56:29Z

PR_Github #50642 [ run ] triggered by Bot. Commit: c3823c9 Link to invocation

tensorrt-cicd · 2026-05-28T03:21:40Z

PR_Github #50642 [ run ] completed with state SUCCESS. Commit: c3823c9
/LLM/main/L0_MergeRequest_PR pipeline #40135 completed with status: 'SUCCESS'

CI Report

Link to invocation

zhenhuaw-me · 2026-05-28T08:47:45Z

/bot run --stage-list "DGX_B200-4_GPUs-PyTorch-Post-Merge-1,DGX_B200-4_GPUs-PyTorch-Post-Merge-2"

tensorrt-cicd · 2026-05-28T08:53:28Z

PR_Github #50747 [ run ] triggered by Bot. Commit: c3823c9 Link to invocation

tensorrt-cicd · 2026-05-28T13:16:00Z

PR_Github #50747 [ run ] completed with state SUCCESS. Commit: c3823c9
/LLM/main/L0_MergeRequest_PR pipeline #40226 (Partly Tested) completed with status: 'SUCCESS'

CI Report

Link to invocation

zhenhuaw-me · 2026-05-29T01:26:14Z

/bot reuse --comment "pre-merge and the additional stages passing"

github-actions · 2026-05-29T01:26:22Z

GitHub Bot Help

/bot [-h] ['run', 'kill', 'skip', 'reuse-pipeline'] ...

Provide a user friendly way for developers to interact with a Jenkins server.

Run /bot [-h|--help] to print this help message.

See details below for each supported subcommand.

Details

run [--reuse-test (optional)pipeline-id --disable-fail-fast --skip-test --stage-list "A10-PyTorch-1, xxx" --gpu-type "A30, H100_PCIe" --test-backend "pytorch, cpp" --add-multi-gpu-test --only-multi-gpu-test --disable-multi-gpu-test --post-merge --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx" --detailed-log --debug(experimental) --high-priority]

Launch build/test pipelines. All previously running jobs will be killed.

--reuse-test (optional)pipeline-id (OPTIONAL) : Allow the new pipeline to reuse build artifacts and skip successful test stages from a specified pipeline or the last pipeline if no pipeline-id is indicated. If the Git commit ID has changed, this option will be always ignored. The DEFAULT behavior of the bot is to reuse build artifacts and successful test results from the last pipeline.

--disable-reuse-test (OPTIONAL) : Explicitly prevent the pipeline from reusing build artifacts and skipping successful test stages from a previous pipeline. Ensure that all builds and tests are run regardless of previous successes.

--disable-fail-fast (OPTIONAL) : Disable fail fast on build/tests/infra failures.

--skip-test (OPTIONAL) : Skip all test stages, but still run build stages, package stages and sanity check stages. Note: Does NOT update GitHub check status.

--stage-list "A10-PyTorch-1, xxx" (OPTIONAL) : Only run the specified test stages. Supports wildcard * for pattern matching (e.g., "*PerfSanity*" matches all stages containing PerfSanity). Examples: "A10-PyTorch-1, xxx", "PerfSanity". Note: Does NOT update GitHub check status.

--gpu-type "A30, H100_PCIe" (OPTIONAL) : Only run the test stages on the specified GPU types. Examples: "A30, H100_PCIe". Note: Does NOT update GitHub check status.

--test-backend "pytorch, cpp" (OPTIONAL) : Skip test stages which don't match the specified backends. Only support [pytorch, cpp, tensorrt, triton]. Examples: "pytorch, cpp" (does not run test stages with tensorrt or triton backend). Note: Does NOT update GitHub pipeline status.

--only-multi-gpu-test (OPTIONAL) : Only run the multi-GPU tests. Note: Does NOT update GitHub check status.

--disable-multi-gpu-test (OPTIONAL) : Disable the multi-GPU tests. Note: Does NOT update GitHub check status.

--add-multi-gpu-test (OPTIONAL) : Force run the multi-GPU tests in addition to running L0 pre-merge pipeline.

--post-merge (OPTIONAL) : Run the L0 post-merge pipeline instead of the ordinary L0 pre-merge pipeline.

--extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx" (OPTIONAL) : Run the ordinary L0 pre-merge pipeline and specified test stages. Supports wildcard * for pattern matching. Examples: --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx", --extra-stage "Post-Merge".

--detailed-log (OPTIONAL) : Enable flushing out all logs to the Jenkins console. This will significantly increase the log volume and may slow down the job.

--debug (OPTIONAL) : Experimental feature. Enable access to the CI container for debugging purpose. Note: Specify exactly one stage in the stage-list parameter to access the appropriate container environment. Note: Does NOT update GitHub check status.

--high-priority (OPTIONAL) : Run the pipeline with high priority. This option is restricted to authorized users only and will route the job to a high-priority queue.

kill

kill

Kill all running builds associated with pull request.

skip

skip --comment COMMENT

Skip testing for latest commit on pull request. --comment "Reason for skipping build/test" is required. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.

reuse-pipeline

reuse-pipeline

Reuse a previous pipeline to validate current commit. This action will also kill all currently running builds associated with the pull request. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.

NVShreyas

LGTM

karljang

LGTM

chang-l

Thanks for the cleanup.

One heads-up: after this, models/ only has the T2V example, so all of the I2V / FLUX / LTX-2 offline examples are gone even though the supported-models table still lists them. API coverage is preserved via quickstart_example.py + serve/ + the Python-API tests, and I see the plan is to add them in follow-ups -- just flagging this for visibility. Tiny nit: the README says "Per-model example scripts" (plural), but there is only one for now.

Left one inline note on the Attention2D config key.

zhenhuaw-me · 2026-05-30T00:57:29Z

/bot reuse --comment "previous CI passing and the new commits are doc only change"

github-actions · 2026-05-30T00:57:35Z