[https://nvbugs/6162857][fix] Use generation metrics for VisualGen perf sanity#14176
Conversation
📝 WalkthroughWalkthroughThis PR migrates VisualGen performance sanity test metrics from end-to-end latency to separate latency and generation metrics. Metric definitions, validation contracts, database upload gating, documentation, and test waivers are coordinated to use mean/median/percentile latency and generation fields instead of e2e_latency, with validation enforcing positive generation values and required percentile fields. ChangesVisualGen metrics migration from e2e_latency to latency/generation
Estimated code review effort🎯 2 (Simple) | ⏱️ ~12 minutes Suggested reviewers
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
tests/integration/defs/perf/test_visual_gen_perf_sanity.py (1)
512-517: QA list impact looks unchanged for this PR.This metrics-contract migration does not add/remove test selectors or cases, so test-list entries under
tests/integration/test_lists/qa/do not need updates in this PR.As per coding guidelines: “If the change adds or materially alters an integration test under tests/integration/defs/... call out whether an entry is needed under tests/integration/test_lists/qa/.”
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/integration/defs/perf/test_visual_gen_perf_sanity.py` around lines 512 - 517, The change to the metrics list in test_visual_gen_perf_sanity.py (the added/modified metric names like "mean_latency", "median_latency", "percentiles_latency", "mean_generation", "median_generation", "percentiles_generation") does not add/remove integration tests, so explicitly document that no update to the QA test-list is required: add a one-line comment in tests/integration/defs/perf/test_visual_gen_perf_sanity.py near the metrics block (or a short note in the PR description) stating that QA list entries under tests/integration/test_lists/qa/ are unchanged and no action is needed.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@tests/integration/defs/perf/test_visual_gen_perf_sanity.py`:
- Around line 549-558: The current validation for result_data["mean_generation"]
accepts NaN/Inf because float(... ) <= 0 is false for non-finite values; update
the check in the test (the block that inspects result_data["mean_generation"])
to explicitly verify finiteness and positivity by using math.isfinite on
float(result_data["mean_generation"]) and raising the same ValueError if the
value is not finite or is <= 0 so invalid NaN/Inf values are rejected.
---
Nitpick comments:
In `@tests/integration/defs/perf/test_visual_gen_perf_sanity.py`:
- Around line 512-517: The change to the metrics list in
test_visual_gen_perf_sanity.py (the added/modified metric names like
"mean_latency", "median_latency", "percentiles_latency", "mean_generation",
"median_generation", "percentiles_generation") does not add/remove integration
tests, so explicitly document that no update to the QA test-list is required:
add a one-line comment in
tests/integration/defs/perf/test_visual_gen_perf_sanity.py near the metrics
block (or a short note in the PR description) stating that QA list entries under
tests/integration/test_lists/qa/ are unchanged and no action is needed.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 247229f9-a49a-4273-a85e-985e05dc2611
📒 Files selected for processing (4)
tests/integration/defs/perf/README_test_visual_gen_perf_sanity.mdtests/integration/defs/perf/test_visual_gen_perf_sanity.pytests/integration/defs/perf/visual_gen_perf_utils.pytests/integration/test_lists/waives.txt
💤 Files with no reviewable changes (1)
- tests/integration/test_lists/waives.txt
e788658 to
6fcdefa
Compare
|
/bot run |
|
PR_Github #48572 [ run ] triggered by Bot. Commit: |
6fcdefa to
0e41c6b
Compare
|
PR_Github #48572 [ run ] completed with state
|
|
/bot cancel |
|
/bot run |
GitHub Bot Help
Provide a user friendly way for developers to interact with a Jenkins server. Run See details below for each supported subcommand. Details
Launch build/test pipelines. All previously running jobs will be killed.
kill
Kill all running builds associated with pull request. skip
Skip testing for latest commit on pull request. reuse-pipeline
Reuse a previous pipeline to validate current commit. This action will also kill all currently running builds associated with the pull request. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break. |
|
PR_Github #48581 [ run ] triggered by Bot. Commit: |
|
PR_Github #48581 [ run ] completed with state
|
0e41c6b to
a01e03f
Compare
a01e03f to
e009b7e
Compare
|
/bot run |
|
PR_Github #49040 [ run ] triggered by Bot. Commit: |
|
PR_Github #49040 [ run ] completed with state
|
|
/bot run |
|
PR_Github #49154 [ run ] triggered by Bot. Commit: |
|
PR_Github #49154 [ run ] completed with state
|
6749a7b to
e4468b6
Compare
|
/bot run |
|
PR_Github #50176 [ run ] triggered by Bot. Commit: |
|
PR_Github #50176 [ run ] completed with state
|
|
/bot run |
e4468b6 to
e0fa37a
Compare
|
/bot run |
|
PR_Github #50270 [ run ] triggered by Bot. Commit: |
|
PR_Github #50270 [ run ] completed with state
|
|
/bot run |
|
PR_Github #50336 [ run ] triggered by Bot. Commit: |
|
PR_Github #50336 [ run ] completed with state
|
|
/bot run |
|
PR_Github #50643 [ run ] triggered by Bot. Commit: |
|
PR_Github #50643 [ run ] completed with state |
Track VisualGen latency and generation from the new benchmark result schema, and use generation as the regression metric while keeping upload gated on OpenSearch configuration. Signed-off-by: Taian Zhang <taianz@nvidia.com>
Signed-off-by: Taian Zhang <taianz@nvidia.com>
Remove waivers that were accidentally preserved while resolving the upstream rebase conflict so VisualGen perf sanity runs as intended. Signed-off-by: Taian Zhang <taianz@nvidia.com>
e0fa37a to
e8b3397
Compare
|
/bot run |
|
PR_Github #50705 [ run ] triggered by Bot. Commit: |
|
PR_Github #50705 [ run ] completed with state
|
|
/bot run |
|
PR_Github #50966 [ run ] triggered by Bot. Commit: |
|
PR_Github #50966 [ run ] completed with state |
5bfb2e2 to
e8b3397
Compare
…d_strength to extra_params, clamp seed Third round of self-review feedback on PR NVIDIA#14733. Three code-fix threads + one CodeRabbit follow-up; conflict-resolution merge done in this round too. Pipeline forward() signature reorder - Moved `seed: int` to the second positional argument (right after `prompt`, or after `image` for `WanImageToVideoPipeline`) across flux / flux2 / wan / wan_i2v / ltx2 / ltx2_two_stages, and dropped the bare `*,` keyword-only marker. Callers (`infer()` and `_run_warmup`) already used kwargs only, so this is a mechanical internal-API change. - Aligned `Cosmos3OmniMoTPipeline.forward` with the same convention — it still carried `seed: int = 42` as a stale default. image_cond_strength moved to per-pipeline extra_params - Cross-framework check confirmed that diffusers, vllm-omni, and SGLang Diffusion all treat conditioning-strength knobs as pipeline-specific (diffusers exposes them only on `LTXConditionPipeline` / SD img2img / SVD; vllm-omni and SGLang route them through generic `diffusers_kwargs` bags). None treats `image_cond_strength` as a first-class request field. - Dropped `image_cond_strength` from `VisualGenParams`, from `VideoGenerationRequest`, from the `visual_gen_utils` translation, from `_GENERATION_CONFIG_FIELDS`, and from the Wan-defaults carve-out (which let `get_wan_default_params` lose the `include_i2v` argument entirely). - Added the field to `LTX2Pipeline.extra_param_specs` (`default=1.0`); both `LTX2Pipeline.infer()` and `LTX2TwoStagesPipeline.infer()` now read it from `req.params.extra_params["image_cond_strength"]`. Wan pipelines reject the key via the existing unknown-`extra_params` path. - Updated the LTX-2 example script so `--image_cond_strength` flows into `extra_params` instead of the dropped top-level field, and trimmed the serve README's per-request control list. Seed clamp + N-image RNG semantics doc - Introduced `MAX_UINT32_SEED = 2**32 - 1` and applied `ge=0, le=MAX_UINT32_SEED` on `VisualGenParams.seed` and on the three openai_protocol video / image / edit request schemas. - Coordinator-rank random seed materialization moved from `secrets.randbits(63)` to `secrets.randbits(32)` so engine-drawn seeds stay inside the OpenAI DALL-E range that vllm-omni adopts. - Documented the chosen `num_images_per_prompt > 1` semantics in the `VisualGenParams.seed` description: diffusers/vllm-omni style (single `torch.Generator(seed=s)` driving N latents from one RNG stream), not SGLang's `[s, s+1, …]` per-image expansion. - Three new unit tests pin the range bounds. Conflict resolution - Rebased the three serve-parity commits onto origin/main, resolving conflicts in `openai_video_routes.py` against the `build_visual_gen_timing_headers` introduction from NVIDIA#14176. Kept the upstream cleanup that uses local `generation` / `denoise` variables. Tests - `test_visual_gen_params.py`: three obsolete top-level image_cond_strength tests replaced with two extra_params variants; three new tests for seed bounds. Signed-off-by: Zhenhua Wang <zhenhuaw@nvidia.com>
Fixes https://nvbugs/6162857 by aligning VisualGen perf sanity with the new benchmark metric schema and ensuring online benchmarks receive real engine-side generation timing.
Description
VisualGen benchmark results now report latency and generation separately in seconds. This PR updates perf sanity validation and OpenSearch extraction to upload both metric families, and uses median generation as the regression metric.
For online image and sync-video benchmarks,
trtllm-servenow returns VisualGen engine-side timing through the standardServer-Timingresponse header. The benchmark client parses that metadata into the result JSON, so generation metrics no longer default to zero. This is non-breaking for regular serving clients: request schemas and response bodies are unchanged, and clients that ignore headers keep the same behavior.Test Coverage
python3 -m py_compilefor the modified VisualGen serve, benchmark, and perf sanity files.Server-Timingon image and sync-video responses.PR Checklist
Please review the following before submitting your PR:
PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.
PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.
Test cases are provided for new code paths (see test instructions)
If PR introduces API changes, an appropriate PR label is added - either
api-compatibleorapi-breaking. Forapi-breaking, includeBREAKINGin the PR title.Any new dependencies have been scanned for license and vulnerabilities
CODEOWNERS updated if ownership changes
Documentation updated as needed
Update tava architecture diagram if there is a significant design change in PR.
The reviewers assigned automatically/manually are appropriate for the PR.
Please check this after reviewing the above items as appropriate for this PR.
GitHub Bot Help
To see a list of available CI bot commands, please comment
/bot help.