[TRTLLM-11362][feat] Add batch generation support to visual gen pipelines#12121

Open
karljang wants to merge 1 commit into NVIDIA:main from karljang:feat/visual-gen-batch-support

Conversation


@karljang karljang commented Mar 11, 2026

Summary by CodeRabbit

Release Notes

New Features

  • Added batch generation support across visual generation models (FLUX, WAN, WAN I2V)
  • Prompts now accept both single strings and lists of strings for processing multiple items in one request
  • Automatic batch size inference based on input type
  • Returns appropriately shaped outputs for single and batched generations

Tests

  • Added comprehensive batch generation tests for all visual generation pipelines

Summary

Add batch inference support to all visual generation pipelines (FLUX.1, FLUX.2, WAN T2V, WAN I2V). A single forward() call can now accept a list of prompts and generate all outputs in parallel, with proper CFG (classifier-free guidance) handling for the full batch.

  • prompt parameter accepts Union[str, List[str]] across all pipelines
  • Single prompt returns original shape (H,W,C) / (T,H,W,C) for backward compatibility
  • Batch prompts return (B,H,W,C) / (B,T,H,W,C) with batch dimension prepended
  • Seed behavior aligned with HF diffusers: single generator with batch_size in tensor shape (not per-sample seed+i)
  • API-level support in DiffusionRequest and VisualGeneration.generate_async()
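
The single-vs-batch return convention above can be sketched as follows. This is a hypothetical, heavily simplified stand-in for the real pipeline `forward()` (the denoise/decode steps are replaced with a placeholder tensor); only the type check and shape convention are meant to mirror the PR.

```python
import torch

def forward(prompt, height=64, width=64):
    # Infer batch size from the input type, as described above.
    prompts = [prompt] if isinstance(prompt, str) else list(prompt)
    batch_size = len(prompts)
    # Placeholder for encode -> denoise -> decode: one (H, W, C) image per prompt.
    images = torch.zeros(batch_size, height, width, 3)
    # A single prompt keeps the original (H, W, C) shape for backward
    # compatibility; a list of prompts returns (B, H, W, C).
    return images[0] if isinstance(prompt, str) else images

assert forward("a cat").shape == (64, 64, 3)
assert forward(["a cat", "a dog"]).shape == (2, 64, 64, 3)
```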

Test Coverage

  • pytest tests/unittest/_torch/visual_gen/test_visual_gen_args.py — API-level batch input parsing (no GPU)
  • pytest tests/unittest/_torch/visual_gen/test_flux_pipeline.py -k "batch" — FLUX batch generation (1x GPU)
  • pytest tests/unittest/_torch/visual_gen/test_wan.py -k "batch" — WAN T2V batch generation (1x GPU)
  • pytest tests/unittest/_torch/visual_gen/test_wan_i2v.py -k "batch" — WAN I2V batch generation (1x GPU)
  • Verified single-prompt backward compatibility (output shapes unchanged)
  • Verified seed alignment with HF diffusers (single generator, batch_size in shape)
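
The seed-alignment point in the last bullet can be illustrated with a short sketch: one generator seeded once, with `batch_size` baked into the tensor shape, rather than a fresh generator seeded with `seed + i` per sample. The shapes below are illustrative, not the pipelines' actual latent shapes.

```python
import torch

seed, batch_size, latent_shape = 42, 2, (4, 8, 8)

# Diffusers-style: a single generator, batch_size in the tensor shape.
g = torch.Generator().manual_seed(seed)
batched = torch.randn((batch_size, *latent_shape), generator=g)

# The per-sample alternative (seed + i per generator) yields different noise.
per_sample = torch.stack([
    torch.randn(latent_shape, generator=torch.Generator().manual_seed(seed + i))
    for i in range(batch_size)
])

assert batched.shape == per_sample.shape == (2, 4, 8, 8)
assert not torch.allclose(batched, per_sample)
```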

PR Checklist

Please review the following before submitting your PR:

  • PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.

  • PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.

  • Test cases are provided for new code paths (see test instructions)

  • Any new dependencies have been scanned for license and vulnerabilities

  • CODEOWNERS updated if ownership changes

  • Documentation updated as needed

  • Update tava architecture diagram if there is a significant design change in PR.

  • The reviewers assigned automatically/manually are appropriate for the PR.

  • Please check this after reviewing the above items as appropriate for this PR.

GitHub Bot Help

To see a list of available CI bot commands, please comment /bot help.

@karljang karljang requested review from chang-l and zhenhuaw-me March 11, 2026 19:40
@karljang karljang requested review from a team as code owners March 11, 2026 19:40
@karljang karljang requested a review from syuoni March 11, 2026 19:40
@karljang karljang requested a review from o-stoner March 11, 2026 19:41

coderabbitai bot commented Mar 11, 2026

📝 Walkthrough

Walkthrough

This PR adds batch prompt generation support across visual generation pipelines (Flux, Flux2, WAN, WAN I2V) and related APIs by widening prompt parameter types to Union[str, List[str]], updating latent preparation and decoding methods to handle batch dimensions, and introducing batch-aware test coverage.

Changes

Cohort / File(s) Summary
Executor and API Layer
tensorrt_llm/_torch/visual_gen/executor.py, tensorrt_llm/llmapi/visual_gen.py
DiffusionRequest prompt type expanded to Union[str, List[str]]; VisualGen.generate_async enhanced to parse and batch multiple prompts from list/tuple inputs.
Flux Pipeline
tensorrt_llm/_torch/visual_gen/models/flux/pipeline_flux.py
Added batch prompt support via Union[str, List[str]] typing; _prepare_latents and _decode_latents updated to accept and use batch_size parameter; forward method now returns single image (H, W, C) for batch_size=1 or batched tensor (B, H, W, C) otherwise.
Flux2 Pipeline
tensorrt_llm/_torch/visual_gen/models/flux/pipeline_flux2.py
Batch prompt support added with Union[str, List[str]] typing; latent preparation and decoding made batch-aware with batch_size parameter; forward now returns MediaOutput instead of dict; conditional return shapes (single vs. batched).
WAN Pipelines
tensorrt_llm/_torch/visual_gen/models/wan/pipeline_wan.py, tensorrt_llm/_torch/visual_gen/models/wan/pipeline_wan_i2v.py
WAN T2V pipeline gains Union[str, List[str]] prompt typing and batch_size propagation through latent preparation and decoding; WAN I2V extends batch support with image embedding repetition to match batch_size and batch-aware conditioning creation.
Flux Batch Tests
tests/unittest/_torch/visual_gen/test_flux_pipeline.py
Added TestFluxBatchGeneration class with batch tests for FLUX.1 and FLUX.2; validates single-prompt (3D) and batch (4D) output shapes; verifies per-sample seeding differences in Flux2.
WAN Batch Tests
tests/unittest/_torch/visual_gen/test_wan.py, tests/unittest/_torch/visual_gen/test_wan_i2v.py
Added TestWanBatchGeneration for WAN T2V with single-prompt backward compatibility (4D output) and batch prompt validation (5D output); added TestWanI2VBatchGeneration for I2V pipeline with similar shape validation tests.
Input Parsing Tests
tests/unittest/_torch/visual_gen/test_visual_gen_args.py
Added comprehensive unit tests for batch input parsing, validating DiffusionRequest prompt handling and VisualGen.batch scenarios with string, dict, list, and mixed inputs.
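
The API-layer input normalization described above (string, dict, list, and mixed inputs collapsed into a prompt list plus an optional negative prompt) could look roughly like this. The function name and return shape are illustrative, not the actual `generate_async()` internals.

```python
from typing import List, Optional, Tuple, Union

def parse_inputs(inputs: Union[str, dict, list]) -> Tuple[List[str], Optional[str]]:
    """Normalize generate_async-style inputs into (prompts, negative_prompt)."""
    items = inputs if isinstance(inputs, list) else [inputs]
    prompts: List[str] = []
    negative_prompt: Optional[str] = None
    for item in items:
        if isinstance(item, str):
            prompts.append(item)
        elif isinstance(item, dict):
            prompts.append(item["prompt"])
            # First negative_prompt seen wins (mirrors the behavior the
            # review below flags as worth validating).
            negative_prompt = negative_prompt or item.get("negative_prompt")
        else:
            raise TypeError(f"Unsupported input type: {type(item).__name__}")
    return prompts, negative_prompt

assert parse_inputs("a cat") == (["a cat"], None)
assert parse_inputs([{"prompt": "a cat", "negative_prompt": "blurry"}, "a dog"]) \
    == (["a cat", "a dog"], "blurry")
```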

Sequence Diagram

sequenceDiagram
    participant Client
    participant VisualGen
    participant Pipeline
    participant LatentProcessor
    participant Decoder
    participant Output

    Client->>VisualGen: generate_async(prompts: Union[str, List[str]])
    VisualGen->>VisualGen: Infer batch_size from prompt type
    VisualGen->>Pipeline: forward(prompt, batch_size)
    
    Pipeline->>Pipeline: batch_size = len(prompt) if list else 1
    Pipeline->>Pipeline: Encode prompt(s) to embeddings
    
    Pipeline->>LatentProcessor: _prepare_latents(batch_size, height, width)
    LatentProcessor->>LatentProcessor: Create batch-aware latent shape (B, C, H', W')
    LatentProcessor->>Pipeline: Return latents
    
    Pipeline->>Pipeline: Model processing with batch dimension
    
    Pipeline->>Decoder: _decode_latents(latents, batch_size)
    Decoder->>Decoder: Process per-batch outputs
    alt batch_size == 1
        Decoder->>Decoder: Return single image (H, W, C)
    else batch_size > 1
        Decoder->>Decoder: Return batch tensor (B, H, W, C)
    end
    Decoder->>Pipeline: Return images
    
    Pipeline->>Output: Return batch or single output
    Output->>Client: Deliver result

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks: ✅ 3 passed
  • Title check (Passed): The title clearly and concisely describes the main change: adding batch generation support to visual generation pipelines, matching the core feature across all modified files.
  • Docstring Coverage (Passed): Docstring coverage is 81.40%, which is sufficient. The required threshold is 80.00%.
  • Description check (Passed): The PR description clearly explains the batch generation feature, implementation approach, and test coverage, and aligns with the template structure with all required sections completed.



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🧹 Nitpick comments (4)
tests/unittest/_torch/visual_gen/test_wan_i2v.py (1)

1087-1100: This fixture only validates one Wan I2V variant at a time.

i2v_full_pipeline loads whichever single checkpoint CHECKPOINT_PATH points to, and the module default is Wan2.2. That means the Wan 2.1-only batch branch in tensorrt_llm/_torch/visual_gen/models/wan/pipeline_wan_i2v.py—the new image_embeds.repeat(batch_size, 1, 1) path—is untested unless CI swaps checkpoints. Please split or parameterize the fixture by variant.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/unittest/_torch/visual_gen/test_wan_i2v.py` around lines 1087 - 1100,
The fixture i2v_full_pipeline currently loads a single checkpoint via
CHECKPOINT_PATH so only the module default (Wan2.2) is exercised; update the
fixture to cover both Wan variants by parameterizing or splitting it: make
i2v_full_pipeline a parametrized pytest fixture (or add a second fixture) that
iterates over two checkpoint choices or variant labels (e.g., "wan2.2" and
"wan2.1") and for each instantiates VisualGenArgs with the corresponding
checkpoint/variant and then calls PipelineLoader(args).load(skip_warmup=True) so
the Wan2.1 path in tensorrt_llm/_torch/visual_gen/models/wan/pipeline_wan_i2v.py
(the image_embeds.repeat(batch_size, 1, 1) branch) is exercised in tests; ensure
the fixture skips when a given checkpoint is missing by checking existence of
each checkpoint path like the current CHECKPOINT_PATH check.
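
A runnable sketch of the variant-parametrization idea above. The checkpoint paths, environment-variable names, and returned value are assumptions; in the real fixture the body would call `PipelineLoader(VisualGenArgs(...)).load(skip_warmup=True)` as in the existing test.

```python
import os
import pytest

# Hypothetical per-variant checkpoint locations (env vars are assumptions).
WAN_CHECKPOINTS = {
    "wan2.1": os.environ.get("WAN21_CHECKPOINT_PATH", "/ckpts/wan2.1"),
    "wan2.2": os.environ.get("WAN22_CHECKPOINT_PATH", "/ckpts/wan2.2"),
}

@pytest.fixture(params=sorted(WAN_CHECKPOINTS))
def i2v_full_pipeline(request):
    path = WAN_CHECKPOINTS[request.param]
    if not os.path.exists(path):
        # Mirrors the existing CHECKPOINT_PATH existence check.
        pytest.skip(f"checkpoint for {request.param} not found at {path}")
    # Real fixture would load the pipeline here:
    #   PipelineLoader(VisualGenArgs(checkpoint_path=path)).load(skip_warmup=True)
    return {"variant": request.param, "checkpoint": path}
```

Each test using the fixture then runs once per variant, so the Wan 2.1-only `image_embeds.repeat(batch_size, 1, 1)` branch is exercised whenever its checkpoint is present.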
tests/unittest/_torch/visual_gen/test_visual_gen_args.py (1)

273-289: Add a conflicting-negative_prompt batch case here.

This only exercises the happy path where one item provides negative_prompt, so it won't catch the current silent-drop behavior when a later batch item supplies a different value. Once generate_async() is fixed, please pin that validation here as well.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/unittest/_torch/visual_gen/test_visual_gen_args.py` around lines 273 -
289, Extend the test to cover a conflicting-negative_prompt case: in
test_list_of_dicts_input (or a new test) call vg.generate_async with inputs
where the first dict has "negative_prompt": "dark" and a later dict has a
different "negative_prompt" (e.g., "light"), then assert that the call fails
validation by wrapping the call in pytest.raises(ValueError) (or the project's
ValidationError) and/or asserting that vg.executor.enqueue_requests was not
invoked; reference generate_async and vg.executor.enqueue_requests to locate the
code under test.
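
A minimal, runnable sketch of the suggested conflicting-negative_prompt test. The `validate_batch` helper is a stand-in for the fixed `generate_async()` validation; only the test shape (`pytest.raises` around a batch with two different negatives) mirrors the suggestion.

```python
import pytest

def validate_batch(items):
    """Stand-in validator: reject batches with conflicting negative_prompt values."""
    negatives = {i["negative_prompt"] for i in items if "negative_prompt" in i}
    if len(negatives) > 1:
        raise ValueError("conflicting negative_prompt values in batch")

with pytest.raises(ValueError):
    validate_batch([
        {"prompt": "a cat", "negative_prompt": "dark"},
        {"prompt": "a dog", "negative_prompt": "light"},
    ])

# A consistent batch passes validation.
validate_batch([
    {"prompt": "a cat", "negative_prompt": "dark"},
    {"prompt": "a dog", "negative_prompt": "dark"},
])
```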
tests/unittest/_torch/visual_gen/test_wan.py (2)

3473-3487: Prefix unused T variable with underscore.

Static analysis flags T as unused. Use _T to indicate the value is intentionally ignored while still documenting the shape.

Proposed fix
         assert result.video.dim() == 4, f"Expected 4D (T,H,W,C), got {result.video.dim()}D"
-        T, H, W, C = result.video.shape
+        _T, H, W, C = result.video.shape
         assert H == 480 and W == 832 and C == 3
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/unittest/_torch/visual_gen/test_wan.py` around lines 3473 - 3487, The
test test_single_prompt_backward_compat in
tests/unittest/_torch/visual_gen/test_wan.py defines T, H, W, C =
result.video.shape but never uses T; rename the unused T to _T to satisfy static
analysis and indicate intentional ignoring of the temporal dimension while
keeping the shape unpacking (change the tuple target from T to _T in the
test_single_prompt_backward_compat function).

3489-3504: Prefix unused T variable with underscore.

Same issue as above—T is unpacked but never used. Use _T to silence the warning.

Proposed fix
         assert result.video.dim() == 5, f"Expected 5D (B,T,H,W,C), got {result.video.dim()}D"
-        B, T, H, W, C = result.video.shape
+        B, _T, H, W, C = result.video.shape
         assert B == 2 and H == 480 and W == 832 and C == 3
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/unittest/_torch/visual_gen/test_wan.py` around lines 3489 - 3504, In
test_batch_prompt_shape rename the unused unpacked variable T to _T to silence
the unused-variable warning: update the tuple unpacking in the
test_batch_prompt_shape function (B, T, H, W, C = result.video.shape) to use _T
instead of T and leave the rest of the assertions unchanged so B, H, W, C
assertions still operate on the correct values.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tensorrt_llm/_torch/visual_gen/models/wan/pipeline_wan_i2v.py`:
- Around line 507-510: The pre-repeat of image_embeds causes a shape mismatch
when CFG doubles latents in denoise(); inside forward_fn (the function that
builds latent_model_input / calls current_model) detect when
encoder_hidden_states_image (image_embeds) batch size differs from
latents_input/latent_model_input and repeat the image embeddings to match by
computing repeat_factor = latents_input.shape[0] // image_embeds.shape[0] (only
when divisible) and using image_embeds_to_use =
image_embeds.repeat(repeat_factor, 1, 1) before passing
encoder_hidden_states_image=image_embeds_to_use to current_model; this keeps
condition_data expansion logic intact and prevents relying on accidental
broadcasting in denoise()/forward_fn.
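
The shape-matching fix described above can be sketched as a small helper: when classifier-free guidance doubles the latent batch, repeat the image embeddings to match. Tensor names follow the review comment; the dimensions are illustrative, not the actual WAN I2V shapes.

```python
import torch

def match_image_embeds(image_embeds: torch.Tensor, latents_input: torch.Tensor) -> torch.Tensor:
    """Repeat image embeddings so their batch dim matches the (possibly
    CFG-doubled) latent batch, instead of relying on broadcasting."""
    lat_b, emb_b = latents_input.shape[0], image_embeds.shape[0]
    if lat_b != emb_b:
        assert lat_b % emb_b == 0, "latent batch must be a multiple of embed batch"
        image_embeds = image_embeds.repeat(lat_b // emb_b, 1, 1)
    return image_embeds

embeds = torch.zeros(2, 4, 8)        # (B, tokens, dim), illustrative sizes
latents = torch.zeros(4, 16, 2, 2)   # CFG doubled the batch: 2 -> 4
assert match_image_embeds(embeds, latents).shape == (4, 4, 8)
```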

In `@tensorrt_llm/llmapi/visual_gen.py`:
- Around line 534-545: The batch-handling branch that builds prompt and
negative_prompt (variables inputs, prompt, negative_prompt) must validate each
item before collapsing: ensure every item is either a str or dict with a present
"prompt" key and reject/raise a clear API error on invalid types or missing
"prompt"; additionally detect per-item conflicting "negative_prompt" values and
either (a) reject the batch with a validation error if you want a single
negative_prompt per-request, or (b) propagate per-item negatives by extending
DiffusionRequest.negative_prompt to accept a list and attach corresponding
negatives instead of using only the first one; implement the chosen behavior in
the function that builds the DiffusionRequest so callers get deterministic
errors or per-sample negatives rather than silent drops.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: e3ca3680-b29e-48d5-9490-7092f6461670

📥 Commits

Reviewing files that changed from the base of the PR and between 0350b7f and 31ef88c.

📒 Files selected for processing (10)
  • tensorrt_llm/_torch/visual_gen/executor.py
  • tensorrt_llm/_torch/visual_gen/models/flux/pipeline_flux.py
  • tensorrt_llm/_torch/visual_gen/models/flux/pipeline_flux2.py
  • tensorrt_llm/_torch/visual_gen/models/wan/pipeline_wan.py
  • tensorrt_llm/_torch/visual_gen/models/wan/pipeline_wan_i2v.py
  • tensorrt_llm/llmapi/visual_gen.py
  • tests/unittest/_torch/visual_gen/test_flux_pipeline.py
  • tests/unittest/_torch/visual_gen/test_visual_gen_args.py
  • tests/unittest/_torch/visual_gen/test_wan.py
  • tests/unittest/_torch/visual_gen/test_wan_i2v.py

@karljang karljang force-pushed the feat/visual-gen-batch-support branch 2 times, most recently from c91da00 to a003a3d Compare March 11, 2026 20:18
@karljang
Collaborator Author

/bot help

@github-actions

GitHub Bot Help

/bot [-h] ['run', 'kill', 'skip', 'reuse-pipeline'] ...

Provide a user friendly way for developers to interact with a Jenkins server.

Run /bot [-h|--help] to print this help message.

See details below for each supported subcommand.

Details

run [--reuse-test (optional)pipeline-id --disable-fail-fast --skip-test --stage-list "A10-PyTorch-1, xxx" --gpu-type "A30, H100_PCIe" --test-backend "pytorch, cpp" --add-multi-gpu-test --only-multi-gpu-test --disable-multi-gpu-test --post-merge --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx" --detailed-log --debug(experimental) --high-priority]

Launch build/test pipelines. All previously running jobs will be killed.

--reuse-test (optional)pipeline-id (OPTIONAL) : Allow the new pipeline to reuse build artifacts and skip successful test stages from a specified pipeline or the last pipeline if no pipeline-id is indicated. If the Git commit ID has changed, this option will be always ignored. The DEFAULT behavior of the bot is to reuse build artifacts and successful test results from the last pipeline.

--disable-reuse-test (OPTIONAL) : Explicitly prevent the pipeline from reusing build artifacts and skipping successful test stages from a previous pipeline. Ensure that all builds and tests are run regardless of previous successes.

--disable-fail-fast (OPTIONAL) : Disable fail fast on build/tests/infra failures.

--skip-test (OPTIONAL) : Skip all test stages, but still run build stages, package stages and sanity check stages. Note: Does NOT update GitHub check status.

--stage-list "A10-PyTorch-1, xxx" (OPTIONAL) : Only run the specified test stages. Examples: "A10-PyTorch-1, xxx". Note: Does NOT update GitHub check status.

--gpu-type "A30, H100_PCIe" (OPTIONAL) : Only run the test stages on the specified GPU types. Examples: "A30, H100_PCIe". Note: Does NOT update GitHub check status.

--test-backend "pytorch, cpp" (OPTIONAL) : Skip test stages which don't match the specified backends. Only support [pytorch, cpp, tensorrt, triton]. Examples: "pytorch, cpp" (does not run test stages with tensorrt or triton backend). Note: Does NOT update GitHub pipeline status.

--only-multi-gpu-test (OPTIONAL) : Only run the multi-GPU tests. Note: Does NOT update GitHub check status.

--disable-multi-gpu-test (OPTIONAL) : Disable the multi-GPU tests. Note: Does NOT update GitHub check status.

--add-multi-gpu-test (OPTIONAL) : Force run the multi-GPU tests in addition to running L0 pre-merge pipeline.

--post-merge (OPTIONAL) : Run the L0 post-merge pipeline instead of the ordinary L0 pre-merge pipeline.

--extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx" (OPTIONAL) : Run the ordinary L0 pre-merge pipeline and specified test stages. Examples: --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx".

--detailed-log (OPTIONAL) : Enable flushing out all logs to the Jenkins console. This will significantly increase the log volume and may slow down the job.

--debug (OPTIONAL) : Experimental feature. Enable access to the CI container for debugging purpose. Note: Specify exactly one stage in the stage-list parameter to access the appropriate container environment. Note: Does NOT update GitHub check status.

--high-priority (OPTIONAL) : Run the pipeline with high priority. This option is restricted to authorized users only and will route the job to a high-priority queue.

kill

kill

Kill all running builds associated with pull request.

skip

skip --comment COMMENT

Skip testing for latest commit on pull request. --comment "Reason for skipping build/test" is required. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.

reuse-pipeline

reuse-pipeline

Reuse a previous pipeline to validate current commit. This action will also kill all currently running builds associated with the pull request. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.

@karljang
Collaborator Author

/bot run --disable-fail-fast

@karljang karljang force-pushed the feat/visual-gen-batch-support branch from a003a3d to b88c1de Compare March 11, 2026 20:50
@tensorrt-cicd
Collaborator

PR_Github #38632 [ run ] triggered by Bot. Commit: b88c1de Link to invocation


@chang-l chang-l left a comment


I feel we may need some more work to properly enable batched inference for the WAN I2V task. Is that correct, @o-stoner @karljang? For example, supporting multiple images within a single request, and batching requests with independent text/image inputs.

Maybe we can have another PR to enable batch generation with image input, if I understand the current limitation correctly.

image_embeds = image_embeds.to(self.dtype)
# Repeat for batch: single image, multiple prompts
if batch_size > 1:
    image_embeds = image_embeds.repeat(batch_size, 1, 1)
Collaborator

my understanding is a batch of requests may also contain different images, right?

Collaborator Author

Good point. Actually, the current implementation is aligned with HF diffusers batch support, meaning the HF diffusers WAN I2V pipeline supports only one image with multiple prompts.

Collaborator Author

Oh, it seems diffusers does support multiple images. Let me check further.

Collaborator Author

Confirmed using CC :)
So HF diffusers WAN I2V does NOT support multiple images. The PipelineImageInput type alias allows list[PIL.Image], but the WAN I2V check_inputs() rejects it. It's single image only — the type is shared across all diffusers pipelines but each pipeline validates differently.

Collaborator

Thanks for checking @karljang. Since this PR aligns with diffusers, it is probably fine to leave it as-is in this PR.
But more generally, I think batching support should be extended to handle multiple independent requests in the future, i.e., the image encoder would need to handle batched image inputs, although the throughput impact may be limited.

Collaborator

btw, could we have some logging/warnings to users regarding this single-image multiple prompts/requests limitation?

Collaborator Author

That’s a great idea! I’ve just added an informational log, and I’ve also added input validation to the WAN I2V code.

@karljang karljang force-pushed the feat/visual-gen-batch-support branch 2 times, most recently from 494685a to 284b86f Compare March 11, 2026 22:59
@tensorrt-cicd
Collaborator

PR_Github #38632 [ run ] completed with state SUCCESS. Commit: b88c1de
/LLM/main/L0_MergeRequest_PR pipeline #29963 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

@karljang karljang force-pushed the feat/visual-gen-batch-support branch from 284b86f to c081e32 Compare March 12, 2026 04:58
…ines

Add batch inference support to all visual generation pipelines (FLUX.1,
FLUX.2, WAN T2V, WAN I2V). A single forward() call now accepts a list
of prompts and generates all outputs in parallel with proper CFG handling.

- prompt parameter accepts Union[str, List[str]] across all pipelines
- Single prompt returns original shape for backward compatibility
- Seed behavior aligned with HF diffusers (single generator, batch_size in shape)
- API-level support in DiffusionRequest and VisualGeneration.generate_async()
- 16 new tests covering batch shape, backward compat, and API parsing

Signed-off-by: Kanghwan Jang <861393+karljang@users.noreply.github.com>
@karljang karljang force-pushed the feat/visual-gen-batch-support branch from c081e32 to e7337bf Compare March 12, 2026 05:04