
[None][fix] Fix scheduler off-by-one in FLUX pipelines at high resolutions #13091

Merged

karljang merged 1 commit into NVIDIA:main from karljang:fix/flux-160k-scheduler-begin-index
Apr 17, 2026
Conversation

@karljang
Collaborator

@karljang karljang commented Apr 15, 2026

Summary

  • Add set_begin_index(0) after set_timesteps in FLUX.1 and FLUX.2 pipelines to match upstream diffusers behavior
  • Fixes IndexError when running FLUX at high resolutions (e.g. 6400×6400 / 160K+ tokens) with few inference steps

Root Cause

FLUX uses dynamic timestep shifting (mu) that scales linearly with sequence length. At high resolutions (160K+ tokens), mu becomes large enough (~27.5) that all scheduler sigmas collapse to the same value (1.0) and all timesteps become identical (1000.0).
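The collapse can be reproduced with a minimal sketch of the exponential time-shift used by flow-matching schedulers (formula as in diffusers' FlowMatchEulerDiscreteScheduler; the step count and mu values here are illustrative, not taken from the pipelines):

```python
import numpy as np

def time_shift(mu: float, sigmas: np.ndarray) -> np.ndarray:
    # Exponential shift: sigma' = e^mu / (e^mu + (1/sigma - 1)),
    # computed in float64 and cast back to the scheduler's float32.
    return (np.exp(mu) / (np.exp(mu) + (1.0 / sigmas - 1.0))).astype(np.float32)

sigmas = np.linspace(1.0, 0.25, 4, dtype=np.float32)  # 4 inference steps

moderate = time_shift(3.0, sigmas)   # typical resolution: sigmas stay distinct
extreme = time_shift(27.5, sigmas)   # 160K+ tokens: every sigma rounds to 1.0

print(moderate)         # four distinct values
print(extreme)          # [1. 1. 1. 1.]
print(extreme * 1000)   # every timestep becomes 1000.0
```

With mu ≈ 27.5, e^mu ≈ 8.8e11 dwarfs the (1/sigma - 1) term, so every shifted sigma is within ~1e-11 of 1.0 and rounds to exactly 1.0 in float32.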

Without set_begin_index(0), the scheduler's _init_step_index falls back to index_for_timestep(1000.0), which finds that every timestep matches and picks index 1 rather than 0. After N denoising steps, step_index therefore reaches N+1, and the final step's read of self.sigmas[step_index + 1] goes out of bounds.
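The lookup behavior can be seen in a simplified stand-in for the scheduler's index_for_timestep (the pos = 1 tie-break mirrors diffusers, which prefers the second match so that schedules resumed mid-way, e.g. for image-to-image, do not skip a sigma; the example schedules are illustrative):

```python
def index_for_timestep(timestep, schedule_timesteps):
    # Simplified stand-in for the scheduler lookup: collect all matching
    # positions, and prefer the second match when there is more than one.
    indices = [i for i, t in enumerate(schedule_timesteps) if t == timestep]
    pos = 1 if len(indices) > 1 else 0
    return indices[pos]

# Normal schedule: timesteps are unique, so the first step starts at index 0.
print(index_for_timestep(1000.0, [1000.0, 750.0, 500.0, 250.0]))  # 0

# Collapsed schedule: every timestep is 1000.0, so the first step starts at 1.
print(index_for_timestep(1000.0, [1000.0, 1000.0, 1000.0, 1000.0]))  # 1
```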

The upstream diffusers FluxPipeline avoids this by calling self.scheduler.set_begin_index(0) before the denoise loop (diffusers pipeline_flux.py:936), which forces _init_step_index to use index 0.
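A toy scheduler (hypothetical names, not the diffusers API) shows how setting the begin index sidesteps the degenerate lookup entirely:

```python
class TinyScheduler:
    """Toy stand-in for a flow-matching scheduler (illustrative only)."""

    def __init__(self, timesteps):
        self.timesteps = timesteps
        self._begin_index = None

    def set_begin_index(self, begin_index=0):
        self._begin_index = begin_index

    def _init_step_index(self, timestep):
        if self._begin_index is not None:
            return self._begin_index  # fix path: the lookup is never consulted
        matches = [i for i, t in enumerate(self.timesteps) if t == timestep]
        return matches[1] if len(matches) > 1 else matches[0]

collapsed = [1000.0] * 4  # degenerate high-resolution schedule

buggy = TinyScheduler(collapsed)
print(buggy._init_step_index(1000.0))  # 1 -> off by one from the first step

fixed = TinyScheduler(collapsed)
fixed.set_begin_index(0)               # the one-line fix from this PR
print(fixed._init_step_index(1000.0))  # 0
```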

Test Plan

🤖 Generated with Claude Code

Summary by CodeRabbit

Release Notes

  • Bug Fixes
    • Fixed scheduler initialization in Flux visual generation models to ensure consistent pipeline behavior across generation runs.

…tions

Add set_begin_index(0) after set_timesteps in FLUX.1 and FLUX.2
pipelines to match upstream diffusers behavior.

Without this, when the dynamic shift parameter mu is large (which
happens at high resolutions like 6400x6400 / 160K+ tokens), all
scheduler timesteps collapse to the same value. The scheduler's
index_for_timestep then picks index 1 instead of 0 for the first
step, causing an IndexError on the final step when it tries to
access sigmas[num_steps + 1].

Signed-off-by: Kanghwan Jang <861393+karljang@users.noreply.github.com>
@karljang karljang requested a review from a team as a code owner April 15, 2026 19:59
@coderabbitai
Contributor

coderabbitai Bot commented Apr 15, 2026

📝 Walkthrough

Walkthrough

Two Flux pipeline implementations now explicitly set the scheduler's begin index to 0 immediately after retrieving timesteps, ensuring consistent scheduler state initialization at the start of pipeline execution.

Changes

Cohort / File(s) Summary
Scheduler Initialization in Flux Pipelines
tensorrt_llm/_torch/visual_gen/models/flux/pipeline_flux.py, tensorrt_llm/_torch/visual_gen/models/flux/pipeline_flux2.py
Added explicit call to self.scheduler.set_begin_index(0) after retrieving timesteps in the forward method to ensure scheduler state is properly initialized.

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~3 minutes

🚥 Pre-merge checks | ✅ 3

✅ Passed checks (3 passed)

  • Title check (✅ Passed): The title clearly and specifically describes the fix being applied: adding scheduler initialization to resolve an off-by-one error in FLUX pipelines at high resolutions.
  • Docstring Coverage (✅ Passed): Docstring coverage is 100.00%, which is sufficient; the required threshold is 80.00%.
  • Description check (✅ Passed): The pull request description comprehensively explains the issue, root cause, solution, and test coverage following the template structure.


@karljang
Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #43870 [ run ] triggered by Bot. Commit: 57ef61c Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #43870 [ run ] completed with state FAILURE. Commit: 57ef61c
/LLM/main/L0_MergeRequest_PR pipeline #34322 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

@karljang
Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #43951 [ run ] triggered by Bot. Commit: 57ef61c Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #43951 [ run ] completed with state SUCCESS. Commit: 57ef61c
/LLM/main/L0_MergeRequest_PR pipeline #34397 completed with status: 'SUCCESS'

CI Report

Link to invocation

@karljang karljang merged commit 7cf851c into NVIDIA:main Apr 17, 2026
11 of 14 checks passed