perf: batch reset invalid stream offsets in stream supervisor by jtuglu1 · Pull Request #19431 · apache/druid

jtuglu1 · 2026-05-08T00:51:50Z

Description

Problem

I've noticed that recent versions of Druid can be extremely slow to recover from a lagging supervisor (where offsets are invalid and tuningConfig.resetOffsetAutomatically=true). I have observed the following sequence of events:

The Kafka supervisor falls behind, and its offsets are invalid.
The supervisor starts creating a new task group.
The supervisor runs into an invalid offset for a particular partition and throws an exception. This exception propagates up to the supervisor, forcing us to wait until another round of runInternal(), which can take O(XXs) depending on the supervisor run loop configuration. Because partitions are evaluated serially, when multiple partitions have invalid offsets the supervisor goes through another loop for every invalid offset, for every partition, for every taskGroup, leading to many minutes of downtime. The issue is exacerbated when ingesting from large Kafka topics with many partitions.
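
The serial behavior described above can be illustrated with a toy model. This is only a sketch and not Druid's SeekableStreamSupervisor code: the class, the exception type, and the method names are hypothetical, and a "reset" is modeled simply as moving the stored offset to the earliest available one. Each simulated pass stands in for one runInternal() round and aborts at the first invalid offset it repairs, so N invalid partitions cost N failed passes.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy model only; every name here is hypothetical and none of it is Druid's API.
class SerialResetSketch {
    // Stand-in for the exception thrown when a stored offset is no longer available.
    static class InvalidOffsetException extends RuntimeException {
        InvalidOffsetException(String message) {
            super(message);
        }
    }

    // One simulated supervisor pass: partitions are checked in order, and the
    // first invalid offset is reset and then aborts the whole pass.
    static void runOnce(Map<Integer, Long> stored, Map<Integer, Long> earliest) {
        for (Map.Entry<Integer, Long> entry : stored.entrySet()) {
            long earliestAvailable = earliest.get(entry.getKey());
            if (entry.getValue() < earliestAvailable) {
                entry.setValue(earliestAvailable); // assumed reset semantics
                throw new InvalidOffsetException("partition " + entry.getKey());
            }
        }
    }

    // Counts how many passes fail before one completes without an exception.
    static int failedPassesUntilRecovered(Map<Integer, Long> stored, Map<Integer, Long> earliest) {
        int failed = 0;
        while (true) {
            try {
                runOnce(stored, earliest);
                return failed;
            } catch (InvalidOffsetException ex) {
                failed++; // in the real system, this is where the wait for the next round happens
            }
        }
    }

    public static void main(String[] args) {
        Map<Integer, Long> earliest = new LinkedHashMap<>();
        Map<Integer, Long> stored = new LinkedHashMap<>();
        for (int p = 0; p < 4; p++) {
            earliest.put(p, 100L);
            stored.put(p, p == 3 ? 150L : 10L); // partitions 0..2 have invalid offsets
        }
        System.out.println(failedPassesUntilRecovered(stored, earliest)); // prints 3
    }
}
```

With three invalid partitions the model needs three failed passes, each of which would be followed by a wait for the next run loop iteration in the real system.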

Solution

For each taskGroup, identify and reset all invalid partition offsets in one go. Once the set of invalid offsets has been established, perform an internal reset, then throw the exception. This reduces the time-to-recovery of a fatally lagged supervisor from N runInternal() calls to 1 runInternal() call, where N is the number of invalid partition offsets.
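
A minimal sketch of the collect-then-reset-then-throw idea, under the same caveats: this is not the code in the PR, all names are hypothetical, and a "reset" is modeled as moving the stored offset to the earliest available one. The pass first gathers every invalid partition, resets them together, and only then throws, so a single failed pass repairs all of them.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model only; every name here is hypothetical and none of it is Druid's API.
class BatchResetSketch {
    // Stand-in for the exception thrown after the batch reset.
    static class InvalidOffsetsException extends RuntimeException {
        InvalidOffsetsException(String message) {
            super(message);
        }
    }

    // One simulated supervisor pass: collect all invalid partitions first,
    // reset them in one batch, then throw once.
    static void runOnce(Map<Integer, Long> stored, Map<Integer, Long> earliest) {
        List<Integer> invalid = new ArrayList<>();
        for (Map.Entry<Integer, Long> entry : stored.entrySet()) {
            if (entry.getValue() < earliest.get(entry.getKey())) {
                invalid.add(entry.getKey());
            }
        }
        if (!invalid.isEmpty()) {
            for (Integer partition : invalid) {
                stored.put(partition, earliest.get(partition)); // assumed reset semantics
            }
            throw new InvalidOffsetsException("reset partitions " + invalid);
        }
    }

    // Counts how many passes fail before one completes without an exception.
    static int failedPassesUntilRecovered(Map<Integer, Long> stored, Map<Integer, Long> earliest) {
        int failed = 0;
        while (true) {
            try {
                runOnce(stored, earliest);
                return failed;
            } catch (InvalidOffsetsException ex) {
                failed++; // only one wait for the next round is needed
            }
        }
    }

    public static void main(String[] args) {
        Map<Integer, Long> earliest = new LinkedHashMap<>();
        Map<Integer, Long> stored = new LinkedHashMap<>();
        for (int p = 0; p < 4; p++) {
            earliest.put(p, 100L);
            stored.put(p, p == 3 ? 150L : 10L); // partitions 0..2 have invalid offsets
        }
        System.out.println(failedPassesUntilRecovered(stored, earliest)); // prints 1
    }
}
```

Throwing after the batch reset keeps the existing signal that the pass did not complete, while the number of failed passes no longer grows with the number of invalid partitions.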

Release note

Batch reset invalid stream offsets in stream supervisor to speed up stream supervisor recovery.

FrankChen021

I have reviewed the code for correctness, edge cases, concurrency, and integration risks; no issues found.

This is an automated review by Codex GPT-5

github-actions Bot added Area - Streaming Ingestion Area - Ingestion labels May 8, 2026

jtuglu1 changed the title ~~perf: reset invalid stream offsets in batch during recovery~~ perf: batch reset invalid stream offsets in stream supervisor May 8, 2026

jtuglu1 requested review from abhishekrb19 and gianm May 8, 2026 01:52

jtuglu1 force-pushed the reset-offsets-in-batch-to-speed-up-supervisor-recovery branch from 594147e to 51a32eb Compare May 8, 2026 02:31

perf: reset invalid stream offsets in batch during recovery

657e28d

jtuglu1 force-pushed the reset-offsets-in-batch-to-speed-up-supervisor-recovery branch from 51a32eb to 657e28d Compare May 8, 2026 02:38

jtuglu1 requested review from FrankChen021 and kfaraz May 8, 2026 04:35

jtuglu1 commented May 8, 2026

Comment thread .../main/java/org/apache/druid/indexing/seekablestream/supervisor/SeekableStreamSupervisor.java

jtuglu1 added the Performance label May 8, 2026

jtuglu1 requested a review from GWphua May 8, 2026 08:43

FrankChen021 reviewed May 8, 2026

jtuglu1 requested a review from maytasm May 9, 2026 06:52

maytasm reviewed May 11, 2026

Comment thread .../main/java/org/apache/druid/indexing/seekablestream/supervisor/SeekableStreamSupervisor.java

maytasm approved these changes May 11, 2026

ensure invalid sequence numbers are included with partitions

aa81d50

jtuglu1 merged commit 99c9743 into apache:master May 11, 2026
38 checks passed

github-actions Bot added this to the 38.0.0 milestone May 11, 2026

jtuglu1 merged 2 commits into apache:master from jtuglu1:reset-offsets-in-batch-to-speed-up-supervisor-recovery
