Add circuit breaking mechanisms for offset auto-reset backfill #18501

Open
rseetham wants to merge 4 commits into apache:master from rseetham:add-backfill-circuit-breaking

Conversation

Contributor

@rseetham rseetham commented May 14, 2026

Introduces four independent circuit breakers to prevent unbounded backfill triggering when a cluster is overwhelmed or restarts after prolonged downtime:

  1. Pause flag per topic (realtime.segment.offsetAutoReset.pause): operator-set boolean in stream config; checked in computeStartOffset() before any backfill decision is made.

  2. Max segments guard (realtime.segment.offsetAutoReset.maxSegmentsBeforeBackfillSkip): skips backfill trigger if table's segment count >= configured limit, preventing znode exhaustion when ingestion is permanently elevated.

  3. Max concurrent backfills per controller (controller.realtime.offsetAutoReset.maxConcurrentBackfillsPerController): caps the number of tables that can simultaneously backfill on a single controller instance, guarding against cluster-restart storms.

  4. Per-partition in-flight collision threshold (controller.realtime.offsetAutoReset.maxBackfillCollisionsBeforeAutoPause, default 3): tracks consecutive backfill-trigger attempts on a partition that already has an active backfill. Below the threshold the new trigger is allowed; at or above the threshold the topic's pause flag is set automatically and a metric is emitted requiring operator intervention.

New ControllerMeter entries are added for each skipped-backfill scenario to enable alerting on all circuit breaker activations.
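
As a rough illustration of how the first three breakers could gate a backfill decision, here is a minimal sketch. The class, enum, and parameter names are hypothetical and are not taken from this PR's code. Only the ordering (pause flag checked first) and the `>=` comparison for the segment guard come from the description above. Treating a non-positive limit as "breaker disabled" and using `>=` for the concurrency cap are assumptions.

```java
// Hypothetical sketch; none of these names come from the Pinot code base.
enum BackfillDecision { ALLOW, SKIP_PAUSED, SKIP_MAX_SEGMENTS, SKIP_MAX_CONCURRENT }

final class BackfillCircuitBreakers {
  private BackfillCircuitBreakers() {
  }

  /**
   * Evaluates the breakers in order and returns the first one that trips.
   * A limit <= 0 is treated as "disabled" (an assumption, not confirmed by the PR).
   */
  static BackfillDecision evaluate(boolean paused, int segmentCount, int maxSegments,
      int otherTablesBackfilling, int maxConcurrentBackfills) {
    // 1. Operator-set (or auto-set) pause flag wins over everything else.
    if (paused) {
      return BackfillDecision.SKIP_PAUSED;
    }
    // 2. Segment-count guard: skip when the table is already at or over the limit.
    if (maxSegments > 0 && segmentCount >= maxSegments) {
      return BackfillDecision.SKIP_MAX_SEGMENTS;
    }
    // 3. Per-controller cap on the number of tables backfilling at once.
    if (maxConcurrentBackfills > 0 && otherTablesBackfilling >= maxConcurrentBackfills) {
      return BackfillDecision.SKIP_MAX_CONCURRENT;
    }
    return BackfillDecision.ALLOW;
  }
}
```

Returning a distinct value per breaker makes it straightforward to emit one meter per skipped-backfill scenario, which is what the alerting described here relies on.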

Fixes: #18314

Deployed and tested.
Screenshot 2026-05-20 at 11 06 32 PM

bugfix

codecov-commenter commented May 14, 2026

Codecov Report

❌ Patch coverage is 68.22430% with 34 lines in your changes missing coverage. Please review.
✅ Project coverage is 64.32%. Comparing base (1a313c3) to head (f8717c3).
⚠️ Report is 38 commits behind head on master.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...ler/validation/RealtimeOffsetAutoResetManager.java | 75.00% | 12 Missing and 7 partials ⚠️ |
| .../core/realtime/PinotLLCRealtimeSegmentManager.java | 8.33% | 9 Missing and 2 partials ⚠️ |
| ...java/org/apache/pinot/spi/stream/StreamConfig.java | 42.85% | 3 Missing and 1 partial ⚠️ |

Additional details and impacted files
@@             Coverage Diff              @@
##             master   #18501      +/-   ##
============================================
+ Coverage     63.68%   64.32%   +0.63%     
+ Complexity     1684     1126     -558     
============================================
  Files          3266     3311      +45     
  Lines        199836   203940    +4104     
  Branches      31023    31740     +717     
============================================
+ Hits         127272   131186    +3914     
+ Misses        62424    62228     -196     
- Partials      10140    10526     +386     

| Flag | Coverage Δ |
|---|---|
| custom-integration1 | 100.00% <ø> (ø) |
| integration | 100.00% <ø> (ø) |
| integration1 | 100.00% <ø> (ø) |
| integration2 | 0.00% <ø> (ø) |
| java-21 | 64.32% <68.22%> (+0.63%) ⬆️ |
| temurin | 64.32% <68.22%> (+0.63%) ⬆️ |
| unittests | 64.32% <68.22%> (+0.63%) ⬆️ |
| unittests1 | 56.73% <56.25%> (+0.89%) ⬆️ |
| unittests2 | 35.57% <68.22%> (+0.63%) ⬆️ |

rseetham added 3 commits May 20, 2026 22:54
Introduces four independent circuit breakers to prevent unbounded backfill
triggering when a cluster is overwhelmed or restarts after prolonged downtime:

1. Pause flag per topic (`realtime.segment.offsetAutoReset.pause`): operator-set
   boolean in stream config; checked in computeStartOffset() before any backfill
   decision is made.

2. Max segments guard (`realtime.segment.offsetAutoReset.maxSegmentsBeforeBackfillSkip`):
   skips backfill trigger if table's segment count >= configured limit, preventing
   znode exhaustion when ingestion is permanently elevated.

3. Max concurrent backfills per controller
   (`controller.realtime.offsetAutoReset.maxConcurrentBackfillsPerController`):
   caps the number of tables that can simultaneously backfill on a single
   controller instance, guarding against cluster-restart storms.

4. Per-partition in-flight collision threshold
   (`controller.realtime.offsetAutoReset.maxBackfillCollisionsBeforeAutoPause`,
   default 3): tracks consecutive backfill-trigger attempts on a partition that
   already has an active backfill. Below the threshold the new trigger is allowed;
   at or above the threshold the topic's pause flag is set automatically and a
   metric is emitted requiring operator intervention.

New ControllerMeter entries are added for each skipped-backfill scenario to
enable alerting on all circuit breaker activations.

Fixes: apache#18314
…uto-reset backfill

Adds four new metrics to make the backfill circuit breaking feature
fully observable without log parsing:

- ControllerGauge.BACKFILL_TOPICS_IN_PROGRESS (per-table): snapshot of
  how many backfill Kafka topics are actively running for a table.
- ControllerMeter.OFFSET_AUTO_RESET_HANDLER_INIT_FAILURE (per-table):
  fires when handler construction fails silently — previously unalertable.
- ControllerMeter.OFFSET_AUTO_RESET_AUTO_PAUSE_FAILURE (per-table):
  fires when the ZK write to set the pause flag fails, so the circuit
  breaker appears to activate but the table keeps running.
- ControllerMeter.OFFSET_AUTO_RESET_BACKFILL_CLEANUP_COMPLETED (per-table):
  fires when backfill topics finish cleanup; absence over time signals
  stuck backfills.

Also fixes a bug in setPauseFlag() where the topic name was looked up using
the bare key "topic.name" (StreamConfigProperties.STREAM_TOPIC_NAME) instead
of the prefixed key "stream.<type>.topic.name" that stream config maps
actually use. This caused auto-pause to silently skip writing the pause flag.
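
The key mismatch behind that bug can be sketched as follows. This is an illustration only: the helper class and method names are hypothetical, and it assumes the stream type is stored under the `streamType` key of the stream config map, which is how the prefix in `stream.<type>.topic.name` is resolved here.

```java
import java.util.Map;

// Hypothetical helper, not the PR's actual setPauseFlag() code.
final class StreamTopicKeys {
  private StreamTopicKeys() {
  }

  /** Builds the prefixed key, e.g. "stream.kafka.topic.name" for stream type "kafka". */
  static String prefixedTopicKey(String streamType) {
    return "stream." + streamType + ".topic.name";
  }

  /** Returns the topic name from a stream config map, or null if it cannot be resolved. */
  static String lookupTopic(Map<String, String> streamConfigMap) {
    // Assumption: the stream type lives under the "streamType" key.
    String streamType = streamConfigMap.get("streamType");
    if (streamType == null) {
      return null;
    }
    return streamConfigMap.get(prefixedTopicKey(streamType));
  }
}
```

With a map containing `streamType=kafka` and `stream.kafka.topic.name=events`, a lookup on the bare key `topic.name` returns null, while the prefixed lookup returns `events`. A null topic name is the kind of result that lets a pause-flag write be skipped without any visible error.
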
rseetham force-pushed the add-backfill-circuit-breaking branch from 70f9fca to f0742d1 on May 21, 2026 05:55
Contributor

@xiangfu0 xiangfu0 left a comment

Found one high-signal issue; see inline comment.

// Track collisions per (table, topic, partition); auto-pause if collisions exceed the threshold.
String partitionKey = tableNameWithType + ":" + topicName + ":" + partitionStr;
Set<String> activeBackfillTopics = _tableBackfillTopics.get(tableNameWithType);
boolean anyBackfillInFlight = activeBackfillTopics != null && !activeBackfillTopics.isEmpty();
Contributor

This breaker is described as per-partition, but the guard is actually table-wide: anyBackfillInFlight becomes true whenever any backfill topic exists for the table. After that, every unrelated (topic, partition) trigger increments its own collision counter and can auto-pause its source topic even though there was no collision on that partition. The check needs to prove that the active backfill matches the same source topic/partition, not just that the table has some backfill running.
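
The difference between the two guards can be sketched like this. The names are hypothetical, and for simplicity the set is assumed to hold `topic:partition` keys for the source partitions with an active backfill, which is not necessarily how the PR's map is populated.

```java
import java.util.Map;
import java.util.Set;

// Hypothetical illustration of table-wide vs. per-partition in-flight checks.
final class InFlightGuards {
  private InFlightGuards() {
  }

  /** Table-wide guard: true as soon as the table has any active backfill. */
  static boolean anyBackfillInFlight(Map<String, Set<String>> activeByTable, String table) {
    Set<String> active = activeByTable.get(table);
    return active != null && !active.isEmpty();
  }

  /** Per-partition guard: true only if this exact source topic/partition is being backfilled. */
  static boolean sameBackfillInFlight(Map<String, Set<String>> activeByTable, String table,
      String topic, int partition) {
    Set<String> active = activeByTable.get(table);
    return active != null && active.contains(topic + ":" + partition);
  }
}
```

If only `orders:0` is active for a table, the table-wide guard reports a backfill in flight for a trigger on `orders:1` as well, whereas the per-partition guard does not.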

Contributor Author

I made this more complicated than it needs to be. I only need to check collisions per partition, so I removed that check altogether now. Thanks for pointing it out.

…rity

- Rename _tableTopicsUnderBackfill -> _sourceTopicsByTable
- Rename _tableBackfillTopics -> _backfillTopicsByTable
- Rename _partitionInFlightCollisionCount -> _partitionsInFlightCount
- Fix collision logic: set counter to 1 on successful trigger; only
  increment on subsequent triggers for the same (table,topic,partition).
  Previously the guard was table-wide (any backfill in flight would
  trigger collision counting for unrelated partitions).
- Add topic.partition key to BACKFILL_SKIPPED_IN_FLIGHT,
  BACKFILL_SKIPPED_PAUSED, and BACKFILL_OFFSETS metrics
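
A minimal sketch of the per-(table, topic, partition) counter described in this commit, under stated assumptions: the class and method names are hypothetical, the counter value itself is compared against the threshold with `>=`, and the entry is cleared when the backfill for that partition is cleaned up. None of these details are confirmed by the commit message.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical sketch; only the field name mirrors the rename in this commit.
final class PartitionCollisionTracker {
  private final ConcurrentMap<String, Integer> _partitionsInFlightCount = new ConcurrentHashMap<>();
  private final int _maxCollisionsBeforeAutoPause;

  PartitionCollisionTracker(int maxCollisionsBeforeAutoPause) {
    _maxCollisionsBeforeAutoPause = maxCollisionsBeforeAutoPause;
  }

  private static String key(String table, String topic, int partition) {
    return table + ":" + topic + ":" + partition;
  }

  /**
   * Records a backfill trigger for one partition. The first trigger sets the counter to 1;
   * later triggers for the same key increment it. Returns true when the counter has reached
   * the threshold, i.e. when the caller should auto-pause the topic.
   */
  boolean recordTrigger(String table, String topic, int partition) {
    int count = _partitionsInFlightCount.merge(key(table, topic, partition), 1, Integer::sum);
    return count >= _maxCollisionsBeforeAutoPause;
  }

  /** Assumed reset point: forget the partition once its backfill has been cleaned up. */
  void clear(String table, String topic, int partition) {
    _partitionsInFlightCount.remove(key(table, topic, partition));
  }
}
```

Because the key includes the topic and partition, triggers on unrelated partitions of the same table never advance each other's counters.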

Development

Successfully merging this pull request may close these issues.

Add Backfill Circuit Breaking for Offset Reset Feature

3 participants