Filter segments near retention expiry from MergeRollup and UpsertCompactMerge task generators #18285
Merged
xiangfu0 merged 6 commits on Apr 26, 2026
Conversation
…actMerge task generators Signed-off-by: Jinesh Parakh <jineshparakh@hotmail.com>
Contributor
Author
@KKcorps @shounakmk219 requesting review
Codecov Report
❌ Patch coverage is
@@ Coverage Diff @@
##             master   #18285    +/-  ##
============================================
- Coverage     63.61%   63.39%   -0.23%
- Complexity     1659     1679      +20
============================================
  Files          3246     3253       +7
  Lines        197514   198721    +1207
  Branches      30578    30780     +202
============================================
+ Hits         125656   125982     +326
- Misses        61813    62668     +855
- Partials      10045    10071      +26
xiangfu0
reviewed
Apr 23, 2026
xiangfu0
approved these changes
Apr 23, 2026
noob-se7en
reviewed
Apr 23, 2026
…c/main/java/org/apache/pinot/plugin/minion/tasks/MinionTaskUtils.java Co-authored-by: Xiang Fu <xiangfu.1024@gmail.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
MinionTaskUtils.filterSegmentsPastRetention and TimeRetentionStrategy.isPurgeable previously duplicated the same segment expiry comparison independently. Moving the shared logic into RetentionUtils (pinot-common) ensures both callers stay in sync if the purgeability check evolves — TimeRetentionStrategy now delegates to it, and MinionTaskUtils uses it directly. Signed-off-by: Jinesh Parakh <jineshparakh@hotmail.com>
…ention-filter-for-minion-tasks
Contributor
Author
@xiangfu0 @noob-se7en can one of you please re-trigger the failed tests?
Contributor
Author
@xiangfu0 can you please re-review this? I've made the changes
xiangfu0
reviewed
Apr 25, 2026
Contributor
xiangfu0
left a comment
Found one high-signal issue; see inline comment.
Signed-off-by: Jinesh Parakh <jineshparakh@hotmail.com>
xiangfu0
added a commit
to pinot-contrib/pinot-docs
that referenced
this pull request
Apr 26, 2026
#769) This PR documents the new `retentionExpiryBufferPeriod` configuration option added in apache/pinot#18285. ## Changes - **minion-merge-rollup-task.md**: Added `retentionExpiryBufferPeriod` to the configuration table with description of purpose, format, default behavior, and fail-open behavior. Included a new "Watermark Impact of Retention Buffer" section documenting the watermark advancement behavior and configuration caveats. - **upsert-compact-merge-task.md**: Added `retentionExpiryBufferPeriod` to the configuration table with description of purpose, format, and default behavior. ## Related Issue This documents the changes in apache/pinot#18285 which adds a retention-aware filter to MergeRollupTaskGenerator and UpsertCompactMergeTaskGenerator to exclude segments nearing retention expiry from task generation, avoiding a race condition where RetentionManager could delete segments between task generation and download.
xiangfu0
added a commit
to xiangfu0/pinot
that referenced
this pull request
Apr 27, 2026
…al classpath guard After rebase onto upstream/master (fixes the CRITICAL "branch reverts unrelated master commits" finding by absorbing apache#18335, apache#18285, apache#18341, apache#18340), close the remaining MAJOR review gaps: 1. PinotDataSource classpath conflict guard, both connectors: - Treat LinkageError as a conflict, not as "v3 absent". A linkage failure means the v3 class IS resolvable to the loader but cannot run — the case the guard exists for. Falling back to "absent" would let the non-deterministic-resolution failure mode the guard is supposed to prevent slip through. Bias to fail-closed; users who genuinely need both jars can still set -Dpinot.spark.connector.skip-conflict-guard=true. - Add a symmetric guard to pinot-spark-3-connector that probes for the v4 PinotDataSource class. The conflict can now be caught regardless of which connector Spark instantiates first. 2. PinotWriteBuilder (spark-3): keep a deprecated 2-arg constructor `PinotWriteBuilder(filters: Array[Filter], logicalWriteInfo: LogicalWriteInfo)` so the constructor-signature change does not silently break external embedders mid-deprecation. The `filters` parameter is ignored at build time and overwrite()/truncate() still throw — purely a binary-compat shim during the spark-3 sunset window. 3. pinot-batch-ingestion-spark-4 netty.version override: expand the property comment to call out that the override propagates into the production shaded jar (not just the test classpath), document validated spark4.version pairings, and add SparkVersionAlignmentTest that pins netty 4.2.x on the test classpath via Class.forName(KQueueIoHandler) + getImplementationVersion. A future spark4.version bump that shadows netty back to 4.1 (or forward to 4.3) now fails the build immediately. 4. 
FilterPushDownTest (both connectors): add an explicit regression test for `In(attr, Array.empty)` — the empty-IN compile path was previously covered only by the umbrella test and would not have surfaced a regression cleanly. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
xiangfu0
added the same commit
to xiangfu0/pinot
referencing
this pull request
on Apr 27, May 1, and May 7, 2026
Summary
Add a retention-aware filter to MergeRollupTaskGenerator and UpsertCompactMergeTaskGenerator that excludes segments nearing their retention expiry from task generation. This closes a race condition where RetentionManager could delete segments between the time a task is generated (controller-side) and when the executor downloads them (minion-side), causing NoSuchKeyException failures and misleading alerts.
Design Decisions
Generator-side filtering over executor-side skipping
For merge/rollup tasks, we chose generator-side filtering rather than executor-side skipping because:
If the executor skipped a deleted segment B while merging {A, B, C} into D, the resulting merged segment D would be incomplete, and the lineage record would incorrectly claim B was replaced.
Configurable buffer via retentionExpiryBufferPeriod
retentionExpiryBufferPeriod (e.g. "1h", "30m") allows operators to exclude segments earlier than the exact retention boundary, providing a safety margin against clock skew between the controller and RetentionManager.
The effective retention used for filtering is retentionMs - bufferMs. If the buffer >= retention, the filter fails open (returns all segments) with a WARN log. The buffer is table-level within each task (not per-merge-level for MergeRollupTask) because retention itself is table-level.
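A minimal sketch of that buffered-threshold computation, with fail-open behavior for an oversized buffer (class and method names here are illustrative, not the actual MinionTaskUtils API):

```java
import java.util.List;
import java.util.stream.Collectors;

// Illustrative sketch only; the real logic lives in MinionTaskUtils/RetentionUtils.
public final class RetentionFilterSketch {
  /**
   * Keeps segments whose end time is newer than currentTimeMs - (retentionMs - bufferMs),
   * i.e. segments that will still be alive after the buffer window.
   * If buffer >= retention, the filter fails open and returns every segment.
   */
  public static List<Long> filterByEndTime(List<Long> segmentEndTimesMs, long currentTimeMs,
      long retentionMs, long bufferMs) {
    if (bufferMs >= retentionMs) {
      // Fail open: this buffer would filter out everything (real code logs a WARN here).
      return segmentEndTimesMs;
    }
    long thresholdMs = currentTimeMs - (retentionMs - bufferMs);
    return segmentEndTimesMs.stream()
        .filter(endTimeMs -> endTimeMs > thresholdMs)
        .collect(Collectors.toList());
  }
}
```

With currentTimeMs = 1,000,000, retention = 500,000 and buffer = 100,000, the threshold is 600,000: a segment ending at 900,000 is kept while one ending at 100 is excluded.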
Single source of truth for retention logic
RetentionUtils.isPurgeable() is the single source of truth for the time-comparison and creation-time fallback logic. It supports the controller.retentionManager.enableCreationTimeFallback cluster config introduced in "Add support for handling scenarios where end time is invalid during RetentionManager run" #18148. TimeRetentionStrategy delegates to RetentionUtils.isPurgeable() after its completion-status check, ensuring the controller's RetentionManager and the minion task generators always use identical retention logic — the invariant is enforced by construction, not by discipline.
MergeRollupTaskGenerator and UpsertCompactMergeTaskGenerator read the creation-time fallback flag from Helix cluster config via MinionTaskUtils.isCreationTimeFallbackEnabled() — the same authoritative source that RetentionManager reacts to via onChange(). The read is hoisted before the per-table loop to avoid redundant ZK calls and to provide a consistent config snapshot across all tables in a single scheduling pass.
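A hedged sketch of what the shared purgeability check might look like; the signature and parameter names are assumptions, and the actual code is RetentionUtils.isPurgeable():

```java
// Assumed sketch of the shared check described above, not the real implementation.
public final class RetentionSketch {
  /**
   * A segment is purgeable when its effective time is older than currentTimeMs - retentionMs.
   * An invalid end time (<= 0) falls back to the creation time only when the
   * creation-time fallback cluster config is enabled; otherwise the segment is
   * never considered purgeable by this check.
   */
  public static boolean isPurgeable(long currentTimeMs, long retentionMs, long endTimeMs,
      long creationTimeMs, boolean creationTimeFallbackEnabled) {
    long effectiveTimeMs = endTimeMs;
    if (effectiveTimeMs <= 0) {
      if (!creationTimeFallbackEnabled) {
        return false;  // invalid end time and no fallback: keep the segment
      }
      effectiveTimeMs = creationTimeMs;
    }
    return effectiveTimeMs < currentTimeMs - retentionMs;
  }
}
```

Because both TimeRetentionStrategy and the task generators call the one shared method, any future change to the fallback rule propagates to every caller automatically.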
Fail-open on all error paths
On any error path (for example, missing or unparseable retention or buffer config), the filter returns the full segment list rather than blocking task generation.
Watermark impact (MergeRollupTask)
If all segments in an early time bucket are filtered out, the watermark will advance past them permanently. This is expected, since those segments would be purged by RetentionManager regardless. However, if this is caused by a misconfigured buffer, correcting the config will not recover already-skipped buckets. This is documented in the Javadoc.
currentTimeMs as a parameter
filterSegmentsPastRetention accepts currentTimeMs as an explicit parameter rather than calling System.currentTimeMillis() internally. This makes the method deterministic and testable, so boundary tests can use exact values without clock-drift workarounds.
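The testability benefit can be shown with a fixed clock. The predicate below is a stand-in for the real filter (its boundary inclusivity is an assumption), but it demonstrates why an explicit time parameter makes the threshold case exact and repeatable:

```java
// Stand-in predicate with an explicit clock parameter, mirroring the design choice above.
public final class BoundaryTestSketch {
  public static boolean isPastRetention(long endTimeMs, long currentTimeMs, long retentionMs,
      long bufferMs) {
    return endTimeMs <= currentTimeMs - (retentionMs - bufferMs);
  }

  public static void main(String[] args) {
    long now = 10_000L;  // fixed test clock: no System.currentTimeMillis(), no drift margins
    // Threshold is now - (1500 - 500) = 9000; the exact boundary is deterministic.
    assert isPastRetention(9_000L, now, 1_500L, 500L);
    assert !isPastRetention(9_001L, now, 1_500L, 500L);
  }
}
```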
Single constant definition
RETENTION_EXPIRY_BUFFER_PERIOD_KEY is defined once as a public static final in MinionTaskUtils (the utility that reads it). It is not duplicated in MinionConstants.MergeTask or
MinionConstants.UpsertCompactMergeTask to avoid triple-definition drift.
Test Plan
enabled keeps recent creation time)