feat: add SegmentPruner support for datasources/policies#19228

Merged
clintropolis merged 2 commits into apache:master from clintropolis:segment-pruner-improvements
Mar 30, 2026

@clintropolis
Member

changes:
* adds new `include` method to `SegmentPruner` for checking whether an individual segment should be kept or pruned
* adds default implementation of `prune` method which calls `include`
* adds new `combine` method to `SegmentPruner` for merging pruners
* adds new `CompositeSegmentPruner` for cases where pruners cannot be naturally combined
* adds new `createSegmentPruner` method to `DataSource` and `Policy` so that they can participate in pruning
* updates `ExecutionVertex` to combine the new datasource pruner with the pruner of the filter
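
To make the shape of these changes concrete, here is a minimal standalone sketch of the pattern the list describes: a per-segment `include` check, a default `prune` built on top of it, and a `combine` that falls back to a composite. All names here (`Pruner`, `CompositePruner`, `SegmentInfo`) are hypothetical stand-ins rather than Druid's actual classes, and the real `SegmentPruner` signatures may differ. The "keep only if every pruner keeps it" semantics of the composite is also an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical stand-in for a segment descriptor; not Druid's DataSegment.
record SegmentInfo(String dataSource, int year) {}

// Simplified sketch of the pruner contract; not the real SegmentPruner API.
interface Pruner {
  // Returns true if the segment must be kept, false if it can be pruned.
  boolean include(SegmentInfo segment);

  // Default bulk pruning expressed in terms of include().
  default <T> List<T> prune(Iterable<T> input, Function<T, SegmentInfo> converter) {
    List<T> kept = new ArrayList<>();
    for (T item : input) {
      if (include(converter.apply(item))) {
        kept.add(item);
      }
    }
    return kept;
  }

  // Generic merge: wrap both pruners in a composite. An implementation that
  // knows how to merge "naturally" could override this instead.
  default Pruner combine(Pruner other) {
    return new CompositePruner(List.of(this, other));
  }
}

// Assumed semantics: keep a segment only if every delegate keeps it.
record CompositePruner(List<Pruner> delegates) implements Pruner {
  @Override
  public boolean include(SegmentInfo segment) {
    for (Pruner p : delegates) {
      if (!p.include(segment)) {
        return false;
      }
    }
    return true;
  }
}

class SegmentPrunerSketch {
  public static void main(String[] args) {
    Pruner byYear = s -> s.year() >= 2025;
    Pruner bySource = s -> s.dataSource().equals("wiki");
    List<SegmentInfo> segments = List.of(
        new SegmentInfo("wiki", 2026),
        new SegmentInfo("wiki", 2020),
        new SegmentInfo("other", 2026));
    // Only the first segment passes both pruners.
    System.out.println(byYear.combine(bySource).prune(segments, s -> s));
  }
}
```

Expressing `prune` through `include` means a new pruner only has to answer the single-segment question, while the composite covers pairs of pruners that have no smarter way to merge.
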
@github-actions github-actions bot added the Area - Batch Ingestion and Area - MSQ (For multi stage queries - https://github.com/apache/druid/issues/12262) labels Mar 28, 2026
@clintropolis clintropolis changed the title from "Add SegmentPruner support for RestrictedDataSource policy filters" to "feat: add SegmentPruner support for datasources/policies" Mar 28, 2026
* such as filters may still be used.
*/
@Nullable
default SegmentPruner createSegmentPruner()
Contributor

Could also implement this in FilteredDataSource.

Member Author

Yeah, I thought about this, but ignored it for now since I think FilteredDataSource, and UnnestDataSource since it has a filter too, both need to be a bit more thoughtful in how they prune. I think they need to combine with the pruner beneath them from the base, but maybe only in some cases, modifying it in others? For unnest, I think we might want to prune differently if the filter is on the unnest column, depending on whether the unnest is on an MVD or an array, similar to what we do for unnest filter pushdown. I'm not certain whether we have to do anything besides combine for FilteredDataSource; I haven't fully thought about it yet, and didn't want to for now 😅

I'll look into improving this in a follow-up.
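
For readers following the thread, the simple case mentioned above (a wrapping datasource combining its own pruner with the one from its base) could look roughly like the sketch below. Everything here is hypothetical: `SimplePruner`, `SketchDataSource`, and `FilteredSketch` are illustrative names, not Druid classes, and this is not what the PR implements. It deliberately ignores the unnest concerns raised in the comment and only shows null-safe combination, assuming `createSegmentPruner` may return null when a datasource has nothing to prune on.

```java
// Hypothetical minimal pruner keyed on a segment's year; stands in for SegmentPruner.
interface SimplePruner {
  boolean include(int segmentYear);

  // Assumed semantics: a segment is kept only if both pruners keep it.
  default SimplePruner combine(SimplePruner other) {
    return year -> this.include(year) && other.include(year);
  }
}

// Hypothetical datasource shape; the pruner may be null when no pruning applies.
interface SketchDataSource {
  SimplePruner createSegmentPruner();
}

// Hypothetical wrapper that has its own pruner plus a base datasource beneath it.
final class FilteredSketch implements SketchDataSource {
  private final SketchDataSource base;
  private final SimplePruner own; // may be null

  FilteredSketch(SketchDataSource base, SimplePruner own) {
    this.base = base;
    this.own = own;
  }

  @Override
  public SimplePruner createSegmentPruner() {
    SimplePruner basePruner = base.createSegmentPruner();
    if (basePruner == null) {
      return own;         // nothing beneath us to combine with
    }
    if (own == null) {
      return basePruner;  // pass the base pruner through unchanged
    }
    return own.combine(basePruner);
  }
}

class CombineSketch {
  public static void main(String[] args) {
    SketchDataSource table = () -> (year -> year >= 2020);
    SketchDataSource filtered = new FilteredSketch(table, year -> year <= 2025);
    SimplePruner pruner = filtered.createSegmentPruner();
    System.out.println(pruner.include(2023)); // inside both ranges
    System.out.println(pruner.include(2019)); // rejected by the base pruner
  }
}
```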

@clintropolis clintropolis merged commit 3d8b81c into apache:master Mar 30, 2026
113 of 116 checks passed
@clintropolis clintropolis deleted the segment-pruner-improvements branch March 30, 2026 21:21
@github-actions github-actions bot added this to the 37.0.0 milestone Mar 30, 2026
2 participants