feat: adds row-based compaction eligibility filtering #19205
cecemei merged 5 commits into apache:master
capistrant
left a comment
Tiny change requests for defensiveness and a nullable annotation. Let me know what you think of the comments. Otherwise this looks good and very useful.
server/src/main/java/org/apache/druid/server/compaction/CompactionStatistics.java
server/src/main/java/org/apache/druid/server/compaction/MostFragmentedIntervalFirstPolicy.java
Beyond the scope of this PR, but we should probably document this policy for Druid 37. I'm pretty sure it was released in Druid 36 and has proved worthy of at least an experimental doc, maybe even a GA doc.
Added a release note entry and updated the documentation.
capistrant
left a comment
thanks for taking the time to add the docs here so we don't forget to follow up! +1 on green CI
kfaraz
left a comment
Some suggestions for follow up
    }

    public static List<PartitionsSpec> getPartitionsSpec()
    public static List<Object[]> getPolicyAndPartition()
Since this is a JUnit 5 test, you may use Stream<Arguments> instead.
    segments,
    umbrellaInterval,
    compactionInterval,
    Math.toIntExact(compactionStatistics.getNumIntervals()),
Instead of converting long to int here, update the type of numIntervals in the CompactionStatistics class to int.
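To illustrate the trade-off behind this suggestion: Math.toIntExact is a standard java.lang.Math method that throws ArithmeticException when the long value does not fit in an int, so the call site carries a runtime failure mode that an int-typed field would avoid. A minimal sketch follows; the class name ToIntExactSketch and the helper fitsInInt are hypothetical and do not appear in the PR.

```java
class ToIntExactSketch
{
  // Hypothetical helper: reports whether Math.toIntExact would succeed for the value.
  public static boolean fitsInInt(long value)
  {
    try {
      Math.toIntExact(value);
      return true;
    }
    catch (ArithmeticException e) {
      // Thrown by Math.toIntExact when the value overflows an int.
      return false;
    }
  }

  public static void main(String[] args)
  {
    System.out.println(fitsInInt(42L));
    System.out.println(fitsInInt(Integer.MAX_VALUE + 1L));
  }
}
```

Declaring numIntervals as int in the first place would move this check out of the call site entirely.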
    }

    public static CompactionStatus complete(
        CompactionStatistics compactionStatistics,
Suggested change
-   CompactionStatistics compactionStatistics,
+   CompactionStatistics compactedStats,
        );
      }
    } else {
      logger.error("Zero total rows in compaction candidate, something is wrong");
Include the datasource name and interval in this log line.
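One way the requested log line could look is sketched below. This is an illustration only: the class ZeroRowsLogSketch, the helper zeroRowsMessage, the bracketed message layout, and the sample datasource and interval values are all assumptions, and java.util.logging stands in for whatever logger the PR actually uses.

```java
import java.util.logging.Logger;

class ZeroRowsLogSketch
{
  private static final Logger LOGGER = Logger.getLogger(ZeroRowsLogSketch.class.getName());

  // Hypothetical helper: builds the error message with the identifying context
  // (datasource name and interval) that the review comment asks for.
  public static String zeroRowsMessage(String dataSource, String interval)
  {
    return String.format(
        "Zero total rows in compaction candidate for datasource[%s], interval[%s], something is wrong",
        dataSource,
        interval
    );
  }

  public static void main(String[] args)
  {
    // Sample values are made up for the illustration.
    LOGGER.severe(zeroRowsMessage("wikipedia", "2024-01-01/2024-01-02"));
  }
}
```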
Description
Adds row-based compaction eligibility filtering by extending MostFragmentedIntervalFirstPolicy with row count analysis, complementing the existing byte-based filtering.
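A rough sketch of how a row-percentage threshold could drive the choice between full and minor compaction is shown below. It is not the PR's implementation: the class RowEligibilitySketch, the Mode enum, both method names, and the "at or above the threshold means full compaction" rule are assumptions made for illustration, and the interaction with the byte-based threshold is left out.

```java
class RowEligibilitySketch
{
  // Hypothetical outcome of the eligibility check.
  enum Mode { FULL, MINOR }

  // Share of rows in the candidate interval that are not yet compacted, in percent.
  public static double uncompactedRowsPercent(long uncompactedRows, long totalRows)
  {
    if (totalRows <= 0) {
      throw new IllegalArgumentException("totalRows must be positive");
    }
    return 100.0 * uncompactedRows / totalRows;
  }

  // Assumed rule: full compaction only when the uncompacted share reaches the threshold,
  // minor compaction otherwise.
  public static Mode chooseMode(long uncompactedRows, long totalRows, double minPercentForFull)
  {
    return uncompactedRowsPercent(uncompactedRows, totalRows) >= minPercentForFull
           ? Mode.FULL
           : Mode.MINOR;
  }

  public static void main(String[] args)
  {
    System.out.println(chooseMode(25, 100, 30.0));
    System.out.println(chooseMode(40, 100, 30.0));
  }
}
```

Guarding against a non-positive total mirrors the "Zero total rows" error path discussed in the review, although here it is raised as an exception rather than logged.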
Release Notes
The mostFragmentedFirst compaction policy now supports minUncompactedRowsPercentForFullCompaction, allowing minor compaction decisions based on uncompacted row percentage in addition to byte-based thresholds.
This PR has: