feat: adds row-based compaction eligibility filtering #19205

Merged
cecemei merged 5 commits into apache:master from cecemei:metric2
Mar 25, 2026

Conversation

@cecemei
Contributor

@cecemei cecemei commented Mar 25, 2026

Description

Adds row-based compaction eligibility filtering. This extends MostFragmentedIntervalFirstPolicy with row-count analysis to complement the existing byte-based filtering.

Release Notes

The mostFragmentedFirst compaction policy now supports minUncompactedRowsPercentForFullCompaction, allowing minor compaction decisions based on uncompacted row percentage in addition to byte-based thresholds.
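
As a rough illustration of the release note above, here is a minimal sketch of how a row-percentage threshold might separate minor from full compaction. Only the parameter name `minUncompactedRowsPercentForFullCompaction` comes from this PR; the class, enum, and method names are hypothetical, and the comparison direction (at or above the threshold means full compaction) is an assumption inferred from the parameter name, not taken from the Druid source.

```java
// Hypothetical sketch: not Druid's actual implementation.
final class RowBasedEligibilitySketch {

    // Assumed outcomes; the real policy may model eligibility differently.
    enum Decision { FULL_COMPACTION, MINOR_COMPACTION }

    /** Percentage (0 to 100) of rows in the candidate that are not yet compacted. */
    static double uncompactedRowsPercent(long uncompactedRows, long totalRows) {
        if (totalRows <= 0) {
            // Assumption: a candidate with zero total rows is treated as invalid input here.
            throw new IllegalArgumentException("totalRows must be positive, got " + totalRows);
        }
        return 100.0 * uncompactedRows / totalRows;
    }

    /**
     * Assumed semantics: when the uncompacted share reaches the threshold,
     * a full compaction is chosen; otherwise a minor compaction is enough.
     */
    static Decision decide(
            long uncompactedRows,
            long totalRows,
            double minUncompactedRowsPercentForFullCompaction) {
        double percent = uncompactedRowsPercent(uncompactedRows, totalRows);
        return percent >= minUncompactedRowsPercentForFullCompaction
                ? Decision.FULL_COMPACTION
                : Decision.MINOR_COMPACTION;
    }
}
```

With a threshold of 20, for example, a candidate with 10 uncompacted rows out of 100 would fall under minor compaction in this sketch, while 50 out of 100 would trigger a full compaction.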


This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • a release note entry in the PR description.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added or updated version, license, or notice information in licenses.yaml
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • added integration tests.
  • been tested in a test Druid cluster.

@cecemei cecemei changed the title from "Adds row-based compaction eligibility filtering" to "feat: adds row-based compaction eligibility filtering" Mar 25, 2026
@cecemei cecemei marked this pull request as ready for review March 25, 2026 03:18
Contributor

@capistrant capistrant left a comment

tiny requests for change for defensiveness + nullable annotation. let me know what you think of the comments. otherwise looks good and very useful.

@capistrant
Contributor

beyond the scope of this PR, but we should probably document this policy for druid 37. I'm pretty sure it was released in Druid 36 and has proved at least worthy of an experimental doc, but maybe even just a GA doc.

@cecemei
Contributor Author

cecemei commented Mar 25, 2026

beyond the scope of this PR, but we should probably document this policy for druid 37. I'm pretty sure it was released in Druid 36 and has proved at least worthy of an experimental doc, but maybe even just a GA doc.

Added a release note entry and updated documentation

Contributor

@capistrant capistrant left a comment

thanks for taking the time to add the docs here so we don't forget to follow up! +1 on green CI

@cecemei cecemei merged commit 5f77596 into apache:master Mar 25, 2026
62 of 64 checks passed
@github-actions github-actions bot added this to the 37.0.0 milestone Mar 25, 2026
Contributor

@kfaraz kfaraz left a comment

Some suggestions for follow up

}

public static List<PartitionsSpec> getPartitionsSpec()
public static List<Object[]> getPolicyAndPartition()
Contributor

Since this is a JUnit 5 test, you may use Stream<Arguments> instead.

segments,
umbrellaInterval,
compactionInterval,
Math.toIntExact(compactionStatistics.getNumIntervals()),
Contributor

Instead of converting long to int here, update the type of numIntervals in CompactionStatistics class to be int instead.
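
For context on the conversion under discussion, the following is a small sketch of what `Math.toIntExact` does at the call site: it narrows a `long` to an `int` and throws `ArithmeticException` if the value does not fit. The class and method names are hypothetical; storing the count as an `int` in the first place, as suggested above, would make this narrowing step unnecessary.

```java
// Hypothetical sketch of the long-to-int narrowing at the call site.
final class IntervalCountSketch {

    /** Narrows an interval count to int, failing loudly instead of silently truncating. */
    static int narrowOrFail(long numIntervals) {
        // Math.toIntExact throws ArithmeticException when the value overflows an int.
        return Math.toIntExact(numIntervals);
    }

    /** Returns true when the count can be represented as an int without loss. */
    static boolean fitsInInt(long numIntervals) {
        return numIntervals >= Integer.MIN_VALUE && numIntervals <= Integer.MAX_VALUE;
    }
}
```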

}

public static CompactionStatus complete(
CompactionStatistics compactionStatistics,
Contributor

Suggested change
CompactionStatistics compactionStatistics,
CompactionStatistics compactedStats,

);
}
} else {
logger.error("Zero total rows in compaction candidate, something is wrong");
Contributor

Include the datasource name and interval in this log line.
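
One possible shape for the more informative message, sketched with plain `String.format` so it stays self-contained. The helper name and its parameters are hypothetical, and the actual accessors for the datasource and interval on the compaction candidate are not shown in this excerpt.

```java
// Hypothetical sketch: builds the error text only; the real code would pass it to its logger.
final class ZeroRowsLogMessageSketch {

    /** Formats the zero-rows error so the affected datasource and interval are identifiable. */
    static String zeroRowsMessage(String dataSource, String interval) {
        return String.format(
                "Zero total rows in compaction candidate for datasource[%s], interval[%s], something is wrong",
                dataSource,
                interval
        );
    }
}
```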

3 participants