Support overlapping segment intervals in auto compaction #12062
Conversation
{
final ISOChronology chrono = ISOChronology.getInstance(DateTimes.inferTzFromString("America/Los_Angeles"));
Map<String, Object> specs = ImmutableMap.of("%%GRANULARITYSPEC%%", new UniformGranularitySpec(Granularities.WEEK, Granularities.NONE, false, ImmutableList.of(new Interval("2013-08-31/2013-09-02", chrono))));
// Create WEEK segment with 2013-08-26 to 2013-09-20
2013-08-26 to 2013-09-02
Done
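The corrected range in the comment above can be checked with a small sketch. It uses java.time rather than Druid's Granularities.WEEK, and it assumes weeks start on Monday (ISO convention); the class and method names are hypothetical.

```java
import java.time.DayOfWeek;
import java.time.LocalDate;
import java.time.temporal.TemporalAdjusters;

public class WeekBucket {
    // Start of the Monday-based week that contains the given date (assumption: ISO weeks).
    static LocalDate weekStart(LocalDate date) {
        return date.with(TemporalAdjusters.previousOrSame(DayOfWeek.MONDAY));
    }

    public static void main(String[] args) {
        // Start of the interval used in the test: 2013-08-31/2013-09-02
        LocalDate start = weekStart(LocalDate.parse("2013-08-31"));
        LocalDate end = start.plusWeeks(1);
        System.out.println(start + "/" + end); // prints 2013-08-26/2013-09-02
    }
}
```

Since 2013-08-31 is a Saturday, the enclosing week bucket runs from Monday 2013-08-26 up to (but not including) Monday 2013-09-02.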
if (config.getGranularitySpec() == null || config.getGranularitySpec().getSegmentGranularity() == null) {
// Determines segmentGranularity from the segmentsToCompact
// Each batch of segmentToCompact from CompactionSegmentIterator will contains a single time chunk
boolean allSegmentsOverlapped = true;
all segments have same interval -> no need to do union
Done
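A rough sketch of the shortcut suggested in this thread: if every segment in the batch has the same interval, use it directly; otherwise fall back to a union. This is not the PR's code. The Interval class is a hypothetical stand-in for a segment interval, and modelling the union as the umbrella (earliest start to latest end) is an assumption.

```java
import java.time.Instant;
import java.util.Arrays;
import java.util.List;

public class CompactionIntervalSketch {
    /** Hypothetical half-open interval [start, end); stands in for a segment's interval. */
    static final class Interval {
        final Instant start;
        final Instant end;

        Interval(Instant start, Instant end) {
            this.start = start;
            this.end = end;
        }

        static Interval of(String start, String end) {
            return new Interval(Instant.parse(start), Instant.parse(end));
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof Interval
                && start.equals(((Interval) o).start)
                && end.equals(((Interval) o).end);
        }

        @Override
        public int hashCode() {
            return 31 * start.hashCode() + end.hashCode();
        }

        @Override
        public String toString() {
            return start + "/" + end;
        }
    }

    /** Returns the shared interval if all inputs are identical, otherwise the umbrella interval. */
    static Interval intervalToCompact(List<Interval> intervals) {
        Interval first = intervals.get(0);
        if (intervals.stream().allMatch(first::equals)) {
            return first; // all segments have the same interval: no union needed
        }
        Instant minStart = first.start;
        Instant maxEnd = first.end;
        for (Interval i : intervals) {
            if (i.start.isBefore(minStart)) {
                minStart = i.start;
            }
            if (i.end.isAfter(maxEnd)) {
                maxEnd = i.end;
            }
        }
        return new Interval(minStart, maxEnd);
    }

    public static void main(String[] args) {
        Interval month = Interval.of("2016-07-01T00:00:00Z", "2016-08-01T00:00:00Z");
        Interval week = Interval.of("2016-06-27T00:00:00Z", "2016-07-04T00:00:00Z");
        System.out.println(intervalToCompact(Arrays.asList(month, month)));
        System.out.println(intervalToCompact(Arrays.asList(month, week)));
    }
}
```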
LGTM. Thanks @maytasm!
Description
This PR fixes two problems that occur when Druid compacts segments with overlapping intervals via auto compaction.
Imagine we have a segment with interval 2016-07-01T00:00:00.000Z/2016-08-01T00:00:00.000Z (MONTH segmentGranularity) and another segment with interval 2016-06-27T00:00:00.000Z/2016-07-04T00:00:00.000Z (WEEK segmentGranularity).
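To make the setup concrete, here is a minimal sketch that checks the two intervals for overlap. It uses java.time instead of the Joda-Time Interval that Druid uses, and the class and method names are hypothetical.

```java
import java.time.Instant;

public class IntervalOverlap {
    // Half-open intervals [start, end) overlap when each one starts before the other ends.
    static boolean overlaps(Instant start1, Instant end1, Instant start2, Instant end2) {
        return start1.isBefore(end2) && start2.isBefore(end1);
    }

    public static void main(String[] args) {
        Instant monthStart = Instant.parse("2016-07-01T00:00:00.000Z");
        Instant monthEnd = Instant.parse("2016-08-01T00:00:00.000Z");
        Instant weekStart = Instant.parse("2016-06-27T00:00:00.000Z");
        Instant weekEnd = Instant.parse("2016-07-04T00:00:00.000Z");

        System.out.println(overlaps(monthStart, monthEnd, weekStart, weekEnd)); // prints true

        // The shared range is the later start up to the earlier end.
        Instant overlapStart = monthStart.isAfter(weekStart) ? monthStart : weekStart;
        Instant overlapEnd = monthEnd.isBefore(weekEnd) ? monthEnd : weekEnd;
        System.out.println(overlapStart + "/" + overlapEnd);
    }
}
```

The MONTH and WEEK segments share the range 2016-07-01 to 2016-07-04, so any query for used segments in the MONTH interval also picks up the WEEK segment.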
CompactionSegmentIterator only returns segments from a single time chunk bucket. For example, NewestSegmentFirstIterator would return the interval 2016-07-01T00:00:00.000Z/2016-08-01T00:00:00.000Z, and auto compaction would submit a compaction task for that interval. However, the segments returned by the iterator contain only the MONTH segment, so the sha256OfSortedSegmentIds calculated by auto compaction covers only the MONTH segment (2016-07-01T00:00:00.000Z/2016-08-01T00:00:00.000Z). The compaction task then fails when it starts running: the task fetches all segments marked as used in the interval, which includes both the WEEK segment and the MONTH segment, computes the sha256 over them, and compares the result with the sha256 in the compaction spec. The two values differ because the sha256 in the spec covers only the MONTH segment. This issue is fixed by removing sha256OfSortedSegmentIds from the compaction task spec created by auto compaction. sha256OfSortedSegmentIds was added in #8571 (Use hash of Segment IDs instead of a list of explicit segments in auto compaction) to enforce a limit on the number of segments in one compaction task. However, this is no longer necessary, as a compaction task can use a parallel ingestion task.
CompactionSegmentIterator. To fix this issue, the segmentGranularity used in the compaction task is determined in auto compaction from the segments returned by auto compaction's CompactionSegmentIterator, thus ensuring that we preserve the same bucketing/chunking of segments.
This PR has: