Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support segmentGranularity for auto-compaction #10843

Merged
merged 32 commits into from
Feb 12, 2021
Merged

Conversation

maytasm
Copy link
Contributor

@maytasm maytasm commented Feb 3, 2021

Support segmentGranularity for auto-compaction

Description

The auto-compaction configuration should support a segmentGranularity option like manual compaction task.

  • Storing granularitySpec in CompactionState. This is so that auto compaction can determine if compaction needs to be done or not (when segmentGranularity in auto compaction config changes)
  • Adds granularitySpec inDataSourceCompactionConfig. This allows segmentGranularity to be set in auto compaction config. Currently, only segmentGranularity of granularitySpec is supported by auto compaction and setting other values (i.e. queryGranularity) will fail.
  • In CompactSegments, pass granularitySpec to indexingServiceClient.compactSegments. This is to pass the segmentGranularity value inside granularitySpec into the ingestionSpec of the compact task. Note that we also add granularitySpec field in CompactionTask and deprecates segmentGranularity field in CompactionTask. This is so that in the future we can support queryGranularity and rollup in compaction task. As a result, in CompactionTask we uses the segmentGranularity inside granularitySpec instead of the deprecated top level segmentGranularity field.
  • needsCompaction in NewestSegmentFirstIterator needs to take into account changing granularitySpec. We will have to compact segments when segmentGranularity changes.
  • When segmentGranularity changes, we also cancel any active compaction task that have different segmentGranularity and re-compact those intervals with the new segmentGranularity.
  • NewestSegmentFirstIterator will also possibly return multiple segments of the same bucket for the new segmentGranularity (if segmentGranularity given). For example, if the original segment granularity is day then NewestSegmentFirstIterator currently will return a single day for each iteration. But if the new segmentGranularity in the auto compaction config is larger such as week, we cannot just compact a single day with segment granularity = week (Basically we cannot send a compact task with interval of a single day). Hence, the NewestSegmentFirstIterator will have to take into account the new segmentGranularity (week) and combine 7 days and return a period of seven days.

This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added or updated version, license, or notice information in licenses.yaml
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • added integration tests.
  • been tested in a test Druid cluster.
Key changed/added classes in this PR
  • CompactSegments
  • NewestSegmentFirstIterator
  • DataSourceCompactionConfig
  • CompactionState

@@ -141,6 +141,8 @@
@Nullable
private final Granularity segmentGranularity;
@Nullable
private final GranularitySpec granularitySpec;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can remove this since it is no longer used

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is no longer used?

null
);
} else {
this.granularitySpec = granularitySpec;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens if both segmentGranularity & granularitySpec are non-null?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

granularitySpec takes priority since segmentGranularity is deprecated.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if we have a consistent stance on this but I think several places throw an exception if both the deprecated and new property are specified (such as Checks.checkOneNotNullOrEmpty usages).

I don't feel strongly about it, so I'm just mentioning this in case you feel that's a better approach.

@@ -139,10 +138,18 @@ public GranularitySpec withIntervals(List<Interval> inputIntervals)
return new UniformGranularitySpec(segmentGranularity, queryGranularity, rollup, inputIntervals);
}

@Override
Copy link

@loquisgon loquisgon Feb 6, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to move this to the abstract base class?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea. Done.

// This is needed for datasource that has segmentGranularity configured
// If configured segmentGranularity is finer than current segmentGranularity, the same set of segments
// can belong to multiple intervals in the timeline. We keep track of the
private final Map<String, Set<Interval>> intervalCompactedForDatasource = new HashMap<>();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you mean to add more to the comment?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

int partitions = segmentSet.size();
for (DataSegment segment : segmentSet) {
DataSegment segmentsForCompact = segment.withShardSpec(new NumberedShardSpec(partitionNum, partitions));
// PartitionHolder can only holds chucks of one partition space
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

chucks -> chunks

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants