-
Notifications
You must be signed in to change notification settings - Fork 3.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support 'keepSegmentGranularity' for compactionTask #6095
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall LGTM 👍
return Collections.singletonList( | ||
new IndexIngestionSpec( | ||
dataSchema, | ||
new IndexIOConfig( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe worth pulling out a method or lambda fn maybe IndexIOConfig makeConfig(dataSchema, interval)
since constructor statements are pretty big and largely the same?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good. Fixed.
log.warn("Interval[%s] has no segments, nothing to do.", interval); | ||
return TaskStatus.failure(getId()); | ||
} else { | ||
final String json = jsonMapper.writerWithDefaultPrettyPrinter().writeValueAsString(indexTaskSpec); | ||
log.info("Generated compaction task details: " + json); | ||
log.info("Generated [%d] compaction task specs: ", indexTaskSpecs.size()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you can get rid of the trailing colon, since nothing is really coming after it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
return indexTaskSpec.run(toolbox); | ||
final TaskStatus eachResult = eachSpec.run(toolbox); | ||
if (!eachResult.isSuccess()) { | ||
log.warn("Running indexSpec failed. Trying to the next indexSpec."); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if it throws an exception?
Also, "Trying to the next indexSpec" isn't grammatical, it should be "Trying the next indexSpec".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. Fixed.
@@ -245,12 +264,12 @@ public TaskStatus run(final TaskToolbox toolbox) throws Exception | |||
* | |||
* @return null if input segments don't exist. Otherwise, a generated ingestionSpec. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This javadoc should be updated too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
final GranularitySpec granularitySpec = new ArbitraryGranularitySpec( | ||
new NoneGranularity(), | ||
GranularityType.NONE.getDefaultGranularity(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Granularities.NONE
is a simpler way of writing this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
@@ -150,8 +150,11 @@ private CoordinatorStats doRun( | |||
|
|||
if (segmentsToCompact.size() > 1) { | |||
final DataSourceCompactionConfig config = compactionConfigs.get(dataSourceName); | |||
// Currently set keepSegmentGranularity to false because it breaks the algorithm of CompactionSegmentIterator to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you plan to address this in a future patch?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. I'll do in a follow-up PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Raised #6135.
This PR introduces a new configuration,
keepSegmentGranularity
, to compactionTask. Once this is set to true, the compactionTask respects the existing segment boundaries and doesn't compact segments across those segment boundaries.From the implementation side, the compactionTask generates multiple indexTaskSpecs per segment interval and runs them sequentially. This should be fine because there's no (or very little) performance penalty compared to running a single indexTaskSpec.
Another characteristics of this is, the compactionTask is always finished as
SUCCEEDED
no matter how many indexTaskSpecs succeed. This means, the application (the coordinator in automatic compaction) is responsible for checking which segments the compactionTask failed to compact and rerunning another compactionTask.This option is enabled by default in compactionTask, but automatic compaction currently disables this because it breaks the algorithm of
NewestSegmentFirstIterator
. I'll fix and enable in the follow-up pr.