Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support 'keepSegmentGranularity' for compactionTask #6095

Merged
merged 8 commits into from
Aug 9, 2018

Conversation

jihoonson
Copy link
Contributor

This PR introduces a new configuration, keepSegmentGranularity, to compactionTask. Once this is set to true, the compactionTask respects the existing segment boundaries and doesn't compact segments across those segment boundaries.

From the implementation side, the compactionTask generates multiple indexTaskSpecs per segment interval and runs them sequentially. This should be fine because there's no (or very little) performance penalty compared to running a single indexTaskSpec.

Another characteristics of this is, the compactionTask is always finished as SUCCEEDED no matter how many indexTaskSpecs succeed. This means, the application (the coordinator in automatic compaction) is responsible for checking which segments the compactionTask failed to compact and rerunning another compactionTask.

This option is enabled by default in compactionTask, but automatic compaction currently disables this because it breaks the algorithm of NewestSegmentFirstIterator. I'll fix and enable in the follow-up pr.

Copy link
Member

@clintropolis clintropolis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall LGTM 👍

return Collections.singletonList(
new IndexIngestionSpec(
dataSchema,
new IndexIOConfig(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe worth pulling out a method or lambda fn maybe IndexIOConfig makeConfig(dataSchema, interval) since constructor statements are pretty big and largely the same?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good. Fixed.

log.warn("Interval[%s] has no segments, nothing to do.", interval);
return TaskStatus.failure(getId());
} else {
final String json = jsonMapper.writerWithDefaultPrettyPrinter().writeValueAsString(indexTaskSpec);
log.info("Generated compaction task details: " + json);
log.info("Generated [%d] compaction task specs: ", indexTaskSpecs.size());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you can get rid of the trailing colon, since nothing is really coming after it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

return indexTaskSpec.run(toolbox);
final TaskStatus eachResult = eachSpec.run(toolbox);
if (!eachResult.isSuccess()) {
log.warn("Running indexSpec failed. Trying to the next indexSpec.");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if it throws an exception?

Also, "Trying to the next indexSpec" isn't grammatical, it should be "Trying the next indexSpec".

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. Fixed.

@@ -245,12 +264,12 @@ public TaskStatus run(final TaskToolbox toolbox) throws Exception
*
* @return null if input segments don't exist. Otherwise, a generated ingestionSpec.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This javadoc should be updated too.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

final GranularitySpec granularitySpec = new ArbitraryGranularitySpec(
new NoneGranularity(),
GranularityType.NONE.getDefaultGranularity(),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Granularities.NONE is a simpler way of writing this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

@@ -150,8 +150,11 @@ private CoordinatorStats doRun(

if (segmentsToCompact.size() > 1) {
final DataSourceCompactionConfig config = compactionConfigs.get(dataSourceName);
// Currently set keepSegmentGranularity to false because it breaks the algorithm of CompactionSegmentIterator to
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you plan to address this in a future patch?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. I'll do in a follow-up PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Raised #6135.

@gianm gianm modified the milestones: 0.12.3, 0.13.0 Aug 9, 2018
@gianm gianm merged commit d6a02de into apache:master Aug 9, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants