Deprecate NoneShardSpec and drop support for automatic segment merge #6883

Merged
merged 25 commits into from
Mar 16, 2019

jihoonson
Contributor

@jihoonson jihoonson commented Jan 17, 2019

NoneShardSpec represents a single partition in a timeChunk. It's basically the same as any other shardSpec with a total number of partitions = 1, but without details of the partition scheme. I think we're using it because mergeTask is able to merge segments only if they have NoneShardSpec (#3241). However, we now have a compactionTask, which can merge any segments.

This PR deprecates NoneShardSpec. I also moved all ShardSpec implementations to druid-core. Other changes are:

  • No task type uses NoneShardSpec anymore. Realtime tasks use NumberedShardSpec. Batch tasks can use NumberedShardSpec, HashBasedNumberedShardSpec or SingleDimensionShardSpec.
  • IndexTask doesn't support forceExtendableShardSpecs anymore. Previously, when forceGuaranteedRollup = true and forceExtendableShardSpecs = false, NoneShardSpec was used when there was a single partition and HashBasedNumberedShardSpec was used otherwise. This was inconsistent because HashBasedNumberedShardSpec is extendable but NoneShardSpec is not. If forceExtendableShardSpecs was set, NumberedShardSpec was used when publishing segments. Now, IndexTask uses only NumberedShardSpec or HashBasedNumberedShardSpec. Both of them are extendable, so forceExtendableShardSpecs no longer serves a purpose.
  • HadoopIndexTask still supports forceExtendableShardSpecs. It can be set to use NumberedShardSpec for range partitioned segments.

This is an incompatible change since mergeTask cannot be used for new segments.
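As a rough illustration of the IndexTask behavior described above, here is a minimal sketch. The `Kind` and `Spec` types and the `specsFor` helper are hypothetical stand-ins, not Druid's classes, and the rule that guaranteed rollup maps to hashed specs while everything else maps to numbered specs is an assumption drawn from the description rather than from the patch itself.

```java
import java.util.ArrayList;
import java.util.List;

final class ShardSpecChoiceSketch {
    // Hypothetical stand-ins for Druid's shard spec types; not the real classes.
    enum Kind { NUMBERED, HASHED }

    static final class Spec {
        final Kind kind;
        final int partitionNum;
        final int partitions;

        Spec(Kind kind, int partitionNum, int partitions) {
            this.kind = kind;
            this.partitionNum = partitionNum;
            this.partitions = partitions;
        }

        @Override
        public String toString() {
            return kind + "(" + partitionNum + ", " + partitions + ")";
        }
    }

    // Assumed rule: guaranteed rollup picks hashed specs, otherwise numbered specs.
    // A single partition gets an ordinary spec (0, 1) instead of a special "none" spec.
    static List<Spec> specsFor(boolean forceGuaranteedRollup, int numPartitions) {
        Kind kind = forceGuaranteedRollup ? Kind.HASHED : Kind.NUMBERED;
        List<Spec> specs = new ArrayList<>();
        for (int i = 0; i < numPartitions; i++) {
            specs.add(new Spec(kind, i, numPartitions));
        }
        return specs;
    }

    public static void main(String[] args) {
        System.out.println(specsFor(true, 1));
        System.out.println(specsFor(false, 2));
    }
}
```

The point of the sketch is that the single-partition case no longer needs a separate, non-extendable spec type: it is just partition 0 of 1 in whichever scheme applies.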

@@ -146,7 +146,7 @@ public DataSegment(
// dataSource
this.dimensions = prepareDimensionsOrMetrics(dimensions, DIMENSIONS_INTERNER);
this.metrics = prepareDimensionsOrMetrics(metrics, METRICS_INTERNER);
- this.shardSpec = (shardSpec == null) ? NoneShardSpec.instance() : shardSpec;
+ this.shardSpec = (shardSpec == null) ? new NumberedShardSpec(0, 1) : shardSpec;
Member

Any reason not to have a static shared instance for NumberedShardSpec(0, 1) the same way we did for NoneShardSpec?

Contributor Author

Well, we can. But I think it would be uncommon to create a dataSegment with a null shardSpec. I guess this code is just for backward compatibility, to support a really old version of dataSegment. In recent versions, every dataSegment must have a proper shardSpec.

Member

@clintropolis clintropolis Jan 24, 2019

I guess I was thinking more of the impact through DataSegment.Builder. With this change, it now creates a new object in the constructor to set the default, which is likely replaced later when the actual shard spec is set before completing the build. Previously it re-used the same object.

Contributor Author

Oh, DataSegment.Builder is mostly for benchmarks or unit tests. It's not supposed to be used in production code.
I don't think the builder pattern is appropriate for DataSegment because its constructor requires every field except loadSpec to be filled. But it's quite prevalent in unit tests, so I left it.

Member

Ah, just saw a bunch of usages, didn't look too closely at what they were 👍
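The trade-off discussed in this thread can be sketched as follows. `NumberedSpec` is a hypothetical immutable stand-in for NumberedShardSpec, and `SINGLE_PARTITION` is an illustrative name for the shared instance the reviewer suggested; neither is taken from the patch.

```java
final class DefaultShardSpecSketch {
    // Hypothetical immutable stand-in for NumberedShardSpec; not Druid's class.
    static final class NumberedSpec {
        final int partitionNum;
        final int partitions;

        NumberedSpec(int partitionNum, int partitions) {
            this.partitionNum = partitionNum;
            this.partitions = partitions;
        }
    }

    // Shared default, analogous to NoneShardSpec.instance(). Safe only because
    // NumberedSpec has no mutable state.
    static final NumberedSpec SINGLE_PARTITION = new NumberedSpec(0, 1);

    // Mirrors the diff: allocates a fresh default whenever the argument is null.
    static NumberedSpec orDefaultAllocating(NumberedSpec spec) {
        return spec == null ? new NumberedSpec(0, 1) : spec;
    }

    // The reviewer's alternative: reuse one shared default instance.
    static NumberedSpec orDefaultShared(NumberedSpec spec) {
        return spec == null ? SINGLE_PARTITION : spec;
    }

    public static void main(String[] args) {
        // Identity comparison shows whether a new object was created per call.
        System.out.println(orDefaultShared(null) == orDefaultShared(null));         // true
        System.out.println(orDefaultAllocating(null) == orDefaultAllocating(null)); // false
    }
}
```

Whether the extra allocation matters depends on how often the default path is hit. The thread concludes that it is rare outside tests and benchmarks.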

@@ -373,7 +373,7 @@ public Builder()
this.loadSpec = ImmutableMap.of();
this.dimensions = ImmutableList.of();
this.metrics = ImmutableList.of();
- this.shardSpec = NoneShardSpec.instance();
+ this.shardSpec = new NumberedShardSpec(0, 1);
Member

Continuing the other comment chain: alternatively, maybe this line isn't necessary if everything should be setting it.

Member

@clintropolis clintropolis left a comment

Code changes look good to me I think, but maybe we should mark the MergeTask as deprecated since NoneShardSpec will no longer be produced after this patch, making it unusable for all new segments.

@jihoonson
Contributor Author

@clintropolis sounds good. I deprecated mergeTask and the druid.coordinator.merge.on option. I also changed this to Incompatible because mergeTask can't handle newly generated segments anymore.

@jihoonson
Contributor Author

I don't understand why TeamCity complains about IndexMerger. I didn't touch it in this PR.

Member

@clintropolis clintropolis left a comment

LGTM 🤘

@jihoonson
Contributor Author

@clintropolis thank you for the review. I added the Design Review label since it's an incompatible change now.

@jihoonson
Contributor Author

I've removed mergeTask because it wouldn't work anymore. The coordinator throws an exception if druid.coordinator.merge.on is true now.
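A fail-fast check of this kind might look roughly like the sketch below. Only the property name comes from this conversation. The `validate` helper, the use of `java.util.Properties`, and the exception type and message are assumptions for illustration, not the coordinator's actual code.

```java
import java.util.Properties;

final class MergeOptionCheckSketch {
    static final String MERGE_ON = "druid.coordinator.merge.on";

    // Rejects configurations that still enable the removed option.
    // The exception type and message are illustrative assumptions.
    static void validate(Properties props) {
        if (Boolean.parseBoolean(props.getProperty(MERGE_ON, "false"))) {
            throw new IllegalStateException(MERGE_ON + " is no longer supported");
        }
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty(MERGE_ON, "true");
        try {
            validate(props);
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Failing at startup makes the incompatibility visible immediately, rather than leaving a configured option that silently does nothing.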

@fjy fjy added this to the 0.15.0 milestone Mar 11, 2019
Contributor

@gianm gianm left a comment

I like this change, because I think it is simplifying in two ways:

  • It promotes usage of 'extendable' specs.
  • It removes the 'auto merge' functionality in the Coordinator (which does horizontal merging, across time chunks, of NoneShardSpecced data) in favor of the newer 'auto compaction' functionality (which can do both horizontal and vertical merging, and can work on time chunks with multiple segments).

I think both of these things will make data management easier and simpler. Please retitle the PR to reflect the second point, though. Right now it only mentions the first one.

new NamedType(HashBasedNumberedShardSpec.class, "hashed")
)
);
return Collections.emptyList();
Contributor

If you don't have Jackson modules, you could just extend com.google.inject.Module instead. (A DruidModule is a Guice Module that also adds Jackson modules.)

Contributor Author

Thanks. Fixed.

@jihoonson jihoonson changed the title from "Deprecate NoneShardSpec" to "Deprecate NoneShardSpec and drop support for automatic segment merge" Mar 13, 2019
@jihoonson jihoonson merged commit 892d1d3 into apache:master Mar 16, 2019
@jihoonson jihoonson mentioned this pull request Jan 23, 2020
9 tasks