-
Notifications
You must be signed in to change notification settings - Fork 3.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Deprecate NoneShardSpec and drop support for automatic segment merge #6883
Conversation
@@ -146,7 +146,7 @@ public DataSegment( | |||
// dataSource | |||
this.dimensions = prepareDimensionsOrMetrics(dimensions, DIMENSIONS_INTERNER); | |||
this.metrics = prepareDimensionsOrMetrics(metrics, METRICS_INTERNER); | |||
this.shardSpec = (shardSpec == null) ? NoneShardSpec.instance() : shardSpec; | |||
this.shardSpec = (shardSpec == null) ? new NumberedShardSpec(0, 1) : shardSpec; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any reason not to have a static shared instance for NumberedShardSpec(0, 1)
the same way we did for NoneShardSpec
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, we can. But I think it would be not common to create a dataSegment with null shardSpec. I guess this code is just for backward compatibility to support a really old version of dataSegment. In recent versions, every dataSegment must have a proper shardSpec.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess I was more thinking of the impact through DataSegment.Builder
which with this change will now create a new object in the constructor to set the default before it is likely later replaced when the actual shard spec is set before completing the build, instead of the previous behavior of re-using the same object.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, DataSegment.Builder
is mostly for benchmark or unit tests. It's not supposed to be used in production codes.
I don't think the builder pattern is appropriate for DataSegment because it requires for every fields to be filled except loadSpec
in its constructor. But, it's quite prevalent in unit tests, so I left it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, just saw a bunch of usages, didn't look too closely at what they were 👍
@@ -373,7 +373,7 @@ public Builder() | |||
this.loadSpec = ImmutableMap.of(); | |||
this.dimensions = ImmutableList.of(); | |||
this.metrics = ImmutableList.of(); | |||
this.shardSpec = NoneShardSpec.instance(); | |||
this.shardSpec = new NumberedShardSpec(0, 1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
continuing other comment chain, alternatively maybe this line isn't necessary if everything should be setting it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code changes look good to me I think, but maybe we should mark the MergeTask
as deprecated since NoneShardSpec
will no longer be produced after this patch, making it unusable for all new segments.
@clintropolis sounds good. I deprecated mergeTask and |
I don't understand why teamcity complains about |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM 🤘
@clintropolis thank you for the review. I added the |
I've removed mergeTask because it wouldn't work anymore. The coordinator throws an exception if |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like this change, because I think it is simplifying in two ways:
- It promotes usage of 'extendable' specs.
- It removes the 'auto merge' functionality in the Coordinator (which does horizontal merging, across time chunks, of NoneShardSpecced data) in favor of the newer 'auto compaction' functionality (which can do both horizontal and vertical merging, and can work on time chunks with multiple segments).
I think both of these things will make data management easier and simpler. Please retitle the PR to reflect the second point, though. Right now it only mentions the first one.
new NamedType(HashBasedNumberedShardSpec.class, "hashed") | ||
) | ||
); | ||
return Collections.emptyList(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you don't have Jackson modules, you could just extend com.google.inject.Module
instead. (A DruidModule is a Guice Module that also adds Jackson modules.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. Fixed.
NoneShardSpec
represents a single partition in a timeChunk. It's basically same with other shardSpecs with total number of partitions = 1 without details of the partition scheme. I think we're using it because mergeTask is able to merge segments only if they haveNoneShardSpec
(#3241). However, now, we have a compactionTask which can merge any segments.This PR is to deprecate
NoneShardSpec
. Also, I moved allShardSpec
implementations todruid-core
. Other changes are:NoneShardSpec
. Realtime tasks useNumberedShardSpec
. Batch tasks can useNumberedShardSpec
,HashBasedNumberedShardSpec
orSingleDimensionShardSpec
.forceExtendableShardSpecs
anymore. The way it worked is, whenforceGuaranteedRollup
= true andforceExtendableShardSpecs
= false,NoneShardSpec
was used for a single partition whileHashBasedNumberedShardSpec
was used for others. This is inconsistent becauseHashBasedNumberedShardSpec
is extendable butNoneShardSpec
is not. IfforceExtendableShardSpecs
is set,NumberedShardSpec
is used when publishing segments. Now, IndexTask only uses eitherNumberedShardSpec
orHashBasedNumberedShardSpec
. Both of them are extendable, so there's no point forforceExtendableShardSpecs
.forceExtendableShardSpecs
. It can be set to useNumberedShardSpec
for range partitioned segments.This is an incompatible change since mergeTask cannot be used for new segments.