-
Notifications
You must be signed in to change notification settings - Fork 3.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add query granularity to compaction task #10900
Conversation
|
||
import java.util.Objects; | ||
|
||
public class CompactionGranularitySpec |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This name is confusing to me since this class contains Granularity
s not GranularitySpec
s. Maybe CompactionGranularities
is a better name? Also the introduction of this shallow class seems like it is adding to the cognitive load of the reader...was there other cleaner way to support the "null" case out of which this class was created?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not simply pass QueryGranularity (similar to SegmentGranularity already being added) to CompactionTask rather than creating a whole new class (CompactionGranularitySpec)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This class is a GranularitySpec that is only applicable for Auto-Compaction/Compaction task. It contains more than just a Granularity. Similar to UniformGranularitySpec, it contains segmentGranularity and queryGranularity. In my next PR which will adds rollup to Auto-Compaction/Compaction task, this class will also contains option to enable/disable rollup. It is different to UniformGranularitySpec that it doesn't support inputIntervals. There is enough distinction between GranularitySpec for Compaction task than normal Index task that I decided to create a new GranularitySpec class for Compaction task.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This class will not only contains Granularity
hence, it is named CompactionGranularitySpec
. I do not think it adds much cognitive load of the reader since it is a very simple POJO. I think it would be much more cognitive load to try to fit the existing UniformGranularitySpec here since it does not fit well with what Compaction task supports / doesn't supports
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SegmentGranularity that is in CompactionTask class was already Deprecated and will be removed in #10912. This is in favor of combining SegmentGranularity and QueryGranularity and other Compaction task granularity-related configs (such as rollup which will be added in my next PR) in CompactionGranularitySpec class
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is an opportunity here to unify the "granularity spec" design. The same design (i.e. class model) could be used for ingestion & compaction, unified. That may require refactoring of the "GranularitySpec" interface/class hierarchy though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed this class as it is not needed. We already have ClientCompactionTaskGranularitySpec class which is use for passing around the Compaction task's ingestionSpec's granularities (segmentGranularity, queryGranularity).
LGTM after CI |
|
||
import java.util.Objects; | ||
|
||
public class ClientCompactionTaskQueryGranularitySpec | ||
public class ClientCompactionTaskGranularitySpec |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add a javadoc for this please. Why is this needed? Why we haven't chosen to use the GranularitySpec class that already exists? Whats the relationship between this and UserCompactionTaskGranularityConfig
I'm ok if you do this in a follow up change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll add in a followup change
|
||
import java.util.Objects; | ||
|
||
public class UserCompactionTaskGranularityConfig |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
javadocs please
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll add in a followup change
@@ -238,7 +234,7 @@ public void testSerdeGranularitySpec() throws IOException | |||
null, | |||
new Period(3600), | |||
null, | |||
new UniformGranularitySpec(Granularities.HOUR, null, null), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this change have any impact for upgrades / downgrades?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No it does not. The json serialization / deserialization key remains the same
template, | ||
"%%GRANULARITY_SPEC%%", | ||
jsonMapper.writeValueAsString(granularityMap) | ||
); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this means that there will always be a granulartySpec provided as part of the compaction task spec. Do we have a test running a compact task without specifying a granularitySpec. I'm trying to think of what happens in an upgrade/downgrade scenario where the compaction job being submitted may not have the granularitySpec as part of the compaction spec
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are compaction IT (existing ones) still use compaction spec (/indexer/wikipedia_compaction_task.json) without granularitySpec (for example, testCompaction()
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Basically,
template = StringUtils.replace(
template,
"%%GRANULARITY_SPEC%%",
jsonMapper.writeValueAsString(granularityMap)
);
will be a no-op for those tests that uses compaction spec without granularitySpec (/indexer/wikipedia_compaction_task.json)
checkCompactionIntervals(expectedIntervalAfterCompaction); | ||
} | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add a test for what happens when someone tries to go from a larger query granularity to a smaller query granularity - like going from data rolled up to the YEAR, and trying to unroll it to MONTH. I assume the compaction task would fail. What is the error message in this case. Is it clear to the user what the problem is?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That functionality is not changed in this PR. It will be the same as doing a reindex task with larger query granularity to a smaller query granularity. Compaction task will reports the result of the index task (which would be the same as doing a reindex task)
Assuming the doc changes are coming in a follow up change - we should make it clear what the tradeoffs are for rolling up data. If someone accidentally rolls up data to a coarser query granularity (MONTH -> YEAR) - do they have any way to get the segments with the finer queryGranularity back (MONTH)?
I don't understand the granularity specs enough. What's the impact of using UniformGranularitySpec instead of ArbitraryGranularitySpec? Is this something a user should be aware of? |
The doc change is coming in a separate PR (#10935). I will make it clear that If someone rolls up data to a coarser query granularity (MONTH -> YEAR) -the segment with finer queryGranularity (MONTH) will be overshadowed. Those segments may be remove from deep storage if a kill task is run on those intervals. Hence, user can lose data with finer queryGranularity (MONTH). Regarding UniformGranularitySpec vs. ArbitraryGranularitySpec. This was a design choice of existing implementation. IT was not changed in this PR. It is not something a user should be aware of. UniformGranularitySpec works much better in this case as Druid will automatically buckets down the interval according to the segmentGranularity. A ArbitraryGranularitySpec requires you to explicitly list out all bucket intervals. |
Add query granularity to compaction task
Description
Add query granularity to compaction task. Note that query granularity is still not supported in auto compaction.
This PR also creates a new class
CompactionGranularitySpec
to use instead ofGranularitySpec
when passing query granularity and segment granularity into compaction task. This allows the null value (value not given by the user) to represent using the original current query granularity and segment granularity (this is the existing behavior for compaction task). Note that the Compaction task ultimately still convertsCompactionGranularitySpec
to aUniformGranularitySpec
when creating the index ingestion specThis PR has: