Add query granularity to compaction task #10900

maytasm · 2021-02-19T00:23:01Z

Description

Add query granularity to the compaction task. Note that query granularity is still not supported in auto compaction.
This PR also creates a new class, CompactionGranularitySpec, to use instead of GranularitySpec when passing query granularity and segment granularity into the compaction task. This allows a null value (a value not given by the user) to mean "keep the segments' existing query granularity and segment granularity", which is the existing behavior of the compaction task. Note that the compaction task ultimately still converts CompactionGranularitySpec to a UniformGranularitySpec when creating the index ingestion spec.
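
As a rough illustration of the null handling described above, here is a minimal sketch. The class name CompactionGranularitySketch, the Gran enum, and the resolve methods are hypothetical stand-ins invented for this example; they are not Druid's actual classes, and the real conversion to UniformGranularitySpec is not shown.

```java
import java.util.Objects;

/**
 * Illustrative sketch only: these types are hypothetical stand-ins and do not
 * mirror Druid's real Granularity or compaction classes.
 */
final class CompactionGranularitySketch {
  /** Hypothetical, simplified granularity values. */
  enum Gran { MINUTE, HOUR, DAY, MONTH, YEAR }

  private final Gran segmentGranularity; // null = not given by the user
  private final Gran queryGranularity;   // null = not given by the user

  CompactionGranularitySketch(Gran segmentGranularity, Gran queryGranularity) {
    this.segmentGranularity = segmentGranularity;
    this.queryGranularity = queryGranularity;
  }

  /** Returns the user's value, or the segments' existing value when none was given. */
  Gran resolveSegmentGranularity(Gran existing) {
    return segmentGranularity != null ? segmentGranularity : Objects.requireNonNull(existing);
  }

  /** Same fallback rule, applied to query granularity. */
  Gran resolveQueryGranularity(Gran existing) {
    return queryGranularity != null ? queryGranularity : Objects.requireNonNull(existing);
  }
}
```

With this shape, a spec built as new CompactionGranularitySketch(null, Gran.HOUR) would keep the existing segment granularity and only change the query granularity.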

This PR has:

- been self-reviewed.
  - using the concurrency checklist (Remove this item if the PR doesn't have any relation to concurrency.)
- added documentation for new or modified features or behaviors.
- added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
- added or updated version, license, or notice information in licenses.yaml
- added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
- added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
- added integration tests.
- been tested in a test Druid cluster.

loquisgon · 2021-02-22T23:54:27Z

...r/src/main/java/org/apache/druid/segment/indexing/granularity/CompactionGranularitySpec.java

+
+import java.util.Objects;
+
+public class CompactionGranularitySpec


This name is confusing to me since this class contains Granularity objects, not GranularitySpec objects. Maybe CompactionGranularities is a better name? Also, the introduction of this shallow class seems to add to the reader's cognitive load. Was there another, cleaner way to support the "null" case that this class was created for?

Why not simply pass QueryGranularity (similar to the SegmentGranularity already being added) to CompactionTask rather than creating a whole new class (CompactionGranularitySpec)?

This class is a GranularitySpec that is only applicable to the Auto-Compaction/Compaction task. It contains more than just a Granularity. Similar to UniformGranularitySpec, it contains segmentGranularity and queryGranularity. In my next PR, which will add rollup to the Auto-Compaction/Compaction task, this class will also contain an option to enable/disable rollup. It differs from UniformGranularitySpec in that it doesn't support inputIntervals. There is enough distinction between the GranularitySpec for a Compaction task and that of a normal Index task that I decided to create a new GranularitySpec class for the Compaction task.

This class will not contain only Granularity values; hence, it is named CompactionGranularitySpec. I do not think it adds much cognitive load for the reader since it is a very simple POJO. I think it would add much more cognitive load to try to fit the existing UniformGranularitySpec here, since it does not fit well with what the Compaction task does and doesn't support.

The SegmentGranularity in the CompactionTask class is already deprecated and will be removed in #10912. This is in favor of combining SegmentGranularity, QueryGranularity, and other granularity-related Compaction task configs (such as rollup, which will be added in my next PR) in the CompactionGranularitySpec class.

There is an opportunity here to unify the "granularity spec" design. The same design (i.e. class model) could be used for both ingestion and compaction. That may require refactoring the "GranularitySpec" interface/class hierarchy, though.

Removed this class as it is not needed. We already have the ClientCompactionTaskGranularitySpec class, which is used for passing around the granularities (segmentGranularity, queryGranularity) of the Compaction task's ingestionSpec.

jon-wei · 2021-03-02T03:44:22Z

LGTM after CI

suneet-s · 2021-03-02T16:09:34Z

server/src/main/java/org/apache/druid/client/indexing/ClientCompactionTaskGranularitySpec.java


 import java.util.Objects;

-public class ClientCompactionTaskQueryGranularitySpec
+public class ClientCompactionTaskGranularitySpec


Can you add a javadoc for this, please? Why is this needed? Why haven't we chosen to use the GranularitySpec class that already exists? What's the relationship between this and UserCompactionTaskGranularityConfig?

I'm ok if you do this in a follow-up change.

I'll add it in a follow-up change.

suneet-s · 2021-03-02T16:15:53Z

...r/src/main/java/org/apache/druid/server/coordinator/UserCompactionTaskGranularityConfig.java

+
+import java.util.Objects;
+
+public class UserCompactionTaskGranularityConfig


javadocs please

I'll add them in a follow-up change.

suneet-s · 2021-03-02T16:20:39Z

server/src/test/java/org/apache/druid/server/coordinator/DataSourceCompactionConfigTest.java

@@ -238,7 +234,7 @@ public void testSerdeGranularitySpec() throws IOException
        null,
        new Period(3600),
        null,
-        new UniformGranularitySpec(Granularities.HOUR, null, null),


Does this change have any impact for upgrades / downgrades?

No, it does not. The JSON serialization/deserialization key remains the same.

suneet-s · 2021-03-02T16:27:09Z

integration-tests/src/test/java/org/apache/druid/tests/indexer/ITCompactionTaskTest.java

+        template,
+        "%%GRANULARITY_SPEC%%",
+        jsonMapper.writeValueAsString(granularityMap)
+    );


I think this means that there will always be a granularitySpec provided as part of the compaction task spec. Do we have a test that runs a compaction task without specifying a granularitySpec? I'm trying to think of what happens in an upgrade/downgrade scenario where the compaction job being submitted may not have the granularitySpec as part of the compaction spec.

There are existing compaction ITs that still use a compaction spec (/indexer/wikipedia_compaction_task.json) without a granularitySpec (for example, testCompaction()).

Basically,

    template = StringUtils.replace(
        template,
        "%%GRANULARITY_SPEC%%",
        jsonMapper.writeValueAsString(granularityMap)
    );

will be a no-op for those tests that use a compaction spec without a granularitySpec (/indexer/wikipedia_compaction_task.json).
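
To make the no-op claim concrete, here is a small sketch. It uses the JDK's String.replace as a stand-in for Druid's StringUtils.replace, and the TemplateSketch class and the template strings in the usage note are made up for illustration; they are not the contents of the real spec files.

```java
/**
 * Sketch of literal placeholder substitution. The JDK's String.replace is used
 * here as an assumed stand-in for Druid's StringUtils.replace.
 */
final class TemplateSketch {
  static final String PLACEHOLDER = "%%GRANULARITY_SPEC%%";

  /** Substitutes the serialized granularity spec wherever the placeholder occurs. */
  static String fill(String template, String granularitySpecJson) {
    // String.replace performs literal (non-regex) replacement and returns an
    // equal string when the placeholder does not occur in the template.
    return template.replace(PLACEHOLDER, granularitySpecJson);
  }
}
```

A template such as {"type":"compact"} that has no placeholder comes back unchanged, so tests that use a spec without granularitySpec are unaffected by the substitution step.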

suneet-s · 2021-03-02T16:29:25Z

integration-tests/src/test/java/org/apache/druid/tests/indexer/ITCompactionTaskTest.java

+      checkCompactionIntervals(expectedIntervalAfterCompaction);
+    }
+  }
+


Can you add a test for what happens when someone tries to go from a larger query granularity to a smaller query granularity, like going from data rolled up to YEAR and trying to unroll it to MONTH? I assume the compaction task would fail. What is the error message in this case? Is it clear to the user what the problem is?

That functionality is not changed in this PR. It will be the same as running a reindex task from a larger query granularity to a smaller query granularity. The Compaction task will report the result of the index task (which would be the same as for a reindex task).
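
For context on why going from YEAR back to MONTH cannot restore detail, here is a sketch of timestamp truncation only. QueryGranularitySketch and its methods are hypothetical and use java.time rather than Druid's granularity classes; the sketch says nothing about what the task itself reports in this situation.

```java
import java.time.LocalDate;

/**
 * Hypothetical illustration of timestamp flooring, written with java.time
 * instead of Druid's own granularity classes.
 */
final class QueryGranularitySketch {
  /** Floors a date to the first day of its month (MONTH-like truncation). */
  static LocalDate floorToMonth(LocalDate date) {
    return date.withDayOfMonth(1);
  }

  /** Floors a date to the first day of its year (YEAR-like truncation). */
  static LocalDate floorToYear(LocalDate date) {
    return date.withDayOfYear(1);
  }
}
```

Once two rows from March and July have both been floored to January 1 of the same year, flooring them again to MONTH still yields January 1, so the original months are no longer recoverable from the rolled-up data alone.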

suneet-s · 2021-03-02T16:36:11Z

Assuming the doc changes are coming in a follow-up change, we should make it clear what the tradeoffs are for rolling up data.

If someone accidentally rolls up data to a coarser query granularity (MONTH -> YEAR), do they have any way to get the segments with the finer queryGranularity (MONTH) back?

> Note that the Compaction task ultimately still converts CompactionGranularitySpec to a UniformGranularitySpec when creating the index ingestion spec

I don't understand the granularity specs well enough. What's the impact of using UniformGranularitySpec instead of ArbitraryGranularitySpec? Is this something a user should be aware of?

maytasm · 2021-03-02T19:22:51Z

> Assuming the doc changes are coming in a follow-up change, we should make it clear what the tradeoffs are for rolling up data.
>
> If someone accidentally rolls up data to a coarser query granularity (MONTH -> YEAR), do they have any way to get the segments with the finer queryGranularity (MONTH) back?
>
> > Note that the Compaction task ultimately still converts CompactionGranularitySpec to a UniformGranularitySpec when creating the index ingestion spec
>
> I don't understand the granularity specs well enough. What's the impact of using UniformGranularitySpec instead of ArbitraryGranularitySpec? Is this something a user should be aware of?

The doc change is coming in a separate PR (#10935). I will make it clear that if someone rolls up data to a coarser query granularity (MONTH -> YEAR), the segments with the finer queryGranularity (MONTH) will be overshadowed. Those segments may be removed from deep storage if a kill task is run on those intervals. Hence, the user can lose the data with the finer queryGranularity (MONTH).

Regarding UniformGranularitySpec vs. ArbitraryGranularitySpec: this was a design choice of the existing implementation. It was not changed in this PR. It is not something a user should be aware of. UniformGranularitySpec works much better in this case, as Druid will automatically bucket the interval according to the segmentGranularity. An ArbitraryGranularitySpec requires you to explicitly list all bucket intervals.
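
The difference can be pictured with a small sketch of uniform bucketing. UniformBucketSketch and monthlyBuckets are hypothetical names, and the code uses java.time rather than Druid's implementation; it only shows the idea of deriving buckets from a granularity instead of listing them by hand.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical illustration of uniform bucketing: month-sized buckets are
 * derived from a single interval instead of being listed one by one.
 */
final class UniformBucketSketch {
  /** Returns "start/end" strings for each month bucket overlapping [start, end). */
  static List<String> monthlyBuckets(LocalDate start, LocalDate end) {
    List<String> buckets = new ArrayList<>();
    LocalDate cursor = start.withDayOfMonth(1); // align to the month boundary
    while (cursor.isBefore(end)) {
      LocalDate next = cursor.plusMonths(1);
      buckets.add(cursor + "/" + next);
      cursor = next;
    }
    return buckets;
  }
}
```

With an arbitrary-style spec, the caller would instead have to supply each of those "start/end" intervals explicitly.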

maytasm added 8 commits February 18, 2021 14:20

add query granularity to compaction task

3c7f368

fix checkstyle

bf5f497

fix checkstyle

3188e97

fix test

40bc1d0

fix test

f2dadfb

add tests

c08a9ed

fix test

844e404

fix test

d73a845

loquisgon reviewed Feb 22, 2021

maytasm added 4 commits February 26, 2021 13:17

cleanup

4d0224d

rename class

c4668c6

fix test

ecd4189

fix test

74c4f0e

jon-wei approved these changes Mar 2, 2021

maytasm added 2 commits March 1, 2021 23:40

add test

da3cf03

fix test

e8aacf3

suneet-s reviewed Mar 2, 2021

suneet-s added Release Notes Area - Ingestion labels Mar 2, 2021

suneet-s reviewed Mar 2, 2021

maytasm merged commit b7b0ee8 into apache:master Mar 2, 2021

maytasm deleted the IMPLY-5800 branch March 2, 2021 19:23

maytasm mentioned this pull request Mar 2, 2021

Add javadoc and test for Granularity configs in Compaction / Auto Compaction #10938

Merged

9 tasks

techdocsmith mentioned this pull request Mar 4, 2021

First refactor of compaction docs #10935

Merged

2 tasks

clintropolis added this to the 0.22.0 milestone Aug 12, 2021

clintropolis mentioned this pull request Sep 3, 2021

[Draft] 0.22.0 Release Notes #11657

Closed

maytasm commented Feb 19, 2021 • edited
loquisgon Feb 22, 2021 • edited
loquisgon Feb 23, 2021
maytasm Feb 23, 2021
maytasm Feb 23, 2021
maytasm Feb 23, 2021
loquisgon Feb 23, 2021
maytasm Feb 26, 2021
jon-wei commented Mar 2, 2021
suneet-s Mar 2, 2021 • edited
maytasm Mar 2, 2021
suneet-s Mar 2, 2021
maytasm Mar 2, 2021
suneet-s Mar 2, 2021
maytasm Mar 2, 2021
suneet-s Mar 2, 2021
maytasm Mar 2, 2021
maytasm Mar 2, 2021
suneet-s Mar 2, 2021
maytasm Mar 2, 2021
suneet-s commented Mar 2, 2021
maytasm commented Mar 2, 2021 • edited