Keep query granularity of compacted segments after compaction #10856

loquisgon · 2021-02-05T02:01:42Z

Current behavior

When two or more segments are compacted, the query granularity of the new compacted segment is null, regardless of the query granularities of the segments that were compacted.

Expected behavior

When two or more segments are compacted, the query granularity of the new compacted segment should reflect the query granularities of the segments that were compacted. If all the compacted segments had the same query granularity, the compacted segment will have that granularity, provided at least one segment's granularity is non-null. If the compacted segments had different query granularities, the compacted segment will have the finest of them. The existing method of class Granularities, List<Granularity> granularitiesFinerThan(final Granularity gran0), skips the NONE and ALL granularities, so we will write a new Comparator that includes NONE and ALL. In particular, if at least one segment has NONE, the resulting granularity of the newly created compacted segment will also be NONE, which avoids destroying data.
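The selection rule described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `SketchGranularity` and `FinestGranularitySketch` are hypothetical stand-ins for Druid's classes, the enum lists only a few granularities, and it assumes that NONE is the finest value, ALL is the coarsest, and that an all-null input yields no granularity.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Objects;
import java.util.Optional;

// Hypothetical stand-in for Druid's Granularity; not the real class.
// Declared from finest to coarsest so that ordinal() can serve as the sort key.
enum SketchGranularity {
  NONE, SECOND, MINUTE, HOUR, DAY, MONTH, YEAR, ALL
}

final class FinestGranularitySketch {
  // Orders finer granularities before coarser ones. NONE and ALL take part
  // in the ordering instead of being skipped.
  static final Comparator<SketchGranularity> IS_FINER_THAN =
      Comparator.comparingInt(SketchGranularity::ordinal);

  // Returns the finest non-null granularity among the input segments,
  // or Optional.empty() when every segment's granularity is null (assumed behavior).
  static Optional<SketchGranularity> finest(List<SketchGranularity> segmentGranularities) {
    return segmentGranularities.stream()
        .filter(Objects::nonNull)
        .min(IS_FINER_THAN);
  }
}
```

With this sketch, compacting segments whose granularities are DAY, null, and MINUTE would select MINUTE, and any input containing NONE would select NONE.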

Reasoning of why we decided the expected behavior

When the compacted segments have the same query granularity, the expected behavior is uncontroversial. However, when the query granularities of the compacted segments differ, there are several choices. One choice is to pick the coarsest granularity. We decided against this because it is a destructive operation on some records of the compacted segments. Another choice is to use a configuration-dependent flag. We decided against this to give ourselves more time to learn about the data lifecycle management use cases. We will revisit this decision at a later point.

Impact on existing documentation

The new behavior needs to be documented. In particular, if the segments that were compacted had different granularities, it needs to be explained that the “finest” non-null granularity was chosen. It also needs to be documented that this choice may cause some records that previously had a coarser query granularity to appear to have “spikes” (since the whole segment now has a finer granularity).
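The “spikes” effect can be illustrated with a small, simplified sketch. It is not Druid code: `SpikeSketch` and its sample timestamps are hypothetical, and each list entry stands for one ingested event with the timestamp it carries after ingestion. Events ingested at DAY granularity were already truncated to midnight, so once the compacted segment is treated as MINUTE granularity they all land in a single minute bucket.

```java
import java.time.Instant;
import java.time.temporal.ChronoUnit;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class SpikeSketch {
  // Counts events per minute bucket, roughly the way a query at MINUTE
  // granularity would group them.
  static Map<Instant, Integer> countPerMinute(List<Instant> eventTimestamps) {
    Map<Instant, Integer> counts = new TreeMap<>();
    for (Instant ts : eventTimestamps) {
      counts.merge(ts.truncatedTo(ChronoUnit.MINUTES), 1, Integer::sum);
    }
    return counts;
  }

  // Hypothetical contents of a compacted segment built from one DAY segment
  // and one MINUTE segment.
  static List<Instant> sampleEvents() {
    return Arrays.asList(
        // From the DAY segment: timestamps were truncated to midnight at ingestion.
        Instant.parse("2021-02-05T00:00:00Z"),
        Instant.parse("2021-02-05T00:00:00Z"),
        Instant.parse("2021-02-05T00:00:00Z"),
        // From the MINUTE segment: timestamps keep minute precision.
        Instant.parse("2021-02-05T10:15:00Z"),
        Instant.parse("2021-02-05T10:16:00Z"));
  }
}
```

Calling `SpikeSketch.countPerMinute(SpikeSketch.sampleEvents())` groups the three DAY-granularity events into the midnight bucket, while the MINUTE-granularity events stay spread across their own buckets. That concentration at midnight is the kind of spike the documentation should explain.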

loquisgon · 2021-02-05T02:12:35Z

indexing-service/src/main/java/org/apache/druid/indexing/common/task/CompactionTask.java

    for (NonnullPair<QueryableIndex, DataSegment> pair : queryableIndexAndSegments) {
      final QueryableIndex index = pair.lhs;
      if (index.getMetadata() == null) {
        throw new RE("Index metadata doesn't exist for segment[%s]", pair.rhs.getId());
      }
+      // carry-overs (i.e. query granularity & rollup) are valid iff they are the same in every segment:


This is a leftover comment. I will remove it.

maytasm

Can you please also add an integration test that runs compaction and then issues SegmentMetadata queries to verify that queryGranularity is not null and matches what is expected?

maytasm · 2021-02-10T07:57:32Z

core/src/main/java/org/apache/druid/java/util/common/granularity/Granularity.java

@@ -40,6 +41,30 @@

 public abstract class Granularity implements Cacheable
 {
+
+  /**


nit: is javadoc meant to be on the compare method instead of the IS_FINER_THAN variable?

moved to method

Added the IT test as requested above

maytasm · 2021-02-10T08:06:09Z

core/src/test/java/org/apache/druid/java/util/common/GranularityTest.java

+    Assert.assertTrue(Granularity.IS_FINER_THAN.compare(NONE, MINUTE) < 0);
+    Assert.assertTrue(Granularity.IS_FINER_THAN.compare(MINUTE, NONE) > 0);
+    Assert.assertTrue(Granularity.IS_FINER_THAN.compare(DAY, MONTH) < 0);
+    Granularity day = DAY;


nit: why is this a variable?

Can you add
Assert.assertTrue(Granularity.IS_FINER_THAN.compare(NONE, NONE) == 0);
Assert.assertTrue(Granularity.IS_FINER_THAN.compare(ALL, ALL) == 0);
too?

I had to create a variable because the comparator complained that it was being used against itself...

Added a comment to explain why a variable is needed

maytasm · 2021-02-10T08:13:27Z

indexing-service/src/main/java/org/apache/druid/indexing/common/task/CompactionTask.java

        dataSource,
-        new TimestampSpec(null, null, null),
+        new TimestampSpec(null, null, null
+        ),


nit: remove newline

suneet-s · 2021-02-17T19:03:14Z

Added the Release Notes label. I'm unsure whether this needs to be called out, but the release manager can decide whether this behavior should be described in the release notes or just in the docs.

Impact on existing documentation

The new behavior needs to be documented. In particular, if the segments that were compacted had different granularities, it needs to be explained that the “finest” non-null granularity was chosen. It also needs to be documented that this choice may cause some records that previously had a coarser query granularity to appear to have “spikes” (since the whole segment now has a finer granularity).

@loquisgon Which docs do you think should be updated? Could you include these doc updates in this PR, or create a follow-up issue so we don't lose track of the update?

cc @techdocsmith since you've been looking at docs more holistically recently

techdocsmith · 2021-02-17T19:09:49Z

Thanks @suneet-s. @loquisgon, if you want to file a separate issue for the docs, I can work on it based on the changes we discussed. If you want to keep them in this PR, we can collaborate on it that way, too. I'm open.

loquisgon · 2021-02-17T23:35:42Z

@suneet-s @techdocsmith I created an issue to track the doc changes: #10897

maytasm requested changes Feb 10, 2021

maytasm approved these changes Feb 17, 2021

Agustin Gonzalez added 8 commits February 17, 2021 09:44

Keep query granularity of compacted segments after compaction

1de6bb9

Protect against null isRollup

c378159

Fix bugspot check RC_REF_COMPARISON_BAD_PRACTICE_BOOLEAN & edit an existing comment

d510345

Make sure that NONE is also included when comparing for the finer granularity

4a143ad

Update integration test check for segment size due to query granularity propagation affecting size

d076ef3

Minor code cleanup

e27aaf9

Added functional test to verify queryGranlarity after compaction

5ab5f3b

Minor style fix

0b4b91f

loquisgon force-pushed the preserve-qg branch from 8e868fb to 0b4b91f Compare February 17, 2021 18:04

suneet-s added the Area - Ingestion and Release Notes labels Feb 17, 2021

Update unit tests

1c747d7

loquisgon mentioned this pull request Feb 17, 2021

Update docs related to update query granularity changes #10897

Closed

maytasm merged commit eabad0f into apache:master Feb 18, 2021

loquisgon deleted the preserve-qg branch February 18, 2021 16:58

techdocsmith mentioned this pull request Mar 2, 2021

First refactor of compaction docs #10935

Merged

2 tasks

clintropolis added this to the 0.22.0 milestone Aug 12, 2021

clintropolis mentioned this pull request Sep 3, 2021

[Draft] 0.22.0 Release Notes #11657

Closed
