Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Keep query granularity of compacted segments after compaction #10856

Merged
merged 9 commits into from
Feb 18, 2021

Conversation

loquisgon
Copy link

@loquisgon loquisgon commented Feb 5, 2021

Current behavior

When two or more segments are compacted the new compacted segment’s query granularity is null regardless of the query granularities of the segments that were compacted.

Expected behavior

When two or more segments are compacted the new compacted segment’s query granularity should reflect the query granularities of the segments that were compacted. If all the segments that were compacted had the same query granularity then the compacted segment will have the same query granularity when at least one segment’s granularity is non-null. If the compacted segments had different query granularities then the compacted segment will have the finer of all the granularities. The existing method of class Granularities: List granularitiesFinerThan(final Granularity gran0) skips NONE and ALL granularities so we will write a new Comparator that includes NONE and ALL. In particular, If at least one segments has NONE then the resulting granularity for the newly created, compacted segment, will also be NONE thus avoiding destructing data.

Reasoning of why we decided the expected behavior

When the compacted segments have the same query granularity the expected behavior makes sense without controversy. However when the query granularities of the segments that were compacted are different there are various choices. One choice is to pick the coarsest granularity. We decided against this because this is a destructive operation on some records of the segments that were compacted. Another choice is to use a configuration dependent flag. We decided against this so we give ourselves more time to learn about the data lifecycle management use cases. We will revisit this decision at a later point.

Impact on existing documentation

The new behavior needs to be documented. In particular if the segments that were compacted had different granularities it needs to be explained that the “finest” non-null granularity was chosen. It also needs to be documented that this choice may cause some records that previously had a coarsest query granularity to appear to have “spikes” (since now the whole segment has a finer granularity).

for (NonnullPair<QueryableIndex, DataSegment> pair : queryableIndexAndSegments) {
final QueryableIndex index = pair.lhs;
if (index.getMetadata() == null) {
throw new RE("Index metadata doesn't exist for segment[%s]", pair.rhs.getId());
}
// carry-overs (i.e. query granularity & rollup) are valid iff they are the same in every segment:
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a left over comment... I will remove it

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Copy link
Contributor

@maytasm maytasm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please also add integration test that do compaction that calls SegmentMetadata queries to verify that queryGranularity is not null and matches what is expected

@@ -40,6 +41,30 @@

public abstract class Granularity implements Cacheable
{

/**
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: is javadoc meant to be on the compare method instead of the IS_FINER_THAN variable?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

moved to method

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added the IT test as requested above

Assert.assertTrue(Granularity.IS_FINER_THAN.compare(NONE, MINUTE) < 0);
Assert.assertTrue(Granularity.IS_FINER_THAN.compare(MINUTE, NONE) > 0);
Assert.assertTrue(Granularity.IS_FINER_THAN.compare(DAY, MONTH) < 0);
Granularity day = DAY;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: why is this a variable?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add
Assert.assertTrue(Granularity.IS_FINER_THAN.compare(NONE, NONE)
Assert.assertTrue(Granularity.IS_FINER_THAN.compare(ALL, ALL)
too?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had to create a variable because the comparator complained that it was being used against itself...

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a comment to explain why a variable is needed

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added

dataSource,
new TimestampSpec(null, null, null),
new TimestampSpec(null, null, null
),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: remove newline

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@suneet-s
Copy link
Contributor

suneet-s commented Feb 17, 2021

Added Release Notes I'm unsure if this needs to be called out, but the release manager can decide if this behavior needs to be described in the release notes or just the docs.

Impact on existing documentation

The new behavior needs to be documented. In particular if the segments that were compacted had different granularities it needs to be explained that the “finest” non-null granularity was chosen. It also needs to be documented that this choice may cause some records that previously had a coarsest query granularity to appear to have “spikes” (since now the whole segment has a finer granularity).

@loquisgon Which docs do you think should be updated? Could you include these doc updates in this PR or create a follow up issue so we don't lose track of the update.

cc @techdocsmith since you've been looking at docs more holistically recently

@techdocsmith
Copy link
Contributor

Thanks @suneet-s . @loquisgon , if you want to file a separate issue for the docs, I can work on it based upon the changes we discussed. If you want to keep them in this PR we can collaborate on it that way, too. I'm open.

@loquisgon
Copy link
Author

@suneet-s @techdocsmith I created an issue to track the doc changes: #10897

@maytasm maytasm merged commit eabad0f into apache:master Feb 18, 2021
@loquisgon loquisgon deleted the preserve-qg branch February 18, 2021 16:58
@clintropolis clintropolis added this to the 0.22.0 milestone Aug 12, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants