Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add query granularity to compaction task #10900

Merged
merged 14 commits into from
Mar 2, 2021
Merged

Conversation

maytasm
Copy link
Contributor

@maytasm maytasm commented Feb 19, 2021

Add query granularity to compaction task

Description

Add query granularity to compaction task. Note that query granularity is still not supported in auto compaction.
This PR also creates a new class CompactionGranularitySpec to use instead of GranularitySpec when passing query granularity and segment granularity into compaction task. This allows the null value (value not given by the user) to represent using the original current query granularity and segment granularity (this is the existing behavior for compaction task). Note that the Compaction task ultimately still converts CompactionGranularitySpec to a UniformGranularitySpec when creating the index ingestion spec

This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added or updated version, license, or notice information in licenses.yaml
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • added integration tests.
  • been tested in a test Druid cluster.


import java.util.Objects;

public class CompactionGranularitySpec
Copy link

@loquisgon loquisgon Feb 22, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This name is confusing to me since this class contains Granularitys not GranularitySpecs. Maybe CompactionGranularities is a better name? Also the introduction of this shallow class seems like it is adding to the cognitive load of the reader...was there other cleaner way to support the "null" case out of which this class was created?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not simply pass QueryGranularity (similar to SegmentGranularity already being added) to CompactionTask rather than creating a whole new class (CompactionGranularitySpec)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This class is a GranularitySpec that is only applicable for Auto-Compaction/Compaction task. It contains more than just a Granularity. Similar to UniformGranularitySpec, it contains segmentGranularity and queryGranularity. In my next PR which will adds rollup to Auto-Compaction/Compaction task, this class will also contains option to enable/disable rollup. It is different to UniformGranularitySpec that it doesn't support inputIntervals. There is enough distinction between GranularitySpec for Compaction task than normal Index task that I decided to create a new GranularitySpec class for Compaction task.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This class will not only contains Granularity hence, it is named CompactionGranularitySpec. I do not think it adds much cognitive load of the reader since it is a very simple POJO. I think it would be much more cognitive load to try to fit the existing UniformGranularitySpec here since it does not fit well with what Compaction task supports / doesn't supports

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SegmentGranularity that is in CompactionTask class was already Deprecated and will be removed in #10912. This is in favor of combining SegmentGranularity and QueryGranularity and other Compaction task granularity-related configs (such as rollup which will be added in my next PR) in CompactionGranularitySpec class

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is an opportunity here to unify the "granularity spec" design. The same design (i.e. class model) could be used for ingestion & compaction, unified. That may require refactoring of the "GranularitySpec" interface/class hierarchy though.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed this class as it is not needed. We already have ClientCompactionTaskGranularitySpec class which is use for passing around the Compaction task's ingestionSpec's granularities (segmentGranularity, queryGranularity).

@jon-wei
Copy link
Contributor

jon-wei commented Mar 2, 2021

LGTM after CI


import java.util.Objects;

public class ClientCompactionTaskQueryGranularitySpec
public class ClientCompactionTaskGranularitySpec
Copy link
Contributor

@suneet-s suneet-s Mar 2, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a javadoc for this please. Why is this needed? Why we haven't chosen to use the GranularitySpec class that already exists? Whats the relationship between this and UserCompactionTaskGranularityConfig

I'm ok if you do this in a follow up change.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll add in a followup change


import java.util.Objects;

public class UserCompactionTaskGranularityConfig
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

javadocs please

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll add in a followup change

@@ -238,7 +234,7 @@ public void testSerdeGranularitySpec() throws IOException
null,
new Period(3600),
null,
new UniformGranularitySpec(Granularities.HOUR, null, null),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this change have any impact for upgrades / downgrades?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No it does not. The json serialization / deserialization key remains the same

template,
"%%GRANULARITY_SPEC%%",
jsonMapper.writeValueAsString(granularityMap)
);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this means that there will always be a granulartySpec provided as part of the compaction task spec. Do we have a test running a compact task without specifying a granularitySpec. I'm trying to think of what happens in an upgrade/downgrade scenario where the compaction job being submitted may not have the granularitySpec as part of the compaction spec

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are compaction IT (existing ones) still use compaction spec (/indexer/wikipedia_compaction_task.json) without granularitySpec (for example, testCompaction()).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Basically,

    template = StringUtils.replace(
        template,
        "%%GRANULARITY_SPEC%%",
        jsonMapper.writeValueAsString(granularityMap)
    );

will be a no-op for those tests that uses compaction spec without granularitySpec (/indexer/wikipedia_compaction_task.json)

checkCompactionIntervals(expectedIntervalAfterCompaction);
}
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a test for what happens when someone tries to go from a larger query granularity to a smaller query granularity - like going from data rolled up to the YEAR, and trying to unroll it to MONTH. I assume the compaction task would fail. What is the error message in this case. Is it clear to the user what the problem is?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That functionality is not changed in this PR. It will be the same as doing a reindex task with larger query granularity to a smaller query granularity. Compaction task will reports the result of the index task (which would be the same as doing a reindex task)

@suneet-s
Copy link
Contributor

suneet-s commented Mar 2, 2021

Assuming the doc changes are coming in a follow up change - we should make it clear what the tradeoffs are for rolling up data.

If someone accidentally rolls up data to a coarser query granularity (MONTH -> YEAR) - do they have any way to get the segments with the finer queryGranularity back (MONTH)?

Note that the Compaction task ultimately still converts CompactionGranularitySpec to a UniformGranularitySpec when creating the index ingestion spec

I don't understand the granularity specs enough. What's the impact of using UniformGranularitySpec instead of ArbitraryGranularitySpec? Is this something a user should be aware of?

@maytasm
Copy link
Contributor Author

maytasm commented Mar 2, 2021

Assuming the doc changes are coming in a follow up change - we should make it clear what the tradeoffs are for rolling up data.

If someone accidentally rolls up data to a coarser query granularity (MONTH -> YEAR) - do they have any way to get the segments with the finer queryGranularity back (MONTH)?

Note that the Compaction task ultimately still converts CompactionGranularitySpec to a UniformGranularitySpec when creating the index ingestion spec

I don't understand the granularity specs enough. What's the impact of using UniformGranularitySpec instead of ArbitraryGranularitySpec? Is this something a user should be aware of?

The doc change is coming in a separate PR (#10935). I will make it clear that If someone rolls up data to a coarser query granularity (MONTH -> YEAR) -the segment with finer queryGranularity (MONTH) will be overshadowed. Those segments may be remove from deep storage if a kill task is run on those intervals. Hence, user can lose data with finer queryGranularity (MONTH).

Regarding UniformGranularitySpec vs. ArbitraryGranularitySpec. This was a design choice of existing implementation. IT was not changed in this PR. It is not something a user should be aware of. UniformGranularitySpec works much better in this case as Druid will automatically buckets down the interval according to the segmentGranularity. A ArbitraryGranularitySpec requires you to explicitly list out all bucket intervals.

@maytasm maytasm merged commit b7b0ee8 into apache:master Mar 2, 2021
@maytasm maytasm deleted the IMPLY-5800 branch March 2, 2021 19:23
@clintropolis clintropolis added this to the 0.22.0 milestone Aug 12, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants