Segment pruning for multi-dim partitioning given query domain #12046

Merged
abhishekagarwal87 merged 7 commits into apache:master from AmatyaAvadhanula:feature-multidim_pruning
Dec 17, 2021
@AmatyaAvadhanula AmatyaAvadhanula commented Dec 9, 2021

Description

Segment pruning for multi-dim partitioning for a given query

DimensionRangeShardSpec#possibleInDomain has been modified to enhance pruning when multi-dim partitioning is used.

Idea

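While iterating through the partition dimensions in order, compute the effective domain for each one (the query domain intersected with the values the segment permits for that dimension):

  1. If the effective domain is empty, i.e. the query domain doesn't overlap with the set of permissible values in the segment, the segment is pruned.
  2. If the overlap lies only on a segment boundary (start or end), consider the next dimension.
  3. If the overlap includes values strictly within the segment boundaries, the segment cannot be pruned.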
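
The three steps above can be sketched in Python as follows. This is only an illustration, not the Java code in DimensionRangeShardSpec: it assumes integer dimension values, models each dimension's query domain as a finite set, and treats both start and end as inclusive (as the example in this description does). The function name `possible_in_domain` and the flags `at_start` / `at_end` are hypothetical.

```python
def possible_in_domain(start, end, query_domain):
    """Return False if the segment [start, end] can be pruned for the query.

    start, end   -- tuples with one value per partition dimension (inclusive)
    query_domain -- list of sets, one set of requested values per dimension
    """
    # True while every dimension seen so far is pinned to the start / end value.
    at_start = True
    at_end = True

    for lo, hi, allowed in zip(start, end, query_domain):
        # A bound only applies while the prefix still sits on that boundary.
        effective = {
            v for v in allowed
            if (not at_start or v >= lo) and (not at_end or v <= hi)
        }

        # Step 1: no overlap on this dimension, so the segment is pruned.
        if not effective:
            return False

        # Step 2: the overlap is exactly a boundary value, so keep checking.
        at_start = at_start and effective == {lo}
        at_end = at_end and effective == {hi}

        # Step 3: the overlap reaches inside the segment, so it cannot be pruned.
        if not at_start and not at_end:
            return True

    # Every dimension stayed on a boundary and none was empty.
    return True
```

For instance, `possible_in_domain((3, 25, 10), (5, 15, 30), [{5}, set(range(15, 41)), {50}])` walks all three dimensions and returns `False`, since the last effective domain is empty. Requiring the overlap to be exactly the boundary value keeps the sketch conservative: any other non-empty overlap is accepted without looking at later dimensions.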

Example

Index on (Hour, Minute, Second). Index.size is 3
I)
start = (3, 25, 10)
end = (5, 10, 30)
query domain = {3} * [0, 10] * {10, 20, 30, 40}
EffectiveDomain[:1] == {3} == start[:1]
EffectiveDomain[:2] == {3} * ([0, 10] INTERSECTION [25, INF))
== {} -> PRUNE

II)
start = (3, 25, 10)
end = (5, 15, 30)
query domain = {4} * [0, 10] * {10, 20, 30, 40}
EffectiveDomain[:1] == {4} (!= {} && != {start[:1]} && != {end[:1]}) -> ACCEPT

III)
start = (3, 25, 10)
end = (5, 15, 30)
query domain = {5} * [0, 10] * {10, 20, 30, 40}
EffectiveDomain[:1] == {5} == end[:1]
EffectiveDomain[:2] == {5} * ([0, 10] INTERSECTION (-INF, 15])
== {5} * [0, 10] (!= {} && != {end[:2]}) -> ACCEPT

IV)
start = (3, 25, 10)
end = (5, 15, 30)
query domain = {5} * [15, 40] * {10, 20, 30, 40}
EffectiveDomain[:1] == {5} == end[:1]
EffectiveDomain[:2] == {5} * ([15, 40] INTERSECTION (-INF, 15])
== {5} * {15} == {end[:2]}
EffectiveDomain[:3] == {5} * {15} * ({10, 20, 30, 40} INTERSECTION (-INF, 30])
== {5} * {15} * {10, 20, 30} != {} -> ACCEPT

V)
start = (3, 25, 10)
end = (5, 15, 30)
query domain = {5} * [15, 40] * {50}
EffectiveDomain[:1] == {5} == end[:1]
EffectiveDomain[:2] == {5} * ([15, 40] INTERSECTION (-INF, 15])
== {5} * {15} == {end[:2]}
EffectiveDomain[:3] == {5} * {15} * ({50} INTERSECTION (-INF, 30])
== {5} * {15} * {}
== {} -> PRUNE
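
The expected outcomes of cases I to V can be cross-checked with a brute-force comparison. The sketch below is not how the PR prunes segments. It enumerates every (Hour, Minute, Second) tuple in the query domain and uses Python's lexicographic tuple ordering to test whether any tuple falls between start and end. It assumes integer values and inclusive bounds on both sides, and the names `overlaps`, `CASES` and `results` are hypothetical.

```python
from itertools import product


def overlaps(start, end, query_domain):
    """Brute force: does any tuple of the query domain lie in [start, end]?"""
    # Python compares tuples lexicographically, matching the index order.
    return any(start <= t <= end for t in product(*query_domain))


SECONDS = {10, 20, 30, 40}

# (start, end, query domain per dimension) for the five cases above.
CASES = {
    "I": ((3, 25, 10), (5, 10, 30), [{3}, set(range(0, 11)), SECONDS]),
    "II": ((3, 25, 10), (5, 15, 30), [{4}, set(range(0, 11)), SECONDS]),
    "III": ((3, 25, 10), (5, 15, 30), [{5}, set(range(0, 11)), SECONDS]),
    "IV": ((3, 25, 10), (5, 15, 30), [{5}, set(range(15, 41)), SECONDS]),
    "V": ((3, 25, 10), (5, 15, 30), [{5}, set(range(15, 41)), {50}]),
}

results = {
    name: "ACCEPT" if overlaps(*case) else "PRUNE"
    for name, case in CASES.items()
}

for name, outcome in results.items():
    print(name, outcome)
# I PRUNE
# II ACCEPT
# III ACCEPT
# IV ACCEPT
# V PRUNE
```

Enumerating tuples is only practical for small finite domains like these. The dimension-by-dimension check avoids the enumeration, at the cost of sometimes accepting a segment that an exact check would prune.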


This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added or updated version, license, or notice information in licenses.yaml
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • added integration tests.
  • been tested in a test Druid cluster.

@AmatyaAvadhanula AmatyaAvadhanula marked this pull request as draft December 9, 2021 13:19
lgtm-com bot commented Dec 9, 2021

This pull request introduces 2 alerts when merging b70c0c9 into 6ac4e2d - view on LGTM.com

new alerts:

  • 2 for Dereferenced variable may be null

@AmatyaAvadhanula AmatyaAvadhanula marked this pull request as ready for review December 15, 2021 07:25
kfaraz commented Dec 15, 2021

Logic looks good to me.

@abhishekagarwal87 abhishekagarwal87 merged commit c0b1514 into apache:master Dec 17, 2021
@abhishekagarwal87

Merged the change since failure is unrelated and code change doesn't affect the phase 2 tests.

@abhishekagarwal87

Thank you @AmatyaAvadhanula for your first contribution.

@abhishekagarwal87 abhishekagarwal87 added this to the 0.23.0 milestone May 11, 2022
5 participants