MSQ: Add limitHint to global-sort shuffles. by gianm · Pull Request #16911 · apache/druid

gianm · 2024-08-16T17:32:30Z

This allows pushing down limits into the SuperSorter.

.../multi-stage-query/src/main/java/org/apache/druid/msq/kernel/LimitHintJsonIncludeFilter.java

+ * and also requires spotbugs exclusions (see spotbugs-exclude.xml).
+ */
+@SuppressWarnings({"EqualsAndHashcode", "EqualsHashCode"})
+public class LimitHintJsonIncludeFilter


LakshSingla

WDYT of pushing this limit hint into the InputChannelsImpl#openSorted (perhaps in a future patch)? This will also reduce the amount of data a single worker would have to read per partition.

LakshSingla · 2024-08-18T15:41:22Z

processing/src/main/java/org/apache/druid/frame/processor/SuperSorter.java

+      } else if (rowLimit == 0 && activeProcessors == 0) {
+        // We had a row limit, and got it all the way down to zero.
+        // Generate empty output channels for any partitions that we haven't written yet.
+        superSorterProgressTracker.markTriviallyComplete();


If there was a row limit initially and we brought it down to zero, should it be called trivially completed? Trivially completed means that there wasn't any data to begin with.

Oh, yeah, you're right. I changed this to instead call addMergedBatchesForLevel to "fill in" the empty channels.

LakshSingla · 2024-08-18T15:46:38Z

...e/multi-stage-query/src/main/java/org/apache/druid/msq/querykit/groupby/GroupByQueryKit.java

Can a similar optimisation be applied to ScanQueryKit as well?

Good point. I just added it. It's only there for the case where the scan query requires a sort. In theory it could be there for the non-sort case too, but in this patch only the sorting shuffle spec has a limit hint. I think we could add limitHint to other specs in the future.

gianm · 2024-08-20T06:52:55Z

> WDYT of pushing this limit hint into the InputChannelsImpl#openSorted (perhaps in a future patch)? This will also reduce the amount of data a single worker would have to read per partition.

I do think that'd make sense as a followup.

LakshSingla · 2024-08-20T12:58:45Z

extensions-core/multi-stage-query/src/main/java/org/apache/druid/msq/kernel/ShuffleSpec.java

+   *
+   * Implementations may also ignore this hint completely, or may apply a limit that is somewhat higher than this hint.
+   */
+  default long limitHint()


I was wondering if there's any merit in separating the limit hint into a POJO, and having both limit and offset baked into it. Something like DefaultLimitSpec without the columns. It's up to the reader to calculate the combined limit and use that, instead of baking that knowledge into a long.

I was thinking of use cases where we want to push this limit down into other portions of MSQ's stack and need to distinguish between rows [0, offset) (thrown away) and rows [offset, offset + limit) (kept).

I don't think offset can be pushed down? Limit can be pushed down when sorting because if some row is in the top N globally, it must also be in the first N rows of whichever partition it appears in.

But with offset, if for example you have LIMIT N OFFSET M, we can't push down OFFSET M (i.e. skip M rows). It is possible that some of the first M rows in partition A still need to appear in the final resultset, perhaps because some of them are greater than any of the first M rows in the globally-sorted result. So the best we can do is push down LIMIT N + M.
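
The argument above can be checked with a small, self-contained sketch. It is not Druid code: `LimitPushdownSketch` and its methods are hypothetical names, and plain integer lists stand in for partitions. Under those assumptions it shows that truncating each sorted partition to N + M rows preserves the result of LIMIT N OFFSET M, while skipping M rows inside each partition does not.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Illustrative only; none of these names exist in Druid.
class LimitPushdownSketch
{
  // Per-partition step: sort, then keep at most `limitHint` rows.
  static List<Integer> sortAndTruncate(List<Integer> partition, int limitHint)
  {
    List<Integer> sorted = new ArrayList<>(partition);
    Collections.sort(sorted);
    return new ArrayList<>(sorted.subList(0, Math.min(limitHint, sorted.size())));
  }

  // Safe pushdown: each partition keeps its first (limit + offset) rows,
  // and OFFSET is applied only after the global merge.
  static List<Integer> limitThenOffset(List<List<Integer>> partitions, int limit, int offset)
  {
    List<Integer> merged = new ArrayList<>();
    for (List<Integer> partition : partitions) {
      merged.addAll(sortAndTruncate(partition, limit + offset));
    }
    Collections.sort(merged);
    int from = Math.min(offset, merged.size());
    int to = Math.min(offset + limit, merged.size());
    return new ArrayList<>(merged.subList(from, to));
  }

  // Unsafe pushdown, for comparison: skipping `offset` rows inside each
  // partition discards rows that belong in the final result.
  static List<Integer> wrongOffsetPushdown(List<List<Integer>> partitions, int limit, int offset)
  {
    List<Integer> merged = new ArrayList<>();
    for (List<Integer> partition : partitions) {
      List<Integer> sorted = new ArrayList<>(partition);
      Collections.sort(sorted);
      merged.addAll(sorted.subList(Math.min(offset, sorted.size()), sorted.size()));
    }
    Collections.sort(merged);
    return new ArrayList<>(merged.subList(0, Math.min(limit, merged.size())));
  }

  public static void main(String[] args)
  {
    List<List<Integer>> partitions = Arrays.asList(
        Arrays.asList(1, 3, 5, 7, 9, 11),
        Arrays.asList(2, 4, 6, 8, 10, 12)
    );
    // LIMIT 2 OFFSET 2 over the global order 1..12 should yield [3, 4].
    System.out.println(limitThenOffset(partitions, 2, 2));     // prints [3, 4]
    System.out.println(wrongOffsetPushdown(partitions, 2, 2)); // prints [5, 6]
  }
}
```

In the unsafe variant, rows 3 and 4 are each among the first two rows of their own partition, so a per-partition skip of two rows drops both of them.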

LakshSingla · 2024-08-30T04:16:44Z

.../multi-stage-query/src/main/java/org/apache/druid/msq/kernel/LimitHintJsonIncludeFilter.java

For my understanding, how does this work?

It's this API: https://fasterxml.github.io/jackson-annotations/javadoc/2.9/com/fasterxml/jackson/annotation/JsonInclude.Include.html#CUSTOM

> Value that indicates that separate filter Object (specified by JsonInclude.valueFilter() for value itself, and/or JsonInclude.contentFilter() for contents of structured types) is to be used for determining inclusion criteria. Filter object's equals() method is called with value to serialize; if it returns true value is excluded (that is, filtered out); if false value is included.

It's kind of goofy, but it's the only tool Jackson provides us for keeping the serialized JSON clean other than "include non-null", "include non-default", and "include non-empty".
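
A minimal sketch of such a filter follows. It is not the `LimitHintJsonIncludeFilter` from this patch: the class name `LimitHintFilterSketch` and the sentinel value `-1` for "no limit" are assumptions made for illustration. The Jackson annotation appears only in a comment so that the block compiles without Jackson on the classpath.

```java
// Hypothetical filter class; the sentinel value is an assumption for illustration.
class LimitHintFilterSketch
{
  static final long UNLIMITED = -1;

  // Jackson calls equals(value) on a filter instance with the value about to be
  // serialized. Returning true means "exclude this property from the JSON".
  @Override
  public boolean equals(Object obj)
  {
    return obj instanceof Long && (Long) obj == UNLIMITED;
  }

  @Override
  public int hashCode()
  {
    return 0;
  }

  // With Jackson available, a getter would reference the filter like this:
  //
  //   @JsonProperty
  //   @JsonInclude(value = JsonInclude.Include.CUSTOM, valueFilter = LimitHintFilterSketch.class)
  //   public long getLimitHint() { ... }

  public static void main(String[] args)
  {
    LimitHintFilterSketch filter = new LimitHintFilterSketch();
    System.out.println(filter.equals(UNLIMITED)); // prints true: property omitted
    System.out.println(filter.equals(100L));      // prints false: property written
  }
}
```

Because `equals` here compares the filter against an unrelated type, it deliberately breaks the usual `equals` contract, which is consistent with the warning suppressions on the class in this patch.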

Logical merge conflict between apache#16911 and apache#16914.


* MSQ: Add limitHint to global-sort shuffles. This allows pushing down limits into the SuperSorter.
* Test fixes.
* Add limitSpec to ScanQueryKit. Fix SuperSorter tracking.


Previously, the processor used "remainingChannels" to track the number of non-null entries of currentFrame. Now, "remainingChannels" tracks the number of channels that are unfinished. The difference is subtle. In the previous code, when an input channel was blocked upon exiting nextFrame(), the "currentFrames" entry would be null, and therefore the "remainingChannels" variable would be decremented. After the next await and call to populateCurrentFramesAndTournamentTree(), "remainingChannels" would be incremented if the channel had become unblocked after awaiting. This means that finished(), which returned true if remainingChannels was zero, would not be reliable if called between nextFrame() and the next await + populateCurrentFramesAndTournamentTree(). This patch changes things such that finished() is always reliable. This fixes a regression introduced in PR apache#16911, which added a call to finished() that was, at that time, unsafe.

…17194) Previously, the processor used "remainingChannels" to track the number of non-null entries of currentFrame. Now, "remainingChannels" tracks the number of channels that are unfinished. The difference is subtle. In the previous code, when an input channel was blocked upon exiting nextFrame(), the "currentFrames" entry would be null, and therefore the "remainingChannels" variable would be decremented. After the next await and call to populateCurrentFramesAndTournamentTree(), "remainingChannels" would be incremented if the channel had become unblocked after awaiting. This means that finished(), which returned true if remainingChannels was zero, would not be reliable if called between nextFrame() and the next await + populateCurrentFramesAndTournamentTree(). This patch changes things such that finished() is always reliable. This fixes a regression introduced in PR #16911, which added a call to finished() that was, at that time, unsafe. Co-authored-by: Gian Merlino <gianmerlino@gmail.com>
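
The difference between the two bookkeeping schemes described in that commit message can be illustrated with a toy model. This is not the real FrameChannelMerger: `RemainingChannelsSketch`, `ChannelState`, `finishedOld`, and `finishedNew` are hypothetical names, and each channel is reduced to a single state value.

```java
// Toy model of the two bookkeeping schemes; not the real FrameChannelMerger.
class RemainingChannelsSketch
{
  enum ChannelState
  {
    HAS_FRAME, // a current frame is loaded for this channel
    BLOCKED,   // no frame right now, but more input may arrive
    FINISHED   // the channel has been fully read
  }

  // Old scheme: "remaining" counts channels that currently hold a frame,
  // so a blocked channel looks the same as a finished one.
  static boolean finishedOld(ChannelState[] channels)
  {
    int remaining = 0;
    for (ChannelState state : channels) {
      if (state == ChannelState.HAS_FRAME) {
        remaining++;
      }
    }
    return remaining == 0;
  }

  // New scheme: "remaining" counts channels that are not yet finished.
  static boolean finishedNew(ChannelState[] channels)
  {
    int remaining = 0;
    for (ChannelState state : channels) {
      if (state != ChannelState.FINISHED) {
        remaining++;
      }
    }
    return remaining == 0;
  }

  public static void main(String[] args)
  {
    ChannelState[] channels = {ChannelState.BLOCKED, ChannelState.FINISHED};
    System.out.println(finishedOld(channels)); // prints true, although one channel may still produce rows
    System.out.println(finishedNew(channels)); // prints false
  }
}
```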

MSQ: Add limitHint to global-sort shuffles.

f76893e

This allows pushing down limits into the SuperSorter.

gianm added the Performance and Area - MSQ (For multi stage queries - https://github.com/apache/druid/issues/12262) labels Aug 16, 2024

github-actions bot added the Area - Batch Ingestion label Aug 16, 2024

github-advanced-security bot found potential problems Aug 16, 2024


LakshSingla reviewed Aug 18, 2024


gianm added 2 commits August 19, 2024 23:13

Merge branch 'master' into msq-limit-hint

67f64e9

Test fixes.

7c4369d

Add limitSpec to ScanQueryKit. Fix SuperSorter tracking.

9172768

LakshSingla reviewed Aug 20, 2024


LakshSingla approved these changes Aug 30, 2024


LakshSingla reviewed Aug 30, 2024


gianm merged commit 786c959 into apache:master Sep 3, 2024

gianm deleted the msq-limit-hint branch September 3, 2024 16:05

gianm added a commit to gianm/druid that referenced this pull request Sep 3, 2024

Fix logical merge conflict in SuperSorterTest.

a8ac685

Logical merge conflict between apache#16911 and apache#16914.

gianm mentioned this pull request Sep 3, 2024

Fix logical merge conflict in SuperSorterTest. #16993

Merged

abhishekrb19 pushed a commit that referenced this pull request Sep 3, 2024

Fix logical merge conflict in SuperSorterTest. (#16993)

57c4b55

Logical merge conflict between #16911 and #16914.

edgar2020 pushed a commit to edgar2020/druid that referenced this pull request Sep 5, 2024

Fix logical merge conflict in SuperSorterTest. (apache#16993)

93c6671

Logical merge conflict between apache#16911 and apache#16914.

gianm mentioned this pull request Sep 17, 2024

FrameChannelMerger: Fix incorrect behavior of finished(). #17088

Merged

kfaraz added this to the 31.0.0 milestone Oct 4, 2024

gianm merged 4 commits into apache:master from gianm:msq-limit-hint
