Make WindowFrames more specific #16741

kgyrtkirk · 2024-07-16T06:45:44Z

this patch changes the WindowFrame internals / representation a bit; it introduces dedicated frame types for rows and groups, so that if we later decide to add better range support it is less likely to need a bigger refactor

it also changes how it's represented in the native query:

-      frame: { peerType: "ROWS", lowUnbounded: true, lowOffset: 0, uppUnbounded: true, uppOffset: 0 }
+      frame: { type: rows }

-      frame: { peerType: "ROWS", lowUnbounded: false, lowOffset: 0, uppUnbounded: false, uppOffset: 2 }
+      frame: { type: rows, lowerOffset: 0, upperOffset: 2 }

-      frame: { peerType: "RANGE", lowUnbounded: true, lowOffset: 0, uppUnbounded: false, uppOffset: 0, orderBy: [{ column: l1, direction: ASC }] }
+      frame: { type: groups, upperOffset: 0, orderBy: [{ column: l1, direction: ASC }] }
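
A minimal sketch of how the legacy fields in the diffs above appear to map onto the new compact form, assuming an unbounded end is simply omitted (modelled here as a null offset). `LegacyFrameMapping`, `toOffset`, and `render` are hypothetical names for illustration and are not part of Druid:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not Druid code: maps the legacy frame fields to the compact form.
final class LegacyFrameMapping
{
  // An unbounded end becomes null (assumption of this sketch); a bounded end keeps its offset.
  static Integer toOffset(boolean unbounded, int offset)
  {
    return unbounded ? null : offset;
  }

  // Renders the compact form, leaving out null (unbounded) offsets.
  static String render(String type, Integer lowerOffset, Integer upperOffset)
  {
    List<String> parts = new ArrayList<>();
    parts.add("type: " + type);
    if (lowerOffset != null) {
      parts.add("lowerOffset: " + lowerOffset);
    }
    if (upperOffset != null) {
      parts.add("upperOffset: " + upperOffset);
    }
    return "{ " + String.join(", ", parts) + " }";
  }

  public static void main(String[] args)
  {
    // lowUnbounded: true, uppUnbounded: true
    System.out.println(render("rows", toOffset(true, 0), toOffset(true, 0)));
    // lowUnbounded: false, lowOffset: 0, uppUnbounded: false, uppOffset: 2
    System.out.println(render("rows", toOffset(false, 0), toOffset(false, 2)));
  }
}
```

The two calls in `main` correspond to the first two before/after pairs above; the orderBy part of the third pair is left out of the sketch.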

this patch changes the WindowFrame internals / representation a bit; introduces a dedicated frametype for rows, groups and unbounded.

sreemanamala · 2024-07-16T15:18:24Z

processing/src/main/java/org/apache/druid/query/operator/window/WindowFrame.java

+    @JsonSubTypes.Type(name = "rows", value = WindowFrame.Rows.class),
+    @JsonSubTypes.Type(name = "groups", value = WindowFrame.Groups.class),
+})
+public interface WindowFrame


currently we only support UNBOUNDED and CURRENT ROW for RANGE (which behave the same for GROUPS), so it looks fine to use groups to represent RANGE queries as well. But later on, if we want to support both, shouldn't we have something to distinguish them?

yes, we will have that; later, when we add support for real RANGE stuff, we will add a new JsonSubType for it

this makes it clear that right now we don't support generic RANGE stuff :)
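
To illustrate the point of this thread, here is a small sketch comparing the upper end of a frame under GROUPS and under RANGE for an ascending, pre-sorted order column. It follows the usual SQL semantics (GROUPS counts peer groups, RANGE compares values) and is not Druid's implementation; `RangeVsGroupsSketch` and its methods are hypothetical names:

```java
// Hypothetical sketch, not Druid code: frame upper ends under GROUPS vs RANGE.
final class RangeVsGroupsSketch
{
  // Exclusive end of the frame for row i under GROUPS ... AND <upperGroups> FOLLOWING
  // (0 means CURRENT ROW): step past the current peer group plus upperGroups more groups.
  static int groupsUpperEnd(int[] sorted, int i, int upperGroups)
  {
    int end = i;
    for (int g = 0; g <= upperGroups && end < sorted.length; g++) {
      int v = sorted[end];
      while (end < sorted.length && sorted[end] == v) {
        end++;
      }
    }
    return end;
  }

  // Exclusive end under RANGE ... AND <upperDelta> FOLLOWING (0 means CURRENT ROW):
  // include every row whose order value is <= current value + upperDelta.
  static int rangeUpperEnd(int[] sorted, int i, int upperDelta)
  {
    int end = i;
    while (end < sorted.length && sorted[end] <= sorted[i] + upperDelta) {
      end++;
    }
    return end;
  }

  public static void main(String[] args)
  {
    int[] sorted = {10, 10, 20, 40};
    for (int i = 0; i < sorted.length; i++) {
      System.out.println(
          "row " + i
          + ": CURRENT ROW groups=" + groupsUpperEnd(sorted, i, 0)
          + " range=" + rangeUpperEnd(sorted, i, 0)
          + ", 1 FOLLOWING groups=" + groupsUpperEnd(sorted, i, 1)
          + " range=" + rangeUpperEnd(sorted, i, 1)
      );
    }
  }
}
```

With CURRENT ROW as the bound, both modes stop at the end of the current peer group, which is why one representation can serve both today. With a real offset the two diverge, which is where a separate subtype would be needed.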

sreemanamala · 2024-07-16T15:24:56Z

processing/src/main/java/org/apache/druid/query/operator/window/WindowFrame.java

+      this.orderBy = ImmutableList.copyOf(orderBy);
+    }
+
+    public List<String> getOrderByColNames()


would be nice to have getOrderByColumns as well, to read the data along with direction

actually we are not using the "direction" right now during processing; that's because the input must be pre-ordered...

shouldn't we ask for a list of strings instead?

yes, I checked as well: we don't use the direction of the columns at this level

sreemanamala · 2024-07-16T16:36:12Z

.../src/test/java/org/apache/druid/query/rowsandcols/semantic/FramedOnHeapAggregatableTest.java

@@ -371,7 +370,7 @@ public void testUnboundedWindowedAggregation()
    FramedOnHeapAggregatable agger = FramedOnHeapAggregatable.fromRAC(rac);

    final RowsAndColumns results = agger.aggregateAll(
-        new WindowFrame(WindowFrame.PeerType.ROWS, true, 0, true, 0, null),
+        WindowFrame.rows(null, null),


what is the difference between WindowFrame.rows(null, null) and WindowFrame.unbounded()? Is it just that the rows frame unwraps as an OffsetFrame? can we use them interchangeably?

WindowFrame.unbounded() is the same as WindowFrame.rows(null, null); during compilation in Windowing we will produce unbounded in those cases

changed this call to unbounded()
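
A rough sketch of why the two calls describe the same frame, assuming null offsets mean "unbounded" and non-null offsets are relative to the current row (negative for preceding rows; that sign convention is an assumption of this sketch). `RowsFrameSketch` is a hypothetical stand-in, not the real WindowFrame.Rows:

```java
// Hypothetical stand-in for a rows frame, not Druid code.
final class RowsFrameSketch
{
  final Integer lowerOffset; // null means UNBOUNDED PRECEDING
  final Integer upperOffset; // null means UNBOUNDED FOLLOWING

  private RowsFrameSketch(Integer lowerOffset, Integer upperOffset)
  {
    this.lowerOffset = lowerOffset;
    this.upperOffset = upperOffset;
  }

  static RowsFrameSketch rows(Integer lowerOffset, Integer upperOffset)
  {
    return new RowsFrameSketch(lowerOffset, upperOffset);
  }

  // In this sketch, unbounded() is only a shorthand for rows(null, null).
  static RowsFrameSketch unbounded()
  {
    return rows(null, null);
  }

  // First row of the frame for the given row, clamped to [0, numRows].
  int startFor(int row, int numRows)
  {
    return lowerOffset == null ? 0 : Math.max(0, Math.min(numRows, row + lowerOffset));
  }

  // Exclusive end of the frame for the given row, clamped to [0, numRows].
  int endFor(int row, int numRows)
  {
    return upperOffset == null ? numRows : Math.max(0, Math.min(numRows, row + upperOffset + 1));
  }

  public static void main(String[] args)
  {
    RowsFrameSketch frame = unbounded();
    int numRows = 5;
    for (int row = 0; row < numRows; row++) {
      System.out.println("row " + row + " -> [" + frame.startFor(row, numRows) + ", " + frame.endFor(row, numRows) + ")");
    }
  }
}
```

Every row of the partition gets the full span from the first row to the last, so an aggregation over such a frame can be computed once per partition.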

sreemanamala · 2024-07-16T16:47:26Z

...c/main/java/org/apache/druid/query/rowsandcols/semantic/DefaultFramedOnHeapAggregatable.java

+  private static Iterable<AggInterval> buildUnboundedIteratorFor(AppendableRowsAndColumns rac)
+  {
+    int[] groupBoundaries = new int[] {0, rac.numRows()};
+    return new GroupIteratorForWindowFrame(WindowFrame.rows(0, 0), groupBoundaries);


Shouldn't this be WindowFrame.rows(null, null) for better readability? Technically anything would work

yes; it could be anything as long as the current row is inside :D
changed it to WindowFrame.rows(null, null) :D
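
A short sketch of why "anything would work" here: with a single group spanning `{0, numRows}`, any frame that contains the current group resolves to the whole row set. `GroupBoundarySketch.rowSpan` is a hypothetical helper that assumes null offsets mean unbounded and that offsets count groups relative to the current one; it is not the logic of GroupIteratorForWindowFrame:

```java
// Hypothetical sketch, not Druid code: resolve a group-based frame to a row span.
final class GroupBoundarySketch
{
  // groupBoundaries[g] .. groupBoundaries[g + 1] is the row span of group g.
  // Returns {startRow, endRowExclusive} for the frame around group g.
  static int[] rowSpan(int[] groupBoundaries, int g, Integer lowerOffset, Integer upperOffset)
  {
    int numGroups = groupBoundaries.length - 1;
    int lowGroup = lowerOffset == null
                   ? 0
                   : Math.max(0, Math.min(numGroups, g + lowerOffset));
    int highGroup = upperOffset == null
                    ? numGroups
                    : Math.max(0, Math.min(numGroups, g + upperOffset + 1));
    return new int[] {groupBoundaries[lowGroup], groupBoundaries[highGroup]};
  }

  public static void main(String[] args)
  {
    int[] single = {0, 7}; // one group covering rows 0..6
    int[] a = rowSpan(single, 0, 0, 0);
    int[] b = rowSpan(single, 0, null, null);
    System.out.println("(0, 0)       -> [" + a[0] + ", " + a[1] + ")");
    System.out.println("(null, null) -> [" + b[0] + ", " + b[1] + ")");
  }
}
```

Both calls resolve to the same span, so the choice between (0, 0) and (null, null) is purely about readability in this case.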

clintropolis · 2024-07-18T01:23:55Z

processing/src/main/java/org/apache/druid/query/operator/window/WindowFrame.java

-public class WindowFrame
+@JsonTypeInfo(use = JsonTypeInfo.Id.NAME, property = "type")
+@JsonSubTypes(value = {
+    @JsonSubTypes.Type(name = "unbounded", value = WindowFrame.Unbounded.class),


is this really the right abstraction? "unbounded" isn't really a frame type, rather a frame start/end thing?

Why not just make a singleton instance for an 'unbounded' Rows.class? (assuming this is meant to be ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING or whatever)

this is more like just syntactic sugar for native queries; yes, it could also be {type: rows } - that's kinda the same - unbounded is just more explicit if it's present in the native query.

removed unbounded - it made things simpler as well!

clintropolis · 2024-07-18T01:31:40Z

processing/src/main/java/org/apache/druid/query/operator/window/WindowFrame.java

-      @JsonProperty("uppUnbounded") boolean upperUnbounded,
-      @JsonProperty("uppOffset") int upperOffset,


is this going to cause compatibility issues during rolling upgrade because on OffsetFrame these are replaced by upperOffset? I forget exactly if/where these get sent over the wire (msq maybe?). Does OffsetFrame need to accept these old property names too just to be safe?

that's kinda the main reason it's not yet rolled out - and it's not on the wire; so we could still change it...

clintropolis · 2024-07-18T01:35:22Z

processing/src/main/java/org/apache/druid/query/operator/window/WindowFrame.java


-public class WindowFrame
+@JsonTypeInfo(use = JsonTypeInfo.Id.NAME, property = "type")


is this going to cause issues during rolling upgrade? old json used peerType to pick rows/groups, but is probably uppercase from enum?

it could not cause such issues - as it has not worked correctly previously

clintropolis · 2024-07-18T01:43:17Z

processing/src/main/java/org/apache/druid/query/operator/window/WindowFrame.java

+  static Groups groups(
+      final Integer lowerOffset,
+      final Integer upperOffset,
+      final List<String> orderByColumns)


nit: this could all fit on one line (also not sure why style bot isn't flagging the ')' not being on a newline...)

joined it back to occupy a single line;
unfortunately I can't set that policy in my formatter

clintropolis · 2024-07-23T20:12:57Z

processing/src/main/java/org/apache/druid/query/operator/window/WindowFrame.java

+@JsonTypeInfo(use = JsonTypeInfo.Id.NAME, property = "type")
+@JsonSubTypes(value = {
+    @JsonSubTypes.Type(name = "rows", value = WindowFrame.Rows.class),
+    @JsonSubTypes.Type(name = "groups", value = WindowFrame.Groups.class),


actually, I have another question now that i re-read the PR description, why did RANGE change to groups? Is this because only group by queries are supported currently? Or was that restriction lifted? I naively would expect the syntax present in the query the user wrote to match what appears here, but that doesn't seem to be the case with this change. Is this confusing? How will this make a future refactor easier?

also, do we even support using groups in SQL syntax? it seems like maybe we don't and there are no tests, is this also confusing? I'm confused.

I'll try to cover all aspects of your questions :)

why did RANGE change to groups

for a RANGE, if both endpoints are UNBOUNDED or CURRENT ROW, there is no difference between RANGE and GROUPS (we support only these right now)

Is this because only group by queries are supported currently?

it has no connection to that - groups is a frame evaluation mode, which groups rows with the same value together
I always find a different doc about this...today I've found this; but it's also in the standard.

I naively would expect the syntax present in the query the user wrote to match what appears here, but that doesn't seem to be the case with this change. Is this confusing?

I don't feel like it's confusing - as the resulting plan may already be different from what the user supplied: filters and join conditions may have changed; predicates could be pushed down and window frame specs may change

I don't know if this would be optimized; but consider that the user gives: SUM() OVER (PARTITION BY X ORDER BY Y,Z RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)

that's a fully unbounded window...do we need to do the ORDER? if not...then it could be executed as SUM() OVER (PARTITION BY X)

How will this make a future refactor easier?

We will be able to add range later along with the supporting algo enhancements

also, do we even support using groups in SQL syntax?

Although the SQL standard has it - the Calcite layer doesn't accept GROUPS today - only RANGE and ROWS are allowed.
With this change we will naturally expose the support of groups in the native layer - and native API users may even go beyond and use all features of groups.
When Calcite support for GROUPS arrives, we will already have everything prepared to enable full support for it.

it seems like maybe we don't and there are no tests, is this also confusing? I'm confused.

As we recognize edge cases of RANGE as GROUPS - the sqlTest plans contain rows and groups.
For any RANGE frame that we can't reliably execute correctly, there is an exception explaining it.

yea, i understand GROUPS is part of the standard, it just seems really strange that internally we "support" it but don't support it in SQL, and externally we support RANGE but don't support it internally. If we are doing this just for edge cases, then we should really add lots of javadocs to explain what is going on so it isn't so confusing maybe?

Changes the WindowFrame internals / representation a bit; introduces dedicated frame types for rows and groups, which correspond to the implemented processing methods

Make WindowFrames more specific

19e2f00

github-actions bot added the Area - Batch Ingestion, Area - Querying, and Area - MSQ (For multi stage queries - https://github.com/apache/druid/issues/12262) labels Jul 16, 2024

github-advanced-security bot found potential problems Jul 16, 2024

sql/src/test/java/org/apache/druid/sql/calcite/CalciteWindowQueryTest.java (fixed)

sreemanamala reviewed Jul 16, 2024

kgyrtkirk added 2 commits July 17, 2024 12:12

cleanup; use orderByColumns for groups

ae3516e

add test/etc

f6aad3c

clintropolis reviewed Jul 18, 2024

kgyrtkirk added 5 commits July 18, 2024 08:39

changed to 1 line

f8e3225

(cherry picked from commit 0b0ed1b0692a74ad0bd02fad350c3e26188baf81)

Merge remote-tracking branch 'apache/master' into window-ranges-windowFrame

afe22ee

more stable query

b385822

remove unbounded

d294af3

use different agg in test

7c94597

sreemanamala approved these changes Jul 20, 2024

clintropolis approved these changes Jul 23, 2024

clintropolis reviewed Jul 23, 2024

kgyrtkirk added 6 commits July 24, 2024 08:49

remove unused local

fd2fff7

Merge commit 'a64e9a17462d34c246a8793b3efebcb4ad4a3736' into window-ranges-windowFrame

fb0e04b

Merge remote-tracking branch 'apache/master' into window-ranges-windowFrame

97145ae

remove outdated restriction from docs

ab0abb9

add comment/test for groups not parsed

94a3b08

remove test...as its a feature

3b68dcf

github-actions bot added the Area - Documentation label Jul 24, 2024

kgyrtkirk merged commit 7e3fab5 into apache:master Jul 25, 2024
88 checks passed

kfaraz added this to the 31.0.0 milestone Oct 4, 2024
