-
Notifications
You must be signed in to change notification settings - Fork 3.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make WindowFrames more specific #16741
Make WindowFrames more specific #16741
Conversation
this patch changes the WindowFrame internals / representation a bit; introduces a dedicated frametype for rows, groups and unbounded.
sql/src/test/java/org/apache/druid/sql/calcite/CalciteWindowQueryTest.java
Fixed
Show fixed
Hide fixed
@JsonSubTypes.Type(name = "rows", value = WindowFrame.Rows.class), | ||
@JsonSubTypes.Type(name = "groups", value = WindowFrame.Groups.class), | ||
}) | ||
public interface WindowFrame |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
currently as we just support only UNBOUNDED and CURRENT ROW for RANGE (which would be same for GROUPS), so this looks good to use groups
to represent RANGE queries as well. But later on, if we want to support both, shouldnt we have something to distinguish them?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes we will have that; later when we add support for real RANGE
stuff; we will add a new JsonSubType
for that -
this makes it clear the right now we don't support generic RANGE
stuff :)
this.orderBy = ImmutableList.copyOf(orderBy); | ||
} | ||
|
||
public List<String> getOrderByColNames() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would be nice to have getOrderByColumns
as well to read the data along with direction
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
actually we are not using the "direction" right now during processing ; and that's because it must be pre-ordered...
shouldn't we ask for a list of strings instead ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, even I checked it we dont use the direction of cols when we are at this level
sql/src/test/java/org/apache/druid/sql/calcite/CalciteWindowQueryTest.java
Show resolved
Hide resolved
@@ -371,7 +370,7 @@ public void testUnboundedWindowedAggregation() | |||
FramedOnHeapAggregatable agger = FramedOnHeapAggregatable.fromRAC(rac); | |||
|
|||
final RowsAndColumns results = agger.aggregateAll( | |||
new WindowFrame(WindowFrame.PeerType.ROWS, true, 0, true, 0, null), | |||
WindowFrame.rows(null, null), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what is the difference between WindowFrame.rows(null, 0)
and WindowFrame.unbounded()
? Is it just the rows frame unwraps as OffsetFrame
? can we use them interchangeably?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
WindowFrame.unbounded()
is the same as WindowFrame.rows(null,null)
; during compilation we will do unbounded
in those cases in Windowing
changeds this call to unbounded()
...c/main/java/org/apache/druid/query/rowsandcols/semantic/DefaultFramedOnHeapAggregatable.java
Outdated
Show resolved
Hide resolved
private static Iterable<AggInterval> buildUnboundedIteratorFor(AppendableRowsAndColumns rac) | ||
{ | ||
int[] groupBoundaries = new int[] {0, rac.numRows()}; | ||
return new GroupIteratorForWindowFrame(WindowFrame.rows(0, 0), groupBoundaries); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldnt this be WindowFrame.rows(null, null)
for better readability? Technically anything would work
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes; it could be anything as long as current row
is inside :D
changed it to WindowFrame.rows(null, null)
:D
public class WindowFrame | ||
@JsonTypeInfo(use = JsonTypeInfo.Id.NAME, property = "type") | ||
@JsonSubTypes(value = { | ||
@JsonSubTypes.Type(name = "unbounded", value = WindowFrame.Unbounded.class), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this really the right abstraction? "unbounded" isn't really a frame type, rather a frame start/end thing?
Why not just make a singleton instance for an 'unbounded' Rows.class
? (assuming this is meant to be ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
or whatever)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is more like just syntactic sugar for native queries; yes it could also be {type: rows }
- that's kinda the same - unbounded
is just more explicit if its present in the native query.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removed unbounded
- it made things simpler as well!
@JsonProperty("uppUnbounded") boolean upperUnbounded, | ||
@JsonProperty("uppOffset") int upperOffset, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this going to cause compatibility issues during rolling upgrade because on OffsetFrame
these are replaced by upperOffset
? I forget exactly if/where these get sent over the wire (msq maybe?). Does OffsetFrame
needs to accept these old property names too just to be safe?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that's kinda the main reason it's not yet rolled out - and its not on the wire; so we could still change it...
|
||
public class WindowFrame | ||
@JsonTypeInfo(use = JsonTypeInfo.Id.NAME, property = "type") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this going to cause issues during rolling upgrade? old json used peerType
to pick rows/groups, but is probably uppercase from enum?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it could not cause such issues - as it have not worked correctly previously
static Groups groups( | ||
final Integer lowerOffset, | ||
final Integer upperOffset, | ||
final List<String> orderByColumns) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: this could all fit on one line (also not sure why style bot isn't flagging the ')' not being on a newline...)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
joined it back to occupy a single line;
unfortunately I can't set that policy in my formatter
(cherry picked from commit 0b0ed1b0692a74ad0bd02fad350c3e26188baf81)
@JsonTypeInfo(use = JsonTypeInfo.Id.NAME, property = "type") | ||
@JsonSubTypes(value = { | ||
@JsonSubTypes.Type(name = "rows", value = WindowFrame.Rows.class), | ||
@JsonSubTypes.Type(name = "groups", value = WindowFrame.Groups.class), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
actually, I have another question now that i re-read the PR description, why did RANGE
change to groups
? Is this because only group by queries are supported currently? Or was that restriction lifted? I naively would expect the syntax present in the query the user wrote to match what appears here, but that doesn't seem to be the case with this change. Is this confusing? How will this make a future refactor easier?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also, do we even support using groups in SQL syntax? it seems like maybe we don't and there are no tests, is this also confusing? I'm confused.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll try to cover all aspects of your questions :)
why did RANGE change to groups
for a RANGE
if both endpoints are UNBOUNDED
or CURRENT ROW
there is no difference between RANGE
and GROUPS
(we support only these right now)
Is this because only group by queries are supported currently?
it has no connection to that - groups
is a frame evaluation mode; which groups rows with the same value together
I always find a different doc about this...today I've found this ; but its also in the standard.
I naively would expect the syntax present in the query the user wrote to match what appears here, but that doesn't seem to be the case with this change. Is this confusing?
I don't feel like its confusing - as the resulting plan may be different from what the user supplied already: filters and join conditions may have changed; predicates could be pushed and window frame specs may change
- I don't know if this would be optimized; but consider that the user gives:
SUM() OVER (PARTITION BY X ORDER BY Y,Z RANGE BETWEEN UNBOUNDED PRECEEDING AND UNBOUNDED PROCEEDING)
- that's a fully unbounded window...do we need to do the
ORDER
? if not...then it could be executed asSUM() OVER (PARTITION BY X)
How will this make a future refactor easier?
We will be able to add range
later along with the supporting algo enhancements
also, do we even support using groups in SQL syntax?
Although the SQL standard has it - the Calcite layer doesn't accept GROUPS
today - only RANGE
and ROWS
are allowed.
With this change we will naturally expose the support of groups
in the native layer - and native api users may even go beyond and use all features of groups
.
When the Calcite support will arrive for GROUPS
we will already have everything prepared to enable full support for it.
it seems like maybe we don't and there are no tests, is this also confusing? I'm confused.
As we recognize edge cases of RANGE
as GROUPS
- the sqlTest
plans contain rows
and groups
.
For all RANGE
frames we can't reliably execute correctly there is an exception explaining it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yea, i understand GROUPS is part of the standard, it just seems really strange that internally we "support" it but don't support it in SQL, and externally we support RANGE but dont support it internally. If we are doing this just for edge cases, then we should really add lots of javadocs to explain what is going on so it isn't so confusing maybe?
…anges-windowFrame
Changes the WindowFrame internals / representation a bit; introduces dedicated frametypes for rows and groups which corresponds to the implemented processing methods
this patch changes the
WindowFrame
internals / representation a bit; introduces a dedicated frametype forrows
,groups
so that later if we decide to add better ranges support it will less likely need a bigger refactorit also changes how its represented in the native query: