update window functions doc #15902
…ct to the top of the list
First pass left some comments
docs/querying/sql-functions.md
Outdated
**Function type:** [Window](sql-window-functions.md#window-function-reference)

Returns the cumulative distribution of the current row within the window calculated as `number of window rows at the same rank or higher than current row` / `total window rows`.
Should we indicate that the value ranges from 1/(number of rows) to 1, similar to what PostgreSQL does?
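To make the range question concrete, here is a minimal Python sketch of the usual CUME_DIST definition (rows at or before the current row, ties included, divided by the total row count). It is not Druid code; the `cume_dist` helper and the sample `scores` are hypothetical, and an ascending ORDER BY on the value itself is assumed.

```python
def cume_dist(values):
    """CUME_DIST per value, assuming an ascending ORDER BY on the value itself."""
    n = len(values)
    # Count rows at or before the current row in sort order; peers (ties) are included.
    return [sum(1 for other in values if other <= v) / n for v in values]


scores = [10, 20, 20, 40]  # hypothetical single-window data
result = cume_dist(scores)
print(result)  # [0.25, 0.75, 0.75, 1.0]
```

With four rows the smallest value is 1/4 and the largest is 1, which matches the 1/#rows to 1 range mentioned above.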
docs/querying/sql-functions.md
Outdated
**Function type:** [Window](sql-window-functions.md#window-function-reference)

Returns the value for the expression for the first row within the window.
Nit: maybe "the value evaluated for the expression".
docs/querying/sql-functions.md
Outdated
**Function type:** [Window](sql-window-functions.md#window-function-reference)

Returns the value evaluated at the row that precedes the current row by the offset number within the window. `offset` defaults to 1 if not provided.
Maybe make it a bit clearer, for example: "Returns the value evaluated at the row that is `offset` rows before the current row within the window."
@@ -876,6 +916,14 @@ Returns the value of a numeric or string expression corresponding to the latest

Returns the value of a numeric or string expression corresponding to the latest time value from `timestampExpr`.

## LEAD
Is this sorted alphabetically? Since LEAD and LAG are similar, I expected them to be together, but if the order is alphabetical, that's fine too.
Yes, alphabetically. Window functions are mixed in with the other functions.
docs/querying/sql-functions.md
Outdated
**Function type:** [Window](sql-window-functions.md#window-function-reference)

Returns the value evaluated at the row that follows the current row by the offset number within the window; if there is no such row, returns the given default value. `offset` defaults to 1 if not provided.
Maybe use wording similar to LAG.
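For reference while aligning the two descriptions, a small Python sketch of the LAG and LEAD semantics as they are worded in this thread. The `lag` and `lead` helpers are hypothetical, offsets are assumed to be non-negative, and the `default` argument for LAG is an assumption carried over from the LEAD description.

```python
def lag(rows, offset=1, default=None):
    """Value from the row that is `offset` rows before the current row."""
    return [rows[i - offset] if i - offset >= 0 else default
            for i in range(len(rows))]


def lead(rows, offset=1, default=None):
    """Value from the row that is `offset` rows after the current row."""
    n = len(rows)
    return [rows[i + offset] if i + offset < n else default
            for i in range(n)]


ordered = ["a", "b", "c", "d"]  # rows already sorted by the window ORDER BY
print(lag(ordered))             # [None, 'a', 'b', 'c']
print(lead(ordered, 2, "-"))    # ['c', 'd', '-', '-']
```

The two functions are mirror images, which supports describing them with parallel wording.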
docs/querying/sql-functions.md
Outdated
**Function type:** [Window](sql-window-functions.md#window-function-reference)

Returns the rank of the row calculated as a percentage according to the formula: `(rank - 1) / (total window rows - 1)`.
Nit: "Returns the relative rank of the current row".
An alternate way to write the formula would probably be `RANK() OVER (window) / COUNT(1) OVER (window)`.
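Before adopting the alternate formula, it may be worth checking both side by side. The Python sketch below (hypothetical helpers, not Druid code) computes the documented `(rank - 1) / (total window rows - 1)` and the proposed `rank / count` on the same data. Returning 0 for a single-row window is an assumption based on how PostgreSQL behaves.

```python
def rank(values):
    # RANK(): 1 + number of rows strictly before the current row.
    return [1 + sum(1 for other in values if other < v) for v in values]


def percent_rank(values):
    n = len(values)
    if n == 1:
        return [0.0]  # assumption: a single-row window yields 0
    return [(r - 1) / (n - 1) for r in rank(values)]


def rank_over_count(values):
    n = len(values)
    return [r / n for r in rank(values)]


scores = [10, 20, 20, 40]  # hypothetical single-window data
documented = percent_rank(scores)
proposed = rank_over_count(scores)
print(documented)
print(proposed)  # [0.25, 0.5, 0.5, 1.0]
```

On this data the documented formula starts at 0 while the proposed one starts at 1/4, so the two expressions are not interchangeable.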
`RANK()`

**Function type:** [Window](sql-window-functions.md#window-function-reference)
We should specify that the rank is assigned with gaps. @kgyrtkirk, please chime in.
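To illustrate what "with gaps" means, a short Python sketch that contrasts standard SQL RANK with DENSE_RANK. The helpers and data are hypothetical, and DENSE_RANK is shown only for contrast.

```python
def rank(values):
    # RANK(): ties share a rank, and the next distinct value skips ahead (gaps).
    return [1 + sum(1 for other in values if other < v) for v in values]


def dense_rank(values):
    # DENSE_RANK(): ties share a rank, and no rank numbers are skipped.
    distinct = sorted(set(values))
    return [distinct.index(v) + 1 for v in values]


scores = [10, 20, 20, 40]  # hypothetical single-window data
print(rank(scores))        # [1, 2, 2, 4]
print(dense_rank(scores))  # [1, 2, 2, 3]
```

After the two tied rows at rank 2, RANK jumps to 4, which is the gap the description should mention.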
docs/querying/sql-functions.md
Outdated
**Function type:** [Window](sql-window-functions.md#window-function-reference)

Returns the number of the row within the window.
We should mention that counting starts from 1, as PostgreSQL does.
dimensions,
aggregation function(s)
window_function()
OVER ( PARTITION BY partitioning expression
Use `[PARTITION BY ...] [ORDER BY ...]`, as these can be optional.
I think the section about window frames could be moved up here, as that gives a better understanding of PARTITION BY and the other parts of the frame.
As this section is about syntax, possibly remove the SELECT and other unrelated things, and only keep `window_function() OVER window`.
I think it would be important to somehow show that the window can be specified later as well:
window_function() OVER w
FROM t
WINDOW w AS (PARTITION BY ... )
```sql
...
```

Druid applies the GROUP BY dimensions first before calculating all non-window aggregation functions. Then it applies the window function over the aggregate results.
We should also specify what an empty `OVER()` indicates.
@soumyava, @kgyrtkirk what does an empty `OVER()` indicate?
An empty `OVER()`, or the absence of a PARTITION BY clause, indicates that all the data belongs to a single window.
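A tiny Python sketch of that distinction, in case it helps with the wording. The `(channel, value)` rows are hypothetical sample data, and the two helpers imitate `SUM(value) OVER ()` and `SUM(value) OVER (PARTITION BY channel)` respectively. This is an illustration of the semantics described above, not Druid code.

```python
from collections import defaultdict

rows = [("#en", 5), ("#en", 7), ("#fr", 3)]  # hypothetical (channel, value) rows


def sum_over_all(rows):
    # Empty OVER(): every row belongs to one window, so each row sees the grand total.
    total = sum(value for _, value in rows)
    return [total for _ in rows]


def sum_over_partition(rows):
    # OVER (PARTITION BY channel): each row sees only its own partition's total.
    totals = defaultdict(int)
    for channel, value in rows:
        totals[channel] += value
    return [totals[channel] for channel, _ in rows]


print(sum_over_all(rows))        # [15, 15, 15]
print(sum_over_partition(rows))  # [12, 12, 3]
```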
```sql
ORDER BY channel,TIME_FLOOR(__time, 'PT1H'), user
```

The windows only define the PARTITION BY clause of the window, so Druid performs the calculation over the whole dataset for each value of the partition expression.
This seems to repeat the earlier line 150.
**Function type:** [Window](sql-window-functions.md#window-function-reference)

Divides the rows within a window as evenly as possible into the number of tiles, also called buckets, and returns the value of the tile that the row falls into.
`NTILE(2)` will return `1` and `2`, so I think we are good.
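For anyone double-checking the tile numbering, a minimal Python sketch of NTILE over an already ordered window. The `ntile` helper is hypothetical. Giving the extra rows to the lowest-numbered tiles is an assumption that follows PostgreSQL's behavior and has not been verified against Druid.

```python
def ntile(num_rows, tiles):
    """Tile number (starting at 1) for each of `num_rows` ordered rows."""
    base, extra = divmod(num_rows, tiles)
    result = []
    for tile in range(1, tiles + 1):
        # The first `extra` tiles receive one additional row each (assumption).
        size = base + (1 if tile <= extra else 0)
        result.extend([tile] * size)
    return result


print(ntile(4, 2))  # [1, 1, 2, 2]
print(ntile(5, 2))  # [1, 1, 1, 2, 2]
```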
```sql
window function
OVER (
[ PARTITION BY partition expression] ORDER BY order expression
```
`ORDER BY` is not mandatory.
Apart from adding a line about the empty `OVER()`, the rest LGTM!
LGTM !
Need a spelling change to get the static check to pass
Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>
- _N_ ROWS PRECEDING: _N_ rows before the current row as ordered by the order expression
- CURRENT ROW: the current row
- _N_ ROWS FOLLOWING: _N_ rows after the current row as ordered by the order expression
- _UNBOUNDED FOLLOWING_: to the end of the window as ordered by the order expression
Suggested change (current line first, proposed line second):
- _UNBOUNDED FOLLOWING_: to the end of the window as ordered by the order expression
- UNBOUNDED FOLLOWING: to the end of the window as ordered by the order expression
Note that this one says UNBOUNDED but line 131 says UNBOUND. Should they be the same or is this intentional?
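As a sanity check on the frame bound descriptions, a small Python sketch of a ROWS frame applied to a sum. The `framed_sum` helper is hypothetical; `None` stands in for an unbounded side. The UNBOUNDED PRECEDING spelling used in the comments is the standard SQL keyword and is an assumption as far as this doc's final wording goes.

```python
def framed_sum(values, preceding=None, following=0):
    """Sum over a ROWS frame around each row; None means unbounded on that side."""
    n = len(values)
    out = []
    for i in range(n):
        start = 0 if preceding is None else max(0, i - preceding)
        end = n - 1 if following is None else min(n - 1, i + following)
        out.append(sum(values[start:end + 1]))
    return out


values = [1, 2, 3, 4]  # rows already sorted by the order expression

# ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
print(framed_sum(values))                                # [1, 3, 6, 10]
# ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING
print(framed_sum(values, preceding=1, following=1))      # [3, 6, 9, 7]
# ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
print(framed_sum(values, preceding=0, following=None))   # [10, 9, 7, 4]
```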
The updates LGTM 🦖
Adds more detail about window frames.
Adds information about the strict window frame check: #15746
This PR has: