Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

update window functions doc #15902

Merged
merged 16 commits into from
Mar 7, 2024
Merged

update window functions doc #15902

merged 16 commits into from
Mar 7, 2024

Conversation

techdocsmith
Copy link
Contributor

Adds more detail about window frames.
Adds information about the strict window frame check: #15746

This PR has:

  • [x ] been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • a release note entry in the PR description.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added or updated version, license, or notice information in licenses.yaml
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • added integration tests.
  • been tested in a test Druid cluster.

Copy link
Contributor

@soumyava soumyava left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First pass left some comments


**Function type:** [Window](sql-window-functions.md#window-function-reference)

Returns the cumulative distribution of the current row within the window calculated as `number of window rows at the same rank or higher than current row` / `total window rows`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we indicate if the value ranges between 1/#rows and 1 similar to what postgres does ?


**Function type:** [Window](sql-window-functions.md#window-function-reference)

Returns the value for the expression for the first row within the window.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit. Maybe value evaluated for the expression


**Function type:** [Window](sql-window-functions.md#window-function-reference)

Returns the value evaluated at the row that precedes the current row by the offset number within the window. `offset` defaults to 1 if not provided.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe make it a bit clear like returns the value evaluated at the row which is offset rows preceding the current row within the window

@@ -876,6 +916,14 @@ Returns the value of a numeric or string expression corresponding to the latest

Returns the value of a numeric or string expression corresponding to the latest time value from `timestampExpr`.

## LEAD
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this sorted alphabetically, since LEAD and LAG are similar I expected them to be together but if alphabetical it's all good too

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

alphabetically. window functions are mixed in with others.


**Function type:** [Window](sql-window-functions.md#window-function-reference)

Returns the value evaluated at the row that follows the current row by the offset number within the window; if there is no such row, returns the given default value. `offset` defaults to 1 if not provided.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe similar wording as lag


**Function type:** [Window](sql-window-functions.md#window-function-reference)

Returns the rank of the row calculated as a percentage according to the formula: `(rank - 1) / (total window rows - 1)`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit. Returns relative rank of the current row

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

probably an alternate way to write the formula would be RANK() OVER (window) / COUNT(1) OVER (window)


`RANK()`

**Function type:** [Window](sql-window-functions.md#window-function-reference)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should specify with gaps. @kgyrtkirk please chime in


**Function type:** [Window](sql-window-functions.md#window-function-reference)

Returns the number of the row within the window.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should mention counting from 1 as postgres does.

dimensions,
aggregation function(s)
window_function()
OVER ( PARTITION BY partitioning expression
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[PARTITION BY ...] [ORDER BY ...] as these can be optional

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the section about window frames could be moved up here; as that gives a better understanding of partition by and other parts of the frame.

As this sections is about syntax - possibly remove the select and other non-related things; and only keep

window_function() OVER  window

I think it would be important to somehow show that the window can be specified later as well

window_function() OVER  w
from t
WINDOW w AS (PARTITION BY ... )

...
```

Druid applies the GROUP BY dimensions first before calculating all non-window aggregation functions. Then it applies the window function over the aggregate results.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should also specify what an empty OVER() indicates

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@soumyava, @kgyrtkirk what does an empty OVER() indicate?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

empty OVER() or absence of a partition by indicates that all the data belong to a single window

ORDER BY channel,TIME_FLOOR(__time, 'PT1H'), user
```

The windows only define the PARTITION BY clause of the window, so Druid performs the calculation over the whole dataset for each value of the partition expression.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to repeat with the earlier line 150


**Function type:** [Window](sql-window-functions.md#window-function-reference)

Divides the rows within a window as evenly as possible into the number of tiles, also called buckets, and returns the value of the tile that the row falls into.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NTILE(2) will return 1 and 2 - I think we are good


**Function type:** [Window](sql-window-functions.md#window-function-reference)

Returns the rank of the row calculated as a percentage according to the formula: `(rank - 1) / (total window rows - 1)`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

probably an alternate way to write the formula would be RANK() OVER (window) / COUNT(1) OVER (window)

```sql
window function
OVER (
[ PARTITION BY partition expression] ORDER BY order expression
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ORDER BY is not mandatory

dimensions,
aggregation function(s)
window_function()
OVER ( PARTITION BY partitioning expression
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the section about window frames could be moved up here; as that gives a better understanding of partition by and other parts of the frame.

As this sections is about syntax - possibly remove the select and other non-related things; and only keep

window_function() OVER  window

I think it would be important to somehow show that the window can be specified later as well

window_function() OVER  w
from t
WINDOW w AS (PARTITION BY ... )

@soumyava
Copy link
Contributor

Apart from adding a line about the empty OVER() rest LGTM !

Copy link
Contributor

@soumyava soumyava left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM !

Copy link
Contributor

@soumyava soumyava left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need a spelling change to get the static check to pass

docs/querying/sql-functions.md Outdated Show resolved Hide resolved
docs/querying/sql-window-functions.md Outdated Show resolved Hide resolved
docs/querying/sql-functions.md Outdated Show resolved Hide resolved
docs/querying/sql-functions.md Outdated Show resolved Hide resolved
docs/querying/sql-window-functions.md Outdated Show resolved Hide resolved
docs/querying/sql-window-functions.md Show resolved Hide resolved
docs/querying/sql-window-functions.md Outdated Show resolved Hide resolved
docs/querying/sql-window-functions.md Outdated Show resolved Hide resolved
docs/querying/sql-window-functions.md Outdated Show resolved Hide resolved
docs/querying/sql-window-functions.md Outdated Show resolved Hide resolved
docs/querying/sql-window-functions.md Outdated Show resolved Hide resolved
docs/querying/sql-window-functions.md Outdated Show resolved Hide resolved
techdocsmith and others added 2 commits February 26, 2024 14:05
Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>
Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>
docs/querying/sql-window-functions.md Outdated Show resolved Hide resolved
- _N_ ROWS PRECEDING: _N_ rows before the current row as ordered by the order expression
- CURRENT ROW: the current row
- _N_ ROWS FOLLOWING: _N_ rows after the current row as ordered by the order expression
- _UNBOUNDED FOLLOWING_: to the end of the window as ordered by the order expression
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
- _UNBOUNDED FOLLOWING_: to the end of the window as ordered by the order expression
- UNBOUNDED FOLLOWING: to the end of the window as ordered by the order expression

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that this one says UNBOUNDED but line 131 says UNBOUND. Should they be the same or is this intentional?

docs/querying/sql-window-functions.md Outdated Show resolved Hide resolved
techdocsmith and others added 2 commits March 7, 2024 14:28
Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>
Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>
techdocsmith and others added 2 commits March 7, 2024 14:37
Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>
Copy link
Member

@vtlim vtlim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The updates LGTM 🦖

@vtlim vtlim merged commit 3caacba into apache:master Mar 7, 2024
11 checks passed
@adarshsanjeev adarshsanjeev added this to the 30.0.0 milestone May 6, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants