Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enable reordering of window operators #16482

Merged
merged 7 commits into from
May 29, 2024
Merged

Conversation

sreemanamala
Copy link
Contributor

Description

This PR aims to enable the re-ordering of window operators in order to optimise the sort and partition operators.
Example :
SELECT m1, m2, SUM(m1) OVER(PARTITION BY m2) as sum1, SUM(m2) OVER() as sum2 from numFoo GROUP BY m1,m2

In order to compute this query, we can order the operators as to first compute the operators corresponding to sum2 and then place the operators corresponding to sum1 which would help us in reducing one sort operator if we order our operators by sum1 and then sum2.

Release note


Key changed/added classes in this PR
  • Windowing

This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • a release note entry in the PR description.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added or updated version, license, or notice information in licenses.yaml
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • added integration tests.
  • been tested in a test Druid cluster.

* this allows us to sort the window groups in order to optimise the order of operators we would need to compute
* without losing the aggregate column name information (which is part of the computed WindowOperatorFactory)
*/
private static class WindowGroupProcessorWrapper implements Comparable<WindowGroupProcessorWrapper>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

instead of implementing Comparable - a static field can be added to define the comparision logic ; that will provide a place to name the ordering ; and also a place to put usefull notes into its apidocs

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Modified. Now we can have multiple comparators.
I thought that we would not be requiring multiple, at the end we might have to only use the comparator that would optimise us to use minimal operators.

Comment on lines 312 to 313
// Need to work on this method to optimise the order in which we need to process based on the partitions
// currently just moves the empty windows to the front
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it moves the empty to the front; however the order of the remaining elements are undefined - I wonder if that could cause different behaviour between jvms
something like a move-to-front logic might not have that problem

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the order of remaining elements would be as is, since in all other cases we return 0 to look like they are same. Would that not be sufficient?

Comment on lines +13 to +14
- type: "naivePartition"
partitionColumns: [ ]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not necessarily right now - but I think a check should be added to the NaivePartitioningOperator:

  • if the partitioningColumns is empty
  • it may only get at most 1 RACs
    I wonder if we've already violated that or not - but probably not...

* this allows us to sort the window groups in order to optimise the order of operators we would need to compute
* without losing the aggregate column name information (which is part of the computed WindowOperatorFactory)
*/
private static class WindowGroupProcessorWrapper implements Comparable<WindowGroupProcessorWrapper>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would really like to suggest a better name for it - but don't really have any good suggestions.... WindowedCalculation ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed it to WindowComputationProcessor. let me know your thoughts on this name

@github-actions github-actions bot added Area - Batch Ingestion Area - MSQ For multi stage queries - https://github.com/apache/druid/issues/12262 labels May 22, 2024
Copy link
Member

@kgyrtkirk kgyrtkirk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good - left a couple cosmetic related comments

* Comparator on {@link WindowComputationProcessor}
* to move the empty windows to the front
*/
private static final Comparator<WindowComputationProcessor> MOVE_EMPTY_GROUPS_FIRST = (o1, o2) -> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think usually its more practical to have the comparator inside the class it belongs to

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

moved

Comment on lines 282 to 288
/**
* A wrapper class which stores {@link WindowGroup}
* along with its computed {@link WindowOperatorFactory}
* <p>
* this allows us to sort the window groups in order to optimise the order of operators we would need to compute
* without losing the aggregate column name information (which is part of the computed WindowOperatorFactory)
*/
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't feel like this apidoc adds much detail - I believe it would be ok even without it.

Usually I think its better to have the apidoc describe the role or mission of the class - I believe that adding classes like this are part of an emerging architecture.

@kfaraz kfaraz merged commit 27cfe12 into apache:master May 29, 2024
87 checks passed
@sreemanamala sreemanamala deleted the win-iter branch May 29, 2024 06:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Area - Batch Ingestion Area - MSQ For multi stage queries - https://github.com/apache/druid/issues/12262 Area - Querying
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants