Conversation
…e entire partition
|
I checked 'mvn clean install' passed in my laptop. |
|
Thanks. I'll review today. |
|
Sorry for late review. It seems the patch have gone stale. |
Conflicts are resolved: tajo-core/src/main/java/org/apache/tajo/engine/planner/physical/WindowAggExec.java tajo-plan/src/main/java/org/apache/tajo/plan/expr/WindowFunctionEval.java
|
Thank you for the review, @jihoonson |
|
Thanks. I'll finish my review soon. |
There was a problem hiding this comment.
We don't need to separately maintain the start bound type and end bound type. Please see the discussion at #13 (comment).
There was a problem hiding this comment.
Hmm... I think the discussion that you linked is not complete one
because UNBOUNDED_PRECEDING is only available at start bound type while UNBOUNDED_FOLLOWING is only available as an end bound type.
I tested in psql as following
mydb=# select id, max(value) over (partition by id rows between unbounded following and unbounded following) from test3;
ERROR: frame start cannot be UNBOUNDED FOLLOWING
LINE 1: ...id, max(value) over (partition by id rows between unbounded ...
^
and saw the error as I expected.
So, I think it needs separation between start bound type and end bound type.
There was a problem hiding this comment.
Right. I hope that you read a couple of comments in the thread of the link. As I commented there, there is only one rule that the end bound cannot precede the start bound. So, naturally, the start bound cannot be unbounded following.
I think that we can make some codes simpler with the unified bound type.
There was a problem hiding this comment.
I don't think only one rule is enough because start bound and end bound can be the same.
For example, we can use rows between current row and current row as followings
mydb=# select id, max(value) over (partition by id rows between current row and current row) from test3;
id | max
----+-----
1 | 1
1 | 2
2 | 5
2 | 6
2 | 7
(5 rows)
So, actually there are two rules.
- end bound cannot precede the start bound, but can be the same as the start bound
- UNBOUNDED PRECEDING cannot be an end bound and UNBOUNDED FOLLOWING cannot be a start bound
However, I totally agree with you that the code becomes much simpler if we use the unified bound type. So, I'll try to unify the bound type.
There was a problem hiding this comment.
Oh, you are right. Thanks.
|
@sirpkt, thanks for great patch! |
|
Thank you for the review, @jihoonson ! |
- Refactor WindowAggExec code - Unify window frame start and end bound type - Remove unnecessary 'static' - Add parameter test in CurrentValue NOT CHANGED YET: - row_number() with DISTINCT still disabled - LEAD() still AggFunction not WindowAggFunc
|
I updated the patch and it passed 'mvn clean install' in my laptop. |
There was a problem hiding this comment.
How about moving these comments to the code where the enum is defined?
There was a problem hiding this comment.
It looks better!
I moved the comments about Function Type and Frame Type to the code where the enum is defined.
|
Thanks @sirpkt. In overall, the updated patch looks good to me. |
There was a problem hiding this comment.
WindowStartBound and WindowEndBound classes look to contain same information.
Do we need to keep them separately?
There was a problem hiding this comment.
You're right, @jihoonson.
We don't need to keep them separately.
I'll merge them into WindowBound.
|
@sirpkt thanks for your nice work. Since this is a new feature which will be very useful and popularly used, I've carefully reviewed. Please think about my comments. Thanks. |
- remove unused import, variables and unnecessary 'static' keywords - refactor WindowAggExec with finer functions and more intuitive names - unify bound types for start and end bound - refactor testWindowQuery to follow Tajo test code convention
|
I reflected considerate and detailed comments from @jihoonson. |
There was a problem hiding this comment.
This looks to be already removed in https://issues.apache.org/jira/browse/TAJO-1442.
There was a problem hiding this comment.
You're right, @jihoonson.
I'll remove the line.
|
@sirpkt thanks for updating your patch. Tajo PostgreSQL As you can see, some values are null in Tajo. |
|
Thank you for the finding, @jihoonson. |
- bug fix for the handling of built-in window functions with frame support - add more test codes for the preceding and following with constants
|
I rebased and updated the patch. |
|
Thanks @sirpkt, but query results are still different. Tajo PostgreSQL |
It supports all ROWS and RANGE window frame.
Cases when window frame is not applied
Cases when window frame should be applied
Based on above information, this patch distinguishes window function types as following three:
And, it further distinguishes window frame types as following four:
Case 1 is the same as previous handling of window function.
Case 2 is handled as incremental termination of aggregation function, which means for every row call merge() and terminate() of the given function
Case 3 is handled almost the same as case 2 except feeding rows to the function from the end of the partition to the start of the frame, i.e., in reverse order
Case 4 is handled by two pass approach: making small loop of feeding rows to the function for each row value computation, I think, which is inevitable since aggregation function does not support sliding window aggregation.
All above are implemented for ROWS first,
and then expanded to support RANGE by including rows that has the same order by value with current row in computation of window function.
This patch includes following changes