TAJO-1010: Improve multiple DISTINCT aggregation. #136
blrunner wants to merge 7 commits into apache:master
Is the maximum number of distinct aggregation functions in a SQL block 2^16-1? I'm just wondering. If so, I'll describe it in the Tajo user guide later.
I expected that the number of distinct aggregation functions would not exceed the range of a short int. If necessary, we should document this limit for users later.
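For reference on the limit discussed above, here is a minimal sketch that assumes the function index is held in a Java `short` (an assumption; the thread does not say how it is stored). The class and method names are hypothetical. A signed `short` tops out at 2^15-1, and 2^16-1 is only reachable when the same 16 bits are read as unsigned.

```java
public class ShortRangeSketch {

    // Largest value of a signed 16-bit Java short: 2^15 - 1.
    public static int signedMax() {
        return Short.MAX_VALUE;
    }

    // Largest value when the same 16 bits are interpreted as unsigned: 2^16 - 1.
    // (short) -1 has all 16 bits set.
    public static int unsignedMax() {
        return Short.toUnsignedInt((short) -1);
    }

    public static void main(String[] args) {
        System.out.println("signed max   = " + signedMax());
        System.out.println("unsigned max = " + unsignedMax());
    }
}
```

If the index really is a signed `short`, the limit to document would be the signed maximum rather than 2^16-1.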
It's great work. The patch looks great to me. In addition, the algorithm is very well defined and the code is very clean. Its readability is very nice. However, documentation explaining the overall three steps of distinct aggregation is lacking. Only someone who already knows the algorithm can understand the source code. DistinctGroupbyBuilder may be a good place for the description. Thanks!
Hi @hyunsik, thank you for your review.
Is this a normal case? Or is it a potential bug case that you currently cannot rule out?
I found a bug on a production cluster, so I had to add the code above.
Could you elaborate more on the three phases in each physical executor?
…into TAJO-1010 Conflicts: CHANGES tajo-core/src/main/java/org/apache/tajo/engine/planner/PlannerUtil.java tajo-core/src/main/java/org/apache/tajo/master/querymaster/SubQuery.java
…into TAJO-1010 Conflicts: CHANGES
Please remove the commented out lines.
I'll give more comments soon.
Although I intended to give some advice on the comments, I can't spend time on it now. However, this issue is scheduled for 0.9.0, and I think this improvement is important for 0.9.0, so it would be hard to delay committing it to the master branch. This patch already looks good and is ready to be committed to master. So, I propose that we commit it now and revise the comments later. Could you rebase it against the latest master? If so, I'll finish the review of this patch.
…into TAJO-1010 Conflicts: CHANGES
Hi @hyunsik, thank you for your review. I also agree with your opinion.
+1 Ship it.
Tajo supports several options for count distinct. The current option executes a count distinct query with two execution blocks; the plan is built by DistinctGroupbyBuilder::buildPlan. The new option executes the query with three execution blocks. You can enable it by setting SessionVars.COUNT_DISTINCT_ALGORITHM to three_stages.
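To give a rough feel for how several COUNT(DISTINCT ...) expressions can be evaluated in three stages, here is a conceptual sketch. It is not Tajo's implementation: the class `ThreeStageDistinctSketch`, its methods, and the division of work between stages are all assumptions made for illustration, since the description above does not say what each of the three execution blocks does. Each input row is a `String[]` whose first element is the grouping key and whose remaining elements are the columns to count distinctly.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical illustration only; names and stage boundaries are assumptions.
public class ThreeStageDistinctSketch {

    // Stage 1 (assumed): per partition, expand each row into
    // (groupKey, columnIndex, value) tuples and drop local duplicates.
    public static Set<List<String>> stage1(List<String[]> rows) {
        Set<List<String>> tuples = new HashSet<>();
        for (String[] row : rows) {
            for (int col = 1; col < row.length; col++) {
                tuples.add(Arrays.asList(row[0], Integer.toString(col), row[col]));
            }
        }
        return tuples;
    }

    // Stage 2 (assumed): union the partial sets to remove duplicates across
    // partitions, then count the tuples per (groupKey, columnIndex).
    public static Map<List<String>, Long> stage2(List<Set<List<String>>> partials) {
        Set<List<String>> merged = new HashSet<>();
        for (Set<List<String>> partial : partials) {
            merged.addAll(partial);
        }
        Map<List<String>, Long> counts = new HashMap<>();
        for (List<String> t : merged) {
            counts.merge(Arrays.asList(t.get(0), t.get(1)), 1L, Long::sum);
        }
        return counts;
    }

    // Stage 3 (assumed): assemble one output row per grouping key, holding
    // the distinct count of each column in column order.
    public static Map<String, long[]> stage3(Map<List<String>, Long> counts, int numCols) {
        Map<String, long[]> result = new TreeMap<>();
        for (Map.Entry<List<String>, Long> e : counts.entrySet()) {
            int col = Integer.parseInt(e.getKey().get(1));
            result.computeIfAbsent(e.getKey().get(0), k -> new long[numCols])[col - 1] = e.getValue();
        }
        return result;
    }

    // Runs the three stages over a list of partitions.
    public static Map<String, long[]> countDistinct(List<List<String[]>> partitions, int numCols) {
        List<Set<List<String>>> partials = new ArrayList<>();
        for (List<String[]> partition : partitions) {
            partials.add(stage1(partition));
        }
        return stage3(stage2(partials), numCols);
    }

    public static void main(String[] args) {
        List<List<String[]>> partitions = Arrays.asList(
            Arrays.asList(new String[]{"g1", "x", "1"}, new String[]{"g1", "x", "2"}),
            Arrays.asList(new String[]{"g1", "x", "1"}, new String[]{"g1", "z", "3"}));
        countDistinct(partitions, 2).forEach(
            (key, counts) -> System.out.println(key + " -> " + Arrays.toString(counts)));
    }
}
```

Tagging each value with its column index lets a single pass handle any number of distinct columns, at the cost of emitting one tuple per column per row in the first stage.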