Skip to content
This repository was archived by the owner on May 12, 2021. It is now read-only.

TAJO-1010: Improve multiple DISTINCT aggregation.#136

Closed
blrunner wants to merge 7 commits intoapache:masterfrom
blrunner:TAJO-1010
Closed

TAJO-1010: Improve multiple DISTINCT aggregation.#136
blrunner wants to merge 7 commits intoapache:masterfrom
blrunner:TAJO-1010

Conversation

@blrunner
Copy link
Copy Markdown
Contributor

Tajo supports various options for count distinct. Current option is to execute a count distinct query with two execution blocks. It made by DistinctGroupbyBuilder::buildPlan. But now, new option is to execute the query with three execution blocks. You can use this option for set SessionVars.COUNT_DISTINCT_ALGORITHM to three_stages.

  • In first stage, tajo operator incremented each row to more rows by grouping columns. In addition, the operator must creates each row because of aggregation non-distinct columns.
  • In second stage, tajo operator aggregates the output of the first stage. For reference, it shuffled by grouping columns and aggregation columns.
  • In third stage, tajo operator merges the output of the second stage. For reference, it shuffled by just grouping columns.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the maximum number of distinct aggregation functions in a SQL block is 2^16-1? It's just wondering. If so, I'll describe it in Tajo user guide later.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I expected that the maximum number of distinct aggregation functions will not overtake short int. If it is necessary, we should describe it to users later.

@hyunsik
Copy link
Copy Markdown
Member

hyunsik commented Sep 11, 2014

It's great work. The patch looks great to me. In addition, the algorithm is very well defined and code is very clean. It's readability is very nice.

But, there is lack of documentation to explain overall three steps of distinct aggregation. Only someone who already knows the algorithm can understand the source code. DistinctGroupbyBuilder may be good place to have the description.

Thanks!

@blrunner blrunner changed the title TAJO-1010: Improve multiple DISTINCT aggregation. (hyoungjun, jaehwa) TAJO-1010: Improve multiple DISTINCT aggregation. Sep 12, 2014
@blrunner
Copy link
Copy Markdown
Contributor Author

Hi @hyunsik

Thank you for your review.
I've just added the description at operators.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this normal case? Otherwise, is it some potential bug case that currently you cannot ensure?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found a bug on production cluster. So, I had to add above codes.

@hyunsik
Copy link
Copy Markdown
Member

hyunsik commented Sep 16, 2014

Could you elaborate more three phases in each physical executor?

…into TAJO-1010

Conflicts:
	CHANGES
	tajo-core/src/main/java/org/apache/tajo/engine/planner/PlannerUtil.java
	tajo-core/src/main/java/org/apache/tajo/master/querymaster/SubQuery.java
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please remove the commented out lines.

@hyunsik
Copy link
Copy Markdown
Member

hyunsik commented Oct 5, 2014

I'll give more comments soon.

@hyunsik
Copy link
Copy Markdown
Member

hyunsik commented Oct 7, 2014

Although I tried to give some advice for comments, I couldn't spend time on it now.

However, this issue was scheduled to 0.9.0, and I think this improvement is important in 0.9.0. So, I think that it is hard to delay the commit of this issue to master.branch.

This patch already looks good and ready to be committed to master. So, I propose that we commit it now and then revise the comment later.

Could you rebase it against the latest patch? If so, I'll finish the review on this patch.

@blrunner
Copy link
Copy Markdown
Contributor Author

blrunner commented Oct 7, 2014

Hi @hyunsik

Thank you for your review. I also agree with your opinion.
I've just rebased the patch and remove unnecessary comments.

@hyunsik
Copy link
Copy Markdown
Member

hyunsik commented Oct 7, 2014

+1 Ship it.

@asfgit asfgit closed this in 0dfa397 Oct 8, 2014
babokim pushed a commit to babokim/tajo that referenced this pull request Dec 11, 2014
ZEPPELIN-174 don't apply emacs key binding when running on windows
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants