Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-6466][SQL] Remove unnecessary attributes when resolving GroupingSets #5134

Closed
wants to merge 4 commits into from

Conversation

viirya
Copy link
Member

@viirya viirya commented Mar 23, 2015

When resolving GroupingSets, we currently list all outputs of GroupingSets's child plan. However, the columns that are not in groupBy expressions and not used by aggregation expressions are unnecessary and can be removed.

@SparkQA
Copy link

SparkQA commented Mar 23, 2015

Test build #28988 has finished for PR 5134 at commit 8e16206.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@chenghao-intel
Copy link
Contributor

Seems more reasonable to me if we do this in Optimizer, what do you think?

@viirya
Copy link
Member Author

viirya commented Mar 23, 2015

Good suggestion. I will do that later. Thanks.

@SparkQA
Copy link

SparkQA commented Mar 24, 2015

Test build #29080 has finished for PR 5134 at commit a2734b3.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@viirya
Copy link
Member Author

viirya commented Mar 27, 2015

/cc @liancheng @marmbrus

val substitution = projections.map { groupExpr =>
val newExprs = groupExpr.collect {
case x: NamedExpression if a.references.contains(x) => x
case l: Literal => l
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need special handling here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because there are some constant null values and bitmasks we need to keep them.

@marmbrus
Copy link
Contributor

This change should also be accompanied by test cases. There are several examples in the optimizer section of the tests.

@SparkQA
Copy link

SparkQA commented Apr 27, 2015

Test build #31045 has started for PR 5134 at commit a2734b3.

@chenghao-intel
Copy link
Contributor

I don't think it's the correct way for this optimization. Aggregate takes the output of Expand for expression evaluation, not the projections!
It probably need to refactor the GroupingSet a little bit before this optimization, I will submit the code soon.

Sorry @viirya , I will ping you once the refactoring finished.

@viirya
Copy link
Member Author

viirya commented Apr 28, 2015

@chenghao-intel Expand's output is modified too in this optimization.

@chenghao-intel
Copy link
Contributor

I mean projections.map{ groupExpr.collect is not necessary, as output will be exactly match the projections.
Since we will pruning the output anyway, why don't just remove the associated columns from projections directly?

Specifying the output for a logical node is actually quit confusing, as catalyst will take it as referenced attribute also, that's why we need to refactor it.

@viirya
Copy link
Member Author

viirya commented Apr 28, 2015

@chenghao-intel I originally remove the unnecessary columns from projections in Analyzer as you said. However, you suggest that it is more reasonable to move it to Optimizer.

@chenghao-intel
Copy link
Contributor

Sorry, I didn't make it clear, we definitely should do the column pruning in Optimizer. And also the Expand need to be refactor.

@viirya
Copy link
Member Author

viirya commented Apr 28, 2015

ok. I will update the unit test first.

@SparkQA
Copy link

SparkQA commented Apr 28, 2015

Test build #31137 has finished for PR 5134 at commit d5dadec.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
  • This patch does not change any dependencies.

@viirya
Copy link
Member Author

viirya commented May 21, 2015

@chenghao-intel do you think we still need this? If no, I should close it.

@chenghao-intel
Copy link
Contributor

@viirya The #5780 should fix that also, can you jump there for the code review? Sorry, I will update that PR asap.

@viirya
Copy link
Member Author

viirya commented May 21, 2015

@chenghao-intel OK. then I close this now.

@viirya viirya closed this May 21, 2015
@viirya viirya deleted the remove_attr_expand branch December 27, 2023 18:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
4 participants