[SPARK-10539][SQL]Project should not be pushed down through Intersect or Except #8742

Closed
yjshen wants to merge 2 commits


yjshen
Member

@yjshen yjshen commented Sep 14, 2015

Intersect and Except are both set operators, and they use all the columns to compare rows for equality. Pushing their parent Project down changes the relations they operate on, so it is not an equivalent transformation.
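
To make the non-equivalence concrete, here is a minimal sketch that uses plain Scala collections rather than Catalyst plans. The sample rows and the helper names `except` and `intersect` are assumptions made up for illustration; they only mimic set-style semantics (whole-row comparison, distinct output) and are not Spark APIs.

```scala
object ProjectPushdownDemo {
  // Hypothetical sample rows of the form (a, b).
  val left: Seq[(Int, Int)]  = Seq((1, 10), (2, 20))
  val right: Seq[(Int, Int)] = Seq((1, 99), (3, 30))

  // Set-style EXCEPT and INTERSECT: compare whole rows, return distinct rows.
  def except[A](l: Seq[A], r: Seq[A]): Seq[A] =
    l.distinct.filterNot(x => r.contains(x))
  def intersect[A](l: Seq[A], r: Seq[A]): Seq[A] =
    l.distinct.filter(x => r.contains(x))

  // Project(a) on top of Except: rows are compared on both a and b.
  val exceptThenProject: Seq[Int] = except(left, right).map(_._1)
  // Project pushed below Except: rows are compared on a only.
  val projectThenExcept: Seq[Int] = except(left.map(_._1), right.map(_._1))

  // The same two shapes for Intersect.
  val intersectThenProject: Seq[Int] = intersect(left, right).map(_._1)
  val projectThenIntersect: Seq[Int] = intersect(left.map(_._1), right.map(_._1))

  def main(args: Array[String]): Unit = {
    println(s"Except:    $exceptThenProject vs $projectThenExcept")
    println(s"Intersect: $intersectThenProject vs $projectThenIntersect")
  }
}
```

With this data, projecting after Except keeps both 1 and 2, while projecting first keeps only 2. Projecting after Intersect yields nothing, while projecting first yields 1.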

JIRA: https://issues.apache.org/jira/browse/SPARK-10539

@SparkQA

SparkQA commented Sep 14, 2015

Test build #42419 has finished for PR 8742 at commit 040b60a.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@yjshen
Member Author

yjshen commented Sep 14, 2015

I need to investigate set operators further to make sure I'm doing the right thing. Closing it for now.

@yjshen yjshen closed this Sep 14, 2015
@yjshen yjshen reopened this Sep 14, 2015
@yjshen yjshen changed the title from "[SPARK-10539][SQL]Fix set optimization by eliminate empty project list push down" to "[SPARK-10539][SQL]Project should not be pushed down through Intersect or Except" Sep 14, 2015
@SparkQA

SparkQA commented Sep 14, 2015

Test build #42432 has finished for PR 8742 at commit ce6ed80.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@rxin
Contributor

rxin commented Sep 14, 2015

cc @yhuai for review.

val rewrites = buildRewrites(e)
Except(
  Project(projectList, left),
  Project(projectList.map(pushToRight(_, rewrites)), right))
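
For readers unfamiliar with the rule, the following is a hedged toy sketch of the shape the optimizer rule takes once the case above is dropped. The `Plan` hierarchy and `pushProject` are hypothetical stand-ins, not Catalyst classes. The sketch assumes `Union` keeps duplicates, and it ignores the attribute rewriting that `buildRewrites` and `pushToRight` perform in the real code.

```scala
object ToyPushdown {
  // Hypothetical, simplified logical plan nodes (not Spark's classes).
  sealed trait Plan
  case class Relation(name: String) extends Plan
  case class Project(columns: Seq[String], child: Plan) extends Plan
  case class Union(left: Plan, right: Plan) extends Plan
  case class Intersect(left: Plan, right: Plan) extends Plan
  case class Except(left: Plan, right: Plan) extends Plan

  // Push a Project below Union only. A Project over Intersect or Except
  // falls through to the default case and is returned unchanged.
  def pushProject(plan: Plan): Plan = plan match {
    case Project(cols, Union(l, r)) =>
      Union(Project(cols, l), Project(cols, r))
    case other => other
  }

  def main(args: Array[String]): Unit = {
    val t1 = Relation("t1")
    val t2 = Relation("t2")
    println(pushProject(Project(Seq("a"), Union(t1, t2))))
    println(pushProject(Project(Seq("a"), Except(t1, t2))))
  }
}
```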
Contributor

Can we add comments in this class to explain why we cannot push down projections? For filter pushdown, a condition with non-deterministic expressions makes pushdown unsafe in some cases. That should not happen here because of #7446, but it is still worth considering whether there is any case where filter pushdown is unsafe. If we determine that filter pushdown is safe, let's add comments explaining why.

Member Author

@yhuai, thanks for your comment. I didn't consider the effect of non-deterministic filters on pushdown when I was doing this. I will think about it and add comments soon.

Contributor

Can you add comments here explaining why we cannot push down projections and why we can push down filters?
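
As a sketch of why filters behave differently from projections, the example below applies a deterministic predicate above and below the set operators and gets the same rows either way. It uses plain Scala collections; the sample rows, the predicate `p`, and the helpers `except` and `intersect` are assumptions for illustration only, not Spark APIs. It says nothing about non-deterministic predicates, which this thread treats as a separate concern.

```scala
object FilterPushdownDemo {
  // Set-style EXCEPT and INTERSECT: compare whole rows, return distinct rows.
  def except[A](l: Seq[A], r: Seq[A]): Seq[A] =
    l.distinct.filterNot(x => r.contains(x))
  def intersect[A](l: Seq[A], r: Seq[A]): Seq[A] =
    l.distinct.filter(x => r.contains(x))

  // Hypothetical sample rows of the form (a, b).
  val left: Seq[(Int, Int)]  = Seq((1, 10), (2, 20), (3, 30))
  val right: Seq[(Int, Int)] = Seq((1, 10), (3, 99))

  // A deterministic predicate: the same row always gives the same answer.
  val p: ((Int, Int)) => Boolean = row => row._1 <= 2

  // Filter above the set operator.
  val exceptThenFilter    = except(left, right).filter(p)
  val intersectThenFilter = intersect(left, right).filter(p)

  // Filter pushed to both children. Rows keep all their columns, so
  // whole-row comparison sees exactly the same values as before.
  val filterThenExcept    = except(left.filter(p), right.filter(p))
  val filterThenIntersect = intersect(left.filter(p), right.filter(p))

  def main(args: Array[String]): Unit = {
    println(exceptThenFilter == filterThenExcept)
    println(intersectThenFilter == filterThenIntersect)
  }
}
```

The difference from the projection case is that a filter removes rows but never changes which columns take part in the row comparison.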

@marmbrus
Contributor

ping :)

@yhuai
Contributor

yhuai commented Sep 18, 2015

@yjshen The fix is good. Can you address comments?

@yhuai
Contributor

yhuai commented Sep 18, 2015

@yjshen I added the comments and created a new PR (#8823). Can you close this one?

asfgit pushed a commit that referenced this pull request Sep 18, 2015
…ct or Except #8742

Intersect and Except are both set operators, and they use all the columns to compare rows for equality. Pushing their parent Project down changes the relations they operate on, so it is not an equivalent transformation.

JIRA: https://issues.apache.org/jira/browse/SPARK-10539

I added some comments based on the fix of #8742.

Author: Yijie Shen <henry.yijieshen@gmail.com>
Author: Yin Huai <yhuai@databricks.com>

Closes #8823 from yhuai/fix_set_optimization.
asfgit pushed a commit that referenced this pull request Sep 18, 2015

(cherry picked from commit c6f8135)
Signed-off-by: Yin Huai <yhuai@databricks.com>

Conflicts:
	sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala
@yjshen
Member Author

yjshen commented Sep 21, 2015

Thanks @yhuai, I'll close this one.

@yjshen yjshen closed this Sep 21, 2015
kiszk pushed a commit to kiszk/spark-gpu that referenced this pull request Dec 26, 2015
ashangit pushed a commit to ashangit/spark that referenced this pull request Oct 19, 2016

(cherry picked from commit c6f8135)
Signed-off-by: Yin Huai <yhuai@databricks.com>

Conflicts:
	sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala

(cherry picked from commit 3df52cc)