-
Notifications
You must be signed in to change notification settings - Fork 28.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-17244] Catalyst should not pushdown non-deterministic join conditions #14815
Conversation
cc @brkyvz who found this bug |
@@ -1386,15 +1386,17 @@ object EliminateOuterJoin extends Rule[LogicalPlan] with PredicateHelper { | |||
object PushPredicateThroughJoin extends Rule[LogicalPlan] with PredicateHelper { | |||
/** | |||
* Splits join condition expressions into three categories based on the attributes required |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It sounds like this PR is targeting both join conditions and filter predicates. Could you update the code comment of this line and the title of the PR?
Test build #64436 has finished for PR 14815 at commit
|
Test build #64437 has finished for PR 14815 at commit
|
Thanks @gatorsmile, I address your comments. |
Looks pretty good to me! Could you also add one more test case for testing the following scenario:
I am just afraid the others might break it in the future. Thanks! |
That's a good idea. More generally, I think the best way to fix these class of problems is to unify all predicate pushdown logic at a common place. |
Test build #64496 has finished for PR 14815 at commit
|
In 2.0, we combined the PPD rules. Now, only two PPD rules left, |
LGTM pending testing |
Test build #64500 has finished for PR 14815 at commit
|
LGTM. Merging to master and branch 2.0. |
…ditions ## What changes were proposed in this pull request? Given that non-deterministic expressions can be stateful, pushing them down the query plan during the optimization phase can cause incorrect behavior. This patch fixes that issue by explicitly disabling that. ## How was this patch tested? A new test in `FilterPushdownSuite` that checks catalyst behavior for both deterministic and non-deterministic join conditions. Author: Sameer Agarwal <sameerag@cs.berkeley.edu> Closes #14815 from sameeragarwal/constraint-inputfile. (cherry picked from commit 540e912) Signed-off-by: Yin Huai <yhuai@databricks.com>
* to evaluate them. | ||
* Splits join condition expressions or filter predicates (on a given join's output) into three | ||
* categories based on the attributes required to evaluate them. Note that we explicitly exclude | ||
* on-deterministic (i.e., stateful) condition expressions in canEvaluateInLeft or |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo: non-deterministic
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good eye! can you please fold this change into one of your open PRs :) ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yea, I will :)
What changes were proposed in this pull request?
Given that non-deterministic expressions can be stateful, pushing them down the query plan during the optimization phase can cause incorrect behavior. This patch fixes that issue by explicitly disabling that.
How was this patch tested?
A new test in
FilterPushdownSuite
that checks catalyst behavior for both deterministic and non-deterministic join conditions.