New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-24043][SQL] Interpreted Predicate should initialize nondeterministic expressions #21144
Conversation
case n: Nondeterministic => n.initialize(partitionIndex) | ||
case _ => | ||
} | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An alternative to this change: we could simply expect a caller to InterpretedPredicate.create, like SparkPlan.genInterpretedPredicate, to pre-initialized the expression before passing it to create.
This alternative is a little harder to unit test: we would also need to add a flag to shut off predicate codegen so that we can force SparkPlan.newPredicate to use InterpretedPredicate.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nah this is good like it is.
Test build #89798 has finished for PR 21144 at commit
|
@bersprockets can you also check if other interpreted code paths have this problem? |
@hvanhovell will do. |
retest this please |
|
||
override def initialize(partitionIndex: Int): Unit = { | ||
super.initialize(partitionIndex) | ||
expression.foreach { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it ok to use foreachUp
instead? The initialization always does not depend on their children?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@maropu At the moment, all of the initialization methods are very simple and don't rely on children. That could change, but that's the current state.
InterpretedProjection, InterpretedUnsafeProjection, InterpretedMutableProjection and ExpressionEvalHelper currently initialize the Nondeterministic expressions with a foreach. If we want to change it, we should change it in those other places too.
Test build #89805 has finished for PR 21144 at commit
|
retest this please |
Test build #89807 has finished for PR 21144 at commit
|
@hvanhovell I couldn't find another path that had this particular issue, except possibly in the aggregation area (where a grouping projection is created but not initialized). But I couldn't make it fail there. Also, I turned off as much of the code generation as I could (through patching, etc,) and ran the SQL unit tests. I didn't get any 'requirement failed' exceptions. However, I did get 1270 failed tests, The original unit test that got me looking at this ("SPARK-10740: handle nondeterministic expressions correctly for set operations") is still failing, but for a different reason: The rows in the table are processed in a different order between interpreted and codegen mode. rand(7) is used as a filter, and although the sequence of random numbers is the same, the row to which each random number is being compared is different between interpreted and codegen mode. |
@hvanhovell @maropu As it turns out, there are at least two places where an InterpretedPredicate is created but never initialized: SimpleTextSource.buildReader, and ExternalCatalogUtils.prunePartitionsByFilter. It seems unlikely that nondeterministic expressions would be used for partitions, but maybe I am not imaginative enough. Also, these seem like bugs in those two classes, not in InterpretedPredicate. Also, I reran the mostly-interpreted SQL unit tests, this time with SortPrefix implemented, and got the error count down from 1270 to 88. None of the errors were 'requirement failed' exceptions (so no uninitialized nondeterministic expressions). |
@bersprockets I think we assume btw, as you know, the issue |
@hvanhovell @maropu Is there anything on this PR that I should do? |
LGTM - merging to master. Thanks! |
Thanks much! |
shall we backport it to 2.3? |
@cloud-fan I don't think this is an issue in 2.3. It would be an issue only once SPARK-23580 ("Interpreted mode fallback should be implemented") is completed. |
What changes were proposed in this pull request?
When creating an InterpretedPredicate instance, initialize any Nondeterministic expressions in the expression tree to avoid java.lang.IllegalArgumentException on later call to eval().
How was this patch tested?