Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-24043][SQL] Interpreted Predicate should initialize nondeterministic expressions #21144

Closed
wants to merge 3 commits into from

Conversation

bersprockets
Copy link
Contributor

What changes were proposed in this pull request?

When creating an InterpretedPredicate instance, initialize any Nondeterministic expressions in the expression tree to avoid java.lang.IllegalArgumentException on later call to eval().

How was this patch tested?

  • sbt SQL tests
  • python SQL tests
  • new unit test

case n: Nondeterministic => n.initialize(partitionIndex)
case _ =>
}
}
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An alternative to this change: we could simply expect a caller to InterpretedPredicate.create, like SparkPlan.genInterpretedPredicate, to pre-initialized the expression before passing it to create.

This alternative is a little harder to unit test: we would also need to add a flag to shut off predicate codegen so that we can force SparkPlan.newPredicate to use InterpretedPredicate.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nah this is good like it is.

@SparkQA
Copy link

SparkQA commented Apr 24, 2018

Test build #89798 has finished for PR 21144 at commit 46972f7.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@hvanhovell
Copy link
Contributor

@bersprockets can you also check if other interpreted code paths have this problem?

@bersprockets
Copy link
Contributor Author

can you also check if other interpreted code paths have this problem?

@hvanhovell will do.

@bersprockets
Copy link
Contributor Author

retest this please


override def initialize(partitionIndex: Int): Unit = {
super.initialize(partitionIndex)
expression.foreach {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it ok to use foreachUp instead? The initialization always does not depend on their children?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@maropu At the moment, all of the initialization methods are very simple and don't rely on children. That could change, but that's the current state.

InterpretedProjection, InterpretedUnsafeProjection, InterpretedMutableProjection and ExpressionEvalHelper currently initialize the Nondeterministic expressions with a foreach. If we want to change it, we should change it in those other places too.

@SparkQA
Copy link

SparkQA commented Apr 24, 2018

Test build #89805 has finished for PR 21144 at commit 46972f7.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@maropu
Copy link
Member

maropu commented Apr 24, 2018

retest this please

@SparkQA
Copy link

SparkQA commented Apr 25, 2018

Test build #89807 has finished for PR 21144 at commit 46972f7.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@bersprockets
Copy link
Contributor Author

@hvanhovell I couldn't find another path that had this particular issue, except possibly in the aggregation area (where a grouping projection is created but not initialized). But I couldn't make it fail there.

Also, I turned off as much of the code generation as I could (through patching, etc,) and ran the SQL unit tests. I didn't get any 'requirement failed' exceptions. However, I did get 1270 failed tests,
90% of which failed with UnsupportedOperationExceptions (SortPrefix.eval is the one I keep seeing). Fixing those issues may allow more 'requirement failed' exceptions to pop out.

The original unit test that got me looking at this ("SPARK-10740: handle nondeterministic expressions correctly for set operations") is still failing, but for a different reason: The rows in the table are processed in a different order between interpreted and codegen mode. rand(7) is used as a filter, and although the sequence of random numbers is the same, the row to which each random number is being compared is different between interpreted and codegen mode.

@bersprockets
Copy link
Contributor Author

bersprockets commented May 1, 2018

@hvanhovell @maropu As it turns out, there are at least two places where an InterpretedPredicate is created but never initialized: SimpleTextSource.buildReader, and ExternalCatalogUtils.prunePartitionsByFilter.

It seems unlikely that nondeterministic expressions would be used for partitions, but maybe I am not imaginative enough. Also, these seem like bugs in those two classes, not in InterpretedPredicate.

Also, I reran the mostly-interpreted SQL unit tests, this time with SortPrefix implemented, and got the error count down from 1270 to 88. None of the errors were 'requirement failed' exceptions (so no uninitialized nondeterministic expressions).

@maropu
Copy link
Member

maropu commented May 2, 2018

@bersprockets I think we assume Predicate.initialize is called in not a driver side but executor sides, so it's ok as it is (SimpleTextSource. buildReader and ExternalCatalogUtils.prunePartitionsByFilter).

btw, as you know, the issue UnsupportedOperationExceptions should be fixed in SPARK-23580.

@bersprockets
Copy link
Contributor Author

@hvanhovell @maropu Is there anything on this PR that I should do?

@hvanhovell
Copy link
Contributor

LGTM - merging to master. Thanks!

@asfgit asfgit closed this in d83e963 May 7, 2018
@bersprockets
Copy link
Contributor Author

Thanks much!

@cloud-fan
Copy link
Contributor

shall we backport it to 2.3?

@bersprockets
Copy link
Contributor Author

@cloud-fan I don't think this is an issue in 2.3. It would be an issue only once SPARK-23580 ("Interpreted mode fallback should be implemented") is completed.

@bersprockets bersprockets deleted the interpretedpredicate branch January 1, 2019 01:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
5 participants