Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Spark-1461] Deferred Expression Evaluation (short-circuit evaluation) #446

Conversation

chenghao-intel
Copy link
Contributor

This patch unify the foldable & nullable interface for Expression.

  1. Deterministic-less UDF (like Rand()) can not be folded.
  2. Short-circut will significantly improves the performance in Expression Evaluation, however, the stateful UDF should not be ignored in a short-circuit evaluation(e.g. in expression: col1 > 0 and row_sequence() < 1000, row_sequence() can not be ignored even if col1 > 0 is false)

I brought an concept of DeferredObject from Hive, which has 2 kinds of children classes (EagerResult / DeferredResult), the former requires triggering the evaluation before it's created, while the later trigger the evaluation when first called its get() method.

@AmplabJenkins
Copy link

Can one of the admins verify this patch?

@marmbrus
Copy link
Contributor

A few high-level comments:

  • I'm not sure if stateful UDFs are actually something we want to support. The semantics for them are not well defined in partitioned systems, especially where the optimizer decides the partitioning. If you want things like row id there are already ways to do this with map partitions with index.
  • The deferred evaluation class seems like a complicated way to get short circuit evaluation. In a lot of cases can't we just change the ordering of calling the existing eval method? Adding a new interface complicates things, and in some simple benchmarks that I ran this code is actually slower than what was there before (probably because of the extra object allocations).
  • There are a lot of unrelated changes here also. While fixing a minor spelling error or something is okay, making a whole bunch of unrelated changes makes reviewing the PR more difficult for us. For example, maybe you can do the data type additions for Hive UDFs in their own PR.

@chenghao-intel
Copy link
Contributor Author

Thank you @marmbrus , if we are not planning to support the stateful UDFs, the deferred evaluation can be removed, too, and we can just change the ordering of calling the eval method for the existed expressions for short-circuit evaluation.

@chenghao-intel
Copy link
Contributor Author

@marmbrus , I've removed the unrelated changes from this PR, and this PR only for the deferred expression evaluation.

@@ -146,7 +158,7 @@ case class If(predicate: Expression, trueValue: Expression, falseValue: Expressi
extends Expression {

def children = predicate :: trueValue :: falseValue :: Nil
def nullable = trueValue.nullable || falseValue.nullable
override def nullable = predicate.nullable || (trueValue.nullable && falseValue.nullable)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this was correct before. The nullability of the predicate does not affect the nullability of the output since a null predicate will just cause the falseValue to be output, not null.

@marmbrus
Copy link
Contributor

ok to test

@AmplabJenkins
Copy link

Merged build triggered.

@AmplabJenkins
Copy link

Merged build started.

@AmplabJenkins
Copy link

Merged build finished.

@AmplabJenkins
Copy link

Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14285/

@marmbrus
Copy link
Contributor

@andrewor14, any idea what is up with org.apache.spark.ui.UISuite?

@andrewor14
Copy link
Contributor

It has the same problem as streaming.UISuite, which was disabled recently. We should do the same for ui.UISuite

asfgit pushed a commit that referenced this pull request Apr 21, 2014
#446 faced a connection refused exception from these tests, causing them to timeout and fail after a long time. For now, let's disable these tests.

(We recently disabled the corresponding test in streaming in 7863ecc. These tests are very similar).

Author: Andrew Or <andrewor14@gmail.com>

Closes #466 from andrewor14/ignore-ui-tests and squashes the following commits:

6f5a362 [Andrew Or] Ignore org.apache.spark.ui.UISuite tests
(cherry picked from commit af46f1f)

Signed-off-by: Patrick Wendell <pwendell@gmail.com>
@AmplabJenkins
Copy link

Merged build triggered.

@AmplabJenkins
Copy link

Merged build started.

@AmplabJenkins
Copy link

Merged build finished.

@AmplabJenkins
Copy link

Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14311/

@AmplabJenkins
Copy link

Merged build triggered.

@AmplabJenkins
Copy link

Merged build started.

@AmplabJenkins
Copy link

Merged build finished.

@AmplabJenkins
Copy link

Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14314/

@marmbrus
Copy link
Contributor

marmbrus commented May 9, 2014

test this please.

@AmplabJenkins
Copy link

Build triggered.

@AmplabJenkins
Copy link

Build started.

@AmplabJenkins
Copy link

Build finished. All automated tests passed.

@AmplabJenkins
Copy link

All automated tests passed.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14856/

@marmbrus
Copy link
Contributor

marmbrus commented May 9, 2014

LGTM

@pwendell can you please merge?

@pwendell
Copy link
Contributor

@chenghao-intel this has some merge conflicts - mind updating it?

@AmplabJenkins
Copy link

Merged build triggered.

@AmplabJenkins
Copy link

Merged build triggered.

@AmplabJenkins
Copy link

Merged build started.

@AmplabJenkins
Copy link

Merged build finished. All automated tests passed.

@AmplabJenkins
Copy link

All automated tests passed.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14905/

false
} else if (l == null || r == null ) {
null
if(l == false) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mind adding a space here?

@chenghao-intel
Copy link
Contributor Author

Thanks @rxin.
It's done, can you re-test it?

@AmplabJenkins
Copy link

Merged build triggered.

@AmplabJenkins
Copy link

Merged build started.

@rxin
Copy link
Contributor

rxin commented May 16, 2014

LGTM. I will merge it once Jenkins comes back green.

@AmplabJenkins
Copy link

Merged build finished. All automated tests passed.

@AmplabJenkins
Copy link

All automated tests passed.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15039/

@rxin
Copy link
Contributor

rxin commented May 16, 2014

I've merged this. Thanks a lot!

asfgit pushed a commit that referenced this pull request May 16, 2014
This patch unify the foldable & nullable interface for Expression.
1) Deterministic-less UDF (like Rand()) can not be folded.
2) Short-circut will significantly improves the performance in Expression Evaluation, however, the stateful UDF should not be ignored in a short-circuit evaluation(e.g. in expression: col1 > 0 and row_sequence() < 1000, row_sequence() can not be ignored even if col1 > 0 is false)

I brought an concept of DeferredObject from Hive, which has 2 kinds of children classes (EagerResult / DeferredResult), the former requires triggering the evaluation before it's created, while the later trigger the evaluation when first called its get() method.

Author: Cheng Hao <hao.cheng@intel.com>

Closes #446 from chenghao-intel/expression_deferred_evaluation and squashes the following commits:

d2729de [Cheng Hao] Fix the codestyle issues
a08f09c [Cheng Hao] fix bug in or/and short-circuit evaluation
af2236b [Cheng Hao] revert the short-circuit expression evaluation for IF
b7861d2 [Cheng Hao] Add Support for Deferred Expression Evaluation

(cherry picked from commit a20fea9)
Signed-off-by: Reynold Xin <rxin@apache.org>
@asfgit asfgit closed this in a20fea9 May 16, 2014
@chenghao-intel chenghao-intel deleted the expression_deferred_evaluation branch May 28, 2014 02:36
pdeyhim pushed a commit to pdeyhim/spark-1 that referenced this pull request Jun 25, 2014
apache#446 faced a connection refused exception from these tests, causing them to timeout and fail after a long time. For now, let's disable these tests.

(We recently disabled the corresponding test in streaming in 7863ecc. These tests are very similar).

Author: Andrew Or <andrewor14@gmail.com>

Closes apache#466 from andrewor14/ignore-ui-tests and squashes the following commits:

6f5a362 [Andrew Or] Ignore org.apache.spark.ui.UISuite tests
pdeyhim pushed a commit to pdeyhim/spark-1 that referenced this pull request Jun 25, 2014
This patch unify the foldable & nullable interface for Expression.
1) Deterministic-less UDF (like Rand()) can not be folded.
2) Short-circut will significantly improves the performance in Expression Evaluation, however, the stateful UDF should not be ignored in a short-circuit evaluation(e.g. in expression: col1 > 0 and row_sequence() < 1000, row_sequence() can not be ignored even if col1 > 0 is false)

I brought an concept of DeferredObject from Hive, which has 2 kinds of children classes (EagerResult / DeferredResult), the former requires triggering the evaluation before it's created, while the later trigger the evaluation when first called its get() method.

Author: Cheng Hao <hao.cheng@intel.com>

Closes apache#446 from chenghao-intel/expression_deferred_evaluation and squashes the following commits:

d2729de [Cheng Hao] Fix the codestyle issues
a08f09c [Cheng Hao] fix bug in or/and short-circuit evaluation
af2236b [Cheng Hao] revert the short-circuit expression evaluation for IF
b7861d2 [Cheng Hao] Add Support for Deferred Expression Evaluation
markhamstra pushed a commit to markhamstra/spark that referenced this pull request Nov 7, 2017
j-esse pushed a commit to j-esse/spark that referenced this pull request Jan 24, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
7 participants