[SPARK-33733][SQL] PullOutNondeterministic should check and collect deterministic field #30703

ulysses-you · 2020-12-10T08:14:38Z

What changes were proposed in this pull request?

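The deterministic field covers more expressions than the Nondeterministic trait: an expression can report deterministic = false without extending Nondeterministic. PullOutNondeterministic should therefore pull out the same set of expressions that CheckAnalysis rejects.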

Why are the changes needed?

For example

select * from values(1), (4) as t(c1) order by java_method('java.lang.Math', 'abs', c1)

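This query fails with the exception below, because java_method has its deterministic field set to false but is not a Nondeterministic, so it is rejected by the check without ever being pulled out: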

Exception in thread "main" org.apache.spark.sql.AnalysisException: nondeterministic expressions are only allowed in
Project, Filter, Aggregate or Window, found:
 java_method('java.lang.Math', 'abs', t.`c1`) ASC NULLS FIRST
in operator Sort [java_method(java.lang.Math, abs, c1#1) ASC NULLS FIRST], true
               ;;
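To make the mismatch concrete, here is a minimal, self-contained sketch. It does not use Spark's classes: Expr, Nondeterministic, Rand, Reflect and SortOrder are toy stand-ins made up for illustration, with Reflect playing the role of java_method (flag overridden to false, marker trait not mixed in). It only shows why a check based on the flag and a collect based on the trait can disagree.

```scala
object DeterminismMismatch {
  // Toy expression tree, NOT Spark's real Catalyst classes.
  sealed trait Expr {
    def children: Seq[Expr]
    // An expression is deterministic only if all of its children are.
    def deterministic: Boolean = children.forall(_.deterministic)
    // Pre-order traversal that keeps every node matched by pf.
    def collect[A](pf: PartialFunction[Expr, A]): Seq[A] =
      pf.lift(this).toList ++ children.flatMap(_.collect(pf))
  }

  // Marker trait: everything that mixes it in is nondeterministic.
  sealed trait Nondeterministic extends Expr {
    final override def deterministic: Boolean = false
  }

  final case class Column(name: String) extends Expr {
    val children: Seq[Expr] = Nil
  }
  final case class Rand() extends Nondeterministic {
    val children: Seq[Expr] = Nil
  }
  // Hypothetical stand-in for java_method: overrides the flag,
  // but does NOT extend the marker trait.
  final case class Reflect(children: Seq[Expr]) extends Expr {
    override def deterministic: Boolean = false
  }
  final case class SortOrder(child: Expr) extends Expr {
    def children: Seq[Expr] = Seq(child)
  }

  val randOrder: Expr = SortOrder(Rand())
  val reflectOrder: Expr = SortOrder(Reflect(Seq(Column("c1"))))

  // What the analysis check looks at: the flag.
  def rejectedByCheck(e: Expr): Boolean = !e.deterministic

  // What a trait-only pull-out collects: the marker trait.
  def collectedByTrait(e: Expr): Seq[Expr] =
    e.collect { case n: Nondeterministic => n }

  def main(args: Array[String]): Unit = {
    // Rand: rejected by the flag check and found by the trait collect.
    println(rejectedByCheck(randOrder) -> collectedByTrait(randOrder))
    // Reflect: rejected by the flag check, but the trait collect finds nothing.
    println(rejectedByCheck(reflectOrder) -> collectedByTrait(reflectOrder))
  }
}
```

In the second case nothing is collected, so nothing gets pulled out, yet the flag-based check still rejects the Sort. That is the gap this PR closes.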

Does this PR introduce any user-facing change?

Yes.

How was this patch tested?

Add test.

SparkQA · 2020-12-10T09:20:24Z

Test build #132553 has finished for PR 30703 at commit 33eda3c.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2020-12-10T09:30:33Z

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/37157/

SparkQA · 2020-12-10T09:30:34Z

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/37157/

ulysses-you · 2020-12-10T09:33:33Z

retest this please

SparkQA · 2020-12-10T10:26:52Z

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/37161/

SparkQA · 2020-12-10T10:54:55Z

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/37161/

SparkQA · 2020-12-10T11:29:45Z

Test build #132557 has finished for PR 30703 at commit 33eda3c.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2020-12-10T13:33:19Z

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/37170/

SparkQA · 2020-12-10T13:33:20Z

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/37170/

SparkQA · 2020-12-10T15:54:19Z

Test build #132566 has finished for PR 30703 at commit d1ebbab.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

ulysses-you · 2020-12-11T01:14:28Z

retest this please

maropu · 2020-12-11T01:59:42Z

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala

-        val leafNondeterministic = expr.collect { case n: Nondeterministic => n }
+        val leafNondeterministic = expr.collect {
+          case n: Nondeterministic => n
+          case e: CallMethodViaReflection => e


This fix itself looks fine. Just a nit: naming java_method('java.lang.Math', 'abs', c1) "_nondeterministic" in a plan looks a bit weird.

ulysses-you · 2020-12-11

It is just for a simple test, like the empty Scala UDF.

maropu · 2020-12-11

Ah, ok. I misunderstood it a bit. nvm.
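For readers unfamiliar with what "pulling out" means here, the toy sketch below shows the general shape of the rewrite as I understand it: the nondeterministic sort expression is evaluated once in a Project under the alias _nondeterministic, the Sort then orders by that attribute, and an outer Project restores the original output. All types (Attr, Reflect, Alias, Relation, Project, Sort) and the pullOut function are hypothetical simplifications for illustration, not Spark's Analyzer code.

```scala
object PullOutSketch {
  // Toy expressions, NOT Spark's real Catalyst classes.
  sealed trait Expr { def deterministic: Boolean = true }
  final case class Attr(name: String) extends Expr
  // Hypothetical stand-in for java_method: flag is false.
  final case class Reflect(method: String, arg: Expr) extends Expr {
    override def deterministic: Boolean = false
  }
  final case class Alias(child: Expr, name: String) extends Expr {
    override def deterministic: Boolean = child.deterministic
  }

  // Toy logical plans.
  sealed trait Plan { def output: Seq[Attr] }
  final case class Relation(output: Seq[Attr]) extends Plan
  final case class Project(exprs: Seq[Expr], child: Plan) extends Plan {
    def output: Seq[Attr] = exprs.collect {
      case a: Attr     => a
      case Alias(_, n) => Attr(n)
    }
  }
  final case class Sort(order: Expr, child: Plan) extends Plan {
    def output: Seq[Attr] = child.output
  }

  // Simplified pull-out: only handles a Sort whose ordering
  // expression itself reports deterministic = false.
  def pullOut(plan: Plan): Plan = plan match {
    case s @ Sort(order, child) if !order.deterministic =>
      val alias = Alias(order, "_nondeterministic")
      // Evaluate the expression once, below the Sort.
      val evaluated = Project(child.output :+ alias, child)
      // Sort by the new attribute, then drop it again.
      Project(s.output, Sort(Attr(alias.name), evaluated))
    case other => other
  }

  def main(args: Array[String]): Unit = {
    val relation = Relation(Seq(Attr("c1")))
    println(pullOut(Sort(Reflect("abs", Attr("c1")), relation)))
  }
}
```

The alias name is generic because the rule does not know which function it is wrapping, which is why java_method shows up as _nondeterministic in the rewritten plan.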

SparkQA · 2020-12-11T02:35:19Z

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/37211/

SparkQA · 2020-12-11T03:01:54Z

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/37211/

SparkQA · 2020-12-11T06:21:23Z

Test build #132607 has finished for PR 30703 at commit d1ebbab.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

ulysses-you · 2020-12-11T06:29:58Z

cc @cloud-fan

SparkQA · 2020-12-11T11:59:33Z

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/37251/

SparkQA · 2020-12-11T12:31:07Z

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/37251/

SparkQA · 2020-12-11T15:31:32Z

Test build #132648 has finished for PR 30703 at commit 9ec0654.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

ulysses-you · 2020-12-14T10:29:42Z

cc @cloud-fan @maropu thanks for review !

cloud-fan · 2020-12-14T14:35:22Z

thanks, merging to master/3.1!

…eterministic field

### What changes were proposed in this pull request?

The deterministic field is wider than `NonDerterministic`, we should keep same range between pull out and check analysis.

### Why are the changes needed?

For example

```
select * from values(1), (4) as t(c1) order by java_method('java.lang.Math', 'abs', c1)
```

We will get exception since `java_method` deterministic field is false but not a `NonDeterministic`

```
Exception in thread "main" org.apache.spark.sql.AnalysisException: nondeterministic expressions are only allowed in
Project, Filter, Aggregate or Window, found:
 java_method('java.lang.Math', 'abs', t.`c1`) ASC NULLS FIRST
in operator Sort [java_method(java.lang.Math, abs, c1#1) ASC NULLS FIRST], true
               ;;
```

### Does this PR introduce _any_ user-facing change?

Yes.

### How was this patch tested?

Add test.

Closes #30703 from ulysses-you/SPARK-33733.

Authored-by: ulysses-you <ulyssesyou18@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry picked from commit 839d689)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>

cloud-fan · 2020-12-14T14:36:47Z

@ulysses-you can you open backport PRs for 3.0/2.4? thanks!

ulysses-you · 2020-12-15T01:15:01Z

@cloud-fan will do it.

…ect deterministic field

backport [#30703](#30703) for branch-2.4.

### What changes were proposed in this pull request?

The deterministic field is wider than `NonDerterministic`, we should keep same range between pull out and check analysis.

### Why are the changes needed?

For example

```
select * from values(1), (4) as t(c1) order by java_method('java.lang.Math', 'abs', c1)
```

We will get exception since `java_method` deterministic field is false but not a `NonDeterministic`

```
Exception in thread "main" org.apache.spark.sql.AnalysisException: nondeterministic expressions are only allowed in
Project, Filter, Aggregate or Window, found:
 java_method('java.lang.Math', 'abs', t.`c1`) ASC NULLS FIRST
in operator Sort [java_method(java.lang.Math, abs, c1#1) ASC NULLS FIRST], true
               ;;
```

### Does this PR introduce _any_ user-facing change?

Yes.

### How was this patch tested?

Add test.

Closes #30772 from ulysses-you/SPARK-33733-branch-2.4.

Authored-by: ulysses-you <ulyssesyou18@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>

…ect deterministic field

backport [#30703](#30703) for branch-3.0.

### What changes were proposed in this pull request?

The deterministic field is wider than `NonDerterministic`, we should keep same range between pull out and check analysis.

### Why are the changes needed?

For example

```
select * from values(1), (4) as t(c1) order by java_method('java.lang.Math', 'abs', c1)
```

We will get exception since `java_method` deterministic field is false but not a `NonDeterministic`

```
Exception in thread "main" org.apache.spark.sql.AnalysisException: nondeterministic expressions are only allowed in
Project, Filter, Aggregate or Window, found:
 java_method('java.lang.Math', 'abs', t.`c1`) ASC NULLS FIRST
in operator Sort [java_method(java.lang.Math, abs, c1#1) ASC NULLS FIRST], true
               ;;
```

### Does this PR introduce _any_ user-facing change?

Yes.

### How was this patch tested?

Add test.

Closes #30771 from ulysses-you/SPARK-33733-branch-3.0.

Authored-by: ulysses-you <ulyssesyou18@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>

maropu · 2020-12-17T23:52:41Z

late lgtm. thanks for fixing it, @ulysses-you

init

33eda3c

github-actions bot added the SQL label Dec 10, 2020

ulysses-you added 2 commits December 10, 2020 20:20

CallMethodViaReflection and UserDefinedExpression

4b92404

test

d1ebbab

maropu reviewed Dec 11, 2020

cloud-fan reviewed Dec 11, 2020

make nondeterministic

9ec0654

cloud-fan closed this in 839d689 Dec 14, 2020

This was referenced Dec 17, 2020

[SPARK-33733][SQL][2.4] PullOutNondeterministic should check and collect deterministic field #30772

Closed

[SPARK-33733][SQL][3.0] PullOutNondeterministic should check and collect deterministic field #30771

Closed

ulysses-you deleted the SPARK-33733 branch March 3, 2021 01:16
