Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-33733][SQL] PullOutNondeterministic should check and collect deterministic field #30703

Closed
wants to merge 4 commits into from

Conversation

ulysses-you
Copy link
Contributor

What changes were proposed in this pull request?

The deterministic field is wider than NonDerterministic, we should keep same range between pull out and check analysis.

Why are the changes needed?

For example

select * from values(1), (4) as t(c1) order by java_method('java.lang.Math', 'abs', c1)

We will get exception since java_method deterministic field is false but not a NonDeterministic

Exception in thread "main" org.apache.spark.sql.AnalysisException: nondeterministic expressions are only allowed in
Project, Filter, Aggregate or Window, found:
 java_method('java.lang.Math', 'abs', t.`c1`) ASC NULLS FIRST
in operator Sort [java_method(java.lang.Math, abs, c1#1) ASC NULLS FIRST], true
               ;;

Does this PR introduce any user-facing change?

Yes.

How was this patch tested?

Add test.

@github-actions github-actions bot added the SQL label Dec 10, 2020
@SparkQA
Copy link

SparkQA commented Dec 10, 2020

Test build #132553 has finished for PR 30703 at commit 33eda3c.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Dec 10, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/37157/

@SparkQA
Copy link

SparkQA commented Dec 10, 2020

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/37157/

@ulysses-you
Copy link
Contributor Author

retest this please

@SparkQA
Copy link

SparkQA commented Dec 10, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/37161/

@SparkQA
Copy link

SparkQA commented Dec 10, 2020

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/37161/

@SparkQA
Copy link

SparkQA commented Dec 10, 2020

Test build #132557 has finished for PR 30703 at commit 33eda3c.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Dec 10, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/37170/

@SparkQA
Copy link

SparkQA commented Dec 10, 2020

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/37170/

@SparkQA
Copy link

SparkQA commented Dec 10, 2020

Test build #132566 has finished for PR 30703 at commit d1ebbab.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@ulysses-you
Copy link
Contributor Author

retest this please

val leafNondeterministic = expr.collect { case n: Nondeterministic => n }
val leafNondeterministic = expr.collect {
case n: Nondeterministic => n
case e: CallMethodViaReflection => e
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This fix itself looks fine. Just my nit comment: naming java_method('java.lang.Math', 'abs', c1) "_nondeterministic " in a plan looks a bit weird.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just for simple test, like the empty scala udf.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, ok. I misunderstood it a bit. nvm.

@SparkQA
Copy link

SparkQA commented Dec 11, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/37211/

@SparkQA
Copy link

SparkQA commented Dec 11, 2020

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/37211/

@SparkQA
Copy link

SparkQA commented Dec 11, 2020

Test build #132607 has finished for PR 30703 at commit d1ebbab.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@ulysses-you
Copy link
Contributor Author

cc @cloud-fan

@SparkQA
Copy link

SparkQA commented Dec 11, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/37251/

@SparkQA
Copy link

SparkQA commented Dec 11, 2020

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/37251/

@SparkQA
Copy link

SparkQA commented Dec 11, 2020

Test build #132648 has finished for PR 30703 at commit 9ec0654.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@ulysses-you
Copy link
Contributor Author

cc @cloud-fan @maropu thanks for review !

@cloud-fan
Copy link
Contributor

cloud-fan commented Dec 14, 2020

thanks, merging to master/3.1!

@cloud-fan cloud-fan closed this in 839d689 Dec 14, 2020
cloud-fan pushed a commit that referenced this pull request Dec 14, 2020
…eterministic field

### What changes were proposed in this pull request?

The deterministic field is wider than `NonDerterministic`, we should keep same range between pull out and check analysis.

### Why are the changes needed?

For example
```
select * from values(1), (4) as t(c1) order by java_method('java.lang.Math', 'abs', c1)
```

We will get exception since `java_method` deterministic field is false but not a `NonDeterministic`
```
Exception in thread "main" org.apache.spark.sql.AnalysisException: nondeterministic expressions are only allowed in
Project, Filter, Aggregate or Window, found:
 java_method('java.lang.Math', 'abs', t.`c1`) ASC NULLS FIRST
in operator Sort [java_method(java.lang.Math, abs, c1#1) ASC NULLS FIRST], true
               ;;
```

### Does this PR introduce _any_ user-facing change?

Yes.

### How was this patch tested?

Add test.

Closes #30703 from ulysses-you/SPARK-33733.

Authored-by: ulysses-you <ulyssesyou18@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry picked from commit 839d689)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
@cloud-fan
Copy link
Contributor

@ulysses-you can you open backport PRs for 3.0/2.4? thanks!

@ulysses-you
Copy link
Contributor Author

@cloud-fan will do it.

cloud-fan pushed a commit that referenced this pull request Dec 17, 2020
…ect deterministic field

backport [#30703](#30703) for branch-2.4.

### What changes were proposed in this pull request?

The deterministic field is wider than `NonDerterministic`, we should keep same range between pull out and check analysis.

### Why are the changes needed?

For example
```
select * from values(1), (4) as t(c1) order by java_method('java.lang.Math', 'abs', c1)
```

We will get exception since `java_method` deterministic field is false but not a `NonDeterministic`
```
Exception in thread "main" org.apache.spark.sql.AnalysisException: nondeterministic expressions are only allowed in
Project, Filter, Aggregate or Window, found:
 java_method('java.lang.Math', 'abs', t.`c1`) ASC NULLS FIRST
in operator Sort [java_method(java.lang.Math, abs, c1#1) ASC NULLS FIRST], true
               ;;
```

### Does this PR introduce _any_ user-facing change?

Yes.

### How was this patch tested?

Add test.

Closes #30772 from ulysses-you/SPARK-33733-branch-2.4.

Authored-by: ulysses-you <ulyssesyou18@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
cloud-fan pushed a commit that referenced this pull request Dec 17, 2020
…ect deterministic field

backport [#30703](#30703) for branch-3.0.

### What changes were proposed in this pull request?

The deterministic field is wider than `NonDerterministic`, we should keep same range between pull out and check analysis.

### Why are the changes needed?

For example
```
select * from values(1), (4) as t(c1) order by java_method('java.lang.Math', 'abs', c1)
```

We will get exception since `java_method` deterministic field is false but not a `NonDeterministic`
```
Exception in thread "main" org.apache.spark.sql.AnalysisException: nondeterministic expressions are only allowed in
Project, Filter, Aggregate or Window, found:
 java_method('java.lang.Math', 'abs', t.`c1`) ASC NULLS FIRST
in operator Sort [java_method(java.lang.Math, abs, c1#1) ASC NULLS FIRST], true
               ;;
```

### Does this PR introduce _any_ user-facing change?

Yes.

### How was this patch tested?

Add test.

Closes #30771 from ulysses-you/SPARK-33733-branch-3.0.

Authored-by: ulysses-you <ulyssesyou18@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
@maropu
Copy link
Member

maropu commented Dec 17, 2020

late lgtm. thanks for fixing it, @ulysses-you

@ulysses-you ulysses-you deleted the SPARK-33733 branch March 3, 2021 01:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants