[SPARK-20335] [SQL] Children expressions of Hive UDF impacts the determinism of Hive UDF #17635

Closed
wants to merge 3 commits into from

gatorsmile
Member

What changes were proposed in this pull request?

  /**
   * Certain optimizations should not be applied if UDF is not deterministic.
   * Deterministic UDF returns same result each time it is invoked with a
   * particular input. This determinism just needs to hold within the context of
   * a query.
   *
   * @return true if the UDF is deterministic
   */
  boolean deterministic() default true;

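Based on the definition of UDFType above, a deterministic UDF must return the same result every time it is invoked with a particular input. If any child expression of a Hive UDF is non-deterministic, the inputs themselves can differ between evaluations, so the Hive UDF expression as a whole must also be treated as non-deterministic.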
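The rule can be illustrated with a minimal sketch. The types below (`Expr`, `Column`, `Rand`, `UdfCall`) are hypothetical stand-ins invented for this example; they are not Spark's or Hive's classes, and the actual patch may be structured differently. The only point shown is that a UDF call counts as deterministic when the UDF itself is declared deterministic and every child expression is deterministic too.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Hypothetical sketch; none of these types exist in Spark or Hive. */
public class DeterminismSketch {

    /** A node in an expression tree. */
    interface Expr {
        List<Expr> children();
        boolean deterministic();
    }

    /** A column reference: a leaf that always yields the same value for a given row. */
    static final class Column implements Expr {
        private final String name;
        Column(String name) { this.name = name; }
        public List<Expr> children() { return Collections.emptyList(); }
        public boolean deterministic() { return true; }
    }

    /** A leaf modeled on a random-number expression: never deterministic. */
    static final class Rand implements Expr {
        public List<Expr> children() { return Collections.emptyList(); }
        public boolean deterministic() { return false; }
    }

    /** A UDF call whose determinism depends on both the UDF and its arguments. */
    static final class UdfCall implements Expr {
        // Stands in for the value of the UDFType deterministic() annotation element.
        private final boolean declaredDeterministic;
        private final List<Expr> args;

        UdfCall(boolean declaredDeterministic, List<Expr> args) {
            this.declaredDeterministic = declaredDeterministic;
            this.args = args;
        }

        public List<Expr> children() { return args; }

        // Deterministic only if the UDF is declared so AND all children are deterministic.
        public boolean deterministic() {
            return declaredDeterministic && args.stream().allMatch(Expr::deterministic);
        }
    }

    public static void main(String[] argv) {
        Expr col = new Column("x");
        Expr rand = new Rand();

        System.out.println(new UdfCall(true, Arrays.asList(col)).deterministic());  // true
        System.out.println(new UdfCall(true, Arrays.asList(rand)).deterministic()); // false
        System.out.println(new UdfCall(false, Arrays.asList(col)).deterministic()); // false
    }
}
```

Because `deterministic()` recurses through `children()`, a non-deterministic leaf nested several levels below a UDF call still marks the outer call as non-deterministic in this sketch.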

How was this patch tested?

Added test cases.

@@ -509,6 +509,19 @@ abstract class AggregationQuerySuite extends QueryTest with SQLTestUtils with Te
Row(null, null, 110.0, null, null, 10.0) :: Nil)
}

test("non-deterministic children expressions of UDAF") {
Member Author

This is just to improve the test case coverage.

@@ -84,6 +85,21 @@ class HiveUDAFSuite extends QueryTest with TestHiveSingleton with SQLTestUtils {
Row(1, Row(1, 1))
))
}

test("non-deterministic children expressions of UDAF") {
Member Author

This is just to improve the test case coverage.

@SparkQA

SparkQA commented Apr 14, 2017

Test build #75793 has started for PR 17635 at commit b593be1.

@SparkQA

SparkQA commented Apr 14, 2017

Test build #75791 has finished for PR 17635 at commit 5cb4206.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

test("non-deterministic children expressions of UDAF") {
withTempView("view1") {
spark.range(1).selectExpr("id as x", "id as y").createTempView("view1")
withUserDefinedFunction("testUDAFPercentile" -> true, "testMock" -> true) {
Member

@viirya viirya Apr 14, 2017

testMock? Do we use it?

@viirya
Member

viirya commented Apr 14, 2017

LGTM except a minor comment on the test.

@SparkQA

SparkQA commented Apr 15, 2017

Test build #75824 has finished for PR 17635 at commit 508a43d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile
Member Author

cc @cloud-fan

@dongjoon-hyun
Member

+1

@cloud-fan
Contributor

LGTM, merging to master! @gatorsmile shall we backport this PR?

@asfgit asfgit closed this in e090f3c Apr 16, 2017
@gatorsmile
Member Author

gatorsmile commented Apr 16, 2017

Maybe, yes. Will do it later. Thank you!

asfgit pushed a commit that referenced this pull request Apr 17, 2017
…acts the determinism of Hive UDF

### What changes were proposed in this pull request?

This PR backports #17635 to Spark 2.1.

---
```JAVA
  /**
   * Certain optimizations should not be applied if UDF is not deterministic.
   * Deterministic UDF returns same result each time it is invoked with a
   * particular input. This determinism just needs to hold within the context of
   * a query.
   *
   * return true if the UDF is deterministic
   */
  boolean deterministic() default true;
```

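Based on the definition of [UDFType](https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/udf/UDFType.java#L42-L50), a deterministic UDF must return the same result every time it is invoked with a particular input. If any child expression of a Hive UDF is non-deterministic, the Hive UDF expression as a whole must also be treated as non-deterministic.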

### How was this patch tested?
Added test cases.

Author: Xiao Li <gatorsmile@gmail.com>

Closes #17652 from gatorsmile/backport-17635.
peter-toth pushed a commit to peter-toth/spark that referenced this pull request Oct 6, 2018
…minism of Hive UDF


Author: Xiao Li <gatorsmile@gmail.com>

Closes apache#17635 from gatorsmile/udfDeterministic.
5 participants