
[SPARK-30339][SQL] Avoid to fail twice in function lookup #26994

Closed
wants to merge 4 commits

Conversation

wzhfy
Contributor

@wzhfy wzhfy commented Dec 24, 2019

What changes were proposed in this pull request?

Currently, if function lookup fails, Spark gives it a second chance by casting decimal-type arguments to double type. But in cases where no decimal type exists among the arguments, it is meaningless to look up again, and doing so incurs extra cost such as unnecessary metastore access. We should throw the exception directly in these cases.
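
The idea can be sketched in Scala as follows. This is a minimal sketch, not the actual Spark internals: the helper name lookupWithDecimalFallback and the injected lookupFunction parameter are illustrative assumptions.

    import org.apache.spark.sql.AnalysisException
    import org.apache.spark.sql.catalyst.expressions.{Cast, Expression}
    import org.apache.spark.sql.types.{DecimalType, DoubleType}

    // Hypothetical wrapper around a function-lookup call (not Spark's real API).
    def lookupWithDecimalFallback(
        name: String,
        arguments: Seq[Expression],
        lookupFunction: (String, Seq[Expression]) => Expression): Expression = {
      try {
        lookupFunction(name, arguments)
      } catch {
        case e: AnalysisException =>
          // Retry with decimal-to-double casts only if a decimal argument
          // actually exists; otherwise the second lookup would fail the same
          // way and only add cost such as an extra metastore round-trip.
          if (!arguments.exists(_.dataType.isInstanceOf[DecimalType])) {
            throw e
          }
          val castedArgs = arguments.map {
            case arg if arg.dataType.isInstanceOf[DecimalType] => Cast(arg, DoubleType)
            case arg => arg
          }
          lookupFunction(name, castedArgs)
      }
    }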

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Covered by existing tests.

@wzhfy
Contributor Author

wzhfy commented Dec 24, 2019

cc @dongjoon-hyun @gatorsmile

@SparkQA

SparkQA commented Dec 24, 2019

Test build #115679 has finished for PR 26994 at commit a471a9b.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@maropu
Member

maropu commented Dec 24, 2019

retest this please

@SparkQA

SparkQA commented Dec 24, 2019

Test build #115688 has finished for PR 26994 at commit a471a9b.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@maropu
Member

maropu commented Dec 25, 2019

retest this please

@SparkQA

SparkQA commented Dec 25, 2019

Test build #115753 has finished for PR 26994 at commit a471a9b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Member

@HyukjinKwon HyukjinKwon left a comment


Can you remove

    test("SPARK-16228 Percentile needs explicit cast to double") {
      sql("select percentile(value, cast(0.5 as double)) from values 1,2,3 T(value)")
      sql("select percentile_approx(value, cast(0.5 as double)) from values 1.0,2.0,3.0 T(value)")
      sql("select percentile(value, 0.5) from values 1,2,3 T(value)")
      sql("select percentile_approx(value, 0.5) from values 1.0,2.0,3.0 T(value)")
    }

this test case while we're here? It no longer verifies this change, since Spark now implements these functions natively.

Member

@HyukjinKwon HyukjinKwon left a comment


Looks fine with @maropu's suggestion.

@SparkQA

SparkQA commented Dec 30, 2019

Test build #115955 has finished for PR 26994 at commit be6c276.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Dec 30, 2019

Test build #115957 has finished for PR 26994 at commit 71b85e6.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Member

Merged to master.

wzhfy added a commit to wzhfy/spark that referenced this pull request Dec 31, 2019
### What changes were proposed in this pull request?

Currently, if function lookup fails, Spark gives it a second chance by casting decimal-type arguments to double type. But in cases where no decimal type exists among the arguments, it is meaningless to look up again, and doing so incurs extra cost such as unnecessary metastore access. We should throw the exception directly in these cases.

### Does this PR introduce any user-facing change?

No.

### How was this patch tested?

Covered by existing tests.

Closes apache#26994 from wzhfy/avoid_udf_fail_twice.

Authored-by: Zhenhua Wang <wzh_zju@163.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
wzhfy added a commit that referenced this pull request Dec 31, 2019
### What changes were proposed in this pull request?

Backported from [pr#26994](#26994). Currently, if function lookup fails, Spark gives it a second chance by casting decimal-type arguments to double type. But in cases where no decimal type exists among the arguments, it is meaningless to look up again, and doing so incurs extra cost such as unnecessary metastore access. We should throw the exception directly in these cases.

### Does this PR introduce any user-facing change?

No.

### How was this patch tested?

Covered by existing tests.

Closes #27054 from wzhfy/avoid_udf_fail_twice-2.4.

Authored-by: Zhenhua Wang <wzh_zju@163.com>
Signed-off-by: Zhenhua Wang <wzh_zju@163.com>
@maropu
Member

maropu commented Jan 2, 2020

Late LGTM. Thanks, @HyukjinKwon!
