
[SPARK-30339][SQL] Avoid to fail twice in function lookup #26994

Closed
wants to merge 4 commits

Conversation

wzhfy
Contributor

@wzhfy wzhfy commented Dec 24, 2019

What changes were proposed in this pull request?

Currently, if function lookup fails, Spark gives it a second chance by casting decimal-type arguments to double type. But in cases where no decimal type exists among the arguments, it is meaningless to look up again, and doing so incurs extra cost such as unnecessary metastore access. We should throw the exception directly in these cases.
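
The idea can be sketched in Scala as follows. This is a minimal sketch, not the actual Spark internals: the helper name lookupWithDecimalFallback and the injected lookupFunction parameter are illustrative assumptions.

    import org.apache.spark.sql.AnalysisException
    import org.apache.spark.sql.catalyst.expressions.{Cast, Expression}
    import org.apache.spark.sql.types.{DecimalType, DoubleType}

    // Hypothetical wrapper around a function-lookup call (not Spark's real API).
    def lookupWithDecimalFallback(
        name: String,
        arguments: Seq[Expression],
        lookupFunction: (String, Seq[Expression]) => Expression): Expression = {
      try {
        lookupFunction(name, arguments)
      } catch {
        case e: AnalysisException =>
          // Retry with decimal-to-double casts only if a decimal argument
          // actually exists; otherwise the second lookup would fail the same
          // way and only add cost such as an extra metastore round-trip.
          if (!arguments.exists(_.dataType.isInstanceOf[DecimalType])) {
            throw e
          }
          val castedArgs = arguments.map {
            case arg if arg.dataType.isInstanceOf[DecimalType] => Cast(arg, DoubleType)
            case arg => arg
          }
          lookupFunction(name, castedArgs)
      }
    }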

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Covered by existing tests.

@wzhfy
Contributor Author

wzhfy commented Dec 24, 2019

cc @dongjoon-hyun @gatorsmile

@SparkQA

SparkQA commented Dec 24, 2019

Test build #115679 has finished for PR 26994 at commit a471a9b.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@maropu
Member

maropu commented Dec 24, 2019

retest this please

@SparkQA

SparkQA commented Dec 24, 2019

Test build #115688 has finished for PR 26994 at commit a471a9b.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@maropu
Member

maropu commented Dec 25, 2019

retest this please

@SparkQA

SparkQA commented Dec 25, 2019

Test build #115753 has finished for PR 26994 at commit a471a9b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Member

@HyukjinKwon HyukjinKwon left a comment


Can you remove

    test("SPARK-16228 Percentile needs explicit cast to double") {
      sql("select percentile(value, cast(0.5 as double)) from values 1,2,3 T(value)")
      sql("select percentile_approx(value, cast(0.5 as double)) from values 1.0,2.0,3.0 T(value)")
      sql("select percentile(value, 0.5) from values 1,2,3 T(value)")
      sql("select percentile_approx(value, 0.5) from values 1.0,2.0,3.0 T(value)")
    }

this test case while we're here? It no longer verifies this change, since Spark now implements these functions natively.

Member

@HyukjinKwon HyukjinKwon left a comment


Looks fine with @maropu's suggestion.

@SparkQA

SparkQA commented Dec 30, 2019

Test build #115955 has finished for PR 26994 at commit be6c276.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Dec 30, 2019

Test build #115957 has finished for PR 26994 at commit 71b85e6.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Member

Merged to master.

wzhfy added a commit to wzhfy/spark that referenced this pull request Dec 31, 2019
### What changes were proposed in this pull request?

Currently, if function lookup fails, Spark gives it a second chance by casting decimal-type arguments to double type. But in cases where no decimal type exists among the arguments, it is meaningless to look up again, and doing so incurs extra cost such as unnecessary metastore access. We should throw the exception directly in these cases.

### Does this PR introduce any user-facing change?

No.

### How was this patch tested?

Covered by existing tests.

Closes apache#26994 from wzhfy/avoid_udf_fail_twice.

Authored-by: Zhenhua Wang <wzh_zju@163.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
wzhfy added a commit that referenced this pull request Dec 31, 2019
### What changes were proposed in this pull request?

Backported from [pr#26994](#26994). Currently, if function lookup fails, Spark gives it a second chance by casting decimal-type arguments to double type. But in cases where no decimal type exists among the arguments, it is meaningless to look up again, and doing so incurs extra cost such as unnecessary metastore access. We should throw the exception directly in these cases.

### Does this PR introduce any user-facing change?

No.

### How was this patch tested?

Covered by existing tests.

Closes #27054 from wzhfy/avoid_udf_fail_twice-2.4.

Authored-by: Zhenhua Wang <wzh_zju@163.com>
Signed-off-by: Zhenhua Wang <wzh_zju@163.com>
@maropu
Member

maropu commented Jan 2, 2020

Late LGTM. Thanks, @HyukjinKwon!
