[SPARK-14614][SQL] Add bround function #12376
dongjoon-hyun wants to merge 1 commit into apache:master from dongjoon-hyun:SPARK-14614
|
Test build #55774 has finished for PR 12376 at commit
|
|
Test build #55778 has finished for PR 12376 at commit
|
|
Test build #55799 has finished for PR 12376 at commit
|
|
Interesting. While I'm well aware of Bankers' Rounding and the inconsistent implementations of rounding in various SQL engines, I hadn't run into the bround() function before. Searching now, I find that it is also in SQL Server, but it looks to me like this is another bit of rounding functionality that is not standardized across SQL implementations. Do you know any different? While having more than one rounding strategy available within Spark SQL can be important for interoperating with other systems, I'm not sure if we have decided to always follow HQL, particularly as Spark SQL becomes less directly dependent on Hive in Spark 2.0. @marmbrus ? |
|
Test build #55805 has finished for PR 12376 at commit
|
|
Hi, @markhamstra ! Thank you for commenting. I agree with your viewpoint. So, this PR is only meant to add a function. In terms of semantics, it is the same implementation as Hive's. The following is the Hive code from the Hive master branch.

    public static double bround(double input, int scale) {
      if (Double.isNaN(input) || Double.isInfinite(input)) {
        return input;
      }
      return BigDecimal.valueOf(input).setScale(scale, RoundingMode.HALF_EVEN).doubleValue();
    }

By the way, on the last issue, I think differently. |
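To make the difference in semantics concrete, here is a minimal, self-contained sketch that contrasts the two tie-breaking rules. It is not the code added by this PR. The class name `BroundSketch` and the helper `roundHalfUp` are hypothetical, and `round` is assumed to behave like `RoundingMode.HALF_UP`, which is consistent with `round(2.5)` returning 3 in the examples from this PR.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

public class BroundSketch {
    // Banker's rounding: a tie goes to the neighbor whose last kept digit is even.
    public static double bround(double input, int scale) {
        if (Double.isNaN(input) || Double.isInfinite(input)) {
            return input;
        }
        return BigDecimal.valueOf(input).setScale(scale, RoundingMode.HALF_EVEN).doubleValue();
    }

    // Hypothetical stand-in for round(): a tie goes away from zero (assumption).
    public static double roundHalfUp(double input, int scale) {
        if (Double.isNaN(input) || Double.isInfinite(input)) {
            return input;
        }
        return BigDecimal.valueOf(input).setScale(scale, RoundingMode.HALF_UP).doubleValue();
    }

    public static void main(String[] args) {
        double[] samples = {0.5, 1.5, 2.5, 3.5, -2.5};
        for (double x : samples) {
            System.out.println(x + " -> round: " + roundHalfUp(x, 0) + ", bround: " + bround(x, 0));
        }
    }
}
```

The two functions differ only on exact ties such as 0.5 and 2.5. Non-tie inputs round to the nearest value under both modes.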
|
Hi, @davies . |
| * | ||
| * @group math_funcs | ||
| * @since 2.0.0 | ||
| */ |
Could you add this for Python and R? or create a JIRA for it.
Thank you for the review, @davies .
According to your comments, I created SPARK-14639 for it.
No, please don't do JIRAs that way. A JIRA that just refers to a PR (or a PR description that just refers to a JIRA number) is nearly pointless and very annoying. Always include a description in the JIRA motivating why a change is needed or desired, and a description of what was changed should go in the PR itself.
Thank you. I'll fill it in soon.
@markhamstra . I updated it.
Please let me know if you think I need to do more.
Thank you in many ways.
Thanks, that's enough. A small, simple issue like this doesn't require a lot of description, but there still should be some.
|
+1 to native implementations of hive udfs so we can continue to minimize our dependence. |
|
LGTM |
|
Yes, I am also +1 for a native implementation over using Hive, but my question is more whether we want |
|
Thank you, @marmbrus , @davies , @markhamstra . |
|
Hi, @markhamstra . |
|
The above is my opinion about your second question. |
|
By the way, is Spark heading to some SQL Standard? |
|
@dongjoon-hyun Following Hive's lead is definitely one option. I don't know whether it is the right option or whether any strategic decision has been made about how we will handle non-standard SQL functionality in Spark 2.0 -- particularly as we gain independence from Hive's implementation, we could also separate more easily from Hive's interface. |
|
Sure. This year, Spark seems to be able to remove Hive code. I agree that Spark is better than Hive and we can do more. But, in terms of Hive compatibility, Spark had better have the same semantics as Hive for a few years. In this PR, I mean I'm wondering whether we can be proud that we have superior |
|
Hi, @marmbrus , @davies , @markhamstra . |
|
So far, I cannot find any reason to pursue other So, could anyone merge this PR please? |
|
Merging this into master, thanks! |
|
Thank you so much, @davies ! |
## What changes were proposed in this pull request?

This PR aims to add the `bround` function (aka Banker's round) by extending the current `round` implementation. [Hive supports `bround` since 1.3.0.](https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF)

**Hive (1.3 ~ 2.0)**
```
hive> select round(2.5), bround(2.5);
OK
3.0 2.0
```

**After this PR**
```scala
scala> sql("select round(2.5), bround(2.5)").head
res0: org.apache.spark.sql.Row = [3,2]
```

## How was this patch tested?

Pass the Jenkins tests (with extended tests).

Author: Dongjoon Hyun <dongjoon@apache.org>

Closes apache#12376 from dongjoon-hyun/SPARK-14614.