
[SPARK-14614][SQL] Add bround function#12376

Closed
dongjoon-hyun wants to merge 1 commit into apache:master from dongjoon-hyun:SPARK-14614

Conversation

@dongjoon-hyun
Member

What changes were proposed in this pull request?

This PR aims to add the `bround` function (aka banker's rounding) by extending the current `round` implementation. Hive has supported `bround` since 1.3.0 (https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF).

Hive (1.3 ~ 2.0)

hive> select round(2.5), bround(2.5);
OK
3.0 2.0

After this PR

scala> sql("select round(2.5), bround(2.5)").head
res0: org.apache.spark.sql.Row = [3,2]
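For illustration, the difference between the two outputs above can be reproduced with plain JDK `BigDecimal` (a sketch for local experimentation; the class name is made up, and this is not the actual Spark code path):

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

public class RoundVsBround {
    public static void main(String[] args) {
        double x = 2.5;
        // round(): ties round away from zero (HALF_UP), matching the 3.0 above
        double rounded = BigDecimal.valueOf(x)
                .setScale(0, RoundingMode.HALF_UP).doubleValue();
        // bround(): ties round to the nearest even digit (HALF_EVEN),
        // i.e. banker's rounding, matching the 2.0 above
        double brounded = BigDecimal.valueOf(x)
                .setScale(0, RoundingMode.HALF_EVEN).doubleValue();
        System.out.println(rounded + " " + brounded); // 3.0 2.0
    }
}
```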

How was this patch tested?

Pass the Jenkins tests (with extended tests).

@SparkQA

SparkQA commented Apr 14, 2016

Test build #55774 has finished for PR 12376 at commit 2950339.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • abstract class RoundBase(child: Expression, scale: Expression,
    • case class Round(child: Expression, scale: Expression)
    • case class BRound(child: Expression, scale: Expression)

@SparkQA

SparkQA commented Apr 14, 2016

Test build #55778 has finished for PR 12376 at commit 93b6a84.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun dongjoon-hyun changed the title [SPARK-14614] Add bround function [WIP][SPARK-14614][SQL] Add bround function Apr 14, 2016
@SparkQA

SparkQA commented Apr 14, 2016

Test build #55799 has finished for PR 12376 at commit 7198a4a.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@markhamstra
Contributor

Interesting. While I'm well aware of Bankers' Rounding and the inconsistent implementations of rounding in various SQL engines, I hadn't run into the bround() function before. Searching now, I find that it is also in SQL Server, but it looks to me like this is another bit of rounding functionality that is not standardized across SQL implementations. Do you know any different?

While having more than one rounding strategy available within Spark SQL can be important for interoperating with other systems, I'm not sure if we have decided to always follow HQL, particularly as Spark SQL becomes less directly dependent on Hive in Spark 2.0. @marmbrus ?

@SparkQA

SparkQA commented Apr 14, 2016

Test build #55805 has finished for PR 12376 at commit 2b004c3.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • abstract class RoundBase(child: Expression, scale: Expression,
    • case class Round(child: Expression, scale: Expression)
    • case class BRound(child: Expression, scale: Expression)

@dongjoon-hyun
Member Author

Hi, @markhamstra ! Thank you for commenting.

I agree with your viewpoint. The point of this PR is just to add a single function, `bround`; it carries no HQL language-level meaning.

In terms of semantics, this is the same implementation as Hive's. The following is the code from the Hive master branch.

public static double bround(double input, int scale) {
  if (Double.isNaN(input) || Double.isInfinite(input)) {
    return input;
  }
  return BigDecimal.valueOf(input).setScale(scale, RoundingMode.HALF_EVEN).doubleValue();
}
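As a sketch of how this helper behaves (reproducing the logic above with only the JDK, not Hive itself; the wrapper class is hypothetical):

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

public class BroundSketch {
    // Same logic as the Hive snippet above, reproduced for local experimentation.
    static double bround(double input, int scale) {
        if (Double.isNaN(input) || Double.isInfinite(input)) {
            return input; // NaN and infinities pass through unchanged
        }
        return BigDecimal.valueOf(input)
                .setScale(scale, RoundingMode.HALF_EVEN).doubleValue();
    }

    public static void main(String[] args) {
        System.out.println(bround(2.5, 0));  // 2.0: tie goes to the even neighbor
        System.out.println(bround(3.5, 0));  // 4.0: tie goes to the even neighbor
        System.out.println(bround(2.35, 1)); // 2.4: valueOf preserves "2.35" exactly
    }
}
```

Note that `BigDecimal.valueOf(double)` goes through `Double.toString`, so `2.35` becomes an exact decimal `2.35` before rounding, rather than the slightly larger binary double.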

By the way, on the last point I see it differently: in order to become less directly dependent on Hive, we need to provide this as a native Spark SQL function.

@dongjoon-hyun dongjoon-hyun changed the title [WIP][SPARK-14614][SQL] Add bround function [SPARK-14614][SQL] Add bround function Apr 14, 2016
@dongjoon-hyun
Member Author

Hi, @davies .
Could you review this PR, please?

*
* @group math_funcs
* @since 2.0.0
*/
Contributor

Could you add this for Python and R? or create a JIRA for it.

Member Author

Thank you for review, @davies .
According to your comments, I created SPARK-14639 for it.

Contributor

@markhamstra markhamstra Apr 14, 2016

No, please don't do JIRAs that way. A JIRA that just refers to a PR (or a PR description that just refers to a JIRA number) is nearly pointless and very annoying. Always include a description in the JIRA motivating why a change is needed or desired, and a description of what was changed should go in the PR itself.

Member Author

Thank you. I'll fill it soon.

Member Author

@markhamstra, I updated it.
Please let me know if you think more is needed.
Thank you again.

Contributor

Thanks, that's enough. A small, simple issue like this doesn't require a lot of description, but there still should be some.

@marmbrus
Contributor

+1 to native implementations of Hive UDFs so we can continue to minimize our dependence.

@davies
Contributor

davies commented Apr 14, 2016

LGTM

@markhamstra
Contributor

Yes, I am also +1 for a native implementation over using Hive, but my question is more whether we want bround in Spark's SQL dialect, or whether there is another or standard syntax for accessing bankers' rounding that we should consider. I'm just not sure what our strategy is for adopting non-standardized SQL-isms, whether we are committed to doing whatever Hive does or whether we have other considerations.

@dongjoon-hyun
Member Author

Thank you, @marmbrus , @davies , @markhamstra .
@markhamstra, I really appreciate your attention, and I think I understand your concern correctly, but don't you think it's a little beyond the scope of a tiny PR like this? The topic of your question touches many things beyond this PR alone. What about sending an email to the dev list?

@dongjoon-hyun
Member Author

Hi, @markhamstra .
If we choose one of the `bround` variants, I think we had better choose the one in Hive.
What do you think about that?

@dongjoon-hyun
Member Author

The above is my opinion on your second question.
As for the first question, we already have three +1s for adding `bround`
(including yours, thank you).

@dongjoon-hyun
Member Author

By the way, is Spark heading toward some SQL standard?

@markhamstra
Contributor

@dongjoon-hyun Following Hive's lead is definitely one option. I don't know whether it is the right option or whether any strategic decision has been made about how we will handle non-standard SQL functionality in Spark 2.0 -- particularly as we gain independence from Hive's implementation, we could also separate more easily from Hive's interface.

@dongjoon-hyun
Member Author

dongjoon-hyun commented Apr 15, 2016

Sure. This year, Spark seems able to remove the Hive code. I agree that Spark is better than Hive and we can do more. But in terms of Hive compatibility, Spark had better keep the same semantics as Hive for a few years. In this PR, I mean `bround` semantics only, not any other syntax.

I'm also wondering whether we could really be proud of having our own, supposedly superior, `bround` semantics.

@dongjoon-hyun
Member Author

Hi, @marmbrus , @davies , @markhamstra .
I'm not sure what to do next for this PR. If I am supposed to do something, please let me know.

@dongjoon-hyun
Member Author

So far, I cannot find any reason to pursue another `bround` implementation.
In addition, I think the current implementation in this PR is good for maintainability, since it's consistent with the existing Spark codebase.

So, could anyone merge this PR please?

@davies
Contributor

davies commented Apr 18, 2016

Merging this into master, thanks!

@asfgit asfgit closed this in 432d139 Apr 18, 2016
@dongjoon-hyun
Member Author

Thank you so much, @davies !

lw-lin pushed a commit to lw-lin/spark that referenced this pull request Apr 20, 2016
Author: Dongjoon Hyun <dongjoon@apache.org>

Closes apache#12376 from dongjoon-hyun/SPARK-14614.
@dongjoon-hyun dongjoon-hyun deleted the SPARK-14614 branch May 12, 2016 00:59