
[SPARK-14614][SQL] Add bround function#12376

Closed
dongjoon-hyun wants to merge 1 commit into apache:master from dongjoon-hyun:SPARK-14614

Conversation

@dongjoon-hyun
Member

What changes were proposed in this pull request?

This PR aims to add the `bround` function (aka banker's rounding) by extending the current `round` implementation. Hive has supported `bround` since 1.3.0 (https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF).

Hive (1.3 ~ 2.0)

hive> select round(2.5), bround(2.5);
OK
3.0 2.0

After this PR

scala> sql("select round(2.5), bround(2.5)").head
res0: org.apache.spark.sql.Row = [3,2]
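For illustration, the difference between the two outputs above can be reproduced with plain JDK `BigDecimal` (a sketch for local experimentation; the class name is made up, and this is not the actual Spark code path):

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

public class RoundVsBround {
    public static void main(String[] args) {
        double x = 2.5;
        // round(): ties round away from zero (HALF_UP), matching the 3.0 above
        double rounded = BigDecimal.valueOf(x)
                .setScale(0, RoundingMode.HALF_UP).doubleValue();
        // bround(): ties round to the nearest even digit (HALF_EVEN),
        // i.e. banker's rounding, matching the 2.0 above
        double brounded = BigDecimal.valueOf(x)
                .setScale(0, RoundingMode.HALF_EVEN).doubleValue();
        System.out.println(rounded + " " + brounded); // 3.0 2.0
    }
}
```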

How was this patch tested?

Pass the Jenkins tests (with extended tests).

@SparkQA

SparkQA commented Apr 14, 2016

Test build #55774 has finished for PR 12376 at commit 2950339.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • abstract class RoundBase(child: Expression, scale: Expression,
    • case class Round(child: Expression, scale: Expression)
    • case class BRound(child: Expression, scale: Expression)

@SparkQA

SparkQA commented Apr 14, 2016

Test build #55778 has finished for PR 12376 at commit 93b6a84.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun dongjoon-hyun changed the title [SPARK-14614] Add bround function [WIP][SPARK-14614][SQL] Add bround function Apr 14, 2016
@SparkQA

SparkQA commented Apr 14, 2016

Test build #55799 has finished for PR 12376 at commit 7198a4a.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@markhamstra
Contributor

Interesting. While I'm well aware of Bankers' Rounding and the inconsistent implementations of rounding in various SQL engines, I hadn't run into the bround() function before. Searching now, I find that it is also in SQL Server, but it looks to me like this is another bit of rounding functionality that is not standardized across SQL implementations. Do you know any different?

While having more than one rounding strategy available within Spark SQL can be important for interoperating with other systems, I'm not sure if we have decided to always follow HQL, particularly as Spark SQL becomes less directly dependent on Hive in Spark 2.0. @marmbrus ?

@SparkQA

SparkQA commented Apr 14, 2016

Test build #55805 has finished for PR 12376 at commit 2b004c3.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • abstract class RoundBase(child: Expression, scale: Expression,
    • case class Round(child: Expression, scale: Expression)
    • case class BRound(child: Expression, scale: Expression)

@dongjoon-hyun
Member Author

Hi, @markhamstra ! Thank you for commenting.

I agree with your viewpoint. The point of this PR is just to add a single function, `bround`; it carries no HQL language-level meaning.

In terms of semantics, this is the same implementation as Hive's. The following is the code from the Hive master branch.

public static double bround(double input, int scale) {
  if (Double.isNaN(input) || Double.isInfinite(input)) {
    return input;
  }
  return BigDecimal.valueOf(input).setScale(scale, RoundingMode.HALF_EVEN).doubleValue();
}
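As a sketch of how this helper behaves (reproducing the logic above with only the JDK, not Hive itself; the wrapper class is hypothetical):

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

public class BroundSketch {
    // Same logic as the Hive snippet above, reproduced for local experimentation.
    static double bround(double input, int scale) {
        if (Double.isNaN(input) || Double.isInfinite(input)) {
            return input; // NaN and infinities pass through unchanged
        }
        return BigDecimal.valueOf(input)
                .setScale(scale, RoundingMode.HALF_EVEN).doubleValue();
    }

    public static void main(String[] args) {
        System.out.println(bround(2.5, 0));  // 2.0: tie goes to the even neighbor
        System.out.println(bround(3.5, 0));  // 4.0: tie goes to the even neighbor
        System.out.println(bround(2.35, 1)); // 2.4: valueOf preserves "2.35" exactly
    }
}
```

Note that `BigDecimal.valueOf(double)` goes through `Double.toString`, so `2.35` becomes an exact decimal `2.35` before rounding, rather than the slightly larger binary double.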

By the way, on the last point I see it differently: in order to become less directly dependent on Hive, we need to provide this as a native Spark SQL function.

@dongjoon-hyun dongjoon-hyun changed the title [WIP][SPARK-14614][SQL] Add bround function [SPARK-14614][SQL] Add bround function Apr 14, 2016
@dongjoon-hyun
Member Author

Hi, @davies .
Could you review this PR, please?

*
* @group math_funcs
* @since 2.0.0
*/
Contributor

Could you add this for Python and R? or create a JIRA for it.

Member Author

Thank you for review, @davies .
According to your comments, I created SPARK-14639 for it.

Contributor

@markhamstra markhamstra Apr 14, 2016

No, please don't do JIRAs that way. A JIRA that just refers to a PR (or a PR description that just refers to a JIRA number) is nearly pointless and very annoying. Always include a description in the JIRA motivating why a change is needed or desired, and a description of what was changed should go in the PR itself.

Member Author

Thank you. I'll fill it soon.

Member Author

@markhamstra, I updated it.
Please let me know if you think more is needed.
Thank you again.

Contributor

Thanks, that's enough. A small, simple issue like this doesn't require a lot of description, but there still should be some.

@marmbrus
Contributor

+1 to native implementations of Hive UDFs so we can continue to minimize our dependence.

@davies
Contributor

davies commented Apr 14, 2016

LGTM

@markhamstra
Contributor

Yes, I am also +1 for a native implementation over using Hive, but my question is more whether we want bround in Spark's SQL dialect, or whether there is another or standard syntax for accessing bankers' rounding that we should consider. I'm just not sure what our strategy is for adopting non-standardized SQL-isms, whether we are committed to doing whatever Hive does or whether we have other considerations.

@dongjoon-hyun
Member Author

Thank you, @marmbrus , @davies , @markhamstra .
@markhamstra, I really appreciate your attention, and I think I understand your concern correctly, but don't you think it's a little beyond the scope of a tiny PR like this? The topic of your question touches many things beyond this PR alone. What about sending an email to the dev list?

@dongjoon-hyun
Member Author

Hi, @markhamstra .
If we choose one of the `bround` variants, I think we had better choose the one in Hive.
What do you think about that?

@dongjoon-hyun
Member Author

The above is my opinion on your second question.
As for the first question, we already have three +1s for adding `bround`
(including yours, thank you).

@dongjoon-hyun
Member Author

By the way, is Spark heading toward some SQL standard?

@markhamstra
Contributor

@dongjoon-hyun Following Hive's lead is definitely one option. I don't know whether it is the right option or whether any strategic decision has been made about how we will handle non-standard SQL functionality in Spark 2.0 -- particularly as we gain independence from Hive's implementation, we could also separate more easily from Hive's interface.

@dongjoon-hyun
Member Author

dongjoon-hyun commented Apr 15, 2016

Sure. This year, Spark seems able to remove the Hive code. I agree that Spark is better than Hive and we can do more. But in terms of Hive compatibility, Spark had better keep the same semantics as Hive for a few years. In this PR, I mean `bround` semantics only, not any other syntax.

I'm also wondering whether we could really be proud of having our own, supposedly superior, `bround` semantics.

@dongjoon-hyun
Member Author

Hi, @marmbrus , @davies , @markhamstra .
I'm not sure what to do next for this PR. If I am supposed to do something, please let me know.

@dongjoon-hyun
Member Author

So far, I cannot find any reason to pursue another `bround` implementation.
In addition, I think the current implementation in this PR is good for maintainability, since it's consistent with the existing Spark codebase.

So, could anyone merge this PR please?

@davies
Contributor

davies commented Apr 18, 2016

Merging this into master, thanks!

@asfgit asfgit closed this in 432d139 Apr 18, 2016
@dongjoon-hyun
Member Author

Thank you so much, @davies !

lw-lin pushed a commit to lw-lin/spark that referenced this pull request Apr 20, 2016
Author: Dongjoon Hyun <dongjoon@apache.org>

Closes apache#12376 from dongjoon-hyun/SPARK-14614.
@dongjoon-hyun dongjoon-hyun deleted the SPARK-14614 branch May 12, 2016 00:59