Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-20211][SQL] Fix the Precision and Scale of Decimal Values when the Input is BigDecimal between -1.0 and 1.0 #18244

Closed
wants to merge 3 commits into from

Conversation

gatorsmile
Copy link
Member

What changes were proposed in this pull request?

The precision and scale of decimal values are wrong when the input is BigDecimal between -1.0 and 1.0.

The BigDecimal's precision is the digit count starts from the leftmost nonzero digit based on the JAVA's BigDecimal definition. However, our Decimal decision follows the database decimal standard, which is the total number of digits, including both to the left and the right of the decimal point. Thus, this PR is to fix the issue by doing the conversion.

Before this PR, the following queries failed:

select 1 > 0.0001
select floor(0.0001)
select ceil(0.0001)

How was this patch tested?

Added test cases.

@gatorsmile
Copy link
Member Author

cc @cloud-fan @davies

@cloud-fan
Copy link
Contributor

LGTM, pending test

@SparkQA
Copy link

SparkQA commented Jun 9, 2017

Test build #77826 has finished for PR 18244 at commit a7396be.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jun 9, 2017

Test build #77828 has finished for PR 18244 at commit 8e654bc.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@@ -126,7 +126,15 @@ final class Decimal extends Ordered[Decimal] with Serializable {
def set(decimal: BigDecimal): Decimal = {
this.decimalVal = decimal
this.longVal = 0L
this._precision = decimal.precision
if (decimal.compare(BigDecimal(1.0)) == -1 && decimal.compare(BigDecimal(-1.0)) == 1) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if (decimal.presision <= decimal.scale) {
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you! @davies

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@stanzhai That is not enough. 0.90's precision is 2 and scale is also 2. When the scale and precision are identical, we still need to update the precision.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't we allow the case that precision equals to scale? Seems it works for the example expressions.

scala> sql("select floor(0.9)").show()
+----------+
|FLOOR(0.9)|
+----------+
|         0|
+----------+


scala> sql("select ceil(0.9)").show()
+---------+
|CEIL(0.9)|
+---------+
|        1|
+---------+


scala> sql("select 1 > 0.9").show()
+-----------------------+
|(CAST(1 AS BIGINT) > 0)|
+-----------------------+
|                   true|
+-----------------------+

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to correct the precision to make it consistent with what the other database system.

Copy link
Member

@viirya viirya Jun 9, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean 0.9 has precision 2 and scale 1 in other database systems? I just tried MySQL, it seems to be precision=1 and scale=1. Not sure if other database systems have the behavior as you said.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Btw, seems that the numbers like 0.9 are the only cases that the precision equals to the scale. If we force precision as scale + 1, looks like we don't support precision=scale anymore for DecimalType. Then we may need to update the related check and the document in DecimalType which explicitly says that scale can be equal to precision. Maybe I miss anything about this?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can try DB2. 0.9's precision is 2, scale is 1. 0.90's precision is 3, scale is 2

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should still allow the case in which the precision equals to the scale. In the traditional database, .99's precision is 2 and scale is 2; but 0.99's precision is 3 and scale is 2

@SparkQA
Copy link

SparkQA commented Jun 9, 2017

Test build #77838 has started for PR 18244 at commit d36be12.

@davies
Copy link
Contributor

davies commented Jun 9, 2017

lgtm

@@ -126,7 +126,15 @@ final class Decimal extends Ordered[Decimal] with Serializable {
def set(decimal: BigDecimal): Decimal = {
this.decimalVal = decimal
this.longVal = 0L
this._precision = decimal.precision
if (decimal.precision <= decimal.scale) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But the comment is // For Decimal, we expect the precision is equal to or large than the scale.

= has been processed within the function floor and ceil.
https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/types/Decimal.scala#L387

This is reason that I think we should use if (decimal.precision < decimal.scale), and it works fine for 0.90.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This fix is not for floor and ceil only. We need to correct the precision if the value is from BigDecimal.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it, thanks for the fix!

@discipleforteen
Copy link

how about
select 1 > 0.0001D
select floor(0.0001D)
select ceil(0.0001D)

i think constant doube value should be parsed to double as old spark version, not to bigdecimal in antlr4

@gatorsmile
Copy link
Member Author

retest this please

@gatorsmile
Copy link
Member Author

@discipleforteen Your question is not related to this PR. Converting to the decimal is pretty common for avoiding loss of precision. For example, the link shows how DB2 choose the type when users specify the constant.

@SparkQA
Copy link

SparkQA commented Jun 9, 2017

Test build #77848 has finished for PR 18244 at commit d36be12.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@viirya
Copy link
Member

viirya commented Jun 10, 2017

LGTM

@gatorsmile
Copy link
Member Author

Thanks! Merging to master.

Will submit PRs for 2.2/2.1/2.0

@asfgit asfgit closed this in 8e96acf Jun 10, 2017
asfgit pushed a commit that referenced this pull request Jun 14, 2017
…al Values when the Input is BigDecimal between -1.0 and 1.0

### What changes were proposed in this pull request?

This PR is to backport #18244 to 2.2

---

The precision and scale of decimal values are wrong when the input is BigDecimal between -1.0 and 1.0.

The BigDecimal's precision is the digit count starts from the leftmost nonzero digit based on the [JAVA's BigDecimal definition](https://docs.oracle.com/javase/7/docs/api/java/math/BigDecimal.html). However, our Decimal decision follows the database decimal standard, which is the total number of digits, including both to the left and the right of the decimal point. Thus, this PR is to fix the issue by doing the conversion.

Before this PR, the following queries failed:
```SQL
select 1 > 0.0001
select floor(0.0001)
select ceil(0.0001)
```

### How was this patch tested?
Added test cases.

Author: gatorsmile <gatorsmile@gmail.com>

Closes #18297 from gatorsmile/backport18244.
asfgit pushed a commit that referenced this pull request Jun 14, 2017
…al Values when the Input is BigDecimal between -1.0 and 1.0

### What changes were proposed in this pull request?

This PR is to backport #18244 to 2.2

---

The precision and scale of decimal values are wrong when the input is BigDecimal between -1.0 and 1.0.

The BigDecimal's precision is the digit count starts from the leftmost nonzero digit based on the [JAVA's BigDecimal definition](https://docs.oracle.com/javase/7/docs/api/java/math/BigDecimal.html). However, our Decimal decision follows the database decimal standard, which is the total number of digits, including both to the left and the right of the decimal point. Thus, this PR is to fix the issue by doing the conversion.

Before this PR, the following queries failed:
```SQL
select 1 > 0.0001
select floor(0.0001)
select ceil(0.0001)
```

### How was this patch tested?
Added test cases.

Author: gatorsmile <gatorsmile@gmail.com>

Closes #18297 from gatorsmile/backport18244.

(cherry picked from commit 6265119)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
asfgit pushed a commit that referenced this pull request Jun 14, 2017
…al Values when the Input is BigDecimal between -1.0 and 1.0

### What changes were proposed in this pull request?

This PR is to backport #18244 to 2.2

---

The precision and scale of decimal values are wrong when the input is BigDecimal between -1.0 and 1.0.

The BigDecimal's precision is the digit count starts from the leftmost nonzero digit based on the [JAVA's BigDecimal definition](https://docs.oracle.com/javase/7/docs/api/java/math/BigDecimal.html). However, our Decimal decision follows the database decimal standard, which is the total number of digits, including both to the left and the right of the decimal point. Thus, this PR is to fix the issue by doing the conversion.

Before this PR, the following queries failed:
```SQL
select 1 > 0.0001
select floor(0.0001)
select ceil(0.0001)
```

### How was this patch tested?
Added test cases.

Author: gatorsmile <gatorsmile@gmail.com>

Closes #18297 from gatorsmile/backport18244.

(cherry picked from commit 6265119)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
dataknocker pushed a commit to dataknocker/spark that referenced this pull request Jun 16, 2017
… the Input is BigDecimal between -1.0 and 1.0

### What changes were proposed in this pull request?
The precision and scale of decimal values are wrong when the input is BigDecimal between -1.0 and 1.0.

The BigDecimal's precision is the digit count starts from the leftmost nonzero digit based on the [JAVA's BigDecimal definition](https://docs.oracle.com/javase/7/docs/api/java/math/BigDecimal.html). However, our Decimal decision follows the database decimal standard, which is the total number of digits, including both to the left and the right of the decimal point. Thus, this PR is to fix the issue by doing the conversion.

Before this PR, the following queries failed:
```SQL
select 1 > 0.0001
select floor(0.0001)
select ceil(0.0001)
```

### How was this patch tested?
Added test cases.

Author: Xiao Li <gatorsmile@gmail.com>

Closes apache#18244 from gatorsmile/bigdecimal.
jzhuge pushed a commit to jzhuge/spark that referenced this pull request Aug 20, 2018
…al Values when the Input is BigDecimal between -1.0 and 1.0

This PR is to backport apache#18244 to 2.2

---

The precision and scale of decimal values are wrong when the input is BigDecimal between -1.0 and 1.0.

The BigDecimal's precision is the digit count starts from the leftmost nonzero digit based on the [JAVA's BigDecimal definition](https://docs.oracle.com/javase/7/docs/api/java/math/BigDecimal.html). However, our Decimal decision follows the database decimal standard, which is the total number of digits, including both to the left and the right of the decimal point. Thus, this PR is to fix the issue by doing the conversion.

Before this PR, the following queries failed:
```SQL
select 1 > 0.0001
select floor(0.0001)
select ceil(0.0001)
```

Added test cases.

Author: gatorsmile <gatorsmile@gmail.com>

Closes apache#18297 from gatorsmile/backport18244.

(cherry picked from commit 6265119)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants