[SPARK-30252][SQL] Disallow negative scale of Decimal #26881
Conversation
Test build #115295 has finished for PR 26881 at commit

Query 13 of

Jenkins, retest this please.

Test build #115327 has finished for PR 26881 at commit

I may need to update to the newest master branch and test again. Let me see...

cc @cloud-fan
private[sql] def fromJavaBigDecimal(d: JavaBigDecimal): DecimalType = {
  val (precision, scale) = if (d.scale < 0 && SQLConf.get.ansiEnabled) {
    (d.precision - d.scale, 0)
  } else {
shall we also handle precision < scale like https://github.com/apache/spark/pull/26881/files#diff-92db6fd7dbad00cec24c5878a8c354d9R134 ?
added 9d41ea2
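For context, here is a rough sketch of the normalization being discussed, written as a standalone helper with an assumed name (`normalize`); it is not the exact change in 9d41ea2, just an illustration of the (precision - scale, 0) mapping from the diff above plus the precision < scale case raised in this thread:

```scala
import java.math.{BigDecimal => JavaBigDecimal}

// Sketch: map a java.math.BigDecimal's (precision, scale) to a pair that is
// valid when negative scale is disallowed.
def normalize(d: JavaBigDecimal): (Int, Int) = {
  if (d.scale < 0) {
    // Widen the precision by |scale| and drop the negative scale,
    // e.g. a value reported as (2, -9) becomes (11, 0).
    (d.precision - d.scale, 0)
  } else if (d.precision < d.scale) {
    // java.math.BigDecimal counts digits from the first non-zero digit, so 0.01
    // reports precision 1 with scale 2; make sure precision covers the scale.
    (d.scale, d.scale)
  } else {
    (d.precision, d.scale)
  }
}
```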
Test build #116356 has finished for PR 26881 at commit

Test build #116363 has finished for PR 26881 at commit

Test build #116447 has finished for PR 26881 at commit
Oh, I see. We need to keep this Spark-specific behaviour for future releases? If no DBMS-like system accepts negative scales, I think it is worth making non-negative scale the default and keeping the old behaviour (negative scale) behind a legacy option.
Test build #116481 has finished for PR 26881 at commit

@maropu Thanks for the suggestion! Addressed.
-- !query 21
select 0.3, -0.8, .5, -.18, 0.1111, .1111
-- !query 21 schema
struct<0.3:decimal(1,1),-0.8:decimal(1,1),0.5:decimal(1,1),-0.18:decimal(2,2),0.1111:decimal(4,4),0.1111:decimal(4,4)>
Actually, I think this is an existing bug in Spark: for a number less than 1, the precision and scale differ between Decimal and DecimalType when it is created from a literal (because precision and scale are defined separately in the two). This PR fixes it by:
private[sql] def fromDecimal(d: Decimal): DecimalType = DecimalType(d.precision, d.scale)
Is this a bug? For example:
hive> create table testrel as select 0.3;
hive> describe testrel;
OK
_c0 decimal(1,1)
Is it difficult to keep the current behaviour?
cc: @gatorsmile @cloud-fan
It's true that other DBMSs ignore the leftmost zero, which can make the scale larger than the precision for values less than 1. I don't know whether Spark intentionally follows this. But AFAIK, for a number like 0.3, Spark gives it (precision, scale) = (2, 1) in Decimal but (1, 1) in DecimalType.
Maybe we should add this as a new feature in a follow-up PR.
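To make the mismatch concrete, here is a hedged spark-shell style sketch, using the (2, 1) vs (1, 1) numbers quoted in this thread and the `!query 21` schema above:

```scala
import org.apache.spark.sql.types.Decimal

// Decimal keeps room for the leading zero digit of 0.3, so it reports (2, 1) ...
val d = Decimal(BigDecimal("0.3"))
println((d.precision, d.scale))   // expected (2, 1) per this discussion

// ... while the SQL literal 0.3 is typed decimal(1,1) in the schema:
//   spark.sql("select 0.3").schema   // column "0.3" typed decimal(1,1), per !query 21 above
```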
The precision/scale in the decimal type should be the ones we expect. Shall we update the underlying Decimal and correct the precision?
@maropu @cloud-fan I've opened a separate PR #27217 to address this issue. PTAL.
Test build #116500 has finished for PR 26881 at commit
@Ngone51 Could you update the migration guide, too?

Now it is not controlled by ansi mode? Then we should update the PR title and description.
finally:
    self.spark.sql("set spark.sql.legacy.allowNegativeScaleOfDecimal.enabled=false")

def test_create_dataframe_from_objects(self):
indentation is wrong
buildConf("spark.sql.legacy.allowNegativeScaleOfDecimal.enabled")
  .internal()
  .doc("When set to true, negative scale of Decimal type is allowed. For example, " +
    "the type of number 1E10 under legacy mode is DecimalType(2, -9), but is " +
1E10 BD
}

test("SPARK-30252: Decimal should set zero scale rather than negative scale by default") {
  withSQLConf(SQLConf.LEGACY_ALLOW_NEGATIVE_SCALE_OF_DECIMAL_ENABLED.key -> "false") {
since the test name says "by default", we should not set config here.
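A hedged sketch of the shape being suggested, meant to sit inside the existing suite; the assertion is illustrative only and assumes Decimal normalizes a negative-scale BigDecimal to scale 0 by default, which is what the test name claims:

```scala
// Sketch only: drop the withSQLConf wrapper so the test runs against the default
// config value, matching the "by default" wording in the test name.
test("SPARK-30252: Decimal should set zero scale rather than negative scale by default") {
  // Illustrative assertion: a value that used to carry a negative scale now gets scale 0.
  assert(Decimal(BigDecimal("1E10")).scale == 0)
}
```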
}

test("SPARK-30252: Negative scale is not allowed by default") {
  withSQLConf(SQLConf.LEGACY_ALLOW_NEGATIVE_SCALE_OF_DECIMAL_ENABLED.key -> "false") {
ditto
sql/core/src/test/resources/sql-tests/results/ansi/literals.sql.out
(outdated; resolved)
Test build #116850 has finished for PR 26881 at commit
The code looks fine to me if the tests passed.
I reverted the check for max precision added in an earlier commit. For example, for the query below:

without the max precision check, we'll get
Test build #116897 has finished for PR 26881 at commit

Test build #116921 has finished for PR 26881 at commit

Test build #116948 has finished for PR 26881 at commit

@cloud-fan @maropu @viirya Any more comments?

thanks, merging to master!
| s"You can use spark.sql.legacy.allowNegativeScaleOfDecimal.enabled=true " + | ||
| s"to enable legacy mode to allow it.") |
nit: no need s"".
late LGTM.
Thanks all!!!
What changes were proposed in this pull request?
This PR proposes to disallow negative `scale` of `Decimal` in Spark, and it brings two behavior changes:

1. For literals like `1.23E4BD` or `1.23E4` (with `spark.sql.legacy.exponentLiteralAsDecimal.enabled=true`, see SPARK-29956), the `(precision, scale)` is set to (5, 0) rather than (3, -2).
2. A negative `scale` check is added inside the decimal methods that expose setting `scale` explicitly; if the check fails, an `AnalysisException` is thrown.

Users can still set `spark.sql.legacy.allowNegativeScaleOfDecimal.enabled` to restore the previous behavior. Change (1) is illustrated by the sketch below.
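A hedged illustration of change (1), as a minimal self-contained program rather than an excerpt from this PR; it assumes a local Spark build with the patch applied:

```scala
import org.apache.spark.sql.SparkSession

object NegativeScaleLiteralDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("demo").getOrCreate()
    // Before this PR the exponent literal 1.23E4BD carried (precision, scale) = (3, -2);
    // with negative scale disallowed it is typed with (precision, scale) = (5, 0).
    spark.sql("SELECT 1.23E4BD").printSchema()
    spark.stop()
  }
}
```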
Why are the changes needed?
According to the SQL standard, the scale of Decimal should always be non-negative. Other mainstream databases, like Presto and PostgreSQL, also don't allow negative scale; both reject DDL such as `create table t (i decimal(2, -1))`.
And, actually, Spark itself already doesn't allow creating a table with a negative-scale decimal type using SQL: the DDL parser rejects it. However, it is still possible to create such a table or `DataFrame` through the Spark SQL programming API (see the sketch below), and these two different behaviors can confuse users.
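A hedged sketch of the programmatic path referred to above, assuming pre-patch (or legacy-mode) behavior; after this PR, without the legacy flag, constructing `DecimalType(2, -1)` itself fails with an `AnalysisException` per the description:

```scala
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.types.{DecimalType, StructField, StructType}

object NegativeScaleDataFrameDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("demo").getOrCreate()

    // The DDL form is rejected by the parser:
    //   spark.sql("create table t (i decimal(2, -1))")   // parse error
    // ... but the programmatic API used to accept a negative scale:
    val schema = StructType(StructField("i", DecimalType(2, -1)) :: Nil)
    val df = spark.createDataFrame(spark.sparkContext.emptyRDD[Row], schema)
    df.printSchema()
    spark.stop()
  }
}
```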
On the other side, even if a user creates such a table or `DataFrame` with a negative-scale decimal type, the data can't be written out with formats like `parquet` or `orc`, because these formats have their own checks for negative scale and fail on it.

So, I think it would be better to disallow negative scale entirely and make the behaviors above consistent.
Does this PR introduce any user-facing change?
Yes. If `spark.sql.legacy.allowNegativeScaleOfDecimal.enabled=false`, users can't create a Decimal value with negative scale anymore.
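For completeness, a hedged sketch of opting back into the old behavior via the legacy flag named above; it mirrors the SET statement used in the tests earlier in this thread:

```scala
import org.apache.spark.sql.SparkSession

object LegacyNegativeScaleDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("demo").getOrCreate()
    // Restore the previous behavior that allows negative scale.
    spark.sql("set spark.sql.legacy.allowNegativeScaleOfDecimal.enabled=true")
    spark.sql("SELECT 1.23E4BD").printSchema()   // legacy mode: negative scale is allowed again
    spark.stop()
  }
}
```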
Added new tests in
ExpressionParserSuiteandDecimalSuite;Updated
SQLQueryTestSuite.