
[SPARK-37451][SQL] Fix cast string type to decimal type if spark.sql.legacy.allowNegativeScaleOfDecimal is enabled #34811

Closed
wants to merge 2 commits

Conversation

wangyum
Member

@wangyum wangyum commented Dec 5, 2021

What changes were proposed in this pull request?

Fix casting from string type to decimal type when `spark.sql.legacy.allowNegativeScaleOfDecimal` is enabled. For example:

```scala
import org.apache.spark.sql.types._
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.Row

spark.conf.set("spark.sql.legacy.allowNegativeScaleOfDecimal", true)
val data = Seq(Row("7.836725755512218E38"))
val schema = StructType(Array(StructField("a", StringType, false)))
val df = spark.createDataFrame(spark.sparkContext.parallelize(data), schema)
df.select(col("a").cast(DecimalType(37, -17))).show
```

The result is null since SPARK-32706.
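For context, a negative scale means the value is stored as `unscaledValue * 10^(-scale)`. A minimal standalone sketch using `java.math.BigDecimal` (the class that backs Spark's `Decimal`; this is an illustration, not Spark's actual cast path) shows why `7.836725755512218E38` is representable in `DecimalType(37, -17)` and the cast should succeed rather than return null:

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

public class NegativeScaleDemo {
    public static void main(String[] args) {
        BigDecimal v = new BigDecimal("7.836725755512218E38");

        // Scale -17 means the value is represented as unscaledValue * 10^17.
        BigDecimal rounded = v.setScale(-17, RoundingMode.HALF_UP);

        // The unscaled value has 22 digits, well within precision 37,
        // so the value fits DecimalType(37, -17).
        System.out.println(rounded.unscaledValue()); // 7836725755512218000000
        System.out.println(rounded.precision());     // 22
    }
}
```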

Why are the changes needed?

Fix a regression bug.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Unit test.

@github-actions github-actions bot added the SQL label Dec 5, 2021
@SparkQA

SparkQA commented Dec 5, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50405/

@SparkQA

SparkQA commented Dec 5, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50405/

@SparkQA

SparkQA commented Dec 5, 2021

Test build #145929 has finished for PR 34811 at commit a35fbbb.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@wangyum
Member Author

wangyum commented Dec 6, 2021

cc @cloud-fan

@@ -299,4 +299,19 @@ class DecimalSuite extends SparkFunSuite with PrivateMethodTester with SQLHelper
assert(Decimal.fromStringANSI(UTF8String.fromString(string)) === Decimal(string))
}
}

test("SPARK-37451: Performance improvement regressed String to Decimal cast") {
Member

@dongjoon-hyun dongjoon-hyun Dec 8, 2021


According to the description above, SPARK-32706 accidentally changed this result, and this PR restores the previous behavior?

The result is null since SPARK-32706.

Member

@dongjoon-hyun dongjoon-hyun left a comment


+1, LGTM. (with minor style comments)

cc @huaxingao since she is the release manager of 3.2.1.

@SparkQA

SparkQA commented Dec 9, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50502/

@SparkQA

SparkQA commented Dec 9, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50502/

@SparkQA

SparkQA commented Dec 9, 2021

Test build #146026 has finished for PR 34811 at commit c5b2ac8.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

dongjoon-hyun pushed a commit that referenced this pull request Dec 9, 2021
…legacy.allowNegativeScaleOfDecimal is enabled

### What changes were proposed in this pull request?

Fix cast string type to decimal type only if `spark.sql.legacy.allowNegativeScaleOfDecimal` is enabled. For example:
```scala
import org.apache.spark.sql.types._
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.Row

spark.conf.set("spark.sql.legacy.allowNegativeScaleOfDecimal", true)
val data = Seq(Row("7.836725755512218E38"))
val schema = StructType(Array(StructField("a", StringType, false)))
val df = spark.createDataFrame(spark.sparkContext.parallelize(data), schema)
df.select(col("a").cast(DecimalType(37, -17))).show
```

The result is null since [SPARK-32706](https://issues.apache.org/jira/browse/SPARK-32706).

### Why are the changes needed?

Fix regression bug.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Unit test.

Closes #34811 from wangyum/SPARK-37451.

Authored-by: Yuming Wang <yumwang@ebay.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(cherry picked from commit a1214a9)
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
@dongjoon-hyun
Member

Thank you for the update, @wangyum.

Since the last commit is only an indentation change, the Jenkins failure is irrelevant.
Merged to master/3.2.

Could you make a backport to branch-3.1?

@wangyum
Member Author

wangyum commented Dec 9, 2021

Backport for branch-3.1: #34851

@wangyum wangyum deleted the SPARK-37451 branch December 9, 2021 13:06
dongjoon-hyun pushed a commit that referenced this pull request Dec 9, 2021
….sql.legacy.allowNegativeScaleOfDecimal is enabled

Backport #34811

Closes #34851 from wangyum/SPARK-37451-branch-3.1.

Authored-by: Yuming Wang <yumwang@ebay.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
fishcus pushed a commit to fishcus/spark that referenced this pull request Jan 12, 2022
….sql.legacy.allowNegativeScaleOfDecimal is enabled

Backport apache#34811

Closes apache#34851 from wangyum/SPARK-37451-branch-3.1.
catalinii pushed a commit to lyft/spark that referenced this pull request Feb 22, 2022
…legacy.allowNegativeScaleOfDecimal is enabled

Closes apache#34811 from wangyum/SPARK-37451.
(cherry picked from commit a1214a9)
catalinii pushed a commit to lyft/spark that referenced this pull request Mar 4, 2022
…legacy.allowNegativeScaleOfDecimal is enabled

Closes apache#34811 from wangyum/SPARK-37451.
(cherry picked from commit a1214a9)
kazuyukitanimura pushed a commit to kazuyukitanimura/spark that referenced this pull request Aug 10, 2022
…legacy.allowNegativeScaleOfDecimal is enabled

Closes apache#34811 from wangyum/SPARK-37451.
(cherry picked from commit a1214a9)