
[SPARK-37451][SQL] Fix cast string type to decimal type if spark.sql.legacy.allowNegativeScaleOfDecimal is enabled #34811

Closed
wants to merge 2 commits

Conversation

wangyum
Member

@wangyum wangyum commented Dec 5, 2021

What changes were proposed in this pull request?

Fix casting from string type to decimal type when `spark.sql.legacy.allowNegativeScaleOfDecimal` is enabled. For example:

```scala
import org.apache.spark.sql.types._
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.Row

spark.conf.set("spark.sql.legacy.allowNegativeScaleOfDecimal", true)
val data = Seq(Row("7.836725755512218E38"))
val schema = StructType(Array(StructField("a", StringType, false)))
val df = spark.createDataFrame(spark.sparkContext.parallelize(data), schema)
df.select(col("a").cast(DecimalType(37, -17))).show
```

The result is null since SPARK-32706.
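For context, a negative scale means the value is stored as `unscaledValue * 10^(-scale)`. A minimal standalone sketch using `java.math.BigDecimal` (the class that backs Spark's `Decimal`; this is an illustration, not Spark's actual cast path) shows why `7.836725755512218E38` is representable in `DecimalType(37, -17)` and the cast should succeed rather than return null:

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

public class NegativeScaleDemo {
    public static void main(String[] args) {
        BigDecimal v = new BigDecimal("7.836725755512218E38");

        // Scale -17 means the value is represented as unscaledValue * 10^17.
        BigDecimal rounded = v.setScale(-17, RoundingMode.HALF_UP);

        // The unscaled value has 22 digits, well within precision 37,
        // so the value fits DecimalType(37, -17).
        System.out.println(rounded.unscaledValue()); // 7836725755512218000000
        System.out.println(rounded.precision());     // 22
    }
}
```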

Why are the changes needed?

Fix a regression bug.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Unit test.

@github-actions github-actions bot added the SQL label Dec 5, 2021
@SparkQA

SparkQA commented Dec 5, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50405/

@SparkQA

SparkQA commented Dec 5, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50405/

@SparkQA

SparkQA commented Dec 5, 2021

Test build #145929 has finished for PR 34811 at commit a35fbbb.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@wangyum
Member Author

wangyum commented Dec 6, 2021

cc @cloud-fan

@@ -299,4 +299,19 @@ class DecimalSuite extends SparkFunSuite with PrivateMethodTester with SQLHelper
assert(Decimal.fromStringANSI(UTF8String.fromString(string)) === Decimal(string))
}
}

test("SPARK-37451: Performance improvement regressed String to Decimal cast") {
Member

@dongjoon-hyun dongjoon-hyun Dec 8, 2021


According to the description above, SPARK-32706 accidentally changed this result, and this PR restores the previous behavior?

The result is null since SPARK-32706.

Member

@dongjoon-hyun dongjoon-hyun left a comment


+1, LGTM. (with minor style comments)

cc @huaxingao since she is the release manager of 3.2.1.

@SparkQA

SparkQA commented Dec 9, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50502/

@SparkQA

SparkQA commented Dec 9, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50502/

@SparkQA

SparkQA commented Dec 9, 2021

Test build #146026 has finished for PR 34811 at commit c5b2ac8.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

dongjoon-hyun pushed a commit that referenced this pull request Dec 9, 2021
…legacy.allowNegativeScaleOfDecimal is enabled

### What changes were proposed in this pull request?

Fix cast string type to decimal type only if `spark.sql.legacy.allowNegativeScaleOfDecimal` is enabled. For example:
```scala
import org.apache.spark.sql.types._
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.Row

spark.conf.set("spark.sql.legacy.allowNegativeScaleOfDecimal", true)
val data = Seq(Row("7.836725755512218E38"))
val schema = StructType(Array(StructField("a", StringType, false)))
val df = spark.createDataFrame(spark.sparkContext.parallelize(data), schema)
df.select(col("a").cast(DecimalType(37, -17))).show
```

The result is null since [SPARK-32706](https://issues.apache.org/jira/browse/SPARK-32706).

### Why are the changes needed?

Fix regression bug.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Unit test.

Closes #34811 from wangyum/SPARK-37451.

Authored-by: Yuming Wang <yumwang@ebay.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(cherry picked from commit a1214a9)
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
@dongjoon-hyun
Member

Thank you for the update, @wangyum.

Since the last commit is only an indentation change, the Jenkins failure is irrelevant.
Merged to master/3.2.

Could you make a backport to branch-3.1?

@wangyum
Member Author

wangyum commented Dec 9, 2021

Backport for branch-3.1: #34851

@wangyum wangyum deleted the SPARK-37451 branch December 9, 2021 13:06
dongjoon-hyun pushed a commit that referenced this pull request Dec 9, 2021
….sql.legacy.allowNegativeScaleOfDecimal is enabled

Backport #34811

Closes #34851 from wangyum/SPARK-37451-branch-3.1.

Authored-by: Yuming Wang <yumwang@ebay.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
fishcus pushed a commit to fishcus/spark that referenced this pull request Jan 12, 2022
….sql.legacy.allowNegativeScaleOfDecimal is enabled

Backport apache#34811

Closes apache#34851 from wangyum/SPARK-37451-branch-3.1.
catalinii pushed a commit to lyft/spark that referenced this pull request Feb 22, 2022
…legacy.allowNegativeScaleOfDecimal is enabled

Closes apache#34811 from wangyum/SPARK-37451.
(cherry picked from commit a1214a9)
catalinii pushed a commit to lyft/spark that referenced this pull request Mar 4, 2022
…legacy.allowNegativeScaleOfDecimal is enabled

Closes apache#34811 from wangyum/SPARK-37451.
(cherry picked from commit a1214a9)
kazuyukitanimura pushed a commit to kazuyukitanimura/spark that referenced this pull request Aug 10, 2022
…legacy.allowNegativeScaleOfDecimal is enabled

Closes apache#34811 from wangyum/SPARK-37451.
(cherry picked from commit a1214a9)