[SPARK-47646][SQL] Make try_to_number return NULL for malformed input #45771
Closed
HyukjinKwon wants to merge 1 commit into apache:master from
Member
Author
Merged to master and branch-3.5.
HyukjinKwon
added a commit
that referenced
this pull request
Mar 29, 2024
### What changes were proposed in this pull request?
This PR proposes to add a NULL check after parsing the number so that the `try_to_number` expression can safely return NULL. Without the check, the following snippet fails with the `NullPointerException` shown after it:
```scala
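import org.apache.spark.sql.functions._
import spark.implicits._ // already in scope in spark-shell; provides $"..." and the String encoder

// "11" lacks the "$" required by the format "$99.99", so try_to_number should yield NULL.
val df = spark.createDataset(spark.sparkContext.parallelize(Seq("11")))
df.select(try_to_number($"value", lit("$99.99"))).show()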
```
```
java.lang.NullPointerException: Cannot invoke "org.apache.spark.sql.types.Decimal.toPlainString()" because "<local7>" is null
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.serializefromobject_doConsume_0$(Unknown Source)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
at org.apache.spark.sql.execution.WholeStageCodegenEvaluatorFactory$WholeStageCodegenPartitionEvaluator$$anon$1.hasNext(WholeStageCodegenEvaluatorFactory.scala:50)
at org.apache.spark.sql.execution.SparkPlan.$anonfun$getByteArrayRdd$1(SparkPlan.scala:388)
at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:894)
at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:894)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:368)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:332)
```
### Why are the changes needed?
To fix the bug, and let `try_to_number` return `NULL` for malformed input as designed.
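The shape of the fix can be illustrated outside Spark with a minimal plain-Scala sketch. It is not the actual Spark patch or its generated code: `NullCheckSketch`, `tryParse`, `renderUnsafe`, and `renderSafe` are hypothetical names, and `java.math.BigDecimal` stands in for Spark's `Decimal`. It only shows why dereferencing the result of a "try" parser without a null check throws, and how checking right after parsing lets the null propagate as the result.

```scala
import java.math.BigDecimal

object NullCheckSketch {
  // Hypothetical stand-in for a "try" parser: returns null instead of throwing
  // when the input does not match the expected shape (here: a leading '$').
  def tryParse(input: String): BigDecimal =
    if (input != null && input.startsWith("$")) {
      try new BigDecimal(input.drop(1))
      catch { case _: NumberFormatException => null }
    } else null

  // Before: the parse result is dereferenced unconditionally, so a null
  // result surfaces as a NullPointerException.
  def renderUnsafe(input: String): String =
    tryParse(input).toPlainString

  // After: check for null right after parsing and propagate it as the result.
  def renderSafe(input: String): String = {
    val parsed = tryParse(input)
    if (parsed == null) null else parsed.toPlainString
  }

  def main(args: Array[String]): Unit = {
    println(renderSafe("$11.50")) // well-formed input is rendered
    println(renderSafe("11"))     // malformed input yields null instead of throwing
  }
}
```

In this sketch, `renderUnsafe("11")` throws a `NullPointerException`, mirroring the failure in the stack trace, while `renderSafe("11")` returns null.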
### Does this PR introduce _any_ user-facing change?
Yes, it fixes a bug. Previously, `try_to_number` failed with NPE.
### How was this patch tested?
A unit test was added.
### Was this patch authored or co-authored using generative AI tooling?
No.
Closes #45771 from HyukjinKwon/SPARK-47646.
Authored-by: Hyukjin Kwon <gurwls223@apache.org>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
(cherry picked from commit d709e20)
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
Contributor
Yikes and thanks! Will there be a 3.4.3? This also happens in 3.4.2 as well, although it takes more work to make it happen:
Member
Author
Yeah let me backport
HyukjinKwon
added a commit
that referenced
this pull request
Mar 31, 2024
Member
Author
Merged to branch-3.4 too.
dongjoon-hyun
pushed a commit
that referenced
this pull request
Mar 31, 2024
…function with TryToNumber
### What changes were proposed in this pull request?
This patch fixes broken CI by replacing the non-existent `try_to_number` function in branch-3.4.
### Why are the changes needed?
#45771 backported a test to `StringFunctionsSuite` in branch-3.4, but the test uses `try_to_number`, which was only added in Spark 3.5. This patch fixes the broken CI: https://github.com/apache/spark/actions/runs/8494692184/job/23270175100
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
Unit test
### Was this patch authored or co-authored using generative AI tooling?
No
Closes #45785 from viirya/fix.
Authored-by: Liang-Chi Hsieh <viirya@gmail.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
Member
For the record, this broke branch-3.4 and the following PR fixed it.
Contributor
I should have mentioned that
Member
Author
Thank you guys!
Member
Late LGTM!