New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-27671][SQL] Fix error when casting from a nested null in a struct #24576
Conversation
Test build #105307 has finished for PR 24576 at commit
|
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala
Show resolved
Hide resolved
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CastSuite.scala
Show resolved
Hide resolved
@@ -2157,4 +2157,10 @@ class DataFrameSuite extends QueryTest with SharedSQLContext { | |||
|*(1) Range (0, 10, step=1, splits=2)""".stripMargin)) | |||
} | |||
} | |||
|
|||
test("SPARK-27671: Fix analysis exception when casting null in nested field in struct") { | |||
val df = sql("select * FROM VALUES (('a', (10, null))), (('b', (10, 50))), " + |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit. select
-> SELECT
.
Test build #105328 has finished for PR 24576 at commit
|
Test build #105332 has finished for PR 24576 at commit
|
cc @cloud-fan |
Retest this please. |
Test build #105360 has finished for PR 24576 at commit
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1, LGTM. Thank you, @viirya and @gatorsmile .
Merged to master/branch-2.4
.
When a null in a nested field in struct, casting from the struct throws error, currently. ```scala scala> sql("select cast(struct(1, null) as struct<a:int,b:int>)").show scala.MatchError: NullType (of class org.apache.spark.sql.types.NullType$) at org.apache.spark.sql.catalyst.expressions.Cast.castToInt(Cast.scala:447) at org.apache.spark.sql.catalyst.expressions.Cast.cast(Cast.scala:635) at org.apache.spark.sql.catalyst.expressions.Cast.$anonfun$castStruct$1(Cast.scala:603) ``` Similarly, inline table, which casts null in nested field under the hood, also throws an error. ```scala scala> sql("select * FROM VALUES (('a', (10, null))), (('b', (10, 50))), (('c', null)) AS tab(x, y)").show org.apache.spark.sql.AnalysisException: failed to evaluate expression named_struct('col1', 10, 'col2', NULL): NullType (of class org.apache.spark.sql.t ypes.NullType$); line 1 pos 14 at org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:47) at org.apache.spark.sql.catalyst.analysis.ResolveInlineTables.$anonfun$convert$6(ResolveInlineTables.scala:106) ``` This fixes the issue. Added tests. Closes #24576 from viirya/cast-null. Authored-by: Liang-Chi Hsieh <viirya@gmail.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com> (cherry picked from commit 8b0bdaa) Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
// For primitive types, we don't reach here because the guard of `nullSafeEval`. | ||
// But for nested types like struct, we might reach here for nested null type field. | ||
// We won't call the returned function actually, but returns a placeholder. | ||
_ => throw new SparkException(s"should not directly cast from NullType to $to.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
when will we hit this exception?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the case of casting a nested field in a struct or array, if we find it is a null, we don't call the cast function but set the destination as null directly. If we call this cast function, we hit this exception.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We do the thing as follow:
buildCast[InternalRow](_, row => {
val newRow = new GenericInternalRow(from.fields.length)
var i = 0
while (i < row.numFields) {
newRow.update(i,
// We don't call cast function but directly set a null, if finding a null.
if (row.isNullAt(i)) null else castFuncs(i)(row.get(i, from.apply(i).dataType)))
i += 1
}
newRow
})
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Which means, it's a bug if the exception is thrown?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If reaching here, means we call the cast function for a null, unnecessarily.
When a null in a nested field in struct, casting from the struct throws error, currently. ```scala scala> sql("select cast(struct(1, null) as struct<a:int,b:int>)").show scala.MatchError: NullType (of class org.apache.spark.sql.types.NullType$) at org.apache.spark.sql.catalyst.expressions.Cast.castToInt(Cast.scala:447) at org.apache.spark.sql.catalyst.expressions.Cast.cast(Cast.scala:635) at org.apache.spark.sql.catalyst.expressions.Cast.$anonfun$castStruct$1(Cast.scala:603) ``` Similarly, inline table, which casts null in nested field under the hood, also throws an error. ```scala scala> sql("select * FROM VALUES (('a', (10, null))), (('b', (10, 50))), (('c', null)) AS tab(x, y)").show org.apache.spark.sql.AnalysisException: failed to evaluate expression named_struct('col1', 10, 'col2', NULL): NullType (of class org.apache.spark.sql.t ypes.NullType$); line 1 pos 14 at org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:47) at org.apache.spark.sql.catalyst.analysis.ResolveInlineTables.$anonfun$convert$6(ResolveInlineTables.scala:106) ``` This fixes the issue. Added tests. Closes apache#24576 from viirya/cast-null. Authored-by: Liang-Chi Hsieh <viirya@gmail.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com> (cherry picked from commit 8b0bdaa) Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
When a null in a nested field in struct, casting from the struct throws error, currently. ```scala scala> sql("select cast(struct(1, null) as struct<a:int,b:int>)").show scala.MatchError: NullType (of class org.apache.spark.sql.types.NullType$) at org.apache.spark.sql.catalyst.expressions.Cast.castToInt(Cast.scala:447) at org.apache.spark.sql.catalyst.expressions.Cast.cast(Cast.scala:635) at org.apache.spark.sql.catalyst.expressions.Cast.$anonfun$castStruct$1(Cast.scala:603) ``` Similarly, inline table, which casts null in nested field under the hood, also throws an error. ```scala scala> sql("select * FROM VALUES (('a', (10, null))), (('b', (10, 50))), (('c', null)) AS tab(x, y)").show org.apache.spark.sql.AnalysisException: failed to evaluate expression named_struct('col1', 10, 'col2', NULL): NullType (of class org.apache.spark.sql.t ypes.NullType$); line 1 pos 14 at org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:47) at org.apache.spark.sql.catalyst.analysis.ResolveInlineTables.$anonfun$convert$6(ResolveInlineTables.scala:106) ``` This fixes the issue. Added tests. Closes apache#24576 from viirya/cast-null. Authored-by: Liang-Chi Hsieh <viirya@gmail.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com> (cherry picked from commit 8b0bdaa) Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
What changes were proposed in this pull request?
When a null in a nested field in struct, casting from the struct throws error, currently.
Similarly, inline table, which casts null in nested field under the hood, also throws an error.
This fixes the issue.
How was this patch tested?
Added tests.