
[SPARK-27671][SQL] Fix error when casting from a nested null in a struct

## What changes were proposed in this pull request?

Currently, casting a struct that contains a null in a nested field throws an error.

```scala
scala> sql("select cast(struct(1, null) as struct<a:int,b:int>)").show
scala.MatchError: NullType (of class org.apache.spark.sql.types.NullType$)
  at org.apache.spark.sql.catalyst.expressions.Cast.castToInt(Cast.scala:447)
  at org.apache.spark.sql.catalyst.expressions.Cast.cast(Cast.scala:635)
  at org.apache.spark.sql.catalyst.expressions.Cast.$anonfun$castStruct$1(Cast.scala:603)
```

Similarly, an inline table, which casts the null in a nested field under the hood, also throws an error.

```scala
scala> sql("select * FROM VALUES (('a', (10, null))), (('b', (10, 50))), (('c', null)) AS tab(x, y)").show
org.apache.spark.sql.AnalysisException: failed to evaluate expression named_struct('col1', 10, 'col2', NULL): NullType (of class org.apache.spark.sql.types.NullType$); line 1 pos 14
  at org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:47)
  at org.apache.spark.sql.catalyst.analysis.ResolveInlineTables.$anonfun$convert$6(ResolveInlineTables.scala:106)
```

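This PR fixes the issue by adding a `NullType` branch to `Cast.cast` that returns a placeholder function, so building the cast for a nested `NullType` field no longer fails with a `MatchError`.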
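
The idea can be illustrated with a simplified, self-contained model. This is a sketch, not Spark's actual code: `DType`, `NullT`, `IntT`, `StringT`, `StructT`, and `CastSketch.castFn` are hypothetical names invented for illustration. It assumes that a struct cast builds one conversion function per field up front and skips that function for null values, which is what the code comment in this commit suggests ("We won't call the returned function actually").

```scala
// Simplified model, NOT Spark's implementation. All names are hypothetical.
sealed trait DType
case object NullT extends DType
case object IntT extends DType
case object StringT extends DType
final case class StructT(fields: Seq[DType]) extends DType

object CastSketch {
  // Eagerly builds a conversion function for the pair (from, to).
  def castFn(from: DType, to: DType): Any => Any =
    if (from == to) {
      (v: Any) => v
    } else if (from == NullT) {
      // Placeholder: a NullT field only ever holds null, and nulls are
      // short-circuited in the struct case below, so this should never run.
      // Without this branch, building the function would fail in the
      // catch-all case at the bottom.
      _ => throw new IllegalStateException(s"should not directly cast from NullT to $to")
    } else {
      (from, to) match {
        case (IntT, StringT) => v => v.toString
        case (StringT, IntT) => v => v.asInstanceOf[String].toInt
        case (StructT(fs), StructT(ts)) if fs.length == ts.length =>
          // Per-field functions are built up front, even for always-null fields.
          val fieldFns = fs.zip(ts).map { case (f, t) => castFn(f, t) }
          row => row.asInstanceOf[Seq[Any]].zip(fieldFns).map {
            case (null, _) => null // nulls bypass the per-field function
            case (v, fn)   => fn(v)
          }
        case _ =>
          throw new IllegalArgumentException(s"unsupported cast: $from to $to")
      }
    }
}
```

In this model, `CastSketch.castFn(StructT(Seq(IntT, NullT)), StructT(Seq(IntT, IntT)))` builds successfully, and applying the result to `Seq(1, null)` passes the null through without ever invoking the placeholder.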

## How was this patch tested?

Added tests to `CastSuite` and `DataFrameSuite`.

Closes #24576 from viirya/cast-null.

Authored-by: Liang-Chi Hsieh <viirya@gmail.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
viirya authored and dongjoon-hyun committed May 13, 2019
1 parent f3ddd6f commit 8b0bdaa8e018607f1c4e790d1c0eb8cd480dee24
@@ -620,6 +620,12 @@ case class Cast(child: Expression, dataType: DataType, timeZoneId: Option[String
// We can return what the children return. Same thing should happen in the codegen path.
if (DataType.equalsStructurally(from, to)) {
identity
} else if (from == NullType) {
// According to `canCast`, NullType can be casted to any type.
// For primitive types, we don't reach here because the guard of `nullSafeEval`.
// But for nested types like struct, we might reach here for nested null type field.
// We won't call the returned function actually, but returns a placeholder.
_ => throw new SparkException(s"should not directly cast from NullType to $to.")
} else {
to match {
case dt if dt == from => identity[Any]
@@ -990,4 +990,19 @@ class CastSuite extends SparkFunSuite with ExpressionEvalHelper {
}
}
}

test("SPARK-27671: cast from nested null type in struct") {
import DataTypeTestUtils._

atomicTypes.foreach { atomicType =>
val struct = Literal.create(
InternalRow(null),
StructType(Seq(StructField("a", NullType, nullable = true))))

val ret = cast(struct, StructType(Seq(
StructField("a", atomicType, nullable = true))))
assert(ret.resolved)
checkEvaluation(ret, InternalRow(null))
}
}
}
@@ -2157,4 +2157,13 @@ class DataFrameSuite extends QueryTest with SharedSQLContext {
|*(1) Range (0, 10, step=1, splits=2)""".stripMargin))
}
}

test("SPARK-27671: Fix analysis exception when casting null in nested field in struct") {
val df = sql("SELECT * FROM VALUES (('a', (10, null))), (('b', (10, 50))), " +
"(('c', null)) AS tab(x, y)")
checkAnswer(df, Row("a", Row(10, null)) :: Row("b", Row(10, 50)) :: Row("c", null) :: Nil)

val cast = sql("SELECT cast(struct(1, null) AS struct<a:int,b:int>)")
checkAnswer(cast, Row(Row(1, null)) :: Nil)
}
}
