Skip to content

Commit

Permalink
[SPARK-32521][SQL] Bug-fix: WithFields Expression should not be foldable
Browse files Browse the repository at this point in the history
### What changes were proposed in this pull request?

Make WithFields Expression not foldable.

### Why are the changes needed?

The following query currently fails on master brach:
```
sql("SELECT named_struct('a', 1, 'b', 2) a")
.select($"a".withField("c", lit(3)).as("a"))
.show(false)
// java.lang.UnsupportedOperationException: Cannot evaluate expression: with_fields(named_struct(a, 1, b, 2), c, 3)
```
This happens because the Catalyst optimizer sees that the WithFields Expression is foldable and tries to statically evaluate the WithFields Expression (via the ConstantFolding rule), however it cannot do so because WithFields Expression is Unevaluable.

### Does this PR introduce _any_ user-facing change?

Yes, queries like the one shared above will now succeed.
That said, this bug was introduced in Spark 3.1.0 which has yet to be released.

### How was this patch tested?

A new unit test was added.

Closes #29338 from fqaiser94/SPARK-32521.

Lead-authored-by: fqaiser94@gmail.com <fqaiser94@gmail.com>
Co-authored-by: fqaiser94 <fqaiser94@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
  • Loading branch information
fqaiser94 authored and cloud-fan committed Aug 4, 2020
1 parent 7fec6e0 commit 6d69068
Show file tree
Hide file tree
Showing 2 changed files with 32 additions and 2 deletions.
Expand Up @@ -563,8 +563,6 @@ case class WithFields(

override def dataType: StructType = evalExpr.dataType.asInstanceOf[StructType]

override def foldable: Boolean = structExpr.foldable && valExprs.forall(_.foldable)

override def nullable: Boolean = structExpr.nullable

override def prettyName: String = "with_fields"
Expand Down
Expand Up @@ -1420,4 +1420,36 @@ class ColumnExpressionSuite extends QueryTest with SharedSparkSession {
}.getMessage should include("No such struct field b in a, B")
}
}

test("withField user-facing examples") {
checkAnswer(
sql("SELECT named_struct('a', 1, 'b', 2) struct_col")
.select($"struct_col".withField("c", lit(3))),
Row(Row(1, 2, 3)))

checkAnswer(
sql("SELECT named_struct('a', 1, 'b', 2) struct_col")
.select($"struct_col".withField("b", lit(3))),
Row(Row(1, 3)))

checkAnswer(
sql("SELECT CAST(NULL AS struct<a:int,b:int>) struct_col")
.select($"struct_col".withField("c", lit(3))),
Row(null))

checkAnswer(
sql("SELECT named_struct('a', 1, 'b', 2, 'b', 3) struct_col")
.select($"struct_col".withField("b", lit(100))),
Row(Row(1, 100, 100)))

checkAnswer(
sql("SELECT named_struct('a', named_struct('a', 1, 'b', 2)) struct_col")
.select($"struct_col".withField("a.c", lit(3))),
Row(Row(Row(1, 2, 3))))

intercept[AnalysisException] {
sql("SELECT named_struct('a', named_struct('b', 1), 'a', named_struct('c', 2)) struct_col")
.select($"struct_col".withField("a.c", lit(3)))
}.getMessage should include("Ambiguous reference to fields")
}
}

0 comments on commit 6d69068

Please sign in to comment.