[SPARK-21203][SQL] Fix wrong results of insertion of Array of Struct
### What changes were proposed in this pull request?
```SQL
CREATE TABLE `tab1`
(`custom_fields` ARRAY<STRUCT<`id`: BIGINT, `value`: STRING>>)
USING parquet

INSERT INTO `tab1`
SELECT ARRAY(named_struct('id', 1, 'value', 'a'), named_struct('id', 2, 'value', 'b'))

SELECT custom_fields.id, custom_fields.value FROM tab1
```

The above query always returns the last struct of the array for every element, because the rule `SimplifyCasts` incorrectly rewrites the query. The underlying cause is that we reuse the same `GenericInternalRow` object when casting each struct, so every element of the result array ends up referencing the same row.
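The aliasing can be reproduced outside Spark with any mutable row that is reused across elements. A minimal sketch in plain Scala, assuming a hypothetical `MutableRow` as a stand-in for Spark's `GenericInternalRow` (none of the names below are Spark classes):

```scala
// Sketch of the aliasing bug: reusing one mutable row while converting every
// struct of an array means the result holds N references to the same object,
// so every element shows the values of the last struct processed.
object RowReuseDemo {
  // Hypothetical stand-in for Spark's mutable GenericInternalRow.
  final class MutableRow(n: Int) {
    private val values = new Array[Any](n)
    def update(i: Int, v: Any): Unit = values(i) = v
    override def toString: String = values.mkString("[", ", ", "]")
  }

  val input: Seq[(Long, String)] = Seq((1L, "a"), (2L, "b"))

  // Buggy pattern: one row allocated once and mutated in place for every
  // element; the collection ends up with two references to the same object.
  val buggyResult: Seq[String] = {
    val shared = new MutableRow(2)
    val rows = input.map { case (id, v) =>
      shared.update(0, id)
      shared.update(1, v)
      shared // same object returned each time
    }
    rows.map(_.toString) // both elements render as the last struct: [2, b]
  }

  // Fixed pattern (what the patch does): allocate a fresh row per element.
  val fixedResult: Seq[String] = {
    val rows = input.map { case (id, v) =>
      val r = new MutableRow(2)
      r.update(0, id)
      r.update(1, v)
      r
    }
    rows.map(_.toString) // [1, a] and [2, b], as expected
  }

  def main(args: Array[String]): Unit = {
    println(s"buggy: ${buggyResult.mkString(", ")}")
    println(s"fixed: ${fixedResult.mkString(", ")}")
  }
}
```

The patch applies the second pattern: it moves the row allocation inside the per-row closure so each cast produces an independent row.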

### How was this patch tested?

Added a test case to `InsertSuite`.
Author: gatorsmile <gatorsmile@gmail.com>

Closes #18412 from gatorsmile/castStruct.
gatorsmile authored and cloud-fan committed Jun 24, 2017
1 parent 7c7bc8f commit 2e1586f
Showing 2 changed files with 23 additions and 2 deletions.
```diff
@@ -482,15 +482,15 @@ case class Cast(child: Expression, dataType: DataType, timeZoneId: Option[String
         case (fromField, toField) => cast(fromField.dataType, toField.dataType)
       }
       // TODO: Could be faster?
-      val newRow = new GenericInternalRow(from.fields.length)
       buildCast[InternalRow](_, row => {
+        val newRow = new GenericInternalRow(from.fields.length)
         var i = 0
         while (i < row.numFields) {
           newRow.update(i,
             if (row.isNullAt(i)) null else castFuncs(i)(row.get(i, from.apply(i).dataType)))
           i += 1
         }
-        newRow.copy()
+        newRow
       })
     }
```

```diff
@@ -345,4 +345,25 @@ class InsertSuite extends DataSourceTest with SharedSQLContext {
       )
     }
   }
+
+  test("SPARK-21203 wrong results of insertion of Array of Struct") {
+    val tabName = "tab1"
+    withTable(tabName) {
+      spark.sql(
+        """
+          |CREATE TABLE `tab1`
+          |(`custom_fields` ARRAY<STRUCT<`id`: BIGINT, `value`: STRING>>)
+          |USING parquet
+        """.stripMargin)
+      spark.sql(
+        """
+          |INSERT INTO `tab1`
+          |SELECT ARRAY(named_struct('id', 1, 'value', 'a'), named_struct('id', 2, 'value', 'b'))
+        """.stripMargin)
+
+      checkAnswer(
+        spark.sql("SELECT custom_fields.id, custom_fields.value FROM tab1"),
+        Row(Array(1, 2), Array("a", "b")))
+    }
+  }
 }
```
