[SPARK-57736][SQL] Fix NPE in CreateNamedStruct.dataType when a struct field name is null by MaxGekk · Pull Request #56845 · apache/spark

MaxGekk · 2026-06-28T10:11:40Z

What changes were proposed in this pull request?

CreateNamedStruct.dataType builds each field with StructField(name.toString, ...):

override lazy val dataType: StructType = {
  val fields = names.zip(valExprs).map {
    case (name, expr) =>
      ...
      StructField(name.toString, expr.dataType, expr.nullable, metadata)   // NPE if name == null
  }
  StructType(fields)
}

When a field name is null, name.toString throws a NullPointerException. This is reached eagerly while building a RowEncoder serializer (SerializerBuildHelper.createSerializerForObject -> CreateNamedStruct(...).dataType), so it crashes before any analysis runs. This PR makes the field name null-safe and preserves the null name:

StructField(if (name == null) null else name.toString, expr.dataType, expr.nullable, metadata)
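
As an illustration only, the null-safe pattern can be sketched in plain Scala without Spark on the classpath. `Field`, `unsafeField`, and `safeField` are hypothetical stand-ins for `StructField` and the two versions of the expression above; they are not Spark APIs.

```scala
// Hypothetical stand-in for StructField; not a Spark class.
final case class Field(name: String, typeName: String)

object NullSafeNameSketch {
  // Mirrors the old expression: calling toString on a null reference throws.
  def unsafeField(name: Any, typeName: String): Field =
    Field(name.toString, typeName)

  // Mirrors the fixed expression: a null name is kept as null instead of being dereferenced.
  def safeField(name: Any, typeName: String): Field =
    Field(if (name == null) null else name.toString, typeName)

  def main(args: Array[String]): Unit = {
    val threw =
      try { unsafeField(null, "int"); false }
      catch { case _: NullPointerException => true }
    assert(threw)                                  // old behavior: NPE
    assert(safeField(null, "int").name == null)    // new behavior: null name preserved
    assert(safeField("a", "int").name == "a")      // non-null names are unaffected
  }
}
```

The sketch keeps the null rather than substituting a placeholder string, matching the PR's stated intent to preserve the null name.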

Why are the changes needed?

A null field name is invalid input, and CreateNamedStruct.checkInputDataTypes already rejects it (names.contains(null) -> UNEXPECTED_NULL). However, dataType dereferences name.toString before type checking happens, and the encoder calls dataType directly. Making dataType null-safe replaces the hard NullPointerException with a StructType that preserves the null name, leaving the rejection of the invalid input to checkInputDataTypes. This is consistent with SPARK-57725, which made AttributeSeq tolerate null-named attributes.

Minimal reproduction:

import org.apache.spark.sql.catalyst.expressions.{CreateNamedStruct, Literal}
import org.apache.spark.sql.types.StringType

// Children alternate name, value: the name here is a null string literal, the value is 1.
CreateNamedStruct(Seq(Literal.create(null, StringType), Literal(1))).dataType  // NPE before this fix

Note: this fixes the specific CreateNamedStruct.dataType NPE. The full createDataFrame(schemaWithNullFieldName) scenario hits additional, independent null-name sites further along (e.g. a StructField.name.equalsIgnoreCase schema comparison during resolution), which are separate pre-existing issues and out of scope here.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Added a regression test in ComplexTypeSuite asserting dataType no longer throws and preserves the null field name.

build/sbt 'catalyst/testOnly *ComplexTypeSuite'
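
For readers without a Spark checkout, the shape of such a regression check can be sketched against a plain-Scala model. This is not the test added to ComplexTypeSuite: `NamedStructModel`, `FieldModel`, `dataType`, and `check` are hypothetical stand-ins that only mimic the behavior described in this PR (a null-safe `dataType`, with validation still reporting `UNEXPECTED_NULL`).

```scala
// Hypothetical stand-ins; none of these are Spark classes.
final case class FieldModel(name: String, typeName: String)

final case class NamedStructModel(names: Seq[Any], typeNames: Seq[String]) {
  // Null-safe, like the fixed expression: a null name is carried through as null.
  def dataType: Seq[FieldModel] = names.zip(typeNames).map {
    case (name, tpe) => FieldModel(if (name == null) null else name.toString, tpe)
  }

  // Validation remains the place where a null name is rejected.
  def check: Either[String, Unit] =
    if (names.contains(null)) Left("UNEXPECTED_NULL") else Right(())
}

object RegressionSketch {
  def main(args: Array[String]): Unit = {
    // A single null name: dataType must not throw and must keep the null.
    val onlyNull = NamedStructModel(Seq(null), Seq("int"))
    assert(onlyNull.dataType.length == 1)
    assert(onlyNull.dataType.head.name == null)
    assert(onlyNull.check == Left("UNEXPECTED_NULL"))

    // A null name mixed with valid names: valid names survive, validation still fails.
    val mixed = NamedStructModel(Seq("a", null, "c"), Seq("int", "string", "int"))
    assert(mixed.dataType.map(_.name) == Seq("a", null, "c"))
    assert(mixed.check == Left("UNEXPECTED_NULL"))
  }
}
```

The model separates the two concerns the same way the description does: computing the type never throws, and rejecting the input is left to validation.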

Was this patch authored or co-authored using generative AI tooling?

Generated-by: Cursor

MaxGekk · 2026-06-29T06:49:49Z

Could you review when you have a moment? cc @cloud-fan @dongjoon-hyun

uros-b · 2026-06-29T14:51:11Z

+    assert(dt.length == 1)
+    assert(dt.head.name == null)


Please update the test a bit - the current regression test asserts dt.length == 1 and dt.head.name == null (core behavior covered) but does not assert checkInputDataTypes() still returns UNEXPECTED_NULL, nor exercise a null name mixed with valid named fields.

uros-b

Thank you @MaxGekk, I left one note but otherwise looks good!

Co-authored-by: Isaac

MaxGekk · 2026-06-29T15:57:50Z

Thank you for the review, @uros-b! Addressed your comment: the regression test now also asserts checkInputDataTypes() returns UNEXPECTED_NULL and exercises a null field name mixed with valid named fields.

dongjoon-hyun

+1, LGTM.

MaxGekk · 2026-06-29T21:35:33Z

Merging to master/4.x/4.2/4.1/4.0. Thank you, @uros-b and @dongjoon-hyun, for the review.

Closes #56845 from MaxGekk/SPARK-57736-createnamedstruct-npe.

Authored-by: Maxim Gekk <max.gekk@gmail.com>
Signed-off-by: Max Gekk <max.gekk@gmail.com>
(cherry picked from commit 0525313)
Signed-off-by: Max Gekk <max.gekk@gmail.com>

uros-b reviewed Jun 29, 2026

uros-b approved these changes Jun 29, 2026

Address review: assert UNEXPECTED_NULL and mixed null/valid field names

6289e88

Co-authored-by: Isaac

dongjoon-hyun approved these changes Jun 29, 2026

MaxGekk closed this in 0525313 Jun 29, 2026

MaxGekk wants to merge 2 commits into apache:master from MaxGekk:SPARK-57736-createnamedstruct-npe
