Skip to content

[SPARK-57184][SQL] Recurse into CalendarInterval children when appending a null struct#56235

Closed
MaxGekk wants to merge 2 commits into
apache:masterfrom
MaxGekk:fix-appendstruct-interval
Closed

[SPARK-57184][SQL] Recurse into CalendarInterval children when appending a null struct#56235
MaxGekk wants to merge 2 commits into
apache:masterfrom
MaxGekk:fix-appendstruct-interval

Conversation

@MaxGekk
Copy link
Copy Markdown
Member

@MaxGekk MaxGekk commented May 31, 2026

What changes were proposed in this pull request?

Include CalendarIntervalType in the recursion guard of WritableColumnVector.appendStruct(boolean isNull), so that appending a NULL parent struct cascades appendStruct(true) into a CalendarInterval child column and advances all three of its grandchild columns (months / days / microseconds).

for (WritableColumnVector c: childColumns) {
  if (c.type instanceof StructType || c.type instanceof VariantType
      || c.type instanceof CalendarIntervalType) {
    c.appendStruct(true);
  } else {
    c.appendNull();
  }
}

This was split out of the nanosecond-timestamp ColumnVector PR (SPARK-57100, #56198) per review, since it is an independent fix.

Why are the changes needed?

A CalendarInterval column is struct-shaped: it is backed by three grandchild primitive columns (months as int, days as int, microseconds as long). The recursion guard in appendStruct only handled StructType and VariantType, so an interval child column took the else branch (c.appendNull()), which advances only the interval column's own cursor and leaves its three grandchild cursors un-advanced.

As a result, for a struct column with a CalendarInterval field, appending a NULL parent row left the interval's grandchild cursors behind by one. A subsequent non-null row then wrote its months/days/microseconds into the wrong (earlier) grandchild slots, and reading that row back returned a skewed value - silent data corruption for the nested struct-of-interval case.

Does this PR introduce any user-facing change?

Yes. Reading back a struct-of-interval column that contains a NULL parent row followed by a non-null row now returns the correct interval value instead of a skewed one. Previously it returned corrupted data.

How was this patch tested?

Added a unit test to ColumnarBatchSuite that uses RowToColumnConverter to convert a null parent struct followed by non-null struct-of-interval rows, and verifies the interval values are read back correctly. The test fails without the fix and passes with it.

Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Opus 4.8

…ing a null struct

### What changes were proposed in this pull request?

Include `CalendarIntervalType` in the recursion guard of
`WritableColumnVector.appendStruct(boolean isNull)`, so that appending a NULL parent
struct cascades `appendStruct(true)` into a `CalendarInterval` child column and advances
all three of its grandchild columns (months/days/microseconds).

### Why are the changes needed?

A `CalendarInterval` column is struct-shaped: it is backed by three grandchild primitive
columns. The recursion guard only handled `StructType` and `VariantType`, so an interval
child took the `appendNull()` branch, advancing only the interval's own cursor and leaving
its three grandchild cursors behind. For a struct column with a `CalendarInterval` field,
a NULL parent row then caused a subsequent non-null row to write its months/days/
microseconds into the wrong grandchild slots, returning a skewed value on read - silent
data corruption for the nested struct-of-interval case.

### Does this PR introduce _any_ user-facing change?

Yes. Reading back a struct-of-interval column that contains a NULL parent row followed by
a non-null row now returns the correct interval value instead of a skewed one.

### How was this patch tested?

Added a unit test to `ColumnarBatchSuite` that converts a null parent struct followed by
non-null struct-of-interval rows and verifies the interval values are read back correctly.

### Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Opus 4.8

Co-authored-by: Max Gekk <max.gekk@gmail.com>
…terval

# Conflicts:
#	sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/WritableColumnVector.java
@MaxGekk
Copy link
Copy Markdown
Member Author

MaxGekk commented Jun 1, 2026

Merging to master/4.x. Thank you @peter-toth for review.

@MaxGekk MaxGekk closed this in 18f2bac Jun 1, 2026
MaxGekk added a commit that referenced this pull request Jun 1, 2026
…ing a null struct

### What changes were proposed in this pull request?

Include `CalendarIntervalType` in the recursion guard of `WritableColumnVector.appendStruct(boolean isNull)`, so that appending a NULL parent struct cascades `appendStruct(true)` into a `CalendarInterval` child column and advances all three of its grandchild columns (months / days / microseconds).

```java
for (WritableColumnVector c: childColumns) {
  if (c.type instanceof StructType || c.type instanceof VariantType
      || c.type instanceof CalendarIntervalType) {
    c.appendStruct(true);
  } else {
    c.appendNull();
  }
}
```

This was split out of the nanosecond-timestamp `ColumnVector` PR (SPARK-57100, #56198) per review, since it is an independent fix.

### Why are the changes needed?

A `CalendarInterval` column is struct-shaped: it is backed by three grandchild primitive columns (`months` as int, `days` as int, `microseconds` as long). The recursion guard in `appendStruct` only handled `StructType` and `VariantType`, so an interval child column took the `else` branch (`c.appendNull()`), which advances only the interval column's own cursor and leaves its three grandchild cursors un-advanced.

As a result, for a struct column with a `CalendarInterval` field, appending a NULL parent row left the interval's grandchild cursors behind by one. A subsequent non-null row then wrote its `months`/`days`/`microseconds` into the wrong (earlier) grandchild slots, and reading that row back returned a skewed value - silent data corruption for the nested struct-of-interval case.

### Does this PR introduce _any_ user-facing change?

Yes. Reading back a struct-of-interval column that contains a NULL parent row followed by a non-null row now returns the correct interval value instead of a skewed one. Previously it returned corrupted data.

### How was this patch tested?

Added a unit test to `ColumnarBatchSuite` that uses `RowToColumnConverter` to convert a null parent struct followed by non-null struct-of-interval rows, and verifies the interval values are read back correctly. The test fails without the fix and passes with it.

### Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Opus 4.8

Closes #56235 from MaxGekk/fix-appendstruct-interval.

Authored-by: Maxim Gekk <max.gekk@gmail.com>
Signed-off-by: Max Gekk <max.gekk@gmail.com>
(cherry picked from commit 18f2bac)
Signed-off-by: Max Gekk <max.gekk@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants