[SPARK-57184][SQL] Recurse into CalendarInterval children when appending a null struct#56235
Closed
MaxGekk wants to merge 2 commits into
Closed
[SPARK-57184][SQL] Recurse into CalendarInterval children when appending a null struct#56235MaxGekk wants to merge 2 commits into
MaxGekk wants to merge 2 commits into
Conversation
…ing a null struct ### What changes were proposed in this pull request? Include `CalendarIntervalType` in the recursion guard of `WritableColumnVector.appendStruct(boolean isNull)`, so that appending a NULL parent struct cascades `appendStruct(true)` into a `CalendarInterval` child column and advances all three of its grandchild columns (months/days/microseconds). ### Why are the changes needed? A `CalendarInterval` column is struct-shaped: it is backed by three grandchild primitive columns. The recursion guard only handled `StructType` and `VariantType`, so an interval child took the `appendNull()` branch, advancing only the interval's own cursor and leaving its three grandchild cursors behind. For a struct column with a `CalendarInterval` field, a NULL parent row then caused a subsequent non-null row to write its months/days/ microseconds into the wrong grandchild slots, returning a skewed value on read - silent data corruption for the nested struct-of-interval case. ### Does this PR introduce _any_ user-facing change? Yes. Reading back a struct-of-interval column that contains a NULL parent row followed by a non-null row now returns the correct interval value instead of a skewed one. ### How was this patch tested? Added a unit test to `ColumnarBatchSuite` that converts a null parent struct followed by non-null struct-of-interval rows and verifies the interval values are read back correctly. ### Was this patch authored or co-authored using generative AI tooling? Generated-by: Claude Opus 4.8 Co-authored-by: Max Gekk <max.gekk@gmail.com>
peter-toth
approved these changes
May 31, 2026
…terval # Conflicts: # sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/WritableColumnVector.java
Member
Author
|
Merging to master/4.x. Thank you @peter-toth for review. |
MaxGekk
added a commit
that referenced
this pull request
Jun 1, 2026
…ing a null struct
### What changes were proposed in this pull request?
Include `CalendarIntervalType` in the recursion guard of `WritableColumnVector.appendStruct(boolean isNull)`, so that appending a NULL parent struct cascades `appendStruct(true)` into a `CalendarInterval` child column and advances all three of its grandchild columns (months / days / microseconds).
```java
for (WritableColumnVector c: childColumns) {
if (c.type instanceof StructType || c.type instanceof VariantType
|| c.type instanceof CalendarIntervalType) {
c.appendStruct(true);
} else {
c.appendNull();
}
}
```
This was split out of the nanosecond-timestamp `ColumnVector` PR (SPARK-57100, #56198) per review, since it is an independent fix.
### Why are the changes needed?
A `CalendarInterval` column is struct-shaped: it is backed by three grandchild primitive columns (`months` as int, `days` as int, `microseconds` as long). The recursion guard in `appendStruct` only handled `StructType` and `VariantType`, so an interval child column took the `else` branch (`c.appendNull()`), which advances only the interval column's own cursor and leaves its three grandchild cursors un-advanced.
As a result, for a struct column with a `CalendarInterval` field, appending a NULL parent row left the interval's grandchild cursors behind by one. A subsequent non-null row then wrote its `months`/`days`/`microseconds` into the wrong (earlier) grandchild slots, and reading that row back returned a skewed value - silent data corruption for the nested struct-of-interval case.
### Does this PR introduce _any_ user-facing change?
Yes. Reading back a struct-of-interval column that contains a NULL parent row followed by a non-null row now returns the correct interval value instead of a skewed one. Previously it returned corrupted data.
### How was this patch tested?
Added a unit test to `ColumnarBatchSuite` that uses `RowToColumnConverter` to convert a null parent struct followed by non-null struct-of-interval rows, and verifies the interval values are read back correctly. The test fails without the fix and passes with it.
### Was this patch authored or co-authored using generative AI tooling?
Generated-by: Claude Opus 4.8
Closes #56235 from MaxGekk/fix-appendstruct-interval.
Authored-by: Maxim Gekk <max.gekk@gmail.com>
Signed-off-by: Max Gekk <max.gekk@gmail.com>
(cherry picked from commit 18f2bac)
Signed-off-by: Max Gekk <max.gekk@gmail.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What changes were proposed in this pull request?
Include
CalendarIntervalTypein the recursion guard ofWritableColumnVector.appendStruct(boolean isNull), so that appending a NULL parent struct cascadesappendStruct(true)into aCalendarIntervalchild column and advances all three of its grandchild columns (months / days / microseconds).This was split out of the nanosecond-timestamp
ColumnVectorPR (SPARK-57100, #56198) per review, since it is an independent fix.Why are the changes needed?
A
CalendarIntervalcolumn is struct-shaped: it is backed by three grandchild primitive columns (monthsas int,daysas int,microsecondsas long). The recursion guard inappendStructonly handledStructTypeandVariantType, so an interval child column took theelsebranch (c.appendNull()), which advances only the interval column's own cursor and leaves its three grandchild cursors un-advanced.As a result, for a struct column with a
CalendarIntervalfield, appending a NULL parent row left the interval's grandchild cursors behind by one. A subsequent non-null row then wrote itsmonths/days/microsecondsinto the wrong (earlier) grandchild slots, and reading that row back returned a skewed value - silent data corruption for the nested struct-of-interval case.Does this PR introduce any user-facing change?
Yes. Reading back a struct-of-interval column that contains a NULL parent row followed by a non-null row now returns the correct interval value instead of a skewed one. Previously it returned corrupted data.
How was this patch tested?
Added a unit test to
ColumnarBatchSuitethat usesRowToColumnConverterto convert a null parent struct followed by non-null struct-of-interval rows, and verifies the interval values are read back correctly. The test fails without the fix and passes with it.Was this patch authored or co-authored using generative AI tooling?
Generated-by: Claude Opus 4.8