fix(transformer): preserve existing BYTES value when transform yields null#18503
Conversation
… null `ExpressionTransformer` was overwriting an existing non-null `byte[]` value with null when a transform expression evaluated to null. `byte[]` is logically a scalar but is caught by the `isArray()` branch intended for nested fields, so the null-result path applied `applyTransformedValue(record, column, null)` and cleared the BYTES column. Extract a `shouldPreserveExistingValue` helper that skips the apply step in two cases when the transform yields null and an existing non-null value is present: 1. Implicit map transforms (`mapField__KEYS`/`__VALUES`) — generalizes the prior guard so it also covers the `shouldApplyTransform` branch (e.g. columns flagged null via `putDefaultNullValue`). 2. BYTES single-value columns — treat `byte[]` as the scalar it represents. Guard applies in both the populate-missing/overwrite branch and the existing array/Collection/Map branch, including the post-upsert overwrite path (`_overwriteExistingValues=true`). Adds three regression tests covering implicit-map + BYTES, explicit null-returning transform on BYTES, and the post-upsert overwrite variant. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
Fixes a null-handling edge case in ExpressionTransformer where BYTES single-value columns (represented as byte[]) were incorrectly treated as “nested/array” fields and could be cleared when a transform expression evaluated to null. The change ensures existing non-null byte[] values are preserved when the transform yields null, including in the post-upsert overwrite mode.
Changes:
- Add a
shouldPreserveExistingValue(...)helper to skip applying anulltransform result when an existing non-null value must be retained (implicit MAP-derived columns and BYTESbyte[]). - Apply the preservation guard in both the “shouldApplyTransform” path and the “array/collection/map compatibility” path.
- Add targeted unit tests covering implicit MAP + BYTES, explicit null transforms on BYTES, and post-upsert overwrite mode.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| pinot-segment-local/src/main/java/org/apache/pinot/segment/local/recordtransformer/ExpressionTransformer.java | Prevents null transform results from clobbering existing non-null BYTES (byte[]) values; reuses the same guard for implicit MAP-derived columns. |
| pinot-segment-local/src/test/java/org/apache/pinot/segment/local/recordtransformer/ExpressionTransformerTest.java | Adds regression tests for BYTES preservation behavior in normal ingestion and post-upsert overwrite paths. |
The `shouldApplyTransform` branch is reached when `existingValue == null`, the column is null-flagged, or `_overwriteExistingValues == true`. In all three cases the transform is meant to apply — for the post-upsert overwrite path in particular, a null-returning transform is the configured way to clear a derived column. Keeping the preservation guard there blocks that intent. Restrict the `shouldPreserveExistingValue` guard to the array/Collection/Map branch, which only fires when the existing value is genuinely non-null and not null-flagged. Drop the two tests that exercised the misplaced guard (implicit-map + putDefaultNullValue, and post-upsert overwrite). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #18503 +/- ##
============================================
+ Coverage 63.65% 63.69% +0.04%
+ Complexity 1735 1684 -51
============================================
Files 3254 3266 +12
Lines 199446 199898 +452
Branches 30977 31048 +71
============================================
+ Hits 126953 127333 +380
- Misses 62356 62403 +47
- Partials 10137 10162 +25
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
Cleaner shape: filter `byte[]` out of the else-if predicate itself so BYTES single-value columns fall through as scalars (matching STRING/INT/etc.). The transform is never evaluated when the column already has a value, which makes the rule self-evident at the branch test. Drop the `shouldPreserveExistingValue` helper since it's no longer needed. Multi-value BYTES (`byte[][]`) is still caught by `isArray()` and continues to be treated as a nested array. The implicit-map null guard inside the branch is unchanged. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Summary
ExpressionTransformeroverriding an existing non-nullbyte[](BYTES column) with null when a transform expression evaluates to null.byte[]is logically a scalar in Pinot, but it accidentally falls into theisArray()branch intended for nested fields, so the null-result path calledapplyTransformedValue(record, column, null)and cleared the BYTES column.shouldPreserveExistingValuehelper that skips the apply step when the transform yields null AND an existing non-null value is present, for two cases:mapField__KEYS/__VALUES) — generalizes the prior guard introduced in Add post-partial-upsert transforms for derived columns #17308 so it also covers theshouldApplyTransformbranch (e.g. columns flagged null viaputDefaultNullValue).byte[]as the scalar it represents._overwriteExistingValues=true).Related: #17308.
Test plan
testImplicitMapTransformDoesNotOverrideExistingBytesValueWhenSourceMissing— implicit map + BYTES with default-null markertestTransformReturningNullDoesNotOverrideExistingBytesValue— explicit transform yielding null, regular ingestion pathtestPostUpsertTransformReturningNullDoesNotOverrideExistingBytesValue— explicit transform yielding null, post-upsert overwrite pathExpressionTransformerTestpass./mvnw spotless:apply license:format checkstyle:check license:check -pl pinot-segment-localclean🤖 Generated with Claude Code