Enhance JSON extraction functions to support null handling in queries#17867
Enhance JSON extraction functions to support null handling in queries#17867gortiz wants to merge 1 commit intoapache:masterfrom
Conversation
There was a problem hiding this comment.
Pull request overview
This PR updates Pinot’s jsonExtractScalar transform to better support null defaults under Pinot’s null-handling mode, and adds tests/data to validate the behavior in query execution.
Changes:
- Enhanced
JsonExtractScalarTransformFunctionto detect an explicitnulldefault and propagate nulls via a computed null bitmap when null handling is enabled. - Extended the JSON test dataset with a record containing an explicit JSON
nullvalue for the extracted field. - Added query tests covering
jsonExtractScalar(..., null)withenableNullHandlingtoggled on/off, and annotated test SQL strings with@Language("sql").
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| pinot-core/src/main/java/org/apache/pinot/core/operator/transform/function/JsonExtractScalarTransformFunction.java | Implements null-default detection and null-bitmap propagation logic for jsonExtractScalar. |
| pinot-core/src/test/java/org/apache/pinot/queries/BaseJsonQueryTest.java | Adds a JSON record with a null field and annotates SQL strings for improved static analysis. |
| pinot-core/src/test/java/org/apache/pinot/queries/JsonExtractScalarTest.java | Adds tests validating null default behavior with null handling enabled vs disabled. |
| case DOUBLE: | ||
| case TIMESTAMP: | ||
| _defaultValue = literalTransformFun.getDoubleLiteral(); | ||
| break; |
There was a problem hiding this comment.
Default value parsing for TIMESTAMP uses getDoubleLiteral(), which can lose precision for large epoch values and also regresses support for string timestamp literals that previously worked via DataType.TIMESTAMP.convert(...). Consider using the long/timestamp literal accessor (or the existing DataType.TIMESTAMP.convert on the string literal) so TIMESTAMP defaults are handled consistently with other Pinot timestamp parsing rules.
| } | ||
| if (!nullBitmap.isEmpty()) { | ||
| bitmap.or(nullBitmap); | ||
| } |
There was a problem hiding this comment.
getNullBitmap() always returns a RoaringBitmap instance even when it is empty. Per the TransformFunction contract (and BaseTransformFunction behavior), it should return null when there are no null rows to avoid extra allocations and to preserve the semantic meaning of a null return value.
| } | |
| } | |
| if (bitmap.isEmpty()) { | |
| return null; | |
| } |
| import org.apache.pinot.spi.utils.JsonUtils; | ||
| import org.jspecify.annotations.Nullable; | ||
| import org.roaringbitmap.RoaringBitmap; | ||
|
|
There was a problem hiding this comment.
This file introduces org.jspecify.annotations.Nullable, but TransformFunction/BaseTransformFunction (and most Pinot core code) use javax.annotation.Nullable. Using a different @Nullable annotation here makes nullness annotations inconsistent and can reduce tooling effectiveness; consider switching to javax.annotation.Nullable for the override and removing the jspecify import.
There was a problem hiding this comment.
+1, don't we use javax.annotation.Nullable everywhere else in the project?
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #17867 +/- ##
=============================================
- Coverage 63.26% 34.30% -28.96%
+ Complexity 1466 745 -721
=============================================
Files 3190 3190
Lines 192039 192079 +40
Branches 29421 29428 +7
=============================================
- Hits 121484 65895 -55589
- Misses 61042 120655 +59613
+ Partials 9513 5529 -3984
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
| import org.apache.pinot.spi.utils.JsonUtils; | ||
| import org.jspecify.annotations.Nullable; | ||
| import org.roaringbitmap.RoaringBitmap; | ||
|
|
There was a problem hiding this comment.
+1, don't we use javax.annotation.Nullable everywhere else in the project?
| RoaringBitmap bitmap = new RoaringBitmap(); | ||
| for (TransformFunction arg : _arguments.subList(1, _arguments.size() - 1)) { | ||
| RoaringBitmap argBitmap = arg.getNullBitmap(valueBlock); | ||
| if (argBitmap != null) { | ||
| bitmap.or(argBitmap); | ||
| } | ||
| } | ||
| int numDocs = valueBlock.getNumDocs(); | ||
| RoaringBitmap nullBitmap = new RoaringBitmap(); | ||
| IntFunction<Object> resultExtractor = getResultExtractor(valueBlock); | ||
| for (int i = 0; i < numDocs; i++) { | ||
| Object result = null; | ||
| try { | ||
| result = resultExtractor.apply(i); | ||
| } catch (Exception ignored) { | ||
| } | ||
| if (result == null) { | ||
| nullBitmap.add(i); | ||
| } | ||
| } | ||
| if (!nullBitmap.isEmpty()) { | ||
| bitmap.or(nullBitmap); | ||
| } |
There was a problem hiding this comment.
This seems pretty expensive, we're computing the whole result set again?
This pull request enhances the
JsonExtractScalarTransformFunctionto provide better support for handlingnulldefault values, especially when Pinot's null handling feature is enabled. It introduces logic to correctly propagate nulls when anulldefault is specified, and adds comprehensive tests to validate this behavior.Enhancements to null handling in JSON extraction:
_defaultIsNullflag and logic to detect when the default value forjsonExtractScalaris explicitly set tonull, and to handle this case differently depending on whether null handling is enabled (_nullHandlingEnabled). [1] [2]getNullBitmapmethod to ensure rows with anullresult are correctly marked as null when anulldefault is used and null handling is enabled.initmethod signature to accept thenullHandlingEnabledparameter, aligning with the new null handling logic.Testing improvements:
nullvalues in the JSON field to the test dataset.jsonExtractScalarwhen the default value isnull, both with null handling enabled and disabled, ensuring correct output in each scenario.Miscellaneous:
@Language("sql")annotation to improve static analysis of SQL queries in tests.@NullableandRoaringBitmapto support the new null handling logic.