Enhance JSON extraction functions to support null handling in queries #17867

Open
gortiz wants to merge 1 commit into apache:master from gortiz:jsonextractscalar-null-default

@gortiz gortiz commented Mar 12, 2026

This pull request enhances the JsonExtractScalarTransformFunction to provide better support for handling null default values, especially when Pinot's null handling feature is enabled. It introduces logic to correctly propagate nulls when a null default is specified, and adds comprehensive tests to validate this behavior.

Enhancements to null handling in JSON extraction:

  • Added a _defaultIsNull flag and logic to detect when the default value for jsonExtractScalar is explicitly set to null, and to handle this case differently depending on whether null handling is enabled (_nullHandlingEnabled).
  • Overrode the getNullBitmap method to ensure rows with a null result are correctly marked as null when a null default is used and null handling is enabled.
  • Updated the init method signature to accept the nullHandlingEnabled parameter, aligning with the new null handling logic.
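
The null-default behavior listed above can be sketched without any Pinot types. This is a minimal illustration and not the PR's code: `NullDefaultSketch`, `extract`, `nullRows`, and `PLACEHOLDER` are hypothetical names, `java.util.BitSet` stands in for `RoaringBitmap`, and a `Map` lookup stands in for JSON path evaluation. The placeholder value written for rows that have no extracted value is an assumption made for the example.

```java
import java.util.BitSet;
import java.util.List;
import java.util.Map;

// Illustrative only: none of these names come from Pinot or from the PR.
final class NullDefaultSketch {
  // Value stored for rows with no extracted value and a null default (an assumption).
  static final long PLACEHOLDER = 0L;

  /** Extracts one long per row; rows without the key get the default, or PLACEHOLDER if the default is null. */
  static long[] extract(List<Map<String, Long>> rows, String key, Long defaultValue) {
    long[] values = new long[rows.size()];
    for (int i = 0; i < values.length; i++) {
      Long v = rows.get(i).get(key);
      if (v == null) {
        v = defaultValue != null ? defaultValue : PLACEHOLDER;
      }
      values[i] = v;
    }
    return values;
  }

  /** Returns the rows to report as null, or null when no row should be reported. */
  static BitSet nullRows(List<Map<String, Long>> rows, String key, Long defaultValue,
      boolean nullHandlingEnabled) {
    // Nulls are only propagated when the default is null AND null handling is on.
    if (!nullHandlingEnabled || defaultValue != null) {
      return null;
    }
    BitSet nulls = new BitSet(rows.size());
    for (int i = 0; i < rows.size(); i++) {
      if (rows.get(i).get(key) == null) {
        nulls.set(i);
      }
    }
    return nulls.isEmpty() ? null : nulls;
  }
}
```

In this sketch the value array is always fully populated, and the bitmap is the only signal that a row is null, which mirrors the split between the value accessors and getNullBitmap described in the list.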

Testing improvements:

  • Added new test records with null values in the JSON field to the test dataset.
  • Introduced two new tests to verify the behavior of jsonExtractScalar when the default value is null, both with null handling enabled and disabled, ensuring correct output in each scenario.

Miscellaneous:

  • Added the @Language("sql") annotation to improve static analysis of SQL queries in tests.
  • Imported @Nullable and RoaringBitmap to support the new null handling logic.

Copilot AI left a comment

Pull request overview

This PR updates Pinot’s jsonExtractScalar transform to better support null defaults under Pinot’s null-handling mode, and adds tests/data to validate the behavior in query execution.

Changes:

  • Enhanced JsonExtractScalarTransformFunction to detect an explicit null default and propagate nulls via a computed null bitmap when null handling is enabled.
  • Extended the JSON test dataset with a record containing an explicit JSON null value for the extracted field.
  • Added query tests covering jsonExtractScalar(..., null) with enableNullHandling toggled on/off, and annotated test SQL strings with @Language("sql").

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| pinot-core/src/main/java/org/apache/pinot/core/operator/transform/function/JsonExtractScalarTransformFunction.java | Implements null-default detection and null-bitmap propagation logic for jsonExtractScalar. |
| pinot-core/src/test/java/org/apache/pinot/queries/BaseJsonQueryTest.java | Adds a JSON record with a null field and annotates SQL strings for improved static analysis. |
| pinot-core/src/test/java/org/apache/pinot/queries/JsonExtractScalarTest.java | Adds tests validating null default behavior with null handling enabled vs disabled. |

Comment on lines +129 to +132
case DOUBLE:
case TIMESTAMP:
  _defaultValue = literalTransformFun.getDoubleLiteral();
  break;

Copilot AI Mar 12, 2026

Default value parsing for TIMESTAMP uses getDoubleLiteral(), which can lose precision for large epoch values and also regresses support for string timestamp literals that previously worked via DataType.TIMESTAMP.convert(...). Consider using the long/timestamp literal accessor (or the existing DataType.TIMESTAMP.convert on the string literal) so TIMESTAMP defaults are handled consistently with other Pinot timestamp parsing rules.
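
The precision concern can be checked in isolation. The sketch below is illustrative only (`TimestampDefaultSketch` and `survivesDoubleRoundTrip` are hypothetical names, not Pinot code): it tests whether a long survives a round trip through double. A double has a 53-bit significand, so an epoch-millisecond value from 2026 survives, while a value just above 2^53 does not.

```java
// Illustrative only: a hypothetical helper, not part of Pinot or the PR.
final class TimestampDefaultSketch {
  /** Returns true when the value is unchanged by a long -> double -> long round trip. */
  static boolean survivesDoubleRoundTrip(long value) {
    return (long) (double) value == value;
  }

  public static void main(String[] args) {
    long epochMillis2026 = 1_773_273_600_000L; // well below 2^53
    long aboveTwoPow53 = (1L << 53) + 1;       // not exactly representable as a double
    System.out.println(survivesDoubleRoundTrip(epochMillis2026)); // prints "true"
    System.out.println(survivesDoubleRoundTrip(aboveTwoPow53));   // prints "false"
  }
}
```

Reading the default through a long accessor avoids the question entirely, since no conversion through double takes place.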

}
if (!nullBitmap.isEmpty()) {
  bitmap.or(nullBitmap);
}

Copilot AI Mar 12, 2026

getNullBitmap() always returns a RoaringBitmap instance even when it is empty. Per the TransformFunction contract (and BaseTransformFunction behavior), it should return null when there are no null rows to avoid extra allocations and to preserve the semantic meaning of a null return value.

Suggested change
}
}
if (bitmap.isEmpty()) {
  return null;
}
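
A minimal sketch of the "return null when nothing is null" convention follows, under stated assumptions: `NullBitmapContractSketch` and `union` are hypothetical names, and `java.util.BitSet` stands in for `RoaringBitmap`. It goes one step beyond the suggested change by allocating the result lazily, so no bitmap is created at all when every input is null or empty.

```java
import java.util.BitSet;

// Illustrative only: BitSet stands in for RoaringBitmap; names are hypothetical.
final class NullBitmapContractSketch {
  /** Unions the argument bitmaps (entries may be null); returns null if no row is null. */
  static BitSet union(BitSet... argBitmaps) {
    BitSet result = null;
    for (BitSet arg : argBitmaps) {
      if (arg == null || arg.isEmpty()) {
        continue; // nothing to contribute, so nothing to allocate
      }
      if (result == null) {
        result = new BitSet();
      }
      result.or(arg);
    }
    return result;
  }
}
```

Callers can then treat a null return as "no null rows" without inspecting the bitmap.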

Comment on lines 40 to 43
import org.apache.pinot.spi.utils.JsonUtils;
import org.jspecify.annotations.Nullable;
import org.roaringbitmap.RoaringBitmap;


Copilot AI Mar 12, 2026

This file introduces org.jspecify.annotations.Nullable, but TransformFunction/BaseTransformFunction (and most Pinot core code) use javax.annotation.Nullable. Using a different @Nullable annotation here makes nullness annotations inconsistent and can reduce tooling effectiveness; consider switching to javax.annotation.Nullable for the override and removing the jspecify import.

+1, don't we use javax.annotation.Nullable everywhere else in the project?

codecov-commenter commented Mar 12, 2026

Codecov Report

❌ Patch coverage is 0% with 42 lines in your changes missing coverage. Please review.
✅ Project coverage is 34.30%. Comparing base (54bea86) to head (3d890f2).
✅ All tests successful. No failed tests found.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| ...m/function/JsonExtractScalarTransformFunction.java | 0.00% | 42 Missing ⚠️ |

❗ There is a different number of reports uploaded between BASE (54bea86) and HEAD (3d890f2).

HEAD has 8 uploads less than BASE

| Flag | BASE (54bea86) | HEAD (3d890f2) |
| --- | --- | --- |
| java-21 | 5 | 4 |
| unittests1 | 2 | 0 |
| unittests | 4 | 2 |
| temurin | 10 | 8 |
| java-11 | 5 | 4 |

Additional details and impacted files
@@              Coverage Diff              @@
##             master   #17867       +/-   ##
=============================================
- Coverage     63.26%   34.30%   -28.96%     
+ Complexity     1466      745      -721     
=============================================
  Files          3190     3190               
  Lines        192039   192079       +40     
  Branches      29421    29428        +7     
=============================================
- Hits         121484    65895    -55589     
- Misses        61042   120655    +59613     
+ Partials       9513     5529     -3984     

| Flag | Coverage Δ |
| --- | --- |
| custom-integration1 | 100.00% <ø> (ø) |
| integration | 100.00% <ø> (ø) |
| integration1 | 100.00% <ø> (ø) |
| integration2 | 0.00% <ø> (ø) |
| java-11 | 34.24% <0.00%> (-29.00%) ⬇️ |
| java-21 | 34.29% <0.00%> (-28.94%) ⬇️ |
| temurin | 34.30% <0.00%> (-28.96%) ⬇️ |
| unittests | 34.30% <0.00%> (-28.96%) ⬇️ |
| unittests1 | ? |
| unittests2 | 34.30% <0.00%> (+0.05%) ⬆️ |

@xiangfu0 xiangfu0 left a comment

please fix the lint

Comment on lines +166 to +188
RoaringBitmap bitmap = new RoaringBitmap();
for (TransformFunction arg : _arguments.subList(1, _arguments.size() - 1)) {
  RoaringBitmap argBitmap = arg.getNullBitmap(valueBlock);
  if (argBitmap != null) {
    bitmap.or(argBitmap);
  }
}
int numDocs = valueBlock.getNumDocs();
RoaringBitmap nullBitmap = new RoaringBitmap();
IntFunction<Object> resultExtractor = getResultExtractor(valueBlock);
for (int i = 0; i < numDocs; i++) {
  Object result = null;
  try {
    result = resultExtractor.apply(i);
  } catch (Exception ignored) {
  }
  if (result == null) {
    nullBitmap.add(i);
  }
}
if (!nullBitmap.isEmpty()) {
  bitmap.or(nullBitmap);
}

This seems pretty expensive, we're computing the whole result set again?
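
One possible way to avoid the second evaluation is to record null rows during the same pass that produces the values and keep both results together. The sketch below is only an illustration of that idea: `SinglePassSketch` is a hypothetical class, `java.util.BitSet` stands in for `RoaringBitmap`, and whether this fits how Pinot evaluates and caches results per value block is not established here.

```java
import java.util.BitSet;
import java.util.function.IntFunction;

// Illustrative only: a hypothetical holder that evaluates each row exactly once.
final class SinglePassSketch {
  final Object[] values;
  final BitSet nulls; // null when no row produced a null result

  SinglePassSketch(int numDocs, IntFunction<Object> extractor) {
    values = new Object[numDocs];
    BitSet found = null;
    for (int i = 0; i < numDocs; i++) {
      Object v = extractor.apply(i); // the only evaluation of row i
      values[i] = v;
      if (v == null) {
        if (found == null) {
          found = new BitSet(numDocs);
        }
        found.set(i);
      }
    }
    nulls = found;
  }
}
```

Both the value accessor and the null-bitmap accessor could then read from the same cached instance instead of invoking the extractor again.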
