Summary
With SET enableNullHandling = true, queries that use jsonExtractScalar(...) (no default value argument) over JSON columns where some rows have no value at the queried path fail with IllegalArgumentException("Cannot resolve JSON path on some records. Consider setting a default value."). The same shape with jsonExtractIndex(...) likewise throws (RuntimeException("Cannot resolve JSON path...")).
Both JsonExtractScalarTransformFunction and JsonExtractIndexTransformFunction have _nullHandlingEnabled plumbed through init() but the six typed SV methods (transformTo{Int,Long,Float,Double,BigDecimal,String}ValuesSV) ignore it and throw unconditionally.
Meanwhile, JsonIndexDistinctOperator (which SELECT DISTINCT jsonExtractIndex(...) routes through when a JSON index is present) does honor null handling — it folds unresolved rows into a single SQL NULL group via handleMissingDocs. This creates a real divergence: same logical query, different outcomes based on whether the DISTINCT operator route is eligible.
Details
A clicks table with a single JSON column:
row 0: {"country": "US", "clicks": 5}
row 1: {"country": "CA", "clicks": 3}
row 2: {"country": null, "clicks": null}
row 3: {}
JSON index is present on json.
Today on master — enableNullHandling = true
| # |
Query shape |
Default value |
jsonExtractScalar(json, '\$.country', 'STRING'[, …]) |
jsonExtractIndex(json, '\$.country', 'STRING'[, …]) |
Consistent? |
| 1 |
Projection
SELECT <expr> FROM clicks |
none |
throws IllegalArgumentException("Cannot resolve JSON path on some records…") scalar transform fn |
throws RuntimeException("Cannot resolve JSON path…") index transform fn |
✓ both throw |
| 2 |
DISTINCT
SELECT DISTINCT <expr> FROM clicks |
none |
throws (same scalar transform path as row 1) |
returns "US", "CA", NULL via JsonIndexDistinctOperator.handleMissingDocs → addNull() |
✗ inconsistent |
| 3 |
GROUP BY + COUNT
SELECT <expr> AS c, COUNT(*) FROM clicks GROUP BY c |
none |
throws (same scalar transform path as row 1) |
throws (same index transform path as row 1) |
✓ both throw |
| 4 |
Projection |
'foobar' |
returns "US", "CA", "foobar", "foobar" |
returns "US", "CA", "foobar", "foobar" |
✓ |
| 5 |
DISTINCT |
'foobar' |
returns "US", "CA", "foobar" |
returns "US", "CA", "foobar" via JsonIndexDistinctOperator.handleMissingDocs → addValueToDistinctTable("foobar", …) |
✓ |
| 6 |
GROUP BY |
'foobar' |
returns ("CA",1), ("US",1), ("foobar",2) |
returns ("CA",1), ("US",1), ("foobar",2) |
✓ |
After fix — enableNullHandling = true
Both JsonExtractScalarTransformFunction and JsonExtractIndexTransformFunction patched to honor _nullHandlingEnabled when no default literal is supplied.
| # |
Query shape |
Default value |
jsonExtractScalar(json, '\$.country', 'STRING'[, …]) |
jsonExtractIndex(json, '\$.country', 'STRING'[, …]) |
Consistent? |
| 1 |
Projection |
none |
returns "US", "CA", NULL, NULL scalar transform fn — new null-emit branch |
returns "US", "CA", NULL, NULL index transform fn — new null-emit branch |
✓ |
| 2 |
DISTINCT |
none |
returns "US", "CA", NULL scalar SV result (with null bitmap) deduped by generic distinct operator |
returns "US", "CA", NULL via JsonIndexDistinctOperator.handleMissingDocs → addNull() (unchanged) |
✓ |
| 3 |
GROUP BY + COUNT |
none |
returns ("CA",1), ("US",1), (NULL,2) scalar SV result (with null bitmap), then standard GROUP BY |
returns ("CA",1), ("US",1), (NULL,2) index transform fn SV result (with null bitmap), then standard GROUP BY |
✓ |
| 4 |
Projection |
'foobar' |
returns "US", "CA", "foobar", "foobar" unchanged — default-value branch |
returns "US", "CA", "foobar", "foobar" unchanged — default-value branch |
✓ |
| 5 |
DISTINCT |
'foobar' |
returns "US", "CA", "foobar" unchanged |
returns "US", "CA", "foobar" unchanged |
✓ |
| 6 |
GROUP BY |
'foobar' |
returns ("CA",1), ("US",1), ("foobar",2) unchanged |
returns ("CA",1), ("US",1), ("foobar",2) unchanged |
✓ |
Change summary (NH = ON): rows 1–3 in both function-name columns flip from throws to a SQL NULL (or null group / null row). Rows 4–6 are unchanged. Every cell is consistent across both function names post-fix.
Today on master — enableNullHandling = false
| # |
Query shape |
Default value |
jsonExtractScalar(...) |
jsonExtractIndex(...) |
Consistent? |
| 1 |
Projection |
none |
throws IllegalArgumentException |
throws RuntimeException (index transform fn) |
✓ both throw |
| 2 |
DISTINCT |
none |
throws (scalar transform) |
throws RuntimeException("Illegal Json Path: [...], for some docIds in segment [...]") via JsonIndexDistinctOperator.handleMissingDocs — NH-off, no-default branch |
✓ both throw |
| 3 |
GROUP BY |
none |
throws (scalar transform) |
throws (index transform) |
✓ both throw |
| 4 |
Projection |
'foobar' |
returns "US", "CA", "foobar", "foobar" |
returns "US", "CA", "foobar", "foobar" |
✓ |
| 5 |
DISTINCT |
'foobar' |
returns "US", "CA", "foobar" |
returns "US", "CA", "foobar" |
✓ |
| 6 |
GROUP BY |
'foobar' |
returns ("CA",1), ("US",1), ("foobar",2) |
returns ("CA",1), ("US",1), ("foobar",2) |
✓ |
With NH off, all cells are already consistent on master — the fix preserves the legacy throw for the no-default case (gated on _nullHandlingEnabled).
Proposed fix
In both JsonExtractScalarTransformFunction and JsonExtractIndexTransformFunction:
- When
_nullHandlingEnabled = true and no default value was supplied: emit SQL NULL (write the type's null placeholder + mark the row in getNullBitmap()) for unresolved rows, instead of throwing.
- When
_nullHandlingEnabled = false: preserve the legacy throw.
- MV transforms unchanged.
JsonIndexDistinctOperator already handles this correctly; no change needed.
Out of scope (separate follow-up)
- 4-arg with SQL
NULL literal as default: LiteralContext.getStringValue() returns "" for SQL NULL, causing JsonIndexDistinctOperator to insert an empty string instead of a null group. Separate, pre-existing inconsistency.
Summary
With
SET enableNullHandling = true, queries that usejsonExtractScalar(...)(no default value argument) over JSON columns where some rows have no value at the queried path fail withIllegalArgumentException("Cannot resolve JSON path on some records. Consider setting a default value."). The same shape withjsonExtractIndex(...)likewise throws (RuntimeException("Cannot resolve JSON path...")).Both
JsonExtractScalarTransformFunctionandJsonExtractIndexTransformFunctionhave_nullHandlingEnabledplumbed throughinit()but the six typed SV methods (transformTo{Int,Long,Float,Double,BigDecimal,String}ValuesSV) ignore it and throw unconditionally.Meanwhile,
JsonIndexDistinctOperator(whichSELECT DISTINCT jsonExtractIndex(...)routes through when a JSON index is present) does honor null handling — it folds unresolved rows into a single SQLNULLgroup viahandleMissingDocs. This creates a real divergence: same logical query, different outcomes based on whether the DISTINCT operator route is eligible.Details
A
clickstable with a single JSON column:JSON index is present on
json.Today on master —
enableNullHandling = truejsonExtractScalar(json, '\$.country', 'STRING'[, …])jsonExtractIndex(json, '\$.country', 'STRING'[, …])SELECT <expr> FROM clicksIllegalArgumentException("Cannot resolve JSON path on some records…")scalar transform fn
RuntimeException("Cannot resolve JSON path…")index transform fn
SELECT DISTINCT <expr> FROM clicks"US","CA",NULLvia
JsonIndexDistinctOperator.handleMissingDocs→addNull()SELECT <expr> AS c, COUNT(*) FROM clicks GROUP BY c'foobar'"US","CA","foobar","foobar""US","CA","foobar","foobar"'foobar'"US","CA","foobar""US","CA","foobar"via
JsonIndexDistinctOperator.handleMissingDocs→addValueToDistinctTable("foobar", …)'foobar'("CA",1),("US",1),("foobar",2)("CA",1),("US",1),("foobar",2)After fix —
enableNullHandling = trueBoth
JsonExtractScalarTransformFunctionandJsonExtractIndexTransformFunctionpatched to honor_nullHandlingEnabledwhen no default literal is supplied.jsonExtractScalar(json, '\$.country', 'STRING'[, …])jsonExtractIndex(json, '\$.country', 'STRING'[, …])"US","CA",NULL,NULLscalar transform fn — new null-emit branch
"US","CA",NULL,NULLindex transform fn — new null-emit branch
"US","CA",NULLscalar SV result (with null bitmap) deduped by generic distinct operator
"US","CA",NULLvia
JsonIndexDistinctOperator.handleMissingDocs→addNull()(unchanged)("CA",1),("US",1),(NULL,2)scalar SV result (with null bitmap), then standard GROUP BY
("CA",1),("US",1),(NULL,2)index transform fn SV result (with null bitmap), then standard GROUP BY
'foobar'"US","CA","foobar","foobar"unchanged — default-value branch
"US","CA","foobar","foobar"unchanged — default-value branch
'foobar'"US","CA","foobar"unchanged
"US","CA","foobar"unchanged
'foobar'("CA",1),("US",1),("foobar",2)unchanged
("CA",1),("US",1),("foobar",2)unchanged
Change summary (NH = ON): rows 1–3 in both function-name columns flip from
throwsto a SQLNULL(or null group / null row). Rows 4–6 are unchanged. Every cell is consistent across both function names post-fix.Today on master —
enableNullHandling = falsejsonExtractScalar(...)jsonExtractIndex(...)IllegalArgumentExceptionRuntimeException(index transform fn)RuntimeException("Illegal Json Path: [...], for some docIds in segment [...]")via
JsonIndexDistinctOperator.handleMissingDocs— NH-off, no-default branch'foobar'"US","CA","foobar","foobar""US","CA","foobar","foobar"'foobar'"US","CA","foobar""US","CA","foobar"'foobar'("CA",1),("US",1),("foobar",2)("CA",1),("US",1),("foobar",2)With NH off, all cells are already consistent on master — the fix preserves the legacy throw for the no-default case (gated on
_nullHandlingEnabled).Proposed fix
In both
JsonExtractScalarTransformFunctionandJsonExtractIndexTransformFunction:_nullHandlingEnabled = trueand no default value was supplied: emit SQLNULL(write the type's null placeholder + mark the row ingetNullBitmap()) for unresolved rows, instead of throwing._nullHandlingEnabled = false: preserve the legacy throw.JsonIndexDistinctOperatoralready handles this correctly; no change needed.Out of scope (separate follow-up)
NULLliteral as default:LiteralContext.getStringValue()returns""for SQLNULL, causingJsonIndexDistinctOperatorto insert an empty string instead of a null group. Separate, pre-existing inconsistency.