[VL] Fix json_tuple rewrite producing incompatible JSON path in fallback scenarios#12038

Merged
philo-he merged 4 commits into apache:main from Zouxxyy:dev/fix-json1
May 9, 2026

Conversation

Contributor

@Zouxxyy Zouxxyy commented May 5, 2026

What changes are proposed in this pull request?

When json_tuple is rewritten to get_json_object calls by PullOutGenerateProjectHelper.pullOutPreProject, the generated JSON path used bare bracket notation $[key] (without quotes). This works fine in Velox/simdjson, but when the expression falls back to Spark JVM execution (e.g., get_json_object is blacklisted or validation fails), Spark's JsonPathParser rejects $[key] and returns NULL.

This PR changes the generated path format from $[key] to $['key'] (single-quoted bracket notation), which is accepted by both Velox/simdjson and Spark JVM's JsonPathParser.
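
As a rough illustration of the format change, here is a small Python sketch. It is not the Scala code in PullOutGenerateProjectHelper, and both function names are hypothetical; it only shows the two path shapes discussed above.

```python
def bare_bracket_path(key: str) -> str:
    """Old shape described in this PR: bare brackets, no quotes."""
    return f"$[{key}]"


def quoted_bracket_path(key: str) -> str:
    """New shape described in this PR: single-quoted bracket notation.

    Assumption: keys that themselves contain a single quote are out of
    scope for this sketch.
    """
    return f"$['{key}']"


# Show both shapes side by side, including a dot-containing field name.
for key in ["a", "a.b"]:
    print(bare_bracket_path(key), "->", quoted_bracket_path(key))
```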

Root cause:

  • Velox's JsonPathNormalizer normalizes $['key'] to $[key] internally for simdjson, so both forms work.
  • Spark JVM only accepts $.name or $['name'], and rejects $[name] (bare brackets without quotes), returning NULL directly.
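
The two behaviours above can be mimicked with a toy model. This is a hedged sketch based only on the description in this PR: `spark_like_accepts` and `velox_like_normalize` are hypothetical names, the regular expressions cover named-field steps only (no array indices or wildcards), and neither function reproduces the real JsonPathParser or JsonPathNormalizer.

```python
import re

# Toy grammar: "$" followed by any number of ".name" or "['name']" steps.
_SPARK_LIKE = re.compile(r"\$(?:\.[^.\[\]']+|\['[^']+'\])*")

# Matches one single-quoted bracket step, capturing the name inside.
_QUOTED_STEP = re.compile(r"\['([^']*)'\]")


def spark_like_accepts(path: str) -> bool:
    """Return True if the path uses only $.name or $['name'] steps."""
    return _SPARK_LIKE.fullmatch(path) is not None


def velox_like_normalize(path: str) -> str:
    """Rewrite $['name'] steps to $[name]; bare steps pass through unchanged."""
    return _QUOTED_STEP.sub(r"[\1]", path)


for path in ["$.name", "$['a.b']", "$[a.b]"]:
    print(path, spark_like_accepts(path), velox_like_normalize(path))
```

In this model the quoted form passes the Spark-like check and still normalizes to the bare form, which is why it is the safer shape to generate.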

How was this patch tested?

Added unit tests in MiscOperatorSuite covering:

  • Basic single key extraction with fallback
  • Dot-containing field names (e.g., a.b) — the core scenario for bracket notation
  • Multiple keys extraction
  • Non-existent keys returning null
  • Mix of existing and non-existing keys
  • NULL JSON input handling
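
For orientation, the scenarios listed above can be written down as a plain-Python reference model. This is only a sketch: `json_tuple_model` is a hypothetical helper, the expected values reflect an assumed reading of json_tuple semantics for top-level fields, and nothing here is taken from the actual test suite or run against Spark.

```python
import json
from typing import Optional, Tuple


def json_tuple_model(json_str: Optional[str], *keys: str) -> Tuple[Optional[str], ...]:
    """Toy model: look up each key as a literal top-level field name."""
    nulls = tuple(None for _ in keys)
    if json_str is None:
        return nulls  # NULL input yields NULL for every key
    try:
        obj = json.loads(json_str)
    except json.JSONDecodeError:
        return nulls
    if not isinstance(obj, dict):
        return nulls

    def render(value):
        if value is None:
            return None
        # Strings are returned as-is; other values as their JSON text.
        return value if isinstance(value, str) else json.dumps(value)

    return tuple(render(obj.get(k)) for k in keys)


print(json_tuple_model('{"a": "1"}', "a"))               # single key
print(json_tuple_model('{"a.b": "x"}', "a.b"))           # dot-containing name
print(json_tuple_model('{"a": "1", "b": 2}', "a", "b"))  # multiple keys
print(json_tuple_model('{"a": "1"}', "a", "missing"))    # existing + missing
print(json_tuple_model(None, "a", "b"))                  # NULL input
```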

Was this patch authored or co-authored using generative AI tooling?

Yes

Generated-by: Qoder

@github-actions github-actions Bot added the VELOX label May 5, 2026
Contributor Author

Zouxxyy commented May 6, 2026

The CI failure appears to be unrelated. CC @lyy-pineapple @philo-he for a look, thanks.

Member

philo-he commented May 8, 2026

Thank you for the PR.
MiscOperatorSuite.scala generally holds basic tests for expressions/operators. I would recommend moving the proposed tests to a new test suite, named something like JsonTuplePathRewriteSuite.scala. Thanks.

Member

@philo-he philo-he left a comment


LGTM. Thank you!

@philo-he philo-he changed the title [VL] Fix json_tuple rewrite producing incompatible JSON path for Spark JVM fallback [VL] Fix json_tuple rewrite producing incompatible JSON path in fallback scenarios May 9, 2026
@philo-he philo-he merged commit c3165c1 into apache:main May 9, 2026
60 checks passed