fix: pass elem writer to JSON array membership dialect methods #22

Merged: richardwooding merged 1 commit into main from fix/json-array-membership-elem-writer on Jun 9, 2026

Summary

Ports cel2sql4j PR #20 (commit 1835215) into pycel2sql. Resolves the single port candidate flagged by three weekly upstream-scan issues: #21, #20, #18.

The two JSON-array membership dialect methods (write_json_array_membership / write_nested_json_array_membership) previously received only the array writer, forcing the converter to emit elem = <dialect output> inline. Worse, _visit_in never routed to them at all — they were dead code, and x in <json array field> fell through to plain write_array_membership. The inline elem = form was broken for every dialect except PostgreSQL.

Changes

  • dialect/_base.py — widen both ABC methods to take write_elem alongside write_array; each dialect now owns the full boolean predicate.
  • 6 dialects — emit correct membership SQL (see table).
  • _converter.py: _visit_in now detects a JSON-array RHS (schema-declared JSON field, nested JSON access, or flat json_variables path) and routes to the JSON-membership hooks. A direct table.field carries JSONB-ness via json_func; a deeper chain uses the nested form. Non-JSON RHS still falls back to write_array_membership.
  • Tests — replace the two Spark "raises" tests with positive assertions; add cross-dialect converter tests in test_dialect_parametrized.py (direct JSONB/JSON field, nested access, + non-JSON regression guard).
  • Docs — update CLAUDE.md conventions and the Spark dialect-differences line.
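
The widened hook and the routing described in the list above can be sketched roughly as follows. This is a minimal illustration, not the actual pycel2sql code: the class names, the `Writer` alias, the `render_in` helper, the exact parameter order, and the non-JSON fallback output are all assumptions made for the example. Only the method names and the PostgreSQL/MySQL output shapes come from this PR.

```python
from abc import ABC, abstractmethod
from io import StringIO
from typing import Callable

# Hypothetical writer type: a callback that appends one SQL fragment to the buffer.
Writer = Callable[[], None]


class Dialect(ABC):
    """Illustrative stand-in for the dialect ABC; not the real pycel2sql class."""

    @abstractmethod
    def write_json_array_membership(
        self, w: StringIO, json_func: str, write_elem: Writer, write_array: Writer
    ) -> None:
        """Write the complete boolean predicate for `elem in <JSON array>`."""

    def write_array_membership(
        self, w: StringIO, write_elem: Writer, write_array: Writer
    ) -> None:
        # Placeholder for the non-JSON fallback; the real output is dialect-specific.
        write_elem()
        w.write(" = ANY(")
        write_array()
        w.write(")")


class PostgresSketch(Dialect):
    def write_json_array_membership(self, w, json_func, write_elem, write_array):
        write_elem()
        w.write(f" = ANY(ARRAY(SELECT {json_func}(")
        write_array()
        w.write(")))")


class MySQLSketch(Dialect):
    def write_json_array_membership(self, w, json_func, write_elem, write_array):
        # The element is wrapped inside a function call instead of sitting to the
        # left of "=", so the dialect has to own the whole predicate.
        w.write("JSON_OVERLAPS(JSON_ARRAY(")
        write_elem()
        w.write("), ")
        write_array()
        w.write(")")


def render_in(
    dialect: Dialect,
    elem_sql: str,
    array_sql: str,
    *,
    rhs_is_json: bool,
    json_func: str = "jsonb_array_elements_text",
) -> str:
    """Hypothetical stand-in for the routing decision made in _visit_in."""
    w = StringIO()

    def write_elem() -> None:
        w.write(elem_sql)

    def write_array() -> None:
        w.write(array_sql)

    if rhs_is_json:
        dialect.write_json_array_membership(w, json_func, write_elem, write_array)
    else:
        dialect.write_array_membership(w, write_elem, write_array)
    return w.getvalue()


print(render_in(PostgresSketch(), "'x'", "t.tags", rhs_is_json=True))
print(render_in(MySQLSketch(), "'x'", "t.tags", rhs_is_json=True))
```

Passing both writers lets a dialect place the element wherever its syntax requires, which a converter-side `elem = ...` prefix cannot do.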

Per-dialect output ("x" in t.tags, t.tags a JSON array)

| Dialect | Before (broken) | After |
|---|---|---|
| PostgreSQL | `ANY(ARRAY(SELECT jsonFunc(arr)))` (accidental) | `'x' = ANY(ARRAY(SELECT jsonb_array_elements_text(t.tags)))` |
| BigQuery | `= UNNEST(...)` ❌ invalid | `'x' IN UNNEST(JSON_VALUE_ARRAY(t.tags))` |
| DuckDB | scalar subquery (last row) ❌ | `EXISTS (SELECT 1 FROM json_each(t.tags) WHERE value = 'x')` |
| SQLite | scalar subquery (last row) ❌ | `EXISTS (SELECT 1 FROM json_each(t.tags) WHERE value = 'x')` |
| MySQL | compares elem to 0/1 ❌ | `JSON_OVERLAPS(JSON_ARRAY('x'), t.tags)` |
| Spark | raised `UnsupportedDialectFeatureError` | `array_contains(from_json(t.tags, 'ARRAY<STRING>'), 'x')` |
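
The SQLite row can be checked with Python's built-in `sqlite3` module, assuming the bundled SQLite has the JSON functions available. The sketch below is illustrative only: the `membership_checks` helper is hypothetical, and the scalar query is a guess at the shape of the old inline form, not the exact SQL the converter used to emit. The target element sits in the middle of the array so that comparing against any single row misses it.

```python
import json
import sqlite3


def membership_checks(tags: list[str], elem: str) -> tuple[int, int]:
    """Return (scalar_form_result, exists_form_result) for `elem in tags`."""
    conn = sqlite3.connect(":memory:")
    try:
        arr = json.dumps(tags)

        # Inline form: a scalar subquery yields a single row, so the element
        # is compared with only one member of the array.
        scalar_hit = conn.execute(
            "SELECT ? = (SELECT value FROM json_each(?))", (elem, arr)
        ).fetchone()[0]

        # Form emitted after this PR: EXISTS scans every element.
        exists_hit = conn.execute(
            "SELECT EXISTS (SELECT 1 FROM json_each(?) WHERE value = ?)",
            (arr, elem),
        ).fetchone()[0]

        return scalar_hit, exists_hit
    finally:
        conn.close()


print(membership_checks(["a", "x", "b"], "x"))
```

With `"x"` in the middle of a three-element array, the scalar form reports no match while the `EXISTS` form finds it.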

Verification

  • uv run ruff check src/ tests/ — clean
  • uv run pytest tests/ --ignore=tests/integration: 728 passed (19 new)
  • mypy: only the pre-existing/expected bare-Tree [type-arg] errors (gated continue-on-error in CI)

Closes #21
Closes #20
Closes #18

🤖 Generated with Claude Code


Port of cel2sql4j PR #20 (commit 1835215). The two JSON-array membership
dialect methods previously received only the array writer, so the converter
had to emit `elem = <dialect output>` inline — and `_visit_in` never even
routed to them (dead code). The inline form was broken for every dialect
except PostgreSQL:

- BigQuery: `= UNNEST(...)` is invalid; needs `IN UNNEST(...)`.
- DuckDB/SQLite: scalar subquery returns the last row for multi-element arrays.
- MySQL: compared the element to JSON_CONTAINS's 0/1 result.
- Spark: raised at conversion time (couldn't build a predicate without elem).

Widen `write_json_array_membership` / `write_nested_json_array_membership` to
take a `write_elem` writer alongside `write_array`; each dialect now owns the
full boolean predicate. Wire `_visit_in` to detect a JSON-array RHS (schema
JSON field, nested JSON access, or flat json_variable path) and route to the
JSON-membership hooks; non-JSON RHS still falls back to write_array_membership.

Per-dialect output:
  PostgreSQL: elem = ANY(ARRAY(SELECT <json_func>(arr)))
  MySQL:      JSON_OVERLAPS(JSON_ARRAY(elem), arr)
  SQLite:     EXISTS (SELECT 1 FROM json_each(arr) WHERE value = elem)
  DuckDB:     EXISTS (SELECT 1 FROM json_each(arr) WHERE value = elem)
  BigQuery:   elem IN UNNEST(JSON_VALUE_ARRAY(arr))
  Spark:      array_contains(from_json(arr, 'ARRAY<STRING>'), elem)

Replace the two Spark "raises" tests with positive assertions and add
cross-dialect converter tests (direct JSONB/JSON field, nested access, and a
non-JSON regression guard). Update CLAUDE.md conventions + Spark dialect notes.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
richardwooding merged commit 70ff849 into main on Jun 9, 2026
7 checks passed
richardwooding deleted the fix/json-array-membership-elem-writer branch on June 9, 2026 at 10:18
Successfully merging this pull request may close these issues.

  • Upstream port candidates — week of 2026-06-08
  • Upstream port candidates — week of 2026-06-01
  • Upstream port candidates — week of 2026-05-25
