Fix OPTIONAL MATCH dropping null-preserving rows with subquery WHERE (#2378)#2380
Conversation
…pache#2378) Cypher OPTIONAL MATCH semantics require that when no right-hand row survives the WHERE predicate, the outer row is still emitted with NULLs in the optional columns. Before this fix, a WHERE containing a list comprehension or sub-pattern predicate (EXISTS { ... }, COUNT { ... }) would take the transform_cypher_clause_with_where rewrite path, which detaches the WHERE, transforms the match clause as a subquery, and then attaches the WHERE as an outer filter on that subquery. For OPTIONAL MATCH, the inner subquery already produced a LATERAL LEFT JOIN with null-preserving rows; the outer filter then ran against those nulled rows and dropped them when the predicate evaluated NULL or false on the nulled side, producing zero rows where Cypher semantics require one null-filled row per outer match. Fix: in transform_cypher_match, the has_list_comp_or_subquery rewrite now only applies to non-optional MATCH. In the OPTIONAL MATCH path, transform_cypher_optional_match_clause detaches the WHERE from the cypher_match node before recursively transforming the right-hand side (so the inner transform does not double-apply or misresolve the predicate in a fresh namespace), and re-attaches the transformed predicate as the LEFT JOIN's ON condition after both sides are in the namespace. A LEFT JOIN with a failing ON condition correctly preserves left rows with null right columns, which matches Cypher OPTIONAL MATCH ... WHERE semantics. Regression tests cover: - EXISTS { (friend)-[...]->(...) } referencing the optional variable - EXISTS { (p)-[...]->(...) } referencing the outer variable - non-correlated EXISTS (previously-working guard) - plain scalar predicate on the optional variable (guard) - constant-false WHERE (guard) Fixes issue apache#2378. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
@gregfelice build is failing |
|
Ok on it |
…#2378) transform_cypher_optional_match_clause was calling transform_cypher_expr with EXPR_KIND_JOIN_ON when re-attaching the detached WHERE as the LEFT JOIN's ON condition. All other WHERE transforms in cypher_clause.c use EXPR_KIND_WHERE, and there are three explicit p_expr_kind == EXPR_KIND_WHERE guards (cypher_clause.c:5415, 5679, 6597) that do load-bearing variable resolution for sub-pattern predicates -- walking up parent parsestates to rebind variables like `friend` inside EXISTS { (friend)-[...]->(...) }. Using EXPR_KIND_JOIN_ON bypassed those guards, so the sub-pattern fell through to the "create new variable" path and produced a structurally invalid parse tree. Under a release PG build the query happened to produce correct-looking output, but under --enable-cassert the downstream invariant checks aborted, crashing the backend and taking down the regression run (reported by @MuhammadTahaNaveed). Fix: use EXPR_KIND_WHERE, matching the pattern already established in transform_cypher_clause_with_where at line 2619. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
Pushed a one-line fix (bc891bf). Root cause: In Why I didn't catch it: my local PG18 is the stock Ubuntu package (no |
The four ExecStoreVirtualTuple calls in exec_cypher_merge were triggering
an Assert failure under --enable-cassert:
TRAP: failed Assert("TTS_EMPTY(slot)"), File: execTuples.c, Line: 1748
ExecStoreVirtualTuple (execTuples.c:1748) asserts that its target slot
is in the TTS_EMPTY state. In our MERGE executor, process_path writes
directly into the subquery's scan tuple slot -- which already holds the
subquery's output tuple and therefore is NOT empty. On a release build
the assertion compiles out and ExecStoreVirtualTuple just clears the flag
and sets tts_nvalid; on an --enable-cassert build the backend aborts and
takes down the regression run.
We only need the bookkeeping half of ExecStoreVirtualTuple (clear
TTS_FLAG_EMPTY and set tts_nvalid = natts) -- not the "store semantics"
that motivate the assertion. Add a small static helper
mark_scan_slot_valid() that does exactly the bookkeeping, and replace
the four call sites. Release-build behavior is byte-identical since
Assert() compiles to nothing; cassert-build behavior now matches release.
Caught by the cassert-enabled regression suite we reinstated after apache#2380.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
Fixes issue #2378 —
OPTIONAL MATCHincorrectly drops null-preserving outer rows when itsWHEREclause contains a correlated sub-pattern predicate (EXISTS { ... },COUNT { ... }).Reproducer
Before the fix:
(0 rows)After the fix:
Alice | NULL,Bob | NULL,Charlie | NULL— one row per person, null-preserved, matching Cypher OPTIONAL MATCH semantics.Root cause
In
transform_cypher_match, when theWHEREclause contains a list comprehension or a sub-pattern (has_list_comp_or_subqueryreturns true), the code takes a rewrite path that:WHEREfrom thecypher_matchnode (match_self->where = NULL).transform_cypher_clause_as_subquery.WHEREas an outer filter on that subquery:query->jointree = makeFromExpr(p_joinlist, where_qual).This rewrite is correct for non-optional
MATCH, because the subquery's output is the full set of matches and the outerWHEREfilters that set.But for
OPTIONAL MATCH, the inner subquery produces aLATERAL LEFT JOINthat already emits null-preserving rows for outer tuples with no right-hand match. The outerWHEREfilter then runs against those nulled rows and drops them when the predicate evaluates toNULLorfalseon the nulled side. The result is zero rows for every outer tuple whose optional matches all fail the predicate — a direct violation of CypherOPTIONAL MATCHsemantics, which require the outer row to survive withNULLin the optional columns.Fix
Two coordinated changes in
src/backend/parser/cypher_clause.c:transform_cypher_match— only take thehas_list_comp_or_subqueryrewrite path for non-optional MATCH. For OPTIONAL MATCH, fall through to the normaltransform_cypher_match_patternpath so the WHERE remains attached to thecypher_matchnode.transform_cypher_optional_match_clause— detachmatch_self->wherebefore recursively transforming the right-hand side of theLATERAL LEFT JOIN, so the inner transform does not double-apply or misresolve the predicate in a fresh namespace. After both sides are transformed and the right-side namespace item is inpstate->p_namespace, re-attach the transformed predicate as theLEFT JOIN'sONcondition (j->quals). PostgreSQL'sLEFT JOIN ... ON <pred>correctly preserves left rows with null right columns when theONfails — which is exactly the semantics CypherOPTIONAL MATCH ... WHERErequires.The fix restores the invariant that for every outer row,
OPTIONAL MATCH ... WHERE predeither emits the surviving right-hand matches (if any) or exactly one row with nulls in the optional columns.Regression tests
Added to
regress/sql/cypher_match.sqlunder a newissue_2378graph:EXISTSreferencing the optional variable (friend) — the primary reproducer.EXISTSreferencing the outer variable (p) — the second bug family.EXISTS— was already working; kept as a regression guard.friend.name = 'Bob') — was already working; guard.WHERE— was already working; guard.Test plan
make installcheck REGRESS=\"cypher_match cypher_merge cypher_with cypher_subquery cypher_vle cypher_union cypher_call cypher_create cypher_set cypher_delete cypher_remove cypher_unwind list_comprehension expr\"— all 14 suites pass.p) now preserves outer rows correctly.WHERE falsecontinue to work as before.