Fix OPTIONAL MATCH dropping null-preserving rows with subquery WHERE (#2378) #2380

Merged
MuhammadTahaNaveed merged 2 commits into apache:master from gregfelice:fix_2378_optional_match_correlated_subquery on Apr 19, 2026

@gregfelice
Contributor

Summary

Fixes issue #2378. OPTIONAL MATCH incorrectly drops null-preserving outer rows when its WHERE clause contains a correlated sub-pattern predicate (EXISTS { ... }, COUNT { ... }).

Reproducer

CREATE (a:Person {name:'Alice'}), (b:Person {name:'Bob'}), (c:Person {name:'Charlie'}),
       (a)-[:KNOWS]->(b), (a)-[:KNOWS]->(c);

MATCH (p:Person)
OPTIONAL MATCH (p)-[:KNOWS]->(friend:Person)
WHERE EXISTS { (friend)-[:KNOWS]->(:Person) }
RETURN p.name, friend.name;

Before the fix: (0 rows)
After the fix: Alice | NULL, Bob | NULL, Charlie | NULL — one row per person, null-preserved, matching Cypher OPTIONAL MATCH semantics.

Root cause

In transform_cypher_match, when the WHERE clause contains a list comprehension or a sub-pattern (has_list_comp_or_subquery returns true), the code takes a rewrite path that:

  1. Detaches the WHERE from the cypher_match node (match_self->where = NULL).
  2. Transforms the match clause as a subquery via transform_cypher_clause_as_subquery.
  3. Attaches the detached WHERE as an outer filter on that subquery: query->jointree = makeFromExpr(p_joinlist, where_qual).

This rewrite is correct for non-optional MATCH, because the subquery's output is the full set of matches and the outer WHERE filters that set.

But for OPTIONAL MATCH, the inner subquery produces a LATERAL LEFT JOIN that already emits null-preserving rows for outer tuples with no right-hand match. The outer WHERE filter then runs against those nulled rows and drops them when the predicate evaluates to NULL or false on the nulled side. The result is zero rows for every outer tuple whose optional matches all fail the predicate — a direct violation of Cypher OPTIONAL MATCH semantics, which require the outer row to survive with NULL in the optional columns.
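The difference between filtering after the LEFT JOIN and filtering inside its ON condition can be illustrated with a small Python model of the reproducer's data. This is only a sketch of the join semantics, not AGE or PostgreSQL code: the names `left_join`, `filter_after_join`, `filter_in_on`, and `has_outgoing_knows` are hypothetical, and the EXISTS predicate is modeled as returning false for a NULL `friend`.

```python
from typing import Callable, List, Optional, Tuple

# Reproducer data: Alice knows Bob and Charlie; nobody else knows anyone.
PEOPLE = ["Alice", "Bob", "Charlie"]
KNOWS = {("Alice", "Bob"), ("Alice", "Charlie")}

Row = Tuple[str, Optional[str]]


def has_outgoing_knows(friend: Optional[str]) -> bool:
    """Toy model of EXISTS { (friend)-[:KNOWS]->(:Person) }.

    Assumption: a NULL friend never satisfies the predicate.
    """
    if friend is None:
        return False
    return any(src == friend for src, _ in KNOWS)


def left_join(outer_rows: List[str], on: Callable[[str], bool]) -> List[Row]:
    """LEFT JOIN: keep every outer row, padding with None if no pair passes `on`."""
    out: List[Row] = []
    for p in outer_rows:
        matches = [dst for (src, dst) in sorted(KNOWS) if src == p and on(dst)]
        out.extend((p, dst) for dst in matches)
        if not matches:
            out.append((p, None))
    return out


def filter_after_join() -> List[Row]:
    """Pre-fix shape: unconditional LEFT JOIN, then the WHERE as an outer filter."""
    joined = left_join(PEOPLE, on=lambda _friend: True)
    return [row for row in joined if has_outgoing_knows(row[1])]


def filter_in_on() -> List[Row]:
    """Post-fix shape: the WHERE predicate is the LEFT JOIN's ON condition."""
    return left_join(PEOPLE, on=has_outgoing_knows)


if __name__ == "__main__":
    print(filter_after_join())
    print(filter_in_on())
```

In this model the outer filter discards both the real matches and the null-padded rows, while the ON-condition variant keeps one null-padded row per person.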

Fix

Two coordinated changes in src/backend/parser/cypher_clause.c:

  1. transform_cypher_match — only take the has_list_comp_or_subquery rewrite path for non-optional MATCH. For OPTIONAL MATCH, fall through to the normal transform_cypher_match_pattern path so the WHERE remains attached to the cypher_match node.

  2. transform_cypher_optional_match_clause — detach match_self->where before recursively transforming the right-hand side of the LATERAL LEFT JOIN, so the inner transform does not double-apply or misresolve the predicate in a fresh namespace. After both sides are transformed and the right-side namespace item is in pstate->p_namespace, re-attach the transformed predicate as the LEFT JOIN's ON condition (j->quals). PostgreSQL's LEFT JOIN ... ON <pred> correctly preserves left rows with null right columns when the ON fails — which is exactly the semantics Cypher OPTIONAL MATCH ... WHERE requires.

The fix restores the invariant that for every outer row, OPTIONAL MATCH ... WHERE pred either emits the surviving right-hand matches (if any) or exactly one row with nulls in the optional columns.
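As a rough illustration, the invariant can be written as a check over result rows. The sketch below is a hypothetical test helper, not part of the PR; it assumes each outer row is identified by a distinct key and each result row is an `(outer, optional)` pair, with `None` standing in for NULL.

```python
from collections import defaultdict
from typing import Hashable, Iterable, List, Optional, Tuple


def optional_match_violations(
    outer_rows: Iterable[Hashable],
    result_rows: Iterable[Tuple[Hashable, Optional[Hashable]]],
) -> List[Hashable]:
    """Return the outer rows whose output breaks the OPTIONAL MATCH invariant.

    For each outer row the result must contain either one or more non-null
    optional values, or exactly one row with a null optional value.
    """
    by_outer = defaultdict(list)
    for outer, optional in result_rows:
        by_outer[outer].append(optional)

    violations = []
    for outer in outer_rows:
        optionals = by_outer.get(outer, [])
        only_matches = bool(optionals) and all(o is not None for o in optionals)
        single_null = optionals == [None]
        if not (only_matches or single_null):
            violations.append(outer)
    return violations


if __name__ == "__main__":
    people = ["Alice", "Bob", "Charlie"]
    # Pre-fix output for the reproducer: zero rows, so every person is a violation.
    print(optional_match_violations(people, []))
    # Post-fix output: one null-padded row per person, so no violations.
    print(optional_match_violations(people, [(p, None) for p in people]))
```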

Regression tests

Added to regress/sql/cypher_match.sql under a new issue_2378 graph:

  • Correlated EXISTS referencing the optional variable (friend) — the primary reproducer.
  • Correlated EXISTS referencing the outer variable (p): the second failing variant from the issue body.
  • Non-correlated EXISTS — was already working; kept as a regression guard.
  • Plain scalar predicate on the optional variable (friend.name = 'Bob') — was already working; guard.
  • Constant-false WHERE — was already working; guard.

Test plan

  • make installcheck REGRESS="cypher_match cypher_merge cypher_with cypher_subquery cypher_vle cypher_union cypher_call cypher_create cypher_set cypher_delete cypher_remove cypher_unwind list_comprehension expr" — all 14 suites pass.
  • Issue reporter's primary reproducer now returns the expected 3 rows with null-preserved friend.
  • Second variant from the issue body (EXISTS referencing outer p) now preserves outer rows correctly.
  • Non-correlated EXISTS, plain predicates, and WHERE false continue to work as before.

…pache#2378)

Cypher OPTIONAL MATCH semantics require that when no right-hand row
survives the WHERE predicate, the outer row is still emitted with
NULLs in the optional columns.  Before this fix, a WHERE containing
a list comprehension or sub-pattern predicate (EXISTS { ... },
COUNT { ... }) would take the transform_cypher_clause_with_where
rewrite path, which detaches the WHERE, transforms the match clause
as a subquery, and then attaches the WHERE as an outer filter on that
subquery.  For OPTIONAL MATCH, the inner subquery already produced a
LATERAL LEFT JOIN with null-preserving rows; the outer filter then
ran against those nulled rows and dropped them when the predicate
evaluated NULL or false on the nulled side, producing zero rows where
Cypher semantics require one null-filled row per outer match.

Fix: in transform_cypher_match, the has_list_comp_or_subquery rewrite
now only applies to non-optional MATCH.  In the OPTIONAL MATCH path,
transform_cypher_optional_match_clause detaches the WHERE from the
cypher_match node before recursively transforming the right-hand side
(so the inner transform does not double-apply or misresolve the
predicate in a fresh namespace), and re-attaches the transformed
predicate as the LEFT JOIN's ON condition after both sides are in the
namespace.  A LEFT JOIN with a failing ON condition correctly
preserves left rows with null right columns, which matches Cypher
OPTIONAL MATCH ... WHERE semantics.

Regression tests cover:
  - EXISTS { (friend)-[...]->(...) } referencing the optional variable
  - EXISTS { (p)-[...]->(...) } referencing the outer variable
  - non-correlated EXISTS (previously-working guard)
  - plain scalar predicate on the optional variable (guard)
  - constant-false WHERE (guard)

Fixes issue apache#2378.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@MuhammadTahaNaveed
Member

@gregfelice build is failing

@gregfelice
Contributor Author

Ok on it

…#2378)

transform_cypher_optional_match_clause was calling transform_cypher_expr
with EXPR_KIND_JOIN_ON when re-attaching the detached WHERE as the
LEFT JOIN's ON condition.  All other WHERE transforms in cypher_clause.c
use EXPR_KIND_WHERE, and there are three explicit
p_expr_kind == EXPR_KIND_WHERE guards (cypher_clause.c:5415, 5679, 6597)
that do load-bearing variable resolution for sub-pattern predicates --
walking up parent parsestates to rebind variables like `friend` inside
EXISTS { (friend)-[...]->(...) }.

Using EXPR_KIND_JOIN_ON bypassed those guards, so the sub-pattern fell
through to the "create new variable" path and produced a structurally
invalid parse tree.  Under a release PG build the query happened to
produce correct-looking output, but under --enable-cassert the
downstream invariant checks aborted, crashing the backend and taking
down the regression run (reported by @MuhammadTahaNaveed).

Fix: use EXPR_KIND_WHERE, matching the pattern already established in
transform_cypher_clause_with_where at line 2619.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@gregfelice
Contributor Author

Pushed a one-line fix (bc891bf).

Root cause: In transform_cypher_optional_match_clause I called transform_cypher_expr with EXPR_KIND_JOIN_ON when re-attaching the detached WHERE as the LEFT JOIN's ON condition. All other WHERE transforms in cypher_clause.c use EXPR_KIND_WHERE, and there are three explicit p_expr_kind == EXPR_KIND_WHERE guards (cypher_clause.c:5415, 5679, 6597) that do load-bearing variable resolution for sub-pattern predicates — walking up parent parsestates to rebind variables like friend inside EXISTS { (friend)-[...]->(...) }. EXPR_KIND_JOIN_ON bypassed those guards, so the sub-pattern fell through to the "create new variable" path and produced a structurally invalid parse tree — silently tolerated on a release PG build but caught by --enable-cassert.
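The effect of the expression kind on variable resolution, as described above, can be pictured with a toy model. This is not the AGE implementation: `ParseState`, `resolve`, and the string expression kinds are hypothetical stand-ins, and the real guards in cypher_clause.c do considerably more than a set lookup.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class ParseState:
    """Toy parse state: a set of visible variable names plus a parent link."""
    names: Set[str] = field(default_factory=set)
    parent: Optional["ParseState"] = None


def resolve(pstate: ParseState, name: str, expr_kind: str) -> str:
    """Classify how a sub-pattern variable would be resolved in this model.

    Only expr_kind == "WHERE" walks up the parent parse states (modeling the
    p_expr_kind == EXPR_KIND_WHERE guards); any other kind falls through to
    the "create new variable" path.
    """
    if name in pstate.names:
        return "local"
    if expr_kind == "WHERE":
        ancestor = pstate.parent
        while ancestor is not None:
            if name in ancestor.names:
                return "rebound to ancestor"
            ancestor = ancestor.parent
    return "new variable"


if __name__ == "__main__":
    outer = ParseState(names={"p", "friend"})
    sub_pattern = ParseState(parent=outer)  # scope of EXISTS { (friend)-[...]->(...) }
    print(resolve(sub_pattern, "friend", "WHERE"))
    print(resolve(sub_pattern, "friend", "JOIN_ON"))
```

Under this model, `friend` inside the sub-pattern is tied back to the enclosing scope only when the expression kind is WHERE, which is the behavior the one-line fix relies on.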

Why I didn't catch it: my local PG18 is the stock Ubuntu package (no --enable-cassert), so all 32 regression tests passed locally before I opened the PR. Setting up a dedicated --enable-cassert PG build locally now so this class of failure can't slip past me again. Thanks for flagging @MuhammadTahaNaveed.

@MuhammadTahaNaveed merged commit 15030a0 into apache:master on Apr 19, 2026
6 checks passed
gregfelice added a commit to gregfelice/age that referenced this pull request Apr 19, 2026
The four ExecStoreVirtualTuple calls in exec_cypher_merge were triggering
an Assert failure under --enable-cassert:

    TRAP: failed Assert("TTS_EMPTY(slot)"), File: execTuples.c, Line: 1748

ExecStoreVirtualTuple (execTuples.c:1748) asserts that its target slot
is in the TTS_EMPTY state.  In our MERGE executor, process_path writes
directly into the subquery's scan tuple slot -- which already holds the
subquery's output tuple and therefore is NOT empty.  On a release build
the assertion compiles out and ExecStoreVirtualTuple just clears the flag
and sets tts_nvalid; on an --enable-cassert build the backend aborts and
takes down the regression run.

We only need the bookkeeping half of ExecStoreVirtualTuple (clear
TTS_FLAG_EMPTY and set tts_nvalid = natts) -- not the "store semantics"
that motivate the assertion.  Add a small static helper
mark_scan_slot_valid() that does exactly the bookkeeping, and replace
the four call sites.  Release-build behavior is byte-identical since
Assert() compiles to nothing; cassert-build behavior now matches release.

Caught by the cassert-enabled regression suite we reinstated after apache#2380.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
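
The slot bookkeeping described in the referenced commit can be modeled in a few lines. This Python sketch only mirrors the behavior stated in the commit message (assert on a non-empty slot under cassert, otherwise clear the empty flag and set the valid-attribute count). The `Slot` class and the `cassert` parameter are hypothetical, and the real code is C inside PostgreSQL and AGE.

```python
from dataclasses import dataclass


@dataclass
class Slot:
    """Toy tuple slot: attribute count, an "empty" flag, and a valid count."""
    natts: int
    empty: bool = True
    nvalid: int = 0


def exec_store_virtual_tuple(slot: Slot, cassert: bool) -> None:
    """Model of ExecStoreVirtualTuple as described in the commit message.

    With cassert enabled, a non-empty slot trips the TTS_EMPTY assertion;
    without it, only the bookkeeping below runs.
    """
    if cassert and not slot.empty:
        raise AssertionError('failed Assert("TTS_EMPTY(slot)")')
    slot.empty = False
    slot.nvalid = slot.natts


def mark_scan_slot_valid(slot: Slot) -> None:
    """Model of the helper: the bookkeeping half only, with no emptiness check."""
    slot.empty = False
    slot.nvalid = slot.natts


if __name__ == "__main__":
    scan_slot = Slot(natts=3, empty=False)  # already holds the subquery's tuple
    mark_scan_slot_valid(scan_slot)         # safe in both build modes
    print(scan_slot)
```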

Development

Successfully merging this pull request may close these issues.

OPTIONAL MATCH may incorrectly drop null-preserving outer rows when its WHERE clause contains a correlated subquery predicate.