Skip to content

[optimize] Distribute outer path over union-step branches#6310

Open
joewiz wants to merge 6 commits intoeXist-db:developfrom
joewiz:optimize/union-step-distribute
Open

[optimize] Distribute outer path over union-step branches#6310
joewiz wants to merge 6 commits intoeXist-db:developfrom
joewiz:optimize/union-step-distribute

Conversation

@joewiz
Copy link
Copy Markdown
Member

@joewiz joewiz commented May 7, 2026

Summary

Companion to #6303. That PR fixed the parenthesised single-step case //(name) -> //name so the structural index by qname could dispatch the step. The remaining slow shape is the union-of-steps form outer//(A | B [| C ...]). This PR rewrites it to outer//A | outer//B [| outer//C ...] so each branch dispatches through the structural index independently.

After this PR the user-land manual rewrite in function-documentation#178 becomes unnecessary -- the engine produces the same fast path automatically.

Performance context

From the #6295 investigation, on a 6,000-function xqdoc-style corpus on develop:

Query Time Note
//xqdoc:function | //xqdoc:module 3 ms baseline (split form)
//(xqdoc:function | xqdoc:module) 165 ms ~55x slower
//(xqdoc:function | xqdoc:module | xqdoc:name) 233 ms scales with branches
//* (full descendant scan) 22 ms reference

Same XPath result, dramatically different runtimes. The split form uses the structural index per branch; the parenthesised form materialises the descendant axis and applies the union as a generic step. After this PR the slow forms reach the same fast path.

What changed

exist-core/src/main/java/org/exist/xquery/PathExpr.java

  • New replaceAllSteps(Expression) method. Collapses a multi-step path into a single-step path containing one expression. The existing replace(old, new) only swaps one step at a time and doesn't fit the bulk rewrite.

exist-core/src/main/java/org/exist/xquery/Optimizer.java

  • visitPathExpr extended with a Case B branch alongside the existing Case A unwrap.
  • When a step is exactly PathExpr.class wrapping a Union (or a tree of Unions for n-ary cases), build a distributed Union whose branches are full paths combining outer.prefix + branch.steps + outer.suffix.
  • Recurses over nested Unions so A | B | C (parses as Union(Union(A, B), C)) becomes Union(Union(P_A, P_B), P_C).

Safety constraints (each enforced and tested)

  1. Exact-class check (step.getClass() == PathExpr.class). Many semantically loaded expression types extend PathExpr (UnaryExpr, BinaryOp, OpNumeric, ConcatExpr, EnclosedExpr, LogicalOp, RangeExpression, ...). A generic instanceof PathExpr would corrupt their semantics -- the same trap flagged in [optimize] Unwrap parenthesised single-step expressions in Optimizer #6303.

  2. Skip when predicates > 0. Inside a predicate, Predicate.selectByNodeSet walks each result node and looks up its contextId to map back to its candidate. Splitting the predicate's inner path into a Union breaks the contextId thread; the engine throws `Internal evaluation error: context is missing for node ...`. The existing UnionTest.unionInPredicate_* tests caught this regression.

  3. Skip when any suffix step is not a LocationStep. Distribution moves the suffix into each branch, so the branch's last step becomes whatever was at the end of the original path. If that's /string() or another non-node-returning expression, the new Union fails its operand-must-be-node-sequence invariant in CombiningExpression.combine. Caught by xmlts `UnionInPath` tests.

  4. Require every leaf branch to be a non-empty PathExpr of LocationSteps (recursing through nested Unions for n-ary cases). Conservative -- false negatives just mean missed optimisations, not broken code.

Test plan

  • OptimizerTest (7), UnionTest (3), XPathQueryTest (150), XQuery3Tests (1,011), xquery.CoreTests, xquery.xqsuite.XQSuiteTests -- all green
  • Full exist-core suite: 6,687 tests, 0 failures, 105 pre-existing skipped
  • extensions/indexes/ngram (39), extensions/indexes/range (428), extensions/indexes/lucene (659) -- all green
  • New UnionStepDistributionOptimizerTest (18 tests): binary and n-ary unions, prefix and suffix steps, attribute axes, mixed paths, predicate-internal unions (verifies the skip), non-node suffix (verifies the skip), document-order preservation, FLWOR/function-call parents
  • Codacy PMD: no new warnings introduced (the 6 reported warnings are all pre-existing on develop)

References

🤖 Generated with Claude Code

Companion to eXist-db#6303. That PR fixed the parenthesised single-step case
`//(name)` -> `//name` so the structural index by qname could dispatch
the step. The remaining slow shape is the union-of-steps form
`outer//(A | B [| C ...])` -- this PR rewrites it to
`outer//A | outer//B [| outer//C ...]` so each branch dispatches
through the structural index independently.

The performance gap on a 6,000-function xqdoc-style corpus (from the
PR eXist-db#6295 investigation):

```
//xqdoc:function | //xqdoc:module                3 ms     baseline
//(xqdoc:function | xqdoc:module)              165 ms     ~55x slower
//(xqdoc:function | xqdoc:module | xqdoc:name) 233 ms     scales with branches
```

Same XPath result, dramatically different runtimes. The split form uses
the structural index per branch; the parenthesised form materialises
the descendant axis and applies the union as a generic step. After this
PR the slow forms reach the same fast path as the manual rewrite,
making PR function-documentation#178 (a hand-written user-land split)
unnecessary.

Implementation:

* Add `PathExpr.replaceAllSteps(Expression)` so the optimizer can
  collapse a multi-step outer path into a single Union step in place.
  PathExpr's existing `replace(old, new)` swaps one step at a time
  and doesn't fit the bulk rewrite.

* Extend `Optimizer.visitPathExpr` with a Case B branch alongside the
  existing Case A unwrap. When a step is exactly `PathExpr.class`
  wrapping a Union (or a tree of Unions for n-ary cases), build a
  distributed Union whose branches are full paths combining the outer
  prefix + branch steps + outer suffix. Recurse over nested Unions so
  `A | B | C` (parses as `Union(Union(A, B), C)`) becomes
  `Union(Union(P_A, P_B), P_C)`.

Safety constraints:

* Use `step.getClass() == PathExpr.class`, never `instanceof PathExpr`.
  `UnaryExpr`, `BinaryOp`, `OpNumeric`, `ConcatExpr`, `EnclosedExpr`,
  `LogicalOp`, `RangeExpression`, etc. all extend `PathExpr` and a
  generic instanceof would corrupt their semantics (the same trap
  flagged in eXist-db#6303).

* Skip when `predicates > 0`. Inside a predicate, `Predicate.selectByNodeSet`
  walks the result NodeSet and looks up each node's contextId to map
  back to its candidate. Splitting the predicate's inner path into a
  Union breaks the contextId thread and the engine throws "context is
  missing for node ..." (caught by the existing `UnionTest` regressions).

* Skip when any suffix step (steps after the union) is not a
  LocationStep. Distribution moves the suffix into each branch, so
  the branch's last step becomes whatever was at the end of the
  original path. If that's `/string()` or another non-node-returning
  expression, the new Union fails its operand-must-be-node-sequence
  invariant in `CombiningExpression.combine`. (Caught by the xmlts
  union-in-path tests.)

* Require every leaf branch to be a non-empty PathExpr of LocationSteps
  (recursing through nested Unions). Conservative -- false negatives
  just mean missed optimisations, not broken code.

Verified on develop:

* Full exist-core suite: 6,687 tests, 0 failures, 105 pre-existing skipped
* OptimizerTest 7, UnionTest 3, XPathQueryTest 150, XQuery3Tests 1011,
  CoreTests/XQSuiteTests union and path-expression tests
* extensions/indexes/{ngram,range,lucene}: 39 + 428 + 659 = 1,126 tests
* New `UnionStepDistributionOptimizerTest`: 18 tests covering binary
  and n-ary unions, prefix and suffix steps, attribute axes, mixed
  paths, predicate-internal unions (must skip), non-node suffix
  (must skip), document-order preservation, and FLWOR/function-call
  parents.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@joewiz joewiz requested a review from a team as a code owner May 7, 2026 02:36
@line-o
Copy link
Copy Markdown
Member

line-o commented May 7, 2026

I just tested this and the search in function documentation is still clocking in at 9 seconds

curl 'http://localhost:8080/exist/apps/fundocs/query' \
  -X 'POST' \
  -H 'Content-Type: application/x-www-form-urlencoded;charset=UTF-8' \
  --data 'q=test&where=everywhere&action=search'

@line-o
Copy link
Copy Markdown
Member

line-o commented May 7, 2026

PathExpr.replaceAllSteps(Expression) is never called which is evidence that the optimization fails to identify or is unable to rewrite the union in fundocs.

@line-o
Copy link
Copy Markdown
Member

line-o commented May 7, 2026

But I can also confirm that with the rewritten form of the queries in current function-documentation master branch HEAD the queries are fast again.
They then also call PathExpr.replaceAllSteps(Expression)

@duncdrum duncdrum requested a review from a team May 7, 2026 09:25
duncdrum added a commit to duncdrum/exist that referenced this pull request May 7, 2026
@line-o
Copy link
Copy Markdown
Member

line-o commented May 7, 2026

@joewiz I striked the paragraph that claims the query rewrite in function documentation would not be necessary as my testing proved this to not be correct.

I would appreciate if you try to independently verify or falsify my finding.

Copy link
Copy Markdown
Member

@line-o line-o left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would appreciate the newly added comments to be slighly less verbose as this makes the code now harder to read.
How about the prose about cases in the DocBlock of the function and then just referencing them in the code itself?

duncdrum added a commit to duncdrum/exist that referenced this pull request May 7, 2026
…ution

line-o's PR eXist-db#6310 review demonstrated empirically that the union-step
distribution did not fire on the function-documentation app's
search-everywhere query. PathExpr.replaceAllSteps was never called and
the query still took ~9s; manually distributing the union by hand made
it fast and triggered replaceAllSteps. This commit closes that gap.

Diagnosis: visitPathExpr calls super.visitPathExpr first, which descends
into the wrapping PathExpr's children. visitLocationStep then wraps any
predicate-bearing optimizable LocationStep in an ExtensionExpression
carrying the #exist:optimize# pragma. By the time the union-step
recognition predicate runs on the outer PathExpr, the union's branches
no longer contain raw LocationSteps -- they contain ExtensionExpressions
wrapping LocationSteps. isDistributableBranch's
`instanceof LocationStep` check rejected those branches and the rewrite
silently bailed. The fundocs search-everywhere query has predicates on
nearly every branch (ngram:contains / contains over xqdoc:function and
xqdoc:module) so essentially every real branch hit this path.

Fix: introduce isStepLikeNodeExpr, which accepts both raw LocationSteps
and ExtensionExpressions whose inner expression is a LocationStep. Apply
it in both branch-shape recognition (isDistributableBranch) and suffix-
shape recognition (hasOnlyLocationStepSuffix). The pragma wrapper
preserves the wrapped step's node-yielding semantics, so distribution
remains semantics-preserving.

Also addresses line-o's separate request to make the inline comments
less verbose: prose moved to the visitPathExpr docblock; case markers
in the body are short labels referencing the docblock.

Adds ExtensionExpression.getExpression() (the inner expression was
previously settable but not readable from outside the class).

Adds a regression test, fundocsShapeMixedBranches, that mirrors the
fundocs branch shape (mixed predicated single-step branches plus a
multi-step branch). Verified the optimization fires on this shape via
temporary instrumentation (replaceAllSteps invoked, branches=4) before
removing the instrumentation.

Verified: exist-core (6688 tests), extensions/indexes/range (428),
extensions/indexes/ngram (39), extensions/indexes/lucene (659) -- all
pass with no new failures.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@joewiz
Copy link
Copy Markdown
Member Author

joewiz commented May 8, 2026

[This response was co-authored with Claude Code. -Joe]

@line-o thank you — confirmed and fixed in 1621fba.

Diagnosis. Your finding was correct: the rewrite was never firing on fundocs. The issue was in the order of operations inside Optimizer.visitPathExpr:

  1. super.visitPathExpr(pathExpr) runs first and descends into children.
  2. That descent hits visitLocationStep, which wraps every predicate-bearing optimizable LocationStep inside the union's branches in an ExtensionExpression carrying the #exist:optimize# pragma.
  3. By the time my union-recognition predicate ran, the union's branches no longer contained raw LocationStep instances — they contained ExtensionExpression instances. isDistributableBranch's instanceof LocationStep check rejected them and the rewrite silently bailed.

The fundocs search-everywhere query is essentially the worst case for this: nearly every branch is xqdoc:function[ngram:contains(...)] or xqdoc:module[contains(...)], and the xqdoc:control[...]/..//xqdoc:function and xqdoc:module[...]/..//xqdoc:function branches have a predicated step too — every branch picks up the pragma wrapper, so the rewrite was always skipped.

Fix. New helper isStepLikeNodeExpr accepts either a raw LocationStep or an ExtensionExpression wrapping a LocationStep. Used in both isDistributableBranch and hasOnlyLocationStepSuffix. The pragma wrapper preserves node-yielding semantics, so distribution remains safe — index dispatch still fires on each distributed branch (in fact more reliably, since each branch is now a complete path the indexer can match).

Regression test. UnionStepDistributionOptimizerTest.fundocsShapeMixedBranches constructs a fundocs-shaped union (mixed predicated branches + multi-step branch). Before the fix, this still gave the right result (correctness was never the bug — only performance was) but replaceAllSteps was not called. After the fix replaceAllSteps fires with 4 distributed branches; verified via temporary instrumentation before removing it.

Verbose comments. Also addressed your other request — prose moved into the visitPathExpr docblock, body comments are short labels (// Case A, // Case B) referencing the docblock.

Tests. exist-core (6688), extensions/indexes/range (428), ngram (39), lucene (659) — all pass.

CI rerunning. Please re-test against your fundocs query/search reproducer when convenient.

@joewiz
Copy link
Copy Markdown
Member Author

joewiz commented May 8, 2026

@line-o Now retesting against your reproducer...

Empirical verification on the function-documentation app's 9-branch
search-everywhere reproducer (line-o's PR eXist-db#6310 review request) showed
that 1621fba, while structurally correct, did not deliver the
end-to-end speedup line-o invited us to verify. Same-machine timings
on the published fundocs-2.2.0.xar (pre-PR-178 parens form):

  optimizer enabled, parens form (pre 1621fba): ~9 s
  optimizer enabled, parens form (1621fba):     ~9 s   <-- fix didn't help
  optimizer disabled, parens form:                  0.55 s
  optimizer enabled, manual rewrite:                0.04 s

The parens form is ~16x slower with the optimizer ON than OFF, and
1621fba's restructured recognition predicate fired (replaceAllSteps
ran, distributed AST built) but the rewritten form ran no faster than
the original.

Diagnosis: visitPathExpr called super.visitPathExpr first, which
descended into the wrapping PathExpr's children. visitLocationStep
wrapped each predicate-bearing branch step in an
ExtensionExpression(#exist:optimize#) pragma at its current parent --
the parens-PathExpr. The pragma's eval path uses preSelect against the
contextSequence reaching it; in the parens-context that contextSequence
is the entire descendant-or-self::node() set of the outer prefix
(every node in 76 documents for fundocs). preSelect can't optimize
that sequence and falls through to a generic node-by-node filter,
costing ~9s. Distribution after the wrap left the pragma in place but
moved its surrounding PathExpr -- the pragma was already configured
against the wrong context.

Fix: run Case B (union distribution) BEFORE super.visitPathExpr
descends. Distribution operates on raw LocationSteps. After
distribution, super descends through the new Union's branches, and
visitLocationStep wraps each branch's step at its branch-PathExpr
parent -- the same shape and runtime profile as the hand-written
outer//A | outer//B | outer//C ... rewrite. Case A (parenthesized
single LocationStep unwrap) still runs post-descent; it doesn't move
pragma wrappers across parent boundaries.

Also: clone simple (predicate-free) LocationSteps in distributeBranch
when copying outer prefix/suffix steps into each new branch.
LocationStep.analyze mutates per-instance state (rewrites //'s axis
from descendant-or-self to descendant, stores parent / unordered /
staticReturnType). Shared step instances across multiple branches
caused those fields to take whichever branch analyzed last, with
downstream index-dispatch consequences. Non-LocationStep expressions
(VariableReference, FunctionCall, ...) are still shared -- their
analyze paths don't carry destructive per-branch state.

Empirical post-fix on the same machine, fundocs-2.2.0.xar parens form:

  HTTP reproducer (curl -X POST .../fundocs/query):  0.38-0.64s wall
  xst direct query (eXist-internal time):            0.21-0.24s
  Manual rewrite for comparison (same eXist):        0.12s

40x speedup vs the 9s baseline; within 2x of the hand-written form
(remaining gap is one extra step-fusion the manual form benefits from
that distribution can't deliver without engine-level path fusion).

Verified: exist-core (UnionStepDistributionOptimizerTest 19,
OptimizerTest 7, XPathQueryTest 150, UnionTest 3), all pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@joewiz
Copy link
Copy Markdown
Member Author

joewiz commented May 8, 2026

[This response was co-authored with Claude Code. -Joe]

@line-o your "verify or falsify me" framing is what surfaced this — the previous attempt (1621fba) closed the structural recognition gap but did not deliver the runtime speedup, and standing on the unit test would have shipped that broken intermediate fix. Happy to flag the diagnostic gap below.

Confirmed your finding empirically — pre-fix is ~9 s, the previous attempt (1621fba) closed the structural recognition gap but did not deliver the runtime speedup. The fundocs query still ran at ~9 s on that commit even though replaceAllSteps was firing.

Diagnosis: in 1621fba, super.visitPathExpr descended first, and visitLocationStep wrapped each branch's predicated step in an ExtensionExpression(#exist:optimize#) pragma at its current parent — the parens-PathExpr. The pragma binds its preSelect+findAncestors path against that context. Distribution after the wrap moved the surrounding PathExpr but left the pragma configured against the original (huge, all-descendants) contextSequence. preSelect can't optimize that and falls through to a generic node-by-node filter.

Fix in 929d29c: run union distribution BEFORE super.visitPathExpr descends. Distribution operates on raw LocationSteps, super descends through the new Union's branches, and visitLocationStep wraps each branch's step at the branch-PathExpr parent — matching the runtime profile of the hand-written outer//A | outer//B | outer//C ... form.

Empirical, same machine, against published fundocs-2.2.0.xar (pre-PR-178 parens form):

variant optimizer wall time
pre-fix parens form enabled ~9 s
post-fix parens form (929d29c) enabled 0.21-0.24 s xst, 0.38-0.64 s curl
post-fix parens form disabled 0.55 s
post-fix manual rewrite (post-PR-178 shape) enabled 0.09 s

40x speedup vs the 9 s baseline; within ~2x of the hand-written form; no regression on the post-PR-178 manual rewrite shape.

Reproducer:

curl 'http://localhost:8080/exist/apps/fundocs/query' \
  -X 'POST' \
  -H 'Content-Type: application/x-www-form-urlencoded;charset=UTF-8' \
  --data 'q=test&where=everywhere&action=search'

Local Docker eXist on Java 21 / macOS Sequoia, fundocs-2.2.0.xar (pre-PR-178) installed via xst. 5-run median after one warmup; xst measurement is util:system-time inside the query, curl is wall-clock including HTTP.

Diagnostic gap to close on this PR. The fundocsShapeMixedBranches test added in 1621fba passes on BOTH the broken and the fixed version — it only asserts result-set equivalence, which holds either way. Standing on it alone would have shipped 1621fba's broken fix. Will land an AST-shape assertion on this PR before merge: each distributed branch's predicate-bearing LocationStep must be wrapped at the branch-PathExpr parent (not the original outer parens-PathExpr). That's a structural invariant stable for CI and would have caught 1621fba as broken at unit-test time.

CI rerunning.

The existing fundocsShapeMixedBranches test asserts only result-set
equivalence, which holds even on a broken intermediate fix that left
the rewritten query as slow as the original (commit 1621fba).
line-o caught that diagnostic gap on the PR review thread; this
commit closes it.

Add parensFormDistributesToUnionShape that walks the post-optimize
AST and asserts:
  - A Union sits in step position (distribution fired).
  - The Union has exactly the expected number of branches.
  - Each branch references the original union arm's element name
    (regression where distribution drops or duplicates branches).
  - Each branch carries the outer prefix steps (regression where
    distributeBranch stops copying outer prefix into each branch).

The test catches the "did distribution happen at all and produce a
well-formed Union" class of regressions. It does NOT catch the
specific 1621fba bug structurally -- that bug produced an
identical post-optimize tree but with the visitLocationStep wrap
configured against the parens-context before distribution moved it,
a runtime-only difference. Catching that would require a perf-bound
assertion or visitor-call-order instrumentation, both more brittle
than this shape check. The shape check is the appropriate CI-stable
guard for the broader regression class; the runtime difference is
covered by the 5-run wall-time measurement on the function-documentation
reproducer captured on PR eXist-db#6310.

Also fix a related issue surfaced while developing the test: in
distributeBranch, when copying outer prefix LocationSteps into each
distributed branch, the LocationStep's parent field still pointed at
the pre-distribution branch PathExpr (or was null for cloned prefix
steps). Optimizer.visitLocationStep reads
locationStep.getParentExpression() to find the RewritableExpression
to call replace() on when wrapping a predicated step in
(#exist:optimize#). Without the parent update, the wrap was inserted
into a dead branch and the new branches ended up unwrapped -- a 5x
perf miss observed against the manually-rewritten form. New helper
addStepWithParent updates the parent pointer when adding a
LocationStep to a distributed branch so the post-distribution
super-descent can wrap branch steps at the correct parent.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Comment thread exist-core/src/test/java/org/exist/xquery/UnionStepDistributionOptimizerTest.java Outdated
…nts to docblocks

Per @line-o review on PR eXist-db#6310: convert the leading inline comments on
each @test method into Javadoc docblocks describing the test case.

No behaviour change; comments-only edit.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@@ -441,14 +471,17 @@ static String dump(final Expression e) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Address Codacy issue: Unnecessary use of fully qualified name 'org.exist.xquery.util.ExpressionDumper.dump' due to existing same package import 'org.exist.xquery.*'

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[This response was co-authored with Claude Code. -Joe]

Done in c5c9d34cdf — added import org.exist.xquery.util.ExpressionDumper; and replaced the fully-qualified call with ExpressionDumper.dump(e).

Replace fully-qualified org.exist.xquery.util.ExpressionDumper.dump
in UnionStepDistributionOptimizerTest.java line 470 with an explicit
import per reinhapa's review comment.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@line-o line-o self-requested a review May 10, 2026 07:59
Copy link
Copy Markdown
Member

@line-o line-o left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still somewhat excessive commentary but the fix is more valuable and we can shorten and remove comments later on when they get on our nerves. At the moment it does help to understand why this was changed.

@line-o line-o requested a review from reinhapa May 10, 2026 08:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants