[fix] Rewrite search queries to avoid parenthesised-step slow path by joewiz · Pull Request #178 · eXist-db/function-documentation

joewiz · 2026-05-06T13:40:48Z

Summary

Each search function in modules/app.xqm was using the shape $app:data//(branch1 | branch2 | ...) -- a parenthesised union step expression. At runtime this defeats the structural-index fast path: the engine materialises the full descendant axis under each $app:data root and applies the union as a generic step, instead of dispatching each branch by qname through the structural index.

This PR rewrites each function to the equivalent split form $app:data//branch1 | $app:data//branch2 | ..., where each branch is an independent path with its own structural-index lookup.
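As a rough illustration of the two shapes, the sketch below uses a tiny in-memory fragment in place of `$app:data`. The `local:` function names, the sample data, and the plain `contains()` predicate are assumptions made for this example. The actual functions in app.xqm are not reproduced here; they query an ngram-indexed collection.

```xquery
xquery version "3.1";

declare namespace xqdoc = "http://www.xqdoc.org/1.0";

(: Hypothetical stand-in for $app:data: one tiny xqdoc-shaped document. :)
declare variable $local:data :=
    document {
        <xqdoc:xqdoc>
            <xqdoc:module>
                <xqdoc:name>util</xqdoc:name>
                <xqdoc:comment>
                    <xqdoc:description>Utility functions for eXist</xqdoc:description>
                </xqdoc:comment>
            </xqdoc:module>
            <xqdoc:functions>
                <xqdoc:function>
                    <xqdoc:name>util:eval</xqdoc:name>
                    <xqdoc:comment>
                        <xqdoc:description>Evaluates a query string</xqdoc:description>
                    </xqdoc:comment>
                </xqdoc:function>
            </xqdoc:functions>
        </xqdoc:xqdoc>
    };

(: Before: one parenthesised union step under the descendant axis. :)
declare function local:search-parens($q as xs:string) as element()* {
    $local:data//(
        xqdoc:module[contains(xqdoc:comment/xqdoc:description, $q)]
        | xqdoc:function[contains(xqdoc:comment/xqdoc:description, $q)]
    )
};

(: After: the outer path is distributed over each branch. :)
declare function local:search-split($q as xs:string) as element()* {
    $local:data//xqdoc:module[contains(xqdoc:comment/xqdoc:description, $q)]
    | $local:data//xqdoc:function[contains(xqdoc:comment/xqdoc:description, $q)]
};

(
    local:search-parens("query")/xqdoc:name/string(),
    local:search-split("query")/xqdoc:name/string()
)
```

On this sample both calls should return the single hit `util:eval`. The sketch only shows that the results match; any timing difference depends on the engine and its indexes, which an in-memory fragment does not exercise.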

What changed

5 functions in src/main/xar-resources/modules/app.xqm:

- search-in-module-location -- single-branch parenthesised step, parens dropped
- search-in-module-name -- single-branch parenthesised step, parens dropped
- search-in-description -- 2-branch union, distributed over $app:data//
- search-in-signature -- 2-branch union, distributed
- search-everywhere -- 9-branch union, distributed

Plus a comment block explaining why the rewrite matters and pointing to the upstream optimiser PR.

Why both forms are equivalent

XPath's | is a set union whose result is sorted in document order with duplicates eliminated. For a base expression $P and relative branch paths A and B, each carrying its own predicates:

$P//(A | B)   ≡   $P//A | $P//B

The right-hand form evaluates each path independently and unions the results; the left-hand form materialises the descendant axis once and applies the union per-node. Both produce the same node-set in document order.
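The equivalence can be checked by node identity on a small sample, as in the sketch below. The tree, the element names `a` and `b`, and the helper `local:same-nodes` are all hypothetical. The second check is a caveat added for this example, not something the PR discusses: a predicate applied to the parenthesised step as a whole, such as `(A | B)[1]`, does not distribute over the branches. That is why the identity is stated for branches that carry their own predicates.

```xquery
xquery version "3.1";

(: Hypothetical sample tree; the element names are placeholders. :)
declare variable $local:P :=
    document {
        <root>
            <a>1</a>
            <b>2</b>
            <section>
                <b>3</b>
                <a>4</a>
            </section>
        </root>
    };

(: True when both sequences hold the same nodes in the same order. :)
declare function local:same-nodes($x as node()*, $y as node()*) as xs:boolean {
    count($x) eq count($y)
    and (every $i in 1 to count($x) satisfies $x[$i] is $y[$i])
};

(
    (: Predicates inside the branches: both forms select the same nodes. :)
    local:same-nodes(
        $local:P//(a[. > 1] | b[. > 1]),
        $local:P//a[. > 1] | $local:P//b[. > 1]
    ),
    (: A predicate on the whole parenthesised step does not distribute. :)
    local:same-nodes(
        $local:P//(a | b)[1],
        $local:P//a[1] | $local:P//b[1]
    )
)
```

On this sample the query should evaluate to `true` followed by `false`. The first form of the second check picks the first `a`-or-`b` child of each parent, while the split form picks the first `a` child and the first `b` child separately.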

Numbers

Synthetic xqdoc-shaped corpus (200 modules, 30 functions each = 6,000 functions, ngram-indexed on description/name/signature/param/return), measured against an embedded eXist running develop:

| Function (shape) | Before (parens) | After (split) |
| --- | --- | --- |
| `search-in-description` (2-branch) | ~38 ms | ~3 ms |
| `search-in-signature` (2-branch) | ~34 ms | ~3 ms |
| `search-everywhere` (9-branch) | ~35 ms | ~5 ms |
| `search-in-module-location` (1-branch parens) | ~30 ms | ~3 ms |
| `search-in-module-name` (1-branch parens) | ~30 ms | ~3 ms |

Keystroke latency in the function-reference UI on large corpora drops correspondingly.

Related work

Companion PR upstream: eXist-db/exist#6303 -- adds an Optimizer pass that automatically unwraps the single-step parens shape //(name), so future code that accidentally uses parens around a single step gets the win for free. The union-of-steps distribution that this PR does by hand is left as an upstream follow-up because it requires more invasive AST rewriting (distributing the parent path over union branches needs either a PathExpr.replaceAllSteps-style API or rewriting the outer PathExpr at its parent).
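For orientation, the shapes that paragraph distinguishes look like this in a query. The sample tree and element names are hypothetical, and the comments only restate how this PR describes the upstream optimiser's behaviour; they are not a description of its implementation.

```xquery
xquery version "3.1";

(: Hypothetical sample tree; the element names are placeholders. :)
let $data :=
    document {
        <root>
            <name>a</name>
            <group><name>b</name></group>
        </root>
    }
return (
    (: Single step in parentheses: the shape #6303 is described as unwrapping. :)
    $data//(name)/string(),
    (: The same step written without parentheses. :)
    $data//name/string(),
    (: Union step: distribution over the branches is left as a follow-up. :)
    $data//(name | group)/local-name()
)
```

The first two expressions select the same two `name` elements. The third selects `name` and `group` elements together in document order.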

Investigation thread: eXist-db/exist#6295 -- @line-o reported residual ngram performance issues in this app after #6300 merged. Diagnosis pinned the slow path to the parenthesised-step shape in this app's queries, not to ngram or the optimizer's predicate-rewriting. This PR fixes the app side; #6303 fixes the engine side as far as it can.

Test plan

- Visual diff review of app.xqm: 5 functions rewritten, semantics-preserving
- Cypress E2E (fundoc_spec.cy.js includes a search-everywhere case for "exist_home") -- run by maintainer / CI

[This PR was prepared with Claude Code. -Joe]

🤖 Generated with Claude Code

Each search function in app.xqm was using the shape `$app:data//(branch1 | branch2 | ...)` -- a parenthesised union step expression. At runtime this defeats the structural index fast path: the engine materialises the full descendant axis under each `$app:data` root, then applies the parenthesised expression as a generic step, instead of dispatching each branch by qname through the structural index.

The split form `$app:data//branch1 | $app:data//branch2 | ...` is semantically identical (XPath's `|` is set union with document-order sort and dedup) but evaluates each branch as an independent path with its own structural-index lookup.

On a synthetic xqdoc corpus (~6,000 functions) the full `search-everywhere` query goes from ~35ms (parenthesised form) down to ~5ms (split form). The function reference UI's keystroke latency on large corpora drops correspondingly.

Two of the functions (`search-in-module-location`, `search-in-module-name`) had a single-branch parenthesised step that also hit the same slow path; they're rewritten by simply dropping the unnecessary parens.

For the upstream optimiser-side companion fix (which addresses the single-step `//(name)` shape automatically), see eXist-db/exist#6303. The union-of-steps distribution that this commit performs by hand is left as an upstream follow-up because it requires more invasive AST rewriting than the parser/optimiser currently support.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

line-o

sigh yes, let's do this

duncdrum approved these changes May 6, 2026

duncdrum added this to v7.0.0 May 6, 2026

duncdrum added the enhancement label May 6, 2026

line-o requested review from a team May 6, 2026 23:01

line-o approved these changes May 6, 2026

line-o merged commit 2ba3170 into eXist-db:master May 6, 2026
2 checks passed

github-project-automation Bot moved this to Done in v7.0.0 May 6, 2026

joewiz deleted the fix/search-query-perf branch May 7, 2026 01:45

joewiz mentioned this pull request May 7, 2026

[optimize] Distribute outer path over union-step branches eXist-db/exist#6310

Open

5 tasks

line-o merged 1 commit into eXist-db:master from joewiz:fix/search-query-perf
