a[b] fails to match when <a> has multiple <b> children (XPTY0004) #147

@boukeversteegh

XPath predicate with multi-item subexpression raises XPTY0004 instead of using effective boolean value

Summary

An XPath predicate that selects a sequence of nodes (e.g. //b[c] where the
context node has multiple c children) raises XPTY0004 instead of treating
the predicate as a boolean test via effective boolean value (EBV).

This breaks common path expressions such as //lambda[body/block/expression_statement]
whenever the inner path matches more than one node for at least one context node.

Reproduction

<root>
  <a><b><c>1</c></b></a>
  <a><b><c>2</c><c>3</c></b></a>
</root>

XPath expression:

//b[c]
  • Expected: both <b> elements are returned (the predicate is a node
    sequence, so EBV applies — non-empty is true).
  • Actual: evaluation fails with XPTY0004 as soon as any matched <b>
    has more than one <c> child.

For contrast, //b[exists(c)] works correctly on the same input.

Root cause

The failure originates in pop_is_numeric in
xee-interpreter/src/interpreter/interpret.rs (around line 1046):

fn pop_is_numeric(&mut self) -> error::Result<bool> {
    let value = self.state.pop()?;
    let a = value.atomized_option(self.state.xot())?;
    if let Some(a) = a {
        Ok(a.is_numeric())
    } else {
        Ok(false)
    }
}

This helper is used by the IsNumeric instruction to decide whether a
predicate value should be interpreted as a positional predicate (e.g.
[1]) or a boolean predicate (via EBV).

Value::atomized_option returns an error (XPTY0004) whenever the sequence
contains more than one item. For a numeric-predicate probe this is the wrong
shape: a multi-item sequence is simply not a numeric predicate value, and
the interpreter should fall through to the EBV path rather than propagate an
error.

The net effect is that any predicate whose subexpression yields two or more
items for some context item fails, even though the spec says such a predicate
should be evaluated by its effective boolean value.
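
To make the difference concrete, here is a minimal standalone sketch of the two probe shapes. It does not use xee's types: Atomic, ProbeError, strict_probe and lenient_probe are hypothetical names invented for this illustration, and the strict variant only models the behaviour described above, not the actual implementation of Value::atomized_option.

```rust
// Toy model of the "is this a numeric predicate?" probe.
// All names here are hypothetical and exist only for illustration.

#[derive(Debug, PartialEq)]
enum Atomic {
    Integer(i64),
    // What a node atomizes to in this toy model (its string value).
    Untyped(String),
}

#[derive(Debug, PartialEq)]
enum ProbeError {
    Xpty0004,
}

impl Atomic {
    fn is_numeric(&self) -> bool {
        matches!(self, Atomic::Integer(_))
    }
}

/// Strict probe: models the reported behaviour, where a sequence of
/// more than one item is treated as a type error.
fn strict_probe(items: &[Atomic]) -> Result<bool, ProbeError> {
    match items {
        [] => Ok(false),
        [single] => Ok(single.is_numeric()),
        _ => Err(ProbeError::Xpty0004),
    }
}

/// Lenient probe: anything other than exactly one numeric item is
/// simply "not a numeric predicate", so the caller can fall back to EBV.
fn lenient_probe(items: &[Atomic]) -> bool {
    matches!(items, [single] if single.is_numeric())
}
```

With the second <b> from the reproduction, the predicate c atomizes to two untyped values: the strict probe returns an error, while the lenient probe returns false and leaves the decision to EBV.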

Suggested fix

Replace the single-item atomization with a bounded iteration: if there is
exactly one atomized item, test whether it is numeric; if there are zero or
more than one, return false so the caller uses EBV.

fn pop_is_numeric(&mut self) -> error::Result<bool> {
    let value = self.state.pop()?;
    let mut atomized = value.atomized(self.state.xot());
    let Some(first) = atomized.next() else {
        return Ok(false); // empty sequence is not numeric
    };
    if atomized.next().is_some() {
        return Ok(false); // multi-item sequence is not a numeric predicate
    }
    Ok(first?.is_numeric())
}

One existing snapshot changes accordingly: (1, 2, 3)[(2, 3)] now fails
with FORG0006 (raised by EBV on a multi-item sequence whose first item is
not a node) instead of XPTY0004:

xee-xpath/tests/snapshots/xpath__sequence_predicate_sequence_too_long.snap
-    error: XPTY0004,
+    error: FORG0006,

which matches the error the XPath specification requires when EBV is applied
to a sequence of two or more items whose first item is not a node.
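
For reference, the EBV rules that decide between true, false and FORG0006 can be sketched as follows. This is a simplified standalone model following the fn:boolean rules, not xee's code; Item, EbvError and effective_boolean_value are hypothetical names, and the item kinds are reduced to the few needed here.

```rust
// Simplified model of effective boolean value (fn:boolean).
// All names are hypothetical and exist only for illustration.

#[derive(Debug, Clone, PartialEq)]
enum Item {
    Node,
    Boolean(bool),
    Integer(i64),
    Double(f64),
    Text(String),
}

#[derive(Debug, PartialEq)]
enum EbvError {
    Forg0006,
}

fn effective_boolean_value(seq: &[Item]) -> Result<bool, EbvError> {
    match seq {
        // The empty sequence is false.
        [] => Ok(false),
        // Any sequence whose first item is a node is true,
        // regardless of its length.
        [Item::Node, ..] => Ok(true),
        // Singleton atomic values have their own rules.
        [Item::Boolean(b)] => Ok(*b),
        [Item::Integer(i)] => Ok(*i != 0),
        [Item::Double(d)] => Ok(!(d.is_nan() || *d == 0.0)),
        [Item::Text(s)] => Ok(!s.is_empty()),
        // Everything else, including (2, 3), is an error.
        _ => Err(EbvError::Forg0006),
    }
}
```

Under this model, the two <c> nodes from the reproduction give true, while the atomic sequence (2, 3) gives FORG0006, which is the snapshot change shown above.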

Next steps

I'd like to open a PR with the fix above plus a regression test (//b[c] on
a document where some <b> has multiple <c> children).
