XPath predicate with multi-item subexpression raises XPTY0004 instead of using effective boolean value
Summary
An XPath predicate that selects a sequence of nodes (e.g. //b[c] where the
context node has multiple c children) raises XPTY0004 instead of treating
the predicate as a boolean test via effective boolean value (EBV).
This breaks common path expressions such as //lambda[body/block/expression_statement]
whenever the inner path matches more than one node for some context node.
Reproduction
<root>
<a><b><c>1</c></b></a>
<a><b><c>2</c><c>3</c></b></a>
</root>
XPath expression: //b[c]
- Expected: both <b> elements are returned (the predicate is a node
  sequence, so EBV applies: non-empty is true).
- Actual: evaluation fails with XPTY0004 as soon as any matched <b>
  has more than one <c> child.
For contrast, //b[exists(c)] works correctly on the same input.
Root cause
The failure originates in pop_is_numeric in
xee-interpreter/src/interpreter/interpret.rs (around line 1046):
    fn pop_is_numeric(&mut self) -> error::Result<bool> {
        let value = self.state.pop()?;
        let a = value.atomized_option(self.state.xot())?;
        if let Some(a) = a {
            Ok(a.is_numeric())
        } else {
            Ok(false)
        }
    }
This helper is used by the IsNumeric instruction to decide whether a
predicate value should be interpreted as a positional predicate (e.g.
[1]) or a boolean predicate (via EBV).
Value::atomized_option returns an error (XPTY0004) whenever the sequence
contains more than one item. For a numeric-predicate probe this is the wrong
behavior: a multi-item sequence is simply not a numeric predicate value, and
the interpreter should fall through to the EBV path rather than propagate an
error.
The net effect is that any predicate whose subexpression yields two or more
items fails, even though the XPath spec says such a predicate should be
evaluated by its effective boolean value.
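The intended dispatch can be sketched in isolation. This is a toy model, not xee-interpreter code: Item, PredicateKind and classify are hypothetical names, and only integers are modeled as numeric. It shows that the probe should never fail on the size of the sequence; it only decides which path to take.

```rust
/// Toy stand-in for an XPath item (hypothetical, not the xee `Item` type).
#[derive(Debug, Clone, PartialEq)]
pub enum Item {
    Node(&'static str),
    Integer(i64),
    Text(&'static str),
}

/// How a predicate value should be interpreted (hypothetical).
#[derive(Debug, PartialEq)]
pub enum PredicateKind {
    /// A single numeric value: compare it against the context position.
    Positional(i64),
    /// Anything else: decide by effective boolean value.
    Boolean,
}

/// Classify a predicate value. Only a singleton numeric value is
/// positional; empty and multi-item sequences go to the EBV path
/// instead of raising an error at this stage.
pub fn classify(value: &[Item]) -> PredicateKind {
    match value {
        [Item::Integer(n)] => PredicateKind::Positional(*n),
        _ => PredicateKind::Boolean,
    }
}
```

Under this model, a predicate value of two <c> nodes is classified as Boolean, which is the outcome //b[c] needs.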
Suggested fix
Replace the single-item atomization with a bounded iteration: if there is
exactly one atomized item, test whether it is numeric; if there are zero or
more than one, return false so the caller uses EBV.
    fn pop_is_numeric(&mut self) -> error::Result<bool> {
        let value = self.state.pop()?;
        let mut atomized = value.atomized(self.state.xot());
        let Some(first) = atomized.next() else {
            return Ok(false); // empty sequence is not numeric
        };
        if atomized.next().is_some() {
            return Ok(false); // multi-item sequence is not a numeric predicate
        }
        Ok(first?.is_numeric())
    }
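The same "exactly one item" check can be exercised outside the interpreter. The sketch below assumes the atomizing iterator yields Result items, as the first? in the fix implies. Atomic, Error and singleton_is_numeric are hypothetical stand-ins, not the interpreter's real types.

```rust
/// Hypothetical atomic value; `is_numeric` mirrors the method name used in the fix.
#[derive(Debug, Clone, PartialEq)]
pub enum Atomic {
    Integer(i64),
    Untyped(String),
}

impl Atomic {
    pub fn is_numeric(&self) -> bool {
        matches!(self, Atomic::Integer(_))
    }
}

/// Hypothetical error type standing in for the interpreter's error enum.
#[derive(Debug, PartialEq)]
pub enum Error {
    Atomization,
}

/// Returns Ok(true) only when the iterator yields exactly one item and
/// that item is numeric. At most two items are pulled from the iterator.
pub fn singleton_is_numeric<I>(mut atomized: I) -> Result<bool, Error>
where
    I: Iterator<Item = Result<Atomic, Error>>,
{
    let Some(first) = atomized.next() else {
        return Ok(false); // empty sequence is not numeric
    };
    if atomized.next().is_some() {
        return Ok(false); // two or more items: not a numeric predicate
    }
    // Only a lone item can surface its atomization error here.
    Ok(first?.is_numeric())
}
```

One consequence of this ordering is that an atomization error inside a multi-item sequence is not reported by the probe; it is left to whatever the EBV path does with the value.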
One existing snapshot changes accordingly — (1, 2, 3)[(2, 3)] now fails
with FORG0006 (from EBV on a non-node multi-item sequence) instead of
XPTY0004:
xee-xpath/tests/snapshots/xpath__sequence_predicate_sequence_too_long.snap
- error: XPTY0004,
+ error: FORG0006,
which matches the error XPath specifies when EBV is applied to a sequence of
two or more items whose first item is not a node.
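For reference, the EBV rules that produce this error can be modeled in a few lines. This is a simplified sketch of the fn:boolean rules, not the xee implementation: Item, EbvError and effective_boolean_value are hypothetical names, and the type set is reduced (no anyURI, untypedAtomic or decimal).

```rust
/// Toy item type covering the cases the EBV rules distinguish (hypothetical).
#[derive(Debug, Clone, PartialEq)]
pub enum Item {
    Node,
    Boolean(bool),
    Integer(i64),
    Double(f64),
    Text(String),
}

/// Hypothetical error type; the single variant models FORG0006.
#[derive(Debug, PartialEq)]
pub enum EbvError {
    Forg0006,
}

/// Simplified effective boolean value computation.
pub fn effective_boolean_value(seq: &[Item]) -> Result<bool, EbvError> {
    match seq {
        // Empty sequence is false.
        [] => Ok(false),
        // Any sequence whose first item is a node is true.
        [Item::Node, ..] => Ok(true),
        // Singleton atomic values have type-specific rules.
        [Item::Boolean(b)] => Ok(*b),
        [Item::Text(s)] => Ok(!s.is_empty()),
        [Item::Integer(n)] => Ok(*n != 0),
        [Item::Double(d)] => Ok(!(d.is_nan() || *d == 0.0)),
        // Everything else, including (2, 3), is an error.
        _ => Err(EbvError::Forg0006),
    }
}
```

In this model the predicate value (2, 3) falls into the final arm, while a sequence of two nodes is true, which is the behavior //b[c] relies on.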
Next steps
I'd like to open a PR with the fix above plus a regression test (//b[c] on
a document where some <b> has multiple <c> children).