XPath – text() should use normalization #21

FilipJirsak · 2017-01-09T14:07:56Z

An input document can have the content of one element split into multiple text nodes (the text() step then returns multiple nodes). XPath matching then doesn't work: it matches the text only against the first text node. Text nodes should be normalized before XPath matching.
Input text is split into multiple text nodes, for example, when it contains entities such as &lt; or &gt;.
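A minimal sketch of the symptom, using the standard JDK DOM API (org.w3c.dom) rather than the project's own classes, which this issue does not show. The element is built by hand as two adjacent Text nodes instead of being parsed, because whether a particular parser actually splits text around an entity is implementation-dependent. The class and helper names are hypothetical.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

public class SplitTextDemo {

    // Builds <element>a &lt; b</element> by hand as two adjacent Text nodes,
    // mimicking how a parser may split character data around an entity.
    static Element buildSplitElement() throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();
        Element el = doc.createElement("element");
        doc.appendChild(el);
        el.appendChild(doc.createTextNode("a "));
        el.appendChild(doc.createTextNode("< b"));
        return el;
    }

    // Value of the first child Text node only: all that a matcher
    // looking at just the first node returned by text() would compare.
    static String firstTextNode(Element e) {
        for (Node n = e.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.TEXT_NODE) {
                return n.getNodeValue();
            }
        }
        return "";
    }

    // Concatenation of all child Text nodes: for a text-only element this is
    // the value of the single Text node left after normalization.
    static String allTextNodes(Element e) {
        StringBuilder sb = new StringBuilder();
        for (Node n = e.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.TEXT_NODE) {
                sb.append(n.getNodeValue());
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        Element el = buildSplitElement();
        System.out.println(el.getChildNodes().getLength()); // 2
        System.out.println("[" + firstTextNode(el) + "]");  // [a ]
        System.out.println("[" + allTextNodes(el) + "]");   // [a < b]
    }
}
```

Comparing against firstTextNode sees only "a ", so a predicate expecting the whole string "a < b" cannot match.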


reluxa · 2018-10-30T09:20:08Z

I guess it belongs here: I was also trying to extract the text() of an element from an XML document. The element looked like the following:

<element>TOOXYZ:sometext /TOOXYZ:otherText</element>

The above XML fragment appears many times in the XML document I was processing. Interestingly, the last occurrence was not parsed correctly: text() returned two nodes, "TOOXYZ:sometext /TOO" and "XYZ:otherText".

FilipJirsak · 2018-11-06T10:55:07Z

@reluxa This is correct: there can be multiple adjacent text nodes. You can call document.normalize() to merge adjacent text nodes into one.
This issue is about XPath matching: normalization should probably be done automatically before XPath matching, because XPath expects a normalized document (in the XPath data model, a text node never has an adjacent text-node sibling).
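A sketch of the proposed behavior (normalize first, then evaluate), using the JDK's javax.xml.xpath API as a stand-in for the project's own XPath implementation, which is not shown here. The helper names selectNormalized and splitDocument are hypothetical; the document reproduces the two text nodes reported in the comment above.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class NormalizeBeforeXPath {

    // Hypothetical helper: merge adjacent Text nodes, then evaluate the expression.
    // Node.normalize() mutates the document in place.
    static NodeList selectNormalized(Document doc, String expression) throws Exception {
        doc.normalize();
        XPath xpath = XPathFactory.newInstance().newXPath();
        return (NodeList) xpath.evaluate(expression, doc, XPathConstants.NODESET);
    }

    // Builds <element> with the same two text nodes as in the report.
    static Document splitDocument() throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();
        Element el = doc.createElement("element");
        doc.appendChild(el);
        el.appendChild(doc.createTextNode("TOOXYZ:sometext /TOO"));
        el.appendChild(doc.createTextNode("XYZ:otherText"));
        return doc;
    }

    public static void main(String[] args) throws Exception {
        Document doc = splitDocument();
        NodeList hits = selectNormalized(doc,
                "//element[text()='TOOXYZ:sometext /TOOXYZ:otherText']");
        System.out.println(hits.getLength()); // 1
        // After normalize() the element holds a single Text node.
        System.out.println(doc.getDocumentElement().getChildNodes().getLength()); // 1
    }
}
```

Because normalize() changes the caller's document, doing it silently inside the XPath layer is a side effect; an alternative would be to treat runs of adjacent text nodes as one logical node during matching.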

FilipJirsak added the bug label Jan 9, 2017

FilipJirsak self-assigned this Jan 9, 2017

FilipJirsak added this to the 2.1.0 milestone Jun 17, 2017
