Frequently asked questions

Benito van der Zander edited this page Nov 25, 2017 · 5 revisions

Getting the last node

Q: How do I get the last node? //foo//bar returns all bars, but I only want the last one, and //foo//bar[last()] did not work.

<div>
  <foo>
    <bar>First </bar>
    <bar>Second </bar>
  </foo>
  <foo>
    <bar>Third </bar>
    <bar>Fourth </bar>
  </foo>
</div>

A: //foo//bar[last()] returns every bar that is the last bar of its own parent; in the example, Second and Fourth.

You need (//foo//bar)[last()] to get the last node of the whole result, here Fourth.
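
The difference is easy to check on a local copy of the example. The following is a minimal sketch, assuming xidel is on the PATH. The -s (silent) and --input-format options and the shell variable names are additions that do not appear in the question, and the whitespace inside the bar elements is dropped to keep the output tidy:

```shell
# Sample data from the question, without the trailing spaces in the text.
xml='<div><foo><bar>First</bar><bar>Second</bar></foo><foo><bar>Third</bar><bar>Fourth</bar></foo></div>'

# [last()] binds to the bar step, so it is evaluated once per parent.
last_per_parent=$(printf '%s' "$xml" | xidel -s --input-format=xml - -e '//foo//bar[last()]')

# With parentheses, [last()] applies to the whole sequence of matches.
last_overall=$(printf '%s' "$xml" | xidel -s --input-format=xml - -e '(//foo//bar)[last()]')

printf 'per parent:\n%s\n' "$last_per_parent"   # Second and Fourth, one per line
printf 'overall: %s\n' "$last_overall"          # Fourth
```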

Getting nodes with an attribute

Q: I want to extract the title attribute from links whose href contains the string "contentFile.aspx".

This command returns the href, but I do not know how to get the contents of the title attribute instead.

xidel 'http://www.coorong.sa.gov.au/page.aspx?u=1813' --xquery '//a/@href[contains(., "contentFile.aspx")]'

A: You can go back from the @href to the corresponding a:

xidel 'http://www.coorong.sa.gov.au/page.aspx?u=1813' --xquery '//a/@href[contains(., "contentFile.aspx")]/../@title'

Or you can put the condition on the a:

xidel 'http://www.coorong.sa.gov.au/page.aspx?u=1813' --xquery '//a[@href[contains(., "contentFile.aspx")]]/@title'

or

xidel 'http://www.coorong.sa.gov.au/page.aspx?u=1813' --xquery '//a[contains(@href, "contentFile.aspx")]/@title'
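
To compare the three variants without depending on the live site, they can be run against a small local page. This is only a sketch: the two-link page is made up for illustration, and -s and --input-format are options not used in the answer above. Each query should print the title of the first link only:

```shell
# Hypothetical two-link page; only the first href contains "contentFile.aspx".
html='<html><body><a href="contentFile.aspx?id=1" title="Annual report">report</a><a href="page.aspx?u=2" title="Other page">other</a></body></html>'

# Run the three queries from the answer above, one after another.
titles=$(
  for query in \
    '//a/@href[contains(., "contentFile.aspx")]/../@title' \
    '//a[@href[contains(., "contentFile.aspx")]]/@title' \
    '//a[contains(@href, "contentFile.aspx")]/@title'
  do
    printf '%s' "$html" | xidel -s --input-format=html - --xquery "$query"
  done
)

printf '%s\n' "$titles"
```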

Getting nodes containing text

Q: How do you find tags which include a certain text?

A: You can use contains or matches on these nodes, for example:

xidel input.html -e '//*[contains(., "searched text")]'

This finds every element that contains the text, together with all of its ancestors, because the string value of an element includes the text of its descendants.

To find only the innermost elements, test just the direct text nodes of each element:

xidel input.html -e '//*[text()[contains(., "searched text")]]'

This is also much faster. However, text that spans multiple nodes is not found: in <span>foo<b>bar</b></span>, either foo or bar can be found with text(), but not foobar.

When "searched text" is a regular expression, you can use matches in place of contains.
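
The behaviour described above can be observed on the span example. A minimal sketch, assuming xidel is on the PATH; the surrounding div, the query helper function, the variable names, and the -s and --input-format options are illustrative additions:

```shell
# The span example from above, wrapped in a div to show the ancestor effect.
xml='<div><span>foo<b>bar</b></span></div>'

# Small helper (hypothetical name) that runs one expression against $xml.
query() { printf '%s' "$xml" | xidel -s --input-format=xml - -e "$1"; }

# String value of the element: matches the span and its ancestor div.
with_ancestors=$(query '//*[contains(., "foobar")]/name()')

# Direct text nodes only: "foobar" spans two nodes, so the count is 0.
direct_foobar=$(query 'count(//*[text()[contains(., "foobar")]])')

# Direct text nodes only: "foo" is a text node of the span.
direct_foo=$(query '//*[text()[contains(., "foo")]]/name()')

# matches() takes a regular expression instead of a plain substring.
regex_hit=$(query '//*[text()[matches(., "^ba")]]/name()')

printf '%s\n' "$with_ancestors" "$direct_foobar" "$direct_foo" "$regex_hit"
```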

Replacing empty/null nodes

Q: How do I return a default value if the input is empty?

A: For inputs that have at most one value, use:

(input, "default value")[1]

[1] returns the first value of a sequence, so it returns input if input exists. If input is empty, the expression reduces to ("default value")[1], so it returns "default value".
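
As a quick check, the pattern can be tried with one path that matches and one that does not. This is a sketch under assumptions: the item document and the element names name and price are invented for the example, and -s and --input-format are options not used in the answer:

```shell
# Hypothetical document: it has a name element but no price element.
xml='<item><name>widget</name></item>'

# //price selects nothing, so the sequence collapses to ("default value").
missing=$(printf '%s' "$xml" | xidel -s --input-format=xml - -e '(//price, "default value")[1]')

# //name selects one node, which comes first in the sequence.
present=$(printf '%s' "$xml" | xidel -s --input-format=xml - -e '(//name, "default value")[1]')

printf '%s\n' "$missing" "$present"   # default value, then widget
```

For inputs that may hold several values, [1] would keep only the first of them; a conditional such as if (exists(input)) then input else "default value" is one way to keep them all.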

Deletion of nodes

Q: How do I delete the div from

<div>
    <span>I want to keep this</span>
    <div class="I_want_to_delete_this">
        <span>blah blah</span>
    </div>
    <span>I want to keep this too</span>
</div>

to get something like

<div>
    <span>I want to keep this</span>
    <span>I want to keep this too</span>
</div>

?

A: All data is immutable, so you cannot delete something from a document, but you can create a new document without these nodes.

For example, using the transform function:

 xidel xx.xml --xml -e 'let $delete := //div[@class="I_want_to_delete_this"] return transform(/, function($e){ if ($delete[$e is .]) then () else $e})' 

or

xidel xx.xml --xml -e 'let $delete := //div[@class="I_want_to_delete_this"] return transform(/, function($e){ $e[not($delete[$e is .])]})' 

Using Xidel in a shell pipeline | xidel

Q: Is there any way of processing output from another script in xidel, i.e. is there any option to tell xidel to grab the content like this: grep foobar test.html | xidel ...

A: If you pass a dash (-) as the file name, xidel reads its input from the pipe (standard input).

 grep foobar test.html | xidel - ...
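
A self-contained variant of this pipeline might look as follows. It is a sketch only: the three-line page stands in for test.html, the //li query is an arbitrary example, and -s and --input-format=html are options not mentioned in the answer:

```shell
# Hypothetical page content; only two lines contain "foobar".
page='<li>foobar one</li>
<li>something else</li>
<li>foobar two</li>'

# grep keeps the matching lines; xidel reads them from standard input via "-".
items=$(printf '%s\n' "$page" | grep foobar | xidel -s --input-format=html - -e '//li')

printf '%s\n' "$items"
```

Note that grep works line by line, so the fragment reaching xidel may not be well-formed; the HTML input format is used here because it tolerates such fragments.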

Caveats

Also look here for things to avoid: https://github.com/benibela/xidel/wiki/Caveats
