XPath

XPath is designed to make working with HTML/XML trees convenient. It is a terse, powerful language for specifying patterns that identify nodes in these trees. It is analogous to regular expressions, but for trees. (Some people use regular expressions for working with HTML/XML trees. Don't!)

XPath is a domain-specific language (DSL), or sub-language, that is available in R, Python, etc. We specify our pattern/query within a string, and R never interprets it. Instead, a separate XPath engine applies it to the particular tree.
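
As a minimal sketch of this division of labor, here is the same idea in Python rather than R. The sample document is invented for illustration, and the standard library's `xml.etree.ElementTree` implements only a small subset of XPath. Python sees the query as an ordinary string; the library's path engine interprets it against the tree.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, made up for this sketch.
doc = """
<html>
  <body>
    <table id="prices">
      <tr><td>Tea</td><td>$3</td></tr>
    </table>
    <div>
      <table id="nested"><tr><td>Coffee</td></tr></table>
    </div>
  </body>
</html>
"""

root = ET.fromstring(doc)

# To Python this is just a string; it is never parsed as Python code.
query = ".//table"

# The path engine inside ElementTree interprets the string against the tree.
tables = root.findall(query)
table_ids = [t.get("id") for t in tables]
print(table_ids)
```

ElementTree evaluates a path relative to the element it is called on, which is why the query starts with `.` here.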

As the name suggests, an XPath pattern/query specifies a path.

  • The path starts from a node (by default the topmost node, or root, of the tree).
  • A path is a sequence of steps, separated by a /.

XPath queries are like navigating a file system, but much more expressive and succinct.
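
The file-system analogy can be sketched as follows, in Python with ElementTree's limited XPath subset (the sample document is invented). The first query names one step per level, like a directory path; the second uses `//` to skip any number of levels.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, made up for this sketch.
root = ET.fromstring(
    "<html><body>"
    "<table><tr><td>a</td><td>b</td></tr></table>"
    "<div><table><tr><td>c</td></tr></table></div>"
    "</body></html>"
)

# One step per level, like walking directories: body -> table -> tr -> td.
# Only the table that is a direct child of body is reached.
stepwise = [td.text for td in root.findall("./body/table/tr/td")]

# '//' skips any number of levels, so the nested table's cell is found too.
anywhere = [td.text for td in root.findall(".//td")]

print(stepwise)
print(anywhere)
```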

Each step in an XPath query has three elements, the last of which is optional:

  • axis/direction
  • node test (name or type)
  • condition/predicate

An XPath query

Path - a sequence of steps
Each step - direction/axis, node test, optional condition

Long-hand - axis::test[condition], e.g. descendant-or-self::table
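
To make the three parts of a long-hand step concrete, the sketch below splits a step string into its axis, node test, and optional condition. `split_step` is a hypothetical helper written for this illustration, not part of any XPath library, and the pattern is deliberately simplified: it handles a single step with at most one predicate and no nested brackets.

```python
import re

# Matches one long-hand step of the form axis::test[condition].
# Simplified for illustration: no nested brackets, no multiple predicates,
# and no abbreviated steps.
STEP = re.compile(
    r"^(?P<axis>[a-z-]+)::(?P<test>[^\[\]]+)(?:\[(?P<condition>.*)\])?$"
)

def split_step(step):
    """Return (axis, node test, condition or None) for one long-hand step."""
    m = STEP.match(step.strip())
    if m is None:
        raise ValueError(f"not a long-hand step: {step!r}")
    cond = m.group("condition")
    return (
        m.group("axis"),
        m.group("test").strip(),
        cond.strip() if cond is not None else None,
    )

print(split_step("descendant-or-self::table"))
print(split_step("child::td[ contains(., '$') ]"))
```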

Short-hand - //table, where // means descendant-or-self (strictly, // abbreviates /descendant-or-self::node()/)

@attr - short for attribute::attr

. - current node
.. - parent node
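
The abbreviations above can be read as a lookup table. The sketch below expands a simple abbreviated path into long-hand form using the XPath 1.0 rules: `//` stands for `/descendant-or-self::node()/`, `.` for `self::node()`, `..` for `parent::node()`, `@attr` for `attribute::attr`, and a bare name uses the default `child` axis. `expand_step` and `expand_path` are hypothetical helpers for illustration only, and they assume the path has no predicates.

```python
def expand_step(step):
    """Expand one abbreviated, predicate-free step to long-hand form."""
    if step == ".":
        return "self::node()"
    if step == "..":
        return "parent::node()"
    if step.startswith("@"):
        return "attribute::" + step[1:]
    if "::" in step:
        # Already long-hand; leave it alone.
        return step
    # A bare name uses the default axis, which is child.
    return "child::" + step

def expand_path(path):
    """Expand a simple abbreviated path (no predicates) to long-hand form."""
    # '//' abbreviates '/descendant-or-self::node()/'.
    path = path.replace("//", "/descendant-or-self::node()/")
    # Empty pieces come from a leading '/', and are kept as they are.
    return "/".join(expand_step(s) if s else s for s in path.split("/"))

print(expand_path("//table"))
print(expand_path("//a/@href"))
print(expand_path("../td"))
```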

All td nodes whose contents contain the character $ - //td[ contains(., '$') ]
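
A full XPath engine evaluates this predicate directly. Python's standard-library ElementTree does not support `contains()`, so the sketch below emulates the query by hand to show what the predicate tests: inside `contains()`, `.` is converted to the node's string value, which is all of the text inside the node joined together. The sample table and the `string_value` helper are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, made up for this sketch.
root = ET.fromstring(
    "<table>"
    "<tr><td>Tea</td><td>$3</td></tr>"
    "<tr><td>Coffee</td><td>about <b>$4</b></td></tr>"
    "</table>"
)

def string_value(elem):
    """Concatenate all text inside elem, including text in child elements."""
    return "".join(elem.itertext())

# Emulates //td[ contains(., '$') ]: visit every td, keep those that pass.
matches = [string_value(td) for td in root.iter("td") if "$" in string_value(td)]
print(matches)
```

The second match shows why the string value matters: the `$` sits inside a nested `b` element, yet the enclosing `td` still qualifies.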

Useful References