<!DOCTYPE html>
<html>
<head>
<title>XPath.md</title>
<link rel="stylesheet" href="OmegaTech.css">
</head>
<body>
<h1 id="xpath">XPath</h1>
<p>XPath is designed to make working with HTML/XML trees convenient.<br>It is a terse, powerful language for working with these trees.<br>It allows us to specify patterns for identifying nodes in trees.<br>It is analogous to regular expressions, but for trees.<br>(Some people use regular expressions for working with HTML/XML trees. Don't!)</p>
<p>XPath is a domain-specific language (DSL), or sub-language, that<br>is available in R, Python, etc.<br>We specify our pattern/query within a string, and R itself never interprets it.<br>Instead, a separate XPath engine applies it to the particular tree.</p>
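<p>The same division of labour can be sketched in Python. This is a minimal illustration, assuming the standard library's <code>xml.etree.ElementTree</code>, which implements only a limited subset of XPath; the document and its <code>id</code> values are invented for the example.</p>

```python
import xml.etree.ElementTree as ET

# A small document invented for this illustration.
doc = ET.fromstring(
    "<html><body>"
    "<table id='prices'><tr><td>$5</td></tr></table>"
    "<div><table id='notes'><tr><td>n/a</td></tr></table></div>"
    "</body></html>"
)

# The query is an ordinary string; Python itself does not interpret it.
query = ".//table"

# The library's path engine applies the pattern to this particular tree.
tables = doc.findall(query)
table_ids = [t.get("id") for t in tables]
print(table_ids)
```

<p>A fuller XPath engine (for example, the third-party lxml package) is used the same way: the host language only hands over the string.</p>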
<p>As the name suggests, an XPath pattern/query<br>specifies a path. </p>
<ul>
<li>The path starts from a node (by default the root, the topmost node of the tree).</li>
<li>A path is a sequence of steps, separated by <code>/</code>.</li>
</ul>
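<p>The two points above can be sketched in Python. This assumes the standard library's <code>xml.etree.ElementTree</code>, where paths are written relative to an element (a leading <code>.</code>) rather than as absolute paths; the <code>library</code>/<code>shelf</code>/<code>book</code> document is hypothetical.</p>

```python
import xml.etree.ElementTree as ET

# Hypothetical document used only for this sketch.
root = ET.fromstring(
    "<library>"
    "<shelf><book>A</book><book>B</book></shelf>"
    "<shelf><book>C</book></shelf>"
    "</library>"
)

# Two steps separated by "/": first to each shelf child, then to each book child.
titles_from_root = [b.text for b in root.findall("./shelf/book")]

# The same kind of path can start from any other node, here the second shelf.
second_shelf = root.findall("./shelf")[1]
titles_from_shelf = [b.text for b in second_shelf.findall("./book")]

print(titles_from_root, titles_from_shelf)
```

<p>Changing the starting node changes what the same steps reach, much like changing the working directory changes what a relative file path refers to.</p>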
<p>XPath queries are like navigating a file system, but much more expressive<br>and succinct.</p>
<p>Each step in an XPath query has three parts, the last of which is optional:</p>
<ul>
<li>axis/direction</li>
<li>node test (name or type)</li>
<li>condition/predicate</li>
</ul>
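<p>To make the three parts concrete, here is a small sketch that pulls a long-hand step string apart. The helper <code>split_step</code> is hypothetical and written only for illustration; it handles the simple <code>axis::test[condition]</code> shape and is not a parser for the full XPath grammar.</p>

```python
def split_step(step):
    """Split a long-hand step 'axis::test[condition]' into its three parts.

    Illustrative helper only: it assumes at most one trailing [condition].
    """
    axis, sep, rest = step.strip().partition("::")
    if not sep:
        raise ValueError(f"not a long-hand step: {step!r}")
    test, bracket, cond = rest.partition("[")
    if bracket:
        cond = cond.rstrip()
        if not cond.endswith("]"):
            raise ValueError(f"unclosed condition in: {step!r}")
        cond = cond[:-1].strip()  # drop the closing bracket
    else:
        cond = None  # the condition is the optional part
    return axis.strip(), test.strip(), cond

parts_plain = split_step("descendant-or-self::table")
parts_pred = split_step("child::td[ contains(., '$') ]")
print(parts_plain)
print(parts_pred)
```

<p>The first call returns no condition, which reflects that the predicate is the optional part of a step.</p>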
<p>An XPath path is a sequence of steps.<br>Each step: direction/axis, node test, optional condition</p>
<p> Long-hand<br> <code>axis::test[condition]</code><br> e.g.<br> <code>descendant-or-self::table</code></p>
<p> Shorthand<br> <code>//table</code><br> <code>//</code> is short for <code>/descendant-or-self::node()/</code>, so the step after it matches at any depth</p>
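<p>The difference between a single <code>/</code> and <code>//</code> can be seen in a short sketch. It assumes Python's standard <code>xml.etree.ElementTree</code>, which accepts the abbreviated <code>//</code> form (written relative to an element as <code>.//</code>) but not the long-hand axis names; the document is invented for the example.</p>

```python
import xml.etree.ElementTree as ET

# Invented document: one table directly under body, one nested deeper.
body = ET.fromstring(
    "<body>"
    "<table id='top'/>"
    "<div><section><table id='nested'/></section></div>"
    "</body>"
)

# A single "/" takes one child step: only tables directly under body.
direct = [t.get("id") for t in body.findall("./table")]

# "//" searches the whole subtree: tables at any depth below body.
anywhere = [t.get("id") for t in body.findall(".//table")]

print(direct, anywhere)
```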
<p> <code>@attr</code> is short for <code>attribute::attr</code></p>
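<p>A minimal sketch of <code>@attr</code> in use, assuming Python's standard <code>xml.etree.ElementTree</code>. That module accepts <code>@attr</code> only inside a condition, so the attribute value is read afterwards with <code>.get()</code>; a full XPath engine can also select the attribute nodes directly with a final <code>@href</code> step. The links and class name are made up for the example.</p>

```python
import xml.etree.ElementTree as ET

# Made-up page fragment: three anchors, one without an href.
page = ET.fromstring(
    "<body>"
    "<a href='https://example.org/a'>first</a>"
    "<a>no link</a>"
    "<a href='https://example.org/b' class='ext'>second</a>"
    "</body>"
)

# [@href] keeps only the anchor nodes that have an href attribute.
with_href = page.findall(".//a[@href]")
links = [a.get("href") for a in with_href]

# [@class='ext'] additionally tests the attribute's value.
external = [a.text for a in page.findall(".//a[@class='ext']")]

print(links, external)
```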
<pre><code>.  - current node
.. - parent node
</code></pre>
<p> All <code>&lt;td&gt;</code> nodes whose contents contain the character <code>$</code><br> <code>//td[ contains(., '$') ]</code></p>
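<p>Python's standard <code>xml.etree.ElementTree</code> does not implement the <code>contains()</code> function, so the sketch below emulates the query in plain Python and then uses <code>..</code> in a path the module does support. The table contents are invented; with a full XPath 1.0 engine the query string above could be passed as written.</p>

```python
import xml.etree.ElementTree as ET

# Invented table for this sketch.
table = ET.fromstring(
    "<table>"
    "<tr id='r1'><td>Widget</td><td>$5.00</td></tr>"
    "<tr id='r2'><td>Gadget</td><td>n/a</td></tr>"
    "</table>"
)

# Emulates //td[ contains(., '$') ]: "." stands for the td's full text content.
dollar_cells = [
    td for td in table.iter("td")
    if "$" in "".join(td.itertext())
]
dollar_text = [td.text for td in dollar_cells]

# ".." steps up to the parent: each row that holds at least one td.
rows = [tr.get("id") for tr in table.findall(".//td/..")]

print(dollar_text, rows)
```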
<h1 id="useful-references">Useful References</h1>
<ul>
<li><a href="https://www.w3schools.com/xml/xpath_intro.asp">https://www.w3schools.com/xml/xpath_intro.asp</a></li>
<li><a href="https://www.w3schools.com/xml/xpath_axes.asp">https://www.w3schools.com/xml/xpath_axes.asp</a></li>
</ul>
</body>
</html>