# Path Languages

> [Main Table of Contents](../README.md)

## In This Notebook
> Generic mini-languages for navigating structured text, e.g. XML, the DOM, HTML

- xpath vs css locators
- xpath: xml path language
- CSS locators


```html
<!-- Example Structure -->
<html>
<head>
	<base href='http://example.com/' />
	<title>Example website</title>
</head>
<body>
	<div id='images'>
		<a href='image1.html'>Name: My image 1 <br /><img src='image1_thumb.jpg' alt='image1'/></a>
		<a href='image2.html'>Name: My image 2 <br /><img src='image2_thumb.jpg' alt='image2'/></a>
		<a href='image3.html'>Name: My image 3 <br /><img src='image3_thumb.jpg' alt='image3'/></a>
		<a href='image4.html'>Name: My image 4 <br /><img src='image4_thumb.jpg' alt='image4'/></a>

	</div>
	<div>
		<p class='p_styles class-1'>I am p1</p>
		<p id='p2'>
			I am p2
			<a href='image5.html'>Name: My image 5<br /><img src='image5_thumb.jpg' alt='image5'/></a>
		</p>
	</div>
</body>
</html> 
```

## xpath vs css locators

- xpath makes it easier to locate a specific child by position
- xpath makes it easier to locate all elements that carry an attribute, regardless of the attribute's value
- css makes it easier to locate elements by attribute value (id or class)

	Description | xpath | css | Example (xpath, then css)
	--- | --- | --- | ---
	All children of type elem | /elem | ><br>(not at the start of the selector) | /html/body/div<br><br>html > body > div
	All descendants of type elem | //elem | space<br>(not at the start of the selector) | //div/span//p<br><br>div > span p
	Specific child element | elem[#] | elem:nth-of-type(#) | /html/body//div/p[2]<br><br>html > body div > p:nth-of-type(2)
	Attribute attr_name of every elem | elem/@attr_name | elem::attr(attr_name) | //a/@href<br><br>a::attr(href)
	All elements with a given attribute value | elem[@attr_name="attr_val"] | elem#id_name<br>elem.class_name | //p[@id="p2"]<br><br>p#p2
	Text directly inside elem | elem/text() | elem::text | //p[@id="p2"]/text()<br><br>p#p2::text
	Text inside elem and all of its descendants | elem//text() | elem ::text | //p[@id="p2"]//text()<br><br>p#p2 ::text

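The correspondences in the table can be sketched as a tiny translator from CSS to xpath. This is only an illustration under stated assumptions: `css_to_xpath` is a hypothetical helper, it understands nothing beyond a tag name, `#id`, `.class`, `>` and whitespace, and real translators (the `cssselect` package, for example) handle far more.

```python
import re

# One selector part: optional tag, optional #id, optional .class (deliberately minimal).
_PART = re.compile(r"^(?P<tag>[\w-]+|\*)?(?:#(?P<id>[\w-]+))?(?:\.(?P<cls>[\w-]+))?$")


def css_to_xpath(css: str) -> str:
    """Translate a very small CSS subset into an equivalent xpath string."""
    # Put spaces around '>' so that split() yields clean tokens.
    tokens = re.sub(r"\s*>\s*", " > ", css.strip()).split()
    xpath = ""
    axis = "//"  # the first part may match anywhere in the document
    for token in tokens:
        if token == ">":
            axis = "/"  # child combinator
            continue
        m = _PART.match(token)
        if m is None:
            raise ValueError(f"unsupported selector part: {token!r}")
        step = m["tag"] or "*"
        if m["id"]:
            step += f'[@id="{m["id"]}"]'
        if m["cls"]:
            # Compare whole class tokens, the way a CSS class selector does.
            step += f'[contains(concat(" ", normalize-space(@class), " "), " {m["cls"]} ")]'
        xpath += axis + step
        axis = "//"  # plain whitespace means "descendant"
    return xpath


print(css_to_xpath("html > body > div"))
print(css_to_xpath("div > span p"))
print(css_to_xpath("p#p2"))
```

The first part is translated with `//` because a CSS selector may start matching anywhere in the document, whereas an xpath beginning with a single `/` is anchored at the root.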

- Three different ways of matching on the value of the class attribute

	```python
	# Candidate elements:
	# 1. <p class="class-1">
	# 2. <p class="class-1 class-2">
	# 3. <p class="class-12"> 
	```

	Language | Example | Matches
	--- | --- | ---
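	xpath | p[@class="class-1"] | only 1 (exact match on the whole attribute string)
	xpath | p[contains(@class, "class-1")] | all 3 (substring match, so "class-12" also matches)
	css | p.class-1 | 1 and 2 (match on whitespace-separated class names)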

## Xpath
	
- Notations

	Notation | Description
	--- | ---
	*| Wildcard: matches any element
	/elem | All children of type elem
	//elem | All descendants of type elem
	@attr_name | Select the attribute named attr_name
	elem[#] | Select a specific child by position (1-indexed, not 0-indexed)
	elem[fn_name()] | Use functions in selection
	elem[@attr_name="attr_val"] | Select by attribute/value


- Functions

	Function | Description
	--- | ---
	contains(@attr_name, 'str') | Match when the attribute value contains 'str' anywhere as a substring
	elem/text() | Select the text nodes directly inside elem

```python
# Select the href attribute of every <a> that is a child of the <p> with id="p2"
xpath = '//p[@id="p2"]/a/@href'

# Select the second <p> inside the second <div> under <body>
xpath = '/html/body/div[2]/p[2]'
```

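Both expressions can be tried against the example structure with nothing but the standard library. This is a sketch under two assumptions: the document is well-formed XML (the sample is, once trimmed), and `xml.etree.ElementTree` is enough, even though it implements only a subset of xpath. It has no `/@href` step and resolves paths relative to the root element, so the expressions are adapted slightly.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the example structure (well-formed XML, which ElementTree requires).
DOC = """
<html>
<body>
  <div id='images'>
    <a href='image1.html'>Name: My image 1</a>
  </div>
  <div>
    <p class='p_styles class-1'>I am p1</p>
    <p id='p2'>I am p2 <a href='image5.html'>Name: My image 5</a></p>
  </div>
</body>
</html>
"""

root = ET.fromstring(DOC.strip())  # root is the <html> element

# Stand-in for '//p[@id="p2"]/a/@href': select the <a> elements, then read the attribute.
hrefs = [a.get("href") for a in root.findall(".//p[@id='p2']/a")]

# Stand-in for '/html/body/div[2]/p[2]': the path is written relative to <html>.
second_p = root.find("./body/div[2]/p[2]")

print(hrefs)               # ['image5.html']
print(second_p.get("id"))  # p2
```

A full xpath engine would accept the original expressions unchanged.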
## CSS Locators

- Notations

	Notation | Description
	--- | ---
	*| Wildcard: matches any element
	\> elem | All children of type elem
	space elem | All descendants of type elem
	.class_name | Select all elements whose class list includes class_name
	\#id_name | Select the element whose id is id_name
	::text | Select text between tags (a Scrapy/parsel extension, not standard CSS)

- Functions

	Function | Description
	--- | ---
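	elem:nth-of-type(#) | Select the #-th elem among its siblings of the same type (1-indexed)
	::attr(attr_name) | Select the value of the attribute attr_name (a Scrapy/parsel extension, not standard CSS)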