## Section 8: Selectolax And Advanced CSS Selectors

In [61]:
# !pip install selectolax

In [62]:
import requests
from selectolax.lexbor import LexborHTMLParser
from pprint import pprint

In [63]:
url = "https://books.toscrape.com/"
resp = requests.get(url, timeout=10)
resp.raise_for_status()  # fail early on HTTP errors instead of parsing an error page
tree = LexborHTMLParser(resp.text)

In [64]:
node = tree.css("div")[0]

In [65]:
type(node)

selectolax.lexbor.LexborNode

In [66]:
node.attributes

{'class': 'page_inner'}

In [67]:
node.text()

'\n            \n                Books to Scrape We love being scraped!\n\n\n                \n            \n        '

In [68]:
node.text(deep=False)

'\n            \n        '

In [69]:
pprint(node.html)

('<div class="page_inner">\n'
 '            <div class="row">\n'
 '                <div class="col-sm-8 h1"><a href="index.html">Books to '
 'Scrape</a><small> We love being scraped!</small>\n'
 '</div>\n'
 '\n'
 '                \n'
 '            </div>\n'
 '        </div>')


### CSS Combinators

- ` ` (descendant combinator, written as whitespace): `.container div` selects every `<div>` nested at any depth inside an element with the class `container`.

- `>` (child combinator): `ul > li` selects only the `<li>` elements that are direct children of a `<ul>`.

- `+` (adjacent sibling combinator): `.parent + .sibling` selects an element with the class `sibling` only when it immediately follows an element with the class `parent` (both sharing the same parent).

- ` ` (descendant combinator, with classes): `.ancestor .descendant` selects every element with the class `descendant` that sits anywhere inside an element with the class `ancestor`.

In [70]:
# test.html is a local file; its contents are not shown in this notebook
with open("test.html", encoding="utf-8") as f:
    html = f.read()
tree = LexborHTMLParser(html)

In [72]:
divs_in_container = tree.css(".container div")
lis_in_ul = tree.css("ul > li")
sibling_of_parent = tree.css(".parent + .sibling")
descendant_of_ancestor = tree.css(".ancestor .descendant")

print([node.text() for node in divs_in_container])
print([node.text() for node in lis_in_ul])
print([node.text() for node in sibling_of_parent])
print([node.text() for node in descendant_of_ancestor])

['Element 1', 'Element 2']
['Item 1', 'Item 2']
['Sibling Element']
['Descendant Element']


### Types of selectors

1. **Simple Selectors**: These selectors target elements based on a single condition or criterion, such as tag name, class, ID, or attribute.

   ```python
   # Simple selector: Tag name
   tree.css('div')

   # Simple selector: Class
   tree.css('.container')

   # Simple selector: ID
   tree.css('#my-element')

   # Simple selector: Attribute (any <a> that has an href)
   tree.css('a[href]')
   ```

2. **Compound Selectors**: These selectors combine multiple simple selectors together. They target elements that satisfy all the specified conditions.

   ```python
   # Compound selector: Tag name and class
   tree.css('div.container')

   # Compound selector: Tag name, attribute, and class
   tree.css('a[href].external')

   # Compound selector: Two classes on the same element
   tree.css('.container.highlight')
   ```

3. **Complex Selectors**: These selectors target elements based on their relationships with other elements, such as parent-child, sibling, or ancestor-descendant relationships.

   ```python
   # Complex selector: Descendant
   tree.css('div span')

   # Complex selector: Child
   tree.css('ul > li')

   # Complex selector: Adjacent sibling
   tree.css('.parent + .sibling')

   # Complex selector: Descendant, with class selectors on both sides
   tree.css('.ancestor .descendant')
   ```

4. **Selector Lists**: Selector lists are collections of multiple selectors separated by commas. They target elements that match any of the individual selectors within the list.

   ```python
   # Selector list: Tag names
   tree.css('h1, h2, h3')

   # Selector list: Classes
   tree.css('.primary, .secondary')

   # Selector list: Attribute values
   tree.css('input[type="text"], input[type="email"]')
   ```