# Beautiful Soup

Beautiful Soup is a Python library for extracting data from HTML and XML documents, which makes it a common choice for web scraping. It parses a document into a tree that you can navigate and search. The examples below cover parsing a document and searching for elements. Install it with pip:

```
pip install beautifulsoup4
```

Parsing HTML:
Beautiful Soup parses an HTML document into a parse tree that you can navigate and search.

In the example below, we import the BeautifulSoup class from the bs4 module and create a BeautifulSoup object by passing the HTML document and the name of the parser (html.parser, which ships with Python's standard library).

We can then reach elements with dot notation (soup.tag_name), which returns the first tag with that name, and chain it to walk down the tree (soup.div.p). The example prints the text of the title, heading, and paragraph, and the href attribute of the link.


```python
from bs4 import BeautifulSoup

# HTML document to be parsed
html_doc = '''
<html>
<head>
    <title>Sample HTML Document</title>
</head>
<body>
    <h1>Welcome to Beautiful Soup</h1>
    <div class="content">
        <p>This is a sample paragraph.</p>
        <a href="https://www.example.com">Visit Example</a>
    </div>
</body>
</html>
'''

# Create a Beautiful Soup object
soup = BeautifulSoup(html_doc, 'html.parser')

# Access elements; dot notation returns the first matching tag
title = soup.title
h1 = soup.h1
paragraph = soup.div.p
link = soup.div.a

# Print the text of each element and the link's href attribute
print('Title:', title.text)
print('Heading:', h1.text)
print('Paragraph:', paragraph.text)
print('Link:', link['href'])
```


Output:

```
Title: Sample HTML Document
Heading: Welcome to Beautiful Soup
Paragraph: This is a sample paragraph.
Link: https://www.example.com
```
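
Dot notation is only one way to move through the tree. The following is a minimal sketch, assuming a trimmed-down copy of the sample document above, that moves up with .parent and sideways with find_next_sibling(), and reads attributes with .get() so that a missing attribute yields None instead of raising KeyError. The variable names are illustrative only.

```python
from bs4 import BeautifulSoup

# Assumption: a shortened copy of the sample document used above
html_doc = '''
<div class="content">
    <p>This is a sample paragraph.</p>
    <a href="https://www.example.com">Visit Example</a>
</div>
'''

soup = BeautifulSoup(html_doc, 'html.parser')

paragraph = soup.p

# .parent moves up to the enclosing tag
parent_name = paragraph.parent.name

# find_next_sibling() skips whitespace text nodes and returns the next matching tag
next_link = paragraph.find_next_sibling('a')

# .get() returns None when the attribute is absent; link['title'] would raise KeyError
href = next_link.get('href')
missing = next_link.get('title')

# Dot notation returns None when the document has no such tag
no_table = soup.table

print('Parent:', parent_name)
print('Link:', href)
print('Missing attribute:', missing)
print('Missing tag:', no_table)
```

Checking for None before reading .text or an attribute is a simple way to keep a scraper from crashing when a page lacks an expected element.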


Searching for Elements:
Beautiful Soup offers several ways to search the tree, including find() and find_all() for matching by tag name and attributes, and select_one() and select() for CSS selectors.

In the example below, select_one() returns the first element matching a CSS selector (h1 here), and we read the heading's text from it.

select() returns a list of all elements matching a CSS selector (.book here, meaning every element with the class book). We iterate over the results and print the text of each book.

```python
from bs4 import BeautifulSoup

# HTML document to be parsed
html_doc = '''
<html>
<body>
    <div id="content">
        <h1>Books</h1>
        <ul>
            <li class="book">Book 1</li>
            <li class="book">Book 2</li>
            <li class="book">Book 3</li>
        </ul>
    </div>
</body>
</html>
'''

# Create a Beautiful Soup object
soup = BeautifulSoup(html_doc, 'html.parser')

# Find elements using CSS selectors
title = soup.select_one('h1').text
books = soup.select('.book')

# Print the results
print('Title:', title)
print('Books:')
for book in books:
    print('-', book.text)
```


Output:

```
Title: Books
Books:
- Book 1
- Book 2
- Book 3
```
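
The same lookups can also be written without CSS selectors. The sketch below assumes a shortened copy of the book list above and uses find() and find_all(), which match on tag name and attributes. The variable names are illustrative only.

```python
from bs4 import BeautifulSoup

# Assumption: a shortened copy of the book list used above
html_doc = '''
<div id="content">
    <h1>Books</h1>
    <ul>
        <li class="book">Book 1</li>
        <li class="book">Book 2</li>
        <li class="book">Book 3</li>
    </ul>
</div>
'''

soup = BeautifulSoup(html_doc, 'html.parser')

# find() returns the first matching tag, or None if nothing matches
heading = soup.find('h1')

# class_ has a trailing underscore because "class" is a reserved word in Python
books = soup.find_all('li', class_='book')

# Other attributes, such as id, can be passed as keyword arguments
container = soup.find(id='content')

# get_text(strip=True) removes leading and trailing whitespace
titles = [book.get_text(strip=True) for book in books]

print('Heading:', heading.get_text())
print('Books:', titles)
print('Container tag:', container.name)
```

Whether to use find_all() or select() is largely a matter of preference: CSS selectors tend to be more compact for nested conditions, while keyword arguments can be easier to build up programmatically.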


These examples cover only a small part of Beautiful Soup. The library also provides methods for moving between parents, children, and siblings and for modifying the parse tree. See the Beautiful Soup documentation for details.