# Web Scraping with Python and BeautifulSoup

In this notebook, we will explore web scraping using Python and the BeautifulSoup library.
BeautifulSoup allows you to parse HTML and XML documents, making it easy to extract useful data from web pages.

## Part 1 - Checking Python Version
Before we start, let's print the Python version we are running to confirm it is compatible with BeautifulSoup (Python 3.10 recommended).

In [None]:
import sys
print(sys.version)
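
Reading the version string by eye works, but the check can also be made explicit. The cell below is a minimal sketch that compares `sys.version_info` against a minimum version. The helper name `python_is_at_least` is hypothetical, and the 3.10 threshold simply mirrors the recommendation in this notebook.

```python
import sys

# Hypothetical helper: True if the running interpreter is at least major.minor
def python_is_at_least(major, minor):
    return sys.version_info >= (major, minor)

# 3.10 mirrors the version recommended in this notebook; adjust as needed
if python_is_at_least(3, 10):
    print("Python version OK:", sys.version.split()[0])
else:
    print("Consider upgrading; currently running", sys.version.split()[0])
```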

## Part 2 - Importing BeautifulSoup
BeautifulSoup is part of the `bs4` package. Let's import it.

In [None]:
from bs4 import BeautifulSoup
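
If the import fails, the package is probably not installed. The following sketch, which assumes the package is installed from PyPI under the name `beautifulsoup4`, reports the installed version or prints an install hint. The variable `bs4_version` is introduced here for illustration only.

```python
# Try to import bs4 and record its version; fall back to None if it is missing
try:
    import bs4
    bs4_version = bs4.__version__
except ImportError:
    bs4_version = None

if bs4_version is None:
    print("bs4 is not installed; try: pip install beautifulsoup4")
else:
    print("BeautifulSoup version:", bs4_version)
```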

## Part 3 - Creating a BeautifulSoup Object
A BeautifulSoup object represents the entire HTML document. It allows us to navigate and search the document easily.

Let's create a sample HTML document and parse it using BeautifulSoup.

In [None]:
our_html_document = '''
<html><head><title>IoT Articles</title></head>
<body>
<p class='title'><b>2018 Trends: Best New IoT Device Ideas for Data Scientists and Engineers</b></p>
<p class='description'>It’s almost 2018 and IoT is on the cusp of an explosive expansion...</p>
</body></html>
'''

In [None]:
# Parse the HTML document
our_soup_object = BeautifulSoup(our_html_document, 'html.parser')
print(our_soup_object)

In [None]:
# Prettify the soup object for better readability
print(our_soup_object.prettify()[0:300])
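
Beyond printing the markup, a parsed document can also give back just its text. The cell below is a minimal sketch on a shortened, single-line copy of the sample document. The names `sample_html` and `sample_soup` are illustrative and not part of the notebook's own variables.

```python
from bs4 import BeautifulSoup

# Shortened, single-line copy of the sample document (illustrative only)
sample_html = (
    "<html><head><title>IoT Articles</title></head>"
    "<body><p class='title'><b>Best New IoT Device Ideas</b></p></body></html>"
)
sample_soup = BeautifulSoup(sample_html, 'html.parser')

# .string returns the single text node inside a tag
print(sample_soup.title.string)
print(sample_soup.p.b.string)

# get_text() collects all text in the document; strip=True trims whitespace
# and separator=' ' joins the pieces with a single space
print(sample_soup.get_text(separator=' ', strip=True))
```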

## Part 4 - Tag Objects
Tag objects correspond to HTML/XML elements. You can access their name, attributes, and contents.

In [None]:
soup_object = BeautifulSoup('<h1 attribute_1="Heading Level 1">Future Trends for IoT in 2018</h1>', 'html.parser')
tag = soup_object.h1
type(tag)

In [None]:
print(tag)

In [None]:
tag.name

In [None]:
# Rename the tag
tag.name = 'heading1'
tag

In [None]:
tag.name
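
Renaming a tag changes the parse tree itself, not just the `Tag` variable. The sketch below repeats the rename on a fresh object (the names `demo_soup` and `demo_tag` are illustrative) and shows that the old name can no longer be found.

```python
from bs4 import BeautifulSoup

demo_soup = BeautifulSoup('<h1>Future Trends for IoT in 2018</h1>', 'html.parser')
demo_tag = demo_soup.h1
demo_tag.name = 'heading1'

# The markup generated from the soup object now uses the new name
print(demo_soup)

# No <h1> remains, so attribute access by the old name returns None
print(demo_soup.h1)

# The renamed tag is reachable under its new name
print(demo_soup.heading1)
```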

### Tag Attributes
You can access and modify tag attributes using dictionary-like syntax.

In [None]:
soup_object = BeautifulSoup('<h1 attribute_1="Heading Level 1">Future Trends for IoT in 2018</h1>', 'html.parser')
tag = soup_object.h1
tag

In [None]:
tag['attribute_1']

In [None]:
tag.attrs

In [None]:
# Add a new attribute
tag['attribute_2'] = 'Heading Level 1*'
tag.attrs

In [None]:
# Delete attributes
del tag['attribute_2']
del tag['attribute_1']
tag.attrs
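
Once an attribute is deleted, square-bracket access raises a `KeyError`, just as it would for a dictionary. The sketch below, using the illustrative names `attr_soup` and `attr_tag`, shows the safer `get()` and `has_attr()` alternatives.

```python
from bs4 import BeautifulSoup

attr_soup = BeautifulSoup(
    '<h1 attribute_1="Heading Level 1">Future Trends for IoT in 2018</h1>',
    'html.parser',
)
attr_tag = attr_soup.h1
del attr_tag['attribute_1']

# Square-bracket access raises KeyError once the attribute is gone
try:
    attr_tag['attribute_1']
except KeyError:
    print('attribute_1 is no longer present')

# get() returns None, or a supplied default, instead of raising
print(attr_tag.get('attribute_1'))
print(attr_tag.get('attribute_1', 'no value'))

# has_attr() tests for an attribute without reading it
print(attr_tag.has_attr('attribute_1'))
```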

## Part 5 - Navigating a Parse Tree
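BeautifulSoup allows you to navigate the HTML tree by using tag names as attributes. You can directly access elements like `head`, `title`, `body`, `p`, `a`, `li`, etc. Attribute access returns the first matching tag, or `None` if the document contains no such tag.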

In [None]:
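# Access common tags; print each one, since a cell only displays its last expression
print(our_soup_object.head)
print(our_soup_object.title)
print(our_soup_object.body.b)
print(our_soup_object.body)
# Our sample document contains no <li> or <a> tags, so these return None
print(our_soup_object.li)
print(our_soup_object.a)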
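
Accessing a tag by name returns only the first match. To reach every matching tag, `find_all()` can be used instead. The cell below is a minimal sketch on a small illustrative document (`nav_html` and `nav_soup` are names introduced here, not the notebook's own).

```python
from bs4 import BeautifulSoup

nav_html = (
    "<html><body>"
    "<p class='title'>First</p>"
    "<p class='description'>Second</p>"
    "</body></html>"
)
nav_soup = BeautifulSoup(nav_html, 'html.parser')

# Attribute access returns only the first <p>
print(nav_soup.p)

# A tag that is absent from the document yields None
print(nav_soup.li)

# find_all() returns every matching tag; class is multi-valued, so it comes back as a list
for paragraph in nav_soup.find_all('p'):
    print(paragraph['class'], paragraph.get_text())
```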