# BeautifulSoup

### BeautifulSoup is a Python library for pulling data out of HTML and XML files.


#### BeautifulSoup and Web Fetching:

- BeautifulSoup is a library designed for parsing HTML and XML documents, not for fetching them from the web.
- It doesn't include the functionality to download web pages.
- That's why we need an additional library, such as requests, to send the HTTP requests that fetch the web page.

#### Parsing HTML and XML:

- Parsing HTML or XML content means analyzing a string of HTML or XML code to understand its structure and extract specific elements or data from it.
- This process involves breaking down the document into a tree of elements and attributes.
- The resulting tree can be easily navigated and manipulated to extract the desired information.
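
As a minimal sketch of this idea, the snippet below parses a small HTML string and walks the resulting tree. The HTML content, tag names, and variable names are made up for illustration; only `find`, `find_all`, `get_text`, and attribute access with `[...]` come from BeautifulSoup itself.

```python
from bs4 import BeautifulSoup

# A small hypothetical HTML snippet, used only for illustration
html_doc = """
<html>
  <body>
    <h1 id="main-title">Products</h1>
    <ul>
      <li class="item"><a href="/apple">Apple</a></li>
      <li class="item"><a href="/banana">Banana</a></li>
    </ul>
  </body>
</html>
"""

soup = BeautifulSoup(html_doc, 'html.parser')

# Navigate the tree: find one element, then read its text and an attribute
heading = soup.find('h1')
heading_text = heading.get_text()
heading_id = heading['id']

# Collect every <li class="item"> element, then read the link inside each one
items = soup.find_all('li', class_='item')
names = [li.a.get_text() for li in items]
links = [li.a['href'] for li in items]

print(heading_text, heading_id)
print(names)
print(links)
```

Here `find` returns the first matching element and `find_all` returns a list of all matches, so the same tree can serve both single lookups and bulk extraction.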


<center>
   <div align="center">
    <img src="https://th.bing.com/th/id/OIP.LA6AXbzvC0IQ_d2H8v3NCwAAAA?rs=1&pid=ImgDetMain" alt="Description" />
    </div>
</center>




In [None]:
# If you already have the HTML content (e.g., saved in a file or a string), you can use BeautifulSoup directly, without requests
from bs4 import BeautifulSoup

# 1- HTML content as a string
html_content = '<html><head><title>Example</title></head><body></body></html>'

## 2- Or open a local file 
## Open and read the HTML file
# with open('example.html', 'r', encoding='utf-8') as file:
#     html_content = file.read()

# Parse the HTML content
soup = BeautifulSoup(html_content, 'html.parser')

# Extract data
title = soup.title.string
print(title)


# Requests 

### Requests is a Python library that downloads content from the web.

- It is used to send HTTP requests to a specified URL and retrieve the content of the web page.
- It handles the process of connecting to the server, sending the request, and receiving the response.
- This includes retrieving the HTML content of the page, which can then be processed further.
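
To make the "sending the request" step more concrete, here is a small sketch that builds a GET request and inspects it without sending anything, so it runs without network access. The URL and the `q` query parameter are placeholders chosen for this example.

```python
import requests

# Build a GET request without sending it (no network access needed).
# The URL and the query parameter are placeholders for illustration.
request = requests.Request('GET', 'http://example.com', params={'q': 'python'})
prepared = request.prepare()

# The prepared request holds what would go over the wire
print(prepared.method)   # the HTTP method
print(prepared.url)      # the final URL, with the query string encoded into it
```

In everyday use, `requests.get(url)` does this preparation and the sending in one call; building the request by hand is only a way to see what the library assembles for you.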

<center>
   <div align="center">
    <img src="https://automation-help.com/wp-content/uploads/2021/01/get-requests.png.webp" alt="Description" />
    </div>
</center>



In [None]:
import requests

# Put your URL here
url = 'http://example.com'
response = requests.get(url, timeout=10)

# response.text is the decoded HTML string; response.content holds the raw bytes
html_content = response.text

print(html_content)

# BeautifulSoup & requests

### We combine both of them to scrape static pages

First we download a web page with requests, then we parse its content with BeautifulSoup.

<center>
   <div align="center">
    <img src="https://stackabuse.s3.amazonaws.com/media/parsing-html-with-beautifulsoup-in-python-1.jpg" alt="Description" />
    </div>
</center>


In [None]:
import requests
from bs4 import BeautifulSoup

def scrap_using_beauty():
    # URL of the webpage you want to scrape
    url = "https://www.fedex.com/fedextrack/?trknbr=276581471468&trkqual=2460494000~276581471468~FX"
    
    
    # Send a GET request to the URL (the timeout avoids waiting forever on a slow server)
    response = requests.get(url, timeout=10)
    
    # Check if the request was successful (status code 200)
    if response.status_code == 200:
        # Parse the HTML content of the page
        soup = BeautifulSoup(response.content, 'html.parser')
    
        # Print the parsed HTML (or save it, analyze it, etc.)
        print(soup.prettify())
    else:
        print(f'Failed to retrieve the webpage. Status code: {response.status_code}')

scrap_using_beauty()

# EXERCISE

### For the hard workers 🥴

- First of all, make sure the site allows scraping by checking its robots.txt file: https://site.com/*robots.txt*
- robots.txt : 
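
One possible way to do this check in code is with `urllib.robotparser` from the Python standard library. The sketch below is only an illustration: the robots.txt rules and the `site.com` URLs are invented so the example runs offline.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, written here so the example runs offline
robots_txt = """
User-agent: *
Disallow: /private/
Allow: /
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# can_fetch(user_agent, url) returns True when the rules allow the request
allowed_public = parser.can_fetch('*', 'https://site.com/products')
allowed_private = parser.can_fetch('*', 'https://site.com/private/data')

print(allowed_public)
print(allowed_private)
```

For a real site, you would normally call `parser.set_url('https://site.com/robots.txt')` followed by `parser.read()` instead of `parse`, so that the rules are downloaded from the site itself.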