# Module 10: Introduction to Web Scraping with Python

## Using requests and BeautifulSoup

Requests is a third-party Python library for sending HTTP requests such as GET and POST. BeautifulSoup is a Python library for pulling data out of HTML and XML files.

In [1]:
import requests
from bs4 import BeautifulSoup

# Note: In this example, I'm using a freely available website for practicing web scraping.
# Make sure to respect the robots.txt file and terms of service of the website you're scraping.
url = "http://books.toscrape.com/"

# Use requests to get the content of the webpage
response = requests.get(url)

# Use BeautifulSoup to parse the HTML content of the page
soup = BeautifulSoup(response.content, 'html.parser')
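
The fetch-and-parse step above can be wrapped in a small helper that also guards against two common problems: a request that hangs and a response that is an error page. This is only a minimal sketch. The name `fetch_soup`, the 10-second default timeout, and the optional `session` argument are assumptions made for illustration and are not part of the original example.

```python
import requests
from bs4 import BeautifulSoup


def fetch_soup(url, timeout=10, session=None):
    """Download a page and return it as a BeautifulSoup object.

    Hypothetical helper: the name, the default timeout, and the optional
    `session` parameter are illustrative assumptions.
    """
    # Use the given session (anything with a .get method), else the requests module
    http = session if session is not None else requests

    # Without a timeout, requests can wait indefinitely for a slow server
    response = http.get(url, timeout=timeout)

    # Raise requests.HTTPError for 4xx/5xx responses instead of parsing an error page
    response.raise_for_status()

    return BeautifulSoup(response.content, "html.parser")
```

Calling `fetch_soup("http://books.toscrape.com/")` should return an object equivalent to the `soup` built above, provided the site is reachable.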

## Scraping and parsing simple websites

Once the HTML content of the page has been parsed into the soup object, you can use BeautifulSoup to look up specific elements, such as the page title or the first tag with a given name.


In [2]:
# Get the title of the webpage
# (the raw string keeps the newlines and spaces found inside the <title> tag)
title = soup.title.string
print("Title of the webpage:", title)

# Get the first h3 tag on the page
first_h3 = soup.find('h3')
print("First h3 tag on the page:", first_h3)

# Get the text of the first h3 tag on the page
first_h3_text = first_h3.text
print("Text of the first h3 tag on the page:", first_h3_text)

Title of the webpage: 
    All products | Books to Scrape - Sandbox

First h3 tag on the page: <h3><a href="catalogue/a-light-in-the-attic_1000/index.html" title="A Light in the Attic">A Light in the ...</a></h3>
Text of the first h3 tag on the page: A Light in the ...
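
The output above shows two rough edges: the page title carries surrounding whitespace, and the link text is truncated ("A Light in the ...") while the full name sits in the link's title attribute. The sketch below shows one way to handle both, using `get_text(strip=True)` and `find_all`. To stay runnable offline it parses a small inline HTML sample instead of the live page. The first `h3` is copied from the output above; the second one is made up purely for illustration.

```python
from bs4 import BeautifulSoup

# Inline sample mimicking the markup printed above (assumed, not fetched).
# The second <h3> is an invented entry, added only to show find_all at work.
html = """
<html><head><title>
    All products | Books to Scrape - Sandbox
</title></head>
<body>
<h3><a href="catalogue/a-light-in-the-attic_1000/index.html" title="A Light in the Attic">A Light in the ...</a></h3>
<h3><a href="catalogue/example-book_2/index.html" title="An Example Book Title">An Example Book ...</a></h3>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")

# get_text(strip=True) drops the whitespace around the title text
page_title = soup.title.get_text(strip=True)

# find_all returns every matching tag, while find returns only the first.
# The full book name is stored in the title attribute of each link.
full_titles = [h3.a["title"] for h3 in soup.find_all("h3") if h3.a is not None]

print("Title of the webpage:", page_title)
print("Full book titles:", full_titles)
```

The `if h3.a is not None` check skips any `h3` that does not contain a link, which avoids a `TypeError` on pages with mixed markup.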
