## Inspect and Understand the Web Page Structure

To extract data from the website, I first inspected the structure of the web page.

- I opened [https://books.toscrape.com/](https://books.toscrape.com/) in a browser.
- Then I right-clicked on a book and selected **"Inspect"** to view the HTML.
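- I found that each book is wrapped in its own HTML element: `<article class="product_pod">`.
- The **book title** is stored in the `title` attribute of the link inside the heading: `<h3><a title="Book Title">...</a></h3>`. The visible link text can be shortened, so the attribute is the safer source.
- The **price** is inside a `<p>` tag with class `price_color`.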

This tells me exactly which tags, classes, and attributes to target when extracting the data with Python.
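The structure described above can be tried out on a small hand-written snippet before touching the live site. This is only a sketch: `sample_html` is a simplified imitation of one book entry that I wrote for illustration, not a copy of the real page source.

```python
from bs4 import BeautifulSoup

# Simplified, hand-written imitation of one book entry (an assumption,
# not the real page source)
sample_html = """
<article class="product_pod">
  <h3><a href="#" title="A Light in the Attic">A Light in the ...</a></h3>
  <p class="price_color">£51.77</p>
</article>
"""

soup = BeautifulSoup(sample_html, "html.parser")
book = soup.find("article", class_="product_pod")

title = book.h3.a["title"]        # full title, read from the title attribute
link_text = book.h3.a.get_text()  # visible link text, which may be shortened
price = book.find("p", class_="price_color").get_text()

print(title)
print(link_text)
print(price)
```

Comparing `title` with `link_text` shows why the attribute, rather than the link text, is the value worth extracting.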


In [1]:
## Fetch and Parse the Web Page
import requests
from bs4 import BeautifulSoup

# Fetch the first page of the site
url = 'https://books.toscrape.com/catalogue/page-1.html'
response = requests.get(url, timeout=10)
response.raise_for_status()   # Fail early on HTTP errors
response.encoding = 'utf-8'   # The page is UTF-8; otherwise £ is decoded as Â£

# Parse the HTML using BeautifulSoup
soup = BeautifulSoup(response.text, 'html.parser')

In [2]:
## Extract Data from the Web Page
# Find all book containers on the page
books = soup.find_all('article', class_='product_pod')

# Loop through each book and extract the title and price
for book in books:
    title = book.h3.a['title']
    price = book.find('p', class_='price_color').text
    print(f'{title} — {price}')

A Light in the Attic — £51.77
Tipping the Velvet — £53.74
Soumission — £50.10
Sharp Objects — £47.82
Sapiens: A Brief History of Humankind — £54.23
The Requiem Red — £22.65
The Dirty Little Secrets of Getting Your Dream Job — £33.34
The Coming Woman: A Novel Based on the Life of the Infamous Feminist, Victoria Woodhull — £17.93
The Boys in the Boat: Nine Americans and Their Epic Quest for Gold at the 1936 Berlin Olympics — £22.60
The Black Maria — £52.15
Starving Hearts (Triangular Trade Trilogy, #1) — £13.99
Shakespeare's Sonnets — £20.66
Set Me Free — £17.46
Scott Pilgrim's Precious Little Life (Scott Pilgrim #1) — £52.29
Rip it Up and Start Again — £35.02
Our Band Could Be Your Life: Scenes from the American Indie Underground, 1981-1991 — £57.25
Olio — £23.88
Mesaerion: The Best Science Fiction Stories 1800-1849 — £37.59
Libertarianism for Beginners — £51.33
It's Only the Himalayas — £45.17
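The prices printed above are still plain strings, so they cannot be summed or sorted numerically as they are. One possible way to convert them is sketched below. The helper name `parse_price` is my own, hypothetical choice. It keeps only the digits and the decimal point, so it ignores whatever currency symbol or stray characters come before the number.

```python
import re
from decimal import Decimal


def parse_price(text):
    """Return the numeric part of a price string such as '£51.77' as a Decimal."""
    match = re.search(r"\d+(?:\.\d+)?", text)
    if match is None:
        raise ValueError(f"no number found in {text!r}")
    return Decimal(match.group())


# Three price strings taken from the output above
prices = [parse_price(p) for p in ["£51.77", "£53.74", "£50.10"]]

print(sum(prices))   # total of the three prices
print(max(prices))   # most expensive of the three
```

`Decimal` is used here instead of `float` so that money values are not affected by binary rounding.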


In [3]:
## Handle Pagination
all_books = []
page = 1

while True:
    url = f'https://books.toscrape.com/catalogue/page-{page}.html'
    response = requests.get(url, timeout=10)

    # Stop the loop if the page does not exist
    if response.status_code != 200:
        break

    response.encoding = 'utf-8'  # Decode as UTF-8 so that £ is not garbled
    soup = BeautifulSoup(response.text, 'html.parser')
    books = soup.find_all('article', class_='product_pod')

    if not books:
        break  # Stop if no books found

    for book in books:
        title = book.h3.a['title']
        price = book.find('p', class_='price_color').text
        all_books.append([title, price])

    page += 1
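Counting pages until a request fails works, but another option is to follow the page's own "next" link. The sketch below assumes the pager marks that link as `<li class="next"><a href="...">`, and the sample HTML is hand-written to match that assumption rather than copied from the site. The function name `next_page_url` is hypothetical.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def next_page_url(html, current_url):
    """Return the absolute URL of the next page, or None if there is no next link."""
    soup = BeautifulSoup(html, "html.parser")
    item = soup.find("li", class_="next")   # assumed markup of the "next" button
    link = item.a if item is not None else None
    if link is None or not link.get("href"):
        return None
    # Resolve the relative href against the page it was found on
    return urljoin(current_url, link["href"])


current = "https://books.toscrape.com/catalogue/page-1.html"

# Hand-written sample pagers (assumptions, not real page source)
with_next = '<ul class="pager"><li class="next"><a href="page-2.html">next</a></li></ul>'
without_next = '<ul class="pager"><li class="current">Last page</li></ul>'

print(next_page_url(with_next, current))
print(next_page_url(without_next, current))
```

With a helper like this, the loop would keep going while the returned URL is not `None`, instead of relying on the status code of a missing page.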

In [4]:
## Save the Extracted Data
import csv

# Save the list of books into a CSV file
with open('books.csv', 'w', newline='', encoding='utf-8') as f:
    writer = csv.writer(f)
    writer.writerow(['Title', 'Price'])  # Write header
    writer.writerows(all_books)          # Write data

print("Data saved to books.csv")

Data saved to books.csv
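To confirm that a file written this way can be read back correctly, a quick round trip is useful. The sketch below does not touch `books.csv`. It writes two sample rows (taken from the output earlier) to a temporary file with the same settings and reads them back with `csv.DictReader`. The file name `books_sample.csv` is only an illustrative choice.

```python
import csv
import os
import tempfile

# Two sample rows standing in for all_books
sample_books = [
    ["A Light in the Attic", "£51.77"],
    ["Tipping the Velvet", "£53.74"],
]

# Write to a temporary location so the real books.csv is left alone
path = os.path.join(tempfile.mkdtemp(), "books_sample.csv")

with open(path, "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["Title", "Price"])   # header
    writer.writerows(sample_books)        # data

# Read the file back; each row becomes a dict keyed by the header names
with open(path, newline="", encoding="utf-8") as f:
    rows = list(csv.DictReader(f))

print(len(rows))
print(rows[0]["Title"], rows[0]["Price"])
```

Opening the file with the same `encoding='utf-8'` on both sides is what keeps the pound sign intact.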
