# 🎓 Lesson 10: Pagination and Multi-Page Scraping

## 🎯 Goal

In this lesson, you'll learn how to:

1. Identify pagination links in HTML
2. Scrape data from multiple pages
3. Automate following “Next” buttons
4. Combine results across pages

## 💻 Practice Site

📍 https://quotes.toscrape.com/page/1/

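This site uses classic numbered pagination (`/page/1/`, `/page/2/`, and so on) and serves plain static HTML, which makes it a good fit for `requests` + BeautifulSoup.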

## ✅ Step-by-Step Pagination Scraper

```python
import requests
from bs4 import BeautifulSoup

# 🗂️ Base URL with a placeholder for page numbers
base_url = "https://quotes.toscrape.com/page/{}/"

# 🔁 Loop through page numbers 1 to 5
for page_num in range(1, 6):  # You can adjust the range as needed
    print(f"📄 Scraping page {page_num}...")

    # 🔗 Fill the placeholder with the current page number
    url = base_url.format(page_num)

    # 🌐 Send a GET request (the timeout stops a stalled request from hanging forever)
    response = requests.get(url, timeout=10)
    response.raise_for_status()  # ⛔ Raise an error on HTTP failures such as 404 or 500

    # 🍜 Parse the HTML with the lxml parser (requires `pip install lxml`)
    soup = BeautifulSoup(response.text, "lxml")

    # 🔍 Select all quote blocks on the page
    quotes = soup.select("div.quote")

    # 🔁 Loop through each quote block and extract the quote and author
    for quote in quotes:
        text = quote.select_one("span.text").text.strip()       # 📜 Extract the quote text
        author = quote.select_one("small.author").text.strip()  # ✍ Extract the author's name
        print(f"📝 {text} — {author}")  # 📤 Print the formatted quote and author

    print("-" * 50)  # 📏 Separator line between pages
```

### 🔍 Explanation
| Concept                | Description                              |
| ---------------------- | ---------------------------------------- |
| `{}` in URL            | Placeholder that `base_url.format(page_num)` fills with the page number |
| `range(1, 6)`          | Scrapes pages 1 to 5 (the stop value `6` is excluded) |
| `.select("div.quote")` | Selects each quote block                 |
| `.select_one(...)`     | Extracts text and author from each block |
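To check what the `{}` placeholder and `range()` produce before sending any requests, you can build the URL list on its own. This is a minimal sketch; `page_urls` is a hypothetical helper name, not part of `requests` or BeautifulSoup, and the `list[str]` annotation assumes Python 3.9 or newer:

```python
def page_urls(base_url: str, first: int, last: int) -> list[str]:
    """Return one URL per page, from `first` to `last` inclusive (hypothetical helper)."""
    # range() excludes its stop value, so add 1 to include `last`
    return [base_url.format(n) for n in range(first, last + 1)]


urls = page_urls("https://quotes.toscrape.com/page/{}/", 1, 5)
for url in urls:
    print(url)  # https://quotes.toscrape.com/page/1/ ... https://quotes.toscrape.com/page/5/
```

Printing the URLs first is a cheap way to catch an off-by-one error in the page range before the scraper runs.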


## ✅ Detecting the “Next” Button (Optional)

To handle an unknown number of pages, check whether a “Next” button exists and keep following it until it disappears:

```python
import requests
from urllib.parse import urljoin
from bs4 import BeautifulSoup

url = "https://quotes.toscrape.com/page/1/"  # Start from the first page
page = 1

while True:
    # 🌐 Fetch the current page
    response = requests.get(url, timeout=10)
    response.raise_for_status()

    # 🍜 Parse the HTML content
    soup = BeautifulSoup(response.text, "lxml")

    # 🔍 Find all quote blocks on the current page
    quotes = soup.select("div.quote")

    # ✅ Print the number of quotes found on this page
    print(f"📄 Page {page} - Quotes Found: {len(quotes)}")

    # ➡️ Look for the "Next" link: <li class="next"><a href="/page/2/">...</a></li>
    next_link = soup.select_one("li.next > a")

    # ⛔ No "Next" link means this is the last page: exit the loop
    if next_link is None:
        break

    # 🔗 The href is relative (e.g. "/page/2/"), so resolve it against the current URL
    url = urljoin(url, next_link["href"])
    page += 1
```
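The same “follow Next until it disappears” logic can be tested without touching the network by passing the page-fetching step in as a function. This is a minimal sketch that assumes the quotes.toscrape.com markup (`div.quote`, `span.text`, `small.author`, `li.next > a`). The names `crawl`, `fake_fetch`, `PAGES`, and the `max_pages` safety cap are hypothetical, added only for illustration, and the sample HTML is simplified. It also combines the quotes from every page into one list:

```python
from typing import Callable, Dict, List
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def crawl(start_url: str, fetch: Callable[[str], str], max_pages: int = 50) -> List[Dict[str, str]]:
    """Follow "Next" links from start_url and collect every quote (hypothetical helper)."""
    results: List[Dict[str, str]] = []
    url = start_url
    for _ in range(max_pages):  # Safety cap in case the pagination never ends
        soup = BeautifulSoup(fetch(url), "html.parser")
        for quote in soup.select("div.quote"):
            results.append({
                "text": quote.select_one("span.text").get_text(strip=True),
                "author": quote.select_one("small.author").get_text(strip=True),
            })
        next_link = soup.select_one("li.next > a")
        if next_link is None:  # Last page reached
            break
        url = urljoin(url, next_link["href"])  # Resolve a relative href like "/page/2/"
    return results


# Two fake pages standing in for real HTTP responses (hypothetical, simplified HTML)
PAGES = {
    "https://example.test/page/1/": """
        <div class="quote"><span class="text">Quote A</span><small class="author">Ann</small></div>
        <ul class="pager"><li class="next"><a href="/page/2/">Next</a></li></ul>
    """,
    "https://example.test/page/2/": """
        <div class="quote"><span class="text">Quote B</span><small class="author">Ben</small></div>
    """,
}


def fake_fetch(url: str) -> str:
    return PAGES[url]


quotes = crawl("https://example.test/page/1/", fake_fetch)
print(quotes)  # [{'text': 'Quote A', 'author': 'Ann'}, {'text': 'Quote B', 'author': 'Ben'}]
```

Against the live site you could pass something like `lambda u: requests.get(u, timeout=10).text` as `fetch`. The `max_pages` cap is a design choice: it guarantees the loop ends even if a site's “Next” link ever points back to an earlier page.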


## Practice Tasks

1. Scrape all quotes across all available pages using a while loop.

2. Create a list of all unique authors.

3. Save all quotes to a `.csv` file (we'll formally cover this in Lesson 13!)
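If you want a starting point for tasks 2 and 3, here is one minimal sketch. It assumes the quotes have already been combined into a list of dictionaries with `text` and `author` keys; the sample data and the file name `quotes.csv` are hypothetical. `dict.fromkeys` drops duplicate authors while keeping the order they first appear in, and the standard library's `csv.DictWriter` writes one row per quote:

```python
import csv

# Hypothetical sample data, shaped like the combined results of a multi-page scrape
all_quotes = [
    {"text": "Quote A", "author": "Ann"},
    {"text": "Quote B", "author": "Ben"},
    {"text": "Quote C", "author": "Ann"},
]

# Task 2: unique authors, in the order they first appear
unique_authors = list(dict.fromkeys(q["author"] for q in all_quotes))
print(unique_authors)  # ['Ann', 'Ben']

# Task 3: save every quote to a CSV file (newline="" avoids blank rows on Windows)
with open("quotes.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=["text", "author"])
    writer.writeheader()
    writer.writerows(all_quotes)
```

A `set` would also remove duplicates, but it does not preserve the original order of the authors.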

## 🔜 Next up: Lesson 11 – Forms and Query URLs

Learn how to send parameters in the URL (e.g., search filters) and scrape pages based on user input.