# 🎓 Lesson 18: Modular & Reusable Scraping Functions

## 🎯 Goal

In this lesson, you'll learn how to:

- Turn scraping code into **reusable Python functions**

- Separate logic into clean modules (fetch, parse, save)

- Make your scrapers more readable, maintainable, and testable

## ✅ Why Modular Code Matters

| Benefit            | Example                                            |
| ------------------ | -------------------------------------------------- |
| 🔁 Reusability     | Scrape any tag, URL, or category with one function |
| 🔧 Maintainability | Change one part without breaking the rest          |
| 🧪 Testability     | Easily test parts of the scraper individually      |
| 🔍 Readability     | Cleaner and easier to debug                        |


## ✅ Example: Scraping Quotes by Tag (Modularized)

import requests
from bs4 import BeautifulSoup
import time
import random


def get_soup(url, headers=None, delay=True):
    """Send a GET request and return parsed soup."""
    if delay:
        sleep_time = round(random.uniform(1.5, 3.0), 2)
        time.sleep(sleep_time)

    # The timeout stops a stalled connection from hanging the scraper.
    response = requests.get(
        url, headers=headers or get_default_headers(), timeout=10
    )
    return BeautifulSoup(response.text, "lxml")


def get_default_headers():
    """Return a browser-like headers dictionary."""
    return {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
                      "AppleWebKit/537.36 (KHTML, like Gecko) "
                      "Chrome/113.0.0.0 Safari/537.36"
    }


def extract_quotes(soup):
    """Extract all quotes from a page soup."""
    quotes_data = []
    quote_blocks = soup.select("div.quote")

    for quote in quote_blocks:
        text = quote.select_one("span.text").text.strip()
        author = quote.select_one("small.author").text.strip()
        quotes_data.append({"text": text, "author": author})
    
    return quotes_data


def scrape_quotes_by_tag(tag, pages=1):
    """Scrape quotes for a given tag and number of pages."""
    all_quotes = []
    base_url = f"https://quotes.toscrape.com/tag/{tag}/page/{{}}/"

    for page_num in range(1, pages + 1):
        url = base_url.format(page_num)
        print(f"🔎 Scraping {url} ...")
        soup = get_soup(url)
        quotes = extract_quotes(soup)

        if not quotes:
            print("⚠️ No more quotes found.")
            break
        
        all_quotes.extend(quotes)

    return all_quotes

### ✅ Example Usage

if __name__ == "__main__":
    tag = input("Enter a quote tag to scrape: ").strip()
    quotes = scrape_quotes_by_tag(tag, pages=3)

    for q in quotes:
        print(f"📝 {q['text']} — {q['author']}")
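
Because fetching and parsing live in separate functions, the page loop can be tested without touching the network. The sketch below is one possible variation, not part of the lesson's code: `scrape_pages`, `fake_fetch`, `fake_parse`, and the `example.test` URLs are hypothetical names chosen for illustration. It takes the fetch and parse steps as arguments, so simple stand-ins can replace `get_soup()` and `extract_quotes()` in a test.

```python
from typing import Any, Callable, Dict, List

Quote = Dict[str, str]


def scrape_pages(
    url_template: str,
    pages: int,
    fetch: Callable[[str], Any],
    parse: Callable[[Any], List[Quote]],
) -> List[Quote]:
    """Fetch and parse up to `pages` pages, stopping at the first empty one."""
    results: List[Quote] = []
    for page_num in range(1, pages + 1):
        page = fetch(url_template.format(page_num))
        quotes = parse(page)
        if not quotes:
            break
        results.extend(quotes)
    return results


# Hypothetical stand-ins: a dict plays the website, "text|author" plays the HTML.
FAKE_SITE = {
    "https://example.test/page/1/": "A|Author One",
    "https://example.test/page/2/": "B|Author Two",
    "https://example.test/page/3/": "",
}
fetched: List[str] = []  # records which URLs were requested


def fake_fetch(url: str) -> str:
    fetched.append(url)
    return FAKE_SITE[url]


def fake_parse(page: str) -> List[Quote]:
    if not page:
        return []
    text, author = page.split("|")
    return [{"text": text, "author": author}]


# Ask for 5 pages; the loop should stop after the empty third page.
quotes = scrape_pages("https://example.test/page/{}/", 5, fake_fetch, fake_parse)
print(len(quotes), "quotes from", len(fetched), "requests")
```

With the lesson's own functions, the same loop would be called as `scrape_pages(url_template, 3, get_soup, extract_quotes)`, since `get_soup()` accepts a URL and `extract_quotes()` accepts the soup it returns.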

## 💡 Bonus Tip: Use .env Files or Config Modules

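Keep base URLs, delay ranges, and headers in a `config.py` module or a `.env` file instead of hard-coding them inside functions. That way you can change a setting in one place without touching the scraping logic.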
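
As a rough illustration of the idea, a `config.py` might look like the sketch below. Everything in it is an assumption made for this example: the `ScraperConfig` class, the `load_config()` helper, and the `SCRAPER_*` environment variable names are hypothetical. The defaults mirror the values used in the code above, and environment variables override them when present.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

DEFAULT_USER_AGENT = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
    "AppleWebKit/537.36 (KHTML, like Gecko) "
    "Chrome/113.0.0.0 Safari/537.36"
)


@dataclass(frozen=True)
class ScraperConfig:
    """Settings shared by every scraping function (hypothetical layout)."""

    base_url: str
    min_delay: float
    max_delay: float
    user_agent: str

    def tag_page_url(self, tag: str, page: int) -> str:
        """Build the URL for one page of a tag listing."""
        return f"{self.base_url}/tag/{tag}/page/{page}/"


def load_config(env: Optional[Mapping[str, str]] = None) -> ScraperConfig:
    """Read settings from `env` (defaults to os.environ), with fallbacks."""
    env = os.environ if env is None else env
    return ScraperConfig(
        # Strip a trailing slash so tag_page_url() never produces "//".
        base_url=env.get("SCRAPER_BASE_URL", "https://quotes.toscrape.com").rstrip("/"),
        min_delay=float(env.get("SCRAPER_MIN_DELAY", "1.5")),
        max_delay=float(env.get("SCRAPER_MAX_DELAY", "3.0")),
        user_agent=env.get("SCRAPER_USER_AGENT", DEFAULT_USER_AGENT),
    )


# Passing a plain dict stands in for real environment variables here.
config = load_config({"SCRAPER_MIN_DELAY": "0.5"})
print(config.tag_page_url("love", 2))
# → https://quotes.toscrape.com/tag/love/page/2/
```

A `.env` file could supply the same variables, for example through the third-party python-dotenv package, which is not shown here.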

## Practice Tasks

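1. Move the functions into separate modules (e.g., helpers such as `get_soup()` in `utils.py`, the scraping logic in `scraper.py`)

2. Add a `save_to_csv(data, filename)` function that writes the list of quote dictionaries to a CSV file

3. Make the tag scraper work with Selenium instead of requests

4. Add error handling to `get_soup()` for failed requests (timeouts, connection errors, non-200 responses)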
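
If you want a starting point for task 2, here is a minimal sketch that uses only the standard library. It assumes each item in `data` is a dictionary with exactly the `text` and `author` keys produced by `extract_quotes()`; the optional `fieldnames` parameter and the returned row count are additions made for this example, not requirements of the task.

```python
import csv
import os
import tempfile
from typing import Dict, List, Sequence


def save_to_csv(
    data: List[Dict[str, str]],
    filename: str,
    fieldnames: Sequence[str] = ("text", "author"),
) -> int:
    """Write the quote dictionaries to `filename` and return the row count."""
    # newline="" lets the csv module control line endings on every platform.
    with open(filename, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        # DictWriter raises ValueError if a row has keys not in fieldnames.
        writer.writerows(data)
    return len(data)


# Usage with made-up sample data, written to a temporary directory.
sample = [
    {"text": "A quote, with a comma.", "author": "Someone"},
    {"text": "Another quote.", "author": "Someone Else"},
]
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "quotes.csv")
    rows_written = save_to_csv(sample, path)
    print(f"Wrote {rows_written} rows")
```

The `csv` module quotes fields that contain commas, so quote text with punctuation survives a round trip through `csv.DictReader`.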

## 🔜 Next up: Lesson 19 – Logging, Debugging, and Testing Your Scraper

Learn how to add logs, handle exceptions smartly, and structure your code for testing.