# 🎓 Lesson 16: Scraping JavaScript-rendered Sites (Intro to Selenium)

🎯 Goal

In this lesson, you’ll learn how to:

- Understand why some content doesn’t appear with `requests + BeautifulSoup`

- Use Selenium to load and scrape JavaScript-rendered pages

- Simulate browser actions like click, wait, and scroll


## Problem Recap

Some websites (like many modern e-commerce sites and single-page applications) load content via JavaScript.
This means:

- `requests.get()` → ❌ Only downloads the initial HTML, so it can’t see data that JavaScript adds later

- `Selenium` → ✅ Runs a full browser that executes JavaScript
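The difference can be sketched without any network access. The two HTML strings below are hypothetical stand-ins (not copied from a real site): one imitates what `requests.get(...).text` returns for a JavaScript-driven page, the other imitates `driver.page_source` after the script has run. The helper `count_quote_divs` is an illustrative name and uses only the standard library.

```python
from html.parser import HTMLParser


class QuoteDivCounter(HTMLParser):
    """Counts <div class="quote"> start tags in an HTML string."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value can be None
        classes = (dict(attrs).get("class") or "").split()
        if tag == "div" and "quote" in classes:
            self.count += 1


def count_quote_divs(html):
    counter = QuoteDivCounter()
    counter.feed(html)
    return counter.count


# Hypothetical raw response: the data lives inside a <script>, not in elements.
RAW_HTML = """
<html><body>
  <div id="quotes"></div>
  <script>
    var data = [{"text": "First quote", "author": "A"},
                {"text": "Second quote", "author": "B"}];
  </script>
</body></html>
"""

# Hypothetical rendered page: the script has turned the data into elements.
RENDERED_HTML = """
<html><body>
  <div id="quotes">
    <div class="quote"><span class="text">First quote</span></div>
    <div class="quote"><span class="text">Second quote</span></div>
  </div>
</body></html>
"""

print("raw HTML:", count_quote_divs(RAW_HTML))            # raw HTML: 0
print("rendered HTML:", count_quote_divs(RENDERED_HTML))  # rendered HTML: 2
```

The same selector finds nothing in the raw response and two elements in the rendered page, which is the gap a real browser closes.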

## 💡 What Is Selenium WebDriver?

Selenium WebDriver is a tool that automates web browsers.
It lets your Python code control a browser: opening pages, clicking buttons, filling in forms, and reading the page content.

Think of it as:

Your Python code + A real browser = Smart automation

## 💡 What Is ChromeDriver?

ChromeDriver is a bridge between:

- Your Python code (using Selenium)

- The real Chrome browser

Selenium sends it commands over the WebDriver protocol, and ChromeDriver translates them into actions inside Chrome.
Without `ChromeDriver`, Selenium **can’t control Chrome**.

## So What’s the Relationship?
| Term             | Meaning                                                |
| ---------------- | ------------------------------------------------------ |
| **Selenium**     | The library you use in Python (`pip install selenium`) |
| **WebDriver**    | The generic API that Selenium uses to talk to browsers |
| **ChromeDriver** | A specific driver for **Google Chrome**                |
| **GeckoDriver**  | For **Firefox**                                        |
| **EdgeDriver**   | For **Microsoft Edge**                                 |
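In code, the table maps to one Selenium class per browser. The sketch below assumes Selenium 4 and that the chosen browser is installed; `make_driver` and `SUPPORTED_BROWSERS` are hypothetical names introduced for illustration, not part of Selenium.

```python
SUPPORTED_BROWSERS = ("chrome", "firefox", "edge")


def make_driver(browser="chrome"):
    """Start a browser session for the given browser name (hypothetical helper)."""
    name = browser.lower()
    if name not in SUPPORTED_BROWSERS:
        raise ValueError(f"Unsupported browser: {browser!r}")

    # Imported here so that defining the helper needs no browser setup.
    from selenium import webdriver

    factories = {
        "chrome": webdriver.Chrome,    # talks to Chrome through ChromeDriver
        "firefox": webdriver.Firefox,  # talks to Firefox through GeckoDriver
        "edge": webdriver.Edge,        # talks to Edge through EdgeDriver
    }
    return factories[name]()
```

A call such as `driver = make_driver("firefox")` would open Firefox; remember to finish with `driver.quit()`.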



## ✅ Setup: Install Selenium + ChromeDriver

In [None]:
pip install selenium

Download the ChromeDriver build that matches your Chrome version, then place it in your project folder or add its folder to your PATH.

### 1. Check your Chrome version

Open Chrome and go to:
```text
chrome://settings/help
```
You’ll see something like:

```text
Version 113.0.5672.126
```

### 2. Download ChromeDriver

Visit the official site:

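👉 https://sites.google.com/chromium.org/driver/

(For Chrome 115 and newer, this site directs you to the Chrome for Testing availability dashboard for downloads.)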


- Choose the version **matching your Chrome**

- Download the `.zip` for your platform (Windows/Linux/macOS)

- Extract the driver executable (`chromedriver.exe` on Windows, `chromedriver` on Linux/macOS)
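"Matching" can be checked mechanically. This is a minimal sketch, assuming the rule of thumb that the first three version components (major, minor, build) should agree while the last one may differ. The helper names and the sample driver version strings are illustrative.

```python
import re


def build_prefix(version_text):
    """Return (major, minor, build) from text like 'Version 113.0.5672.126'."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)\.(\d+)", version_text)
    if match is None:
        raise ValueError(f"No version number found in: {version_text!r}")
    major, minor, build, _patch = (int(part) for part in match.groups())
    return (major, minor, build)


def versions_match(chrome_version, driver_version):
    """True when the browser and the driver share major.minor.build."""
    return build_prefix(chrome_version) == build_prefix(driver_version)


print(versions_match("Version 113.0.5672.126", "ChromeDriver 113.0.5672.63"))  # True
print(versions_match("Version 113.0.5672.126", "ChromeDriver 114.0.5735.90"))  # False
```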

### 3. Put `chromedriver` in a known location

You have 2 options:

| Option                | How                                                              |
| --------------------- | ---------------------------------------------------------------- |
| Add to system PATH | Place it in `C:\Windows\System32` or add its folder to your PATH |
| Local path usage   | Place `chromedriver.exe` in your project folder and use:         |

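```python
from selenium import webdriver
from selenium.webdriver.chrome.service import Service

# Selenium 4: pass the driver path through a Service object
# (use "./chromedriver.exe" on Windows)
driver = webdriver.Chrome(service=Service("./chromedriver"))
```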

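⚠️ Selenium 4 deprecated `executable_path` and removed it in 4.10; pass the driver location through a `Service` object instead. If `chromedriver` is on your PATH, plain `webdriver.Chrome()` is enough, and Selenium 4.6+ can also fetch a matching driver automatically through its built-in Selenium Manager.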
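To see whether the driver is already reachable through PATH, a quick check with the standard library can help. `find_on_path` is a hypothetical helper name; it only wraps `shutil.which`, which also resolves `chromedriver.exe` on Windows.

```python
import shutil


def find_on_path(name="chromedriver"):
    """Return the full path of an executable found on PATH, or None."""
    return shutil.which(name)


location = find_on_path()
if location:
    print(f"chromedriver found at: {location}")
else:
    print("chromedriver is not on PATH")
```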

### Alternative (Easier): Use webdriver-manager (Optional)

In [None]:
pip install webdriver-manager

And then:

In [None]:
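from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager

# install() downloads a matching driver if needed and returns its path
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))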

⚠️ Startup is slightly slower, but you never need to download a driver manually.

## 💻 Practice Site:

📍 https://quotes.toscrape.com/js/

With `requests`, this page shows no quotes, but Selenium can see them!

## ✅ Step-by-Step Example

In [None]:
from selenium import webdriver
from bs4 import BeautifulSoup
import time

# Set up Selenium with Chrome
driver = webdriver.Chrome()  # or use ChromeDriverManager if installed

# Open the JavaScript-rendered version of the site
driver.get("https://quotes.toscrape.com/js/")

# Wait for the page to fully load
time.sleep(3)  # Give JavaScript time to render content

# Get the page source and parse it with BeautifulSoup
soup = BeautifulSoup(driver.page_source, "lxml")

# Find and print all quotes
quotes = soup.select("div.quote")
for quote in quotes:
    text = quote.select_one("span.text").text.strip()
    author = quote.select_one("small.author").text.strip()
    print(f"📝 {text} — {author}")

# Don't forget to close the browser
driver.quit()

## Explanation

| Step                 | What It Does                         |
| -------------------- | ------------------------------------ |
| `webdriver.Chrome()` | Launches a real Chrome browser       |
| `driver.get()`       | Opens the page like a human would    |
| `time.sleep(3)`      | Waits for JavaScript to load content |
| `driver.page_source` | Gets the fully rendered HTML         |
| `BeautifulSoup(...)` | Parses the HTML just like before     |
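A fixed `time.sleep(3)` either wastes time on a fast connection or is too short on a slow one. Selenium also offers explicit waits, which return as soon as a condition is met. The sketch below is one possible variant of the example, assuming Selenium 4. `wait_for_quotes` and `scrape_quotes` are illustrative names, and the 10-second timeout is an arbitrary choice.

```python
def wait_for_quotes(driver, timeout=10):
    """Block until at least one div.quote exists, then return the elements.

    Raises selenium.common.exceptions.TimeoutException if none appear in time.
    """
    # Imported inside the function so defining it has no side effects.
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    wait = WebDriverWait(driver, timeout)
    return wait.until(
        EC.presence_of_all_elements_located((By.CSS_SELECTOR, "div.quote"))
    )


def scrape_quotes(url="https://quotes.toscrape.com/js/"):
    """Open the page, wait for the quotes, and return (text, author) pairs."""
    from selenium import webdriver
    from selenium.webdriver.common.by import By

    driver = webdriver.Chrome()
    try:
        driver.get(url)
        results = []
        for quote in wait_for_quotes(driver):
            text = quote.find_element(By.CSS_SELECTOR, "span.text").text
            author = quote.find_element(By.CSS_SELECTOR, "small.author").text
            results.append((text, author))
        return results
    finally:
        driver.quit()  # always close the browser, even after an error
```

Calling `scrape_quotes()` launches Chrome, so it is left uncalled here. The `try`/`finally` ensures the browser closes even if the wait times out.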



## ✅ Bonus: Headless Mode (No Browser Window)

In [None]:
from selenium import webdriver
from selenium.webdriver.chrome.options import Options

options = Options()
# Don’t show browser window (Chrome 109+; use "--headless" on older versions)
options.add_argument("--headless=new")
driver = webdriver.Chrome(options=options)

## Practice Tasks

1. Try scraping `https://quotes.toscrape.com/js/` without `sleep()` and see what happens.

2. Experiment with scrolling or clicking (covered further in future lessons).

3. Run the scraper in headless mode for silent scraping.

## 💡 Tip: Only use Selenium when necessary. It’s slower and heavier than `requests`, so it’s best reserved for:

- JavaScript-only sites

- Simulated logins, scrolling, button clicking

- Pages that need waiting or user-like interaction
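As a preview of the scrolling and clicking mentioned above, here is a minimal sketch. Both helper names are hypothetical, and the `li.next a` selector is an assumption about how a pager link might be marked up, so adjust it to the page you are scraping.

```python
import time


def scroll_to_bottom(driver, pause=1.0, max_rounds=10):
    """Scroll down until the page height stops growing; return the final height."""
    last_height = driver.execute_script("return document.body.scrollHeight")
    for _ in range(max_rounds):
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause)  # give lazily loaded content a moment to appear
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            break  # nothing new was loaded
        last_height = new_height
    return last_height


def click_next(driver):
    """Click the "Next" pager link if there is one; return True when clicked."""
    from selenium.webdriver.common.by import By  # needs Selenium only when called

    links = driver.find_elements(By.CSS_SELECTOR, "li.next a")  # assumed selector
    if not links:
        return False
    links[0].click()
    return True
```

Both helpers take an already running `driver`, so they can be combined with the earlier example: load the page, call `scroll_to_bottom(driver)`, then loop while `click_next(driver)` returns `True`.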

## 🔜 Next up: Lesson 17 – Bypassing Anti-Bot Mechanisms

Learn how sites detect scrapers, and how to avoid traps, blocks, and honeypots (ethically).