# 🎓 Lesson 17: Bypassing Anti-Bot Mechanisms (Ethically)

## 🎯 Goal

In this lesson, you’ll learn how to:

- Understand how websites detect bots

- Avoid common scraping traps and blocks

- Bypass basic anti-bot techniques ethically

- Respect sites while staying under the radar

## How Do Websites Detect Bots?

Websites often use the following techniques:

| Technique               | What It Does                                                               |
| ----------------------- | -------------------------------------------------------------------------- |
| **Rate-limiting**       | Blocks clients that make too many requests in a short time                 |
| **Missing headers**     | Flags requests that lack `User-Agent`, `Referer`, or a browser fingerprint |
| **Repetitive behavior** | Flags requests that arrive at perfectly regular intervals                  |
| **CAPTCHAs**            | Require human interaction before serving content                           |
| **JavaScript checks**   | Detect clients that do not run JavaScript (simple bots usually skip it)    |
| **Honeypots**           | Hidden fields or links that real users never touch                         |


## ✅ Realistic & Polite Scraping Strategy

Here are techniques to avoid detection without violating site rules:

### 1. Use Realistic Headers

```python
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
                  "AppleWebKit/537.36 (KHTML, like Gecko) "
                  "Chrome/113.0.0.0 Safari/537.36",
    "Accept-Language": "en-US,en;q=0.9",
    "Referer": "https://google.com"
}
```

### 2. Add Random Delays Between Requests

```python
import random
import time

# Pause for a random 1.5 to 4 seconds so requests are not evenly spaced
delay = round(random.uniform(1.5, 4.0), 2)
time.sleep(delay)
```

### 3. Rotate User-Agents (Optional)

Use a list of real User-Agent strings and randomly select one for each session/request.
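A minimal sketch of this idea, assuming a small hand-maintained pool. The `USER_AGENTS` list and the `random_headers` helper are illustrative names, not part of any library, and the strings are examples that should be replaced with current ones copied from real browsers.

```python
import random

# Illustrative pool of User-Agent strings (assumed values; keep them up to date)
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/113.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 "
    "(KHTML, like Gecko) Version/16.4 Safari/605.1.15",
    "Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/113.0",
]


def random_headers(base_headers=None):
    """Return a copy of base_headers with a randomly chosen User-Agent."""
    headers = dict(base_headers or {})  # Copy so the caller's dict is not modified
    headers["User-Agent"] = random.choice(USER_AGENTS)
    return headers


request_headers = random_headers({"Accept-Language": "en-US,en;q=0.9"})
```

Calling `random_headers(...)` once per session, rather than once per request, keeps the User-Agent consistent with the cookies that session carries.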

### 4. Use requests.Session() to Reuse Cookies

This mimics real browsing behavior and avoids creating new sessions on every request.

```python
import requests

session = requests.Session()
session.headers.update(headers)  # Send the headers from step 1 on every request
response = session.get("https://example.com", timeout=10)
```

### 5. Respect `robots.txt`

Before scraping, check:

```text
https://example.com/robots.txt
```

Don't scrape paths listed under `Disallow:`, especially login, admin, or private areas.
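The check can also be automated with Python's standard `urllib.robotparser` module. The sketch below parses a made-up `robots.txt` so it runs without network access; `SAMPLE_ROBOTS_TXT` and `is_allowed` are illustrative names, and the rules shown are assumptions rather than any real site's policy.

```python
from urllib.robotparser import RobotFileParser

# Example robots.txt content (made up for illustration).
# In a real scraper, call parser.set_url("https://example.com/robots.txt")
# followed by parser.read() instead of parser.parse(...).
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Disallow: /login
"""

parser = RobotFileParser()
parser.parse(SAMPLE_ROBOTS_TXT.splitlines())


def is_allowed(url, user_agent="*"):
    """Return True if the parsed robots.txt permits user_agent to fetch url."""
    return parser.can_fetch(user_agent, url)


print(is_allowed("https://example.com/quotes/page/2/"))  # True
print(is_allowed("https://example.com/admin/users"))     # False
```

Calling `is_allowed(url)` before each request lets the scraper skip disallowed paths automatically.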

### 6. Avoid Honeypots

If you see hidden input fields in forms (e.g., `style="display:none"`), don’t touch them.
Bots that fill in hidden fields often get auto-banned.
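One way to spot such fields is to look for inputs hidden with inline CSS before filling in a form. The following is a rough sketch using only the standard library; the `VisibleInputCollector` class and the sample form are hypothetical, and it only catches inline `style` attributes, not fields hidden through CSS classes or external stylesheets.

```python
from html.parser import HTMLParser


class VisibleInputCollector(HTMLParser):
    """Collect names of <input> fields, split by whether inline CSS hides them."""

    def __init__(self):
        super().__init__()
        self.visible = []
        self.suspected_honeypots = []

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attr = dict(attrs)
        name = attr.get("name")
        if not name:
            return
        # Normalize the inline style so "display: none" and "display:none" both match
        style = (attr.get("style") or "").replace(" ", "").lower()
        if "display:none" in style or "visibility:hidden" in style:
            self.suspected_honeypots.append(name)
        else:
            self.visible.append(name)


# Made-up form: "website" is hidden with CSS, so a real user would never fill it in
SAMPLE_FORM = """
<form>
  <input type="text" name="email">
  <input type="text" name="website" style="display: none">
</form>
"""

collector = VisibleInputCollector()
collector.feed(SAMPLE_FORM)
print(collector.visible)              # ['email']
print(collector.suspected_honeypots)  # ['website']
```

Note that this is different from `type="hidden"` inputs such as CSRF tokens, which forms usually expect to be submitted unchanged.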

### 7. Simulate Human Interaction (with Selenium)

If you're using Selenium:

- Add delays between actions

- Scroll the page

- Click instead of directly navigating with `.get()`

#### Click a Button

Let's say you're on a page that shows more quotes only when you click a "Next" or "Load more" link.

```python
from selenium import webdriver
from selenium.webdriver.common.by import By
import time

driver = webdriver.Chrome()
driver.get("https://quotes.toscrape.com/js/")

# Find and click the pagination link (on this page, the "Next" link inside <li class="next">)
button = driver.find_element(By.CSS_SELECTOR, "li.next > a")
button.click()

time.sleep(2)  # Wait for the next page to load
```

#### Scroll the Page Like a Human

```python
# Scroll down by a fixed amount (e.g., 500 pixels)
driver.execute_script("window.scrollBy(0, 500);")
time.sleep(1.5)

# Scroll to bottom of the page
driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
time.sleep(2)
```

You can combine scrolling with waiting to trigger lazy-loaded content (like infinite scroll).
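As a rough sketch of that combination, the helper below keeps scrolling to the bottom until the page height stops growing. The function name `scroll_until_stable` and its `pause` and `max_rounds` parameters are illustrative choices, and it assumes `driver` is a Selenium WebDriver like the one created earlier.

```python
import time


def scroll_until_stable(driver, pause=2.0, max_rounds=10):
    """Scroll to the bottom repeatedly until the page height stops growing.

    Returns the number of scrolls performed.
    """
    last_height = driver.execute_script("return document.body.scrollHeight")
    rounds = 0
    while rounds < max_rounds:
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        rounds += 1
        time.sleep(pause)  # Give lazy-loaded content time to arrive
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            break  # Nothing new was loaded, so stop scrolling
        last_height = new_height
    return rounds
```

Calling `scroll_until_stable(driver)` after `driver.get(...)` should load all lazily added items, up to the `max_rounds` cap that guards against pages that never stop growing.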

#### Type in an Input Field (Slowly Like a Human)

```python
from selenium.webdriver.common.keys import Keys
import random

# Assumes the current page has an input named "username"
# (for example, the login form at https://quotes.toscrape.com/login)
input_box = driver.find_element(By.NAME, "username")

# Simulate typing each letter with a random delay
for char in "sasan":
    input_box.send_keys(char)
    time.sleep(random.uniform(0.1, 0.3))  # Random typing speed

# Hit ENTER
input_box.send_keys(Keys.RETURN)
```

#### Hover Over an Element (Optional)

```python
from selenium.webdriver.common.action_chains import ActionChains

# Move the mouse pointer over the first element with class "quote"
element = driver.find_element(By.CLASS_NAME, "quote")
ActionChains(driver).move_to_element(element).perform()

time.sleep(1)
```

#### Summary of Techniques

| Action     | Code Example                                              |
| ---------- | --------------------------------------------------------- |
| **Click**  | `element.click()`                                         |
| **Scroll** | `driver.execute_script("window.scrollTo(...)")`           |
| **Type**   | `send_keys()` with delay                                  |
| **Hover**  | `ActionChains(driver).move_to_element(element).perform()` |


## ❌ What NOT to Do

- ❌ Do NOT scrape private content behind logins without permission

- ❌ Do NOT flood servers with dozens of requests per second

- ❌ Do NOT use scraping for malicious, illegal, or unauthorized commercial use

- ❌ Do NOT try to bypass advanced protections like Cloudflare without permission

## ✅ Bonus Tip: Use Your Name in Headers

For research or personal scraping projects, you can add the standard HTTP `From` header with a contact email address:

```python
headers["From"] = "sasan@example.com"
```

It shows transparency and good intent, and gives site owners a way to contact you.

## Practice Tasks

1. Update your scraper to include headers and random delays

2. Use `requests.Session()` to reuse cookies and headers

3. Use `time.sleep(random.uniform(...))` to add a delay between each request

4. Always test your scraper slowly and watch network traffic
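As a starting point for these tasks, here is one possible sketch that combines a shared session with random delays. The `polite_fetch` name and its parameters are hypothetical, and `session` is assumed to be a `requests.Session` (or any object with a compatible `get` method).

```python
import random
import time


def polite_fetch(session, urls, min_delay=1.5, max_delay=4.0, sleep=time.sleep):
    """Fetch each URL with a shared session, pausing a random time between requests."""
    responses = []
    for index, url in enumerate(urls):
        if index > 0:
            # Wait before every request except the first
            sleep(random.uniform(min_delay, max_delay))
        responses.append(session.get(url, timeout=10))
    return responses


# Example usage (assumes the `requests` package and the `headers` dict from step 1):
#   session = requests.Session()
#   session.headers.update(headers)
#   pages = polite_fetch(session, ["https://example.com/page/1",
#                                  "https://example.com/page/2"])
```

The `sleep` parameter exists only so the delay can be swapped out while testing; in normal use the default `time.sleep` applies.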

## 🔜 Next up: Lesson 18 – Modular & Reusable Scraping Functions

You’ll learn how to turn your scraping code into clean, testable, and reusable Python functions.