# Selenium

## Why Selenium exists

#### What is the problem with requests?

requests only downloads the raw HTML sent by the server. Yet many modern websites:
* load data with JavaScript
* update content after page load
* require clicks, scrolls, or form input
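
To make this concrete, here is a minimal sketch that reads a page the way a static scraper would. The page content, the ```TextCollector``` class and the ```visible_text``` function are hypothetical, and the standard-library ```html.parser``` stands in for BeautifulSoup so the example runs without extra installs.

```python
from html.parser import HTMLParser

# Hypothetical page: the server sends an empty <ul>, and a script fills it in the browser.
RAW_HTML = """
<html><body>
  <h1>Products</h1>
  <ul id="items"></ul>
  <script>
    document.getElementById("items").textContent = "Laptop, Phone";
  </script>
</body></html>
"""


class TextCollector(HTMLParser):
    """Collect the text of the page, skipping the contents of <script> tags."""

    def __init__(self):
        super().__init__()
        self.in_script = False
        self.texts = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.in_script = True

    def handle_endtag(self, tag):
        if tag == "script":
            self.in_script = False

    def handle_data(self, data):
        text = data.strip()
        if text and not self.in_script:
            self.texts.append(text)


def visible_text(html):
    """Return the text a static parser finds in the HTML (no JavaScript is run)."""
    parser = TextCollector()
    parser.feed(html)
    parser.close()
    return parser.texts


print(visible_text(RAW_HTML))  # ['Products']
```

The list items never show up because nothing executed the script: the parser only sees the empty ```<ul>``` that the server sent.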

#### What Selenium does 

* opens a real browser (Chrome, Firefox…)
* executes JavaScript
* behaves like a human user

→ Selenium scrapes what you see in the browser, not just the HTML the server sent.

### When to use which?

#### Use requests + BeautifulSoup when:
* data is in page source
* site is static
* no interaction needed

#### Use Selenium when:
* data appears after page load
* you must click / scroll / type
* site requires JS rendering
* pop-ups or cookies block content

### How to use it - Selenium's most important features

#### Browser automation
* Open URLs
* Navigate pages
* Execute JS

#### Element selection
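```python
from selenium.webdriver.common.by import By

# `driver` is an open WebDriver; the string values below are placeholders
driver.find_element(By.ID, "element-id")
driver.find_element(By.CLASS_NAME, "class-name")
driver.find_element(By.CSS_SELECTOR, "div.card > a")
```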

#### User actions
* click
* send_keys (typing)
* submit forms

#### Waiting mechanisms (CRUCIAL)
* wait for elements to exist
* wait for page updates

Without waits → flaky scripts.

Instead of ```time.sleep()```, Selenium offers:
* ```implicitly_wait()``` for a simple global wait
* ```WebDriverWait``` to wait for a specific condition
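
The idea behind waiting for a condition can be sketched in plain Python. This is a simplified illustration of polling, not Selenium's actual implementation; ```wait_until```, ```element_is_present``` and the simulated page are hypothetical names invented for the example.

```python
import time


def wait_until(condition, timeout=10.0, poll_interval=0.5):
    """Call `condition` repeatedly until it returns a truthy value or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(poll_interval)


# Simulated page: the element only "appears" on the third check.
checks = {"count": 0}


def element_is_present():
    checks["count"] += 1
    return "result element" if checks["count"] >= 3 else None


found = wait_until(element_is_present, timeout=2.0, poll_interval=0.01)
print(found, "after", checks["count"], "checks")  # result element after 3 checks
```

Unlike a fixed ```time.sleep()```, this kind of wait returns as soon as the condition holds and fails loudly if it never does.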

#### Handling obstacles
* alerts
* cookie banners
* pop-ups
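
One possible pattern for cookie banners is "click the accept button if there is one, otherwise carry on". The sketch below assumes the banner exposes a button with recognisable text; ```dismiss_banner```, ```ACCEPT_LABELS``` and ```FakeButton``` are hypothetical, and ```FakeButton``` only imitates the ```.text``` and ```.click()``` of a Selenium element so the code runs without a browser.

```python
# Hypothetical labels: real sites use their own wording and languages.
ACCEPT_LABELS = ("accept", "accept all", "i agree")


def dismiss_banner(buttons, labels=ACCEPT_LABELS):
    """Click the first button whose text matches one of `labels`; return True if one was clicked."""
    for button in buttons:
        if button.text.strip().lower() in labels:
            button.click()
            return True
    return False


class FakeButton:
    """Stand-in for a Selenium element, exposing only .text and .click()."""

    def __init__(self, text):
        self.text = text
        self.clicked = False

    def click(self):
        self.clicked = True


buttons = [FakeButton("Menu"), FakeButton(" Accept all "), FakeButton("Reject")]
print(dismiss_banner(buttons))  # True
print(dismiss_banner([FakeButton("Menu")]))  # False
```

With a real browser, the list would come from something like ```driver.find_elements(By.TAG_NAME, "button")```, and a ```False``` result simply means no banner was found.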

### HTML vs JavaScript

```python
html = '''<h1>Hello</h1>
<p>This is a paragraph</p>
'''
```

JavaScript:
* is a programming language
* runs in the browser
* can modify HTML after page load

JS can:
* add elements
* remove elements
* change text
* fetch data from APIs

### Let's see together how it changes

Go to [DuckDuckGo](https://duckduckgo.com).

Inspect the page and open the console.

Type:
```document.body.innerHTML.slice(0, 200)```

This shows the first 200 characters of the page's current HTML.

Now, let's use JavaScript to modify the HTML.

Type in the console:
```document.body.innerHTML += "<h1 style='color:red'>Added by JavaScript</h1>"```

**This ```h1``` didn't exist originally, so requests never sees it.**

If you type this in the console:

```document.querySelector("h1").textContent```

you get the text of the first ```h1``` on the page. If the page had no ```h1``` before, that is the one you just added.

#### To sum up

requests receives the HTML but cannot run JavaScript or interact with the page:

```requests.get(url).text``` gives the initial HTML

Selenium drives a real browser that runs JavaScript, so it can interact with the page:

```driver.page_source``` gives the HTML after JavaScript has modified it


| Concept                    | HTML               | JavaScript          |
| -------------------------- | ------------------ | ------------------- |
| Where it originates / runs | Sent by the server | Runs in the browser |
| Seen by requests           | ✅                  | ❌ (not executed)    |
| Seen by Selenium           | ✅                  | ✅                   |
| Can change the page        | ❌                  | ✅                   |
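
The difference between the initial and the modified HTML can be sketched with two strings: the small HTML example from this section, and the same markup after the console command from this section has appended an ```h1```. In a real comparison the two strings would come from ```requests.get(url).text``` and ```driver.page_source```. The ```TagTextCollector``` class and ```added_after_load``` function are hypothetical helpers, and the tag tracking is deliberately rough.

```python
from html.parser import HTMLParser


class TagTextCollector(HTMLParser):
    """Record (tag, text) pairs for every non-empty piece of text."""

    def __init__(self):
        super().__init__()
        self.open_tags = []
        self.pairs = []

    def handle_starttag(self, tag, attrs):
        self.open_tags.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching opening tag, if there is one.
        if tag in self.open_tags:
            while self.open_tags.pop() != tag:
                pass

    def handle_data(self, data):
        text = data.strip()
        if text:
            tag = self.open_tags[-1] if self.open_tags else None
            self.pairs.append((tag, text))


def tag_texts(html):
    parser = TagTextCollector()
    parser.feed(html)
    parser.close()
    return parser.pairs


def added_after_load(initial_html, rendered_html):
    """Return the (tag, text) pairs present in the rendered HTML but not in the initial HTML."""
    initial = tag_texts(initial_html)
    return [pair for pair in tag_texts(rendered_html) if pair not in initial]


initial_html = "<h1>Hello</h1>\n<p>This is a paragraph</p>\n"
rendered_html = initial_html + "<h1 style='color:red'>Added by JavaScript</h1>"

print(added_after_load(initial_html, rendered_html))  # [('h1', 'Added by JavaScript')]
```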


### Demo: how it works in practice

```python
# pip install selenium
```

```python
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# Open browser
driver = webdriver.Chrome()
driver.get("https://duckduckgo.com")

# Find search box and type query
search_box = driver.find_element(By.NAME, "q")
search_box.send_keys("web scraping")
search_box.send_keys(Keys.RETURN)

# Wait for results to appear
wait = WebDriverWait(driver, 10)
results = wait.until(
    EC.presence_of_all_elements_located((By.CSS_SELECTOR, "h2 a"))
)

# Extract first 5 result titles
for r in results[:5]:
    print(r.text)

driver.quit()
```


```
Web scraping — Wikipédia
Comment faire du web scraping : Le guide complet pour débutants
Qu'est-ce que le web scraping ? Comment extraire légalement ... - Kinsta
Web scraping : définition, techniques et légalité en 2026
Comment fonctionne le web scraping ? Et pourquoi l'IA ... - ZDNet
```


Why this example works well as a demo:
* no login is required
* no CAPTCHA appears
* the results only show up after typing a query and waiting, which is exactly where Selenium is needed

```python
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException

# Open a fresh browser (the previous one was closed with driver.quit())
driver = webdriver.Chrome()
driver.get("https://duckduckgo.com")
wait = WebDriverWait(driver, 5)

# Wait up to 5 seconds for a JavaScript alert; accept it if one appears
try:
    wait.until(EC.alert_is_present())
    driver.switch_to.alert.accept()
    print("Alert accepted.")
except TimeoutException:
    print("No alert appeared.")
```


```
No alert appeared.
```


## Finding the buttons you can interact with using Selenium

```python
from selenium.webdriver.common.by import By

buttons = driver.find_elements(By.TAG_NAME, "button")
print("Buttons found:", len(buttons))
for b in buttons[:15]:
    txt = b.text.strip()
    if txt:
        print("-", txt)
```


```
Buttons found: 34
- Menu
- Personnaliser
- Protection
Échappez aux escroqueries et aux entreprises avides de données
- Confidentialité
Bloque la plupart des publicités et des fenêtres contextuelles de cookies
- Tranquillité d'esprit
Effectuez des recherches et chattez sans vous faire suivre
- Chrome
- Edge
- Safari
- Firefox
```


## Let's interact with the "Personnaliser" ("Customize") button

```python
buttons = driver.find_elements(By.TAG_NAME, "button")

for b in buttons:
    if b.text.strip() == "Personnaliser":
        b.click()
        break
```

```python
driver.quit()
```