# Selenium Basics in Docker

This notebook walks you through:
- Setting up Selenium inside Docker
- Mounting a volume to run local scripts
- Collecting information from web pages
- Scraping data from real websites
- Saving scraped data to your host machine

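```dockerfile
# Dockerfile for Selenium + Python + Chrome
# The old *-debug images are deprecated; current images bundle Chrome, ChromeDriver and VNC.
FROM selenium/standalone-chrome

USER root
# On Ubuntu 24.04-based images, pip refuses system-wide installs (PEP 668) unless this is set;
# older pip versions ignore the variable.
ENV PIP_BREAK_SYSTEM_PACKAGES=1
RUN apt-get update && apt-get install -y --no-install-recommends python3-pip \
    && rm -rf /var/lib/apt/lists/*
RUN pip3 install --no-cache-dir selenium pandas beautifulsoup4

WORKDIR /home/seluser/scripts
VOLUME /home/seluser/scripts
# Keep the container idle so scripts can be started with `docker exec`.
# This replaces the image's default command, so the Selenium server and VNC are NOT started.
CMD ["tail", "-f", "/dev/null"]
```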

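### Build and run the container with the scripts folder mounted
Run these commands from the folder that contains the `Dockerfile`, with your scraping scripts in a `scripts/` subfolder next to it. That subfolder is mounted at `/home/seluser/scripts` (the container's working directory), so any files the scripts write there, such as CSV files or screenshots, also appear on your host machine.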

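```bash
docker build -t selenium-lab .
# Run in the background; Chrome is driven from inside the container, so no ports are published.
docker run -d --rm --name selenium-lab --shm-size=2g \
  -v "$(pwd)/scripts":/home/seluser/scripts selenium-lab
# Run a script from the mounted folder, then stop (and, via --rm, remove) the container.
docker exec -it selenium-lab python3 open_google.py
docker stop selenium-lab
```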
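Before scraping, a quick check can confirm that the image really contains a browser, a driver and the Python packages. The sketch below is a hypothetical `check_env.py` (not part of Selenium) that looks up binaries on the `PATH` with `shutil.which` and reads package versions with `importlib.metadata`; the binary names `google-chrome` and `chromedriver` are assumptions about the Selenium image. Save it in `scripts/` and run it with, for example, `docker exec -it <container-name> python3 check_env.py`.

```python
# file: check_env.py (hypothetical helper; run inside the container)
import shutil
from importlib import metadata

# Assumed binary and package names for this image; adjust if yours differ.
BINARIES = ("google-chrome", "chromedriver")
PACKAGES = ("selenium", "pandas", "beautifulsoup4")


def environment_report(binaries=BINARIES, packages=PACKAGES):
    """Map each binary to its path and each package to its version (None if missing)."""
    report = {}
    for name in binaries:
        report[name] = shutil.which(name)  # absolute path, or None if not on PATH
    for pkg in packages:
        try:
            report[pkg] = metadata.version(pkg)
        except metadata.PackageNotFoundError:
            report[pkg] = None
    return report


if __name__ == "__main__":
    for name, value in environment_report().items():
        print(f"{name:<16} {value or 'MISSING'}")
```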

### Python Script: Launch Chrome and navigate to a page

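```python
# file: open_google.py
from selenium import webdriver

options = webdriver.ChromeOptions()
options.add_argument('--headless=new')           # no X display is running in the container
options.add_argument('--no-sandbox')             # required when Chrome runs as root
options.add_argument('--disable-dev-shm-usage')  # avoid crashes from a small /dev/shm
driver = webdriver.Chrome(options=options)

try:
    driver.get('https://www.google.com')
    print("Title:", driver.title)
finally:
    driver.quit()  # always close the browser, even if a step fails
```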
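Scripts running inside this container need the same Chrome flags every time: headless mode (no display is running), `--no-sandbox` and `--disable-dev-shm-usage`. One option is a small shared module in the mounted folder. The module name `chrome_setup.py` and the functions `chrome_args()` and `make_driver()` are hypothetical, and the window size is an arbitrary choice that keeps screenshots consistent between runs.

```python
# file: chrome_setup.py (hypothetical shared helper, not part of Selenium)


def chrome_args(headless=True, window_size=(1366, 768)):
    """Return the Chrome command-line flags used for scripts in this container."""
    args = [
        "--no-sandbox",             # Chrome refuses to start as root without this
        "--disable-dev-shm-usage",  # write shared memory files to /tmp instead of /dev/shm
        f"--window-size={window_size[0]},{window_size[1]}",  # fixed size for consistent screenshots
    ]
    if headless:
        args.insert(0, "--headless=new")  # no X display runs in this container
    return args


def make_driver(**kwargs):
    """Start a local Chrome with chrome_args(); keyword arguments are passed through."""
    from selenium import webdriver  # imported here so chrome_args() works without Selenium

    options = webdriver.ChromeOptions()
    for arg in chrome_args(**kwargs):
        options.add_argument(arg)
    return webdriver.Chrome(options=options)
```

A script in the same folder could then begin with `from chrome_setup import make_driver` and `driver = make_driver()`.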

### Example: Scrape Python job listings from RemoteOK

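```python
# file: scrape_jobs.py
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import pandas as pd

# Same container-friendly flags as in open_google.py
options = webdriver.ChromeOptions()
for arg in ('--headless=new', '--no-sandbox', '--disable-dev-shm-usage'):
    options.add_argument(arg)
driver = webdriver.Chrome(options=options)

data = []
try:
    driver.get('https://remoteok.com/remote-dev+python-jobs')
    # Rows may appear after the initial load, so wait up to 15 s for at least one
    WebDriverWait(driver, 15).until(
        EC.presence_of_all_elements_located((By.CSS_SELECTOR, 'tr.job')))

    # Selectors match the site's markup at the time of writing and may change
    for job in driver.find_elements(By.CSS_SELECTOR, 'tr.job'):
        titles = job.find_elements(By.CSS_SELECTOR, 'td.position h2')
        companies = job.find_elements(By.CSS_SELECTOR, 'td.company h3')
        if titles and companies:  # skip rows without the expected cells
            data.append({"title": titles[0].text, "company": companies[0].text})
finally:
    driver.quit()

# Written to the mounted folder, so python_jobs.csv also appears on the host
df = pd.DataFrame(data)
df.to_csv('python_jobs.csv', index=False)
print(df.head())
```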
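Scraped text can carry stray whitespace or line breaks, and the same listing may show up more than once on a page. A hypothetical `clean_jobs()` helper, using only the standard library, could tidy the rows before they are handed to `pd.DataFrame`; treating a case-insensitive (title, company) pair as the duplicate key is an assumption.

```python
def clean_jobs(rows):
    """Collapse whitespace, drop rows without a title, and remove duplicate (title, company) pairs."""
    seen = set()
    cleaned = []
    for row in rows:
        title = " ".join(str(row.get("title", "")).split())
        company = " ".join(str(row.get("company", "")).split())
        key = (title.lower(), company.lower())  # case-insensitive duplicate check
        if not title or key in seen:
            continue
        seen.add(key)
        cleaned.append({"title": title, "company": company})  # keeps first occurrence's casing
    return cleaned
```

In `scrape_jobs.py` this would be used as `df = pd.DataFrame(clean_jobs(data))`; pandas' own `df.drop_duplicates()` is an alternative when exact-match deduplication is enough.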

### More ideas to try:
- Log in to a test website and collect dashboard data
- Monitor prices on a product page
- Extract all links from a webpage
- Capture screenshots during browsing sessions
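For the link-extraction idea, one approach is to pass the rendered HTML from `driver.page_source` to a parser. The sketch below uses the standard library's `html.parser` rather than BeautifulSoup so it has no extra dependencies; `extract_links()` is a hypothetical helper, relative links are resolved against the page URL with `urljoin`, and only `http`/`https` links are kept (an assumption that drops `mailto:` and `javascript:` links). Inside Selenium, `driver.find_elements(By.TAG_NAME, 'a')` is an alternative.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect absolute URLs from the href attribute of every <a> tag."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                absolute = urljoin(self.base_url, href)  # resolve relative links
                if urlparse(absolute).scheme in ("http", "https"):
                    self.links.append(absolute)


def extract_links(html, base_url):
    parser = LinkCollector(base_url)
    parser.feed(html)
    parser.close()
    return list(dict.fromkeys(parser.links))  # de-duplicate while keeping page order


# Usage inside a Selenium script (after driver.get(url)):
#     links = extract_links(driver.page_source, driver.current_url)
```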

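```python
# file: screenshot.py
from selenium import webdriver

options = webdriver.ChromeOptions()
for arg in ('--headless=new', '--no-sandbox', '--disable-dev-shm-usage'):
    options.add_argument(arg)
driver = webdriver.Chrome(options=options)
driver.get('https://example.com')
driver.save_screenshot('example.png')  # saved in the mounted scripts folder
driver.quit()
```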
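When capturing screenshots during a longer session, a fixed name like `example.png` is overwritten on every call. A hypothetical `screenshot_path()` helper could build timestamped, filesystem-safe names instead; the `screenshots` folder and the naming scheme are arbitrary choices.

```python
import re
from datetime import datetime
from pathlib import Path


def screenshot_path(label, directory="screenshots", now=None):
    """Return e.g. screenshots/20240102-030405_example_com.png for label 'example.com'."""
    now = now or datetime.now()
    # Replace anything that is not a letter, digit, '_' or '-' so the label is safe as a filename
    safe = re.sub(r"[^A-Za-z0-9_-]+", "_", label).strip("_") or "page"
    return Path(directory) / f"{now:%Y%m%d-%H%M%S}_{safe}.png"


# Usage inside a Selenium script (the folder must exist before saving):
#     path = screenshot_path("example.com")
#     path.parent.mkdir(parents=True, exist_ok=True)
#     driver.save_screenshot(str(path))
```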