### **SELENIUM LIBRARY**
#### **What is Selenium?**
- **Selenium** is an open-source tool for automating web browsers.
- Originally developed for **testing web applications.**
- Commonly used in **web scraping**, especially for **dynamic websites** with JavaScript content.
- Can perform **user-like actions:** clicking, scrolling, filling forms, etc.
- Useful when **static tools** like BeautifulSoup or Requests can't access rendered content.
- Works by using a browser-specific **WebDriver** executable (e.g., ChromeDriver for Chrome, msedgedriver for Edge) to control the browser.


#### **Key Features of Selenium**
- **Dynamic Content Handling**
    - Can scrape data from JavaScript-heavy websites.
    - Can wait for elements to load before interacting with or extracting them.
- **Interaction Simulation**
    - Handles clicks, form submissions, dropdown selections, scrolling, etc.
    - Useful for extracting data hidden behind user actions.
- **Cross-Browser Support**
    - Works with Chrome, Firefox, Edge, Safari, and more.
- **Customizable Waits**
    - Supports **explicit** and **implicit waits** to ensure reliable interaction with web elements.


#### **Comparative Analysis**
| **Feature**                   | **Selenium**              | **BeautifulSoup** | **Requests** |
| ----------------------------- | ------------------------- | ----------------- | ------------ |
| **Handles JavaScript**        | ✅ Yes                     | ❌ No              | ❌ No         |
| **Ease of Use**               | Moderate                  | Easy              | Easier       |
| **Speed**                     | Slower (browser overhead) | Faster            | Fastest      |
| **Interaction with Elements** | ✅ Yes                     | ❌ No              | ❌ No         |


#### Advantages of Selenium 
- Works with **dynamic JavaScript websites**.
- Simulates real user actions (click, scroll, etc.).
- Supports **multiple browsers** (Chrome, Firefox, etc.).
- Has official bindings for several **programming languages** (Python, Java, C#, JavaScript, Ruby).
- Allows **waits** for slow-loading elements.
- Can **take screenshots** and debug.
- Open-source and well-supported.
- Integrates with testing and CI tools.
- Offers flexible element locators (XPath, CSS).
- Supports **headless mode** for faster execution.


#### Disadvantages of Selenium
- **Slower** than static scraping tools.
- **High resource usage** (CPU & memory).
- Requires matching **browser drivers** (Selenium 4.6+ can manage these automatically via Selenium Manager).
- Not ideal for **simple/static pages**.
- Scripts can break on **site layout changes**.
- **Detectable** by some websites.
- Needs third-party tools for **captcha solving**.
- Harder to **scale** for large scraping tasks.
- Limited **mobile emulation** support.



### **7.1 : Getting started with Selenium**

#### **Importing Selenium:**
- **Syntax:**
  ```python
  from selenium import webdriver
  ```

#### **`driver`** object
- The **`driver`** object acts like a remote control for your browser. It allows you to automate actions like:
    - Opening URLs
    - Clicking buttons
    - Filling forms
    - Extracting content (scraping)
    - Taking screenshots
    - Navigating pages

- **Sample code:**
```python
from selenium import webdriver
driver = webdriver.Chrome()  # or webdriver.Edge(), webdriver.Firefox()
```

- Behind the scenes:
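    - **`webdriver.Chrome()`** calls a **constructor** that launches the browser and starts a WebDriver session.
    - **`driver`** is an instance of the **`Chrome`** WebDriver class, which inherits from Selenium's base **`WebDriver`** class.
    - You use it to **call methods** such as `.get()` and `.find_element()`; element-level methods like `.click()` are called on the elements that `find_element()` returns.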

| Code                   | Meaning                                                  |
| --------------------     | ------------------------------------------------------ |
| **`webdriver.Chrome()`** | Creates an object of the **`Chrome`** WebDriver class.       |
| **`driver`**             | Is the **object reference** used to control the browser. |


- **Common Functions You Can Use on driver**

| Method                           | Description                         |
| -------------------------------- | ----------------------------------- |
| **driver.get(url)**              | Opens a webpage                     |
| **driver.quit()**                | Closes all windows and ends the WebDriver session |
| **driver.close()**               | Closes the current window/tab only  |
| **driver.refresh()**             | Refreshes the page                  |
| **driver.back()**                | Goes back in browser history        |
| **driver.forward()**             | Goes forward in history             |
| **driver.maximize_window()**     | Maximizes the browser window        |
| **driver.minimize_window()**     | Minimizes the browser window        |
| **driver.set_window_size(w, h)** | Sets browser size                   |
| **driver.page_source**           | Gets entire HTML source of the page |
| **driver.title**                 | Gets the page title                 |
| **driver.current_url**           | Gets the current URL                |
| **driver.find_element(...)**     | Finds one element on the page       |
| **driver.find_elements(...)**    | Finds all matching elements         |


#### **Step-by-Step: Install WebDriver for Selenium**

##### **1. Choose the Browser**
Selenium supports the following major browsers:
- **Chrome** → needs `chromedriver`  
- **Edge** → needs `msedgedriver`  
- **Firefox** → needs `geckodriver`  

##### **2. Install WebDriver (Manual Method)**

##### **ChromeDriver (for Google Chrome)**
- Visit: [https://sites.google.com/chromium.org/driver/](https://sites.google.com/chromium.org/driver/) (for Chrome 115 and newer, this page points to the **Chrome for Testing** downloads)
- Match the driver version with your **Chrome browser version**  
  - Find version: `chrome://settings/help`
- Download the ZIP → Extract it  
- Place `chromedriver.exe` in a known folder (e.g., `D:\drivers\chromedriver.exe`)

##### **EdgeDriver (for Microsoft Edge)**

- Visit: [https://developer.microsoft.com/en-us/microsoft-edge/tools/webdriver/](https://developer.microsoft.com/en-us/microsoft-edge/tools/webdriver/)
- Download the version that matches your **Edge version**  
  - Find version: `edge://settings/help`
- Extract the ZIP  
- Place `msedgedriver.exe` in a known folder

##### **GeckoDriver (for Mozilla Firefox)**
- Visit: [https://github.com/mozilla/geckodriver/releases](https://github.com/mozilla/geckodriver/releases)
- Download the version for your OS (Windows, Linux, macOS)
- Extract and place the executable (`geckodriver.exe` on Windows, `geckodriver` on Linux/macOS) in a known path


##### **3. Use in Selenium Python Code**
##### Example: Using Chrome WebDriver
```python
from selenium import webdriver
from selenium.webdriver.chrome.service import Service

# Path to your local ChromeDriver
service = Service(r"D:\drivers\chromedriver.exe")
driver = webdriver.Chrome(service=service)

driver.get("https://example.com")
```


#### **To Open Edge Browser**

```python
from selenium import webdriver
from selenium.webdriver.edge.service import Service
import time

# Path to msedgedriver.exe
edge_driver_path = r"D:\WEB SCRAPING\S7.Selenium\edgedriver_win64\msedgedriver.exe"
service = Service(edge_driver_path)

# Launch Edge browser
driver = webdriver.Edge(service=service)

# Open a page
driver.get("https://www.google.com")
time.sleep(5)  # pause so you can see the page

# Close the browser
driver.quit()
```


#### **To Open Chrome Browser**

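```python
# Requires the third-party package: pip install webdriver-manager
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager

# ChromeDriverManager().install() downloads a matching chromedriver and returns its path
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))
driver.get("https://www.google.com")
```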


```python
# Try out some driver attributes and methods
from selenium import webdriver
from selenium.webdriver.edge.service import Service
import time

# Path to msedgedriver.exe
edge_driver_path = r"D:\WEB SCRAPING\S7.Selenium\edgedriver_win64\msedgedriver.exe"
service = Service(edge_driver_path)

# Launch Edge browser
driver = webdriver.Edge(service=service)

# Maximize the browser window
driver.maximize_window()

# Open a page
driver.get("https://www.google.com")

print(f"Title: {driver.title}")
print(f"Current Url: {driver.current_url}")
# save_screenshot() returns True on success and False if the file cannot be
# written (the 'db' folder must already exist)
print(f"\nScreenshot Taken ! :{driver.save_screenshot('db/1.google-ss.png')}")

# Delay so you can see the page before the browser closes
time.sleep(3)

# Close the browser
driver.quit()
```

Output:

```text
Title: Google
Current Url: https://www.google.com/

Screenshot Taken ! :True
```


#### **7.2 : Locators (Locating Elements)**
- Selenium provides multiple **locator strategies**, which are passed to finder methods to locate and interact with elements on a web page.
- The most commonly used methods for locating elements are:

##### **Functions Used by Locators**
| Function              | Description                                |
| --------------------- | ------------------------------------------ |
| **`find_element()`**  | Finds the **first matching** element.      |
| **`find_elements()`** | Finds **all matching** elements as a list. |

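##### **Modern Syntax (Selenium 4+)**
- **Syntax:**
  ```python
  from selenium.webdriver.common.by import By
  ```
- The older `find_element_by_*` helpers (e.g. `find_element_by_id()`) were removed in Selenium 4.3; use `find_element(By.<LOCATOR>, "value")` instead.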

##### **Syntax for `find_element()` & `find_elements()`**
| Purpose                  | Function                                          |
| ------------------------ | ------------------------------------------------- |
| Locate one element       | **`driver.find_element(By.<LOCATOR>, "value")`**  |
| Locate multiple elements | **`driver.find_elements(By.<LOCATOR>, "value")`** |


##### **`By` in Selenium: Acts Like a Locator Strategy Filter**
- `By` is **not a function** but a **class** in Selenium that acts as a **filter/strategy**, telling Selenium how to find elements on a web page.
- Think of `By` as a way to specify the type of selector you're using: `By.ID`, `By.NAME`, `By.XPATH`, etc.


##### **Selenium Locator**
- Here is the complete list of Selenium locator strategies in the `By` class (Python, Selenium 4+), along with their purpose and syntax:

| Locator Method        | `By` Syntax            | Description                                       |
| --------------------- | ---------------------- | ------------------------------------------------- |
| **ID**                | `By.ID`                | Locates elements by the **ID attribute**          |
| **Name**              | `By.NAME`              | Locates elements by the **name attribute**        |
| **Class Name**        | `By.CLASS_NAME`        | Locates elements by a **single class name**       |
| **Tag Name**          | `By.TAG_NAME`          | Locates elements by their **HTML tag name**       |
| **Link Text**         | `By.LINK_TEXT`         | Locates `<a>` links with **exact visible text**   |
| **Partial Link Text** | `By.PARTIAL_LINK_TEXT` | Locates `<a>` links with **partial visible text** |
| **CSS Selector**      | `By.CSS_SELECTOR`      | Locates using **CSS rules** (like in stylesheets) |
| **XPath**             | `By.XPATH`             | Locates using **XPath expressions**               |




#### **SelectorsHub – XPath & CSS Selector Plugin (Free)**
- **SelectorsHub** is a free browser extension that helps you **auto-generate, write, and verify:**
    - XPath (axes-based, relative, absolute, index-based)
    - CSS Selectors
    - jQuery, JS Path, and Playwright selectors

- **Key Features:**
    - Auto-generates all possible selectors for inspected elements
    - Supports **Shadow DOM, iframes, SVG,** and **dynamic elements**
    - Suggests automation exceptions and validates selectors
    - Verifies multiple selectors at once
    - Built-in error handling for XPath/CSS issues

![image.png](attachment:606da9e4-5a29-48b7-b736-c00f2aea9cce.png)

### **7.3 : Understanding XPath**
- **XPath** (short for XML Path Language) is a **query language** used to **navigate through elements and attributes in XML or HTML documents.**

- In Selenium, **XPath is used to locate elements** on a web page — especially when elements **don't have unique IDs or class names.**

- **In simple words:**  
    > HTML is like the full map of a webpage.

    > XPath is like giving directions to find a specific place (element) on that map.

---

##### **Why XPath is Useful in Selenium?**

| ✅ Feature               | 🔎 What It Means                                                               |
| ----------------------- | ------------------------------------------------------------------------------ |
| **Powerful Navigation** | Can find elements anywhere in the DOM, even deeply nested or with no ID/class. |
| **Flexible Conditions** | Use conditions like `@attribute='value'`, `contains()`, `starts-with()`, etc.  |
| **Supports Axes**       | Navigate using parent, child, sibling, ancestor, etc.                          |
| **Precise Targeting**   | Locate exact elements when CSS/ID-based locators fail or are not stable.       |
| **Indexing Support**    | Allows selecting elements by position: `(//div[@class='row'])[2]`              |



---




- **XPath rules and best practices every Selenium user should know:** 

| Rule                                            | Description                                   | Example                                    |
| ----------------------------------------------- | --------------------------------------------- | ------------------------------------------ |
| **1. Use `//` to search anywhere**              | Selects nodes **anywhere** in the document    | `//input` — selects all `<input>` elements |
| **2. Use `/` for direct child**                 | Selects the **immediate child**               | `/html/body/div` — direct structure        |
| **3. Use `@` for attributes**                   | Targets an attribute of an element            | `//input[@id='username']`                  |
| **4. Use `[]` for conditions**                  | Filters elements based on attributes or index | `//input[@type='text']` or `(//div)[2]`    |
| **5. Use `text()` to match visible text**       | Matches elements by the **inner text**        | `//button[text()='Login']`                 |
| **6. Use `contains()` for partial matches**     | Useful when values change dynamically         | `//input[contains(@class, 'form')]`        |
| **7. Use `and` / `or` for multiple conditions** | Combine multiple attribute checks             | `//input[@type='text' and @name='email']`  |
| **8. Use `*` to match any tag**                 | Wildcard to select any element                | `//*[@id='main']` — any tag with id=main   |

---

- **Advanced XPath Rules**

| Rule                                                          | Description                             | Example                                            |
| ------------------------------------------------------------- | --------------------------------------- | -------------------------------------------------- |
| **9. Use `position()` for indexing**                          | Gets element by its position            | `(//input)[position()=2]`, or the shorthand `(//input)[2]` (2nd input tag) |
| **10. Use axes like `parent::`, `following-sibling::`, etc.** | Navigate complex DOMs                   | `//label[text()='Email']/following-sibling::input` |
| **11. Avoid absolute XPath**                                  | Full path is brittle and breaks easily  | ❌ `/html/body/div[2]/form/input[1]`                |
| **12. Prefer relative XPath**                                 | Starts from relevant nodes, more stable | ✅ `//form[@id='loginForm']/input[1]`               |

---

- **Common XPath Functions**

| Function            | Use                                                                      |
| ------------------- | ------------------------------------------------------------------------ |
| `text()`            | Selects the element's own text node(s)                                   |
| `contains()`        | Partial match for text or attribute                                      |
| `starts-with()`     | Matches the beginning of a value                                         |
| `last()`            | Selects the last node in a set                                           |
| `normalize-space()` | Trims leading/trailing whitespace and collapses inner runs to one space  |

---

- **Best Practices**
    - Prefer **unique attributes** (`id`, `name`) when possible.
    - Use `contains()` for **dynamic classes or IDs**.
    - Avoid relying on **index-based XPath** unless structure is stable.
    - Always **verify XPath manually** using browser DevTools or tools like **SelectorsHub**.


#### **Structural presentation**
- The same HTML document shown as markup, as a tree, and navigated with XPath:
- **HTML Document Structure (Example)**
```html
    <html>
      <body>
        <div class="container">
          <form id="loginForm">
            <input type="text" id="username" />
            <input type="password" id="password" />
            <button type="submit">Login</button>
          </form>
        </div>
      </body>
    </html>
```

- **HTML as a Tree Structure**
  ```text
  html
  └── body
      └── div (class="container")
          └── form (id="loginForm")
              ├── input (type="text", id="username")
              ├── input (type="password", id="password")
              └── button (type="submit") → "Login"
  ```



- **XPath Navigating This Tree**

| XPath Expression               | What It Selects                               |
| ------------------------------ | --------------------------------------------- |
| **`/html/body/div`**           | The `<div>` element                           |
| **`//form[@id='loginForm']`**  | The `<form>` with id "loginForm"              |
| **`//input[@id='username']`**  | The username input field                      |
| **`//button[text()='Login']`** | The login button by its visible text          |
| **`//form/input[2]`**          | The second input inside the form (`password`) |
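You can practise these paths without a browser using Python's standard-library `xml.etree.ElementTree`, which supports only a limited XPath subset (no `text()`, `contains()`, or axes). This sketch is for exploring the tree above, not a replacement for the XPath engine Selenium uses in the browser; paths must be relative to the parsed root, and `[.='Login']` stands in for `text()='Login'`.

```python
import xml.etree.ElementTree as ET

# The sample login page above; it is well-formed XML, so ElementTree can parse it.
PAGE = """
<html>
  <body>
    <div class="container">
      <form id="loginForm">
        <input type="text" id="username" />
        <input type="password" id="password" />
        <button type="submit">Login</button>
      </form>
    </div>
  </body>
</html>
"""

root = ET.fromstring(PAGE.strip())  # root is the <html> element

# ElementTree paths are relative to `root`, so "/html/body/div" becomes "./body/div".
div = root.find("./body/div")
form = root.find(".//form[@id='loginForm']")
username = root.find(".//input[@id='username']")
second_input = root.find(".//form/input[2]")     # positions start at 1, as in XPath
login_button = root.find(".//button[.='Login']")  # ElementTree's stand-in for text()='Login'

print(div.get("class"), form.get("id"), username.get("type"), second_input.get("id"), login_button.text)
# prints: container loginForm text password Login
```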


### **7.4 : Interacting With Elements**

#### **Selenium `Keys`**

##### **What is `Keys`?**
- In Selenium, `Keys` is a class used to simulate **keyboard key presses** like:
    - `Keys.ENTER`
    - `Keys.TAB`
    - `Keys.BACKSPACE`
    - `Keys.CONTROL`
    - `Keys.SHIFT`
    - `Keys.ARROW_DOWN`, etc.

> **In simple words:**  
> `Keys` lets your script press keys on the keyboard like a real user.

---

##### **Importing `Keys`**
```python
from selenium.webdriver.common.keys import Keys
```
##### **What is `send_keys()` in Selenium?**
- This method **simulates typing into input fields or other elements**, the way a real keyboard would.

- **Purpose:**
    - Type text into text boxes, textareas, or inputs.
    - Simulate pressing special keys like `ENTER`, `TAB`, `ESC`, etc.

- **Syntax:** **`element.send_keys(*value)`**
    - **Parameters:**
        - **`value`**: one or more strings and/or `Keys` constants, typed in the order given

  
----

#####  **When to Use `send_keys()` with `Keys`**
Use `Keys` when:
- You want to **submit a form** (`Keys.ENTER`)
- Navigate between fields (`Keys.TAB`)
- Clear input or correct mistakes (`Keys.BACKSPACE`)
- Perform keyboard shortcuts (`Keys.CONTROL + "a"` selects all text)


##### **`clear()` method**
- This method **clears the text content** of input fields or textareas.
- **Purpose:**
  - Remove existing text from input fields.
  - Useful before using `send_keys()` to avoid appending text.
- **Syntax:** **`element.clear()`**

##### **`click()` method**
- This method **simulates a mouse click** on any clickable web element.
- **Purpose:**
  - Click buttons, links, checkboxes, radio buttons, etc.
  - Trigger actions tied to click events.
- **Syntax:** **`element.click()`**

##### **`submit()` method**
- This method **submits the form** that the element belongs to.
- **Purpose:**
  - Used on forms or child elements of a form to submit it.
  - Often used on input fields instead of clicking a submit button.
- **Syntax:** **`element.submit()`**
- **NOTE:**
    - `submit()` works only if the element is inside a real HTML `<form>`.
    - In modern web apps, forms are often built **without `<form>` tags**, using JavaScript.
    - In such cases, `.submit()` **won’t work**.
    - Use `.click()` on the button or `send_keys(Keys.ENTER)` instead to trigger submission.

```python
from selenium import webdriver
from selenium.webdriver.edge.service import Service
import time
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys

# Path to msedgedriver.exe
edge_driver_path = r"D:\WEB SCRAPING\S7.Selenium\edgedriver_win64\msedgedriver.exe"
service = Service(edge_driver_path)

# Launch Edge browser
driver = webdriver.Edge(service=service)

driver.maximize_window()

driver.get('https://github.com/login')
time.sleep(2)

# Username field (located by its id)
username_field = driver.find_element(By.ID, 'login_field')
username_field.send_keys('Akashpagi')
time.sleep(1)

# Password field: placeholder value; never hard-code real credentials in scripts
password_field = driver.find_element(By.ID, 'password')
password_field.send_keys('9211')
time.sleep(1)

# Submit button: absolute XPath copied from DevTools. It breaks if GitHub changes
# its layout (see rule 11 above), so a relative locator is more robust.
submit_button = driver.find_element(By.XPATH, '/html/body/div[1]/div[3]/main/div/div[2]/form/div[3]/input')
submit_button.click()
time.sleep(2)

driver.quit()
```

```python
from selenium import webdriver
from selenium.webdriver.edge.service import Service
import time
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys

# Path to msedgedriver.exe
edge_driver_path = r"D:\WEB SCRAPING\S7.Selenium\edgedriver_win64\msedgedriver.exe"
service = Service(edge_driver_path)

# Launch Edge browser
driver = webdriver.Edge(service=service)

# driver.maximize_window()

# Open a page
driver.get("https://www.bing.com")

# Delay to see the browser open
time.sleep(5)

# Locate the search box
search_bar_xpath = '//*[@id="sb_form_q"]'
search_bar = driver.find_element(by=By.XPATH, value=search_bar_xpath)

# Type the query
search_bar.send_keys('Machine Learning Wikipedia')
time.sleep(5)

# Optional: clear the input field
# search_bar.clear()
# time.sleep(1)

# Press Enter to run the search
search_bar.send_keys(Keys.ENTER)
time.sleep(3)

# Click a result link (XPath copied from DevTools; tied to Bing's current layout)
link_xpath = '//*[@id="gs_main"]/div[2]/div[3]/div/div[1]/a/div/div[2]/div'
link = driver.find_element(by=By.XPATH, value=link_xpath)
link.click()
time.sleep(5)

driver.quit()
```