# Beautiful Soup Scraper

In [None]:
# TO WORK WITH BEAUTIFUL SOUP : 
    # pip install requests
    # pip install html5lib
    # pip install bs4

In [None]:
# import requests
# from bs4 import BeautifulSoup 

In [None]:
# URL = "https://en.wikipedia.org/wiki/Grand_Theft_Auto_VI"
# r = requests.get(URL)
# r.content

# Example Code (Selenium Scraper on NSE Data)

In [None]:
from selenium import webdriver
from selenium.webdriver.common.by import By
import numpy as np
import pandas as pd 

In [None]:
# Uncomment the next two lines and pass options=options to webdriver.Chrome() to run without a visible browser window
# options = webdriver.ChromeOptions()
# options.add_argument("--headless=new")

driver = webdriver.Chrome()

driver.get("https://www.nseindia.com/market-data/live-equity-market")

In [None]:
print(driver.page_source)

In [None]:
stocks = driver.find_elements(By.CLASS_NAME, 'symbol-word-break')
values = driver.find_elements(By.CLASS_NAME, 'text-right')


# Extracting the text from each WebElement
stock_values = [stock.text for stock in stocks]
values_values = [value.text for value in values]

# Printing the values
print(stock_values)


In [None]:
# `values_values` holds the flat list of cell texts collected above

columns = 13  # Number of columns in each row

# Reshape the list into sublists of 13 elements each
table = [values_values[i:i + columns] for i in range(0, len(values_values), columns)]
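
The reshaping step above can also be written as a small reusable helper. This is a minimal sketch: the name `chunk_rows` and the choice to drop a trailing incomplete row are assumptions for illustration, not part of the original notebook.

```python
def chunk_rows(cells, width):
    """Split a flat list of cell texts into rows of `width` cells each."""
    if width <= 0:
        raise ValueError("width must be a positive integer")
    # Keep only complete rows; a short trailing row usually means the
    # selector matched extra cells or the page was still loading.
    usable = len(cells) - (len(cells) % width)
    return [cells[i:i + width] for i in range(0, usable, width)]


# Example with 3 columns instead of 13 to keep the output short
flat = ["A", "1", "2", "B", "3", "4", "C"]
rows = chunk_rows(flat, 3)
print(rows)  # [['A', '1', '2'], ['B', '3', '4']]
```

With the scraped data, `chunk_rows(values_values, 13)` would produce the same rows as the list comprehension, minus any incomplete last row.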

In [None]:
pd.concat([pd.DataFrame(stock_values),pd.DataFrame(table[2:])],axis=1)

In [None]:
driver.quit()

# **Web Scraping Using Selenium**



Do you want to master **web scraping with Selenium** in Python, from the basics to advanced techniques, including strategies to avoid being blocked? Look no further!

Selenium is a widely-used **open-source library** designed for **browser automation** and **scraping dynamic content**. It leverages the **WebDriver protocol** to control popular browsers like **Chrome**, **Firefox**, and **Safari**. 

Unlike HTTP-only scraping tools such as Requests with Beautiful Soup, Selenium drives a real browser, so it can interact with **JavaScript-heavy websites**. Its human-like interaction capabilities can also help it mimic real users and pass some anti-bot checks, although they do not guarantee it.

---

## <span style="color:#2E86C1"><b>Why Selenium?</b></span>

- **Handles Dynamic Content**: Collects data from sites that render content using JavaScript.
- **Human-Like Interaction**: Simulates real user actions like typing, scrolling, and clicking.
- **Browser Compatibility**: Works with major browsers like Chrome, Firefox, Safari, and Edge.
- **Powerful Automation**: Ideal for both web scraping and automated testing.

---

## <span style="color:#D35400"><b>Getting Started with Selenium</b></span>

To scrape data using Selenium, we’ll use **Google Chrome**, one of the most popular browsers for automation. You’ll need to install the following tools:

### <span style="color:#28B463"><b>Step 1: Install Google Chrome</b></span>

Download the latest version of **Google Chrome** from the official website:
[Download Google Chrome](https://www.google.it/intl/en/chrome/)

### <span style="color:#E74C3C"><b>Step 2: Install ChromeDriver</b></span>

- Visit the official [ChromeDriver page](https://googlechromelabs.github.io/chrome-for-testing/).
- Download the **ChromeDriver** version that matches your Chrome browser version.
- Extract the zipped folder and locate the `chromedriver` executable.
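- Move the `chromedriver` file to your project root folder for easy access. With Selenium 4.6 or later this manual step is usually unnecessary, because the bundled Selenium Manager downloads a matching driver automatically.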

---

## <span style="color:#F39C12"><b>Step #1: Setting Up Selenium in Python</b></span>

Use the following code snippet to set up Selenium for **Google Chrome**:

In [None]:
# ! pip install selenium  # install it in your current env 

In [None]:
# import the required library
from selenium import webdriver
 
# initialize an instance of the chrome driver (browser)
driver = webdriver.Chrome()

# visit your target site
driver.get("https://www.scrapingcourse.com/ecommerce/")

# output the full-page HTML
# print(driver.page_source)

# release the resources allocated by Selenium and shut down the browser
driver.quit()

The code opens a Chrome window that displays the message "Chrome is being controlled by automated test software", an extra banner informing you that Selenium is controlling this Chrome instance:

<center><img src="./images/SeleniumControlChrome.png" alt="error" width="600"/></center>

# **Headless Browsers in Python**


A *headless browser* is a browser that operates without a **Graphical User Interface (GUI)** but retains all the functionalities of a regular browser. These browsers are controlled using automation scripts and are widely used for tasks such as **test automation** and **web scraping**.

Headless browsers are usually faster than GUI-based browsers because they skip drawing the interface on screen. They still let you execute JavaScript, automate interactions like **clicking**, **scrolling**, and **typing**, and handle dynamic websites efficiently.

---

## <span style="color:#2E86C1"><b>What Is a Headless Browser in Python?</b></span>

A **Python headless browser** is an automation tool designed to perform browser operations invisibly, controlled through scripts. Popular headless browser tools in Python include **Selenium** and **Playwright**.

### **Key Features:**
- **No GUI**: Operates without graphical rendering, improving performance.
- **Automation Capabilities**: Automate interactions like typing, clicking, scrolling, and more.
- **Dynamic Content Handling**: Extract data from JavaScript-rendered web pages.
- **Waiting Mechanism**: Waits for web elements to load before taking further actions.

---

## <span style="color:#D35400"><b>Step #2: Setting Up Headless Mode in Selenium</b></span>

To enable *headless mode* in Selenium for Chrome, you use the `ChromeOptions` object. Here’s how to set it up:

```python
from selenium import webdriver
from selenium.webdriver.chrome.options import Options

# Configure Chrome for headless mode
chrome_options = Options()
chrome_options.add_argument("--headless=new")  # Enable headless mode for Chrome 109+
chrome_options.add_argument("--disable-gpu")   # Disable GPU for stability
chrome_options.add_argument("--window-size=1920,1080")  # Set browser window size

# Start WebDriver with options
driver = webdriver.Chrome(options=chrome_options)
driver.get("https://example.com")
print(driver.title)
driver.quit()
```

---

## <span style="color:#28B463"><b>Comparison of Popular Python Headless Browsers</b></span>

| **Name**           | **Pros**                                                                                                                                                     | **Cons**                                                                                                                                                     |
|---------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------|
| **Selenium**        | - **Wide Browser Support**: Works with major browsers like Chrome, Firefox, Edge, Safari, and Internet Explorer.<br>- **Extensive Plugins**: Supports plugins like the Undetected ChromeDriver for bypassing bot detection.<br>- **Rich Ecosystem**: Well-documented and widely used in testing and scraping communities. | - **Complex for Advanced Tasks**: Has a steep learning curve, especially for intricate automation or scraping tasks.<br>- **Limited Browser Control**: Provides limited direct manipulation of browser properties like the `navigator` field.<br>- **Messy JavaScript Execution**: Requires wrapping JavaScript code in strings, making it less intuitive compared to alternatives. |
| **Playwright**      | - **Simple API Methods**: Offers straightforward and developer-friendly APIs for automation tasks.<br>- **Broad Browser Support**: Works with major browsers, including Chrome, Firefox, Safari, and Edge.<br>- **Fine-Grained Browser Control**: Allows direct manipulation of browser properties via the Chrome DevTools Protocol.<br>- **Anti-Bot Features**: Supports evasion plugins like Playwright Stealth for bypassing detection mechanisms. | - **Large Installation Size**: Downloading browser binaries increases disk space usage.<br>- **Limited Legacy Support**: Does not support older browsers like Internet Explorer, limiting its use for certain projects. |

## <span style="color:#F39C12"><b>Other Simple Tools</b></span>

While Selenium and Playwright are the most feature-rich, other tools like **Pyppeteer**, **Splash**, and **MechanicalSoup** can also be used for simpler or specific tasks.

- **Pyppeteer**: Good for Chromium-specific tasks but lacks updates.
- **Splash**: Lightweight and integrates with Scrapy but requires Lua scripting.
- **MechanicalSoup**: Ideal for static websites but limited for JavaScript-heavy pages.


Choose a tool based on your project requirements, balancing simplicity, performance, and compatibility.


In [None]:
# import the required library
from selenium import webdriver

options = webdriver.ChromeOptions()

options.add_argument("--headless=new")
 
# initialize an instance of the chrome driver (browser)
driver = webdriver.Chrome(options=options)

# visit your target site
driver.get("https://www.scrapingcourse.com/ecommerce/")

# output the full-page HTML
# print(driver.page_source)

# release the resources allocated by Selenium and shut down the browser
driver.quit()

# **Step #3: Extract Specific Data From the Page**

To extract specific data from a website, Selenium allows you to scrape information such as product names, prices, image sources, and URLs from the target page. 

Selenium provides two primary methods to locate elements on a web page:

1. **`find_element`**: Retrieves a single element. If multiple elements match the selector, it returns the **first matching element**.
2. **`find_elements`**: Retrieves **all elements** matching the selector as a list.
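
One practical difference between the two methods: when nothing matches, `find_element` raises `NoSuchElementException`, while `find_elements` returns an empty list. The sketch below relies on that behavior to read optional fields safely. The helper name `first_text_or_default` is hypothetical, and `parent` can be either the driver or a previously located element.

```python
def first_text_or_default(parent, by, value, default=None):
    """Return the text of the first match under `parent`, or `default`."""
    # find_elements returns an empty list when nothing matches,
    # so no exception handling is needed here.
    matches = parent.find_elements(by, value)
    return matches[0].text if matches else default
```

A call such as `first_text_or_default(driver, By.CSS_SELECTOR, ".price", default="n/a")` would then return `"n/a"` instead of failing when a product has no price element.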

---

## <span style="color:#28B463"><b>Methods for Locating Elements</b></span>

### **Categories of Locators**
Selenium supports eight locator strategies divided into three main categories:

- **CSS Selectors**: `By.ID`, `By.CLASS_NAME`, `By.CSS_SELECTOR`
- **XPath**: `By.XPATH`
- **Direct Selectors**: `By.NAME`, `By.LINK_TEXT`, `By.PARTIAL_LINK_TEXT`, `By.TAG_NAME`

---

## <span style="color:#2E86C1"><b>Locator Strategies with Examples</b></span>

The table below describes the strategies, their usage, and corresponding Selenium examples:

| **Strategy**          | **Description**                                                | **HTML Sample Code**                      | **Selenium Example**                                                                 |
|------------------------|---------------------------------------------------------------|-------------------------------------------|-------------------------------------------------------------------------------------|
| **By.ID**             | Selects elements based on their `id` attribute                | `<div id="s-437">...</div>`               | `find_element(By.ID, "s-437")`                                                     |
| **By.CLASS_NAME**     | Selects elements based on their `class` attribute             | `<div class="welcome-text">Welcome!</div>`| `find_element(By.CLASS_NAME, "welcome-text")` <br>`find_elements(By.CLASS_NAME, "text-center")` |
| **By.CSS_SELECTOR**   | Selects elements matching a CSS selector                      | `<div class="product-card"><span class="price">$140</span></div>` | `find_element(By.CSS_SELECTOR, ".product-card .price")` <br>`find_elements(By.CSS_SELECTOR, ".product-card .price")` |
| **By.XPATH**          | Selects elements using an XPath expression                   | `<h1>My <strong>Fantastic</strong> Blog</h1>` | `find_element(By.XPATH, "//h1/strong")` <br>`find_elements(By.XPATH, "//h1/strong")` |
| **By.NAME**           | Selects elements based on their `name` attribute             | `<input name="email" />`                  | `find_element(By.NAME, "email")` <br>`find_elements(By.NAME, "email")`             |
| **By.LINK_TEXT**      | Selects anchor (`<a>`) elements matching a specific link text| `<a href="/">Home</a>`                    | `find_element(By.LINK_TEXT, "Home")` <br>`find_elements(By.LINK_TEXT, "Home")`     |
| **By.PARTIAL_LINK_TEXT** | Selects anchor (`<a>`) elements matching a substring of the link text | `<a href="/">Click here now</a>` | `find_element(By.PARTIAL_LINK_TEXT, "now")` <br>`find_elements(By.PARTIAL_LINK_TEXT, "now")` |
| **By.TAG_NAME**       | Selects elements based on their tag name                     | `<span>...</span>`                        | `find_element(By.TAG_NAME, "span")` <br>`find_elements(By.TAG_NAME, "span")`       |

---

## <span style="color:#F39C12"><b>CSS Selectors vs XPath</b></span>

- **CSS Selectors**: Recommended for beginners due to simplicity and maintainability. They are ideal for selecting elements with classes and IDs.
- **XPath**: Useful for navigating complex HTML structures. It provides more specificity when selecting nodes.

---

## <span style="color:#9B59B6"><b>Quick Tips</b></span>

1. **Get CSS Selectors and XPath Automatically**:  
   In the browser's DevTools **Elements** panel, right-click an element, open the **"Copy" menu**, and select either:
   - **Copy selector** (for CSS Selector)
   - **Copy XPath** (for XPath)

2. **Choosing the Right Method**:  
   Use **CSS Selectors** for simplicity. Use **XPath** for complex cases where CSS Selectors might not suffice.

---

Now that you understand the element locators and their applications, try them out to scrape data effectively!


In [None]:
# First, import By, the Selenium class containing all the built-in locator strategies

from selenium.webdriver.common.by import By

In [None]:
# import the required library
from selenium import webdriver

# options = webdriver.ChromeOptions()

# options.add_argument("--headless=new")
 
driver = webdriver.Chrome()

driver.get("https://www.scrapingcourse.com/ecommerce/")

In [None]:
# extract all the product containers
products = driver.find_elements(By.CSS_SELECTOR, ".product")
products

In [None]:

# extract the elements into a dictionary using the CSS selector
product_data = {
    "Url": driver.find_element(
        By.CSS_SELECTOR, ".woocommerce-LoopProduct-link"
    ).get_attribute("href"),
    "Image": driver.find_element(By.CSS_SELECTOR, ".product-image").get_attribute(
        "src"
    ),
    "Name": driver.find_element(By.CSS_SELECTOR, ".product-name").text,
    "Price": driver.find_element(By.CSS_SELECTOR, ".price").text,
}

# print the extracted data
print(product_data)

In [None]:
# declare an empty list to collect the extracted data

extracted_products = []

# loop through the product containers

for product in products:

    # extract the elements into a dictionary using the CSS selector
    product_data = {
        "Url": product.find_element(
            By.CSS_SELECTOR, ".woocommerce-LoopProduct-link"
        ).get_attribute("href"),
        "Image": product.find_element(By.CSS_SELECTOR, ".product-image").get_attribute(
            "src"
        ),
        "Name": product.find_element(By.CSS_SELECTOR, ".product-name").text,
        "Price": product.find_element(By.CSS_SELECTOR, ".price").text,
    }

    # append the extracted data to the extracted_products list

    extracted_products.append(product_data)


In [None]:
driver.quit()

In [None]:
import pandas as pd 

data = pd.DataFrame(extracted_products)
data.head()

In [None]:
data.to_csv('data.csv')


# **How to Interact With a Web Page as in a Browser**

Selenium allows you to mimic human interactions with web pages, enabling actions such as **scrolling**, **clicking**, **hovering**, **filling out forms**, and even **dragging and dropping**. This capability is especially useful when working with dynamic pages or avoiding anti-bot measures.

In this section, we'll explore **browser interactions** you might frequently use while scraping with Selenium, focusing on **scrolling** as a vital technique.

---

## <span style="color:#2E86C1"><b>Scrolling</b></span>

### <span style="color:#28B463"><b>Why Scrolling Matters</b></span>
Scrolling is essential when scraping websites that load content dynamically, such as those implementing **infinite scrolling**. These pages use AJAX to fetch more data as you scroll, which requires additional actions to access all the content.

---

### <span style="color:#28B463"><b>Infinite Scrolling Logic</b></span>

To scrape such pages, you can simulate continuous scrolling using **JavaScript** within Selenium's `execute_script` method. The process involves:

1. **Getting the Initial Page Height**: This serves as a baseline for detecting changes in content.
2. **Initiating a Scrolling Action in a Loop**: Use JavaScript to scroll to the bottom of the page.
3. **Pausing for Content to Load**: Add a delay using `time.sleep` to allow AJAX calls to complete.
4. **Checking for Additional Content**: Compare the updated page height with the previous height to determine if new content has loaded.
5. **Breaking the Loop**: Stop when no new content is detected.
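
The five steps above can be wrapped in a small helper. This is a minimal sketch under stated assumptions: the function name `scroll_until_stable`, the `pause` default, and the `max_rounds` safety cap are illustrative additions, not part of Selenium. The helper only needs an object with an `execute_script` method, such as a Selenium driver.

```python
import time


def scroll_until_stable(driver, pause=2.0, max_rounds=50):
    """Scroll to the bottom until the page height stops growing.

    Returns the number of scrolls that loaded new content.
    """
    # step 1: record the initial page height
    last_height = driver.execute_script("return document.body.scrollHeight")
    loaded = 0
    for _ in range(max_rounds):
        # step 2: scroll to the bottom of the page
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        # step 3: give AJAX requests time to finish
        time.sleep(pause)
        # step 4: compare the new height with the previous one
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            break  # step 5: no new content appeared
        last_height = new_height
        loaded += 1
    return loaded
```

The `max_rounds` cap is a design choice to keep the loop from running indefinitely on pages that never stop loading content.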

---

### <span style="color:#28B463"><b>Key Points to Note</b></span>

- **Time Delays**: Adjust the `time.sleep` duration based on the loading speed of the target website. Some sites might require longer pauses for content to appear.
- **JavaScript Execution**: Selenium's `execute_script` method is pivotal for simulating browser actions that aren't directly available via the WebDriver API.
- **Dynamic Content Handling**: Scrolling is just one step. Ensure your scraper is equipped to handle elements like pop-ups, lazy-loaded images, or CAPTCHA challenges.

---

### <span style="color:#28B463"><b>Benefits of Simulating Scrolling</b></span>

- **Access Dynamic Content**: Extract hidden data that only becomes visible after scrolling.
- **Mimic User Behavior**: Reduce the likelihood of being flagged as a bot by emulating human-like interactions.
- **Handle Large Datasets**: Scrape all content from long pages without manually clicking "Load More."

---

By mastering scrolling techniques, you'll enhance your ability to scrape dynamic and content-rich websites effectively with Selenium!

In [None]:
import time 
from selenium import webdriver
from selenium.webdriver.common.by import By

options = webdriver.ChromeOptions()

# options.add_argument("--headless=new")

driver = webdriver.Chrome(
    options=options
)

driver.get("https://www.scrapingcourse.com/infinite-scrolling")

# get the initial scroll height
last_height = driver.execute_script("return document.body.scrollHeight")

while True:
    # scroll to the bottom of the page
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")

    # wait for more elements to load after scrolling
    time.sleep(5)

    # get the new scroll height after scrolling
    new_height = driver.execute_script("return document.body.scrollHeight")

    # check if new content has loaded
    if new_height == last_height:
        # if no new content is loaded, break the loop
        break

    # update the last height
    last_height = new_height

# extract all product containers
products = driver.find_elements(By.CSS_SELECTOR, ".product-item")

# declare an empty list to collect the extracted data
extracted_products = []

# loop through each product container to extract details
for product in products:
    product_data = {
        "Name": product.find_element(By.CSS_SELECTOR, ".product-name").text,
        "Price": product.find_element(By.CSS_SELECTOR, ".product-price").text,
    }
    extracted_products.append(product_data)

# output the data
print(extracted_products)

# release the resources allocated by Selenium and shut down the browser
driver.quit()



# **Handling JavaScript-Rendered Pages in Selenium**

Scraping JavaScript-rendered pages requires additional steps since the DOM takes time to fully load. The element you want to scrape may not be immediately available, so you'll need to implement waiting mechanisms.

### <span style="color:#9B59B6"><b>Selenium provides three primary ways to handle this</b></span>

| **Method**            | **Description**                                                                                      | **Best Use Cases**                                      | **Limitations**                                                                 |
|------------------------|------------------------------------------------------------------------------------------------------|--------------------------------------------------------|---------------------------------------------------------------------------------|
| **`time.sleep()`**     | Pauses execution for a fixed time interval.                                                         | Handling uncertain wait times or infinite scrolling.   | Inefficient; adds unnecessary delays.                                          |
| **`implicitly_wait`**  | Globally waits for elements to be present in the DOM.                                               | Applying a uniform wait for all elements.              | Limited control; doesn’t check visibility or interactability.                  |
| **`WebDriverWait`**    | Pauses execution until a specific condition is met or the timeout is reached.                       | Waiting for dynamic elements or specific conditions.    | Requires additional setup and knowledge of Selenium APIs.                      |


### <span style="color:#F39C12"><b>Recommendations</b></span>

- **Use `WebDriverWait`** for flexibility and efficiency when dealing with dynamic JavaScript-rendered pages.
- **Avoid overusing `time.sleep()`**, as it introduces inefficiencies.
- **Set a reasonable `implicitly_wait`** time as a fallback for general page loading delays.
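
To make the difference between a fixed sleep and an explicit wait concrete, here is a library-free sketch of the polling idea behind an explicit wait. It is not Selenium's implementation: `wait_until` and its parameters are hypothetical names chosen for illustration.

```python
import time


def wait_until(condition, timeout=5.0, poll_interval=0.5):
    """Call `condition()` repeatedly until it returns a truthy value.

    Returns that value, or raises TimeoutError once `timeout` seconds pass.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result  # stop as soon as the condition holds
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(poll_interval)


# Example: a condition that becomes true on the third check
attempts = {"count": 0}


def ready():
    attempts["count"] += 1
    return attempts["count"] >= 3


print(wait_until(ready, timeout=2.0, poll_interval=0.01))  # True
```

Unlike `time.sleep(5)`, which always costs five seconds, this pattern returns as soon as the condition is met and only pays the full timeout when something is wrong.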


In [None]:
# import the required libraries

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# instantiate a Chrome options object
options = webdriver.ChromeOptions()

# set the options to use Chrome in headless mode
options.add_argument("--headless=new")

# initialize an instance of the Chrome driver (browser) in headless mode
driver = webdriver.Chrome(options=options)

# visit your target site
driver.get("https://www.scrapingcourse.com/javascript-rendering")

# wait up to 5 seconds until the product items are visible
element = WebDriverWait(driver, 5).until(
    EC.visibility_of_all_elements_located((By.CSS_SELECTOR, ".product-item"))
)


# you are now sure that the product grid has loaded
# and can scrape it
products = driver.find_elements(By.CSS_SELECTOR, ".product-item")

extracted_products = []

for product in products:
    product_data = {
        "name": product.find_element(By.CSS_SELECTOR, ".product-name").text,
        "price": product.find_element(By.CSS_SELECTOR, ".product-price").text,
    }

    extracted_products.append(product_data)

print(extracted_products)

### <span style="color:#D35400"><b>Popular expected_conditions in Selenium</b></span>

Selenium's `expected_conditions` provide flexibility when waiting for specific states or elements during automation. Below is a table of commonly used conditions:

| **Condition**                        | **Description**                                                                                   |
|--------------------------------------|---------------------------------------------------------------------------------------------------|
| **title_contains**                 | Waits until the page title contains a specific string.                                            |
| **presence_of_element_located**    | Waits until an HTML element is present in the DOM.                                               |
| **visibility_of_element_located**  | Waits until an element already in the DOM becomes visible.                                       |
| **text_to_be_present_in_element**  | Waits until a specific text is present in an element.                                            |
| **element_to_be_clickable**        | Waits until an HTML element becomes clickable.                                                   |
| **alert_is_present**               | Waits until a JavaScript native alert appears.                                                   |
| **visibility_of_all_elements_located** | Waits until multiple elements (matching the same selector) become visible.                     |


## **Wait for a Page to Load**

**document.readyState:** Execute a script that checks whether `document.readyState` equals `"complete"` before interacting further with the DOM. Combine it with an explicit wait (`WebDriverWait`) so that the check is repeated until the page document and its resources have finished loading. Explicit waits are preferable to fixed delays because they continue as soon as the condition is met.
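
A minimal sketch of the `document.readyState` check, assuming the hypothetical helper names `page_is_loaded` and `wait_for_page_load`. `WebDriverWait.until` accepts any callable that takes the driver, so the predicate can be passed to it directly.

```python
def page_is_loaded(driver):
    """Return True once the browser reports the document as fully loaded."""
    return driver.execute_script("return document.readyState") == "complete"


def wait_for_page_load(driver, timeout=10):
    """Block until `page_is_loaded` holds or `timeout` seconds elapse."""
    # Imported here so the predicate above stays usable without Selenium installed
    from selenium.webdriver.support.wait import WebDriverWait

    # until() polls the callable and raises TimeoutException if it never succeeds
    return WebDriverWait(driver, timeout).until(page_is_loaded)
```

Keep in mind that a `"complete"` ready state does not guarantee that content fetched later by JavaScript has arrived, so element-based conditions are still needed for dynamically loaded items.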

In [None]:
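from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome()

driver.get('https://www.scrapingcourse.com/javascript-rendering')

# wait up to 10 seconds until every product image is visible
WebDriverWait(driver, 10).until(
    EC.visibility_of_all_elements_located((By.CLASS_NAME, 'product-image'))
)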

## **Save a Screenshot of a Website**

This can be useful for tasks such as CAPTCHA handling.

In [None]:
from selenium import webdriver

driver = webdriver.Chrome()

driver.get('https://www.scrapingcourse.com/javascript-rendering')

driver.save_screenshot('photo.png') 

You can also screenshot a specific element. The code below grabs the description section of a demo product page. Note that we've used the ID selector this time.

In [None]:
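# open the demo product page that contains the description tab
driver.get('https://www.scrapingcourse.com/ecommerce/product/chaz-kangeroo-hoodie/')
value = driver.find_element(By.CSS_SELECTOR,'#tab-description')
value.screenshot('specific_img.png')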

In [None]:
value = driver.find_element(By.CSS_SELECTOR,'.flex-viewport img')
value.screenshot('specific_img.png')

## **Clicking a Specific Button or Link**

In [None]:
value = driver.find_element(By.CSS_SELECTOR,'.brand-name')
value.click()

value = driver.find_element(By.CSS_SELECTOR,'.card-page-link')
value.click()

## **Fill Out a Form**

Selenium's form-filling feature helps automate actions such as signing up, logging in, filling out a contact form, or launching a search.

In [None]:
from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()

driver.get('https://www.scrapingcourse.com/login')

In [None]:
email = driver.find_element(By.CSS_SELECTOR,'#email')
password = driver.find_element(By.CSS_SELECTOR,'#password')
login = driver.find_element(By.CSS_SELECTOR,'#submit-button')

email.send_keys('admin@example.com')
password.send_keys('password')
login.click()   

## **Execute JavaScript Directly Within the Browser** 

Selenium provides access to all browser functionalities, including launching JavaScript instructions.

The `execute_script()` method enables you to execute JavaScript instructions synchronously. **That's particularly helpful when the features provided by Selenium aren't enough to achieve your goal.**

Let's use **JavaScript** to scroll to the **Description** section and take a screenshot with a better **view**.

In [None]:
from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()

driver.get("https://www.scrapingcourse.com/ecommerce/product/chaz-kangeroo-hoodie/")

card = driver.find_element(By.CSS_SELECTOR,'#tab-description')

card_y_location = card.location['y']

# subtract 100 pixels to leave some extra space above the element
# and ensure the screenshot captures it correctly
javaScript = f"window.scrollBy(0, {card_y_location}-100);"

# execute JavaScript
driver.execute_script(javaScript)

driver.save_screenshot("scrolled-element-screenshot.png")
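
As an alternative to computing the scroll offset by hand, `execute_script` can receive the element itself: extra arguments are exposed to the script as `arguments[0]`, `arguments[1]`, and so on. The sketch below uses the standard DOM method `scrollIntoView`. The helper name `scroll_into_view` is hypothetical.

```python
def scroll_into_view(driver, element):
    """Scroll the page so that `element` sits in the middle of the viewport."""
    # The element passed after the script string becomes arguments[0] in JavaScript
    driver.execute_script(
        "arguments[0].scrollIntoView({block: 'center'});", element
    )
```

Calling `scroll_into_view(driver, card)` before `driver.save_screenshot(...)` would center the description card without any manual pixel arithmetic.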


## **Customize Window Size**

Modern sites are responsive and **adapt their layout to the user's screen or browser window size**. Depending on the available space, they may show or hide elements using JavaScript on smaller screens. Selenium allows you to change the browser window's initial size, enabling you to reveal content that might be hidden in the initial viewport. 

You can achieve this in two ways:
-   `options.add_argument("--window-size=<width>,<height>")`.
-   `set_window_size(<width>, <height>)`.

In [None]:
# method 1: using options.add_argument("--window-size=<width>,<height>")
options = webdriver.ChromeOptions()

# set the initial window size
options.add_argument("--window-size=800,600")

driver = webdriver.Chrome(options=options)

# print the window size
print(driver.get_window_size())  # {"width": 800, "height": 600}

In [None]:
# method 2: using set_window_size(<width>, <height>)
driver = webdriver.Chrome(options=options)

# set the window size
driver.set_window_size(1920, 1200)

# print the window size
print(driver.get_window_size())  # {'width': 1920, 'height': 1200}

## **Get Around Anti-Scraping Protections With Selenium in Python**

You now know how to do web scraping using Selenium in Python. Yet, retrieving data from the web is a challenge, as some sites adopt anti-bot technologies that might detect your scraper as a bot and block it.

In [None]:
# import the required libraries
from selenium import webdriver
from selenium.webdriver.chrome.options import Options

# run Chrome in headless mode
# options = Options()
# options.add_argument("--headless=new")

# start a driver instance
# driver = webdriver.Chrome(options=options)
driver = webdriver.Chrome()

# open the target website
driver.get("https://www.g2.com/products/asana/reviews")

# save a screenshot to see what happens
# driver.save_screenshot("g2-reviews-screenshot.png")

# release the resources allocated by Selenium and shut down the browser
# driver.quit()

### **Change IP Using a Proxy**

A proxy service sends requests on your behalf and increases your chances of bypassing IP bans due to rate limiting and geo-restrictions. 

To see how proxy implementation works in Selenium, grab a [free proxy](https://free-proxy-list.net/) from the Free Proxy List and add it to your scraper, as shown in the code below. 

In [None]:
# import the required libraries
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By

# create a Chrome options object
options = Options()

# set the proxy address
proxy_server_ip = "http://67.43.228.251:3343"

# add the address to Chrome options
options.add_argument(f"--proxy-server={proxy_server_ip}")

# optionally run Chrome in headless mode
# options.add_argument("--headless=new")

# start a driver instance
driver = webdriver.Chrome(options=options)

# open the target website
driver.get("https://httpbin.io/ip")

# print your current IP address
print(driver.find_element(By.TAG_NAME, "body").text)

# release the resources allocated by Selenium and shut down the browser
# driver.quit()
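
Free proxy lists often contain malformed or incomplete entries, so it can help to validate an address before handing it to Chrome. The following is a minimal sketch: the helper name `proxy_argument` is hypothetical, and the set of accepted schemes is an assumption that you may need to adjust for your proxy provider.

```python
from urllib.parse import urlsplit

# Schemes assumed to be valid for Chrome's --proxy-server flag
SUPPORTED_SCHEMES = {"http", "https", "socks4", "socks5"}


def proxy_argument(proxy_url):
    """Validate `proxy_url` and return the matching --proxy-server argument."""
    parts = urlsplit(proxy_url)
    if parts.scheme not in SUPPORTED_SCHEMES:
        raise ValueError(f"unsupported or missing proxy scheme in {proxy_url!r}")
    if not parts.hostname or parts.port is None:
        raise ValueError(f"proxy URL needs a host and a port: {proxy_url!r}")
    return f"--proxy-server={parts.scheme}://{parts.hostname}:{parts.port}"


print(proxy_argument("http://67.43.228.251:3343"))
# --proxy-server=http://67.43.228.251:3343
```

The returned string can be passed straight to `options.add_argument(...)`, which fails early with a clear error instead of launching a browser with an unusable proxy.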
