# Web Scraping with Selenium

> Selenium is a web automation tool that lets you drive a web browser from Python.
> This notebook is a tutorial on scraping images from Google Images to build your own dataset for machine learning.

### PART 1 - Installation

> Download and install the latest version of **Anaconda** from [here](https://www.anaconda.com/products/individual).

> Install **selenium** using **pip**. If you are using **Google Colab**, you can use the `!` operator to run shell commands. For example, to install selenium, you can use the following command:
```python
# install selenium
!pip install selenium
```
Download **ChromeDriver** from [here](https://chromedriver.chromium.org/downloads) and save it to a convenient location on your system. Make sure to download the version that matches your Chrome browser version.
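
To make "matches your Chrome browser version" concrete, here is a minimal sketch that compares two version strings. It assumes that a match means the major version numbers agree, and the helper names `major_version` and `versions_match` are hypothetical. The version strings in the example are illustrative, so substitute whatever your browser and driver report.

```python
import re


def major_version(version_text: str) -> int:
    """Extract the major version from text such as 'ChromeDriver 114.0.5735.90 (...)'."""
    match = re.search(r"(\d+)\.\d+\.\d+(?:\.\d+)?", version_text)
    if match is None:
        raise ValueError(f"no version number found in {version_text!r}")
    return int(match.group(1))


def versions_match(chrome_version: str, driver_version: str) -> bool:
    """Assumption: browser and driver are compatible when their major versions agree."""
    return major_version(chrome_version) == major_version(driver_version)


# Illustrative version strings, not read from a real installation
print(versions_match("Google Chrome 114.0.5735.199", "ChromeDriver 114.0.5735.90"))  # True
print(versions_match("Google Chrome 115.0.5790.102", "ChromeDriver 114.0.5735.90"))  # False
```

You can find the browser version on Chrome's "About Chrome" page and the driver version by running `chromedriver --version` in a terminal.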



### PART 2 - Importing the libraries

In [2]:
```python
import os
import time
import shutil

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.support.ui import Select
```

Set the `PATH_TO_CHROMEDRIVER` variable to the location of the ChromeDriver executable. You can find the path by right-clicking on the ChromeDriver file and selecting `Copy path`. You can also use the `os` module to build the path.


In [5]:
```python
PATH_TO_CHROMEDRIVER = "D:/webdrivers/chromedriver.exe"
```
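
As a sketch of the `os` approach mentioned above, the helper below checks that a candidate path exists and otherwise searches the system `PATH` with `shutil.which`. The function name `resolve_chromedriver` is hypothetical, and the fallback assumes the executable is called `chromedriver`.

```python
import os
import shutil


def resolve_chromedriver(candidate=None):
    """Return an absolute path to a ChromeDriver executable, or None if none is found."""
    # Prefer the explicitly supplied path when it points at an existing file
    if candidate and os.path.isfile(candidate):
        return os.path.abspath(candidate)
    # Otherwise search the directories listed in the PATH environment variable
    found = shutil.which("chromedriver")
    return os.path.abspath(found) if found else None


path = resolve_chromedriver("D:/webdrivers/chromedriver.exe")
print(path if path else "ChromeDriver not found; check the path")
```

Checking the path up front gives a clearer message than the error Selenium raises when the driver cannot be started.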

### PART 3 - Write your first Selenium Script

In [3]:
```python
options = Options()
options.page_load_strategy = 'normal'

# Start the session
driver = webdriver.Chrome(options=options, service=Service(PATH_TO_CHROMEDRIVER))

# Navigate to the URL
driver.get("https://www.selenium.dev/selenium/web/web-form.html")

# Request browser information
title = driver.title

# Establish a waiting strategy
driver.implicitly_wait(0.5)

# Find the elements
text_box = driver.find_element(by=By.NAME, value="my-text")
submit_button = driver.find_element(by=By.CSS_SELECTOR, value="button")

# Take action on the elements
text_box.send_keys("Selenium")
submit_button.click()

# Request element information
message = driver.find_element(by=By.ID, value="message")
value = message.text
print(value)

# End the session
driver.quit()
```

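NameError: name 'PATH_TO_CHROMEDRIVER' is not defined

This error means the cell ran before `PATH_TO_CHROMEDRIVER` was defined. Run the cell that sets the path first, then rerun this one.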
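
The script above uses an implicit wait. The other common approach is an explicit wait, which polls for a specific condition, and Selenium provides `WebDriverWait` for this. The sketch below is a library-free illustration of the same polling idea and is not Selenium's implementation. The names `wait_until` and `message_is_ready` are hypothetical.

```python
import time


def wait_until(condition, timeout=5.0, poll_interval=0.25):
    """Call `condition` repeatedly until it returns a truthy value or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(poll_interval)


# Stand-in for a page element that only becomes available on the third check
attempts = {"count": 0}


def message_is_ready():
    attempts["count"] += 1
    return "ready" if attempts["count"] >= 3 else None


print(wait_until(message_is_ready, timeout=2.0, poll_interval=0.01))  # ready
```

With a live session, the condition could be `lambda: driver.find_elements(By.ID, "message")`, since `find_elements` returns an empty (falsy) list until the element exists.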

### PART 4 - Understanding Locator Strategies

#### Strategy 1 - By Class Name 
Use this when you want to locate an element by class name. With this strategy, the first element with the matching class name attribute will be returned. If no element has a matching class name attribute, a NoSuchElementException will be raised.

In [6]:
```python
# Class Name Locator
options = Options()
options.page_load_strategy = 'normal'

# Start the session
driver = webdriver.Chrome(options=options, service=Service(PATH_TO_CHROMEDRIVER))
query = "solar panels"
driver.get("https://images.google.com/")

# Type the query into the search box and submit it
search_box = driver.find_element(by=By.CLASS_NAME, value="gLFyf")
search_box.send_keys(query)
search_box.send_keys(Keys.ENTER)
```
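
Class names on third-party pages can change without notice, so a lookup like the one above may start raising `NoSuchElementException`. One way to soften this, sketched below, is to try several locators in order using `find_elements`, which returns an empty list instead of raising. The helper name `first_match` is hypothetical, and the fallback locator `name="q"` is an assumption about the search page, not something this notebook verifies.

```python
def first_match(driver, locators):
    """Return the first element matched by any (by, value) pair, or None if none match."""
    for by, value in locators:
        # find_elements returns [] when nothing matches, so no exception handling is needed
        elements = driver.find_elements(by, value)
        if elements:
            return elements[0]
    return None


# Usage with a running session (assumes `driver` and `By` from the cells above):
# search_box = first_match(driver, [(By.CLASS_NAME, "gLFyf"), (By.NAME, "q")])
# if search_box is None:
#     print("Search box not found; the page layout may have changed")
```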

#### Strategy 2 - By Link Text
Use this when you know the link text used within an anchor tag. With this strategy, the first element with the link text matching the provided value will be returned. If no element has matching link text, a NoSuchElementException will be raised.

In [7]:
```python
# Link Text Locator
options = Options()
options.page_load_strategy = 'normal'

# Start the session
driver = webdriver.Chrome(options=options, service=Service(PATH_TO_CHROMEDRIVER))
driver.get("https://www.selenium.dev/selenium/web/web-form.html")

link = driver.find_element(by=By.LINK_TEXT, value="Return to index")
print(link.text)
```

Return to index


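#### Strategy 3 - By Partial Link Text
Use this when you know only part of the link text used within an anchor tag. With this strategy, the first element whose link text contains the provided value will be returned. If no element has matching link text, a NoSuchElementException will be raised.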

In [8]:
```python
# Partial Link Text Locator
options = Options()
options.page_load_strategy = 'normal'

# Start the session
driver = webdriver.Chrome(options=options, service=Service(PATH_TO_CHROMEDRIVER))

driver.get("https://www.selenium.dev/selenium/web/web-form.html")

link = driver.find_element(by=By.PARTIAL_LINK_TEXT, value="Return")
print(link.text)
```

Return to index
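
To see why `"Return"` works with the partial strategy but would not work with the exact one, the sketch below reproduces the matching rule on a small HTML snippet using only the standard library. It illustrates the semantics and is not how Selenium locates elements. The snippet and the names `LinkTextCollector` and `find_link` are made up for this example.

```python
from html.parser import HTMLParser


class LinkTextCollector(HTMLParser):
    """Collect the visible text of every <a> element in an HTML snippet."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._inside_anchor = False
        self._parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._inside_anchor = True
            self._parts = []

    def handle_data(self, data):
        if self._inside_anchor:
            self._parts.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._inside_anchor:
            self.links.append("".join(self._parts).strip())
            self._inside_anchor = False


def find_link(html, value, partial=False):
    """Return the first link text that equals `value`, or contains it when `partial` is True."""
    collector = LinkTextCollector()
    collector.feed(html)
    for text in collector.links:
        if (value in text) if partial else (text == value):
            return text
    return None


sample = '<p><a href="index.html">Return to index</a> <a href="#">Help</a></p>'
print(find_link(sample, "Return to index"))       # Return to index
print(find_link(sample, "Return", partial=True))  # Return to index
print(find_link(sample, "Return"))                # None
```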


#### Strategy 4 - By Tag Name
Use this when you want to locate an element by tag name. With this strategy, the first element with the given tag name will be returned. If no element has a matching tag name, a NoSuchElementException will be raised.

In [9]:
```python
# Tag Name Locator
options = Options()
options.page_load_strategy = 'normal'

# Start the session
driver = webdriver.Chrome(options=options, service=Service(PATH_TO_CHROMEDRIVER))

driver.get("https://www.selenium.dev/selenium/web/web-form.html")

# The first <h1> on the page is its heading
heading = driver.find_element(by=By.TAG_NAME, value="h1")
print(heading.text)
```

Web form
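
`find_element` returns only the first match. For scraping, you usually want every element with a given tag, for example all `img` tags on a results page. The sketch below uses `find_elements` (plural) and `get_attribute` for that. The helper name `collect_attribute` is hypothetical, and the string `"tag name"` is the value behind `By.TAG_NAME`.

```python
def collect_attribute(driver, tag_name, attribute):
    """Return the non-empty `attribute` values of every element with the given tag."""
    values = []
    # "tag name" is the string value of By.TAG_NAME
    for element in driver.find_elements("tag name", tag_name):
        value = element.get_attribute(attribute)
        if value:  # skip elements where the attribute is missing or empty
            values.append(value)
    return values


# Usage with a running session (assumes `driver` from the cells above):
# image_urls = collect_attribute(driver, "img", "src")
# print(len(image_urls), "image sources found")
```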


#### Strategy 5 - By ID
Use this when you know the id attribute of an element. With this strategy, the first element with a matching id attribute will be returned. If no element has a matching id attribute, a NoSuchElementException will be raised.

In [9]:
```python
# ID Locator
options = Options()
options.page_load_strategy = 'normal'

# Start the session
driver = webdriver.Chrome(options=options, service=Service(PATH_TO_CHROMEDRIVER))

driver.get("https://www.selenium.dev/selenium/web/web-form.html")

# click() returns None, so its result is not assigned to a variable.
# The pauses only make each click visible in the browser window.
driver.find_element(by=By.ID, value="my-check-1").click()
time.sleep(2)
driver.find_element(by=By.ID, value="my-check-2").click()
time.sleep(2)
driver.find_element(by=By.ID, value="my-radio-1").click()
time.sleep(2)
driver.find_element(by=By.ID, value="my-radio-2").click()
time.sleep(2)
```
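
Clicking a checkbox toggles it, so the result depends on its starting state. If you want a checkbox to end up in a known state, you can check `is_selected()` first. The sketch below shows one way to do that. The helper name `set_checked` is hypothetical, and `"id"` is the string value behind `By.ID`.

```python
def set_checked(driver, element_id, checked=True):
    """Click the checkbox with `element_id` only if its state differs from `checked`."""
    element = driver.find_element("id", element_id)  # "id" is the string value of By.ID
    if element.is_selected() != checked:
        element.click()
    # Report the state after the optional click
    return element.is_selected()


# Usage with a running session (assumes `driver` from the cells above):
# set_checked(driver, "my-check-1", checked=True)
# set_checked(driver, "my-check-2", checked=False)
```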

#### Strategy 6 - By Name
Use this when you know the name attribute of an element. With this strategy, the first element with a matching name attribute will be returned. If no element has a matching name attribute, a NoSuchElementException will be raised.

In [10]:
```python
# Name Locator
options = Options()
options.page_load_strategy = 'normal'

# Start the session
driver = webdriver.Chrome(options=options, service=Service(PATH_TO_CHROMEDRIVER))

driver.get("https://www.selenium.dev/selenium/web/web-form.html")

# Locate the text input by its name attribute, then type into it
text_box = driver.find_element(by=By.NAME, value="my-text")
time.sleep(2)
text_box.send_keys("Selenium")
```
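
Form fields are usually identified by their `name` attribute, so the name strategy extends naturally to filling several fields at once. The sketch below takes a mapping from field names to text. The helper name `fill_fields` is hypothetical, and `"name"` is the string value behind `By.NAME`.

```python
def fill_fields(driver, values):
    """Type each text into the field whose name attribute matches the key; return the names filled."""
    filled = []
    for name, text in values.items():
        field = driver.find_element("name", name)  # "name" is the string value of By.NAME
        field.clear()  # remove any existing content before typing
        field.send_keys(text)
        filled.append(name)
    return filled


# Usage with a running session (assumes `driver` from the cells above):
# fill_fields(driver, {"my-text": "Selenium"})
```

Calling `clear()` first keeps the result predictable when a field already contains text.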