# Scraping with Selenium

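Selenium drives a browser through a browser-specific WebDriver executable, so we need to tell Selenium where to find the driver for the browser we wish to use. For this exercise, we continue to use Chrome, whose driver is called chromedriver.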

There are two ways to do it:
1. **Manual install**: Download a stable version of chromedriver directly from [Chrome for Testing](https://googlechromelabs.github.io/chrome-for-testing/) and place it inside the project folder. Pay attention to the version: the driver's major version must match that of your installed Chrome (for example, chromedriver 125.x for Chrome 125.x).

2. **Package**: Alternatively, install a library such as [webdriver_manager](https://github.com/SergeyPirogov/webdriver_manager), which downloads a suitable driver for you:
```pip install webdriver_manager```
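
Since a version mismatch is the most common setup problem, a small pure-Python check can help before launching the browser. This is a minimal sketch with hypothetical helper names (```major_version```, ```driver_matches_browser```); it assumes that matching major versions is sufficient, which is what the "chrome 125.*" wording of the warning printed under the credit card cell suggests. You can read the two version strings with ```chromedriver --version``` and from Chrome's ```chrome://version``` page.

```python
def major_version(version: str) -> int:
    """Return the major component of a dotted version string, e.g. '125.0.6422.142' -> 125."""
    return int(version.strip().split(".")[0])


def driver_matches_browser(driver_version: str, chrome_version: str) -> bool:
    """Return True if chromedriver and Chrome share the same major version (assumed sufficient)."""
    return major_version(driver_version) == major_version(chrome_version)


# Version strings copied from the warning printed under the credit card cell
print(driver_matches_browser("124.0.6367.201", "125.0.6422.142"))  # False
print(driver_matches_browser("125.0.6422.141", "125.0.6422.142"))  # True
```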



In [2]:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
import time

In [4]:
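# If you have chosen to install chromedriver manually
from selenium import webdriver
from selenium.webdriver.chrome.service import Service

# Path to the driver placed in the project folder ("chromedriver" without .exe on macOS/Linux)
service = Service(executable_path="chromedriver.exe")
driver = webdriver.Chrome(service=service)
driver.maximize_window()
driver.get("https://www.google.com/")
# quit() closes every window and ends the session; close() only closes the current window
driver.quit()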

In [None]:
# If you have chosen to install webdriver_manager, execute the following lines instead
from selenium import webdriver
from selenium.webdriver.chrome.service import Service as ChromeService
from webdriver_manager.chrome import ChromeDriverManager

# install() downloads (and caches) a chromedriver and returns the path to it
driver = webdriver.Chrome(service=ChromeService(ChromeDriverManager().install()))
driver.maximize_window()
driver.get("https://www.google.com/")
driver.quit()
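
Every cell below creates a driver and shuts it down on its last line, so if a step fails in between (for example, a wait raising ```TimeoutException```), the browser window stays open. A minimal sketch of one way to guarantee cleanup, assuming only that the driver object has a ```quit()``` method: the hypothetical helper ```quit_on_exit``` wraps the work in ```try```/```finally```. ```FakeDriver``` is a hypothetical stand-in so the example runs without a browser; with Selenium you would write ```with quit_on_exit(webdriver.Chrome(service=service)) as driver:``` and put the scraping steps in the body.

```python
from contextlib import contextmanager


@contextmanager
def quit_on_exit(driver):
    """Hypothetical helper: yield the driver and always call driver.quit(), even on errors."""
    try:
        yield driver
    finally:
        driver.quit()


class FakeDriver:
    """Hypothetical stand-in for a WebDriver that only records whether quit() was called."""

    def __init__(self):
        self.quit_called = False

    def quit(self):
        self.quit_called = True


fake = FakeDriver()
try:
    with quit_on_exit(fake) as driver:
        raise RuntimeError("simulated failure while scraping")
except RuntimeError:
    pass  # the error still propagates, but quit() has already run
print(fake.quit_called)  # True
```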

# Some Important Concepts
1. **Waiting**: A web application often needs time to load or update a page. If we try to locate an element before the page has rendered it, Selenium raises an error (usually ```NoSuchElementException```). Selenium provides two waiting strategies:
    1. **Implicit Waits**: A global setting that applies to every element lookup for the entire session. If an implicit wait is set and the element is not found, the driver keeps retrying for up to the given duration before raising an error. If the element is found, the driver returns the element reference immediately and execution continues, so a larger implicit wait value won't necessarily increase the duration of the session.
    ```driver.implicitly_wait(10)``` makes the driver wait up to 10 seconds for an element to appear before raising an error.
    2. **Explicit Waits**: Loops that repeatedly check whether a condition is true before code execution proceeds. If the condition is not met within the timeout, a ```TimeoutException``` is raised.
        1. ```wait = WebDriverWait(driver, 10)``` creates a wait that gives up after 10 seconds if the condition has not been fulfilled.
        2. ```element = wait.until(EC.element_to_be_clickable((By.XPATH, xpath_login)))``` waits until the [expected condition](https://www.selenium.dev/documentation/webdriver/support_features/expected_conditions/) "the element is visible and enabled, so it can be clicked" is met for the element located by ```xpath_login```. Once the condition is met, ```until``` returns the element.
   More details can be found [here](https://www.selenium.dev/documentation/webdriver/waits/).
2. **Finding Elements**: Selenium offers [several locator strategies](https://selenium-python.readthedocs.io/locating-elements.html) as attributes of the ```By``` class, which are passed to ```find_element``` to locate an element:
    1. ID = "id" &emsp; ```find_element(By.ID, "id")```
    2. NAME = "name" &emsp; ```find_element(By.NAME, "name")```
    3. XPATH = "xpath" &emsp; ```find_element(By.XPATH, "xpath")```
    4. LINK_TEXT = "link text" &emsp; ```find_element(By.LINK_TEXT, "link text")```
    5. PARTIAL_LINK_TEXT = "partial link text" &emsp; ```find_element(By.PARTIAL_LINK_TEXT, "partial link text")```
    6. TAG_NAME = "tag name" &emsp; ```find_element(By.TAG_NAME, "tag name")```
    7. CLASS_NAME = "class name" &emsp; ```find_element(By.CLASS_NAME, "class name")```
    8. CSS_SELECTOR = "css selector" &emsp; ```find_element(By.CSS_SELECTOR, "css selector")```
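    9. Multiple elements can be found with ```find_elements``` (plural), which returns a list of all matches, e.g. ```element.find_elements(By.NAME, "name")```. Unlike ```find_element```, it returns an empty list instead of raising an error when nothing matches. Both methods can be called on the driver (to search the whole page) or on an element (to search only inside it).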
3. **Giving Inputs**: Text can be typed into form fields using ```element.send_keys("Some Text")```, and buttons or links can be clicked with ```element.click()```.
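
To make the explicit-wait idea concrete, here is a minimal pure-Python sketch of the polling loop that ```WebDriverWait.until``` performs: call a condition repeatedly until it returns something truthy, or give up after a timeout. It is a simplified illustration, not Selenium's implementation; ```wait_until``` and ```WaitTimeout``` are hypothetical names. In real code you would use ```WebDriverWait``` with an expected condition, which by default polls every 0.5 seconds and ignores ```NoSuchElementException``` while polling. The Selenium documentation also advises against mixing implicit and explicit waits in one session, since the resulting wait times become unpredictable.

```python
import time


class WaitTimeout(Exception):
    """Hypothetical stand-in for selenium's TimeoutException."""


def wait_until(condition, timeout=10.0, poll_frequency=0.5, ignored_exceptions=()):
    """Hypothetical helper: poll condition() until it returns a truthy value or time runs out."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            result = condition()
            if result:
                return result  # like wait.until(), hand back the truthy value (e.g. an element)
        except ignored_exceptions:
            pass  # an ignored error (e.g. "element not found yet") just means "not ready"
        if time.monotonic() > deadline:
            raise WaitTimeout(f"condition not met within {timeout} s")
        time.sleep(poll_frequency)


# Demo: a fake "element" that only becomes available on the third check
calls = {"n": 0}


def element_ready():
    calls["n"] += 1
    return "element" if calls["n"] >= 3 else None


print(wait_until(element_ready, timeout=2, poll_frequency=0.01))  # element

try:
    wait_until(lambda: False, timeout=0.05, poll_frequency=0.01)
except WaitTimeout as exc:
    print("timed out:", exc)
```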

In [35]:
url = "https://www.selenium.dev/selenium/docs/api/py/index.html#"
service = Service(executable_path="chromedriver.exe")
driver = webdriver.Chrome(service=service)
driver.maximize_window()
driver.get(url=url)
wait = WebDriverWait(driver, 10)  # explicit wait object (not used further in this cell)
time.sleep(10)  # keep the window open for 10 seconds so the page can be inspected
driver.quit()

In [39]:
url = "https://www.selenium.dev/selenium/docs/api/py/index.html#"
service = Service(executable_path="chromedriver.exe")
driver = webdriver.Chrome(service=service)
driver.maximize_window()
driver.get(url=url)
# Locate the element whose id attribute is "introduction" and print its visible text
element = driver.find_element(By.ID, "introduction")
print(element.text)
time.sleep(2)
driver.quit()

Introduction
Python language bindings for Selenium WebDriver.
The selenium package is used to automate web browser interaction from Python.
Home: https://selenium.dev
GitHub: https://github.com/SeleniumHQ/Selenium
PyPI: https://pypi.org/project/selenium/
IRC/Slack: Selenium chat room
Several browsers/drivers are supported (Firefox, Chrome, Internet Explorer), as well as the Remote protocol.


## Task 1
Print the text of all ```div``` tags whose class name is ```section``` for the URL https://www.selenium.dev/selenium/docs/api/py/index.html#

In [41]:
url = "https://www.selenium.dev/selenium/docs/api/py/index.html#"
service = Service(executable_path="chromedriver.exe")
driver = webdriver.Chrome(service=service)
driver.maximize_window()
driver.get(url=url)
# "div.section" matches only div tags with class "section" (By.CLASS_NAME would match any tag)
elements = driver.find_elements(By.CSS_SELECTOR, "div.section")
for element in elements:
    print(element.text)
time.sleep(2)
driver.quit()

In [47]:
url = "https://www.selenium.dev/selenium/docs/api/py/index.html#"
service = Service(executable_path="chromedriver.exe")
driver = webdriver.Chrome(service=service)
driver.maximize_window()
driver.get(url=url)
# All links whose visible text contains the substring "github"
elements = driver.find_elements(By.PARTIAL_LINK_TEXT, "github")
for element in elements:
    print(element.text)
time.sleep(2)
driver.quit()

https://github.com/SeleniumHQ/Selenium
https://github.com/mozilla/geckodriver/releases
https://github.com/SeleniumHQ/selenium/tree/trunk/py


## Task 2
Go to the website [https://fill.dev/](https://fill.dev/) and submit forms, such as the simple login and the credit card form, using **DUMMY** information only.

In [20]:
# Simple Login
url = "https://fill.dev/"
service = Service(executable_path="chromedriver.exe")
driver = webdriver.Chrome(service=service)
driver.maximize_window()
driver.get(url=url)
wait = WebDriverWait(driver, 10)

# Open the login menu in the navbar
xpath_login = '//*[@id="navbarSupportedContent"]/ul/li[1]'
element = wait.until(EC.element_to_be_clickable((By.XPATH, xpath_login)))
element.click()
time.sleep(5)  # pause so the opened menu can be seen

# Pick the "simple login" entry from the menu
xpath_simple_login = '//*[@id="navbarSupportedContent"]/ul/li[1]/ul/li[1]/a'
element = wait.until(EC.element_to_be_clickable((By.XPATH, xpath_simple_login)))
element.click()

# Fill in DUMMY credentials
xpath_username = '//*[@id="username"]'
element = wait.until(EC.element_to_be_clickable((By.XPATH, xpath_username)))
element.send_keys("random_username")
xpath_password = '//*[@id="password"]'
element = wait.until(EC.element_to_be_clickable((By.XPATH, xpath_password)))
element.send_keys("random_password")

# Submit the form
xpath_submit_button = '//*[@id="app"]/main/div/div/div/div/div[2]/form/div[3]/div/button'
element = wait.until(EC.element_to_be_clickable((By.XPATH, xpath_submit_button)))
element.click()
time.sleep(10)  # keep the result page visible for a while

driver.quit()
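
Several XPaths in the login cell (```xpath_username```, ```xpath_password```) simply select an element by its ```id```. The small sketch below, built around a hypothetical helper ```id_locators```, shows that the same target can be expressed with ```By.ID```, an XPath, or a CSS selector; the strategy strings are the ```By``` values listed under Finding Elements, so each pair could be passed as ```driver.find_element(*pair)``` or ```EC.element_to_be_clickable(pair)```. It assumes the id contains no quotes or CSS special characters. Locating by id is usually shorter and less brittle than long absolute XPaths such as ```xpath_submit_button```, which break when the page layout changes.

```python
def id_locators(element_id):
    """Hypothetical helper: equivalent (strategy, value) pairs targeting one element id.

    The strategy strings equal selenium's By.ID, By.XPATH and By.CSS_SELECTOR values.
    Assumes element_id has no quotes or CSS special characters.
    """
    return [
        ("id", element_id),
        ("xpath", f'//*[@id="{element_id}"]'),  # single-quoted string: no \" escapes needed
        ("css selector", f"#{element_id}"),
    ]


for strategy, value in id_locators("username"):
    print(strategy, "->", value)
```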

In [32]:
# Credit Card
from selenium.webdriver.support.select import Select

url = "https://fill.dev/form/credit-card-simple"
service = Service(executable_path="chromedriver.exe")
driver = webdriver.Chrome(service=service)
driver.maximize_window()
driver.get(url=url)
wait = WebDriverWait(driver, 10)

# Name on card (DUMMY value)
xpath_name = '//*[@id="cc-name"]'
element = wait.until(EC.element_to_be_clickable((By.XPATH, xpath_name)))
element.send_keys("John Doe")

# Card type is a <select> dropdown: print every option value, then choose "amex"
xpath_type = '//*[@id="cc-type"]'
element = wait.until(EC.element_to_be_clickable((By.XPATH, xpath_type)))
card_type = Select(element)
for option in card_type.options:
    print(option.get_attribute("value"))
card_type.select_by_value("amex")
time.sleep(10)

driver.quit()

The chromedriver version (124.0.6367.201) detected in PATH at c:\Users\sukan\PycharmProjects\Data Scraping Workshop 2024\chromedriver.exe might not be compatible with the detected chrome version (125.0.6422.142); currently, chromedriver 125.0.6422.141 is recommended for chrome 125.*, so it is advised to delete the driver in PATH and retry



visa
mc
amex
discover
