# Webscraping with Selenium

Selenium automates browsers. That's it!
What you do with that power is entirely up to you.

Selenium is primarily used to automate web applications for testing purposes, but it is certainly not limited to that. You can also use Selenium to automatically retrieve data from webpages, and that is exactly what this notebook is about.

# 1. Initiating the Webdriver

In [3]:
# Init webdriver - you should initiate the webdriver in its own, separate function (or cell) rather than inside your scraping function. Otherwise a new webdriver instance is started every time you call the scraping function!
from selenium import webdriver
from selenium.webdriver.chrome.service import Service

# Init webdriver normally (with a visible browser window) to see mistakes
# Make sure that the chromedriver.exe file is in the working directory, or use the absolute path instead.
# Note: Selenium 4 expects the driver path to be wrapped in a Service object.
driver = webdriver.Chrome(service=Service("chromedriver.exe"))

# Maximizes the browser window -> recommended to consistently find all web objects at the same place.
driver.maximize_window()


# Init webdriver headless (with options)
# Running headless can increase performance -> only use this method if you know what you are doing ;)
"""
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service

# path of the webdriver
chrome_driver = "D:/Drive/01_Promotion/31_Code/01_Python/GitHub Readme/chromedriver.exe"

# set options of webdriver to headless
chrome_options = Options()
chrome_options.add_argument("--headless")
# set screen size to 1920x1080 (Chrome expects the format width,height)
chrome_options.add_argument("--window-size=1920,1080")

driver = webdriver.Chrome(service=Service(chrome_driver), options=chrome_options)
"""
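The advice above (create the driver in its own function and reuse it) could be sketched roughly as follows. This is only a sketch that assumes Selenium 4; `chrome_arguments` and `create_driver` are hypothetical helper names and not part of Selenium. Selenium is imported inside `create_driver` so that the argument helper can be tried without a browser.

```python
def chrome_arguments(headless=True, width=1920, height=1080):
    """Return the Chrome command-line switches for the requested setup."""
    arguments = [f"--window-size={width},{height}"]
    if headless:
        arguments.append("--headless")
    return arguments


def create_driver(driver_path, headless=True):
    """Start one Chrome instance; call this once and reuse the returned driver."""
    # Imported here so chrome_arguments() can be used without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.chrome.service import Service

    options = Options()
    for argument in chrome_arguments(headless=headless):
        options.add_argument(argument)
    return webdriver.Chrome(service=Service(driver_path), options=options)
```

A call such as `driver = create_driver("chromedriver.exe", headless=False)` would then replace the inline setup, and every scraping function receives that one `driver` as an argument instead of creating its own.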

# 2. Navigate the Webdriver

In [6]:
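# navigate webdriver
from selenium.common.exceptions import WebDriverException

# the following call navigates your browser window to the Quotes to Scrape website
try:
    driver.get("https://quotes.toscrape.com/")
except WebDriverException as error:
    # catch only Selenium errors instead of hiding every exception
    print("webdriver failure:", error)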

In [9]:
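# find_element(By.XPATH, ...) replaces find_element_by_xpath, which was removed in recent Selenium 4 releases
from selenium.webdriver.common.by import By

author = driver.find_element(By.XPATH, "/html/body/div/div[2]/div[1]/div[1]/span[2]/small").text
author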

'Albert Einstein'

## 2.1 Finding elements with Selenium

Selenium offers a variety of options to find and interact with elements on a website. Below we showcase only a few examples, but you can check out the Selenium documentation (https://selenium-python.readthedocs.io/locating-elements.html) to discover other ways to locate elements.

In [None]:
# the find_element_by_* methods were removed in recent Selenium 4 releases; use find_element(By.<strategy>, value) instead
from selenium.webdriver.common.by import By

# finding an author name by XPATH
    # hint: by right-clicking on the item on the website you can open Chrome's "Inspect" mode. If you right-click on the element in the HTML code of the inspect window, you can copy the XPATH.
XPATH = "/html/body/div/div[2]/div[1]/div[1]/span[2]/small"
author = driver.find_element(By.XPATH, XPATH).text
print(author)

# finding an author name by ID
ID = "insert ID"
author = driver.find_element(By.ID, ID).text
print(author)

# finding an author name by tag name
tag_name = "insert tag name"
author = driver.find_element(By.TAG_NAME, tag_name).text
print(author)

# finding an author name by class name
class_name = "insert class name"
author = driver.find_element(By.CLASS_NAME, class_name).text
print(author)

# finding an author name by CSS selector
css_selector = "insert css selector"
author = driver.find_element(By.CSS_SELECTOR, css_selector).text
print(author)
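Absolute XPaths copied from the inspector, like the one used in this section, tend to break as soon as the page layout changes. A relative XPath that is anchored on a tag and its attributes is often sturdier. The helper below is a minimal sketch: `relative_xpath` is a hypothetical name, it assumes attribute values contain no single quotes, and the `small` / `author` example assumes that the site renders author names as `<small class="author">`.

```python
def relative_xpath(tag, **attributes):
    """Build an XPath such as //small[@class='author'] from a tag and attributes.

    A trailing underscore is stripped so reserved words work (class_ -> class).
    Assumes that attribute values contain no single quotes.
    """
    conditions = "".join(
        f"[@{name.rstrip('_')}='{value}']" for name, value in attributes.items()
    )
    return f"//{tag}{conditions}"


print(relative_xpath("small", class_="author"))
```

The resulting string can be passed to `driver.find_element(By.XPATH, ...)` in place of the copied absolute path.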



In [None]:
# find multiple elements
from selenium.webdriver.common.by import By

    # you can also retrieve a list of multiple elements
    # once you have the list, you need to iterate through it if you want to access the individual items
    # you can use any of the following locator strategies with find_elements(strategy, value):
    #   By.NAME, By.XPATH, By.LINK_TEXT, By.PARTIAL_LINK_TEXT,
    #   By.TAG_NAME, By.CLASS_NAME, By.CSS_SELECTOR
    # example (assuming the author names on the page carry the class name "author"):
author_list = driver.find_elements(By.CLASS_NAME, "author")

# iterating through the list
for author in author_list:
    print(author.text)
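Instead of only printing, you will usually want to keep the texts for later processing. The following is a small sketch of that step; `element_texts` is a hypothetical helper name, and it only relies on each element having a `.text` attribute, so it is demonstrated here with stand-in objects rather than a live browser.

```python
from types import SimpleNamespace


def element_texts(elements, unique=False):
    """Return the stripped, non-empty text of each element in page order."""
    texts = [element.text.strip() for element in elements if element.text.strip()]
    if unique:
        # dict.fromkeys drops duplicates while keeping the first occurrence order
        texts = list(dict.fromkeys(texts))
    return texts


# Stand-ins for WebElements (placeholder values, not scraped data)
fake_elements = [
    SimpleNamespace(text=" Author A "),
    SimpleNamespace(text=""),
    SimpleNamespace(text="Author A"),
    SimpleNamespace(text="Author B"),
]
print(element_texts(fake_elements, unique=True))
```

With a real driver, `element_texts(author_list)` would return the names as a plain Python list.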


## 2.2 Interacting with elements 

Selenium also allows you to interact with every element on a webpage that a normal user can interact with. For example, it can click a button to go through different subpages.

In [18]:
# interacting with a button
    # first you need to import the ActionChains class of Selenium (and By to locate the button)
from selenium.webdriver import ActionChains
from selenium.webdriver.common.by import By

    # then you need to find the button element. You can choose any "find" strategy you want
button = driver.find_element(By.XPATH, "/html/body/div/div[2]/div[1]/nav/ul/li[2]/a")

    # finally you need to pass the button into the click method of the ActionChains object, which takes your webdriver as an argument, and call the perform method
ActionChains(driver).click(button).perform()
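Clicking through subpages is usually done in a loop: scrape the current page, look for a "next" link, click it, and stop once no such link is left. The sketch below illustrates that pattern. `collect_over_pages` is a hypothetical helper, the locator is passed in by the caller, and it relies on `find_elements` returning an empty list when nothing matches. It also assumes that a plain `click()` on the link is enough to load the next page.

```python
def collect_over_pages(driver, extract, next_locator, max_pages=50):
    """Call extract(driver) on each page and follow the 'next' link until it is gone."""
    results = []
    for _ in range(max_pages):
        results.extend(extract(driver))
        # find_elements returns an empty list (no exception) when nothing matches
        next_links = driver.find_elements(*next_locator)
        if not next_links:
            break
        next_links[0].click()
    return results
```

A call could look like `collect_over_pages(driver, lambda d: [e.text for e in d.find_elements(By.CLASS_NAME, "author")], (By.CSS_SELECTOR, "li.next > a"))`, where both the class name `author` and the selector `li.next > a` are assumptions about the page's markup. The `max_pages` limit guards against endless loops on sites whose "next" link never disappears.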

## Scraping Example 2 - Scraping a real live page 