# Selenium 

Selenium is useful for scraping pages that rely on lazy loading and heavy JavaScript (endless scrolling, for example).

Unlike Beautiful Soup, which only parses the raw HTML returned by a request, Selenium simulates a user in a browser: you select objects on the page and interact with them (clicking, typing, scrolling).  This is done through a web driver (the first examples below use Chrome, the later ones Firefox).  Code is used to 'interact' with the driver.  The scraper is debugged by watching what the browser does, but it can also be run in headless mode.  There are some issues with endless scrolling in headless mode, so look into that first [here](https://stackoverflow.com/questions/48257870/headless-chrome-with-selenium-can-only-find-ways-to-scroll-non-headless), and also look for newer information, as updates may have fixed the issue.

In [26]:
from selenium import webdriver
import time

In [4]:
path = '/Users/mekdiyilma/code/chromedriver'

In [58]:
driver = webdriver.Chrome(path)

In [59]:
driver.get("https://www.techwithtim.net/")

In [18]:
driver.title

'Tech With Tim - Python & Java Programming Tutorials - techwithtim.net'

In [34]:
from selenium.webdriver.common.keys import Keys

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

In [65]:
path = '/Users/mekdiyilma/code/chromedriver'
driver = webdriver.Chrome(path)
driver.get("https://www.techwithtim.net/")

search = driver.find_element_by_name("s")
search.send_keys("test")
search.send_keys(Keys.RETURN)


# print(driver.page_source)  # prints the page's HTML as one blob of text


# source: https://selenium-python.readthedocs.io/waits.html

try:
    main = WebDriverWait(driver, 10).until(
        EC.presence_of_element_located((By.ID, 'main'))
    )
    
    articles = main.find_elements_by_tag_name('article')
    for article in articles:
        header = article.find_element_by_class_name('entry-summary')
        print(header.text)
finally:
    driver.quit()
    
    

# main = driver.find_element_by_id('main')
## This could go right after search.send_keys(Keys.RETURN). However, if the website
## takes a while to load, it might error out. Hence the explicit wait above, which waits
## up to 10 seconds for the element located by By.ID. FYI, By.NAME and By.CLASS_NAME also work.

# main.text
## To print the text within main, use its .text attribute



HTTP MethodsIn this tutorial we will talk about HTTP methods. HTTP methods are the standard way of sending information to and from a web server. To break it down, a website runs on a server or multiple servers and simple returns information to a client (web-browser). Information is exchanged between the client and the server […]
Creating a Base Template So you may have realized that creating new web pages for every single page on our website is extremely inefficient. Especially when our website follows a theme and has similar elements (like a sidebar) on every page. This is where template inheritance comes in. We will talk about how to inherit […]
Redirecting ContinuedStarting from where we left off in the last tutorial. I wanted to show how to redirect to a function that takes an argument (like our user function). To do this we simply need to define the parameter name and a value in the url_for function, like below.from flask import Flask, redirect, url_for app […]
What is Flask?Flask
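
The explicit wait used above is, at its core, a polling loop: check a condition, sleep briefly, and give up after a timeout. The following is a minimal pure-Python sketch of that idea, not Selenium's actual implementation. The names `wait_until` and `fake_lookup` are hypothetical, and `LookupError` stands in for Selenium's "element not found" exception so the example runs without a browser.

```python
import time


def wait_until(condition, timeout=10.0, poll=0.5):
    """Call `condition()` repeatedly until it returns a truthy value.

    Hypothetical helper for illustration; it is not part of Selenium.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            result = condition()
            if result:
                return result
        except LookupError:
            # Stand-in for "element not found yet": keep waiting.
            pass
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} s")
        time.sleep(poll)


# Usage with a fake "page" that only becomes ready on the third check.
attempts = {"count": 0}


def fake_lookup():
    attempts["count"] += 1
    return "main element" if attempts["count"] >= 3 else None


found = wait_until(fake_lookup, timeout=2.0, poll=0.01)
print(found, attempts["count"])
```

With a real driver, the condition would be something like a lambda that looks up the element, which is what `EC.presence_of_element_located` provides in the cell above.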

In [64]:
path = '/Users/mekdiyilma/code/chromedriver'
driver = webdriver.Chrome(path)
driver.get("https://www.techwithtim.net/")

# https://selenium-python.readthedocs.io/locating-elements.html 

link = driver.find_element_by_link_text('Python Programming')
link.click()

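# This lookup raises the NoSuchElementException shown in the output below:
# the page has no element with name="body". To select the <body> tag itself,
# use driver.find_element_by_tag_name('body') instead.
body = driver.find_element_by_name('body')
body.click()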

try:
    element = WebDriverWait(driver, 10).until(
        EC.presence_of_element_located((By.LINK_TEXT, "Intermediate Python Tutorials"))
    )
    element.click()
finally:
    driver.quit()


NoSuchElementException: Message: no such element: Unable to locate element: {"method":"css selector","selector":"[name="body"]"}
  (Session info: chrome=89.0.4389.114)


In [1]:
from selenium import webdriver
import selenium
import time

Create an instance of the browser object. This will be used to interact with Firefox.

In [None]:
browser = webdriver.Firefox()

Navigate to Amazon.

In [None]:
browser.get("http://www.amazon.com/")

With Selenium, the buttons and boxes on the webpage are treated as objects, so we can search for an element and interact with it.  In this case the search box is found and selected (by clicking), then keys are sent to it (simulating typing on the keyboard).  Finally, the search button is found and clicked.

In [None]:
search_box = browser.find_element_by_css_selector("input#twotabsearchtextbox")

In [None]:
search_box.click()

In [None]:
search_box.send_keys("alexa")

In [None]:
search_button = browser.find_element_by_css_selector("div.nav-search-submit input")

In [None]:
search_button.click()
time.sleep(3)

The `browser` now contains the new webpage, so the objects we want are found using a unique combination of class names.  The result is a list of web element objects that can be looped through.

In [None]:
products = browser.find_elements_by_css_selector("div.a-section.a-spacing-medium")

In [None]:
products

A simple version of getting data out of the objects.  If this were an endless-scroll page, the code could be modified to scroll down at the end and load the next batch of items.

In [None]:
titles_prices = []
for p in products:
    try:
        title_element = p.find_element_by_css_selector("span.a-size-medium.a-color-base.a-text-normal")
        title = title_element.text

        price_whole_element = p.find_element_by_css_selector("span.a-price-whole")
        price = price_whole_element.text

        price_fractional_element = p.find_element_by_css_selector(
            "span.a-price-fraction")
        price += ("." + price_fractional_element.text)
    except selenium.common.exceptions.NoSuchElementException:
        continue
    titles_prices.append((title, price))
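
The loop above keeps the price as a string built from two pieces of element text. If numeric prices are wanted, the joining step can be pulled into a small function. This is a sketch only: `combine_price` is a hypothetical helper, and it assumes US-style formatting (a comma as the thousands separator, two pieces of text such as "1,299" and "99").

```python
from decimal import Decimal


def combine_price(whole_text, fraction_text):
    """Join the text of the whole and fractional price elements into a Decimal.

    Hypothetical helper; assumes US-style formatting ("1,299" and "99").
    """
    # Drop thousands separators, and a trailing "." in case the whole part's
    # text already ends with the decimal point.
    whole = whole_text.strip().rstrip(".").replace(",", "")
    fraction = fraction_text.strip()
    return Decimal(f"{whole}.{fraction}")


print(combine_price("1,299", "99"))  # 1299.99
print(combine_price("24.", "05"))    # 24.05
```

`Decimal` is used instead of `float` so that prices compare and sum exactly.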

In [None]:
titles_prices
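
For an endless-scroll page, one common approach is to keep scrolling to the bottom until the page height stops growing. The following is a rough sketch of that loop under stated assumptions: `scroll_until_stable` is a hypothetical helper, the pause length is a guess that would need tuning per site, and `FakeDriver` is a stand-in so the example runs without a browser. The only driver method used is `execute_script`, which real Selenium drivers provide.

```python
import time


def scroll_until_stable(driver, pause=2.0, max_rounds=20):
    """Scroll to the bottom repeatedly until the page height stops growing.

    Returns the number of scrolls performed. Hypothetical helper; `driver`
    only needs an `execute_script` method, as Selenium drivers have.
    """
    last_height = driver.execute_script("return document.body.scrollHeight")
    for rounds in range(1, max_rounds + 1):
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause)  # give the lazy loader time to fetch more items
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            return rounds  # nothing new was loaded
        last_height = new_height
    return max_rounds


class FakeDriver:
    """Stand-in for a browser whose page grows twice and then stops."""

    def __init__(self):
        self.heights = [1000, 2000, 3000, 3000]
        self.index = 0

    def execute_script(self, script):
        if script.startswith("return"):
            return self.heights[self.index]
        # A scroll "loads" the next batch of content, if any is left.
        self.index = min(self.index + 1, len(self.heights) - 1)
        return None


rounds = scroll_until_stable(FakeDriver(), pause=0)
print(rounds)  # 3
```

After the loop finishes, the product lookup can be run once more to pick up every item that was loaded.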

Here is an example of opening tabs to look at sub-pages.  A new tab is opened, the focus of the driver is switched to that tab (we select the new tab), and the page is fetched.  Any scraping of data can then be done, after which the new tab is closed and focus is switched back to the first tab.

In [None]:
titles_links = []
for p in products:
    try:
        title_element = p.find_element_by_css_selector("span.a-size-medium.a-color-base.a-text-normal")
        title = title_element.text

        link = p.find_element_by_css_selector('a.a-link-normal').get_attribute('href')

        # Open new tab
        browser.execute_script("window.open('');")
        time.sleep(3)

        # Switch to the new window
        browser.switch_to.window(browser.window_handles[1])
        browser.get(link)
        time.sleep(3)

        # Here is where you would get info off this page and store it

        # Close the active tab
        browser.close()

        # Switch focus back to main tab
        browser.switch_to.window(browser.window_handles[0])

    except selenium.common.exceptions.NoSuchElementException:
        continue
    # No price is scraped in this cell, so store the link alongside the title
    titles_links.append((title, link))
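
In the cell above, an exception raised while the new tab is open skips the close and switch-back steps, leaving the driver focused on the wrong tab. One way to guard against that is to wrap the tab handling in a context manager. This is a sketch, not a drop-in replacement: `new_tab` is a hypothetical helper, and `FakeBrowser` and `FakeSwitchTo` are stand-ins so the example runs without a browser. It uses the same driver calls as the cell above plus the driver's `current_window_handle` property.

```python
from contextlib import contextmanager


@contextmanager
def new_tab(browser, url):
    """Open `url` in a new tab, yield, then close it and restore focus.

    Hypothetical helper built on the driver calls used in the cell above.
    """
    original = browser.current_window_handle
    browser.execute_script("window.open('');")
    browser.switch_to.window(browser.window_handles[-1])
    try:
        browser.get(url)
        yield browser
    finally:
        # Runs even if scraping inside the `with` block raises.
        browser.close()
        browser.switch_to.window(original)


class FakeSwitchTo:
    """Stand-in for the driver's `switch_to` object."""

    def __init__(self, browser):
        self.browser = browser

    def window(self, handle):
        self.browser.current_window_handle = handle


class FakeBrowser:
    """Stand-in for a Selenium driver that only tracks tabs."""

    def __init__(self):
        self.window_handles = ["tab-0"]
        self.current_window_handle = "tab-0"
        self.switch_to = FakeSwitchTo(self)
        self.visited = []

    def execute_script(self, script):
        self.window_handles.append(f"tab-{len(self.window_handles)}")

    def get(self, url):
        self.visited.append((self.current_window_handle, url))

    def close(self):
        self.window_handles.remove(self.current_window_handle)


browser = FakeBrowser()
try:
    with new_tab(browser, "https://example.com/item"):
        raise LookupError("element missing")  # simulate a failed lookup
except LookupError:
    pass

print(browser.window_handles, browser.current_window_handle)  # ['tab-0'] tab-0
```

With a real driver, the body of the `with` block is where the sub-page would be scraped.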