# Selenium 

Selenium is useful for scraping pages that lazy-load content or rely heavily on JavaScript (for example, endless scrolling).

Unlike Beautiful Soup, which only parses HTML you have already downloaded (typically with `requests`), Selenium drives a real browser: you select elements on the page and interact with them by clicking, typing, and scrolling. This is done through a web driver (Firefox is used here). Your code sends commands to the driver, so you can debug a scraper by watching the browser carry out each action, then run it in headless mode once it works. Endless scrolling has been reported to cause issues in headless mode, so look into that first ([here](https://stackoverflow.com/questions/48257870/headless-chrome-with-selenium-can-only-find-ways-to-scroll-non-headless)); newer browser or driver releases may have fixed the problem, so also look for more recent information.

In [None]:
import time

import selenium
from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.common.by import By

Create an instance of the browser object. This is what the code uses to control Firefox.

In [None]:
browser = webdriver.Firefox()

Navigate to Amazon.

In [None]:
browser.get("https://www.amazon.com/")

With Selenium, the buttons and boxes on a web page are exposed as element objects, so we locate an element and then interact with it. Here the search box is found and selected (by clicking), then keys are sent to it (simulating typing on the keyboard). Finally, the search button is found and clicked. The older `find_element_by_css_selector` helpers were removed in Selenium 4.3, so elements are located with `find_element(By.CSS_SELECTOR, ...)`. Amazon changes its markup often, so these selectors may need updating.

In [None]:
search_box = browser.find_element(By.CSS_SELECTOR, "input#twotabsearchtextbox")

In [None]:
search_box.click()

In [None]:
search_box.send_keys("alexa")

In [None]:
search_button = browser.find_element(By.CSS_SELECTOR, "div.nav-search-submit input")

In [None]:
search_button.click()
time.sleep(3)  # crude fixed wait for the results page to load

After the search, `browser` holds the results page, so the product containers are found by their class names. `find_elements` (plural) returns a list of `WebElement` objects that can be looped through; it returns an empty list rather than raising if nothing matches.

In [None]:
products = browser.find_elements(By.CSS_SELECTOR, "div.a-section.a-spacing-medium")

In [None]:
products
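
The `time.sleep(3)` after clicking search waits a fixed three seconds whether or not the page is ready. Selenium's own tool for this is an explicit wait (`WebDriverWait` with `expected_conditions`), which polls until a condition holds or a timeout expires. The sketch below shows that polling idea in plain Python so it runs without a browser; `wait_until` is a hypothetical helper, not part of Selenium.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def wait_until(condition: Callable[[], Optional[T]],
               timeout: float = 10.0,
               poll: float = 0.5) -> T:
    """Call `condition` repeatedly until it returns a truthy value.

    Hypothetical helper illustrating the explicit-wait idea: returns the
    truthy value, or raises TimeoutError once `timeout` seconds have passed.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} s")
        time.sleep(poll)  # back off briefly before checking again
```

With the live browser, the condition could be `lambda: browser.find_elements(By.CSS_SELECTOR, "div.a-section.a-spacing-medium")`, which returns an empty (falsy) list until results appear. In real scrapers, `WebDriverWait(browser, 10).until(...)` from `selenium.webdriver.support.ui` does the same job.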

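A simple version of extracting data from these elements. Products that are missing a title or a price raise `NoSuchElementException` and are skipped. On an endless-scroll page, this code could be extended to scroll down at the end and collect the next batch of items.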

In [None]:
from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.common.by import By

titles_prices = []
for p in products:
    try:
        title = p.find_element(
            By.CSS_SELECTOR, "span.a-size-medium.a-color-base.a-text-normal").text

        # The price is split into a whole part and a fractional part
        price = p.find_element(By.CSS_SELECTOR, "span.a-price-whole").text
        price += "." + p.find_element(By.CSS_SELECTOR, "span.a-price-fraction").text
    except NoSuchElementException:
        # Skip results that lack a title or a price
        continue
    titles_prices.append((title, price))
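
The loop above stores each price as a string. To sort or compare prices you need a number, so here is a sketch that turns the two text parts into a `Decimal`. It assumes US-style formatting (commas as thousands separators), and because the whole-number text may, depending on the page markup, already end with a decimal point, it keeps only digits from each part. `parse_price` is a hypothetical helper, not from the original notebook.

```python
import re
from decimal import Decimal
from typing import Optional


def parse_price(whole: str, fraction: str) -> Optional[Decimal]:
    """Combine whole/fraction price text such as "1,299" and "99".

    Hypothetical helper; returns None when the whole part has no digits.
    """
    # Keep only digits, dropping commas, whitespace and any stray "."
    whole_digits = re.sub(r"\D", "", whole)
    fraction_digits = re.sub(r"\D", "", fraction) or "0"
    if not whole_digits:
        return None
    return Decimal(f"{whole_digits}.{fraction_digits}")
```

Inside the loop, the two `.text` values could be passed to `parse_price` instead of being concatenated.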

In [None]:
titles_prices
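
For an endless-scroll page, a common pattern is to scroll to the bottom, give new items time to load, and stop once the page height no longer grows. The sketch below takes the driver as a parameter and uses only `execute_script`, so it should work with the `browser` object above; `scroll_to_end`, `pause` and `max_rounds` are hypothetical names chosen for illustration.

```python
import time


def scroll_to_end(driver, pause: float = 2.0, max_rounds: int = 50) -> int:
    """Scroll until the page height stops growing; return the rounds scrolled.

    `driver` is assumed to be a Selenium WebDriver (anything with a
    Selenium-style `execute_script` method works).
    """
    last_height = driver.execute_script("return document.body.scrollHeight")
    rounds = 0
    while rounds < max_rounds:
        # Jump to the bottom so the page's lazy loader fetches more items
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause)  # give the new content time to load
        rounds += 1
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            break  # nothing new loaded; assume we reached the end
        last_height = new_height
    return rounds
```

After it returns, running `browser.find_elements(...)` again picks up the newly loaded products. Since earlier products remain on the page, results gathered over several passes may need deduplicating (for example by link).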

Here is an example of opening tabs to visit sub-pages. A new tab is opened, the driver's focus is switched to that tab (the new tab is selected), and the product page is loaded. Any scraping of that page happens at this point; then the tab is closed and focus is switched back to the first tab.

In [None]:
from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.common.by import By

product_pages = []
main_tab = browser.current_window_handle
for p in products:
    try:
        title = p.find_element(
            By.CSS_SELECTOR, "span.a-size-medium.a-color-base.a-text-normal").text
        link = p.find_element(By.CSS_SELECTOR, "a.a-link-normal").get_attribute("href")
    except NoSuchElementException:
        continue

    # Open a new, empty tab and switch focus to it
    # (assumes only the main tab was open before)
    browser.execute_script("window.open('');")
    new_tab = next(h for h in browser.window_handles if h != main_tab)
    browser.switch_to.window(new_tab)
    browser.get(link)
    time.sleep(3)  # crude wait for the product page to load

    # Here is where you would get info off this page and store it;
    # as a placeholder, record the title and URL
    product_pages.append((title, link))

    # Close the active (product) tab and switch focus back to the main tab
    browser.close()
    browser.switch_to.window(main_tab)
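
If the scraping code you add inside the product tab raises an exception, the tab stays open and the driver keeps focus on it. A context manager with `try`/`finally` guarantees the tab is closed and focus returns to the main tab. This is a sketch: `product_tab` is a hypothetical name, and `driver` is assumed to be a Selenium WebDriver (it relies only on `execute_script`, `window_handles`, `current_window_handle`, `switch_to.window`, `get` and `close`). Selenium 4 also offers `driver.switch_to.new_window("tab")`, which opens a tab and switches to it in one call.

```python
from contextlib import contextmanager


@contextmanager
def product_tab(driver, url: str):
    """Open `url` in a new tab, yield the driver, then always clean up."""
    main_tab = driver.current_window_handle
    existing = set(driver.window_handles)
    driver.execute_script("window.open('');")
    # The new tab is the handle that was not there before
    new_tab = next(h for h in driver.window_handles if h not in existing)
    driver.switch_to.window(new_tab)
    try:
        driver.get(url)
        yield driver
    finally:
        driver.close()  # closes the current (product) tab
        driver.switch_to.window(main_tab)
```

Inside the loop, this would read `with product_tab(browser, link):` followed by the scraping code for that page.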