# Navigating the Web with Selenium
*Curtis Miller*

Selenium is effectively a script-controlled web browser. Most of the things you can do on a webpage as a human, you can also do with Selenium.

This notebook demonstrates how we can browse the web like a human using Selenium.

**Note: The websites visited here could change, which might break this code.**

In [None]:
from selenium import webdriver
from time import sleep
from selenium.webdriver.common.keys import Keys    # Useful for sending special key presses (Escape, Enter, ...)

In [None]:
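from selenium.webdriver.chrome.service import Service    # Wraps the path to the driver executable
path = "chromedriver"    # Path to the ChromeDriver executable
driver = webdriver.Chrome(service=Service(path))    # Selenium 4 style: the driver path goes in a Service object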

In [None]:
from selenium.webdriver.common.by import By    # Locator strategies (By.ID, By.NAME, By.XPATH, ...)

# Visit Google
driver.get("https://www.google.com")
sleep(5)    # Give the page time to load
# Get the textbox for queries
txtSearch = driver.find_element(By.ID, "lst-ib")    # Locate a page element by its id value
# Get the button to click to start a search
btnSearch = driver.find_element(By.NAME, "btnK")    # Locate an element by its name attribute value

# Send a search to Google
txtSearch.send_keys("selenium")
sleep(2)
txtSearch.send_keys(Keys.ESCAPE)
sleep(3)
# Click to search
btnSearch.click()
sleep(15)
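
The fixed `sleep` calls above pause for a set time whether or not the page is ready. As a rough sketch of an alternative, the helper below polls a condition until it holds or a timeout expires. The name `wait_until` and its parameters are made up for illustration and are not part of Selenium; Selenium ships its own `WebDriverWait` class for this purpose.

```python
import time


def wait_until(condition, timeout=10.0, poll=0.5):
    """Call condition() repeatedly until it returns a truthy value.

    Returns that value, or raises TimeoutError once `timeout` seconds
    have passed without success. (Hypothetical helper, not a Selenium API.)
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(poll)    # Short pause before checking again


# Possible use with the driver from this notebook (assumes a live browser):
# find_elements returns an empty list, which is falsy, until a match appears.
# links = wait_until(lambda: driver.find_elements("xpath", "//div[@id='extrares']//a"))
```

Compared with a fixed pause, this returns as soon as the condition is met and fails loudly if it never is.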

Now let's build a set of search terms related to the one we tried. We do the following:

1. Grab the `div` containing the related search terms
2. Add those terms to a `set` (chosen to prevent duplication)
3. Click the link for one of those search terms
4. Grab the new list of related search terms on that page (but don't click them) and add them to the set
5. Go back to the previous page
6. Click the next link and repeat

We use [XPath syntax](https://en.wikipedia.org/wiki/XPath) to find some elements.
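
To get a feel for what such an expression selects, here is a small sketch that runs without a browser. It uses Python's built-in `xml.etree.ElementTree`, which supports a limited subset of XPath, on a made-up page. The markup is a hypothetical simplification, not Google's real HTML. ElementTree needs a relative path, so the expression starts with `.//` rather than `//`.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified results page (not Google's real markup)
page = """
<html>
  <body>
    <div id="search"><a href="/result1">An ordinary result</a></div>
    <div id="extrares">
      <p><a href="/search?q=selenium+python">selenium python</a></p>
      <p><a href="/search?q=selenium+webdriver">selenium webdriver</a></p>
    </div>
  </body>
</html>
"""

root = ET.fromstring(page)

# Find the div whose id is "extrares", then every <a> nested anywhere inside it
related = root.findall(".//div[@id='extrares']//a")

related_terms = [a.text for a in related]          # The visible link text
related_hrefs = [a.get("href") for a in related]   # Where each link points
print(related_terms)
print(related_hrefs)
```

The link in the `search` div is not matched, because the expression only looks inside the `div` with id `extrares`.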

(The browsing loop below is not the most robust approach, but it is illustrative.)

In [None]:
from selenium.webdriver.common.by import By    # Locator strategies (By.XPATH, ...)

# Find a div using XPath syntax
# We should be on the results page now
# This div contains search terms related to the one we passed
xpath_related = "//div[@id='extrares']//a"    # Find the div with id "extrares", then all links
                                              # contained in the div
lst_aRes = driver.find_elements(By.XPATH, xpath_related)
num_common = len(lst_aRes)

terms = set()
for i in range(num_common):    # Loop by index: elements go stale once we navigate away, so we re-find them below
    terms.add(lst_aRes[i].text)    # Add the term to the set
    lst_aRes[i].click()    # Go to that page
    sleep(5)
    lst_aChildRes = driver.find_elements(By.XPATH, xpath_related)
    # Add all child terms to the set
    for a in lst_aChildRes:
        terms.add(a.text)
    driver.back()    # One step back in the browser history
    sleep(2)    # Let the previous page reload before looking up its links again
    lst_aRes = driver.find_elements(By.XPATH, xpath_related)

A better approach would have been to collect the URLs of all the links we wanted to visit first, and then visit each one in turn. We could have assembled such a list like so.

In [None]:
links = list()
for a in lst_aRes:
    links.append(a.get_attribute("href"))    # Get the destination of the link

links
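
With such a list in hand, visiting each page could look roughly like the sketch below. The function name `collect_terms_from_links` and its parameters are hypothetical, chosen for illustration. It only assumes the driver offers `get` and `find_elements`, and it passes the locator strategy as the plain string `"xpath"`, which is the value of Selenium's `By.XPATH`.

```python
import time


def collect_terms_from_links(driver, links, xpath, pause=0.0):
    """Visit each URL in `links` and return the set of link texts matched by `xpath`.

    Hypothetical helper: `driver` is any object with Selenium-style
    get(url) and find_elements(by, value) methods.
    """
    found = set()
    for url in links:
        driver.get(url)       # Navigate straight to the URL; no click or back() needed
        time.sleep(pause)     # Crude wait for the page to load
        for a in driver.find_elements("xpath", xpath):    # "xpath" is the value of By.XPATH
            found.add(a.text)
    return found
```

In this notebook it might be called as `terms |= collect_terms_from_links(driver, links, "//div[@id='extrares']//a", pause=5)`. Because the URLs are plain strings, nothing needs to be located again after each navigation.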

Nevertheless, we have a collection of related search terms.

In [None]:
terms

A cool feature of Selenium is the ability to take screenshots. Here's a screenshot of the browser window as it currently is.

In [None]:
from IPython.display import Image    # Just to view images

In [None]:
driver.get_screenshot_as_file("screenshot.png")    # Save a screenshot of the current view
Image("screenshot.png")

In [None]:
driver.quit()    # We're done: close the browser and end the WebDriver session