# Selenium Tutorial
## By Travis Cossarini

### Table of Contents

Selenium is a very powerful tool for automating web-based testing. It is also extremely useful for building web crawlers and automating web-based tasks.

In this notebook we will cover:
<ol>
    <li>Setup</li>
    <li>Basics</li>
    <li>Website Interaction</li>
    <li>Scraping</li>
    <li>Automated Testing</li>
        
</ol>


# Setup

To start, you will need to install both Selenium (with pip or conda) and a webdriver (we'll be using Chrome). See the video below for a tutorial on setting up the webdriver:
https://www.youtube.com/watch?v=dz59GsdvUF8

You can use any webdriver/browser combo you would like; the syntax is largely the same.

After adding the folder to PATH, you may need to restart your terminal or Jupyter server (or, failing that, your computer) for the change to take effect.

An alternative to adding the webdriver to your PATH is to specify the location of the webdriver file each time you create a driver instance. However, this is not what I have opted for in this tutorial.

# Basics

In [1]:
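from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By # locator strategies (By.NAME, By.XPATH, ...)
import time

Let's do a Google search for a Selenium tutorial.

In [3]:
driver = webdriver.Chrome()
driver.maximize_window()
driver.get("https://google.com") # opens Google

# find_element(By..., ...) replaces the find_element_by_* helpers, which were removed in Selenium 4.3
search_bar = driver.find_element(By.NAME, "q")
search_bar.clear()
search_bar.send_keys("Python Selenium youtube tutorial") # enters the text
search_bar.submit() # submits the request
time.sleep(2)
# clicks the first result; this XPath depends on Google's markup and may need updating
driver.find_element(By.XPATH, '//div[@class="r"]/a/h3').click()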

time.sleep(10)
driver.close()

There we go, lesson over lol.

For our next magic trick, let's navigate to the lovely QMIND website.

This cell will open Chrome, go to the QMIND address, print the title of the site and then close after 5 seconds.

Selenium also provides functions for implicit and explicit waiting, as an alternative to the time library. These are extremely useful when elements load slowly or when references to them go stale. For this tutorial I am using the time library because it is simpler and none of the actions here commonly run into those problems.

It is best practice to close the driver instance after every use. Note that driver.close() closes only the current window, while driver.quit() ends the whole browser session.

In [4]:
driver = webdriver.Chrome() # initialize the driver (opens a new Chrome window)
driver.maximize_window() # maximizes the window
driver.get("https://qmind.ca/") # navigates to the QMIND site
print(driver.title)
time.sleep(5)
driver.close()

QMIND – Queens AI Hub


This can also be done using a "headless" browser, which means that no visible browser window opens. This is useful for limiting clutter on your computer.

In [5]:
opts = Options()  
opts.add_argument("--headless") 
driver = webdriver.Chrome(options=opts) # initialize the driver without a visible window
driver.maximize_window() # maximizes the (invisible) window
driver.get("https://qmind.ca/") # navigates to the QMIND site
print(driver.title) # Grabs the title of the website
driver.close()

QMIND – Queens AI Hub


Since the QMIND title printed, it is clear that the webdriver worked, it just didn't create a visible window. This is nice so your computer screen does not become cluttered with Selenium windows.

By reading the HTML of any website, you can use Selenium to navigate and extract/enter information. XPaths are the most versatile way to locate elements, but there are simpler ways as well, such as by ID, name or CSS selector.
https://www.guru99.com/xpath-selenium.html

In most cases, you will want to use Selenium to input information, because other packages, such as Beautiful Soup (BS4), often do a better job of scraping.

To find an XPath, "inspect" an element in your browser and then right click the highlighted HTML. Select "Copy" and then "Copy XPath".

If that doesn't make sense to you, just Google "How to find the xpath of website elements".
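To get a feel for what an XPath expression selects, here is a small sketch that runs without a browser. It uses Python's built-in ElementTree, which only supports a subset of XPath and needs well-formed markup, so treat it as an illustration rather than a replacement for Selenium. The page fragment is made up for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical page fragment, loosely modeled on a search results page.
SNIPPET = """
<html>
  <body>
    <form>
      <input name="q" type="text" />
    </form>
    <div class="r"><a href="https://example.com/a"><h3>First result</h3></a></div>
    <div class="r"><a href="https://example.com/b"><h3>Second result</h3></a></div>
  </body>
</html>
"""

root = ET.fromstring(SNIPPET)

# .// searches all descendants; [@attr="value"] filters on an attribute.
search_box = root.find('.//input[@name="q"]')
# A path of tags walks down the tree: every h3 inside an a inside a div with class "r".
headings = root.findall('.//div[@class="r"]/a/h3')

print(search_box.attrib["type"])
print([h.text for h in headings])
```

In Selenium the same kind of expression is written with a leading // and passed to the driver, as in the Google search cell above.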

# Website Interaction

If this code throws an error, it is most likely because the XPaths have changed on the website. To fix this, replace the XPaths for the username field, password field and login button below.

In [6]:
from selenium.webdriver.common.by import By # locator strategies such as By.XPATH

driver = webdriver.Chrome() # initialize the driver
driver.maximize_window() # maximizes the window
driver.get("https://twitter.com/login")

username_xpath = '//*[@id="react-root"]/div/div/div[2]/main/div/div/div[1]/form/div/div[1]/label/div/div[2]/div/input'
password_xpath = '//*[@id="react-root"]/div/div/div[2]/main/div/div/div[1]/form/div/div[2]/label/div/div[2]/div/input'
login_button_xpath = '//*[@id="react-root"]/div/div/div[2]/main/div/div/div[1]/form/div/div[3]/div/div'

# find_element(By.XPATH, ...) replaces find_element_by_xpath, which was removed in Selenium 4.3
username_element = driver.find_element(By.XPATH, username_xpath)
password_element = driver.find_element(By.XPATH, password_xpath)
login_element = driver.find_element(By.XPATH, login_button_xpath)

username_element.send_keys('Placeholder username')
password_element.send_keys('Placeholder password')
time.sleep(2)
login_element.click()

# Selenium can also take screenshots
driver.save_screenshot('screenshot.png')

time.sleep(3)
driver.close()

WebDriverException: Message: chrome not reachable
  (Session info: chrome=84.0.4147.105)


Pretty cool. Be careful not to hit websites too frequently as you can get IP banned.
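One simple way to slow things down is to pause for a random interval between page loads. The sketch below is only an illustration: polite_pause is a hypothetical helper name, and the default bounds are arbitrary values, not limits published by any website.

```python
import random
import time


def polite_pause(min_seconds=2.0, max_seconds=5.0):
    """Sleep for a random interval and return how long we slept."""
    delay = random.uniform(min_seconds, max_seconds)
    time.sleep(delay)
    return delay


# Usage inside a scraping loop (urls is a hypothetical list of pages to visit):
# for url in urls:
#     driver.get(url)
#     polite_pause()
```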

You can also do some "mouse" manipulation using Selenium:
https://www.pluralsight.com/guides/web-scraping-with-selenium

I have yet to find an instance where this is necessary, but it is really cool.

# Scraping

Clearly, Selenium is useful in scraping applications where some level of interaction with the website is required. In most cases it is simply better to use BS4 for any scraping needs, while using Selenium to access any parts of the website that require browser interaction.

You can find all elements with a given tag, such as 'p' (By comes from selenium.webdriver.common.by):

driver.find_elements(By.TAG_NAME, 'p')

But in my opinion it is simpler to use the BS4 parser. Below, I scrape the entire page source from QMIND, which can then be fed into a parser.

In [7]:
opts = Options()
opts.add_argument('--headless')
opts.add_argument('--incognito')
driver = webdriver.Chrome(options=opts)
driver.maximize_window()

# Grabs the entire page source
driver.get("https://QMIND.ca")
source = driver.page_source
print(source)
driver.close() # close the driver once the source has been captured

<html lang="en-US" data-semplice="5.1.1" class=" "><head>
		<meta charset="UTF-8">
		<meta name="viewport" content="width=device-width, initial-scale=1.0, maximum-scale=1.0">
		<title>QMIND – Queens AI Hub</title>
<link rel="dns-prefetch" href="//s.w.org">
		<script type="text/javascript" async="" src="https://www.google-analytics.com/analytics.js"></script><script type="text/javascript">
			window._wpemojiSettings = {"baseUrl":"https:\/\/s.w.org\/images\/core\/emoji\/13.0.0\/72x72\/","ext":".png","svgUrl":"https:\/\/s.w.org\/images\/core\/emoji\/13.0.0\/svg\/","svgExt":".svg","source":{"concatemoji":"https:\/\/qmind.ca\/wp-includes\/js\/wp-emoji-release.min.js?ver=5.5"}};
			!function(e,a,t){var r,n,o,i,p=a.createElement("canvas"),s=p.getContext&&p.getContext("2d");function c(e,t){var a=String.fromCharCode;s.clearRect(0,0,p.width,p.height),s.fillText(a.apply(this,e),0,0);var r=p.toDataURL();return s.clearRect(0,0,p.width,p.height),s.fillText(a.apply(this,t),0,0),r===p.toDataURL()}func
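Feeding the page source into BS4 could look something like the sketch below. To keep it runnable without a browser, source here is a short stand-in string based on the first few lines of the output above, with two made-up paragraphs added; in practice it would be driver.page_source. It assumes the beautifulsoup4 package is installed.

```python
from bs4 import BeautifulSoup

# Stand-in for driver.page_source; the <p> tags are hypothetical.
source = """
<html lang="en-US"><head>
<meta charset="UTF-8">
<title>QMIND – Queens AI Hub</title>
<link rel="dns-prefetch" href="//s.w.org">
</head><body><p>First paragraph</p><p>Second paragraph</p></body></html>
"""

soup = BeautifulSoup(source, "html.parser")

page_title = soup.title.string  # text inside <title>
paragraphs = [p.get_text() for p in soup.find_all("p")]  # text of every <p>
links = [link["href"] for link in soup.find_all("link", href=True)]  # href of every <link>

print(page_title)
print(paragraphs)
print(links)
```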

# Automated Testing

Selenium's main industrial use is in automated testing of website functionality. This is not directly related to data science, but here is a tutorial if you are interested:

https://www.youtube.com/watch?v=_JNeiGbAgL4