# Selenium

Selenium (https://www.seleniumhq.org/) automates browsers. It is used primarily to automate web applications for testing, but it is not limited to that: boring web-based tasks can (and should!) be automated too.

In the code cell below, the statement
```python
from selenium import webdriver
```
is used to import the webdriver, which is always necessary for automating browsing.

In [3]:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys

### Using selenium to open a website

We first need to create a web driver object, which we use to open the page. In our class we use the Firefox webdriver, but others are available (see section 1.5 here: https://selenium-python.readthedocs.io/installation.html).

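Note that on a school computer, you will need to specify the executable path to the webdriver, but this will likely not be necessary on your personal computer. Use a raw string (the *r* prefix) so that the backslashes in a Windows path are not treated as escape sequences. Newer Selenium releases take the path through a *Service* object rather than the *executable_path* argument, so adjust the call if your version rejects it.

```python
driver = webdriver.Firefox(executable_path=r'C:\geckodriver\geckodriver.exe')
```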

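Create the web driver object that controls the browser; this opens a new browser window with no page loaded. The cell below uses Safari; replace *webdriver.Safari()* with *webdriver.Firefox()* to use Firefox.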

In [5]:
driver = webdriver.Safari()

To browse to a page, simply use the *driver.get* method and specify the URL.

In [6]:
driver.get('http://www.easternct.edu')

### Locating elements

To find the *first* element matching a particular *id*, *tag name*, etc., use the *find_element* method:

```python
find_element(By.TAG_NAME, value)
```

In order to find multiple elements that match, use the *find_elements* method, which returns a list:

```python
find_elements(By.TAG_NAME, value)
```

These methods can be called using the *driver* or any selenium web element.

The first argument can be any one of the following, and the second argument is the corresponding *value* to search for:

- By.ID 
- By.XPATH
- By.LINK_TEXT
- By.PARTIAL_LINK_TEXT
- By.NAME
- By.TAG_NAME
- By.CLASS_NAME
- By.CSS_SELECTOR

Note: For CLASS_NAME, any elements with that class will be returned (even if the element contains multiple classes). 

Note: if no element matches, *find_element* raises a *NoSuchElementException*, while *find_elements* returns an empty list.

The code below finds the first *ul* element on the page, which holds the list of menu items in the header of the page. The element is stored in a *WebElement* object.

In [7]:
ul = driver.find_element(By.TAG_NAME, 'ul')
ul

<selenium.webdriver.remote.webelement.WebElement (session="E59A7E23-1F14-43BE-824C-9920E2B4E742", element="node-3B869258-B10A-4897-ACD3-93F797455C51")>

### Extracting text from elements
To extract text from an element, simply access its *text* attribute.

In [8]:
list_items = ul.find_elements(By.TAG_NAME, 'li')
for li in list_items:
    print(li.text)

Apply
Visit
Request Info
Give


### Clicking on an element

You can click on an element using the *click* method. Note that you will get an error if the element cannot be clicked. For example, this happens if you run the cell below twice.

In [9]:
searchButton = driver.find_element(By.ID, 'search-button')
searchButton.click()

### Adding text to an input

The *send_keys* method can be used to add text to an input. Here we add "How are you?" to the search input, which is now visible because we clicked on the search icon.

In [10]:
elem = driver.find_element(By.ID, 'q')
elem.send_keys("How are you?")

We can clear input using the *clear* method.

In [11]:
elem.clear()

Let's search for "Computer Science", by entering the text and then pressing the *Enter* key.

The statement
```python
from selenium.webdriver.common.keys import Keys
```
is needed so that we can simulate a user pressing the ENTER (RETURN) key.

In [12]:
from selenium.webdriver.common.keys import Keys
elem.send_keys('Computer Science')
elem.send_keys(Keys.RETURN)

### Getting the value of an attribute

The method *get_attribute* can be used to get the value of an attribute of an element. Here we get all links on the page, and display the text of the link as well as the URL (the *href* attribute).

In [13]:
links = driver.find_elements(By.TAG_NAME, 'a')
for link in links:
    text = link.text
    if text != '':
        print(text, link.get_attribute('href'), sep=': ')


    		Skip to Main Site Navigation
    	: https://www.easternct.edu/search/index.html?cx=015963256972996848925%3A1fqh-ftnmxc&q=Computer+Science&ie=UTF-8&sa=#skipToTopNav

    	   Skip to Content
    	: https://www.easternct.edu/search/index.html?cx=015963256972996848925%3A1fqh-ftnmxc&q=Computer+Science&ie=UTF-8&sa=#skipToContent

    	    Skip to Footer
    	: https://www.easternct.edu/search/index.html?cx=015963256972996848925%3A1fqh-ftnmxc&q=Computer+Science&ie=UTF-8&sa=#skipToFooter
Read More: https://www.easternct.edu/emergency-alerts/index.html
Apply: https://www.easternct.edu/admissions/apply/apply-first-year.html
Visit: https://www.easternct.edu/admissions/visit.html
Request Info: https://www44.student-1.com/OnlineFormsECSU/form-direct.aspx?Form=INQUIRY
Give: https://www.easternct.edu/give/give-to-eastern.html

					
					
				: https://www.easternct.edu/index.html
Apply: https://www.easternct.edu/admissions/apply/apply-first-year.html
Visit: https://www.easternct.edu/admission

searchSearch for Computer Science on Google: https://www.google.com/search?client=ms-google-coop&q=Computer+Science&cx=015963256972996848925:1fqh-ftnmxc

						Facebook
						
					: https://www.facebook.com/EasternCTStateUniversity

						Twitter
						
					: https://twitter.com/EasternCTStateU

						Instagram
						
					: https://www.instagram.com/easternctstateuniv/

							LinkedIn
						
					: https://www.linkedin.com/school/eastern-connecticut-state-university/

						YouTube
						
					: https://www.youtube.com/EasternConnecticutStateUniversityVideo
Contact Us: https://www.easternct.edu/connect-with-us/contact-us.html
Maps: https://www.easternct.edu/maps/index.html
ADA Issues: https://www.easternct.edu/ada-issues.html
Emergency: https://www.easternct.edu/emergency-alerts/index.html
Jobs at Eastern: https://www.easternct.edu/human-resources/job-opportunities.html
Disclaimer: https://www.easternct.edu/university-disclaimer.html
Cookie Policy: https://www.easternct.edu/cookie-
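
Several of the link texts printed above are padded with tabs and newlines that come from the page's markup. Below is a minimal sketch of tidying such text before printing, assuming that collapsing every run of whitespace to a single space is acceptable. `clean_text` is a hypothetical helper written for this example, not part of Selenium.

```python
def clean_text(text):
    """Collapse every run of whitespace (spaces, tabs, newlines) to one space."""
    # str.split() with no argument splits on any whitespace and drops
    # leading and trailing whitespace, so joining gives a single tidy line.
    return ' '.join(text.split())


# Sample modelled on the padded link text in the output above.
raw = '\n\t\t\tSkip to Content\n\t'
print(clean_text(raw))
```

Applying it to each link's *text* before printing would keep every link on a single line.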

### Searching by link text

- use *By.LINK_TEXT* to search for elements whose link text is an exact match
- use *By.PARTIAL_LINK_TEXT* to search for elements whose link text *contains* the text

Note: *text* here refers to the text value of the element, which can contain the *text* from more than one tag, as is the case for the last link in the second example.

In [14]:
driver.find_element(By.LINK_TEXT, 'CONTACT US')

<selenium.webdriver.remote.webelement.WebElement (session="E59A7E23-1F14-43BE-824C-9920E2B4E742", element="node-FF37FE87-F618-4BC8-8ED8-223310AF7E75")>

In [15]:
cs_links = driver.find_elements(By.PARTIAL_LINK_TEXT, 'Computer')
for cs in cs_links:
    print(cs.text)

The Computer Science Program
Computer Science - Eastern
Computer Science Major - Eastern
Computer Science Minor - Eastern
Computer Science Major: BS Degree Requirements - Eastern
Computer Science Major: Course Sequence - Eastern
Computer Science Major Learning Outcomes - Eastern
Computer Science Major: (before Fall 2017) BS Degree Requirements
Computer Engineering Science Minor - Eastern


### Close the driver

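Close the driver when you are done. Note that *close* closes the current browser window, while *quit* closes every window and ends the driver session.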

In [16]:
driver.close()

### Headless browsers and screenshots

It is possible to make a browser *headless* (meaning the browser has no GUI, so you will not see it) by setting *options* as in the code below. You can also save a screenshot of the page, which is commonly done in testing.

In [20]:
# configure headless browser
# (Safari has no headless mode, so this cell uses Firefox)
from selenium.webdriver.firefox.options import Options
options = Options()
options.add_argument('-headless')
print('configuring headless browser ...')
driver = webdriver.Firefox(options=options)

# go to Google News and take a screenshot
print('opening http://news.google.com ...')
driver.get('http://news.google.com')

print('take a screenshot ...')
driver.save_screenshot('google_news.png')

# close the browser
print('close the browser...')
driver.close()

print('done!')

### Searching by xpath

XPath uses path expressions to select nodes in an XML (or HTML) document. For more information, see: https://www.w3schools.com/xml/xpath_syntax.asp. In some cases, an XPath expression is more intuitive or more powerful than the other locator strategies.

In [23]:
driver = webdriver.Safari()
driver.get('http://www.easternct.edu')

Here we use a CSS selector to get the 3rd list item inside of the *div* with class 'main-menu-bg'.

In [24]:
info_link = driver.find_element(By.CSS_SELECTOR, 'div.main-menu-bg li:nth-child(3)')
info_link.text

'Request Info'

We can do the same thing using XPath. Note that we use the following:

- two slashes (//) match nodes at any depth: at the start of an expression they search the whole document, and after a step they search all descendants of that step (a single slash matches only direct children)
- [@attribute = value] selects by attribute (an exact match is required)
- element[n] matches the *nth* such element among its siblings (counting from 1)

Note: @class= matches only when the whole class attribute equals the value; if the element could carry multiple classes, you should use the *contains* XPath function (see the devhints link below).

In [25]:
info_link = driver.find_element(By.XPATH, '//div[@class="main-menu-bg"]//li[3]')
info_link.text

'Request Info'
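
The path features listed above can also be tried offline, without a browser. The sketch below uses the standard library's *xml.etree.ElementTree*, which supports only a limited subset of XPath and requires paths to be relative (hence the leading dot). The markup is a hypothetical, simplified stand-in for the page's menu, not the site's real HTML.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified stand-in for the page's menu markup.
page = ET.fromstring("""
<body>
  <div class="main-menu-bg">
    <ul>
      <li>Apply</li>
      <li>Visit</li>
      <li>Request Info</li>
      <li>Give</li>
    </ul>
  </div>
  <div class="footer wide">
    <ul>
      <li>Maps</li>
    </ul>
  </div>
</body>
""")

# '//' searches all descendants; [3] picks the third <li> under its parent.
third = page.find('.//div[@class="main-menu-bg"]//li[3]')
print(third.text)

# @class="..." compares the whole attribute string, so the element whose
# class attribute is "footer wide" is not matched by @class="footer".
exact = page.findall('.//div[@class="footer"]')
print(len(exact))
```

The second query returns no elements, which is why a multi-class element calls for the *contains* function in a browser's full XPath engine (ElementTree itself does not provide *contains*).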

In general, anything you can match with a CSS selector can also be matched with an XPath expression. XPath additionally supports matches that CSS selectors cannot express, such as matching on an element's text. See https://devhints.io/xpath.
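
Text matching and the `..` parent step can be illustrated offline as well. This is a sketch under stated assumptions: the markup is invented for the example, and it relies on *xml.etree.ElementTree* (Python 3.7 or later), which writes the text test as `[.="..."]` rather than with XPath's `text()` function.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified stand-in for two content cards on a page.
page = ET.fromstring("""
<body>
  <div class="card">
    <h3>Undergraduate Research</h3>
    <p>Faculty-mentored research in every major.</p>
  </div>
  <div class="card">
    <h3>Study Abroad</h3>
    <p>Programs around the world.</p>
  </div>
</body>
""")

# Find the <h3> whose text matches exactly, then step up to its parent
# with '/..' so that the whole card is returned, not just the heading.
card = page.find('.//h3[.="Undergraduate Research"]/..')
print(card.tag, card.get('class'))
print(card.find('p').text)
```

Selecting the parent is what makes the sibling paragraph reachable from a match on the heading text.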

In [26]:
ugr = driver.find_element(By.XPATH, '//h3[(text() = "Undergraduate Research")]/..')
ugr.text

'\n        \t\t\t\t\t\t\t        \t\t\t\t\t\t\t    Bahamas\n        \t\t\t\t\t\t\t        \t\t\t\t\t\t\t                            \t\t    Undergraduate Research\n                            \t\t        \t\t\t\t\t\t\t\n        \t\t\t\t\t\t\t            \t\t\t\t\t\t\t\t    Eastern undergraduates conduct faculty-mentored research in every major—everything from genetics\nto theater set design.\n        \t\t\t\t\t\t\t\t        \t\t\t\t\t\t\t\t        \t\t\t\t\t\t\t\t    Learn More\n        \t\t\t\t\t\t\t\t        \t\t\t\t\t\t\t\n        \t\t\t\t\t\t'

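To get the summary, we need to click on the element, since the summary is not currently displayed. In the run shown below, the click raised an *ElementNotInteractableException*, so the text printed afterwards is unchanged.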

In [27]:
ugr.click()

ElementNotInteractableException: Message: 


In [28]:
print(ugr.text)


        							        							    Bahamas
        							        							                            		    Undergraduate Research
                            		        							
        							            								    Eastern undergraduates conduct faculty-mentored research in every major—everything from genetics
to theater set design.
        								        								        								    Learn More
        								        							
        						


### Exercise: 
Search for a movie on IMDb and go to the page for the first result by *clicking* on the link.
Can you extract the title and rating?

Note: It is important to sleep for a second or two between carrying out the search and going to the first result, so that the results page has time to load.


In [None]:
import time
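
A fixed sleep either waits longer than needed or not long enough. One alternative, sketched here with only the standard library, is to poll until a condition holds. `wait_until` is a hypothetical helper written for this example; Selenium also ships its own explicit-wait class, *WebDriverWait*, in *selenium.webdriver.support.ui*.

```python
import time


def wait_until(condition, timeout=5.0, interval=0.5):
    """Call condition() repeatedly until it returns a truthy value.

    Returns that value, or raises TimeoutError once `timeout` seconds pass.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError('condition not met within %.1f s' % timeout)
        time.sleep(interval)


# Stand-in for a page that needs three checks before its results appear.
attempts = []

def results_ready():
    attempts.append(1)
    return ['first result'] if len(attempts) >= 3 else []

print(wait_until(results_ready, timeout=2.0, interval=0.1))
```

With a live driver, the condition could be something like `lambda: driver.find_elements(By.PARTIAL_LINK_TEXT, 'some title')` (the link text is a placeholder), because *find_elements* returns an empty list until something matches.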