# Selenium


Where did the names and addresses in the previous exercise come from?

**They are scraped from the web via a small Selenium script**

A similar program, which retrieves all URLs from www.kulturnaut.dk by clicking through the *"Next page..."* links, is here: [selenium_clicker.py](modules/selenium_clicker.py).


## What is Selenium?

> Selenium automates browsers. That's it! What you do with that power is entirely up to you. Primarily, it is for automating web applications for testing purposes, but is certainly not limited to just that. Boring web-based administration tasks can (and should!) also be automated as well.
http://docs.seleniumhq.org



## Automatically Finding Names, Addresses and Numbers

If you run the [selenium_krak.py](modules/selenium_krak.py) script from the command line, you will see that it opens a Firefox window, enters a search string (*"Møller"*), clicks the link *"Personer"* to search for persons only, and finally prints the HTML source of the page.


```python
from modules import selenium_krak
res = selenium_krak.get_info('Møller')
```

Cookie Button <selenium.webdriver.firefox.webelement.FirefoxWebElement (session="fd33a2b1-2d69-46f3-852a-818573811cd0", element="682b0e53-c2aa-4938-97f4-0302abc7033a")>
<selenium.webdriver.firefox.webelement.FirefoxWebElement (session="fd33a2b1-2d69-46f3-852a-818573811cd0", element="ff8c72c8-144f-4d85-b836-b6a7302f707a")>


```python
for person in res:
    print(person)
```

A Henning Gamborg Møller
 Klostergade 28 
 6760 Ribe 
61 69 03...

A K Møller
 Bregnerødvej 75, st. 0002 
 3460 Birkerød 
75 50 75...

A Møller
 Dalstræde 11 
 Heltborg 
97 95 20...

A Møller
 Vestergade 62, 3. 0410 
 8900 Randers C 
86 45 44...

A Møller
 Jørgensgaardvej 13 
 6240 Løgumkloster 
74 74 36...

A Møller Andersen
 Gammel Holtevej 60 
 Gl Holte 
45 80 47...

A Møller Jensen
 Viborgvej 115, 1. tv 
 Hasle 
60 94 39...2

A Møller Sørensen
 Korsørgade 4, 6. tv 
 2100 København Ø 
35 38 97...

A Porse Møller
 Røddikvej 60 
 8464 Galten 
86 94 66...

Aage Bojsen-Møller
 Noret 3 
 4780 Stege 
55 81 46...

Aage Christian Møller Andersen
 Jordemodervej 7 
 Bislev 
64 66 01...3

Aage Hansen Møller
 Filippavej 38 
 Hundstrup 
62 24 10...

Aage Majbom Møller
 Pilelunden 23 
 5550 Langeskov 
28 59 06...

Aage Martin Møller
 Almind Østergade 15A, 1. tv 
 6051 Almind 
21 48 73...

Aage Møller
 Nordvestvej 5A, st. 
 4300 Holbæk 
20 94 43...2

Aage Møller
 Knudsvej 1 
 8586 Ørum Djurs 
40 1

```python
import selenium
print(selenium.__version__)
```

3.141.0


```python
'There were {} Møller on the first page'.format(len(res))
```

'There were 25 Møller on the first page'

## Controlling the Browser with the `selenium` Module

The `selenium` module lets Python directly control the browser by programmatically clicking links and filling in login information, almost as though a human user were interacting with the page. Selenium allows you to interact with web pages in a much more advanced way than Requests and Beautiful Soup, but because it launches a web browser, it is a bit slower and harder to run in the background if, say, you just need to download some files from the web.


### Starting a Selenium-Controlled Browser

```python
from selenium import webdriver


browser = webdriver.Firefox()
browser.get('http://www.krak.dk')
```
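The two lines above leave the Firefox window open when the script ends or fails. The following is a minimal sketch of one way to guarantee cleanup and to run without a visible window. The helper names `open_browser` and `make_headless_firefox` are hypothetical (they are not part of Selenium or of the course modules), and the `-headless` argument assumes a Firefox version that supports headless mode.

```python
from contextlib import contextmanager


def make_headless_firefox():
    """Start Firefox without a visible window (hypothetical helper)."""
    # Imported here so that open_browser() can also be used with another driver.
    from selenium import webdriver
    from selenium.webdriver.firefox.options import Options

    options = Options()
    options.add_argument('-headless')  # assumption: Firefox supports headless mode
    return webdriver.Firefox(options=options)


@contextmanager
def open_browser(url, make_driver=make_headless_firefox):
    """Yield a browser that has loaded `url`; always quit it afterwards."""
    browser = make_driver()
    try:
        browser.get(url)
        yield browser
    finally:
        browser.quit()  # close all windows and stop the driver process
```

Inside a `with open_browser('http://www.krak.dk') as browser:` block, `browser` can be used like the object created above, and it is shut down even if the body raises an exception. Passing `make_driver=webdriver.Firefox` would bring back the visible window.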

### Finding Elements on the Page

WebDriver objects have quite a few methods for finding elements on a page. They are divided into the `find_element_*` and `find_elements_*` methods. The `find_element_*` methods return a single `WebElement` object, representing the first element on the page that matches your query. The `find_elements_*` methods return a list of `WebElement` objects, one for every matching element on the page. The following are some common methods that find multiple elements on the page:


  * `browser.find_elements_by_class_name(name)` ... finds elements that use the CSS class name
  * `browser.find_elements_by_css_selector(selector)` ... finds elements that match the CSS selector
  * `browser.find_elements_by_id(id)` ... finds elements with a matching id attribute value
  * `browser.find_elements_by_link_text(text)` ... finds `<a>` elements that completely match the text provided
  * `browser.find_elements_by_partial_link_text(text)` ... finds `<a>` elements that contain the text provided
  * `browser.find_elements_by_name(name)` ... finds elements with a matching name attribute value
  * `browser.find_elements_by_tag_name(tagname)` ... finds elements with a matching tag name (case insensitive; an `<a>` element is matched by 'a' and 'A')
  
For more information on finding elements on a page, see http://selenium-python.readthedocs.io/locating-elements.html
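The practical difference between the two families shows up when nothing matches: a `find_element_*` method raises `NoSuchElementException`, while a `find_elements_*` method returns an empty list. The sketch below uses that to write two small helpers. Their names (`first_match_or_none`, `link_targets`) are hypothetical, and the code relies on the `find_elements_by_*` methods from the list above, as available in the Selenium 3.141.0 release used here; newer Selenium releases expose the same lookups through `find_elements(By.CSS_SELECTOR, value)` and similar calls.

```python
def first_match_or_none(browser, css_selector):
    """Return the first element matching `css_selector`, or None."""
    # find_elements_* returns an empty list when nothing matches, whereas
    # find_element_* raises NoSuchElementException in that case.
    matches = browser.find_elements_by_css_selector(css_selector)
    return matches[0] if matches else None


def link_targets(browser):
    """Return (text, href) pairs for every <a> element on the current page."""
    links = browser.find_elements_by_tag_name('a')
    return [(link.text, link.get_attribute('href')) for link in links]
```

Both helpers take an already started `browser` as their first argument, so they work with any WebDriver that has loaded a page.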
  


### Clicking the Page

`WebElement` objects returned from the `find_element_*` and `find_elements_*` methods have a `click()` method that simulates a mouse click on that element. This method can be used to follow a link, make a selection on a radio button, click a Submit button, or trigger whatever else might happen when the element is clicked by the mouse.

```python
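from selenium import webdriver

base_url = 'http://www.krak.dk'
browser = webdriver.Firefox()
browser.get(base_url)
browser.implicitly_wait(3)  # wait up to 3 seconds for elements to appear

# find_elements_* returns a list; click the first link labelled "Personer"
link_to_persons = browser.find_elements_by_link_text('Personer')
link_to_persons[0].click()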
```
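Clicking becomes more useful in a loop, for example to walk through paginated results by following a *"Next page..."* link as mentioned at the top of this document. The following is a rough sketch of that idea, not the contents of `selenium_clicker.py`. The function name `collect_page_urls` and the `max_pages` limit are assumptions, and the sketch assumes that each click triggers a regular page load that has finished before `current_url` is read.

```python
def collect_page_urls(browser, next_link_text, max_pages=10):
    """Follow a "next page" link repeatedly and return the URLs visited."""
    urls = [browser.current_url]
    while len(urls) < max_pages:
        next_links = browser.find_elements_by_partial_link_text(next_link_text)
        if not next_links:
            break  # no such link on this page: we reached the last page
        next_links[0].click()
        urls.append(browser.current_url)
    return urls
```

Using `find_elements_*` rather than `find_element_*` lets the loop stop cleanly on the last page instead of handling an exception, and `max_pages` keeps a misbehaving site from producing an endless loop.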


### Filling Out and Submitting Forms
Sending keystrokes to text fields on a web page is a matter of finding the `<input>` or `<textarea>` element for that text field and then calling the `send_keys()` method. 


```python
from selenium import webdriver

base_url = 'http://www.krak.dk'
# webdriver.PhantomJS() (http://phantomjs.org/download.html) avoids the GUI overhead,
# but Selenium has deprecated it in favor of headless Firefox or Chrome.
browser = webdriver.Firefox()
browser.get(base_url)  # load the page before looking for the search field
browser.implicitly_wait(3)

search_field = browser.find_element_by_name('searchQuery')
search_field.send_keys('Møller')
search_field.submit()
```
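The same steps can be wrapped in a small reusable function that also returns the HTML of the result page. This is only a sketch: the name `submit_query` is hypothetical, the field name (such as `'searchQuery'` in the snippet above) has to be looked up in the HTML of the target site, and the page source may be read before results that are rendered by JavaScript have appeared.

```python
def submit_query(browser, field_name, query):
    """Type `query` into the input named `field_name`, submit, return the HTML."""
    field = browser.find_element_by_name(field_name)
    field.clear()           # remove any text already in the field
    field.send_keys(query)
    field.submit()          # submits the form that contains the field
    return browser.page_source
```

Calling `submit_query(browser, 'searchQuery', 'Møller')` on a browser that has loaded the search page would then return the HTML of whatever page the form leads to, ready to be parsed.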

## Class exercise
Find a web site to interact with and fill out a form to get some information back.  
Examples could be https://www.jobindex.dk/,    
https://google.com or   
https://www.ikea.com/dk/da/