# Selenium


Where did the names and addresses in the previous exercise come from?

**They are scraped from the web via a small Selenium script**

A similar program, which collects all URLs from www.kulturnaut.dk by clicking through the *"Next page..."* links, is here: [selenium_clicker.py](selenium_clicker.py).
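The clicker script itself is not reproduced here. The following is a rough sketch of the idea, not the contents of `selenium_clicker.py`. It assumes that the next-page link can be found by its exact link text (passed in as a parameter) and that the wanted URLs are the `href` attributes of ordinary `<a>` elements. `collect_urls` is a hypothetical name. Selenium's locator strategies are plain strings, so the sketch spells them out. In a real script you would normally import `By` from `selenium.webdriver.common.by` and write `By.TAG_NAME` and `By.LINK_TEXT`.

```python
# Sketch of a "keep clicking the next-page link" crawler.
# `browser` is any Selenium WebDriver (e.g. webdriver.Firefox()) that is
# already showing the first page. Selenium's locator strategies are plain
# strings: By.TAG_NAME == "tag name", By.LINK_TEXT == "link text".
TAG_NAME = "tag name"
LINK_TEXT = "link text"


def collect_urls(browser, next_link_text, max_pages=50):
    """Return the hrefs of all <a> elements seen on up to `max_pages` pages,
    following the link whose text equals `next_link_text` each time."""
    urls, seen = [], set()
    for _ in range(max_pages):
        # Re-find the links on every page: elements from the previous
        # page are no longer valid after navigation.
        for link in browser.find_elements(TAG_NAME, "a"):
            href = link.get_attribute("href")
            if href and href not in seen:
                seen.add(href)
                urls.append(href)
        next_links = browser.find_elements(LINK_TEXT, next_link_text)
        if not next_links:  # no next-page link: this was the last page
            break
        # A real site may need an explicit wait here until the new page loads.
        next_links[0].click()
    return urls
```

The `max_pages` limit is a safety net, so that a site whose "next" link never disappears cannot keep the loop running forever.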


## What is Selenium?

> Selenium automates browsers. That's it! What you do with that power is entirely up to you. Primarily, it is for automating web applications for testing purposes, but is certainly not limited to just that. Boring web-based administration tasks can (and should!) also be automated as well.
http://docs.seleniumhq.org



## Install Selenium

```bash
conda install selenium
```
or
```bash
pip install selenium
```
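### Download and register the geckodriver
Selenium controls Firefox through a separate driver program, geckodriver, which must be on your `PATH` ([see short guide here](https://stackoverflow.com/questions/40208051/selenium-using-python-geckodriver-executable-needs-to-be-in-path)). Selenium 4.6 and later include Selenium Manager, which can usually download a matching driver automatically.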

## Automatically Finding Names, Addresses, and Numbers

If you run the [selenium_krak.py](selenium_krak.py) script from the command line, you will see that it opens a Firefox window, enters a search string (*"Møller"*), clicks the link *"Personer"* to restrict the search to persons, and finally prints the HTML source of the page.


```python
import selenium_krak

res = selenium_krak.get_info('Hartmann')
res
```

```
[<selenium.webdriver.firefox.webelement.FirefoxWebElement (session="81f87395-e0a1-4d03-9894-eff02170c92d", element="770a3acb-6beb-4a83-9d24-63654033c03c")>]

['A Hartmann\n Præstehusene 124 \n 2620 Albertslund \n43 45 25...\n',
 'Aage Frederik Hartmann Ditlevsen\n Højmose Vænge 25, 2. tv \n 2970 Hørsholm \n29 82 73...\n',
 'Aase Hartmann Kromann-Olsen\n Elmholt 11 \n 9560 Hadsund \n61 81 03...\n',
 'Adam Hartmann-Petersen\n Enghavevej 28 \n 5290 Marslev \n29 89 81...\n',
 'Agnethe Hartmann Hansen\n Skovsyrevej 66 \n 4700 Næstved \n40 63 19...\n',
 'Aiya Reginalda Hartmann Jensen\n Mosestræde 26 \n Thurø \n21 33 29...\n',
 'Alberte Hartmann Bertelsen\n Gugvej 123 \n 9210 Aalborg SØ \n25 80 90...\n',
 'Alex Hartmann\n Fuglebakken 9 \n Brylle \n22 94 00...\n',
 'Alex Hartmann\n Skæring Havvej 129 \n 8250 Egå \n40 32 02...\n',
 'Alexander Hartmann\n Strandgaardsvej 38 \n 7120 Vejle Øst \n26 17 61...3\n',
 'Alexander Sune Hartmann\n Inger Christensens Gade 24, 3. tv \n 8220 Brabrand \n42 63 99...\n',
 'Alexandra Tatiana-Louise Hartmann\n Enighedsvej 6E, 4. tv \n 2920 Charlottenlund \n42 26 06...\n',
 'Alf Hartmann\n Ølandsvej 35 \n 4681 Herfølge
```

```python
'There were {} Hartmanns on the first page'.format(len(res))
```

```
'There were 25 Hartmanns on the first page'
```
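Each scraped entry is a single string. It holds a name, a street address, a line with postal code and town, and a truncated phone number, separated by newlines. The following is a minimal sketch for splitting such strings into fields. It assumes that this line layout holds for every entry. `parse_entry` and the dictionary keys are made up for illustration. Some entries (e.g. *Thurø*, *Brylle*) have a town line without a postal code, so the sketch leaves `postcode` empty in that case.

```python
import re

# Danish postal codes have four digits, followed by the town name.
POSTCODE_TOWN = re.compile(r"^(\d{4})\s+(.+)$")


def parse_entry(entry):
    """Split one 'name / street / postcode town / phone' string into a dict.

    Assumes the newline-separated layout of the scraped output; missing
    trailing lines become None and any extra lines are ignored.
    """
    lines = [line.strip() for line in entry.split("\n") if line.strip()]
    name, street, town_line, phone = (lines + [None] * 4)[:4]
    match = POSTCODE_TOWN.match(town_line or "")
    postcode, town = match.groups() if match else (None, town_line)
    return {"name": name, "street": street, "postcode": postcode,
            "town": town, "phone": phone}
```

Applied to the first entry above, this gives `'2620'` as the postcode and `'Albertslund'` as the town.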

## Controlling the Browser with the `selenium` Module

The `selenium` module lets Python directly control the browser by programmatically clicking links and filling in login information, almost as though a human user were interacting with the page. Selenium allows you to interact with web pages in a much more advanced way than Requests and Beautiful Soup, but because it launches a web browser, it is a bit slower and harder to run in the background if, say, you just need to download some files from the Web.


### Starting a Selenium-Controlled Browser

```python
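from selenium import webdriver

# Launch a new Firefox window controlled by Selenium (needs geckodriver)
browser = webdriver.Firefox()
# Load the page; get() waits for the initial page load to complete
browser.get('http://www.krak.dk')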
```
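A browser started with `webdriver.Firefox()` stays open until `quit()` is called on it. The following is a minimal sketch that guarantees `quit()` runs even if the scraping code raises an exception. The helper name `managed_browser` is hypothetical. It only relies on the driver having a `quit()` method.

```python
from contextlib import contextmanager


@contextmanager
def managed_browser(make_browser):
    """Start a browser with `make_browser()` (e.g. webdriver.Firefox)
    and always call its quit() method when the with-block ends."""
    browser = make_browser()
    try:
        yield browser
    finally:
        browser.quit()  # close all windows and stop the driver process
```

With a real browser this would read `with managed_browser(webdriver.Firefox) as browser: browser.get('http://www.krak.dk')`.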

### Finding Elements on the Page

WebDriver objects have quite a few methods for finding elements on a page. They fall into two groups. The `find_element_*` methods return a single `WebElement` object representing the first element on the page that matches your query. The `find_elements_*` methods return a list of `WebElement` objects, one for every matching element on the page. Selenium 4.3 and later removed these `find_element(s)_by_*` helpers. There you call `find_element(By.<STRATEGY>, value)` or `find_elements(By.<STRATEGY>, value)` instead, with `By` imported from `selenium.webdriver.common.by`. Some common methods for finding multiple elements are:


  * `browser.find_elements_by_class_name(name)` ... finds elements that use the CSS class `name` (Selenium 4 locator: `By.CLASS_NAME`)
  * `browser.find_elements_by_css_selector(selector)` ... finds elements that match the CSS `selector` (Selenium 4 locator: `By.CSS_SELECTOR`)
  * `browser.find_elements_by_id(id)` ... finds elements with a matching `id` attribute value (Selenium 4 locator: `By.ID`)
  * `browser.find_elements_by_link_text(text)` ... finds `<a>` elements that completely match the text provided (Selenium 4 locator: `By.LINK_TEXT`)
  * `browser.find_elements_by_partial_link_text(text)` ... finds `<a>` elements that contain the text provided (Selenium 4 locator: `By.PARTIAL_LINK_TEXT`)
  * `browser.find_elements_by_name(name)` ... finds elements with a matching `name` attribute value (Selenium 4 locator: `By.NAME`)
  * `browser.find_elements_by_tag_name(tagname)` ... finds elements with a matching tag name, ignoring case, so that an `<a>` element is matched by 'a' and 'A' (Selenium 4 locator: `By.TAG_NAME`)
  
For more information on finding elements on a page, see http://selenium-python.readthedocs.io/locating-elements.html
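The two families also differ in what happens when nothing matches. `find_element*` raises `NoSuchElementException`, whereas `find_elements*` simply returns an empty list. The following small sketch relies on that difference to look up an optional element without a `try`/`except`. The helper name `find_first` is hypothetical. `by` is a locator strategy such as `By.LINK_TEXT`, whose string value is `"link text"`.

```python
def find_first(browser, by, value):
    """Return the first element matching (by, value), or None.

    Uses find_elements(), which returns [] when nothing matches, instead
    of find_element(), which raises NoSuchElementException.
    """
    matches = browser.find_elements(by, value)
    return matches[0] if matches else None
```

For example, `find_first(browser, By.LINK_TEXT, 'Personer')` returns `None` instead of raising when the page has no such link. If an implicit wait is set, the call still waits for that timeout before giving up.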
  


### Clicking the Page

`WebElement` objects returned from the `find_element_*` and `find_elements_*` methods have a `click()` method that simulates a mouse click on that element. This method can be used to follow a link, make a selection on a radio button, click a Submit button, or trigger whatever else might happen when the element is clicked by the mouse.

```python
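from selenium import webdriver
from selenium.webdriver.common.by import By

base_url = 'http://www.krak.dk'
browser = webdriver.Firefox()
browser.get(base_url)
# Let find_* calls keep looking for up to 3 seconds before giving up
browser.implicitly_wait(3)

# Older Selenium versions: browser.find_elements_by_link_text('Personer')
link_to_persons = browser.find_elements(By.LINK_TEXT, 'Personer')
link_to_persons[0].click()  # follow the first link labelled 'Personer'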
```
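Clicking a link loads a new page. `WebElement` objects found on the old page then typically go stale, and Selenium raises `StaleElementReferenceException` if you keep using them. When several links should be followed, one common approach is to read their `href` attributes first and then visit each URL with `browser.get()`. The following is a sketch under that assumption. `visit_all_links` and the `handle_page` callback are hypothetical names. `"partial link text"` is the string value of `By.PARTIAL_LINK_TEXT`.

```python
PARTIAL_LINK_TEXT = "partial link text"  # value of Selenium's By.PARTIAL_LINK_TEXT


def visit_all_links(browser, text, handle_page):
    """Open every link whose text contains `text` and return the results
    of calling handle_page(browser) on each target page."""
    # Read the hrefs up front: the elements themselves go stale once
    # the browser navigates away from the current page.
    hrefs = [link.get_attribute("href")
             for link in browser.find_elements(PARTIAL_LINK_TEXT, text)]
    results = []
    for href in hrefs:
        if href:  # skip <a> elements without an href attribute
            browser.get(href)
            results.append(handle_page(browser))
    return results
```

For example, `visit_all_links(browser, 'Hartmann', lambda b: b.page_source)` would collect the HTML of every page linked from a result list.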


### Filling Out and Submitting Forms
Sending keystrokes to text fields on a web page is a matter of finding the `<input>` or `<textarea>` element for that text field and then calling the `send_keys()` method. 


```python
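from selenium import webdriver
from selenium.webdriver.common.by import By

base_url = 'http://www.krak.dk'
# Window-less alternative (PhantomJS is no longer supported by Selenium):
# options = webdriver.FirefoxOptions(); options.add_argument('-headless')
# browser = webdriver.Firefox(options=options)
browser = webdriver.Firefox()
browser.get(base_url)  # load the page before looking for the search field
browser.implicitly_wait(3)

search_field = browser.find_element(By.NAME, 'searchQuery')
search_field.send_keys('Møller')
search_field.submit()  # submit the form that contains the search field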
```
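For forms with several fields, the same find-then-`send_keys()` pattern can be repeated for each field. The following sketch assumes that each field can be located by its `name` attribute. The helper `fill_form` is hypothetical. `clear()` removes any text already in a field. Calling `submit()` on an element inside a form submits that form. `"name"` is the string value of `By.NAME`.

```python
NAME = "name"  # value of Selenium's By.NAME


def fill_form(browser, values, submit=True):
    """Type each value into the field whose name attribute matches its key.

    `values` maps name attributes to the text to enter, e.g.
    {"searchQuery": "Møller"}. Returns the last field filled (or None).
    """
    field = None
    for field_name, text in values.items():
        field = browser.find_element(NAME, field_name)
        field.clear()          # remove any pre-filled text
        field.send_keys(text)  # type the new value
    if submit and field is not None:
        field.submit()  # submits the form that contains the last field
    return field
```

The search above then becomes `fill_form(browser, {'searchQuery': 'Møller'})`, assuming the field name on the page is unchanged.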

## Class exercise
Find a web site to interact with and fill out a form to get some information back.  
Examples could be https://www.jobindex.dk/,    
https://google.com or   
https://www.ikea.com/dk/da/