In [None]:
import selenium
print(selenium.__version__)

# Selenium
When a web application uses JavaScript to load content, the HTML that the server sends does not contain all the data we need, so Beautiful Soup alone is not enough. We need a framework that can interact with the application by:
1. finding buttons
2. clicking buttons
3. filling out and submitting forms
4. extracting lists of images/links/divs etc.

### What is Selenium?

> Selenium automates browsers. That's it! What you do with that power is entirely up to you. Primarily, it is for automating web applications for testing purposes, but is certainly not limited to just that. Boring web-based administration tasks can (and should!) also be automated as well.
http://docs.seleniumhq.org



## Controlling the Browser with the `selenium` Module

The `selenium` module lets Python directly control the browser by programmatically clicking links and filling in login information, almost as though a human user were interacting with the page. Selenium lets you interact with web pages in a much more advanced way than Requests and Beautiful Soup, but because it launches a web browser, it is slower and harder to run in the background if, say, you only need to download some files from the Web.


### Starting a Selenium-Controlled Browser

```python
from selenium import webdriver


browser = webdriver.Firefox()
browser.get('http://www.krak.dk')
```

In [None]:
# Example: go to www.cphbusiness.dk and find all the "Erhvervsakademiuddannelser" that are available.
# Selenium
from selenium import webdriver
from selenium.webdriver.common.by import By
import bs4
import json

url = 'https://www.cphbusiness.dk'

def cphbusiness_interaction():
    browser = webdriver.Firefox()
    try:
        browser.get(url)
        # wait up to 2 seconds for elements to appear before find_element* gives up
        browser.implicitly_wait(2)
        # decline the cookie banner
        browser.find_element(By.ID, 'declineButton').click()
        # navigate through the menu to the education overview
        # (absolute XPaths are brittle and break when the site layout changes)
        browser.find_element(By.XPATH, '/html/body/header/div[2]/div[4]/div/nav/ul/li[1]/a').click()
        browser.find_element(By.XPATH, '/html/body/main/div[1]/div/div[1]/div/a').click()
        # each education is shown as a tile with an arrow link
        edu_buttons = browser.find_elements(By.CSS_SELECTOR, 'p.tile__link.tile__link--small a.icon-arrow-after')
        educations = [b.text for b in edu_buttons]
        return educations, browser.page_source
    finally:
        # close the browser once the page source has been read
        browser.quit()

def find_elements(page, selector):
    # parse a page source with Beautiful Soup and return the elements matching a CSS selector
    soup = bs4.BeautifulSoup(page, 'html.parser')
    return soup.select(selector)

def print_page(page, file_path):
    # despite its name, this writes the page to a file as a JSON string
    with open(file_path, 'w', encoding='utf-8') as f:
        f.write(json.dumps(page))
        

In [None]:
educations,source = cphbusiness_interaction()
print(educations)
print()
print()
print(source)
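Once Selenium has produced the rendered `page_source`, the rest of the work is ordinary Beautiful Soup parsing and no longer needs the browser. The sketch below applies the same CSS selector as the cell above to a hypothetical HTML snippet shaped like those tiles; the real cphbusiness.dk markup may differ, and `education_names` and `sample_html` are illustrative names.

```python
import bs4

# Hypothetical snippet shaped like the education tiles; not copied from the real site.
sample_html = """
<div class="tile">
  <p class="tile__link tile__link--small"><a class="icon-arrow-after" href="/a">Datamatiker</a></p>
</div>
<div class="tile">
  <p class="tile__link tile__link--small"><a class="icon-arrow-after" href="/b">Markedsføringsøkonom</a></p>
</div>
<p class="tile__link"><a href="/c">Not a small tile</a></p>
"""


def education_names(page_source):
    """Return the link texts of all small tiles in a page source."""
    soup = bs4.BeautifulSoup(page_source, 'html.parser')
    # same selector as the Selenium version; the last <p> lacks tile__link--small, so it is skipped
    links = soup.select('p.tile__link.tile__link--small a.icon-arrow-after')
    return [a.get_text(strip=True) for a in links]


print(education_names(sample_html))  # ['Datamatiker', 'Markedsføringsøkonom']
```

With the real page, `education_names(source)` works on the `source` returned by `cphbusiness_interaction()`.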

### Finding Elements on the Page

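WebDriver objects have quite a few methods for finding elements on a page. They are divided into the `find_element_*` and `find_elements_*` methods. The `find_element_*` methods return a single `WebElement` object, representing the first element on the page that matches your query, and raise `NoSuchElementException` if nothing matches. The `find_elements_*` methods return a list of `WebElement` objects for every matching element on the page, or an empty list if nothing matches. Some common methods for finding multiple elements are listed below. Note that Selenium 4.3 removed these `find_elements_by_*` helpers; in current versions, write `browser.find_elements(By.CLASS_NAME, name)`, `browser.find_elements(By.CSS_SELECTOR, selector)` and so on, after `from selenium.webdriver.common.by import By`.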


  * `browser.find_elements_by_class_name(name)` ... finds elements that use the CSS class `name`
  * `browser.find_elements_by_css_selector(selector)` ... finds elements that match the CSS `selector`
  * `browser.find_elements_by_id(id)` ... finds elements with a matching id attribute value
  * `browser.find_elements_by_link_text(text)` ... finds `<a>` elements that completely match the text provided
  * `browser.find_elements_by_partial_link_text(text)` ... finds `<a>` elements that contain the text provided
  * `browser.find_elements_by_name(name)` ... finds elements with a matching name attribute value
  * `browser.find_elements_by_tag_name(tagname)` ... finds elements with a matching tag name (case insensitive; an `<a>` element is matched by 'a' and 'A')
  
For more information on finding elements on a page, see http://selenium-python.readthedocs.io/locating-elements.html#
  


### Clicking the Page

`WebElement` objects returned from the `find_element_*` and `find_elements_*` methods have a `click()` method that simulates a mouse click on that element. This method can be used to follow a link, make a selection on a radio button, click a Submit button, or trigger whatever else might happen when the element is clicked by the mouse.

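```python
from selenium import webdriver
from selenium.webdriver.common.by import By

base_url = 'http://www.krak.dk'
browser = webdriver.Firefox()
browser.get(base_url)
browser.implicitly_wait(3)

# find_elements returns a list; click the first link whose text is exactly 'Personer'
link_to_persons = browser.find_elements(By.LINK_TEXT, 'Personer')
link_to_persons[0].click()
```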


### Filling Out and Submitting Forms
Sending keystrokes to text fields on a web page is a matter of finding the `<input>` or `<textarea>` element for that text field and then calling the `send_keys()` method. 


```python
from selenium import webdriver
from selenium.webdriver.common.by import By

base_url = 'http://www.krak.dk'
# PhantomJS, once used for GUI-less runs, is no longer maintained; use headless Firefox instead
browser = webdriver.Firefox()
browser.get(base_url)  # open the page before looking for the search field
browser.implicitly_wait(3)

search_field = browser.find_element(By.NAME, 'searchQuery')
search_field.send_keys('Møller')
search_field.submit()  # submits the form that contains the field
```

## Automatically Finding Names, Addresses and Numbers

When you run the script cell below, it opens a Firefox window, clicks the cookie approval button, enters a search string (*"Møller"*), clicks the *"Personer"* link to restrict the search to persons, and finally parses the HTML source of the result page into a list of entries with name, street, city and phone number.

## Headless mode in modules
When Selenium runs from `.py` module files inside a Docker container, there is no GUI. Therefore we run the browser in headless mode, which executes Selenium without any graphical output. See the example in `modules/selenium_krak.py`.


In [1]:
# brushed up from modules/selenium_krak.py
import bs4
from time import sleep
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import ElementClickInterceptedException, NoSuchElementException

def get_info(name):
    base_url = 'https://www.krak.dk/'

    browser = webdriver.Firefox()
    try:
        browser.get(base_url)
        browser.implicitly_wait(3)

        # Cookie approval popup: wait up to 20 seconds for the button to become clickable, then click it
        WebDriverWait(browser, 20).until(EC.element_to_be_clickable(
            (By.XPATH, '/html/body/div[1]/div/div/div/div[2]/div/button[1]'))).click()

        search_field = browser.find_element(By.NAME, 'searchQuery')
        search_field.send_keys(name)
        search_field.submit()

        # give the result page time to load
        sleep(3)

        link_to_persons = browser.find_elements(By.PARTIAL_LINK_TEXT, 'Personer')[0]

        # an overlay element was obscuring the Persons button
        try:
            link_to_persons.click()
        except ElementClickInterceptedException:
            try:
                element = browser.find_element(By.XPATH, "//div[@class='qc-cmp-ui-container qc-cmp-showing']")
                # hide the overlay with JavaScript so the link underneath can be clicked
                browser.execute_script("arguments[0].style.visibility='hidden'", element)
                # wait for the JavaScript above to run
                sleep(3)
                link_to_persons.click()
            except NoSuchElementException:
                print('no such element')
        # wait for the AJAX call that loads the persons
        sleep(3)

        page_source = browser.page_source
    finally:
        browser.quit()

    soup = bs4.BeautifulSoup(page_source, 'html.parser')
    event_cells = soup.find_all('div', {'class': 'topBlock'})
    entries_str = []
    for e in event_cells:
        # each result block holds name, street, city and phone number
        person_name = e.select('div h3 a')[0].text
        street = e.select('div>span:nth-child(1)')[0].text
        city = e.select('div>span:nth-child(2)')[0].text
        phone = e.select('span>span>div')[0].text
        entries_str.append('{}\n{}\n{}\n{}\n'.format(person_name, street, city, phone))
    return entries_str

def save_to_file(content, out_path='../data/selenium_krak_output.txt'):
    with open(out_path, 'w', encoding='utf-8') as f:
        f.write(content)

entries = get_info('Møller')
# save_to_file('\n\n'.join(entries))



In [None]:
entries

## Class exercise
Find a web site to interact with and fill out a form to get some information back.  
Examples could be https://www.jobindex.dk/,    
https://google.com or   
https://www.ikea.com/dk/da/