# Selenium

**Table of Contents**

1. [Webdriver](#Webdriver)
2. [The use case](#The-use-case)
    1. [Access application form](#Access-application-form)
    2. [Fill in the fields](#Fill-in-the-fields)
    3. [Solve captcha](#Solve-captcha)
    4. [Choose option](#Choose-option)
3. [Modularization is key!](#Modularization-is-key!)
4. [HAPPY CRAWLING!!](#HAPPY-CRAWLING!!)

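`selenium` is a Python library that drives a real web browser from code, so a script can browse the Internet automatically. The cells below use the Selenium 3 API (`find_element_by_*`); Selenium 4.3 and later removed those methods in favor of `find_element(By.<strategy>, value)`.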

In [None]:
!pip install selenium

In [1]:
from selenium import webdriver

A robot is very fast. We need to slow it down, because web pages take time to load and respond.

In [2]:
from time import sleep

In [3]:
print("hola")
sleep(3)
print("adios")

hola
adios
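A fixed `sleep` is either too short (the page is not ready yet) or too long (wasted time). As a rough sketch of the alternative, the helper below polls a condition until it holds or a timeout expires. The names `wait_until` and `page_ready` are hypothetical and only stand in for a real "is the page ready?" check; Selenium itself ships a `WebDriverWait` class built on the same idea.

```python
import time


def wait_until(condition, timeout=10.0, poll=0.5):
    """Poll `condition()` until it is truthy or `timeout` seconds have passed.

    Returns True if the condition was met, False if the wait timed out.
    """
    deadline = time.monotonic() + timeout
    while True:
        if condition():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll)


# Stand-in for a real readiness check: it becomes true on the third call.
checks = {"count": 0}


def page_ready():
    checks["count"] += 1
    return checks["count"] >= 3


result = wait_until(page_ready, timeout=2.0, poll=0.01)
print(result, checks["count"])
```

The wait ends as soon as the condition holds, so a fast page costs almost nothing and a slow one still gets the full timeout.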


## Webdriver

Python needs a Chrome **webdriver** (ChromeDriver) to control the Chrome browser: [download link](https://chromedriver.chromium.org/downloads)

Let's initialize the robot and play with it

In [148]:
driver = webdriver.Chrome("./chromedriver")

Find something on 20minutos

In [149]:
driver.get("https://www.20minutos.es")
sleep(1)

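To find an element by tag and class, use the CSS syntax `tag.class`. A class attribute that contains spaces is really a list of classes, so join them with `.`: `<i class="fal fa-search">` becomes `i.fal.fa-search`.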

[CSS selectors documentation](https://www.w3schools.com/cssref/css_selectors.asp)
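The rule for turning a tag and its class attribute into a selector is mechanical, so it can be written down as a tiny helper. This is only an illustrative sketch; `css_selector` is a made-up name, not part of Selenium.

```python
def css_selector(tag, class_attr):
    """Build a `tag.class1.class2` selector from a tag name and a raw class attribute."""
    classes = class_attr.split()  # "fal fa-search" -> ["fal", "fa-search"]
    return ".".join([tag, *classes])


selector = css_selector("i", "fal fa-search")
print(selector)  # i.fal.fa-search
```

The resulting string is what gets passed to the CSS selector lookup in the next cell.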

In [48]:
# click on Search button
boton = driver.find_element_by_css_selector("i.fal.fa-search")

In [49]:
boton

<selenium.webdriver.remote.webelement.WebElement (session="db6021275489269892eefe204ec71e0a", element="be9600b6-37fa-418e-baa1-16021dcbb8d4")>

In [50]:
boton.click()

In [51]:
# write my search text
driver.find_element_by_css_selector("input[name='q']").send_keys("Pedro")
sleep(1)

In [46]:
# click again on search button
driver.find_elements_by_css_selector("i.fal.fa-search")[1].click()

Search for something on Google

In [53]:
driver.get("https://www.google.com")

In [54]:
buscador = driver.find_element_by_class_name("gLFyf.gsfi")

In [56]:
buscador.send_keys("ironhack")

In [57]:
from selenium.webdriver.common.keys import Keys

In [58]:
buscador.send_keys(Keys.ENTER)

## The use case

I need an appointment at Seguridad Social

In [150]:
url = "https://w6.seg-social.es/ProsaInternetAnonimo/OnlineAccess?ARQ.SPM.ACTION=LOGIN&ARQ.SPM.APPTYPE=SERVICE&ARQ.IDAPP=XV106001"

### Access application form

In [151]:
driver.get(url)

 * Web scraping lets us access the static HTML of the webpage
 * Web **crawling** lets us interact dynamically with the browser!
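To make the first bullet concrete: static HTML can be read without any browser at all. The sketch below uses the standard library's `html.parser` on a made-up HTML fragment (it only borrows field names that appear later in this notebook) and lists the named inputs. What it cannot do is click or type, which is where the browser automation comes in.

```python
from html.parser import HTMLParser


class InputNameCollector(HTMLParser):
    """Collect the `name` attribute of every <input> tag in a document."""

    def __init__(self):
        super().__init__()
        self.names = []

    def handle_starttag(self, tag, attrs):
        if tag == "input":
            name = dict(attrs).get("name")
            if name:
                self.names.append(name)


# Hypothetical fragment, not the real page source.
html = """
<form>
  <input name="nombreApellidos">
  <input name="telefono">
  <input type="submit" value="INSS">
</form>
"""

collector = InputNameCollector()
collector.feed(html)
print(collector.names)  # ['nombreApellidos', 'telefono']
```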

In [152]:
driver.find_element_by_css_selector("input[value='INSS']").click()

### Fill in the fields

In [153]:
user_data = {
    "name": "Juanito López Ramírez",
    "dni": "12121211G",
    "telf": "666577794",
    "mail": "juanito99@gmail.com",
    "ciudad": "malaga"
}

Fill in the form with your information using `.send_keys()`

Let's wrap all of those calls in one function

In [95]:
def fill_fields(user_data):
    driver.find_element_by_name("nombreApellidos").send_keys(user_data.get("name"))
    driver.find_element_by_id("tipo").send_keys("NIF")
    driver.find_element_by_name("numeroDocumento").send_keys(user_data.get("dni"))
    driver.find_element_by_name("telefono").send_keys(user_data.get("telf"))
    driver.find_element_by_name("eMail").send_keys(user_data.get("mail"))
    driver.find_element_by_id("radioProvincia").click()
    driver.find_element_by_id("provincia").send_keys(user_data.get("ciudad"))

In [154]:
fill_fields(user_data)
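`user_data.get(...)` returns `None` for a missing key, so a typo in the dictionary would surface only once the browser call is made. One possible way to catch that earlier, sketched here with the hypothetical names `FIELD_MAP` and `form_values`, is to describe the text fields as data and validate the dictionary before touching the page. The field names are the ones used in `fill_fields`.

```python
# Form field name -> key in user_data (text fields only).
FIELD_MAP = {
    "nombreApellidos": "name",
    "numeroDocumento": "dni",
    "telefono": "telf",
    "eMail": "mail",
}


def form_values(user_data, field_map=FIELD_MAP):
    """Return {field name: value}, raising ValueError if any value is missing or empty."""
    missing = [key for key in field_map.values() if not user_data.get(key)]
    if missing:
        raise ValueError(f"user_data is missing: {', '.join(missing)}")
    return {field: user_data[key] for field, key in field_map.items()}


sample = {
    "name": "Juanito López Ramírez",
    "dni": "12121211G",
    "telf": "666577794",
    "mail": "juanito99@gmail.com",
}
values = form_values(sample)
print(values["telefono"])  # 666577794
```

Each `(field, value)` pair could then be sent to the element with that name, in a loop, instead of one hand-written line per field.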

In [155]:
# TODO: handle appointments by postal code

### Solve captcha

`.text` is used to access a tag's textual content

In [156]:
import random

In [157]:
words = driver.find_element_by_css_selector("p.p0").text.split(": ")[:-1]

In [130]:
words

['Persianas', 'Basurero', 'Veinte', 'Martillo', 'Barcelona']
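The `split(": ")[:-1]` step is easy to try without a browser. The prompt string below is invented for illustration, since the real wording of the page is not shown in this notebook. It only assumes, as the code above does, that the candidate words are separated by `": "` and that the final piece is not a candidate.

```python
# Hypothetical prompt text; the real page may be worded differently.
prompt = "Persianas: Basurero: Veinte: Martillo: Barcelona: (rest of the prompt)"

pieces = prompt.split(": ")  # every chunk between ": " separators
words = pieces[:-1]          # keep all chunks except the last one

print(pieces[-1])  # (rest of the prompt)
print(words)       # ['Persianas', 'Basurero', 'Veinte', 'Martillo', 'Barcelona']
```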

In [132]:
word = random.choice(words)

In [133]:
driver.find_element_by_id("ARQ.CAPTCHA").send_keys(word)

In [134]:
driver.find_element_by_id("SPM.ACC.SIGUIENTE").click()

First, a helper that tells us whether the captcha was accepted: it looks for the error message on the page

In [158]:
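from selenium.common.exceptions import NoSuchElementException

def we_passed():
    """Return True when the page shows no captcha error message."""
    try:
        driver.find_element_by_css_selector("li.mensajeError")
        return False
    except NoSuchElementException:
        # only a missing error message counts as success; other errors propagate
        return True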

Build a while loop to make sure we passed

In [159]:
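def solve_captcha():
    """Guess the captcha word at random until the error message is gone."""
    while True:
        print("\n INTENTO")
        # split the prompt text on ": " and drop the last piece
        words = driver.find_element_by_css_selector("p.p0").text.split(": ")[:-1]
        print(words)
        # pick one candidate at random and submit it
        word = random.choice(words)
        print(word)
        driver.find_element_by_id("ARQ.CAPTCHA").send_keys(word)
        driver.find_element_by_id("SPM.ACC.SIGUIENTE").click()

        # give the page time to respond before checking the result
        sleep(2)

        if we_passed():
            break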

In [160]:
solve_captcha()


 INTENTO
['Diecisiete', 'Asistenta', 'Martillo', 'Corona', 'Gris']
Diecisiete
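A `while True` loop never gives up, which is risky if the page changes or the site starts rejecting requests. A bounded variant could look like the sketch below. `retry`, `fake_attempt` and `fake_check` are hypothetical names, and the fake functions only stand in for "submit one guess" and `we_passed`.

```python
import time


def retry(attempt, succeeded, max_attempts=10, delay=2.0):
    """Call `attempt()` and then `succeeded()` up to `max_attempts` times.

    Returns the number of attempts used; raises RuntimeError if none passed.
    """
    for n in range(1, max_attempts + 1):
        attempt()
        time.sleep(delay)  # let the page react before checking
        if succeeded():
            return n
    raise RuntimeError(f"no success after {max_attempts} attempts")


# Stand-ins for the browser actions: the third attempt "passes".
state = {"tries": 0}


def fake_attempt():
    state["tries"] += 1


def fake_check():
    return state["tries"] >= 3


used = retry(fake_attempt, fake_check, max_attempts=5, delay=0)
print(used)  # 3
```

In this notebook, `attempt` would be one pass of the loop body (read the words, type a guess, click) and `succeeded` would be `we_passed`.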


### Choose option

Again, a helper that checks whether this step passed, followed by a loop that retries until it does

In [146]:
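from selenium.common.exceptions import NoSuchElementException

def we_passed_second_step():
    """Return True when the message element of this step is not on the page."""
    try:
        driver.find_element_by_css_selector("li.mensajeCpmsTam3")
        return False
    except NoSuchElementException:
        # only a missing message counts as success; other errors propagate
        return True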

In [162]:
def choose_option():
    while True:
        sleep(2)
        driver.find_element_by_id("335").click()
        driver.find_element_by_id("SPM.ACC.CONTINUAR_TRAS_SELECCIONAR_SERVICIO").click()
        print("otra")
        if we_passed_second_step():
            print("passed")
            break

In [None]:
choose_option()

## Modularization is key!

Since each action has its own function, we just chain them in a new function!

In [164]:
driver = webdriver.Chrome("./chromedriver")

In [176]:
def run_process():
    url = "https://w6.seg-social.es/ProsaInternetAnonimo/OnlineAccess?ARQ.SPM.ACTION=LOGIN&ARQ.SPM.APPTYPE=SERVICE&ARQ.IDAPP=XV106001"
    # access webpage
    driver.get(url)
    driver.find_element_by_css_selector("input[value='INSS']").click()
    sleep(2)

    # fill fields
    print("Filling fields")
    fill_fields(user_data)
    
    # solve captcha
    print("Solving captcha")
    solve_captcha()
    sleep(2)
    
    # choose option
    print("Choosing option")
    choose_option()
    
    play_beep(5)

In [177]:
run_process()

Filling fields
Solving captcha

 INTENTO
['Dos', 'Pórtico', 'Melón', 'Grifo', 'Veinticuatro']
Grifo

 INTENTO
['Pomelo', 'Plutón', 'Verde', 'Nueve', 'Helicóptero']
Verde
Choosing option
otra
passed


You can set up an alarm for when the process finishes, using the `pygame` library

In [170]:
import pygame

pygame 2.0.0 (SDL 2.0.12, python 3.8.5)
Hello from the pygame community. https://www.pygame.org/contribute.html


In [175]:
def play_beep(n=10):
    pygame.mixer.init()
    pygame.mixer.music.load("./beep.wav")
    
    for _ in range(n):
        pygame.mixer.music.play()
        sleep(1)

The program would work like...

In [None]:
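# run_process() blocks until the last step passes and then calls play_beep(5) itself,
# so running it is enough to hear the alarm
run_process()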
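For the caller to decide what happens after the process (for example, whether to sound the alarm), the process has to report whether it finished. A minimal sketch of that idea follows, with stand-in steps instead of the real browser functions. `run_steps` and `failing` are hypothetical names.

```python
def run_steps(steps):
    """Run (label, function) pairs in order; return True only if every step finishes."""
    for label, step in steps:
        print(label)
        try:
            step()
        except Exception as error:
            # stop at the first failing step and report it to the caller
            print(f"{label} failed: {error}")
            return False
    return True


log = []


def failing():
    raise RuntimeError("element not found")


ok = run_steps([
    ("Filling fields", lambda: log.append("fill")),
    ("Solving captcha", lambda: log.append("captcha")),
])
bad = run_steps([
    ("Choosing option", failing),
    ("Never reached", lambda: log.append("unreachable")),
])
print(ok, bad)  # True False
```

With a runner like this, the alarm call can sit in the caller, behind an `if` on the returned value, rather than inside the process itself.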

## HAPPY CRAWLING!!

Be aware of each site's `robots.txt`, the file where a website states which paths automated clients may visit:  
https://amazon.com/robots.txt  
https://google.com/robots.txt
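
Python's standard library can read these files through `urllib.robotparser`. The sketch below parses a small made-up rule set from a string, so it needs no network access. The user agent `my-bot` and the `example.com` URLs are placeholders.

```python
from urllib.robotparser import RobotFileParser

# Made-up rules for illustration; a real file would be fetched from <site>/robots.txt.
rules = """\
User-agent: *
Disallow: /private/
Allow: /
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

blocked = parser.can_fetch("my-bot", "https://example.com/private/data")
allowed = parser.can_fetch("my-bot", "https://example.com/public")
print(blocked, allowed)  # False True
```

For a live site, `set_url(...)` followed by `read()` downloads the real file before `can_fetch` is called.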