# Web scraping

# First example
## Importing the modules

In [1]:
from selenium import webdriver
from selenium.webdriver.common.keys import Keys #used to simulate some keyboard keys (Alt, Tab, etc.)
from selenium.webdriver.common.by import By #used to locate elements on website

## Accessing a URL
We access a URL with the `get()` method. 

In [2]:
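from selenium.webdriver.chrome.service import Service #Selenium 4 way of pointing to the chromedriver binary (executable_path is deprecated)
import os
driver = webdriver.Chrome(service=Service(os.path.expanduser("~/Desktop/proj/chromedriver"))) #creates a chrome web driver instance; expanduser resolves the "~"
driver.get("http://www.python.org") #navigate to this URL, wait until it is loaded
assert "Python" in driver.title #check that "Python" appears in the page title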


## Getting things done

In [None]:
elem = driver.find_element(By.NAME, "q") #finds the search box from its name attribute
elem.clear() #we clear any text already in the input element
elem.send_keys("pycon") #then we type our query
elem.send_keys(Keys.RETURN) #and press Enter to run the search
assert "No results found." not in driver.page_source #to be sure that something is found

In [3]:
driver.quit() #When our operations are over, we shut down the browser and the driver. close() would only close the current window (here the only one).

## Finding elements

-  To locate an element such as `<input type="text" name="passwd" id="passwd-id" />`, we can use any of these commands:

``` 
element = driver.find_element(By.ID, "passwd-id")
element = driver.find_element(By.NAME, "passwd")
element = driver.find_element(By.XPATH, "//input[@id='passwd-id']")
element = driver.find_element(By.CSS_SELECTOR, "input#passwd-id")
```
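Selenium needs a running browser, so as an offline illustration here is the same lookup done on a static copy of that element with Python's standard `xml.etree.ElementTree`. It understands a small subset of XPath (attribute predicates such as `[@id='...']`, but not CSS selectors). The surrounding `<form>` wrapper is an assumption added only to have a root element.

```python
import xml.etree.ElementTree as ET

# Static stand-in for the page: the input from the example above, wrapped in a form.
# ElementTree parses XML, so the snippet must be well-formed (self-closing <input/>).
page = ET.fromstring(
    '<form><input type="text" name="passwd" id="passwd-id" /></form>'
)

# Counterparts of By.ID and By.NAME: an attribute predicate on any descendant.
by_id = page.find(".//*[@id='passwd-id']")
by_name = page.find(".//*[@name='passwd']")

# Counterpart of the By.XPATH example: tag name plus attribute predicate.
by_xpath = page.find(".//input[@id='passwd-id']")

print(by_id is by_name is by_xpath)  # True: all three find the same element
```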

## Interacting with elements
-  One can type text into an element:
`element.send_keys("some text")`

-  Text sent with `send_keys()` is appended to whatever the field already contains; it is not cleared automatically. To start from an empty field, call the `clear()` method first.

- Special keys from the `Keys` class can be sent along with text, for example to move down an autocomplete list:
```
element.send_keys(" and some", Keys.ARROW_DOWN)
```
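The append-then-clear behaviour can be wrapped in a small helper. `fill_field` and `FakeInput` are hypothetical names: the helper only relies on the `clear()` and `send_keys()` methods that a Selenium `WebElement` exposes, and the fake class exists only so the sketch runs without a browser.

```python
class FakeInput:
    """Hypothetical offline stand-in for a text field; a real Selenium
    WebElement exposes the same clear() and send_keys() methods."""

    def __init__(self, value=""):
        self.value = value

    def clear(self):
        self.value = ""

    def send_keys(self, *parts):
        # Like Selenium, typed text is appended to the current content.
        self.value += "".join(parts)


def fill_field(element, text):
    """Hypothetical helper: replace the field's content instead of appending to it."""
    element.clear()
    element.send_keys(text)


field = FakeInput("old text")
field.send_keys(" and more")
print(field.value)  # old text and more
fill_field(field, "new text")
print(field.value)  # new text
```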

## About forms
-  Beyond text inputs, one can interact with the options of a `<select>` element, for example:
```
element = driver.find_element(By.XPATH, "//select[@name='name']")
all_options = element.find_elements(By.TAG_NAME, "option")
for option in all_options:
    print("Value is: %s" % option.get_attribute("value"))
    option.click()
```
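-  One can deselect all the options of a multi-select element with the `Select` class:
```
from selenium.webdriver.support.ui import Select
select = Select(driver.find_element(By.ID, 'id'))
select.deselect_all()
```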

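-  The `Select` class from the `selenium.webdriver.support.ui` package provides helper methods for `<select>` elements:
```
#A few examples
from selenium.webdriver.support.ui import Select
select = Select(driver.find_element(By.NAME, 'name'))
select.select_by_index(0) #first option
select.select_by_visible_text("text") #option whose displayed text is "text"
select.select_by_value("value") #option whose value attribute is "value"
```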
-  All the options that are currently selected are available through the `all_selected_options` property:
```
select = Select(driver.find_element(By.XPATH, "//select[@name='name']"))
all_selected_options = select.all_selected_options
```

Conversely, all the options of a `Select` object can be deselected (this only works on multi-select elements; otherwise Selenium raises `NotImplementedError`):
```
select = Select(driver.find_element(By.ID, 'selector'))
select.deselect_all()
```
The list of all the options, selected or not, is available via `options = select.options`.
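To make the difference between `options` and `all_selected_options` concrete without a browser, here is a sketch on a hypothetical static `<select>` parsed with `xml.etree.ElementTree`. One assumption to keep in mind: the static `selected` attribute only reflects the initial state of the page, whereas Selenium's `all_selected_options` reports what is selected at the moment it is called.

```python
import xml.etree.ElementTree as ET

# Hypothetical multi-select with two options pre-selected.
select_el = ET.fromstring(
    '<select name="name" multiple="multiple">'
    '<option value="a">A</option>'
    '<option value="b" selected="selected">B</option>'
    '<option value="c" selected="selected">C</option>'
    '</select>'
)

# Counterpart of select.options: every <option>, selected or not.
options = [opt.get("value") for opt in select_el.findall("option")]

# Counterpart of select.all_selected_options: only the (initially) selected ones.
selected = [opt.get("value") for opt in select_el.findall("option[@selected]")]

print(options)   # ['a', 'b', 'c']
print(selected)  # ['b', 'c']
```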

Finally, there are two ways to submit a form:

-  One can do it manually, assuming the submit button has been identified: `driver.find_element(By.ID, "submit").click()`

-  Or let Selenium find the form enclosing an element and submit it: `element.submit()`
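"Enclosing form" means the nearest `<form>` ancestor of the element. The sketch below only illustrates that idea on a hypothetical static page with `xml.etree.ElementTree`; it is not how Selenium implements `submit()`, which runs inside the browser. The ids `lydia-form` and `val1` are made up for the example.

```python
import xml.etree.ElementTree as ET

page = ET.fromstring(
    '<body>'
    '<form id="lydia-form"><div><input id="val1" type="text"/></div>'
    '<button id="submit" type="submit">Send</button></form>'
    '</body>'
)

# ElementTree elements do not know their parent, so build a child -> parent map.
parent_of = {child: parent for parent in page.iter() for child in parent}


def enclosing_form(element):
    """Walk up the tree until a <form> ancestor is found (None if there is none)."""
    node = parent_of.get(element)
    while node is not None and node.tag != "form":
        node = parent_of.get(node)
    return node


field = page.find(".//input[@id='val1']")
print(enclosing_form(field).get("id"))  # lydia-form
```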

# Form example

Let's try to automate filling in Lydia collection forms (hosted on collecte.io). We have two examples:

-  The form for a paintball competition: https://collecte.io/paintball-inter-assos-2091521/fr

-  The form for the famous escape week: https://collecte.io/shotgun-escape-week-1987144/fr

# Paintball

## Accessing the URL

In [9]:
from selenium import webdriver
from selenium.webdriver.common.keys import Keys #used to simulate some keyboard keys (Alt, Tab, etc.)
from selenium.webdriver.common.by import By #used to locate elements on website

In [14]:
from selenium.webdriver.chrome.service import Service
import os
driver = webdriver.Chrome(service=Service(os.path.expanduser("~/Desktop/proj/chromedriver"))) #creates a chrome web driver instance
driver.get("https://collecte.io/paintball-inter-assos-2091521/fr") #navigate to this URL, wait until it is loaded


The general expression to locate elements by XPath is: `//tagname[@attribute='value']`
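Building that expression by string concatenation breaks as soon as the value contains a quote (common in French labels such as "l'adresse"). XPath 1.0 has no escape character, so the usual workaround is to switch quote style or fall back to `concat()`. A minimal sketch; `xpath_literal` and `xpath_by_attribute` are hypothetical helpers, not part of Selenium:

```python
def xpath_literal(text):
    """Return `text` as an XPath 1.0 string literal.

    XPath 1.0 has no escape character, so a value containing both kinds of
    quotes has to be built with concat().
    """
    if "'" not in text:
        return f"'{text}'"
    if '"' not in text:
        return f'"{text}"'
    # Split on single quotes and re-insert each one as a "'" argument.
    pieces = text.split("'")
    parts = []
    for i, piece in enumerate(pieces):
        if i > 0:
            parts.append('"\'"')
        if piece:
            parts.append(f"'{piece}'")
    return "concat(" + ", ".join(parts) + ")"


def xpath_by_attribute(tag, attribute, value):
    """Build //tagname[@attribute='value'] with safe quoting."""
    return f"//{tag}[@{attribute}={xpath_literal(value)}]"


print(xpath_by_attribute("input", "id", "val1"))         # //input[@id='val1']
print(xpath_by_attribute("label", "title", "l'adresse"))  # //label[@title="l'adresse"]
```

The result can be passed directly to `driver.find_element(By.XPATH, ...)`.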

In [15]:
#Example with a non-generic method
paint_name = "//*[@id='val1']" #the id 'val1' is specific to this form, so this does not generalize
element = driver.find_element(By.XPATH, paint_name)


## Filling in all the inputs step by step

-  We'll first create the driver

In [92]:
from selenium.webdriver.chrome.service import Service
import os
driver = webdriver.Chrome(service=Service(os.path.expanduser("~/Desktop/proj/chromedriver"))) #creates a chrome web driver instance
driver.get("https://collecte.io/paintball-inter-assos-2091521/fr") #navigate to this URL, wait until it is loaded


-  Then, we collect all the `<label>` elements in a list

In [93]:
path = '//div/label'
labels = driver.find_elements(By.XPATH, path)

-  Then, we need a dictionary mapping (part of) each label's text to the value to type into the corresponding field

In [86]:
dic = {
    'Nom' : 'Caetano',
    'Prénom' : 'Hugo',
    'Numéro de téléphone' : '0619372524',
    'Adresse email' : 'hugocaetano78800@gmail.com',
    'Asso' : 'BDX'
}

In [87]:
def completer(dic, labels):
    """Fill in every field whose label text contains one of the keys of `dic`.

    Each label's `for` attribute holds the id of the input it describes, so the
    input is found by that id, cleared and filled with the matching value. Keys
    that match no label are simply ignored, so `dic` may hold more entries than
    the form needs: better too much than not enough. Uses the global `driver`.
    """
    for label in labels:
        field_id = label.get_attribute('for')
        if not field_id:
            continue #this label is not linked to an input, skip it
        for key, value in dic.items():
            if key in label.text: #case-sensitive substring match
                field = driver.find_element(By.ID, field_id) #the input described by this label
                field.clear() #we clear all potential text in the input element
                field.send_keys(value)
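The matching rule itself can be checked without a browser. In this sketch `plan_fill` is a hypothetical helper that takes labels as `(label_text, for_id)` pairs and returns which input id would receive which value; the sample labels and values are made up. Note that the match is a case-sensitive substring test, which is why the key "Nom" does not match the label "Prénom".

```python
def plan_fill(dic, labels):
    """Offline version of the completer matching rule (hypothetical helper).

    `labels` is a list of (label_text, for_id) pairs. Returns {input_id: value}
    for every label whose text contains a key of `dic`; when several keys match
    the same label, the last one in `dic` wins, since its value is typed last.
    """
    plan = {}
    for text, for_id in labels:
        if not for_id:
            continue  # label not linked to an input
        for key, value in dic.items():
            if key in text:
                plan[for_id] = value
    return plan


sample_labels = [("Nom *", "val1"), ("Prénom *", "val2"), ("Commentaire", "val3")]
sample_dic = {"Nom": "Doe", "Prénom": "Jane", "Ville": "Paris"}
print(plan_fill(sample_dic, sample_labels))  # {'val1': 'Doe', 'val2': 'Jane'}
```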

In [94]:
completer(dic, labels) 

All the wanted fields are now filled in. Let's **click the submit button.**

In [95]:
submit = driver.find_element(By.ID, 'submit-state-lydia') #finding the submit button
submit.click()

In [96]:
driver.close()

# Escape Week

## Accessing the URL

In [20]:
from selenium.webdriver.chrome.service import Service
import os
driver2 = webdriver.Chrome(service=Service(os.path.expanduser("~/Desktop/proj/chromedriver")))
driver2.get("https://collecte.io/shotgun-escape-week-1987144/fr")


## Getting things done

In [12]:
ew_nom = driver2.find_element(By.XPATH, "//label[text()='Nom']/parent::div//input") #the input inside the parent div of the label 'Nom'
ew_nom.send_keys("CAETANO")

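The XPath above goes from the label whose text is exactly "Nom" to its parent `<div>` and then to the `<input>` inside it, which does not depend on generated ids such as `val1`. Here is an offline sketch of the same navigation with `xml.etree.ElementTree`, on a hypothetical static copy of two fields (the real collecte.io markup may differ). ElementTree has no `parent::` axis, but the predicate `div[label='Nom']` (a `<div>` with a child `<label>` whose text is exactly "Nom") reaches the same element.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: each field is a <div> holding a <label> and an <input>.
form = ET.fromstring(
    '<form>'
    '<div><label>Nom</label><input id="a1" type="text"/></div>'
    '<div><label>Prénom</label><input id="a2" type="text"/></div>'
    '</form>'
)

# Select the <div> whose <label> child reads exactly 'Nom', then its <input>.
field = form.find(".//div[label='Nom']//input")
print(field.get("id"))  # a1
```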

## The end

In [24]:
driver2.close()

These small experiments were useful: I now feel ready to write the full script. I just need to run one last test.