# Web Scraping - Advanced Selenium

## Navigating using Selenium

Selenium can do much more than load a page: it can also scroll, click, and send keystrokes. Run the following cells one by one and observe the results.

Let's see how to scroll down a page using Selenium:

In [7]:
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.by import By
driver = webdriver.Chrome()

driver.get("http://www.python.org")

You are now on the official Python website. Let's scroll down to the bottom of the page:

In [None]:
driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
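
A single `scrollTo` call is enough for a static page, but some sites load more content each time you reach the bottom. As a rough sketch for that case, the hypothetical helper below (the name `scroll_to_bottom` and the `pause` and `max_rounds` parameters are assumptions for illustration, not part of Selenium) keeps scrolling until the page height stops growing. It only relies on the driver's `execute_script` method:

```python
import time


def scroll_to_bottom(driver, pause=1.0, max_rounds=10):
    """Scroll down repeatedly until the page height stops growing.

    `driver` is any object with an `execute_script` method, such as a
    Selenium WebDriver. `pause` and `max_rounds` are illustrative defaults.
    Returns the number of scrolls performed.
    """
    last_height = driver.execute_script("return document.body.scrollHeight")
    for round_number in range(1, max_rounds + 1):
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause)  # give newly loaded content time to render
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            return round_number  # nothing new was loaded, so we are at the bottom
        last_height = new_height
    return max_rounds  # gave up after max_rounds scrolls
```

With the driver from the previous cells, this could be called as `scroll_to_bottom(driver)`. The fixed `pause` is a simplification; the right value depends on how quickly the site loads new content.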

The next cell will look for the search bar and click it.

In [None]:
search_bar = driver.find_element(by=By.XPATH, value='//*[@id="id-search-field"]')
search_bar.click()

Now that you have clicked it, you can type text into the search bar:

In [None]:
search_bar.send_keys("method")

Once you have entered the text, you can press Enter:

In [None]:
search_bar.send_keys(Keys.RETURN)

Whenever you need to perform an action in Selenium, think about the steps you would take as a human user. If you can describe them in words, Selenium can probably do them; check the documentation or search online.

## Selenium Wait for an Element Feature

On many occasions, you will need to wait for an element to appear before you can scrape it. As mentioned earlier, many websites are dynamic, meaning that their full content is not available immediately after connecting. In that case, Selenium may try to find an element before the page has finished loading, and the scraper will fail if the element is not ready.

To solve this problem, you can tell Selenium to _wait_ until the element you want to scrape appears. For example, in the Zoopla challenge, the frame containing the "Accept Cookies" button does not appear immediately. That is why we added a `time.sleep(3)` after telling the driver to get that website:
```
    driver = webdriver.Chrome() 
    URL = "https://www.zoopla.co.uk/new-homes/property/london/?q=London&results_sort=newest_listings&search_source=new-homes&page_size=25&pn=1&view_type=list"
    driver.get(URL)
    time.sleep(3) 
    try:
        driver.switch_to.frame('gdpr-consent-notice') # This is the id of the frame
        accept_cookies_button = driver.find_element(by=By.XPATH, value='//*[@id="save"]')
        accept_cookies_button.click()
    ...
```

However, depending on the server and the user's connection, the number of seconds needed may vary. So, instead of using an arbitrary number like `3`, we might want to tell Selenium: "Wait until this frame shows up".

Selenium has many capabilities, and luckily, one of them allows us to implement this functionality. Let's take a look at how to do so:

First, let's import the libraries we need:

In [10]:
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from selenium.common.exceptions import TimeoutException
import time

Now, let's apply it to the code we had:

In [13]:
def load_and_accept_cookies() -> webdriver.Chrome:
    '''
    Open Zoopla and accept the cookies
    
    Returns
    -------
    driver: webdriver.Chrome
        This driver is already in the Zoopla webpage
    '''
    driver = webdriver.Chrome() 
    URL = "https://www.zoopla.co.uk/new-homes/property/london/?q=London&results_sort=newest_listings&search_source=new-homes&page_size=25&pn=1&view_type=list"
    driver.get(URL)
    delay = 10
    try:
        WebDriverWait(driver, delay).until(EC.presence_of_element_located((By.XPATH, '//*[@id="gdpr-consent-notice"]')))
        print("Frame Ready!")
        driver.switch_to.frame('gdpr-consent-notice')
        accept_cookies_button = WebDriverWait(driver, delay).until(EC.presence_of_element_located((By.XPATH, '//*[@id="save"]')))
        print("Accept Cookies Button Ready!")
        accept_cookies_button.click()
        time.sleep(1)
    except TimeoutException:
        print("Loading took too much time!")

    return driver 

So, what's happening here?

1. As always, we define the driver and tell it to visit the URL.
```
driver = webdriver.Chrome() 
URL = "https://www.zoopla.co.uk/new-homes/property/london/?q=London&results_sort=newest_listings&search_source=new-homes&page_size=25&pn=1&view_type=list"
driver.get(URL)
```
2. We set a variable named `delay`, which is the maximum time, in seconds, that we allow Selenium to wait.
```
delay = 10
```
3. Then, we use the `WebDriverWait` class to tell the driver to wait a maximum of 10 seconds. If the element matching the XPath `'//*[@id="gdpr-consent-notice"]'` (the frame) appears within those 10 seconds, Selenium stops waiting and continues running the code.
```
WebDriverWait(driver, delay).until(EC.presence_of_element_located((By.XPATH, '//*[@id="gdpr-consent-notice"]')))
```
4. If the element appears within 10 seconds, we continue with the regular code to click the button. This code has a second `WebDriverWait`, in case the button only appears some time after switching to the frame.
```
print("Frame Ready!")
driver.switch_to.frame('gdpr-consent-notice')
accept_cookies_button = WebDriverWait(driver, delay).until(EC.presence_of_element_located((By.XPATH, '//*[@id="save"]')))
print("Accept Cookies Button Ready!")
accept_cookies_button.click()
time.sleep(1)
```
5. However, if the element doesn't show up within 10 seconds, Selenium raises a `TimeoutException` and the `except` clause is triggered.
```
except TimeoutException:
    print("Loading took too much time!")
```
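
To build some intuition for the steps above, here is a simplified, pure-Python sketch of the idea behind an explicit wait: check a condition repeatedly until it succeeds or the time runs out. This is not Selenium's actual implementation. The names `wait_until` and `find_frame` are hypothetical, and the sketch raises Python's built-in `TimeoutError` rather than Selenium's `TimeoutException`:

```python
import time


def wait_until(condition, timeout=10.0, poll_interval=0.5):
    """Call `condition()` repeatedly until it returns a truthy value.

    Returns that value, or raises TimeoutError once `timeout` seconds pass.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result  # the "element" is ready, stop waiting
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Condition not met within {timeout} seconds")
        time.sleep(poll_interval)  # wait a little before checking again


# Simulate an element that only "appears" on the third check.
attempts = {"count": 0}


def find_frame():
    attempts["count"] += 1
    return "frame" if attempts["count"] >= 3 else None


frame = wait_until(find_frame, timeout=2.0, poll_interval=0.01)
print(frame, "found after", attempts["count"], "checks")
```

The key difference from `time.sleep(3)` is that the loop returns as soon as the condition is met, so a fast page is not slowed down and a slow page still gets up to the full timeout.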

Let's see how it works:

In [12]:
driver = load_and_accept_cookies()

Frame Ready!


Thanks to this, we no longer have to guess how many seconds to sleep: `delay` is only an upper bound, and the code continues as soon as the element appears.

## Key Takeaways

- Selenium has more advanced features such as scrolling through a webpage, waiting for certain conditions to occur, and sending keystrokes to the website
- To scroll through a website, we can run the JavaScript `window.scrollTo()` function through `driver.execute_script()`
- The `.send_keys()` method is used to send text or specific keystrokes (such as the Enter key) to an element on the website
- Selenium can also be instructed to wait until certain elements appear on a website. This can be achieved using the `WebDriverWait` class together with an expected condition such as `presence_of_element_located`
