# Scraping reviews using Selenium

Here is another example of how Selenium can be used to interact with websites that make use of Ajax (Asynchronous JavaScript and XML):

### Selenium is a browser automation framework.

It will enable us to tell Chrome:

- go to page bbc.co.uk/weather
- "click the word 'next'"
- scroll down

Selenium will open a simplified version of Chrome for a few seconds, use it, and close it afterwards. You might even see it flash on your screen.

Then we will use BeautifulSoup to make sense of the page's HTML.

### BeautifulSoup is an HTML parsing framework.

It will enable us to:

- copy the HTML of tags, e.g. div, table
- extract text from these tags
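
To make these two operations concrete, here is a minimal sketch that runs BeautifulSoup on a small HTML snippet. The snippet and its contents are made up for illustration and are not taken from a real page.

```python
from bs4 import BeautifulSoup

# A made-up HTML fragment, used only for illustration
html = """
<div id="forecast">
  <table>
    <tr><td>Mon</td><td>7 degrees</td></tr>
    <tr><td>Tue</td><td>9 degrees</td></tr>
  </table>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# copy the HTML of a tag: find() returns the first matching tag
table_tag = soup.find("table")
print(table_tag)

# extract the text from tags: here, every cell of the table
cell_texts = [td.text for td in soup.find_all("td")]
print(cell_texts)
```

`find()` gives back a single tag (or `None` if nothing matches), while `find_all()` gives back a list of every match.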



# Getting chromedriver for Selenium (don't skip this!)

1. Find out which version of Chrome you have: in Chrome, open the page chrome://settings/help
2. Go to the list of chromedriver versions and find the folder matching your version (e.g. 87.0.4280.88): https://chromedriver.storage.googleapis.com/index.html
3. Go into the folder for your version and download the zip file for your operating system (most likely chromedriver_mac64.zip or chromedriver_win32.zip).
4. Unzip that file on your machine and put it in the folder where this notebook is. The unzipped file will be called chromedriver or chromedriver.exe
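
If you want to double-check step 4, the following sketch looks for the driver file in the current folder. The helper names `chromedriver_filename` and `chromedriver_is_in_place` are invented for this example, and it assumes the notebook is running from the folder where you put the file.

```python
import sys
from pathlib import Path


def chromedriver_filename(platform=sys.platform):
    """Return the file name the unzipped driver should have on this platform."""
    if platform.startswith("win32") or platform.startswith("cygwin"):
        return "chromedriver.exe"
    return "chromedriver"


def chromedriver_is_in_place(folder="."):
    """Check whether the driver file sits in the given folder."""
    return (Path(folder) / chromedriver_filename()).is_file()


if chromedriver_is_in_place():
    print("chromedriver found, you are ready to go")
else:
    print("chromedriver not found in this folder, please repeat step 4")
```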

In [None]:
!pip install selenium beautifulsoup4

import time
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.common.exceptions import NoSuchElementException


In [None]:
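# define a function that creates a browser suited to your operating system
# (assumes Selenium 4, where the driver path is passed through a Service object)
import sys
from selenium import webdriver
from selenium.webdriver.chrome.service import Service

def get_a_browser():
    if sys.platform.startswith('win32') or sys.platform.startswith('cygwin'):
        return webdriver.Chrome()  # Windows
    else:
        return webdriver.Chrome(service=Service('./chromedriver'))  # Mac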

### Important note: allowing your system to run chromedriver. This needs to be done just once.

If you are on a Mac, you will need to allow your system to run chromedriver. Run the cell below; you will likely see a warning the first time. Click 'Cancel' (don't click 'Delete').

After you see the warning, go into Settings > Security & Privacy > General and click "Allow Anyway".

On a PC the process is simpler: when asked, allow your computer to use the chromedriver.exe file.


### Let's try to scrape an interactive website:

What will the weather be in Edinburgh in 2 days?

# Task (you'll need: a web browser, pen and paper)

In this task I will ask you to do something by yourself (using your web browser, mouse and keyboard), and then you will see how you can program Selenium to do it for you.

### Use www.bbc.co.uk/weather to find out what time sunrise will be in EDINBURGH next Sunday.

Do it at least 3 times and observe all the steps you are taking. Make a very detailed list of all the steps, as if you had to describe them to someone over the phone without seeing their screen. See the example below.

It will look a bit like this:

- ok, go to www.bbc.co.uk/weather and wait for it to load
- scroll down, do you see a link with the word 'Edinburgh' on it? Click it.
- Wait a minute for it to load.
- ok, now scroll down and ...

When you are done with this exercise, we will try to instruct Selenium (a Chrome automation tool) to do it for us. Do you think you can use Chrome DevTools to make your steps more specific? E.g. instead of saying "copy text in that bold link next to the word Sunrise", try to say "copy text from the HTML span element with the class 'wr-c-astro-data__time'".
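
To see why the more specific wording helps, here is a small sketch that turns it into a BeautifulSoup lookup. The HTML fragment below is made up to imitate the structure described above (including the times in it); the real page may be laid out differently.

```python
from bs4 import BeautifulSoup

# Made-up fragment that imitates the structure described above;
# the real page may be laid out differently
html = """
<div>
  <span>Sunrise</span> <span class="wr-c-astro-data__time">08:40</span>
  <span>Sunset</span> <span class="wr-c-astro-data__time">15:38</span>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# vague: "the bold text next to the word Sunrise"
# specific: "the first span with class 'wr-c-astro-data__time'"
first_time = soup.find("span", {"class": "wr-c-astro-data__time"})
print(first_time.text)

# find_all() returns every match, which shows why "first" matters
all_times = [
    tag.text for tag in soup.find_all("span", {"class": "wr-c-astro-data__time"})
]
print(all_times)
```

A step phrased in terms of tag names and classes maps directly onto one line of code, while "that bold link" does not.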

**SERIOUSLY: Take a few minutes to do this. You will learn more from the code below!**



Ok. And now let's get Python to do it for us.

In [None]:
# By tells Selenium how to look for an element (by link text, tag name, ...)
from selenium.webdriver.common.by import By

browser = get_a_browser()

# the url we want to open
url = 'https://www.bbc.co.uk/weather'

# the browser will start and load the webpage
browser.get(url)

# we wait 1 second to let the page load everything
time.sleep(1)

# we search for the link with the text 'Edinburgh'
# the link can be clicked with the .click() function
browser.find_element(By.LINK_TEXT, "Edinburgh").click()

# sleep again, let everything load
time.sleep(1)

# we load the <body> element, which holds the visible page content
body = browser.find_element(By.TAG_NAME, 'body')

# we use Selenium's send_keys() function to scroll down to where we want to click
body.send_keys(Keys.PAGE_DOWN)

# search for the Sunday link to open next Sunday's forecast
try:
    # the link will look like "Sun 12Dec" so we match on partial link text
    next_button = browser.find_element(By.PARTIAL_LINK_TEXT, 'Sun ')
    next_button.click()
    # give the page a moment to update after the click
    time.sleep(1)
except NoSuchElementException:  # if no such element exists, report it
    print("something went wrong. There was no Sunday link.")

# load current view of the page into a soup
soup = BeautifulSoup(browser.page_source, 'html.parser')

# we have the HTML now, so the browser can be closed
browser.quit()

# find the first span of class 'wr-c-astro-data__time' and print its text
sunrise_tag = soup.find("span", {"class": 'wr-c-astro-data__time'})
if sunrise_tag is not None:
    print("Sunrise next Sunday: ", sunrise_tag.text)
else:
    print("something went wrong. There was no sunrise time on the page.")



Above you should see something like "Sunrise next Sunday:  08:40"