### Introduction to Splinter (0:15)


Chrome Webdriver: [https://splinter.readthedocs.io/en/latest/drivers/chrome.html](https://splinter.readthedocs.io/en/latest/drivers/chrome.html)


* Up to now, we have used Beautiful Soup to scrape a single, static page at a time.

* Often, you can only access interesting parts of a website after logging in, filling out and submitting forms, etc.

* When the data is "buried" behind such dynamic interactions, a web driver can be used to write scripts for the browser!

* This allows developers to simulate user interactions programmatically and scrape multiple pages along the way.

* The Chrome webdriver can also be used to simulate user interaction with a webpage for testing purposes.
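
The kind of scripted interaction described above can be sketched as a small helper. This is a minimal illustration, not part of the original lesson: `submit_form`, its parameters, and the values in the commented usage are hypothetical. It relies only on Splinter's `visit`, `find_by_name`, `find_by_css`, `fill`, and `click`.

```python
def submit_form(browser, url, fields, submit_css):
    """Visit `url`, fill each named form field, click submit, return the new HTML.

    `browser` is expected to be a Splinter Browser instance; `fields` maps
    each input's `name` attribute to the text to type into it.
    """
    browser.visit(url)
    for name, value in fields.items():
        # find_by_name returns a list of matches; fill the first one.
        browser.find_by_name(name).first.fill(value)
    # Clicking the submit control triggers whatever the page does next.
    browser.find_by_css(submit_css).first.click()
    # browser.html now reflects the page reached after submitting.
    return browser.html


# Hypothetical usage (URL, field names, and selector are assumptions):
# html = submit_form(
#     browser,
#     "http://quotes.toscrape.com/login",
#     {"username": "student", "password": "secret"},
#     "input[type=submit]",
# )
```

Passing the browser in as an argument keeps the helper independent of how the browser was created, which also makes it easy to exercise with a stand-in object.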

### Installing Chromedriver

### Mac:
* Mac users, if they have `brew` installed, can simply run `brew install chromedriver` from their terminal.
* To install brew, go to https://brew.sh and follow the instructions.

### Windows:
* Download Chromedriver from https://sites.google.com/a/chromium.org/chromedriver/downloads

* Extract the executable file and place it in the same folder as your Python script.


* An instance of a Splinter browser is created on the `Browser` line below.
  
* The driver used for the browser interaction is `'chrome'`.

* `False` is passed for the `headless` option, so the browser's actions are displayed in a Chrome window and the process can be watched. Set it to `True` if you don't want to see the window.



In [1]:
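from splinter import Browser
from bs4 import BeautifulSoup
from sys import platform

# Tell Splinter where the chromedriver executable lives on this OS.
if platform == "darwin":
    # macOS: assumes chromedriver is installed at /usr/local/bin
    executable_path = {"executable_path": "/usr/local/bin/chromedriver"}
else:
    # Otherwise: expects chromedriver.exe next to this script
    executable_path = {"executable_path": "chromedriver.exe"}

# headless=False opens a visible Chrome window
browser = Browser("chrome", **executable_path, headless=False)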



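If this cell fails with `ModuleNotFoundError: No module named 'splinter'`, Splinter is not installed in the active environment. Install it with `pip install splinter` and rerun the cell.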
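
The setup cell above hard-codes both the driver location and the `headless` flag. As a rough sketch (not from the original notebook), the helper below looks for `chromedriver` on the `PATH` first, falls back to the same two paths, and reads the headless setting from an environment variable. The function name `browser_settings` and the variable `SCRAPER_HEADLESS` are made up for illustration.

```python
import os
import shutil
from sys import platform


def browser_settings(platform_name=platform, which=shutil.which, env=None):
    """Build keyword arguments for Browser("chrome", **settings)."""
    env = os.environ if env is None else env

    # Prefer a chromedriver that is already on the PATH.
    driver = which("chromedriver")
    if driver is None:
        # Fall back to the locations used in the setup cell.
        if platform_name == "darwin":
            driver = "/usr/local/bin/chromedriver"
        else:
            driver = "chromedriver.exe"

    # SCRAPER_HEADLESS is a hypothetical variable; unset means "show the window".
    headless = env.get("SCRAPER_HEADLESS", "").strip().lower() in {"1", "true", "yes"}
    return {"executable_path": driver, "headless": headless}


# Usage, assuming Splinter is installed and accepts these keyword arguments
# the same way the setup cell does:
# from splinter import Browser
# browser = Browser("chrome", **browser_settings())
```

The `platform_name`, `which`, and `env` parameters exist only so the function can be checked without touching the real system; in normal use it is called with no arguments.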

### Take a look at http://quotes.toscrape.com/

* For each page, we need to parse the HTML.
* The browser then needs to click the `Next` button to proceed to the next page.
  
![Next](images/next.png)
    

* Open the Chrome inspector to identify the element that the application will need to click.


![Next](images/next_html.png)


* Navigate to [Splinter's documentation](https://splinter.readthedocs.io/en/latest/elements-in-the-page.html)
  
* Splinter offers various ways of interacting with the page, including clicking an element by its text.

* The next part is a for-loop with five iterations that uses Beautiful Soup to parse each page and collect every quote on it.

* After printing all the quotes on a page, the application clicks the `Next` button with Splinter's `click_link_by_partial_text()` method.

* The practice website has ten pages in total, but we cycle through only the first five.
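
The loop described above runs a fixed five times. A variation, sketched here under assumptions, stops as soon as no `Next` link is found, so the same code works whatever the page count. `scrape_pages` and `extract` are hypothetical names. The sketch also swaps the text-based click for a CSS lookup, and the selector `li.next a` is a guess at the markup of the `Next` button, so check it against the inspector before relying on it.

```python
def scrape_pages(browser, extract, max_pages=5, next_css="li.next a"):
    """Return a list of (page_number, extract(html)) pairs.

    `browser` is a Splinter Browser that has already visited the first page.
    `extract` is any function that turns a page's HTML into the data you want.
    `next_css` is an assumed selector for the Next link.
    """
    results = []
    for page in range(1, max_pages + 1):
        results.append((page, extract(browser.html)))
        if page == max_pages:
            break  # page limit reached; no need to click again
        next_links = browser.find_by_css(next_css)
        if len(next_links) == 0:
            break  # last page: there is no Next link to click
        next_links[0].click()
    return results


# Usage with Beautiful Soup, mirroring the quote scraping in this lesson:
# def quote_texts(html):
#     soup = BeautifulSoup(html, "html.parser")
#     return [span.text for span in soup.find_all("span", class_="text")]
#
# for page, quotes in scrape_pages(browser, quote_texts):
#     print("page:", page, "-------------")
#     for quote in quotes:
#         print(quote)
```

Separating the paging logic from the parsing function means the same loop can be reused for any site with a `Next` link by passing a different `extract` function and selector.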




In [71]:
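url = 'http://quotes.toscrape.com/'
browser.visit(url)

# Scrape five pages, clicking Next after each one
for x in range(1, 6):

    # Parse whatever page the browser is currently showing
    html = browser.html
    soup = BeautifulSoup(html, 'html.parser')

    # Each quote sits in a <span class="text"> element
    quotes = soup.find_all('span', class_='text')

    for quote in quotes:
        print('page:', x, '-------------')
        print(quote.text)

    # Move the browser on to the next page
    browser.click_link_by_partial_text('Next')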


page: 1 -------------
“The world as we have created it is a process of our thinking. It cannot be changed without changing our thinking.”
page: 1 -------------
“It is our choices, Harry, that show what we truly are, far more than our abilities.”
page: 1 -------------
“There are only two ways to live your life. One is as though nothing is a miracle. The other is as though everything is a miracle.”
page: 1 -------------
“The person, be it gentleman or lady, who has not pleasure in a good novel, must be intolerably stupid.”
page: 1 -------------
“Imperfection is beauty, madness is genius and it's better to be absolutely ridiculous than absolutely boring.”
page: 1 -------------
“Try not to become a man of success. Rather become a man of value.”
page: 1 -------------
“It is better to be hated for what you are than to be loved for what you are not.”
page: 1 -------------
“I have not failed. I've just found 10,000 ways that won't work.”
page: 1 -------------
“A woman is like a tea bag; you ne

The next example, Unsplash, is harder. Because the site uses React, its HTML elements are very cluttered. I used Firefox Developer Edition (https://www.mozilla.org/en-US/firefox/developer/), whose dev tools let you right-click an element and choose `Copy XPath` or `Copy CSS` to get a path to that specific element.

In [90]:
url = 'https://unsplash.com/'
browser.visit(url)

html = browser.html
soup = BeautifulSoup(html, 'html.parser')
input_field = browser.find_by_css('html body.js-focus-ring div#app div div div.F5xv_ div._11ii- div.RuSEo div._258j4._2sCnE.PrOBO._1CR66 div._2FVCK._1pgnK._3vQsl div div div._2PV-w div._2-RsN div._3PSbf form._3_k_e div._3SDxY input#FEED_HEADER_SEARCH_INPUT.qWUF0')
input_field[0].fill('paris')
search_button = browser.find_by_xpath('/html/body/div/div/div[4]/div[2]/div/div[2]/div[1]/div/div/div[2]/div/div[1]/div[2]/form/div[2]/button')
search_button.click()
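
Selectors copied from dev tools, like the two in the cell above, spell out the full path from `html` down and can break whenever the page layout or its generated class names change. When the target element has an `id`, a shorter selector is usually enough. The helper below is a rough sketch (the name `id_selector_from_path` is made up) that trims a copied CSS path down to its last `id`-bearing element. It assumes a simple space-separated path like the one above.

```python
def id_selector_from_path(css_path):
    """Shorten a copied CSS path so it starts at the last element with an id.

    Class names on that element are dropped, since the id alone should
    identify it. Paths without any id are returned unchanged.
    """
    segments = css_path.split()
    # Walk backwards so the id closest to the target element wins.
    for index in range(len(segments) - 1, -1, -1):
        tag, hash_mark, rest = segments[index].partition("#")
        if hash_mark:
            element_id = rest.split(".", 1)[0]
            anchored = f"{tag}#{element_id}"
            return " ".join([anchored] + segments[index + 1:])
    return css_path


# For the search box path copied above, this yields
# "input#FEED_HEADER_SEARCH_INPUT", which could then be passed to
# browser.find_by_css(...) in place of the full path.
```

This only helps when the `id` itself is stable; if a site regenerates its ids on each build, the shortened selector can be just as fragile as the long one.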



In [93]:
weather = "http://www.surfline.com/surf-forecasts/southern-california/santa-barbara_2141"
browser.visit(weather)

html = browser.html
forecast_soup = BeautifulSoup(html, "html.parser")
report = forecast_soup.find(class_="forecast-outlook")
surf_report = report.find_all("p")

In [96]:
surf_report = report.find_all("ul")

In [97]:
surf_report

[<ul>
 <li>Small WNW swell-mix for the end of the work week</li>
 <li>Light  early AM winds for the next few days</li>
 <li>Better WNW and S swell possible long range</li>
 </ul>]

In [100]:
for bullet in surf_report:
    print(bullet.text)


Small WNW swell-mix for the end of the work week
Light  early AM winds for the next few days
Better WNW and S swell possible long range
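
The cells above assume the page always contains an element with class `forecast-outlook`. If the site changes, `find` returns `None` and the next call fails with an `AttributeError`. A defensive variant might look like the sketch below. The function name `outlook_bullets` is hypothetical, and it takes an already-parsed `BeautifulSoup` object.

```python
def outlook_bullets(soup, css_class="forecast-outlook"):
    """Return the text of every <li> inside the forecast outlook block.

    `soup` is a BeautifulSoup object for the page. An empty list is
    returned when the block is missing, instead of raising an error.
    """
    report = soup.find(class_=css_class)
    if report is None:
        return []
    # Collect list items directly rather than printing each <ul> as one blob.
    return [item.get_text(strip=True) for item in report.find_all("li")]


# Usage with the objects from the cells above:
# for line in outlook_bullets(forecast_soup):
#     print(line)
```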

