<img src="http://imgur.com/1ZcRyrc.png" style="float: left; margin: 20px; height: 55px">

# Web Scraping OpenTable With Selenium: Guided Lab

_Authors: Joseph Nelson (DC)_

---

> *Note: This is intended to be an instructor-guided lab.*


In today's code-along lab, we'll build a scraper using `urllib` and Beautiful Soup. We'll also remedy some of the pitfalls of automated scraping by using Selenium, a browser automation tool that drives a real browser for us.

You'll be scraping OpenTable's Washington, D.C. listings. For each restaurant, we want to know its **name, location, price, and how many times it was booked that day.**

OpenTable provides all of this information on this page: http://www.opentable.com/washington-dc-restaurant-listings.

### 1) Inspect the elements of this page to confirm we can find all of the information we're interested in.

### 2) Use `urllib` and `BeautifulSoup` to read the contents of the HTML.

In [1]:
from bs4 import BeautifulSoup
import urllib.request  # In Python 3, the request submodule must be imported explicitly.

In [2]:
# Set the URL we want to visit.
url = "http://www.opentable.com/washington-dc-restaurant-listings"

# A:
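One way to approach this step, as a minimal sketch assuming Python 3: `urllib.request.urlopen` opens the page and `read()` returns its raw bytes. The helper name `fetch_html` and the 10-second timeout are illustrative choices, not part of the lab. The demo call uses a tiny `data:` URL so the cell runs offline; in the lab, you'd pass `url` instead. Some sites reject requests that come from scripts, so the live request may fail or return a stripped-down page.

```python
import urllib.request


def fetch_html(page_url, timeout=10):
    """Return the raw bytes served at page_url (hypothetical helper)."""
    with urllib.request.urlopen(page_url, timeout=timeout) as response:
        return response.read()


# Offline demo: a data: URL stands in for the real listings page.
sample = fetch_html("data:text/html,<h1>Hello</h1>")

# Print only a slice; a real page can be hundreds of kilobytes.
print(sample[:500])
```
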

### 3) Print out a fraction of the HTML. What's in it?

In [1]:
# A:

### 4) Use Beautiful Soup to convert the raw HTML into a soup object.

In [None]:
# A:

### 5) Extract the name of each restaurant.

First, let's find each restaurant name listed on the page we've loaded. How do we find where each name sits on the page?

> *Hint: We need to know where the restaurant element is housed in the **HTML**.*

**5.A) See if you can find the restaurant name. Keep in mind that there are many restaurants loaded on the page.**

In [None]:
# A:

**5.B) Create a list of _only_ the restaurant names (no tags).**


In [None]:
# A:
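As a rough sketch of the idea, the cell below runs `find_all` on a small hand-written HTML sample and then keeps only the text of each match. The markup and the class name `rest-row-name-text` are assumptions for illustration; inspect the live page to find the tag and class that actually hold the names.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the real page source.
sample_html = """
<div class="rest-row"><span class="rest-row-name-text">Restaurant A</span></div>
<div class="rest-row"><span class="rest-row-name-text">Restaurant B</span></div>
"""

soup = BeautifulSoup(sample_html, "html.parser")

# find_all returns Tag objects; .text keeps the string and drops the tags.
name_tags = soup.find_all("span", {"class": "rest-row-name-text"})
names = [tag.text for tag in name_tags]
print(names)
```
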

### 6) Repeat this process for location.

For example, barmini by Jose Andres is located in "Penn Quarter," as listed in our search results.

In [None]:
# A:

### 7) Get the price for each restaurant.

The price is the number of dollar signs on a scale of one to four for each restaurant. We'll follow the same process we used for restaurant name and location.

In [None]:
# A:

**7.B) Convert the dollar sign strings to a count of the number of dollar signs.**

Can you figure out a way to print out the number of dollar signs per restaurant listed?

In [None]:
# A:
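One simple approach, sketched here on made-up strings, is `str.count`, which counts occurrences of a substring and ignores any surrounding whitespace. The sample values are hypothetical; the text you scrape may be padded or formatted differently.

```python
# Hypothetical price strings, padded the way scraped text often is.
prices = ["$$", " $ $ $ ", "$$$$"]

# Count the dollar signs in each string.
dollar_counts = [price.count("$") for price in prices]
print(dollar_counts)
```
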

### 8) Can you find the number of times a restaurant was booked?

In the next cell, print out a sample of objects that contain the number of times a restaurant was booked.

> *Note: If you can't, why do you think this happens?*

In [None]:
# A:

## Enter: Selenium

---

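Selenium is a browser automation tool: it launches and controls a real browser, which can optionally run headless (without a visible window). That lets us mimic human browsing behavior, including giving the page's JavaScript a chance to run before we read the HTML.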

If you don't already have Selenium installed, you can do so via pip. Simply run `pip install selenium`.

In [19]:
# Import:
from selenium import webdriver

Selenium needs a browser to drive. We're going to opt for Chrome, controlled through ChromeDriver, but Firefox is also a very common choice.

http://selenium-python.readthedocs.io/faq.html

### 9) What's going to happen when we run the next cell?

The ChromeDriver has been provided in the "chromedriver" folder, so don't worry about downloading another one.

In [None]:
# NOTE: You may need to run this cell so that Selenium can find ChromeDriver.

import os
from selenium import webdriver


# Change this to the path of the chromedriver binary on your machine!
chromedriver = "/Users/edoardo/classes/week-06/labs/python-webscraping_opentable-lab-master/chromedriver/chromedriver"
os.environ["webdriver.chrome.driver"] = chromedriver

In [23]:
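# Create a driver called "driver."
# Note: Recent Selenium 4 releases no longer accept `executable_path`. There, pass
# `service=Service("../chromedriver/chromedriver")` from selenium.webdriver.chrome.service.
driver = webdriver.Chrome(executable_path="../chromedriver/chromedriver")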

This should have opened up a new browser window. If you didn't see it pop up automatically, check all of your desktop displays.

Pretty crazy, right? Now let's close that driver.

In [26]:
# Close it.
driver.close()

### 10) Use the driver to visit `www.python.org`.

In [None]:
# A:

### 11) Visit the OpenTable page using the driver.

Let's return to the problem at hand. We need to visit the OpenTable listings for DC. Once there, we need the HTML to load. 

In the next cell, prove you can programmatically visit the page.

In [None]:
# A:

### 12) Resolve the JavaScript issue using the driver and find the bookings.

What we can do in this case is:

1) Request that the page load.
2) Wait one second.
3) Grab the source HTML from the page.

Because the driver is a real browser, the site treats us like any other visitor: the page's JavaScript runs, and the content it renders becomes part of the page source. We can then grab that source.

**Once you have the HTML with the JavaScript rendered, repeat the processes above to find the bookings.**

In [28]:
# Import sleep:
from time import sleep

In [None]:
# A:
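The three steps above can be sketched as a small helper. This is only an illustration: the function name `get_rendered_html` is made up, and `FakeDriver` is a stand-in that mimics the two driver members the helper uses (`get` and `page_source`) so the cell runs without opening a browser. In the lab, you'd pass your real Selenium `driver` and the OpenTable URL.

```python
from time import sleep


def get_rendered_html(driver, page_url, wait_seconds=1):
    """Load page_url, pause so scripts can run, then return the page source."""
    driver.get(page_url)      # 1) Request that the page load.
    sleep(wait_seconds)       # 2) Wait.
    return driver.page_source  # 3) Grab the source HTML.


class FakeDriver:
    """Stand-in for a Selenium driver so this sketch runs without a browser."""

    def __init__(self):
        self.page_source = ""

    def get(self, page_url):
        self.page_source = "<html><body>loaded " + page_url + "</body></html>"


html = get_rendered_html(FakeDriver(), "http://example.com", wait_seconds=0)
print(html)
```

A fixed pause is the simplest option but also a guess: too short and the bookings may not have rendered yet, too long and the scraper wastes time. Selenium also offers explicit waits (`WebDriverWait`) that poll until a condition is met.
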

### 13) Can we get all of the items we want from the page in a single `find_all`?

To be as efficient as possible, we only want to do a single loop for each entry on the page. That means we want to find the element all of our other elements (name, location, price, and bookings) are housed within. Where is each entry located on the page?

In [None]:
# A:

### 14) Does every entry have all of the elements we want?

In [None]:
# A:

### 15) Use Python exceptions to handle cases when bookings aren't found.

When a booking isn't found, store `'ZERO'`.

In [None]:
# A:
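Here is a minimal sketch of the pattern on hand-written HTML. `find` returns `None` when nothing matches, so asking for `.text` raises an `AttributeError` that we can catch. The markup and the class names `rest-row` and `booking` are assumptions for illustration, not necessarily what OpenTable uses.

```python
from bs4 import BeautifulSoup

# Hypothetical entries: the second one has no booking element.
sample_html = """
<div class="rest-row"><span class="name">Restaurant A</span><div class="booking">Booked 12 times today</div></div>
<div class="rest-row"><span class="name">Restaurant B</span></div>
"""

soup = BeautifulSoup(sample_html, "html.parser")

bookings = []
for entry in soup.find_all("div", {"class": "rest-row"}):
    try:
        # find returns None when there is no match, so .text raises AttributeError.
        bookings.append(entry.find("div", {"class": "booking"}).text)
    except AttributeError:
        bookings.append("ZERO")

print(bookings)
```
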

### 16) Putting it all together in a DataFrame.

**Loop through the entries. For each:**

1) Grab the relevant information we want (name, location, price, and bookings). 
2) Produce a DataFrame with the columns "name," "location," "price," and "bookings" that contains the 100 entries we'd like.

In [None]:
# A:
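One possible shape for the final step, sketched with made-up rows: collect one dictionary per entry inside the loop, then hand the list to `pd.DataFrame`. The restaurant values below are placeholders, and passing `columns` simply fixes the column order.

```python
import pandas as pd

# Placeholder rows; in the lab, each dictionary would be built inside the loop.
rows = [
    {"name": "Restaurant A", "location": "Penn Quarter", "price": 3, "bookings": "12"},
    {"name": "Restaurant B", "location": "Dupont Circle", "price": 2, "bookings": "ZERO"},
]

df = pd.DataFrame(rows, columns=["name", "location", "price", "bookings"])
print(df)
```
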

### 17) [Bonus] Sending keys over the driver.

We can send keys to the page using the driver. Below is a demonstration of how to search the page using the Selenium driver.

In [52]:
# We can send keys as well. Import:
from selenium.webdriver.common.keys import Keys

In [53]:
# Open the driver.
driver = webdriver.Chrome(executable_path="../chromedriver/chromedriver")
# Visit Python.
driver.get("http://www.python.org")
# Verify we're in the right place.
assert "Python" in driver.title

In [54]:
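# Find the search box. (`find_element_by_name` was removed in Selenium 4.3;
# `find_element(By.NAME, ...)` works in both older and newer versions.)
from selenium.webdriver.common.by import By
elem = driver.find_element(By.NAME, "q")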
# Clear it.
elem.clear()
# Type in "pycon."
elem.send_keys("pycon")

In [55]:
# Send the keys.
elem.send_keys(Keys.RETURN)
# Verify that the search returned results.
assert "No results found." not in driver.page_source

In [56]:
# Close the driver.
driver.close()

In [57]:
# All at once:
from selenium.webdriver.common.by import By
driver = webdriver.Chrome(executable_path="../chromedriver/chromedriver")
driver.get("http://www.python.org")
assert "Python" in driver.title
elem = driver.find_element(By.NAME, "q")
elem.clear()
elem.send_keys("pycon")
elem.send_keys(Keys.RETURN)
assert "No results found." not in driver.page_source
driver.close()

## Additional Resources

---

The example above (and many others) is available in the [Selenium docs](http://selenium-python.readthedocs.io/getting-started.html).

It's especially important to explore functionality, such as [locating elements](http://selenium-python.readthedocs.io/locating-elements.html#locating-elements).

Review Selenium's [FAQs](http://selenium-python.readthedocs.io/faq.html).