<img src="http://imgur.com/1ZcRyrc.png" style="float: left; margin: 20px; height: 55px">

# Webscraping OpenTable with Selenium: Guided Lab


---

> *Note: this lab is intended to be instructor guided.*


In today's codealong lab, we will build a scraper using `requests` and BeautifulSoup. We will then remedy some of the pitfalls of automated scraping by using Selenium, a browser-automation tool that drives a real browser (optionally in "headless" mode).

You will be scraping OpenTable's Austin listings. We're interested in knowing the restaurant's **name, location, price, and how many people booked it today.**

OpenTable provides all of this information on a single page: [OpenTable listings](https://www.opentable.com/austin-restaurant-listings)

### 1. Inspect the elements of this page to ensure we can find each piece of information we're interested in.

### 2. Use `requests` and `BeautifulSoup` to read the contents of the HTML.

In [1]:
import pandas as pd
import requests
from bs4 import BeautifulSoup
from time import sleep

url = 'https://www.opentable.com/austin-restaurant-listings'

In [2]:
# request the page at the url we set above
res = requests.get(url)

### 3. Use Beautiful Soup to convert the raw HTML into a soup object.

In [3]:
# name the parser explicitly so bs4 does not warn and pick one for us
soup = BeautifulSoup(res.content, 'html.parser')

### 4. Extract the name of each restaurant.

Let's first find each restaurant name listed on the page we've loaded. How do we find where a restaurant's name lives on the page?

> *Hint: we need to know where in the **html** the restaurant element is housed.*

**4.A See if you can find the restaurant name on the page. Keep in mind there are many restaurants loaded on the page.**

In [4]:
soup

<!DOCTYPE html>
<html lang="en"><head><meta charset="utf-8"/><meta content="IE=9; IE=8; IE=7; IE=EDGE" http-equiv="X-UA-Compatible"/> <title>Restaurant Reservation Availability</title> <meta content="noindex,nofollow" name="robots"/> <link href="//components.otstatic.com/components/favicon/1.0.6/favicon/favicon.ico" rel="shortcut icon" type="image/x-icon"/><link href="//components.otstatic.com/components/favicon/1.0.6/favicon/favicon-16.png" rel="icon" sizes="16x16"/><link href="//components.otstatic.com/components/favicon/1.0.6/favicon/favicon-32.png" rel="icon" sizes="32x32"/><link href="//components.otstatic.com/components/favicon/1.0.6/favicon/favicon-48.png" rel="icon" sizes="48x48"/><link href="//components.otstatic.com/components/favicon/1.0.6/favicon/favicon-64.png" rel="icon" sizes="64x64"/><link href="//components.otstatic.com/components/favicon/1.0.6/favicon/favicon-128.png" rel="icon" sizes="128x128"/><link href="//components.otstatic.com/components/favicon/1.0.6/favicon/ap

In [5]:
for i in soup.find_all('span', {'class':'rest-row-name-text'})[:5]:
    print(i.text)

Patricks
Kattie Parker
Kayden Overpass
Officia
Purdy


**4.B See any issues here?**


In [6]:
# A:


### 5. Enter Selenium - resolve the javascript issue using the driver and find the bookings. 

Selenium drives a real browser, so the site sees a live browser session and its JavaScript runs and fills in the page. We can then grab the rendered page source.

Install Selenium with `pip install selenium`.


**Once you have the HTML with the javascript rendered, repeat the processes above.**


In [7]:
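from selenium import webdriver
from selenium.webdriver.chrome.service import Service

# macOS: path to the bundled chromedriver
# (Selenium 4 takes the path via Service; executable_path was deprecated and later removed)
driver = webdriver.Chrome(service=Service("./chromedriver/macos/chromedriver"))

# Windows: use this line instead
# driver = webdriver.Chrome(service=Service("./chromedriver/windows/chromedriver.exe"))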

In [19]:
driver.get('https://www.opentable.com/austin-restaurant-listings')

In [20]:
# parse the JavaScript-rendered source, naming the parser explicitly
soup = BeautifulSoup(driver.page_source, 'html.parser')

In [22]:
for i in soup.find_all('span', {'class':'rest-row-name-text'})[:5]:
    print(i.text)

The Grove Wine Bar and Kitchen Downtown
Rosedale Kitchen and Bar
Dai Due
Uchiko
Olive and June


### 6. Repeat the process above to grab the location as well

In [23]:
for i in soup.find_all('span', {'class':'rest-row-meta--location'})[:5]:
    print(i.text)

Downtown
North Central
East Austin
Central Austin
Downtown


### 7. Get the price for each restaurant.

The price is the number of dollar signs, on a scale of one to four, for each restaurant. We'll follow the same process.

In [24]:
for i in soup.find_all('i', {'class': 'pricing--the-price'})[:5]:
    #print(len((i.text).replace(' ', '')))
    print(i.text.count('$'))

2
2
3
3
3


**7.B Convert the dollar sign strings to a count of the number of dollar signs.**

Can you figure out a way to simply print out the number of dollar signs per restaurant listed?

In [25]:
# A:
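
One way to approach 7.B, as a minimal sketch: `str.count('$')` ignores any whitespace around or between the dollar signs, whereas measuring the string's length after removing spaces miscounts when other whitespace, such as a newline, is present. The helper name `price_level` and the sample strings are illustrative assumptions, not text taken from OpenTable.

```python
def price_level(text: str) -> int:
    """Return the number of dollar signs in a price string."""
    return text.count("$")


# hypothetical price strings, including stray whitespace
samples = ["$$", "$ $ $", "  $$$$\n", ""]

for sample in samples:
    by_count = price_level(sample)
    by_length = len(sample.replace(" ", ""))  # the length-based alternative
    print(repr(sample), "count:", by_count, "length:", by_length)
```

The two approaches agree on clean input and diverge only when the text carries whitespace other than plain spaces.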


### 8. Can you find the number of times a restaurant was booked?

In the next cell, print out a sample of objects that contain the number of times the restaurant was booked.



In [26]:
for i in soup.find_all('div', {'class' : 'booking'})[:5]:
    print(i.text.split()[1])

14
38
34
74
22


### 9. Can we get all of the items we want from the page in a single `find_all`?

To be most efficient, we want to loop only once over the entries on the page. That means we need to find the element that all of the other elements (name, location, price, bookings) are housed within. Where on the page is each entry located?

In [27]:
driver.page_source



In [28]:



# start with an empty dataframe and append one row per restaurant
df = pd.DataFrame(columns=['name', 'location', 'price', 'bookings'])

# each 'rest-row-info' div holds all four fields for one restaurant
for row in soup.find_all('div', {'class':'rest-row-info'}):
    name = row.find('span', {'class':'rest-row-name-text'}).text
    location = row.find('span', {'class':'rest-row-meta--location'}).text
    p = row.find('i', {'class': 'pricing--the-price'})
    price = p.text.count('$')
    try:
        bookings = row.find('div', {'class': 'booking'}).text.split()[1]
    except (AttributeError, IndexError):
        # no booking element (find returned None) or unexpected text
        bookings = '0'

    df.loc[len(df)] = [name, location, price, bookings]
    
df

| | name | location | price | bookings |
|---|---|---|---|---|
| 0 | The Grove Wine Bar and Kitchen Downtown | Downtown | 2 | 14 |
| 1 | Rosedale Kitchen and Bar | North Central | 2 | 38 |
| 2 | Dai Due | East Austin | 3 | 34 |
| 3 | Uchiko | Central Austin | 3 | 74 |
| 4 | Olive and June | Downtown | 3 | 22 |
| ... | ... | ... | ... | ... |
| 95 | Perla's Seafood and Oyster Bar | Downtown | 3 | 55 |
| 96 | P6 | Downtown | 2 | 15 |
| 97 | Z'Tejas Austin 6th St | Downtown | 2 | 0 |
| 98 | Garrison | Downtown | 3 | 2 |


### 10. Does every single entry have each element we want?

In [None]:
# A:

### 11. Use Python exceptions to handle cases when bookings aren't found.

When a booking is not found, store 0.

In [1]:
# A:
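
A minimal sketch of the exception handling asked for here. BeautifulSoup's `find` returns `None` when nothing matches, so reading `.text` on the result raises `AttributeError`; catching only the exceptions we expect keeps unrelated bugs visible. `FakeTag` and `FakeRow` are hypothetical stand-ins so the example runs without a live page, and the text format `Booked 14 times today` is an assumption inferred from the `split()[1]` indexing used earlier.

```python
class FakeTag:
    """Hypothetical stand-in for a BeautifulSoup tag; only exposes .text."""

    def __init__(self, text):
        self.text = text


class FakeRow:
    """Hypothetical stand-in for one restaurant row.

    find() returns None when there is no booking element,
    mirroring how BeautifulSoup's find() behaves on a miss.
    """

    def __init__(self, booking_text=None):
        self._booking = FakeTag(booking_text) if booking_text is not None else None

    def find(self, name, attrs=None):
        return self._booking


def get_bookings(row) -> int:
    """Return the booking count for a row, or 0 when it cannot be read."""
    try:
        return int(row.find('div', {'class': 'booking'}).text.split()[1])
    except (AttributeError, IndexError, ValueError):
        # AttributeError: no booking element; IndexError: too few words;
        # ValueError: the second word is not a number
        return 0


rows = [FakeRow("Booked 14 times today"), FakeRow(), FakeRow("Booked")]
print([get_bookings(row) for row in rows])
```

Returning an integer rather than the string `'0'` is a design choice made here so the column can be summed or sorted numerically later.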





### 12. Putting it all together in a dataframe.

**Loop through each entry. For each entry:**
1. Grab the relevant information we want (name, location, price, bookings). 
2. Produce a dataframe with the columns "name","location","price","bookings" that contains the 100 entries we would like.

In [30]:
# A:






In [11]:
## add together all the pieces here 


### Bonus: Use Selenium to loop through at least 5 pages and grab that information as well. Chicago is a good example of a city with enough restaurant listings.

In [32]:
from selenium.webdriver.chrome.service import Service

driver = webdriver.Chrome(service=Service("./chromedriver/macos/chromedriver"))

driver.get('https://www.opentable.com/chicago-restaurant-listings')

In [36]:
from selenium.webdriver.common.by import By  # Selenium 4 locator API
next_button = driver.find_element(By.LINK_TEXT, 'Next')
next_button.click()

In [37]:
driver.close()

In [40]:
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By

driver = webdriver.Chrome(service=Service("./chromedriver/macos/chromedriver"))
driver.implicitly_wait(5)

## go to open table
driver.get('https://www.opentable.com/chicago-restaurant-listings')

## instantiate empty df
df = pd.DataFrame(columns=['name', 'location', 'price', 'bookings'])

### loop through pages
for page in range(5):

    soup = BeautifulSoup(driver.page_source, 'html.parser')

    ### add one row per restaurant on the current page
    for row in soup.find_all('div', {'class':'rest-row-info'}):
        name = row.find('span', {'class':'rest-row-name-text'}).text
        location = row.find('span', {'class':'rest-row-meta--location'}).text
        p = row.find('i', {'class': 'pricing--the-price'})
        price = p.text.count('$')
        try:
            bookings = row.find('div', {'class': 'booking'}).text.split()[1]
        except (AttributeError, IndexError):
            # no booking element (find returned None) or unexpected text
            bookings = '0'

        df.loc[len(df)] = [name, location, price, bookings]

    ### move to the next page and give it time to load
    next_button = driver.find_element(By.LINK_TEXT, 'Next')
    next_button.click()
    sleep(2)

    print(f'grabbed from {page + 1} pages')

driver.close()

df.shape

grabbed from 1 pages
grabbed from 2 pages
grabbed from 3 pages
grabbed from 4 pages
grabbed from 5 pages
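
The loop above clicks 'Next' a fixed five times, which would raise an error on a city with fewer pages. A hedged sketch of one alternative: keep the paging logic in a small generator that stops as soon as there is no next page. `iter_pages` and `FakeBrowser` are hypothetical names, and the fake browser stands in for a Selenium driver so the example runs offline.

```python
from typing import Callable, Iterator, List


def iter_pages(get_source: Callable[[], str],
               go_next: Callable[[], bool],
               max_pages: int) -> Iterator[str]:
    """Yield up to max_pages page sources.

    Stops early when go_next() reports there is no next page, and
    does not try to advance after the last requested page.
    """
    for page in range(max_pages):
        yield get_source()
        if page + 1 == max_pages or not go_next():
            break


class FakeBrowser:
    """Hypothetical stand-in for a Selenium driver, used only to exercise the loop."""

    def __init__(self, pages: List[str]):
        self.pages = pages
        self.index = 0

    def source(self) -> str:
        return self.pages[self.index]

    def next_page(self) -> bool:
        # report False instead of raising when there is nothing left to click
        if self.index + 1 >= len(self.pages):
            return False
        self.index += 1
        return True


browser = FakeBrowser(["page 1", "page 2", "page 3"])
for number, source in enumerate(iter_pages(browser.source, browser.next_page, max_pages=5), start=1):
    print(f"grabbed from {number} pages: {source}")
```

With Selenium, `get_source` could be `lambda: driver.page_source`, and `go_next` could click the 'Next' link and return `False` when the lookup raises `NoSuchElementException`.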


In [47]:
df.head(20)

| | name | location | price | bookings |
|---|---|---|---|---|
| 0 | Somerset | Gold Coast / Streeterville | 3 | 45 |
| 1 | Jeong | West Town | 4 | 21 |
| 2 | Boka | Lincoln Park | 3 | 48 |
| 3 | Aba | West Loop | 3 | 198 |
| 4 | Sunda | River North (Chicago) | 3 | 119 |
| 5 | Café Ba-Ba-Reeba | Lincoln Park | 2 | 136 |
| 6 | Summer House Santa Monica | Lincoln Park | 2 | 186 |
| 7 | Uncle Julio's - Chicago | Lincoln Park | 2 | 16 |
| 8 | Piccolo Sogno | River West | 2 | 44 |
| 9 | RL Restaurant | Gold Coast / Streeterville | 4 | 92 |


## Additional resources

---

The example below (and many others) is available in the Selenium with Python docs: http://selenium-python.readthedocs.io/getting-started.html

Locating elements is especially worth exploring: http://selenium-python.readthedocs.io/locating-elements.html#locating-elements

FAQ:
http://selenium-python.readthedocs.io/faq.html

In [50]:
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys

driver = webdriver.Chrome(service=Service("./chromedriver/macos/chromedriver"))
driver.get("http://www.python.org")
sleep(2)  # pauses only so you can watch the browser; sleep was imported at the top
assert "Python" in driver.title
elem = driver.find_element(By.NAME, "q")  # the site search box
elem.clear()
elem.send_keys("pycon")
sleep(2)
elem.send_keys(Keys.RETURN)
assert "No results found." not in driver.page_source
sleep(3)
driver.close()