<img src="http://imgur.com/1ZcRyrc.png" style="float: left; margin: 20px; height: 55px">

# Webscraping OpenTable with Selenium: Guided Lab


---

> *Note: this lab is intended to be instructor guided.*


In today's codealong lab, we will build a scraper using `requests` and BeautifulSoup. We will then remedy some of the pitfalls of automated scraping by driving a real browser with Selenium, a browser automation tool.

You will be scraping OpenTable's Austin listings. We're interested in knowing each restaurant's **name, location, price, and how many people booked it today.**

OpenTable provides all of this information on this page: [OpenTable listings](https://www.opentable.com/austin-restaurant-listings)

### 1. Inspect the elements of this page to ensure we can find each piece of information we're interested in.

### 2. Use `requests` and `BeautifulSoup` to read the contents of the HTML.

In [1]:
import pandas as pd
import requests
from bs4 import BeautifulSoup
from time import sleep

url = 'https://www.opentable.com/austin-restaurant-listings'

In [3]:
# request the page and check the HTTP status code (200 means success)
res = requests.get(url)
res.status_code

200

### 3. Use Beautiful Soup to convert the raw HTML into a soup object.

In [4]:
# name the parser explicitly; omitting it makes BeautifulSoup emit a warning
# ('html.parser' also works if lxml is not installed)
soup = BeautifulSoup(res.content, 'lxml')


### 4. Extract the name of each restaurant.

Let's first find each restaurant name listed on the page we've loaded. How do we find where a restaurant's name sits on the page?

> *Hint: we need to know where in the **html** the restaurant element is housed.*

**4.A See if you can find the restaurant name on the page. Keep in mind there are many restaurants loaded on the page.**
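
As a warm-up, here is a minimal sketch of how `find_all` picks out elements by tag and class. The HTML string and the restaurant names in it are made up for illustration; only the `rest-row-name-text` class comes from the page used in this lab. It uses Python's built-in `html.parser`, so it should run without any extra parser package.

```python
from bs4 import BeautifulSoup

# Made-up HTML that mimics the structure of a listing page (illustrative only).
sample_html = """
<div class="rest-row-info">
  <span class="rest-row-name-text">Example Cafe</span>
</div>
<div class="rest-row-info">
  <span class="rest-row-name-text">Sample Bistro</span>
</div>
"""

sample_soup = BeautifulSoup(sample_html, "html.parser")

# find_all returns every <span> whose class list includes the given class.
sample_names = [
    tag.text
    for tag in sample_soup.find_all("span", {"class": "rest-row-name-text"})
]
print(sample_names)
```

The same call shape works on the real `soup` object once you know which tag and class hold the restaurant name.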

In [6]:
for i in soup.find_all('span', {'class' : 'rest-row-name-text'}):
    print(i.text)

Katelynns
Trail
Yessenia Spring
Gleichner Trail
Kuphal
Monahan
987 Kozey
Granville Wiegand
Enola Weber
Jonass
Fannie Howell
Fuga Pine
Terrys
Consequatur Walter
Autem Mills
Cristians
Placeat Kuphal
Qui Village
Nikolaus
Stravenue
644 Kunze
Crescent
Jeanettes
Andres
Bartholomes
Circle
Russells
Sequi
Est Mount
Sophies
Valentinas
Kerluke
Damaris Crooks
Stewarts
Codys
Natus Bergnaum
Kiarra Tromp
Alis
Sint Botsford
728 Harris
Erna Hettinger
Justens
Sit
Dolor
Eveniet
574 Okuneva
Stoltenberg Key
Est
379 Gottlieb
Odio
Consequatur Circles
Considine Hollow
Dashawn Reynolds
Sit
Dach
894 Okuneva
Boris Expressway
Consequatur Bartoletti
Sammie Botsford
Maxime
Dollys
Dedric Watsica
Neque
Repellendus Haag
853 Mosciski
View
Libbies
Agloe Bar & Grill
Dolores Jerde
Anastasia Gerhold
Justens
Autem Squares
Trail
Roads
Raleighs
Joel Cove
Okuneva
Quidem Coves
843 Bode
A Circle
Springs
Qui
656 Wuckert
Taryns
River
April Turner
Wilfredo Brekke
Lakes
Heidenreich
Autem Lake
Et
1192 Sanford
Maude Ridge
Mante
Jasmin

**4.B See any issues here?**


In [4]:
# A:


### 5. Enter Selenium - resolve the JavaScript issue using the driver and find the bookings.

Because Selenium drives a real browser, the site's JavaScript runs as it would for a human visitor, and the rendered content becomes part of the page source. We can then grab that page source and parse it as before.

Install Selenium with `pip install selenium`. The code below also expects a ChromeDriver binary under `./chromedriver/`.


**Once you have the HTML with the JavaScript rendered, repeat the process above.**


In [7]:
from selenium import webdriver
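# macOS: path to the bundled chromedriver binary
driver = webdriver.Chrome(executable_path="./chromedriver/macos/chromedriver")

# Windows: comment out the line above and uncomment the line below
# driver = webdriver.Chrome(executable_path="./chromedriver/windows/chromedriver.exe")

# Note: newer Selenium 4 releases no longer accept `executable_path`; there, use
# webdriver.Chrome(service=Service(path)) after
# `from selenium.webdriver.chrome.service import Service`.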

In [9]:
driver.get('http://wework.com')

driver.get('https://www.opentable.com/austin-restaurant-listings')
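
`driver.get` can return before the page's JavaScript has finished filling in the listings, so reading `page_source` straight away may still miss them. One way to guard against this is sketched below: poll the page source until some marker text shows up. `wait_for_marker` is a hypothetical helper, not part of Selenium, and which marker to use is an assumption you would need to confirm in the inspector (for instance, text that only appears after rendering). Selenium also provides its own explicit waits via `WebDriverWait`.

```python
from time import monotonic, sleep


def wait_for_marker(driver, marker, timeout=10.0, poll=0.5):
    """Poll driver.page_source until `marker` appears; return the HTML.

    Raises TimeoutError if the marker has not appeared after `timeout` seconds.
    """
    deadline = monotonic() + timeout
    while True:
        html = driver.page_source
        if marker in html:
            return html
        if monotonic() >= deadline:
            raise TimeoutError(f"{marker!r} not found after {timeout} seconds")
        sleep(poll)


# Usage with the driver from this lab (the marker is an assumption to verify):
# html = wait_for_marker(driver, "Booked")
```

The function only relies on the driver's `page_source` attribute, so it can be tried out with a stand-in object before pointing it at a live browser.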

In [11]:
# parse the JavaScript-rendered HTML, again naming the parser explicitly
soup = BeautifulSoup(driver.page_source, 'lxml')


In [12]:
for i in soup.find_all('span', {'class' : 'rest-row-name-text'}):
    print(i.text)

acre 41
Rosedale Kitchen and Bar
The Grove Wine Bar and Kitchen Downtown
Uchiko
Olive and June
The Capital Grille - Austin
34th Street Cafe
ASTI Trattoria
Olamaie
The Salty Sow
Goodall's Kitchen
Perry's Steakhouse & Grille - Downtown Austin
Jeffrey's Restaurant
Josephine House
Bar Peached
Eddie V's - 5th Street
Cipollina
ALC Steaks (Austin Land & Cattle)
Truluck's - Ocean's Finest Seafood & Crab - Austin Downtown
ATX Cocina
True Food Kitchen - Austin
Roaring Fork - Downtown, Congress
wink
Clark's Oyster Bar
Fixe
Foreign & Domestic - Austin
Dai Due
Ranch 616
Cafe Josie
North Italia - Austin 2nd Street
Carillon Restaurant
Vince Young Steakhouse
La Condesa
L'Oca d'Oro
Arlo Grey
Eberly
Lonesome Dove Western Bistro Austin
Rosewood
Bob's Steak & Chop House - Austin
Quattro Gatti Ristorante e Pizzeria
Caroline Restaurant
III Forks - Austin
Stella San Jac
The Peacock Mediterranean Grill
Peche Austin
Emmer & Rye
Il Brutto
Le Politique
Trace - W - Austin
The Peached Tortilla
Lambert's Downtown B

### 6. Repeat the process above, and let's grab location as well 

In [18]:
# match on the location class alone; the trailing `sfx...` class looks auto-generated
for i in soup.find_all('span', {'class' : 'rest-row-meta--location'}):
    print(i.text)

Downtown
North Central
Downtown
Central Austin
Downtown
Downtown
Central Austin
Midtown
Central Austin
East Austin
Downtown
Downtown
Central Austin
Central Austin
Downtown
Downtown
Downtown
Downtown
Downtown
Downtown
Downtown
Downtown
Downtown
Downtown
Downtown
Central Austin
East Austin
Downtown
Downtown
Downtown
Downtown
Downtown
Central Austin
North Central
Downtown
Downtown
Downtown
East Austin
Downtown
Downtown
Downtown
Downtown
Downtown
Downtown
Downtown
Downtown
East Austin
Downtown
Downtown
Central Austin
Downtown
Downtown
Downtown
Downtown
Downtown
Downtown
Downtown
Central Austin
Central Austin
East Austin
Northwest
Downtown
East Austin
Downtown
East Austin
Downtown
Downtown
Downtown
Downtown
Downtown
Downtown
Downtown
Downtown
Downtown
Downtown
Downtown
Downtown
Downtown
Downtown
East Austin
South Austin
Downtown
Downtown
Downtown
Central Austin
Central Austin
Central Austin
Midtown
Central Austin
Central Austin
Central Austin
Central Austin
Downtown
Downtown
Downtown
Downto

### 7. Get the price for each restaurant.

The price is shown as a number of dollar signs, on a scale of one to four, for each restaurant. We'll follow the same process.

In [21]:
for i in soup.find_all('i', {'class' : 'pricing--the-price'}):
    print(i.text)

  $    $      
  $    $      
  $    $      
  $    $    $    
  $    $    $    
  $    $    $    
  $    $    $    
  $    $      
  $    $    $    $  
  $    $      
  $    $    $    
  $    $    $    
  $    $    $    $  
  $    $      
  $    $      
  $    $    $    
  $    $      
  $    $    $    
  $    $    $    
  $    $      
  $    $      
  $    $      
  $    $    $    $  
  $    $      
  $    $    $    
  $    $    $    
  $    $    $    
  $    $      
  $    $    $    
  $    $      
  $    $    $    
  $    $    $    $  
  $    $      
  $    $    $    
  $    $    $    
  $    $    $    
  $    $    $    $  
  $    $    $    
  $    $    $    $  
  $    $      
  $    $      
  $    $    $    
  $    $      
  $    $    $    $  
  $    $    $    
  $    $    $    
  $    $      
  $    $    $    
  $    $    $    
  $    $      
  $    $    $    
  $    $    $    
  $    $    $    
  $    $      
  $    $    $    
  $    $    $    $  
  $    $      
  $    $      
 

**7.B Convert the dollar sign strings to a count of the number of dollar signs.**

Can you figure out a way to simply print out the number of dollar signs per restaurant listed?

In [36]:
# A:
for i in soup.find_all('i', {'class' : 'pricing--the-price'}):
    # count the dollar signs instead of printing the raw string
    print(i.text.count('$'))

### 8. Can you find the number of times a restaurant was booked?

In the next cell, print out a sample of objects that contain the number of times the restaurant was booked.



In [37]:
# A:
for i in soup.find_all('div', {'class' : 'booking'}):
    print(i.text)

 Booked 19 times today
 Booked 28 times today
 Booked 12 times today
 Booked 77 times today
 Booked 17 times today
 Booked 37 times today
 Booked 5 times today
 Booked 9 times today
 Booked 37 times today
 Booked 52 times today
 Booked 2 times today
 Booked 65 times today
 Booked 31 times today
 Booked 41 times today
 Booked 16 times today
 Booked 70 times today
 Booked 19 times today
 Booked 14 times today
 Booked 50 times today
 Booked 81 times today
 Booked 55 times today
 Booked 48 times today
 Booked 8 times today
 Booked 36 times today
 Booked 35 times today
 Booked 19 times today
 Booked 25 times today
 Booked 29 times today
 Booked 12 times today
 Booked 43 times today
 Booked 7 times today
 Booked 15 times today
 Booked 30 times today
 Booked 19 times today
 Booked 22 times today
 Booked 47 times today
 Booked 13 times today
 Booked 11 times today
 Booked 14 times today
 Booked 3 times today
 Booked 27 times today
 Booked 28 times today
 Booked 31 times today
 Booked 61 times 
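
The strings above still need to be turned into numbers before they are useful. A minimal sketch, assuming every label follows the "Booked N times today" pattern seen in the output; `parse_bookings` is a hypothetical helper name, and returning 0 for text that does not match is a choice made here, not something the page dictates.

```python
import re


def parse_bookings(text):
    """Return the booking count from a label like ' Booked 19 times today'."""
    match = re.search(r"Booked\s+(\d+)\s+time", text)
    # No match means the label is missing or has an unexpected shape.
    return int(match.group(1)) if match else 0


print(parse_bookings(" Booked 19 times today"))
```

A regular expression is a little more forgiving than splitting on whitespace, since it does not depend on the number sitting in a fixed position.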

### 9. Can we get all of the items we want from the page in a single `find_all`?

To be most efficient, we want to loop only once over the entries on the page. That means we need to find the element that houses all of the other elements (name, location, price, bookings). Where on the page is each entry located?

In [46]:
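rows = []
for row in soup.find_all('div', {'class': 'rest-row-info'}):
    name = row.find('span', {'class': 'rest-row-name-text'}).text
    location = row.find('span', {'class': 'rest-row-meta--location'}).text
    # price is the number of dollar signs shown for the restaurant
    price = row.find('i', {'class': 'pricing--the-price'}).text.count('$')
    try:
        # text looks like "Booked 19 times today"; the count is the second token
        bookings = int(row.find('div', {'class': 'booking'}).text.split()[1])
    except (AttributeError, IndexError, ValueError):
        # no booking element (find returned None) or an unexpected label
        bookings = 0
    rows.append([name, location, price, bookings])

# build the dataframe once, rather than growing it row by row
df = pd.DataFrame(rows, columns=['name', 'location', 'price', 'bookings'])
df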

| | name | location | price | bookings |
|---|---|---|---|---|
| 0 | acre 41 | Downtown | 2 | 19 |
| 1 | Rosedale Kitchen and Bar | North Central | 2 | 28 |
| 2 | The Grove Wine Bar and Kitchen Downtown | Downtown | 2 | 12 |
| 3 | Uchiko | Central Austin | 3 | 77 |
| 4 | Olive and June | Downtown | 3 | 17 |
| ... | ... | ... | ... | ... |
| 95 | Geraldine’s | Downtown | 3 | 32 |
| 96 | La Volpe | Downtown | 3 | 9 |
| 97 | P6 | Downtown | 2 | 16 |
| 98 | Z'Tejas Austin 6th St | Downtown | 2 | 0 |


### 10. Does every single entry have each element we want?

In [None]:
# A:

### 11. Use python exceptions to handle cases when bookings aren't found.

When a booking is not found, store 0.
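
To see why an exception is needed here, the sketch below runs the lookup on two made-up rows, one of which has no booking element. `row.find` returns `None` when nothing matches, so asking for `.text` raises `AttributeError`. The HTML, the restaurant names, and the helper name `bookings_or_zero` are all illustrative assumptions; only the class names come from the page used in this lab.

```python
from bs4 import BeautifulSoup

# Made-up rows: the second one has no booking element (illustrative only).
sample_html = """
<div class="rest-row-info">
  <span class="rest-row-name-text">Example Cafe</span>
  <div class="booking"> Booked 19 times today</div>
</div>
<div class="rest-row-info">
  <span class="rest-row-name-text">Sample Bistro</span>
</div>
"""


def bookings_or_zero(row):
    """Return the booking count for one listing row, or 0 if it has none."""
    try:
        # find returns None when nothing matches, so .text raises AttributeError.
        return int(row.find("div", {"class": "booking"}).text.split()[1])
    except AttributeError:
        return 0


sample_rows = BeautifulSoup(sample_html, "html.parser").find_all(
    "div", {"class": "rest-row-info"}
)
booking_counts = [bookings_or_zero(row) for row in sample_rows]
print(booking_counts)
```

Catching `AttributeError` specifically, rather than using a bare `except`, keeps unrelated bugs from being silently turned into zeros.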

In [1]:
# A:


### 12. Putting it all together in a dataframe.

**Loop through each entry. For each entry:**
1. Grab the relevant information we want (name, location, price, bookings). 
2. Produce a dataframe with the columns "name","location","price","bookings" that contains the 100 entries we would like.

In [3]:
# A:


In [11]:
## add together all the pieces here 


### Bonus: Use selenium to loop through at least 5 pages and grab that information as well. Chicago is a good example of a city with enough restaurant listings. 
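
One possible shape for the bonus, offered as a sketch rather than a finished solution: visit a list of page URLs with the same driver, pause between requests, and hand each rendered page to a parsing function. `scrape_pages`, `parse_page`, and the `pause` value are hypothetical names and settings. How OpenTable builds the URL for each results page is not shown in this lab, so the list of URLs is left for you to work out.

```python
from time import sleep


def scrape_pages(driver, urls, parse_page, pause=2.0):
    """Load each URL with `driver` and collect whatever `parse_page` returns.

    `parse_page` takes the rendered HTML string and returns a list of records.
    """
    records = []
    for url in urls:
        driver.get(url)
        # Give the JavaScript time to render and avoid hammering the site.
        sleep(pause)
        records.extend(parse_page(driver.page_source))
    return records
```

Here `parse_page` could wrap the single-loop extraction from step 9, returning one `[name, location, price, bookings]` list per restaurant, so the combined result can be passed to `pd.DataFrame` in one go.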

## Additional resources

---

The example above, and many others, are covered in the Selenium docs: http://selenium-python.readthedocs.io/getting-started.html

It is especially worth exploring functionality such as locating elements: http://selenium-python.readthedocs.io/locating-elements.html#locating-elements

FAQ:
http://selenium-python.readthedocs.io/faq.html