# Scraping the Unscrapable

## I can scrape Google with `requests` and `BeautifulSoup`...

In [None]:
import requests
from bs4 import BeautifulSoup

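google_url = "https://www.google.com"
# Download the raw HTML (no JavaScript is executed) and parse it with html5lib (pip install html5lib)
soup = BeautifulSoup(requests.get(google_url).text, "html5lib")
print(soup.prettify())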

There is a lot going on in this page's source; in particular, much of what the browser displays is rendered by JavaScript, which `requests` downloads but never runs.

Nevertheless, we can use the Chrome inspector to find the attributes (such as the name) of a particular element. On a Mac, Command + Shift + C opens the inspector with its element-selection ("highlight") mode turned on.

And indeed, with these two libraries I can find elements in the page's HTML and select them:

In [None]:
print(soup.find(title="Google Search"))

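That's great, but how can I actually type something into the search box using Python? I _could_ skip the box and put the search text in the URL's query parameters, but I don't want to do that here: I want to interact with the page the way a user would. For that, we need Selenium.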
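For comparison, the query-parameter route needs only the standard library. A minimal sketch, assuming Google still reads the search text from a `q` parameter on its `/search` page; `google_search_url` is a hypothetical helper name, not part of any library.

```python
from urllib.parse import urlencode

def google_search_url(query):
    """Build a results URL with the search text in the (assumed) `q` parameter."""
    # urlencode escapes special characters and turns spaces into '+'
    return "https://www.google.com/search?" + urlencode({"q": query})

print(google_search_url("What to wear today"))
# prints https://www.google.com/search?q=What+to+wear+today
```

This only builds the address; nothing on the page is clicked or typed, which is exactly what Selenium adds.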

# Selenium

Selenium drives a real web browser, which lets you interact with the objects on websites: clicking, entering text, and so on.

* Go to [this website](https://sites.google.com/a/chromium.org/chromedriver/home) to download ChromeDriver for Mac, the driver Selenium uses to control Chrome. Pick the version that supports your installed version of Chrome.
* Unzip the file, either from the command line or by double-clicking it in your Downloads folder.
* Use the `mv` command to move the new `chromedriver` executable into your home directory.
* Update the line below so that Selenium points to your `chromedriver` file.

In [1]:
### CHANGE THE NAME "seth" HERE:
chromedriver_path = "/Users/seth/chromedriver"
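Rather than editing the username by hand, one option is to let Python expand `~` to your home directory and check that the file is really there and executable before starting a browser. A minimal sketch, assuming `chromedriver` sits directly in your home directory as in the steps above; `driver_path_ok` is a hypothetical helper name.

```python
import os

# "~" expands to the current user's home directory, e.g. /Users/seth on a Mac
chromedriver_path = os.path.expanduser("~/chromedriver")

def driver_path_ok(path):
    """Return True if `path` is an existing regular file the current user may execute."""
    return os.path.isfile(path) and os.access(path, os.X_OK)

print(chromedriver_path, driver_path_ok(chromedriver_path))
```

If this prints `False`, the usual culprits are a file left in Downloads or a missing execute permission (`chmod +x ~/chromedriver`).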

Now, let's start using Selenium!

In [2]:
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
import time

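driver = webdriver.Chrome(chromedriver_path)  # launches a new Chrome window controlled by Selenium
driver.get('http://www.google.com/')
time.sleep(1)  # crude wait for the page to finish loading
search_box = driver.find_element_by_name("q")  # the search box's name attribute is "q"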

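Note that running this code actually opens a new Chrome window! Also note that this notebook uses the older Selenium 3 API. Recent Selenium 4 releases removed the `find_element_by_*` and `find_elements_by_*` methods; the equivalent of `find_element_by_name("q")` is `driver.find_element(By.NAME, "q")`, with `By` imported from `selenium.webdriver.common.by`. They also expect the driver path as `webdriver.Chrome(service=Service(chromedriver_path))`, with `Service` imported from `selenium.webdriver.chrome.service`.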

The standard way to "send data" to a `WebElement` in Selenium is its `send_keys` method, which types text into the element as if from the keyboard; special keys such as Enter come from `Keys`.

In [3]:
search_box.send_keys("What to wear today")
search_box.send_keys(Keys.RETURN)

# OpenTable

Now let's go through an example of interacting with a dynamic website using Selenium.

In [11]:
import time
driver = webdriver.Chrome(chromedriver_path)
driver.get('http://www.opentable.com/')
time.sleep(1); 


How can we select elements on the page? Again, we can use the Chrome inspector: hover over an element on the page and click it to see its HTML attributes. See the screenshot below:

<img src="select_on_page.png">

It looks like the name of this element is "Select_1". Thus, the code below selects "3 people" from the first form:

In [None]:
people_dropdown = driver.find_element_by_name('Select_1')
time.sleep(1)
# Typing into a dropdown selects the matching option, just as a user's keystrokes would
people_dropdown.send_keys("3 people")
people_dropdown.send_keys(Keys.RETURN)
time.sleep(1)

Next, we select the date. The code below works by:

* Converting the date into the number of milliseconds since the Unix epoch (1 January 1970, 00:00 UTC).
* Using that number, plus an adjustment, to "click" the matching day in OpenTable's date picker.

In [6]:
import time
from datetime import datetime
date_string = '08-12-2017'
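def date_string_to_milliseconds(date_string, form='%m-%d-%Y', add=0):
    # Parse the date with `form` (previously ignored) as local midnight
    dt = datetime.fromtimestamp(time.mktime(time.strptime(date_string, form)))
    epoch = datetime.utcfromtimestamp(0)
    # Milliseconds from 1970-01-01 00:00 to that midnight, plus an optional adjustment
    return int((dt - epoch).total_seconds() * 1000) + add

# 60*30*10*1000 ms = 5 hours, presumably Chicago's UTC-5 offset during daylight saving time
milliseconds = date_string_to_milliseconds(date_string, add=60*30*10*1000)

datepicker = driver.find_element_by_name('datepicker')
time.sleep(1)
datepicker.click()  # open the calendar widget
time.sleep(1)
# Each day in the calendar carries its timestamp in a data-pick attribute
date_element = driver.find_element_by_xpath('//div[@data-pick="{0}"]'.format(milliseconds))
date_element.click()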
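The adjustment is easier to see with timezone-aware datetimes. Both datetimes in `date_string_to_milliseconds` are naive, so the function effectively returns midnight *UTC* of the given date, and adding 18,000,000 ms (5 hours) moves that to midnight at UTC-5, Chicago's offset in August. The sketch below makes this explicit. It assumes (unconfirmed) that the date picker's `data-pick` values are midnight in the restaurant's local time, and it uses a fixed UTC-5 offset, which is only right while daylight saving time is in effect; `midnight_epoch_ms` and `CHICAGO_CDT` are hypothetical names.

```python
from datetime import datetime, timedelta, timezone

# Assumption: a fixed UTC-5 offset stands in for Chicago time (correct only during DST)
CHICAGO_CDT = timezone(timedelta(hours=-5))

def midnight_epoch_ms(date_string, tz, form="%m-%d-%Y"):
    """Milliseconds since the Unix epoch for midnight of `date_string` in time zone `tz`."""
    naive = datetime.strptime(date_string, form)
    aware = naive.replace(tzinfo=tz)  # attach the offset instead of implicitly assuming UTC
    return int(aware.timestamp() * 1000)

utc_ms = midnight_epoch_ms("08-12-2017", timezone.utc)
chicago_ms = midnight_epoch_ms("08-12-2017", CHICAGO_CDT)
print(utc_ms, chicago_ms, chicago_ms - utc_ms)
# prints 1502496000000 1502514000000 18000000
```

For dates outside daylight saving time a fixed offset would be off by an hour; `zoneinfo.ZoneInfo("America/Chicago")` (Python 3.9+) handles the switch automatically.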

This selects 8:00 PM as the time:

In [7]:
time_dropdown = driver.find_element_by_name('Select_0')
time_dropdown.send_keys("8:00 PM")
time_dropdown.send_keys(Keys.RETURN)
time.sleep(1); 

This types "West Loop, Chicago IL" into the location box:

In [8]:
location = driver.find_element_by_name('searchText')
location.send_keys("West Loop, Chicago IL")
time.sleep(2); 
location.send_keys(Keys.RETURN)
time.sleep(1); 

Now, let's search!

In [9]:
search = driver.find_element_by_xpath('//input[@value="Find a Table"]')
search.click()
time.sleep(1); 
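XPath expressions like `//input[@value="Find a Table"]` select elements by attribute: `//input` means "any `<input>` anywhere in the document", and `[@value="..."]` keeps only those whose `value` attribute matches. Python's built-in `xml.etree.ElementTree` supports a small subset of XPath, including attribute predicates, so the idea can be tried offline. A minimal sketch on a made-up, well-formed snippet (real pages are rarely valid XML, so this is only for experimenting with the syntax); note that ElementTree needs a leading `.` before `//`.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup standing in for part of a search form
snippet = """<form>
  <input name="searchText" value=""/>
  <input type="submit" value="Find a Table"/>
  <input type="submit" value="Cancel"/>
</form>"""

root = ET.fromstring(snippet)
# Any <input> below the root whose value attribute is exactly "Find a Table"
buttons = root.findall(".//input[@value='Find a Table']")
print(len(buttons), buttons[0].get("type"))
# prints 1 submit
```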

Now, let's loop through all of the restaurants that appear in the results. Note: at this point, you could switch back to Beautiful Soup, which is slightly simpler to use.

In [10]:
spans = driver.find_elements_by_xpath('//span[@class="rest-row-name-text"]')
for span in spans[:10]:
    print(span.text)

Sepia
Sepia
Bar Siena
Leña Brava
Maude's Liquor Bar
Nosh & Booze
Umami Burger - West Loop
Bar Takito - West Loop
Formento's
Jaipur
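Here is one way to do that "switch back" step. Once the results have loaded, `driver.page_source` holds the rendered HTML, which an ordinary parser can read without driving the browser further. The sketch below swaps in the standard library's `html.parser` for Beautiful Soup so it runs without extra installs. The class name `rest-row-name-text` comes from the XPath above; the sample HTML and the `RestaurantNameParser` name are made up for illustration.

```python
from html.parser import HTMLParser

class RestaurantNameParser(HTMLParser):
    """Collect the text of <span class="rest-row-name-text"> elements."""

    def __init__(self):
        super().__init__()
        self.names = []
        self._in_name = False

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; this is an exact class match, like @class= in XPath
        if tag == "span" and ("class", "rest-row-name-text") in attrs:
            self._in_name = True
            self.names.append("")

    def handle_endtag(self, tag):
        if tag == "span":
            self._in_name = False

    def handle_data(self, data):
        if self._in_name:
            self.names[-1] += data

# Hypothetical stand-in for driver.page_source
page_source = """
<div><span class="rest-row-name-text">Sepia</span> <span class="price">$$$</span></div>
<div><span class="rest-row-name-text">Bar Siena</span></div>
"""

parser = RestaurantNameParser()
parser.feed(page_source)
print(parser.names)  # ['Sepia', 'Bar Siena']
```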


You would use only slightly more complicated code to, for example:

* Get the number of "dollar signs" for each restaurant and print that out.
* Get a list of restaurants that had reservations available at exactly 8:00 PM.
* Print out the rating, out of five stars, of each restaurant.

Each of these would take a bit of time and experimentation to figure out, but hopefully you see that it is possible.
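Once each result has been scraped into plain Python data, the analysis itself is short. The sketch below assumes you have already extracted, for each restaurant, its name, its price text (such as `"$$$"`) and the `data-datetime` values of its available time slots (the attribute the reservation step relies on). The field names and sample records are invented for illustration, not real OpenTable data.

```python
# Hypothetical records, as they might look after scraping each result row
restaurants = [
    {"name": "Restaurant A", "price": "$$$", "slots": ["2017-08-12 19:45", "2017-08-12 20:00"]},
    {"name": "Restaurant B", "price": "$$", "slots": ["2017-08-12 20:15"]},
    {"name": "Restaurant C", "price": "$$$$", "slots": ["2017-08-12 20:00"]},
]

def dollar_count(price_text):
    """Number of '$' characters in the price text, e.g. '$$$' -> 3."""
    return price_text.count("$")

def available_at(restaurant, slot):
    """True if `slot` ('YYYY-MM-DD HH:MM') is among the restaurant's available slots."""
    return slot in restaurant["slots"]

for r in restaurants:
    print(r["name"], dollar_count(r["price"]))

at_eight = [r["name"] for r in restaurants if available_at(r, "2017-08-12 20:00")]
print(at_eight)  # ['Restaurant A', 'Restaurant C']
```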

Let's click on "Bar Siena":

In [None]:
for span in spans:
    if span.text == "Bar Siena":
        span.click()
        break  # stop once the restaurant has been found and clicked

Uh oh. We opened a new tab, but our driver is still on the old tab, so 
```
driver.find_element_by_xpath('//p[@class="readmore"]')
```
for example, won't work!

We can use the following to switch the driver to the correct window:

In [None]:
# Move the driver's focus to the newly opened tab (switch_to_window was removed in Selenium 4)
driver.switch_to.window(driver.window_handles[1])

In [None]:
description_element = driver.find_element_by_xpath('//p[@class="readmore"]')
description_text = description_element.text.replace('\n', ' ')
description_text

Finally, let's make a reservation!

In [None]:
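# data-datetime must match a time slot shown on the page for the searched date (08-12-2017, 8:00 PM)
reservation_element = driver.find_element_by_xpath('//a[@data-datetime="2017-08-12 20:00"]')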
reservation_element.click()
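Rather than hand-typing the timestamp, the XPath can be built from the same date and time strings used for the search, so the two cannot drift apart. A sketch, assuming `data-datetime` values always use the `YYYY-MM-DD HH:MM` format seen above; `slot_xpath` is a hypothetical helper.

```python
from datetime import datetime

def slot_xpath(date_string, time_string, date_form="%m-%d-%Y", time_form="%I:%M %p"):
    """Build an XPath for the reservation link at the given date and time."""
    # Combine e.g. '08-12-2017' and '8:00 PM' into a single datetime
    dt = datetime.strptime(date_string + " " + time_string, date_form + " " + time_form)
    return '//a[@data-datetime="{}"]'.format(dt.strftime("%Y-%m-%d %H:%M"))

print(slot_xpath("08-12-2017", "8:00 PM"))
# prints //a[@data-datetime="2017-08-12 20:00"]
```

In the notebook, `driver.find_element_by_xpath(slot_xpath(date_string, "8:00 PM"))` would then look up the link for the searched slot.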

In [None]:
el = driver.find_element_by_xpath('//a[@id="ReservationChangeUserLink"]')
el.click()

You would then enter your email and password and log in. Websites deliberately make this step hard to automate, even with tools like Selenium.

Finally, the code below clicks the button that confirms the reservation!

In [None]:
final_button = driver.find_element_by_xpath('//button[@id="btn-complete"]')
final_button.click()

# Selenium References

- Documentation on locating elements: http://selenium-python.readthedocs.org/en/latest/locating-elements.html
- XPath tutorial: http://www.w3schools.com/xpath/xpath_syntax.asp