# Selenium Bootcamp

## Why Selenium?

Most of the time, scraping tools like regular expressions or BeautifulSoup are enough for pulling data out of a website. However, some websites handle things a little differently. Let's take a look. Run the following two cells:

In [None]:
import os

In [None]:
# "open" is the macOS command; use "start" on Windows or "xdg-open" on Linux
os.system("open my_website.html")

### Let's try downloading the site and scraping the data. 

In [None]:
from urllib.request import urlopen

# Build a file:// URL for the local page and read it as text (assumes UTF-8)
url = "file://" + os.getcwd() + "/my_website.html"
html = urlopen(url).read().decode("utf-8")

### Seems to work fine. Let's try scraping some data from it.

In [None]:
import re

In [None]:
static_data = re.findall(r'<td class = "static_input">(.+?)<\/td><td class = "static_output">(.+?)<\/td>', html)
static_data

### Looks good! But I think we have some data missing... Not a problem, let's try to scrape it.

In [None]:
dynamic_data = re.findall(r'<td class = "dynamic_input">(.+?)<\/td><td class = "dynamic_output">(.+?)<\/td>', html)
dynamic_data

### What went wrong?
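
One likely explanation, assuming the dynamic rows on the page are written by JavaScript after it loads: `urlopen` only returns the raw source, and the script never runs, so the cells we are looking for are not in `html` at all. Here is a minimal sketch of that situation. The `raw_html` string is a made-up stand-in, not the actual contents of `my_website.html`.

```python
import re

# Hypothetical stand-in for my_website.html: one static row in the markup,
# plus a script that would add a dynamic row once a browser runs it.
raw_html = (
    '<table id="data">'
    '<tr><td class = "static_input">1</td><td class = "static_output">2</td></tr>'
    '</table>'
    '<script>'
    'var row = document.getElementById("data").insertRow();'
    'row.insertCell().className = "dynamic_input";'
    'row.insertCell().className = "dynamic_output";'
    '</script>'
)

# Same patterns as in the cells above, applied to the raw source
static_rows = re.findall(
    r'<td class = "static_input">(.+?)</td><td class = "static_output">(.+?)</td>',
    raw_html,
)
dynamic_rows = re.findall(
    r'<td class = "dynamic_input">(.+?)</td><td class = "dynamic_output">(.+?)</td>',
    raw_html,
)

print(static_rows)   # [('1', '2')]
print(dynamic_rows)  # []
```

The static row is in the source, so the regex finds it. The dynamic row only exists once a browser has executed the script, which is exactly the part a plain download skips.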

## Cases where you might need Selenium
* Data is generated via interaction, e.g. searching, clicking "more", etc.
* Data is generated via AJAX requests
* Website requires a login of some kind
* Dealing with the HTML parsing and regex is just too damn annoying

## Download Instructions

1. Install Selenium for Python: `python3 -m pip install selenium`. [Full Instructions](https://selenium-python.readthedocs.io/installation.html)

2. [Install the Chrome WebDriver](https://sites.google.com/a/chromium.org/chromedriver/downloads). Pick the ChromeDriver version that matches your installed version of Chrome.

3. Unzip the download and move the resulting `chromedriver` file to this folder.
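
If you want to double-check step 3 before going on, a quick sketch like the one below can help. The helper name `find_chromedriver` is made up for this example, and it assumes the file is still called `chromedriver` (or `chromedriver.exe` on Windows).

```python
from pathlib import Path


def find_chromedriver(folder="."):
    """Return the chromedriver path in folder, or None if it is missing."""
    for name in ("chromedriver", "chromedriver.exe"):
        candidate = Path(folder) / name
        if candidate.is_file():
            return candidate
    return None


# Check the folder this notebook is running in
print(find_chromedriver() or "chromedriver not found in this folder")
```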


### Great! Now let's get started

In [None]:
from selenium import webdriver

In [None]:
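from selenium.webdriver.chrome.service import Service

# Selenium 4 takes the driver path through a Service object; passing the
# path directly to webdriver.Chrome() was deprecated and later removed
driver = webdriver.Chrome(service=Service("./chromedriver"))

# for Windows users
# driver = webdriver.Chrome(service=Service("chromedriver.exe"))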
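
With a driver running, we can go back to the page from the start of the notebook. This is a rough sketch, not the only way to do it: the helper names `local_page_url` and `scrape_rendered_rows` are made up here, and the class names `dynamic_input` and `dynamic_output` are taken from the regex we tried earlier. It also assumes the dynamic cells are already on the page by the time we look for them.

```python
from pathlib import Path


def local_page_url(filename="my_website.html"):
    """Return a file:// URL for a page stored next to this notebook."""
    return Path(filename).resolve().as_uri()


def scrape_rendered_rows(driver, url):
    """Load url in the browser and pair up the dynamic input/output cells."""
    # Imported here so local_page_url works even without Selenium installed
    from selenium.webdriver.common.by import By

    driver.get(url)
    # The browser has run the page's scripts, so we query the rendered page
    inputs = driver.find_elements(By.CLASS_NAME, "dynamic_input")
    outputs = driver.find_elements(By.CLASS_NAME, "dynamic_output")
    return [(i.text, o.text) for i, o in zip(inputs, outputs)]


# Usage, assuming `driver` is the Chrome driver created above:
# dynamic_data = scrape_rendered_rows(driver, local_page_url())
# driver.quit()
```

Unlike `urlopen`, the driver hands us the page as the browser sees it, so anything added by JavaScript is available to query.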