# Selenium

> [Table of Contents](../../README.md)

## In This Notebook
- Troubleshoot
- Flow for Selenium 4
- Nasdaq site

## Troubleshoot
- 'Access Denied' help [here](https://stackoverflow.com/questions/33225947/can-a-website-detect-when-you-are-using-selenium-with-chromedriver/52108199#52108199) and [here](https://stackoverflow.com/questions/63972523/selenium-access-denied)
- For Nasdaq, adding the `add_argument()` and `add_experimental_option()` calls from the answers above (listed under [Nasdaq site](#nasdaq-site)) and removing `options.add_argument("--headless")` did the trick. There was no need to modify chromedriver.

## Flow for Selenium 4
```python
# Imports
from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By

# Config options
opts = Options()
# headless to use in WSL2
opts.add_argument("--headless")  # May cause access denied on some sites

# Initialize driver
service = Service(executable_path="/home/sportybutton/drivers/chromedriver.exe")
driver = webdriver.Chrome(service=service, options=opts)

# Retrieve page
driver.get("https://www.selenium.dev/selenium/web/web-form.html")
print('Page title: ' + driver.title)

# Parse using XPath, CSS, or another locator strategy.
# The find_element_by_* helpers were deprecated in Selenium 4 and later removed;
# use find_element(By.<STRATEGY>, value) instead. The selectors are examples only.
try:
    # element = driver.find_element(By.XPATH, "//input[@name='my-text']")
    element = driver.find_element(By.CSS_SELECTOR, "input[name='my-text']")
except NoSuchElementException:
    pass

# Close out the driver
driver.quit()
```

## Nasdaq site
```python
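# This worked for nasdaq.com
# Imports (repeated so this block runs on its own)
from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By

# Config and initialize driver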
options = Options()
options.add_argument("start-maximized")
options.add_argument('--disable-blink-features=AutomationControlled')
# options.add_argument("--headless")  # NOTE: HEADLESS CAUSES ACCESS DENIED
options.add_experimental_option("excludeSwitches", ["enable-automation"])
options.add_experimental_option('useAutomationExtension', False)
service = Service(executable_path="/home/sportybutton/drivers/chromedriver.exe")
driver = webdriver.Chrome(service=service, options=options)

driver.get("https://www.nasdaq.com/market-activity/stocks/tulip/press-releases")
print('Page title: ' + driver.title)
# Implicit wait: find_element() polls up to 5 s for a match. It does not delay page_source.
driver.implicitly_wait(5)
print('Page source:\n' + driver.page_source)

# Parse page
try:
    # element = driver.find_element(By.XPATH, '')
    element = driver.find_element(By.CSS_SELECTOR, '.pagination__pages')
    print(element.text)  # print the element's visible text rather than the WebElement repr
except NoSuchElementException:
    print('nothing here')
driver.quit()
```

### Troubleshoot Nasdaq site
- Selenium producing incomplete page source
	- Solution (TL;DR): add a wait. Use either implicit or explicit waits, never both.  
	The HTML for this site is injected by JavaScript, so both Scrapy and Selenium can return an incomplete page source. Scrapy on its own does not execute JavaScript, so it never sees the injected content. Selenium drives a real browser and can see it, but it does not know when the DOM has finished updating, so there is a race: Selenium may return the page source before all injections have been made. The fix is to add a wait. Selenium offers implicit and explicit waits; mixing the two can produce unpredictable wait times.
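
The polling idea behind an explicit wait can be sketched in plain Python. This is only an illustration of the technique, not Selenium's implementation: `wait_until` and `FakePage` are hypothetical names invented for this note, and `FakePage` stands in for a page whose content appears only after a few polls.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def wait_until(condition: Callable[[], Optional[T]],
               timeout: float = 10.0,
               poll: float = 0.5) -> T:
    """Call `condition` repeatedly until it returns a truthy value.

    Raises TimeoutError if `timeout` seconds pass without a truthy result.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} s")
        time.sleep(poll)


class FakePage:
    """Hypothetical stand-in for a JS-driven page: content 'appears' after N polls."""

    def __init__(self, ready_after: int) -> None:
        self.polls = 0
        self.ready_after = ready_after

    def find(self) -> Optional[str]:
        self.polls += 1
        return "pagination" if self.polls >= self.ready_after else None


page = FakePage(ready_after=3)
found = wait_until(page.find, timeout=2.0, poll=0.01)
print(found, page.polls)  # pagination 3
```

With a real driver, the condition could be something like `lambda: driver.find_elements(By.CSS_SELECTOR, ".pagination__pages")`, which returns an empty (falsy) list until the element exists. In practice Selenium's own explicit wait does this polling for you, e.g. `WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.CSS_SELECTOR, ".pagination__pages")))`, with `WebDriverWait` from `selenium.webdriver.support.ui` and `EC` being `selenium.webdriver.support.expected_conditions`. If you go this route, drop the `implicitly_wait()` call so the two wait types are not mixed.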