# Selenium
Selenium allows the remote control of a browser, in this case Chrome.

Source: [Notebook from Prof. Joao Sedoc](https://colab.research.google.com/drive/117lYNSQpTV_pNM30oZ9Dubz4w-7iUcoj?usp=sharing#scrollTo=ozL-ock4ewGz)

**Selenium with Python documentation:** https://selenium-python.readthedocs.io

# 🚨 Running Selenium on Google Colab 🚨

**Be careful when running Selenium/doing any scraping on Google Colab!**

- The notebook may time out if the program has been running for a while, meaning you will lose any data that is stored only in memory.

- Google Colab datacenters do not have very high quality IP addresses, which means you'll look more suspicious when making requests from a notebook.
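
One way to limit the damage from a timeout is to write results to disk as they are scraped instead of keeping them only in memory. The following is a minimal sketch using only the standard library; `append_rows`, the file name, and the example rows are hypothetical and not part of the original notebook.

```python
import csv
import os
import tempfile


def append_rows(path, header, rows):
    """Append rows to a CSV file, writing the header only if the file is new or empty."""
    is_new = not os.path.exists(path) or os.path.getsize(path) == 0
    with open(path, "a", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        if is_new:
            writer.writerow(header)
        writer.writerows(rows)


# Demo with a temporary directory and made-up rows.
out_path = os.path.join(tempfile.mkdtemp(), "scraped.csv")
header = ["date", "value"]
append_rows(out_path, header, [["29/09/2024", "3.64%"]])  # first batch
append_rows(out_path, header, [["29/08/2024", "3.46%"]])  # later batch

# Read the file back to confirm both batches were kept.
with open(out_path, newline="", encoding="utf-8") as f:
    saved = list(csv.reader(f))
```

On Colab, the path would need to point somewhere that outlives the runtime (for example a mounted Google Drive folder), since the local disk of a recycled runtime is not kept.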

# What is Selenium?

Many websites are easy to scrape by either:
1. Sending a GET request to the website and then using something like BeautifulSoup or lxml to parse the HTML
2. Finding private APIs to retrieve the data

Some websites rely heavily on JavaScript, so when you send a GET request to the URL, the data you want isn't in the response. When this happens, it's a lot easier to use Selenium.
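
To make the problem concrete, here is a small sketch with a made-up HTML string that mimics what a plain GET request might return for a JavaScript-driven page: the table header is present, but the body is empty because no browser has run the script. The HTML and the `CellCounter` class are illustrative assumptions; only the standard library is used.

```python
from html.parser import HTMLParser

# Made-up page for illustration: the table body is empty in the raw HTML and
# would only be filled in once a browser executes the script.
RAW_HTML = """
<table id="finance-table">
  <thead><tr><th>Calculation date</th><th>Risk free rate</th></tr></thead>
  <tbody></tbody>
</table>
<script>/* a browser would fetch the rows and insert them here */</script>
"""


class CellCounter(HTMLParser):
    """Count header cells (<th>) and data cells (<td>) in an HTML document."""

    def __init__(self):
        super().__init__()
        self.header_cells = 0
        self.data_cells = 0

    def handle_starttag(self, tag, attrs):
        if tag == "th":
            self.header_cells += 1
        elif tag == "td":
            self.data_cells += 1


counter = CellCounter()
counter.feed(RAW_HTML)
print(counter.header_cells, counter.data_cells)  # prints "2 0"
```

The header cells are there, but there is no data to parse, which is the situation where a real browser becomes necessary.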

**Selenium allows you to interact with a website just like a real person would!** It keeps track of all cookies and sessions, which can be especially useful if you need to be logged in to scrape content.
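
As a hedged illustration of how those cookies can be reused: `driver.get_cookies()` returns a list of dictionaries that include `name` and `value` keys, and `requests` accepts a plain `{name: value}` dictionary through its `cookies=` argument. The helper `cookies_to_dict` and the cookie values below are hypothetical.

```python
def cookies_to_dict(selenium_cookies):
    """Convert a list of Selenium cookie dicts into a {name: value} mapping."""
    return {cookie["name"]: cookie["value"] for cookie in selenium_cookies}


# Made-up cookies in the shape returned by driver.get_cookies().
example_cookies = [
    {"name": "sessionid", "value": "abc123", "domain": "example.com", "path": "/"},
    {"name": "csrftoken", "value": "xyz789", "domain": "example.com", "path": "/"},
]

cookie_dict = cookies_to_dict(example_cookies)
print(cookie_dict)  # {'sessionid': 'abc123', 'csrftoken': 'xyz789'}
```

With a live session this could look like `requests.get(url, cookies=cookies_to_dict(driver.get_cookies()))`, so that the login happens once in the browser and later requests use the faster HTTP route. Whether a site accepts this depends on the site.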

## **Selenium vs Raw HTTP Requests**

**Pros of using Selenium for web scraping:**
1. Handles dynamic content: Selenium runs the page's JavaScript, so JavaScript-generated content and page updates are available.
2. Simulates a real browser.
3. Provides a rich API that makes it easy to interact with web pages and perform actions such as clicking buttons or filling out forms.

**Cons of using Selenium for web scraping:**
1. Resource-intensive: it requires running a full instance of a web browser.
2. Slower than HTTP requests: it has to load the entire browser and execute the page's JavaScript.
3. More complex to set up.

**Pros of using HTTP requests for web scraping:**
1. Fast and lightweight.
2. Easy to set up.

**Cons of using HTTP requests for web scraping:**
1. Limited ability to handle dynamic content: HTTP requests do not execute JavaScript or simulate a real browser, so content generated in the browser is missing from the response.



**Explanation of the following code block:**

- We create a Chrome WebDriver instance. This is what allows us to interact with a browser.

- We add a few options to the WebDriver. Notably, we make Selenium **headless**, which means the browser runs without displaying its UI.

- The commented-out line shows how to tell the WebDriver where the chromedriver file is on the Google Colab notebook. The code instead creates a default `ChromeService()`, which leaves locating the driver to Selenium.

In [19]:
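from selenium import webdriver

chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument('--headless')  # run Chrome without a visible UI
chrome_options.add_argument('--no-sandbox')  # needed when Chrome runs as root, as is typical in containers
chrome_options.add_argument('--disable-dev-shm-usage')  # use /tmp instead of the small /dev/shm in containers

# To point at a specific chromedriver binary instead:
# service = webdriver.chrome.service.Service("/usr/bin/chromedriver")
chrome_service = webdriver.ChromeService()

driver = webdriver.Chrome(service=chrome_service, options=chrome_options)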
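
Because each driver starts a full browser process, it is worth closing it when the work is done. The sketch below wraps the same options in two hypothetical helpers, `headless_chrome_args` and `fetch_page_source`, and calls `driver.quit()` in a `finally` block. It assumes a recent Selenium 4 release in which `webdriver.Chrome(options=...)` can find the driver on its own.

```python
def headless_chrome_args():
    """Return the Chrome command-line arguments used in this notebook."""
    return ["--headless", "--no-sandbox", "--disable-dev-shm-usage"]


def fetch_page_source(url):
    """Open url in headless Chrome, return the rendered HTML, and always close the browser."""
    # Imported here so that headless_chrome_args() works without Selenium installed.
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    for arg in headless_chrome_args():
        options.add_argument(arg)

    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        return driver.page_source
    finally:
        # Runs even if driver.get() raises, so no browser process is left behind.
        driver.quit()
```

This pattern suits one-off fetches. For an interactive notebook session, keeping a single `driver` open across cells, as done above, avoids paying the browser start-up cost repeatedly.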

In [3]:
import requests
url = 'https://www.fairness-finance.com/fairness-finance/finance/sample/sp500/product/equityriskpremium.dhtml'

In [4]:
resp = requests.get(url)

In [5]:
from IPython.display import display, HTML
display(HTML(resp.text))


| Calculation date | n Number of companies (1) | C Average market capitalization (M€) | kL Market IRR | rf Risk free rate | πE Market equity risk premium |
| --- | --- | --- | --- | --- | --- |


In [14]:
driver.get('https://www.fairness-finance.com/fairness-finance/finance/sample/sp500/product/equityriskpremium.dhtml')

In [15]:
# The table rows are not in resp.text, but they are in driver.page_source
display(HTML(driver.page_source))

| Calculation date | n Number of companies (1) | C Average market capitalization (M€) | kL Market IRR | rf Risk free rate | πE Market equity risk premium |
| --- | --- | --- | --- | --- | --- |
| 29/09/2024 | 468 | 103 562 | 7.46% | 3.82% | 3.64% |
| 29/08/2024 | 470 | 102 598 | 7.36% | 3.91% | 3.46% |
| 29/07/2024 | 480 | 97 894 | 7.47% | 4.14% | 3.33% |
| 27/06/2024 | 469 | 93 530 | 7.63% | 4.39% | 3.24% |
| 30/05/2024 | 465 | 90 326 | 7.79% | 4.50% | 3.29% |
| 29/04/2024 | 460 | 88 541 | 7.83% | 4.68% | 3.14% |
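
Here the rows were already present when `page_source` was read, but on slower pages the JavaScript may not have finished yet. A common safeguard is an explicit wait. The sketch below uses Selenium's `WebDriverWait` and `expected_conditions`; the helper name `wait_for_table_rows` and the 10-second default are assumptions, and the `finance-table` id is taken from the table markup printed further down.

```python
def wait_for_table_rows(driver, timeout=10):
    """Block until at least one body row of the finance table exists, then return it.

    Raises selenium.common.exceptions.TimeoutException if no row appears in time.
    """
    # Imported here so that defining the helper does not require Selenium.
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    locator = (By.CSS_SELECTOR, "#finance-table tbody tr")
    return WebDriverWait(driver, timeout).until(
        EC.presence_of_element_located(locator)
    )
```

A typical use would be to call `wait_for_table_rows(driver)` right after `driver.get(...)` and before reading `driver.page_source`.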


In [8]:
# find the table
from bs4 import BeautifulSoup
soup = BeautifulSoup(driver.page_source, 'html.parser')

In [9]:
soup.find('table')

<table class="finance-table" id="finance-table">
<thead>
<tr>
<th>Calculation date</th>
<th>n <br/>Number of companies <sup>(1)</sup></th>
<th>
						
							C <br/>Average market capitalization (M€)
						
					</th>
<th><b>k<sub>L</sub></b> <br/>Market IRR</th>
<th><b>r<sub>f</sub></b> <br/>Risk free rate</th>
<th><b>π<sub>E</sub></b> <br/>Market equity risk premium</th>
</tr>
</thead>
<tbody>
<tr><td data-th="Calculation date"><span class="bt-content">29/09/2024</span></td><td data-th="n Number of companies (1)"><span class="bt-content">468</span></td><td data-th="
						
							C Average market capitalization (M€)
						
					"><span class="bt-content">103 562</span></td><td data-th="kL Market IRR"><span class="bt-content">7.46%</span></td><td data-th="rf Risk free rate"><span class="bt-content">3.82%</span></td><td data-th="πE Market equity risk premium"><span class="bt-content">3.64%</span></td></tr><tr><td data-th="Calculation date"><span class="bt-content">29/08/2024</span></td

In [10]:
from IPython.display import display, HTML


In [11]:
# display the table
from io import StringIO
import pandas as pd
pd.read_html(StringIO(driver.page_source))[0]


|  | Calculation date | n Number of companies (1) | C Average market capitalization (M€) | kL Market IRR | rf Risk free rate | πE Market equity risk premium |
| --- | --- | --- | --- | --- | --- | --- |
| 0 | 29/09/2024 | 468 | 103 562 | 7.46% | 3.82% | 3.64% |
| 1 | 29/08/2024 | 470 | 102 598 | 7.36% | 3.91% | 3.46% |
| 2 | 29/07/2024 | 480 | 97 894 | 7.47% | 4.14% | 3.33% |
| 3 | 27/06/2024 | 469 | 93 530 | 7.63% | 4.39% | 3.24% |
| 4 | 30/05/2024 | 465 | 90 326 | 7.79% | 4.50% | 3.29% |
| 5 | 29/04/2024 | 460 | 88 541 | 7.83% | 4.68% | 3.14% |
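
The values in this table are still display strings: percentages carry a `%` sign, market capitalizations use a space as thousands separator, and dates are written day/month/year. The following is a minimal sketch of converting them to numeric and date types. The three helper names are hypothetical, and only the standard library is used.

```python
from datetime import date, datetime


def parse_percent(text):
    """Convert a string such as '3.64%' into a fraction (about 0.0364)."""
    return float(text.strip().rstrip("%")) / 100


def parse_spaced_int(text):
    """Convert a string such as '103 562' into the integer 103562."""
    # str.split() with no arguments also splits on non-breaking spaces.
    return int("".join(text.split()))


def parse_dmy(text):
    """Convert a day/month/year string such as '29/09/2024' into a date."""
    return datetime.strptime(text.strip(), "%d/%m/%Y").date()


# First row of the table above.
calc_date = parse_dmy("29/09/2024")
market_cap = parse_spaced_int("103 562")
risk_premium = parse_percent("3.64%")
print(calc_date, market_cap)  # prints "2024-09-29 103562"
```

With pandas, the same functions could be applied column by column, for example `df["rf Risk free rate"].map(parse_percent)`.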


In [18]:
BeautifulSoup(driver.page_source, 'html.parser')

<html lang="en"><head>
<meta content="IE=edge" http-equiv="X-UA-Compatible"/>
<meta content="width=device-width, initial-scale=1" name="viewport"/>
<!-- The above 3 meta tags *must* come first in the head; any other head content must come *after* these tags -->
<meta content="
					Equity Risk Premium S&amp;P 500
				" name="keywords"/>
<meta content="
					
					The market risk premium reflects the additional return required by investors in excess of the risk-free rate. The ERP is essential for the calculation of discount rates and derived from the CAPM. It stems from the IRR which equalizes the discounted present value of forecast cash flow and the current share price.&lt;br/&gt;Details on the concepts and methodology, along with some examples and a glossary, are provided in the site's methodology section, particularly methodological notes 1, 2, 3 and 5. 
				" name="description"/>
<link href="/fairness-finance/custom/ebiz/icon/favicon.ico" rel="icon"/>
<title>
					Equity Risk Premi