# Selenium

This code was prepared for the article: [Selenium introduction: A web scraping tools in practice](https://www.forloop.ai/blog/selenium).

In [52]:
from selenium import webdriver
from selenium.webdriver.common.by import By

### Open the 1st article

In [30]:
# Create a new instance of the Firefox driver
driver = webdriver.Firefox()

# Go to the forloop.ai blog page
driver.get("https://www.forloop.ai/blog")

In [19]:
# Find the first article link
first_article = driver.find_element(By.CLASS_NAME, "article-item")

# Click the article link
first_article.click()

### Extract all articles

In [31]:
# Create a new instance of the Firefox driver
driver = webdriver.Firefox()

# Go to the forloop.ai blog page
driver.get("https://www.forloop.ai/blog")

# Find all article cards on the page
articles = driver.find_elements(By.CSS_SELECTOR, "div.article-item")

# Extract the title, tag, and date of each article
for item in articles:
    title = item.find_element(By.CSS_SELECTOR, "h4").text
    tag = item.find_element(By.CSS_SELECTOR, "div.text-white").text
    date = item.find_element(By.CSS_SELECTOR, "div.blog-post-date").text
    print(f'Title: {title}\nTag: {tag}\nDate: {date}\n---')

driver.quit()

Title: Scrapy introduction: A web scraping tools in practice
Tag: Tutorial
Date: July 7, 2023
---
Title: Puppeteer Python API introduction: A web scraping tools in practice
Tag: Tutorial
Date: June 16, 2023
---
Title: Beautiful Soup introduction: A web scraping tools in practice
Tag: Tutorial
Date: June 2, 2023
---
Title: This is the Data: A Mandalorian Guide to Web Scraping Best Practices
Tag: Tutorial
Date: May 31, 2023
---
Title: Real Estate Crowdfunding platform leads generation strategies.
Tag: Business
Date: May 25, 2023
---
Title: Navigating Regulatory Challenges in Real Estate Crowdfunding
Tag: Business
Date: May 25, 2023
---
Title: The Role of AI in Real Estate Crowdfunding
Tag: Business
Date: May 3, 2023
---
Title: How to Start Investing as a Non-Accredited Investors in Real Estate Crowdfunding Platforms
Tag: Business
Date: April 26, 2023
---
Title: The Future of Real Estate Crowdfunding: Market Trends and Opportunities
Tag: Business
Date: April 19, 2023
---
Title: Challenges
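
The dates above are scraped as plain strings such as `July 7, 2023`. To sort or filter articles, it may help to convert them into `datetime.date` objects. The following is a minimal sketch using only the standard library; `parse_blog_date` is a hypothetical helper name, and the `%B` directive assumes English month names (the default C locale).

```python
from datetime import date, datetime


def parse_blog_date(text: str) -> date:
    """Parse a date string such as 'July 7, 2023' into a datetime.date."""
    # %B = full month name, %d = day of month, %Y = four-digit year
    return datetime.strptime(text.strip(), "%B %d, %Y").date()


# Sample values copied from the output above
scraped = ["July 7, 2023", "June 16, 2023", "May 31, 2023"]
parsed = [parse_blog_date(s) for s in scraped]

# The most recent article date, in ISO format
print(max(parsed).isoformat())  # prints 2023-07-07
```

Once parsed, the values compare and sort chronologically, which the raw strings do not.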

### Tips & Tricks

**Running in Headless Mode**

In [37]:
from selenium import webdriver
from selenium.webdriver.firefox.options import Options


options = Options()
# Firefox uses the plain headless flag; "--headless=new" is a Chrome option
options.add_argument("-headless")

driver = webdriver.Firefox(options=options)
driver.get("https://www.forloop.ai/blog")

driver.quit()

**Employ Explicit Waits**

In [38]:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC


driver = webdriver.Firefox()
driver.get("https://www.forloop.ai/blog")

try:
    element = WebDriverWait(driver, 10).until(
        EC.presence_of_element_located((By.CSS_SELECTOR, "div.article-item"))
    )
finally:
    driver.quit()
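
Conceptually, an explicit wait keeps re-checking a condition until it returns a truthy value or a timeout expires. The sketch below illustrates that polling idea in plain Python so it can run without a browser. `wait_until` is a hypothetical helper written for this illustration and is not Selenium's implementation; in real scripts, `WebDriverWait` with an expected condition does this job.

```python
import time


def wait_until(condition, timeout=10.0, poll_interval=0.5):
    """Call condition() repeatedly until it is truthy or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} s")
        time.sleep(poll_interval)


# Stand-in condition that only succeeds on the third check,
# mimicking an element that appears after the page finishes loading
attempts = []


def ready():
    attempts.append(1)
    return "found" if len(attempts) >= 3 else None


print(wait_until(ready, timeout=2, poll_interval=0.01))  # prints found
```

The advantage over a fixed `time.sleep` is that the script continues as soon as the condition holds, and fails loudly when it never does.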

**Combining Selenium with BeautifulSoup**

In [46]:
from urllib.parse import urljoin

from selenium import webdriver
from bs4 import BeautifulSoup


url = "https://www.forloop.ai/blog"

driver = webdriver.Firefox()
driver.get(url)

# Hand the rendered HTML to BeautifulSoup, then close the browser
soup = BeautifulSoup(driver.page_source, "html.parser")
driver.quit()

articles = soup.find_all('div', {'class': 'article-item'})

# Create an empty list to store the data
data = []

# Iterate through each article element
for article in articles:
    title = article.find('h4').text
    date = article.find(class_='blog-post-date').text
    link = article.find('a')['href']

    # Append the data as a dictionary to the list
    data.append({'title': title, 'date': date, 'link': link})

# Present all data; the hrefs are site-relative, so resolve them against the page URL
for item in data:
    print(f"Title: {item['title']}")
    print(f"Release date: {item['date']}")
    print(f"Link: {urljoin(url, item['link'])}\n---")

Title: Scrapy introduction: A web scraping tools in practice
Release date: July 7, 2023
Link: https://www.forloop.ai/blog/scrapy
---
Title: Puppeteer Python API introduction: A web scraping tools in practice
Release date: June 16, 2023
Link: https://www.forloop.ai/blog/puppeteer
---
Title: Beautiful Soup introduction: A web scraping tools in practice
Release date: June 2, 2023
Link: https://www.forloop.ai/blog/beautiful-soup
---
Title: This is the Data: A Mandalorian Guide to Web Scraping Best Practices
Release date: May 31, 2023
Link: https://www.forloop.ai/blog/scraping-best-practices
---
Title: Real Estate Crowdfunding platform leads generation strategies.
Release date: May 25, 2023
Link: https://www.forloop.ai/blog/real-estate-crowdfunding-platform-leads-generation
---
Title: Navigating Regulatory Challenges in Real Estate Crowdfunding
Release date: May 25, 2023
Link: https://www.forloop.ai/blog/real-estate-regulatory-challenges
---
Title: The Role of 
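
A list of dictionaries like `data` is easy to persist. As a hedged illustration, the sketch below serializes such rows to CSV with the standard library. `to_csv` is a hypothetical helper, and the sample row is reconstructed from the output above rather than scraped live.

```python
import csv
import io


def to_csv(rows, fieldnames=("title", "date", "link")):
    """Serialize a list of dicts to CSV text, header row first."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# One sample row with the same keys as the scraped records
sample = [
    {
        "title": "Scrapy introduction: A web scraping tools in practice",
        "date": "July 7, 2023",
        "link": "/blog/scrapy",
    }
]

print(to_csv(sample))
```

The `csv` module quotes the date automatically because it contains a comma, so the file stays parseable. To write to disk instead, the same `DictWriter` can wrap a file opened with `newline=""`.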

**Implement Error Handling**

In [49]:
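from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.common.by import By


driver = webdriver.Firefox()
driver.get("https://www.forloop.ai/blog")

try:
    # This selector does not exist on the page, so the lookup raises
    element = driver.find_element(By.CSS_SELECTOR, "div.article-title")
except NoSuchElementException:
    print("Element not found")
finally:
    # Close the browser whether or not the element was found
    driver.quit()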

Element not found


**Use Page Object Model (POM)**

In [51]:
from selenium import webdriver
from selenium.webdriver.common.by import By


class BlogPage:
    def __init__(self, driver):
        self.driver = driver
        # Locate the article cards once, when the page object is created
        self.articles = self.driver.find_elements(By.CSS_SELECTOR, 'div.article-item')


driver = webdriver.Firefox()
driver.get("https://www.forloop.ai/blog")


blog_page = BlogPage(driver)
print(len(blog_page.articles))  # prints the number of articles

driver.quit()

23
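
One possible variation on the page object is to keep locators as class attributes and look elements up lazily, so every access reflects the current state of the page. The sketch below assumes that design. `LazyBlogPage` and `FakeDriver` are hypothetical names, and the fake driver stands in for a real browser so the example runs anywhere. The locator strategy is written as the string `"css selector"`, which is the value of Selenium's `By.CSS_SELECTOR`.

```python
CSS = "css selector"  # same string value as selenium's By.CSS_SELECTOR


class LazyBlogPage:
    """Page object that stores locators and resolves them on each access."""

    ARTICLES = (CSS, "div.article-item")

    def __init__(self, driver):
        self.driver = driver

    @property
    def articles(self):
        # Look the elements up on every access instead of caching them
        return self.driver.find_elements(*self.ARTICLES)

    def article_count(self):
        return len(self.articles)


class FakeDriver:
    """Minimal stand-in exposing only find_elements, for demonstration."""

    def __init__(self, elements):
        self._elements = elements

    def find_elements(self, by, value):
        if (by, value) == LazyBlogPage.ARTICLES:
            return list(self._elements)
        return []


page = LazyBlogPage(FakeDriver(["a", "b", "c"]))
print(page.article_count())  # prints 3
```

With a real driver, `LazyBlogPage(driver)` would work the same way, since it only calls `driver.find_elements(by, value)`.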
