
📊 Enterprise Data Scraping and Email Validation

📌 Project Overview

This project automates the collection of enterprise-related data from multiple sources using Python and Selenium for web scraping. It includes validating email addresses, extracting comments from a Facebook post, and gathering enterprise information (e.g., name, address, SIREN) from two French business directories. The scraped data is saved as CSV files for further analysis.


📂 Data Sources

  • VerifyEmailAddress.org — email address validation
  • Facebook — comments from a target post
  • Le Figaro business directory — enterprise name, address, SIREN
  • Manageo — complementary enterprise information

Chrome Setup

Configure and initialize Chrome WebDriver for scraping.

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.chrome.options import Options
from webdriver_manager.chrome import ChromeDriverManager

# Launch Chrome maximized; webdriver-manager downloads a driver
# version that matches the installed browser automatically.
chrome_options = Options()
chrome_options.add_argument("--start-maximized")
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()), options=chrome_options)
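Once the driver has collected enterprise records, they can be written to CSV with pandas (already listed in the requirements). A minimal sketch — the field names and sample rows below are illustrative, not the repository's actual schema:

```python
import pandas as pd

# Hypothetical records; in the real script these come from the
# scraped directory pages (name, address, SIREN).
records = [
    {"name": "Example SARL", "address": "1 rue de la Paix, Paris", "siren": "123456789"},
    {"name": "Demo SA", "address": "5 avenue Foch, Lyon", "siren": "987654321"},
]

df = pd.DataFrame(records)
df.to_csv("enterprises.csv", index=False)
print(len(df))  # 2
```

Writing through a DataFrame rather than raw file I/O keeps quoting and escaping of commas in addresses correct.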

📦 Requirements

pip install selenium webdriver-manager pandas requests

▶️ How to Run

  1. Clone the repository:
    git clone https://github.com/ali27kh/Python_Web_Scraping.git
    cd Python_Web_Scraping
  2. Install dependencies.
  3. Run the scraping script.

📌 Key Insights

  • Email validation against VerifyEmailAddress.org ensures reliable contact information.
  • Comments are scraped from a target Facebook post.
  • Le Figaro and Manageo provide complementary enterprise data.
  • Selenium automation handles dynamic, JavaScript-rendered content reliably.
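Before querying an external service such as VerifyEmailAddress.org, a cheap syntactic pre-check can filter out obviously malformed addresses. A minimal sketch using Python's `re` — the pattern is a simplified illustration, not the repository's actual validator, and it does not replace the external lookup:

```python
import re

# Basic local-part@domain.tld pattern; intentionally simplified,
# it does not cover every RFC 5322 edge case.
EMAIL_RE = re.compile(r"^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$")

def looks_like_email(address: str) -> bool:
    """Return True when the address matches the basic email pattern."""
    return bool(EMAIL_RE.match(address))

print(looks_like_email("contact@example.fr"))  # True
print(looks_like_email("not-an-email"))        # False
```

Rejecting malformed strings locally saves one network round-trip per bad address.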

📜 License

MIT License
