This project automates the collection of enterprise-related data from multiple sources using Python and Selenium for web scraping. It includes validating email addresses, extracting comments from a Facebook post, and gathering enterprise information (e.g., name, address, SIREN) from two French business directories. The scraped data is saved as CSV files for further analysis.
- Email Validation: addresses are checked against VerifyEmailAddress.org (see the sketch after the WebDriver setup below).
- Facebook Comments: extracted from a specific post on MBI Network.
- Enterprise Info: gathered from the Le Figaro and Manageo business directories (name, address, SIREN); a CSV-writing sketch follows the setup steps below.
Configure and initialize the Chrome WebDriver for scraping:

```python
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.chrome.options import Options
from webdriver_manager.chrome import ChromeDriverManager

# Launch Chrome maximized; webdriver-manager downloads a matching ChromeDriver.
chrome_options = Options()
chrome_options.add_argument("--start-maximized")
driver = webdriver.Chrome(
    service=Service(ChromeDriverManager().install()),
    options=chrome_options,
)
```
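With the driver in place, an email check against VerifyEmailAddress.org can be driven as in the minimal sketch below. The form field name and result selector are assumptions about the page's markup, not confirmed identifiers.

```python
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

def check_email(driver, email: str) -> str:
    """Submit an address to VerifyEmailAddress.org and return the verdict text."""
    driver.get("https://www.verifyemailaddress.org/")
    # Hypothetical field name; inspect the live form before relying on it.
    field = WebDriverWait(driver, 10).until(
        EC.presence_of_element_located((By.NAME, "email"))
    )
    field.clear()
    field.send_keys(email)
    field.submit()
    # Hypothetical result container holding the validity verdict.
    result = WebDriverWait(driver, 10).until(
        EC.presence_of_element_located((By.CSS_SELECTOR, ".result"))
    )
    return result.text
```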
- Clone the repository:
```bash
git clone https://github.com/ali27kh/Python_Web_Scraping.git
cd Python_Web_Scraping
```
- Install the dependencies:
```bash
pip install selenium webdriver-manager pandas requests
```
- Run the scraping script.
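For reference, the directory scrapers follow one common pattern: collect a record per enterprise, then write the batch to CSV with pandas. The row and field selectors below are illustrative placeholders, not Le Figaro's or Manageo's actual markup.

```python
import pandas as pd
from selenium.webdriver.common.by import By

def scrape_directory(driver, url: str, out_path: str) -> None:
    """Collect name/address/SIREN rows from a directory page and save them as CSV."""
    driver.get(url)
    records = []
    # Placeholder selectors; each directory uses its own listing markup.
    for row in driver.find_elements(By.CSS_SELECTOR, ".company-row"):
        records.append({
            "name": row.find_element(By.CSS_SELECTOR, ".name").text,
            "address": row.find_element(By.CSS_SELECTOR, ".address").text,
            "siren": row.find_element(By.CSS_SELECTOR, ".siren").text,
        })
    pd.DataFrame(records).to_csv(out_path, index=False)
```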
- Email validation against VerifyEmailAddress.org ensures contact information is reliable.
- Facebook comments are scraped directly from the post page.
- Le Figaro and Manageo provide complementary enterprise data.
- Selenium automation handles dynamic web content robustly (sketched below).
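As one illustration of that last point, lazily loaded content such as Facebook comments can be surfaced by scrolling and pausing until the page stops growing. The comment selector below is an assumption, not Facebook's real markup.

```python
import time
from selenium.webdriver.common.by import By

def load_all_comments(driver, max_scrolls: int = 10):
    """Scroll to the bottom repeatedly until no new content loads, then grab comments."""
    last_height = driver.execute_script("return document.body.scrollHeight")
    for _ in range(max_scrolls):
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(2)  # crude pause; an explicit wait is more robust
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            break
        last_height = new_height
    # Hypothetical selector for rendered comment elements.
    return driver.find_elements(By.CSS_SELECTOR, "[data-testid='comment']")
```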
MIT License