E-Commerce Price Scraper for European Markets
I built this automated web scraper to collect product prices from e-commerce sites across nine European countries. The goal was to create a reliable tool that could handle dynamic JavaScript content and various anti-scraping measures.
Technologies & Features Core Logic: Python
Browser Automation: Selenium WebDriver (with Firefox/GeckoDriver)
Data Processing & Export: Pandas
Concurrency: Python's threading and queue modules
Key Features:
Scalable to hundreds of URLs across multiple countries.
A persistent browser pool architecture for high stability and performance.
Handles complex popups and cookie banners in multiple languages.
Automates data export to a multi-sheet Excel report, categorized by country.
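
For reference, the multi-sheet export boils down to pandas' ExcelWriter with one sheet per country. This is a minimal sketch, not a verbatim excerpt from the script; the results_by_country dict and its contents are illustrative.

```python
import pandas as pd  # assumes openpyxl is also installed for xlsx output

def export_report(results_by_country, path="price_report.xlsx"):
    """Write one Excel sheet per country (xlsx caps sheet names at 31 chars)."""
    with pd.ExcelWriter(path, engine="openpyxl") as writer:
        for country, rows in results_by_country.items():
            pd.DataFrame(rows).to_excel(writer, sheet_name=country[:31], index=False)

export_report({
    "Germany": [{"url": "https://example.de/p/1", "price": "19.99 EUR"}],
    "France":  [{"url": "https://example.fr/p/1", "price": "21.49 EUR"}],
})
```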
Key Challenges & My Solutions
Initial Instability
The first version of the scraper was unstable, with frequent browser driver crashes. I solved this by migrating the automation from Chrome to Firefox and implementing a robust cleanup process that fully terminates old processes and clears temporary profiles before each run.
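
A pre-run cleanup of this kind might look like the sketch below. It assumes psutil is available (pip install psutil) and targets geckodriver's rust_mozprofile temp directories; the script's actual cleanup differs in detail.

```python
import shutil
import tempfile
from pathlib import Path

import psutil  # assumed dependency, not part of the core scraper

def cleanup_stale_sessions():
    # Kill leftover driver/browser processes from a previous crashed run.
    # (In practice you may want to match only processes the scraper spawned.)
    for proc in psutil.process_iter(["name"]):
        name = (proc.info["name"] or "").lower()
        if name.startswith(("geckodriver", "firefox")):
            try:
                proc.kill()
            except (psutil.NoSuchProcess, psutil.AccessDenied):
                pass
    # Remove temporary Firefox profiles left behind by geckodriver.
    for profile in Path(tempfile.gettempdir()).glob("rust_mozprofile*"):
        shutil.rmtree(profile, ignore_errors=True)
```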
Poor Performance
Once stable, the scraper was too slow for practical use, with a projected runtime of over 8 hours. I re-architected the core logic from launching a new browser for every URL to using a persistent browser pool. This producer-consumer model reduced the total runtime by over 80% and made the tool viable for the client.
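
In outline, the pool works as sketched below: all URLs are enqueued up front, and a fixed set of worker threads each reuse a single long-lived Firefox instance. This is a simplified sketch of the pattern, not the production code; scrape_price() stands in for the real extraction logic and the URL list is illustrative.

```python
import queue
import threading

from selenium import webdriver
from selenium.webdriver.firefox.options import Options

NUM_WORKERS = 4
urls = [f"https://example.com/product/{i}" for i in range(100)]  # illustrative

url_queue = queue.Queue()
results, results_lock = [], threading.Lock()

def scrape_price(driver):
    # Placeholder for the real price-extraction logic.
    return driver.title

def worker():
    # One long-lived browser per worker, reused for every URL it pulls.
    opts = Options()
    opts.add_argument("--headless")
    driver = webdriver.Firefox(options=opts)
    try:
        while True:
            try:
                url = url_queue.get_nowait()
            except queue.Empty:
                break
            try:
                driver.get(url)
                with results_lock:
                    results.append({"url": url, "price": scrape_price(driver)})
            finally:
                url_queue.task_done()
    finally:
        driver.quit()

for url in urls:  # producer: enqueue everything up front
    url_queue.put(url)
threads = [threading.Thread(target=worker) for _ in range(NUM_WORKERS)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Reusing one browser per worker amortizes the (expensive) startup cost across hundreds of page loads, which is where the bulk of the speedup comes from.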
Dynamic Content & Anti-Scraping
Many sites blocked content with dynamic cookie banners in various languages. I built a flexible handler that uses a comprehensive list of keywords and selectors to reliably dismiss these popups, which resulted in a final success rate of over 95%.
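
The core idea of the handler can be sketched as follows. The keyword list here is a small sample of the full multilingual set, and dismiss_cookie_banner() is an illustrative name rather than the script's actual function.

```python
from selenium.common.exceptions import WebDriverException
from selenium.webdriver.common.by import By

# Small sample of the multilingual consent-keyword set.
CONSENT_KEYWORDS = [
    "accept", "agree",           # English
    "akzeptieren", "zustimmen",  # German
    "accepter",                  # French
    "aceptar",                   # Spanish
    "accetta",                   # Italian
]

def dismiss_cookie_banner(driver):
    """Click the first button whose label matches a consent keyword."""
    for button in driver.find_elements(By.TAG_NAME, "button"):
        label = (button.text or "").strip().lower()
        if any(keyword in label for keyword in CONSENT_KEYWORDS):
            try:
                button.click()
                return True
            except WebDriverException:
                continue  # hidden or stale element; try the next candidate
    return False
```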
Screenshot: final Excel output showing successful data export, separated by country.
How to Run
Ensure all requirements are installed: pip install -r requirements.txt
Place geckodriver.exe in the project directory.
Run the script: python price_scraper.py
To run in test mode (first 20 URLs): python price_scraper.py --test
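
For orientation, the --test flag can be handled with argparse along these lines; load_urls() is a hypothetical stand-in for however the script actually reads its URL list.

```python
import argparse

def load_urls():
    # Hypothetical loader; the real script reads its URL list elsewhere.
    return [f"https://example.com/product/{i}" for i in range(100)]

parser = argparse.ArgumentParser(description="E-commerce price scraper")
parser.add_argument("--test", action="store_true",
                    help="process only the first 20 URLs")
args = parser.parse_args()

urls = load_urls()
if args.test:
    urls = urls[:20]
```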
Architecture
Diagram of the scraper's design: a central queue feeds URLs to multiple browser workers that run in parallel; their combined results are saved to the final Excel file.