- Python 3.10+
- An API key for `scrapegraph_py` (SGAI). Get one from ScrapeGraphAI.
- Clone the repo and enter the folder.
- Install dependencies (choose one):
  ```bash
  # Using uv (recommended if you have uv)
  uv sync

  # Or using pip
  pip install -e .
  ```
- Configure your API key:
  - Option A: Open `main.py` and set your key where the client is initialized.
  - Option B: Set `SGAI_API_KEY` in a `.env` file (see the sketch below).
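A minimal sketch of what the client setup can look like, assuming the standard `scrapegraph_py` `Client` import; the `python-dotenv` usage is only an assumption made to illustrate the `.env` option, and the exact code in `main.py` may differ:

```python
# Sketch only: the real initialization in main.py may look different.
import os

from dotenv import load_dotenv      # assumption: .env loading via python-dotenv
from scrapegraph_py import Client

load_dotenv()  # Option B: pick up SGAI_API_KEY from a local .env file

# Option A: replace the environment lookup with a hard-coded key string.
client = Client(api_key=os.environ["SGAI_API_KEY"])
```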
- Run the scraper: `python main.py`
- Pages: Adjust `TOTAL_PAGES` at the top of `main.py` for the first step.
- Concurrency: Update `CONCURRENCY` to control parallel requests (both constants are shown in the sketch below).
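Roughly, the tuning knobs near the top of `main.py` look like this; only `TOTAL_PAGES` and `CONCURRENCY` are named in this README, the other identifiers and values are hypothetical:

```python
# Illustrative values only.
TOTAL_PAGES = 5      # number of listing result pages to scan in the first step
CONCURRENCY = 10     # maximum number of detail requests running in parallel

LISTING_URL = "https://www.example.com/phones"                  # hypothetical listing URL
LISTING_PROMPT = "List every phone with its product page URL"   # hypothetical prompt
```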
- Configure pagination
  - Set the website URL and prompt to match your requirements.
  - Set `TOTAL_PAGES` to control how many result pages to scan.
- Request listing data
  - Calls `client.smartscraper(...)` with `output_schema=MainSchema` and `total_pages=TOTAL_PAGES` (see the first sketch after this list, which also covers the URL extraction below).
- Extract URLs
  - Iterates `response.result.pages[*].result.phones` and collects the product URLs.
- Helpers
  - `format_value_as_text(...)` normalizes values.
  - `build_row_from_detail(...)` maps a detail response to a CSV row (see the second sketch after this list).
- Async parallel detail scraping
  - `CONCURRENCY` sets max parallelism.
  - `fetch_one(...)` runs `client.smartscraper` in a worker thread via `asyncio.to_thread`.
  - `scrape_all(...)` uses a semaphore and `asyncio.gather` to fetch all URLs (see the last sketch after this list).
- Write CSV
  - Calls `asyncio.run(scrape_all(product_urls))`, then writes rows with `csv.DictWriter`.
- Be mindful of target site policies and rate limits.
- You can change the listing and detail prompts in `main.py` to extract exactly what you need.
- You can also adjust `MainSchema` and the CSV columns (`build_row_from_detail(...)` and `fieldnames`) to fit your data model, as in the sketch below.
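For instance, adding a field usually means touching the schema, the row builder, and the column list together. A hypothetical sketch, assuming `MainSchema` is built from Pydantic models (which is what `output_schema` typically expects); the real model in `main.py` nests results as `response.result.pages[*].result.phones`, so mirror its structure and field names rather than this flat example:

```python
from pydantic import BaseModel

class Phone(BaseModel):
    url: str
    name: str | None = None
    price: str | None = None
    ram: str | None = None    # example of a newly added field

# Keep the CSV side in sync: add "ram" to fieldnames and map it in
# build_row_from_detail(...) so the new column is actually written.
fieldnames = ["url", "name", "price", "ram"]
```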
MIT