
Product Scraper - Quickstart

Scrape product details from a website and extract all the information you need.

Follow this guide to scrape product details from a website and export them as CSV.

Prerequisites

  • Python 3.10+
  • An API key for scrapegraph_py (SGAI). Get one from ScrapeGraphAI.

Setup

  1. Clone the repo and enter the folder.

  2. Install dependencies (choose one):

# Using uv (recommended if you have uv)
uv sync

# Or using pip
pip install -e .
  3. Configure your API key (see the sketch after this list):
  • Option A: Open main.py and set your key where the client is initialized.
  • Option B: Set SGAI_API_KEY in a .env file.
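
A minimal sketch of both options, assuming the scrapegraph_py Client accepts an api_key argument and that python-dotenv is available for the .env route (check the SDK documentation for the exact constructor your version exposes):

import os
from dotenv import load_dotenv  # assumption: python-dotenv is installed
from scrapegraph_py import Client

# Option A: set the key directly where the client is initialized in main.py
client = Client(api_key="sgai-your-key-here")

# Option B: put SGAI_API_KEY=sgai-your-key-here in a .env file,
# load it, and read the key from the environment
load_dotenv()
client = Client(api_key=os.environ["SGAI_API_KEY"])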

Run

python main.py

Tuning

  • Pages: Adjust TOTAL_PAGES at the top of main.py to control how many listing pages the first step scans.
  • Concurrency: Update CONCURRENCY to control how many detail requests run in parallel (see the sketch below).
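
For reference, both knobs are plain constants near the top of main.py; the values below are illustrative defaults, not necessarily the ones shipped in the repo:

TOTAL_PAGES = 5   # listing pages to scan in the first step
CONCURRENCY = 8   # maximum number of detail requests in flight at once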

How it works (code steps)

  1. Configure pagination

    • Sets the website URL and listing prompt (edit these to match your requirements; see the end-to-end sketch after this list).
    • Sets TOTAL_PAGES to control how many result pages to scan.
  2. Request listing data

    • Calls client.smartscraper(...) with output_schema=MainSchema and total_pages=TOTAL_PAGES.
  3. Extract URLs

    • Iterates response.result.pages[*].result.phones and collects URLs.
  4. Helpers

    • format_value_as_text(...) normalizes values.
    • build_row_from_detail(...) maps a detail response to a CSV row.
  5. Async parallel detail scraping

    • CONCURRENCY sets max parallelism.
    • fetch_one(...) runs client.smartscraper in a worker thread via asyncio.to_thread.
    • scrape_all(...) uses a semaphore and asyncio.gather to fetch all URLs.
  6. Write CSV

    • Calls asyncio.run(scrape_all(product_urls)), then writes rows with csv.DictWriter.
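
To make the six steps concrete, the condensed sketch below stitches them together end to end. It assumes Pydantic v2 models, a synchronous scrapegraph_py Client, and responses that are dicts with a top-level result key; the prompts, schema fields, CSV columns, and the exact signatures of fetch_one(...), scrape_all(...), and build_row_from_detail(...) are illustrative, and the authoritative versions live in main.py.

import asyncio
import csv
from pydantic import BaseModel
from scrapegraph_py import Client

TOTAL_PAGES = 5   # step 1: how many listing pages to scan
CONCURRENCY = 8   # step 5: max parallel detail requests

class PhoneEntry(BaseModel):   # hypothetical shape for one listing entry
    name: str
    url: str

class PageResult(BaseModel):
    phones: list[PhoneEntry]

class Page(BaseModel):
    result: PageResult

class MainSchema(BaseModel):   # mirrors the pages -> result -> phones layout
    pages: list[Page]

client = Client(api_key="sgai-your-key-here")

# Step 2: request listing data across TOTAL_PAGES result pages.
listing = client.smartscraper(
    website_url="https://example.com/phones",  # assumption: replace with your target
    user_prompt="List every phone with its name and detail-page URL.",
    output_schema=MainSchema,
    total_pages=TOTAL_PAGES,
)

# Step 3: extract detail-page URLs from the structured response.
parsed = MainSchema.model_validate(listing["result"])  # assumes the result matches MainSchema
product_urls = [phone.url for page in parsed.pages for phone in page.result.phones]

# Step 5: fetch detail pages in parallel, bounded by a semaphore.
async def fetch_one(url: str, sem: asyncio.Semaphore) -> dict:
    async with sem:
        # the SDK client is synchronous, so run it in a worker thread
        return await asyncio.to_thread(
            client.smartscraper,
            website_url=url,
            user_prompt="Extract the product's name, price, and key specs.",
        )

async def scrape_all(urls: list[str]) -> list[dict]:
    sem = asyncio.Semaphore(CONCURRENCY)
    return await asyncio.gather(*(fetch_one(u, sem) for u in urls))

details = asyncio.run(scrape_all(product_urls))

# Step 6: flatten each detail response into a row and write the CSV.
def build_row_from_detail(detail: dict) -> dict:
    result = detail.get("result", {})  # assumption about the detail response shape
    return {"name": result.get("name", ""), "price": result.get("price", "")}

fieldnames = ["name", "price"]
with open("products.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(build_row_from_detail(d) for d in details)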

Notes

  • Be mindful of target site policies and rate limits.

Customize

  • You can change the listing and detail prompts in main.py to extract exactly what you need.
  • You can also adjust MainSchema and the CSV columns (build_row_from_detail(...) and fieldnames) to fit your data model; see the sketch below for an example.
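
For example, to also capture an average rating (a hypothetical extra field, under the same assumptions as the sketch above), you would ask for it in the detail prompt, surface it in build_row_from_detail(...), and add the matching column to fieldnames:

# 1) ask for the extra field in the detail prompt
detail_prompt = "Extract the product's name, price, key specs, and average rating."

# 2) surface it in the CSV row
def build_row_from_detail(detail: dict) -> dict:
    result = detail.get("result", {})
    return {
        "name": result.get("name", ""),
        "price": result.get("price", ""),
        "rating": result.get("rating", ""),  # new column
    }

# 3) add the matching column to the CSV header
fieldnames = ["name", "price", "rating"]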

License

MIT
