Amazon Page Scraper Script: Extract Products, Store in Elasticsearch, and Query Data #4

@lurenss

Description

Create a Python script that extracts product data from the first 20 pages of Amazon's search results for the query "keyboard" (e.g., https://www.amazon.it/s?k=keyboard&page=N, where N is the page number). The script should:

  1. Iterate through pages 1 to 20.
  2. Use the API key: sgai-763dcc80-3a64-417f-b9bf-b98c8f50cc4b for ScrapeGraphAI.
  3. Extract the following product information per page, per product:
    • Name
    • Price
    • Review stars
    • Number of reviews
    • Prime availability
  4. Store the extracted data in Elasticsearch.
  5. After populating the Elasticsearch index, run several interesting queries on the dataset (e.g., top-rated products, most-reviewed products, price distribution, Prime vs. non-Prime products).
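
The steps above could be sketched roughly as follows. This is only an outline under stated assumptions, not the final script: the ScrapeGraphAI call itself is left out, the shape of the raw dict it returns (`name`, `price`, `review_stars`, `review_count`, `prime`) is assumed, and the index name `amazon_keyboards` and the helper names are hypothetical. The action dicts follow the `_index`/`_source` shape that `elasticsearch.helpers.bulk` accepts, and the query bodies are plain Elasticsearch query DSL.

```python
from __future__ import annotations

from dataclasses import asdict, dataclass
from typing import Any, Iterable, Iterator, Optional
from urllib.parse import urlencode

BASE_URL = "https://www.amazon.it/s"  # search URL from the issue text
INDEX_NAME = "amazon_keyboards"       # hypothetical index name


def page_urls(query: str = "keyboard", first: int = 1, last: int = 20) -> Iterator[str]:
    """Yield one search URL per results page, pages first..last inclusive."""
    for page in range(first, last + 1):
        yield f"{BASE_URL}?{urlencode({'k': query, 'page': page})}"


@dataclass
class Product:
    name: str
    price: Optional[float]
    review_stars: Optional[float]
    review_count: Optional[int]
    prime: bool
    page: int


def _cast(value: Any, caster: type) -> Any:
    """Return caster(value), or None when the value is missing or malformed."""
    try:
        return caster(value)
    except (TypeError, ValueError):
        return None


def to_product(raw: dict[str, Any], page: int) -> Product:
    """Normalize one scraped item. The raw field names are an assumption,
    and `prime` is assumed to already be a real boolean."""
    return Product(
        name=str(raw.get("name", "")).strip(),
        price=_cast(raw.get("price"), float),
        review_stars=_cast(raw.get("review_stars"), float),
        review_count=_cast(raw.get("review_count"), int),
        prime=bool(raw.get("prime", False)),
        page=page,
    )


def bulk_actions(products: Iterable[Product], index: str = INDEX_NAME) -> Iterator[dict]:
    """Yield action dicts in the shape expected by elasticsearch.helpers.bulk."""
    for product in products:
        yield {"_index": index, "_source": asdict(product)}


# Query bodies for step 5; each could be passed to a search request on the index.
QUERIES = {
    "top_rated": {
        "size": 5,
        "query": {"range": {"review_count": {"gte": 50}}},
        "sort": [{"review_stars": "desc"}],
    },
    "most_reviewed": {"size": 5, "sort": [{"review_count": "desc"}]},
    "price_distribution": {
        "size": 0,
        "aggs": {"prices": {"histogram": {"field": "price", "interval": 25}}},
    },
    "prime_vs_non_prime": {
        "size": 0,
        "aggs": {"by_prime": {"terms": {"field": "prime"}}},
    },
}
```

The `review_count >= 50` threshold in `top_rated` and the histogram interval of 25 are arbitrary illustrative values. Malformed numeric fields become `None` rather than raising, so a single bad item does not abort a whole page.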

Please include:

  • Robust error handling for possible scraping and storage failures.
  • Clear instructions on dependencies and how to run the script.
  • Example outputs for the queries on the dataset.
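
One possible shape for the error handling requested above is a small retry wrapper with exponential backoff, plus a loop that records failed pages instead of aborting the whole run. This is a sketch under assumptions: `with_retries`, `scrape_all`, and the `scrape_page` callable are hypothetical names, and `scrape_page` stands in for whatever function performs the actual ScrapeGraphAI request for one page.

```python
import logging
import time
from typing import Any, Callable, Iterable, List, Optional, Tuple

log = logging.getLogger("amazon_scraper")  # hypothetical logger name


def with_retries(
    fn: Callable[[], Any],
    attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[Any]:
    """Call fn() up to `attempts` times, doubling the wait after each failure.

    Returns fn()'s result, or None if every attempt raised.
    """
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except Exception as exc:  # broad on purpose: any failure triggers a retry
            log.warning("attempt %d/%d failed: %s", attempt, attempts, exc)
            if attempt < attempts:
                sleep(base_delay * 2 ** (attempt - 1))
    return None


def scrape_all(
    pages: Iterable[int],
    scrape_page: Callable[[int], List[dict]],
    **retry_kwargs: Any,
) -> Tuple[List[dict], List[int]]:
    """Scrape each page with retries; return (all items, pages that failed)."""
    results: List[dict] = []
    failed: List[int] = []
    for page in pages:
        items = with_retries(lambda: scrape_page(page), **retry_kwargs)
        if items is None:
            failed.append(page)  # keep going; report the gap at the end
        else:
            results.extend(items)
    return results, failed
```

The same wrapper could be placed around the Elasticsearch write. If the official Python client is used, `elasticsearch.helpers.bulk` can also be called with `raise_on_error=False`, in which case it returns a success count together with a list of per-document errors that can be logged alongside the failed pages.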

Repository: ScrapeGraphAI/scrapegraph-elasticsearch-demo
