AmazonScraper is a Python web scraping project built with Scrapy that extracts product information from Amazon. It retrieves details such as product names, prices, ratings, and reviews for a given search query.
This project demonstrates how to use Scrapy with ScrapeOps Proxy for web scraping with JavaScript rendering. It provides a basic setup for rendering JavaScript content and routing requests through the ScrapeOps proxy network.
A sample `product_data.csv` file is included to show the kind of data the spider pulls from Amazon.
- Search and scrape product information from Amazon.in.
- Extract product details such as names, prices, ratings, and reviews.
- Save the scraped data to a CSV file for further analysis.
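As a rough illustration of the CSV output, rows like the following could be written with Python's standard `csv` module. The column names and values here are assumptions for illustration only; the bundled `product_data.csv` shows the real schema.

```python
import csv
import io

# Hypothetical rows illustrating the kind of product data the spider collects.
# Column names are assumptions; see the bundled product_data.csv for the real schema.
rows = [
    {"name": "Example Phone", "price": "14,999", "rating": "4.3", "reviews": "1,204"},
    {"name": "Example Headphones", "price": "2,499", "rating": "4.1", "reviews": "865"},
]

buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["name", "price", "rating", "reviews"])
writer.writeheader()
writer.writerows(rows)
csv_text = buffer.getvalue()
print(csv_text)
```

Note that values containing commas (such as prices) are quoted automatically by the `csv` module, so the file stays parseable by pandas or spreadsheet tools.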
- Clone the repository:

  ```
  git clone https://github.com/Blank333/amazonScraper.git
  ```

- Navigate to the project directory:

  ```
  cd amazonScraper
  ```

- Create and activate a virtual environment:

  ```
  python -m venv venv
  source venv/bin/activate   # Linux/Mac
  venv\Scripts\activate      # Windows
  ```

- Install the required dependencies:

  ```
  pip install -r requirements.txt
  ```

  This will install Scrapy, ScrapeOps Proxy, and ipython along with their dependencies.
- Configure Scrapy with ScrapeOps Proxy:
  - Open the `settings.py` file and update the `SCRAPEOPS_API_KEY` variable with your ScrapeOps API key.
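For orientation, a minimal sketch of the relevant `settings.py` entries is shown below. The setting names and middleware path follow the ScrapeOps Scrapy proxy SDK documentation and may differ in your SDK version, so treat the project's own `settings.py` as authoritative.

```python
# settings.py -- sketch only; names follow the ScrapeOps Scrapy proxy SDK
# docs and may differ in your installed SDK version.
SCRAPEOPS_API_KEY = "YOUR_API_KEY"   # replace with your ScrapeOps API key
SCRAPEOPS_PROXY_ENABLED = True       # route requests through the proxy

DOWNLOADER_MIDDLEWARES = {
    "scrapeops_scrapy_proxy_sdk.scrapeops_scrapy_proxy_sdk.ScrapeOpsScrapyProxySdk": 725,
}
```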
- Run the spider:

  ```
  scrapy crawl amazonSpider
  ```
The spider will now use ScrapeOps Proxy to render JavaScript content and route requests through the ScrapeOps proxy network.
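Conceptually, the proxy works by wrapping each target URL in a request to the ScrapeOps API endpoint. A minimal sketch of how such a proxy URL can be built with the standard library follows; the parameter names (`api_key`, `url`, `render_js`) are taken from ScrapeOps' HTTP API documentation, so verify them against the current docs before relying on them.

```python
from urllib.parse import urlencode

SCRAPEOPS_ENDPOINT = "https://proxy.scrapeops.io/v1/"

def build_proxy_url(api_key: str, target_url: str, render_js: bool = True) -> str:
    """Wrap target_url in a ScrapeOps proxy request (sketch; check their docs)."""
    params = {
        "api_key": api_key,
        "url": target_url,
        # render_js asks the proxy to execute JavaScript before returning HTML.
        "render_js": "true" if render_js else "false",
    }
    return SCRAPEOPS_ENDPOINT + "?" + urlencode(params)

proxy_url = build_proxy_url("YOUR_API_KEY", "https://www.amazon.in/s?k=laptop")
print(proxy_url)
```

The target URL is percent-encoded by `urlencode`, so query strings inside it survive the round trip through the proxy endpoint.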
- Adjust the spider logic in `spiders/your_spider.py` to define the specific scraping behavior you need.
- Modify the proxy configuration as per your requirements.
This project is licensed under the MIT License.