Amazon Product Scraper

This project is a Python-based web scraper designed to extract product listings and details from Amazon.in. It uses the requests library for HTTP requests and BeautifulSoup for parsing HTML content.

Features

Product Listing Extraction: Retrieves product name, URL, price, rating, number of reviews, description, ASIN, and manufacturer.
CSV Export: Exports scraped data to a CSV file for further analysis and processing.
Configurable Parameters: The search results URL (and with it the search query) and the number of pages to scrape are set through the url and num_pages variables in scraper.py.

Technologies Used

Python 3.11.3
Requests Library
BeautifulSoup Library

Installation

Clone the repository:

git clone https://github.com/dasdebanna/Amazon-Product-Scraper.git

Navigate to the project directory:
```
cd Amazon-Product-Scraper
```
Install the required libraries:
```
pip install requests beautifulsoup4
```

Usage

Open the scraper.py file.
Set the url variable to the desired Amazon search results page URL:
```
url = 'https://www.amazon.in/s?k=product'
```
Specify the number of pages to scrape by setting the num_pages variable:
```
num_pages = 5
```
Run the script:
```
python scraper.py
```
The scraped product data will be saved in the product_data.csv file.
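
The url and num_pages settings above imply one request per results page. The following is a minimal sketch of how those page URLs could be built. It assumes that Amazon paginates search results with a `page` query parameter. The `build_page_urls` helper and the `BASE_URL` constant are hypothetical names used for illustration and are not taken from scraper.py.

```python
from urllib.parse import urlencode

# Search endpoint taken from the url example above.
BASE_URL = "https://www.amazon.in/s"


def build_page_urls(query, num_pages):
    """Return one search URL per results page, numbered from 1."""
    # Assumption: results are paginated with a `page` query parameter.
    return [
        f"{BASE_URL}?{urlencode({'k': query, 'page': page})}"
        for page in range(1, num_pages + 1)
    ]


for page_url in build_page_urls("wireless mouse", 2):
    print(page_url)
```

`urlencode` escapes spaces and special characters in the query, so the search term can be written as plain text.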

Project Structure

scraper.py: Main script for scraping Amazon product data.
product_data.csv: Output file containing the scraped product data.

Detailed Description

The scraper.py script performs the following steps:

Fetch HTML Content: Uses the requests library to get the HTML content of the Amazon search results page.
Parse HTML Content: Utilizes BeautifulSoup to parse the HTML and extract product details.
Extract Data: Gathers information such as product name, URL, price, rating, number of reviews, description, ASIN, and manufacturer.
Store Data: Stores the extracted data in a list of dictionaries.
Export to CSV: Writes the collected data to a CSV file for easy analysis and processing.

Code Overview

Fetching HTML Content:

import requests
from bs4 import BeautifulSoup
response = requests.get(url)
soup = BeautifulSoup(response.content, 'html.parser')
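
The fetch step above has no timeout and does not check the response status. A more defensive variant might look like the sketch below. The `fetch_soup` helper, the header values, and the 10-second timeout are assumptions chosen for illustration and are not part of scraper.py.

```python
import requests
from bs4 import BeautifulSoup

# Assumption: placeholder browser-like headers; adjust them to your needs.
HEADERS = {
    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64)",
    "Accept-Language": "en-IN,en;q=0.9",
}


def fetch_soup(url, session=None, timeout=10):
    """Download a page and return it parsed, or None if the request fails."""
    session = session or requests.Session()
    try:
        response = session.get(url, headers=HEADERS, timeout=timeout)
        response.raise_for_status()  # turn 4xx/5xx responses into exceptions
    except requests.RequestException as exc:
        print(f"Request failed for {url}: {exc}")
        return None
    return BeautifulSoup(response.content, "html.parser")
```

Passing the session in as a parameter lets several page requests share one connection and makes the function easy to exercise without network access.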

Extracting Data:

product_name = item.find('span', class_='a-size-medium a-color-base a-text-normal').text
product_url = 'https://www.amazon.in' + item.find('a', class_='a-link-normal')['href']
price = item.find('span', class_='a-price-whole').text
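
In the lines above, `find` returns None when an element is missing, so the `.text` access raises AttributeError for any listing that lacks, for example, a price. The sketch below shows one way to guard against that. `SAMPLE_HTML` is made-up markup that only mimics a search result, and `text_or_none` and `parse_item` are hypothetical helpers, not functions from scraper.py.

```python
from bs4 import BeautifulSoup

# Hypothetical markup that mimics one search result; real pages differ and change often.
SAMPLE_HTML = """
<div>
  <a class="a-link-normal" href="/dp/B000TEST01">
    <span class="a-size-medium a-color-base a-text-normal">Sample Mouse</span>
  </a>
  <span class="a-price-whole">1,299</span>
</div>
"""


def text_or_none(item, name, css_class):
    """Return the stripped text of the first matching tag, or None if absent."""
    tag = item.find(name, class_=css_class)
    return tag.get_text(strip=True) if tag else None


def parse_item(item):
    """Extract name, URL, and price from one result, tolerating missing tags."""
    link = item.find("a", class_="a-link-normal")
    href = link.get("href") if link else None
    return {
        "name": text_or_none(item, "span", "a-size-medium a-color-base a-text-normal"),
        "url": f"https://www.amazon.in{href}" if href else None,
        "price": text_or_none(item, "span", "a-price-whole"),
    }


item = BeautifulSoup(SAMPLE_HTML, "html.parser").find("div")
print(parse_item(item))
```

Missing fields come back as None instead of stopping the run, so a single incomplete listing does not discard the rest of the page.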

Writing to CSV:

import csv
# fieldnames and products are defined earlier in scraper.py
with open('product_data.csv', 'w', newline='', encoding='utf-8') as file:
    writer = csv.DictWriter(file, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(products)
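
By default, `csv.DictWriter` raises ValueError when a dictionary contains a key that is not in `fieldnames`, and writes an empty string for keys that are missing. The sketch below makes both behaviors explicit. The column names in `FIELDNAMES` are assumed from the fields listed in this README, and `write_products` is a hypothetical helper rather than code from scraper.py.

```python
import csv
import io

# Assumption: column names derived from the fields this README lists.
FIELDNAMES = [
    "name", "url", "price", "rating",
    "reviews", "description", "asin", "manufacturer",
]


def write_products(products, file):
    """Write product dictionaries to an open text file as CSV."""
    # restval fills missing keys with ""; extrasaction="ignore" drops unexpected keys.
    writer = csv.DictWriter(
        file, fieldnames=FIELDNAMES, restval="", extrasaction="ignore"
    )
    writer.writeheader()
    writer.writerows(products)


# An in-memory buffer stands in for product_data.csv in this example.
buffer = io.StringIO(newline="")
write_products([{"name": "Sample Mouse", "price": "1,299", "color": "black"}], buffer)
print(buffer.getvalue())
```

To write to disk instead, pass a file opened with `open('product_data.csv', 'w', newline='', encoding='utf-8')`.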

Contributing

Contributions are welcome! Please follow these steps to contribute:

Fork the repository.
Create a new branch:
```
git checkout -b feature-branch
```
Make your changes and commit them:
```
git commit -m "Add new feature"
```
Push to the branch:
```
git push origin feature-branch
```
Create a pull request.

License

This project is licensed under the MIT License. See the LICENSE file for details.
