Welcome to the Amazon Product Scraper, a Python script designed to extract product information from Amazon's search results. This script, powered by BeautifulSoup and requests, provides a convenient way to gather data on titles, prices, ratings, reviews, availability, brand, and product names.
Before diving into the world of Amazon scraping, ensure you have the following enchantments:
- Python 3.x
- Essential Python scrolls (install each with pip install scroll_name):
  - beautifulsoup4
  - requests
  - pandas
  - numpy
- Brew a concoction of the required Python libraries:
  pip install -r requirements.txt
- Invoke the script:
  python scraper.py (use the actual filename of the script in place of scraper.py)
- Explore Amazon's massive dataset, collecting data on various products.
- The generated "amazon_data.csv" file holds the collected product information and can serve as the basis for further price analysis and more detailed comparisons of product attributes.
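For the price analysis mentioned above, scraped prices usually need converting from text to numbers first. The sketch below assumes the CSV has a price column holding US-style strings such as "$1,299.99"; both the column name and the format are assumptions, and add_numeric_price is a hypothetical helper, not part of the script.

```python
import pandas as pd


def add_numeric_price(df, column="price"):
    """Return a copy of df with a float 'price_value' column.

    Assumes US-style price text such as "$1,299.99". Anything that cannot
    be parsed (an empty cell, a price range, another currency format)
    becomes NaN instead of raising.
    """
    out = df.copy()
    # Keep only digits and the decimal point: "$1,299.99" -> "1299.99".
    cleaned = out[column].astype(str).str.replace(r"[^0-9.]", "", regex=True)
    out["price_value"] = pd.to_numeric(cleaned, errors="coerce")
    return out
```

Typical use would be df = add_numeric_price(pd.read_csv("amazon_data.csv")) followed by df["price_value"].describe() for a quick summary of the price distribution.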
The script defines one helper function per field:
- Extracts the title of the product from the Amazon product page.
- Extracts the price of the product, considering deal prices if available.
- Extracts the product rating, handling variations in HTML structure.
- Extracts the number of customer reviews for the product.
- Determines the availability status of the product.
- Extracts the brand of the product.
- Extracts the name of the product.
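Helpers like these might look roughly like the following minimal sketch, assuming product pages parsed with BeautifulSoup. The function names and every id or class selector (productTitle, priceblock_ourprice, priceblock_dealprice, a-icon-alt, acrCustomerReviewText, availability) are hypothetical stand-ins, not taken from the script, and Amazon's markup changes often. Brand and name extraction would follow the same find-or-empty pattern.

```python
from bs4 import BeautifulSoup


def _text(tag):
    """Return a tag's stripped text, or "" when the tag was not found."""
    return tag.get_text(strip=True) if tag is not None else ""


def get_title(soup):
    # Assumed selector: <span id="productTitle">.
    return _text(soup.find("span", id="productTitle"))


def get_price(soup):
    # Try the regular price block first, then fall back to a deal price.
    # Both ids are assumptions about the page markup.
    tag = soup.find("span", id="priceblock_ourprice")
    if tag is None:
        tag = soup.find("span", id="priceblock_dealprice")
    return _text(tag)


def get_rating(soup):
    # The rating text may sit inside a star icon or in a bare span,
    # so try the more specific selector first.
    tag = soup.select_one("i.a-icon-star span.a-icon-alt")
    if tag is None:
        tag = soup.select_one("span.a-icon-alt")
    return _text(tag)


def get_review_count(soup):
    # Assumed selector: <span id="acrCustomerReviewText">.
    return _text(soup.find("span", id="acrCustomerReviewText"))


def get_availability(soup):
    # Assumed markup: <div id="availability"><span>In Stock.</span></div>.
    block = soup.find("div", id="availability")
    status = _text(block.find("span")) if block is not None else ""
    return status or "Not Available"
```

Returning an empty string instead of raising keeps one missing element from aborting the whole run; rows whose title comes back empty can be dropped later.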
The main routine:
- Sends a request to the Amazon search page.
- Extracts links to individual product pages from the search results.
- Iterates through each product page, extracting the fields with the helper functions above.
- Creates a pandas DataFrame with the collected data.
- Cleans the data by dropping rows with missing titles.
- Saves the final dataset as "amazon_data.csv".
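The main routine could be wired together along these lines. This is a sketch under stated assumptions: SEARCH_URL, HEADERS, the result-link selector a.a-link-normal.s-no-outline, and the stand-in extract_fields (which collects only the title) are hypothetical, and live responses may not match these selectors.

```python
from urllib.parse import urljoin

import numpy as np
import pandas as pd
import requests
from bs4 import BeautifulSoup

# Hypothetical search URL and headers; adjust to the query you need.
SEARCH_URL = "https://www.amazon.com/s?k=playstation+4"
HEADERS = {"User-Agent": "Mozilla/5.0", "Accept-Language": "en-US,en;q=0.5"}


def extract_product_links(search_html, base="https://www.amazon.com"):
    """Return absolute product-page URLs found on a search-results page."""
    soup = BeautifulSoup(search_html, "html.parser")
    # Assumed CSS selector for result links.
    anchors = soup.select("a.a-link-normal.s-no-outline")
    return [urljoin(base, a["href"]) for a in anchors if a.get("href")]


def extract_fields(soup):
    """Stand-in extractor: only the title is collected here."""
    tag = soup.find("span", id="productTitle")
    return {"title": tag.get_text(strip=True) if tag is not None else ""}


def build_dataframe(records):
    """Build a DataFrame and drop rows whose title is missing or empty."""
    df = pd.DataFrame(records)
    if df.empty:
        return df
    # Treat empty titles as missing so dropna removes them.
    df["title"] = df["title"].replace("", np.nan)
    return df.dropna(subset=["title"]).reset_index(drop=True)


def scrape(url=SEARCH_URL, extract=extract_fields, out_path="amazon_data.csv"):
    """Fetch the search page, visit each product page, and save a CSV."""
    response = requests.get(url, headers=HEADERS, timeout=10)
    response.raise_for_status()

    records = []
    for link in extract_product_links(response.text):
        page = requests.get(link, headers=HEADERS, timeout=10)
        records.append(extract(BeautifulSoup(page.text, "html.parser")))

    df = build_dataframe(records)
    df.to_csv(out_path, index=False)
    return df
```

Keeping link extraction and DataFrame cleaning separate from the network calls lets those steps be checked against saved HTML without contacting Amazon; a script entry point would simply call scrape().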