This project is a Python-based web scraper that extracts product information (title, price, and link) from Amazon.in using Selenium and BeautifulSoup, and saves the data to a structured CSV file. It's ideal for practicing web scraping and data handling in Python.
- Extracts product title, price, and link
- Handles missing data gracefully
- Saves data to a CSV file for easy access
- Runs with a single command once the HTML files are in place
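As a rough illustration of what handling missing data gracefully can look like with BeautifulSoup, here is a minimal sketch. The helper name `text_or_default`, the `"N/A"` placeholder, and the sample markup are assumptions made for this example and are not taken from `collect.py`.

```python
from bs4 import BeautifulSoup


def text_or_default(node, selector, default="N/A"):
    """Return the stripped text of the first match for `selector`, or `default`."""
    match = node.select_one(selector)  # select_one returns None when nothing matches
    if match is None:
        return default
    return match.get_text(strip=True) or default


# Made-up sample markup: the second product has no price element.
html = """
<div class="product"><h2>USB Cable</h2><span class="price">299</span></div>
<div class="product"><h2>Phone Case</h2></div>
"""

soup = BeautifulSoup(html, "html.parser")
rows = [
    {"title": text_or_default(p, "h2"), "price": text_or_default(p, "span.price")}
    for p in soup.select("div.product")
]
print(rows)
```

With a fallback value, a product that lacks a price still produces a row instead of raising an `AttributeError` on `None`.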
📁 Selenium Python Project/
├── 📁 data/ # Folder containing saved HTML files from Amazon
├── 📄 collect.py # Main script to parse HTML and save CSV
├── 📄 requirements.txt # List of dependencies
├── 📄 data.csv # Output file with extracted data
- You save Amazon product page HTML files in the data/ folder.
- The script parses each file, extracts the product info, and stores it in a dictionary.
- All extracted data is saved to data.csv.
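The flow described above could be sketched roughly as follows. This is not the actual `collect.py`: the function names `parse_file` and `collect` and the three CSS selectors are hypothetical, and the selectors in particular will likely need adjusting to the markup in your saved pages.

```python
from pathlib import Path

import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical selectors (assumptions); adapt them to your saved HTML.
TITLE_SELECTOR = "h2"
PRICE_SELECTOR = "span.a-price-whole"
LINK_SELECTOR = "a[href]"


def parse_file(path):
    """Extract title, price, and link from one saved HTML file."""
    soup = BeautifulSoup(Path(path).read_text(encoding="utf-8"), "html.parser")
    title = soup.select_one(TITLE_SELECTOR)
    price = soup.select_one(PRICE_SELECTOR)
    link = soup.select_one(LINK_SELECTOR)
    return {
        "title": title.get_text(strip=True) if title else None,
        "price": price.get_text(strip=True) if price else None,
        "link": link["href"] if link else None,
    }


def collect(data_dir="data", out_csv="data.csv"):
    """Parse every .html file in data_dir and write the results to out_csv."""
    records = {"title": [], "price": [], "link": []}  # one list per CSV column
    for path in sorted(Path(data_dir).glob("*.html")):
        info = parse_file(path)
        for key in records:
            records[key].append(info[key])
    df = pd.DataFrame(records)
    df.to_csv(out_csv, index=False)  # missing values become empty cells
    return df
```

Calling `collect()` with its defaults reads from `data/` and writes `data.csv` in the current directory, which matches the layout shown in the project tree.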
- Python 3.x
- Selenium
- BeautifulSoup (bs4)
- Pandas
Install dependencies using:
pip install -r requirements.txt
Or manually:
pip install selenium beautifulsoup4 pandas
- Clone this repo:
git clone https://github.com/AdithyaSalian23/amazon-web-scraper
cd amazon-web-scraper
- Place your Amazon HTML files inside the data/ folder.
- Run the scraper:
python collect.py
- Check the generated data.csv file for the output. 🎉
🐍 Python
🌐 Selenium
🍜 BeautifulSoup
📊 Pandas
💾 Git & GitHub
This project is open-source and free to use under the MIT License.