Selenium is an extremely powerful tool for web scraping, but it has some limitations, which is fair given that it was built mainly for testing web applications. BeautifulSoup, on the other hand, was built specifically for data scraping, and it is extremely powerful at it. However, BeautifulSoup alone won't help if the required data sits behind a "wall", i.e. it needs the user to log in or to perform some action on the page. That's where Selenium comes in: we use it to automate user interactions on the website, and BeautifulSoup to scrape the data.
I will be using BeautifulSoup and Selenium to extract product information such as names, ratings, etc. from https://www.amazon.in/.
- Selenium
- BeautifulSoup
- time
- pandas
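If you want to follow along, these libraries can be imported roughly as below; the alias and import style are my assumption, not necessarily what the original script uses.

```python
# Assumed imports for this walkthrough (the original script may differ)
import time

import pandas as pd
from bs4 import BeautifulSoup
from selenium import webdriver
```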
- Opening the Chrome browser and accessing https://www.amazon.in/ via Selenium (a rough sketch of the full flow is shown after this list).
- Getting the "Search Term" thourgh user input.
- Formatting the URL into a dynamic URL where the "Search Term" can be changed.
- Scraping the Product Name, Price, Rating, No. of Items Sold and Link for each of the products.
- Navigating to the next page with Selenium by automatically clicking the "Next" button after the current page has been scraped.
- Handling errors where some elements are not present (for example, missing reviews) with a try/except block.
- Converting the scraped data to a .csv file using the pandas library.
Note: I set the page limit to the first 7 pages for scraping; you can increase it to scrape more pages.
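Here is a rough, hedged sketch of the flow described above: opening Chrome, building the dynamic search URL, scraping the product fields with BeautifulSoup inside try/except blocks, and saving the results with pandas. The selectors used (the `s-search-result` attribute and the `a-price-whole`, `a-icon-alt`, `a-size-base` classes) are my assumptions about Amazon's markup and may need adjusting; pagination and delays are sketched separately at the end of this write-up.

```python
# Hedged sketch of scraping one search-results page (not the original script).
import time

import pandas as pd
from bs4 import BeautifulSoup
from selenium import webdriver

search_term = input("Enter the search term: ")
# Dynamic URL: the search term is substituted into the search URL
url = "https://www.amazon.in/s?k={}".format(search_term.replace(" ", "+"))

driver = webdriver.Chrome()
driver.get(url)
time.sleep(3)  # give the page time to load and avoid hammering the site

soup = BeautifulSoup(driver.page_source, "html.parser")
records = []

# Each search result card; the attribute below is an assumption about Amazon's markup
for item in soup.find_all("div", {"data-component-type": "s-search-result"}):
    try:
        name = item.h2.text.strip()
        link = "https://www.amazon.in" + item.h2.a.get("href")
    except AttributeError:
        continue  # skip result cards without a title/link

    # Price, rating and review count can be missing for some products,
    # hence the individual try/except blocks (this is what caused AttributeErrors).
    try:
        price = item.find("span", class_="a-price-whole").text
    except AttributeError:
        price = ""
    try:
        rating = item.find("span", class_="a-icon-alt").text
    except AttributeError:
        rating = ""
    try:
        reviews = item.find("span", class_="a-size-base").text
    except AttributeError:
        reviews = ""

    records.append([name, price, rating, reviews, link])

driver.quit()

# Convert the scraped data to a .csv file with pandas
df = pd.DataFrame(records, columns=["Name", "Price", "Rating", "Reviews", "Link"])
df.to_csv("amazon_products.csv", index=False)
```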
- Automatically scraping each page one by one.
- The output .csv file.
- There were some AttributeErrors, so I had to use a try/except block to overcome them.
- I used Selenium for paginating to the next page, so on the last page there would be an error because the "Next" button isn't there. I handled it with a try/except block and a recursive function that exits the scraping once it reaches the last page (see the pagination sketch below).
- Amazon doesn't allow scraping, so I had to use the time library and delay the scraping by a few seconds so that the site doesn't think a bot is scraping it.
- Scraping a huge amount of data from Amazon might lead to an IP ban, so I scraped only the first 7 pages.