# Web Scraping Data from Amazon

Web scraping, web harvesting, or web data extraction is data scraping used for extracting data from websites. The web scraping software may directly access the World Wide Web using the Hypertext Transfer Protocol or a web browser. 

While web scraping can be done manually by a software user, the term typically refers to automated processes implemented using a bot or web crawler. It is a form of copying in which specific data is gathered and copied from the web, typically into a central local database or spreadsheet, for later retrieval or analysis.

We will use a few Python libraries for this purpose:

- BeautifulSoup
- Requests
- Pandas

The website we are going to be scraping is https://www.amazon.in/gp/bestsellers/luggage.

In [1]:
# Import the libraries
from bs4 import BeautifulSoup
import requests
import pandas as pd

Using the Requests library, we send a request to the server to retrieve the HTML page. BeautifulSoup then parses that HTML into a structure we can search, which makes the data much easier to inspect.

BeautifulSoup is a powerful and useful library that gives many methods to navigate through the HTML page in order to get to the data the user is looking for.
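
As a rough illustration of those navigation methods, the sketch below parses a small hard-coded HTML fragment instead of a live page. The fragment, its ids, and its class names are invented for this example and do not reflect Amazon's actual markup.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML fragment, invented for illustration only.
html = """
<div id="items">
  <div class="item"><span class="name">Blue Backpack</span><span class="price">₹499.00</span></div>
  <div class="item"><span class="name">Leather Wallet</span><span class="price">₹299.00</span></div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# find() returns the first matching tag, or None if nothing matches.
container = soup.find("div", id="items")

# find_all() returns every matching descendant; class_ filters on the CSS class.
names = [tag.text.strip() for tag in container.find_all("span", class_="name")]
prices = [tag.text.strip() for tag in container.find_all("span", class_="price")]

print(names)
print(prices)
```

The same find() and find_all() calls are used on the real page in the cells that follow.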

In [2]:
# requests.get() returns a Response object; its .text attribute holds the HTML
response = requests.get('https://www.amazon.in/gp/bestsellers/luggage')
soup = BeautifulSoup(response.text, 'html.parser')
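
The request above assumes the server returns a usable page. A slightly more defensive fetch might look like the sketch below. The helper name fetch_html, the User-Agent string, and the 10-second timeout are assumptions made for illustration, and nothing here guarantees that a given site will serve the page to a script.

```python
import requests

# Assumed header value for illustration; any descriptive User-Agent string works.
HEADERS = {"User-Agent": "Mozilla/5.0 (compatible; tutorial-scraper)"}


def fetch_html(url, timeout=10):
    """Return the page body as text, raising requests.HTTPError on a 4xx/5xx status."""
    response = requests.get(url, headers=HEADERS, timeout=timeout)
    # Fail loudly instead of parsing an error page by accident.
    response.raise_for_status()
    return response.text


# Usage (performs a real network request, so it is left commented out):
# html = fetch_html("https://www.amazon.in/gp/bestsellers/luggage")
```

Checking the status before parsing makes it easier to tell an empty result caused by a blocked request from one caused by a wrong selector.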

In the cells below we first select a container 'div', then find every 'div' inside it whose class is 'p13n-sc-truncate p13n-sc-line-clamp-2'; these elements hold the product names. We then create an empty list, Product, and use a for loop to append the text of each element, calling strip() to remove leading and trailing whitespace such as newlines.

In [3]:
# Use the first <div> on the page as the container to search within
xyz = soup.find('div')

In [4]:
product = xyz.find_all('div', class_='p13n-sc-truncate p13n-sc-line-clamp-2')

In [5]:
Product = []
for prod in product:
    Product.append(prod.text.strip())
    print(prod.text.strip())

NAPA HIDE Black Leather Wallet for Men
GLUN Bolt Electronic Portable Fishing Hook Type Digital LED Screen Luggage Weighing Scale, 50 kg/110 Lb (Black)
WILDHORN® Carter Leather Wallet for Men (Black Croco)
American Tourister Casual Backpack
Storite PU Leather 9 Slot Vertical Credit Debit Card Holder Money Wallet Zipper Coin Purse for Men Women - Chocolate Brown
URBAN FOREST Black Leather Men's Card Holder With Pen Combo (UBF126BLK10208)
Skybags Trooper 55 Cms Polycarbonate Blue Hardsided Cabin Luggage
M MEDLER Epoch Nylon 55 litres Waterproof Strolley Duffle Bag- 2 Wheels - Luggage Bag - (Navy Blue)
SAFARI 15 Ltrs Sea Blue Casual/School/College Backpack (DAYPACKNEO15CBSEB) & SAFARI 15 Ltrs Cherry Red Casual/School/College Backpack (DAYPACKNEO15CBCRE)
Urban Forest Oliver Black RFID Blocking Leather Wallet for Men
Priority Disney Princess Belle 25 litres Yellow & Pink Polyester School Bag | Casual Bags | for Girls, Kids Backpack (Fairy 007)
GoTrippin Metal Luggage Weighing Scale Digital (

We build the Price list in the same way. The prices are read from the 'a' elements with class 'a-link-normal a-text-normal', and a for loop appends the text of each one to an empty list, Price, with strip() removing leading and trailing whitespace.

In [6]:
price = xyz.find_all('a', class_ = 'a-link-normal a-text-normal')

In [7]:
Price = []
for pri in price:
    Price.append(pri.text.strip())
    print(pri.text.strip())

₹320.00 - ₹643.00
₹299.00
₹407.00 - ₹2,099.00
₹1,099.00 - ₹2,300.00
₹449.00 - ₹849.00
₹455.00 - ₹699.00
₹2,969.00 - ₹8,630.00
₹569.00 - ₹640.00
₹299.00 - ₹658.00
₹455.00 - ₹699.00
₹259.00 - ₹475.00
₹799.00
₹449.00 - ₹1,050.00
₹299.00 - ₹899.00
₹298.00 - ₹598.00
₹299.00 - ₹799.00
₹1,399.00
₹2,419.00
₹4,289.00
₹3,699.00 - ₹7,098.00
₹208.00 - ₹303.00
₹3,799.00
₹499.00
₹320.00 - ₹2,999.00
₹949.00 - ₹1,945.00
₹1,649.00 - ₹1,804.00
₹449.00 - ₹849.00
₹3,599.00
₹899.00
₹339.00 - ₹369.00
₹269.00 - ₹949.00
₹702.00 - ₹867.00
₹195.00 - ₹999.00
₹290.00
₹322.00 - ₹619.00
₹2,759.00
₹4,299.00
₹357.00 - ₹359.00
₹199.00 - ₹449.00
₹3,599.00
₹2,759.00
₹2,251.00
₹495.00 - ₹499.00
₹959.00 - ₹1,350.00
₹476.00
₹3,189.00
₹2,399.00
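
The prices above are still strings, and many of them are ranges. Before any numeric analysis they would need to be converted. One possible approach is sketched below; the helper name parse_price_range and the choice to return a (low, high) tuple are assumptions for this example, not part of the original notebook.

```python
import re


def parse_price_range(text):
    """Return (low, high) as floats from strings like '₹407.00 - ₹2,099.00'."""
    # Grab each number, allowing thousands separators and an optional decimal part.
    amounts = [
        float(match.replace(",", ""))
        for match in re.findall(r"\d[\d,]*(?:\.\d+)?", text)
    ]
    if not amounts:
        return None  # no number found in the text
    return min(amounts), max(amounts)


print(parse_price_range("₹407.00 - ₹2,099.00"))  # (407.0, 2099.0)
print(parse_price_range("₹299.00"))              # (299.0, 299.0)
```

A single price yields the same value for both ends of the range, so every row can be handled uniformly.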


Now that we have the product names and prices as lists, we convert them to a DataFrame using the Pandas library so that the data can be processed further for analysis. You can also write this data to a CSV file using the csv module from the Python standard library: https://docs.python.org/3/library/csv.html

In [8]:
df = pd.DataFrame({'Product': Product, 'Price': Price})
df

| | Product | Price |
|---|---|---|
| 0 | NAPA HIDE Black Leather Wallet for Men | ₹320.00 - ₹643.00 |
| 1 | GLUN Bolt Electronic Portable Fishing Hook Typ... | ₹299.00 |
| 2 | WILDHORN® Carter Leather Wallet for Men (Black... | ₹407.00 - ₹2,099.00 |
| 3 | American Tourister Casual Backpack | ₹1,099.00 - ₹2,300.00 |
| 4 | Storite PU Leather 9 Slot Vertical Credit Debi... | ₹449.00 - ₹849.00 |
| 5 | URBAN FOREST Black Leather Men's Card Holder W... | ₹455.00 - ₹699.00 |
| 6 | Skybags Trooper 55 Cms Polycarbonate Blue Hard... | ₹2,969.00 - ₹8,630.00 |
| 7 | M MEDLER Epoch Nylon 55 litres Waterproof Stro... | ₹569.00 - ₹640.00 |
| 8 | SAFARI 15 Ltrs Sea Blue Casual/School/College ... | ₹299.00 - ₹658.00 |
| 9 | Urban Forest Oliver Black RFID Blocking Leathe... | ₹455.00 - ₹699.00 |
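
Writing the scraped data to a CSV file with the standard-library csv module could look roughly like the sketch below. The helper name write_products_csv and the file name bestsellers.csv are placeholders chosen for this example; the two sample rows are taken from the output above.

```python
import csv


def write_products_csv(path, products, prices):
    """Write paired product names and prices to a CSV file with a header row."""
    # newline="" lets the csv module control line endings, as its docs recommend.
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["Product", "Price"])
        # zip() stops at the shorter list, so unmatched trailing items are dropped.
        writer.writerows(zip(products, prices))


# Sample rows taken from the output above; the file name is a placeholder.
write_products_csv(
    "bestsellers.csv",
    ["NAPA HIDE Black Leather Wallet for Men", "American Tourister Casual Backpack"],
    ["₹320.00 - ₹643.00", "₹1,099.00 - ₹2,300.00"],
)
```

In the notebook itself, the Product and Price lists built earlier would be passed in place of the sample rows. Note that prices containing commas are quoted automatically by csv.writer.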
