#  Scraping Amazon “Best Seller” Books Using Python

![](https://i.imgur.com/ipbnY7p.png)

# Introduction to Web Scraping

Web scraping is an automated method of obtaining large amounts of data from websites. Most of this data is unstructured HTML, which is converted into structured data in a spreadsheet or a database so that it can be used in various applications. There are several ways to scrape a website: using online services, using an API the site provides, or writing your own scraping code from scratch. Many large websites, such as Google, Twitter, Facebook, and StackOverflow, have APIs that allow you to access their data in a structured format.

![](https://i.imgur.com/CHZJDr8.png)


# Can we scrape data from everywhere?
Before you get too deep into scraping, bear in mind that a scraper can send many requests in a short time, which increases traffic and, in extreme cases, can overload the website's server. As a result, not all websites permit scraping. So how can you know what a website allows? Examine its robots.txt file: append “/robots.txt” to the site's root URL (for example, https://www.amazon.in/robots.txt) to see which paths the host asks crawlers to avoid.
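As a rough illustration, Python's standard-library `urllib.robotparser` module can evaluate robots.txt rules. The rules and the `my-scraper` user agent below are made up so that the example runs offline; for a real site you would point the parser at the actual file with `set_url()` and `read()`.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content (not Amazon's real rules).
SAMPLE_RULES = """
User-agent: *
Disallow: /private/
Allow: /
""".splitlines()

parser = RobotFileParser()
parser.parse(SAMPLE_RULES)

# "my-scraper" is an invented user agent name used only for this sketch.
allowed = parser.can_fetch("my-scraper", "https://example.com/books/")
blocked = parser.can_fetch("my-scraper", "https://example.com/private/data")
print("books page allowed:", allowed)
print("private page allowed:", blocked)
```

Note that robots.txt is advisory: it tells you what the host asks of crawlers, and a site's terms of service may add further restrictions.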

# Web Scraping Libraries
Data can be scraped in several ways. Python has dozens of tools for the job; some of the most notable are “Requests,” “Beautiful Soup,” “Scrapy,” “lxml,” and “Selenium,” and scrapers are often deployed on services such as “AWS Lambda.” Requests is an HTTP library that lets you communicate with web servers; the rest depend on your use case:

1. Beautiful Soup: The Beautiful Soup library is an essential addition to your data science toolset. It is basic and easy to use, yet powerful, and you can start scraping data with it after only a few hours of practice. Its biggest strength is its simplicity.

2. Scrapy: Scrapy is a Python-based open-source web scraping framework used to build sophisticated web scrapers. It provides the tools you need to extract data from websites, process it as needed, and store it in the structure and format you wish.

3. Selenium: Many websites build their content dynamically with JavaScript, so the page must be rendered in a browser before the data is available. Selenium drives a real web browser to load the page, so the site is accessed in the same way a real person would access it.

4. lxml: lxml is a production-quality HTML and XML parsing library with outstanding performance. It is a fast, dependable parser for most web pages you are likely to scrape.

5. AWS Lambda: AWS Lambda is a serverless compute service rather than a library, and it works well for simpler scraping tasks. It integrates with other AWS services. The scraper can be packaged as a Docker container, and Amazon CloudWatch event rules can trigger the scraping job on a schedule, for example every day, so you don't have to start and stop a server manually.

![](https://i.imgur.com/ibrz0wS.png)

# Scraping Top-Selling Books on Amazon

Amazon is one of the most popular online marketplaces, used by both individuals and businesses, and the site is available in many countries and languages. Its popular products include the Amazon Kindle and the Amazon Appstore, and it also offers software and infrastructure solutions for businesses and individuals. The page https://www.amazon.in/gp/bestsellers/books/ lists the top-selling books on Amazon. In this project, we'll retrieve information from this page using _web scraping_.
We'll use the Python libraries [Requests](https://realpython.com/python-requests/) and [BeautifulSoup](https://www.crummy.com/software/BeautifulSoup/bs4/doc/) to scrape data from this page.

### First, let's go to the [page](https://www.amazon.in/gp/bestsellers/books/) we want to scrape and take a look at its layout:

![](https://i.imgur.com/8l3jLPq.png)

# Project Purpose
The purpose of this project is to extract information from a website and save it to a CSV file. The final result is shown below:


![](https://i.imgur.com/kTQ6zNR.png)

# Project Outline:
1. Download the web page using `requests`.
2. Parse the HTML source code using BeautifulSoup.
3. Extract the book name, author name, stars, price, and URL from the page.
4. Compile the extracted information into Python lists and a dictionary.
5. Extract and combine data from multiple pages.
6. Save the extracted information to a CSV file.

# How to run the code
You can execute the code using the "Run" button at the top of this page and selecting `Run on Binder`.

Note: We will use the Jovian library and its commit() function throughout the code to save our progress as we move along.

In [1]:
!pip install jovian --upgrade --quiet

In [2]:
import jovian

In [3]:
# Execute this to save new versions of the notebook
jovian.commit(project="web-scraping-amazon-books")


[jovian] Updating notebook "sabahatkhan0304/web-scraping-amazon-books" on https://jovian.ai
[jovian] Committed successfully! https://jovian.ai/sabahatkhan0304/web-scraping-amazon-books


'https://jovian.ai/sabahatkhan0304/web-scraping-amazon-books'

## Download the web page using `requests`
Requests is a widely used third-party Python library for making HTTP requests to a specified URL. Whether you work with REST APIs or web scraping, it is worth learning before going further with these technologies. When you make a request to a URL, the server returns a response, and Requests provides built-in functionality for managing both the request and the response. The library can be installed using pip.

### Let's install and import `Requests` library

In [4]:
!pip install requests --upgrade --quiet

In [5]:
import requests

To download a page we can use the `get` function from requests, which returns a response object.

In [6]:
url= 'https://www.amazon.in/gp/bestsellers/books/'

In [7]:
response= requests.get(url)

We can check whether the request succeeded by looking at the response's status code. If the status code falls between 200 and 299, the request was successful and the response contains the page; otherwise, something went wrong. The status code describes the outcome of a Hypertext Transfer Protocol (HTTP) request: the server issues one in response to every request from a client. The full list of status codes is defined in the HTTP specification.
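The 200 to 299 rule can be sketched with the standard library alone. The helper names `is_success` and `describe` are invented for this illustration and are not part of Requests.

```python
from http import HTTPStatus

def is_success(status_code):
    """Return True for any 2xx status code."""
    return 200 <= status_code <= 299

def describe(status_code):
    """Return a label such as '404 Not Found' for a known status code."""
    try:
        return f"{status_code} {HTTPStatus(status_code).phrase}"
    except ValueError:
        # HTTPStatus raises ValueError for codes it does not know.
        return f"{status_code} (unrecognized)"

for code in (200, 404, 503):
    outcome = "success" if is_success(code) else "not successful"
    print(describe(code), "->", outcome)
```

With Requests itself, `response.raise_for_status()` raises an exception for 4xx and 5xx responses, which is a convenient way to stop early when a download fails.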

In [8]:
response.status_code

200

The request was successful. We can get the contents of the page using `response.text`.

In [9]:
page_contents= response.text

In [10]:
len(page_contents)

318983

### The web page contains the HTML code

In [11]:
page_contents[:1000]

'<!doctype html><html lang="en-in" class="a-no-js" data-19ax5a9jf="dingo"><!-- sp:feature:head-start -->\n<head><script>var aPageStart = (new Date()).getTime();</script><meta charset="utf-8"/>\n<!-- sp:end-feature:head-start -->\n\n<!-- sp:feature:cs-optimization -->\n<meta http-equiv=\'x-dns-prefetch-control\' content=\'on\'>\n<link rel="dns-prefetch" href="https://images-eu.ssl-images-amazon.com">\n<link rel="dns-prefetch" href="https://m.media-amazon.com">\n<link rel="dns-prefetch" href="https://completion.amazon.com">\n<!-- sp:end-feature:cs-optimization -->\n\n<!-- sp:feature:aui-assets -->\n<link rel="stylesheet" href="https://images-eu.ssl-images-amazon.com/images/I/11EIQ5IGqaL._RC|01ZTHTZObnL.css,41C-I1lXVwL.css,31ufSReDtSL.css,013z33uKh2L.css,017DsKjNQJL.css,0131vqwP5UL.css,41EWOOlBJ9L.css,11TIuySqr6L.css,01ElnPiDxWL.css,11Qjwq-j69L.css,01Dm5eKVxwL.css,01IdKcBuAdL.css,01y-XAlI+2L.css,21P6CS3L9LL.css,01oDR3IULNL.css,41CYNGpGlrL.css,01XPHJk60-L.css,01smHc51S9L.css,21aPhFy+riL.cs

What we are looking at above is the HTML source code of the web page.

We can also save it to a file and view the page locally within Jupyter using "File > Open".
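Saving the page can be sketched as follows. The `sample_html` string stands in for `response.text`, and the temporary folder is only there so the example cleans up after itself; both are assumptions for this illustration. Passing `encoding="utf-8"` explicitly avoids errors on platforms whose default encoding cannot represent characters such as ₹.

```python
import tempfile
from pathlib import Path

# Stand-in for response.text so the example runs without a network call.
sample_html = "<html><body><h1>Best Sellers: ₹295.00</h1></body></html>"

with tempfile.TemporaryDirectory() as folder:
    output_path = Path(folder) / "webpage.html"
    # Write and read back with an explicit encoding.
    output_path.write_text(sample_html, encoding="utf-8")
    saved = output_path.read_text(encoding="utf-8")

print("round trip intact:", saved == sample_html)
```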

We have successfully downloaded the web page using Requests.

##  Use Beautiful Soup to parse and extract information

## What is BeautifulSoup?
Beautiful Soup is a Python library for pulling data out of HTML and XML files. It works with your favorite parser to provide idiomatic ways of navigating, searching, and modifying the parse tree. It commonly saves programmers hours or days of work. For more details, see the documentation:
https://www.crummy.com/software/BeautifulSoup/bs4/doc/
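Here is a minimal sketch of navigating and searching a parse tree. The HTML and its class names (`card`, `price`) are made up for the example and only loosely imitate a product listing; they are not Amazon's real markup.

```python
from bs4 import BeautifulSoup

# Invented HTML: the second card deliberately has no price.
sample_html = """
<div class="card">
  <a href="/book-one"><span>Book One</span></a>
  <span class="price">₹100.00</span>
</div>
<div class="card">
  <a href="/book-two"><span>Book Two</span></a>
</div>
"""

soup = BeautifulSoup(sample_html, "html.parser")
cards = soup.find_all("div", {"class": "card"})

books = []
for card in cards:
    title_tag = card.find("span")                      # first <span> in the card
    price_tag = card.find("span", {"class": "price"})  # None when missing
    books.append({
        "title": title_tag.text if title_tag else "Not Available",
        "price": price_tag.text if price_tag else "Not Available",
        "href": card.find("a")["href"],
    })

print(books)
```

`find` returns `None` when nothing matches, so checking the result before reading `.text` is one way to handle cards with missing fields.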

### Let's install and import `BeautifulSoup` library

In [12]:
# Install the library
!pip install beautifulsoup4 --upgrade --quiet

In [13]:
from bs4 import BeautifulSoup

In [14]:
doc=BeautifulSoup(page_contents, 'html.parser')

### We can use the `Beautiful Soup` class to parse the HTML document

In [15]:
import time

# Download a page with requests, save a local copy, and parse it with BeautifulSoup
def get_page(url):
    print("Scraping URL:", url)
    time.sleep(2)  # pause between requests to avoid overloading the server
    response = requests.get(url)
    print("Status code:", response.status_code)
    page_contents = response.text
    # Write as UTF-8 so non-ASCII characters such as ₹ are saved correctly
    with open('webpage.html', 'w', encoding='utf-8') as f:
        f.write(page_contents)
    doc = BeautifulSoup(page_contents, 'html.parser')
    return doc
    

![](https://i.imgur.com/nH2A7kj.png)

## 1. Extracting Book Title 

![](https://i.imgur.com/o7bg6QJ.png)

In [16]:
#extracting title of the book
book_title_tags= doc.find_all('div',{"class": "zg-grid-general-faceout"})

In [17]:
len(book_title_tags)

50

In [18]:
book_title_tags[0].find('a',{'class':'a-link-normal'})

<a class="a-link-normal" href="/Atomic-Habits-James-Clear/dp/1847941834/ref=zg_bs_books_sccl_1/000-0000000-0000000?pd_rd_i=1847941834&amp;psc=1" role="link" tabindex="-1"><div class="a-section a-spacing-mini _cDEzb_noop_3Xbw5"><img alt="Atomic Habits: The life-changing million copy bestseller" class="a-dynamic-image p13n-sc-dynamic-image p13n-product-image" data-a-dynamic-image='{"https://images-eu.ssl-images-amazon.com/images/I/91bYsX41DVL._AC_UL302_SR302,200_.jpg":[302,200],"https://images-eu.ssl-images-amazon.com/images/I/91bYsX41DVL._AC_UL604_SR604,400_.jpg":[604,400],"https://images-eu.ssl-images-amazon.com/images/I/91bYsX41DVL._AC_UL906_SR906,600_.jpg":[906,600]}' height="200px" src="https://images-eu.ssl-images-amazon.com/images/I/91bYsX41DVL._AC_UL302_SR302,200_.jpg" style="max-width:302px;max-height:200px"/></div></a>

In [19]:
# title of the first book
book_title_tags[0].find('span').text

'Atomic Habits: The life-changing million copy bestseller'

In [20]:
def get_book_title(doc):
    book_title_tags= doc.find_all('div',{"class": "zg-grid-general-faceout"})
    Book_Titles=[]
    for tag in book_title_tags:
        try:
            title_tag = tag.find('span')
            Book_Titles.append(title_tag.text)
        except :
            Book_Titles.append("Not Available")
    return Book_Titles
# this function will give the titles of the books

In [21]:
get_book_title(doc)[:10]       # titles of top ten  selling books on amazon

['Atomic Habits: The life-changing million copy bestseller',
 'Ikigai: The Japanese secret to a long and happy life',
 'The Psychology of Money',
 'My First Library: Boxset of 10 Board Books for Kids',
 'The Power of Your Subconscious Mind',
 'Word Power Made Easy',
 'It Ends With Us: A Novel: Volume 1',
 'SARALA PAṬHANĪYA ŚRĪMADBHAGAVADGĪTĀ - सरल पठनीय श्रीमद्भगवद्गीता - LEARNGEETA - GEETA PARIWAR',
 "Life's Amazing Secrets: How to Find Balance and Purpose in Your Life | Inspirational Zen book on motivation, self-development & healthy living",
 'Rich Dad Poor Dad : What The Rich Teach Their Kids About Money That The Poor And Middle Class Do Not!: (25th Anniversary Edition)']

We have successfully scraped the titles of the books.

## 2. Extracting Author Name

In [22]:
# extracting Author name by defining tag and class in html

In [23]:
author_name_tags= doc.find_all('div',{'class':'zg-grid-general-faceout'})

In [24]:
len(author_name_tags)

50

In [25]:
author_name_tags[0].find('a',{'class':'a-size-small a-link-child'})

<a class="a-size-small a-link-child" href="/James-Clear/e/B07DJTJC3X/ref=zg_bs_books_bl_sccl_1/000-0000000-0000000?pd_rd_i=1847941834"><div class="_cDEzb_p13n-sc-css-line-clamp-1_1Fn1y">James Clear</div></a>

In [26]:
def get_all_authors(doc):
    author_name_tags= doc.find_all('div',{'class':'zg-grid-general-faceout'})
    Author_Names=[]
    for tag in author_name_tags:
        try:
            Author_Names.append(tag.find('div',{'class':'a-row a-size-small'}).text)
        except :
            Author_Names.append("Not Available")
    return Author_Names
# This function will help to get the name of the authors

In [27]:
get_all_authors(doc)[:10] 

['James Clear',
 'Héctor García',
 'Morgan Housel',
 'Wonder House Books',
 'Joseph Murphy',
 'Norman Lewis',
 'Colleen Hoover',
 'GEETA PARIWAR',
 'Gaur Gopal Das',
 'Robert T. Kiyosaki']

##  3. Extracting  star rating on each book

In [28]:
rating= 'a-icon a-icon-star-small a-star-small-4-5 aok-align-top'
rating_tags= doc.find_all('i',{'class': rating})

In [29]:
len(rating_tags)

43

In [30]:
rating_tags[:1]

[<i class="a-icon a-icon-star-small a-star-small-4-5 aok-align-top"><span class="a-icon-alt">4.7 out of 5 stars</span></i>]

In [31]:
rating_tags[0].find('span').text

'4.7 out of 5 stars'

In [32]:
def get_all_stars(doc):
    rating_tags= doc.find_all('div',{'class':'zg-grid-general-faceout'})
    Stars=[]
    for tag in rating_tags:
        try:
            Stars.append(tag.find('span',{'class':'a-icon-alt'}).text)
        except :
            Stars.append("Not Available")
            
    return Stars

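# The same fallback without try/except: check the lookup result before using it
stars_list = []
rating_tag = rating_tags[0].find('span', {'class': 'a-icon-alt'})
if rating_tag:
    stars_list.append(rating_tag.text)
else:
    stars_list.append("Not Available")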

In [33]:
 get_all_stars(doc)[:10]

['4.7 out of 5 stars',
 '4.6 out of 5 stars',
 '4.6 out of 5 stars',
 '4.5 out of 5 stars',
 '4.5 out of 5 stars',
 '4.4 out of 5 stars',
 '4.5 out of 5 stars',
 '5.0 out of 5 stars',
 '4.6 out of 5 stars',
 '4.6 out of 5 stars']
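The ratings are scraped as text. If you want to sort or compare them later, the number can be pulled out with a regular expression. This is a sketch that assumes every rating follows the "N out of 5 stars" pattern shown above; the `parse_stars` helper is hypothetical.

```python
import re

def parse_stars(text):
    """Return the leading number of a rating string as a float, or None."""
    match = re.match(r"\s*(\d+(?:\.\d+)?)\s+out of 5", text)
    return float(match.group(1)) if match else None

samples = ["4.7 out of 5 stars", "5.0 out of 5 stars", "Not Available"]
parsed = [parse_stars(s) for s in samples]
print(parsed)
```

Entries such as "Not Available" become `None`, so missing ratings stay distinguishable from real values.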

## 4. Extracting Book Price

In [34]:
book_price_tags= doc.find_all('div',{"class": "zg-grid-general-faceout"})

In [35]:
book_price_tags[0].find('span',class_ ='p13n-sc-price').text

'₹295.00'

In [36]:
len(book_price_tags)

50

In [37]:
def get_all_price(doc):
    book_price_tags= doc.find_all('div',{"class": "zg-grid-general-faceout"})
    Book_Price=[]
    for tag in book_price_tags:
        try:
            Book_Price.append(tag.find('span',{'class':'p13n-sc-price'}).text)
        except :
            Book_Price.append("Not Available")
            
    return Book_Price

In [38]:
 get_all_price(doc)[:10]

['₹295.00',
 '₹300.00',
 '₹255.00',
 '₹399.00',
 '₹115.00',
 '₹122.00',
 '₹215.00',
 '₹169.00',
 '₹159.00',
 '₹223.00']

We have scraped the prices of the top-selling books on Amazon; the first ten are shown above.
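The prices are strings with a currency symbol. A small helper can turn them into numbers for later analysis. This sketch assumes prices look like the ones above, possibly with commas as thousands separators; `parse_price` is a made-up name.

```python
from decimal import Decimal, InvalidOperation

def parse_price(text):
    """Convert a string such as '₹295.00' to a Decimal, or None if it is not a price."""
    cleaned = text.replace("₹", "").replace(",", "").strip()
    try:
        return Decimal(cleaned)
    except InvalidOperation:
        # Raised for text that is not a number, e.g. "Not Available".
        return None

prices = [parse_price(p) for p in ["₹295.00", "₹1,299.00", "Not Available"]]
print(prices)
```

`Decimal` is used instead of `float` so that currency amounts keep their exact two decimal places.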

## 5. Extracting  Book URL

In [39]:

book_url_tag=doc.find_all('div',{"class": "zg-grid-general-faceout"})

In [40]:
len(book_url_tag)

50

In [41]:
book_url_tag0=book_url_tag[0].find('a',{'class':'a-link-normal'})

In [42]:
def get_all_url(doc):
    book_url_tag=doc.find_all('div',{"class": "zg-grid-general-faceout"})
    Book_Title_Urls=[]
    base_url="https://www.amazon.in"
    for tag in book_url_tag:
        try:
            Book_Title_Urls.append(base_url + tag.find('a',{'class':'a-link-normal'})['href'])
        except :
            Book_Title_Urls.append("Not Available")
    return Book_Title_Urls

In [43]:
get_all_url(doc)[:10]

['https://www.amazon.in/Atomic-Habits-James-Clear/dp/1847941834/ref=zg_bs_books_sccl_1/000-0000000-0000000?pd_rd_i=1847941834&psc=1',
 'https://www.amazon.in/Ikigai-H%C3%A9ctor-Garc%C3%ADa/dp/178633089X/ref=zg_bs_books_sccl_2/000-0000000-0000000?pd_rd_i=178633089X&psc=1',
 'https://www.amazon.in/Psychology-Money-Morgan-Housel/dp/9390166268/ref=zg_bs_books_sccl_3/000-0000000-0000000?pd_rd_i=9390166268&psc=1',
 'https://www.amazon.in/My-First-Library-Boxset-Board/dp/9387779262/ref=zg_bs_books_sccl_4/000-0000000-0000000?pd_rd_i=9387779262&psc=1',
 'https://www.amazon.in/Power-Your-Subconscious-Mind/dp/8194790832/ref=zg_bs_books_sccl_5/000-0000000-0000000?pd_rd_i=8194790832&psc=1',
 'https://www.amazon.in/Word-Power-Made-Norman-Lewis/dp/0143424688/ref=zg_bs_books_sccl_6/000-0000000-0000000?pd_rd_i=0143424688&psc=1',
 'https://www.amazon.in/Ends-Us-Novel-Colleen-Hoover/dp/1501110365/ref=zg_bs_books_sccl_7/000-0000000-0000000?pd_rd_i=1501110365&psc=1',
 'https://www.amazon.in/SARALA-PA%E1%B9

We have successfully scraped the book URLs; the first ten are shown above.
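String concatenation works here because every `href` starts with `/`. As an alternative sketch, `urllib.parse.urljoin` from the standard library also handles links that are already absolute. The `absolute_url` helper and its `keep_query` option are inventions for this example.

```python
from urllib.parse import urljoin, urlsplit, urlunsplit

BASE_URL = "https://www.amazon.in"

def absolute_url(href, keep_query=True):
    """Resolve href against BASE_URL, optionally dropping the query string."""
    full = urljoin(BASE_URL, href)
    if keep_query:
        return full
    parts = urlsplit(full)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))

href = ("/Atomic-Habits-James-Clear/dp/1847941834/ref=zg_bs_books_sccl_1/"
        "000-0000000-0000000?pd_rd_i=1847941834&psc=1")
print(absolute_url(href))
print(absolute_url(href, keep_query=False))
```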

# Extract and combine data from multiple pages

In [44]:
#This function is used to get the BS4 doc by providing the page number
def get_doc(page_number):
    time.sleep(2)
    url='https://www.amazon.in/gp/bestsellers/books/ref=zg_bs_pg_{}?ie=UTF8&pg={}'.format(str(page_number),str(page_number)) 
    doc=get_page(url)
    return doc
    

# Creating a DataFrame using `Pandas`

## What is Pandas?
Pandas is an open-source library that is made mainly for working with relational or labeled data both easily and intuitively. It provides various data structures and operations for manipulating numerical data and time series. This library is built on top of the NumPy library. Pandas is fast and it has high performance & productivity for users.

![](https://i.imgur.com/lxU0hTf.png)

## What is a DataFrame?
A DataFrame is a data structure that organizes data into a 2-dimensional table of rows and columns, much like a spreadsheet. DataFrames are one of the most common data structures used in modern data analytics because they are a flexible and intuitive way of storing and working with data.

The import pandas portion of the code tells Python to bring the pandas data analysis library into your current environment.

In [45]:
!pip install pandas --quiet

In [46]:
import pandas as pd

In [47]:
# Finally, extract all the details from the first n pages
import time
def get_all_details(n):
    all_books={'Title': [], 'Author': [], 'Stars': [], 'Price': [], 'URL': []}
    for page_number in range(1,n+1):
        # get_doc already pauses before each request; the parsing steps
        # below run locally, so they need no extra delay
        doc = get_doc(page_number)
        all_books['Title'] += get_book_title(doc)
        all_books['Author'] += get_all_authors(doc)
        all_books['Stars'] += get_all_stars(doc)
        all_books['Price'] += get_all_price(doc)
        all_books['URL'] += get_all_url(doc)
    
    return all_books

A DataFrame is a 2-dimensional array. The following code shows how to quickly create a DataFrame using pandas:

In [48]:
dataframe = pd.DataFrame.from_dict(get_all_details(2), orient= 'index')
dataframe= dataframe.transpose()

Scraping URL: https://www.amazon.in/gp/bestsellers/books/ref=zg_bs_pg_1?ie=UTF8&pg=1
Status code: 200
Scraping URL: https://www.amazon.in/gp/bestsellers/books/ref=zg_bs_pg_2?ie=UTF8&pg=2
Status code: 200
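One plausible reason for building the DataFrame with `from_dict(..., orient='index')` followed by `transpose()` is that this route tolerates columns of unequal length, whereas passing the dictionary directly to `pd.DataFrame` does not. The sketch below uses made-up data in which the `Stars` list is deliberately one item short.

```python
import pandas as pd

# Invented data: 'Stars' has two values, 'Title' has three.
columns = {
    "Title": ["Book One", "Book Two", "Book Three"],
    "Stars": ["4.7 out of 5 stars", "4.6 out of 5 stars"],
}

try:
    pd.DataFrame(columns)
    direct_ok = True
except ValueError:
    # pandas refuses columns of different lengths here.
    direct_ok = False

# Each key becomes a row first, short rows are padded, then rows become columns.
padded = pd.DataFrame.from_dict(columns, orient="index").transpose()

print("direct construction worked:", direct_ok)
print(padded)
```

The padded cell holds a missing value, so a shortfall in one list shows up at the end of that column rather than raising an error. It does not realign the data, which is why each extraction function above appends "Not Available" to keep the lists in step.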


In [49]:
dataframe

| | Title | Author | Stars | Price | URL |
|---|---|---|---|---|---|
| 0 | Atomic Habits: The life-changing million copy ... | James Clear | 4.7 out of 5 stars | ₹295.00 | https://www.amazon.in/Atomic-Habits-James-Clea... |
| 1 | Ikigai: The Japanese secret to a long and happ... | Héctor García | 4.6 out of 5 stars | ₹300.00 | https://www.amazon.in/Ikigai-H%C3%A9ctor-Garc%... |
| 2 | The Psychology of Money | Morgan Housel | 4.6 out of 5 stars | ₹255.00 | https://www.amazon.in/Psychology-Money-Morgan-... |
| 3 | My First Library: Boxset of 10 Board Books for... | Wonder House Books | 4.5 out of 5 stars | ₹399.00 | https://www.amazon.in/My-First-Library-Boxset-... |
| 4 | The Power of Your Subconscious Mind | Joseph Murphy | 4.5 out of 5 stars | ₹115.00 | https://www.amazon.in/Power-Your-Subconscious-... |
| ... | ... | ... | ... | ... | ... |
| 95 | Objective NCERT at your Fingertips for NEET-AI... | MTG Editorial Board | 4.5 out of 5 stars | ₹459.00 | https://www.amazon.in/MTG-Objective-NCERT-your... |
| 96 | It Starts With Us | Colleen Hoover | Not Available | ₹461.00 | https://www.amazon.in/Starts-Us-Colleen-Hoover... |
| 97 | How to Prepare for QUANTITATIVE APTITUDE for C... | ARUN SHARMA | 4.4 out of 5 stars | ₹712.00 | https://www.amazon.in/Prepare-QUANTITATIVE-APT... |
| 98 | How to Trade in Crypto \| Cryptocurrency Tradin... | Budhil Vyas | 4.8 out of 5 stars | ₹284.00 | https://www.amazon.in/Trade-Crypto-Cryptocurre... |


##  Create CSV file(s) with the extracted information

In [50]:
dataframe.to_csv('books.csv',index=None)

In [51]:
!head books.csv

Title,Author,Stars,Price,URL
Atomic Habits: The life-changing million copy bestseller,James Clear,4.7 out of 5 stars,₹295.00,https://www.amazon.in/Atomic-Habits-James-Clear/dp/1847941834/ref=zg_bs_books_sccl_1/000-0000000-0000000?pd_rd_i=1847941834&psc=1
Ikigai: The Japanese secret to a long and happy life,Héctor García,4.6 out of 5 stars,₹300.00,https://www.amazon.in/Ikigai-H%C3%A9ctor-Garc%C3%ADa/dp/178633089X/ref=zg_bs_books_sccl_2/000-0000000-0000000?pd_rd_i=178633089X&psc=1
The Psychology of Money,Morgan Housel,4.6 out of 5 stars,₹255.00,https://www.amazon.in/Psychology-Money-Morgan-Housel/dp/9390166268/ref=zg_bs_books_sccl_3/000-0000000-0000000?pd_rd_i=9390166268&psc=1
My First Library: Boxset of 10 Board Books for Kids,Wonder House Books,4.5 out of 5 stars,₹399.00,https://www.amazon.in/My-First-Library-Boxset-Board/dp/9387779262/ref=zg_bs_books_sccl_4/000-0000000-0000000?pd_rd_i=9387779262&psc=1
The Power of Your Subconscious Mind,Joseph Murphy,4.5 out of 5 stars,₹115.00,ht
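To check that a file like this reads back correctly, the round trip can be sketched in memory. The two rows below are invented, and an `io.StringIO` buffer stands in for `books.csv`. Note that `index=False` leaves the row numbers out of the file, and that pandas quotes any field containing a comma.

```python
import io

import pandas as pd

# Invented rows; the first title contains a comma on purpose.
df = pd.DataFrame({
    "Title": ["Rich Dad, Poor Dad", "Word Power Made Easy"],
    "Price": ["₹223.00", "₹122.00"],
})

buffer = io.StringIO()
df.to_csv(buffer, index=False)
csv_text = buffer.getvalue()
print(csv_text)

# Read the text back and compare the titles with the originals.
roundtrip = pd.read_csv(io.StringIO(csv_text))
print(roundtrip["Title"].tolist() == df["Title"].tolist())
```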

# Summary

We have successfully scraped the required information from a website and combined the data into a CSV file.

- First, we installed all of the necessary libraries in our Jupyter notebook.
- Using the requests library, we downloaded the web page to our notebook.
- We inspected the web page to find the HTML tags that hold each piece of data we wanted to scrape.
- We wrote helper functions that extract each field from a page, and a function that runs them over several pages and collects the results in a Python dictionary.
- Finally, we created a CSV file using the pandas library.

### Here's the complete code for this project:

In [52]:
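import time

import requests
from bs4 import BeautifulSoup

# Download the page and parse it using Beautiful Soup
def get_page(url):
    print("Scraping URL:", url)
    time.sleep(2)  # pause between requests to avoid overloading the server
    response = requests.get(url)
    print("Status code:", response.status_code)
    page_contents = response.text
    # Write as UTF-8 so non-ASCII characters such as ₹ are saved correctly
    with open('webpage.html', 'w', encoding='utf-8') as f:
        f.write(page_contents)
    doc = BeautifulSoup(page_contents, 'html.parser')
    return doc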
    

def get_book_title(doc):
    book_title_tags= doc.find_all('div',{"class": "zg-grid-general-faceout"})
    Book_Titles=[]
    for tag in book_title_tags:
        try:
            title_tag = tag.find('span')
            Book_Titles.append(title_tag.text)
        except :
            Book_Titles.append("Not Available")
    return Book_Titles


# get all the authors name 
def get_all_authors(doc):
    author_name_tags= doc.find_all('div',{'class':'zg-grid-general-faceout'})
    Author_Names=[]
    for tag in author_name_tags:
        try:
            Author_Names.append(tag.find('div',{'class':'a-row a-size-small'}).text)
        except :
            Author_Names.append("Not Available")
    return Author_Names

# get the rating on each book
def get_all_stars(doc):
    rating_tags= doc.find_all('div',{'class':'zg-grid-general-faceout'})
    Stars=[]
    for tag in rating_tags:
        try:
            Stars.append(tag.find('span',{'class':'a-icon-alt'}).text)
        except :
            Stars.append("Not Available")
            
    return Stars

# get the cost of the books
def get_all_price(doc):
    book_price_tags= doc.find_all('div',{"class": "zg-grid-general-faceout"})
    Book_Price=[]
    for tag in book_price_tags:
        try:
            Book_Price.append(tag.find('span',{'class':'p13n-sc-price'}).text)
        except :
            Book_Price.append("Not Available")
            
    return Book_Price

# at last get the book url
def get_all_url(doc):
    book_url_tag=doc.find_all('div',{"class": "zg-grid-general-faceout"})
    Book_Title_Urls=[]
    base_url="https://www.amazon.in"
    for tag in book_url_tag:
        try:
            Book_Title_Urls.append(base_url + tag.find('a',{'class':'a-link-normal'})['href'])
        except :
            Book_Title_Urls.append("Not Available")
    return Book_Title_Urls

# Future Work

This project can be extended to extract more information from each book's page.
We can also analyze the data to answer questions such as:
- Which book has the highest sales rank?
- Which book has the fewest customer reviews?
- Which book has the highest rating?
- Which book has the lowest price but a high rating?
- Which book has the lowest rating?
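A couple of these questions can be sketched with pandas. The rows below are made up and only mirror the column layout of `books.csv`, and the 4.0 cut-off for a "high" rating is an arbitrary assumption.

```python
import pandas as pd

# Invented rows in the same shape as books.csv; not real scraped values.
books = pd.DataFrame({
    "Title": ["Book A", "Book B", "Book C"],
    "Stars": ["4.7 out of 5 stars", "4.2 out of 5 stars", "Not Available"],
    "Price": ["₹295.00", "₹120.00", "₹450.00"],
})

# Turn the text columns into numbers; unparseable values become NaN.
books["Rating"] = pd.to_numeric(
    books["Stars"].str.extract(r"^(\d+(?:\.\d+)?)")[0], errors="coerce"
)
books["PriceValue"] = pd.to_numeric(
    books["Price"].str.replace("₹", "", regex=False).str.replace(",", "", regex=False),
    errors="coerce",
)

# Which book has the highest rating?
highest_rated = books.loc[books["Rating"].idxmax(), "Title"]

# Which book has the lowest price among those rated at least 4.0?
well_rated = books[books["Rating"] >= 4.0]
cheapest_well_rated = well_rated.sort_values("PriceValue").iloc[0]["Title"]

print(highest_rated, cheapest_well_rated)
```

Sales rank and review counts are not among the scraped columns, so answering those questions would require extracting additional fields first.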


# References
[1] Aakash N S, Introduction to Web Scraping, 2021. https://jovian.ai/aakashns/python-web-scraping-and-rest-api

[2] Working with Jupyter Notebook https://towardsdatascience.com/write-markdown-latex-in-the-jupyter-notebook-10985edb91fd

[3] Python official documentation. https://docs.python.org/3/

[4] Requests library. https://pypi.org/project/requests/

[5] Beautiful Soup documentation. https://www.crummy.com/software/BeautifulSoup/bs4/doc/

[6] Pandas library documentation. https://pandas.pydata.org/docs/

[7] Web Scraping Article. https://www.toptal.com/python/web-scraping-with-python

[8] Web Scraping Image. https://morioh.com/p/431153538ecb



In [53]:
import jovian

In [None]:
jovian.commit(files=['books.csv'])
