### Create an Excel file with website data using Beautiful Soup, Requests, or SelectorGadget for the link below:
<br/>https://www.amazon.in/s?k=top+10+phones+under+20000&crid=3UFKG06L1X1O1&sprefix=top+10+phone%2Caps%2C310&ref=nb_sb_ss_i_4_12<br/>

<b>Required columns:</b>
> * Mobile Name <br/>
> * Mobile price <br/>
> * Discount option (like : Save ₹3,500 (15%)) <br/>
> * EMI option or not (like : Save extra with No Cost EMI) <br/>
> * Other information (like : FREE Delivery by Thursday, September 17)

##### Import necessary libraries

In [1]:
from bs4 import BeautifulSoup
import requests

##### Link to scrape the data from

In [2]:
link = 'https://www.amazon.in/s?k=top+10+phones+under+20000&crid=3UFKG06L1X1O1&sprefix=top+10+phone%2Caps%2C310&ref=nb_sb_ss_i_4_12'

##### Get page

In [3]:
page = requests.get(link)

In [4]:
page  # Response 200 indicates successful retrieval

<Response [200]>
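
A 200 status is not guaranteed: shopping sites often answer scripted requests with an error page instead of the search results. The following is a minimal sketch of a more defensive fetch, assuming that a browser-like `User-Agent` header helps. The header values and the `fetch_page` helper are illustrative and not part of the original notebook.

```python
import requests

# Hypothetical header values; any realistic browser string can be substituted.
HEADERS = {
    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
                  "(KHTML, like Gecko) Chrome/85.0 Safari/537.36",
    "Accept-Language": "en-IN,en;q=0.9",
}


def fetch_page(url, timeout=10):
    """Fetch a URL and raise an exception on a 4xx or 5xx status."""
    response = requests.get(url, headers=HEADERS, timeout=timeout)
    response.raise_for_status()  # raises requests.HTTPError on 4xx/5xx
    return response


# Usage (performs a real network request, so it is left commented out):
# page = fetch_page(link)
```

The timeout keeps the cell from hanging indefinitely if the server never responds.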

In [5]:
# Displays content of HTML
page.content

##### Parse the HTML with BeautifulSoup (for better readability)

In [6]:
soup = BeautifulSoup(page.content, 'html.parser')

In [7]:
print(soup.prettify())

##### Identify the data to scrape using the browser's Inspect Element option

<i>Getting Mobile Names</i>

In [8]:
mobile_name_values = soup.find_all('span', class_='a-size-medium a-color-base a-text-normal')

In [9]:
mobile_name_values[0]

<span class="a-size-medium a-color-base a-text-normal" dir="auto">Honor 9A (Phantom Blue, 3GB RAM, 64GB Storage)- Download Apps Through Petal Search</span>

In [10]:
# Extract the visible text from every matching <span>
mobile_names = [tag.get_text() for tag in mobile_name_values]

In [11]:
mobile_names

['Honor 9A (Phantom Blue, 3GB RAM, 64GB Storage)- Download Apps Through Petal Search',
 'OPPO A5 2020 (Dazzling White, 3GB RAM, 64GB Storage) with No Cost EMI/Additional Exchange Offers',
 'Redmi Note 9 Pro (Interstellar Black, 4GB RAM, 64GB Storage) - Latest 8nm Snapdragon 720G & Gorilla Glass 5 Protection',
 'Redmi Note 9 Pro (Glacier White, 6GB RAM, 128GB Storage) - Latest 8nm Snapdragon 720G & Gorilla Glass 5 Protection',
 'Samsung Galaxy M31s (Mirage Blue, 6GB RAM, 128GB Storage)',
 'Redmi Note 8 (Cosmic Purple, 4GB RAM, 64GB Storage)',
 'Redmi Note 8 (Moonlight White, 4GB RAM, 64GB Storage)',
 'Samsung Galaxy M31 (Space Black, 6GB RAM, 128GB Storage)',
 'OPPO A5 2020 (Mirror Black, 3GB RAM, 64GB Storage) with No Cost EMI/Additional Exchange Offers',
 'Samsung Galaxy M01 Core (Red, 1GB RAM, 16GB Storage) with No Cost EMI/Additional Exchange Offers',
 'Vivo S1 Pro (Dreamy White, 8GB RAM, 128GB Storage) with No Cost EMI/Additional Exchange Offers',
 'Vivo Y12 (Aqua Blue, 3GB RAM, 64

In [12]:
len(mobile_names)

16

<i>Remove Duplicates (if any)</i>

(Create a dictionary, using the List items as keys. This will automatically remove any duplicates because dictionaries cannot have duplicate keys.)

In [13]:
mobile_list = list(dict.fromkeys(mobile_names))
mobile_list

['Honor 9A (Phantom Blue, 3GB RAM, 64GB Storage)- Download Apps Through Petal Search',
 'OPPO A5 2020 (Dazzling White, 3GB RAM, 64GB Storage) with No Cost EMI/Additional Exchange Offers',
 'Redmi Note 9 Pro (Interstellar Black, 4GB RAM, 64GB Storage) - Latest 8nm Snapdragon 720G & Gorilla Glass 5 Protection',
 'Redmi Note 9 Pro (Glacier White, 6GB RAM, 128GB Storage) - Latest 8nm Snapdragon 720G & Gorilla Glass 5 Protection',
 'Samsung Galaxy M31s (Mirage Blue, 6GB RAM, 128GB Storage)',
 'Redmi Note 8 (Cosmic Purple, 4GB RAM, 64GB Storage)',
 'Redmi Note 8 (Moonlight White, 4GB RAM, 64GB Storage)',
 'Samsung Galaxy M31 (Space Black, 6GB RAM, 128GB Storage)',
 'OPPO A5 2020 (Mirror Black, 3GB RAM, 64GB Storage) with No Cost EMI/Additional Exchange Offers',
 'Samsung Galaxy M01 Core (Red, 1GB RAM, 16GB Storage) with No Cost EMI/Additional Exchange Offers',
 'Vivo S1 Pro (Dreamy White, 8GB RAM, 128GB Storage) with No Cost EMI/Additional Exchange Offers',
 'Vivo Y12 (Aqua Blue, 3GB RAM, 64

In [14]:
len(mobile_list)

16
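
The de-duplication idiom is easier to see on a list that actually contains repeats. A small sketch with made-up sample names (not taken from the scraped page):

```python
# Illustrative sample with two repeated entries.
names = ["Redmi Note 8", "Honor 9A", "Redmi Note 8", "Vivo Y12", "Honor 9A"]

# dict keys are unique and (since Python 3.7) keep insertion order,
# so the first occurrence of each name survives in its original position.
unique_names = list(dict.fromkeys(names))
print(unique_names)  # ['Redmi Note 8', 'Honor 9A', 'Vivo Y12']

# A set also removes duplicates but gives no ordering guarantee.
assert set(unique_names) == set(names)
```

Keeping the original order matters here because the other columns are collected as parallel lists in page order.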

<i>Getting Mobile Price</i>

(Since the name list contains no duplicates, the remaining categories can be collected as-is and will line up with the names row by row.)
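
This shortcut works only because no names repeat. If duplicates did appear, de-duplicating the name list alone would leave it shorter than the other lists and shift every later row. One possible way around that, sketched here with made-up sample values, is to zip the parallel lists into rows first and then drop repeated rows.

```python
# Illustrative sample data; the third entry repeats the first.
names = ["Phone A", "Phone B", "Phone A"]
prices = ["9,999", "10,990", "9,999"]

# Pair each name with its price so the fields stay together.
rows = list(zip(names, prices))

# Tuples are hashable, so dict.fromkeys removes repeated rows
# while keeping the first occurrence of each.
unique_rows = list(dict.fromkeys(rows))
print(unique_rows)  # [('Phone A', '9,999'), ('Phone B', '10,990')]

# Unzip back into parallel lists of equal length.
unique_names, unique_prices = map(list, zip(*unique_rows))
```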

In [15]:
price_values = soup.find_all('span', class_='a-price-whole')
price_values[0]

<span class="a-price-whole">9,999</span>

In [16]:
# Extract the visible text (e.g. '9,999') from every price <span>
price_list = [tag.get_text() for tag in price_values]

In [17]:
price_list

['9,999',
 '10,990',
 '13,999',
 '16,999',
 '19,499',
 '12,799',
 '12,799',
 '17,499',
 '10,990',
 '4,999',
 '18,990',
 '10,990',
 '9,990',
 '19,499',
 '5,999',
 '13,499']

In [18]:
len(price_list)

16
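
The prices are still strings with thousands separators. If numeric values are wanted (for sorting or filtering, say), a conversion along these lines should work; `sample_prices` simply repeats the first few values from the output above.

```python
# First few values from the price list shown above.
sample_prices = ['9,999', '10,990', '13,999']

# Strip the thousands separators before converting to int.
prices_inr = [int(price.replace(',', '')) for price in sample_prices]
print(prices_inr)  # [9999, 10990, 13999]
```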

<i>Getting Discount Options</i>

In [19]:
scrap_data = soup.find_all('div', class_="a-section a-spacing-none a-spacing-top-small")

<i>The `span` holding the required value has no class attribute, so we retrieve the whole `div` tag enclosing the discount options and extract the required string from it.</i>

In [20]:
scrap_data[0]

<div class="a-section a-spacing-none a-spacing-top-small">
<div class="a-row a-size-base a-color-base"><div class="a-row">
<a class="a-size-base a-link-normal s-no-hover a-text-normal" href="/Honor-Phantom-Storage-Download-Through/dp/B08BSK3GP9?dchild=1" target="_blank">
<span class="a-price" data-a-color="price" data-a-size="l"><span class="a-offscreen">₹9,999</span><span aria-hidden="true"><span class="a-price-symbol">₹</span><span class="a-price-whole">9,999</span></span></span>
<span class="a-price a-text-price" data-a-color="secondary" data-a-size="b" data-a-strike="true"><span class="a-offscreen">₹11,999</span><span aria-hidden="true">₹11,999</span></span>
</a>
<span class="a-letter-space"></span><span dir="auto">Save ₹2,000 (17%)</span><span class="a-letter-space"></span></div></div><div class="a-row a-size-base a-color-secondary"><div class="a-row"><span class="a-color-secondary"><span class="a-truncate" data-a-max-rows="1" data-a-overflow-marker="&amp;hellip;" data-a-word-brea

<i> Convert the tag to a string so that a regex search can be applied </i>

In [21]:
trial = str(scrap_data[0])

In [22]:
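import re

# Capture the text inside the <span dir="auto"> that has no other attributes
m = re.search(r'<span dir="auto">(.+?)</span>', trial)
found = m.group(1) if m else None  # None when no discount span is present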


In [23]:
found

'Save ₹2,000 (17%)'
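
As an alternative to converting the tag to a string, BeautifulSoup can be asked for the same element directly. This is only a sketch: the HTML below is a trimmed copy of the markup printed for `scrap_data[0]`, and `is_bare_auto_span` is a hypothetical helper that mirrors what the regex matches.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the markup printed for scrap_data[0].
html = (
    '<div class="a-section a-spacing-none a-spacing-top-small">'
    '<span class="a-price-whole">9,999</span>'
    '<span class="a-letter-space"></span>'
    '<span dir="auto">Save ₹2,000 (17%)</span>'
    '</div>'
)
card = BeautifulSoup(html, 'html.parser')


def is_bare_auto_span(tag):
    """Match <span dir="auto"> with no other attributes, as the regex does."""
    return tag.name == 'span' and tag.attrs == {'dir': 'auto'}


match = card.find(is_bare_auto_span)
discount = match.get_text() if match else 'No Savings'
print(discount)  # Save ₹2,000 (17%)
```

Working on the parsed tree avoids depending on the exact attribute order and quoting that `str()` happens to produce.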

In [24]:
len(scrap_data)

16

<i> Applying the same logic to the whole data collected </i>

(Products without an offer have to be handled as well.)

In [25]:
discount_options = []
for block in scrap_data:
    # The discount text sits in a <span dir="auto"> with no other attributes
    m = re.search(r'<span dir="auto">(.+?)</span>', str(block))
    if m is not None:
        discount_options.append(m.group(1))
    else:
        # Keep a placeholder so this list stays aligned with the others
        discount_options.append("No Savings")

In [26]:
discount_options

['Save ₹2,000 (17%)',
 'Save ₹4,000 (27%)',
 'Save ₹3,000 (18%)',
 'Save ₹3,000 (15%)',
 'Save ₹3,500 (15%)',
 'Save ₹200 (2%)',
 'Save ₹200 (2%)',
 'No Savings',
 'Save ₹4,000 (27%)',
 'Save ₹2,000 (29%)',
 'Save ₹2,000 (10%)',
 'Save ₹3,000 (21%)',
 'Save ₹1,000 (9%)',
 'Save ₹3,500 (15%)',
 'Save ₹2,500 (29%)',
 'No Savings']

In [27]:
len(discount_options)

16
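
Should the amount and percentage be needed as separate numbers, the discount strings can be split with one more regular expression. A sketch, assuming every offer follows the `Save ₹X (Y%)` shape seen in the output above; `parse_discount` is a hypothetical helper.

```python
import re

# Pattern for strings such as 'Save ₹2,000 (17%)'.
DISCOUNT_RE = re.compile(r'Save ₹([\d,]+) \((\d+)%\)')


def parse_discount(text):
    """Return (rupees_saved, percent_off), or (0, 0) when there is no offer."""
    m = DISCOUNT_RE.search(text)
    if m is None:
        return 0, 0
    return int(m.group(1).replace(',', '')), int(m.group(2))


sample = ['Save ₹2,000 (17%)', 'Save ₹200 (2%)', 'No Savings']
print([parse_discount(s) for s in sample])
# [(2000, 17), (200, 2), (0, 0)]
```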

<i> Getting EMI options </i>

(Products without an EMI option have to be handled as well.)

In [28]:
emi_options_values = soup.find_all('span', class_="a-color-secondary")
emi_options_values[0]

<span class="a-color-secondary"><span class="a-truncate" data-a-max-rows="1" data-a-overflow-marker="&amp;hellip;" data-a-word-break="normal" style="line-height: 1.3em !important; max-height: 1.3em;"><span class="a-truncate-full">Save extra with No Cost EMI</span><span aria-hidden="true" class="a-truncate-cut a-hidden"></span></span></span>

In [29]:
emi_options_list = []
for block in emi_options_values:
    # The EMI text sits in the nested <span class="a-truncate-full">
    m = re.search(r'<span class="a-truncate-full">(.+?)</span>', str(block))
    if m is not None:
        emi_options_list.append(m.group(1))
    else:
        # Keep a placeholder so this list stays aligned with the others
        emi_options_list.append("No EMI Options")

In [30]:
emi_options_list

['Save extra with No Cost EMI',
 'Save extra with No Cost EMI',
 'Save extra with No Cost EMI',
 'Save extra with No Cost EMI',
 'Save extra with No Cost EMI',
 'Save extra with No Cost EMI',
 'Save extra with No Cost EMI',
 'Save extra with No Cost EMI',
 'Save extra with No Cost EMI',
 'Save extra with No Cost EMI',
 'Save extra with No Cost EMI',
 'Save extra with No Cost EMI',
 'Save extra with No Cost EMI',
 'Save extra with No Cost EMI',
 'Save extra with No Cost EMI',
 'Save extra with No Cost EMI']

In [31]:
len(emi_options_list)

16
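
The requirement asks whether an EMI option exists at all, so a yes/no column may be more useful than the raw text. A short sketch on sample values, relying on the `"No EMI Options"` placeholder used in the loop above:

```python
# One real value from the output above, plus the placeholder for a missing offer.
sample_emi = ['Save extra with No Cost EMI', 'No EMI Options']

# True when an EMI offer was found, False for the placeholder string.
has_emi = [option != 'No EMI Options' for option in sample_emi]
print(has_emi)  # [True, False]
```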

#### Fetching Additional Information

<i> Fetching the expected delivery date </i>

In [32]:
scrap_data = soup.find_all('div', class_="a-row s-align-children-center")
scrap_data[0]

<div class="a-row s-align-children-center">
<span class="aok-inline-block s-image-logo-view">
<span class="aok-relative s-icon-text-medium s-prime">
<i aria-label="Amazon Prime" class="a-icon a-icon-prime a-icon-medium" role="img"></i>
</span>
<span>
</span>
</span>
<span aria-label="Get it by Tuesday, September 22">
<span dir="auto">Get it by </span><span class="a-text-bold" dir="auto">Tuesday, September 22</span>
</span>
</div>

In [33]:
get_it_by = []
for block in scrap_data:
    html = str(block)
    # First span holds the label ("Get it by "), the bold span holds the date
    m = re.search(r'<span dir="auto">(.+?)</span>', html)
    n = re.search(r'<span class="a-text-bold" dir="auto">(.+?)</span>', html)
    # Blocks missing either part are skipped, which would shorten this list
    if m and n:
        get_it_by.append(m.group(1) + ":" + n.group(1))

In [34]:
get_it_by

['Get it by :Tuesday, September 22',
 'Get it by :Tuesday, September 22',
 'Get it by :Tuesday, September 22',
 'Get it by :Tuesday, September 22',
 'Get it by :Tuesday, September 22',
 'Get it by :Tuesday, September 22',
 'Get it by :Tuesday, September 22',
 'Get it by :Tuesday, September 22',
 'Get it by :Tuesday, September 22',
 'Get it by :Tuesday, September 22',
 'Get it by :Tuesday, September 22',
 'Get it by :Tuesday, September 22',
 'Get it by :Thursday, September 24',
 'Get it by :Tuesday, September 22',
 'Get it by :Wednesday, September 23',
 'Get it by :Tuesday, September 22']

In [35]:
len(get_it_by)

16
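
The markup printed for `scrap_data[0]` already carries the full delivery sentence in an `aria-label` attribute, so it can also be read without any regex. A sketch using a trimmed copy of that markup; the `'No delivery information'` placeholder is an assumption, chosen to match how missing discounts are handled.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the markup printed for scrap_data[0].
html = (
    '<div class="a-row s-align-children-center">'
    '<i aria-label="Amazon Prime" class="a-icon a-icon-prime"></i>'
    '<span aria-label="Get it by Tuesday, September 22">'
    '<span dir="auto">Get it by </span>'
    '<span class="a-text-bold" dir="auto">Tuesday, September 22</span>'
    '</span></div>'
)
row = BeautifulSoup(html, 'html.parser')

# attrs={'aria-label': True} matches any <span> that has the attribute at all;
# restricting the search to <span> skips the Prime icon, which is an <i> tag.
labelled = row.find('span', attrs={'aria-label': True})
delivery = labelled['aria-label'] if labelled else 'No delivery information'
print(delivery)  # Get it by Tuesday, September 22
```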

<i> Ratings </i>

In [36]:
scrap_data = soup.find_all('div', class_="a-row a-size-small")
scrap_data[0]

<div class="a-row a-size-small">
<span aria-label="3.1 out of 5 stars">
<span class="a-declarative" data-a-popover='{"max-width":"700","closeButton":false,"position":"triggerBottom","url":"/review/widgets/average-customer-review/popover/ref=acr_search__popover?ie=UTF8&amp;asin=B08BSK3GP9&amp;ref=acr_search__popover&amp;contextId=search"}' data-action="a-popover">
<a class="a-popover-trigger a-declarative" href="javascript:void(0)"><i class="a-icon a-icon-star-small a-star-small-3 aok-align-bottom"><span class="a-icon-alt">3.1 out of 5 stars</span></i><i class="a-icon a-icon-popover"></i></a>
</span>
</span>
<span aria-label="512">
<a class="a-link-normal" href="/Honor-Phantom-Storage-Download-Through/dp/B08BSK3GP9?dchild=1#customerReviews" target="_blank">
<span class="a-size-base" dir="auto">512</span>
</a>
</span>
</div>

In [37]:
ratings = []
for block in scrap_data:
    # The first aria-label in each block is the star rating text
    m = re.search(r'<span aria-label="(.+?)">', str(block))
    if m:
        ratings.append(m.group(1))

In [38]:
ratings

['3.1 out of 5 stars',
 '4.1 out of 5 stars',
 '4.3 out of 5 stars',
 '4.3 out of 5 stars',
 '4.4 out of 5 stars',
 '4.3 out of 5 stars',
 '4.3 out of 5 stars',
 '4.3 out of 5 stars',
 '4.1 out of 5 stars',
 '3.1 out of 5 stars',
 '4.0 out of 5 stars',
 '4.2 out of 5 stars',
 '4.0 out of 5 stars',
 '4.4 out of 5 stars',
 '3.1 out of 5 stars',
 '4.3 out of 5 stars']

In [39]:
len(ratings)

16
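
To compare or sort products by rating, the leading number can be pulled out of each label. A minimal sketch on two of the values shown above, assuming every label starts with the score:

```python
sample_ratings = ['3.1 out of 5 stars', '4.4 out of 5 stars']

# The score is the first whitespace-separated token of each label.
scores = [float(label.split()[0]) for label in sample_ratings]
print(scores)  # [3.1, 4.4]
```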

<i> Total Number of Reviews </i>

In [40]:
scrap_data = soup.find_all('span', class_='a-size-base')
# Index 8 is the first match that holds a review count
scrap_data[8]

<span class="a-size-base" dir="auto">512</span>

In [41]:
total_reviews = []
for tag in scrap_data:
    # Only spans of exactly this form hold a review count; others are skipped
    m = re.search(r'<span class="a-size-base" dir="auto">(.+?)</span>', str(tag))
    if m:
        total_reviews.append(m.group(1))

In [42]:
total_reviews

['512',
 '4,952',
 '12,203',
 '12,203',
 '15,826',
 '98,418',
 '98,418',
 '87,858',
 '4,952',
 '1,293',
 '1,517',
 '1,967',
 '1,058',
 '15,826',
 '1,293',
 '797']

In [43]:
len(total_reviews)

16
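
All seven lists have to be the same length before they can become columns of one table; pandas raises a `ValueError` otherwise. Rather than checking each `len(...)` by eye, a small helper can do it in one step. `check_equal_lengths` is a hypothetical helper and the sample lists are stand-ins for the scraped ones.

```python
def check_equal_lengths(columns):
    """Raise ValueError if the named lists differ in length."""
    lengths = {name: len(values) for name, values in columns.items()}
    if len(set(lengths.values())) > 1:
        raise ValueError(f'Column lengths differ: {lengths}')
    return lengths


# Illustrative stand-ins for the scraped lists.
columns = {
    'Mobile Name': ['Phone A', 'Phone B'],
    'Selling Price': ['9,999', '10,990'],
}
print(check_equal_lengths(columns))  # {'Mobile Name': 2, 'Selling Price': 2}
```

The error message lists every column with its length, which makes it easy to see which extraction step dropped or added a row.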

### Storing these values in a spreadsheet

In [44]:
import pandas as pd

# One column per scraped list; all lists must have the same length
data = pd.DataFrame({
    'Mobile Name': mobile_names,
    'Selling Price': price_list,
    'Discount Options': discount_options,
    'EMI Options': emi_options_list,
    'Delivery Date': get_it_by,
    'Ratings': ratings,
    'Review Count': total_reviews,
})
data

| | Mobile Name | Selling Price | Discount Options | EMI Options | Delivery Date | Ratings | Review Count |
|---|---|---|---|---|---|---|---|
| 0 | Honor 9A (Phantom Blue, 3GB RAM, 64GB Storage)... | 9999 | Save ₹2,000 (17%) | Save extra with No Cost EMI | Get it by :Tuesday, September 22 | 3.1 out of 5 stars | 512 |
| 1 | OPPO A5 2020 (Dazzling White, 3GB RAM, 64GB St... | 10990 | Save ₹4,000 (27%) | Save extra with No Cost EMI | Get it by :Tuesday, September 22 | 4.1 out of 5 stars | 4952 |
| 2 | Redmi Note 9 Pro (Interstellar Black, 4GB RAM,... | 13999 | Save ₹3,000 (18%) | Save extra with No Cost EMI | Get it by :Tuesday, September 22 | 4.3 out of 5 stars | 12203 |
| 3 | Redmi Note 9 Pro (Glacier White, 6GB RAM, 128G... | 16999 | Save ₹3,000 (15%) | Save extra with No Cost EMI | Get it by :Tuesday, September 22 | 4.3 out of 5 stars | 12203 |
| 4 | Samsung Galaxy M31s (Mirage Blue, 6GB RAM, 128... | 19499 | Save ₹3,500 (15%) | Save extra with No Cost EMI | Get it by :Tuesday, September 22 | 4.4 out of 5 stars | 15826 |
| 5 | Redmi Note 8 (Cosmic Purple, 4GB RAM, 64GB Sto... | 12799 | Save ₹200 (2%) | Save extra with No Cost EMI | Get it by :Tuesday, September 22 | 4.3 out of 5 stars | 98418 |
| 6 | Redmi Note 8 (Moonlight White, 4GB RAM, 64GB S... | 12799 | Save ₹200 (2%) | Save extra with No Cost EMI | Get it by :Tuesday, September 22 | 4.3 out of 5 stars | 98418 |
| 7 | Samsung Galaxy M31 (Space Black, 6GB RAM, 128G... | 17499 | No Savings | Save extra with No Cost EMI | Get it by :Tuesday, September 22 | 4.3 out of 5 stars | 87858 |
| 8 | OPPO A5 2020 (Mirror Black, 3GB RAM, 64GB Stor... | 10990 | Save ₹4,000 (27%) | Save extra with No Cost EMI | Get it by :Tuesday, September 22 | 4.1 out of 5 stars | 4952 |
| 9 | Samsung Galaxy M01 Core (Red, 1GB RAM, 16GB St... | 4999 | Save ₹2,000 (29%) | Save extra with No Cost EMI | Get it by :Tuesday, September 22 | 3.1 out of 5 stars | 1293 |


In [45]:
data.to_csv('Output/ProductDetails.csv', index=False, encoding='utf-8-sig')
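
The `to_csv` call above fails if the `Output` folder does not exist yet. A hedged sketch of a wrapper that creates the folder first; `save_products` is a hypothetical helper, not part of the original notebook. For a native `.xlsx` file, `DataFrame.to_excel` could be used instead, but it needs an extra engine such as `openpyxl` installed.

```python
from pathlib import Path

import pandas as pd


def save_products(frame, out_dir='Output', filename='ProductDetails.csv'):
    """Create out_dir if needed, write the frame as CSV, and return the path."""
    out_path = Path(out_dir)
    out_path.mkdir(parents=True, exist_ok=True)
    target = out_path / filename
    # utf-8-sig writes a byte-order mark, which helps Excel detect UTF-8
    # so that characters such as the rupee sign display correctly.
    frame.to_csv(target, index=False, encoding='utf-8-sig')
    return target


# Usage with the notebook's DataFrame (left commented out):
# save_products(data)
```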

<center>Prepared by J.Haripriya</center>