# WEB SCRAPING & ANALYSIS OF AMAZON SMARTPHONES


<img src="https://www.webharvy.com/images/amazon-scraping.png">

## INTRODUCTION:
### This project scrapes Amazon for data on smartphones priced under 20,000 INR. It extracts each listing's price, description, rating, number of ratings, original price, and discounted price, compiles the results into a structured dataset, and then runs an exploratory analysis to look for trends in the smartphone market.
### Amazon web scraping means extracting data from Amazon's website with automated tools or scripts. The libraries used in this notebook are listed below.
## Tools and Libraries:
   #### 1. BeautifulSoup: A Python library for parsing HTML and XML documents. It builds a parse tree that makes it easy to extract data.
   #### 2. Requests: A simple HTTP library for Python that makes it easy to send HTTP requests.


<img src="https://www.amarinfotech.com/wp-content/uploads/2020/04/Web-Scraping.jpg">

## - Import Requests Library

In [5]:
import requests

## - Import BeautifulSoup Library

<img src="https://media.licdn.com/dms/image/D4D12AQGeRZCojr1rMA/article-cover_image-shrink_720_1280/0/1708598369630?e=2147483647&v=beta&t=T737I1Mo7M-3MXHf_rKo3xx-9OlxsdHP-OXMzTs6KU8">

In [6]:
from bs4 import BeautifulSoup

In [7]:
headers={"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/125.0.0.0 Safari/537.36"}
url="https://www.amazon.in/smartphones-under-20000/s?k=smartphones+under+20000"

In [8]:
req=requests.get(url,headers=headers)
print(req)

<Response [200]>
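Printing the response only shows the status code. As a minimal sketch, the same check can be made explicit with `raise_for_status()`, which Requests provides to turn 4xx/5xx responses into exceptions. The helper name `check_response` and the hand-built `Response` objects (including setting the private `_content` attribute) are assumptions used for an offline illustration; they are not part of the original notebook.

```python
import requests


def check_response(response: requests.Response) -> bytes:
    """Return the body of a successful response, or raise for 4xx/5xx."""
    response.raise_for_status()  # raises requests.HTTPError on 4xx/5xx
    return response.content


# Offline demonstration with hand-built Response objects (no network call).
ok = requests.Response()
ok.status_code = 200
ok._content = b"<html></html>"  # private attribute, set only for this demo

blocked = requests.Response()
blocked.status_code = 503

body = check_response(ok)

try:
    check_response(blocked)
    blocked_raised = False
except requests.HTTPError:
    blocked_raised = True

print(body, blocked_raised)
```

In the notebook itself, the call would be `check_response(req)` right after `requests.get(...)`.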


In [9]:
html_content=req.content

In [10]:
# print(html_content)

In [11]:
soup=BeautifulSoup(req.content,"html.parser")
print(req)

<Response [200]>


<img src="https://i.ytimg.com/vi/G9D4wQKI1wc/maxresdefault.jpg">

<img src="https://miro.medium.com/v2/resize:fit:1400/1*LhyDelFNvzAqMd8edX1E8Q.png">

### 1. PRICE

In [23]:
# Find every price element on the results page
mob_prices = soup.find_all("span", class_="a-price-whole")
mob_prices_list = [tag.text for tag in mob_prices]

# Keep at most the first 15 prices so this list matches the other columns
mob_prices_list = mob_prices_list[:15]

# Print the mobile prices
for i, mp in enumerate(mob_prices_list, start=1):
    print(f"#{i} {mp}")

# Verify the length
print(f"Total mobile prices listed: {len(mob_prices_list)}")


#1 26,999
#2 19,999
#3 17,000
#4 19,999
#5 19,621
#6 18,248
#7 17,490
#8 12,999
#9 17,749
#10 12,435
#11 19,269
#12 19,998
#13 14,399
#14 16,299
#15 6,999
Total mobile prices listed: 15


### 2. DESCRIPTION


In [34]:
mob_descriptions = soup.find_all("span", class_="a-size-medium a-color-base a-text-normal")
mob_description_list = [tag.text for tag in mob_descriptions]

# Slice the list to get only the first 15 descriptions
mob_description_list = mob_description_list[:15]

# Print the 15 mobile descriptions
for i, md in enumerate(mob_description_list, start=1):
    print(f"#{i}\n{md}\n")

# Verify the length
print(f"Total mobile descriptions listed: {len(mob_description_list)}")

#1
Redmi note 13 pro 5g scarlet red 8gb 256gb

#2
OnePlus Nord CE4 Lite 5G (Super Silver, 8GB RAM, 128GB Storage)

#3
realme 12 5G (Woodland Green, 8GB RAM 128 GB Storage) | 108 MP 3X Zoom Portrait Camera | Trendy Watch Design | 45 W SUPERVOOC Charge | 5000 mAh Massive Battery | Dual Stereo Speakers | Dynamic Button

#4
OnePlus Nord CE4 Lite 5G (Mega Blue, 8GB RAM, 128GB Storage)

#5
OnePlus Nord CE 3 Lite 5G (Chromatic Gray, 8GB RAM, 256GB Storage)

#6
OnePlus Nord CE 3 Lite 5G (Chromatic Gray, 8GB RAM, 128GB Storage)

#7
Motorola g64 5G (Pearl Blue, 256 GB) (12 GB RAM)

#8
Samsung Galaxy M34 5G (Waterfall Blue,6GB,128GB)|120Hz sAMOLED Display|50MP Triple No Shake Cam|6000 mAh Battery|4 Gen OS Upgrade & 5 Year Security Update|12GB RAM with RAM+|Android 13|Without Charger

#9
OnePlus Nord CE 3 Lite 5G (Pastel Lime, 8GB RAM, 128GB Storage)

#10
Motorola G34 5G (Charcoal Black, 8GB RAM, 128GB Storage)

#11
Redmi 13C (Stardust Black, 6GB RAM, 128GB Storage) | Powered by 4G MediaTek Helio 

### 3. RATING

In [36]:
mob_rating = soup.find_all("span", class_="a-icon-alt")
mob_rating_list = [tag.text for tag in mob_rating]

# Remove the last element
mob_rating_list.pop()

# Slice the list to get only the first 15 ratings
mob_rating_list = mob_rating_list[:15]

# Print the 15 mobile ratings
for i, mr in enumerate(mob_rating_list, start=1):
    print(f"#{i} {mr}")

# Verify the length
print(f"Total mobile ratings listed: {len(mob_rating_list)}")

#1 4.5 out of 5 stars.
#2 5.0 out of 5 stars
#3 4.2 out of 5 stars
#4 4.2 out of 5 stars
#5 4.2 out of 5 stars
#6 3.3 out of 5 stars
#7 4.0 out of 5 stars
#8 4.2 out of 5 stars
#9 3.8 out of 5 stars
#10 4.1 out of 5 stars
#11 4.1 out of 5 stars
#12 4.0 out of 5 stars
#13 3.8 out of 5 stars
#14 3.6 out of 5 stars
#15 4.7 out of 5 stars
Total mobile ratings listed: 15
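The ratings above are still text such as `4.2 out of 5 stars`, so they cannot be averaged or sorted numerically. A minimal sketch of one way to pull out the number with a regular expression follows; the function name `parse_rating` and the third sample string are assumptions added for illustration.

```python
import re

# Matches the leading number in strings like "4.2 out of 5 stars"
RATING_PATTERN = re.compile(r"(\d+(?:\.\d+)?)\s+out of\s+5")


def parse_rating(text):
    """Return the rating as a float, or None if the text has no rating."""
    match = RATING_PATTERN.search(text)
    return float(match.group(1)) if match else None


samples = ["4.5 out of 5 stars.", "5.0 out of 5 stars", "no rating yet"]
parsed = [parse_rating(s) for s in samples]
print(parsed)  # [4.5, 5.0, None]
```

Returning `None` instead of raising keeps the list the same length as the input, which matters when the lists are later combined into columns.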


### 4. NUMBER OF RATINGS

In [37]:
mob_rating_count = soup.find_all("span", class_="a-size-base s-underline-text")
# print(mob_rating_count)
mob_rating_count_list = [tag.text for tag in mob_rating_count]

# Print each rating count with its position
for i, nor in enumerate(mob_rating_count_list, start=1):
    print(f"#{i}", nor)

len(mob_rating_count)

#1 2
#2 85
#3 47,280
#4 47,280
#5 12
#6 3,428
#7 47,280
#8 138
#9 5,214
#10 6,365
#11 305
#12 61
#13 536
#14 11
#15 803


15

### 5. ORIGINAL PRICE

In [39]:

# NOTE: this uses the same "a-price-whole" class as the PRICE step,
# so it returns the same values as mob_prices_list.
mob_original_price = soup.find_all("span", class_="a-price-whole")

# Extract text from the elements and drop empty strings
mob_original_price_list = [tag.text.strip() for tag in mob_original_price if tag.text.strip()]

# Keep at most the first 15 values so this list matches the other columns
mob_original_price_list = mob_original_price_list[:15]

# Print the original prices
for i, op in enumerate(mob_original_price_list, start=1):
    print(f"#{i} {op}")

# Verify the length
print(f"Total original prices listed: {len(mob_original_price_list)}")


#1 26,999
#2 19,999
#3 17,000
#4 19,999
#5 19,621
#6 18,248
#7 17,490
#8 12,999
#9 17,749
#10 12,435
#11 19,269
#12 19,998
#13 14,399
#14 16,299
#15 6,999
Total original prices listed: 15
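The output above is identical to the PRICE step because both cells query the `a-price-whole` class. One possible way to read a separate list price and derive a discount is sketched below. The sample markup, the `a-text-price` and `a-offscreen` class names, and the helper names `to_number` and `extract_prices` are all assumptions for illustration and should be checked against the live page before use.

```python
import re

from bs4 import BeautifulSoup

# Hypothetical markup modelled on one result card; the real page may differ.
SAMPLE_CARD = """
<div>
  <span class="a-price">
    <span class="a-offscreen">₹19,999</span>
    <span class="a-price-whole">19,999</span>
  </span>
  <span class="a-price a-text-price">
    <span class="a-offscreen">₹24,999</span>
  </span>
</div>
"""


def to_number(price_text):
    """Convert text such as '₹24,999' to the integer 24999."""
    match = re.search(r"\d[\d,]*", price_text)
    return int(match.group(0).replace(",", "")) if match else None


def extract_prices(card_html):
    """Return (selling_price, original_price) for one card."""
    card = BeautifulSoup(card_html, "html.parser")
    selling_tag = card.find("span", class_="a-price-whole")
    # Assumed: the struck-through price sits in a span carrying "a-text-price"
    wrapper = card.find("span", class_="a-text-price")
    original_tag = wrapper.find("span", class_="a-offscreen") if wrapper else None
    selling = to_number(selling_tag.text) if selling_tag else None
    original = to_number(original_tag.text) if original_tag else None
    return selling, original


selling, original = extract_prices(SAMPLE_CARD)
discount_pct = round((original - selling) / original * 100, 1)
print(selling, original, discount_pct)
```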


## IMPORT NECESSARY LIBRARIES

<img src="https://miro.medium.com/v2/resize:fit:1400/1*3GbLagVDPY9QKjjgB_Tfqw.png">

In [40]:
import pandas as pd

In [41]:
mob={"mob_prices_list": mob_prices_list,
    "mob_description_list": mob_description_list,
    "mob_rating_list": mob_rating_list,
    "mob_rating_count_list":mob_rating_count_list,
    "mob_original_price_list": mob_original_price_list,
}

In [42]:
print(mob)

{'mob_prices_list': ['26,999', '19,999', '17,000', '19,999', '19,621', '18,248', '17,490', '12,999', '17,749', '12,435', '19,269', '19,998', '14,399', '16,299', '6,999'], 'mob_description_list': ['Redmi note 13 pro 5g scarlet red 8gb 256gb', 'OnePlus Nord CE4 Lite 5G (Super Silver, 8GB RAM, 128GB Storage)', 'realme 12 5G (Woodland Green, 8GB RAM 128 GB Storage) | 108 MP 3X Zoom Portrait Camera | Trendy Watch Design | 45 W SUPERVOOC Charge | 5000 mAh Massive Battery | Dual Stereo Speakers | Dynamic Button', 'OnePlus Nord CE4 Lite 5G (Mega Blue, 8GB RAM, 128GB Storage)', 'OnePlus Nord CE 3 Lite 5G (Chromatic Gray, 8GB RAM, 256GB Storage)', 'OnePlus Nord CE 3 Lite 5G (Chromatic Gray, 8GB RAM, 128GB Storage)', 'Motorola g64 5G (Pearl Blue, 256 GB) (12 GB RAM)', 'Samsung Galaxy M34 5G (Waterfall Blue,6GB,128GB)|120Hz sAMOLED Display|50MP Triple No Shake Cam|6000 mAh Battery|4 Gen OS Upgrade & 5 Year Security Update|12GB RAM with RAM+|Android 13|Without Charger', 'OnePlus Nord CE 3 Lite 5G (

In [43]:
mob = {
    "mob_prices_list": mob_prices_list,
    "mob_description_list": mob_description_list,
    "mob_rating_list": mob_rating_list,
    "mob_rating_count_list":mob_rating_count_list,
    "mob_original_price_list": mob_original_price_list,
}

for key, value in mob.items():
    print(f"The length of {key} is {len(value)}")


The length of mob_prices_list is 15
The length of mob_description_list is 15
The length of mob_rating_list is 15
The length of mob_rating_count_list is 15
The length of mob_original_price_list is 15
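Equal lengths do not guarantee that position `i` in every list refers to the same phone, because each list was scraped independently and a listing with a missing field shifts everything after it. A sketch of an alternative is to loop over one result card at a time and read all fields from inside it. The `data-component-type="s-search-result"` container attribute, the sample HTML, and the helper names are assumptions for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: two result cards, the second one has no rating.
SAMPLE_HTML = """
<div data-component-type="s-search-result">
  <span class="a-size-medium a-color-base a-text-normal">Phone A</span>
  <span class="a-price-whole">19,999</span>
  <span class="a-icon-alt">4.2 out of 5 stars</span>
</div>
<div data-component-type="s-search-result">
  <span class="a-size-medium a-color-base a-text-normal">Phone B</span>
  <span class="a-price-whole">12,999</span>
</div>
"""


def text_or_none(card, class_name):
    """Return the stripped text of the first matching span, or None."""
    tag = card.find("span", class_=class_name)
    return tag.get_text(strip=True) if tag else None


def parse_cards(html):
    """Build one dict per result card so fields stay aligned."""
    soup = BeautifulSoup(html, "html.parser")
    # Assumed container attribute for a single search result
    cards = soup.find_all("div", attrs={"data-component-type": "s-search-result"})
    return [
        {
            "description": text_or_none(card, "a-size-medium a-color-base a-text-normal"),
            "price": text_or_none(card, "a-price-whole"),
            "rating": text_or_none(card, "a-icon-alt"),
        }
        for card in cards
    ]


rows = parse_cards(SAMPLE_HTML)
for row in rows:
    print(row)
```

A list of dicts like `rows` can be passed straight to `pd.DataFrame(rows)`, with missing fields appearing as empty values instead of shifting later rows.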


## AMAZON WEB SCRAPING TO CREATE A DATAFRAME

In [44]:
df=pd.DataFrame(mob)

In [45]:
df

Unnamed: 0,mob_prices_list,mob_description_list,mob_rating_list,mob_rating_count_list,mob_original_price_list
0,26999,Redmi note 13 pro 5g scarlet red 8gb 256gb,4.5 out of 5 stars.,2,26999
1,19999,"OnePlus Nord CE4 Lite 5G (Super Silver, 8GB RA...",5.0 out of 5 stars,85,19999
2,17000,"realme 12 5G (Woodland Green, 8GB RAM 128 GB S...",4.2 out of 5 stars,47280,17000
3,19999,"OnePlus Nord CE4 Lite 5G (Mega Blue, 8GB RAM, ...",4.2 out of 5 stars,47280,19999
4,19621,"OnePlus Nord CE 3 Lite 5G (Chromatic Gray, 8GB...",4.2 out of 5 stars,12,19621
5,18248,"OnePlus Nord CE 3 Lite 5G (Chromatic Gray, 8GB...",3.3 out of 5 stars,3428,18248
6,17490,"Motorola g64 5G (Pearl Blue, 256 GB) (12 GB RAM)",4.0 out of 5 stars,47280,17490
7,12999,"Samsung Galaxy M34 5G (Waterfall Blue,6GB,128G...",4.2 out of 5 stars,138,12999
8,17749,"OnePlus Nord CE 3 Lite 5G (Pastel Lime, 8GB RA...",3.8 out of 5 stars,5214,17749
9,12435,"Motorola G34 5G (Charcoal Black, 8GB RAM, 128G...",4.1 out of 5 stars,6365,12435


### 1. DISPLAY THE TOTAL NUMBER OF RECORDS IN THE DATASET

In [46]:
df.count()

mob_prices_list            15
mob_description_list       15
mob_rating_list            15
mob_rating_count_list      15
mob_original_price_list    15
dtype: int64
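`df.count()` reports non-null values per column, which equals the number of rows only when nothing is missing. A small sketch of the difference, using a made-up frame (`df_demo` and its values are assumptions, not the scraped data):

```python
import pandas as pd

# Made-up frame with one missing price, for illustration only
df_demo = pd.DataFrame(
    {
        "price": ["19,999", None, "6,999"],
        "rating": ["4.2", "4.0", "4.7"],
    }
)

total_rows = len(df_demo)    # number of records, missing values included
shape = df_demo.shape        # (rows, columns)
non_null = df_demo.count()   # non-null values per column

print(total_rows)
print(shape)
print(non_null)
```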

### 2. DISPLAY THE FIRST 10 ROWS OF THE DATASET

In [47]:
df.head(10)

Unnamed: 0,mob_prices_list,mob_description_list,mob_rating_list,mob_rating_count_list,mob_original_price_list
0,26999,Redmi note 13 pro 5g scarlet red 8gb 256gb,4.5 out of 5 stars.,2,26999
1,19999,"OnePlus Nord CE4 Lite 5G (Super Silver, 8GB RA...",5.0 out of 5 stars,85,19999
2,17000,"realme 12 5G (Woodland Green, 8GB RAM 128 GB S...",4.2 out of 5 stars,47280,17000
3,19999,"OnePlus Nord CE4 Lite 5G (Mega Blue, 8GB RAM, ...",4.2 out of 5 stars,47280,19999
4,19621,"OnePlus Nord CE 3 Lite 5G (Chromatic Gray, 8GB...",4.2 out of 5 stars,12,19621
5,18248,"OnePlus Nord CE 3 Lite 5G (Chromatic Gray, 8GB...",3.3 out of 5 stars,3428,18248
6,17490,"Motorola g64 5G (Pearl Blue, 256 GB) (12 GB RAM)",4.0 out of 5 stars,47280,17490
7,12999,"Samsung Galaxy M34 5G (Waterfall Blue,6GB,128G...",4.2 out of 5 stars,138,12999
8,17749,"OnePlus Nord CE 3 Lite 5G (Pastel Lime, 8GB RA...",3.8 out of 5 stars,5214,17749
9,12435,"Motorola G34 5G (Charcoal Black, 8GB RAM, 128G...",4.1 out of 5 stars,6365,12435


### 3. DISPLAY THE LAST FIVE ROWS OF THE DATASET

In [48]:
df.tail(5)

Unnamed: 0,mob_prices_list,mob_description_list,mob_rating_list,mob_rating_count_list,mob_original_price_list
10,19269,"Redmi 13C (Stardust Black, 6GB RAM, 128GB Stor...",4.1 out of 5 stars,305,19269
11,19998,"OnePlus Nord CE 3 Lite 5G (Pastel Lime, 8GB RA...",4.0 out of 5 stars,61,19998
12,14399,"POCO X6 5G (Snowstorm White, 12 GB RAM 256 GB ...",3.8 out of 5 stars,536,14399
13,16299,"Samsung Galaxy F15 5G (Ash Black, 6GB RAM, 128...",3.6 out of 5 stars,11,16299
14,6999,"Redmi Note 13 5G (Arctic White, 8GB RAM, 256GB...",4.7 out of 5 stars,803,6999


### 4. DISPLAY A RANDOM ROW FROM THE DATASET

In [49]:
df.sample()

Unnamed: 0,mob_prices_list,mob_description_list,mob_rating_list,mob_rating_count_list,mob_original_price_list
10,19269,"Redmi 13C (Stardust Black, 6GB RAM, 128GB Stor...",4.1 out of 5 stars,305,19269


### 5. DISPLAY THE COUNT OF RECORDS IN EACH COLUMN OF THE DATASET

In [50]:
df.count()

mob_prices_list            15
mob_description_list       15
mob_rating_list            15
mob_rating_count_list      15
mob_original_price_list    15
dtype: int64

### 6. DISPLAY THE MINIMUM VALUE IN EACH COLUMN OF THE DATASET

In [51]:
df.min()

mob_prices_list                                                       12,435
mob_description_list       Motorola G34 5G (Charcoal Black, 8GB RAM, 128G...
mob_rating_list                                           3.3 out of 5 stars
mob_rating_count_list                                                     11
mob_original_price_list                                               12,435
dtype: object

### 7. DISPLAY THE MAXIMUM VALUE IN EACH COLUMN OF THE DATASET

In [52]:
df.max()

mob_prices_list                                                        6,999
mob_description_list       realme 12 5G (Woodland Green, 8GB RAM 128 GB S...
mob_rating_list                                           5.0 out of 5 stars
mob_rating_count_list                                                     85
mob_original_price_list                                                6,999
dtype: object
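The minimum and maximum above are string comparisons: the columns have dtype `object`, so `6,999` ranks above `26,999` because the character `6` sorts after `2`. A minimal sketch of converting a price column to numbers before comparing follows; the four sample values are taken from the scraped prices, and the variable names are assumptions.

```python
import pandas as pd

prices = pd.Series(["26,999", "19,999", "12,435", "6,999"], name="mob_prices_list")

# Lexicographic comparison on the raw strings
print(prices.min(), prices.max())  # 12,435 6,999

# Strip the thousands separators, then convert to integers
numeric = pd.to_numeric(prices.str.replace(",", "", regex=False))
print(numeric.min(), numeric.max())  # 6999 26999
```

The same conversion applies to the rating-count column, where `85` is reported as the maximum for the same reason.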

### 8. CHECK FOR MISSING VALUES IN THE DATASET

In [53]:
df.isnull().sum()

mob_prices_list            0
mob_description_list       0
mob_rating_list            0
mob_rating_count_list      0
mob_original_price_list    0
dtype: int64

### 9. DISPLAY THE LAST ROW OF THE DATASET

In [54]:
df.tail(1)

Unnamed: 0,mob_prices_list,mob_description_list,mob_rating_list,mob_rating_count_list,mob_original_price_list
14,6999,"Redmi Note 13 5G (Arctic White, 8GB RAM, 256GB...",4.7 out of 5 stars,803,6999


### 10. DISPLAY THE FIRST ROW OF THE DATASET

In [55]:
df.head(1)

Unnamed: 0,mob_prices_list,mob_description_list,mob_rating_list,mob_rating_count_list,mob_original_price_list
0,26999,Redmi note 13 pro 5g scarlet red 8gb 256gb,4.5 out of 5 stars.,2,26999


In [57]:
df

Unnamed: 0,mob_prices_list,mob_description_list,mob_rating_list,mob_rating_count_list,mob_original_price_list
0,26999,Redmi note 13 pro 5g scarlet red 8gb 256gb,4.5 out of 5 stars.,2,26999
1,19999,"OnePlus Nord CE4 Lite 5G (Super Silver, 8GB RA...",5.0 out of 5 stars,85,19999
2,17000,"realme 12 5G (Woodland Green, 8GB RAM 128 GB S...",4.2 out of 5 stars,47280,17000
3,19999,"OnePlus Nord CE4 Lite 5G (Mega Blue, 8GB RAM, ...",4.2 out of 5 stars,47280,19999
4,19621,"OnePlus Nord CE 3 Lite 5G (Chromatic Gray, 8GB...",4.2 out of 5 stars,12,19621
5,18248,"OnePlus Nord CE 3 Lite 5G (Chromatic Gray, 8GB...",3.3 out of 5 stars,3428,18248
6,17490,"Motorola g64 5G (Pearl Blue, 256 GB) (12 GB RAM)",4.0 out of 5 stars,47280,17490
7,12999,"Samsung Galaxy M34 5G (Waterfall Blue,6GB,128G...",4.2 out of 5 stars,138,12999
8,17749,"OnePlus Nord CE 3 Lite 5G (Pastel Lime, 8GB RA...",3.8 out of 5 stars,5214,17749
9,12435,"Motorola G34 5G (Charcoal Black, 8GB RAM, 128G...",4.1 out of 5 stars,6365,12435


<img src="https://t4.ftcdn.net/jpg/05/33/33/35/360_F_533333591_z3CPyLiiFqMoHv6nLj137Byph2l4gFuU.jpg">