<!DOCTYPE html>
<html>
<head>
  <style>
    .center {
      display: flex;
      justify-content: center;
      align-items: center;
    }
  </style>
</head>
<body>
  <div class="center">
    <img src="https://upload.wikimedia.org/wikipedia/commons/thumb/1/1f/Python_logo_01.svg/640px-Python_logo_01.svg.png" alt="Python Logo" width="150">
    <img src="https://purepng.com/public/uploads/large/amazon-logo-s3f.png" alt="Amazon Logo" width="400">
  </div>
</body>
</html>


# **Amazon Deals Scraper**
--------

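This notebook scrapes an Amazon India deals page with BeautifulSoup and requests, visits each listed product, and collects its title, price, rating, rating count, and main image. It then uses pandas to clean the scraped data and save it to a CSV file.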

## Importing Required Libraries

In [71]:
from bs4 import BeautifulSoup
import requests
import pandas as pd
import numpy as np
import os
import random
import string

## Request headers and Amazon page URL

In [72]:
HEADERS = {
    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/118.0.0.0 Safari/537.36",
    "Accept-Language": "en-US, en;q=0.5",
}

URL = "https://www.amazon.in/deal/81486f94?showVariations=true&pf_rd_r=WETZ6TSPT47BDG8ADXYS&pf_rd_t=Events&pf_rd_i=greatindianfestival&pf_rd_p=315dfba2-4182-43a7-9ab8-5c38c7bd7a91&pf_rd_s=slot-7&ref=dlx_great_gd_dcl_tlt_0_81486f94_dt_sl7_91"

## Making a request to the Amazon URL

In [73]:
webpage = requests.get(URL, headers=HEADERS, timeout=30)  # timeout so a stalled connection cannot hang the cell

In [74]:
soup = BeautifulSoup(webpage.content, "html.parser")
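The notebook calls `requests.get` directly, once for the deals page and again for every product. The sketch below uses one shared `requests.Session` instead, so the headers are set once and connections can be reused. It also adds an explicit timeout and a status check. `make_session` and `fetch_html` are hypothetical helper names that are not part of the original notebook, and the 30-second timeout is an arbitrary choice.

```python
import requests

# Copy of the notebook's HEADERS so this block runs on its own.
HEADERS = {
    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/118.0.0.0 Safari/537.36",
    "Accept-Language": "en-US, en;q=0.5",
}

def make_session(headers):
    """Create a Session that sends `headers` with every request and reuses connections."""
    session = requests.Session()
    session.headers.update(headers)  # overrides the default python-requests User-Agent
    return session

def fetch_html(session, url, timeout=30):
    """GET `url` and return the raw bytes, raising requests.HTTPError on a 4xx/5xx status."""
    response = session.get(url, timeout=timeout)
    response.raise_for_status()
    return response.content

session = make_session(HEADERS)
# Network call, shown but not executed here:
# soup = BeautifulSoup(fetch_html(session, URL), "html.parser")
```

A status check matters for scraping: a blocked request can still return an HTML body. Without the check, that body would be parsed as if it were the deals page.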

## Scraped content from the URL

In [None]:
soup

## Finding anchor tags whose `href` holds a product URL

In [76]:
anchor_tags = soup.find_all(
    "a", attrs={"class": "a-size-base a-color-base a-link-normal a-text-normal"}
)
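Passing a space-separated class string to `find_all` matches only tags whose `class` attribute has exactly that value, in that order. If the page lists the same classes in a different order, the search finds nothing. The sketch below runs on hypothetical markup and contrasts this with a CSS selector via `select`, which only requires each listed class to be present.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the same four classes, listed in two different orders.
html = """
<a class="a-size-base a-color-base a-link-normal a-text-normal" href="/p1">One</a>
<a class="a-link-normal a-text-normal a-size-base a-color-base" href="/p2">Two</a>
"""
soup = BeautifulSoup(html, "html.parser")

# A space-separated class string only matches the attribute's exact value and order.
exact = soup.find_all(
    "a", attrs={"class": "a-size-base a-color-base a-link-normal a-text-normal"}
)
# A CSS selector requires each listed class to be present, in any order.
any_order = soup.select("a.a-link-normal.a-text-normal")

print([a["href"] for a in exact])      # ['/p1']
print([a["href"] for a in any_order])  # ['/p1', '/p2']
```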

## Building the product links

In [78]:
product_links = []
product_base_url = "https://www.amazon.in/"

for anchor_tag in anchor_tags:
    product_links.append(product_base_url + anchor_tag["href"])

In [79]:
product_links

['https://www.amazon.in//Fire-Boltt-Bluetooth-Calling-Assistance-Resolution/dp/B0BF57RN3K?ref_=Oct_DLandingS_D_81486f94_0',
 'https://www.amazon.in//Fire-Boltt-Smartwatch-Resolution-Connection-Assistance/dp/B0B3N7LR6K?ref_=Oct_DLandingS_D_81486f94_1',
 'https://www.amazon.in//beatXP-Flux-Display-Bluetooth-Tracking/dp/B0C4T91SNK?ref_=Oct_DLandingS_D_81486f94_2',
 'https://www.amazon.in//Fire-Boltt-Bluetooth-Smartwatch-Assistant-Monitoring/dp/B0BRKXXPZ7?ref_=Oct_DLandingS_D_81486f94_3',
 'https://www.amazon.in//Fire-Boltt-Phoenix-Bluetooth-Calling-Monitoring/dp/B0B3RRWSF6?ref_=Oct_DLandingS_D_81486f94_4',
 'https://www.amazon.in//beatXP-Bluetooth-Assistant-Monitoring-Charging/dp/B0BRFX19Y1?ref_=Oct_DLandingS_D_81486f94_5',
 'https://www.amazon.in//Noise-ColorFit-Bluetooth-instacharge-Functional/dp/B0BGSV43WY?ref_=Oct_DLandingS_D_81486f94_6',
 'https://www.amazon.in//boAt-Smartwatch-Display-Bluetooth-Monitoring/dp/B0CBPL63B3?ref_=Oct_DLandingS_D_81486f94_7',
 'https://www.amazon.in//Fire-
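Every link in the output above contains `amazon.in//`. The `href` values already begin with `/`, and plain string concatenation adds that second slash after the base URL's trailing `/`. The standard library's `urllib.parse.urljoin` resolves a relative `href` correctly instead.

In the sketch below, `build_product_link` is a hypothetical helper. Its `href` is taken from the third link in the output above with the base URL removed. The source does not establish whether the `ref_` query parameter is needed, so the query string is kept by default.

```python
from urllib.parse import urljoin, urlsplit, urlunsplit

product_base_url = "https://www.amazon.in/"

def build_product_link(href, keep_query=True):
    """Resolve a relative href against the base URL without doubling the slash."""
    absolute = urljoin(product_base_url, href)
    if keep_query:
        return absolute
    # Optionally drop the query string and fragment (here, the ref_ parameter).
    parts = urlsplit(absolute)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))

href = "/beatXP-Flux-Display-Bluetooth-Tracking/dp/B0C4T91SNK?ref_=Oct_DLandingS_D_81486f94_2"
print(build_product_link(href))
# https://www.amazon.in/beatXP-Flux-Display-Bluetooth-Tracking/dp/B0C4T91SNK?ref_=Oct_DLandingS_D_81486f94_2
print(build_product_link(href, keep_query=False))
# https://www.amazon.in/beatXP-Flux-Display-Bluetooth-Tracking/dp/B0C4T91SNK
```

Inside the loop over `anchor_tags`, `product_links.append(build_product_link(anchor_tag["href"]))` would replace the concatenation.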

## Directory for storing product images

In [93]:
image_directory = 'product_images'
os.makedirs(image_directory, exist_ok=True)

## Helper functions to download images and extract title, price, rating, and rating count

In [94]:
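def generate_random_filename(length=8):
    # Random alphanumeric name such as 'FHldTzqb'; it is not derived from the product
    characters = string.ascii_letters + string.digits
    return ''.join(random.choice(characters) for _ in range(length))

def download_image(image_url, image_name):
    # Save the image bytes as <image_name>.png in image_directory.
    # The bytes are written unchanged, so the .png extension may not match the actual format.
    response = requests.get(image_url, headers=HEADERS, timeout=30)
    if response.status_code == 200:
        with open(os.path.join(image_directory, f'{image_name}.png'), 'wb') as file:
            file.write(response.content)

def get_title(soup):
    # Text of <span id="productTitle"> without surrounding whitespace; "" if missing
    try:
        title = soup.find("span", attrs={"id":'productTitle'})
        title_value = title.text
        title_string = title_value.strip()

    except AttributeError:
        title_string = ""

    return title_string

def get_price(soup):
    # Text of <span class="a-price-whole"> minus its last character
    # (expected to be the trailing decimal point); "" if missing
    try:
        price = soup.find('span', attrs={'class':'a-price-whole'}).text[:-1]
    except AttributeError:
        price = ""

    return price

def get_rating(soup):
    # First word of the rating popover link text (e.g. '4.2'); "" if missing or empty
    try:
        rating = soup.find('a', attrs={'class':'a-popover-trigger a-declarative'}).text.split()[0]
    except (AttributeError, IndexError):
        rating = ""

    return rating

def get_rating_count(soup):
    # First word of <span id="acrCustomerReviewText">, i.e. the count without its label; "" if missing
    try:
        review_count = soup.find('span', attrs={'id':'acrCustomerReviewText'}).text.split()[0]
    except (AttributeError, IndexError):
        review_count = ""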

    return review_count
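The helpers above detect missing elements with `try`/`except AttributeError`. An alternative uses BeautifulSoup's CSS-selector method `select_one`, which returns `None` when nothing matches, together with `get_text(strip=True)`.

In the sketch below, `select_text` and `first_word` are hypothetical helpers. The markup is a hypothetical, simplified stand-in for a product page, assuming the element ids and classes the helpers above search for.

```python
from bs4 import BeautifulSoup

def select_text(soup, selector):
    """Return the stripped text of the first element matching `selector`, or "" if none."""
    node = soup.select_one(selector)
    return node.get_text(strip=True) if node is not None else ""

def first_word(text):
    """Return the first whitespace-separated word of `text`, or "" for empty text."""
    words = text.split()
    return words[0] if words else ""

# Hypothetical, simplified product-page markup.
html = """
<span id="productTitle">  Fire-Boltt Ninja Call Pro Plus  </span>
<span class="a-price-whole">1,099<span class="a-price-decimal">.</span></span>
<span id="acrCustomerReviewText">56,335 ratings</span>
"""
soup = BeautifulSoup(html, "html.parser")

title = select_text(soup, "#productTitle")                               # 'Fire-Boltt Ninja Call Pro Plus'
price = select_text(soup, "span.a-price-whole").rstrip(".")              # '1,099'
rating_count = first_word(select_text(soup, "#acrCustomerReviewText"))   # '56,335'
missing = select_text(soup, "#doesNotExist")                             # ''
```

`rstrip(".")` removes the decimal point only when it is actually there. Slicing with `[:-1]` always drops the last character.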

## Dictionary to store product details

In [95]:
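# Column name -> list of values, one entry per product.
# Note: the name `dict` shadows Python's built-in `dict` type for the rest of the notebook.
dict = {"title": [], "price_in_rupees": [], "rating": [], "rating_count": [], "product_image_name": []}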


## Processing the scraped content with the helper functions

This loop visits each product link and extracts the title, price, rating, and rating count. For the product image, it generates a random file name, downloads the image as a `.png` file, and stores that file name in the dictionary.
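`generate_random_filename` produces a new name on every run. As a result, rerunning the notebook writes fresh copies of images that were already downloaded, and the file name carries no link to the product.

One alternative, sketched here as an assumption rather than the notebook's method, derives the name from a SHA-256 hash of the image URL. The same URL then always maps to the same file name. `image_name_from_url` and the example URLs are hypothetical.

```python
import hashlib

def image_name_from_url(image_url, length=12):
    """Derive a stable file name from the first `length` hex characters of the URL's SHA-256."""
    digest = hashlib.sha256(image_url.encode("utf-8")).hexdigest()
    return digest[:length]

# Hypothetical image URLs.
url_a = "https://example.com/images/I/watch-a.jpg"
url_b = "https://example.com/images/I/watch-b.jpg"

print(image_name_from_url(url_a) == image_name_from_url(url_a))  # True: same URL, same name
print(image_name_from_url(url_a) == image_name_from_url(url_b))
```

With stable names, a check such as `os.path.exists(path)` before calling `download_image` would let a rerun skip images it already has.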

In [96]:
for product_link in product_links:
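    product_page = requests.get(product_link, headers=HEADERS, timeout=30)
    product_soup = BeautifulSoup(product_page.content, 'html.parser')

    title = get_title(product_soup)
    price = get_price(product_soup)
    rating = get_rating(product_soup)
    rating_count = get_rating_count(product_soup)

    # find() returns None when the page has no landing image (for example, a blocked page),
    # so check it before reading 'src' instead of letting a TypeError stop the loop
    image_tag = product_soup.find('img', attrs={'id': 'landingImage'})
    if image_tag is not None and image_tag.get('src'):
        image_name = generate_random_filename()
        download_image(image_tag['src'], image_name)
    else:
        image_name = ""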

    dict['title'].append(title)
    dict['price_in_rupees'].append(price)
    dict['rating'].append(rating)
    dict['rating_count'].append(rating_count)
    dict['product_image_name'].append(image_name)
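The loop above sends the product requests back to back, and an exception raised for one page (such as a network error) stops the whole loop. A gentler pattern pauses for a random interval between requests and records failures instead of stopping.

In the sketch below, `fetch_all` is a hypothetical helper and the 1 to 3 second delay range is an arbitrary assumption. `fetch` is any function that takes a URL. The demo uses a fake `fetch` and a recording `sleep` so it runs offline.

```python
import random
import time

def fetch_all(urls, fetch, min_delay=1.0, max_delay=3.0, sleep=time.sleep):
    """Call fetch(url) for each URL, pausing a random interval between requests.

    Returns a dict mapping each URL to fetch's result, or to None if fetch raised.
    """
    results = {}
    for index, url in enumerate(urls):
        if index > 0:
            # Randomised pause so requests are not sent back to back.
            sleep(random.uniform(min_delay, max_delay))
        try:
            results[url] = fetch(url)
        except Exception as exc:  # broad on purpose: one bad page should not stop the run
            print(f"Failed to fetch {url}: {exc}")
            results[url] = None
    return results

# Offline demo: a fake fetch, and a sleep that only records the requested delays.
delays = []

def fake_fetch(url):
    if url.endswith("bad"):
        raise ValueError("simulated failure")
    return f"<html>{url}</html>"

pages = fetch_all(["p1", "p2", "bad"], fake_fetch, sleep=delays.append)
print(pages)        # {'p1': '<html>p1</html>', 'p2': '<html>p2</html>', 'bad': None}
print(len(delays))  # 2: no pause before the first request
```

With the notebook's setup, `fetch` could be `lambda url: requests.get(url, headers=HEADERS, timeout=30).content`. Each returned page can then be parsed with `BeautifulSoup(..., 'html.parser')` as in the loop above.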

## Converting `dict` to a DataFrame

In [99]:
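amazon_deals_df = pd.DataFrame.from_dict(dict)
# Treat empty titles (pages where no title was found) as missing so dropna removes those rows.
# Assigning the result back avoids chained assignment with inplace=True, which does not
# update the DataFrame under pandas' Copy-on-Write mode.
amazon_deals_df['title'] = amazon_deals_df['title'].replace('', np.nan)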
amazon_deals_df = amazon_deals_df.dropna(subset=['title'])
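The scraped columns hold strings, so the price, rating, and rating count cannot yet be sorted or averaged numerically. The sketch below assumes the strings may contain thousands separators such as `1,099`. It strips the commas and converts with `pd.to_numeric(..., errors="coerce")`, which turns empty or unparseable values into `NaN`. The sample rows are hypothetical.

```python
import pandas as pd

# Hypothetical rows in the shape of amazon_deals_df; every scraped value is a string.
sample = pd.DataFrame({
    "title": ["Watch A", "Watch B", "Watch C"],
    "price_in_rupees": ["1,099", "2,199", ""],
    "rating": ["4.2", "4.0", "4.1"],
    "rating_count": ["56,335", "1,790", "8,354"],
})

numeric_columns = ["price_in_rupees", "rating", "rating_count"]
for column in numeric_columns:
    # Remove thousands separators, then parse; "" (and anything unparseable) becomes NaN.
    sample[column] = pd.to_numeric(
        sample[column].str.replace(",", "", regex=False), errors="coerce"
    )

print(sample.dtypes)
```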

## Saving the data to CSV

In [100]:
amazon_deals_df.to_csv("amazon_data.csv", header=True, index=False)

In [86]:
amazon_deals_df

|   | title | price_in_rupees | rating | rating_count | product_image_name |
|---|---|---|---|---|---|
| 0 | Fire-Boltt Ninja Call Pro Plus 1.83" Smart Wat... | 1099 | 4.2 | 56335 | FHldTzqb |
| 1 | Fire-Boltt Visionary 1.78" AMOLED Bluetooth Ca... | 2199 | 4.2 | 35705 | GgOfUqfm |
| 2 | beatXP Flux 1.45" (3.6 cm) Ultra HD Display Bl... | 1099 | 4.0 | 1790 | dwiLbmJt |
| 3 | Fire-Boltt Phoenix Pro 1.39" Bluetooth Calling... | 1199 | 4.2 | 101344 | sxpsATxb |
| 4 | Fire-Boltt Phoenix Smart Watch with Bluetooth ... | 1299 | 4.2 | 101344 | 9EcWoKQw |
| 5 | beatXP Marv Neo 1.85” (4.6 cm) Display, Blueto... | 999 | 4.1 | 8354 | numFubjv |
| 6 | Noise ColorFit Pro 4 Alpha 1.78" AMOLED Displa... | 2299 | 4.0 | 3775 | vbgk4knA |
| 7 | boAt Wave Sigma Smartwatch with 2.01" HD Displ... | 1099 | 4.0 | 1494 | EhvVfQdJ |
| 8 | Fire-Boltt Ninja 3 Smartwatch Full Touch 1.69 ... | 1099 | 4.1 | 52024 | Cg6L9fM9 |
| 9 | Fire-Boltt Ninja Call Pro Smart Watch Dual Chi... | 1049 | 4.2 | 32071 | Ca5vKvpH |


In [102]:
!ls

amazon_data.csv       amazon_scrapper.ipynb product_images


In [107]:
os.listdir('product_images/')

['GpiTgVow.png',
 '7hHpKKZ4.png',
 'jS1Wurdi.png',
 'NFHqzDA3.png',
 'iLpMTuvp.png',
 'axumItuH.png',
 't5qcN05X.png',
 'sssgZkQA.png',
 'DkNHdOPM.png',
 'YynGucRa.png',
 '5TcXE03x.png',
 'G4T67rj8.png',
 'Z14X7aNo.png',
 'pnkw9G8y.png',
 'uS6raPxV.png',
 'o5A1ClRB.png',
 'SiHiB1AZ.png',
 'dnDKL7lE.png',
 'shDr0QQi.png',
 '508lwuQL.png',
 'JpUx4JxY.png',
 'osH6oDYH.png',
 'VV1gAStm.png',
 'luYMWXYM.png',
 'nZKFf34h.png',
 'cQVxWaDD.png',
 'NDWUDJl3.png',
 'ee61neql.png',
 '6QXFy5kf.png',
 '6t36JItE.png']
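The directory listing holds 30 image files, while the DataFrame above shows 10 rows. With this code, the extra files can come from two sources:

- Product pages whose rows were dropped for having an empty title, since their images are still downloaded.
- Earlier runs, since random names never overwrite old files.

The sketch below spots image files that no row references. `find_orphan_images` is a hypothetical helper, and the demo runs against a temporary directory filled with a few of the file names shown above.

```python
import os
import tempfile

def find_orphan_images(image_dir, referenced_names, ext=".png"):
    """Return files in image_dir with extension `ext` whose stem is not in referenced_names."""
    referenced = set(referenced_names)
    orphans = []
    for filename in sorted(os.listdir(image_dir)):
        stem, file_ext = os.path.splitext(filename)
        if file_ext == ext and stem not in referenced:
            orphans.append(filename)
    return orphans

with tempfile.TemporaryDirectory() as tmp:
    # Create empty placeholder files standing in for downloaded images.
    for name in ["FHldTzqb", "GpiTgVow", "7hHpKKZ4"]:
        open(os.path.join(tmp, name + ".png"), "wb").close()
    orphans = find_orphan_images(tmp, ["FHldTzqb"])

print(orphans)  # ['7hHpKKZ4.png', 'GpiTgVow.png']
```

In the notebook, the call would be `find_orphan_images('product_images', amazon_deals_df['product_image_name'])`.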

---
## Conclusion

Thank you for taking the time to explore this notebook! If you have any questions or suggestions for improvement, please feel free to reach out.

**Happy coding!**
