# Amazon Web Scraping

This project scrapes a product's title and price from its Amazon page. A timer re-runs the scrape at regular intervals, so the data builds up over time for as long as the timer is left running. It also includes a feature that automatically sends an email whenever the price is low or the product is on sale.

### Importing libraries and packages

The libraries used are:
* BeautifulSoup - parses HTML so that elements can be looked up and their text extracted; it is the web scraping library used here.
* requests - a popular Python library for making HTTP requests; it downloads the product page.
* time - a built-in Python module with time-related functions; `time.sleep()` provides the pause between checks.
* datetime - a built-in Python module with classes for working with dates and times; it stamps each record with the date.
* smtplib - a built-in Python module for sending email over SMTP (Simple Mail Transfer Protocol).

In [1]:
from bs4 import BeautifulSoup
import requests
import time
import datetime
import smtplib

### Extracting the data from the source using the requests and BeautifulSoup libraries

In [2]:
URL = "https://www.amazon.in/DHRUVI-TRENDZ-LooksGud-Regular-Western/dp/B08CB4H8MF/ref=pd_ci_mcx_mh_mcx_views_0?pd_rd_w=IG4mr&content-id=amzn1.sym.7c947cdc-0249-4ded-881f-f826efe2df4c&pf_rd_p=7c947cdc-0249-4ded-881f-f826efe2df4c&pf_rd_r=WF1VG5MBJKNWD5MR815H&pd_rd_wg=ErPzG&pd_rd_r=4441d9f2-670b-4388-9018-b62bf2a2946d&pd_rd_i=B08CB4H8MF&th=1&psc=1"

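# Send a browser-like User-Agent header with the request
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/113.0.0.0 Safari/537.36"}

page = requests.get(URL, headers=headers)

# Parse the downloaded HTML, then re-parse the prettified markup
soup1 = BeautifulSoup(page.content, "html.parser")
soup2 = BeautifulSoup(soup1.prettify(), "html.parser")

# Look up the product title by element id and the price by CSS class
title = soup2.find(id='productTitle').get_text()
price = soup2.find(class_='a-price-whole').get_text()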


print(title)
print(price)


              DHRUVI TRENDZ Women's Plain Lycra with Dori Design on Half-Sleeve Regular Fit Office wear,Casual wear Top
             

                            279
                            
                             .
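
The output above shows that both values come back padded with whitespace and newlines, and that the price text carries a trailing `.`. Below is a minimal sketch of one way to tidy them before storing. `clean_title` and `clean_price` are hypothetical helper names, not part of the original notebook, and the sketch assumes the price text holds only digits, thousands separators, and that trailing dot.

```python
def clean_title(raw_title):
    """Collapse runs of whitespace (including newlines) into single spaces."""
    return " ".join(raw_title.split())


def clean_price(raw_price):
    """Keep only the digits of the scraped whole-number price and return an int."""
    digits = "".join(ch for ch in raw_price if ch.isdigit())
    if not digits:
        raise ValueError(f"no digits found in price text: {raw_price!r}")
    return int(digits)


raw = "\n        279\n        .\n"
print(clean_price(raw))  # prints 279
```

Storing the cleaned values would also make the later DataFrame easier to read, since the `\n` padding would no longer appear in the `Title` and `Price` columns.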
                            



### Using the datetime library to get today's date

In [3]:
today = datetime.date.today()

print(today)

2023-05-15
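
`datetime.date.today()` records only the day, so several checks made on the same date look identical in the CSV file. If finer resolution is wanted, one option is to store a full timestamp instead. `current_timestamp` is a hypothetical helper name used only for this sketch.

```python
import datetime


def current_timestamp():
    """Return the local date and time as an ISO 8601 string, to the second."""
    return datetime.datetime.now().isoformat(timespec="seconds")


print(current_timestamp())
```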


### Using the csv library to write the column headers and the data to a CSV file

In [4]:
import csv 

header = ['Title', 'Price', 'Date']
data = [title, price, today]


with open('AmazonWebScraperDataset.csv', 'w', newline='', encoding='UTF8') as f:
    writer = csv.writer(f)
    writer.writerow(header)
    writer.writerow(data)

### Loading the data from the CSV file into a pandas DataFrame

In [5]:
import pandas as pd

# Read the file from the working directory, where the cell above wrote it
df = pd.read_csv('AmazonWebScraperDataset.csv')

print(df)

                                               Title  \
0  \n              DHRUVI TRENDZ Women's Plain Ly...   

                                               Price        Date  
0  \n                            279\n           ...  2023-05-15  


### Appending new data to the CSV file

In [6]:
with open('AmazonWebScraperDataset.csv', 'a+', newline='', encoding='UTF8') as f:
    writer = csv.writer(f)
    writer.writerow(data)
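
Opening the file in `'w'` mode writes the header but erases earlier rows, while `'a+'` mode keeps the rows but assumes the header is already there. The two steps can be folded into one as a rough sketch; `append_row` is a hypothetical helper, not part of the original notebook, and it assumes a missing or empty file is the only case that needs a header.

```python
import csv
import os


def append_row(path, row, header):
    """Append one row to a CSV file, writing the header first if the file is new or empty."""
    needs_header = not os.path.exists(path) or os.path.getsize(path) == 0
    with open(path, "a", newline="", encoding="UTF8") as f:
        writer = csv.writer(f)
        if needs_header:
            writer.writerow(header)
        writer.writerow(row)
```

With this helper, a call such as `append_row('AmazonWebScraperDataset.csv', data, header)` could stand in for both the initial write and the later appends.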

### Putting all the steps under one function `check_price()`

In [7]:
def check_price():
    URL = "https://www.amazon.in/DHRUVI-TRENDZ-LooksGud-Regular-Western/dp/B08CB4H8MF/ref=pd_ci_mcx_mh_mcx_views_0?pd_rd_w=IG4mr&content-id=amzn1.sym.7c947cdc-0249-4ded-881f-f826efe2df4c&pf_rd_p=7c947cdc-0249-4ded-881f-f826efe2df4c&pf_rd_r=WF1VG5MBJKNWD5MR815H&pd_rd_wg=ErPzG&pd_rd_r=4441d9f2-670b-4388-9018-b62bf2a2946d&pd_rd_i=B08CB4H8MF&th=1&psc=1"

    headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/113.0.0.0 Safari/537.36"}

    page = requests.get(URL, headers=headers)

    soup1 = BeautifulSoup(page.content, "html.parser")

    soup2 = BeautifulSoup(soup1.prettify(), "html.parser")

    title = soup2.find(id='productTitle').get_text()

    price = soup2.find(class_='a-price-whole').get_text()

    today = datetime.date.today()

    data = [title, price, today]

    # Append the new row; csv was imported and the header row written in cell [4]
    with open('AmazonWebScraperDataset.csv', 'a+', newline='', encoding='UTF8') as f:
        writer = csv.writer(f)
        writer.writerow(data)

    # Compare numerically: the scraped text starts with a newline, so the
    # string comparison price < '200' would always be true
    price_value = int(''.join(ch for ch in price if ch.isdigit()))
    if price_value < 200:
        send_mail()
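
`soup2.find(...)` returns `None` when the element is absent, for example if the page markup changes or a different page is served, and calling `.get_text()` on `None` raises `AttributeError`. Here is a minimal guard, assuming it is acceptable to skip a check when a field is missing. `text_or_none` is a hypothetical helper, not something from the notebook or from BeautifulSoup itself.

```python
def text_or_none(tag):
    """Return the tag's stripped text, or None if find() did not locate the tag."""
    if tag is None:
        return None
    # get_text(strip=True) is BeautifulSoup's own option for trimming whitespace
    return tag.get_text(strip=True)
```

Inside `check_price()`, this could be used as `title = text_or_none(soup2.find(id='productTitle'))`, returning early from the function when the result is `None` instead of writing an incomplete row.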
        
 

### Setting the timer

In [None]:
while True:
    check_price()
    time.sleep(24)  # pause 24 seconds between checks; use 86400 to check once a day
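
The loop above runs until the kernel is interrupted. To stop collecting at a chosen time instead, the loop can compare the clock against a deadline. This is only a sketch: `run_until` and its parameters are hypothetical names, and the `sleep` and `now` arguments exist so the function can be exercised without real waiting.

```python
import datetime
import time


def run_until(task, interval_seconds, stop_at, sleep=time.sleep, now=datetime.datetime.now):
    """Call task() repeatedly, pausing between calls, until the clock reaches stop_at."""
    runs = 0
    while now() < stop_at:
        task()
        runs += 1
        sleep(interval_seconds)
    return runs
```

A call such as `run_until(check_price, 86400, datetime.datetime(2023, 6, 15))` would check once a day and stop at the given date; the date here is purely illustrative.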


### Showing the final DataFrame

In [8]:
# Read the file from the working directory, where the earlier cells wrote it
df = pd.read_csv('AmazonWebScraperDataset.csv')

print(df)

                                               Title  \
0  \n              DHRUVI TRENDZ Women's Plain Ly...   
1  \n              DHRUVI TRENDZ Women's Plain Ly...   

                                               Price        Date  
0  \n                            279\n           ...  2023-05-15  
1  \n                            279\n           ...  2023-05-15  


### Sending an email when the price drops

In [9]:
def send_mail():
    server = smtplib.SMTP_SSL('smtp.gmail.com', 465)
    server.ehlo()
    server.login('debasmitac73@gmail.com', 'XXXXXXXXXXx')

    subject = "The Shirt you want is below Rs.200! Now is your chance to buy!"
    body = "Debasmita, This is the moment we have been waiting for. Now is your chance to pick up the shirt of your dreams. Don't mess it up! Link here: https://www.amazon.com/Funny-Data-Systems-Business-Analyst/dp/B07FNW9FGJ/ref=sr_1_3?dchild=1&keywords=data+analyst+tshirt&qid=1626655184&sr=8-3"

    msg = f"Subject: {subject}\n\n{body}"

    # sendmail() takes the sender, the recipient(s), and the message;
    # the alert is assumed to go to the same address it is sent from
    server.sendmail(
        'debasmitac73@gmail.com',
        'debasmitac73@gmail.com',
        msg
    )
    server.quit()
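
As an alternative to assembling the message by hand and keeping the password in the notebook, the standard library's `email.message.EmailMessage` can build the headers, and the credentials can be read from environment variables. The sketch below makes several assumptions: `build_alert_message` and `send_alert` are hypothetical helpers, `ALERT_EMAIL_USER` and `ALERT_EMAIL_PASSWORD` are made-up variable names that would need to be set beforehand, and the message wording is shortened for illustration.

```python
import os
import smtplib
from email.message import EmailMessage


def build_alert_message(sender, recipient, product_url, threshold):
    """Build the price-alert email with Subject, From, and To headers."""
    msg = EmailMessage()
    msg["Subject"] = f"The shirt you want is below Rs.{threshold}!"
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content(f"The price has dropped. Link here: {product_url}")
    return msg


def send_alert(msg):
    """Send the message through Gmail over SSL."""
    # Assumed environment variable names; they are not defined by the notebook.
    user = os.environ["ALERT_EMAIL_USER"]
    password = os.environ["ALERT_EMAIL_PASSWORD"]
    # The with-block closes the connection even if login or sending fails.
    with smtplib.SMTP_SSL("smtp.gmail.com", 465) as server:
        server.login(user, password)
        server.send_message(msg)
```

Keeping message construction separate from sending means the message can be inspected or tested without connecting to a mail server.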