## Amazon Web Scraper

This notebook scrapes the title and price of a product from a single Amazon page: https://www.amazon.com/Tommy-Hilfiger-Performance-Stretch-Regular/dp/B07HL41FKF/ref=sr_1_2?crid=2RJSFRJZXALTR&dib=eyJ2IjoiMSJ9.OY5fqbXmaP_tkolXO48ZeJaW4bmx3S5Cg9rWuKHS6OCMsBlaB2NLePg9ZgKl7c3rcmcPwjD88biPxxEM98qFgPMY8sNXB6ZNfXQ-9mj8v9-a9Vx-y1u97j_AXcTVy8wtllYTnACHx_XANTgoCCEOKMKNPm2ICtUV-4j1CvAzksyFbAYLm1YN9P2qohu5u-JEIpoJ1TirTb04vGG0BHxbYGctzyfcCO_XlImDbjYFhy5Cg_ILzA66M6w9-nAoq3hrTohl0I_UWwBa0Hgn9ruN2-sL_We10rY77lyoI1B8PKE.q2i2Tk1V0e4pb3_zSXHs063193EQKbdgGPn3nZMlMbc&dib_tag=se&keywords=suits+for+men&qid=1710830278&refinements=p_123%3A232763&rnid=85457740011&s=apparel&sprefix=suit%2Caps%2C274&sr=1-2

The page lists a men's suit from Tommy Hilfiger. The notebook automates data collection by fetching the price every 30 minutes and appending it to a CSV file. It also sends an automatic email alert if the price falls below $100.

In [1]:
# import libraries 

from bs4 import BeautifulSoup
import requests
import time
import datetime
import csv 
import pandas as pd

import smtplib

### Connect to Website and pull in data

In [2]:
# URL of the webpage

URL = 'https://www.amazon.com/Tommy-Hilfiger-Performance-Stretch-Regular/dp/B07HL41FKF/ref=sr_1_2?crid=2RJSFRJZXALTR&dib=eyJ2IjoiMSJ9.OY5fqbXmaP_tkolXO48ZeJaW4bmx3S5Cg9rWuKHS6OCMsBlaB2NLePg9ZgKl7c3rcmcPwjD88biPxxEM98qFgPMY8sNXB6ZNfXQ-9mj8v9-a9Vx-y1u97j_AXcTVy8wtllYTnACHx_XANTgoCCEOKMKNPm2ICtUV-4j1CvAzksyFbAYLm1YN9P2qohu5u-JEIpoJ1TirTb04vGG0BHxbYGctzyfcCO_XlImDbjYFhy5Cg_ILzA66M6w9-nAoq3hrTohl0I_UWwBa0Hgn9ruN2-sL_We10rY77lyoI1B8PKE.q2i2Tk1V0e4pb3_zSXHs063193EQKbdgGPn3nZMlMbc&dib_tag=se&keywords=suits+for+men&qid=1710830278&refinements=p_123%3A232763&rnid=85457740011&s=apparel&sprefix=suit%2Caps%2C274&sr=1-2'

In [3]:
# User agent

headers = {"User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/122.0.0.0 Safari/537.36", "Accept-Encoding":"gzip, deflate", "Accept":"text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8", "DNT":"1","Connection":"close", "Upgrade-Insecure-Requests":"1"}

**Note:** The User-Agent header is a string that describes the browser and operating system making the request; it is not unique to a device. To see the one your browser sends, open https://httpbin.org/get
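The lookup described in the note can also be done from code. The sketch below uses only the standard library and assumes the JSON returned by httpbin.org/get keeps the echoed header under `headers` and `User-Agent`; the function names `extract_user_agent` and `fetch_echoed_user_agent` are illustrative, not part of the notebook.

```python
import json
from urllib.request import Request, urlopen


def extract_user_agent(payload):
    """Return the User-Agent string from an httpbin-style JSON payload, or None."""
    return payload.get("headers", {}).get("User-Agent")


def fetch_echoed_user_agent(url="https://httpbin.org/get", timeout=10):
    """Request the echo endpoint and return the User-Agent it reports back."""
    with urlopen(Request(url), timeout=timeout) as resp:
        payload = json.load(resp)
    return extract_user_agent(payload)
```

Called from Python, `fetch_echoed_user_agent()` reports the default User-Agent of `urllib`, not the browser's. To get the browser string used in `headers`, open the URL in the browser itself.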

In [4]:
# Connect to the page and access content

page = requests.get(URL, headers=headers)

soup1 = BeautifulSoup(page.content, "html.parser")

soup2 = BeautifulSoup(soup1.prettify(), "html.parser") # Make page content more readable

In [5]:
# Extract title and price of product

title = soup2.find(id='productTitle').get_text()

price = soup2.find(class_='a-offscreen').get_text()

print(title)
print(price)


                    Tommy Hilfiger Men's Slim Fit Performance Suit with Stretch
                   

                       $177.33
                      


In [6]:
# Clean up the data a little bit

price = price.strip()[1:]
title = title.strip()

print(title)
print(price)

Tommy Hilfiger Men's Slim Fit Performance Suit with Stretch
177.33
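The `[1:]` slice works for this page because the text starts with a single currency symbol and has no thousands separator. A slightly more defensive variant might look like the sketch below; `parse_price` is a hypothetical helper, and it assumes US-style formatting (comma for thousands, dot for decimals).

```python
import re
from decimal import Decimal, InvalidOperation


def parse_price(text):
    """Return the first number found in `text` as a Decimal, or None if there is none."""
    if text is None:
        return None
    # One digit, then any mix of digits and commas, then an optional decimal part.
    match = re.search(r"\d[\d,]*(?:\.\d+)?", text)
    if match is None:
        return None
    try:
        return Decimal(match.group().replace(",", ""))
    except InvalidOperation:
        return None


print(parse_price("   $177.33  "))  # 177.33
print(parse_price("$1,177.33"))     # 1177.33
```

Returning `None` instead of raising lets the caller decide what to do when the page shows no price, for example when the item is unavailable.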


In [7]:
# Create a Timestamp for output to track when data was collected

today = datetime.date.today()

print(today)

2024-03-19
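Because the price is recorded every 30 minutes, a date alone cannot distinguish readings taken on the same day. One option is to store the time as well; the sketch below uses a hypothetical helper, `make_timestamp`, that is not part of the notebook.

```python
import datetime


def make_timestamp(now=None):
    """Return a 'YYYY-MM-DD HH:MM:SS' timestamp for `now` (defaults to the current local time)."""
    if now is None:
        now = datetime.datetime.now()
    return now.isoformat(sep=" ", timespec="seconds")


print(make_timestamp(datetime.datetime(2024, 3, 19, 14, 30, 0)))  # 2024-03-19 14:30:00
```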


### Store data to CSV

In [8]:
# Create CSV and write headers and data into the file

header = ['Title', 'Price', 'Date']
data = [title, price, today]


with open('AmazonWebScraperDataset.csv', 'w', newline='', encoding='UTF8') as f:
    writer = csv.writer(f)
    writer.writerow(header)
    writer.writerow(data)

In [9]:
df = pd.read_csv('AmazonWebScraperDataset.csv')

df.head()

                                               Title   Price        Date
0  Tommy Hilfiger Men's Slim Fit Performance Suit...  177.33  2024-03-19


In [10]:
# Append data to the CSV

with open('AmazonWebScraperDataset.csv', 'a+', newline='', encoding='UTF8') as f:
    writer = csv.writer(f)
    writer.writerow(data)
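Opening the file with `'w'` replaces whatever was collected before, while `'a+'` never writes the header row. A small helper can cover both cases by writing the header only when the file is new or empty. This is a sketch; `append_row` is a hypothetical name.

```python
import csv
import os


def append_row(path, row, header):
    """Append `row` to the CSV at `path`, writing `header` first if the file is new or empty."""
    needs_header = not os.path.exists(path) or os.path.getsize(path) == 0
    with open(path, "a", newline="", encoding="UTF8") as f:
        writer = csv.writer(f)
        if needs_header:
            writer.writerow(header)
        writer.writerow(row)
```

With this helper, a call such as `append_row('AmazonWebScraperDataset.csv', data, header)` could stand in for both the creating and the appending cell.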

### Creating a function to automate the process

In [11]:
#Combine all of the above code into one function


def check_price():
    URL = 'https://www.amazon.com/Tommy-Hilfiger-Performance-Stretch-Regular/dp/B07HL41FKF/ref=sr_1_2?crid=2RJSFRJZXALTR&dib=eyJ2IjoiMSJ9.OY5fqbXmaP_tkolXO48ZeJaW4bmx3S5Cg9rWuKHS6OCMsBlaB2NLePg9ZgKl7c3rcmcPwjD88biPxxEM98qFgPMY8sNXB6ZNfXQ-9mj8v9-a9Vx-y1u97j_AXcTVy8wtllYTnACHx_XANTgoCCEOKMKNPm2ICtUV-4j1CvAzksyFbAYLm1YN9P2qohu5u-JEIpoJ1TirTb04vGG0BHxbYGctzyfcCO_XlImDbjYFhy5Cg_ILzA66M6w9-nAoq3hrTohl0I_UWwBa0Hgn9ruN2-sL_We10rY77lyoI1B8PKE.q2i2Tk1V0e4pb3_zSXHs063193EQKbdgGPn3nZMlMbc&dib_tag=se&keywords=suits+for+men&qid=1710830278&refinements=p_123%3A232763&rnid=85457740011&s=apparel&sprefix=suit%2Caps%2C274&sr=1-2'

    headers = {"User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/122.0.0.0 Safari/537.36", "Accept-Encoding":"gzip, deflate", "Accept":"text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8", "DNT":"1","Connection":"close", "Upgrade-Insecure-Requests":"1"}

    page = requests.get(URL, headers=headers)

    soup1 = BeautifulSoup(page.content, "html.parser")

    soup2 = BeautifulSoup(soup1.prettify(), "html.parser")

    title = soup2.find(id='productTitle').get_text()

    price = soup2.find(class_='a-offscreen').get_text()

    price = price.strip()[1:]
    title = title.strip()

    today = datetime.date.today()

    header = ['Title', 'Price', 'Date']
    data = [title, price, today]

    with open('AmazonWebScraperDataset.csv', 'a+', newline='', encoding='UTF8') as f:
        writer = csv.writer(f)
        writer.writerow(data)
        
    # If the price ever falls below $100, call send_mail().
    # send_mail() is defined in a later cell; run that cell before calling check_price().
    if float(price) < 100:
        send_mail()

In [12]:
# Run check_price every 30 minutes (1800 s) and append the result to the CSV file

while True:
    check_price()
    time.sleep(1800)

KeyboardInterrupt: 
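The loop above runs until the kernel is interrupted, and any exception raised inside `check_price` (a network error, for instance) stops it. A variant that logs failures and can be limited to a fixed number of runs is sketched below. `run_periodically` and its parameters are illustrative; the `sleep` argument exists only so the wait can be replaced in tests.

```python
import time


def run_periodically(task, interval_seconds, max_runs=None, sleep=time.sleep):
    """Call `task` every `interval_seconds`; return the number of runs that succeeded.

    With max_runs=None the loop continues until interrupted.
    """
    runs = 0
    successes = 0
    while max_runs is None or runs < max_runs:
        try:
            task()
            successes += 1
        except Exception as exc:  # log the failure and keep going
            print(f"run {runs + 1} failed: {exc!r}")
        runs += 1
        # Skip the wait after the final run.
        if max_runs is None or runs < max_runs:
            sleep(interval_seconds)
    return successes
```

Here the call would be `run_periodically(check_price, 1800)`. `KeyboardInterrupt` is not a subclass of `Exception`, so interrupting the kernel still stops the loop.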

In [13]:
df = pd.read_csv('AmazonWebScraperDataset.csv')

print(df)

                                               Title   Price        Date
0  Tommy Hilfiger Men's Slim Fit Performance Suit...  177.33  2024-03-19
1  Tommy Hilfiger Men's Slim Fit Performance Suit...  177.33  2024-03-19
2  Tommy Hilfiger Men's Slim Fit Performance Suit...  177.33  2024-03-19


### Send automatic email when price falls below certain threshold

In [14]:
# Function to send an automated email when the price falls below a certain level ($100)

def send_mail():
    # SMTP_SSL encrypts the connection from the start, so starttls() is not needed
    server = smtplib.SMTP_SSL('smtp.gmail.com', 465)
    server.ehlo()
    server.login('mrigangka1998@gmail.com','xxxxxxxxxxxxxxxx')
    
    subject = "The Suit you want is below $100! Now is your chance to buy!"
    body = "Mrigangka, This is the moment we have been waiting for. Now is your chance to pick up the suit of your dreams. Don't mess it up! Link here: https://www.amazon.com/Tommy-Hilfiger-Performance-Stretch-Regular/dp/B07HL41FKF/ref=sr_1_2?crid=2RJSFRJZXALTR&dib=eyJ2IjoiMSJ9.OY5fqbXmaP_tkolXO48ZeJaW4bmx3S5Cg9rWuKHS6OCMsBlaB2NLePg9ZgKl7c3rcmcPwjD88biPxxEM98qFgPMY8sNXB6ZNfXQ-9mj8v9-a9Vx-y1u97j_AXcTVy8wtllYTnACHx_XANTgoCCEOKMKNPm2ICtUV-4j1CvAzksyFbAYLm1YN9P2qohu5u-JEIpoJ1TirTb04vGG0BHxbYGctzyfcCO_XlImDbjYFhy5Cg_ILzA66M6w9-nAoq3hrTohl0I_UWwBa0Hgn9ruN2-sL_We10rY77lyoI1B8PKE.q2i2Tk1V0e4pb3_zSXHs063193EQKbdgGPn3nZMlMbc&dib_tag=se&keywords=suits+for+men&qid=1710830278&refinements=p_123%3A232763&rnid=85457740011&s=apparel&sprefix=suit%2Caps%2C274&sr=1-2"
   
    msg = f"Subject: {subject}\n\n{body}"
    
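    # sendmail() takes sender, recipient(s), and message.
    # Assumption: the alert is sent to the same address it is sent from.
    server.sendmail(
        'mrigangka1998@gmail.com',
        'mrigangka1998@gmail.com',
        msg
    )
    server.quit()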
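As an alternative to assembling the message string by hand, the standard library's `email.message.EmailMessage` can build the headers, and the credentials can be read from the environment instead of being written into the notebook. The sketch below is one possible arrangement. The names `build_alert` and `send_alert` and the environment variables `ALERT_EMAIL` and `ALERT_APP_PASSWORD` are assumptions made for illustration.

```python
import os
import smtplib
from email.message import EmailMessage


def build_alert(sender, recipient, product_url, threshold):
    """Build the price-alert email without sending it."""
    msg = EmailMessage()
    msg["Subject"] = f"The suit you want is below ${threshold}!"
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content(f"The price has dropped below ${threshold}. Link: {product_url}")
    return msg


def send_alert(msg):
    """Send `msg` through Gmail over SSL, using credentials from the environment."""
    # Hypothetical variable names; set them outside the notebook.
    user = os.environ["ALERT_EMAIL"]
    password = os.environ["ALERT_APP_PASSWORD"]
    with smtplib.SMTP_SSL("smtp.gmail.com", 465) as server:
        server.login(user, password)
        server.send_message(msg)
```

Keeping message construction separate from sending means the message can be inspected or tested without a network connection.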