# Imagine...
You're now the owner of QuantumByte — a company that makes laptops for personal use.

You're eyeing the Italian market, and you've decided to kick things off on Amazon.

Now, the big question: where should you invest your marketing budget?

Should you push the high-end €2,999 powerhouse... the sleek minimal notebook... or the budget-friendly version?

To figure this out, let's check the current laptop scene on Amazon.it. **We'll see if there's a link between a laptop's price and how well it sells.** Maybe there's a price range where laptops sell best... who knows?

So, let's dive in...

![Lp Image](https://images.pexels.com/photos/205421/pexels-photo-205421.jpeg?auto=compress&cs=tinysrgb&w=1260&h=750&dpr=1)

### Importing the libraries

Below are all the libraries we'll use in this notebook.

In [None]:
from bs4 import BeautifulSoup
import requests
import datetime
import csv
import pandas as pd
import matplotlib.pyplot as plt
import seaborn
import os
import numpy as np

### Creating the functions

This function creates a new CSV file, where we'll store all the values we scrape.

In [None]:
# Create a new CSV file (skipped if it already exists, so old data isn't overwritten)

def create_new_csv(df_header, csv_name):

    if os.path.isfile(csv_name):
        print('File already exists!')
    else:
        with open(csv_name, 'w', newline='', encoding='UTF8') as f:
            writer = csv.writer(f)
            writer.writerow(df_header)
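If you want to check what `create_new_csv` does before pointing it at a real scrape, here's a minimal, self-contained sketch. It writes the header to a CSV file in a temporary directory, calls the function a second time to show that existing files are left alone, appends one row the same way `scrape_page` does (mode `'a+'`, `newline=''`), and reads the file back. The file name `demo.csv` and the sample row are made up for illustration.

```python
import csv
import os
import tempfile

def create_new_csv(df_header, csv_name):
    # Write the header only if the file doesn't exist yet,
    # so re-running the notebook never wipes previous data
    if os.path.isfile(csv_name):
        print('File already exists!')
    else:
        with open(csv_name, 'w', newline='', encoding='UTF8') as f:
            csv.writer(f).writerow(df_header)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'demo.csv')  # hypothetical file name
    header = ['title', 'price', 'rating', 'reviews_num', 'date']

    create_new_csv(header, path)
    create_new_csv(header, path)  # second call prints 'File already exists!'

    # Append one made-up row, as scrape_page does for every product
    with open(path, 'a+', newline='', encoding='UTF8') as f:
        csv.writer(f).writerow(['Demo laptop', 549.99, 4.5, 1234, '2023-11-20'])

    # Read everything back as dictionaries keyed by the header
    with open(path, newline='', encoding='UTF8') as f:
        rows = list(csv.DictReader(f))

print(rows[0]['price'])  # → 549.99
```

Because the header is written only once, you can run the scraping cell several times (for example on different days) and keep appending to the same file.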

This function goes through the list of products returned by `get_elts` (defined below) and, for each one, extracts the title (or product name), price, number of reviews, and rating, then appends them as a new row to the CSV file.

In [None]:
def scrape_page(elements_list, csv_file):

    today = datetime.date.today()

    # Open the CSV once per page and append one row per product
    with open(csv_file, 'a+', newline='', encoding='UTF8') as f:
        writer = csv.writer(f)

        # Loop through the list of products
        for element in elements_list:
            title_tag = element.find("h2")
            if title_tag is None:
                continue  # not a regular product card

            # Scrape the product's title
            title = title_tag.get_text(strip=True)

            # Filter: ' ' matches every title; replace it with a brand name
            # to keep only that brand's products
            if ' ' not in title:
                continue

            # Scrape the price: Amazon.it writes '1.299,99 €', so drop the
            # thousands dot before turning the decimal comma into a dot
            try:
                price = float(element.find(class_="a-offscreen").get_text(strip=True)
                              .replace('\xa0€', '').replace('.', '').replace(',', '.'))
            except (AttributeError, ValueError):
                price = np.nan

            # Scrape the number of reviews (e.g. '1.234' -> 1234)
            try:
                reviews_num = int(element.find(class_="a-size-base s-underline-text")
                                  .get_text(strip=True).replace('.', ''))
            except (AttributeError, ValueError):
                reviews_num = np.nan

            # Scrape the rating from text like '4,5 su 5 stelle'
            try:
                rating = float(element.find(class_="a-icon-alt").get_text(strip=True)
                               .split()[0].replace(',', '.'))
            except (AttributeError, IndexError, ValueError):
                rating = np.nan

            # Append the data to the CSV file
            writer.writerow([title, price, rating, reviews_num, today])

            print("Everything looks good... until now.")
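The trickiest part of `scrape_page` is turning Amazon.it's Italian-formatted strings into numbers: the dot is the thousands separator and the comma is the decimal separator. A plain `.replace(',', '.')` turns `1.299,99` into `1.299.99`, which `float()` rejects. The helpers below show the conversion in isolation; `parse_it_price`, `parse_it_count` and `parse_it_rating` are hypothetical names, and the sample strings assume the formats seen on the results page.

```python
import math

def parse_it_price(text):
    """'1.299,99\xa0€' -> 1299.99; returns NaN if the text isn't a price."""
    cleaned = text.replace('€', '').replace('\xa0', '').strip()
    # Drop the thousands separator first, then turn the decimal comma into a dot
    cleaned = cleaned.replace('.', '').replace(',', '.')
    try:
        return float(cleaned)
    except ValueError:
        return math.nan

def parse_it_count(text):
    """'1.234' -> 1234 (review counts use a dot as thousands separator)."""
    try:
        return int(text.replace('.', '').strip())
    except ValueError:
        return math.nan

def parse_it_rating(text):
    """'4,5 su 5 stelle' -> 4.5 (assumes the rating is the first word)."""
    try:
        return float(text.split()[0].replace(',', '.'))
    except (IndexError, ValueError):
        return math.nan

print(parse_it_price('1.299,99\xa0€'))     # → 1299.99
print(parse_it_price('549,00\xa0€'))       # → 549.0
print(parse_it_count('1.234'))             # → 1234
print(parse_it_rating('4,5 su 5 stelle'))  # → 4.5
```

Returning NaN instead of raising keeps one malformed product from stopping the whole scrape, and pandas treats those NaNs as missing values later on.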

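The function below downloads a search results page from Amazon.it and collects all the products on it into a list, so they're easier to work with. It also returns the parsed page, which we need later to find the link to the next page.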

In [None]:
def get_elts(url, headers):

    # Start the GET request (and stop with an error if Amazon refuses it)
    page = requests.get(url, headers=headers, timeout=30)
    page.raise_for_status()
    soup = BeautifulSoup(page.content, 'html.parser')

    # Find all the search results (products); find_all already returns a list
    elements_list = soup.find_all(attrs={"data-component-type": "s-search-result"})

    return elements_list, soup
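Amazon changes its markup from time to time, so it helps to test the selectors offline before sending any request. Here's a minimal sketch that runs the same `find_all(attrs={"data-component-type": "s-search-result"})` call, plus the `h2` and `a-offscreen` lookups used in `scrape_page`, on a tiny HTML snippet. The snippet is hand-written and only modelled on the attribute and class names used in this notebook; it is not a copy of a real Amazon.it page.

```python
from bs4 import BeautifulSoup

# Hand-written stand-in for a search results page (not real Amazon markup)
SAMPLE_HTML = """
<div data-component-type="s-search-result">
  <h2><span>Laptop A 15,6"</span></h2>
  <span class="a-offscreen">549,99&nbsp;€</span>
</div>
<div data-component-type="s-search-result">
  <h2><span>Laptop B 14"</span></h2>
</div>
<div data-component-type="some-other-widget">
  <h2><span>Not a product</span></h2>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, 'html.parser')

# Same call as in get_elts: only the elements marked as search results match
cards = soup.find_all(attrs={"data-component-type": "s-search-result"})

for card in cards:
    title = card.find("h2").get_text(strip=True)
    price_tag = card.find(class_="a-offscreen")
    # A card without a price gives None here, which is why scrape_page needs a fallback
    price_text = price_tag.get_text(strip=True) if price_tag else None
    print(title, repr(price_text))
```

Note that `&nbsp;` comes back as the `\xa0` character, which is why the price parsing has to strip it explicitly.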

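This is the function that lets us go from page 1 to page 2, from page 2 to page 3... and so on. It takes the last link in the pagination bar at the bottom of the page (normally the "Next" button) and turns it into a full URL.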

In [None]:
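from urllib.parse import urljoin

def get_next_link(soup):

    # The pagination bar sits at the bottom of the search results
    pagination = soup.find(attrs={'class': "s-pagination-strip"})
    if pagination is None:
        return None  # no pagination bar: nothing left to follow

    footer_links = pagination.find_all('a', href=True)
    if not footer_links:
        return None

    # The last link is normally "Next"; urljoin avoids a double slash
    # when the href already starts with '/'
    return urljoin('https://www.amazon.it/', footer_links[-1].get('href'))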
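If a pagination `href` is a relative path that already starts with `/` (common for on-site links), gluing it onto `'https://www.amazon.it/'` with `+` produces a double slash. `urllib.parse.urljoin` resolves relative and absolute links correctly. A minimal sketch; the `next_page_url` helper and the sample `href` values are hypothetical.

```python
from urllib.parse import urljoin

BASE_URL = 'https://www.amazon.it/'

def next_page_url(href):
    # Hypothetical helper: resolve a pagination href against the site root,
    # or return None when there is no link to follow
    return urljoin(BASE_URL, href) if href else None

print(BASE_URL + '/s?k=laptop&page=2')      # → https://www.amazon.it//s?k=laptop&page=2
print(next_page_url('/s?k=laptop&page=2'))  # → https://www.amazon.it/s?k=laptop&page=2
print(next_page_url('https://www.amazon.it/s?k=laptop&page=3'))
print(next_page_url(None))                  # → None
```

Absolute links pass through unchanged, so the same helper works whichever form the page uses.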

### The actual code

In [None]:
# Use your own User-Agent header
headerss = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/119.0.0.0 Safari/537.36"}

df_headers = ['title', 'price', 'rating', 'reviews_num', 'date']

# URL of the Amazon.it page with the search results (search term: "laptop")
url = 'https://www.amazon.it/s?k=laptop&__mk_it_IT=%C3%85M%C3%85%C5%BD%C3%95%C3%91&ref=nb_sb_noss_1'

# What do you want to call your file?
csv_name = 'laptops_amz_it.csv'

# How many pages do you want to scrape?
pages_to_scrape = 7

create_new_csv(df_headers, csv_name)


# This will print "Everything looks good... until now." for every product
# so... well, you can see if everything is going as planned.

for _ in range(pages_to_scrape):
    elm_list2, soup = get_elts(url, headerss)

    scrape_page(elm_list2, csv_name)

    url = get_next_link(soup)
    if url is None:
        break  # last page reached
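The loop above sends its requests back to back. A gentler variant pauses between pages and stops as soon as there's no next link. Below is a minimal sketch of that control flow; `crawl`, `fetch_page` and `delay_s` are hypothetical names, and the fetch function is passed in so the logic can be tried without touching the network.

```python
import time

def crawl(start_url, fetch_page, max_pages, delay_s=2.0):
    """Visit up to max_pages pages, following the next links.

    fetch_page(url) must return (items, next_url); next_url is None on the last page.
    """
    url = start_url
    collected = []
    for _ in range(max_pages):
        items, next_url = fetch_page(url)
        collected.extend(items)
        if next_url is None:
            break  # no more pages: stop instead of requesting a missing link
        url = next_url
        time.sleep(delay_s)  # pause before the next request
    return collected

# Fake three-page "site" to try the control flow offline
fake_site = {
    'p1': (['a', 'b'], 'p2'),
    'p2': (['c'], 'p3'),
    'p3': (['d'], None),
}
result = crawl('p1', fake_site.__getitem__, max_pages=7, delay_s=0)
print(result)  # → ['a', 'b', 'c', 'd']
```

In the notebook, `fetch_page` would roughly wrap `get_elts` and `get_next_link`, with `scrape_page` writing each page's products to the CSV; the delay value is an arbitrary choice, not an Amazon requirement.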


In [None]:
# Turn the csv file into a dataframe and check
# if there's a correlation between price, reviews, and rating

df = pd.read_csv(csv_name)

seaborn.pairplot(df)

# What we found

Looking at the scatterplot in the top-right corner (price vs. number of reviews), we can see that there's no clear link between price and sales (here we're assuming that more reviews mean more sales)...

However, laptops in the €450-€700 range tend to have more reviews, and therefore (by our assumption) more sales, on Amazon.it.
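The pairplot is a visual check. To put numbers on it, you could compute a rank (Spearman) correlation between price and number of reviews, which is less sensitive to a few best-sellers with huge review counts than the usual Pearson coefficient, and then group prices into bands to compare review counts. A minimal sketch, assuming a DataFrame with the `price` and `reviews_num` columns produced by the scraper; `price_review_summary` is a hypothetical helper, the rows below are made-up toy numbers (not the scraped data), and the band edges are arbitrary.

```python
import pandas as pd

def price_review_summary(df, bins):
    # Keep only rows where both price and review count were scraped
    d = df[['price', 'reviews_num']].dropna()
    # Spearman correlation = Pearson correlation computed on the ranks
    spearman = d['price'].rank().corr(d['reviews_num'].rank())
    # Number of products and median number of reviews per price band
    bands = pd.cut(d['price'], bins=bins)
    by_band = d.groupby(bands, observed=False)['reviews_num'].agg(['count', 'median'])
    return spearman, by_band

# Toy data, made up for illustration only
toy = pd.DataFrame({
    'price': [299.0, 449.0, 499.0, 599.0, 649.0, 899.0, 1299.0, None],
    'reviews_num': [120, 900, 1500, 2100, 800, 300, 40, 10],
})
rho, table = price_review_summary(toy, bins=[0, 450, 700, 1000, 3000])
print(round(rho, 2))
print(table)
```

On the real data you'd call `price_review_summary(df, bins=...)` with the DataFrame loaded from the CSV; a coefficient close to 0 matches "no clear link", while the per-band medians show where reviews cluster.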

Does this mean you should promote your budget-friendly laptop?

Well, it's hard to say... and usually, I don't think it's a decision that should come from just one mind.

Anyway, **it would be a good idea to start testing your budget-friendly version...**

**Or perhaps your high-end version, if your brand truly aims to be unique. (Indeed, there are very few laptops above €1,000.)**