Technical Proposal:

Scrape the Cúspide Libros website (www.cuspide.com) to obtain the list of the top 100 best-selling books of the week.
For each book, the data to collect are: title, URL, price in pesos, published price in dollars,
price in dollars converted at the Argentine blue dollar rate (or, alternatively, in euros), and the date of collection.
The blue dollar (or euro) exchange rate is also obtained by scraping a selected web page.

The data are stored in a .csv file with the corresponding fields.
A second output file serves as an error log, recording any title that could not be scraped.
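
The output layout described above can be sketched with the standard library alone. This is only an illustration: the column names in `CAMPOS`, the helper `escribir_csv`, and the sample row are hypothetical and are not the names or data the notebook produces.

```python
import csv
import io

# Hypothetical column names for the output file (assumptions for illustration).
CAMPOS = ["titulo", "url", "precio_pesos", "precio_usd", "precio_usd_blue", "fecha"]


def escribir_csv(filas, destino):
    """Write a list of row dictionaries to an open text file as CSV."""
    writer = csv.DictWriter(destino, fieldnames=CAMPOS)
    writer.writeheader()
    writer.writerows(filas)


# Made-up sample row, only to show the shape of one record.
fila_ejemplo = {
    "titulo": "Libro de ejemplo",
    "url": "https://example.com/libro-de-ejemplo",
    "precio_pesos": "$10.000,00",
    "precio_usd": "USD$10.00",
    "precio_usd_blue": "8.33",
    "fecha": "01-01-2024",
}

# An in-memory buffer stands in for the real output file.
buffer = io.StringIO()
escribir_csv([fila_ejemplo], buffer)
salida_csv = buffer.getvalue()
```

Because Argentine prices contain a comma, the `csv` module quotes that field automatically, so the file still parses into six columns.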


In [4]:
# Index:
# 0- Import libraries
# 1- Scrape the page and create dictionaries of book titles and URLs
# 2- Published price in pesos
# 3- Published price on each book page, in dollars
# 4- Price in dollars at the BLUE exchange rate
# 5- Date
# 6- Combine everything into a data frame
# 7- EXPORT to .CSV
# 8- Output file, error log

In [18]:
#0
# import libraries
import requests
import re
import lxml
from bs4 import BeautifulSoup
import pandas as pd
import logging

In [6]:
#1

# Select the Cúspide top 100 best sellers page and download it
website = "https://cuspide.com/100-mas-vendidos/"
resultado0 = requests.get(website, timeout=30)
resultado0.raise_for_status()  # stop early if the page could not be downloaded
content = resultado0.text

# Create BeautifulSoup object
soup = BeautifulSoup(content, 'lxml')

# Find the <h3> tags that hold each product title
nlibros = soup.find_all('h3', class_='name product-title woocommerce-loop-product__title')

lista_Titulodellibro = []
lista_Url = []

# Iterate over the list of BeautifulSoup objects
for h3_element in nlibros:
    # Find the <a> tag within <h3>
    a_element = h3_element.find('a')

    # Check if the <a> tag was found
    if a_element:
        # Extract the title text from the <a> tag
        titulo_libro = a_element.get_text(strip=True)
        lista_Titulodellibro.append(titulo_libro)

        # Get the book URL
        url_libro = a_element['href']
        lista_Url.append(url_libro)
    else:
        print("The <a> tag was not found within the <h3> element")

# Create the dictionaries
diccionario_Titulos = {'Titulos': lista_Titulodellibro}
diccionario_Url = {'URL': lista_Url}

In [7]:
#2 Price of the books published on the page (in pesos)
listadesordenada = soup.find_all('span', class_='woocommerce-Price-amount')
lista_preciodellibro = []

for precio in listadesordenada:
    # Find the <bdi> tag within the price <span>
    bdi_element = precio.find('bdi')

    if bdi_element:
        # Extract the price text from the <bdi> tag
        precio_libro = bdi_element.get_text(strip=True)
        lista_preciodellibro.append(precio_libro)

# Drop the first element: it is the shopping cart amount, not a book price.
listaprecios = lista_preciodellibro[1:]

# Create the dictionary
diccionario_precio_en_pesos = {'Precio en pesos': listaprecios}

In [8]:
#3- Published price on each page, in dollars.
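lista_precioendolares = []

for url in lista_Url:
    resultado = requests.get(url, timeout=30)
    # Use a separate name so the listing page's soup is not overwritten
    soup_libro = BeautifulSoup(resultado.text, 'lxml')
    # The USD price is read from the first <span> with this inline style
    preciodolar = soup_libro.find('span', style='font-size: 1.3em')
    if preciodolar:
        y = preciodolar.get_text(strip=True)
        lista_precioendolares.append(f'USD${y}')
    else:
        # Keep the list aligned with lista_Url when the price is missing
        lista_precioendolares.append(None)

# Create the dictionary
diccionario_dolares = {'En USD (publicado)': lista_precioendolares}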
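
Step 3 visits one page per book, so a single failed request or missing tag can interrupt the whole run. One possible pattern, sketched here under assumptions, is to wrap each extraction and record the failures for the error log. `recolectar_precios` and `extractor_falso` are hypothetical names; the stand-in extractor replaces the real network call so the sketch runs offline.

```python
def recolectar_precios(urls, extraer_precio):
    """Apply extraer_precio to every URL, keeping results aligned with urls.

    Returns (precios, errores): precios has one entry per URL (None on
    failure) and errores lists (url, message) pairs for the failed ones.
    """
    precios = []
    errores = []
    for url in urls:
        try:
            precios.append(extraer_precio(url))
        except Exception as exc:  # broad on purpose: any failure is recorded
            precios.append(None)
            errores.append((url, str(exc)))
    return precios, errores


# Stand-in extractor (hypothetical): fails for one URL instead of using the network.
def extractor_falso(url):
    if "sin-precio" in url:
        raise ValueError("price tag not found")
    return "USD$10.00"


precios, errores = recolectar_precios(
    ["https://example.com/a", "https://example.com/sin-precio"],
    extractor_falso,
)
```

Keeping a `None` placeholder for each failed URL preserves the one-to-one correspondence between the URL list and the price list, which the later data frame step depends on.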

In [9]:
#4- VALUE OF THE BOOKS IN BLUE DOLLAR

web = 'https://dolarhoy.com/cotizaciondolarblue'
resultado = requests.get(web)
contenido = resultado.text
soup = BeautifulSoup(contenido, 'lxml')
valores = soup.find_all('div', class_='value')

# The second 'value' <div> on the page is taken as the blue dollar rate
dolarblue = valores[1].get_text(strip=True)

def calcularpreciodivision(x, y):
    # Price: drop thousands periods, use '.' as decimal separator, strip '$'
    x = float(x.replace(".", "").replace(",", ".").replace("$", ""))
    # Rate: use '.' as decimal separator, strip '$'
    y = float(y.replace(",", ".").replace("$", ""))
    return x / y

lista_valorblue = []
for precio in listaprecios:
    valor = calcularpreciodivision(precio, dolarblue)
    redondeado = round(valor, 2)
    lista_valorblue.append(f'blue_USD: ${redondeado}')

# Create the dictionary
diccionario_blue = {'En USD BLUE': lista_valorblue}
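
The conversion above depends on Argentine number formatting, where the period separates thousands and the comma marks decimals. A minimal sketch of the same arithmetic with `Decimal` follows. The names `parsear_monto` and `pesos_a_dolares` are hypothetical and the amounts are made up. Unlike the notebook's function, this sketch also strips thousands periods from the rate, which is an assumption about how the rate is formatted.

```python
from decimal import Decimal, ROUND_HALF_UP


def parsear_monto(texto):
    """Convert an amount such as "$12.345,67" into a Decimal."""
    limpio = texto.replace("$", "").replace(".", "").replace(",", ".").strip()
    return Decimal(limpio)


def pesos_a_dolares(precio_pesos, cotizacion):
    """Divide a peso amount by the exchange rate, rounded to 2 decimals."""
    resultado = parsear_monto(precio_pesos) / parsear_monto(cotizacion)
    return resultado.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


# Made-up values, not real quotes.
ejemplo = pesos_a_dolares("$24.000,00", "$1.200,00")
```

Using `Decimal` instead of `float` avoids binary rounding surprises when the result is rounded to cents.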

In [19]:
#5 Date 
from datetime import datetime

# Get the current date and repeat it once per book
fecha_actual = datetime.now()
lista_fechas = [fecha_actual.strftime("%d-%m-%Y")] * len(listaprecios)

diccionario_fechas = {'fecha': lista_fechas}

In [11]:
#6- Combine everything into a data frame
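# Merge the per-column dictionaries into one (the | operator requires Python 3.9+)
columnas = (diccionario_Titulos | diccionario_Url | diccionario_precio_en_pesos
            | diccionario_dolares | diccionario_blue | diccionario_fechas)
# pd.DataFrame raises ValueError if the column lists differ in length
df = pd.DataFrame(columnas)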
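
Building the data frame only works when every column list has the same number of entries. A small check like the one below, run before the data frame is built, could point to the step that came up short. `verificar_longitudes` is a hypothetical helper and the sample columns are made up.

```python
def verificar_longitudes(columnas):
    """Return {column name: length} if lengths differ, else an empty dict."""
    longitudes = {nombre: len(valores) for nombre, valores in columnas.items()}
    if len(set(longitudes.values())) > 1:
        return longitudes
    return {}


# Made-up columns: one price is missing.
columnas_ejemplo = {
    "Titulos": ["A", "B", "C"],
    "URL": ["u1", "u2", "u3"],
    "Precio en pesos": ["$1.000,00", "$2.000,00"],
}
desajuste = verificar_longitudes(columnas_ejemplo)
```

A non-empty result names each column with its length, which makes it easier to see whether titles, prices, or per-page dollar prices are out of step.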


In [12]:
#7 Export to .CSV (the DataFrame index is written as the first, unnamed column)
df.to_csv('cuspide-100-mas-vendidos.csv')

In [17]:
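#8 Output file, error log
# The calls below are sample messages only; with level=logging.INFO,
# the debug message is not written to log.log.
logging.basicConfig(level=logging.INFO, filename="log.log", filemode="w")
logging.debug("debug")
logging.info("info")
logging.warning("warning")
logging.error("error")
logging.critical("critical")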
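
The proposal calls for an error log that records titles that could not be scraped. One way this might look is sketched below, using a dedicated logger with its own file handler. The logger name, the helper `crear_logger_errores`, the file name `errores.log`, and the logged title and URL are all hypothetical; a temporary directory is used so the sketch does not touch the notebook's `log.log`.

```python
import logging
import os
import tempfile


def crear_logger_errores(ruta):
    """Return a logger (and its handler) that writes WARNING and above to ruta."""
    logger = logging.getLogger("scraper_errores")  # hypothetical logger name
    logger.setLevel(logging.WARNING)
    logger.propagate = False  # keep these records out of the root logger
    handler = logging.FileHandler(ruta, mode="w", encoding="utf-8")
    handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
    logger.addHandler(handler)
    return logger, handler


ruta_log = os.path.join(tempfile.mkdtemp(), "errores.log")
logger, handler = crear_logger_errores(ruta_log)

# Made-up failure, logged the way a failed title could be recorded.
logger.error("Could not scrape %s (%s)", "Libro de ejemplo", "https://example.com/libro")
logger.info("This message is below WARNING and is not written")
handler.close()

with open(ruta_log, encoding="utf-8") as f:
    contenido_log = f.read()
```

For the log to capture scraping failures, the logger would need to be created before the scraping steps run, not after them.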