![MBITSchool](https://i.imgur.com/UiDMkO3.png)

### Proyecto de Consolidación APIs y Web Scraping

##### Alejandro Paredero - paredero@mbitschool.com

**El proyecto de consolidación de Web Scraping, API REST y Streamlit pretende que de forma autónoma combinemos proyectos de captura, procesamiento de datos y visualización basados en un sistema o entre dos o más distintos.**

Aquí la creatividad es muy importante ✔️. Una vez resuelto los retos propuestos siéntete libre de extenderlos con características adicionales.

⚠️ **ATENCIÓN**: <u>Ten a mano las presentaciones y los cuadernos resueltos de las sesiones anteriores, te serán de gran ayuda.</u>

**Los ejercicios EXTRA son opcionales**. Si dais vuestro consentimiento tras la fecha de cierre, podré hacer publicaciones con capturas de pantalla en redes como LinkedIn con objeto de promocionaros.

Para dudas tenéis un foro en el campus o mi correo electrónico 📧 paredero@mbitschool.com

**Datos del alumno:**

- **Nombre:** `Daniel Herraiz Tello`
- **Consentimiento sobre ejercicios EXTRA:** Si realizas el ejercicio, ¿permites que pueda publicar el contenido en foto/vídeo si el resultado es relevante? `SI`
- **Comentario:** Tras realizar los diferentes retos, ¿qué te han parecido? ¿qué problemas has encontrado?


##### Comentarios tras finalizar
* ##### Reto 1
    * El reto 1 está por completo en reto1.py
    * La columna +/- requiere especial atención y me ha dado algunos problemas al variar el nombre de la clase según >0 o <0
    * Al realizar el reto inicialmente en el notebook, podía usar una variable global como caché. En el script de python he tenido que usar un sistema distinto:
        * Las variables globales se reinician al interactuar con el UI
        * Ha sido necesario un sistema de archivos
        * Inicialmente probé con un archivo y json (pag1->content1...) pero era muy complicado con los formatos, decidí crear un archivo por página
        * He añadido una sección para ver y vaciar caché en la app para facilitar las pruebas

* ##### Reto 2
    * La única forma que he encontrado de obtener el precio actualizado es cargar y cerrar el driver en cada loop. Cualquier otra forma me daba un valor estático o me saltaba el bot si intentaba refrescar la página.

* ##### Reto 3
    * He hecho una app más extensa con varios archivos incluidos en este directorio ("RetoExtraMain", "RetoExtraUtils")
    * Encuentro más obstáculos con las tiendas más grandes (amazon, corte inglés, casa del libro)
        El buscador carga de distinta manera a veces, en algunas ocasiones hay que expandir un shadow root, hay algunas redirecciones a ciertas categorías según los resultados, etc
    * El "implicit wait" me ha parecido muy útil excepto cuando esperas encontrar varios elementos (puede sólo quedarse con el primero y no cargar más)
    * Streamlit tiene muchas funcionalidades pero también algunas limitaciones al pintar dataFrames (quería por ejemplo haber usado algún formato condicional, y esto no está soportado por el "data_editor")


### Reto 1 - BeautifulSoup y Streamlit: Capturar estadísticas de equipos de hockey

Vista la web https://www.scrapethissite.com/pages/forms/. Examina como se comporta la URL con la paginación.  El objetivo es capturar la información de las primeras 10 páginas y crear una aplicación sencilla en Streamlit que permita mostrar la información de las columnas "Team name", "Year", "Win", "Loses" y "+/-"

##### Ejercicio:
* Captura el los términos listados anteriormente:  "Team name", "Year", "Win", "Loses" y "+/-" , de al menos las primeras 8 páginas.
* Almacena el resultado en un **dataframe** de Steramlit.
* Implementa el cacheo de contenido de las URL (Revisa la presentación) para evitar consultas repetidas a las URL.
* **Extra**: Permite que el usuario elija el rango de página mínima y máxima a capturar contenido
* **Extra**: Permite que el usuario elija filtrar por equipos cuyo "+/-" sea superior a una cifra determinada.

*PISTA: La página 1 tiene la estructura `https://www.scrapethissite.com/pages/forms/?page_num=1` , la página 2 `https://www.scrapethissite.com/pages/forms/?page_num=2` y así sucesivamente.*

### Reto 2 - Selenium: Captura del valor del oro

Como vimos en clase, hay páginas que implementan medidas para evitar el Web Scraping. En la web de https://www.inversoro.es/precio-del-oro/precio-oro-hoy/ apreciamos cómo la página web cara un valor antiguo y pasados unos segundos actualiza. 

Esto hace que bibliotecas como `request` + `BeautifulSoup` no nos sirva aquí ya que capturamos el instante t=0. Sin embargo, con *Selenium* podemos interactuar en tiempo real mientras el navegador se ejecuta.

#### Ejercicio
Inspecciona la web, obteniendo la etiqueta HTML donde el valor aparece representado y crea un script que utilice Selenium, cargando la página y capturando  dicho valor en tiempo real cada 5 segundos en un periodo de 1 minuto. Almacena el resultado de dicho valor y del timestamp correspondiente (*[ayuda](https://www.geeksforgeeks.org/get-current-timestamp-using-python/)*)

In [1]:
from selenium.webdriver.common.by import By
import datetime
import time
import pandas as pd


In [4]:
# Carga de Driver para Firefox
from selenium import webdriver
from selenium.webdriver.firefox.service import Service

service = Service(executable_path="C:/Repos/Utils/geckodriver.exe")
options = webdriver.FirefoxOptions()
options.add_argument("-private")

In [None]:
# tras realizar el loop y probar varias veces veo que no se actualiza el precio sin refrescar la página
# Pruebo la págiona "tiempo-real", y salta captcha siempre, no veo forma de saltarlo
# Si uso refresh() también salta captcha
# Lo único que me funciona es cargar y cerrar el driver en cada loop:

entryList = []
start_time = datetime.datetime.now().timestamp()
while datetime.datetime.now().timestamp() - start_time < 60:
    driver = webdriver.Firefox(service=service, options=options)
    driver.get("https://www.inversoro.es/precio-del-oro/precio-oro-hoy/")
    time.sleep(2.5)
    precio_oro = driver.find_element(By.NAME, "current_price_field").text
    timestamp = datetime.datetime.now()
    entryList.append({"timestamp": timestamp, "precioOro": precio_oro})
   # driver.refresh()
    driver.close()
goldPriceDF = pd.DataFrame(entryList)
goldPriceDF

Unnamed: 0,timestamp,precioOro
0,2025-04-11 13:54:31.761804,"2 833,86 €"
1,2025-04-11 13:54:40.337720,"2 833,86 €"
2,2025-04-11 13:54:49.100095,"2 836,79 €"
3,2025-04-11 13:54:57.550843,"2 833,86 €"
4,2025-04-11 13:55:06.044242,"2 833,86 €"
5,2025-04-11 13:55:14.662721,"2 836,79 €"
6,2025-04-11 13:55:23.283841,"2 836,79 €"
7,2025-04-11 13:55:31.820693,"2 833,86 €"


In [None]:
driver.close()


### Reto EXTRA - Selenium, BeautifulSoup y Streamlit: El comparador (simple) de precios 📊📉

Los comparadores de precios <u>es uno de los nichos de mercado más lucrativos en Internet.</u> Dado un producto o servicio a nivel de usuario nos beneficiamos del precio más bajo existente y a nivel de empresa éstas obtienen grandes beneficios gracias a sistemas de referidos o de comisiones por venta realizada.

En esta ocasión vamos a realizar un simple comparador de precios basado en el número ISBN. Un ISBN es un código normalizado internacional para libros (International Standard Book Number). Estos estaban compuestos por 10 dígitos hasta diciembre de 2006 pero, desde enero de 2007, tienen una extensión de 13 dígitos. 


Por ejemplo, el ISBN [9788478884452](https://www.google.es/search?q=9788478884452) (haz click) corresponde al libro "Harry Potter y la piedra filosofal".

Dado que no disponemos de acceso a APIs de las tiendas principales <u>capturaremos el precio a través de técnicas de Web Sraping.</u>

##### Ejercicio 

El objetivo es realizar la consulta de un ISBN, por ejemplo el de "Harry Potter y la Piedra filosofal" en al menos **tres** de las siguientes tiendas propuestas. Captura el primer resultado en EUROS, y devuelve cómo resultado cuál tiene el precio más bajo. 

Crea un sencillo interfaz en Streamlit que pregunte por el ISBN de un número (entre 10 y 13 dígitos) y devuelva como resultado el primer elemento de cada tienda examinada, con el título, la imagen (si la hubiese), el precio y un enlace para hacer click.

Como puntos extra:
* Incluye en la comparativa más de las 3 tiendas propuestas.
* Incluye por texto posteriormente cuál es la diferencia de precio en euros y en porcentaje respecto al valor más bajo detectado para saber cuánto nos estamos ahorrando.
* Crea un diagrama de barras con el precio en cada tienda para representar visualmente el ahorro en precio.

Ejemplo de tiendas propuestas:
- https://www.casadellibro.com/
- https://www.libreriacentral.com/
- https://www.iberlibro.com/
- https://www.amazon.es/
- https://ebay.es
- https://www.elcorteingles.es/

Recuerda que para cada página debes realizar ingeniería inversa, averiguando cómo se comportan las URLs de cada sitio web para hacer una búsqueda directa.

**Pista**: Utiliza Selenium para capturar datos si te resulta request/BeautifulSoup complicado de utilizar.

**Extra**: *¿Conoces alguna otra página donde comprar libros? Inclúyela en el comprador*

**Extra 2**: *¿Cómo podríamos obtener una evolución del precio durante una semana?*

In [4]:
import json
import requests
import streamlit as st
from bs4 import BeautifulSoup
# from selenium import webdriver
from time import sleep

from selenium.webdriver.common.by import By
import datetime
import time
import pandas as pd
# Carga de Driver para Firefox
from selenium import webdriver
from selenium.webdriver.firefox.service import Service
from selenium.webdriver.support.wait import WebDriverWait
from selenium.common.exceptions import NoSuchElementException

from selenium.webdriver import ActionChains
from selenium.webdriver.common.actions.wheel_input import ScrollOrigin

In [37]:
isbn = '9788478884452'
url = f"https://www.casadellibro.com/?query={isbn}"
print(url)
page = requests.get(url)
print(page.status_code)
soup = BeautifulSoup(page.content,"html.parser")
priceList = soup.find_all("span", class_="x-currency")
print(priceList)

with open("casaLibroBs4.txt", "wb") as cacheFile:
        cacheFile.write(page.content)

https://www.casadellibro.com/?query=9788478884452
403
[]


## Con beautifulSoup obtengo "access denied"

In [None]:
driver.close()

: 

In [3]:

service = Service(executable_path="C:/Repos/Utils/geckodriver.exe")
options = webdriver.FirefoxOptions()
options.add_argument("-private")



In [5]:

isbn = '9788478884452'
url = f"https://www.casadellibro.com/?query={isbn}"
driver = webdriver.Firefox(service=service, options=options)
driver.get(url)

In [28]:
soup = BeautifulSoup(driver.page_source,"html.parser")
priceList = soup.find_all("span", class_="x-currency")
print(priceList)

with open("casaLibroBs4.html", "w",  encoding='utf-8') as cacheFile:
        cacheFile.write(driver.page_source)

[]


In [39]:
search = driver.find_element(By.XPATH, "//input[@id='empathy-search']")
search.send_keys(isbn)

In [42]:
links = driver.find_elements(By.XPATH, "//article[@data-test='search-grid-result']")
print(links)


[]


In [23]:
# availability = driver.find_element(By.XPATH, "//div[@data_test='availability']")
# availability.find_elements(By.XPATH, "//li[@data-test='base-filters-item]")
# print(len(availability))

# --- EXPAND SHADOW DOM ROOT ---
def expand_shadow_element(element):
    shadow_root = driver.execute_script("return arguments[0].shadowRoot", element)
    return shadow_root

root1 = driver.find_element(By.XPATH, "//div[@class='x-root-container']")
shadow_root1 = expand_shadow_element(root1)

button = shadow_root1.find_element(By.CSS_SELECTOR, 'button[data-test="toggle-facets-button"]')
button.click()
# availability = shadow_root1.find_element(By.CSS_SELECTOR, 'div[data-test="availability"]')

In [24]:
availability = shadow_root1.find_element(By.CSS_SELECTOR, 'div[data-test="availability"]')
# print(len(availability))

In [25]:
availability.click()

In [10]:
driver.close()

WebDriverException: Message: Failed to decode response from marionette


In [15]:
scroll_origin = ScrollOrigin.from_element(availability)
ActionChains(driver)\
    .scroll_from_origin(scroll_origin, 0, 2000)\
    .perform()


In [None]:
html = shadow_root1.find_element(By.XPATH, "div[class='x-scroll x-flex-auto x-p-40 x-pr-24 x-pt-0']")
# html.send_keys(Keys.END)

In [39]:
# checkboxList = availability.find_elements(By.CSS_SELECTOR, 'li[data-test="base-filters-item"]')
# print(checkboxList)

# print(checkboxList[1].text)
# checkboxList[1].click()

for checkbox in availability.find_elements(By.CSS_SELECTOR, 'li[data-test="base-filters-item"]'):
    print(checkbox.text)
    if checkbox.text == 'disponible':
        checkbox.click()
        print('click')
        break

Todo
con Stock
(43)
agotado
(31)
descatalogado
(18)
disponible
(14)
preventa
(1)


In [48]:
for checkbox in availability.find_elements(By.CSS_SELECTOR, 'button[data-test="filter"]'):
    print(checkbox.text)
    if 'disponible' in checkbox.text:
        checkbox.click()
        print('click')
        break

con Stock
(43)
agotado
(31)
descatalogado
(18)
disponible
(14)
click


In [43]:
print(availability.find_elements(By.CSS_SELECTOR, 'button[data-test="filter"]')[3].text)
availability.find_elements(By.CSS_SELECTOR, 'button[data-test="filter"]')[3].click()

disponible
(14)


In [222]:
sortButton = shadow_root1.find_element(By.CSS_SELECTOR, 'div[data-test="sort"]')
sortButton.click()

In [228]:
sortButton.click()

In [202]:
checkboxListSort = sortButton.find_elements(By.CSS_SELECTOR, 'button[data-test="sort-picker-button"]')
print(checkboxListSort[2].text)
print(len(checkboxListSort))

Precio: De menor a mayor
7


In [203]:
checkboxListSort[2].click()

In [None]:
scroll_origin = ScrollOrigin.from_element(sortButton)
ActionChains(driver)\
    .scroll_from_origin(scroll_origin, 0, 3500)\
    .perform()

# modalFooter = driver.find_element(By.CSS_SELECTOR, 'button[data-test="filters-show-more"]')
# ActionChains(driver)\
#     .scroll_to_element(modalFooter)\
#     .perform()



MoveTargetOutOfBoundsException: Message: Move target (1006, -102) is out of bounds of viewport dimensions (1280, 601)
Stacktrace:
RemoteError@chrome://remote/content/shared/RemoteError.sys.mjs:8:8
WebDriverError@chrome://remote/content/shared/webdriver/Errors.sys.mjs:199:5
MoveTargetOutOfBoundsError@chrome://remote/content/shared/webdriver/Errors.sys.mjs:518:5
assertTargetInViewPort@chrome://remote/content/shared/webdriver/Actions.sys.mjs:3103:11
#assertInViewPort@chrome://remote/content/marionette/actors/MarionetteCommandsChild.sys.mjs:115:17
receiveMessage@chrome://remote/content/marionette/actors/MarionetteCommandsChild.sys.mjs:210:42


In [44]:
links = driver.find_elements(By.XPATH, "//a[@data-test='x-result-link']")
links = driver.find_elements(By.XPATH, "//div[@data-test='x-relative']")
links = driver.find_elements(By.XPATH, "//input[@id='empathy-search']")
print(links)


[<selenium.webdriver.remote.webelement.WebElement (session="ae533682-0573-4cf9-8819-f64c49859626", element="15fd8340-2d7b-4cd1-acea-96637e77e9fc")>]


In [47]:
def expand_shadow_element(element):
    shadow_root = driver.execute_script('return arguments[0].shadowRoot', element)
    return shadow_root

In [62]:
root1 = driver.find_element(By.XPATH, "//div[@class='x-root-container']")
print(root1)
shadow_root1 = expand_shadow_element(root1)
bookPrice = shadow_root1.find_elements(By.CLASS_NAME, "x-currency")
print(bookPrice[0].text)

# bookPrice1 = shadow_root1.find_elements(By.XPATH, "//div[@data-test='result-previous-price']")
bookPrice1 = shadow_root1.find_elements(By.CSS_SELECTOR, 'div[data-test="result-current-price"]')

try:
    bookPrice2 = shadow_root1.find_element(By.CSS_SELECTOR, 'div[data-test="result-current-price"]')
    print("Element found:", bookPrice2.text)
except NoSuchElementException:
    print("Element not found.")

print(bookPrice1[0].text)

<selenium.webdriver.remote.webelement.WebElement (session="ae533682-0573-4cf9-8819-f64c49859626", element="493cd458-721b-4237-9e45-cb542d2524cb")>
16,95 €
Element found: 16,10 €
16,10 €


In [18]:

listaPrecios = driver.find_elements(By.CLASS_NAME, "x-currency")
print(listaPrecios)
listaPrecios2 = driver.find_elements(By.XPATH, "/div/div/div[1]/div/div[3]/div[3]/div/ul/li/article/a[2]/div[1]/div[2]/span")
print(listaPrecios2)
listaPrecios3 = driver.find_elements(By.CSS_SELECTOR, 'div[data-test="result-current-price"]')
print(listaPrecios3)
listaPrecios4 = driver.find_elements(By.XPATH, "//*[@data-test='result-current-price']/span")
print(listaPrecios4)


[]
[]
[]
[]


In [13]:
listaPrecios4 = driver.find_elements(By.CSS_SELECTOR, 'h2[data-test="result-title"]')
print(listaPrecios4)
listaPrecios5 = driver.find_elements(By.TAG_NAME, 'h2')
print(listaPrecios5)

[]
[<selenium.webdriver.remote.webelement.WebElement (session="ae533682-0573-4cf9-8819-f64c49859626", element="455293ad-f0f4-45ca-baf8-20e84364edb5")>, <selenium.webdriver.remote.webelement.WebElement (session="ae533682-0573-4cf9-8819-f64c49859626", element="e5b900a6-2ea4-4f3f-aa00-bd422ba79338")>, <selenium.webdriver.remote.webelement.WebElement (session="ae533682-0573-4cf9-8819-f64c49859626", element="5d6145ed-371b-43b0-821e-1ad7a7de0157")>, <selenium.webdriver.remote.webelement.WebElement (session="ae533682-0573-4cf9-8819-f64c49859626", element="74306c36-5f68-4e6e-9579-dd6164fb3ac5")>, <selenium.webdriver.remote.webelement.WebElement (session="ae533682-0573-4cf9-8819-f64c49859626", element="1c1462b9-0131-4c58-a1b0-1b9cf4118398")>, <selenium.webdriver.remote.webelement.WebElement (session="ae533682-0573-4cf9-8819-f64c49859626", element="18ba1d59-da19-4f00-8b03-8c897392c621")>, <selenium.webdriver.remote.webelement.WebElement (session="ae533682-0573-4cf9-8819-f64c49859626", element="e

In [None]:
listaPrecios = driver.find_elements(By.CLASS_NAME, "x-currency")
print(listaPrecios)
acceptCookies = driver.find_element(By.XPATH, '//*[@id="onetrust-accept-btn-handler"]')

acceptCookies.click()


[]


In [35]:
url2 = f"https://www.casadellibro.com"
driver.get(url2)
with open("casaLibroDriver.html", "w",  encoding='utf-8') as cacheFile:
        cacheFile.write(driver.page_source)

In [None]:
driver.refresh()

In [40]:
# driver.navigate().to(url)
# firstPrice = driver.find_element(By.CLASS_NAME, "x-currency")
# driver.refresh()
driver.get(url)
wait = WebDriverWait(driver, timeout=2)
wait.until(lambda _ : driver.find_element(By.CLASS_NAME, "x-currency").is_displayed())
listaPrecios = driver.find_elements(By.CLASS_NAME, "x-currency")
# //*[@id="onetrust-accept-btn-handler"]
print(listaPrecios)

TimeoutException: Message: 
Stacktrace:
RemoteError@chrome://remote/content/shared/RemoteError.sys.mjs:8:8
WebDriverError@chrome://remote/content/shared/webdriver/Errors.sys.mjs:199:5
NoSuchElementError@chrome://remote/content/shared/webdriver/Errors.sys.mjs:552:5
dom.find/</<@chrome://remote/content/shared/DOM.sys.mjs:136:16


In [None]:
## ENTREGA AQUÍ EL CÓDIGO PYTHON DE LA APLICACIÓN STREAMLIT



In [1]:
#tienda 2

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.firefox.service import Service
from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.firefox.options import Options
import requests
from bs4 import BeautifulSoup

import time

In [None]:

# --- SETUP FIREFOX DRIVER ---
service = Service(executable_path="C:/Repos/Utils/geckodriver.exe")
options = Options()
searchInput= "piedra filosofal"
query = searchInput.strip().replace(" ","+")
driver = webdriver.Firefox(service=service, options=options)
# driver.get(f"https://www.libreriacentral.com/SearchResults.aspx?st={query}&cId=0&sm=qck")

# time.sleep(1) 

# search = driver.find_element(By.XPATH, "//input[@id='empathy-search']")
# search.send_keys(query)   

In [2]:
driver.close()

NameError: name 'driver' is not defined

In [1]:
searchInput= "piedra filosofal"
query = searchInput.strip().replace(" ","+")

url = f"https://www.libreriacentral.com/SearchResults.aspx?st={query}&cId=0&sm=qck"
page = requests.get(url)
soup = BeautifulSoup(page.content,"html.parser")
results = soup.find_all("div",class_="products-preview-list-item")

NameError: name 'requests' is not defined

In [88]:
pricetest = ' '
if not pricetest:
    print('si')
else:
    print('no')

no


In [45]:

# print(results[0].find(attrs={"itemprop": "name"}).get_text())
print(results[0].find("meta", attrs={"itemprop": "author"})['content'])
price_ele = results[0].find("div", class_="precio")
print(results[0].find("div", class_="precio").prettify())

for price_substring in price_ele.find_all("span"):
    # print(price_substring[0].strip() + price_substring[0].strip())
    print(price_substring.get_text().strip())

Rowling, J. K.
<div class="precio" id="ctl00_CPHMainCenter_searchedProducts_rptrProdSearched_ctl00_ProductSmall_pnlPrice">
 <strong>
  <span content="16.10" id="ctl00_CPHMainCenter_searchedProducts_rptrProdSearched_ctl00_ProductSmall_lblPriceWeb" itemprop="price">
   16,10
  </span>
  <span content="EUR" itemprop="priceCurrency">
   €
  </span>
 </strong>
 <strike>
  P.V.P.:
  <span id="ctl00_CPHMainCenter_searchedProducts_rptrProdSearched_ctl00_ProductSmall_lblPrice">
   16,95
  </span>
  €
 </strike>
</div>

16,10
€
16,95


In [80]:
result = results[1]
name_element = result.find(attrs={"itemprop": "name"})
title = name_element.get_text().strip()
author = result.find("meta", attrs={"itemprop": "author"})['content'].strip()
print(author)
price_substrings = result.find("div", class_="precio").find_all("span")
current_price = price_substrings[0].get_text().strip() +' '+price_substrings[1].get_text().strip()
if 2 < len(price_substrings):
    original_price = price_substrings[2].get_text().strip() +' '+price_substrings[1].get_text().strip()
else:
    original_price = current_price

print(current_price)
print(original_price)

img_url = result.find("img", class_='foto')['src']
print(img_url)

availability = result.find("link", attrs={"itemprop": "availability"})
print(availability.get_text().strip())
availability2 = result.find("span", class_='css-disponible')
print(availability2.get_text())

Rowling, J. K.
37,00 €
38,95 €
Resources/Pictures/978841817407.jpg

Disponible


In [None]:

books = []
i = 0
for result in results:
    i+=1
    print(i)
    try:
        #In stock? 
        availability = result.find("span", class_='css-disponible')
        if availability and result.find("span", class_='css-disponible').get_text().strip() == "Disponible":

            # Title
            name_element = result.find(attrs={"itemprop": "name"})
            title = name_element.get_text().strip()
            print(title)

            #Author
            author = result.find("meta", attrs={"itemprop": "author"})['content'].strip()
            print(author)
            detail = 'N/A'

            # Price 
            try:
                price_substrings = result.find("div", class_="precio").find_all("span")
                current_price = price_substrings[0].get_text().strip() +' '+price_substrings[1].get_text().strip()
                if len(price_substrings) > 2:
                    original_price = price_substrings[2].get_text().strip() +' '+price_substrings[1].get_text().strip()
                else:
                    original_price = current_price
            except e:
                current_price = "N/A"
                original_price = "N/A"
            print(current_price)
            # Image
            try:
                img_url = result.find("img", class_='foto')['src']
            except e:
                img_url = "N/A"

            #Link
            try:
                link = name_element["href"]
                print(link)
            except:
                link = "N/A"

            books.append({
                "Title": title,
                "Author": author,
                "Detail": detail,
                "Original Price": original_price,
                "Current Price": current_price,
                "Image url": img_url,
                "Link url": link
            })
    except Exception as e:
        print(f"Error al obtener resultados: {e}")

1
Harry Potter y la piedra filosofal
Rowling, J. K.
16,10 €
2
Harry Potter y la piedra filosofal (Ed. Minalima)
Rowling, J. K.
37,00 €
3
Harry Potter y la piedra filosofal (Harry Potter 1)
Rowling, J. K.
12,30 €
4
harry potter y la piedra filosofal
Rowling, J. K.
19,90 €
5
6
7
8
9
10
11
12
13
14
15
16


In [90]:
query = '9788478884452'
baseUrl = 'https://www.iberlibro.com/'
queryUrl = query.strip().replace(" ","%20")
url = f"{baseUrl}servlet/SearchResults?cond=new&ds=20&fs=es&kn={queryUrl}&n=100046497&pt=book&rollup=on&sortby=2"
page = requests.get(url)
soup = BeautifulSoup(page.content,"html.parser")
results = soup.find_all("div",class_="products-preview-list-item")
print(url)

https://www.iberlibro.com/servlet/SearchResults?cond=new&ds=20&fs=es&kn=9788478884452&n=100046497&pt=book&rollup=on&sortby=2


In [7]:
driver = webdriver.Firefox(service=service, options=options)
driver.implicitly_wait(5)

# time.sleep(2)  # Wait for dynamic content to load
# acceptCookies = driver.find_element(By.XPATH, '//*[@id="onetrust-accept-btn-handler"]')
# acceptCookies.click()
# time.sleep(1) 


In [143]:
driver.close()

WebDriverException: Message: Failed to decode response from marionette


In [4]:

service = Service(executable_path="C:/Repos/Utils/geckodriver.exe")
options = webdriver.FirefoxOptions()
options.add_argument("-private")

In [5]:
driver = webdriver.Firefox(service=service, options=options)
url = "https://www.amazon.es/"
driver.get(url)

In [138]:
categorySelector = driver.find_element(By.ID, "searchDropdownBox")
categorySelector.click()
booksOption = categorySelector.find_element(By.XPATH, "//option[@value='search-alias=stripbooks']")
booksOption.click()


In [139]:
query = "piedra filosofal"
search = driver.find_element(By.XPATH, "//input[@id='twotabsearchtextbox']")
search.send_keys(query)
acceptSearch = driver.find_element(By.XPATH, "//input[@id='nav-search-submit-button']")
acceptSearch.click()

In [140]:
from selenium.webdriver.support.select import Select

In [151]:
driver.find_element(By.CSS_SELECTOR, "span[class='a-dropdown-container']").click()
sortPrice = driver.find_element(By.XPATH, "//a[@id='s-result-sort-select_1']")
sortPrice.click()

In [149]:
#sort alternative

driver.find_element(By.CSS_SELECTOR, "span[class='a-dropdown-container']").click()
select_element = driver.find_element(By.ID, 's-result-sort-select')

select_sort = Select(select_element)
select_sort.select_by_value('price-asc-rank')

ElementClickInterceptedException: Message: Element <select id="s-result-sort-select" class="a-native-dropdown a-declarative" name="s"> is not clickable at point (1173,153) because another element <a id="s-result-sort-select_0" class="a-dropdown-link a-active" href="javascript:void(0)"> obscures it
Stacktrace:
RemoteError@chrome://remote/content/shared/RemoteError.sys.mjs:8:8
WebDriverError@chrome://remote/content/shared/webdriver/Errors.sys.mjs:199:5
ElementClickInterceptedError@chrome://remote/content/shared/webdriver/Errors.sys.mjs:351:5
webdriverClickElement@chrome://remote/content/marionette/interaction.sys.mjs:177:11
interaction.clickElement@chrome://remote/content/marionette/interaction.sys.mjs:136:11
clickElement@chrome://remote/content/marionette/actors/MarionetteCommandsChild.sys.mjs:354:29
receiveMessage@chrome://remote/content/marionette/actors/MarionetteCommandsChild.sys.mjs:230:31


In [20]:
#Category button
driver.find_element(By.ID, "searchDropdownBox").click()
#Select books category
driver.find_element(By.XPATH, "//option[@value='search-alias=stripbooks']").click()
#Find search input and send query
search = driver.find_element(By.XPATH, "//input[@id='twotabsearchtextbox']")
search.send_keys(query)
#Confirm search
driver.find_element(By.XPATH, "//input[@id='nav-search-submit-button']").click()

# #Sort price ascending
driver.find_element(By.XPATH, "//span[@class='a-dropdown-prompt']").click()
driver.find_element(By.XPATH, "//a[@id='s-result-sort-select_1']").click()

In [None]:
results = driver.find_elements(By.XPATH, "//div[@data-csa-c-type='item']")
print(len(results))

for result in results:
    try:
        # bookHeaderInfo = result.find_element(By.XPATH, "//div[@data-cy='title-recipe']")
        title = result.find_element(By.XPATH, ".//h2[@class='a-size-medium a-spacing-none a-color-base a-text-normal']").text
        #Authors might come with different class 
        print(title)
        try:
            driver.implicitly_wait(0.5)
            author = result.find_element(By.XPATH, ".//a[@class='a-size-base a-link-normal s-underline-text s-underline-link-text s-link-style']").text
            print(author)
        except Exception as e:
            try:
                author = result.find_elements(By.XPATH, ".//span[@class='a-size-base']")[1].text
                # print('author 2')
                # print(author)
            except:
                continue
        #Different prices for different formats
        for priceResult in result.find_elements(By.XPATH, ".//div[@data-cy='price-recipe']"):
            # print(priceResult.text)
            current_price = priceResult.find_elements(By.CSS_SELECTOR, '.span[class="a-price"]')
            # current_price = priceResult.find_elements(By.XPATH, './/span[@class="a-price-whole"]')
            if priceList:
                price = priceList[0]
                print(price.text)
                if len(priceList)>1:
                    originalPrice = priceList[1]
            
                # print(f'orig price: {originalPrice.text}')
                # print(f'final price: {price.text}')
            
    except Exception as e:
        print(f"Error al obtener resultados de {url} : {e}")


In [88]:
results = driver.find_elements(By.CSS_SELECTOR, "div[data-csa-c-type='item']")
print(len(results))

for result in results:
    try:
        # bookHeaderInfo = result.find_element(By.XPATH, "//div[@data-cy='title-recipe']")
        title = result.find_element(By.CSS_SELECTOR, "h2[class='a-size-medium a-spacing-none a-color-base a-text-normal']").text
        #Authors might come with different class 
        print(title)
        try:
            driver.implicitly_wait(0.5)
            author = result.find_element(By.CSS_SELECTOR, "a[class='a-size-base a-link-normal s-underline-text s-underline-link-text s-link-style']").text
            # print(author)
        except Exception as e:
            try:
                author = result.find_elements(By.CSS_SELECTOR, "span[class='a-size-base']")[1].text
                # print('author 2')
                # print(author)
            except:
                continue
        #Different prices for different formats
        # for priceResult in result.find_elements(By.CSS_SELECTOR, "[data-cy='price-recipe'] .a-price-whole, .a-price-fraction"):
        i=0
        for priceResult in result.find_elements(By.CSS_SELECTOR, "[data-cy='price-recipe'] .a-price, .a-offscreen"):
            i+=1
            print(f"price {i}")
            print(priceResult.text)
            # print(result.text)
            # current_price = priceResult.find_elements(By.CSS_SELECTOR, '.span[class="a-price"]')
            # current_price = priceResult.find_elements(By.XPATH, './/span[@class="a-price-whole"]')
            if priceList:
                price = priceList[0]
                # print(price.text)
                if len(priceList)>1:
                    originalPrice = priceList[1]
            
                # print(f'orig price: {originalPrice.text}')
                # print(f'final price: {price.text}')
            
    except Exception as e:
        print(f"Error al obtener resultados de {url} : {e}")


18
Harry Potter y la piedra filosofal (edición Ravenclaw del 20º aniversario) (Harry Potter 1): Ingenio · Estudio · Sabiduría
price 1
19
76€
price 2

price 3

price 4
21,95€
price 5

La piedra filosofal de los evangelios a los tratados alquímicos
price 1
9
00€
price 2

price 3

price 4
10,00€
price 5

price 6

price 7

price 8

Harry Potter y la piedra filosofal
price 1
0
00€
price 2

price 3

price 4

price 5

price 6

Sobre la Piedra Filosofal: 3 (Nuevos Horizontes)
price 1
6
88€
price 2

Potter Libro para Colorear: La Piedra Filosofal para colorear
price 1
7
81€
price 2

price 3

price 4
8,68€
price 5

Tratado de la piedra filosofal y Tratado sobre el arte de la alquimia (2010)
price 1
8
06€
price 2

price 3

price 4
8,95€
price 5

El libro la piedra filosofal: Alquimia
price 1
8
43€
price 2

price 3

price 4
9,37€
price 5

La piedra filosofal: Libro de alquimia
price 1
8
57€
price 2

price 3

price 4
9,52€
price 5

Harry Potter y la piedra filosofal (edición Gryffindor del 20º aniv

In [113]:
import pandas as pd
import re

In [None]:

results = driver.find_elements(By.CSS_SELECTOR, "div[data-csa-c-type='item']")
print(len(results))
baseUrl = "https://www.amazon.es/"
books = []
driver.implicitly_wait(0)
for result in results:
    tempBooks = []
    try:
        title = result.find_element(By.CSS_SELECTOR, "h2[class='a-size-medium a-spacing-none a-color-base a-text-normal']").text
        #Authors might come with different class 
        try:
            author = result.find_element(By.CSS_SELECTOR, "a[class='a-size-base a-link-normal s-underline-text s-underline-link-text s-link-style']").text
        except Exception as e:
            try:
                author = result.find_elements(By.CSS_SELECTOR, "span[class='a-size-base']")[1].text
            except:
                continue
        try:
            img_url = result.find_element(By.CSS_SELECTOR, "img[class='s-image']").get_attribute("src")
            print(img_url)
        except:
            print('no image')
        #Different prices for different formats. Will be saved as different book results
        # i=0
        
        for detailResult in result.find_elements(By.XPATH, './/a[@class="a-size-base a-link-normal s-underline-text s-underline-link-text s-link-style a-text-bold"]'):
            print(detailResult.text)
            detail = detailResult.text
            link = detailResult.get_attribute("href")
            print(link)
            tempBooks.append({
                "Título": title,
                "Autor": author,
                "Detalle": detail,
                "Cubierta": img_url,
                "Enlace": link,
                "Tienda": "Amazon"
            })
        i=0
        for price in result.find_elements(By.XPATH, './/a[@aria-describedby="price-link"]'):
            print(f"price {i}")
            current_price = re.match(r"(\d+[.]?\d*)", price.text.replace("\n", ".").replace(",", "."))
            if current_price:
                current_price = float(current_price.group(1))
                print(f"current price: {current_price}")
                try:
                    original_price = price.find_element(By.XPATH, './/span[@class="a-price a-text-price"]')
                    original_price = re.match(r"(\d+[.]?\d*)", original_price.text.replace("\n", ".").replace(",", "."))
                    if original_price:
                        original_price = float(original_price.group(1))
                    else:
                        original_price = current_price
                except:
                    original_price = current_price
                print(f"original price: {original_price}")
                if priceList:
                    price = priceList[0]
                    # print(price.text)
                    if len(priceList)>1:
                        originalPrice = priceList[1]
            tempBooks[i]["Precio base"] = original_price
            tempBooks[i]["Precio final"] = current_price
            i+=1
        
           
    except Exception as e:
        print(f"Error al obtener resultados de {url} : {e}")
    i+=1 
books.append(tempBooks)
bookDf = pd.DataFrame(books)
bookDf.head(10)

18
Harry Potter y la piedra filosofal (edición Ravenclaw del 20º aniversario) (Harry Potter 1): Ingenio · Estudio · Sabiduría
https://m.media-amazon.com/images/I/81AaIMWmRRL._AC_UY218_.jpg
Tapa dura
https://www.amazon.es/Harry-Potter-piedra-filosofal-Philosophers/dp/8498388910/ref=sr_1_1?__mk_es_ES=%C3%85M%C3%85%C5%BD%C3%95%C3%91&crid=2BFZ717XNNAYX&dib=eyJ2IjoiMSJ9.FTsUqtwzrbMe5TeCNNRtGiXS5lQZPpNhHYvvVkwMkRCd4XpKtQNXH1QmF5WlpnvWb7cLB1317BecjcDDpM2jaMdtbidaXBIxpvrfpLM5PjiDot7eXHqoWRTuldfOJxBtEf0bYYPreJsZcHcMBsomzjA_zeELDDPHPKSQH1KokLtWCCAnHpb1looQgzQ503gw3525QtmaccdXkAzU5mkJvQBUgpVBvakJ7MT2fgx2S_8.cLgp7anfNZ2C7WSEsvvHrI2U56i41u6yk3Wh3oVez8Y&dib_tag=se&keywords=piedra+filosofal&qid=1745424937&s=books&sprefix=%2Cstripbooks%2C71&sr=1-1
price 0
current price: 19.76
original price: 21.95
La piedra filosofal de los evangelios a los tratados alquímicos
https://m.media-amazon.com/images/I/81s+GpSGrwL._AC_UY218_.jpg
Tapa blanda
https://www.amazon.es/piedra-filosofal-evangelios-tratados-alqu%C3%A

Unnamed: 0,0
0,"{'Título': 'LOS CUENTOS QUE NUNCA LEÍSTE', 'Au..."


In [86]:
i=0
j=0
for result in results:
    j=0
    i+=1
    print(f"libro {i}")
    for priceResult in result.find_elements(By.XPATH, './/div[@class="a-row a-spacing-mini a-size-base a-color-base"]'):
        j+=1
        print(f"detail {j}")
        print(priceResult.text)
        try:
            for alt in priceResult.find_elements(By.XPATH, './/div[@class="a-size-base a-link-normal s-underline-text s-underline-link-text s-link-style a-text-bold"]'):
                print(alt.text)
        except:
            print('err')

libro 1
detail 1
Tapa dura
libro 2
detail 1
Tapa blanda
libro 3
detail 1
Versión Kindle
libro 4
detail 1
Tapa blanda
libro 5
detail 1
Tapa blanda
libro 6
detail 1
Tapa blanda
libro 7
detail 1
Tapa blanda
err
libro 8
detail 1
Tapa blanda
libro 9
detail 1
Tapa dura
libro 10
detail 1
Tapa dura
libro 11
detail 1
Tapa dura
libro 12
detail 1
Tapa dura
libro 13
detail 1
Tapa blanda
libro 14
detail 1
Tapa dura
libro 15
detail 1
Libro de bolsillo
libro 16
detail 1
Tapa blanda
libro 17
detail 1
Tapa dura
libro 18
detail 1
Versión Kindle


In [109]:

i=0
j=0
for result in results:
    try:
        image_url = result.find_element(By.CSS_SELECTOR, "img[class='s-image']").get_attribute("src")
        print(image_url)
    except:
        print('no image')


https://m.media-amazon.com/images/I/81AaIMWmRRL._AC_UY218_.jpg
https://m.media-amazon.com/images/I/81s+GpSGrwL._AC_UY218_.jpg
https://m.media-amazon.com/images/I/81DIK77B0PL._AC_UY218_.jpg
https://m.media-amazon.com/images/I/71HhMRl+keL._AC_UY218_.jpg
https://m.media-amazon.com/images/I/716uCTds23L._AC_UY218_.jpg
https://m.media-amazon.com/images/I/91GyF21onCL._AC_UY218_.jpg
https://m.media-amazon.com/images/I/61vsjOxkiXL._AC_UY218_.jpg
https://m.media-amazon.com/images/I/71DLq-ZAoHL._AC_UY218_.jpg
https://m.media-amazon.com/images/I/81T6V8gqfOL._AC_UY218_.jpg
https://m.media-amazon.com/images/I/816eW3eMz9L._AC_UY218_.jpg
https://m.media-amazon.com/images/I/81T-IxMzgZL._AC_UY218_.jpg
https://m.media-amazon.com/images/I/91-m-29l27L._AC_UY218_.jpg
https://m.media-amazon.com/images/I/61+eyKIZDQL._AC_UY218_.jpg
https://m.media-amazon.com/images/I/81nE2-8HncL._AC_UY218_.jpg
https://m.media-amazon.com/images/I/81ev-5PzbUL._AC_UY218_.jpg
https://m.media-amazon.com/images/I/51FY1mL2zML._AC_UY2

In [46]:
for allprices in results[9].find_elements(By.XPATH, './/span[@class="a-price"]'):
    print('test')
    print(allprices)
    print (allprices.text)

test
<selenium.webdriver.remote.webelement.WebElement (session="3ed690e8-77cf-40a6-93fa-0e33d5df8e39", element="3b4db4f3-b2ec-4096-bb94-64364b2ad468")>
18
86€
test
<selenium.webdriver.remote.webelement.WebElement (session="3ed690e8-77cf-40a6-93fa-0e33d5df8e39", element="7c023266-2988-48cc-bd55-2176db839d41")>
24
19€


In [None]:
results = driver.find_elements(By.CSS_SELECTOR, "div[data-csa-c-type='item']")
print(len(results))
books = []

for result in results:
    try:
        # Title
        title = result.find_element(By.CSS_SELECTOR, "h2.a-size-medium").text

        # Author (may vary in structure)
        try:
            author = result.find_element(By.CSS_SELECTOR, "a.a-link-normal.s-underline-text").text
        except:
            try:
                author = result.find_elements(By.CSS_SELECTOR, "span.a-size-base")[1].text
            except:
                continue

        # Image
        try:
            img_url = result.find_element(By.CSS_SELECTOR, "img.s-image").get_attribute("src")
        except:
            img_url = ""

        # All detail links
        detail_links = result.find_elements(By.XPATH, './/a[contains(@class,"a-text-bold")]')
        # All prices
        price_elements = result.find_elements(By.XPATH, './/a[@aria-describedby="price-link"]')

        for i in range(max(len(detail_links), len(price_elements))):
            detail = detail_links[i].text if i < len(detail_links) else ""
            link = detail_links[i].get_attribute("href") if i < len(detail_links) else ""

            current_price = None
            original_price = None

            if i < len(price_elements):
                price_text = price_elements[i].text.replace("\n", ".").replace(",", ".")
                match = re.match(r"(\d+[.]?\d*)", price_text)
                if match:
                    current_price = float(match.group(1))
                try:
                    original_text = price_elements[i].find_element(By.XPATH, './/span[@class="a-price a-text-price"]').text
                    match = re.match(r"(\d+[.]?\d*)", original_text.replace("\n", ".").replace(",", "."))
                    if match:
                        original_price = float(match.group(1))
                    else:
                        original_price = current_price
                except:
                    original_price = current_price

            books.append({
                "Título": title,
                "Autor": author,
                "Detalle": detail,
                "Cubierta": img_url,
                "Enlace": link,
                "Tienda": "Amazon",
                "Precio base": original_price,
                "Precio final": current_price
            })

    except Exception as e:
        print(f"Error procesando un resultado: {e}")

# Convert to DataFrame
bookDf = pd.DataFrame(books)
bookDf.head(15)

In [None]:
service = Service(executable_path="C:/Repos/Utils/geckodriver.exe")


In [None]:
options = webdriver.FirefoxOptions()
options.add_argument("-private")
driver = webdriver.Firefox(service=service, options=options)
url = "https://www.amazon.es/"
driver.get(url)

In [None]:
url = 'https://www.casadellibro.com/libro-harry-potter-y-la-piedra-filosofal-edicion-especial-con-cantos-p-intados-harry-potter-1/9788419868282/16710131'
driver.get(url)


In [9]:
current_price = driver.find_element(By.XPATH, "//*[@id='p-pf-f' or @id='p-pmkt-f']").text
try:
    original_price = driver.find_element(By.CSS_SELECTOR, 'p[class="s-5-text"]').text.split('€')[0]
    print(current_price)
except:  
    original_price = current_price

print(original_price)


190,00 €


In [131]:
url = "https://www.amazon.es/"
driver.get(url)
driver.implicitly_wait(5)

# time.sleep(2)  # Wait for dynamic content to load
# acceptCookies = driver.find_element(By.XPATH, '//*[@id="onetrust-accept-btn-handler"]')
# acceptCookies.click()
# time.sleep(1) 
#cookies
try:
    driver.find_element(By.ID, "sp-cc-accept").click()
except:
    print('accepted')
#Category button
driver.find_element(By.ID, "searchDropdownBox").click()
#Select books category
driver.find_element(By.XPATH, "//option[@value='search-alias=stripbooks']").click()
#Find search input and send query
search = driver.find_element(By.XPATH, "//input[@id='twotabsearchtextbox']")
search.send_keys(query)
#Confirm search
driver.find_element(By.XPATH, "//input[@id='nav-search-submit-button']").click()

# #Sort price ascending
driver.find_element(By.CSS_SELECTOR, "span[class='a-dropdown-prompt']").click()
driver.find_element(By.CSS_SELECTOR, "a[id='s-result-sort-select_1']").click()
options.add_argument("-private")
results = driver.find_elements(By.CSS_SELECTOR, "div[data-csa-c-type='item']")

accepted


In [135]:
driver.close()

WebDriverException: Message: Failed to decode response from marionette


In [160]:

results = driver.find_elements(By.CSS_SELECTOR, "div[data-csa-c-type='item']")
# print(datetime.now().time())
print(len(results))
baseUrl = "https://www.amazon.es/"
books = []
driver.implicitly_wait(0)
for result in results:
    tempBooks = []
    try:
        title = result.find_element(By.CSS_SELECTOR, "h2[class='a-size-medium a-spacing-none a-color-base a-text-normal']").text
        #Authors might come with different class 
        try:
            author = result.find_element(By.CSS_SELECTOR, "a[class='a-size-base a-link-normal s-underline-text s-underline-link-text s-link-style']").text
        except Exception as e:
            try:
                author = result.find_elements(By.CSS_SELECTOR, "span[class='a-size-base']")[1].text
            except:
                continue
        try:
            img_url = result.find_element(By.CSS_SELECTOR, "img[class='s-image']").get_attribute("src")
            print(img_url)
        except:
            print('no image')
        #Different prices for different formats. Will be saved as different book results
        # i=0
        # All detail links
        detail_link_List = result.find_elements(By.XPATH, './/a[contains(@class,"a-text-bold")]')
        # All prices
        price_List = result.find_elements(By.XPATH, './/a[@aria-describedby="price-link"]')
        
        for i in range(len(price_List)):
            try:
                
                detail = detail_link_List[i].text
                print(detail)
                if "Kindle" in detail or "Audiolibro" in detail:
                    print(f"skipped with: {detail}")
                    continue
                link = detail_link_List[i].get_attribute("href")
            except:
                detail = "N/A"
                link = "N/A" 
                print('no detail')     
            price_text = price_List[i].text.replace("\n", ".").replace(",", ".")
            match = re.match(r"(\d+[.]?\d*)", price_text)
            if match:
                
                current_price = float(match.group(1))
                print(f"price match: {current_price}")
            try:
                original_text = price_List[i].find_element(By.XPATH, './/span[@class="a-price a-text-price"]').text
                match = re.match(r"(\d+[.]?\d*)", original_text.replace("\n", ".").replace(",", "."))
                if match:
                    original_price = float(match.group(1))
                else:
                    original_price = current_price
            except:
                print(f"origin price exception: {current_price}")
                original_price = current_price
            books.append({
                "Título": title,
                "Autor": author,
                "Detalle": detail,
                "Cubierta": img_url,
                "Enlace": link,
                "Tienda": "Amazon",
                "Precio base": original_price,
                "Precio final": current_price
            })
    except Exception as e:
        print(f"Error procesando un resultado: {e}")
        
bookDf = pd.DataFrame(books)
print(bookDf.head())
 

3
https://m.media-amazon.com/images/I/91R1AixEiLL._AC_UY218_.jpg
Tapa dura
price match: 16.1
Versión Kindle
skipped with: Versión Kindle
Audible Audiolibro
skipped with: Audible Audiolibro
https://m.media-amazon.com/images/I/81VR2yyPi9L._AC_UY218_.jpg
Versión Kindle
skipped with: Versión Kindle
https://m.media-amazon.com/images/I/81NewGv4UHL._AC_UY218_.jpg
Versión Kindle
skipped with: Versión Kindle
                                              Título         Autor    Detalle  \
0  Harry Potter y la piedra filosofal (Harry Pott...  J.K. Rowling  Tapa dura   

                                            Cubierta  \
0  https://m.media-amazon.com/images/I/91R1AixEiL...   

                                              Enlace  Tienda  Precio base  \
0  https://www.amazon.es/Harry-Potter-Piedra-Filo...  Amazon        16.95   

   Precio final  
0          16.1  


In [180]:

for resultWindow in driver.find_elements(By.XPATH, "//div[contains(@data-cel-widget,'search_result')]"):
    print(resultWindow.text)
    if 'No hay resultados para' in resultWindow.text:
        print('salio')

Piezas en las que puede confiar
Compra en la Store de RIDEX en Amazon 
Filtros
Frenos
Correas, Cadenas, Rodillos
Patrocinado
Mostrando resultados de Todos los departamentos
No hay resultados para 12435345 en Libros
salio
Resultados
Más información sobre estos resultados. Consulta la página del producto para ver otras opciones de compra.
Febi Bilstein 36745 Interruptores
214
6
80€
Recomendado:
11,99€
Entrega GRATIS el dom, 27 de abr en tu primer pedido
Entrega más rápida mañana, 25 de abr
Añadir a la cesta
Más opciones de compra
4,84 €(7 nuevas ofertas)
Febi 26274 Limpiaparabrisas
431
10+ comprados el mes pasado
19
90€
Entrega GRATIS el dom, 27 de abr en tu primer pedido
Entrega más rápida mañana, 25 de abr
Añadir a la cesta
Más opciones de compra
11,50 €(4+ ofertas usadas y nuevas)
Bosch P3353 - Filtro de aceite para vehículos
13
9
90€
Ahorra con Suscríbete y ahorra
Entrega GRATIS el dom, 27 de abr en tu primer pedido
Entrega más rápida mañana, 25 de abr
Añadir a la cesta
Más opciones 

In [3]:

from RetoExtraUtils import getBooksCasaLibro

books = getBooksCasaLibro("piedra filosofal", 3 ,False)
print(books.head(10))


HOLA
Con stock (17)
no click
Agotado (16)
no click
Descatalogado (11)
no click
Disponible (8)
click
https://www.casadellibro.com/libro-la-piedra-filosofal-de-j-obleman-historia-de-un-doctor-que-ha-resuelto-el-problema-de-vivir-sin-comer/9788494435140/2772441
14,50 
1
Error al obtener resultados de https://www.casadellibro.com : Message: The element with the reference cf11c1d4-0db2-43db-868e-d1faa0e77dc3 is stale; either its node document is not the active document, or it is no longer connected to the DOM; For documentation on this error, please visit: https://www.selenium.dev/documentation/webdriver/troubleshooting/errors#stale-element-reference-exception
Stacktrace:
RemoteError@chrome://remote/content/shared/RemoteError.sys.mjs:8:8
WebDriverError@chrome://remote/content/shared/webdriver/Errors.sys.mjs:199:5
StaleElementReferenceError@chrome://remote/content/shared/webdriver/Errors.sys.mjs:796:5
getKnownElement@chrome://remote/content/marionette/json.sys.mjs:405:11
deserializeJSON@chro

In [26]:
driver.close()

WebDriverException: Message: Failed to write request to stream


In [4]:
import pandas as pd

In [12]:
query = 'piedra filosofal'
bookLimit = 5
books = []
bookDf = pd.DataFrame()
baseUrl = 'https://www.ebay.es/'
queryUrl = query.strip().replace(" ","+")
url = f"{baseUrl}sch/i.html?_nkw={queryUrl}&_sacat=267&_from=R40&_trksid=p2334524.m570.l1313&rt=nc&_odkw=9788478884452&_osacat=267&_sop=15&LH_ItemCondition=1000"
page = requests.get(url)
time.sleep(1)
soup = BeautifulSoup(page.content,"html.parser")
results = soup.css.select('li[data-marko-key^="0 s0-55-0-9-8-4-4-0-3-0-4"]', limit=bookLimit)
i = 0
for result in results:
    i+=1
    print(i)
    try:
        # Title
        name_element = result.find("div", class_="s-item__title")
        title = name_element.get_text().strip()
        print(title)

        #Author
        author = 'N/A'
        print(author)
        detail_element = result.find("div", class_="s-item__subtitle")
        
        
        detail = detail_element.get_text().strip()
        if "|" in detail:
            detail = detail.split("|")[0]
        shipping_element = result.find("span", class_="s-item__shipping s-item__logisticsCost")
        if "EUR" in shipping_element.get_text():
            detail = detail +'. '+ shipping_element.get_text().replace(r"[^\d.,]", "", regex=True).replace(",",".") + '€ envío'
        # Price 
        try:
            price_element = result.find("span", class_="s-item__price")
            current_price = price_element.get_text().strip()
            original_price = current_price
        except:
            #Book discarded if no price
            continue
        print(current_price)
        # Image
        try:
            img_element = result.find("div", class_="s-item__image-wrapper image-treatment")
            img_url = img_element.find('img')["src"]
        except e:
            img_url = "N/A"
        print(img_url)
        #Link
        try:
            link = result.find("a", class_="s-item__link")["href"]
            print(link)
        except:
            link = "N/A"

        books.append({
            "Título": title,
            "Autor": author,
            "Detalle": detail,
            "Precio base": original_price,
            "Precio final": current_price,
            "Cubierta": img_url,
            "Enlace": link,
            "Tienda": "Librería central"
        })
        print(len(books))
        bookDf = pd.DataFrame(books)
        if len(books) == bookLimit:
            break
    except Exception as e:
        print(f"Error al obtener resultados de {baseUrl} : {e}")
        print(f"Error al obtener resultados de {baseUrl} : {e.args}")
# return bookDf.truncate(after=bookLimit)
print(f'tienda 2 size: {bookDf.shape[0]}')

1
La Piedra Filosofal
N/A
5,99 EUR
https://i.ebayimg.com/images/g/deAAAOSwGidnM0zk/s-l500.jpg
https://www.ebay.es/itm/126773751698?_skw=piedra+filosofal&hash=item1d844def92:g:deAAAOSwGidnM0zk&itmprp=enc%3AAQAKAAAA0FkggFvd1GGDu0w3yXCmi1c%2FicS5u5AR81qcSAFjnTZq3ECOQDWRFJvkgYINwPpjH2fB5z6Eft9svrvrMtf463PHoPBlVCkTkA7gvbJ9Hw5CtPJL%2B5D8C5VzlFmr1qS2wiPuVS7mqM01s%2Bb5qXRMCA64vXY%2BkclJBixmQN%2FHM%2F5HsJZniBCFqYF5v%2FF1%2Bf4p4st%2BX7tx%2BO3q7yTis%2Ftmqav92JzBJpGivcd0nLVlSvnwOSp1wY%2B9GGB3AHvYcyyJxDtD5P3ve9xB8LYTZlJCYAU%3D%7Ctkp%3ABk9SR6Lv3ZrNZQ
1
2
Harry Potter y la Piedra Filosofal
N/A
5,99 EUR
https://i.ebayimg.com/images/g/8r0AAeSwOsln8dvv/s-l500.jpg
https://www.ebay.es/itm/297184712730?_skw=piedra+filosofal&hash=item453196d81a:g:8r0AAeSwOsln8dvv&itmprp=enc%3AAQAKAAAA0FkggFvd1GGDu0w3yXCmi1cagRN4im--odl23r5C6B61tJTH5oEbKlsxW4srLFGVYorTFZlVleQY79Jv9aeAJFBMctMwfTyjSCmR6eoXFnA02F8NMsMEhfGIElmZWDcMDnb6oXdGRYLJHNIIbuIMt%2FOqCruKQSoY9l5kI0MMvDy2CE%2BhNvHiXCXO2V%2FDcAriZ%2Bh4zVaeCbSBU6PmTnwMoz%2F8X

In [28]:
options = webdriver.FirefoxOptions()
options.add_argument("-private")
driver = webdriver.Firefox(service=service, options=options)

In [27]:
driver.close()

In [27]:
query = 'filosofal'

In [None]:
books = []
bookDf = pd.DataFrame()

url = "https://www.elcorteingles.es/"
driver.get(url)
driver.implicitly_wait(2)
try:
    driver.find_element(By.ID, "onetrust-accept-btn-handler").click()
except:
    print("no cookies")




In [24]:
driver.get(url)

In [43]:
driver.switch_to.default_content()

In [18]:
# driver.switch_to.frame(driver.find_element(By.CSS_SELECTOR,'iframe[id="INDdefaultPropIframe"]'))
search = driver.find_element(By.ID, "searchBoxBtn")
print(search.text)
search.click()

# search.send_keys(query)
# searchButton = driver.find_element(By.CSS_SELECTOR, 'svg[class=search-link__icon]').click()
# driver.switch_to.default_content()

¿Qué estás buscando?


In [16]:
from selenium.webdriver.common.keys import Keys

In [19]:
searchButton = driver.find_element(By.CSS_SELECTOR, 'svg[class=search-link__icon]').click()

NoSuchElementException: Message: Unable to locate element: svg[class=search-link__icon]; For documentation on this error, please visit: https://www.selenium.dev/documentation/webdriver/troubleshooting/errors#no-such-element-exception
Stacktrace:
RemoteError@chrome://remote/content/shared/RemoteError.sys.mjs:8:8
WebDriverError@chrome://remote/content/shared/webdriver/Errors.sys.mjs:199:5
NoSuchElementError@chrome://remote/content/shared/webdriver/Errors.sys.mjs:552:5
dom.find/</<@chrome://remote/content/shared/DOM.sys.mjs:136:16


In [20]:
query= 'filosofal'
search_2 = driver.find_element(By.CSS_SELECTOR, 'input[class="search-bar__input"]')
search_2.send_keys(query)


In [21]:
time.sleep(0.5)
search_2.send_keys(Keys.ENTER)

In [12]:
search_accept = driver.find_element(By.CSS_SELECTOR, 'button[data-synth="LOCATOR_SEARCH_BUTTON"]').click()

In [29]:
url = "https://www.elcorteingles.es/search-nwx/1/?s=piedra+filosofal&stype=text_box&sorting=priceAsc"
driver.get(url)