# Web Scraping Course

<img src="https://yaelmanuel.com/wp-content/uploads/2021/12/platzi-banner-logo-matematicas.png" width="500px">

---

## 0) Dependencies

In [21]:
!bash "/content/setup.sh"

In [22]:
import time

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By

## 1) Service and Driver Configuration

In [23]:
# Paths to the Chrome binary and to chromedriver (needed when they are not on the PATH)
chrome_path = "/content/chrome/chrome"
chromedriver_path = "/content/chromedriver"

# Optional: configure browser options
options = Options()
options.add_argument("--headless")
options.add_argument("--window-size=1920,1080")  # Explicit size; --start-maximized is not reliable in headless mode
options.add_argument("--no-sandbox")
options.add_argument("--disable-blink-features=AutomationControlled")  # Hides navigator.webdriver, so automation is less detectable

options.binary_location = chrome_path

# Create the service and the driver
service = Service(executable_path=chromedriver_path)
driver = webdriver.Chrome(service=service, options=options)
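
A driver that has been quit (or whose browser process has died) cannot be reused: later calls such as `driver.get` or `driver.execute_script` typically fail with a connection error (for example urllib3's `MaxRetryError`) until a new driver is created. A minimal sketch, assuming you want the browser closed even when a cell raises, wraps driver creation in a context manager. `managed_driver` and `FakeDriver` are hypothetical names; the fake only stands in for `webdriver.Chrome` so the pattern can run without a browser.

```python
from contextlib import contextmanager
from typing import Any, Callable, Iterator


@contextmanager
def managed_driver(factory: Callable[[], Any]) -> Iterator[Any]:
    """Create a driver with `factory` and always call quit() on exit."""
    driver = factory()
    try:
        yield driver
    finally:
        # Runs on normal exit and on exceptions, so no orphaned browser is left behind
        driver.quit()


class FakeDriver:
    """Hypothetical stand-in for webdriver.Chrome, used only to demonstrate the pattern."""

    def __init__(self) -> None:
        self.closed = False

    def quit(self) -> None:
        self.closed = True


fake = FakeDriver()
try:
    with managed_driver(lambda: fake) as d:
        raise RuntimeError("simulated scraping failure")
except RuntimeError:
    pass

print(fake.closed)  # True
```

In the notebook this would read `with managed_driver(lambda: webdriver.Chrome(service=service, options=options)) as driver: ...`. Selenium 4's `WebDriver` also implements `__enter__`/`__exit__` (calling `quit()` on exit), so `with webdriver.Chrome(service=service, options=options) as driver:` should achieve the same.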

## 2) Define the URL

In [28]:
url = "http://quotes.toscrape.com/scroll"
driver.get(url)

## 3) Implementation

In [25]:
SCROLL_PAUSE_TIME = 2  # Seconds to wait after each scroll so new quotes can load
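
A fixed `SCROLL_PAUSE_TIME` wastes time when content loads quickly and may be too short when the network is slow. One alternative, sketched here under the assumption that a growing page height is a good signal that new quotes arrived, is to poll until the height increases or a timeout expires. `wait_until` is a hypothetical helper, and the iterator of heights simulates successive `document.body.scrollHeight` readings.

```python
import time
from typing import Callable


def wait_until(predicate: Callable[[], bool], timeout: float = 10.0, poll: float = 0.5) -> bool:
    """Poll `predicate` until it returns True (-> True) or `timeout` seconds pass (-> False)."""
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll)


# Hypothetical height readings: the page grows on the third check
heights = iter([1000, 1000, 1600])
previous_height = 1000
grew = wait_until(lambda: next(heights) > previous_height, timeout=2.0, poll=0.01)
print(grew)  # True
```

Selenium offers the same idea through `WebDriverWait(driver, 10).until(lambda d: d.execute_script("return document.body.scrollHeight") > last_height)` (imported from `selenium.webdriver.support.ui`), which raises `TimeoutException` instead of returning `False`.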

In [27]:
last_height = driver.execute_script("return document.body.scrollHeight")
quotes_set = set()

steps = 5

# Scroll several times (at most `steps` iterations, 5 here)
for i in range(steps):
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    time.sleep(SCROLL_PAUSE_TIME)
    new_height = driver.execute_script("return document.body.scrollHeight")

    # Collect every quote currently on the page; the set discards the ones already seen
    quotes = driver.find_elements(By.CLASS_NAME, "quote")
    for quote in quotes:
        text = quote.find_element(By.CLASS_NAME, "text").text
        quotes_set.add(text)

    # Height unchanged: no new content was loaded, so stop scrolling
    if new_height == last_height:
        break

    last_height = new_height

driver.quit()
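
The loop above mixes browser calls with the stopping logic. A sketch that separates them, assuming the page height and the quote texts can be read through plain callables, lets the algorithm run without Chrome: scroll, read the new height, add the visible texts to a set, and stop when the height no longer changes or after `max_steps`. `scroll_and_collect` and `FakeInfinitePage` are hypothetical; the fake page appends 10 quotes per scroll and stops growing after 3 batches.

```python
from typing import Callable, Iterable, List, Set


def scroll_and_collect(
    get_height: Callable[[], int],
    scroll_to_bottom: Callable[[], None],
    extract_texts: Callable[[], Iterable[str]],
    max_steps: int = 5,
) -> Set[str]:
    """Scroll up to `max_steps` times, collecting texts until the page stops growing."""
    seen: Set[str] = set()
    last_height = get_height()
    for _ in range(max_steps):
        scroll_to_bottom()  # With Selenium, a pause (or a poll) would follow the scroll
        new_height = get_height()
        seen.update(extract_texts())
        if new_height == last_height:
            break  # Nothing new was loaded: end of the feed
        last_height = new_height
    return seen


class FakeInfinitePage:
    """Hypothetical page that appends `per_batch` quotes per scroll, up to `batches` batches."""

    def __init__(self, batches: int = 3, per_batch: int = 10) -> None:
        self.loaded = 1
        self.batches = batches
        self.per_batch = per_batch

    def height(self) -> int:
        return self.loaded * 500

    def scroll(self) -> None:
        if self.loaded < self.batches:
            self.loaded += 1

    def texts(self) -> List[str]:
        return [f"quote {i}" for i in range(self.loaded * self.per_batch)]


page = FakeInfinitePage()
quotes = scroll_and_collect(page.height, page.scroll, page.texts, max_steps=5)
print(len(quotes))  # 30
```

With Selenium, the callables would wrap `driver.execute_script("return document.body.scrollHeight")`, `driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")` plus the pause, and a function returning `[q.find_element(By.CLASS_NAME, "text").text for q in driver.find_elements(By.CLASS_NAME, "quote")]`.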

In [29]:
print(f"Total unique quotes loaded: {len(quotes_set)}")
for quote in quotes_set:
    print(quote)

Total unique quotes loaded: 40
“To the well-organized mind, death is but the next great adventure.”
“This life is what you make it. No matter what, you're going to mess up sometimes, it's a universal truth. But the good part is you get to decide how you're going to mess it up. Girls will be your friends - they'll act like it anyway. But just remember, some come, some go. The ones that stay with you through everything - they're your true best friends. Don't let go of them. Also remember, sisters make the best friends in the world. As for lovers, well, they'll come and go too. And baby, I hate to say it, most of them - actually pretty much all of them are going to break your heart, but you can't give up because if you give up, you'll never find your soulmate. You'll never find that half who makes you whole and that goes for everything. Just because you fail once, doesn't mean you're gonna fail at everything. Keep trying, hold on, and always, always, always believe in yourself, becau
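
Because `quotes_set` is a set, the quotes above are printed in arbitrary order, not in the order they appear on the page. If page order matters (an assumption about your needs, not part of the original notebook), one option is to collect the texts into a list and deduplicate with `dict.fromkeys`, which keeps the first occurrence of each item in insertion order. `unique_in_order` is a hypothetical helper, and the batches below simulate what `find_elements` returns after successive scrolls.

```python
from typing import Iterable, List


def unique_in_order(texts: Iterable[str]) -> List[str]:
    """Drop duplicates while keeping first-seen order (dicts preserve insertion order)."""
    return list(dict.fromkeys(texts))


# Each scroll returns every quote loaded so far, so earlier quotes repeat
batches = [["a", "b"], ["a", "b", "c"], ["a", "b", "c", "d"]]
collected: List[str] = []
for batch in batches:
    collected.extend(batch)

print(unique_in_order(collected))  # ['a', 'b', 'c', 'd']
```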