# Web Scraping Exercise

## 1. Introduction and Planning

### Objective:
The goal of this exercise is to build a web scraper that collects data from a chosen website. You will learn how to send HTTP requests, parse HTML content, extract relevant data, and store it in a structured format.

### Tasks:
1. Identify the data you want to scrape.
2. Choose the target website(s).
3. Plan the structure of your project.

### Example:
For this exercise, we will scrape recipes from allrecipes.com. We will extract each recipe's title, description, ingredients, directions, and link.

## 2. Understanding the Target Website

### Objective:
Analyze the structure of the web pages to be scraped.

### Tasks:
* Inspect the target website using browser developer tools.
* Identify the HTML elements that contain the desired data.

### Instructions:

* Open your browser and navigate to the target website (e.g., allrecipes.com).
* Right-click on the webpage and select "Inspect" or press Ctrl+Shift+I.
* Use the developer tools to explore the HTML structure of the webpage.
* Identify the tags and classes of the elements that contain the recipe titles, descriptions, ingredients, and directions.
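
Once a page has been inspected by hand, a short script can confirm which tag and class combinations repeat, since repeated combinations are usually the ones that wrap list items. The following is a minimal sketch using only the standard library; the `ClassCounter` name and the sample markup are assumptions made for illustration and do not come from the target site.

```python
from collections import Counter
from html.parser import HTMLParser


class ClassCounter(HTMLParser):
    """Count how often each (tag, class) pair appears in an HTML document."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; the value may be None.
        class_value = dict(attrs).get("class") or ""
        for class_name in class_value.split():
            self.counts[(tag, class_name)] += 1


# Hypothetical markup standing in for a saved copy of the page.
sample_html = """
<div class="card-list">
  <a class="card" href="/item-1"><span class="card__title">Item 1</span></a>
  <a class="card" href="/item-2"><span class="card__title">Item 2</span></a>
</div>
"""

counter = ClassCounter()
counter.feed(sample_html)
for (tag, class_name), count in counter.counts.most_common():
    print(f"{tag}.{class_name}: {count}")
```

Pairs that appear once per listed item are good candidates for the selectors used later in the scraper.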

## 3. Writing the Scraper

### Objective:
Develop the code to scrape data from the target website.

### Tasks:
* Send HTTP requests to the target website.
* Parse the HTML content and extract the required data.
* Handle pagination to scrape data from multiple pages.
* Implement error handling.
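
The error-handling task can be approached in several ways. One common option is to retry a failed page load a few times before giving up. The sketch below assumes a hypothetical helper named `fetch_with_retries` that wraps any callable taking a URL; neither the helper nor `flaky_fetch` is part of Selenium or of the notebook that follows.

```python
import time


def fetch_with_retries(fetch, url, retries=3, delay=1.0):
    """Call fetch(url), retrying on any exception for up to `retries` attempts."""
    last_error = None
    for attempt in range(1, retries + 1):
        try:
            return fetch(url)
        except Exception as error:  # narrow this to the errors you expect
            last_error = error
            print(f"Attempt {attempt} for {url} failed: {error}")
            if attempt < retries:
                time.sleep(delay * attempt)  # wait a little longer each time
    raise RuntimeError(f"Giving up on {url} after {retries} attempts") from last_error


# Hypothetical stand-in for a page load: fails twice, then succeeds.
calls = {"count": 0}


def flaky_fetch(url):
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("temporary failure")
    return f"<html>content of {url}</html>"


html = fetch_with_retries(flaky_fetch, "https://example.com/page", retries=3, delay=0)
print(html)
```

Passing the fetch function as an argument keeps the retry logic independent of whether pages are loaded with a browser driver or a plain HTTP client.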

# Installing Selenium and its dependencies

In [None]:
%pip install selenium beautifulsoup4 pandas

Defaulting to user installation because normal site-packages is not writeable
Note: you may need to restart the kernel to use updated packages.


# Libraries

In [None]:
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.options import Options
from bs4 import BeautifulSoup
import time



## Selenium options configuration

In [None]:
# Configure the options for Selenium
options = Options()
options.add_argument('--headless')
options.add_argument('--disable-gpu')

# Path to the Chrome driver

In [None]:
chrome_driver_path = 'D:/SEPTIMO SEMESTRE II/RI/KevinMaldonado99/RETRIEVAL INFO/week12/chromedriver.exe'

# Create a browser instance

In [None]:
service = Service(chrome_driver_path)
driver = webdriver.Chrome(service=service, options=options)

# Navigate to the web page

In [None]:

url = 'https://www.allrecipes.com/recipes/17562/dinner/'
driver.get(url)

# Wait a moment to make sure the page has finished loading
time.sleep(5)
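
A fixed `time.sleep(5)` waits the full five seconds even when the page is ready sooner, and it may not be long enough on a slow connection. An alternative is to poll for a condition until it holds or a timeout expires. The sketch below shows the idea with a hypothetical `wait_until` helper written with the standard library only. Selenium ships its own version of this pattern, `WebDriverWait` in `selenium.webdriver.support.ui`, which is usually the better choice in a real scraper.

```python
import time


def wait_until(condition, timeout=10.0, poll_interval=0.5):
    """Poll condition() until it returns a truthy value or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Condition not met within {timeout} seconds")
        time.sleep(poll_interval)


# Hypothetical condition that becomes true on the third poll.
state = {"polls": 0}


def page_ready():
    state["polls"] += 1
    return state["polls"] >= 3


wait_until(page_ready, timeout=5, poll_interval=0.01)
print(f"Ready after {state['polls']} polls")
```

With the browser from this notebook, the condition could be something like `lambda: driver.find_elements(By.TAG_NAME, 'a')`, which returns an empty list until at least one matching element exists.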

# Get the page content so it can be parsed

In [None]:
# Get the page source
page_source = driver.page_source

# Parse the HTML content of the page
soup = BeautifulSoup(page_source, 'html.parser')


# Find all the recipes available on the page

In [None]:
# Find the recipe groups on the main page

recipes = soup.find_all('div', class_='comp mntl-taxonomysc-article-list-group mntl-block')
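
The `find_all` call above passes several class names as a single string. In Beautiful Soup, such a string is compared with the exact value of the element's `class` attribute, so the match breaks if the site reorders its classes or adds one. The sketch below illustrates the difference on made-up markup (the `card` and `featured` class names are assumptions, not classes from allrecipes.com).

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the same two classes, listed in different orders.
html = """
<div class="card featured">first</div>
<div class="featured card">second</div>
<div class="card">third</div>
"""
soup = BeautifulSoup(html, "html.parser")

# A multi-class string is compared with the exact value of the class attribute.
exact = [div.text for div in soup.find_all("div", class_="card featured")]

# A single class name matches any element that carries that class.
single = [div.text for div in soup.find_all("div", class_="card")]

# A CSS selector matches elements that have both classes, in any order.
both = [div.text for div in soup.select("div.card.featured")]

print(exact)   # ['first']
print(single)  # ['first', 'second', 'third']
print(both)    # ['first', 'second']
```

Matching on one distinctive class, or using `select` with a CSS selector, tends to survive small markup changes better than an exact multi-class string.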

In [None]:
# Save the recipes to a CSV file
import csv

# Iterate over all the recipes and extract their data

In [None]:
with open('recipes.csv', 'w', newline='', encoding='utf-8') as file:
    writer = csv.writer(file)
    writer.writerow(['Title', 'Description', 'Ingredients', 'Directions', 'Link'])

    # Iterate over the recipe groups and extract the information
    for recipe in recipes:
        # Collect every recipe link in the group so each recipe page can be visited
        links = recipe.find_all('a', class_='comp mntl-card-list-items mntl-document-card mntl-card card card--no-image')
        for link_element in links:
            link = link_element['href']

            # Navigate to the page of each recipe
            driver.get(link)
            time.sleep(5)  # Wait for the page to load

            # Get the content of the recipe page
            recipe_page_source = driver.page_source
            recipe_soup = BeautifulSoup(recipe_page_source, 'html.parser')

            # Get the recipe title
            titles = recipe_soup.find_all('h1', class_='article-heading type--lion')
            recipe_title = titles[0].text.strip() if titles else 'No Title'
            print(f'Title: {recipe_title}')

            # Get the recipe description
            descriptions = recipe_soup.find_all('p', class_='article-subheading type--dog')
            description_text = descriptions[0].text.strip() if descriptions else 'No Description'
            print(f'Description: {description_text}')

            # Get the ingredients
            ingredients = recipe_soup.find_all('li', class_='mm-recipes-structured-ingredients__list-item')
            ingredient_list = [ingredient.text.strip() for ingredient in ingredients]
            print('Ingredients:')
            for ingredient in ingredient_list:
                print(f'- {ingredient}')

            # Get the directions
            directions = recipe_soup.find_all('p', class_='comp mntl-sc-block mntl-sc-block-html')
            direction_list = [direction.text.strip() for direction in directions]
            print('Directions:')
            for direction in direction_list:
                print(f'- {direction}')

            print('\n' + '='*40 + '\n')
            # Write the recipe information to the CSV file
            writer.writerow([recipe_title, description_text, ingredient_list, direction_list, link])



Title: 7 Chicken Thigh Dinners for Every Night of the Week
Description: Here's the answer to "What's for dinner?"
Ingredients:
Directions:
- Chicken thighs are cheap, available at pretty much any grocery store, and easy to cook—what's not to love about the versatile meat counter staple? Use chicken thighs to make everything from quick and easy slow cooker meals for busy weeknights to impressive, dinner party-worthy dishes for weekend get-togethers.
- "Calling all garlic lovers! This creamy garlic chicken is an easy, fragrant skillet dish that is great with mashed potatoes or mashed cauliflower. Since it cooks in one pan, cleanup is easy, too." —Brenda Venable
- "These harissa honey chicken thighs are a spicy-sweet delight, rubbed in harissa paste and honey, then baked to perfection. Adjust the amount of harissa paste to suit your spice preference." —France Cevallos
- "This quick and easy recipe for homemade chicken teriyaki hits the spot. Serve over rice." —Nicole McLaughlin
- "Tr
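
The `writer.writerow` call above receives `ingredient_list` and `direction_list` as Python lists, and the `csv` module stores them as their string representation (for example `['a', 'b']`). One alternative, shown here only as a sketch, is to serialize list-valued cells as JSON so they can be restored with `json.loads`. The rows below are made-up examples, and an in-memory buffer stands in for `recipes.csv`.

```python
import csv
import io
import json

# Made-up rows standing in for scraped recipes.
rows = [
    {"Title": "Example recipe", "Ingredients": ["1 cup flour", "2 eggs"]},
    {"Title": "Example roundup", "Ingredients": []},
]

# An in-memory buffer stands in for open('recipes.csv', 'w', newline='').
buffer = io.StringIO(newline="")
writer = csv.writer(buffer)
writer.writerow(["Title", "Ingredients"])
for row in rows:
    # Store the list as a JSON string inside a single CSV cell.
    writer.writerow([row["Title"], json.dumps(row["Ingredients"])])

# Read the data back and turn the JSON cells into lists again.
buffer.seek(0)
restored = [
    {"Title": record["Title"], "Ingredients": json.loads(record["Ingredients"])}
    for record in csv.DictReader(buffer)
]
print(restored == rows)
```

JSON is only one option. Joining the items with a separator that never occurs in the text would also work, at the cost of a less general format.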

# Close the CSV file with its data

In [None]:
# The with block above has already closed the file, so this call has no effect
file.close()

# Shut down the browser

In [None]:
# Close the browser
driver.quit()
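
If an earlier cell raises an exception, `driver.quit()` is never reached and the headless browser process can stay alive in the background. One way to guard against this is to wrap the driver in a context manager that always calls `quit()`. The `managed_driver` helper and the `FakeDriver` class below are hypothetical names used for illustration; `FakeDriver` exists only so the sketch runs without a browser.

```python
from contextlib import contextmanager


@contextmanager
def managed_driver(create_driver):
    """Create a driver, hand it to the with block, and always call quit()."""
    driver = create_driver()
    try:
        yield driver
    finally:
        driver.quit()


class FakeDriver:
    """Hypothetical stand-in for a browser driver whose page load fails."""

    def __init__(self):
        self.closed = False

    def get(self, url):
        raise RuntimeError(f"could not load {url}")

    def quit(self):
        self.closed = True


fake = FakeDriver()
try:
    with managed_driver(lambda: fake) as browser:
        browser.get("https://example.com")
except RuntimeError as error:
    print(f"Scrape failed: {error}")

print(f"Browser closed: {fake.closed}")
```

With the real browser, the factory argument could be `lambda: webdriver.Chrome(service=service, options=options)`.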

# Read the CSV file with the saved recipe data

In [None]:
# Read the CSV file
import pandas as pd

df_recipes = pd.read_csv('recipes.csv')
df_recipes.head()

| | Title | Description | Ingredients | Directions | Link |
|---|---|---|---|---|---|
| 0 | 7 Chicken Thigh Dinners for Every Night of the... | Here's the answer to "What's for dinner?" | [] | ["Chicken thighs are cheap, available at prett... | https://www.allrecipes.com/chicken-thigh-dinne... |
| 1 | Trout Tacos | These trout tacos take advantage of trout's mi... | ['16 white corn tortillas', '1/2 pound purple ... | ['Preheat the oven to 400 degrees F (200 degre... | https://www.allrecipes.com/trout-tacos-recipe-... |
| 2 | 7 Rotisserie Chicken Dinners for Every Night o... | Meal prep made easy. | [] | ['Rotisserie chicken is the dinner shortcut to... | https://www.allrecipes.com/rotisserie-chicken-... |
| 3 | 7 Chicken Casseroles for Every Night of the Week | You can't go wrong with a chicken casserole. | [] | ["Can't decide what's for dinner? A chicken ca... | https://www.allrecipes.com/chicken-casseroles-... |
| 4 | 7 Ground Beef Dinners for Every Night of the Week | Consider dinner handled. | [] | ["If you have a pound of ground beef, your din... | https://www.allrecipes.com/ground-beef-recipes... |


# Ingredient lists for the recipes retrieved from the given URL

In [None]:
df = df_recipes['Ingredients']
df

0                                                    []
1     ['16 white corn tortillas', '1/2 pound purple ...
2                                                    []
3                                                    []
4                                                    []
5     ['1/3 cup honey', '1/4 cup lower-sodium soy sa...
6                                                    []
7     ['1/4 cup low-sodium soy sauce', '3 tablespoon...
8                                                    []
9     ['4 thick-cut bacon slices', '2 (8 ounce) bone...
10    ['canola oil, for frying', '4 (4-ounce) skinle...
11    ['1 lemon, divided', '1 tablespoon olive oil, ...
12    ['1 pound red potatoes', '1 pound carrots', '1...
13    ['8 ounces fettuccine, uncooked', '2 cups fres...
14    ['2 tablespoons olive oil', '4 boneless pork c...
15    ['1 cup chopped English cucumber', '1 tablespo...
16    ['1 tablespoon brown sugar', '1 tablespoon Mon...
17    ['2 tablespoons vegetable oil', '3 medium 
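
The values in the `Ingredients` column are strings that look like Python lists, not actual lists, and the rows showing `[]` correspond to pages where no ingredient elements were found. The sketch below shows one way to turn such strings back into lists with `ast.literal_eval` and to flag the empty ones. The cell values are made-up examples in the same shape as the output above.

```python
import ast

# Made-up cell values in the same shape as the Ingredients column.
raw_cells = ["['1 cup flour', '2 eggs']", "[]", "['1 lemon, divided']"]

# literal_eval safely evaluates Python literals such as lists of strings.
parsed = [ast.literal_eval(cell) for cell in raw_cells]

# An empty list means no ingredients were captured for that row.
has_ingredients = [bool(items) for items in parsed]

print(parsed)
print(has_ingredients)
```

On the DataFrame itself, the same conversion could be applied with `df_recipes['Ingredients'].apply(ast.literal_eval)`.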