NAME: RAQUEL ZUMBA

# Web Scraping Exercise

## 1. Introduction and Planning

### Objective:
The goal of this exercise is to build a web scraper that collects data from a chosen website. You will learn how to send HTTP requests, parse HTML content, extract relevant data, and store it in a structured format.

### Tasks:
1. Identify the data you want to scrape.
2. Choose the target website(s).
3. Plan the structure of your project.

### Example:
For this exercise, we will scrape recipes from allrecipes.com. We will extract recipe titles, descriptions, ingredients, and directions.

## 2. Understanding the Target Website
### Objective:
Analyze the structure of the web pages to be scraped.

### Tasks:
* Inspect the target website using browser developer tools.
* Identify the HTML elements that contain the desired data.

### Instructions:

* Open your browser and navigate to the target website (e.g., allrecipes.com).
* Right-click on the webpage and select "Inspect" or press Ctrl+Shift+I.
* Use the developer tools to explore the HTML structure of the webpage.
* Identify the tags and classes of the elements that contain the recipe titles, descriptions, ingredients, and directions.

## 3. Writing the Scraper
### Objective:
Develop the code to scrape data from the target website.

### Tasks:
* Send HTTP requests to the target website.
* Parse the HTML content and extract the required data.
* Handle pagination to scrape data from multiple pages.
* Implement error handling.
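
The notebook cells that follow visit a single listing page, so the pagination task is not shown in them. A minimal sketch of one common pattern, following a "next page" link until none is left or a page limit is reached, might look like this. The names `iter_pages`, `fetch_page`, and the in-memory `FAKE_SITE` are hypothetical and stand in for a real load-and-parse step.

```python
from typing import Callable, Dict, Iterator, List, Optional, Tuple

# Hypothetical in-memory "site": each URL maps to (items on the page, next URL or None).
FAKE_SITE: Dict[str, Tuple[List[str], Optional[str]]] = {
    "/recipes?page=1": (["recipe A", "recipe B"], "/recipes?page=2"),
    "/recipes?page=2": (["recipe C"], "/recipes?page=3"),
    "/recipes?page=3": (["recipe D"], None),
}


def fetch_page(url: str) -> Tuple[List[str], Optional[str]]:
    """Stand-in for 'load the page and parse it'; returns items and the next-page URL."""
    return FAKE_SITE[url]


def iter_pages(
    start_url: str,
    fetch: Callable[[str], Tuple[List[str], Optional[str]]],
    max_pages: int = 10,
) -> Iterator[str]:
    """Yield items page by page, following next-page links."""
    url: Optional[str] = start_url
    seen = set()   # guards against a site that links back to an earlier page
    pages = 0
    while url is not None and url not in seen and pages < max_pages:
        seen.add(url)
        items, url = fetch(url)
        pages += 1
        yield from items


all_items = list(iter_pages("/recipes?page=1", fetch_page))
first_two_pages = list(iter_pages("/recipes?page=1", fetch_page, max_pages=2))
print(all_items)
print(first_two_pages)
```

The `max_pages` limit and the `seen` set keep the loop bounded even if the site's "next" links never end or form a cycle.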

### Importing libraries

In [2]:
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.options import Options
from bs4 import BeautifulSoup
import time

In [1]:
%pip install selenium

# 1 Send HTTP requests to the target website

In [4]:
# Create an Options instance, which lets us configure the Chrome browser
options = Options()
# Add the '--headless' argument so the browser runs without a graphical interface
options.add_argument('--headless')
# Add the '--disable-gpu' argument to turn off GPU acceleration in the browser
options.add_argument('--disable-gpu')

### Specify the path to the ChromeDriver executable

In [22]:
chrome_driver_path = 'C:/Users/Dell/RI-Exercise/TASK12/chromedriver.exe'

### Create a browser instance

In [23]:
# Create a browser instance
service = Service(chrome_driver_path)
driver = webdriver.Chrome(service=service, options=options)

### Navigate to the web page

In [24]:
# Navigate to the web page
url = 'https://www.allrecipes.com/recipes/'
# Use the browser driver to open the specified URL
driver.get(url)
# Wait a moment so the page can finish loading
time.sleep(5)
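
A fixed `time.sleep(5)` waits the full five seconds even when the page is ready sooner, and it may still be too short on a slow connection. Selenium provides `WebDriverWait` for this purpose. The idea behind it, polling a condition until it holds or a timeout expires, can be sketched in plain Python as follows. `wait_until` and the simulated `page_ready` condition are hypothetical names used for illustration and are not part of Selenium.

```python
import time


def wait_until(condition, timeout=5.0, poll=0.1):
    """Call `condition` repeatedly until it returns a truthy value.

    Returns that value, or raises TimeoutError once `timeout` seconds have passed.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} s")
        time.sleep(poll)


# Simulated page that becomes "ready" on the third check.
checks = {"count": 0}


def page_ready():
    checks["count"] += 1
    return checks["count"] >= 3


wait_until(page_ready, timeout=2.0, poll=0.01)
print(checks["count"])
```

With a real driver, the condition would typically test for the presence of an element that the scraper needs.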

### Get the page content

In [25]:
# Get the page content
page_source = driver.page_source

# Parse the HTML content of the page
soup = BeautifulSoup(page_source, 'html.parser')


In [26]:
# Find the recipe groups on the main page
recipes = soup.find_all('div', class_='comp mntl-taxonomysc-article-list-group mntl-block')
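
Passing a multi-word string to `class_` makes BeautifulSoup compare it against the exact value of the `class` attribute, so the match breaks if the site reorders or adds a class. A sketch of a more tolerant alternative, matching on a single class with a CSS selector, is shown here on a small made-up HTML fragment (the markup is an assumption for illustration, not copied from allrecipes.com):

```python
from bs4 import BeautifulSoup

# Two divs with the same classes in a different order (made-up markup).
html = """
<div class="comp mntl-taxonomysc-article-list-group mntl-block">A</div>
<div class="mntl-block comp mntl-taxonomysc-article-list-group">B</div>
"""
soup = BeautifulSoup(html, "html.parser")

# Exact-string match: only the div whose class attribute is written in this order.
exact = soup.find_all("div", class_="comp mntl-taxonomysc-article-list-group mntl-block")

# CSS selector on one class: matches regardless of order or extra classes.
by_single_class = soup.select("div.mntl-taxonomysc-article-list-group")

print(len(exact), len(by_single_class))
```

Which class is stable enough to select on has to be checked in the browser's developer tools for the page being scraped.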

# 2 Parse the HTML content and extract the required data

In [27]:
# Save the recipes to a CSV file
import csv

### Open a CSV file where the attributes will be saved

In [28]:
# Open a CSV file for writing, specifying write mode, newline handling, and UTF-8 encoding
with open('recipes.csv', 'w', newline='', encoding='utf-8') as file:
    # Create a writer object for the CSV file
    writer = csv.writer(file)
    # Write the header row to the CSV file
    writer.writerow(['Title', 'Description', 'Ingredients', 'Directions', 'Link'])

    # Iterate over the recipe groups and extract the information
    for recipe in recipes:
        # Get all recipe links so that each recipe page (with its ingredients) can be visited
        links = recipe.find_all('a', class_='comp mntl-card-list-items mntl-document-card mntl-card card card--no-image')
        for link_element in links:
            link = link_element['href']

            # Navigate to the page of each recipe
            driver.get(link)
            time.sleep(5)  # Wait for the page to load

            # Get the content of the recipe page
            recipe_page_source = driver.page_source
            recipe_soup = BeautifulSoup(recipe_page_source, 'html.parser')

            # Get the recipe title
            titles = recipe_soup.find_all('h1', class_='article-heading type--lion')
            recipe_title = titles[0].text.strip() if titles else 'No Title'

            # Get the recipe description
            descriptions = recipe_soup.find_all('p', class_='article-subheading type--dog')
            description_text = descriptions[0].text.strip() if descriptions else 'No Description'

            # Get the ingredients
            ingredients = recipe_soup.find_all('li', class_='mm-recipes-structured-ingredients__list-item')
            ingredient_list = [ingredient.text.strip() for ingredient in ingredients]

            # Get the directions
            directions = recipe_soup.find_all('p', class_='comp mntl-sc-block mntl-sc-block-html')
            direction_list = [direction.text.strip() for direction in directions]

            # Write the recipe information to the CSV file
            writer.writerow([recipe_title, description_text, ingredient_list, direction_list, link])
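
The loop above has no error handling, so a single page that fails to load stops the whole run. A minimal sketch of one way to retry a failing step a few times and then skip it is given below. `with_retries` and `flaky_load` are hypothetical helpers, and the simulated failure stands in for a real `driver.get(link)` call.

```python
import time


def with_retries(action, retries=3, delay=0.0):
    """Run `action()`, making at most `retries` attempts in total.

    Returns the result of `action`, or None if every attempt raised.
    """
    for attempt in range(1, retries + 1):
        try:
            return action()
        except Exception as exc:  # real code should catch narrower exception types
            print(f"attempt {attempt} failed: {exc}")
            time.sleep(delay)
    return None


# Simulated page load that fails twice and then succeeds.
calls = {"n": 0}


def flaky_load():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("page did not load")
    return "<html>ok</html>"


page = with_retries(flaky_load, retries=3)
skipped = with_retries(lambda: 1 / 0, retries=2)
print(page, skipped)
```

In the scraping loop, a `None` result would be the signal to `continue` to the next link instead of writing a row.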
        


In [29]:
# The `with` block above already closed the file; this explicit call has no further effect
file.close()

In [30]:
# Close the browser
driver.quit()
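
If an exception is raised before this cell runs, the headless Chrome process stays alive in the background. One way to guarantee cleanup is to wrap the browser in a context manager that calls `quit()` in a `finally` clause. The sketch below uses a hypothetical `FakeDriver` stub in place of `webdriver.Chrome` so that it runs without a browser; `managed_driver` is likewise an illustrative name.

```python
from contextlib import contextmanager


class FakeDriver:
    """Stub standing in for a Selenium driver; records whether quit() was called."""

    def __init__(self):
        self.closed = False

    def get(self, url):
        if not url.startswith("http"):
            raise ValueError(f"invalid URL: {url}")

    def quit(self):
        self.closed = True


@contextmanager
def managed_driver(factory):
    """Create a driver with `factory()` and always call quit() on exit."""
    driver = factory()
    try:
        yield driver
    finally:
        driver.quit()


fake = FakeDriver()
try:
    with managed_driver(lambda: fake) as driver:
        driver.get("not-a-url")  # raises inside the block
except ValueError as exc:
    print("scrape failed:", exc)

print(fake.closed)
```

With Selenium, the factory would be something like `lambda: webdriver.Chrome(service=service, options=options)`, reusing the objects created earlier in the notebook.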

### Read the CSV file

In [31]:
# Read the CSV file
import pandas as pd
# Load the CSV file into a DataFrame
df_recipes = pd.read_csv('recipes.csv')
# Show the first rows of the DataFrame
df_recipes.head()

|   | Title | Description | Ingredients | Directions | Link |
|---|---|---|---|---|---|
| 0 | Ginger Chicken | Ginger chicken is a Chinese dish typically mad... | ['1/4 cup packed light brown sugar', '3 tables... | ['Gather all ingredients.', 'Whisk together br... | https://www.allrecipes.com/ginger-chicken-reci... |
| 1 | Breakfast Nachos | These breakfast nachos are a play on steak and... | ['3/4 cup packed fresh cilantro leaves and ten... | ['Gather all ingredients.', 'For chimichurri: ... | https://www.allrecipes.com/breakfast-nachos-re... |
| 2 | Spinach, Feta, and Rice Casserole | This spinach, feta, and rice casserole is a pe... | ['1 cup thinly sliced scallions', '1/4 cup dra... | ['Gather all ingredients. Preheat the oven to ... | https://www.allrecipes.com/spinach-feta-and-ri... |
| 3 | Zucchini Boats | These zucchini boats, with fork-tender zucchin... | ['4 (8 ounce) zucchini', '1 teaspoon dried Ita... | ['Gather the ingredients.', 'Preheat the oven ... | https://www.allrecipes.com/zucchini-boats-reci... |
| 4 | Crispy Cheesy Chicken Ranch Patties | These crispy cheesy chicken ranch patties, mad... | ['1 1/2 pounds skinless, boneless chicken brea... | ['Combine chicken, eggs, onion, parsley, mozza... | https://www.allrecipes.com/crispy-cheesy-chick... |
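
Because `writer.writerow` received Python lists for the ingredients and directions, the CSV stores their string representation, and those columns come back as plain strings when the file is read. A minimal sketch of turning such a cell back into a list with `ast.literal_eval` follows. The one-row CSV is made-up sample data in the same shape as `recipes.csv`.

```python
import ast
import csv
import io

# Made-up sample written the same way the scraper writes rows (a list in one cell).
buffer = io.StringIO()
sample_writer = csv.writer(buffer)
sample_writer.writerow(["Title", "Ingredients"])
sample_writer.writerow(["Sample Recipe", ["1 cup flour", "2 eggs"]])

# Read it back: the list column is now a string that only looks like a list.
buffer.seek(0)
rows = list(csv.DictReader(buffer))
raw = rows[0]["Ingredients"]

# literal_eval safely parses Python literals (lists, strings, numbers) without running code.
ingredients = ast.literal_eval(raw)

print(type(raw).__name__, type(ingredients).__name__, len(ingredients))
```

On the DataFrame, the same conversion could be applied with `df_recipes['Ingredients'].apply(ast.literal_eval)`.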


### Show the recipe titles

In [38]:
df = df_recipes['Title']
df

0                                        Ginger Chicken
1                                      Breakfast Nachos
2                     Spinach, Feta, and Rice Casserole
3                                        Zucchini Boats
4                   Crispy Cheesy Chicken Ranch Patties
5                                     Peach Custard Pie
6                                          Cajun Caviar
7                 Creamy Mississippi Chicken Enchiladas
8                         Grilled Watermelon Feta Pizza
9                                    Chili Bean Chicken
10                       Everything Bagel Roast Chicken
11                              Country Peach Dumplings
12                           Chocolate Chipless Cookies
13                                  Coffee-Rubbed Steak
14                             Bang Bang Chicken Kebabs
15                         Shrimp and Bacon Pasta Salad
16             Sheet Pan Halibut with Orange and Fennel
17                                Wild Raspberry

### Show the recipe descriptions

In [32]:
df = df_recipes['Description']
df

0     Ginger chicken is a Chinese dish typically mad...
1     These breakfast nachos are a play on steak and...
2     This spinach, feta, and rice casserole is a pe...
3     These zucchini boats, with fork-tender zucchin...
4     These crispy cheesy chicken ranch patties, mad...
5     This peach custard pie, a cross between peach ...
6     This Cajun caviar will be your next favorite p...
7     These creamy Mississippi chicken enchiladas ta...
8     This grilled watermelon feta pizza is your sho...
9     This chili bean chicken, a savory casserole of...
10    Everything bagel seasoning is an all-purpose, ...
11    For these country peach dumplings, we used fre...
12                    It has the chocolate on the side.
13    This coffee-rubbed steak uses coffee grounds i...
14    For these bang bang chicken kebabs, combine ma...
15    This shrimp and bacon pasta salad is a simple,...
16    This sheet pan halibut with orange and fennel ...
17    This wild raspberry mousse is light as air

### Show the recipe ingredients

In [33]:
df = df_recipes['Ingredients']
df

0     ['1/4 cup packed light brown sugar', '3 tables...
1     ['3/4 cup packed fresh cilantro leaves and ten...
2     ['1 cup thinly sliced scallions', '1/4 cup dra...
3     ['4 (8 ounce) zucchini', '1 teaspoon dried Ita...
4     ['1 1/2 pounds skinless, boneless chicken brea...
5     ['1 ready-to-bake single pie crust shell', '2 ...
6     ['6 ounces Monterey Jack cheese, shredded', '6...
7     ['2 pounds skinless, boneless chicken breasts'...
8     ['1/2 red onion\xa0thinly sliced', '1/4 cup re...
9     ['1 cup roasted tomato salsa', '1/2 cup ketchu...
10    ['4 whole leg quarters, with skin', '1/4 cup o...
11    ['2 large fresh peaches, halved and pitted', '...
12    ['3/4 cup unsalted butter, at room temperature...
13    ['1/4 cup finely ground coffee', '2 tablespoon...
14    ['1 1/2 pounds skinless, boneless chicken brea...
15    ['1 pound your favorite pasta', '6 slices baco...
16    ['1 pound fennel, trimmed and cut into 1/2-inc...
17    ['3 cups fresh black raspberries', '1/2 cu

### Show the recipe directions

In [34]:
df = df_recipes['Directions']
df

0     ['Gather all ingredients.', 'Whisk together br...
1     ['Gather all ingredients.', 'For chimichurri: ...
2     ['Gather all ingredients. Preheat the oven to ...
3     ['Gather the ingredients.', 'Preheat the oven ...
4     ['Combine chicken, eggs, onion, parsley, mozza...
5     ['To blind bake the crust (optional): Chill pa...
6     ['Combine Monterey Jack cheese, Cheddar cheese...
7     ['Place chicken, ranch dressing mix, pepperonc...
8     ['Combine onions, vinegar, sugar, and salt in ...
9     ['Preheat the oven to 375 degrees F (190 degre...
10    ['Everything bagel seasoning makes it easy to ...
11    ['Preheat the oven to 350 degrees F (175 degre...
12    ["I'll say it: chocolate chip cookies are way ...
13    ['Combine coffee, brown sugar, chili powder, g...
14    ['Combine chicken breast pieces, buttermilk, a...
15    ['Fill a large pot with lightly salted water a...
16    ['Preheat oven to 425 degrees F (220 degrees C...
17    ['Place raspberries, maple syrup, lemon ju