# Web Scraping Exercise

## 1. Introduction and Planning

### Objective:
The goal of this exercise is to build a web scraper that collects data from a chosen website. You will learn how to send HTTP requests, parse HTML content, extract relevant data, and store it in a structured format.

### Tasks:
1. Identify the data you want to scrape.
2. Choose the target website(s).
3. Plan the structure of your project.

### Example:
For this exercise, we will scrape recipes from allrecipes.com. We will extract recipe titles, descriptions, ingredients, directions, and links.
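
Part of planning is settling the shape of one scraped record before writing any scraping code. The sketch below is one possible layout, not the notebook's own: the `Recipe` class and its `to_row` method are hypothetical names, while the field names and the `'No Title'` / `'No Description'` fallbacks mirror the CSV header and defaults used by the code later in this notebook.

```python
from dataclasses import dataclass, field

# Column order used for the CSV header later in this notebook
CSV_HEADER = ['Title', 'Description', 'Ingredients', 'Directions', 'Link']


@dataclass
class Recipe:
    """One scraped record (hypothetical helper, not part of the original notebook)."""
    title: str = 'No Title'
    description: str = 'No Description'
    ingredients: list = field(default_factory=list)
    directions: list = field(default_factory=list)
    link: str = ''

    def to_row(self):
        """Return the fields in the same order as CSV_HEADER."""
        return [self.title, self.description, self.ingredients,
                self.directions, self.link]


example = Recipe(title='Tomato Pie Dip', ingredients=['cooking spray'])
print(example.to_row())
```

Keeping the defaults in one place means a page with a missing title or description still produces a complete row.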

## 2. Understanding the Target Website

### Objective:
Analyze the structure of the web pages to be scraped.

### Tasks:
* Inspect the target website using browser developer tools.
* Identify the HTML elements that contain the desired data.

### Instructions:

* Open your browser and navigate to the target website (e.g., allrecipes.com).
* Right-click on the webpage and select "Inspect" or press Ctrl+Shift+I.
* Use the developer tools to explore the HTML structure of the webpage.
* Identify the tags and classes of the elements that contain the recipe titles, descriptions, ingredients, and directions.
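
Once a class name has been found in the developer tools, it can be checked against a small HTML fragment before running the full scraper. The sketch below assumes Beautiful Soup is installed. The class name is one this notebook uses later, but the sample markup and the `extract_ingredients` helper are invented for illustration. It uses `select` with a CSS selector, which matches an element even when it carries additional classes. A multi-class string passed to `class_` only matches when the attribute's full string is identical.

```python
from bs4 import BeautifulSoup

# Invented sample markup; the real page structure may differ
SAMPLE_HTML = """
<ul>
  <li class="mm-recipes-structured-ingredients__list-item">cooking spray</li>
  <li class="mm-recipes-structured-ingredients__list-item highlighted">1/2 cup mayonnaise</li>
  <li class="unrelated">advert</li>
</ul>
"""


def extract_ingredients(html):
    """Return the text of every list item that carries the ingredient class."""
    soup = BeautifulSoup(html, 'html.parser')
    items = soup.select('li.mm-recipes-structured-ingredients__list-item')
    return [item.get_text(strip=True) for item in items]


print(extract_ingredients(SAMPLE_HTML))
```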

## 3. Writing the Scraper

### Objective:
Develop the code to scrape data from the target website.

### Tasks:
* Send HTTP requests to the target website.
* Parse the HTML content and extract the required data.
* Handle pagination to scrape data from multiple pages.
* Implement error handling.
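
The pagination and error-handling tasks can be sketched independently of any browser. The example below is a minimal illustration under stated assumptions, not the notebook's implementation: `fetch_with_retries` and `scrape_all_pages` are hypothetical helpers, `fetch` stands for any function that loads a URL and returns a list of items, and the `?page=N` URL pattern is made up.

```python
import time


def fetch_with_retries(fetch, url, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call fetch(url), retrying with exponential backoff when it raises."""
    for attempt in range(1, attempts + 1):
        try:
            return fetch(url)
        except Exception as exc:
            if attempt == attempts:
                raise  # Out of attempts: let the caller see the error
            delay = base_delay * 2 ** (attempt - 1)
            print(f'Attempt {attempt} for {url} failed ({exc}); retrying in {delay:.1f}s')
            sleep(delay)


def scrape_all_pages(fetch, build_url, max_pages=50):
    """Collect items page by page until a page comes back empty."""
    items = []
    for page in range(1, max_pages + 1):
        page_items = fetch_with_retries(fetch, build_url(page))
        if not page_items:
            break
        items.extend(page_items)
    return items


# Usage with a stand-in fetch function instead of a real browser
fake_site = {1: ['recipe-a', 'recipe-b'], 2: ['recipe-c'], 3: []}


def fake_fetch(url):
    return fake_site[int(url.rsplit('=', 1)[1])]


print(scrape_all_pages(fake_fetch, lambda page: f'https://example.com/list?page={page}'))
```

The `max_pages` cap keeps the loop from running indefinitely if a site never returns an empty page.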

In [7]:
%pip install selenium
%pip install beautifulsoup4



In [8]:


from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.options import Options
from bs4 import BeautifulSoup
import time

In [9]:
# Configure the Selenium options
options = Options()
options.add_argument('--headless')
options.add_argument('--disable-gpu')

In [14]:
chrome_driver_path = 'C:\\Users\\Madelyn\\Desktop\\chromedriver.exe'

In [15]:
# Create a browser instance
service = Service(chrome_driver_path)
driver = webdriver.Chrome(service=service, options=options)

In [16]:
# Navigate to the web page
url = 'https://www.allrecipes.com/recipes/'
driver.get(url)

# Wait a moment to make sure the page has loaded completely
time.sleep(5)
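
A fixed `time.sleep(5)` waits the full five seconds even when the page is ready sooner, and may still be too short on a slow connection. Selenium ships `WebDriverWait` and `expected_conditions` for explicit waits. The browser-free sketch below only illustrates the polling idea behind them, and `wait_until` is a hypothetical helper name.

```python
import time


def wait_until(condition, timeout=10.0, poll_interval=0.5):
    """Poll condition() until it returns a truthy value or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f'condition not met within {timeout} seconds')
        time.sleep(poll_interval)


# Stand-in condition that becomes true on the third check
checks = {'count': 0}


def page_ready():
    checks['count'] += 1
    return checks['count'] >= 3


wait_until(page_ready, timeout=2.0, poll_interval=0.01)
print(f"Ready after {checks['count']} checks")
```

With a live driver, the condition could be something like `lambda: driver.find_elements(By.CSS_SELECTOR, 'h1')`, where the `h1` selector is only an assumed example.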

In [17]:
# Get the page content
page_source = driver.page_source

# Parse the HTML content of the page
soup = BeautifulSoup(page_source, 'html.parser')


In [18]:
# Find the recipe list groups on the main page
recipes = soup.find_all('div', class_='comp mntl-taxonomysc-article-list-group mntl-block')

In [19]:
# Save the recipes to a CSV file
import csv

In [20]:
with open('recipes.csv', 'w', newline='', encoding='utf-8') as file:
    writer = csv.writer(file)
    writer.writerow(['Title', 'Description', 'Ingredients', 'Directions', 'Link'])
    
    # Iterate over the recipe groups and extract the information
    for recipe in recipes:
        # Collect every recipe link in the group so each recipe page can be visited
        links = recipe.find_all('a', class_='comp mntl-card-list-items mntl-document-card mntl-card card card--no-image')
        for link_element in links:
            link = link_element['href']

            # Navigate to the page of each recipe
            driver.get(link)
            time.sleep(5)  # Wait for the page to load

            # Get the content of the recipe page
            recipe_page_source = driver.page_source
            recipe_soup = BeautifulSoup(recipe_page_source, 'html.parser')

            # Get the recipe title
            titles = recipe_soup.find_all('h1', class_='article-heading type--lion')
            recipe_title = titles[0].text.strip() if titles else 'No Title'
            print(f'Title: {recipe_title}')

            # Get the recipe description
            descriptions = recipe_soup.find_all('p', class_='article-subheading type--dog')
            description_text = descriptions[0].text.strip() if descriptions else 'No Description'
            print(f'Description: {description_text}')

            # Get the ingredients
            ingredients = recipe_soup.find_all('li', class_='mm-recipes-structured-ingredients__list-item')
            ingredient_list = [ingredient.text.strip() for ingredient in ingredients]
            print('Ingredients:')
            for ingredient in ingredient_list:
                print(f'- {ingredient}')

            # Get the directions
            directions = recipe_soup.find_all('p', class_='comp mntl-sc-block mntl-sc-block-html')
            direction_list = [direction.text.strip() for direction in directions]
            print('Directions:')
            for direction in direction_list:
                print(f'- {direction}')

            print('\n' + '='*40 + '\n')
            # Write the recipe information to the CSV file
            writer.writerow([recipe_title, description_text, ingredient_list, direction_list, link])



Title: Tomato Pie Dip
Description: "Very creamy and so flavorful!"
Ingredients:
- cooking spray
- 3 red beefsteak tomatoes, cored
- 1 teaspoon kosher salt, divided
- 1 (8-ounce) package cream cheese, softened
- 1/2 cup mayonnaise
- 1/2 cup chopped Vidalia onion (from 1 small [5 ounce] onion)
- 1/4 cup loosely packed fresh basil leaves, cut into thin slices, plus more for garnish
- 1/2 cup shredded low-moisture part-skim mozzarella cheese
- 1/4 cup freshly grated Parmesan cheese
- 1 1/2 teaspoons hot sauce (such as Tabasco)
- 1 teaspoon Worcestershire sauce
- 1/2 teaspoon onion powder
- 1/4 teaspoon freshly ground black pepper
- 1 cup shredded sharp Cheddar cheese
- crostini toast
Directions:
- A Southern classic reimagined, this irresistible Tomato Pie Dip is perfect for any occasion of the season. With layers of ripe, juicy tomatoes, fresh basil, and a creamy three-cheese spread, this crowd-pleasing dip brings all the comfort of a traditional tomato pie straight to your appetizer
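
Passing a Python list to `writer.writerow` stores its `repr` (for example `['cooking spray', ...]`) in the cell, which is awkward to parse later. One alternative, sketched below and not what this notebook does, is to encode the list columns as JSON before writing. The `write_recipe_row` helper, the in-memory buffer, and the `example.com` link are assumptions made for illustration.

```python
import csv
import io
import json


def write_recipe_row(writer, title, description, ingredients, directions, link):
    """Write one recipe, storing the list columns as JSON strings."""
    writer.writerow([title, description,
                     json.dumps(ingredients), json.dumps(directions), link])


# Write to an in-memory buffer instead of recipes.csv for the demonstration
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(['Title', 'Description', 'Ingredients', 'Directions', 'Link'])
write_recipe_row(writer, 'Tomato Pie Dip', '"Very creamy and so flavorful!"',
                 ['cooking spray', '1/2 cup mayonnaise'], [],
                 'https://example.com/recipe')

# Read the row back and decode the JSON column into a real list
buffer.seek(0)
rows = list(csv.DictReader(buffer))
ingredients = json.loads(rows[0]['Ingredients'])
print(ingredients)
```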

In [21]:
# The with block above already closed the CSV file, so this call is a harmless no-op
file.close()

In [25]:
# Close the browser
driver.quit()
%pip install pandas



In [26]:
# Read the CSV file
import pandas as pd

df_recipes = pd.read_csv('recipes.csv')
df_recipes.head()


| | Title | Description | Ingredients | Directions | Link |
|---|---|---|---|---|---|
| 0 | Tomato Pie Dip | "Very creamy and so flavorful!" | ['cooking spray', '3 red beefsteak tomatoes, c... | ['A Southern classic reimagined, this irresist... | https://www.allrecipes.com/tomato-pie-dip-reci... |
| 1 | Sparkling PB&J Sidecars | A classic sidecar calls for Cognac, orange liq... | ['superfine sugar', '1 lemon wedge', 'ice, as ... | ['Spread sugar on a small plate. Rub rims of 2... | https://www.allrecipes.com/sparkling-pb-and-j-... |
| 2 | Mushroom Pilaf with Pine Nuts and Dried Cherries | Savory mushrooms and sweet cherries combine in... | ['3 teaspoons olive oil, divided', '8 ounces c... | ['Preheat the oven to 350 degrees F (175 degre... | https://www.allrecipes.com/mushroom-pilaf-with... |
| 3 | Parmesan-Garlic Green Beans | These cheesy green beans come together in just... | ['1 pound trimmed green beans', '1/2 teaspoon ... | ['Preheat the oven to 400 degrees F (200 degre... | https://www.allrecipes.com/parmesan-garlic-gre... |
| 4 | Squash Gratin with Mornay Sauce | Use up in-season squash with this quick gratin... | ['1/2 cup water', '3/4 cup thinly sliced Vidal... | ['Set oven rack 6 inches from heat source; pre... | https://www.allrecipes.com/squash-gratin-with-... |
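
The Ingredients and Directions cells shown above are strings that only look like Python lists, because the lists were written to the CSV with their `repr`. A minimal sketch of turning such a cell back into a list with `ast.literal_eval` follows. `parse_list_cell` is a hypothetical helper name, and returning an empty list for unreadable cells is an assumed policy.

```python
import ast


def parse_list_cell(cell):
    """Convert a cell such as "['a', 'b']" back into a list.

    Returns an empty list when the cell is blank, malformed, or not a list.
    """
    try:
        value = ast.literal_eval(cell)
    except (ValueError, SyntaxError, TypeError):
        return []
    return value if isinstance(value, list) else []


cell = "['cooking spray', '3 red beefsteak tomatoes, cored']"
parsed = parse_list_cell(cell)
print(len(parsed), parsed[0])
```

With pandas, the same helper could be applied to a whole column, for example `df_recipes['Ingredients'].apply(parse_list_cell)`.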


In [27]:
df = df_recipes['Description']
df

0                       "Very creamy and so flavorful!"
1     A classic sidecar calls for Cognac, orange liq...
2     Savory mushrooms and sweet cherries combine in...
3     These cheesy green beans come together in just...
4     Use up in-season squash with this quick gratin...
5     A quick and easy one pan recipe with linguine ...
6     This one pan lemon garlic Parmesan chicken rec...
7     When I heard New England beach pizza called "A...
8     Birria tacos are so delicious and this recipe ...
9                     "Hearty, creamy, saucy goodness."
10    These cinnamon roll blondies are everything yo...
11    This slow cooker lasagna soup has all the flav...
12    These chicken chalupas use leftover chicken, w...
13    Need to get a head start on dinner tonight? St...
14    Juicy apples, hearty Brussels sprouts, and fra...
15    These baked turkey curry empanadas are from bl...
16    Bourbon, vanilla, cinnamon, maple syrup, and a...
17    Rye bagels, hot-smoked salmon, and cream c