# Session 28: Web Scraping with BeautifulSoup (Part 2)

**Unit 3: Data Collection and Cleaning**
**Hour: 28**
**Mode: Practical Lab**

---

### 1. Objective

This lab builds on our previous web scraping session. Instead of just finding the first quote, our goal is to:
1.  Find **all** the quotes on the page.
2.  Loop through them to extract the text and author from each one.
3.  Store the scraped data in a clean, structured Pandas DataFrame.

### 2. Setup

We need `requests`, `BeautifulSoup`, and now `pandas` to store our results.

In [None]:
import requests
from bs4 import BeautifulSoup
import pandas as pd

### 3. Review: Request and Parse

Let's quickly re-run the code from last session to get our `soup` object.

In [None]:
URL = 'http://quotes.toscrape.com/'
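response = requests.get(URL, timeout=10)  # give up instead of hanging if the server is slow
response.raise_for_status()  # stop early on an HTTP error such as 404 or 500
soup = BeautifulSoup(response.text, 'html.parser')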

### 4. Finding All Elements

Instead of `.find()`, which returns only the first match, we use the `.find_all()` method, which returns a list of every matching element.

Let's find all the `<div class='quote'>` elements on the page.

In [None]:
all_quote_divs = soup.find_all('div', class_='quote')

print(f"Found {len(all_quote_divs)} quotes on the page.")
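To compare the two methods without depending on the network, here is a minimal sketch that runs against a small hand-written HTML string. The `sample_html` markup and the `sample_soup` name are assumptions made for illustration; the markup only imitates the structure of the quotes page and is not the real page source.

```python
from bs4 import BeautifulSoup

# Hypothetical markup that imitates the structure of the quotes page
sample_html = """
<div class="quote"><span class="text">First quote</span></div>
<div class="quote"><span class="text">Second quote</span></div>
<div class="footer">Not a quote</div>
"""

sample_soup = BeautifulSoup(sample_html, 'html.parser')

# .find() returns only the first matching Tag (or None if nothing matches)
first_div = sample_soup.find('div', class_='quote')

# .find_all() returns a list-like ResultSet containing every matching Tag
all_divs = sample_soup.find_all('div', class_='quote')

print(first_div.span.text)  # First quote
print(len(all_divs))        # 2
```

Note that the `footer` div is not counted, because `class_='quote'` filters on the CSS class as well as the tag name.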

### 5. Looping and Extracting Data

Now that `all_quote_divs` is a list of BeautifulSoup `Tag` objects, we can loop through it. In each iteration, the `quote_div` variable holds one of the quote containers we found, and we can call `.find()` *on it* to search only inside that container for the text and the author.

In [None]:
for quote_div in all_quote_divs:
    quote_text = quote_div.find('span', class_='text').text
    author_name = quote_div.find('small', class_='author').text
    
    print(f"{quote_text} - {author_name}")
    print("---")
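One thing to watch for in a loop like this: `.find()` returns `None` when nothing matches, so calling `.text` on the result raises an `AttributeError`. The sketch below shows one possible way to guard against that. The `safe_text` helper and the sample markup are hypothetical and written for this example; they are not part of BeautifulSoup or of the real page.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the second quote has no author element
sample_html = """
<div class="quote">
  <span class="text">Quote with an author</span>
  <small class="author">Jane Doe</small>
</div>
<div class="quote">
  <span class="text">Quote without an author</span>
</div>
"""

def safe_text(parent, tag_name, class_name, default=None):
    """Return the stripped text of the first matching child, or `default`."""
    element = parent.find(tag_name, class_=class_name)
    if element is None:
        return default
    return element.get_text(strip=True)

sample_soup = BeautifulSoup(sample_html, 'html.parser')

extracted = []
for quote_div in sample_soup.find_all('div', class_='quote'):
    quote_text = safe_text(quote_div, 'span', 'text')
    author_name = safe_text(quote_div, 'small', 'author', default='Unknown')
    extracted.append((quote_text, author_name))

print(extracted)
# [('Quote with an author', 'Jane Doe'), ('Quote without an author', 'Unknown')]
```

On the practice site every quote has both elements, so the plain `.find(...).text` pattern works there. A guard like this becomes useful on pages where the markup is less regular.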

### 6. Storing the Data in a DataFrame

This is the final and most important step. We need to store our results in a structured format. The standard way to do this is to:
1.  Create an empty list.
2.  Inside the loop, create a dictionary for each quote.
3.  Append each dictionary to the list.
4.  After the loop, convert the list of dictionaries into a Pandas DataFrame.

In [None]:
# 1. Create an empty list
quotes_list = []

# Find all the quote divs again
all_quote_divs = soup.find_all('div', class_='quote')

for quote_div in all_quote_divs:
    quote_text = quote_div.find('span', class_='text').text
    author_name = quote_div.find('small', class_='author').text
    
    # 2. Create a dictionary
    quote_dict = {
        'Author': author_name,
        'Quote': quote_text
    }
    
    # 3. Append the dictionary to the list
    quotes_list.append(quote_dict)

# 4. Convert the list of dictionaries to a DataFrame
df_quotes = pd.DataFrame(quotes_list)
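The same four steps can also be wrapped in a function so they are easy to reuse on other pages with the same layout. This is a minimal sketch under stated assumptions: the function name `quotes_to_dataframe` and the sample markup are hypothetical, and the function assumes every quote container has both a text and an author element.

```python
import pandas as pd
from bs4 import BeautifulSoup

def quotes_to_dataframe(html):
    """Parse quote markup and return a DataFrame with Author and Quote columns."""
    page_soup = BeautifulSoup(html, 'html.parser')
    rows = []
    for quote_div in page_soup.find_all('div', class_='quote'):
        rows.append({
            'Author': quote_div.find('small', class_='author').text,
            'Quote': quote_div.find('span', class_='text').text,
        })
    # Naming the columns keeps them present even if no quotes were found
    return pd.DataFrame(rows, columns=['Author', 'Quote'])

# Hypothetical markup that imitates the structure of the quotes page
sample_html = """
<div class="quote">
  <span class="text">First quote</span>
  <small class="author">Jane Doe</small>
</div>
<div class="quote">
  <span class="text">Second quote</span>
  <small class="author">John Roe</small>
</div>
"""

sample_df = quotes_to_dataframe(sample_html)
print(sample_df.shape)  # (2, 2)
```

With the live page, the call would be `quotes_to_dataframe(response.text)`.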


Now, let's look at our beautiful, structured DataFrame!

In [None]:
df_quotes.head()

We can save this data to a CSV file for later use.

In [None]:
df_quotes.to_csv('scraped_quotes.csv', index=False) # index=False prevents pandas from writing the row numbers
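A quick way to confirm that the file was written correctly is to read it back and compare it with the original. The sketch below does this with a small stand-in DataFrame and a temporary folder, so it runs on its own and leaves no file behind. The names `demo_df` and `sample_quotes.csv` are assumptions for illustration.

```python
import os
import tempfile

import pandas as pd

# Hypothetical stand-in for df_quotes so the example runs on its own
demo_df = pd.DataFrame([
    {'Author': 'Jane Doe', 'Quote': '“A quote, with a comma.”'},
    {'Author': 'John Roe', 'Quote': '“Another quote.”'},
])

with tempfile.TemporaryDirectory() as tmp_dir:
    csv_path = os.path.join(tmp_dir, 'sample_quotes.csv')
    # Write and read with the same explicit encoding so the curly quotes survive
    demo_df.to_csv(csv_path, index=False, encoding='utf-8')
    reloaded_df = pd.read_csv(csv_path, encoding='utf-8')

# The columns and values should match the original exactly
print(list(reloaded_df.columns) == list(demo_df.columns))
print(reloaded_df['Quote'].tolist() == demo_df['Quote'].tolist())
```

The first quote contains a comma on purpose: `to_csv` wraps such values in double quotes in the file, and `read_csv` restores them as a single field.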

### 7. Conclusion

In this lab, you moved from scraping a single element to scraping every quote on a page. You learned to:
1.  Use `.find_all()` to retrieve all elements matching your criteria.
2.  Iterate through the results in a `for` loop.
3.  Extract the specific data you need from each element inside the loop.
4.  Follow the best practice of storing scraped data as a list of dictionaries.
5.  Convert this list into a clean Pandas DataFrame, transforming unstructured web data into structured, analyzable data.

**Next Session:** We will return to our Telco dataset to begin the hands-on data cleaning labs, starting with how to handle missing data.