# Lab 5: Web Scraping, APIs, and Topic Modeling

Today, we're diving into the world of extracting data directly from the web. We'll learn how to programmatically interact with websites (**web scraping**) to gather information, how to use official channels (**APIs**) like Reddit's to get structured data, and finally, how to make sense of large amounts of text using **Topic Modeling**.

The web is full of data, and today you'll learn the techniques to retrieve it.

**Why are these skills important?**

*   **Data Acquisition:** You can't analyze data you don't have! Scraping and APIs are fundamental ways to build datasets for Machine Learning, market analysis, or academic research.
*   **Competitive Intelligence:** Want to know what competitors are doing? Scrape product prices, features, or reviews.
*   **Social Insights:** Analyze discussions on platforms like Reddit to understand public opinion, spot trends, or identify communities interested in specific topics (like *sarmale* vs. *mici*?).
*   **News Aggregation & Monitoring:** Create your own news feed or track mentions of specific keywords across the web.
*   **Understanding Text Data:** Topic modeling helps us automatically discover the hidden themes or subjects within large collections of text, like customer feedback or forum posts.

Let's get our tools ready and start investigating!

## Part 1: Web Scraping with BeautifulSoup - The Static Web Investigator

First, we need to understand the structure of most web pages: **HTML** (Hypertext Markup Language). It's the skeleton of a webpage, using tags like `<html>`, `<head>`, `<body>`, `<h1>`, `<p>`, `<a>`, `<div>`, `<span>`, etc., to organize content.

To parse this structure and extract information from *static* websites (pages whose content is already present in the HTML the server returns, rather than being built afterwards by JavaScript), we'll use a fantastic Python library called **BeautifulSoup**. It helps us navigate the HTML tree like a pro.

Think of BeautifulSoup as your magnifying glass for HTML. 🔎

In [None]:
# Install necessary libraries quietly
!pip install beautifulsoup4 requests pandas --quiet

print("Libraries installed successfully!")

Libraries installed successfully!


In [None]:
# @title Basic HTML Parsing with BeautifulSoup - Romanian News Example

import requests
from bs4 import BeautifulSoup
import pandas as pd # We'll use pandas later

# Let's try scraping headlines from a popular Romanian news site
# Note: Website structures can change! This might need adjustments in the future.
# Always check the website's terms of service regarding scraping.
url = 'https://www.digi24.ro' # Example: Digi24 front page

try:
    response = requests.get(url, headers={'User-Agent': 'Mozilla/5.0'}) # Add a User-Agent header
    response.raise_for_status() # Raise an exception for bad status codes (4xx or 5xx)

    html_content = response.text

    # Parse the HTML content
    soup = BeautifulSoup(html_content, 'html.parser')

    # Peek at the start of the formatted HTML (the full page can be very long!)
    print(soup.prettify()[:1000])

    print(f"Successfully fetched content from {url}")
    # Guard against pages that have no <title> tag
    page_title = soup.title.string if soup.title else 'N/A'
    print(f"Page title: {page_title}")

except requests.exceptions.RequestException as e:
    print(f"Error fetching URL {url}: {e}")
    soup = None # Ensure soup is None if fetching failed

<!DOCTYPE html>
<html lang="ro">
 <head>
  <!-- BEGIN: "FrontendUiMain\View\Helper\WidgetLayoutLayoutHeadAssets"; -->
  <!-- BEGIN Seo HEAD -->
  <title>
   Digi24 - Stiri - Informația la putere!
  </title>
  <meta content="Digi24 aduce în prim plan știri relevante, imparțiale și prezentate cu acuratețe. Digi24.ro iti ofera cele mai noi ştiri interne, externe, economice si politice." name="description"/>
  <link href="https://www.digi24.ro/" rel="canonical"/>
  <link href="https://m.digi24.ro/" media="only screen and (max-width: 980px)" rel="alternate"/>
  <link href="https://m.digi24.ro/" rel="handheld"/>
  <!-- END Seo HEAD -->
  <!-- BEGIN Facebook HEAD -->
  <meta content="Digi24 - Stiri - Informația la putere!" property="og:title"/>
  <meta content="Digi24 aduce în prim plan știri relevante, imparțiale și prezentate cu acuratețe. Digi24.ro iti ofera cele mai noi ştiri interne, externe, economice si politice." property="og:description"/>
  <meta content="website" property="og:type"

Okay, we have the HTML! Now, how do we find the specific pieces of information we want, like news headlines?

This is where browser developer tools come in handy. In most browsers (Chrome, Firefox, Edge), you can right-click on an element (like a headline) and select "Inspect" or "Inspect Element". This will open a panel showing the HTML code for that specific element and its surroundings.

You'll typically look for patterns:
*   Are all headlines inside `<h2>` tags?
*   Do the elements containing headlines have a specific `class` attribute (e.g., `class="article-title"`)?
*   Are they within a larger `<div>` or `<article>` tag that groups related content?

Let's try to identify a pattern for headlines on Digi24 (as of March 2025; this might change!). Main headlines are often in `<h2>` or `<h3>` tags inside `<article>` elements, sometimes with a class such as `article-title`. The selectors in the next cell are a plausible starting point, not a verified description of the page: inspect the live site and adjust them before relying on the results.

In [None]:
# @title Extracting Structured Data - Headlines from Digi24

if soup: # Proceed only if fetching was successful
    headlines_data = []

    # Find all article blocks (adjust selector based on actual inspection)
    # Common patterns: <article>, <div class="teaser">, etc.
    # Let's try finding <article> tags first.
    article_blocks = soup.find_all('article')

    if not article_blocks:
        # If <article> tags don't work, try a common div structure (example)
        article_blocks = soup.find_all('div', class_='article-item') # Adjust class name as needed!

    print(f"Found {len(article_blocks)} potential article blocks.")

    # Iterate through the blocks and extract headlines
    for block in article_blocks:
        # Try finding the first headline tag (h2, h3, or h4) within the block
        headline_tag = block.find(['h2', 'h3', 'h4'])

        # Sometimes headlines are inside links (<a>) within these header tags
        if headline_tag:
            headline_link = headline_tag.find('a')
            if headline_link and headline_link.text.strip():
                headline_text = headline_link.text.strip()
                headline_url = headline_link.get('href', 'No URL found')
            elif headline_tag.text.strip():
                headline_text = headline_tag.text.strip()
                # Fall back to the first link anywhere in the block
                fallback_link = block.find('a')
                headline_url = fallback_link.get('href', 'No URL found') if fallback_link else 'No URL found'
            else:
                continue  # Skip if no text found

            # Make the URL absolute if it's relative
            if headline_url.startswith('/'):
                headline_url = 'https://www.digi24.ro' + headline_url

            headlines_data.append({'Headline': headline_text, 'URL': headline_url})


    # Create a Pandas DataFrame for easier viewing
    if headlines_data:
        headlines_df = pd.DataFrame(headlines_data)
        print("\n--- Extracted Headlines ---")
        display(headlines_df.head()) # Display the first few headlines
    else:
        print("\nCould not extract headlines with the current selectors. Website structure might have changed.")

else:
    print("HTML content not available for parsing.")

Found 71 potential article blocks.

--- Extracted Headlines ---


| | Headline | URL |
|---|---|---|
| 0 | Rusia nu acceptă propunerile SUA de a pune cap... | https://www.digi24.ro/stiri/externe/rusia-nu-a... |
| 1 | Video Ucrainenii au distrus un buncăr plin de... | https://www.digi24.ro/stiri/externe/ucrainenii... |
| 2 | Plecat de 18 ani din România, Cosmin Olăroiu a... | https://www.digisport.ro/fotbal/incep-negocier... |
| 3 | Românii care vor să meargă în Marea Britanie a... | https://www.digi24.ro/stiri/actualitate/romani... |
| 4 | Pensionarii cu venituri mici au primit ajutoru... | https://www.digi24.ro/stiri/actualitate/social... |


### Useful BeautifulSoup Methods Recap

*   `find(tag, attrs={}, class_='...', **kwargs)`: Returns the *first* matching element. Great for unique items.
*   `find_all(tag, attrs={}, class_='...', limit=None, **kwargs)`: Returns a *list* of all matching elements. Perfect for repeating items like headlines, products, or comments.
*   `.text` or `get_text(separator='', strip=True)`: Extracts the human-readable text content from within a tag or tags. `strip=True` is useful for removing extra whitespace.
*   `.get(attribute_name)`: Gets the value of an attribute (e.g., `link_tag.get('href')` to get the URL from an `<a>` tag).
*   `select(css_selector)`: Uses CSS selectors (like in stylesheets) to find elements and returns a list. Very powerful! E.g., `soup.select('div.article > h2.title')` finds `<h2>` tags with class `title` that are *direct children* of `<div>` tags with class `article`; drop the `>` (`div.article h2.title`) to match at any nesting depth.
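
To see the methods above side by side, here is a minimal sketch that runs them on a small HTML string. The snippet, its class names, and its URLs are invented for illustration and are not taken from any real site.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML, made up for this example only
html = """
<div class="article">
  <h2 class="title"><a href="/stiri/one">First headline</a></h2>
  <p>Summary one.</p>
</div>
<div class="article">
  <h2 class="title"><a href="/stiri/two">  Second headline </a></h2>
  <section><h2 class="title">Nested heading</h2></section>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# find() returns only the first match; find_all() returns every match
first_h2 = soup.find("h2", class_="title")
all_h2 = soup.find_all("h2", class_="title")

# get_text(strip=True) removes surrounding whitespace; .get() reads an attribute
first_text = first_h2.get_text(strip=True)
first_href = first_h2.find("a").get("href")

# '>' matches direct children only; a space matches any descendant
direct_children = soup.select("div.article > h2.title")
descendants = soup.select("div.article h2.title")

print(first_text, first_href)
print(len(all_h2), len(direct_children), len(descendants))
```

The nested heading sits inside a `<section>`, so the descendant selector picks it up while the child selector (`>`) does not.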

### Exercise 1: Timișoara Weather Forecaster 🌦️

**Goal:** Scrape the monthly weather forecast for Timișoara from a weather website using BeautifulSoup.

**Website:** `https://www.accuweather.com/` (Let's use AccuWeather for this example - it often has a clear monthly view. *Note: Check the URL validity and website structure before running.*)

**Tasks:**

1.  **Fetch and Parse:** Get the HTML content of the April forecast page for Timișoara.
2.  **Extract Daily Data:** For each day listed on the page, extract:
    *   The day of the month (e.g., "1", "15", "31").
    *   The maximum predicted temperature.
    *   The minimum predicted temperature.
    *   The general weather description (e.g., 'Însorit', 'Parțial noros', 'Ploaie' - this might be in text or associated with an icon, perhaps in the `title` or `alt` attribute of an image, or a specific `<span>`). *Focus on finding textual descriptions.*
3.  **Handle Temperatures:** Ensure temperatures are stored in Celsius. AccuWeather usually shows Celsius for the Romanian version, but if your scraper shows Fahrenheit, convert using `C = (F - 32) * 5 / 9`. *Be prepared for non-numeric values like '--' if data is missing and handle them (e.g., store as `NaN`).*
4.  **Create DataFrame:** Store the extracted data in a Pandas DataFrame with columns: `"day"`, `"max_temp_c"`, `"min_temp_c"`, `"weather_description"`.
5.  **Analysis:**
    *   Calculate and display the average *maximum* temperature for the days scraped.
    *   Calculate and display the average *minimum* temperature for the days scraped.
    *   Find and display the day(s) with the lowest minimum temperature.
    *   Count and display how many upcoming days (from today onwards) are predicted to have 'Ploaie' (Rain) or similar rain-indicating terms in their description. (You might need the current date for this).

**Hints:**
*   Use "Inspect Element" heavily! Look for repeating elements (like `div` or `a` tags) that contain the daily forecast data. Find a common class or structure.
*   The `select()` method might be useful if daily blocks share a common CSS class pattern.
*   Temperature might be inside specific `span` tags. The description might be in a `span` or associated with an `img` tag's `alt`/`title` attribute.
*   Use `try-except` blocks when converting temperatures to numbers to handle potential errors (like '--').
*   Use `pd.to_numeric` with `errors='coerce'` for robust temperature conversion in the DataFrame.
*   Import the `datetime` module to get the current day for the rain forecast analysis.
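
As a starting point for the temperature handling, the sketch below applies the conversion formula from Task 3 and the `try-except` hint using only the standard library. The helper names (`fahrenheit_to_celsius`, `parse_temp`) and the sample strings are assumptions made for illustration; the text you actually scrape may be formatted differently.

```python
import math

def fahrenheit_to_celsius(f):
    """Apply C = (F - 32) * 5 / 9."""
    return (f - 32) * 5 / 9

def parse_temp(raw, unit="C"):
    """Turn a scraped string such as '68°' or '--' into a Celsius float (NaN if missing)."""
    cleaned = raw.replace("°", "").replace("C", "").replace("F", "").strip()
    try:
        value = float(cleaned)
    except ValueError:
        # Placeholders like '--' cannot be parsed, so record them as NaN
        return math.nan
    return fahrenheit_to_celsius(value) if unit == "F" else value

# Hypothetical scraped values, assumed here to be in Fahrenheit
samples = ["68°", "41°", "--"]
celsius = [parse_temp(s, unit="F") for s in samples]
print(celsius)
```

Once the values are in a DataFrame, `pd.to_numeric(..., errors='coerce')` achieves the same NaN handling column-wise.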

In [None]:
# Write your code below


## Part 2: Handling Dynamic Websites with Playwright - The Interactive Investigator

BeautifulSoup + Requests are great for static pages. But what about websites where content loads *after* the initial page load (using JavaScript), or sites that require you to click buttons, scroll down ("infinite scroll"), or fill forms?

For these **dynamic websites**, we need a tool that can actually control a web browser programmatically. Enter **Playwright**!

Playwright allows our Python script to:
*   Launch real browsers (Chromium, Firefox, WebKit) - even headlessly (without a visible window).
*   Navigate to pages.
*   Wait for specific elements or events to happen.
*   Click buttons, fill input fields, hover over elements.
*   Execute JavaScript within the page context.
*   Take screenshots.
*   Get the HTML *after* JavaScript has done its magic.

Think of Playwright as giving your script the ability to use a browser just like a human would, but much faster and automatically.

In [None]:
# @title Install Playwright and its browsers
!pip install playwright pandas nest_asyncio --quiet
!playwright install --with-deps # Installs browsers (Chromium, Firefox, WebKit) and their dependencies
# The '--with-deps' flag is important on Linux environments like Colab

print("Playwright and browsers installed.")


In [None]:
# @title Basic Playwright Functionality (Async)

# Playwright often uses Python's asyncio for non-blocking operations.
# nest_asyncio allows running asyncio event loops within environments like Jupyter/Colab.
import asyncio
import nest_asyncio
from playwright.async_api import async_playwright

nest_asyncio.apply()

async def run_basic_playwright():
    pw = None
    browser = None
    try:
        print("Launching Playwright...")
        pw = await async_playwright().start()
        # Launch Chromium (you can also use pw.firefox.launch() or pw.webkit.launch())
        # headless=False makes the browser window visible (useful for debugging)
        # headless=True runs it in the background (typical for automation)
        browser = await pw.chromium.launch(headless=True)
        print("Browser launched.")

        page = await browser.new_page()
        print("Navigating to Playwright's website...")
        await page.goto("https://playwright.dev/python/", timeout=60000) # Increased timeout
        print("Page loaded.")

        # Get the page title
        title = await page.title()
        print(f"Page Title: {title}")

        # Get the full HTML content AFTER any potential JS execution
        content = await page.content()
        print("\n--- Page Content Snippet ---")
        print(content[:500] + "...") # Print the first 500 chars

    except Exception as e:
        print(f"An error occurred: {e}")
    finally:
        if browser:
            await browser.close()
            print("Browser closed.")
        if pw:
            await pw.stop()
            print("Playwright stopped.")

# Run the async function
asyncio.run(run_basic_playwright())

Launching Playwright...
Browser launched.
Navigating to Playwright's website...
Page loaded.
Page Title: Fast and reliable end-to-end testing for modern web apps | Playwright Python

--- Page Content Snippet ---
<!DOCTYPE html><html lang="en" dir="ltr" class="plugin-pages plugin-id-default" data-has-hydrated="true" data-theme="light" data-rh="lang,dir,class,data-has-hydrated"><head><meta charset="UTF-8"><meta name="generator" content="Docusaurus v3.7.0"><title>Fast and reliable end-to-end testing for modern web apps | Playwright Python</title><meta data-rh="true" name="viewport" content="width=device-width, initial-scale=1.0"><link data-rh="true" rel="icon" href="/python/img/playwright-logo.svg"><link r...
Browser closed.
Playwright stopped.


In [None]:
# @title Playwright Interaction Example: Searching on eMAG

import asyncio
import nest_asyncio
from playwright.async_api import async_playwright
import pandas as pd

nest_asyncio.apply()

async def search_emag(search_term="laptop", max_items=5):
    pw = None
    browser = None
    results = []
    print(f"Starting eMAG search for: '{search_term}'")
    try:
        pw = await async_playwright().start()
        browser = await pw.chromium.launch(headless=True) # Run headless
        page = await browser.new_page()

        # Go to eMAG's homepage (adjust URL if needed)
        print("Navigating to eMAG...")
        await page.goto("https://www.emag.ro/", timeout=90000) # Longer timeout for potentially slow sites
        print("eMAG homepage loaded.")

        # Find the search input, fill it, and press Enter
        search_input_selector = 'input#searchboxTrigger' # Selector for search input (Inspect to confirm!)
        print(f"Filling search input: '{search_input_selector}'")
        await page.fill(search_input_selector, search_term, timeout=60000)

        print("Pressing Enter...")
        await page.press(search_input_selector, 'Enter')

        # Wait for search results to load - IMPORTANT!
        # We need a selector that identifies the container of the product cards
        results_container_selector = 'div#card_grid' # Adjust selector based on inspection!
        print(f"Waiting for results container: '{results_container_selector}'")
        await page.wait_for_selector(results_container_selector, state='visible', timeout=90000)
        print("Search results page loaded.")

        # Find product cards (adjust selector)
        product_card_selector = 'div.card-item.card-standard' # Adjust!
        print(f"Looking for product cards: '{product_card_selector}'")
        product_cards = await page.query_selector_all(product_card_selector)
        print(f"Found {len(product_cards)} product cards on the first page.")

        # Extract data from the first few cards
        for i, card in enumerate(product_cards):
            if i >= max_items:
                break

            title = "N/A"
            price = "N/A"
            url = "#"

            # Extract title (adjust selector)
            title_tag = await card.query_selector('a.card-v2-title') # Adjust!
            if title_tag:
                title = await title_tag.inner_text()
                url = await title_tag.get_attribute('href')
                # Make URL absolute if necessary
                if url and not url.startswith('http'):
                    url = f"https://www.emag.ro{url}"


            # Extract price (adjust selector - often complex with different parts)
            price_tag = await card.query_selector('p.product-new-price') # Adjust!
            if price_tag:
                # inner_text() returns the visible text; split() + join collapses newlines and repeated spaces
                price_text = await price_tag.inner_text()
                price = ' '.join(price_text.split()) if price_text else "N/A"


            results.append({'Title': title.strip(), 'Price': price, 'URL': url})
            print(f"  - Scraped: {title.strip()} - {price}")


    except Exception as e:
        print(f"An error occurred during eMAG search: {e}")
        # Optional: Take a screenshot on error for debugging
        # if page: await page.screenshot(path='error_screenshot.png')

    finally:
        if browser:
            await browser.close()
            print("Browser closed.")
        if pw:
            await pw.stop()
            print("Playwright stopped.")

    return results

# Run the search and display results
search_results = asyncio.run(search_emag(search_term="procesor AMD", max_items=5))

if search_results:
    emag_df = pd.DataFrame(search_results)
    print("\n--- eMAG Search Results ---")
    display(emag_df)
else:
    print("\nNo results scraped from eMAG.")

Starting eMAG search for: 'procesor AMD'
Navigating to eMAG...
eMAG homepage loaded.
Filling search input: 'input#searchboxTrigger'
Pressing Enter...
Waiting for results container: 'div#card_grid'
Search results page loaded.
Looking for product cards: 'div.card-item.card-standard'
Found 60 product cards on the first page.
  - Scraped: Procesor AMD Ryzen™ 7 9800X3D, 104MB, 4.7/5.2GHz Boost, Socket AM5, Radeon Graphics - 2.869,99 Lei
  - Scraped: Procesor AMD Ryzen™ 7 5700X, 36MB, 4.6GHz, Socket AM4 - 835,99 Lei
  - Scraped: Procesor AMD Ryzen™ 5 4500, 4.1GHz, 11MB, socket AM4, Box - 304,99 Lei
  - Scraped: Procesor AMD Ryzen™ 5 5500, 4.2GHz, 19MB, socket AM4, BOX - 456,03 Lei
  - Scraped: Procesor AMD Ryzen™ 5 3600, 35MB, 3.6GHz/4.2GHz Boost, Socket AM4, Wraith Spire Cooler - 362,93 Lei
Browser closed.
Playwright stopped.

--- eMAG Search Results ---


| | Title | Price | URL |
|---|---|---|---|
| 0 | Procesor AMD Ryzen™ 7 9800X3D, 104MB, 4.7/5.2G... | 2.869,99 Lei | https://www.emag.ro/procesor-amd-ryzentm-7-980... |
| 1 | Procesor AMD Ryzen™ 7 5700X, 36MB, 4.6GHz, Soc... | 835,99 Lei | https://www.emag.ro/procesor-amd-ryzentm-7-570... |
| 2 | Procesor AMD Ryzen™ 5 4500, 4.1GHz, 11MB, sock... | 304,99 Lei | https://www.emag.ro/procesor-amd-ryzentm-5-450... |
| 3 | Procesor AMD Ryzen™ 5 5500, 4.2GHz, 19MB, sock... | 456,03 Lei | https://www.emag.ro/procesor-amd-ryzentm-5-550... |
| 4 | Procesor AMD Ryzen™ 5 3600, 35MB, 3.6GHz/4.2GH... | 362,93 Lei | https://www.emag.ro/procesor-amd-ryzentm-5-360... |


### Exercise 2: Dynamic Data Challenge (Choose A or B)

Now it's your turn to tackle a dynamic website! Choose **one** of the following exercises.

**Option A: Goodreads Top Books Scraper**

**Goal:** Scrape the "Best Books Ever" list from Goodreads, using Playwright to handle pagination. Changing the page URLs and parsing with BeautifulSoup *sometimes* works too, but Playwright is more robust if JavaScript is involved in loading or navigation.

**Website:** `https://www.goodreads.com/list/show/1.Best_Books_Ever`

**Tasks:**

1.  **Navigate & Scrape Pages:** Use Playwright to navigate through the first 3 pages of the list (each page usually contains 100 books, so this gets you the top 300). You'll need to find the "next page" button/link and simulate clicks, waiting for the next page to load each time.
2.  **Extract Book Data:** For each book on these pages, scrape:
    *   **Title:** The title of the book.
    *   **Author:** The author's name.
    *   **Ranking:** The book's rank on the list (e.g., "1", "150", "299"). This might be implicit in the order or explicitly listed.
    *   **Average Rating:** The average user rating (e.g., "4.35 avg rating"). Extract the number.
    *   **Score:** The score assigned by Goodreads voters (e.g., "score: 3,941,839"). Extract the number.
    *   **Description:** The full text of the book's plot description.
3.  **Create DataFrame:** Store all the collected information (for ~300 books) in a Pandas DataFrame.
4.  **Analysis:**
    *   Display the book with the highest *average rating* and its description.
    *   Display the book with the highest *score*. Are they the same book?

**Hints:**
*   Identify the CSS selectors for the book container, title, author, rating, score, and the "next page" link/button.
*   Use `page.click()` for pagination and `page.wait_for_load_state('networkidle')` or `page.wait_for_selector()` to ensure the next page's content is loaded before scraping.
*   Use loops to iterate through pages and books.
*   Clean the extracted text (e.g., remove "avg rating", "score:", commas from numbers) before converting to numeric types.
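
The cleaning step in the last hint can be sketched with a regular expression. The helper name `extract_number` is an assumption for illustration, and the two input strings are the examples quoted in the task list above; the text on the live page may carry extra words around them.

```python
import re

def extract_number(text):
    """Return the first number in `text` as a float, ignoring thousands separators; None if absent."""
    match = re.search(r"\d[\d,]*(?:\.\d+)?", text)
    if match is None:
        return None
    return float(match.group().replace(",", ""))

rating = extract_number("4.35 avg rating")
score = extract_number("score: 3,941,839")
print(rating, score)
```

Returning `None` when no number is found lets you spot rows where a selector matched the wrong element, instead of silently storing a zero.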

---

**Option B: Real-Time Stock Tracker (Yahoo Finance)**

**Goal:** Use Playwright to get near real-time stock information for major tech companies from Yahoo Finance, which heavily relies on JavaScript for updating prices.

**Website:** `https://finance.yahoo.com/` (You'll navigate to specific ticker pages)

**Tickers:** `AAPL` (Apple), `GOOGL` (Alphabet/Google), `MSFT` (Microsoft), `AMZN` (Amazon), `TSLA` (Tesla)

**Tasks:**

1.  **Navigate & Extract:** For each ticker symbol:
    *   Navigate to its specific Yahoo Finance page (e.g., `https://finance.yahoo.com/quote/AAPL`).
    *   Wait for the main price information to load (it might update dynamically).
    *   Extract the **Current Price**. (Hint: The selector `fin-streamer[data-test="qsp-price"]` is a good starting point, but *verify* it with Inspect Element, as attributes can change).
    *   Extract the **Market Change (Absolute Value)** (e.g., "+2.91" or "-1.50").
    *   Extract the **Market Change (Percentage)** (e.g., "+1.05%" or "-0.88%").
    *   Record the **Date and Time** of the reading.
2.  **Repeat Readings:** Perform the extraction process 5 times for each ticker, perhaps with a small delay (e.g., 10-15 seconds) between readings for the *same* ticker to potentially capture minor fluctuations (Note: If the market is closed, readings will be identical).
3.  **Create DataFrame:** Store all readings (5 tickers * 5 readings = 25 rows) in a Pandas DataFrame with columns: `"Ticker"`, `"Price ($)"`, `"Change ($)"`, `"Change (%)"`, `"Timestamp"`.
4.  **Analysis:**
    *   Display the final DataFrame.
    *   For each ticker, calculate the difference between the first and last price reading you captured.

**Hints:**
*   Use f-strings to construct the URLs for each ticker: `f"https://finance.yahoo.com/quote/{ticker}"`.
*   Yahoo Finance uses specific attributes (like `data-field`, `data-symbol`, `data-test`) on elements, especially `<fin-streamer>`. Use Inspect Element carefully to find reliable selectors for the price and change values. `page.locator()` is very useful here.
*   Use `page.wait_for_selector()` to ensure the elements you need are present before trying to extract text.
*   Use `datetime.now()` from the `datetime` module to get the timestamp for each reading.
*   Use `asyncio.sleep()` for delays between readings if needed.
*   Clean the extracted text (remove '+', '%', '$', commas) and convert to numeric types.
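
Below is a minimal sketch of the cleaning and timestamping steps, followed by the first-versus-last price difference from the Analysis task. The raw strings are hypothetical stand-ins for what a scraper might return, and `to_number` is a helper name invented for this example.

```python
from datetime import datetime

def to_number(text):
    """Strip '+', '%', '$', commas, and parentheses, then convert to float."""
    cleaned = text.strip().replace("\u2212", "-")  # normalize a Unicode minus sign, if present
    for ch in "+%$,()":
        cleaned = cleaned.replace(ch, "")
    return float(cleaned)

# Hypothetical raw strings; real values come from the page via Playwright
raw_readings = [
    {"Ticker": "AAPL", "price": "1,234.50", "change": "+2.91", "pct": "(+1.05%)"},
    {"Ticker": "AAPL", "price": "1,233.00", "change": "-1.50", "pct": "(-0.88%)"},
]

readings = [
    {
        "Ticker": r["Ticker"],
        "Price ($)": to_number(r["price"]),
        "Change ($)": to_number(r["change"]),
        "Change (%)": to_number(r["pct"]),
        "Timestamp": datetime.now(),  # time the reading was recorded
    }
    for r in raw_readings
]

# Difference between the last and the first captured price for this ticker
price_diff = readings[-1]["Price ($)"] - readings[0]["Price ($)"]
print(price_diff)
```

A list of dictionaries like `readings` can be passed straight to `pd.DataFrame(...)` to get the columns requested in Task 3.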

In [None]:
# @title Exercise 2: Code Implementation (Choose A or B)
# Write your code below
