# In-Class Assignment: Web Scraping for Economic Data

**Course:** Data Science for Economists  
**Topic:** HTML Parsing with BeautifulSoup

## Objective
In this assignment, you will apply the web scraping techniques learned in Unit 10.
Your goal is to parse raw HTML representing economic news and data, extract specific pieces of information, and organize them into a usable format (a Pandas DataFrame) for analysis.

### Prerequisites
Ensure you have `beautifulsoup4` and `pandas` installed.

```bash
pip install beautifulsoup4 pandas
```

In [None]:
from bs4 import BeautifulSoup
import pandas as pd

## Part 1: Parsing Economic News Snippets

Economists often scrape news aggregators to build sentiment indices or track policy announcements. Below is a raw HTML string representing a news card you might find on a financial news site.

In [None]:
html_news = """
<div class="news-card">
    <h2 class="headline">Central Bank Raises Interest Rates by 0.25%</h2>
    <p class="date">October 24, 2024</p>
    <div class="summary">
        <p>In a move to combat sticky inflation, the policy committee voted unanimously to hike rates. 
        Analysts predict this will impact <a href="/markets/housing" class="topic">housing markets</a> and 
        <a href="/markets/bonds" class="topic">bond yields</a> significantly.</p>
    </div>
    <span class="author">By: Dr. J. Yellen</span>
</div>
"""

### Question 1: The Soup Object
Create a `BeautifulSoup` object from the `html_news` string.
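
As a reminder of the general pattern, here is a minimal sketch on a different, made-up snippet (`sample_html` and `sample_soup` are illustrative names, not part of the assignment). It assumes Python's built-in `"html.parser"` backend; other parsers such as `lxml` would also work if installed.

```python
from bs4 import BeautifulSoup

# Hypothetical snippet for illustration only (not the assignment's html_news)
sample_html = "<div class='note'><p>Retail sales rose in March.</p></div>"

# The second argument names the parser; "html.parser" ships with Python
sample_soup = BeautifulSoup(sample_html, "html.parser")

# The soup can now be navigated like a tree of tags
print(type(sample_soup).__name__)
print(sample_soup.p.get_text())
```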

In [None]:
# Your code here


### Question 2: Extracting Key Info
Extract and print the **Headline** text and the **Date** from the soup object.
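
One way to pull a single element's text is `find()` with a tag name and a CSS class, followed by `get_text()`. The sketch below uses a made-up press-release snippet; the tag names, class names, and variable names are assumptions for illustration and differ from those in `html_news`.

```python
from bs4 import BeautifulSoup

# Hypothetical snippet for illustration only
sample_html = """
<article class="release">
    <h3 class="title">Jobs Report Published</h3>
    <span class="published">  March 8, 2024 </span>
</article>
"""
sample_soup = BeautifulSoup(sample_html, "html.parser")

# class_ has a trailing underscore because `class` is a reserved word in Python
title = sample_soup.find("h3", class_="title").get_text(strip=True)

# strip=True removes the leading and trailing whitespace around the text
published = sample_soup.find("span", class_="published").get_text(strip=True)

print(title)
print(published)
```

Note that `find()` returns `None` when nothing matches, so on real pages it is worth checking the result before calling `get_text()`.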

In [None]:
# Your code here


### Question 3: Identifying Topics
Find all the links (`<a>` tags) within the news card. Extract the text from these links to see what economic topics are tagged in the article.

**Expected Output:** `['housing markets', 'bond yields']`
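
For comparison, a minimal sketch of the same idea on a made-up tag list: `find_all()` returns every matching tag, and a list comprehension collects the text (or an attribute such as `href`) from each one. All names and the HTML here are hypothetical.

```python
from bs4 import BeautifulSoup

# Hypothetical snippet for illustration only
sample_html = """
<ul class="tags">
    <li><a href="/topics/trade" class="tag">trade policy</a></li>
    <li><a href="/topics/labor" class="tag">labor markets</a></li>
</ul>
"""
sample_soup = BeautifulSoup(sample_html, "html.parser")

# find_all returns a list-like collection of every <a> tag in the document
links = sample_soup.find_all("a")

# Collect the visible text of each link
tag_texts = [link.get_text(strip=True) for link in links]

# Attributes are read with dictionary-style indexing
tag_urls = [link["href"] for link in links]

print(tag_texts)
print(tag_urls)
```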

In [None]:
# Your code here


## Part 2: From Unstructured HTML to Structured Data

A common task for economists working with data is pulling tables of indicators from government web pages (for example, at the BLS or Census Bureau) when the figures you need are not available through a clean API.

Below is HTML representing a simple table of economic indicators for various countries.

In [None]:
html_table = """
<table id="economic_indicators">
  <thead>
    <tr>
      <th>Country</th>
      <th>GDP_Trillions_USD</th>
      <th>Inflation_Rate_Pct</th>
      <th>Unemployment_Rate_Pct</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>USA</td>
      <td>26.95</td>
      <td>3.7</td>
      <td>3.8</td>
    </tr>
    <tr>
      <td>China</td>
      <td>17.70</td>
      <td>0.1</td>
      <td>5.2</td>
    </tr>
    <tr>
      <td>Japan</td>
      <td>4.23</td>
      <td>3.2</td>
      <td>2.7</td>
    </tr>
    <tr>
      <td>Germany</td>
      <td>4.43</td>
      <td>6.1</td>
      <td>5.7</td>
    </tr>
  </tbody>
</table>
"""

### Question 4: Parsing the Table Headers
Create a new soup object for `html_table`. Extract the column names from the `<thead>` section and store them in a list called `headers`.

*Hint: Look for `<th>` tags.*
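
The hint can be sketched on a smaller, made-up table: locate the `<thead>` element first, then collect the text of each `<th>` inside it. The table contents and the names `sample_table_html`, `table_soup`, and `column_names` are assumptions for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical two-column table for illustration only
sample_table_html = """
<table>
  <thead><tr><th>Region</th><th>Population_Millions</th></tr></thead>
  <tbody>
    <tr><td>North</td><td>12.5</td></tr>
    <tr><td>South</td><td>8.1</td></tr>
  </tbody>
</table>
"""
table_soup = BeautifulSoup(sample_table_html, "html.parser")

# Restrict the search to <thead> so only header cells are collected
header_section = table_soup.find("thead")
column_names = [th.get_text(strip=True) for th in header_section.find_all("th")]

print(column_names)
```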

In [None]:
# Your code here


### Question 5: Parsing the Data Rows
Iterate through the `<tbody>` rows (`<tr>`). For each row, extract the data cells (`<td>`), get their text, and store the row as a list. 

Accumulate these lists into a master list called `data`.
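
A rough sketch of this nested loop on a made-up table follows (the HTML and the names `table_soup` and `rows` are hypothetical). The outer loop walks the `<tr>` rows in `<tbody>`, and the inner comprehension gathers the `<td>` text for one row. Every value comes back as a string at this stage.

```python
from bs4 import BeautifulSoup

# Hypothetical table for illustration only
sample_table_html = """
<table>
  <thead><tr><th>Region</th><th>Population_Millions</th></tr></thead>
  <tbody>
    <tr><td>North</td><td>12.5</td></tr>
    <tr><td>South</td><td>8.1</td></tr>
  </tbody>
</table>
"""
table_soup = BeautifulSoup(sample_table_html, "html.parser")

rows = []
for tr in table_soup.find("tbody").find_all("tr"):
    # One inner list per table row, one string per cell
    cells = [td.get_text(strip=True) for td in tr.find_all("td")]
    rows.append(cells)

print(rows)
```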

In [None]:
# Your code here


### Question 6: The "Data Science" Finish
1. Use your `headers` list and `data` list to create a Pandas DataFrame.
2. Ensure the numerical columns are actually stored as numbers (floats), not strings.
3. Calculate and print the average **Inflation Rate** across these four countries.
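
These three steps can be sketched on made-up data as follows. The column names and values are hypothetical, and `pd.to_numeric` is one reasonable choice for the conversion; `astype(float)` would be another.

```python
import pandas as pd

# Hypothetical scraped output: every value starts out as a string
column_names = ["Region", "Population_Millions", "Growth_Pct"]
rows = [["North", "12.5", "1.2"], ["South", "8.1", "0.8"]]

# Step 1: build the DataFrame from the list of rows and the header list
df = pd.DataFrame(rows, columns=column_names)

# Step 2: convert the numeric columns from strings to floats
numeric_cols = ["Population_Millions", "Growth_Pct"]
df[numeric_cols] = df[numeric_cols].apply(pd.to_numeric)

# Step 3: compute a summary statistic on a converted column
mean_growth = df["Growth_Pct"].mean()

print(df.dtypes)
print(mean_growth)
```

Checking `df.dtypes` after the conversion is a quick way to confirm that the columns are numeric before computing averages.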

In [None]:
# Your code here
