## Q1. What is Web Scraping? Why is it Used? Give Three Use Cases

**Web Scraping** is the automated process of extracting data from websites, typically by fetching pages over HTTP and parsing the returned HTML for the fields of interest.

### ✅ Why is it Used?
- To collect data that is not readily available via APIs.
- To analyze market trends, competitors, or public opinion.
- To automate repetitive data collection tasks.

### ✅ Use Cases:
1. **Price Comparison Websites** – Scraping product prices from e-commerce platforms.
2. **News Aggregators** – Collecting news articles from various news sources.
3. **Real Estate** – Gathering property listings and price trends from property websites.

## Q2. Methods Used for Web Scraping

### ✅ Common Methods for Web Scraping:
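1. **Using HTTP Libraries** (e.g., `requests`, `http.client`): fetch the raw HTML of a page.
2. **Using HTML Parsers** (e.g., `BeautifulSoup`, `lxml`): extract elements and text from the fetched HTML.
3. **Browser Automation** (e.g., `Selenium`, `Playwright`): drive a real browser so that pages which build their content with JavaScript can be scraped after rendering.
4. **Headless Browsers** (e.g., `Puppeteer` for Node.js): run a browser without a visible window, which suits servers and scheduled jobs.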
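
The first two methods are usually combined: one library fetches the page and another parses it. The sketch below illustrates that split using only the standard library, so it is not the project's actual code. The names `fetch_html`, `LinkCollector`, `extract_links`, and `SAMPLE_HTML`, as well as the `User-Agent` string, are assumptions made for illustration. The fetch function is defined but not called, so the example runs without network access.

```python
import http.client
from html.parser import HTMLParser


def fetch_html(host, path="/"):
    """Fetch a page over HTTPS with http.client (method 1: HTTP library)."""
    conn = http.client.HTTPSConnection(host, timeout=10)
    try:
        # Hypothetical User-Agent; many sites expect one to be set.
        conn.request("GET", path, headers={"User-Agent": "example-scraper/0.1"})
        response = conn.getresponse()
        if response.status != 200:
            raise RuntimeError(f"unexpected status {response.status}")
        return response.read().decode("utf-8", errors="replace")
    finally:
        conn.close()


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag (method 2: HTML parser)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def extract_links(html):
    """Return all link targets found in an HTML string."""
    collector = LinkCollector()
    collector.feed(html)
    return collector.links


# Inline sample page so the parsing step can be tried offline.
SAMPLE_HTML = (
    '<html><body><a href="/a">A</a><a href="/b">B</a>'
    "<a>no href</a></body></html>"
)
links = extract_links(SAMPLE_HTML)
print(links)  # ['/a', '/b']
```

Against a live site, the two steps would be chained as `extract_links(fetch_html("example.com"))`, subject to that site's terms of use and `robots.txt`.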

## Q3. What is Beautiful Soup? Why is it Used?

**Beautiful Soup** (imported from the `bs4` package as `BeautifulSoup`) is a Python library used for parsing HTML and XML documents.

### ✅ Why is it Used?
- To navigate, search, and modify the parse tree without writing a parser by hand.
- To access HTML elements through Pythonic methods and attributes (e.g., `find()`, `find_all()`, `soup.h1`).

### ✅ Example:

```python
from bs4 import BeautifulSoup

html = "<html><body><h1>Hello</h1></body></html>"
soup = BeautifulSoup(html, 'html.parser')
print(soup.h1.text)  # Output: Hello
```
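
The `find()` and `find_all()` methods mentioned above can be sketched on a slightly larger document. The HTML snippet, its class names (`item`, `name`, `price`), and the variable names are made up for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical product list used only for this example.
html = """
<ul id="products">
  <li class="item"><span class="name">Pen</span> <span class="price">1.50</span></li>
  <li class="item"><span class="name">Notebook</span> <span class="price">3.20</span></li>
</ul>
"""
soup = BeautifulSoup(html, "html.parser")

# find() returns the first matching element (or None if there is no match).
first_item = soup.find("li", class_="item")
first_name = first_item.find("span", class_="name").get_text()

# find_all() returns every matching element as a list.
prices = [float(tag.get_text()) for tag in soup.find_all("span", class_="price")]

print(first_name)  # Pen
print(prices)      # [1.5, 3.2]
```

Because `find()` returns `None` when nothing matches, real scraping code usually checks the result before calling methods on it.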


## Q4. Why is Flask Used in This Web Scraping Project?

**Flask** is used as a lightweight web framework to:
- Build a simple interface to trigger web scraping.
- Display the scraped data on a web page.
- Provide endpoints (APIs) that users or systems can query.

Flask acts as a bridge between the backend scraping logic and the frontend display.
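
A minimal sketch of that bridge, assuming a single JSON endpoint. The route `/scrape`, the helper `scrape_headings`, and the inline `SAMPLE_HTML` are hypothetical, and a real project would fetch a live page instead of using a fixed string.

```python
from bs4 import BeautifulSoup
from flask import Flask, jsonify

app = Flask(__name__)

# Stand-in for a fetched page, so the example runs without network access.
SAMPLE_HTML = "<html><body><h1>Hello</h1><h1>World</h1></body></html>"


def scrape_headings(html):
    """Backend scraping logic: return the text of every <h1> element."""
    soup = BeautifulSoup(html, "html.parser")
    return [h.get_text(strip=True) for h in soup.find_all("h1")]


@app.route("/scrape")
def scrape():
    """Endpoint that triggers the scrape and returns the result as JSON."""
    return jsonify(headings=scrape_headings(SAMPLE_HTML))


if __name__ == "__main__":
    # Development server only; a deployment would sit behind a WSGI server.
    app.run(debug=True)
```

With the development server running, a request to `/scrape` returns the scraped headings as JSON. The same function could instead feed an HTML template to display the data on a web page.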

## Q5. AWS Services Used in This Project and Their Purpose

Here are common AWS services used in web scraping projects:

1. **EC2 (Elastic Compute Cloud)**:
   - Used to host the Python web scraping script and Flask app.
   - Provides scalable compute power in the cloud.

2. **S3 (Simple Storage Service)**:
   - Stores the scraped data as files (CSV, JSON, etc.).
   - Used for backup or further processing.

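3. **Lambda (Optional)**:
   - Used for serverless execution of short scraping scripts, typically triggered on a schedule (for example by an EventBridge rule).
   - Removes the need to keep a server running 24/7, but each invocation is limited to 15 minutes.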

4. **RDS (Relational Database Service)** or **DynamoDB**:
   - Used to store structured scraped data for querying and analytics.
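
As an illustration of the S3 step, the sketch below serializes scraped records to CSV and JSON and uploads both. The function names, bucket name, and key prefix are assumptions for the example. The S3 client is passed in as a parameter, so the code only relies on the client exposing `put_object`, which a `boto3` S3 client does.

```python
import csv
import io
import json


def records_to_csv(records, fieldnames):
    """Serialize a list of dicts to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


def upload_scraped_data(s3_client, bucket, key_prefix, records):
    """Upload records as <key_prefix>.csv and <key_prefix>.json.

    s3_client is expected to behave like boto3.client("s3").
    Returns the list of object keys that were written.
    """
    fieldnames = list(records[0].keys())
    csv_key = f"{key_prefix}.csv"
    json_key = f"{key_prefix}.json"

    s3_client.put_object(
        Bucket=bucket,
        Key=csv_key,
        Body=records_to_csv(records, fieldnames).encode("utf-8"),
        ContentType="text/csv",
    )
    s3_client.put_object(
        Bucket=bucket,
        Key=json_key,
        Body=json.dumps(records).encode("utf-8"),
        ContentType="application/json",
    )
    return [csv_key, json_key]
```

On an EC2 instance or in a Lambda function with suitable IAM permissions, this could be called as `upload_scraped_data(boto3.client("s3"), "my-scraper-bucket", "runs/latest", records)`, where the bucket name and key prefix are placeholders.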