# Module 29 Web Scraping Assignment

Q1. What is web scraping? Why is it used? Give three areas where web scraping is used to get data.

A1.

### What is Web Scraping?
Web scraping is an automated technique for extracting data from websites. It involves sending HTTP requests to web pages, extracting the relevant information from the returned HTML, and saving it in a structured format (CSV, JSON, a database, etc.).
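The request, extract and save steps can be sketched with BeautifulSoup and Python's `csv` module. To keep the sketch self-contained, a hardcoded HTML string (hypothetical markup with `item`, `name` and `price` classes) stands in for the page a request would return; in a real scraper that string would come from something like `requests.get(url).text`.

```python
import csv
import io

from bs4 import BeautifulSoup

# Step 1 (simulated): hypothetical HTML standing in for a downloaded page.
html = """
<ul>
  <li class="item"><span class="name">Pen</span><span class="price">1.50</span></li>
  <li class="item"><span class="name">Notebook</span><span class="price">3.25</span></li>
</ul>
"""

# Step 2: parse the HTML and extract the relevant fields from each item.
soup = BeautifulSoup(html, "html.parser")
rows = [
    {
        "name": item.select_one(".name").get_text(strip=True),
        "price": float(item.select_one(".price").get_text(strip=True)),
    }
    for item in soup.select("li.item")
]

# Step 3: save the records in a structured format (CSV, written to memory here).
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["name", "price"])
writer.writeheader()
writer.writerows(rows)
print(buffer.getvalue())
```

Writing to a file instead of memory only means replacing `io.StringIO()` with `open("products.csv", "w", newline="")`.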


### Why is Web Scraping Used?
1.) To collect large amounts of data quickly.

2.) To automate repeated data extraction from websites whose content changes frequently.

3.) To gather real-time or historical data for analysis.


### Three Areas Where Web Scraping is Used:

**1. E-commerce Price Monitoring**

a.) Scraping product prices from Amazon, Flipkart, etc.

b.) Helps businesses compare competitor pricing.


**2. Job Market Analysis**

a.) Extracting job listings from LinkedIn, Indeed, etc.

b.) Helps in tracking trends and salary insights.


**3. News & Sentiment Analysis**

a.) Scraping news articles from CNN, BBC, etc.

b.) Provides text for sentiment analysis, e.g., for stock market prediction and for tracking trending topics.
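For the first area, price monitoring, a minimal sketch of the comparison step might look like this. The two store pages are hypothetical HTML snippets (assumed to mark the price with a `price` class); a real monitor would fetch each competitor's product page and would need selectors matching that site's markup.

```python
from bs4 import BeautifulSoup

# Hypothetical product-page snippets from two competing stores.
pages = {
    "store_a": '<div class="product"><span class="price">$999.00</span></div>',
    "store_b": '<div class="product"><span class="price">$1,049.00</span></div>',
}


def extract_price(html):
    """Return the price on a page as a float, stripping '$' and thousands separators."""
    soup = BeautifulSoup(html, "html.parser")
    text = soup.find("span", class_="price").get_text(strip=True)
    return float(text.replace("$", "").replace(",", ""))


# Extract each store's price, then pick the store with the lowest one.
prices = {store: extract_price(html) for store, html in pages.items()}
cheapest = min(prices, key=prices.get)
print(prices)    # {'store_a': 999.0, 'store_b': 1049.0}
print(cheapest)  # store_a
```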


Q2. What are the different methods used for Web Scraping?

A2. The different methods for web scraping are:

**1. Using requests Library (Basic Method)**

a.) Sends HTTP requests to fetch web page content.

Example:

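```python
import requests

# Download the page; the timeout stops the request from hanging indefinitely
response = requests.get('https://example.com', timeout=10)
response.raise_for_status()  # raise an error for 4xx/5xx responses
print(response.text)  # raw HTML of the page
```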


**2. Using BeautifulSoup (Parsing HTML)**

a.) Parses and extracts data from HTML content.

Example:

```python
from bs4 import BeautifulSoup

# Parse the HTML fetched by requests in the previous example
soup = BeautifulSoup(response.text, 'html.parser')
print(soup.title.text)  # text of the page's <title> tag
```


**3. Using Selenium (For Dynamic Pages)**

a.) Automates a real browser, so it can scrape pages whose content is rendered by JavaScript.

Example:

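```python
from selenium import webdriver

driver = webdriver.Chrome()  # launches a local Chrome browser
try:
    driver.get('https://example.com')
    print(driver.page_source)  # HTML after the browser has run the page's JavaScript
finally:
    driver.quit()  # always close the browser, even if an error occurs
```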


**4. Using Scrapy (Advanced Web Scraping Framework)**

a.) A complete framework that handles crawling, request scheduling, and data export, which makes it well suited to large-scale scraping projects.



Q3. What is Beautiful Soup? Why is it used?

A3.

### What is BeautifulSoup?
a.) BeautifulSoup is a Python library used to parse and extract data from HTML/XML files.

b.) It is commonly paired with requests: requests downloads the page, and BeautifulSoup navigates, searches, and modifies the parsed HTML.

### Why is BeautifulSoup Used?
a.) Easier HTML Parsing: Extracts content without writing complex regex.

b.) Supports Multiple Parsers: Works with html.parser, lxml, etc.

c.) Efficient Data Extraction: Finds elements by tag, class, or ID.

Example:

```python
from bs4 import BeautifulSoup
html_code = "<html><body><h1>Hello, World!</h1></body></html>"
soup = BeautifulSoup(html_code, 'html.parser')
print(soup.h1.text)  # Output: Hello, World!
```
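The "finds elements by tag, class, or ID" point can be illustrated on a small hypothetical snippet. The sketch uses `find`, `find_all` and the CSS-selector method `select`; swapping `'html.parser'` for `'lxml'` would work the same way if the `lxml` package is installed.

```python
from bs4 import BeautifulSoup

# Hypothetical markup used only to demonstrate the lookup methods.
html_code = """
<div id="main">
  <p class="price">$10</p>
  <p class="price">$20</p>
  <a href="/next">Next page</a>
</div>
"""
soup = BeautifulSoup(html_code, "html.parser")

main = soup.find(id="main")  # look up an element by its ID
prices = [p.get_text() for p in main.find_all("p", class_="price")]  # by tag and class
link = soup.find("a")["href"]  # by tag, then read an attribute
same = [p.get_text() for p in soup.select("#main p.price")]  # CSS-selector equivalent

print(prices)  # ['$10', '$20']
print(link)    # /next
```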


Q4. Why is Flask used in this web scraping project?

A4. Flask is used in a web scraping project to serve scraped data as an API or web application.

### Why Use Flask?
1.) Lightweight & Easy to Use – Minimal setup required.

2.) Creates APIs to Serve Scraped Data – Provides structured data to front-end or other applications.

3.) Allows User Interaction – Users can input queries (e.g., search terms for scraping).

4.) Enables Deployment – Host web scraping results on a local/remote server.

Example:

```python
from flask import Flask, jsonify
app = Flask(__name__)

@app.route('/data')
def scraped_data():
    return jsonify({"product": "Laptop", "price": "$999"})

if __name__ == '__main__':
    app.run(debug=True)
```


This creates an API endpoint at `/data` that returns scraped data as JSON (here, a hardcoded sample record stands in for real scraper output).
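The "Allows User Interaction" point can be sketched as an endpoint that reads a search term from the query string and filters scraped records. The `/search` route, the `q` parameter and the `SCRAPED_PRODUCTS` list are hypothetical; in the project the records would come from the scraper. The final lines call the endpoint through Flask's test client, so no server needs to be started.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

# Hypothetical records standing in for data produced by a scraper.
SCRAPED_PRODUCTS = [
    {"product": "Laptop", "price": 999},
    {"product": "Laptop Stand", "price": 35},
    {"product": "Mouse", "price": 20},
]


@app.route("/search")
def search():
    # Read the user's search term from the query string, e.g. /search?q=laptop
    term = request.args.get("q", "").lower()
    matches = [p for p in SCRAPED_PRODUCTS if term in p["product"].lower()]
    return jsonify(matches)


# Exercise the endpoint without starting a server, using Flask's test client.
with app.test_client() as client:
    print(client.get("/search?q=laptop").get_json())
```

To serve it for real, save it as `app.py` and start it with `flask --app app run`, or add the same `app.run(debug=True)` guard used in the example above.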

Q5. Write the names of AWS services used in this project. Also, explain the use of each service.

A5.

### AWS Services Used in Web Scraping:

**1. Amazon EC2 (Elastic Compute Cloud)**

a.) Provides virtual servers to run the Flask web scraping application.

b.) Can be scaled up (larger instance types) or out (more instances) to handle large scraping jobs.


**2. Amazon S3 (Simple Storage Service)**

a.) Stores scraped data (CSV, JSON, images).

b.) Used for data backup and retrieval.


**3. Amazon RDS (Relational Database Service)**

a.) Stores structured scraped data in MySQL/PostgreSQL.

b.) Enables easy querying and analysis.

**4. AWS Lambda**

a.) Runs serverless web scraping scripts on a schedule, for example when triggered by an Amazon EventBridge (CloudWatch Events) schedule rule.

b.) Reduces costs by running only when needed.

**5. Amazon CloudWatch**

a.) Monitors logs and performance of scraping tasks.

b.) Helps in debugging errors.
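Storing scraped records in a relational database, as with RDS, can be sketched with Python's built-in `sqlite3` module. In-memory SQLite is only a stand-in here: with RDS for MySQL or PostgreSQL you would connect through a driver such as PyMySQL or psycopg2, whose placeholders are `%s` rather than `?`, but the SQL would be essentially the same. The table layout and rows are hypothetical.

```python
import sqlite3

# In-memory SQLite stands in for an RDS MySQL/PostgreSQL database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (name TEXT, price REAL)")

# Hypothetical rows as a scraper might produce them.
rows = [("Laptop", 999.0), ("Mouse", 20.0)]
conn.executemany("INSERT INTO products VALUES (?, ?)", rows)
conn.commit()

# Once the data is structured, analysis is a plain SQL query.
cheap = conn.execute(
    "SELECT name, price FROM products WHERE price < ? ORDER BY price", (100,)
).fetchall()
print(cheap)  # [('Mouse', 20.0)]
conn.close()
```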

### Example Workflow in AWS:

1.) EC2 runs a Flask app to serve scraped data.

2.) Scraped data is stored in S3 or RDS.

3.) A scheduled Lambda function runs the scraper periodically.

4.) CloudWatch monitors logs and raises alerts on errors.
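Steps 2 to 4 of this workflow could be combined in a Lambda function along the lines of the sketch below. It makes several assumptions. The bucket name is read from a hypothetical `BUCKET_NAME` environment variable, and the target URL is read from the invocation event. `requests` and `bs4` must be packaged with the function (for example in a layer), since the Lambda Python runtime ships `boto3` but not those libraries. A schedule rule, not shown, invokes the handler. Anything written through `logging` ends up in CloudWatch Logs. The parsing step is kept in a separate function so it can be tested locally without AWS.

```python
import json
import logging
import os
from datetime import datetime, timezone

from bs4 import BeautifulSoup

logger = logging.getLogger()
logger.setLevel(logging.INFO)


def extract_record(html, url):
    """Turn a fetched page into a small JSON-serialisable record."""
    soup = BeautifulSoup(html, "html.parser")
    title = soup.title.get_text(strip=True) if soup.title else ""
    return {"url": url, "title": title}


def lambda_handler(event, context):
    # Imported here so extract_record() can be tested without these packages.
    import boto3
    import requests

    url = event.get("url", "https://example.com")
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    record = extract_record(response.text, url)

    # Timestamped key so each scheduled run writes a new S3 object.
    key = f"scrapes/{datetime.now(timezone.utc):%Y%m%dT%H%M%SZ}.json"
    boto3.client("s3").put_object(
        Bucket=os.environ["BUCKET_NAME"],  # hypothetical environment variable
        Key=key,
        Body=json.dumps(record).encode("utf-8"),
        ContentType="application/json",
    )
    logger.info("Saved %s to %s", url, key)  # Lambda sends this to CloudWatch Logs
    return {"statusCode": 200, "body": key}
```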
