## Q1. What is Web Scraping? Why is it Used? Give three areas where Web Scraping is used to get data.

### Ans :
Web scraping: Web scraping is the process of automatically extracting data from websites. It involves using software or scripts to retrieve information from web pages and organize it into a structured format, such as a database or spreadsheet.

Uses of web scraping: Web scraping is used to collect large volumes of data from websites quickly and efficiently. It is commonly employed to:

* Gather publicly available data for analysis.
* Automate repetitive data collection tasks.
* Give businesses up-to-date information, such as competitor prices, so they can stay competitive.
  
Three Areas Where Web Scraping is Used:

1. E-Commerce: Collecting product prices, reviews, and availability for price comparison and market analysis.
2. Social Media Monitoring: Analyzing trends, hashtags, or public sentiment for marketing campaigns or reputation management.
3. Research and Academia: Gathering data for studies, such as extracting news articles, scientific papers, or statistics from public websites.

## Q2. What are the different methods used for Web Scraping?

### Ans :

* Manual Copy-Pasting: Copying data by hand from a website and pasting it into a structured format such as a spreadsheet. Suitable only for small-scale tasks.

* HTML Parsing: Using tools like BeautifulSoup (Python) to parse and extract data from the HTML structure of web pages.

  Example: Extracting titles, links, or tables.

* Using APIs: Accessing structured data provided by websites through their Application Programming Interfaces (APIs).

  Example: Twitter API for tweets, OpenWeatherMap API for weather data.

* Browser Automation: Using tools like Selenium or Puppeteer to automate browser actions and scrape dynamic websites that rely on JavaScript for rendering.

  Example: Interacting with login forms or infinite scroll.

* Web Scraping Libraries/Frameworks: Using Python libraries such as requests (an HTTP client) or Scrapy (a crawling framework) to send HTTP requests and process the responses.

  Example: Extracting data from multiple pages using custom scripts.

* Headless Browsers: Tools like Playwright or headless Chrome allow scraping of JavaScript-heavy websites by running a real browser engine without a visible window.

* XPath and CSS Selectors: Using XPath or CSS selectors to target specific elements in the web page's DOM for extraction.

  Example: Extracting elements based on class or tag.

* Data Extraction Tools: Using pre-built tools like Octoparse, ParseHub, or WebHarvy to scrape websites without writing code.
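To make the XPath method above concrete, here is a minimal sketch using only Python's standard library. The markup, the class names, and the `extract_products` helper are made up for illustration. Note that `xml.etree.ElementTree` supports only a limited subset of XPath and needs well-formed markup, so real pages are usually handled with a dedicated HTML parser instead.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed page fragment used only for this illustration.
page = """
<html>
  <body>
    <div class="product"><h2>Keyboard</h2><span class="price">49.00</span></div>
    <div class="product"><h2>Mouse</h2><span class="price">19.50</span></div>
    <div class="ad"><h2>Sponsored</h2></div>
  </body>
</html>
"""


def extract_products(markup):
    """Return (name, price) tuples for every product block in the markup."""
    root = ET.fromstring(markup.strip())
    products = []
    # Select every <div> whose class attribute is exactly "product".
    for node in root.findall(".//div[@class='product']"):
        name = node.findtext("h2")
        price = node.findtext("span[@class='price']")
        products.append((name, float(price)))
    return products


if __name__ == "__main__":
    for name, price in extract_products(page):
        print(name, price)
```

The attribute predicate skips the `ad` block, which is the main point of targeting elements by class rather than by tag alone.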

## Q3. What is Beautiful Soup? Why is it used?

### Ans :
Beautiful Soup is a Python library used for parsing HTML and XML documents. It creates a parse tree that allows easy navigation, searching, and modification of the document’s structure.

Why is it Used?

Beautiful Soup is used for web scraping tasks to extract specific data from web pages. It simplifies the process of:

* Navigating and searching through the HTML structure.
* Extracting data from tags, attributes, and text content.
* Handling poorly formatted or broken HTML documents.
 
Key Features:
* Supports powerful navigation using tags, attributes, or CSS selectors.
* Works well with other libraries like requests for fetching web pages.
* Can parse and clean up malformed HTML documents.
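
The points above can be sketched roughly as follows, assuming the `beautifulsoup4` package is installed. The HTML snippet, the class names, and the helper functions are hypothetical. The markup deliberately leaves a `<p>` tag and the closing `</body></html>` tags out to show that parsing still succeeds.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the <p> tag is never closed and </body></html> are missing.
html = """
<html><body>
<h1>Books</h1>
<ul id="books">
  <li class="book"><a href="/b/1">Dune</a> <span class="price">9.99</span></li>
  <li class="book"><a href="/b/2">Emma</a> <span class="price">4.50</span></li>
</ul>
<p>End of list
"""


def extract_books(markup):
    """Search by tag name and class, then read text and attributes."""
    soup = BeautifulSoup(markup, "html.parser")
    books = []
    for item in soup.find_all("li", class_="book"):
        link = item.find("a")
        price = item.find("span", class_="price")
        books.append(
            {
                "title": link.get_text(strip=True),
                "url": link["href"],
                "price": float(price.get_text(strip=True)),
            }
        )
    return books


def first_book_title(markup):
    """Same document, but located with a CSS selector instead of find_all."""
    soup = BeautifulSoup(markup, "html.parser")
    return soup.select_one("#books li.book a").get_text(strip=True)


if __name__ == "__main__":
    print(extract_books(html))
    print(first_book_title(html))
```

In a real scraper the `html` string would come from an HTTP response (for example, fetched with requests) rather than being written inline.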


## Q4. Why is Flask used in this Web Scraping project?

### Ans :
Flask is a lightweight and flexible Python web framework used to build web applications. In a web scraping project, Flask is often used to:

1. Create a Web Interface: Flask provides a user-friendly web interface where users can input URLs, parameters, or other details required for scraping.

   Example: A form where users can submit the website they want to scrape.

2. Display Scraped Data: After scraping, the extracted data can be formatted and displayed in the browser as HTML tables, charts, or plain text.

   Example: Showing extracted product prices or reviews on a webpage.

3. Handle User Requests: Flask handles HTTP requests (e.g., GET, POST) and serves dynamic responses based on the user's input or actions.

   Example: Triggering the scraping process when a user submits a URL.

4. API Development: Flask can expose the scraping functionality as a REST API, enabling other applications to access the scraped data programmatically.

   Example: Returning JSON data containing the scraped content.

5. Integration with Other Tools: Flask works alongside web scraping libraries like BeautifulSoup, Selenium, or Scrapy, as well as front-end tools, because it is ordinary Python code that can import and call them.
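
A minimal sketch of how these roles can fit together in one small Flask app is shown below. It is not the project's actual code: the route names, the `create_app` factory, the injectable `fetcher` parameter, and the `TitleParser` class are all assumptions made for illustration, and extracting only the page title stands in for whatever a real scraper would collect.

```python
from html.parser import HTMLParser
from urllib.request import urlopen

from flask import Flask, jsonify, request


class TitleParser(HTMLParser):
    """Accumulates the text found inside <title> tags."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def fetch_html(url):
    """Download a page; a real project would add headers, retries, and error handling."""
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def create_app(fetcher=fetch_html):
    """Build the app; `fetcher` is injectable so the routes can be tried offline."""
    app = Flask(__name__)

    @app.route("/", methods=["GET"])
    def form():
        # Web interface: a form where the user submits the URL to scrape.
        return (
            '<form method="post" action="/scrape">'
            '<input name="url" placeholder="https://example.com">'
            "<button>Scrape</button></form>"
        )

    @app.route("/scrape", methods=["POST"])
    def scrape():
        # Request handling: validate the input, run the scrape, return JSON.
        url = request.form.get("url", "").strip()
        if not url.startswith(("http://", "https://")):
            return jsonify(error="a http(s) url is required"), 400
        parser = TitleParser()
        parser.feed(fetcher(url))
        return jsonify(url=url, title=parser.title.strip())

    return app
```

Because the app is built by a factory, it could be started with `flask --app <module>:create_app run`, where `<module>` is whatever file the sketch is saved in.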

## Q5. Write the names of AWS services used in this project. Also, explain the use of each service.

### Ans :
1. Amazon EC2 (Elastic Compute Cloud):
   * Use: Provides scalable virtual servers to run the web scraping scripts.
   * Example: Hosting the Flask application and running scraping scripts.
2. Amazon S3 (Simple Storage Service):
   * Use: Stores large volumes of scraped data in a secure and scalable manner.
   * Example: Saving extracted data (e.g., JSON, CSV files) for further analysis.
3. AWS Lambda:
   * Use: Runs scraping scripts in a serverless environment, triggered by specific events or schedules.
   * Example: Periodic scraping of websites without managing servers.
4. Amazon RDS (Relational Database Service):
   * Use: Stores structured data from web scraping in relational databases like MySQL, PostgreSQL, or MariaDB.
   * Example: Maintaining a database of products, prices, or reviews for e-commerce analysis.
5. Amazon CloudWatch:
   * Use: Monitors the performance of scraping scripts and logs errors or execution details.
   * Example: Tracking the success of scraping tasks and debugging issues.
6. Amazon DynamoDB:
   * Use: A NoSQL database for storing unstructured or semi-structured scraped data.
   * Example: Storing metadata or logs from scraping jobs.
7. AWS Batch:
   * Use: Manages and runs batch scraping jobs at scale.
   * Example: Performing scraping tasks on multiple websites in parallel.
8. Amazon SQS (Simple Queue Service):
   * Use: Manages task queues for asynchronous scraping processes.
   * Example: Queuing URLs to scrape when working with multiple sources.
9. Elastic Load Balancing (ELB):
   * Use: Balances incoming traffic to the Flask application across multiple EC2 instances.
   * Example: Ensuring high availability and reliability of the web interface for users.
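
As an illustration of the Amazon S3 storage step, the sketch below serializes scraped rows to CSV and uploads them with the S3 client's `put_object` call. The helper names, the bucket name, and the object key are hypothetical. The client is passed in as a parameter so the function can be exercised without AWS credentials.

```python
import csv
import io


def rows_to_csv_bytes(rows, fieldnames):
    """Serialize a list of dicts to UTF-8 encoded CSV with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue().encode("utf-8")


def save_to_s3(s3_client, bucket, key, rows, fieldnames):
    """Upload the rows as one CSV object and return the number of bytes sent."""
    body = rows_to_csv_bytes(rows, fieldnames)
    s3_client.put_object(Bucket=bucket, Key=key, Body=body, ContentType="text/csv")
    return len(body)


# With boto3 installed and AWS credentials configured, usage could look like:
#   import boto3
#   rows = [{"name": "Keyboard", "price": 49.0}]
#   save_to_s3(boto3.client("s3"), "my-scraper-bucket", "prices/latest.csv",
#              rows, ["name", "price"])
```

Writing the CSV into memory first avoids temporary files, which suits small result sets; very large scrapes would more likely be uploaded in parts.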

Example Use Case Integration:

* Amazon EC2 runs the Flask app and scraping scripts.
* Amazon S3 stores the scraped data.
* Amazon RDS or DynamoDB organizes the data for querying.
* AWS Lambda automates scraping tasks on a schedule.
* CloudWatch monitors and logs scraping performance.
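
The scheduled Lambda step in this flow might be sketched as below. `lambda_handler(event, context)` is the standard shape of a Python Lambda handler, but everything else is an assumption: the `urls` key in the event is a hypothetical payload the schedule would have to be configured to send, and `scrape_page` is a stand-in for the real scraping logic.

```python
import json


def scrape_page(url):
    """Stand-in for the real scraper; returns a placeholder record for the URL."""
    return {"url": url, "status": "pending"}


def lambda_handler(event, context):
    """Entry point invoked by AWS Lambda for each scheduled run."""
    # Assumes the trigger passes a JSON payload such as {"urls": ["https://..."]}.
    urls = event.get("urls", [])
    results = [scrape_page(url) for url in urls]
    # Anything printed here ends up in the function's CloudWatch log stream.
    print(f"scraped {len(results)} page(s)")
    return {
        "statusCode": 200,
        "body": json.dumps({"count": len(results), "results": results}),
    }
```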
