# 21st Feb | Assignment

### Q1. What is Web Scraping? Why is it Used? Give three areas where Web Scraping is used to get data.

Web scraping is the automated collection of data from websites using software tools or scripts. <br>
A scraper sends requests to a web server and parses the responses to extract the desired information.
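The "parse the response" step can be sketched with nothing but Python's standard library. This is a minimal illustration, not a full scraper: `SAMPLE_HTML`, `LinkCollector`, and `extract_links` are hypothetical names, and the inline HTML string stands in for the body of a real HTTP response.

```python
from html.parser import HTMLParser

# Hypothetical page content; in a real scraper this string would be
# the body of an HTTP response.
SAMPLE_HTML = """
<html><body>
  <a href="/products/1">Widget</a>
  <a href="/products/2">Gadget</a>
  <p>No link here</p>
</body></html>
"""


class LinkCollector(HTMLParser):
    """Collects the href attribute of every <a> tag it encounters."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the current tag
        if tag == "a":
            href = dict(attrs).get("href")
            if href is not None:
                self.links.append(href)


def extract_links(html):
    """Parse an HTML string and return the link targets it contains."""
    parser = LinkCollector()
    parser.feed(html)
    return parser.links


print(extract_links(SAMPLE_HTML))  # ['/products/1', '/products/2']
```

Libraries such as BeautifulSoup wrap this kind of parsing in a more convenient interface, which is usually preferable for anything beyond a small example.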

Web scraping is used for:

1. Business Intelligence: <br>
Web scraping is used by businesses to collect and analyze data on competitors, market trends, customer behavior, pricing, and other relevant information. <br>
For example: An e-commerce company might use web scraping to track the prices of products on competing websites, analyze customer reviews, and monitor the availability of products.

2. Research: <br>
Researchers use web scraping to collect data for various academic, scientific, and social research purposes. <br>
For example: Social scientists might use web scraping to study online behavior, track sentiment on social media platforms, and monitor news and media sources.

3. Content Aggregation: <br>
Web scraping is used by content aggregators to collect and organize data from different sources for display on their websites or mobile apps. <br>
For example: A news aggregator might use web scraping to collect headlines, summaries, and images from multiple news websites and present them on their own website.

Here are three areas where Web Scraping is used to get data:

1. E-commerce: <br>
Online retailers use web scraping to monitor competitors' prices, inventory levels, and product descriptions. <br>
For example: Amazon might use web scraping to collect data on the prices of products on other e-commerce websites to adjust their own pricing strategy.

2. Real Estate: <br>
Real estate agents and property management companies use web scraping to collect data on properties for sale or rent, including price, location, amenities, and other details. <br>
For example: Listing data gathered for a particular area can help real estate agents and property management companies make informed decisions about pricing, marketing, and property management.

3. Social Media: <br>
Marketers and analysts use web scraping to collect data from social media platforms on user behavior, engagement, and sentiment. <br>
For example: A brand might collect public Twitter posts containing particular hashtags and mentions to analyze trends and inform its advertising strategy.

### Q2. What are the different methods used for Web Scraping?

Answer 2:-

Different methods used for Web Scraping are:

1. Parsing HTML: <br>
This involves analyzing the structure of HTML documents to extract the relevant data. <br>
This can be done with a library such as BeautifulSoup, which provides a simple interface for parsing HTML.

For example: Extracting the titles of articles from a news website.

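```python
# Example: print article titles from a news page.
# The URL and the 'article-title' class are placeholders for a real site.
import requests
from bs4 import BeautifulSoup

url = 'https://www.example.com/news'
response = requests.get(url, timeout=10)
response.raise_for_status()  # stop early on HTTP errors such as 404

soup = BeautifulSoup(response.text, 'html.parser')
articles = soup.find_all('h2', class_='article-title')

for article in articles:
    print(article.get_text(strip=True))
```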

2. Using APIs: <br>
Many websites provide APIs (Application Programming Interfaces) that allow developers to access their data programmatically. <br>
APIs provide a structured and standardized way of accessing data, making it easier to extract the desired information. <br>
For example: If you want to extract weather data from a weather website, you can use their API to access the data and extract the relevant information.

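```python
import requests

url = 'http://www.omdbapi.com/'
params = {
    'apikey': 'your_api_key',  # replace with a valid OMDb API key
    's': 'star wars',
    'type': 'movie'
}

response = requests.get(url, params=params, timeout=10)
data = response.json()

# On failure (for example, an invalid API key) the JSON has no 'Search' key,
# so check for it instead of indexing directly, which raises KeyError.
if 'Search' in data:
    for movie in data['Search']:
        print(movie['Title'], movie['Year'])
else:
    print('Request failed:', data.get('Error', 'unknown error'))
```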
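An API can report failure inside the response body rather than return the expected fields, in which case indexing a missing key such as `'Search'` raises a `KeyError`. The sketch below shows one defensive way to read such a payload without any network access. The helper `list_titles` and both sample payloads are illustrative assumptions modeled on the shape of an OMDb search response, not captured API output.

```python
import json


def list_titles(payload):
    """Return (title, year) pairs from a search payload, or [] on failure."""
    data = json.loads(payload)
    # Assumed convention: the payload carries a "Response" flag set to "True"
    # on success, and omits the "Search" list otherwise.
    if data.get("Response") != "True":
        return []
    return [(movie["Title"], movie["Year"]) for movie in data.get("Search", [])]


# Illustrative payloads (assumptions, shortened for readability)
ok = '{"Search": [{"Title": "Star Wars", "Year": "1977"}], "Response": "True"}'
failed = '{"Response": "False", "Error": "Invalid API key!"}'

print(list_titles(ok))      # [('Star Wars', '1977')]
print(list_titles(failed))  # []
```

Returning an empty list keeps the calling loop simple; code that needs to distinguish "no results" from "request rejected" could return or log the `Error` field instead.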

3. Automated web browsing: <br>
This method involves using tools such as Selenium and Puppeteer to automate web browsing and extract data from websites. <br>
Automated web browsing can be useful for scraping data from websites that require user authentication or have complex user interfaces. <br>
For example: If you want to extract data from a website that requires user authentication, you can use Selenium to automate the login process and extract the data.

```shell
pip install selenium
```


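```python
from selenium import webdriver
from selenium.webdriver.common.by import By

url = 'https://www.example.com/product/1234/reviews'

# Selenium 4.6+ locates a matching chromedriver automatically,
# so no driver path is passed to webdriver.Chrome().
options = webdriver.ChromeOptions()
options.add_argument('--headless=new')  # run without a visible browser window
driver = webdriver.Chrome(options=options)

try:
    driver.get(url)
    # find_elements_by_class_name was removed in Selenium 4.3; use By locators.
    reviews = driver.find_elements(By.CLASS_NAME, 'review-text')
    for review in reviews:
        print(review.text)
finally:
    driver.quit()
```

This cell needs Chrome and its system libraries to be available. In this environment the driver failed to start, most likely because they were missing (exit status 127 usually means a command or shared library was not found):

WebDriverException: Message: Service /home/jovyan/.cache/selenium/chromedriver/linux64/112.0.5615.49/chromedriver unexpectedly exited. Status code was: 127
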


### Q3. What is Beautiful Soup? Why is it used?

### Q4. Why is flask used in this Web Scraping project?

### Q5. Write the names of AWS services used in this project. Also, explain the use of each service.