# Task 1 (Data Collection and Web Scraping)

### Installing Libraries

Before you begin, make sure you have the necessary libraries installed.

```
!pip install requests
!pip install beautifulsoup4
```



### Importing Libraries

```python
import requests
from bs4 import BeautifulSoup
import csv
```



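* `requests`: sends HTTP requests to URLs and returns the responses, including the page HTML.
* `BeautifulSoup` (from the `beautifulsoup4` package): parses HTML so we can search it and extract data.
* `csv`: saves the extracted data in CSV format. It is part of Python's standard library, so it does not need to be installed.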




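### Sending an HTTP Request and Getting the HTML

```python
url = "https://quotes.toscrape.com/"
# Send a GET request; the timeout (in seconds) keeps the call from hanging indefinitely
response = requests.get(url, timeout=10)

print(response.status_code)
```

```
200
```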


A status code of 200 means the request succeeded, so we can go on to parse the HTML.
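Other status codes follow a fixed scheme: 2xx means success, 3xx a redirect, 4xx a client error (for example, a wrong URL), and 5xx a server error. As an alternative to checking the number by hand, `requests` offers `response.raise_for_status()`, which raises an `HTTPError` for 4xx and 5xx responses. The sketch below uses only the standard library's `http.HTTPStatus` to turn a code into a readable label; `describe_status` is a hypothetical helper name, not part of `requests`.

```python
from http import HTTPStatus


def describe_status(code: int) -> str:
    """Hypothetical helper: label an HTTP status code, e.g. '200 OK (success)'.

    Raises ValueError for codes that http.HTTPStatus does not define.
    """
    phrase = HTTPStatus(code).phrase
    if code < 200:
        category = "informational"
    elif code < 300:
        category = "success"
    elif code < 400:
        category = "redirect"
    elif code < 500:
        category = "client error"
    else:
        category = "server error"
    return f"{code} {phrase} ({category})"


print(describe_status(200))  # 200 OK (success)
print(describe_status(404))  # 404 Not Found (client error)
```

With the request above, `describe_status(response.status_code)` would give a readable label for whatever code the server returned.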

### Parsing HTML with BeautifulSoup

This step parses the response with BeautifulSoup's `html.parser` backend and prints the first 500 characters of the prettified HTML, so we can see which tags and classes hold the data we want.

```python
soup = BeautifulSoup(response.text, 'html.parser')

# Show only the beginning of the indented HTML to keep the output short
print(soup.prettify()[:500])
```

```
<!DOCTYPE html>
<html lang="en">
 <head>
  <meta charset="utf-8"/>
  <title>
   Quotes to Scrape
  </title>
  <link href="/static/bootstrap.min.css" rel="stylesheet"/>
  <link href="/static/main.css" rel="stylesheet"/>
 </head>
 <body>
  <div class="container">
   <div class="row header-box">
    <div class="col-md-8">
     <h1>
      <a href="/" style="text-decoration: none">
       Quotes to Scrape
      </a>
     </h1>
    </div>
    <div class="col-md-4">
     <p>
      <a href="/login">
```
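The first 500 characters only cover the page header. A complementary way to find the element that wraps each item is to count how often every `tag.class` combination occurs: a class that repeats once per item (on this page, `div.quote`) usually marks the container to loop over. The sketch below is built on the standard library's `html.parser.HTMLParser`, the same parser BeautifulSoup uses when given `'html.parser'`; the names `ClassCounter` and `summarize_structure` are hypothetical.

```python
from collections import Counter
from html.parser import HTMLParser


class ClassCounter(HTMLParser):
    """Hypothetical helper: counts how often each tag.class pair appears."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None
        class_attr = dict(attrs).get("class") or ""
        for cls in class_attr.split():
            self.counts[f"{tag}.{cls}"] += 1


def summarize_structure(html, top=10):
    """Return the `top` most frequent tag.class pairs in an HTML string."""
    parser = ClassCounter()
    parser.feed(html)
    parser.close()
    return parser.counts.most_common(top)


sample = """
<div class="quote"><span class="text">A</span><a class="tag">x</a></div>
<div class="quote"><span class="text">B</span><a class="tag">y</a><a class="tag">z</a></div>
"""
print(summarize_structure(sample))  # [('a.tag', 3), ('div.quote', 2), ('span.text', 2)]
```

On the live page, `summarize_structure(response.text)` would list candidate containers; the exact counts depend on the page content at the time you run it.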


### Filtering the Required Data

We want to extract three pieces of information from each quote: its text, its author, and its tags. On this page, each quote sits inside a `<div class="quote">` element, with the text in `<span class="text">`, the author in `<small class="author">`, and each tag in an `<a class="tag">` link.

```python
quotes = []

# Loop over every <div class="quote"> container on the page
for quote in soup.find_all('div', class_='quote'):
    text = quote.find('span', class_='text').text
    author = quote.find('small', class_='author').text
    tags = [tag.get_text() for tag in quote.find_all('a', class_='tag')]

    quotes.append({'text': text, 'author': author, 'tags': tags})

# Preview the first five quotes
print(quotes[:5])
```

```
[{'text': '“The world as we have created it is a process of our thinking. It cannot be changed without changing our thinking.”', 'author': 'Albert Einstein', 'tags': ['change', 'deep-thoughts', 'thinking', 'world']}, {'text': '“It is our choices, Harry, that show what we truly are, far more than our abilities.”', 'author': 'J.K. Rowling', 'tags': ['abilities', 'choices']}, {'text': '“There are only two ways to live your life. One is as though nothing is a miracle. The other is as though everything is a miracle.”', 'author': 'Albert Einstein', 'tags': ['inspirational', 'life', 'live', 'miracle', 'miracles']}, {'text': '“The person, be it gentleman or lady, who has not pleasure in a good novel, must be intolerably stupid.”', 'author': 'Jane Austen', 'tags': ['aliteracy', 'books', 'classic', 'humor']}, {'text': "“Imperfection is beauty, madness is genius and it's better to be absolutely ridiculous than absolutely boring.”", 'author': 'Marilyn Monroe', 'tags': ['be-yourself', 'inspirationa
```
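To make the filtering logic more concrete, here is the same extraction written against Python's built-in `html.parser` without BeautifulSoup. It is an illustrative sketch, not how BeautifulSoup is implemented: it assumes the same markup the BeautifulSoup loop targets (`span.text`, `small.author`, and `a.tag` inside `div.quote`) and that quote containers are never nested in each other. `QuoteParser` is a hypothetical name. Mostly it shows how much state tracking `find_all` and `find` save you.

```python
from html.parser import HTMLParser


class QuoteParser(HTMLParser):
    """Hypothetical stdlib-only extractor for quotes.toscrape.com-style markup."""

    def __init__(self):
        super().__init__()
        self.quotes = []      # finished {'text', 'author', 'tags'} dicts
        self._current = None  # quote currently being built, if any
        self._depth = 0       # <div> nesting depth inside the current quote
        self._field = None    # 'text', 'author' or 'tag' while inside that element

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "div":
            if self._current is None and "quote" in classes:
                # Start of a new <div class="quote"> container
                self._current = {"text": "", "author": "", "tags": []}
                self._depth = 1
            elif self._current is not None:
                self._depth += 1  # nested div such as <div class="tags">
            return
        if self._current is None:
            return  # ignore everything outside a quote
        if tag == "span" and "text" in classes:
            self._field = "text"
        elif tag == "small" and "author" in classes:
            self._field = "author"
        elif tag == "a" and "tag" in classes:
            self._field = "tag"
            self._current["tags"].append("")

    def handle_endtag(self, tag):
        if self._current is None:
            return
        if tag in ("span", "small", "a"):
            self._field = None
        elif tag == "div":
            self._depth -= 1
            if self._depth == 0:
                # Closing </div> of the quote container: the quote is complete
                self.quotes.append(self._current)
                self._current = None

    def handle_data(self, data):
        if self._field == "tag":
            self._current["tags"][-1] += data
        elif self._field is not None:
            self._current[self._field] += data


sample = """
<div class="quote">
  <span class="text">“Hello”</span>
  <span>by <small class="author">A. Person</small> <a href="/author/x">(about)</a></span>
  <div class="tags">Tags: <a class="tag" href="/tag/a">a</a> <a class="tag" href="/tag/b">b</a></div>
</div>
"""
parser = QuoteParser()
parser.feed(sample)
parser.close()
print(parser.quotes)  # [{'text': '“Hello”', 'author': 'A. Person', 'tags': ['a', 'b']}]
```

Fed `response.text`, this parser should produce the same kind of dictionaries as the BeautifulSoup loop, but any change in the page layout means updating several branches here, whereas the BeautifulSoup version only needs different tag and class arguments.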

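### Storing Data in a CSV File


Now we save the extracted data to a CSV file with `csv.writer`. The writer automatically wraps any field that contains a comma (such as the joined tags) in quotation marks, so no manual quoting is needed.

```python
import csv

# Open (or create) the file in write mode; newline='' lets the csv module control line endings
with open("scraped_quotes.csv", "w", newline='', encoding="utf-8") as file:
    writer = csv.writer(file)
    writer.writerow(["text", "author", "tags"])  # header row

    for quote in quotes:
        # Join the tag list into a single field; csv.writer quotes it because it contains commas
        tags_string = ','.join(quote['tags'])
        writer.writerow([quote['text'], quote['author'], tags_string])
```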
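To check what ends up in the file, it helps to read the data back. The sketch below (hypothetical helper names `quotes_to_csv` and `csv_to_quotes`) writes the same three columns to an in-memory `io.StringIO` buffer instead of a file and parses them again with `csv.DictReader`, showing that a comma-joined tags field survives the round trip as a single column. Like the code above, it assumes that individual tag names never contain a comma.

```python
import csv
import io


def quotes_to_csv(quotes):
    """Serialize quote dicts to CSV text with columns text, author, tags."""
    buffer = io.StringIO(newline="")
    writer = csv.writer(buffer)
    writer.writerow(["text", "author", "tags"])
    for quote in quotes:
        # Tags are joined with commas; csv.writer quotes the field automatically
        writer.writerow([quote["text"], quote["author"], ",".join(quote["tags"])])
    return buffer.getvalue()


def csv_to_quotes(csv_text):
    """Parse CSV text produced by quotes_to_csv back into quote dicts."""
    reader = csv.DictReader(io.StringIO(csv_text, newline=""))
    return [
        {
            "text": row["text"],
            "author": row["author"],
            # An empty tags field means the quote had no tags
            "tags": row["tags"].split(",") if row["tags"] else [],
        }
        for row in reader
    ]


sample = [
    {
        "text": "“It is our choices, Harry, that show what we truly are, far more than our abilities.”",
        "author": "J.K. Rowling",
        "tags": ["abilities", "choices"],
    }
]
csv_text = quotes_to_csv(sample)
print(csv_text)
print(csv_to_quotes(csv_text) == sample)  # True
```

To inspect the real file instead, the same `csv.DictReader` loop works on `open("scraped_quotes.csv", newline="", encoding="utf-8")`.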
