# Data Analytics Project – Task 1: Web Scraping

## **Objective**
Scrape data from a public website with Python libraries such as **requests** and **BeautifulSoup**, and build a custom dataset that can be used for data analytics tasks.

## **Step 1: Import Libraries**

In [1]:

import requests
from bs4 import BeautifulSoup
import pandas as pd


## **Step 2: Choose a Website for Scraping**
We’ll scrape quotes from **http://quotes.toscrape.com**, a free website designed for web-scraping practice.
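
When choosing a site, it can help to check whether its robots.txt permits the pages you plan to fetch. The sketch below uses the standard-library `urllib.robotparser`. The rules in `SAMPLE_ROBOTS` are made up for illustration and are not the real robots.txt of the practice site, and `is_allowed` is a hypothetical helper name.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for illustration only; not the real file of any site.
SAMPLE_ROBOTS = [
    "User-agent: *",
    "Disallow: /private/",
]


def is_allowed(robots_lines, url, user_agent="*"):
    """Return True if the given robots.txt rules let user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_lines)
    return parser.can_fetch(user_agent, url)


print(is_allowed(SAMPLE_ROBOTS, "http://quotes.toscrape.com/page/1/"))
print(is_allowed(SAMPLE_ROBOTS, "http://quotes.toscrape.com/private/data"))
```

To check a live site instead of sample rules, `RobotFileParser.set_url(...)` followed by `read()` downloads and parses the actual file.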

## **Step 3: Fetch Web Page**

In [2]:

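url = "http://quotes.toscrape.com/page/1/"
# A timeout keeps the request from hanging if the server is unresponsive
response = requests.get(url, timeout=10)
# Stop on HTTP errors (4xx/5xx) rather than parsing an error page
response.raise_for_status()
soup = BeautifulSoup(response.text, "html.parser")
print(soup.title.string)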


Quotes to Scrape
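
The fetch step can be wrapped in a small reusable helper. This is a minimal sketch, not part of the original notebook: `fetch_html` and its `session` and `timeout` parameters are hypothetical names, and the 10-second timeout is an arbitrary choice.

```python
import requests


def fetch_html(url, session=None, timeout=10):
    """Download a page and return its HTML text, raising on HTTP errors.

    `session` may be a requests.Session or any object with a compatible
    .get(url, timeout=...) method; a new Session is created if omitted.
    """
    if session is None:
        session = requests.Session()
    response = session.get(url, timeout=timeout)
    # Turn 4xx/5xx responses into requests.HTTPError instead of
    # silently handing an error page to the parser.
    response.raise_for_status()
    return response.text
```

Calling `fetch_html("http://quotes.toscrape.com/page/1/")` would return the same HTML that Step 3 passes to `BeautifulSoup`. Accepting a session also makes the helper easy to exercise offline with a stand-in object.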


## **Step 4: Extract Data**
We’ll extract **Quotes**, **Authors**, and **Tags**.

In [3]:

quotes = []
authors = []
tags = []

# Each quote on the page sits in its own <div class="quote"> block
quote_containers = soup.find_all("div", class_="quote")

for q in quote_containers:
    # Quote text
    text = q.find("span", class_="text").get_text()
    quotes.append(text)

    # Author name
    author = q.find("small", class_="author").get_text()
    authors.append(author)

    # All tags for this quote, joined into one comma-separated string
    tag_list = [tag.get_text() for tag in q.find_all("a", class_="tag")]
    tags.append(", ".join(tag_list))

# Preview up to five records (fewer if the page returned fewer)
for i in range(min(5, len(quotes))):
    print(f"{i+1}. {quotes[i]} — {authors[i]} [{tags[i]}]")


1. “The world as we have created it is a process of our thinking. It cannot be changed without changing our thinking.” — Albert Einstein [change, deep-thoughts, thinking, world]
2. “It is our choices, Harry, that show what we truly are, far more than our abilities.” — J.K. Rowling [abilities, choices]
3. “There are only two ways to live your life. One is as though nothing is a miracle. The other is as though everything is a miracle.” — Albert Einstein [inspirational, life, live, miracle, miracles]
4. “The person, be it gentleman or lady, who has not pleasure in a good novel, must be intolerably stupid.” — Jane Austen [aliteracy, books, classic, humor]
5. “Imperfection is beauty, madness is genius and it's better to be absolutely ridiculous than absolutely boring.” — Marilyn Monroe [be-yourself, inspirational]
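
The extraction loop can also be written as a function that returns one record per quote, which avoids keeping three parallel lists in sync. The following is a sketch under assumptions: `parse_quotes` is a hypothetical helper, and `SAMPLE_HTML` is made-up markup that only mimics the classes targeted in Step 4.

```python
from bs4 import BeautifulSoup

# Made-up sample that mimics the markup targeted in Step 4.
SAMPLE_HTML = """
<div class="quote">
  <span class="text">First sample quote.</span>
  <small class="author">Author One</small>
  <a class="tag">alpha</a>
  <a class="tag">beta</a>
</div>
<div class="quote">
  <span class="text">Second sample quote.</span>
  <small class="author">Author Two</small>
</div>
"""


def parse_quotes(html):
    """Return one dict per quote block found in the HTML."""
    soup = BeautifulSoup(html, "html.parser")
    records = []
    for block in soup.find_all("div", class_="quote"):
        text = block.find("span", class_="text")
        author = block.find("small", class_="author")
        if text is None or author is None:
            # Skip malformed blocks instead of raising AttributeError.
            continue
        tag_names = [a.get_text(strip=True) for a in block.find_all("a", class_="tag")]
        records.append({
            "Quote": text.get_text(strip=True),
            "Author": author.get_text(strip=True),
            "Tags": ", ".join(tag_names),
        })
    return records


for record in parse_quotes(SAMPLE_HTML):
    print(record)
```

A list of dicts like this can be passed directly to `pd.DataFrame(...)`, with the dict keys becoming the column names.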


## **Step 5: Store Data in DataFrame**

In [4]:

df = pd.DataFrame({
    "Quote": quotes,
    "Author": authors,
    "Tags": tags
})

df.head()


| | Quote | Author | Tags |
|---|---|---|---|
| 0 | “The world as we have created it is a process ... | Albert Einstein | change, deep-thoughts, thinking, world |
| 1 | “It is our choices, Harry, that show what we t... | J.K. Rowling | abilities, choices |
| 2 | “There are only two ways to live your life. On... | Albert Einstein | inspirational, life, live, miracle, miracles |
| 3 | “The person, be it gentleman or lady, who has ... | Jane Austen | aliteracy, books, classic, humor |
| 4 | “Imperfection is beauty, madness is genius and... | Marilyn Monroe | be-yourself, inspirational |
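
Because the `Tags` column stores each quote's tags as one comma-separated string, per-tag analysis needs the strings split apart again. Here is a minimal sketch of one way to do that with pandas; `tag_counts` is a hypothetical helper and `sample` is a made-up frame with the same columns as `df`.

```python
import pandas as pd

# Small made-up frame with the same columns as `df`.
sample = pd.DataFrame({
    "Quote": ["q1", "q2", "q3"],
    "Author": ["A", "B", "A"],
    "Tags": ["life, love", "life", ""],
})


def tag_counts(frame):
    """Count how often each tag appears across all rows."""
    # Split "a, b" into ["a", "b"], then give every tag its own row.
    tags = frame["Tags"].str.split(", ").explode()
    # A row with no tags yields an empty string after splitting; drop it.
    tags = tags[tags != ""]
    return tags.value_counts()


print(tag_counts(sample))
```

An alternative design is to keep the tags as Python lists in the DataFrame and join them only when exporting, which removes the need to split later.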


## **Step 6: Extend to Multiple Pages**

In [5]:

all_quotes = []
all_authors = []
all_tags = []

# Pages 1 through 5
for page in range(1, 6):
    url = f"http://quotes.toscrape.com/page/{page}/"
    response = requests.get(url, timeout=10)
    response.raise_for_status()  # stop on HTTP errors
    soup = BeautifulSoup(response.text, "html.parser")
    
    quote_containers = soup.find_all("div", class_="quote")
    
    for q in quote_containers:
        all_quotes.append(q.find("span", class_="text").get_text())
        all_authors.append(q.find("small", class_="author").get_text())
        tag_list = [tag.get_text() for tag in q.find_all("a", class_="tag")]
        all_tags.append(", ".join(tag_list))

df_full = pd.DataFrame({
    "Quote": all_quotes,
    "Author": all_authors,
    "Tags": all_tags
})

df_full.head(10)


| | Quote | Author | Tags |
|---|---|---|---|
| 0 | “The world as we have created it is a process ... | Albert Einstein | change, deep-thoughts, thinking, world |
| 1 | “It is our choices, Harry, that show what we t... | J.K. Rowling | abilities, choices |
| 2 | “There are only two ways to live your life. On... | Albert Einstein | inspirational, life, live, miracle, miracles |
| 3 | “The person, be it gentleman or lady, who has ... | Jane Austen | aliteracy, books, classic, humor |
| 4 | “Imperfection is beauty, madness is genius and... | Marilyn Monroe | be-yourself, inspirational |
| 5 | “Try not to become a man of success. Rather be... | Albert Einstein | adulthood, success, value |
| 6 | “It is better to be hated for what you are tha... | André Gide | life, love |
| 7 | “I have not failed. I've just found 10,000 way... | Thomas A. Edison | edison, failure, inspirational, paraphrased |
| 8 | “A woman is like a tea bag; you never know how... | Eleanor Roosevelt | misattributed-eleanor-roosevelt |
| 9 | “A day without sunshine is like, you know, nig... | Steve Martin | humor, obvious, simile |
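
The loop above hard-codes pages 1 to 5. An alternative is to follow the page's "next" link until there is none. The sketch below assumes the pagination link lives in an `<li class="next">` element, which is not shown in this notebook and should be verified against the actual page source. `next_page_url`, `crawl`, and the `fetch` parameter are hypothetical names.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def next_page_url(html, current_url):
    """Return the absolute URL of the next page, or None on the last page."""
    soup = BeautifulSoup(html, "html.parser")
    item = soup.find("li", class_="next")  # assumed pagination markup
    if item is None:
        return None
    link = item.find("a", href=True)
    if link is None:
        return None
    # Resolve relative links such as "/page/2/" against the current URL.
    return urljoin(current_url, link["href"])


def crawl(start_url, fetch, max_pages=50):
    """Yield (url, html) pairs, following "next" links until none remain.

    `fetch` is any callable that maps a URL to its HTML text; `max_pages`
    is a safety cap so a pagination loop cannot run forever.
    """
    url = start_url
    for _ in range(max_pages):
        if url is None:
            break
        html = fetch(url)
        yield url, html
        url = next_page_url(html, url)
```

With `requests`, `fetch` could be something like `lambda u: requests.get(u, timeout=10).text`, ideally with a short `time.sleep` between calls to avoid hammering the server.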


## **Step 7: Save Dataset**

In [6]:

df_full.to_csv("quotes_dataset.csv", index=False)
print("✅ Dataset saved as quotes_dataset.csv")


✅ Dataset saved as quotes_dataset.csv
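
After saving, it is worth reading the file back to confirm that the curly quotes, embedded commas, and empty tag cells survived the round trip. A minimal sketch, assuming UTF-8 output: `save_and_reload` is a hypothetical helper, and `sample` is made-up data standing in for `df_full`.

```python
import os
import tempfile

import pandas as pd


def save_and_reload(frame, path):
    """Write the frame to CSV without the index, then read it back."""
    frame.to_csv(path, index=False, encoding="utf-8")
    # keep_default_na=False keeps empty Tags cells as "" instead of NaN.
    return pd.read_csv(path, encoding="utf-8", keep_default_na=False)


# Made-up rows that exercise curly quotes, commas, and an empty cell.
sample = pd.DataFrame({
    "Quote": ["“Curly quotes survive.”", "Commas, too"],
    "Author": ["A", "B"],
    "Tags": ["x, y", ""],
})

with tempfile.TemporaryDirectory() as tmp:
    reloaded = save_and_reload(sample, os.path.join(tmp, "sample.csv"))

print(reloaded.shape == sample.shape)
```

If the file will be opened in a spreadsheet program, `encoding="utf-8-sig"` is sometimes used instead so that the non-ASCII quotation marks are detected correctly.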
