# WEB SCRAPING QUOTES WITH PYTHON

## Introduction
This project demonstrates web scraping in Python on a live website, using `requests` and `BeautifulSoup`. We'll extract quotes, author names, and tags from http://quotes.toscrape.com/ and save the data for further analysis.
Web scraping lets you collect data from online sources for analytics and machine learning.

## Setup & Imports

```python
import requests
from bs4 import BeautifulSoup
import pandas as pd
```

## Target Website
We'll scrape data from http://quotes.toscrape.com/, a practice website built for people learning web scraping.

```python
url = "http://quotes.toscrape.com"
```

## Fetch HTML Content
We use `requests.get()` to download the page's HTML. An HTTP status code of 200 means the request succeeded.

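```python
# Download the page; the timeout keeps the request from hanging indefinitely
response = requests.get(url, timeout=10)
print("Status:", response.status_code)
if response.status_code == 200:
    print("Page fetched successfully!")
else:
    raise Exception(f"Failed to fetch the webpage (HTTP {response.status_code}).")
```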

```
Status: 200
Page fetched successfully!
```
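For anything beyond a one-off request, a `requests.Session` with automatic retries can make fetching more robust against transient server errors. The sketch below is one possible setup, not part of the original notebook: `make_session` is a hypothetical helper, and the retry count, backoff factor, list of retryable status codes, and `User-Agent` string are illustrative assumptions you should adapt.

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry


def make_session(retries: int = 3, backoff: float = 0.5) -> requests.Session:
    """Hypothetical helper: a Session that retries transient HTTP failures."""
    retry = Retry(
        total=retries,
        backoff_factor=backoff,  # wait times grow exponentially between attempts
        status_forcelist=[429, 500, 502, 503, 504],  # retry only on these codes
    )
    adapter = HTTPAdapter(max_retries=retry)
    session = requests.Session()
    # Route both plain and TLS URLs through the retrying adapter
    session.mount("http://", adapter)
    session.mount("https://", adapter)
    # Assumed, illustrative User-Agent that identifies the scraper
    session.headers.update({"User-Agent": "quotes-tutorial-scraper/0.1"})
    return session
```

Usage (this makes a real network request): `session = make_session()`, then `response = session.get(url, timeout=10)` followed by `response.raise_for_status()`, which raises `requests.HTTPError` for 4xx/5xx responses instead of needing a manual status check.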


## Parse HTML and Extract Data
We'll use `BeautifulSoup` to parse the HTML, find each quote block (`<div class="quote">`), and extract the quote text, the author, and the associated tags from each block.

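```python
# Parse the raw HTML into a navigable tree
soup = BeautifulSoup(response.text, "html.parser")

# Each quote on the page lives in its own <div class="quote">
quote_blocks = soup.find_all("div", class_="quote")

all_quotes = []
for q in quote_blocks:
    text = q.find("span", class_="text").get_text(strip=True)
    author = q.find("small", class_="author").get_text(strip=True)
    # A quote can have several tags, each an <a class="tag"> link
    tags = [tag.get_text(strip=True) for tag in q.find_all("a", class_="tag")]
    # Join the tags into one comma-separated string so they fit in a single CSV column
    all_quotes.append({"Quote": text, "Author": author, "Tags": ", ".join(tags)})

df_quotes = pd.DataFrame(all_quotes)
df_quotes.head()
```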

|   | Quote | Author | Tags |
|---|-------|--------|------|
| 0 | “The world as we have created it is a process ... | Albert Einstein | change, deep-thoughts, thinking, world |
| 1 | “It is our choices, Harry, that show what we t... | J.K. Rowling | abilities, choices |
| 2 | “There are only two ways to live your life. On... | Albert Einstein | inspirational, life, live, miracle, miracles |
| 3 | “The person, be it gentleman or lady, who has ... | Jane Austen | aliteracy, books, classic, humor |
| 4 | “Imperfection is beauty, madness is genius and... | Marilyn Monroe | be-yourself, inspirational |
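The extraction loop calls `.get_text()` directly on the result of `find()`, which raises an `AttributeError` if a block is ever missing an element (for example, a quote with no author). A more defensive variant, sketched below, uses Beautiful Soup's CSS-selector methods `select()` and `select_one()` and falls back to `None` when an element is absent. The helper names `text_or_none` and `parse_quotes` and the sample HTML are hypothetical; the markup only mimics the `div.quote` structure the notebook's code already relies on.

```python
from bs4 import BeautifulSoup


def text_or_none(parent, selector):
    """Return the stripped text of the first match for `selector`, or None."""
    el = parent.select_one(selector)
    return el.get_text(strip=True) if el is not None else None


def parse_quotes(html):
    """Extract one dict per <div class="quote"> found in `html`."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for block in soup.select("div.quote"):
        tags = [a.get_text(strip=True) for a in block.select("a.tag")]
        rows.append({
            "Quote": text_or_none(block, "span.text"),
            "Author": text_or_none(block, "small.author"),  # None if missing
            "Tags": ", ".join(tags),  # empty string if the quote has no tags
        })
    return rows


# Hypothetical markup mimicking the page structure; the second block has no author
sample_html = """
<div class="quote">
  <span class="text">“First quote.”</span>
  <small class="author">Author A</small>
  <a class="tag">life</a><a class="tag">humor</a>
</div>
<div class="quote">
  <span class="text">“Second quote.”</span>
</div>
"""

rows = parse_quotes(sample_html)
print(rows)
```

Whether a missing author should become `None`, be skipped, or stop the scrape is a design choice; returning `None` keeps the row and leaves that decision to the analysis step.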


## Save to CSV
We save the scraped data to a CSV file for future analysis, sharing, or visualization.

```python
df_quotes.to_csv("quotes.csv", index=False)
print("Quotes saved to quotes.csv")
```

```
Quotes saved to quotes.csv
```
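Because the tags are flattened into a comma-separated string, they have to be split again after the CSV is read back, as the analysis step below does. If you would rather keep each quote's tags as a real list, one alternative is JSON Lines via pandas' `to_json(..., orient="records", lines=True)`, which preserves lists. This sketch assumes a `Tags` column holding Python lists instead of strings; the file name `quotes.jsonl` is illustrative, and the two sample rows are copied from the output above (quote text still truncated).

```python
from pathlib import Path

import pandas as pd

# Two rows from the scraped output, with Tags kept as lists instead of strings
df = pd.DataFrame([
    {"Quote": "“The world as we have created it is a process ...",
     "Author": "Albert Einstein",
     "Tags": ["change", "deep-thoughts", "thinking", "world"]},
    {"Quote": "“It is our choices, Harry, that show what we t...",
     "Author": "J.K. Rowling",
     "Tags": ["abilities", "choices"]},
])

path = Path("quotes.jsonl")
# One JSON object per line; force_ascii=False writes characters such as “ unescaped
df.to_json(path, orient="records", lines=True, force_ascii=False)

# Reading it back: JSON arrays come back as Python lists, no re-splitting needed
restored = pd.read_json(path, lines=True)
print(restored["Tags"].tolist())
```

CSV stays the better choice for spreadsheets and quick sharing; JSON Lines is handier when the data feeds further Python processing.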


## Simple Data Analysis
Let's use pandas to see which authors appear most often and which tags are most popular. Because only the first page was scraped, these counts cover just 10 quotes.

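```python
# How many quotes each author has on this page
print("Author counts:")
print(df_quotes["Author"].value_counts())

# Split each comma-separated Tags string back into individual tags
tags_list = []
for tag_string in df_quotes["Tags"]:
    tags_list.extend(t.strip() for t in tag_string.split(",") if t.strip())

# Ten most frequent tags
pd.Series(tags_list).value_counts().head(10)
```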

```
Author counts:
Author
Albert Einstein      3
J.K. Rowling         1
Jane Austen          1
Marilyn Monroe       1
André Gide           1
Thomas A. Edison     1
Eleanor Roosevelt    1
Steve Martin         1
Name: count, dtype: int64
```

```
inspirational                      3
humor                              2
life                               2
change                             1
obvious                            1
misattributed-eleanor-roosevelt    1
paraphrased                        1
failure                            1
edison                             1
love                               1
Name: count, dtype: int64
```
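The loop that rebuilds `tags_list` can also be written with pandas' vectorized string methods: `str.split()` turns each `Tags` string into a list, `explode()` gives every tag its own row, and `value_counts()` tallies them. This is a sketch of an equivalent approach, not what the notebook ran; `count_tags` is a hypothetical helper, and the three-row `sample` Series stands in for `df_quotes["Tags"]`.

```python
import pandas as pd


def count_tags(tags: pd.Series) -> pd.Series:
    """Count individual tags in a Series of comma-separated tag strings."""
    exploded = (
        tags.str.split(",")   # "a, b" -> ["a", " b"]
        .explode()            # one row per tag
        .str.strip()          # drop the spaces left over from ", "
    )
    # Quotes without tags yield "" (or NaN after a CSV round trip); drop both
    exploded = exploded[exploded.notna() & (exploded != "")]
    return exploded.value_counts()


# Stand-in for df_quotes["Tags"]; the last quote has no tags
sample = pd.Series(["inspirational, life", "humor, life", ""])
print(count_tags(sample).head(10))
```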

## Notes and Ethics
- This site is designed for learning; **never scrape commercial websites without checking their robots.txt and terms of service.**
- The code above scrapes only the first page; you can extend it to multi-page scraping by following the "Next" links.
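Python's standard library can check `robots.txt` rules for you through `urllib.robotparser.RobotFileParser`. The sketch below parses a robots.txt body from a string so it runs offline; the rules in `ROBOTS_TXT` are hypothetical and are not the actual rules of quotes.toscrape.com or any other site. Against a live site you would call `set_url()` with the site's `/robots.txt` URL and then `read()` instead of `parse()`.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt body, for illustration only
ROBOTS_TXT = """\
User-agent: *
Crawl-delay: 2
Disallow: /private/
"""

rp = RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

for url in ["http://example.com/page/2/", "http://example.com/private/data"]:
    # can_fetch() checks the URL's path against the rules for this user agent
    print(url, "allowed:", rp.can_fetch("*", url))

# Seconds to wait between requests if the file sets Crawl-delay, otherwise None
print("Crawl delay:", rp.crawl_delay("*"))

# Live version (makes a network request):
# rp.set_url("http://quotes.toscrape.com/robots.txt")
# rp.read()
```

A robots.txt check covers only part of the picture; the site's terms of service still apply.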

## Next Steps
- Scrape all pages by following the pagination links.
- Run sentiment analysis on the quotes or visualize them as word clouds.
- Try scraping job listings, shop items, or news headlines (always check site policy).
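Scraping every page mostly means repeating the single-page logic while following the "Next" link until there isn't one. The sketch below assumes the Next button is an `<a>` inside `<li class="next">` (confirm this against the live page's HTML before relying on it), resolves relative links with `urllib.parse.urljoin`, pauses between requests, and caps the page count as a safety net. `scrape_all_pages`, `parse_quotes`, and the `fetch` parameter are hypothetical names; `fetch` is injectable so the crawling logic can be exercised without network access.

```python
import time
from typing import Callable, Optional
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup


def parse_quotes(soup: BeautifulSoup) -> list:
    """Same fields as the single-page notebook code, one dict per quote."""
    rows = []
    for q in soup.select("div.quote"):
        rows.append({
            "Quote": q.select_one("span.text").get_text(strip=True),
            "Author": q.select_one("small.author").get_text(strip=True),
            "Tags": ", ".join(a.get_text(strip=True) for a in q.select("a.tag")),
        })
    return rows


def scrape_all_pages(
    start_url: str,
    fetch: Optional[Callable[[str], str]] = None,
    delay: float = 1.0,
    max_pages: int = 50,
) -> list:
    """Follow 'Next' links from start_url, collecting quotes from every page."""
    if fetch is None:
        session = requests.Session()

        def fetch(url: str) -> str:
            resp = session.get(url, timeout=10)
            resp.raise_for_status()
            return resp.text

    rows, url, pages = [], start_url, 0
    while url and pages < max_pages:
        soup = BeautifulSoup(fetch(url), "html.parser")
        rows.extend(parse_quotes(soup))
        pages += 1
        # Assumed markup: <li class="next"><a href="/page/2/">Next ...</a></li>
        next_link = soup.select_one("li.next > a")
        url = urljoin(url, next_link["href"]) if next_link else None
        if url and delay:
            time.sleep(delay)  # be polite: pause between requests
    return rows
```

Usage (makes real network requests): `df_all = pd.DataFrame(scrape_all_pages("http://quotes.toscrape.com"))`. Passing the page HTML through `fetch` keeps the crawler loop separate from the HTTP details, so the same loop works with a retrying session or cached pages.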

## About This Notebook
- **Author:** Aryan Tyagi
- **Location:** New Delhi, India
- **Tools:** Python, Requests, BeautifulSoup, Pandas

## References
- https://www.crummy.com/software/BeautifulSoup/bs4/doc/
- https://docs.python-requests.org/
- http://quotes.toscrape.com