## Web Scraping Quotes from "Quotes to Scrape"
This notebook scrapes quotes, authors, tags, and author-page links from https://quotes.toscrape.com.



## Import necessary libraries


In [None]:
import pandas as pd
from bs4 import BeautifulSoup
import requests

## Setting Up Web Scraping Environment  
- Define the **base URL** of the website.  
- Set up the **headers** to avoid being blocked.  
- Send a **GET request** to fetch the webpage content.


In [None]:
Base_url = 'https://quotes.toscrape.com'

url = f'{Base_url}/page/1/'
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/119.0.0.0 Safari/537.36"
}
response = requests.get(url, headers=headers, timeout=10)
response.raise_for_status()  # Stop early on an HTTP error instead of parsing an error page
web_page = response.text
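When many pages are fetched, a shared `requests.Session` can carry the headers and retry transient failures. The following is a minimal sketch of that idea, not the notebook's own method: the helper names `make_session` and `fetch_page`, the shortened User-Agent string, the retry settings, and the 10-second timeout are all assumptions chosen for illustration.

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

BASE_URL = "https://quotes.toscrape.com"


def make_session(user_agent: str = "Mozilla/5.0") -> requests.Session:
    """Return a Session that sends a User-Agent and retries transient errors."""
    session = requests.Session()
    session.headers.update({"User-Agent": user_agent})
    # Assumed retry policy: up to 3 retries with backoff on common transient statuses
    retry = Retry(total=3, backoff_factor=0.5,
                  status_forcelist=[429, 500, 502, 503, 504])
    adapter = HTTPAdapter(max_retries=retry)
    session.mount("https://", adapter)
    session.mount("http://", adapter)
    return session


def fetch_page(session: requests.Session, page: int, timeout: float = 10.0) -> str:
    """Fetch one listing page and return its HTML, raising on HTTP errors."""
    response = session.get(f"{BASE_URL}/page/{page}/", timeout=timeout)
    response.raise_for_status()
    return response.text


session = make_session()
# html = fetch_page(session, 1)  # Uncomment to perform a real request
```

The returned HTML string can be passed to `BeautifulSoup` exactly like `web_page` in the cell above.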

In [None]:
soup = BeautifulSoup(web_page,'lxml') # Parse the webpage content using BeautifulSoup and lxml parser

In [None]:
quotes = soup.find_all('div',class_ = 'quote') # Find all quote containers on the page

In [None]:
len(quotes)

10

## Extracting Quotes, Authors, Tags, and Author Links  

In [None]:
# Create empty lists to store the scraped data
quote = []
author = []
genre = []
about = []

# Loop through all quotes found on the page
for i in quotes:
    quote.append(i.find('span', class_='text').text.strip())
    author.append(i.find('small', class_='author').text.strip())
    # Join all tags of a quote into one comma-separated string
    genre.append(",".join([tag.text.strip() for tag in i.find_all('a', class_='tag')]))

    # The first link inside a quote block points to the author's page
    relative_url = i.find('a', href=True)['href']
    full_author_url = Base_url + relative_url  # Combine with base URL
    about.append(full_author_url)  # Store the full link
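String concatenation works here because every author link starts with `/`. As a hedged alternative, the standard library's `urllib.parse.urljoin` also handles hrefs without a leading slash and hrefs that are already absolute. In the sketch below, the helper name `author_url` and the sample hrefs other than `/author/Albert-Einstein` are illustrative assumptions.

```python
from urllib.parse import urljoin

base_url = "https://quotes.toscrape.com"


def author_url(href: str) -> str:
    """Resolve an href found on the page against the site's base URL."""
    return urljoin(base_url, href)


# Root-relative href, the form used by the author links on this site
print(author_url("/author/Albert-Einstein"))
# Href without a leading slash (hypothetical); plain concatenation would
# produce "https://quotes.toscrape.comauthor/Jane-Austen"
print(author_url("author/Jane-Austen"))
# Already-absolute href (hypothetical) is returned unchanged
print(author_url("https://example.com/elsewhere"))
```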


## Creating a DataFrame

In [None]:
# Create a dictionary with the extracted data
d={'quote':quote,'author':author,'genre':genre,'about':about}

# Convert the dictionary into a Pandas DataFrame
df = pd.DataFrame(d)
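Four parallel lists work as long as every `append` runs for every quote; if one is skipped, the columns end up with different lengths. One common alternative, sketched here under the assumption that each quote is collected as a single dictionary, lets pandas build the columns from a list of records. The two records below are hypothetical placeholders, not scraped data.

```python
import pandas as pd

# Hypothetical records standing in for scraped quotes
records = [
    {"quote": "Example quote one.", "author": "Author A",
     "genre": "tag1,tag2", "about": "https://quotes.toscrape.com/author/Author-A"},
    {"quote": "Example quote two.", "author": "Author B",
     "genre": "tag3", "about": "https://quotes.toscrape.com/author/Author-B"},
]

# Each dictionary becomes one row; the keys become the column names
df_records = pd.DataFrame(records)
print(df_records.shape)
```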

In [None]:
df

| | quote | author | genre | about |
|---|---|---|---|---|
| 0 | “The world as we have created it is a process ... | Albert Einstein | change,deep-thoughts,thinking,world | https://quotes.toscrape.com/author/Albert-Eins... |
| 1 | “It is our choices, Harry, that show what we t... | J.K. Rowling | abilities,choices | https://quotes.toscrape.com/author/J-K-Rowling |
| 2 | “There are only two ways to live your life. On... | Albert Einstein | inspirational,life,live,miracle,miracles | https://quotes.toscrape.com/author/Albert-Eins... |
| 3 | “The person, be it gentleman or lady, who has ... | Jane Austen | aliteracy,books,classic,humor | https://quotes.toscrape.com/author/Jane-Austen |
| 4 | “Imperfection is beauty, madness is genius and... | Marilyn Monroe | be-yourself,inspirational | https://quotes.toscrape.com/author/Marilyn-Monroe |
| 5 | “Try not to become a man of success. Rather be... | Albert Einstein | adulthood,success,value | https://quotes.toscrape.com/author/Albert-Eins... |
| 6 | “It is better to be hated for what you are tha... | André Gide | life,love | https://quotes.toscrape.com/author/Andre-Gide |
| 7 | “I have not failed. I've just found 10,000 way... | Thomas A. Edison | edison,failure,inspirational,paraphrased | https://quotes.toscrape.com/author/Thomas-A-Ed... |
| 8 | “A woman is like a tea bag; you never know how... | Eleanor Roosevelt | misattributed-eleanor-roosevelt | https://quotes.toscrape.com/author/Eleanor-Roo... |
| 9 | “A day without sunshine is like, you know, nig... | Steve Martin | humor,obvious,simile | https://quotes.toscrape.com/author/Steve-Martin |


## Scraping Multiple Pages (Page 1 to 9)  
- Loop through **pages 1 to 9** dynamically.  
- Extract quotes, authors, tags, and author URLs.  
- Store data in **separate DataFrames** and merge them at the end.  



In [None]:
all_dfs = []  # Create a list to store one DataFrame per page

Base_url = 'https://quotes.toscrape.com'
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/119.0.0.0 Safari/537.36"
}

for j in range(1, 10):  # Pages 1 to 9
    url = f'{Base_url}/page/{j}/'  # Build the URL for the current page
    response = requests.get(url, headers=headers, timeout=10)
    response.raise_for_status()  # Stop on an HTTP error
    web_page = response.text

    soup = BeautifulSoup(web_page, 'lxml')
    quotes = soup.find_all('div', class_='quote')

    quote = []
    author = []
    genre = []
    about = []

    for i in quotes:
        quote.append(i.find('span', class_='text').text.strip())
        author.append(i.find('small', class_='author').text.strip())
        genre.append(",".join([tag.text.strip() for tag in i.find_all('a', class_='tag')]))

        relative_url = i.find('a', href=True)['href']
        full_author_url = Base_url + relative_url  # Combine with base URL
        about.append(full_author_url)  # Store the full link

    d = {'quote': quote, 'author': author, 'genre': genre, 'about': about}
    df = pd.DataFrame(d)
    all_dfs.append(df)  # Append the DataFrame to the list

# Concatenate all DataFrames into one final DataFrame
final = pd.concat(all_dfs, ignore_index=True)
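The per-page extraction inside the loop above can also be factored into a function, which makes it testable without a network request. The sketch below is one possible shape, not the notebook's own code: the function name `parse_quotes` is hypothetical, `SAMPLE_HTML` is a simplified stand-in for the site's markup (an assumption; the real pages carry more attributes), and the built-in `html.parser` is used instead of `lxml` only to avoid the extra dependency.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

BASE_URL = "https://quotes.toscrape.com"

# Simplified, hypothetical stand-in for one quote block on a listing page
SAMPLE_HTML = """
<div class="quote">
  <span class="text">"Example quote."</span>
  <span>by <small class="author">Example Author</small>
    <a href="/author/Example-Author">(about)</a></span>
  <div class="tags">
    <a class="tag" href="/tag/one/">one</a>
    <a class="tag" href="/tag/two/">two</a>
  </div>
</div>
"""


def parse_quotes(html: str, base_url: str = BASE_URL) -> list:
    """Return one dict per quote block found in the given HTML."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for block in soup.find_all("div", class_="quote"):
        rows.append({
            "quote": block.find("span", class_="text").text.strip(),
            "author": block.find("small", class_="author").text.strip(),
            "genre": ",".join(t.text.strip() for t in block.find_all("a", class_="tag")),
            # The first link in the block is the author's page
            "about": urljoin(base_url, block.find("a", href=True)["href"]),
        })
    return rows


rows = parse_quotes(SAMPLE_HTML)
print(rows)
```

In a multi-page loop, each page's HTML would be passed to `parse_quotes` and the returned rows collected in one list before building a single DataFrame.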


In [None]:
final

| | quote | author | genre | about |
|---|---|---|---|---|
| 0 | “The world as we have created it is a process ... | Albert Einstein | change,deep-thoughts,thinking,world | https://quotes.toscrape.com/author/Albert-Eins... |
| 1 | “It is our choices, Harry, that show what we t... | J.K. Rowling | abilities,choices | https://quotes.toscrape.com/author/J-K-Rowling |
| 2 | “There are only two ways to live your life. On... | Albert Einstein | inspirational,life,live,miracle,miracles | https://quotes.toscrape.com/author/Albert-Eins... |
| 3 | “The person, be it gentleman or lady, who has ... | Jane Austen | aliteracy,books,classic,humor | https://quotes.toscrape.com/author/Jane-Austen |
| 4 | “Imperfection is beauty, madness is genius and... | Marilyn Monroe | be-yourself,inspirational | https://quotes.toscrape.com/author/Marilyn-Monroe |
| ... | ... | ... | ... | ... |
| 85 | “Try not to become a man of success. Rather be... | Albert Einstein | adulthood,success,value | https://quotes.toscrape.com/author/Albert-Eins... |
| 86 | “It is better to be hated for what you are tha... | André Gide | life,love | https://quotes.toscrape.com/author/Andre-Gide |
| 87 | “I have not failed. I've just found 10,000 way... | Thomas A. Edison | edison,failure,inspirational,paraphrased | https://quotes.toscrape.com/author/Thomas-A-Ed... |
| 88 | “A woman is like a tea bag; you never know how... | Eleanor Roosevelt | misattributed-eleanor-roosevelt | https://quotes.toscrape.com/author/Eleanor-Roo... |


In [None]:
len(final)

90

In [None]:
from google.colab import files

# Save the DataFrame to a CSV file
final.to_csv('Quotes_to_scrape.csv', index=False)

# Download the file
files.download('Quotes_to_scrape.csv')
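The `google.colab` import above only exists inside Colab. A hedged sketch of a version that also runs elsewhere is shown below: it guards the import, writes the CSV, and reads it back as a quick check. The two-row frame is a hypothetical stand-in for `final`, and the temporary output directory is an assumption made so the example does not write into the working directory.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical stand-in for the scraped `final` DataFrame
final_demo = pd.DataFrame({
    "quote": ["Example quote one.", "Example quote two."],
    "author": ["Author A", "Author B"],
    "genre": ["tag1,tag2", "tag3"],
    "about": ["https://quotes.toscrape.com/author/Author-A",
              "https://quotes.toscrape.com/author/Author-B"],
})

# Assumed output location: a fresh temporary directory
csv_path = Path(tempfile.mkdtemp()) / "Quotes_to_scrape.csv"
final_demo.to_csv(csv_path, index=False, encoding="utf-8")

try:
    from google.colab import files  # Only importable inside Colab
except ImportError:
    files = None

if files is not None:
    files.download(str(csv_path))  # Trigger a browser download in Colab

# Read the file back to confirm that all rows and columns were written
check = pd.read_csv(csv_path)
print(check.shape)
```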


In [None]:
!ls /content/drive/MyDrive


'Colab Notebooks'
