# In This Project We Are Scraping Data, i.e., Name (Author), Quotes and Tags
# Site Used: http://quotes.toscrape.com/

# Libraries Used:
 Pandas: Here pandas is used to create a DataFrame and save it to a CSV file.

 BeautifulSoup: Here BeautifulSoup is used to parse the downloaded HTML and extract the required information from the web pages.

 Requests: Here Requests is used to send HTTP requests and receive the response data from the website.

# Installing All the Libraries Required for This Project

In [1]:
# Installing pandas
!pip install pandas



In [2]:
# Installing BeautifulSoup (the PyPI package is beautifulsoup4; it is imported as bs4)
!pip install beautifulsoup4



In [3]:
# Installing requests
!pip install requests


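To confirm that the three packages are actually available in the current environment, their installed versions can be queried from Python itself. This is a minimal sketch using the standard-library `importlib.metadata` module (Python 3.8+); `check_installed` and `REQUIRED` are illustrative names, not part of the original project.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names as published on PyPI (the bs4 module comes from "beautifulsoup4").
REQUIRED = ["pandas", "beautifulsoup4", "requests"]


def check_installed(packages):
    """Map each distribution name to its installed version, or None if it is missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


for name, ver in check_installed(REQUIRED).items():
    print(f"{name}: {ver or 'NOT INSTALLED'}")
```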

# Importing All the Libraries Required for This Project

In [4]:
import requests
from bs4 import BeautifulSoup
import pandas as pd

# Creating Functions for the End-to-End Process: Downloading, Parsing and Saving to CSV

In [36]:
# This function scrapes data from the website; in this project we scrape the name (author), quote and tags.
def scrap_data():
    # Empty list for storing the scraped rows.
    Content = []
    # i = 1 means that the scraping process starts from page 1.
    i = 1

    # A while loop checks every available page that has the information we need (name, quote and tags).
    # We don't know in advance how many pages exist, so instead of checking them manually,
    # the loop keeps requesting the next page until one comes back without any quotes.
    while True:
        url = f"http://quotes.toscrape.com/page/{i}/"
        response = requests.get(url, timeout=10)  # timeout in seconds, so a stalled request cannot hang forever
        soup = BeautifulSoup(response.content, 'html.parser')
        Data = soup.find_all('div', class_='quote')
        if not Data:
            # No quotes on this page: every page with data has been scraped, so save the results.
            print("Total number of page with required data is:", i - 1)
            save_as_csv(Content)
            print("Data saved in csv file")
            break  # stop requesting pages once no quotes are found

        # Iterate through all quote elements on the page and extract the author's name,
        # the quote text, and the tags associated with the quote.
        for data in Data:
            # Name (author)
            name = data.find('small', class_='author').text
            # Quote text
            quotes = data.find('span', class_='text').text
            # Tags: stored as a comma-separated string in <meta itemprop="keywords" content="...">
            tag = data.find('div', class_='tags')
            keywords = tag.find('meta', attrs={'itemprop': 'keywords'})
            tags = keywords['content']
            # Append the extracted row to the Content list created above.
            Content.append([name, quotes, tags])
        i += 1  # move on to the next page


# This function saves the scraped data in CSV format.
def save_as_csv(Content):
    data = pd.DataFrame(Content, columns=['Name', 'Quotes', 'Tags'])
    data.to_csv('quote_Scrap.csv', index=False)


# Call scrap_data() to run the whole scrape-and-save process.
scrap_data()

Total number of page with required data is: 10
Data saved in csv file
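The loop in `scrap_data()` never checks the HTTP status code, so an error page (for example a 5xx response) would simply look like a page without quotes and end the scrape early. A minimal sketch of a more defensive download helper, assuming a shared `requests.Session`; `fetch_page` is a hypothetical name, not part of the original project.

```python
import requests


def fetch_page(session: requests.Session, url: str, timeout: float = 10.0) -> str:
    """Download one page and return its HTML as text.

    timeout: seconds to wait for the server before giving up.
    Raises requests.HTTPError for 4xx/5xx responses, so an error page is not
    mistaken for a page that simply has no quotes.
    """
    response = session.get(url, timeout=timeout)
    response.raise_for_status()
    return response.text
```

Inside a `with requests.Session() as session:` block, the loop could call `fetch_page(session, url)` and reuse one connection for every page, provided the site answers out-of-range page numbers with a normal 200 response rather than an error status.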


In [29]:
# Verifying the information in the saved CSV file by reading it back with pandas.

df = pd.read_csv("quote_Scrap.csv")
df.head(100)

Unnamed: 0,Name,Quotes,Tags
0,Albert Einstein,“The world as we have created it is a process ...,"change,deep-thoughts,thinking,world"
1,J.K. Rowling,"“It is our choices, Harry, that show what we t...","abilities,choices"
2,Albert Einstein,“There are only two ways to live your life. On...,"inspirational,life,live,miracle,miracles"
3,Jane Austen,"“The person, be it gentleman or lady, who has ...","aliteracy,books,classic,humor"
4,Marilyn Monroe,"“Imperfection is beauty, madness is genius and...","be-yourself,inspirational"
...,...,...,...
95,Harper Lee,“You never really understand a person until yo...,better-life-empathy
96,Madeleine L'Engle,“You have to write the book that wants to be w...,"books,children,difficult,grown-ups,write,write..."
97,Mark Twain,“Never tell the truth to people who are not wo...,truth
98,Dr. Seuss,"“A person's a person, no matter how small.”",inspirational
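Because the tags are stored as one comma-separated string per quote, a natural follow-up check after reading the file back is to split them into individual tags and count them. A minimal sketch; `tag_counts` is a hypothetical helper, and the sample frame reuses two rows' Name and Tags values from the output above rather than the full file.

```python
import pandas as pd


def tag_counts(frame: pd.DataFrame) -> pd.Series:
    """Count how often each tag occurs in a DataFrame with a comma-separated 'Tags' column."""
    tags = frame["Tags"].fillna("").str.split(",")  # "abilities,choices" -> ["abilities", "choices"]
    tags = tags.explode()                           # one row per (quote, tag) pair
    tags = tags[tags != ""]                         # drop quotes that had no tags
    return tags.value_counts()


# Name and Tags values of two rows copied from the output above.
sample = pd.DataFrame({
    "Name": ["Albert Einstein", "Marilyn Monroe"],
    "Tags": ["inspirational,life,live,miracle,miracles", "be-yourself,inspirational"],
})
print(tag_counts(sample))
```

On the real file the same call would be `tag_counts(pd.read_csv("quote_Scrap.csv")).head(10)`.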


# Summary:
This is a Python project that scrapes data from a website. Its main function, scrap_data(), extracts the name of the author, the quote text and the tags from "http://quotes.toscrape.com/".

The function scrap_data() begins by creating an empty list called "Content" to store the scraped data. It then sets a variable called "i" to 1, indicating that it will begin scraping data from page 1 of the website.

The function uses a while loop to iterate through all the available pages of the website. For each page it builds the URL with an f-string (e.g. http://quotes.toscrape.com/page/2/), requests the page with the requests library, and parses the HTML content with BeautifulSoup. A while loop is used because the number of pages is not known in advance: instead of checking the pages manually, the loop keeps requesting the next page until it reaches one that contains no quote elements, and then it breaks.
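The page-number loop stops at the first page without `div.quote` elements. An alternative stopping rule is to follow the site's "Next" link until it disappears. The sketch below assumes each page marks that link as `<li class="next"><a href="...">`; `iter_pages`, `fetch` and `max_pages` are hypothetical names, and `max_pages` is an added safety cap that the original does not have.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def iter_pages(fetch, start_url, max_pages=100):
    """Yield (url, soup) pairs, following the "Next" link from page to page.

    fetch: any callable that takes a URL and returns the page's HTML text.
    max_pages: safety cap so a malformed site cannot cause an endless loop.
    """
    url = start_url
    for _ in range(max_pages):
        soup = BeautifulSoup(fetch(url), "html.parser")
        yield url, soup
        # Assumed markup: <li class="next"><a href="/page/2/">Next</a></li>
        next_item = soup.find("li", class_="next")
        link = next_item.find("a") if next_item is not None else None
        if link is None:
            break  # no "Next" link: this was the last page
        url = urljoin(url, link["href"])  # resolve a relative href against the current page URL
```

With live requests, `fetch` could be `lambda u: requests.get(u, timeout=10).text`, and each `soup` could go through the same `div.quote` extraction used in `scrap_data()`. Passing `fetch` in as a parameter keeps the pagination logic testable without network access.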

The function then uses a for loop to iterate through all the quote elements on the page and extracts the name of the author, the quote text, and the tags associated with the quote. This data is then appended to the "Content" list.
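The extraction step can also be isolated into a function that takes HTML text, which makes it testable on a saved snippet without downloading anything. A minimal sketch using the same tags and classes as `scrap_data()`; `parse_quotes` is a hypothetical name, and the sample HTML is a hand-written snippet that mimics the site's markup, using the Dr. Seuss quote shown in the output above.

```python
from bs4 import BeautifulSoup


def parse_quotes(html):
    """Return [name, quote, tags] rows for every div.quote element in `html`."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for block in soup.find_all("div", class_="quote"):
        name = block.find("small", class_="author").get_text(strip=True)
        quote = block.find("span", class_="text").get_text(strip=True)
        # Tags are a comma-separated string in <meta itemprop="keywords" content="...">.
        meta = block.find("meta", attrs={"itemprop": "keywords"})
        tags = meta["content"] if meta is not None else ""
        rows.append([name, quote, tags])
    return rows


sample_html = """
<div class="quote">
  <span class="text">“A person's a person, no matter how small.”</span>
  <span>by <small class="author">Dr. Seuss</small></span>
  <div class="tags"><meta class="keywords" itemprop="keywords" content="inspirational"></div>
</div>
"""
print(parse_quotes(sample_html))
```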

Once all the pages have been scraped, the function calls another function called "save_as_csv" to save the scraped data to a CSV file.

The "save_as_csv" function takes the "Content" list as an input parameter and converts it into a Pandas DataFrame. The DataFrame is then saved as a CSV file called "quote_Scrap.csv".
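The `index=False` argument matters when the file is read back: if the DataFrame's row index is written, `pd.read_csv` sees an unnamed first column and labels it `Unnamed: 0`, which is likely where the `Unnamed: 0` header in the read-back output comes from. A small in-memory round trip (using `io.StringIO` instead of `quote_Scrap.csv`) shows the difference:

```python
import io

import pandas as pd

rows = [["Dr. Seuss", "“A person's a person, no matter how small.”", "inspirational"]]
frame = pd.DataFrame(rows, columns=["Name", "Quotes", "Tags"])

# Default index=True writes the row index as an extra, unnamed first column.
with_index = pd.read_csv(io.StringIO(frame.to_csv()))
# index=False writes only the three data columns.
without_index = pd.read_csv(io.StringIO(frame.to_csv(index=False)))

print(list(with_index.columns))     # ['Unnamed: 0', 'Name', 'Quotes', 'Tags']
print(list(without_index.columns))  # ['Name', 'Quotes', 'Tags']
```

A file that was already written with its index can instead be read with `pd.read_csv(path, index_col=0)`, which turns that first column back into the index.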

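Overall, the same pattern (request each page, parse the repeated elements with BeautifulSoup, collect the rows in a list, and save them with pandas) can be adapted to other paginated websites whose pages share a consistent HTML structure, by changing the URL pattern and the tags and classes used to locate the data.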