# Importing Necessary Libraries

In [None]:
import requests
from bs4 import BeautifulSoup as bs
import pandas as pd

# Extracting Book Details

This Python function, extract_book_details, parses information from a given book tag, typically obtained from a webpage containing a list of books. It extracts details such as the book's title, rating, price, and link to its individual page.

1. The Title is read from the title attribute of the anchor tag within the book tag.
2. The Rating is the second CSS class of the first paragraph tag within the book tag (for example, the word Three in the class list star-rating Three).
3. The Price is the text of the paragraph tag with the class price_color, with the leading currency symbol removed. It is kept as a string.
4. The Link is formed by joining the site's base URL with the relative URL taken from the anchor tag's href attribute.

The function then returns these extracted details as a tuple: (Title, Rating, Price, Link).

In [None]:
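def extract_book_details(book_tag):
    # The title attribute holds the full title; the visible link text may be truncated.
    Title = book_tag.find('a', title=True)['title']
    # The first <p> carries classes like ['star-rating', 'Three']; keep the rating word.
    Rating = book_tag.find('p')['class'][1]
    # Drop the leading currency symbol from the price text.
    Price = book_tag.find('p', class_='price_color').text[1:]
    # hrefs on catalogue/page-N.html are relative to /catalogue/, so join against that path.
    Link = 'http://books.toscrape.com/catalogue/' + book_tag.find('a')['href']
    return Title, Rating, Price, Link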
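
Building the link by string concatenation only works when the href is relative to one fixed directory. As a more general alternative, the sketch below resolves the href against the URL of the page it was scraped from, using urljoin from the standard library. The helper name resolve_book_link and the sample href are illustrative assumptions, not part of the original notebook.

```python
from urllib.parse import urljoin


def resolve_book_link(page_url, href):
    """Resolve a book's relative href against the URL of the page it came from.

    Hypothetical helper: it assumes the caller still knows which page the
    book tag was scraped from.
    """
    return urljoin(page_url, href)


# The href below is an assumed example of what a listing page might contain.
listing_url = 'http://books.toscrape.com/catalogue/page-1.html'
link = resolve_book_link(listing_url, 'a-light-in-the-attic_1000/index.html')
print(link)
```

Using this inside extract_book_details would mean passing the page URL in as a second argument, which changes the function's signature.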

# Establishing Connection

This Python function, fetch_page_soup, is responsible for retrieving the HTML content of a webpage specified by the url parameter and converting it into a BeautifulSoup object for parsing.

1. It first sends a GET request to the provided URL using the requests.get() function from the requests library.
2. If the response status code is 200 (indicating a successful request), the HTML content of the webpage is passed to bs(), the alias for BeautifulSoup imported above, creating a BeautifulSoup object.
3. If the status code is anything other than 200, the function returns None, signifying that the webpage could not be fetched.

In [None]:
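def fetch_page_soup(url):
    # A timeout keeps the request from hanging indefinitely on a stalled connection.
    resp = requests.get(url, timeout=10)
    if resp.status_code == 200:
        # Name the parser explicitly so bs4 does not warn or choose a different one per machine.
        return bs(resp.text, 'html.parser')
    return None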
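
The function above gives up after a single failed request. If transient failures are a concern, a retry loop along the following lines could be wrapped around the download step. This is only a sketch: the name fetch_text_with_retries, the retry count, and the delay are assumptions, and the get parameter stands in for requests.get so that the logic can be exercised without network access.

```python
import time


def fetch_text_with_retries(url, get, retries=3, delay=1.0):
    """Return the response body for `url`, or None after `retries` failed attempts.

    `get` is any callable shaped like requests.get: it takes a URL and returns
    an object with `status_code` and `text` attributes.
    """
    for attempt in range(1, retries + 1):
        try:
            resp = get(url)
            if resp.status_code == 200:
                return resp.text
        except OSError:
            # requests' exceptions derive from OSError; count them as a failed attempt.
            pass
        if attempt < retries:
            time.sleep(delay)  # pause before the next attempt
    return None
```

In the notebook it could be called as fetch_text_with_retries(page_url, requests.get), with the returned HTML then passed to bs() when it is not None.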

# Scraping the Data

The fetch_books_data function collects data from multiple pages of a fictional online bookstore and stores it in a structured format using the pandas library. For each page number from 1 to pages (7 by default), it fetches the page's HTML with the helper fetch_page_soup, finds every article element with the class product_pod, and passes each one to the helper extract_book_details. Pages that cannot be fetched are skipped. The collected tuples are then compiled into a DataFrame with the columns Title, Rating, Price, and Link, ready for further analysis or processing.

In [None]:
def fetch_books_data(pages=7):
    all_books = []
    for page_num in range(1, pages + 1):
        page_url = f'http://books.toscrape.com/catalogue/page-{page_num}.html'
        page_soup = fetch_page_soup(page_url)
        if page_soup:
            book_tags = page_soup.find_all('article', class_='product_pod')
            for book_tag in book_tags:
                book_info = extract_book_details(book_tag)
                all_books.append(book_info)

    all_books = pd.DataFrame(all_books, columns=['Title', 'Rating', 'Price', 'Link'])
    return all_books


In [None]:
df = fetch_books_data(7)
df.head(10)
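
At this point Rating holds a word and Price holds a string, so neither column can be sorted or averaged numerically. The sketch below shows one possible way to convert a record to numeric types before analysis. The helper name normalize_book and the RATING_WORDS mapping are assumptions made for illustration; the regular expression is there in case a stray currency symbol survives the slicing done in extract_book_details.

```python
import re

# Assumed mapping: ratings are expected to be the words One through Five.
RATING_WORDS = {'One': 1, 'Two': 2, 'Three': 3, 'Four': 4, 'Five': 5}


def normalize_book(title, rating, price, link):
    """Return the same record with Rating as an int and Price as a float."""
    # Pull the first decimal number out of the price text, ignoring any symbols.
    match = re.search(r'\d+(?:\.\d+)?', price)
    return (
        title,
        RATING_WORDS.get(rating),                  # None for an unexpected rating word
        float(match.group()) if match else None,   # None if no number is found
        link,
    )


# Sample record with made-up values in the shape returned by extract_book_details.
sample = ('Example Title', 'Three', '\u00a351.77', 'http://books.toscrape.com/')
print(normalize_book(*sample))
```

One way to apply it to the scraped data is to rebuild the DataFrame from [normalize_book(*row) for row in df.itertuples(index=False)] with the same column names.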