# 📚 Web Scraping Books Data (Beginner Guide)

**✅ What You'll Learn**

1. What is Web Scraping?
2. How to Extract Data from Websites


**1. What is Web Scraping?**

Web scraping is the process of automatically extracting data from websites using code. It involves fetching a webpage, parsing its HTML, and extracting the useful information so that it can be stored in a database or analyzed.

**How Does Web Scraping Work?**

**Send a Request →** A program sends an HTTP request to a website.

**Retrieve HTML →** The website returns the raw HTML code.

**Parse the Data →** A parser (e.g., BeautifulSoup) extracts useful content.

**Store or Process →** The data is stored in a structured format like CSV, JSON, or a database.
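The four steps above can be sketched offline as follows. This is only a minimal illustration under some assumptions: the `SAMPLE_HTML` string is a made-up stand-in for the HTML a server would return in steps 1 and 2, the parser is Python's built-in `html.parser` rather than BeautifulSoup (so the snippet runs without extra packages or a network connection), and the names `TitleParser`, `extract_titles`, and `to_csv_text` are hypothetical helpers.

```python
import csv
import io
from html.parser import HTMLParser

# Hypothetical stand-in for the raw HTML a website would return (steps 1 and 2).
SAMPLE_HTML = """
<html><body>
  <h3><a href="/book-1" title="Example Book One">Example Book...</a></h3>
  <h3><a href="/book-2" title="Example Book Two">Example Book...</a></h3>
</body></html>
"""


class TitleParser(HTMLParser):
    """Collects the title attribute of every <a> tag nested inside an <h3>."""

    def __init__(self):
        super().__init__()
        self.in_h3 = False
        self.titles = []

    def handle_starttag(self, tag, attrs):
        if tag == "h3":
            self.in_h3 = True
        elif tag == "a" and self.in_h3:
            title = dict(attrs).get("title")
            if title:
                self.titles.append(title)

    def handle_endtag(self, tag):
        if tag == "h3":
            self.in_h3 = False


def extract_titles(html):
    """Step 3: parse the HTML and pull out the useful content."""
    parser = TitleParser()
    parser.feed(html)
    return parser.titles


def to_csv_text(titles):
    """Step 4: store the data in a structured format (CSV text kept in memory)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["Title"])
    for title in titles:
        writer.writerow([title])
    return buffer.getvalue()


titles = extract_titles(SAMPLE_HTML)
csv_text = to_csv_text(titles)
print(csv_text)
```

In a real scraper, the sample string would be replaced by the text of an HTTP response, and the CSV text would be written to a file instead of being kept in memory.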

**2. How to Extract Data from Websites**
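The script in this section visits `page-1.html`, `page-2.html`, and so on, and stops at the first response whose status code is not 200. That stopping rule can be tried without a network connection. The sketch below is an illustration, not the script itself: `crawl_pages`, `fetch_status`, `max_pages`, and `fake_fetch_status` are hypothetical names, and the fake fetcher simply pretends that only three pages exist.

```python
BASE_URL = "https://books.toscrape.com/catalogue/page-{}.html"


def crawl_pages(fetch_status, max_pages=100):
    """Return the URLs visited until fetch_status(url) is not 200.

    fetch_status is a hypothetical callable that takes a URL and returns an
    HTTP status code; max_pages is a safety cap so the loop always ends.
    """
    visited = []
    for page in range(1, max_pages + 1):
        url = BASE_URL.format(page)  # fills {} with the page number
        if fetch_status(url) != 200:
            break
        visited.append(url)
    return visited


def fake_fetch_status(url):
    """Stand-in for a real request: pretends only pages 1 to 3 exist."""
    existing = {BASE_URL.format(n) for n in (1, 2, 3)}
    return 200 if url in existing else 404


visited = crawl_pages(fake_fetch_status)
print(visited)
```

With the `requests` library, a real fetcher could be passed in as `lambda url: requests.get(url).status_code`.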


In [2]:
# Import all the required libraries
import pandas as pd
import requests
from bs4 import BeautifulSoup

# Base URL of the website; {} is a placeholder for the page number
base_url = 'https://books.toscrape.com/catalogue/page-{}.html'
page = 1

# Initialize empty lists to store the data extracted from the website
bookTitleList = []
prices = []
ratingsOfBooks = []

# Loop over the pages by filling the page number into the base URL; the loop ends at the first failed request
while True:
    # base_url.format(page) fills the {} placeholder in base_url with the page number (page-1, page-2, etc.)
    url = base_url.format(page)
    # requests.get sends an HTTP GET request to the website
    response = requests.get(url)

    # Any status code other than 200 means the page could not be fetched (for example, it does not exist), so stop
    if response.status_code != 200:
        break

    # Parse the response with BeautifulSoup using the built-in 'html.parser'
    soup = BeautifulSoup(response.text, 'html.parser')

    # find_all returns every matching tag; class_ restricts the match to tags with that CSS class
    books = soup.find_all('h3')
    priceOfBooks = soup.find_all('p', class_="price_color")

    # Every book title is stored in the title attribute of the <a> tag inside an <h3> tag
    for book in books:
        bookTitle = book.a["title"]
        bookTitleList.append(bookTitle)

    # Similarly for the price of each book
    for price in priceOfBooks:
        priceOfBook = price.text
        # Note: [1:-1] drops the first and the last character of the text, so the final digit of each price is lost; use [1:] to keep it
        priceOfBook = priceOfBook[1:-1]
        prices.append(priceOfBook)

    # Similarly for the rating of each book
    ratings = soup.find_all('p', class_="star-rating")
    for rating in ratings:
        # The second class name represents the book rating (e.g., "Three", "Four")
        ratingOfBook = rating["class"][1]
        ratingsOfBooks.append(ratingOfBook)

    page = page + 1

# Create a DataFrame using the extracted data
df = pd.DataFrame({
    "Title": bookTitleList,
    "Price": prices,
    "Rating": ratingsOfBooks
})

# Save the extracted data to a CSV file
df.to_csv("books_data.csv", index=False)

# Now we can perform ML or data analysis on this dataset
BooksDataFrame = pd.read_csv("books_data.csv")
BooksDataFrame.head()

| | Title | Price | Rating |
|---|---|---|---|
| 0 | A Light in the Attic | £51.7 | Three |
| 1 | Tipping the Velvet | £53.7 | One |
| 2 | Soumission | £50.1 | One |
| 3 | Sharp Objects | £47.8 | Four |
| 4 | Sapiens: A Brief History of Humankind | £54.2 | Five |
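Before any analysis, the `Price` and `Rating` columns usually need to be turned into numbers, because both are stored as text. The following is a minimal sketch of one way to do that, assuming prices look like the ones in the output above and ratings are the words "One" to "Five". The names `RATING_TO_NUMBER`, `price_to_float`, and `rating_to_number` are hypothetical helpers, not part of the original script.

```python
import re

# Mapping from the rating words taken from the class names to numbers.
RATING_TO_NUMBER = {"One": 1, "Two": 2, "Three": 3, "Four": 4, "Five": 5}


def price_to_float(price_text):
    """Return the first decimal number found in price_text as a float."""
    match = re.search(r"\d+(?:\.\d+)?", price_text)
    if match is None:
        raise ValueError(f"no number found in {price_text!r}")
    return float(match.group())


def rating_to_number(rating_word):
    """Convert a rating word such as 'Three' into the integer 3."""
    return RATING_TO_NUMBER[rating_word]


# Two rows copied from the output above.
rows = [
    ("A Light in the Attic", "£51.7", "Three"),
    ("Tipping the Velvet", "£53.7", "One"),
]
cleaned = [
    (title, price_to_float(price), rating_to_number(rating))
    for title, price, rating in rows
]
print(cleaned)
```

On the DataFrame itself, the same helpers could be applied with `BooksDataFrame["Price"].map(price_to_float)` and `BooksDataFrame["Rating"].map(rating_to_number)`.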
