# 📘 Web Scraping Project: Books to Scrape

In this project, we will scrape data from [Books to Scrape](https://books.toscrape.com),  
a mock e-commerce website designed for practicing web scraping.

**We will extract:**
- Book Title
- Price
- Availability
- Rating
- Product Page URL

We will use the Python libraries `requests`, `BeautifulSoup` (from the `bs4` package), and `pandas`.


# 🔧 Import Libraries

In [None]:
# Step 1: Import required libraries
import requests
from bs4 import BeautifulSoup
import pandas as pd


# 🔢 Rating Converter

In [4]:
# Step 2: Define helper function to get star rating as number
def convert_rating(text):
    ratings = {'One': 1, 'Two': 2, 'Three': 3, 'Four': 4, 'Five': 5}
    return ratings.get(text, 0)
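
The scraper in Step 3 reads the rating word from the second class of the star-rating `<p>` tag (a class list such as `['star-rating', 'Three']`). As a minimal sketch, assuming that class layout, the hypothetical helper `rating_from_classes` below finds the rating word by membership rather than by position, so it does not depend on the order of the classes. It is an illustration only and is not part of the original notebook.

```python
# Hypothetical helper: map a tag's class list to a numeric star rating.
RATINGS = {'One': 1, 'Two': 2, 'Three': 3, 'Four': 4, 'Five': 5}

def rating_from_classes(classes):
    """Return the first recognised rating in `classes`, or 0 if none is present."""
    for name in classes:
        if name in RATINGS:
            return RATINGS[name]
    return 0

print(rating_from_classes(['star-rating', 'Three']))  # 3
print(rating_from_classes(['price_color']))           # 0
```

Returning 0 for an unknown class list mirrors the default used by `convert_rating`.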


# 🔁 Loop Over 50 Pages

In [20]:
# Step 3: Scrape book data from all 50 pages
base_url = 'https://books.toscrape.com/catalogue/page-{}.html'
books = []

for page in range(1, 51):
    url = base_url.format(page)
    print(f"Scraping page {page}...")
    res = requests.get(url, timeout=10)
    res.raise_for_status()  # fail fast on HTTP errors instead of parsing an error page
    soup = BeautifulSoup(res.content, 'html.parser')
    articles = soup.find_all('article', class_='product_pod')

    for book in articles:
        title = book.h3.a['title']
        price = book.find('p', class_='price_color').text.strip()[1:]  # remove £
        availability = book.find('p', class_='instock availability').text.strip()
        rating = book.p['class'][1]  # get rating text like 'Three'
        rating_num = convert_rating(rating)
        link = 'https://books.toscrape.com/catalogue/' + book.h3.a['href']
        books.append([title, price, availability, rating_num, link])


Scraping page 1...
Scraping page 2...
Scraping page 3...
Scraping page 4...
Scraping page 5...
Scraping page 6...
Scraping page 7...
Scraping page 8...
Scraping page 9...
Scraping page 10...
Scraping page 11...
Scraping page 12...
Scraping page 13...
Scraping page 14...
Scraping page 15...
Scraping page 16...
Scraping page 17...
Scraping page 18...
Scraping page 19...
Scraping page 20...
Scraping page 21...
Scraping page 22...
Scraping page 23...
Scraping page 24...
Scraping page 25...
Scraping page 26...
Scraping page 27...
Scraping page 28...
Scraping page 29...
Scraping page 30...
Scraping page 31...
Scraping page 32...
Scraping page 33...
Scraping page 34...
Scraping page 35...
Scraping page 36...
Scraping page 37...
Scraping page 38...
Scraping page 39...
Scraping page 40...
Scraping page 41...
Scraping page 42...
Scraping page 43...
Scraping page 44...
Scraping page 45...
Scraping page 46...
Scraping page 47...
Scraping page 48...
Scraping page 49...
Scraping page 50...
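
The loop above strips the currency sign by slicing off the first character and builds each product link by string concatenation. A slightly more defensive sketch, using only the standard library, could pull the number out with a regular expression and resolve the link against the page URL with `urllib.parse.urljoin`. The helper names `clean_price` and `absolute_link` and the sample `href` are hypothetical and chosen for illustration.

```python
import re
from urllib.parse import urljoin

def clean_price(text):
    """Extract the numeric part of a price string such as '£51.77' as a float."""
    match = re.search(r'\d+(?:\.\d+)?', text)
    if match is None:
        raise ValueError(f"no number found in {text!r}")
    return float(match.group())

def absolute_link(page_url, href):
    """Resolve a relative href against the URL of the page it was found on."""
    return urljoin(page_url, href)

page_url = 'https://books.toscrape.com/catalogue/page-1.html'
print(clean_price('£51.77'))
# 'some-book_1/index.html' is a made-up relative link, not a real product path.
print(absolute_link(page_url, 'some-book_1/index.html'))
```

Because the regular expression ignores everything before the first digit, the price still parses if a stray encoding character appears in front of the pound sign.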


# 📊 Create DataFrame

In [10]:
# Step 4: Create DataFrame
df = pd.DataFrame(books, columns=['Title', 'Price (£)', 'Availability', 'Rating', 'Product URL'])
df.head()


|   | Title | Price (£) | Availability | Rating | Product URL |
|---|---|---|---|---|---|
| 0 | A Light in the Attic | 51.77 | In stock | 3 | https://books.toscrape.com/catalogue/a-light-i... |
| 1 | Tipping the Velvet | 53.74 | In stock | 1 | https://books.toscrape.com/catalogue/tipping-t... |
| 2 | Soumission | 50.1 | In stock | 1 | https://books.toscrape.com/catalogue/soumissio... |
| 3 | Sharp Objects | 47.82 | In stock | 4 | https://books.toscrape.com/catalogue/sharp-obj... |
| 4 | Sapiens: A Brief History of Humankind | 54.23 | In stock | 5 | https://books.toscrape.com/catalogue/sapiens-a... |


# 📡 Live Currency Conversion using the Frankfurter API

In [21]:
# Step 5: Get live conversion rate GBP to INR
live_url = "https://api.frankfurter.app/latest?from=GBP&to=INR"
live_res = requests.get(live_url, timeout=10)
live_res.raise_for_status()  # fail fast if the rate service returns an error
live_data = live_res.json()
conversion_rate = live_data['rates']['INR']
print(f"Live GBP to INR rate: ₹{conversion_rate}")

# Apply conversion to new column
df['Price (₹)'] = df['Price (£)'].astype(float) * conversion_rate
df[['Title', 'Price (£)', 'Price (₹)', 'Availability', 'Rating', 'Product URL']].head()


Live GBP to INR rate: ₹116.42


|   | Title | Price (£) | Price (₹) | Availability | Rating | Product URL |
|---|---|---|---|---|---|---|
| 0 | A Light in the Attic | 51.77 | 6027.0634 | In stock | 3 | https://books.toscrape.com/catalogue/a-light-i... |
| 1 | Tipping the Velvet | 53.74 | 6256.4108 | In stock | 1 | https://books.toscrape.com/catalogue/tipping-t... |
| 2 | Soumission | 50.1 | 5832.642 | In stock | 1 | https://books.toscrape.com/catalogue/soumissio... |
| 3 | Sharp Objects | 47.82 | 5567.2044 | In stock | 4 | https://books.toscrape.com/catalogue/sharp-obj... |
| 4 | Sapiens: A Brief History of Humankind | 54.23 | 6313.4566 | In stock | 5 | https://books.toscrape.com/catalogue/sapiens-a... |
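
Step 5 reads the rate directly with `live_data['rates']['INR']`, which raises a `KeyError` if the response does not have that shape. As a hedged sketch, the hypothetical helper `extract_rate` below wraps that lookup and optionally falls back to a caller-supplied rate. The sample payload only contains the `rates` key used in Step 5; the rest of the real response is not assumed here.

```python
def extract_rate(payload, currency='INR', fallback=None):
    """Return payload['rates'][currency] as a float, or `fallback` if it is missing."""
    try:
        return float(payload['rates'][currency])
    except (KeyError, TypeError, ValueError):
        if fallback is None:
            raise ValueError(f"no {currency} rate in response") from None
        return fallback

# Illustrative payload using the rate printed above.
sample = {'rates': {'INR': 116.42}}
print(extract_rate(sample))              # 116.42
print(extract_rate({}, fallback=105.0))  # 105.0
```

In the notebook this could be called as `extract_rate(live_res.json(), fallback=105.0)`, where 105 is the fixed rate assumed in Step 6.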


# 💱 Fixed-Rate Conversion

In [22]:
# Step 6: Recompute INR prices with a fixed rate (assumed: 1 GBP = ₹105).
# This overwrites the live-rate 'Price (₹)' column from Step 5, so the CSV export
# and the analysis that follow use the fixed rate.
conversion_rate = 105
df['Price (₹)'] = df['Price (£)'].astype(float) * conversion_rate

# Show updated DataFrame
df[['Title', 'Price (£)', 'Price (₹)', 'Availability', 'Rating', 'Product URL']].head()


|   | Title | Price (£) | Price (₹) | Availability | Rating | Product URL |
|---|---|---|---|---|---|---|
| 0 | A Light in the Attic | 51.77 | 5435.85 | In stock | 3 | https://books.toscrape.com/catalogue/a-light-i... |
| 1 | Tipping the Velvet | 53.74 | 5642.7 | In stock | 1 | https://books.toscrape.com/catalogue/tipping-t... |
| 2 | Soumission | 50.1 | 5260.5 | In stock | 1 | https://books.toscrape.com/catalogue/soumissio... |
| 3 | Sharp Objects | 47.82 | 5021.1 | In stock | 4 | https://books.toscrape.com/catalogue/sharp-obj... |
| 4 | Sapiens: A Brief History of Humankind | 54.23 | 5694.15 | In stock | 5 | https://books.toscrape.com/catalogue/sapiens-a... |


# 💾 Export to CSV

In [23]:
# Step 7: Save to CSV
df.to_csv('books.csv', index=False)
print("✅ Data saved to books.csv")


✅ Data saved to books.csv


# 🔍 Filter Books Below ₹5000

In [24]:
# Filter books priced under ₹5000
cheap_books = df[df['Price (₹)'] < 5000]
print(f"Books under ₹5000: {len(cheap_books)} found")
cheap_books[['Title', 'Price (₹)', 'Availability', 'Rating', 'Product URL']].head()


Books under ₹5000: 752 found


|   | Title | Price (₹) | Availability | Rating | Product URL |
|---|---|---|---|---|---|
| 5 | The Requiem Red | 2378.25 | In stock | 1 | https://books.toscrape.com/catalogue/the-requi... |
| 6 | The Dirty Little Secrets of Getting Your Dream... | 3500.7 | In stock | 4 | https://books.toscrape.com/catalogue/the-dirty... |
| 7 | The Coming Woman: A Novel Based on the Life of... | 1882.65 | In stock | 3 | https://books.toscrape.com/catalogue/the-comin... |
| 8 | The Boys in the Boat: Nine Americans and Their... | 2373.0 | In stock | 4 | https://books.toscrape.com/catalogue/the-boys-... |
| 10 | Starving Hearts (Triangular Trade Trilogy, #1) | 1468.95 | In stock | 2 | https://books.toscrape.com/catalogue/starving-... |


# 📊 Analyze Average Price per Rating & Availability

In [25]:
# Average price per rating
print("Average price (₹) by Rating:")
print(df.groupby('Rating')['Price (₹)'].mean().sort_index())

# Average price per availability status
print("\nAverage price (₹) by Availability:")
print(df.groupby('Availability')['Price (₹)'].mean())


Average price (₹) by Rating:
Rating
1    3628.925442
2    3655.146429
3    3642.662069
4    3789.796089
5    3714.321429
Name: Price (₹), dtype: float64

Average price (₹) by Availability:
Availability
In stock    3682.38675
Name: Price (₹), dtype: float64
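
A group mean on its own hides how many books sit in each group, and a grouping with a single value (here every book is "In stock") carries no comparison at all. As a small sketch, `agg` can report the count next to the mean. The five rows below are the ratings and fixed-rate INR prices from the `head()` output above, used only as sample data; the names `sample` and `summary` are illustrative.

```python
import pandas as pd

# Ratings and fixed-rate INR prices of the first five scraped books.
sample = pd.DataFrame({
    'Rating': [3, 1, 1, 4, 5],
    'Price (₹)': [5435.85, 5642.70, 5260.50, 5021.10, 5694.15],
})

# Report group size and mean together so small groups are easy to spot.
summary = sample.groupby('Rating')['Price (₹)'].agg(['count', 'mean'])
print(summary)
```

On the full dataset the same call would be `df.groupby('Rating')['Price (₹)'].agg(['count', 'mean'])`.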


## ✅ Summary

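- We successfully scraped book information from all 50 catalogue pages.
- Extracted book titles, prices, availability, ratings, and product URLs.
- Fetched a live GBP-to-INR rate from the Frankfurter API, then recomputed the INR prices with a fixed rate of ₹105 per £1 for the export and analysis.
- Stored the data in a clean `pandas` DataFrame.
- Saved the dataset as `books.csv` for further use.
- Filtered books priced under ₹5000 (752 found).
- Compared average prices by rating. Every book is listed as "In stock", so availability yields a single group.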


